Monitor global infrastructure using Datadog and SolarWinds, triage and resolve L1 incidents, escalate complex issues, participate in incident response and post-incident reviews, maintain SOPs and ServiceNow tickets, support automation with basic scripting, and provide weekend on-call coverage.
JOB DESCRIPTION
- Monitor Sysco’s global infrastructure and systems using tools such as Datadog, SolarWinds, and other enterprise monitoring platforms.
- Detect, triage, and respond to incidents proactively before customer or business impact.
- Independently resolve:
  - Server performance issues
  - Monitoring agent issues
  - Basic infrastructure and system alerts
- Escalate major incidents, complex infrastructure issues, and application-related incidents to L2/L3 teams in line with SOPs and SLAs.
- Ensure initial response and resolution targets are met for all priority levels.
- Participate in incident bridge calls and coordinate with internal and external stakeholders.
- Perform initial investigations and document findings to support faster resolution.
- Contribute to post-incident reviews and root cause analysis, including analysis via Datadog Watchdog.
- Follow and execute Standard Operating Procedures (SOPs) for known incidents.
- Maintain accurate documentation and ticket updates in ServiceNow.
- Support initiatives to improve First-Time Resolution (FTR) and reduce MTTR.
- Contribute to project-level operational improvements and initiatives tracked in Jira.
- Apply basic scripting or automation knowledge where applicable to support monitoring improvements and operational efficiency.
- Actively participate in knowledge sharing and continuous learning initiatives.
- Standard shift: Monday to Friday, from 10:30 AM to 7:30 PM CST
- Weekend on-call coverage required (one day per weekend, 10:30 AM – 7:30 PM CST; monthly shift rotation defined based on business needs, with prior notification provided by the team manager).
- Bachelor’s degree in Information Technology or equivalent experience.
- 2 years of experience in Operations Engineering, NOC, SRE, or similar roles.
- Strong understanding of:
  - Windows Server and/or UNIX/Linux environments
  - Networking fundamentals (LAN/WAN, TCP/IP, DHCP, firewalls, routing)
- Experience with an enterprise ticketing tool (e.g., ServiceNow, Jira).
- Strong communication skills in English and ability to work under pressure.
- Willingness to provide weekend on-call coverage.
- Excellent communication skills in English (B2+ or higher) and ability to collaborate across functions and geographies.
- Experience with Datadog, SolarWinds, or similar monitoring platforms.
- Exposure to AWS, Azure, or GCP.
- Familiarity with Jira for tracking initiatives and projects.
- ITIL certification or hands-on experience with ITIL practices.
- Basic scripting or automation knowledge (e.g., PowerShell, Bash, Python).
Benefits:
- This is a hybrid position based in Ultra Park II, Lagunilla (Heredia). On-site presence is required only when necessary, such as for meetings, trainings, or collaborative activities, in alignment with the company’s telework agreement, which currently requires employees to work on-site three (3) days per week.
- Private Medical Insurance
- Asociacion Solidarista
- Life Insurance
- Personal Day Off
Note: Only candidates with Costa Rican nationality or valid immigration status will be considered. Applicants residing outside Costa Rica will not be considered, and relocation is not available.