SAFE Security

Site Reliability Engineer II

Reposted 8 Days Ago

New Delhi, Delhi

Senior level

As a Site Reliability Engineer II, you will ensure uptime and reliability of cloud environments, troubleshoot incidents, automate processes, and work collaboratively to improve system performance.

The summary above was generated by AI

At SAFE Security, our vision is to be the Champions of a Safer Digital Future and the Catalysts of Change. We believe in empowering individuals and teams with the freedom and responsibility to align their goals, ensuring we all move forward together.

We operate with radical transparency, autonomy, and accountability—there’s no room for brilliant jerks. We embrace a culture-first approach, offering an unlimited vacation policy, a high-trust work environment, and a commitment to continuous learning. For us, Culture is Our Strategy—check out our Culture Memo to dive deeper into what makes SAFE unique.

Job Overview:

As a Site Reliability Engineer, you will be responsible for providing the foundation for our mission-critical cloud platform, which must maintain constant uptime, scale seamlessly, and allow new services and features to flourish.

The successful candidate will be highly self-motivated, with a passion for excellence, quality, and detail. The SRE will not only support operations but also work closely with developers and architects within SAFE to aid in product design and assist with implementation to improve stability, security, and scalability.

Core Responsibilities:

Operate, monitor, and triage all aspects of our production environments to achieve our SLAs and SLOs as part of a 24x7 on-call team.
Troubleshoot complex, cross-platform issues spanning OS, networking, and databases in a cloud-based SaaS environment; handle live production incidents; debug application and infrastructure issues; and follow and implement SRE best practices.
Design, build, and implement innovative solutions for past, present, and future issues.
Prepare alert handling procedures, runbooks, etc., for common tasks and incidents.
Automate deployment and orchestration of services into the cloud environment as well as other routine processes.
Actively participate in capacity planning, scale testing, and disaster recovery exercises.
Interact with and support partner teams, including engineering, QA, and CSE, to improve system reliability.
Conduct thorough RCA (Root Cause Analysis) for all production incidents: identify root causes, document findings, publish incident summaries, and develop preventative actions to mitigate future occurrences.
Contribute to infrastructure architecture and non-functional requirements, ensuring they fit into a cohesive vision aligned with the rest of the platform's technology roadmap for the launch.
Propagate SRE culture across the organization by sharing industry best practices, standards, approaches, documentation, and code with other engineering teams.

Qualifications / Essential Skills / Experience:

Demonstrable experience in managing and maintaining high availability services based on AWS cloud infrastructure (minimum 5 years).
Demonstrable experience with AWS cloud environments and container technologies (Docker and Kubernetes).
Demonstrable experience in managing and monitoring large-scale queueing technologies such as RabbitMQ or Kafka.
Hands-on experience in provisioning Infrastructure as Code (IaC) using Terraform Enterprise/OpenTofu/CDK.
Experience in CI/CD pipelines using GitHub Actions and Jenkins.
Valid AWS Associate-level or higher certification.
Experience in AWS Networking (VPC, Network Firewall, NACLs, SGs, TGW, DirectConnect), Route 53, HAProxy, Fargate Firewalls.
At least 3 years of experience in programming/scripting in Python.
Experience in monitoring and analyzing infrastructure performance using standard performance monitoring tools such as Grafana/Prometheus, Datadog, Splunk, New Relic, etc.
Experience with operational tools such as PagerDuty, Jira Service Management/Zendesk, etc.

If you’re passionate about cyber risk, thrive in a fast-paced environment, and want to be part of a team that’s redefining security—we want to hear from you! 🚀

Top Skills: AWS, Datadog, Docker, GitHub Actions, Grafana, Jenkins, Jira, Kubernetes, New Relic, PagerDuty, Prometheus, Python, Splunk, Terraform
