This role involves managing and automating infrastructure, deploying applications, ensuring system availability on cloud platforms, and collaborating with cross-functional teams.
Position: Site Reliability Engineer
Job Summary
As a Site Reliability Engineer, you will play a critical role in ensuring the availability and performance of our customer-facing platform. You will work closely with DevOps, DBA, and Development teams to provision and maintain infrastructure, deploy and monitor our applications, and automate workflows. Your contributions will have a direct impact on customer satisfaction and overall experience.
Responsibilities and Deliverables
- Manage, monitor, and maintain highly available systems (Windows and Linux).
- Analyze metrics and trends to ensure rapid scalability.
- Address routine service requests while identifying ways to automate and simplify.
- Create infrastructure as code using Terraform, ARM Templates, or CloudFormation.
- Maintain data backups and disaster recovery plans.
- Design and deploy CI/CD pipelines using GitHub Actions, Octopus, Ansible, Jenkins, and Azure DevOps.
- Adhere to security best practices through all stages of the software development lifecycle.
- Follow and champion ITIL best practices and standards.
- Become a resource for emerging and existing cloud technologies with a focus on AWS.
Organizational Alignment
- Reports to the Senior SRE Manager
- This role involves close collaboration with DevOps, DBA, and security teams.
Technical Proficiencies
- Hands-on experience with AWS is a must-have.
- Proficiency in analyzing application, IIS, system, and security logs, as well as CloudTrail events.
- Practical experience with CI/CD tools such as GitHub Actions, Jenkins, or Octopus.
- Experience with observability tools such as New Relic, Application Insights, AppDynamics, or Datadog.
- Experience maintaining and administering Windows, Linux, and Kubernetes.
- Experience in automation using scripting languages such as Bash, PowerShell, or Python.
- Configuration management experience using Ansible, Terraform, Azure Automation runbooks, or similar.
- Experience with SQL Server database maintenance and administration is preferred.
- Good understanding of networking (VNet, subnets, Private Link, VNet peering).
- Familiarity with cloud concepts including certificates, OAuth, Azure AD, ASE, ASP, AKS, Azure Apps, Load Balancer, Application Gateway, Firewall, API Management, SQL Server, and databases on Azure.
Experience
- 5+ years of experience in an SRE or system administration role.
- Demonstrated ability to build and support high-availability Windows/Linux servers, with emphasis on the WISA stack (Windows/IIS/SQL Server/ASP.NET).
- 3+ years of experience with CI/CD tools.
- 3+ years of experience working with cloud technologies, including AWS and Azure.
- 1+ years of experience working with container technology including Docker and Kubernetes.
- Comfortable using Scrum, Kanban, or Lean methodologies.
Education
- Bachelor’s Degree or College Diploma in Computer Science, Information Systems, or equivalent experience.