CloudRaft Jobs

Site Reliability Engineer (SRE)

Reposted 22 Days Ago

Remote

Hiring Remotely in India

Mid level

The Site Reliability Engineer (SRE) will manage cloud-native infrastructure, develop CI/CD pipelines, and ensure system reliability using best practices and automation tools.

Note:

Only immediate joiners or candidates who can join within 7 days should apply. Candidates who can start working with us from 15th June 2026 will be given priority. If you have applied to CloudRaft in the last 90 days, we already have your CV/resume on file. Multiple applications from the same candidate will not be considered.

About CloudRaft

CloudRaft is a premier cloud-native consulting and engineering company that helps ambitious startups and digital-first organizations build, scale, and operate mission-critical platforms. We partner with innovators at the forefront of artificial intelligence, developer productivity, observability, digital commerce, and enterprise software, enabling them to accelerate growth with resilient, scalable, and production-ready cloud infrastructure.

Our experience spans organizations developing AI safety and governance platforms, AI Cloud, AI agent ecosystems, developer tooling, observability solutions, digital health products, customer engagement platforms, and technology-driven franchise networks. By combining deep expertise in Platform Engineering, Kubernetes, DevOps, Observability, and Cloud Native technologies, CloudRaft helps high-growth companies move faster, operate more reliably, and focus on building category-defining products.

Job Description

We are looking for passionate Site Reliability Engineers (SREs) to join our growing team. In this role, you will take end-to-end ownership of designing, building, operating, and scaling mission-critical infrastructure for our partners. You will be responsible for ensuring reliability, performance, security, and operational excellence while driving automation, improving system efficiency, and implementing innovative solutions. Working at the intersection of software engineering and operations, you will help create resilient platforms that enable fast-growing organizations to scale with confidence.

Responsibilities

Manage and maintain Kubernetes clusters across cloud platforms, including OpenShift, Amazon EKS, Azure AKS, and Google GKE.
Implement and manage CI/CD pipelines using tools such as Jenkins, GitHub Actions, Argo CD, or GitLab CI/CD.
Design and maintain observability stacks with tools including Prometheus, Grafana, Loki, OpenTelemetry, and related technologies. Be part of the team that supports open source projects like Prometheus, Thanos, Mimir, CloudNativePG, Istio, and more.
Optimize system performance and resolve production issues. Be part of the on-call roster that provides 24x7 coverage for critical production systems.
Implement SRE principles, including Service Level Indicators (SLIs) and Service Level Objectives (SLOs), to uphold system reliability.
Automate infrastructure and operational tasks using programming languages such as Go or Python, and Infrastructure as Code (IaC) tools like Terraform.
Apply agentic AI to automate the software development lifecycle (SDLC) and to support AIOps and automation.
Learn about emerging technologies, including AI and GPU infrastructure.
Contribute to knowledge sharing through technical writing and presentations.

Qualifications

Bachelor’s degree in Computer Science, Information Technology, or a related field.
2-5 years of experience in SRE, Platform Engineering, or DevOps roles.
Strong expertise in Kubernetes, cloud-native technologies, on-premises environments, and major cloud platforms (AWS, Azure, GCP).
Proficiency in programming languages such as Python, Go, or Node.js.
Familiarity with CI/CD tools and modern deployment practices.
Proficiency in one or more open source observability stacks and Infrastructure as Code (Terraform/Pulumi).
CKA/CKAD certification (brownie points!)
Excellent problem-solving abilities and communication skills.
Inclination toward open-source contributions is advantageous.

Benefits

- Competitive salary

- Premium health insurance and various health & wellness benefits from a leading insurance provider through Plum

- Opportunity to work on the latest AI stack and GPU infrastructure

- Collaborative and supportive work environment full of learning

- Chance to take a front seat where you lead and deliver
