HighLevel

Site Reliability Engineer

Posted 3 Days Ago

Remote

Hiring Remotely in Delhi, Connaught Place, New Delhi, Delhi

Mid level

About HighLevel:

HighLevel is a cloud-based, all-in-one white-label marketing and sales platform that empowers marketing agencies, entrepreneurs, and businesses to elevate their digital presence and drive growth. With a focus on streamlining marketing efforts and providing comprehensive solutions, HighLevel helps businesses of all sizes achieve their marketing goals. We currently have ~1,200 employees across 15 countries, working remotely as well as at our headquarters in Dallas, Texas. Our goal as an employer is to maintain a strong company culture, foster creativity and collaboration, and encourage a healthy work-life balance for our employees wherever they call home.

Our Website - https://www.gohighlevel.com/

YouTube Channel - https://www.youtube.com/channel/UCXFiV4qDX5ipE-DQcsm1j4g

Blog Post - https://blog.gohighlevel.com/general-atlantic-joins-highlevel/

Our Customers:

HighLevel serves a diverse customer base, including over 60K agencies & entrepreneurs and 500K businesses globally. Our customers range from small and medium-sized businesses to enterprises, spanning various industries and sectors.

Scale at HighLevel:

We operate at scale, managing over 40 billion API hits and 120 billion events monthly, with more than 500 microservices in production. Our systems handle 200+ terabytes of application data and 6 petabytes of storage.

About the Role:

We are looking for a Site Reliability Engineer to join our team and help ensure the availability, performance, and scalability of our critical systems. You will work closely with development and operations teams to automate processes, enhance system reliability, and improve observability.

Requirements:

Experience: 4+ years in Site Reliability Engineering, DevOps, or Cloud Infrastructure roles
Cloud Expertise: Hands-on experience with GCP and AWS
Infrastructure as Code (IaC): Terraform, Helm, or equivalent tools
Containerisation & Orchestration: Docker, Kubernetes (GKE)
Observability: Experience with Prometheus, Grafana, ELK, OpenTelemetry, or similar monitoring/logging tools
Programming/Scripting: Proficiency in Python, Bash, or shell scripting; basic understanding of parsing API responses and manipulating JSON
CI/CD Pipelines: Hands-on experience with Jenkins, GitHub Actions, ArgoCD, or similar tools
Incident Management: Experience with on-call rotations, SLOs, SLIs, SLAs, Escalation Policies, and incident resolution
Databases: Experience monitoring MongoDB, Redis, Elasticsearch (ES), and queue-based systems

Responsibilities:

Develop and improve observability using monitoring, logging, tracing, and alerting tools (Prometheus, Grafana, ELK, OpenTelemetry, etc.)
Optimize system performance, troubleshoot incidents, and conduct post-mortems/RCA to prevent future issues
Collaborate with developers to enhance application reliability, scalability, and performance
Drive cost optimisation efforts in cloud environments
Monitor multiple databases and data stores (MongoDB, Redis, Elasticsearch (ES), queue-based systems, etc.)

EEO Statement:

The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government recordkeeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
