About Stem: Driven by human and artificial intelligence, Stem is unlocking energy intelligence.
Stem is a global leader reimagining technology to support the energy transition, turning complexity into clarity and potential into performance.
We help asset owners, operators, and stakeholders benefit from the full value of their energy portfolio by enabling the intelligent development, deployment, and operation of clean energy assets. Our integrated software suite, PowerTrack, is the industry standard and best-in-class for asset monitoring, and it is supported by professional and managed services under one roof. Built to tackle challenges as seamlessly as possible, Stem presents the information customers need clearly and accurately and helps turn raw data into actionable insight. With projects managed in 55 countries, from Germany to Japan and across North America, Stem has been relied on by customers for nearly 20 years to maximize the value of their clean energy projects.
Stem’s culture embodies diversity and inclusion beyond the traditional facets of gender, ethnicity, age, disability, and sexual orientation to include experience, personality, communication, workstyles, and more. At our core, Stem sits at the intersection of clean energy and software technology, where diverse ideas, experiences, and professional skills converge to make the inclusive culture we have today. Together, we are turning old-school thinking about software and energy into progressive, collaborative, and innovative solutions. By joining our team, you will collaborate with data scientists, energy experts, skilled salespeople, thought-leading executives, and more from a range of backgrounds. This intersection of ideas, beliefs, and skills is what makes us unique enough to lead the world’s largest network of digitally connected energy storage systems.
We are seeking DevOps Engineers who want to work on designing, building, and operating cloud-native, enterprise-level platforms connected to a large fleet of IoT devices. Our technology stack includes:
- Languages/Frameworks: Python, Java, C#/.NET
- Databases: DynamoDB, MySQL, MS-SQL, PostgreSQL, MongoDB, InfluxDB, TimescaleDB
- Cloud Platform: AWS
- Observability: Datadog, Grafana, Prometheus, OpenSearch, CloudWatch
Responsibilities
- Design, deploy, automate, and manage AWS cloud-based production systems with IoT-connected devices, ensuring availability, performance, scalability, and security
- Build and maintain comprehensive observability solutions including metrics, logs, and distributed tracing to provide full-stack visibility across applications and infrastructure
- Design and implement alerting strategies that minimize noise, reduce alert fatigue, and enable rapid detection of production issues
- Develop runbooks, automated remediation workflows, and self-healing infrastructure to reduce mean time to recovery (MTTR)
- Analyze cloud spend and implement cost optimization strategies including right-sizing, Reserved Instances, Savings Plans, and resource lifecycle management
- Build dashboards and reporting tools to provide visibility into infrastructure costs and enable teams to make data-driven decisions
- Build and maintain self-service platforms through automation to increase developer productivity and assure product/service quality
- Troubleshoot and solve problems across AWS infrastructure and application domains; lead incident response and conduct blameless post-mortems
- Design durable and consistent patterns for distributed systems; recommend architecture and process improvements
- Collaborate across multiple functional and technical teams to deliver projects on time and build enterprise-level platforms per the roadmap
- Analyze and resolve complex infrastructure and application deployment issues
- Evaluate emerging technology trends to enable evolving business and operating models
- Facilitate the evaluation and selection of software products, services, and standards; design standard and custom software configurations
- Assess existing platforms to identify deficiencies and improvements; recommend whether to maintain, refresh, or retire products, services, or systems
- Ensure critical system security using industry-leading cloud security solutions
Requirements
- 5+ years of overall experience, with 3+ years in enterprise environments
- 3+ years building and managing cloud and IoT platforms supporting large, highly available, enterprise-grade applications
- 4+ years working with AWS technologies (e.g., EC2, EKS, ECS, S3, Redshift, VPC, Glacier, IAM, CloudWatch, SQS, Lambda, CloudTrail, Systems Manager, KMS, Kinesis) with emphasis on the AWS Well-Architected Framework
- Strong experience implementing observability solutions including metrics collection, centralized logging, and distributed tracing (e.g., OpenTelemetry, Jaeger, X-Ray)
- Proven ability to design effective alerting systems with appropriate thresholds, escalation policies, and on-call rotations
- Experience with incident management, root cause analysis, and building automated remediation workflows
- Demonstrated track record of identifying and implementing AWS cost optimization strategies (right-sizing, Reserved Instances, Savings Plans, spot instances, resource scheduling)
- Familiarity with AWS cost management tools (Cost Explorer, Budgets, Cost Allocation Tags, Compute Optimizer)
- Strong Infrastructure-as-Code and automation skills using tools such as Terraform and Ansible, along with Python and shell scripting
- Hands-on experience with containerization and orchestration (e.g., Docker, Kubernetes, AWS EKS, ECS)
- Solid experience in 24x7 production AWS environments, including CI/CD pipelines (Jenkins, GitLab CI, etc.)
- Strong understanding of Site Reliability Engineering principles, SLOs/SLIs/SLAs, error budgets, and chaos engineering
- Linux and Windows server administration
- Experience with observability and monitoring platforms (e.g., Datadog, Grafana, Prometheus, OpenSearch/Elastic Stack, CloudWatch, PagerDuty)
- Understanding of network topologies and protocols (DNS, HTTP/HTTPS, SSH, SFTP, SMTP)
- Experience with IT compliance and risk management frameworks (e.g., NIST, SOC 2, SOX, FedRAMP)
- Experience collaborating with client IT organizations to define appropriate solutions
Preferred Qualifications
- AWS Solutions Architect Professional certification
- Certified Kubernetes Administrator (CKA) certification
Stem, Inc. is an equal opportunity employer committed to diversity in the workplace and does not discriminate against any employee or applicant for employment because of race, color, sex, pregnancy, religion, national origin, ethnicity, citizenship, sexual orientation, gender identity, age, marital status, disability, genetic information, military status, protected veteran status or any other factor protected by applicable federal, state or local laws.
Stem, Inc. Gurugram, Haryana, IND Office