The Site Reliability Engineer will enhance cloud services by overseeing caching infrastructure and automation, ensuring high availability and performance. The role involves monitoring, debugging, and improving code while scaling distributed software in production environments. Responsibilities include communication across technical levels and implementing best practices in service reliability.
We are looking for an engineer who is passionate about scaling cloud services to join our growing SRE team. The SRE team owns the caching infrastructure, tooling, and automation that support Atlassian's suite of Cloud products.
We'd love it if you had an understanding of modern cloud infrastructure, programming expertise, operational experience and a desire to change the status quo. We're looking for an engineer who can analyze and help improve our services and processes to get us to an even higher level of reliability, performance, scalability, and cost efficiency.
On your first day, we'll expect you to have:
- 1+ years of experience operating high-availability, fault-tolerant, scalable, distributed software in production: building monitoring into your code, tweaking dashboards, defining alerts, writing runbooks, etc.
- 1+ years of hands-on experience with public cloud offerings (AWS components like EC2, CloudFormation, RDS / Aurora, Caches, SQS - or equivalents, e.g. in GCP / Azure).
- Familiarity with Unix / Linux operating systems.
- A strong drive to debug, improve code, and automate routine tasks.
- Backend engineering experience in one or more prominent languages such as Java, Go or Python.
- Strong written and verbal communication skills, and an ability to explain complex technical issues to a range of technical and non-technical audiences (management, peers, clients).
It would be great, but not mandatory, if you had:
- Experience implementing caching solutions, strategies, and best practices.
- Experience in microservice architecture.
- Experience building web services and clients using REST/GraphQL.
Top Skills
Go
Java
Python