Kyndryl

Senior Site Reliability Engineer

Posted Yesterday
3 Locations
Senior level
The Senior Site Reliability Engineer at Kyndryl will manage and optimize storage solutions using ZFS and iSCSI, maintain Linux-based systems, implement automation tools, and ensure system reliability. Responsibilities include performance tuning, incident response, and documentation while collaborating with software development teams.
The summary above was generated by AI

Who We Are

At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.


The Role

Skytap is an IaaS provider deployed globally in Azure that’s just joined forces with infrastructure services powerhouse Kyndryl, and the future has never looked brighter!  This exciting new chapter combines our cutting-edge cloud platform with Kyndryl’s robust infrastructure expertise, creating a unique opportunity to deliver smarter, more scalable capabilities for businesses worldwide.

 

As we integrate the best of both companies, we’re looking for talented individuals to help us define the next chapter of our journey. If you’re passionate about building products that make a real impact and ready to bring fresh ideas, we want you to be part of our growing team.  We’re creating a new era of seamless, high-performance solutions that drive real innovation and modernization for clients across industries. Let’s build that future together!

 
We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join Skytap’s Storage team. This technical role is crucial for supporting our software development infrastructure and maintaining production environments. You will work closely with Software Development Engineers (SDEs) to ensure the reliability, performance, and scalability of our storage solutions, with a strong focus on Linux-based systems.

Your Responsibilities:

  • Storage and Infrastructure Management:

    • Deploy, manage, and optimize storage solutions using ZFS and iSCSI across global data centers.

    • Implement and maintain automation and monitoring tools such as Puppet, Grafana, Zabbix, and Jenkins to enhance system performance and reliability.

    • Use StorCLI to manage server storage configurations.

  • Linux Systems Expertise:

    • Manage and maintain Ubuntu-based systems, ensuring security and compliance.

    • Conduct performance tuning and capacity planning for Linux servers.

    • Develop and implement self-healing systems and automated recovery processes on Linux platforms.

  • Reliability Engineering:

    • Develop and implement strategies for improving system availability and performance.

    • Conduct root-cause analysis and incident response for storage-related issues.

    • Collaborate with SDEs to support software development infrastructure and deploy new product features.

  • Operational Excellence:

    • Manage on-call rotations, leveraging automation to minimize operational load.

    • Develop and maintain documentation for operational procedures and best practices.

    • Drive continuous improvement and innovation in storage operations.

  • Collaboration and Communication:

    • Work closely with cross-functional teams, including SDEs and infrastructure engineers.

    • Provide technical guidance and support for storage-related challenges.

    • Present data-driven insights to stakeholders to support decision-making.

Your Future at Kyndryl
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl, you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential – offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.


Who You Are

Your Skills & Qualifications:

  • Proven experience in site reliability engineering, with a focus on storage solutions and Linux systems.

  • Strong knowledge of ZFS, iSCSI, and Ubuntu.

  • Expertise in automation and configuration management tools (e.g., Bash, Ansible, Puppet).

  • Familiarity with HashiCorp tools, SSH, and LDAP.

  • Experience with StorCLI for storage configuration.

  • Experience with monitoring tools such as Grafana, Zabbix, and InfluxDB.

  • Ability to conduct root-cause analysis and implement effective solutions.

  • Strong problem-solving skills and ability to troubleshoot complex issues.

  • Experience with DevOps and SRE best practices.

  • Ability to work collaboratively with cross-functional teams.

  • Commitment to staying current with industry trends and technologies.

  • Willingness to learn and adapt to new tools and methodologies.

Bonus Skills: 

  • Ability to program in Python or Rust.

  • Software development experience, including technical design and deployment.

Join Skytap's Storage team and play a key role in ensuring the reliability and performance of our storage solutions. If you are passionate about site reliability engineering and have a strong technical background in storage and Linux systems, we encourage you to apply.


Being You

Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: Our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way.


What You Can Expect

With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate, build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together we will all succeed.

Get Referred!

If you know someone who works at Kyndryl, when asked ‘How Did You Hear About Us’ during the application process, select ‘Employee Referral’ and enter your contact's Kyndryl email address.

Top Skills

Ansible
Bash
Linux
Python
Rust
Ubuntu
