Kyndryl

Senior Site Reliability Engineer

Reposted 16 Days Ago

3 Locations

Senior level

As a Senior Site Reliability Engineer, you will manage and optimize storage solutions, ensuring reliability and performance of Linux systems, while collaborating with software developers and implementing automation tools.

The summary above was generated by AI

Who We Are

At Kyndryl, we design, build, manage and modernize the mission-critical technology systems that the world depends on every day. So why work at Kyndryl? We are always moving forward – always pushing ourselves to go further in our efforts to build a more equitable, inclusive world for our employees, our customers and our communities.

The Role

Skytap is an IaaS provider deployed globally in Azure that’s just joined forces with infrastructure services powerhouse Kyndryl, and the future has never looked brighter! This exciting new chapter combines our cutting-edge cloud platform with Kyndryl’s robust infrastructure expertise, creating a unique opportunity to deliver smarter, more scalable capabilities for businesses worldwide.

As we integrate the best of both companies, we’re looking for talented individuals to help us define the next chapter of our journey. If you’re passionate about building products that make a real impact and ready to bring fresh ideas, we want you to be part of our growing team. We’re creating a new era of seamless, high-performance solutions that drive real innovation and modernization for clients across industries. Let’s build that future together!

We are seeking a highly skilled Senior Site Reliability Engineer (SRE) to join Skytap’s Storage team. This technical role is crucial for supporting our software development infrastructure and maintaining production environments. You will work closely with Software Development Engineers (SDEs) to ensure the reliability, performance, and scalability of our storage solutions, with a strong focus on Linux-based systems.

Your Responsibilities:

Storage and Infrastructure Management:
- Deploy, manage, and optimize storage solutions using ZFS and iSCSI across global data centers.
- Implement and maintain automation and monitoring tools such as Puppet, Grafana, Zabbix, and Jenkins to enhance system performance and reliability.
- Use StorCLI to manage server storage configurations.

Linux Systems Expertise:
- Manage and maintain Ubuntu-based systems, ensuring security and compliance.
- Conduct performance tuning and capacity planning for Linux servers.
- Develop and implement self-healing systems and automated recovery processes on Linux platforms.

Reliability Engineering:
- Develop and implement strategies for improving system availability and performance.
- Conduct root-cause analysis and incident response for storage-related issues.
- Collaborate with SDEs to support software development infrastructure and deploy new product features.

Operational Excellence:
- Manage on-call rotations, leveraging automation to minimize operational load.
- Develop and maintain documentation for operational procedures and best practices.
- Drive continuous improvement and innovation in storage operations.

Collaboration and Communication:
- Work closely with cross-functional teams, including SDEs and infrastructure engineers.
- Provide technical guidance and support for storage-related challenges.
- Present data-driven insights to stakeholders to support decision-making.

Your Future at Kyndryl
Kyndryl has a global footprint, which means that as a Site Reliability Engineer at Kyndryl you will have opportunities to work on projects and collaborate with colleagues from around the world. This role is dynamic and influential – offering a wide range of professional and personal growth opportunities that you won’t find anywhere else.

Who You Are

Your Skills & Qualifications:

Proven experience in site reliability engineering, with a focus on storage solutions and Linux systems.
Strong knowledge of ZFS, iSCSI, and Ubuntu.
Expertise in automation and configuration management tools (e.g., Bash, Ansible, Puppet).
Familiarity with HashiCorp tools, SSH, and LDAP.
Experience with StorCLI for storage configuration.
Experience with monitoring tools such as Grafana, Zabbix, and InfluxDB.
Ability to conduct root-cause analysis and implement effective solutions.
Strong problem-solving skills and ability to troubleshoot complex issues.
Experience with DevOps and SRE best practices.
Ability to work collaboratively with cross-functional teams.
Commitment to staying current with industry trends and technologies.
Willingness to learn and adapt to new tools and methodologies.

Bonus Skills:

Ability to program in Python or Rust.
Software development experience, including technical design and deployment.

Join Skytap's Storage team and play a key role in ensuring the reliability and performance of our storage solutions. If you are passionate about site reliability engineering and have a strong technical background in storage and Linux systems, we encourage you to apply.

Being You

Diversity is a whole lot more than what we look like or where we come from; it’s how we think and who we are. We welcome people of all cultures, backgrounds, and experiences. But we’re not doing it single-handedly: our Kyndryl Inclusion Networks are only one of many ways we create a workplace where all Kyndryls can find and provide support and advice. This dedication to welcoming everyone into our company means that Kyndryl gives you – and everyone next to you – the ability to bring your whole self to work, individually and collectively, and support the activation of our equitable culture. That’s the Kyndryl Way.

What You Can Expect

With state-of-the-art resources and Fortune 100 clients, every day is an opportunity to innovate and to build new capabilities, new relationships, new processes, and new value. Kyndryl cares about your well-being and prides itself on offering benefits that give you choice, reflect the diversity of our employees, and support you and your family through the moments that matter – wherever you are in your life journey. Our employee learning programs give you access to the best learning in the industry to receive certifications, including Microsoft, Google, Amazon, Skillsoft, and many more. Through our company-wide volunteering and giving platform, you can donate, start fundraisers, volunteer, and search over 2 million non-profit organizations. At Kyndryl, we invest heavily in you; we want you to succeed so that together, we will all succeed.

Get Referred!

If you know someone who works at Kyndryl, select ‘Employee Referral’ when asked ‘How Did You Hear About Us’ during the application process, and enter your contact's Kyndryl email address.

Top Skills

Ansible, Bash, Grafana, HashiCorp tools, InfluxDB, iSCSI, Jenkins, LDAP, Puppet, Python, Rust, SSH, StorCLI, Ubuntu, Zabbix, ZFS
