The Senior AI Infrastructure & Platform Engineer will build and optimize GPU-based AI infrastructure, manage deployments, and collaborate with data science teams to ensure efficient operations.
Role Overview
We are seeking a highly skilled Senior AI Infrastructure & Platform Engineer to join our client’s team in Riyadh. In this role, you’ll be responsible for building, managing, and optimizing scalable AI infrastructure and compute environments that support high-performance workloads, including GPU-accelerated AI/ML pipelines, cluster scheduling, and orchestration.
Key Responsibilities
- Deploy, maintain, and optimize GPU-based compute clusters and infrastructure.
- Manage and operate GPU orchestration tools and platforms such as:
  - Nvidia Base Command Manager (critical)
  - Nvidia AI Enterprise Suite
  - Nvidia GPU and Network Operators
  - Nvidia NIMs and Blueprints
- Configure, deploy, and maintain compute workloads using scheduling and orchestration tools including:
  - Slurm (critical)
  - Vanilla Kubernetes
- Install, configure, and maintain the underlying OS (e.g. Canonical Ubuntu) and supporting system software.
- Monitor and troubleshoot infrastructure performance, availability, and reliability; ensure high uptime for AI/ML workloads.
- Work with data scientists, ML engineers, and dev teams to define infrastructure requirements, resource allocation, and deployment workflows.
- Develop automation scripts, CI/CD pipelines, and best practices for infrastructure provisioning and management.
- Document architecture, configurations, and operational procedures; enforce security, compliance, and backup policies.
Requirements
Required Skills & Experience
- Proven experience managing GPU-based AI/ML infrastructure and compute clusters.
- Hands-on experience with:
  - Nvidia Base Command Manager
  - Nvidia AI Enterprise Suite
  - Nvidia GPU/Network Operators, NIMs, Blueprints
- Strong experience with Slurm and/or Kubernetes orchestration.
- Solid Linux system administration skills, preferably on Ubuntu or similar distributions.
- Strong scripting/automation ability (e.g. Bash, Python, or relevant tooling) for provisioning, deployment, and maintenance.
- Excellent troubleshooting and performance-tuning skills.
- Experience collaborating with ML/data science teams and integrating infrastructure with their workflows.
- Strong understanding of networking, security, resource allocation, and cluster management best practices.
- Previous experience working in a high-performance computing (HPC) or AI-focused infrastructure team.
- Knowledge of containerization, container orchestration, and GPUs in cloud or on-prem environments.
- Experience with CI/CD, infrastructure-as-code (e.g. Terraform, Ansible), monitoring tools, and logging setups.
- Familiarity with workload scheduling, job queuing, resource quotas, and GPU-shared environments.
Top Skills
Ansible
Bash
Kubernetes
Nvidia AI Enterprise Suite
Nvidia Base Command Manager
Nvidia GPU
Python
Slurm
Terraform
Ubuntu


