About The Job:
We are seeking a visionary and hands-on Senior AI Technical Lead to spearhead our Generative AI initiatives. While many can build a prototype, you are the expert who can take it to production. This role focuses on the end-to-end lifecycle of GenAI: from high-performance inference hosting and automated MLOps pipelines to rigorous model benchmarking and safety guardrails.
You will lead a high-performing team to design systems that are not only "intelligent" but are scalable, cost-optimized, and ethically governed.
What Will You Do:
MLOps & High-Performance Inference
- Inference Server Management: Architect and optimize model serving using high-throughput engines like vLLM, NVIDIA Triton Inference Server, or TGI (Text Generation Inference).
- Scalable Hosting: Deploy and manage LLMs on Kubernetes (K8s), implementing auto-scaling based on concurrency and token throughput.
- MLOps Pipelines: Build robust CI/CD/CT (continuous training) pipelines for model deployment, versioning, and rollback strategies.
- Resource Optimization: Implement model optimization techniques such as quantization (AWQ, GPTQ), LoRA/QLoRA adapters, and caching strategies to minimize latency and GPU costs.
Evaluation, Benchmarking & Guardrails
- Model Benchmarking: Establish a systematic framework for benchmarking LLMs/SLMs against industry standards (e.g., MMLU, HumanEval) and custom business-specific datasets.
- Automated Evaluation: Lead the implementation of "LLM-as-a-judge" workflows and evaluation frameworks like RAGAS, DeepEval, or LangSmith to measure relevance, faithfulness, and noise robustness.
- AI Guardrails: Design and deploy real-time safety layers using NeMo Guardrails, Guardrails AI, or Llama Guard to prevent hallucinations, PII leakage, and toxic outputs.
- A/B Testing: Design experimentation frameworks to compare model versions, prompt iterations, and RAG architectures in live environments.
Architecture & Patterns
- Production Patterns: Design multi-layered microservices that integrate with WhatsApp, Instagram, and web platforms via robust API gateways.
- Observability: Implement deep monitoring for tokens per second (TPS), time to first token (TTFT), and cost per request using Prometheus, Grafana, or specialized AI observability tools.
- Data Ingestion for RAG: Build automated, secure pipelines for data chunking, embedding generation, and vector database synchronization (Pinecone, Weaviate, or Milvus).
Technical Leadership
- Team Mentorship: Lead a team of 10–12 engineers, fostering a culture of "Production-First" AI development.
- Strategic Roadmap: Drive the technical vision for internal AI tooling, including prompt libraries and model registries.
- Stakeholder Collaboration: Translate complex performance metrics (like P99 latency) into business impact for product managers and executives.
What You Will Bring:
- Experience: 8–10 years in AI/ML development, with 3+ years focused specifically on LLM productionization and MLOps.
- Inference Expertise: Proven track record of serving large models in production environments (local or cloud-hosted).
- Deep Tech Stack:
  - Languages: Expert-level Python.
  - Frameworks: LangChain, LlamaIndex, Hugging Face (Transformers/PEFT/Accelerate).
  - MLOps Tools: MLflow, Weights & Biases, Kubeflow, or BentoML.
  - Safety/Eval: Experience with NeMo Guardrails, RAGAS, or custom evaluation harnesses.
  - Infrastructure: Mastery of Docker, Kubernetes (GPU orchestration), and Azure AI Studio / AWS SageMaker.
- Academic Background: Bachelor’s or Master’s degree in Computer Science, AI/ML, or a related technical field.
About Red Hat
Red Hat is the world’s leading provider of enterprise open source software solutions, using a community-powered approach to deliver high-performing Linux, cloud, container, and Kubernetes technologies. Spread across 40+ countries, our associates work flexibly across work environments, from in-office, to office-flex, to fully remote, depending on the requirements of their role. Red Hatters are encouraged to bring their best ideas, no matter their title or tenure. We're a leader in open source because of our open and inclusive environment. We hire creative, passionate people ready to contribute their ideas, help solve complex problems, and make an impact.
Inclusion at Red Hat
Red Hat’s culture is built on the open source principles of transparency, collaboration, and inclusion, where the best ideas can come from anywhere and anyone. When this is realized, it empowers people from different backgrounds, perspectives, and experiences to come together to share ideas, challenge the status quo, and drive innovation. Our aspiration is that everyone experiences this culture with equal opportunity and access, and that all voices are not only heard but also celebrated. We hope you will join our celebration, and we welcome and encourage applicants from all the beautiful dimensions that compose our global village.
Equal Opportunity Policy (EEO)
Red Hat is proud to be an equal opportunity workplace and an affirmative action employer. We review applications for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, ancestry, citizenship, age, veteran status, genetic information, physical or mental disability, medical condition, marital status, or any other basis prohibited by law.
Red Hat does not seek or accept unsolicited resumes or CVs from recruitment agencies. We are not responsible for, and will not pay, any fees, commissions, or any other payment related to unsolicited resumes or CVs except as required in a written contract between Red Hat and the recruitment agency or party requesting payment of a fee.
Red Hat supports individuals with disabilities and provides reasonable accommodations to job applicants. If you need assistance completing our online job application, email [email protected]. General inquiries, such as those regarding the status of a job application, will not receive a reply.


