Lead the design and operation of ML systems for data collection and processing. Mentor teams, oversee technical direction, and ensure system reliability and efficiency. Responsible for cloud deployments and integrating advanced AI technologies.
Title: Lead Machine Learning Engineer
Location: Vashi, Navi Mumbai
As a Lead Machine Learning Engineer, you will be the hands-on technical owner of ML systems that power large-scale data collection, extraction, enrichment, and understanding of unstructured content. You'll design, build, and operate end-to-end solutions, from feature generation and training to low-latency inference and observability. These solutions will measurably improve coverage, freshness, quality, and unit cost across our data pipelines. Your toolbox spans classical ML, NLP, LLMs/GenAI, Agentic AI, Retrieval-Augmented Generation (RAG) frameworks, and Model Context Protocol (MCP). You will use these to deliver retrieval, extraction, classification, summarization, and autonomous tasking capabilities integrated cleanly into production workflows.
You'll own the architecture and implementation across AWS and GCP, selecting managed services pragmatically and deploying resilient services via Docker and Kubernetes with CI/CD, autoscaling, canary/shadow releases, and tight SLIs/SLOs. You will institute MLOps best practices (experiment tracking, model and prompt registries, evaluation harnesses, data/feature drift detection, guardrails and policy enforcement, lineage and access controls) so teams can ship faster with confidence. Day to day, you'll write production-grade Python and SQL, apply GitHub Copilot to accelerate development responsibly, and partner with Product, Data, Platform/SRE, and Security to translate ambiguous problems into staged, observable deliveries.
You bring a curiosity to understand the domain by studying the applications, dataflow, and data schemas, and you use that context to design simpler, more accurate systems. It's a plus if you have familiarity with public and private equity data and related entity models, enabling smarter features, evaluation sets, and downstream integrations. As a lead IC, you mentor through design and code reviews, set technical direction, and improve reliability, security, and developer experience. You will champion cost-aware, privacy-first designs; lead deep dives to resolve complex issues; and iterate quickly to achieve measurable outcomes (precision/recall, latency, error budgets, and cost per document). This role is ideal for an engineer who thrives on shipping robust ML/LLM systems at scale and influencing cross-functional teams through exceptional technical judgment and execution.
Team Overview
You will be part of a multidisciplinary team of ML engineers and data scientists responsible for building AI and ML solutions and services within robust data collection pipelines that handle large volumes of unstructured data. The team focuses on building scalable, reliable systems to process and categorize the data that downstream data collection processing depends on.
Outline of Duties and Responsibilities
- AI & ML Data Collection Leadership: Convert business goals into a clear AI/ML roadmap for data acquisition, extraction, enrichment, and measurable outcomes.
- Technical Oversight: Architect and ship scalable ML/NLP/LLM (RAG, embeddings, reranking, Agentic AI, MCP) services with high reliability and efficiency.
- Peer Leadership & Development: Mentor engineers and data scientists through design/code reviews, setting technical standards and elevating craftsmanship.
- NLP Technologies: Build and integrate classifiers, transformers, LLMs, and evaluators that process and categorize unstructured data at scale.
- Data Pipeline Engineering: Design, operate, and optimize high-throughput collection pipelines with robust orchestration, messaging, storage, and SLAs.
- Cross-functional Collaboration: Partner with Product, Data Collection Engineering, Platform/SRE, and Security to turn ambiguous needs into phased, observable deliveries.
- Innovation & Continuous Improvement: Pilot and productionize advances in GenAI, Agentic AI, RAG, and MCP to improve quality, speed, and cost.
- System Integrity & Security: Enforce data governance, privacy, and model transparency with least-privilege IAM, secrets management, and auditability.
- Process Improvement: Apply Agile/Lean/Fast-Flow practices to reduce cycle time, raise quality, and remove toil via automation.
- Cloud & Deployment: Deliver cloud-native solutions on AWS and GCP using Docker/Kubernetes, autoscaling, and progressive delivery patterns.
- MLOps & Reliability: Establish experiment tracking, registries, CI/CD, drift detection, SLIs/SLOs, and runbooks for dependable operations.
- Retrieval Quality & Evaluation: Implement offline/online evals (e.g., nDCG/MRR/precision@k), golden sets, and guardrails for RAG and search relevance.
- Cost, Performance & Observability: Optimize latency and unit cost with caching, batching, distillation, right-sizing, and clear dashboards/alerts.
- Documentation & Knowledge Sharing: Produce concise design docs, ADRs, and playbooks to ensure durable, cross-site knowledge transfer.
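For readers less familiar with the retrieval metrics named in the Retrieval Quality & Evaluation item above, the following is a minimal Python sketch of precision@k, MRR, and nDCG. It is illustrative only and does not describe the team's actual evaluation harness. The function names are hypothetical, relevance labels are assumed to be listed in ranked order, and nDCG here uses linear gain with the ideal ranking derived from the retrieved list alone.

```python
import math
from typing import Sequence


def precision_at_k(relevance: Sequence[float], k: int) -> float:
    """Fraction of the top-k results with a positive relevance label."""
    if k <= 0:
        raise ValueError("k must be positive")
    return sum(1 for r in relevance[:k] if r > 0) / k


def reciprocal_rank(relevance: Sequence[float]) -> float:
    """1 / rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, r in enumerate(relevance, start=1):
        if r > 0:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[float]]) -> float:
    """Average reciprocal rank over several queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


def dcg_at_k(relevance: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 rank discount."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevance[:k], start=1)
    )


def ndcg_at_k(relevance: Sequence[float], k: int) -> float:
    """DCG normalized by the DCG of the best ordering of the same labels.

    Assumption: every relevant document appears in `relevance`; a golden
    set with unretrieved relevant documents would need a different ideal.
    """
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0
```

Each function takes the graded or binary labels of one query's ranked results, so a golden set can be scored by mapping these functions over its queries and averaging.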
Experience, Skills and Qualifications
- Bachelor's, Master's, or PhD in Computer Science, Mathematics, Data Science, or a related field.
- 5+ years of experience in the ML Engineering and Data Science field, with a focus on LLM and GenAI technologies, particularly in data collection and unstructured data processing.
- 1+ years of experience in a technical lead position.
- Strong expertise in NLP and machine learning, with hands-on experience in classifiers, large language models (LLMs), Model Context Protocol (MCP), Agentic AI, and other advanced NLP techniques.
- Extensive experience with data pipeline and messaging technologies such as Apache Kafka, Airflow, and cloud data platforms (e.g., Snowflake).
- Expert-level proficiency in Python, SQL, and other relevant programming languages and tools.
- Proficiency in Amazon Web Services (AWS) and Google Cloud Platform (GCP).
- Strong understanding of cloud-native technologies and containerization (e.g., Kubernetes, Docker) with experience in managing these systems globally.
- Demonstrated ability to solve complex technical challenges and deliver scalable solutions.
- Excellent communication skills with a collaborative approach to working with global teams and stakeholders.
- Experience working in fast-paced environments, particularly in industries that rely on data-intensive technologies (experience in fintech is highly desirable).
Working Conditions
This position is based in a standard office setting. Employees use a PC and phone throughout the day. Limited corporate travel to remote offices or other business meetings and events may be required.
Morningstar's hybrid work environment gives you the opportunity to collaborate in-person each week as we've found that we're at our best when we're purposely together on a regular basis. In most of our locations, our hybrid work model is four days in-office each week. A range of other benefits are also available to enhance flexibility as needs change. No matter where you are, you'll have tools and resources to engage meaningfully with your global colleagues.
Legal Entity: Morningstar India Private Ltd. (Delhi)
Top Skills
Airflow
Apache Kafka
AWS
Docker
GCP
Kubernetes
Python
SQL

