Senior AI/ML Engineer
About the Role
We're seeking a Senior AI/ML Engineer to lead the design, development, and deployment of AI/ML solutions, AI agents, and data automations across our AWS-based analytics platform. In this role, you'll architect end-to-end ML systems, set technical direction, mentor junior engineers and analysts, and drive best practices in MLOps and ML infrastructure. This is a high-impact role for an experienced engineer who can own complex, ambiguous problems and deliver production-grade machine learning and AI agent solutions.
Key Responsibilities
- Architect, train, evaluate, and optimize machine learning models, owning the full model lifecycle from experimentation to production
- Design and build AI agents and automated workflows using Amazon Quick and AWS orchestration tools
- Define and implement efficient ML workflows, optimizing for performance, scalability, and cost
- Architect and maintain serverless data pipelines using AWS Glue, Step Functions, Lambda, and EventBridge Scheduler
- Lead the design of our analytics service engine for ingesting, transforming, and querying data across S3 storage (Excel/CSV files, Delta Tables, library files)
- Establish MLOps practices, CI/CD pipelines, and infrastructure standards for the team
- Integrate with external systems (e.g., SAP, Salesforce) and design robust data-sourcing strategies
- Design and build REST APIs and model-serving infrastructure for production workloads
- Mentor junior engineers, conduct code reviews, and set technical standards
- Partner with cross-functional stakeholders to translate business needs into ML solutions
Required Technical Skills
- Programming: Expert in Python with a track record of writing clean, well-tested, production-grade code
- ML Frameworks: Strong, hands-on experience with PyTorch, TensorFlow, and scikit-learn
- Model Development: Deep understanding of model training, evaluation, inference, and optimization for efficient ML at scale
- AI Agents & Automation: Proven experience building AI agents and automated workflows; proficient with Amazon Quick
- MCP & Tool Integration: Experience building and integrating Model Context Protocol (MCP) servers to connect LLMs and AI agents with external tools, data sources, and services
- APIs & Serving: Strong experience designing REST APIs and deploying/serving ML models in production
- Cloud & Infrastructure: Solid experience with cloud platforms (AWS, Azure, GCP) and containerization (Docker)
- MLOps & Tooling: Proficient with Git, CI/CD pipelines, and ML infrastructure best practices
Familiarity with Our Architecture
Our team's analytics platform is built on AWS. Deep familiarity with the following components is expected, and you'll help shape how we use and evolve them:
- Orchestration & Compute: AWS Glue, AWS Step Functions, AWS Lambda, EventBridge Scheduler
- Storage & Data: S3 (CSV/Excel, Delta Tables, library files), Glue Data Catalog, Glue Crawler
- Query & Analytics: Amazon Athena
- AI & Automation: Amazon Quick
- Integration: External systems such as SAP and Salesforce
- Notifications: Amazon SNS and Amazon SES for alerting and email
- Infrastructure & Security: AWS IAM, AWS Secrets Manager, CloudWatch, AWS Systems Manager (for environment parameters)
- Source Control & CI/CD: Bitbucket for version control, pull request workflows, and pipeline-based deployments
Core Competencies
- Problem-Solving: Independently solves complex, ambiguous ML and automation challenges and designs scalable solutions
- Technical Leadership: Sets technical direction, drives architecture decisions, and mentors junior engineers
- Collaboration: Leads cross-functional initiatives and owns the delivery of significant components end-to-end
- Continuous Improvement: Champions best practices in MLOps, software engineering, and ML infrastructure across the team
- Code Quality: Sets and enforces high standards through clean, testable code, rigorous reviews, and robust version control



