Porch Group is a leading vertical software and insurance platform and is positioned to be the best partner to help homebuyers move, maintain, and fully protect their homes. We offer differentiated products and services, with homeowners insurance at the center of this relationship. We differentiate and look to win in the massive and growing homeowners insurance opportunity by
1) providing the best services for homebuyers,
2) leading with advantaged underwriting in insurance, and
3) protecting the whole home.
As a leader in the home services software-as-a-service (“SaaS”) space, we’ve built deep relationships with approximately 30,000 companies that are key to the home-buying transaction, such as home inspectors, mortgage companies, and title companies.
In 2020, Porch Group rang the Nasdaq bell and began trading under the ticker symbol PRCH. We are looking to build a truly great company and are JUST GETTING STARTED.
Job Title: Senior AIOps Engineer I
Location: India
Workplace Type: Remote
Job Summary
The future is bright for the Porch Group, and we’d love for you to be a part of it as our Senior AIOps Engineer I.
We are looking for a Senior AIOps Engineer I who will partner with product managers, platform engineers, data scientists, and machine learning engineers to ensure our AI and ML-powered systems are reliable, observable, secure, and cost-efficient in production.
You will focus on how AI systems run in real-world environments: monitoring model performance and drift, ensuring robust deployment pipelines, managing incidents, standing up new AI infrastructure, and improving the stability and scalability of our AI platform. You'll help evolve our AI & ML Ops stack and operational processes so teams can ship AI features quickly and safely.
Our AI/ML stack is based on Python and runs on Kubernetes (GKE) and Google Cloud Platform. We use tools such as Union Cloud (Flyte) for ML workflow orchestration, BentoML for model serving, Feast for feature stores, Label Studio for data annotation, BigQuery as our central data warehouse, and Dataflow for streaming/batch data pipelines. On the GenAI side, we operate a centralized LLM routing/gateway service across providers and batch prediction services for large-scale LLM inference, and we are building out RAG infrastructure. You will maintain and harden this ecosystem, and stand up new infrastructure components as we expand our AI platform capabilities.
What You Will Do As A Senior AIOps Engineer I
Own production reliability for AI/ML services
- Monitor and improve the reliability, availability, and performance of AI/ML-powered services running in production.
- Define and maintain SLOs/SLIs for critical AI systems (e.g., latency, error rates, model performance), tying them to user experience and business impact where possible.
- Own recurring model refresh cycles — coordinate retraining, validation, and redeployment of production models to prevent staleness and drift.
Build and improve AI observability
- Design and implement monitoring, logging, and alerting for models and data pipelines in partnership with AI Engineers and Data Scientists.
- Integrate model and system metrics with existing observability stacks (Datadog, Opik, etc.) and dashboards used by engineering and operations teams.
- Build and maintain monitoring workflows for pipeline health.
Support scalable, safe deployment of models
- Collaborate with data scientists and ML engineers to streamline deployment workflows for models and related services (blue/green, canary, A/B, shadow deployments).
- Support the productionization of image-based ML models, including batch prediction workflows, model performance monitoring, and data pipeline integration.
- Improve CI/CD pipelines and release processes for AI services to reduce risk and increase deployment frequency.
Stand up and operate AI infrastructure
- Provision, deploy, configure, and maintain new AI infrastructure components as they are adopted across the organization — including AI gateways, RAG platforms, LLM observability tools, agentic workflows, and no-code agent builders.
- Utilize and improve existing frameworks and tools (e.g., Union Cloud, BentoML, Feast, Kubernetes, Terraform, and GCP services) to support robust and maintainable AI infrastructure.
- Build automation and tooling to reduce manual operational work, especially around model promotion, configuration, environment management, and Docker image maintenance.
- Support multi-BU infrastructure provisioning: create and manage separate environments, projects, roles, and CI/CD integrations for different business units.
Operate and maintain Label Studio
- Own the operational health of Label Studio — our production data annotation platform used for ground truth collection, model evaluation, and ML training dataset creation.
- Maintain the supporting infrastructure around Label Studio, including GCS storage buckets and BigQuery data pipelines that feed annotation projects; coordinate with Platform/IT partners for database and SSO dependencies.
- Support bi-weekly annotation project cycles and ensure platform availability for labeling specialists.
Build and maintain RAG and vector database infrastructure
- Stand up and operate RAG (Retrieval-Augmented Generation) platforms — including vector databases, embedding pipelines, and context retrieval APIs.
- Collaborate with AI Engineers on data source connectors, sync schedules, retention policies, and access control models for RAG infrastructure. Optimize embedding storage and retrieval performance at scale.
Optimize LLM costs
- Monitor and optimize LLM token usage and costs across providers, leveraging batch vs. real-time inference strategies to reduce spend.
- Implement and maintain centralized cost tracking dashboards and alerting for LLM consumption across business units.
- Evaluate and recommend cost-efficient model routing and provider selection strategies.
Manage AI-related incidents and post-incident learning
- Participate in an on-call rotation for AI/ML systems, triaging and resolving production incidents that impact model performance or service reliability.
- Lead and contribute to post-incident reviews, driving concrete improvements to prevent recurrence and improve operability.
Collaborate across disciplines
- Work closely with data scientists, ML engineers, data engineers, and product managers to design solutions that are both technically sound and operationally sustainable.
- Partner with product, operations, and engineering to understand business processes and identify impactful opportunities to improve reliability, latency, and cost.
- Help define best practices for running AI in production, documenting standards, architecture diagrams, and operational playbooks for other teams.
Develop production-grade software
- Develop and maintain production-grade services and tooling, primarily in Python, that run on Kubernetes and integrate with our AI platform.
- Apply strong software engineering practices (testing, code reviews, CI/CD, observability, security) across AI-related services.
What You Will Bring As A Senior AIOps Engineer I
- Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
- 4+ years of professional software engineering experience, including at least 3 years in commercial software engineering or production operations (SRE/DevOps/Platform/ML platform), operating services on Kubernetes and/or major cloud platforms.
- A curious and driven mindset, with a strong interest in how AI systems behave in production and how to make them more reliable and observable.
- Experience operating production systems at scale (preferably distributed, microservices- or Kubernetes-based architectures).
- Strong GCP experience preferred, including hands-on work with:
- GKE (Kubernetes Engine)
- BigQuery (data warehousing, SQL, schema management)
- Pub/Sub (event messaging)
- Vertex AI (batch prediction, model deployment)
- Cloud SQL, GCS, IAM
- Hands-on experience with:
- CI/CD tools and deployment automation (GitLab CI/CD, GitHub Actions, or similar).
- Infrastructure-as-code tools, preferably Terraform.
- Monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK/EFK, or similar).
- Container management and Docker image lifecycle.
- Experience operating and administering self-hosted open-source tools on Kubernetes — installation, configuration, upgrades, security hardening, and backup/disaster recovery (DR).
- Understanding of basic machine learning concepts and model lifecycles (training, evaluation, deployment, monitoring), and familiarity with common ML tooling.
- Familiarity with LLM ecosystem tooling — AI gateways, vector databases, RAG pipelines, embedding models, or LLM evaluation frameworks — is a strong plus.
- Ability to communicate effectively with data scientists, ML engineers, and product stakeholders in both business and technical contexts.
- Strong computer science and programming fundamentals, with solid problem-solving skills.
- Proven ability to write high-quality software using engineering best practices such as:
- Test-Driven Development (TDD)
- SOLID principles
- Clean code and robust documentation
- Experience with Python-based data and ML stacks (e.g., NumPy, SciPy, Pandas, scikit-learn, PyTorch) is a plus; experience building or operating ML infrastructure (e.g., feature stores, orchestration platforms, model serving, annotation tools) is highly desirable.
- Comfort participating in an on-call rotation and working in a trustful, respectful, and collaborative environment.
Working Hours: 8 hours (excluding breaks)
- 4 core hours overlapping US time (7:30pm - 11:30pm IST)
- 4 flexible hours (IST)
- Workspace: A quiet space to work, an internet connection of at least 30 Mbps download | 10 Mbps upload
The application window for this position is anticipated to close in 2 weeks (10 business days) from May 5, 2026. Please know this may change based on business and interviewing needs.
What You Will Get As A Porch Group Team Member
Pay Range*: INR 2,475,000 - INR 3,465,000 Annually
*Please know your actual pay at Porch will reflect a number of factors among which are your work experience and skillsets, job-related knowledge, alignment with market and our Porch employees, as well as your geographic location.
Our benefits package will provide you with comprehensive coverage for your health, life, and financial well-being.
- Our benefits include medical insurance, accident insurance, and retiral benefits.
- Our wellness programs include 12 company-paid holidays, 2 flexible holidays, privilege/earned leave, casual/sick leave, paid maternity and paternity leave, and weekly wellness events.
#LI-DG1
#LI-REMOTE
What’s next?
Submit your application below and our Talent Acquisition team will review it shortly! If your resume intrigues us, we will reach out to connect with you for a chat to learn more about your background, and then possibly invite you to virtual interviews. What's important to call out is that we want to make sure not only that you're the right person for us, but also that we're the right next step for you, so come prepared with all the questions you have!
Porch is committed to building an inclusive culture of belonging that not only embraces the diversity of our people but also reflects the diversity of the communities in which we work and the customers we serve. We know that the happiest and highest performing teams include people with diverse perspectives that encourage new ways of solving problems, so we strive to attract and develop talent from all backgrounds and create workplaces where everyone feels seen, heard and empowered to bring their full, authentic selves to work.
Porch is an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex including sexual orientation and gender identity, national origin, disability, protected veteran status, or any other characteristic protected by applicable laws, regulations, and ordinances.