
Porch Group

Senior AIOps Engineer I

Remote
Hiring Remotely in IN
Senior level

Porch Group is a leading vertical software and insurance platform, positioned to be the best partner to help homebuyers move, maintain, and fully protect their homes. We offer differentiated products and services, with homeowners insurance at the center of this relationship. We differentiate, and look to win, in the massive and growing homeowners insurance opportunity by:

1) providing the best services for homebuyers,

2) leading with advantaged underwriting in insurance, and

3) protecting the whole home.

As a leader in the home services software-as-a-service (“SaaS”) space, we’ve built deep relationships with approximately 30,000 companies that are key to the home-buying transaction, such as home inspectors, mortgage companies, and title companies.

In 2020, Porch Group rang the Nasdaq bell and began trading under the ticker symbol PRCH. We are looking to build a truly great company and are JUST GETTING STARTED.

Job Title: Senior AIOps Engineer I
Location: India  
Workplace Type: Remote  

Job Summary 

The future is bright for Porch Group, and we’d love for you to be a part of it as our Senior AIOps Engineer I.

We are looking for a Senior AIOps Engineer I who will partner with product managers, platform engineers, data scientists, and machine learning engineers to ensure our AI and ML-powered systems are reliable, observable, secure, and cost-efficient in production.

You will focus on how AI systems run in real-world environments: monitoring model performance and drift, ensuring robust deployment pipelines, managing incidents, standing up new AI infrastructure, and improving the stability and scalability of our AI platform. You'll help evolve our AI & ML Ops stack and operational processes so teams can ship AI features quickly and safely.

Our AI/ML stack is based on Python and runs on Kubernetes (GKE) and Google Cloud Platform. We use tools such as Union Cloud (Flyte) for ML workflow orchestration, BentoML for model serving, Feast for feature stores, Label Studio for data annotation, BigQuery as our central data warehouse, and Dataflow for streaming/batch data pipelines. On the GenAI side, we operate a centralized LLM routing/gateway service across providers, batch prediction services for large-scale LLM inference, and are building out RAG infrastructure. You will maintain and harden this ecosystem — and stand up new infrastructure components as we expand our AI platform capabilities.

What You Will Do As A Senior AIOps Engineer I

Own production reliability for AI/ML services

  • Monitor and improve the reliability, availability, and performance of AI/ML-powered services running in production.
  • Define and maintain SLOs/SLIs for critical AI systems (e.g., latency, error rates, model performance), tying them to user experience and business impact where possible.
  • Own recurring model refresh cycles — coordinate retraining, validation, and redeployment of production models to prevent staleness and drift.
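To make the SLO/SLI responsibility concrete, here is a minimal sketch of computing an availability SLI and the remaining error budget for an AI service. The 99.5% target and the request counts are illustrative, not Porch's actual targets.

```python
# Sketch: computing an availability SLI and remaining error budget for an
# AI service over a rolling window. Target and counts are illustrative.

def availability_sli(total_requests: int, failed_requests: int) -> float:
    """Fraction of successful requests in the window (the SLI)."""
    if total_requests == 0:
        return 1.0
    return 1 - failed_requests / total_requests

def error_budget_remaining(sli: float, slo_target: float = 0.995) -> float:
    """Share of the error budget still unspent; negative means the SLO is breached."""
    budget = 1 - slo_target          # allowed failure rate under the SLO
    burned = 1 - sli                 # observed failure rate
    return (budget - burned) / budget

if __name__ == "__main__":
    sli = availability_sli(total_requests=100_000, failed_requests=300)
    print(f"SLI: {sli:.4f}, error budget remaining: {error_budget_remaining(sli):.1%}")
```

Tying the alerting policy to the error-budget burn rate, rather than raw error counts, is what connects these SLIs to user experience and business impact.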

Build and improve AI observability

  • Design and implement monitoring, logging, and alerting for models and data pipelines in partnership with AI Engineers and Data Scientists.
  • Integrate model and system metrics with existing observability stacks (Datadog, Opik, etc.) and dashboards used by engineering and operations teams.
  • Build and maintain monitoring workflows for pipeline health.
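One common building block for model-drift monitoring is the population stability index (PSI), which compares a live window of feature or score values against a reference window. The sketch below is illustrative; the bucket edges and the 0.2 alert threshold (a common rule of thumb) are assumptions, not a prescribed configuration.

```python
# Sketch: population stability index (PSI) for detecting feature/score drift
# between a reference window and a live window. Edges/threshold illustrative.
import math

def psi(reference: list[float], live: list[float], edges: list[float]) -> float:
    def bucket_fracs(values: list[float]) -> list[float]:
        counts = [0] * (len(edges) + 1)
        for v in values:
            i = sum(v > e for e in edges)   # index of the bucket v falls in
            counts[i] += 1
        n = len(values)
        # small floor avoids log(0) for empty buckets
        return [max(c / n, 1e-6) for c in counts]

    ref, cur = bucket_fracs(reference), bucket_fracs(live)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))

if __name__ == "__main__":
    ref = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
    live = [0.5, 0.6, 0.7, 0.8, 0.9, 0.95]   # scores shifted upward
    score = psi(ref, live, edges=[0.33, 0.66])
    print("PSI:", round(score, 3), "-> alert" if score > 0.2 else "-> ok")
```

A check like this typically runs on a schedule and emits its score as a metric to the observability stack, where the alert threshold lives.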

Support scalable, safe deployment of models

  • Collaborate with data scientists and ML engineers to streamline deployment workflows for models and related services (blue/green, canary, A/B, shadow deployments).
  • Support the productionization of image-based ML models, including batch prediction workflows, model performance monitoring, and data pipeline integration.
  • Improve CI/CD pipelines and release processes for AI services to reduce risk and increase deployment frequency.
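A canary deployment routes a small, stable slice of traffic to the new model version. One minimal sketch, assuming a hash-based split on a stable request key (the 5% fraction is illustrative):

```python
# Sketch: deterministic canary routing. Hashing a stable request key sends a
# fixed fraction of traffic to the canary, and the same key always routes the
# same way, which keeps user experience consistent. Fraction is illustrative.
import hashlib

def route(request_id: str, canary_fraction: float = 0.05) -> str:
    """Return 'canary' for a stable ~canary_fraction slice of traffic."""
    digest = hashlib.sha256(request_id.encode()).digest()
    bucket = int.from_bytes(digest[:4], "big") / 2**32   # uniform in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

if __name__ == "__main__":
    hits = sum(route(f"req-{i}", 0.05) == "canary" for i in range(10_000))
    print(f"canary share: {hits / 10_000:.1%}")
```

In practice the split usually lives in the service mesh or gateway config rather than application code, but the deterministic-hash idea is the same.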

Stand up and operate AI infrastructure

  • Provision, deploy, configure, and maintain new AI infrastructure components as they are adopted across the organization — including AI gateways, RAG platforms, LLM observability tools, agentic workflows, and no-code agent builders.
  • Utilize and improve existing frameworks and tools (e.g., Union Cloud, BentoML, Feast, Kubernetes, Terraform, and GCP services) to support robust and maintainable AI infrastructure.
  • Build automation and tooling to reduce manual operational work, especially around model promotion, configuration, environment management, and Docker image maintenance.
  • Support multi-BU infrastructure provisioning — create and manage separate environments, projects, roles, and CI/CD integrations for different business units.

Operate and maintain Label Studio

  • Own the operational health of Label Studio — our production data annotation platform used for ground truth collection, model evaluation, and ML training dataset creation.
  • Maintain the supporting infrastructure around Label Studio, including GCS storage buckets and BigQuery data pipelines that feed annotation projects; coordinate with Platform/IT partners for database and SSO dependencies.
  • Support bi-weekly annotation project cycles and ensure platform availability for labeling specialists.

Build and maintain RAG and vector database infrastructure

  • Stand up and operate RAG (Retrieval-Augmented Generation) platforms — including vector databases, embedding pipelines, and context retrieval APIs.
  • Collaborate with AI Engineers on data source connectors, sync schedules, retention policies, and access control models for RAG infrastructure. Optimize embedding storage and retrieval performance at scale.
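The retrieval step at the heart of a RAG platform is a similarity search over stored embeddings. A toy sketch with hand-made 3-dimensional vectors (a real deployment would use a vector database and learned embeddings; the document ids and vectors here are hypothetical):

```python
# Sketch: the retrieval step of a RAG pipeline — rank stored embedding
# vectors by cosine similarity to a query embedding. Vectors are toy values.
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query: list[float], docs: dict[str, list[float]], k: int = 2) -> list[str]:
    """Ids of the k stored embeddings most similar to the query."""
    return sorted(docs, key=lambda d: cosine(query, docs[d]), reverse=True)[:k]

if __name__ == "__main__":
    docs = {
        "claims-faq":   [0.9, 0.1, 0.0],
        "policy-terms": [0.1, 0.9, 0.1],
        "inspection":   [0.0, 0.2, 0.9],
    }
    print(top_k([1.0, 0.0, 0.1], docs, k=2))
```

Vector databases exist to make exactly this ranking fast at scale (via approximate nearest-neighbor indexes), which is why embedding storage and retrieval performance are called out above.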

Optimize LLM costs

  • Monitor and optimize LLM token usage and costs across providers, leveraging batch vs. real-time inference strategies to reduce spend.
  • Implement and maintain centralized cost tracking dashboards and alerting for LLM consumption across business units.
  • Evaluate and recommend cost-efficient model routing and provider selection strategies.
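The batch-vs-real-time trade-off above can be sketched as a simple token-cost calculation. All model names and per-token prices below are hypothetical placeholders, not any provider's actual rates:

```python
# Sketch: estimating LLM spend from token counts, comparing real-time vs
# batch pricing. Prices are hypothetical placeholders.

PRICES_PER_1K = {                        # (input, output) USD per 1K tokens
    "model-a-realtime": (0.0030, 0.0060),
    "model-a-batch":    (0.0015, 0.0030),  # batch tiers are often ~50% cheaper
}

def cost(route: str, input_tokens: int, output_tokens: int) -> float:
    """Dollar cost of a workload on the given pricing route."""
    p_in, p_out = PRICES_PER_1K[route]
    return input_tokens / 1000 * p_in + output_tokens / 1000 * p_out

if __name__ == "__main__":
    rt = cost("model-a-realtime", 2_000_000, 500_000)
    bt = cost("model-a-batch", 2_000_000, 500_000)
    print(f"real-time ${rt:.2f} vs batch ${bt:.2f} (saves {1 - bt / rt:.0%})")
```

A centralized gateway is a natural place to record these token counts per business unit, which is what makes the cost dashboards and alerting described above possible.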

Manage AI-related incidents and post-incident learning

  • Participate in an on-call rotation for AI/ML systems, triaging and resolving production incidents that impact model performance or service reliability.
  • Lead and contribute to post-incident reviews, driving concrete improvements to prevent recurrence and improve operability.

Collaborate across disciplines

  • Work closely with data scientists, ML engineers, data engineers, and product managers to design solutions that are both technically sound and operationally sustainable.
  • Partner with product, operations, and engineering to understand business processes and identify impactful opportunities to improve reliability, latency, and cost.
  • Help define best practices for running AI in production, documenting standards, architecture diagrams, and operational playbooks for other teams.

Develop production-grade software

  • Develop and maintain production-grade services and tooling, primarily in Python, that run on Kubernetes and integrate with our AI platform.
  • Apply strong software engineering practices (testing, code reviews, CI/CD, observability, security) across AI-related services.

What You Will Bring As A Senior AIOps Engineer I

  • Bachelor’s degree in Computer Science, Engineering, or related field (Master’s preferred).
  • 4+ years of professional software engineering experience, including at least 3 years in commercial software engineering or production operations (SRE/DevOps/Platform/ML platform), operating services on Kubernetes and/or major cloud platforms.
  • A curious and driven mindset, with a strong interest in how AI systems behave in production and how to make them more reliable and observable.
  • Experience operating production systems at scale (preferably distributed, microservices- or Kubernetes-based architectures).
  • Strong GCP experience preferred, including hands-on work with:
    • GKE (Google Kubernetes Engine)
    • BigQuery (data warehousing, SQL, schema management)
    • Pub/Sub (event messaging)
    • Vertex AI (batch prediction, model deployment)
    • Cloud SQL, GCS, and IAM
  • Hands-on experience with:
    • CI/CD tools and deployment automation (GitLab CI/CD, GitHub Actions, or similar).
    • Infrastructure-as-code tools, preferably Terraform.
    • Monitoring, logging, and alerting tools (e.g., Datadog, Prometheus, Grafana, ELK/EFK, or similar).
    • Container management and Docker image lifecycle.
  • Experience operating and administering self-hosted open-source tools on Kubernetes — installation, configuration, upgrades, security hardening, and backup/disaster recovery (DR).
  • Understanding of basic machine learning concepts and model lifecycles (training, evaluation, deployment, monitoring), and familiarity with common ML tooling.
  • Familiarity with LLM ecosystem tooling — AI gateways, vector databases, RAG pipelines, embedding models, or LLM evaluation frameworks — is a strong plus.
  • Ability to communicate effectively with data scientists, ML engineers, and product stakeholders in both business and technical contexts.
  • Strong computer science and programming fundamentals, with solid problem-solving skills.
  • Proven ability to write high-quality software using engineering best practices such as:
    • Test-Driven Development (TDD)
    • SOLID principles
    • Clean code and robust documentation
  • Experience with Python-based data and ML stacks (e.g., NumPy, SciPy, Pandas, Scikit Learn, PyTorch) is a plus; experience building or operating ML infrastructure (e.g., feature stores, orchestration platforms, model serving, annotation tools) is highly desirable.
  • Comfort participating in an on-call rotation and working in a trustful, respectful, and collaborative environment.
  • Working Hours: 8 hours (excluding breaks)
    • 4 core US-overlap hours (7:30 pm - 11:30 pm IST)
    • 4 flexible hours (IST)
  • Workspace: a quiet space to work and an internet connection of at least 30 Mbps download / 10 Mbps upload.

The application window for this position is anticipated to close in 2 weeks (10 business days) from May 5, 2026. Please know this may change based on business and interviewing needs.

What You Will Get As A Porch Group Team Member 

Pay Range*: INR 2,475,000 - INR 3,465,000 annually

*Please know your actual pay at Porch will reflect a number of factors among which are your work experience and skillsets, job-related knowledge, alignment with market and our Porch employees, as well as your geographic location. 

Our benefits package will provide you with comprehensive coverage for your health, life, and financial well-being.    

  • Our benefits include medical insurance, accident insurance and retiral benefits. 
  • Our wellness programs include 12 company-paid holidays, 2 flexible holidays, privilege/earned leave, casual/sick leave, paid maternity and paternity leaves, and weekly wellness events.


What’s next?

Submit your application below, and our Talent Acquisition team will review it shortly! If your resume intrigues us, we will reach out for a chat to learn more about your background, and then possibly invite you to virtual interviews. We want to make sure not only that you're the right person for us, but also that we're the right next step for you, so come prepared with all the questions you have!

Porch is committed to building an inclusive culture of belonging that not only embraces the diversity of our people but also reflects the diversity of the communities in which we work and the customers we serve. We know that the happiest and highest performing teams include people with diverse perspectives that encourage new ways of solving problems, so we strive to attract and develop talent from all backgrounds and create workplaces where everyone feels seen, heard and empowered to bring their full, authentic selves to work.

Porch is an Equal Opportunity employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex including sexual orientation and gender identity, national origin, disability, protected veteran status, or any other characteristic protected by applicable laws, regulations, and ordinances.
