Parallel Domain

Principal Site Reliability Engineer

Posted Yesterday

In-Office or Remote

Hiring Remotely in Vancouver, BC

Senior level

The Principal Site Reliability Engineer will oversee cloud infrastructure, enhance reliability, manage AWS/EKS operations, and lead incident response efforts.

About the Role

Parallel Domain is looking for a Principal Site Reliability Engineer to own the reliability, scalability, and security of our cloud infrastructure - the backbone that runs simulation workloads for some of the most demanding customers in autonomous vehicle development.

This is a hands-on, high-ownership role. You'll be the primary infrastructure owner across our multi-region AWS/EKS platform, working closely with a small platform engineering team and partnering with engineering leads across simulation and ML as well as our customer-facing teams.

What You'll Do

Infrastructure Ownership & Cloud Operations

Own and evolve our AWS-based infrastructure, improving platform performance and availability today, and building toward deployable configurations that support enterprise customer environments tomorrow.
Own EKS cluster operations across production regions: node pool strategy, AMI lifecycle, autoscaling, and Kubernetes workload health.
Support the GitOps deployment pipeline - define, deploy, and manage applications across clusters using infrastructure-as-code.
Manage complex networking: VPC design, cross-region connectivity, DNS, and load balancing.
Lead infrastructure deprecation and migration efforts with minimal disruption.

Reliability Engineering & Incident Response

Own SLO measurement infrastructure; enable proactive triage of emerging issues before they impact customers.
Lead incident investigation, root cause analysis, and postmortems, driving systemic fixes rather than one-off patches.
Design and improve automated remediation systems to reduce MTTR.

Security & Access Management

Review and provide security-conscious feedback on platform architecture decisions.
Own cloud IAM governance - roles, policies, and access boundaries across accounts and services.
Lead compliance-adjacent work including audit-readiness, partner certification requirements, and supporting responses to customer security questionnaires.

Cross-Functional Collaboration

Partner with application development teams to build an inherently secure platform and drive next-generation deployment architecture.
Partner with customer teams to ensure availability for expected utilization.
Partner with Finance on cloud cost optimization - lifecycle policies, right-sizing, and spend visibility.
Support GPU and batch workloads in collaboration with simulation and ML engineering teams.

Platform Tooling & Developer Experience

Improve CI/CD pipelines and automated infrastructure validation.
Support engineering teams with infra-side debugging, log analysis, and environment configuration.

What We're Looking For

Technical Depth

5+ years in SRE, DevOps, or infrastructure engineering roles.
Infrastructure-as-code proficiency - Terraform modules, state management, and multi-environment patterns.
Deep AWS experience - EKS, EC2, IAM, S3, Storage Gateway, VPC networking, Transit Gateway, CloudFront, KMS, and IRSA.
Kubernetes expertise - cluster operations, node pools, probes, cordoning, pod scheduling, RBAC, Helm, node autoscaling (Karpenter experience a plus); solid understanding of containerization and AMI lifecycle management.
CI/CD - experience with GitOps workflows and pipeline tooling (ArgoCD, GitHub Actions, Jenkins).
Solid networking fundamentals - CIDR design, security groups, DNS, load balancing, VPN, cross-region connectivity.
Experience with monitoring and observability tooling - Prometheus, Grafana, Elasticsearch.
Comfort with Python and Bash for tooling and automation.
Familiarity working across Linux and Windows environments. Operational familiarity with Windows Server is a meaningful advantage.

Communication & Ownership

You communicate clearly across engineering, product, and customer-facing teams, flagging issues with urgency proportional to customer impact.
You advocate for SRE best practices and can effectively operationalize an informed and principled view on security.
You take end-to-end ownership of complex, multi-team efforts - from planning through execution and post-change verification.
You know when to push for a clean solution vs. when to accept a pragmatic one, and you communicate that tradeoff clearly.

Nice to Have

Experience with Windows-based workloads on EKS.
Experience supporting simulation, ML, or rendering workloads in cloud infrastructure; running GPU workloads on Kubernetes, including NVIDIA and DirectX device plugin configuration.
Experience with AWS Storage Gateway or Transfer Family integrations.
Familiarity with Envoy Gateway or similar.
Experience with container-optimized OS images and image build tooling (e.g., Bottlerocket, Packer).
Experience with cloud cost optimization at scale.

Core Tools

Terraform · AWS · Kubernetes · Helm · ArgoCD · Kustomize · Grafana · Prometheus · Elasticsearch · VictoriaLogs · Fluent Bit · GitHub Actions · Jenkins · Docker · Python · Bash

Top Skills

AWS · Bash · CI/CD · EKS · Elasticsearch · GitOps · Grafana · Jenkins · Kubernetes · Prometheus · Python · Terraform
