The Lead Data Engineer will design and maintain pipelines for event data ingestion and validation, ensuring operational reliability and consistency for analytics.
About HighLevel:
HighLevel is an AI-powered business operating system that gives agencies, entrepreneurs and SMBs the infrastructure to build, automate and scale. Today, HighLevel supports SMBs across 150+ countries, fueling community-driven growth rooted in real customer outcomes.
To date, businesses operating on HighLevel have generated over $7 billion in ecosystem value, demonstrating the impact of shared infrastructure at scale. By centralizing conversations, automation and intelligence into one system, we help businesses move faster, reduce complexity and execute efficiently.
Behind the platform, HighLevel powers more than 4 billion API hits and 2.5 billion message events daily. With 250 terabytes of distributed data, 250+ microservices and over 1 million domain names supported, our architecture is built for performance, resilience and long-term scalability.
Our People
With over 2,000 team members across 10+ countries, HighLevel operates as a global, remote-first organization built for speed and ownership. We value initiative, clarity and execution, creating space for ambitious people to build systems that support millions of businesses worldwide. Here, innovation thrives, ideas are celebrated and people come first, no matter where they call home.
Our Impact
Every month, HighLevel enables more than 1.5 billion messages, 200 million leads and 20 million conversations for the more than 1 million businesses we support. Behind those numbers are real people building independence, expanding opportunity and creating measurable impact. We’re proud to be a part of that.
Learn more about us on our YouTube channel or blog.
About the Role:
We are looking for a Lead Data Engineer to own the event ingestion and identity layer that connects product instrumentation to downstream analytical systems.
This role focuses on the operational reliability and correctness of event and identity data as it moves through the data platform. You will design and operate pipelines, schema validation, and replay workflows that ensure product events remain consistent and safe to use for analytics and customer-facing reporting.
You will work closely with product engineering teams on instrumentation patterns, with the CDP team on event contracts and definitions, and with platform teams to ensure event infrastructure and analytical systems scale reliably. This role builds the foundational event and identity datasets required for reliable downstream modeling. Behavioral models, canonical entities, and business analytics datasets are owned by the analytics engineering team.
Responsibilities:
- Define event schemas, required fields, and compatibility rules in collaboration with the CDP team
- Implement automated validation and contract enforcement to prevent breaking schema changes
- Maintain versioning and compatibility guarantees for event producers and downstream consumers
- Build and maintain pipelines that ingest, validate, and process high-volume product events
- Ensure event streams are deduplicated, ordered correctly, and safe for downstream consumption
- Partner with platform teams to ensure ingestion pipelines scale with product growth
- Define and maintain identity stitching logic across anonymous and authenticated users
- Handle identity merges, splits, and corrections while preserving tenant boundaries
- Ensure identity resolution remains explainable, deterministic, and safe for downstream datasets
- Design workflows that allow event datasets and identity graphs to be replayed or rebuilt safely
- Build tooling for historical corrections, schema evolution, and dataset reprocessing
- Ensure downstream models can be rebuilt without manual intervention when definitions evolve
- Provide guidance and tooling that help product teams emit events consistently
- Maintain validation checks and schema enforcement that catch instrumentation issues early
- Collaborate with engineering teams to evolve instrumentation safely over time
- Ensure deletion and suppression requests propagate correctly through event and identity pipelines
- Partner with governance and security teams to support policy requirements
- Define requirements and interfaces for event infrastructure and downstream analytical systems
- Work with platform teams to ensure pipelines remain reliable, scalable, and observable
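To make the validation and deduplication responsibilities above concrete, here is a minimal, hypothetical sketch of the kind of logic involved. The schema, field names, and functions are illustrative assumptions, not HighLevel's actual implementation:

```python
# Hypothetical minimal event contract: required fields and their expected types.
EVENT_SCHEMA = {
    "event_id": str,
    "event_name": str,
    "tenant_id": str,
    "timestamp": float,
}

def validate_event(event: dict) -> list:
    """Return a list of contract violations; an empty list means the event is valid."""
    errors = []
    for field_name, field_type in EVENT_SCHEMA.items():
        if field_name not in event:
            errors.append("missing required field: " + field_name)
        elif not isinstance(event[field_name], field_type):
            errors.append("wrong type for " + field_name)
    return errors

def dedupe_and_order(events: list) -> list:
    """Drop duplicate event_ids (keeping the first seen) and sort by timestamp."""
    seen = set()
    unique = []
    for event in events:
        if event["event_id"] not in seen:
            seen.add(event["event_id"])
            unique.append(event)
    return sorted(unique, key=lambda e: e["timestamp"])
```

In practice this contract enforcement would run inside a streaming pipeline and be backed by a schema registry, but the core invariants — required fields present, types correct, duplicates dropped, events ordered — are the same.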
Requirements:
- 7+ years of experience in data engineering, platform engineering, or product data roles
- Strong experience building and operating event ingestion or streaming pipelines
- Experience implementing schema validation, data contracts, or event governance frameworks
- Strong SQL and Python, with experience building data processing or validation tooling
- Familiarity with identity resolution, entity resolution, or customer identity systems
- Experience operating analytical data systems or large-scale event datasets
EEO Statement:
The company is an Equal Opportunity Employer. As an employer subject to affirmative action regulations, we invite you to voluntarily provide the following demographic information. This information is used solely for compliance with government record-keeping, reporting, and other legal requirements. Providing this information is voluntary and refusal to do so will not affect your application status. This data will be kept separate from your application and will not be used in the hiring decision.
#LI-Remote #LI-NJ1