Staff Software Engineer - Data

DoubleVerify
Posted 5 months ago

Houston, TX, United States

Hybrid

Full-time

Junior Level (1-3 years)

Job Description

Position Overview

We are looking for a Staff Software Engineer to shape the future of our data platform with a focus on small data at scale. While many companies over-index on heavyweight distributed systems, we believe in the power of efficient, local-first, columnar engines like DuckDB to process and analyze data quickly, reliably, and cost-effectively.

As a Staff Software Engineer, you will set the technical direction for how our teams ingest, transform, and serve data, bridging the gap between lightweight embedded tools and cloud-scale systems. You’ll be hands-on in building pipelines, while also mentoring engineers and setting best practices across the organization.

Salary: $128,000 - $230,000 per year. This role is also eligible for bonus/commission, equity, and additional benefits.

About the Company: DoubleVerify

Key Responsibilities

Architect and Build Data Pipelines
- Design and implement data processing workflows using DuckDB, Polars, and Arrow/Parquet.
- Balance small-data local pipelines with cloud data warehouse backends.
Champion the Small Data Mindset
- Advocate for efficient, vectorized, local-first approaches where appropriate.
- Drive best practices for designing reproducible and testable data workflows.
Collaborate Cross-Functionally
- Partner with data science, professional services, and product engineering teams to define semantic data layers.
- Provide technical leadership in how data is versioned, validated, and surfaced for downstream use.
Operational Excellence
- Establish standards for CI/CD, observability, and reliability in data pipelines.
- Automate workflows and optimize data layout for performance and cost efficiency.
Mentor & Lead
- Serve as a thought leader in the organization, guiding engineers on when to use lightweight tools versus distributed platforms.
- Mentor senior and mid-level data engineers to accelerate their growth.

Required Qualifications

Deep expertise in SQL (window functions, CTEs, optimization).
Strong Python skills with data libraries.
Proficiency with DuckDB, including extensions and Parquet/Iceberg integration.
Hands-on experience with columnar formats (Parquet, Arrow, ORC) and schema evolution.
Expertise in Kubernetes and Helm.
Cloud storage experience with AWS S3 and GCS.
Experience with semantic layer frameworks such as CubeJS.
Familiarity with CI/CD tooling including GitHub Actions, Terraform, and Docker/Kubernetes.
Track record of leading architecture decisions and mentoring teams.
Ability to set standards for maintainability and developer experience.

Preferred Qualifications

Experience with serverless and embedded analytics (e.g., DuckDB WASM in production).
Exposure to data versioning technologies such as Delta Lake, Iceberg, or Hudi.
Knowledge of ML/LLM data preparation workflows and vector database integrations.
Previous experience building hybrid stacks combining local development with cloud warehouse production.

Benefits & Perks

Eligibility for bonus/commission, equity, and additional company benefits.

Required Skills

CI/CD

Embedded Systems

DuckDB

Warehousing

GitHub

Product Engineering

Technical Leadership

Cloud Storage (AWS S3, GCS)

Continuous Integration

Cloud Computing

Polars

Artificial Intelligence (AI)

Python Programming/Scripting Language

Mentoring

Data Science

Cross-Functional Collaboration

SQL (Structured Query Language)

Thought Leadership

Data Processing

Data Analysis

Terraform

Continuous Deployment/Delivery

Kubernetes

Distributed Computing

Snowflake Schema

Helm

Cost Control

Data Management

Software Engineering

CubeJS

Docker

Performance Tuning/Optimization

Professional Services

Data Warehousing

Leadership

Arrow/Parquet

Best Practices