Staff Software Engineer - Data

DoubleVerify (posted 5 months ago)
Houston, TX, United States
Hybrid
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

We are looking for a Staff Software Engineer to shape the future of our data platform with a focus on small data at scale. While many companies over-index on heavyweight distributed systems, we believe in the power of efficient, local-first, columnar engines like DuckDB to process and analyze data quickly, reliably, and cost-effectively.

As a Staff Software Engineer, you will set the technical direction for how our teams ingest, transform, and serve data, bridging the gap between lightweight embedded tools and cloud-scale systems. You’ll be hands-on in building pipelines, while also mentoring engineers and setting best practices across the organization.

Salary: $128,000 - $230,000 per year; this role is also eligible for bonus/commission, equity, and additional benefits.

About the Company: DoubleVerify

Key Responsibilities

  • Architect and Build Data Pipelines
    • Design and implement data processing workflows using DuckDB, Polars, and Arrow/Parquet.
    • Balance small-data local pipelines with cloud data warehouse backends.
  • Champion the Small Data Mindset
    • Advocate for efficient, vectorized, local-first approaches where appropriate.
    • Drive best practices for designing reproducible and testable data workflows.
  • Collaborate Cross-Functionally
    • Partner with data science, professional services, and product engineering teams to define semantic data layers.
    • Provide technical leadership in how data is versioned, validated, and surfaced for downstream use.
  • Operational Excellence
    • Establish standards for CI/CD, observability, and reliability in data pipelines.
    • Automate workflows and optimize data layout for performance and cost efficiency.
  • Mentor & Lead
    • Serve as a thought leader in the organization, guiding engineers on when to use lightweight tools versus distributed platforms.
    • Mentor senior and mid-level data engineers to accelerate their growth.

Required Qualifications

  • Deep expertise in SQL (window functions, CTEs, optimization).
  • Strong Python skills with data libraries.
  • Proficiency with DuckDB, including its extensions and Parquet/Iceberg integration.
  • Hands-on experience with columnar formats (Parquet, Arrow, ORC) and schema evolution.
  • Expertise in Kubernetes and Helm.
  • Cloud storage experience with AWS S3 and GCS.
  • Experience with semantic layer frameworks such as CubeJS.
  • Familiarity with CI/CD tooling including GitHub Actions, Terraform, and Docker/Kubernetes.
  • Track record of leading architecture decisions and mentoring teams.
  • Ability to set standards for maintainability and developer experience.

Preferred Qualifications

  • Experience with serverless and embedded analytics (e.g., DuckDB-WASM in production).
  • Exposure to data versioning technologies such as Delta Lake, Iceberg, or Hudi.
  • Knowledge of ML/LLM data preparation workflows and vector database integrations.
  • Previous experience building hybrid stacks combining local development with cloud warehouse production.

Benefits & Perks

  • Eligibility for bonus/commission, equity, and additional company benefits.

Required Skills

CI/CD
Embedded Systems
DuckDB
Warehousing
GitHub
Product Engineering
Technical Leadership
Cloud Storage (AWS S3, GCS)
Continuous Integration
Cloud Computing
Polars
Artificial Intelligence (AI)
Python Programming/Scripting Language
Mentoring
Data Science
Cross-Functional Collaboration
SQL (Structured Query Language)
Thought Leadership
Data Processing
Data Analysis
Terraform
Continuous Deployment/Delivery
Kubernetes
Distributed Computing
Snowflake Schema
Helm
Cost Control
Data Management
Software Engineering
CubeJS
Docker
Performance Tuning/Optimization
Professional Services
Data Warehousing
Leadership
Arrow/Parquet
Best Practices