Data Engineer - Modern Data Platforms

Re:Build Manufacturing
Los Angeles, CA, United States
Hybrid
Full-time
Mid-Senior Level (5-8+ years)

Job Description

Position Overview

Re:Build Manufacturing is a growing family of industrial and engineering businesses combining enabling technologies, operational superiority, and strategic M&A to build America’s next-generation industrial company. We deploy deep expertise in engineering, operations management, and technology to supercharge the performance of our member companies. Our acquired businesses span multiple sectors, including aerospace, defense, healthcare, and industrial equipment.

The Data Engineer will focus on utilizing modern data technologies to operationalize and expand the enterprise Data Lake. In this role, you will implement efficient ingestion strategies, integrate diverse data sources, and structure data for accessibility and analysis across hybrid on-prem and cloud environments.

Key Responsibilities

  • Co-design data interfaces and pipelines with software engineers and technical leads to align with application domain models and product roadmaps.
  • Build and operate batch, streaming, and change data capture (CDC) pipelines from diverse sources (e.g., ERP, CRM, Accounting systems) into the Data Lake.
  • Model curated data into data warehouse structures (e.g., star schemas, wide tables, semantic layers) optimized for business intelligence and KPI reporting.
  • Publish certified datasets and policy-aware retrieval assets to support analytics, AI, and retrieval-augmented generation (RAG) use cases.
  • Establish robust data observability and quality checks to ensure reliability and consistency.
  • Apply governance, security, and compliance controls—including role-based access, encryption, and auditing—across the data ecosystem.
  • Operate the platform reliably by orchestrating jobs, monitoring pipelines, and continuously tuning performance and cost.
  • Collaborate in line with The Re:Build Way, demonstrating continuous improvement and technical excellence.

Required Qualifications

  • Experience: 5-8+ years of proven experience building production-grade data systems, with expertise in cloud-based data lake architectures and data warehouses.
  • Demonstrated expertise in designing and operating data pipelines (batch, streaming, CDC), including schema evolution, backfills, and performance tuning.
  • Hands-on proficiency with Python and SQL, along with experience using distributed processing frameworks (e.g., Apache Spark) and CI/CD for data workflows.
  • Proven ability to design and implement ETL/ELT workflows and data modeling techniques.
  • Proficiency with cloud data platforms and services such as AWS, Databricks, and Snowflake.
  • Familiarity with open table formats (e.g., Iceberg, Delta, Hudi) and business intelligence data modeling.
  • Understanding of data governance, lineage, and quality frameworks to ensure data reliability and compliance.
  • Experience or strong interest in enabling AI/ML use cases, such as retrieval-augmented generation and vector indexes.
  • Education: Bachelor’s degree in Computer Science, Data Science, Mathematics, Analytics, or a related quantitative field (or equivalent experience).
  • Fluency in written and spoken English.
  • Strong personal attributes, including enthusiasm, curiosity, leadership, adaptability, and effective communication skills; must thrive in a fast-paced environment and be able to pass a background check with reliable professional references.

Benefits & Perks

  • Compensation: Base salary range of $135K - $200K, plus an annual cash bonus and long-term incentive, depending on candidate qualifications.
  • Benefits: Competitive base pay, performance-based bonus, Re:Build incentive stock awards, and a comprehensive benefits plan.
  • Location: Seattle, WA or Los Angeles, CA metropolitan areas.
  • Quarterly travel to the downtown Los Angeles office is required.

Required Skills

Databricks
SQL
Data Modeling
ETL/ELT Workflows
Python
Data Pipelines
Cloud Data Platforms
Apache Spark
Change Data Capture (CDC)
AWS
Snowflake