Staff Data Engineer - AI (Remote)

Rula
Posted 7 months ago
Los Angeles, California, United States
Remote
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

We believe that mental health is just as important as physical health. We recognize that mental health issues can be complex and multifaceted, and we are dedicated to treating the whole person, not just the symptoms. Our aim is to create a world where mental health is embraced as an integral part of overall well-being.

We’re shaping the future of mental health care with AI-enabled experiences that enhance, not replace, the human connection at the core of therapy. As a Staff Data Engineer, you will help build and maintain data pipelines that draw on our central storage system and support ML/AI training workflows. You will work closely with data experts and specialists to design reliable flows of information, test for accuracy, and solve unexpected challenges – all to empower more individuals to get the mental health support they deserve.

Key Responsibilities

  • Build and maintain scalable ETL/ELT data pipelines to support ML/AI model training workflows.
  • Design reliable flows of information by testing for accuracy and resolving unexpected challenges.
  • Collaborate cross-functionally with data analysts, scientists, and ML engineers to transform raw data into actionable insights.

Required Qualifications

  • 8+ years of Data Pipeline Development – building and maintaining scalable ETL/ELT pipelines for ML/AI workflows using tools like AWS Glue, dbt, Dagster, Spark, or Ray; strong proficiency in Spark, Python, and SQL.
  • 8+ years of Cloud Infrastructure & Data Warehousing experience, including 4+ years focused on AWS services (Redshift, S3, Glue, IAM, EMR, SageMaker) as well as on optimizing data warehouses and managing data lakes.
  • Experience implementing scalable data validation, quality checks, and error-handling mechanisms tailored for ML/AI pipelines, including bias detection and metadata management in compliance with regulations like HIPAA or CPRA.
  • Experience optimizing data pipelines and queries, managing large datasets for efficiency and scalability, and adhering to best practices for high-throughput systems.
  • Experience with data security measures including encryption, role-based access control, and data masking, with a strong understanding of compliance standards (e.g., HIPAA, SOC 2).
  • Strong ability to work cross-functionally with data analysts, data scientists, and stakeholders while effectively communicating technical concepts to non-technical audiences.

Preferred Qualifications

  • Hands-on experience with AWS tools such as S3, Glue, EMR, SageMaker, and Lambda for building scalable ETL/ELT pipelines optimized for ML/LLM training.
  • Proven track record in implementing robust data validation, bias detection, and lineage tracking in data lakes using tools like Delta Lake or Iceberg.
  • Familiarity with infrastructure as code (IaC) tools like Terraform or CloudFormation for managing cloud resources.
  • Experience implementing and maintaining CI/CD pipelines for data workflows.
  • Ability to monitor and reduce costs for large-scale ML/AI workflows in AWS using techniques such as spot instances, auto-scaling EMR clusters, and efficient S3 storage tiers.
  • Strong partnership skills to work with data scientists and ML engineers in designing efficient pipelines, employing orchestration tools like Airflow or Dagster for incremental loading and cost optimization.

Benefits & Perks

  • 100% remote work environment (US-based only) to support a healthy work-life balance.
  • Attractive pay and benefits: Full transparency of pay ranges regardless of where you live in the United States.
  • Comprehensive health benefits: Medical, dental, vision, life, disability, and FSA/HSA.
  • 401(k) plan access: Start saving for your future.
  • Generous time-off policies including 2 company-wide shutdown weeks each year for self-care.
  • Paid parental leave available for all parents, including birthing, non-birthing, adopting, and fostering.
  • Employee Assistance Program (EAP) to support your mental and physical health.
  • New hire home office stipend to set up your workspace for success.
  • Quarterly department stipend to fund team-building activities or in-person gatherings.
  • Wellness events and lunch & learns to explore a variety of engaging topics.
  • Community and employee resource groups to foster a sense of inclusion and belonging.

Compensation Range: $184.1K - $228K

We are committed to fostering diversity, equity, and inclusion. Our culture is built on being safe, seen, heard, and valued – ensuring that mental healthcare works for everyone.

Required Skills

AWS Glue
Data Validation
Data Lakes
ETL/ELT
AWS Redshift
CI/CD Pipelines
Python
AWS SageMaker
Infrastructure as Code
Data Warehousing
Spark
Data Pipeline Development
AWS S3
SQL