Databricks Data Engineer with DevOps

Apptad Inc (posted 4 months ago)

Los Angeles, CA, United States

Remote

Full-time

Junior Level (1-3 years)

Job Description

Position Overview

We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale data pipelines on the Databricks Lakehouse platform while implementing robust CI/CD and deployment practices. This role requires strong skills in PySpark, SQL, Azure cloud services, and modern DevOps tooling. You will collaborate with cross-functional teams to deliver scalable, secure, and high-performance data solutions.

Key Responsibilities

Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines using Databricks; develop data processing workflows using PySpark/Spark and SQL; integrate data from ADLS, Azure Blob Storage, and other sources; and implement Delta Lake best practices, including schema evolution, ACID transactions, OPTIMIZE with Z-ordering (ZORDER BY), and performance tuning.
DevOps & CI/CD: Implement CI/CD pipelines for Databricks using Git, GitLab, Azure DevOps, or similar tools; build and manage automated deployments with Databricks Asset Bundles; manage version control for notebooks and related artifacts; and automate cluster configuration, job creation, and environment provisioning.
Collaboration & Business Support: Work with data analysts and BI teams to prepare datasets for reporting and dashboards; collaborate with product owners, business partners, and engineering teams to translate requirements into scalable data solutions; and document data flows, architecture, and deployment processes.
Performance & Optimization: Tune Databricks clusters, jobs, and pipelines for cost efficiency and performance; monitor workflows, debug failures, and ensure pipeline stability; and instrument jobs for observability using logging and monitoring tools.
Governance & Security: Implement and manage data governance using Unity Catalog; enforce access controls, data security, and compliance with enterprise policies; and apply best practices for data quality, lineage, and auditability.
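Of the Delta Lake practices listed above, Z-ordering is the least self-explanatory. The following pure-Python sketch is only a conceptual illustration, not Delta Lake's actual implementation (which differs in detail): it computes a two-column Z-order (Morton) key by interleaving bits, then compares per-file min/max statistics for a plain sort versus a Z-ordered sort. The function names `interleave_bits` and `file_stats`, and the toy 4x4 grid split into 4-row "files", are hypothetical.

```python
from itertools import product


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Return the Z-order (Morton) key of two non-negative ints.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1,
    so points that are close in both x and y get nearby keys.
    """
    if x < 0 or y < 0 or x >= (1 << bits) or y >= (1 << bits):
        raise ValueError("coordinates must fit in `bits` unsigned bits")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def file_stats(rows, rows_per_file):
    """Split rows into consecutive 'files' and return (min_x, max_x, min_y, max_y)
    per file, mimicking the min/max column statistics used for data skipping."""
    stats = []
    for start in range(0, len(rows), rows_per_file):
        chunk = rows[start:start + rows_per_file]
        xs = [r[0] for r in chunk]
        ys = [r[1] for r in chunk]
        stats.append((min(xs), max(xs), min(ys), max(ys)))
    return stats


# Hypothetical toy data: every (x, y) pair on a 4x4 grid.
points = list(product(range(4), range(4)))
linear = sorted(points)  # ordinary sort: by x, then y
zordered = sorted(points, key=lambda p: interleave_bits(p[0], p[1], bits=2))

print(file_stats(linear, 4))    # [(0, 0, 0, 3), (1, 1, 0, 3), (2, 2, 0, 3), (3, 3, 0, 3)]
print(file_stats(zordered, 4))  # [(0, 1, 0, 1), (2, 3, 0, 1), (0, 1, 2, 3), (2, 3, 2, 3)]
```

With the plain sort, a filter such as `y = 2` cannot skip any file, because every file spans y values 0 through 3. With the Z-ordered layout, the same filter can skip two of the four files, at the cost of skipping fewer files for a filter on x alone. That trade-off is why Z-ordering is typically applied to the columns most often used together in query predicates.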

Required Qualifications

Strong hands-on experience with Databricks, including Delta Lake, Unity Catalog, Lakehouse architecture, Delta Live Tables pipelines, Databricks Runtime, and table triggers.
Proficiency in PySpark, Spark, and advanced SQL.
Expertise with Azure cloud services (ADLS, ADF, Key Vault, Functions, etc.).
Experience with relational databases and data warehousing concepts.
Strong understanding of DevOps tools and practices such as Git/GitLab, CI/CD pipelines, and Databricks Asset Bundles.
Familiarity with infrastructure-as-code (Terraform is a plus).

Preferred Qualifications

Knowledge of streaming technologies like Structured Streaming or Spark Streaming.
Experience building real-time or near-real-time pipelines.
Exposure to advanced Databricks runtime configurations and tuning.
Certifications such as Databricks Certified Data Engineer Associate/Professional and Azure Data Engineer Associate.

Required Skills

Azure Functions

SQL

PySpark

Spark

Unity Catalog

Table Triggers

Lakehouse Architecture

Databricks Runtime

DevOps

Azure DevOps

Azure Data Lake Storage (ADLS)

Databricks Asset Bundles

Key Vault

Azure Data Factory

CI/CD

Azure Blob Storage

GitLab

Terraform

Delta Live Tables

Git

Databricks

Delta Lake