Databricks Data Engineer with DevOps
Apptad Inc
Posted 4 months ago
Los Angeles, CA, United States
Remote
Full-time
Junior Level (1-3 years)
Job Description
Position Overview
We are looking for an experienced Databricks Data Engineer with strong DevOps expertise to join our data engineering team. The ideal candidate will design, build, and optimize large-scale data pipelines on the Databricks Lakehouse platform while implementing robust CI/CD and deployment practices. This role requires strong skills in PySpark, SQL, Azure cloud services, and modern DevOps tooling. You will collaborate with cross-functional teams to deliver scalable, secure, high-performance data solutions.
Key Responsibilities
- Data Pipeline Development: Design, build, and maintain scalable ETL/ELT pipelines using Databricks; develop data processing workflows using PySpark/Spark and SQL; integrate data from ADLS, Azure Blob Storage, and other sources; and implement Delta Lake best practices, including schema evolution, ACID transactions, OPTIMIZE, ZORDER, and performance tuning.
- DevOps & CI/CD: Implement CI/CD pipelines for Databricks using Git, GitLab, Azure DevOps, or similar tools; build and manage automated deployments with Databricks Asset Bundles; manage version control for notebooks and related artifacts; and automate cluster configuration, job creation, and environment provisioning.
- Collaboration & Business Support: Work with data analysts and BI teams to prepare datasets for reporting and dashboards; collaborate with product owners, business partners, and engineering teams to translate requirements into scalable data solutions; and document data flows, architecture, and deployment processes.
- Performance & Optimization: Tune Databricks clusters, jobs, and pipelines for cost efficiency and performance; monitor workflows, debug failures, and ensure pipeline stability; and implement job instrumentation and observability using logging and monitoring tools.
- Governance & Security: Implement and manage data governance using Unity Catalog; enforce access controls, data security, and compliance with enterprise policies; and apply best practices for data quality, lineage, and auditability.
Required Qualifications
- Strong hands-on experience with Databricks, including Delta Lake, Unity Catalog, Lakehouse architecture, Delta Live Tables pipelines, Databricks Runtime, and table triggers.
- Proficiency in PySpark, Spark, and advanced SQL.
- Expertise with Azure cloud services (ADLS, ADF, Key Vault, Functions, etc.).
- Experience with relational databases and data warehousing concepts.
- Strong understanding of DevOps tooling and practices, such as Git/GitLab, CI/CD pipelines, and Databricks Asset Bundles.
- Familiarity with infrastructure-as-code (Terraform is a plus).
Preferred Qualifications
- Knowledge of streaming technologies like Structured Streaming or Spark Streaming.
- Experience building real-time or near-real-time pipelines.
- Exposure to advanced Databricks runtime configurations and tuning.
- Certifications such as Databricks Certified Data Engineer Associate/Professional or Microsoft Certified: Azure Data Engineer Associate.
Required Skills
Azure Functions
SQL
PySpark
Spark
Unity Catalog
Table Triggers
Lakehouse Architecture
Databricks Runtime
DevOps
Azure DevOps
Azure ADLS
Databricks Asset Bundles
Key Vault
Azure Data Factory
CI/CD
Azure Blob Storage
GitLab
Terraform
Delta Live Tables
Git
Databricks
Delta Lake