Databricks Architect

Tror AI for everyone
Posted 3 months ago
Los Angeles, CA, United States
Hybrid
Full-time
Junior Level (1-3 years)

Job Description

Job Title: Databricks Architect

Location: Los Angeles, CA (Hybrid)

The Databricks Architect is responsible for designing, implementing, and optimizing scalable data analytics and data engineering solutions on the Databricks Lakehouse Platform. This role requires deep expertise in cloud platforms (Azure/AWS/GCP), distributed data processing, Delta Lake architectures, and modern data engineering practices. The architect will collaborate with cross-functional teams to define data strategies, ensure platform reliability, and enable advanced analytics, ML, and BI use cases.

Key Responsibilities

  • Architecture & Design
    • Design end-to-end Databricks Lakehouse architectures for data ingestion, processing, storage, and consumption.
    • Define and implement Delta Lake patterns, including medallion architecture (Bronze/Silver/Gold).
    • Develop scalable data pipelines using PySpark, Spark SQL, and Databricks Workflows.
    • Architect solutions for structured, semi-structured, and unstructured data.
  • Engineering & Implementation
    • Build robust ETL/ELT pipelines with Databricks notebooks, jobs, and workflows.
    • Design and implement high-performance streaming solutions using Structured Streaming.
    • Optimize Spark jobs for cost, performance, and scalability.
    • Implement CI/CD and automation using Databricks Repos, Git, and DevOps pipelines.
  • Cloud & Platform Expertise
    • Architect solutions across Azure/AWS/GCP leveraging native cloud services (e.g., Azure Data Factory, AWS Glue, Google Cloud Dataflow).
    • Ensure security, governance, and compliance through Unity Catalog, RBAC, and encryption.
    • Monitor workloads and optimize cluster configurations for performance and cost.
  • Collaboration & Leadership
    • Work closely with data engineers, data scientists, BI teams, and business stakeholders.
    • Act as a subject matter expert (SME) for Databricks best practices, standards, and patterns.
    • Conduct architectural reviews and guide teams on design decisions.
    • Lead PoCs, evaluate new features, and drive platform adoption.
  • Quality, Governance & Observability
    • Define standards for data quality, lineage, observability, and governance.
    • Implement automated testing frameworks for pipelines and notebooks.
    • Establish performance baselines and monitoring dashboards.

Required Skills & Experience

Technical Skills

  • 7+ years of experience in data engineering/architecture.
  • 3+ years of hands-on experience with Databricks.
  • Strong expertise in Spark, PySpark, SQL, and distributed data processing.
  • Deep understanding of Delta Lake and related Databricks features: ACID transactions, OPTIMIZE, Z-ordering (ZORDER BY), and Auto Loader.
  • Experience with workflow orchestration, jobs, and Databricks REST APIs.
  • Hands-on expertise with at least one cloud platform:
    • Azure (preferred): ADF, ADLS, Key Vault, Event Hubs, Azure DevOps
    • AWS: S3, Glue, Lambda, Kinesis
    • GCP: GCS, Dataflow, Pub/Sub
  • Familiarity with CI/CD, Git, DevOps, and Infrastructure-as-Code (Terraform preferred).

Soft Skills

  • Strong analytical and problem-solving skills.
  • Excellent communication and stakeholder management.
  • Ability to lead design discussions and guide technical teams.
  • Strong documentation and architectural blueprinting skills.

Preferred Qualifications

  • Databricks certifications, such as:
    • Databricks Certified Data Engineer Professional
    • Databricks Certified Machine Learning Professional
    • Databricks Lakehouse Fundamentals
  • Experience with MLflow, Feature Store, or MLOps workflows.
  • Experience working in regulated industries (banking, financial services, and insurance (BFSI); healthcare; etc.).

Required Skills

AWS
Azure
SQL
ACID Transactions
DevOps
Spark
CI/CD
Performance Optimization
ETL
Cloud Services
Observability
Databricks
GCP
ELT
Workflow Orchestration
Delta Lake
Data Engineering
Data Architecture
PySpark
Data Governance