Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimization | GPU | Hybrid, San Jose, CA

Enigma
San Jose, CA, United States
Hybrid
Full-time
Mid Level (3–5 years)

Job Description

Position Overview

We are seeking a talented Machine Learning Engineer to bridge research and production by productizing and optimizing advanced ML models. In this role, you will leverage state-of-the-art techniques, including distributed training, model-efficiency methods, and scalable serving systems, to deliver robust, high-performance solutions. You will work collaboratively with cross-functional teams, including ML Ops, Research, and Platform Engineering, to ensure reliable and cost-efficient service delivery. Location: San Jose, CA

Key Responsibilities

  • Productize and optimize models from research into reliable, high-performance, and cost-effective production services with clear SLOs (latency, availability, cost).
  • Scale training across nodes/GPUs using techniques such as DDP, FSDP, ZeRO, and pipeline/tensor parallelism, ensuring optimal throughput and reduced time-to-train.
  • Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache optimization, Flash Attention) to maintain quality during both training and inference.
  • Build and maintain robust model-serving systems incorporating batching, streaming, caching, and memory management using tools like vLLM, Triton, TGI, ONNX, TensorRT, and AITemplate.
  • Integrate with vector/feature stores and data pipelines (e.g., FAISS, Milvus, Pinecone, pgvector; Parquet, Delta) to support production requirements.
  • Define and monitor performance and cost KPIs, drive continuous improvements, and assist in capacity planning.
  • Collaborate closely with ML Ops, Scientists, and Infrastructure teams to ensure reproducible evaluations, CI/CD integration, and robust telemetry/observability.

Required Qualifications

  • Education: Bachelor's degree in Computer Science, Electrical/Computer Engineering, or a related field.
  • Experience: 3–5 years in ML/AI engineering roles with a proven track record in production model training or serving at scale.
  • Strong background in machine learning engineering with hands-on experience in Python, PyTorch (primary), and distributed training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism).
  • Proven expertise in code profiling, optimization (PTQ, QAT, AWQ, GPTQ), and implementation of model-efficiency strategies such as quantization, distillation, pruning, and KV-cache optimization.
  • Experience building scalable model-serving systems and integrating with data pipelines and storage solutions (SQL/NoSQL, vector stores, Parquet/Delta).

Preferred Qualifications

  • Master’s degree in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent industry experience.
  • Additional experience with large-scale GPU optimizations, advanced distributed training frameworks, and modern model-serving technologies.

Benefits & Perks

  • Hybrid work environment offering a balance between remote and on-site collaboration.

Required Skills

Profiling
ML Ops
GPU
Python
CI/CD
Distributed Training
Optimization
Model Serving
PyTorch
Deep Learning