Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimization | GPU | Hybrid, San Jose, CA
Enigma
San Jose, CA, United States
Hybrid
Full-time
Junior Level (1-3 years)
Job Description
Position Overview
We are seeking a talented Machine Learning Engineer to bridge research and production by productizing and optimizing advanced ML models. In this role, you will leverage state-of-the-art techniques, including distributed training, model-efficiency methods, and scalable serving systems, to deliver robust, high-performance solutions. You will work collaboratively with cross-functional teams including ML Ops, Research, and Platform Engineering to ensure reliable and cost-efficient service delivery.
Location: San Jose, CA
Key Responsibilities
- Productize and optimize models from research into reliable, high-performance, and cost-effective production services with clear SLOs (latency, availability, cost).
- Scale training across nodes/GPUs using techniques such as DDP, FSDP, ZeRO, and pipeline/tensor parallelism, ensuring optimal throughput and reduced time-to-train.
- Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache optimization, Flash Attention) to maintain quality during both training and inference.
- Build and maintain robust model-serving systems incorporating batching, streaming, caching, and memory management using tools like vLLM, Triton, TGI, ONNX, TensorRT, and AITemplate.
- Integrate with vector/feature stores and data pipelines (e.g., FAISS, Milvus, Pinecone, pgvector; Parquet, Delta) to support production requirements.
- Define and monitor performance and cost KPIs, drive continuous improvements, and assist in capacity planning.
- Collaborate closely with ML Ops, Scientists, and Infrastructure teams to ensure reproducible evaluations, CI/CD integration, and robust telemetry/observability.
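To give candidates a concrete feel for the model-efficiency work listed above, here is a minimal, framework-free sketch of symmetric int8 post-training quantization. The `quantize` and `dequantize` helpers are hypothetical illustration names, not part of any specific codebase, and a production system would use per-channel scales and calibration data rather than this per-tensor toy.

```python
# Minimal sketch of symmetric int8 post-training quantization (PTQ).
# A single per-tensor scale maps floats onto the range [-127, 127];
# dequantization recovers an approximation of the original weights.

def quantize(weights, num_bits=8):
    """Return (integer values, scale) for a symmetric per-tensor scheme."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Per-element quantization error is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Interview discussions often build on exactly this kind of sketch: how per-channel scales, calibration, or quantization-aware training (QAT) reduce the error that this simple per-tensor scheme leaves behind.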
Required Qualifications
- Education: Bachelor's in Computer Science, Electrical/Computer Engineering, or a related field.
- Experience: 3–5 years in ML/AI engineering roles with a proven track record in production model training or serving at scale.
- Strong background in machine learning engineering with hands-on experience in Python, PyTorch (primary), and distributed training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism).
- Proven expertise in code profiling, optimization (PTQ, QAT, AWQ, GPTQ), and implementation of model-efficiency strategies such as quantization, distillation, pruning, and KV-cache optimization.
- Experience building scalable model-serving systems and integrating with data pipelines and storage solutions (SQL/NoSQL, vector stores, Parquet/Delta).
Preferred Qualifications
- Master's degree in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent industry experience.
- Additional experience with large-scale GPU optimizations, advanced distributed training frameworks, and modern model-serving technologies.
Benefits & Perks
- Hybrid work environment offering a balance between remote and on-site collaboration.
Required Skills
Profiling
ML Ops
GPU
Python
CI/CD
Distributed Training
Optimization
Model Serving
PyTorch
Deep Learning