Machine Learning Engineer | Python | PyTorch | Distributed Training | Optimization | GPU | Hybrid, San Jose, CA
Enigma
San Jose, CA, United States
Hybrid
Full-time
Junior Level (1-3 years)
Job Description
Position Overview
We are seeking a talented Machine Learning Engineer to bridge research and production by productizing and optimizing advanced ML models. In this role, you will leverage state-of-the-art techniques, including distributed training, model-efficiency methods, and scalable serving systems, to deliver robust, high-performance solutions. You will work collaboratively with cross-functional teams including ML Ops, Research, and Platform Engineering to ensure reliable and cost-efficient service delivery.
Location: San Jose, CA
Key Responsibilities
- Productize and optimize models from research into reliable, high-performance, and cost-effective production services with clear SLOs (latency, availability, cost).
- Scale training across nodes/GPUs using techniques such as DDP, FSDP, ZeRO, and pipeline/tensor parallelism, ensuring optimal throughput and reduced time-to-train.
- Implement model-efficiency techniques (quantization, distillation, pruning, KV-cache optimization, Flash Attention) to maintain quality during both training and inference.
- Build and maintain robust model-serving systems incorporating batching, streaming, caching, and memory management using tools like vLLM, Triton, TGI, ONNX, TensorRT, and AITemplate.
- Integrate with vector/feature stores and data pipelines (e.g., FAISS, Milvus, Pinecone, pgvector; Parquet, Delta) to support production requirements.
- Define and monitor performance and cost KPIs, drive continuous improvements, and assist in capacity planning.
- Collaborate closely with ML Ops, Scientists, and Infrastructure teams to ensure reproducible evaluations, CI/CD integration, and robust telemetry/observability.
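To give candidates a concrete feel for the model-efficiency work listed above, here is a minimal, framework-free sketch of symmetric int8 post-training quantization. The `quantize` and `dequantize` helpers are hypothetical illustration names, not part of any specific codebase, and a production system would use per-channel scales and calibration data rather than this per-tensor toy.

```python
# Minimal sketch of symmetric int8 post-training quantization (PTQ).
# A single per-tensor scale maps floats onto the range [-127, 127];
# dequantization recovers an approximation of the original weights.

def quantize(weights, num_bits=8):
    """Return (integer values, scale) for a symmetric per-tensor scheme."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for int8
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs else 1.0
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map integer values back to approximate floats."""
    return [v * scale for v in q]

weights = [0.5, -1.27, 0.003, 1.27]
q, scale = quantize(weights)
restored = dequantize(q, scale)
# Per-element quantization error is bounded by scale / 2.
max_err = max(abs(a - b) for a, b in zip(weights, restored))
```

Interview discussions often build on exactly this kind of sketch: how per-channel scales, calibration, or quantization-aware training (QAT) reduce the error that this simple per-tensor scheme leaves behind.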
Required Qualifications
- Education: Bachelor's in Computer Science, Electrical/Computer Engineering, or a related field.
- Experience: 3–5 years in ML/AI engineering roles with a proven track record in production model training or serving at scale.
- Strong background in machine learning engineering with hands-on experience in Python, PyTorch (primary), and distributed training techniques (DDP, FSDP, ZeRO, pipeline/tensor parallelism).
- Proven expertise in code profiling, optimization (PTQ, QAT, AWQ, GPTQ), and implementation of model-efficiency strategies such as quantization, distillation, pruning, and KV-cache optimization.
- Experience building scalable model-serving systems and integrating with data pipelines and storage solutions (SQL/NoSQL, vector stores, Parquet/Delta).
Preferred Qualifications
- Master's degree in Computer Science, Electrical/Computer Engineering, or a related field, or equivalent industry experience.
- Additional experience with large-scale GPU optimizations, advanced distributed training frameworks, and modern model-serving technologies.
Benefits & Perks
- Hybrid work environment offering a balance between remote and on-site collaboration.
Required Skills
Profiling
ML Ops
GPU
Python
CI/CD
Distributed Training
Optimization
Model Serving
PyTorch
Deep Learning