Low-Latency Inference Systems Engineer - On-Device & GPU
Genesis AI
Posted 3 days ago
San Francisco, CA, United States
On-site
Full-time
Junior Level (1-3 years)
Job Description
Genesis AI is seeking an experienced engineer to develop low-latency inference pipelines for on-device deployment in robotics. The role involves designing and optimizing distributed systems on GPU clusters, writing efficient low-level code in CUDA and Triton, and managing workloads to ensure high throughput and low latency.
Ideal candidates will have more than 8 years of experience in distributed systems, a strong Python background, and mastery of kernel optimization. This position is central to our machine learning infrastructure work.
Required Skills
Low-latency inference systems
Python
CUDA
GPU clusters
On-device deployment
Kernel optimization
Triton
Distributed systems