Senior Software Engineer - vLLM Inference

Red Hat
Posted 4 months ago
Raleigh, NC, United States
Remote
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

At Red Hat, we believe the future of AI is open. The Inference team accelerates AI for enterprises by delivering open-source LLMs and vLLM solutions. As a Senior MLOps Engineer, you will work closely with product and research teams to build and release the Red Hat AI Inference runtimes, improve DevOps tooling, and automate critical processes. Join us to shape the future of AI while contributing to what will be 2025's most popular open-source project on GitHub.

Compensation: $133,650.00 - $220,680.00 (actual offer based on qualifications).

Key Responsibilities

  • Collaborate with research and product development teams to scale machine learning products for internal and external applications.
  • Create and manage model training and deployment pipelines.
  • Contribute to managing and releasing upstream and midstream product builds.
  • Test to ensure correctness, responsiveness, and efficiency.
  • Troubleshoot, debug, and upgrade Dev & Test pipelines.
  • Identify and deploy cybersecurity measures through continuous vulnerability assessments and risk management.
  • Collaborate with cross-functional teams to align with market requirements and best practices.
  • Keep abreast of the latest technologies and industry standards.

Required Qualifications

  • 2+ years of experience in MLOps, DevOps, automation, and modern software deployment practices.
  • Experience evaluating LLMs for performance on accelerators and accuracy (e.g., HellaSwag, MMLU, Chatbot Arena, TruthfulQA, etc.).
  • Proficiency in Python and pytest.
  • Strong experience with Git, GitHub Actions (including self-hosted runners), Terraform, Jenkins, Ansible, and other automation/monitoring tools.
  • High proficiency in administering Kubernetes/OpenShift.
  • Familiarity with Agile development methodologies.
  • Experience with at least one cloud infrastructure (AWS, GCP, Azure, or IBM Cloud).
  • Solid programming and troubleshooting skills, especially in Python.
  • Ability to collaborate with a large, geographically dispersed team.
  • Experience in maintaining stable infrastructures.
  • A Bachelor’s degree or higher in computer science, mathematics, or a related discipline is valued, although technical prowess and practical experience are prioritized.

Preferred Qualifications

  • Experience contributing to the vLLM community, particularly its CI, is a big plus.

Benefits & Perks

  • Comprehensive medical, dental, and vision coverage
  • Flexible Spending Account for healthcare and dependent care
  • Health Savings Account with a high deductible medical plan
  • Retirement 401(k) with employer match
  • Paid time off and holidays
  • Paid parental leave plans for all new parents
  • Leave benefits including disability, paid family medical leave, and paid military leave
  • Additional benefits such as employee stock purchase plan, family planning reimbursement, tuition reimbursement, transportation expense account, and employee assistance program

Required Skills

Automation
MLOps
Ansible
DevOps
Software Deployment
Agile Methodology
OpenShift
CI/CD
Terraform
Kubernetes
Git
Cloud Computing (AWS/GCP/Azure/IBM Cloud)
Python
Jenkins
pytest
Troubleshooting
GitHub Actions