Staff Machine Learning Infrastructure Engineer

StubHub

9 months ago

Los Angeles, California, United States

Hybrid

Full-time

Junior Level (1-3 years)

Job Description

Position Overview

We are seeking an accomplished Staff Machine Learning Infrastructure Engineer to join StubHub’s Data Engineering & Analytics team as a high-impact individual contributor. You will build the foundational machine learning platforms that power recommendation systems, pricing optimization, and personalization. Operating at a Staff level, you will set technical direction for ML infrastructure by architecting scalable systems and influencing cross-team initiatives. This role is hybrid (3 days in office, 2 days remote) and based in New York, NY or Los Angeles, CA.

Key Responsibilities

Setting Technical Direction:
- Architect ML infrastructure strategy that aligns approaches across Data Science, ML Engineering, and Platform teams.
- Drive consensus on technical vision for feature stores, inference services, and model lifecycle management.
- Advocate for long-term technical progress while balancing immediate organizational needs.
- Establish architectural patterns that become standards within StubHub’s ML ecosystem.
Core ML Infrastructure & Exploration:
- Prototype and investigate ambiguous, high-impact ML infrastructure challenges.
- Build production-grade inference services with sub-100ms latency, intelligent caching, and 99.9% uptime SLAs.
- Design model lifecycle management systems including versioning, A/B testing, rollback capabilities, and performance monitoring.
- Modernize recommendation systems from legacy architectures to scalable, real-time streaming solutions.
- Explore innovative solutions for complex ML infrastructure challenges.
Technical Leadership & Mentorship:
- Provide an engineering perspective in high-level discussions about ML strategy.
- Mentor engineers across the platform and actively sponsor promising team members.
- Inject technical context into critical decision-making processes.
- Lead complex initiatives spanning multiple teams and quarters.
Being the "Glue":
- Connect different team efforts to ensure ML infrastructure initiatives succeed.
- Handle behind-the-scenes work to keep critical ML projects moving forward.
- Expedite high-priority ML infrastructure needs across the organization.
- Ensure strategic work is completed even when it spans multiple teams.

Required Qualifications

8+ years of relevant software or data engineering experience in a fast-paced, high-growth environment.
3+ years of experience with machine learning infrastructure, MLOps, or ML platform engineering.
Proven track record of setting technical direction and leading complex, multi-team initiatives.
Strong programming and analytical abilities with expertise in Python, Scala, or Java, and infrastructure-as-code.
Experience with feature store services, live inference systems (including caching, SLAs, and performance optimization), and model lifecycle management (versioning, A/B testing, rollback capabilities).
Experience with streaming systems such as Spark and Kafka, and exposure to cloud-based ML platforms like AWS SageMaker, Google Vertex AI, or Azure ML.
Experience mentoring engineers and establishing technical best practices.
Staff-Level Capabilities:
- Technical leadership through influence rather than formal management.
- Strategic thinking with the ability to balance long-term vision with immediate needs.
- Cross-functional collaboration with Data Science, Product, and Engineering teams.
- Excellent communication skills to provide technical insights at the organizational level.
- Strong problem-solving abilities for ambiguous, high-impact technical challenges.
- Mentorship and sponsorship experience to grow junior and mid-level engineers.

Preferred Qualifications

Experience with real-time recommendation systems and personalization platforms at scale.
Knowledge of ML model serving frameworks such as TensorFlow Serving, TorchServe, or Seldon.
Experience with A/B testing frameworks and experimentation platforms.
Familiarity with distributed computing frameworks like Ray or Dask.
Understanding of ML security and privacy considerations.
Track record of technical writing or speaking at conferences on ML infrastructure topics.

Benefits & Perks

Accelerated Growth Environment – Immerse yourself in an atmosphere designed for rapid skill and knowledge enhancement.
Top Tier Compensation Package – Enjoy a rewarding compensation package including enticing stock incentives.
Flexible Time Off – Embrace unlimited Flex Time Off for optimal work-life balance.
Comprehensive Benefits Package – A complete package featuring a 401(k) plus premium health, vision, and dental insurance options.

Required Skills

Cloud-based ML Platforms (e.g., AWS SageMaker, Google Vertex AI, Azure ML)

Feature Store Design

Mentorship

Model Lifecycle Management

Infrastructure as Code

Real-time Inference

Python

Machine Learning Infrastructure

Streaming Systems (e.g., Spark, Kafka)

Technical Leadership

Scala/Java