Staff Machine Learning Infrastructure Engineer

StubHub · 9 months ago
Los Angeles, California, United States
Hybrid
Full-time
Staff Level (8+ years)

Job Description

Position Overview

We are seeking an accomplished Staff Machine Learning Infrastructure Engineer to join StubHub’s Data Engineering & Analytics team as a high-impact individual contributor focused on building foundational machine learning platforms that power recommendation systems, pricing optimization, and personalization. Operating at a Staff level, you will set technical direction for ML infrastructure by architecting scalable systems and influencing cross-team initiatives. This role is Hybrid (3 days in office/2 days remote) based in New York, NY or Los Angeles, CA.

Key Responsibilities

  • Setting Technical Direction:
    • Architect an ML infrastructure strategy that aligns approaches across Data Science, ML Engineering, and Platform teams.
    • Drive consensus on the technical vision for feature stores, inference services, and model lifecycle management.
    • Advocate for long-term technical progress while balancing immediate organizational needs.
    • Establish architectural patterns that become standards within StubHub’s ML ecosystem.
  • Core ML Infrastructure & Exploration:
    • Prototype and investigate ambiguous, high-impact ML infrastructure challenges.
    • Build production-grade inference services with sub-100ms latency, intelligent caching, and 99.9% uptime SLAs.
    • Design model lifecycle management systems including versioning, A/B testing, rollback capabilities, and performance monitoring.
    • Modernize recommendation systems from legacy architectures to scalable, real-time streaming solutions.
    • Explore innovative solutions for complex ML infrastructure challenges.
  • Technical Leadership & Mentorship:
    • Provide an engineering perspective in high-level discussions about ML strategy.
    • Mentor engineers across the platform and actively sponsor promising team members.
    • Inject technical context into critical decision-making processes.
    • Lead complex initiatives spanning multiple teams and quarters.
  • Being the "Glue":
    • Connect different team efforts to ensure ML infrastructure initiatives succeed.
    • Handle behind-the-scenes work to keep critical ML projects moving forward.
    • Expedite high-priority ML infrastructure needs across the organization.
    • Ensure strategic work is completed even when it spans multiple teams.

Required Qualifications

  • 8+ years of relevant software or data engineering experience in a fast-paced, high-growth environment.
  • 3+ years of experience with machine learning infrastructure, MLOps, or ML platform engineering.
  • Proven track record of setting technical direction and leading complex, multi-team initiatives.
  • Strong programming and analytical skills, with expertise in Python, Scala, or Java, and in infrastructure as code.
  • Experience with feature store services, live inference systems (including caching, SLAs, and performance optimization), and model lifecycle management (versioning, A/B testing, rollback capabilities).
  • Experience with streaming systems such as Spark and Kafka, and exposure to cloud-based ML platforms like AWS SageMaker, Google Vertex AI, or Azure ML.
  • Experience mentoring engineers and establishing technical best practices.
  • Staff-Level Capabilities:
    • Technical leadership through influence rather than formal management.
    • Strategic thinking with the ability to balance long-term vision with immediate needs.
    • Cross-functional collaboration with Data Science, Product, and Engineering teams.
    • Excellent communication skills to provide technical insights at the organizational level.
    • Strong problem-solving abilities for ambiguous, high-impact technical challenges.
    • Mentorship and sponsorship experience to grow junior and mid-level engineers.

Preferred Qualifications

  • Experience with real-time recommendation systems and personalization platforms at scale.
  • Knowledge of ML model serving frameworks such as TensorFlow Serving, TorchServe, or Seldon.
  • Experience with A/B testing frameworks and experimentation platforms.
  • Familiarity with distributed computing frameworks like Ray or Dask.
  • Understanding of ML security and privacy considerations.
  • Track record of technical writing or speaking at conferences on ML infrastructure topics.

Benefits & Perks

  • Accelerated Growth Environment – Immerse yourself in an atmosphere designed for rapid skill and knowledge enhancement.
  • Top Tier Compensation Package – Enjoy a rewarding compensation package including enticing stock incentives.
  • Flexible Time Off – Embrace unlimited Flex Time Off for optimal work-life balance.
  • Comprehensive Benefits Package – Benefit from a complete package featuring a 401(k) plus premium Health, Vision, and Dental Insurance options.

Required Skills

Cloud-based ML Platforms (e.g., AWS SageMaker, Google Vertex AI, Azure ML)
Feature Store Design
Mentorship
Model Lifecycle Management
Infrastructure as Code
Real-time Inference
Python
Machine Learning Infrastructure
Streaming Systems (e.g., Spark, Kafka)
Technical Leadership
Scala/Java