Staff Machine Learning Infrastructure Engineer
StubHub
9 months ago
Los Angeles, California, United States
Hybrid
Full-time
Staff Level (8+ years)
Job Description
Position Overview
We are seeking an accomplished Staff Machine Learning Infrastructure Engineer to join StubHub’s Data Engineering & Analytics team as a high-impact individual contributor focused on building foundational machine learning platforms that power recommendation systems, pricing optimization, and personalization. Operating at a Staff level, you will set technical direction for ML infrastructure by architecting scalable systems and influencing cross-team initiatives. This role is Hybrid (3 days in office/2 days remote) based in New York, NY or Los Angeles, CA.
Key Responsibilities
- Setting Technical Direction:
  - Architect ML infrastructure strategy that aligns approaches across Data Science, ML Engineering, and Platform teams.
  - Drive consensus on technical vision for feature stores, inference services, and model lifecycle management.
  - Advocate for long-term technical progress while balancing immediate organizational needs.
  - Establish architectural patterns that become standards within StubHub’s ML ecosystem.
- Core ML Infrastructure & Exploration:
  - Prototype and investigate ambiguous, high-impact ML infrastructure challenges.
  - Build production-grade inference services with sub-100ms latency, intelligent caching, and 99.9% uptime SLAs.
  - Design model lifecycle management systems including versioning, A/B testing, rollback capabilities, and performance monitoring.
  - Modernize recommendation systems from legacy architectures to scalable, real-time streaming solutions.
  - Explore innovative solutions for complex ML infrastructure challenges.
- Technical Leadership & Mentorship:
  - Provide an engineering perspective in high-level discussions about ML strategy.
  - Mentor engineers across the platform and actively sponsor promising team members.
  - Inject technical context into critical decision-making processes.
  - Lead complex initiatives spanning multiple teams and quarters.
- Being the "Glue":
  - Connect different team efforts to ensure ML infrastructure initiatives succeed.
  - Handle behind-the-scenes work to keep critical ML projects moving forward.
  - Expedite high-priority ML infrastructure needs across the organization.
  - Ensure strategic work is completed even when it spans multiple teams.
Required Qualifications
- 8+ years of relevant software or data engineering experience in a fast-paced, high-growth environment.
- 3+ years of experience with machine learning infrastructure, MLOps, or ML platform engineering.
- Proven track record of setting technical direction and leading complex, multi-team initiatives.
- Strong programming and analytical abilities with expertise in Python, Scala, or Java, and infrastructure-as-code.
- Experience with feature store services, live inference systems (including caching, SLAs, and performance optimization), and model lifecycle management (versioning, A/B testing, rollback capabilities).
- Experience with streaming systems such as Spark and Kafka, and exposure to cloud-based ML platforms like AWS SageMaker, Google Vertex AI, or Azure ML.
- Experience mentoring engineers and establishing technical best practices.
- Staff-Level Capabilities:
  - Technical leadership through influence rather than formal management.
  - Strategic thinking with the ability to balance long-term vision with immediate needs.
  - Cross-functional collaboration with Data Science, Product, and Engineering teams.
  - Excellent communication skills to provide technical insights at the organizational level.
  - Strong problem-solving abilities for ambiguous, high-impact technical challenges.
  - Mentorship and sponsorship experience to grow junior and mid-level engineers.
Preferred Qualifications
- Experience with real-time recommendation systems and personalization platforms at scale.
- Knowledge of ML model serving frameworks such as TensorFlow Serving, TorchServe, or Seldon.
- Experience with A/B testing frameworks and experimentation platforms.
- Familiarity with distributed computing frameworks like Ray or Dask.
- Understanding of ML security and privacy considerations.
- Track record of technical writing or speaking at conferences on ML infrastructure topics.
Benefits & Perks
- Accelerated Growth Environment – Immerse yourself in an atmosphere designed for rapid skill and knowledge enhancement.
- Top Tier Compensation Package – Enjoy a rewarding compensation package including enticing stock incentives.
- Flexible Time Off – Embrace unlimited Flex Time Off for optimal work-life balance.
- Comprehensive Benefits Package – Benefit from a complete package featuring 401k, plus premium Health, Vision, and Dental Insurance options.
Required Skills
Cloud-based ML Platforms (e.g., AWS SageMaker, Google Vertex AI, Azure ML)
Feature Store Design
Mentorship
Model Lifecycle Management
Infrastructure as Code
Real-time Inference
Python
Machine Learning Infrastructure
Streaming Systems (e.g., Spark, Kafka)
Technical Leadership
Scala/Java