Engineering Leader – AI & Machine Learning Operations (AIOps)
CloudBees Inc · 7 months ago
Los Angeles, California, United States
Remote
Full-time
Senior Level (7+ years)
Job Description
Position Overview
CloudBees is the leading software delivery platform for modern enterprises, enabling companies to innovate continuously at scale. As a startup in the DevOps space, we empower developers and teams to build, test, and deploy software faster and more reliably. We are seeking a visionary, hands-on Engineering Leader to drive our Agentic & AI Operations (AIOps) strategy, with a focus on platform reliability, scalability, and maintainability. You will lead a growing team that builds robust, scalable AI & ML infrastructure and pipelines.
Key Responsibilities
- Lead and scale a team responsible for AIOps, including model deployment, monitoring, and lifecycle management.
- Architect and implement AI/ML pipelines that are scalable, observable, and reproducible.
- Collaborate with cross-functional teams (data science, DevOps, product) to integrate AI/ML systems into our SaaS platform.
- Establish best practices for AI/ML experimentation, CI/CD for models, data versioning, and model governance.
- Own the full stack of AIOps infrastructure, from data ingestion to real-time inference systems.
- Drive the technical vision and roadmap for ML platform development.
- Act as a mentor and coach, helping engineers grow in a fast-paced startup environment.
- Manage a team of 5+ engineers, launch new platforms from 0 to 1, and drive their adoption internally and externally.
Required Qualifications
- 7+ years of engineering experience in platform engineering, systems development, or related roles, including at least 3 years in leadership.
- At least 3 years of experience with large-scale systems, emphasizing reliability, scalability, and maintainability, plus at least 1 year working with AI/ML systems.
- Strong hands-on experience with MLOps tools (e.g., MLflow, Kubeflow, SageMaker, Airflow, Metaflow).
- Proven track record of building ML pipelines in production environments.
- Experience with cloud infrastructure (AWS, GCP, or Azure) and container orchestration (Kubernetes).
- Deep knowledge of CI/CD practices as they relate to the ML lifecycle.
- Prior experience in a startup or fast-paced SaaS environment.
- Strong collaboration and communication skills.
- Experience deploying and managing services such as Amazon Bedrock or Vertex AI for LLMs.
Preferred Qualifications
- Experience integrating ML capabilities into developer-centric tools or platforms.
- Familiarity with data observability and ML monitoring tools (e.g., Evidently AI, or Prometheus and Grafana for model monitoring).
- Knowledge of data privacy, compliance, and security in ML systems.
Benefits & Perks
- Work at the forefront of DevOps innovation and shape how ML supports developer productivity.
- Join a high-impact, mission-driven startup backed by top investors.
- Enjoy a flexible remote work culture with global teammates.
- Competitive compensation, stock options, and benefits.
Required Skills
System Architecture
DevOps
AI/ML Operations
MLOps Tools (MLflow, Kubeflow, SageMaker, etc.)
Team Leadership
Container Orchestration (Kubernetes)
CI/CD for ML
ML Pipeline Development
Cloud Infrastructure