Data Scientist - AI & Agentic Applications & Benchmarking

CloudBees
Los Angeles, California, United States
Remote
Full-time
Mid Level (3+ years)

Job Description

Position Overview

CloudBees is the leading software delivery platform for enterprise DevOps teams. As a high-growth startup, we empower developers to build, deploy, and manage software more efficiently. We are now integrating agentic intelligence into our platform to supercharge developer workflows.

As a startup-savvy Data Scientist in this role, you will help define, measure, and evangelize the impact of Agentic Applications across our platform. You will work closely with engineering and product teams to prototype and measure AI and agentic experiences using evals, telemetry, and AI benchmarks, translating performance into clear, compelling narratives for both internal teams and our customers. As a founding member, you will be equal parts builder, evaluator, and communicator, with the technical depth to prototype in Python notebooks, Claude Code, and other tools.

Key Responsibilities

  • Partner with our platform team to develop and prototype telemetry, eval frameworks, and benchmarks for emerging agentic systems.
  • Collaborate with product and engineering teams to measure AI outcomes and usage across customers and teams.
  • Define KPIs and success metrics for AI and LLM-driven features and workflows.
  • Utilize Python notebooks to explore data, visualize insights, and rapidly test hypotheses.
  • Craft clear internal documentation, performance summaries, and thought leadership pieces that tell the story behind the numbers.
  • Enable engineering teams to instrument, log, and effectively evaluate agent performance.
  • Stay current with evolving metrics and evaluation techniques within the LLM and agentic AI ecosystem.

Required Qualifications

  • 3+ years of experience in data science or ML analytics roles, ideally in startup or high-growth environments.
  • Proficiency in Python, including building and sharing analysis via Jupyter notebooks.
  • Experience with evals, telemetry, A/B testing, and evaluation of user-facing ML systems.
  • Familiarity with AI/ML tools such as MLflow, Hugging Face, or other ML/LLM frameworks.
  • Strong ability to partner with technical teams to define meaningful metrics and benchmarks.
  • Excellent written and verbal communication skills to effectively share outcomes and influence stakeholders.
  • Comfort working in fast-paced, ambiguous environments where speed and clarity are crucial.

Preferred Qualifications

  • Experience with agentic or LLM-based applications (e.g., evaluating AI copilots, autonomous workflows).
  • Familiarity with tools like LangSmith, OpenInference, or custom evaluation stacks.
  • Background in developer tools, DevOps, or platform engineering environments.

Benefits & Perks

  • Shape the future of AI-driven DevOps with real user impact.
  • Join a nimble, passionate team at the forefront of agentic system development.
  • Work in a flexible, remote-first culture built on trust and innovation.
  • Competitive salary, startup equity, and excellent benefits.

Required Skills

A/B Testing
Jupyter Notebooks
MLflow
Agentic AI Evaluation
Data Science
Python
Statistical Analysis
Telemetry
Hugging Face
Technical Communication