Senior Software Engineer – Compute Infrastructure (C++)

Aurtiro
5 months ago
San Francisco, CA, United States
On-site
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

Build virtualization software that sits below CUDA and maximizes utilization across large GPU fleets. This research-driven systems role involves designing the low-level runtime, networking stack, and scheduling layer that together transform raw accelerators into a fast, efficient compute platform for AI workloads. The work entails deep performance optimization, novel isolation techniques, and squeezing every bit of efficiency from heterogeneous hardware. You'll work on everything from kernel-level primitives to cluster orchestration, challenges that go far beyond typical cloud infrastructure.

Join a small, exceptional team where your work will directly shape the platform's economics and push the boundaries of GPU virtualization.

Key Responsibilities

  • Virtualization Layer – Build the core software that isolates, schedules, and multiplexes GPU workloads across thousands of accelerators with minimal overhead.
  • Performance-Critical Systems – Design high-performance C++ services for resource allocation, job dispatch, and data-path optimization; profile them end to end, including memory layout, scheduling policies, and network utilization.
  • Low-Level Networking – Implement and optimize the stack below CUDA using RDMA, GPU-direct communication, and ultra-low-latency interconnects (NVLink, InfiniBand, RoCE).
  • Research & Iteration – Explore novel approaches to container startup, model loading, multi-tenancy, and failure handling in uncharted territories.

Required Qualifications

  • Strong systems fundamentals including concurrency, memory management, OS internals, and networking.
  • Experience building performance-sensitive software in C++ (or Rust/Go with willingness to work in C++).
  • Comfort with ambiguity – able to prototype, measure, debug hard problems, and iterate toward production-quality systems.
  • Passion for tackling complex debugging and profiling challenges to measurably improve performance.

Preferred Qualifications

  • Experience in GPU computing (CUDA, ROCm, NCCL), accelerator runtimes, or HPC workloads.
  • Familiarity with distributed systems, schedulers, or orchestration tools (Kubernetes, Slurm, custom frameworks).
  • Knowledge of high-speed networking or storage systems at scale.
  • Expertise in scheduling/orchestration, capacity planning, and resource accounting.
  • Proficiency in profiling/tracing tools (perf, flamegraphs, custom instrumentation).
  • Experience with performance work involving accelerators or high-throughput I/O.

Benefits & Perks

  • Competitive compensation and meaningful early-stage equity.
  • Relocation support available.
  • Full-time, in-office position in downtown San Francisco.
  • Opportunity to work on challenging, cutting-edge research problems that push the limits of GPU virtualization.

Required Skills

Memory Management
OS Internals
Concurrency
Profiling
Low-level Networking
Scheduling
Debugging
Virtualization
Performance Optimization
Distributed Systems
C++
GPU Computing