AI Infrastructure Platform Engineer
Job Description
Title: AI Platform Engineer
Location: Fully remote in the US
Salary: $175k-$275k base + RSUs + Full Benefits
Requirements: 7+ years in Systems/Platform Engineering or Distributed Systems (10+ considered for Staff level)
We are a next-generation AI infrastructure company on a serious growth trajectory.
Our most recent quarterly revenue was up 61% year-over-year - and we're just getting started. We recently partnered with a leading AI compute company to power their first Canadian data center out of our Montreal facility, and signed an $865 million, 10-year deal with a major AI infrastructure provider to deliver 40MW of AI compute capacity at our new North Carolina campus. We're also an authorized NVIDIA Preferred Partner and one of the first providers in the world offering the latest generation of NVIDIA servers. If you want to build the infrastructure that the biggest names in AI are betting on, this is the place to do it.
What You'll Do
You'll design and implement container orchestration platforms for NVIDIA DGX/HGX architectures, and build and operate bare-metal provisioning infrastructure (PXE, Ironic, MAAS) at scale. You'll own the full GPU lifecycle - driver stability, CUDA/kernel compatibility, and throughput optimization - while integrating security across the stack using Vault, Kubernetes RBAC, and hardened container images. You'll also build observability solutions using Prometheus/Grafana, VictoriaMetrics, or NVIDIA DCGM, develop internal tooling and automation in Go or Python, own GitLab CI/CD pipelines, and participate in the on-call rotation supporting customer GPU workloads.
What You Need
- 7+ years in Systems/Platform Engineering or Distributed Systems (10+ considered for Staff level)
- Expert Linux knowledge - kernel modules, sysctl tuning, hugepages, containerd/CRI-O
- Hands-on Kubernetes or SLURM experience on bare metal (non-managed environments)
- Proficiency with Ansible, Terraform, and hardware provisioning tools
- Strong Go (preferred) or Python skills for building orchestration tooling
- Deep NVIDIA GPU stack knowledge - drivers, CUDA, MIG/GPU-slicing
- Familiarity with InfiniBand and/or RoCEv2 and NCCL performance tuning
What's In It for You
- $175k-$275k base DOE + RSUs
- 5 weeks PTO
- 401(k) with match
- Comprehensive benefits