Staff Software Engineer - Site Reliability

Ironclad, Inc. · 3 months ago
San Francisco, CA, United States
Hybrid
Full-time
Staff Level (5+ years)

Job Description

Position Overview

Site Reliability Engineering (SRE) sits under the umbrella of Product and Engineering and plays a pivotal role in ensuring developers have the tools, infrastructure, and monitoring necessary to deliver an enterprise-grade experience. As a staff-level SRE, you will help set the technical strategy for the team, drive cross-team impact, improve organizational efficiency, and champion SRE culture at Ironclad.

Ironclad is the leading AI contracting platform that transforms agreements into assets. Contracts move faster, insights surface instantly, and teams drive work forward while keeping you in control. Recognized as a leader by Forrester, Gartner, Fortune, Fast Company, Forbes, and Business Insider, we empower transformative organizations—from OpenAI to the World Health Organization—to accelerate their business.

Compensation: Tier 1 with a Base Salary Range of $210K – $235K, plus equity. The actual base salary offered will depend on factors such as individual proficiency, anticipated performance, and location. This is part of our competitive total rewards package.

This is a hybrid role with required office attendance on Tuesdays and Thursdays for collaboration and connection.

Key Responsibilities

  • Be part of the Cloud Platform SRE Team, focused on building our Cloud Platform using modern tools and best practices.
  • Champion SRE best practices within the team and throughout the organization.
  • Ensure the reliability, availability, and performance of services and infrastructure.
  • Solve the whole problem: design, implement, and maintain scalable systems.
  • Automate repetitive operational tasks to streamline processes.
  • Monitor system performance and troubleshoot issues proactively.
  • Develop and document best practices for system operations.
  • Collaborate with development teams to enhance system design.
  • Manage incident responses and perform root cause analysis.
  • Participate in on-call rotations to handle critical issues as they arise.
  • Mentor team members to multiply output through leadership and guidance.

Required Qualifications

  • Minimum of 5 years of experience in a Site Reliability Engineering/DevOps role.
  • Expert knowledge of Docker and Kubernetes (Crossplane experience is a plus).
  • Strong knowledge of cloud platforms such as AWS and Google Cloud.
  • Proficiency in scripting and programming languages like Python, TypeScript, or Bash.
  • Experience with infrastructure-as-code tools like Terraform or Pulumi.
  • Strong troubleshooting and analytical skills with a drive to help customers and learn new products.
  • Experience with CI/CD pipelines and deployment automation tools such as CircleCI and ArgoCD.
  • Robust understanding of networking and security principles.

Benefits & Perks

  • 100% health coverage for employees (medical, dental, and vision) and 75% coverage for dependents with buy-up plan options.
  • Market-leading leave policies, including gender-neutral parental leave and compassionate leave.
  • Family forming support through Maven for you and your partner.
  • Generous paid time off to take the time you need, when you need it.
  • Monthly stipends for wellbeing, hybrid work, and (if applicable) cell phone use.
  • Mental health support through Modern Health, including therapy, coaching, and digital tools.
  • Pre-tax commuter benefits for US employees.
  • 401(k) plan with Fidelity and employer match for US employees.
  • Regular team events to connect, recharge, and have fun.
  • And most importantly, the opportunity to help build the company you want to work at.

Required Skills

Networking
Leadership
Service Mesh
TypeScript
Google Cloud
Incident management
Troubleshooting
Monitoring tools
CI/CD
Terraform
Database management
Bash
Kubernetes