Principal Software Engineer, Site Reliability Engineering

General Motors
5 months ago
Raleigh, NC, United States
On-site
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

As part of the Site Reliability Engineering (SRE) team at General Motors, you will play a crucial role in enhancing the reliability, efficiency, and scalability of our distributed systems. In this hands-on individual contributor (IC) role as an SRE software engineer, you will develop automated solutions, participate in incident response, and collaborate closely with development teams. Your work will ensure robust, resilient services while balancing cost-efficiency and performance.

Key Responsibilities

  • Automation and Reliability Improvements: Develop tools and software to automate operational processes, improve system reliability, and reduce manual intervention.
  • Observability and Monitoring: Implement and enhance monitoring and observability frameworks for proactive incident detection and resolution.
  • Incident Response: Participate in an on-call rotation to diagnose, troubleshoot, and mitigate production incidents, ensuring minimal downtime.
  • Collaboration with Development Teams: Work closely with developers to ensure service quality, scalability, and reliability, fostering a "You build it, you run it" culture.
  • Service Level Management: Manage SLIs, SLOs, and SLAs to set and maintain reliability expectations.
  • Engineering for Reliability: Apply best practices and common reliability patterns to production systems.
  • Failure Analysis and Post-Incident Reviews: Conduct deep-dive analyses and collaborate on reviews to drive continuous improvement.
  • Cost Efficiency: Evaluate system performance and recommend optimizations to reduce infrastructure costs while maintaining high reliability.

Required Qualifications

  • Proficiency in at least one programming language (e.g., Python, Go, Java) with familiarity across multiple ecosystems.
  • Solid understanding of operating systems, networking, distributed systems, databases, and storage architectures.
  • Deep understanding of system fundamentals, including operating systems, algorithms, and data structures.
  • Experience handling production incidents, including root cause analysis and mitigation of complex system failures.
  • Strong communication skills and the ability to explain technical concepts to both technical and business stakeholders.
  • Proven experience in automating manual processes, building deployment pipelines, or managing configuration systems.
  • Bachelor’s degree in computer science or a related field, or equivalent work experience.

Preferred Qualifications

  • Experience with cloud platforms (AWS, GCP, Azure).
  • Familiarity with container orchestration systems such as Kubernetes.
  • A track record of managing or developing distributed systems.
  • Prior experience with Java in production environments.
  • 8+ years of industry experience.
  • Additional academic qualifications in computer science or related fields are a plus.

Benefits & Perks

  • Compensation: Base compensation is $225,000 - $344,800, varying based on factors relevant to the position.
  • Bonus Potential: Incentive pay program offering payouts based on company performance, job level, and individual contribution.
  • Benefits: GM provides a broad range of health and wellbeing programs including medical, dental, vision, Health Savings Account, Flexible Spending Accounts, retirement savings plan, life insurance, paid vacations & holidays, tuition assistance, and GM vehicle discounts.
  • Relocation benefits may be available.

Required Skills

Cloud Platforms
Python
Networking
Go
Distributed Systems
Kubernetes
Site Reliability Engineering
Incident Management
Automation
Systems Engineering
Troubleshooting
Java
Software Engineering
Root Cause Analysis
Observability