System Operations Engineer - Cloud Infra

ERP SAVVY LLC
Posted about 1 month ago
Austin, TX, United States
Hybrid
Full-time
Junior Level (1-3 years)

Job Description

ERP SAVVY is seeking a highly skilled System Operations Engineer - Cloud Infra to support and streamline deployment activities across its AWS-based cloud environments. This role is critical to ensuring scalable, secure, and cost-efficient infrastructure operations for identity and biometric solutions.

Key Responsibilities

Infrastructure Provisioning & Deployment

  • Design and deploy secure, scalable AWS infrastructure including VPCs, subnets, route tables, and NAT gateways.
  • Configure and manage Network Load Balancers (NLBs) and Application Load Balancers (ALBs).
  • Deploy and manage infrastructure within Amazon EKS clusters, defined as Infrastructure as Code (IaC) in Terraform.
  • Automate provisioning and configuration tasks using GitLab CI/CD pipelines.

Monitoring & Observability

  • Develop and maintain Datadog dashboards for real-time system monitoring, performance metrics, and alerting.
  • Implement logging and monitoring best practices to ensure high availability and rapid incident response.

Documentation & Operational Readiness

  • Create detailed operational playbooks, runbooks, and troubleshooting guides.
  • Document infrastructure architecture, data flows, IAM roles, KMS keys, and Active Directory (AD) group mappings.
  • Maintain version-controlled documentation in Confluence or similar platforms.

Security & Compliance

  • Implement and maintain IAM policies, security groups, and encryption standards aligned with compliance requirements.
  • Support CFIUS clearance processes and ensure infrastructure meets U.S. federal security standards.
  • Participate in security audits and remediation efforts.

Cost Optimization & Efficiency

  • Analyze cloud usage patterns and recommend cost-saving strategies.
  • Implement tagging strategies and resource lifecycle policies to control cloud spend.

Collaboration & Support

  • Work closely with DevOps, Security, and Application teams to support deployment pipelines and production readiness.
  • Participate in sprint planning, standups, and cross-functional reviews to align infrastructure goals with business needs.

Expected Deliverables

  • Fully deployed and documented AWS infrastructure (VPCs, ALBs, NLBs, EKS, domain controllers).
  • Operational documentation including playbooks and troubleshooting guides.
  • Infrastructure diagrams and access control documentation.
  • Cost-optimized, resilient, and scalable cloud environments.

Required Skills and Qualifications

Technical Expertise

  • 6+ years of experience in cloud infrastructure engineering, with a focus on AWS.
  • Proficiency in AWS services: EC2, ALB/NLB, FSx, EKS, IAM, KMS, Route 53.
  • Strong experience with Terraform and GitLab CI/CD.
  • Familiarity with monitoring tools such as Datadog, CloudWatch, or Prometheus.

Systems Engineering

  • Proven ability to design and implement scalable, fault-tolerant infrastructure.
  • Deep understanding of networking concepts: VPC, subnets, routing, DNS, VPNs.
  • Experience with Active Directory integration and domain controller setup.

Required Skills

KMS
AWS
NLB
CloudWatch
Route 53
EKS
GitLab CI/CD
Datadog
ALB
Networking
VPC
EC2
Active Directory
Terraform
IAM