System Operations Engineer - Cloud Infra
ERP SAVVY LLC
Posted about 1 month ago
Austin, TX, United States
Hybrid
Full-time
Junior Level (1-3 years)
Job Description
ERP SAVVY is seeking a highly skilled System Operations Engineer - Cloud Infra to support and streamline deployment activities across its AWS-based cloud environments. This role is critical to ensuring scalable, secure, and cost-efficient infrastructure operations for identity and biometric solutions.
Key Responsibilities
Infrastructure Provisioning & Deployment
- Design and deploy secure, scalable AWS infrastructure including VPCs, subnets, route tables, and NAT gateways.
- Configure and manage Network Load Balancers (NLBs) and Application Load Balancers (ALBs).
- Deploy and manage infrastructure within Amazon EKS clusters as Infrastructure as Code (IaC), using Terraform.
- Automate provisioning and configuration tasks using GitLab CI/CD pipelines.
Monitoring & Observability
- Develop and maintain Datadog dashboards for real-time system monitoring, performance metrics, and alerting.
- Implement logging and monitoring best practices to ensure high availability and rapid incident response.
Documentation & Operational Readiness
- Create detailed operational playbooks, runbooks, and troubleshooting guides.
- Document infrastructure architecture, data flows, IAM roles, KMS keys, and Active Directory (AD) group mappings.
- Maintain version-controlled documentation in Confluence or similar platforms.
Security & Compliance
- Implement and maintain IAM policies, security groups, and encryption standards aligned with compliance requirements.
- Support CFIUS clearance processes and ensure infrastructure meets U.S. federal security standards.
- Participate in security audits and remediation efforts.
Cost Optimization & Efficiency
- Analyze cloud usage patterns and recommend cost-saving strategies.
- Implement tagging strategies and resource lifecycle policies to control cloud spend.
Collaboration & Support
- Work closely with DevOps, Security, and Application teams to support deployment pipelines and production readiness.
- Participate in sprint planning, standups, and cross-functional reviews to align infrastructure goals with business needs.
Expected Deliverables
- Fully deployed and documented AWS infrastructure (VPCs, ALBs, NLBs, EKS, domain controllers).
- Operational documentation including playbooks and troubleshooting guides.
- Infrastructure diagrams and access control documentation.
- Cost-optimized, resilient, and scalable cloud environments.
Required Skills and Qualifications
Technical Expertise
- 6+ years of experience in cloud infrastructure engineering, with a focus on AWS.
- Proficiency in AWS services: EC2, ALB/NLB, FSx, EKS, IAM, KMS, Route 53.
- Strong experience with Terraform and GitLab CI/CD.
- Familiarity with monitoring tools such as Datadog, CloudWatch, or Prometheus.
Systems Engineering
- Proven ability to design and implement scalable, fault-tolerant infrastructure.
- Deep understanding of networking concepts: VPC, subnets, routing, DNS, VPNs.
- Experience with Active Directory integration and domain controller setup.
Required Skills
KMS
AWS
NLB
CloudWatch
Route 53
EKS
GitLab CI/CD
Datadog
ALB
Networking
VPC
EC2
Active Directory
Terraform
IAM