Global Infrastructure Engineer
Meta
5 months ago
Atlanta, GA, United States
Hybrid
Full-time
Junior Level (1-3 years)
Job Description
Position Overview
The Site Operations team at Meta is responsible for the delivery of data center compute and storage that supports a growing global community. We are seeking a forward-thinking individual to lead global initiatives that tackle major technical and operational challenges. As an Infrastructure Engineer, you will work across multiple disciplines with data center teams, Core Systems, CEA, PE, and hardware engineering to architect adaptable solutions that enhance performance, efficiency, quality, and resiliency.
Key Responsibilities
- Represent Site Operations in defining and architecting new solutions on global initiatives, collaborating with stakeholders across Infra Data Centers & Infrastructure teams.
- Assemble and lead teams to address complex engineering challenges with deep technical expertise and a broad understanding of Meta’s overall infrastructure.
- Address ambiguous, global issues by demonstrating leadership and fostering collaboration across time zones, teams, and technical domains.
- Serve as a subject matter expert and mentor in the design, operation, and troubleshooting of tools, technologies, and processes within Site Operations.
- Assess risks and challenges associated with emerging hardware, data center, and software technologies, and implement effective mitigations.
- Leverage a holistic understanding of the full infrastructure stack to develop balanced solutions between physical and logical layers.
- Act as a global communication and advisory point of contact for projects affecting our global data center and server fleet.
- Utilize data-driven methodologies to define problems, plan solutions, and measure project progress.
- Build and nurture cross-functional relationships globally to advocate for the Site Operations Team and influence policies.
- Travel approximately 20-30% of the time.
Required Qualifications
- Bachelor's degree in Computer Science, Computer Engineering, or a relevant technical field, or equivalent practical experience.
- Demonstrated knowledge of the full infrastructure stack with experience in building or operating logical infrastructure over complex, distributed physical systems.
- Proven communication skills with experience working in highly distributed environments.
- 10+ years of technical experience in large-scale data center or IT Infrastructure environments, or equivalent experience building platforms and systems for large-scale compute.
- Experience in building globally scalable solutions and translating strategic initiatives into executable projects.
- Understanding of data center functions and technologies including electrical, cooling, cabling, security, network, server, and storage systems.
- Experience in building, operating, and scaling Linux or Unix operating systems.
- Skilled in communicating analysis results and insights to influence cross-functional strategies.
- Experience with data center design and expansion.
Preferred Qualifications
- Extensive knowledge of storage and AI/ML related services and the hardware that supports them.
- Experience with coding or scripting in languages such as Bash, PHP, Python, SQL, or Perl.
- Proven ability to provide technical guidance to external vendors and partners, and familiarity with virtualization, containerization, distributed systems, fault tolerance, and incident management.
- Familiarity with high-level data center design, operations, and scaling of physical infrastructure, including basic electrical and mechanical systems.
Benefits & Perks
- Compensation: $208,000/year to $289,000/year + bonus + equity + benefits
- Individual compensation is determined by skills, qualifications, experience, and location.
- In addition to base compensation, Meta offers comprehensive benefits.
Required Skills
Networking
Containerization
Electrical/Mechanical Systems
Scalability & High Availability
Scripting (Python, Bash, PHP, SQL, Perl)
Communication
Data Center Design & Expansion
Team Leadership
Data Center Infrastructure
Distributed Systems
Linux/Unix Systems
Virtualization
Fault Tolerance
Troubleshooting