Site Reliability Engineer - Azure Cloud and Database Reliability

UBS
3 months ago
Raleigh, NC, United States
On-site
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

Are you an analytic thinker who thrives on solving challenging problems? We are seeking an Infrastructure Engineer with Azure Database Administrator expertise to ensure high availability and stability of our cloud database platforms, including MSSQL, PostgreSQL, Cosmos DB, and Azure Cache for Redis. In this role, you will engage in the entire lifecycle of database services, apply Site Reliability Engineering practices, and ensure quality, security, reliability, and compliance across our infrastructure while working in an Agile environment.

Key Responsibilities

  • Engage in and improve the entire lifecycle of cloud database services (MSSQL, PostgreSQL, Cosmos DB, Azure Cache for Redis).
  • Apply Site Reliability Engineering practices to support, automate, and operate database platforms.
  • Ensure the quality, security, and compliance of infrastructure by implementing both functional and non-functional requirements.
  • Collaborate within Agile teams to drive the transformation toward a more agile organization.
  • Provide architectural, strategic, and project consultancy to a diverse client base.

Required Qualifications

  • Bachelor’s or Master’s degree (or equivalent experience) in Engineering with a focus on database and SRE practices.
  • Ideally 5+ years of experience in large enterprise environments, preferably in global financial or technology organizations.
  • Expertise in MSSQL, PostgreSQL, Cosmos DB, and Azure Cache for Redis, with strong performance tuning and server configuration skills.
  • Proven cloud experience on Azure and hands-on experience with infrastructure maintenance using Terraform or Ansible.
  • Strong proficiency in Python and other scripting languages for automation, with knowledge of the underlying operating systems (Windows, Linux RHEL 7/8).
  • Demonstrated SRE mindset with experience in managing incident calls, problem management, and conducting blameless RCAs.
  • Excellent communication skills with the ability to present complex technical concepts to both technical and non-technical audiences.
  • Familiarity with CI/CD pipelines and DevOps practices using Azure DevOps.

Required Skills

PostgreSQL
Python
Azure
Site Reliability Engineering (SRE)
Linux (RHEL 7/8)
Cosmos DB
Windows
DevOps
Azure Cache for Redis
Ansible
Database Administration
Incident Management
Automation
CI/CD
MSSQL
Terraform