Systems Reliability Engineer

mthree
Posted 4 months ago
New York, NY, United States
On-site
Full-time
Junior Level (1-3 years)

Job Description

Position Overview

SRE - Leading Investment Bank: A market-leading investment bank is seeking a Systems Reliability Engineer to join its Reliability & Production Engineering department. The role supports the Institutional Securities and Wealth Management brokerage operations platforms, which span diverse technologies hosted both on-premises and in the cloud. It combines day-to-day business support with reliability engineering work, with an emphasis on improving system reliability by collaborating with software developers and infrastructure engineering teams to build automated solutions.

Key Responsibilities

  • Manage production tasks including incident and problem management, capacity management, monitoring, event management, change management, and plant hygiene.
  • Troubleshoot issues across the entire technology stack: hardware, software, application, and network.
  • Participate in on-call rotation and periodic conference calls with specialists across various time zones.
  • Proactively identify and address system reliability risks.
  • Collaborate with development teams to design, build, and maintain systems with a focus on reliability, stability, and resiliency.
  • Identify and drive automation opportunities, scoping and building automated solutions for deployment, management, and service visibility.
  • Represent the Reliability & Production Engineering organization in design reviews and operational readiness exercises.

Required Qualifications

  • Proven ability to troubleshoot and debug large-scale distributed applications across multiple layers including software, infrastructure, and databases.
  • Hands-on experience with enterprise monitoring and observability tools such as Prometheus, Grafana, Splunk, and Apica.
  • Strong UNIX/Linux system support skills along with experience in cloud-based services.
  • Experience with automation/configuration/release management tools like Ansible and GitHub.
  • Proficiency in scripting languages such as Python, Bash, Perl, or Ruby, plus experience with at least one higher-level programming language.
  • Experience writing stored procedures and optimizing SQL on Sybase or DB2.
  • Knowledge of Azure networking, Azure Service Bus, Azure Virtual Machines, and Azure SQL is a plus.

Required Skills

Automation (Ansible, GitHub, Release Management)
Scripting (Python, Bash, Perl, Ruby)
UNIX/Linux System Support
SQL Optimization
Incident Management
Grafana
Capacity Management
Azure Networks
Prometheus
Problem Management
Monitoring
Azure Service Bus
Azure SQL
Cloud Services
Azure Virtual Machines
Splunk