The Digital Private Markets team is a fast-growing global group within J.P. Morgan’s Corporate & Investment Bank. We are building a high-profile new fintech business for the firm, with the goal of creating a market-leading platform for private markets.

Job Summary

As a Site Reliability Engineer, you’ll combine software and systems engineering to develop creative solutions to operations problems. We focus on optimizing existing systems, building infrastructure, increasing reliability, and reducing manual work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who think big and take risks. In this environment, you’ll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you’ll focus on running production applications and systems better.

Job Responsibilities

Develop, test, and debug SRE-based solutions to maximize uptime and supportability of PROD and UAT sites
Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
Collaborate with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
Collaborate with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
Collaborate with technical experts, key stakeholders, and team members to resolve complex problems
Understand service level indicators and use service level objectives to proactively resolve issues before they impact customers
Support the adoption of site reliability engineering best practices within your team

Required qualifications, capabilities, and skills

Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
Proficiency in at least one programming language or framework, such as Python, Java/Spring Boot, or .NET
Experience deploying and managing services on Kubernetes, AWS, or other cloud platforms
Experience in observability, including white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
Experience with continuous integration, continuous delivery, and infrastructure-as-code tools such as Jenkins, GitLab, or Terraform
Familiarity with troubleshooting common networking technologies and issues
Ability to proactively recognize roadblocks, and a demonstrated interest in learning technologies that facilitate innovation

Preferred qualifications, capabilities, and skills

Formal training or certification in site reliability engineering concepts, along with demonstrated experience
Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
Ability to initiate and implement ideas to solve business problems

Site Reliability Engineer III