Site Reliability Engineer III

J.P. Morgan

Location: Greater London

Job Type: Full time


The Digital Private Markets team is a fast-growing global group within J.P. Morgan’s Corporate & Investment Bank. We are building a high-profile new fintech business for the firm, with the goal of creating a market-leading platform for private markets.

Job Summary

As a Site Reliability Engineer, you’ll combine software and systems to develop creative engineering solutions to operations problems. We focus on optimizing existing systems, building infrastructure, increasing reliability, and reducing manual work through automation. You’ll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you’ll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you’ll be focused on running better production applications and systems.

Job Responsibilities

  • Develop, test, and debug SRE-based solutions to maximize the uptime and supportability of PROD and UAT sites
  • Troubleshoot priority incidents, facilitate blameless post-mortems and ensure permanent closure of incidents
  • Collaborate with other software engineers and teams to design and implement deployment approaches using automated continuous integration and continuous delivery pipelines
  • Collaborate with other software engineers and teams to design, develop, test, and implement availability, reliability, and scalability solutions in their applications
  • Collaborate with technical experts, key stakeholders, and team members to resolve complex problems
  • Understand service level indicators and use service level objectives to proactively resolve issues before they impact customers
  • Support the adoption of site reliability engineering best practices within your team

Required qualifications, capabilities, and skills

  • Proficiency in site reliability culture and principles, and familiarity with how to implement site reliability within an application or platform
  • Proficiency in at least one programming language or framework, such as Python, Java/Spring Boot, or .NET
  • Experience deploying and managing services in Kubernetes, AWS or other cloud platforms
  • Experience in observability, such as white-box and black-box monitoring, service level objective alerting, and telemetry collection, using tools such as Grafana, Dynatrace, Prometheus, Datadog, and Splunk
  • Experience with continuous integration, continuous delivery, and infrastructure-as-code tools such as Jenkins, GitLab, or Terraform
  • Familiarity with troubleshooting common networking technologies and issues
  • Ability to proactively recognize roadblocks, and a demonstrated interest in learning technology that facilitates innovation

Preferred qualifications, capabilities, and skills

  • Formal training or certification in site reliability engineering concepts, along with demonstrated experience
  • Proficient knowledge of software applications and technical processes within a given technical discipline (e.g., cloud, artificial intelligence, Android)
  • Ability to identify new technologies and relevant solutions to ensure design constraints are met by the software team
  • Ability to initiate and implement ideas to solve business problems