Site Reliability Engineer - Observability

JP Morgan

Location: Glasgow City

Job Type: Full time


Due to ongoing investment and expansion at our award-winning Technology Centre, we have a number of new opportunities available for talented Engineers.

Located in the heart of Glasgow City Centre, our Glasgow Technology Centre is one of two strategic, pure technology hubs globally at JP Morgan. Our opportunities offer the chance to work on interesting challenges and projects using truly cutting-edge technologies.

In our Infrastructure Engineering Group, we look first and foremost for people who are passionate about solving business problems through innovation and engineering practices. You will apply your knowledge and expertise to all aspects of the engineering and software development lifecycle, and partner daily with stakeholders around the world to stay focused on common goals. We embrace a culture of experimentation and constantly strive for improvement and learning. You will work in a collaborative, trusting, thought-provoking environment - one that encourages diversity of thought and creative solutions that are in the best interests of our customers globally.

As part of our growing and evolving team, you will have the opportunity to learn and deepen your experience across a variety of strategic platforms in the marketplace, such as logging tooling (e.g. Splunk and Elastic), visualisation tooling (e.g. Grafana, EDGE) and other monitoring capabilities. We will match your ambitions to develop your career in a continually evolving group, where the variety of the role gives you a platform to demonstrate your innovative ideas and to work with a globally diverse collective of partners and customers across the Bank. Talent development is a strong focus within our team, and ownership of tasks, or support, is readily provided to give our engineers an opportunity to stretch and develop themselves in a creative and safe environment.

The role involves:

  • Understanding our implementation of various market-leading technologies so that we can provide a stable and reliable platform for our internal customers within the Bank.
  • Continually reviewing the estate to understand the issues that are raised and to ask why they happen. We look for long-term solutions rather than quick fixes, although quick fixes have their place when we need to ‘get back up and running’.
  • Reacting to customer queries and concerns, partnering with our Platform Owners and Engineering Teams.
  • We capture metrics and telemetry to understand our applications and supporting hardware, so that we can evidence their health and their adherence to our SLAs, SLOs and SLIs, and monitor against our error budgets. Where they deviate, we deliver solutions to get back on track or, more importantly, to stay on track!
  • Our focus is on automation, and this includes proactively monitoring our systems to highlight events or issues that need attention. From there we need to decide intelligently and automatically what corrective action to take, and then act on it (i.e. self-healing). Where this fails, we auto-ticket to get human involvement quickly.
  • We need people who can think on their feet, think outside the box and are not afraid to experiment.
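
For readers less familiar with the error budgets mentioned in the list above, the idea can be shown with a minimal Python sketch. It assumes an availability SLO measured over request counts in a fixed window; the function names, the 99.9% target and the request figures are illustrative assumptions and do not describe any actual platform or tooling.

```python
def error_budget(slo_target: float, total_requests: int) -> float:
    """Number of failed requests the SLO tolerates over the window."""
    return (1.0 - slo_target) * total_requests


def budget_remaining(slo_target: float, total_requests: int,
                     failed_requests: int) -> float:
    """Fraction of the error budget still unspent (negative if overspent)."""
    budget = error_budget(slo_target, total_requests)
    if budget == 0:
        # A 100% target leaves no budget at all: any failure overspends it.
        return 0.0 if failed_requests == 0 else float("-inf")
    return 1.0 - failed_requests / budget


if __name__ == "__main__":
    # Hypothetical figures: 99.9% SLO, 1,000,000 requests, 250 failures.
    remaining = budget_remaining(0.999, 1_000_000, 250)
    print(f"Error budget remaining: {remaining:.0%}")
```

A negative result means the budget is overspent, which is the point at which a team would typically prioritise reliability work over new changes.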

Interested? Here is what we are looking for:

  • An understanding and experience of Site Reliability Engineering (SRE) concepts, terms and day-to-day activities.
  • Proven experience of software development or infrastructure engineering, for example Python, Ansible development, synthetic testing or other automation activities.
  • An understanding of some form of monitoring and event management tooling, and how these are implemented.
  • A good understanding of Cloud (internal or external) concepts and implementations.
  • You have great communication, teamwork and problem-solving skills. You are good at self-management and are self-motivated to see your work through to implementation.
  • Appreciation of Incident Management processes (ServiceNow would be ideal), Jira concepts, Agile SDLC and Bitbucket.

Nice to haves / let us support your upskilling:

  • It would be beneficial to have some experience of logging tooling such as Splunk and/or Elastic


  • Bachelor’s degree or equivalent experience in a software engineering discipline
  • Mastery of two or more programming languages (e.g. Python, Java, Go, etc.) with respect to designing, coding, testing, and software delivery
  • Adept in the development of automated tools, systems, and services in multiple technology domains
  • Advanced knowledge of one or more infrastructure components (e.g. networking, cloud services, orchestration tools, containerization, compute, and storage systems)
  • Proficiency in service-level changes to a system and troubleshooting components