As a Site Reliability Engineer (SRE), you'll help build a meaningful engineering discipline, combining software and systems to develop creative engineering solutions to operations problems. Much of our support and software development focuses on optimizing existing systems, building infrastructure, and reducing manual work through automation. You'll join a team of curious problem solvers with a diverse set of perspectives who are thinking big and taking risks. In this environment, you'll take the lead on relevant projects, backed by an organization that provides the support and mentorship you need to learn and grow. As an SRE, you'll be focused on running better production applications and systems.

The public cloud team is responsible for engineering and operating the cloud infrastructure and platforms of JPMC, ensuring reliability, resiliency, and security. We have a Site Reliability Engineer (SRE) position to help the JPMC Cloud team provide production support in the public cloud. In this role, you'll work with cloud engineers across the organization to build the platform, pipeline, and monitoring systems that ensure the application landscape is designed to take full advantage of JPMC's global cloud solution.

Responsibilities:

Ensure the reliability, availability, and performance of the cloud infrastructure and platform.
Diagnose and repair issues using in-depth knowledge of cloud platforms, systems, and application architecture.
Automate repetitive manual tasks, and develop tools and automation to improve the efficiency of the platform and infrastructure.
Develop monitoring and dashboards for observability and proactive alerting.
Analyze defects, propose improvements, and drive efficiencies in systems and processes.
Author technical engineering documentation and improve its quality.
Perform architecture, deployment, administration, configuration, testing, and integration tasks related to cloud platforms.
Help develop new cloud engineering strategies and implementations for the firm.
Participate in 24x7 SRE on-call rotations and escalation workflows.

Qualifications:

Bachelor's degree in Computer Science, Information Technology, or equivalent technical qualification or professional experience.
Enterprise cloud infrastructure experience (AWS, Azure, or GCP) in a mission-critical environment.
Experience with production infrastructure as an SRE (or as an infrastructure software engineer, DevOps engineer, or production engineer).
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive.
Systematic problem-solving and troubleshooting skills.
Proficiency in one or more of the following programming languages: Python, Java, or Go.
Experience with Infrastructure as Code, using Terraform, CloudFormation, Ansible, or similar tools.
Excellent communication skills, including working with stakeholders and domain experts across the company to design solutions to user problems.
Experience in one or more of the following technologies: Linux and network administration, Kubernetes, or databases.
AWS, Terraform, or Kubernetes certifications are highly desirable.