Applications for this job have closed.

MLOps Manager of Software Engineering

Greater London
Full time
JP Morgan
Banking, investment & finance
10,001+ employees

The Aumni team at JPMorgan Chase is looking for a Software Engineering Manager to oversee a traditional SRE team and an MLOps team that manage our core application, model hosting, deployment, and monitoring infrastructure in AWS.

A Software Engineering Manager within the Digital Private Markets department will help us manage multiple SRE teams with a joint focus on traditional web applications as well as AI/ML models. You will solve complex and broad business problems with clear communication, practical solutions, and stakeholder engagement. Through effective mentorship, management and system design, you will serve as a key enablement pillar for our software engineering and data science teams.

You will apply your extensive experience as a leader by sharing your knowledge of end-to-end operations, availability, reliability, and scalability in the AI/ML space. You will also mentor your engineers as they enable the downstream Data Science and ML Engineering teams to execute on our product roadmap. A focus on empathy, organization, and communication is key to success in this role.

Job responsibilities

  • Manages multiple teams responsible for core infrastructure to support AI/ML and web application initiatives
  • Oversees automated continuous integration and continuous delivery pipelines for the Software Development and Data Science teams to host web applications and develop AI/ML models
  • Mentors traditional SREs and MLOps engineers
  • Sets standards for Infrastructure, CI/CD and observability architecture
  • Fosters technical discussions with developers, key stakeholders, and team members to resolve complex technical problems
  • Builds technical roadmaps in collaboration with senior leadership and identifies risks or design optimizations
  • Proactively resolves issues before they impact internal and external stakeholders of deployed models
  • Champions the adoption of traditional SRE and MLOps best practices within the teams

Required qualifications, capabilities, and skills

  • Formal training or certification on site reliability engineering concepts and/or 5+ years applied experience
  • 2+ years of Engineering Manager or Tech Lead experience in the SRE or MLOps domain
  • Experience leading agile sprint ceremonies
  • Proven ability to lead, inspire, and manage a diverse team of software engineers
  • Strong mentoring and coaching skills
  • Excellent verbal and written communication skills, with the ability to effectively convey complex technical concepts to various audiences
  • Ability to work with a geographically distributed team across multiple timezones
  • Ability to manage multiple projects and priorities effectively
  • Ability to articulate the importance of monitoring and observability in the AI/ML space and to enforce their implementation and use across an organization
  • Domain knowledge of machine learning applications and technical processes within the AWS ecosystem
  • Expertise with Terraform, Kubernetes (or other container orchestration platforms), and CI/CD platforms such as Jenkins or GitHub Actions
  • Experience with event-driven, microservice-oriented architectures, specifically with AWS Lambda
  • Understanding of the different roles served by data engineers, data scientists, machine learning engineers, and system architects, and how MLOps contributes to each of these workstreams

Preferred qualifications, capabilities, and skills

  • Experience managing multiple teams with ambiguity and external dependencies
  • Comfortable with team management, fostering collaboration, promoting design patterns, and presenting technical concepts to non-technical audiences
  • Ability to break down large concepts and goals into smaller requirements and manage multiple competing priorities
  • Understanding of ML model training and deployment procedures and techniques
  • Experience with data engineering and CI/CD best practices
  • Familiarity with observability concepts and telemetry collection using tools such as Datadog, Grafana, Prometheus, Splunk, and others
  • Experience working with ML engineering platforms such as Databricks and SageMaker
  • Experience working with Data Engineering technologies such as Snowflake and Airflow
