Lead Site Reliability Engineer - Observability
Location: United Kingdom
Job Type: Full time
The Meraki Cloud supports millions of customer devices from 8 data centers around the world. Meraki’s customer base has grown by a factor of 2-3 every year, and the cloud now serves billions of HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure of network switches, security appliances, wireless APs and security cameras.
Our SREs are responsible for building and growing the cloud that supports these customers and their networks. As a Lead Site Reliability Engineer on the Observability team, you will lead the design, development and operation of large-scale, secure observability systems that make sure our services stay online and performant. We are a team of passionate software engineers who value quality and customer experience. Our team is based in the US and EMEA, and we embrace hybrid and remote work.
What you would be working on:
- Design, deploy and scale our Prometheus architecture to handle 100+ million active series and beyond.
- Deploy and operate large, high-performance Elasticsearch clusters holding 10-2000+ TB of data.
- Deploy and grow high-throughput data pipelines built on Kafka, handling hundreds of thousands of events per second.
- Design and build an alerting system that allows engineering teams to construct alerts from multiple data sources and alerting workflows.
- Write libraries and APIs that give engineers self-service access to our monitoring, logging, and other observability systems.
- Use Terraform to deploy public and private cloud infrastructure.
We are looking for:
- Experience in designing, deploying and operating mid- to large-sized distributed systems on VMs or bare metal machines running Linux (we run Debian and Ubuntu).
- Experience developing with languages like Ruby, Python, Go, Scala, or Bash.
- Being excited by the challenge of solving difficult problems in large distributed systems that deal with huge amounts of data.
- Interest in working on a highly autonomous team that cares deeply about quality and customer experience.
- Being curious, able to learn fast, and comfortable diving into unfamiliar code and systems to solve problems.
- Understanding the value of observability and being able to work with other teams to help them better monitor their services.
- Willingness to be part of a production on-call rotation.
- Direct experience with the following technologies (or similar): Elasticsearch, Logstash, Kibana (ELK) stack, Kafka, Prometheus/Thanos/Cortex, Graphite, Ansible, Terraform, Consul.
At Cisco Meraki, we’re challenging the status quo with the power of diversity, inclusion, and collaboration. When we connect different perspectives, we can imagine new possibilities, inspire innovation, and release the full potential of our people. We’re building an employee experience that includes appreciation, belonging, growth, and purpose for everyone.
Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.
Cisco COVID-19 Vaccination Policy
The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect against and mitigate the spread of COVID-19 infection for strong business resiliency during the pandemic. Therefore, Cisco may require new hires to be fully vaccinated against COVID-19 if the role requires business-related travel, meeting with customers/partners (including visiting third-party sites on behalf of Cisco), attending trade events, and Cisco office entry, unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information will be provided to candidates about the requirements and accommodation process at the offer time based on region.