Senior Site Reliability Engineer, Observability

Cisco Meraki

Location: Remote - US only

Job Type: Full time


Be brave, not perfect.
- Reshma Saujani

The Meraki cloud supports millions of customer devices from 8 datacentres around the world. Meraki’s customer base has grown by a factor of 2-3 every year, and our cloud now serves more than 4 billion HTTP requests per day globally. Our customers depend on our products to run their critical infrastructure: network switches, security appliances, wireless APs and security cameras.

As SREs at Meraki, we are responsible for building and growing the cloud that supports these customers and their networks. We embrace the *nix way, automate away tedious tasks and build infrastructure as code. As a Senior Site Reliability Engineer on the Observability team, you will help us craft useful, scalable and secure monitoring systems that keep us online and performant.

In this role, you will join our team of experienced and passionate DevOps engineers based in the US and UK. You will guide the design, development and operation of the monitoring, log/event collection and alerting systems that support all of Meraki Engineering.

Examples of projects our team works on:

  • Deploy and scale our Prometheus architecture to handle the next five years of metric growth.
  • Design, deploy and maintain Elasticsearch clusters holding 10 TB to 1,000+ TB of data for a variety of use cases.
  • Gather requirements for, then design and build, an alerting system that lets developers construct alerts from multiple data sources and define alerting workflows.
  • Develop data pipelines that allow engineers as well as non-technical team members to gain new insights into our production data.
  • Write libraries and APIs that provide a simple, unified interface to other developers when they use our monitoring, logging and event processing systems.
  • Automate infrastructure deployment so that monitoring resources can be requested and provisioned without manual intervention.

You are an ideal candidate if you:

  • Are passionate about data, and believe in automating manual tasks with the right tools.
  • Have 6+ years of experience designing, deploying and operating mid- to large-size bare metal or cloud environments.
  • Have 3+ years of experience scripting or coding in languages like Ruby, Scala, Python or Bash.
  • Feel comfortable diving into other people's source code to debug errors.
  • Understand *nix systems. We run Debian and Ubuntu.
  • Can work with other teams to help them better monitor their services.
  • Care about and empathize with the customer experience. You have experience supporting an externally facing production environment, ideally on a follow-the-sun team.
  • Have experience with the Elasticsearch, Logstash, Kibana (ELK) stack.
  • Bonus points for experience with: Prometheus, Grafana, Graphite, Kafka, Snowflake, Ansible, Ruby, Terraform, Consul.

Keywords: Observability, Monitoring, SRE, Site Reliability Engineering, DevOps, Elasticsearch, Logstash, Kibana, ELK, Grafana, Graphite, Prometheus, Kafka, Snowflake, Ansible, Ruby, Terraform, Consul.

At Cisco Meraki, we’re challenging the status quo with the power of diversity, inclusion, and collaboration. When we connect different perspectives, we can imagine new possibilities, inspire innovation, and release the full potential of our people. We’re building an employee experience that includes appreciation, belonging, growth, and purpose for everyone.

Cisco is an Affirmative Action and Equal Opportunity Employer and all qualified applicants will receive consideration for employment without regard to race, color, religion, gender, sexual orientation, national origin, genetic information, age, disability, veteran status, or any other legally protected basis. Cisco will consider for employment, on a case-by-case basis, qualified applicants with arrest and conviction records.

Cisco COVID-19 Vaccination Policy
The health and safety of Cisco's employees, customers, and partners is a top priority. Our goal is to protect against and mitigate the spread of COVID-19 infection for strong business resiliency during the pandemic. Therefore, Cisco may require new hires to be fully vaccinated against COVID-19 if the role requires business-related travel, meeting with customers/partners (including visiting third-party sites on behalf of Cisco), attending trade events, or Cisco office entry, unless otherwise prohibited by applicable law, and in countries where COVID-19 vaccination is legally required. The company will consider legally required accommodations/exceptions for medical, religious, and other reasons as per the requirements of the role and in accordance with applicable law. Additional information about the requirements and accommodation process will be provided to candidates at the time of offer, based on region.

You’ve got this!