Cloud Network Engineer

Microsoft

Location: Redmond, Washington

Job Type: Full time

Be brave, not perfect.
- Reshma Saujani

The Azure Networking Team is growing and we’re looking for a creative and hard-working engineer who can combine networking experience with software development fundamentals to help build the world’s best private backbone network. We’re passionate about automating every facet of the network to converge on a self-growing, self-healing network. We want engineers who are equally passionate and can look at problems from a customer's perspective to create user experiences that truly enable and empower technicians, engineers, and planners.

Focusing on the global backbone network, and working closely with partner teams in Azure Networking, you’ll be a critical part of designing, developing, and deploying the networking and automation aspects of the Wide Area Network (WAN). This will include owning components from design through development and operations. This is a high-visibility position in an area of large and expanding investment for Microsoft Azure and offers a terrific opportunity for technical and career growth.

Responsibilities

Network Design and Implementation

  • Offers modifications and improvements to network technologies, based on existing and emerging industry knowledge, to contribute to the design, implementation, and operation of reliable, scalable, and high-performance networks.
  • Works in collaboration with teams across a single organization to develop reliable, scalable, and high-performance network designs; independently produces design documents and implementation plans.
  • Partners with engineering and Program Management teams to understand customer, business, and technical requirements to propose network designs or modifications to architecture; participates in architecture reviews to ensure designs and modifications meet all functional, performance, scale, and compliance requirements.
  • Applies an understanding of diagrams of power and cooling systems, floor plan layouts, and the movement of people and equipment, to inform decisions about the development, expansion, or modification of datacenters or network sites. Works with internal teams (e.g., data center engineering teams, procurement, sourcing) and external partners (e.g., architects, contractors, hardware suppliers) and articulates the impact of design choices to ensure stakeholders have complete and accurate information.
  • Leads design, network/code and security reviews across teams to identify risks and prevent classes of bugs prior to production release by applying expertise in network implementation, available technologies, analysis of telemetry pipelines, and root cause analysis, as well as best practices in identifying and implementing solutions. Articulates the customer impact of design trade-offs and exceptions, and identifies capabilities and limitations of existing tools and resources to ensure they can support design implementation and verification.

Maintain and Operate Networks

  • Influences telemetry analytics designs to better identify patterns that reveal errors and unexpected problems; leverages an understanding of network architectures and infrastructure at scale to develop, measure, track, and improve the quality of telemetry pipelines that support automated monitoring and incident response.
  • Triages, troubleshoots, and repairs complex live site issues by applying expertise in physical network components and features (e.g., device operating systems), problem management tools (e.g., root cause analysis, trend analysis, postmortems, repair items), and/or low-level Application Programming Interfaces (APIs) and register sets, to diagnose and address problems using automated, long-term, and sustainable solutions with minimal or no disruption to customers. Participates in on-call/DRI duties to resolve incidents in production and provides guidance to other engineers on triage, troubleshooting, and resolution processes.
  • Develops process or technology solutions that proactively resolve issues with processes, physical network devices, and/or tooling, and makes optimal use of infrastructure and resources through simple designs and by leveraging automation; prioritizes the development of solutions to deliver high-quality, measurable improvements against Key Performance Indicators (KPIs) across teams.
  • Develops new automated testing and validation procedures for network devices, firmware, and configurations to drive solutions between internal teams and external vendors; adopts automated testing and validation tools and promotes their use within their team.
  • Demonstrates knowledge of data: knows what data is needed, how to find new or missing data, and how to describe the impact of defects on customers or the impact of operations-focused scenarios on networks or infrastructure, as well as the relevance to product and service targets; identifies patterns and trends in data and interprets them to inform decisions related to improving and optimizing products and/or services.
  • Contributes to the development of a knowledge base for datacenter or network site staff and technicians on how to repair and replace existing network hardware and components deployed in production, as well as how to install and deploy new network hardware and components; identifies systemic issues and inefficiencies related to installing and deploying new hardware and components and provides feedback to relevant groups.
  • Develops and implements reliable automation tools and services that increase engineering efficiency, reduce operational burden, and reduce human errors in production, while making optimal use of infrastructure and resources by automating tasks; uses and identifies opportunities to improve existing automation to increase efficiency, reduce errors, and support sustainable network operations at scale across teams.
  • Effectively manages multiple workstreams and resources during incidents, applies diagnostic expertise, provides guidance to other engineers working to mitigate and resolve issues, and maintains a commitment to the quality of products and services throughout the lifecycle; ensures proper notes from incidents are documented and drives the execution of quality postmortem and root cause analysis processes across teams. Performs analysis of historical incident data to identify trends, patterns and issues that should be addressed at high priority.
  • Collaborates with teams across the organization to manage and drive network deployments; works with and drives improvements in machine-readable definitions to manage deployments.
  • Analyzes capacity issues across complex transfer protocols and identifies network components that may need to be modified or replaced; applies an understanding of infrastructures and systems to anticipate capacity issues in other areas and shares information with internal teams, external partners, and suppliers to update hardware and network components as needed to meet current and anticipated capacity needs.
  • Supports innovation and cost management across teams by critically evaluating existing practices and tools and developing ideas for simplifying and improving systems and tools that meet customer and/or business needs.

Supporting People and Execution

  • Collaborates within and across teams by proactively sharing information with an appropriate level of detail for their audience; overcomes obstacles by resolving conflicts and issues across interdependent teams and engages with partners and stakeholders so issues can be resolved and mutual objectives are met.
  • Mentors and provides feedback to other engineers, while also proactively seeking mentorship and feedback from others; shares ideas and insights for improving team-oriented behaviors, including DevOps and live site handling skills.

Qualifications

Required Qualifications:

  • 7+ years of network engineering experience in an online service, internet service provider or large enterprise environment.
  • 4+ years of professional software development experience in the networking domain, including Python, YAML, C#, REST, Go, and workflow systems.

Preferred Qualifications:

  • Experience with routing protocols such as IS-IS, OSPF, and BGP, and with MPLS technologies.
  • Self-starter with proven ability to develop creative solutions that enable customers.
  • Interest and ability to research new and emerging technologies to better solve networking problems.
  • Proven track record of fixing problems permanently.
  • Experience with traffic engineering solutions such as RSVP-TE and Segment Routing.
  • CCIE/JNCIE Certification
  • A Bachelor's degree in Computer Science, Software Engineering, Electrical Engineering (or a related field), or equivalent alternative education, skills, and/or practical experience is required.

The ability to meet Microsoft, customer, and/or government security screening requirements is required for this role. These requirements include, but are not limited to, the following specialized security screenings:

  • Microsoft Cloud Background Check: This position will be required to pass the Microsoft Cloud Background Check upon hire/transfer and every two years thereafter.
You’ve got this!