Principal Network Reliability Engineer

Year    IN, India

Job Description



Oracle Cloud Infrastructure (OCI) delivers mission-critical applications for top-tier enterprises around the world. Our cloud offers unmatched hyperscale, multi-tenant services deployed in more than 40 regions worldwide. The mission of our Network Reliability Engineering team is to provide exceptional network reliability and automation services that enable our customers to drive operational excellence in OCI networks at scale. By focusing on both reactive and proactive functions, we aim to minimize downtime, quickly resolve incidents, and continuously enhance network performance through automation, advanced monitoring, and a customer-centric approach.


As a Principal Network Reliability Engineer, you will play a critical role in designing, building, testing, deploying, and operating highly reliable, scalable network solutions to support Oracle's next-generation Cloud Infrastructure. You will help ensure the reliability and availability of large-scale distributed systems, managing hundreds of thousands of networking devices.


You will contribute to both proactive and reactive initiatives: automating processes, implementing advanced monitoring, swiftly resolving incidents, and continuously improving network performance. You possess strong coding abilities, a deep understanding of networking and distributed systems, and a passion for automation to drive operational excellence.


You thrive in a collaborative, agile environment, effectively manage multiple projects and priorities, and consistently deliver results in fast-paced, dynamic conditions. Most importantly, you are a dedicated team player, eager to learn and adapt, and committed to helping the team achieve exceptional standards of network reliability.



What you will bring:


  • Bachelor's degree in Computer Science or a related engineering field with 10+ years of network engineering experience, or a Master's degree (or equivalent experience) with 8+ years of network engineering experience.
  • Experience working in a large ISP or cloud provider environment.
  • Experience in a network operations or reliability engineering role.
  • Solid understanding of protocols such as MPLS, BGP, OSPF, IS-IS, TCP, IPv4, IPv6, DNS, and DHCP. Experience with VXLAN and EVPN is an added advantage.
  • Extensive experience with scripting or automation and with data center design. Python is preferred, but you must demonstrate expertise in a scripting or compiled language.
  • Experience with networking protocols such as TCP/IP, VPN, DNS, DHCP, and SSL.
  • Experience with network monitoring and telemetry solutions.
  • Experience with network modeling and programming: YANG, OpenConfig, and NETCONF.
  • Ability to apply professional concepts and company objectives to resolve sophisticated issues in creative and effective ways.
  • Ability to work under limited supervision.
  • Excellent organizational, verbal, and written communication skills.
  • Excellent judgment in influencing product roadmap direction, features, and priorities.
  • Willingness to participate in an on-call rotation.

Responsibilities



Support the design, deployment, and operation of Oracle Cloud Infrastructure (OCI), a large-scale global cloud. The role focuses primarily on developing and supporting the network fabric and systems, combining a deep, protocol-level understanding of networking with programming skills. Because OCI is a cloud network with a global footprint, this support spans hundreds of thousands of network devices serving millions of servers, connected over a mix of dedicated backbone infrastructure, Clos networks, and the Internet.


  • Bring an ownership mindset: deliver results, embrace ambiguity, and drive continuous improvement.
  • Collaborate with program and project managers to deliver breakthroughs and results.
  • Primarily use existing procedures and tools to develop and safely implement network changes, and develop new procedures from time to time when needed.
  • Develop solutions that enable front-line support teams to act on network failure conditions.
  • Mentor junior engineers.
  • Participate in the network solution and architecture design process.
  • Participate in operational rotations as either primary or secondary.
  • Provide break-fix support for events.
  • Serve as the partner escalation point for event remediation.
  • Lead post-event root cause analysis.
  • Coordinate with networking automation services on the development and integration of support tooling.
  • Coordinate with network monitoring to capture telemetry and build alert rules from it.
  • Build dashboards that represent data across network layers and device roles to help identify network issues and anomalies.
  • Frequently develop scripts to automate routine tasks for the team and business units.
  • Serve as an SME on software development projects for network automation and network monitoring.
  • Collaborate with the network vendor's technical account team and the internal Quality Assurance team to drive bug resolution and assist in the qualification of new firmware and/or operating systems.

Job Detail

  • Job Id
    JD4486236
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type
    Full Time
  • Salary
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    IN, India
  • Education
    Not mentioned
  • Experience
    Year