Join our team as a Principal Infra Developer, where you will leverage your expertise in Splunk administration, SRE, Grafana, ELK, and Dynatrace AppMon to drive infrastructure excellence. With a hybrid work model and day shifts, you will play a crucial role in optimizing our systems, ensuring seamless operations, and contributing to the company's growth. Your efforts will directly impact our ability to deliver high-quality solutions to our clients.
Responsibilities
Job Description: DevOps Observability SME
Experience: minimum of 8 years in the IT industry
Role and Responsibilities
As a DevOps Engineer, you should be able to analyze business needs and then design, lead, drive, and implement highly scalable and complex DevOps solutions.
As a DevOps Observability SME, the lead engineer is responsible for taking lead roles in key engagements that roll out Enterprise DevOps projects, programs, and organizations. They need to build plans and help implement key engineering practices at the team level. The lead engineer is also responsible for writing approach notes, making presentations to customers, and helping teams with DevOps adoption.
The key responsibilities may involve some or all of the areas listed below:
Design and build observability solutions based on the OpenTelemetry framework (metrics, logs, and traces) for storage systems, with a focus on scalability, performance, and reliability
Develop custom collectors, exporters, and instrumentation libraries in Python to integrate with storage platforms.
Develop custom automation scripts and tools, primarily in Python, to integrate OpenTelemetry observability components
Build and manage event-streaming pipelines using Kafka for telemetry data collection and processing
Create and maintain dashboards and visualizations in Grafana that provide actionable insights into storage capacity, performance, and health
Implement and optimize logging solutions using Splunk and Grafana, including parsing, indexing, alerting, and reporting for storage-related events
Design, develop, monitor, and maintain multiple CI/CD pipelines (e.g., Jenkins, GitLab, GitHub, Argo CD) to automate the deployment and configuration of OpenTelemetry observability tools and storage monitoring services
Deploy, monitor, and troubleshoot observability components in Kubernetes clusters, ensuring proper resource allocation and service reliability
Continuously evaluate new technologies and tools to improve observability, telemetry, and system insights
Develop code, mostly in Python
Assess the current DevOps maturity of the client environment, and design and propose new solutions
Assess improvement needs and suggest solutions that improve the developer experience using DevOps toolsets and automation
Identify new tools and processes to improve the DevOps platform (on-premises and cloud) and automate processes
Lead the team to ensure successful solution implementation and adoption
Develop templates or scripts to automate everyday developer or operations tasks
Work closely with development and testing teams, both onshore and offshore, to design the best possible DevOps solution
Collaborate and communicate effectively with stakeholders to drive transformations and improvements. Strong written and verbal communication skills, and comfort with varied and reactive daily responsibilities
Preferred Professional Expertise
Ability to be hands-on in a technology/programming environment: perform design and code reviews, conduct independent testing, and handle overall change management processes, implementation activities, and post-implementation support.
Experience in automated implementation/deployment of code in AWS cloud infrastructure
Experience in conducting enterprise-level assessments and preparing strategic solutions for an improved developer experience
Contribute to requirements elicitation, the creation of application architecture documents and design artifacts, delivery of high-quality code, and support activities related to implementation and transition
Experience interacting with external and internal teams and key stakeholders on solution adoption, issue resolution, critical communications, etc.
Analyze and resolve issues to ensure high-quality deliverables at each stage of the SDLC, in line with the organization's guidelines and norms.
Experience working with mixed onshore-offshore development teams
Experience working with Scrum, Agile, SAFe, and Kanban development methodologies
Preferred Technical Expertise
Working experience with the OpenTelemetry framework, Python, Ansible, Jenkins, Grafana, Kubernetes (K8s), Splunk (creating dashboards, App Studio), Unix, and REST API solution integration