Architect

Year    TS, IN, India

Job Description

Country/Region: IN
Requisition ID: 30909
Work Model:
Position Type:
Salary Range:
Location: INDIA - HYDERABAD - BIRLASOFT OFFICE

Title: Architect

Description:

Area(s) of responsibility: Reliability Architect - 6A

Reliability Architect with over 10 years of experience in proactive monitoring, automation, and observability. Skilled in AIOps/MLOps, infrastructure management, and performance optimization using modern tools and practices. Adept at leading incident response, mentoring support teams, and driving cross-functional collaboration to ensure system reliability and scalability.






Key Responsibilities:




Monitoring and Automation


Proactively monitor software systems to prevent incidents and automate routine operational tasks.

Effective Monitoring


Design monitoring systems that trigger alerts on symptoms rather than waiting for outages, enabling early detection and resolution.

Application Performance Monitoring (APM)


Implement and manage APM tools like New Relic or Dynatrace to track application performance, identify bottlenecks, and optimize resource usage.

Log Analysis with Splunk


Use Splunk to analyze logs for troubleshooting, anomaly detection, and improving system reliability.

Dashboards Preparation


Build intuitive dashboards to visualize system health, performance metrics, and operational KPIs.

Alerts Setup


Configure intelligent alerts based on thresholds and anomalies to ensure timely incident response.

Reports Scheduling


Automate regular reporting to provide insights into system performance, reliability, and trends.

Reliability Metrics


Define and track SLIs, SLOs, and error budgets to measure and maintain system reliability.

Observability Skills


Apply observability practices including distributed tracing, logging, and metrics collection to gain deep insights into system behavior.

AI-Driven Monitoring & Automation


Utilize AIOps techniques to proactively detect anomalies, automate incident response, and enable self-healing systems through intelligent alerting and predictive analytics.

Observability & ML Integration


Integrate machine learning models with observability tools to enhance system insights, optimize performance, and ensure reliability of AI-powered services in production.

Cross-Team Collaboration


Work closely with development and support teams to enhance service reliability through rigorous testing and release procedures.

Capacity Planning


Participate in system design reviews and capacity planning to ensure scalability and performance.

Debugging and Incident Response


Lead incident response efforts, analyze debugging information, and manage rollbacks of faulty software deployments.

Mentoring Support Teams


Guide and mentor L1/L2 support teams to establish best practices in monitoring and observability.

Infrastructure Management


Manage infrastructure using tools such as Chef, Ansible, Terraform, GitLab CI/CD, and Kubernetes.

Documentation


Maintain comprehensive documentation of processes and procedures to ensure operational consistency and avoid duplicated effort.

Proactive Mindset


Approach challenges with enthusiasm, ownership, and a continuous improvement mindset.



Job Detail

  • Job Id
    JD4667498
  • Industry
    Not mentioned
  • Total Positions
    1
  • Job Type:
    Full Time
  • Salary:
    Not mentioned
  • Employment Status
    Permanent
  • Job Location
    TS, IN, India
  • Education
    Not mentioned
  • Experience
    Year