Development, Digital & Technology/Information Technology
Location(s)
Hyderabad - Telangana - India
CBRE Group, Inc. is seeking an experienced L2 Staff Software Engineer to join its Managed Services Team in Hyderabad, Gurugram, or Noida. The role focuses on ensuring system reliability, scalability, and efficiency through incident management, automation, and collaboration with cross-functional teams.
Key Responsibilities:
Debug and resolve time-sensitive issues across AWS, Azure, or GCP, identifying points of failure and collaborating with internal teams on resolution, with minimal service disruption and adherence to ITIL best practices.
Work with Java, .NET, or C#, with automation support via Python scripting.
Apply ITIL best practices across incident, change, and problem management processes to ensure consistent, efficient, and compliant service delivery.
Demonstrate a strong understanding of system and cloud architecture, and proactively recommend best practices for scalability, reliability, and maintainability across applications and infrastructure.
Collaborate closely with solution architects and engineering teams to identify flaws in underlying designs and recommend scalable, reliable, and maintainable solutions.
Write, optimize, and troubleshoot SQL queries and stored procedures, and ensure database performance.
Own the setup, configuration, and optimization of Datadog for full-stack observability, and actively leverage its AIOps capabilities--including anomaly detection, event correlation, and automated root cause analysis--to enhance incident response and system reliability.
Champion a mindset of continuous improvement in support operations by proactively identifying inefficiencies, streamlining workflows, and implementing automation or process enhancements to eliminate repetitive effort and improve overall service quality.
Design and implement automation workflows using Python to streamline operational tasks and reduce manual effort.
Perform API testing and debugging using tools like Postman, ensuring robust integrations and data flow.
Handle and manipulate JSON data structures for application and API interactions.
Utilize GitHub Copilot and other AI tools to accelerate development and troubleshooting tasks.
Analyse reports and logs to drill down into issues, identify technical, functional, knowledge, and operational debt, and drive resolution strategies.
Recommend and implement scaling and redundancy strategies in cloud infrastructure to ensure high availability.
Manage and troubleshoot containerized applications using Docker and Kubernetes in production environments.
Mentor junior engineers, providing guidance on technical best practices and career development.
Ensure alignment with organizational standards and cloud governance policies (e.g., cloud gates), actively working towards compliance in all deployments, configurations, and operational practices across cloud environments.
Incident Management:
Own the incident management lifecycle: detection, response, resolution, and post-mortem analysis.
Conduct root cause analysis and implement preventive measures.
Change Management:
Manage the change management process, ensuring controlled and efficient implementation of changes.
Ensure change requests are properly assessed, documented, and executed with minimal impact.
Assess the impact of proposed changes and mitigate potential risks.
Ensure compliance with change management policies and procedures.
Metrics and Reporting:
Maintain dashboards for real-time visibility into operational health.
Use data-driven insights to identify recurring issues and recommend process improvements.
Transformation and Automation:
Identify opportunities for process automation and implement solutions to improve efficiency.
Evaluate and implement new monitoring tools.
Key Requirements:
Programming: Java, .NET, or C#, plus Python
4-6 years of experience with AWS, Azure, or GCP (including debugging and scaling strategies)
Database Management: Minimum of 2 years of SQL, stored procedures, performance tuning
API Testing & Debugging: Postman, RESTful APIs
Data Handling: JSON structures, data parsing
Monitoring & Observability: Datadog (including AIOps features like anomaly detection, event correlation)
Containerization: Docker, Kubernetes
Automation: Python scripting, workflow automation
Reporting & Analysis: Log analysis, issue drill-down, technical debt identification
AI Tools: GitHub Copilot, GenAI familiarity
ITIL Fundamentals: Incident, change, and problem management
System & Cloud Architecture: Design principles, scalability, redundancy
Collaboration: Working with architects and engineering teams
Continuous Improvement: Process optimization, effort elimination
Experience with AIOps platforms such as:
Moogsoft - for event correlation and noise reduction
Datadog - for full-stack observability and AI-driven root cause analysis
Splunk ITSI - for predictive analytics and service intelligence
ServiceNow ITOM - for workflow automation and anomaly detection
Ability to interpret and act on AI-driven insights for proactive incident resolution.
Experience in tools like Docker and Kubernetes for managing containerized applications.
Experience with monitoring and logging solutions such as Prometheus, Grafana, and the ELK stack (Elasticsearch, Logstash, Kibana).
Expertise in creating Datadog dashboards, monitors, and log pipelines.
Must-Have Skills:
Excellent analytical and troubleshooting skills to diagnose and resolve complex issues.
Effective communication skills to collaborate with cross-functional teams and convey technical information clearly.
Ability to thrive in a fast-paced environment, managing multiple tasks and projects simultaneously.
Previous experience in a similar role or relevant industry experience is highly preferred.
Knowledge of cloud platforms like AWS, Azure, or Google Cloud.