We are seeking an experienced DevOps Engineer to build and maintain the infrastructure, deployment pipelines, and monitoring systems for our energy management platforms. You will work with AWS cloud services, Docker, Jenkins, and a comprehensive observability stack to ensure our systems are reliable, scalable, and performant. The ideal candidate will enable continuous delivery, implement infrastructure as code, and ensure our platform can handle real-time energy data from thousands of devices while maintaining high availability and security.
Key Responsibilities
=========================
CI/CD & Infrastructure Automation
--------------------------------------
Design, implement, and maintain CI/CD pipelines using Jenkins and GitHub Actions
Implement Infrastructure as Code using Terraform or CloudFormation
Manage multi-environment deployments (dev, qa, staging, demo, production) with automated promotion strategies
Manage GitHub repositories, branching strategies, and access controls
Automate repetitive operational tasks to improve efficiency
Cloud Infrastructure Management
-----------------------------------
Manage AWS cloud infrastructure including ECS (Elastic Container Service), EC2, SSM (Systems Manager), S3, VPC, and related services
Build and manage Docker containers for microservices deployment
Configure load balancers and implement auto-scaling strategies
Optimize system performance and cost efficiency in AWS
Ensure security best practices including IAM policies, secrets management (SSM Parameter Store), and network security
Monitoring & Observability
-------------------------------
Configure and maintain monitoring solutions using CloudWatch, Grafana, InfluxDB, and Telegraf
Set up alerting and incident response procedures for production systems
Implement log aggregation and analysis solutions
Create dashboards and metrics for system health and performance tracking
Participate in the on-call rotation for production support
Security & Compliance
--------------------------
Implement backup and disaster recovery procedures
Ensure compliance with security standards and best practices
Manage SSL/TLS certificates and secure communication channels
Conduct security audits and vulnerability assessments
Implement network segmentation and access controls
Collaboration & Documentation
----------------------------------
Collaborate with developers to optimize application performance and resource utilization
Troubleshoot production issues and perform root cause analysis
Document infrastructure architecture and runbooks in Confluence
Contribute to capacity planning and scalability assessments
Provide technical guidance on deployment and infrastructure topics
Qualifications
==================
Education
-------------
Bachelor's or Master's degree in Computer Science, Engineering, or a related field, or equivalent professional experience
Experience
--------------
Minimum of 3 years of professional experience in DevOps, SRE, or similar roles
Strong hands-on experience with AWS services (ECS, EC2, SSM, CloudWatch, S3, VPC)
Solid experience with Docker containerization and orchestration
Experience building and maintaining CI/CD pipelines (Jenkins or similar)
Experience with Git/GitHub and version control workflows
Experience with Kubernetes (EKS) or other container orchestration platforms (desirable)
Experience with Infrastructure as Code (Terraform, CloudFormation) (desirable)
Experience with monitoring IoT or real-time data processing systems (desirable)
AWS certifications (Solutions Architect, DevOps Engineer, etc.) (highly desirable)
Scripting: Bash, Python, or similar for automation (required)
IaC: Terraform or CloudFormation (desirable)
Tools: Jira, Confluence
Experience with Ansible, Puppet, or Chef (desirable)
Knowledge of security scanning tools and practices (desirable)
Experience with time-series database optimization (desirable)
Understanding of networking concepts (VPC, subnets, routing, load balancing)
Soft Skills
---------------
Strong troubleshooting skills and ability to diagnose complex infrastructure issues
Understanding of system architecture, networking, and security best practices
Ability to automate repetitive tasks and improve operational efficiency
Experience with monitoring, logging, and observability best practices
Proactive in identifying and resolving potential issues before they impact users
Ability to balance speed of delivery with stability and security requirements
Collaborative team player who works effectively with developers and other stakeholders
Strong communication skills to explain technical concepts to various audiences
Languages
-------------
Full proficiency in English, both written and spoken (required)
Knowledge of French or German (desirable)