a. Independently deliver well-scoped projects (e.g., cluster upgrades, new GitOps workflows, a Terraform module stack, an observability rollout). b. Debug production issues across layers (Kubernetes, networking, cloud IAM, CI/CD, application configuration), not just within a single tool. c. Improve the platform, not just operate it: add automation, self-service, paved roads, and reliability patterns. d. Be comfortable participating in an on-call rotation and writing solid runbooks and alerts as the capability matures.
Collaboration & Communication:
a. Collaborate with the software engineering and IT support teams. b. Communicate effectively with stakeholders to update them on progress, results, and any issues encountered. c. Provide input into the platform design process based on experience and gathered requirements.
Reporting & Documentation:
a. Document all work, providing user guides and other documentation as required. b. Write clear check-in (commit) comments and link them to JIRA or a similar task-management tool. c. Provide regular updates on project status and results to colleagues.

Skills and Technologies:

Core Platform / Kubernetes:
a. Hands-on experience operating Kubernetes in production, including:
   o On-premises clusters (bare metal / virtualized) and cloud-managed Kubernetes (e.g., AKS, EKS, GKE).
   o Cluster lifecycle: provisioning, upgrades, scaling, backup/restore, and incident troubleshooting.
   o Working knowledge of networking, ingress, storage classes, and RBAC; service mesh experience is a nice-to-have.
b. Helm (or equivalent) for packaging and deploying services, including the ability to create, maintain, and version charts.

Infrastructure as Code & Automation:
a. Strong Terraform experience:
   o Building reusable modules, managing state, promoting changes across environments, and detecting drift.
   o Provisioning both cloud infrastructure and, where relevant, on-premises supporting services.
b. Configuration management and automation using Ansible (preferred) or equivalent:
   o Playbooks/roles for consistent host provisioning, patching, and service configuration.
c. Exposure to bare-metal provisioning is a plus (or willingness to learn):
   o PXE boot and automated OS/hardware bring-up workflows.

GitOps & CI/CD:
a. Proven use of GitOps practices for platform and application delivery.
b. Argo CD for declarative continuous delivery:
   o Application/project configuration, sync policies, multi-environment promotion, RBAC, and troubleshooting.
c. CI integration using GitHub Actions (or similar):
   o Automated testing, security scanning, Terraform plan/apply pipelines, and deployment workflows.

Security & Zero Trust:
a. Practical application of a Zero Trust security approach, including:
   o Identity-first access controls, least privilege, and strong service-to-service authentication.
   o Network policy and segmentation concepts (e.g., Kubernetes NetworkPolicies, cloud security groups).
   o Secrets management (e.g., Vault, cloud KMS, Sealed Secrets, or SOPS; the specific tooling is to be confirmed).
b. Familiarity with compliance-oriented thinking: auditability, change traceability, and secure defaults.

Cloud & Hybrid Infrastructure:
a. Solid production experience with at least one major cloud: AWS, Azure, or GCP.
   o Core services: compute, networking, IAM, load balancing, storage, and monitoring.
b. Willingness and ability to work across multiple clouds as needed.
c. Understanding of hybrid patterns: connectivity, identity federation, shared observability, and workload-placement trade-offs.

Observability & Reliability:
a. Experience implementing and operating monitoring, logging, and alerting for the platform and its workloads:
   o Prometheus/Grafana, ELK/OpenSearch, Datadog, or similar.
b. Reliability practices:
   o On-call participation, incident response, post-mortems, runbooks, and capacity planning.

Engineering Practices & Collaboration:
a. Strong scripting and automation ability (Bash, plus Python or Go preferred).
b. Comfortable working with developers to build "paved road" self-service platforms.
c. Good operational hygiene:
   o Documentation, change management, and clear communication during incidents and releases.

Authority:
a. To deliver a platform capability that supports CI/CD.
b. To run clearly scoped projects and make decisions about the exact implementation.
c. To secure the platform and its capabilities.
d. To provide guidance and knowledge to colleagues on the platform and how best to use it.

If the above JD suits your profile, please email your updated resume to