Job Description:
Proven experience in Site Reliability Engineering, DevOps, or a similar role.
Skills:
- Proficiency in programming and scripting languages (e.g., Python, Go, Bash).
- Proven experience managing Kubernetes in production environments.
- Experience with cloud platforms (Azure) and container technologies (Docker, Kubernetes).
- Experience with monitoring and observability tools (e.g., Prometheus, Grafana, Datadog, AppDynamics, New Relic, Elasticsearch).
- Strong understanding of distributed systems, microservices architecture, and networking.
- Expertise in designing monitoring systems with KPIs, SLOs, and SLIs.
- Experience with incident response, postmortem analysis, and reliability reporting.