Job Title: Automation Engineer - Cloud and Reliability
Job Responsibilities:
- Develop scripts to automate processes and reduce toil and failures.
- Monitor the health of applications, batch processes, and data feeds.
- Set up monitoring systems and develop dashboards for performance tracking.
- Lead and triage major incidents, investigating and troubleshooting failures.
- Perform problem management to identify root causes and implement preventative actions using monitoring and automation.
- Support disaster recovery exercises to ensure system resilience.
- Collaborate with peers, vendors, and managers to resolve issues and meet objectives.
Required Qualifications:
- 2+ years of experience working on a Site Reliability Engineering (SRE) team.
- 2+ years of experience supporting cloud applications on platforms such as GCP, Azure, or OpenShift.
- 2+ years of experience working across the Software Development Life Cycle (SDLC).
- 3+ years of experience in automation using Unix Shell scripting, Python, or similar technologies.
- 3+ years of experience with AutoSys.
- 3+ years of experience with databases such as Oracle, MSSQL, Teradata, or MongoDB.
Desired Qualifications:
- Experience with ETL tools such as Informatica or Ab Initio.
- Familiarity with Enterprise Data Lake technologies, including Spark, Hive, and MapR.
- Experience with Java, JavaScript, or Talend.
- 3+ years of experience in observability, including setting up monitoring with Elastic APM or AppDynamics and visualizing data with Grafana dashboards.