Job Title: Site Reliability Engineer Lead
Location: Phoenix, AZ – Hybrid (2 days a week in the office)
Duration: 4-Month Contract to Hire
Only on W2
Day Shift: EST
Work schedule: Mon-Fri, 7:00 am - 3:30 pm ET
Additional Skills and Notes: Lead - 8+ years of experience
Job Description
Candidates should possess skills aligned with Site Reliability Engineering (SRE) principles, with a focus on the discipline specific to this position. SRE is a software engineering approach to Client’s SRC Operations. SRE professionals use software as a tool to manage systems, solve problems, and automate tasks. Engineers focus on improving all aspects of reliability, acting as a conduit between infrastructure and application teams on support issues and improving tools, automation, processes, and software.
Responsibilities | SRE
- Monitor systems and infrastructure to maintain operational and performance levels
- Rotational on-call responsibilities
- Work closely with other SRC professionals/engineers when issues arise, collaborate on troubleshooting, and provide consultation and resolution for events/incidents
- Anticipate potential problems before they cause impact and collaborate to determine solutions
- Gather and analyze metrics from tools and system/application logs to assist in performance tuning, fault finding, and resolution
- Create sustainable systems and services through automation, process enhancement, tools, and noise reduction
- Build automation to manage the SRC operations and eliminate/minimize manual functions and toil
- Collaborate with Application/Infrastructure support engineers and operations teams
- Engage in post-incident reviews to identify improvements and determine the cause to prevent recurrence
Required Knowledge, Skills, and Abilities
Possess a breadth and depth of technical and management knowledge
Continuous improvement mindset, always looking for opportunities to streamline, routinize, or automate
Working knowledge across technology in the following support areas:
- Server: Administration and troubleshooting in Linux and Windows as well as patching and basic scripting skills (PowerShell, Bash)
- Converged Solutions: Experience in VCE/UCP (including VMware versions 6 and above), platform and network connectivity, and patching – understanding of current threat analysis and remediation trends, alongside PowerShell and Linux scripting skills
- Storage: CIFS/NFS, Linux and Windows scripting, DPA reporting, Avamar and Data Domain administration, and solid understanding of Windows and Linux environments
- Middleware: Linux, Windows, WebSphere, Apache, IIS, WebLogic and Tomcat
- Mainframes: JCL, CICS, Sysplex
- Networking: Strong understanding of the network protocols and OSI Model, as well as Network+ Certification
- Workflow and Knowledge Management: ServiceNow
- Collaboration Tools: TrueSight, Jira, and Confluence
- Process: Skilled and knowledgeable in ITSM; proficiency in operations analytics methodologies to drive performance improvement (e.g., Lean)
- Strong troubleshooting and problem-solving skills, with the ability to analyze and resolve complex technical issues
- ITIL fundamentals
- Familiarity with Problem Management, Change Management, Release Management, Event Management, and Incident Management
- Operational background with Critical Incident Management skills and experience with monitoring tools such as BigPanda, Dynatrace, and TrueSight
Soft Skills
- Ability to prioritize incoming incidents by criticality in a high-volume environment
- Capable of balancing multiple projects
- Ability to quickly learn and adapt to testing and support requirements for non-production work, including creating documentation for new processes and procedures
- Strong problem-resolution skills, including the ability to drive problem bridges in a fast-paced environment
- Strong skills in addressing production-critical incidents
- Excellent communication and interpersonal skills, with the ability to collaborate effectively with stakeholders at all levels
- Self-motivated and able to work independently or as part of a team, taking ownership of tasks and driving them to completion
- Insatiable curiosity about how technologies work and how technologies interface in complex, large-scale environments
Education / Experience
Bachelor’s degree in Engineering, Computer Science, or related field required (or equivalent experience)
2 years of experience supporting a large enterprise center
Interview: Video, 2 rounds if needed.