Summary
Carrier Services offers seamless integration of Apple Retail Stores and the Apple Online Store with major US carriers for iPhone activations. We are looking for a talented Site Reliability Engineer (SRE) to join our growing team. As an SRE, you will be responsible for ensuring the reliability, scalability, and performance of our systems and services. You will work closely with our engineering and operations teams to design, build, and maintain robust infrastructure and automation solutions. If you are an SRE who can thrive in a dynamic environment and make a meaningful impact through your technical expertise and dedication to excellence, come join our team.
Description
This role demands extensive hands-on experience working as an SRE for large-scale, customer-facing cloud applications. The candidate should have a good understanding of SRE principles, including monitoring, alerting, error budgets, fault analysis, and other common reliability engineering concepts. Excellent troubleshooting and problem-solving skills are essential. The candidate will be expected to represent the SRE organization in design reviews and operational readiness exercises for new and existing services. The role requires collaborating with technical and non-technical teams and analyzing statistics to gain a clear picture of the current state of our system. Good working knowledge of Oracle and Cassandra databases will be beneficial. A passion for automating manual operations and improving them through repeated iteration is necessary. A good understanding of networking and load balancing concepts is also required, along with the ability to lead a small team and develop innovative solutions. Candidates should be self-motivated, capable of making business-critical decisions, and comfortable working in a dynamic, ever-changing environment. Candidates must be proactive in handling critical production issues and driving them to closure with the required partners. Participation in an on-call rotation, providing hands-on technical expertise during service-impacting events, is expected.
Minimum Qualifications
- 4 years of experience in Incident Management for large-scale, high-impact customer-facing retail applications, debugging and driving root cause analysis, prioritizing by impact, and ensuring timely resolution.
- 4 years of experience in troubleshooting, problem-solving, and debugging.
- 4 years of experience in monitoring, including building complex queries and dashboards using Splunk and Prometheus.
- 4 years of experience with at least one scripting language (e.g., Python).
- 2 years of experience with Oracle and Cassandra databases, including writing complex queries.
- BS in Computer Science or equivalent work experience.
Preferred Qualifications
- Willingness to participate in on-call rotations and provide weekend coverage as needed.
- Experience in communicating complex technical concepts to both technical and non-technical stakeholders.
- Strong problem-solving, software development, and debugging skills.
- Proven track record of taking ownership and successfully delivering results.
Additional Requirements
- Apple is an equal opportunity employer that is committed to inclusion and diversity. We take affirmative action to ensure equal opportunity for all applicants without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, Veteran status, or other legally protected characteristics. Learn more about your EEO rights as an applicant.