Roadie, a UPS Company, is a logistics management and crowdsourced delivery platform. Founded in 2014, Roadie offers businesses fast, flexible and asset-light logistics solutions for last-mile delivery. Roadie enables local delivery to more than 95% of U.S. households by providing access to more than 200,000 independent drivers nationwide – allowing businesses to offer their customers delivery optionality for almost any industry, from airlines to artisans.
Roadie is seeking a Senior Site Reliability Engineer to join our growing Technical Operations Team. We are looking for a candidate who has experience implementing site reliability principles, as well as production-level Kubernetes experience. The ideal candidate is a skilled problem solver with intimate knowledge of site reliability practices, standard DevOps principles, AWS, scripting languages and Kubernetes.
What You'll Do
- Build systems that optimize the uptime and reliability of our platform, and support the management and optimization of our software delivery pipeline, observability and infrastructure operations.
- Maintain, support, and engineer production and non-production Kubernetes clusters (EKS) as well as Elasticsearch, MSK, RDS, and ElastiCache (Redis) clusters.
- Deploy and maintain monitoring and logging solutions based on Prometheus, Loki, Thanos, Grafana, OpenTelemetry and New Relic.
- Collaborate with cross-functional teams to identify and address potential bottlenecks, optimize resource utilization, and proactively prevent system failures.
- Define and manage SLOs, SLIs and error budgets.
- Develop processes, tools and automation to reduce toil across engineering teams.
- Plan and forecast service capacity and demand, assess cost optimization, and tune systems and software.
- Debug production / non-production issues.
- Take part in a 24/7 on-call rotation.
What You Bring
- 5+ years in various SRE roles.
- 5+ years in various DevOps/Systems Engineering roles.
- 5+ years of experience building and managing production Kubernetes infrastructure.
- 6+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.).
- Experience with infrastructure-as-code tools such as Terraform or Crossplane.
- Experience with CI/CD tools (CircleCI, etc.).
- Experience with GitOps tools (ArgoCD).
- Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.).
- Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.).
- Must be able to work independently, be self-motivated and handle multiple priorities.
- Comfortable working in a fast-paced agile environment.
Finally, a willingness to admit what you don’t know, and learn what you need to learn quickly.
Why Roadie?
- Competitive compensation packages.
- 100% covered health insurance premiums for yourself.
- 401k with company match.
- Tuition and student loan repayment assistance (that’s right - Roadie will contribute directly to your existing student loans!).
- Flexible work schedule with unlimited PTO.
- Monthly WFH stipend.
- Paid sabbatical leave: tenured team members are given time to rest, relax, and explore.
- The technology you need to get the job done.
This role is not eligible for visa sponsorship. Applicants must be authorized to work for any employer in the U.S.