Lead Site Reliability Engineer

  • Tbwa Chiat/Day Inc
Job Summary
Location
,MS
Job Type
Contract
Visa
Any Valid Visa
Salary
PayRate
Qualification
BCA
Experience
2 Years - 10 Years
Posted
15 Mar 2025
Job Description

Roadie, a UPS Company, is a logistics management and crowdsourced delivery platform. Founded in 2014, Roadie offers businesses fast, flexible and asset-light logistics solutions for last-mile delivery. Roadie enables local delivery to more than 95% of U.S. households by providing access to more than 200,000 independent drivers nationwide – allowing businesses to offer their customers delivery optionality for almost any industry, from airlines to artisans.

Roadie is seeking a Senior Site Reliability Engineer to join our growing Technical Operations Team. We are looking for a candidate with experience implementing site reliability principles and running Kubernetes in production. The ideal candidate is a skilled problem solver with deep knowledge of site reliability practices, standard DevOps principles, AWS, scripting languages, and Kubernetes.

What You'll Do

  • Build systems that optimize the uptime and reliability of our platform, and support the management and optimization of our software delivery pipeline, observability and infrastructure operations.
  • Maintain, support, and engineer production and non-production Kubernetes clusters (EKS), as well as Elasticsearch, MSK, RDS, and ElastiCache (Redis) clusters.
  • Deploy and maintain monitoring and logging solutions based on Prometheus, Loki, Thanos, Grafana, OpenTelemetry and New Relic.
  • Collaborate with cross-functional teams to identify and address potential bottlenecks, optimize resource utilization, and proactively prevent system failures.
  • Define and manage SLOs, SLIs, and error budgets.
  • Develop processes, tools and automation to reduce toil across engineering teams.
  • Plan and forecast service capacity and demand, assess cost optimization, and tune systems and software.
  • Debug production / non-production issues.
  • Take part in 24/7 on-call rotation.

What You Bring

  • 5+ years in various SRE roles.
  • 5+ years in various DevOps/systems engineering roles.
  • 5+ years of experience building and managing production Kubernetes infrastructure.
  • 6+ years of experience with popular scripting languages (Python, Ruby, Bash, etc.).
  • Experience with infrastructure-as-code tools such as Terraform or Crossplane.
  • Experience with CI/CD tools (CircleCI, etc.).
  • Experience with GitOps tools (Argo CD).
  • Experience using a broad range of AWS technologies (RDS, Elasticsearch, VPC, EKS, S3, CloudFront, MSK, ElastiCache, CloudWatch, etc.).
  • Experience developing and maintaining YAML templating systems (Helm charts, Kustomize, etc.).
  • Must be able to work independently, be self-motivated and handle multiple priorities.
  • Comfortable working in a fast-paced agile environment.

Finally, you bring a willingness to admit what you don’t know and to quickly learn what you need to.

Why Roadie?

  • Competitive compensation packages.
  • 100% covered health insurance premiums for yourself.
  • 401k with company match.
  • Tuition and student loan repayment assistance (that’s right - Roadie will contribute directly to your existing student loans!).
  • Flexible work schedule with unlimited PTO.
  • Monthly WFH stipend.
  • Paid sabbatical leave: tenured team members are given time to rest, relax, and explore.
  • The technology you need to get the job done.

This role is not eligible for visa sponsorship. Applicants must be authorized to work for any employer in the U.S.
