Site Reliability Engineer

  • The Judge Group
Job Summary
Location: Minneapolis, MN 55400
Job Type: Contract
Visa: Any Valid Visa
Salary: PayRate
Qualification: BCA
Experience: 2 Years - 10 Years
Posted: 26 Jan 2025
Job Description

Job Title: Site Reliability Engineer

Duration: Direct hire

Location: Hybrid Role - must be able to commit to 3 days/week in our Bloomington office


What you’ll be doing:

  • Collaborate with development and operations teams to design, implement, and maintain observability frameworks that provide deep insights into system performance, particularly for data and ML pipelines.
  • Lead the establishment of Service Level Objectives (SLOs) and Service Level Indicators (SLIs), ensuring they align with business goals and drive continuous performance improvements.
  • Partner with stakeholders to understand system performance requirements and translate them into actionable performance engineering strategies.
  • Proactively identify performance bottlenecks and collaborate with teams to implement solutions that enhance system scalability and reliability.
  • Design and execute performance regression test suites, focusing on data-intensive and ML workloads, to ensure continuous performance optimization.
  • Own the reliability and performance metrics of our systems, driving a culture of performance excellence and proactive issue resolution.
  • Collaborate with subject matter experts to gain a deep understanding of domain-specific performance challenges, particularly in data and ML pipelines.
  • Utilize tools like Datadog, Jira, and GitHub to monitor system performance, manage projects, and track issues, with a strong emphasis on performance-related metrics.
  • Define and monitor success metrics, ensuring our systems consistently meet or exceed performance and reliability targets.
  • Actively contribute to the continuous improvement of performance engineering practices across the team, fostering a culture of excellence in observability and system performance.
  • Perform other duties as assigned.


What you’ll bring to us:

  • Bachelor’s degree in Computer Science, Engineering, or a related field.
  • Five years of experience in a site-reliability-focused role responsible for establishing reliability standards in a cloud-native environment.
  • Strong expertise in establishing SLOs/SLIs and building observability frameworks for complex systems.
  • Proficiency with cloud services, particularly AWS, and experience in designing scalable and reliable architectures.
  • Hands-on experience with performance monitoring and observability tools like Datadog.
  • Proficiency in version control systems like Git/GitHub and infrastructure as code tools like Terraform.
