Armis, the cyber exposure management & security company, protects the entire attack surface and manages an organization’s cyber risk exposure in real time. In a rapidly evolving, perimeter-less world, Armis ensures that organizations continuously see, protect and manage all critical assets - from the ground to the cloud. Armis secures Fortune 100, 200 and 500 companies as well as national governments, state and local entities to help keep critical infrastructure, economies and society safe and secure 24/7.
Armis is a privately held company headquartered in California.
Location: Remote anywhere in the US
Join a dynamic and highly skilled team responsible for maintaining and ensuring the reliability of Armis’s cutting-edge services and applications. The Site Reliability Engineer / Production Engineer plays a key role in seamless deployments, real-time monitoring, and efficient management of critical services relied upon by federal customers. The SRE continually looks for ways to innovate and improve our processes, procedures, and proactive measures while maintaining compliance in a highly regulated environment.
Responsibilities include:
- Build and deploy Kubernetes services using tools such as Git, Helm, and custom tools.
- Guarantee uptime and reliability for production systems through proactive monitoring using Prometheus, Grafana, and Alertmanager.
- Develop playbooks, tools, and scripts to streamline processes and shorten problem resolution time.
- Manage vulnerability assessments and facilitate prompt remediation to maintain security and compliance.
- Troubleshoot a wide range of issues, from Helm templating to SQL tuning, determining root causes and implementing solutions that prevent future occurrences.
- Collaborate closely with members of the DevOps and Engineering teams to ensure smooth upgrades.
- Improve existing tools or create new ones that enable other engineers to perform their work more efficiently.
- Perform ad-hoc tasks requested by other teams (e.g., running SQL scripts).
- Debug a wide range of Kubernetes services.
- Debug complex Python code and provide an analysis of findings with potential solutions.
- Develop and maintain comprehensive documentation for procedures and processes to ensure knowledge sharing and continuity.
- Identify and fix gaps in processes.
- Offer on-call support during off-hours (nights and weekends) to address critical incidents and ensure system stability.
Requirements:
Ideal candidates will have:
- 3+ years of experience with Python and Bash.
- 3+ years of experience with Kubernetes, particularly with EKS.
- 3+ years of experience with Helm.
- Working knowledge of Git.
- 3+ years of experience with Prometheus/Alertmanager/Grafana and using these tools to increase proactive monitoring and assist in debugging.
- Working knowledge of AWS services.
- 3+ years of experience debugging Kubernetes services, providing solutions, and communicating findings to appropriate teams.
- 3+ years of experience debugging Python code, providing solutions, and communicating findings to appropriate teams.
Preferred Qualifications:
- Focus on creating detailed documentation.
- Ability to think outside the box and identify opportunities and solutions for process improvement.
- Team player, both within the team and with outside teams (including international ones).
Salary range guidance for this position: $100,000 - $170,000.
The salary range listed does not include other forms of compensation or benefits (e.g., bonuses, commissions, stocks, health insurance benefits, etc.) offered to candidates. Visit our careers site for more information on benefits at Armis.
Armis sets you up for success with comprehensive health benefits, discretionary time off, paid holidays including monthly me days, and a highly inclusive and diverse workplace. Put your unique experiences and perspective to work in an environment where they will enable you to thrive, grow, and live your life with integrity.
Armis is proud to be an equal opportunity employer. We never discriminate based on race, ethnicity, color, ancestry, national origin, religion, sex, sexual orientation, gender identity, age, disability, veteran status, genetic information, marital status or any other legally protected (or not) status.