Apex is searching for an HPC Software Engineer with 5+ years of experience. Candidate must be able to obtain a Public Trust and commute a few times per year to Rockville, MD.
If interested, email me at
Thanks :)
HPC Software Support Engineer Job Description
Responsibilities:
- Support the hardware and operating system environments of a GPU-focused 4,000+ core HPC cluster and a 1,500+ HPC cluster
- Support bioinformatics applications for a large and diverse research community with needs in genomics, cryo-electron microscopy, and AI/ML
- Monitor the portfolio of software applications and be proactive in planning upgrades and license renewals
- Monitor and report on cluster performance and generate data to show usage and trends
- Triage support requests from the research community and work with others in the Scientific Infrastructure team to resolve issues and complete service requests
- Collaborate with researchers to guide them in effective use of the HPC resources, such as job scheduler submission, data formats, and building data workflows
- Engage with researchers to understand their HPC needs, including data life cycle management, integration of scientific instruments with HPC, and storage capacity and compute requirements
- Provide input to the Scientific Infrastructure team leader for setting priorities for cluster operations, scheduling policies, resources needed, etc.
- Attend and actively participate in daily standup meetings to provide updates on progress, discuss obstacles, and coordinate tasks with other team members
- Work collaboratively in a team environment to achieve project goals
- Engage in open communication, share knowledge, and support fellow teammates
- Provide feedback and contribute to the continuous improvement of team processes
What You’ll Need to Succeed:
Education:
Required Experience:
- Five years of related experience
Required Technical Skills:
- Minimum of five years of experience with servers, datacenters, networking, and related technologies
- Minimum of five years of experience managing Linux systems
- Experience with the Spack package manager, including making packages from PyPI, R, and GitHub
- Experience installing and packaging GPU applications and optimizing job submission scripts used for ML model training, data mining operations, or high-resolution graphics rendering
- Experience with Python scripting
- Experience using Git distributed workflows
- Experience with Ansible to manage system configuration
- Experience with Terraform for provisioning systems