Senior Compiler Engineer
Our client aims to revolutionize hardware and systems through a software-first approach, empowering AI innovators to surpass previous limits. The goal is to alleviate computational burdens, streamline model training and deployment, and ultimately maximize societal benefits from this transformative technology.
Their SPU (Spatial Processing Unit) stands as the pinnacle of programmable digital processors, significantly reducing AI-related costs. Efforts span all engineering layers, encompassing hardware, runtime compilers, kernel optimization, algorithm development, and software architecture.
We are looking for a Senior Compiler Engineer to drive compiler optimization for the client's state-of-the-art technology, improving code efficiency on its specialized hardware. You will join a collaborative team dedicated to innovative problem-solving and quality product creation, and make a lasting impact on the future of AI.
Responsibilities:
- Lead the design, enhancement, and maintenance of the next-generation SPU compiler.
- Propose and implement enhancements to the Intermediate Representation (IR) to accommodate emerging trends in machine learning model architectures.
- Develop novel compiler passes and scheduling techniques to optimize code generation.
- Employ state-of-the-art parallelization and partitioning methodologies to automate kernel generation and exploit optimized kernels.
- Engage in rapid prototyping and data-driven exploration to evaluate new concepts.
- Benchmark and analyze compiler outputs on SPU hardware, ensuring peak performance.
- Collaborate closely with hardware and software teams to align with the evolving requirements of ML engineers and drive architectural improvements.
- Develop tools for performance bottleneck analysis.
Qualifications:
- Bachelor's degree in computer science, computer engineering, electrical engineering, or equivalent; preference given to applicants with a Master's or PhD.
- 2+ years of experience in compiler development, particularly in compiler backends and retargeting.
- Proficiency (5+ years) in C/C++ (C++14 or newer) and Python.
- Understanding of functional programming principles.
- Familiarity with loop optimization techniques (vectorization, unrolling, fusion, parallelization, etc.).
- Experience with FPGAs or CGRAs.
- Knowledge of DL frameworks such as TensorFlow or PyTorch preferred, but not required.
- Working knowledge of LLVM, MLIR, and polyhedral models.
- Exposure to ONNX is advantageous.