Job Description
Are you passionate about high-performance computing and artificial intelligence? Join the Agency for Science, Technology and Research (A*STAR) as an HPC AI Engineer on the Frontier platform at the National Supercomputing Centre (NSCC). In this role you will design, implement and optimise cutting-edge AI solutions that empower research across healthcare, scientific discovery, and beyond.
You will work closely with domain scientists, data engineers and system administrators to translate complex computational challenges into scalable AI models that run efficiently on massive GPU clusters. Your expertise will ensure that our Frontier supercomputer remains at the forefront of AI innovation, delivering measurable impact for Singapore's research ecosystem.
Key responsibilities include developing and fine-tuning deep-learning pipelines, profiling and accelerating compute-intensive workloads, and creating robust software frameworks that enable seamless experimentation. You will also contribute to documentation, user training, and open-source projects, fostering a culture of knowledge sharing and continuous improvement.
If you thrive in a collaborative, multidisciplinary environment and want to shape the future of AI-driven scientific research, we'd love to hear from you.
Responsibilities
- Design, develop and optimise deep-learning models and pipelines for high-performance GPU clusters.
- Profile and improve the performance of large-scale AI workloads on the Frontier supercomputer.
- Collaborate with researchers to translate scientific problems into scalable AI solutions.
- Maintain and enhance HPC software stacks, including CUDA, MPI, and containerised environments.
- Produce detailed technical documentation, best-practice guides, and training materials.
- Contribute to open-source projects and internal codebases, promoting reproducibility and efficiency.
- Support system administrators in troubleshooting hardware and software issues related to AI workloads.
Qualifications
- PhD or Master's degree in Computer Science, Electrical Engineering, Data Science, or a related discipline.
- 3+ years of hands-on experience with HPC environments and large-scale GPU computing.
- Proficiency in programming languages such as Python, C++, and CUDA.
- Strong experience with deep-learning frameworks (e.g., TensorFlow, PyTorch) and data-processing libraries.
- In-depth knowledge of parallel programming models (MPI, OpenMP) and job schedulers (Slurm, PBS).
- Familiarity with Linux system administration and software-deployment workflows.
- Excellent problem-solving skills and the ability to communicate complex technical concepts to non-expert audiences.
- Prior exposure to national or international supercomputing facilities is a plus.