Job Description
Are you passionate about pushing the boundaries of computing power? Join the National Supercomputing Centre (NSCC) Singapore and the Agency for Science, Technology and Research (A*STAR) as an HPC System Engineer. You will play a pivotal role in maintaining and optimizing our world-class supercomputing infrastructure. Our team is dedicated to supporting cutting-edge scientific research, ensuring high availability, scalability, and robust security across our systems. We are looking for a proactive professional to manage complex system architectures and drive technical excellence. This is a unique opportunity to work with state-of-the-art hardware and collaborate with leading researchers in Singapore.
In this role, you will bridge the gap between hardware and software, ensuring our high-performance computing environment delivers peak performance for critical research projects. We value innovation, collaboration, and technical depth, offering you a chance to shape the future of computational science in the region.
Responsibilities
- Design, deploy, and manage high-performance computing (HPC) clusters, storage systems, and network infrastructure.
- Monitor system performance, troubleshoot complex hardware and software issues, and implement optimizations to sustain performance and uptime.
- Ensure the security and integrity of the supercomputing infrastructure through regular audits, patch management, and access control.
- Collaborate with software developers and researchers to optimize applications for parallel processing environments and HPC workflows.
- Develop and maintain automation scripts and tools (e.g., Python, Bash) to streamline system administration and deployment tasks.
- Provide technical support, training, and guidance to internal users regarding HPC resource utilization and best practices.
- Stay abreast of the latest advancements in HPC technologies and evaluate their applicability to our evolving research needs.
Qualifications
- Bachelor’s degree or higher in Computer Science, Engineering, Physics, or a related technical field.
- Proven experience in system administration, preferably within an HPC or large-scale data center environment.
- Strong proficiency in Linux/Unix environments and scripting languages (Bash, Perl, Python).
- Deep understanding of networking protocols, storage systems (SAN/NAS), and virtualization technologies.
- Solid knowledge of high-performance computing concepts, including MPI, job scheduling (e.g., Slurm, PBS), and containerization (Docker/Singularity).
- Excellent problem-solving skills and the ability to work in a fast-paced, collaborative team setting.
- Relevant certifications (e.g., RHCE, SUSE Certified Engineer) are highly desirable.