Job Description
We are seeking a talented Site Reliability Engineer (SRE) to join our Product Operations team in Singapore. In this pivotal role, you will be at the forefront of our technological evolution, applying SRE principles to enhance system reliability, performance, and scalability across our platform.
As an SRE, you will bridge the gap between development and operations, working closely with product teams to ensure our services meet the highest standards of availability and efficiency. You will leverage your expertise in automation, monitoring, and incident response to drive continuous improvement across our infrastructure.
This is an exciting opportunity for a driven professional who thrives in fast-paced environments and is passionate about building resilient systems. You will have the chance to shape our operational practices and make a significant impact on our products and customers.
Responsibilities
- Design, implement, and maintain scalable infrastructure and automated solutions to ensure system reliability and performance
- Develop and deploy monitoring, alerting, and incident management systems to proactively identify and resolve issues
- Collaborate with development teams to embed reliability principles into the software development lifecycle
- Conduct post-incident reviews and implement corrective actions to prevent future outages
- Optimize system performance through capacity planning, load testing, and performance tuning
- Create and maintain infrastructure-as-code and configuration management practices
- Champion automation initiatives to reduce manual work and improve operational efficiency
- Establish SRE best practices and documentation for the engineering organization
Qualifications
- Bachelor's degree in Computer Science, Engineering, or a related technical field
- Proven experience in Site Reliability Engineering, DevOps, or Systems Engineering roles
- Strong proficiency in programming or scripting languages such as Python, Go, or Bash
- Experience with cloud platforms (AWS, GCP, or Azure), containers (Docker), and container orchestration (Kubernetes)
- Solid understanding of networking concepts, protocols, and security best practices
- Excellent problem-solving skills with the ability to troubleshoot complex distributed systems
- Strong communication and collaboration skills to work effectively with cross-functional teams
- Experience with CI/CD pipelines, infrastructure automation tools, and monitoring solutions