Job Description
Are you passionate about building hyper-scale multimedia systems? ByteDance is looking for a talented Site Reliability Engineer (SRE) to join our Media Platform team in Singapore. In this role, you will be at the heart of our mission to deliver seamless video experiences to millions of global users.
The Media Platform team is responsible for architecting and maintaining a highly competitive video transmission network. As an SRE, you will bridge the gap between development and operations, focusing on system reliability, performance optimization, and cost efficiency. You will work on cutting-edge challenges in low-latency streaming, distributed systems, and massive-scale data processing. If you thrive in a fast-paced environment and are driven by engineering excellence, we want to hear from you.
Responsibilities
- Design, build, and maintain scalable infrastructure to support our global video transmission and media processing services.
- Optimize system performance and resource utilization to minimize operational costs while maintaining high availability.
- Automate infrastructure provisioning, monitoring, and incident response through robust tooling and CI/CD pipelines.
- Conduct deep-dive troubleshooting and root cause analysis for complex distributed system failures.
- Collaborate with cross-functional software engineering teams to define and maintain Service Level Objectives (SLOs).
- Proactively identify capacity bottlenecks and implement architectural improvements to support rapid traffic growth.
Qualifications
- Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
- 3+ years of experience in SRE, DevOps, or Software Engineering roles, preferably within high-traffic environments.
- Proficiency in programming languages such as Go, Python, C++, or Java.
- Deep understanding of Linux systems, networking protocols (TCP/IP, HTTP/HTTPS), and distributed system architecture.
- Hands-on experience with containerization and orchestration technologies, specifically Kubernetes and Docker.
- Experience with cloud infrastructure (AWS, GCP, or Azure) and monitoring stacks (Prometheus, Grafana, ELK).
- Strong analytical mindset with a proven ability to solve complex technical problems under pressure.