Job Description

Are you passionate about building hyper-scale multimedia systems? ByteDance is looking for a talented Site Reliability Engineer (SRE) to join our Media Platform team in Singapore. In this role, you will be at the heart of our mission to deliver seamless video experiences to millions of global users.
The Media Platform team is responsible for architecting and maintaining a highly competitive video transmission network. As an SRE, you will bridge the gap between development and operations, focusing on system reliability, performance optimization, and cost-efficiency. You will work on cutting-edge challenges related to low-latency streaming, distributed systems, and massive-scale data processing. If you thrive in a fast-paced environment and are driven by engineering excellence, we want to hear from you.

Responsibilities

Design, build, and maintain scalable infrastructure to support our global video transmission and media processing services.
Optimize system performance and resource utilization to minimize operational costs while maintaining high availability.
Automate infrastructure provisioning, monitoring, and incident response through robust tooling and CI/CD pipelines.
Conduct deep-dive troubleshooting and root cause analysis for complex distributed system failures.
Collaborate with cross-functional software engineering teams to define and maintain Service Level Objectives (SLOs).
Proactively identify capacity bottlenecks and implement architectural improvements to support rapid traffic growth.

Qualifications

Bachelor’s or Master’s degree in Computer Science, Engineering, or a related technical field.
3+ years of experience in SRE, DevOps, or Software Engineering roles, preferably within high-traffic environments.
Proficiency in programming languages such as Go, Python, C++, or Java.
Deep understanding of Linux systems, networking protocols (TCP/IP, HTTP/HTTPS), and distributed system architecture.
Hands-on experience with containerization and orchestration technologies, specifically Kubernetes and Docker.
Experience with cloud infrastructure (AWS, GCP, or Azure) and monitoring stacks (Prometheus, Grafana, ELK).
Strong analytical mindset with a proven ability to solve complex technical problems under pressure.

Site Reliability Engineer - Media Platform

