Job Description
Join ByteDance's Global Traffic Infrastructure (GTI) team, where you'll architect and maintain the backbone of our global digital ecosystem. As a Site Reliability Engineer, you'll design scalable, high-traffic systems that serve billions of users across 150+ countries. Our unified platform uses edge infrastructure to deliver content seamlessly, reduce latency, and maintain 99.99% uptime. You'll automate infrastructure deployment, implement robust monitoring systems, and drive continuous improvement initiatives. This role offers the opportunity to solve complex challenges in distributed systems while contributing to innovations that shape the future of global internet infrastructure.
Responsibilities
- Design and implement scalable traffic routing systems for global edge infrastructure
- Automate infrastructure provisioning, monitoring, and incident response workflows
- Optimize system performance and cost-efficiency across distributed environments
- Develop fault-tolerant architectures with built-in redundancy and failover mechanisms
- Collaborate with product teams to define SLOs/SLIs and drive reliability initiatives
- Lead post-mortem analysis and implement preventive measures for critical incidents
- Contribute to open-source projects and internal tooling innovation
Qualifications
- Bachelor's degree in Computer Science/Engineering or equivalent practical experience
- 5+ years in SRE, DevOps, or infrastructure engineering roles
- Expertise in Linux systems, networking protocols, and cloud platforms (AWS/GCP/Azure)
- Proficiency in automation and orchestration tools (Ansible, Terraform, Kubernetes)
- Strong coding skills in Python/Go and scripting languages
- Experience with observability stacks (Prometheus, Grafana, ELK)
- Knowledge of CDNs, load balancing, and DNS infrastructure
- Ability to troubleshoot complex distributed systems under high load