Job Description
Join the Mindteck Engineering Team
Mindteck is looking for a talented and dedicated Site Reliability Engineer (SRE) to join our expanding team in Cyberjaya, Selangor. In this role, you will be at the heart of our technical operations, bridging the gap between software development and systems engineering. Your primary mission is to ensure that our services are reliable, performant, and scalable to meet the evolving needs of our global clients.
As an SRE at Mindteck, you will treat operations as an engineering problem. You will be responsible for building and maintaining robust monitoring systems, measuring availability and latency, and ensuring the overall health of our production environments. We are looking for a professional who practices sustainable incident response and is passionate about using automation to reduce manual, repetitive toil. Cyberjaya offers a vibrant tech ecosystem, and we provide a workspace that encourages innovation, continuous learning, and technical excellence.
If you are a problem-solver who thrives on complex challenges and wants to work with cutting-edge cloud technologies, Mindteck is the place for you. Help us build the next generation of resilient infrastructure and contribute to a culture of blameless post-mortems and proactive system optimization.
Responsibilities
- Design, build, and maintain software tools to automate infrastructure and improve system reliability.
- Monitor service availability, latency, and overall system health using advanced observability platforms.
- Manage and scale cloud infrastructure, ensuring high performance and cost-efficiency.
- Lead incident response activities and conduct comprehensive, blameless post-mortems.
- Implement and optimize CI/CD pipelines to facilitate rapid and secure code deployments.
- Collaborate with development teams to define and track Service Level Objectives (SLOs) and Service Level Indicators (SLIs).
- Perform capacity planning and system tuning to support business growth.
Qualifications
- Bachelor’s Degree in Computer Science, Information Technology, or a related engineering field.
- Minimum of 3 years of experience in Site Reliability Engineering, DevOps, or Systems Administration.
- Proficiency in scripting and programming languages such as Python, Go, or Bash.
- Hands-on experience with containerization technologies including Docker and Kubernetes.
- Strong background in Infrastructure as Code (IaC) using tools like Terraform or Ansible.
- Deep understanding of cloud platforms such as AWS, Azure, or Google Cloud Platform.
- Familiarity with monitoring and logging tools like Prometheus, Grafana, or the ELK Stack.
- Excellent analytical skills and the ability to solve complex technical issues under pressure.