Job Description
Are you a skilled Site Reliability Engineer passionate about Generative AI? Ridik is looking for a dedicated SRE to join our team on a contract basis. This role is pivotal in ensuring the stability and performance of our critical applications and production environments, with a significant focus on integrating and supporting GenAI technologies.
In this position, you will bridge the gap between software development and operations, working closely with cross-functional teams to optimize infrastructure and streamline deployment pipelines. The ideal candidate will have a strong background in cloud-native technologies and a keen interest in the rapidly evolving AI landscape.
Responsibilities
- Oversee 75% of application and production support activities, ensuring high availability and minimal downtime.
- Collaborate with GenAI engineering teams to deploy, monitor, and scale machine learning models.
- Implement and maintain robust monitoring, logging, and alerting systems using industry-standard tools.
- Troubleshoot complex production incidents and drive root cause analysis (RCA) to improve system resilience.
- Automate operational tasks using scripting languages (Python, Bash) to enhance efficiency.
- Ensure security compliance and best practices across all infrastructure components.
Qualifications
- Bachelor’s degree in Computer Science, Engineering, or a related technical field.
- 3+ years of experience as an SRE, DevOps Engineer, or in a similar role.
- Hands-on experience with containerization technologies (Docker, Kubernetes).
- Proficiency in at least one major cloud provider (AWS, Azure, or GCP).
- Strong scripting skills in Python or similar languages.
- Familiarity with GenAI tools, LLMs, or AI deployment frameworks is a strong plus.