Job Description
Are you a seasoned IT professional with a passion for ensuring system stability and leading high-performing teams? Eclaro is seeking a dynamic and experienced Lead Production Support Engineer to join our innovative team in Bonifacio Global City, Metro Manila. This pivotal role is perfect for a technical leader who thrives at the intersection of operational excellence, incident management, and continuous reliability improvement.
At Eclaro, we are committed to delivering cutting-edge solutions and maintaining the highest standards of system performance. As our Lead Production Support Engineer, you will combine deep technical expertise with strong leadership capabilities to oversee critical production environments. You will be instrumental in guiding our support initiatives, driving proactive measures to improve system uptime, and leading the swift resolution of complex incidents. This is a unique opportunity to take ownership of people, priorities, and processes, making a tangible impact on our core operations and the success of our business. If you are a proactive problem-solver with a knack for fostering a culture of reliability and operational efficiency, we invite you to apply and help us shape the future of our technical landscape.
Responsibilities
- Lead and mentor a team of Production Support Engineers, fostering their growth and ensuring high performance.
- Oversee end-to-end incident management for critical production systems, including detection, escalation, resolution, and post-incident review with root cause analysis (RCA).
- Drive proactive initiatives to enhance system reliability, stability, performance, and operational efficiency through automation and process improvements.
- Collaborate closely with Development, QA, and Operations teams to ensure seamless deployments and robust operational readiness for new features and services.
- Develop, implement, and enforce best practices for production support processes, documentation, and knowledge sharing.
- Manage critical escalations, communicate effectively with stakeholders, and provide regular updates on system health and incident status.
- Contribute to the design and implementation of monitoring, alerting, and logging solutions to ensure comprehensive system observability.
- Participate in on-call rotations and provide expert-level support for complex issues as needed.
Qualifications
- Bachelor’s degree in Computer Science, Information Technology, Engineering, or a related field.
- 5-7 years of progressive experience in IT Production Support, Site Reliability Engineering (SRE), or IT Operations, including at least 2 years in a leadership or senior role.
- Strong understanding of ITIL principles and frameworks (Incident, Problem, Change Management).
- Proficiency in monitoring and alerting tools (e.g., Splunk, ELK Stack, Prometheus, Grafana, Dynatrace).
- Solid experience with scripting languages (e.g., Python, Shell, PowerShell) for automation and troubleshooting.
- Deep knowledge of operating systems (Linux/Unix, Windows Server) and networking fundamentals.
- Experience with relational and NoSQL databases (e.g., SQL Server, Oracle, MongoDB, Cassandra).
- Familiarity with cloud platforms (AWS, Azure, GCP) and containerization technologies (Docker, Kubernetes) is a plus.
- Exceptional analytical, problem-solving, and communication skills (written and verbal).
- Ability to work effectively under pressure in a fast-paced, complex technical environment.