Job Description
At ConnectOS, we are looking for a skilled Data Engineer - AWS to join our global team, focusing on the education sector. In this critical role, you will collaborate with the Data Science Team and enterprise capability teams to drive data transformation initiatives. You will be responsible for designing, building, and maintaining scalable data pipelines on the AWS cloud platform, leveraging services like AWS Glue, Amazon Redshift, Amazon S3, AWS Lambda, and Amazon EMR.
Your primary focus will be to ensure efficient data ingestion, processing, and storage to support advanced analytics and machine learning models. You will implement robust ETL processes, clean and transform data, and optimize data warehouse performance. Data quality and integrity will be paramount, requiring you to establish monitoring frameworks and best practices. You will also work closely with data scientists to understand their data needs and provide timely, reliable datasets.
As a Data Engineer, you will contribute to the architecture of data systems, recommend improvements, and stay abreast of the latest AWS cloud technologies. You will be expected to document processes, mentor junior team members, and contribute to a culture of knowledge sharing. This role offers an exciting opportunity to work in a dynamic, fast-paced environment where your work directly impacts business decisions and outcomes. You will be part of a supportive global team committed to innovation and professional growth.
Key aspects of the role include:
- Designing and implementing end-to-end data pipelines using AWS services
- Collaborating with data scientists to understand requirements and deliver high-quality data
- Ensuring data security, compliance, and governance within the cloud environment
- Optimizing data storage and retrieval for cost and performance
- Automating recurring data processes using scripts and orchestration tools
- Troubleshooting data issues and providing root cause analysis
- Participating in Agile ceremonies and sprint planning
Responsibilities
- Design and implement scalable ETL pipelines using AWS Glue, Lambda, and Step Functions.
- Build and manage data warehouses in Amazon Redshift, including schema design and query optimization.
- Develop data processing applications in Python and SQL to transform and clean large datasets.
- Monitor and troubleshoot data pipeline performance, ensuring high availability and reliability.
- Collaborate with data scientists and analysts to understand data requirements and deliver curated datasets.
- Maintain comprehensive documentation of data architecture, pipelines, and best practices.
- Implement data quality checks and governance frameworks to ensure data accuracy and security.
Qualifications
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- At least 3 years of experience in data engineering or a similar role.
- Strong proficiency in AWS services: Glue, S3, Redshift, Lambda, EMR, and IAM.
- Expertise in Python and SQL for data manipulation and scripting.
- Experience with ETL frameworks and data pipeline orchestration (e.g., Apache Airflow).
- Solid understanding of data modeling, data warehousing, and big data technologies (Spark, Hive).
- Excellent problem-solving skills and attention to detail.
- Effective communication skills and ability to work in a collaborative team environment.