Design, develop, and maintain scalable data pipelines that ingest, transform, and load data into GCP services (BigQuery, Dataflow, Dataproc, Pub/Sub, Cloud Storage); a minimal pipeline sketch follows this list.
Collaborate with data architects and analysts to translate data requirements into technical specifications.
Use Debezium and Apache Flink to capture and process change data from various sources, including Oracle databases (expertise with GoldenGate preferred); an illustrative change-data-capture sketch follows this list.
Develop and implement data quality checks and procedures to ensure data accuracy and consistency.
Automate data processing tasks using scripting languages such as Python, Bash, or PowerShell.
Monitor and troubleshoot data pipelines to identify and resolve issues.
Contribute to the development and maintenance of data governance frameworks and policies.
Collaborate with other engineers to integrate data pipelines with applications and analytics tools.
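The sketch below illustrates the kind of pipeline work described in the first responsibility: an Apache Beam (Python) job that reads JSON messages from Cloud Pub/Sub, applies a simple transform, and writes rows to an existing BigQuery table via the Dataflow runner. Project, topic, table, and field names are hypothetical; treat this as a minimal sketch, not a prescribed implementation.

"""Minimal Beam/Dataflow sketch: Pub/Sub -> transform -> BigQuery."""
import json

import apache_beam as beam
from apache_beam.options.pipeline_options import PipelineOptions


def parse_message(message: bytes) -> dict:
    """Decode a Pub/Sub payload into the row shape expected by BigQuery."""
    record = json.loads(message.decode("utf-8"))
    return {"order_id": record["order_id"], "amount": float(record["amount"])}


def run() -> None:
    options = PipelineOptions(
        streaming=True,
        project="example-project",   # hypothetical project id
        region="us-central1",
        runner="DataflowRunner",
    )
    with beam.Pipeline(options=options) as pipeline:
        (
            pipeline
            | "ReadFromPubSub" >> beam.io.ReadFromPubSub(
                topic="projects/example-project/topics/orders")
            | "ParseJson" >> beam.Map(parse_message)
            # The destination table is assumed to already exist with a
            # matching schema, hence CREATE_NEVER.
            | "WriteToBigQuery" >> beam.io.WriteToBigQuery(
                "example-project:analytics.orders",
                write_disposition=beam.io.BigQueryDisposition.WRITE_APPEND,
                create_disposition=beam.io.BigQueryDisposition.CREATE_NEVER,
            )
        )


if __name__ == "__main__":
    run()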
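The second sketch relates to the Debezium/Flink responsibility. It assumes Debezium publishes Oracle change events to a Kafka topic in debezium-json format and uses the PyFlink Table API to consume and aggregate that change stream; all topic, column, and connection names are illustrative assumptions.

"""Minimal PyFlink sketch: consume Debezium change events and maintain an aggregate."""
from pyflink.table import EnvironmentSettings, TableEnvironment

t_env = TableEnvironment.create(EnvironmentSettings.in_streaming_mode())

# Source: Debezium change events for a hypothetical ORDERS table,
# streamed from Oracle into Kafka by a Debezium connector.
t_env.execute_sql("""
    CREATE TABLE orders_cdc (
        ORDER_ID BIGINT,
        AMOUNT DECIMAL(10, 2),
        STATUS STRING
    ) WITH (
        'connector' = 'kafka',
        'topic' = 'oracle.inventory.orders',
        'properties.bootstrap.servers' = 'kafka:9092',
        'properties.group.id' = 'orders-cdc-consumer',
        'scan.startup.mode' = 'earliest-offset',
        'format' = 'debezium-json'
    )
""")

# Sink: an upsert view of current order totals per status.
t_env.execute_sql("""
    CREATE TABLE order_totals (
        STATUS STRING,
        TOTAL_AMOUNT DECIMAL(18, 2),
        PRIMARY KEY (STATUS) NOT ENFORCED
    ) WITH (
        'connector' = 'upsert-kafka',
        'topic' = 'order-totals',
        'properties.bootstrap.servers' = 'kafka:9092',
        'key.format' = 'json',
        'value.format' = 'json'
    )
""")

# The changelog semantics of debezium-json let Flink keep the aggregate
# consistent as source rows are inserted, updated, or deleted.
t_env.execute_sql("""
    INSERT INTO order_totals
    SELECT STATUS, SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM orders_cdc
    GROUP BY STATUS
""")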
Skill Set:
Minimum of 5 years of experience building and operationalizing large-scale enterprise data solutions using GCP data and analytics services (Cloud Dataproc, Cloud Dataflow, Cloud Composer, Bigtable, BigQuery, Cloud Pub/Sub, Cloud Storage, Cloud Functions) alongside third-party tools such as Spark, Hive, Apache Beam, and GitHub.
Strong proficiency in Python development is mandatory.
3+ years of experience working with data pipelines, ETL processes, and data warehousing concepts.
Expertise in Debezium and Apache Flink for change data capture and processing.
Strong understanding of Oracle databases; familiarity with GoldenGate is highly desired.
Working knowledge of SQL and experience with relational databases.
Familiarity with data modeling concepts and experience with data modeling tools are a plus.
Experience with scripting languages such as Bash or PowerShell.
Experience with data version control systems such as Git is a plus.
Understanding of data security best practices is essential.
Excellent communication and collaboration skills are required.
Ability to work independently and as part of a team is essential.
Experience with containerization and orchestration technologies such as Docker and Kubernetes.
Must hold the Google Cloud Professional Data Engineer certification or an equivalent data engineering certification.