Pipeline Design & Development: Create, maintain, and optimize scalable ETL/ELT pipelines and workflows for reliable data ingestion from diverse sources (a minimal pipeline sketch follows this list).
Database Management: Write complex SQL queries and stored procedures, and design relational data models/schemas for performance and scalability.
Performance Optimization: Tune SQL queries, database functions, and ETL jobs for cost-efficiency and speed (see the tuning sketch after this list).
Data Quality & Integrity: Implement robust testing, validation, and monitoring to ensure data accuracy, completeness, and reliability.
Architecture & Collaboration: Work with data scientists and engineers to build data infrastructure that supports the full AI/ML lifecycle.
Data Modeling & Warehousing: Develop Data Flow Diagrams (DFDs) and ER models with a focus on data warehousing, using platforms like Snowflake, Redshift, or Oracle (a star-schema sketch follows this list).
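To make the pipeline and data-quality responsibilities above concrete, here is a minimal sketch of an extract-validate-load workflow, assuming a recent Airflow 2.x install; the DAG id, task bodies, and inline data are hypothetical placeholders, not a prescribed implementation.

```python
from datetime import datetime

from airflow.decorators import dag, task


@dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
def orders_etl():  # hypothetical DAG name
    @task
    def extract():
        # Pull a batch from the source system; stubbed with inline rows here.
        return [{"order_id": 1, "amount": 42.0}, {"order_id": 2, "amount": 7.5}]

    @task
    def validate(rows):
        # Fail the run early if the batch is empty or a required field is
        # null, so bad data never reaches the warehouse.
        if not rows:
            raise ValueError("extract returned no rows")
        for row in rows:
            if row.get("amount") is None:
                raise ValueError(f"null amount in {row}")
        return rows

    @task
    def load(rows):
        # Write to the warehouse; stubbed as a log line here.
        print(f"loading {len(rows)} validated rows")

    load(validate(extract()))


orders_etl()
```

Chaining the tasks through return values keeps the dependency graph explicit, and a failed validation stops the load task from ever running.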
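For the query-tuning responsibility, a typical loop is to read the actual execution plan first, then add the index the plan suggests is missing. A minimal sketch against PostgreSQL via psycopg2; the connection string, table, and index names are hypothetical.

```python
import psycopg2

# Hypothetical PostgreSQL DSN and table names; adjust for your environment.
conn = psycopg2.connect("dbname=analytics user=etl")

with conn, conn.cursor() as cur:
    # Inspect the real plan and timings before changing anything.
    cur.execute(
        "EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = %s", (42,)
    )
    for (line,) in cur.fetchall():
        print(line)  # a Seq Scan on a large table suggests a missing index

    # An index on the filter column typically turns the Seq Scan into an
    # Index Scan; re-run the EXPLAIN above to confirm the improvement.
    cur.execute(
        "CREATE INDEX IF NOT EXISTS idx_orders_customer ON orders (customer_id)"
    )
```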
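And for the modeling and warehousing item, a star schema is the usual target shape: a fact table of measurable events keyed to descriptive dimension tables. A minimal sketch with hypothetical table names, written as PostgreSQL-flavored DDL; the same pattern carries over to Snowflake or Redshift.

```python
import psycopg2

# Hypothetical warehouse connection; table and column names are illustrative.
STAR_SCHEMA_DDL = [
    """
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name VARCHAR(200),
        region        VARCHAR(50)
    )
    """,
    """
    CREATE TABLE fact_orders (
        order_key    INTEGER PRIMARY KEY,
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        order_date   DATE,
        amount       DECIMAL(12, 2)
    )
    """,
]

conn = psycopg2.connect("dbname=analytics user=etl")
with conn, conn.cursor() as cur:
    for statement in STAR_SCHEMA_DDL:
        cur.execute(statement)
```

Note that Snowflake and Redshift accept but do not enforce foreign-key constraints, which is one reason the validation step in the pipeline sketch above matters.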
Languages: Strong proficiency in SQL (complex queries) and Python.
Database Systems: Expert-level knowledge of RDBMS (Oracle, MS SQL, PostgreSQL, MySQL).
ETL/Workflow Tools: Experience with Apache Airflow, Kafka, Spark, or similar technologies.
Cloud/Big Data: Experience with AWS (S3, Redshift, Athena) or cloud-native data warehouses (see the S3-to-Redshift load sketch after this list).
Data Governance: Experience implementing security controls and data modeling best practices, and maintaining clear documentation.
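As a concrete instance of the cloud skills above, loading a partition from S3 into Redshift is commonly done with a COPY statement issued over a standard PostgreSQL driver. A minimal sketch; the cluster endpoint, credentials, schema, bucket path, and IAM role are all hypothetical placeholders.

```python
import psycopg2

# Every identifier below (endpoint, schema, bucket, role) is hypothetical.
conn = psycopg2.connect(
    host="example-cluster.abc123.us-east-1.redshift.amazonaws.com",
    port=5439,
    dbname="analytics",
    user="etl",
    password="replace-me",
)

COPY_SQL = """
    COPY staging.orders
    FROM 's3://example-bucket/orders/dt=2024-01-01/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/example-redshift-loader'
    FORMAT AS PARQUET
"""

with conn, conn.cursor() as cur:
    cur.execute(COPY_SQL)  # Redshift pulls the partition directly from S3
```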