Data Engineering

Build reliable data pipelines, from ingestion to analytics. The course covers ETL vs ELT, batch vs streaming, Apache Spark, Airflow orchestration, dbt transformations, data warehousing (Snowflake, BigQuery, Redshift), data lakes, and the modern data stack that turns raw data into business value.
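
The overview names ETL vs ELT. The practical difference is where the transform runs. In ETL, pipeline code cleans the data before loading it. In ELT, raw data is loaded first and then transformed with SQL inside the warehouse. The sketch below is a minimal illustration under stated assumptions. It uses an in-memory SQLite database as a stand-in for a warehouse such as Snowflake or BigQuery. The CSV extract, table names and helper functions (`extract`, `etl`, `elt`) are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical raw extract; a real pipeline would pull this from an API, file drop, or database.
RAW_CSV = """order_id,amount,country
1,19.99,us
2,5.00,GB
3,,us
4,42.50,de
"""


def extract(text: str) -> list[dict]:
    """Extract: parse raw CSV rows into dicts (every value is still a string)."""
    return list(csv.DictReader(io.StringIO(text)))


def etl(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """ETL: clean rows in Python first, then load only the transformed result."""
    clean = [
        (int(r["order_id"]), float(r["amount"]), r["country"].upper())
        for r in rows
        if r["amount"]  # drop rows with a missing amount before loading
    ]
    conn.execute("CREATE TABLE orders_etl (order_id INTEGER, amount REAL, country TEXT)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?, ?)", clean)


def elt(conn: sqlite3.Connection, rows: list[dict]) -> None:
    """ELT: load raw strings as-is, then transform with SQL inside the database."""
    conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, country TEXT)")
    # Named placeholders let sqlite3 read values straight from each row dict.
    conn.executemany("INSERT INTO raw_orders VALUES (:order_id, :amount, :country)", rows)
    conn.execute(
        """
        CREATE TABLE orders_elt AS
        SELECT CAST(order_id AS INTEGER) AS order_id,
               CAST(amount AS REAL)      AS amount,
               UPPER(country)            AS country
        FROM raw_orders
        WHERE amount <> ''
        """
    )


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    rows = extract(RAW_CSV)
    etl(conn, rows)
    elt(conn, rows)
    for table in ("orders_etl", "orders_elt"):
        count, total = conn.execute(f"SELECT COUNT(*), ROUND(SUM(amount), 2) FROM {table}").fetchone()
        print(table, count, total)
```

Both paths end with the same cleaned rows. The ELT variant also keeps the untouched `raw_orders` table in the database. That is one reason ELT pairs naturally with in-warehouse transformation tools such as dbt: transformations can be re-run later without re-extracting the source data.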

Fundamentals · Topics 1–10
  • What is Data Engineering?
  • The Modern Data Stack
  • Batch vs Streaming
  • ETL vs ELT
  • Star Schema Modelling
  • File Formats: Parquet vs CSV
  • Lake, Warehouse, Lakehouse
  • Medallion Architecture
  • Data Quality & Lineage
  • The Data Engineer's Role
Intermediate · Topics 1–10
  • Python for Pipelines
  • Pandas ETL Patterns
  • PySpark: Distributed Compute
  • Kafka: Streaming Basics
  • Change Data Capture (CDC)
  • dbt: Models & Testing
  • Connector Ingestion
  • REST API Ingestion
  • Incremental vs Full Load
  • Data Contracts
Intermediate · Topics 1–10
  • Why Orchestration?
  • Airflow Architecture
  • Writing Your First DAG
  • Sensors & Triggers
  • XComs & Task Communication
  • Prefect: Flows and Tasks
  • Dagster: Asset-Based Thinking
  • Scheduling Strategies
  • Failure Handling & Retries
  • Orchestration Best Practices
Advanced · Topics 1–10
  • BigQuery Architecture
  • Snowflake Virtual Warehouses
  • Columnar Storage Internals
  • Partitioning & Clustering
  • dbt Incremental Models
  • Slowly Changing Dimensions
  • Query Optimisation
  • Warehouse Cost Management
  • The Lakehouse (Delta/Iceberg)
  • Data Vault 2.0
Production · Topics 1–10
  • DataOps & CI/CD
  • Data Observability
  • Real-Time Analytics
  • Data Mesh Architecture
  • Cost at Scale
  • Secrets & Access Patterns
  • Disaster Recovery
  • Data Cataloging
  • SLAs for Data Products
  • The Modern DE Career