Course Outline

Introduction, Objectives, and Migration Strategy

  • Course goals, alignment with participant profiles, and success criteria.
  • High-level migration approaches and risk considerations.
  • Setup of workspaces, repositories, and lab datasets.

Day 1 — Migration Fundamentals and Architecture

  • Lakehouse concepts, Delta Lake overview, and Databricks architecture.
  • SMP versus MPP architectures and what the differences imply for migration.
  • Medallion (Bronze→Silver→Gold) design and Unity Catalog overview.

Day 1 Lab — Translating a Stored Procedure

  • Hands-on migration of a sample stored procedure to a notebook.
  • Mapping temporary tables and cursors to DataFrame transformations.
  • Validation and comparison with original output.
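
The validation step in this lab can be pictured as an order-insensitive comparison of two result sets. The following is a minimal sketch in plain Python, assuming both outputs are small enough to be pulled into memory as lists of tuples; the helper name `compare_outputs` and the sample rows are hypothetical and are not part of the course materials.

```python
from collections import Counter


def compare_outputs(expected_rows, actual_rows):
    """Compare two result sets as multisets, ignoring row order.

    Duplicate rows are counted, so a row that appears twice in the
    legacy output but once in the migrated output is reported as missing.
    """
    expected = Counter(map(tuple, expected_rows))
    actual = Counter(map(tuple, actual_rows))
    return {
        # Rows the legacy procedure produced but the notebook did not.
        "missing": sorted((expected - actual).elements()),
        # Rows the notebook produced but the legacy procedure did not.
        "unexpected": sorted((actual - expected).elements()),
    }


# Hypothetical sample data: (order_id, status, amount)
legacy = [(1, "open", 120.0), (2, "closed", 75.5), (2, "closed", 75.5)]
migrated = [(2, "closed", 75.5), (1, "open", 120.0), (3, "open", 10.0)]

report = compare_outputs(legacy, migrated)
print(report)
```

For tables too large to collect, the same idea is usually expressed inside the engine, for example with the DataFrame `exceptAll` method run in both directions. Note that the `sorted` call in this sketch assumes rows contain no `None` values.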

Day 2 — Advanced Delta Lake & Incremental Loading

  • ACID transactions, commit logs, versioning, and time travel capabilities.
  • Auto Loader, MERGE INTO patterns, upserts, and schema evolution.
  • OPTIMIZE, VACUUM, Z-ORDER, partitioning, and storage tuning.

Day 2 Lab — Incremental Ingestion & Optimization

  • Implementation of Auto Loader ingestion and MERGE workflows.
  • Application of OPTIMIZE, Z-ORDER, and VACUUM; validation of results.
  • Measurement of read/write performance improvements.

Day 3 — SQL in Databricks, Performance & Debugging

  • Analytical SQL features: window functions, higher-order functions, and JSON/array handling.
  • Reading the Spark UI, DAGs, shuffles, stages, tasks, and diagnosing bottlenecks.
  • Query tuning patterns: broadcast joins, hints, caching, and spill reduction.

Day 3 Lab — SQL Refactoring & Performance Tuning

  • Refactoring a complex SQL process into optimized Spark SQL.
  • Using the Spark UI to identify and resolve skew and shuffle issues.
  • Benchmarking before/after scenarios and documenting tuning steps.

Day 4 — Tactical PySpark: Replacing Procedural Logic

  • Spark execution model: driver, executors, lazy evaluation, and partitioning strategies.
  • Transforming loops and cursors into vectorized DataFrame operations.
  • Modularization, UDFs/pandas UDFs, widgets, and reusable libraries.

Day 4 Lab — Refactoring Procedural Scripts

  • Refactoring a procedural ETL script into modular PySpark notebooks.
  • Introduction of parameterization, unit-style tests, and reusable functions.
  • Code review and application of best-practice checklists.

Day 5 — Orchestration, End-to-End Pipeline & Best Practices

  • Databricks Workflows: job design, task dependencies, triggers, and error handling.
  • Designing incremental Medallion pipelines with quality rules and schema validation.
  • Integration with Git (GitHub/Azure DevOps), CI, and testing strategies for PySpark logic.

Day 5 Lab — Build a Complete End-to-End Pipeline

  • Assembly of a Bronze→Silver→Gold pipeline orchestrated with Workflows.
  • Implementation of logging, auditing, retries, and automated validations.
  • Running the full pipeline, validating outputs, and preparing deployment notes.
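
The retry and auditing items above can be illustrated with a small wrapper around a pipeline step. This is a sketch under stated assumptions rather than the course's own implementation: the function `run_with_retries`, the `audit_log` list, and the `flaky_step` stand-in are all hypothetical names, and a real pipeline would typically also rely on the retry settings that Databricks Workflows offers at the task level.

```python
import time


def run_with_retries(step, audit_log, max_attempts=3, base_delay=1.0,
                     sleep=time.sleep):
    """Run `step`, retrying on failure with exponential backoff.

    Every attempt is appended to `audit_log` so the caller keeps a
    record even when the final attempt raises.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            result = step()
            audit_log.append({"attempt": attempt, "status": "ok"})
            return result
        except Exception as exc:
            audit_log.append(
                {"attempt": attempt, "status": "failed", "error": str(exc)}
            )
            if attempt == max_attempts:
                raise  # Give up: surface the last error to the orchestrator.
            # Wait 1x, 2x, 4x ... base_delay between attempts.
            sleep(base_delay * 2 ** (attempt - 1))


# Hypothetical step that fails twice, then succeeds.
calls = {"n": 0}


def flaky_step():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient failure")
    return "loaded"


audit_log = []
# The sleep function is swapped out so the demo runs instantly.
result = run_with_retries(flaky_step, audit_log, sleep=lambda seconds: None)
print(result, [entry["status"] for entry in audit_log])
```

Passing the audit list in from the caller, instead of returning it, keeps the attempt history available for logging even when all attempts fail.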

Operationalization, Governance, and Production Readiness

  • Unity Catalog governance, lineage, and access-control best practices.
  • Cost management, cluster sizing, autoscaling, and job concurrency patterns.
  • Deployment checklists, rollback strategies, and runbook creation.

Final Review, Knowledge Transfer, and Next Steps

  • Participant presentations of migration work and lessons learned.
  • Gap analysis, recommended follow-up activities, and handoff of training materials.
  • References, further learning paths, and support options.

Requirements

  • Foundational understanding of data engineering concepts.
  • Practical experience with SQL and stored procedures (Synapse / SQL Server).
  • Familiarity with ETL orchestration concepts (such as ADF or similar tools).

Target Audience

  • Technology managers with a background in data engineering.
  • Data engineers transitioning from procedural OLAP logic to Lakehouse patterns.
  • Platform engineers responsible for driving Databricks adoption.

Duration: 35 hours
