Course Outline

Introduction:

  • Apache Spark within the Hadoop Ecosystem
  • Overview of Python and Scala

Foundational Theory:

  • Architecture
  • RDDs
  • Transformations and Actions
  • Stages, Tasks, and Dependencies

Hands-on Workshop: Understanding the Basics in the Databricks Environment:

  • Exercises with the RDD API
  • Core action and transformation functions
  • PairRDDs
  • Joins
  • Caching strategies
  • Exercises with the DataFrame API
  • Spark SQL
  • DataFrame operations: select, filter, group, sort
  • User-Defined Functions (UDFs)
  • Introduction to the Dataset API
  • Streaming

Hands-on Workshop: Understanding Deployment in the AWS Environment:

  • Basics of AWS Glue
  • Comparing AWS EMR and AWS Glue
  • Example jobs in both environments
  • Evaluating advantages and disadvantages

Additional Topics:

  • Introduction to Apache Airflow orchestration

Requirements

  • Programming skills (preferably in Python and Scala)
  • Basic knowledge of SQL

Duration: 21 hours
