Course Outline

Module 1: Informatica Data Engineering Management Overview

  • Core concepts of Data Engineering
  • Features of Data Engineering Management
  • Benefits of Data Engineering Management
  • Architecture of Data Engineering Management
  • Developer tasks in Data Engineering Management
  • New features in Data Engineering Integration 10.4

Module 2: Ingestion and Extraction in Hadoop

  • Integrating DEI with Hadoop clusters
  • Understanding Hadoop file systems
  • Data ingestion into HDFS and Hive using Sqoop
  • Initial load for Mass Ingestion to HDFS and Hive
  • Incremental load for Mass Ingestion to HDFS and Hive
  • Lab: Configure Sqoop to process data between an Oracle database and HDFS
  • Lab: Configure Sqoop to process data between an Oracle database and Hive
  • Lab: Create mapping specifications using Mass Ingestion Service

Module 3: Native and Hadoop Engine Strategy

  • Engine strategy for Data Engineering Integration
  • Hive Engine architecture
  • MapReduce
  • Tez
  • Spark architecture
  • Blaze architecture
  • Lab: Execute a mapping in Spark mode
  • Lab: Connect to a Deployed Application

Module 4: Data Engineering Development Process

  • Advanced transformations in Data Engineering Integration: Python and Update Strategy
  • Hive ACID use cases
  • Stateful computing and windowing
  • Lab: Create a reusable Python transformation
  • Lab: Create an active Python transformation
  • Lab: Perform Hive upserts
  • Lab: Use the LEAD windowing function
  • Lab: Use the LAG windowing function
  • Lab: Create a macro transformation
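
As a rough illustration of the LEAD and LAG labs listed above, the sketch below shows what the two windowing functions compute over an ordered list of values. It is plain Python, not Informatica expression syntax; the helper names `lag` and `lead`, their `offset` and `default` parameters, and the sample data are assumptions made only for this example.

```python
def lag(values, offset=1, default=None):
    """For each row, return the value `offset` rows earlier, or `default`."""
    return [values[i - offset] if i - offset >= 0 else default
            for i in range(len(values))]


def lead(values, offset=1, default=None):
    """For each row, return the value `offset` rows later, or `default`."""
    n = len(values)
    return [values[i + offset] if i + offset < n else default
            for i in range(n)]


# Hypothetical ordered window: one sales figure per day.
daily_sales = [100, 120, 90, 150]
previous_day = lag(daily_sales)   # value from the preceding row
next_day = lead(daily_sales)      # value from the following row
```

Rows at the edge of the window have no neighbor to read from, which is why both helpers take a `default`.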

Module 5: Complex File Processing

  • Data Engineering file formats: Avro, Parquet, JSON
  • Complex file data types: Structs, Arrays, Maps
  • Complex configuration, operators, and functions
  • Lab: Convert flat file data objects to Avro files
  • Lab: Use complex data types (Arrays, Structs, and Maps) in mappings

Module 6: Hierarchical Data Processing

  • Hierarchical data processing
  • Flattening hierarchical data
  • Dynamic flattening with schema changes
  • Hierarchical data processing with schema changes
  • Complex configuration, operators, and functions
  • Dynamic ports
  • Dynamic input rules
  • Lab: Flatten a complex port in a mapping
  • Lab: Build dynamic mappings using dynamic ports
  • Lab: Build dynamic mappings using input rules
  • Lab: Perform dynamic flattening of complex ports
  • Lab: Parse hierarchical data on the Spark engine
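
To give a sense of what flattening hierarchical data means in the topics above, here is a minimal sketch in plain Python. It is not how the Developer tool or the Spark engine implements flattening; the function names `flatten_struct` and `explode_array`, the underscore naming of flattened fields, and the sample record are all hypothetical.

```python
def flatten_struct(record, parent_key="", sep="_"):
    """Turn nested dicts (structs) into a single-level dict with joined names."""
    flat = {}
    for key, value in record.items():
        name = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into the nested struct, carrying the prefix along.
            flat.update(flatten_struct(value, name, sep))
        else:
            flat[name] = value
    return flat


def explode_array(record, array_key):
    """Emit one copy of the record per element of the array field."""
    base = {k: v for k, v in record.items() if k != array_key}
    return [{**base, array_key: item} for item in record.get(array_key, [])]


# Hypothetical hierarchical record with a nested struct and an array.
order = {
    "id": 1,
    "customer": {"name": "Ana", "address": {"city": "Lisbon"}},
    "items": ["pen", "ink"],
}
rows = [flatten_struct(r) for r in explode_array(order, "items")]
```

Each element of `rows` is a flat record with fields such as `customer_name` and `customer_address_city`, one row per array element.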

Module 7: Mapping Optimization and Performance Tuning

  • Validation environments
  • Execution environments
  • Mapping optimization
  • Mapping recommendations and insights
  • Scheduling, queuing, and node labeling
  • Mapping audits
  • Lab: Implement recommendations
  • Lab: Implement insights
  • Lab: Implement mapping audits

Module 8: Monitoring Logs and Troubleshooting in Hadoop

  • Hadoop environment logs
  • Spark engine monitoring
  • Blaze engine monitoring
  • REST Operations Hub
  • Log Aggregator
  • Troubleshooting
  • Lab: Monitor mappings using REST Operations Hub
  • Lab: View and analyze logs using Log Aggregator

Module 9: Intelligent Structure Model

  • Overview of Intelligent Structure Discovery
  • Intelligent Structure Model
  • Lab: Use an Intelligent Structure Model in a mapping

Module 10: Databricks Overview

  • Overview of Databricks
  • Steps to configure Databricks
  • Databricks clusters
  • Notebooks, jobs, and data
  • Delta Lake

Module 11: Databricks Integration

  • Databricks integration
  • Components of the Informatica and Databricks environments
  • Runtime process on the Databricks Spark engine
  • Databricks integration task flow
  • Prerequisites for Databricks integration
  • Cluster workflows
  • Demo: Set up Databricks connection
  • Demo: Run a mapping with the Databricks Spark engine

Requirements

Developer Tool for Big Data Developers

Duration: 21 Hours
