Pentaho Data Integration Fundamentals Training Course
Pentaho Data Integration is an open-source tool designed for defining jobs and data transformations.
In this instructor-led, live training, participants will learn how to leverage Pentaho Data Integration's robust ETL capabilities and user-friendly interface to manage the entire big data lifecycle, maximizing the value of data within their organization.
By the end of this training, participants will be able to:
- Create, preview, and execute basic data transformations that include steps and hops
- Configure and secure the Pentaho Enterprise Repository
- Integrate diverse data sources to produce a unified, analytics-ready format
- Deliver results to third-party applications for further processing
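In PDI terms, a transformation is a network of steps connected by hops, with rows streaming from one step to the next. As a rough conceptual sketch of that model in plain Python (an analogy only, not Pentaho's API; the step names in the comments mirror common PDI steps):

```python
# Conceptual sketch of a PDI transformation: steps are row processors,
# hops are the row streams flowing between them. (Plain Python analogy,
# not Pentaho code.)

def csv_input(lines):
    """'CSV file input' step: parse raw lines into row dicts."""
    header = lines[0].split(",")
    for line in lines[1:]:
        yield dict(zip(header, line.split(",")))

def calculator(rows):
    """'Calculator' step: derive a new field on each row."""
    for row in rows:
        row["total"] = int(row["qty"]) * float(row["price"])
        yield row

def filter_rows(rows, min_total):
    """'Filter rows' step: keep only rows meeting a condition."""
    for row in rows:
        if row["total"] >= min_total:
            yield row

raw = ["sku,qty,price", "A1,3,2.50", "B2,1,9.99", "C3,10,0.10"]
# Hops: the output stream of each step feeds the next step.
result = list(filter_rows(calculator(csv_input(raw)), min_total=5.0))
```

In Spoon, the same pipeline would be drawn as three step icons joined by two hops; previewing any step shows the rows flowing through it at that point.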
Audience
- Data Analysts
- ETL Developers
Format of the Course
- A combination of lectures, discussions, exercises, and extensive hands-on practice
Course Outline
Introduction
Installing and Configuring Pentaho
Overview of Pentaho Features and Architecture
Understanding Pentaho's In-Memory Caching
Navigating the User Interface
Connecting to a Data Source
Configuring the Pentaho Enterprise Repository
Transforming Data
Viewing the Transformation Results
Resolving Transformation Errors
Processing a Data Stream
Reusing Transformations
Scheduling Transformations
Securing Pentaho
Integrating with Third-party Applications (Hadoop, NoSQL, etc.)
Analytics and Reporting
Pentaho Design Patterns and Best Practices
Troubleshooting
Summary and Conclusion
Requirements
- An understanding of relational databases
- An understanding of data warehousing
- An understanding of ETL (Extract, Transform, Load) concepts
Testimonials (2)
Very useful, because it helps me understand what we can do with the data in our context.
Nicolas NEMORIN - Adecco Groupe France
Course - KNIME Analytics Platform for BI
It's a hands-on session.
Vorraluck Sarechuer - Total Access Communication Public Company Limited (dtac)
Course - Talend Open Studio for ESB
Related Courses
Data Engineering Integration for Developers
21 Hours
This course is designed for software version 10.5. You will learn how to accelerate Data Engineering Integration by handling large data ingestions, incremental loading, transformations, complex file processing, dynamic mappings, and Python scripting. We will explore how to leverage application logic for various Data Engineering scenarios while also focusing on monitoring, troubleshooting, and best practices.
Objectives
Upon successfully completing this course, students should be able to:
- Ingest large volumes of data into Hive and HDFS
- Perform incremental loads in mass ingestion processes
- Conduct both initial and incremental loads
- Integrate with relational databases using SQOOP
- Execute transformations across multiple engines
- Run mappings using JDBC in Spark mode
- Perform stateful computing and windowing operations
- Process complex file formats
- Parse hierarchical data on the Spark engine
- Run profiles and select sampling options on the Spark engine
- Execute dynamic mappings
- Create audits for mappings
- Monitor logs using the REST Operations Hub
- Monitor logs with Log Aggregation and troubleshoot issues
- Run mappings in a Databricks environment
- Create mappings to access Delta Lake tables
- Optimize the performance of Spark and Databricks jobs
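The "stateful computing and windowing operations" objective above refers to computations that carry state across rows, such as a running aggregate over a sliding window of recent values. A minimal illustration of the idea in plain Python (not the Spark-engine implementation covered in the course; the sample readings are made up for illustration):

```python
from collections import deque

def sliding_window_avg(values, window_size):
    """Stateful pass over a stream: emit the mean of the last
    `window_size` values seen so far (a simple windowing operation)."""
    window = deque(maxlen=window_size)  # the state retained between rows
    averages = []
    for v in values:
        window.append(v)  # deque drops the oldest value automatically
        averages.append(sum(window) / len(window))
    return averages

readings = [10, 20, 30, 40, 50]
print(sliding_window_avg(readings, window_size=3))
# [10.0, 15.0, 20.0, 30.0, 40.0]
```

On the Spark engine the same pattern is expressed with window functions that partition and order the data, so the engine, rather than your code, manages the per-window state.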
KNIME Analytics Platform for BI
21 Hours
KNIME Analytics Platform is a prominent open-source solution for data-driven innovation, enabling you to uncover the potential within your data, discover new insights, or forecast future scenarios. With over 1,000 modules, hundreds of ready-to-use examples, a wide array of integrated tools, and the most extensive selection of advanced algorithms available, KNIME Analytics Platform is an ideal toolkit for any data scientist or business analyst.
This course on KNIME Analytics Platform offers a valuable opportunity for beginners, advanced users, and KNIME experts alike. Participants will be introduced to KNIME, learn how to use it more effectively, and gain skills in creating clear and comprehensive reports based on KNIME workflows.
KNIME Analytical Platform - Comprehensive Training
35 Hours
The "KNIME Analytical Platform" training offers a comprehensive overview of this free data analysis platform. The program covers data processing and analysis introduction, KNIME installation and configuration, building workflows, business model creation methodology, and data modeling. The course also discusses advanced data analysis tools, workflow import and export, tool integration, ETL processes, data exploration, visualization, extensions, and integrations with tools such as R, Java, Python, Gephi, and Neo4j. The conclusion includes a discussion on reporting, BIRT integration, and the KNIME WebPortal.
Oracle GoldenGate
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at sysadmins and developers who wish to set up, deploy, and manage Oracle GoldenGate for data transformation.
By the end of this training, participants will be able to:
- Install and configure Oracle GoldenGate.
- Understand Oracle databases replication using the Oracle GoldenGate tool.
- Understand the Oracle GoldenGate architecture.
- Configure and perform a database replication and migration.
- Optimize Oracle GoldenGate performance and troubleshoot issues.
Pentaho Open Source BI Suite Community Edition (CE)
28 Hours
Pentaho Open Source BI Suite Community Edition (CE) is a comprehensive business intelligence package that offers data integration, reporting, dashboards, and data loading capabilities.
In this instructor-led, live training, participants will learn how to fully leverage the features of Pentaho Open Source BI Suite Community Edition (CE).
By the end of this training, participants will be able to:
- Install and configure Pentaho Open Source BI Suite Community Edition (CE)
- Grasp the fundamentals of Pentaho CE tools and their functionalities
- Create reports using Pentaho CE
- Integrate external data sources into Pentaho CE
- Work with big data and analytics in Pentaho CE
Audience
- Programmers
- BI Developers
Format of the course
- Part lecture, part discussion, exercises, and extensive hands-on practice
Note
- To request a customized training for this course, please contact us to arrange it.
Pentaho Data Integration Advanced
21 Hours
Pentaho Data Integration is a comprehensive platform for designing enterprise-grade ETL and data pipelines.
This instructor-led, live training (online or onsite) is designed for advanced-level engineers who aim to master high-performance, enterprise-scale, and highly automated PDI solutions.
Upon completing this course, participants will be able to:
- Design large-scale ETL pipelines with advanced orchestration capabilities.
- Optimize complex transformations for maximum performance.
- Implement scripting, automation, and hybrid integration patterns effectively.
- Develop robust, maintainable, and production-ready workflows.
Format of the Course
- Expert-led demonstrations and in-depth architectural discussions.
- Extensive hands-on lab work addressing advanced real-world ETL challenges.
- Practical development experience in a production-like environment.
Course Customization Options
- Please contact us if you need a customized version of this training.
Pentaho Data Integration Intermediate
21 Hours
Pentaho Data Integration is a platform designed for data extraction, transformation, and loading.
This training, led by an instructor and conducted either online or on-site, is aimed at intermediate-level practitioners who want to enhance their PDI skills for more complex transformation scenarios.
Upon completing this training, participants will be able to:
- Design multi-step transformations with enhanced performance.
- Work effectively with variables, parameters, and reusable components.
- Integrate PDI with databases, APIs, and external systems seamlessly.
- Apply best practices for creating maintainable and scalable ETL pipelines.
Format of the Course
- Interactive demonstrations and detailed explanations by the instructor.
- Guided exercises and scenario-based practice sessions.
- Hands-on experience in a real-world ETL project environment.
Course Customization Options
- If you need a customized version of this course, please contact us to tailor it to your needs.
Sensor Fusion Algorithms
14 Hours
Sensor fusion is the combination and integration of data from multiple sensors to provide a more accurate, reliable, and contextually rich view than any single sensor can offer.
The implementation of Sensor Fusion requires algorithms that can effectively filter and integrate data from different sources.
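One of the simplest such algorithms is the inverse-variance weighted average: readings from less noisy sensors contribute more to the fused estimate, and the fused estimate is always at least as certain as the best individual sensor. A minimal sketch (the sensor values and variances below are hypothetical, chosen for illustration):

```python
def fuse(estimates):
    """Inverse-variance weighted fusion of independent sensor readings.

    `estimates` is a list of (value, variance) pairs for the same
    quantity; returns the fused (value, variance). The fused variance
    is always <= the smallest input variance.
    """
    weights = [1.0 / var for _, var in estimates]  # precise sensors weigh more
    fused_value = sum(w * v for (v, _), w in zip(estimates, weights)) / sum(weights)
    fused_var = 1.0 / sum(weights)
    return fused_value, fused_var

# Two sensors measuring the same distance: one noisy, one precise.
value, var = fuse([(10.0, 4.0), (12.0, 1.0)])
# value is pulled toward the precise sensor's reading (11.6),
# and var (0.8) is smaller than either input variance.
```

This same weighting rule is the measurement-update core of the Kalman filter, one of the standard algorithms covered under this topic.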
Audience
This course is designed for engineers, programmers, and architects who work with multi-sensor systems.
Talend Administration Center (TAC)
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at system administrators, data scientists, and business analysts who wish to set up Talend Administration Center to deploy and manage the organization's roles and tasks.
By the end of this training, participants will be able to:
- Install and configure Talend Administration Center.
- Understand and implement Talend management fundamentals.
- Build, deploy, and run business projects or tasks in Talend.
- Monitor the security of datasets and develop business routines based on the TAC framework.
- Obtain a broader comprehension of big data applications.
Talend Big Data Integration
28 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at technical professionals who wish to deploy Talend Open Studio for Big Data to simplify the process of reading and crunching through Big Data.
By the end of this training, participants will be able to:
- Install and configure Talend Open Studio for Big Data.
- Connect with Big Data systems such as Cloudera, Hortonworks, MapR, Amazon EMR, and Apache.
- Understand and set up Open Studio's big data components and connectors.
- Configure parameters to automatically generate MapReduce code.
- Use Open Studio's drag-and-drop interface to run Hadoop jobs.
- Prototype big data pipelines.
- Automate big data integration projects.
Talend Cloud
7 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at data administrators and developers who wish to manage, monitor, and operate data integration processes using Talend Cloud services.
By the end of this training, participants will be able to:
- Navigate the Talend Management Console to manage users and roles in the platform.
- Evaluate data to find and understand relevant datasets.
- Create a pipeline to process and monitor data at rest or in motion.
- Prepare data for analysis to generate insights relevant to the business.
Talend Data Stewardship
14 Hours
This instructor-led, live training in Uzbekistan (online or onsite) is aimed at beginner to intermediate-level data analysts who wish to deepen their understanding and skills in managing and improving data quality using Talend Data Stewardship.
By the end of this training, participants will be able to:
- Gain a comprehensive understanding of the role of data stewardship in maintaining data quality.
- Use Talend Data Stewardship for managing data quality tasks.
- Create, assign, and manage tasks within Talend Data Stewardship, including workflow customization.
- Use the tool's reporting and monitoring capabilities to track data quality and stewardship efforts.
Talend Open Studio for ESB
21 Hours
In this instructor-led, live training in Uzbekistan, participants will learn how to use Talend Open Studio for ESB to create, connect, mediate, and manage services and their interactions.
By the end of this training, participants will be able to:
- Integrate, enhance and deliver ESB technologies as single packages in a variety of deployment environments.
- Understand and utilize Talend Open Studio's most used components.
- Integrate any application, database, API, or Web services.
- Seamlessly integrate heterogeneous systems and applications.
- Embed existing Java code libraries to extend projects.
- Leverage community components and code to extend projects.
- Rapidly integrate systems, applications and data sources within a drag-and-drop Eclipse environment.
- Reduce development time and maintenance costs by generating optimized, reusable code.