Course Outline
Section 1: Introduction to Hadoop
- History and core concepts of Hadoop
- Ecosystem overview
- Available distributions
- High-level architecture
- Common myths about Hadoop
- Key challenges in Hadoop implementation
- Hardware and software considerations
- Lab: Initial exploration of Hadoop
Section 2: HDFS
- Design principles and architecture
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Daemons: Namenode, Secondary Namenode, DataNode
- Communication mechanisms and heartbeats
- Data integrity management
- Read and write paths
- Namenode High Availability (HA) and Federation
- Labs: Interacting with HDFS
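The replication and horizontal-scaling concepts above can be sketched with simple arithmetic. The block size (128 MB) and replication factor (3) used here are common HDFS defaults, not values stated in this outline:

```java
// Sketch: how HDFS splits a file into fixed-size blocks and replicates each
// block across DataNodes. Values below are typical defaults, assumed here.
public class HdfsBlockMath {
    static final long BLOCK_SIZE = 128L * 1024 * 1024; // 128 MB default block size
    static final int REPLICATION = 3;                  // default replication factor

    // Number of HDFS blocks needed for a file of the given size.
    static long blockCount(long fileSizeBytes) {
        return (fileSizeBytes + BLOCK_SIZE - 1) / BLOCK_SIZE; // ceiling division
    }

    // Total raw storage consumed across the cluster, including replicas.
    static long rawStorage(long fileSizeBytes) {
        return fileSizeBytes * REPLICATION;
    }

    public static void main(String[] args) {
        long oneGb = 1024L * 1024 * 1024;
        System.out.println(blockCount(oneGb)); // 8 blocks for a 1 GB file
        System.out.println(rawStorage(oneGb)); // 3 GB of raw storage used
    }
}
```

This is why cluster capacity planning must account for roughly three times the logical data size when the default replication factor is in effect.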
Section 3: MapReduce
- Concepts and architecture
- Daemons (MRV1): JobTracker and TaskTracker
- Processing phases: driver, mapper, shuffle/sort, reducer
- MapReduce Version 1 and Version 2 (YARN)
- Deep dive into MapReduce internals
- Introduction to Java-based MapReduce programming
- Labs: Executing a sample MapReduce program
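The processing phases listed above (driver, mapper, shuffle/sort, reducer) can be illustrated without a cluster. This sketch uses plain Java collections in place of the Hadoop API, purely to show the data flow between phases; it is not the course's lab code:

```java
import java.util.*;
import java.util.stream.*;

// Word count expressed as the MapReduce phases: mapper -> shuffle/sort -> reducer.
// Plain-Java stand-in for the concepts; no Hadoop classes are used.
public class WordCountPhases {

    // Mapper: emit a (word, 1) pair for every word in a line.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) out.add(Map.entry(word, 1));
        }
        return out;
    }

    // Shuffle/sort: group values by key, with keys in sorted order.
    static SortedMap<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        SortedMap<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reducer: sum the grouped counts for each key.
    static Map<String, Integer> reduce(SortedMap<String, List<Integer>> grouped) {
        Map<String, Integer> result = new LinkedHashMap<>();
        grouped.forEach((word, counts) ->
            result.put(word, counts.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    // Driver: wires the phases together over the input lines.
    static Map<String, Integer> run(List<String> lines) {
        List<Map.Entry<String, Integer>> mapped = lines.stream()
            .flatMap(line -> map(line).stream())
            .collect(Collectors.toList());
        return reduce(shuffle(mapped));
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be")));
        // {be=2, not=1, or=1, to=2}
    }
}
```

In a real Hadoop job the same roles are played by the `Mapper` and `Reducer` classes, with the framework handling the shuffle/sort phase across the cluster.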
Section 4: Pig
- Pig compared to Java MapReduce
- Pig job workflow
- Pig Latin language fundamentals
- Data processing with Pig
- Transformations and joins
- User-Defined Functions (UDFs)
- Labs: Writing Pig scripts for data analysis
Section 5: Hive
- Architecture and design
- Data types
- SQL capabilities in Hive
- Creating Hive tables and executing queries
- Data partitioning
- Joins
- Text processing techniques
- Labs: Various hands-on exercises on data processing with Hive
Section 6: HBase
- Concepts and architecture
- Comparison of HBase, RDBMS, and Cassandra
- HBase Java API
- Handling time series data in HBase
- Schema design strategies
- Labs: Interacting with HBase via the shell; programming with the HBase Java API; schema design exercises
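One well-known schema-design technique for time series data is to build row keys from an entity id plus a reversed timestamp, so the newest rows sort first in a scan. The sketch below shows only the key construction in plain Java; the `sensorId` naming and `#` separator are illustrative assumptions, not part of the course material:

```java
// Sketch of a common HBase row-key pattern for time series data: prefix with
// the entity id and append a reversed, zero-padded timestamp so that
// lexicographic row ordering descends by time (newest first).
public class TimeSeriesRowKey {

    // Reversed timestamp: Long.MAX_VALUE - ts inverts the sort order.
    // Zero-padding to 19 digits makes string comparison match numeric order.
    static String rowKey(String sensorId, long timestampMillis) {
        long reversed = Long.MAX_VALUE - timestampMillis;
        return String.format("%s#%019d", sensorId, reversed);
    }

    public static void main(String[] args) {
        String older = rowKey("sensor42", 1_000L);
        String newer = rowKey("sensor42", 2_000L);
        // The newer reading sorts before the older one.
        System.out.println(newer.compareTo(older) < 0); // true
    }
}
```

Because HBase stores rows sorted by key, this layout lets a scan starting at the sensor's prefix return the most recent readings first without a secondary index.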
Requirements
- Proficiency in the Java programming language is required, as most practical exercises will be conducted in Java.
- Familiarity with the Linux environment is essential, including the ability to navigate the command line and edit files using tools like vi or nano.
Lab environment
Zero Install: Participants do not need to install Hadoop software on their own devices. A fully functional Hadoop cluster will be provided for use during the course.
Students will need the following tools:
- An SSH client (Linux and Mac systems come with built-in SSH clients; for Windows, PuTTY is recommended)
- A web browser to access the cluster, with Firefox being the recommended option
28 Hours
Testimonials (1)
Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already