Hadoop For Administrators Training Course

Apache Hadoop is the most widely adopted framework for processing Big Data across clusters of servers. In this three-day course (optionally extended to four days), participants will explore the business benefits and real-world use cases of Hadoop and its ecosystem. They will learn how to plan cluster deployment and growth, and how to install, maintain, monitor, troubleshoot, and optimise Hadoop environments. Participants also get hands-on practice with bulk data loading, become familiar with the various Hadoop distributions, and practise managing tools from the Hadoop ecosystem. The programme concludes with a discussion of securing the cluster with Kerberos.

“…The materials were exceptionally well-prepared and comprehensively covered. The lab sessions were highly beneficial and meticulously organised.”
— Andrew Nguyen, Principal Integration DW Engineer, Microsoft Online Advertising

Audience

Hadoop administrators

Format

Lectures and hands-on labs, with an approximate balance of 60% lectures and 40% labs.

This course is available as onsite live training in Uzbekistan or online live training.

Course Outline

Introduction
- Hadoop history and core concepts
- The Hadoop ecosystem
- Hadoop distributions
- High-level architecture overview
- Common Hadoop myths
- Hadoop challenges (hardware and software)
- Labs: Discuss your Big Data projects and challenges
Planning and installation
- Selecting software and Hadoop distributions
- Sizing the cluster and planning for future growth
- Selecting appropriate hardware and network infrastructure
- Rack topology considerations
- Installation procedures
- Multi-tenancy implementation
- Directory structure and log management
- Benchmarking techniques
- Labs: Cluster installation and performance benchmarking
HDFS operations
- Core concepts (horizontal scaling, replication, data locality, rack awareness)
- Nodes and daemons (NameNode, Secondary NameNode, HA Standby NameNode, DataNode)
- Health monitoring strategies
- Command-line and browser-based administration
- Adding storage and replacing defective drives
- Labs: Getting familiar with HDFS command-line operations
Data ingestion
- Using Flume for logs and other data ingestion into HDFS
- Using Sqoop to import data from SQL databases to HDFS, and export back to SQL
- Hadoop data warehousing with Hive
- Copying data between clusters (distcp)
- Leveraging S3 as a complement to HDFS
- Best practices and architectures for data ingestion
- Labs: Setting up and using Flume and Sqoop
MapReduce operations and administration
- Parallel computing before MapReduce: Comparing HPC with Hadoop administration
- MapReduce cluster workloads
- Nodes and daemons (JobTracker, TaskTracker)
- Walkthrough of the MapReduce UI
- MapReduce configuration
- Job configuration
- Optimising MapReduce performance
- Fool-proofing MapReduce: Guidance for programmers
- Labs: Running MapReduce examples
YARN: New architecture and capabilities
- YARN design goals and implementation architecture
- New components: ResourceManager, NodeManager, ApplicationMaster
- Installing YARN
- Job scheduling under YARN
- Labs: Investigating job scheduling mechanisms
Advanced topics
- Hardware monitoring
- Cluster monitoring
- Adding and removing servers, upgrading Hadoop
- Backup, recovery, and business continuity planning
- Oozie job workflows
- Hadoop high availability (HA)
- Hadoop Federation
- Securing your cluster with Kerberos
- Labs: Setting up monitoring systems
Optional tracks
- Cloudera Manager for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are conducted within the Cloudera distribution environment (CDH5).
- Ambari for cluster administration, monitoring, and routine tasks; installation and usage. In this track, all exercises and labs are performed within the Ambari cluster manager and Hortonworks Data Platform (HDP 2.0).

Requirements

- Proficiency in basic Linux system administration
- Fundamental scripting skills

Prior knowledge of Hadoop and Distributed Computing is not required, as these concepts will be introduced and explained throughout the course.

Lab environment

Zero Install: There is no need to install Hadoop software on students’ machines! A fully functional Hadoop cluster will be provided for practical exercises.

Students will require the following:

- An SSH client (Linux and Mac users already have built-in SSH clients; for Windows users, PuTTY is recommended)
- A web browser to access the cluster. We recommend Firefox with the FoxyProxy extension installed.

Duration: 21 hours

Testimonials (1)

Hands on exercises. Class should have been 5 days, but the 3 days helped to clear up a lot of questions that I had from working with NiFi already.

James - BHG Financial
Course - Apache NiFi for Administrators
