Course Outline

Overview of Big Data:

  • Defining Big Data
  • Reasons behind the rising popularity of Big Data
  • Case studies involving Big Data
  • Key characteristics of Big Data
  • Solutions for managing Big Data

Hadoop and Its Components:

  • Definition of Hadoop and its core components
  • Hadoop architecture and its capability to handle specific data types
  • A brief history of Hadoop, including the companies that use it and the reasons for adoption
  • Detailed explanation of the Hadoop framework and its components
  • Explanation of HDFS and the processes for reading from and writing to the Hadoop Distributed File System
  • Procedures for setting up a Hadoop cluster in various modes: standalone, pseudo-distributed, and multi-node

(This section covers setting up a Hadoop cluster using VirtualBox, KVM, or VMware, including critical network configuration, starting the Hadoop daemons, and testing the cluster.)

  • Explanation of the MapReduce framework and how it works
  • Executing MapReduce jobs on a Hadoop cluster
  • Understanding replication, mirroring, and rack awareness within Hadoop clusters

Planning a Hadoop Cluster:

  • Strategies for planning your Hadoop cluster
  • Evaluating hardware and software requirements for cluster planning
  • Analyzing workloads so the cluster can be planned to prevent failures and deliver optimal performance

Introduction to MapR and Its Value:

  • An overview of MapR and its architecture
  • Understanding and utilizing the MapR Control System, MapR Volumes, snapshots, and mirrors
  • Planning clusters specifically for MapR
  • Comparing MapR with other distributions and Apache Hadoop
  • Process of MapR installation and cluster deployment

Cluster Setup and Administration:

  • Managing services, nodes, snapshots, mirror volumes, and remote clusters
  • Understanding and managing nodes
  • Comprehending Hadoop components and installing them alongside MapR Services
  • Accessing data on the cluster, including via NFS
  • Managing data through volumes
  • Handling users and groups
  • Assigning roles to nodes
  • Commissioning and decommissioning nodes
  • Cluster administration
  • Performance monitoring
  • Configuring and analyzing metrics
  • Administering MapR security
  • Understanding and working with M7, the native storage for MapR tables
  • Configuring and tuning the cluster for optimum performance

Cluster Upgrades and Integration with Other Systems:

  • Upgrading the MapR software version, and the types of upgrades available
  • Configuring the MapR cluster to access an HDFS cluster
  • Setting up a MapR cluster on Amazon Elastic MapReduce

All topics include demonstrations and practice sessions to provide learners with hands-on experience.

Requirements

  • Foundational knowledge of the Linux file system
  • Basic understanding of Java
  • Familiarity with Apache Hadoop (recommended)

Duration: 28 Hours
