[xxxx] syllabus - big data administration training for apache hadoop - 280715

Upload: ari-pribadi

Post on 16-Dec-2015

Course Overview

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. Cloudera's open source big data platform is the most widely adopted in the world and comes with the industry's highest quality technical support for Apache Hadoop, making it easy to install, configure, and manage a Hadoop cluster.

This 4-day course provides students with a comprehensive understanding of all the steps necessary to operate and maintain a Hadoop cluster. After completing this course, students will be able to install Hadoop, MapReduce, Hive, Impala, and Pig; perform initial HDFS configuration; configure HDFS high availability; secure a Hadoop cluster with Kerberos; and maintain and monitor a Hadoop cluster.

Duration: 4 days

Who Should Attend

System administrators who will be setting up or maintaining a Hadoop cluster.

Prerequisites

Some basic knowledge of Linux operating systems is strongly recommended.

Suggested Next Course

Cloudera Manager Training for Apache Hadoop

Linux Administration and Security

Big Data Developer for Apache Hadoop

Big Data Administrator Training for Apache Hadoop

Course Content

The Case for Apache Hadoop

Why Hadoop?
Fundamental Concepts
Core Hadoop Components

HDFS

HDFS Features
Writing and Reading Files
NameNode Memory Considerations
Overview of HDFS Security
Using the NameNode Web UI
Using the Hadoop File Shell

Getting Data into HDFS

Ingesting Data from External Sources with Flume
Ingesting Data from Relational Databases with Sqoop
REST Interfaces
Best Practices for Importing Data

MapReduce

What Is MapReduce?
Features of MapReduce
Basic Concepts
Architectural Overview
MapReduce Version 2
Failure Recovery
Using the JobTracker Web UI

Planning Your Hadoop Cluster

General Planning Considerations
Choosing the Right Hardware
Network Considerations
Configuring Nodes
Planning for Cluster Management

Hadoop Installation and Initial Configuration

Deployment Types
Installing Hadoop
Specifying the Hadoop Configuration
Performing Initial HDFS Configuration
Performing Initial MapReduce Configuration
Hadoop Logging

Installing and Configuring Hive, Impala, & Pig

Hive
Impala
Pig

Hadoop Clients

What Are Hadoop Clients?
Installing and Configuring Hadoop Clients
Installing and Configuring Hue
Hue Authentication and Authorization

Cloudera Manager

The Motivation for Cloudera Manager
Cloudera Manager Features
Standard and Enterprise Versions
Cloudera Manager Topology
Installing Cloudera Manager
Installing Hadoop Using Cloudera Manager
Performing Basic Administration Tasks Using Cloudera Manager

Advanced Cluster Configuration

Advanced Configuration Parameters
Configuring Hadoop Ports
Explicitly Including and Excluding Hosts
Configuring HDFS for Rack Awareness
Configuring HDFS High Availability

Hadoop Security

Why Hadoop Security Is Important
Hadoop's Security System Concepts
What Kerberos Is and How It Works
Securing a Hadoop Cluster with Kerberos

Managing and Scheduling Jobs

Managing Running Jobs
Scheduling Hadoop Jobs
Configuring the FairScheduler

Cluster Maintenance

Checking HDFS Status
Copying Data between Clusters
Adding and Removing Cluster Nodes
Rebalancing the Cluster
Cluster Upgrading

Cluster Monitoring and Troubleshooting

General System Monitoring
Monitoring Hadoop Clusters
Troubleshooting Hadoop Clusters
Common Misconfigurations