Cloudera Administrator Training for Apache Hadoop v2



  • Administrator Training for Apache Hadoop

    Take your knowledge to the next level with Cloudera's Apache Hadoop Training and Certification

    Cloudera University's three-day administrator training course for Apache Hadoop provides system administrators a comprehensive understanding of all the steps necessary to operate and manage Hadoop clusters. From installation and configuration through load balancing and tuning your cluster, Cloudera's administration course has you covered.

    Through lecture and interactive, hands-on exercises, attendees will cover topics such as:

    > Introduction to Apache Hadoop and HDFS
    > Apache Hadoop architecture
    > Proper cluster configuration and deployment
    > Populating HDFS using Apache Sqoop
    > Management and monitoring tools
    > Job scheduling
    > Best practices for maintaining Apache Hadoop in production
    > Installing and managing other Apache Hadoop projects
    > Diagnosing, tuning and solving Apache Hadoop issues

    Upon completion of the course, attendees receive a voucher for a Cloudera Certified Administrator for Apache Hadoop (CCAH) exam. Certification is a great differentiator; it helps establish individuals as leaders in their field, providing customers with tangible evidence of skills and expertise.

    AUDIENCE This course is designed for people with at least a basic level of Linux system administration experience. Prior knowledge of Hadoop is not required.

    Cloudera Administrator Training for Apache Hadoop helped me to advance my use of Apache Hadoop and cultivate a better understanding of the platform's inner workings. The course material, interactive labs and exercises really helped cement together all the little bits and pieces that I had bumped into prior to the class into a useful mental model of how Apache Hadoop works.

    ERIC MARSHALL, SENIOR SYSTEM ADMINISTRATOR

    TRAINING SHEET

  • Course Outline: Cloudera Administrator Training for Apache Hadoop

    Introduction

    The Case for Apache Hadoop
    > A Brief History of Hadoop
    > Core Hadoop Components
    > Fundamental Concepts

    The Hadoop Distributed File System
    > HDFS Features
    > HDFS Design Assumptions
    > Overview of HDFS Architecture
    > Writing and Reading Files
    > NameNode Considerations
    > An Overview of HDFS Security
    > Hands-On Exercise

    MapReduce
    > What Is MapReduce?
    > Features of MapReduce
    > Basic MapReduce Concepts
    > Architectural Overview
    > MapReduce Version 2
    > Failure Recovery
    > Hands-On Exercise

    An Overview of the Hadoop Ecosystem
    > What Is the Hadoop Ecosystem?
    > Integration Tools
    > Analysis Tools
    > Data Storage and Retrieval Tools

    Planning Your Hadoop Cluster
    > General Planning Considerations
    > Choosing the Right Hardware
    > Network Considerations
    > Configuring Nodes

    Hadoop Installation
    > Deployment Types
    > Installing Hadoop
    > Using Cloudera Manager for Easy Installation
    > Basic Configuration Parameters
    > Hands-On Exercise

    Advanced Configuration
    > Advanced Parameters
    > Configuring Rack Awareness
    > Configuring Federation
    > Configuring High Availability
    > Using Configuration Management Tools

    Hadoop Security
    > Why Hadoop Security Is Important
    > Hadoop's Security System Concepts
    > What Kerberos Is and How It Works
    > Configuring Kerberos Security
    > Integrating a Secure Cluster with Other Systems

    Managing and Scheduling Jobs
    > Managing Running Jobs
    > Hands-On Exercise
    > The FIFO Scheduler
    > The FairScheduler
    > Configuring the FairScheduler
    > Hands-On Exercise

    Cluster Maintenance
    > Checking HDFS Status
    > Hands-On Exercise
    > Copying Data Between Clusters
    > Adding and Removing Cluster Nodes
    > Rebalancing the Cluster
    > Hands-On Exercise
    > NameNode Metadata Backup
    > Cluster Upgrading

    Cluster Monitoring and Troubleshooting
    > General System Monitoring
    > Managing Hadoop's Log Files
    > Using the NameNode and JobTracker Web UIs
    > Hands-On Exercise
    > Cluster Monitoring with Ganglia
    > Common Troubleshooting Issues
    > Benchmarking Your Cluster

    Populating HDFS From External Sources
    > An Overview of Flume
    > Hands-On Exercise
    > An Overview of Sqoop
    > Best Practices for Importing Data

    Installing and Managing Other Hadoop Projects
    > Hive
    > Pig
    > HBase

    Conclusion

    Appendix: Kerberos Configuration


  • © 2012 Cloudera, Inc. All rights reserved. Cloudera and the Cloudera logo are trademarks or registered trademarks of Cloudera Inc. in the USA and other countries. All other trademarks are the property of their respective companies. Information is subject to change without notice.

    Cloudera, Inc. 220 Portage Avenue, Palo Alto, CA 94306 USA | 1-888-789-1488 or 1-650-362-0488 | cloudera.com

    Cloudera Certified Administrator for Apache Hadoop (CCAH)
    Establish yourself as a trusted and valuable resource by completing the certification exam for Apache Hadoop administrators. CCAH certifies the core system administrator skills sought by companies and organizations deploying Apache Hadoop. The exam can be demanding and will test your fluency with concepts and terminology in the following areas:

    Hadoop Distributed File System (HDFS)
    Recognize and identify daemons and understand the normal operation of an Apache Hadoop cluster, both in data storage and in data processing. Describe the current features of computing systems that motivate a system like Apache Hadoop:
    > HDFS Design
    > HDFS Daemons
    > HDFS Federation
    > HDFS HA
    > Securing HDFS (Kerberos)
    > File Read and Write Paths

    MapReduce
    Understand MapReduce core concepts and MapReduce v2 (MRv2 / YARN).

    Apache Hadoop Cluster Planning
    Discuss the principal points to consider in choosing the hardware and operating systems to host an Apache Hadoop cluster.

    Apache Hadoop Cluster Installation and Administration
    Analyze cluster handling of disk and machine failures. Recognize and identify regular tools for monitoring and managing HDFS.

    Resource Management
    Describe how the default FIFO scheduler and the FairScheduler handle the tasks in a mix of jobs running on a cluster.
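
    As a study aid for the Resource Management area, the contrast between the two policies can be sketched with a toy model. This is not Hadoop's scheduler code: the `simulate` function, the job names, and the one-step-per-task assumption are all hypothetical simplifications, and the real schedulers take more into account than this sketch does.

```python
def simulate(jobs, slots, policy):
    """Return {job_name: finish_step} for a toy cluster.

    jobs   -- list of (name, task_count) in submission order; every task
              is assumed to take exactly one time step
    slots  -- number of task slots available in each time step
    policy -- "fifo" or "fair"
    """
    remaining = dict(jobs)  # dict order preserves submission order
    finished = {}
    step = 0
    while remaining:
        step += 1
        free = slots
        if policy == "fifo":
            # Earlier jobs take every slot they can use before later jobs get any.
            for name in list(remaining):
                run = min(free, remaining[name])
                remaining[name] -= run
                free -= run
                if free == 0:
                    break
        elif policy == "fair":
            # Hand out slots one at a time, round-robin across active jobs.
            while free > 0 and any(remaining.values()):
                for name in list(remaining):
                    if free == 0:
                        break
                    if remaining[name] > 0:
                        remaining[name] -= 1
                        free -= 1
        else:
            raise ValueError("policy must be 'fifo' or 'fair'")
        # Record and retire jobs that completed during this step.
        for name in [n for n, left in remaining.items() if left == 0]:
            finished[name] = step
            del remaining[name]
    return finished


# Hypothetical workload: a large job submitted just before a small one.
jobs = [("big", 8), ("small", 2)]
fifo_result = simulate(jobs, slots=2, policy="fifo")
fair_result = simulate(jobs, slots=2, policy="fair")
print("FIFO:", fifo_result)
print("Fair:", fair_result)
```

    In this toy workload the small job waits behind the large one under FIFO, while under the fair policy it shares the slots from the first step and finishes earlier, at the cost of the large job finishing one step later.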

    Monitoring and Logging
    Discuss the functions and features of Apache Hadoop's logging and monitoring systems.

    Ecosystem
    Understand ecosystem projects and what you need to do to deploy them on a cluster.
