day 1 big data & hadoop by soapt

Post on 22-Nov-2014


Big Data & Hadoop

❑ LIVE online classes
❑ Class recordings made available for a lifetime
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job

         Day 1                                         Day 2
Week 1   Understanding Big Data, Hadoop Architecture   Hadoop Cluster, Data Loading Techniques
Week 2   Basic MapReduce                               Advanced MapReduce, YARN 2.0
Week 3   PIG Latin                                     Hive
Week 4   NoSQL Databases, HBase and ZooKeeper          Project Work


The NYSE generates about one terabyte of new trade data per day. This data is analyzed to determine trends for optimal trades.

Volume Variety Velocity

Big Data


[Diagram: traditional data pipeline. Sources: OLTP RDBMS, Social, Logs. Stages: Storage (Backup / Read-Write), Processing (ETL), Usage / Visualization (Reports). Problems noted: expensive storage and processing; a lot of data discarded; storage spread across systems, not easily accessible, with limited capacity.]

[Diagram: data pipeline with Hadoop. Sources: OLTP RDBMS, Social, Logs. Components: Hadoop, DW, Reports, Reports (Batch). Label: Lot of Data Discarded.]


Reading 1 TB Data

1 Machine: 4 I/O channels, each channel 100 MBps
Time to read: about 45 minutes

100 Machines: 4 I/O channels per machine, each channel 100 MBps
Time to read: about 0.45 minutes
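The arithmetic behind these figures can be checked with a short sketch. It assumes 1 TB means 10^12 bytes, 1 MB means 10^6 bytes, and that every channel on every machine reads in parallel at full speed; the function name is made up for illustration. Under those assumptions the result comes out slightly below the slide's rounded values.

```python
# Back-of-the-envelope read time using the figures from the slide.
TB_BYTES = 10**12           # assumption: 1 TB taken as 10^12 bytes
CHANNEL_MB_PER_S = 100      # each I/O channel reads 100 MB per second
CHANNELS_PER_MACHINE = 4    # four I/O channels per machine


def read_time_minutes(data_bytes, machines):
    """Minutes to scan data_bytes when all channels on all machines read in parallel."""
    bytes_per_second = machines * CHANNELS_PER_MACHINE * CHANNEL_MB_PER_S * 10**6
    return data_bytes / bytes_per_second / 60


single = read_time_minutes(TB_BYTES, machines=1)
cluster = read_time_minutes(TB_BYTES, machines=100)

print(f"1 machine:    {single:.1f} minutes")    # 41.7 minutes
print(f"100 machines: {cluster:.2f} minutes")   # 0.42 minutes
```

The speed-up is linear in the number of machines only because the data is assumed to be spread evenly across them, which is the point of distributing storage as well as processing.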

Story of Hadoop

Characteristics of Hadoop

Hadoop

Reliable

Economical

Scalable

Fault Tolerant


Hadoop Core Components


NameNode: Keeps track of the overall file directory structure and the placement of data blocks.

NameNode (stores metadata only)
METADATA:
/user/doug/hinfo -> 1, 3, 5
/user/doug/pdetail -> 4, 2
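The metadata on this slide can be pictured as two lookup tables: file path to block IDs, and block ID to the DataNodes holding a copy. The sketch below is only a toy model of that idea, not how the NameNode is implemented. The file paths and block IDs come from the slide; the DataNode names and replica placements are hypothetical.

```python
# File path -> ordered list of block IDs, exactly as shown on the slide.
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Block ID -> DataNodes holding a replica.
# These names and placements are hypothetical, for illustration only.
block_locations = {
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn4"],
    3: ["dn1", "dn3", "dn4"],
    4: ["dn1", "dn2", "dn4"],
    5: ["dn2", "dn3", "dn4"],
}


def locate(path):
    """Return (block_id, datanodes) pairs in file order for the given path."""
    return [(block_id, block_locations[block_id]) for block_id in namespace[path]]


for block_id, nodes in locate("/user/doug/hinfo"):
    print(f"block {block_id}: {', '.join(nodes)}")
```

Note that the tables hold only names and locations. The file contents themselves live on the DataNodes, which is why the slide says the NameNode stores metadata only.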

NameNode

Edit Logs

FSImage

NameNode

Secondary NameNode

File System Metadata

It's been an hour?

Quiz

When the NameNode fails, Secondary NameNode takes over instantly and prevents Cluster Failure:

❑ TRUE
❑ FALSE

False. The Secondary NameNode only creates NameNode checkpoints; it does not take over when the NameNode fails. The NameNode can be recovered manually using the ‘edits’ and ‘FSImage’ files stored on the Secondary NameNode.
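A checkpoint can be thought of as replaying the edit log on top of the last FSImage to produce a new FSImage. The sketch below is a simplified model of that merge under assumed data shapes: the image is a dictionary of path to block IDs and each edit is an (operation, path, blocks) tuple. Real FSImage and edits files are binary and track far more state. The `/tmp/scratch` path and block 6 are hypothetical.

```python
def apply_edits(fsimage, edits):
    """Replay logged operations on a copy of the image and return the new image."""
    image = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edits:
        if op == "create":
            image[path] = list(blocks)
        elif op == "delete":
            image.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return image


# Last saved image of the namespace.
fsimage = {"/user/doug/hinfo": [1, 3, 5]}

# Changes recorded since that image was written (hypothetical entries).
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("create", "/tmp/scratch", [6]),
    ("delete", "/tmp/scratch", []),
]

checkpoint = apply_edits(fsimage, edits)
print(checkpoint)
```

Because the checkpoint is only as fresh as the last merge, recovering from it can lose the most recent changes, which is one reason the Secondary NameNode is not a substitute for the NameNode itself.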

JobTracker

JobTracker (contd.)

Quiz


Rack 1 Rack 2 Rack 3

Block A Block B Block C

The rack topology script is configured through the topology.script.file.name property in core-site.xml.
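A topology script is an executable that Hadoop calls with one or more host names or IP addresses as arguments; it prints one rack path per argument. The following is a minimal sketch of such a script. The IP addresses and rack names in the table are hypothetical, and a real cluster would typically load the mapping from a file maintained by the administrator.

```python
#!/usr/bin/env python3
"""Minimal rack topology script sketch: prints one rack path per host argument."""
import sys

# Hypothetical host-to-rack table, for illustration only.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "10.0.3.31": "/rack3",
}

# Fallback for hosts missing from the table.
DEFAULT_RACK = "/default-rack"


def resolve(hosts):
    """Map each host to its rack path, preserving argument order."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    print("\n".join(resolve(sys.argv[1:])))
```

Saved as an executable file, the script's path would be the value of topology.script.file.name. Hosts that are not in the table fall back to a single default rack, so a missing entry degrades rack awareness rather than breaking the lookup.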

Green - GA versions
Black - Not yet released by Apache
Red - Commercial


Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute HDFS hands-on commands
❏ Attempt the class-1 assignment

Thank You! See you in the next class.
