day 1 big data & hadoop by soapt


Upload: kumar-vivek

Post on 22-Nov-2014


Page 1: Day 1 big data & hadoop By SoApt

Big Data & Hadoop

Page 2: Day 1 big data & hadoop By SoApt

❑ LIVE online classes
❑ Class recordings made available for a lifetime
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job

Page 3: Day 1 big data & hadoop By SoApt

          Day 1                                          Day 2
Week 1    Understanding Big Data, Hadoop Architecture    Hadoop Cluster, Data Loading Techniques
Week 2    Basic MapReduce                                Advanced MapReduce, YARN 2.0
Week 3    Pig Latin                                      Hive
Week 4    NoSQL Databases, HBase and ZooKeeper           Project Work

Page 5: Day 1 big data & hadoop By SoApt

NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades.

Page 6: Day 1 big data & hadoop By SoApt

Big Data: Volume, Variety, Velocity

Page 17: Day 1 big data & hadoop By SoApt

Storage - Backup / Read - Write

Processing (ETL)

Usage / Visualization

Page 18: Day 1 big data & hadoop By SoApt

[Diagram: data sources OLTP RDBMS, Social, and Logs; output Reports. Annotations: expensive storage and processing; a lot of data discarded; storage spread across, not easily accessible; limited storage capacity.]

Page 19: Day 1 big data & hadoop By SoApt

[Diagram: data sources OLTP RDBMS, Social, and Logs; Hadoop; Reports (Batch); DW Reports. Annotation: a lot of data discarded.]

Page 27: Day 1 big data & hadoop By SoApt

Reading 1 TB Data

                 1 Machine     100 Machines
I/O Channels     4             4
Each Channel     100 MBps      100 MBps
Time to read     45 Minutes    0.45 Minutes
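
The read times on this slide follow from simple throughput arithmetic. Here is a minimal sketch, assuming that "MBps" means megabytes per second, that 1 TB is taken as 1,000,000 MB, that the data is spread evenly across the machines, and that every channel reads in parallel with no overhead. The function name and its parameters are illustrative and do not come from the slides.

```python
def read_time_minutes(data_mb, machines, channels_per_machine=4, channel_mb_per_s=100):
    """Idealized time, in minutes, to read data_mb megabytes in parallel."""
    # Every channel on every machine is assumed to read at full speed.
    aggregate_mb_per_s = machines * channels_per_machine * channel_mb_per_s
    return data_mb / aggregate_mb_per_s / 60


ONE_TB_IN_MB = 1_000_000  # assumption: decimal terabyte

single_machine = read_time_minutes(ONE_TB_IN_MB, machines=1)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=100)

print(f"1 machine:    {single_machine:.1f} minutes")
print(f"100 machines: {cluster:.2f} minutes")
```

Under these assumptions the results are roughly 42 minutes and 0.42 minutes, slightly below the rounded figures on the slide. The point is the ratio: 100 machines reading in parallel cut the time by a factor of 100.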

Page 28: Day 1 big data & hadoop By SoApt

Story of Hadoop

Page 32: Day 1 big data & hadoop By SoApt

Characteristics of Hadoop

Hadoop

Reliable

Economical

Scalable

Fault Tolerant

Page 36: Day 1 big data & hadoop By SoApt

Hadoop Core Components

Page 40: Day 1 big data & hadoop By SoApt

NameNode: Keeps track of the overall file directory structure and the placement of data blocks.

NameNode (stores metadata only)
METADATA:
/user/doug/hinfo -> 1 3 5
/user/doug/pdetail -> 4 2
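
The metadata shown above can be pictured as two lookup tables: one from file path to block IDs, and one from block ID to the machines holding that block. The sketch below is a toy model only. The paths and block IDs come from the slide, while the DataNode names, the placement table, and the `locate_blocks` helper are hypothetical.

```python
# File namespace kept by the NameNode: path -> ordered block IDs (from the slide).
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Hypothetical placement table: block ID -> DataNodes that hold a copy.
block_locations = {
    1: ["datanode-a", "datanode-b"],
    2: ["datanode-b", "datanode-c"],
    3: ["datanode-a", "datanode-c"],
    4: ["datanode-a", "datanode-b"],
    5: ["datanode-b", "datanode-c"],
}


def locate_blocks(path):
    """Return (block ID, DataNodes) pairs for a file, in block order."""
    return [(block_id, block_locations[block_id]) for block_id in namespace[path]]


for block_id, nodes in locate_blocks("/user/doug/hinfo"):
    print(block_id, nodes)
```

The NameNode in this picture holds no file contents at all. A client asks it where the blocks live and then reads the data from the listed machines.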

Page 41: Day 1 big data & hadoop By SoApt

NameNode

Edit Logs

FSImage

Page 42: Day 1 big data & hadoop By SoApt

NameNode

Secondary NameNode

File System Metadata

It's been an hour?

Page 46: Day 1 big data & hadoop By SoApt

Quiz

When the NameNode fails, Secondary NameNode takes over instantly and prevents Cluster Failure:

❑ TRUE
❑ FALSE

False. The Secondary NameNode is used for creating NameNode checkpoints. The NameNode can be recovered manually using the ‘edits’ and ‘FSImage’ files stored on the Secondary NameNode.
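
A checkpoint, as described in the answer above, amounts to replaying the edit log on top of the last FSImage to produce a fresh image. The sketch below is a toy model of that merge, assuming a namespace that is just a dictionary and an edit log with two made-up operation names, "create" and "delete". The real FSImage and edits files are Hadoop's own on-disk formats, and none of these names come from Hadoop.

```python
def apply_edits(fsimage, edits):
    """Return a new namespace snapshot with every logged edit replayed in order."""
    merged = dict(fsimage)  # leave the previous image untouched
    for operation, path, blocks in edits:
        if operation == "create":
            merged[path] = blocks
        elif operation == "delete":
            merged.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {operation}")
    return merged


# Last saved image, plus the changes logged since it was written.
fsimage = {"/user/doug/hinfo": [1, 3, 5]}
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("delete", "/user/doug/hinfo", None),
]

checkpoint = apply_edits(fsimage, edits)
edits = []  # once merged, the log can start over from empty

print(checkpoint)
```

Because the checkpoint is only a periodic copy, any change logged after the last merge is missing from it, which is one reason the Secondary NameNode cannot take over instantly.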

Page 47: Day 1 big data & hadoop By SoApt

JobTracker

Page 48: Day 1 big data & hadoop By SoApt

JobTracker (contd.)

Page 49: Day 1 big data & hadoop By SoApt

JobTracker (contd.)

Page 54: Day 1 big data & hadoop By SoApt

Rack 1 Rack 2 Rack 3

Block A Block B Block C

Topology script: set the topology.script.file.name property in core-site.xml
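
A topology script is an executable that Hadoop calls with one or more host names or IP addresses as arguments and that prints one rack path per argument. The sketch below shows one possible shape for such a script. The IP addresses, the rack names other than `/default-rack`, and the `resolve_racks` helper are hypothetical, and a real cluster would supply its own host table.

```python
#!/usr/bin/env python3
import sys

# Hypothetical host-to-rack table; replace with the cluster's real nodes.
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.2.11": "/rack2",
    "10.0.3.11": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # fallback for hosts missing from the table


def resolve_racks(hosts):
    """Map each host to its rack path, keeping the input order."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    # Hosts arrive as command-line arguments; print one rack per host.
    print(" ".join(resolve_racks(sys.argv[1:])))
```

Saved as an executable file and referenced from the property named above, a script like this lets the cluster tell which nodes share a rack when it places copies of a block.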

Page 60: Day 1 big data & hadoop By SoApt

Green - GA Versions
Black - Not Released by Apache yet
Red - Commercial

Page 63: Day 1 big data & hadoop By SoApt

Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute HDFS hands-on commands
❏ Attempt the class-1 assignment

Page 64: Day 1 big data & hadoop By SoApt

Thank You! See you in the next class.