Day 1: Big Data & Hadoop, by SoApt
Big Data & Hadoop
❑ Live online classes
❑ Class recordings made available for a lifetime
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job
          Day 1                                          Day 2
Week 1    Understanding Big Data, Hadoop Architecture    Hadoop Cluster, Data Loading Techniques
Week 2    Basic MapReduce                                Advanced MapReduce, YARN 2.0
Week 3    PIG Latin                                      Hive
Week 4    NoSQL Databases, HBase and ZooKeeper           Project Work
The NYSE generates about one terabyte of new trade data per day, which is used for stock trading analytics to determine trends for optimal trades.
Volume Variety Velocity
Big Data
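Figure (data pipeline): sources are OLTP RDBMS, social, and logs; stages are Storage (backup / read-write), Processing (ETL), and Usage / Visualization (reports). Noted problems: expensive storage and processing; a lot of data discarded; storage spread across systems, not easily accessible, limited storage capacity.
Figure (data pipeline with Hadoop): sources are OLTP RDBMS, social, and logs; outputs are batch reports from Hadoop and DW reports. A lot of data discarded is again noted.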
Reading 1 TB Data
1 machine, 4 I/O channels, each channel 100 MBps: 45 minutes
100 machines, 4 I/O channels each, each channel 100 MBps: 0.45 minutes
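The arithmetic behind this comparison can be checked with a short sketch. It assumes a decimal terabyte (1 TB = 1,000,000 MB) and an idealized setup in which every channel reads in parallel at its full rate with no coordination overhead; the function name is illustrative only.

```python
def read_time_minutes(data_mb, machines, channels_per_machine, mb_per_sec_per_channel):
    """Idealized time, in minutes, to scan data_mb when all channels read in parallel."""
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60

ONE_TB_IN_MB = 1_000_000  # assumption: decimal terabyte

single = read_time_minutes(ONE_TB_IN_MB, 1, 4, 100)     # 1 machine, 400 MB/s total
cluster = read_time_minutes(ONE_TB_IN_MB, 100, 4, 100)  # 100 machines, 40,000 MB/s total

print(f"1 machine:    {single:.1f} minutes")
print(f"100 machines: {cluster:.2f} minutes")
```

Under these assumptions the results come out a little below the rounded 45 and 0.45 minutes quoted above, but the point is the same: spreading the read across 100 machines cuts the time by a factor of 100.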
Story of Hadoop
Characteristics of Hadoop
Hadoop
Reliable
Economical
Scalable
Fault Tolerant
Hadoop Core Components
NameNode: Keeps track of the overall file directory structure and the placement of data blocks.
NameNode (stores metadata only)
METADATA:
/user/doug/hinfo -> 1, 3, 5
/user/doug/pdetail -> 4, 2
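The metadata example above can be pictured as a simple lookup table from file path to block IDs. The following is only a toy model for illustration, not how the NameNode is implemented; the variable and function names are hypothetical.

```python
# Toy model of the NameNode metadata: file path -> ordered list of block IDs.
# The NameNode holds only this mapping; the block contents live on DataNodes.
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

def blocks_for(path):
    """Return the ordered block IDs that make up the file at path."""
    return namespace[path]

def files_using(block_id):
    """Return the paths of all files that contain the given block ID."""
    return [path for path, blocks in namespace.items() if block_id in blocks]

print(blocks_for("/user/doug/hinfo"))  # [1, 3, 5]
print(files_using(4))                  # ['/user/doug/pdetail']
```

A client that wants to read a file first asks for its block list in this way and then fetches the blocks themselves from the machines that store them.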
NameNode
Edit Logs
FSImage
NameNode
Secondary NameNode
File System Metadata
It's been an hour?
Quiz
When the NameNode fails, the Secondary NameNode takes over instantly and prevents cluster failure:
❑ TRUE
❑ FALSE
False. The Secondary NameNode is used for creating NameNode checkpoints. The NameNode can be recovered manually using the 'edits' and 'FSImage' files stored on the Secondary NameNode.
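The checkpoint idea can be sketched as replaying the edit log on top of the last FSImage to produce a new, up-to-date image. This is a conceptual toy only: the edit tuples, the dictionary-based image, and the function names are assumptions made for illustration and do not reflect the real on-disk formats.

```python
def apply_edit(image, edit):
    """Apply one logged change (hypothetical format) to an in-memory image."""
    op, path, *rest = edit
    if op == "create":
        image[path] = list(rest[0])  # rest[0] holds the file's block IDs
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")

def checkpoint(fsimage, edit_log):
    """Merge the edit log into a copy of the FSImage; return the new image and an empty log."""
    new_image = {path: list(blocks) for path, blocks in fsimage.items()}
    for edit in edit_log:
        apply_edit(new_image, edit)
    return new_image, []

fsimage = {"/user/doug/hinfo": [1, 3, 5]}
edits = [
    ("create", "/user/doug/pdetail", [4, 2]),
    ("delete", "/user/doug/hinfo"),
]

new_image, new_edits = checkpoint(fsimage, edits)
print(new_image)  # {'/user/doug/pdetail': [4, 2]}
print(new_edits)  # []
```

Because the merged image and a fresh log are all that is needed to rebuild the file system metadata, a copy of them can be used to restore a failed NameNode by hand, which is different from an automatic takeover.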
JobTracker
Quiz
Rack 1 Rack 2 Rack 3
Block A Block B Block C
The topology script is configured through the property topology.script.file.name in core-site.xml.
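A topology script is an executable that Hadoop calls with one or more host names or IP addresses as arguments and that prints one rack path per argument. A minimal sketch follows; the IP addresses and rack names are made up for illustration, and a real cluster would use its own mapping.

```python
#!/usr/bin/env python3
"""Toy rack-awareness topology script: print one rack path per host argument."""
import sys

# Hypothetical mapping of DataNode addresses to racks (assumption, not from the slides).
RACK_BY_HOST = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "10.0.3.31": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # fallback for hosts missing from the mapping

def resolve(hosts):
    """Return the rack path for each host, in the same order as the input."""
    return [RACK_BY_HOST.get(host, DEFAULT_RACK) for host in hosts]

if __name__ == "__main__":
    for rack in resolve(sys.argv[1:]):
        print(rack)
```

The script file would be made executable and its path set as the value of topology.script.file.name, so that blocks can be placed with knowledge of which rack each node sits in.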
Green - GA versions
Black - Not released by Apache yet
Red - Commercial
❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/
❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know
❏ https://hadoop.apache.org/releases.html
❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/
Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute the HDFS hands-on commands
❏ Attempt the class-1 assignment
Thank you! See you in the next class.