Day 1: Big Data & Hadoop, by SoApt

Big Data & Hadoop

❑ LIVE online classes
❑ Class recordings available for life
❑ Quizzes and assignments at the end of each chapter
❑ Technical support
❑ Project work
❑ Assessment and certification
❑ Post-training guidance and support
❑ Assistance in finding a relevant job

Week 1, Day 1: Understanding Big Data; Hadoop Architecture
Week 1, Day 2: Hadoop Cluster; Data Loading Techniques
Week 2, Day 1: Basic MapReduce
Week 2, Day 2: Advanced MapReduce; YARN 2.0
Week 3, Day 1: Pig Latin
Week 3, Day 2: Hive
Week 4: NoSQL Databases, HBase and ZooKeeper
Project Work


The NYSE generates about one terabyte of new trade data per day, which is used to perform stock-trading analytics that identify trends for optimal trades.

Big Data: Volume, Variety, Velocity


http://www.soapt.com/blog/?p=147






http://en.wikipedia.org/wiki/Big_Data

http://www-01.ibm.com/software/data/bigdata/




http://wiki.apache.org/hadoop/PoweredBy



Figure: Traditional data architecture. Sources (OLTP RDBMS, Social, Logs) flow through Storage (Backup / Read-Write), Processing (ETL) and Usage / Visualization to produce Reports. Problems noted: expensive storage and processing; a lot of data discarded; storage spread across systems, not easily accessible, with limited capacity.

Figure: Architecture with Hadoop. Labels: OLTP RDBMS, Social, Logs, Lot of Data Discarded, Hadoop, Reports (Batch), DW Reports.


Reading 1 TB of data (each I/O channel reads at 100 MBps):
1 machine with 4 I/O channels: about 45 minutes
100 machines with 4 I/O channels each: about 0.45 minutes
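The arithmetic behind these figures can be checked with a small calculation. This is a sketch that assumes the read is split perfectly evenly across every channel of every machine and ignores seek, network and coordination overhead; it also assumes binary units (1 TB = 1024 × 1024 MB), which is why it lands slightly under the rounded 45 minutes on the slide.

```python
def read_time_minutes(data_mb: float, machines: int,
                      channels_per_machine: int,
                      mb_per_sec_per_channel: float) -> float:
    """Ideal time (minutes) to scan data_mb megabytes when the work is
    divided evenly across all I/O channels of all machines."""
    # Aggregate throughput of the whole setup, in MB per second.
    aggregate_mb_per_sec = machines * channels_per_machine * mb_per_sec_per_channel
    return data_mb / aggregate_mb_per_sec / 60


ONE_TB_IN_MB = 1024 * 1024  # assumption: binary units

single = read_time_minutes(ONE_TB_IN_MB, machines=1,
                           channels_per_machine=4, mb_per_sec_per_channel=100)
cluster = read_time_minutes(ONE_TB_IN_MB, machines=100,
                            channels_per_machine=4, mb_per_sec_per_channel=100)

print(f"1 machine:    {single:.1f} min")   # prints "1 machine:    43.7 min"
print(f"100 machines: {cluster:.2f} min")  # prints "100 machines: 0.44 min"
```

Because the work divides evenly, 100 machines give a 100× speedup; in practice, moving the computation to the data (rather than the data to one machine) is what lets Hadoop approach this ideal.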

Story of Hadoop

Characteristics of Hadoop

❑ Reliable
❑ Economical
❑ Scalable
❑ Fault Tolerant


Hadoop Core Components


NameNode: keeps track of the overall file directory structure and the placement of data blocks.

NameNode (stores metadata only)
METADATA:
/user/doug/hinfo -> 1 3 5
/user/doug/pdetail -> 4 2
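The metadata example above can be modelled as two lookup tables: one from file path to the ordered list of block IDs, and one from block ID to the DataNodes holding a replica. This is a simplified, hypothetical sketch: the DataNode names (`dn1` to `dn4`), the replica assignments and the `locate` helper are made up for illustration. In real HDFS the block-to-DataNode table is not persisted in the FSImage; the NameNode rebuilds it from the block reports the DataNodes send.

```python
# File path -> ordered block IDs (taken from the slide's example).
namespace = {
    "/user/doug/hinfo": [1, 3, 5],
    "/user/doug/pdetail": [4, 2],
}

# Block ID -> DataNodes holding a replica (hypothetical, 3 replicas each).
block_locations = {
    1: ["dn1", "dn2", "dn3"],
    2: ["dn2", "dn3", "dn4"],
    3: ["dn1", "dn3", "dn4"],
    4: ["dn1", "dn2", "dn4"],
    5: ["dn2", "dn3", "dn4"],
}


def locate(path: str) -> list:
    """Return (block_id, replica_hosts) pairs for a file, in block order.

    A client reading the file would fetch each block from one of the
    listed DataNodes; the NameNode itself never serves file data.
    """
    return [(block, block_locations.get(block, [])) for block in namespace[path]]


for block, hosts in locate("/user/doug/pdetail"):
    print(block, hosts)
```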

Figure: NameNode, with its Edit Logs and FSImage.


Figure: NameNode, Secondary NameNode and file system metadata (caption: "It's been an hour?").
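Conceptually, a checkpoint replays the edit log on top of the last FSImage to produce a new FSImage, after which a fresh, empty edit log is started. The sketch below models this with plain Python structures and reuses the example paths from the NameNode metadata slide; it is not the on-disk HDFS format, and the class and function names are hypothetical. The one-hour trigger mirrors the default checkpoint period (`fs.checkpoint.period` in Hadoop 1.x, `dfs.namenode.checkpoint.period` in Hadoop 2.x, both 3600 seconds).

```python
from dataclasses import dataclass, field

CHECKPOINT_PERIOD_S = 3600  # default checkpoint period: one hour


@dataclass
class NameNodeState:
    # FSImage: snapshot of the namespace as of the last checkpoint.
    fsimage: dict = field(default_factory=dict)
    # Edit log: namespace changes recorded since that checkpoint.
    edits: list = field(default_factory=list)

    def create(self, path: str, blocks: list) -> None:
        self.edits.append(("create", path, list(blocks)))

    def delete(self, path: str) -> None:
        self.edits.append(("delete", path))


def apply_edits(fsimage: dict, edits: list) -> dict:
    """Replay edits, oldest first, on a copy of the FSImage."""
    merged = dict(fsimage)
    for op in edits:
        if op[0] == "create":
            _, path, blocks = op
            merged[path] = blocks
        elif op[0] == "delete":
            merged.pop(op[1], None)
    return merged


def should_checkpoint(last_checkpoint_s: float, now_s: float,
                      period_s: float = CHECKPOINT_PERIOD_S) -> bool:
    """'It's been an hour?': true once a full period has elapsed."""
    return now_s - last_checkpoint_s >= period_s


def checkpoint(state: NameNodeState) -> None:
    """Merge the edit log into a new FSImage and start an empty edit log."""
    state.fsimage = apply_edits(state.fsimage, state.edits)
    state.edits = []


nn = NameNodeState()
nn.create("/user/doug/hinfo", [1, 3, 5])
nn.create("/user/doug/pdetail", [4, 2])
nn.delete("/user/doug/pdetail")
if should_checkpoint(last_checkpoint_s=0, now_s=3600):
    checkpoint(nn)
print(nn.fsimage)  # prints {'/user/doug/hinfo': [1, 3, 5]}
print(nn.edits)    # prints []
```

Keeping the edit log short in this way is what makes restarting or manually recovering the NameNode fast: only the edits since the last checkpoint need to be replayed.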


Quiz

When the NameNode fails, Secondary NameNode takes over instantly and prevents Cluster Failure:

❑ TRUE
❑ FALSE

False. The Secondary NameNode is used for creating NameNode checkpoints. The NameNode can be recovered manually using the ‘edits’ and ‘FSImage’ files stored on the Secondary NameNode.

JobTracker

JobTracker (contd.)

Figure: Rack awareness. Blocks A, B and C placed across Rack 1, Rack 2 and Rack 3.
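For a replication factor of 3, HDFS's default rack-aware policy puts the first replica on the writer's node (or a random node if the writer is not a DataNode), the second on a node in a different rack, and the third on a different node in that same remote rack. The sketch below illustrates that rule only; the function name, rack and node names are hypothetical, and it assumes at least one remote rack with two or more nodes and ignores node load and free space, which the real policy also considers.

```python
import random


def place_replicas(racks: dict, writer=None, rng=random) -> list:
    """Choose 3 nodes for one block. racks maps rack path -> node names."""
    node_rack = {node: rack for rack, nodes in racks.items() for node in nodes}

    # Replica 1: the writer's own node if it is a DataNode, else a random node.
    first = writer if writer in node_rack else rng.choice(sorted(node_rack))

    # Replica 2: a node on a different rack, so a whole-rack failure
    # cannot take out every copy.
    remote_racks = [r for r in sorted(racks)
                    if r != node_rack[first] and len(racks[r]) >= 2]
    remote = rng.choice(remote_racks)

    # Replicas 2 and 3: two distinct nodes on that remote rack, which keeps
    # cross-rack write traffic to a single transfer.
    second, third = rng.sample(racks[remote], 2)
    return [first, second, third]


RACKS_EXAMPLE = {
    "/rack1": ["dn1", "dn2"],
    "/rack2": ["dn3", "dn4"],
    "/rack3": ["dn5", "dn6"],
}
print(place_replicas(RACKS_EXAMPLE, writer="dn1", rng=random.Random(42)))
```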

Rack awareness is configured by setting the topology.script.file.name property in core-site.xml to the path of a topology script.
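Hadoop runs the configured topology script with one or more host names or IP addresses as arguments and reads back one rack path per argument from the script's standard output. A minimal sketch of such a script follows; the address-to-rack table is entirely hypothetical, and unknown hosts fall back to `/default-rack`. In Hadoop 2.x the property is named `net.topology.script.file.name`.

```python
#!/usr/bin/env python3
"""Hypothetical rack-topology script for Hadoop rack awareness.

Usage (as Hadoop would call it): topology.py 10.0.1.11 10.0.3.31
Output: one rack path per argument, space separated.
"""
import sys

# Hypothetical DataNode address -> rack path table.
RACKS = {
    "10.0.1.11": "/rack1",
    "10.0.1.12": "/rack1",
    "10.0.2.21": "/rack2",
    "10.0.3.31": "/rack3",
}
DEFAULT_RACK = "/default-rack"  # used for any host not in the table


def resolve(hosts: list) -> list:
    """Map each host/IP argument to its rack path, preserving order."""
    return [RACKS.get(host, DEFAULT_RACK) for host in hosts]


if __name__ == "__main__":
    print(" ".join(resolve(sys.argv[1:])))
```

Keeping the mapping in a static table is the simplest choice; larger clusters often derive the rack from the IP subnet instead, but the input/output contract stays the same.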


Figure: Hadoop versions (legend: green = GA versions; black = not released by Apache yet; red = commercial).

❏ http://blog.cloudera.com/blog/2012/01/an-update-on-apache-hadoop-1-0/

❏ https://blogs.apache.org/bigtop/entry/all_you_wanted_to_know
❏ https://hadoop.apache.org/releases.html
❏ http://hortonworks.com/blog/apache-hadoop-2-is-ga/



Class 2 Pre-work
❏ Set up the Hadoop environment using the documents provided on Google Drive
❏ CDH3 (recommended) or CDH4
❏ Execute basic Linux commands
❏ Execute the hands-on HDFS commands
❏ Attempt the Class 1 assignment

Thank you! See you in the next class.
