Introduction to Hadoop

Workshop on data analytics using big data tools ‘ 2016 – BHARATHIAR UNIVERSITY K.Santhiya , Ph.d Research Scholar , Dr.V.Bhuvaneswari, Asst.Professor, Dept. of Comp. Appll., Bharathiar University,- WDABT 2016

Upload: karthika-karthi

Post on 16-Apr-2017


Page 1: Introduction to hadoop

Workshop on Data Analytics Using Big Data Tools 2016, BHARATHIAR UNIVERSITY


Page 2: Introduction to hadoop

INTRODUCTION TO HADOOP

Presented by
K. SANTHIYA
Ph.D. Research Scholar
Department of Computer Applications
Bharathiar University

Under the guidance of
Dr. V. BHUVANESWARI
Assistant Professor
Department of Computer Applications
Bharathiar University

Page 3: Introduction to hadoop

AGENDA

• WORLD OF DATA: Few Instances

• CONVENTIONAL APPROACHES: Limitations

• HADOOP FRAMEWORK: Terminology Review

• HADOOP COMPONENTS: HDFS & MAPREDUCE

• HDFS IN DETAIL

• HADOOP ECOSYSTEM


Page 4: Introduction to hadoop

DATA EXPLOSION

2.5 quintillion bytes of data are created each day.


Page 5: Introduction to hadoop

WORLD WIDE DATA

[Figure: worldwide data, since the beginning of time vs. the last two years]


Page 6: Introduction to hadoop

THE WORLD OF DATA

2.9 Million, 375 MB, 20 Hrs, 24 PB, 50 Million, 700 Billion, 1.3 Exabytes, 72 items


Page 7: Introduction to hadoop

A Big Data file starts at a minimum size of 1 terabyte.


Page 8: Introduction to hadoop


Page 9: Introduction to hadoop


Page 10: Introduction to hadoop

Conventional approaches

• RDBMS
• OS FILE SYSTEM
• SQL QUERIES
• CUSTOM FRAMEWORK
    * C / C++
    * PERL
    * PYTHON


Page 11: Introduction to hadoop

ISSUES IN LEGACY SYSTEMS

• Limited storage capacity
• Limited processing capacity
• No scalability
• Single point of failure
• Sequential processing
• RDBMSs can handle only structured data
• Requires preprocessing of data
• Information is collected according to current business needs


Page 12: Introduction to hadoop

How do we mine (and mind) all this data?

HOW TO RESOLVE ALL THESE ISSUES?


Page 13: Introduction to hadoop

Mr. HADOOP says he has a solution to our BIG problem!


Page 14: Introduction to hadoop


Page 15: Introduction to hadoop


Page 16: Introduction to hadoop

Companies Using Hadoop


Page 17: Introduction to hadoop

What is Hadoop?

Apache Hadoop is a framework that allows for the distributed processing of large datasets across clusters of commodity computers using a simple programming model.

Concept: Moving computation is more efficient than moving large data.
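The "simple programming model" mentioned here is MapReduce, which the agenda lists as a core Hadoop component. The following is a minimal single-process sketch of a word-count job in plain Python. The function names (`map_phase`, `shuffle`, `reduce_phase`) and the sample splits are illustrative assumptions, not Hadoop APIs. On a real cluster each split would be mapped on a node that already stores it, so the input data itself does not have to be moved.

```python
from collections import defaultdict

def map_phase(split):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]

def shuffle(pairs):
    """Group all emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

# Hypothetical input splits; on a cluster each would live on a different node.
splits = ["big data needs big storage", "hadoop stores big data"]

mapped = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle(mapped))
print(word_counts)
```

Only the small (word, count) pairs cross the boundary between the map and reduce steps, which is the point of the "move computation, not data" concept.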


Page 18: Introduction to hadoop

STORAGE COMPUTATION COMPLEXITY


Page 19: Introduction to hadoop

TWO DAEMONS OF HADOOP


Page 20: Introduction to hadoop

ARCHITECTURE


Page 21: Introduction to hadoop

TERMINOLOGY REVIEW

Cluster
    Rack 1: Node 1, Node 2, ..., Node n
    Rack 2: Node 1, Node 2, ..., Node n
    ...


Page 22: Introduction to hadoop

HADOOP CLUSTER ARCHITECTURE


Page 23: Introduction to hadoop


Page 24: Introduction to hadoop

HADOOP CORE SERVICES

i. Name Node
ii. Data Node
iii. Resource Manager
iv. Application Master
v. Node Manager
vi. Secondary Name Node


Page 25: Introduction to hadoop

HDFS – REAL LIFE CONNECT

• A college library was gifted a massive collection of books by a patron. The books were very popular titles. The librarian decided to arrange the books in a small rack, and distribute multiple copies of each book in other racks, so that students can find the books easily. Similarly, HDFS creates multiple copies of a data block, and keeps them in separate systems for easy access.


Page 26: Introduction to hadoop

WHAT IS HDFS

• Hadoop Distributed File System

Highly fault-tolerant, distributed, reliable, scalable file system for data storage.

Stores multiple copies of data on different nodes.

A file is split up into blocks and stored on multiple machines.

A Hadoop cluster typically has a single NameNode and a number of DataNodes.
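The division of labour between the single NameNode and the DataNodes can be pictured as a pair of lookup tables. The following is a toy model under stated assumptions, not Hadoop code: the class name `ToyNameNode`, the block-id format, and the round-robin choice of DataNodes are all hypothetical, and real HDFS places replicas with rack awareness.

```python
class ToyNameNode:
    """Keeps metadata only: which blocks make up a file and where each block lives."""

    def __init__(self, datanodes, replication=3):
        self.datanodes = list(datanodes)
        if replication > len(self.datanodes):
            raise ValueError("replication factor exceeds the number of DataNodes")
        self.replication = replication
        self.file_to_blocks = {}  # file path -> ordered list of block ids
        self.block_to_nodes = {}  # block id  -> DataNodes holding a replica
        self._cursor = 0          # round-robin start index (an assumption, not HDFS policy)

    def add_file(self, path, num_blocks):
        """Register a file and assign each of its blocks to distinct DataNodes."""
        total = len(self.datanodes)
        block_ids = []
        for index in range(num_blocks):
            block_id = f"{path}#blk_{index}"
            holders = [self.datanodes[(self._cursor + i) % total]
                       for i in range(self.replication)]
            self._cursor = (self._cursor + 1) % total
            self.block_to_nodes[block_id] = holders
            block_ids.append(block_id)
        self.file_to_blocks[path] = block_ids

    def locate(self, path):
        """Return (block id, replica locations) pairs in file order."""
        return [(b, self.block_to_nodes[b]) for b in self.file_to_blocks[path]]


namenode = ToyNameNode(["dn1", "dn2", "dn3", "dn4"])
namenode.add_file("/learning/test.txt", num_blocks=2)
locations = namenode.locate("/learning/test.txt")
for block_id, holders in locations:
    print(block_id, holders)
```

The model stores no file contents at all, which mirrors the idea that the NameNode tracks metadata while the DataNodes hold the blocks themselves.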


Page 27: Introduction to hadoop

HDFS BLOCKS

• Files are broken into large blocks, typically 128 MB in size.
• Blocks are replicated for reliability:
    One replica on the local node
    Another replica on a node in a remote rack
    Third replica on a different node in the same remote rack
    Additional replicas are randomly placed
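A minimal sketch of rack-aware placement for a replication factor of three. It assumes the default policy documented for recent Hadoop releases: first replica on the writer's node, second on a node in a different rack, third on another node in that same remote rack. The function name `place_replicas`, the dictionary layout, and the rack and node names are hypothetical and chosen only for illustration.

```python
import random

def place_replicas(topology, writer_node, rng=None):
    """Choose three DataNodes for one block.

    topology: dict mapping rack name -> list of node names (hypothetical layout).
    Assumed policy: 1st replica on the writer's node, 2nd on a node in a
    different rack, 3rd on another node in that same remote rack.
    """
    rng = rng or random.Random()
    local_rack = next(rack for rack, nodes in topology.items()
                      if writer_node in nodes)
    # Only racks with at least two nodes can host both remote replicas.
    remote_racks = [rack for rack, nodes in topology.items()
                    if rack != local_rack and len(nodes) >= 2]
    if not remote_racks:
        raise ValueError("need a remote rack with at least two nodes")
    remote_rack = rng.choice(remote_racks)
    second, third = rng.sample(topology[remote_rack], 2)
    return [writer_node, second, third]


topology = {
    "rack1": ["node1", "node2", "node3"],
    "rack2": ["node4", "node5", "node6"],
}
replicas = place_replicas(topology, "node1", random.Random(0))
print(replicas)
```

With this layout a block survives the loss of either whole rack, while only one of the two extra copies has to cross between racks during the write.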


Page 28: Introduction to hadoop

HDFS BLOCKS contd.

ADVANTAGES OF HDFS BLOCKS

• Fixed size
• Chunk of file < block size: only the needed space is used.
• E.g., with a 128 MB block size, a 420 MB file is split into three 128 MB blocks and one 36 MB block.
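The block arithmetic for the 420 MB example can be worked through in a few lines. This sketch assumes the 128 MB block size given on the previous slide; the helper name `split_into_blocks` is illustrative.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the size in MB of each block a file of the given size occupies."""
    full_blocks, remainder = divmod(file_size_mb, block_size_mb)
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block uses only the space it needs
    return sizes


blocks = split_into_blocks(420)
print(blocks)  # [128, 128, 128, 36]
```

The final 36 MB chunk occupies 36 MB of storage, not a full 128 MB block.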


Page 29: Introduction to hadoop

HDFS Operation Principle


Page 30: Introduction to hadoop

NAME NODE


Page 31: Introduction to hadoop

DATA NODE


Page 32: Introduction to hadoop

SECONDARY NAME NODE


Page 33: Introduction to hadoop

HDFS ARCHITECTURE


Page 34: Introduction to hadoop

HDFS – BLOCK REPLICATION ARCHITECTURE


Page 35: Introduction to hadoop

NAMENODE IN HA MODE


Page 36: Introduction to hadoop

Name Node HA Architecture


Page 37: Introduction to hadoop

BUSINESS SCENARIO

Olivia Tyler is the EVP of IT Operations with Nutri Worldwide, Inc., and she has decided to use HDFS for storing big data. She will use the HDFS shell to store the data in a Hadoop file system, and she will execute various commands on it.


Page 38: Introduction to hadoop


Page 39: Introduction to hadoop

HADOOP SHELL COMMANDS

hadoop fs -mkdir /learning

hadoop fs -copyFromLocal test.txt /learning

hadoop fs -ls /learning

hadoop fs -cat /learning/test.txt
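The same four steps can be scripted. The sketch below builds the commands as argument lists and, by default, only prints them, so it is safe to run on a machine without Hadoop. The helper names `hdfs_commands` and `run` are hypothetical; the `hadoop fs` subcommands are the ones shown on the slide.

```python
import os
import subprocess

def hdfs_commands(local_file, hdfs_dir):
    """Build the four `hadoop fs` invocations from the slide as argument lists."""
    remote_file = f"{hdfs_dir.rstrip('/')}/{os.path.basename(local_file)}"
    return [
        ["hadoop", "fs", "-mkdir", hdfs_dir],                      # create the directory
        ["hadoop", "fs", "-copyFromLocal", local_file, hdfs_dir],  # upload the local file
        ["hadoop", "fs", "-ls", hdfs_dir],                         # list the directory
        ["hadoop", "fs", "-cat", remote_file],                     # print the file contents
    ]

def run(commands, dry_run=True):
    """Print each command, and execute it too when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)  # raises if hadoop exits non-zero


commands = hdfs_commands("test.txt", "/learning")
run(commands)  # dry run: prints the commands without touching a cluster
```

Calling `run(commands, dry_run=False)` would execute the commands and requires a configured Hadoop client on the PATH.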


Page 40: Introduction to hadoop

HADOOP ECOSYSTEM COMPONENTS


Page 41: Introduction to hadoop

DATA TRANSFER COMPONENTS


Page 42: Introduction to hadoop

DATA STORE COMPONENTS

• Following are the data store components of the Hadoop Ecosystem:

HBASE: distributed, scalable big data store

CASSANDRA: scalable, consistent, distributed, structured key-value store

ACCUMULO: sorted, distributed key-value data storage and retrieval system


Page 43: Introduction to hadoop

Serialization Components

• The serialization components are Avro, Trevni, and Thrift.

• Avro is a data serialization system.

• Trevni is a column file format; its specification is intended to permit compatible, independent implementations that read and/or write files in this format.

• Thrift is a framework for scalable, cross-language services development.
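To make "column file format" concrete, here is a plain-Python illustration of the difference between row-oriented and column-oriented layouts. It is not Trevni's API or on-disk format; the helper `to_columns` and the sample records are hypothetical, and the sketch assumes every record has the same fields.

```python
def to_columns(rows):
    """Pivot a list of row dicts into one list of values per field."""
    columns = {}
    for row in rows:
        for field, value in row.items():
            columns.setdefault(field, []).append(value)
    return columns


# Hypothetical records in row-oriented form.
rows = [
    {"user": "u1", "clicks": 3, "country": "IN"},
    {"user": "u2", "clicks": 5, "country": "US"},
    {"user": "u3", "clicks": 2, "country": "IN"},
]

columns = to_columns(rows)

# An aggregate over one field touches a single column, not every record.
total_clicks = sum(columns["clicks"])
print(columns["clicks"], total_clicks)
```

Storing values of the same field together is what lets a columnar reader skip the fields a query does not need.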


Page 44: Introduction to hadoop

JOB EXECUTION COMPONENTS

• Following are the job execution components :


Page 45: Introduction to hadoop

WORK MANAGEMENT COMPONENTS


Page 46: Introduction to hadoop

CONCLUSION


Page 47: Introduction to hadoop


Page 48: Introduction to hadoop
