Hadoop Distributed File System (HDFS) presentation 27-5-2015


Upload: abdul-nasir

Post on 16-Feb-2017


Page 1: Hadoop Distributed File System (HDFS) presentation 27-5-2015

In the Name of Allah, the Most Merciful, the Most Gracious

• Name: Abdul Nasir Afridi
• Roll Number: 01
• Batch #10
• Subject: Advanced Database and Data Mining

Page 2:

Research Article

1. Performance Evaluation of Read and Write Operations in Hadoop Distributed File System

Published: 2014 Sixth International Symposium on Parallel Architectures, Algorithms and Programming

Conference Paper: IEEE Computer Society

Authors: Dr T Ragunathan et al.

Page 3:

Research Article

• High Performance and Fault Tolerant Distributed File System for Big Data Storage and Processing using Hadoop
• Published: 2014 International Conference on Intelligent Computing Applications
• © 2014 IEEE Conference Publishing Services

Page 4:

Research Article

• A Distributed Storage Model for EHR Based on HBase
• Published: © 2011 IEEE International Conference on Information Management, Innovation Management and Industrial Engineering

Page 5:

Research Article

H-Store: A High-Performance, Distributed Main Memory Transaction Processing System

Published: August 23-28, 2008, Auckland, New Zealand

Conference Paper: ACM 978-1-60558-306-8/08/08, Copyright 2008 VLDB Endowment

Page 6:

Keywords

• Hadoop Distributed File System (HDFS)
• HBase
• Electronic healthcare record (EHR)
• Distributed Storage
• Big Data
• MapReduce

Page 7:

What is Apache Hadoop?

• Hadoop Distributed File System:
• HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system adept at storing large volumes of unstructured data.
• It is an open-source system developed by Apache in Java.
• It is designed to handle very large data sets.
• It is designed to scale to very large clusters.
• It is designed to run on commodity hardware.

Page 8:

Hadoop ecosystem

Page 9:

Hadoop History

Page 10:

Hadoop Ecosystem

Page 11:

Hadoop ecosystem

Page 12:

Hadoop ecosystem

• Hadoop Distributed File System: HDFS, the storage layer of Hadoop, is a distributed, scalable, Java-based file system.
• It offers data replication.
• It offers automatic failover in the event of a crash.
• It automatically fragments storage over the cluster.
• It brings processing to the data.
• It supports large volumes of files, into the millions.
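The ideas of fragmenting a file over the cluster and replicating each fragment can be illustrated with a small sketch. It is not HDFS code: the block size, the replication factor, the node names, and the round-robin placement policy are all assumptions chosen for illustration, and real placement also considers racks, free space, and load.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size in bytes (illustrative)
REPLICATION = 3                 # assumed replication factor (illustrative)


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])


def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin.

    Toy policy only: it ignores racks, free space, and node load.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        block: [nodes[(block + i) % len(nodes)] for i in range(replication)]
        for block in range(num_blocks)
    }


def surviving_blocks(placement, failed_node):
    """Return the blocks that still have a replica after one node fails."""
    return {
        block
        for block, holders in placement.items()
        if any(node != failed_node for node in holders)
    }


blocks = split_into_blocks(300 * 1024 * 1024)           # a 300 MB file
placement = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"])
print(len(blocks), placement[0], surviving_blocks(placement, "n1"))
```

With three replicas per block, losing any single node in this sketch leaves every block readable, which is the property that replication is meant to provide.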

Page 13:

Hadoop ecosystem

• MapReduce:
• MapReduce is a software framework that serves as the compute layer of Hadoop.
• MapReduce jobs are divided into two parts. The map function divides a query into multiple parts and processes data at the node level.
• The reduce function aggregates the results of the map function to determine the answer to the query.
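The two parts of a MapReduce job can be sketched in plain Python with the classic word-count example. This is a single-process illustration, not the Hadoop API: the function names and the in-memory shuffle step are assumptions made for the sketch, whereas in Hadoop the framework distributes these phases across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit (word, 1) for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group intermediate values by key, as happens between the two phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over a list of input lines."""
    intermediate = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())


print(word_count(["big data", "Big cluster"]))
```

Each call to `map_phase` depends only on its own line, which is what allows the map work to run on the node that holds the data.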

Page 14:

Hadoop ecosystem

• Hive: Hive is a Hadoop-based data warehouse developed by Facebook. It allows users to write queries in a SQL-like language (HiveQL), which are then converted to MapReduce jobs. This allows SQL programmers with no MapReduce experience to use the warehouse and makes it easier to integrate with business intelligence and visualization tools such as MicroStrategy, Tableau, Revolution Analytics, etc.
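To give a feel for how a SQL query can be converted to map and reduce steps, the sketch below hand-translates one GROUP BY query. It is a conceptual picture only and does not reproduce Hive's actual query planner; the table, its rows, and the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical rows of an "employees" table: (name, dept, salary).
ROWS = [
    ("ann", "sales", 100),
    ("bob", "sales", 150),
    ("cat", "ops", 120),
]

# The query being illustrated:
#   SELECT dept, SUM(salary) FROM employees GROUP BY dept;


def map_row(row):
    """Map step: emit the GROUP BY column as key, the summed column as value."""
    _name, dept, salary = row
    return dept, salary


def reduce_group(dept, salaries):
    """Reduce step: apply the aggregate (here SUM) to one group."""
    return dept, sum(salaries)


def run_query(rows):
    """Group the mapped pairs by key, then reduce each group."""
    groups = defaultdict(list)
    for key, value in map(map_row, rows):
        groups[key].append(value)
    return dict(reduce_group(k, v) for k, v in groups.items())


print(run_query(ROWS))
```

The GROUP BY column becomes the map output key, so all rows of one department reach the same reduce call.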

Page 15:

Hadoop ecosystem

• Pig: Pig Latin is a Hadoop-based language developed by Yahoo. It is relatively easy to learn and is adept at very deep, very long data pipelines (a limitation of SQL).

Pig, originally developed at Yahoo Research, is a high-level language for building MapReduce programs for Hadoop, thus simplifying the use of MapReduce. It is a data flow language that provides high-level commands.

Page 16:

Hadoop ecosystem

Page 17:

Hadoop ecosystem

• HBase:
• HBase is a non-relational database that allows for low-latency, quick lookups in Hadoop.
• It adds transactional capabilities to Hadoop, allowing users to conduct updates, inserts, and deletes.
• eBay and Facebook use HBase heavily.
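The kind of access pattern described here (direct lookups by key plus inserts, updates, and deletes) can be sketched with a toy in-memory table. This is not the HBase client API: the class, its method names, and the sample row keys are hypothetical, and keeping the keys sorted to support range scans is an assumption added for the sketch.

```python
import bisect


class ToyTable:
    """In-memory stand-in for a table keyed by row key (not the HBase API)."""

    def __init__(self):
        self._keys = []   # row keys kept sorted to allow range scans
        self._rows = {}   # row key -> {column: value}

    def put(self, row_key, column, value):
        """Insert a new cell or overwrite an existing one."""
        if row_key not in self._rows:
            bisect.insort(self._keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        """Look a row up directly by key; returns a copy, or {} if absent."""
        return dict(self._rows.get(row_key, {}))

    def delete(self, row_key):
        """Remove a whole row if it exists."""
        if row_key in self._rows:
            del self._rows[row_key]
            self._keys.pop(bisect.bisect_left(self._keys, row_key))

    def scan(self, start, stop):
        """Return the row keys in [start, stop) in sorted order."""
        lo = bisect.bisect_left(self._keys, start)
        hi = bisect.bisect_left(self._keys, stop)
        return self._keys[lo:hi]


table = ToyTable()
table.put("patient#002", "name", "B")
table.put("patient#001", "name", "A")
table.put("patient#001", "name", "A2")   # update overwrites the cell
print(table.get("patient#001"), table.scan("patient#000", "patient#003"))
```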

Page 18:

Hadoop ecosystem

• Flume:
• Flume is a framework for populating Hadoop with data.
• Agents are deployed throughout one's IT infrastructure (inside web servers, application servers, and mobile devices, for example) to collect data and integrate it into Hadoop.

Page 19:

Hadoop ecosystem

• Oozie:
• Oozie is a workflow processing system that lets users define a series of jobs written in multiple languages (such as MapReduce, Pig and Hive) and then intelligently links them to one another.
• Oozie allows users to specify, for example, that a particular query is only to be initiated after the previous jobs on which it relies for data are completed.
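The rule that a job starts only after the jobs it depends on have completed amounts to ordering a dependency graph. The sketch below shows that idea with Python's standard `graphlib` module (Python 3.9 or later). It does not use Oozie's own workflow definition format, and the job names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each job maps to the set of jobs it depends on.
WORKFLOW = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "load_users": set(),
    "join_report": {"clean_logs", "load_users"},
}


def execution_order(workflow):
    """Return one valid order in which every job runs after its dependencies."""
    return list(TopologicalSorter(workflow).static_order())


def ready_jobs(workflow, finished):
    """Return jobs not yet run whose dependencies have all finished."""
    return sorted(
        job
        for job, deps in workflow.items()
        if job not in finished and deps <= finished
    )


print(execution_order(WORKFLOW))
print(ready_jobs(WORKFLOW, {"import_logs", "load_users"}))
```

In this example `join_report` becomes ready only once both `clean_logs` and `load_users` are in the finished set.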

Page 20:

Hadoop ecosystem

• Whirr:
• Whirr is a set of libraries that allows users to easily spin up Hadoop clusters on top of Amazon EC2, Rackspace, or any virtual infrastructure.
• It supports all major virtualized infrastructure vendors on the market.

Page 21:

Hadoop ecosystem

• Avro:
• Avro is a data serialization system that allows for encoding the schema of Hadoop files.
• It is adept at parsing data and performing remote procedure calls.
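What schema-driven serialization means can be shown with a toy encoder in which the schema, not the payload, says which fields exist and how to read them. This is not Avro's binary format or API: the schema layout, the two supported types, and the fixed-width encoding are assumptions made for the sketch.

```python
import struct

# Hypothetical schema: ordered (field name, type) pairs.
SCHEMA = [("id", "long"), ("name", "string")]


def encode(record, schema=SCHEMA):
    """Serialize a record to bytes, field by field, in schema order."""
    out = bytearray()
    for field, kind in schema:
        value = record[field]
        if kind == "long":
            out += struct.pack(">q", value)            # 8-byte signed integer
        elif kind == "string":
            data = value.encode("utf-8")
            out += struct.pack(">I", len(data)) + data  # length prefix + bytes
        else:
            raise ValueError(f"unsupported type: {kind}")
    return bytes(out)


def decode(payload, schema=SCHEMA):
    """Rebuild the record; the schema says how to interpret the bytes."""
    record, pos = {}, 0
    for field, kind in schema:
        if kind == "long":
            (record[field],) = struct.unpack_from(">q", payload, pos)
            pos += 8
        elif kind == "string":
            (length,) = struct.unpack_from(">I", payload, pos)
            pos += 4
            record[field] = payload[pos:pos + length].decode("utf-8")
            pos += length
        else:
            raise ValueError(f"unsupported type: {kind}")
    return record


original = {"id": 7, "name": "hdfs"}
print(decode(encode(original)) == original)
```

Because field names never appear in the payload, reader and writer must agree on the schema, which is why a serialization system stores or transmits the schema alongside the data.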

Page 22:

Hadoop ecosystem

• Mahout:
• Mahout is a data-mining library.
• It takes the most popular data-mining algorithms for performing clustering, regression testing, and statistical modeling and implements them using the MapReduce model.


Page 24:

Hadoop ecosystem

• Sqoop:
• Sqoop is a connectivity tool for moving data from non-Hadoop data stores, such as relational databases and data warehouses, into Hadoop.
• It allows users to specify the target location inside Hadoop and instruct Sqoop to move data from Oracle, Teradata, or other relational databases to the target.

Page 25:

Hadoop Configuration File

Page 26:

Data Ingress And Egress

Page 27:

Joining Type Venn Diagram

Page 28:

Big data

Big data is being generated by everything around us at all times. Every digital process and social media exchange produces it. Systems, sensors and mobile devices transmit it. Big data is arriving from multiple sources at an alarming velocity, volume and variety. To extract meaningful value from big data, you need optimal processing power, analytics capabilities and skills.

Page 29:

Big Data

Page 30:

Typical Hadoop cluster integrates MapReduce and HDFS

Master/slave architecture

Page 31:

Pictorial Representation of Hadoop

Page 32:

Physical Architecture of Hadoop ecosystem

Page 33:

HDFS

Page 34:

MapReduce

Page 35:

HDFS Namenode

Page 36:

Scheduling

• By default
▫ Hadoop uses FIFO to schedule jobs.
▫ No preemption once a job is running.

Hadoop version 2.x provides fair scheduling, which assigns resources to applications such that all applications get, on average, an equal share of resources over time.
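The difference between the two policies can be seen in a small simulation. It is a simplified model, not Hadoop's scheduler code: the notion of "slots" processed per time step, the job sizes, and the function names are assumptions, `slots` is taken to be at least 1, and leftover capacity within a step is not redistributed.

```python
def fifo_schedule(jobs, slots):
    """Run jobs strictly in arrival order; each job gets every slot until done.

    jobs: list of (name, work_units); slots: work units processed per step.
    Returns {name: finish_time}.
    """
    finish, clock = {}, 0
    for name, work in jobs:
        clock += -(-work // slots)  # ceiling division: steps this job needs
        finish[name] = clock
    return finish


def fair_schedule(jobs, slots):
    """Split the slots evenly among unfinished jobs at every time step."""
    remaining = dict(jobs)
    finish, clock = {}, 0
    while remaining:
        clock += 1
        share, extra = divmod(slots, len(remaining))
        for i, name in enumerate(list(remaining)):
            # The first `extra` jobs receive one additional slot this step.
            remaining[name] -= share + (1 if i < extra else 0)
            if remaining[name] <= 0:
                finish[name] = clock
                del remaining[name]
    return finish


jobs = [("long", 12), ("short", 2)]
print(fifo_schedule(jobs, slots=4))
print(fair_schedule(jobs, slots=4))
```

Under FIFO the short job waits behind the long one and finishes at step 4, while under the fair split it finishes at step 1 and the long job is delayed by only one step.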

Page 37:

Hadoop Implementation

