Post on 12-Jan-2016


Introduction to Apache Hadoop

Zibo Wang


Introduction

What is Apache Hadoop?

Apache Hadoop is an open-source software framework that provides libraries for data-intensive computing through a simple MapReduce interface and its own distributed file system, HDFS.

Started by Doug Cutting and Mike Cafarella. Written in Java.


Introduction

The use of Hadoop: Compute, Storage, Database

The advantages of Hadoop: Scalable Algorithms, Log Management, Extract-Transform-Load (ETL) Platform


Map-Reduce

Introduced by Google.

A simple and powerful interface that enables automatic parallelization and distribution of large-scale computation.

Two major functions: Map and Reduce

Nodes and trackers
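The two functions can be illustrated with a word count, the usual introductory example. The sketch below is plain single-process Python, not the Hadoop API; the names `map_words`, `shuffle`, `reduce_counts`, and `run_job` are hypothetical and chosen only for illustration. In a real cluster the framework runs many map and reduce tasks in parallel on different nodes and performs the grouping step itself.

```python
from collections import defaultdict


def map_words(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle step."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_counts(word, counts):
    """Reduce: combine all values for one key into a single result."""
    return word, sum(counts)


def run_job(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(mapped)
    return dict(reduce_counts(word, counts) for word, counts in grouped.items())


if __name__ == "__main__":
    print(run_job(["hadoop pig", "apache pig", "hadoop hadoop"]))
```

Because each call to `map_words` depends only on its own line, and each call to `reduce_counts` only on its own key, both stages can be distributed without changing the user's code.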


Map-Reduce


Hadoop Distributed File System (HDFS)

Large block size (64 MB by default in Hadoop 1.x, 128 MB in later releases), so that the time spent seeking is small compared with the time spent transferring data. Very large files are therefore ideal for storage.

Streaming data access. Write-once, read-many-times architecture. Because files are large, the time to read the whole dataset matters more than the latency of seeking to the first record.

Commodity hardware. HDFS is designed to run on commodity hardware, which may fail, and it is capable of handling such failures.
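The block-size argument can be checked with a little arithmetic. The sketch below assumes a 10 ms seek time and a 100 MB/s transfer rate; these figures are illustrative assumptions, not values from the slides, and the helper names `block_count` and `seek_overhead` are hypothetical.

```python
MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=64 * MB):
    """Number of blocks needed to hold a file; the last block may be partial."""
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


def seek_overhead(block_size_bytes,
                  seek_seconds=0.010,                  # assumed seek time
                  transfer_bytes_per_second=100 * MB):  # assumed transfer rate
    """Fraction of the time to read one block that is spent seeking."""
    transfer_seconds = block_size_bytes / transfer_bytes_per_second
    return seek_seconds / (seek_seconds + transfer_seconds)


if __name__ == "__main__":
    print(block_count(1024 * MB))   # blocks in a 1 GB file at 64 MB per block
    print(seek_overhead(64 * MB))   # large block: seeking is a small fraction
    print(seek_overhead(4 * 1024))  # 4 KB block: seeking dominates
```

Under these assumptions a 1 GB file occupies 16 blocks of 64 MB, and seeking accounts for under 2% of the time to read each block, whereas with 4 KB blocks nearly all the time would go to seeking.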


HDFS

Architecture
Filesystem Metadata
Framework of write
Framework of read


Prominent Users of Hadoop

Yahoo!: a Linux cluster with more than 10,000 cores; open source

Facebook: 30 PB of data

Amazon: Amazon Elastic Compute Cloud (EC2), Amazon Simple Storage Service (S3)


Thank you!
