TACKLING BIG DATA WITH HADOOP David Howell Sunday, September 11, 11

DESCRIPTION

An introduction to Hadoop, presented at Vermont Code Camp 2011.

TRANSCRIPT

Page 1: Tackling Big Data with Hadoop

TACKLING BIG DATA WITH HADOOP

David Howell

Sunday, September 11, 11

Page 2: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

Page 3: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

Google web crawl

Page 4: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

stream of Twitter messages

Page 5: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

Annoying Farmville requests on Facebook

Page 6: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

terabyte-scale data sets

awkward to work with using traditional tools

Page 7: Tackling Big Data with Hadoop

WHAT IS BIG DATA?

requires distributed computing

Page 8: Tackling Big Data with Hadoop

MEDIUM DATA

dozens to hundreds of gigabytes

still awkward to work with using traditional tools

Page 9: Tackling Big Data with Hadoop

MAP-REDUCE

http://labs.google.com/papers/mapreduce.html

Page 10: Tackling Big Data with Hadoop

Page 11: Tackling Big Data with Hadoop

Page 12: Tackling Big Data with Hadoop

COUNTING AT SCALE

Page 13: Tackling Big Data with Hadoop

function map_1(t, search_phrase)
    emit(search_phrase, 1)

sort and shuffle

function reduce_1(search_phrase, counts)
    total = 0
    for count in counts
        total += count
    emit(search_phrase, total)

function map_2(search_phrase, total)
    emit(total, search_phrase)

sort and shuffle

function reduce_2(total, search_phrases)
    for search_phrase in search_phrases
        emit(search_phrase, total)
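
The two passes on this slide can be tried out in memory without a cluster. The following is a minimal sketch, not Hadoop code: run_job is a hypothetical helper that stands in for the framework (map, then sort and shuffle, then reduce), and the sample log is made up for illustration. The four map and reduce functions mirror the pseudocode, with emit replaced by yield.

```python
from itertools import groupby
from operator import itemgetter


def run_job(records, mapper, reducer):
    """Run one map-reduce pass in memory: map, sort and shuffle, reduce."""
    # Map: every input (key, value) record may emit any number of pairs.
    mapped = [pair for key, value in records for pair in mapper(key, value)]
    # Sort and shuffle: bring all values that share a key next to each other.
    mapped.sort(key=itemgetter(0))
    output = []
    for key, group in groupby(mapped, key=itemgetter(0)):
        values = [value for _, value in group]
        # Reduce: one call per distinct key.
        output.extend(reducer(key, values))
    return output


def map_1(t, search_phrase):
    yield search_phrase, 1


def reduce_1(search_phrase, counts):
    yield search_phrase, sum(counts)


def map_2(search_phrase, total):
    yield total, search_phrase


def reduce_2(total, search_phrases):
    for search_phrase in search_phrases:
        yield search_phrase, total


# Hypothetical search log of (timestamp, search phrase) records.
log = [(1, "hadoop"), (2, "pig"), (3, "hadoop"),
       (4, "hive"), (5, "hadoop"), (6, "pig")]

counts = run_job(log, map_1, reduce_1)      # phrase -> number of searches
ranked = run_job(counts, map_2, reduce_2)   # same pairs, ordered by total
print(ranked)
```

The second pass does no arithmetic at all. It exists only because the sort-and-shuffle step orders records by key, so making the total the key yields the phrases ranked by how often they were searched.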

Page 14: Tackling Big Data with Hadoop

cat IN | sort | uniq -c > OUT
(map | shuffle | reduce)

awk '{print $2,$1}' OUT | sort > FINAL
(map | shuffle | reduce)

Page 15: Tackling Big Data with Hadoop

WHY BOTHER?

Page 16: Tackling Big Data with Hadoop

HADOOP

Page 17: Tackling Big Data with Hadoop

DISTRIBUTED COMPUTING PLATFORM

Page 18: Tackling Big Data with Hadoop

TOOLS IN THE PLATFORM

Higher Level APIs
• Hive
• Cascading
• Pig

Map-Reduce APIs
• Java
• C++
• UNIX pipes
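
The "UNIX pipes" entry presumably refers to Hadoop Streaming, where the mapper and the reducer are ordinary programs that read lines on standard input and write tab-separated key/value lines on standard output. The sketch below shows that style under this assumption. The names mapper, reducer, and raw are illustrative, nothing here calls Hadoop, and an in-memory buffer stands in for stdin.

```python
import io
from itertools import groupby


def mapper(lines):
    """Emit 'phrase<TAB>1' for every non-empty input line."""
    for line in lines:
        phrase = line.strip()
        if phrase:
            yield f"{phrase}\t1"


def reducer(lines):
    """Sum the counts of consecutive lines that share a key.

    Assumes the input is already sorted by key, as it is after the shuffle.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for phrase, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{phrase}\t{total}"


# Stand-in for stdin; a real script would iterate over sys.stdin instead.
raw = io.StringIO("hadoop\npig\nhadoop\nhive\n")

# Sorting by key plays the role of `sort` between the two stages.
shuffled = sorted(mapper(raw), key=lambda line: line.split("\t", 1)[0])
result = list(reducer(shuffled))
print("\n".join(result))
```

Because each stage only reads and writes text lines, the same two functions could be wrapped in small scripts and tested locally with a shell pipeline before being run on a cluster.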

Page 19: Tackling Big Data with Hadoop

THE ORIGIN STORY

Page 20: Tackling Big Data with Hadoop

WHO’S USING IT?

Page 21: Tackling Big Data with Hadoop

HADOOP

How does it work?

Page 22: Tackling Big Data with Hadoop

Page 23: Tackling Big Data with Hadoop

Page 24: Tackling Big Data with Hadoop

Page 25: Tackling Big Data with Hadoop

Page 26: Tackling Big Data with Hadoop

DEMO!

Page 27: Tackling Big Data with Hadoop

YOUR DATA PLATFORM

ad hoc
unstructured
prototyping
experiment
data-driven
curiosity
play

Page 28: Tackling Big Data with Hadoop

LEARN MORE

http://hadoop.apache.org/

http://www.cloudera.com/

Hadoop: The Definitive Guide

@[email protected]

http://github.com/dehowell/hadoop-crypto-demo
