Hadoop for humans
Kien Pham
Software Engineer - R&D
Anaheim, CA
10/04/2013
Friday, October 4, 13
Hadoop?
is a framework
HDFS
Map / Reduce
http://www.flickr.com/photos/d90nikon/6195610430/sizes/o/in/photostream/
Map / Reduce
Mapper
I like SendGrid and email, you like SendGrid and email too
1 1 1 1 1
Mapper
worker 1
I like SendGrid and email, you like SendGrid and email too
1 1 1 1 1
worker 2
I like SendGrid and email, you like SendGrid and email too
1 1 1 1 1
worker 3
I like SendGrid and email, you like SendGrid and email too
1 1 1 1 1
Reducer
like      1
SendGrid  1
email     1
SendGrid  1
email     1

like      1
SendGrid  2
email     2
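The mapper and reducer slides can be sketched in plain Python, without a Hadoop cluster. This is only an illustration of the idea: the function names (`mapper`, `shuffle`, `reducer`, `word_count`) are assumptions made for this sketch, not part of Hadoop or of any SendGrid code. Unlike the slide, which highlights only a few words, the sketch counts every word in the sentence.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in the line."""
    for word in line.replace(",", " ").split():
        yield word, 1


def shuffle(pairs):
    """Group values by key, standing in for the step between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, values):
    """Collapse all the 1s emitted for a word into a single total."""
    return key, sum(values)


def word_count(lines):
    """Run the map, shuffle, and reduce steps in a single process."""
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["I like SendGrid and email, you like SendGrid and email too"])
print(counts["SendGrid"], counts["email"])
```

On a real cluster the lines would be split across workers and the grouping would be done by the framework; here everything runs in one process to keep the example short.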
key       value
like      1
SendGrid  2
email     2
key                              value
{"d": "2013-09-01", "t": "j"}    764872
{"d": "2013-09-02", "t": "j"}    269661
{"d": "2013-09-01", "t": "x"}    190889
{"d": "2013-09-02", "t": "x"}    71693
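A key does not have to be a single word. The slide above uses a small JSON object as the key, and the sketch below shows one way such composite keys could be built and summed. It assumes that "d" is a date and "t" is some kind of type code, which the slide does not state. The helper names and the input records are made up for illustration.

```python
import json
from collections import Counter


def make_key(date, kind):
    """Serialize the composite key; sort_keys keeps the field order stable."""
    return json.dumps({"d": date, "t": kind}, sort_keys=True)


def aggregate(records):
    """Sum the counts of all records that share the same composite key."""
    totals = Counter()
    for date, kind, count in records:
        totals[make_key(date, kind)] += count
    return totals


# Made-up input records: (date, type code, count).
records = [
    ("2013-09-01", "j", 2),
    ("2013-09-01", "j", 3),
    ("2013-09-02", "x", 1),
]
totals = aggregate(records)
for key, value in sorted(totals.items()):
    print(key, value)
```

Serializing with sorted keys means two records with the same date and type always produce the same key string, so they end up grouped together.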
HDFS
HDFS @ SG
138 TB
1 TB = 1,024 GB
138 TB = 141,312 GB
300 GB / day
141,312 GB / 300 GB = 471 days
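The arithmetic above can be checked with a few lines of Python. The variable names are chosen for this sketch only, and the figures are the ones from the slide.

```python
GB_PER_TB = 1024   # 1 TB = 1,024 GB, as on the slide
capacity_tb = 138  # HDFS size at SG
daily_gb = 300     # data volume per day

capacity_gb = capacity_tb * GB_PER_TB
days = capacity_gb // daily_gb  # whole days; the remainder is 12 GB

print(f"{capacity_gb:,} GB / {daily_gb} GB per day = {days} days")
```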
S3
2015
Hadoop will process
50% of the world’s data
http://www.flickr.com/photos/tisdale53/4737492082/
custom jobs?
mrgumble
abstract Hadoop process
start
stop
status
result
mrgumble start -j my_cool_job
mrgumble stop -j my_cool_job
mrgumble status --job_id 1234
mrgumble result -j job_name
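The four commands suggest a small command-line front end with one subcommand per action. The sketch below is a hypothetical stand-in written with Python's argparse. It is not mrgumble's code, and nothing here is known about how mrgumble is implemented. Only the subcommand names and the `-j` and `--job_id` flags come from the slides. The program name `mrgumble-sketch`, the `build_parser` helper, and the assumption that the job id is an integer are inventions for this example.

```python
import argparse


def build_parser():
    """Build a parser that accepts the four commands shown on the slides."""
    parser = argparse.ArgumentParser(prog="mrgumble-sketch")  # hypothetical name
    subcommands = parser.add_subparsers(dest="command", required=True)

    # start, stop, and result take a job name via -j, as on the slides.
    for name in ("start", "stop", "result"):
        sub = subcommands.add_parser(name)
        sub.add_argument("-j", dest="job", required=True)

    # status takes a job id via --job_id; treating it as an int is an assumption.
    status = subcommands.add_parser("status")
    status.add_argument("--job_id", type=int, required=True)
    return parser


args = build_parser().parse_args(["start", "-j", "my_cool_job"])
print(args.command, args.job)
```

A real tool would dispatch each command to code that submits, kills, or polls a Hadoop job; the sketch stops at parsing the arguments.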
excited?
template.py
hadoop-jobs repo
jobs/
import mrgumble
import sgstats-hadoop
Live Demo