hadoop for humans

Hadoop for humans Kien Pham Software Engineer - R&D Anaheim, CA 10/04/2013 Friday, October 4, 13

Upload: kien-pham

Post on 13-Dec-2014


Page 1: Hadoop for humans

Hadoop for humans

Kien Pham

Software Engineer - R&D Anaheim, CA

10/04/2013


Page 2: Hadoop for humans

Hadoop?


Page 3: Hadoop for humans

is a framework

HDFS

Map / Reduce

http://www.flickr.com/photos/d90nikon/6195610430/sizes/o/in/photostream/

Page 4: Hadoop for humans

Map / Reduce

Page 5: Hadoop for humans

Mapper

I like SendGrid and email, you like SendGrid and email too

1 1 1 1 1

Page 6: Hadoop for humans

Mapper

worker 1

I like SendGrid and email, you like SendGrid and email too

1 1 1 1 1

worker 2

I like SendGrid and email, you like SendGrid and email too

1 1 1 1 1

worker 3

I like SendGrid and email, you like SendGrid and email too

1 1 1 1 1
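
The map step on this slide can be sketched in a few lines of Python. This is only an illustration, not the deck's actual job code: the `mapper` function, the word-splitting regex, and the `worker_inputs` dictionary are assumptions made for the example. Each worker runs the same mapper on its own input and emits a 1 for every word it sees.

```python
import re

def mapper(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    for word in re.findall(r"[A-Za-z]+", line):
        yield (word, 1)

# Hypothetical setup: three workers, each handed the slide's sentence.
sentence = "I like SendGrid and email, you like SendGrid and email too"
worker_inputs = {"worker 1": sentence, "worker 2": sentence, "worker 3": sentence}

# Every worker applies the same mapper independently of the others.
worker_outputs = {name: list(mapper(text)) for name, text in worker_inputs.items()}

for name, pairs in worker_outputs.items():
    print(name, pairs[:4])
```

Because the mapper only ever looks at one line at a time, the workers never need to talk to each other during this step.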

Page 7: Hadoop for humans

Reducer

1 like SendGrid email SendGrid email

1 1 1 1

1 like SendGrid email

2 2

Page 8: Hadoop for humans

1 like SendGrid email

2 2

key value
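
The reduce step can be sketched the same way. This is a minimal illustration under assumptions, not the framework's real machinery: `shuffle` and `reducer` are hypothetical helper names, and the `mapped` list is a hand-written sample of mapper output. Pairs are grouped by key, then each key's values are summed into one key/value result.

```python
from collections import defaultdict

def shuffle(pairs):
    """Group mapper output by key, collecting every value emitted for that key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reducer(key, values):
    """Collapse all values for one key into a single (key, total) pair."""
    return (key, sum(values))

# Hand-written sample of mapper output (assumed for illustration).
mapped = [
    ("like", 1), ("SendGrid", 1), ("email", 1),
    ("like", 1), ("SendGrid", 1), ("email", 1),
]

counts = dict(reducer(key, values) for key, values in shuffle(mapped).items())
print(counts)  # {'like': 2, 'SendGrid': 2, 'email': 2}
```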

Page 9: Hadoop for humans

key                              value
{"d": "2013-09-01", "t": "j"}    764872
{"d": "2013-09-02", "t": "j"}    269661
{"d": "2013-09-01", "t": "x"}    190889
{"d": "2013-09-02", "t": "x"}    71693
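
A key does not have to be a single word. The sketch below shows one way to build composite JSON keys like the ones on this slide and count records per key. It assumes that `d` is a date and `t` is some event type, which the slide does not state, and the `events` list is made-up sample data rather than real SendGrid numbers.

```python
import json
from collections import Counter

# Made-up sample records; the field meanings are assumptions.
events = [
    {"date": "2013-09-01", "type": "j"},
    {"date": "2013-09-01", "type": "j"},
    {"date": "2013-09-02", "type": "j"},
    {"date": "2013-09-01", "type": "x"},
]

def make_key(event):
    """Serialize (date, type) as a JSON string so it can serve as one key."""
    # sort_keys keeps the field order stable, so equal keys are equal strings.
    return json.dumps({"d": event["date"], "t": event["type"]}, sort_keys=True)

counts = Counter(make_key(event) for event in events)

for key, value in counts.items():
    print(key, value)
```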

Page 10: Hadoop for humans

HDFS


Page 11: Hadoop for humans

HDFS


Page 12: Hadoop for humans

HDFS @ SG

138 TB

Page 13: Hadoop for humans

1 TB = 1,024 GB

138 TB = 141,312 GB

300 GB / day

141,312 GB / 300 GB per day ≈ 471 days
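
The capacity arithmetic above can be checked with a short script. The variable names are chosen for this example only; the figures (138 TB, 300 GB per day) come from the slide, and the calculation assumes the ingest rate stays constant.

```python
GB_PER_TB = 1024          # binary convention used on the slide
capacity_tb = 138         # cluster capacity from the slide
daily_ingest_gb = 300     # data added per day, from the slide

capacity_gb = capacity_tb * GB_PER_TB
days_until_full = capacity_gb / daily_ingest_gb

print(f"{capacity_gb:,} GB / {daily_ingest_gb} GB per day = {days_until_full:.2f} days")
```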

Page 14: Hadoop for humans

S3

Page 15: Hadoop for humans

2015

Hadoop will process

50% of the world’s data

http://www.flickr.com/photos/tisdale53/4737492082/

Page 16: Hadoop for humans

custom jobs?


Page 17: Hadoop for humans

mrgumble


Page 18: Hadoop for humans

abstract Hadoop process


Page 19: Hadoop for humans

start

stop

status

result

Page 20: Hadoop for humans

mrgumble start -j my_cool_job


Page 21: Hadoop for humans

mrgumble stop -j my_cool_job


Page 22: Hadoop for humans

mrgumble status --job_id 1234


Page 23: Hadoop for humans

mrgumble result -j job_name
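
The four commands share one shape: a verb plus a job identifier. The sketch below shows how a command line with that shape could be parsed using Python's standard `argparse` module. It is not mrgumble's implementation, which the slides do not show. Only the command names and the `-j` and `--job_id` flags come from the slides; `build_parser` and the `job` destination name are hypothetical.

```python
import argparse

def build_parser():
    """Build a parser that accepts the four commands shown on the slides."""
    parser = argparse.ArgumentParser(prog="mrgumble")
    subcommands = parser.add_subparsers(dest="command", required=True)

    # start, stop and result take a job name via -j, as on the slides.
    for name in ("start", "stop", "result"):
        sub = subcommands.add_parser(name)
        sub.add_argument("-j", dest="job", required=True)

    # status takes a numeric job id via --job_id, as on the slides.
    status = subcommands.add_parser("status")
    status.add_argument("--job_id", type=int, required=True)
    return parser

args = build_parser().parse_args(["start", "-j", "my_cool_job"])
print(args.command, args.job)  # start my_cool_job
```

Using one subcommand per verb keeps the Hadoop details behind a small, uniform interface, which is the point the "abstract Hadoop process" slide makes.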


Page 24: Hadoop for humans

excited?


Page 25: Hadoop for humans

template.py

hadoop-jobs repo

jobs/

Page 26: Hadoop for humans

import mrgumble

import sgstats-hadoop

Page 27: Hadoop for humans

Live Demo
