Hadoop: the Big Answer to the Big Question of the Big Data

eleks DevTalks #1 by Victor Haydin

Upload: victor-haydin

Post on 21-Nov-2014

DESCRIPTION

More info: http://www.elekslabs.com/2012/02/devtalks-1-presentations.html
Video: http://www.youtube.com/watch?feature=player_embedded&v=GENRle60Elk

TRANSCRIPT

Page 1: Hadoop: the Big Answer to the Big Question of the Big Data

THE BIG ANSWER TO THE BIG QUESTION OF THE BIG DATA

eleks DevTalks #1

by Victor Haydin

Page 2: Hadoop: the Big Answer to the Big Question of the Big Data

Gordon Moore

Page 3: Hadoop: the Big Answer to the Big Question of the Big Data

                                       1975            2012
Cost of 1 TB storage                   $208 000 000    $110
Cost of 1 GFLOPS computing facility    $62 000 000     $1.50
Number of network hosts                57              > 1 000 000 000
World’s data amount                    ~130 GB         ~2.9 ZB
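
To put the table in perspective, the change factors can be computed directly from the figures on the slide. A small Python sketch follows; the variable and function names are illustrative, and the 2012 host count uses the slide's lower bound of 1 000 000 000.

```python
# Figures copied from the slide: (value in 1975, value in 2012).
figures = {
    "cost of 1 TB storage, USD": (208_000_000, 110),
    "cost of 1 GFLOPS, USD": (62_000_000, 1.50),
    "network hosts (2012 is a lower bound)": (57, 1_000_000_000),
    "world's data, bytes": (130e9, 2.9e21),   # ~130 GB and ~2.9 ZB
}

def change_factor(old, new):
    """How many times larger the bigger value is than the smaller one."""
    return max(old, new) / min(old, new)

factors = {name: change_factor(old, new) for name, (old, new) in figures.items()}

for name, factor in factors.items():
    print(f"{name}: changed by a factor of about {factor:.2e}")
```

Storage and computing became millions of times cheaper, while the amount of data grew by roughly ten orders of magnitude.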

Page 4: Hadoop: the Big Answer to the Big Question of the Big Data

1 ZB = 1 000 000 000 000 000 000 000 B (10^21 B)

Page 5: Hadoop: the Big Answer to the Big Question of the Big Data
Page 6: Hadoop: the Big Answer to the Big Question of the Big Data
Page 7: Hadoop: the Big Answer to the Big Question of the Big Data
Page 8: Hadoop: the Big Answer to the Big Question of the Big Data

Commodity Hardware

Page 9: Hadoop: the Big Answer to the Big Question of the Big Data
Page 10: Hadoop: the Big Answer to the Big Question of the Big Data
Page 11: Hadoop: the Big Answer to the Big Question of the Big Data

Wikipedia: “Apache Hadoop is a software framework that supports data-intensive distributed applications”

Page 12: Hadoop: the Big Answer to the Big Question of the Big Data

Main Contributors

Page 13: Hadoop: the Big Answer to the Big Question of the Big Data
Page 14: Hadoop: the Big Answer to the Big Question of the Big Data

HDFS: Hadoop Distributed File System

Hardware Failure

Streaming Data Access

Large Data Sets

Simple Coherency Model (write-once)

Portability
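
The first three goals are usually explained by the way HDFS stores a file: it is cut into large fixed-size blocks, and each block is copied to several machines, so losing one commodity node does not lose data. Below is a toy Python sketch of that idea, not HDFS code. The function names, the node names and the round-robin placement are assumptions made for illustration (real HDFS uses a rack-aware placement policy); the 64 MB block size and the replication factor of 3 are the defaults commonly cited for Hadoop 1.x.

```python
BLOCK_SIZE = 64 * 1024 * 1024   # bytes; assumed default block size
REPLICATION = 3                 # assumed default replication factor

def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return the sizes of the blocks a file of file_size bytes is cut into."""
    full, rest = divmod(file_size, block_size)
    return [block_size] * full + ([rest] if rest else [])

def place_replicas(num_blocks, nodes, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [nodes[(block + i) % len(nodes)] for i in range(replication)]
    return placement

def surviving_blocks(placement, failed_nodes):
    """Blocks that still have at least one replica after the given nodes fail."""
    return [b for b, holders in placement.items() if set(holders) - set(failed_nodes)]

nodes = ["node1", "node2", "node3", "node4", "node5"]   # hypothetical cluster
blocks = split_into_blocks(200 * 1024 * 1024)           # a 200 MB file
placement = place_replicas(len(blocks), nodes)
print(len(blocks), "blocks:", placement)
print("readable after two failures:", surviving_blocks(placement, ["node1", "node2"]))
```

With three replicas per block, any two nodes can fail and every block of the file still has a live copy.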

Page 15: Hadoop: the Big Answer to the Big Question of the Big Data
Page 16: Hadoop: the Big Answer to the Big Question of the Big Data

Moving Computation is cheaper than moving Data
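
One way to read this slogan: a scheduler should try to run each task on a machine that already stores the task's input block, and fall back to a remote machine only when no such machine is free. A minimal Python sketch under that assumption follows. `assign_tasks`, the block names and the node names are hypothetical, and this is not Hadoop's actual scheduler.

```python
def assign_tasks(block_locations, free_nodes):
    """Greedily assign one task per block, preferring nodes that hold the block.

    block_locations: dict block -> list of nodes storing a replica
    free_nodes: list of nodes with a free task slot (one slot each)
    Returns dict block -> (node, is_local).
    """
    free = list(free_nodes)
    assignment = {}
    for block, holders in block_locations.items():
        local = [n for n in free if n in holders]
        if local:
            node, is_local = local[0], True
        elif free:
            node, is_local = free[0], False   # input must be copied over the network
        else:
            break                             # no slots left; remaining tasks wait
        free.remove(node)
        assignment[block] = (node, is_local)
    return assignment

block_locations = {
    "block-a": ["node1", "node2"],
    "block-b": ["node2", "node3"],
    "block-c": ["node1", "node2"],
}
assignment = assign_tasks(block_locations, ["node1", "node2", "node4"])
for block, (node, is_local) in assignment.items():
    print(block, "->", node, "(local)" if is_local else "(remote, data is moved)")
```

Only the task that cannot find a free local node pays the cost of moving its input; the small task code travels to the data in every other case.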

Page 17: Hadoop: the Big Answer to the Big Question of the Big Data

MapReduce

Page 18: Hadoop: the Big Answer to the Big Question of the Big Data

Map(k1, v1) → list(k2, v2)

void map(string key, string value):
    for each word w in value:
        yield return KeyValuePair(w, 1);

Reduce(k2, list(v2)) → list(v3)

void reduce(string key, int[] values):
    int sum = 0;
    for each pc in values:
        sum += pc;
    return KeyValuePair(key, sum);
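
The word-count pseudocode above can be tried out without a cluster. The Python sketch below runs the same map and reduce steps in a single process; `run_mapreduce` is a hypothetical stand-in for the framework's map, shuffle and reduce phases, not a Hadoop API, and the input lines are made up for illustration.

```python
from collections import defaultdict

def map_fn(key, value):
    """Map(k1, v1) -> list(k2, v2): emit (word, 1) for every word in the line."""
    for word in value.split():
        yield word, 1

def reduce_fn(key, values):
    """Reduce(k2, list(v2)) -> list(v3): sum the counts collected for one word."""
    return key, sum(values)

def run_mapreduce(records, mapper, reducer):
    """Single-process stand-in for the framework: map, group by key, reduce."""
    groups = defaultdict(list)
    for key, value in records:                       # map phase
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)        # shuffle: group values by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))  # reduce phase

# Input records are (line number, line text) pairs.
lines = [(0, "the big answer"), (1, "the big question"), (2, "the big data")]
counts = run_mapreduce(lines, map_fn, reduce_fn)
print(counts)
```

In a real job the map calls run in parallel on the nodes that hold the input, and the grouping step moves the intermediate pairs across the network to the reducers.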

Page 19: Hadoop: the Big Answer to the Big Question of the Big Data
Page 20: Hadoop: the Big Answer to the Big Question of the Big Data
Page 21: Hadoop: the Big Answer to the Big Question of the Big Data

Demo

Page 22: Hadoop: the Big Answer to the Big Question of the Big Data

Ecosystem

ZooKeeper

Page 23: Hadoop: the Big Answer to the Big Question of the Big Data

3K+ nodes, 36+ PB

45K nodes, 180-200 PB

Page 24: Hadoop: the Big Answer to the Big Question of the Big Data

vs

powered by

Page 25: Hadoop: the Big Answer to the Big Question of the Big Data

Future

Core:
• HDFS: high-availability and scalability
• MapReduce: modularity and alternative ways to perform queries

Ecosystem development:
• Apache Bigtop: consolidation project
• HBase, Hive, Pig, ZooKeeper, Avro, Sqoop: stabilizing, interoperability
• Incubator: Flume, Oozie, Whirr

Page 26: Hadoop: the Big Answer to the Big Question of the Big Data

Demo

Page 27: Hadoop: the Big Answer to the Big Question of the Big Data

Q&A