Hadoop Summit 2010 Frameworks Panel: Elephant Bird

Hadoop Frameworks Kevin Weil @kevinweil Twitter

Upload: kevin-weil

Post on 15-Jan-2015

TRANSCRIPT

Page 1: Hadoop summit 2010 frameworks panel elephant bird

Hadoop Frameworks

Kevin Weil @kevinweil

Twitter

Page 2: Hadoop summit 2010 frameworks panel elephant bird

Elephant Bird

A framework for working with structured data within the Hadoop ecosystem

Page 3: Hadoop summit 2010 frameworks panel elephant bird

Elephant Bird

A framework for working with structured data within the Hadoop ecosystem

› Protocol Buffers

› Thrift

› JSON

› W3C Logs

Page 4: Hadoop summit 2010 frameworks panel elephant bird

Elephant Bird

A framework for working with structured data within the Hadoop ecosystem

› InputFormats

› OutputFormats

› Hadoop Writables

› Pig LoadFuncs

› Pig StoreFuncs

› HBase LoadFuncs
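The components listed above fall into two roles: an input format turns raw bytes into records, and a load function turns records into the tuples that a tool such as Pig operates on. A minimal sketch of that layering in plain Python, assuming a hypothetical newline-delimited JSON log; `FIELDS`, `read_records`, and `to_tuple` are illustrative names, not Elephant Bird, Hadoop, or Pig APIs:

```python
import io
import json

# Hypothetical schema: the ordered field names of one log record.
FIELDS = ("user_id", "text")


def read_records(stream):
    """InputFormat-like role: yield one decoded record per non-empty line."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)


def to_tuple(record, fields=FIELDS):
    """LoadFunc-like role: project a record onto an ordered tuple.

    Fields missing from the record become None.
    """
    return tuple(record.get(name) for name in fields)


# Two records separated by a blank line; the second lacks "text".
log = io.StringIO('{"user_id": 12, "text": "hello"}\n\n{"user_id": 7}\n')
rows = [to_tuple(record) for record in read_records(log)]
```

Keeping the two roles separate means the same reader could feed either a MapReduce job or a query-language loader, which is roughly why a framework would ship both kinds of component for each data format.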

Page 5: Hadoop summit 2010 frameworks panel elephant bird

Elephant Bird

A framework for working with structured data within the Hadoop ecosystem… plus:

› LZO Compression

› Code Generation

› Hadoop Counter Utilities

› Misc Pig UDFs

Page 6: Hadoop summit 2010 frameworks panel elephant bird

Why?

You should only need to specify the data schema

Page 11: Hadoop summit 2010 frameworks panel elephant bird

Why?

You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema

Everything else can be codegen’d.

Less Code. Efficient Storage. Focus on the Data.

Underlies 20,000 Hadoop jobs at Twitter every day.

Contributors welcome: http://github.com/kevinweil/elephant-bird
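The claim that everything beyond the schema can be generated can be illustrated with a toy generator. This is only a sketch of the general idea, not Elephant Bird's actual generator: the schema dictionary, `CLASS_TEMPLATE`, `generate_class_source`, and `load_generated_class` are all hypothetical names, and a real schema would be a Protocol Buffers or Thrift definition rather than a Python dict.

```python
from string import Template

# Hypothetical schema: a record name plus ordered (field name, type) pairs.
SCHEMA = {"name": "Status", "fields": [("id", "int"), ("text", "str")]}

# Source template for the generated record class.
CLASS_TEMPLATE = Template('''\
class ${name}:
    """Generated record class for ${name}."""
    FIELDS = ${field_names}

    def __init__(self, ${args}):
${assignments}

    def to_tuple(self):
        return (${tuple_items})
''')


def generate_class_source(schema):
    """Return Python source for a record class derived from the schema."""
    names = [field_name for field_name, _ in schema["fields"]]
    return CLASS_TEMPLATE.substitute(
        name=schema["name"],
        field_names=repr(tuple(names)),
        args=", ".join(names),
        assignments="\n".join(f"        self.{n} = {n}" for n in names),
        # Trailing comma keeps a single-field record a tuple.
        tuple_items=", ".join(f"self.{n}" for n in names) + ",",
    )


def load_generated_class(schema):
    """Execute the generated source and return the resulting class."""
    namespace = {}
    exec(generate_class_source(schema), namespace)
    return namespace[schema["name"]]


Status = load_generated_class(SCHEMA)
status = Status(1, "hello")
```

Adding a field to `SCHEMA` changes the generated class without any hand-written code, which is the point of the schema-only approach: the schema stays the single place a developer edits.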