Hadoop Summit 2010 Frameworks Panel: Elephant Bird
Hadoop Frameworks
Kevin Weil @kevinweil
A framework for working with structured data within the Hadoop ecosystem
Elephant Bird
A framework for working with structured data within the Hadoop ecosystem
› Protocol Buffers
› Thrift
› JSON
› W3C Logs
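As a rough illustration of what "structured data" means for the last of these formats, the sketch below parses lines in the W3C Extended Log File Format into dictionaries, using the log's own #Fields directive as the schema. It is plain Python, not Elephant Bird code; the function name parse_w3c_log and the sample log lines are made up for this example, and quoted values containing spaces are not handled.

```python
def parse_w3c_log(lines):
    """Yield one dict per log entry, keyed by the names in the #Fields directive."""
    fields = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Directives look like "#Version: 1.0" or "#Fields: date time cs-uri".
            name, _, value = line[1:].partition(":")
            if name.strip().lower() == "fields":
                fields = value.split()
            continue
        if not fields:
            # No schema seen yet, so the entry cannot be interpreted.
            continue
        # Simplification: entries are split on whitespace only.
        yield dict(zip(fields, line.split()))


# Hypothetical sample log, for illustration only.
sample = [
    "#Version: 1.0",
    "#Fields: date time cs-method cs-uri sc-status",
    "2010-06-29 10:15:00 GET /index.html 200",
    "2010-06-29 10:15:02 GET /missing 404",
]
records = list(parse_w3c_log(sample))
```

Each entry in records is a dictionary such as one mapping "cs-uri" to "/index.html", so downstream code can refer to fields by name instead of by column position.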
Elephant Bird
A framework for working with structured data within the Hadoop ecosystem
› InputFormats
› OutputFormats
› Hadoop Writables
› Pig LoadFuncs
› Pig StoreFuncs
› HBase LoadFuncs
Elephant Bird
A framework for working with structured data within the Hadoop ecosystem… plus:
› LZO Compression
› Code Generation
› Hadoop Counter Utilities
› Misc Pig UDFs
Elephant Bird
You should only need to specify the data schema
Why?
You should only need to specify the (flexible, forward-backward compatible, self-documenting) data schema
Everything else can be codegen’d.
Less Code. Efficient Storage. Focus on the Data.
Underlies 20,000 Hadoop jobs at Twitter every day.
http://github.com/kevinweil/elephant-bird: contributors welcome!
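The idea that only the schema is written by hand can be sketched in a few lines. The example below is a toy, not Elephant Bird's API: the schema STATUS_SCHEMA, the helper make_record_class, and the Status record are all hypothetical names. It also builds the class at runtime and uses JSON for serialization, which is a simplification of real code generation, where source files are emitted from a Protocol Buffers or Thrift definition.

```python
import json

# Hypothetical schema: field name -> Python type. This is the only
# hand-written part; everything below is derived from it.
STATUS_SCHEMA = {"id": int, "user": str, "text": str}


def make_record_class(name, schema):
    """Build a record class with a constructor, to_json and from_json from a schema."""

    def __init__(self, **values):
        for field, field_type in schema.items():
            # Missing fields stay None so older and newer writers can coexist.
            value = values.get(field)
            setattr(self, field, None if value is None else field_type(value))

    def to_json(self):
        return json.dumps({f: getattr(self, f) for f in schema}, sort_keys=True)

    @classmethod
    def from_json(cls, text):
        data = json.loads(text)
        # Unknown fields are ignored rather than rejected.
        return cls(**{f: data[f] for f in schema if f in data})

    return type(name, (object,), {
        "__init__": __init__,
        "to_json": to_json,
        "from_json": from_json,
    })


Status = make_record_class("Status", STATUS_SCHEMA)
status = Status.from_json('{"id": "42", "user": "kevinweil", "text": "hi", "extra": 1}')
```

Adding a field to STATUS_SCHEMA changes the constructor, serializer, and parser together, which is the "less code" benefit in miniature. Ignoring unknown fields and defaulting missing ones to None loosely mimics the forward-backward compatibility that the schema formats provide.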
Why?