The Meta of Hadoop - COMAD 2012

Joydeep Sen Sarma, Ex-Facebook DI Lead, Founder Qubole


Post on 11-Jun-2015


DESCRIPTION

What do you talk about to a hall full of database gurus? Instead of the science, my talk focused on the art. What made Hadoop successful? What can we learn from it? What principles work well in building software for large-scale services? What are some interesting unsolved problems in a world overrun by open source (and VC investments :-))?

TRANSCRIPT

Page 1: The Meta of Hadoop - COMAD 2012

The Meta of Hadoop

Joydeep Sen Sarma
Ex-Facebook DI Lead, Founder Qubole

Page 2: The Meta of Hadoop - COMAD 2012

Intro

• File/Database Systems developer (ex-NetApp/Oracle)
• Yahoo (2005-07), Facebook (2007-11)

• @Facebook:
  – SysAdmin: operated massive Hadoop/Hive installs
  – Architect: conceived/wrote Apache Hive, made HBase@FB happen
  – Herded cats: first manager of Data Infra team
  – IT engineer/DBA: built ETL tools, warehouse/reporting for FB Virtual Currency

• Founder Qubole Inc. (2011-)

Page 3: The Meta of Hadoop - COMAD 2012

Why Hadoop Succeeded

• Complete Solution and Extensible
  – Useful to Engineers, Data Scientists, Analysts
  – Performance isn’t everything
  – Agile: businesses move much faster than before

• Market Dynamics
  – Captive Super-Reference Customer: Yahoo
  – Had the early market to itself for a long time

• Separation of Compute and Storage
  – Parallel Computing != Database

Page 4: The Meta of Hadoop - COMAD 2012

Why Hadoop Succeeded


• Data Consolidation!
  – Just store everything in HDFS
  – MR/Hive/Pig can chew anything

• Lights Out Architecture
  – Low System Operational Cost
  – Low Data Management Cost
    • Don’t need Data Priests

Page 5: The Meta of Hadoop - COMAD 2012

Meta Takeaways

Page 6: The Meta of Hadoop - COMAD 2012

Adaptive Lights-Out Software

• Successful efforts:
  – Automatic map-join/skew join implementations
  – Automatic local mode, resource cache

• Failed:
  – Statistics: alter table analyze table
  – Pre-Bucketing tables

Learning Frameworks for Systems Software
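
To make the "automatic map-join" bullet concrete, here is a minimal sketch of choosing a join strategy from observed input sizes rather than from a user hint. It is not Hive's implementation: `SMALL_TABLE_ROWS`, `auto_join`, and the in-memory row lists are hypothetical stand-ins used only for illustration.

```python
from collections import defaultdict

# Hypothetical threshold: row count below which a table is assumed small
# enough to be held in memory on every worker.
SMALL_TABLE_ROWS = 1000


def hash_join(build, probe, key):
    """Map-side style join: hash the small side, stream the large side."""
    table = defaultdict(list)
    for row in build:
        table[row[key]].append(row)
    return [{**b, **p} for p in probe for b in table.get(p[key], [])]


def sort_merge_join(left, right, key):
    """Stand-in for a shuffle-based join: sort both sides, then merge."""
    left = sorted(left, key=lambda r: r[key])
    right = sorted(right, key=lambda r: r[key])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        lk, rk = left[i][key], right[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of equal keys on each side and emit their product.
            i_end = i
            while i_end < len(left) and left[i_end][key] == lk:
                i_end += 1
            j_end = j
            while j_end < len(right) and right[j_end][key] == lk:
                j_end += 1
            out.extend({**l, **r} for l in left[i:i_end] for r in right[j:j_end])
            i, j = i_end, j_end
    return out


def auto_join(left, right, key):
    """Pick the strategy from the data itself; the user supplies no hint."""
    small, large = (left, right) if len(left) <= len(right) else (right, left)
    if len(small) <= SMALL_TABLE_ROWS:
        return "map_join", hash_join(small, large, key)
    return "sort_merge_join", sort_merge_join(left, right, key)
```

The point of the sketch is the last function: the decision is made at run time from what the system can measure, which is the property the slide credits for the successful efforts and finds missing in the failed ones.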

Page 7: The Meta of Hadoop - COMAD 2012

Adaptive Lights-Out Software

• Caching + Prefetching is Adaptive
  – Replication is not
  – Can bridge gap between Compute and Storage

• Page Cache over Disk >> In-memory
  – Degrades gracefully

• Provide APIs, not packages
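
As a rough illustration of why caching is adaptive, the sketch below puts a small read-through LRU cache in front of a slow store. The class name `ReadThroughCache` and the `fetch` callback are assumptions made for this example, not anything from the talk. Whatever is read most recently stays resident without anyone deciding up front which data is hot, and when the working set outgrows the cache the cost is extra fetches rather than failure.

```python
from collections import OrderedDict


class ReadThroughCache:
    """LRU cache in front of a slower store (e.g. remote storage)."""

    def __init__(self, fetch, capacity):
        self.fetch = fetch          # hypothetical callback: key -> value
        self.capacity = capacity    # maximum number of cached entries
        self.blocks = OrderedDict() # ordered from least to most recently used
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.blocks:
            # Hit: mark the entry as most recently used.
            self.blocks.move_to_end(key)
            self.hits += 1
            return self.blocks[key]
        # Miss: fall back to the slow store, then remember the result.
        self.misses += 1
        value = self.fetch(key)
        self.blocks[key] = value
        if len(self.blocks) > self.capacity:
            # Evict the least recently used entry.
            self.blocks.popitem(last=False)
        return value
```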

Page 8: The Meta of Hadoop - COMAD 2012

Murphy’s Law

• No Trusted Components

• Defend everything
  – Rate-Limit access to every resource
  – Log and Monitor everything

• Clear and Overwhelming Force
  – Oversize it!

• Think QOS from Day-1
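
One common way to rate-limit access to a resource is a token bucket. The sketch below is a generic, single-threaded version offered as an assumption about what such a guard could look like; the talk does not name a specific algorithm, and `TokenBucket`, `rate`, `burst`, and the injectable `clock` are illustrative names.

```python
import time


class TokenBucket:
    """Allow `rate` operations per second on average, with bursts up to `burst`."""

    def __init__(self, rate, burst, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.burst = burst          # maximum tokens the bucket can hold
        self.clock = clock          # injectable so the limiter can be tested
        self.tokens = float(burst)  # start full
        self.last = clock()

    def allow(self, cost=1.0):
        """Return True and spend `cost` tokens if the request may proceed."""
        now = self.clock()
        # Refill according to elapsed time, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

Placing one such bucket in front of each shared resource (per user or per job) is also a simple starting point for the QoS bullet: different callers can be given different `rate` and `burst` values.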

Page 9: The Meta of Hadoop - COMAD 2012

Open Source

• Small is Beautiful
  – Build small easy to use/understand components
  – Redis!

• Iterative Small Changes
  – Operators HATE large releases
  – Hive (2 weeks) vs. Hadoop (2 years?)

Page 10: The Meta of Hadoop - COMAD 2012

Opportunities

Page 11: The Meta of Hadoop - COMAD 2012

Interesting Problems - I

• Collaborative Analysis
  – Most analysis is Repeat
  – Tracking and Searching historical analysis

• Consistency Aware Querying
  – OLAP: Snapshots instead of live tables
  – OLTP: Lookup stale caches instead of master
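
One possible reading of "consistency aware querying" on the OLTP side is that each read states how much staleness it can tolerate, and the system decides whether a cached copy is good enough or the master must be consulted. The sketch below assumes that reading; `StalenessAwareReader`, `read_master`, and `max_staleness` are hypothetical names, not an existing API.

```python
import time


class StalenessAwareReader:
    """Serve reads from a local cache when the caller's staleness bound allows."""

    def __init__(self, read_master, clock=time.monotonic):
        self.read_master = read_master  # hypothetical callback: key -> value
        self.clock = clock              # injectable for testing
        self.cache = {}                 # key -> (value, time fetched)
        self.master_reads = 0

    def get(self, key, max_staleness):
        """Return a value that is at most `max_staleness` seconds old."""
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and now - entry[1] <= max_staleness:
            # The cached copy is fresh enough for this caller.
            return entry[0]
        # Too stale (or never fetched): go to the master and refresh the cache.
        value = self.read_master(key)
        self.master_reads += 1
        self.cache[key] = (value, now)
        return value
```

A dashboard might call `get(key, max_staleness=60)` while a payment path calls `get(key, max_staleness=0.0)`, so only the reads that need fresh data pay for the trip to the master.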

Page 12: The Meta of Hadoop - COMAD 2012

Interesting Problems - II

• SQL is Rope
  – Better than procedural, but still Rope
  – Higher Level templates: moving averages

• Data = Mutating + Immutable
  – Immutable data is easy to manage
  – Cheap: One copy per data center (Facebook Haystack)
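
For the "higher level templates" bullet under "SQL is Rope", a moving average is a useful example: the analyst wants to say "7-day moving average of X" and not hand-write the windowing logic each time. The sketch below shows the kind of reusable building block that could sit behind such a template. The function name and the trailing-window semantics are assumptions made for illustration.

```python
from collections import deque


def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window <= 0:
        raise ValueError("window must be positive")
    buf = deque()   # the values currently inside the window
    total = 0.0     # running sum of the values in `buf`
    out = []
    for v in values:
        buf.append(v)
        total += v
        if len(buf) > window:
            # Slide the window: drop the oldest value from the sum.
            total -= buf.popleft()
        out.append(total / len(buf))
    return out


# Example: window of 2 over four daily values.
print(moving_average([1, 2, 3, 4], 2))  # [1.0, 1.5, 2.5, 3.5]
```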

Page 13: The Meta of Hadoop - COMAD 2012

Think Services, not Software

• Software is getting less interesting
  – Even Distributed Systems Software

• Run/Operate long-running, hot services
  – Innovate inside this boundary

Page 14: The Meta of Hadoop - COMAD 2012

Q&A