Hadoop @ Facebook - QCon Tokyo 2016
TRANSCRIPT
Hadoop @ Facebook
Usage, Issues, Solutions & Beyond
Joydeep Sen Sarma
About Me
• Facebook: Ran/Managed Hadoop for ~3 years
• Co-Author of Hive
• Mentor/PM for the Hadoop Fair-Scheduler
• Used Hadoop/Hive (as Warehouse/ETL Dev)
• Re-wrote significant chunks of Hadoop Job Scheduling (incl. Corona)
• Qubole: Running the world's largest Hadoop and Presto clusters on AWS
Hadoop @ FB (2007-2014)
2007-2011
Use Cases
• Reporting (SQL)
- Complex ETL (Joins)
- Simple Summaries (Counters)
• Index Generation (Map-Reduce/Java)
• Data Mining (C++)
• Adhoc Queries (SQL)
• Data Driven Applications (PYMK)
Friend Map
Success Factors: Hadoop
• Scales Linearly, Centrally Managed
• Operationally Simple (15 people – 25 PB)
• Can do almost anything
Success Factors: Hive
• SQL is easy (add scripts for map-reduce)
• Browser Based Interface
Success Factors: Scribe
• Log data using Scribe from any application
• Simple to add attributes to user page views
• JSON encoded logging allows schema evolution
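The schema-evolution point above can be sketched in a few lines. This is an illustration of the general JSON property, not Scribe's actual pipeline; the event fields and the `summarize` consumer are made up for the example:

```python
import json

# A page-view log line gains a new attribute ("ab_test_bucket") without
# breaking existing consumers, because JSON readers simply ignore keys
# they do not know about.
old_event = {"user_id": 42, "page": "/home"}
new_event = {"user_id": 42, "page": "/home", "ab_test_bucket": "b"}

def summarize(event_line):
    """A consumer written before the new attribute existed."""
    event = json.loads(event_line)
    return (event["user_id"], event["page"])  # unknown keys are ignored

# Old and new log lines yield the same summary for the old consumer.
assert summarize(json.dumps(old_event)) == summarize(json.dumps(new_event))
```

This is why adding attributes to page-view logging was cheap: producers could ship new fields without coordinating a schema change with every downstream reader.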
Issues
#1: Hadoop Scheduling
• “Have you no Hadoop Etiquette?” (c. 2007) (reducer count capped in response)
• User takes down entire cluster (OOM) (c. 2007-09)
• Bad job slows down entire cluster (c. 2009)
• Steady-state latencies become intolerable (c. 2010-)
• “How do I know I am getting my fair share?” (c. 2011)
• “Too few reducer slots, cluster idle” (c. 2013)
#2: Hive Latency
• Queries take at least 30s
- Users expect MySQL-like latency
• Hive is too slow for BI tools like Tableau
• Difficult to write/test complex SQL applications
• Not possible to do drill downs into detail data
- No indices and latency too high
#3: Real Time Metrics
• Nightly or hourly aggregations not enough
- Need Real-Time
- Driven by Advertiser requirements
• MySQL is not a good store for long-tail summaries
#4: Cluster Too Big
• Single HDFS runs into space/power/network limits
• Hard to do Queries across Data Centers
#5: Not a fit for all applications
• In-memory grid better for Graph applications
Solutions
Hadoop Scheduling - Isolation
• Hadoop Fair-Scheduler
- Fix Preemption and Speculation
• Separate clusters for Production Pipelines
• Platinum, Gold, Silver
• Prevent memory over-consumption
- Slaves, JobTracker, NameNode
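A minimal sketch of the tiered fair-share idea, assuming weight-proportional allocation (the pool names and weights here are illustrative, not Facebook's actual configuration):

```python
# Split cluster slots among pools in proportion to their weights, in the
# spirit of Platinum/Gold/Silver tiers in the Hadoop Fair-Scheduler.
def fair_shares(total_slots, pool_weights):
    total_weight = sum(pool_weights.values())
    return {pool: total_slots * w / total_weight
            for pool, w in pool_weights.items()}

shares = fair_shares(100, {"platinum": 3, "gold": 2, "silver": 1})
# platinum gets half the slots, gold a third, silver a sixth
assert shares["platinum"] == 50.0
```

Preemption then means reclaiming slots from pools running above their computed share when a pool below its share has waiting tasks.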
Hadoop Scheduling - Corona
• Overlaps with Apache YARN
• Separate Cluster Resource Allocation
- from Map-Reduce Task Scheduling
• Push Tasks to Slaves
- Reduced Latency and Idle Time
• Highly Concurrent Design
- Optimistic Locking
- Fast and Scales to thousands of nodes
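The optimistic-locking design can be sketched as a version check at commit time. This is a hedged illustration of the pattern, not Corona's real API; class and method names are invented:

```python
import threading

# Schedulers read cluster state without holding a lock, then commit an
# allocation only if the state version is unchanged since their snapshot.
class ClusterState:
    def __init__(self, free_slots):
        self.free_slots = free_slots
        self.version = 0
        self._commit_lock = threading.Lock()  # held only briefly at commit

    def snapshot(self):
        return self.version, self.free_slots

    def try_allocate(self, seen_version, slots):
        """Commit iff nobody changed the state since our snapshot."""
        with self._commit_lock:
            if self.version != seen_version or self.free_slots < slots:
                return False  # stale or insufficient: retry with fresh snapshot
            self.free_slots -= slots
            self.version += 1
            return True

state = ClusterState(free_slots=10)
version, _ = state.snapshot()
assert state.try_allocate(version, 4)      # first commit wins
assert not state.try_allocate(version, 4)  # stale version must retry
```

Because the lock is held only for the cheap commit step, many scheduling decisions can proceed concurrently, which is how this style of design scales to thousands of nodes.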
Fast Queries – Presto!

Hive                                  | Presto
--------------------------------------|-----------------------------------
Uses Hadoop MR for execution          | Pipelined execution model
Spills intermediate data to FS        | Keeps intermediate data in memory
Can tolerate failures                 | Does not tolerate failures
Automatic join ordering               | User-specified join ordering
Can handle joins of two large tables  | One table needs to fit in memory
Supports grouping sets                | Does not support grouping sets
Can plug in custom code               | Cannot plug in custom code
More data types                       | Limited data types
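The first row of the Hive/Presto comparison above is the core of the latency difference, and can be illustrated with a toy model (an assumption-laden simplification, not either engine's real code):

```python
# A materializing stage writes out all intermediate rows before the next
# stage starts (Hive-on-MR style), while a pipelined stage streams each
# row through the operators as it is produced (Presto style).
def rows():
    for i in range(1, 6):
        yield i

def materialized(source):
    intermediate = list(source)          # "spill" everything first
    return [r * 2 for r in intermediate]

def pipelined(source):
    for r in source:                     # each row flows straight through
        yield r * 2

assert materialized(rows()) == list(pipelined(rows()))  # same answer
```

Both produce identical results; the pipelined version just never pays for materializing the intermediate set, which is where the interactive-latency win comes from.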
Presto vs Hive Performance
• Presto is 2.5-7x faster
• Some queries run out of memory
Other FB Projects
• Calligraphus (Scribe++)
• Puma
- Real-Time Processing
- HBase for storage
• Apache Giraph
- In-Memory graph computations
Scaling Issues – Cloud!
• Easily start large Hadoop clusters
- Isolate large ad-hoc workloads from Production
- Qubole auto-scales Hadoop Up/Down
• AWS S3 is much easier to use than HDFS
- also: GCE and Azure Storage
- Use HDFS as Cache (Qubole Columnar Cache)
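The HDFS-as-cache pattern above amounts to a read-through cache. A minimal sketch, with plain dicts standing in for the real storage systems (nothing here is Qubole's actual code):

```python
# Serve a block from the local (HDFS-like) cache when present; otherwise
# fetch it from remote object storage (S3-like) and cache it for later.
remote_store = {"ads/part-0000": b"remote bytes"}   # S3 stand-in
local_cache = {}                                    # HDFS cache stand-in

def read_block(path):
    if path in local_cache:
        return local_cache[path]        # cache hit: no network round-trip
    data = remote_store[path]           # cache miss: fetch from S3
    local_cache[path] = data            # populate for subsequent readers
    return data

read_block("ads/part-0000")             # first read misses and fills the cache
assert "ads/part-0000" in local_cache   # later reads are served locally
```

Repeated queries over the same tables then hit local disks at HDFS speed while S3 remains the durable source of truth.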
insert overwrite table dest
select … from ads join campaigns on …
group by …;
Auto-Scaling
[Diagram: a JobTracker master tracks Map/Reduce task demand vs. slot supply and reported progress, scaling StarCluster slave nodes up/down on AWS/GCE.]
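The demand-vs-supply loop behind auto-scaling can be sketched as a simple decision function. This is a hedged illustration of the idea, not Qubole's actual algorithm; the constant and thresholds are invented:

```python
# Compare pending tasks (demand) against available slots (supply) and
# decide how many nodes to add or remove.
SLOTS_PER_NODE = 4  # assumed cluster configuration

def scaling_decision(pending_tasks, free_slots, idle_nodes):
    if pending_tasks > free_slots:
        deficit = pending_tasks - free_slots
        return ("add", -(-deficit // SLOTS_PER_NODE))  # ceil division
    if pending_tasks == 0 and idle_nodes > 0:
        return ("remove", idle_nodes)                  # shrink when idle
    return ("hold", 0)

assert scaling_decision(pending_tasks=10, free_slots=2, idle_nodes=0) == ("add", 2)
assert scaling_decision(pending_tasks=0, free_slots=8, idle_nodes=2) == ("remove", 2)
```

A real controller would also account for task progress and cloud billing granularity before removing nodes, but the core loop is this comparison.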
Hive Latency - Qubole
• Works with small files!
  - Faster split computation (8x)
  - Prefetching S3 files (30%)
• Direct writes to S3
  - HIVE-1620
• Multi-tenant Hive Server
• JVM reuse!
  - Fixed re-entrancy issues
  - 1.2-2x speedup
• Columnar Cache
  - Uses HDFS as a cache for S3
  - Up to 5x faster for JSON data
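The prefetching item in the list above hides S3 latency behind compute. A sketch of that overlap using a thread pool (an assumed illustration, not Qubole's implementation; `fetch` stands in for an S3 GET):

```python
from concurrent.futures import ThreadPoolExecutor

# Start downloading upcoming input files in the background while earlier
# ones are being processed, so S3 round-trips overlap with compute.
def fetch(path):
    return f"contents of {path}"        # stand-in for an S3 GET

def process_all(paths, workers=4):
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(fetch, p) for p in paths]  # prefetch all files
        for future in futures:
            results.append(future.result())  # consume in order as fetches land
    return results

assert process_all(["s3://b/a", "s3://b/b"]) == [
    "contents of s3://b/a", "contents of s3://b/b"]
```

With real S3 reads, by the time the first file is processed the next ones are typically already in flight, which is where a ~30% speedup of the kind quoted above can come from.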
Appendix
In Production @