Large Scale Web Apps @Pinterest (Powered by Apache HBase) May 5, 2014


DESCRIPTION

Speaker: Varun Sharma (Pinterest) Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.

TRANSCRIPT

Page 1: Large-scale Web Apps @ Pinterest

Large Scale Web Apps @Pinterest (Powered by Apache HBase)

May 5, 2014

Page 2: Large-scale Web Apps @ Pinterest

What is Pinterest?

Pinterest is a visual discovery tool for collecting the things you love, and discovering related content along the way.

Page 3: Large-scale Web Apps @ Pinterest
Page 4: Large-scale Web Apps @ Pinterest

Challenges @scale

• 100s of millions of pins/repins per month
• Billions of requests per week
• Millions of daily active users
• Billions of pins
• One of the largest discovery tools on the internet

Page 5: Large-scale Web Apps @ Pinterest

Storage stack @Pinterest

• MySQL
• Redis (persistence and for cache)
• MemCache (Consistent Hashing)

(Diagram: the app tier sits on top of manual sharding logic that routes requests to these storage backends.)

Page 6: Large-scale Web Apps @ Pinterest

Why HBase?

• High write throughput - unlike MySQL/B-Tree, writes never seek on disk
• Seamless integration with Hadoop
• Distributed operation
  - Fault tolerance
  - Load balancing
  - Easily add/remove nodes

Non-technical reasons
• Large active community
• Large-scale online use cases

Page 7: Large-scale Web Apps @ Pinterest

Outline

• Features powered by HBase
• SaaS (Storage as a Service)
  - MetaStore
  - HFile Service (Terrapin)
• Our HBase setup - optimizing for high availability & low latency

Page 8: Large-scale Web Apps @ Pinterest

Applications/Features

• Offline
  - Analytics
  - Search indexing
  - ETL/Hadoop workflows
• Online
  - Personalized Feeds
  - Rich Pins
  - Recommendations

Why HBase?

Page 9: Large-scale Web Apps @ Pinterest

Personalized Feeds

WHY HBASE? Write-heavy load due to pin fanout.

(Diagram: a pinner's home feed combines pins from "Users I follow" with "Recommended Pins".)
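As a rough illustration of the fanout write pattern (not Pinterest's actual schema - the "feeds" table, the "f" family, and the "pin" qualifier below are hypothetical), each new pin is written once per follower into a feed table keyed by follower ID plus a reverse timestamp:

    import java.util.ArrayList;
    import java.util.List;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class FeedFanout {
      // Hypothetical schema: one row per (follower, reverse timestamp) in a "feeds" table.
      public static void fanOut(HTable feeds, long pinId, long createdAtMs,
                                List<Long> followerIds) throws Exception {
        long reverseTs = Long.MAX_VALUE - createdAtMs;   // newest pins sort first
        List<Put> puts = new ArrayList<Put>();
        for (long followerId : followerIds) {
          byte[] rowKey = Bytes.add(Bytes.toBytes(followerId), Bytes.toBytes(reverseTs));
          Put put = new Put(rowKey);
          put.add(Bytes.toBytes("f"), Bytes.toBytes("pin"), Bytes.toBytes(pinId));
          puts.add(put);
        }
        feeds.put(puts);   // one logical pin becomes N row writes, hence the write-heavy load
      }
    }

The append-only LSM write path in HBase absorbs this fanout without random disk seeks, which is why a write-heavy workload like this fits.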

Page 10: Large-scale Web Apps @ Pinterest

Rich Pins

WHY HBASE? Negative hits with Bloom filters.
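Bloom filters are enabled per column family; a minimal sketch using the standard 0.96-era HBase admin API (the "rich_pins" table and "d" family names are made up for illustration):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.HTableDescriptor;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.HBaseAdmin;
    import org.apache.hadoop.hbase.regionserver.BloomType;

    public class CreateRichPinsTable {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HBaseAdmin admin = new HBaseAdmin(conf);

        // A ROW-level Bloom filter lets a Get for a missing row skip most HFiles
        // without touching disk, which is what makes negative hits cheap.
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setBloomFilterType(BloomType.ROW);

        HTableDescriptor table = new HTableDescriptor(TableName.valueOf("rich_pins"));
        table.addFamily(family);
        admin.createTable(table);
        admin.close();
      }
    }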

Page 11: Large-scale Web Apps @ Pinterest

Recommendations

WHY HBASE? Seamless data transfer from Hadoop.

(Diagram: recommendations generated on a Hadoop 1.0 cluster are copied by DistCp jobs to Hadoop 2.0 and loaded into the HBase + Hadoop 2.0 serving cluster.)
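One common way to finish such a transfer is an HBase bulk load of MapReduce-generated HFiles; a minimal sketch with the stock LoadIncrementalHFiles tool (the "recommendations" table name and the HFile path are illustrative, and the slide itself only mentions DistCp for the cross-cluster copy):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

    public class BulkLoadRecommendations {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "recommendations");   // illustrative table name

        // Directory of HFiles produced by a MapReduce job (e.g. via HFileOutputFormat)
        // and already copied onto the serving cluster's HDFS.
        Path hfileDir = new Path("/tmp/recommendations-hfiles");

        // Moves the HFiles directly into the table's regions - no per-row writes needed.
        LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
        loader.doBulkLoad(hfileDir, table);
        table.close();
      }
    }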

Page 12: Large-scale Web Apps @ Pinterest

SaaS

• Large number of feature requests
• 1 cluster per feature
• Scaling with organizational growth
• Need for "defensive" multi-tenant storage
• Previous solutions reaching their limits

Page 13: Large-scale Web Apps @ Pinterest

MetaStore I

• Key-value store on top of HBase
• 1 HBase table per feature, with salted keys
• Pre-split tables
• Table-level rate limiting (online/offline reads/writes)
• No scan support
• Simple client API:

    string getValue(string feature, string key, boolean online);
    void setValue(string feature, string key, string value, boolean online);
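For illustration only, salting is commonly implemented by prefixing the row key with a hash-derived byte and pre-splitting the table at the salt boundaries; a minimal sketch under those assumptions (the 32-bucket count and helper names are hypothetical, not MetaStore's actual scheme):

    import org.apache.hadoop.hbase.util.Bytes;

    public class SaltedKeys {
      private static final int NUM_BUCKETS = 32;   // hypothetical salt bucket count

      // Prefix the user key with a one-byte salt so writes spread across regions.
      public static byte[] saltedRowKey(String key) {
        byte salt = (byte) ((key.hashCode() & 0x7fffffff) % NUM_BUCKETS);
        return Bytes.add(new byte[] { salt }, Bytes.toBytes(key));
      }

      // Split points at each salt value, usable when pre-splitting the table,
      // e.g. HBaseAdmin.createTable(tableDescriptor, splitKeys()).
      public static byte[][] splitKeys() {
        byte[][] splits = new byte[NUM_BUCKETS - 1][];
        for (int i = 1; i < NUM_BUCKETS; i++) {
          splits[i - 1] = new byte[] { (byte) i };
        }
        return splits;
      }
    }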

Page 14: Large-scale Web Apps @ Pinterest

MetaStore II

(Architecture diagram: clients issue gets/sets over Thrift to the MetaStore Thrift Server, which applies salting + rate limiting and talks to the primary HBase cluster; a secondary HBase cluster is kept in sync via master/master replication. The MetaStore config (rate limits, primary cluster) lives in ZooKeeper, which pushes change notifications to the Thrift servers.)
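As a sketch of what table-level rate limiting could look like inside such a Thrift server, using Guava's RateLimiter (the per-feature keying and hard-coded limits are illustrative, not MetaStore's actual implementation, which would read its limits from the ZooKeeper-backed config):

    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentMap;
    import com.google.common.util.concurrent.RateLimiter;

    public class FeatureRateLimiter {
      // One limiter per (feature, operation) pair, e.g. "rich_pins:online_read".
      private final ConcurrentMap<String, RateLimiter> limiters =
          new ConcurrentHashMap<String, RateLimiter>();

      // Block the calling handler thread until the feature is under its limit.
      public void acquire(String feature, String op, double permitsPerSecond) {
        String key = feature + ":" + op;
        RateLimiter limiter = limiters.get(key);
        if (limiter == null) {
          limiters.putIfAbsent(key, RateLimiter.create(permitsPerSecond));
          limiter = limiters.get(key);
        }
        limiter.acquire();
      }
    }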

Page 15: Large-scale Web Apps @ Pinterest

HFile Service (Terrapin)

• Solve the bulk upload problem
• HBase-backed solution
  - Bulk upload + major compact
  - Major compact to delete old data
• Design a solution from scratch using a mashup of:
  - HFile
  - HBase BlockCache
  - Avoid compactions
  - Low-latency key-value lookups

Page 16: Large-scale Web Apps @ Pinterest

High Level Architecture I

(Architecture diagram: ETL/batch jobs load/reload HFiles onto Amazon S3; HFile servers, each holding multiple HFiles, copy them down and answer key/value lookups from the client library/service.)

Page 17: Large-scale Web Apps @ Pinterest

High Level Architecture II

• Each HFile server runs 2 processes
  - Copier: pulls HFiles from S3 to local disk
  - Supershard: serves multiple HFile shards to clients
• ZooKeeper
  - Detecting alive servers
  - Coordinating loading/swapping of new data
  - Enabling clients to detect availability of new data (a minimal watch sketch follows this list)
• Loader module (replaces DistCp)
  - Trigger new data copy
  - Trigger swap through ZooKeeper
  - Update ZooKeeper and notify clients
• Client library understands sharding
• Old data deleted by a background process
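A minimal sketch of how a client might watch ZooKeeper to detect new data, assuming the loader publishes the current data version under a znode; the znode path, session timeout, and class are hypothetical, not Terrapin's actual protocol:

    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class DataVersionWatcher implements Watcher {
      // Hypothetical znode where the loader publishes the currently served version.
      private static final String VERSION_ZNODE = "/hfileservice/fileset/current_version";

      private final ZooKeeper zk;
      private volatile String currentVersion;

      public DataVersionWatcher(String zkQuorum) throws Exception {
        this.zk = new ZooKeeper(zkQuorum, 30000, this);
      }

      // Read the znode and re-register the watch; callers switch to the new
      // version (the freshly swapped shards) whenever it changes.
      public synchronized void refresh() throws Exception {
        byte[] data = zk.getData(VERSION_ZNODE, this, new Stat());
        currentVersion = new String(data, "UTF-8");
      }

      @Override
      public void process(WatchedEvent event) {
        if (event.getType() == Event.EventType.NodeDataChanged) {
          try {
            refresh();
          } catch (Exception e) {
            // a real client would retry and re-register the watch here
          }
        }
      }

      public String getCurrentVersion() {
        return currentVersion;
      }
    }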

Page 18: Large-scale Web Apps @ Pinterest

Salient Features

• Multi-tenancy through namespacing
• Pluggable sharding functions - modulus, range & more (a minimal modulus sketch follows)
• HBase Block Cache
• Multiple clusters for redundancy
• Speculative execution across clusters for low latency
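For illustration, a modulus sharding function just hashes the key and takes it mod the shard count; a tiny sketch (the interface is hypothetical, not Terrapin's actual client API):

    import java.util.Arrays;

    public interface ShardFunction {
      int shardFor(byte[] key, int numShards);
    }

    // Modulus sharding: hash the key bytes, force the hash non-negative, mod by N.
    class ModulusShardFunction implements ShardFunction {
      @Override
      public int shardFor(byte[] key, int numShards) {
        int hash = Arrays.hashCode(key) & 0x7fffffff;
        return hash % numShards;
      }
    }

A range function would instead map contiguous key ranges to shards; the pluggable interface lets each dataset choose whichever fits its access pattern.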

Page 19: Large-scale Web Apps @ Pinterest

Setting up for Success

• Many online use cases/applications
• Optimize for:
  - Low MTTR (high availability)
  - Low latency (performance)

Page 20: Large-scale Web Apps @ Pinterest

MTTR - I

DataNode states: LIVE -> STALE after 20 sec -> DEAD after 9 min 40 sec

• Stale nodes avoided
  - As candidates for reads
  - As candidate replicas for writes
  - During lease recovery
• Copying of under-replicated blocks starts when a node is marked as "Dead"
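The stale/dead distinction comes from the HDFS stale-node settings (HDFS-3703/HDFS-3912); a hedged sketch of a configuration matching the numbers above - only the 20-second stale interval is taken from the slide, the rest of the setup is assumed:

    import org.apache.hadoop.conf.Configuration;

    public class StaleNodeConfig {
      public static Configuration build() {
        Configuration conf = new Configuration();
        // Mark a DataNode stale after 20 seconds without a heartbeat (value from the slide).
        conf.setLong("dfs.namenode.stale.datanode.interval", 20 * 1000L);
        // Skip stale nodes when choosing replicas for reads and for new writes.
        conf.setBoolean("dfs.namenode.avoid.read.stale.datanode", true);
        conf.setBoolean("dfs.namenode.avoid.write.stale.datanode", true);
        return conf;
      }
    }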

Page 21: Large-scale Web Apps @ Pinterest

MTTR - II

Recovery pipeline: Failure Detection -> Lease Recovery -> Log Split -> Recover Regions

• Failure Detection: 30 sec ZooKeeper session timeout
• Lease Recovery: HDFS-4721
• Log Split: HDFS-3703 + HDFS-3912
• Recover Regions: < 2 min

• Avoid stale nodes at each point of the recovery process
• Multi-minute timeouts ==> multi-second timeouts
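On the HBase side, the failure-detection window is the region server's ZooKeeper session timeout; a minimal sketch of setting it to the 30 seconds above (everything else left at defaults):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FastFailureDetection {
      public static Configuration build() {
        Configuration conf = HBaseConfiguration.create();
        // A region server whose ZooKeeper session lapses for 30 s is declared dead,
        // so log split and region recovery start within seconds, not minutes.
        conf.setInt("zookeeper.session.timeout", 30 * 1000);
        return conf;
      }
    }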

Page 22: Large-scale Web Apps @ Pinterest

Simulate, Simulate, Simulate

Simulate "pull the plug" failures and "tail -f" the logs:

• kill -9 both the DataNode and the RegionServer - causes connection refused errors
• kill -STOP both the DataNode and the RegionServer - causes socket timeouts
• Blackhole hosts using iptables - connect timeouts + "No route to host"
  - Most representative of AWS failures

Page 23: Large-scale Web Apps @ Pinterest

Performance

Configuration tweaks (a hedged column-family sketch follows the Hardware list)
• Small block size, 4K-16K
• Prefix compression to cache more - when data is in the key, close to 4X reduction for some data sets
• Separation of RPC handler threads for reads vs writes
• Short-circuit local reads
• HBase-level checksums (HBASE-5074)

Hardware
• SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
• Choose based on limiting factor
  - Disk space - pick SATA for max GB/$$
  - IOPS - pick SSD for max IOPS/$$, clusters with heavy reads or heavy compaction activity
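A hedged sketch of how a few of these tweaks map onto the 0.96-era APIs; the "d" family name and the 8K block size (within the 4K-16K range above) are arbitrary illustrative choices:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;
    import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

    public class PerformanceTweaks {
      // Column-family settings: small blocks plus prefix encoding.
      public static HColumnDescriptor tunedFamily() {
        HColumnDescriptor family = new HColumnDescriptor("d");
        family.setBlocksize(8 * 1024);   // small blocks keep point lookups cheap
        // Prefix encoding ("prefix compression") shrinks blocks when most of the
        // data lives in the key, so more of the working set fits in the block cache.
        family.setDataBlockEncoding(DataBlockEncoding.PREFIX);
        return family;
      }

      // Cluster settings: HBase-level checksums and short-circuit local reads.
      public static Configuration serverConf() {
        Configuration conf = HBaseConfiguration.create();
        conf.setBoolean("hbase.regionserver.checksum.verify", true);   // HBASE-5074
        conf.setBoolean("dfs.client.read.shortcircuit", true);
        return conf;
      }
    }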

Page 24: Large-scale Web Apps @ Pinterest

Performance (SSDs)

HFile Read Performance (a hedged config sketch follows)
• Turn off block cache for data blocks, to reduce GC + heap fragmentation
• Keep block cache on for index blocks
• Increase "dfs.client.read.shortcircuit.streams.cache.size" from 100 to 10,000 (with short-circuit reads)
• Approx. 3X improvement in read throughput

Write Performance
• WAL contention when the client sets AutoFlush=true
• HBASE-8755
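A hedged sketch of those read-side settings: disabling data-block caching is a per-column-family flag, and the short-circuit streams cache is an HDFS client setting (0.96/Hadoop 2.x-era APIs assumed; the "d" family name is illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.HColumnDescriptor;

    public class SsdReadSettings {
      public static HColumnDescriptor family() {
        HColumnDescriptor family = new HColumnDescriptor("d");
        // With fast SSD reads, skip caching data blocks to cut GC and heap
        // fragmentation; index (and bloom) blocks are still cached.
        family.setBlockCacheEnabled(false);
        return family;
      }

      public static Configuration clientConf() {
        Configuration conf = HBaseConfiguration.create();
        // More reusable short-circuit read streams (slide: 100 -> 10,000).
        conf.setInt("dfs.client.read.shortcircuit.streams.cache.size", 10000);
        return conf;
      }
    }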

Page 25: Large-scale Web Apps @ Pinterest

In the Pipeline...

• Building a graph database on HBase
• Disaster recovery - snapshot + incremental backup + restore
• Off-heap cache - reduce GC overhead and better use of hardware
• Read path optimizations

Page 26: Large-scale Web Apps @ Pinterest

And we are Hiring !!