Large-Scale Web Apps @ Pinterest
DESCRIPTION
Speaker: Varun Sharma (Pinterest)

Over the past year, HBase has become an integral component of Pinterest's storage stack. HBase has enabled us to quickly launch and iterate on new products and create amazing pinner experiences. This talk briefly describes some of these applications, the underlying schema, and how our HBase setup stays highly available and performant despite billions of requests every week. It will also include some performance tips for running on SSDs. Finally, we will talk about a homegrown serving technology we built from a mashup of HBase components that has gained wide adoption across Pinterest.
TRANSCRIPT
Large Scale Web Apps @Pinterest (Powered by Apache HBase)
May 5, 2014
Pinterest is a visual discovery tool for collecting the things you love, and discovering related content along the way.
What is Pinterest?
Scale
Challenges @ scale:
• 100s of millions of pins/repins per month
• Billions of requests per week
• Millions of daily active users
• Billions of pins
• One of the largest discovery tools on the internet
Storage stack @ Pinterest
• MySQL
• Redis (persistence and cache)
• Memcache (consistent hashing)
[Diagram: app tier with manual sharding; the sharding logic lives in the application]
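The manual sharding shown above can be sketched as follows; the shard names, bucket count, and function names here are hypothetical, not Pinterest's actual scheme:

```python
# Hypothetical sketch of app-tier manual sharding: the application hashes
# an entity id to pick one of N fixed MySQL shards.
import hashlib

SHARDS = ["mysql-db-%03d" % i for i in range(16)]  # assumed shard hosts

def shard_for(entity_id: str) -> str:
    """Pick a shard deterministically from the entity id."""
    h = int(hashlib.md5(entity_id.encode()).hexdigest(), 16)
    return SHARDS[h % len(SHARDS)]

# Every read/write for the same id lands on the same shard.
assert shard_for("pin:12345") == shard_for("pin:12345")
```

The downside this talk is motivating: the shard count is baked into the hash, so growing the fleet means manual resharding.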
Why HBase?
• High write throughput: unlike MySQL's B-tree storage, HBase's log-structured writes never seek on disk
• Seamless integration with Hadoop
• Distributed operation:
  - Fault tolerance
  - Load balancing
  - Easily add/remove nodes
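The write-path claim above can be illustrated with a toy log-structured (LSM-style) store. This is a minimal sketch of the general technique, not HBase's implementation:

```python
# Minimal LSM-style write path sketch (illustrative only): writes go to an
# append-only log plus an in-memory buffer (memtable), which is flushed as
# one sorted immutable file - all sequential I/O, no random seeks on write.
class MiniLSM:
    def __init__(self):
        self.wal = []        # write-ahead log (append-only)
        self.memtable = {}   # in-memory buffer
        self.sstables = []   # immutable sorted files ("on disk")

    def put(self, key, value):
        self.wal.append((key, value))   # sequential append
        self.memtable[key] = value

    def flush(self):
        # write the memtable out as one sorted, immutable file
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}

    def get(self, key):
        # reads may have to consult multiple files, newest first
        if key in self.memtable:
            return self.memtable[key]
        for table in reversed(self.sstables):
            for k, v in table:
                if k == key:
                    return v
        return None
```

The trade-off is that reads may touch several files, which is what compactions and block caches (discussed later in this talk) exist to mitigate.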
Non-Technical Reasons
• Large active community
• Large-scale online use cases
Outline
• Features powered by HBase
• SaaS (Storage as a Service)
  - MetaStore
  - HFile Service (Terrapin)
• Our HBase setup: optimizing for high availability & low latency
Applications/Features
• Offline
  - Analytics
  - Search indexing
  - ETL/Hadoop workflows
• Online
  - Personalized feeds
  - Rich Pins
  - Recommendations
Why HBase ?
Personalized Feeds
WHY HBASE? Write-heavy load due to pin fanout.
[Diagram: a user's feed combines pins from users they follow with recommended pins]
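Fanout-on-write, which drives the write-heavy load, can be sketched like this; the follow graph, names, and function shape are made up for illustration:

```python
# Illustrative fanout-on-write sketch: one logical "pin" action becomes one
# feed write per follower, so write volume scales with follower counts.
from collections import defaultdict

followers = {"alice": ["bob", "carol", "dave"]}   # assumed follow graph
feeds = defaultdict(list)                          # follower -> feed entries

def fan_out_pin(pinner: str, pin_id: str) -> int:
    """Append the new pin to every follower's feed; return the write count."""
    for follower in followers.get(pinner, []):
        feeds[follower].append((pinner, pin_id))
    return len(followers.get(pinner, []))

writes = fan_out_pin("alice", "pin:99")   # one pin, three feed writes
```

With millions of users, this amplification is exactly the workload that favors HBase's sequential write path over B-tree stores.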
Rich Pins
WHY HBASE? Negative hits with Bloom filters.
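A toy Bloom filter shows why negative hits are cheap: a definite "no" from the in-memory filter means the store never touches the HFile on disk. The sizes and hash construction here are illustrative, not HBase's:

```python
# Toy Bloom filter sketch: membership test with no false negatives.
# A False answer lets the read path skip the HFile entirely (a "negative
# hit"); a rare false positive only costs one extra disk read.
import hashlib

class Bloom:
    def __init__(self, m=1024, k=3):
        self.m, self.k = m, k   # m bits, k hash functions
        self.bits = 0

    def _positions(self, key):
        for i in range(self.k):
            h = hashlib.sha1(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._positions(key):
            self.bits |= 1 << p

    def might_contain(self, key):
        return all(self.bits >> p & 1 for p in self._positions(key))

bf = Bloom()
bf.add("pin:1")
# might_contain("pin:1") is guaranteed True; for almost all absent keys it
# returns False, so no disk read is issued for them.
```

Rich Pins benefit because many lookups are for pins that have no rich metadata at all, and those resolve in memory.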
Recommendations
WHY HBASE? Seamless data transfer from Hadoop.
[Diagram: recommendations generated on Hadoop 1.0 and Hadoop 2.0 clusters are copied via DistCP jobs into the HBase + Hadoop 2.0 serving cluster]
SaaS
• Large number of feature requests
• 1 cluster per feature
• Scaling with organizational growth
• Need for "defensive" multi-tenant storage
• Previous solutions reaching their limits
MetaStore I
• Key-value store on top of HBase
• 1 HBase table per feature, with salted keys
• Pre-split tables
• Table-level rate limiting (online/offline reads/writes)
• No scan support
• Simple client API:

    string getValue(string feature, string key, boolean online);
    void setValue(string feature, string key, string value, boolean online);
MetaStore II
[Diagram: clients issue gets/sets over Thrift to the MetaStore Thrift server, which applies salting and rate limiting before talking to a primary HBase cluster, with master/master replication to a secondary; ZooKeeper stores the MetaStore config (rate limits, primary cluster) and pushes notifications on change]
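The salting and rate limiting in the diagram above can be sketched as follows; the bucket count, rate values, and class shapes are assumptions, not Pinterest's actual configuration:

```python
# Hypothetical sketch of the two MetaStore ingredients named above: key
# salting (spread a feature's keys across the pre-split regions) and a
# per-(feature, operation) rate limiter.
import hashlib
import time

NUM_SALT_BUCKETS = 16   # assumed: each table pre-split into 16 regions

def salted_row_key(feature: str, key: str) -> bytes:
    """Prefix the row key with a deterministic salt so sequential user keys
    spread across all regions instead of hot-spotting a single one."""
    salt = int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SALT_BUCKETS
    return b"%02x:%s:%s" % (salt, feature.encode(), key.encode())

class RateLimiter:
    """Simple token bucket; the Thrift server would keep one per
    (feature, online/offline, read/write) combination."""
    def __init__(self, rate_per_sec: float):
        self.rate = rate_per_sec
        self.tokens = rate_per_sec
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.rate, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Because the salt is derived from the key itself, the same key always maps to the same region, so point gets stay cheap; scans are what the salt breaks, which matches MetaStore's "no scan support" contract.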
HFile Service (Terrapin)
• Solve the bulk upload problem
• HBase-backed solution:
  - Bulk upload + major compact
  - Major compact to delete old data
• Design a solution from scratch using a mashup of:
  - HFile
  - HBase BlockCache
  - No compactions
  - Low-latency key/value lookups
High Level Architecture I
[Diagram: ETL/batch jobs load/reload HFiles to Amazon S3; HFile servers (multiple HFiles per server) pull them from S3 and serve key/value lookups to the client library/service]
High Level Architecture II
• Each HFile server runs 2 processes:
  - Copier: pulls HFiles from S3 to local disk
  - Supershard: serves multiple HFile shards to clients
• ZooKeeper:
  - Detecting alive servers
  - Coordinating loading/swapping of new data
  - Enabling clients to detect availability of new data
• Loader module (replaces DistCP):
  - Triggers new data copy
  - Triggers swap through ZooKeeper
  - Updates ZooKeeper and notifies clients
• Client library understands sharding
• Old data deleted by a background process
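Client-side shard routing can be sketched as below; the shard count, server names, and the shape of the ZooKeeper-published mapping are hypothetical simplifications of Terrapin:

```python
# Illustrative sketch of client-side shard routing: the client maps a key
# to a shard with the fileset's sharding function, then maps the shard to
# a server using the assignment the loader publishes (via ZooKeeper).
import hashlib

NUM_SHARDS = 8
# Assumed shard -> server assignment, as published after a load/swap.
shard_to_server = {s: "hfile-server-%d" % (s % 4) for s in range(NUM_SHARDS)}

def shard_of(key: str) -> int:
    """Modulus sharding; the talk notes range and other functions are
    pluggable per fileset."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16) % NUM_SHARDS

def server_for(key: str) -> str:
    return shard_to_server[shard_of(key)]
```

Putting the routing in the client keeps lookups to a single hop: no proxy tier sits between the client and the HFile server that owns the key.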
Salient Features
• Multi-tenancy through namespacing
• Pluggable sharding functions: modulus, range & more
• HBase BlockCache
• Multiple clusters for redundancy
• Speculative execution across clusters for low latency
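Speculative execution across the redundant clusters might look like the following hedged-request sketch; the hedge delay and function signatures are assumptions, not Terrapin's actual client:

```python
# Sketch of speculative execution across two redundant clusters: fire the
# lookup at the primary, and if it has not answered within a short hedge
# delay, also fire it at the secondary; return whichever answers first.
# This caps tail latency at roughly (hedge delay + secondary latency).
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait

def speculative_get(key, primary, secondary, hedge_delay=0.01):
    pool = ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(primary, key)
        done, _ = wait([first], timeout=hedge_delay)
        if not done:
            # Primary is slow: hedge with the secondary cluster.
            second = pool.submit(secondary, key)
            done, _ = wait([first, second], return_when=FIRST_COMPLETED)
        return done.pop().result()
    finally:
        pool.shutdown(wait=False)
```

The cost is a small amount of duplicated read work on the slow tail, which is usually a good trade for an online serving path.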
Setting up for Success
• Many online use cases/applications
• Optimize for:
  - Low MTTR (high availability)
  - Low latency (performance)
MTTR - I
DataNode states: LIVE → STALE (after 20 sec) → DEAD (after 9 min 40 sec)
• Stale nodes avoided:
  - As candidates for reads
  - As candidate replicas for writes
  - During lease recovery
• Copying of under-replicated blocks starts only when a node is marked "dead"
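The state timeline above, as a small sketch. The thresholds are the slide's tuned values (stock HDFS defaults are higher, roughly 30 sec to stale and 10.5 min to dead); treating both thresholds as measured from the last heartbeat is an assumption here:

```python
# Sketch of the DataNode state machine described above: state is a pure
# function of how long ago the node last heartbeated the NameNode.
STALE_AFTER_SEC = 20            # slide's tuned stale threshold
DEAD_AFTER_SEC = 9 * 60 + 40    # slide's dead threshold (assumed cumulative)

def datanode_state(seconds_since_heartbeat: float) -> str:
    if seconds_since_heartbeat >= DEAD_AFTER_SEC:
        return "DEAD"    # re-replication of its blocks now begins
    if seconds_since_heartbeat >= STALE_AFTER_SEC:
        return "STALE"   # skipped for reads, write pipelines, lease recovery
    return "LIVE"
```

The key point of the intermediate STALE state: the cluster stops routing work to a suspect node within seconds, without paying the cost of re-replicating its blocks until it is confidently dead.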
MTTR - II
Recovery pipeline, end to end in < 2 min:
  Failure detection (30 sec ZooKeeper session timeout) → Lease recovery (HDFS-4721) → Log split (HDFS-3703 + HDFS-3912) → Recover regions
• Avoid stale nodes at each point of the recovery process
• Multi-minute timeouts ==> multi-second timeouts
Simulate, Simulate, Simulate
Simulate "pull the plug" failures and "tail -f" the logs:
• kill -9 both datanode and region server - causes connection-refused errors
• kill -STOP both datanode and region server - causes socket timeouts
• Blackhole hosts using iptables - connect timeouts + "No route to host"; most representative of AWS failures
Performance
Configuration tweaks:
• Small block size, 4K-16K
• Prefix compression to cache more - when data is in the key, close to 4X reduction for some data sets
• Separation of RPC handler threads for reads vs. writes
• Short-circuit local reads
• HBase-level checksums (HBASE-5074)
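A config fragment mirroring the site-level tweaks above might look like the following; the property names assume the stock HDFS/HBase configuration keys of this era, and the values are the ones named in this talk:

```xml
<!-- hdfs-site.xml / hbase-site.xml fragment (illustrative) -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>       <!-- short-circuit local reads -->
</property>
<property>
  <name>dfs.client.read.shortcircuit.streams.cache.size</name>
  <value>10000</value>      <!-- raised from the default of 100 -->
</property>
<property>
  <name>hbase.regionserver.checksum.verify</name>
  <value>true</value>       <!-- HBase-level checksums, HBASE-5074 -->
</property>
```

The block size and prefix compression are per-column-family settings rather than site config; in the HBase shell they correspond to attributes like BLOCKSIZE => '16384' and DATA_BLOCK_ENCODING => 'PREFIX'.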
Hardware
• SATA (m1.xl/c1.xl) and SSD (hi1.4xl)
• Choose based on the limiting factor:
  - Disk space: pick SATA for max GB/$$
  - IOPS: pick SSD for max IOPS/$$; clusters with heavy reads or heavy compaction activity
Performance (SSDs)
HFile Read Performance
• Turn off block cache for data blocks; reduces GC + heap fragmentation
• Keep block cache on for index blocks
• Increase "dfs.client.read.shortcircuit.streams.cache.size" from 100 to 10,000 (with short-circuit reads)
• Approx. 3X improvement in read throughput
Write Performance
• WAL contention when the client sets AutoFlush=true (HBASE-8755)
In the Pipeline...
• Building a graph database on HBase
• Disaster recovery: snapshot + incremental backup + restore
• Off-heap cache: reduce GC overhead and better use of hardware
• Read path optimizations
And we are hiring!