HBase: Extreme Makeover
Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org
HBaseCon 2014, Features & Internals Track
Agenda
About myself
• Principal Platform Engineer @ Carrier IQ, Sunnyvale, CA.
• Prior to Carrier IQ, I worked at GE, eBay, Plumtree/BEA.
• HBase user since 2009.
• HBase hacker since 2013.
• Areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/Analytics, and in-memory data processing.
• Founder of BigBase.org
What?
BigBase = EM(HBase)
EM(*) = ? (extreme makeover)
Seriously? For HBase, it is a Multi-Level Caching solution.
Real Agenda
• Why BigBase?
• Brief history of the BigBase.org project
• BigBase MLC high-level architecture (L1/L2/L3)
• Level 1: Row Cache
• Level 2/3: Block Cache, RAM/SSD
• YCSB benchmark results
• Upcoming features in R1.5, 2.0, 3.0
• Q&A
HBase
• Still lacks some of the original BigTable features.
• Still not able to utilize all available RAM efficiently.
• No good mixed-storage (SSD/HDD) support.
• Single-level caching only. Simple.
• HBase + large JVM heap (MemStore) = ?
BigBase
• Adds a Row Cache and block cache compression.
• Utilizes all available RAM (TBs) efficiently.
• Supports mixed storage (SSD/HDD).
• Has Multi-Level Caching. Not that simple.
• Will move the MemStore off heap in R2.
BigBase History
Koda (2010)
• Koda: a Java off-heap object cache, similar to Terracotta's BigMemory.
• Delivers 4x more transactions and 10x better latencies than BigMemory 4.
• Compression (Snappy, LZ4, LZ4HC, Deflate).
• Disk persistence and periodic cache snapshots.
• Tested up to 240GB.
Karma (2011-12)
• Karma: a Java off-heap B-Tree implementation supporting fast in-memory queries.
• Supports extra-large heaps: hundreds of millions to billions of objects.
• Stores 300M objects in less than 10GB of RAM.
• Block compression.
• Tested up to 240GB.
• Will back the off-heap MemStore in R2.
Yamm (2013)
• Yet Another Memory Manager:
– Pure, 100% Java memory allocator.
– Replaced jemalloc in Koda, so Koda is now 100% Java.
– Karma is next (still on jemalloc).
– Similar to the memcached slab allocator.
• BigBase project started (Summer 2013).
BigBase Architecture
MLC – Multi-Level Caching
(Architecture diagrams, summarized:)
• HBase 0.94: a single on-heap LRUBlockCache inside the JVM; everything else comes off disk.
• HBase 0.96: Bucket Cache adds one level of caching, either off-heap RAM (L2) or disk (L3), but not both.
• BigBase 1.0: Row Cache (L1, off-heap RAM) + Block Cache (L2, off-heap RAM) + Block Cache (L3) backed by SSD, the network, memcached, or DynamoDB.
A conceptual sketch of the L2/L3 block-read path follows below.
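To make the lookup order above concrete, here is a rough sketch of how a block read could walk the cache levels. It is illustrative only: all class, field, and method names are invented, plain maps stand in for the off-heap RAM cache and the SSD/network-backed cache, and the HDFS read is a stub; this is not BigBase's actual API.

```java
import java.io.IOException;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Conceptual L2/L3 lookup order: RAM block cache first, then the SSD (or
// memcached/DynamoDB) cache, and only then the underlying HDFS file.
public class MultiLevelBlockCache {

  private final Map<String, byte[]> l2RamCache = new ConcurrentHashMap<String, byte[]>();
  private final Map<String, byte[]> l3SsdCache = new ConcurrentHashMap<String, byte[]>();

  public byte[] readBlock(String hfileName, long offset) throws IOException {
    String key = hfileName + ":" + offset;

    byte[] block = l2RamCache.get(key);              // L2: off-heap RAM block cache
    if (block == null) {
      block = l3SsdCache.get(key);                   // L3: SSD / network-backed cache
      if (block != null) {
        l2RamCache.put(key, block);                  // promote a hot block back to RAM
      }
    }
    if (block == null) {
      block = readBlockFromHdfs(hfileName, offset);  // miss on both levels: go to HDFS
      l2RamCache.put(key, block);
      l3SsdCache.put(key, block);
    }
    return block;
  }

  // Placeholder for the real HFile/HDFS block read.
  private byte[] readBlockFromHdfs(String hfileName, long offset) throws IOException {
    return new byte[0];
  }
}
```

The same order applies whichever backend serves L3: an SSD file, memcached, or DynamoDB.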
BigBase Row Cache (L1)
Where is BigTable's Scan Cache?
• The Scan Cache caches hot row data.
• Complementary to the Block Cache.
• Still missing in HBase (as of 0.98).
• It is very hard to implement in Java off heap: max GC pause is ~0.5-2 sec per 1GB of heap.
• G1 GC in Java 7 does not solve the problem.
• We call it the Row Cache in BigBase.
Row Cache vs. Block Cache
(Diagram: a store file is a sequence of HFile blocks. The Block Cache caches whole HFile blocks; the Row Cache caches individual rows, which are typically much smaller than a block.)
BigBase Row Cache
• Off-heap Scan Cache for HBase.
• Cache size: hundreds of GBs to TBs.
• Eviction policies: LRU, LFU, FIFO, Random.
• Pure, 100%-compatible Java.
• Sub-millisecond latencies, zero GC.
• Implemented as a RegionObserver coprocessor (a minimal sketch follows this slide).
• Component stack: Row Cache on top of KODA, with the YAMM memory allocator, compression codecs, and Kryo SerDe.
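Since the Row Cache is wired in as a RegionObserver coprocessor, a read-through implementation can hook the Get path directly. The following is a minimal, hypothetical sketch assuming the HBase 0.94-era coprocessor API (List<KeyValue> results); a plain on-heap map stands in for the real off-heap Koda/YAMM store, and invalidation on mutations and eviction are omitted.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Get;
import org.apache.hadoop.hbase.coprocessor.BaseRegionObserver;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;
import org.apache.hadoop.hbase.coprocessor.RegionCoprocessorEnvironment;
import org.apache.hadoop.hbase.util.Bytes;

// Illustrative read-through row cache as a RegionObserver (class name invented).
public class RowCacheObserver extends BaseRegionObserver {

  // Stand-in for the off-heap row cache, keyed by row key.
  private final Map<String, List<KeyValue>> rowCache =
      new ConcurrentHashMap<String, List<KeyValue>>();

  @Override
  public void preGet(ObserverContext<RegionCoprocessorEnvironment> ctx,
                     Get get, List<KeyValue> results) throws IOException {
    List<KeyValue> cached = rowCache.get(Bytes.toString(get.getRow()));
    if (cached != null) {
      // Cache hit: serve the row from the cache and skip the region read.
      results.addAll(cached);
      ctx.bypass();
    }
  }

  @Override
  public void postGet(ObserverContext<RegionCoprocessorEnvironment> ctx,
                      Get get, List<KeyValue> results) throws IOException {
    // Cache miss path: remember the row that was just read from the region.
    rowCache.put(Bytes.toString(get.getRow()), new ArrayList<KeyValue>(results));
  }
}
```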
BigBase Row Cache
• Read-through cache.
• It caches rowkey:CF.
• Invalidates the key on every mutation.
• Can be enabled/disabled per table and per table:CF via the new ROWCACHE attribute (see the sketch after this slide).
• Best for small rows (< block size).
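As a rough illustration of the per-table and per-table:CF switch, the sketch below sets the ROWCACHE attribute through the 0.94-era admin API. The attribute name comes from the slide; the table and column family names are made up, and exactly where BigBase expects the flag (table descriptor, column descriptor, or both) is an assumption here.

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.client.HBaseAdmin;

// Illustrative only: create a table with the ROWCACHE attribute enabled.
public class EnableRowCache {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());

    HTableDescriptor table = new HTableDescriptor("usertable"); // hypothetical table name
    table.setValue("ROWCACHE", "true");                         // enable for the whole table

    HColumnDescriptor family = new HColumnDescriptor("cf");     // hypothetical column family
    family.setValue("ROWCACHE", "true");                        // or enable per table:CF
    table.addFamily(family);

    admin.createTable(table);
    admin.close();
  }
}
```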
Performance and Scalability
• GET (small rows, < 100 bytes): 175K operations per second per Region Server (from cache).
• MULTI-GET (small rows, < 100 bytes): > 1M records per second (network limited) per Region Server.
• LATENCY: 99% < 1ms for GETs at 100K ops/sec.
• Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2).
• Horizontal scalability: limited only by HBase scalability.
• No more memcached farms in front of HBase clusters.
BigBase Block Cache (L2, L3)
What is wrong with Bucket Cache?
• Scalability: LIMITED
• Multi-Level Caching (MLC): NOT SUPPORTED
• Persistence ('offheap' mode): NOT SUPPORTED
• Low-latency apps: NOT SUPPORTED
• SSD friendliness ('file' mode): NOT FRIENDLY
• Compression: NOT SUPPORTED
Here comes BigBase
• Scalability: HIGH
• Multi-Level Caching (MLC): SUPPORTED
• Persistence ('offheap' mode): SUPPORTED
• Low-latency apps: SUPPORTED
• SSD friendliness ('file' mode): SSD-FRIENDLY
• Compression: SNAPPY, LZ4, LZ4HC, DEFLATE
Wait, there is more …
• Scalability: HIGH
• Multi-Level Caching (MLC): SUPPORTED
• Persistence ('offheap' mode): SUPPORTED
• Low-latency apps: SUPPORTED
• SSD friendliness ('file' mode): SSD-FRIENDLY
• Compression: SNAPPY, LZ4, LZ4HC, DEFLATE
• Non-disk-based L3 cache: SUPPORTED
• RAM cache optimization: IBCO
BigBase 1.0 vs. HBase 0.98
• Row Cache (L1): BigBase YES; HBase 0.98 NO
• Block Cache RAM (L2): BigBase YES (fully off heap); HBase 0.98 YES (partially off heap)
• Block Cache (L3), disk: BigBase YES (SSD-friendly); HBase 0.98 YES (not SSD-friendly)
• Block Cache (L3), non-disk: BigBase YES; HBase 0.98 NO
• Compression: BigBase YES; HBase 0.98 NO
• RAM cache persistence: BigBase YES (both L1 and L2); HBase 0.98 NO
• Low-latency optimized: BigBase YES; HBase 0.98 NO
• MLC support: BigBase YES (L1, L2, L3); HBase 0.98 NO (either L2 or L3)
• Scalability: BigBase HIGH; HBase 0.98 MEDIUM (limited by the JVM heap)
YCSB Benchmark
Test setup (AWS)
• Common: Whirr 0.8.2; 1 node (Master + ZooKeeper) + 5 Region Servers; m1.xlarge: 15GB RAM, 4 vCPU, 4x420GB HDD.
• HBase 0.94.15: RS with 11.5GB heap (6GB on-heap LruBlockCache); Master with 4GB heap.
• HBase 0.96.2: RS with 4GB heap (6GB off-heap Bucket Cache); Master with 4GB heap.
• BigBase 1.0 (on 0.94.15): RS with 4GB heap (6GB off-heap cache); Master with 4GB heap.
• Clients: 5 (30 threads each), collocated with the Region Servers.
• Data sets: 100M and 200M rows, approximately 120GB / 240GB. Only ~25% fits in the cache.
• Workloads: 100% read (read100, read200, hotspot100) and 100% scan (scan100, scan200), zipfian.
• YCSB 0.1.4, modified to generate compressible data. Compressible data (factor ~2.5x) was generated only for the scan workloads, to evaluate the effect of compression in the BigBase block cache implementation.
Benchmark results (RPS)
(Bar chart: operations per second for BigBase R1.0, HBase 0.96.2, and HBase 0.94.15 on the read100, read200, hotspot100, scan100, and scan200 workloads; y-axis 0-16,000.)
Average latency (ms)
(Bar chart: average latency for BigBase R1.0, HBase 0.96.2, and HBase 0.94.15 on the same five workloads; y-axis 0-800 ms.)
95% latency (ms)
(Bar chart: 95th-percentile latency for the three systems on the same workloads; y-axis 0-1,000 ms.)
99% latency (ms)
(Bar chart: 99th-percentile latency for the three systems on the same workloads; y-axis 0-900 ms.)
YCSB 100% Read (per server)
• 50M rows: BigBase R1.0 3,621 ops/sec vs. HBase 0.94.15 1,308 ops/sec (2.77x); ~40% fits in cache.
• 100M rows: 2,281 vs. 1,111 ops/sec (2.05x); ~20% fits in cache.
• 200M rows: 1,253 vs. 770 ops/sec (1.63x); ~10% fits in cache.
• What is the maximum? ~75x on hotspot (2.5/100): 56K ops/sec (BigBase) vs. 750 (HBase), with 100% of the data in cache.
All data in cache
• Setup: BigBase 1.0; 48GB RAM, 8/16 CPU cores; 5 nodes (1 + 4).
• Data set: 200M rows (300GB).
• Test: 100% read, hotspot (2.5/100).
• YCSB 0.1.4, 4 clients.
• Throughput: 40 threads 100K ops/sec; 100 threads 168K; 200 threads 224K; 400 threads 262K.
• Latency (ms) at 100K / 168K / 224K / 262K ops/sec: avg 0.4 / 0.6 / 0.9 / 1.5; 95% 1 / 1 / 2 / 3; 99% 1 / 2 / 3 / 7.
• At 100K ops/sec: 99% < 1ms.
What is next?
• Release 1.1 (2014 Q2)
– Support for HBase 0.96, 0.98, trunk.
– Fully tested L3 cache (SSD).
• Release 1.5 (2014 Q3)
– YAMM: memory allocator compacting mode.
– Integration with Hadoop metrics.
– Row Cache: merge rows on update (good for counters).
– Block Cache: new eviction policy (LRU-2Q).
– File reads with posix_fadvise (bypass the OS page cache).
– Row Cache: make it available to server-side apps.
What is next?
• Release 2.0 (2014 Q3)
– HBASE-5263: preserving cache data on compaction.
– Cache data blocks on MemStore flush (configurable).
– HBASE-10648: pluggable MemStore. Off-heap implementation based on Karma (off-heap B-Tree library).
• Release 3.0 (2014 Q4)
– Real Scan Cache: caches the results of Scan operations on immutable store files.
– Scan Cache integration with Phoenix and other third-party libraries that provide a rich query API for HBase.
Download/Install/Uninstall
• Download BigBase 1.0 from www.bigbase.org
• Installation/upgrade takes 10-20 minutes.
• The beatification operator EM(*) is invertible: HBase = EM^-1(BigBase), in the same 10-20 minutes.
Q & A
Vladimir Rodionov, Hadoop/HBase architect, founder of BigBase.org
HBase: Extreme Makeover (HBaseCon 2014, Features & Internals Track)