hbase: extreme makeover

HBase: Extreme makeover Vladimir Rodionov Hadoop/HBase architect Founder of BigBase.org

HBaseCon 2014 Features & Internal Track

Agenda

About myself •  Principal PlaKorm Engineer @Carrier IQ, Sunnyvale, CA •  Prior to Carrier IQ, I worked @ GE, EBay, Plumtree/BEA. •  HBase user since 2009. •  HBase hacker since 2013. •  Areas of experTse include (but not limited to) Java,

HBase, Hadoop, Hive, large-‐scale OLAP/AnalyTcs, and in-‐memory data processing.

•  Founder of BigBase.org

BigBase = EM(HBase)

EM(*) = ?

BigBase = EM(HBase)

EM(*) =

BigBase = EM(HBase)

EM(*) =

Seriously?

BigBase = EM(HBase)

EM(*) =

Seriously? for HBase It’s a MulT-‐Level Caching soluTon

Real Agenda •  Why BigBase? •  Brief history of BigBase.org project •  BigBase MLC high level architecture (L1/L2/L3) •  Level 1 -‐ Row Cache. •  Level 2/3 -‐ Block Cache RAM/SSD. •  YCSB benchmark results •  Upcoming features in R1.5, 2.0, 3.0. •  Q&A

HBase •  STll lacks some original BigTable’s features. •  STll not able to uTlize efficiently all RAM. •  No good mixed storage (SSD/HDD) support. •  Single Level Caching only. Simple. •  HBase + Large JVM Heap (MemStore) = ?

BigBase •  Adds Row Cache and block cache compression. •  UTlizes efficiently all RAM (TBs). •  Supports mixed storage (SSD/HDD). •  Has MulT Level Caching. Not that simple. •  Will move MemStore off heap in R2.

BigBase History

Koda (2010) •  Koda -‐ Java off heap object cache, similar to Terracola’s BigMemory.

•  Delivers 4x Tmes more transacTons … •  10x Tmes beler latencies than BigMemory 4. •  Compression (Snappy, LZ4, LZ4HC, Deflate). •  Disk persistence and periodic cache snapshots. •  Tested up to 240GB.

Karma (2011-‐12) •  Karma -‐ Java off heap BTree implementaTon to support fast in memory queries.

•  Supports extra large heaps, 100s millions – billions objects.

•  Stores 300M objects in less than 10G of RAM. •  Block Compression. •  Tested up to 240GB. •  Off Heap MemStore in R2.

Yamm (2013) •  Yet Another Memory Manager. – Pure 100% Java memory allocator. – Replaced jemalloc in Koda. – Now Koda is 100% Java. – Karma is the next (sTll on jemalloc). – Similar to memcached slab allocator.

•  BigBase project started (Summer 2013).

BigBase Architecture

MLC – MulT-‐Level Caching HBase 0.94

LRUBlockCache

HBase 0.96

RAM Bucket cache

One level of caching : •  RAM (L2)

LRUBlockCache

HBase 0.96

Bucket cache

One level of caching : •  RAM (L2) •  Or DISK (L3)

LRUBlockCache

HBase 0.96

RAM Bucket cache

BigBase 1.0

Block Cache L3 SSD

Row Cache L1

Block Cache L2

LRUBlockCache

HBase 0.96

RAM Bucket cache

BigBase 1.0

Row Cache L1

Block Cache L2

BlockCache L3 Network

LRUBlockCache

HBase 0.96

RAM Bucket cache

BigBase 1.0

Row Cache L1

Block Cache L2

BlockCache L3 memcached

LRUBlockCache

HBase 0.96

RAM Bucket cache

BigBase 1.0

Row Cache L1

Block Cache L2

BlockCache L3 DynamoDB

BigBase Row Cache (L1)

Where is BigTable’s Scan Cache? •  Scan Cache caches hot rows data. •  Complimentary to Block Cache. •  STll missing in HBase (as of 0.98). •  It’s very hard to implement in Java (off heap). •  Max GC pause is ~ 0.5-‐2 sec per 1GB of heap •  G1 GC in Java 7 does not resolve the problem. •  We call it Row Cache in BigBase.

Row Cache vs. Block Cache

HFile Block HFile Block HFile Block HFile Block HFile Block

BLOCK CACHE

ROW CACHE

BLOCK CACHE

ROW CACHE

BLOCK CACHE

BigBase Row Cache •  Off Heap Scan Cache for HBase. •  Cache size: 100’s of GBs to TBs. •  EvicTon policies: LRU, LFU, FIFO, Random.

•  Pure 100% -‐ compaTble Java. •  Sub-‐millisecond latencies, zero GC. •  Implemented as RegionObserver coprocessor.

Row Cache

YAMM Codecs Kryo SerDe

BigBase Row Cache •  Read through cache. •  It caches rowkey:CF. •  Invalidates key on every mutaTon. •  Can be enabled/disabled per table and per table:CF.

•  New ROWCACHE alribute. •  Best for small rows (< block size)

Row Cache

YAMM Codecs Kryo SerDe

Performance-‐Scalability •  GET (small rows < 100 bytes): 175K operaTons per sec per one Region Server (from cache).

•  MULTI-‐GET (small rows < 100 bytes): > 1M records per second (network limited) per one Region Server.

•  LATENCY : 99% < 1ms (for GETs) with 100K ops. •  VerTcal scalability: tested up to 240GB (the maximum available in Amazon EC2).

•  Horizontal scalability: limited by HBase scalability. •  No more memcached farms in front of HBase clusters.

BigBase Block Cache (L2, L3)

What is wrong with Bucket Cache? Scalability LIMITED

MulT-‐Level Caching (MLC) NOT SUPPORTED

Persistence (‘ozeap’ mode) NOT SUPPORTED

Low latency apps NOT SUPPORTED

SSD friendliness (‘file’ mode) NOT FRIENDLY

Compression NOT SUPPORTED

Low latency apps ?

Here comes BigBase Scalability HIGH

MulT-‐Level Caching (MLC) SUPPORTED

Persistence (‘ozeap’ mode) SUPPORTED

Low latency apps SUPPORTED

SSD friendliness (‘file’ mode) SSD-‐FRIENDLY

Compression SNAPPY, LZ4, LZHC, DEFLATE

Wait, there are more … Scalability HIGH MulT-‐Level Caching (MLC) SUPPORTED Persistence (‘ozeap’ mode) SUPPORTED Low latency apps SUPPORTED SSD friendliness (‘file’ mode) SSD-‐FRIENDLY Compression SNAPPY, LZ4, LZHC, DEFLATE Non disk–based L3 cache SUPPORTED RAM Cache opTmizaTon IBCO

BigBase 1.0 vs. HBase 0.98 BigBase HBase 0.98

Row Cache (L1) YES NO

Block Cache RAM (L2) YES (fully off heap) YES (parTally off heap)

Block Cache (L3) DISK YES (SSD-‐ friendly) YES (not SSD – friendly)

Block Cache (L3) NON DISK YES NO

Compression YES NO

RAM Cache persistence YES (both L1 and L2) NO

Low Latency opTmized YES NO

MLC support YES (L1, L2, L3) NO (either L2 or L3)

Scalability HIGH MEDIUM (limited by JVM heap)

YCSB Benchmark

Test setup (AWS)

•  HBase 0.94.15 – RS: 11.5GB heap (6GB LruBlockCache on heap); Master: 4GB heap.

•  Clients: 5 (30 threads each), collocated with Region Servers.

•  Data sets: 100M and 200M. 120GB / 240GB approximately. Only 25% fits in a cache.

•  Workloads: 100% read (read100, read200, hotspot100), 100% scan (scan100, scan200) –zipfian.

•  YCSB 0.1.4 (modified to generate compressible data). We generated compressible data (with factor of 2.5x) only for scan workloads to evaluate effect of compression in BigBase block cache implementaTon.

•  Common – Whirr 0.8.2; 1 (Master + Zk) + 5 RS; m1.xlarge: 15GB RAM, 4 vCPU, 4x420 HDD

•  BigBase 1.0 (0.94.15) – RS: 4GB heap (6GB off heap cache); Master: 4GB heap.

•  HBase 0.96.2 – RS: 4GB heap (6GB Bucket Cache off heap); Master: 4GB heap.