TriHUG January 2012 Talk by Chris Shain
HBase: An Introduction
Christopher Shain, Software Development Lead, Tresata
About Tresata
Hadoop for Financial Services
First completely Hadoop-powered analytics application
Widely recognized as a “Big Data Startup to Watch”
Winner of the 2011 NCTA Award for Emerging Tech Company of the Year
Based in Charlotte, NC
We are hiring! [email protected]
About Me
Software Development Lead at Tresata
Background in Financial Services IT: End-User Applications, Data Warehousing, ETL
Email: [email protected] Twitter: @chrisshain
OK, enough about me.
What is HBase?
From http://hbase.apache.org: “HBase is the Hadoop database.”
I think this is a confusing statement
What isn’t HBase?
‘Database’, to many, means: Transactions, Joins, Indexes
HBase has none of these (more on this later)
What is HBase, really?
HBase is a data storage platform designed to work hand-in-hand with Hadoop: Distributed, Failure-tolerant, Semi-structured, Low-latency, Strictly consistent, HDFS-aware, “NoSQL”
Where did HBase come from?
Need for a low-latency, distributed datastore with unlimited horizontal scale
Hadoop (MapReduce) doesn’t provide low-latency access
Traditional RDBMSs don’t scale out horizontally
History of HBase
November 2006: Google BigTable whitepaper published: http://research.google.com/archive/bigtable.html
February 2007: Initial HBase prototype
October 2007: First ‘usable’ HBase
January 2008: HBase becomes an Apache subproject of Hadoop
March 2009: HBase 0.20.0
May 10th, 2010: HBase becomes an Apache Top Level Project
Common Use Cases
Web Indexing
Social Graph
Messaging (Email etc.)
HBase Architecture
HBase is written almost entirely in Java; JVM clients are first-class citizens
[Diagram: JVM clients connect directly to the HBase Master and RegionServers; non-JVM clients go through a proxy (Thrift or REST)]
Data Organization in HBase
All data is stored in Tables
Table rows have exactly one Key, and all rows in a table are physically ordered by key
Tables have a fixed number of Column Families (more on this later!)
Each row can have many Columns in each column family
Each column has a set of values, each with a timestamp
Each row:family:column:timestamp combination represents coordinates for a Cell
What is a Column Family?
Defined by the Table
A Column Family is a group of related columns with its own name
All columns must be in a column family
Each row can have a completely different set of columns for a column family
Row:   Column Family:  Columns:
Chris  Friends         Friends:Bob
Bob    Friends         Friends:Chris, Friends:James
James  Friends         Friends:Bob
Rows and Cells
Not exactly the same as rows in a traditional RDBMS
Key: a byte array (usually a UTF-8 String)
Data: Cells, qualified by column family, column, and timestamp (not shown here)

Row Key: Chris
Column Families (defined by the Table), Columns (defined by the Row; may vary between rows), and Cells (created with Columns):
Attributes:Age = 30
Attributes:Height = 68
Friends:Bob = 1 (Bob’s a cool guy)
Friends:Jane = 0 (Jane and I don’t get along)
Cell Timestamps
All cells are created with a timestamp
The column family defines how many versions of a cell to keep
Updates always create a new cell
Deletes create a tombstone (more on that later)
Queries can include an “as-of” timestamp to return point-in-time values
Deletes
HBase deletes are a form of write called a “tombstone”
Indicates that “beyond this point any previously written value is dead”
Old values can still be read using point-in-time queries
Timestamp  Write Type       Resulting Value  Point-In-Time Value “as of” T+1
T + 0      PUT (“Foo”)      “Foo”            “Foo”
T + 1      PUT (“Bar”)      “Bar”            “Bar”
T + 2      DELETE           <none>           “Bar”
T + 3      PUT (“Foo Too”)  “Foo Too”        “Bar”
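The tombstone behavior above can be sketched in plain Java. This is a toy model of one cell's version history, not the HBase client API: a delete is just another timestamped write, and an "as-of" read returns the newest entry at or before the requested timestamp.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one cell's version history: timestamp -> value,
// with null standing in for a tombstone. Not the HBase API.
public class CellVersions {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    public void put(long ts, String value) {
        versions.put(ts, value);
    }

    public void delete(long ts) {
        versions.put(ts, null); // a tombstone is just another write
    }

    // "As-of" read: the newest write at or before ts wins;
    // a tombstone hides everything written before it.
    public String asOf(long ts) {
        Map.Entry<Long, String> e = versions.floorEntry(ts);
        return e == null ? null : e.getValue();
    }
}
```

With the writes from the table (PUT “Foo” at T+0, PUT “Bar” at T+1, DELETE at T+2, PUT “Foo Too” at T+3), asOf(1) returns “Bar” while asOf(2) returns null, matching the point-in-time column.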
Use Case – Time Series
Requirement: Store real-time stock tick data
Requirement: Accommodate many simultaneous readers & writers
Requirement: Allow for reading of current price for any ticker at any point in time
Ticker  Timestamp     Sequence  Bid     Ask
IBM     09:15:03:001  1         179.16  179.18
MSFT    09:15:04:112  2         28.25   28.27
GOOG    09:15:04:114  3         624.94  624.99
IBM     09:15:04:155  4         179.18  179.19
Time Series Use Case – RDBMS Solution:

Historical Prices:
Keys  Column           DataType
PK    Ticker           Varchar
PK    Timestamp        DateTime
PK    Sequence_Number  Integer
      Bid_Price        Decimal
      Ask_Price        Decimal

Latest Prices:
Keys  Column     DataType
PK    Ticker     Varchar
      Bid_Price  Decimal
      Ask_Price  Decimal
Time Series Use Case – HBase Solution:

Row Key: [Ticker].[Rev_Timestamp].[Rev_Sequence_Number]
Family:Column: Prices:Bid, Prices:Ask

HBase throughput will scale linearly with the # of nodes
No need to keep a separate “latest price” table: a scan starting at “ticker” will always return the latest price row
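The reversed-timestamp trick can be sketched as follows. This is illustrative plain Java, not Tresata's actual code; the zero-padding widths are assumptions chosen so that lexicographic order matches numeric order.

```java
// Illustrative sketch of the [Ticker].[Rev_Timestamp].[Rev_Sequence_Number]
// row key from the slide. Subtracting from Long.MAX_VALUE reverses the sort
// order, so the newest tick for a ticker gets the lowest (first) row key.
public class TickRowKey {
    public static String rowKey(String ticker, long timestamp, long sequence) {
        // Fixed-width zero-padding (19 digits fits Long.MAX_VALUE) keeps
        // lexicographic ordering consistent with numeric ordering.
        return String.format("%s.%019d.%019d",
                ticker,
                Long.MAX_VALUE - timestamp,
                Long.MAX_VALUE - sequence);
    }
}
```

A scan starting at “IBM.” therefore returns the most recent IBM tick first.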
HBase Data Partitioning
HBase scales horizontally
Needs to split data over many RegionServers
Regions are the unit of scale
Regions & RegionServers
All HBase tables are broken into 1 or more regions
Regions have a start row key and an end row key
Each Region lives on exactly one RegionServer
RegionServers may host many Regions
When RegionServers die, Master detects this and assigns Regions to other RegionServers
Region Distribution

-META- Table:
Table  Region                 RegionServer
Users  “Aaron” – “George”     Node01
Users  “George” – “Matthew”   Node02
Users  “Matthew” – “Zachary”  Node01

“Users” Table:
Row keys in region “Aaron” – “George”: “Aaron”, “Bob”, “Chris”
Row keys in region “George” – “Matthew”: “George”
Row keys in region “Matthew” – “Zachary”: “Matthew”, “Nancy”, “Zachary”
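The lookup implied by the -META- table above can be sketched with a sorted map: since each region owns the key range from its start key up to the next region's start key, the region holding a row is the one with the greatest start key at or below the row key. This is a toy model, not the HBase client code; the node names mirror the table above.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of region lookup: region start key -> hosting server.
// The containing region is found with TreeMap.floorEntry.
public class RegionMap {
    private final TreeMap<String, String> regionsByStartKey = new TreeMap<>();

    public void addRegion(String startKey, String server) {
        regionsByStartKey.put(startKey, server);
    }

    // Greatest region start key <= rowKey identifies the hosting server.
    public String serverFor(String rowKey) {
        Map.Entry<String, String> e = regionsByStartKey.floorEntry(rowKey);
        return e == null ? null : e.getValue();
    }
}
```
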
HBase Architecture
Deceptively simple
HBase Architecture
[Diagram: JVM clients connect via the ZooKeeper Cluster to the HBase Master (with a Backup HBase Master) and the RegionServers; non-JVM clients go through a proxy (Thrift or REST)]
Component Roles & Responsibilities
ZooKeeper: Keeps track of which server is the current HBase Master
HBase Master: Keeps track of the Region/RegionServer mapping; manages the -ROOT- and .META. tables; responsible for updating ZooKeeper when these change
More Component Roles & Responsibilities
RegionServer: Stores table regions
Clients: Need to be smarter than RDBMS clients
First connect to ZooKeeper to get the RegionServer for a given Table/Region
Then connect directly to that RegionServer to interact with the data
All connections are over Hadoop RPC; non-JVM clients use a proxy (Thrift or REST (Stargate))
HBase Catalog Tables

-ROOT- Table
Row key: .META.,[region]
Columns: info:regioninfo, info:server, info:serverstartcode
Points to the RegionServer hosting each .META. region.

.META. Table
Row key: [table],[region start key],[region id]
Columns: info:regioninfo, info:server, info:serverstartcode
Points to the RegionServer hosting each user table region.
HBase Master Reliability
The HBase Master is not necessarily a single point of failure (SPOF)
Multiple masters can be running
The current ‘active’ Master is controlled via ZooKeeper
Make sure you have enough ZooKeeper nodes!
The Master is not needed for client connectivity
Clients connect directly to ZooKeeper to find Regions
Everything the Master does can be put off until one is elected
HBase Master & ZooKeeper
[Diagram: a three-node ZooKeeper Quorum coordinating one current HBase Master and two standby HBase Masters]
HBase RegionServer Reliability
HBase tolerates RegionServer failure when running on HDFS
Data is replicated by HDFS (dfs.replication setting)
Lots of issues around fsync and failure before data is flushed - some probably still not fixed
Thus, data can still be lost if a node fails after a write
The HDFS NameNode is still a SPOF, even for HBase
RegionServer Write Ahead Log
Similar to the log in many RDBMSs
All operations are by default written to the log before being considered ‘committed’ (can be overridden for ‘disposable fast writes’)
The log can be replayed when a region is moved to another RegionServer
One WAL per RegionServer
Writes
[Diagram: each write goes first to the WAL (flushed periodically, 10s by default), then to the MemStore (flushed to a new HFile when it gets too big)]
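The write path above can be sketched as a toy model: log first, apply to the MemStore second, and roll the MemStore into a new immutable "HFile" when it grows too big. This is illustrative only; real HBase flushes on accumulated bytes (hbase.hregion.memstore.flush.size), not on an edit count.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy model of the RegionServer write path: WAL -> MemStore -> HFile.
public class WritePath {
    static final int FLUSH_THRESHOLD = 3; // edits, not bytes; illustrative

    final List<String> wal = new ArrayList<>();          // append-only log
    TreeMap<String, String> memStore = new TreeMap<>();  // sorted, in memory
    final List<TreeMap<String, String>> hFiles = new ArrayList<>();

    public void put(String row, String value) {
        wal.add(row + "=" + value);   // 1. log the edit before committing
        memStore.put(row, value);     // 2. apply to the MemStore
        if (memStore.size() >= FLUSH_THRESHOLD) {
            flush();
        }
    }

    void flush() {                    // 3. write a new immutable "HFile"
        hFiles.add(memStore);
        memStore = new TreeMap<>();
    }
}
```

Because the WAL is written first, a crashed RegionServer's edits can be replayed from the log even though its MemStore was lost.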
HBase RegionServer & HDFS
[Diagram: a RegionServer hosts a Region containing several Stores; each Store has a MemStore and a StoreFile (HFile); the Region’s Log and HFiles are written through an HDFS Client to Blocks replicated across HDFS DataNodes]
HBase data locality on HDFS
A RegionServer is not guaranteed to be on the same physical node as its data
Compaction causes the RegionServer to write preferentially to the local node
But this is a function of the HDFS Client, not HBase
Flushes and Compactions
All data is in memory initially (memstore)
HBase is a write-only system: modifications and deletes are just writes with later timestamps
This is a function of HDFS being append-only
Eventually old writes need to be discarded
2 types of Compactions: Minor and Major
Flushes
All HBase edits are initially stored in memory (memstore)
Flushes occur when the memstore reaches a certain size
By default 67,108,864 bytes (64 MB), controlled by the hbase.hregion.memstore.flush.size configuration property
Each flush creates a new HFile
Minor Compaction
Triggered when a certain number of HFiles are created for a given Region Store (+ some other conditions)
By default 3 HFiles, controlled by the hbase.hstore.compactionThreshold configuration property
Compacts the most recent HFiles into one
By default, uses the RegionServer-local HDFS node
Does not eliminate deletes; only touches the most recent HFiles
NOTE: All column families are compacted at once (this might change in the future)
Major Compaction
Triggered every 24 hours (with a random offset) or manually
Large HBase installations usually leave this for manual operators
Re-writes all HFiles into one
Processes deletes: eliminates tombstones and erases earlier entries
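Both compaction types can be sketched as merging sorted files with newest-wins semantics. In this toy model (not HBase's implementation), the only difference is that a major compaction also drops tombstones, represented here as null values.

```java
import java.util.List;
import java.util.TreeMap;

// Toy model of compaction: merge several "HFiles" (oldest first) into one,
// keeping the newest cell per row. Minor compactions keep tombstones
// (nulls); major compactions drop them.
public class Compactor {
    public static TreeMap<String, String> compact(
            List<TreeMap<String, String>> hFilesOldestFirst, boolean major) {
        TreeMap<String, String> merged = new TreeMap<>();
        for (TreeMap<String, String> file : hFilesOldestFirst) {
            merged.putAll(file); // newer files overwrite older cells
        }
        if (major) {
            merged.values().removeIf(v -> v == null); // eliminate tombstones
        }
        return merged;
    }
}
```

This is why deleted values remain readable via point-in-time queries until a major compaction finally erases them.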
Guarantees
HBase does not have transactions. However:
Row-level modifications are atomic: all modifications to a row will succeed or fail as a unit
Gets are consistent for a given point in time, but Scans may return 2 rows from different points in time
All data read has been ‘durably stored’ - this does NOT mean flushed to disk; it can still be lost!
Appendix
Do's and Don'ts: Schema Design
DO: Design your schema for linear range scans on your most common queries. Scans are the most efficient way to query a lot of rows quickly.
DON’T: Use more than 2 or 3 column families. Some operations (flushing and compacting) operate on the whole region.
DO: Be aware of the relative cardinality of column families. Wildly differing cardinality leads to sparsity and bad scanning results.
Do's and Don'ts: Schema Design
DO: Be mindful of the size of your row and column keys. They are used in indexes and queries, and can be quite large!
DON’T: Use monotonically increasing row keys. They can lead to hotspots on writes.
DO: Store timestamp keys in reverse. Rows in a table need to be read in order, and usually you want the most recent.
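One common remedy for the monotonic-key hotspot, sketched below, is to "salt" the key with a small hash-derived bucket prefix so that sequential keys spread across regions. The bucket count and key format here are illustrative, not from the talk. The trade-off: a range scan must now fan out over every bucket.

```java
// Illustrative key salting: a hash-derived bucket prefix spreads
// monotonically increasing keys across regions.
public class SaltedKey {
    static final int BUCKETS = 16; // illustrative bucket count

    public static String salt(String rowKey) {
        // |x % 16| <= 15, so Math.abs here is safe from overflow
        int bucket = Math.abs(rowKey.hashCode() % BUCKETS);
        return String.format("%02d-%s", bucket, rowKey);
    }
}
```
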
Do's and Don'ts: Queries
DO: Query single rows using exact-match on key (Gets), or Scans for multiple rows. Scans allow efficient I/O vs. multiple Gets.
DON’T: Use regex-based or non-prefix column filters. Very inefficient.
DO: Tune the scan cache and batch size parameters. Drastically improves performance when returning lots of rows.
Use Case – User Preferences
Requirement: Store an arbitrary set of preferences for all users
Requirement: Each user may choose to store a different set of preferences
Requirement: Preferences may be of different data types (Strings, Integers, etc.)
Requirement: Developers will add new preference options all the time, so we shouldn’t need to modify the database structure when adding them
User Preferences Use Case – Relational Solution:
One possible RDBMS solution: a Key/Value table with all values as strings. Flexible, but wastes space.

Keys  Column           Data Type
PK    UserID           Int
PK    PreferenceName   Varchar
      PreferenceValue  Varchar
User Preferences Use Case – HBase Solution:
Store all preferences in the Preferences column family
Preference name as column name, preference value as a (serialized) byte array
The HBase client library provides methods for serializing many common data types

Row Key  Family       Column     Value
Chris    Preferences  Age        30
Chris    Preferences  Hometown   “Mineola, NY”
Joe      Preferences  Birthdate  11/13/1987
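The serialization step can be sketched with plain JDK code in the spirit of HBase's Bytes.toBytes helpers; this sketch does not use the HBase client library itself.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Plain-JDK sketch of serializing typed preference values to byte arrays,
// as the HBase client's Bytes utility does. Not the HBase library.
public class PrefBytes {
    public static byte[] fromInt(int v) {
        return ByteBuffer.allocate(4).putInt(v).array();
    }

    public static int toInt(byte[] b) {
        return ByteBuffer.wrap(b).getInt();
    }

    public static byte[] fromString(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    public static String toStringValue(byte[] b) {
        return new String(b, StandardCharsets.UTF_8);
    }
}
```

So “Age” could be stored as a 4-byte integer while “Hometown” is a UTF-8 string, each under its own column in the Preferences family.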