strata + hadoop world 2012: apache hbase features for the enterprise

37
Headline Goes Here Speaker Name or Subhead Goes Here DO NOT USE PUBLICLY PRIOR TO 10/23/12 Apache HBase Features for the Enterprise Jonathan Hsieh | @jmhsieh Software Engineer at Cloudera / HBase PMC Member October 2012

Upload: cloudera-inc

Post on 24-May-2015

1.537 views

Category:

Documents


0 download

DESCRIPTION

Apache HBase is a distributed data store that is in production today at many enterprises and sites serving large volumes of near-real-time random-accesses. As Apache HBase matures, the community has augmented the system with new features that many enterprise consider to be hard requirements. We will discuss how the upcoming HBase 0.96 release addresses many of these shortcomings by introducing new features that will help the administrator monitor and control access to the system, and new mechanisms to minimize downtime due to expected and unexpected outages.

TRANSCRIPT

Page 1: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Headline Goes HereSpeaker Name or Subhead Goes Here

DO NOT USE PUBLICLY PRIOR TO 10/23/12

Apache HBase Features for the EnterpriseJonathan Hsieh | @jmhsiehSoftware Engineer at Cloudera / HBase PMC Member October 2012

Page 2: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 20122

Who Am I?

• Cloudera:• Software Engineer• Apache HBase committer / PMC • Apache Flume founder / PMC• Apache Sqoop committer / PMC

• U of Washington:• Research in Distributed Systems

Page 3: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 20123

What is Apache HBase?

Apache HBase is an open source, distributed,

scalable, consistent, low latency, random access non-relational database built on Apache Hadoop

ZK HDFS

App MR

Page 4: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

9/23/12 Strangeloop 20124

HBase provides Low-latency Random Access

• Writes: • 1-3ms, 1k-10k writes/sec per node

• Reads: • 0-3ms cached, 10-30ms disk• 10-40k reads / second / node from

cache• Cell size:

• 0-3MB preferred• Read, write and insert data anywhere in

the table• No sequential write limitations

0000000000

1111111111

2222222222

3333333333

4444444444

5555555555

6666666666

7777777777

1

23

4

5

Page 5: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 20125

HBase On a Cluster

Name node

Name node

Rack 1

Rack 2

HDFS NameNodesHBase Masters

ZooKeeper Quorum

Slave Boxes (DN + RS)

Page 6: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 20126

Production Apache HBase Applications

• Inbox• Storage• Web• Search• Analytics• Monitoring

More Case Studies at http://www.hbasecon.com/agenda/

Page 7: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 20127

Production Systems Need to Avoid Risk

• Unfortunately, all things can fail.

• Enterprises need to minimize risk.• Understand potential data loss scenarios• Understand potential unavailability scenarios• Must have a disaster recovery story

• Downtime, data loss == risk

• Let’s talk about how HBase deals with:• Risks from within the cluster• Risks from outside the cluster• Risks posed by Users

• Goal: Remove or reduce negative impact of potential risks

Page 8: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

“ ”

Expect the best, plan for the worst, and prepare to be surprised.

Page 9: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Risks from within the clusterHosts and Services

Page 10: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201210

Causes of HBase Downtime within the cluster

• Unplanned Maintenance• Hardware failures• Software errors• Human error

• Planned Maintenance• Upgrades• Migrations

Goal: Reduce downtime from hours to minutes to seconds.

Page 11: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201211

Unplanned Downtime

• Two sources of unavailability• Detection time• Recovery time

Detection Time Recovery Timetime

Failure EventService realizes there is a

problem starts fixing.

Service is restored.

Service still thinks we are ok

Page 12: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201212

Reduce downtime by speeding up recovery time

• Distributed log splitting (0.92)• Automated metadata repairs with hbck (0.92)• Enable of writes while recovering from failure (0.96)

Detection Time time

Failure EventService realizes there is a

problem starts fixing.

Service is restored.

Service still thinks we are ok

Page 13: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201213

Reduce downtime by speeding up detection

• Proactively notify to recover from process failures quickly

time

Failure Event

Service still thinks we are ok

Service realizes there is a problem starts fixing.

Service is restored.

0.92/0.94 All Master Failure Detection 180s Some Region Server Failure detection 180s

0.96 Master process failure detection 0-1s

Region Server Failure detection0-1s

Page 14: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201214

Manual Problem detection: Metrics

• Goal: Pinpoint root causes of problems faster

• Take a baseline of your system in steady-state

• Anomalies like spikes or dips from baseline can indicate problems

• Ex: Slow Query Logging• Integrates with existing infrastructure via

JMX or use with Ganglia, Cloudera Manager

Page 15: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201215

Metrics from all levels of the system

• HBase Region Servers• Operations / sec • Get / put latencies (0.92)• Per CF metrics (0.94)• Per Region metrics (0.94)

• HBase Master• RIT metrics• Replication Metrics

• System/JVM• GC, RPC metrics

HDFSDataNode

JVMJVM

Host

DisksMemoryCPUs Network

Region serverRegion

MasterJVMHost

DisksMemoryCPUs Network

CF CF

Page 16: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201216

Page 17: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201217

Eliminate HBase downtime

• Highly Available stack: HDFS (2.0) / ZK / HBASE • Client Cross-version wire compatibility (0.96)• Rolling Restarts• Online Schema Change (experimental)

time

Maintenance Begins

Service remains online, slightly degraded

Full service is restored.

Maintenance Time

Page 18: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201218

High Availability: HBase + HDFS + ZK

Name node

Name node

Rack 1

Rack 2

HDFS NameNodesHBase Masters

ZooKeeper Quorum

Slave Boxes (DN + RS)

Page 19: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201219

Wire Compatibility

• Reduces downtime due to planned maintenance• Wire compatibility + Extensible Data formats

• Allow for forwards and backwards compatibility• Older clients can talk to newer servers and visa versa

• Rolling upgrade• Upgrade a single node at a time while system runs

• Allows API and changes while guaranteeing wire compatibility between different minor versions

• HDFS client-server compatibility between Major Versions

ZK HDFS

App MR

Page 20: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Risks from outside the clusterDatacenters and Disaster Recovery

Page 21: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201221

External Risks

Page 22: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201222

Geographically separated copies of data

Page 23: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201223

Strategy: HBase-Supported Batch Backups

• Export / Dist CP / Import• 3 batch MR jobs• Several extra copies of data• High latency (hours)

• Copy Table• 1 MR Job• Single copy of data• High Latency (hours)• Incremental table copies

ExportMR Job

ImportMR Job

Dist CP MR Job

Copy Table MR Job

Page 24: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201224

Strategy: Custom Application-managed Replication

• Application writes to two instances of HBase• Low Latency• Adds complexity • Inefficient

App App

Page 25: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201225

Strategy: HBase replication (0.92+)

• HBase Asynchronously copy edit logs to other clusters.• Replication lag measured in seconds• Automatically catch up from failures.• Eventually consistent• Efficient batching

• Master-slave† (0.90)• Master-master (0.92)

Replication

logslogs

logs logs

Page 26: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201226

Master-Master Replication

logs logs

logs

Replicating data reduces chances of data loss.

Page 27: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Risks from Users“Problem exists between keyboard and chair.”

Page 28: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201228

Oops… User Error

• How do we prevent user error?• How do we recover from user error?

Recovery Timetime

User Error:drop ‘table’

Service is restored, major data loss

Service is down!

Page 29: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201229

Prevent user mistakes: User-level Security

• Authentication:• Ensure the identity of the services or users that are communicating

• Access Control:• Ensure user has permission to execute table data operations

time

User Error:drop ‘table’

Operation rejected, insufficient permissions.

Page 30: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201230

HBase User-level Security

• Based on Kerberos for HBase, HDFS and Zookeeper• Grant privileges to users• Revoke privileges from users.• Column Family and Table granularity

• Confidentiality:• Ensure information is only seen by intended users.

• Audit Trails:• Track which users performed particular operations

Page 31: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201231

Recovering from User Mistakes: Table Snapshots

• Snapshot the state of a table at a certain moment in time• Restore it or Clone it later, creating a new read write table• Export it to another cluster with minimal impact on HBase

time

User Error:drop ‘table’

Service is restored, minor data loss

Periodic snapshot

2

Service is down!

restore

1Periodic snapshot

Page 32: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201232

Table Snapshots (0.96+)

• Under development, slated for HBase 0.96• Multiple snapshot flavors planned

• Offline snapshots• Online Snapshots

• Snapshot uses• Recover from application or user error.• Application experimentation (no need to

spin up another cluster for replication)• Use MR directly on snapshot files

Page 33: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

ConclusionsHBase for the enterprise.

Page 34: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201234

Feature Summary by Category

Detection Time Recovery Timetime

Avoid Down Time• Rolling Restart • Online backups • Table Replication• Security Access Controls (0.92)• Wire Compatibility (0.96)• Snapshots (0.96)

Reduce Detection Time • Improved metrics• Proactive notification of HMaster failure (0.96)• Proactive notification of RegionServer failure (0.96)

Reduce Recovery Time• Distributed Log Splitting (0.92)• Improvements to HBCK (0.94)• Allow writes during recovery (0.96)•Snapshots (0.96)

Page 35: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

10/25/12 Strata Hadoop World 201235

Feature Summary by Version0.90 0.92 0.94 0.96 (Upcoming Release)

•Metrics •Metrics •CF+Region Granularity Metrics

•CF +Region Granularity Metrics•Improved failure detection time

•Distributed log splitting*•HBCK improvements*

•Distributed log splitting•HBCK improvements

•Distributed log splitting•HBCK improvements

•Distributed log splitting•HBCK improvements

•Copy Table / Import / Export•Master-Slave Replication†

•Copy Table / Import / Export•Master-Master Replication

•Copy Table / Import / Export•Master-Master Replication

•Copy Table / Import / Export•Master-Master Replication•Client Wire compatibility

•Authentication and Authorization

•Authentication and Authorization

•Authentication and Authorization•(Snapshots)

Recovery in Hours Recovery in Minutes Recovery in Minutes (Recovery in Seconds)

† experimental (in progress) *backported in CDH

Page 36: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise

Thank You!Jonathan Hsieh | @jmhsiehSoftware Engineer, ClouderaApache HBase committer / PMC member

Page 37: Strata + Hadoop World 2012: Apache HBase Features for the Enterprise