HBase backups and performance on MapR

06/16/2022 © MapR Technologies, Inc. 1 HBase on MapR Lohit VijayaRenu, MapR Technologies, Inc. HBase contributor day at Yahoo, June 30 2011

Upload: lohitvijayarenu

Post on 24-May-2015

DESCRIPTION

Slides from HBase User Group about HBase backups and performance on MapR distribution for Apache Hadoop.

TRANSCRIPT

Page 1: HBase backups and performance on MapR

HBase on MapR

Lohit VijayaRenu, MapR Technologies, Inc.

HBase contributor day at Yahoo, June 30 2011

Page 2: HBase backups and performance on MapR

• Who am I?

  • Lohit VijayaRenu, Software Engineer at MapR Technologies ([email protected])

• MapR

  • Combines the best of the Hadoop community contributions with significant internally financed infrastructure development to provide a complete distribution for Apache Hadoop (www.mapr.com)

Page 3: HBase backups and performance on MapR

HBase on MapR

• Backups using Snapshots

• Performance on MapR

• Highly available MapR

• MapR Control System

Page 4: HBase backups and performance on MapR

HBase Backups

"We're trying to come up with right strategy for backing up HBase tables ...Currently, we're employing exports (writing onto HDFS of another cluster directly), but is taking too long (~5 hours to export ~5GB of data)..." Manoj Murumkar

"...Recently I encountered a problem about data loss of HBase. So it comes to the question that how to backup HBase data to recover table records...What about copy the directory of HBase to another directory in HDFS?..." Liu Xianglong

Source: hbase-user group

Available options

• Export/Import

• CopyTable

• Distcp

• Backup from Mozilla

• Cluster Replication

• Table Snapshots

Source: http://blog.sematext.com/2011/03/11/hbase-backup-options/

Page 5: HBase backups and performance on MapR

MapR Snapshots

[Figure: redirect on write for snapshots. Data blocks A, B, C, C', D; snapshots labeled Snapshot 20110630, Snapshot 20110629, Snapshot 3.]

• Entire /hbase can be snapshotted while HBase is running

• Snapshots are consistent

• Saves space by sharing blocks

• Lightning fast

• Zero performance loss on writing to original

• Scheduled, or on-demand

• REST API for creation and deletion of snapshots

[Figure: HBase reads from and writes to /hbase on MapR. Snapshot paths shown: /hbase/.snapshot/Snapshot20110630, /hbase/.snapshot/Snapshot20110629, /hbase/.snapshot/Snapshot3.]
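The date-stamped snapshot names and the REST bullet on this slide can be illustrated with a small sketch. It only composes a request URL and does not contact a cluster. The endpoint path `/rest/volume/snapshot/create` and its `volume` and `snapshotname` parameters are assumptions to be checked against the MapR documentation for the installed version; the host `mcs.example.com`, port 8443, and volume name `hbase.volume` are hypothetical.

```python
from datetime import date
from urllib.parse import urlencode


def snapshot_name(day: date, prefix: str = "Snapshot") -> str:
    """Build a date-stamped name in the style used on the slide."""
    return f"{prefix}{day:%Y%m%d}"


def snapshot_create_url(host: str, volume: str, name: str, port: int = 8443) -> str:
    """Compose a snapshot-creation URL.

    The path and parameter names are assumptions, not confirmed by the slides.
    """
    query = urlencode({"volume": volume, "snapshotname": name})
    return f"https://{host}:{port}/rest/volume/snapshot/create?{query}"


name = snapshot_name(date(2011, 6, 30))
url = snapshot_create_url("mcs.example.com", "hbase.volume", name)
print(name)
print(url)
```

A scheduler or cron job could call such a helper once a day and issue the request with any HTTP client; authentication is left out of the sketch.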

Page 6: HBase backups and performance on MapR

MapR Snapshots

HBase table in DFS

Take snapshot on running HBase

Restore from snapshot
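As a rough illustration of the restore step, the sketch below maps a table inside a snapshot to its live location under /hbase, following the `.snapshot` path layout shown on the previous slide. It assumes each table is stored as a directory directly under /hbase; the table name `usertable` and the helper names are hypothetical, and the actual copy (and any need to disable the table first) is not covered.

```python
from pathlib import PurePosixPath


def snapshot_table_path(hbase_root: str, snapshot: str, table: str) -> PurePosixPath:
    """Location of a table as captured in a named snapshot."""
    return PurePosixPath(hbase_root) / ".snapshot" / snapshot / table


def live_table_path(hbase_root: str, table: str) -> PurePosixPath:
    """Location of the live table (assumes one directory per table)."""
    return PurePosixPath(hbase_root) / table


def restore_plan(hbase_root: str, snapshot: str, table: str) -> tuple[str, str]:
    """Return (source, destination) for copying a table out of a snapshot."""
    src = snapshot_table_path(hbase_root, snapshot, table)
    dst = live_table_path(hbase_root, table)
    return str(src), str(dst)


src, dst = restore_plan("/hbase", "Snapshot20110630", "usertable")
print(f"copy {src} -> {dst}")
```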

Page 7: HBase backups and performance on MapR

MapR Control System

• Snapshot information

• Snapshot Schedules

• All UI operations have REST APIs

• More info at www.mapr.com

Page 8: HBase backups and performance on MapR

MapR Mirroring

• Mirror is physical copy of data

• Consistent, point-in-time data replication to different cluster

• Differential deltas are updated

• Compressed and check-summed

• Scheduled or on-demand

• REST API for setup, start and stop mirror

[Figure: mirroring over a WAN between Datacenter 1 and Datacenter 2; clusters labeled Production and Backup.]
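The idea behind "differential deltas", "compressed" and "check-summed" can be sketched with a toy model. This is not MapR's implementation: it uses CRC32 checksums and zlib compression from the Python standard library, and the block names and contents are made up for illustration.

```python
import zlib


def checksums(blocks: dict[str, bytes]) -> dict[str, int]:
    """Checksum every block so two copies can be compared cheaply."""
    return {block_id: zlib.crc32(data) for block_id, data in blocks.items()}


def changed_blocks(source: dict[str, bytes], mirror_sums: dict[str, int]) -> dict[str, bytes]:
    """Keep only blocks whose checksum differs from (or is missing on) the mirror."""
    return {
        block_id: data
        for block_id, data in source.items()
        if mirror_sums.get(block_id) != zlib.crc32(data)
    }


# Hypothetical state: block C was rewritten on the source after the last sync.
source = {"A": b"alpha", "B": b"beta", "C": b"gamma-v2", "D": b"delta"}
mirror = {"A": b"alpha", "B": b"beta", "C": b"gamma", "D": b"delta"}

delta = changed_blocks(source, checksums(mirror))
payload = {block_id: zlib.compress(data) for block_id, data in delta.items()}

# Applying the compressed delta brings the mirror up to date.
mirror.update({block_id: zlib.decompress(data) for block_id, data in payload.items()})
print(sorted(delta))
```

Only the one changed block crosses the simulated WAN; the other three are skipped because their checksums already match.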

Page 9: HBase backups and performance on MapR

HBase performance

"...Initially, when the table was empty I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers..." Eran Kutner

"...My scenario is similar, we need under 10k rows, 10-20 columns and which can have thousands of version with value not greater than 300 bytes...Can we get 40-50k records/sec insertion speed in HBase??..." Gaurav Vashishth

Source: hbase-user group

Page 10: HBase backups and performance on MapR

YCSB setup

• Modified YCSB to use ZooKeeper to have co-ordinated start.

• HMaster and RegionServer running on MapR

• YCSB Client running on RS nodes

https://github.com/lohitvijayarenu/YCSB

[Figure: cluster diagram on MapR showing a Master, ZooKeeper, four RegionServers (RS), and four YCSB clients.]
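A coordinated start means that no client begins issuing operations until every client is ready. The sketch below shows the pattern with `threading.Barrier` as an in-process stand-in for a ZooKeeper-based barrier; it is not the code from the modified YCSB, and the client count and simulated startup delays are made up.

```python
import threading
import time


def run_clients(n_clients: int) -> tuple[list[float], list[float]]:
    """Start n clients that wait at a barrier before beginning their workload."""
    barrier = threading.Barrier(n_clients)
    arrivals = [0.0] * n_clients
    starts = [0.0] * n_clients

    def client(i: int) -> None:
        time.sleep(0.01 * i)             # simulate uneven startup times
        arrivals[i] = time.monotonic()   # client is ready
        barrier.wait()                   # block until all clients are ready
        starts[i] = time.monotonic()     # workload would begin here

    threads = [threading.Thread(target=client, args=(i,)) for i in range(n_clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return arrivals, starts


arrivals, starts = run_clients(4)
# No client starts before the slowest client has arrived.
print(min(starts) >= max(arrivals))
```

Across machines the same role would be played by a shared coordination service, which is where ZooKeeper comes in on this slide.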

Page 11: HBase backups and performance on MapR

YCSB operations from nodes

[Figure: chart of YCSB operations for Node1 through Node5; axes: Ops, Seconds.]

• YCSB Clients doing inserts from all cluster nodes.

• Throughput rates were similar from all nodes

• All operations in cluster completed around same time.

Page 12: HBase backups and performance on MapR

Insert performance

[Figure: insert throughput charts, "Insert (all nodes)" and "Insert (one node)"; axes: Ops, Seconds.]

Insert operations per sec: 74310, 165730 (legend order: Stock Hadoop, MapR)

10 RS, 11 2TB @7200

8 Cores, 24GB RAM, 2Gbps

3 Replication, No compression

Dataset: 1B rows

Row size: 1K
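The insert figures on this slide can be put in perspective with a little arithmetic. The sketch assumes that 74310 ops/sec belongs to stock Hadoop and 165730 ops/sec to MapR (inferred from the order of the legend), that "1K" means 1000 bytes per row, and that the stated rates hold for the whole 1B-row load.

```python
ROWS = 1_000_000_000   # dataset: 1B rows
ROW_BYTES = 1_000      # "1K" row size, taken as 1000 bytes (assumption)
REPLICATION = 3        # replication factor from the slide

stock_ops = 74_310     # assumed: stock Hadoop, inserts per second
mapr_ops = 165_730     # assumed: MapR, inserts per second

speedup = mapr_ops / stock_ops
stock_seconds = ROWS / stock_ops
mapr_seconds = ROWS / mapr_ops
raw_tb = ROWS * ROW_BYTES / 1e12
replicated_tb = raw_tb * REPLICATION

print(f"speedup: {speedup:.2f}x")
print(f"load time: stock {stock_seconds:.0f} s, MapR {mapr_seconds:.0f} s")
print(f"data: {raw_tb:.1f} TB raw, {replicated_tb:.1f} TB with replication")
```

Under these assumptions the difference is a little over 2x, and the full load amounts to about 1 TB of row data before replication.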

Page 13: HBase backups and performance on MapR

Read performance

[Figure: read throughput charts, "Read (all nodes)" and "Read (one node)"; axes: Ops, Seconds.]

Read operations per sec: 3447, 6732 (legend order: Stock Hadoop, MapR)

9 RS, 5 500G @7200

8 cores, 24GB RAM, 2Gbps

Dataset: 0.9B rows

Row size: 1K
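To relate the cluster-wide read rate to a single RegionServer, the totals can be divided by the 9 RS in this setup. The sketch assumes 3447 ops/sec is the stock Hadoop figure and 6732 ops/sec the MapR figure (inferred from the order of the legend), and that load is spread evenly across servers.

```python
def per_server(total_ops: float, servers: int) -> float:
    """Average operations per second handled by one server."""
    return total_ops / servers


REGION_SERVERS = 9
stock_reads = 3_447   # assumed: stock Hadoop, reads per second (all nodes)
mapr_reads = 6_732    # assumed: MapR, reads per second (all nodes)

stock_avg = per_server(stock_reads, REGION_SERVERS)
mapr_avg = per_server(mapr_reads, REGION_SERVERS)
ratio = mapr_reads / stock_reads

print(f"per RS: stock {stock_avg:.0f}, MapR {mapr_avg:.0f} reads/sec")
print(f"ratio: {ratio:.2f}x")
```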

Page 14: HBase backups and performance on MapR

HBase High Availability

"...In HBase 0.90 I have seen that it has a fault tolerant behavior of triggering lease recovery and closing the file when the writer dies in the middle. Yet does hbase have any workaround/recovery when NameNode is restarted in the middle of the file write(possibly the HLog file , after some syncs)???..." Gokulakannan M

Source: hbase-user group

Page 15: HBase backups and performance on MapR

MapR High Availability

• No single point of failure

• Distributed NameNode

• Automatic and transparent failover

• Better performance

• Replicated and persisted to disk

• Fully distributed and highly scalable

• Real time HBase on MapR

[Figure: MapR (No Single Point of Failure). HBase READ / WRITE against six nodes, each labeled Node and NN.]

Page 16: HBase backups and performance on MapR

MapR Heatmap™

• Intuitive

• Insightful

• Comprehensive

• One node or thousands

• More at www.mapr.com

Page 17: HBase backups and performance on MapR

Credits

• Michael Stack and Ryan Rawson for their valuable feedback.

• Brian Cooper and Adam Silberstein for their help with YCSB

• Active and helpful HBase community

More Information

• http://www.mapr.com

• http://mapr.com/only-with-mapr.html

• Follow us @mapr

• Download and try from www.mapr.com