HBase Backups and Performance on MapR
Post on 24-May-2015
04/12/2023 © MapR Technologies, Inc. 1
HBase on MapR
Lohit VijayaRenu, MapR Technologies, Inc.
HBase contributor day at Yahoo, June 30 2011
• Who am I?
  • Lohit VijayaRenu, Software Engineer at MapR Technologies (lohit@maprtech.com)
• MapR
  • Combines the best of the Hadoop community contributions with significant internally financed infrastructure development to provide a complete distribution for Apache Hadoop (www.mapr.com)
HBase on MapR
• Backups using Snapshots
• Performance on MapR
• Highly available MapR
• MapR Control System
HBase Backups
"We're trying to come up with right strategy for backing up HBase tables ...Currently, we're employing exports (writing onto HDFS of another cluster directly), but is taking too long (~5 hours to export ~5GB of data)..." Manoj Murumkar
"...Recently I encountered a problem about data loss of HBase. So it comes to the question that how to backup HBase data to recover table records...What about copy the directory of HBase to another directory in HDFS?... " Liu Xianglong
Source: hbase-user group
Available options
• Export/Import
• CopyTable
• Distcp
• Backup from Mozilla
• Cluster Replication
• Table Snapshots
Source: http://blog.sematext.com/2011/03/11/hbase-backup-options/
MapR Snapshots
[Diagram: Redirect on write for snapshot. Data blocks A, B, C, C’, D, referenced by Snapshot 20110630, Snapshot 20110629, and Snapshot 3]
• Entire /hbase can be snapshotted while HBase is running
• Snapshots are consistent
• Saves space by sharing blocks
• Lightning fast
• Zero performance loss on writing to original
• Scheduled, or on-demand
• REST API for creation and deletion of snapshots
[Diagram: HBase read / write on MapR, with the live directory and its snapshots]
/hbase
/hbase/.snapshot/Snapshot20110630
/hbase/.snapshot/Snapshot20110629
/hbase/.snapshot/Snapshot3
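The redirect-on-write behaviour in the diagram (block C rewritten as C’ while older snapshots keep C) can be sketched with a toy in-memory model. This is an illustrative assumption about how block sharing works in general, not MapR's implementation; the `Volume` class and its methods are hypothetical names.

```python
class Volume:
    """Toy model of a volume whose snapshots share data blocks."""

    def __init__(self, blocks):
        self.store = {}      # block id -> contents; blocks are never overwritten
        self.next_id = 0
        self.live = [self._alloc(data) for data in blocks]  # position -> block id
        self.snapshots = {}  # snapshot name -> copy of the block map

    def _alloc(self, data):
        block_id = self.next_id
        self.store[block_id] = data
        self.next_id += 1
        return block_id

    def snapshot(self, name):
        # Only the block map is copied; no data block is duplicated.
        self.snapshots[name] = list(self.live)

    def write(self, position, data):
        # The write is redirected to a new block; the old one stays for snapshots.
        self.live[position] = self._alloc(data)

    def read(self, position, snapshot=None):
        block_map = self.live if snapshot is None else self.snapshots[snapshot]
        return self.store[block_map[position]]

    def restore(self, name):
        # Restoring points the live map back at the snapshot's blocks.
        self.live = list(self.snapshots[name])


vol = Volume(["A", "B", "C", "D"])
vol.snapshot("Snapshot20110629")
vol.write(2, "C'")
vol.snapshot("Snapshot20110630")
```

In this model the two snapshots and the live view hold five blocks in total (A, B, C, C', D) rather than twelve, which is the space saving the slide refers to.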
MapR Snapshots
HBase table in DFS
Take snapshot on running HBase
Restore from snapshot
MapR Control System
• Snapshot information
• Snapshot Schedules
• All UI operations have REST APIs
• More info at www.mapr.com
MapR Mirroring
• Mirror is physical copy of data
• Consistent, point-in-time data replication to different cluster
• Differential deltas are updated
• Compressed and check-summed
• Scheduled or on-demand
• REST API for setup, start and stop mirror
[Diagram: Production and Backup clusters in Datacenter 1 and Datacenter 2, connected over WAN]
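The mirroring bullets (differential deltas, compressed and check-summed) can be illustrated with a small sketch. It is a generic model under stated assumptions, not MapR's mirroring protocol: blocks are plain byte strings keyed by id, SHA-256 and zlib stand in for whatever checksum and compression MapR uses, deleted blocks are not handled, and `build_delta` and `apply_delta` are hypothetical names.

```python
import hashlib
import zlib


def checksum(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_delta(source: dict, mirror: dict) -> dict:
    """Return compressed, checksummed payloads for blocks that are new or changed."""
    delta = {}
    for block_id, data in source.items():
        if block_id not in mirror or checksum(mirror[block_id]) != checksum(data):
            delta[block_id] = (zlib.compress(data), checksum(data))
    return delta


def apply_delta(mirror: dict, delta: dict) -> None:
    """Decompress each payload, verify its checksum, then update the mirror."""
    for block_id, (payload, expected) in delta.items():
        data = zlib.decompress(payload)
        if checksum(data) != expected:
            raise ValueError(f"checksum mismatch for block {block_id}")
        mirror[block_id] = data


production = {0: b"A" * 1024, 1: b"B" * 1024, 2: b"C" * 1024}
backup = {0: b"A" * 1024, 1: b"B" * 1024}

production[1] = b"b" * 1024          # block 1 changes after the last mirror run
delta = build_delta(production, backup)
apply_delta(backup, delta)
```

Only blocks 1 and 2 cross the WAN in this run; block 0 is unchanged and is skipped.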
HBase performance
"...Initially, when the table was empty I was getting around 300 inserts per second with 50 writing threads. Then, when the region split and a second server was added the rate suddenly jumped to 3000 inserts/sec per server, so ~6000 for the two servers..." Eran Kutner
"...My scenario is similar, we need under 10k rows, 10-20 columns and which can have thousands of version with value not greater than 300 bytes...Can we get 40-50k records/sec insertion speed in HBase??..." Gaurav Vashishth
Source: hbase-user group
YCSB setup
[Diagram: HBase Master, ZooKeeper, and four RegionServer (RS) nodes, each with a YCSB client, all running on MapR]
• Modified YCSB to use ZooKeeper to have co-ordinated start.
• HMaster and RegionServer running on MapR
• YCSB Client running on RS nodes
https://github.com/lohitvijayarenu/YCSB
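The idea of a co-ordinated start is that no client begins its workload until every client is ready. The sketch below shows that pattern in a single process, using Python's `threading.Barrier` as a stand-in for a ZooKeeper-based barrier. It is an assumption about the general technique, not the code in the modified YCSB repository; the client function and delays are hypothetical.

```python
import threading
import time

NUM_CLIENTS = 4
start_barrier = threading.Barrier(NUM_CLIENTS)
ready_times = [0.0] * NUM_CLIENTS
start_times = [0.0] * NUM_CLIENTS


def client(index: int, setup_delay: float) -> None:
    time.sleep(setup_delay)                  # clients finish setup at different times
    ready_times[index] = time.monotonic()
    start_barrier.wait()                     # block until all clients have arrived
    start_times[index] = time.monotonic()    # the workload would begin here


threads = [
    threading.Thread(target=client, args=(i, 0.05 * i)) for i in range(NUM_CLIENTS)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every client starts only after the slowest client became ready.
print(min(start_times) >= max(ready_times))
```

Across separate machines the same role would be played by a shared coordination service, which is what the slide uses ZooKeeper for.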
YCSB operations from nodes
[Chart: Ops (0 to 35000) versus Seconds, one series each for Node1, Node2, Node3, Node4, Node5]
• YCSB Clients doing inserts from all cluster nodes.
• Throughput rates were similar from all nodes
• All operations in cluster completed around same time.
Insert performance
[Chart: Insert (all nodes), insert operations per sec, listed in the order Stock Hadoop, MapR: 74310, 165730]
[Chart: Insert (one node), Ops versus Seconds]
10 RS, 11 2TB @7200
8 Cores, 24GB RAM, 2Gbps
3 Replication, No compression
Dataset: 1B rows
Row size: 1K
Read performance
[Chart: Read (all nodes), read operations per sec, listed in the order Stock Hadoop, MapR: 3447, 6732]
[Chart: Read (one node), Ops versus Seconds]
9 RS, 5 500G @7200
8 cores, 24GB RAM, 2Gbps
Dataset: 0.9B rows
Row size: 1K
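To put the insert and read figures side by side, the short calculation below derives the ratio between the two systems and the time a 1B-row load would take at each insert rate. It assumes the chart values appear in the same order as the legend (Stock Hadoop first, MapR second) and that the all-nodes rate is sustained for the whole load; neither is stated outright on the slides.

```python
# Values as listed on the slides; the label-to-value pairing is an assumption.
insert_ops = {"Stock Hadoop": 74310, "MapR": 165730}   # insert operations per sec
read_ops = {"Stock Hadoop": 3447, "MapR": 6732}        # read operations per sec
DATASET_ROWS = 1_000_000_000                           # "Dataset: 1B rows"


def speedup(ops: dict) -> float:
    """Ratio of the MapR rate to the stock Hadoop rate."""
    return ops["MapR"] / ops["Stock Hadoop"]


def load_hours(rows: int, ops_per_sec: float) -> float:
    """Hours needed to insert `rows` rows at a constant rate."""
    return rows / ops_per_sec / 3600


insert_speedup = speedup(insert_ops)
read_speedup = speedup(read_ops)
hours = {name: load_hours(DATASET_ROWS, rate) for name, rate in insert_ops.items()}

print(f"insert speedup: {insert_speedup:.2f}x, read speedup: {read_speedup:.2f}x")
for name, value in hours.items():
    print(f"{name}: {value:.1f} hours to insert 1B rows")
```

Under those assumptions the ratio is a little over 2x for inserts and a little under 2x for reads.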
HBase High Availability
"...In HBase 0.90 I have seen that it has a fault tolerant behavior of triggering lease recovery and closing the file when the writer dies in the middle. Yet does hbase have any workaround/recovery when NameNode is restarted in the middle of the file write(possibly the HLog file , after some syncs)???..." Gokulakannan M
source: hbase-user group
MapR High Availability
• No single point of failure
• Distributed NameNode
• Automatic and transparent failover
• Better performance
• Replicated and persisted to disk
• Fully distributed and highly scalable
• Real time HBase on MapR
[Diagram: HBase read / write on MapR (No Single Point of Failure), six nodes each labeled Node NN]
MapR Heatmap™
• Intuitive
• Insightful
• Comprehensive
• One node or thousands
• More at www.mapr.com
Credits
• Michael Stack and Ryan Rawson for their valuable feedback.
• Brian Cooper and Adam Silberstein for their help with YCSB
• Active and helpful HBase community
More Information
• http://www.mapr.com
• http://mapr.com/only-with-mapr.html
• Follow us @mapr
• Download and try from www.mapr.com