hbase incremental backup
2012/07/23
HBase Incremental Backup / Restore
How to perform Incremental Backup/Restore?
• HBase ships with a handful of useful tools
  – CopyTable
  – Export / Import
CopyTable
• Purpose:
  – Copy part or all of a table, either to the same cluster or to another cluster
• Usage:
  – bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable [--starttime=X] [--endtime=Y] [--new.name=NEW] [--peer.adr=ADR] tablename
• Options:
  – starttime: Beginning of the time range.
  – endtime: End of the time range. If endtime is omitted, the range runs from starttime to forever.
  – new.name: New table's name.
  – peer.adr: Address of the peer cluster, given in the format hbase.zookeeper.quorum:hbase.zookeeper.client.port:zookeeper.znode.parent
  – families: Comma-separated list of ColumnFamilies to copy.
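As a hedged illustration of the time-range options, the following shell sketch prints (rather than runs) a CopyTable command that copies one day of edits into a second table. The table names `mytable` and `mytable_backup` and both helper functions are hypothetical. The sketch assumes GNU `date` and that starttime and endtime are given as epoch milliseconds.

```shell
#!/usr/bin/env bash
# Sketch: build a CopyTable command line for a given time window.
# Assumptions: GNU date is available; timestamps are epoch milliseconds.

to_millis() {
  # Convert a UTC date string such as "2012-07-22 00:00:00" to epoch millis.
  echo "$(( $(date -u -d "$1" +%s) * 1000 ))"
}

copytable_cmd() {
  # $1 = source table, $2 = destination table, $3 = window start, $4 = window end
  local start end
  start=$(to_millis "$3")
  end=$(to_millis "$4")
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.CopyTable" \
       "--starttime=${start} --endtime=${end} --new.name=$2 $1"
}

# Print the command for review; pipe it to a shell to actually run it.
copytable_cmd mytable mytable_backup "2012-07-22 00:00:00" "2012-07-23 00:00:00"
```

Printing the command first makes it easy to check the computed timestamps before starting a MapReduce job.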
CopyTable (cont.)
• Limitation
  – Can only back up to another table (Scan + Put)
  – While a CopyTable is running, rows may be inserted or updated concurrently, and these concurrent edits may leave the copy inconsistent.
Export
• Purpose:
  – Dump the contents of a table to HDFS in a sequence file
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Export <tablename> <outputdir> [<versions> [<starttime> [<endtime>]]]
• Options (* = required):
  – *tablename: The name of the table to export
  – *outputdir: The location in HDFS to store the exported data
  – versions: Number of cell versions to export; must be given before starttime
  – starttime: Beginning of the time range
  – endtime: End of the time range used for the scan
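The following shell sketch shows one way an incremental Export for a single calendar day might be assembled. It only prints the command. The table name, output path, and helper functions are hypothetical. It assumes GNU `date`, epoch-millisecond timestamps, and that a versions count precedes the two timestamps on the command line (check the usage message of your HBase version).

```shell
#!/usr/bin/env bash
# Sketch: print an Export command covering one UTC calendar day.
# Assumptions: GNU date; timestamps are epoch milliseconds;
# a versions argument comes before starttime.

day_start_millis() {
  # $1 = day as YYYY-MM-DD; returns midnight UTC of that day in epoch millis.
  echo "$(( $(date -u -d "$1 00:00:00" +%s) * 1000 ))"
}

export_cmd() {
  # $1 = table, $2 = HDFS output dir, $3 = day (YYYY-MM-DD)
  local start end versions=1
  start=$(day_start_millis "$3")
  end=$(( start + 86400000 ))   # midnight of the following day
  echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Export" \
       "$1 $2 ${versions} ${start} ${end}"
}

export_cmd mytable /backup/mytable/2012/07/22 2012-07-22
```

Using the start of the next day as the end time keeps consecutive daily exports contiguous, provided the end of the range is treated as exclusive.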
Export (cont.)
• Limitation
  – Can only back up to HDFS in a sequence file (Scan + Write to HDFS)
  – While an Export is running, rows may be inserted or updated concurrently, and these concurrent edits may leave the dump inconsistent.
Import
• Purpose:
  – Load data that has been exported back into HBase
• Usage:
  – $ bin/hbase org.apache.hadoop.hbase.mapreduce.Import <tablename> <inputdir>
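A restore from several incremental dumps can be sketched as one Import per exported directory. The helper below is hypothetical and only prints the commands. The directory names are placeholders, and the sort step assumes the paths sort chronologically (for example a year/month/day layout).

```shell
#!/usr/bin/env bash
# Sketch: print one Import command per backup directory, oldest first.

import_cmds() {
  # $1 = target table; remaining arguments = HDFS directories written by Export
  local table=$1
  shift
  # Sort so the increments are listed in chronological order
  # (valid only when directory names sort by date).
  printf '%s\n' "$@" | LC_ALL=C sort | while read -r dir; do
    echo "bin/hbase org.apache.hadoop.hbase.mapreduce.Import ${table} ${dir}"
  done
}

import_cmds mytable /backup/mytable/2012/07/02 /backup/mytable/2012/07/01
```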
Conclusion
• Regular (e.g. daily) incremental backup
  – Use Export and organize the output directory as a meaningful hierarchy
    • /table_name
        /2012 (year)
          /07 (month)
            /01 (date), /02, …, /31
              /01 (hour), …, /24
  – Perform Import to restore data on demand
• To reduce the overhead, don't run backups during peak time
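The directory hierarchy above can be derived from a timestamp, as in this minimal shell sketch. The function name `backup_dir` is hypothetical and GNU `date` is assumed. Note that `%H` yields hours 00 to 23, which differs slightly from the /01 … /24 numbering on the slide.

```shell
#!/usr/bin/env bash
# Sketch: map a table name and a point in time to a backup directory.
# Assumption: GNU date; hours are numbered 00-23.

backup_dir() {
  # $1 = table name, $2 = date/time string understood by GNU date
  # Result has the form /table_name/YYYY/MM/DD/HH
  echo "/$1/$(date -u -d "$2" +%Y/%m/%d/%H)"
}

backup_dir table_name "2012-07-01 13:00"
```

The resulting path could be passed as the output directory of an Export run, or as the input directory of an Import when restoring that hour.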
Questions?