HBase from the Trenches - Phoenix Data Conference 2015

Avinash Ramineni
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/avinashramineni

Agenda

• Intro to HBase
  – Overview
  – Data Model
  – Architecture
• Common Problems
• Best Practices
• Tools and Utilities

Intro to HBase

• Non-relational, distributed, column-oriented database
  – Modeled after Google's Bigtable
• Billions of rows and millions of columns
• Sparse, consistent, distributed, sorted map
• Built on top of HDFS
• Tight integration with MapReduce
• Supports random CRUD operations

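Intro to HBase

• Fault tolerant
• Horizontally scalable
• Real-time random read/write access to data stored in HDFS
• Millions of queries per second (cluster-wide)
• Atomic operations at the single-row level (no multi-row transactions)
• Bloom filters (skip HFiles that cannot contain the requested row)
• Automatic sharding
• Implemented in Java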

Data Model

• Data is stored in tables
• Tables contain rows
  – Rows are referenced by a unique key, the rowkey
• Rows are made of columns, which are grouped into column families
• Rows are sorted lexicographically (byte by byte) by rowkey
• Everything is stored as a sequence of bytes
• All entries are versioned and timestamped
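
One way to picture this data model is as a nested sorted map: rowkey → column family → qualifier → timestamp → value. The sketch below is a toy in-memory model written for illustration, not HBase client code; the class `ToyTable` and its methods are hypothetical names.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of HBase's logical layout (not real HBase code):
    rowkey -> family -> qualifier -> {timestamp: value}, all keys as bytes."""

    def __init__(self, families):
        # Column families are declared up front, as in a table schema.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(lambda: defaultdict(dict)))

    def put(self, row: bytes, family: bytes, qualifier: bytes, ts: int, value: bytes):
        if family not in self.families:
            raise KeyError(f"unknown column family {family!r}")
        # Each write adds a new timestamped version of the cell.
        self.rows[row][family][qualifier][ts] = value

    def get(self, row: bytes, family: bytes, qualifier: bytes):
        """Newest version of a cell, or None if the cell was never written (sparse)."""
        versions = self.rows.get(row, {}).get(family, {}).get(qualifier, {})
        return versions[max(versions)] if versions else None

    def scan(self, start: bytes = b"", stop: bytes = None):
        """Yield rowkeys in lexicographic byte order within [start, stop)."""
        for row in sorted(self.rows):
            if row >= start and (stop is None or row < stop):
                yield row
```

Because ordering is byte by byte, `b"row10"` sorts before `b"row2"`, a common surprise when numeric IDs are used in rowkeys without zero padding.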

Data Representation

HBase Cluster

• HBase Master
• ZooKeeper
• RegionServers
• HDFS DataNodes

Component View

Logical View

HBase API

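• The API is simple
• Operations
  – Get, Put, Delete, Scan, MapReduce
• Connection
  – Create this instance only once per application and share it for the application's lifetime (it is heavyweight and thread-safe)
• HTable
  – The client looks up the location of the hbase:meta table in ZooKeeper
  – hbase:meta maps each region to the RegionServer hosting it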
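
Conceptually, hbase:meta lists regions by start key, so finding the region for a rowkey means finding the last region whose start key is less than or equal to that rowkey. Clients cache these locations, which is one reason to share a single Connection. The sketch below is a toy model under that assumption; the function name, meta contents, and server names are made up.

```python
import bisect


def locate_region(regions, rowkey: bytes):
    """Return (start_key, server) of the region containing rowkey.

    regions: (start_key, server) pairs, a toy stand-in for hbase:meta.
    Each region covers [its start key, the next region's start key);
    the first region must start at the empty key b"".
    """
    ordered = sorted(regions)
    start_keys = [start for start, _ in ordered]
    # Index of the last region whose start key is <= rowkey.
    idx = bisect.bisect_right(start_keys, rowkey) - 1
    return ordered[idx]
```

For example, with regions starting at `b""`, `b"g"`, and `b"p"`, the rowkey `b"zebra"` falls in the region starting at `b"p"`.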

Column Families

• Columns that are accessed together should be grouped into the same column family
• No need to access or load data that is not used
• Settings are defined per column family, such as
  – Compression, version retention policy, cache priority
  – Understand the data and access patterns, then group columns into families accordingly
• Column family names and column qualifiers are stored as bytes, with every cell
  – Avoid being verbose; keep names short
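
Why verbosity matters: every cell repeats its row, family, and qualifier. The function below approximates the size of one cell in the classic KeyValue layout (length prefixes, row, family, qualifier, timestamp, key type, value). It is an illustration only: it ignores tags, data block encoding, and compression, all of which change the real on-disk size.

```python
def keyvalue_size(row: bytes, family: bytes, qualifier: bytes, value: bytes) -> int:
    """Approximate serialized size of one KeyValue (no tags, encoding, or compression)."""
    key_len = (
        2 + len(row)        # row length (short) + row bytes
        + 1 + len(family)   # family length (byte) + family bytes
        + len(qualifier)    # qualifier bytes (its length is implied)
        + 8                 # timestamp (long)
        + 1                 # key type (Put, Delete, ...)
    )
    # Key-length and value-length prefixes (int each), then key and value.
    return 4 + 4 + key_len + len(value)
```

For a 5-byte value, `keyvalue_size(b"user0001", b"d", b"n", b"alice")` is 35 bytes, while the same cell under `b"customer_details"` / `b"first_name"` is 59 bytes; the names cost more than the data.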

HBase Write Path

HBase Compactions

• HDFS does not support in-place updates
  – HFiles are immutable
  – New HFiles are created as MemStores are flushed
• Minor compactions
  – Small HFiles are merged into larger HFiles
  – Deletes are not applied
• Major compactions
  – All HFiles of a column family (within a region) are merged into a single HFile
  – Deletes are applied
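
The difference between minor and major compactions can be sketched as a k-way merge of sorted files. This is a simplified toy under stated assumptions: a single row-level delete marker type, no version limits, and no TTL handling (real HBase has several marker types and more rules). All names are hypothetical.

```python
import heapq

PUT, DELETE = "put", "delete"


def sort_key(cell):
    row, ts, kind, _value = cell
    # Row ascending, newest timestamp first; a delete marker sorts ahead
    # of a put with the same timestamp so it masks that put too.
    return (row, -ts, 0 if kind == DELETE else 1)


def minor_compact(hfiles):
    """Merge several sorted HFiles into one; delete markers are kept as-is."""
    return list(heapq.merge(*hfiles, key=sort_key))


def major_compact(hfiles):
    """Merge all HFiles of a store and apply deletes: puts masked by a row
    delete marker are dropped, and so are the markers themselves."""
    result = []
    newest_delete = {}  # row -> timestamp of the newest delete marker seen
    for row, ts, kind, value in heapq.merge(*hfiles, key=sort_key):
        if kind == DELETE:
            newest_delete.setdefault(row, ts)  # first marker seen is the newest
        elif row not in newest_delete or ts > newest_delete[row]:
            result.append((row, ts, kind, value))
    return result
```

A put written after the delete (higher timestamp) survives the major compaction; older puts for that row disappear along with the marker.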

Rowkey

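• Immutable: a row's key cannot be changed once written
• Get it right the first time, before a lot of data is loaded
• What if we got it wrong?
  – Create a new table and load the data into it
  – If a TTL is set, let the old data expire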

Secondary Indexes

• Querying/accessing records by something other than the rowkey
• MapReduce jobs to populate an index table
  – Periodic update
• Build a secondary index with dual writes
• Coprocessors
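
A dual-write index stores, for each indexed value, an index rowkey of the form value + separator + data rowkey, so a prefix scan on the index yields matching data rowkeys. The toy below illustrates the idea with in-memory dicts; the class name and the `\x00` separator are assumptions. In HBase the two writes go to different tables and are not atomic together, so a failure between them can leave the index stale, which is one reason for periodic rebuilds or coprocessor-based maintenance.

```python
class DualWriteIndex:
    """Toy secondary index kept in sync by writing to two 'tables'.
    data:  rowkey -> {column: value}
    index: value + SEP + rowkey -> b"" (the index key embeds the data rowkey)"""

    SEP = b"\x00"  # assumption: indexed values never contain this byte

    def __init__(self, indexed_column: bytes):
        self.indexed_column = indexed_column
        self.data = {}
        self.index = {}

    def _index_key(self, value: bytes, rowkey: bytes) -> bytes:
        return value + self.SEP + rowkey

    def put(self, rowkey: bytes, columns: dict):
        """Replace a whole row (for simplicity) and keep the index in step."""
        old = self.data.get(rowkey, {}).get(self.indexed_column)
        if old is not None:  # drop the stale index entry first
            self.index.pop(self._index_key(old, rowkey), None)
        self.data[rowkey] = columns              # write 1: data table
        new = columns.get(self.indexed_column)
        if new is not None:                      # write 2: index table
            self.index[self._index_key(new, rowkey)] = b""

    def find(self, value: bytes):
        """Rowkeys whose indexed column equals value. A real index table would
        use a Scan starting at the prefix instead of checking every key."""
        prefix = value + self.SEP
        hits = sorted(k for k in self.index if k.startswith(prefix))
        return [k[len(prefix):] for k in hits]
```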

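Region Hotspotting

• Client traffic not equally distributed across the RegionServers
  – Performance degradation
  – Region unavailability
• Poor rowkey design
  – Monotonically increasing rowkey, e.g. time series or sequence IDs
• Salting
  – Reads vs. writes: a random prefix spreads writes, but how does a GET find the row?
• Hashing
  – Salt with a one-way hash of the rowkey, so a GET can recompute the prefix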
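
A hash-based salt can be sketched as follows: the bucket is derived from the rowkey itself, so writes of sequential keys spread across buckets while a GET can still rebuild the full key. The MD5 choice, the one-byte prefix, and `NUM_BUCKETS = 8` are assumptions for illustration.

```python
import hashlib

NUM_BUCKETS = 8  # assumption: roughly the number of pre-split regions


def salted_key(rowkey: bytes, buckets: int = NUM_BUCKETS) -> bytes:
    """Prefix the rowkey with a bucket byte derived from a one-way hash of it.
    Deterministic, so a GET can recompute the stored key from the rowkey."""
    bucket = hashlib.md5(rowkey).digest()[0] % buckets
    return bytes([bucket]) + rowkey


def original_key(salted: bytes) -> bytes:
    """Strip the one-byte salt to recover the application's rowkey."""
    return salted[1:]
```

The trade-off: a range scan in original key order now needs one scan per bucket, with the results merged on the client.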

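Short-Circuit Reads

• RegionServers are co-located with DataNodes
• HMaster assigns regions taking data locality into consideration (mostly)
• dfs.client.read.shortcircuit
  – RegionServers read HDFS blocks directly from local disk rather than going through the DataNode
• Locality loss
  – After regions move (balancing, RegionServer failure), their HFiles may be remote until a major compaction rewrites them locally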

Pre-Splitting

• Region splitting
  – A region grows until it needs to be split
  – A region is served by only one RegionServer at a time
• Pre-split a table into regions at table creation time
  – Uniformly distribute the write load across RegionServers
  – Understand the key space
• Risk of uneven load distribution if split points don't match the key distribution
• Auto splitting
  – ConstantSizeRegionSplitPolicy
  – IncreasingToUpperBoundRegionSplitPolicy
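
Two small calculations make these points concrete. The first computes evenly spaced split keys for rowkeys that begin with hex characters (for example a hash prefix), similar in spirit to RegionSplitter's HexStringSplit. The second follows the split-size rule described for IncreasingToUpperBoundRegionSplitPolicy in HBase 1.x: min(max file size, 2 × flush size × R³), where R is the number of the table's regions on that RegionServer. The 128 MB flush size and 10 GB max file size are assumed defaults; check your version's configuration.

```python
def hex_split_keys(num_regions: int, width: int = 8):
    """Evenly spaced split keys for rowkeys starting with `width` hex chars.
    n regions need n - 1 split keys."""
    space = 16 ** width
    return [format(i * space // num_regions, f"0{width}x").encode()
            for i in range(1, num_regions)]


def increasing_to_upper_bound_split_size(regions_on_server: int,
                                         flush_size_mb: int = 128,
                                         max_file_size_mb: int = 10 * 1024) -> int:
    """Split threshold in MB: min(max file size, 2 * flush size * R^3)."""
    initial = 2 * flush_size_mb
    return min(max_file_size_mb, initial * regions_on_server ** 3)
```

With the assumed defaults, the threshold grows 256 MB, 2048 MB, 6912 MB, then caps at 10240 MB from the fourth region on, so a new table splits early and often before settling at the maximum size.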

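Bulk Loading

• Native API
  – Disable the WAL (faster, but unflushed edits are lost if a RegionServer fails)
• MapReduce job to generate HFiles
  – Load them using the completebulkload / ImportTsv tools
• HFiles are loaded directly into the relevant regions
  – Faster than going through the normal write path
  – No writes to the WAL and MemStore
  – No flushing and compacting during the load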
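
The core requirement for bulk-loaded HFiles is that each file holds sorted cells that fall inside a single region. A MapReduce job set up for incremental loads roughly arranges this by partitioning on region boundaries and sorting within each partition. The toy function below shows only that partition-and-sort step; its name and the split keys are hypothetical.

```python
import bisect
from collections import defaultdict


def partition_for_bulk_load(cells, split_keys):
    """Group (rowkey, value) cells into one sorted 'HFile' per region.

    split_keys: sorted region boundaries; region i covers
    [split_keys[i-1], split_keys[i]), open-ended at both extremes.
    """
    files = defaultdict(list)
    for rowkey, value in cells:
        region = bisect.bisect_right(split_keys, rowkey)
        files[region].append((rowkey, value))
    # Each HFile must be sorted by key before it can be loaded.
    return {region: sorted(kvs) for region, kvs in sorted(files.items())}
```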

Troubleshooting

• ulimit -n / ulimit -u
  – Limits on the number of open files and processes
• HBase is a database and needs to keep a large number of files open
• dfs.datanode.max.transfer.threads (formerly dfs.datanode.max.xcievers)
• Network
• OS parameters
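
A quick way to check the open-file limit (`ulimit -n`) that a process actually runs with is Python's Unix-only `resource` module. The 32768 threshold below is an illustrative value, not an official HBase requirement, and the function name is hypothetical.

```python
import resource


def check_open_file_limit(minimum: int, limits=None):
    """Return (ok, soft, hard) for the per-process open-file limit.
    `limits` lets callers pass (soft, hard) explicitly, e.g. for testing."""
    soft, hard = limits if limits is not None else resource.getrlimit(resource.RLIMIT_NOFILE)
    # RLIM_INFINITY means the soft limit is unlimited.
    ok = soft == resource.RLIM_INFINITY or soft >= minimum
    return ok, soft, hard
```

Run it as the same user that starts the RegionServer, since limits are per user and per process.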

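YouAreDeadException

• RegionServers going down
  – ZooKeeper: distributed coordination service that tracks live RegionServers
  – Garbage collection: a long pause stops the RegionServer from heartbeating
  – ZooKeeper session timeout: the session expires and the Master reassigns the server's regions
  – When the RegionServer resumes, the HBase Master asks it to shut down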

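Performance Tuning

• Compression
  – Reduces data stored on disk and transferred over the network
  – Favor compression speed over ratio
• Load balancing: the balancer
• Merging regions
• Batch writes
  – Client write buffer
  – Auto flush (turn it off so writes are batched)
• MemStore-local allocation buffers (MSLAB)
  – Reduce garbage collection issues caused by heap fragmentation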
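
The client write buffer idea can be sketched as a small class: puts accumulate until their total size reaches a threshold, then go out as one batch instead of one RPC each. This is a toy, not the HBase client; the class name is hypothetical, and the 2 MB default is assumed to match the usual client write buffer size.

```python
class ToyWriteBuffer:
    """Accumulate puts and send them in batches (auto-flush off)."""

    def __init__(self, send_batch, buffer_bytes: int = 2 * 1024 * 1024):
        self.send_batch = send_batch      # callable that receives a list of puts
        self.buffer_bytes = buffer_bytes  # flush threshold in bytes
        self.pending = []
        self.pending_bytes = 0

    def put(self, rowkey: bytes, value: bytes):
        self.pending.append((rowkey, value))
        self.pending_bytes += len(rowkey) + len(value)
        if self.pending_bytes >= self.buffer_bytes:
            self.flush()

    def flush(self):
        """Send whatever is buffered; call this before shutting down."""
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []
            self.pending_bytes = 0
```

The design trade-off: fewer, larger RPCs raise throughput, but puts still sitting in the buffer are lost if the client dies before flushing.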

Tuning

• Heavy writes
  – Flushes, compactions, and splits increase I/O and degrade cluster performance
  – Keep region sizes larger
  – Keep HFile sizes large
• Heavy sequential reads
  – Larger block size
  – Avoid block caching for the table (large scans would churn the cache)
• Heavy random reads
  – Larger block cache
  – Lower MemStore limit, leaving more heap for the block cache
  – Smaller block size
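
Two pieces of arithmetic behind these settings. First, in HBase 1.x the block cache and MemStore heap fractions together may not exceed 80% of the RegionServer heap, so growing the block cache for random reads usually means lowering the MemStore limit. Second, a random read fetches a whole block, so smaller blocks mean less data read per lookup at the cost of more block-index entries. Function names are hypothetical.

```python
def check_heap_split(block_cache_fraction: float, memstore_fraction: float) -> bool:
    """True if block cache + MemStore fit within 80% of the heap."""
    return block_cache_fraction + memstore_fraction <= 0.8


def blocks_per_hfile(hfile_bytes: int, block_bytes: int) -> int:
    """Number of data blocks (and block-index entries) in an HFile."""
    return -(-hfile_bytes // block_bytes)  # ceiling division
```

For a 1 GiB HFile, 64 KiB blocks give 16384 index entries and 8 KiB blocks give 131072; a point lookup reads 64 KiB versus 8 KiB to return what may be a 100-byte cell.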

Apache Phoenix

• SQL over HBase
  – Compiles queries into HBase scans
  – Orchestrates parallel execution
  – Aggregate queries
• JDBC API over the native HBase API
• Salted buckets for pre-splitting
• Related: Trafodion
  – Transactional SQL on HBase

Hannibal

• Monitor and maintain HBase clusters
• How well are regions balanced over the cluster?
• How well are regions split for each table?
• How do regions evolve over time?
• How long do compactions take?
• Integration with Hue
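
The first question Hannibal answers, how evenly regions are spread, can also be approximated with a small script over region assignments. The sketch flags servers whose region count falls outside mean × (1 ± slop); the slop idea mirrors the simple balancer's tolerance, and 0.2 is an illustrative value. Names and data are hypothetical.

```python
from collections import Counter


def region_balance(assignments, servers, slop: float = 0.2):
    """assignments: iterable of (region_name, server).
    servers: all live servers, so servers with zero regions are counted.
    Returns {server: region_count} for servers outside mean * (1 +/- slop)."""
    counts = Counter({s: 0 for s in servers})
    counts.update(server for _, server in assignments)
    mean = sum(counts.values()) / len(counts)
    low, high = mean * (1 - slop), mean * (1 + slop)
    return {s: c for s, c in counts.items() if c < low or c > high}
```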

Operational Aspects

• Metrics
  – Master: cluster requests, split time, split size
  – RegionServer: block cache, MemStore, compaction, store, I/O
• HTrace
  – Tracing tool for parallel distributed systems
• Monitoring
  – Nagios, Hannibal, Ganglia, Graphite, OpenTSDB
• Backup
  – Export, CopyTable, Snapshot

Questions?

[email protected]
