kudu cloudera meetup paris

1 © Cloudera, Inc. All rights reserved. Intro to Apache Kudu Hadoop Storage for Fast Analytics on Fast Data @nihed [email protected]

Upload: mbarek-nihed

Post on 05-Apr-2017


TRANSCRIPT

Page 1: Kudu Cloudera Meetup Paris

© Cloudera, Inc. All rights reserved.

Intro to Apache Kudu
Hadoop Storage for Fast Analytics on Fast Data
@nihed [email protected]

Page 2: Kudu Cloudera Meetup Paris

Agenda

• Why Kudu? Motivations and design goals
• Use cases
• Design and some internals
• Performance
• How to get started

Page 3: Kudu Cloudera Meetup Paris

Why Kudu?

• Gaps in capabilities of current storage systems
• Changing hardware landscape

Page 4: Kudu Cloudera Meetup Paris

Previous storage landscape of the Hadoop ecosystem

HDFS (GFS) excels at:
• Batch ingest only (e.g. hourly)
• Efficiently scanning large amounts of data (analytics)

HBase (BigTable) excels at:
• Efficiently finding and writing individual rows
• Making data mutable

Gaps exist when these properties are needed simultaneously

Page 5: Kudu Cloudera Meetup Paris

Kudu design goals

• High throughput for big scans
  Goal: Within 2x of Parquet
• Low latency for short accesses
  Goal: 1 ms read/write on SSD
• Database-like semantics (initially single-row ACID)
• Relational data model
  • SQL queries are easy
  • "NoSQL" style scan/insert/update (Java/C++ client)

Page 6: Kudu Cloudera Meetup Paris

Changing hardware landscape

• Spinning disk -> solid state storage
  • NAND flash: Up to 450k read / 250k write IOPS, about 2 GB/sec read and 1.5 GB/sec write throughput, at a price of less than $3/GB and dropping
  • 3D XPoint memory (1000x faster than NAND, cheaper than RAM)
• RAM is cheaper and more abundant:
  • 64 -> 128 -> 256 GB over last few years
• Takeaway: The next bottleneck is CPU, and current storage systems weren't designed with CPU efficiency in mind.

Page 7: Kudu Cloudera Meetup Paris

Where Kudu fits in the Hadoop big data stack
Storage for fast (low latency) analytics on fast (high throughput) data

• Simplifies the architecture for building analytic applications on changing data
• Optimized for fast analytic performance
• Natively integrated with the Hadoop ecosystem of components

[Diagram: Hadoop stack. Data integration & storage: filesystem (HDFS), relational (Kudu), NoSQL (HBase); ingest via Sqoop, Flume, Kafka. Unified data services: security (Sentry), resource management (YARN). Processing: batch (Spark, Hive, Pig), stream (Spark), SQL (Impala), search (Solr), model (Spark), online (HBase), grouped under data engineering, data discovery & analytics, and data apps.]

Page 8: Kudu Cloudera Meetup Paris

Kudu: Scalable and fast tabular storage

• Scalable
  • Tested up to 275 nodes (~3 PB cluster)
  • Designed to scale to 1000s of nodes, tens of PBs
• Fast
  • Millions of read/write operations per second across cluster
  • Multiple GB/second read throughput per node
• Tabular
  • Store tables like a normal database (can support SQL, Spark, etc.)
  • Individual record-level access to 100+ billion row tables (Java/C++/Python APIs)

Page 9: Kudu Cloudera Meetup Paris

Kudu usage

• Table has a SQL-like schema
  • Finite number of columns (unlike HBase/Cassandra)
  • Types: BOOL, INT8, INT16, INT32, INT64, FLOAT, DOUBLE, STRING, BINARY, TIMESTAMP
  • Some subset of columns makes up a possibly-composite primary key
  • Fast ALTER TABLE
• Java and C++ "NoSQL" style APIs
  • Insert(), Update(), Delete(), Scan()
• Integrations with MapReduce, Spark, and Impala
  • Apache Drill work-in-progress

Page 10: Kudu Cloudera Meetup Paris

Use cases and architectures

Page 11: Kudu Cloudera Meetup Paris

Kudu use cases

Kudu is best for use cases requiring a simultaneous combination of sequential and random reads and writes

• Time series
  • Examples: Streaming market data, fraud detection/prevention, risk monitoring
  • Workload: Inserts, updates, scans, lookups
• Machine data analytics
  • Example: Network threat detection
  • Workload: Inserts, scans, lookups
• Online reporting
  • Example: Operational data store (ODS)
  • Workload: Inserts, updates, scans, lookups

Page 12: Kudu Cloudera Meetup Paris

"Traditional" real-time analytics in Hadoop
Fraud detection in the real world means storage complexity

Considerations:
● How do I handle failure during this process?
● How often do I reorganize data streaming in into a format appropriate for reporting?
● When reporting, how do I see data that has not yet been reorganized?
● How do I ensure that important jobs aren't interrupted by maintenance?

[Diagram labels: Incoming Data (Messaging System); New Partition; Most Recent Partition; Historical Data; HBase; Parquet File; decision "Have we accumulated enough data?"; step "Reorganize HBase file into Parquet" (wait for running operations to complete, then define new Impala partition referencing the newly written Parquet file); Reporting Request; Storage in HDFS]

Page 13: Kudu Cloudera Meetup Paris

Real-time analytics in Hadoop with Kudu

Improvements:
● One system to operate
● No cron jobs or background processes
● Handle late arrivals or data corrections with ease
● New data available immediately for analytics or operations

[Diagram labels: Incoming Data (Messaging System); Historical and Real-time Data; Storage in Kudu; Reporting Request]

Page 14: Kudu Cloudera Meetup Paris

Xiaomi use case

• World's 4th largest smart-phone maker (most popular in China)
• Gather important RPC tracing events from mobile app and backend service.
• Service monitoring & troubleshooting tool.

High write throughput
• >5 billion records/day and growing

Query latest data and quick response
• Identify and resolve issues quickly

Can search for individual records
• Easy for troubleshooting

Page 15: Kudu Cloudera Meetup Paris

Xiaomi big data analytics pipeline
Before Kudu

Large ETL pipeline delays
● High data visibility latency (from 1 hour up to 1 day)
● Data format conversion woes

Ordering issues
● Log arrival (storage) not exactly in correct order
● Must read 2-3 days of data to get all of the data points for a single day

Page 16: Kudu Cloudera Meetup Paris

Xiaomi big data analytics pipeline
Simplified with Kudu

Low latency ETL pipeline
● ~10 s data latency
● For apps that need to avoid direct backpressure or need ETL for record enrichment

Direct zero-latency path
● For apps that can tolerate backpressure and can use the NoSQL APIs
● Apps that don't need ETL enrichment for storage/retrieval

[Diagram labels: OLAP scan; Side table lookup; Result store]

Page 17: Kudu Cloudera Meetup Paris

How it works
Replication and fault tolerance

Page 18: Kudu Cloudera Meetup Paris

Tables and tablets

• Each table is horizontally partitioned into tablets
  • Range or hash partitioning
  • PRIMARY KEY (host, metric, timestamp) DISTRIBUTE BY HASH(timestamp) INTO 100 BUCKETS
  • Translation: bucketNumber = hashCode(row['timestamp']) % 100
• Each tablet has N replicas (3 or 5), with Raft consensus
• Tablet servers host tablets on local disk drives
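
The "Translation" rule above can be sketched in a few lines of Python. This is an illustration only: the slides do not say which hash function Kudu uses, so the sketch swaps in CRC32 from the standard library, and `bucket_for` is a hypothetical helper name rather than part of any Kudu API.

```python
import zlib

NUM_BUCKETS = 100  # matches "INTO 100 BUCKETS" on the slide


def bucket_for(timestamp: int, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a row's timestamp column to a hash bucket.

    Assumption: CRC32 stands in for Kudu's real hash function,
    which the slides do not specify.
    """
    key_bytes = timestamp.to_bytes(8, byteorder="big", signed=True)
    return zlib.crc32(key_bytes) % num_buckets


# Hypothetical rows shaped like the slide's (host, metric, timestamp) key.
rows = [
    ("host1", "cpu", 1442865158),
    ("host1", "cpu", 1442828307),
    ("host2", "mem", 1442865156),
]
for host, metric, ts in rows:
    print(host, metric, ts, "-> bucket", bucket_for(ts))
```

Because the bucket depends only on the hashed column, the same timestamp always lands in the same bucket, and the modulo keeps every result in the range 0 to 99.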

Page 19: Kudu Cloudera Meetup Paris

Metadata

• Replicated master
  • Acts as a tablet directory
  • Acts as a catalog (which tables exist, etc.)
  • Acts as a load balancer (tracks TS liveness, re-replicates under-replicated tablets)
• Caches all metadata in RAM for high performance
• Client configured with master addresses
  • Asks master for tablet locations as needed and caches them

Page 20: Kudu Cloudera Meetup Paris

Client: Hey Master! Where is the row for 'mpercy' in table "T"?

Master: It's part of tablet 2, which is on servers {Z, Y, X}. BTW, here's info on other tablets you might care about: T1, T2, T3, …

Client: UPDATE mpercy SET col=foo

Meta Cache: T1: … T2: … T3: …
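
The lookup-then-cache exchange on this slide can be sketched as follows. This is a simplified model, not Kudu client code: `FakeMaster`, `Client`, and `servers_for` are hypothetical names, the cache is keyed by tablet id instead of by row key, and the master's extra hints about other tablets are left out.

```python
class FakeMaster:
    """Hypothetical stand-in for the master's tablet directory."""

    def __init__(self, locations):
        self._locations = locations  # tablet id -> list of server names
        self.lookups = 0             # counts round trips to the master

    def locate(self, tablet_id):
        self.lookups += 1
        return list(self._locations[tablet_id])


class Client:
    """Asks the master once per tablet, then answers from its meta cache."""

    def __init__(self, master):
        self._master = master
        self._meta_cache = {}

    def servers_for(self, tablet_id):
        if tablet_id not in self._meta_cache:
            self._meta_cache[tablet_id] = self._master.locate(tablet_id)
        return self._meta_cache[tablet_id]


master = FakeMaster({"T2": ["Z", "Y", "X"]})
client = Client(master)
client.servers_for("T2")  # first call goes to the master
client.servers_for("T2")  # second call is served from the cache
print(master.lookups)
```

After the first lookup the client can send operations such as the UPDATE straight to the servers hosting the tablet without contacting the master again.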

Page 21: Kudu Cloudera Meetup Paris

Fault tolerance

• Operations replicated using Raft consensus
  • Strict quorum algorithm. See Raft paper for details
• Transient failures:
  • Follower failure: Leader can still achieve majority
  • Leader failure: automatic leader election (~5 seconds)
  • Restart dead TS within 5 min and it will rejoin transparently
• Permanent failures
  • After 5 minutes, automatically creates a new follower replica and copies data
• N replicas can tolerate maximum of (N-1)/2 failures
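
The (N-1)/2 rule follows from majority quorums: an operation commits once more than half of the replicas acknowledge it. A minimal arithmetic sketch, using integer division and hypothetical function names:

```python
def majority(n_replicas: int) -> int:
    """Smallest number of replicas that forms a strict majority."""
    return n_replicas // 2 + 1


def tolerated_failures(n_replicas: int) -> int:
    """Replicas that can fail while a majority is still reachable."""
    return (n_replicas - 1) // 2


# The slides mention 3 or 5 replicas per tablet.
for n in (3, 5):
    print(f"N={n}: majority={majority(n)}, tolerated failures={tolerated_failures(n)}")
```

With 3 replicas a tablet survives 1 failure, and with 5 replicas it survives 2.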

Page 22: Kudu Cloudera Meetup Paris

How it works
Columnar storage

Page 23: Kudu Cloudera Meetup Paris

Columnar storage

Tweet_id: {25059873, 22309487, 23059861, 23010982}
User_name: {newsycbot, RideImpala, fastly, llvmorg}
Created_at: {1442865158, 1442828307, 1442865156, 1442865155}
text: {Visual exp…, Introducing.., Missing July…, LLVM 3.7….}

Page 24: Kudu Cloudera Meetup Paris

Columnar storage

Tweet_id: {25059873, 22309487, 23059861, 23010982}
User_name: {newsycbot, RideImpala, fastly, llvmorg}
Created_at: {1442865158, 1442828307, 1442865156, 1442865155}
text: {Visual exp…, Introducing.., Missing July…, LLVM 3.7….}

SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';

Only read 1 column

Column sizes: Tweet_id 1 GB, User_name 2 GB, Created_at 1 GB, text 200 GB
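
To make the "only read 1 column" point concrete, here is a small sketch that holds the slide's example table as one list per column and answers the COUNT(*) query by touching only `user_name`. It models the idea in memory, not Kudu's on-disk format, and `count_where_equals` is a hypothetical helper.

```python
# Column-oriented layout: one list per column, values taken from the slide.
tweets = {
    "tweet_id": [25059873, 22309487, 23059861, 23010982],
    "user_name": ["newsycbot", "RideImpala", "fastly", "llvmorg"],
    "created_at": [1442865158, 1442828307, 1442865156, 1442865155],
    "text": ["Visual exp…", "Introducing..", "Missing July…", "LLVM 3.7…."],
}


def count_where_equals(table, column, value):
    """Count matching rows by scanning a single column."""
    return sum(1 for v in table[column] if v == value)


# Equivalent of: SELECT COUNT(*) FROM tweets WHERE user_name = 'newsycbot';
print(count_where_equals(tweets, "user_name", "newsycbot"))
```

The large `text` column is never read, which is where the savings come from when it dominates the table's size.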

Page 25: Kudu Cloudera Meetup Paris

Columnar compression

Created_at: {1442865158, 1442828307, 1442865156, 1442865155}

Created_at      Diff(created_at)
1442865158      n/a
1442828307      -36851
1442865156      36849
1442865155      -1
64 bits each    17 bits each

• Many columns can compress to a few bits per row!
• Especially:
  • Timestamps
  • Time series values
  • Low-cardinality strings
• Massive space savings and throughput increase!
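
The diff column in the table above is a delta encoding, which can be reproduced with a short sketch. The slides do not describe the encodings Kudu actually implements, so this shows the general idea only. Counting one sign bit plus the magnitude bits is an assumption chosen because it reproduces the slide's "17 bits" figure.

```python
def delta_encode(values):
    """Keep the first value, then store each difference from its predecessor."""
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]


def delta_decode(encoded):
    """Rebuild the original values by accumulating the differences."""
    out = [encoded[0]]
    for diff in encoded[1:]:
        out.append(out[-1] + diff)
    return out


def signed_bits(n):
    """Bits needed for n as one sign bit plus its magnitude (an assumption)."""
    return abs(n).bit_length() + 1


created_at = [1442865158, 1442828307, 1442865156, 1442865155]
encoded = delta_encode(created_at)
print(encoded[1:])                               # the Diff(created_at) column
print(max(signed_bits(d) for d in encoded[1:]))  # width of the widest diff
```

Decoding the encoded list returns the original timestamps, so the encoding is lossless while each stored diff needs far fewer bits than a 64-bit value.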

Page 26: Kudu Cloudera Meetup Paris

Integrations

Page 27: Kudu Cloudera Meetup Paris

Spark DataSource integration (WIP)

sqlContext.load("org.kududb.spark", Map("kudu.table" -> "foo",
    "kudu.master" -> "master.example.com")).registerTempTable("mytable")

val df = sqlContext.sql("select col_a, col_b from mytable " + "where col_c = 123")

https://github.com/tmalaska/SparkOnKudu

Page 28: Kudu Cloudera Meetup Paris

Impala integration

• CREATE TABLE … DISTRIBUTE BY HASH(col1) INTO 16 BUCKETS AS SELECT … FROM …

• INSERT/UPDATE/DELETE

• Optimizations like predicate pushdown, scan parallelism, plans for more on the way

• Not an Impala user? Apache Drill integration in the works by Dremio

Page 29: Kudu Cloudera Meetup Paris

MapReduce integration

• Most Kudu integration/correctness testing via MapReduce
• Multi-framework cluster (MR + HDFS + Kudu on the same disks)
• KuduTableInputFormat / KuduTableOutputFormat
  • Support for pushing down predicates, column projections, etc.

Page 30: Kudu Cloudera Meetup Paris

Performance

Page 31: Kudu Cloudera Meetup Paris

TPC-H (analytics benchmark)

• 75 server cluster
  • 12 (spinning) disks each, enough RAM to fit dataset
  • TPC-H Scale Factor 100 (100 GB)
• Example query:
  SELECT n_name, sum(l_extendedprice * (1 - l_discount)) as revenue
  FROM customer, orders, lineitem, supplier, nation, region
  WHERE c_custkey = o_custkey AND l_orderkey = o_orderkey
    AND l_suppkey = s_suppkey AND c_nationkey = s_nationkey
    AND s_nationkey = n_nationkey AND n_regionkey = r_regionkey
    AND r_name = 'ASIA'
    AND o_orderdate >= date '1994-01-01' AND o_orderdate < '1995-01-01'
  GROUP BY n_name ORDER BY revenue desc;

Page 32: Kudu Cloudera Meetup Paris

• Kudu outperforms Parquet by 31% (geometric mean) for RAM-resident data

Page 33: Kudu Cloudera Meetup Paris

Versus other NoSQL storage

• Apache Phoenix: OLTP SQL engine built on HBase
• 10 node cluster (9 worker, 1 master)
• TPC-H LINEITEM table only (6B rows)

[Bar chart, log scale from 0.01 to 10000: Time (sec) per workload for Phoenix, Kudu, and Parquet]

Workload             Phoenix    Kudu    Parquet
Load                 2152       1918    155
TPCH Q1              219        13.2    9.3
COUNT(*)             76         1.7     1.4
COUNT(*) WHERE …     131        0.7     1.5
single-row lookup    0.04       0.15    1.37

Page 34: Kudu Cloudera Meetup Paris

What about NoSQL-style random access? (YCSB)

• YCSB 0.5.0-snapshot
• 10 node cluster (9 worker, 1 master)
• 100M row data set
• 10M operations each workload

Page 35: Kudu Cloudera Meetup Paris

Demo

Page 36: Kudu Cloudera Meetup Paris

Kudu community

Your company here!

Page 37: Kudu Cloudera Meetup Paris

Getting started as a user

• https://kudu.apache.org/
• Quickstart VM
  • Easiest way to get started
  • Impala and Kudu in an easy-to-install VM
• CSD and Parcels
  • For installation on a Cloudera Manager-managed cluster

Page 38: Kudu Cloudera Meetup Paris

Getting started as a developer

• https://github.com/apache/kudu
• Apache 2.0 licensed open source and an ASF incubator project.
• Contributions are welcome and encouraged! Become a committer!

Page 39: Kudu Cloudera Meetup Paris

Questions?
http://getkudu.io
@ApacheKudu