![Page 1: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/1.jpg)
© Hortonworks Inc. 2011–2017. All Rights Reserved
Timothy Spann, 2017 Future of Data – Princeton Meetup, June 20, 2017, hosted by TRAC Intermodal
Introduction to HDF 3.0
![Page 2: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/2.jpg)
• Schema Registry – Milind Pandit
• HDF Streaming Updates – Tim Spann
• EDW Optimization with Hadoop and HDF – Gregory C. Keys, PhD
![Page 3: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/3.jpg)
Ambari Integration
![Page 4: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/4.jpg)
Format and Schema Aware Efficient Flow Management
• Provide processors for schema-aware record structure for common processing patterns
  – Split, Enrich, Partition, Convert, Query (SQL queries powered by Apache Calcite)
  – Put/Get records between NiFi and Kafka, Elasticsearch, RDBMS (more soon)
  – Easy bridging to/from columnar data formats like ORC or Parquet
• Separate format/schema-specific logic into extensible record readers and writers
  – Developers can write new readers/writers
  – Users can create new readers/writers with scripting, live in production!
• So what?
  – Format- and schema-aware processing *with* generic reusable components
  – Maintains full provenance/lineage trail
  – Dramatic speed/efficiency increase per node
  – Integration with Hortonworks Schema Registry and extensible for others
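The separation of format logic from processing logic can be illustrated outside NiFi. The sketch below is plain Python, not NiFi code: the schema, the field names, and the `csv_reader`, `json_lines_writer` and `convert` functions are all hypothetical stand-ins. It only shows the idea that a generic step can stay unaware of the formats on either side once reading and writing are pluggable.

```python
import csv
import io
import json
from typing import Callable, Dict, Iterable, Iterator

Record = Dict[str, object]

# Hypothetical schema: field name -> Python type used to coerce raw values.
SCHEMA = {"driverId": int, "speed": float, "city": str}


def csv_reader(text: str) -> Iterator[Record]:
    """Reader: parse CSV text into schema-typed records."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {name: cast(row[name]) for name, cast in SCHEMA.items()}


def json_lines_writer(records: Iterable[Record]) -> str:
    """Writer: serialize records as one JSON object per line."""
    return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def convert(
    text: str,
    reader: Callable[[str], Iterable[Record]],
    writer: Callable[[Iterable[Record]], str],
    keep: Callable[[Record], bool] = lambda r: True,
) -> str:
    """Generic step: filters records without knowing either format."""
    return writer(r for r in reader(text) if keep(r))


if __name__ == "__main__":
    raw = "driverId,speed,city\n11,82.5,Princeton\n12,55.0,Trenton\n"
    print(convert(raw, csv_reader, json_lines_writer, keep=lambda r: r["speed"] > 65))
```

Swapping in a different reader or writer changes the format handling without touching `convert`, which is the reuse the slide describes.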
![Page 5: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/5.jpg)
RecordReader CS
![Page 6: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/6.jpg)
RecordWriter CS
![Page 7: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/7.jpg)
'QueryRecord' Processor – Treat streaming records as tables
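To make "records as tables" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` as a stand-in for the SQL engine. It is not NiFi code: the `query_records` helper, the column names and the sample events are assumptions for illustration, and the table is called `FLOWFILE` only to mirror the name QueryRecord queries select from.

```python
import sqlite3
from typing import Dict, List


def query_records(records: List[Dict[str, object]], sql: str) -> List[Dict[str, object]]:
    """Load a batch of records into an in-memory table and run SQL over it."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    # Hypothetical record schema for truck events.
    conn.execute("CREATE TABLE FLOWFILE (driverId INTEGER, eventType TEXT, speed REAL)")
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:driverId, :eventType, :speed)", records
    )
    rows = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return rows


if __name__ == "__main__":
    events = [
        {"driverId": 11, "eventType": "Overspeed", "speed": 82.0},
        {"driverId": 11, "eventType": "Normal", "speed": 60.0},
        {"driverId": 12, "eventType": "Overspeed", "speed": 90.0},
        {"driverId": 11, "eventType": "Overspeed", "speed": 85.0},
    ]
    sql = (
        "SELECT driverId, COUNT(*) AS violations FROM FLOWFILE "
        "WHERE eventType = 'Overspeed' GROUP BY driverId ORDER BY driverId"
    )
    print(query_records(events, sql))
```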
![Page 8: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/8.jpg)
Component Versioning
![Page 9: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/9.jpg)
Stream Processing – Introducing Streaming Analytics Manager (SAM)
Streaming Analytics Manager
A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop user experience
![Page 10: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/10.jpg)
SAM – Write Complex Streaming Applications With No Code
Streaming Analytics Manager
• A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop paradigm
  – Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
  – Give the coders the power to add key functions and extend the platform (add custom sinks, processors, spouts, etc.)
![Page 11: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/11.jpg)
SAM's Value Proposition
• Build and deploy complex stream applications without writing any code
• Only open source tool in the market with a graphical programming paradigm
• Speed time-to-market to build complex streaming analytics applications
• Build streaming analytics applications without specialized skill sets.
• Decouple data format from the streaming application itself while being schema aware
• Support multiple underlying streaming engines
![Page 12: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/12.jpg)
Stream Builder Module for App Developers
• Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming applications
• Drag and drop to build a working streaming application without writing a single line of code
• 4 Types of Components: Sources, Processors, Sinks and Custom
![Page 13: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/13.jpg)
Stream Insight Module for Business Analysts
• A tool to create real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box with customization capability
• Druid is the Analytics Engine that powers the Stream Insight Module.
![Page 14: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/14.jpg)
Stream Ops Module for IT Operations
• Create and manage different environments in which individual streaming applications will be built
• Environments consist of services such as HDFS, Kafka and Storm from different service pools
• Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module
![Page 15: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/15.jpg)
Stream Builder Module for App Developers
• Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming apps.
• Drag and drop to build a working streaming application without writing a single line of code.
• 4 Types of Components: Sources, Processors, Sinks and Custom
![Page 16: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/16.jpg)
SAM is All about Doing Real-Time Analytics on the Stream
Real-Time Analytics
• Real-Time Prescriptive Analytics: What should we do right now?
• Real-Time Predictive Analytics: What could happen now/soon?
• Real-Time Descriptive Analytics: What is happening right now?
![Page 17: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/17.jpg)
Real-Time Prescriptive Analytics
• Question: What should we do right now?
• Context: It is rainy, the driver has been on the road for 12 hours and he has 30 high speeding alerts over a 3 minute window in the last 2 hours.
• Answer: Dispatch a radio call to the driver to slow down
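A prescriptive rule of this kind can be sketched as a small function over the driver's current context. This is only an illustration in plain Python: the `DriverContext` fields and the threshold values are assumptions chosen to match the example on the slide, not values taken from SAM.

```python
from dataclasses import dataclass


@dataclass
class DriverContext:
    weather: str            # e.g. "rainy", "clear"
    hours_on_road: float    # hours driven in the current shift
    speeding_alerts: int    # high speeding alerts seen in the recent window


def recommend_action(
    ctx: DriverContext,
    max_hours: float = 10.0,   # assumed fatigue threshold
    max_alerts: int = 20,      # assumed alert-count threshold
) -> str:
    """Return an action when weather, fatigue and speeding all look risky."""
    risky_weather = ctx.weather in {"rainy", "foggy"}
    fatigued = ctx.hours_on_road >= max_hours
    speeding = ctx.speeding_alerts >= max_alerts
    if risky_weather and fatigued and speeding:
        return "Dispatch a radio call to the driver to slow down"
    return "No action"


if __name__ == "__main__":
    print(recommend_action(DriverContext("rainy", 12, 30)))
```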
![Page 18: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/18.jpg)
Real-Time Predictive Analytics
• Question: No violation events, but what might happen that I need to be worried about?
• My data science team has a model that can predict that based on
  – Weather
  – Roads
  – Driver HR info like driver certification status, wage plan
  – Driver timesheet info like hours and miles logged over the last week
![Page 19: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/19.jpg)
Building the Predictive Model on HDP
1. Explore a small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: "foggy weather causes driver violations"
2. Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data
3. Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format
4. Train a logistic classification Spark model on YARN, with the above events as training input, and iterate to fine-tune the generated model
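The training step can be illustrated without a cluster. The sketch below is a pure-Python logistic regression trained by batch gradient descent, used here as a stand-in for the Spark MLlib model the slide mentions; it does not use Spark or YARN. The feature layout (`[foggy, normalized_hours]`), the toy data, and the learning rate and epoch count are all assumptions made for the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of a violation for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_logistic(
    rows: List[Sequence[float]],
    labels: List[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias with batch gradient descent on the log loss."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


if __name__ == "__main__":
    # Toy labeled events: [foggy (0/1), hours driven last week / 70]
    rows = [[1, 0.9], [1, 0.7], [1, 0.2], [0, 0.3], [0, 0.5], [0, 0.8]]
    labels = [1, 1, 1, 0, 0, 0]
    weights, bias = train_logistic(rows, labels)
    print(predict(weights, bias, [1, 0.5]), predict(weights, bias, [0, 0.5]))
```

In this toy data the label follows the foggy flag, so the fitted model ends up weighting that feature most heavily, which mirrors the example hypothesis in step 1.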
![Page 20: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/20.jpg)
Logistic Regression Model
![Page 21: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/21.jpg)
Scoring the Predictive Model on HDF
5. Model Registry: Export the Spark MLlib model and import it into HDF's Model Registry
6. Enrich with Features: Use SAM's enrich/custom processors to enrich the event with the features required for the model
7. Transform/Normalize: Use SAM's projection/custom processors to transform/normalize the streaming event and the features required for the model
8. Score Model: Use SAM's PMML processor to score the model for each stream event with its required features
9. Alert/Notify/Action: Use SAM's rule and notification processors to alert, notify and take action using the results of the model
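Steps 6 to 9 form a small pipeline: enrich, normalize, score, then alert. The sketch below walks one event through those stages in plain Python. It is not SAM or PMML code: the coefficients in `MODEL`, the lookup tables, the field names and the 0.5 threshold are hypothetical values chosen only to make the flow runnable.

```python
import math
from typing import Dict, Optional

# Hypothetical coefficients standing in for an exported logistic model.
MODEL = {"bias": -3.0, "weights": {"foggy": 2.5, "hours_norm": 1.5, "certified": -1.0}}


def enrich(event: Dict, weather_by_route: Dict, hr_by_driver: Dict) -> Dict:
    """Step 6: add the features the model needs to the raw event."""
    enriched = dict(event)
    enriched["weather"] = weather_by_route[event["route"]]
    enriched.update(hr_by_driver[event["driverId"]])
    return enriched


def normalize(event: Dict) -> Dict[str, float]:
    """Step 7: turn enriched fields into numeric model inputs."""
    return {
        "foggy": 1.0 if event["weather"] == "foggy" else 0.0,
        "hours_norm": min(event["hours_last_week"] / 70.0, 1.0),
        "certified": 1.0 if event["certified"] else 0.0,
    }


def score(features: Dict[str, float]) -> float:
    """Step 8: logistic score for one event."""
    z = MODEL["bias"] + sum(MODEL["weights"][k] * v for k, v in features.items())
    return 1.0 / (1.0 + math.exp(-z))


def alert(event: Dict, probability: float, threshold: float = 0.5) -> Optional[str]:
    """Step 9: emit a notification when the risk crosses the threshold."""
    if probability >= threshold:
        return f"ALERT driver {event['driverId']}: violation risk {probability:.2f}"
    return None


def process(event: Dict, weather_by_route: Dict, hr_by_driver: Dict) -> Optional[str]:
    enriched = enrich(event, weather_by_route, hr_by_driver)
    return alert(enriched, score(normalize(enriched)))


if __name__ == "__main__":
    weather = {"R1": "foggy", "R2": "clear"}
    hr = {
        11: {"certified": False, "hours_last_week": 70},
        12: {"certified": True, "hours_last_week": 35},
    }
    print(process({"driverId": 11, "route": "R1"}, weather, hr))
    print(process({"driverId": 12, "route": "R2"}, weather, hr))
```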
![Page 22: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/22.jpg)
SAM's Model Registry and PMML Processor
• Model Registry
  – SAM has a repository to store and manage PMML-based predictive models
  – First-class features like versioning, evolution policies, etc. will be added in a future release
• PMML Processor
  – Processor that can use a model from the registry and score the model against the input stream of events coming in
![Page 23: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/23.jpg)
SAM Extensibility: Custom Processors, UDFs, UDAFs
• Custom Components
  – Most users will want to build custom components to meet certain requirements.
  – SAM provides the ability to build custom components using the SAM SDK
  – The jars can then be uploaded into SAM via the User Interface
• 3 Types of Custom Components
  – Custom Processors
  – Custom UDFs
    • User-defined functions that are used by the Projection processor
  – Custom UDAFs
    • User-defined aggregate functions that are used by the Aggregate processor.
  – SDK can be used to create custom UDF functions for windowed aggregations
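The difference between a UDF and a UDAF can be shown with a short sketch. This is plain Python and not the SAM SDK (which produces jars): the `init`/`add`/`result` method names, the `AverageSpeed` class, the `to_kmh` function and the count-based tumbling window are all hypothetical, chosen to show a per-event function next to an aggregate that folds over a window.

```python
from typing import Iterable, Iterator, List, Tuple


def to_kmh(mph: float) -> float:
    """UDF: applied to one field of one event (as a projection would)."""
    return round(mph * 1.609344, 1)


class AverageSpeed:
    """UDAF: folds many values from a window into a single result."""

    def init(self) -> Tuple[float, int]:
        return (0.0, 0)  # running total and count

    def add(self, aggregate: Tuple[float, int], value: float) -> Tuple[float, int]:
        total, count = aggregate
        return (total + value, count + 1)

    def result(self, aggregate: Tuple[float, int]) -> float:
        total, count = aggregate
        return total / count if count else 0.0


def tumbling_windows(values: Iterable[float], size: int) -> Iterator[List[float]]:
    """Group values into consecutive, non-overlapping windows of `size`."""
    window: List[float] = []
    for value in values:
        window.append(value)
        if len(window) == size:
            yield window
            window = []
    if window:
        yield window  # final partial window


def aggregate_window(udaf: AverageSpeed, window: Iterable[float]) -> float:
    aggregate = udaf.init()
    for value in window:
        aggregate = udaf.add(aggregate, value)
    return udaf.result(aggregate)


if __name__ == "__main__":
    speeds = [60, 80, 90, 70, 50]
    print([aggregate_window(AverageSpeed(), w) for w in tumbling_windows(speeds, 2)])
    print(to_kmh(100))
```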
![Page 24: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/24.jpg)
Streaming Split Join Pattern
• 3 enrichments have to be performed on the event stream to feed into the model:
  – From lat, long and time, query weather conditions
  – From driverId, look up information about the driver's certification and wage plan
  – From driverId, look up information about how many miles and hours the driver was on the road last week
• Streaming Split Join Pattern
  – Complex pattern that allows parallel processing to decrease latency (used by Apache Metron extensively)
    1. Create a splitJoin key
    2. Split the stream into n, where n is the number of different enrichments you want to do
    3. Join the n streams based on the splitJoin key
Complex pattern to implement that SAM allows the user to do simply with no code!
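The three steps of the split-join pattern can be sketched in a few lines. This is an illustration in plain Python with a thread pool, not how SAM or Metron implement it: the three lookup functions, their in-memory data and the field names are hypothetical stand-ins for the weather, HR and timesheet enrichments.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical lookup data standing in for external services.
HR = {11: {"certified": True, "wagePlan": "miles"}}
TIMESHEET = {11: {"hoursLastWeek": 62, "milesLastWeek": 2400}}


def weather_lookup(event: Dict) -> Dict:
    return {"weather": "foggy" if event["lat"] > 40.0 else "clear"}


def hr_lookup(event: Dict) -> Dict:
    return HR[event["driverId"]]


def timesheet_lookup(event: Dict) -> Dict:
    return TIMESHEET[event["driverId"]]


def split_join(events: List[Dict], enrichments: List[Callable[[Dict], Dict]]) -> List[Dict]:
    # 1. Create a split-join key for every event.
    keyed = {str(uuid.uuid4()): dict(event) for event in events}
    # 2. Split: one task per (event, enrichment), run in parallel.
    tasks = [(key, fn, event) for key, event in keyed.items() for fn in enrichments]
    with ThreadPoolExecutor(max_workers=max(1, len(enrichments))) as pool:
        partials = list(pool.map(lambda t: (t[0], t[1](t[2])), tasks))
    # 3. Join: merge the partial results back together by key.
    joined = {key: dict(event) for key, event in keyed.items()}
    for key, fields in partials:
        joined[key].update(fields)
    return list(joined.values())


if __name__ == "__main__":
    events = [{"driverId": 11, "lat": 40.35, "long": -74.66}]
    print(split_join(events, [weather_lookup, hr_lookup, timesheet_lookup]))
```

Because the enrichments run side by side, the latency per event is closer to the slowest lookup than to the sum of all three, which is the benefit the slide points to.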
![Page 25: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/25.jpg)
Stream Insight Module for Business Analysts
• A tool to create time-series and real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box with customization capability
• Druid is the Analytics Engine that powers the Stream Insight Module.
![Page 26: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/26.jpg)
Streaming Analytics Manager
![Page 27: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/27.jpg)
Set Up An Environment for SAM
![Page 28: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/28.jpg)
Hortonworks SAM Canvas to build the Streaming Analytics App without writing a line of code
![Page 29: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/29.jpg)
Hortonworks SAM App Dashboard
![Page 30: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/30.jpg)
Schema Registry Dashboard and Details of One Schema
![Page 31: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/31.jpg)
Contact:
[email protected]/futureofdata-princeton
community.hortonworks.com/users/9304/tspann.html
![Page 32: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/32.jpg)
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like Stack Overflow)
• Knowledge Base Articles
• Code Samples and Repositories
![Page 33: Introduction to HDF 3.0](https://reader030.vdocuments.us/reader030/viewer/2022020314/5a65429a7f8b9ace0b8b48ab/html5/thumbnails/33.jpg)
Community Engagement
Participate now at: community.hortonworks.com
• 4,000+ Registered Users
• 10,000+ Answers
• 15,000+ Technical Assets
One Website!