Introduction to HDF 3.0
© Hortonworks Inc. 2011–2017. All Rights Reserved
Timothy Spann, 2017. Future of Data – Princeton Meetup, June 20, 2017. Hosted by TRAC Intermodal
• Schema Registry – Milind Pandit
• HDF Streaming Updates – Tim Spann
• EDW Optimization with Hadoop and HDF – Gregory C Keys, PhD.
Ambari Integration
Format and Schema Aware Efficient Flow Management
→ Provide processors for schema-aware record structure for common processing patterns
– Split, Enrich, Partition, Convert, Query (SQL queries powered by Apache Calcite)
– Put/Get records between NiFi and Kafka, Elasticsearch, RDBMS (more soon)
– Easy bridging to/from columnar data formats like ORC or Parquet
→ Separate format/schema-specific logic into extensible record readers and writers
– Developers can write new readers/writers
– Users can create new readers/writers with scripting, live in production!
→ So what?
– Format and schema aware processing *with* generic reusable components
– Maintains full provenance/lineage trail
– Dramatic speed/efficiency increase per node
– Integration with Hortonworks Schema Registry and extensible for others
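The reader/writer separation above can be illustrated with a minimal Python sketch. This is not NiFi code: `CsvReader`, `JsonLinesWriter`, and `convert_records` are hypothetical names standing in for a record reader, a record writer, and a generic processor that knows nothing about either format.

```python
import csv
import io
import json
from typing import Dict, Iterable, Iterator

Record = Dict[str, str]


class CsvReader:
    """Hypothetical reader: turns CSV text into a stream of records."""

    def read(self, payload: str) -> Iterator[Record]:
        yield from csv.DictReader(io.StringIO(payload))


class JsonLinesWriter:
    """Hypothetical writer: serializes records as one JSON object per line."""

    def write(self, records: Iterable[Record]) -> str:
        return "\n".join(json.dumps(r, sort_keys=True) for r in records)


def convert_records(payload: str, reader, writer) -> str:
    """Generic 'convert' step: all format logic lives in the reader and writer."""
    return writer.write(reader.read(payload))


converted = convert_records("id,speed\n1,70\n2,95\n", CsvReader(), JsonLinesWriter())
```

Swapping in a different reader or writer changes the formats handled without touching `convert_records`, which is the point of keeping format logic out of the processor.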
Record Reader CS
Record Writer CS
‘QueryRecord’ Processor – Treat streaming records as tables
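To show what "records as tables" means, here is a small Python sketch that loads a batch of records into an in-memory table and runs SQL over it. NiFi's QueryRecord exposes the incoming records as a table named FLOWFILE and uses Apache Calcite. This sketch swaps in Python's built-in `sqlite3` purely for illustration, and the event fields (`driverId`, `eventType`, `speed`) are made-up sample data.

```python
import sqlite3
from typing import Dict, List


def query_records(records: List[Dict], sql: str) -> List[Dict]:
    """Load a non-empty batch of records into a table named FLOWFILE and run SQL over it."""
    columns = list(records[0].keys())
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    col_list = ", ".join(f'"{c}"' for c in columns)
    conn.execute(f"CREATE TABLE FLOWFILE ({col_list})")
    placeholders = ", ".join("?" for _ in columns)
    conn.executemany(
        f"INSERT INTO FLOWFILE VALUES ({placeholders})",
        [tuple(r[c] for c in columns) for r in records],
    )
    rows = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return rows


# Made-up sample events standing in for one batch of streaming records.
events = [
    {"driverId": 11, "eventType": "Overspeed", "speed": 96},
    {"driverId": 12, "eventType": "Normal", "speed": 61},
    {"driverId": 11, "eventType": "Overspeed", "speed": 102},
]
violations = query_records(
    events,
    "SELECT driverId AS driver, COUNT(*) AS violations FROM FLOWFILE "
    "WHERE eventType <> 'Normal' GROUP BY driverId",
)
```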
Component Versioning
Stream Processing – Introducing Streaming Analytics Manager (SAM)
Streaming Analytics Manager
A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop user experience
SAM – Write Complex Streaming Applications With No Code
Streaming Analytics Manager
→ A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop paradigm
– Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
– Give the coders the power to add key functions and extend the platform (add custom sinks, processors, spouts, etc.)
SAM’s Value Proposition
→ Build and deploy complex stream applications without writing any code
→ Only open source tool in the market with a graphical programming paradigm
→ Speed time-to-market to build complex streaming analytics applications
→ Build streaming analytics applications without specialized skill sets.
→ Decouple data format from the streaming application itself while being schema aware
→ Support multiple underlying streaming engines
Stream Builder Module for App Developers
→ Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming applications
→ Drag and drop to build a working streaming application without writing a single line of code
→ 4 Types of Components: Sources, Processors, Sinks and Custom
Stream Insight Module for Business Analysts
→ A tool to create real-time analytics dashboards, charts and graphs
→ 30+ visualization charts out of the box with customization capability
→ Druid is the Analytics Engine that powers the Stream Insight Module.
Stream Ops Module for IT Operations
→ Create and manage different environments in which individual streaming applications will be built
→ Environments consist of services such as HDFS, Kafka and Storm from different service pools
→ Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module
SAM Is All About Doing Real-Time Analytics on the Stream
Real-Time Analytics
– Real-Time Prescriptive Analytics: What should we do right now?
– Real-Time Predictive Analytics: What could happen now/soon?
– Real-Time Descriptive Analytics: What is happening right now?
Real-Time Prescriptive Analytics
→ Question: What should we do right now?
→ Context: It is rainy, the driver has been on the road for 12 hours, and he has 30 high speeding alerts over a 3-minute window in the last 2 hours.
→ Answer: Dispatch a radio call to the driver to slow down
Real-Time Predictive Analytics
→ Question: No violation events, but what might happen that I need to be worried about?
→ My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wage plan
– Driver timesheet info like hours and miles logged over the last week
Building the Predictive Model on HDP
1. Explore a small subset of events to identify predictive features and make a hypothesis. E.g. hypothesis: “foggy weather causes driver violations”
2. Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data
3. Transform enriched events data to a format that is friendly to Spark MLlib – many ML libs expect training data in a certain format
4. Train a logistic classification Spark model on YARN, with the above events as training input, and iterate to fine-tune the generated model
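As a rough illustration of step 4, the sketch below fits a logistic classifier with plain batch gradient descent in Python. It is a stand-in for the idea only: the deck trains with Spark MLlib on YARN, which is not used here. The two features (a foggy flag and hours driven scaled to 0..1) and the training rows are invented for the example.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability of a violation for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


def train_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    lr: float = 0.5,
    epochs: int = 2000,
) -> Tuple[List[float], float]:
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = predict(weights, bias, x) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias


# Invented training data: [foggy (0/1), hours driven scaled to 0..1] -> violation (0/1).
rows = [[1, 0.9], [1, 0.8], [1, 1.0], [1, 0.3], [0, 0.9], [0, 0.2], [0, 0.1], [0, 0.4]]
labels = [1, 1, 1, 0, 0, 0, 0, 0]
weights, bias = train_logistic(rows, labels)
```

Iterating on the learning rate, epoch count, and feature set is the "fine-tune" part of the step. The toy data only shows the mechanics.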
Logistic Regression Model
Scoring the Predictive Model on HDF
5. Model Registry: Export the Spark MLlib model and import it into HDF’s Model Registry
6. Enrich with Features: Use SAM’s enrich/custom processors to enrich the event with the features required for the model
7. Transform/Normalize: Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model
8. Score Model: Use SAM’s PMML processor to score the model for each stream event with its required features
9. Alert/Notify/Action: Use SAM’s rule and notification processors to alert, notify and take action using the results of the model
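Steps 6 to 9 form a per-event pipeline, which the Python sketch below mimics with plain functions. Everything in it is an assumption made for illustration: the lookup tables, the coefficients in `MODEL`, the 70-hour cap, and the 0.5 alert threshold. SAM itself does this with drag-and-drop processors and a PMML model, not with code like this.

```python
import math
from typing import Dict

# Hypothetical reference data used for enrichment (step 6).
WEATHER_BY_ROUTE = {"route-17": {"foggy": 1}, "route-4": {"foggy": 0}}
HOURS_BY_DRIVER = {11: 70.0, 12: 35.0}

# Hypothetical coefficients standing in for the exported model (step 5).
MODEL = {"bias": -4.0, "foggy": 2.0, "hours_norm": 4.0}


def enrich(event: Dict) -> Dict:
    """Step 6: attach the features the model needs."""
    enriched = dict(event)
    enriched.update(WEATHER_BY_ROUTE[event["route"]])
    enriched["hours_last_week"] = HOURS_BY_DRIVER[event["driverId"]]
    return enriched


def normalize(event: Dict) -> Dict:
    """Step 7: scale hours into the 0..1 range (assumes a 70 hour cap)."""
    out = dict(event)
    out["hours_norm"] = min(event["hours_last_week"] / 70.0, 1.0)
    return out


def score(event: Dict) -> Dict:
    """Step 8: apply the logistic model to the prepared features."""
    z = (MODEL["bias"]
         + MODEL["foggy"] * event["foggy"]
         + MODEL["hours_norm"] * event["hours_norm"])
    out = dict(event)
    out["violation_probability"] = 1.0 / (1.0 + math.exp(-z))
    return out


def alert(event: Dict, threshold: float = 0.5) -> Dict:
    """Step 9: flag events whose score crosses the threshold."""
    out = dict(event)
    out["alert"] = event["violation_probability"] >= threshold
    return out


def process(event: Dict) -> Dict:
    return alert(score(normalize(enrich(event))))


risky = process({"driverId": 11, "route": "route-17"})
safe = process({"driverId": 12, "route": "route-4"})
```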
SAM’s Model Registry and PMML Processor
→ Model Registry
– SAM has a repository to store and manage PMML-based predictive models
– First-class features like version, evolution policies, etc. will be added in a future release
→ PMML Processor
– Processor that can use a model from the registry and score it against the input stream of events coming in
SAM Extensibility: Custom Processors, UDF, UDAFs
→ Custom Components
– Most users will want to build custom components to meet certain requirements.
– SAM provides the ability to build custom components using the SAM SDK
– The jars can then be uploaded into SAM via the User Interface
→ 3 Types of Custom Components
– Custom Processors
– Custom UDF
• User-defined functions that are used by the Projection processor
– Custom UDAFs
• User-defined aggregate functions that are used by the Aggregate processor.
– SDK can be used to create custom UDF functions for windowed aggregations
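A user-defined aggregate function can be pictured as three small operations: create an empty state, fold one value into it, and produce the final result. The Python sketch below shows that shape applied over fixed-size windows. The `init`/`add`/`result` method names and the `AverageSpeed` class are assumptions for illustration. The real SAM SDK is a Java API, and its actual interfaces should be taken from its documentation.

```python
from typing import List, Tuple


class AverageSpeed:
    """Hypothetical user-defined aggregate: mean of the values in a window."""

    def init(self) -> Tuple[float, int]:
        # Aggregate state: (running sum, count).
        return (0.0, 0)

    def add(self, state: Tuple[float, int], value: float) -> Tuple[float, int]:
        total, count = state
        return (total + value, count + 1)

    def result(self, state: Tuple[float, int]) -> float:
        total, count = state
        return total / count if count else 0.0


def aggregate_windows(values: List[float], window_size: int, udaf) -> List[float]:
    """Apply the aggregate over consecutive, non-overlapping windows of window_size values."""
    results = []
    for start in range(0, len(values), window_size):
        state = udaf.init()
        for value in values[start:start + window_size]:
            state = udaf.add(state, value)
        results.append(udaf.result(state))
    return results


averages = aggregate_windows([60, 70, 80, 90, 100], 2, AverageSpeed())
```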
Streaming Split Join Pattern
→ 3 enrichments have to be performed on the event stream to feed into the model:
– From lat, long and time, query weather conditions
– From driverId, look up information about the driver’s certification and wage plan
– From driverId, look up information about how many miles and hours the driver was on the road last week
→ Streaming Split Join Pattern
– Complex pattern that allows parallel processing to decrease latency (used by Apache Metron extensively)
1. Create a splitJoin key
2. Split the stream into n, where n is the number of different enrichments you want to do
3. Join the n streams based on the splitJoin key
A complex pattern to implement, which SAM allows the user to do simply with no code!
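The three steps of the split-join pattern can be sketched in Python with a thread pool. This only illustrates the pattern and is not how SAM or Metron implement it. The three enrichment functions and their rules are invented placeholders for the weather, certification, and timesheet lookups.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Tuple


# Invented placeholder enrichments; real ones would call external services.
def weather(event: Dict) -> Dict:
    return {"foggy": event["lat"] > 40.0}


def certification(event: Dict) -> Dict:
    return {"certified": event["driverId"] % 2 == 1}


def timesheet(event: Dict) -> Dict:
    return {"hours_last_week": 70 if event["driverId"] == 11 else 35}


def _run_branch(key: str, fn: Callable[[Dict], Dict], event: Dict) -> Tuple[str, Dict]:
    # Each branch carries the split-join key so its output can be matched later.
    return key, fn(event)


def split_join(events: List[Dict], enrichments: List[Callable[[Dict], Dict]]) -> List[Dict]:
    joined: Dict[str, Dict] = {}
    futures = []
    with ThreadPoolExecutor(max_workers=max(1, len(enrichments))) as pool:
        for event in events:
            # 1. Create a split-join key for this event.
            key = str(uuid.uuid4())
            joined[key] = dict(event)
            # 2. Split: one parallel branch per enrichment.
            for fn in enrichments:
                futures.append(pool.submit(_run_branch, key, fn, event))
        # 3. Join: merge every branch result into the event with the same key.
        for future in futures:
            key, fields = future.result()
            joined[key].update(fields)
    return list(joined.values())


enriched = split_join(
    [
        {"driverId": 11, "lat": 40.3, "long": -74.6},
        {"driverId": 12, "lat": 39.9, "long": -75.1},
    ],
    [weather, certification, timesheet],
)
```

Because the branches run in parallel, the latency per event is closer to the slowest enrichment than to the sum of all three.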
Stream Insight Module for Business Analysts
→ A tool to create time-series and real-time analytics dashboards, charts and graphs
→ 30+ visualization charts out of the box with customization capability
→ Druid is the Analytics Engine that powers the Stream Insight Module.
Streaming Analytics Manager
Set Up an Environment for SAM
Hortonworks SAM Canvas to build the Streaming Analytics App without writing a line of code
Hortonworks SAM App Dashboard
Schema Registry Dashboard and Details of One Schema
Contact:
[email protected]/futureofdata-princeton
community.hortonworks.com/users/9304/tspann.html
Hortonworks Community Connection
Read access for everyone, join to participate and be recognized
• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories
Community Engagement
Participate now at: community.hortonworks.com
4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets
One Website!