Introduction to HDF 3.0

Post on 22-Jan-2018

© Hortonworks Inc. 2011–2017. All Rights Reserved

Timothy Spann
2017 Future of Data – Princeton Meetup
June 20, 2017
Hosted by TRAC Intermodal

Introduction to HDF 3.0

• Schema Registry – Milind Pandit
• HDF Streaming Updates – Tim Spann
• EDW Optimization with Hadoop and HDF – Gregory C. Keys, PhD

Ambari Integration

Format and Schema Aware Efficient Flow Management

• Provide processors for schema-aware record structure for common processing patterns
– Split, Enrich, Partition, Convert, Query (SQL queries powered by Apache Calcite)
– Put/Get records between NiFi and Kafka, Elasticsearch, RDBMS (more soon)
– Easy bridging to/from columnar data formats like ORC or Parquet

• Separate format/schema-specific logic into extensible record readers and writers
– Developers can write new readers/writers
– Users can create new readers/writers with scripting, live in production!

• So what?
– Format- and schema-aware processing *with* generic reusable components
– Maintains full provenance/lineage trail
– Dramatic speed/efficiency increase per node
– Integration with Hortonworks Schema Registry and extensible for others
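
The separation of format logic (readers and writers) from generic processing can be sketched in a few lines of Python. This is only an illustration of the idea, not NiFi's API: `SCHEMA`, `read_csv_records`, `write_json_records` and `convert` are hypothetical names, and the schema is a plain dict rather than one fetched from a schema registry.

```python
import csv
import io
import json

# Hypothetical schema: field name -> Python type used to coerce raw values.
SCHEMA = {"driverId": int, "speed": float, "city": str}

def read_csv_records(text, schema):
    """Reader: parse CSV text into schema-typed records (dicts)."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {name: cast(row[name]) for name, cast in schema.items()}

def write_json_records(records):
    """Writer: serialize records as one JSON array."""
    return json.dumps(list(records))

def convert(text, reader, writer, schema):
    """Generic step: it knows nothing about CSV or JSON."""
    return writer(reader(text, schema))

csv_text = "driverId,speed,city\n11,82.5,Princeton\n12,64.0,Trenton\n"
json_text = convert(csv_text, read_csv_records, write_json_records, SCHEMA)
```

Swapping in a different reader or writer changes the format without touching `convert`, which is the point the slide makes about reusable components.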

Record Reader CS

Record Writer CS

‘QueryRecord’ Processor – Treat streaming records as tables
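
To make "records as tables" concrete, here is a minimal sketch that uses Python's built-in `sqlite3` in place of NiFi and Apache Calcite. The table name `FLOWFILE` follows the name QueryRecord uses for the incoming records; the helper `query_records`, the column names and the sample data are assumptions made for illustration.

```python
import sqlite3

def query_records(records, sql):
    """Load a batch of records into an in-memory table and run SQL on it."""
    conn = sqlite3.connect(":memory:")
    conn.row_factory = sqlite3.Row
    conn.execute("CREATE TABLE FLOWFILE (driverId INTEGER, speed REAL, city TEXT)")
    conn.executemany(
        "INSERT INTO FLOWFILE VALUES (:driverId, :speed, :city)", records
    )
    rows = [dict(row) for row in conn.execute(sql)]
    conn.close()
    return rows

# Made-up batch of records.
records = [
    {"driverId": 11, "speed": 82.5, "city": "Princeton"},
    {"driverId": 12, "speed": 64.0, "city": "Trenton"},
    {"driverId": 11, "speed": 91.0, "city": "Princeton"},
]
speeding = query_records(
    records, "SELECT driverId, speed FROM FLOWFILE WHERE speed > 80 ORDER BY speed"
)
```

Each query result would correspond to one outgoing set of records, so filtering, projection and aggregation all reduce to writing SQL.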

Component Versioning

Stream Processing – Introducing Streaming Analytics Manager (SAM)

Streaming Analytics Manager

A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop user experience

SAM – Write Complex Streaming Applications With No Code

Streaming Analytics Manager

• A brand new product module in the HDF stack to design, develop, deploy and manage streaming analytics apps with a drag-and-drop paradigm
– Build streaming analytics applications that do event correlation, context enrichment, complex pattern matching, analytical aggregations and creation of alerts/notifications when insights are discovered.
– Give coders the power to add key functions and extend the platform (add custom sinks, processors, spouts, etc.)

SAM’s Value Proposition

• Build and deploy complex stream applications without writing any code
• Only open source tool in the market with a graphical programming paradigm
• Speed time-to-market for complex streaming analytics applications
• Build streaming analytics applications without specialized skill sets
• Decouple the data format from the streaming application itself while being schema aware
• Support multiple underlying streaming engines

Stream Builder Module for App Developers

• Builder components, shown on the canvas palette, are the building blocks used by the app developer to build streaming applications
• Drag and drop to build a working streaming application without writing a single line of code
• 4 Types of Components: Sources, Processors, Sinks and Custom

Stream Insight Module for Business Analysts

• A tool to create real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box with customization capability
• Druid is the analytics engine that powers the Stream Insight Module.

Stream Ops Module for IT Operations

• Create and manage different environments in which individual streaming applications will be built
• Environments consist of services such as HDFS, Kafka and Storm from different service pools
• Save time and reduce operational overhead with the same drag-and-drop paradigm as the Stream Builder module

SAM is All About Doing Real-Time Analytics on the Stream

Real-Time Analytics

• Real-Time Prescriptive Analytics – What should we do right now?
• Real-Time Predictive Analytics – What could happen now/soon?
• Real-Time Descriptive Analytics – What is happening right now?

Real-Time Prescriptive Analytics

• Question: What should we do right now?
• Context: It is rainy, the driver has been on the road for 12 hours, and he has had 30 high-speed alerts over a 3-minute window in the last 2 hours.
• Answer: Dispatch a radio call to the driver to slow down
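
A prescriptive rule like this one can be written as a small predicate over the event context. The sketch below is a hypothetical illustration: `DriverContext`, `recommend_action` and the default thresholds are assumptions taken from the numbers on the slide, not SAM's rule syntax.

```python
from dataclasses import dataclass

@dataclass
class DriverContext:
    weather: str
    hours_on_road: float
    speeding_alerts: int  # alerts counted over the recent window

def recommend_action(ctx, max_hours=12, max_alerts=30):
    """Return an action when all three risk conditions from the slide hold."""
    risky = (
        ctx.weather == "rainy"
        and ctx.hours_on_road >= max_hours
        and ctx.speeding_alerts >= max_alerts
    )
    return "dispatch radio call: slow down" if risky else "no action"

action = recommend_action(DriverContext("rainy", 12, 30))
```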

Real-Time Predictive Analytics

• Question: No violation events, but what might happen that I need to be worried about?
• My data science team has a model that can predict that based on
– Weather
– Roads
– Driver HR info like driver certification status, wage plan
– Driver timesheet info like hours and miles logged over the last week

Building the Predictive Model on HDP

1. Explore a small subset of events to identify predictive features and make a hypothesis, e.g. “foggy weather causes driver violations”
2. Identify suitable ML algorithms to train a model – we will use classification algorithms as we have labeled events data
3. Transform enriched events data to a format that is friendly to Spark MLlib – many ML libraries expect training data in a certain format
4. Train a logistic classification Spark model on YARN, with the above events as training input, and iterate to fine-tune the generated model

Logistic Regression Model
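
The deck trains its model with Spark MLlib on YARN. As a stand-in that runs anywhere, the following sketch fits a logistic regression with plain batch gradient descent in standard-library Python. The features, the six training rows and the hyperparameters are all made up for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, labels, lr=0.5, epochs=2000):
    """Fit weights and bias by batch gradient descent on the log loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j, xi in enumerate(x):
                grad_w[j] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(weights, bias, x):
    """Probability of a violation for one feature vector."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)

# Made-up training data: [is_foggy, hours_driven / 10] -> violation (1) or not (0).
rows = [[1, 1.2], [1, 0.9], [0, 1.1], [0, 0.3], [1, 0.2], [0, 0.5]]
labels = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic(rows, labels)
```

Calling `predict_proba(weights, bias, [1, 1.5])` then gives the estimated violation probability for a foggy day with many hours driven.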

Scoring the Predictive Model on HDF

5. Model Registry: Export the Spark MLlib model and import it into HDF’s Model Registry
6. Enrich with Features: Use SAM’s enrich/custom processors to enrich the event with the features required for the model
7. Transform/Normalize: Use SAM’s projection/custom processors to transform/normalize the streaming event and the features required for the model
8. Score Model: Use SAM’s PMML processor to score the model for each stream event with its required features
9. Alert/Notify/Action: Use SAM’s rule and notification processors to alert, notify and take action using the results of the model
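
Steps 6 to 9 form a per-event pipeline, which can be sketched as four chained functions. Everything here is an assumption made for illustration: the `MODEL` coefficients stand in for a model loaded from the registry (no PMML is parsed), the lookup tables stand in for external enrichment sources, and the function names are not SAM processor names.

```python
import math

# Hypothetical model parameters standing in for a model loaded from a registry.
MODEL = {"weights": {"is_foggy": 1.5, "hours_norm": 4.0}, "bias": -3.0}

# Hypothetical lookup tables standing in for external enrichment sources.
WEATHER_BY_ROUTE = {"route-1": "foggy", "route-2": "clear"}
HOURS_BY_DRIVER = {11: 62.0, 12: 20.0}

def enrich(event):
    """Step 6: attach the raw features the model needs."""
    return {**event,
            "weather": WEATHER_BY_ROUTE[event["routeId"]],
            "hours_last_week": HOURS_BY_DRIVER[event["driverId"]]}

def normalize(event):
    """Step 7: turn raw features into the numeric inputs the model expects."""
    return {**event,
            "is_foggy": 1.0 if event["weather"] == "foggy" else 0.0,
            "hours_norm": event["hours_last_week"] / 70.0}

def score(event, model=MODEL):
    """Step 8: logistic score in [0, 1] from the model's weights and bias."""
    z = model["bias"] + sum(w * event[name] for name, w in model["weights"].items())
    return {**event, "violation_probability": 1.0 / (1.0 + math.exp(-z))}

def alert(event, threshold=0.5):
    """Step 9: emit a notification when the score crosses the threshold."""
    if event["violation_probability"] >= threshold:
        return f"ALERT driver {event['driverId']}: high violation risk"
    return None

def process(event):
    return alert(score(normalize(enrich(event))))

message = process({"driverId": 11, "routeId": "route-1"})
```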

SAM’s Model Registry and PMML Processor

• Model Registry
– SAM has a repository to store and manage PMML-based predictive models
– First-class features like versioning, evolution policies, etc. will be added in a future release

• PMML Processor
– Processor that can use a model from the registry and score it against the incoming stream of events

SAM Extensibility: Custom Processors, UDFs, UDAFs

• Custom Components
– Most users will want to build custom components to meet certain requirements.
– SAM provides the ability to build custom components using the SAM SDK
– The jars can then be uploaded to SAM via the user interface

• 3 Types of Custom Components
– Custom Processors
– Custom UDFs
  • User-defined functions that are used by the Projection processor
– Custom UDAFs
  • User-defined aggregate functions that are used by the Aggregate processor.
– The SDK can be used to create custom UDFs for windowed aggregations
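
The difference between a UDF and a UDAF is easiest to see side by side. The SAM SDK is Java-based, so the Python below is only an analogy: the `init`/`add`/`result` shape is an assumed, commonly used aggregate interface, and `to_mph`, `AverageSpeed` and `aggregate_window` are hypothetical names.

```python
def to_mph(kmh):
    """UDF: maps one input value to one output value."""
    return kmh * 0.621371

class AverageSpeed:
    """UDAF: folds many values from a window into a single result."""

    def init(self):
        return (0.0, 0)  # (running sum, count)

    def add(self, state, value):
        total, count = state
        return (total + value, count + 1)

    def result(self, state):
        total, count = state
        return total / count if count else None

def aggregate_window(udaf, values):
    """Drive a UDAF over all values that fall into one window."""
    state = udaf.init()
    for value in values:
        state = udaf.add(state, value)
    return udaf.result(state)

window = [100.0, 120.0, 80.0]  # made-up speeds in km/h within one window
avg_mph = aggregate_window(AverageSpeed(), [to_mph(v) for v in window])
```

The UDF runs once per event, while the UDAF keeps state across all events in the window and produces one value when the window closes.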

Streaming Split Join Pattern

• 3 enrichments have to be performed on the event stream to feed into the model:
– From lat, long and time, query weather conditions
– From driverId, look up information about the driver’s certification and wage plan
– From driverId, look up information about how many miles and hours the driver was on the road last week

• Streaming Split Join Pattern
– Complex pattern that allows parallel processing to decrease latency (used by Apache Metron extensively)

1. Create a splitJoin key
2. Split the stream into n, where n is the number of different enrichments you want to do
3. Join the n streams based on the splitJoin key

A complex pattern to implement, which SAM allows the user to do simply with no code!
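
The three steps of the split-join pattern can be sketched for a single event with a thread pool. This is a simplified illustration under stated assumptions: the three enrichment functions return made-up values instead of calling external services, and `split_join` handles one event at a time, whereas a real stream would group interleaved partial results from many events by key.

```python
import uuid
from concurrent.futures import ThreadPoolExecutor

# Hypothetical enrichment functions; real ones would call external services.
def weather_enrichment(event):
    return {"weather": "foggy" if event["lat"] > 40.0 else "clear"}

def hr_enrichment(event):
    return {"certified": event["driverId"] % 2 == 1, "wagePlan": "miles"}

def timesheet_enrichment(event):
    return {"hoursLastWeek": 62.0, "milesLastWeek": 2700.0}

ENRICHMENTS = [weather_enrichment, hr_enrichment, timesheet_enrichment]

def split_join(event, enrichments=ENRICHMENTS):
    # 1. Create a split-join key that identifies this event across branches.
    key = str(uuid.uuid4())
    keyed = {**event, "splitJoinKey": key}

    # 2. Split: run each enrichment on its own copy of the event, in parallel.
    def branch(fn):
        return keyed["splitJoinKey"], fn(dict(keyed))

    with ThreadPoolExecutor(max_workers=len(enrichments)) as pool:
        partials = list(pool.map(branch, enrichments))

    # 3. Join: merge the partial results that carry the same key.
    joined = dict(keyed)
    for partial_key, fields in partials:
        if partial_key == key:
            joined.update(fields)
    return joined

enriched = split_join({"driverId": 11, "lat": 40.35, "long": -74.66})
```

Because the branches run concurrently, the latency of the enrichment stage approaches that of the slowest lookup rather than the sum of all three.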

Stream Insight Module for Business Analysts

• A tool to create time-series and real-time analytics dashboards, charts and graphs
• 30+ visualization charts out of the box with customization capability
• Druid is the analytics engine that powers the Stream Insight Module.

Streaming Analytics Manager

Set Up an Environment for SAM

Hortonworks SAM Canvas to build the Streaming Analytics App without writing a line of code

Hortonworks SAM App Dashboard

Schema Registry Dashboard and Details of One Schema

Contact:

Timothy Spann
@PaaSDeV
www.meetup.com/futureofdata-princeton
community.hortonworks.com/users/9304/tspann.html

Hortonworks Community Connection

Read access for everyone, join to participate and be recognized

• Full Q&A Platform (like StackOverflow)
• Knowledge Base Articles
• Code Samples and Repositories

Community Engagement

Participate now at: community.hortonworks.com

4,000+ Registered Users
10,000+ Answers
15,000+ Technical Assets

One Website!
