intro data streaming slidesnotes... · 2018. 9. 24. · data streaming operators two main types:...

Post on 31-Dec-2020

3 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

IntroductiontoDataStreaming

VincenzoGulisano,Ph.D.

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 2

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 3

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 4

IoTenablesforincreasedawareness,security,power-efficiency,...

largeIoTsystemsarecomplex

traditionaldataanalysistechniquesalonearenotadequate!

AMIs[1,2,3,4] VNs[5,6]

• demand-response• scheduling[7]• micro-grids• detectionofmediumsizeblackouts[8]• detectionofnontechnicallosses• ...

5

• autonomousdriving• platooning• accidentdetection[9]• variabletolls[9]• congestionmonitoring[10]• ...

IoTenablesforincreasedawareness,security,power-efficiency,...

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

AMIs VNs

6

largeIoTsystemsarecomplex

ThedatastreamingparadigmanditsuseinFogarchitectures

Characteristics[15]:1. edgelocation2. locationawareness3. lowlatency4. geographicaldistribution5. large-scale

6. supportformobility7. real-timeinteractions8. predominanceofwireless9. heterogeneous10. interoperability/federation11. interactionwiththecloud

VincenzoGulisano

7

traditionaldataanalysistechniquesalonearenotadequate![13,14]

ThedatastreamingparadigmanditsuseinFogarchitectures

1. doestheinfrastructureallowforbillionsofreadingsperdaytobetransferredcontinuously?

2. thelatencyincurredwhiletransferringdata,doesthatunderminetheutilityoftheanalysis?

3. isitsecuretoconcentrateallthedatainasingleplace?[11]

4. isitsmarttogiveawayfine-graineddata?[12]

VincenzoGulisano

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 8

MainMemory

Motivation

DBMSvs.DSMS

Disk

1 Data

QueryProcessing

3 Queryresults

2 Query

MainMemory

QueryProcessing

ContinuousQueryData Query

results

9ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

Beforewestart...aboutdatastreamingandStreamProcessingEngines(SPEs)

10

AnincompletelistofSPEs(cf.relatedworkin[16]):

time

BorealisThe Aurora Project

STanfordstREamdatAManager

NiagaraCQ

COUGAR

StreamCloud

Coveringallofthem/discussingwhichusecasesarebestforeachoneoutofscope...thefollowingshowconnectionbetweenwhatisbeingpresentedandacertainSPE

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

datastream:unboundedsequenceoftuplessharingthesameschema

11

Example:vehicles’speedreports

time

Field Field

vehicleid text

time(secs) text

speed(Km/h) double

Xcoordinate double

Ycoordinate double

A 8:00 55.5 X1 Y1

Let’sassumeeachsource(e.g.,vehicle)producesanddeliversatimestampsortedstream

A 8:07 34.3 X3 Y3

A 8:03 70.3 X2 Y2

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

continuousquery(orsimplyquery):DirectedAcyclicGraph(DAG)ofstreamsandoperators

12

OP

OP

OP

OP OP

OP OP

sourceop(1+outstreams)

sinkop(1+instreams)

stream

op(1+in,1+outstreams)

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

datastreamingoperators

Twomaintypes:• Statelessoperators• donotmaintainanystate• one-by-oneprocessing• iftheymaintainsomestate,suchstatedoesnotevolvedependingonthetuplesbeingprocessed

• Statefuloperators• maintainastatethatevolvesdependingonthetuplesbeingprocessed• produceoutputtuplesthatdependonmultipleinputtuples

13

OP

OP

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statelessoperators

14

Filter ...

Map

Union...

Filter/routetuplesbasedonone(ormore)conditions

Transformeachtuple

Mergemultiplestreams(withthesameschema)intoone

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statelessoperators

15

Filter ...

Map

Union...

source:http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statelessoperators

16

Filter ...

Map

Union...

source:https://ci.apache.org/projects/flink/flink-docs-release-1.0/apis/streaming/index.html

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statelessoperators

17

Filter ...

Map

Union...

source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statefuloperators

18

Aggregateinformationfrommultipletuples(e.g.,max,min,sum,...)

Jointuplescomingfrom2streamsgivenacertainpredicate

Aggregate

Join

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

statefuloperators

19

source:http://storm.apache.org/releases/2.0.0-SNAPSHOT/Trident-tutorial.html

source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams

source:http://spark.apache.org/docs/latest/streaming-programming-guide.html#transformations-on-dstreams

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

Waitamoment!

ifstreamsareunbounded,howcanweaggregateorjoin?

20ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

windows andstatefulanalysis[16]

Statefuloperationsaredoneoverwindows:• Time-based(e.g.,tuplesinthelast10minutes)• Tuple-based(e.g.,giventhelast50tuples)

21

time[8:00,9:00)

[8:20,9:20)

[8:40,9:40)

Usuallyapplicationsrelyontime-basedslidingwindows

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

time-basedslidingwindowaggregation(count)

22

Counter:4

time[8:00,9:00)

8:05 8:15 8:22 8:45 9:05

Output:4

Counter:1Counter:2

Counter:3

Counter:3

time

8:05 8:15 8:22 8:45 9:05

[8:20,9:20)

weassumedeachsourceproducesanddeliversatimestampsortedstream!Whathappensifthisisnotthecase?

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

23

windows andstatefulanalysis

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

24

basicoperatorsanduser-definedoperators

Besidesasetofbasicoperators,SPEsusuallyallowtheusertodefinead-hocoperators(e.g.,whenexistingaggregationarenotenough)

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

samplequery

Foreachvehicle,raiseanalertifthespeedofthelatestreportismorethan2timeshigherthanitsaveragespeedinthelast30days.

25

time

A 8:00 55.5 X1 Y1 A 8:07 34.3 X3 Y3

A 8:03 70.3 X2 Y2

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

26

Field

vehicleid

time(secs)

speed(Km/h)

Xcoordinate

Ycoordinate

Computeaveragespeedforeach

vehicleduringthelast30days

Aggregate

Fieldvehicleid

time(secs)

avgspeed(Km/h)

Join

Checkcondition

Filter

Field

vehicleid

time(secs)

speed(Km/h)

Joinonvehicleid

Fieldvehicleid

time(secs)

avgspeed(Km/h)

speed(Km/h)

samplequery

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

27

A J F

samplequery

Notice:• thesamesemanticscanbedefinedinseveralways(usingdifferentoperatorsandcomposingthemindifferentways)• Usingmanybasicbuildingblockscaneasethetaskofdistributingandparallelizingtheanalysis(moreinthefollowing...)

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

Whydatastreaming,then?

Expressive

Online

Parallel&Distributed

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 28

Expressive

Online

Parallel&Distributed

A F M

A F M

A F M

A F M

FM

A

A

A

F

F

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 29

Samplequerythate.g.validatesdata/raisesalarms...

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 30

1. Distributeddeployment2. Paralleldeployment3. Orderinganddeterminism4. Shared-nothingvsshared-memoryparallelism5. Loadbalancing6. Elasticity7. Faulttolerance8. Datasharing

31

Challengesandresearchquestions

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

Beforewestart...

32

Followingexamplesarefromvehicularnetworks

Road-sideunitRSU Vehicle

Server

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

1- Distributeddeployment– wheretoplaceagivenoperator?[17,4,18,19]

33

M?

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

2- Paralleldeployment– howdoweparallelizetheanalysis? [20,21]

34

M

M A

A

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

35

M

M A

A

A 8:00 55.5 X1 Y1

A 8:07 34.3 X3 Y3

A 8:03 70.3 X2 Y2

A 8:00 55.5 X1 Y1

A 8:07 34.3 X3 Y3

A 8:03 70.3 X2 Y2

Whatiftuplewithtimestamp8:00arrivesaftertuplewithtimestamp8:07?

3– Orderinganddeterminism[22,23,24]

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

4– shared-nothingvs.shared-memoryparallelism[25]

36

M

M

...

A

AJ

...

Howtotakeadvantageofmulti-corearchitectures?

Howtoboostinter-nodeparallelismandintra-nodeparallelism?

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

5– loadbalancing&statetransfer[20,26]

37

IfweshifttheprocessingofacertainsubsetoftuplesfromnodeAtonodeB,howdotransferitspreviousstate?

MA

MA

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

6– elasticity[20,27]

38

How/whentoprovisionordecommissionnewresourcesdependingontheanalysis’costsfluctuations?

J

ThedatastreamingparadigmanditsuseinFogarchitectures

J J

J

VincenzoGulisano

7– faulttolerance[16,28,29]

39

Howtoreplaceafailednodeminimizingrecoverytime(makingittransparenttotheenduser)?

ThedatastreamingparadigmanditsuseinFogarchitectures

J

J

J

J

J

J

VincenzoGulisano

8– datasharing(differentialprivacy)[2,30,31,32]

40

Howtopreventprivacyleaks?

Supposeweareinterestedinpublishingvehicles’averagespeedoverawindowofonehour...

WecouldaggregatebyRSU!

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

8– datasharing(differentialprivacy)

41

Waitamoment!what ifasinglevehicleisconnectedtoacertainRSU?

Whetheracertainmechanismpreservesornottheprivacyoftheunderlyingdatadependsontheknowledgeoftheadversary

Differentialprivacyassumestheworstcasescenario!

ThedatastreamingparadigmanditsuseinFogarchitecturesVincenzoGulisano

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 42

Millionsofyearsofevolution

Millionsofsensors

• Storeinformation• Iteratemultipletimesoverdata• Think,donotrushthroughdecisions

• ”Hard-wired”routines• Real-timedecisions• High-throughput/low-latency

ShouldI(really)haveanextrapieceofcake?

Danger!!!Run!!!

Humans

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 43

Years/Decadesofevolution

Millionsofsensors

WhattrafficcongestionpatternscanIobservefrequently?

Don’ttakeover,carinoppositelane!

• Storeinformation• Iteratemultipletimesoverdata• Think,donotrushthroughdecisions

Databases,dataminingtechniques...

Datastreaming,distributedandparallelanalysis

• Continuousanalysis• Real-timedecisions• High-throughput/low-latency

Computers(cyber-physical/IoTsystems)

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 44

Agenda

• Motivation• Thedatastreamingprocessingparadigm• Challengesandresearchquestions• Conclusions• Bibliography

VincenzoGulisano ThedatastreamingparadigmanditsuseinFogarchitectures 45

Bibliography1. Zhou,Jiazhen,RoseQingyang Hu,andYiQian."Scalabledistributedcommunicationarchitecturestosupportadvanced

meteringinfrastructureinsmartgrid."IEEETransactionsonParallelandDistributedSystems23.9(2012):1632-1642.2. Gulisano,Vincenzo,etal."BES:DifferentiallyPrivateandDistributedEventAggregationinAdvancedMeteringInfrastructures."

Proceedingsofthe2ndACMInternationalWorkshoponCyber-PhysicalSystemSecurity.ACM,2016.3. Gulisano,Vincenzo,MagnusAlmgren,andMarinaPapatriantafilou."Onlineandscalabledatavalidationinadvancedmetering

infrastructures."IEEEPESInnovativeSmartGridTechnologies,Europe.IEEE,2014.4. Gulisano,Vincenzo,MagnusAlmgren,andMarinaPapatriantafilou."METIS:atwo-tierintrusiondetectionsystemforadvanced

meteringinfrastructures."InternationalConferenceonSecurityandPrivacyinCommunicationSystems.SpringerInternationalPublishing,2014.

5. Yousefi,Saleh,MahmoudSiadat Mousavi,andMahmoodFathy."Vehicularadhocnetworks(VANETs):challengesandperspectives."20066thInternationalConferenceonITSTelecommunications.IEEE,2006.

6. ElZarki,Magda,etal."Securityissuesinafuturevehicularnetwork."EuropeanWireless.Vol.2.2002.7. Georgiadis,Giorgos,andMarinaPapatriantafilou."Dealingwithstoragewithoutforecastsinsmartgrids:Problem

transformationandonlineschedulingalgorithm."Proceedingsofthe29thAnnualACMSymposiumonAppliedComputing.ACM,2014.

8. Fu,Zhang,etal."Onlinetemporal-spatialanalysisfordetectionofcriticaleventsinCyber-PhysicalSystems."BigData(BigData),2014IEEEInternationalConferenceon.IEEE,2014.

ThedatastreamingparadigmanditsuseinFogarchitectures 46VincenzoGulisano

Bibliography9. Arasu,Arvind,etal."Linearroad:astreamdatamanagementbenchmark."Proceedingsofthe

ThirtiethinternationalconferenceonVerylargedatabases-Volume30.VLDBEndowment,2004.

10. Lv,Yisheng,etal."Trafficflowpredictionwithbigdata:adeeplearningapproach."IEEETransactionsonIntelligentTransportationSystems16.2(2015):865-873.

11. Grochocki,David,etal."AMIthreats,intrusiondetectionrequirementsanddeploymentrecommendations."SmartGridCommunications(SmartGridComm),2012IEEEThirdInternationalConferenceon.IEEE,2012.

12. Molina-Markham,Andrés,etal."Privatememoirsofasmartmeter."Proceedingsofthe2ndACMworkshoponembeddedsensingsystemsforenergy-efficiencyinbuilding.ACM,2010.

13. Gulisano,Vincenzo,etal."Streamcloud:Alargescaledatastreamingsystem."DistributedComputingSystems(ICDCS),2010IEEE30thInternationalConferenceon.IEEE,2010.

14. Stonebraker,Michael,Uǧur Çetintemel,andStanZdonik."The8requirementsofreal-timestreamprocessing."ACMSIGMODRecord34.4(2005):42-47.

15. Bonomi,Flavio,etal."Fogcomputinganditsroleintheinternetofthings."ProceedingsofthefirsteditionoftheMCCworkshoponMobilecloudcomputing.ACM,2012.

ThedatastreamingparadigmanditsuseinFogarchitectures 47VincenzoGulisano

Bibliography16. Gulisano,VincenzoMassimiliano. StreamCloud:AnElasticParallel-DistributedStream

ProcessingEngine.Diss.Informatica,2012.17. Cardellini,Valeria,etal."Optimaloperatorplacementfordistributedstreamprocessing

applications."Proceedingsofthe10thACMInternationalConferenceonDistributedandEvent-basedSystems.ACM,2016.

18. Costache,Stefania,etal."UnderstandingtheData-ProcessingChallengesinIntelligentVehicularSystems."Proceedingsofthe2016IEEEIntelligentVehiclesSymposium(IV16).

19. Giatrakos,Nikos,Antonios Deligiannakis,andMinosGarofalakis."ScalableApproximateQueryTrackingoverHighlyDistributedDataStreams."Proceedingsofthe2016InternationalConferenceonManagementofData.ACM,2016.

20. Gulisano,Vincenzo,etal."Streamcloud:Anelasticandscalabledatastreamingsystem."IEEETransactionsonParallelandDistributedSystems23.12(2012):2351-2365.

21. Shah,Mehul A.,etal."Flux:Anadaptivepartitioningoperatorforcontinuousquerysystems."DataEngineering,2003.Proceedings.19thInternationalConferenceon.IEEE,2003.

ThedatastreamingparadigmanditsuseinFogarchitectures 48VincenzoGulisano

Bibliography22. Cederman,Daniel,etal."Briefannouncement:concurrentdatastructuresforefficientstreamingaggregation."Proceedingsof

the26thACMsymposiumonParallelisminalgorithmsandarchitectures.ACM,2014.23. Ji,Yuanzhen,etal."Quality-drivenprocessingofslidingwindowaggregatesoverout-of-orderdatastreams."Proceedingsofthe

9thACMInternationalConferenceonDistributedEvent-BasedSystems.ACM,2015.24. Ji,Yuanzhen,etal."Quality-drivendisorderhandlingforconcurrentwindowedstreamquerieswithsharedoperators."

Proceedingsofthe10thACMInternationalConferenceonDistributedandEvent-basedSystems.ACM,2016.25. Gulisano,Vincenzo,etal."Scalejoin:Adeterministic,disjoint-parallelandskew-resilientstreamjoin."BigData(BigData),2015

IEEEInternationalConferenceon.IEEE,2015.26. Ottenwälder,Beate,etal."MigCEP:operatormigrationformobilitydrivendistributedcomplexeventprocessing."Proceedings

ofthe7thACMinternationalconferenceonDistributedevent-basedsystems.ACM,2013.27. DeMatteis,Tiziano,andGabrieleMencagli."Keepcalmandreactwithforesight:strategiesforlow-latencyandenergy-efficient

elasticdatastreamprocessing."Proceedingsofthe21stACMSIGPLANSymposiumonPrinciplesandPracticeofParallelProgramming.ACM,2016.

28. Balazinska,Magdalena,etal."Fault-toleranceintheBorealisdistributedstreamprocessingsystem."ACMTransactionsonDatabaseSystems(TODS)33.1(2008):3.

29. CastroFernandez,Raul,etal."Integratingscaleoutandfaulttoleranceinstreamprocessingusingoperatorstatemanagement."Proceedingsofthe2013ACMSIGMODinternationalconferenceonManagementofdata.ACM,2013.

ThedatastreamingparadigmanditsuseinFogarchitectures 49VincenzoGulisano

Bibliography

30. Dwork,Cynthia."Differentialprivacy:Asurveyofresults."InternationalConferenceonTheoryandApplicationsofModelsofComputation.SpringerBerlinHeidelberg,2008.

31. Dwork,Cynthia,etal."Differentialprivacyundercontinualobservation."Proceedingsoftheforty-secondACMsymposiumonTheoryofcomputing.ACM,2010.

32. Kargl,Frank,ArikFriedman,andRoksana Boreli."Differentialprivacyinintelligenttransportationsystems."ProceedingsofthesixthACMconferenceonSecurityandprivacyinwirelessandmobilenetworks.ACM,2013.

ThedatastreamingparadigmanditsuseinFogarchitectures 50VincenzoGulisano

top related