d2.6 learning resources 3 · 2018-02-01 · project acronym: edsa project full name: european data...

36
Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable Editor: Alexander Mikroyannidis (OU) Other contributors: Huw Fryer (SOTON), Shatha Jaradat (KTH), Mihhail Matskin (KTH), Angi Voss (Fraunhofer), Ryan Goodman (ODI), David Tarrant (ODI), Emily Vacher (ODI), Gillian Whitworth (ODI) Deliverable Reviewers: Inna Koval (JSI), Angi Voss (Fraunhofer) Deliverable due date: 31/01/2018 Submission date: 24/01/2018 Distribution level: P Version: 1.0 This document is part of a research project funded by the Horizon 2020 Framework Programme of the European Union

Upload: others

Post on 25-Jun-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Projectacronym: EDSA

Projectfullname: EuropeanDataScienceAcademy

Grantagreementno: 643937

D2.6Learningresources3

DeliverableEditor: AlexanderMikroyannidis(OU)

Othercontributors:

HuwFryer(SOTON),ShathaJaradat(KTH),MihhailMatskin(KTH),AngiVoss(Fraunhofer),RyanGoodman(ODI),DavidTarrant(ODI),EmilyVacher(ODI),GillianWhitworth(ODI)

DeliverableReviewers:

InnaKoval(JSI),AngiVoss(Fraunhofer)

Deliverableduedate: 31/01/2018

Submissiondate: 24/01/2018

Distributionlevel: P

Version: 1.0

ThisdocumentispartofaresearchprojectfundedbytheHorizon2020FrameworkProgrammeoftheEuropeanUnion

Page 2: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

ChangeLog

Version Date Amendedby Changes

0.1 14/11/2017 AlexanderMikroyannidis

Outlineandresponsibilitiesofcontributors.

0.2 10/01/2018 AlexanderMikroyannidis

Versionforinternalreview.

0.3 23/01/2018 AlexanderMikroyannidis

Revisedversion.

1.0 24/01/2018 AlexanderMikroyannidis

FinalQA.

Page 3: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page3of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

TableofContents

ChangeLog............................................................................................................................................................................................2 TableofContents...............................................................................................................................................................................3 ListofTables........................................................................................................................................................................................4 ListofFigures......................................................................................................................................................................................4 1. Executivesummary...............................................................................................................................................................5 2. Introduction..............................................................................................................................................................................6 3. Programming/ComputationalThinking(RandPython)................................................................................8 3.1 Moduleoverview.........................................................................................................................................................8 3.1.1 LearningObjectives....................................................................................................................................................8 3.1.2 SyllabusandTopicDescriptions.........................................................................................................................8 3.1.3 AlgorithmicThinking................................................................................................................................................8 3.1.4 Computingasanabstraction.................................................................................................................................9 3.1.5 TheWebandtheCloud............................................................................................................................................9 3.1.6 DataManagement....................................................................................................................................................10 3.1.7 PythonProgramming.............................................................................................................................................10 3.2 Learningmaterials&deliverymethods.......................................................................................................10 3.3 Relevancetocurriculum.......................................................................................................................................12 3.4 Relevancetodemandanalysis..........................................................................................................................12 3.5 Furtherdevelopmentplans................................................................................................................................12

4. DataIntensiveComputing..............................................................................................................................................13 4.1 Moduleoverview......................................................................................................................................................13 4.1.1 PartI(Data-IntensiveComputing)SyllabusDescription....................................................................13 4.1.2 PartII(AdvancedTopicsinDistributedSystems)SyllabusDescription....................................15 4.1.3 PartIII(ScalableMachineLearningandDeepLearning)SyllabusDescription......................17 4.2 Learningmaterials&deliverymethods.......................................................................................................19 4.3 Relevancetocurriculum.......................................................................................................................................22 4.4 Relevancetodemandanalysis..........................................................................................................................22 4.5 Furtherdevelopmentplans................................................................................................................................22

5. SocialMediaAnalytics......................................................................................................................................................23 5.1 Moduleoverview......................................................................................................................................................23 5.2 Learningmaterials&deliverymethods.......................................................................................................26 5.3 Relevancetocurriculum.......................................................................................................................................27 5.4 Relevancetodemandanalysis..........................................................................................................................27 5.5 Furtherdevelopmentplans................................................................................................................................28

6. DataExploitationincludingdatamarketsandlicensing................................................................................29

Page 4: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page4of36EDSAGrantAgreementno.643937

6.1 Moduleoverview......................................................................................................................................................29 6.1.1 Course1:Exploitingdatasciencetocreatevalue...................................................................................29 6.1.2 Course2:Datasciencemeansbusiness........................................................................................................30 6.1.3 Course3:Datascienceecosystems.................................................................................................................30 6.2 Learningmaterials&deliverymethods.......................................................................................................31 6.3 Relevancetocurriculum.......................................................................................................................................31 6.4 Relevancetodemandanalysis..........................................................................................................................32 6.5 Furtherdevelopmentplans................................................................................................................................35

7. Conclusion...............................................................................................................................................................................36

ListofTablesTable1:ThefinalEDSAcurriculum.---------------------------------------------------------------------------------6

ListofFiguresFigure1:SampleofthelearningmaterialsoftheProgramming/ComputationalThinkingmodule.11 Figure2:ScreenshotoftheProgramming/ComputationalThinkingmodulehostedbytheFutureLearnplatform.-------------------------------------------------------------------------------------------------11 Figure3:ScreenshotfromtheIntroductionpresentationofPartI(Data-IntensiveComputing).----19 Figure4:ScreenshotfromtheBigDataStoragepresentationofPartI(Data-IntensiveComputing).20 Figure5:ScreenshotfromthelearningmaterialsofPartII(AdvancedTopicsinDistributedSystems).------------------------------------------------------------------------------------------------------------------------------20 Figure6:ScreenshotfromthelearningmaterialsofPartII(AdvancedTopicsinDistributedSystems).------------------------------------------------------------------------------------------------------------------------------21 Figure7:ScreenshotfromthelearningmaterialsofPartIII(ScalableMachineLearningandDeepLearning).-----------------------------------------------------------------------------------------------------------------21 Figure8:ScreenshotfromthelearningmaterialsofPartIII(ScalableMachineLearningandDeepLearning).-----------------------------------------------------------------------------------------------------------------22 Figure9:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.--------------26 Figure10:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.------------27 Figure11:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.------------27 Figure12:GoogletrendanalysisforBigdataandDataScience.---------------------------------------------32 Figure13:TheEDSAdashboardskillschart.---------------------------------------------------------------------33

Page 5: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page5of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

1. ExecutivesummaryThisdeliverablepresentsthemodulesthathavebeenaddedtotheEDSAcoursesportfolioduringthethird year (Y3)of theproject.According to the finalEDSA curriculumpresented inD2.3(M30), thefollowing4moduleswerescheduledforreleaseduringY3:

• Programming/ComputationalThinking(RandPython)• DataIntensiveComputing• SocialMediaAnalytics• DataExploitationincludingdatamarketsandlicensing

Theprojecthasbeenfocusedonbridgingthedatascienceskillsgapviaasupplyoftrainingmaterials.As a result, the EDSA courses portfolio includes a wide range of courses offered by renownededucational institutions both inside and outside the project consortium. These courses are selectedbasedontheirrelevancetotheEDSAcurriculumandtheEDSAdemandanalysis.

Thisdeliverablepresentsthelearningresourcesthathavebeenaddedtotheproject’scoursesportfolioinY3, inorder tocover the topicsof the finalEDSAcurriculumandaddress thecurrentdemandasidentifiedbytheEDSAdemandanalysis.

Page 6: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page6of36EDSAGrantAgreementno.643937

2. IntroductionTheEDSAcoursesportfolioincorporatesawiderangeofhighqualitylearningresources,eitherofferedbyprojectpartnersorbythirdparties.Thesecoursesareavailableas:

• MassiveOpenOnlineCourses(MOOCs)• Face-to-facecourses• Onlinecourses• Blendedcourses(deliveredface-to-faceandonline)

ThemaincriteriafortheselectionofcoursesforinclusionintheEDSAcoursesportfolioaretheEDSAcurriculumandtheEDSAdemandanalysis.CoursesareselectedbasedontheirpotentialofaddressingtheEDSAcurriculumtopicsaswellasthetrainingneedsofdatascientistsasidentifiedbytheEDSAdemandanalysis.WithregardstocompliancewiththeEDSAdemandanalysis,coursesareevaluatedagainsttherecommendationsoftheStudyEvaluationReport(D1.4).WithregardstocompliancewiththeEDSAcurriculum,coursesareevaluatedagainstthelatestversionofthecurriculumandthetopicsitaddresses.

ThefinalversionoftheEDSAcurriculum,togetherwithadiscussionaboutitsobjectivesandhowthesehavebeenachieved,hasbeenpresented inD2.3and isshown inTable1. In this table,modulesaregroupedbythestagetheybelongto.ThemodulesscheduledforreleaseinY3arehighlightedandarethefollowing:

1. Programming/ComputationalThinking(RandPython)2. DataIntensiveComputing3. SocialMediaAnalytics4. DataExploitationincludingdatamarketsandlicensing

Themodulespresentedinthisdeliverablearethereforecentredaroundtheabove4topics.

Table1:ThefinalEDSAcurriculum.

Topic Stage Schedule AllocatedPartner

FoundationsofDataScience Foundations M6 SOTON

FoundationsofBigData Foundations M6 JSI

Statistical/MathematicalFoundations Foundations M18 JSI

Programming/ComputationalThinking(RandPython)

Foundations M30 SOTON

BigDataArchitecture StorageandProcessing

M6 Fraunhofer

DistributedComputing StorageandProcessing

M6 KTH

Page 7: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page7of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

DataManagementandCuration StorageandProcessing

M18 TU/e

LinkedDataandtheSemanticWeb StorageandProcessing

M18 SOTON

DataIntensiveComputing StorageandProcessing

M30 KTH

MachineLearning,DataMiningandBasicAnalytics

Analysis M6 Persontyle

ProcessMining Analysis M6 TU/e

BigDataAnalytics Analysis M18 Fraunhofer

DataVisualisationandStorytelling InterpretationandUse

M18 ODI

SocialMediaAnalytics InterpretationandUse

M30 Fraunhofer

DataExploitationincludingdatamarketsandlicensing

InterpretationandUse

M30 ODI

The remainderof thisdeliverablepresents themodules thathavebeen incorporated into theEDSAcoursesportfolioinY3,basedontheirrelevancetotheEDSAcurriculumandtheEDSAdemandanalysis.Foreachmodule,wepresentanoverviewofitsobjectivesandlearningoutcomes,weidentifythetypesoflearningmaterialsusedandhowthesearedelivered,andwediscusshoweachmoduleisrelatedtothe EDSA curriculum and the demand analysis, as well as the plans in place for their furtherdevelopment.

Page 8: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page8of36EDSAGrantAgreementno.643937

3. Programming/ComputationalThinking(RandPython)

3.1 ModuleoverviewThisisacourseaboutgettingpeopletothinklikecomputerscientists. Manypeoplethinkcomputerscienceisonlyaboutwritingcode.Whilstthisisapartofit,inthiscourseweaimtoshowmoreoftheprocessesbehindwritingthecode,addingthecodeasanafterthought.WebasethecourseononeofourmodulesattheUniversityofSouthampton,whichweusetogetpeoplefrombackgroundsotherthancomputerscienceuptospeedonourWebScienceprogrammes.WealsousetheworkofWing(2006)1aboutcomputationalthinking.

In addition to being able to focus on thinking like a computer scientist,we alsowish to introducelearnerstothepracticalskillstheyneedtoputthisthinkingintopractice.Assuch,eachweekwehaveincludedinstructionsaboutusingthePythonprogramminglanguagetoapplycorecomputationalandprogrammingconcepts.

ThiscourseisanexceptionwhencomparedtotheothercoursesintheEDSAcurriculum,inthatitisafoundationalcoursewhichprovidesbackgroundtocompletetheremainderofthedatasciencecourses.Assuch,itisslightlyfurtherawayfromwhatmightbeconsidereda“core”ofdatascience.

3.1.1 LearningObjectivesAftercompletionofthiscourse,participantswillbeableto:

1. Describehowcomputingispossiblethroughaseriesofabstractions2. Assessthemosteffectivealgorithmforaparticulartask3. Designsimplealgorithmsforcommoncomputingtasks4. RetrievedatastoredfromaWebAPI,andstoreitinarelationaldatabase5. DevelopsimpleapplicationsusingthePythonprogramminglanguage6. Approacheverydayproblemswithacomputationalfocus

3.1.2 SyllabusandTopicDescriptionsThisisanintroductorycourse,whichintroducestheconceptsrequiredtobeabletoworkasacomputerscientist.Thelayoutofthecourseisasfollows:

3.1.3 AlgorithmicThinkingInthissection,studentsareintroducedtotheconceptthattherearedifferentwaysofperformingtasks,someofwhicharebetterthanothers.Inaddition,withthesedifferentways,thestudentswilllearnhowanalgorithmmaybeevaluated.

Algorithms:

● Introduction-Introductiontoalgorithms,andwhytheyareuseful.Comparesacomputationalalgorithmtoafoodrecipe

● DifferentMethods-Therearemanywaysofdoingthesamething,eachwithadvantagesanddisadvantages.Showsthatabetteralgorithmcanleadtoagreaterimprovementinperformancethanfastercomputers

● Evaluation-Havingestablishedthatmorethanonealgorithmcanperformthesametask,weintroducesomeformalmeansofevaluatingtheaveragecase,bestcase,andworstcasescenarios.

1http://dl.acm.org/citation.cfm?id=1118215

Page 9: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page9of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

● Tractability-Notallalgorithmscancompleteinareasonabletime,sometimesanapproximatesolutionisabetteroption

DataStructures:

● Introduction -Datastructureshavedifferentforms,tooptimiseaparticulartypeoffunctiongivencertainconstraints

● Simpledatastructures-Introducesthestackandqueueasdatastructures,andhowefficientlytheyperformcertaintasks

3.1.4 ComputingasanabstractionIn this section, the course looks at ways in which computers make use of black boxes and otherabstractionstobuilduponwhathasbeencompletedbefore:

● Computerarchitecture-Acomputerisaclearexampleofanabstraction,discusseshowitgetsfromswitchesandgatestoanoperatingsystem

○ Binary/machinecode,logicgates,hex,assembly● Programmingasanabstraction-Computersaregoodatautomation,hereweseehowthe

automationofsomeprinciplesleadsto○ Compiledvsinterpreted○ Differenttypesofprogramming,e.g.,procedural,functional,objectoriented,eventbased○ Higherlevellanguages

● Objectorientedprogramming-Aclassisanabstractionofathing,containingrepresentationsonlyofthethingswewantitto.

● Dataasanabstraction○ Howdataorinformationareabstractedfrombinarytorepresenttext,statistics,orother

mediacontent

3.1.5 TheWebandtheCloudTheWorldWideWeb(theWeb)iscentraltomoderndataprocesses.Inthissection,weintroducetheWebasaninstanceofacomputernetwork. WiththeubiquityofInternetconnections,aninevitableextensionwastheconceptoftheCloud,whereonlineoperatorsprovideandmanageserviceswhichwouldpreviouslyhavebeendoneonabusiness’sowninfrastructure.

● Whatarecomputernetworks?Anintroductiontobasicconceptsofhowdataaretransportedacrossanetwork

● TheWWW-Themostwidelyknownnetwork○ Client/servermodel-ExplainsthemodelwhichisusedbytheWeb○ HTTP-Explainsthebackgroundtohowtheprotocolhandlesrequestandresponseto

connecttotheWeb,andtheuseofdifferentRESTverbs○ DisplayingpagesontheWeb

■ HTML&CSS-separationofconcernsbetweencontentandpresentation● TheCloud

○ Introductiontothecloud-Servicesprovidedbycloudcompaniesvaryfromprovisionofhardwaretofullyrunningservices.Thissectionexplainsthecloudbusinessmodel.

○ Types of cloud services - This section introduces the different types of servicesprovidedbycloudcompanies,suchasIaas,PaaS,SaaS

Page 10: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page10of36EDSAGrantAgreementno.643937

3.1.6 DataManagementThissectionuses“datamanagement”inabroadway,astodefinealldifferentmeansofmanipulatingandrepresentingdata.

● Introductiontodatamanagement○ Storedatasomewhere,whilstbeingabletohaveaccesstoit○ Differenttrade-offswithdifferentsituations○ Typesofdata:Unstructured,semi-structured,structured○ Differentscenariosfordatamanagement

● Data Representation - Provides examples of a variety of scenarios of how data may berepresented,suchthatitcanbemanipulatedforanalysisinthefuture.

○ Text files - The simplest way of representing data. Introduces the following dataformats:csv/tsv/json/xml,andgoodpracticesforlogfiles

○ Images,videos-Followinganalysisofdataitmayberepresentedinanimageorvideo;orthesecouldthemselvesbeusedasdatatobeanalysed.Discussionofcommonimageandvideoencoding.

○ Relationaldatabase-Themostcommonwayofstoringdata,usingSQLtomanipulateandretrieve.

○ NoSQL-IntroducestheCAPtheorem,anddiscussestrade-offsbetweenrelationalandnon-relationaldata.

○ APIs-DatastoredinsomemannercanbemadeavailablefromanAPI.BuildsonthediscussionfromHTTPandintroducesRESTAPIs

3.1.7 PythonProgrammingThisisdonealongsidetheothersections,asameansofapplyingthetheoreticalconceptsintroducedintheotherweek. Itwill not give expertise inPythonprogramming, butwill provide a foundation todevelopexpertiseinthefuture.TheintentionisnottoteachhowtoprograminPythonsomuch,butrathertoapplytheprinciplesintheremainderofthecourse.

PythonischosenaboveRforthiscurriculum,sinceitismoregeneralpurpose,andmorerelevanttotheprinciples of programming generally as opposed to R which fits better as a language for themathematicalsideofprogramming.Inaddition,thereareothercourseswithinthiscurriculumwhichadditionallycoverR,suchas‘StatisticalandMathematicalFoundations’.TeachingPythonwillcoveronlytheessentialssufficienttoperformthesetasks,withstudentsreferredtoothercoursesshouldtheywishtogainamoredetailedunderstanding.

● Settingupadevelopmentenvironment● Primitivetypes● Controlflowandloops● Classesandfunctions

3.2 Learningmaterials&deliverymethodsThecourseisofferedbytheUniversityofSouthamptonandwillbedeliveredviatheFutureLearnMOOCplatform.Thematerialsaredeliveredthroughacombinationofvideos/slides,aswellastextwritteninmarkdownintheplatformitself.Participantsgetanopportunitytopracticetheskillstheyhavelearntthroughexercises,andquizzesintheFutureLearnsystem.

Page 11: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page11of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

Figure1:SampleofthelearningmaterialsoftheProgramming/ComputationalThinking

module.

Figure2:ScreenshotoftheProgramming/ComputationalThinkingmodulehostedbythe

FutureLearnplatform.

Page 12: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page12of36EDSAGrantAgreementno.643937

3.3 RelevancetocurriculumThis course is a foundational module in the EDSA curriculum, scheduled to be released inM36. Itaddresses the “Computational thinking” module, as this was defined in the final EDSA curriculumpresentedinD2.3.

3.4 RelevancetodemandanalysisInD1.4,itwasrecommendedthatEDSAshouldcreateadatascienceskillsframework,whichwasdonebasedon theEDISONskills areas. Inaddition, thedemand fordifferent skills basedon thedemanddashboardwasusedinordertoidentifyrelevantskills.AsdiscussedinD2.3,therewasadistinctionbetweenwhatshouldcomprisecore“datascience”,andskillswhichwouldberequiredtoobtainajobindatascience.Thiscoursefillsthegapasabasiccourse,whichprovidestheprerequisitesforbeingabletoperformthemoreadvancedtechnicalskillswhilstallowingothercoursestoretainthefocusoncoredatascienceskills.

3.5 FurtherdevelopmentplansThiscoursewillbereleasedontheFutureLearnplatforminearly2018.Followinginitialrun,feedbackwillbesoughtwithaviewtodevelopingbasedonthis.Inaddition,extraareaswillbeinvestigatedforpossibleinclusioninfutureversionsofthecourse.

Page 13: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page13of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

4. DataIntensiveComputing

4.1 ModuleoverviewThedataintensivecomputingmoduleiscomposedofthreeparts.PartI(Data-IntensiveComputing)provides a solid foundation for understanding large scale distributed systems used for storing andprocessingmassivedata.Awidevarietyofadvancedtopicsindataintensivecomputingareintroduced.Thesetopicsincludedistributedfilesystems,NoSQLdatabases,processingdata-at-rest(batchdata)anddata-in-motion (streaming data), graph processing, and resource management. Part II (AdvancedTopicsinDistributedSystems)complementsthefirstpart,byfocusingontheconceptsofgraphtheorywhichallowtoexplaintheconnectivityanddynamicsofmanyreal-worldnetworks.TheobjectiveofPartIIistoprovideadeeperunderstandingandstudyofthebehaviourofthenetworksarisingwithinDistributedSystems.ThecourseinPartIIwillcoverthetopicsofDistributedDataManagement,LargeGraphProcessing,Publish/SubscribeSystemsandNavigableSmall-WorldOverlays.PartIII(ScalableMachineLearningandDeepLearning)providesthestudentswithasolidfoundationforunderstandinglarge-scale machine learning algorithms, in particular, Deep Learning, and their application areas.Techniques for efficient parallelization of machine learning algorithms are explained to show theintersectionbetweendistributedsystemsandmachinelearningfields.

4.1.1 PartI(Data-IntensiveComputing)SyllabusDescriptionDetailedContent

Syllabus Description

Introduction Conceptsandprinciplesofcloudcomputinganddataintensivecomputing.CloudcomputingandBigData(maintrends,definitionsandcharacteristics).CloudComputingModels:IaaS,PaaSandSaaS.Clouddeploymentmodels.DimensionsofBigData:Volume,Velocity,Variety,Vacillation.BigDataStack:Processing,Storage,ResourceManagement.

Storage DistributedFileSystems.GoogleFileSystem(GFS).MasterOperations.SystemInterations.FaultTolerance.FlatDatacenterStorage(FDS).DatabasesandDatabasemanagement.NoSQL.Dynamo.DataConsistency.BigTable.Cassandra

ParallelProcessing

ProgrammingLanguages:CrashcourseinScala.MapReduce.FlumeJava.Dryad.SparkandSparkSQL.ProjectTungsten

StreamProcessing

Introductiontostreamprocessing.SPSprogrammingmodel.DataFlowCompositionandManipulation.Parallelization.FaultTolerance.DistributedmessagingSystem.Kafka.Storm.SEEP.Naiad.SparkStreaming.StructuredStreaming.FlinkStream.MillWheel.GoogleCloudDataflow

GraphProcessing

GraphAlgorithmsCharacteristics.Data-parallelvs.Graph-ParallelComputation.Pregel.GraphLab.PowerGraph(GrpahLab2).GraphX.X-Stream.Edge-CentricprogrammingModel.Streamingpartitions.Chaos.Storageandcomputationmodel

Machine DataandKnowledge.KnowledgeDiscoveryfromData.Data

Page 14: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page14of36EDSAGrantAgreementno.643937

learningwithMllibandTensorflow

MiningFunctionalities.ClassificationandRegression(SupervisedLearning).Clustering(Unsupervisedlearning).MLlib.Tensorflow

ResourceManagement

Mesos.YARN

ExistingMaterials

• DataIntensiveComputing,RoyalInstituteoftechnology,KTH• Data-intensiveComputingSystems,DukeUniversity,

https://www.cs.duke.edu/courses/spring15/compsci516/• Data-IntensiveComputing,IllinoisInstituteofTechnology,

http://www.cs.iit.edu/~iraicu/teaching/CS554-F13/• Data-IntensiveScalableComputing,BrownUniversity,http://cs.brown.edu/courses/csci2950-

u/f11/• Data-IntensiveComputing,TheCityUniversityofHongKong,

https://www.cityu.edu.hk/ug/201415/course/CS4480.htm• DataIntensiveComputingandClouds,UniversityofCentralFlorida,

http://www.eecs.ucf.edu/~jwang/Teaching/EEL6938-s12/• DataIntensiveComputing,UniversityofBuffalo,

http://www.cse.buffalo.edu/shared/course.php?e=CSE&n=587

FurtherReading

• P.Mell,andT.Grance.TheNISTDefinitionofCloudComputing• 800-145.NationalInstituteofStandardsandTechnology(NIST),Gaithersburg,MD,(September2011)• Armbrust,M.,Fox,A.,Griffith,R.,Joseph,A.D.,Katz,R.H.andKonwinski,A.,Lee,G.,Patterson,D.A.

,Rabkin,A.,Stoica,I.,andZaharia,M.AbovetheClouds:ABerkeleyViewofCloudComputing.EECSDepartment,UniversityofCalifornia,Berkeley,TechnicalReportNo.UCB/EECS-2009-28,February10,2009

• S.Ghemawat,H.dGobioff,andShun-TakLeung.TheGoogleFileSystem.19thACMSymposiumonOperatingSystemsPrinciples,LakeGeorge,NY,October,2003

• E.B.Nightingale,J.Elson,J.DeanandS.Ghemawat.FlatDatacenterStorage.USENIXAssociation10thUSENIXSymposiumonOperatingSystemsDesignandImplementation(OSDI’12).

• DeCandia,G.,Hastorun,D.,Jampani,M.,Kakulapati,G.,Lakshman,A.,Pilchin,A.,Sivasubramanian,S.,Vosshall, P., andVogels,W.Dynamo:Amazon'sHighlyAvailableKey-value Store. ProceedingsofTwenty-firstACMSIGOPSSymposiumonOperatingSystemsPrinciples,SOSP’07

• Chang,F.,Dean, J., Ghemawat,S.,Hsieh,W.C.,Wallach,D.A., Burrows,M., Chandra,T., Fikes,A.,Gruber,R.E.Bigtable:ADistributedStorageSystemforStructuredData.ACMTrans.Comput.Syst.,4:1--4:26,2008

• Lakshman,andP.Malik.Cassandra:adecentralizedstructuredstoragesystem.OperatingSystemsReview44(2):35-40(2010)

• J. Dean , S. Ghemawat. MapReduce: simplified data processing on large clusters. OSDI’04:PROCEEDINGSOFTHE6THCONFERENCEONSYMPOSIUMONOPERATINGSYSTEMSDESIGNANDIMPLEMENTATION

Page 15: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page15of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

• Chambers,A.Raniwala, F. Perry, S.Adams,R.Henry,R.BradshawandNathan. FlumeJava:Easy,EfficientData-ParallelPipelines.ACMSIGPLANConferenceonProgrammingLanguageDesignandImplementation(PLDI),ACMNewYork,NY2010,2PennPlaza,Suite701NewYork,NY10121-0701(2010),pp.363-375

• Zaharia,M.,Chowdhury,M.,Das,T.,Dave,A.,Ma,J.,McCauley,M.,Franklin,M.J.,Shenker,S,Stoica,I.Resilient Distributed Datasets: A Fault-tolerant Abstraction for In-memory Cluster Computing.Proceedings of the 9th USENIX Conference onNetworked SystemsDesign and Implementation.2012

• G.Cugola,AlessandroMargara.Processingflowsofinformation:Fromdatastreamtocomplexeventprocessing.ACMComput.Surv.44(3):15:1-15:62(2012)

• J.-H.Hwang,M.Balazinska,A.Rasin,U.Çetintemel,M.Stonebraker,S.B.Zdonik:• High-AvailabilityAlgorithmsforDistributedStreamProcessing.ICDE2005:779-790• R.C.Fernandez,M.Migliavacca,E.Kalyvianaki,andP.Pietzuch.2013.Integratingscaleoutandfault

toleranceinstreamprocessingusingoperatorstatemanagement.InProceedingsofthe2013ACMSIGMODInternationalConferenceonManagementofData(SIGMOD'13).ACM,NewYork,NY,USA,725-736.

• M.Zaharia,T.Das,H.Li,S.Shenker,andI.Stoica.2012.Discretizedstreams:anefficientandfault-tolerantmodelforstreamprocessingonlargeclusters.InProceedingsofthe4thUSENIXconferenceonHotTopicsinCloudCcomputing(HotCloud'12).USENIXAssociation,Berkeley,CA,USA,10-10.

• T.Akidau,R.Bradshaw,C.Chambers,S.Chernyak,R.J.Fernández-Moctezuma,R.Lax,S.McVeety,D.Mills, F. Perry, E. Schmidt, and S. Whittle. 2015. The dataflow model: a practical approach tobalancingcorrectness,latency,andcostinmassive-scale,unbounded,out-of-orderdataprocessing.Proc.VLDBEndow.8,12(August2015),1792-1803.

• X.Meng,J.Bradley,B.Yavuz,E.Sparks,S.Venkataraman,D.Liu,J.Freeman,DBTsai,M.Amde,S.Owen,D.Xin,R.Xin,M. J.Franklin,R.Zadeh,M.Zaharia,andA.Talwalkar.2016.MLlib:machinelearninginapachespark.J.Mach.Learn.Res.17,1(January2016),1235-1241.

4.1.2 PartII(AdvancedTopicsinDistributedSystems)SyllabusDescriptionDetailedContent

Syllabus Description

Introduction Networks:behaviouranddynamics.AnalysisofNetworks.Networksversusgraphs.Complexityofnetworks.

MainConcepts Basicdefinitions.Paths.Cycles.Connectivity.(Giant)Components.Distance.

Networkmodels

G(n,m)model.Erdos-Renyirandomgraph.Randomgraphsandrealworld.Watts-Strogatzmodel.Small-Worldsmodel.Preferentialattachmentmodel.Randomwalks.

PageRank, ConvergenceofRandomWalk.RelationtoWebsearch.Google

Page 16: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page16of36EDSAGrantAgreementno.643937

GraphSpectra

pagerank.Topicspecificpagerank.Graphspectrum.Spectralgraphpartitioning.

GraphExploration,NavigableSmall-WorldNetworks

Howtoexplorebignetworks?Influenceofnodedegree.Milgram’s(Small-worlds)experiment.ImplicationsforP2Psystems.NavigationinWatts-StrogatzSmall-Worlds.Kleinberg’smodelofSmall-Worlds.

NavigableStructuredOverlays,GossipingAlgorithms,Cyclon

Small-WorldsbasedP2Poverlays.ApproximationofKleinberg’smodel.TraditionalDHTsandKleinberg’smodel.Basicnavigationprinciples.Gossipingalgorithms.Cyclon.

Topologyconstructionbygossiping.Publish/SubscribeSystems

Proactivegossipframework.Topologycreation.T-man.Publish/Subscribesystems.Topicbasedpublish/subscribe.ContentTopicbasedpublish/subscribe.Filterbasedrouting.Overlay-Per-Topicpublish/subscribe.SpiderCast.Interest-Aware(greedy)Links.Tera.Rendezvousbasedpublish/subscribe.

HybridPub/SubSystemsScribe

Decentralizedpublish/subscribe.Vitis.BuildingNavigableStructure.

ExistingMaterials

• AdvancedTopicsinDistributedComputing,atRoyalInstituteofTechnology,KTH• NetworkAnalysis,UniversityofMichigan,

https://www.icpsr.umich.edu/icpsrweb/sumprog/courses/0131• SocialandInformationNetworkAnalysis,StanfordcenterforProfessionalDevelopment,

http://scpd.stanford.edu/search/publicCourseSearchDetails.do?method=load&courseId=7932016• SocialNetworkAnalysis,UniversityofMichigan,Coursera,https://www.class-

central.com/mooc/338/coursera-social-network-analysis• NetworkAnalysisandModelling,SantaFeInstitute,

http://tuvalu.santafe.edu/~aaronc/courses/5352/

Page 17: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page17of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

FurtherReading

• "Networks,Crowds,andMarkets:ReasoningAboutaHighlyConnectedWorld"byDavidEasleyandJonKleinberg

• "Networks:AnIntroduction"byMarkNewman• “FoundationsofDataScience”byJ.HopcroftandR.Kannan

4.1.3 PartIII(ScalableMachineLearningandDeepLearning)SyllabusDescriptionMainTopics:

• MachineLearning(ML)Principles

• UsingScalableDataAnalyticsFrameworkstoparallelizemachinelearningalgorithms

• DistributedLinearRegression

• DistributedLogisticRegression

• DistributedPrincipalComponentAnalysis

• LinearAlgebra,ProbabilityTheoryandNumericalComputation

• FeedforwardDeepNetworks

• RegularizationinDeepLearning

• OptimizationforTrainingDeepModels

• ConvolutionalNetworks

• SequenceModelling:RecurrentandRecursiveNets

DetailedContent

Syllabus Description

Introduction Abriefhistoryandapplicationexamplesofdeeplearning,Large-ScaleMachineLearning:atGoogleandinindustry,MLbackground,briefoverviewofDeepLearning,understandingDeepLearningSystems,LinearAlgebrareview,ProbabilityTheoryreview.

DistributedMLandLinearRegression

SupervisedandUnsupervisedlearning,MLpipeline,Classificationpipeline,Linearregression,DistributedML,ComputationalComplexity

GradientDescentandSparkML

Optimizationtheoryreview,gradientdescentforLeastSquaresRegression,TheGradient,Large-ScaleMLPipelines,FeatureExtraction,FeatureHashing,ApacheSparkandSparkML.

Page 18: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page18of36EDSAGrantAgreementno.643937

LogisticRegressionandClassification

ProbabilisticInterpretation,MultinomialLogisticClassification,ClassificationExampleinTensorflow,QuickLookinTensorflow

FeedforwardNeuralNetsandBackprop

NumericalStability,NeuralNetworks,FeedforwardNeuralNetworks,FeedforwardPhase,Backpropagation

RegularizationandDebugging

AFlowofDeepLearning,TechniquesforTrainingDeepLearningNets,Regularization,WhydoesDeepLearningwork?

ConvolutionalNeuralNetworks

HowConvolutionalNeuralNetsWork,ConvNetswithDepth,MemoryComplexity,CaseStudies,ConvNetsforEverything

RecurrentNeuralnetworks,Sequence-to-SequenceLearningandAutoencoders,LongShort-termMemory(LSTM),Attention,Autoencoders:UnsupervisedFeatureLearning,

DeepReinforcementLearning

MarkovdecisionProcesses,OverviewofReinforcementlearning,SupervisedLearningvsReinforcementLearning,DeepRL,Q-learning,DeepPolicyNetworks,DistributedRLArchitecture,AsynchronousRL.

CaseStudy AlphaGo,WhenwillDeepLearningbecomeIntelligent?

ExistingMaterial

• ScalablemachinelearningandDeepLearning,atRoyalInstituteoftechnology,KTH

• ScalableMachineLearning,edX,https://courses.edx.org/courses/BerkeleyX/CS190.1x/1T2015/info

• DistributedMachineLearningwithApacheSparkhttps://www.edx.org/course/distributed-machine-learning-apache-uc-berkeleyx-cs120x

• DeepLearningSystems,UniversityofWashington,http://dlsys.cs.washington.edu/

• ScalableMachineLearning,UniversityofBerkeley,

https://bcourses.berkeley.edu/courses/1413454/

Page 19: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page19of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

FurtherReading

• IanGoodfellowandYoshuaBengioandAaronCourville.DeepLearning,MITPress• SparkMLPipelines,http://spark.apache.org/docs/latest/ml-pipeline.html• SparkMLOverview,https://www.infoq.com/articles/apache-sparkml-data-pipelines• SparkMLundertheHood,http://spark.tc/machine-learning-in-apache-spark-2-0-under-the-hood-

and-over-the-horizon-2/?cm_mc_uid=02948943737214702940997&cm_mc_sid_50200000=1472964414

• Tensorflow,https://www.tensorflow.org/versions/r0.11/tutorials/

4.2 Learningmaterials&deliverymethodsThecourseisofferedbyKTHandeachofitspartsisdeliveredasfollows:

PartI:Data-IntensiveComputing

The course consists of thirteen face-to-face lectures in the form of presentations. Slides and videolecturesareprovidedtostudentsthroughaninternaluniversityportal.Studentshavetostudyacoupleof research papers before each lecture. Four Exercises are provided as lab assignments,which arediscussedwithstudentswith teacherassistants.Tworeadingassignmentsare toreviewpapersandencouragethestudentstoanalyzepapersinacriticalway.

Figure3:ScreenshotfromtheIntroductionpresentationofPartI(Data-IntensiveComputing).

Page 20: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page20of36EDSAGrantAgreementno.643937

Figure4:ScreenshotfromtheBigDataStoragepresentationofPartI(Data-Intensive

Computing).

PartII:AdvancedTopicsinDistributedSystems

Thecourseconsistsofeightface-to-facelecturesintheformofpresentations.Slidesandvideolecturesareprovidedtostudents.Studentshavetostudyacoupleofresearchpapersbeforeeachlecture.Tworeadingassignmentsaretoreviewpapersandencouragethestudentstoanalyzepapersinacriticalway.Studentsarerequiredtogiveanoralpresentationrelatedtothepaperstheyreview.Belowaresomescreenshotsfromthecourselearningmaterials.

Figure5:ScreenshotfromthelearningmaterialsofPartII(AdvancedTopicsinDistributed

Systems).

Page 21: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page21of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

Figure6:ScreenshotfromthelearningmaterialsofPartII(AdvancedTopicsinDistributed

Systems).

PartIII:ScalableMachineLearningandDeepLearning

Thecourseconsistsofelevenface-to-facelecturesintheformofpresentations.Studentsarerequiredto work on programming assignments using Scala/python programming languages. They are alsorequiredtosucceedinaprojectwhichideatheychoosefromthecontextofthecourseandusingdeeplearning.Belowaresomescreenshotsfromthecourselearningmaterials.

Figure7:ScreenshotfromthelearningmaterialsofPartIII(ScalableMachineLearningand

DeepLearning).

Page 22: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page22of36EDSAGrantAgreementno.643937

Figure8:ScreenshotfromthelearningmaterialsofPartIII(ScalableMachineLearningand

DeepLearning).

4.3 RelevancetocurriculumThecoursecorrespondstothe“Data-IntensiveComputing”moduleoftheEDSAcurriculum,asthiswaspresentedinD2.3.

4.4 RelevancetodemandanalysisThecoursesprovideasolidfoundationforstudentsinmachinelearninganddataminingcombinedwithdistributedcomputing.Twoofthesecoursesaregiveninablendedformat,andthethirdoneisaface-to-facecourse.Thecoursesarerichintopsoftskillsrequiredcurrentlyinthedatasciencefield.Studentsareencouragedtodopeer-reviewsdiscussions,oralpresentationsandpaperreviewsinthesecourses,which improve their communication skills. All courses in this module use open-source tools andprogramming libraries. There are plans to make some of these courses reachable across differentorganizations.

4.5 FurtherdevelopmentplansThecontentsofallcoursesofthismoduleareupdatedaccordingtorecentpapersandimprovementsinthefield.Labassignmentsarealsoenhancedeachtimethecourseisgiven.Studentsprovideevaluationat the end of course about the programming languages, and platforms that are used. Courses areadjustedaccordingtoreviewsthatarerelevant.ThereareplanstoprovidesomeofthesecoursesasMOOCcourses.

Page 23: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page23of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

5. SocialMediaAnalytics

5.1 ModuleoverviewTheparticipantsgetasoundoverviewofsocialmediaanalysisproblemsandanalysistechniques.Targetistheextractionandsummarizationofuserstatementsfromsocialmediasources.Therelevanttextdata mining concepts and deep learning approaches will be introduced. The underlying modelassumptionsarediscussedasfarastheyareimportantforthepracticalapplicationandinterpretationofresults.Specialemphasisisputontheperformanceevaluationofprocedures.Tobeabletoprocesscomprehensive collections specific Big Data implementationswill be utilized. Most approaches aredemonstrated by stepping through small Python scripts to show the necessary computations. TheparticipantswillbeabletorealisticallyassesstheapplicationofsocialmediaanalysistechnologiesfordifferentusagescenariosandcanstartwiththeirownexperimentsusingdifferentanalysispackageswithPython.

Syllabus Conceptsandmethods

Introduction

•SocialMedia

•MediaAnalytics

•Recentsuccessstories

Typesofsocialmedia

Userbaseofsocialmedia

Relevanceofsocialmediaanalysis

Analysisbymachinelearning

Mainmachinelearningparadigms

Differenttypesofanalysisquestions

Importantapplications

Downloadandpreparation

•Documentformats

•Corpora

•Google,Twitter,Facebook

•Preparationofdata

Searchenginesfordocumentcollection

crawlingpackages

parsingofdifferentdocumentformats

languagedetection

sentencesplittingandtokenization

partofspeechtagging

Examplescripts

Classifysocialmediaposts

•Commonapproaches

•Performanceevaluation

•Application

Definition

Differenttypes:logisticregression,

supportvectormachine

Overfitting

testandperformancemeasures

featureselection

BigDataapproaches:Spark,TensorFlow

ExampleScripts

Page 24: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page24of36EDSAGrantAgreementno.643937

Wordsimilarityinposts

•Groupwordsinposts

•Detectingtopicsinposts

Unsupervisedlearningofwordproperties

topicmodels:Assumingunderlyingtopics

word2vec:predictingwordsintheneighborhood

stochasticoptimization

negativesampling

BigDataimplementations:Spark,Tensorflow

Examplescripts

Clusteringsocialmediaposts

•Similarityofpost

•visualization

Definingsimilaritymeasuresusingtopicmodelsandwordembeddings

Clusteringapproaches

Visualizationandspecialization

Interpretationofresults

Examplescripts

Detectingnames,products

•Detectingentities

•Longrangedependencies

Predictpropertiesofwords

Probabilistic:ConditionalRandomField

recurrentneuralnetwork

automaticfeatureselection

wordmodels

bidirectionalrecurrentneuralnetwork

testandperformancemeasures

practicaltrainingandevaluation

Examplescripts

Opinionsinsocialmedia

•Predictusersentiment

•opinionsandaspects

•phrasesandtheirrelations

Attachinglabelstophrasesandsentences

Typesofopinionmining

Aspectbasedopinions

Evaluation

Examplescripts

Practicalsocialmediaanalytics

•Lessonslearnt

•selectingalgorithms

•combiningapproaches

Descriptionofapracticalevaluation

Combiningdifferentapproachesandalgorithms

Evaluationandperformanceassessment

visualization

Existingtraining:

● Face-to-facetrainingatFraunhofer:http://www.iais.fraunhofer.de/socialmediaanalytics.html

● https://www.diygenius.com/10-free-online-courses-in-social-media-and-inbound-marketing/

Page 25: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page25of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

● https://apps.ep.jhu.edu/course-homepages/3523-605.433-social-media-analytics-piorkowski-mcculloh

● https://www.coursera.org/specializations/social-media-marketingExercises:Startingwiththeexamplescriptstheparticipantswillbegivennewdatasets.Theyhavetoselect an analysis technique, perform the analysis and report and interpret the evaluation andvisualizations.

Sourcesforfurtherreading:

• G.F.KhanSevenLayersofSocialMediaAnalytics(2015):MiningBusinessInsightsfromSocialMediaText,Actions,Networks,Hyperlinks,Apps,SearchEngine,andLocationData.Kindle

• T.Schreck,D.Keim(2013),VisualAnalysisofSocialMediaData,• C.C.Aggarwal,C.X.Zhai(2012)MiningTextData.Springer• CDManning,PRaghavan,HSchütze(2008):IntroductiontoInformationRetrieval• C.Biemann,A.Mehler(2015):TextMining:FromOntologyLearningtoAutomatedTextProcessing

Applications• http://www.iais.fraunhofer.de/data-scientist.html• https://www.tensorflow.org/• http://scikit-learn.org/stable/

Descriptionoftopics:

1. Introduction:Differenttypesofsocialmedia(Facebook,Twitter,Instagram,etc.)areconsideredand the utilization by users is discussed. Social media analysis is relevant as many actualdecisionofusers,e.g.buyinggoodsorvotinginelectionsarebasedonsocialmedia.Thiscourseaimsatextractingandsummarizingthestatementsofusersfromsocialmediatogivearealisticimpressionofthepublicviewonrelevantquestions.Differenttypesofanalysisquestionswillbediscussed.Therelevantmachinelearningconceptswillbeintroduced.Finallyanumberofapplicationhighlightswillbepresented.

2. DownloadandPreparation:Inthislecturetechniquesfordownloading,crawlingandmonitoringsocialmediawillbepresented.Asasecondstepthedifferentformat–pdf,text,html,ePUB–havetobeparsedtoextracttheunderlyingtext.Thistextusuallyispre-processedtoextractwords and sentences. Often only specific parts of the documents are required, e.g. htmlnavigation or advertisements have to be excluded. Part-of-speech tagging may be used toexclude specific types ofwords – e.g. functionwords. The approaches are demonstrated byPythonsamplescripts.

3. Classify Social Media Posts: Often we want to categorize the documents according to theircontents, e.g. soccer, politics, recipes, etc. This can be done with high reliability by textclassification procedures. These approaches require a training set with manually classifiedexamples. Logistic regression and support vector machines are introduced as relevanttechniques.Wediscussthetrainingprocessandthephenomenonofoverfitting,whichmaybecontrolledbyregularization.Correspondingperformancemeasuresbasedonanindependenttest set are required. We use up to date libraries like scikit-learn as well as Spark andTensorFlow,whicharereadyforBigDataapplications.TheapproachesaredemonstratedbyPythonsamplescripts.

4. ClusteringWordsandSocialMediaPosts.Mostinterestinginsocialmediaanalyticsistoshowthespectrumofissueswhichareaddressedinposts.Onerelevantanalysisapproachinvestigatesthesimilarityofwordsbyputtingwordsintoahigh-dimensionalsemanticspace.Topicmodelsassumethatthereareanumberofunderlyingthemesinasocialmediasource.Thesetopicsareextracted without any user input (unsupervised) and each word in a document can be

Page 26: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page26of36EDSAGrantAgreementno.643937

representedasamixtureoftopics.Thedifferentwordembeddingapproach(word2vec)directlyrepresentseachwordasavector inahigh-dimensionalsemanticspace.Both techniquesareintroduced, and their training in the Big Data context is discussed. Both techniquesmay beextendedtodefinethecontentofadocumentaswellasthesimilarityofcompletedocuments.TheapproachesaredemonstratedbyPythonsamplescripts.

5. DetectingNamesandProducts.Aspecificrequirementinsocialmediaanalyticsisthedetectionofentities,i.e.phrasesofaspecifictype.Thesehavehighlyvaryingformandcanonlydetectedfromtheircontext.Hence,weintroducetwotypesofwordsequenceanalysisproceduresusingpre-labelled training data: Conditional Random Fields and Recurrent Neural Networks. Wediscusstheunderlyingassumptionsandtrainingproceduresandassesstheirperformancebyappropriateevaluationmeasures.TheapproachesaredemonstratedbyPythonsamplescripts.

6. OpinionMininginSocialMedia.Opinionmining(orsentimentanalysis)hastheaimtoextractsubjective user statements on specific issues from a social media post. There are differentapproachesrangingfromassigninganopinionscoretoawholedocumenttodetectingopinionsspecifiedforaspecificaspect.Wediscussarangeofanalysistechniquesincludingclassificationandwordsequenceanalysis.TheapproachesaredemonstratedbyPythonsamplescripts.

7. PracticalSocialMediaAnalysis.Socialmediacontentanalysisrequirestheknowledgeofalargetoolboxoftechniqueswithspecificunderlyingmodelsandlimitations.Fromourownprojectexperience,wegivepractical hints on the selectionof techniques, the successionof analysissteps, and their evaluation. Most important is the visualization of results, especially forclusteringapproachesliketopicmodellingorwordembedding.Wedemonstratethisusingt-SNEandGephi.Finally,usersmaydiscusstheirownquestionswiththeinstructors.

5.2 Learningmaterials&deliverymethodsThecourse isofferedbyFraunhoferandconsistsof two face-to-facedays, fullofpresentationswithcompactinformation.Scriptsareshownanddemonstrated,butthereisnotimeforexercises.Belowaresomescreenshotsfromthefirstpresentation.

Figure9:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.

Page 27: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page27of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

Figure10:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.

Figure11:ScreenshotfromthelearningmaterialsoftheSocialMediaAnalyticsmodule.

5.3 RelevancetocurriculumThecoursecorrespondstomodule13ofthefinalEDSAcurriculum,asthiswaspresentedinD2.3.Itaddressestheanalyticstageatanadvancedlevel.Itconcentratesontextualdataspecificallyfromsocialmedia.Thus,itisanattempttotargetaspecificsector.

5.4 RelevancetodemandanalysisThecourseconcentratesonopensourcetoolsandlibraries.Butitisratherspecialanddoesnottrytoreachparticipantsfromdifferentunits.Also,itdoesnotincludeblendinglearningformats,butthisisgoingtochange.

Page 28: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page28of36EDSAGrantAgreementno.643937

5.5 FurtherdevelopmentplansFraunhoferhasathree-levelcertificationprogramfordatascientists.In2018,anewcertificate“DataScientistSpecialized inMachineLearning”will beadded. It includes an examinationwhich tests fortheoreticalknowledgeandpracticalskills.Thisexaminationispreparedbyagroupofblendedcourses.Allcoursesstartface-to-faceandendwithapracticalphase,wheretheparticipantsdoexerciseswithbigdatasetsinadistributedmachinelearningplatform.Themandatorycourseisaboutdeeplearningandothercurrentmethodsofmachinelearning.Amongtheoptional,morespecialcoursesisacourseondeeptextanalytics2,firsttobedeliveredinMay2018.Thisblendedcoursewillreplacetheformerpurelyface-to-facecourseon“SocialMediaAnalytics”.

2https://www.bigdata.fraunhofer.de/de/datascientist/seminare/machine_learning_zertifizierung/textverstehen.html

Page 29: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page29of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

6. DataExploitationincludingdatamarketsandlicensing

6.1 ModuleoverviewDatascienceisfundamentallyaboutgeneratingimpactfromdata.Impactisachievedwhendataactsasacatalystforchangeeitherforsocial,environmentaloreconomicgain.

Attheheartofdrivingchangeisfindingandtellingstoriesindata,ascoveredinthe“FindingStoriesinData”3course.Thecombinationofthisalongwithstrongcommunicationandleadershipskillscanhelplead tochange inbusinesspractice ineitherprofitability,marketpositioningand/orefficiency.Datascience can inform and bring evidence for new and changing business models and effective dataexploitationcanalsounlocknewandunexpectedmarkets.

This set of 9 online lessons, offered by ODI, looks at how exploiting a world of data can lead tounexpected benefits for businesses and citizens alike. The 9 lessons are divided into 3 sequentialmodulesthatwillguidelearnersthroughthreekeytopicareasindataexploitation,datamarketsandlicensing.

• Course1:Exploitingdatasciencetocreatevalue• Aim:Enableyoutorecognisetheopportunitiestoexploitdatascienceandanalysetherisksof

doingso.o Lesson1:Exploitingdatascienceo Lesson2:Managingriskandrewardo Lesson3:Openinnovationanddatascience

• Course2:Datasciencemeansbusiness• Aim:Enableyoutoanalyseanumberofdifferentbusinessmodelstoexploitdatascience.

o Lesson1:Businessmodelsindatascienceo Lesson2:Pitchingfordatascienceo Lesson3:Datamarketsandlicensing

• Course3:Datascienceecosystems• Aim:Enableyoutoexploitdatascienceindifferentecosystems.

o Lesson1:Embeddingdatasciencewithinabusinesso Lesson2:Exploringthestartupecosystemo Lesson3:Exploitingdatasciencewithinasector

Curated content in the syllabuswill beaccompaniedwithnarration tohelp guide a learner in theirinterpretation of the resources. Additionally, supporting materials and guidance from those in thecommunitywillhelpexplainhowdatasciencehasopenedupnewmarkets.

6.1.1 Course1:ExploitingdatasciencetocreatevalueThissetoflessonslookbroadlyatthewaysthatdatasciencecancreatevalue,howtoidentifytherightopportunitiesandhowtomanagerisk.Bytheendofthismodulelearnersshouldbeabletomaketherightgo/no-godecisionsforexploitingdatascienceintheirownbusinesses.

Exploitingdatascience

Acrosstheworld,peopleareexploitingdatasciencetoinformdecisions,buildnewbusinessesandformnewcollaborations.Theaimofthislessonistohelplearnersunderstandhowdatasciencecreatesvalueandintroducethemtoanumberofcasestudies.Forexample,welookathowTransportforLondonusedWiFiaccesspointstotrackmovementofpassengersthroughthenetworkintheirfirstevershortstudyabletocollection400,000pointsofdataforanalysis.

3http://courses.edsa-project.eu/course/view.php?id=52

Page 30: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page30of36EDSAGrantAgreementno.643937

Managingriskandreward

Inordertoexploitdatasciencesuccessfully,itisimportanttobeabletoidentifythepotentialrisksandconsidersolutionstomanageproblemsthatmayarise.Oneofthemainconcernsisthatdatascienceisblurring“disciplinaryboundariesand fieldwithlonghistoriesofethical reviewarebeing inundatedwith work from fields with no history of review and indeed active movements against suchrequirements.”4Aswellasethics,thislessonlooksatopportunityandlegalriskandintroducescasestudiesthatfoundthemselvesinhotwater.Themoduleconcludeswithsometoolsandguidesthatwillhelpmitigateriskswhenexploitingdatascience.

Openinnovationanddatascience

Datascienceblursdisciplinaryboundariesandinvolvesexpertsfrommanyfields.Oftenonecompanywillnothavealltherequiredskillsinhousetorapidlyexploitdatascience.Openinnovationisakeyenablertofastinnovation.Openinnovationisacollaborativeapproachtocreatinginnovativeproductsandservices.Beingcollaborativewiththecommunitycanalsohelpimproveproductreputationorhelpidentifyrisksearlier.Thelessonlooksathowopeninnovationplaysakeyroleindataexploitationalongwithkeycasestudies.

6.1.2 Course2:DatasciencemeansbusinessBuildingfromthefirstcourse,thiscourselooksatwaystoembedinnovationfromdatasciencewithinanexisting(ornew)businessmodel.AkeypartofBusiness Intelligence ishowtousedata todriveforwardabusinessmodel. This set of lessons looks atwheredata science fits indifferentbusinessmodelsandhowtopitchfordatascienceagainstabusinessmodel.Furtherthiscourselooksathowlicensing is a core part of the data market and how different models can be exploited both forcommercialgainandforinnovation.

Businessmodelsindatascience

Manybusinessesarenowusingdatascienceaspartoftheirbusinessmodel.Inordertosuccessfullyexploitdatascience,organisationsneedtounderstandhowdataiscreatingvaluefortheirbusiness.Thislessons looks at how to create a value proposition and the link between this and various businessmodels.Theaimofthislessonistoenablelearnerstorealisehowtoexploitdatasciencedeeplywithinabusinessmodel.

Pitchingfordatascience

Onceapotentialneworinnovativevaluepropositionandbusinessmodelhavebeenbuilt,thenextkeystageistocommunicatethiseffectivelyeitherwithinanorganisationortoexternalfunders.Havingagoodpitchisessentialforgaininginterestfromorganisationleaders,investorsandpotentialpartners.Thishelpsestablishandgrowdatasciencebusinesses.Thislessonlooksathowtocreateaneffectivepitchfordatascienceledinnovation.

Datamarketsandlicensing

Dataisanewrawmaterial.Akeyroleofadatascientististorefinethismaterialtocreatesomethingmorevaluable.Atthesametimerawmaterialsatvariousstagesofrefinementare tradedorsold toothers.Thislessonlooksatsomeofthemarketssurroundingdataandhowtheserelatedtoaspectrumofopportunitiesandbusinessmodels.

6.1.3 Course3:DatascienceecosystemsTheaimofthiscourseistoenableparticipantstoexploitdatascienceindifferentecosystems.Withinthe field of data science, there aremanydifferent levels of ecosystem. These range froma internal

4https://www.forbes.com/sites/kalevleetaru/2017/10/16/is-it-too-late-for-big-data-ethics/#aa20de23a6d1

Page 31: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page31of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

businessecosystemthat ismadeupofmanydifferentdepartmentsand teams, toall theactorsandstakeholderswithinasectororindustry.

Embeddingdatasciencewithinabusiness

Moderndatascientistsoftenfacethechallengeofhowtoembeddatasciencewithinabusiness.Forabusinesstofullyembracedatascience,itmustbeattheheartofallbusinessactivities.Inthismodule,weexploretheimportanceofadatasciencestrategyandhowtodevelopadatasciencestrategy,aswellashowtoreduce frictionand increasebuy-in.Thiswillensure thatdatasciencesitsat theheartofbusinessactivities.

Exploringthestartupecosystem

Adatascientistmightworkwithinanorganisationortheymighthaveanewandnovelideathattheywishtobringtomarket.Fordatascientistsenteringthestartupecosystem,itcanbeaverydifferentexperiencefromtheecosystemofanindividualbusiness.Inthismoduleweexplorethedifferentstagesofstartupsthatexistwithin theecosystem, therolesofdifferent fundingtypesandsources,andtheadditionalsupportthatisavailabletothosenavigatingthedatasciencestartupecosystem.

Exploitingdatascienceinasector

Amoderndatascientistisexpectedtocombinedomainexpertisewithmoretraditionaldatascienceskills. Knowing how to exploit data science within a sector is an essential skill for a modern datascientist.Inthislesson,weexploredatascienceecosystemswithinsectors,drawingonexamplesfromtheentertainment,transportandphysicalactivitysectors,aswellasthedifferencebetweendatascienceinaclosed,sharedoropensectorecosystem.

6.2 Learningmaterials&deliverymethodsTheentire3coursecurriculumencompassing9lessonsisofferedbyODIandisavailableentirelyonlineforanyonetoengagewithfreelyatanytimewithouttheneedforenrolment.Althoughthelessonsarenumberedinorder,learnerscanchoosetheirownpathiftheychoose.Eachlessonispresentedintheformof an interactivewebpage following the latest instructionaldesign theoryusing an eLearningplatformvoted themost innovative4 year in a rowat the learnXawards5.The eLearningplatformprovidesacompletelycrossplatformresponsivewayofdeliveringinteractivelearning.Eachlessonisbackedwithlearningoutcomeswhichareassessedthroughtheuseofinteractivequestioningattheendofeach.

Eachlessonfeatures:

● Practicalstepsforaccomplishingkeytasks● Case-studies● Knowledge-checksandquizzes

6.3 RelevancetocurriculumSofar,theEDSAcurriculumhasprovidededucationalresourcesofskillsandmethodologieswhichcovermanyskillsareasindepth.Forexample,thereareatleast5coursesintheareasof“Bigdatatechnologiesandsystems”and“Computingmethodologies”.However,currentlytherearenocourseson“Businessanalysisorganisationandmanagement”andonlyoneintheareasof“Businessanalysisandenterpriseorganisation”and“Businessprocessmanagement”.ByfocussingspecificallyonBusinessIntelligenceandexploitingdatascienceinbusinessesthiscoursehasbeenspecificallydesignedtocoversuchareas.

5https://www.adaptlearning.org/index.php/about/

Page 32: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page32of36EDSAGrantAgreementno.643937

6.4 RelevancetodemandanalysisThekeyoutcomesofthedemandanalysis(D1.2)providedaclearindicationthattherewasaneedforbroadintroductoryleveltrainingindatascience.8keycurriculumareaswereidentifiedofwhich6focuson“hardskills”:

1. BigData2. Machinelearningandprediction3. Datacollectionandanalysis4. Mathsandstatistics5. Interpretationandvisualisation6. Advancedcomputingandprogramming

Thisleavestwoareasofthecurriculumwhicharemore“softskills”based:

1. Businessintelligenceanddomainexpertise2. Opensourcetoolsandconcepts

This course has been designed focus on how data science can unlock value for new and existingbusinesses.SincethestartoftheEuropeanDataScienceAcademyprojectin2015datasciencehasbeengrowingininterest,whilerelatedareassuchasBigData,AIandMachinelearningcontinuetoalsoshowaconstantfocus.

Figure12:GoogletrendanalysisforBigdataandDataScience.

Importantly this shows thatdata science, as adiscipline is gaining in importance.Data science is adisciplinethatbringsimportantscientificandbusinessapproachestotoolsandtechniqueslikemachinelearninganditisimportanttothusalsofocusonthesoftskillsonwhatapplyingdatasciencemeanstobusinessesandhowmanagerscanmakethemostofdatascienceinexistingbusinesses.

Big dataData science

Page 33: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page33of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

Figure13:TheEDSAdashboardskillschart.

ThecurrentEDSAdashboardalsoshowsthegrowinginterestinexploitingopendatainbusinesses.The4thmostrequiredskillinjobsiscurrentlybusinessintelligence.

Businessintelligenceisfundamentallyaboutthestrategiesandtechnologiesthatallowbusinesstobuildinsightfromdata.InordertoexploitBIrequiresknowledgeofboththehardskills(technologies)andthesoftskills (strategies) tounderstandthe implicationsofsuchdataanalysis.Thus,datascientistsrequire some level of strategic knowledge while existing managers need to understand theopportunitiesandrisksofapplyingdatascienceandotherBItechniques.

Thisrequirementwasalsoemphasised inthedemandanalysis(D1.2).D1.2pointedoutareportbyMcKinseyonBigDataandthelackofmanagerswillingittakerisksonemployingdatascientists6.

D1.2recommendedthatcurricularneeds toaddress thechallengesofdatasciencebeingembeddedwithinbusinessatalllevelssuchthatpotentialismaximised.Thismodulehasbeenspecificallydesignedtohelpbridgethegapbetweentheharddatascienceskillsandtheirimplicationsonbusinesses.Bytheend of themodule learnerswill have a better business intelligence of the opportunity and risks ofexploitingdatascience,bothwithinexistingorganisationsandtobuildnewones.

Thedemandanalysisstudyevaluationreport(D1.4)makesanumberofrecommendationsforguidingthe development of the EDSA curricular. Table 1 from this study makes seven recommendations,summarisedas:

1. Holistictrainingapproach2. Opensourcebasedtraining3. Softskillstraining4. Basicdataliteracyskills5. Blendedtraining6. Datascienceskillsframework7. Navigationandguidance

Thedataexploitationincludingdatamarketsandlicensingcoursefocusesonmanyoftheseareas,inparticular.

1.Holistictrainingapproach

Most professional courseproviders offer very focused courses that go intodepth about howparticulartools,suchasHadoop,R,PythonorSPSS,canbeusedbybusinesses.

6Bigdata:Thenextfrontierforcompetition-http://www.mckinsey.com/features/big_data

Page 34: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page34of36EDSAGrantAgreementno.643937

- EDSAstudyevaluationreport(D1.4,4.2.1,p65)Thedataexploitationincludingdatamarketsandlicensingcoursefocusesonsoftskillsthatareeasilytransferableintodifferentdomains.Thecourseintroducesavarietyofmodellingtoolsthatcanbeusedinmanydomainsandsectors;forexample,thelesson“Businessmodelsindatascience”introducesthebusinessmodelcanvaswhichisawidelyusedtoolforidentifyinggreatbusinessopportunities.

Thisholisticapproachwillhelpequipamoderndatascientistwiththetheoreticalknowledgeandsoftskillsrequiredtohelpintegratedatascienceintomanydifferentmarkets,sectorsandbusiness.

2.Opensourcebasedtraining

Themajorityofskillsdevelopmentindatascienceisachievedthroughthesemeans[opensourcetoolsandresources].

- EDSAstudyevaluationreport(D1.4)Thedataexploitationincludingdatamarketsandlicensingcoursewillbeofferedasafreeinteractiveonlinecourse,builtitselffromopensourcesoftware.Additionally,allpracticalexercisesavailablewithinthecourse(aswellasthemajorityofthoselinkedtoexternally)willbebaseduponfreelyavailableopensourcetoolsandmodels.Thisgiveslearnersthemaximumpotentialforself-studyandlearningwithoutanyfinancialorotherbarrierstousingtools.

3.Softskillstraining

Data scientists are often hired with high expectations regarding their abilities to transformbusinesstacticsandstrategies;thussoft-skillssuchastheseareseenasdesirableandneedgreaterfocusindatasciencetraining.

- EDSAstudyevaluationreport(D1.4) ThedataExploitationincludingdatamarketsandlicensingcourseisentirelyfocussedontheaspectsofhowtotransformtheirapplicationsintobusinesstacticsandstrategies.Thethreecoursesalltakeakeybusinessfocusandlooknotjustathowdatasciencefitswithindifferentbusinessmodelsbutalsohowthiscanhelpchangeasectorandwhatstrategiestotake.

Thishoweverrequiresstrongpresentationandcommunicationskillsinordertoinfluenceseniormanagementandotherfunctionaldepartmentstomaketherightdecisionsbasedondata.

- EDSAstudyevaluationreport(D1.4)Inthesecondpartofthecourseon“Datasciencemeansbusiness”therearelessonsspecificallyonhowtocreateandpitchavaluepropositiontobusinessleadersinordertobringaboutchange.Theseskillsareessentialforleadingdatascienceinnovators.

4.Basicdataliteracyskills

Onekeyaspectofdataliteracyisunderstandingwhendatashouldnotbeusedtoinformadecision.Specifically,inthefirstpartoftheDataExploitationincludingdatamarketsandlicensingcoursethereisalessonentitled“Managingriskandreward”.Thislooksspecificallyabouttherisksofdatascienceblurringboundariesbetweendisciplinesthatresultsinalossofexpertiseandrigorofresultsthatcancreateanegativeimpact.

5.Navigationandguidance

EachlessoninthedataExploitationincludingdatamarketsandlicensingcoursehasbeendesignedtostandalone.Theintroductorycontentofeachlessonoutlinestheskillsthatwillbeacquiredandhowtheyarerelevanttoamoderndatascientist.Linkswillbemadetootherlessonswherenecessary.Themainnavigationpageforthelessonswillallowuserstochoosetheirpathwaythroughthelessons.Whileitwillberecommendedthatusersfollowthelessonsinorder,interlinkingthelessonswillmeanthisisnotstrictlynecessarytoaccessrelevantcontent.

Attheendofeachlesson,learnerswillbeguidednotonlytothenextlessonbutalsotoanyexternalexercises,contentandcoursesthroughwhichtheycanexpandtheirknowledge.

Page 35: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

D2.6Learningresources3Page35of36

2018©Copyrightlieswiththerespectiveauthorsandtheirinstitutions.

6.5 FurtherdevelopmentplansThiscoursewillbemaintainedandfurtherdevelopedintwokeyways:

1. Increase the number of external links to beginner level and related data science courses: Asdiscovered by the demand analysis, well-structured learning for data science beginners islacking.TheODIwillcontinuetomaintainanyrelevantlinksinthelearningresourcestoensurerelevancetolearners.

2. Repackageasacommercialproduct:ThecoursewillbeofferedasacommercialproductbyODI,asdescribedintheODIexploitationplanpresentedintheProjectExploitationReport(D5.4).

Page 36: D2.6 Learning resources 3 · 2018-02-01 · Project acronym: EDSA Project full name: European Data Science Academy Grant agreement no: 643937 D2.6 Learning resources 3 Deliverable

Page36of36EDSAGrantAgreementno.643937

7. ConclusionThe EDSA courses portfolio has incorporated awide range of learning resources, either offered byprojectpartnersorbythirdparties.ThisdeliverablehaspresentedthemodulesthathavebeenaddedtotheEDSAcoursesportfoliointhethirdyearoftheproject,inordertocoverthefollowing4topicsofthefinalEDSAcurriculum.:

● Programming/ComputationalThinking(RandPython)● DataIntensiveComputing● SocialMediaAnalytics● DataExploitationincludingdatamarketsandlicensing

In particular, we have presented an overview of the objectives and learning outcomes of the newmodules,wehaveidentifiedthetypesoflearningmaterialsusedandhowthesearedelivered,andwehavediscussedhowthenewmodulesarerelatedtotheEDSAcurriculumandthedemandanalysis,aswellastheplansinplacefortheirfurtherdevelopment.