Machine Learning without the PhD - Azure ML

Simon Elliston Ball, Head of Big Data, @sireb. Machine Learning without the PhD - Azure ML. http://bit.ly/learningAzureML #learnAzureML

Upload: simon-elliston-ball

Post on 14-Dec-2014

Category:

Technology


DESCRIPTION

Introduction to Azure ML, and how to build Machine Learning models in the cloud without needing a PhD in Data Science.

TRANSCRIPT

  • 1. Machine Learning without the PhD - Azure ML. Simon Elliston Ball, Head of Big Data, @sireb. http://bit.ly/learningAzureML #learnAzureML

  • 2. Thanks for CloudBurst to:
  • 3. Skynet. Self aware flying robots. In the cloud
  • 4. Not Skynet
  • 5. Goals of machine learning: Prediction, Explanation, Automation, Anomaly detection
  • 6. Prediction: "Prediction is very difficult, especially when it's about the future." - Niels Bohr
  • 7. Classification. Binary: A or B; A or !A. Multi-Class: A, B, C, D, E. Binary Multi-Class: A or !A, ok: B or !B, how about C?
  • 8. Types of learning: Regression, Clustering, Classification, Natural Language Processing, Deep learning
  • 9. Science, with Data: Hypothesis, Model, Test, Evaluate, Conclude
  • 10. Cleaning: 80% of any data scientist's time is spent cleaning the data, leaving just 20% to complain about cleaning the data. https://www.flickr.com/photos/derekgavey/4283300990
  • 11. Cleaning: Missing values, Normalization, Scaling. https://www.flickr.com/photos/derekgavey/4283300990
  • 12. Cleaning: Missing values, Normalization, Scaling, Filtering (Signal Processing, Complex Models, Simple Thresholds, Smoothing, Moving Average). https://www.flickr.com/photos/derekgavey/4283300990
  • 13. Cleaning: Missing values, Normalization, Scaling, Filtering, Meta data and naming (Column Names, Projection, Type Cleanups). https://www.flickr.com/photos/derekgavey/4283300990
  • 14. Splits: Training, Testing, Validation. 80%, 20%. E.g. when comparing different models. https://www.flickr.com/photos/tabor-roeder/11606138806
  • 15. Modeling: Continuous? Discrete? Labeled? Supervised. Un-Labeled? Unsupervised
  • 16. Regression
  • 17. Decision Trees: Weather != Snow, Weather = Snow, Temp > 20C, Wind > 20
  • 18. Neural Networks
  • 19. Neural Networks
  • 20. Clustering
  • 21. Scoring: Apply model to your data. Outputs: Result; The probability that the result is sort of right
  • 22. Demo time: Azure Portal -> Machine Learning Studio, New Experiment. https://www.flickr.com/photos/mgdtgd/144569790
  • 23. Evaluating: Is my classifier any good? False Negative, True Positive, False Positive, True Negative. Precision: TP/(TP+FP). Accuracy: (TP+TN)/(P+N)
  • 24. Evaluating: How far out was I? Error distance functions: Mean squared error, Mean absolute error, R2 Coefficient of Determination. https://www.flickr.com/photos/dahlstroms/3945656390
  • 25. Demo
  • 26. R you ready? Programming Language based on S+. By Statisticians, For statisticians. Many many libraries. Included in Azure ML
  • 27. R you ready? abc, abind, actuar, ade4, AdMit, aod, ape, approximator, arm, arules, arulesViz, ash, assertthat, AtelieR, BaBooN, BACCO, BaM, bark, BAS, base, BayesDA, bayesGARCH, bayesm, bayesmix, bayesQR, bayesSurv, Bayesthresh, BayesTree, BayesValidate, BayesX, BayHaz, bbemkr, BCBCSF, BCE, bclust, bcp, BenfordTests, bfp, BH, bisoreg, bit, bitops, BLR, BMA, Bmix, BMS, bnlearn, boa, Bolstad, boot, bootstrap, bqtl, BradleyTerry2, brew, brglm, bspec, bspmma, BVS, cairoDevice, calibrator, car, caret, catnet, caTools, chron, class, cluster, clusterSim, coda, codetools, coin, colorspace, combinat, compiler, corpcor, cslogistic, ctv, cubature, data.table, datasets, date, dclone, deal, Deducer, DeducerExtras, deldir, DEoptimR, deSolve, devtools, dichromat, digest, distrom, dlm, doSNOW, dplyr, DPpackage, dse, e1071, EbayesThresh, ebdbNet, effects, emulator, ensembleBMA, entropy, EvalEst, evaluate, evdbayes, evora, exactLoglinTest, expm, extremevalues, factorQR, faoutlier, fitdistrplus, FME, foreach, forecast, foreign, formatR, Formula, fracdiff, gam, gamlr, gbm, gclus, gdata, gee, genetics, geoR, geoRglm, geosphere, ggmcmc, ggplot2, glmmBUGS, glmnet, gmodels, gmp, gnm, googlePublicData, googleVis, GPArotation, gplots, graphics, grDevices, gregmisc, grid, gridExtra, growcurves, grpreg, gsubfn, gtable, gtools, gWidgets, gWidgetsRGtk2, haplo.stats, hbsae, hdrcde, heavy, hflights, HH, HI, highr, Hmisc, htmltools, httpuv, httr, IBrokers, igraph, intervals, iplots, ipred, irr, iterators, JavaGD, JGR, kernlab, KernSmooth, KFKSDS, kinship2, kknn, klaR, knitr, ks, labeling, Lahman, lars, lattice, latticeExtra, lava, lavaan, leaps, LearnBayes, limSolve, lme4, lmm, lmPerm, lmtest, locfit, lpSolve, magic, magrittr, mapdata, mapproj, maps, maptools, maptree, markdown, MASS, MasterBayes, Matrix, matrixcalc, MatrixModels, maxent, maxLik, mcmc, MCMCglmm, MCMCpack, memoise, methods, mgcv, mice, microbenchmark, mime, minpack.lm, minqa, misc3d, miscF, miscTools, mixtools, mlbench, mlogitBMA, mnormt, MNP, modeltools, mombf, monomvn, monreg, mosaic, MSBVAR, msm, multcomp, multicool, munsell, mvoutlier, mvtnorm, ncvreg, nlme, NLP, nnet, numbers, numDeriv, openNLP, openNLPdata, OutlierDC, OutlierDM, outliers, pacbpred, parallel, partitions, party, PAWL, pbivnorm, pcaPP, permute, pls, plyr, png, polynom, PottsUtils, predmixcor, PresenceAbsence, prodlim, profdpm, profileModel, proto, pscl, psych, quadprog, quantreg, qvcalc, R.matlab, R.methodsS3, R.oo, R.utils, R2HTML, R2jags, R2OpenBUGS, R2WinBUGS, ramps, RandomFields, randomForest, RArcInfo, raster, rbugs, RColorBrewer, Rcpp, RcppArmadillo, rcppbugs, RcppEigen, RcppExamples, RCurl, relimp, reshape, reshape2, rgdal, rgeos, rgl, RGraphics, RGtk2, RJaCGH, rjags, rJava, RJSONIO, robCompositions, robustbase, RODBC, rootSolve, roxygen, roxygen2, rpart, rrcov, rscproxy, RSGHB, RSNNS, RTextTools, RUnit, runjags, Runuran, rworldmap, rworldxtra, SampleSizeMeans, SampleSizeProportions, sandwich, sbgcop, scales, scapeMCMC, scatterplot3d, sciplot, segmented, sem, seriation, setRNG, sgeostat, shapefiles, shiny, SimpleTable, slam, smoothSurv, sna, snow, SnowballC, snowFT, sp, spacetime, SparseM, spatial, spBayes, spdep, spikeslab, splancs, splines, spTimer, stats, stats4, stochvol, stringr, strucchange, stsm, stsm.class, SuppDists, survival, svmpath, tau, tcltk, tcltk2, TeachingDemos, tensorA, testthat, textcat, textir, tfplot, tframe, tgp, TH.data, timeDate, tm, tools, translations, tree, tseries, tsfa, tsoutliers, TSP, UsingR, utils, varSelectIP, vcd, vegan, VGAM, VIF, whisker, wordcloud, XLConnect, XML, xtable, xts, yaml, zic, zipfR, zoo
  • 28. R you ready? Two Data Sets enter. One Data Set leaves. (And a chart if you're lucky)
  • 29. Production: Once more from the top. Modeled in R. Applied in C#, Java, whatever, but not R
  • 30. Publish a web service. https://www.flickr.com/photos/jurvetson/6858583426
  • 31. Thanks! Simon Elliston Ball, [email protected], @sireb
  • 32. Questions? Simon Elliston Ball, [email protected], @sireb. http://bit.ly/learningAzureML #learnAzureML
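
The evaluation formulas on slides 23 and 24 can be tried out by hand. The following is a minimal Python sketch, not code from the talk or from Azure ML: the function names and the sample data are hypothetical, and the R2 formula (1 minus residual sum of squares over total sum of squares) is the usual textbook definition, assumed here because the slide only names the metric.

```python
# Illustrative sketch only. Function names and sample data are hypothetical;
# nothing here is taken from the talk's demos or from the Azure ML API.

def confusion_counts(actual, predicted):
    """Return (TP, FP, TN, FN) for two equal-length lists of booleans."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a and p)
    fp = sum(1 for a, p in pairs if not a and p)
    tn = sum(1 for a, p in pairs if not a and not p)
    fn = sum(1 for a, p in pairs if a and not p)
    return tp, fp, tn, fn


def precision(tp, fp):
    """Slide 23: TP / (TP + FP)."""
    return tp / (tp + fp)


def accuracy(tp, fp, tn, fn):
    """Slide 23: (TP + TN) / (P + N), where P = TP + FN and N = TN + FP."""
    return (tp + tn) / (tp + fn + tn + fp)


def mean_squared_error(actual, predicted):
    """Average of the squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_absolute_error(actual, predicted):
    """Average of the absolute differences between actual and predicted."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Assumed textbook definition: 1 - SS_residual / SS_total."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up classifier output: True means "positive class".
    labels = [True, True, True, False, False, False, False, True]
    guesses = [True, True, False, False, False, True, False, True]
    tp, fp, tn, fn = confusion_counts(labels, guesses)
    print("precision:", precision(tp, fp))
    print("accuracy:", accuracy(tp, fp, tn, fn))

    # Made-up regression output.
    y_true = [3.0, 5.0, 7.0, 9.0]
    y_pred = [2.0, 5.0, 8.0, 9.0]
    print("MSE:", mean_squared_error(y_true, y_pred))
    print("MAE:", mean_absolute_error(y_true, y_pred))
    print("R2:", r_squared(y_true, y_pred))
```

The sketch uses only the standard library so that each formula stays visible; it does not guard against empty inputs, a classifier that never predicts the positive class, or constant target values, all of which would cause a division by zero.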