predictive analytics - big data & artificial intelligence

October 2016 Predictive Analytics Big Data & Artificial Intelligence

Upload: manish-jain

Post on 16-Apr-2017

TRANSCRIPT

October 2016

Predictive Analytics - Big Data & Artificial Intelligence

Agenda

Demystify the following buzzwords:

• Artificial Intelligence (AI)
• Big Data
• Machine Learning
• Deep Learning
• Neural Networks
• Natural Language Processing (NLP)
• Image Recognition

Ultimate Goal: Predictive Analytics

Predict what users will want to buy.

Example: A consumer searches for a TV. Based on previous customers’ data, show another product that has a high probability of being bought as well.
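A minimal sketch of the idea above, assuming the "previous customers' data" is a list of past shopping baskets. The baskets, item names, and the `recommend` helper are all hypothetical and chosen only for illustration; a production recommender would use far more data and a more careful model.

```python
from collections import Counter

# Hypothetical past orders: each set is one previous customer's basket.
past_orders = [
    {"tv", "hdmi_cable"},
    {"tv", "wall_mount"},
    {"tv", "hdmi_cable", "soundbar"},
    {"laptop", "mouse"},
    {"tv", "hdmi_cable"},
]


def co_purchase_probabilities(orders, searched_item):
    """Estimate P(other item bought | searched_item bought) from past baskets."""
    baskets = [order for order in orders if searched_item in order]
    if not baskets:
        return {}
    counts = Counter(
        item for basket in baskets for item in basket if item != searched_item
    )
    return {item: n / len(baskets) for item, n in counts.items()}


def recommend(orders, searched_item):
    """Return the item most often bought together with searched_item, or None."""
    probabilities = co_purchase_probabilities(orders, searched_item)
    if not probabilities:
        return None
    return max(probabilities, key=probabilities.get)


probs = co_purchase_probabilities(past_orders, "tv")
best = recommend(past_orders, "tv")
print(best, probs[best])
```

With these made-up baskets, the HDMI cable appears in three of the four TV orders, so it is the product shown next to the TV.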

Evolution of Data Analytics

• 1990s: Excel. What happened?
• 2000s: Business Intelligence (BI) dashboards. What’s happening?
• 2015 and beyond: Actionable insights. What will happen?

The Process

1. Data Generated: structured and unstructured (e.g., video) data.
2. Data Stored: data is stored in databases and servers.
3. Data Processing: the data is processed using CPUs/GPUs and AI algorithms to detect patterns.
4. Actionable Insights: predictive signals are generated.

CPU: Central Processing Unit. GPU: Graphics Processing Unit.

Big Data | Artificial Intelligence

How Did We Get Here?

Databases (the 80s)
• Relational databases
• Gigabytes in size
• Low latency

Data Warehousing (the 90s)
• Terabytes in size
• Custom hardware

Today, it’s Big Data

Artificial Intelligence (AI)

When To Use Machine Learning

1. A pattern exists
2. We cannot pin down the pattern mathematically
3. We have data, and hopefully lots of data

Types of Machine Learning

Supervised Learning

[Figure: scatter plot of Price against Square Feet]

We know what we are trying to predict. We use examples for which both we and the model know the answers to “train” our model. It can then generate predictions for examples we don’t know the answer to.

Example: Predict the price of a house based on the size of the house.
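The house-price example can be sketched with ordinary least squares on a single feature. This is only one of many possible supervised models, and the training data below is made up (it lies exactly on a line so the result is easy to check by hand); `fit_line` and `predict` are hypothetical helper names.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply a fitted (slope, intercept) pair to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical training examples: the answers (prices) are known.
square_feet = [1000, 1500, 2000, 2500]
prices = [200_000, 275_000, 350_000, 425_000]

model = fit_line(square_feet, prices)

# A house the model has not seen, with no known answer.
estimate = predict(model, 1800)
print(model, estimate)
```

The "training" step is `fit_line`; the prediction step then applies the learned slope and intercept to a size that was not among the examples.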

Unsupervised Learning

[Figure: scatter plot of unlabeled points, axes X and Y]

We don’t know what we are trying to predict. We are trying to identify some naturally occurring patterns in the data which may be informative.

Example: Try to identify “clusters” of customers based on the data we have on them.
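One common way to find such clusters is k-means, sketched below in plain Python. The slides do not name a specific algorithm, so k-means is an assumption here, and the customer data (annual spend, store visits per month) is invented for illustration.

```python
import random


def kmeans(points, k, iterations=20, seed=0):
    """Group points into k clusters by repeatedly assigning each point to its
    nearest centroid and then moving each centroid to the mean of its cluster."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k distinct data points
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            distances = [
                sum((a - b) ** 2 for a, b in zip(point, centroid))
                for centroid in centroids
            ]
            clusters[distances.index(min(distances))].append(point)
        centroids = [
            tuple(sum(column) / len(cluster) for column in zip(*cluster))
            if cluster
            else centroids[i]  # keep the old centroid if a cluster is empty
            for i, cluster in enumerate(clusters)
        ]
    return clusters, centroids


# Hypothetical customers: (annual spend in dollars, store visits per month).
customers = [(200, 2), (220, 3), (250, 2), (900, 10), (950, 12), (1000, 11)]

clusters, centroids = kmeans(customers, k=2)
for cluster in clusters:
    print(sorted(cluster))
```

No labels are supplied: the algorithm only sees the raw numbers and separates the low-spend customers from the high-spend ones on its own. In practice the features would usually be rescaled first so that one column does not dominate the distance.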

What is Deep Learning?

• Deep Learning and Neural Networks are often used synonymously; strictly, deep learning refers to neural networks with many layers.

• It’s a branch of machine learning based on a set of algorithms that attempt to model high-level abstractions in data by using a deep graph with multiple processing layers, composed of multiple linear and non-linear transformations.
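The phrase "multiple processing layers, composed of multiple linear and non-linear transformations" can be sketched as a forward pass through a tiny network. The weights below are hypothetical and untrained, and ReLU is just one common choice of non-linearity; a real network would have many more units and would learn its weights from data.

```python
def linear(inputs, weights, biases):
    """Linear transformation: one weighted sum plus bias per output unit."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


def relu(values):
    """Non-linear transformation: negative values are clipped to zero."""
    return [max(0.0, v) for v in values]


def forward(inputs, layers):
    """Pass inputs through every (weights, biases) layer in turn,
    applying ReLU after each layer except the last."""
    activations = inputs
    for index, (weights, biases) in enumerate(layers):
        activations = linear(activations, weights, biases)
        if index < len(layers) - 1:
            activations = relu(activations)
    return activations


# Hypothetical two-layer network: 2 inputs -> 2 hidden units -> 1 output.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, -1.0]),
    ([[1.0, 2.0]], [0.5]),
]

output = forward([2.0, 1.0], layers)
print(output)
```

Stacking more `(weights, biases)` pairs in `layers` makes the graph "deeper"; the non-linear step between layers is what keeps the stack from collapsing into a single linear transformation.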

What we see vs. what the computer “sees”
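One common reading of this contrast, assuming it refers to pixel data: a person sees a shape, while the computer receives a grid of numbers. The 5x5 grayscale image below is hypothetical, and `render` is an illustrative helper, not part of any library.

```python
# Hypothetical 5x5 grayscale image: 0 is black, 255 is white.
image = [
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
    [255, 255, 255, 255, 255],
    [0, 0, 255, 0, 0],
    [0, 0, 255, 0, 0],
]


def render(pixels, threshold=128):
    """Turn the grid of numbers back into something a person can read:
    bright pixels become '#', dark pixels become '.'."""
    return "\n".join(
        "".join("#" if value >= threshold else "." for value in row)
        for row in pixels
    )


print("What the computer sees:")
for row in image:
    print(row)

print("What we see:")
print(render(image))
```

An image recognition model works directly on the numeric grid; the plus sign is only obvious to us once the numbers are drawn.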

Tools of The Trade

• Apache SystemML
• Google Cloud Machine Learning

Questions?

version: draft

Appendix

AI Researchers

• Geoffrey Hinton: University of Toronto, Google
• Yoshua Bengio: University of Montreal
• Yann LeCun: New York University, Facebook
• Andrew Ng: Stanford University, Baidu

CPU vs. GPU Performance

MapReduce

The Name… Hadoop

Hadoop is named after the yellow toy elephant that belonged to Doug Cutting’s son.

In 2006, while working at Yahoo, Doug Cutting developed the Hadoop framework. In 2008 it became a top-level project of the Apache Software Foundation, hence the official name Apache Hadoop.

Hadoop to the Rescue

“an open source framework written in Java for storing and processing massive amounts of data in a distributed manner”

2 Key Components of the Framework:

1. Storage: Hadoop Distributed File System (HDFS). Scalable file system that distributes and stores data across many machines in a cluster.
2. Analysis: MapReduce. Framework for distributed processing.

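Hadoop Architecture

Hadoop can run on cheap commodity hardware, on premises or in the cloud.

HDFS: Stores files in large blocks (64 MB by default in Hadoop 1.x) across multiple machines for fault tolerance. By default, each block is stored on 3 separate machines.

MapReduce: Breaks large data processing problems into multiple steps, namely Mappers (DataNode) and Reducers (TaskTrackers), that can be worked on in parallel on multiple machines. Strictly speaking, in Hadoop 1.x both map and reduce tasks are run by TaskTrackers, which usually sit on the same machines as the DataNodes.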
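The storage side can be sketched as two small steps: cut a file into fixed-size blocks, then place each block on several machines. This is a simplified model, not the HDFS API; the function names and node names are hypothetical, and the round-robin placement stands in for the more elaborate, rack-aware policy that real HDFS uses.

```python
def split_into_blocks(file_size_mb, block_size_mb=64):
    """Return the sizes of the blocks a file of the given size is cut into."""
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        size = min(block_size_mb, remaining)
        blocks.append(size)
        remaining -= size
    return blocks


def place_replicas(num_blocks, machines, replication=3):
    """Assign every block to `replication` distinct machines (round-robin)."""
    if replication > len(machines):
        raise ValueError("not enough machines for the requested replication")
    return {
        block: [machines[(block + i) % len(machines)] for i in range(replication)]
        for block in range(num_blocks)
    }


# A 100 MB file with 64 MB blocks, as in the sales data example.
blocks = split_into_blocks(100)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])

print(blocks)
for block, nodes in placement.items():
    print(block, nodes)
```

Because every block lives on three different machines, losing any single machine still leaves two copies of each block available.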

MapReduce

Store Sales Data (100 MB)

Mappers (NameNode 1)
• DataNode 1 (64 MB): LA, NYC
• DataNode 2 (36 MB): LA, NYC

Shuffle and Sort

Reducers (JobTracker)
• TaskTracker 1: LA, LA
• TaskTracker 2: NYC, NYC

MapReduce: Map → Shuffle & Sort → Reduce → Result
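The four stages can be sketched in plain Python, using store sales grouped by city as in the LA/NYC example. This runs in a single process and only mimics the data flow; it does not use Hadoop, and the record format and sales amounts are hypothetical.

```python
from collections import defaultdict

# Hypothetical raw sales records ("city,amount"), already split into two blocks.
block_1 = ["LA,100", "NYC,250"]
block_2 = ["LA,50", "NYC,75"]


def map_phase(lines):
    """Map: turn each raw line into a (key, value) pair."""
    pairs = []
    for line in lines:
        city, amount = line.split(",")
        pairs.append((city, int(amount)))
    return pairs


def shuffle_and_sort(mapped_outputs):
    """Shuffle & Sort: collect all values for the same key, ordered by key."""
    grouped = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            grouped[key].append(value)
    return dict(sorted(grouped.items()))


def reduce_phase(grouped):
    """Reduce: combine the values of each key into a single result."""
    return {key: sum(values) for key, values in grouped.items()}


# Each block is mapped independently, which is what allows parallelism.
mapped = [map_phase(block_1), map_phase(block_2)]
grouped = shuffle_and_sort(mapped)
result = reduce_phase(grouped)
print(result)
```

Each mapper only needs its own block, and each reducer only needs the values for its own keys, so both stages can be spread across many machines.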

Hadoop 1.0 vs. 2.0

The Future…