Introduction to RapidMiner Studio V7

Dublin R Lightning Talks Event: Introduction to RapidMiner. Geraldine Gray, PhD. March 24th 2016

Upload: geraldinegray

Post on 11-Apr-2017


Page 1: Introduction to RapidMiner Studio V7

Dublin R Lightning Talks Event

Introduction to RapidMiner, Geraldine Gray, PhD

March 24th 2016

Page 2: Introduction to RapidMiner Studio V7

Introductions

Geraldine is a lecturer in Institute of Technology Blanchardstown (ITB)

Coordinator for ITB’s MSc in Applied Data Science and Analytics

[email protected] https://ie.linkedin.com/in/geraldine-gray-9b2b187

@GGrayITB geraldine.gray.itb

Page 3: Introduction to RapidMiner Studio V7

Overview

Objective:
•  Introduction to RapidMiner Studio for data analytics

Agenda:
1.  Overview of RapidMiner Studio interface
2.  Importing a dataset
3.  Descriptive statistics and visualisation
4.  Data modelling
5.  Model evaluation
6.  Data cleaning
7.  Adding R script

G. Gray 3

Page 4: Introduction to RapidMiner Studio V7

Topic 1: Overview of RapidMiner Studio

Page 5: Introduction to RapidMiner Studio V7

Installing RapidMiner on your own machine

The latest version of RapidMiner Studio is V7. It can be downloaded from https://rapidminer.com/products/comparison/

•  For Windows: download rapidminer-install.exe and run it. By default it installs to C:\Program Files and adds itself to the Start > Programs menu.

•  For Mac: download the .dmg and add it to your Applications folder.

Page 6: Introduction to RapidMiner Studio V7

Background

RapidMiner comes with:

•  Over 125 mining algorithms

•  Over 100 data cleaning and preparation functions

•  Over 30 charts for data visualisation

•  A selection of metrics to evaluate model performance

Each function is available as an OPERATOR (which is implemented as a Java class). A process is built by connecting operators together, with the output of one operator passing as input to the next. This is all done by drag and drop.
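For an R audience, connecting operators can be pictured as applying a list of functions in order, each taking the previous one's output. This is only an analogy: the step functions and `run_process` below are hypothetical stand-ins, not RapidMiner operators, and R's built-in `iris` data frame stands in for a repository dataset.

```r
# Hypothetical stand-ins for operators: each takes a data frame and returns one.
select_petals <- function(df) df[, c("Petal.Length", "Petal.Width", "Species")]
keep_complete <- function(df) df[complete.cases(df), ]

# A "process" applies the operators in order, output feeding the next input.
run_process <- function(data, operators) {
  for (op in operators) data <- op(data)
  data
}

result <- run_process(iris, list(select_petals, keep_complete))
dim(result)
```

As in RapidMiner, the original `iris` object is left unchanged; only the returned copy differs.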

Page 7: Introduction to RapidMiner Studio V7

Creating a repository

•  All processes created in RapidMiner are saved to a repository. The repository also stores other objects, including datasets and prediction models.

•  A repository maps to a folder on your machine created specifically for RapidMiner work.

Before starting RapidMiner Studio for the first time, create a folder somewhere on your machine to store your processes and datasets from today's workshop.

•  The folder can be local to the machine, on an external drive/USB, or in the cloud.

Page 8: Introduction to RapidMiner Studio V7

Start up RapidMiner

When you start RapidMiner Studio, you are presented with an initial introduction window. Close this window to see the main interface.

Page 9: Introduction to RapidMiner Studio V7

RapidMiner GUI

Panels in the main window:

•  Process design window

•  Parameter settings for the selected operation

•  Log of activities, including errors. If this is missing, add it from View / Show Panel

•  Available operators

•  Explanation of the selected operator

•  Navigate repositories

Page 10: Introduction to RapidMiner Studio V7

RapidMiner toolbars

Run process

Stop process

Automatically connect operators

undo redo

save

new open

Add/remove breakpoints

Show and alter the order in which operators run

Resize the process window

Process design view

View process results

Add a note / comment

Enable/disable an operator

Right-click options:

Page 11: Introduction to RapidMiner Studio V7

Processes and Datasets

•  Your RapidMiner repository (folder) will contain different types of objects, most commonly:

•  Datasets – the actual data itself
   •  The symbol is a blue cylinder

•  Processes – a series of operators that are applied to a dataset to analyse it.
   •  The symbol is two cogwheels
   •  A process will read in a dataset, carry out various tasks on it, and output the results. A process does NOT change the original dataset.

Page 12: Introduction to RapidMiner Studio V7

Repositories

•  RapidMiner comes with a repository called samples, which has a number of datasets and example processes.
   –  You cannot edit the samples repository

To create your own repository, select the drop-down box on the repository window, select ‘create repository’, and browse to the folder you created.

Page 13: Introduction to RapidMiner Studio V7

Finding an operator

•  RapidMiner comes with many operators, so finding the one you want can be daunting at first.
•  Once you get familiar with operator names, you can find them more easily using the filter at the top of the operator window

List all operators that start with ‘read’

List all operators whose first word starts with ‘dec’, and 2nd word starts with ‘t’.

Page 14: Introduction to RapidMiner Studio V7

Topic 2: Importing a dataset

Page 15: Introduction to RapidMiner Studio V7

Reading in a dataset

There are two options for accessing a dataset:

1.  You can use one of the many Read operators to read data into RapidMiner temporarily for a particular process.

2.  You can import a dataset into your repository, where it will be available to all processes via the Retrieve operator. This is the most efficient method, as metadata is stored with the dataset.
    •  RapidMiner ships with a number of datasets already loaded in the SAMPLES repository

Once a dataset is in a repository, you can access it using the Retrieve operator.

Page 16: Introduction to RapidMiner Studio V7

Wine Quality Dataset

We are first going to import the WINE QUALITY dataset from the UCI repository: http://archive.ics.uci.edu/ml/machine-learning-databases/wine-quality/

Attributes:
1 - fixed acidity
2 - volatile acidity
3 - citric acid
4 - residual sugar
5 - chlorides
6 - free sulfur dioxide
7 - total sulfur dioxide
8 - density
9 - pH
10 - sulphates
11 - alcohol
Output variable (based on sensory data):
12 - quality (score between 0 and 10)

Download the winequality-red.csv file from the UCI website. Take a look at the dataset in Excel or Notepad/Textpad. The first row is column headings. Columns are separated by ‘;’

Google: UCI repository, and look for wine quality (not wine)
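For comparison, the same file layout can be read in R. A minimal sketch, assuming a header row and a ‘;’ delimiter as described on this slide. The two data rows embedded here are illustrative placeholders covering only four of the twelve columns, so that the example runs offline; in practice the `text =` argument would be replaced by the path to the downloaded file.

```r
# Illustrative placeholder rows in the described layout (header row, ';' delimiter).
sample_lines <- c(
  "fixed acidity;volatile acidity;citric acid;quality",
  "7.4;0.70;0.00;5",
  "7.8;0.88;0.00;5"
)

# check.names = FALSE keeps the column headings exactly as written in the file.
wine <- read.csv(text = sample_lines, sep = ";", header = TRUE, check.names = FALSE)
str(wine)
```

Reading the file with the default comma delimiter instead would produce a single column, which is the same mistake the preview in the RapidMiner import wizard helps to catch.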

Page 17: Introduction to RapidMiner Studio V7

Importing the wine dataset into RapidMiner

1.  Return to RapidMiner
2.  Select ‘add data’, then ‘my computer’, and browse to the downloaded file.
3.  You are presented with a number of screens to set the metadata for this dataset, as follows...

Page 18: Introduction to RapidMiner Studio V7

Importing the wine dataset into RapidMiner

The first screen specifies import settings, including the column delimiter. A preview at the bottom tells you if the settings are correct.

Page 19: Introduction to RapidMiner Studio V7

Importing the wine dataset into RapidMiner

•  The second screen specifies the data type for each attribute, and its role in the data analytics process

Most data types are intuitive.
Binominal: a binary attribute; it can only have two values. RapidMiner will assume binominal if an attribute has just two distinct values in the first 100 rows scanned. This is not always correct.
Polynominal: a non-numeric attribute with multiple values.
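Why the guess can be wrong is easy to reproduce. The sketch below mimics the behaviour described on this slide in R; it is not RapidMiner's actual implementation. `guess_type` is a hypothetical helper that inspects only the first 100 values, and the `port` column is made-up data.

```r
# Hypothetical helper: guess a nominal type from the first n_scan values only.
guess_type <- function(values, n_scan = 100) {
  scanned <- head(values, n_scan)
  if (length(unique(scanned)) == 2) "binominal" else "polynominal"
}

# Made-up column: a third value first appears at row 101, after the scanned window.
port <- c(rep(c("S", "C"), 50), "Q")

guess_type(port)        # guess based on rows 1 to 100 only
length(unique(port))    # distinct values in the full column
```

The guess and the full-column count disagree, which is why the suggested type is worth checking on the import screen.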

Page 20: Introduction to RapidMiner Studio V7

Importing the wine dataset into RapidMiner

ROLE
•  Attributes without a role are used by mining algorithms to identify patterns in the dataset.
•  Prediction models will attempt to predict the attribute with the role of LABEL.
•  The attribute with the role of ID is a primary key, used in JOIN operations.
•  You can specify other, user-defined, roles for attributes to be ignored by mining algorithms

Change the role of the final attribute, quality, to label.

Page 21: Introduction to RapidMiner Studio V7

Importing the wine dataset into RapidMiner

In the final screen, specify the name of the dataset, i.e. wine, and browse to the repository folder where it is to be stored. The dataset will now appear in your repository window.

Page 22: Introduction to RapidMiner Studio V7

Topic 3: Descriptive Statistics and Visualisation

Page 23: Introduction to RapidMiner Studio V7

Exploring a dataset

In the samples/data repository there are a number of datasets already imported (i.e. in the RM format). Click on the TITANIC dataset to open it. This automatically brings you to the results view. Within the results view, there are five tabs on the left-hand side. We will look at the first three:
1.  Data: View the data in the dataset
2.  Statistics: View summary statistics on the dataset
3.  Charts: A range of visualizations of the dataset

Page 24: Introduction to RapidMiner Studio V7

The data view

•  The data view lists all the rows in the dataset, and reports on the number of rows (examples) and columns (attributes) in the dataset.

•  The filters on the right-hand side allow you to investigate rows with missing values.

Page 25: Introduction to RapidMiner Studio V7

The statistics view

The statistics view gives metadata on each attribute, specifically:

–  Data types
–  Number of missing values
–  Min, max, average for numeric attributes
–  Least, Most and a list of values for non-numeric attributes

Clicking on an attribute will show a histogram for that attribute.

This is a good view for an initial quality assessment of:

1.  Missing values
2.  Outlier values
3.  Attributes whose distribution of values is not as expected, indicating the dataset is not representative of the population of interest.
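The same first-pass checks can be made in R. A rough sketch using R's built-in `iris` data frame (an assumption for convenience; its column names differ from the RapidMiner sample), reporting type, missing count, and range per attribute:

```r
# Per-attribute metadata, similar in spirit to what the statistics view reports.
numeric_or_na <- function(col, f) if (is.numeric(col)) f(col, na.rm = TRUE) else NA

attribute_stats <- data.frame(
  type    = sapply(iris, function(col) class(col)[1]),
  missing = sapply(iris, function(col) sum(is.na(col))),
  min     = sapply(iris, numeric_or_na, f = min),
  max     = sapply(iris, numeric_or_na, f = max),
  average = sapply(iris, numeric_or_na, f = mean)
)
print(attribute_stats)

# Counts per value for a non-numeric attribute.
table(iris$Species)
```

A quick `hist(iris$Sepal.Length)` would give the per-attribute histogram that RapidMiner shows when an attribute is clicked.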

Page 26: Introduction to RapidMiner Studio V7

The charts view

•  The charts view gives you access to a range of visualisations for your dataset.

Page 27: Introduction to RapidMiner Studio V7

The charts view

Go to the chart view of the titanic dataset. Under chart style, select ‘histogram color’. Set Histogram to ‘age’; Color to ‘Survived’; and reduce the Opaqueness of the histogram.
a)  Does it appear that priority was given to children?
b)  Instead of ‘age’ plot ‘sex’. Does it appear that priority was given to women?
c)  Looking at a histogram of ‘class’, which class of passenger was most likely to survive?
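Question (b) can also be checked numerically. A hedged sketch using the `Titanic` table that ships with base R; note that this is a different extract from the RapidMiner sample (it includes crew and records age only as Child/Adult), so the figures will not match the RapidMiner dataset exactly.

```r
# Titanic is a 4-way contingency table: Class x Sex x Age x Survived.
by_sex <- margin.table(Titanic, c(2, 4))          # collapse to Sex x Survived

# Proportion surviving within each sex (rows sum to 1 before selecting "Yes").
survival_rate <- prop.table(by_sex, 1)[, "Yes"]
print(round(survival_rate, 2))
```

Swapping the margin `2` for `1` (Class) gives the equivalent check for question (c).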

Page 28: Introduction to RapidMiner Studio V7

The charts view

We are going to look at one more dataset, the iris dataset, which has its own Wikipedia page: https://en.wikipedia.org/wiki/Iris_flower_data_set

Attributes:
a1: Sepal Length
a2: Sepal Width
a3: Petal Length
a4: Petal Width

Class label:
Iris-setosa
Iris-versicolor
Iris-virginica

Page 29: Introduction to RapidMiner Studio V7

The charts view

•  Navigate to the IRIS dataset in the samples/data repository. Double-click to open it in the results view.

•  In the charts view, select ‘Scatter Matrix’. This shows a scatter plot of all pairs of attributes, colour-coded by class label.

a)  Are the three classes well separated?
b)  Select a Scatter 3-D Color plot. By default it colour-codes by class label. Use your mouse to rotate the plot and so view it from different perspectives.
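A comparable scatter matrix can be drawn in R. A minimal sketch using the built-in `iris` data frame, where the class label column is called `Species` rather than `label`:

```r
# Map the three class labels to colour numbers 1..3.
class_colours <- as.integer(iris$Species)

# Scatter plot of every pair of the four numeric attributes, coloured by class.
pairs(iris[, 1:4], col = class_colours, pch = 19,
      main = "Iris scatter matrix by class label")
```

The panels involving the petal measurements are the ones to look at when answering question (a).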

Page 30: Introduction to RapidMiner Studio V7

Close all tabs in the results view

Page 31: Introduction to RapidMiner Studio V7

Topic 4: Building a predictive model

Page 32: Introduction to RapidMiner Studio V7

Classification

A classification algorithm trains a model to predict a class label – one of the attributes in the dataset.
This class label defines groups in the dataset.
The algorithm learns what differentiates these groups from each other.

Class Label       A1    A2    A3    A4
Iris-setosa       5.1   3.5   1.4   0.2
Iris-setosa       5     3.6   1.4   0.2
Iris-setosa       5.7   3.8   1.7   0.3
Iris-setosa       4.6   3.6   1     0.2
Iris-setosa       5     3.3   1.4   0.2
Iris-versicolor   6     2.2   4     1
Iris-versicolor   6.7   3.1   4.4   1.4
Iris-versicolor   6.8   2.8   4.8   1.4
Iris-versicolor   5.7   3     4.2   1.2
Iris-versicolor   5.7   2.9   4.2   1.3
Iris-virginica    7.1   3     5.9   2.1
Iris-virginica    7.2   3.6   6.1   2.5
Iris-virginica    6.5   3.2   5.1   2
Iris-virginica    6.7   3.3   5.7   2.1
Iris-virginica    6     3     4.8   1.8

Page 33: Introduction to RapidMiner Studio V7

Classification algorithms

Classification algorithms use labeled data to learn how to identify instances of each class.

Will it be easy to train a model to differentiate between the three types of iris below?

Iris virginica

Iris versicolor

Iris setosa

Page 34: Introduction to RapidMiner Studio V7

Classification algorithms

There are many classification algorithms implemented in RapidMiner, under modeling/predictive. We will look at one such algorithm: a Decision Tree

Page 35: Introduction to RapidMiner Studio V7

Starting a process...

•  So far in RapidMiner, we have just looked at datasets; we haven’t actually done anything with the data.

•  In this section we will create a RapidMiner process that trains a classification model...

Return to the Design View

The process window should be empty

Page 36: Introduction to RapidMiner Studio V7

Starting a process

The process will start by retrieving a dataset.

–  We will use the iris dataset

Navigate to the iris dataset in the samples/data repository, and drag it into the process window.

–  This adds a Retrieve operator, which retrieves a dataset from the repository.

Page 37: Introduction to RapidMiner Studio V7

Building a model

–  Drag ‘Decision Tree’ from the operators window onto the process window, after ‘Retrieve’.

–  Connect the ‘out’ port from Retrieve (click on the semicircle) to the ‘tra’ port of the ‘Decision Tree’ (click on the semicircle)

–  Connect both output ports of the Decision Tree to the process output port

Page 38: Introduction to RapidMiner Studio V7

About ports...

Diagram labels: process input port; process output ports; operator input ports; operator output ports.

Legend: mandatory input port; optional input port; output port has a value; output port does not have a value.

Ports represent inputs to an operator and outputs from an operator. Data and other objects are passed from one operator to the next in a process, as indicated by ports that are connected. Colors are used to indicate the type of data/object, e.g.:

purple: dataset
green: model
brown: model performance

Hover over a port to see the type of object required.

Connect matching colours

Page 39: Introduction to RapidMiner Studio V7

Run the process to build the model

•  Run the process. RapidMiner will automatically bring you to the results view.
•  There are two tabs in the results view (because we had two outputs from the process):
   –  The dataset itself
   –  The decision tree classification model

•  Click on the Decision Tree tab

Page 40: Introduction to RapidMiner Studio V7

Classification model

The text on leaf nodes is the predicted class label.

Attributes:
a1: Sepal Length
a2: Sepal Width
a3: Petal Length
a4: Petal Width

The height of the bar indicates the number of rows that matched this branch. Hover over the node to get the actual numbers.

A mix of colours indicates that not all rows matching this branch were in the same class.

Branches on the decision tree represent if..then.. rules, e.g. if a3 <= 2.450 then the flower is Iris setosa

Which attributes were most predictive of the class label?
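The example rule can be checked against R's built-in copy of the iris data. A quick sketch, assuming a3 corresponds to the `Petal.Length` column, as in the attribute list on this slide:

```r
# Rows that satisfy the rule "a3 <= 2.450".
matches_rule <- iris$Petal.Length <= 2.45

# Class labels of the rows on each side of the split.
table(rule = matches_rule, class = iris$Species)
```

In this copy of the data the rule isolates the setosa rows exactly, which is why that leaf of the tree shows a single colour.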

Page 41: Introduction to RapidMiner Studio V7

Topic 5: Model accuracy (and building blocks)

Page 42: Introduction to RapidMiner Studio V7

Model accuracy

A decision tree produces a nice visualisation of the rules that predict class membership. It can be used as a way to explore historic data (Descriptive modeling). However, the decision tree itself does not tell us how accurate the model will be when applied to new data (i.e. data that was not available to it during training).

In other words, can we rely on the accuracy of its predictions? (Predictive modeling)

To determine model accuracy when making predictions on new data, we do the following:

Page 43: Introduction to RapidMiner Studio V7

Model accuracy

1. Split the dataset into a training dataset and a test dataset

2. Train a model on the training dataset

3. Apply the model to the test dataset

4. Calculate how many rows were predicted correctly.

Page 44: Introduction to RapidMiner Studio V7

Model accuracy

Labeled data

Training data:

Label             A1    A2    A3    A4
Iris-versicolor   6     2.2   4     1
Iris-setosa       4.6   3.6   1     0.2
Iris-versicolor   5.7   2.9   4.2   1.3
Iris-versicolor   5.7   3     4.2   1.2
Iris-virginica    7.1   3     5.9   2.1
Iris-virginica    6     3     4.8   1.8
Iris-versicolor   6.7   3.1   4.4   1.4
Iris-virginica    6.5   3.2   5.1   2
Iris-setosa       5     3.3   1.4   0.2
Iris-virginica    6.7   3.3   5.7   2.1
Iris-setosa       5.1   3.5   1.4   0.2

Test data:

Label             A1    A2    A3    A4    Predicted value
Iris-setosa       5     3.6   1.4   0.2   ?
Iris-versicolor   6.8   2.8   4.8   1.4   ?
Iris-virginica    7.2   3.6   6.1   2.5   ?
Iris-setosa       5.7   3.8   1.7   0.3   ?

Classification algorithm

Train model

Classification model

Apply model

True Label        Predicted label
Iris-setosa       Iris-setosa
Iris-versicolor   Iris-virginica
Iris-virginica    Iris-versicolor
Iris-setosa       Iris-setosa

Accuracy: 50%
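The 50% figure on this slide follows directly from the four true/predicted pairs. As a small worked example in R, using exactly those labels:

```r
# True and predicted labels for the four test rows on this slide.
true_label <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica", "Iris-setosa")
predicted  <- c("Iris-setosa", "Iris-virginica", "Iris-versicolor", "Iris-setosa")

# Accuracy = share of rows where the prediction matches the true label.
accuracy <- mean(predicted == true_label)
accuracy   # 2 of 4 correct = 0.5
```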

Page 45: Introduction to RapidMiner Studio V7

Model accuracy in RM

•  Return to the Design View
•  Right-click on the Decision Tree operator and delete it
•  Right-click anywhere in the process window, select Insert Building Block, and then Nominal X-Validation.

•  A Validation operator is added to the process window. Move it to the right of the Retrieve operator and connect the ports.

Building blocks are groups of operators frequently used together. You can define your own, or use the 5 predefined building blocks.

The icon on the bottom right corner of the operator indicates there are other operators embedded within this operator. Click on the operator to view its sub-processes.

Page 46: Introduction to RapidMiner Studio V7

Model accuracy

1. The validation operator splits the dataset into partitions: some are used for training while others are used for testing

2. Train a Decision Tree on the training portion of the dataset

3. Apply the decision tree model to the test portion of the dataset

4. Calculate how many predictions were correct
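The four steps can be sketched in base R. This illustrates k-fold cross-validation in general, not the internals of RapidMiner's X-Validation operator. To stay dependency-free it swaps the decision tree for a simple nearest-centroid classifier, and the fold count of 10 and the random seed are arbitrary choices.

```r
set.seed(1)                      # arbitrary seed, for repeatability
k        <- 10                   # arbitrary number of partitions
features <- as.matrix(iris[, 1:4])
labels   <- iris$Species

# 1. Assign every row to one of k partitions at random.
fold <- sample(rep(1:k, length.out = nrow(iris)))

correct <- 0
for (i in 1:k) {
  test <- fold == i
  # 2. "Train": one centroid (mean vector) per class, from training rows only.
  centroids <- apply(features[!test, , drop = FALSE], 2,
                     function(col) tapply(col, labels[!test], mean))
  # 3. Apply: predict the class whose centroid is nearest to each test row.
  predicted <- apply(features[test, , drop = FALSE], 1, function(row) {
    distances <- rowSums(sweep(centroids, 2, row)^2)
    names(which.min(distances))
  })
  # 4. Count the correct predictions for this partition.
  correct <- correct + sum(predicted == as.character(labels[test]))
}

accuracy <- correct / nrow(iris)
accuracy
```

Every row is used for testing exactly once, so the final accuracy is an average over all partitions rather than the result of a single lucky or unlucky split.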

Page 47: Introduction to RapidMiner Studio V7

Model accuracy

•  Return up to the root level.
•  Output the model (mod) and the performance (ave) port.

•  Run the process

Page 48: Introduction to RapidMiner Studio V7

Model accuracy – confusion matrix

The performance operator gives the overall model accuracy, and accuracy within each class depicted as a confusion matrix:

pred.: refers to the class label predicted by the decision tree

true: refers to the actual class label in the original dataset

4 rows in the dataset were predicted as being Iris-virginica, but were actually Iris-versicolor

5 rows in the dataset were predicted as being Iris-versicolor, but were actually Iris-virginica

The diagonal represents correct predictions
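A confusion matrix of this shape can be rebuilt in R to see how the overall accuracy comes from the diagonal. The sketch below uses the two error counts quoted on this slide (4 and 5) and the fact that iris has 50 rows per class; it additionally assumes those nine rows were the only errors, which the slide text does not state, so the resulting figures are illustrative.

```r
classes <- c("Iris-setosa", "Iris-versicolor", "Iris-virginica")

# Rows = predicted class, columns = true class.
# Assumption: the 4 + 5 misclassifications quoted on the slide are the only errors.
confusion <- matrix(c(50,  0,  0,
                       0, 46,  5,
                       0,  4, 45),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(pred = classes, true = classes))

accuracy        <- sum(diag(confusion)) / sum(confusion)   # diagonal = correct
class_recall    <- diag(confusion) / colSums(confusion)    # per true class
class_precision <- diag(confusion) / rowSums(confusion)    # per predicted class

accuracy
round(class_recall, 3)
round(class_precision, 3)
```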

Page 49: Introduction to RapidMiner Studio V7

Topic 6: Data cleaning

Creating a RapidMiner process to
1.  Remove attributes
2.  Remove rows
3.  Fill missing values

Page 50: Introduction to RapidMiner Studio V7

Data cleaning

•  The iris dataset is a clean dataset, with classes that are easy to distinguish.
•  Datasets are not usually so clean, or easy to model.
•  The next section will build a RapidMiner process to clean a dataset and then train a classification model...
•  Return to the Design View.
•  Save your current process to your repository, and call it DT-IRIS
•  Start a new process
•  Choose a blank template

Page 51: Introduction to RapidMiner Studio V7

Data cleaning

•  The process will start by retrieving a dataset.
   –  We will use the titanic dataset, and sort out the missing values
•  Navigate to the titanic dataset in the samples/data repository, and drag it into the process window.
   –  This adds a Retrieve operator, which retrieves a dataset from the repository.
•  The titanic dataset has 1309 rows. 5 attributes have missing values:

Attribute             Number missing   % missing
Passenger Fare        1                0.08%
Port of Embarkation   2                0.15%
Age                   263              20.09%
Life Boat             823              62.87%
Cabin                 1014             77.46%
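The percentage column is simply the missing count divided by the 1309 rows. A short R check of that arithmetic, using the counts from the table on this slide:

```r
total_rows <- 1309
missing <- c("Passenger Fare" = 1, "Port of Embarkation" = 2,
             "Age" = 263, "Life Boat" = 823, "Cabin" = 1014)

# Percentage missing per attribute, rounded to two decimals as on the slide.
pct_missing <- round(100 * missing / total_rows, 2)
pct_missing

# Attributes above the 40% threshold used in the next step.
names(pct_missing)[pct_missing > 40]
```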

Page 52: Introduction to RapidMiner Studio V7

Data cleaning

Step 1: Remove attributes with > 40% missing

–  Drag ‘Select Attributes’ onto the process window after ‘Retrieve’.
–  Connect the output from Retrieve (click on the semicircle) to the input of ‘Select Attributes’ (click on the semicircle)
–  Click on ‘Select Attributes’ to view its parameters on the right-hand pane. We must specify what attributes to include/exclude in the process.

•  Set attribute filter to ‘subset’; click on ‘select attributes’, and double-click on Cabin and Life Boat to move them to the right-hand list. Click apply.

•  Click on ‘invert selection’ as these are the attributes we do NOT want to select.

RUN THE PROCESS
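The equivalent of Step 1 in R is to compute the share of missing values per column and keep only the columns under the threshold. A minimal sketch on a small made-up data frame (the values are placeholders, not rows from the titanic dataset):

```r
# Small made-up data frame standing in for the titanic dataset.
passengers <- data.frame(
  age   = c(22, NA, 35, 54, 2),
  fare  = c(7.25, 71.28, NA, 51.86, 8.05),
  cabin = c(NA, "C85", NA, NA, NA)
)

# Fraction of missing values per attribute.
share_missing <- colMeans(is.na(passengers))

# Keep attributes with at most 40% missing; drop = FALSE keeps a data frame.
cleaned <- passengers[, share_missing <= 0.40, drop = FALSE]
names(cleaned)
```

As with the RapidMiner process, `passengers` itself is untouched; `cleaned` is a new object.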

Page 53: Introduction to RapidMiner Studio V7

Data cleaning

Step 2: Replace missing values in AGE

–  Drag ‘Replace Missing Values’ onto the process window after ‘Select Attributes’.
–  Connect the ‘exa’ output from Select Attributes to the ‘exa’ input of ‘Replace Missing Values’
–  Click on ‘Replace Missing Values’ to view its parameters on the right-hand pane.

•  Set attribute filter to ‘single’; click the drop-down box below, and select ‘age’

•  The default is that missing values will be replaced by the average value for age

RUN THE PROCESS
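Replacing missing values by the average can be sketched in a couple of lines of R. The ages below are made-up placeholder values:

```r
# Made-up ages with two missing values.
age <- c(22, NA, 35, 54, NA)

# Average of the known values only.
mean_age <- mean(age, na.rm = TRUE)

# Fill each missing entry with that average; known values are kept as they are.
age_filled <- ifelse(is.na(age), mean_age, age)
age_filled
```

Note that the average is computed from the non-missing rows, so filling does not change the attribute's mean, although it does reduce its variance.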

Page 54: Introduction to RapidMiner Studio V7

Data cleaning

Step 3: Remove rows for attributes with < 5% missing

–  The only attributes left with missing values are Passenger Fare and Port of Embarkation. Removing ALL rows with missing values will handle the remaining missing values

–  Drag Filter Examples onto the process window after Replace Missing Values. Select Filter Examples to view its parameters:
   •  Click the custom_filters drop-down box in the operator's parameters, and select no_missing_attributes

RUN THE PROCESS
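In R, dropping every row that still has a missing value is a one-liner with `complete.cases`. A minimal sketch on made-up placeholder rows:

```r
# Made-up rows: one missing fare and one missing port of embarkation.
passengers <- data.frame(
  fare     = c(7.25, NA, 51.86, 8.05),
  embarked = c("S", "C", NA, "Q")
)

# Keep only the rows where no attribute is missing.
no_missing <- passengers[complete.cases(passengers), ]
nrow(no_missing)
```

This is only reasonable here because so few rows are affected; applied before Steps 1 and 2 it would have discarded most of the dataset.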

Page 55: Introduction to RapidMiner Studio V7

Build a predictive model on the cleaned data

•  Right-click on the process window, and add a Nominal X-Validation block to the end of the process.

•  Connect the ports, ensuring the model and the accuracy (ave) are output from the process.

A red port indicates there may be an error. Run the process to check . . .

Page 56: Introduction to RapidMiner Studio V7

Build a predictive model on the cleaned data

•  Look for the Set Role operator, and drop it onto the process window.
•  Connect it in between Retrieve and Select Attributes.
•  Click on Set Role to view its parameters. Set attribute name to survived, and target role to label. The dataset now has a class label.

How accurate is the Decision Tree? Which attributes were most predictive of the class label?

RUN THE PROCESS

Page 57: Introduction to RapidMiner Studio V7

Topic 7: Adding R code

Page 58: Introduction to RapidMiner Studio V7

Running R script within RapidMiner

•  There are a number of extensions to RapidMiner Studio available free from their marketplace, including an extension to run R script within RapidMiner. Installed packages are listed under the extensions folder.

•  The operator to run R scripts is ‘Execute R’. The operator's parameters provide the editor for the R script; inputs are the parameters to a mandatory main function; a return statement defines the outputs from the operator.

Page 59: Introduction to RapidMiner Studio V7

Running R script within RapidMiner

The operator's help gives a link to the example process. The Polynomial dataset is split into two partitions. Learn Model contains R script to train a linear model; Apply R Model contains R script to apply the model and record its performance. The script for both is on the next slide...

Page 60: Introduction to RapidMiner Studio V7

Running R script within RapidMiner

•  Learn Model

    # train a linear model on the training data and return the learned model
    rm_main = function(data) {
      linearModel <- lm(formula = label ~ ., data = data)
      return(linearModel)
    }

•  Apply R Model

    ## load the trained model and apply it on the test data
    rm_main = function(model, data) {
      # apply the model and build a prediction
      result <- predict(model, data)
      # add the prediction to the example set
      data$prediction <- result
      # update the meta data
      metaData$data$prediction <<- list(type = "real", role = "prediction")
      return(data)
    }
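To try the two scripts outside RapidMiner, they can be run as ordinary R functions. This is a sketch under several assumptions: `learn_main` and `apply_main` are hypothetical names (inside RapidMiner each operator's function is called `rm_main`), `metaData` is created here as a stand-in for the object the Apply script expects to find in an enclosing environment, and columns of R's built-in `iris` data stand in for the Polynomial dataset.

```r
# Hypothetical names for the two rm_main functions, so both can coexist.
learn_main <- function(data) {
  lm(formula = label ~ ., data = data)
}
apply_main <- function(model, data) {
  data$prediction <- predict(model, data)
  metaData$data$prediction <<- list(type = "real", role = "prediction")
  data
}

# Stand-in for the metaData object that the script updates with <<-.
metaData <- list(data = list())

# Stand-in dataset with a numeric 'label' column and two predictors.
examples <- data.frame(label = iris$Petal.Width,
                       a1    = iris$Sepal.Length,
                       a3    = iris$Petal.Length)

# Two partitions: odd rows for training, even rows for testing.
train <- examples[seq(1, nrow(examples), by = 2), ]
test  <- examples[seq(2, nrow(examples), by = 2), ]

model  <- learn_main(train)
scored <- apply_main(model, test)
head(scored)
```

The formula `label ~ .` only works if the input has a column literally named `label`, which is why the role set in RapidMiner matters before the data reaches the script.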

Page 61: Introduction to RapidMiner Studio V7

Learning more...

We have just touched on a few of the operators in RapidMiner.

•  The samples/processes repository in RapidMiner has many more examples.
•  The RapidMiner website has training material.
•  The RapidMiner Resources website also has training material, some of which is free.
•  Neural Market Trends (Thomas Ott) also has good videos on RapidMiner.

Books:

1.  RapidMiner: Data Mining Use Cases and Business Analytics Applications. Editors: Dr. Markus Hofmann & Ralf Klinkenberg

2.  Exploring Data with RapidMiner by Andrew Chisholm (free to download)