the rattle package -...

24
The rattle Package September 30, 2007 Type Package Title A graphical user interface for data mining in R using GTK Version 2.2.64 Date 2007-09-29 Author Graham Williams <[email protected]> Maintainer Graham Williams <[email protected]> Depends R (>= 2.2.0) Suggests RGtk2, ada, amap, arules, bitops, cairoDevice, cba, combinat, doBy, e1071, ellipse, fEcofin, fCalendar, fBasics, foreign, fpc, gdata, gtools, gplots, Hmisc, kernlab, MASS, Matrix, mice, network, pmml, randomForest, rggobi, ROCR, RODBC, rpart, RSvgDevice, XML Description Rattle provides a Gnome (RGtk2) based interface to R functionality for data mining. The aim is to provide a simple and intuitive interface that allows a user to quickly load data from a CSV file (or via ODBC), transform and explore the data, and build and evaluate models, and export models as PMML (predictive modelling markup language). All of this with knowing little about R. All R commands are logged and available for the user, as a tool to then begin interacting directly with R itself, if so desired. Rattle also exports a number of utility functions and the graphical user interface does not need to be run to deploy these. License GPL version 2 or newer URL http://rattle.togaware.com/ R topics documented: audit ............................................. 2 calcInitialDigitDistr ..................................... 3 calculateAUC ........................................ 3 centers.hclust ........................................ 4 drawTreeNodes ....................................... 5 drawTreesAda ........................................ 6 evaluateRisk ......................................... 7 1

Upload: others

Post on 15-Oct-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

The rattle PackageSeptember 30, 2007

Type Package

Title A graphical user interface for data mining in R using GTK

Version 2.2.64

Date 2007-09-29

Author Graham Williams <[email protected]>

Maintainer Graham Williams <[email protected]>

Depends R (>= 2.2.0)

Suggests RGtk2, ada, amap, arules, bitops, cairoDevice, cba, combinat, doBy, e1071, ellipse, fEcofin,fCalendar, fBasics, foreign, fpc, gdata, gtools, gplots, Hmisc, kernlab, MASS, Matrix, mice,network, pmml, randomForest, rggobi, ROCR, RODBC, rpart, RSvgDevice, XML

Description Rattle provides a Gnome (RGtk2) based interface to R functionality for data mining. Theaim is to provide a simple and intuitive interface that allows a user to quickly load data from aCSV file (or via ODBC), transform and explore the data, and build and evaluate models, andexport models as PMML (predictive modelling markup language). All of this with knowing littleabout R. All R commands are logged and available for the user, as a tool to then begin interactingdirectly with R itself, if so desired. Rattle also exports a number of utility functions and thegraphical user interface does not need to be run to deploy these.

License GPL version 2 or newer

URL http://rattle.togaware.com/

R topics documented:audit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2calcInitialDigitDistr . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3calculateAUC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3centers.hclust . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4drawTreeNodes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5drawTreesAda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6evaluateRisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1

Page 2: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

2 audit

genPlotTitleCmd . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8rattle_gui . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9listRPartRules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9listTreesAda . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10plotBenfordsLaw . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11plotNetwork . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11plotOptimalLine . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12plotRisk . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14printRandomForests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16randomForest2Rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17rattle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18savePlotToFile . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19treeset.randomForest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

Index 21

audit Sample dataset for illustration Rattle functionality.

Description

The audit dataset is an artificially constructed dataset that has some of the characteristics of a trueaudit dataset for modelling productive and non-productive audits. It is used to illustrate binaryclassification.

The target variable is Adjusted, an integer which is either 0 (for non-producitve audits) or 1 (forproductive audits). Productive audits are those that result in an adjustment being made to a client’sclaims. The dollar value of those adjustments is also recorded (as the Adjusted column).

The independent variables include Age, type of Employment, level of Education, Maritalstatus, Occupation, level of Income, Sex, amount of Deductions being claimed, Hoursworked per week, and country in which they have a bank Account.

An identifier is included as the ID variable.

The dataset is quite small, consisting of just 2000 entities. It primary purpose is to illustrate mod-elling in Rattle, so a minimally sized dataset is suitable.

Format

A data frame.

Page 3: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

calcInitialDigitDistr 3

calcInitialDigitDistrGenerate a frequency count of the initial digits

Description

In the context of Benford’s Law calculate the distribution of the frequencies of the first digit of thenumbers supplied as the argument.

Usage

calcInitialDigitDistr(l)

Arguments

l a vector of numbers.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

calculateAUC Determine area under a curve (e.g. a risk or recall curve) of a riskchart

Description

Given the evaluation returned by evaluateRisk, for example, calculate the area under the risk orrecall curves, to use as a metric to compare the performance of a model.

Usage

calculateAUC(x, y)

Arguments

x a vector of values for the x points.

y a vector of values for the y points.

Details

The area is returned.

Page 4: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

4 centers.hclust

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

evaluateRisk.

Examples

## this is usually used in the context of the evaluateRisk function## Not run: ev <- evaluateRisk(predicted, actual, risk)

## imitate this output hereev <- data.frame(Caseload=c(1.0, 0.8, 0.6, 0.4, 0.2, 0),

Precision=c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30),Recall=c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0),Risk=c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0))

## Calculate the areas unde the Risk and the Recall curves.calculateAUC(ev$Caseload, ev$Risk)calculateAUC(ev$Caseload, ev$Recall)

centers.hclust List Cluster Centers for a Hierarchical Cluster

Description

Generate a matrix of centers from a hierarchical cluster.

Usage

centers.hclust(x, h, nclust=10, use.median=FALSE)

Arguments

x The data used to build the cluster.

h

nclust

use.median

Page 5: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

drawTreeNodes 5

Details

For the specified number of clusters, cut the hierarchical cluster appropriately to that number ofclusters, and return the mean (or median) of each resulting cluster.

Author(s)

Daniele Medri and 〈[email protected]

References

Package home page: http://rattle.togaware.com

drawTreeNodes Draw nodes of a decision tree

Description

Draw the nodes of a decision tree

Usage

drawTreeNodes(tree, cex = par("cex"), pch = par("pch"),size = 2.5 * cex, col = NULL, nodeinfo = FALSE,units = "", cases = "cases",digits = getOption("digits"),decimals = 2,print.levels = TRUE, new = TRUE)

Arguments

tree an rpart decision tree.

cex .

pch .

size .

col .

nodeinfo .

units .

cases .

digits .

decimals the number of decimal digits to include in numeric split nodes.

print.levels .

new .

Page 6: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

6 drawTreesAda

Details

A variation of draw.tree from the maptree package.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## this is usually used in the context of the plotRisk function## Not run: drawTreeNodes(ds.rp)

drawTreesAda Draw trees from an Ada model

Description

Using the Rattle drawTreeNodes, draw a selection of Ada trees.

Usage

drawTreesAda(model, trees=0, title="")

Arguments

model an ada model.trees The list of trees to draw. Use 0 to draw all trees.title An option title to add.

Details

Using Rattle’s drawTreeNodes underneath, a plot for each of the specified trees from an Ada modelwill be displayed.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## Not run: drawTreesAda(ds.ada)

Page 7: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

evaluateRisk 7

evaluateRisk Summarise the performance of a data mining model

Description

By taking predicted values, actual values, and measures of the risk associated with each case, gen-erate a summary that groups the distinct predicted values, calculating the accumulative percentageCaseload, Recall, Risk, Precision, and Measure.

Usage

evaluateRisk(predicted, actual, risks)

Arguments

predicted a numeric vector of probabilities (between 0 and 1) representing the probabilityof each entity being a 1.

actual a numeric vector of classes (0 or 1).

risks a numeric vector of risk (e.g., dollar amounts) associated with each entity thathas a acutal of 1.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

plotRisk.

Examples

## simulate the data that is typical in data mining

## we often have only a small number of positive known casecases <- 1000actual <- as.integer(rnorm(cases) > 1)adjusted <- sum(actual)nfa <- cases - adjusted

## risks might be dollar values associated adjusted casesrisks <- rep(0, cases)risks[actual==1] <- round(abs(rnorm(adjusted, 10000, 5000)), 2)

Page 8: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

8 genPlotTitleCmd

## our models will generated a probability of a case being a 1predicted <- rep(0.1, cases)predicted[actual==1] <- predicted[actual==1] + rnorm(adjusted, 0.3, 0.1)predicted[actual==0] <- predicted[actual==0] + rnorm(nfa, 0.1, 0.08)predicted <- signif(predicted)

## call upon evaluateRisk to generate performance summaryev <- evaluateRisk(predicted, actual, risks)

## have a look at the first few and last fewhead(ev)tail(ev)

## the performance is usually presented as a Risk Chart## under the CRAN MS/Windows this causes a problem, so don't run for now## Not run: plotRisk(ev$Caseload, ev$Precision, ev$Recall, ev$Risk)

genPlotTitleCmd Generate a string to add a title to a plot

Description

Generate a string that is intended to be eval’d that will add a title and sub-title to a plot. Thestring is a call to title, supplying the given arguments, pasted together, as the main title, andgenerating a sub-title that begins with ‘Rattle’ and continues with the current date and time, andfinishes with the current user’s username. This is used internally in Rattle to adorn a plot withrelevant information, but may be useful outside of Rattle.

Usage

genPlotTitleCmd(..., vector=FALSE)

Arguments

... one or more strings that will be pasted together to form the main title.

vector whether to return a vector as the result.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

eval, title, plotRisk.

Page 9: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

rattle_gui 9

Examples

## generate some random plotplot(rnorm(100))

## generate the string representing the command to add titlestl <- genPlotTitleCmd("Sample Plot of", "No Particular Import")

## cause the string to be executed as an R commandeval(parse(text=tl))

rattle_gui Interal Rattle user interface callbacks.

Description

These are exported from the package so that the GUI can access the callbacks. They should other-wise be ignored.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

listRPartRules List the rules corresponding to the rpart decision tree

Description

Display a list of rules for an rpart decision tree.

Usage

listRPartRules(model, compact=FALSE)

Arguments

model an rpart model.

compact whether to list cateogricals compactly.

Details

Traverse a decision tree to generate the equivalent set of rules, one rule for each path from the rootnode to a leaf node.

Page 10: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

10 listTreesAda

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## Not run: listRPartRules(ds.rpart)

listTreesAda List trees from an Ada model

Description

Display the textual representation of a selection of Ada trees.

Usage

listTreesAda(model, trees=0)

Arguments

model an ada model.

trees The list of trees to list. Use 0 to list all trees.

Details

Using rpart’s print method display each of the specified trees from an Ada model.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## Not run: listTreesAda(ds.ada)

Page 11: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

plotBenfordsLaw 11

plotBenfordsLaw Plot a chart comparing Benford’s Law with a supplied numeric vertor

Description

Plots a barchart of Benford’s Law and the distribution of the frequencies of the first digit of thenumbers supplied as the argument.

Usage

plotBenfordsLaw(l)

Arguments

l a vector of numbers to compare to Benford’s Law.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

# A simple example using the audit data from Rattle.data(audit)plotBenfordsLaw(audit$Income)

plotNetwork Plot a circular map of network links between entities

Description

Plots a circular map of entities and their relationships. The entities are around the edge of the circlewith lines linking the entities depending on their relationships as represented in the supplied FLOWargument. Line widths represent relative magnitude of flows, as do line colours, and the font sizeof a label for an entity represents the size of the total flow into that entity. Useful for displaying, forexample, cash flows between entities.

Usage

plotNetwork(flow)

Page 12: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

12 plotOptimalLine

Arguments

flow a square matrix of directional flows.

Details

The FLOW is a square matrix that records the directional flow and magnitude of the flow from oneentity to another. The flow may represent a flow of cash from one company to another company.The dimensions of the square matrix are the number of entities represented.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

# Create a small sample matrix.flow <- matrix(c(0, 10000, 0, 0, 20000,

1000, 0, 10000, 0, 0,5000, 0, 0, 0, 3000,0, 1000000, 600000, 0, 0,0, 50000, 0, 500000, 0),

nrow=5, byrow=TRUE)rownames(flow) <- colnames(flow) <- c("A", "B", "C", "D", "E")plotNetwork(flow)

# Use data from the network package.data(flo)plotNetwork(flo)

plotOptimalLine Plot three lines on a risk chart, one vertical and two horizontal

Description

Plots a a vertical line at x up to max of y1 and y2, then horizontal from this line at y1 and y2.Intended for plotting on a plotRisk.

Usage

plotOptimalLine(x, y1, y2, pr = NULL, colour = "plum", label = NULL)

Page 13: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

plotOptimalLine 13

Arguments

x location of vertical line.

y1 location of one horizontal line.

y2 location of other horizontal line.

pr Aprint a percentage at this point.

colour of the line.

label at bottom of line.

Details

Intended to plot an optimal line on a Risk Chart as plotted by plotRisk.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

plotRisk.

Examples

## this is usually used in the context of the plotRisk function## Not run: ev <- evaluateRisk(predicted, actual, risk)

## imitate this output hereev <- NULLev$Caseload <- c(1.0, 0.8, 0.6, 0.4, 0.2, 0)ev$Precision <- c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30)ev$Recall <- c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0)ev$Risk <- c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0)

## plot the Risk ChartplotRisk(ev$Caseload, ev$Precision, ev$Recall, ev$Risk,

chosen=60, chosen.label="Pr=0.45")

## plot the optimal pointplotOptimalLine(40, 77, 75, colour="maroon")

Page 14: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

14 plotRisk

plotRisk Plot a risk chart

Description

Plots a Rattle Risk Chart. Such a chart has been developed in a practical context to present theperformance of data mining models to clients, plotting a caseload against performance, allowing aclient to see the tradeoff between coverage and performance.

Usage

plotRisk(cl, pr, re, ri = NULL, title = NULL,show.legend = TRUE, xleg = 60, yleg = 55,optimal = NULL, optimal.label = "", chosen = NULL, chosen.label = "",include.baseline = TRUE, dev = "", filename = "", show.knots = NULL,risk.name = "Revenue", recall.name = "Adjustments",precision.name = "Strike Rate")

Arguments

cl a vector of caseloads corresponding to different probability cutoffs. Can beeither percentages (between 0 and 100) or fractions (between 0 and 1).

pr a vector of precision values for each probability cutoff. Can be either percent-ages (between 0 and 100) or fractions (between 0 and 1).

re a vector of recall values for each probability cutoff. Can be either percentages(between 0 and 100) or fractions (between 0 and 1).

ri a vector of risk values for each probability cutoff. Can be either percentages(between 0 and 100) or fractions (between 0 and 1).

title the main title to place at the top of the plot.

show.legend whether to display the legend in the plot.

xleg the x coordinate for the placement of the legend.

yleg the y coordinate for the placement of the legend.

optimal a caseload (percentage or fraction) that represents an optimal performance pointwhich is also plotted. If instead the value is TRUE then the optimal pointis identified internally (maximum valud for (recall-casload)+(risk-caseload)) and plotted.

optimal.labela string which is added to label the line drawn as the optimal point.

chosen a caseload (percentage or fraction) that represents a user chosen optimal perfor-mance point which is also plotted.

chosen.label a string which is added to label the line drawn as the chosen point.include.baseline

if TRUE (the default) then display the diagonal baseline.

Page 15: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

plotRisk 15

dev a string which, if supplied, identifies a device type as the target for the plot. Thismight be one of wmf (for generating a Windows Metafile, but only available onMS/Windows), pdf, or png.

filename a string naming a file. If dev is not given then the filename extension is used toidentify the image format as one of those recognised by the dev argument.

show.knots a vector of caseload values at which a vertical line should be drawn. These mightcorrespond, for example, to individual paths through a decision tree, illustratingthe impact of each path on the caseload and performance.

risk.name a string used within the plot’s legend that gives a name to the risk. Often the riskis a dollar amount at risk from a fraud or from a bank loan point of view, so thedefault is Revenue.

recall.name a string used within the plot’s legend that gives a name to the recall. The recallis often the percentage of cases that are positive hits, and in practise these mightcorrespond to known cases of fraud or reviews where some adjustment to per-haps a incom tax return or application for credit had to be made on reviewingthe case, and so the default is Adjustments.

precision.namea string used within the plot’s legend that gives a name to the precision. Acommon name for precision is Strike Rate, which is the default here.

Details

Caseload is the percentage of the entities in the dataset covered by the model at a particular prob-ability cutoff, so that with a cutoff of 0, all (100%) of the entities are covered by the model. Witha cutoff of 1 (0%) no entities are covered by the model. A diagonal line is drawn to represent abaseline random performance. Then the percentage of positive cases (the recall) covered for a par-ticular caseload is plotted, and optionally a measure of the percentage of the total risk that is alsocovered for a particular caseload may be plotted. Such a chart allows a user to select an appropriatetradeoff between caseload and performance. The charts are similar to ROC curves. The precision(i.e., strike rate) is also plotted.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

evaluateRisk, genPlotTitleCmd.

Examples

## this is usually used in the context of the evaluateRisk function## Not run: ev <- evaluateRisk(predicted, actual, risk)

## imitate this output here

Page 16: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

16 printRandomForests

ev <- NULLev$Caseload <- c(1.0, 0.8, 0.6, 0.4, 0.2, 0)ev$Precision <- c(0.15, 0.18, 0.21, 0.25, 0.28, 0.30)ev$Recall <- c(1.0, 0.95, 0.80, 0.75, 0.5, 0.0)ev$Risk <- c(1.0, 0.98, 0.90, 0.77, 0.30, 0.0)

## plot the Risk ChartplotRisk(ev$Caseload, ev$Precision, ev$Recall, ev$Risk,

chosen=60, chosen.label="Pr=0.45")

## Add a titleeval(parse(text=genPlotTitleCmd("Sample Risk Chart")))

printRandomForests Print a representtaion of the Random Forest models to the console

Description

A randomForest model, by default, consists of 500 decision trees. This function walks through eachtree and generates a set of rules which are printed to the console. This takes a considerable amountof time and is provided for users to access the actual model, but it is not yet used within the RattleGUI. It may be used to display the output of the RF (but it takes longer to generate than the modelitself!). Or it might only be used on export to PMML or SQL.

Usage

printRandomForests(model, models=NULL, include.class=NULL, format="")

Arguments

model a randomForest model.

models a list of integers limiting the models in MODEL that are displayed.

include.classlimit the output to the specific class.

format possible values are "VB".

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Page 17: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

randomForest2Rules 17

Examples

## Display a ruleset for a specific model amongst the 500.## Not run: printRandomForests(rfmodel, 5)

## Display a ruleset for specific models amongst the 500.## Not run: printRandomForests(rfmodel, c(5,10,15))

## Display a ruleset for each of the 500 models.## Not run: printRandomForests(rfmodel)

randomForest2Rules Generate accessible data structure of a randomForest model

Description

A randomForest model, by default, consists of 500 decision trees. This function walks through eachtree and generates a set of rules. This takes a considerable amount of time and is provided for usersto access the actual model, but it is not yet used within the Rattle GUI. It may be used to display theoutput of the RF (but it takes longer to generate than the model itself!). Or it might only be used onexport to PMML or SQL.

Usage

randomForest2Rules(model, models=NULL)

Arguments

model a randomForest model.

models a list of integers limiting the models in MODEL that are converted.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## Generate a ruleset for a specific model amongst the 500.## Not run: randomForest2Rules(rfmodel, 5)

## Generate a ruleset for specific models amongst the 500.## Not run: randomForest2Rules(rfmodel, c(5,10,15))

## Generate a ruleset for each of the 500 models.## Not run: randomForest2Rules(rfmodel)

Page 18: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

18 rattle

rattle Display the Rattle User Interface

Description

The Rattle user interface uses the RGtk2 package to present an intuitive point and click interface fordata mining, extensively building on the excellent collection of R packages for data manipulation,exploration, analysis, and evaluation.

Usage

rattle(csvname=NULL)

Arguments

csvname the name of a CSV file to load into Rattle on startup.

Details

Refer to the Rattle home page in the URL below for a growing reference manual for using Rattle.

Whilst the underlying functionality of Rattle is built upon a vast collection of other R packages,Rattle itself provides a collection of utility functions used within Rattle. These are made availablethrough loading the rattle package into your R library. The See Also section lists these utilityfunctions that may be useful outside of Rattle.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

See Also

evaluateRisk, genPlotTitleCmd, plotRisk.

Examples

## You can start rattle with a path to a csv file to pre-specify the## dataset. You then need to click Execute to load the data.

rattle(system.file("csv", "audit.csv", package = "rattle"))

Page 19: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

savePlotToFile 19

savePlotToFile Save a plot in some way

Description

For the current device, or for the device identified, save the plot displayed there in some way. This iseither saved to file, copied to the clipboard for pasting into other applications, or sent to the printerfor saving a hard copy.

Usage

savePlotToFile(file.name, dev.num=dev.cur())copyPlotToClipboard(dev.num=dev.cur())printPlot(dev.num=dev.cur())

Arguments

file.name Character string naming the file including the file name extension which is usedto specify the type of file to save.

dev.num A device number indicating which device to save.

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

treeset.randomForestGenerate a representtaion of a tree in a Random Forest

Description

Usage

treeset.randomForest(model, n=1, root=1, format="R")

Arguments

model a randomForest model.n a specific tree to list.root where to start the stree from, primarily for internal use.format one of "R", "VB".

Page 20: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

20 treeset.randomForest

Author(s)

[email protected]

References

Package home page: http://rattle.togaware.com

Examples

## Display a treeset for a specific model amongst the 500.## Not run: treeset.randomForests(rfmodel, 5)

Page 21: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

Index

∗Topic aplotgenPlotTitleCmd, 7

∗Topic clustercenters.hclust, 4

∗Topic datasetsaudit, 2

∗Topic dplotevaluateRisk, 6

∗Topic environmentrattle, 17

∗Topic hplotcalcInitialDigitDistr, 2calculateAUC, 3drawTreeNodes, 4drawTreesAda, 5listTreesAda, 9plotBenfordsLaw, 10plotNetwork, 11plotOptimalLine, 12plotRisk, 13printRandomForests, 15randomForest2Rules, 16savePlotToFile, 18treeset.randomForest, 19

∗Topic internalrattle_gui, 8

∗Topic treelistRPartRules, 9

ada.formula (rattle_gui), 8audit, 2

calcInitialDigitDistr, 2calculateAUC, 3cat_toggled (rattle_gui), 8centers.hclust, 4con_toggled (rattle_gui), 8copyPlotToClipboard

(savePlotToFile), 18

display_click_execute_message(rattle_gui), 8

display_click_odbc_execute_message(rattle_gui), 8

drawTreeNodes, 4drawTreesAda, 5

eval, 7, 8evaluateRisk, 3, 6, 15, 17

genPlotTitleCmd, 7, 15, 17

imp_edited (rattle_gui), 8imp_toggled (rattle_gui), 8ISO_8601_to_POSIX_datetime_format

(rattle_gui), 8

listRPartRules, 9listTreesAda, 9load_rdata_set_combo

(rattle_gui), 8

on_about_menu_activate(rattle_gui), 8

on_ada_continue_button_clicked(rattle_gui), 8

on_ada_defaults_button_clicked(rattle_gui), 8

on_ada_draw_button_clicked(rattle_gui), 8

on_ada_errors_button_clicked(rattle_gui), 8

on_ada_importance_button_clicked(rattle_gui), 8

on_ada_list_button_clicked(rattle_gui), 8

on_ada_stumps_button_clicked(rattle_gui), 8

on_ada_stumps_checkbutton_toggled(rattle_gui), 8

21

Page 22: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

22 INDEX

on_arff_filechooserbutton_update_preview(rattle_gui), 8

on_arff_radiobutton_toggled(rattle_gui), 8

on_associate_plot_frequency_button_clicked(rattle_gui), 8

on_associate_rules_button_clicked(rattle_gui), 8

on_boost_radiobutton_toggled(rattle_gui), 8

on_categorical_clear_button_clicked(rattle_gui), 8

on_confusion_radiobutton_toggled(rattle_gui), 8

on_continuous_clear_button_clicked(rattle_gui), 8

on_copy1_activate (rattle_gui), 8on_correlation_radiobutton_toggled

(rattle_gui), 8on_csv_filechooserbutton_update_preview

(rattle_gui), 8on_csv_radiobutton_toggled

(rattle_gui), 8on_ctivate (rattle_gui), 8on_cut1_activate (rattle_gui), 8on_data_entry_radiobutton_toggled

(rattle_gui), 8on_delete_menu_activate

(rattle_gui), 8on_dtree_radiobutton_toggled

(rattle_gui), 8on_e1071_radiobutton_toggled

(rattle_gui), 8on_editdata_button_clicked

(rattle_gui), 8on_evaluate_csv_radiobutton_toggled

(rattle_gui), 8on_evaluate_distributions_button_clicked

(rattle_gui), 8on_evaluate_radiobutton_group_changed

(rattle_gui), 8on_evaluate_rdataset_radiobutton_toggled

(rattle_gui), 8on_execute_button_clicked

(rattle_gui), 8on_explot_radiobutton_toggled

(rattle_gui), 8on_export_button_clicked

(rattle_gui), 8on_export_pmml_activate

(rattle_gui), 8on_gbm_importance_button_clicked

(rattle_gui), 8on_ggobi_radiobutton_toggled

(rattle_gui), 8on_hclust_data_plot_button_clicked

(rattle_gui), 8on_hclust_dendrogram_button_clicked

(rattle_gui), 8on_hclust_discriminant_plot_button_clicked

(rattle_gui), 8on_hclust_plot_button_clicked

(rattle_gui), 8on_hclust_radiobutton_toggled

(rattle_gui), 8on_hclust_stats_button_clicked

(rattle_gui), 8on_help_ada_activate

(rattle_gui), 8on_help_arff_activate

(rattle_gui), 8on_help_confusion_table_activate

(rattle_gui), 8on_help_correlation_activate

(rattle_gui), 8on_help_csv_activate

(rattle_gui), 8on_help_distributions_activate

(rattle_gui), 8on_help_general_activate

(rattle_gui), 8on_help_ggobi_activate

(rattle_gui), 8on_help_glm_activate

(rattle_gui), 8on_help_hierarchical_correlation_activate

(rattle_gui), 8on_help_impute_activate

(rattle_gui), 8on_help_kmeans_activate

(rattle_gui), 8on_help_log_activate

(rattle_gui), 8on_help_nomenclature_data_activate

(rattle_gui), 8on_help_odbc_activate

Page 23: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

INDEX 23

(rattle_gui), 8on_help_principal_components_activate

(rattle_gui), 8on_help_randomForest_activate

(rattle_gui), 8on_help_rdata_file_activate

(rattle_gui), 8on_help_rdataset_activate

(rattle_gui), 8on_help_roles_activate

(rattle_gui), 8on_help_rpart_activate

(rattle_gui), 8on_help_sample_activate

(rattle_gui), 8on_help_sensitivity_activate

(rattle_gui), 8on_help_summary_activate

(rattle_gui), 8on_help_support_vector_machine_activate

(rattle_gui), 8on_help_weight_calculator_activate

(rattle_gui), 8on_hiercor_radiobutton_toggled

(rattle_gui), 8on_impute_radiobutton_toggled

(rattle_gui), 8on_kernlab_radiobutton_toggled

(rattle_gui), 8on_kmeans_data_plot_button_clicked

(rattle_gui), 8on_kmeans_discriminant_plot_button_clicked

(rattle_gui), 8on_kmeans_plot_button_clicked

(rattle_gui), 8on_kmeans_radiobutton_toggled

(rattle_gui), 8on_kmeans_seed_button_clicked

(rattle_gui), 8on_kmeans_stats_button_clicked

(rattle_gui), 8on_lift_radiobutton_toggled

(rattle_gui), 8on_new_activate (rattle_gui), 8on_new_button_clicked

(rattle_gui), 8on_nnet_radiobutton_toggled

(rattle_gui), 8

on_notebook_switch_page(rattle_gui), 8

on_odbc_radiobutton_toggled(rattle_gui), 8

on_open_activate (rattle_gui), 8on_open_button_clicked

(rattle_gui), 8on_paste1_activate (rattle_gui), 8on_plot_close_button_clicked

(rattle_gui), 8on_plot_copy_button_clicked

(rattle_gui), 8on_plot_print_button_clicked

(rattle_gui), 8on_plot_save_button_clicked

(rattle_gui), 8on_prcomp_radiobutton_toggled

(rattle_gui), 8on_precision_radiobutton_toggled

(rattle_gui), 8on_priors_entry_changed

(rattle_gui), 8on_rattle_menu_activate

(rattle_gui), 8on_rdata_filechooserbutton_update_preview

(rattle_gui), 8on_rdata_radiobutton_toggled

(rattle_gui), 8on_rdataset_radiobutton_toggled

(rattle_gui), 8on_regression_radiobutton_toggled

(rattle_gui), 8on_rf_errors_button_clicked

(rattle_gui), 8on_rf_importance_button_clicked

(rattle_gui), 8on_rf_print_tree_button_clicked

(rattle_gui), 8on_rf_radiobutton_toggled

(rattle_gui), 8on_risk_comboboxentry_changed

(rattle_gui), 8on_risk_radiobutton_toggled

(rattle_gui), 8on_roc_radiobutton_toggled

(rattle_gui), 8on_rpart_loss_comboboxentry_set_focus_child

(rattle_gui), 8

Page 24: The rattle Package - uni-bayreuth.deftp.uni-bayreuth.de/math/statlib/R/CRAN/doc/packages/rattle.pdf · The rattle Package September 30, 2007 Type Package Title A graphical user interface

24 INDEX

on_rpart_plot_button_clicked(rattle_gui), 8

on_rpart_rules_button_clicked(rattle_gui), 8

on_sample_checkbutton_toggled(rattle_gui), 8

on_sample_count_spinbutton_changed(rattle_gui), 8

on_sample_percentage_spinbutton_changed(rattle_gui), 8

on_sample_radiobutton_toggled(rattle_gui), 8

on_sample_seed_button_clicked(rattle_gui), 8

on_save_as_activate (rattle_gui),8

on_save_button_clicked(rattle_gui), 8

on_save_menu_activate(rattle_gui), 8

on_score_radiobutton_toggled(rattle_gui), 8

on_sensitivity_radiobutton_toggled(rattle_gui), 8

on_summary_find_button_clicked(rattle_gui), 8

on_summary_next_button_clicked(rattle_gui), 8

on_summary_radiobutton_toggled(rattle_gui), 8

on_svm_kernel_comboboxentry_changed(rattle_gui), 8

on_svm_radiobutton_toggled(rattle_gui), 8

on_tools_associate_activate(rattle_gui), 8

on_tools_cluster_activate(rattle_gui), 8

on_tools_data_activate(rattle_gui), 8

on_tools_evaluate_activate(rattle_gui), 8

on_tools_explore_activate(rattle_gui), 8

on_tools_log_activate(rattle_gui), 8

on_tools_model_activate(rattle_gui), 8

on_tools_sample_activate(rattle_gui), 8

on_tools_transform_activate(rattle_gui), 8

on_tools_variables_activate(rattle_gui), 8

on_tooltips_activate(rattle_gui), 8

on_twoclass_radiobutton_toggled(rattle_gui), 8

on_unsupervised_radiobutton_toggled(rattle_gui), 8

on_variables_toggle_ignore_button_clicked(rattle_gui), 8

on_variables_toggle_input_button_clicked(rattle_gui), 8

on_viewdata_button_clicked(rattle_gui), 8

on_viewdata_find_button_clicked(rattle_gui), 8

on_viewdata_next_button_clicked(rattle_gui), 8

open_odbc_set_combo (rattle_gui),8

paste, 7plotBenfordsLaw, 10plotNetwork, 11plotOptimalLine, 12plotRisk, 7, 8, 12, 13, 17printPlot (savePlotToFile), 18printRandomForests, 15

quit_rattle (rattle_gui), 8

randomForest2Rules, 16rattle, 17rattle_gui, 8

savePlotToFile, 18

title, 7, 8treeset.randomForest, 19

update_comboboxentry_with_dataframes(rattle_gui), 8