
Peter Fox

Data Analytics – ITWS-4963/ITWS-6965

Week 7a, March 10, 2015

Labs: more data, models, prediction, deciding with trees

Assignment 6 on Website

• Your term projects should fall within the scope of a data analytics problem of the type you have worked with in class/labs, or know of yourself; the bigger the data the better. This means that the work must go beyond just making lots of figures. You should develop the project to show that you are thinking about and exploring the relationships and distributions within your data. Start with a hypothesis, think of a way to model and use the hypothesis, find or collect the necessary data, and do preliminary analysis, detailed modeling, and a summary (interpretation). Grad students must develop two types of models.

– Note: You do not have to come up with a positive result, i.e. disproving the hypothesis is just as good.

• Introduction (2%)
• Data Description (3%)
• Analysis (5%)
• Model Development (12%)
• Conclusions and Discussion (3%)
• Oral presentation (5%) (~5 mins)

Titanic – Bayes (from last week)

> require(e1071)  # provides naiveBayes()

> data(Titanic)

> mdl <- naiveBayes(Survived ~ ., data = Titanic)

Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.formula(formula = Survived ~ ., data = Titanic)

A-priori probabilities:
Survived
      No      Yes
0.676965 0.323035

Conditional probabilities:
        Class
Survived        1st        2nd        3rd       Crew
     No  0.08187919 0.11208054 0.35436242 0.45167785
     Yes 0.28551336 0.16596343 0.25035162 0.29817159

        Sex
Survived       Male     Female
     No  0.91543624 0.08456376
     Yes 0.51617440 0.48382560

        Age
Survived      Child      Adult
     No  0.03489933 0.96510067
     Yes 0.08016878 0.91983122

Try Lab6b_9_2014.R
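The printed tables are enough to score a passenger by hand. As a minimal sketch (the choice of a first-class adult female is only an illustration, and the variable names are not from the lab script), multiply the a-priori probability by the matching conditional probabilities for each outcome, then normalize:

```r
# Values copied from the naiveBayes() printout for the Titanic data.
prior  <- c(No = 0.676965,   Yes = 0.323035)
p_1st  <- c(No = 0.08187919, Yes = 0.28551336)  # P(Class = 1st  | Survived)
p_fem  <- c(No = 0.08456376, Yes = 0.48382560)  # P(Sex = Female | Survived)
p_adlt <- c(No = 0.96510067, Yes = 0.91983122)  # P(Age = Adult  | Survived)

# Naive Bayes: prior times the product of the conditionals, then normalize.
score     <- prior * p_1st * p_fem * p_adlt
posterior <- score / sum(score)
print(round(posterior, 3))
```

With these numbers the posterior for Yes comes out at about 0.90. The same passenger can be checked against predict() with type = "raw".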

Classification Bayes (last week)

• Retrieve the abalone.csv dataset
• Predicting the age of abalone from physical measurements.
• Perform naivebayes classification to get predictors for Age (Rings). Interpret.
• Compare to what you got from kknn (weighted nearest neighbors) in class 4b
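naiveBayes() treats the response as a class label, so the numeric Rings count is usually binned before fitting. A minimal sketch, assuming break points of 8 and 11 rings and the labels young/adult/old (all illustrative choices, not values from the lab), and assuming the CSV has a column named Rings:

```r
# Hypothetical helper: turn ring counts into three age classes.
# The break points (8 and 11) are an assumption; tune them to your data.
bin_rings <- function(rings) {
  cut(rings, breaks = c(-Inf, 8, 11, Inf),
      labels = c("young", "adult", "old"))
}

print(table(bin_rings(c(4, 8, 9, 11, 15))))

# Fit only if the package and the file are available in the working directory.
if (requireNamespace("e1071", quietly = TRUE) && file.exists("abalone.csv")) {
  abalone <- read.csv("abalone.csv", stringsAsFactors = TRUE)
  abalone$AgeClass <- bin_rings(abalone$Rings)   # column name is an assumption
  # Drop the raw ring count so it cannot leak into the predictors.
  train <- abalone[, setdiff(names(abalone), "Rings")]
  fit <- e1071::naiveBayes(AgeClass ~ ., data = train)
  print(table(predicted = predict(fit, train), actual = train$AgeClass))
}
```

Binning first also makes the result easier to compare with a kknn classification on the same classes.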

http://www.ugrad.stat.ubc.ca/R/library/mlbench/html/HouseVotes84.html

> require(mlbench)

> data(HouseVotes84)

> model <- naiveBayes(Class ~ ., data = HouseVotes84)

> predict(model, HouseVotes84[1:10,-1])

[1] republican republican republican democrat democrat democrat republican republican republican

[10] democrat

Levels: democrat republican

House Votes 1984

> predict(model, HouseVotes84[1:10,-1], type = "raw")

democrat republican

[1,] 1.029209e-07 9.999999e-01

[2,] 5.820415e-08 9.999999e-01

[3,] 5.684937e-03 9.943151e-01

[4,] 9.985798e-01 1.420152e-03

[5,] 9.666720e-01 3.332802e-02

[6,] 8.121430e-01 1.878570e-01

[7,] 1.751512e-04 9.998248e-01

[8,] 8.300100e-06 9.999917e-01

[9,] 8.277705e-08 9.999999e-01

[10,] 1.000000e+00 5.029425e-116
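The class labels returned by the default predict() call correspond to the larger of the two posteriors in each row. A small sketch using rows 3 to 6 of the output above (the object names are illustrative):

```r
# Rows 3 to 6 of the type = "raw" output for HouseVotes84.
post <- matrix(c(5.684937e-03, 9.943151e-01,
                 9.985798e-01, 1.420152e-03,
                 9.666720e-01, 3.332802e-02,
                 8.121430e-01, 1.878570e-01),
               ncol = 2, byrow = TRUE,
               dimnames = list(NULL, c("democrat", "republican")))

# The predicted class is the column holding the largest posterior in each row.
pred_class <- colnames(post)[max.col(post, ties.method = "first")]
print(pred_class)
```

Row 6 is the least certain of these four, with a posterior of about 0.81 for democrat.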

House Votes 1984

> pred <- predict(model, HouseVotes84[,-1])

> table(pred, HouseVotes84$Class)

pred democrat republican

democrat 238 13

republican 29 155
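The confusion table can be reduced to a few summary numbers. A minimal sketch that re-enters the four counts by hand (the object names are illustrative):

```r
# Confusion table from the slide: rows are predictions, columns are true classes.
conf <- matrix(c(238,  13,
                  29, 155),
               nrow = 2, byrow = TRUE,
               dimnames = list(pred   = c("democrat", "republican"),
                               actual = c("democrat", "republican")))

accuracy <- sum(diag(conf)) / sum(conf)   # correct predictions / all predictions
recall   <- diag(conf) / colSums(conf)    # per-class share of true members found
print(round(accuracy, 3))
print(round(recall, 3))
```

These predictions are made on the same rows used to fit the model, so the accuracy is likely optimistic; a held-out test set gives a fairer estimate.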

Hair, eye color

> data(HairEyeColor)

> mosaicplot(HairEyeColor)

> margin.table(HairEyeColor,3)

Male Female

279 313

> margin.table(HairEyeColor,c(1,3))

Hair Male Female

Black 56 52

Brown 143 143

Red 34 37

Blond 46 81

Construct a naïve Bayes classifier and test it!
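One possible way to approach this, as a sketch only: expand the contingency table into one row per person, hold out part of the rows, and predict Sex from Hair and Eye. The choice of Sex as the target, the 30% hold-out, and the seed are all assumptions made for illustration.

```r
data(HairEyeColor)

# Expand the 3-way contingency table into one row per person.
counts <- as.data.frame(HairEyeColor)
cases  <- counts[rep(seq_len(nrow(counts)), counts$Freq), c("Hair", "Eye", "Sex")]
rownames(cases) <- NULL

# Hold out 30% of the rows for testing (the split ratio is an assumption).
set.seed(1)
test_idx <- sample(nrow(cases), size = round(0.3 * nrow(cases)))
train <- cases[-test_idx, ]
test  <- cases[test_idx, ]

# Fit and evaluate only if e1071 is installed.
if (requireNamespace("e1071", quietly = TRUE)) {
  fit  <- e1071::naiveBayes(Sex ~ Hair + Eye, data = train)
  pred <- predict(fit, test[, c("Hair", "Eye")])
  print(table(pred, actual = test$Sex))
}
```

Testing on rows the model has not seen avoids the optimism of scoring on the training data.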

Another example

> A = c(1, 2.5); B = c(5, 10); C = c(23, 34)

> D = c(45, 47); E = c(4, 17); F = c(18, 4)

> df <- data.frame(rbind(A,B,C,D,E,F))

> colnames(df) <- c("x","y")

> hc <- hclust(dist(df))

> plot(hc)

> df$cluster <- cutree(hc,k=2) # 2 clusters

> plot(y~x,df,col=cluster)
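To see what the dendrogram encodes without plotting, the merge heights and the group membership can be printed directly. A minimal sketch with the same six points (the name pts is used here only to keep the example separate from df):

```r
# The six points A..F from the example, entered as a data frame.
pts <- data.frame(x = c(1, 5, 23, 45, 4, 18),
                  y = c(2.5, 10, 34, 47, 17, 4),
                  row.names = c("A", "B", "C", "D", "E", "F"))

hc <- hclust(dist(pts))        # defaults: Euclidean distance, complete linkage
print(round(hc$height, 2))     # distance at which each successive merge happens

groups <- cutree(hc, k = 2)    # named integer vector of cluster labels
print(split(names(groups), groups))
```

With the default complete linkage, cutting at k = 2 groups A, B, E and F together and leaves C and D as the second cluster.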
