Post on 15-Aug-2020

Visualization

• Exam data from CSCE 350

• Analysis?

• Mean & SD?

Anscombe’s Quartet

• Start by visualization

• Model selection
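A minimal sketch of why the quartet argues for plotting first. It assumes the `anscombe` data frame that ships with base R (columns `x1`..`x4`, `y1`..`y4`); the helper names `y_means`, `y_sds` and `xy_cor` are illustrative only.

```r
# Anscombe's quartet is available in base R as the data frame `anscombe`.
data(anscombe)

# The four y columns have nearly identical means and standard deviations,
# and nearly identical correlations with their matching x columns.
ys      <- anscombe[, c("y1", "y2", "y3", "y4")]
y_means <- sapply(ys, mean)
y_sds   <- sapply(ys, sd)
xy_cor  <- sapply(1:4, function(i) {
  cor(anscombe[[paste0("x", i)]], anscombe[[paste0("y", i)]])
})
print(round(rbind(mean = y_means, sd = y_sds, cor_with_x = xy_cor), 2))

# Plotting shows how different the four data sets really are.
op <- par(mfrow = c(2, 2))
for (i in 1:4) {
  x <- anscombe[[paste0("x", i)]]
  y <- anscombe[[paste0("y", i)]]
  plot(x, y, main = paste("Data set", i))
  abline(lm(y ~ x), col = "red")  # same fitted line in every panel
}
par(op)
```

The summary numbers agree to about two decimal places, so mean and SD alone cannot tell the four data sets apart.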

• Example: import education.csv

• Just for chuckles: calculate mean

• How to visualize?

• Suggestions?

• Plot individual columns?

• Plot everything?

– plot(education)

• No obvious correlation between high school and other dimensions

• What about bs and adv?

• How can we investigate?

• plot(education$bs ~ education$adv)

• Model with linear regression

• How?
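The exploration steps above can be sketched as follows. Since education.csv is not reproduced here, the block builds a synthetic stand-in; the column name `hs` and all of the numbers are assumptions for illustration, and only `bs` and `adv` are named in the slides.

```r
set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Hypothetical stand-in for the real file. With the actual data one would use:
#   education <- read.csv("education.csv")
n   <- 50
adv <- runif(n, 5, 20)
education <- data.frame(
  hs  = runif(n, 75, 95),                    # unrelated to the other columns
  bs  = 10 + 1.2 * adv + rnorm(n, sd = 2),   # roughly linear in adv
  adv = adv
)

# "Just for chuckles": column means, restricted to numeric columns
# in case the real file also holds labels such as state names.
col_means <- colMeans(education[sapply(education, is.numeric)])
print(col_means)

# Plotting a whole data frame draws a scatterplot matrix of all column pairs.
plot(education)

# Zoom in on the one pair that looks related.
plot(education$bs ~ education$adv)
```

In the formula `bs ~ adv`, the left-hand side goes on the y axis and the right-hand side on the x axis.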

Linear model

• Check out lm: ?lm

• lm(education$bs ~ education$adv)

• Visualize: use abline; check out ?abline

• abline(lm(education$bs ~ education$adv), col="red")

• Is it linear?

• Try non‐linear model: ?lowess

• lowess(education$bs ~ education$adv)

• Now plot with different color

• Try lines: ?lines

• lines(lowess(education$bs ~ education$adv), col="blue")
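Put together, the linear fit and the lowess smoother might look like the sketch below. It uses synthetic stand-in data (an assumption, since education.csv is not included here); the names `fit` and `smooth` are illustrative.

```r
set.seed(1)  # assumption: fixed seed for reproducibility

# Hypothetical stand-in for the bs and adv columns of education.csv.
adv <- runif(50, 5, 20)
education <- data.frame(adv = adv, bs = 10 + 1.2 * adv + rnorm(50, sd = 2))

# Scatterplot first, then overlay the two models.
plot(education$bs ~ education$adv)

# Straight-line fit: bs = intercept + slope * adv.
fit <- lm(bs ~ adv, data = education)
print(coef(fit))
abline(fit, col = "red")

# lowess is a local smoother: it follows the data without assuming a line.
smooth <- lowess(education$adv, education$bs)
lines(smooth, col = "blue")
```

If the blue curve stays close to the red line, a linear model is a reasonable description; large systematic gaps suggest otherwise.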

Drug trials

• Null hypothesis: 4 drugs have similar outcome

• Create data:

– trials = sample(c("drug1", "drug2", "drug3", "drug4"), size=1000, replace=TRUE)

• What does this command do?

• Look at the result

• outcome = ifelse(trials=="drug1", rlnorm(1000,meanlog=log(35)), ifelse(trials=="drug2", rlnorm(1000,meanlog=log(50)), ifelse(trials=="drug4", rlnorm(1000,meanlog=log(60)), rlnorm(1000,meanlog=log(80)))))

• What does this command do?

• Create data frame:

– dt = data.frame(trial=trials, results=outcome)

• Examine the data frame

• How?

• Could print it

• Use summary command

– summary(dt)

– What does the summary tell us?

– So which drug is better?
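A runnable sketch of the data-generation steps, with comments answering "what does this command do?". The `set.seed` call is an assumption added for reproducibility; the slides do not set a seed.

```r
set.seed(42)  # assumption: not in the slides, added so results repeat

# Draw 1000 labels at random, with replacement, from the four drug names.
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)
print(table(trials))  # roughly 250 per drug

# Each rlnorm() call draws 1000 log-normal values; ifelse() then picks,
# element by element, the draw that matches that row's drug.
# Medians are 35 (drug1), 50 (drug2), 60 (drug4) and 80 (drug3, the fallback).
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))

dt <- data.frame(trial = trials, results = outcome)

# Structure and overall summary. The summary pools all four drugs,
# so by itself it cannot say which drug is better.
str(dt)
print(summary(dt))
```

Because the summary is not split by drug, a per-group statistic is needed to compare them.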

• Find the mean response of each drug

• How?

• Lookup aggregate function: ?aggregate

• aggregate(x=dt$results, by=list(dt$trial), FUN="mean")
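A self-contained sketch of the per-drug aggregation. The data is regenerated as in the slides (the seed is an assumption); the median comparison and the names `means` and `medians` are additions for illustration, not part of the original deck.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))
dt <- data.frame(trial = trials, results = outcome)

# Mean response per drug, exactly as on the slide.
means <- aggregate(x = dt$results, by = list(dt$trial), FUN = "mean")
names(means) <- c("trial", "mean_result")  # default names are Group.1 and x
print(means)

# Formula interface: same grouping, here with the median, which is less
# sensitive to the long right tail of log-normal data.
medians <- aggregate(results ~ trial, data = dt, FUN = median)
print(medians)
```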

• Now plot result using boxplot

– boxplot(results ~ as.factor(trial), data = dt)

• What does “data=dt” do?

• Scaling not so good. What to do?

• Try log plot

– boxplot(results ~ as.factor(trial), data = dt, log="y")

– Better!
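To see the scaling problem and its fix side by side, one could draw both boxplots in one figure. This sketch regenerates the data as in the slides (the seed is an assumption); `bp_linear` and `bp_log` are illustrative names.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))
dt <- data.frame(trial = trials, results = outcome)

# data = dt tells boxplot where to look up `results` and `trial`,
# so the formula does not need dt$ prefixes.
op <- par(mfrow = c(1, 2))
bp_linear <- boxplot(results ~ as.factor(trial), data = dt,
                     main = "Linear y axis")
# A few extreme values squash the boxes on the linear scale;
# a log axis spreads them out again.
bp_log <- boxplot(results ~ as.factor(trial), data = dt, log = "y",
                  main = "Log y axis")
par(op)
```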

• Use lm again to create a model

– model = lm(log10(results) ~ as.factor(trial), data=dt)

– What does this command do?

• log10()?

• as.factor()?

• Examine the resulting model

– summary(model)

– How do we interpret the summary?

– Look at the p-value on the F-statistic.

– Should we reject the null hypothesis, or fail to reject it?
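A sketch of fitting the model and reading off the overall F-test. The data is regenerated as in the slides (the seed is an assumption), and pulling the p-value out with `pf` is one possible way to get the number that `summary(model)` prints on its last line.

```r
set.seed(42)  # assumption: fixed seed for reproducibility
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))
dt <- data.frame(trial = trials, results = outcome)

# log10() makes the skewed outcomes roughly normal; as.factor() makes lm
# treat trial as a category, so each drug gets its own mean level.
model <- lm(log10(results) ~ as.factor(trial), data = dt)
s <- summary(model)
print(s)

# The intercept is the mean log10 outcome for drug1 (the baseline level);
# the other coefficients are differences from that baseline.
print(coef(model))

# Overall F-test: are all four group means equal?
f <- s$fstatistic
p_value <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(p_value)
```

A very small p-value here is evidence against the null hypothesis that the four drugs have similar outcomes.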

• Let's try some more visualization

– library(lattice)

– densityplot(~ results, groups=trial, data=dt, auto.key=TRUE)

– densityplot(~ log10(results), groups=trial, data=dt, auto.key=TRUE)
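A self-contained sketch of the two density plots, assuming the lattice package is installed (it normally ships with R). The data is regenerated as in the slides (the seed is an assumption); `p_raw` and `p_log` are illustrative names.

```r
library(lattice)

set.seed(42)  # assumption: fixed seed for reproducibility
trials <- sample(c("drug1", "drug2", "drug3", "drug4"),
                 size = 1000, replace = TRUE)
outcome <- ifelse(trials == "drug1", rlnorm(1000, meanlog = log(35)),
           ifelse(trials == "drug2", rlnorm(1000, meanlog = log(50)),
           ifelse(trials == "drug4", rlnorm(1000, meanlog = log(60)),
                                     rlnorm(1000, meanlog = log(80)))))
dt <- data.frame(trial = trials, results = outcome)

# One density curve per drug; auto.key = TRUE adds a legend.
# `groups = trial` refers to the column inside dt.
p_raw <- densityplot(~ results, groups = trial, data = dt, auto.key = TRUE)
p_log <- densityplot(~ log10(results), groups = trial, data = dt,
                     auto.key = TRUE)

# lattice functions return plot objects; print() draws them, which matters
# inside scripts and functions where results are not auto-printed.
print(p_raw)
print(p_log)
```

On the raw scale the curves pile up near zero; on the log10 scale the shift between drugs is much easier to see.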
