Post on 21-Dec-2015


Page 1: Data Visualization with R (II) Dr. Jieh-Shan George YEH jsyeh@pu.edu.tw

Data Visualization with R (II)

Dr. Jieh-Shan George YEH
jsyeh@pu.edu.tw

Page 2:

Outline

• Data Visualization with R
• Visualizing Different Types of Data
– Univariate
– Univariate Categorical
– Bivariate Categorical
– Bivariate Continuous vs Categorical
– Bivariate Continuous vs Continuous
– Bivariate: Continuous vs Time

Page 3:

Data Visualization with R

• Both anecdotally and per Google Trends, R is the language and tool most closely associated with creating data visualizations.
– http://www.google.com/trends/explore?hl=en-US#q=R%20language,%20Data%20Visualization,%20D3.js,%20Processing.js&cmpt=q

Page 4:

Google Trends on R & Data Visualization

Page 5:

Google Trends on R & Data Visualization

Page 6:

GRAPH FOR DATA MINING

Page 7:

Hierarchical Clustering

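# Cluster the cars on the Euclidean distances between their rows
hc <- hclust(dist(mtcars))
plot(hc)                # draw the dendrogram
rect.hclust(hc, k = 4)  # outline the four-cluster cut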
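The rectangles drawn by rect.hclust() mark a four-cluster cut of the dendrogram. As a minimal sketch, assuming an unmodified mtcars, the same cut can be read back as cluster labels with cutree(). The names groups and sizes are illustrative and do not come from the slides.

```r
# Rebuild the clustering from the slide
# (hclust defaults to complete linkage, dist to Euclidean distance)
hc <- hclust(dist(mtcars))

# cutree() returns one cluster label per car for the same k used in rect.hclust()
groups <- cutree(hc, k = 4)

# Count how many cars fall into each of the four clusters
sizes <- table(groups)
print(sizes)

# List the cars assigned to the first cluster
print(names(groups)[groups == 1])
```

Comparing these labels with the rectangles on the dendrogram is a quick way to check which cars each box contains.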

Page 8:

Decision Tree

library(rpart)
library(rpart.plot)
# Classify the number of cylinders from fuel economy
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)
prp(rp1)  # plot the fitted tree
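Besides plotting the tree, it can help to see how well it separates the classes. The following is a rough sketch, not part of the original slides, that tabulates the tree's predictions on the training data. The names pred, confusion and accuracy are illustrative.

```r
library(rpart)

# Refit the tree from the slide: cylinder count predicted from mpg alone
rp1 <- rpart(factor(cyl) ~ mpg, data = mtcars)

# Predicted class for every car in the training data
pred <- predict(rp1, type = "class")

# Cross-tabulate actual against predicted cylinder counts
confusion <- table(actual = mtcars$cyl, predicted = pred)
print(confusion)

# Share of cars classified correctly on the training data
accuracy <- mean(as.character(pred) == as.character(mtcars$cyl))
print(accuracy)
```

Because the same 32 cars are used for fitting and checking, this figure is optimistic; it is only meant to complement the picture produced by prp().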

Page 9:

OTHERS

Page 10:

Financial Time Series
Quantitative Financial Modeling Framework

library(quantmod)
getSymbols("YHOO", src = "google")  # from Google Finance (Google has since discontinued this feed)
getSymbols("YHOO", from = "2014-01-01")
chartSeries(YHOO)

Page 11:

barChart(YHOO)
candleChart(YHOO, multi.col = TRUE, theme = "white")
chartSeries(to.weekly(YHOO), up.col = "white", dn.col = "blue")

Page 12:

GGPLOT2

Page 13:

ggplot2

• The ggplot2 package, created by Hadley Wickham, offers a powerful graphics language for creating elegant and complex plots.

• Originally based on Leland Wilkinson's The Grammar of Graphics, ggplot2 allows you to create graphs that represent both univariate and multivariate numerical and categorical data in a straightforward manner.

• Grouping can be represented by color, symbol, size, and transparency. The creation of trellis plots (i.e., conditioning) is relatively simple.

• qplot() (for quick plot) hides much of this complexity when creating standard graphs.

Page 14:

qplot()

• The qplot() function can be used to create the most common graph types. While it does not expose ggplot2's full power, it can create a very wide range of useful plots. The format is:

qplot(x, y, data=, color=, shape=, size=, alpha=, geom=, method=, formula=, facets=, xlim=, ylim=, xlab=, ylab=, main=, sub=)

Notes:

• At present, ggplot2 cannot be used to create 3D graphs or mosaic plots.

• Use I(value) to indicate a specific value. For example, size=z makes the size of the plotted points or lines proportional to the values of a variable z. In contrast, size=I(3) sets every point or line to the fixed size 3, regardless of the data.
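The difference between mapping a variable to size and setting a constant size can be seen side by side. This is a hedged sketch that assumes ggplot2 is installed. It uses the ggplot() forms that correspond to the two qplot() calls named in the comments, with hp standing in for the variable z. The object names p_mapped and p_fixed are illustrative.

```r
library(ggplot2)

# Mapping: point size follows the data,
# like qplot(wt, mpg, data = mtcars, size = hp)
p_mapped <- ggplot(mtcars, aes(x = wt, y = mpg, size = hp)) +
  geom_point()

# Setting: one constant size for all points,
# like qplot(wt, mpg, data = mtcars, size = I(3))
p_fixed <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 3)

# Only the mapped version carries a size aesthetic (and therefore a legend)
print(names(p_mapped$mapping))
print(names(p_fixed$mapping))
```

Printing p_mapped and p_fixed draws the two plots; the first shows a size legend for hp, the second does not.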

Page 15:

Customizing ggplot2 Graphs

• Unlike base R graphs, ggplot2 graphs are not affected by many of the options set in the par() function.

• They can be modified using the theme() function, and by adding graphic parameters within the qplot() function.

• For greater control, use ggplot() and other functions provided by the package.

• ggplot2 functions can be chained with "+" signs to generate the final plot.
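To make the chaining concrete, here is a small sketch (not taken from the slides) that builds one plot from several pieces joined with "+". The title, labels and legend position are arbitrary choices for illustration.

```r
library(ggplot2)

# Start from the data and aesthetic mappings, then add one piece per "+"
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 3) +                        # scatterplot layer
  labs(title = "MPG by Weight", x = "Weight",   # title and axis labels
       y = "Miles per Gallon", color = "Cylinders") +
  theme(legend.position = "bottom")             # theme() adjusts non-data elements

print(p)
```

Each "+" returns a new plot object, so intermediate results can be stored and extended later, for example p + theme_bw().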

Page 16:

Page 17:

Example

# ggplot2 examples
library(ggplot2)

# create factors with value labels
mtcars$gear <- factor(mtcars$gear, levels = c(3, 4, 5),
                      labels = c("3gears", "4gears", "5gears"))
mtcars$am <- factor(mtcars$am, levels = c(0, 1),
                    labels = c("Automatic", "Manual"))
mtcars$cyl <- factor(mtcars$cyl, levels = c(4, 6, 8),
                     labels = c("4cyl", "6cyl", "8cyl"))

Page 18:

# Kernel density plots for mpg
# grouped by number of gears (indicated by color)
qplot(mpg, data = mtcars, geom = "density", fill = gear, alpha = I(.5),
      main = "Distribution of Gas Mileage", xlab = "Miles Per Gallon",
      ylab = "Density")

Page 19:

# Scatterplot of mpg vs. hp for each combination of gears and cylinders;
# in each facet, transmission type is represented by shape and color
qplot(hp, mpg, data = mtcars, shape = am, color = am,
      facets = gear ~ cyl, size = I(3),
      xlab = "Horsepower", ylab = "Miles per Gallon")
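The formula gear ~ cyl places gears in rows and cylinders in columns. As a sketch of how the same layout might be written with ggplot() and facet_grid(), assuming ggplot2 is installed; the data frame name cars is illustrative:

```r
library(ggplot2)

# Treat the grouping variables as categories (harmless if they are already factors)
cars <- transform(mtcars, am = factor(am), gear = factor(gear), cyl = factor(cyl))

# One panel per combination of gear (rows) and cylinder count (columns)
p <- ggplot(cars, aes(x = hp, y = mpg, shape = am, color = am)) +
  geom_point(size = 3) +
  facet_grid(gear ~ cyl) +
  labs(x = "Horsepower", y = "Miles per Gallon")

print(p)
```

Combinations with no cars still get a panel, which is simply left empty.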

Page 20:

# Separate regressions of mpg on weight for each number of cylinders
qplot(wt, mpg, data = mtcars, geom = c("point", "smooth"),
      method = "lm", formula = y ~ x, color = cyl,
      main = "Regression of MPG on Weight",
      xlab = "Weight", ylab = "Miles per Gallon")
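Each line in this plot is intended to be an ordinary least-squares fit within one cylinder group. A minimal sketch, using base R only, that computes those intercepts and slopes directly with lm(). The names fits and coefs are illustrative, and the code works whether or not cyl has already been converted to a factor.

```r
# Split the data by cylinder count and fit mpg ~ wt within each group
fits <- lapply(split(mtcars, mtcars$cyl), function(d) lm(mpg ~ wt, data = d))

# Collect the intercept and slope of each group into one matrix (one row per group)
coefs <- t(sapply(fits, coef))
print(coefs)
```

A negative value in the wt column means that, within that group, heavier cars tend to have lower mileage.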

Page 21:

# Boxplots of mpg by number of gears;
# observations (points) are overlaid and jittered
qplot(gear, mpg, data = mtcars, geom = c("boxplot", "jitter"),
      fill = gear, main = "Mileage by Gear Number",
      xlab = "", ylab = "Miles per Gallon")

Page 22:

• To learn more, see the ggplot2 reference site
– http://docs.ggplot2.org/current/index.html