Statistical Software R

Upload: clarence-goodwin

Post on 13-Jan-2016


Page 1: Statistical Software R. More data sets …. See

Statistical Software R

More data sets …. See http://www.statsci.org

What is R?

A new(?) standard for exchanging statistical ideas.

- The first version was released in the early 1990s

- Public software from the GNU project, released under the GPL (it's free).

- S language + Math/Stat Lib + Graphical tools

- More information: http://www.cran.r-project.org

Time vs Time

[Figure: development time versus run time for C/FORTRAN, Excel, and R]

Develop for 1 month, run in 1 second.

Or, develop for 1 day, run in 10 min.

Range of applicability

[Figure: applicability versus convenience for C/FORTRAN, R, Excel, and a calculator]

R, Excel and C

- Excel is general-purpose software

- R is professional software

- C is a development tool with a wide range of applicability

GUI?

• Clicking is slower and harder than typing!

• Clicking is not suited to repetitive jobs at a company

• Clicking easily generates garbage!

A GUI is a good feature, especially for novices!

R is ~

R = S lang. + Math & Stat Lib. + Graphic tools

Easy & efficient handling of data

Rich modern statistical routines

Free under GPL of GNU

- R is at the center of statistical development.

- To turn ideas into SW, quickly and faithfully.

- R is a tool for saving & exchanging statistical data

A very good book, but a little difficult for novices.

Easier alternatives

There are many easy books (try searching Amazon) and free tutorial guides on the internet.

Official free introductory guide:

http://cran.r-project.org/doc/manuals/R-intro.pdf

Free self-study guide sites:

http://tryr.codeschool.com/

http://www.sr.bham.ac.uk/~ajrs/R/index.html

Download

R ver. 2.10.1, base package, executable binary file:

http://www.cran.r-project.org/bin/windows/base/R-2.10.1-win32.exe

Click the install icon to install R easily.

Contributed packages: download them from inside R.

ENIAC programming, 1946

A journey for easy scientific computing

[Figure: diagram of programming languages: ENIAC, FORTRAN, COBOL, Algol60, APL, Lisp, Pascal, C, Smalltalk, Scheme, C++, S, S-plus. Diagram labels: Syntax, Semantics, OO, Sense.]

Features of R

1. Vector Arithmetic (APL, S-plus)

2. Object Oriented property (Smalltalk, S-plus)

3. Lazy evaluation (S-plus)

4. (Nested) lexical scoping (Scheme, PASCAL)
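Features 3 and 4 may be easier to see in a small example. The following is only a minimal sketch; the function names lazy_demo and make_counter are hypothetical and do not come from the slides.

```r
# Lazy evaluation: an argument is evaluated only when it is actually used.
lazy_demo <- function(a, b) {
  if (a > 0) "b was never evaluated" else b
}
lazy_demo(1, stop("this error is never raised"))

# Lexical scoping: a function sees the variables of the environment
# in which it was defined, not the one from which it is called.
make_counter <- function() {
  count <- 0
  function() {
    count <<- count + 1   # updates 'count' in the enclosing environment
    count
  }
}
counter <- make_counter()
counter()
counter()
```

Each call to make_counter() creates a new enclosing environment, so two counters made this way do not share their count.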

1. Vector Arithmetic

x <- c(10,20,30) + c(5,5,5)

y <- c(10,20,30) + c(1,2,3)
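To see what the two lines above produce, a short sketch follows. The third vector z is an addition for illustration: it shows that a shorter operand (here a single number) is recycled along the longer one.

```r
x <- c(10, 20, 30) + c(5, 5, 5)   # element-wise sum
y <- c(10, 20, 30) + c(1, 2, 3)   # element-wise sum
z <- c(10, 20, 30) + 5            # the single value 5 is recycled

print(x)
print(y)
identical(x, z)                   # both ways of adding 5 give the same vector
```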

2. Object oriented property

Smalltalk (1970, A. Kay, Xerox)

Everything is an object, and every object has a class.

Object is everything?

Integrated concept: variable, data, function, ...

A unified framework to work in. (user)

The class holds the information about the object. (types of variables)

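"Geosigi" (a Korean catch-all word, roughly "do the thing")

"Let's geosigi the armor" (let's put on the armor, let's take off the armor)

class: armor; method: geosigi; object: each actual suit of armor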

Concept of OO

Clicking the mouse button!

(open a file, execute a program, delete a file, ...)

Let the function work properly according to the characteristics of the object!

Make the human's command easier and make the computer work harder to understand the command.

OO in R

- diag(3), diag(c(1,2,3)), diag(diag(3))

- plot(sunspots), plot(Titanic), plot(USJudgeRatings)

- attributes(sunspots), attributes(Titanic), attributes(USJudgeRatings)
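The calls listed above can be explored as in the sketch below. Note the hedge: plot() selects a method from the class of its argument, whereas diag() simply adapts to the kind of argument it receives. The class "armor" and its print method are hypothetical, added only to show how a user-defined class gets its own behaviour.

```r
# The same function name behaves differently depending on its argument.
diag(3)             # a single number: 3 x 3 identity matrix
diag(c(1, 2, 3))    # a vector: matrix with that vector on the diagonal
diag(diag(3))       # a matrix: extracts the diagonal as a vector

# plot() chooses what to draw from the class of the object.
class(sunspots)         # time series
class(Titanic)          # contingency table
class(USJudgeRatings)   # data frame

# A hypothetical class "armor" with its own print method (S3 style).
suit <- structure(list(owner = "soldier"), class = "armor")
print.armor <- function(x, ...) cat("Armor owned by", x$owner, "\n")
print(suit)
```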

How to use R

1) Help : by menu, help(plot), ?title

2) demo(); demo(nlm); demo(image)

3) x <- matrix(1:4,2,); ls(); attributes(x)

4) # Install and load the package tseries; search()

5) save.image("C:/temp/a.RData"); q()
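Steps 3 and 5 can be tried together in a small sketch. It assumes a temporary file instead of the slide's "C:/temp/a.RData", and it saves only x with save() rather than the whole workspace with save.image().

```r
x <- matrix(1:4, 2)     # 2 x 2 matrix, filled column by column
attributes(x)           # the only attribute is $dim
ls()                    # names of the objects in the workspace

f <- tempfile(fileext = ".RData")   # temporary file, an assumption for this sketch
save(x, file = f)       # save.image(f) would save every object instead
rm(x)                   # x is gone from memory ...
load(f)                 # ... and restored from disk
x
```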

Memory & HDD

[Figure: a computer (CPU and memory) with the HDD as a peripheral device]

How R works

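[Figure: how R works. A frame for computing sits between input and output. Memory holds the environments (.GlobalEnv, library, ...), that is, namespaces and loaded values, including new objects and loaded packages; the HDD is shown separately.]

> search()

> searchpaths()

> ls() # lists the objects in an environment (.GlobalEnv by default)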
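A brief sketch of these commands in a fresh R session follows. The object name a_new_object is hypothetical, and the sketch assumes the default set of attached packages (which includes stats).

```r
search()                      # environments searched in order, starting with ".GlobalEnv"
a_new_object <- 1             # a new object is created in .GlobalEnv
ls()                          # objects in the current environment
head(ls("package:stats"))     # objects provided by an attached package
environmentName(environment(rnorm))   # the namespace in which rnorm is defined
```

Calling ls() with no argument does not list the contents of packages; a package has to be named explicitly, as in ls("package:stats").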

R data sets

R has its own data sets for testing

- data();

- Titanic; ?Titanic

- plot(Titanic)

Data sets of SVV: http://www.aw.com/sharpe

Download the text and Excel files to your computer and decompress them.

Copy the text files to “C:\temp\text”

SDV data : see p 188 # 32 , Economic Analysis data

You can draw it yourself very simply!

data.svv <- dir("c:/temp/text")                                # file names in the folder
dfile.svv <- paste("c:/temp/text/", data.svv, sep="")          # full paths
dsv <- read.table(dfile.svv[37], header=TRUE, sep="\t")        # 37th file, tab separated
y <- dsv[,3]
x <- dsv[,4]
plot(x, y, pch=16, col="purple", xlab="Sogang Stat")
points(20000, 40, pch=1, cex=10, col="blue")                   # large circle around (20000, 40)
title("Economic Analysis")
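The same read-and-plot steps can be tried without the SVV files. The sketch below first writes a small tab-separated file to a temporary location; the column names and values are made up purely for illustration.

```r
# Write a toy tab-separated file so that the example is self-contained.
tmp <- tempfile(fileext = ".txt")
toy <- data.frame(id     = 1:5,
                  region = c("a", "b", "c", "d", "e"),
                  growth = c(2.1, 3.4, 1.8, 4.0, 2.9),
                  income = c(12000, 18000, 9000, 25000, 15000))
write.table(toy, tmp, sep = "\t", row.names = FALSE, quote = FALSE)

# Read it back and plot the 4th column against the 3rd, as on the slide.
dsv <- read.table(tmp, header = TRUE, sep = "\t")
y <- dsv[, 3]
x <- dsv[, 4]
plot(x, y, pch = 16, col = "purple", xlab = "income", ylab = "growth")
title("Toy data")
```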

Install & load packages

[Figure: Install copies a package from a server on the internet to the HDD; Load brings it from the HDD into memory]
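The two steps can be sketched as follows. To keep the example runnable offline it uses the package tools, which ships with R; this choice is an assumption of the sketch, and a contributed package such as tseries would go through install.packages() in the same way.

```r
pkg <- "tools"   # ships with R; replace with e.g. "tseries" for a contributed package

# Install: copy the package from a server to the HDD (only if it is missing).
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg)
}

# Load: bring the package from the HDD into memory and attach it.
library(pkg, character.only = TRUE)
"package:tools" %in% search()
```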

Stock price data from finance.yahoo.com

library(tseries)            # load the package "tseries" first
ghq <- get.hist.quote       # short alias
time <- "1996-01-01"
kospi <- ghq(instrument = "^ks11", start = time, quote = "Close")
dscon <- ghq(instrument = "011160.ks", start = time, quote = "Close")
tm <- ghq(instrument = "tm", start = time, quote = "Close")
plot(tm, xlab="Toyota Motors")
plot(kospi, dscon, type="l", xlab="KOSPI Composite Index", ylab="Doosan Construction")

Hanoi Tower

With simple programming, a graphical implementation of the Tower of Hanoi is possible in R. The code and program were uploaded to the cyber campus.

- hanoi(4)

- hanoi(14)
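The author's graphical hanoi() is not reproduced in these slides. As a rough, text-only sketch of the recursion behind such a program, the hypothetical helper hanoi_moves below lists the moves instead of drawing them; the peg names A, B, and C are assumptions.

```r
# Text-only sketch; hanoi_moves is a hypothetical helper, not the author's hanoi().
hanoi_moves <- function(n, from = "A", to = "C", via = "B") {
  if (n == 0) {
    return(data.frame(disk = integer(0), from = character(0),
                      to = character(0), stringsAsFactors = FALSE))
  }
  rbind(
    hanoi_moves(n - 1, from, via, to),   # move the n-1 smaller disks out of the way
    data.frame(disk = n, from = from, to = to, stringsAsFactors = FALSE),
    hanoi_moves(n - 1, via, to, from)    # move them onto the largest disk
  )
}

moves <- hanoi_moves(4)
nrow(moves)      # 2^4 - 1 moves
head(moves, 3)
```

With n disks the recursion produces 2^n - 1 moves, so 14 disks already need 16383 moves.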

Business Statistics, Sogang Business School

# This is a comment line.
# download R from cran.r-project.org
# explain menu first

q() # Stop R session; do not save the workspace

# .First <- function() cat("Hello everyone ?\n")
# .Last <- function() { cat("Bye, SBS Students !") }
# ls()
# ls(all=TRUE)

q()

# Save the workspace

# Now, we know the first and the last of R
# That is, we know everything about R

q
help
help(q)

data()

help(data)

sunspots

help(sunspots)

hist(sunspots)

help(hist)

args(hist) # arguments of the function

hist()

hist(sunspots, nclass=10) # with more intervals

par(mfrow=c(1,2)) # set graphic layout

hist(sunspots) # in different layout

hist(sunspots, nclass=20) # two in a picture

hist(sunspots, nclass=20, plot=FALSE) # without plot
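When the plot is suppressed, hist() returns the numbers behind the histogram. A short sketch of how that result can be inspected:

```r
h <- hist(sunspots, nclass = 20, plot = FALSE)   # compute only, draw nothing
names(h)       # components of the returned object
h$breaks       # interval boundaries
h$counts       # number of observations in each interval
sum(h$counts) == length(sunspots)   # every observation falls in one interval
```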

?co2

# co2 and sunspots in Jan 59 - Dec 83 ?

co2x<- co2[1:(12*(83-58))]

sunpt<-sunspots[-(1:(12*(1958-1748)))]

par(mfrow=c(2,1))

plot(co2x)

plot(sunpt)
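Indexing with [ ] as above returns plain numbers and drops the dates. An alternative sketch uses window(), which keeps the time-series attributes so that the x-axis shows years; the names co2w and sunptw are hypothetical.

```r
# Same period, Jan 1959 to Dec 1983, selected by date instead of by position.
co2w   <- window(co2,      start = c(1959, 1), end = c(1983, 12))
sunptw <- window(sunspots, start = c(1959, 1), end = c(1983, 12))
length(co2w)
length(sunptw)

par(mfrow = c(2, 1))
plot(co2w)     # x-axis is labelled with years
plot(sunptw)
```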

x <- rnorm(100,0,1) # random number generator

y <- rnorm(100,0,1) # each has 100 elements

x # show x

y # show y

xy<- x + y

( z<-rnorm(100,0,1) ) # assign and show

ls() # show objects in …

# tuning for graphic layout

help(par)

# Text and Symbols: cex, pch, type, xlab, ylab, ....

# The Plot Area: bty, pty, xlim, ylim, ....

# Figure and Page Areas: mfrow, ....

# Miscellaneous: lty, ....

plot(x,y)

plot(xy, y)

# set the graphic parameters

par(mfrow=c(2,2), pty="s")

plot(x, y, pch=0, cex=0.7 ) # pch and cex

plot(xy, y, pch=16,cex=0.7)

plot(x,y, pch=0, cex=1.2 )

plot(xy,y, pch=16, cex=1.2 )

par(mfrow=c(1,1)) # mfrow

plot(xy,y, pch=16, cex=1.2 )

plot(xy,y, type="n") # prepare axis only

points(xy,y, pch=16, cex=1.2 )

lines(xy,y)

# plot only points, but not axis

plot(xy,y, axes=FALSE, xlab="x+y", ylab="y")

cbind(x, y, xy) # column binding

y[y>0]

xy[y>0]

cbind(x, y, xy)[y>0, ] # rows with y > 0

plot(xy,y, type="n", xlab="x+y", ylab="y" ) # axis only

points(xy[y>0],y[y>0], pch=16, cex=0.6 ) # for y > 0

points(xy[y<=0],y[y<=0], pch=1, cex=0.8 ) # y <= 0
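The selection by a condition used above is easier to follow on a few fixed numbers. A minimal sketch, with made-up values in place of the random ones:

```r
x  <- c(-1.0,  0.5, 2.0, -0.3)
y  <- c( 0.7, -0.2, 1.5, -1.1)
xy <- x + y

y > 0            # logical vector: TRUE where the condition holds
y[y > 0]         # the positive elements of y
xy[y > 0]        # the elements of xy at the same positions

m <- cbind(x, y, xy)
m[y > 0, ]       # whole rows of the matrix where y > 0 (note the comma)
```

Without the comma, m[y > 0] treats the matrix as one long vector and no longer returns complete rows.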

# pch

plot(c(-1,8),c(-1,8), type="n")

for(i in 0:7) for(j in 0:7) points(i, j, pch=i+8*j, cex=1.2)

points(-0.5, -0.5, pch="9", cex=1.2)

points(7.5, 7.5, pch="한", cex=1.2)

identify( xy, y, x)

# to pick the points, use the (left) mouse button

identify( xy, y, round(x,2), cex=0.6)

# to stop, use (right) mouse button

pts<-locator(5)

polygon(pts)

help(polygon)

par() # all graphic parameters

par()$usr # usr

uc <- par()$usr # to simplify

lines( c(uc[1], uc[2]), c(0,0), lty=2) # center line

lines( c(0,0), c(uc[3], uc[4]), lty=2) # lty

# diagonal line

lines( c(uc[1], uc[2]), c(uc[3], uc[4]) , lty=1)

text( 1.0, -1.2, " positive y-values ! ")

title(" (x+y) and y from N(0,1) ", cex=0.6 )

help(USJudgeRatings)

USJudgeRatings

pairs(USJudgeRatings)

pairs(USJudgeRatings[1:5])

## put histograms on the diagonal

panel.hist <- function(x, ...)
{
    usr <- par("usr"); on.exit(par(usr = usr))   # restore the coordinates on exit
    par(usr = c(usr[1:2], 0, 1.5) )              # keep the x range, set y range to 0..1.5
    h <- hist(x, plot = FALSE)
    breaks <- h$breaks; nB <- length(breaks)
    y <- h$counts; y <- y/max(y)                 # scale the tallest bar to 1
    rect(breaks[-nB], 0, breaks[-1], y, col="cyan", ...)
}

pairs(USJudgeRatings[1:5], panel=panel.smooth,
      cex = 1.5, pch = 24, bg="light blue",
      diag.panel=panel.hist, cex.labels = 2, font.labels=2)

# You can fix and modify the picture in PowerPoint

# Class Assignment.

# draw the picture of (2x+y, 2y)

# for different pch parameters

# in a plot and put a legend.

# Important functions to understand R

# ls(); search(); searchpaths()

# attributes()

# c(); data.frame(); factor(); ordered()

# apply()
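A few of these functions in action, as a minimal sketch with made-up values; all object names are hypothetical.

```r
v  <- c(3, 1, 2)                                   # c() combines values into a vector
f  <- factor(c("low", "high", "low"))              # unordered categories
o  <- ordered(c("low", "high", "low"),
              levels = c("low", "high"))           # categories with an order
df <- data.frame(v = v, f = f, o = o)              # columns of different types

attributes(df)     # names, class, and row names of the data frame
levels(f)          # factor levels are sorted alphabetically by default
o[1] < o[2]        # comparison is meaningful for ordered factors

m <- matrix(1:6, nrow = 2)
apply(m, 1, sum)   # apply sum to each row
apply(m, 2, sum)   # apply sum to each column
```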

Thank you !!