Learning Notes of R
For Python Programmer
R Basic Scalar Types
• R basic scalar data types
– integer ( 1L,2L,3L,…)
– numeric ( 1,2,3,…)
– character
– complex
– logical (TRUE, FALSE)
• and(&) , or(|), not(!)
R Basic Scalar Types Constructors
• RScalarType(0) is a zero-length vector (e.g. integer(0)), not NULL
– length(xxx(0)) == 0
• RScalarType(1)
– integer 0L
– numeric 0
– character “”
– complex 0+0i
– logical FALSE
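A minimal sketch of the constructors above, using only base R. The variable names (empty_int, defaults) are made up for illustration.

```r
# Zero-length vector: empty, but not NULL
empty_int <- integer(0)
length(empty_int)    # 0
is.null(empty_int)   # FALSE

# Length-one vectors hold each type's default value
defaults <- list(
  integer   = integer(1),    # 0L
  numeric   = numeric(1),    # 0
  character = character(1),  # ""
  complex   = complex(1),    # 0+0i
  logical   = logical(1)     # FALSE
)
str(defaults)
```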
R Basic Object Types
• R basic data structure types
– (row) vector (In R, everything is vector)
– matrix
– list
– data.frame
– factor
– environment
• In R the "base" type is a vector, not a scalar.
R Object
Find R Object’s Properties
• length(object)
• mode(object) / class(object)/ typeof(obj)
• attributes(object)
• attr(object, name)
• str(object)
Python type(obj)
• R> class(obj)
• R> mode(obj)
• R> typeof(obj)
object    class        mode         typeof
1         "numeric"    "numeric"    "double"
1:10      "integer"    "numeric"    "integer"
"1"       "character"  "character"  "character"
class     "function"   "function"   "builtin"
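The three type queries can be compared side by side with a small helper. describe() is a hypothetical name for this sketch, not a base R function.

```r
# Hypothetical helper: collect class, mode and typeof of one object
describe <- function(obj) {
  c(class = class(obj)[1], mode = mode(obj), typeof = typeof(obj))
}

describe(1)      # "numeric"   "numeric"   "double"
describe(1:10)   # "integer"   "numeric"   "integer"
describe("1")    # "character" "character" "character"
describe(class)  # "function"  "function"  "builtin"
```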
Python dir(obj)
• attributes(obj)
• str(object)
• ls() (Python> dir() )
• The function attributes(object) returns a list of all the non-intrinsic attributes currently defined for that object.
R attr(object, name)
• The function attr(object, name) can be used to select a specific attribute.
• When it is used on the left hand side of an assignment it can be used either to associate a new attribute with object or to change an existing one.
• For example:
> attr(z, "dim") <- c(10,10)
– allows R to treat z as if it were a 10-by-10 matrix.
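A short sketch of reading and setting attributes, assuming z is a plain vector of 100 integers. The "note" attribute is an invented example.

```r
z <- 1:100
attributes(z)                  # NULL: a plain vector has no attributes yet

attr(z, "dim") <- c(10, 10)    # setting "dim" makes z a matrix
is.matrix(z)                   # TRUE
attr(z, "dim")                 # 10 10

attr(z, "note") <- "custom"    # arbitrary (made-up) attribute
names(attributes(z))           # now includes "dim" and "note"
```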
R character
Python "a,b,c,d,e".split(",") (R strsplit)
• strsplit("a,b,c,d,e", "\\,")
• (Output is an R list)
• unlist(strsplit("a,b,c,d,e", "\\,"))[vector_index]
R paste
• paste("a","b",sep="") – Python> "a"+"b" gives "ab"
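As a rough illustration of both slides, the sketch below splits a string, indexes the result, and joins it back. fixed = TRUE is one way to avoid escaping the separator; paste0 is the base R shorthand for sep = "".

```r
# Split on a literal comma; strsplit returns a list, unlist gives a vector
parts <- unlist(strsplit("a,b,c,d,e", ",", fixed = TRUE))
parts[2]                                # "b" (R indices start at 1)

# Concatenation, like Python's "a" + "b"
ab <- paste("a", "b", sep = "")         # "ab"
ab2 <- paste0("a", "b")                 # same result

# Joining a vector, like Python's ",".join(parts)
joined <- paste(parts, collapse = ",")  # "a,b,c,d,e"
```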
R-List
Python-Dictionary
Python Dictionary (R List)
• Constructor
– Rlist <- list(key1=value1, … , key_n = value_n)
• Evaluate
– Rlist$key1 or Rlist[["key1"]] (Python> D["key1"])
– Rlist[[1]]
• Sublist
– Rlist[key_i] (output list(key_i=value_i))
Python D[“new_key”]=new_value
• Rlist$new_key = new_value or
• Rlist$new_key <- new_value
Python> del D[key]
• New_Rlist <- Rlist[-key_index] or
• New_Rlist <- Rlist[-vector_of_key_index]
Python Dict.keys()
• vector_of_Rlist_keys <- names(Rlist)
• (the output "vector_of_Rlist_keys" is an R vector)
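The dictionary-style operations above can be sketched on a small list. The keys alpha, beta and gamma are invented for the example.

```r
d <- list(alpha = 1, beta = "two")

d$alpha            # value lookup, like D["alpha"]
d[["beta"]]        # same, with the key as a string
sub <- d["alpha"]  # single brackets return a sub-list

d$gamma <- TRUE    # add a new key
smaller <- d[-2]   # drop the 2nd entry by position (beta)
keys <- names(smaller)
```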
R-Vector
Python-List
Python List (R vector)
• [Constructor] vector(mode , length)
– vector(mode = "character", length = 10)
• 0:10
– 0:10 == c(0,1,2,3,4,5,6,7,8,9,10)
– Python> range(0,11)
• seq(0,1,0.1)
– seq(0,1,0.1) == 0:10*0.1
– Matlab> linspace(0,1,11) or 0:0.1:1
• rep(0:10, times = 2)
Python List.methods
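• vector <- c(vector, other_vector) – Python> List.extend (List.append for a single element)
• vector[-J] or vector[-(I:J)] – Python> del List[J] (similar to List.pop, but returns the remaining elements)
• subvector <- vector[vector_of_index]
• which( vector == value ) – Python> List.index(value) (R returns every matching position, counted from 1)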
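A brief sketch of these vector operations on made-up data. Note that none of them modify v in place; each returns a new vector.

```r
v <- c(10, 20, 30)
v <- c(v, c(40, 50))      # extend: 10 20 30 40 50

dropped_one  <- v[-2]     # without the 2nd element: 10 30 40 50
dropped_many <- v[-(1:2)] # without elements 1 to 2: 30 40 50
picked       <- v[c(1, 3)]  # elements 1 and 3: 10 30

pos <- which(v == 30)     # 3 (positions are 1-based)
```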
R which
• which( vector == value ) – Python> List.index(value)
• which( vector < v) or which( vector > v)
• which(arg, arr.ind=TRUE)
• http://fortheloveof.co.uk/2010/04/11/r-select-specific-elements-or-find-their-index-in-a-vector-or-a-matrix/
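The difference between plain which and arr.ind = TRUE shows up on a matrix. The matrix m below is an invented example.

```r
m <- matrix(c(1, 5, 3, 5, 2, 5), nrow = 2)   # filled column by column

linear <- which(m == 5)                 # linear indices: 2 4 6
hits   <- which(m == 5, arr.ind = TRUE) # one (row, col) pair per match
hits
```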
R vector
• length(vector)
– Python> len(List)
• names(vector)
• rownames(vector)
Python> element in List
• R> element %in% R-Vector
• R> !(element %in% R-Vector) (not in)
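A small sketch of length, names and membership tests on a named vector (the names a, b, c are made up):

```r
v <- c(a = 1, b = 2, c = 3)

n   <- length(v)     # 3, like len(List)
nms <- names(v)      # "a" "b" "c"

has_two  <- 2 %in% v      # TRUE,  like `2 in List`
not_five <- !(5 %in% v)   # TRUE,  like `5 not in List`
```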
R matrix
R-Vector with Dimension
R-Matrix
• Constructor:
– matrix( ?? , nrow = ?? , ncol = ?? )
– as.matrix( ?? )
R-Matrix=R-Vector with Dimension
> x <- 1:15
> class(x)
[1] "integer"
> dim(x) <- c(3, 5)
> class(x)
[1] "matrix" "array"   (R >= 4.0.0; earlier versions print only "matrix")
Names on Matrix
• Just as you can name indices in a vector you can (and should!) name columns and rows in a matrix with colnames(X) and rownames(X).
• E.g.
– colnames(R-matrix) <- c("name_1","name_2",…)
– colnames(R-matrix)[i] <- "name_i"
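A minimal sketch of naming rows and columns and then indexing by name. The names themselves are arbitrary.

```r
X <- matrix(1:6, nrow = 2)
colnames(X) <- c("a", "b", "c")
rownames(X) <- c("r1", "r2")

colnames(X)[2] <- "B"    # rename a single column
val <- X["r2", "B"]      # index by name: 4
X
```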
Functions on Matrix
• If X is a matrix apply(X, 1, f) is the result of applying f to each row of X; apply(X, 2, f) to the columns.
– Python> map(func,py-List)
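For instance, with sum as the function f (an arbitrary choice for this sketch):

```r
X <- matrix(1:6, nrow = 2)          # rows: (1,3,5) and (2,4,6)

row_sums <- apply(X, 1, sum)        # 9 12
col_sums <- apply(X, 2, sum)        # 3 7 11
col_range <- apply(X, 2, function(col) max(col) - min(col))  # any function works
```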
Add Columns and Rows
• cbind
E.g.
> cbind(c(1,2,3),c(4,5,6))
• rbind
E.g.
> rbind(c(1,2,3),c(4,5,6))
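The two calls differ only in orientation, which the shapes make visible. The vectors a and b and the column name total are invented for this sketch.

```r
a <- c(1, 2, 3)
b <- c(4, 5, 6)

by_col <- cbind(a, b)   # vectors become columns: 3 rows x 2 columns
by_row <- rbind(a, b)   # vectors become rows:    2 rows x 3 columns
dim(by_col)
dim(by_row)

# Adding a further column to an existing matrix
wider <- cbind(by_col, total = a + b)
```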
Data Frame in R
Explicitly like a list
• When can a list be made into a data.frame?
– Components must be vectors (numeric, character, logical) or factors.
– All vectors and factors must have the same lengths.
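A sketch of both conditions, using a made-up list of three equal-length columns and a second list whose lengths do not match:

```r
cols <- list(id = 1:3, name = c("a", "b", "c"), ok = c(TRUE, FALSE, TRUE))
df <- as.data.frame(cols, stringsAsFactors = FALSE)
str(df)

df$name        # columns are accessed like list components
df[["id"]]

# Lengths 3 and 2 cannot be combined, so the conversion fails
bad <- tryCatch(as.data.frame(list(x = 1:3, y = 1:2)),
                error = function(e) "error")
```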
Python os and R
Python os.method
• getwd() (Python> os.getcwd() )
• setwd(Path) (Python> os.chdir(Path))
Control Structures and Looping
if
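• if ( condition1 ) {
      statement1
  } else if ( condition2 ) {
      statement2
  } else if ( condition3 ) {
      statement3
  } else {
      statement4
  }
• (Keep each else on the same line as the preceding closing brace; at the console, an else that starts a new line is a syntax error.)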
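A small runnable instance of the pattern; sign_label is a hypothetical function name.

```r
# if/else is an expression in R, so its value can be returned directly
sign_label <- function(x) {
  if (x > 0) {
    "positive"
  } else if (x < 0) {
    "negative"
  } else {
    "zero"
  }
}

sign_label(3)    # "positive"
sign_label(-1)   # "negative"
sign_label(0)    # "zero"
```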
switch
• switch(statement, list)
• Example:
> y <- "fruit"
> switch(y, fruit = "banana", vegetable = "broccoli", meat = "beef")
[1] "banana"
for
• for ( name in vector ) statement1
• E.g.
> for (ind in 1:10) { print(ind) }
while
• while ( statement1 ) statement2
repeat
• repeat statement
• The repeat statement causes repeated evaluation of the body until a break is specifically requested.
• When using repeat, statement must be a block statement. You need to both perform some computation and test whether or not to break from the loop and usually this requires two statements.
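A minimal sketch of a repeat block with its break test, alongside the equivalent while loop. The counters are invented for illustration.

```r
# repeat: do the work, then test whether to leave the loop
i <- 0
repeat {
  i <- i + 1
  if (i >= 3) break
}
i   # 3

# The same loop written with while
j <- 0
while (j < 3) {
  j <- j + 1
}
j   # 3
```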
Functions in R
Create Function in R
• name <- function(arg_1, arg_2, ...) expression
• E.g.
– ADD <- function(a,b) a+b
– ADD <- function(a,b) {c<-a+b}
– ADD <- function(a,b) {c<-a+b;c}
– ADD <- function(a,b) {c<-a+b; return(c)}
– (All these functions return the same value; the second form returns it invisibly, so nothing is printed when it is called at the console)
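The variants can be compared directly. The names add1, add2 and add4 are made up here to keep the definitions apart; withVisible is a base R function that reports whether a result would be auto-printed.

```r
add1 <- function(a, b) a + b
add2 <- function(a, b) { c <- a + b }             # value of the assignment
add4 <- function(a, b) { c <- a + b; return(c) }

add1(1, 2)                          # 3
same <- add1(1, 2) == add2(1, 2)    # TRUE: same value

# Only the visibility differs
vis1 <- withVisible(add1(1, 2))$visible   # TRUE
vis2 <- withVisible(add2(1, 2))$visible   # FALSE
```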
Function Return R-List
• To return more than one item, create a list using list()
• E.g.
– MyFnTest1 <- function(a,b) {c<-a+b;d<-a-b; list(r1=c,r2=d)}
– MyFnTest1 <- function(a,b) {c<-a+b;d<-a-b; return(list(r1=c,r2=d))}
– (These two functions are the same, too)
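Calling such a function and unpacking the result might look like this; sum_diff and its component names are invented for the sketch.

```r
sum_diff <- function(a, b) {
  list(sum = a + b, diff = a - b)   # last expression is the return value
}

res <- sum_diff(5, 3)
res$sum     # 8
res$diff    # 2
names(res)  # "sum" "diff"
```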
Python map(func,Py-List)
• The apply family of functions: apply, lapply, sapply, … (to be continued)
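As a rough guide, sapply, lapply and Map from base R are the closest analogues of Python's map. The anonymous functions below are arbitrary examples.

```r
# sapply simplifies the result to a vector when it can
squares <- sapply(1:5, function(x) x^2)     # 1 4 9 16 25

# lapply always returns a list
doubled <- lapply(1:3, function(x) x * 2)

# Map walks over several inputs in parallel, like map(f, xs, ys)
sums <- Map(function(x, y) x + y, 1:3, 4:6)
```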
R Time Objects
R Basic Time Objects
• Basic Types
– Date
– POSIXct
– POSIXlt
• Constructors:
– as.Date
– as.POSIXct
– as.POSIXlt
as.POSIXct / as.POSIXlt
• as.POSIXct( timestamp , origin , tz , …)
• E.g.
– as.POSIXct( timestamp , origin="1970-01-01", tz="CST", …)
strftime / strptime
• "POSIXlt"/"POSIXct" to character
– strftime(x, format="", tz = "", usetz = FALSE, ...)
• Character to "POSIXlt"
– strptime(x, format, tz = "")
• E.g.
– strptime(… ,"%Y-%m-%d %H:%M:%S", tz="CST")
Time to Timestamp [Python> time.mktime(…)]
• as.numeric(POSIXct object); convert a POSIXlt object with as.POSIXct first
• E.g.
– as.numeric(Sys.time())
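A round trip between timestamps, time objects and strings can be sketched as follows. The example uses tz = "UTC" rather than "CST" as an assumption, so the results do not depend on the local time zone database; the date string is made up.

```r
fmt <- "%Y-%m-%d %H:%M:%S"

# Timestamp -> POSIXct -> character
t0 <- as.POSIXct(0, origin = "1970-01-01", tz = "UTC")
format(t0, fmt)                       # "1970-01-01 00:00:00"

# Character -> POSIXlt -> timestamp (seconds since the epoch)
lt   <- strptime("2010-04-11 12:30:00", fmt, tz = "UTC")
secs <- as.numeric(as.POSIXct(lt))

# And back to a string again
back <- strftime(as.POSIXct(secs, origin = "1970-01-01", tz = "UTC"),
                 fmt, tz = "UTC")
```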
R Graph
Types of Graphics
• Base
• Lattice
Base Graphics
• Use function such as
– plot
– barplot
– contour
– boxplot
– pie
– pairs
– persp
– image
Plot Arguments
• type = ???
• axes = FALSE : suppresses axes
• xlab = "str" : label of the x-axis
• ylab = "str" : label of the y-axis
• sub = "str" : subtitle, shown under the x-axis
• main = "str" : title, shown at the top of the plot
• xlim = c(lo,hi)
• ylim = c(lo,hi)
Plot’s type arg
• type =
– "p" : plots points
– "l" : plots a line
– "n" : plots nothing, just creates the axes for later use
– "b" : plots both lines and points
– "o" : plots overlaid lines and points
– "h" : plots histogram-like vertical lines
– "s" : plots step-like lines
Plot Example
• R> plot(x=(1:20),y=(11:30),pch=1:20,col=1:20,main="plot",xlab="x-axis",ylab="y-axis",ylim=c(0,30))
• R> example(points)
pch
• 0:18: S-compatible vector symbols.
• 19:25: further R vector symbols.
• 26:31: unused (and ignored).
• 32:127: ASCII characters.
• 128:255 native characters only in a single-byte locale and for the symbol font. (128:159 are only used on Windows.)
• Ref: http://stat.ethz.ch/R-manual/R-devel/library/graphics/html/points.html
http://rgraphics.limnology.wisc.edu/
cex
• A numerical vector giving the amount by which plotting characters and symbols should be scaled relative to the default. This works as a multiple of par("cex"). NULL and NA are equivalent to 1.0. Note that this does not affect annotation (axis labels and titles).
• E.g.
– points(c(6,2), c(2,1), pch = 5, cex = 3, col = "red")
– points(c(6,2), c(2,1), pch = 5, cex = 10, col = "red")
points, lines, text, abline
arrows
par/layout (Matlab> subplot)
• par(mfrow=c(m,n))
– Matlab> subplot(m,n,?)
pairs
• E.g.
– R> pairs(iris[,c(1,3,5)])
– R> example(pairs)
MISC. Code1 (Saving Graph)
• postscript("myfile.ps")
• plot(1:10)
• dev.off()
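The same open / plot / dev.off() pattern works for other file devices. The sketch below swaps in pdf() for postscript() and writes to a temporary file (both are choices made for this example), and also shows par(mfrow) with several type values on one page.

```r
out <- file.path(tempdir(), "plot_types.pdf")   # hypothetical file name

pdf(out)                         # open the file device
par(mfrow = c(2, 2))             # 2 x 2 grid of panels
for (tp in c("p", "l", "b", "h")) {
  plot(1:10, type = tp, main = paste("type =", tp),
       xlab = "x-axis", ylab = "y-axis")
}
dev.off()                        # close the device so the file is written

file.exists(out)
```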
MISC. Code2 (Saving Graph)
• windows(record=TRUE, width=7, height=7)
• Last_30_TXF<-last(TXF,30)
• chartSeries(Last_30_TXF)
• savePlot(paste("Last30_",unlist(strsplit(filename,"\\."))[1],sep=""),type = "jpeg",device = dev.cur(),restoreConsole = TRUE)
Available Colors
• R> colors() lists all available color names
• Combine it with grep to find a color family, e.g.
• R> grep("red",colors())
• Reference: http://research.stowers-institute.org/efg/R/Color/Chart/
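By default grep returns positions; value = TRUE returns the matching names themselves, which is often more convenient for colors. A small sketch:

```r
idx  <- grep("red", colors())                # positions within colors()
reds <- grep("red", colors(), value = TRUE)  # the matching color names
head(reds)

# Either form selects the same colors
same_colors <- identical(colors()[idx], reds)
```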
R xts
Tools for xts
• diff
• lag
My XTS’ Tools
• Integration_of_XTS
• Indexing_of_XTS
• XTS_Push_Events_Back
• Get_XTS_Local_Max
• Get_XTS_Local_Min
Basic Statistics Tools
R Statistical Models
Model Formulae
• formula(x, ...)
• as.formula(object, env = parent.frame())
• E.g.
– R> example(formula)
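A formula can also be built from a string and passed to a model-fitting function. The sketch below uses the built-in mtcars data set and lm; the choice of variables is arbitrary.

```r
f <- as.formula("mpg ~ wt + hp")
class(f)       # "formula"
all.vars(f)    # "mpg" "wt" "hp"

fit <- lm(f, data = mtcars)   # linear model described by the formula
coef_names <- names(coef(fit))
```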
MISC. 1 Updating fitted models
• http://cran.r-project.org/doc/manuals/R-intro.html#Updating-fitted-models
R Packages
• library()
• search()
• loadedNamespaces()
• getAnywhere(Package_Name)
• http://cran.r-project.org/doc/manuals/R-intro.html#Namespaces
Random Number Generators
• rnorm
• runif
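A minimal sketch of drawing reproducible random numbers; the seed value 42 is arbitrary.

```r
set.seed(42)                   # fix the seed so results can be reproduced
x <- rnorm(5)                  # 5 draws from N(0, 1)
u <- runif(5, min = 0, max = 1)  # 5 draws from Uniform(0, 1)

set.seed(42)
x_again <- rnorm(5)            # same seed, same draws
```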
Regular Expression
Python Re Module
grep
• Pattern_Index <- grep(Pattern, Search_Vector)
• E.g. (the Cl function in quantmod)
return(x[, grep("Close", colnames(x))])
• hits <- grep( pattern, x )
• Ref: Lecture5v1
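The column-selection idiom above can be tried on a small stand-in matrix. The column names are invented and no quantmod objects are involved.

```r
x <- matrix(1:8, nrow = 2,
            dimnames = list(NULL, c("TXF.Open", "TXF.High",
                                    "TXF.Low", "TXF.Close")))

hits   <- grep("Close", colnames(x))                # 4 (position)
names_ <- grep("Close", colnames(x), value = TRUE)  # "TXF.Close"
flags  <- grepl("Close", colnames(x))               # logical, one per column

closes <- x[, hits]    # the matching column: 7 8
```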
R LibSVM (e1071)
http://www.csie.ntu.edu.tw/~cjlin/libsvm/R_example
R CR Tree Method (rpart)
Classification and Regression Tree
• http://www.statsoft.com/textbook/classification-and-regression-trees/
• http://www.stat.cmu.edu/~cshalizi/350/lectures/22/lecture-22.pdf
• http://www.stat.wisc.edu/~loh/treeprogs/guide/eqr.pdf
R Adaboost Package (adabag)
adaboost.M1
• This function implements Freund and Schapire's Adaboost.M1 algorithm
• The weak learner is a CR tree, i.e. the rpart package in R
adaboost.M1’s Training Data Form
• Label Column must be a factor object
(in source code)
fit <- rpart(formula = formula, weights = data$pesos, data = data[, -1], maxdepth = maxdepth)
flearn <- predict(fit, data = data[, -1], type = "class")
R IDE Tools
Reference
• http://en.wikipedia.org/wiki/R_(programming_language)
• http://jekyll.math.byuh.edu/other/howto/R/RE.shtml (Emacs)
• http://stat.ethz.ch/ESS/
Reference
• http://www.nd.edu/~steve/Rcourse/Lecture2v1.pdf
• http://addictedtor.free.fr/graphiques/
• http://www.evc-cit.info/psych018/r_intro/r_intro4.html
• http://www.r-tutor.com/r-introduction/data-frame
• http://msenux.redwoods.edu/math/R/dataframe.php
• http://www.nd.edu/~steve/Rcourse/Rnotes.html