R Intro
Week 1
Scott Chamberlain [modified from Haldre Rogers]
September 9, 2011
Don’t just listen to me! Other Intros to R:
• http://www.stat.duke.edu/programs/gcc/ResourcesDocuments/RTutorial.pdf
• http://www.cyclismo.org/tutorial/R/
• http://www.r-tutor.com/r-introduction
• Quick R: http://www.statmethods.net/
• http://www.bioconductor.org/help/course-materials/2011/CSAMA/Monday/Morning%20Talks/R_intro.pdf
R user frameworks
• R from command line: OSX and PC
– Just type “R” into the command line, and have fun!
• R itself
– http://www.r-project.org/
• RStudio: a good choice
– http://www.rstudio.org/
• RevolutionR [free academic version]: sort of the SAS-ised version of R
– http://www.revolutionanalytics.com/downloads/free-academic.php
– Uses the proprietary .xdf file format, which speeds up computation times
• Many other ways to use R, including GUIs, other IDEs, and a huge variety of text editors
– https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• If you are afraid of the code interface, use Rattle, R Commander, Deducer, or Red R
– With these interfaces you press buttons and can then see which code does what
R user frameworks, cont.
• R from Python
– RPy: http://rpy.sourceforge.net/
• C++ from R
– Rcpp package:
• http://cran.r-project.org/web/packages/Rcpp/index.html
• http://dirk.eddelbuettel.com/code/rcpp.html
– Can hugely speed up computation by writing functions in C++; the R function then calls the compiled code instead of running in R.
• E.g., http://helmingstay.blogspot.com/2011/06/efficient-loops-in-r-complexity-versus.html
• & http://dirk.eddelbuettel.com/code/rcpp.examples.html
• Excel from R
– XLConnect package: http://cran.r-project.org/web/packages/XLConnect/index.html
• And more… see for yourself
R Tips
• R can crash, and code typed only into the console is lost when it does. Do not use R’s built-in text editor, and do not write code solely in the R console. Instead, use any text editor that integrates with R. See here for links:
– https://github.com/RatRiceEEB/RIntroCode/wiki/R-Resources
• When asking for help on listservs/help websites, use BRIEF and REPRODUCIBLE examples
– Not doing this makes people not want to help you!
• R automatically overwrites files with the same file name!
– Make sure you want to overwrite a file before doing so
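One way to guard against accidental overwrites is to check file.exists() before writing. The sketch below is only an illustration: the helper name write_csv_safely and the temporary file path are assumptions made for this example, not part of the course code.

```r
# Hypothetical output path in a temporary directory, so nothing real is touched.
out_file <- file.path(tempdir(), "iris_copy.csv")
unlink(out_file)  # start clean in case an earlier run left the file behind

# Hypothetical helper: write a CSV only if the file does not already exist.
write_csv_safely <- function(df, path) {
  if (file.exists(path)) {
    message("Not overwriting existing file: ", path)
    return(invisible(FALSE))
  }
  write.csv(df, path, row.names = FALSE)
  invisible(TRUE)
}

first_try  <- write_csv_safely(iris, out_file)        # file is written
second_try <- write_csv_safely(head(iris), out_file)  # file is left alone
```

Calling write.csv() directly on the same path would have replaced the first file without any warning.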
Style
Not this kind of style…
This kind of style!!!
Style
Style is important so YOU and OTHERS can read your code and actually use it
• Google style guide:
– http://google-styleguide.googlecode.com/svn/trunk/google-r-style.html#generallayout
• Henrik Bengtsson style guide:
– http://www1.maths.lth.se/help/R/RCC/
• Hadley Wickham's style guide:
– https://github.com/hadley/devtools/wiki/Style
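As a rough illustration of what such guides ask for, here is the same function written twice. The function and its names are made up for this sketch, and the specific rules shown (spacing, `<-` for assignment, descriptive names) are a common subset rather than a summary of any one guide.

```r
# Harder to read: no spacing, cryptic names, everything on one line.
f=function(x,y){z=x+y;z/2}

# Easier to read: descriptive names, `<-` for assignment, spaces around
# operators, and one statement per line.
mean_of_two <- function(first_value, second_value) {
  total <- first_value + second_value
  total / 2
}

mean_of_two(3, 5)
```

Both versions compute the same thing; only the second can be understood at a glance six months later.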
Preparing your data for R
• What makes clean data?
– Correct spelling
– Identical capitalization (e.g. Premna vs premna)
• R is case sensitive: if myvector <- c(3, 4, 5), calling Myvector does not work!
– No spaces between words (R turns spaces in column names into “.”)
• Generally try to avoid spaces; use underscores instead
– NA or blank (if using csv) for missing values
• Use find and replace to get rid of spaces after words
• I generally keep both an .xls and a .csv file, so the work can always be recreated in R from the .csv file while the .xls file stays editable
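Some of these problems can also be checked or repaired inside R. The sketch below uses a made-up species vector (an assumption for illustration) and only base R functions.

```r
# Made-up species column with inconsistent capitalization and a trailing space.
species_raw <- c("Premna", "premna ", "Premna", "PREMNA", NA)
unique(species_raw)   # several "different" values for one species, plus NA

# Trim surrounding whitespace and standardize capitalization.
species_clean <- tolower(trimws(species_raw))
unique(species_clean)

# Object names are case sensitive too.
myvector <- c(3, 4, 5)
exists("myvector")   # TRUE
exists("Myvector")   # FALSE, so typing Myvector gives an "object not found" error

# make.names() shows how R converts a column name that contains a space.
make.names("sepal length")   # "sepal.length"
```

Fixing the spreadsheet itself is still the better habit, since the cleaned file can then be reused outside R.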
Bringing data into R
• Create csv file
– One worksheet only
– No special formatting, filters, comments etc.
– Copy only the columns and rows with your data to the CSV, as R will sometimes read in columns without data
• Name your variables well: self-explanatory, unique, lowercase, short-ish, one-word names
• In R, set the working directory
– setwd("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro")
– What is the working directory? getwd()
– What is in the working directory? dir()
• Read in data
– CSV files: iris.df <- read.csv("iris_df.csv", header = TRUE)
– Clipboard: read.csv("clipboard") reads in data as if cutting and pasting it (works on Windows)
– From web: read.csv("http://explore.data.gov/download/pwaj-zn2n/CSV")
– From Excel files (using the XLConnect package): iris.df <- readWorksheetFromFile("/Users/ScottMac/Dropbox/R Group/Week1_R-Intro/iris_df.xlsx", sheet = "Sheet1")
• Write data
– write.csv(dataframe, "dataframename.csv"), OR
– save(iris, file = "iris.RData") [and load("iris.RData") to open in R]
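The read and write steps can be tried end to end without any course files. This sketch assumes the built-in iris data and a temporary directory in place of the Dropbox folder; the object name iris_copy is made up for the example.

```r
# Work in a temporary directory (an assumption for this sketch) so that no
# real files are overwritten.
work_dir   <- tempdir()
csv_path   <- file.path(work_dir, "iris_df.csv")
rdata_path <- file.path(work_dir, "iris.RData")

# Write the built-in iris data to CSV, then read it back in.
write.csv(iris, csv_path, row.names = FALSE)
iris.df <- read.csv(csv_path, header = TRUE)
dim(iris.df)

# save() needs the file= argument; load() restores the object under its
# original name.
iris_copy <- iris
save(iris_copy, file = rdata_path)
rm(iris_copy)
load(rdata_path)
exists("iris_copy")

# List the files that were just created.
dir(work_dir, pattern = "^iris")
```

Note row.names = FALSE in write.csv(): without it, R writes the row names as an extra unnamed first column, which shows up as a column called X when the file is read back.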
R data structures
• Scalar:
– Object with a single value, either numeric or character (in R, a vector of length one)
• Vector:
– Sequence of values of a single type (e.g., numeric or character); may include NA
• List:
– Arbitrary collections of objects; a very useful R object
• Character:
– Text, e.g., “this is some text”
• Factor:
– Like character vectors, but only w/ values in predefined “levels”
• Matrix:
– Two-dimensional; all values must be of the same type (e.g., all numeric)
• Dataframe:
– Each column can be of a different class
• Immutable dataframe:
– Special dataframe used in the plyr package for faster dataframe manipulation; it references the original dataframe rather than copying it
• Function
• Environment
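A quick way to get a feel for these structures is to build one of each and inspect it. All object names below are made up for this sketch.

```r
a_scalar    <- 3.14                     # really a vector of length one
a_vector    <- c(3, 4, NA, 5)           # one type; NA marks a missing value
a_character <- "this is some text"
a_factor    <- factor(c("low", "high", "low"), levels = c("low", "high"))
a_matrix    <- matrix(1:6, nrow = 2)    # 2 rows x 3 columns, one type
a_list      <- list(number = 1, text = "a", vec = a_vector)
a_dataframe <- data.frame(id = 1:3, size = a_factor)

# Mixing types in a vector silently coerces everything to one type.
class(c(1, "a"))       # "character"

class(a_factor)        # "factor"
levels(a_factor)       # "low" "high"
dim(a_matrix)          # 2 3
sapply(a_dataframe, class)
```

The coercion example is worth remembering: one stray text entry in a numeric column turns the whole column into text when the data is read in.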
Exploring dataframes
• str(dataframe) gives column formats and dimensions
• head(dataframe) and tail(dataframe) give the first and last 6 rows
• names(dataframe) gives column names
• row.names(dataframe) gives row names
• attributes(dataframe) gives column and row names and object class
• summary(dataframe) gives a lot of good information
– Make sure variables are in the appropriate form
• Character/string, Numeric, Factor, Integer, Logical
– Make sure mins, maxs, means, etc. seem right
– Make sure you don’t have typing errors that turn Premna and premna into two separate factor levels
• Use unique(iris$species) to see all unique values of a column
• Or use levels(spider$species) to see the different levels
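These functions can be tried directly on the iris data that ships with R. Note one assumption in this sketch: the built-in dataset names its column Species with a capital S, which may differ from the course's iris_df.csv file.

```r
str(iris)               # column types and dimensions
head(iris)              # first 6 rows
tail(iris, n = 3)       # last 3 rows
names(iris)             # column names
summary(iris)           # min, max, mean, etc. for each column

# In the built-in data the column is named Species, with a capital S.
unique(iris$Species)
levels(iris$Species)
```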
To attach or not to attach…that is the question
• Some like to use ‘attach’ to make dataframe variables accessible by name within the R session
• Generally, ‘attach’ is frowned upon by R junkies, because attached variables can mask, or be masked by, other objects with the same name.
• Instead use dataframe$y, or data=dataframe, or dataframe[, "y"], or dataframe[, 2]
• To detach the object, use: detach()
I recommend: do not use attach, but do what you want
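The alternatives to attach look like this in practice. The sketch uses the built-in iris data; with() is an extra base R option not listed on the slide.

```r
# Four ways to reach a column without attach().
mean_dollar  <- mean(iris$Sepal.Length)         # dataframe$y
mean_by_name <- mean(iris[, "Sepal.Length"])    # dataframe[, "y"]
mean_by_pos  <- mean(iris[, 1])                 # dataframe[, 1] (first column)
mean_with    <- with(iris, mean(Sepal.Length))  # evaluate inside the dataframe

# Many modelling and plotting functions take a data= argument instead.
fit <- lm(Sepal.Length ~ Petal.Length, data = iris)
coef(fit)
```

Each form makes it explicit which dataframe a variable comes from, which is exactly what attach hides.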
R Packages
• 3,262 packages!!!!
• Packages are extensions written by anyone for any purpose, usually installed and loaded by:
– install.packages("packagename"), then
– require(packagename) or library(packagename)
– Use ?functionname for help on any function in base R or in R packages
– In RStudio, just press tab when in parentheses after the function name to see function options!
• Explore packages at the CRAN site:
– http://cran.r-project.org/web/packages/
• Inside-R package reference:
– http://www.inside-r.org/packages
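Installing only needs to happen once per machine, while loading happens in every session. A possible pattern for scripts is sketched below; load_or_install is a hypothetical helper name, and the example uses the stats package only because it ships with R and therefore never triggers a download.

```r
# Hypothetical helper: load a package, installing it first only if needed.
load_or_install <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)   # needs an internet connection and a CRAN mirror
  }
  library(pkg, character.only = TRUE)
}

# "stats" ships with R, so this call never triggers an install.
load_or_install("stats")

# Check whether a package is installed without attaching it.
requireNamespace("stats", quietly = TRUE)
```

The character.only = TRUE argument is needed because the package name is passed as a string held in a variable rather than typed literally.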
Data manipulation
• Packages: plyr, data.table, doBy, sqldf, reshape2, and more
• Comparison of packages
– Modified from code from the Recipes, scripts and Genomics blog: https://gist.github.com/878919
– data.table is by far the fastest in this comparison!!!
– BUT, for ease of use and flexibility the better choice may be plyr. See for yourself…
• Also, see the examples in the tutorial code for the reshape2 package for neat data manipulation tricks
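The task these packages are compared on is the split-apply-combine pattern: split the data by group, compute something per group, and combine the results. The sketch below deliberately uses base R's aggregate() and tapply() instead of the packages above, so that it runs without installing anything; the package calls in the comments are given for orientation and are not run here.

```r
# Mean sepal length per species with base R's aggregate().
species_means <- aggregate(Sepal.Length ~ Species, data = iris, FUN = mean)
species_means

# tapply() gives the same numbers as a named vector.
species_means_vec <- tapply(iris$Sepal.Length, iris$Species, mean)
species_means_vec

# Comparable calls in two of the packages named above (not run here, since
# they require the packages to be installed):
#   plyr::ddply(iris, "Species", plyr::summarise, mean_sl = mean(Sepal.Length))
#   data.table::as.data.table(iris)[, .(mean_sl = mean(Sepal.Length)), by = Species]
```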
Visualizations
• A few different approaches:
– Base graphics
– Lattice graphics
– Grid graphics
– ggplot2 graphics
– Further reading: http://www.slideshare.net/dataspora/a-survey-of-r-graphics
• An example:
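A minimal base-graphics sketch, using the built-in iris data. This is an assumption chosen for illustration and not necessarily the plot shown in the talk; the output file name is also made up.

```r
# Draw into a temporary PDF (an assumption for this sketch); drop the pdf()
# and dev.off() lines to draw on screen instead.
plot_file <- file.path(tempdir(), "iris_plots.pdf")
pdf(plot_file)

# Scatterplot, coloured by species.
plot(Sepal.Length ~ Petal.Length, data = iris,
     col = as.integer(iris$Species), pch = 19,
     xlab = "Petal length (cm)", ylab = "Sepal length (cm)")
legend("topleft", legend = levels(iris$Species), col = 1:3, pch = 19)

# Boxplot of the same response by species.
boxplot(Sepal.Length ~ Species, data = iris, ylab = "Sepal length (cm)")

dev.off()
```

Lattice and ggplot2 produce similar plots with their own syntax; base graphics is used here only because it needs no extra packages.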
more on ggplot2 graphics
• There are classes taught by Hadley Wickham here at Rice if you want to learn more!
– Data visualization (Stat645): http://had.co.nz/stat645/
– Statistical computing (Stat405): http://had.co.nz/stat405/
• Hadley’s website is really helpful: http://had.co.nz/ggplot2/
• The ggplot2 google groups site: https://groups.google.com/forum/#!forum/ggplot2
QUICK RSTUDIO RUN THROUGH
Keyboard shortcuts!! http://www.rstudio.org/docs/using/keyboard_shortcuts
USE CASE HERE [see intro_usecase.R file]