Introduction to S-Plus
by Francesco Ferretti
Analysis of Biological Data Course
Winter term 2007, Dalhousie University
Introduction
• S-Plus and R are statistical programs that use the S language.
• S was developed at AT&T Bell Labs in the 1970s by Rick Becker, John Chambers and Allan Wilks.
• In 1987 Douglas Martin, at the University of Washington, founded the company that is now Insightful Corporation. He made S more popular and compatible with many hardware platforms, and provided the support needed for technical and statistical problems. S became S-Plus.
• In 1997 the R project started. R was created by Ross Ihaka and Robert Gentleman at the University of Auckland, New Zealand. It is similar to S-Plus and freely available.
S-Plus and R
• Flexible and powerful statistical programs
• Particularly appealing for their graphical capabilities
• Can be problematic with large amounts of data
  - SAS is more powerful in these cases
GUI (Graphical User Interface)
• Main toolbar and several windows
• Object Explorer: overview of what is available on the system
  - Computational engine: data frames, lists, matrices, vectors
  - Interface objects: search path, menu items, toolbars, dialogs
  - Document objects (outputs): graph sheets, scripts and reports
• The Object Explorer displays all the objects in your working directory
GUI (Graphical User Interface)
• Import data: File>Import Data>From file
• Export data: File>Export Data>to file. Choose among the data frames present in your working directory, then give the location and extension.
• Creating graphs
1. Highlight a dataset in the Object Explorer
2. Select variables (Ctrl-select)
3. Click on 2D plots
4. Choose the preferred graph type
5. Save the graph
  - Default: *.sgr (S-Plus graph sheet)
  - Alternatively, choose your preferred picture format with File>Export Graph.., then specify location, name and extension and click OK
GUI (Graphical User Interface)
• Summary statistics
1. From the Object Explorer, select a data frame
2. On the main toolbar, select Statistics>Summary Statistics
3. Select the data, variables and statistics to be shown, then click OK
Programming mode
Gives access to the full potential and flexibility of S-Plus. Highly recommended! While the GUI can run many S-Plus commands and functions, programming mode lets you solve potentially every problem you will encounter in data manipulation, analysis and plotting.
• Command window
  - Can be used step by step, interactively
• Writing functions
  - Using a text editor (Notepad, Emacs, EditPlus, etc.) or directly on the command line
Command line (the basics)
• S-Plus is case sensitive
• # comment sign
• ? calls help
• q() quits S-Plus
• <- assignment sign, used to associate a value or a function with a variable name
Use of S-Plus in programming mode
• Calculator
  - *, /, +, -, =, log, exp, sqrt, ^, sin, cos
  - Follows the usual arithmetic rules: * and / before + and -, and () before * and /
• Manipulate data
• Fitting models to data
• Plotting graphs
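The calculator rules above can be tried with a short sketch like the following. It is written so that it runs in R; S-Plus should behave the same way for these basic operators, but that is an assumption rather than something checked here. The variable names are only for illustration.

```r
# * and / are evaluated before + and -
a <- 2 + 3 * 4         # 14
# brackets are evaluated first
b <- (2 + 3) * 4       # 20
# functions and ^ can be mixed freely with arithmetic
c1 <- sqrt(16) + 2^3   # 4 + 8 = 12
# exp() undoes log(), up to rounding error
d <- exp(log(10))
print(c(a, b, c1, d))
```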
Logical values
• Boolean values: TRUE, FALSE
• Comparison operators: < (less than), > (greater than), <= (less than or equal to), >= (greater than or equal to), == (equal to), != (not equal to)
• Conditional expressions and operators
  - if, else, ifelse
  - & (and), | (or)
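A minimal sketch of these operators, using the vector x<-c(1,5,7,8) that appears elsewhere in the slides. The threshold values and the names big, mid, label and msg are chosen only for illustration. The code is written for R and is expected, not verified, to behave the same in S-Plus.

```r
x <- c(1, 5, 7, 8)

# Comparisons work element by element and return logical vectors
big <- x > 4             # FALSE TRUE TRUE TRUE
mid <- x > 4 & x < 8     # FALSE TRUE TRUE FALSE

# ifelse() chooses a value for each element
label <- ifelse(x > 4, "large", "small")

# if/else tests a single condition
if (sum(big) > 2) {
  msg <- "more than two values exceed 4"
} else {
  msg <- "two or fewer values exceed 4"
}
print(label)
print(msg)
```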
Brackets
• () to enclose arguments of functions and to group arithmetic calculations
• [] to index objects
  - x<-c(1,5,7,8) then x[3] is 7
• {} to enclose groups of commands
  - Function bodies
  - if/else statements
  - Loops
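The three kinds of brackets can be seen together in a short sketch such as this one. The vector x is the one from the slide; the other names are illustrative. It is written for R, assuming S-Plus treats these constructs the same way.

```r
x <- c(1, 5, 7, 8)

# [] picks elements by position or by a logical condition
third <- x[3]            # 7
first.two <- x[1:2]      # 1 5
over.four <- x[x > 4]    # 5 7 8

# {} groups the commands that belong to the loop body
total <- 0
for (i in 1:length(x)) {
  total <- total + x[i]
}
print(total)             # same value as sum(x)
```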
S-Plus common objects
• Vector: ordered group of numbers or strings
  x<-c(45,29,27)
  z<-c(180,180,165)
  y<-c("Hall","Francesco","Sara")
• Matrix: "rectangular layout of cells, each one containing a value"
  AH<-matrix(c(45,29,27,180,180,165),nrow=3)
  AH<-matrix(c(x,z),nrow=3)
• Array: multidimensional matrix
• Data frame
  AHP<-data.frame(x,z,y)
• List: groups together data that do not have the same structure. Outputs and summaries are returned as lists, and you can access or use parts of this output.
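As a rough illustration of how these objects are built and accessed, the sketch below reuses the vectors from the slide. The list info and its component names are made up for the example. The code is written for R; S-Plus is assumed to accept the same calls.

```r
x <- c(45, 29, 27)
z <- c(180, 180, 165)
y <- c("Hall", "Francesco", "Sara")

# Matrix: filled column by column, so x becomes column 1 and z column 2
AH <- matrix(c(x, z), nrow = 3)
print(dim(AH))           # 3 rows, 2 columns
print(AH[2, 1])          # row 2, column 1: 29

# Data frame: columns can have different types
AHP <- data.frame(x, z, y)
print(AHP$z)             # one column, accessed by name
young <- AHP[AHP$x < 30, ]   # rows where x is below 30

# List: components with different structures, accessed with $
info <- list(people = AHP, sizes = AH, note = "mixed content")
print(info$note)
```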
Functions
• Set of commands performed on specified variables
  y<-mean(x) ..or.. y<-(x1+x2+x3+x4)/4 ..or.. y<-sum(x)/4 ..or.. y<-sum(x)/length(x)
• You can build your own functions
  - On the command line: SD<-function(x){sqrt(var(x))}
  - The function is saved in your working directory and can then be called as SD(x)
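The equivalent ways of computing a mean, and the SD function from the slide, can be compared in a sketch like this. The vector is the four-element x used earlier in the slides; note that dividing by a hard-coded 4 only works for a vector of exactly four values, which is why sum(x)/length(x) is the safer form. Written for R, assuming S-Plus gives the same results.

```r
x <- c(1, 5, 7, 8)

# Built-in function versus the same calculation written out
m1 <- mean(x)
m2 <- sum(x) / length(x)    # works for any length of x
print(c(m1, m2))            # both 5.25

# User-defined function from the slide: standard deviation
SD <- function(x) { sqrt(var(x)) }
s <- SD(x)
print(s)
```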
Functions
• Create a file with an .s extension (file.s, a sort of library where you can store one or more functions)
• Open an editor
• Write the function:
  # this function creates the dataset "buddy" and
  # plots its variables one against the other
  buddy<-function(){
    x<-c(2,3,5,6,8,10)
    y<-c(4,6,10,12,16,20)
    buddy<-data.frame(x,y)
    plot(buddy$x,buddy$y,xlab="x",ylab="y",type="l")
    print(buddy)
  }
• Save the file as an .s file: c:\buddy.s
• Load the file with source("c:\\buddy.s")
• Call the function as buddy()
• In the code above, buddy is the function name, the parentheses after function hold the arguments (none here), and the braces enclose the body of the function, the set of commands.
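The save-and-source workflow can be tried without writing to c:\ by using a temporary file, as in this sketch. It is R-specific: tempfile() with the fileext argument and writeLines() are R functions, and S-Plus may need a different way to create the file. The function double.it and its argument v are hypothetical names, not part of the course material.

```r
# Write a small function definition to a temporary .s file
path <- tempfile(fileext = ".s")
writeLines(c(
  "# double.it: returns twice its argument",
  "double.it <- function(v) {",
  "  2 * v",
  "}"
), path)

# source() runs the file, which defines the function in the workspace
source(path)

# The function can now be called by name, with an argument
res <- double.it(c(2, 3, 5))
print(res)
```

Unlike buddy(), this function takes an argument, so the same code can be reused on any numeric vector.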
Use of S-Plus in programming mode (manipulation of data)
• Datasets are never ready for analysis
• Importing datasets: read.table()
• Subsetting objects
• Creating new variables
  - seq(), rep(), sort(), unique(), length()
• Merging and binding datasets: merge(), cbind(), rbind()
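A brief sketch of the functions listed above. The vectors x, z and y are the ones from the common objects slide; vals, evens, grp and the two small data frames are made up for illustration. The code is written for R, and the assumption is that S-Plus offers the same functions with the same basic behaviour.

```r
# Creating new variables
evens <- seq(2, 10, by = 2)           # 2 4 6 8 10
grp <- rep(c("a", "b"), times = 3)    # a b a b a b
vals <- c(3, 1, 2, 3, 1)
sorted <- sort(vals)                  # 1 1 2 3 3
uniq <- unique(vals)                  # 3 1 2
n <- length(vals)                     # 5

# Two data frames that share the key column y, in different row order
x <- c(45, 29, 27)
z <- c(180, 180, 165)
y <- c("Hall", "Francesco", "Sara")
left <- data.frame(y = y, x = x)
right <- data.frame(y = rev(y), z = rev(z))

# merge() matches rows on the shared column, whatever their order
both <- merge(left, right, by = "y")
print(both)

# cbind() binds as columns, rbind() binds as rows
by.col <- cbind(x, z)                 # 3 x 2 matrix
by.row <- rbind(x, z)                 # 2 x 3 matrix
```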
Graphical analysis
• Plotting to the active device: the S-Plus window or a file
  - pdf.graph(file="",horizontal="")
  - postscript(file="",horizontal="")
  - graphsheet(file="",format="")
• Important functions:
  - par(), plot(), hist(), boxplot(), pairs()
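Plotting to a file follows an open, draw, close pattern. The sketch below shows it in R, where the device function is pdf() rather than the S-Plus pdf.graph() named on the slide; the S-Plus calls are not reproduced here because their arguments are left blank in the slides. The data are the x and y vectors from the buddy example, and the temporary output file is an assumption made so the example runs anywhere.

```r
x <- c(2, 3, 5, 6, 8, 10)
y <- c(4, 6, 10, 12, 16, 20)

# Open a file device (R's pdf(); S-Plus uses pdf.graph() instead)
out <- tempfile(fileext = ".pdf")
pdf(file = out)

# par() sets graphical parameters: here two plots side by side
par(mfrow = c(1, 2))
plot(x, y, xlab = "x", ylab = "y", type = "l")
hist(y, main = "Histogram of y")

# Close the device so the file is written to disk
invisible(dev.off())
```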
Fitting a model to data
• Take the SharkLife data
• Summary of the data: summary()
• EDA (Exploratory Data Analysis): pairs(), hist(), boxplot(), plot()
• Fitting a linear regression model between Lmax and birth.size: model1<-lm()
• Checking the model (using statistics and plots): summary(model1), plot(model1)
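The SharkLife data are not included in these slides, so the sketch below walks through the same steps on the small buddy dataset defined earlier. It is only a stand-in: y is exactly 2*x, so the fit is perfect, and summary(model1) would warn about that in R, which is why the coefficients are inspected directly. On the course data the same pattern would apply, with Lmax and birth.size in the model formula. Written for R, assuming lm() is called the same way in S-Plus.

```r
# Stand-in data from the buddy example (not the SharkLife data)
buddy <- data.frame(x = c(2, 3, 5, 6, 8, 10),
                    y = c(4, 6, 10, 12, 16, 20))

# Summary of the data
print(summary(buddy))

# Fit a linear regression of y on x
model1 <- lm(y ~ x, data = buddy)

# Inspect the fit: intercept close to 0 and slope close to 2
est <- coef(model1)
print(est)

# Fitted values reproduce y because the relationship is exact
fit <- fitted(model1)
```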
Programming mode
• Script window
  - A mode where you can write programs, run them and keep track of your operations for future work
  - File>New>Script File
Useful reference books
• The Basics of S-Plus by Krause A. and Olson M.
• Statistical Computing with S-Plus by Crawley M.J.
• Modern Applied Statistics with S-Plus by Venables W.N. and Ripley B.D.
• ...and much more on the internet