Tuan V. Nguyen, Genetics Epidemiology of Osteoporosis Lab, Garvan Institute of Medical Research. Garvan Institute Biostatistical Workshop, 17 April 2014. © Tuan V. Nguyen



Introduction to R

• A brief history

• Installation

• Packages

• Essential grammar

• A session with R


Previously …

• Many statistical packages were/are available

• Popular packages include Systat, Minitab, Statistica, BMDP, S+, Gauss, Spida, JMP, SPSS, Stata, SAS

and now R


R is gaining popularity

Number of scholarly articles that reference each software by year (Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity)


R is gaining popularity

Number of scholarly articles that reference each software by year, after removing the top two, SPSS and SAS (Source: Muenchen R. The popularity of data analysis software, r4stat.com/articles/popularity)


A brief history

• R is a “statistical and graphical programming language”

• Originated from S
  – 1988 - S2: RA Becker, JM Chambers, A Wilks
  – 1992 - S3: JM Chambers, TJ Hastie
  – 1998 - S4: JM Chambers

• R was initially written by Ross Ihaka and Robert Gentleman (Univ of Auckland, New Zealand) in the 1990s

• From 1997: international “R-core”, 15 people


What can R do?

• It is a statistical language

• All models of statistical analysis

• Great for simulation work

• Programming (do you want to take on a challenge?)


Why R?

• Open source – totally free!

• Developed by professional and academic statisticians

• Runs on Windows, Unix, MacOS

• Keeps up to date with methodological developments

• Speaks the language of experts (bioinformatics and statistics)

• Large user community


Installation


cran.r-project.org


Installation of R on Windows

• Select Windows

• Select “base”

• Run → OK → Next

• Then Finish
  – R icon on your desktop


A screenshot of R


RStudio

An “add-on” for R

RStudio: http://rstudio.org


Introduction to RStudio

• An IDE (Integrated Development Environment) for R

• Provides some convenient functions for running R

• R also has a number of other IDEs and front ends:
  – TinnR
  – R Commander


R and RStudio

You can run R within RStudio (you don’t need to start R separately)


RStudio

R console

Workspace: Variables

Files


Packages  

R is a real demonstration of the power of collaboration

Ihaka


Packages

• R = Base + Packages

• Base R includes the basic functions for simple operations and analyses

• Packages are modules for specific analyses

• More than 6000 packages in R!


Common packages

Hmisc: Miscellaneous functions for data manipulation

tables: For tabulation of data

foreign: For reading data from other software

gmodels: Programming tools

ggplot2: Advanced graphics

sciplot: Scientific graphs

Zelig: “Everyone’s statistical software”

rms: Regression modeling strategies

car: Companion to regression analysis

survival: Survival analyses

epiR: Epidemiological analyses

epicalc: Epidemiological analyses

boot: Bootstrap analyses

cluster: Cluster analysis

psych: Psychometrics and descriptive statistics


Basic management of packages

• Installing new packages (try now!)

install.packages(c("Hmisc", "rms", "tables", "foreign", "gmodels", "ggplot2", "sciplot", "Zelig", "car", "survival", "epiR", "epicalc", "boot", "cluster", "psych", "binom", "BMA", "ExactCIdiff", "lattice", "mgcv", "gam", "nlme", "quantreg"))

• To find out which packages you have installed

library()


R Grammar: a quick introduction


Interacting with R

• Start up R

• Use the up/down arrow keys to retrieve the command history

• Use the left/right arrow keys to edit a command line

• Use TAB to complete a command – very useful!

• Multiple commands can be written on one line by using the “;” separator


Variable names

• Use letters, numbers, and the signs “.” and “_” (a hyphen is not allowed: R reads it as minus)

• Assignment symbol: <- or =

• Distinction between upper and lower case letters

Genotype = 5; genotype <- 7;
Geno.type = Genotype + genotype
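A minimal sketch of the naming rules, reusing the names from this slide. A hyphen is read by R as the minus operator, so it cannot be part of a name; `make.names()` is a base R helper used here only to show how an invalid name would be repaired.

```r
# R is case sensitive: these are two different objects
Genotype <- 5
genotype <- 7

# A period (or an underscore) may appear inside a name
Geno.type <- Genotype + genotype
Geno.type                      # 12

# "geno-type" is not a valid name because "-" means minus;
# make.names() returns the closest syntactically valid name
make.names("geno-type")        # "geno.type"
```

Typing `Geno-type` at the prompt would be evaluated as `Geno` minus `type`, which is why the period or underscore is the usual separator.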


Object-oriented language

R is an object-oriented language

• Function

• Vector

• Matrix

• Dataframe


Function

• R “commands” = functions

• A function has arguments

• Arguments include variables (names), parameters, options, etc.

• Example: fitting a linear regression model y = a + bx

m1 = lm(y ~ x, data=test)


Object name: m1

Function: lm = linear model

Arguments: variables (y, x) and dataset name (test)
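To make the call above runnable, the sketch below builds a small data frame named `test`. The slide does not show its contents, so the x and y values here are invented purely for illustration.

```r
# Hypothetical data: the numbers are made up so that lm() has something to fit
test <- data.frame(x = c(1, 2, 3, 4, 5),
                   y = c(2.1, 3.9, 6.2, 7.8, 10.1))

# Object name: m1; function: lm; arguments: formula y ~ x and the data frame
m1 <- lm(y ~ x, data = test)

coef(m1)      # estimates of a (intercept) and b (slope)
summary(m1)   # standard errors, R-squared, and so on
```

The fitted object `m1` can then be passed to other functions such as `summary()`, `predict()` or `plot()`.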


Vector

• Vectors are the basic building blocks in R

• Vector = a series of values

• Values can be numeric or character

score = c(4,2,1,5)
gender = c('F','M','F','M')

c (concatenation) is used for direct data entry
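A short sketch of what can be done with the two vectors once they are entered. It only uses base R functions.

```r
score  <- c(4, 2, 1, 5)
gender <- c("F", "M", "F", "M")

class(score)            # "numeric"
class(gender)           # "character"
length(score)           # 4

score[2]                # second element: 2
mean(score)             # 3
score[gender == "F"]    # scores of the females: 4 1
```

Indexing with a logical condition, as in the last line, is the idiom used later for recoding variables.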


Matrix

• Rectangular data → rows, columns

• A matrix can be a collection of vectors

1 3 6 7
3 4 7 9
5 7 8 0


Matrix

1 3 6 7
3 4 7 9
5 7 8 0

v1 = c(1,3,5)
v2 = c(3,4,7)
v3 = c(6,7,8)
v4 = c(7,9,0)
m = cbind(v1,v2,v3,v4)
m


Reference to matrix

• Row first, then column

• Flexible in R

> m
     v1 v2 v3 v4
[1,]  1  3  6  7
[2,]  3  4  7  9
[3,]  5  7  8  0

> m[2,3]
v3
 7
> m[1,]
v1 v2 v3 v4
 1  3  6  7
> m[1:2,]
     v1 v2 v3 v4
[1,]  1  3  6  7
[2,]  3  4  7  9

> m[,2:3]
     v2 v3
[1,]  3  6
[2,]  4  7
[3,]  7  8
> m[,3:4]*m[1,2]
     v3 v4
[1,] 18 21
[2,] 21 27
[3,] 24  0
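As a cross-check on the indexing shown above, the sketch below rebuilds the same matrix in a second way with `matrix()` and compares the two. The object name `m2` is introduced here only for illustration.

```r
v1 <- c(1, 3, 5)
v2 <- c(3, 4, 7)
v3 <- c(6, 7, 8)
v4 <- c(7, 9, 0)
m  <- cbind(v1, v2, v3, v4)      # bind the vectors as columns

dim(m)                           # 3 rows, 4 columns

# Columns can be referenced by name as well as by position
m[2, "v3"]                       # same element as m[2, 3]: 7

# The same values entered row by row
m2 <- matrix(c(1, 3, 6, 7,
               3, 4, 7, 9,
               5, 7, 8, 0), nrow = 3, byrow = TRUE)
all(m == m2)                     # TRUE
```

`cbind()` is convenient when the columns already exist as vectors; `matrix()` with `byrow = TRUE` is convenient when typing the data as it appears on paper.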


Dataframe

Dataset in R = “Dataframe” (similar to a matrix, but each column can have its own type)

ID       Gender     Math     Reading
1        F          5        8
2        M          5        2
3        F          7        3
4        F          8        6
numeric  character  numeric  numeric

Columns = fields = variables

Rows = records = observations
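The table on this slide can be entered as a data frame as sketched below. The object name `students` is an assumption made for this example; `stringsAsFactors = FALSE` is set so that Gender stays a character column on older versions of R as well.

```r
students <- data.frame(ID      = c(1, 2, 3, 4),
                       Gender  = c("F", "M", "F", "F"),
                       Math    = c(5, 5, 7, 8),
                       Reading = c(8, 2, 3, 6),
                       stringsAsFactors = FALSE)

sapply(students, class)   # type of each column
nrow(students)            # 4 records (observations)
ncol(students)            # 4 fields (variables)
str(students)             # compact overview of the whole data frame
```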


Reference to field/column in a dataframe

• Dataframe should be attached prior to analysis

• Reference to field: (dataframe name)$(field name)

• Example:

v1 = c(1,3,5)
v2 = c(3,4,7)
v3 = c(6,7,8)
v4 = c(7,9,0)
dat = data.frame(v1, v2, v3, v4)
attach(dat)
dat$sum = dat$v1 + dat$v3
sum1 = v1 + v3
dat


The effect of $

v1 = c(1,3,5)
v2 = c(3,4,7)
v3 = c(6,7,8)
v4 = c(7,9,0)
dat = data.frame(v1,v2,v3,v4)
attach(dat)
dat$sum = dat$v1 + dat$v3
sum1 = v1 + v3
dat

> dat
  v1 v2 v3 v4 sum
1  1  3  6  7   7
2  3  4  7  9  10
3  5  7  8  0  13

There is NO sum1 in dat! sum1 was created as a separate object in the workspace, not as a column of the dataframe.
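The same point can be checked directly, as in the sketch below. It uses `with()` in place of `attach()`; that is a substitution made for this example so that the code does not depend on what is attached, not the approach shown on the slide.

```r
dat <- data.frame(v1 = c(1, 3, 5), v2 = c(3, 4, 7),
                  v3 = c(6, 7, 8), v4 = c(7, 9, 0))

dat$sum <- dat$v1 + dat$v3        # "$" stores the result as a column of dat
sum1    <- with(dat, v1 + v3)     # no "$": the result lives in the workspace

"sum"  %in% names(dat)            # TRUE
"sum1" %in% names(dat)            # FALSE

# To keep sum1 with the data, assign it to the data frame explicitly
dat$sum1 <- sum1
names(dat)
```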


Data coding in R

id = c(1, 2, 3, 4, 5)

gender = c("male", "female", "male", "female", "female")

dat = data.frame(id, gender)

 

We want to create a new variable called sex with numeric values (1, 2)

dat$sex[gender=="male"] <- 1

dat$sex[gender=="female"] <- 2
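An alternative sketch of the same recoding in one step with `ifelse()`. This is offered as an equivalent idiom, not as the method used on the slide; it assumes that every value of gender is either "male" or "female".

```r
id     <- c(1, 2, 3, 4, 5)
gender <- c("male", "female", "male", "female", "female")
dat    <- data.frame(id, gender)

# 1 for male, 2 for anything else (here: female)
dat$sex <- ifelse(dat$gender == "male", 1, 2)

# Cross-tabulate old and new coding to confirm the recode
table(dat$gender, dat$sex)
```

Cross-tabulating the original and recoded variables is a cheap way to catch coding mistakes before analysis.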


Character and numeric coding

Character to numeric

X = c("1", "2", "3", "4", "5")

We want to create a new variable called Y with numeric values (for calculation)

Y = as.numeric(X)

mean(Y)

Numeric to character

Y = 1:10

We want to create a new variable called X with character values

X = as.character(Y)
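One caveat worth a quick sketch: when a character value does not look like a number, `as.numeric()` returns `NA` for it and issues a warning. The vector `raw` below is invented for illustration, and `suppressWarnings()` is used only to keep the example output quiet.

```r
X <- c("1", "2", "3", "4", "5")
Y <- as.numeric(X)
mean(Y)                               # 3

# A value that cannot be converted becomes NA
raw <- c("1", "two", "3")             # hypothetical messy input
Z   <- suppressWarnings(as.numeric(raw))
Z                                     # 1 NA 3
mean(Z, na.rm = TRUE)                 # 2, ignoring the missing value
```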


Sorting data: sort()

X = rnorm(10); X
 [1]  1.5651300 -0.5382971 -0.1995302  1.0111098  0.3590144 -1.5245237
 [7] -0.3192534  0.1323256 -0.7916954 -0.0664167

sort(X)
 [1] -1.5245237 -0.7916954 -0.5382971 -0.3192534 -0.1995302 -0.0664167
 [7]  0.1323256  0.3590144  1.0111098  1.5651300
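`sort()` works on a single vector. To sort the rows of a data frame by one of its columns, a common base R idiom is `order()`, sketched below with a small data frame whose values are made up for illustration.

```r
# Hypothetical data frame
dat <- data.frame(id  = c(1, 2, 3, 4),
                  age = c(34, 21, 45, 18))

sort(dat$age)                          # 18 21 34 45 (values only)
sort(dat$age, decreasing = TRUE)       # 45 34 21 18

# order() returns row positions, so whole rows move together
sorted <- dat[order(dat$age), ]
sorted
```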


Merging datasets

id = c(1,2,3,4)
sex = c("M","F","M","F")
dat1 = data.frame(id, sex)

id = c(1,2,3,4,5)
age = c(21,34,45,32,18)
dat2 = data.frame(id, age)

dat = merge(dat1, dat2, by="id")
dat = merge(dat1, dat2, by="id", all.x=T, all.y=T)
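The two `merge()` calls give different results, which the sketch below makes visible. The object names `inner` and `full` are introduced only for this example; `all = TRUE` is shorthand for `all.x = TRUE, all.y = TRUE`.

```r
dat1 <- data.frame(id = c(1, 2, 3, 4),    sex = c("M", "F", "M", "F"))
dat2 <- data.frame(id = c(1, 2, 3, 4, 5), age = c(21, 34, 45, 32, 18))

# Default: keep only ids present in both datasets
inner <- merge(dat1, dat2, by = "id")
nrow(inner)                    # 4

# all = TRUE: keep every id; unmatched fields are filled with NA
full <- merge(dat1, dat2, by = "id", all = TRUE)
nrow(full)                     # 5
full[full$id == 5, ]           # sex is NA for id 5
```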


An R Session (demo)


To work with R …

• R, like most statistical programs, works on observations (rows) and variables (columns)

• You should keep in mind
  – Name of dataframe
  – Name of variables


Allison and Cicchetti’s study

Truett Allison; Domenic V. Cicchetti. Sleep in Mammals: Ecological and Constitutional Correlates. Science 1976; 194:732-734.


R Session

• Reading a file into R for analysis
  – Filename: allison.csv

• Some graphical analyses

• Some descriptive (and not so descriptive) analyses

Allison T, Cicchetti DV (1976). Sleep in mammals: ecological and constitutional correlates. Science 194, 732–734.


Species                 BodyWt  BrainWt  NonDreaming  Dreaming  TotalSleep  LifeSpan  Gestation  Predation  Exposure  Danger
Africanelephant         6654    5712     NA           NA        3.3         38.6      645        3          5         3
Africangiantpouchedrat  1       6.6      6.3          2         8.3         4.5       42         3          1         3
ArcticFox               3.385   44.5     NA           NA        12.5        14        60         1          1         1
Arcticgroundsquirrel    0.92    5.7      NA           NA        16.5        NA        25         5          2         3
Asianelephant           2547    4603     2.1          1.8       3.9         69        624        3          5         4
Baboon                  10.55   179.5    9.1          0.7       9.8         27        180        4          4         4
Bigbrownbat             0.023   0.3      15.8         3.9       19.7        19        35         1          1         1
Braziliantapir          160     169      5.2          1         6.2         30.4      392        4          5         4
Cat                     3.3     25.6     10.9         3.6       14.5        28        63         1          2         1
Chimpanzee              52.16   440      8.3          1.4       9.7         50        230        1          1         1
Chinchilla              0.425   6.4      11           1.5       12.5        7         112        5          4         4
Cow                     465     423      3.2          0.7       3.9         30        281        5          5         5
Deserthedgehog          0.55    2.4      7.6          2.7       10.3        NA        NA         2          1         2
Donkey                  187.1   419      NA           NA        3.1         40        365        5          5         5
EasternAmericanmole     0.075   1.2      6.3          2.1       8.4         3.5       42         1          1         1


Reading a csv file

• Locate your folder and filename

• Use the function read.csv

• On a Mac, you can simply drag the file onto the R command line to get its full path

dat = read.csv("~/Dropbox/Garvan Lectures 2014/Datasets and Teaching Materials/allison.csv", header=T, na.strings="NA")
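The path above exists only on the presenter's computer. As a self-contained sketch, the code below writes three rows from the data table (four of its columns) to a temporary file and reads them back with the same `read.csv()` arguments. The names `csv_path` and `dat_demo` are assumptions made for this example.

```r
# Write a tiny csv file to a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Species,BodyWt,BrainWt,LifeSpan",
             "Africanelephant,6654,5712,38.6",
             "Arcticgroundsquirrel,0.92,5.7,NA",
             "Cat,3.3,25.6,28"),
           csv_path)

# header=TRUE: first line holds the variable names
# na.strings="NA": the text NA is read as a missing value
dat_demo <- read.csv(csv_path, header = TRUE, na.strings = "NA")

dim(dat_demo)                      # 3 rows, 4 columns
sum(is.na(dat_demo$LifeSpan))      # 1 missing value
```

If missing values are coded differently in a file (for example as "." or "-99"), `na.strings` can be given a vector of all such codes.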


Reading file through file.choose()

f = file.choose()   # find the file
dat = read.csv(f, header=T, na.strings="NA")
attach(dat)         # attach the data before analysis
names(dat)          # want to know variable names
dim(dat)            # how many rows and columns?
summary(dat)        # summarize data

 


Summary: an overall “picture”

> summary(dat)
                   Species       BodyWt             BrainWt
 Africanelephant       : 1   Min.   :   0.005   Min.   :   0.14
 Africangiantpouchedrat: 1   1st Qu.:   0.600   1st Qu.:   4.25
 ArcticFox             : 1   Median :   3.342   Median :  17.25
 Arcticgroundsquirrel  : 1   Mean   : 198.790   Mean   : 283.13
 Asianelephant         : 1   3rd Qu.:  48.203   3rd Qu.: 166.00
 Baboon                : 1   Max.   :6654.000   Max.   :5712.00
 (Other)               :56
  NonDreaming        Dreaming       TotalSleep       LifeSpan
 Min.   : 2.100   Min.   :0.000   Min.   : 2.60   Min.   :  2.000
 1st Qu.: 6.250   1st Qu.:0.900   1st Qu.: 8.05   1st Qu.:  6.625
 Median : 8.350   Median :1.800   Median :10.45   Median : 15.100
 Mean   : 8.673   Mean   :1.972   Mean   :10.53   Mean   : 19.878
 3rd Qu.:11.000   3rd Qu.:2.550   3rd Qu.:13.20   3rd Qu.: 27.750
 Max.   :17.900   Max.   :6.600   Max.   :19.90   Max.   :100.000
 NA's   :14       NA's   :12      NA's   :4       NA's   :4
   Gestation        Predation        Exposure         Danger
 Min.   : 12.00   Min.   :1.000   Min.   :1.000   Min.   :1.000
 1st Qu.: 35.75   1st Qu.:2.000   1st Qu.:1.000   1st Qu.:1.000
 Median : 79.00   Median :3.000   Median :2.000   Median :2.000
 Mean   :142.35   Mean   :2.871   Mean   :2.419   Mean   :2.613
 3rd Qu.:207.50   3rd Qu.:4.000   3rd Qu.:4.000   3rd Qu.:4.000
 Max.   :645.00   Max.   :5.000   Max.   :5.000   Max.   :5.000
 NA's   :4


Descriptive statistics: counting

library(tables)
tabular(factor(Exposure) ~ (n=1 + Percent("col")))

 factor(Exposure)   n    Percent
 1                  27   43.548
 2                  13   20.968
 3                   4    6.452
 4                   5    8.065
 5                  13   20.968


Descriptive statistics: mean, SD, etc

tabular(factor(Exposure) ~ LifeSpan*(n=1 + mean + median + sd), data=na.omit(dat))

                    LifeSpan
 factor(Exposure)   n    mean    median   sd
 1                  18   15.17    5.75    24.103
 2                   9   13.81    7.00    15.622
 3                   4   25.40   26.50    13.846
 4                   4   20.30   23.60     9.428
 5                   7   33.34   30.00    18.452
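The same kind of summary can be sketched with base R alone, using `table()`, `prop.table()` and `tapply()`. The data frame `sub` below holds only eight species copied from the data table, so its numbers will not match the full-data output above.

```r
# Eight species from the data table (LifeSpan and Exposure columns)
sub <- data.frame(
  Species  = c("Africangiantpouchedrat", "Asianelephant", "Baboon",
               "Bigbrownbat", "Braziliantapir", "Cat", "Chimpanzee", "Cow"),
  LifeSpan = c(4.5, 69, 27, 19, 30.4, 28, 50, 30),
  Exposure = c(1, 5, 4, 1, 5, 2, 1, 5)
)

# Counting: frequency and column percentage of each Exposure level
counts <- table(sub$Exposure)
counts
round(100 * prop.table(counts), 1)

# Mean, median and SD of LifeSpan within each Exposure level
means <- tapply(sub$LifeSpan, sub$Exposure, mean)
means
tapply(sub$LifeSpan, sub$Exposure, median)
tapply(sub$LifeSpan, sub$Exposure, sd)     # NA where a level has one species
```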


Descriptive statistics: graph

means = with(na.omit(dat), tapply(LifeSpan, Exposure, mean))
barplot(sort(means), horiz=T, las=1, col="blue", xlab="Life Span", ylab="Exposure")


Descriptive statistics: graph

library(sciplot)
bargraph.CI(Exposure, LifeSpan, lc=F, data=na.omit(dat))


Box plot

boxplot(LifeSpan ~ Exposure, notch=F, col="blue")

[Figure: box plots of LifeSpan (y-axis, 0 to 100) for each Exposure level (x-axis, 1 to 5)]


Even better box plot

library(ggplot2)

qplot(x=factor(Exposure), y=LifeSpan, data=dat, geom=c("boxplot", "jitter"), fill=Exposure)

[Figure: box plots with jittered points; x-axis factor(Exposure) (1 to 5), y-axis LifeSpan (0 to 100), fill legend Exposure (1 to 5)]


Histogram

hist(LifeSpan, prob=T, col="blue")
lines(density(LifeSpan, na.rm=T), col="red", lwd=3)

[Figure: “Histogram of LifeSpan” with a density curve; x-axis LifeSpan (0 to 100), y-axis Density (0.00 to 0.03)]
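To see the numbers behind a histogram, `hist()` can be called with `plot = FALSE`, as sketched below. The vector holds the 13 non-missing LifeSpan values from the species listed in the data table, so the counts describe that subset only; the break points are a choice made for this example.

```r
# LifeSpan of the 13 listed species with a recorded value
life <- c(38.6, 4.5, 14, 69, 27, 19, 30.4, 28, 50, 7, 30, 40, 3.5)

h <- hist(life, breaks = seq(0, 100, by = 20), plot = FALSE)

h$breaks      # 0 20 40 60 80 100
h$counts      # 5 6 1 1 0
h$density     # what prob=T puts on the y-axis

# With prob=T the bar areas sum to 1
sum(h$density * diff(h$breaks))
```

Because the bars are scaled to total area 1, a kernel density curve drawn with `lines(density(...))` sits on the same scale.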


Histogram with ggplot2

qplot(x=LifeSpan) + geom_histogram(col="white", fill="blue") + theme(legend.position="none")

[Figure: histogram; x-axis LifeSpan (0 to 100), y-axis count (0 to 10)]


Histogram and density with ggplot2

m = ggplot(data=dat, aes(x=LifeSpan))

m + geom_histogram(binwidth=20, aes(y=..density..), col="white", fill="blue", lwd=0.5) + geom_density()

[Figure: histogram with density curve; x-axis LifeSpan (0 to 120), y-axis density (0.00 to 0.03)]


More “fancy” histogram

library(ggplot2)

qplot(x=LifeSpan, geom="density", fill=factor(Exposure), alpha=I(0.5)) + theme(legend.position="top")


Scatter plot

plot(BodyWt, BrainWt, pch=16, col="blue")


Scatter plot with labels

plot(BodyWt, BrainWt, pch=16, col="blue")

text(BodyWt, BrainWt, labels=Species, cex= 0.5)


Scatter plot with transformation

plot(log(BodyWt), log(BrainWt), pch=16, col="blue")


Scatter plot with straight line

plot(log(BrainWt) ~ log(BodyWt), pch=16, col="blue")
abline(lm(log(BrainWt) ~ log(BodyWt)), col="red")
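The red line is the least-squares fit of log brain weight on log body weight. The sketch below stores that fit in an object so its coefficients can be inspected; it uses only eight species copied from the data table, so the estimates are illustrative and will differ from those of the full dataset. The names `sub` and `fit` are assumptions made for this example.

```r
# Eight species from the data table (BodyWt and BrainWt columns)
sub <- data.frame(
  BodyWt  = c(1, 2547, 10.55, 0.023, 160, 3.3, 52.16, 465),
  BrainWt = c(6.6, 4603, 179.5, 0.3, 169, 25.6, 440, 423)
)

# Fit the line once, then reuse it for the plot and for the estimates
fit <- lm(log(BrainWt) ~ log(BodyWt), data = sub)

plot(log(BrainWt) ~ log(BodyWt), data = sub, pch = 16, col = "blue")
abline(fit, col = "red")

coef(fit)    # intercept and slope on the log-log scale
```

On the log-log scale the slope can be read as the approximate percentage change in brain weight per one percent change in body weight.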


Scatter plot coloured by a 3rd variable

qplot(x=log(BodyWt), y=log(BrainWt), col=Exposure) + stat_smooth(method="lm", se=T)


Scatter plot scaled by size

qplot(x=log(BodyWt), y=log(BrainWt), size=Danger, col=Exposure) + stat_smooth(method="lm", se=T)


Multiple scatter plots with straight line

qplot(log(BodyWt), log(BrainWt), data=dat, facets=~Danger)+geom_abline()


Correlogram

library(psych)

vars=cbind(log(BodyWt), log(BrainWt), TotalSleep, Dreaming, LifeSpan, Gestation)

pairs.panels(vars)


[Figure: pairs.panels output, a matrix of scatter plots, histograms and pairwise correlation coefficients for log(BodyWt), log(BrainWt), TotalSleep, Dreaming, LifeSpan and Gestation]
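The correlation coefficients shown in such a plot can also be obtained as a table with base R's `cor()`, and a plain scatter plot matrix with `pairs()`. The sketch below uses seven species copied from the data table, including rows with missing values, so its coefficients are for illustration only and will not reproduce the figure.

```r
# Seven species from the data table; NA marks values missing in the source
sub <- data.frame(
  TotalSleep = c(3.3, 3.9, 9.8, 19.7, 14.5, 3.9, 10.3),
  Dreaming   = c(NA, 1.8, 0.7, 3.9, 3.6, 0.7, 2.7),
  LifeSpan   = c(38.6, 69, 27, 19, 28, 30, NA),
  Gestation  = c(645, 624, 180, 35, 63, 281, NA)
)

# Each correlation uses all species with both values present
r <- cor(sub, use = "pairwise.complete.obs")
round(r, 2)

# Base R scatter plot matrix of the same variables
pairs(sub, pch = 16, col = "blue")
```

`use = "pairwise.complete.obs"` keeps as many species as possible for each pair of variables; `use = "complete.obs"` would instead drop any species with a missing value in any column.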


Factor analysis

library(psych)

vars=cbind(BodyWt, BrainWt, LifeSpan, Gestation, TotalSleep, Danger, Predation)

fit = factanal(na.omit(vars), 2, rotation="varimax")

fit


Factor analysis

Loadings:
            Factor1  Factor2
BodyWt       0.933
BrainWt      0.995
LifeSpan     0.511
Gestation    0.771    0.264
TotalSleep  -0.333   -0.614
Danger                0.996
Predation             0.948

                Factor1  Factor2
SS loadings       2.834    2.345
Proportion Var    0.405    0.335
Cumulative Var    0.405    0.740

Test of the hypothesis that 2 factors are sufficient.
The chi square statistic is 62.94 on 8 degrees of freedom.
The p-value is 1.23e-10


Summary

• R – an important development in statistical science

• Absolutely free, powerful, highly flexible

• Widely used around the world

• Fits all statistical models

• Very useful for simulation work

• High quality (e.g. publishable) graphics


Books and references

Dalgaard P (2008) Introductory Statistics with R. New York: Springer, 2nd edition.

Seefeld K, Linder E (2007) Statistics using R with biological examples. Available online (free). http://cran.r-project.org/doc/contrib/Seefeld_StatsRBio.pdf

Braun WJ, Murdoch DJ (2007) A First Course in Statistical Programming with R. Cambridge: Cambridge University Press.

Wickham H (2009) ggplot2: Elegant Graphics for Data Analysis. New York: Springer.

Useful websites

www.rseek.org (Google)

 
