R for Statistical Data Analysis


Page 1: R for Statistical Data Analysis

www.utm.my  innovative ● entrepreneurial ● global

Workshop: R for Statistical Data Analysis, an alternative to SPSS. UiTM, 26 April 2016. Dr. N Ahmad

25 April 2016
Dr. Norhaiza Ahmad
Department of Mathematical Sciences
Faculty of Science
Universiti Teknologi Malaysia

R for Statistical Data Analysis: an Alternative to SPSS

PART A0: R INTRO

Page 2: R for Statistical Data Analysis

Outline

● PART A: INTRO
  § What is R
  § History
  § Why Use R vs SPSS (or other commercial packages)
  § Types of R
  § Downloading & Installing R (see attachment)
  § Executing & Running R Commands
  § Saving and Quitting R

Page 3: R for Statistical Data Analysis

What is R?

• A computer language oriented towards statistical applications

• Its use is growing rapidly

Page 4: R for Statistical Data Analysis

History

• The R programming language is an offshoot of a programming language called S.
  • S-PLUS: commercial software
  • S-PLUS (1988) vs. R (1993)

Page 5: R for Statistical Data Analysis

Why Use R?

• FREE!
• Well documented, with many add-on packages for specialized uses
• We can inspect the actual code and modify it to suit our needs
• Has a large online community of R users who can answer your questions and who contribute 'packages' to R
• Minuses: obscure terms, intimidating manuals, odd symbols, inelegant output (except graphics)

Page 6: R for Statistical Data Analysis

R

• Many different datasets (and other "objects") available at the same time
• Datasets can be of any dimension
• Functions can be modified
• Experience is interactive: you program until you get exactly what you want
• One-stop shopping: almost every analytical tool you can think of is available
• R is free and will continue to exist. Nothing can make it go away, and its price will never increase.

SPSS/Other Canned Packages

• One dataset available at a given time
• Datasets are rectangular
• Functions are proprietary
• Experience is passive: you choose an analysis and they give you everything they think you need
• Tend to have limited scope, forcing you to learn additional programs; extra options cost more and/or require you to learn a different language (e.g., SPSS Macros)
• They cost money. There is no guarantee they will continue to exist, but if they do, you can bet that their prices will always increase

Page 7: R for Statistical Data Analysis

R vs. SPSS

Product   Developer      Latest Version (release date)   Open Source   Cost
R         R Foundation   April 14, 2016                  Y             None
SPSS      IBM            March 15, 2016                  N             $$$$$

Source: https://en.wikipedia.org/wiki/Comparison_of_statistical_packages

Page 8: R for Statistical Data Analysis

Recommended Books

• ISBN 978-0-387-09418-2
• ISBN-13: 978-1420079333

Page 9: R for Statistical Data Analysis

Types of R

• R command line (very little interface)
• R with menu-driven interactive interfaces:
  • R Commander
  • RStudio

• We will use this.
• Develop programming skills that can be carried over to the other R interfaces.

R currently has several different packages to work with:
• Deducer
• The R Commander
• JGR
• SciViews
• pmg
• RKWard, for KDE
• Cantor, which integrates R, LaTeX, Sage, Maxima and KAlgebra

Page 10: R for Statistical Data Analysis

Downloading & Installing Base R

Page 11: R for Statistical Data Analysis

Getting Started: Downloading R

• Internet connection
• See the step-by-step procedure on how to download R
• End up with the R icon on the desktop

Page 12: R for Statistical Data Analysis

http://www.r-project.org/

Page 13: R for Statistical Data Analysis

Downloading Base R [Figs 1.1 – 1.4]

• Click on Windows
• Then, in the next screen, click on "base"
• Then follow the screens for Run, OK, or Next
• And finally "Finish", which will put the R icon on the desktop

When you download and install R, you are downloading and installing base R and selected packages.

Page 14: R for Statistical Data Analysis

How to Run & Execute R Commands?

Page 15: R for Statistical Data Analysis

How to Run commands in R: Method 1: R Console

• Computations are performed on an R console.
• R commands are typed and evaluated here.
• At the R prompt (>), type your R commands.
• R is updated regularly. Check the R version on your PC.
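
One way to carry out the version check mentioned on this slide is from the console itself. The snippet below is a minimal sketch; the object name `r_version` is only illustrative and does not come from the slides.

```r
# Store and display the version string of the R installation that is running
r_version <- R.version.string
r_version
```

The exact text printed depends on the installation, so no particular output is assumed here.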

Page 16: R for Statistical Data Analysis

R as calculator

> 6+3
> 6-3
> (22+18+35)/3
> 31 %% 7

• Write your R commands after each prompt
• Hit Enter to execute the command
• Other operators: +, -, *, /
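
As a rough sketch of the remaining operators, the lines below store each result in an object so that it can be inspected afterwards. The object names are illustrative and are not part of the original slides; `^` and `%/%` are standard R operators that the slide does not list.

```r
product   <- 6 * 3      # multiplication
quotient  <- 6 / 3      # division
power     <- 2 ^ 5      # exponentiation
whole     <- 31 %/% 7   # integer division: how many times 7 fits into 31
remainder <- 31 %% 7    # remainder left over after that division

# Show all five results side by side
c(product, quotient, power, whole, remainder)
```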

Page 17: R for Statistical Data Analysis

Handy Tip

• Recall previously used R commands with the up and down arrow keys.

Page 18: R for Statistical Data Analysis

Making Comments in R

• Comment lines in R are denoted by "#"
• Anything written after "#" on a line will not be read as an R command
• Comment lines are useful for making notes in your program

> 31 %% 7 # remainder after division of 31 by 7, i.e. 31 (mod 7)
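
The two usual placements of a comment can be sketched as follows. The object name `remainder` is an illustrative choice, not something taken from the slides.

```r
# A full-line comment: R ignores this line entirely
remainder <- 31 %% 7   # an end-of-line comment: only the code left of "#" is run
remainder
```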

 

Page 19: R for Statistical Data Analysis

How to Run commands in R: Method 2: R Script

Script window

A script is a separate window where you can type your commands and run them at your own convenience.

It is similar to a notepad/text file!

At the ribbon: File - New

Page 20: R for Statistical Data Analysis

Writing in an R Script File

Step 1: Type these. There is no ">" prompt in a script file.

# ========
# 20 Nov 2014
# This is my first R document
# ========
6+3
6-3
6+3; 6-3

NOTE: Use semicolons to separate different commands on the same line, e.g. 6+3; 6-3

Step 2: To run commands in a script file: highlight all the commands → right click → choose "Run line or selection"

Step 3: Save your script file in your D drive. This is similar to saving a notepad/text file.
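
Besides the menu route described in these steps, a saved script can also be run in one go with the base function `source()`. This is an alternative that the slides do not cover, sketched here under the assumption that the script lives in a temporary file; the path and the object names `a` and `b` are purely illustrative.

```r
# Write a small script to a temporary file (stands in for a script saved on disk)
script_path <- tempfile(fileext = ".R")
writeLines(c("# My first R document",
             "a <- 6 + 3",
             "b <- 6 - 3",
             "a; b"),
           script_path)

# Run every command in the script; echo = TRUE also prints each command as it runs
source(script_path, echo = TRUE)
```

After `source()` finishes, the objects created by the script (`a` and `b` here) are available in the workspace.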

Page 21: R for Statistical Data Analysis

R Assignment

> # use '=' or '<-' for assignment
> x <- 2
> x
> x = 2
> x

R is an object-oriented language. Each input/output can be assigned to (stored in) an object.

• Once an R object is assigned, it can be called upon at any time as long as it remains stored. Here the number 2 is stored in an object called "x".

• In general, R objects are stored in an R workspace, also known as the global environment.
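
A minimal sketch of how a stored object can be recalled and how the contents of the workspace can be checked. `ls()` and `exists()` are base R functions; the slides themselves do not mention them.

```r
x <- 2          # store the number 2 in an object called x
x               # typing the name recalls the stored value
ls()            # list the objects currently held in the workspace
exists("x")     # TRUE while x remains stored
```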

Page 22: R for Statistical Data Analysis

R Assignment

> # Use ';' to combine commands on one line
> x <- 2; x
> len = 2; len

Page 23: R for Statistical Data Analysis

R Assignment

> # Use '()' to automatically display an assignment
> (x <- 2)
> (len = 2)

• TIP: Put brackets around an assignment to automatically display the stored value.
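
Why the brackets make a difference can be sketched with the base function `withVisible()`, which reports whether a value would be printed. This goes slightly beyond the slides, and the object names `plain` and `wrapped` are illustrative.

```r
# A plain assignment returns its value invisibly, so nothing is printed
plain <- withVisible(x <- 2)
plain$visible      # FALSE

# Wrapping the assignment in brackets makes the value visible, so it is printed
wrapped <- withVisible((len <- 2))
wrapped$visible    # TRUE
```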

Page 24: R for Statistical Data Analysis

Important Rules

1. We created a variable/object name
2. Variable names are case sensitive
3. No blanks in the name (can use _ or . to join words, but not -)
4. Start with a letter (capital or lower-case)
5. Can use <- instead of =
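
The naming rules above can be illustrated with a short sketch. All object names and values are made up for illustration; `make.names()` is a base R function, not mentioned in the slides, that shows how R would repair names that break the rules.

```r
bank_balance <- 3000   # an underscore joins words
bank.balance <- 3500   # a dot also works
Bank_balance <- 4000   # a different object: names are case sensitive

bank_balance == Bank_balance   # FALSE, the two objects hold different values

# Names with a blank, a hyphen, or a leading digit are not valid as written;
# make.names() returns the repaired versions
repaired <- make.names(c("bank balance", "bank-balance", "2nd_balance"))
repaired
```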

Page 25: R for Statistical Data Analysis

Handy Tip

• If a command is not complete at the end of a line, R will issue the following prompt (by default):

  +

  on the second and subsequent lines until the command is syntactically complete. To break out of this, press CTRL + C (the Control key and 'C' at the same time) in a terminal session; in the Windows R GUI and in RStudio, press Esc instead.
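
A small sketch of a command that is deliberately split over two lines. Typed at the console, the second line would appear after the "+" continuation prompt; in a script file it simply runs as one command. The object name `total` is illustrative.

```r
# The open bracket is not closed on the first line, so R waits for more input
total <- (22 + 18 +
          35) / 3
total
```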

Page 26: R for Statistical Data Analysis

Example

Suppose the compound interest factor, based on an interest rate of 0.25% per year over a 30-year period, is 1.078. Calculate the bank balance after 30 years if the initial balance is RM3000.

> Ir.30 = 1.078
> I.bal = 3000
> (F.bal = I.bal*Ir.30)
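
The factor 1.078 quoted in the example can itself be reproduced in R. The sketch below assumes interest is compounded once a year, which the slide does not state explicitly; `rate` and `years` are illustrative names, while `Ir.30`, `I.bal` and `F.bal` follow the slide.

```r
rate  <- 0.0025               # 0.25% per year (assumed to compound annually)
years <- 30

Ir.30 <- (1 + rate)^years     # compound factor over 30 years
I.bal <- 3000                 # initial balance in RM
F.bal <- I.bal * Ir.30        # balance after 30 years

round(Ir.30, 3)               # 1.078, matching the factor given in the example
F.bal
```

Because this version uses the unrounded factor, the final balance comes out a little below the 3234 obtained from 3000 × 1.078.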

Page 27: R for Statistical Data Analysis

FUN Tips!

• You can change the appearance of your prompt symbol '>' to some other form:

> options(prompt = "R> ")

To return to the default prompt:

> options(prompt = "> ")
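
If the current prompt is not the default, the previous setting can be kept and restored rather than typed in again. This is a minimal sketch that relies on `options()` returning the old values; the object name `old` is illustrative.

```r
old <- options(prompt = "R> ")   # change the prompt and keep the previous setting
getOption("prompt")              # "R> "
options(old)                     # restore whichever prompt was in use before
```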

Page 28: R for Statistical Data Analysis

Saving Stuff

① You can save individual variables with the save() function.

② You can save the entire workspace with the save.image() function.

③ You can save your R script file, using the appropriate save menu command in your code editor.

Option 3 is recommended. Occasionally it may be convenient to use Option 1 or Option 2.
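
Options 1 and 2 can be sketched as follows. The temporary file paths and the object names are assumptions made for illustration; in practice you would choose your own file names. `load()` is the base function that reads such files back in.

```r
x   <- 2
len <- 2

# Option 1: save selected objects to a file
one_file <- tempfile(fileext = ".RData")
save(x, file = one_file)

# Option 2: save the entire workspace to a file
all_file <- tempfile(fileext = ".RData")
save.image(file = all_file)

# Remove x, then bring it back from the file written under Option 1
rm(x)
load(one_file)
x
```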

Page 29: R for Statistical Data Analysis

Saving Stuff

• To exit: hit X (top right corner) or, at the R prompt:

  > quit()

• This brings up a dialog asking whether to save the workspace.
• Unless you need to recall certain objects regularly, you do not need to save the workspace.
• Thus, make it a practice to save your work in a script file instead.
• The workspace is your current working environment. This includes all the functions, objects etc. that you have created in that session.

Page 30: R for Statistical Data Analysis

References

• Slides adapted from T. P. Hogan, A Brief Introductory Guide, SAGE Publications, 2010
• T. Martin, The Undergraduate Guide to R
• W. J. Braun et al., A First Course in Statistical Programming with R (2008)

Other ways to learn R: http://swirlstats.com/

Page 31: R for Statistical Data Analysis

NEXT

● Exercises
● PART A1: Functions, Packages & Help in R