03 The R Language, Part 2

Upload: mhafod

Post on 23-Feb-2018


  • 7/24/2019 03 the R Language Part 2

    1/163

    2014 RStudio, Inc. All rights reserved. Follow @rstudioapp

    All Training materials are provided "as is" and without warranty and RStudio disclaims any and all express and implied warranties including without limitation the implied warranties of title, fitness for a particular purpose, merchantability and noninfringement.

    The Training Materials are licensed under the Creative Commons Attribution-Noncommercial 3.0 United States License. To view a copy of this license, visit http://creativecommons.org/licenses/by-nc/3.0/us/ or send a letter to Creative Commons, 171 Second Street, Suite 300, San Francisco, California, 94105, USA.
    The R language (2 of 2)

    Retrieve and use information in precise, efficient ways

    Garrett Grolemund, Master Instructor, RStudio

    August 2014
    1. Subsetting

    2. R Packages

    3. Logical tests

    4. Missing values

    Question

    How can you save just the fifth element of x to y?

    How can you change the fifth element of x to a

    x

    Subsetting

    Your turn

    With your neighbor, run the code on the following slide IN YOUR HEADS.

    vec

    6 1 3 6 10 5

    df

    name    birth  instrument
    John    1940   guitar
    Paul    1942   bass
    George  1943   guitar
    Ringo   1940   drums

    # Predict what the following code will do

    # DON'T RUN IT!

    vec[2]

    vec[c(5, 6)]

    vec[-c(5,6)]

    vec[vec > 5]

    df[c(2, 4), 3]

    df[ , 1]

    df[ , "instrument"]

    df$instrument
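To check the predictions afterwards, the two objects can be rebuilt and the commands run. This is only a sketch: the transcript does not show how `vec` and `df` were created, so the `data.frame()` call below is an assumption, with column values taken from the vectors that appear on the "How R makes a data frame" slide (the birth years differ slightly between slides).

```r
# Assumed reconstruction of the example objects (their creation is not in the transcript)
vec <- c(6, 1, 3, 6, 10, 5)
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE  # keep text columns as character vectors
)

vec[2]               # the second element
vec[c(5, 6)]         # the fifth and sixth elements
vec[-c(5, 6)]        # everything except the fifth and sixth elements
vec[vec > 5]         # only the elements greater than 5

df[c(2, 4), 3]       # rows 2 and 4 of column 3
df[, 1]              # every row of column 1
df[, "instrument"]   # every row of the instrument column
df$instrument        # the instrument column again
```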

    Subset notation

    vec[?]

    vec: name of object to subset

    [ ]: brackets (brackets always mean subset)

    ?: an index (that tells R which elements to include)

    Each dimension needs its own index!

    vec[?]

    df[?,?]

    First index: which rows to include

    Second index: which columns to include

    Separate dimensions with a comma

    But what should go in the indexes?

    Four ways to subset

    1. Integers

    2. Blank spaces

    3. Names

    4. Logical vectors (TRUE and FALSE)

    Integers (positive)

    Positive integers behave just like ij notation in linear algebra

    df[2,3]

    df[c(2,4),c(2,3)]

    df[c(2,4),3]

    1. Colons are a useful way to create vectors

    1:4

    # 1 2 3 4

    df[1:4, 1:2]

    2. Repeating input repeats output

    df[c(1,1,1,2,2), 1:3]
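The positive-integer commands above can be tried on a small data frame. The sketch below assumes `df` was built with `data.frame()` from the column vectors shown later in the deck; that construction is not part of the transcript.

```r
# Assumed reconstruction of df (its creation is not in the transcript)
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

cell <- df[2, 3]                       # one cell: row 2, column 3
block <- df[c(2, 4), c(2, 3)]          # rows 2 and 4 of columns 2 and 3 (still a data frame)
column_part <- df[c(2, 4), 3]          # only one column selected, so a plain vector comes back
colon_rows <- df[1:4, 1:2]             # 1:4 is shorthand for c(1, 2, 3, 4)
repeated <- df[c(1, 1, 1, 2, 2), 1:3]  # row 1 three times, then row 2 twice
```

Note that selecting a single column returns a vector rather than a one-column data frame, which is why `df[c(2,4),3]` looks different from `df[c(2,4),c(2,3)]`.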

    Integers (zero)

    As an index, zero will return nothing from a dimension. This creates an empty object.

    vec[0]

    # numeric(0)

    df[1:2, 0]

    # data frame with 0 columns and 2 rows

    Integers (negative)

    Negative integers return everything but the elements at the specified locations.

    You cannot use both negative and positive integers in the same dimension.

    vec[c(5,6)]

    vec[-c(5,6)]

    df[c(2:4), 2:3]

    df[-c(2:4), 2:3]

    df[-c(2:4),-(2:3)]
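A short sketch of the zero and negative cases, again assuming `vec` and `df` are built as below (their creation is not in the transcript). The `tryCatch()` call is only there so the mixed-sign error can be captured instead of stopping the script.

```r
# Assumed reconstruction of the example objects
vec <- c(6, 1, 3, 6, 10, 5)
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

empty_vec <- vec[0]             # zero selects nothing: an empty numeric vector
empty_df <- df[1:2, 0]          # two rows, no columns

dropped <- vec[-c(5, 6)]        # everything except elements 5 and 6
first_row <- df[-c(2:4), 2:3]   # drop rows 2 to 4, keep columns 2 and 3

# The minus sign is applied before the colon, so -1:4 mixes signs
mixed_index <- -1:4
mixed <- tryCatch(vec[c(-1, 2)], error = function(e) e)  # R signals an error here
```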

    Your Turn

    1. Fix these poorly written subset commands

    vec(1:4)

    vec[-1:4]

    vec[3, 4, 5]

    () for functions, [] for subsetting

    vec[1:4]

    # 6 1 3 6

    Don't mix positive and negative integers; distribute the negative sign (e.g., -1:4 = -1 0 1 2 3 4).

    vec[-(1:4)]

    # 10 5

    Pass multiple values for the same dimension as a vector.

    vec[c(3, 4, 5)]

    # 3 6 10

    Your Turn

    What is wrong with these subsetting commands? What will they do?

    mat[2]

    df[1]

    How R makes a matrix

    [Figure: the vector 1 2 3 4 5 6 7 8 9 is folded, three values at a time, into the columns of a matrix]

    mat[2]
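The effect of a single index on a matrix can be checked directly. The sketch assumes `mat` is a 3 x 3 matrix built from 1:9; the transcript does not show how it was created.

```r
# Assumed construction of mat: matrix() fills column by column
mat <- matrix(1:9, nrow = 3)

single <- mat[2]        # one index treats the matrix as the long vector underneath it
second_row <- mat[2, ]  # one index per dimension: all of row 2
second_col <- mat[, 2]  # all of column 2
```

With only one index R does not complain; it quietly returns the second element of the underlying vector, which is rarely what was intended.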

    How R makes a data frame

    df[2]

    [Figure: a list of three vectors, c("a","b","c","d"), c(1, 2, 3, 4) and c(T, F, T, F), is stood up as three columns to form a data frame]

    df[2]

    c("John","Paul", "George","Ringo")

    c(1940, 1942, 1943, 1940)

    c("guitar", "bass", "guitar","drums")
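Because a data frame is a list of columns, a single index picks list elements (columns), not rows. The sketch below illustrates this; the `data.frame()` call is an assumed reconstruction using the three vectors shown on the slide.

```r
# Assumed reconstruction of df from the vectors on the slide
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

df_is_list <- is.list(df)  # a data frame is a list under the hood
one_col_df <- df[2]        # list-style index: a data frame holding only the second column
birth_vec <- df[[2]]       # the contents of the second column, as a vector
birth_col <- df[, 2]       # the same values, using one index per dimension
```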

    Blank spaces

    Blank spaces return everything (i.e., no subsetting occurs on that dimension)

    vec[ ]

    df[1,]

    df[ ,2]
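A brief sketch of blank indexes, assuming `vec` and `df` are built as below (not shown in the transcript):

```r
# Assumed reconstruction of the example objects
vec <- c(6, 1, 3, 6, 10, 5)
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

all_elements <- vec[]  # blank index: every element
first_row <- df[1, ]   # row 1, every column
second_col <- df[, 2]  # every row, column 2
```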

    Names

    If your object has names, you can ask for elements or columns back by name.

    names(vec)

    vec[c("a","b","d")]

    df[ ,"birth"]

    df[ ,c("name","birth")]
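A sketch of subsetting by name. The names given to `vec` here ("a" to "f") are an assumption: the slides only show that "a", "b" and "d" are among them, and the assignment itself is cut off in the transcript.

```r
# Assumed reconstruction of the example objects
vec <- c(6, 1, 3, 6, 10, 5)
names(vec) <- c("a", "b", "c", "d", "e", "f")  # hypothetical names
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

by_name <- vec[c("a", "b", "d")]      # elements named a, b and d
birth <- df[, "birth"]                # one column by name: a vector
two_cols <- df[, c("name", "birth")]  # two columns by name: a data frame
```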

    Logical

    You can subset with a logical vector of the same length as the dimension you are subsetting. Each element that corresponds to a TRUE will be returned.

    vec[c(FALSE,TRUE,FALSE,TRUE,TRUE,FALSE)]

    df[c(FALSE,TRUE,TRUE,FALSE), ]
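The logical examples can be run as follows; `vec` and `df` are assumed reconstructions, and the last line shows how a comparison produces the logical vector for you.

```r
# Assumed reconstruction of the example objects
vec <- c(6, 1, 3, 6, 10, 5)
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

keep <- c(FALSE, TRUE, FALSE, TRUE, TRUE, FALSE)
kept <- vec[keep]                                # elements 2, 4 and 5
middle <- df[c(FALSE, TRUE, TRUE, FALSE), ]      # rows 2 and 3, every column
big <- vec[vec > 5]                              # vec > 5 is itself a logical vector
```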

    Your Turn

    Write down as many ways to extract the name "John" from df as you can. Make sure each works. You have two minutes.

    # Answers

    df[1, 1]

    df[1, "name"]

    df[1, -(2:3)]

    df[1, c(TRUE, FALSE, FALSE)]

    df[-(2:4), 1]

    df[-(2:4), "name"]

    df[-(2:4), -(2:3)]

    df[-(2:4), c(TRUE, FALSE, FALSE)]

    df[c(TRUE, FALSE, FALSE, FALSE), 1]

    df[c(TRUE, FALSE, FALSE, FALSE), "name"]

    df[c(TRUE, FALSE, FALSE, FALSE), -(2:3)]

    df[c(T, F, F, F), c(T, F, F)]
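One way to "make sure each works" is to collect the answers in a list and compare every result with "John". This is a sketch under the assumption that `df` stores names as character strings (`stringsAsFactors = FALSE`); with factor columns the comparison would need `as.character()`.

```r
# Assumed reconstruction of df
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

answers <- list(
  df[1, 1],
  df[1, "name"],
  df[1, -(2:3)],
  df[1, c(TRUE, FALSE, FALSE)],
  df[-(2:4), 1],
  df[-(2:4), "name"],
  df[-(2:4), -(2:3)],
  df[-(2:4), c(TRUE, FALSE, FALSE)],
  df[c(TRUE, FALSE, FALSE, FALSE), 1],
  df[c(TRUE, FALSE, FALSE, FALSE), "name"],
  df[c(TRUE, FALSE, FALSE, FALSE), -(2:3)],
  df[c(T, F, F, F), c(T, F, F)]
)

# TRUE for each answer that returns exactly the string "John"
is_john <- vapply(answers, identical, logical(1), "John")
all(is_john)
```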

    Your Turn

    lst

    Subsetting lists

    sum(lst[1]) # Error!

    # What is the difference?

    lst[c(1,2)]

    lst[1]

    lst[[1]]

    If list x is a train carrying objects, then x[[5]] is the object in car 5; x[4:6] is a train of cars 4-6.

    http://twitter.com/#!/RLangTip/status/118339256388304896

    lst[c(1,2)]

    # [[1]]
    # [1] 1 2
    #
    # [[2]]
    # [1] TRUE

    lst[1]

    # [[1]]
    # [1] 1 2

    lst[[1]]

    c(1, 2)

    What will this return?

    lst[[1]][2]

    c(1, 2)[2]

    2
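The difference between single and double brackets can be seen by running the commands. The sketch assumes `lst` holds c(1, 2), TRUE and c("a", "b", "c") in that order, which is inferred from the printed output on the slides rather than from a visible definition.

```r
# Assumed construction of lst (element order inferred from the slide output)
lst <- list(c(1, 2), TRUE, c("a", "b", "c"))

sub_list <- lst[c(1, 2)]  # a shorter list: the first two "cars" of the train
still_list <- lst[1]      # a list of length one, not the numbers themselves
contents <- lst[[1]]      # the object inside the first element

# sum() cannot add up a list, so capture the error message instead of stopping
sum_error <- tryCatch(sum(lst[1]), error = function(e) conditionMessage(e))
total <- sum(lst[[1]])    # works: the contents are a numeric vector

second <- lst[[1]][2]     # extract the vector, then take its second value
```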

    $

    The most common syntax for subsetting lists and data frames

    names(lst)

    lst (elements named alpha, beta, gamma)

    lst$alpha

    lst: name of list

    $: dollar sign

    alpha: name of element (no quotes)

    c(1, 2)

    Same as lst

    df$birth

    df: name of data frame

    $: dollar sign

    birth: name of column (no quotes)

    c(1940, 1941, 1943, 1940)
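A sketch of `$` on a named list and on a data frame. Both objects are assumed reconstructions; the element names alpha, beta and gamma come from the slide labels, and the birth years follow the vectors on the "How R makes a data frame" slide.

```r
# Assumed reconstruction of the named list and the data frame
lst <- list(alpha = c(1, 2), beta = TRUE, gamma = c("a", "b", "c"))
df <- data.frame(
  name = c("John", "Paul", "George", "Ringo"),
  birth = c(1940, 1942, 1943, 1940),
  instrument = c("guitar", "bass", "guitar", "drums"),
  stringsAsFactors = FALSE
)

alpha <- lst$alpha                                 # contents of the element named alpha
same_list <- identical(lst$alpha, lst[["alpha"]])  # $ behaves like [[ ]] with a name

birth <- df$birth                                  # the birth column as a vector
same_df <- identical(df$birth, df[["birth"]])
```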

    R Packages

    A collection of code and functions written for the R language.

    Usually focuses on a specific task or problem.

    Most of the useful R applications appear in packages.

    [Figure: ggplot2 moves from the Internet to your hard drive to your R session]

    Install your package with install.packages("ggplot2")

    Load your package with library(ggplot2) EVERY TIME

    qplot(1:10, 1:10)

    ## Error: could not find function "qplot"

    You cannot use a function in a package until you load the package.

    library(ggplot2)

    qplot(1:10, 1:10)

    ##

    Package summary

    1. Download the package with install.packages("name")

       - You only have to do this once

       - You should be connected to the internet

    2. Load the package with library("name")

       - You have to do this each time you start an R session.
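The install-once, load-every-session pattern can be sketched as below. The package list is taken from the next slide; the variable names are illustrative, and the `install.packages()` line is left commented out because it needs an internet connection.

```r
pkgs <- c("ggplot2", "maps", "RColorBrewer")

# requireNamespace() returns TRUE when a package is already on your hard drive
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]

# Step 1 (once per machine): uncomment to download anything that is missing
# install.packages(missing_pkgs)

# Step 2 (every session): attach the packages that are available
for (p in pkgs[installed]) {
  library(p, character.only = TRUE)
}
```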

    There

    areov

    Rpa

    c

    Your Turn

    We're going to use the ggplot2, maps, RColorBrewer, and scales packages today. Load them with

    library("ggplot2")

    library("maps")

    library("RColorBrewer")

    Note: If you have not yet installed them, you'll need to run install.packages(c("ggplot2", "maps", "RColorBrewer")) first.

    Diamonds data

    - ~54,000 round diamonds from http://www.diamondse.info/

    - comes in the ggplot2 package

    - Carat, colour, clarity, cut

    - Total depth, table, depth, width, height

    - Price

    Your turn

    diamonds is huge!

    Use subsetting to look at just the first six rows of diamonds.

    Challenge: use subsetting to look at just the last six rows.

    diamonds[1:6, ]

    nrow(diamonds)

    # 53940

    diamonds[53935:53940, ]

    # Same as

    head(diamonds)

    tail(diamonds)
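The last-six-rows answer hard-codes the row count. A sketch that computes it from `nrow()` instead is shown below; it uses the built-in `mtcars` data as a stand-in (an assumption, chosen so the code runs without ggplot2 installed), and the same lines work with `diamonds` once ggplot2 is loaded.

```r
d <- mtcars                 # stand-in data frame; swap in diamonds after library(ggplot2)
n <- nrow(d)

first_six <- d[1:6, ]       # first six rows, every column
last_six <- d[(n - 5):n, ]  # last six rows: n - 5, ..., n

head(d)                     # same rows as first_six
tail(d)                     # same rows as last_six
```

The parentheses in `(n - 5):n` matter: without them R evaluates `5:n` first and subtracts the result from `n`.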

    View

    The View function can also help you examine a data set. It opens a spreadsheet-like data viewer.

    View(diamonds) # notice: Capital V

    Help pages

    You can open the help page for any R object (including functions) by typing ? followed by the object's name

    ?diamonds

    [Figure: diagram of a diamond labelling table width, x, and z]

    depth = z / diameter

    table = table width / x * 100

    x, y, z in mm

    qplot(x, y, data = diamonds)

    What is weird about these values?

    Can you remove them with subsetting?

    Logical tests

    Logical comparisons

    What will these return?

    1 < 3

    1 > 3

    c(1, 2, 3, 4, 5) > 3

    Your turn

    x

    %in%

    # What does this do?

    1 %in% c(1, 2, 3, 4)

    1 %in% c(2, 3, 4)

    c(3,4,5,6) %in% c(2, 3, 4)

    %in% tests whether the object on the left is a member of the group on the right.

    1 %in% c(1, 2, 3, 4)

    # TRUE

    1 %in% c(2, 3, 4)

    # FALSE

    c(3,4,5,6) %in% c(2, 3, 4)

    # TRUE TRUE FALSE FALSE
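Comparisons and `%in%` both return logical vectors, so their results can be fed straight back into subsetting. A small sketch, where `x` is a hypothetical vector chosen to match the slide example:

```r
small <- 1 < 3                    # TRUE
large <- 1 > 3                    # FALSE
above <- c(1, 2, 3, 4, 5) > 3     # one TRUE or FALSE per element

x <- c(3, 4, 5, 6)                # hypothetical vector
in_group <- x %in% c(2, 3, 4)     # is each element of x a member of the group?
members <- x[in_group]            # keep only the members
```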

    You can combine logical tests with & | xor ! any and

    Studio

    Boolean operators

  • 7/24/2019 03 the R Language Part 2

    127/163

    2014 RStudio, In

    You can combine logical tests with &, |, xor, !, any, andall

    x > 2 & x < 9

    TRUE & TRUE

    TRUE

    You can combine logical tests with & | xor ! any and

    Studio

    Boolean operators

  • 7/24/2019 03 the R Language Part 2

    128/163

    2014 RStudio, In

    You can combine logical tests with &, |, xor, !, any, andall

    x > 2 & x < 9

    TRUE&TRUE

    TRUE

    You can combine logical tests with & | xor ! any and

    Studio

    Boolean operators

  • 7/24/2019 03 the R Language Part 2

    129/163

    2014 RStudio, In

    You can combine logical tests with &, |, xor, !, any, andall

    x > 2& x < 9

    TRUE&TRUE

    TRUE

    You can combine logical tests with & | xor ! any and

    Studio

    Boolean operators

  • 7/24/2019 03 the R Language Part 2

    130/163

    2014 RStudio, In

    You can combine logical tests with &, |, xor, !, any, andall

    x > 2 & x < 9

    TRUE & TRUE

    TRUE

    &

    Are both condition 1 and condition 2 true?

    Studio

  • 7/24/2019 03 the R Language Part 2

    131/163

    2014 RStudio, In

    Are both condition 1 andcondition 2 true?

    |

    Is either condition 1 or condition 2 true?

    Studio

  • 7/24/2019 03 the R Language Part 2

    132/163

    2014 RStudio, In

    Is either condition 1 orcondition 2 true?

    xor

    Is either condition 1 or condition 2 true but not both?

    Studio

  • 7/24/2019 03 the R Language Part 2

    133/163

    2014 RStudio, In

    Is either condition 1 orcondition 2 true, but not both?

    !

    Negation


    any

    Is any condition TRUE?


    all

    Is every condition TRUE?
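A minimal sketch of the six operators above. The vector `x` and its values are hypothetical, chosen only for illustration, and are not taken from the slides.

```r
# Hypothetical example vector (not from the slides)
x <- c(1, 5, 10)

both     <- x > 2 & x < 9       # TRUE only where both tests are TRUE
either   <- x < 2 | x > 9       # TRUE where at least one test is TRUE
one_only <- xor(x > 2, x < 9)   # TRUE where exactly one test is TRUE
negated  <- !(x > 2)            # flips every TRUE to FALSE and vice versa

any_big <- any(x > 9)           # is at least one element TRUE?
all_pos <- all(x > 0)           # is every element TRUE?

print(both)      # FALSE  TRUE FALSE
print(either)    #  TRUE FALSE  TRUE
print(one_only)  #  TRUE FALSE  TRUE
print(negated)   #  TRUE FALSE FALSE
print(any_big)   # TRUE
print(all_pos)   # TRUE
```

Note that `&`, `|`, `xor`, and `!` work element by element and return a vector, while `any` and `all` collapse a logical vector to a single TRUE or FALSE.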


    Turn these sentences into logical tests in R

    Is w positive?

    Is x greater than 10 and less than 20?

    Is object y the word February?

    Is every value in z a day of the week?

    # Answers


    w > 0

    10 < x & x < 20

    y == "February"

    all(z %in% c("Monday", "Tuesday", "Wednesday",

      "Thursday", "Friday", "Saturday", "Sunday"))
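The four tests can be tried out as follows. This is only a sketch: the values assigned to `w`, `x`, `y`, and `z` are hypothetical stand-ins, since the values used in the slides did not survive, and `days` is a helper name introduced here for readability.

```r
# Hypothetical values (assumptions, not the ones from the slides)
w <- 3
x <- 15
y <- "February"
z <- c("Monday", "Tuesday", "Friday")

# Helper vector introduced for this sketch
days <- c("Monday", "Tuesday", "Wednesday",
          "Thursday", "Friday", "Saturday", "Sunday")

w_positive  <- w > 0             # is w positive?
x_in_range  <- 10 < x & x < 20   # is x between 10 and 20?
y_is_feb    <- y == "February"   # is y the word February?
z_all_days  <- all(z %in% days)  # is every value in z a day of the week?

print(c(w_positive, x_in_range, y_is_feb, z_all_days))  # TRUE TRUE TRUE TRUE
```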

    # Common mistakes

    x > 10 & < 20

    y = "February"

    all(z == "Monday" | "Tuesday" | "Wednesday".

    x > 10 & < 20

    TRUE & error!

    error!
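The three common mistakes can be reproduced safely by wrapping them in `try()`. This is a sketch under assumed values: `x`, `y`, and `z` below are hypothetical, and `parse_failed` and `or_failed` are names introduced here.

```r
# Hypothetical values (assumptions for illustration)
x <- 15
y <- "March"
z <- c("Monday", "Tuesday")

# Mistake 1: `x > 10 & < 20` cannot be parsed, because `< 20` has nothing
# on its left-hand side. Parsing the text shows the failure without stopping.
parse_failed <- inherits(try(parse(text = "x > 10 & < 20"), silent = TRUE),
                         "try-error")
x_ok <- x > 10 & x < 20          # fix: repeat x in each test

# Mistake 2: a single = assigns; it does not compare.
y = "February"                   # y has silently been overwritten
y_now <- y                       # "February", although y started as "March"
# fix: use y == "February" to test without changing y

# Mistake 3: | needs logical (or numeric) operands, not character strings.
or_failed <- inherits(try(z == "Monday" | "Tuesday", silent = TRUE),
                      "try-error")
z_ok <- all(z %in% c("Monday", "Tuesday"))   # fix: use %in%

print(c(parse_failed, or_failed))  # TRUE TRUE
```

The second mistake is the most dangerous of the three because it raises no error at all.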

    Logical subsetting

    Combining logical tests with subsetting is a very powerful technique!

    x_zeroes
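A minimal sketch of the idea, using a small hypothetical vector and data frame instead of the diamonds data so that it runs without extra packages. The names `sizes`, `is_zero`, and `df` are assumptions introduced for illustration.

```r
# Hypothetical data (not the diamonds data set)
sizes <- c(0, 4.2, 0, 6.1, 11.3)

# A logical test returns one TRUE/FALSE per element...
is_zero <- sizes == 0

# ...and subsetting with that vector keeps the elements marked TRUE.
zeroes   <- sizes[is_zero]
nonzero  <- sizes[!is_zero]

# The same technique selects rows of a data frame.
df  <- data.frame(id = 1:5, size = sizes)
big <- df[df$size > 10, ]    # rows where size exceeds 10, all columns

print(zeroes)    # 0 0
print(nonzero)   # 4.2 6.1 11.3
print(big$id)    # 5
```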


    # Prints to screen

    diamonds[diamonds$x > 10, ]

    # Saves to new data frame

    big <- diamonds[diamonds$x > 10, ]

    # Overwrites existing data frame. Dangerous!

    diamonds


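    # Uh oh!

    rm(diamonds)   # removes the overwritten copy from your workspace

    str(diamonds)  # R finds the original data set in its package again (assuming ggplot2 is loaded)

    # Phew!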
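Why `rm()` rescues the data can be sketched with the built-in `mtcars` data set, used here instead of diamonds only so that the example runs without ggplot2. Assigning to the name of a packaged data set creates a copy in your workspace that hides the original, and `rm()` deletes only that copy. The variable names starting with `n_` are introduced for this sketch.

```r
n_original <- nrow(mtcars)            # rows in the packaged data set

# Overwrite: this creates a workspace copy that hides the original.
mtcars <- mtcars[mtcars$mpg > 30, ]
n_after_overwrite <- nrow(mtcars)     # fewer rows than before

# rm() deletes only the workspace copy...
rm(mtcars)

# ...so the packaged original becomes visible again.
n_after_rm <- nrow(mtcars)

print(n_after_rm == n_original)       # TRUE
```

This safety net exists only for data that ships with a package. A data frame you created or imported yourself is gone once it is overwritten.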

    Missing values

    Data errors

    Typically, removing the entire row because of one error is overkill. It is better to selectively replace problem values with missing values.

    In R, missing values are indicated by NA.


    NA Behavior

    Missing values propagate.

    Use is.na() to check for missing values.
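A short sketch of both points. The vector `a` holds hypothetical values, since the slide's own example did not survive.

```r
# Hypothetical vector with one missing value
a <- c(1, NA, 3)

# Missing values propagate: any arithmetic or comparison with NA gives NA.
plus_one <- a + 1      # 2 NA 4
compared <- NA > 5     # NA

# Testing with == does not work, because NA == NA is itself NA.
wrong_test <- a == NA  # NA NA NA

# is.na() is the reliable test.
right_test <- is.na(a) # FALSE TRUE FALSE

print(plus_one)
print(wrong_test)
print(right_test)
```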


    Many functions, such as sum() and mean(), accept an na.rm argument to remove missing values prior to computation.
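For instance, with a hypothetical vector `b` (the slide's own values are not available):

```r
# Hypothetical vector with one missing value
b <- c(1, 2, NA, 4)

total_with_na <- sum(b)                 # NA, because the NA propagates
total_clean   <- sum(b, na.rm = TRUE)   # 7, computed from 1, 2, and 4
mean_clean    <- mean(b, na.rm = TRUE)  # mean of the three remaining values

print(total_with_na)  # NA
print(total_clean)    # 7
```

The default is `na.rm = FALSE`, so a result of NA is a useful warning that the data contain missing values.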


    Use subsetting with assignment to change individual values within an object.

    summary(diamonds$x)

    diamonds$x[diamonds$x == 0]

    diamonds$x[diamonds$x == 0] <- NA
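The same pattern can be sketched on a small hypothetical data frame, used here so the example runs without the diamonds data. The name `df` and its values are assumptions introduced for illustration.

```r
# Hypothetical measurements where 0 marks a data-entry error
df <- data.frame(x = c(3.9, 0, 4.3),
                 y = c(4.0, 3.8, 0))

# The logical test picks out the problem values; the assignment
# replaces only those values and leaves the rest of each row intact.
df$x[df$x == 0] <- NA
df$y[df$y == 0] <- NA

print(is.na(df$x))   # FALSE  TRUE FALSE
print(is.na(df$y))   # FALSE FALSE  TRUE
print(nrow(df))      # 3
```

No rows are dropped, which matches the advice to replace problem values instead of removing whole rows.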


    diamonds$y[diamonds$y == 0] <- NA

    qplot(x, y, data = diamonds)