data manipulation with r_3

Upload: -

Post on 06-Jul-2018

225 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/17/2019 Data Manipulation with R_3

    1/17

    DATA ANALYSIS - THE DATA TABLE WAY

    INDEXING

  • 8/17/2019 Data Manipulation with R_3

    2/17

    Using column names in i

    > DTA B

    1 : c 1 2 : b 2 3 : a 3 4 : c 4

    5 : b 5 6 : a 6

    > DT[A == “a” ]

    A B 1 : a 3 2 : a 6

    > DT[A %in % c ( “a” , “c” )]

    A B 1 : c 1 2 : a 3 3 : c 4 4

    : a 6

  • 8/17/2019 Data Manipulation with R_3

    3/17

    Conceptually

    > DT

    A B1 : c 12 : b 23 : a 34 : c 45 : b 56 : a 6

    > w w[ 1 ] FALSE FALSE TRUE FALSE FALSE TRUE

    > DT[w]

    A B1 : a 32 : a 6

    ,

    > DT[A == “a” ]

    A B1 : a 32 : a 6

  • 8/17/2019 Data Manipulation with R_3

    4/17

    Automatic indexing

    > DT[A == “a” ]> DT[A %in % c ( “a” , “c” )]

    These don’t actually vector scan.They create an index automatically (by default) on A, the first time

    you use column A.

    > DT[A == “b” ] # second time much faster

  • 8/17/2019 Data Manipulation with R_3

    5/17

    Let’s practice

    DATA.TABLE

  • 8/17/2019 Data Manipulation with R_3

    6/17

    DATA ANALYSIS - THE DATA TABLE WAY

    KEYS

  • 8/17/2019 Data Manipulation with R_3

    7/17

    Creating and using a key

    > DT

    A B 1 : c 1 2 : b 2 3 : a 3 4 : c 4 5 : b 5 6 : a 6

    > setkey (DT, A)

    A B 1 : a 3 2 : a 6 3 : b 2 4 : b 5 5 : c 1 6 : c 4

    > DT

    A B 1 : a 3 2 : a 6 3 : b 2 4 : b 5 5 : c 1 6 : c 4

    > DT[ “b” ]

    A B1 : b 2 2 : b 5

    > DF[ “b” ] Error: duplicate 'row.namare not allowed

  • 8/17/2019 Data Manipulation with R_3

    8/17

    mult

    > setkey (DT, A)

    A B 1 : a 3 2 : a 6 3 : b 2 4 : b 5 5 : c 1 6 : c 4

    > DT[ “b” , mult = “first” ]

    A B 1 : b 2

    > DT[ “b” , mult = “last” ]

    A B 1 : b 5

  • 8/17/2019 Data Manipulation with R_3

    9/17

    nomatch

    > DT[ c ( “b” , “d” )

    A B 1 : b 2 2 : b 5 3 : d NA

    > DT[ c ( “b” , “d” ), nomatch = 0 ]

    A B 1 : b 2 2 : b 5

    > setkey (DT, A)

    A B 1 : a 3 2 : a 6 3 : b 2 4 : b 5 5 : c 1 6 : c 4

    ], nomatch = NA ] # default

  • 8/17/2019 Data Manipulation with R_3

    10/17

    A two-column key

    > setkey (DT, A, B)

    A B C 1 : a 2 6 2 : a 6 3 3 : b 1 2

    4 : b 5 5 5 : c 3 4 6 : c 4 1

    > DT

    A B C 1 : c 4 1 2 : b 1 2 3 : a 6 3

    4 : c 3 4 5 : b 5 5 6 : a 2 6

    > DT[.( “b” , 5 )]

    A B C 1 : b 5 5

    > DT[.( “b” , 6 )]

    A B C 1 : b 6 NA

    > DT[.( “b” )]

    A B C 1 : b 1 2 2 : b 5 5

    b

    b 5

  • 8/17/2019 Data Manipulation with R_3

    11/17

    Let’s practice

    DATA.TABLE

  • 8/17/2019 Data Manipulation with R_3

    12/17

    DATA ANALYSIS - THE DATA TABLE WAY

    ROLLING JOINS

  • 8/17/2019 Data Manipulation with R_3

    13/17

    Ordered joins

    > setkey (DT, A, B)

    A B C 1 : a 2 6 2 : a 6 3 3 : b 1 2 4 : b 5 5 5 : c 3 4 6 : c 4 1

    > DT[.( “b” , 4 ), roll = “nearest” ]

    A B C 1 : b 4 5

    > DT[.( “b” , 4 ), roll = TRUE ]

    A B C 1 : b 4 2

    > DT[.( “b” , 4 )]

    A B C 1 : b 4 NA

    a

    a

    b

    b

    c

    c

    b

    b

  • 8/17/2019 Data Manipulation with R_3

    14/17

    Forwards and backwards

    > setkey (DT, A, B)

    A B C 1 : a 2 6 2 : a 6 3 3 : b 1 2 4 : b 5 5 5 : c 3 4 6 : c 4 1

    > DT[.( “b” , 4 ), roll = +Inf ]

    A B C 1 : b 4 2

    > DT[.( “b” , 4 ), roll = -Inf ]

    A B C 1 : b 4 5

  • 8/17/2019 Data Manipulation with R_3

    15/17

    Limited staleness

    > setkey (DT, A, B)

    A B C 1 : a 2 6 2 : a 6 3 3 : b 1 2 4 : b 5 5 5 : c 3 4 6 : c 4 1

    > DT[.( “b” , 4 ), roll = 2 ]

    A B C 1 : b 4 NA

    > DT[.( “b” , 4 ), roll = - 2 ]

    A B C 1 : b 4 5

  • 8/17/2019 Data Manipulation with R_3

    16/17

    Control ends> DT[.( “b” , 7 : 8 ), roll = TRUE ]

    A B C 1 : b 7 5 2 : b 8 5

    > DT[.( “b” , 7 : 8 ), roll = TRUE , rollends =

    > setkey (DT, A, B)

    A B C 1 : a 2 6 2 : a 6 3 3 : b 1 2

    4 : b 5 5 5 : c 3 4 6 : c 4 1

    A B C 1 : b 7 NA 2 : b 8 NA

  • 8/17/2019 Data Manipulation with R_3

    17/17

    Let’s practice

    DATA.TABLE