TRANSCRIPT
Workshop: Getting Started with R. UTM, 14 Oct 2018. © Dr. Norhaiza Ahmad
14 October 2018 Dr. Norhaiza Ahmad
Department of Mathematical Sciences
Faculty of Science
Universiti Teknologi Malaysia
http://science.utm.my/norhaiza/
Getting Started with R
for newbies
PART D: DATA STRUCTURE IN R
Outline
PART D: DATA STRUCTURE IN R
1. Types of Data Structure
   • Scalars & Vectors
   • Matrices
   • Other structures: Factors, Lists, Data Frames
2. Checking and Changing Data Object Structure
Data Structure in R
Recall
• In Part A, you experimented with object assignment using R:
> x = 2; x
> len = 2; len
> x = 2; len = 2; x + 2
These are examples of data objects in R, i.e. a variable/name assigned to a value.
Handy Tip
To check the type, i.e. the mode, of a data structure:
> x = 5; x
[1] 5
> mode(x)
[1] "numeric"
> mode(3 < 4)
[1] "logical"
> mode("Apa khabar?")
[1] "character"
Call up an R in-built dataset:
> iris
Check the structure of the dataset:
> str(iris)
'data.frame':	150 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
Type of data = data frame
Data Object Structure in R
• There is a wide variety of data object structures in R:
  • Scalars
  • Vectors
  • Factors
  • Matrices & Arrays
  • Lists
  • Data Frames
• Data structures consist of values, and the values have a mode:
  • Numeric
  • Logical (TRUE/FALSE)
  • Strings
IMPORTANT TO KNOW THE STRUCTURE OF DATA, as different R functions might require a particular data structure.
Example Iris Dataset: Flowers with Measurements
     Sepal Length  Sepal Width  Petal Length  Petal Width
 1       5.1           3.5          1.4           0.2
 2       4.9           3            1.4           0.2
 3       4.7           3.2          1.3           0.2
 4       4.6           3.1          1.5           0.2
 5       5             3.6          1.4           0.2
 6       5.4           3.9          1.7           0.4
 7       4.6           3.4          1.4           0.3
 8       5             3.4          1.5           0.2
 9       4.4           2.9          1.4           0.2
10       4.9           3.1          1.5           0.1
Highlighted in the table: a Scalar (one value), a Vector (one column of values) and a Matrix (several rows and columns of values).
Computation in R is the manipulation of data structures (Linear Algebra). In this section we learn to understand and manipulate data in R.
Types of Data Object Structure: Scalar
• What are other functions in R?
• Where to find other functions?
• The simplest type of data object structure is a scalar.
• A scalar is a data object with one value.
> x = 5   #create scalar data object
> y = 2
> x*y+2
[1] 12
> "Apa khabar?"
[1] "Apa khabar?"
> 3 < 4
[1] TRUE
Types of Data Object Structure: Vectors
• A vector's values can be numbers, strings, logical values, or any other type, as long as they're all the same type.
• The c() function (c is short for Combine/Concatenate) creates a new vector by combining a list of values.
(1) This is a scalar
> 5
[1] 5
Assign the scalar to an R object
> x=5;x
[1] 5
(2) A vector's values can be numbers
> c(2,4,5)
[1] 2 4 5
Assign the vector to an R object
> x=c(2,4,5);x
[1] 2 4 5
(3) A vector's values can be strings
> c('a','b')
[1] "a" "b"
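As a small illustration of the "same type" rule, the sketch below shows what c() does when it is given values of mixed types: R converts them to one common mode. The object names are made up for this example and do not come from the workshop.

```r
# Hypothetical object names, for illustration only.
# Mixing modes in c(): R coerces all values to a single common mode.
num_and_char <- c(1, "a", TRUE)    # a string is present, so everything becomes character
num_and_lgl  <- c(1, TRUE, FALSE)  # logical values become numeric (TRUE = 1, FALSE = 0)

print(num_and_char)
print(mode(num_and_char))
print(num_and_lgl)
print(mode(num_and_lgl))
```

Checking mode() after building a vector is a quick way to spot an unintended conversion.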
Extracting & Assigning elements from vectors
We can EXTRACT an element from the vector, or a subset of the vector, by indicating the INDEX of THE ELEMENTS using square brackets [ ].
> x=c(1:10)*2
> x
[1] 2 4 6 8 10 12 14 16 18 20

No  Task                                        R Code          Output
1   Extract the 6th element in x                > x[6]          [1] 12
2   Extract the 2nd to 6th element              > x[2:6]        [1] 4 6 8 10 12
3   Extract the 1st, 3rd and 5th element in x   > x[c(1,3,5)]   [1] 2 6 10
4   Extract in reverse order                    > x[3:1]        [1] 6 4 2
Extracting & Assigning elements from vectors
We can EXTRACT a subset of a vector and assign the extracted subset to a new vector.

No  Task                                    R Code                   Output
5   Extract distinct ranges                 > x[c(1:3,5:6)]          [1] 2 4 6 10 12
                                            > x[c(1:3,7,10)]         [1] 2 4 6 14 20
6   Extract repeated index                  > x[rep(c(9,10),2)]      [1] 18 20 18 20
7   Extract and assign subset to a vector   > ab=x[c(1:3,7,10)]
                                            > ab                     [1] 2 4 6 14 20
8   Extract with a logical vector           > x>10                   [1] FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE
                                            > x[x>10]                [1] 12 14 16 18 20
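The same square-bracket indexing can also be used on the left-hand side of an assignment to change elements in place. The sketch below is an illustration only; the copy y is a name introduced here so that x keeps the values used in the tables above.

```r
x <- c(1:10) * 2           # same vector as in the slides: 2 4 6 ... 20

y <- x                     # hypothetical copy, so x itself stays unchanged
y[6] <- 0                  # replace the 6th element
y[c(1, 3)] <- c(-1, -3)    # replace the 1st and 3rd elements
y[y > 15] <- 15            # logical index: cap every value above 15 at 15

print(x)
print(y)
```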
Excluding elements from vectors
We can EXCLUDE elements from a vector by indicating the NEGATIVE index of the element(s) using square brackets [ ].

No  Task                                        R Code           Output
9   Exclude the 6th element in x                > x[-6]          [1] 2 4 6 8 10 14 16 18 20
10  Exclude the 2nd to 6th element              > x[-(2:6)]      [1] 2 14 16 18 20
11  Exclude the 1st, 3rd and 5th element in x   > x[-c(1,3,5)]   [1] 4 8 12 14 16 18 20
Other basic functions: applied to vectors
> x=6:15
> length(x)      #Number of elements in x
[1] 10
> max(x)         #Largest value in x
[1] 15
> min(x)         #Least value in x
[1] 6
> sum(x)         #Sum of all values in x
[1] 105
> prod(x)        #Product of all values in x
[1] 10897286400
> mean(x)        #Average of all values in x
[1] 10.5
> range(x)       #Range of vector x
[1] 6 15
> var(x)         #Variance of x
[1] 9.166667
> sd(x)          #Standard deviation of x
[1] 3.02765
> sqrt(var(x))   #Square root of variance = sd of x
[1] 3.02765
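To see what var() and sd() compute, the sketch below rebuilds them from the simpler functions listed above. It assumes the usual sample definitions (dividing by n - 1), which is what R's var() uses; the names var_by_hand and sd_by_hand are invented for this example.

```r
x <- 6:15
n <- length(x)

# Sample variance: sum of squared deviations from the mean, divided by n - 1
var_by_hand <- sum((x - mean(x))^2) / (n - 1)
# Standard deviation: square root of the variance
sd_by_hand <- sqrt(var_by_hand)

print(all.equal(var_by_hand, var(x)))  # compares with the built-in var()
print(all.equal(sd_by_hand, sd(x)))    # compares with the built-in sd()
```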
Types of Data Object Structure: Matrices
• R stores data elements in a 2-dimensional matrix using the function matrix()
• Computationally efficient: manipulate data as matrices
• An array is a matrix with more than 2 dimensions
Matrices in R
A =
2 5
5 6
1 8
Each of the following calls creates the same matrix:
> x.mat=matrix(c(2,5,1,5,6,8),nrow=3,ncol=2)
> x.mat=matrix(c(2,5,1,5,6,8),ncol=2)
> x.mat=matrix(c(2,5,1,5,6,8),3,2)
> x.mat
     [,1] [,2]
[1,]    2    5
[2,]    5    6
[3,]    1    8
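Note that matrix() fills the matrix column by column, which is why the values are typed as 2, 5, 1, 5, 6, 8. The sketch below contrasts this default with the byrow = TRUE argument of matrix(); the object names are chosen for this illustration only.

```r
vals <- c(2, 5, 1, 5, 6, 8)

# Default: values fill the first column top to bottom, then the second column
by_col <- matrix(vals, nrow = 3, ncol = 2)
# byrow = TRUE: values fill the first row left to right, then the next row
by_row <- matrix(vals, nrow = 3, ncol = 2, byrow = TRUE)

print(by_col)
print(by_row)
```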
Extracting element(s) from a matrix
Just like vectors, elements are indicated by the labels in the matrix: [ROW,COLUMN]
> x.mat
     [,1] [,2]
[1,]    2    5
[2,]    5    6
[3,]    1    8

1. Extract the element in the 3rd row of the 1st column
> x.mat[3,1]
[1] 1
2. Extract all observations in the 2nd column
> x.mat[ ,2]
[1] 5 6 8
3. Extract all observations in the 3rd row
> ?
[1] 1 8
4. Extract submatrices
> x.mat[1:2,]
     [,1] [,2]
[1,]    2    5
[2,]    5    6
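One possible answer to task 3 follows the same [ROW,COLUMN] pattern with the column position left empty. This is a sketch rather than the workshop's official solution; row3 and row3_mat are names made up here, and drop = FALSE is a standard argument of [ that keeps the result as a matrix.

```r
x.mat <- matrix(c(2, 5, 1, 5, 6, 8), nrow = 3, ncol = 2)

row3 <- x.mat[3, ]                     # 3rd row returned as a plain vector
row3_mat <- x.mat[3, , drop = FALSE]   # 3rd row kept as a 1 x 2 matrix

print(row3)
print(dim(row3_mat))
```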
Combining matrix or vectors (matrix form) to a matrix
Combine matrices by columns: cbind()
(Provided that the length is appropriate)
> x.mat
     [,1] [,2]
[1,]    2    5
[2,]    5    6
[3,]    1    8

1. Combining a vector
> y.mat=matrix(1:3,3,1)
> y.mat
     [,1]
[1,]    1
[2,]    2
[3,]    3
> cbind(x.mat,y.mat)
     [,1] [,2] [,3]
[1,]    2    5    1
[2,]    5    6    2
[3,]    1    8    3
Combining matrix or vectors (matrix form) to a matrix
Combine matrices by rows: rbind()

1. Create matrix z.mat in R
> ?
> z.mat
     [,1] [,2]
[1,]    1    4
[2,]    2    5
2. Combine matrix x.mat and z.mat by rows and assign the new matrix as A.mat
> ?
> A.mat
     [,1] [,2]
[1,]    2    5
[2,]    5    6
[3,]    1    8
[4,]    1    4
[5,]    2    5
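One way the two exercises could be completed is sketched below. It is a suggested answer, not necessarily the one given in the workshop; it relies on matrix() filling column by column and on rbind() requiring both matrices to have the same number of columns.

```r
x.mat <- matrix(c(2, 5, 1, 5, 6, 8), nrow = 3, ncol = 2)

# Task 1: values listed column by column (1, 2 then 4, 5)
z.mat <- matrix(c(1, 2, 4, 5), nrow = 2, ncol = 2)

# Task 2: stack z.mat underneath x.mat (both have 2 columns)
A.mat <- rbind(x.mat, z.mat)

print(z.mat)
print(A.mat)
```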
Transposing a matrix: t()

1. Transpose a matrix
> t(x.mat)
     [,1] [,2] [,3]
[1,]    2    5    1
[2,]    5    6    8
Matrix Arithmetic Operations
Matrix multiplication: %*% (ensure appropriate dimensions)

1. Check dimension of matrix
> dim(x.mat)
[1] 3 2
> dim(y.mat)
[1] 3 1
#clearly these two matrices cannot be multiplied
2. Matrix multiplication
> dim(t(x.mat))
[1] 2 3
> t(x.mat)%*%y.mat
     [,1]
[1,]   15
[2,]   41
3. Inverse of a matrix: solve() function. The matrix must be square and not singular.
> A=matrix(sample(4),2,2)   # any matrix; sample(4) is random, so your A may differ
> A
     [,1] [,2]
[1,]    2    3
[2,]    4    1
> solve(A)   #inverse of A
     [,1] [,2]
[1,] -0.1  0.3
[2,]  0.4 -0.2
#TEST that A multiplied by its inverse gives the identity matrix I
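The suggested test, that A multiplied by its inverse gives the identity matrix, can be sketched as follows. To keep the result reproducible, A is typed in directly with the values shown in the output above instead of being drawn with sample(4); A_inv and I_check are names introduced for this example. all.equal() is used because floating-point results may differ from the exact identity by tiny rounding errors.

```r
# The matrix A from the slide output, entered column by column
A <- matrix(c(2, 4, 3, 1), nrow = 2, ncol = 2)

A_inv <- solve(A)        # inverse of A
I_check <- A %*% A_inv   # should be (numerically) the 2 x 2 identity matrix

print(A_inv)
print(all.equal(I_check, diag(2)))   # diag(2) builds the 2 x 2 identity
```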
Other Types of Data Object Structure: Factor
factor()
• A type of character/string vector
• Typically used to describe the data, not for calculations
• Labels for qualitative variables
> quality=c("High","Medium","Low")
> quality=factor(quality)
> quality
[1] High   Medium Low
Levels: High Low Medium
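In the output above the levels are listed alphabetically (High, Low, Medium), which is the default of factor(). If a different order is wanted, the levels argument of factor() can set it explicitly. The sketch below is an illustration; quality_default and quality_ordered are names made up here.

```r
quality <- c("High", "Medium", "Low")

quality_default <- factor(quality)   # levels sorted alphabetically
quality_ordered <- factor(quality, levels = c("Low", "Medium", "High"))

print(levels(quality_default))
print(levels(quality_ordered))
print(as.integer(quality_ordered))   # position of each value within the levels
```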
Other Types of Data Object Structure: Lists
list()
• Data objects that can contain elements of any kind
• Contain elements of miscellaneous modes
• Useful for organising information
> mylist= list(5,6,c(1,2,3),c("blue","red"),x.mat)
> mylist
[[1]]
[1] 5

[[2]]
[1] 6

[[3]]
[1] 1 2 3

[[4]]
[1] "blue" "red"

[[5]]
     [,1] [,2]
[1,]    2    5
[2,]    5    6
[3,]    1    8

Elements of a list can be extracted:
> mylist[[5]]
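The sketch below looks a little closer at extraction from a list: double brackets [[ ]] return the element itself, while single brackets [ ] return a shorter list. The named list at the end is an extra illustration that is not part of the slides; all new object names are hypothetical.

```r
x.mat <- matrix(c(2, 5, 1, 5, 6, 8), nrow = 3, ncol = 2)
mylist <- list(5, 6, c(1, 2, 3), c("blue", "red"), x.mat)

third <- mylist[[3]]             # the element itself: a numeric vector
third_sub <- mylist[3]           # still a list, of length 1
second_colour <- mylist[[4]][2]  # 2nd value inside the 4th element

# Elements can also be given names and then extracted with $
named <- list(size = 5, colours = c("blue", "red"))

print(third)
print(class(third_sub))
print(second_colour)
print(named$colours)
```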
Other Types of Data Object Structure: Data Frames
data.frame()
• Similar to a spreadsheet
• Each column is a vector. Elements in each vector have the same mode. Different vectors can have different modes.
• All vectors in the data frame must be the same length
> x=1:2
> y=c("a","b")
> z=c(100,200)
> A.df=data.frame(x,y,z);A.df
  x y   z
1 1 a 100
2 2 b 200
Other Types of Data Object Structure: Data Frames
data.frame()
• If the elements are defined within a data frame, they cannot be called directly by name.
• Use attach(name of data frame) to read the elements, or use the $ sign.
> A.df=data.frame(x1=1:2,y1=c("a","b"),c1=c(100,200))
> A.df
  x1 y1  c1
1  1  a 100
2  2  b 200
> x1
Error: object 'x1' not found
> attach(A.df)
> x1
[1] 1 2
> # or use $ sign
> A.df$x1
[1] 1 2
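Besides attach() and the $ sign, base R offers a few other ways to reach a column without attaching the data frame. The sketch below lists some of them as an illustration; the result names (col_dollar, col_name, col_index, total) are invented for this example.

```r
A.df <- data.frame(x1 = 1:2, y1 = c("a", "b"), c1 = c(100, 200))

col_dollar <- A.df$x1       # by name with $
col_name   <- A.df[["x1"]]  # by name with double brackets
col_index  <- A.df[, 1]     # by position: all rows, 1st column

# with() evaluates an expression using the columns of A.df,
# without attaching the data frame
total <- with(A.df, sum(x1 * c1))

print(col_dollar)
print(total)
```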
Checking Data Object Structure
• Identify types of data structure: vector, matrix, list etc.
is.<what>   E.g. is.vector(); is.matrix(); is.numeric(); is.character()
> x<-c(1,2,3,4)
> #check data object type
> is.vector(x)
[1] TRUE
> is.data.frame(x)
[1] FALSE
> #check data mode
> is.character(x)
[1] FALSE
> is.numeric(x)
[1] TRUE
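The same is.<what> checks can be applied to the other structures met earlier. The sketch below collects a few of them in one named logical vector; the labels inside checks are made up for readability, and iris is the in-built dataset used earlier.

```r
x <- c(1, 2, 3, 4)
x.mat <- matrix(x, nrow = 2, ncol = 2)

checks <- c(
  vector_is_vector  = is.vector(x),
  matrix_is_vector  = is.vector(x.mat),   # a matrix is not a plain vector
  matrix_is_matrix  = is.matrix(x.mat),
  matrix_is_numeric = is.numeric(x.mat),  # the mode of its values is still numeric
  iris_is_df        = is.data.frame(iris)
)
print(checks)
```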
Change Data Object Structure
• Forcing a structure to another
as.<what>   E.g. as.vector(); as.matrix(); as.numeric(); as.character()
> x<-c(1,2,3,4)
> x1=as.matrix(x)
> x
[1] 1 2 3 4
> x1
     [,1]
[1,]    1
[2,]    2
[3,]    3
[4,]    4
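A few more as.<what> conversions are sketched below, including a case where the conversion cannot succeed. The example values and object names are assumptions made for illustration. When a string cannot be read as a number, as.numeric() returns NA and issues a warning, which is silenced here with suppressWarnings().

```r
chars <- c("1", "2.5", "10")
nums <- as.numeric(chars)    # character strings to numbers
back <- as.character(nums)   # numbers back to character strings

bad <- suppressWarnings(as.numeric("abc"))   # not a number, so the result is NA

m <- matrix(1:6, nrow = 2, ncol = 3)
flat <- as.vector(m)         # matrix to vector, read column by column

print(nums)
print(back)
print(bad)
print(flat)
```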
Next
• Part E: Read/Import Data
Handy Tip
• To call a variable in a data frame, we could use the $ sign, or attach(<dataframe>) and then call the variable.
• However, if the name of the variable in the data frame has already been defined earlier as a data object outside the data frame (in the Global environment), calling the variable after attach() might fail, as the variable in the data frame could be 'masked' by the data object.
• Say
> x1=c("here","there")   #defined as a data object
Then
> A.df=data.frame(x1=1:2,y1=c("a","b"),c1=c(100,200))
Thus, if
> attach(A.df);x1
then x1 is displayed as "here","there" instead of 1:2.
• One way is to remove the data object x1
> rm(x1)
Then call x1 again (attach A.df first if it is not already attached).
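The masking described in this tip can be reproduced with the short sketch below. It assumes the code is run at the top level of a fresh R session; masked_value and unmasked_value are names introduced here to record what x1 refers to at each step, and detach() is added at the end to undo the attach().

```r
x1 <- c("here", "there")   # data object defined outside the data frame
A.df <- data.frame(x1 = 1:2, y1 = c("a", "b"), c1 = c(100, 200))

attach(A.df)               # R reports that x1 is masked by the existing object
masked_value <- x1         # the earlier object is found first

rm(x1)                     # remove the masking object
unmasked_value <- x1       # now the column of the attached data frame is found

detach(A.df)               # tidy up: undo the attach()

print(masked_value)
print(unmasked_value)
```

Using A.df$x1 avoids the problem altogether, since it always refers to the column inside the data frame.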