sas n r lab record

Upload: anshuman-singh

Post on 05-Apr-2018

  • 7/31/2019 SAS n r lab record

    1/18

    SAS-Statistical Analysis System- & R Lab

    Lab Record

    AMITY UNIVERSITY RAJASTHAN

    Submitted to: Mr. Amit Kumar, Faculty

    Submitted by: Chitralekha Rathore, Yogita Kumawat, Parul Singh

    B.Tech Bioinformatics, AIB, VIII Sem

    EXPERIMENT-1

    OBJECTIVE: To explore various data-types available in R

    SYNTAX:

    1 Double

    If you do calculations on numbers, you can use the data type double to represent the numbers. Doubles are numbers like 3.1415, 8.0 and 8.1. Doubles are used to represent continuous variables like the weight or length of a person.

    Use the function is.double to check if an object is of type double. Alternatively, use the function typeof to ask R for the type of the object x.
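
    The two checks described above can be sketched as follows. The variable names and values are illustrative assumptions, not taken from the original record.

```r
# Hypothetical example values; x and y are illustrative names.
x <- 8.1
y <- 3            # a plain numeric literal is stored as a double

is.double(x)      # TRUE
typeof(x)         # "double"
typeof(y)         # "double"
```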

    2 Integer

    Integers are whole numbers. They can be used to represent counting variables, for example the number of children in a household.

    Note that 3.0 is not an integer, nor is 3 by default an integer! Write 3L or use as.integer(3) to get an integer.

    However, you can mix objects of type `double' and `integer' in one calculation without any problems.

    The maximum integer in R is 2^31 - 1.
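
    A minimal sketch of the integer behaviour described above. The names a and b are illustrative assumptions.

```r
a <- 3L                    # the L suffix creates an integer
b <- as.integer(3)         # explicit conversion from double

typeof(3)                  # "double": 3 is not an integer by default
typeof(a)                  # "integer"
typeof(a + 1.5)            # "double": mixing the types promotes to double
.Machine$integer.max       # 2147483647, i.e. 2^31 - 1
```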

    3 Complex

    Objects of type `complex' are used to represent complex numbers. In statistical data analysis you will not need them often. Use the function as.complex or complex to create objects of type complex.
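
    As a short illustration of the two functions named above (the values are arbitrary assumptions):

```r
z1 <- complex(real = 1, imaginary = 2)   # 1+2i
z2 <- as.complex(-1)                     # -1+0i

typeof(z1)    # "complex"
sqrt(z2)      # 0+1i, whereas sqrt(-1) on a double gives NaN with a warning
Re(z1)        # real part: 1
Im(z1)        # imaginary part: 2
```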

    4 Logical

    An object of data type logical can have the value TRUE or FALSE and is used to indicate if a condition is true or false. Such objects are usually the result of logical expressions.

    The result of the function is.double is an object of type logical (TRUE or FALSE).

    Logical expressions are often built from comparison operators:

    < smaller than

    <= smaller than or equal to

    > larger than

    >= larger than or equal to

    == is equal to

    != is unequal to

    The logical operators and, or and not are given by &, | and !, respectively.

    Calculations can also be carried out on logical objects, in which case FALSE is replaced by a zero and TRUE by a one. For example, the sum function can be used to count the number of TRUE's in a vector or array.
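
    The operators and the counting trick above can be sketched as follows. The vector v is an illustrative assumption.

```r
v <- c(10, 5, 3, 6)          # hypothetical data

big <- v > 4                 # TRUE TRUE FALSE TRUE
typeof(big)                  # "logical"
sum(big)                     # 3: each TRUE counts as one

both <- v > 4 & v != 10      # FALSE TRUE FALSE TRUE
!big                         # FALSE FALSE TRUE FALSE
```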

    5 Character

    A character object is represented by a collection of characters between double quotes ("). For example: "x", "test character" and "iuiu8ygy-iuhu". One way to create character objects is as follows.

    The double quotes indicate that we are dealing with an object of type `character'.
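
    A minimal sketch of creating and checking a character object. The name s is an illustrative assumption and this is not the listing from the original record.

```r
s <- c("x", "test character")   # hypothetical character vector

typeof(s)         # "character"
is.character(s)   # TRUE
nchar(s)          # 1 14: number of characters in each element
```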

    6 Factor

    The factor data type is used to represent categorical data (i.e. data of which the value range is a collection of codes). For example: variable `sex' with values male and female, or variable `blood type' with values A, AB and O. An individual code of the value range is also called a level of the factor variable. So the variable `sex' is a factor variable with two levels, male and female.

    The object sex is a character object. You need to transform it to a factor.

    Use the function levels to see the different levels a factor variable has.

    Note that the result of the levels function is of type character. Another way to generate the sex variable is as follows:

    The object `sex' is an integer variable; you need to transform it to a factor.
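
    Both routes to a factor can be sketched as below. The data values and the coding 1 = male, 2 = female are assumptions made for illustration only.

```r
# Route 1: start from a character vector (hypothetical data)
sex <- c("male", "female", "male", "male")
sex <- factor(sex)
levels(sex)               # "female" "male" (sorted alphabetically)
typeof(levels(sex))       # "character"

# Route 2: start from integer codes; the coding 1 = male, 2 = female is assumed
sex2 <- c(1L, 2L, 1L, 1L)
sex2 <- factor(sex2, levels = 1:2, labels = c("male", "female"))
levels(sex2)              # "male" "female"
```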

    7 Dates and Times

    To represent a calendar date in R use the function as.Date to create an object of class Date.

    You can add a number to a date object; the number is interpreted as the number of days to add to the date.
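
    A short sketch of date arithmetic, using an arbitrary date chosen for illustration:

```r
d <- as.Date("2018-04-05")   # arbitrary example date, ISO format yyyy-mm-dd

class(d)                     # "Date"
later <- d + 30              # 30 days later
later                        # "2018-05-05"
as.numeric(later - d)        # 30: difference between two dates, in days
```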

    8 Missing data and Infinite values

    We have already seen the symbol NA. In R it is used to represent `missing' data (Not Available). It is not really a separate data type; it could be a missing double or a missing integer. To check if data is missing, use the function is.na. A direct comparison with the symbol NA does not work, because x == NA itself evaluates to NA. There is also the symbol NaN (Not a Number), which can be detected with the function is.nan.

    Infinite values are represented by Inf or -Inf. You can check if a value is infinite with the function is.infinite. Use is.finite to check if a value is finite.

    In R NULL represents the null object. NULL is used mainly to represent lists with zero length, and is often returned by expressions and functions whose value is undefined.
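
    The checks above can be sketched as follows. The vector x is an illustrative assumption.

```r
x <- c(1, NA, 3)                # hypothetical data with one missing value

is.na(x)                        # FALSE TRUE FALSE
x == NA                         # NA NA NA: comparison with NA is never TRUE
mean(x, na.rm = TRUE)           # 2: missing values dropped before averaging

is.nan(0 / 0)                   # TRUE
is.infinite(c(1 / 0, -1 / 0))   # TRUE TRUE
is.finite(c(1, Inf, NA))        # TRUE FALSE FALSE

is.null(NULL)                   # TRUE
length(NULL)                    # 0
```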

    EXPERIMENT-2

    OBJECTIVE: To understand the types of data structure

    SYNTAX

    Before you can perform statistical analysis in R, your data has to be structured in some coherent way. To store your data R has the following structures:

    vector

    matrix

    array

    data frame

    time-series

    list

    1 Vectors

    The simplest structure in R is the vector. A vector is an object that consists of a number of elements of the same type, for example all doubles or all logicals. A vector with the name `x' consisting of four elements of type `double' (10, 5, 3, 6) can be constructed using the function c.

    The function c merges an arbitrary number of vectors into one vector. A single number is regarded as a vector of length one.

    Use the function round to round the numbers in a vector.
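
    A minimal sketch of c and round. The vector x uses the values named in the text; y and the rounded numbers are illustrative assumptions.

```r
x <- c(10, 5, 3, 6)              # the four doubles from the text
y <- c(x, 0.55, x)               # c merges vectors and single numbers

length(y)                        # 9
typeof(x)                        # "double"
round(c(3.14159, 2.71828), 2)    # 3.14 2.72
```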

    2 Matrices

    Generating matrices

    A matrix can be regarded as a generalization of a vector. As with vectors, all the elements of a matrix must be of the same data type. A matrix can be generated in several ways. For example:

    Use the function dim:

    Use the function matrix:
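
    Both approaches can be sketched as below. The values and dimensions are illustrative assumptions, not the listings from the original record.

```r
# Using dim: give an existing vector a 2 x 4 shape (filled column by column)
x <- 1:8
dim(x) <- c(2, 4)
x[1, 2]           # 3

# Using matrix: byrow = TRUE fills row by row instead
m <- matrix(1:8, nrow = 2, ncol = 4, byrow = TRUE)
m[1, 2]           # 2
m[2, 1]           # 5
```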

    3 Arrays

    Arrays are generalizations of vectors and matrices. A vector is a one-dimensional array and a matrix is a two-dimensional array. As with vectors and matrices, all the elements of an array must be of the same data type. An example of an array is the three-dimensional array `iris3', which is a built-in data object in R. A three-dimensional array can be regarded as a block of numbers.

    All basic arithmetic operations which apply to matrices are also applicable to arrays and are performed on each element.

    The function array is used to create an array object.
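
    A minimal sketch of the array function and element-wise arithmetic. The dimensions chosen for a are an illustrative assumption.

```r
a <- array(1:24, dim = c(2, 3, 4))   # a 2 x 3 x 4 block of numbers

dim(a)           # 2 3 4
a[2, 3, 4]       # 24: the last element
b <- a * 2       # arithmetic is applied to each element
b[1, 1, 2]       # 14, since a[1, 1, 2] is 7

dim(iris3)       # 50 4 3: the built-in three-dimensional array
```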

    4 Data frames

    Data frames can also be regarded as an extension to matrices. Data frames can have columns of different data types and are the most convenient data structure for data analysis in R. In fact, most statistical modeling routines in R require a data frame as input.

    One of the built-in data frames in R is `mtcars'.

    The attributes of a data frame can be retrieved separately from the data frame with the functions names and row.names.
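
    As a sketch, a small data frame with columns of different types, followed by names and row.names on the built-in `mtcars'. The data frame df and its contents are illustrative assumptions.

```r
# Hypothetical data frame: character, double and logical columns side by side
df <- data.frame(
  name   = c("a", "b", "c"),
  age    = c(23, 35, 50),
  smoker = c(TRUE, FALSE, FALSE)
)

names(df)                     # "name" "age" "smoker"
row.names(df)                 # "1" "2" "3"

names(mtcars)[1:3]            # "mpg" "cyl" "disp"
head(row.names(mtcars), 2)    # "Mazda RX4" "Mazda RX4 Wag"
```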

    5 Time-series objects

    In R a time-series object (an object of class `ts') is created with the function ts. It combines two components:

    The data, a vector or matrix of numeric values. In case of a matrix, each column is a separate time-series.

    The dates of the data, which are equispaced points in time.

    The function tsp returns the start and end time, and also the frequency, without printing the complete data of the time-series.
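
    A minimal sketch of ts and tsp, assuming eight made-up quarterly observations starting in the first quarter of 2018:

```r
# Hypothetical quarterly data; frequency = 4 means four observations per year
sales <- ts(c(12, 15, 11, 18, 20, 17, 22, 25),
            start = c(2018, 1), frequency = 4)

class(sales)   # "ts"
tsp(sales)     # 2018.00 2019.75 4.00: start, end, frequency
```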

    6 Lists

    A list is like a vector. However, an element of a list can be an object of any type and structure. Consequently, a list can contain another list and therefore it can be used to construct arbitrary data structures. Lists are often used for output of statistical routines in R. The output object is often a collection of parameter estimates, residuals, predicted values etc.

    For example, consider the output of the function lsfit. In its most simple form the function fits a least squares regression.
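
    A short sketch of inspecting the list returned by lsfit. The data are an assumption chosen so that the fitted line is exactly y = 1 + 2x.

```r
x <- c(1, 2, 3, 4, 5)      # hypothetical predictor
y <- 2 * x + 1             # response lying exactly on a line

fit <- lsfit(x, y)         # least squares fit with an intercept

is.list(fit)               # TRUE
names(fit)                 # includes "coefficients" and "residuals"
fit$coefficients           # intercept 1, slope 2
fit$residuals              # all zero up to rounding error
```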

    EXPERIMENT-3

    OBJECTIVE: To Perform computing on a data matrix

    SYNTAX:

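    Creating mat8 and mat9

    >#the two assignments are reconstructed from the printed matrices

    >mat8 <- matrix(1:6, 2)

    >mat8

    [,1] [,2] [,3]

    [1,] 1 3 5

    [2,] 2 4 6

    >mat9 <- matrix(rep(1:2, 3), 2)

    >mat9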

    [,1] [,2] [,3]

    [1,] 1 1 1

    [2,] 2 2 2

    Computations on data matrix are :

    1.)addition

    >mat9 + mat8

    [,1] [,2] [,3]

    [1,] 2 4 6

    [2,] 4 6 8

    >mat9 + 3

    [,1] [,2] [,3]

    [1,] 4 4 4

    [2,] 5 5 5

    2.)subtraction

    >mat8 - mat9

    [,1] [,2] [,3]

    [1,] 0 2 4

    [2,] 0 2 4

    3.)inverse

    >solve(mat8[, 2:3])

    [,1] [,2]

    [1,] -3 2.5

    [2,] 2 -1.5

    4.)transpose

    >t(mat9)

    [,1] [,2]

    [1,] 1 2

    [2,] 1 2

    [3,] 1 2

    5.)multiplication

    >#we transpose mat9 since the dimensions of the matrices have to match

    >#dim(2, 3) times dim(3, 2)

    >mat8 %*% t(mat9)

    [,1] [,2]

    [1,] 9 18

    [2,] 12 24

    6.)element-wise multiplication

    >mat8 * mat9

    [,1] [,2] [,3]

    [1,] 1 3 5

    [2,] 4 8 12

    >mat8 * 4

    [,1] [,2] [,3]

    [1,] 4 12 20

    [2,] 8 16 24

    7.)division

    >mat8/mat9

    [,1] [,2] [,3]

    [1,] 1 3 5

    [2,] 1 2 3

    >mat9/2

    [,1] [,2] [,3]

    [1,] 0.5 0.5 0.5

    [2,] 1.0 1.0 1.0

    8.)using submatrices from the same matrix in computations

    >mat8[, 1:2]

    [,1] [,2]

    [1,] 1 3

    [2,] 2 4

    >mat8[, 2:3]

    [,1] [,2]

    [1,] 3 5

    [2,] 4 6

    >mat8[, 1:2]/mat8[, 2:3]

    [,1] [,2]

    [1,] 0.3333333 0.6000000

    [2,] 0.5000000 0.6666667

    9.)mean

    > apply (mat8,2,mean)

    [1] 1.5 3.5 5.5

    > apply (mat8,1,mean)

    [1] 3 4

    EXPERIMENT-4

    OBJECTIVE: To implement Descriptive Statistics in R.

    SYNTAX:

    > #the vectors age and income are assigned first; their values did not survive in this transcript

    > range(age)

    [1]

    Mean

    In R, a mean can be calculated on an isolated variable via the mean(VAR) command, where VAR is the name of the variable whose mean you wish to compute.

    > mean(age)

    [1] 36

    Standard deviation

    Standard deviation can be calculated for each of the variables in a dataset by using the sd(DATAVAR) command, where DATAVAR is the name of the variable containing the data.

    > sd(income)

    [1] 11126.54

    Percentiles

    Given a dataset and a desired percentile, a corresponding value can be found using the quantile(VAR, c(PROB1, PROB2, ...)) command. Here, VAR refers to the variable name and PROB1, PROB2, etc., are probability values between 0 and 1.

    quantile(age

    Summary

    A very useful multipurpose function in R is summary(X), where X can be one of any number of objects, including datasets, variables, and linear models, just to name a few. When used, the command provides summary data related to the individual object that was fed into it.

    > summary(income)

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
      10000   25000   30000   27400   32000   40000
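
    The descriptive statistics in this experiment can be tried end to end with the sketch below. The income values are an assumption: five numbers chosen to be consistent with the mean, standard deviation and summary output printed in this record, not the original data.

```r
# Assumed values, chosen to match the outputs shown in this record
income <- c(10000, 25000, 30000, 32000, 40000)

range(income)                      # 10000 40000
mean(income)                       # 27400
sd(income)                         # about 11126.54
quantile(income, c(0.25, 0.75))    # 25000 32000
summary(income)                    # Min, quartiles, median, mean and Max together
```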