SAS Academic Conference: SAS and SAS/EG at the Plant Systems Biology Department

  • VERONIQUE STORME

    SAS and SAS/EG at the Plant Systems Biology Department

  • Using the mixed procedure to analyze data with empty cells

    random coefficient model with a fitted periodic function

    invoke SAS in batch mode in a unix environment passing through environment variables

    SAS Enterprise Guide: using prompts

    SAS information

    outline

  • Using the mixed procedure to analyze data with empty cells

  • Leaf area data on 6 different plant varieties

    Clustered data

    Leaf series analysis: Leaf area was measured on 11 leaves from a single plant

    Some varieties have fewer than 11 leaves

    The experiment was performed 3 times

    Problem setting

  • Subject plots

  • Frequency tables

    Table of Line by Leaf (cell frequencies)

    Line      0    1    3    4    5    6    7    8    9   10   11  Total
    AA1_H    30   30   30   30   30   30   30   28   12    1    0    251
    AA1_WT   30   30   30   30   30   30   30   30   23    4    0    267
    AA2_H    28   28   28   28   28   28   28   28   25    5    1    255
    AA2_WT   28   28   28   28   28   28   28   28   25    4    0    253
    AA3_H    24   24   24   24   24   24   24   23   17    4    0    212
    AA3_WT   30   30   30   30   30   30   30   30   28   14    5    287
    Total   170  170  170  170  170  170  170  167  130   32    6   1525
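
    A cross-tabulation like the one above can be produced with PROC FREQ before any modelling, which makes the empty Line-by-Leaf cells visible. This is only a sketch: the data set name leafarea is an assumption, while the variables Line and Leaf are taken from the slide.

    ```sas
    /* Sketch: count observations per Line-by-Leaf cell.          */
    /* "leafarea" is a hypothetical data set name.                */
    proc freq data=leafarea;
      tables Line*Leaf / norow nocol nopercent;  /* frequencies only */
    run;
    ```

    Cells with a frequency of 0 are the combinations for which no least-squares mean can be estimated.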

  • Analysis 1:

    model Area= Line|Leaf / ddfm = KR ;

    repeated leaf/type=un subject=PlantID ;

    random int /subject=experiment ;

    Analysis 2:

    model Area= Line|Leaf / ddfm = KR ;

    repeated leaf/type=ar(1) subject=PlantID ;

    random int /subject=experiment ;

    Analysis 3:

    model Area= Line|Leaf ;

    repeated leaf/type=ar(1) subject=PlantID ;

    random int /subject=experiment ;

    Proc mixed analyses
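
    The slide lists only the MODEL, REPEATED and RANDOM statements. A complete call for Analysis 1 might look like the sketch below. The data set name leafarea and the CLASS statement are assumptions (they are not shown on the slide), and PlantID is assumed to identify each plant uniquely across the three experiments.

    ```sas
    /* Sketch of Analysis 1 as a complete step (assumed data set name). */
    proc mixed data=leafarea;
      class Line Leaf PlantID experiment;       /* assumed CLASS variables        */
      model Area = Line|Leaf / ddfm=kr;         /* Kenward-Roger degrees of freedom */
      repeated Leaf / type=un subject=PlantID;  /* leaves correlated within a plant */
      random int / subject=experiment;          /* random intercept per experiment  */
    run;
    ```

    Analysis 2 differs only in type=ar(1), and Analysis 3 additionally drops ddfm=kr, so that the default (containment) method is used.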

  • Analysis 1 (KR, UN)

    Analysis 2 (KR, AR(1))

    Analysis 3 (containment, AR(1))

    Results: type 3 tests of fixed effects

  • Option 1: truncate the data

    Option 2: tune the sensitivity in sweeping with the singular option

    model Area= Line|Leaf / ddfm = KR singular = 1E-7;

    repeated leaf/type=ar(1) subject=PlantID ;

    The singular option

  • random coefficient model with a fitted periodic function

  • 2 Arabidopsis varieties

    4 treatment conditions

    High-throughput phenotyping (IGIS)

    Phenotype: compactness

    Compactness describes whether the leaves lie close to the centroid or farther away from it, e.g. because of longer stipes

    rhythmic leaf movements (circadian clock)

    Problem setting

  • Research question:

    Is there an effect of the variety and/or the treatment on the amplitude?

    Mean line plot

  • Fit a model with a fundamental sine wave

    Allow sinusoidal deviations for each plant

    Assume these random coefficients come from the same normal distribution

    Use a single variance component for all the trigonometric components

    model compactness =

    S C G(enotype) T(reatment) time S*G C*G S*T C*T G*time T*time

    S*G*T C*G*T G*T*time/ ddfm=Satterthwaite;

    random plantID;

    random S*plantID C*plantID /type=toep(1) ;

    Where S = sin(2*constant('pi')*time/24)

    C = cos(2*constant('pi')*time/24)

    The analysis model
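
    The covariates S and C have to exist in the data set before PROC MIXED is called. A minimal DATA step for this is sketched below. The data set names compact and compact_trig are assumptions, and time is assumed to be measured in hours, so that one period is 24 hours as in the formulas on the slide.

    ```sas
    /* Sketch: add the 24-hour sine and cosine covariates.       */
    /* "compact" and "compact_trig" are hypothetical names.      */
    data compact_trig;
      set compact;
      S = sin(2*constant('pi')*time/24);
      C = cos(2*constant('pi')*time/24);
    run;
    ```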

  • b0 + b1*S + b2*C + b3*G1 + b4*T1 + b5*T2 + b6*T3 + b7*time +

    b8*S*G1 + b9*C*G1 +

    b10*S*T1 + b11*S*T2 + b12*S*T3 + b13*C*T1 + b14*C*T2 + b15*C*T3 +

    b16*time*G1 + b17*time*T1 + b18*time*T2 + b19*time*T3 +

    b20*S*G1*T1 + b21*S*G1*T2 + b22*S*G1*T3 +

    b23*C*G1*T1 + b24*C*G1*T2 + b25*C*G1*T3 +

    b26*time*G1*T1 + b27*time*G1*T2 + b28*time*G1*T3

    The regression model

  • REF GT AND REF TREATMENT

    b0 + b1*S + b2*C + b7*time

    Amplitude: $\sqrt{b_1^2 + b_2^2}$

    GT1 AND REF TREATMENT

    Amplitude: $\sqrt{(b_1 + b_8)^2 + (b_2 + b_9)^2}$

    REF GT AND TREATMENT T1

    Amplitude: $\sqrt{(b_1 + b_{10})^2 + (b_2 + b_{13})^2}$

    GT1 AND TREATMENT T1

    Amplitude: $\sqrt{(b_1 + b_8 + b_{10} + b_{20})^2 + (b_2 + b_9 + b_{13} + b_{23})^2}$

    Mean predicted amplitude
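
    The amplitude expressions follow from the standard identity for a sum of a sine and a cosine with the same frequency. As a short worked step, writing $\omega = 2\pi/24$ (a symbol introduced here only for brevity):

    ```latex
    \begin{aligned}
    b_1 S + b_2 C &= b_1 \sin(\omega t) + b_2 \cos(\omega t) = A \sin(\omega t + \varphi), \\
    A &= \sqrt{b_1^2 + b_2^2}, \qquad \varphi = \operatorname{atan2}(b_2, b_1).
    \end{aligned}
    ```

    For any genotype and treatment combination, $b_1$ and $b_2$ are replaced by the sums of all coefficients that multiply S and C for that combination, for example $b_1 + b_8 + b_{10} + b_{20}$ and $b_2 + b_9 + b_{13} + b_{23}$ for GT1 under treatment T1.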

  • Parametric bootstrap (i.e. resampling residuals)

    Fit the model

    Bootstrap sample from the residuals

    Add the randomly resampled e to Y-hat

    Fit the model for each of the B reps

    Compute bootstrap estimates

    Difficulty: unbalanced clustered data

    Standard errors ?
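
    The resampling step could be sketched as below. It is not the author's code. It assumes that the fitted model was run with OUTP=fit on the MODEL statement, so that WORK.FIT holds the original columns plus Pred and Resid. The names boot, e_star, y_star and the macro variable B are hypothetical. The sketch draws residuals from the pooled set and therefore ignores the clustering within plants, which is exactly the difficulty noted on the slide.

    ```sas
    /* Sketch: B bootstrap data sets built from Pred + resampled residual. */
    %let B = 200;                         /* hypothetical number of replicates */

    data boot;
      call streaminit(12345);             /* arbitrary seed, for repeatability */
      do Replicate = 1 to &B;
        do i = 1 to n;
          /* row i supplies the covariates and the fitted value Pred */
          set fit(drop=Resid) point=i nobs=n;
          /* pick one residual at random, with replacement */
          j = ceil(n * rand('uniform'));
          set fit(keep=Resid rename=(Resid=e_star)) point=j;
          y_star = Pred + e_star;         /* bootstrap response */
          output;
        end;
      end;
      stop;                               /* needed because POINT= never hits end of file */
    run;
    ```

    The model would then be refitted with y_star as the response and a BY Replicate statement, and the amplitude computed from each set of fixed-effect estimates.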

  • invoke SAS in batch mode in a unix environment passing through environment variables

  • %let path=%sysget(fullpath); * returns the value as string;

    %let libname=%sysget(wkd);

    libname &libname "&path";

    %include "&path.m_NameConversions.sas" /source2;

    %include "&path.m_selectSNPs.sas" /source2;

    %include "&path.m_selectPheno.sas" /source2;

    %NameConversions(libname=wkd,traitfile=file);

    %selectSNPs(libname=wkd);

    %selectPheno(libname=wkd);

    Main code

  • [vesto@midas TEST]$ sas \
        -set fullpath "/group/biostat/myGWASprojects/SNP_ARAB/GCEP/GALAXY/TEST/" \
        -set wkd "stem" \
        -set file "stems3.txt" \
        -sysin workflow_unix.sas

    -set: defines an environment variable

    -sysin: specifies an external file (the SAS program to run)

    Invoke SAS in batch mode (Red Hat Enterprise Linux 6)
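
    Because the program depends on variables set on the command line, it can help to stop early when one of them is missing. The macro below is a sketch under that assumption. The macro name require_env is hypothetical. The variable names fullpath, wkd and file are the ones used in the invocation above.

    ```sas
    /* Sketch: stop the batch job if an environment variable is not set. */
    %macro require_env(name);
      %if %sysfunc(sysexist(&name)) = 0 %then %do;
        %put ERROR: environment variable &name is not set.;
        %abort cancel;                    /* ends the batch session */
      %end;
    %mend require_env;

    %require_env(fullpath)
    %require_env(wkd)
    %require_env(file)
    ```

    The calls would be placed at the top of workflow_unix.sas, before the first %sysget.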

  • SAS Enterprise Guide: using prompts

  • Way to automate your project

    Prompts pass parameters to macro variables

    Example:

    routine two-way analysis of variance where the whole experiment was repeated 3 times independently

    Performing simple tests of effects with the plm procedure

    SAS Enterprise Guide Prompts

  • Open the sas program

    Create prompts

    Assign the prompts to the program

    Steps

  • libname dir "&path";

    ods graphics on;

    proc mixed data =dir.&inputdata scoring = 3;

    class &F1 &F2 █

    model &Y=&F1 &F2 &F1.*&F2 /ddfm= satterthwaite solution vciry

    outp=out singular=1E-7;

    random █

    lsmeans &F1.*&F2 ;

    repeated /group = &F2 ;

    store work.result;

    run;

    ods graphics off;

    proc plm restore = work.result;

    slice &F1.*&F2/sliceby=&F2 diff adjust=&method;

    effectplot;

    lsmeans &F1.*&F2 ;

    run;
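
    To test the program outside Enterprise Guide, the macro variables that the prompts would supply can be set by hand. The sketch below does this for the variables that appear in the code above. All values are hypothetical placeholders, and any further prompt variables used by the program would need the same treatment.

    ```sas
    /* Sketch: stand-ins for the prompt values (all values hypothetical). */
    %let path      = /path/to/data;   /* folder with the input data set   */
    %let inputdata = mydata;          /* data set name inside that folder */
    %let Y         = response;        /* response variable                */
    %let F1        = factor1;         /* first fixed factor               */
    %let F2        = factor2;         /* second fixed factor              */
    %let method    = tukey;           /* multiplicity adjustment in SLICE */
    ```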

  • Prompt definition (screenshot)

    Name of the macro variable: path

    Description: Give the path where the input data is located

  • proc plm restore = work.result;

    slice &F1.*&F2/sliceby=&F2 diff

    adjust=&method;

  • SAS provides a wealth of information

  • Koen Knapen

    SAS Technical Support:

    Aditya

    Bart

    ACKNOWLEDGEMENTS

  • Default algorithm to optimize the likelihood function:

    ridge-stabilized Newton-Raphson algorithm

    Possible problems

    covariance parameters are on different scales: rescale the effects

    poor MIVQUE(0) starting values: use the Fisher scoring algorithm for the first 3 iterations

    Proc mixed data = input scoring = 3;

    Other convergence problems