Introduction to Bayesian SEM



    Bayesian Structural Equation Modelling Using Mplus


    Overview: Major Steps in the Bayesian Approach to Data Analysis


    Research Question

    Estimation

    Model fit

    Hypotheses evaluation - Model selection

    The Data to be Collected: Variables and Sample Size

    How to enter the data into Mplus

    Missing data

    The Statistical Model

    How to specify a statistical model in Mplus

    How to specify an imputation model in Mplus

    The Prior Distribution

    Default uninformative priors and how to specify them in Mplus

    Informative priors and how to specify them in Mplus

    The Posterior Distribution

    Estimates and credible intervals

    How to check convergence

    Model fit

    Hypothesis evaluation and Model selection

    How to interpret Mplus output


    Lecture 1: Bayesian Estimation

    Data, Research Question, and Statistical Model


    Research Question

    ?

    The Data N=65

    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 2 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 9 16
    27 5 3 7
    28 5 5 8
    29 11 6 14
    30 6 6 10
    31 7 5 11
    32 8 8 10
    33 9 5 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    title:

    Mediation Model for the Stork Data;

    data:

    file = stork.txt;

    variable:

    names = ID stork urban babies;

    usev = stork urban babies;

    The Statistical Model - 1

    model:

    urban on stork (a);
    babies on urban stork (b c);

    [urban] (d);

    [babies] (e);

    urban (f);

    babies (g);

    [Path diagram: Stork → Urban (a), Urban → Babies (b), Stork → Babies (c), with residual variances f (Urban) and g (Babies).]

    The Statistical Model - 2

    Urban = d + a Stork + error, with error ~ N(0,f)

    The Statistical Model - 3

    [Figure labels: City, Village, Rural.]

    Babies = e + c Stork + b Urban + error, with error ~ N(0,g)


    Lecture 1: Bayesian Estimation

    Introducing Prior, Posterior, and Sampling Based Estimation

    Using One Variable

    The Prior Distribution - 1 - Introduction - Non-Informative Prior Distribution

    A simple example based on expert elicitation: how many babies are born per 1,000 inhabitants per year in the Netherlands?

    Data

    ID Babies
    ... ...
    20 13
    21 11
    22 9
    23 11
    24 9
    25 16
    26 16
    27 7
    28 8
    ... ...

    The mean is 9 and the standard error of the mean is .5; this means that the data tell us that between 8 and 10 babies are born.

    Note that I computed a confidence interval for the mean using 9 +/- 2 x .5, and that 2 approximates 1.96, the more precise value for the computation of a confidence interval.

    No prior information was used, that is, an uninformative prior distribution was used.

    model:

    [babies] (a);

    babies (b);


    The Prior Distribution - 2 - Introduction - Informative Prior Distribution

    Expert Elicitation:

    I assume that in each region containing 1,000 persons, the age distribution is uniform between 0-100 years of age.

    This means that 200 persons are between 20 and 40 years of age (the fertile years), which renders 80 couples and 40 bachelors.

    On average I expect each couple to have 2 children, that is, 160 children over the course of 20 years. This means 8 children per year per region containing 1,000 persons.

    In my line of argument I'm most uncertain about the uniform age distribution. I know the number of elderly is increasing, so maybe there are only 160 persons between 20 and 40 years of age: 64 couples, 128 children, about 6 children per year. On the other hand there may still be fewer elderly than young, so maybe 240 persons: 96 couples, 192 children, about 10 per year.

    In summary, I expect 8, but my credible interval is between 6 and 10, which means my personal standard error is 1 (8 +/- 2 x 1 gives my credible interval).
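    To make the arithmetic explicit, here is a minimal sketch in Python; the 80% coupling fraction and the 2-children-per-couple rate are taken from the argument above, and the function name is mine:

    # Elicitation arithmetic: babies per year per region of 1,000 persons.
    def babies_per_year(persons_20_to_40):
        couples = persons_20_to_40 * 0.8 / 2   # 80% of the fertile persons are coupled
        children = couples * 2                 # 2 children per couple over the 20 fertile years
        return children / 20                   # spread out over those 20 years

    for persons in (160, 200, 240):            # more elderly, uniform, fewer elderly
        print(persons, babies_per_year(persons))
    # 160 -> 6.4, 200 -> 8.0, 240 -> 9.6: roughly the elicited 6 to 10, hence a ~ N(8,1).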


    The Prior Distribution - 3 - Introduction

    The Normal Prior Distribution - used for means and regression coefficients.

    MODEL PRIORS:

    a ~ N(8,1);

    MODEL PRIORS:

    a ~ N(8,9);

    MODEL PRIORS:

    a ~ N(8,100000);

    [Figure: the three normal prior densities for a, all centred at 8 and increasingly flat.]


    The Prior Distribution - 4 - Introduction

    The Inverse Gamma Prior Distribution - used for variances.

    MODEL PRIORS:

    b ~ IG(.001,.001);

    Uninformative, proper.

    MODEL PRIORS:

    b ~ IG(-1,0);

    Uninformative, improper; this is the default in Mplus.


    The Posterior Distribution - 1 - Introduction

    Combining Data Knowledge and Prior Knowledge

    a - Mean Number of Babies

    [Figure: the prior (centred at 8), the data (centred at 9), and the resulting posterior distribution of a.]


    The Posterior Distribution - 2 - Introduction

    The posterior distribution combines the information with respect to the mean number of babies in the data with the information in the prior distribution. This combination is executed by Mplus.

    Using sampling, the information in the posterior distribution with respect to the mean number of babies is made accessible:


    Sampled values of a: 9.1, 7.9, 8.3, 9.9, 7.1, ...

    Estimate: mean or median

    SD

    Credible Interval: central or highest

    [Histogram of the sampled values of a, with the mean, the median, and the 2.5% and 97.5% bounds indicated.]

    analysis:

    estimator = bayes;
    processors = 2;
    fbiter = 100000;
    point = median;

    output:

    tech1 tech8 standardized(stdyx) cinterval(hpd);

    plot:

    type = plot1 plot2 plot3;

    Data + Prior

    The Posterior Distribution - 3 - Introduction
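    The summaries in the output are simple functions of the sampled values. A minimal sketch in Python (numpy); the draws below are only a stand-in for the fbiter values Mplus samples:

    import numpy as np

    # Stand-in for the MCMC draws of a; in Mplus these are the fbiter sampled values.
    draws = np.random.default_rng(1).normal(9.0, 0.45, size=100000)

    est_mean, est_median = draws.mean(), np.median(draws)   # point = mean or point = median
    sd = draws.std()                                        # the posterior S.D.
    central = np.percentile(draws, [2.5, 97.5])             # central 95% credible interval

    # Highest posterior density interval: the shortest interval containing 95% of the draws.
    s = np.sort(draws)
    k = int(0.95 * len(s))
    i = np.argmin(s[k:] - s[:-k])
    hpd = (s[i], s[i + k])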


    The Posterior Distribution - 4 - Introduction

    A Non-Informative Prior Distribution for the Mean Number of Babies

    model:
    [babies] (a);
    babies (b);

    MODEL PRIORS:
    a ~ N(0,100000);
    b ~ IG(.001,.001);

    Estimate S.D. Lower 2.5% Upper 2.5%
    Means
    BABIES 9.078 0.443 8.203 9.945

    An Informative Prior Distribution for the Mean Number of Babies

    model:
    [babies] (a);
    babies (b);

    MODEL PRIORS:
    a ~ N(8,1);
    b ~ IG(.001,.001);

    Estimate S.D. Lower 2.5% Upper 2.5%
    Means
    BABIES 8.904 0.405 8.098 9.688


    Lecture 1: Bayesian Estimation

    Introducing Prior, Posterior, and Sampling Based Estimation

    Using The Stork Data (three variables) and

    Uninformative Priors



    The Prior Distribution -5 - Uninformative Prior Distributions for the Stork Data

    User Specified

    MODEL PRIORS:

    a ~ N(0,100000);
    b ~ N(0,100000);
    c ~ N(0,100000);
    d ~ N(0,100000);
    e ~ N(0,100000);
    f ~ IG(.001,.001);
    g ~ IG(.001,.001);

    model:

    urban on stork (a);
    babies on urban stork (b c);
    [urban] (d);
    [babies] (e);
    urban (f);
    babies (g);

    Mplus Default

    MODEL PRIORS:

    a ~ N(0,Infinity);
    b ~ N(0,Infinity);
    c ~ N(0,Infinity);
    d ~ N(0,Infinity);
    e ~ N(0,Infinity);
    f ~ IG(-1,0);
    g ~ IG(-1,0);



    The Posterior Distribution - 5 - Bayesian Estimation Using Markov Chain Monte Carlo Methods

    model constraint:

    new(indirect);

    indirect = a*b;

         a     b     c     d     e     f     g     indirect
    initial values
    ...  ...   ...   ...   ...   ...   ...   ...
    .35  1.14  -.11  2.89  4.00  3.46  7.15  .42
    .29  1.69  -.32  1.75  5.10  3.01  7.30  .49
    ...  (one row per iteration, fbiter rows in total)

    [Path diagram: Stork → Urban (a), Urban → Babies (b), Stork → Babies (c), with residual variances f and g.]

    analysis:

    estimator = bayes;
    processors = 2;
    fbiter = 100000;
    point = median;
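    The model constraint computes indirect = a*b in every iteration, so a posterior sample of the product is obtained for free. A sketch of the idea in Python; the independent normal draws are only a stand-in, since in the real chain a and b are sampled jointly:

    import numpy as np

    rng = np.random.default_rng(2)
    a_draws = rng.normal(0.375, 0.072, size=100000)   # stand-in draws of a
    b_draws = rng.normal(1.143, 0.185, size=100000)   # stand-in draws of b

    indirect = a_draws * b_draws                      # what model constraint does per iteration
    print(np.median(indirect), np.percentile(indirect, [2.5, 97.5]))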



    output:

    tech1 tech8 standardized(stdyx) cinterval(hpd);

    plot:

    type = plot1 plot2 plot3;

    The Posterior Distribution - 6 - Output Computed Using the MCMC Sample



    The Posterior Distribution - 7 - Histograms, Estimates and Credible Intervals

    [Histogram of the posterior distribution of Babies on Stork.]


    The Posterior Distribution - 8 - Histograms, Estimates and Credible Intervals

    [Histogram of the posterior distribution of the indirect effect.]

    Note that the credible interval is not symmetric!


    MODEL RESULTS

    Posterior One-Tailed 95% C.I.

    Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    URBAN ON

    STORK 0.375 0.072 0.000 0.236 0.517

    BABIES ON

    URBAN 1.143 0.185 0.000 0.781 1.509

    STORK -0.111 0.124 0.181 -0.356 0.131

    Intercepts

    URBAN 2.894 0.460 0.000 1.978 3.787

    BABIES 4.007 0.847 0.000 2.320 5.646

    Residual Variances

    URBAN 3.465 0.653 0.000 2.360 4.828

    BABIES 7.159 1.359 0.000 4.937 10.094

    New/Additional Parameters

    INDIRECT 0.422 0.108 0.000 0.225 0.644

    The Posterior Distribution - 9 - Estimates and Credible Intervals


    Lecture 1: Bayesian Estimation

    INTERMEZZO

    P-values

    The Posterior Distribution - 10 - The one-tailed p-value


    [Figure: posterior distribution of Urban on Stork (a), with the 90% and 95% credible intervals marked relative to 0.]

    If the 90% CI touches 0, the one-tailed p-value is .05.

    If the 95% CI touches 0, the one-tailed p-value is .025.

    For approximately normal posterior distributions, multiplication by 2 renders a two-tailed p-value.
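    Equivalently, the one-tailed p-value is the proportion of posterior draws on the other side of 0. A minimal sketch in Python; the draws are again a stand-in for the sampled values of a:

    import numpy as np

    draws = np.random.default_rng(3).normal(0.375, 0.072, size=100000)  # stand-in draws of a

    p_one_tailed = min((draws < 0).mean(), (draws > 0).mean())  # posterior mass beyond 0
    p_two_tailed = 2 * p_one_tailed                             # for roughly normal posteriors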

    The Posterior Distribution - 11 - p-values credible intervals and model selection


    p-values:

    For example .05

    Surely God loves the .06 as much as the .05

    Publication bias

    Multiple hypotheses testing and capitalization on chance

    Credible Intervals and Confidence Intervals

    What is the value of the parameter of interest? Is the parameter positive, negative, or is zero also in the ballpark?

    With multiple parameters still capitalization on chance

    Model Selection

    Compare a few carefully chosen models

    Very powerful in combination with credible intervals and standardized estimates


    Lecture 1: Bayesian Estimation

    Introducing Prior, Posterior, and Sampling Based Estimation

    Using The Stork Data (three variables) and

    Informative Priors

    The Prior Distribution - 6- Informative Based on Historical Data


    The Current Data:

    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 2 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 9 16
    27 5 3 7
    28 5 5 8
    29 11 6 14
    30 6 6 10
    31 7 5 11
    32 8 8 10
    33 9 5 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    Historical Data:

    ID Stork Urban
    1 5 6
    2 11 8
    ... ... ...
    80 0 1

    model:

    urban on stork (a);

    [urban] ;

    urban;

    MODEL RESULTS     Estimate  S.D.
    URBAN ON STORK    0.400     0.050

    a ~ N(.400,.0025)

    (The prior variance is the squared standard deviation, 0.050 x 0.050 = .0025; Mplus normal priors are specified with a variance.)


    The Prior Distribution - 8- Informative Based on Historical Data


    MODEL PRIORS:

    a ~ N(.400,.0025);

    b ~ N(0,100000);

    c ~ N(0,100000);

    d ~ N(0,100000);

    e ~ N(0,100000);

    f ~ IG(.001,.001);

    g ~ IG(.001,.001);

    model:

    urban on stork (a);

    babies on urban stork (b c);

    [urban] (d);

    [babies] (e);

    urban (f);

    babies (g);

    User Specified

    Suppose the data are collected by another research group in the Netherlands in 2010.

    The Posterior Distribution - 12- Comparing Results from Uninformative and Informative Priors


    MODEL PRIORS:
    a ~ N(0,100000);

    MODEL RESULTS     Estimate  S.D.   Lower 2.5%  Upper 2.5%
    URBAN ON STORK    0.375     0.072  0.236       0.517
    INDIRECT          0.422     0.108  0.225       0.644

    MODEL PRIORS:
    a ~ N(.400,.0025);

    MODEL RESULTS     Estimate  S.D.   Lower 2.5%  Upper 2.5%
    URBAN ON STORK    0.391     0.041  0.314       0.473
    INDIRECT          0.444     0.086  0.283       0.621

    The result of using subjective priors is a gain in information. But, do you trust this?

    Would you be willing to use and defend this approach?


    The Prior Distribution - 9 - Extra Tools for the Specification of Informative Priors


    MODEL PRIORS:

    b ~ N (0, 1);

    c ~ N (0, 1);

    COVARIANCE (b, c) = 0.5;

    output:

    tech1 tech3 tech8

    standardized(stdyx) cinterval(hpd);

    Summary


    Research Question

    Statistical Model

    Prior Distribution - Informative Prior Distributions

    Posterior Distribution

    - Asymmetric Credible Intervals

    - Small Sample Inferences, no Asymptotic Approximations

    - No Heywood Cases, Like, for Example, Negative Variances

    - Sampling will often Work where Maximum Likelihood Fails

    References Bayesian Structural Equation Modelling


    A relatively accessible introduction to Bayesian structural equation modeling can be found in:

    Kaplan, D. and Depaoli, S. (2012). Bayesian Structural Equation Modeling. In R.H. Hoyle (Ed.),

    Handbook of Structural Equation Modeling, pp. 650-673. New York: The Guilford Press.

    A classic about the elicitation of prior knowledge is:

    O'Hagan, A., Buck, C.E., Daneshkhah, A., Eiser, J.R., Garthwaite, P.H., Jenkinson, D.J.,

    Oakley, J.E., and Rakow, T. (2006). Uncertain Judgements: Eliciting Experts' Probabilities.

    Chichester: Wiley.

    A classic introduction to Bayesian data analysis is:

    Gelman, A., Carlin, J.B., Stern, H.S., and Rubin, D.B. (2004). Bayesian Data Analysis.

    Boca Raton, FL: Chapman & Hall/CRC.

    The documentation provided by Mplus is:

    Muthen, B. (2010). Bayesian analysis in Mplus: A brief introduction.

    Asparouhov, T. and Muthen, B. (2010). Bayesian analysis in Mplus: Technical Implementation.


    Lecture 2: Bayesian Estimation in the Presence of Missing Data

    Introduction

    Missing Data - 1 - Introduction


    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 999 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 999 999
    27 5 3 7
    28 5 5 8
    29 999 6 14
    30 6 6 10
    31 7 5 999
    32 8 8 10
    33 999 999 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    variable:

    names = ID stork urban babies;

    usev = stork urban babies;

    missing = all (999);

    [Path diagram: Stork → Urban (a), Urban → Babies (b), Stork → Babies (c), with residual variances f and g.]

    Missing Data - 2 - Introduction


    By default Mplus with analysis: estimator = bayes; will use the statistical model that is specified to impute the missing data.

    First I will explain what is meant by imputation of the missing data.

    Secondly I will explain why it is usually NOT a good idea to use the statistical model that is specified to impute the missing data.

    One exception occurs if the amount of missing values is very small. A good question is: what is a small amount of missing values?

    Another exception occurs if missings occur in variables that are ONLY a dependent variable and if the missingness is MAR given the predictors of the dependent variable.

    Thirdly I will introduce:

    Multiple imputation using a general imputation model

    Analysis of each imputed data set using a statistical model that is consistent with the imputation model

    Summarizing the results obtained from the analysis of each imputed data set


    Lecture 2: Bayesian Estimation in the Presence of Missing Data

    Multiple Imputation

    Multiple Imputation Using the Statistical Model - 1

    [Figure: the regression of Urban on Stork with intercept d, slope a, and residual variance f, as on the earlier slide.]

    Multiple Imputation Using the Statistical Model - 2


         a     b     c     d     e     f     g     22-S  26-U  26-B  29-U  31-B  33-S  33-U
    initial values:
         0     0     0     0     0     1     1     0     0     0     0     0     0     0
    ...  ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...   ...
    .35  1.14  -.11  2.89  4.00  3.46  7.15  5     5     12    7     9     2     3
    .29  1.69  -.32  1.75  5.10  3.01  7.30  7     3     11    5     10    3     4
    ...  (one row per iteration, fbiter rows in total)

    MODEL RESULTS

    Posterior One-Tailed 95% C.I.

    Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    BABIES ON

    URBAN 1.143 0.185 0.000 0.781 1.509

    STORK -0.111 0.124 0.181 -0.356 0.131

    New/Additional Parameters

    INDIRECT 0.422 0.108 0.000 0.225 0.644


    Lecture 2: Bayesian Estimation in the Presence of Missing Data

    Data that are not Missing at Random

    Multiple Imputation Using the Statistical Model- 3 - Data that are NOT Missing at Random


    [Path diagram: Stork → Urban (a), Urban → Babies (b), Stork → Babies (c), with residual variances f and g.]

    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 999 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 999 999
    27 5 3 7
    28 5 5 8
    29 7 999 14
    30 6 6 10
    31 7 5 999
    32 8 8 10
    33 999 999 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    Multiple Imputation Using the Statistical Model- 4 - Data that are NOT Missing at Random


    Urban Babies     Urban Babies     Urban Babies     Urban Babies
    1 1              1 1              1 1              1 1
    2 2              2 2              2 2              2 2
    3 3              3 3              3 3              3 3
    4 4              4 4              4 4              4 4
    5 5              999 5            2.5 5            5 5
    6 6              999 6            2.5 6            6 6
    7 7              999 7            2.5 7            7 7
    8 8              999 8            2.5 8            8 8

    Model:

    Babies on Urban;
    Urban;

    Model:

    Babies with Urban;

    [Scatterplots of Urban and Babies for the complete, the observed, and the imputed data.]


    Lecture 2: Bayesian Estimation in the Presence of Missing Data

    Data that are Missing at Random

    Multiple Imputation Using the Statistical Model- 5 - Data that are Missing at Random


    Urban Babies     Urban Babies     Urban Babies     Urban Babies
    1 1              1 1              1 1              1 1
    2 2              2 2              2 2              2 2
    3 3              3 3              3 3              3 3
    4 4              4 4              4 4              4 4
    5 5              999 5            5 5              5 5
    6 6              999 6            6 6              6 6
    7 7              999 7            7 7              7 7
    8 8              999 8            8 8              8 8

    Model:

    Babies on Urban;
    Urban;

    Model:

    Babies with Urban;

    [Scatterplots of Urban and Babies for the complete, the observed, and the imputed data.]

    Multiple Imputation Using a General Imputation Model - 1 - Data that are Missing at Random


    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 999 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 999 999
    27 5 3 7
    28 5 5 8
    29 7 999 14
    30 6 6 10
    31 7 5 999
    32 8 8 10
    33 999 999 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    [Diagram: saturated imputation model with means and all covariances among Stork, Urban, and Babies.]

    model:

    stork with urban;
    stork with babies;
    urban with babies;

    [stork];

    [urban];

    [babies];


    Multiple Imputation Using a General Imputation Model - 2 - How to do it in Mplus


    title: this is an example of multiple imputation

    for a set of variables with missing values using

    a general statistical model;

    data: FILE = storkMI.txt;

    variable:

    names = ID stork urban babies;

    auxiliary = ID;

    usevariables = stork urban babies;

    missing = all (999);

    analysis: estimator = bayes;

    fbiter = 10000;

    processors = 2;

    data imputation:

    impute = stork urban babies;

    ndatasets = 10;

    thin = 1000;

    save = storkimp*.dat;

    model: stork with urban babies;

    urban with babies;

    [stork];

    [urban];

    [babies];

    output: tech8;

    plot: type = plot1 plot2 plot3;

    Multiple Imputation Using a General Imputation Model - 3 - Multiple Imputations


    The original data:

    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 999 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 999 999
    27 5 3 7
    28 5 5 8
    29 999 6 14
    30 6 6 10
    ... ... ... ...

    The imputed data sets, m = 1, ..., M:

    Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID    Stork Urban Babies ID
    ... ... ... ...          ... ... ... ...          ... ... ... ...          ... ... ... ...
    3 7 13 20                3 7 13 20                3 7 13 20                3 7 13 20
    1 4 11 21                1 4 11 21                1 4 11 21                1 4 11 21
    4 4 9 22                 7 4 9 22                 5 4 9 22                 6 4 9 22
    3 6 11 23                3 6 11 23                3 6 11 23                3 6 11 23
    4 6 9 24                 4 6 9 24                 4 6 9 24                 4 6 9 24
    8 7 16 25                8 7 16 25                8 7 16 25                8 7 16 25
    11 8 12 26               11 9 14 26               11 8 13 26               11 5 10 26
    5 3 7 27                 5 3 7 27                 5 3 7 27                 5 3 7 27
    5 5 8 28                 5 5 8 28                 5 5 8 28                 5 5 8 28
    9 6 14 29                8 6 14 29                11 6 14 29               11 6 14 29
    6 6 10 30                6 6 10 30                6 6 10 30                6 6 10 30
    ... ... ... ...          ... ... ... ...          ... ... ... ...          ... ... ... ...

    Multiple Imputation Using a General Imputation Model- 4 - Data that are Missing at Random


    It can never be ensured that data are missing at random.

    Use enough variables in the imputation model to feel confident that MAR is a reasonable assumption. There may be variables in the imputation model that do not appear in the statistical model.

    Can we, in our example, think of variables that could be very good predictors of missing data and that are not part of the statistical model?

    Never use too many variables in the imputation model. A rule of thumb is 1 variable for every 20 cases in the data file. But this is only a rule of thumb!

    Creating a good imputation model is partly ART, partly SKILL, and rather BAYESIAN, because it requires careful prior thinking, that is, thinking without using empirical data.

    Multiple Imputation Using a General Imputation Model - 5 - How to do it in Mplus


    title:

    Mediation Model for the Stork Data;

    data:
    file = storkimplist.dat;

    type = imputation;

    variable:

    names = stork urban babies ID;

    usev = stork urban babies;

    missing = all (999);

    model:

    urban on stork (a);

    babies on urban stork (b c);

    [urban] (d);

    [babies] (e);

    urban (f);

    babies (g);

    model constraint:

    new(indirect);

    indirect = a*b;

    analysis:

    estimator = ml;

    output:

    standardized(stdyx);

    Note the difference between the imputation model and the statistical model!!

    It is also quite common that the statistical model contains only a subset of the variables used in the imputation model.

    Multiple Imputation Using a General Imputation Model - 6 - Analyse Each Imputed Data Set


    [The original data and the M imputed data sets, as shown on the previous slide.]

    Intercepts (BABIES), per imputed data set:

    m=1: Estimate 10.109, SD 1.303    m=2: Estimate 9.843, SD 1.221    m=3: Estimate 10.567, SD 1.432    m=4: Estimate 9.992, SD 1.271

    Pooled over all imputed data sets: Estimate 10.002, SD 1.672, Rate of Missing Information .22
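    The pooling follows Rubin's rules. A sketch in Python using the four estimates shown above; because only four of the M = 10 data sets are displayed, the numbers will not exactly reproduce the pooled 10.002 and 1.672, and the rate formula used is the simple large-M approximation:

    import numpy as np

    q = np.array([10.109, 9.843, 10.567, 9.992])   # per-imputation estimates
    u = np.array([1.303, 1.221, 1.432, 1.271])     # per-imputation standard errors

    m = len(q)
    q_bar = q.mean()                               # pooled estimate
    w = (u ** 2).mean()                            # within-imputation variance
    b = q.var(ddof=1)                              # between-imputation variance
    t = w + (1 + 1 / m) * b                        # total variance (Rubin's rules)
    pooled_sd = np.sqrt(t)
    rate = (1 + 1 / m) * b / t                     # rate of missing information
    rel_eff = 1 / (1 + rate / m)                   # relative efficiency, as on the next slide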

    Multiple Imputation Using a General Imputation Model - 7 - Relative Efficiency


    Relative efficiency = 1 / (1 + rate/M)

    For the example on the previous transparency:

    Relative efficiency = 1 / (1 + .22/10) = .98

    Multiple Imputation Using a General Imputation Model - 8 - Summarize the Multiple Analyses


    STDYX Standardization

                                  Two-Tailed  Rate of
              Estimate   S.E.  Est./S.E.  P-Value  Missing

    URBAN ON
    STORK      0.536    0.095    5.633    0.000    0.123

    BABIES ON
    URBAN      0.693    0.110    6.307    0.000    0.234
    STORK     -0.123    0.124   -0.986    0.324    0.152

    Intercepts
    URBAN      1.335    0.299    4.463    0.000    0.059
    BABIES     1.286    0.343    3.755    0.000    0.109

    Residual Variances
    URBAN      0.712    0.101    7.026    0.000    0.120
    BABIES     0.593    0.105    5.626    0.000    0.183

    New/Additional Parameters
    INDIRECT   0.395    0.114    3.462    0.001    0.184

    R-SQUARE
    URBAN      0.288    0.101    2.842    0.004    0.120
    BABIES     0.407    0.105    3.867    0.000    0.183


    Lecture 2: Bayesian Estimation in the Presence of Missing Data

    A Closer Look at the Imputation Model


    Multiple Imputation Using a General Imputation Model - 11 - Consistency


    [Diagrams: a statistical model relating Stork and Babies, and an imputation model that also includes Urban.]

    Multiple Imputation Using a General Imputation Model - 12 - Non Consistency


    [Diagrams illustrating non-consistency of the imputation model and the statistical model for Stork, Urban, and Babies.]

    Multiple Imputation Using a General Imputation Model - 13- Non Consistency


    [Diagrams illustrating non-consistency of the imputation model and the statistical model for Stork, Urban, and Babies.]

    Summary


    Imputation model and statistical model

    Does the imputation model render data that are missing at random?

    Are the imputation model and the statistical model congenial?

    The combination of multiple imputation with estimator = ML is possible in Mplus. The combination with estimator = Bayes is not possible.

    References Missing Data



    A non-technical introduction to missing data analysis and multiple imputation can be found in:

    Schafer, J.L. and Graham, J.W. (2002). Missing data: Our view of the state of the art. Psychological Methods, 7, 147-177.

    Classic books about missing data analysis and multiple imputation are:

    Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: Wiley.

    Schafer, J.L. (1997). Analysis of Incomplete Multivariate Data. London: Chapman & Hall.

    A contemporary book is:

    Van Buuren, S. (2012). Flexible Imputation of Missing Data. Boca Raton: Chapman & Hall/CRC.

    An important paper with respect to consistency is:

    Meng, X-L. (1994). Multiple imputation inferences with uncongenial sources of input. Statistical Science, 9, 538-573.

    The documentation provided by Mplus is:

    Asparouhov, T. and Muthen, B. (2010). Multiple imputation with Mplus.

    MplusAutomation is developed by Michael Hallquist. It can be found at www.statmodel.com under the tab How-To, choose Using Mplus via R.

    http://www.statmodel.com/

    Lecture 3: Model Fit

    Model Fit 1 The Covariance Matrix


    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 2 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 9 16
    27 5 3 7
    28 5 5 8
    29 11 6 14
    30 6 6 10
    31 7 5 11
    32 8 8 10
    33 9 5 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

        S     U     B
    S  10.7
    U   4.0   4.8
    B   3.4   5.1  12.2

    The observed covariance matrix displays the relation between each pair of variables in the data matrix.

    The model implied covariance matrix is a reconstruction of the observed covariance matrix using the statistical model at hand.
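    For the mediation model the reconstruction can be written out by path tracing. A sketch in Python; the parameter values are illustrative stand-ins, not estimates:

    import numpy as np

    # Mediation model: Urban = d + a*Stork + e1 (var f); Babies = e + c*Stork + b*Urban + e2 (var g).
    a, b, c, f, g = 0.375, 1.143, -0.111, 3.465, 7.159
    s = 10.7                                       # variance of Stork

    cov_su = a * s
    var_u = a ** 2 * s + f
    cov_sb = (c + a * b) * s
    cov_ub = c * cov_su + b * var_u
    var_b = c ** 2 * s + b ** 2 * var_u + 2 * b * c * cov_su + g

    implied = np.array([[s, cov_su, cov_sb],
                        [cov_su, var_u, cov_ub],
                        [cov_sb, cov_ub, var_b]])  # compare with the observed matrix above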

    Model Fit 2 What is model fit? Why is it important?



    [Diagrams: three models for Stork, Urban, and Babies.]

    Covariance Matrices:

    Observed = Model Implied (9 model parameters):
        S     U     B
    S  10.7
    U   4.0   4.8
    B   3.4   5.1  12.2

    Model Implied (7 model parameters):
        S     U     B
    S  10.7
    U   4.0   4.8
    B   3.4   5.1  12.2

    Model Implied (6 model parameters):
        S     U     B
    S  10.7
    U   0     4.8
    B   0     0     2.2

    Model Fit 3


    The chi-square test is computed for each statistical model. It is a function of:

    - The observed covariance matrix
    - The model implied covariance matrix
    - The difference between the number of parameters of the current and the saturated statistical model.

    It is a measure of the size of the difference between the observed and implied covariance matrices.

    The larger the size of the difference, that is, the larger the chi-square value, the less a statistical model is able to reconstruct the observed covariance matrix.

    The hypothesis that is tested using the chi-square test states that the observed covariance matrix can adequately be reconstructed by the current statistical model.

    Model Fit 4



    Using the observed data and the statistical model at hand,
    parameters are sampled:                  M-V   M-V   ...   M-V

    These are used to replicate data and to impute the
    observed missings:                       Xobs-Xrep   Xobs-Xrep   ...   Xobs-Xrep

    These are used to compute the chi-square test using the
    parameters and the observed-imputed
    and replicated data:                     CHIobs-CHIrep   CHIobs-CHIrep   ...   CHIobs-CHIrep

    The proportion of pairs in which CHIrep is larger than CHIobs is the posterior predictive p-value.
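    Once the pairs of chi-square values are available, the posterior predictive p-value is a one-liner. A minimal sketch in Python; the chi-square arrays are stand-ins for the values computed per MCMC iteration:

    import numpy as np

    rng = np.random.default_rng(4)
    chi_obs = rng.chisquare(3, size=5000)   # stand-in: chi-square for the observed-imputed data
    chi_rep = rng.chisquare(3, size=5000)   # stand-in: chi-square for the replicated data

    ppp = (chi_rep > chi_obs).mean()        # around .50 for a well-fitting model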

    Model Fit 5


    Model Fit 6


    MODEL FIT INFORMATION

    Number of Free Parameters 6

    Bayesian Posterior Predictive Checking using Chi-Square

    95% Confidence Interval for the Difference Between

    the Observed and the Replicated Chi-Square Values

    48.046 71.430

    Posterior Predictive P-Value 0.000

    Posterior predictive p-values around .50 indicate a model that for all practical purposes is well fitting. Note that this approach provides a rough model check and not a classical evaluation of a hypothesis using a p-value.

    References Model Fit


    This model fit test was proposed by:

    Scheines, R., Hoijtink, H., and Boomsma, A. (1999). Bayesian Estimation and Testing

    of Structural Equation Models. Psychometrika, 64, 37-52.

    They based it on the work of:

    Gelman, A., Meng, X-L, and Stern, H. (1996). Posterior predictive assessment of model

    fitness via realized discrepancies. Statistica Sinica, 6, 733-807.

    The documentation provided by Mplus is:

    Asparouhov, T. and Muthen, B. (2010). Bayesian analysis in Mplus: Technical Implementation.


    Model Selection 1 Introduction


    What is a model?

    [Diagrams: three competing path models for Stork, Urban, and Babies.]

    Model Selection 2 Introduction


    What is a model?

    [Diagram: a factor model in which IQ is measured by arithmetic (A) and language (L) items.]

    Model Selection 3 Introduction


    What is a model?

    [Diagrams: Stork → Babies (coefficient b) under two different priors for b.]

    Babies = a + b Stork + error

    MODEL PRIORS:
    a ~ N(4,1);
    b ~ N(1,1);

    MODEL PRIORS:
    a ~ N(4,1);
    b ~ N(4,1);


    Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC

    What is the Goal of Model Selection?

    Model Selection 4 Introduction


    What is the goal of model selection?

    To select the best model from the models that are under consideration.

    What is the best model?

    There are multiple answers to this question. Later in this lecture we will introduce

    two options:

    The model that has the smallest distance to the true model (DIC)

    The model that maximizes the probability of the data (Bayes factor and BIC)

    But all answers involve an evaluation of the misfit and complexity of each model.

    Model Selection 5 Introduction


    What if the models are all wrong?

    What if the true model is not in the set of models under consideration?

    All models are wrong but some are useful

    Should the null-hypothesis be among the models under consideration?

    Should the alternative hypothesis be among the models under consideration?

    It can serve as a fail-safe for the models under consideration. A model with restrictions is only a good model if it is better than the corresponding model without restrictions.

    Model Selection 6 Introduction


    Why is model selection consistent with the empirical cycle?

    Observation (exploratory research!!)

    Induction: from observations to a theory

    Deduction: deriving testable consequences from the theory, that is, models or hypotheses

    Testing: confrontation of models or hypotheses

    with empirical data

    Model Selection 7 Introduction



    Why is Bayesian inference consistent with the empirical cycle?

    Observation (exploratory research!!)

    Induction: from observations to a theory

    Deduction: deriving testable consequences from

    the theory, that is, models or hypotheses

    Testing: confrontation of models or hypotheses

    with empirical data

    Prior knowledge and

    prior thinking

    Plausible models, probably

    not the true model

    Select the best model =

    the current state of knowledge

    Remember the earth is flat, the earth is round, and the earth is shaped somewhat

    like an American football. This too is sequential theory updating using new data as they

    become available.


    Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC

    Information Criteria

    Model Selection 1 Information Criteria


    IC = misfit + complexity

    The smaller the value of the IC, the better the model at hand, because:

    We like well-fitting models

    We like parsimonious, that is specific, not-complex, models because we can derive good predictions from them

    Misfit is determined by the posterior distribution of the model parameters

    Complexity is a function of the number of parameters in the model and the amount of information in the prior distribution

    To illustrate the main features a number of examples will be given
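    In formulas (the standard definitions; a sketch consistent with the misfit-plus-complexity decomposition above, not a description of the exact Mplus implementation):

    DIC = \bar{D} + p_D, with misfit \bar{D} (the posterior mean deviance) and complexity p_D = \bar{D} - D(\bar{\theta}) (the estimated number of parameters).

    BIC = -2 \log L(\hat{\theta}) + P \log N, with misfit -2 \log L(\hat{\theta}) and complexity P \log N for P parameters and N persons.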

    Model Selection 2 Information Criteria


    What is the y-value?

    [Scatterplot of x and y, with three new x-values (?1, ?2, ?3) at which y must be predicted.]

    Model Selection 3 Information Criteria


    What is the y-value?

    [The same scatterplot with a fitted model drawn through the points.]

    What is the fit of this model?

    What is the complexity of this model?

    Model Selection 4 Information Criteria


    What is the y-value?

    [The same scatterplot with another fitted model.]

    What is the fit of this model?

    What is the complexity of this model?

    Model Selection 5 Information Criteria Stork can not Predict Babies


    Population: Stork → Babies, population correlation = 0, N = 100

    Competing models: Stork → Babies (regression coefficient free) versus Stork, Babies (no path).

    With the path:
    DIC = 274.67 (misfit = 268.45, par = 3.11)
    BIC = 282.30 (misfit = 268.38, par = 3.00)

    Without the path:
    DIC = 272.23 (misfit = 268.65, par = 1.89)
    BIC = 277.61 (misfit = 268.39, par = 2.00)

    Model Selection 6 Information Criteria Stork can Predict Babies


    Population: Stork → Babies, population correlation = .6, N = 100

    Competing models: Stork → Babies (regression coefficient free) versus Stork, Babies (no path).

    With the path:
    DIC = 229.54 (misfit = 223.32, par = 3.11)
    BIC = 237.07 (misfit = 223.25, par = 3.00)

    Without the path:
    DIC = 273.48 (misfit = 269.70, par = 1.89)
    BIC = 278.86 (misfit = 269.65, par = 2.00)

    Model Selection 7 Information Criteria DIC and BIC can not Evaluate Models that Differ in the Prior


    TITLE: Illustrate misfit

    and complexity;

    MONTECARLO:
    NAMES ARE y x;

    NOBSERVATIONS = 10000;

    NREPS = 1;

    SEED = 123;

    MODEL POPULATION:
    y ON x * .6;

    [y * 0];

    y * .64;

    [x * 0];

    x * 1;

    analysis:

    estimator = bayes;

    MODEL PRIORS:

    a ~ N(.6,.01);

    MODEL: y ON x (a);

    OUTPUT: TECH9;

    Simulate a data matrix

    Analyse the simulated data matrix

    Specification of the

    simulation model

    Specification of the

    simulation study

    Why is b in this setup the correlation?

    y = a + b x + error, with error ~ N(0, s2)

    var y = b**2 var x + s2 = .6**2 + .64 = 1.0

    Because var x = 1 and var y = 1, the standardized coefficient, and hence the correlation, equals b.

    Model Selection 8 Information Criteria DIC and BIC can not Evaluate Models that Differ in the Prior


    Stork → Babies, population correlation = .6

                  b ~ N(.6,.01)          b ~ N(0,1000000)       b ~ N(0,.01)

    N = 10000     DIC = 24060.54         DIC = 24060.33         DIC = 24060.35
                  (par = 2.98)           (par = 2.99)           (par = 3.00)
                  BIC = 24082.21         BIC = 24081.98         BIC = 24081.98
                  (par = 3.00)           (par = 3.00)           (par = 3.00)

    N = 500       DIC = 1198.10          DIC = 1194.66          DIC = 1194.90
                  (par = 2.88)           (par = 2.91)           (par = 3.03)
                  BIC = 1210.95          BIC = 1207.48          BIC = 1207.47
                  (par = 3.00)           (par = 3.00)           (par = 3.00)

    Model Selection 9 Information Criteria


    Summary:

    Complexity and (mis)fit.

    Complexity is not adequate for models that differ in the prior, but the Bayes factor can deal with this situation. One example will be given during the last day of this course.

    DIC or BIC? Depends on whether missing values are present or not, and on the error rates obtained using DIC and BIC.


    Lecture 4: Model Selection Using the Bayes Factor, BIC and DIC

    Error Rates

    Model Selection 1 Error Rates



    Stork → Babies, coefficient b.

    M1: b = 0    DIC = 273
    M2: b ≠ 0    DIC = 229

    M1: b = 0    BIC = 278
    M2: b ≠ 0    BIC = 237

    deltaDIC = 44    deltaBIC = 41

    The conclusion is that M2 is a better model than M1.

    But how certain are we about this?

    What are the probabilities of making an incorrect decision?

    Populations: M1: b = 0 and M2: b ≠ 0.

    Model Selection 2 Error Rates - Frequency Evaluations


    [Diagram: data matrices sampled from the populations under M1 and M2; for each sampled data matrix, deltaDIC or deltaBIC is computed.]

    Model Selection 3 Error Rates Frequency Evaluations

    M1: b = 0, DIC = 273; M2: b ≠ 0, DIC = 229


    [Histograms of deltaDIC over 1000 replications.]

    For the observed data: M2: b ≠ 0, DIC = 229, deltaDIC = 44.

    Population correlation = 0, N = 100: 18% of the deltaDIC values are larger than 0 (wrongly preferring M2).

    Population correlation = .3, N = 100: 5% of the deltaDIC values are smaller than 0 (wrongly preferring M1).

    Model Selection 4 Error Rates Frequency Evaluations

    M1: b = 0 BIC = 278

    M2: b ≠ 0 BIC = 237


    [Histograms of deltaBIC over 1000 replications.]

    For the observed data: M2: b ≠ 0, BIC = 237, deltaBIC = 41.

    Population correlation = 0, N = 100: 3% of the deltaBIC values are larger than 0 (wrongly preferring M2).

    Population correlation = .3, N = 100: 19% of the deltaBIC values are smaller than 0 (wrongly preferring M1).

    Model Selection 5 Error Rates A Simple Alternative For Frequency Evaluations

    TITLE: Error Rates;


    MONTECARLO:

    NAMES ARE y x;

    NOBSERVATIONS = 100;

    NREPS = 1000;

    SEED = 123;

    RESULTS = PopH0AnH1.txt;

    MODEL POPULATION:

    y ON x * .3; !! y ON x * 0;
    [y * 0];

    y * .91; !! y * 1;

    [x * 0];

    x * 1;

    analysis:
    estimator = bayes;

    fbiter = 10000;

    MODEL: y ON x; !! y ON x @ 0;

    OUTPUT: TECH9;

    For the observed data: deltaDIC = 44 and deltaBIC = 41.

    For one data matrix simulated under correlation = .3, N = 100:

    deltaDIC = 285.38 - 277.08 = 8.30
    deltaBIC = 290.66 - 284.97 = 5.69

    For one data matrix simulated under correlation = 0, N = 100:

    deltaDIC = 285.48 - 286.51 = -1.03
    deltaBIC = 290.75 - 294.40 = -3.65
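    With the replication-level values exported (the RESULTS file of the Monte Carlo run), the error rates are simple proportions. A sketch in Python; the file names and the convention delta = IC(M1) - IC(M2), so that delta > 0 favours M2, are assumptions:

    import numpy as np

    delta_h0 = np.loadtxt("delta_pop_b0.txt")   # deltaBIC per replication, population b = 0
    delta_h1 = np.loadtxt("delta_pop_b3.txt")   # deltaBIC per replication, population b = .3

    print("error rate when M1 is true:", (delta_h0 > 0).mean())   # wrongly preferring M2
    print("error rate when M2 is true:", (delta_h1 < 0).mean())   # wrongly preferring M1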

    Model Selection 5 Error Rates


    Summary:

    How to determine the populations from which to simulate data: keep power analysis in the back of your mind, it is closely related.

    Mplus does not give the error rates. However, in combination with SPSS error rates can be computed. In Exercise 7 from the lab-meeting you have the opportunity to compute error rates in the context of multiple regression. Mplus gives a very rough alternative for error rates.

    The error rates discussed here are unconditional: what is the probability of erroneous decisions if data matrices come from M1 or M2?

    Very interesting and very Bayesian are conditional error rates: what is the probability that M1 and M2 are true if deltaBIC is equal to 2.45 for the observed data? However, these probabilities are beyond the scope of this workshop.

    References Model Selection



    An introduction to model selection can be found in:

    Burnham, K.P. and Anderson, D.R. (2002). Model Selection and Multi-Model Inference. New York: Springer.

    The DIC was introduced by:

    Spiegelhalter, D.J., Best, N.G., Carlin, B.P., and van der Linde, A. (2002). Bayesian measures of model complexity and fit. Journal of the Royal Statistical Society, 64, 583-639.

    The BIC is elaborated in:

    Kass, R.E. and Raftery, A.E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773-795.

    A comparison and overview can be found in:

    Hamaker, E.L., van Hattum, P., Kuiper, R., and Hoijtink, H. (2010). Model selection based on information criteria in multilevel modelling. In J. Hox and K. Roberts (Eds.), Handbook of Advanced Multilevel Modelling. London: Taylor and Francis.


    Lecture 5: An Application of Model Selection

    An Application of Model Selection 1


    Introduction of the Twin data

    and

    Analysis of the first model

    An Application of Model Selection 2

    title: The Twin Data File;


    data: file = twins.txt;

    variable:

    names = ID sex zygosity mothed fathed income eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    usev = mothed fathed eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    missing = all(999);

    model: fac by eng1 eng2 math1 math2 socsci1 socsci2 natsci1

    natsci2 vocab1 vocab2;

    fac on mothed fathed;

    analysis:

    estimator = bayes;
    processors = 2;

    fbiter = 10000;

    point = median;

    output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

    plot: type = plot1 plot2 plot3;

    An Application of Model Selection 3


    [Diagram: one factor F measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on M-ED and F-ED.]

    Model: 1 Factor and Education

    An Application of Model Selection 4


    *** WARNING

    Data set contains cases with missing on x-variables. These cases were not included in the analysis.

    Number of cases with missing on x-variables: 26

    1 WARNING(S) FOUND IN THE INPUT INSTRUCTIONS

    For model comparison all analyses must be based on the same number of persons. Therefore you have to deal with the missing data if Mplus excludes persons from the analysis like it does in this example.

    If there are relatively few missings, like here, a quick solution is to do a single imputation using a sensible imputation model.

    If there are many missings you have to resort to the use of multiple imputation and DIC4. However, that is beyond the context of this course and also, in statistical science, an area that is under development.

    An Application of Model Selection 5

    title: Single Imputation of the Twin Data File;


    data: FILE = twins.txt;

    variable:

    names = ID sex zygosity mothed fathed income eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    usev = mothed fathed income eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    auxiliary = ID sex zygosity;

    missing = all(999);

    data imputation:

    impute = mothed fathed income eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    ndatasets = 1;

    thin = 1000;
    save = twinimp*.dat;

    analysis: estimator = bayes;

    fbiter = 10000;

    processors = 2;

    An Application of Model Selection 6


    model: mothed with fathed income eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    fathed with income eng1 eng2
    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    income with eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    eng1 with eng2 math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    eng2 with math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    math1 with math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    math2 with socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    socsci1 with socsci2 natsci1 natsci2 vocab1 vocab2;

    socsci2 with natsci1 natsci2 vocab1 vocab2;

    natsci1 with natsci2 vocab1 vocab2;

    natsci2 with vocab1 vocab2;

    vocab1 with vocab2;

    output: tech8;

    An Application of Model Selection 7


    Analyse the first model using the single imputed data set

    An Application of Model Selection 8

    title: The Twin Data File;


    data: file = twinimp1.dat;

    variable:

    names = mothed fathed income eng1 eng2 math1 math2

    socsci1 socsci2 natsci1 natsci2 vocab1 vocab2 ID sex zygosity;

    usev = mothed fathed eng1 eng2

    math1 math2 socsci1 socsci2 natsci1 natsci2 vocab1 vocab2;

    missing = all(999);

    model: fac by eng1 eng2 math1 math2 socsci1 socsci2 natsci1

    natsci2 vocab1 vocab2;

    fac on mothed fathed;

    analysis:

    estimator = bayes;
    processors = 2;

    fbiter = 10000;

    point = median;

    output: standardized(stdyx) tech1 tech3 tech8 cinterval(hpd);

    plot: type = plot1 plot2 plot3;

    An Application of Model Selection 9


    In itself these numbers have no meaning. They can only be compared to the

    same numbers computed for one or more competing models.

    Model: 1 Factor and Education

    Information Criterion

    Deviance (DIC) 46237.298

    Estimated Number of Parameters (pD) 31.861

    Bayesian (BIC) 46388.873

    An Application of Model Selection 10


    [Diagram: two factors F1 (M1 E1 S1 N1 V1) and F2 (M2 E2 S2 N2 V2), both regressed on M-ED and F-ED.]

    Model: 2 Factor and Education

    An Application of Model Selection 11


    [Diagram: one factor F measured by M1 E1 S1 N1 V1 M2 E2 S2 N2 V2, regressed on Income.]

    Model: 1 Factor and Income

    An Application of Model Selection 12


    [Diagram: two factors F1 (M1 E1 S1 N1 V1) and F2 (M2 E2 S2 N2 V2), both regressed on Income.]

    Model: 2 Factor and Income

    An Application of Model Selection 13


    Model: 1 Factor and Education
    Information Criterion
    Deviance (DIC)                        46237.298
    Estimated Number of Parameters (pD)      31.861
    Bayesian (BIC)                        46388.873

    Model: 2 Factor and Education
    Information Criterion
    Deviance (DIC)                        46008.581
    Estimated Number of Parameters (pD)      34.841
    Bayesian (BIC)                        46174.343

    Model: 2 Factor and Income
    Information Criterion
    Deviance (DIC)                        46031.495
    Estimated Number of Parameters (pD)      32.818
    Bayesian (BIC)                        46187.846

    Model: 1 Factor and Income
    Information Criterion
    Deviance (DIC)                        46263.315
    Estimated Number of Parameters (pD)      30.940
    Bayesian (BIC)                        46410.004

    An Application of Model Selection 14


    Are the differences in BIC and DIC convincing?

    Should we determine the error rates?

    Should we determine the conditional error rates?

    An Application of Model Selection 15

    Posterior One-Tailed 95% C.I.

    Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    FAC1 BY

    ENG1 0.765 0.016 0.000 0.732 0.796

    MATH1 0.691 0.020 0.000 0.651 0.728

    SOCSCI1 0.862 0.011 0.000 0.840 0.883

    NATSCI1 0.770 0.016 0.000 0.738 0.801

    VOCAB1 0.850 0.012 0.000 0.827 0.873

    FAC2 BY

    ENG2 0.748 0.017 0.000 0.713 0.780

    MATH2 0.739 0.018 0.000 0.703 0.772

    SOCSCI2 0.868 0.011 0.000 0.847 0.888

    NATSCI2 0.762 0.016 0.000 0.729 0.793

    VOCAB2 0.862 0.011 0.000 0.839 0.883

    FAC1 ON

    MOTHED 0.098 0.042 0.010 0.016 0.180

    FATHED 0.236 0.041 0.000 0.154 0.316

    FAC2 ON

    MOTHED 0.088 0.042 0.018 0.006 0.170

    FATHED 0.256 0.041 0.000 0.177 0.336

    FAC2 WITH

    FAC1 0.870 0.013 0.000 0.843 0.895

    An Application of Model Selection 16


    Posterior One-Tailed 95% C.I.

    Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    Intercepts

    ENG1 3.423 0.140 0.000 3.149 3.698

    ENG2 3.733 0.145 0.000 3.445 4.015

    MATH1 2.731 0.121 0.000 2.484 2.958

    MATH2 2.785 0.126 0.000 2.537 3.033

    SOCSCI1 3.450 0.148 0.000 3.151 3.732

    SOCSCI2 3.502 0.149 0.000 3.201 3.786

    Residual Variances

    ENG1 0.415 0.025 0.000 0.367 0.464

    ENG2 0.441 0.026 0.000 0.392 0.492

    MATH1 0.523 0.027 0.000 0.470 0.577

    MATH2 0.455 0.026 0.000 0.405 0.507

    SOCSCI1 0.256 0.019 0.000 0.220 0.295

    SOCSCI2 0.246 0.018 0.000 0.211 0.283

    FAC1 0.907 0.020 0.000 0.868 0.944

    FAC2 0.900 0.020 0.000 0.858 0.937

    An Application of Model Selection 17

    R-SQUARE


    Posterior One-Tailed 95% C.I.

    Variable Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    ENG1 0.585 0.025 0.000 0.536 0.633

    ENG2 0.559 0.026 0.000 0.508 0.608

    MATH1 0.477 0.027 0.000 0.423 0.530

    MATH2 0.545 0.026 0.000 0.493 0.595

    SOCSCI1 0.744 0.019 0.000 0.705 0.780

    SOCSCI2 0.754 0.018 0.000 0.717 0.789

    NATSCI1 0.592 0.025 0.000 0.544 0.640

    NATSCI2 0.580 0.025 0.000 0.531 0.629

    VOCAB1 0.723 0.020 0.000 0.682 0.761

    VOCAB2 0.742 0.019 0.000 0.703 0.778

    Posterior One-Tailed 95% C.I.

    Variable Estimate S.D. P-Value Lower 2.5% Upper 2.5%

    FAC1 0.093 0.020 0.000 0.056 0.132

    FAC2 0.100 0.020 0.000 0.063 0.142

    An Application of Model Selection 18


    And now the empirical cycle has to be restarted !!!!

    References An Application of Model Selection


    Loehlin, J.C. and Nichols, R.C. (1976). Genes, Environment and Personality.

    Austin TX: University of Texas Press.


    Lecture 6: Model Selection in the Presence of Missing Data

    Model Selection and Missing Data 1



    ID Stork Urban Babies
    ... ... ... ...
    20 3 7 13
    21 1 4 11
    22 999 4 9
    23 3 6 11
    24 4 6 9
    25 8 7 16
    26 11 999 999
    27 5 3 7
    28 5 5 8
    29 999 6 14
    30 6 6 10
    31 7 5 999
    32 8 8 10
    33 999 999 8
    34 2 2 1
    35 4 4 8
    ... ... ... ...

    Model Selection and Missing Data 2



    Situation 1: The data are MAR when the statistical model is equal to the imputation model

    In Mplus, both the misfit and the complexity of the DIC are computed using only the observed data and parameter values sampled and estimated using the statistical model to impute the missing values.

    This is a valid procedure that can be used without hesitation.

    DIC = misfit + complexity = misfit + estimated number of parameters

    Model Selection and Missing Data 3


    BIC = misfit + complexity = misfit + log N x P

    In Mplus the misfit of the BIC is computed using only the observed data and parameter values sampled and estimated using the statistical model to impute the missing values.

    The complexity is estimated as the log of the number of persons multiplied by the number of parameters in the statistical model. As yet it is unknown how N should be determined in the presence of missing data. Mplus uses the sample size, but this is an ad-hoc and unmotivated choice.

    Currently it is not advised to use the BIC in the presence of missing data.

    Model Selection and Missing Data 4


    Situation 2: The statistical model is consistent with the imputation model, and, given

    the imputation model the missing values are MAR

    Using a three step procedure Mplus can be used to compute the DIC accounting

    for the fact that some of the data are missing:

    1. Multiply impute the data using the imputation model.

    2. For each imputed data matrix compute the DIC using Mplus.

    3. Average the DICs obtained for the M imputed data matrices (a minimal sketch follows below).

    The result is DIC4 as discussed by Celeux et al. (2006). This is not the definitive answer to the computation of the DIC in the presence of missing data, but at least there is some support for this approach in the scientific literature. One is well advised to use the Monte Carlo approach from Mplus to evaluate in each new situation how well the DIC4 performs. It is beyond the scope of this course to show how this can be done.

    Note that using MplusAutomation this can relatively easily be implemented (as opposed to doing this manually). However, this is also beyond the scope of this course.
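    A minimal sketch of step 3 in Python; the DIC values are hypothetical stand-ins for what Mplus reports per imputed data matrix:

    import numpy as np

    dics = np.array([46237.3, 46241.8, 46230.1, 46235.6, 46239.9])  # one DIC per imputed matrix
    dic4 = dics.mean()                                              # DIC4: the average DIC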

    References Model Selection and Missing Data


    A paper about the computation of DIC in the presence of missing data

    Celeux, G., Forbes, F., Robert, C.P., and Titterington, D.M. (2006). Deviance Information

    Criteria for Missing Data Models. Bayesian Analysis, 1, 651-674.

    A paper about the difference between the imputation and analysis model in the

    context of missing data

    Kuiper, R.M. and Hoijtink, H. (2011). How to handle missing data for predictor selection in regression models using the AIC. Statistica Neerlandica, 65, 489-506.

    MplusAutomation is developed by Michael Hallquist. If you google for CRAN MPLUSAUTOMATION you will find the website from which the R package and documentation can be downloaded.