A Gentle Introduction to Bayesian Analysis: Applications to Developmental Research

Rens van de Schoot, Utrecht University and North-West University
David Kaplan, University of Wisconsin–Madison
Jaap Denissen, University of Tilburg
Jens B. Asendorpf, Humboldt-University Berlin
Franz J. Neyer, Friedrich Schiller University of Jena
Marcel A. G. van Aken, Utrecht University

Bayesian statistical methods are becoming ever more popular in applied and fundamental research. In this study a gentle introduction to Bayesian analysis is provided. It is shown under what circumstances it is attractive to use Bayesian estimation, and how to interpret the results properly. First, the ingredients underlying Bayesian methods are introduced using a simplified example. Thereafter, the advantages and pitfalls of the specification of prior knowledge are discussed. To illustrate the Bayesian methods explained in this study, in a second example a series of studies that examine the theoretical framework of dynamic interactionism are considered. In the Discussion the advantages and disadvantages of using Bayesian statistics are reviewed, and guidelines on how to report on Bayesian statistics are provided.

it is clear that it is not possible to think about learning from experience and acting on it without coming to terms with Bayes' theorem.

Jerome Cornfield (in De Finetti, 1974a)

In this study, we provide a gentle introduction to Bayesian analysis and the Bayesian terminology without the use of formulas. We show why it is attractive to adopt a Bayesian perspective and, more practically, how to estimate a model from a Bayesian perspective using background knowledge in the actual data analysis and how to interpret the results.

Many developmental researchers might never have heard of Bayesian statistics, or if they have, they most likely have never used it for their own data analysis. However, Bayesian statistics is becoming more common in social and behavioral science research. As stated by Kruschke (2011a), in a special issue of Perspectives on Psychological Science:

whereas the 20th century was dominated by NHST [null hypothesis significance testing], the 21st century is becoming Bayesian. (p. 272)

Bayesian methods are also slowly becoming used in developmental research. For example, a number of Bayesian articles have been published in Child Development (n = 5), Developmental Psychology (n = 7), and Development and Psychopathology (n = 5) in the last 5 years (e.g., Meeus, Van de Schoot, Keijsers, Schwartz, & Branje, 2010; Rowe, Raudenbush, & Goldin-Meadow, 2012). The increase in Bayesian applications is not just taking place in developmental psychology but also in psychology in general. This increase is specifically due to the availability of Bayesian computational methods in popular software packages such as Amos (Arbuckle, 2006), Mplus v6 (Muthén & Muthén, 1998–2012; for the Bayesian methods in Mplus see Kaplan & Depaoli, 2012; Muthén & Asparouhov, 2012), WinBUGS (Lunn, Thomas, Best, & Spiegelhalter, 2000), and a large number of packages within the R statistical computing environment (Albert, 2009).

We would like to thank the following researchers for providing feedback on our manuscript to improve the readability: Geertjan Overbeek, Esmee Verhulp, Marielle Zondervan-Zwijnenburg, Christiane Gentzel, Koen Perryck, and Joris Broere. The first author was supported by a grant from The Netherlands Organization for Scientific Research: NWO-VENI-451-11-008. The second author was supported by the Institute of Education Sciences, U.S. Department of Education, through Grant R305D110001 to The University of Wisconsin–Madison. The opinions expressed are those of the authors and do not necessarily represent views of the Institute or the U.S. Department of Education.

Correspondence concerning this article should be addressed to Rens van de Schoot, Department of Methods and Statistics, Utrecht University, P.O. Box 80.140, 3508 TC Utrecht, The Netherlands. Electronic mail may be sent to [email protected].

© 2013 The Authors. Child Development © 2013 Society for Research in Child Development, Inc. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made. All rights reserved. 0009-3920/2013/xxxx-xxxx. DOI: 10.1111/cdev.12169

Child Development, xxxx 2013, Volume 00, Number 0, Pages 1–19

Of specific concern to substantive researchers, the Bayesian paradigm offers a very different view of hypothesis testing (e.g., Kaplan & Depaoli, 2012, 2013; Walker, Gustafson, & Frimer, 2007; Zhang, Hamagami, Wang, Grimm, & Nesselroade, 2007). Specifically, Bayesian approaches allow researchers to incorporate background knowledge into their analyses instead of testing essentially the same null hypothesis over and over again, ignoring the lessons of previous studies. In contrast, statistical methods based on the frequentist (classical) paradigm (i.e., the default approach in most software) often involve testing the null hypothesis. In plain terms, the null hypothesis states that "nothing is going on." This hypothesis might be a bad starting point because, based on previous research, it is almost always expected that something is going on. Replication is an important and indispensable tool in psychology in general (Asendorpf et al., 2013), and Bayesian methods fit within this framework because background knowledge is integrated into the statistical model. As a result, the plausibility of previous research findings can be evaluated in relation to new data, which makes the proposed method an interesting tool for confirmatory strategies.

The organization of this study is as follows: First, we discuss probability in the frequentist and Bayesian frameworks, followed by a description, in general terms, of the essential ingredients of a Bayesian analysis using a simple example. To illustrate Bayesian inference, we reanalyze a series of studies on the theoretical framework of dynamic interactionism, where individuals are believed to develop through a dynamic and reciprocal transaction between personality and the environment. Thereby, we apply the Bayesian approach to a structural equation modeling (SEM) framework within an area of developmental psychology where theory building and replication play a strong role. We conclude with a discussion of the advantages of adopting a Bayesian point of view in the context of developmental research. In the online supporting information appendices, we provide an introduction to the computational machinery of Bayesian statistics, together with annotated syntax for running Bayesian analyses in Mplus, WinBUGS, and Amos.

    Probability

Most researchers recognize the important role that statistical analyses play in addressing research questions. However, not all researchers are aware of the theories of probability that underlie model estimation, as well as the practical differences between these theories. These two theories are referred to as the frequentist paradigm and the subjective probability paradigm.

Conventional approaches to developmental research derive from the frequentist paradigm of statistics, advocated mainly by R. A. Fisher, Jerzy Neyman, and Egon Pearson. This paradigm associates probability with long-run frequency. The canonical example of long-run frequency is the notion of an infinite coin toss. A sample space of possible outcomes (heads and tails) is enumerated, and probability is the proportion of the outcome (say heads) over the number of coin tosses.
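The long-run frequency idea can be made concrete with a few lines of simulation. The sketch below is illustrative only and is not part of the original article; the toss counts and the random seed are arbitrary choices.

```python
import numpy as np

# Long-run frequency: the proportion of heads stabilizes as the number of
# tosses grows (illustrative sketch; the sample sizes are arbitrary).
rng = np.random.default_rng(seed=1)
for n_tosses in (10, 100, 10_000, 1_000_000):
    heads = rng.integers(0, 2, size=n_tosses).sum()  # 0 = tails, 1 = heads
    print(f"{n_tosses:>9} tosses: proportion of heads = {heads / n_tosses:.4f}")
```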

The Bayesian paradigm, in contrast, interprets probability as the subjective experience of uncertainty (De Finetti, 1974b). Bayes' theorem is a model for learning from data, as suggested in the Cornfield quote at the beginning of this study. In this paradigm, the classic example of the subjective experience of uncertainty is the notion of placing a bet. Here, unlike with the frequentist paradigm, there is no notion of infinitely repeating an event of interest. Rather, placing a bet, for example on a baseball game or horse race, involves using as much prior information as possible as well as personal judgment. Once the outcome is revealed, the prior information is updated. This is the model of learning from experience (data) that is the essence of the Cornfield quote at the beginning of this study. Table 1 provides an overview of similarities and differences between frequentist and Bayesian statistics.

The goal of statistics is to use the data to say something about the population. In estimating, for example, the mean of some variable in a population, the mean of the sample data is a statistic (i.e., estimated mean) and the unknown population mean is the actual parameter of interest. Similarly, the regression coefficients from a regression analysis remain unknown parameters estimated from data. We refer to means, regression coefficients, residual variances, and so on as unknown parameters in a model. Using software like SPSS, Amos, or Mplus, these unknown parameters can be estimated. One can choose the type of estimator for the computation, for example, maximum likelihood (ML) estimation or Bayesian estimation.

The key difference between Bayesian statistical inference and frequentist (e.g., ML estimation) statistical methods concerns the nature of the unknown parameters. In the frequentist framework, a parameter of interest is assumed to be unknown, but fixed. That is, it is assumed that in the population there is only one true population parameter, for example, one true mean or one true regression coefficient. In the Bayesian view of subjective probability, all unknown parameters are treated as uncertain and therefore should be described by a probability distribution.

    The Ingredients of Bayesian Statistics

There are three essential ingredients underlying Bayesian statistics, first described by T. Bayes in 1763 (Bayes & Price, 1763; Stigler, 1986). Briefly, these ingredients can be described as follows (they will be explained in more detail in the following sections).

The first ingredient is the background knowledge on the parameters of the model being tested. This first ingredient refers to all knowledge available before seeing the data and is captured in the so-called prior distribution, for example, a normal distribution. The variance of this prior distribution reflects our level of uncertainty about the population value of the parameter of interest: The larger the variance, the more uncertain we are. The prior variance is expressed as precision, which is simply the inverse of the variance. The smaller the prior variance, the higher the precision, and the more confident one is that the prior mean reflects the population mean. In this study we will vary the specification of the prior distribution to evaluate its influence on the final results.

The second ingredient is the information in the data themselves. It is the observed evidence, expressed in terms of the likelihood function of the data given the parameters. In other words, the likelihood function asks:

Given a set of parameters, such as the mean and/or the variance, what is the likelihood or probability of the data in hand?

The third ingredient is based on combining the first two ingredients, which is called posterior inference. Both (1) and (2) are combined via Bayes' theorem (described in more detail in the online Appendix S1) and are summarized by the so-called posterior distribution, which is a compromise of the prior knowledge and the observed evidence. The posterior distribution reflects one's updated knowledge, balancing prior knowledge with observed data.
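For readers who want to see the three ingredients interact numerically, the following sketch works through the conjugate normal case (a normal prior on a mean and normally distributed data with known variance), for which the posterior can be written down directly. All numbers are illustrative assumptions, not values from the article.

```python
import numpy as np

# Conjugate normal-normal update for a mean with known data variance:
# posterior precision = prior precision + n * data precision,
# posterior mean = precision-weighted average of prior mean and sample mean.
def update_normal_mean(prior_mean, prior_var, data, data_var):
    n = len(data)
    prior_prec = 1.0 / prior_var           # precision is the inverse of variance
    data_prec = n / data_var
    post_prec = prior_prec + data_prec
    post_mean = (prior_prec * prior_mean + data_prec * np.mean(data)) / post_prec
    return post_mean, 1.0 / post_prec      # posterior mean and variance

# Illustrative numbers (not from the article): prior belief 100 with SD 10,
# 20 observations with a sample mean near 102 and known variance 15**2.
rng = np.random.default_rng(0)
y = rng.normal(102, 15, size=20)
m, v = update_normal_mean(prior_mean=100, prior_var=10**2, data=y, data_var=15**2)
print(f"posterior mean = {m:.2f}, posterior SD = {np.sqrt(v):.2f}")
```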

Table 1
Overview of the Similarities and Differences Between Frequentist and Bayesian Statistics

Definition of the p value
  Frequentist: The probability of observing the same or more extreme data, assuming that the null hypothesis is true in the population.
  Bayesian: The probability of the (null) hypothesis.

Large samples needed?
  Frequentist: Usually, when normal theory-based methods are used.
  Bayesian: Not necessarily.

Inclusion of prior knowledge possible?
  Frequentist: No.
  Bayesian: Yes.

Nature of the parameters in the model
  Frequentist: Unknown but fixed.
  Bayesian: Unknown and therefore random.

Population parameter
  Frequentist: One true value.
  Bayesian: A distribution of values reflecting uncertainty.

Uncertainty is defined by
  Frequentist: The sampling distribution based on the idea of infinite repeated sampling.
  Bayesian: The probability distribution for the population parameter.

Estimated intervals
  Frequentist: Confidence interval: Over an infinity of samples taken from the population, 95% of these contain the true population value.
  Bayesian: Credibility interval: A 95% probability that the population value is within the limits of the interval.


These three ingredients constitute Bayes' theorem, which states, in words, that our updated understanding of the parameters of interest given our current data depends on our prior knowledge about the parameters of interest weighted by the current evidence given those parameters of interest. In online Appendix S1 we elaborate on the theorem. In what follows, we will explain Bayes' theorem and its three ingredients in detail.

    Prior Knowledge

Why Define Prior Knowledge?

The key epistemological reason concerns our view that progress in science generally comes about by learning from previous research findings and incorporating information from these research findings into our present studies. Often information gleaned from previous research is incorporated into our choice of designs, variables to be measured, or conceptual diagrams to be drawn. With the Bayesian methodology our prior beliefs are made explicit, and are moderated by the actual data in hand. (Kaplan & Depaoli, 2013, p. 412)

How to Define Prior Knowledge?

The data we have in our hands moderate our prior beliefs regarding the parameters and thus lead to updated beliefs. But how do we specify priors? The choice of a prior is based on how much information we believe we have prior to the data collection and how accurate we believe that information to be. There are roughly two scenarios. First, in some cases we may not be in possession of enough prior information to aid in drawing posterior inferences. From a Bayesian point of view, this lack of information is still important to consider and incorporate into our statistical specifications.

In other words, it is equally important to quantify our ignorance as it is to quantify our cumulative understanding of a problem at hand. (Kaplan & Depaoli, 2013, p. 412)

Second, in some cases we may have considerable prior information regarding the value of a parameter and our sense of the accuracy around that value. For example, after decades of research on the relation between, say, parent socioeconomic status and student achievement, we may, with a bit of effort, be able to provide a fairly accurate prior distribution on the parameter that measures that relation. Prior information can also be obtained from meta-analyses and also previous waves of surveys. These sources of information regarding priors are "objective" in the sense that others can verify the source of the prior information. This should not be confused with the notion of objective priors, which constitute pure ignorance of background knowledge. Often the so-called uniform distribution is used to express an objective prior. For some subjective Bayesians, priors can come from any source: objective or otherwise. The issue just described is referred to as the "elicitation problem" and has been nicely discussed in O'Hagan et al. (2006; see also Rietbergen, Klugkist, Janssen, Moons, & Hoijtink, 2011; Van Wesel, 2011). If one is unsure about the prior distribution, a sensitivity analysis is recommended (e.g., Gelman, Carlin, Stern, & Rubin, 2004). In such an analysis, the results of different prior specifications are compared to inspect the influence of the prior. We will demonstrate sensitivity analyses in our examples.

    An Example

Let us use a very simple example to introduce the prior specification. We will only estimate two parameters: the mean and variance of reading skills, for example, measured at entry to kindergarten for children in a state-funded prekindergarten program. To introduce the Bayesian methodology, we will first focus on this extremely simple case, and only thereafter will we consider a more complex (and often more realistic) example. In online Appendices S2–S4 we provide the syntax for analyzing this example using Mplus, WinBUGS, and Amos.

The prior reflects our knowledge about the mean reading skills score before observing the current data. Different priors can be constructed reflecting different types of prior knowledge. Throughout the study we will use different priors with different levels of subjectivity to illustrate the effects of using background knowledge. In the section covering our real-life example, we base our prior specification on previous research results, but in the current section we discuss several hypothetical prior specifications. In Figure 1, six different distributions of possible reading skills scores are displayed, representing degrees of prior knowledge. These distributions could reflect expert knowledge and/or results from previous similar research studies or meta-analyses.


Noninformative Prior Distributions

In Figure 1a, it is assumed that we do not know anything about the mean reading skills score and that every value of the mean reading skills score in our data between minus infinity and plus infinity is equally likely. Such a distribution is often referred to as a noninformative prior distribution.

A frequentist analysis of the problem would ignore our accumulated knowledge and let the data "speak for themselves," as if there has never been any prior research on reading skills. However, it could be reasonably argued that empirical psychology has accumulated a considerable amount of empirical information about the distribution of reading skills scores in the population.

    Informative Prior Distributions

From a Bayesian perspective, it seems natural to incorporate what has been learned so far into our analysis. This implies that we specify a distribution for the mean of reading skills. The parameters of the prior distribution are referred to as hyperparameters. If a normal distribution is specified for the prior of the mean reading score, the hyperparameters are the prior mean and the prior precision. Thus, based on previous research one can specify the expected prior mean. If reading skills are assessed by a standardized test with a mean of 100, we hypothesize that reading skills scores close to 100 are more likely to occur in our data than values further away from 100, but every value in the entire range between minus and plus infinity is still allowed.

Figure 1. A priori beliefs about the distribution of reading skills scores in the population.

Also, the prior precision needs to be specified, which reflects the certainty or uncertainty about the prior mean. The more certain we are, the smaller we can specify the prior variance and, as such, the precision of our prior will increase. Such a prior distribution encodes our existing knowledge and is referred to as a subjective or informative prior distribution. If a low precision is specified, such as Prior 2 in Figure 1b, it is often referred to as a low-informative prior distribution. Note that it is the variance of the prior distribution we are considering here and not the variance of the mean of the reading skills score.

One could question how realistic Priors 1 and 2 in Figures 1a and 1b are, if a reading skills score is the variable of interest. What is a negative reading skills score? And can reading skills result in any positive score? To assess reading skills, a reading test could be used. What if we use a reading test where the minimum possible score is 40 and the maximum possible score is 180? When using such a test, Priors 1 and 2 are not really sensible. In Figure 1c, a third prior distribution is specified where values outside the range 40–180 are not allowed and within this range every reading skills score is equally likely.

Perhaps we can include even more information in our prior distribution, with the goal to increase precision and therefore contribute to more accurate estimates. As said before, we assume that our data are obtained from a randomly selected sample from the general population. In that case we might expect a mean of reading skills scores close to 100 to be more probable than extremely low or high scores. In Figure 1d, a prior distribution is displayed that represents this expectation. Figure 1e shows that we can increase the precision of our prior distribution by decreasing its prior variance.

In Figure 1f, a prior distribution is specified where a very low score of reading skills is expected and we are very certain about obtaining such a mean score in our data. This is reflected by a prior distribution with high precision, that is, a small prior variance. If we sample from the general population, such a prior distribution would be highly unlikely to be supported by the data and, in this case, would be a misspecified prior. If, however, we have specified inclusion criteria for our sample, for example, only children with reading skills scores lower than 80 are included because this is the target group, then Prior 6 is correctly specified and Prior 5 would be misspecified.

To summarize, the prior reflects our knowledge about the parameters of our model before observing the current data. If one does not want to specify any prior knowledge, then noninformative priors can be specified and, as a result, the final results will not be influenced by the specification of the prior. In the Bayesian literature, this approach to using noninformative priors is referred to as "objective" Bayesian statistics (Press, 2003) because only the data determine the posterior results. Using the objective Bayesian method, one can still benefit from using Bayesian statistics, as will be explained throughout the study.

If a low-informative prior is specified, the results are hardly influenced by the specification of the prior, particularly for large samples. The more prior information is added, the more subjective it becomes. Subjective priors are beneficial because (a) findings from previous research can be incorporated into the analyses and (b) Bayesian credible intervals will be smaller. Both benefits will be discussed more thoroughly in the section where we discuss our posterior results. Note that the term subjective has been a source of controversy between Bayesians and frequentists. We prefer the term informative and argue that the use of any kind of prior be warranted by appealing to empirical evidence. However, for this study, we stay with the term subjective because it is more commonly used in the applied and theoretical literature.

Note that for each and every parameter in the model, a prior distribution needs to be specified. As we have specified a prior distribution for the mean of reading skills scores, we also have to specify a prior distribution for the variance/standard deviation of reading skills. This is because for Bayesian statistics, we assume a distribution for each and every parameter, including (co)variances. As we might have fewer prior expectations about the variance of reading skills, we might want to specify a low-informative prior distribution. If we specify the prior for the (residual) variance term in such a way that it can only obtain positive values, the obtained posterior distribution can never have negative values, such as a negative (residual) variance.
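As an illustration of specifying a prior for every parameter, the sketch below pairs a normal prior for the mean with a positive-only prior for the variance. The inverse-gamma family and its hyperparameters are our own illustrative choices, not the specification used in the article's software runs.

```python
from scipy import stats

# Illustrative prior specification for both parameters of the reading-skills
# example: a normal prior for the mean and an inverse-gamma prior for the
# variance. The inverse-gamma places all of its mass on positive values, so
# the implied posterior for the variance can never be negative.
prior_mean = stats.norm(loc=100, scale=10)            # prior for the mean
prior_variance = stats.invgamma(a=2.0, scale=100.0)   # positive-only prior

draws_mean = prior_mean.rvs(size=5, random_state=1)
draws_var = prior_variance.rvs(size=5, random_state=1)
print("prior draws for the mean:    ", draws_mean.round(1))
print("prior draws for the variance:", draws_var.round(1))  # all positive
```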

    Observed Evidence

After specifying the prior distribution for all parameters in the model, one can begin analyzing the actual data. Let us say we have information on the reading skills scores for 20 children. We used the software BIEMS (Mulder, Hoijtink, & de Leeuw, 2012) for generating an exact data set where the mean and standard deviation of reading skills scores were manually specified. The second component of Bayesian analysis is the observed evidence for our parameters in the data (i.e., the sample mean and variance of the reading skills scores). This information is summarized by the likelihood function containing the information about the parameters given the data set (i.e., akin to a histogram of possible values). The likelihood is a function reflecting what the most likely values are for the unknown parameters, given the data. Note that the likelihood function is also obtained when non-Bayesian analyses are conducted using ML estimation. In our hypothetical example, the sample mean appears to be 102. So, given the data, a reading skills score of 102 is the most likely value of the population mean; that is, the likelihood function achieves its maximum for this value.
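The role of the likelihood function can be sketched by evaluating candidate values of the population mean against a data set whose sample mean is near 102. The data below are simulated stand-ins (the article generated its data with BIEMS), so the exact numbers will differ.

```python
import numpy as np
from scipy import stats

# Evaluate the log-likelihood of candidate values for the population mean,
# holding the standard deviation fixed at its sample estimate. The curve
# peaks at the sample mean, which is what ML estimation reports.
rng = np.random.default_rng(42)
y = rng.normal(102, 15, size=20)                 # stand-in for the n = 20 scores
grid = np.linspace(80, 125, 451)                 # candidate population means
loglik = [stats.norm(mu, y.std(ddof=1)).logpdf(y).sum() for mu in grid]
best = grid[np.argmax(loglik)]
print(f"sample mean = {y.mean():.2f}, grid value maximizing the likelihood = {best:.2f}")
```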

    Posterior Distribution

With the prior distribution and current data in hand, these are then combined via Bayes' theorem to form the so-called posterior distribution. Specifically, in this case, Bayes' theorem states that our prior knowledge is updated by the current data to yield updated knowledge in the form of the posterior distribution. That is, we can use our prior information in estimating the population mean, variance, and other aspects of the distribution for this sample.

In most cases, obtaining the posterior distribution is done by simulation, using so-called Markov chain Monte Carlo (MCMC) methods. The general idea of MCMC is that instead of attempting to analytically solve for the point estimates, as with ML estimation, an iterative procedure is used to estimate the parameters. For a more detailed introduction, see Kruschke (2011b, 2013), and for a more technical introduction, see Lynch (2007) or Gelman et al. (2004). See online Appendix S1 for a brief introduction.
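As a language-agnostic illustration of the MCMC idea, here is a minimal random-walk Metropolis sampler for the mean under a normal prior and known data variance. The article's analyses rely on the Gibbs sampler as implemented in Mplus and WinBUGS; the proposal step size, iteration count, and burn-in below are arbitrary choices.

```python
import numpy as np

# Minimal random-walk Metropolis sampler for a normal mean with known data
# variance and a normal prior (illustration of the MCMC idea only).
def log_posterior(mu, y, data_var, prior_mean, prior_var):
    log_lik = -0.5 * np.sum((y - mu) ** 2) / data_var
    log_prior = -0.5 * (mu - prior_mean) ** 2 / prior_var
    return log_lik + log_prior

rng = np.random.default_rng(3)
y = rng.normal(102, 15, size=20)                 # stand-in data
mu, chain = 100.0, []
for _ in range(20_000):
    proposal = mu + rng.normal(0, 2.0)           # random-walk proposal
    log_ratio = (log_posterior(proposal, y, 15**2, 100, 10**2)
                 - log_posterior(mu, y, 15**2, 100, 10**2))
    if np.log(rng.uniform()) < log_ratio:        # accept with prob. min(1, ratio)
        mu = proposal
    chain.append(mu)
posterior = np.array(chain[2_000:])              # discard burn-in iterations
print(f"posterior mean = {posterior.mean():.2f}, posterior SD = {posterior.std():.2f}")
```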

    The Posterior Distribution in the Example

The graphs in Figure 2 demonstrate how the prior information and the information in the data are combined in the posterior distribution. The more information is specified in the prior distribution, the smaller the posterior distribution of reading skills becomes. As long as the prior mean is uninformative (see Figure 2a), the result obtained for the mean with ML estimation and the posterior mean will always be approximately similar. If an informative prior is specified, the posterior mean is only similar to the ML mean if the prior mean is (relatively) similar to the ML estimate (see Figures 2b to 2e). If the prior mean is different from the ML mean (Prior 6), the posterior mean will shift toward the prior (see Figure 2f).

The precision of the prior distribution for the reading skills scores influences the posterior distribution. If a noninformative prior is specified, the variance of the posterior distribution is not influenced (see Figure 2a). The more certain one is about the prior, the smaller its variance, and hence the more peaked the posterior will be (cf. Figures 2d and 2e).

    Posterior Probability Intervals (PPIs)

Let us now take a closer look at the actual parameter estimates. We analyzed our data set with Mplus, Amos, and WinBUGS. Not all prior specifications are available in each software package; this has been indicated in Table 2 by using subscripts. In Mplus, the default prior distributions for means and regression coefficients are normal distributions with a prior mean of zero and an infinitely large prior variance, that is, low precision (see Figure 1b). If the prior precision of a specific parameter is set low enough, then the prior in Figure 1a will be approximated. The other prior specifications in Figure 1 are not available in Mplus. In Amos, however, one can specify a uniform prior, like in Figure 1a, but also normal distributions, like in Figure 1b, and a uniform distribution using the boundaries of the underlying scale, like in Figure 1c. If prior distributions of Figures 1d to 1f are of interest, one needs to switch to WinBUGS. We assumed no prior information for the variance of reading skills scores and we used the default settings in Amos and Mplus, but in WinBUGS we used a low-informative gamma distribution.

In the table, the posterior mean reading skills score and the PPIs are displayed for the six different types of prior specifications for our hypothetical example. Recall that the frequentist confidence interval is based on the assumption of a very large number of repeated samples from the population that are characterized by a fixed and unknown parameter. For any given sample, we can obtain the sample mean and compute, for example, a 95% confidence interval. The correct frequentist interpretation is that 95% of these confidence intervals capture the true parameter under the null hypothesis. Unfortunately, results of the frequentist paradigm are often misunderstood (see Gigerenzer, 2004). For example, the frequentist-based 95% confidence interval is often interpreted as meaning that there is a 95% chance that a parameter of interest lies between an upper and lower limit, whereas the correct interpretation is that 95 of 100 replications of exactly the same experiment capture the fixed but unknown parameter, assuming the alternative hypothesis about that parameter is true.

The Bayesian counterpart of the frequentist confidence interval is the PPI, also referred to as the credibility interval. The PPI gives the 95% probability that in the population the parameter lies between the two values. Note, however, that although the PPI and the confidence interval may be numerically similar and might serve related inferential goals, they are not mathematically equivalent and are conceptually quite different. We argue that the PPI is easier to communicate because it is actually the probability that a certain parameter lies between two numbers, which is not the definition of a classical confidence interval (see also Table 1).
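A 95% PPI can be read directly from posterior draws, which is one reason its interpretation is so direct. The sketch below uses stand-in draws; the mean and spread are illustrative, not the article's results.

```python
import numpy as np

# A 95% posterior probability interval (PPI) is the central 95% of the
# posterior distribution. Here we use illustrative posterior draws (e.g., as
# produced by MCMC) and read the interval off the 2.5th and 97.5th percentiles.
rng = np.random.default_rng(7)
posterior_draws = rng.normal(loc=101.9, scale=3.4, size=50_000)  # stand-in draws
lower, upper = np.percentile(posterior_draws, [2.5, 97.5])
print(f"95% PPI: [{lower:.2f}, {upper:.2f}] "
      "(read as: 95% probability that the population mean lies in this interval)")
```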

    Posterior Results of the Example

The posterior results are influenced by the prior specification. The higher the prior precision, the smaller the posterior variance and the more certain one can be about the results. Let us examine this relation using our example.

Figure 2. The likelihood function and posterior distributions for six different specifications of the prior distribution.

When the prior in Figure 2a is used, the posterior distribution is hardly influenced by the prior distribution. The estimates for the mean reading skills score obtained from the likelihood function (i.e., the ML results) and the posterior result are close to each other (see the first two rows of Table 2). If a normal distribution for the prior is used, as in Figure 2b, the 95% PPI is only influenced when a high-precision prior is specified; see Table 2 and compare the results of Priors 2a, 2b, and 2c, where only for Prior 2c the resulting PPI is smaller compared to the other priors we discussed so far. This makes sense because for the latter prior we specified a highly informative distribution; that is, the variance of the prior distribution is quite small, reflecting strong prior beliefs. If the prior of Figure 2c is used, the results are similar to the ML results. When the prior of Figure 2c is combined with specifying a normal distribution, the PPI decreases again. If we increase the prior precision of the mean even further, for example, for Prior 5, the PPI decreases even more. If the prior mean is misspecified, like in Figure 2f, the posterior mean will be affected; see the results in Table 2 for Priors 6a and 6b. The difference between Priors 6a and 6b reflects the degree of certainty we have about the prior mean. For Prior 6a we are rather sure the mean was 80, which is reflected by a high prior precision. For Prior 6b we are less sure, and we used a low prior precision. The posterior mean of Prior 6b is therefore closer to the ML estimate when compared to the posterior mean of Prior 6a.
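The sensitivity-analysis pattern behind Priors 2a to 2c can be sketched with the conjugate update run at prior variances of 100, 10, and 1 (prior mean fixed at 100). The data below are simulated stand-ins, so the output only approximates the pattern in Table 2: the PPI narrows as the prior precision increases.

```python
import numpy as np

# Sensitivity analysis over the prior variance (prior mean fixed at 100),
# mirroring Priors 2a-2c: as the prior variance shrinks (precision rises),
# the 95% PPI narrows. Data are simulated stand-ins, not the article's data.
rng = np.random.default_rng(11)
y = rng.normal(102, 15, size=20)
data_var = y.var(ddof=1)
for prior_var in (100.0, 10.0, 1.0):
    post_prec = 1 / prior_var + len(y) / data_var
    post_mean = (100 / prior_var + y.sum() / data_var) / post_prec
    post_sd = np.sqrt(1 / post_prec)
    lo, hi = post_mean - 1.96 * post_sd, post_mean + 1.96 * post_sd
    print(f"prior var {prior_var:>5}: posterior mean {post_mean:6.2f}, "
          f"95% PPI [{lo:6.2f}, {hi:6.2f}]")
```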

To summarize, the more prior information is added to the model, the smaller the 95% PPIs become, which is a nice feature of the Bayesian methodology. That is, after confronting the prior knowledge with the data one can be more certain about the obtained results when compared to frequentist methods. This way, science can be truly accumulative. However, when the prior is misspecified, the posterior results are affected because the posterior results are always a compromise between the prior distribution and the likelihood function of the data.

Table 2
Posterior Results Obtained With Mplus, Amos, or WinBUGS (n = 20)

Prior type          Prior precision used (prior mean was always 100)   Posterior mean reading skills score   95% CI/PPI
ML                                                                      102.00                                94.42–109.57
Prior 1 (A, W)                                                          101.99                                94.35–109.62
Prior 2a (M, A, W)  Large variance, i.e., Var. = 100                    101.99                                94.40–109.42
Prior 2b (M, A, W)  Medium variance, i.e., Var. = 10                    101.99                                94.89–109.07
Prior 2c (M, A, W)  Small variance, i.e., Var. = 1                      102.00                                100.12–103.87
Prior 3 (A, W)                                                          102.03                                94.22–109.71
Prior 4 (W)         Medium variance, i.e., Var. = 10                    102.00                                97.76–106.80
Prior 5 (W)         Small variance, i.e., Var. = 1                      102.00                                100.20–103.90
Prior 6a (W)        Large variance, i.e., Var. = 100                    99.37                                 92.47–106.10
Prior 6b (W)        Medium variance, i.e., Var. = 10                    86.56                                 80.17–92.47

Note. CI = confidence interval; PPI = posterior probability interval; ML = maximum likelihood results; SD = standard deviation; M = posterior mean obtained using Mplus; A = posterior mean obtained using Amos; W = posterior mean obtained using WinBUGS.

An Empirical Example

To illustrate the Bayesian methods explained in this study, we consider a series of articles that study the theoretical framework of dynamic interactionism, where individuals are believed to develop through a dynamic and reciprocal transaction between personality and the environment (e.g., quality of social relationships; Caspi, 1998). The main aim of the examined research program was to study the reciprocal associations between personality and relationships over time. In the case of extraversion, for example, an extraverted adolescent might seek out a peer group where extraversion is valued and reinforced, and as such becomes more extraverted.

A theory that explains environmental effects on personality is the social investment theory (Roberts, Wood, & Smith, 2005). This theory predicts that the successful fulfillment of societal roles (in work, relationships, health) leads to strengthening of those personality dimensions that are relevant for this fulfillment. For this study, the social investment theory is important because it can be hypothesized that effects of fulfilling societal roles on personality are stronger in emerging adulthood, when these roles are more central, than in earlier phases of adolescence. At the time of the first article in our series (Asendorpf & Wilpers, 1998), however, the predictions of social investment theory were not yet published. Instead, the authors started with a theoretical notion by McCrae and Costa (1996) that personality influences would be more important in predicting social relationships than vice versa. At the time, however, the idea did not yet have much support because:

empirical evidence on the relative strength of personality effects on relationships and vice versa is surprisingly limited. (p. 1532)

Asendorpf and Wilpers (1998) investigated for the first time personality and relationships over time in a sample of young students (N = 132) after their transition to university. The main conclusion of their analyses was that personality influenced change in social relationships, but not vice versa. Neyer and Asendorpf (2001) studied personality–relationship transactions using now a large representative sample of young adults from all over Germany (age between 18 and 30 years; N = 489). Based on the previous results, Neyer and Asendorpf

hypothesized that personality effects would have a clear superiority over relationships effects. (p. 1193)

Consistent with Asendorpf and Wilpers (1998), Neyer and Asendorpf (2001) concluded that once initial correlations were controlled, personality traits predicted change in various aspects of social relationships, whereas effects of antecedent relationships on personality were rare and restricted to very specific relationships with one's preschool children (p. 1200). Asendorpf and van Aken (2003) continued working on studies into the personality–relationship transaction, now on 12-year-olds who were followed up until age 17 (N = 174), and tried to replicate key findings of these earlier studies. Asendorpf and van Aken confirmed previous findings and concluded that the stronger effect was an extraversion effect on perceived support from peers. This result replicates, once more, similar findings in adulthood.

Sturaro, Denissen, van Aken, and Asendorpf (2010), once again, investigated the personality–relationship transaction model. The main goal of the 2010 study was to replicate the personality–relationship transaction results in an older sample (17–23 years) compared to the study of Asendorpf and van Aken (2003; 12–17 years). Sturaro et al. found some contradictory results compared to the previously described studies.

[The five-factor theory] predicts significant paths from personality to change in social relationship quality, whereas it does not predict social relationship quality to have an impact on personality change. Contrary to our expectation, however, personality did not predict changes in relationship quality. (p. 8)

In conclusion, the four articles described above clearly illustrate how theory building works in daily practice. By using the quotes from these articles we have seen that researchers do have prior knowledge in their Introduction and Discussion sections. However, all these articles ignored this prior knowledge because they were based on frequentist statistics that test the null hypothesis that parameters are equal to zero. Using Bayesian statistics, we will include prior knowledge in the analysis by specifying a relevant prior distribution.

    Method

    Description of the Neyer and Asendorpf (2001) Data

Participants were part of a longitudinal study of young adults. This sample started in 1995 (when participants were 18–30 years old; mean age = 24.4 years, SD = 3.7) with 637 participants who were largely representative of the population of adult Germans. The sample was reassessed 4 years later (return rate = 76%). The longitudinal sample included 489 participants (N = 226 females).

To simplify the models, we focus here on only two variables: extraversion as an indicator for personality and closeness with/support by friends as an indicator for relationship quality. Quality of relationships was assessed at both occasions using a social network inventory, where respondents were asked to recall those persons who play an important role in their lives. In the present investigation, we reanalyzed the relevant data on the felt closeness with friends. Participants named on average 4.82 friends (SD = 4.22) and 5.62 friends (SD = 4.72) at the first and second occasions, respectively. Closeness was measured with the item "How close do you feel to this person?" (1 = very distant to 5 = very close). The ratings were averaged across all friends. Extraversion was assessed using the German version of the NEO-FFI (Borkenau & Ostendorf, 1993). Internal consistencies at both measurement occasions were .76 and .78, and the rank-order stability was r = .61.


Description of the Sturaro et al. (2010) and Asendorpf and van Aken (2003) Data

Participants were part of the Munich Longitudinal Study on the Genesis of Individual Competencies (Weinert & Schneider, 1999). This sample started in 1984 (when participants were 3 to 4 years old) with 230 children from the German city of Munich. Participants were selected from a broad range of neighborhoods to ensure representativeness. This study focuses on reassessments of the sample at ages 12, 17, and 23. At age 12, 186 participants were still part of the sample; at age 17 this was true for 174 participants. Because the Asendorpf and van Aken (2003) publication focused on participants with complete data at both waves of data collection, the present analyses focus on the 174 individuals with data at ages 12 and 17. At age 23, a total of 154 participants were still in the sample. For this study, analyses focus on a subset of 148 individuals who provided personality self-ratings.

Measures selected for this study were taken in correspondence with the cited articles. At age 12, support by friends was measured as support from classroom friends. For the category of classroom friends, an average of 3.0 individuals was listed. For each of these individuals, participants rated the supportiveness of the relationship in terms of instrumental help, intimacy, esteem enhancement, and reliability (three items each; items of the first three scales were adapted from the NRI; Furman & Buhrmester, 1985). Ratings were averaged across all friends. At age 17, the same scales were repeated only for the best friend in class. In both years, if participants did not have any classroom friends, they received a score of 1 for support (the lowest possible). At age 23, support was measured using an ego-centered Social Network Questionnaire. Like the Sturaro et al. (2010) article, we focus here on the average quality with same-sex peers because this measure was deemed most comparable with the peer measures at ages 12 and 17.

Extraversion at ages 12 and 17 was assessed with bipolar adjective pairs (Ostendorf, 1990; sample item: unsociable vs. outgoing). At age 23, extraversion was assessed with a scale from the NEO-FFI (Borkenau & Ostendorf, 1993; sample item: "I like to have a lot of people around me"). As reported by Sturaro et al. (2010), in a separate sample of 641 college students, the Ostendorf Scale for Extraversion and the NEO-FFI Scale for Extraversion are correlated almost perfectly after controlling for the unreliability of the scales (r = .92).

    Analytic Strategy

We used Mplus to analyze the model displayed in Figure 3. Two crucial elements when applying Bayesian statistics have to be discussed in the Analytic Strategy section of a Bayesian article: (a) which priors were used and where did these priors come from? and (b) how was convergence assessed (see also online Appendix S1)? Concerning the latter, we used the Gelman–Rubin criterion (for more information, see Gelman et al., 2004) to monitor convergence, which is the default setting of Mplus. However, as recommended by Hox, van de Schoot, and Matthijsse (2012), we set the cutoff value stricter (i.e., bconvergence = .01) than the default value of .05. We also specified a minimum number of iterations by using biterations = (10,000), we requested multiple chains of the Gibbs sampler by using chains = 8, and we requested starting values based on the ML estimates by using stvalues = ml. Moreover, we inspected all the trace plots manually to check whether all chains converged to the same target distribution and whether all iterations used for obtaining the posterior were based on stable chains.
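For readers who want to see what the Gelman–Rubin criterion computes, here is a minimal sketch of the basic potential scale reduction factor. It is not the Mplus implementation; the fake chains and their parameters are purely illustrative.

```python
import numpy as np

# Minimal Gelman-Rubin (potential scale reduction) diagnostic: compare the
# between-chain variance B with the within-chain variance W. Values close to
# 1 suggest the chains have converged to the same target distribution.
def gelman_rubin(chains):
    chains = np.asarray(chains)              # shape: (n_chains, n_iterations)
    m, n = chains.shape
    chain_means = chains.mean(axis=1)
    B = n * chain_means.var(ddof=1)          # between-chain variance
    W = chains.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n - 1) / n * W + B / n
    return np.sqrt(var_hat / W)

# Illustrative check on eight fake chains drawn from the same distribution.
rng = np.random.default_rng(5)
fake_chains = rng.normal(0.3, 0.06, size=(8, 10_000))
print(f"PSRF (R-hat) = {gelman_rubin(fake_chains):.3f}")  # should be near 1
```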

Figure 3. Cross-lagged panel model where r1 is the correlation between Extraversion measured at Wave 1 and Friends measured at Wave 1, and r2 is the autocorrelation between the residuals of the two variables at Wave 2; b1 and b2 are the stability paths, and b3 and b4 are the cross-lagged paths. T1 and T2 refer to ages 12 and 17, respectively, for the Asendorpf and van Aken (2003) data, but to ages 17 and 23, respectively, for the Sturaro, Denissen, van Aken, and Asendorpf (2010) data.

Concerning the specification of the priors, we developed two scenarios. In the first scenario, we only focus on those data sets with similar age groups. Therefore, we first reanalyze the data of Neyer and Asendorpf (2001) without using prior knowledge. Thereafter, we reanalyze the data of Sturaro et al. (2010) using prior information based on the data of Neyer and Asendorpf; both data sets contain young adults between 17 and 30 years of age. In the second scenario, we assume the relation between personality and social relationships is independent of age and we reanalyze the data of Sturaro et al. using prior information taken from Neyer and Asendorpf and from Asendorpf and van Aken (2003). In this second scenario we make a strong assumption, namely, that the cross-lagged effects for young adolescents are equal to the cross-lagged effects of young adults. This assumption implies similar developmental trajectories across age groups. We come back to these issues in the Discussion section.

    Scenario 1

Based on previous research findings, Asendorpf and Wilpers (1998) hypothesized the model shown in Figure 3. As this study described the first attempt to study these variables over time, Asendorpf and Wilpers would probably have specified (had they used Bayesian statistics) an uninformative prior distribution reflecting no prior knowledge (see also Figures 1a and 1b). Neyer and Asendorpf (2001) gathered a general sample from the German population and analyzed their data. As Neyer and Asendorpf used different test–retest intervals compared to Asendorpf and Wilpers, we cannot use the results from Asendorpf and Wilpers as prior specifications. So, when reanalyzing Neyer and Asendorpf, we will use the default settings of Mplus, that is, noninformative prior distributions; see Figure 1b and see Model 1 (Neyer & Asendorpf, 2001 | Uninf. prior) in the second column of Table 3. Note that | means "conditional on," so the statement is read as the results of the Neyer and Asendorpf (2001) data conditional on an uninformative prior. Sturaro et al. (2010) continued working on the cross-lagged panel model. In the third column of Table 3, the results of Model 2 (Sturaro et al., 2010 | Uninf. prior) are shown when using a noninformative prior distribution (which does not take the previous results obtained by Neyer and Asendorpf into account). What if we used our updating procedure and used the information obtained in Model 1 as the starting point for our current analysis? That is, for Model 3 (Sturaro et al., 2010 | Neyer & Asendorpf, 2001) we used the posterior means and standard deviations of the regression coefficients from Model 1 as prior specifications. Noninformative priors were used for the residual variances and for the covariances. This was done because the residuals pick up omitted variables, which almost by definition are unknown. We would then have a hard time knowing what their prior relation would be to the outcome or to other variables in the model. This way the prior for the subsequent study is a rough approximation to the posterior from the previous study.
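The updating step can be sketched for a single regression coefficient: the posterior mean and standard deviation from the earlier study serve as the normal prior, and the new study's estimate and standard error act as the likelihood. The numbers below are illustrative only (loosely in the range of the b2 path); the actual analyses were run in Mplus.

```python
import numpy as np

# Sequential updating for one regression coefficient: the posterior from the
# earlier study (mean, SD) is used as the normal prior for the new study, and
# the new study's estimate and standard error act as the likelihood.
# Numbers are illustrative only.
def combine(prior_mean, prior_sd, estimate, se):
    w_prior, w_data = 1 / prior_sd**2, 1 / se**2   # precisions
    post_var = 1 / (w_prior + w_data)
    post_mean = post_var * (w_prior * prior_mean + w_data * estimate)
    return post_mean, np.sqrt(post_var)

prior_from_study1 = (0.29, 0.05)     # posterior mean and SD from the earlier study
new_study_estimate = (0.16, 0.10)    # estimate and SE from the new data
mean, sd = combine(*prior_from_study1, *new_study_estimate)
print(f"updated estimate = {mean:.3f} (SD = {sd:.3f})")
```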

As pointed out by one of the reviewers, there is an assumption being made that the multiparameter posterior from a previous study can be accurately represented by independent marginal distributions on each parameter. But the posterior distribution captures correlations between parameters, and in regression models the coefficients can be quite strongly correlated (depending on the data). If one had strong prior beliefs about the correlations among parameters, this could be represented in a Bayesian hierarchical model. However, because these correlations are data specific in regression, and data and model specific in SEM (see Kaplan & Wenger, 1993), it is unlikely that we would be able to elicit such priors. Therefore, the easiest approach is to specify independent marginal priors and let the posterior capture the empirical correlations.

Table 3
Posterior Results for Scenario 1

Model 1: Neyer & Asendorpf (2001) data without prior knowledge
  b1 = 0.605 (SD = 0.037), 95% PPI = 0.532 to 0.676
  b2 = 0.293 (SD = 0.047), 95% PPI = 0.199 to 0.386
  b3 = 0.131 (SD = 0.046), 95% PPI = 0.043 to 0.222
  b4 = -0.026 (SD = 0.039), 95% PPI = -0.100 to 0.051
  95% CI for difference between observed and replicated chi-square values: -14.398 to 16.188; ppp value = .534

Model 2: Sturaro et al. (2010) data without prior knowledge
  b1 = 0.291 (SD = 0.063), 95% PPI = 0.169 to 0.424
  b2 = 0.157 (SD = 0.103), 95% PPI = -0.042 to 0.364
  b3 = 0.029 (SD = 0.079), 95% PPI = -0.132 to 0.180
  b4 = 0.303 (SD = 0.081), 95% PPI = 0.144 to 0.462
  95% CI for difference between observed and replicated chi-square values: -12.595 to 17.263; ppp value = .453

Model 3: Sturaro et al. (2010) data with priors based on Model 1
  b1 = 0.333 (SD = 0.060), 95% PPI = 0.228 to 0.449
  b2 = 0.168 (SD = 0.092), 95% PPI = -0.010 to 0.352
  b3 = 0.044 (SD = 0.074), 95% PPI = -0.103 to 0.186
  b4 = 0.247 (SD = 0.075), 95% PPI = 0.101 to 0.393
  95% CI for difference between observed and replicated chi-square values: -12.735 to 17.298; ppp value = .473

Note. See Figure 3 for the model being estimated and the interpretation of the parameters. SD = posterior standard deviation; PPI = posterior probability interval; CI = confidence interval; ppp value = posterior predictive p value.

    Scenario 2

Assuming the cross-lagged panel effects to be not age dependent, Asendorpf and van Aken (2003) could have used the results from Neyer and Asendorpf (2001) as the starting point for their own analyses, which in turn could have been the starting point for Sturaro et al. (2010). In the second column of Table 4, the results of Asendorpf and van Aken without assuming prior knowledge are displayed, that is, Model 4 (Asendorpf & van Aken, 2003 | Uninf. prior). In the third column, that is, Model 5 (Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001), the data of Asendorpf and van Aken were updated using prior information taken from Model 1. In the last step, that is, Model 6 (Sturaro et al., 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001), the data of Sturaro et al. were updated using the prior information taken from Model 5. In sum, the models tested are as follows:

Uninformative priors:
  Neyer & Asendorpf, 2001 | Uninf. prior
  Asendorpf & van Aken, 2003 | Uninf. prior
  Sturaro et al., 2010 | Uninf. prior

Scenario 1: Age specificity when updating knowledge:
  Sturaro et al., 2010 | Neyer & Asendorpf, 2001

Scenario 2: Age invariance when updating knowledge:
  Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001
  Sturaro et al., 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001

    Model Fit

When using SEM models to analyze the research questions, one is not interested in a single hypothesis test, but instead in the evaluation of the entire model. Model fit in the Bayesian context relates to assessing the predictive accuracy of a model, and is referred to as posterior predictive checking (Gelman et al., 2004). The general idea behind posterior predictive checking is that there should be little, if any, discrepancy between data generated by the model and the actual data themselves. Any deviation between the data generated by the model and the actual data suggests possible model misspecification. In essence, posterior predictive checking is a method for assessing the specification quality of the model from the viewpoint of predictive accuracy. A complete discussion of Bayesian model evaluation is beyond the scope of this study; we refer the interested reader to Kaplan and Depaoli (2012, 2013).

One approach to quantifying model fit is to compute the Bayesian posterior predictive p value (ppp value). The model test statistic, the chi-square value, is calculated on the basis of the observed data and is compared to the same test statistic defined for the simulated data. Then, the ppp value is defined as the proportion of chi-square values obtained in the simulated data that exceed that of the actual data. ppp values around .50 indicate a well-fitting model.

Table 4
Posterior Results for Scenario 2

Model 4: Asendorpf & van Aken (2003) data without prior knowledge
  b1 = 0.512 (SD = 0.069), 95% PPI = 0.376 to 0.649
  b2 = 0.115 (SD = 0.083), 95% PPI = -0.049 to 0.277
  b3 = 0.217 (SD = 0.106), 95% PPI = 0.006 to 0.426
  b4 = 0.072 (SD = 0.055), 95% PPI = -0.036 to 0.179
  95% CI for difference between observed and replicated chi-square values: -16.253 to 17.102; ppp value = .515

Model 5: Asendorpf & van Aken (2003) data with priors based on Model 1
  b1 = 0.537 (SD = 0.059), 95% PPI = 0.424 to 0.654
  b2 = 0.139 (SD = 0.077), 95% PPI = -0.011 to 0.288
  b3 = 0.195 (SD = 0.094), 95% PPI = 0.007 to 0.380
  b4 = 0.065 (SD = 0.052), 95% PPI = -0.040 to 0.168
  95% CI for difference between observed and replicated chi-square values: -16.041 to 15.625; ppp value = .517

Model 6: Sturaro et al. (2010) data with priors based on Model 5
  b1 = 0.314 (SD = 0.061), 95% PPI = 0.197 to 0.441
  b2 = 0.144 (SD = 0.096), 95% PPI = -0.039 to 0.336
  b3 = 0.044 (SD = 0.076), 95% PPI = -0.109 to 0.191
  b4 = 0.270 (SD = 0.075), 95% PPI = 0.121 to 0.418
  95% CI for difference between observed and replicated chi-square values: -12.712 to 16.991; ppp value = .473

Note. See Figure 3 for the model being estimated and the interpretation of the parameters. SD = posterior standard deviation; PPI = posterior probability interval; CI = confidence interval; ppp value = posterior predictive p value.
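A sketch of the ppp logic, using a simple chi-square-like discrepancy for a normal mean rather than the SEM chi-square used in the article: for each posterior draw, a replicated data set is generated and its discrepancy is compared with that of the observed data; the ppp value is the proportion of draws in which the replicated discrepancy is larger. All data and draw counts below are illustrative assumptions.

```python
import numpy as np

# Posterior predictive p value (ppp), sketched for a normal mean with a simple
# discrepancy measure: for every posterior draw, generate a replicated data
# set of the same size, compute the discrepancy for observed and replicated
# data, and report the proportion of draws in which the replicated value wins.
rng = np.random.default_rng(8)
y = rng.normal(102, 15, size=20)                      # observed (simulated) data
sigma = y.std(ddof=1)
# Approximate posterior draws for the mean under a flat prior:
post_mu = rng.normal(y.mean(), sigma / np.sqrt(len(y)), size=4_000)

def discrepancy(data, mu, sd):
    return np.sum((data - mu) ** 2) / sd**2           # chi-square-like measure

exceed = 0
for mu in post_mu:
    y_rep = rng.normal(mu, sigma, size=len(y))        # replicated data set
    exceed += discrepancy(y_rep, mu, sigma) > discrepancy(y, mu, sigma)
print(f"ppp value = {exceed / len(post_mu):.2f}")     # values near .50 suggest good fit
```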

    Posterior Results

    Scenario 1

In Table 3 the posterior results are displayed for the first scenario. Consider the posterior regression coefficient for the stability path of Friends (b2), which is estimated as .293 in Model 1; Models 2 and 3 represent different ways of updating this knowledge. Model 2 ignores the results by Neyer and Asendorpf (2001) and achieves a stability path of .157. Model 3, in contrast, bases the prior distributions on the posterior results of Model 1 (Neyer & Asendorpf, 2001 | Uninf. prior) and arrives at a stability path of .168, which does not differ that much from the original outcome. If we compare the standard deviation of the stability path b2 of Friends between Model 2 and Model 3 (Sturaro et al., 2010 | Neyer & Asendorpf, 2001), we can observe that the latter is more precise (the posterior standard deviation decreases from .103 to .092). Consequently, the 95% PPI changes from [-.042, .364] in Model 2 to [-.010, .352] in Model 3. Thus, the width of the PPI decreased and, after taking the knowledge gained from Model 1 into account, we are more confident about the results for the stability path of Friends.

The cross-lagged effect of Friends T1 → Extraversion T2 (b4) is estimated as -.026 in Model 1 (Neyer & Asendorpf, 2001 | Uninf. prior), but as .303 in Model 2 (Sturaro et al., 2010 | Uninf. prior). When Model 1 is used as input for the prior specification for the Sturaro et al. (2010) data, Model 3 (Sturaro et al., 2010 | Neyer & Asendorpf, 2001), the coefficient is influenced by the prior and becomes .247, again with a smaller PPI. Furthermore, in both Models 2 and 3, the cross-lagged effect of Extraversion T1 → Friends T2 (b3) appears not to be significant.

    Scenario 2

Concerning Scenario 2, the results of the updating procedure are shown in Table 4. Compare Models 4 (Asendorpf & van Aken, 2003 | Uninf. prior) and 5 (Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001), where the data of Asendorpf and van Aken (2003) were analyzed with noninformative priors and with priors based on Model 1, respectively. Again, in Model 5 the PPIs decreased when compared to Model 4 because of the use of subjective priors. In Model 6 (Sturaro et al., 2010 | Asendorpf & van Aken, 2003 | Neyer & Asendorpf, 2001), the data of Sturaro et al. (2010) were analyzed using priors based on Model 5; consequently, the posterior results of Model 6 are different from the results of Sturaro et al. in Model 2, where no prior knowledge was assumed.

    Discussion of Empirical Example

Inspection of the main parameters, the cross-lagged effects b3 and b4, indicates that there are hardly any differences between Scenarios 1 and 2. Apparently, the results of Sturaro et al. (2010) are robust irrespective of the specific updating procedure. However, there are differences between the updated outcomes and the original results. That is, Models 3 and 6 have smaller standard deviations and narrower PPIs compared to Model 2. Thus, using prior knowledge in the analyses led to more certainty about the outcomes of the analyses and we can be more confident in the conclusions, namely, that Sturaro et al. found opposite effects to Neyer and Asendorpf (2001). This should be reassuring for those who might think that Bayesian analysis is too conservative when it comes to revising previous knowledge. Therefore, the bottom line remains that effects occurring between ages 17 and 23 are different from those found when the age range 18–30 was used. The advantage of using priors is that the credibility intervals became smaller, such that the effect of different age ranges (17–23 vs. 18–30) on the cross-lagged results can be trusted more than before.

Because developmental mechanisms may vary over time, any (reciprocal) effects found between ages 12 and 17 are not necessarily found between ages 17 and 23. Although the Sturaro et al. (2010) study was originally designed as a replication of the Asendorpf and van Aken (2003) study, results turned out to be more consistent with the alternative explanation of the social investment theory of Roberts et al. (2005), namely, that between ages 17 and 23 there might be more change in personality because of significant changes in social roles. Even though we opted for an exact replication of the Asendorpf and van Aken study (because this was the stated goal of the Sturaro et al., 2010, study), developmental researchers of course should not blindly assume that previous research findings from different age periods can be used to derive priors. After all, development is often multifaceted and complex, and looking only for regularity might make the discovery of interesting discontinuities more difficult. In such cases, however, this sense of indetermination needs to be acknowledged explicitly and translated into prior distributions that are flatter than would be typical in research fields in which time periods are more interchangeable.

    Discussion

One might wonder when it is useful to use Bayesian methods instead of the default approach. Indeed, there are circumstances in which both methods produce very similar results, but there are also situations in which the two methods produce different outcomes. Advantages of Bayesian statistics over frequentist statistics are well documented in the literature (Jaynes, 2003; Kaplan & Depaoli, 2012, 2013; Kruschke, 2011a, 2011b; Lee & Wagenmakers, 2005; Van de Schoot, Verhoeven, & Hoijtink, 2012; Wagenmakers, 2007) and we will highlight only some of those advantages here.

    Theoretical Advantages

When the sample size is large and all parameters are normally distributed, ML estimation and Bayesian estimation are not likely to produce numerically different results. However, as we discussed in our study, there are some theoretical differences.

1. The interpretation of the results is very different; for example, see our discussion of confidence intervals. We believe that Bayesian results are more intuitive because the focus of Bayesian estimation is on predictive accuracy rather than on up-or-down significance testing. Also, the Bayesian framework eliminates many of the contradictions associated with conventional hypothesis testing (e.g., Van de Schoot et al., 2011). A brief sketch following this list illustrates how posterior draws translate into such direct statements.

2. The Bayesian framework offers a more direct expression of uncertainty, including complete ignorance. A major difference between frequentist and Bayesian methods is that only the latter can incorporate background knowledge (or the lack thereof) into the analyses by means of the prior distribution. In our study we have provided several examples of how priors can be specified and we demonstrated how the priors might influence the results.

3. Updating knowledge: Another important argument for using Bayesian statistics is that it allows updating knowledge instead of testing a null hypothesis over and over again. One important point is that having to specify priors forces one to reflect more carefully on the similarities and differences between previous studies and one's own study, for example, in terms of age groups and retest interval (not only in terms of length but also in terms of developmental processes). Moreover, the Bayesian paradigm sometimes leads to replicating others' conclusions or even strengthening them (as in our case), but sometimes leads to different or even opposite conclusions. We believe this is what science is all about: updating one's knowledge.
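As a small illustration of the first two points, the sketch below (Python, with hypothetical posterior draws rather than output from any model in this article) shows how a set of MCMC draws translates directly into a credible interval and into probability statements about a parameter, statements that a frequentist confidence interval does not license.

import numpy as np

rng = np.random.default_rng(1)
# Hypothetical posterior draws for a regression coefficient, as they might
# come out of an MCMC run; in practice they would be read from software output.
draws = rng.normal(loc=0.17, scale=0.09, size=20_000)

# 95% posterior probability interval: given the model, prior, and data,
# the parameter lies in this range with 95% probability.
lower, upper = np.percentile(draws, [2.5, 97.5])

# Direct statement of uncertainty about the sign of the effect.
p_positive = (draws > 0).mean()

print(f"95% PPI = [{lower:.3f}, {upper:.3f}]")
print(f"P(coefficient > 0 | data) = {p_positive:.3f}")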

    Practical Advantages

In addition to the theoretical advantages, there are also many practical advantages to using Bayesian methods. We will discuss some of them.

1. Eliminating the worry about small sample sizes, albeit with possible sensitivity to priors (as it should be). Lee and Song (2004) showed in a simulation study that with ML estimation the sample size should be at least 4 or 5 times the number of parameters, but when Bayes was used this ratio decreased to 2 or 3 times the number of parameters. Also, Hox et al. (2012) showed that in multilevel designs at least 50 clusters are needed at the between level when ML estimation is used, but only 20 for Bayes. In both studies default prior settings were used, and the gain in sample size reduction is even larger when subjective priors are specified. It should be noted that the smaller the sample size, the bigger the influence of the prior specification and the more can be gained from specifying subjective priors.

2. When the sample size is small, it is often hard to attain statistically significant or meaningful results (e.g., Button et al., 2013). In a cumulative series of studies in which coefficients fall just below significance, if all results show a trend in the same direction, Bayesian methods would produce a (slowly) increasing confidence regarding the coefficients, more so than frequentist methods would.

3. Handling of non-normal parameters: If parameters are not normally distributed, Bayesian methods provide more accurate results because they can deal with asymmetric distributions. An important example is the indirect effect in a mediation analysis, which is the product of two regression coefficients and therefore has a skewed distribution. As a result, the standard errors and the confidence interval computed with the classical Baron and Kenny method or the Sobel test for mediation analyses are always biased (see Zhao, Lynch, & Chen, 2010, for an in-depth discussion). The same arguments hold for moderation analyses, where an interaction variable is computed to represent the moderation effect. Alternatives are bootstrapping or Bayesian statistics (see Yuan & MacKinnon, 2009); a minimal sketch of the skewed indirect effect follows this list. The reason that Bayes outperforms frequentist methods here is that the Bayesian method does not assume or require normal distributions underlying the parameters of a model.

4. Unlikely results: Using Bayesian statistics it is possible to guard against overinterpreting highly unlikely results. For example, in a study in which one is studying something very unlikely (e.g., extrasensory perception; see the discussion in Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011), one can specify the priors accordingly (i.e., coefficient = 0, with high precision). This makes it less likely that a spurious effect is identified. A frequentist study is less specific in this regard. Another example is using small-variance priors for cross-loadings in confirmatory factor analyses or in testing for measurement invariance (see Muthén & Asparouhov, 2012). The opposite might also be wanted: consider a study in which one is studying something very likely (e.g., intelligence predicting school achievement). The Bayesian method would then be more conservative when it comes to refuting the association.

5. Elimination of inadmissible parameters: With ML estimation it often happens that parameters are estimated with implausible values, for example, negative residual variances or correlations larger than 1. Because of the shape of the prior distribution for variances and covariances, such inadmissible parameters cannot occur. It should be noted, however, that a negative residual variance is often due to overfitting the model, and Bayesian estimation does not solve this issue; Bayesian statistics does not provide a golden solution to all of one's modeling issues.
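To make the third point concrete, the following sketch (Python; simulated draws, not estimates from any real data set) approximates the posterior of an indirect effect by multiplying posterior draws for the two mediation paths. Even when both paths have symmetric posteriors, their product is skewed, so a percentile-based interval differs from the symmetric normal-theory interval that the Sobel test relies on.

import numpy as np

rng = np.random.default_rng(42)
# Hypothetical posterior draws for the two paths of a simple mediation model.
a = rng.normal(loc=0.40, scale=0.15, size=50_000)  # path X -> M
b = rng.normal(loc=0.30, scale=0.15, size=50_000)  # path M -> Y (given X)

indirect = a * b  # posterior draws of the indirect effect a * b

lower, upper = np.percentile(indirect, [2.5, 97.5])                # respects the skew
sym = indirect.mean() + np.array([-1.96, 1.96]) * indirect.std()   # assumes normality

print(f"percentile interval:    [{lower:.3f}, {upper:.3f}]")
print(f"normal-theory interval: [{sym[0]:.3f}, {sym[1]:.3f}]")
print(f"mean = {indirect.mean():.3f}, median = {np.median(indirect):.3f}")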

In general, we do not want to make the argument for using Bayesian statistics one of superiority but rather one of epistemology. That is, following De Finetti (1974a), we have to come to grips with what probability is: the long-run frequency of a particular result, or the uncertainty of our knowledge? This epistemological issue is more fundamental than the divergence of results between the two approaches, which is often less than dramatic.

    Limitations and Future Research

Of course, the Bayesian paradigm is not without assumptions and limitations. The most often heard critique concerns the influence of the prior specification, which might be chosen for opportune reasons. This could open the door to adjusting results to one's hypotheses by assuming priors consistent with those hypotheses. However, our results might somewhat assuage this critique: The Sturaro et al. (2010) results were upheld even when incorporating priors that assumed an inverse pattern of results. Nevertheless, it is absolutely necessary for a serious article based on Bayesian analysis to be transparent about which priors were used and why. Reviewers and editors should require this information.

Another discussion among Bayesian statisticians is which prior distribution to use. So far, we have only discussed the uniform distribution and the normal distribution. Many more distributions are available as alternatives to the normal distribution, for example, a t distribution with heavier tails to deal with outliers (only available in WinBUGS). It might be difficult for nonstatisticians to choose among all these, sometimes exotic, distributions. The default distributions available in Amos/Mplus are suitable for most models. If an analyst requires a nonstandard or unusual distribution, be aware that most distributions are not (yet) available in Amos/Mplus and it might be necessary to switch to other software, such as WinBUGS or programs available in R. Another critique is that in Bayesian analysis we assume that every parameter has a distribution in the population, even (co)variances. Frequentist statisticians simply do not agree with this assumption; they assume that in the population there is only one true, fixed parameter value. This discussion is beyond the scope of our study and we refer interested readers to the philosophical literature, particularly Howson and Urbach (2006), for more information.
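As a rough illustration of why heavier tails can matter, the sketch below (Python; not tied to any particular software default) compares the density that a standard normal and a Student t distribution with 3 degrees of freedom assign to values far from the center. Under the t distribution an extreme value remains far less "surprising", which is what makes it more robust to outliers.

from scipy.stats import norm, t

# Compare tail densities of a standard normal and a t distribution with 3 df.
for x in (2, 4, 6):
    ratio = t.pdf(x, df=3) / norm.pdf(x)
    print(f"at x = {x}: t(3) density is about {ratio:,.1f} times the normal density")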

A practical disadvantage might be that computational time increases because iterative sampling techniques are used. Fortunately, computer processors are becoming more efficient as well as cheaper to produce. Accordingly, the availability of adequate hardware to run complex models is becoming less of a bottleneck, at least in resource-rich countries. On the other hand, Bayesian analysis is able to handle highly complex models efficiently where frequentist approaches to estimation (i.e., ML) often fail (e.g., McArdle, Grimm, Hamagami, Bowles, & Meredith, 2009). This is especially the case for models with categorical data or random effects, where Bayes might even be faster than the default numerical integration procedures most often used.

    Guidelines

There are a few guidelines one should follow when reporting the analytic strategy and posterior results in a manuscript:

1. Always make clear which priors were used in the analyses so that the results can be replicated. This holds for all the parameters in the model.

a. If the default settings are used, it is necessary to refer to an article/manual where these defaults are specified.

b. If subjective/informative priors are used, a subsection has to be included in the Analytical Strategy section where the priors are specified, and it should be explicitly stated where they come from. Tables can be used if many different priors are used. If multiple prior specifications are used, as we did in all our examples, include information about the sensitivity analysis.

2. As convergence might be an issue in a Bayesian analysis (see online Appendix S1), and because there are not many convergence indices to rely on, information should be added about convergence, for example, by providing (some of) the trace plots as supplementary materials; a minimal sketch of how such diagnostics can be produced follows this list.
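As a minimal illustration of how such convergence evidence can be produced, the sketch below (Python; simulated chains standing in for real MCMC output exported from, e.g., Mplus or WinBUGS) draws a trace plot for two chains and computes the Gelman-Rubin statistic (R-hat), which should be close to 1 when the chains have converged.

import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
# Two hypothetical chains of 2,000 draws each for a single parameter.
chains = rng.normal(loc=0.17, scale=0.09, size=(2, 2_000))

# Trace plot: converged chains overlap and show no trend or drift.
for i, chain in enumerate(chains):
    plt.plot(chain, lw=0.5, label=f"chain {i + 1}")
plt.xlabel("iteration")
plt.ylabel("parameter value")
plt.legend()
plt.savefig("traceplot.png", dpi=150)

# Gelman-Rubin R-hat: compares between-chain and within-chain variability.
n = chains.shape[1]
B = n * chains.mean(axis=1).var(ddof=1)   # between-chain variance
W = chains.var(axis=1, ddof=1).mean()     # average within-chain variance
var_hat = (n - 1) / n * W + B / n
print(f"R-hat = {np.sqrt(var_hat / W):.3f}")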

In conclusion, we believe that Bayesian statistical methods are uniquely suited to creating cumulative knowledge. Because the availability of proprietary and free software is making it increasingly easy to implement Bayesian statistical methods, we encourage developmental researchers to consider applying them in their research.

    References

Albert, J. (2009). Bayesian computation with R. London, UK: Springer.

Arbuckle, J. L. (2006). Amos 7.0 user's guide. Chicago, IL: SPSS.

Asendorpf, J. B., Conner, M., De Fruyt, F., De Houwer, J., Denissen, J. J. A., Fiedler, K., . . . Wicherts, J. M. (2013). Recommendations for increasing replicability in psychology. European Journal of Personality, 27, 108–119. doi:10.1002/per.1919

Asendorpf, J. B., & van Aken, M. A. G. (2003). Personality-relationship transaction in adolescence: Core versus surface personality characteristics. Journal of Personality, 71, 629–666. doi:10.1111/1467-6494.7104005

Asendorpf, J. B., & Wilpers, S. (1998). Personality effects on social relationships. Journal of Personality and Social Psychology, 74, 1531–1544. doi:0022-3514/98/$3.00

Bayes, T., & Price, R. (1763). An essay towards solving a problem in the doctrine of chances. By the late Rev. Mr. Bayes, communicated by Mr. Price, in a letter to John Canton, M. A. and F. R. S. Philosophical Transactions of the Royal Society of London, 53, 370–418. doi:10.1098/rstl.1763.0053

Borkenau, P., & Ostendorf, F. (1993). NEO-Fünf-Faktoren-Inventar nach Costa und McCrae [NEO-Five-Factor-Questionnaire as in Costa and McCrae]. Göttingen, Germany: Hogrefe.

Button, K. S., Ioannidis, J. P., Mokrysz, C., Nosek, B. A., Flint, J., Robinson, E. S., & Munafò, M. R. (2013). Power failure: Why small sample size undermines the reliability of neuroscience. Nature Reviews Neuroscience, 14, 365–376.

Caspi, A. (1998). Personality development across the life course. In N. Eisenberg (Ed.), Handbook of child psychology: Vol. 3. Social, emotional, and personality development (5th ed., pp. 311–388). New York, NY: Wiley.

De Finetti, B. (1974a). Bayesianism: Its unifying role for both the foundations and applications of statistics. International Statistical Review, 42, 117–130.

De Finetti, B. (1974b). Theory of probability (Vols. 1 and 2). New York, NY: Wiley.

Furman, W., & Buhrmester, D. (1985). Children's perceptions of the personal relationships in their social networks. Developmental Psychology, 21, 1016–1024. doi:10.1037/0012-1649.21.6.1016

Geiser, C. (2013). Data analysis with Mplus. New York, NY: Guilford Press.

Gelman, A., Carlin, J. B., Stern, H. S., & Rubin, D. B. (2004). Bayesian data analysis (2nd ed.). London, UK: Chapman & Hall.

Gigerenzer, G. (2004). The irrationality paradox. Behavioral and Brain Sciences, 27, 336–338. doi:10.1017/S0140525X04310083

Howson, C., & Urbach, P. (2006). Scientific reasoning: The Bayesian approach (3rd ed.). Chicago, IL: Open Court.

Hox, J., van de Schoot, R., & Matthijsse, S. (2012). How few countries will do? Comparative survey analysis from a Bayesian perspective. Survey Research Methods, 6, 87–93.

Jaynes, E. T. (2003). Probability theory: The logic of science. Cambridge, UK: Cambridge University Press.

Kaplan, D., & Depaoli, S. (2012). Bayesian structural equation modeling. In R. Hoyle (Ed.), Handbook of structural equation modeling (pp. 650–673). New York, NY: Guilford Press.

Kaplan, D., & Depaoli, S. (2013). Bayesian statistical methods. In T. D. Little (Ed.), Oxford handbook of quantitative methods (pp. 407–437). Oxford, UK: Oxford University Press.

Kaplan, D., & Wenger, R. N. (1993). Asymptotic independence and separability in covariance structure models: Implications for specification error, power, and model modification. Multivariate Behavioral Research, 28, 483–498. doi:10.1207/s15327906mbr2804_4

Kruschke, J. K. (2011a). Bayesian assessment of null values via parameter estimation and model comparison. Perspectives on Psychological Science, 6, 299–312. doi:10.1177/1745691611406925

Kruschke, J. K. (2011b). Doing Bayesian data analysis. Burlington, MA: Academic Press.

Kruschke, J. K. (2013). Bayesian estimation supersedes the t test. Journal of Experimental Psychology: General, 142, 573–603. doi:10.1037/a0029146

Lee, M. D., & Wagenmakers, E. J. (2005). Bayesian statistical inference in psychology: Comment on Trafimow (2003). Psychological Review, 112, 662–668. doi:10.1037/0033-295X.112.3.662

Lee, S.-Y., & Song, X.-Y. (2004). Evaluation of the Bayesian and maximum likelihood approaches in analyzing structural equation models with small sample sizes. Multivariate Behavioral Research, 39, 653–686.

Lunn, D. J., Thomas, A., Best, N., & Spiegelhalter, D. (2000). WinBUGS - A Bayesian modelling framework: Concepts, structure, and extensibility. Statistics and Computing, 10, 325–337. doi:10.1007/s11222-008-9100-0

Lynch, S. (2007). Introduction to applied Bayesian statistics and estimation for social scientists. New York, NY: Springer.

McArdle, J. J., Grimm, K. J., Hamagami, F., Bowles, R. P., & Meredith, W. (2009). Modeling life-span growth curves of cognition using longitudinal data with multiple samples and changing scales of measurement. Psychological Methods, 14, 126–149. doi:10.1037/a0015857

McCrae, R. R., & Costa, P. T. (1996). Towards a new generation of personality theories: Theoretical contexts for the five-factor model. In J. S. Wiggins (Ed.), The five-factor model of personality: Theoretical perspectives (pp. 51–87). New York, NY: Guilford Press.

Meeus, W., Van de Schoot, R., Keijsers, L., Schwartz, S. J., & Branje, S. (2010). On the progression and stability of adolescent identity formation: A five-wave longitudinal study in early-to-middle and middle-to-late adolescence. Child Development, 81, 1565–1581. doi:0009-3920/2010/8105-0018

Mulder, J., Hoijtink, H., & de Leeuw, C. (2012). BIEMS: A Fortran 90 program for calculating Bayes factors for inequality and equality constrained models. Journal of Statistical Software, 46, 2.

Muthén, B., & Asparouhov, T. (2012). Bayesian SEM: A more flexible representation of substantive theory. Psychological Methods, 17, 313–335. doi:10.1037/a0026802

Muthén, L. K., & Muthén, B. O. (1998–2012). Mplus user's guide (7th ed.). Los Angeles, CA: Muthén & Muthén.

Neyer, F. J., & Asendorpf, J. B. (2001). Personality-relationship transaction in young adulthood. Journal of Personality and Social Psychology, 81, 1190–1204. doi:10.1037//0022-3514.81.6.1190

Ntzoufras, I. (2011). Bayesian modeling using WinBUGS. Hoboken, NJ: Wiley.

O'Hagan, A., Buck, C. E., Daneshkhah, A., Eiser, J. R., Garthwaite, P. H., Jenkinson, D. J., . . . Rakow, T. (2006). Uncertain judgements: Eliciting experts' probabilities. West Sussex, UK: Wiley.

Ostendorf, F. (1990). Sprache und Persönlichkeitsstruktur: Zur Validität des Fünf-Faktoren-Modells der Persönlichkeit [Language and personality structure: Validity of the five-factor model of personality]. Regensburg, Germany: Roderer.

Press, S. J. (2003). Subjective and objective Bayesian statistics: Principles, models, and applications (2nd ed.). New York, NY: Wiley.

Rietbergen, C., Klugkist, I., Janssen, K. J. M., Moons, K. G. M., & Hoijtink, H. (2011). Incorporation of historical data in the analysis of randomized therapeutic trials. Contemporary Clinical Trials, 32, 848–855. doi:10.1016/j.cct.2011.06.002

Roberts, B. W., Wood, D., & Smith, J. L. (2005). Evaluating five factor theory and social investment perspectives on personality trait development. Journal of Research in Personality, 39, 166–184. doi:10.1016/j.jrp.2004.08.002

Rowe, M. L., Raudenbush, S. W., & Goldin-Meadow, S. (2012). The pace of vocabulary growth helps predict later vocabulary skill. Child Development, 83, 508–525. doi:10.1111/j.1467-8624.2011.01710.x

Stigler, S. M. (1986). Laplace's 1774 memoir on inverse probability. Statistical Science, 1, 359–363.

Sturaro, C., Denissen, J. J. A., van Aken, M. A. G., & Asendorpf, J. B. (2010). Person-environment transactions during emerging adulthood: The interplay between personality characteristics and social relationships. European Psychologist, 13, 1–11. doi:10.1027/1016-9040.13.1.1

Van de Schoot, R., Hoijtink, H., Mulder, J., Van Aken, M. A. G., Orobio de Castro, B., Meeus, W., & Romeijn, J.-W. (2011). Evaluating expectations about negative emotional states of aggressive boys using Bayesian model selection. Developmental Psychology, 47, 203–212. doi:10.1037/a0020957

Van de Schoot, R., Verhoeven, M., & Hoijtink, H. (2012). Bayesian evaluation of informative hypotheses in SEM using Mplus: A black bear story. European Journal of Developmental Psychology, 10, 81–98. doi:10.1080/17405629.2012.732719

Van Wesel, F. (2011, July 1). Priors & prejudice: Using existing knowledge in social science research. Utrecht University. Prom./coprom.: prof. dr. H. J. A. Hoijtink, dr. I. G. Klugkist & dr. H. R. Boeije.

Wagenmakers, E. J. (2007). A practical solution to the pervasive problems of p values. Psychonomic Bulletin & Review, 14, 779–804. doi:10.3758/BF03194105

Wagenmakers, E. J., Wetzels, R., Borsboom, D., & van der Maas, H. L. J. (2011). Why psychologists must change the way they analyze their data: The case of psi. Journal of Personality and Social Psychology, 100, 426–432. doi:10.1037/a0022790

Walker, L. J., Gustafson, P., & Frimer, J. A. (2007). The application of Bayesian analysis to issues in developmental research. International Journal of Behavioral Development, 31, 366–373. doi:10.1177/0165025407077763

Weinert, F. E., & Schneider, W. (Eds.). (1999). Individual development from 3 to 12: Findings from the Munich Longitudinal Study. New York, NY: Cambridge University Press.

Yuan, Y., & MacKinnon, D. P. (2009). Bayesian mediation analysis. Psychological Methods, 14, 301–322. doi:10.1037/a0016972

Zhang, Z., Hamagami, F., Wang, L., Grimm, K. J., & Nesselroade, J. R. (2007). Bayesian analysis of longitudinal data using growth curve models. International Journal of Behavioral Development, 31, 374–383. doi:10.1177/0165025407077764

Zhao, X., Lynch, J. G., Jr., & Chen, Q. (2010). Reconsidering Baron and Kenny: Myths and truths about mediation analysis. Journal of Consumer Research, 37, 197–206. doi:10.1086/651257

    Supporting Information

Additional supporting information may be found in the online version of this article at the publisher's website:

Appendix S1. Bayes' Theorem in More Detail.
Appendix S2. Bayesian Statistics in Mplus.
Appendix S3. Bayesian Statistics in WinBUGS.
Appendix S4. Bayesian Statistics in AMOS.
