
Psychology 350: Special Topics
An introduction to R for psychological research

Analyzing dynamic data: a tutorial

William Revelle
Northwestern University
Evanston, Illinois USA

https://personality-project.org/courses/350

October, 2020


Outline

Introduction

A toy data set

Real data

Simulated data


Dynamic Data: An old problem reconsidered

1. The study of personality has traditionally emphasized how people differ from each other and the reliability and validity of these differences. This has been reflected in the many publications in Personality and Individual Differences and others emphasizing the structure of personality, scale construction, and validation.

2. The typical data collected emphasized the “R” approach of Cattell’s data box (Cattell, 1946a, 1966), that is, correlating how participants differ across items/tests.

3. Cattell’s data box also included the possibility of studying how one person varied over time (“P”). Sometimes the approach would consider stabilities across time as measured by the correlation of measures taken at two different time points (“S”).


Longitudinal data

1. Studying psychology the “long way” involves longitudinal designs.

2. Traditionally associated with developmental studies, the time periods are years and decades.

3. One of the more impressive stabilities is the correlation of .56 over 79 years of IQ scores from age 11 to age 90 (Deary, Pattie & Starr, 2013).

4. An example of what Cattell referred to as a diagonal in his data box would be the correlation across time of individuals taken on different measures.

5. A powerful example of this is the prediction of health-related outcomes in middle age from teacher ratings of students in grades 1 - 6 (Hampson & Goldberg, 2006).


1. In the past 30 years or so, we have seen an exciting change in the way we collect data, in that we now can study how individuals vary over time (Cattell’s P approach). To Cattell, this was “the method for discovering trait unities” (Cattell, 1946b, p 95).

2. The emphasis is now upon individual variability with the added complexity of how these patterns of individual change differ across participants (e.g., Bolger & Laurenceau, 2013; Mehl & Conner, 2012; Wilt, Funkhouser & Revelle, 2011; Wilt, Bleidorn & Revelle, 2016).

3. Although the methods were originally developed to examine data with a nested structure (e.g., students nested within classes nested within schools; Bryk & Raudenbush, 1992), the use of these techniques across many occasions within individuals has been labeled Intensive Longitudinal Methods (Walls & Schafer, 2006) and “captures life as it is lived” (Bolger, Davis & Rafaeli, 2003).


Dynamic Data

1. We refer to data that show systematic variation over time as dynamic to distinguish them from static cross sectional data.

2. Formal models that distinguish between dynamic patterns versus stochastic variation (Revelle & Condon, 2015) are beyond the scope of this paper. Although it is possible to examine group patterns over time, it is more typical to consider how individuals differ in their patterning across time.


Many names, one analytic technique

1. Analytic strategies for analyzing such multi-level data have been given different names in a variety of fields, such as the

• random effects or random coefficient models of economics,
• multi-level models of sociology and psychology,
• hierarchical linear models of education,
• or more generally, mixed effects models (Fox, 2016).

2. Although frequently cautioned not to do so, some psychologists continue to use repeated measures analysis of variance approaches rather than the more accurate mixed effects models.
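As a minimal sketch of what a mixed effects model for repeated measures can look like in R, the block below fits a random intercept and random slope model with lme() from the nlme package. The package choice, the simulated data, and all variable names (toy, mood, b0, b1) are assumptions made for illustration; they are not taken from the slides.

```r
library(nlme)  # recommended package distributed with R (an assumption, not used in the slides)

set.seed(1)
n_id  <- 20                                   # hypothetical number of participants
n_obs <- 15                                   # hypothetical occasions per participant
id    <- rep(seq_len(n_id), each = n_obs)
time  <- rep(seq_len(n_obs), times = n_id)

# hypothetical person-specific intercepts and slopes
b0 <- rnorm(n_id, mean = 50, sd = 10)
b1 <- rnorm(n_id, mean = 0.5, sd = 0.5)
mood <- b0[id] + b1[id] * time + rnorm(n_id * n_obs, sd = 2)
toy  <- data.frame(id = factor(id), time = time, mood = mood)

# fixed effect of time, with intercept and slope allowed to vary by person
fit <- lme(mood ~ time, random = ~ time | id, data = toy)
fixef(fit)     # average intercept and average slope
VarCorr(fit)   # between-person variability in intercepts and slopes
```

The random part of the model is what lets each person have their own trajectory, which is the feature a repeated measures analysis of variance does not represent directly.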


Within groups and between groups

1. The analysis of data at multiple levels presents at least two challenges: one is that of interpretation, the other is that of statistical inference.

2. It has long been known (Yule, 1903) that relationships found within groups are not necessarily the same as those between groups. When aggregating across British health districts, it appeared that increased mortality was associated with increases in vaccinations, but when examined at the within district level, it was clear that vaccinations reduced mortality (Yule, 1912).

3. Variously known as Simpson’s paradox (Simpson, 1951) or the ecological fallacy (Robinson, 1950), the observation is that relationships in aggregated data do not imply the same relationship at the disaggregated level. Such results are examples of non-ergodic relationships, that is, relationships that differ from the individual to the group level (Molenaar, 2004; Nesselroade & Molenaar, 2016).
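The reversal described above can be reproduced with a small simulation in base R. This is only an illustrative sketch: the number of groups, the group means, and the within-group slope are made-up values chosen so that the group means rise together while the relationship inside every group is negative.

```r
set.seed(42)
n_groups <- 5                     # hypothetical number of groups
n_per    <- 50                    # hypothetical observations per group
group    <- rep(seq_len(n_groups), each = n_per)

# group means of x and y rise together ...
mu_x <- seq(10, 50, length.out = n_groups)
mu_y <- seq(10, 50, length.out = n_groups)

# ... but within each group, y falls as x rises
dev_x <- rnorm(n_groups * n_per, sd = 3)
x <- mu_x[group] + dev_x
y <- mu_y[group] - 0.8 * dev_x + rnorm(n_groups * n_per, sd = 1)

r_total   <- cor(x, y)                                            # ignoring groups
r_between <- cor(tapply(x, group, mean), tapply(y, group, mean))  # group means only
r_within  <- sapply(split(data.frame(x, y), group),
                    function(d) cor(d$x, d$y))                    # one value per group

round(c(total = r_total, between = r_between), 2)
round(r_within, 2)
```

The aggregate and between-group correlations come out positive while every within-group correlation is negative, so a conclusion drawn at one level would be wrong at the other.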


Structure at different levels of analysis

1. More importantly, when the effect of levels is ignored, structural relationships are difficult to interpret.

2. The correlation between two variables (x and y) when x and y are measured within individuals is a function of the correlation between the individual means ($r_{xy_{between}}$), the pooled within individual correlations ($r_{xy_{within}}$), the relationships between the data and the between group means ($\eta_{between}$), and the correlation of the data with the within subject values ($\eta_{within}$).

$$r_{xy} = \eta_{x_{within}} \cdot \eta_{y_{within}} \cdot r_{xy_{within}} + \eta_{x_{between}} \cdot \eta_{y_{between}} \cdot r_{xy_{between}} \qquad (1)$$
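Equation 1 can be checked numerically. The base R sketch below builds a small grouped data set (the group sizes and effect sizes are arbitrary assumptions), splits each variable into its group mean and its deviation from that mean, and confirms that the two products add up to the overall correlation.

```r
set.seed(7)
group <- rep(1:6, each = 20)                     # hypothetical: 6 people, 20 occasions each
x <- rnorm(120) + rnorm(6)[group]
y <- 0.5 * x + rnorm(120) + rnorm(6)[group]

# between part: every observation replaced by its group mean
x_bg <- ave(x, group)
y_bg <- ave(y, group)
# within part: deviations from the group mean
x_wg <- x - x_bg
y_wg <- y - y_bg

# eta: correlation of the raw data with its between and within parts
eta_x_bg <- cor(x, x_bg); eta_y_bg <- cor(y, y_bg)
eta_x_wg <- cor(x, x_wg); eta_y_wg <- cor(y, y_wg)

r_bg <- cor(x_bg, y_bg)   # correlation of the group means
r_wg <- cor(x_wg, y_wg)   # pooled within group correlation

rhs <- eta_x_wg * eta_y_wg * r_wg + eta_x_bg * eta_y_bg * r_bg
all.equal(cor(x, y), rhs)
```

The identity holds because the within and between parts are uncorrelated with each other, so the total covariance is the sum of the within and between covariances.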


Creating a toy data set
R code
library(psych) #activate the psych package
#create the data
set.seed(42)
x <- sim.multi(n.obs=4,nvar=4,nfact=2,days=6,ntrials=6,plot=TRUE,
       phi.i=c(-.7,0,0,.7),loading=.6)
raw <- round(x[3:8])       #keep columns 3 to 8, rounded
raw[1:4] <- raw[1:4] + 6   #add 6 to the first four columns
#make a 'Fat' version (one row per subject)
XFat <- reshape(raw,idvar="id",timevar="time",times=1:4,
       direction="wide")
#show it
XFat

#now reshape it back to long format (direction="long");
#the object names XWide and Xwide are kept from the original slide
XWide <- reshape(XFat,idvar="id",varying=2:25,direction="long")
Xwide <- dfOrder(XWide,"id")   #sort by id

#add in the trait information
traits <- data.frame(id = 1:4,extraversion =c(5,10,15,20),
       neuroticism =c(10,5, 15,10))
Xwide.traits <- merge(Xwide,traits, by ="id")


A toy data set

[Figure: Lattice plot by subjects over time. One panel per subject (id 1 to 4); x-axis: time (20 to 140); y-axis: values (2 to 10).]


Open and shared science

1. One of the more powerful uses of the web is to share data.

2. A number of data sets are available for other people to use.

3. Aaron Fisher at UCB has released a data set of positive and negative mood for 10 subjects over 100 days (Fisher, 2015).

4. Other, larger data sets are also available.


The Fisher data set

1. Although available on the web, it is necessary to download the data and do some rearrangements to make it useful for our purposes.

2. In a study of 10 participants diagnosed with clinically generalized anxiety disorder, Fisher (2015) collected 28 items for at least 60 days per participant.

3. I have moved this data set to the 350 folder so that we can use it more readily.

4. In an impressive demonstration of how different people are, he examined the dynamic factor structure of each person using procedures discussed by Molenaar (1985).


Fisher’s data: pooled across subjects
R code
#assumes the data frame fisher has already been read in
#cs() quotes its arguments, giving character vectors of item names
positive <- cs(happy,content,relaxed, positive)
negative <- cs(angry,afraid, sad, lonely)
pana <- c(positive,negative)
R <- lowerCor(fisher[pana])
pana.scores <- scoreItems(keys=list(positive=positive,
       negative=negative), fisher)
summary(pana.scores)

         happy cntnt relxd postv angry afrad sad   lonly
happy     1.00
content   0.80  1.00
relaxed   0.67  0.74  1.00
positive  0.84  0.79  0.70  1.00
angry    -0.19 -0.15 -0.12 -0.15  1.00
afraid   -0.36 -0.37 -0.28 -0.34  0.65  1.00
sad      -0.34 -0.32 -0.23 -0.31  0.67  0.75  1.00
lonely   -0.24 -0.21 -0.12 -0.21  0.64  0.74  0.80  1.00

Scale intercorrelations corrected for attenuation
 raw correlations below the diagonal, (unstandardized) alpha on the diagonal
 corrected correlations above the diagonal:
         positive negative
positive     0.93    -0.33
negative    -0.30     0.91
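The corrected value above the diagonal can be recovered by hand from the other three numbers in the table. The sketch below applies the standard correction for attenuation (raw correlation divided by the square root of the product of the two reliabilities); the variable names are illustrative only.

```r
r_raw     <- -0.30   # raw correlation between the positive and negative scales
alpha_pos <-  0.93   # alpha of the positive scale
alpha_neg <-  0.91   # alpha of the negative scale

# correction for attenuation: divide by the geometric mean of the reliabilities
r_corrected <- r_raw / sqrt(alpha_pos * alpha_neg)
round(r_corrected, 2)   # -0.33, the value shown above the diagonal
```

Because both scales are highly reliable here, the correction changes the correlation only slightly.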


Fisher affect over time

R code
#combine the first two columns of fisher (id and time) with the two scale scores
affect.df <- cbind(fisher[1:2], pana.scores$scores)
describe(affect.df)
lowerCor(affect.df)

describe(affect.df)
         vars   n  mean    sd median trimmed   mad  min    max range skew kurtosis   se
id          1 792 18.42 17.37  11.00   14.85  5.93 1.00  65.00  64.0 1.75     2.25 0.62
time        2 792 41.34 25.36  40.00   40.15 29.65 1.00 118.00 117.0 0.42    -0.31 0.90
positive    3 792 36.40 18.76  31.75   34.39 15.57 3.75  93.25  89.5 0.94     0.20 0.67
negative    4 792 30.34 19.79  25.75   28.41 21.50 1.50  98.50  97.0 0.75    -0.07 0.70

> lowerCor(affect.df)
         id    time  postv negtv
id        1.00
time     -0.08  1.00
positive -0.50  0.01  1.00
negative  0.60 -0.01 -0.30  1.00


Fisher’s data, measuring positive and negative affect over time

[Figure: Lattice plot by subjects over time. One panel per participant (ids 002, 007, 009, 010, 011, 013, 022, 023, 030, 065); x-axis: time (0 to 120); y-axis: values (0 to 100).]


Fisher’s affect data within subjects over time
R code
sb.affect <- statsBy(affect.df,"id",cors=TRUE)
round(sb.affect$within,2)
round(sb.affect$pooled,2)
sb.affect

   time-postv time-negtv postv-negtv
1       -0.38      -0.18       -0.36
7       -0.48       0.53       -0.60
9        0.05      -0.03        0.35
10      -0.29      -0.36       -0.28
11       0.43       0.13       -0.28
13       0.03      -0.22       -0.05
22      -0.04      -0.32       -0.24
23      -0.01      -0.22       -0.16
30       0.29       0.11       -0.43
65       0.09      -0.48       -0.74

          time positive negative
time      1.00    -0.04    -0.10
positive -0.04     1.00    -0.29
negative -0.10    -0.29     1.00

sb.affect
Statistics within and between groups
Call: statsBy(data = affect.df, group = "id", cors = TRUE)
Intraclass Correlation 1 (Percentage of variance due to groups)
      id     time positive negative
    1.00     0.10     0.64     0.70
Intraclass Correlation 2 (Reliability of group differences)
      id     time positive negative
    1.00     0.90     0.99     0.99
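To make the two intraclass correlations in this output less opaque, the sketch below computes them from a one-way analysis of variance in base R, using the usual mean square formulas. The data are simulated with made-up values and equal numbers of observations per person, so the numbers will not match the Fisher results.

```r
set.seed(3)
n_id <- 10                                   # hypothetical number of participants
k    <- 30                                   # hypothetical observations per participant
id   <- factor(rep(seq_len(n_id), each = k))
person_mean <- rnorm(n_id, mean = 40, sd = 15)
score <- person_mean[as.integer(id)] + rnorm(n_id * k, sd = 10)

# one-way ANOVA: mean squares between and within persons
ms   <- anova(aov(score ~ id))[["Mean Sq"]]
ms_b <- ms[1]
ms_w <- ms[2]

icc1 <- (ms_b - ms_w) / (ms_b + (k - 1) * ms_w)  # share of variance due to persons
icc2 <- (ms_b - ms_w) / ms_b                     # reliability of the person means
round(c(ICC1 = icc1, ICC2 = icc2), 2)
```

ICC2 is always at least as large as ICC1 when there are several observations per person, which is why the person means for positive and negative affect can be highly reliable (.99) even though only about two thirds of the variance lies between persons.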


Simulating data

To understand how models work, it is useful to simulate data where we know the structure. sim.multi does this.

1. Trends over time

2. Diurnal variation

3. Within subject variability

4. sim.multi() defaults to 4 subjects for two variables over 16days.
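The three ingredients listed above can also be put together by hand. The base R sketch below is not the algorithm used by sim.multi(); it is a simplified illustration under assumed values (6 observations per day between 8:00 and 18:00, a cosine daily rhythm peaking at 16:00, and arbitrary noise levels) of how a linear trend, diurnal variation, and within subject noise combine into one series per person.

```r
set.seed(11)
n_id    <- 4      # subjects
days    <- 16     # days of observation
per_day <- 6      # hypothetical observations per day
obs     <- days * per_day

hour <- rep(seq(8, 18, length.out = per_day), times = days)  # time of day
time <- rep(0:(days - 1), each = per_day) * 24 + hour        # hours since the start

sim_one <- function(id) {
  slope <- rnorm(1, mean = 0, sd = 0.005)   # person-specific trend over time
  amp   <- runif(1, min = 0.5, max = 1.5)   # person-specific diurnal amplitude
  dv <- slope * time +                      # 1. trend over time
        amp * cos(2 * pi * (hour - 16) / 24) +  # 2. diurnal variation
        rnorm(obs, sd = 0.5)                # 3. within subject variability
  data.frame(id = id, time = time, DV = dv)
}

sim <- do.call(rbind, lapply(seq_len(n_id), sim_one))
head(sim)
```

Because the generating values are known, any analysis run on sim can be checked against the slopes and amplitudes that were put in.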


Simulated data over time

[Figure: Lattice plot of the simulated data over time. One panel per subject (id 1 to 16); x-axis: time (0 to 150); y-axis: DV (-4 to 4).]


Conclusion

Modern data collection techniques allow for intensive measurement within subjects. Analyzing this type of data requires analyzing data at the within subject as well as the between subject level. Although sometimes conclusions will be the same at both levels, it is frequently the case that examining within subject data will show much more complex patterns of results than when they are simply aggregated. This tutorial is a simple introduction to the kind of data analytic strategies that are possible. (See http://personality-project.org/courses/350/350.wk7b.html for worked examples.)

These slides have been adapted from Revelle & Wilt (2019), Analyzing dynamic data: a tutorial.

References

Bolger, N., Davis, A., & Rafaeli, E. (2003). Diary methods: Capturing life as it is lived. Annual Review of Psychology, 54, 579–616.

Bolger, N. & Laurenceau, J. (2013). Intensive longitudinal methods. New York, N.Y.: Guilford.

Bryk, A. S. & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Advanced quantitative techniques in the social sciences, 1. Thousand Oaks, CA: Sage Publications, Inc.

Cattell, R. B. (1946a). Description and measurement of personality. Oxford, England: World Book Company.

Cattell, R. B. (1946b). Personality structure and measurement. I. The operational determination of trait unities. British Journal of Psychology, 36, 88–102.

Cattell, R. B. (1966). The data box: Its ordering of total resources in terms of possible relational systems. In R. B. Cattell (Ed.), Handbook of multivariate experimental psychology (pp. 67–128). Chicago: Rand-McNally.

Deary, I. J., Pattie, A., & Starr, J. M. (2013). The stability of intelligence from age 11 to age 90 years: The Lothian Birth Cohort of 1921. Psychological Science, 24(12), 2361–2368.

Fisher, A. J. (2015). Toward a dynamic model of psychological assessment: Implications for personalized care. Journal of Consulting and Clinical Psychology, 83(4), 825–836.

Fox, J. (2016). Applied regression analysis and generalized linear models (3rd ed.). Sage.

Hampson, S. E. & Goldberg, L. R. (2006). A first large cohort study of personality trait stability over the 40 years between elementary school and midlife. Journal of Personality and Social Psychology, 91(4), 763–779.

Mehl, M. R. & Conner, T. S. (2012). Handbook of research methods for studying daily life. New York: Guilford Press.


Molenaar, P. C. M. (1985). A dynamic factor model for the analysis of multivariate time series. Psychometrika, 50(2), 181–202.

Molenaar, P. C. M. (2004). A manifesto on psychology as idiographic science: Bringing the person back into scientific psychology, this time forever. Measurement, 2(4), 201–218.

Nesselroade, J. R. & Molenaar, P. C. M. (2016). Some behavioral science measurement concerns and proposals. Multivariate Behavioral Research, 51(2-3), 396–412.

Revelle, W. & Condon, D. M. (2015). A model for personality at three levels. Journal of Research in Personality, 56, 70–81.

Revelle, W. & Wilt, J. A. (2019). Analyzing dynamic data: a tutorial. Personality and Individual Differences, 136(1), 38–51.

Robinson, W. S. (1950). Ecological correlations and the behavior of individuals. American Sociological Review, 15(3), 351–357.


Simpson, E. H. (1951). The interpretation of interaction in contingency tables. Journal of the Royal Statistical Society. Series B (Methodological), 13(2), 238–241.

Walls, T. A. & Schafer, J. L. (2006). Models for intensive longitudinal data. Oxford University Press.

Wilt, J., Bleidorn, W., & Revelle, W. (2016). Velocity explains the links between personality states and affect. Journal of Research in Personality, xx, xx–xx doi: 10.1016/j.jrp.2016.06.008.

Wilt, J., Funkhouser, K., & Revelle, W. (2011). The dynamic relationships of affective synchrony to perceptions of situations. Journal of Research in Personality, 45, 309–321.

Yule, G. U. (1903). Notes on the theory of association of attributes in statistics. Biometrika, 2(2), 121–134.

Yule, G. U. (1912). On the methods of measuring association between two attributes. Journal of the Royal Statistical Society, LXXV, 579–652.
