Data-pipeline using ALSPAC data

Upload: dawson

Post on 01-Feb-2016


Page 1: Data-pipeline using ALSPAC data

Data-pipeline using ALSPAC data

Page 2: Data-pipeline using ALSPAC data

Contents

• Introduction to ALSPAC
  – Description of the measures
• Preparing my data for the pipeline
  – The pipeline (in Stata)
• Summarize / Codebook
• Polychoric correlations
• Polychoric PCA
• Loevinger’s H
• Mokken Scale Procedure
• Options for SPSS users

Page 3: Data-pipeline using ALSPAC data

Day 2: Contents

• Introduction to Psychometrics:
  – Item Response Theory in Stata
• Non-parametric procedures:
  – Mokken, Description of the measures
• Parametric models
  – Single parameter logistic model (Rasch)
  – Two parameter logistic (Lord-Birnbaum)
• An R – 2 – Detour (a detour to R)
• Running psychometric analyses from Stata
• From Data to PAPER
  – Automated IRT analyses that yield publication quality graphics
  – Connections from Stata


Page 5: Data-pipeline using ALSPAC data

What is ALSPAC?

• “Avon Longitudinal Study of Parents and Children” AKA Children of the Nineties

• Cohort study of ~14,000 children and their parents, based in south-west England

• Eligibility criteria: Mothers had to be resident in Avon and have an expected date of delivery between April 1st 1991 and December 31st 1992

• Population based prospective cohort study

Page 6: Data-pipeline using ALSPAC data

Where’s Avon to, my luvver?
(trans: Where is Avon?)

Page 7: Data-pipeline using ALSPAC data

The county of Avon

• 1) A nice short name
• 2) Known for its “ladies”
• 3) Replaced in 1996 with
  – Bristol
  – North Somerset
  – Bath and North East Somerset
  – South Gloucestershire
  – Collectively known as “CUBA” (Counties which Used to Be Avon)

Page 8: Data-pipeline using ALSPAC data

What data does ALSPAC have?

• Self completion questionnaires
  – Mothers, Partners, Children, Teachers
• Hands on assessments
  – 10% sample tested regularly since birth
  – Yearly clinics for all since age 7
• Data from external sources
  – SATS from LEA, Child Health database
• Biological samples
  – DNA / cell lines


Page 10: Data-pipeline using ALSPAC data

Today’s Measures 1 - MFQ

• Moods and Feelings Questionnaire

• Angold and Costello (1987). Mood and feelings questionnaire (MFQ). Durham: Duke University, Developmental Epidemiology Program.
• Short version, 13 items
• Parental response at 13 years [Questionnaire]
• Child response at 14 years [Clinic, computer]

Page 12: Data-pipeline using ALSPAC data

Today’s Measures 2 - EAS

• EAS Temperament Survey (Parental Ratings)

• Buss and Plomin (1984). A temperament theory of personality development. New York: John Wiley.
• 20 questions
• 4 subscales: Emotionality, Activity, Shyness & Sociability
• Parental response at 4 years [Questionnaire]


Page 16: Data-pipeline using ALSPAC data

Rename variables for clarity and consistency

gen mum01_012 = ta5020
gen mum02_012 = ta5021
gen mum03_012 = ta5022
gen mum04_012 = ta5023
gen mum05_012 = ta5024
gen mum06_012 = ta5025
gen mum07_012 = ta5026
gen mum08_012 = ta5027
gen mum09_012 = ta5028
gen mum10_012 = ta5029
gen mum11_012 = ta5030
gen mum12_012 = ta5031
gen mum13_012 = ta5032

gen kid01_012 = fg6410
gen kid02_012 = fg6412
gen kid03_012 = fg6413
gen kid04_012 = fg6414
gen kid05_012 = fg6415
gen kid06_012 = fg6416
gen kid07_012 = fg6418
gen kid08_012 = fg6419
gen kid09_012 = fg6421
gen kid10_012 = fg6422
gen kid11_012 = fg6423
gen kid12_012 = fg6424
gen kid13_012 = fg6425

ta5020 ~ fg6410 or ta5027 ~ fg6419???

mum01_012 ~ kid01_012 and mum08_012 ~ kid08_012

Page 17: Data-pipeline using ALSPAC data

Derive binary variables

recode *_012 (3=0)(2=1)(1=2)

foreach x in "mum01" "mum02" "mum03" "mum04" "mum05" "mum06" ///
    "mum07" "mum08" "mum09" "mum10" "mum11" "mum12" "mum13" ///
    "kid01" "kid02" "kid03" "kid04" "kid05" "kid06" "kid07" "kid08" ///
    "kid09" "kid10" "kid11" "kid12" "kid13" {
    gen `x'_001 = `x'_012
    recode `x'_001 (0=0)(1=0)(2=1)
    gen `x'_011 = `x'_012
    recode `x'_011 (0=0)(1=1)(2=1)
}

Resulting variables for each item: mum01_012, mum01_001, mum01_011
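The recoding logic on this slide can be sketched outside Stata. The Python below is only an illustration: it assumes the raw responses are coded 1 = True, 2 = Sometimes true, 3 = Not true (as the first recode implies), and the function names are hypothetical. Unlike Stata's recode, values outside 1 to 3 are turned into None here rather than left unchanged.

```python
def reverse_code(raw):
    """Map raw 1/2/3 responses to 2/1/0, mirroring recode (3=0)(2=1)(1=2)."""
    mapping = {3: 0, 2: 1, 1: 2}
    return mapping.get(raw)  # None stands in for a missing value


def dichotomise(score_012):
    """Return the (_001, _011) codings of a 0/1/2 score."""
    if score_012 is None:
        return None, None
    v001 = 1 if score_012 == 2 else 0  # only "True" counts as endorsed
    v011 = 1 if score_012 >= 1 else 0  # "Sometimes true" or "True" counts
    return v001, v011


for raw in (1, 2, 3):
    score = reverse_code(raw)
    print(raw, score, dichotomise(score))
```

The two binary codings differ only in where the middle category ("Sometimes true") is placed.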

Page 18: Data-pipeline using ALSPAC data

Variable labels

foreach var of varlist *01_* {
    label variable `var' "Felt miserable/unhappy [`var']"
}

foreach var of varlist *02_* {
    label variable `var' "Didn't enjoy anything at all [`var']"
}

foreach var of varlist *03_* {
    label variable `var' "Felt so tired they just sat around & did nothing [`var']"
}

foreach var of varlist *04_* {
    label variable `var' "Was restless [`var']"
}

Etc.

Page 19: Data-pipeline using ALSPAC data

Value Labels

foreach var of varlist *_012 {
    label define `var'_lab 0 "Not true" 1 "Sometimes true" 2 "True"
    label values `var' `var'_lab
}

foreach var of varlist *_011 {
    label define `var'_lab 0 "Not true" 1 "Sometimes true / True"
    label values `var' `var'_lab
}

foreach var of varlist *_001 {
    label define `var'_lab 0 "Sometimes true / not true" 1 "True"
    label values `var' `var'_lab
}


Page 21: Data-pipeline using ALSPAC data

Typical data-pipeline syntax

log using "mfq_dataprep.log", replace
foreach x in "mum" "kid" {
    su `x'*_012
    codebook `x'*_012
    loevH `x'*_012
    polychoric `x'*_012
    polychoricpca `x'*_012
    msp `x'*_012
}
log close

Repeat with *_011 and *_001

Page 22: Data-pipeline using ALSPAC data

summarize / codebook

Page 23: Data-pipeline using ALSPAC data

su emo_*_01234

    Variable |   Obs       Mean   Std. Dev.   Min   Max
-------------+------------------------------------------
emo_l_02_0~4 |  9467   1.564276    .806012     0     4
emo_l_06_0~4 |  9445     1.7081   .8448107     0     4
emo_l_11_0~4 |  9448   1.274238   .8241389     0     4
emo_l_15_0~4 |  9431   1.613933   .8029195     0     4
emo_l_19_0~4 |  9342   1.594198   1.008401     0     4

Page 24: Data-pipeline using ALSPAC data

codebook emo_*_01234

----------------------------------------------------------------------
emo_l_02_01234                   Child cries easily [emo_l_02_01234]
----------------------------------------------------------------------
          type:  numeric (float)
         label:  emo_l_02_01234_lab
         range:  [0,4]                 units:  1
 unique values:  5                 missing .:  5196/14663
    tabulation:  Freq.   Numeric  Label
                   761         0  E-Like
                  3620         1  Q-like
                  4202         2  S-like
                   751         3  NM-Like
                   133         4  NAA-Like
                  5196         .

----------------------------------------------------------------------
emo_l_06_01234     Child tends to be somewhat emotional [emo_l_06_01234]
----------------------------------------------------------------------
          type:  numeric (float)
         label:  emo_l_06_01234_lab
         range:  [0,4]                 units:  1
 unique values:  5                 missing .:  5218/14663
    tabulation:  Freq.   Numeric  Label
                   632         0  E-Like
                  3018         1  Q-like
                  4507         2  S-like
                  1051         3  NM-Like
                   237         4  NAA-Like
                  5218         .

Page 25: Data-pipeline using ALSPAC data

codebook emo_*_01234 (continued)

----------------------------------------------------------------------
emo_l_11_01234             Child often fusses and cries [emo_l_11_01234]
----------------------------------------------------------------------
          type:  numeric (float)
         label:  emo_l_11_01234_lab
         range:  [0,4]                 units:  1
 unique values:  5                 missing .:  5215/14663
    tabulation:  Freq.   Numeric  Label
                  1538         0  E-Like
                  4420         1  Q-like
                  2942         2  S-like
                   457         3  NM-Like
                    91         4  NAA-Like
                  5215         .

----------------------------------------------------------------------
emo_l_15_01234                  Child gets upset easily [emo_l_15_01234]
----------------------------------------------------------------------
          type:  numeric (float)
         label:  emo_l_15_01234_lab
         range:  [0,4]                 units:  1
 unique values:  5                 missing .:  5232/14663
    tabulation:  Freq.   Numeric  Label
                   559         0  E-Like
                  3689         1  Q-like
                  4214         2  S-like
                   772         3  NM-Like
                   197         4  NAA-Like
                  5232         .

Page 26: Data-pipeline using ALSPAC data

codebook emo_*_01234 (continued)

----------------------------------------------------------------------
emo_l_19_01234        Child reacts intensely when upset [emo_l_19_01234]
----------------------------------------------------------------------
          type:  numeric (float)
         label:  emo_l_19_01234_lab
         range:  [0,4]                 units:  1
 unique values:  5                 missing .:  5321/14663
    tabulation:  Freq.   Numeric  Label
                  1329         0  E-Like
                  3038         1  Q-like
                  3459         2  S-like
                  1127         3  NM-Like
                   389         4  NAA-Like
                  5321         .

Page 27: Data-pipeline using ALSPAC data

Multihist

pause on
foreach x in "01" "02" "03" "04" "05" "06" "07" "08" "09" "10" "11" "12" "13" {
    multihist *`x'_012
    pause
}
pause off

• Compare response to same questions at different times
• Big differences would suggest an error in previous code
  - reversal of responses
  - change to order of questions asked
  - change to response options (aargh!)

Page 28: Data-pipeline using ALSPAC data

[Figure: Multihist for first item of MFQ, "Felt miserable/unhappy" (6 repeat measures). Six histogram panels: ta01_012 (n=6723/14663), kw01_012 (n=7019/14663), ku01_012 (n=7733/14663), fg01_012 (n=5753/14663), ff01_012 (n=6396/14663), fd01_012 (n=7033/14663). x-axis: Not true / Sometimes true / True; y-axis: Frequency.]

Page 29: Data-pipeline using ALSPAC data

Polychoric Correlations

Page 30: Data-pipeline using ALSPAC data

Correlation -v- regression coefficient

Correlation coefficient:

The interdependence between pairs of variables, i.e. the extent to which the values of the two variables change together

The strength and direction of the linear relationship

A fatter ellipse will result in a greater degree of scatter for a regression line of a given gradient, and a lower correlation

Page 31: Data-pipeline using ALSPAC data

Polychoric Correlation - Assumptions

• A binary or categorical variable is the observed (or manifest) part of an underlying (or latent) continuous variable

• Here we’ll also assume that latent variables are normally distributed

• THRESHOLD relates the manifest to the latent variable

• Uebersax link: http://ourworld.compuserve.com/homepages/jsuebersax/tetra.htm

Page 32: Data-pipeline using ALSPAC data

Thresholds

Figure from Uebersax webpage

Page 33: Data-pipeline using ALSPAC data

2 binary variables

. tab mum01_001 mum04_001

                 Felt |     Was restless
    miserable/unhappy |      [mum04_001]
          [mum01_001] |   ST / NT       True |     Total
----------------------+----------------------+----------
              ST / NT |     6,343         78 |     6,421
                 True |       234         54 |       288
----------------------+----------------------+----------
                Total |     6,577        132 |     6,709

This is all we see, however ….

Page 34: Data-pipeline using ALSPAC data

… this is what we assume is going on

Figure from Uebersax webpage

Page 35: Data-pipeline using ALSPAC data

What we are really interested in is the correlation (r) between the continuous latent variables

A computer algorithm searches for the correlation r and the thresholds t1 and t2 that best reproduce the cell counts of the 2x2 table
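To make the search concrete, here is a minimal sketch in Python (standard library only) for the mum01_001 by mum04_001 table. It is not the algorithm used by Stata's polychoric command: it simply fixes the two thresholds from the margins of the table, assumes standard normal latent variables, and then bisects on r until the model reproduces the observed proportion in the True/True cell. The function names and the numerical-integration settings are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

STD = NormalDist()  # standard normal distribution


def upper_quadrant_prob(t1, t2, r, steps=2000, width=10.0):
    """P(X > t1, Y > t2) for a standard bivariate normal with correlation r,
    integrating over x with Simpson's rule (steps must be even)."""
    s = math.sqrt(1.0 - r * r)
    h = width / steps

    def f(x):
        # density of X at x times P(Y > t2 | X = x)
        return STD.pdf(x) * (1.0 - STD.cdf((t2 - r * x) / s))

    total = f(t1) + f(t1 + width)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * f(t1 + i * h)
    return total * h / 3.0


def tetrachoric(n00, n01, n10, n11):
    """Thresholds from the margins, then bisection on r to match cell (1,1)."""
    n = n00 + n01 + n10 + n11
    t1 = STD.inv_cdf((n00 + n01) / n)  # threshold for the row variable
    t2 = STD.inv_cdf((n00 + n10) / n)  # threshold for the column variable
    target = n11 / n
    lo, hi = -0.999, 0.999
    for _ in range(50):
        mid = (lo + hi) / 2.0
        if upper_quadrant_prob(t1, t2, mid) < target:
            lo = mid  # model predicts too few True/True cases: raise r
        else:
            hi = mid
    return (lo + hi) / 2.0, t1, t2


r, t1, t2 = tetrachoric(6343, 78, 234, 54)
print(round(r, 3), round(t1, 3), round(t2, 3))
```

With r = 0 the model reproduces the count expected under independence, and larger values of r push more of the probability mass into the True/True cell.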

Page 36: Data-pipeline using ALSPAC data

Poly / tetra

• Tetrachoric
  – Special case where both variables are binary
• Polychoric
  – More general (any categorical variable)
• Bi/Polyserial
  – One continuous and one categorical variable

Page 37: Data-pipeline using ALSPAC data

Poly versus standard correlations

foreach x in "emo_l_02" "emo_l_06" "emo_l_11" "emo_l_15" "emo_l_19" {
    gen `x'_00001 = `x'_01234
    recode `x'_00001 (0=0)(1=0)(2=0)(3=0)(4=1)
    gen `x'_00011 = `x'_01234
    recode `x'_00011 (0=0)(1=0)(2=0)(3=1)(4=1)
    gen `x'_00111 = `x'_01234
    recode `x'_00111 (0=0)(1=0)(2=1)(3=1)(4=1)
    gen `x'_01111 = `x'_01234
    recode `x'_01111 (0=0)(1=1)(2=1)(3=1)(4=1)
    gen `x'_01122 = `x'_01234
    recode `x'_01122 (0=0)(1=1)(2=1)(3=2)(4=2)
    gen `x'_00123 = `x'_01234
    recode `x'_00123 (0=0)(1=0)(2=1)(3=2)(4=3)
}

Page 38: Data-pipeline using ALSPAC data

log using "eas_dataprep_poly_corr.log", replace

foreach x in "emo_*_00001" "emo_*_00011" "emo_*_00111" ///
    "emo_*_01111" "emo_*_01122" "emo_*_00123" "emo_*_01234" {
    corr `x'
    polychoric `x'
}

log close

Page 39: Data-pipeline using ALSPAC data

Polychoric Correlation Matrix (01234)

           emo_l_02  emo_l_06  emo_l_11  emo_l_15  emo_l_19
emo_l_02     1.000
emo_l_06     0.592     1.000
emo_l_11     0.688     0.579     1.000
emo_l_15     0.762     0.656     0.702     1.000
emo_l_19     0.433     0.508     0.505     0.521     1.000

Standard Correlation Matrix (01234)

           emo_l_02  emo_l_06  emo_l_11  emo_l_15  emo_l_19
emo_l_02     1.000
emo_l_06     0.523     1.000
emo_l_11     0.606     0.510     1.000
emo_l_15     0.674     0.581     0.618     1.000
emo_l_19     0.386     0.456     0.450     0.465     1.000

Page 40: Data-pipeline using ALSPAC data

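[Figure: correlation (y-axis, 0.2 to 0.8) plotted against response coding (x-axis: 00001, 00011, 00111, 01111, 01122, 00123, 01234). Four series: [P] emo1, emo2; [P] emo2, emo3; [C] emo1, emo2; [C] emo2, emo3, where [P] and [C] appear to denote the polychoric and standard (corr) correlations.]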

Page 41: Data-pipeline using ALSPAC data

Poly versus standard correlations

• In these data the polychoric correlations are higher than the Pearson correlations for every item pair

• Polychoric correlations more robust to changes in the number of categories

• For polychoric in Stata, if # categories > 10, the variable is treated as if continuous, so the correlation of two variables that each have more than 10 categories would simply be the usual Pearson moment correlation found through correlate.

Page 42: Data-pipeline using ALSPAC data

Polychoric PCA

Page 43: Data-pipeline using ALSPAC data

Polychoric PCA

• Performs PCA on the polychoric correlation matrix

• Produces eigenvectors, eigenvalues, and the correlation matrix as with standard PCA
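As a rough illustration of what "PCA on the correlation matrix" involves, the Python sketch below extracts the first eigenvalue of the two 5 x 5 EAS emotionality matrices shown earlier, using plain power iteration. It is not how polychoricpca is implemented; the helper names are hypothetical, and the higher of the two matrices is taken to be the polychoric one, as the slide captions suggest.

```python
def full_matrix(lower):
    """Expand lower-triangle rows into a full symmetric matrix."""
    n = len(lower)
    return [[lower[max(i, j)][min(i, j)] for j in range(n)] for i in range(n)]


def first_eigenvalue(matrix, iterations=500):
    """Largest eigenvalue of a symmetric matrix by power iteration."""
    n = len(matrix)
    v = [1.0] * n
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
    return sum(v[i] * w[i] for i in range(n))  # Rayleigh quotient


polychoric = full_matrix([
    [1.000],
    [0.592, 1.000],
    [0.688, 0.579, 1.000],
    [0.762, 0.656, 0.702, 1.000],
    [0.433, 0.508, 0.505, 0.521, 1.000],
])
pearson = full_matrix([
    [1.000],
    [0.523, 1.000],
    [0.606, 0.510, 1.000],
    [0.674, 0.581, 0.618, 1.000],
    [0.386, 0.456, 0.450, 0.465, 1.000],
])

lam_poly = first_eigenvalue(polychoric)
lam_pearson = first_eigenvalue(pearson)
# Proportion of variance explained by the first component = eigenvalue / no. of items
print(round(lam_poly / 5, 3), round(lam_pearson / 5, 3))
```

Because every polychoric correlation here is larger than its Pearson counterpart, the first component of the polychoric matrix explains a larger share of the variance.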

Page 44: Data-pipeline using ALSPAC data

PCA v Polychoric PCA, mum MFQ

Component | PCA: Eigenvalue, Cum. explained | Polychoric PCA: Eigenvalue, Cum. explained

Comp1 5.554 0.427 8.340 0.642

Comp2 1.206 0.520 1.006 0.719

Comp3 0.826 0.584 0.698 0.773

Comp4 0.719 0.639 0.518 0.812

Comp5 0.707 0.693 0.466 0.848

Comp6 0.646 0.743 0.397 0.879

Comp7 0.610 0.790 0.364 0.907

Comp8 0.547 0.832 0.281 0.928

Comp9 0.526 0.872 0.256 0.948

Comp10 0.500 0.911 0.247 0.967

Comp11 0.426 0.944 0.181 0.981

Comp12 0.383 0.973 0.132 0.991

Comp13 0.350 1.000 0.113 1.000

Page 45: Data-pipeline using ALSPAC data

PCA v Polychoric PCA, mum MFQ: loadings

Variable | PCA: Comp1 Comp2 Comp3 | Polychoric PCA: e1 e2 e3

mum01_012 0.270 0.286 0.297 0.279 0.121 -0.385

mum02_012 0.274 0.225 0.225 0.276 0.215 -0.219

mum03_012 0.178 0.535 0.236 0.186 0.657 -0.253

mum04_012 0.208 0.454 -0.434 0.217 0.461 0.510

mum05_012 0.328 -0.189 -0.067 0.314 -0.122 0.087

mum06_012 0.248 -0.027 0.511 0.266 -0.005 -0.382

mum07_012 0.254 0.278 -0.411 0.255 0.236 0.432

mum08_012 0.317 -0.286 0.019 0.315 -0.166 -0.005

mum09_012 0.287 -0.320 -0.015 0.302 -0.229 0.034

mum10_012 0.282 -0.031 0.161 0.279 -0.110 -0.161

mum11_012 0.306 -0.206 0.073 0.300 -0.206 -0.083

mum12_012 0.300 -0.164 -0.328 0.287 -0.237 0.279

mum13_012 0.312 -0.085 -0.209 0.299 -0.182 0.164

Page 46: Data-pipeline using ALSPAC data

PCA v Polychoric PCA, EAS

Component | PCA: Eigenvalue, Cum. explained | Polychoric PCA: Eigenvalue, Cum. explained

Comp1 5.207 0.260 6.099 0.305

Comp2 3.080 0.414 3.363 0.473

Comp3 1.656 0.497 1.766 0.561

Comp4 1.283 0.561 1.311 0.627

Comp5 1.054 0.614 1.058 0.680

Comp6 0.943 0.661 0.918 0.726

Comp7 0.725 0.697 0.648 0.758

Comp8 0.669 0.731 0.590 0.788

Comp9 0.640 0.763 0.567 0.816

Comp10 0.575 0.792 0.495 0.841

Comp11 0.560 0.820 0.490 0.865

Comp12 0.518 0.845 0.447 0.888

Comp13 0.494 0.870 0.424 0.909

Etc.

Page 47: Data-pipeline using ALSPAC data

PCA v Polychoric PCA, EAS: loadings

Variable | PCA: Comp1 Comp2 Comp3 Comp4 Comp5 | Polychoric PCA: e1 e2 e3 e4 e5

act_l_04_01234 -0.250 0.160 0.383 -0.144 -0.126 -0.264 0.163 0.386 -0.148 -0.144

act_l_07_01234 -0.187 -0.009 0.300 -0.179 0.173 -0.205 -0.019 0.296 -0.168 0.176

act_l_09_01234 -0.237 0.139 0.366 -0.186 -0.215 -0.236 0.135 0.341 -0.171 -0.222

act_l_13_01234 -0.288 0.126 0.353 -0.204 -0.071 -0.299 0.126 0.354 -0.203 -0.103

act_l_17_01234 -0.229 0.074 0.187 0.043 0.195 -0.230 0.066 0.181 0.041 0.244

emo_l_02_01234 0.181 0.377 -0.036 -0.141 0.287 0.172 0.379 -0.029 -0.147 0.295

emo_l_06_01234 0.156 0.387 -0.047 -0.088 0.070 0.147 0.388 -0.044 -0.094 0.072

emo_l_11_01234 0.182 0.383 -0.043 -0.132 0.125 0.175 0.385 -0.041 -0.137 0.135

emo_l_15_01234 0.205 0.386 -0.010 -0.156 0.216 0.196 0.388 -0.003 -0.163 0.217

emo_l_19_01234 0.132 0.360 -0.008 -0.024 -0.228 0.124 0.359 -0.008 -0.023 -0.232

shy_l_01_01234 0.258 -0.032 0.286 0.232 0.140 0.255 -0.025 0.293 0.225 0.111

shy_l_08_01234 0.289 -0.054 0.177 -0.006 -0.319 0.286 -0.046 0.178 0.006 -0.273

shy_l_12_01234 0.324 -0.084 0.197 0.009 -0.136 0.321 -0.074 0.204 0.014 -0.089

shy_l_14_01234 0.254 -0.044 0.400 0.238 0.109 0.246 -0.038 0.404 0.236 0.085

shy_l_20_01234 0.204 -0.116 0.369 0.270 0.319 0.195 -0.112 0.380 0.263 0.305

soc_l_03_01234 -0.246 0.161 -0.047 0.229 0.100 -0.255 0.162 -0.052 0.225 0.079

soc_l_05_01234 -0.153 0.219 0.030 0.511 0.020 -0.155 0.221 0.022 0.517 0.071

soc_l_10_01234 -0.205 0.211 -0.031 0.294 -0.196 -0.204 0.210 -0.042 0.303 -0.191

soc_l_16_01234 -0.270 0.038 -0.110 0.234 0.412 -0.270 0.025 -0.111 0.227 0.427

soc_l_18_01234 0.045 0.265 -0.018 0.401 -0.438 0.050 0.267 -0.031 0.406 -0.438

Page 48: Data-pipeline using ALSPAC data

Assumptions of PCA/FA

• Items can be regarded as parallel (same frequency distribution)

• PCA/FA not always appropriate when items differ in their frequency distribution such as when items have differing levels of difficulty

• Alternative methods may be more appropriate…. find out tomorrow

Page 49: Data-pipeline using ALSPAC data

Loevinger’s H

Coefficient of Homogeneity

Page 50: Data-pipeline using ALSPAC data

Item Response Function

[Figure: item response function. y-axis: increasing probability of endorsing item; x-axis: increasing level of latent trait.]

Page 51: Data-pipeline using ALSPAC data

Non-parametric

• No fixed functional form is imposed on the relationship between the trait and the probability of a positive response to each item

• Unlike polychoric, no assumption made about the distribution of the latent trait

Page 52: Data-pipeline using ALSPAC data

Bit about scaling

Page 53: Data-pipeline using ALSPAC data

(Guttman) Error Cells

. tab mum01_001 mum04_001

                 Felt |     Was restless
    miserable/unhappy |      [mum04_001]
          [mum01_001] |   ST / NT       True |     Total
----------------------+----------------------+----------
              ST / NT |     6,343         78 |     6,421
                 True |       234         54 |       288
----------------------+----------------------+----------
                Total |     6,577        132 |     6,709

• mum04_001 is more difficult than mum01_001
• If mum01_001 and mum04_001 formed a hierarchy, there would be a zero count in the top right cell

Page 54: Data-pipeline using ALSPAC data

EAS, Emotionality [00011]

   CH often |
 fusses and |
      cries |  Child cries easily
 [emo_l_11_ |   [emo_l_02_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     8,248        503 |     8,751
          1 |       180        363 |       543
------------+----------------------+----------
      Total |     8,428        866 |     9,294

   CH often |
 fusses and |   Child tends to be
      cries |  somewhat emotional
 [emo_l_11_ |   [emo_l_06_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     7,857        894 |     8,751
          1 |       166        377 |       543
------------+----------------------+----------
      Total |     8,023      1,271 |     9,294

   CH often |
 fusses and |   Child gets upset
      cries |        easily
 [emo_l_11_ |   [emo_l_15_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     8,199        552 |     8,751
          1 |       141        402 |       543
------------+----------------------+----------
      Total |     8,340        954 |     9,294

   CH often |
 fusses and |     Child reacts
      cries | intensely when upset
 [emo_l_11_ |   [emo_l_19_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     7,607      1,144 |     8,751
          1 |       178        365 |       543
------------+----------------------+----------
      Total |     7,785      1,509 |     9,294

Page 55: Data-pipeline using ALSPAC data

EAS, Emotionality [00011]

(Same four tables as on the previous slide.)

Σ = 665 (sum of the four lower-left error cells: 180 + 166 + 141 + 178)

Page 56: Data-pipeline using ALSPAC data

EAS, Emotionality [00011]

           Difficulty  emo_l_11  emo_l_02  emo_l_15  emo_l_06  emo_l_19  Total Errors
emo_l_11      5.8%        -        180       141       166       178         665
emo_l_02      9.3%       180        -        297       342       428        1247
emo_l_15     10.3%       141       297        -        349       430        1217
emo_l_06     13.7%       166       342       349        -        619        1476
emo_l_19     16.2%       178       428       430       619        -         1655

Page 57: Data-pipeline using ALSPAC data

EAS, Emotionality [00011]

(Same table as on the previous slide.)

Σ = 3130 (total observed Guttman errors for the scale; each item pair is counted once, so this is half the sum of the Total Errors column, 6260 / 2)

Page 58: Data-pipeline using ALSPAC data

Expected Guttman Errors

   CH often |
 fusses and |
      cries |  Child cries easily
 [emo_l_11_ |   [emo_l_02_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     8,248        503 |     8,751
          1 |       180        363 |       543
------------+----------------------+----------
      Total |     8,428        866 |     9,294

Under perfect Guttman scaling, cell count = 0

Under marginal independence, cell count = [(8428/9294)*(543/9294)]*9294 = 492.4
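The expected count is just the product of the two marginal proportions scaled back up to the sample size. A small Python check of the arithmetic (the function name is hypothetical):

```python
def expected_error_count(row_total, col_total, n):
    """Expected cell count when the two items are independent:
    (row_total / n) * (col_total / n) * n."""
    return row_total * col_total / n


# Error cell: emo_l_11 = 1 (543 children) and emo_l_02 = 0 (8428 children)
expected = expected_error_count(543, 8428, 9294)
print(round(expected, 1))  # → 492.4
```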

Page 59: Data-pipeline using ALSPAC data

Expected Guttman Errors

• Total observed Guttman errors for emo_l_11
  = 180 + 141 + 166 + 178
  = 665

• Total expected Guttman errors for emo_l_11
  = 492.4 + 487.26 + 468.74 + 454.84
  ≈ 1903.25

Loevinger H coefficient for emo_l_11 (H11)
  = 1 – Σ(observed) / Σ(expected)
  = 1 – (665/1903.25)
  = 0.651
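The same calculation, sketched in Python from the margins of the four 2x2 tables for emo_l_11 (variable names are illustrative only). Each expected count is computed under marginal independence, as on the previous slide.

```python
N = 9294              # complete cases
HARD_ITEM_YES = 543   # children scoring 1 on emo_l_11, the most difficult item

# easier item: (observed error cell, number scoring 0 on the easier item)
pairs = {
    "emo_l_02": (180, 8428),
    "emo_l_15": (141, 8340),
    "emo_l_06": (166, 8023),
    "emo_l_19": (178, 7785),
}

observed = sum(obs for obs, _ in pairs.values())
expected = sum(HARD_ITEM_YES * zeros / N for _, zeros in pairs.values())
h_item = 1 - observed / expected

print(observed, round(expected, 2), round(h_item, 4))
```

The item coefficient is therefore the proportional reduction in Guttman errors relative to what independence would produce.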

Page 60: Data-pipeline using ALSPAC data

loevH emo_*_00011

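                        Easyness   Observed   Expected   Loevinger             H0: Hj<=0   Number of
Item              Obs   P(Xj=1)    Guttman    Guttman     H coeff    z-stat.    p-value     NS Hjk
                                    errors     errors
-----------------------------------------------------------------------------------------------------
emo_l_11_00011   9294    0.0584        665    1903.25     0.6506    83.4427        0           0
emo_l_02_00011   9294    0.0932       1247    2742.48     0.5453    84.2498        0           0
emo_l_15_00011   9294    0.1026       1217    2887.01    0.57846    90.9775        0           0
emo_l_06_00011   9294    0.1368       1476    3104.49    0.52456    81.0815        0           0
emo_l_19_00011   9294    0.1624       1655    3043.97     0.4563    66.0645        0           0
-----------------------------------------------------------------------------------------------------
Scale            9294                 3130     6840.6    0.54244   126.6163        0

Hi: the item-level coefficients in the "Loevinger H coeff" column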

Page 61: Data-pipeline using ALSPAC data

loevH emo_*_00011

(Same output as on the previous slide.)

Loevinger H for scale: the "Loevinger H coeff" entry in the Scale row (0.54244)

Page 62: Data-pipeline using ALSPAC data

Acceptable values of Hi, H

• Acceptable scale: Hi all > 0.3; this then implies H > 0.3

• Weak scale: 0.3 ≤ H < 0.4

• Medium scale: 0.4 ≤ H < 0.5

• Strong scale: 0.5 ≤ H

‘Mokken’ Scale
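These cut-offs are easy to apply mechanically. The Python helper below is a hypothetical convenience function: it takes the medium band to be 0.4 ≤ H < 0.5, and the label for H below 0.3 is not from the slide.

```python
def classify_scale(h):
    """Label a scale by its Loevinger H coefficient."""
    if h < 0.3:
        return "not acceptable"  # label assumed; the slide only names H >= 0.3
    if h < 0.4:
        return "weak"
    if h < 0.5:
        return "medium"
    return "strong"


# Scale-level H values reported by loevH in this deck
for name, h in [("emo_*_00011", 0.54244), ("mum*_012", 0.48614), ("kid*_012", 0.42516)]:
    print(name, classify_scale(h))
```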

Page 63: Data-pipeline using ALSPAC data

loevH mum*_012

                    Difficulty   Observed   Expected   Loevinger              H0: Hj<=0   Number of
Item         Obs     P(Xj=0)     Guttman    Guttman     H coeff     z-stat.    p-value     NS Hjk
                                  errors     errors
----------------------------------------------------------------------------------------------------
mum01_012   6623     0.5886        4663    11129.80    0.58103    102.2589    0.00000        0
mum02_012   6623     0.8629        5001     9567.51    0.47729    103.0832    0.00000        0
mum03_012   6623     0.7315        7720    11642.62    0.33692     68.7850    0.00000        0
mum04_012   6623     0.7331        6742    11163.54    0.39607     79.9763    0.00000        0
mum05_012   6623     0.9050        3560     8243.22    0.56813    117.8687    0.00000        0
mum06_012   6623     0.9244        3889     7076.82    0.45046     88.9681    0.00000        0
mum07_012   6623     0.7944        6135    11147.15    0.44964     95.7725    0.00000        0
mum08_012   6623     0.9349        2601     6340.91    0.58981    112.2358    0.00000        0
mum09_012   6623     0.9500        2246     5144.76    0.56344     99.9250    0.00000        0
mum10_012   6623     0.8474        5238     9951.33    0.47364    102.4057    0.00000        0
mum11_012   6623     0.9064        3847     8124.81    0.52651    108.8859    0.00000        0
mum12_012   6623     0.8655        5069     9987.87    0.49248    106.8244    0.00000        0
mum13_012   6623     0.8209        4927    10431.72    0.52769    113.1608    0.00000        0
----------------------------------------------------------------------------------------------------
Scale       6623                  30819    59976.03    0.48614    246.5451    0.00000

Page 64: Data-pipeline using ALSPAC data

loevH kid*_012

                    Difficulty   Observed   Expected   Loevinger              H0: Hj<=0   Number of
Item         Obs     P(Xj=0)     Guttman    Guttman     H coeff     z-stat.    p-value     NS Hjk
                                  errors     errors
----------------------------------------------------------------------------------------------------
kid01_012   5703     0.3730        8579    16440.13    0.47817     88.0802    0.00000        0
kid02_012   5703     0.7998        9763    14118.52    0.30850     64.2818    0.00000        0
kid03_012   5703     0.4780       12234    17415.56    0.29752     57.5953    0.00000        0
kid04_012   5703     0.4638       13379    18379.14    0.27206     53.4953    0.00000        0
kid05_012   5703     0.7891        8096    16659.66    0.51404    109.7532    0.00000        0
kid06_012   5703     0.8048        9308    15947.63    0.41634     88.8534    0.00000        0
kid07_012   5703     0.4277       10736    18173.50    0.40925     79.3781    0.00000        0
kid08_012   5703     0.8197        7914    16091.78    0.50820    107.3113    0.00000        0
kid09_012   5703     0.8040        9132    15004.31    0.39137     82.7063    0.00000        0
kid10_012   5703     0.6846        8939    18013.31    0.50376    103.7182    0.00000        0
kid11_012   5703     0.8313        7740    15178.88    0.49008    101.7086    0.00000        0
kid12_012   5703     0.7315        9445    17874.94    0.47161     98.9656    0.00000        0
kid13_012   5703     0.8101        7959    15065.70    0.47171     99.6037    0.00000        0
----------------------------------------------------------------------------------------------------
Scale       5703                  61612     1.1e+05    0.42516    219.7088    0.00000

Page 65: Data-pipeline using ALSPAC data


Need a procedure to derive a Mokken scale by selecting a subset of the above items

Page 66: Data-pipeline using ALSPAC data

MSP

Mokken Scaling Procedure

Page 67: Data-pipeline using ALSPAC data

Mokken Scaling Procedure

• Bottom-up, hierarchical clustering procedure

• Contrast to top-down procedures such as PCA/FA

Page 68: Data-pipeline using ALSPAC data

Employs Hij

   CH often |
 fusses and |
      cries |  Child cries easily
 [emo_l_11_ |   [emo_l_02_00011]
     00011] |         0          1 |     Total
------------+----------------------+----------
          0 |     8,248        503 |     8,751
          1 |       180        363 |       543
------------+----------------------+----------
      Total |     8,428        866 |     9,294

Observed Guttman errors = 180
Expected Guttman errors* = 492.4

Hij = 1 – (# observed / # expected) = 0.634

* Under marginal independence

Page 69: Data-pipeline using ALSPAC data

Procedure

1. Derive Hij for all pairs of items and select the pair with the highest value (> 0.3). Favour more difficult items if two pairs give the same Hij.

2. Find the next best item for the scale. For each item k not already in the scale, calculate:
   – Hik for all items i in the scale, and also
   – Hk between item k and the current scale as a whole, and
   – H for each new scale (items i plus k)
   again favouring more difficult items and those with higher Hk in the event of a tied H value, and ensuring all H / Hik / Hk > 0.3.
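The two steps can be sketched in Python using the observed and expected Guttman error counts for the five emo_*_00011 items from the worked example. This is a simplified illustration, not the msp command: it leaves out the significance tests that msp also applies and does not break ties by item difficulty, and all function names are hypothetical.

```python
from itertools import combinations

ITEMS = ["emo_l_11", "emo_l_02", "emo_l_15", "emo_l_06", "emo_l_19"]

# Pairwise Guttman errors: (observed, expected under marginal independence)
ERRORS = {
    ("emo_l_11", "emo_l_02"): (180, 492.40),
    ("emo_l_11", "emo_l_15"): (141, 487.26),
    ("emo_l_11", "emo_l_06"): (166, 468.74),
    ("emo_l_11", "emo_l_19"): (178, 454.84),
    ("emo_l_02", "emo_l_15"): (297, 777.11),
    ("emo_l_02", "emo_l_06"): (342, 747.57),
    ("emo_l_02", "emo_l_19"): (428, 725.39),
    ("emo_l_15", "emo_l_06"): (349, 823.54),
    ("emo_l_15", "emo_l_19"): (430, 799.11),
    ("emo_l_06", "emo_l_19"): (619, 1064.64),
}


def errors(a, b):
    return ERRORS[(a, b)] if (a, b) in ERRORS else ERRORS[(b, a)]


def h_from_pairs(pairs):
    """1 - sum(observed) / sum(expected) over a collection of item pairs."""
    pairs = list(pairs)
    obs = sum(errors(a, b)[0] for a, b in pairs)
    exp = sum(errors(a, b)[1] for a, b in pairs)
    return 1 - obs / exp


def h_pair(a, b):
    return h_from_pairs([(a, b)])


def h_item(k, scale):
    return h_from_pairs((k, i) for i in scale)


def h_scale(scale):
    return h_from_pairs(combinations(scale, 2))


def mokken_select(items, cutoff=0.3):
    # Step 1: start from the pair with the highest Hij
    start = max(combinations(items, 2), key=lambda p: h_pair(*p))
    if h_pair(*start) <= cutoff:
        return []
    scale = list(start)
    remaining = [i for i in items if i not in scale]
    # Step 2: repeatedly add the item giving the highest scale H
    while remaining:
        candidates = [k for k in remaining
                      if h_item(k, scale) > cutoff
                      and all(h_pair(k, i) > cutoff for i in scale)]
        if not candidates:
            break
        best = max(candidates,
                   key=lambda k: (h_scale(scale + [k]), h_item(k, scale)))
        scale.append(best)
        remaining.remove(best)
    return scale


selected = mokken_select(ITEMS)
print(selected, round(h_scale(selected), 4))
```

On these five items the sketch adds items in the order emo_l_11 and emo_l_15, then emo_l_02, emo_l_06 and emo_l_19.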

Page 70: Data-pipeline using ALSPAC data

Worked example using emo_*_00011

Observed Guttman errors above the diagonal; expected Guttman errors below the diagonal.

           Difficulty  emo_l_11  emo_l_02  emo_l_15  emo_l_06  emo_l_19  Total Errors
emo_l_11      5.8%        -        180       141       166       178         665
emo_l_02      9.3%      492.40      -        297       342       428        1247
emo_l_15     10.3%      487.26    777.11      -        349       430        1217
emo_l_06     13.7%      468.74    747.57    823.54      -        619        1476
emo_l_19     16.2%      454.84    725.39    799.11   1064.64      -         1655

Page 71: Data-pipeline using ALSPAC data

Stage 1. derive Hij

i          j          # obs    # exp     Hij
emo_l_11   emo_l_02    180     492.4     0.63
emo_l_11   emo_l_15    141     487.26    0.71
emo_l_11   emo_l_06    166     468.74    0.65
emo_l_11   emo_l_19    178     454.84    0.61
emo_l_02   emo_l_15    297     777.11    0.62
emo_l_02   emo_l_06    342     747.57    0.54
emo_l_02   emo_l_19    428     725.39    0.41
emo_l_15   emo_l_06    349     823.54    0.58
emo_l_15   emo_l_19    430     799.11    0.46
emo_l_06   emo_l_19    619    1064.64    0.42

Page 72: Data-pipeline using ALSPAC data

Select highest Hij

(Same table as on the previous slide.)

The highest Hij is 0.71, for the pair emo_l_11 and emo_l_15.

Page 73: Data-pipeline using ALSPAC data

Stage 2. Find next best item

• Items 11 and 15 were selected

• Calculate H for each ‘new’ scale and Hk between each item not in scale, and the current scale

Item       Hk,11   Hk,15   Hk                              new H
emo_l_02   0.63    0.62    1 – (477/1269.51) = 0.624       0.648
emo_l_06   0.65    0.58    1 – (515/1292.28) = 0.601       0.631
emo_l_19   0.61    0.46    1 – (608/1253.95) = 0.515       0.570

• Select the new item with highest H and Hk provided all H > 0.3

• Repeat step offering emo_l_06 and emo_l_19 to this new scale…
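As a rough illustration of Stage 2, the Python sketch below repeats the "find next best item" step using the observed and expected error counts from the worked example. It is a simplification under stated assumptions: it ignores the significance tests and the tie-breaking rules, and the helper names (`item_h`, `scale_h`, `history`) are hypothetical rather than anything provided by `msp`.

```python
from itertools import combinations

# (observed, expected) Guttman errors per item pair, from the worked example.
errors = {
    frozenset(("11", "02")): (180, 492.40),
    frozenset(("11", "15")): (141, 487.26),
    frozenset(("11", "06")): (166, 468.74),
    frozenset(("11", "19")): (178, 454.84),
    frozenset(("02", "15")): (297, 777.11),
    frozenset(("02", "06")): (342, 747.57),
    frozenset(("02", "19")): (428, 725.39),
    frozenset(("15", "06")): (349, 823.54),
    frozenset(("15", "19")): (430, 799.11),
    frozenset(("06", "19")): (619, 1064.64),
}


def h_from(pairs):
    """1 - (sum of observed errors / sum of expected errors) over the given pairs."""
    obs = sum(errors[frozenset(p)][0] for p in pairs)
    exp = sum(errors[frozenset(p)][1] for p in pairs)
    return 1 - obs / exp


def item_h(k, scale):
    """Hk: item k against every item currently in the scale."""
    return h_from([(k, i) for i in scale])


def scale_h(items):
    """H for the scale as a whole: all pairs of items in it."""
    return h_from(list(combinations(items, 2)))


scale = ["11", "15"]            # starting pair from Stage 1
remaining = ["02", "06", "19"]
history = []                    # (item, Hk, new H) for each item added

while remaining:
    # Candidate giving the highest H for the enlarged scale.
    best = max(remaining, key=lambda k: scale_h(scale + [k]))
    hk, h = item_h(best, scale), scale_h(scale + [best])
    hik = [h_from([(best, i)]) for i in scale]
    if min([hk, h] + hik) <= 0.3:   # simplified 0.3 threshold check
        break
    scale.append(best)
    remaining.remove(best)
    history.append((best, hk, h))

for item, hk, h in history:
    print(f"emo_l_{item}: Hk={hk:.4f} H={h:.4f}")
```

Under these simplifications the items enter in the order 02, 06, 19.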

Page 74: Data-pipeline using ALSPAC data

msp emo_*_00011

Scale: 1
----------
Significance level: 0.005000
The two first items selected in the scale 1 are emo_l_11_00011 and emo_l_15_00011 (Hjk=0.7106)
Significance level: 0.003846
The item emo_l_02_00011 is selected in the scale 1  Hj=0.6243 H=0.6482
Significance level: 0.003333
The item emo_l_06_00011 is selected in the scale 1  Hj=0.5799 H=0.6115
Significance level: 0.003125
The item emo_l_19_00011 is selected in the scale 1  Hj=0.4563 H=0.5424
Significance level: 0.003125
There is no more items remaining.

                          Difficulty   Observed   Expected   Loevinger               H0: Hj<=0   Number of
Item              Obs     P(Xj=0)      Guttman    Guttman    H coeff     z-stat.     p-value     NS Hjk
                                       errors     errors
----------------------------------------------------------------------------------------------------------
emo_l_19_00011    9294    0.1624       1655       3043.97    0.4563       66.0645    0           0
emo_l_06_00011    9294    0.1368       1476       3104.49    0.52456      81.0815    0           0
emo_l_02_00011    9294    0.0932       1247       2742.48    0.5453       84.2498    0           0
emo_l_11_00011    9294    0.0584        665       1903.25    0.6506       83.4427    0           0
emo_l_15_00011    9294    0.1026       1217       2887.01    0.57846      90.9775    0           0
----------------------------------------------------------------------------------------------------------
Scale             9294                 3130       6840.6     0.54244     126.6163    0

Page 75: Data-pipeline using ALSPAC data

msp kid*_012

Scale: 1
----------
Significance level: 0.000641
The two first items selected in the scale 1 are kid10_012 and kid11_012 (Hjk=0.7083)
Significance level: 0.000562
The item kid01_012 is selected in the scale 1  Hj=0.6145 H=0.6467
Significance level: 0.000505
The item kid05_012 is selected in the scale 1  Hj=0.6254 H=0.6358
Significance level: 0.000463
The item kid08_012 is selected in the scale 1  Hj=0.6373 H=0.6364
Significance level: 0.000431
The item kid12_012 is selected in the scale 1  Hj=0.5845 H=0.6175
Significance level: 0.000407
The item kid13_012 is selected in the scale 1  Hj=0.5642 H=0.6032
Significance level: 0.000388
The item kid06_012 is selected in the scale 1  Hj=0.4978 H=0.5769
Significance level: 0.000373
The item kid07_012 is selected in the scale 1  Hj=0.4437 H=0.5463
Significance level: 0.000362
The item kid09_012 is selected in the scale 1  Hj=0.4285 H=0.5243
Significance level: 0.000355
The item kid02_012 is selected in the scale 1  Hj=0.3038 H=0.4884
Significance level: 0.000350
The item kid03_012 is selected in the scale 1  Hj=0.3041 H=0.4569
Significance level: 0.000347
None new item can be selected in the scale 1 because all the Hj are lesser than .3 or none new item had all the related Hjk coefficients significantely greater than 0.

Page 76: Data-pipeline using ALSPAC data

Item kid04_012 has been dropped; the resulting 12-item scale is acceptable

                     Difficulty   Observed   Expected    Loevinger              H0: Hj<=0   Number of
Item         Obs     P(Xj=0)      Guttman    Guttman     H coeff     z-stat.    p-value     NS Hjk
                                  errors     errors
-----------------------------------------------------------------------------------------------------
kid01_012    5703    0.373         7050      14408.47    0.51         87.64     0           0
kid07_012    5703    0.428         9134      15880.51    0.42         76.60     0           0
kid03_012    5703    0.478        10582      15206.23    0.30         54.64     0           0
kid10_012    5703    0.685         7757      16308.82    0.52        103.87     0           0
kid12_012    5703    0.732         8242      16294.06    0.49        100.41     0           0
kid05_012    5703    0.789         7219      15316.30    0.53        110.12     0           0
kid02_012    5703    0.800         8928      13008.55    0.31         63.83     0           0
kid09_012    5703    0.804         8284      13829.26    0.40         82.81     0           0
kid06_012    5703    0.805         8345      14690.90    0.43         90.08     0           0
kid13_012    5703    0.810         7139      13885.77    0.49        100.26     0           0
kid08_012    5703    0.820         6959      14807.57    0.53        109.27     0           0
kid11_012    5703    0.831         6827      13968.33    0.51        103.55     0           0
-----------------------------------------------------------------------------------------------------
Scale        5703                 48233      88802.39    0.45685     219.11     0

Page 77: Data-pipeline using ALSPAC data

msp *_01234 (EAS)

Scale: 1
----------
Significance level: 0.000263
The two first items selected in the scale 1 are emo_l_11_01234 and emo_l_15_01234 (Hjk=0.7457)
The following items are excluded at this step: soc_l_03_01234 act_l_04_01234 act_l_07_01234 act_l_09_01234 soc_l_10_01234 act_l_13_01234 soc_l_16_01234 act_l_17_01234
Significance level: 0.000250
The item emo_l_02_01234 is selected in the scale 1  Hj=0.7093 H=0.7208
Significance level: 0.000239
The item emo_l_06_01234 is selected in the scale 1  Hj=0.6106 H=0.6644
Significance level: 0.000230
The item emo_l_19_01234 is selected in the scale 1  Hj=0.4860 H=0.5826
The following items are excluded at this step: shy_l_20_01234
Significance level: 0.000224
None new item can be selected in the scale 1 because all the Hj are lesser than .3 or none new item had all the related Hjk coefficients significantely greater than 0.

                          Difficulty   Observed   Expected    Loevinger               H0: Hj<=0   Number of
Item              Obs     P(Xj=0)      Guttman    Guttman     H coeff     z-stat.     p-value     NS Hjk
                                       errors     errors
-----------------------------------------------------------------------------------------------------------
emo_l_19_01234    8928    0.1427       13758      26768.91    0.48605      83.4028    0.00000     0
emo_l_06_01234    8928    0.0657        9715      23068.69    0.57887      97.0460    0.00000     0
emo_l_02_01234    8928    0.0793        9264      22722.70    0.59230     101.4626    0.00000     0
emo_l_11_01234    8928    0.1615        7812      21586.83    0.63811     101.9176    0.00000     0
emo_l_15_01234    8928    0.0579        8187      22615.95    0.63800     109.0215    0.00000     0
-----------------------------------------------------------------------------------------------------------
Scale             8928                 24368      58381.54    0.58261     154.7274    0.00000

Page 78: Data-pipeline using ALSPAC data

Scale: 2
----------
Significance level: 0.000476
The two first items selected in the scale 2 are act_l_09_01234 and act_l_13_01234 (Hjk=0.6861)
The following items are excluded at this step: shy_l_01_01234 shy_l_08_01234 shy_l_12_01234 shy_l_14_01234 soc_l_18_01234 shy_l_20_01234
Significance level: 0.000446
The item act_l_04_01234 is selected in the scale 2  Hj=0.6439 H=0.6585
Significance level: 0.000424
The item act_l_17_01234 is selected in the scale 2  Hj=0.4013 H=0.5339
Significance level: 0.000407
The item act_l_07_01234 is selected in the scale 2  Hj=0.3528 H=0.4674
Significance level: 0.000394
None new item can be selected in the scale 2 because all the Hj are lesser than .3 or none new item had all the related Hjk coefficients significantely greater than 0.

                          Difficulty   Observed   Expected    Loevinger               H0: Hj<=0   Number of
Item              Obs     P(Xj=0)      Guttman    Guttman     H coeff     z-stat.     p-value     NS Hjk
                                       errors     errors
-----------------------------------------------------------------------------------------------------------
act_l_07_01234    8928    0.0060       12055      18626.72    0.35281      58.5754    0.00000     0
act_l_17_01234    8928    0.0030       11927      19687.06    0.39417      61.8768    0.00000     0
act_l_04_01234    8928    0.0016        9516      19832.30    0.52018      87.5901    0.00000     0
act_l_09_01234    8928    0.0087       12018      23677.34    0.49243      82.3819    0.00000     0
act_l_13_01234    8928    0.0010        8508      19606.32    0.56606      95.2809    0.00000     0
-----------------------------------------------------------------------------------------------------------
Scale             8928                 27012      50714.87    0.46738     121.7656    0.00000

Page 79: Data-pipeline using ALSPAC data

Scale: 3
----------
Significance level: 0.001111
The two first items selected in the scale 3 are shy_l_08_01234 and shy_l_12_01234 (Hjk=0.6433)
The following items are excluded at this step: soc_l_03_01234 soc_l_05_01234 soc_l_10_01234 soc_l_16_01234
Significance level: 0.001020
The item shy_l_01_01234 is selected in the scale 3  Hj=0.4853 H=0.5448
Significance level: 0.000962
The item shy_l_14_01234 is selected in the scale 3  Hj=0.5151 H=0.5294
Significance level: 0.000926
The item shy_l_20_01234 is selected in the scale 3  Hj=0.4863 H=0.5098
The following items are excluded at this step: soc_l_18_01234
Significance level: 0.000926
There is no more items remaining.

                          Difficulty   Observed   Expected    Loevinger               H0: Hj<=0   Number of
Item              Obs     P(Xj=0)      Guttman    Guttman     H coeff     z-stat.     p-value     NS Hjk
                                       errors     errors
-----------------------------------------------------------------------------------------------------------
shy_l_20_01234    8928    0.0736       13559      26393.66    0.48628      78.0926    0.00000     0
shy_l_14_01234    8928    0.0719       10629      23587.87    0.54939      91.0884    0.00000     0
shy_l_01_01234    8928    0.1221       10559      21603.72    0.51124      82.3020    0.00000     0
shy_l_08_01234    8928    0.3863       12010      22334.00    0.46225      75.4764    0.00000     0
shy_l_12_01234    8928    0.3985       10079      22018.14    0.54224      88.5938    0.00000     0
-----------------------------------------------------------------------------------------------------------
Scale             8928                 28418      57968.70    0.50977     130.7685    0.00000

Page 80: Data-pipeline using ALSPAC data

Scale: 4
----------
Significance level: 0.005000
The two first items selected in the scale 4 are soc_l_03_01234 and soc_l_10_01234 (Hjk=0.4400)
Significance level: 0.003846
The item soc_l_05_01234 is selected in the scale 4  Hj=0.3941 H=0.4082
Significance level: 0.003333
The item soc_l_16_01234 is selected in the scale 4  Hj=0.3693 H=0.3889
The following items are excluded at this step: soc_l_18_01234
Significance level: 0.003333
There is no more items remaining.

                          Difficulty   Observed   Expected    Loevinger               H0: Hj<=0   Number of
Item              Obs     P(Xj=0)      Guttman    Guttman     H coeff     z-stat.     p-value     NS Hjk
                                       errors     errors
-----------------------------------------------------------------------------------------------------------
soc_l_16_01234    8928    0.0043        8974      14228.65    0.36930      50.2618    0.00000     0
soc_l_05_01234    8928    0.0040        8936      14840.68    0.39787      55.7348    0.00000     0
soc_l_03_01234    8928    0.0010        7252      12405.87    0.41544      58.1266    0.00000     0
soc_l_10_01234    8928    0.0077        9878      15864.61    0.37736      54.2418    0.00000     0
-----------------------------------------------------------------------------------------------------------
Scale             8928                 17520      28669.91    0.38891      76.7622    0.00000

There is only one item remaining (soc_l_18_01234).

Relate this back to PCA results

Page 81: Data-pipeline using ALSPAC data

MSP – Simpler example

msp mum*_011 on a sample of 20 children

Page 82: Data-pipeline using ALSPAC data

Scale: 1
----------
Significance level: 0.000641
The two first items selected in the scale 1 are fg05_011 and fg10_011 (Hjk=1.0000)
Significance level: 0.000562
The item fg03_011 is selected in the scale 1  Hj=1.0000 H=1.0000
Significance level: 0.000505
The item fg13_011 is selected in the scale 1  Hj=1.0000 H=1.0000
Significance level: 0.000463
The item fg02_011 is selected in the scale 1  Hj=0.8261 H=0.9205
Significance level: 0.000431
The item fg11_011 is selected in the scale 1  Hj=0.7059 H=0.8452
Significance level: 0.000407
The item fg09_011 is selected in the scale 1  Hj=0.5506 H=0.7697
Significance level: 0.000388
The item fg01_011 is selected in the scale 1  Hj=0.6324 H=0.7412
Significance level: 0.000373
The item fg08_011 is selected in the scale 1  Hj=0.6635 H=0.7225
Significance level: 0.000362
The item fg06_011 is selected in the scale 1  Hj=0.6078 H=0.6964
Significance level: 0.000355
The item fg12_011 is selected in the scale 1  Hj=0.4872 H=0.6531
The following items are excluded at this step: fg04_011
Significance level: 0.000352
The item fg07_011 is selected in the scale 1  Hj=0.3929 H=0.6100
Significance level: 0.000352
There is no more items remaining.

Page 83: Data-pipeline using ALSPAC data

                   Easyness   Observed   Expected   Loevinger             H0: Hj<=0   Number of
Item        Obs    P(Xj=1)    Guttman    Guttman    H coeff     z-stat.   p-value     NS Hjk
                              errors     errors
------------------------------------------------------------------------------------------------
fg09_011    20     0.15        7          22.95     0.695        6.722    0           2
fg13_011    20     0.15        5          22.95     0.7821       7.5649   0           1
fg11_011    20     0.2        12          28.8      0.5833       6.37     0           3
fg08_011    20     0.2         9          28.8      0.6875       7.5074   0           3
fg05_011    20     0.2         7          28.8      0.7569       8.2658   0           1
fg12_011    20     0.25       16          31.75     0.4961       5.5536   0           4
fg02_011    20     0.25       15          31.75     0.5276       5.9063   0           5
fg06_011    20     0.25       14          31.75     0.5591       6.2589   0           3
fg10_011    20     0.25       10          31.75     0.685        7.6693   0           1
fg07_011    20     0.5        17          28        0.3929       3.4118   0.0003      8
fg03_011    20     0.5        13          28        0.5357       4.6525   0           6
fg01_011    20     0.6         7          23.2      0.6983       5.1154   0           5
------------------------------------------------------------------------------------------------
Scale       20                66         169.25     0.61        14.972    0

MSP output (reordered by difficulty and H)

Page 84: Data-pipeline using ALSPAC data

ID fg01 fg02 fg03 fg04 fg05 fg06 fg07 fg08 fg09 fg10 fg11 fg12 fg13

1 0 0 1 0 0 0 1 0 0 0 0 0 0

2 0 0 0 0 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0 0

4 1 0 1 0 0 0 1 0 0 0 0 0 0

5 1 0 0 1 0 1 0 0 0 0 0 0 0

6 1 0 0 1 0 0 1 0 0 0 0 0 0

7 1 0 1 0 0 0 0 0 0 0 0 0 0

8 0 1 1 0 0 0 0 0 0 0 1 0 0

9 1 1 1 0 1 0 1 0 0 1 0 1 1

10 1 1 1 1 1 1 1 1 1 1 1 0 1

11 0 0 0 0 0 0 0 0 0 0 0 0 0

12 1 0 1 0 1 1 0 1 0 1 1 1 0

13 0 0 0 0 0 0 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0

15 0 0 1 1 0 0 1 0 0 0 0 0 0

16 1 0 0 0 0 0 1 0 0 0 0 0 0

17 1 0 0 0 0 1 1 1 1 0 0 1 0

18 1 1 1 1 1 1 1 1 1 1 1 1 1

19 1 0 0 0 0 0 1 0 0 0 0 1 0

20 1 1 1 0 0 0 0 0 0 1 0 0 0

Original dataset

Page 85: Data-pipeline using ALSPAC data

Mokken set of variables reordered

ID fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01 fg04

1 0 0 0 0 0 0 0 0 0 1 1 0 0

2 0 0 0 0 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0 0

4 0 0 0 0 0 0 0 0 0 1 1 1 0

5 0 0 0 0 0 0 0 1 0 0 0 1 1

6 0 0 0 0 0 0 0 0 0 1 0 1 1

7 0 0 0 0 0 0 0 0 0 0 1 1 0

8 0 0 1 0 0 0 1 0 0 0 1 0 0

9 0 1 0 0 1 1 1 0 1 1 1 1 0

10 1 1 1 1 1 0 1 1 1 1 1 1 1

11 0 0 0 0 0 0 0 0 0 0 0 0 0

12 0 0 1 1 1 1 0 1 1 0 1 1 0

13 0 0 0 0 0 0 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0

15 0 0 0 0 0 0 0 0 0 1 1 0 1

16 0 0 0 0 0 0 0 0 0 1 0 1 0

17 1 0 0 1 0 1 0 1 0 1 0 1 0

18 1 1 1 1 1 1 1 1 1 1 1 1 1

19 0 0 0 0 0 1 0 0 0 1 0 1 0

20 0 0 0 0 0 0 1 0 1 0 1 1 0

Page 86: Data-pipeline using ALSPAC data

Guttman scale

• Will produce a perfect scale pattern

• It will be possible to sort the cases and variables in such a way as to produce a triangular pattern with a clear delineation between the zeros and ones
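As a small illustration of what a "perfect scale pattern" means, the Python sketch below checks a single response pattern, assuming the items are already ordered from hardest (left) to easiest (right). The function name `is_guttman_pattern` is hypothetical and is not part of any Stata command.

```python
def is_guttman_pattern(responses):
    """Return True if the 0/1 responses form a perfect Guttman pattern.

    With items ordered hardest to easiest, a perfect pattern is a run of
    zeros followed by a run of ones: nobody endorses a harder item while
    missing an easier one.
    """
    seen_one = False
    for r in responses:
        if r == 1:
            seen_one = True
        elif seen_one:
            # A zero after a one: an easier item was missed.
            return False
    return True


print(is_guttman_pattern([0, 0, 0, 1, 1]))  # zeros then ones
print(is_guttman_pattern([0, 1, 0, 1, 1]))  # a zero after a one
```

When every case passes this check, sorting the cases by total score gives the triangular split between zeros and ones.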

Page 87: Data-pipeline using ALSPAC data

Cases sorted to create triangular 0/1 split

ID fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01 fg04

2 0 0 0 0 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0 0

11 0 0 0 0 0 0 0 0 0 0 0 0 0

13 0 0 0 0 0 0 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0

7 0 0 0 0 0 0 0 0 0 0 1 1 0

6 0 0 0 0 0 0 0 0 0 1 0 1 1

16 0 0 0 0 0 0 0 0 0 1 0 1 0

1 0 0 0 0 0 0 0 0 0 1 1 0 0

15 0 0 0 0 0 0 0 0 0 1 1 0 1

4 0 0 0 0 0 0 0 0 0 1 1 1 0

5 0 0 0 0 0 0 0 1 0 0 0 1 1

20 0 0 0 0 0 0 1 0 1 0 1 1 0

19 0 0 0 0 0 1 0 0 0 1 0 1 0

8 0 0 1 0 0 0 1 0 0 0 1 0 0

12 0 0 1 1 1 1 0 1 1 0 1 1 0

9 0 1 0 0 1 1 1 0 1 1 1 1 0

17 1 0 0 1 0 1 0 1 0 1 0 1 0

10 1 1 1 1 1 0 1 1 1 1 1 1 1

18 1 1 1 1 1 1 1 1 1 1 1 1 1

Page 88: Data-pipeline using ALSPAC data

ID fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01 fg04

2 0 0 0 0 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0 0

11 0 0 0 0 0 0 0 0 0 0 0 0 0

13 0 0 0 0 0 0 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0

7 0 0 0 0 0 0 0 0 0 0 1 1 0

6 0 0 0 0 0 0 0 0 0 1 0 1 1

16 0 0 0 0 0 0 0 0 0 1 0 1 0

1 0 0 0 0 0 0 0 0 0 1 1 0 0

15 0 0 0 0 0 0 0 0 0 1 1 0 1

4 0 0 0 0 0 0 0 0 0 1 1 1 0

5 0 0 0 0 0 0 0 1 0 0 0 1 1

20 0 0 0 0 0 0 1 0 1 0 1 1 0

19 0 0 0 0 0 1 0 0 0 1 0 1 0

8 0 0 1 0 0 0 1 0 0 0 1 0 0

12 0 0 1 1 1 1 0 1 1 0 1 1 0

9 0 1 0 0 1 1 1 0 1 1 1 1 0

17 1 0 0 1 0 1 0 1 0 1 0 1 0

10 1 1 1 1 1 0 1 1 1 1 1 1 1

18 1 1 1 1 1 1 1 1 1 1 1 1 1

Violations highlighted in yellow

Page 89: Data-pipeline using ALSPAC data

ID fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01 Person Guttman errors

2 0 0 0 0 0 0 0 0 0 0 0 0 0

3 0 0 0 0 0 0 0 0 0 0 0 0 0

11 0 0 0 0 0 0 0 0 0 0 0 0 0

13 0 0 0 0 0 0 0 0 0 0 0 0 0

14 0 0 0 0 0 0 0 0 0 0 0 0 0

7 0 0 0 0 0 0 0 0 0 0 1 1 0

6 0 0 0 0 0 0 0 0 0 1 0 1 1

16 0 0 0 0 0 0 0 0 0 1 0 1 1

1 0 0 0 0 0 0 0 0 0 1 1 0 2

15 0 0 0 0 0 0 0 0 0 1 1 0 2

4 0 0 0 0 0 0 0 0 0 1 1 1 0

5 0 0 0 0 0 0 0 1 0 0 0 1 3

20 0 0 0 0 0 0 1 0 1 0 1 1 3

19 0 0 0 0 0 1 0 0 0 1 0 1 5

8 0 0 1 0 0 0 1 0 0 0 1 0 12

12 0 0 1 1 1 1 0 1 1 0 1 1 10

9 0 1 0 0 1 1 1 0 1 1 1 1 6

17 1 0 0 1 0 1 0 1 0 1 0 1 16

10 1 1 1 1 1 0 1 1 1 1 1 1 5

18 1 1 1 1 1 1 1 1 1 1 1 1 0

Item Guttman errors [0] 0 1 2 2 3 6 8 8 8 11 10 7  

Item Guttman errors [1] 7 4 10 7 4 10 7 6 2 6 3 0  

Total 7 5 12 9 7 16 15 14 10 17 13 7  

Total Guttman errors in the scale (sum of the person Guttman errors): 66
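The person and item error counts on this slide can be reproduced by counting, for each case, every pair of items where the harder item is scored 1 and the easier item is scored 0. The Python sketch below does this for the 20 cases shown (fg04 excluded, items ordered as on the slide); the names `patterns`, `person_errors` and `item_totals` are illustrative only.

```python
# Items ordered from hardest (left) to easiest (right), as on the slide.
items = ["fg09", "fg13", "fg11", "fg08", "fg05", "fg12",
         "fg02", "fg06", "fg10", "fg07", "fg03", "fg01"]

# Response pattern per case ID, one character per item in the order above.
patterns = {
    2: "000000000000", 3: "000000000000", 11: "000000000000",
    13: "000000000000", 14: "000000000000",
    7: "000000000011", 6: "000000000101", 16: "000000000101",
    1: "000000000110", 15: "000000000110", 4: "000000000111",
    5: "000000010001", 20: "000000101011", 19: "000001000101",
    8: "001000100010", 12: "001111011011", 9: "010011101111",
    17: "100101010101", 10: "111110111111", 18: "111111111111",
}


def person_errors(pattern):
    """Count (harder item = 1, easier item = 0) pairs in one response pattern."""
    errors = 0
    ones_so_far = 0
    for r in pattern:
        if r == "1":
            ones_so_far += 1
        else:
            # This zero conflicts with every harder item already endorsed.
            errors += ones_so_far
    return errors


person = {case: person_errors(p) for case, p in patterns.items()}

# Each error pair counts once for the harder item and once for the easier item.
item_totals = [0] * len(items)
for p in patterns.values():
    for hard in range(len(items)):
        for easy in range(hard + 1, len(items)):
            if p[hard] == "1" and p[easy] == "0":
                item_totals[hard] += 1
                item_totals[easy] += 1

print("Scale total:", sum(person.values()))
print(dict(zip(items, item_totals)))
```

Because each error pair involves two items, the item totals sum to twice the scale total.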

Page 90: Data-pipeline using ALSPAC data

Each yellow square contributed at least one Guttman error to the scale

                   Easyness   Observed   Expected   Loevinger             H0: Hj<=0   Number of
Item        Obs    P(Xj=1)    Guttman    Guttman    H coeff     z-stat.   p-value     NS Hjk
                              errors     errors
------------------------------------------------------------------------------------------------
fg09_011    20     0.15        7          22.95     0.695        6.722    0           2
fg13_011    20     0.15        5          22.95     0.7821       7.5649   0           1
fg11_011    20     0.2        12          28.8      0.5833       6.37     0           3
fg08_011    20     0.2         9          28.8      0.6875       7.5074   0           3
fg05_011    20     0.2         7          28.8      0.7569       8.2658   0           1
fg12_011    20     0.25       16          31.75     0.4961       5.5536   0           4
fg02_011    20     0.25       15          31.75     0.5276       5.9063   0           5
fg06_011    20     0.25       14          31.75     0.5591       6.2589   0           3
fg10_011    20     0.25       10          31.75     0.685        7.6693   0           1
fg07_011    20     0.5        17          28        0.3929       3.4118   0.0003      8
fg03_011    20     0.5        13          28        0.5357       4.6525   0           6
fg01_011    20     0.6         7          23.2      0.6983       5.1154   0           5
------------------------------------------------------------------------------------------------
Scale       20                66         169.25     0.61        14.972    0
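The expected Guttman errors and H coefficients in this table appear to follow from the easiness values alone: for a pair of items, the expected number of errors under independence is N × P(harder item = 1) × P(easier item = 0). The Python sketch below is a check of that arithmetic against the values shown, not a description of how `msp` computes them internally; the variable names are illustrative.

```python
n = 20  # cases in the small sample

# Easiness P(Xj=1) and observed Guttman errors per item, from the table.
easiness = {"fg09": 0.15, "fg13": 0.15, "fg11": 0.2, "fg08": 0.2,
            "fg05": 0.2, "fg12": 0.25, "fg02": 0.25, "fg06": 0.25,
            "fg10": 0.25, "fg07": 0.5, "fg03": 0.5, "fg01": 0.6}
observed = {"fg09": 7, "fg13": 5, "fg11": 12, "fg08": 9,
            "fg05": 7, "fg12": 16, "fg02": 15, "fg06": 14,
            "fg10": 10, "fg07": 17, "fg03": 13, "fg01": 7}


def expected_pair(p_a, p_b):
    """Expected errors for one item pair if the two items were independent."""
    p_hard, p_easy = sorted((p_a, p_b))  # lower easiness = harder item
    return n * p_hard * (1 - p_easy)


# Expected errors per item: sum over all pairs involving that item.
expected = {
    a: sum(expected_pair(easiness[a], easiness[b]) for b in easiness if b != a)
    for a in easiness
}

# Item-level H, and scale-level H (each pair is counted twice in the sums).
item_h = {a: 1 - observed[a] / expected[a] for a in easiness}
scale_obs = sum(observed.values()) / 2
scale_exp = sum(expected.values()) / 2
scale_h = 1 - scale_obs / scale_exp

for a in easiness:
    print(f"{a}: expected={expected[a]:.2f} H={item_h[a]:.4f}")
print(f"Scale: observed={scale_obs:.0f} expected={scale_exp:.2f} H={scale_h:.2f}")
```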

Page 91: Data-pipeline using ALSPAC data

Monotone Homogeneity and Double Monotonicity

Page 92: Data-pipeline using ALSPAC data

Monotone Homogeneity

Page 93: Data-pipeline using ALSPAC data

Double Monotonicity

• Lead in to the next slides which demonstrate the P(1,1) matrix

Page 94: Data-pipeline using ALSPAC data

fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01

fg09 -

fg13 2 -

fg11 2 2 -

fg08 3 2 3 -

fg05 2 3 3 3 -

fg12 3 3 2 3 3 -

fg02 2 2 2 2 3 2 -

fg06 3 3 3 4 3 3 2 -

fg10 2 2 3 3 4 3 4 3 -

fg07 3 3 2 3 3 4 3 3 3 -

fg03 2 3 4 4 4 3 5 3 5 6 -

fg01 3 3 4 4 4 5 4 5 5 8 7 -

P(1,1) Matrix for sample of 20 cases

Page 95: Data-pipeline using ALSPAC data

fg09 fg13 fg11 fg08 fg05 fg12 fg02 fg06 fg10 fg07 fg03 fg01

fg09 -

fg13 0.10 -

fg11 0.10 0.10 -

fg08 0.15 0.10 0.15 -

fg05 0.10 0.15 0.15 0.15 -

fg12 0.15 0.15 0.10 0.15 0.15 -

fg02 0.10 0.10 0.10 0.10 0.15 0.10 -

fg06 0.15 0.15 0.15 0.20 0.15 0.15 0.10 -

fg10 0.10 0.10 0.15 0.15 0.20 0.15 0.20 0.15 -

fg07 0.15 0.15 0.10 0.15 0.15 0.20 0.15 0.15 0.15 -

fg03 0.10 0.15 0.20 0.20 0.20 0.15 0.25 0.15 0.25 0.30 -

fg01 0.15 0.15 0.20 0.20 0.20 0.25 0.20 0.25 0.25 0.40 0.35 -
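Each cell of a P(1,1) matrix is the number (or proportion) of cases scoring 1 on both items of a pair. The Python sketch below shows how a few such cells can be derived from the 20 response patterns given earlier (fg04 excluded); the helper names `p11_count` and `p11_prop` are hypothetical.

```python
# Items ordered from hardest (left) to easiest (right).
items = ["fg09", "fg13", "fg11", "fg08", "fg05", "fg12",
         "fg02", "fg06", "fg10", "fg07", "fg03", "fg01"]

# One response pattern per case, one character per item in the order above.
rows = [
    "000000000000", "000000000000", "000000000000",
    "000000000000", "000000000000",
    "000000000011", "000000000101", "000000000101",
    "000000000110", "000000000110", "000000000111",
    "000000010001", "000000101011", "000001000101",
    "001000100010", "001111011011", "010011101111",
    "100101010101", "111110111111", "111111111111",
]
data = [[int(c) for c in row] for row in rows]
n = len(data)


def p11_count(a, b):
    """Number of cases scoring 1 on both item a and item b."""
    ia, ib = items.index(a), items.index(b)
    return sum(r[ia] & r[ib] for r in data)


def p11_prop(a, b):
    """Proportion of cases scoring 1 on both items."""
    return p11_count(a, b) / n


for a, b in [("fg07", "fg01"), ("fg03", "fg01"), ("fg09", "fg13")]:
    print(a, b, p11_count(a, b), p11_prop(a, b))
```

Dividing each count by the 20 cases converts the count matrix into the proportion matrix.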

Page 96: Data-pipeline using ALSPAC data

fg11 fg08 fg13 fg06 fg09 fg02 fg05 fg12 fg10 fg03 fg04 fg07 fg01

fg11 -

fg08 596 -

fg13 575 592 -

fg06 516 574 520 -

fg09 473 543 562 454 -

fg02 417 443 469 415 457 -

fg05 641 748 653 615 555 529 -

fg12 700 735 748 627 637 534 813 -

fg10 794 772 771 751 688 624 859 1019 -

fg03 664 707 766 777 766 858 833 1032 1225 -

fg04 693 739 806 771 812 794 873 1036 1275 1947 -

fg07 786 869 925 855 925 905 984 1238 1431 2096 2132 -

fg01 902 988 972 1054 992 933 1135 1339 1628 2151 2216 2483 -

P(1,1) for full sample

Page 97: Data-pipeline using ALSPAC data

fg11 fg08 fg13 fg06 fg09 fg02 fg05 fg12 fg10 fg03 fg04 fg07 fg01

fg11 -

fg08 0.10 -

fg13 0.10 0.10 -

fg06 0.09 0.10 0.09 -

fg09 0.08 0.10 0.10 0.08 -

fg02 0.07 0.08 0.08 0.07 0.08 -

fg05 0.11 0.13 0.11 0.11 0.10 0.09 -

fg12 0.12 0.13 0.13 0.11 0.11 0.09 0.14 -

fg10 0.14 0.14 0.14 0.13 0.12 0.11 0.15 0.18 -

fg03 0.12 0.12 0.13 0.14 0.13 0.15 0.15 0.18 0.21 -

fg04 0.12 0.13 0.14 0.14 0.14 0.14 0.15 0.18 0.22 0.34 -

fg07 0.14 0.15 0.16 0.15 0.16 0.16 0.17 0.22 0.25 0.37 0.37 -

fg01 0.16 0.17 0.17 0.18 0.17 0.16 0.20 0.23 0.29 0.38 0.39 0.44 -