bijker, m. (2010) making measures and inferences reserve

MAKING MEASURES FOR ACCURATE INFERENCES

MONIQUE BIJKER

MARCEL VAN DER KLINK

ELS BOSHUIZEN

CELSTEC, OPEN UNIVERSITY OF THE NETHERLANDS

OVERVIEW

Practical and theoretical rationales

Literature review

Fundamental measurement to improve theory

The development of items and scales

Participants

Results

Differences between educational science and psychology students

PRACTICAL BACKGROUND

Instruments for self-reported generic competences, to predict learning performance measures and labor market success measures in causal models (SEM)

TOWER OF BABEL

Self-regulating learning capabilities?

An intra-individual system of motivations, expectancies, and learning strategies?

For our purposes, can we use separate variables from the system?

Self-directing learning capabilities?

Self-directing career capabilities?

VAGUENESS

The concepts have never been studied simultaneously, nor operationalized and validated simultaneously.

Unknown whether they are similar or different.

FINDINGS BASED ON LITERATURE

Self-efficacy (SE) and self-regulating learning capabilities (SRLC): bottom-up concepts, emerging from social-cognitive experimental research.

Predictors of the (more frequent) use of cognitive strategies and predictors of academic achievement (Pintrich et al., 1991, 1993).

SE: “People’s judgments of their capabilities to organize and execute courses of action required to attain designated types of performances” (Bandura, 1986, p. 391)

SRLC: planning, monitoring, evaluation. Effort, perseverance, and persistence (Pintrich et al., 1991, 1993).

FINDINGS BASED ON LITERATURE

Self-directed learning and career capabilities (SDLC and SDCC): top-down concepts, emerging from descriptive adult learning theory, multidisciplinary career theory, and informal learning environments.

Influenced by social, economic, and political perspectives.

Predictors of employability.

SDLC: “A characteristic adaptation to influence work-related learning processes in order to cope for oneself on the labour market” (Raemdonck, 2006, p.13).

SDCC: “A characteristic adaptation to influence career processes in order to cope for oneself on the labour market” (Raemdonck, 2006, p.13).

UNADDRESSED QUESTIONS

1. Can operationalizations of self-regulating (SRLC; TSE) and self-directing capabilities (SDLC-SDCC) be combined in one construct?

2. Do the concepts predict different outcomes?

3. Are there any differences in these concepts between different groups of adult learners in formal education programs?

APPROACH

Use of 36 existing and the development of 48 new, theory-based items.

Collection of real data.

The use of a measurement theory that defines the measures, and constructs person capability measures independent from the items, and items independent from the persons: the Rasch model.

Selection of items that fit the model and verification of the construct validity and dimensionality.

Creation of measures in the first sample and anchoring the measures in the second sample on the first one, to correct for possibly different response patterns on items.

WHY THE RASCH MODEL?

1. Rasch person and item measures are invariant across samples and tests (generalization).

2. Rasch transforms qualitatively ordered (Likert-type) raw scores into mathematically ordered person and item interval measures. Each unit of measurement is the same as the next one.

3. Rasch recognizes that items contribute differently to the underlying variable (in difficulty, or endorsability).

4. Rasch recognizes that scale distances (1-2; 2-3; 3-4; 4-5) in Likert-type items are unequal. Scales of items should fit the Rasch model in order to measure person capabilities invariantly. Hence, Likert raw scores are unsuitable for summing, and summing them will bias statistical analyses.

5. Generalizability theory and CFA cannot adjust for targeting and the lack of interval properties of scales.

[Figure: OBSERVED AVERAGE MEASURES FOR PERSON (scored), illustrated by an observed category. Each row shows one item (Q2 to Q38) with the average person measure of respondents in each response category (1 to 5), plotted on a logit axis from -1 to 4. In some rows the category averages appear out of order.]

SCALE DISTANCES AND ITEM CONTRIBUTIONS

OTHER PROBLEMS: DISORDERED THRESHOLDS

[Figure: CATEGORY PROBABILITIES: MODES, structure measures at intersections. Vertical axis: probability of response (0 to 1.0). Horizontal axis: person [minus] item measure (-4 to 5). One curve per response category (1 to 5).]

FORMULA

The polytomous "Rating Scale" model:

$$\log\left(\frac{P_{nij}}{P_{ni(j-1)}}\right) = B_n - D_i - F_j$$

where $P_{nij}$ is the probability that person $n$ encountering item $i$ is observed in category $j$,

$B_n$ is the "ability" measure of person $n$,

$D_i$ is the "difficulty" measure of item $i$, the point where the highest and lowest categories of the item are equally probable,

$F_j$ is the "calibration" measure of category $j$ relative to category $j-1$, the point where categories $j-1$ and $j$ are equally probable relative to the measure of the item.
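The rating scale formula above can be turned into category probabilities for a single person and item. The following is a minimal Python sketch, not the software used in the study: the function name `rsm_category_probabilities` and the person and item values (1.0 and 0.0 logits) are illustrative assumptions. The four step calibrations are the values reported for the TSE scale in these slides.

```python
import math

def rsm_category_probabilities(b, d, steps):
    """Category probabilities under the polytomous rating scale model.

    b: person measure (logits); d: item difficulty (logits);
    steps: step calibrations F_j, one per pair of adjacent categories.
    Returns one probability per category, lowest category first.
    """
    # The lowest category has log-numerator 0; each further category adds
    # (b - d - F_j), which is the adjacent-category log-odds of the model.
    log_num = [0.0]
    for f in steps:
        log_num.append(log_num[-1] + (b - d - f))
    peak = max(log_num)  # subtract the maximum for numerical stability
    weights = [math.exp(v - peak) for v in log_num]
    total = sum(weights)
    return [w / total for w in weights]

# Step calibrations reported for TSE; person and item values are hypothetical.
TSE_STEPS = [-3.79, -1.95, 1.19, 4.55]
probs = rsm_category_probabilities(b=1.0, d=0.0, steps=TSE_STEPS)
for category, p in enumerate(probs, start=1):
    print(f"category {category}: {p:.3f}")
```

When the person measure minus the item difficulty equals a step calibration, the two adjacent categories come out equally probable, which corresponds to the intersection points in a category probability plot.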

DATA COLLECTION

Online questionnaires composed of the 84 items (and additional open questions about curricula).

Participants: 232 adult students of the school of Educational Sciences and 139 students of the school of Psychology of the Open University of the Netherlands in their premaster (BSc) or master trajectory.

35% male, 65% female. Average age: 42, SD = 10.

RESULTS

1. Four distinct scales with Cronbach's alphas of .90 (SDCC; 20 items), .84 (SDLC; 23 items), .72 (SRLC; 6 items), and .79 (TSE; 9 items). (RQ1)

2. Of the 84 items, 26 did not fit the model. In SDLC and SDCC, it was predominantly the new items that fit the Rasch model.

3. Items in SDLC in particular are prone to misfitting the model and to disordered thresholds. SDLC has very small categories.

4. TSE is characterized by contextualized items. Which items are generalizable to other contexts (suitable for anchoring)?

5. SRLC is too easy to endorse.

6. SDCC is the most stable and best targeted construct.

7. Modeling of the constructs in SEM. (RQ2)

8. Three significant differences between ES and Psy. (RQ3)
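For readers who want to see what the reported Cronbach's alpha values summarize, here is a minimal Python sketch of the classical formula computed on raw item scores. It is an illustration only: the function name `cronbach_alpha` and the small response matrix are hypothetical, and the slides do not say which software produced the reported values.

```python
from statistics import variance

def cronbach_alpha(rows):
    """Cronbach's alpha for a list of respondents' item-score lists."""
    k = len(rows[0])
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least two items and two respondents")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the summed raw score.
    total_var = variance([sum(r) for r in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical responses of five people to three 5-point items.
responses = [
    [4, 5, 4],
    [3, 3, 2],
    [5, 5, 4],
    [2, 3, 3],
    [4, 4, 5],
]
print(round(cronbach_alpha(responses), 2))
```

Note that alpha is computed on summed raw scores, so it inherits the interval-scale concerns raised earlier for Likert data; the Rasch person reliability is the corresponding statistic on the logit scale.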

SCALES SUCH AS TSE

Item   Infit   Outfit   Measure   Error   PTMEA
77      .82     .84      1.86      .13     .57
80     1.29    1.27      1.03      .14     .42
72      .74     .74       .90      .14     .70
84      .98     .91       .51A     .14     .60
70      .64     .64       .22A     .15     .69
71      .77     .75       .09A     .15     .72
83      .99     .94      -.41A     .15     .70
73     1.13    1.07     -1.10A     .15     .54
78      .88     .82     -1.16      .15     .74

Miscellaneous: 83 and 84 are similar in ES and Psy. 80 is different in ES and Psy.

              Infit   Outfit   Measure   Error
All items
  Mean         .91     .89       .21      .15
  SD           .19     .18       .94      .01
All persons
  Mean         .89     .89      1.62      .62
  SD           .64     .64      1.35      .11

Person Reliability .79
Person Separation 1.91
Item Reliability .97
Item Separation 6.24
Cronbach alpha .82

Category                       1       2       3       4       5
Average measures             -1.96    -.61     .58    2.37    4.10
Step calibration measures            -3.79   -1.95    1.19    4.55
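The separation and reliability figures in the TSE summary are two views of the same quantity. In the usual Rasch reporting convention, separation G and reliability R are related by R = G² / (1 + G²). The Python sketch below applies that relation to the reported separations; the function names are illustrative, and the assumption that the slides' values follow this convention is mine, not the authors'.

```python
import math

def reliability_from_separation(g):
    """Reliability implied by a separation index G: G^2 / (1 + G^2)."""
    return g * g / (1.0 + g * g)

def separation_from_reliability(r):
    """Separation index implied by a reliability R: sqrt(R / (1 - R))."""
    return math.sqrt(r / (1.0 - r))

# Separations as reported for TSE (persons: 1.91, items: 6.24).
print(round(reliability_from_separation(1.91), 3))
print(round(reliability_from_separation(6.24), 3))
```

Under this assumption the item separation of 6.24 implies a reliability of about .975, in line with the reported .97, and the person separation of 1.91 implies about .785, close to the reported .79. The small gap for persons may reflect rounding or a different error estimate; the slides do not say.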

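[Figure: path model "Selfregulated and Selfdirected Capabilities" relating Zscore: TSE9a, Zscore(SRLCa), Zscore: SDL23, and Zscore: SDCCa, with residual terms ressdl and ressdcc. Values printed on the diagram: .28, .41, .36, .25, .49, .25, .48. Chi-square = 5.315, df = 2, p = .070.]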

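[Figure: path model "SR and SD with outcome measures" relating Zscore: TSE9a, Zscore(SRLCa), Zscore: SDL23, and Zscore: SDCCa to the outcome measures GPA and ECTSprop, with residual terms ressdl, ressdc, resects, and resgpa. Values printed on the diagram: .34, .56, .07, .07, .30, .32, .37, .27, .26, .49, .54. Chi-square = 8.187, df = 9, p = .515.]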
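The p-values attached to the two structural models follow directly from the reported chi-square statistics and degrees of freedom (5.315 with df = 2; 8.187 with df = 9). The following Python sketch recomputes them with the standard library only, as a check a reader can run. The function name `chi_square_p_value` is illustrative, and the recurrence for the upper incomplete gamma function is a choice made for this sketch, not the method of the SEM software used in the study.

```python
import math

def chi_square_p_value(chi2, df):
    """Upper-tail probability of a chi-square statistic, integer df >= 1."""
    if df < 1 or chi2 < 0:
        raise ValueError("df must be >= 1 and chi2 must be >= 0")
    y = chi2 / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-y)             # Q(1, y) = exp(-y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))  # Q(1/2, y) = erfc(sqrt(y))
    # Recurrence: Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1)
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q

# Fit statistics as reported on the two model diagrams.
print(round(chi_square_p_value(5.315, 2), 3))
print(round(chi_square_p_value(8.187, 9), 3))
```

Both recomputed values agree with the reported .070 and .515, and both lie above the conventional .05 level.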

Implications for practice

For ES: In the premaster stage, focus on tasks that support SRLC and academic achievement (planning, monitoring, and evaluation, but also effort, persistence, and perseverance).

For ES: Support TSE, by mastery experiences, modeling, and persuasion.

For PSY: Support SDLC by integrating more authentic professional tasks (or practice experiences), not only in research practicals, but also in diagnostic or intervention practice.

Implications for future research

How generalizable is self-efficacy as a construct (and consequently, how can you compare groups on this phenomenon)?

What is the quality of the negatively formulated items?

Is it justified to assume that student samples, in comparable stages of their learning trajectory, are of an equal endorsability level in self-reporting generic competences in different domains?

Is it justified to assume that responses on items can be attributed to persons, if context affects response patterns (e.g. SRLC “When I participate in an education program I make sure that I complete that program”)? (This also has consequences for making measures.)

Rude questions…

What is the quality of the instruments we use to measure learning and development (how and when are they validated? With which methods)?

How reliable, valid, and comparable are our performance measures, if we do not use Rasch validated items or tests?

How frequently do we calibrate our measures?

THANK YOU FOR YOUR ATTENTION. Any questions?

[email protected]
