Measurement – class 11
Education (2/2)
A. Measuring students skills: PISA
Measuring students skills: PISA
• Cf. class 10: the OECD became the dominant source for measuring education understood as an “outcome”, i.e. individual skills
• 1995: International Adult Literacy Survey
• 2000: PISA
• Cornerstone of these measurement tools: measuring skills, not performance
Measuring students skills: PISA
• Psychometry:
– Measured score = real score + measurement error
– Historically, attempts at measuring “intelligence” (IQ)
– The skill is the “hidden factor”, partially revealed by tests
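A minimal sketch of the "measured score = real score + measurement error" idea, assuming (this is not stated in the slides) that the error is Gaussian with mean zero. The function name, the real score of 500 and the error spread of 30 are hypothetical values chosen for illustration only.

```python
import random
import statistics


def simulate_measured_scores(real_score, error_sd, n_tests, seed=0):
    """Draw n_tests measured scores, each = real score + random measurement error.

    Assumption: the error is Gaussian with mean 0 and standard deviation error_sd.
    """
    rng = random.Random(seed)
    return [real_score + rng.gauss(0.0, error_sd) for _ in range(n_tests)]


# Hypothetical student whose hidden "real" score is 500.
scores = simulate_measured_scores(real_score=500.0, error_sd=30.0, n_tests=2000)

# A single test can be far from the real score, but the errors average out
# over many repeated tests.
mean_measured = statistics.mean(scores)
spread_measured = statistics.stdev(scores)
```

Any single test only partially reveals the hidden skill; under this assumption it is the average over many tests that gets close to the real score.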
Measuring students skills: PISA
• Germany: “PISA shock” (huge debate)
• England: “Are we not such dunces after all?” (Times, Dec. 5, 2000)
• Again, much debate about rankings…
• So what does PISA tell us, and what does it not?
Measuring students skills: PISA
• Goal = measuring ability to use knowledge in a practical setting. « PISA’s aim of tapping students’ preparedness for life. »
• Ex: reading = « The capacity of an individual to understand, use and reflect on written texts in order to achieve one’s goals, to develop one’s knowledge and potential, and to participate in society. »
Measuring students skills: PISA
• Math = « The capacity of an individual to identify and understand the role that mathematics plays in the world, to make well-founded judgements and to use and engage with mathematics in ways that meet the needs of that individual’s life as a constructive, concerned and reflective citizen. »
Measuring students skills: PISA
• Items are designed by experts to measure specific skills
• Items are then ranked on a difficulty scale according to the % of test-sample students who succeeded
• Actual survey students (not the same students) are then given a score
• The student’s score in turn predicts the probability of success on items of each difficulty level (5 levels)
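The slides do not spell out the statistical model linking a score to a probability of success. One common way to express this link is a one-parameter logistic (Rasch-type) curve, used here as an assumption. The ability value and the five difficulty values are hypothetical and sit on an arbitrary scale.

```python
import math


def success_probability(ability, difficulty):
    """One-parameter logistic (Rasch-type) curve.

    The probability of success depends only on ability - difficulty:
    it is 0.5 when the two are equal and rises as ability exceeds difficulty.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical difficulties for 5 levels, from easiest to hardest.
item_difficulties = [-2.0, -1.0, 0.0, 1.0, 2.0]
student_ability = 0.5  # hypothetical score on the same scale

# One predicted probability of success per difficulty level.
probabilities = [success_probability(student_ability, d) for d in item_difficulties]
```

With such a model, a single score per student is enough to predict success on every difficulty level, which is what the last bullet describes.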
Measuring students skills: PISA
• Scores are standardized:
– Average score of OECD countries = 500
– Standard deviation = 100 (20% of the mean)
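The standardization can be sketched as a linear rescaling to mean 500 and standard deviation 100. The raw scores below are hypothetical, and the real scaling is done over the pooled OECD student sample rather than over a toy list.

```python
import statistics


def standardize(raw_scores, target_mean=500.0, target_sd=100.0):
    """Linearly rescale raw scores so that they have the target mean and SD."""
    mean = statistics.mean(raw_scores)
    sd = statistics.pstdev(raw_scores)  # population SD of the reference group
    return [target_mean + target_sd * (x - mean) / sd for x in raw_scores]


# Hypothetical raw scores on an arbitrary scale.
raw = [-1.2, -0.3, 0.0, 0.4, 1.1]
scaled = standardize(raw)
```

The rescaling changes neither the ranking nor the relative distances between students; it only fixes the unit in which differences are read.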
Measuring students skills: PISA
• Items are tested in different countries and eliminated if they show a cultural, gender… bias
• In the end, cultural bias?
• Convincingly argued that no:
– « Stimuli » are 15% longer in French, but French Canadians do as well as the best English-speaking Canadian provinces
– Differences such as US / English-speaking Canada
– Type of question? Not really (the French score well on multiple-choice questions)
Measuring students skills: PISA
• Other sources of bias?
• Only the 15-year-olds currently attending school
– UK: 95 %
– France: 90 %
– Brazil: 55 % (OECD partner state)
– Mexico: 54 % (OECD member)
Measuring students skills: PISA
• Other sources of bias?
• Target population: remote schools, disabled youth or non-native speakers could be left out, but only up to a limit of 5% of the population
• But some countries left out more than 5%
• Response rates? Very low in UK in 2000 and 2003 => results excluded from PISA international comparisons
Measuring students skills: PISA
• Main source of uncertainty: sampling error
• Most countries: +/- 5 points on average score
• Ex. France (Grenet, working paper):
– Ranked somewhere between 18 and 28 (out of 56)
– Mean score not statistically significantly different from 13 other countries’ (out of 55)
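A rough sketch of why a small gap between two country means may not be statistically significant. It assumes that "+/- 5 points" is a 95% interval (so a standard error of about 2.5 points per country) and that the two samples are independent. Both assumptions, and the country means used, are illustrative and not taken from the slides.

```python
import math


def difference_is_significant(mean_a, se_a, mean_b, se_b, z_crit=1.96):
    """Two-sided z-test on the difference between two independent means."""
    se_diff = math.sqrt(se_a ** 2 + se_b ** 2)
    return abs(mean_a - mean_b) > z_crit * se_diff


SE = 2.5  # assumed standard error, i.e. roughly +/- 5 points at 95%

# Hypothetical country means: a 6-point gap is within sampling noise...
close_gap = difference_is_significant(511.0, SE, 505.0, SE)   # False
# ...while an 11-point gap is not.
wide_gap = difference_is_significant(511.0, SE, 500.0, SE)    # True
```

Under these assumptions, two countries need to be about 7 points apart before their ranks can be told apart, which is why a rank is better reported as a range.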
Measuring students skills: PISA
• Differences
– between France and Australia = 32 points
– between Australia and Argentina = 136 points
– Difference between 2 levels = 70 points
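To read these gaps, it can help to express them in standard-deviation units (SD = 100 after standardization) and in proficiency levels (70 points per level). A small sketch, with a hypothetical helper name:

```python
SCORE_SD = 100.0     # standard deviation of the standardized scale
LEVEL_WIDTH = 70.0   # points separating two proficiency levels


def describe_gap(points):
    """Express a score gap in SD units and in proficiency levels."""
    return {"sd_units": points / SCORE_SD, "levels": points / LEVEL_WIDTH}


france_australia = describe_gap(32.0)      # about a third of an SD, under half a level
australia_argentina = describe_gap(136.0)  # more than one SD, almost two levels
```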
Measuring students skills: PISA
• Educational context
– « Skills », not the contents of a curriculum, but the 2 are linked (skills are taught at school, even if only a little!)
– Ex: 1st notions of probability come during 5th year of high school in France
Measuring students skills: PISA
• The 15-year-olds sampled by PISA did not all receive the same education => what is being measured isn’t the quality of education alone but many things (e.g. grade repetition). Ex: probabilities
15-year-olds participating in PISA 2003, by grade:
6th year of high school ("1ère") = "early"     2%
5th year of high school = "on time"           57%
4th year of high school (repeated once)       34%
3rd year of high school (repeated twice)       5%
Other                                          1%
Measuring students skills: PISA
• Huge gap: ex. on Reading scale, PISA 200
– « On time » students in non-vocational classes score 560 on average (= Finnish score)
– Students having repeated once: 430 on average = bottom of international rankings
• PISA sheds light on these structural differences and asks relevant policy questions
Measuring students skills: PISA
• Motivation effect: PISA 2003 asked students to rate the effort they put into answering the questionnaire, on a 1 to 10 scale
– French students: average 7/10
– 40th / 41 participating countries…
Measuring students skills: PISA
• Binet’s joke (one of the designers of the IQ test): « What is intelligence? Well, it’s what my test measures! »
• DeSeCo program (1999): asked non-psychologists (philosophers, sociologists, economists, ethnologists) what skills were needed to succeed in today’s world
= need for a definition of what is being measured, outside of the definition of the measurement tool
A word on non response
• PISA: 3 causes at least
– Student can’t answer => measurement OK
– Student didn’t have time => intensity of the test, not skill => must be taken into account in scoring
– Student didn’t even try => motivation effect, measurement problem
• Relatively OK for assessment of skills within school context
– Students are used to complying. Problem with adults!
A word on non response
• In general
– Total non response: no survey at all
– Partial non response: parts of the survey (items) are missing
• Total non response: high probable bias
• Partial non response:
– what does it say?
A word on non response
• Refusal to answer some questions because too private = not the most frequent case
– Ex: income. From brackets to actual amount
– Ex: questions on divorce, getting along with partner…
• Very often: meaningless question for the respondent (Bourdieu)
– Ex: « what do you think of the US Foreign Policy regarding Cuba? »
A word on non response
• Answering an opinion question meaningfully means having an opinion = having already thought about the question
• Conclusion for survey design and reading survey answers
– Always leave the possibility to answer « don’t know » vs. « refusal to answer »
– When reading tables / regression results, check the number of respondents
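A minimal sketch of what "check the number of respondents" can look like for one opinion question, keeping « don’t know » separate from « refusal ». The answer codes and the sample answers are hypothetical; `None` stands for an item left blank.

```python
from collections import Counter

# Hypothetical codes for answers that carry no opinion; None = item left blank.
NON_SUBSTANTIVE = {"don't know", "refusal", None}


def summarize_item(answers):
    """Count substantive answers and the two kinds of partial non response."""
    counts = Counter(answers)
    n_total = len(answers)
    n_valid = sum(c for answer, c in counts.items() if answer not in NON_SUBSTANTIVE)
    return {
        "n_total": n_total,
        "n_valid": n_valid,
        "dont_know_rate": counts["don't know"] / n_total,
        "refusal_rate": counts["refusal"] / n_total,
    }


# Hypothetical answers to a single opinion question.
answers = ["agree", "disagree", "don't know", "don't know",
           "refusal", None, "agree", "don't know"]
summary = summarize_item(answers)
```

Here only 3 of 8 respondents give a substantive answer, so any percentage computed on this item rests on very few people, and the high « don’t know » share says something about the question itself.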
C. Measuring adults skills: IALS
Measuring adults’ skills: IALS
• IALS : 1995. A case in point of « measurement failure »
• Idea: same as what was said in part A. on PISA, but on adults.
• Items of various difficulties; level of the individual = level of the questions he or she answers correctly with probability 0.8
• One single scale of literacy. Level 1 = has difficulties ~ « illiterate »
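The 0.8 rule can be sketched as follows: the individual's level is the highest level whose items he or she answers with probability at least 0.8. The success rates below are hypothetical, and returning 0 when even level 1 is not reached is a convention of this sketch, not something stated in the slides.

```python
def literacy_level(success_rate_by_level, threshold=0.8):
    """Highest consecutive level answered with probability >= threshold.

    Returns 0 (a convention of this sketch) if level 1 is not reached.
    """
    level = 0
    for lvl in sorted(success_rate_by_level):
        if success_rate_by_level[lvl] >= threshold:
            level = lvl
        else:
            break  # levels are cumulative: stop at the first level not mastered
    return level


# Hypothetical respondent: masters levels 1 and 2, not level 3.
respondent = {1: 0.95, 2: 0.85, 3: 0.60, 4: 0.30, 5: 0.10}
assigned = literacy_level(respondent)  # 2
```

The choice of threshold matters: with a lower threshold the same respondent is placed at a higher level, so the share of people classified at level 1 depends on this convention.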
Measuring adults’ skills: IALS
• France left the IALS project and imposed total censorship on its results
• Except that France was left in appendix tables, hence articles like the IHT one
• How did that happen?
Measuring adults’ skills: IALS
• Sampling method = random path dwelling
– Addresses are sampled from a sampling frame (census, phonebook…).
– To avoid bias due to dwellings not in the sampling frame, another dwelling is actually sampled, with an itinerary to go from the first one to the actual one
– Ex: « start from 48, bd Jourdan. Head North, take the second street to the left, count 5 buildings on the right side, enter the 6th one. Go to the 3rd floor and choose the 2nd apartment on the right side of the corridor »
Measuring adults’ skills: IALS
• The Kalton, Lyberg, Rempp audit (1995) :
• Replacement rates (replace the protocol dwelling by another) = very high
• Refusal = 45% (in addition to absent households, etc)
Measuring adults’ skills: IALS
The Kalton, Lyberg, Rempp audit (1995). Replacements:
Address interviewed       N      Percentage
Protocol address 1        1363     45.5
1st replacement            792     26.4
2nd replacement            841     28.1
Total                     2996    100.0
Non-response = 45.2 %
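The percentages in the audit table can be recomputed from the counts, reading them as 1363 protocol addresses, 792 first replacements and 841 second replacements. The variable names are illustrative.

```python
# Counts of interviewed addresses, as read from the audit table.
counts = {
    "protocol address": 1363,
    "1st replacement": 792,
    "2nd replacement": 841,
}

total = sum(counts.values())

# Share of each category among interviewed addresses, in percent.
shares = {name: round(100 * n / total, 1) for name, n in counts.items()}

# Share of interviews that did NOT take place at the protocol address.
replaced = counts["1st replacement"] + counts["2nd replacement"]
replacement_share = round(100 * replaced / total, 1)
```

More than half of the interviews (54.5 %) were conducted at a replacement address, which is what « very high » replacement rates means here.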
Measuring adults’ skills: IALS
• Probably upward bias when protocol not strictly implemented
– Germany: 100% have no problem reading German
– HH were not selected at random
– Within HH, the most able / motivated HH member was selected in 5 to 10% of cases, contrary to protocol
Measuring adults’ skills: IALS
• Motivation effect: very important
– The interviewer had nothing to do while the respondent filled in the booklet => pressure on respondent (theoretically, no time limit)
Measuring adults’ skills: IALS
• Work by A. Blum and F. Guerin (1998)
• Interviewed the interviewers
– In 20% of HH, people seemed to answer without thinking, just to be done with the survey
• Studied the items: ambiguities, answers coded « wrong » when the question was in fact perfectly understood. The « onion example »
Measuring adults’ skills: IALS
• Conclusion on IALS
• Not one single defect, but a series of malfunctions all along the measurement production chain
– Sampling
– Item design
– Survey fieldwork
– Imputation of non response
– Coding and calculation of scores (1 single scale)
Lessons to be learnt
• HMK 11:
– Always consider the VARIANCE, not just the mean / median
– Are levels relevant or only ranks (absolute // relative measurement)
Lessons to be learnt
• IALS + HMK 11: non-response matters
– It is often non ignorable and induces biases, try to assess them
– Partial non response means something about the question
• PISA and IALS
– ALWAYS read the technical documents / annexes before saying anything – you can!
– Rankings are relevant if variable values are actually different…
– Same thing as « look for the standard error »