1
A Century of Testing: Ideas on Solving Enduring Accountability and Assessment Problems
The 2005 CRESST Conference: Celebrating 20 years of Research on Educational Measurement
UCLA, Los Angeles, 8-9 September 2005
Barry McGaw
Director for Education
Organisation for Economic Co-operation and Development
2
Where to focus…
3
…so much has happened…
Advancing the link to teaching and learning
Refining system monitoring
– effectiveness (quality)
– efficiency (value for money)
– equity
Taking an international perspective
– IEA surveys such as TIMSS, PIRLS
– OECD Programme for International Student Assessment (PISA)
– different national slopes of achievement on social background
– Google on PISA
Somewhere else? (given what else was on the programme)
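The "social background slopes" that differ between countries are, in essence, the gradient of a least-squares regression of student achievement on an index of socio-economic background. The sketch below is a minimal illustration, assuming one background index per student and ordinary least squares. The country labels and numbers are made-up toy values, not PISA results. PISA's own analyses also use plausible values and sampling weights, which this sketch ignores.

```python
from statistics import mean

def ols_slope(x, y):
    """Least-squares slope of y on x: cov(x, y) / var(x)."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical toy data: (socio-economic index, achievement score) per student.
countries = {
    "Country A": ([-1.0, 0.0, 1.0, 2.0], [450, 480, 510, 540]),  # steep gradient
    "Country B": ([-1.0, 0.0, 1.0, 2.0], [490, 500, 510, 520]),  # shallow gradient
}

for name, (ses, score) in countries.items():
    print(f"{name}: {ols_slope(ses, score):.1f} score points per unit of the index")
```

Both toy countries have the same average background. A steeper slope means achievement depends more strongly on social background, which is the equity contrast such comparisons draw.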
4
One key problem to be resolved
5
Point of reference for judging individuals
Abandoning hope of an external measure
Psychophysics
– comparing judgements (such as brightness of light) with a physical measure
– requiring judgements of differences, not absolute values
Psychological phenomena
– developed in the context of differential psychology
– individual performance judged in relation to others’ performance
– in particular, in relation to the average performance of others
– norm-referenced (Want to look better? Choose different company.)
In search of an external criterion
Separating scale construction and measurement
– Thurstone
– criterion-referenced measurement
Simultaneous scale construction and measurement
– item-response models (person-response-to-item models)
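Item-response models place persons and items on the same scale, which is what lets scale construction and measurement proceed together. The sketch below uses the simplest such model, the Rasch (one-parameter logistic) model, chosen here only as an illustration. In this model, the probability that a person of ability θ answers an item of difficulty b correctly is exp(θ − b) / (1 + exp(θ − b)). The code estimates one person's ability by Newton-Raphson maximum likelihood and assumes the item difficulties are already known. The difficulties and responses are hypothetical.

```python
import math

def p_correct(theta, b):
    """Rasch model: probability that ability theta succeeds on an item of difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def estimate_ability(responses, difficulties, iterations=50, tol=1e-10):
    """Maximum-likelihood ability estimate (logits) by Newton-Raphson.

    responses: 0/1 item scores; difficulties: known item difficulties in logits.
    The estimate is infinite for zero or perfect scores, so those are rejected.
    """
    raw = sum(responses)
    if raw == 0 or raw == len(responses):
        raise ValueError("ML estimate is infinite for a zero or perfect score")
    theta = 0.0
    for _ in range(iterations):
        probs = [p_correct(theta, b) for b in difficulties]
        gradient = raw - sum(probs)                     # first derivative of log-likelihood
        information = sum(p * (1 - p) for p in probs)   # negative second derivative
        step = gradient / information
        theta += step
        if abs(step) < tol:
            break
    return theta

# Hypothetical items (difficulties in logits) and one person's responses.
difficulties = [-1.0, -0.5, 0.0, 0.5, 1.0]
responses = [1, 1, 1, 0, 0]
theta = estimate_ability(responses, difficulties)
print(f"estimated ability: {theta:.3f} logits")
```

Under the Rasch model, the raw score is a sufficient statistic for ability. Any pattern with three correct answers on these items therefore yields the same estimate, and that estimate sits on the same logit scale as the item difficulties.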
6
Application in a high-stakes arena
7
Public examinations
High-stakes assessments based on curriculum
– secondary certification and university entrance
– selection for highly competitive courses (top 1½ per cent)
– need a common curriculum across schools
The comparability-over-time problem…
Grade distributions used to monitor standards
– failure rate used as a measure of ‘standards’
– claim that if participation rates grow, grades should decline to ensure that an ‘A’ is still an ‘A’, etc.
– do enough students fail?
Criterion (standards) and norm (cohort)-referencing
– ‘standards’ were never absent (in curriculum, examination)
– ‘standards’ were ignored in the norm-based award of results
– cannot use link items over time, because the whole test must become public
– marrying criterion and norm-referencing with judgements
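The difference between norm (cohort) and criterion (standards) referencing can be made concrete with a small example. The sketch below assumes hypothetical marks and hypothetical rules. Norm-referencing awards an ‘A’ to a fixed top share of the cohort. Criterion-referencing awards it to anyone at or above a fixed cut mark. When weaker candidates join as participation grows, the norm rule keeps the share of ‘A’ grades constant, whereas the criterion rule keeps the meaning of an ‘A’ constant.

```python
def norm_referenced_a(marks, top_share):
    """Award 'A' to the top `top_share` fraction of the cohort (ties at the cut included)."""
    k = max(1, round(top_share * len(marks)))
    cut = sorted(marks, reverse=True)[k - 1]
    return [m >= cut for m in marks]

def criterion_referenced_a(marks, cut_mark):
    """Award 'A' to every mark at or above a fixed standard."""
    return [m >= cut_mark for m in marks]

# Hypothetical cohorts: the larger one adds weaker candidates as participation grows.
cohort_small = [92, 85, 78, 74, 66, 61, 55, 48, 40, 35]
cohort_large = cohort_small + [58, 52, 47, 44, 41, 38, 33, 30, 28, 25]

for label, cohort in [("small", cohort_small), ("large", cohort_large)]:
    n_norm = sum(norm_referenced_a(cohort, top_share=0.2))
    n_crit = sum(criterion_referenced_a(cohort, cut_mark=80))
    print(f"{label} cohort: norm-referenced A = {n_norm}, criterion-referenced A = {n_crit}")
```

In the larger cohort, the norm rule awards an ‘A’ to marks of 78 and 74 only because weaker company has arrived. The criterion rule awards the same two ‘A’s as before, so the proportion of ‘A’ grades falls.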
8
Marrying criterion and norm-referencing
England
– use of criteria defined for some grade boundaries
– review of previous years’ scripts at grade boundaries
– reference to prior grade distributions
– reference to evidence of change in the student cohort to justify shifts in grade distributions between years
Australia (New South Wales)
– development of band descriptors
– ‘consistent’ definition of bands over years
– reporting with both norm and criterion-referencing
9
The Suite of Documents
10
All HSC courses listed with Assessment Mark, Examination Mark, HSC Mark and Performance Band
All Preliminary courses listed
11
Descriptions in bands: summary of what students know and can do
Minimum standard expected (50)
Graph of distribution of results to show how all students performed
Student’s HSC Mark, Examination Mark and School Assessment Mark
[Graph: Number of candidates against Mark Range 0–100]
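A report of this kind combines criterion-referencing with norm-referencing. The band and its description state what students know and can do. The distribution shows where the mark sits among all candidates. The sketch below is a minimal illustration under stated assumptions. The 5/6, 4/5 and 1/2 boundaries at 90, 80 and 50 come from the standards-setting slide. The 3/4 and 2/3 boundaries at 70 and 60 are assumed here for illustration, because those slides elide them. The cohort marks are hypothetical.

```python
from bisect import bisect_right

# Band boundaries on the 0-100 reported mark scale (lower edges of Bands 2..6).
# 50, 80 and 90 are from the slides; 60 and 70 are ASSUMED for illustration.
BOUNDARIES = [50, 60, 70, 80, 90]
MINIMUM_STANDARD = 50

def band(mark):
    """Criterion-referenced result: Band 1 below 50, Band 6 at 90 or above."""
    return 1 + bisect_right(BOUNDARIES, mark)

def percent_below(mark, cohort):
    """Norm-referenced position: percentage of candidates with a lower mark."""
    return 100.0 * sum(m < mark for m in cohort) / len(cohort)

# Hypothetical cohort of reported HSC marks.
cohort = [95, 88, 84, 79, 73, 71, 66, 62, 58, 51, 47, 40]
student = 79
status = "meets" if student >= MINIMUM_STANDARD else "is below"
print(f"HSC mark {student}: Band {band(student)}, "
      f"above {percent_below(student, cohort):.0f}% of candidates, "
      f"{status} the minimum standard ({MINIMUM_STANDARD})")
```

The band does not depend on the cohort, but the percentage does. Reporting both answers two different questions: what the student can do, and how the student compares with others.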
12
How they got there…
Review and recommendations for change
New NSW Higher School Certificate
– McGaw, B. (1997). Shaping their future: Recommendations for reform of the Higher School Certificate. Sydney: Department of Training and Education Co-ordination.
Scaling process
– standards-referencing to curriculum and over time
– Bennett, J. (2001). Standards-setting and the NSW Higher School Certificate. www.boardofstudies.nsw.edu.au/manuals/pdf_doc/bennett.pdf
Developing grade descriptors
Used past examinations
– experienced examiners for each subject
– reviewed examination papers and students’ marked papers
Developing band descriptors
– described performance for Bands 6 to 2; low Band 1 not described
13
Using grade descriptors
Stage 1
– examiners independently form an ‘image of the band’
– set a cut mark for each band boundary on each question
Stage 2
– examiners work together to reach agreement on boundary locations for bands on each question
– boundary locations for total scores also established
Stage 3
– student work at boundaries on total scores reviewed
– cut points reviewed and determined
– boundaries located on the mark scale
  – 5/6 boundary set to 90
  – 4/5 boundary set to 80
  – …
  – 1/2 boundary set to 50
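Stage 3 ends with the raw-mark cut points located on the reported mark scale. Whatever raw total marks the 5/6 boundary is then reported as 90, and the same holds for each other boundary at its own point. One simple way to map the marks between boundaries is piecewise-linear interpolation. The sketch below rests on several assumptions. Linear interpolation is an illustrative choice, not the documented NSW procedure. The function name `make_alignment` and the raw cut scores are hypothetical. The intermediate reported boundaries (60, 70) are assumed, because the slide elides them.

```python
from bisect import bisect_right

def make_alignment(raw_cuts, reported_cuts, raw_max, reported_max=100):
    """Build a hypothetical piecewise-linear map from raw exam totals to the reported scale.

    raw_cuts: raw totals at band boundaries (ascending), as agreed by examiners.
    reported_cuts: where each boundary is located on the reported scale.
    The end points (0 -> 0, raw_max -> reported_max) anchor the map.
    """
    xs = [0] + list(raw_cuts) + [raw_max]
    ys = [0] + list(reported_cuts) + [reported_max]
    for seq in (xs, ys):
        if any(a >= b for a, b in zip(seq, seq[1:])):
            raise ValueError("cut points must be strictly increasing")

    def align(raw):
        if not 0 <= raw <= raw_max:
            raise ValueError("raw mark out of range")
        # Index of the right-hand end of the segment containing `raw`.
        i = min(bisect_right(xs, raw), len(xs) - 1)
        x0, x1, y0, y1 = xs[i - 1], xs[i], ys[i - 1], ys[i]
        return y0 + (y1 - y0) * (raw - x0) / (x1 - x0)

    return align

# Hypothetical raw cut points on a 120-mark paper for the 1/2 ... 5/6 boundaries.
raw_cuts = [48, 60, 72, 86, 101]
reported_cuts = [50, 60, 70, 80, 90]  # 60 and 70 ASSUMED; the slide gives 50, 80, 90
align = make_alignment(raw_cuts, reported_cuts, raw_max=120)
for raw in (40, 48, 79, 101, 115):
    print(raw, "->", round(align(raw), 1))
```

Because the map is strictly increasing, rank order within the examination is preserved. Only the spacing between marks changes, so that a given reported mark carries the same band meaning from year to year.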
14
but it does not always change the debate…
15
Debate isn’t always changed
Federal Minister
– found an English paper awarded a pass despite some inadequate expression within it
– concluded too few students were being failed
Nature of debate
– became, again, a debate about desirable failure rates
– important for such debates to be reconstructed as debates about the nature of the performance judged inadequate
16
OECD education website: www.oecd.org/edu
Contact: Barry.McGaw@oecd.org
Thank you