Developing a model for investigating the impact of assessment within educational contexts by a public examination provider

Dr Nick Saville, Research and Validation Group, Cambridge ESOL


"Impact by Design"

http://www.cambridgeassessment.org.uk/

Nick Saville

Athens, June 2006

A model for investigating the impact of (language) assessment within educational contexts

Teaching Testing Learning

A Perspective from Cambridge ESOL


A model for investigating the impact of assessment within educational contexts

Teaching Testing Learning

Implications for Cambridge Assessment?

153 years of history ……..

In tune with the spirit of the Victorian age

14th December 1858

370 students in seven different local contexts took an examination paper set by UCLES for the first time


153 years of history …….. plus ça change

"This year, we find that students have acquired a great deal of skill but that they seem to have acquired it for examination purposes"

Art examiner writing in The TES, 1915

Michael Shaw, “Remembrance of things passed”, Cover Story, Magazine, TES (10 December 2010)

Outline for today's talk

• Background to ESOL's approach
  • 1980s
  • Messick, Bachman – early 1990s
• The literature on washback/impact
  • early work and recent progress
  • gaps? where next?
• Analysis of three case studies
  • what can be learnt?
• Towards a Comprehensive Model of Impact
  • applicable to other educational contexts?

ESOL background – 1987-1990: Japan

Considerations in developing fair tests

The art of the possible

[Diagram: the Test framed by V (validity), R (reliability) and P (practicality), with the question “Practicality?”]

“Practicality in Language Testing: an educational management model”

Main argument: test development is a form of educational innovation - and needs to be managed as such

“... achieving a balance between the purpose of the test, its validity for the purpose, the required reliability for the purpose and the constraints imposed by the context is essentially the task facing the test designer ….”

Saville (1990), University of Reading.

A Cambridge test development project: Japan, 1987 to 1989

Putting the test into context

[Diagram: the Test framed by V, R and P]

"The aim … is not only to encourage good testing practice, but to prevent bad tests being produced ....

... a bad test is not only one with low reliability and dubious validity but also one which has a damaging washback on the curriculum."

Saville (1990)

……. any test which is produced should be appropriate to the educational context in which it is to be used and the effect on learners and institutions will be a major consideration.

Putting the test into context

Impact Ripples

[Diagram: the Test, framed by V, R and P, at the centre of two ripples]

I. Local Impact: “micro” level

II. Wider Impact: “macro” level

U = V + R + I + P

Prof L Bachman (UCLA) - Cambridge Seminars 1990/91

The unitary concept of Usefulness

Overall Validity

U = V + R + I + P

Bachman and Palmer, 1996: U = Cv + A + I + R + I + P (construct validity, authenticity, interactiveness, reliability, impact, practicality)

Developing “useful tests”, fit for purpose

Balancing the test qualities

Usefulness as “overall Validity”

Current ESOL Practice

Principles of Good Practice - 2011

Quality Management and validation in language assessment

VRIP

www.cambridgeesol.org/about/standards/pogp.html

See also brochure - Making an Impact


Starting to develop a model of impact

• 1993 – 1995
• Using VRIP to develop and revise exams, e.g. the revision of IELTS 1995
  • The IELTS impact project
• An expanded view of impact - from the test developer’s perspective
  • Working for positive impact
  • Limiting negative consequences

Maxims for achieving/monitoring impact

Maxim 1 PLAN: Use a rational and explicit approach to test development

Maxim 2 SUPPORT: Support stakeholders in the testing process

Maxim 3 COMMUNICATE: Provide comprehensive, useful and transparent information

Maxim 4 MONITOR and EVALUATE: Collect all relevant data and analyse as required.

Milanovic and Saville, 1995, Considering the impact of the Cambridge EFL examinations

The literature on washback/impact

• Readings in the language testing literature:
  • Hamp-Lyons (1989)
  • Alderson and Wall (1993) Does washback exist? etc.
  • Language Testing (1996: 13, 3) Messick, Bailey, etc.
  • Hamp-Lyons (1997)
  • Watanabe (1997)
  • Cheng and Watanabe (eds) (2004)
• Recent PhD studies and subsequent books in SILT series based on research conducted in the 1990s:
  • Cheng (SILT 21 - 2005)
  • Wall (SILT 22 - 2005)
  • Hawkey (SILT 24 - 2006)
  • Green (SILT 25 - 2007) - “washback in context”

Washback/impact

• Washback (or backwash) has been broadly defined in the assessment literature as the effect of testing on teaching and learning
• One aspect of the broader phenomenon known as impact

15 washback hypotheses

Alderson and Wall, 1993

• Based on who or what might be affected:
  • Teaching
  • Learning
  • Content
  • Rate of learning
  • Sequence of teaching/learning
  • Degree/depth of curriculum coverage
  • Attitudes of teachers/learners
  • Etc.

Washback

• A continuum stretching from harmful at one end, through neutral, to beneficial at the other end

Negative (-) | Neutral | Positive (+)

Washback

• Negative?
  • Restriction of content – narrowing of curriculum
  • Too much time practising for the test
• Positive?
  • Transparent objectives and outcomes
  • Increased motivation of learners
  • Increased accountability of teachers (?)

The “law” of unintended consequences

• “Any purposeful action will produce some unintended consequences” or side-effects
• “Goodhart’s Law” (or “Campbell’s Law” in the USA)
  • a variant of the “law” of unintended consequences

“Goodhart’s Law”

• “All performance indicators lose their meaning when adopted as policy targets”
• Examples:
  • England - school achievement targets - school league tables
  • USA – No Child Left Behind (NCLB)
• The clearer you are about what you want, the more likely you are to get it – but the less likely it is to mean what you wanted it to!

(Dylan Wiliam, Cambridge 2008)

Perverse incentives?

• Assessment policy can create a tension between
  • educational objectives at the micro level (teaching and learning in schools) and
  • a requirement for accountability at the macro level

Washback

• Negative?
  • Restriction of content – narrowing of curriculum
  • Too much time practising for the test
• Positive?
  • Transparent objectives and outcomes
  • Increased motivation of learners
  • Increased accountability of teachers (?)
• BUT – cause and effect explanations are rarely adequate …..

Washback Models

In the language testing literature:

• Hughes (1993)

• Bailey (1996)

• Watanabe (2004)

• Cheng (2004, 2005)

• Green (2007)

Bailey’s 1996 Model (based on Hughes 1993)

3 Ps:

• Participants
  • students
  • teachers
• Processes
• Products
  • learning
  • teaching
  • materials
  • curricula




Studies in Language Testing series

• Cheng (SILT 21 - 2005)
• Wall (SILT 22 - 2005)
• Hawkey (SILT 24 - 2006)
• Green (SILT 25 - 2007) - “washback in context”

Liying Cheng, Dianne Wall, Roger Hawkey

Green, IELTS Washback in Context: Preparation for academic writing in higher education (Studies in Language Testing, 25, 2007)

[Figure: Green's model of washback. Recoverable labels:]

• Washback direction: focal construct; test design characteristics (item format, content, complexity, etc.); overlap; potential for negative backwash; potential for positive backwash
• Washback variability: participant characteristics and values (knowledge/understanding of test demands; resources to meet test demands; acceptance of test demands); other stakeholders (course providers, materials writers, publishers, teachers, learners)
• Washback intensity: perception of test importance (important / unimportant); perception of test difficulty (easy / challenging / unachievable); backwash to participant (no backwash / intense backwash)

The model starts from test design characteristics and related validity issues of construct representation identified with washback by Messick (1996).

Washback will be most intense (have the most powerful effects on teaching and learning behaviours) where participants see the test as challenging and the results as important (see the blue arrow in the figure).



So

• Impact is relatively new in the field of language assessment - an extension of the notion of washback and related to ethicality
• It is now considered to be of growing importance
• It is part of a validity argument and evidence needs to be provided

Broadly speaking there is consensus

• washback is an aspect of impact related to the “micro contexts” of the classroom and the school (teaching and learning)
• impact deals with wider influences and includes the “macro contexts” - tests and examinations in society

BUT

• The dynamics between the micro and macro contexts mean that this is a complex rather than a simple or linear relationship - a “complex dynamic system”


And currently:

• there has not been a comprehensive model of test or examination impact within educational contexts

• impact has not yet been fully integrated into an approach to test development and validation in a systematic way

Three case studies – 1995 to 2004

• Case 1 - the world-wide survey of the impact of IELTS
  • a starting point for the work and the original model for what has followed
  • a conceptualisation of impact and design/validation of suitable instruments to investigate it
• Case 2 - the Italian PL2000 project
  • an application of the model within a macro educational context
  • an initial attempt at applying the approach on a limited basis within a state educational context
  • Hawkey – SILT 24 (2006)
• Case 3 - the Florence Language Learning Gains Project
  • an extension and re-application of the model within a single school context
  • at the micro level, focusing on individual stakeholders within a single language teaching institution

Case 1 - the IELTS Impact studies

The project had the following aim within the IELTS revision project (1993-5):

….. to investigate the impact of the test on candidates and on other test users, as part of the continuous process of ensuring that IELTS is as valid, effective and ethical as possible

IELTS 1995 Revision Project

Phases of the IELTS Impact Study

Phase One: Prof. C. Alderson (Lancaster University) was commissioned to develop the first draft of the data collection instruments (1995)

Phase Two: trialling, revision, rationalisation of instruments

Phase Three (2001-2004): pre-survey, main data collection, analyses, report

See:
• Research Notes (2, 2000; 6, 2001; 15, 2004)
• Alderson and Banerjee (SILT 11, 2001)
• Saville and Hawkey (2004 - in Cheng and Watanabe)
• Hawkey (SILT 24, 2006)

Stakeholder participation in Phase 3

• Responses received from:
  • 572 pre- and post-IELTS candidates
  • 83 teachers completing the teacher questionnaire
  • 43 teachers completing the instrument for the analysis of textbook materials
• Stakeholder interviews and focus groups at selected case study centres, involving:
  • 120 students
  • 21 teachers
  • 15 receiving institution administrators
• 12 “live” IELTS-preparation classes have been video-recorded and analysed.

Some key points and lessons learnt

• Setting objectives, design and research questions
  • The instruments – development and validation
  • The data – (strategies for collection, storage, retrieval)
  • The analysis and interpretation of multiple sources of data (quantitative and qualitative)
• Managing impact studies
  • practical, legal, ethical issues
  • project management and action planning
• But the IELTS international dimension introduces multiple contexts – many more case studies required in specific contexts

Using international certification in Italian state-sector education

Case 2 - the Italian PL2000 project

• an application of the approach within a single macro educational context

• an initial attempt at applying the approach on a limited basis within a state educational context



Progetto Lingue 2000

• The Progetto Lingue 2000 within the state school system of Italy
• As the name suggests - came into practice in the academic year 1999 to 2000

• The intention of the progetto was:

“.... to introduce innovation into the teaching and learning of other languages by putting greater emphasis on the development of communicative competence in all grades of the school system”

Italian Ministry document


• Emphasis on
  • the use of new technology in pedagogic contexts
  • self-study and the individualisation of the learning experience
• The adoption of a level system based on the Council of Europe’s Common European Framework of Reference (CEFR) as learning objectives and standards
• The option of getting a certificate of proficiency to certify the level reached
  • the certificate should be aligned to the CEFR scale and issued by a certificating body which is recognised internationally

An educational innovation project

Italy’s national learning goals integrated with pan-European - Council of Europe - goals

[Diagram: Educational goals linked to Curriculum design; Resources; Teacher development & support; Assessment and Certification (including optional external certification)]


PL2000 Impact Project 2001-2

Main interdependent language programme stakeholders and dimensions

[Diagram. Recoverable labels:]

• Stakeholders: Students, Parents, Teachers, Teacher-trainers, Curriculum developers, Testers, Publishers, Receiving institutions, Employers
• Dimensions: Learning goals, curriculum, syllabus; Materials; Teacher Support; Testing; Methodology

Some key points and lessons learnt

• Applied lessons learnt in the IELTS studies
• Adapted the instruments and data collection techniques
• Introduced new features of data collection
  • Seven case study schools with school visits and interviews
• Proved the successful application of the approach within a national context
• Showed the possibility of matching learning objectives and tests via a “neutral” framework of reference – CEFR
• But – only limited data
• Test provider was an “outsider”

See Hawkey, Impact Theory and Practice (Studies in Language Testing, 24)


Case 3 – Florence project

• the Florence Language Learning Gains Project (British Institute of Florence)
  • an extension and re-application of the model within a single school context
  • at the micro level, focusing on individual stakeholders within a single language teaching institution

Key points and lessons learnt:

• Focus on washback on language performance and learning growth
  • Can the influence of the test be separated from the other variables?
• Longitudinal study over one academic year (2002-3)
• Participant learners were compared in terms of:
  • Competence level
  • Age
  • Stage
  • Motivation
  • External (high stakes) or internal final exam
  • Learning gain
• Provided multiple sources of very rich data
• But - difficult and costly to do
• Requires active participation of many stakeholder groups and individuals

Learning from the 3 impact case studies

• What can be learned using these specific impact projects as meta-data?
• Three key factors of contemporary educational systems need to be accounted for:

1. the nature of complex dynamic systems (see for example D. Larsen-Freeman 1997)

2. the roles that stakeholders play within such systems

3. the need to see assessment projects as educational innovations within the systems and to manage change effectively – need a theory of action

1. The nature of complex dynamic systems

[Diagram: the testing system and its stakeholders. Recoverable labels:]

• Inputs to test design: Learners, Teachers, Test writers/examiners, Receiving institutions, School owners, Future employers, Government agencies, Professional bodies, Test centre administrators, Materials writers, Publishers, etc.
• Testing System: Test constructs, Test format, Test conditions, Test assessment criteria, Test scores
• Contexts of test use - consequences: Learners, Parents/carers, Teachers, Receiving institutions, Employers, School owners, Examiners, Government agencies, Professional bodies, Academic researchers, Test writers/Examiners, etc.

2. The roles that stakeholders play

[The same diagram of the testing system and its stakeholders]

3. The need to see assessment projects as educational innovations and to manage change effectively

See Wall (SILT 22, 2005): a case study using insights from testing and innovation theory, e.g. Henrichsen (1989)

Hybrid Model of the Diffusion / Implementation Process

[Diagram: Antecedents, Process, Consequences arranged along a timeline]

Learning from the case studies

• When applied to (language) assessment, two key factors also need to be accounted for:

a) the nature of the construct: language itself as a socio-cognitive phenomenon - the latest views on validity

b) the nature of the test development and validation process
  • from conception to routine data collection and analysis

• Impact research, therefore, is another kind of validation activity ........

a) A socio-cognitive framework

Messick, Bachman, Kane, Mislevy, Weir ….. etc. (see also Pellegrino)

[Diagram: Theory and the Test Taking Context (TT CONTEXT): TLU, learning context, context of score use]

[Diagram: the testing system, with the core construct at its centre]

[Diagram: the contexts: learning contexts, testing contexts, use of results contexts]

Impact: consequential aspects of validity

[Diagram: inference to a framework - from Dr Neil Jones, based on Kane, Mislevy etc.]

• Evaluation (scoring model): from test performance to test score. How can we score what we observe? Relates to marking, rating criteria.
• Generalization (measurement model): from test score to true score. Does the test measure consistently? Relates to test reliability, rater training, scale construction and version equating using IRT, etc.
• Extrapolation: from true score to the “real world” (target situation of use). Does the test score reflect the candidate’s actual ability? Relates to validity, e.g. a socio-cognitive model linking features of the learners, the test content and the skills to be measured.
• Idealization: link from the specific testing context to a context-neutral framework (CEFR levels). How does the specific learning/testing context relate to a more general proficiency framework? Depends on identifying the salient features of the levels and the specific learner group – not all salient features may be relevant to all groups. Quantitative and qualitative evidence may be provided.

b) Model of the Test Development Process

“ … seek validity by design as a likely basis for washback”

Messick, 1996: 252

Seek "impact by design"

i.e. a theory of action

Saville, 2009

• Identifying stakeholders and their needs
• Linking these needs to the requirements of test usefulness - including predicted impact
  - theoretical
  - practical
• Long term, iterative processes - a key feature of validation

Involvement of the stakeholder constituency

E.g. during test design and development

• presentation and consultation to do with specifications and detailed syllabus designs
• professional support programmes for institutions and individual teachers/students etc. who plan to use the examinations
• training and employment of suitable personnel within the field to work on all aspects of the examination cycle – to be question/item writers, to act as examiners, etc.

Cf. the Maxims referred to above

After an examination becomes operational

• Procedures need to be in place to collect data routinely which allows impact to be estimated:
  • who is taking the examination (i.e. a profile of the candidates)
  • who is using the examination results and for what purpose
  • who is teaching towards the examination and under what circumstances
  • what kinds of courses and materials are being designed and used to prepare candidates
  • what effect the examination has on public perceptions generally (e.g. regarding educational standards)
  • how the examination is viewed by those directly involved in educational processes (e.g. by students, examination takers, teachers, parents, etc.)
  • how the examination is viewed by members of society outside education (e.g. by politicians, business people, etc.)

Towards a comprehensive model

• How can these considerations be combined to produce a comprehensive, integrated model?
  • to guide language testers in ways to build impact into test development and validation systems
  • to promote research into impact by a wide range of stakeholders

A meta-framework building on Milanovic & Saville’s maxims (1996)

Four inter-related dimensions:

1. re-conceptualise the role of impact study within the assessment enterprise, vis-à-vis societal systems generally and language education specifically

2. introduce the concept of “impact by design” into the planning and operationalisation of language assessments by examination providers

3. re-organise validation procedures to incorporate impact research into operational activities to provide the basis for knowing about and understanding how well an assessment system works in practice with regard to its impact (as defined in point 1 above)

4. develop an appropriate theory of action which enables examination providers to work with stakeholders to achieve the intended objectives, to avoid negative consequences and to take remedial action when necessary.

“Impact by design”

• Integral part of a framework for developing and validating examination systems
• A concept akin to social impact assessment (SIA)
• Focus on what matters – e.g. successful learning

Impacts (positive and negative) anticipated in design phase

Impact research methodology used to find out what happens

Remedial action taken when needed on the basis of impact evidence

A revised model (2009)

Key considerations

• Centrality of language construct, theories of language learning
  - a socio-cognitive model
  - learning understood as change
  - effective communication
• Impact research incorporated into routine validation processes
• Mixed method designs used with impact “toolkit” to collect quantitative and qualitative data
• Importance of the timeline with iterative cycles of review and revisions implemented over time
• Emergent aspects of validity: improved understanding of the meaning of language assessment in context and of the effects and consequences on systems and people
• Stance: perspective of UK examinations board; influenced by critical realism, contemporary pragmatism
• Reconceptualising impact taking account of:
  - theories of knowledge
  - socio-cognitive theory
  - constructivism
  - theories of change

Impact by design

Procedural basis for knowing about effects and consequences

Theory of Action

Applications beyond ESOL?

• Applying the model within the UK educational context:
  • The Asset Languages Project (2003 onwards)

Conclusion

Investigating impact as validation

• The investigation of impact is not a discrete or one-off activity
• It is an essential component in establishing the overall validity (usefulness) of an assessment system in terms of its fitness for specific purposes and contexts of use
• The proposed model locates the study of test impact as one of a set of research and development tools within an iterative approach to on-going test validation
• It is consistent with Messick, 1996:

“In essence ..... test validation is empirical evaluation of meaning and consequences of measurement, taking into account extraneous factors in the applied setting that might erode or promote validity of local score interpretation and use.”

Thank You!
