Portfolio-based assessments in medical education: are they valid and reliable for summative purposes?
Portfolios are widely used in all stages of education, from elementary school through to vocational and professional programmes. Inevitably, this high prevalence results in varying definitions and differences in their use. Their predominant use in medical education has been for formative assessment, often as a vehicle for encouraging a component of reflection.1 More recently, the use of portfolios has been advocated for summative purposes, e.g. in undergraduate education2 and for the revalidation of doctors.3
Portfolios have high face validity and are a useful formative assessment tool.
Over the last 20–30 years there has been increasing awareness of the need to develop new assessment tools and to use them in a way that ensures a high degree of validity and an acceptable level of reliability. Guidelines for devising quality procedures have been developed with reference to the area of assessment of clinical competence.4 While achieving a high level of reliability may not be considered critical where use of portfolios is mainly for formative purposes, once portfolios are proposed for use in high stakes decision making, such as for final medical school examinations or revalidation, evidence of sound psychometric properties needs to be established.
The recent review of portfolios as a method of assessment by Friedman et al.5 included some brief discussion of issues of reliability. In an attempt to establish the psychometric credibility of portfolios, we have conducted a systematic review of the evidence for portfolio-based assessment. In the setting of medical education, we found only two papers providing data, mostly from small-scale studies.2,6 These studies focused on rater reliability, and their conclusions were not very reassuring, with reliability falling well below the generally acceptable value of 0.8. Importantly, other aspects affecting reliability have not been investigated.
In the area of assessing aspects of clinical competence and problem solving, generalisability studies have consistently shown that content specificity is a major contributor to unreliability, more so than marker-related factors.7 In essence, this means that a large sample of performance has to be tested before a reliable generalisation about ability can be made. This has led, for example, to the understanding that objective structured clinical examinations (OSCEs) have to be long in order to be reliable, irrespective of the effectiveness of structured rating forms and examiner training in reducing variance. It is inevitable that similar problems will beset portfolios, particularly those that may allow considerable variability in the content to be included, yet no generalisability studies that would allow us to judge the likely extent of this problem in determining reliability have been published.
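The link between the size of the performance sample and overall reliability can be sketched with the Spearman-Brown formula, which projects the reliability of a score averaged over n tasks from the reliability of a single task. The sketch below is purely illustrative: the per-task reliability of 0.15 is an assumed value of the order often quoted for a single OSCE station, not a figure taken from the studies cited here.

```python
def spearman_brown(r_single: float, n: int) -> float:
    """Projected reliability of a score averaged over n sampled tasks,
    given the reliability of a single task (Spearman-Brown formula)."""
    return n * r_single / (1 + (n - 1) * r_single)

def tasks_needed(r_single: float, target: float = 0.8) -> int:
    """Smallest number of tasks whose aggregate reliability reaches target."""
    n = 1
    while spearman_brown(r_single, n) < target:
        n += 1
    return n

# With an assumed per-task reliability of 0.15, reaching the conventional
# 0.8 threshold requires a substantial sample of performance.
print(tasks_needed(0.15))  # → 23
```

This is why a broad sample of content is needed regardless of how well individual markers agree: averaging over more tasks, not refining the rating form, is what drives the projected reliability upward.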
Portfolios need sound psychometric properties if they are to be recommended for high stakes summative purposes.
In view of the lack of evidence underpinning the widespread implementation of portfolio-based assessment in medical settings, we broadened our search to the wider education literature to identify areas where more experience has been gained. The most comprehensive studies are in the field of elementary education.8,9 A number of concerns have been flagged around the issue of reliability. The consistency of scoring between examiners across a range of studies has been highly variable. Whilst the high face validity of portfolios is not contested, criterion-related validity has proved disappointing, in that scores from portfolio assessments do not seem to correlate well with scores from other methods of assessment. However, these studies suggest that there are some general principles that can be used to guide large-scale portfolio-based assessment. To achieve high levels of interrater reliability of around 0.8,8,9 it appears that portfolios should be carefully introduced to well prepared students and should be of uniform content. They should be marked by experienced, trained scorers who use clearly articulated criteria, have a shared understanding of the purpose of the assessment, and have a deep understanding of expected student performance.
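As an illustration of how interrater agreement might be quantified when two trained scorers grade the same set of portfolios, the sketch below computes Cohen's kappa, one of several possible chance-corrected agreement indices. The grades shown are entirely hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters who each assign a
    categorical grade to the same set of portfolios."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of portfolios on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal grade counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical pass/borderline/fail grades from two markers.
a = ["pass", "pass", "fail", "pass", "borderline", "pass", "fail", "pass"]
b = ["pass", "pass", "fail", "borderline", "borderline", "pass", "pass", "pass"]
print(round(cohens_kappa(a, b), 2))  # → 0.54
```

A kappa of 0.54 would fall well short of the 0.8 level discussed above, illustrating how two markers can agree on most portfolios yet still show only moderate chance-corrected agreement.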
There is little evidence at present to support the widespread introduction of portfolios for high stakes summative assessment.
However, a number of outstanding issues require further research. One of these concerns how to ensure that the material (evidence) in the portfolio is attributable to the person submitting it, while another relates to unresolved concerns about cost and feasibility. Importantly, there will be consequential effects on learning and teaching when implementing portfolios. Educators need to ensure they assess what they want learners to learn.10 If the assessment criteria for portfolios are not well thought out, students may inadvertently be directed away from intended purposes and outcomes by the portfolio process.

Correspondence: Chris Roberts, Senior Clinical Lecturer in Medical Education, Department of Medical Education, University of Sheffield, Coleridge House, Northern General Hospital, Herries Road, Sheffield S5 7AU, UK. Tel.: 00 44 114 226 6784; Fax: 00 44 114 242 4896; E-mail:

© Blackwell Science Ltd MEDICAL EDUCATION 2002;36:899–900
There may be some general principles we can learn from the wider education literature to ensure the quality of portfolio-based assessments.
An example of this will become apparent if an attempt is made to use the same portfolio for appraisal and for revalidation. The former process is primarily intended to be formative, encouraging people to identify areas in which they might improve and to plan how they might rectify deficiencies. The latter process involves an evaluation of performance and may potentially have detrimental outcomes for those being assessed. Portfolio assessment for revalidation thus serves a summative purpose, and the person undertaking it is highly unlikely to willingly present a portfolio identifying their weaknesses.
This is the era of evidence-based medical education. Where evidence is available, it should not be ignored. Where evidence is not available, this should be acknowledged and efforts made to plug such gaps. In regard to portfolios, the evidence backing their widespread introduction for high stakes summative assessments is thin, to say the least, and many gaps are evident. Caution is advised; at the same time, institutions and organisations that are introducing portfolios should be strongly encouraged to take a research-based approach and to publish their data on validity, reliability, feasibility and effects on student learning. Without this, we are in danger of reverting to the dark days of the past, when judgements about the value of assessment approaches were largely based on face validity, and worthy psychometric data were either ignored or never sought.
Chris Roberts
David I Newble
Alan J O’Rourke
Sheffield, UK
References

1 Challis M. AMEE Medical Education Guide No. 11 (revised). Portfolio-based learning and assessment in medical education. Med Teacher 1999;21:370–86.
2 Davis MH, Friedman M, Harden RM, Howie P, Ker J, McGhee C, Pippard MJ, Snadden D. Portfolio assessment in medical students' final examinations. Med Teacher 2001;23:357–66.
3 General Medical Council. Revalidation. http://www.gmc-uk.org/revalidation (accessed 31 January 2002).
4 Newble DI, ed. Guidelines for Assessing Clinical Competence. Teaching Learning Med 1994;6:213–20.
5 Friedman M, Davis MH, Harden RM, Howie PW, Ker J, Pippard MJ. AMEE Medical Education Guide No. 24. Portfolios as a method of student assessment. Dundee: Association for Medical Education in Europe. Med Teacher 2001;23:535–51.
6 Pitts J, Coles C, Thomas P. Educational portfolios in the assessment of general practice trainers: reliability of assessors. Med Educ 1999;33:515–20.
7 Van der Vleuten CPM. The assessment of professional competence. Developments, research and practical implications. Adv Health Sci Educ 1996;1:41–67.
8 Koretz D. Large scale portfolio assessment in the US: evidence pertaining to the quality of measurement. Assessment Education 1998;5:309–33.
9 Herman JL, Winters L. Portfolio research: a slim collection. Educational Leadership 1994;52:48–55.
10 Van der Vleuten CPM, Newble DI. How can we test clinical reasoning? Lancet 1995;345:1032–4.