Portfolio-based assessments in medical education: are they valid and reliable for summative purposes?
Portfolios are widely used in all stages of education, from elementary school through to vocational and professional programmes. Inevitably, this high prevalence results in varying definitions and differences in their use. Their predominant use in medical education has been for formative assessment, often as a vehicle for encouraging a component of reflection.1 More recently, the use of portfolios has been advocated for summative purposes, e.g. in undergraduate education2 and for the revalidation of doctors.3
Portfolios have high face validity and are a useful formative assessment tool.
Over the last 20–30 years there has been increasing awareness of the need to develop new assessment tools and to use them in a way that ensures a high degree of validity and an acceptable level of reliability. Guidelines for devising quality procedures have been developed with reference to the area of assessment of clinical competence.4 While achieving a high level of reliability may not be considered critical where use of portfolios is mainly for formative purposes, once portfolios are proposed for use in high stakes decision making, such as for final medical school examinations or revalidation, evidence of sound psychometric properties needs to be established.
The recent review of portfolios as a method of assessment by Friedman et al.5 included some brief discussion of issues of reliability. In an attempt to establish the psychometric credibility of portfolios, we have conducted a systematic review of the evidence for portfolio-based assessment. In the setting of medical education, we found only two papers providing data, mostly from small-scale studies.2,6 These studies focused on rater reliability, and their conclusions were not very reassuring, with reliability falling well below the generally acceptable value of 0.8. Importantly, other aspects affecting reliability have not been investigated.
In the area of assessing aspects of clinical competence and problem solving, generalisability studies have consistently shown that content specificity is a major contributor to unreliability, more so than marker-related factors.7 In essence, this means that a large sample of performance has to be tested before a reliable generalisation about ability can be made. This has led, for example, to the understanding that objective structured clinical examinations (OSCEs) have to be long in order to be reliable, irrespective of the effectiveness of structured rating forms and examiner training in reducing variance. It is inevitable that similar problems will beset portfolios, particularly those that may allow considerable variability in the content to be included, yet no generalisability studies that would allow us to judge the likely extent of this problem in determining reliability have been published.
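The link between the size of the performance sample and overall reliability can be sketched with the Spearman-Brown formula, which projects the reliability of a score averaged over n tasks from the reliability of a single task. The sketch below is purely illustrative: the per-task reliability of 0.15 is an assumed value of the order often quoted for a single OSCE station, not a figure taken from the studies cited here.

```python
def spearman_brown(r_single: float, n: int) -> float:
    """Projected reliability of a score averaged over n sampled tasks,
    given the reliability of a single task (Spearman-Brown formula)."""
    return n * r_single / (1 + (n - 1) * r_single)

def tasks_needed(r_single: float, target: float = 0.8) -> int:
    """Smallest number of tasks whose aggregate reliability reaches target."""
    n = 1
    while spearman_brown(r_single, n) < target:
        n += 1
    return n

# With an assumed per-task reliability of 0.15, reaching the conventional
# 0.8 threshold requires a substantial sample of performance.
print(tasks_needed(0.15))  # → 23
```

This is why a broad sample of content is needed regardless of how well individual markers agree: averaging over more tasks, not refining the rating form, is what drives the projected reliability upward.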
Portfolios need sound psychometric properties if they are to be recommended for high stakes summative purposes.
In view of the lack of evidence underpinning the widespread implementation of portfolio-based assessment in medical settings, we broadened our search to the wider education literature to identify areas where more experience has been gained. The most comprehensive studies are in the field of elementary education.8,9 A number of concerns have been flagged around the issue of reliability. The consistency of scoring between examiners across a range of studies has been highly variable. Whilst the high face validity of portfolios is not contested, criterion-related validity has proved disappointing, in that scores from portfolio assessments do not seem to correlate well with scores from other methods of assessment. However, these studies suggest that there are some general principles that can be used to guide large-scale portfolio-based assessment. To achieve high levels of interrater reliability of around 0.8,8,9 it appears that portfolios should be carefully introduced to well prepared students and should be of uniform content. They should be marked by experienced, trained scorers who use clearly articulated criteria, have a shared understanding of the purpose of the assessment, and have a deep understanding of expected student performance.
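As an illustration of how interrater agreement might be quantified when two trained scorers grade the same set of portfolios, the sketch below computes Cohen's kappa, one of several possible chance-corrected agreement indices. The grades shown are entirely hypothetical.

```python
from collections import Counter

def cohens_kappa(rater_a: list, rater_b: list) -> float:
    """Chance-corrected agreement between two raters who each assign a
    categorical grade to the same set of portfolios."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    # Observed proportion of portfolios on which the raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's marginal grade counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c]
                   for c in set(rater_a) | set(rater_b)) / n ** 2
    return (observed - expected) / (1 - expected)

# Hypothetical pass/borderline/fail grades from two markers.
a = ["pass", "pass", "fail", "pass", "borderline", "pass", "fail", "pass"]
b = ["pass", "pass", "fail", "borderline", "borderline", "pass", "pass", "pass"]
print(round(cohens_kappa(a, b), 2))  # → 0.54
```

A kappa of 0.54 would fall well short of the 0.8 level discussed above, illustrating how two markers can agree on most portfolios yet still show only moderate chance-corrected agreement.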
There is little evidence at present to support the widespread introduction of portfolios for high stakes summative assessment.
However, a number of outstanding issues require further research. One of these concerns how to ensure that the material (evidence) in the portfolio is attributable to the person submitting it, while another relates to unresolved concerns about cost and feasibility. Importantly, there will be consequential effects on learning and teaching when implementing portfolios. Educators need to ensure they assess what they want learners to learn.10 If the assessment criteria for portfolios are not well thought out, students may inadvertently be directed away from intended purposes and outcomes by the portfolio process.

Correspondence: Chris Roberts, Senior Clinical Lecturer in Medical Education, Department of Medical Education, University of Sheffield, Coleridge House, Northern General Hospital, Herries Road, Sheffield S5 7AU, UK. Tel.: 00 44 114 226 6784; Fax: 00 44 114 242 4896; E-mail:

© Blackwell Science Ltd MEDICAL EDUCATION 2002;36:899–900
There may be some general principles we can learn from the wider education literature to ensure the quality of portfolio-based assessments.
An example of this will become apparent if an attempt is made to use the same portfolio for appraisal and for revalidation. The former process is primarily intended to be formative, encouraging people to identify areas in which they might improve and to plan how they might rectify deficiencies. The latter process involves an evaluation of performance and may potentially have detrimental outcomes for those being assessed. Portfolio assessment for revalidation thus serves a summative purpose, and the person undertaking it is highly unlikely to willingly present a portfolio identifying their weaknesses.
This is the era of evidence-based medical education. Where evidence is available, it should not be ignored. Where evidence is not available, this should be acknowledged and efforts made to plug such gaps. In regard to portfolios, the evidence backing their widespread introduction for high stakes summative assessments is thin, to say the least, and many gaps are evident. Caution is advised; at the same time, institutions and organisations that are introducing portfolios should be strongly encouraged to take a research-based approach and to publish their data on validity, reliability, feasibility and effects on student learning. Without this, we are in danger of reverting to the dark days of the past, when judgements about the value of assessment approaches were largely based on face validity, and worthy psychometric data were either ignored or never sought.
Chris Roberts
David I Newble
Alan J O’Rourke
Sheffield, UK
References

1 Challis M. AMEE Medical Education Guide No. 11 (revised). Portfolio-based learning and assessment in medical education. Med Teacher 1999;21:370–86.
2 Davis MH, Friedman M, Harden RM, Howie P, Ker J, McGhee C, Pippard MJ, Snadden D. Portfolio assessment in medical students' final examinations. Med Teacher 2001;23:357–66.
3 General Medical Council. Revalidation. http://www.gmc-uk.org/revalidation (accessed 31 January 2002).
4 Newble DI, ed. Guidelines for Assessing Clinical Competence. Teaching Learning Med 1994;6:213–20.
5 Friedman M, Davis MH, Harden RM, Howie PW, Ker J, Pippard MJ. AMEE Medical Education Guide No. 24. Portfolios as a method of student assessment. Dundee: Association for Medical Education in Europe. Med Teacher 2001;23:535–51.
6 Pitts J, Coles C, Thomas P. Educational portfolios in the assessment of general practice trainers: reliability of assessors. Med Educ 1999;33:515–20.
7 Van der Vleuten CPM. The assessment of professional competence. Developments, research and practical implications. Adv Health Sci Educ 1996;1:41–67.
8 Koretz D. Large scale portfolio assessment in the US: evidence pertaining to the quality of measurement. Assessment Education 1998;5:309–33.
9 Herman JL, Winters L. Portfolio research: a slim collection. Educational Leadership 1994;52:48–55.
10 Van der Vleuten CPM, Newble DI. How can we test clinical reasoning? Lancet 1995;345:1032–4.