
Designing an Assessment System

Richard P. Phelps

International Research-to-Practice Conference

Nazarbayev Intellectual Schools AEO

Astana, Kazakhstan

October, 2016

© 2016, Richard P PHELPS International Research-to-Practice Conference, Astana, Kazakhstan, October, 2016 1


“If a thing exists, it exists in some amount. If it exists in some amount, then it is capable of being measured.”

-- René Descartes, Principles of Philosophy, 1644


Image of Protein Molecules Forming Memories

Albert Einstein College of Medicine, New York, January 2014



Learning Curve


Forgetting Curve (1870s)


Ebbinghaus:

“Learning usually requires rehearsal or repetition”


Cognitive Load Theory

John Sweller, 1980s

Working Memory Capacity

George Miller, 1950s


Working Memory:

Ability to temporarily hold and manipulate information for cognitive tasks

Working Memory is challenged by:

new, unfamiliar information and

quantity of discrete bits of information


I am thinking of a type of object, what is it?

Description 1:

They are shapes, geometric plane figures, polygons, quadrilaterals, and parallelograms with opposite equal acute angles, opposite equal obtuse angles, and four equal sides


I am thinking of a type of object, what is it?

Description 2:


Two centuries of research on learning concludes…


“…repeated retrieval during learning is the key to long-term retention.”

— Henry L. “Roddy” Roediger


Cognitive Scientists’ 6 Strategies for Effective Learning

Retrieval Practice

Spaced Practice

Dual Coding

Interleaving

Concrete Examples

Elaboration


Retrieval Practice


Implications for Teachers 1

Most teachers should test more frequently, with smaller, shorter, low-stakes tests.

Understand that useful assessment can be short and simple.


Implications for Teachers 2

Does the test format matter?

• multiple-choice?

• essay?

• short answer?

• oral?

• demonstration?

• …etc.?

Not so much.


Implications for Teachers 3

Tests provide feedback to teachers about what works and what does not.

Just as students can learn by testing each other, teachers can help each other by reviewing each other’s tests.


Cognitive Psychology experiments were conducted with “formative” tests in schools and classrooms


What about systemwide, large-scale tests?

First priority:

do no harm to the formative testing programs in schools and classrooms


The effect of testing on student learning

• 12-year study, read >3,000 documents

• analyzed close to 700 separate studies, and more than 1,600 separate effects

• 2,000 other studies were reviewed and found incomplete or inappropriate

• hundreds of other studies remain to be reviewed


The effect of testing on student learning

245 Qualitative studies

813 Surveys or Polls

640 Quantitative Studies:

– Experiments: school- and classroom-level

– Multivariate studies: large-scale testing programs


Meta-analysis

A method for summarizing a large research literature with a single, comparable measure.

(0.5 effect size ≈ 1 grade level of learning)
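As a rough illustration of what an "effect size" is, the sketch below computes a standardized mean difference (Cohen's d with a pooled standard deviation) for two small groups of scores, then converts it to grade levels with the slide's rule of thumb. The slides do not say which effect size statistic the cited studies used, so Cohen's d is an assumption here. The function names and the score lists are hypothetical.

```python
from math import sqrt
from statistics import mean, stdev


def cohens_d(treatment, control):
    """Standardized mean difference using the pooled sample standard deviation.

    Assumption: Cohen's d is used only as one common effect size measure;
    the slides do not name the statistic behind their figures.
    """
    n_t, n_c = len(treatment), len(control)
    pooled_variance = (
        (n_t - 1) * stdev(treatment) ** 2 + (n_c - 1) * stdev(control) ** 2
    ) / (n_t + n_c - 2)
    return (mean(treatment) - mean(control)) / sqrt(pooled_variance)


def grade_levels(effect_size, effect_per_grade=0.5):
    """Convert an effect size to grade levels (slide rule: 0.5 is about 1 grade)."""
    return effect_size / effect_per_grade


# Hypothetical scores, invented for illustration only.
tested_more_often = [72, 75, 78, 80, 85]
tested_less_often = [70, 73, 76, 78, 83]

d = cohens_d(tested_more_often, tested_less_often)
print(f"effect size: {d:.2f}")
print(f"approximate grade levels: {grade_levels(d):.2f}")
```

Dividing the difference in means by a standard deviation is what puts studies that use different tests and score scales on one comparable measure.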


Findings from Phelps (2012):

• Survey study effect sizes average >1.0

• Over 90% of qualitative studies positive

• For quantitative studies, univariate effect sizes positive and stronger when:

– Testing more frequently

– Testing with feedback

– Testing with stakes


Findings from Phelps & Silva (2015)

For quantitative studies, effect sizes vary between 0.55 and 0.88:

+++ testing more frequently

++ testing with stakes

+ testing with feedback


Effect of scale on testing benefits

• size of study population

– small +0.34 over large

• scale of test administration

– small-scale +0.14 over large-scale

• responsible level of government

– local tests +0.29 over state tests


Large-scale test, tight security


Large-scale test, lax security


Besides, systemwide tests are needed for other purposes, such as…

…selection to programs with limited number of places

…monitoring and system diagnosis

…workforce planning

…accountability

…credentialing

That’s enough!


Some large-scale test advantages

On a per-student basis, inexpensive

Cognitive laboratory pre-testing possible

Standardization offers comparisons across schools and regions.

May produce high-quality items that schools and teachers can use.

MOST IMPORTANT: provides reliable, comparative information to all those not involved in a particular school


The more systemwide decision points, the better?

Figure 1: Average TIMSS Score and Number of Quality Control Measures Used, by Country

[Figure not reproduced. Horizontal axis: Number of Quality Control Measures Used (0 to 20). Vertical axis: Average Percent Correct (grades 7 & 8) (0 to 80). Series: Top-Performing Countries, Bottom-Performing Countries.]

SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001


Quality control has proportionally greater effect in poorer countries

Figure 2: Average TIMSS Score and Number of Quality Control Measures Used (each adjusted for GDP/capita), by Country

[Figure not reproduced. Horizontal axis: Number of Quality Control Measures Used (per GDP/capita). Vertical axis: Average Percent Correct (grades 7 & 8) (per GDP/capita).]

SOURCE: Phelps, Benchmarking to the best in mathematics, Evaluation Review, 2001


IEA: TIMSS, PIRLS, CIVED, SITES, ICILS, PPP, ECES, TEDS

OECD PISA: PISA, PISA for schools, PISA for development

World Bank: READ, SABER … provides funding for PISA


The effect of international testing programs

Freedom to design your testing

school tests

international tests

state and national tests


OECD and World Bank are run by economists

How well do economists understand PSYCHO-metrics?

Some interesting examples:

Chile’s national testing program, funded by the World Bank

OECD’s “Synergies for Better Learning” project


Some interesting oddities:


Designing an Assessment System

richard {at} nonpartisaneducation {dot} org