Developing an in-house speaking assessment: Rasch analysis for action research

DEVELOPING AN IN-HOUSE SPEAKING ASSESSMENT: RASCH ANALYSIS FOR ACTION RESEARCH. FLLT 2016, Bangkok, Thailand, June 24, 2016. By Andy Vajirasarn, Toyo University (Tokyo, Japan)

Upload: andy-vajirasarn

Post on 20-Feb-2017


TRANSCRIPT

Page 1: Developing an in-house speaking assessment: Rasch analysis for action research

DEVELOPING AN IN-HOUSE SPEAKING ASSESSMENT: RASCH ANALYSIS FOR ACTION RESEARCH

FLLT 2016, Bangkok, Thailand, June 24, 2016

by Andy Vajirasarn, Toyo University (Tokyo, Japan)

Page 2: Developing an in-house speaking assessment: Rasch analysis for action research

OVERVIEW
• Action Research
• Speaking Assessment Development
• Study and Results/Conclusions
• What’s next?

Page 3: Developing an in-house speaking assessment: Rasch analysis for action research

ACTION RESEARCH

THEORY PRACTICE

Page 4: Developing an in-house speaking assessment: Rasch analysis for action research

ACTION RESEARCH: DEFINITIONS

“…most agree on the following: action research is inquiry that is done by or with insiders to an organization or community, but never to or on them.” (Anderson, 2005, in Ivankova, 2015)

“Action research is a family of practices of living inquiry... It seeks to bring together action and reflection, theory and practice, in participation with others, in the pursuit of practical solutions to issues of pressing concern to people, and more generally, the flourishing of individual persons and their communities.” (Reason & Bradbury, 2008, in Ivankova, 2015)

Methodological features: systematic, cyclical, and flexible; it involves collection of multiple data sources and generation of a plan of action or intervention. (Ivankova, 2015)

Page 5: Developing an in-house speaking assessment: Rasch analysis for action research

ACTION RESEARCH MODELS

Kurt Lewin (1948)

Kemmis and McTaggart (2007)

Stringer (2014)

Page 6: Developing an in-house speaking assessment: Rasch analysis for action research

ACTION RESEARCH CYCLE (IVANKOVA, 2015)

DIAGNOSIS

RECONNAISSANCE

PLAN

ACT

MONITOR

EVALUATE

Page 7: Developing an in-house speaking assessment: Rasch analysis for action research

ACTION RESEARCH MODELS

Ivankova’s (2015) model
1. Diagnosis: identify the problem
2. Reconnaissance: gather information
3. Plan: plan the intervention
4. Act: execute the intervention
5. Evaluate: assess the intervention
6. Monitor: monitor for further improvement needs


Page 8: Developing an in-house speaking assessment: Rasch analysis for action research

STAGE 1: DIAGNOSIS of CONTEXT
• Private Japanese university
• Annual screening test to enter Advanced English Program
• I participated as 1 of the 4 volunteer judges
• 80 candidates seen in one day
• Group interview: 6 candidates at once
• 10-15 minutes per group

Page 9: Developing an in-house speaking assessment: Rasch analysis for action research

STAGE 1 (DIAGNOSIS): WHAT’S THE PROBLEM?
• No meetings, consensus-building or training for raters.
• Raters instructed to “ask anything” as interview questions
  o Never knew what other judges would ask until the day of
  o No control for quality of questions: topic and wording difficulty levels
• Marks given for English Skills and Motivation using “S, A, B, C, or F”
  o No criteria given for English skills or motivation
  o No guidance given about meaning of levels: What is a B in Motivation? What is a C in English skills?

In brief: very loosely run; purely intuition-based

Page 10: Developing an in-house speaking assessment: Rasch analysis for action research

STAGE 2: RECONNAISSANCE
• Requested and obtained permission for action research project. YES!
• Field notes
  o Conducted observation of a speaking assessment
• Transcripts
  o Conducted interviews with director and other volunteer judges
• Review of speaking assessment development literature
  o Brown (2012) on rubrics and language assessment
  o Fulcher and Davidson (2007) on instrument design and validity
  o Luoma (2004) on assessing speaking
  o Taylor (2011) on how the IELTS test was validated

Page 11: Developing an in-house speaking assessment: Rasch analysis for action research

RECONNAISSANCE OUTCOMES
• Bank of questions (2 sets: easier and harder)
  o Procedure: Start with easier Q first, move on to harder Q
• Scoring rubric (matrix of criteria)
  o 4 categories: Content Relevance, Content Support, Fluency, Accuracy
  o 4 levels of descriptors: 4, 3, 2, 1
• Procedure: Check validity using Multifaceted Rasch Measurement (Bond & Fox, 2007; Fulcher & Davidson, 2007; Linacre, 2006)
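The rubric matrix can be pictured as a small data structure. The sketch below is only an illustration, assuming each of the four categories is scored on the 1 to 4 levels so that one rater's raw total per candidate runs from 4 to 16; the names `CATEGORIES`, `LEVELS`, and `raw_total` and the example ratings are hypothetical, not taken from the study.

```python
# Hypothetical encoding of the rubric: four categories, each scored 1-4.
CATEGORIES = ("Content Relevance", "Content Support", "Fluency", "Accuracy")
LEVELS = (1, 2, 3, 4)


def raw_total(ratings):
    """Sum one rater's four category ratings for one candidate."""
    missing = [c for c in CATEGORIES if c not in ratings]
    if missing:
        raise ValueError(f"missing categories: {missing}")
    for category in CATEGORIES:
        if ratings[category] not in LEVELS:
            raise ValueError(f"{category}: level must be one of {LEVELS}")
    return sum(ratings[c] for c in CATEGORIES)


# Invented ratings for one candidate from one rater.
example = {"Content Relevance": 4, "Content Support": 3, "Fluency": 2, "Accuracy": 3}
print(raw_total(example))  # 12
```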

Page 12: Developing an in-house speaking assessment: Rasch analysis for action research
Page 13: Developing an in-house speaking assessment: Rasch analysis for action research
Page 14: Developing an in-house speaking assessment: Rasch analysis for action research

MULTIFACETED RASCH ANALYSIS (MFRM)

SOPHISTICATED STATISTICAL ANALYSES

Page 15: Developing an in-house speaking assessment: Rasch analysis for action research

MFRM and “FACETS” (Linacre, 2006)

• Raw data is used to build a model
• Data is then compared to the model for how well it “fits” (checked with infit and outfit mean squares OR their standardized z-scores)
• Passing the fit test = your instrument is “constructive for measurement”

Data Handling
• Robust against MISSING VALUES
• Raters do not need to rate every candidate; overlapping groups are fine.
• Finer grain view of data is possible via logit score/scale
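A minimal sketch of what "comparing data to the model" can look like, assuming a many-facet rating scale model in which the log-odds of adjacent rating categories equal examinee ability minus rater severity minus category difficulty minus a step threshold, all in logits. Infit is computed as the information-weighted mean square and outfit as the unweighted mean of squared standardized residuals. The function names and all numbers are illustrative assumptions; they are not FACETS output and not the study's data.

```python
import math


def category_probabilities(ability, severity, difficulty, thresholds):
    """Probability of each rating category (lowest first), inputs in logits.

    `thresholds` holds one step value per boundary between adjacent
    categories, so 3 thresholds give 4 categories.
    """
    # The lowest category has log-numerator 0; each higher category adds
    # (ability - severity - difficulty - step).
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(
            log_numerators[-1] + ability - severity - difficulty - step
        )
    top = max(log_numerators)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return [w / total for w in weights]


def expected_and_variance(probabilities, lowest=1):
    """Model-expected rating and its variance for one observation."""
    levels = range(lowest, lowest + len(probabilities))
    expected = sum(k * p for k, p in zip(levels, probabilities))
    variance = sum((k - expected) ** 2 * p for k, p in zip(levels, probabilities))
    return expected, variance


def fit_mean_squares(observations):
    """Return (infit, outfit) from (observed, expected, variance) triples."""
    observations = list(observations)
    squared_residuals = [(x - e) ** 2 for x, e, _ in observations]
    variances = [v for _, _, v in observations]
    outfit = sum(r / v for r, v in zip(squared_residuals, variances)) / len(observations)
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit


# Hypothetical parameters: an average examinee, rater, and category.
probs = category_probabilities(0.0, 0.0, 0.0, thresholds=[-1.0, 0.0, 1.0])
expected, variance = expected_and_variance(probs)
infit, outfit = fit_mean_squares([(3, expected, variance), (2, expected, variance)])
```

Mean squares near 1 indicate ratings about as variable as the model predicts; values well above 1 indicate noisier ratings than expected, and values well below 1 indicate ratings that are more predictable than expected.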

Page 16: Developing an in-house speaking assessment: Rasch analysis for action research

MFRM and “FACETS” (Linacre, 2006)
Provides measures of
• examinee ability
• rater severity
• rating category difficulty level
• rating scale use

Fairness adjustment
• Post-scoring adjustment is possible
• Based on all raters’ severity data, an adjustment value is calculated
• Provides “adjusted score” alongside “observed score” (raw data)
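One way to picture the fairness adjustment: take the examinee's estimated ability and ask what rating the model would expect from a rater of average severity. The sketch below assumes a many-facet rating scale model with measures in logits and average severity centered at 0; `expected_score` and every number are hypothetical, and the way FACETS itself computes its adjusted scores may differ in detail.

```python
import math


def expected_score(ability, severity, difficulty, thresholds, lowest=1):
    """Model-expected rating for one examinee/rater/category combination."""
    # Log-numerators of the category probabilities, lowest category first.
    log_numerators = [0.0]
    for step in thresholds:
        log_numerators.append(
            log_numerators[-1] + ability - severity - difficulty - step
        )
    top = max(log_numerators)  # subtract the max for numerical stability
    weights = [math.exp(v - top) for v in log_numerators]
    total = sum(weights)
    return sum((lowest + k) * w / total for k, w in enumerate(weights))


# Hypothetical values: the same examinee seen by three kinds of rater.
thresholds = [-1.0, 0.0, 1.0]
ability = 0.5
from_strict_rater = expected_score(ability, 1.0, 0.0, thresholds)
from_lenient_rater = expected_score(ability, -1.0, 0.0, thresholds)
adjusted = expected_score(ability, 0.0, 0.0, thresholds)  # average severity

print(round(from_strict_rater, 2), round(adjusted, 2), round(from_lenient_rater, 2))
```

The adjusted value sits between what the strict and the lenient rater would be expected to award, which is the sense in which it removes the luck of the rater draw.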

Page 17: Developing an in-house speaking assessment: Rasch analysis for action research

THE PILOT STUDY

WHAT AM I LOOKING FOR & HOW?

Page 18: Developing an in-house speaking assessment: Rasch analysis for action research

STAGE 3: PLAN the INTERVENTION
Purpose of the study: Using MFRM to seek evidence for the soundness of the new assessment scale.

Research questions:
1. How are the performances of the examinees, raters & rubric categories related when they are put on the same logit scale?
2. To what degree are different raters scoring in the same ways?
3. How difficult or easy are the rubric categories relative to each other?

Page 19: Developing an in-house speaking assessment: Rasch analysis for action research

PHASE 1: THE TRIAL INTERVIEW TEST
• 20 volunteer candidates were recruited
• Candidates given procedure, question lists & the rubric
• I conducted one-on-one interviews with each candidate
• Sessions were recorded using an IC recorder

Page 20: Developing an in-house speaking assessment: Rasch analysis for action research

PHASE 2: RATING
• 5 language professionals recruited as raters: 4 + myself
  o 3 native English speakers; 2 L1 Japanese non-native English speakers
  o Rubrics provided along with audio data
  o No formal training session, but informal verbal instruction on rubric use
• 20 candidates submitted self-ratings
  o Each candidate given their own interview audio data
  o Self-ratings submitted by email

Page 21: Developing an in-house speaking assessment: Rasch analysis for action research

RATING WORKLOAD

Page 22: Developing an in-house speaking assessment: Rasch analysis for action research

RESULTS

WHAT DID I FIND?

Page 23: Developing an in-house speaking assessment: Rasch analysis for action research

RESULTS WALKTHROUGH
• Table 2: Raw scores
• Table 3: Summary Fit Statistics
• Figure 1: Vertical Ruler: Summary of all facets (RQ 1)
• Table 4: Student Ability (RQ 1)
• Table 5: Rater Severity (RQ 1 & 2)
• Table 6: Rating Criteria (RQ 1 & 3)
• Figure 2: Rating Scale Probability Curves (not RQ but related to purpose)

Page 24: Developing an in-house speaking assessment: Rasch analysis for action research

TABLE 2: RAW SCORES

Rubric categories: Content Relevance, Content Support, Fluency, Accuracy

Score callouts: 16 points, 12 points, 10 points, 16 points, 10 points, 15 points

Page 25: Developing an in-house speaking assessment: Rasch analysis for action research

TABLE 3: SUMMARY FIT STATISTICS
• Separate and unique examinee abilities
• Content Relevance, Content Support, Fluency, and Accuracy are significantly separate constructs as categories.
• Too high. Raters are a bit too separate and unique. Lower value here means a more “normed” rating ability.

Page 26: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

Page 27: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

#17 as student is an average-ability speaker.

#17 as rater is lenient on himself

Page 28: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

#12 as student is a high-ability speaker

#12 as rater is too strict on herself

Page 29: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

Student raters (#1-20) show a range of 17 SD units. Too varied and erratic to count as “good” measurement.

Page 30: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

Professional raters (#21-25) are grouped near each other. Not too far (- or + 2 logits) from the mean. Much “fairer” as raters than the students are.

Page 31: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 1: VERTICAL RULER

The “fairest one of all” is a non-native English speaker… And so is the second fairest! (#25 and #24).

The native English speakers were… a bit more lenient (#21-23).

Page 32: Developing an in-house speaking assessment: Rasch analysis for action research

TABLE 4: STUDENT ABILITY

Reasonable range for fit mean squares = .4 to 1.2

OR

Z scores are within -2 to +2 SD

Page 33: Developing an in-house speaking assessment: Rasch analysis for action research

TABLE 5: RATER SEVERITY

Reasonable range for fit mean squares = .4 to 1.2

OR

Z scores are within -2 to +2 SD

Page 34: Developing an in-house speaking assessment: Rasch analysis for action research

TABLE 6: RATING CRITERIA

Reasonable range for fit mean squares = .4 to 1.2

OR

Z scores are within -2 to +2 SD
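The fit rule applied to Tables 4 to 6 can be sketched as a simple check. This assumes the "OR" means an element passes if either criterion is met (mean square between .4 and 1.2, or z-score within ±2); the function name and the three example elements are hypothetical, not values from the tables.

```python
def acceptable_fit(mean_square, z_score, ms_range=(0.4, 1.2), z_limit=2.0):
    """True if either fit criterion from the slide is satisfied."""
    low, high = ms_range
    mean_square_ok = low <= mean_square <= high
    z_ok = abs(z_score) <= z_limit
    return mean_square_ok or z_ok


# Invented (mean square, z-score) pairs for three elements of one facet.
elements = {
    "Rater A": (0.95, -0.3),
    "Rater B": (1.60, 2.4),
    "Rater C": (1.30, 1.1),
}
flagged = [name for name, (ms, z) in elements.items() if not acceptable_fit(ms, z)]
print(flagged)  # ['Rater B']
```

A stricter reading would require both criteria at once; swapping `or` for `and` in the return line gives that version.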

Page 35: Developing an in-house speaking assessment: Rasch analysis for action research

FIGURE 2: PROBABILITY CURVES

Page 36: Developing an in-house speaking assessment: Rasch analysis for action research

SUMMARY/CONCLUSION
There is evidence that this scoring system provided statistically consistent measures of student ability, rater severity, and rubric functioning.
• Overall goodness of fit looks OK: all facets had z-scores within 2 SD range.
• Student ability: good spread from high to low.
• Rater severity:
  o Self-ratings erratic, unexpected. (Remove them and re-calculate?)
  o Professional raters were stable and fair.
• Categories: evidence for unique constructs.
• Levels:
  o All levels (1 to 4) were used enough times.
  o No merging / collapsing of unused levels needed.
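Whether each level was "used enough times" can be checked by counting how often each level was awarded. In the sketch below the cut-off of 10 observations per level is an assumption (a rule of thumb sometimes used in Rasch rating scale work), not a figure stated in the slides, and the ratings are invented.

```python
from collections import Counter


def underused_levels(ratings, levels=(1, 2, 3, 4), min_count=10):
    """Return {level: count} for levels awarded fewer than min_count times."""
    counts = Counter(ratings)
    return {
        level: counts.get(level, 0)
        for level in levels
        if counts.get(level, 0) < min_count
    }


# Invented pool of awarded levels across all raters and categories.
ratings = [1] * 12 + [2] * 30 + [3] * 41 + [4] * 9
print(underused_levels(ratings))  # {4: 9}
```

An empty result would correspond to the finding above: no level is so rarely used that it needs merging with a neighbour.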

Page 37: Developing an in-house speaking assessment: Rasch analysis for action research

WHAT’S NEXT?
Before using under LIVE test-taking conditions, it would be best to…
• design rater training program materials (w/ recordings and ratings from trial)
• assess quality and difficulty levels of the interview questions
• conduct post-scoring interviews (with raters and candidates)
• For the rest… See me in Nagoya at JALT this November for stages 4, 5, & 6.

Thanks for coming!


Page 38: Developing an in-house speaking assessment: Rasch analysis for action research

WORKS CITED
Bond, T. G., & Fox, C. M. (2012). Applying the Rasch Model: Fundamental Measurement in the Human Sciences (2nd ed.). New York: Routledge.
Brown, J. D. (Ed.). (2012). Developing, using, and analyzing rubrics in language assessment with case studies in Asian and Pacific languages. Honolulu, HI: NFLRC.
Fulcher, G., & Davidson, F. (2007). Language Testing and Assessment: An Advanced Resource Book. New York: Routledge.
Ivankova, N. (2015). Mixed Methods Applications in Action Research: From Method to Community Action. Los Angeles: Sage Publications.
Linacre, J. M. (2006). Facets Rasch measurement computer program, version 3.61.0. Chicago: Winsteps.com.
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.
Taylor, L. (Ed.). (2011). Examining speaking: Research and practice in assessing second language speaking. Cambridge: Cambridge University Press.
