DEGREES OF INCORRECTNESS IN COMPUTER ADAPTIVE LANGUAGE TESTING Tsopanoglou, A., Ypsilandis, G.S., Mouti, A. Dept of Italian Language and Literature Aristotle University of Thessaloniki


Post on 18-Jan-2018


Page 1

DEGREES OF INCORRECTNESS IN

COMPUTER ADAPTIVE LANGUAGE TESTING

Tsopanoglou, A., Ypsilandis, G.S., Mouti, A.

Dept of Italian Language and Literature

Aristotle University of Thessaloniki

Page 2

This Talk

Multiple-Choice and CAT

The Study

The Results

The Proposal

Page 3

CAT from OUP and Univ of Cambridge

Page 4

Multiple-Choice: Dichotomous

STEM - QUESTION

CORRECT ANSWER

1ST DISTRACTOR

2ND DISTRACTOR

3RD DISTRACTOR

Page 5

Multiple-Choice, e.g.

Who was the Prime Minister of the UK in the year 1988?

CORRECT ANSWER: Margaret Thatcher

1ST DISTRACTOR: John Major

2ND DISTRACTOR: Elton John

3RD DISTRACTOR: Liverpool FC

Page 6

Multiple-Choice Polychotomous Pattern

STEM - QUESTION

CORRECT ANSWER

1ST DISTRACTOR ------ VERY LIKABLE / VERY SUITABLE

2ND DISTRACTOR ------ LIKABLE / SUITABLE

3RD DISTRACTOR ------ IRRELEVANT / TOTALLY WRONG
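The polychotomous pattern above can be represented as a small data structure in which every option carries a rating. The following is only a minimal sketch: the class and function names are hypothetical, the option texts are placeholders, and the credit values borrow the (2-1-0-0) framework from the proposal slide later in this talk.

```python
from dataclasses import dataclass
from enum import Enum


class Rating(Enum):
    CORRECT = "correct"
    VERY_LIKABLE = "very likable / very suitable"
    LIKABLE = "likable / suitable"
    IRRELEVANT = "irrelevant / totally wrong"


# Credits per rating, following the (2-1-0-0) framework of the proposal slide.
PARTIAL_CREDIT = {
    Rating.CORRECT: 2,
    Rating.VERY_LIKABLE: 1,
    Rating.LIKABLE: 0,
    Rating.IRRELEVANT: 0,
}


@dataclass
class Item:
    stem: str
    options: dict  # option label -> Rating (labels are illustrative placeholders)

    def dichotomous_score(self, choice: str) -> int:
        """Traditional scoring: 1 for the correct answer, 0 for any distractor."""
        return 1 if self.options[choice] is Rating.CORRECT else 0

    def polychotomous_score(self, choice: str) -> int:
        """Partial-credit scoring: the credit depends on the rating of the choice."""
        return PARTIAL_CREDIT[self.options[choice]]


# Hypothetical item: only the ratings matter for this illustration.
item = Item(
    stem="STEM - QUESTION",
    options={
        "A": Rating.CORRECT,
        "B": Rating.VERY_LIKABLE,
        "C": Rating.LIKABLE,
        "D": Rating.IRRELEVANT,
    },
)

for label in item.options:
    print(label, item.dichotomous_score(label), item.polychotomous_score(label))
```

Under dichotomous scoring options B, C and D are indistinguishable; under the polychotomous pattern option B still earns partial credit.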

Page 7

The Procedure

Collection of MC questions from a CAT

Completion of test on paper by Test Takers

Test Correction in Traditional Mode

Test Correction in Experimental Mode

Listing of Test Items by Exp. Examiner

Analysis of Results

Conclusion

Page 8

Rating of Test Items by Expert Examiner (Native)

CLEAR - DICHOTOMOUS
• Three Wrong – 1 Correct = 54
• All Wrong = 6
• 1 Correct – 1/2 Likable = 5
• TOTAL = 65 (81%)

PATTERNED - POLYCHOTOMOUS
• 2 Correct rest Wrong = 7
• Entire Experimental Pattern = 1
• 1 Correct – 1/2 Very Likable = 7
• TOTAL = 15 (19%)

Page 9

Comparing Scoring Procedures: 1. Traditional Mode, 2. Experimental Mode, and 3. Negative Scoring

Subject   Traditional   Experimental   Negative
1         86%           88%            84%
2         68%           69%            56%
3         79%           81%            74%
4         75%           76%            66%
5         36%           40%            12%
6         49%           51%            28%
7         63%           66%            36%
8         46%           49%            26%
9         55%           55%            33%
10        72%           76%            62%
11        51%           56%            40%
12        56%           59%            45%
13        91%           91%            88%
14        35%           43%            19%
15        51%           55%            35%
16        51%           54%            34%
17        83%           84%            76%
18        53%           57%            40%

Tr-Exp Pearson r = 0.995488

Tr-Neg Pearson r = 0.979421

Exp-Neg Pearson r = 0.985956
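As a rough check, the three coefficients can be recomputed from the percentages on this slide. The sketch below assumes the slide's figures read as three rows (traditional, experimental, negative) for 18 subjects; the variable and function names are illustrative. Because the slide shows rounded percentages, the recomputed values should only be expected to agree with the reported ones to about two decimal places.

```python
from math import sqrt

# Scores in percent, subjects 1-18, as read from the slide (assumed row order).
traditional = [86, 68, 79, 75, 36, 49, 63, 46, 55, 72, 51, 56, 91, 35, 51, 51, 83, 53]
experimental = [88, 69, 81, 76, 40, 51, 66, 49, 55, 76, 56, 59, 91, 43, 55, 54, 84, 57]
negative = [84, 56, 74, 66, 12, 28, 36, 26, 33, 62, 40, 45, 88, 19, 35, 34, 76, 40]


def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equally long score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


r_tr_exp = pearson_r(traditional, experimental)
r_tr_neg = pearson_r(traditional, negative)
r_exp_neg = pearson_r(experimental, negative)

print(f"Tr-Exp  r = {r_tr_exp:.4f}")
print(f"Tr-Neg  r = {r_tr_neg:.4f}")
print(f"Exp-Neg r = {r_exp_neg:.4f}")
```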

Page 10

Qualitative Analysis

• 5.5% of subjects: 48%-49% with the Tr. Method, >=50% with the Exp. Method

• 16.6% of subjects: 51% with the Tr. Method, secure with the Exp. Method

• 89% of subjects: higher score with the Exp. Method

• 11% of subjects: same score with both methods
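These shares can be recovered by counting subjects in the score lists of the previous slide. The sketch below is illustrative only: it assumes the 18 traditional and experimental percentages as read from that slide, and the variable names are hypothetical.

```python
# Scores in percent, subjects 1-18, as read from the comparison slide.
traditional = [86, 68, 79, 75, 36, 49, 63, 46, 55, 72, 51, 56, 91, 35, 51, 51, 83, 53]
experimental = [88, 69, 81, 76, 40, 51, 66, 49, 55, 76, 56, 59, 91, 43, 55, 54, 84, 57]

n = len(traditional)
pairs = list(zip(range(1, n + 1), traditional, experimental))

# Subjects whose experimental score is higher than, or equal to, the traditional one.
higher = [s for s, tr, ex in pairs if ex > tr]
same = [s for s, tr, ex in pairs if ex == tr]

# Subjects just below the 50% mark traditionally who reach it experimentally.
borderline = [s for s, tr, ex in pairs if 48 <= tr <= 49 and ex >= 50]

# Subjects sitting at exactly 51% under traditional scoring.
at_51 = [s for s, tr, ex in pairs if tr == 51]

print(f"higher with Exp.: {len(higher)}/{n} = {len(higher) / n:.1%}")
print(f"same with both:   {len(same)}/{n} = {len(same) / n:.1%}")
print(f"48-49% -> >=50%:  subjects {borderline} = {len(borderline) / n:.1%}")
print(f"51% traditional:  subjects {at_51} = {len(at_51) / n:.1%}")
```

With these lists, 16 of 18 subjects score higher and 2 score the same, which matches the 89% and 11% shown on the slide.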

Page 11

More … Qualitative Analysis

• All subjects scored less when corrected with negative scoring

• 5 of those corrected with negative scoring would score considerably (>=20%) lower than with traditional or experimental scoring

• All 5 subjects in the above category scored <=55%
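The first two observations can be checked against the score lists of the comparison slide. This is a sketch under assumptions: the percentages are read from that slide, and "considerably lower" is taken to mean a drop of at least 20 percentage points from the traditional score, which is one possible reading of the bullet.

```python
# Scores in percent, subjects 1-18, as read from the comparison slide.
traditional = [86, 68, 79, 75, 36, 49, 63, 46, 55, 72, 51, 56, 91, 35, 51, 51, 83, 53]
experimental = [88, 69, 81, 76, 40, 51, 66, 49, 55, 76, 56, 59, 91, 43, 55, 54, 84, 57]
negative = [84, 56, 74, 66, 12, 28, 36, 26, 33, 62, 40, 45, 88, 19, 35, 34, 76, 40]

# Every subject scores lower under negative scoring than under either other mode.
all_lower = all(
    neg < tr and neg < ex
    for tr, ex, neg in zip(traditional, experimental, negative)
)

# Assumed threshold: a drop of 20 or more points relative to traditional scoring.
DROP_THRESHOLD = 20
large_drop = [
    subject
    for subject, (tr, neg) in enumerate(zip(traditional, negative), start=1)
    if tr - neg >= DROP_THRESHOLD
]

print("all lower with negative scoring:", all_lower)
print("subjects with a drop of >= 20 points:", large_drop)
```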

Page 12

Correlation of Answers of Experimental Pattern

Pearson r = -0.310272. This indicates a tendency that those who score high select few totally wrong answers.

Pearson r = -0.53534. This indicates that those who score high select few very likable distractors.

Pearson r = 0.35228. This indicates a tendency that those who select very likable distractors also select irrelevant ones.

Page 13

[Chart: correct, very likable, likable, and irrelevant answers per subject (subjects 1-18; vertical axis 0-80)]

Page 14

Conclusions

• Reward intermediate levels of performance, particularly when v. suitable items are selected.

• Increase reliability and efficiency in scoring compared to dichotomous scoring (in agreement with Bennett, Wongwiwatthananukit, and Popovich (2000)).

• Make score outcomes more reflective of student knowledge compared to dichotomous scoring.

• This would give us a more precise and individualized language proficiency measurement.

Page 15

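The Proposal

• Partial Credit Scoring for (2-1-0-0) framework: 2a + 1b

• For those who wish to include negative scoring, in Maple language:

    languagetest := proc(A, B, G, D)
      # A, B, G, D: presumably the correct, very likable, likable, and irrelevant answers
      local a, b, g, d, e;
      a := A; b := B; g := G; d := D;
      if a < 40 then
        return 0;        # the slide's bare "0", read here as an early return
      elif b > 20 then
        e := b - 20;
        d := d - e;
      end if;
      2*a + 1*b - d;
    end proc: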
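For readers without Maple, the same scoring rule can be sketched in Python. This is an illustrative transcription, not the authors' implementation: the function names are hypothetical, a, b, g and d are assumed to be the numbers of correct, very likable, likable and irrelevant answers, and the slide's bare 0 is read as "return a score of 0 when a is below 40".

```python
def partial_credit_score(a: int, b: int) -> int:
    """(2-1-0-0) framework: 2 credits per correct answer, 1 per very likable one."""
    return 2 * a + 1 * b


def score_with_negative(a: int, b: int, g: int, d: int) -> int:
    """Transcription of the slide's procedure with negative scoring.

    a, b, g, d are assumed to count correct, very likable, likable and
    irrelevant answers; g is accepted but unused, as on the slide.
    """
    if a < 40:
        return 0
    if b > 20:
        # As on the slide: the excess of very likable answers over 20
        # reduces the penalty taken from irrelevant answers.
        e = b - 20
        d = d - e
    return 2 * a + 1 * b - d


print(partial_credit_score(50, 10))
print(score_with_negative(50, 10, 10, 10))
print(score_with_negative(30, 10, 20, 20))
```

The branches are kept exactly as they appear on the slide, including the case where the adjusted d becomes negative; whether that case should be capped at 0 is left open here.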

Page 16

The Proposal

• In CAT it is possible to:

1. Offer more items of the same level when a very likable / suitable distractor is selected

2. Shift to an in-between level before shifting level

3. Use the code presented in the previous slide at the end of the test before offering the final verdict
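One way to picture the first two options is a level-selection rule that reacts to the rating of the chosen option. The sketch below is hypothetical throughout: the slide does not say which responses trigger which move, so the mapping (stay on a very likable choice, half a level down on a likable one, a full level down on an irrelevant one), the half-level step, and the 1-6 level range are all assumptions made for illustration.

```python
# Assumed mapping from the rating of the selected option to a level change.
LEVEL_STEP = {
    "correct": 1.0,        # move up one full level
    "very likable": 0.0,   # stay: offer more items of the same level
    "likable": -0.5,       # shift to an in-between level first
    "irrelevant": -1.0,    # move down one full level
}


def next_level(level: float, response: str,
               lowest: float = 1.0, highest: float = 6.0) -> float:
    """Return the level of the next item, clamped to the available range."""
    proposed = level + LEVEL_STEP[response]
    return min(highest, max(lowest, proposed))


level = 3.0
for response in ["very likable", "likable", "correct", "irrelevant"]:
    level = next_level(level, response)
    print(response, "->", level)
```

A dichotomous CAT would only distinguish "correct" from everything else; the extra rows in the mapping are where the degrees of incorrectness enter the adaptive loop.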

Page 17

Future Hypothesis

• Does awareness of partial credit scoring increase test takers’ responsibility in answering the items?

• Does it change test takers’ attitude towards more responsible responses?

Page 18

Concluding Remark

• Bachman (1990:280) and Spolsky (1981) address the ethical considerations of test use, questioning whether language testers have enough evidence to be sure of the decisions made on the basis of test scores.

• We would like to think that the proposed scoring technique provides supportive evidence in this direction.