
Designing Integrated Oral Assessment Group Tasks for Classroom Use: A Practical Approach

Hale Kızılcık & Deniz Şallı-Çopur

Outline
}  Key Concepts
}  Procedure of Test Development
}  Q&A

Classroom Assessment

Angelo, T.A. & Cross, K.P. (1993). Classroom assessment techniques (2nd ed.). San Francisco: Jossey-Bass.
Steadman, M. (1998). Using classroom assessment to change both teaching and learning. New Directions for Teaching and Learning, 75, 23-35.

}  is learner-centered, teacher-driven, mutually beneficial, formative, context-specific, ongoing and rooted in good practice (Angelo & Cross, 1993)

}  provides information on what, how much and how well the students are learning (Angelo & Cross, 1993)

}  improves both teaching and learning (Steadman, 1998)

The Spectrum of Classroom Assessments (Gareis & Grant, 2015)

Reading facial expressions → Oral Q&A → Paper & pencil quizzes and tests → Essays → Performance assessments → Project-based assessments

Gareis, C.R. & Grant, L.W. (2015). Teacher-made assessments: How to connect curriculum, instruction and student learning (2nd ed.). New York: Routledge.
McMillan, J.H. (2007). Classroom assessment: Principles and practices for effective standards-based instruction (4th ed.). USA: Pearson.

}  Authentic, integrated & involving. (McMillan, 2007)

Integrated Assessment
}  Integration of skills (Reading-into-Speaking)
}  Authentic
}  Scaffolding for productive skills (vocabulary, content, and the like)
}  Positive backwash (Cumming, 2013)

}  Deeper understanding of the texts
}  Appropriate handling of source materials

Cumming, A. (2013) Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10, 1-8.

Challenges of Integrated Assessment (Cumming, 2013)
}  Construct definition is problematic
}  Difficult to prepare and score
}  Suitable only for students at a certain threshold level of proficiency

Cumming, A. (2013) Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10, 1-8.

Assessing in Groups
}  More than 2 students
}  Real-life
}  Time saving

Challenges of Assessing in Groups
}  Highly unpredictable
}  The quality of the discussion is highly susceptible to the performance of the individuals and the group dynamics:
}  Students’ proficiency levels
}  Personality
}  Communication styles
}  Relationship between/among them

Kızılcık, H. & Şallı-Çopur, D. (2016). Integrating dynamic assessment principles in testing discussion skills: Operationalizing theory. Paper presented in the IATEFL Testing, Evaluation and Assessment SIG Conference: From Theory into Practice-Assessment Literacy in the Classroom, Aigle, Switzerland.

Procedure

1.  Identify Target Language Use
2.  Initial Task Design
3.  Defining the Construct
4.  Scale Development
    a.  CEFR & More
    b.  Using Performance Samples
5.  Piloting (Larger Sample)
6.  Validation
7.  Test Specifications

The Context of the Sample Study

✓  International English-Medium University
✓  English at Freshman Level: ENG 102 English for Academic Purposes I

1. Identify Target Language Use (TLU)

}  What kinds of speaking are students engaged in within the TLU situation?

}  Needs Analysis }  Literature Search

1. Identify Target Language Use

✓  Seminars & Discussions

Integrated Reading-Speaking Task in University Setting (de Chazal, 2014, p.243)

Communicative Event: Seminar/Discussion. Characteristics, Implications, and Opportunities for Speaking:

Spontaneity: offering in-discussion contributions and questions

Content Knowledge: contributing original content arising from preparation and reading, and extending knowledge through engagement with other participants’ contributions

Language & Skill Development: trying out new language, including functional language like asking for clarification

Criticality: critiquing the arguments and ideas of other participants, and asking and answering critical questions

Integration: referring to source texts, including lectures

Academic Credibility: developing academic credibility through rigorous responses (i.e. supported by sound evidence) and questioning

de Chazal, E. (2014). English for academic purposes. Oxford: Oxford University Press.

2. Designing the Task: Task Prompt Specifications

Language of instructions/ channel

Duration of the test

Number of assessors

Recorded

Flexibility of task frame

Flexibility of interlocutor frame

Specification of content

Interaction type

Discourse mode (genre)

Audience

Topic

Planning time

Setting

Grouping

CoE (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. Retrieved https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d

2. Designing the Task: Task Prompt Specifications (Our Task)

Language of instructions/channel: English, same as level of test; written prompt

Duration of the test: Approximately 20 minutes

Number of assessors: 1 (the teacher)

Recorded: See guidelines for recording

Flexibility of task frame: Partially controlled

Flexibility of interlocutor frame: Open-ended; partially controlled

Specification of content: Specified

Interaction type: Dialogue; grouped test takers

Discourse mode (genre): Small group discussion

Audience: Other students

Topic: The Changing Face of Power

Planning time: 5 minutes before discussion for arrangement

Setting: Educational

Grouping: See the guidelines for grouping

CoE (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. Retrieved https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d

2. Designing the Task: Sample Task

Group Discussion Task
Date: April 24, 2017
Duration: 15-20 minutes
Grouping: Groups of 3-4
Source: ENG 102 Coursebook (Gülcü, M., Gülen, G., Şeşen, E. & Tokdemir, G. (2015). The Compass: Route to Academic English 2. Nüans Publishing.)

-  Read the text “The Changing Face of Power” in the ENG 102 coursebook (pp. 110-112) to identify the main ideas and key details/examples in the text. Note down what you personally think of the views discussed in the text, and make a list of the points you would like to discuss or elaborate on in the discussion group in class.

-  Annotating the main ideas & key details, and answering the comprehension questions in the book, will help you prepare for the class discussion.

3. Defining the Construct

}  Reading-into-Speaking

}  “Construct definition needs to be operational… and it needs to be associated with things that can be observed and... scored” (Fulcher, 2003, p. 18).

Fulcher, G. (2003). Testing second language speaking. Harlow: Pearson Education Limited.

3. Defining the Construct: Reading

A two-model account of reading comprehension (Grabe, 2009; Grabe & Stoller, 2013; Weigle, Yang & Montee, 2013)

Text model of comprehension
- identifying main arguments
- paraphrasing ideas
- summarising paragraphs
- identifying relationships among ideas

Situation model of reader interpretation
- evaluating information
- connecting to background knowledge
- showing emotion
- taking a stance
- interpreting the author’s tone

Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge University Press.
Grabe, W. & Stoller, F. L. (2013). Teaching and researching reading (2nd ed.). Harlow: Longman.
Weigle, S. C., Yang, W. & Montee, M. (2013). Exploring reading processes in an academic reading test using short answer questions. Language Assessment Quarterly, 10(1), 28-48.

3. Defining the Construct: Speaking as Interaction

“It is recognized in education that language is used to think about a subject and to talk about that thinking in a dynamic co-constructive process” (CoE, 2018, p. 117).

“Communication is a co-construction. In assessment, should we not take the interlocutor into account in our predictions of successful communication? But how can that be done?” (McNamara, 2000, pp. 84-85).

“The focus of attention is on individual performance rather than joint competence” (Walsh, 2011, p. 159).

CoE. (2018). CEFR companion volume with new descriptors. Retrieved from www.coe.int/lang-cefr
McNamara, T. (2000). Language testing. Oxford: Oxford University Press.
Walsh, S. (2011). Exploring classroom discourse: Language in action. New York: Routledge.

3. Defining the Construct: Working Definition

}  (1) reading a text in order to comprehend the main and supporting ideas, and to interpret the information in the text incorporating background knowledge, attitudes and beliefs;

}  (2) engaging in a group discussion collaborating to deepen initial understanding of the text by outlining the main and supporting ideas, and elaborating on initial interpretations.

}  (1-2) Skills to be addressed are: text comprehension, text interpretation, interaction, and language use.

4. Scale Development

}  «Each level (or band) in the rating scale is characterized by a verbal descriptor which, taken together, constitute the operational definition of the construct that the test developer claims to be assessing» (Fulcher, 2012, referring to Fulcher, 1996; Davies et al., 1999).

Fulcher, G. (2012). Scoring performance tests. In G. Fulcher & F. Davidson. The Routledge handbook of language testing. (pp. 378-392). London: Routledge.

4. a. Scale Development: CEFR (2001) Scales

1. Turntaking strategies

2. Cooperating strategies

3. Asking for clarification

4. Fluency

5. Flexibility

6. Coherence

7. Thematic development

8. Precision

9. Socio-linguistic competence

10. General range

11. Vocabulary range

12. Grammatical accuracy

13. Vocabulary control

14. Phonological control

Suggested Categories for Oral Assessment in the CEFR (2001, p.193)

CoE (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459f97

4. a. Scale Development: CEFR Scales

CoE. (2018). CEFR companion volume with new descriptors. Retrieved from www.coe.int/lang-cefr

4. a. Scale Development

Test Focus: Integrated Skills

4. a. Scale Development: CEFR Manual (2009)

Test Focus: Integrated Skills

Suggested Scales from the CEFR Manual (2009): Processing Text; Comprehension; Written Production

CoE (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d

4. a. Scale Development: CEFR Manual (2009)

Test Focus: Integrated Skills

Suggested Scales from the CEFR Manual (2009 > 2001): Processing Text (p.96); Comprehension (pp.66 & 69); Written Production (p.61)

Our Suggestion:

Comprehension: Text Model of Comprehension; Situation Model of Interpretation

Interaction: Overall spoken interaction (p.74); Formal Discussions (p.78); Goal-Oriented Cooperation (p.79); Turn taking/Taking the floor (Initiation, Interruption) (pp.86 & 124); Cooperating (p.86); Asking for clarification (p.87); Monitoring & Repair (p.65); Sociolinguistic appropriateness (p.122); Planning (dependent on rehearsed speech) (p.64)

Language & Discourse Management: Accuracy and range (grammar & vocabulary, pp.112 & 114); Phonological control (p.117); Flexibility (p.124); Compensating (p.64); Propositional precision (p.129); Spoken Fluency (p.219); Coherence & Cohesion (p.125)

CoE. (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. Retrieved: https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459f97 CoE. (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. Retrieved from https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d

4. a. Scale Development: CEFR Companion (2018)

Test Focus: Integrated Skills

Suggested Scales from the CEFR Companion (2018): Mediation Activities (Mediating a text; Mediating concepts); Mediation Strategies

Our Suggestion (with a pinch of salt):

Comprehension: Processing text in speech (akin to reading for information and argument in the CEFR); Expressing a personal response to creative texts; Analysis and criticism of creative texts

Interaction: Collaborating in a group (Facilitating collaborative interaction with peers; Collaborating to construct meaning); Leading group work (Managing interaction; Encouraging conceptual talk)

Language & Discourse Management: Strategies to explain a new concept (Linking to previous knowledge; Breaking down complicated information); Strategies to simplify a text (Adapting language; Amplifying a dense text; Streamlining a text)

CoE. (2018). CEFR companion volume with new descriptors. Retrieved from www.coe.int/lang-cefr

«The mediation descriptors are particularly relevant for the classroom in connection with small group, collaborative tasks» (p.34)

4. b. Developing the Scale

}  Holistic or Analytical Scale
}  Intuitive or Empirical Approaches (Carr, 2011, p. 136; CEFR, 2001; Fulcher, 2010)

}  Student-Performance Samples
}  Recording
}  Transcription
}  Analysis

Carr, N. T. (2011). Designing and analyzing language tests. Oxford: Oxford University Press.
CoE. (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. Retrieved: https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459f97
Fulcher, G. (2010). Practical language testing. London: Hodder Education.

4. b. Developing the Scale: Template for Transcription Analysis

Columns: Turn | Time | Exchange | Analysis & Comments

For each turn, the Analysis & Comments column records:
- Name of the student
- Comprehension
- Interaction
- Language (Grammar, Vocabulary & Discourse Management)
- Fluency
- Comments

Utterances that are verbally responded to, or clearly spoken words that are available to be responded to (Leaper & Brawn, 2018)

Leaper, D. A. & Brawn, J.R. (2018). Detecting development of speaking proficiency with a group oral test: A quantitative analysis. Language Testing. https://doi.org/10.1177/0265532218779626
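The template above maps naturally onto a simple record type. A minimal sketch in Python (the `Turn` class and its field names are our own illustration, not part of the template or of Leaper & Brawn's coding scheme):

```python
from dataclasses import dataclass, field

@dataclass
class Turn:
    """One row of the transcription-analysis template."""
    number: int            # Turn
    time: str              # Time, e.g. "6:07"
    speaker: str           # Name of the student
    exchange: str          # Utterance(s) as transcribed
    analysis: dict = field(default_factory=dict)  # Comprehension, Interaction, Language, Fluency
    comments: str = ""

# Example row modelled loosely on turn 27 of the sample excerpt below
turn27 = Turn(
    number=27,
    time="6:07",
    speaker="Seda",
    exchange="I mean first one was the traditional view you mentioned ...",
    analysis={
        "Comprehension": "Text model; situation model of interpretation",
        "Interaction": "Refers to a previous argument, addressing the contributor",
        "Fluency": "Fluent with false starts that do not break the flow",
    },
)
```

Collecting a list of such records per discussion makes it easy to pull out, say, every turn annotated for a given interaction move when drafting descriptors.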

4. b. Developing the Scale: Sample Excerpt

Turn 27 (6:07) Seda: I mean first one was the traditional view you mentioned /Osman: Hmm/ the hard power ahm… because people thought there was a general impression that a… how successful you are in a war shows that how much power you have. Of course today is more… we are living in a ge… age of information and technology and that’s why the raw material losing its value in terms of power… gaining power

Comprehension: Text-Model of Comprehension; Situation-Model of Interpretation
Interaction: S1 <-> S2; referring to a previous argument, addressing the contributor
Language: Good command of language
Fluency: Fluent with false starts that do not break the flow
Comments: Has this turned into a dialogue between Seda & Osman? Does this explain what Tuğçe said? Are they so harmonious that the others find it difficult to enter the discussion?

Turn 28 (6:38) Osman: But they are still needed.

Comprehension: Situation-Model of Interpretation
Interaction: Challenges Seda’s raw material argument, thus providing an explanation preventing a possible misunderstanding (prevents loose ends)

Turn 29 (6:40) Seda: Of course (0.2)

Comprehension: Situation-Model of Interpretation
Interaction: S1 <-> S2; Uptake (content)

4. b. Developing the Scale: Skills and Functions of Interest

From the performance samples:
• Comprehension
  •  Text Model of Comprehension
  •  Situation Model of Interpretation
• Interaction
  •  Initiation
  •  Expansion
  •  Repetition (Planning)
  •  Repair
  •  Interruption
  •  Responding to ideas (cooperating)
  •  Scaffolding (cooperating)
  •  Asking for clarification
• Language
  •  Accuracy and complexity
  •  Fluency
  •  Discourse Management

From the CEFR scales:
• Comprehension
  •  Text Model of Comprehension
  •  Situation Model of Interpretation
• Interaction
  •  Turn taking/Taking the floor (Initiation, Interruption)
  •  Cooperating
  •  Asking for clarification
  •  Monitoring & Repair
  •  Socio-linguistic appropriateness
  •  Planning (dependent on rehearsed speech)
• Language & Discourse Management
  •  Accuracy and range
  •  Flexibility
  •  Compensating
  •  Propositional precision
  •  Spoken Fluency
  •  Coherence & Cohesion

4. Developing the Scale: Interpretation of the Scores

Successful Performance (5-4): The performance is successful. It provides satisfactory evidence showing that the student can effectively participate in group discussions of the assigned texts.

Average Performance (3): The performance is somewhat satisfactory. It provides evidence that the student can participate in group discussions of assigned texts with some difficulty. There is room for further practice.

Unsatisfactory Performance (2-1): The performance is not satisfactory. It provides evidence that the student has not yet developed the skills to participate in group discussions of the assigned texts. There is room for remedial work.
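The banding above can be expressed as a small lookup. A sketch, assuming integer ratings from 1 to 5 as in the scale (the function name is ours):

```python
def interpret_score(score: int) -> str:
    """Map a 1-5 rating to the band labels used in the scale above."""
    if score not in range(1, 6):
        raise ValueError("score must be an integer from 1 to 5")
    if score >= 4:
        return "Successful Performance"
    if score == 3:
        return "Average Performance"
    return "Unsatisfactory Performance"
```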

4. Developing the Scale: Fluency

Successful Performance (5-4): Fluent with some natural false starts that do not break the flow. (5) Generally fluent both in short and long turns, but may occasionally lose fluency, especially when there is rapid topic change. (4)

Average Performance (3): Fluent in shorter turns but hesitant in longer turns, especially when there is rapid topic change.

Unsatisfactory Performance (2-1): Plenty of false starts and hesitations, making it difficult to follow. (2) Long, unnatural pauses that strain the listener. (1)

4. Scale Development: The Scale

Initiates and expands the discussion by asking questions, providing examples, and/or reflecting different perspectives. Effectively responds to ideas, comments and questions from the group members, leaving no loose ends.

4. Scale Development: The Scale

Demonstrates a jagged performance characterized by some hesitations in longer turns especially when searching for words or gathering thoughts but by fluent delivery at other times. Uses intonation accurately most of the time.

4. Scale Development: Checklist (rate each item √ / √-X / X)

1. Are the descriptors positively worded? (CEFR, Carr)

2. Do the descriptors avoid vagueness, enabling raters to make concrete distinctions between the steps (no over-reliance on qualifiers like “some, most, etc.” to distinguish levels, which may be interpreted differently by raters)? (CEFR, Carr)

3. Are the descriptors written in simple syntax with an explicit, logical structure? (CEFR)

4. Are the descriptors free from jargon? (CEFR)

5. Are the descriptors brief (fewer than 25 words is the desired limit)? (CEFR, Carr)

6. Are the descriptors independent of the descriptors in other bands (e.g. not “as accurately as students at Level Y”)?

7. Do the descriptors provide a verbal description of performance levels (not 1, 2, 3 and so on in each category)? (Carr)

8. Do the descriptors include the extent to which a listener or reader would find something clear or unclear, would be distracted by errors, or would have difficulty understanding due to errors? (Carr)

9. Do the descriptors include a feasible number of categories (4 to 5, not more than 7)? (CEFR)

10. Are the descriptors parallel across levels? (If a certain aspect of the response is addressed in the descriptors for one score band, it should be addressed in other bands as well.) (Carr)

11. Can the descriptors be used for continuous teaching assessment and/or self-assessment? (CEFR)

Carr, N. T. (2011). Designing and analysing language tests. Oxford: OUP.
CoE. (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. Retrieved: https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459f97
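A few of the checklist items above are mechanical enough to screen automatically, e.g. brevity (item 5), verbal rather than purely numeric description (item 7), and over-reliance on vague qualifiers (item 2). A rough sketch; the function and its thresholds are our own, and only the 25-word limit comes from the checklist:

```python
def check_descriptor(descriptor: str, max_words: int = 25) -> list[str]:
    """Flag mechanical problems with a rating-scale descriptor."""
    problems = []
    words = descriptor.split()
    if len(words) >= max_words:  # item 5: fewer than 25 words
        problems.append(f"too long: {len(words)} words")
    if not any(w.isalpha() for w in words):  # item 7: verbal, not purely numeric
        problems.append("no verbal description")
    vague = {"some", "most", "many", "few"}
    hits = [w for w in words if w.lower().strip(",.") in vague]
    if len(hits) > 1:  # item 2: over-reliance on vague qualifiers
        problems.append("relies on vague qualifiers: " + ", ".join(hits))
    return problems
```

Items such as positive wording or parallelism across bands still need a human rater; a screen like this only catches the surface problems before moderation meetings.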

5. Piloting: Larger Sample

}  Polish the Task & Scale

6. Validation: Task & Scale

The process of test validation is ongoing, involving an accumulation of evidence supporting the use of a test from a variety of perspectives (Messick, 1989, in Weigle, 2004).

}  Validation 1 (Luoma, 2004):
}  Documenting the test development process, starting from TLU through to developing the rating scale
}  Test specifications

}  Validation 2 (Luoma, 2004): Student feedback & backwash

Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press. Weigle, S. C. (2004). Integrating reading and writing in a competency test for non-native speakers of English. Assessing Writing, 9, 27-55.

6. Validation: Student Evaluation of the Task

 

1. The task encouraged me to read the assigned texts.

2. The task encouraged me to paraphrase and summarize the main ideas and key details in the text.

3. Discussing the texts with my friends helped me to better understand the texts.

4. The task provided me with the opportunity to interpret the texts using my perspective/experience/background knowledge.

5. The task encouraged me to listen and respond to my friends’ contributions.

6. I enjoyed exchanging ideas with my friends.

7. The task encouraged me to practice spontaneous speaking skills.

8. My language proficiency was sufficient to complete the task.
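If the evaluation items above are administered on a 5-point Likert scale (1 = strongly disagree, 5 = strongly agree; the slides do not specify the response format), per-item means give a quick first look at the feedback. A sketch with invented numbers:

```python
from statistics import mean

# Hypothetical responses: one list of 1-5 ratings per questionnaire item
responses = {
    "1. The task encouraged me to read the assigned texts.": [5, 4, 4, 5],
    "8. My language proficiency was sufficient to complete the task.": [3, 4, 2, 4],
}

item_means = {item: round(mean(scores), 2) for item, scores in responses.items()}
for item, m in item_means.items():
    print(f"{m:.2f}  {item}")
```

Low means on an item such as 8 would point to the proficiency-threshold problem Cumming (2013) raises for integrated assessment.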

Procedure

7. Test Specifications: Council of Europe (2009, pp. 153-179)

1. General Information
}  General Statement of Purpose (needs analysis report, TLU, purpose, learner profile)
}  Integration of skills
}  Construct definition
}  Target level performance

2. Task Prompt
}  Language of instructions
}  Duration of the test
}  Number of assessors
}  Recorded
}  Flexibility of task frame
}  Flexibility of interlocutor frame
}  Specification of content
}  Interaction type
}  Discourse mode (genre)
}  Audience
}  Topic
}  Planning time
}  Setting
}  Grouping

3. Texts
}  Text source
}  Genre
}  Discourse type/rhetorical function
}  Mode of input
}  Domain
}  Topics/themes
}  Nature of content
}  Text familiarity
}  Readability
}  Text length
}  Vocabulary
}  Grammar
}  Text likely to be comprehensible by a learner at the target CEFR level

4. Response (the expected spoken response elicited by the task)
}  Length of response
}  Rhetorical functions
}  Text purpose
}  Register
}  Domain
}  Grammatical competence expected
}  Lexical competence expected
}  Discoursal competence expected
}  Authenticity: situational
}  Authenticity: interactional
}  Cognitive processing
}  Content knowledge required

5. Rating of Task
}  Rating method
}  Assessment criteria
}  Number of raters
}  Use of moderator

6. Feedback to Test Taker
}  Quantitative feedback
}  Qualitative feedback

CoE (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d

Conclusion

Image adapted from: http://www.tmdu-global.jp/events/events/201706/d-cafe_20170708.html

Thank You!

Hale Kızılcık & Deniz Şallı-Çopur khale@metu.edu.tr dsalli@metu.edu.tr

References

Angelo, T. A. & Cross, K. P. (1993). Classroom assessment techniques: A handbook for college teachers (2nd ed.). San Francisco: Jossey-Bass.
Carr, N. T. (2011). Designing and analysing language tests. Oxford: OUP.
Council of Europe (CoE). (2001). Common European Framework of Reference for languages: Learning, teaching, assessment. Retrieved from https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680459f97
Council of Europe (CoE). (2009). Relating language examinations to the Common European Framework of Reference for languages: Learning, teaching, assessment (CEFR). Manual. Strasbourg: Council of Europe. Retrieved from https://rm.coe.int/CoERMPublicCommonSearchServices/DisplayDCTMContent?documentId=0900001680667a2d
Council of Europe (CoE). (2018). CEFR companion volume with new descriptors. Retrieved from www.coe.int/lang-cefr
Cumming, A. (2013). Assessing integrated writing tasks for academic purposes: Promises and perils. Language Assessment Quarterly, 10, 1-8.
de Chazal, E. (2014). English for academic purposes. Oxford: Oxford University Press.
Fulcher, G. (2003). Testing second language speaking. Harlow: Pearson Education Limited.
Fulcher, G. (2010). Practical language testing. London: Hodder Education.
Fulcher, G. (2012). Scoring performance tests. In G. Fulcher & F. Davidson (Eds.), The Routledge handbook of language testing (pp. 378-392). London: Routledge.
Gareis, C. R. & Grant, L. W. (2015). Teacher-made assessments: How to connect curriculum, instruction, and student learning (2nd ed.). New York: Routledge.
Grabe, W. (2009). Reading in a second language: Moving from theory to practice. New York: Cambridge University Press.
Grabe, W. & Stoller, F. L. (2013). Teaching and researching reading (2nd ed.). Harlow: Longman.
Kızılcık, H. & Şallı-Çopur, D. (2016). Integrating dynamic assessment principles in testing discussion skills: Operationalizing theory. Paper presented at the IATEFL Testing, Evaluation and Assessment SIG Conference: From Theory into Practice - Assessment Literacy in the Classroom, Aigle, Switzerland.
Leaper, D. A. & Brawn, J. R. (2018). Detecting development of speaking proficiency with a group oral test: A quantitative analysis. Language Testing. https://doi.org/10.1177/0265532218779626
Luoma, S. (2004). Assessing speaking. Cambridge: Cambridge University Press.
McMillan, J. H. (2007). Classroom assessment: Principles and practices for effective standards-based instruction (4th ed.). USA: Pearson.
McNamara, T. (2000). Language testing. Oxford: Oxford University Press.
Steadman, M. (1998). Using classroom assessment to change both teaching and learning. New Directions for Teaching and Learning, 75, 23-35.
Walsh, S. (2011). Exploring classroom discourse: Language in action. New York: Routledge.
Weigle, S. C. (2004). Integrating reading and writing in a competency test for non-native speakers of English. Assessing Writing, 9, 27-55.
Weigle, S. C., Yang, W. & Montee, M. (2013). Exploring reading processes in an academic reading test using short answer questions. Language Assessment Quarterly, 10(1), 28-48.
