2 -reference questions
7/29/2019 2 -Reference Questions
Reference questions
I - Testing writing:
1 - What should we take into account to make sure that writing tasks are representative and valid?
Content validity
People will simply be better at some tasks than others. So, if we aren't able to
include every task and happen to choose just the task or tasks that a candidate
is particularly good or bad at, then the outcome is likely to be very different.
Try to select a representative set of tasks.
The more tasks (within reason) that we set, the more representative a picture of a
candidate's ability we obtain.
It should also be remembered that if a test includes a wide-ranging and
representative sample of specifications, the test is more likely to have a
beneficial backwash effect.
How much accuracy matters depends on the stakes: it is less important when
moving students from one class to another, but it is important for
overseas students, where more samples are required.
2 - How can you ensure valid and reliable scoring?
Set as many separate tasks as is feasible
Need to include a representative sample of the specified content.
People's performance, even on the same task, is unlikely to be perfectly
consistent. Therefore offer as many fresh starts as possible; each task can
represent a fresh start. In this way we achieve greater reliability and so greater validity.
Test only writing ability, and nothing else
For the sake of validity, tasks should not measure creativity, imagination, or
script-writing ability.
Reading ability can interfere with the accurate measurement of writing.
Candidates are expected to be able to read simple instructions (ensure this).
Instructions should not be too long.
Instead, make use of illustrations.
A series of pictures can be used to elicit a narrative
Ensure valid and reliable scoring
Set tasks which can be reliably scored
A number of the suggestions made to obtain a representative performance will also
facilitate reliable scoring.
Set as many tasks as possible
The more scores for each candidate, the more reliable should be the total score.
Restrict candidates
Candidates should know just what is required of them, and they should not be
allowed to go too far astray.
Provide information in the form of notes (or pictures)
Tasks should not only fit well with the specifications, but they should also be
made as authentic as possible.
The greater the restrictions imposed on the candidates, the more directly
comparable will be the performances of different candidates.
Give no choice of tasks
Making the candidates perform all tasks also makes comparisons between
candidates easier.
Ensure long enough samples
The samples of writing that are elicited have to be long enough for judgments to
be made reliably (to obtain reliable information on students' organizational ability
in writing, the pieces have to be long enough for organization to reveal itself).
This has to be achieved within the fixed period of time set for the test.
Create appropriate scales of scoring:
Holistic scoring
Instead of being scored only once, each student's work is scored by four trained
scorers, which gives high scorer reliability (research supports this).
Not every scoring system will give equally valid and reliable results in every
situation. The system has to be appropriate to the level of the candidates and
the purpose of the tests.
Testers have to be prepared to modify existing scales to suit their own purposes.
What we decide must depend in part on the purpose of the assessment.
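The link between multiple scorers and higher scorer reliability can be illustrated with a small sketch. It is not taken from the notes: it assumes the standard Spearman-Brown formula and assumes that all scorers are equally reliable. The function names and the example figures are hypothetical.

```python
def spearman_brown(single_scorer_reliability: float, n_scorers: int) -> float:
    """Estimated reliability of the combined score from n_scorers scorers.

    Assumption (not from the notes): scorers are interchangeable and each
    has the same reliability, so the Spearman-Brown formula applies.
    """
    r = single_scorer_reliability
    return n_scorers * r / (1 + (n_scorers - 1) * r)


def holistic_score(ratings: list[float]) -> float:
    """Combine the holistic ratings of several scorers by averaging them."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


# Hypothetical example: four trained scorers rate one script on the same scale.
ratings = [4, 5, 4, 3]
print(holistic_score(ratings))

# Hypothetical reliability of 0.5 for a single scorer, compared with four scorers.
print(spearman_brown(0.5, 1))
print(spearman_brown(0.5, 4))
```

Under these assumptions the estimated reliability rises as scorers are added, which is the pattern the notes describe for four trained scorers.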
Analytic scoring.
A separate score is given for each of a number of aspects of a task: Grammar, Vocabulary, Mechanics.
Advantages:
It disposes of the problem of uneven development of subskills in individuals.
Scorers are compelled to consider aspects of performance which they might
otherwise ignore
The very fact that the scorer has to give a number of scores will tend to make
the scoring more reliable.
Although a halo effect may mean that the aspects are not judged fully independently,
the mere fact of having several shots at assessing the student's
performance should lead to greater reliability.
The main disadvantage of the analytical method is the time it takes. Scoring will
take longer than with the holistic method.
A second disadvantage is that concentration on the different aspects may
divert attention from the overall effect of the piece of writing.
The choice, holistic or analytic, depends in part on the purpose of the testing.
If diagnostic information is required directly from the ratings given, then analytic
scoring is essential.
Whichever is chosen should reflect the particular purpose of the test and the form the
reported scores on it will take. The chosen scales need to be adapted for the
situation in which they are to be used.
Scales tell candidates: "These are the criteria by which we will judge you." Candidates
need to be aware of them (backwash effect).
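Analytic scoring as described above can be sketched as follows. This is only an illustration: the three aspects come from the notes, while the 1 to 6 scale, the function name, and the returned fields are hypothetical choices.

```python
# Aspects named in the notes; a real scale may include others.
ASPECTS = ("grammar", "vocabulary", "mechanics")


def analytic_score(ratings: dict, max_per_aspect: int = 6) -> dict:
    """Return a per-aspect profile, a total, and the weakest aspect.

    Assumption (hypothetical): each aspect is rated on an integer scale
    from 1 to max_per_aspect and all aspects are weighted equally.
    """
    for aspect in ASPECTS:
        if aspect not in ratings:
            raise ValueError(f"missing rating for {aspect}")
        if not 1 <= ratings[aspect] <= max_per_aspect:
            raise ValueError(f"rating for {aspect} is out of range")
    profile = {aspect: ratings[aspect] for aspect in ASPECTS}
    return {
        "profile": profile,                       # diagnostic information
        "total": sum(profile.values()),           # reported overall score
        "weakest": min(ASPECTS, key=profile.get), # first aspect with the lowest rating
    }


# Hypothetical candidate with uneven development of subskills.
result = analytic_score({"grammar": 4, "vocabulary": 5, "mechanics": 3})
print(result["profile"])
print(result["total"])
print(result["weakest"])
```

Keeping the profile alongside the total is what makes the ratings usable for diagnosis; a holistic score would report only a single figure.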
II - Testing oral ability:
3 - How can we make sure that the task truly represents the candidate's oral
ability?
To elicit a valid sample of oral ability we have to choose an appropriate
technique or set of techniques. They include 3 formats:
1 Interview: it is common but it has at least one important drawback. The
relationship between the tester and the candidate is usually such that the
candidate speaks as to a superior and is unwilling to take the initiative. As a
result, only one style of speech is elicited, and many functions such as asking for
information are not represented in the candidate's performance. To solve this problem,
some useful techniques are available:
a) Questions and requests for information, e.g. "Can you explain to me how/why...?"
b) Requests for elaboration, e.g. "What exactly do you mean?", "Can you explain that in a little more detail?", "What would be a good example of that?"
c) Appearing not to understand, e.g. "I'm sorry, but I don't quite follow you."
d) Invitation to ask questions, e.g. "Is there anything you'd like to ask me?"
e) Abrupt change of topic, to see how the candidate deals with this.
f) Pictures
g) Role play
h) Prepared monologue
i) Reading aloud
2 Interaction with fellow candidates:
a) Discussion
b) Role play
3 Response to audio or video recordings:
a) Describe situations
b) Remarks in isolation to respond to
c) Simulated conversation
4 - What should we do to ensure that the samples of oral behavior are scored in
a valid and reliable way?
We need to create appropriate scoring scales. Rating scales may be either holistic or
analytic.
Scoring will be valid and reliable only if:
Clearly recognizable and appropriate descriptions of criterial levels are written and scorers are trained to use them.
Irrelevant features of performance are ignored.
There is more than one scorer for each performance.
The accurate measurement of oral ability is not easy. It takes considerable time
and effort to obtain valid and reliable results. Nevertheless, where backwash is
an important consideration, the investment of such time and effort may be
considered necessary.
Readers are reminded that the appropriateness of content, descriptions of
criterial levels, and elicitation techniques used in oral testing will depend upon
the needs of individual institutions or organizations.
III - Testing reading:
5 - Please comment on the following quote:
"The basic problem is that the exercise of receptive skills does not necessarily, or
usually, manifest directly in overt behavior"
6 - What should be considered when setting the tasks?
First of all, we have to select the texts carefully, using a representative sample
of appropriate length and with as many passages as possible, so that we can
obtain content validity and reliability. Make sure the passages contain plenty
of pieces of information and that the text has a clear structure.
Write items that measure the ability in which we are interested. We need to set
tasks that will involve students in providing evidence of successful reading.
The techniques used in the tasks should interfere as little as possible with the
reading process. Students may read perfectly but have some problems with
writing, which is why suitable techniques include multiple choice, short answer, gap
filling, and information transfer.
IV - Testing Listening:
7 - As listening is a receptive skill, in what way does testing listening parallel
testing reading?
8 - What are some possible techniques to assess listening? What are their
advantages and disadvantages?
Multiple choice:
Short answer:
Gap filling:
Note taking
Partial dictation
Transcription
V - Testing grammar and vocabulary
9 - According to the author, why should we test grammar and vocabulary?
Because it is very useful to have an idea of the strengths and
weaknesses in grammar of individual learners and groups, and of what kinds of gaps
exist in their grammatical repertoire. In this way they can identify
what they need to improve.
VI - Testing overall ability
10 - Provide examples of techniques that allow us to test overall ability.
Varieties of cloze procedure:
Deletion cloze:
Conversational cloze:
Mini cloze item:
The c-test:
Dictation: