2 -reference questions

7/29/2019 2 -Reference Questions

Reference questions

I - Testing writing:

1 - What should we take into account to make sure that writing tasks are representative and valid?

Content validity

People will simply be better at some tasks than others. So, if we aren't able to include every task and happen to choose just the task or tasks that a candidate is particularly good or bad at, the outcome is likely to be very different.

Try to select a representative set of tasks.

The more tasks (within reason) that we set, the more representative of a candidate's ability the sample we obtain will be.

It should also be remembered that if a test includes a wide-ranging and representative sample of the specifications, the test is more likely to have a beneficial backwash effect.

How much accuracy matters depends on the stakes: it is not so important when moving students from one class to another, but it is important for overseas students (more samples are required).

2 - How can you ensure valid and reliable scoring?

Set as many separate tasks as is feasible

Need to include a representative sample of the specified content.

People's performance, even on the same task, is unlikely to be perfectly consistent. Therefore, offer as many fresh starts as possible; each task can represent a fresh start. In this way we will achieve greater reliability and so greater validity.

Test only writing ability, and nothing else

For the sake of validity, tasks should not measure creativity, imagination, or script-writing ability.

Reading ability can interfere with the accurate measurement of writing.

Candidates are expected to be able to read simple instructions (ensure this).

Instructions should not be too long.

Instead, make use of illustrations.


A series of pictures can be used to elicit a narrative

Ensure valid and reliable scoring

Set tasks which can be reliably scored

A number of the suggestions made to obtain a representative performance will also facilitate reliable scoring.

Set as many tasks as possible

The more scores for each candidate, the more reliable should be the total score.
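
One standard way to see why more tasks help is the Spearman-Brown formula from classical test theory. The notes do not mention it; it is brought in here only as an illustration. Assuming each task is a roughly parallel measurement with reliability r, a total based on k tasks has predicted reliability kr / (1 + (k - 1)r). The function name and the starting value of 0.5 are hypothetical.

```python
def spearman_brown(single_task_reliability: float, n_tasks: int) -> float:
    """Predicted reliability of a total score built from n_tasks parallel tasks."""
    if n_tasks < 1:
        raise ValueError("n_tasks must be at least 1")
    r = single_task_reliability
    return n_tasks * r / (1 + (n_tasks - 1) * r)


# Illustrative value only: a single task with reliability 0.5.
for k in (1, 2, 4):
    print(k, round(spearman_brown(0.5, k), 3))
```

Under these assumptions the predicted reliability rises from 0.5 with one task to 0.8 with four, with smaller gains for each further task, which is why the notes add "within reason".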

Restrict candidates

Candidates should know just what is required of them, and they should not be

allowed to go too far astray.

Provide information in the form of notes (or pictures)

Tasks should not only fit well with the specifications, but they should also be made as authentic as possible.

The greater the restrictions imposed on the candidates, the more directly

comparable will be the performances of different candidates.

Give no choice of tasks

Making the candidates perform all tasks also makes comparisons between

candidates easier.

Ensure long enough samples

The samples of writing that are elicited have to be long enough for judgments to be made reliably. (To obtain reliable information on students' organizational ability in writing, the pieces have to be long enough for organization to reveal itself.)

Given a fixed period of time for the test.

Create appropriate scales of scoring:

Holistic scoring

Instead of being scored only once, each student's work is scored by four trained scorers, which gives high scorer reliability (research has shown this).

Not every scoring system will give equally valid and reliable results in every

situation. The system has to be appropriate to the level of the candidates and

the purpose of the tests.


Testers have to be prepared to modify existing scales to suit their own purposes.

What we decide must depend in part on the purpose of the assessment.

Analytic scoring.

A separate score is given for each of a number of aspects of a task, e.g. grammar, vocabulary, mechanics.

Advantages:

It disposes of the problem of uneven development of subskills in individuals.

Scorers are compelled to consider aspects of performance which they might otherwise ignore.

The very fact that the scorer has to give a number of scores will tend to make the scoring more reliable.

Even allowing for the halo effect (scorers rarely judge each aspect fully independently of the others), the mere fact of having several shots at assessing the student's performance should lead to greater reliability.

The main disadvantage of the analytical method is the time it takes. Scoring will

take longer than with the holistic method.

A second disadvantage is that concentration on the different aspects may

divert attention from the overall effect of the piece of writing.

The choice, holistic or analytic, depends in part on the purpose of the testing.

If diagnostic information is required directly from the ratings given, then analytic

scoring is essential.
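
A minimal sketch of how analytic ratings yield diagnostic information, assuming each scorer gives one score per aspect on the same scale. The aspect names follow the notes; the scores, the 1-6 scale, and the function names are made up for illustration.

```python
from statistics import mean

ASPECTS = ("grammar", "vocabulary", "mechanics")


def analytic_profile(ratings: list[dict[str, int]]) -> dict[str, float]:
    """Average each aspect over all scorers to get a diagnostic profile."""
    return {aspect: mean(r[aspect] for r in ratings) for aspect in ASPECTS}


def total_score(profile: dict[str, float]) -> float:
    """Unweighted sum of the aspect averages (any weighting is a design choice)."""
    return sum(profile.values())


# Hypothetical ratings from two scorers on a 1-6 scale.
ratings = [
    {"grammar": 4, "vocabulary": 5, "mechanics": 2},
    {"grammar": 4, "vocabulary": 4, "mechanics": 3},
]
profile = analytic_profile(ratings)
weakest = min(profile, key=profile.get)
print(profile, total_score(profile), weakest)
```

The per-aspect profile is what a single holistic score cannot give: here it points to mechanics as the weakest area, while the total can still be reported as one figure.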

Whichever is chosen should reflect the particular purpose of the test and the form that the reported scores on it will take. The chosen scales need to be adapted for the situation in which they are to be used.

Scales tell candidates, "These are the criteria by which we will judge you." Candidates need to be aware of them (backwash effect).

II - Testing oral ability:

3 - How can we make sure that the task truly represents the candidate's oral

ability?

To elicit a valid sample of oral ability we have to choose an appropriate

technique or set of techniques. They include 3 formats:

1 - Interview: it is common, but it has at least one important drawback. The relationship between the tester and the candidate is usually such that the candidate speaks as to a superior and is unwilling to take the initiative. As a result, only one style of speech is elicited, and many functions, such as asking for information, are not represented in the candidate's performance. To solve this problem we have some useful techniques:

a) Questions and requests for information, e.g. "Can you explain to me how/why...?"
b) Requests for elaboration, e.g. "What exactly do you mean?", "Can you explain that in a little more detail?", "What would be a good example of that?"
c) Appearing not to understand, e.g. "I'm sorry, but I don't quite follow you."
d) Invitation to ask questions, e.g. "Is there anything you'd like to ask me?"
e) Abrupt change of topic, to see how the candidate deals with this.
f) Pictures
g) Role play
h) Prepared monologue
i) Reading aloud

2 - Interaction with fellow candidates:

a) Discussion
b) Role play

3 - Response to audio or video recordings:

a) Describe situations
b) Remarks in isolation to respond to
c) Simulated conversation


4 - What should we do to ensure that the samples of oral behavior are scored in

a valid and reliable way?

We should create appropriate scales for scoring. Rating scales may be either holistic or analytic.

Scoring will be valid and reliable only if:

Clearly recognizable and appropriate descriptions of criterial levels are written, and scorers are trained to use them.

Irrelevant features of performance are ignored.

There is more than one scorer for each performance.

The accurate measurement of oral ability is not easy. It takes considerable time

and effort to obtain valid and reliable results. Nevertheless, where backwash is

an important consideration, the investment of such time and effort may be

considered necessary.

Readers are reminded that the appropriateness of content, descriptions of

criterial levels, and elicitation techniques used in oral testing will depend upon

the needs of individual institutions or organizations.

III - Testing reading:

5 - Please comment on the following quote:

"The basic problem is that the exercise of receptive skills does not necessarily, or

usually, manifest directly in overt behavior"

6 - What should be considered when setting the tasks?

First of all, we have to select the texts carefully: use a representative sample of texts of appropriate length, and as many passages as possible, so that we obtain content validity and reliability. Make sure the passages contain plenty of pieces of information and that each text has a clear structure.

Write items that measure the ability in which we are interested. We need to set tasks that will involve students in providing evidence of successful reading.

The techniques used in the task should interfere as little as possible with the reading process. Students may read perfectly well but have problems with writing, which is why possible techniques include multiple choice, short answer, gap filling, and information transfer.

IV - Testing Listening:

7 - As listening is a receptive skill, in what way does testing listening parallel

testing in writing?

8 - What are some possible techniques to assess listening? What are their

advantages and disadvantages?

Multiple choice:

Short answer:

Gap filling:

Note taking

Partial dictation

Transcription

V - Testing grammar and vocabulary

9 - According to the author, why should we test grammar and vocabulary?

Because it is very useful to have an idea of the strengths and weaknesses of individual learners and groups in grammar, and of the kinds of gaps that exist in their grammatical repertoire, so that they can identify what they need to improve.


VI - Testing overall ability

10 - Provide examples of techniques that allow us to test overall ability.

Varieties of cloze procedure:

Deletion cloze:

Conversational cloze:

Mini cloze item:

The c-test:

Dictation:
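
The notes list these techniques without describing them. As a sketch of the basic idea behind a cloze passage, assuming the classic fixed-ratio form in which every nth word is replaced by a blank after an intact lead-in, the following builds a gapped text and its answer key. The function name, parameters, and sample sentence are illustrative assumptions, not taken from the source.

```python
def make_cloze(text: str, n: int = 7, lead_in: int = 5) -> tuple[str, list[str]]:
    """Blank out every n-th word after the first lead_in words.

    Returns the gapped text and the list of deleted words (the answer key).
    Punctuation stays attached to its word in this simple version.
    """
    if n < 2:
        raise ValueError("n must be at least 2")
    words = text.split()
    answers = []
    for i in range(lead_in + n - 1, len(words), n):
        answers.append(words[i])
        words[i] = f"({len(answers)}) ______"
    return " ".join(words), answers


sample = "one two three four five six seven eight nine ten eleven twelve"
gapped, key = make_cloze(sample, n=3, lead_in=2)
print(gapped)
print(key)
```

The other varieties in the list change which words are deleted or how much of each word is removed, so they would need different selection rules from the fixed-ratio one sketched here.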
