testing evaluation pp t

7/29/2019 Testing Evaluation Pp t


THE PRACTICE OF ENGLISH LANGUAGE TEACHING

TESTING & EVALUATION



TESTING AND ASSESSMENT

Measurement and evaluation have been with us for a long time.

Since the effect of testing on teaching and learning is unavoidable, testing is an important part of every language teaching and language learning experience.



WHY TEST?

Has the instruction been successful?

Were the materials for instruction at the right level?

Have all language skills been emphasized equally?

What points need reviewing?

Should the same materials be used next year, or do they need some modifications?



TEST, EVALUATION, MEASUREMENT



TEST

Test is the narrowest of the three terms.

It often connotes the presentation of a set of questions to be answered.

In general, what distinguishes a test from other types of measurement is that it is designed to obtain a specific sample of behavior.

It is one type of measurement.

It may be used for pedagogic or descriptive purposes.



MEASUREMENT

Measurement is a broader term than test.

We can measure characteristics by means other than giving tests, e.g. using observation, rating scales, or other devices that allow us to obtain information in a quantitative form.



EVALUATION

It has been defined in a variety of ways.

In general, it refers to the systematic gathering of information for purposes of decision making.

In other words, evaluation is a professional judgment or a process that allows one to make a judgment about the desirability or value of a measure.



So, testing is not the only way in which information about people's language ability can be gathered.

It is just one form of assessment.



ASSESSMENT

SUMMATIVE

FORMATIVE



FORMATIVE ASSESSMENT

Teachers use it to check on the progress of their students, to see how far they have mastered what they should have learned, and then to modify their future teaching plans in the light of this information.

It can also be the basis for feedback to the students.

e.g. informal tests or quizzes



SUMMATIVE ASSESSMENT

It is used at the end of the term, semester, or year in order to measure what has been achieved both by groups and by individuals.

e.g. final examination

In most cases, grades are assigned on the basis of performance on tests in addition to classroom performance.



Different reasons for testing learners

Different kinds of tests

Proficiency test

Diagnostic test

Placement test

Progress (Achievement) test

Portfolio assessment



PLACEMENT TEST

It is used to sort new students into relatively homogeneous language ability groupings so that they can start a course at approximately the same level as the other students in the class.

It is one of the most frequently used tests at different levels of language instruction.



DIAGNOSTIC TEST

It is designed to show what skills or knowledge a learner has or does not have. In other words, it is used to identify students' strengths and weaknesses.

It is the reverse of the achievement test: while the achievement test is interested in success, the diagnostic test is interested in failure, in what has gone wrong, in order to develop remedies.



ACHIEVEMENT TEST

It is designed to measure the degree of students' learning from a particular set of instructional materials.

It is directly related to language courses. That is, such tests normally come after a program of instruction, or their items are drawn directly from the content of instruction.

e.g. final, midterm, and class examinations



PROFICIENCY TEST

It is used to measure the overall language ability of the learners regardless of any training they may have had in that language.

It seeks to answer the question: having learned this much, what can the student do with it?

e.g. Test of English as a Foreign Language (TOEFL)



PORTFOLIO ASSESSMENT

Many educational institutions allow students to assemble a portfolio of their work over a period of time (a term or semester), so that the students can be assessed by looking at three or four of the best pieces of work from this period.



ADVANTAGES

Provides evidence of students' effort.

Helps students to become more autonomous.

Helps them to self-monitor their own learning.

DISADVANTAGES

It is time-consuming.

Teachers will need clear training in how to select items from the portfolio and how to give them grades.

In preparing their portfolios, students may have been helped by others.



Characteristics of a good test

1. RELIABILITY

2. VALIDITY



VALIDITY

A test is valid if it measures what it is supposed to measure (construct validity) or can be used for the purposes for which it is intended.

Valid + for: any given test may be valid for some purposes, but not for others.

Validity tells us what can be inferred from test scores.



Different kinds of validity

face validity: a test should look, on the face of it, as if it is valid. A test which consisted of only three multiple-choice items would not convince students of its face validity.

criterion-related validity: it is based on the extent to which performance on a newly developed test is related to some other criterion measure which is an indicator of the ability tested.

content validity: a test has content validity if its content constitutes a representative sample of the language skills, structures, etc. with which it is meant to be concerned.



First, write explicit specifications for the test and make sure that you include a representative sample of the content of these in the test.

Second, whenever feasible, use direct testing.

Third, make sure that the scoring of responses relates directly to what is being tested.

Finally, do everything possible to make the test reliable. If a test is not reliable, it cannot be valid.



What is reliability?



RELIABILITY

Reliability is a quality of test scores.

It refers to the consistency of measures across different times, test forms, raters, and other characteristics of the measurement context.

Synonyms for reliability are: dependability, stability, consistency, predictability, accuracy



How to make tests more reliable?

Take enough samples of behavior.

Exclude items which do not discriminate well between weaker and stronger students.

Do not allow candidates too much freedom.

Provide clear and explicit instructions.

Write unambiguous items.

Provide uniform and non-distracting conditions of administration.
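The advice to exclude items that do not discriminate well can be checked with a simple item analysis. The sketch below assumes a common upper-lower discrimination index, which the slides do not name: candidates are ranked by total score, and the item's success rate in the top group is compared with that in the bottom group. The function name, the 27% group size, and the 0/1 scoring format are illustrative assumptions.

```python
def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination index for one test item.

    responses: one list of 0/1 item scores per candidate.
    item: index of the item to analyse.
    fraction: share of candidates placed in each extreme group
        (0.27 is an assumed, conventional choice).
    """
    # Rank candidates from strongest to weakest by total score.
    ranked = sorted(responses, key=sum, reverse=True)
    # Size of each extreme group, at least one candidate.
    n = max(1, round(len(ranked) * fraction))
    upper = ranked[:n]
    lower = ranked[-n:]
    # Difference in the number answering correctly, as a share of group size.
    correct_upper = sum(r[item] for r in upper)
    correct_lower = sum(r[item] for r in lower)
    return (correct_upper - correct_lower) / n
```

A value near 1 suggests that stronger students get the item right and weaker students do not; a value near 0 (or negative) marks an item that might be a candidate for exclusion.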



Two kinds of testing

Discrete-point testing: tests only one thing at a time, and the answer is either correct or incorrect.

e.g. asking students to choose the correct form of a tense, or multiple-choice tests

Integrative testing: expects students to use a variety of language at any given time.

e.g. writing a composition, taking a conversational oral test



Types of test items

Direct test item

It requires the candidate to perform precisely the skill we wish to measure.

It tries to be as much like real-life language use as possible.

e.g. writing samples, oral interview

Indirect test item

It tries to measure the abilities which underlie the skills in which we are interested.

e.g. multiple-choice questions, cloze procedures, sentence reordering



Transformation and paraphrase

Sentence reordering

Multiple-choice items

Cloze procedures



MULTIPLE-CHOICE ITEMS

ADVANTAGES

Scoring is reliable, easy, and economical.

It is possible to include more items in a given period of time.

DISADVANTAGES

It is very difficult to write successful items.

It restricts what can be tested.

It tests only recognition knowledge.

Cheating may be facilitated.



CLOZE PROCEDURES

ADVANTAGES

It offers the ideal indirect, but integrative, testing item.

Cloze tests can be prepared quickly and are an extremely cost-effective way of finding out about a testee's overall knowledge.

Because the deletion of words is random, it avoids test designers' failings.

Because of the randomness of the deleted words, anything may be tested (grammar, collocations, fixed phrases, etc.).

Supplying the correct word for the blank implies an understanding of context and a knowledge of that word and how it operates.

DISADVANTAGES

In some cases, there are several possible answers.

The actual score a student gets depends on the particular words that are deleted, rather than on any general English knowledge (a problem of reliability).
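To make the procedure concrete, here is a minimal sketch of generating a cloze passage. It assumes the classic fixed-ratio variant, in which every n-th word is deleted regardless of its type; the slides only say that deletion is random. The function name `make_cloze` and its parameters are hypothetical.

```python
def make_cloze(text, n=7, start=0):
    """Blank out every n-th word of text, counting from word index start.

    Returns the gapped text and the answer key (deleted words in order).
    Words are split on whitespace, so attached punctuation is deleted
    together with its word.
    """
    words = text.split()
    gapped, answers = [], []
    for i, word in enumerate(words):
        # Delete the word when its position falls on the n-th-word pattern.
        if i >= start and (i - start) % n == n - 1:
            answers.append(word)
            gapped.append(f"({len(answers)}) ______")
        else:
            gapped.append(word)
    return " ".join(gapped), answers
```

Changing `start` shifts which words are removed from the same passage, which illustrates the reliability problem noted above: the score can depend on the particular words that happen to be deleted.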



TRANSFORMATION AND PARAPHRASES

They tell us something about the candidates' knowledge of the language system.



SENTENCE REORDERING

It tells us quite a lot about students' underlying knowledge of syntax and lexico-grammatical elements.

Although such items are easy to write, it is not always possible to ensure that there is only one correct order.



DIRECT TEST ITEMS

To have valid and reliable direct test items, test designers need to

do the following:



Create a level playing field: in a written test or in testing receptive skills, we need to avoid making excessive demands on the students' general or specialist knowledge. That is, the topics should not be too general or too specific.

Replicate real-life interaction: tests of listening or speaking should reflect real life, i.e. the text should be as realistic as possible.



WRITING AND MARKING TESTS



WRITING TESTS

1- Assess the test situation

2- Decide what to test

3- Balance the elements

4- Weight the scores

5- Make the test work



OBJECTIVE SCORING: a method of scoring in which the scores are given according to some predetermined criteria. In this method, each correct answer usually counts as one point.

SUBJECTIVE SCORING: a method of scoring in which the scoring procedures do not follow any objective criteria. As a result, the fluctuation of scores from one scorer to another creates a serious problem. To compensate for the inadequacies of subjective scoring, the following solutions are recommended:



1- Training

The scorers should not come to the task fresh. They should first see some scripts at different levels.

e.g. they may be allowed to watch videoed oral tests in order to be trained to rate the samples of spoken language accurately and consistently in terms of predetermined descriptions of performance.



2- More than one scorer

More scorers, more reliability.

The more people who look at a script, the greater the chance that its true worth will be located somewhere between the various scores it is given.

Sometimes we can use a moderator, whose job is to check samples of the scorers' work to see that it conforms with the general standards laid down for the exam.
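The idea that a script's true worth lies somewhere between the scores it is given can be sketched as follows. This is only an illustration under stated assumptions: the marks are averaged, and a script is referred to the moderator when the scorers disagree by more than a chosen threshold. The function name and the threshold of 2 marks are hypothetical, not taken from the slides.

```python
from statistics import mean

def combine_scores(scores, max_spread=2):
    """Combine several scorers' marks for one script.

    scores: marks given by different scorers to the same script.
    max_spread: largest gap between the highest and lowest mark that is
        tolerated before the script goes to a moderator (assumed value).
    """
    spread = max(scores) - min(scores)
    return {
        "mark": mean(scores),          # final mark lies between the extremes
        "spread": spread,              # how far apart the scorers were
        "needs_moderation": spread > max_spread,
    }
```

For example, `combine_scores([6, 7, 8])` gives a mark of 7 with no moderation needed, while `combine_scores([4, 8])` gives a mark of 6 and flags the script for the moderator.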



3- Global assessment scale

One way of specifying scores is to use pre-defined descriptions of performance. Such descriptions say what the students need to be capable of in order to gain the required marks.

However, they are not without problems:

The descriptions may not exactly match the student who is speaking.

Different teachers may not agree on the meaning of the scale descriptors.



4- Analytic profiles

We can mark tests for different elements, instead of giving a general assessment.

A combination of global and analytic scoring gives us the best chance of reliable marking.

Score criteria:

Fluency

Use of vocabulary

Use of grammar

Pronunciation

Repair skills

Task completion

Intelligibility
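An analytic profile can be turned into a single mark by scoring each criterion separately and then combining the results. The sketch below uses the criteria listed on this slide; the 0 to 5 band scale, the optional weights, and the function name are assumptions made for illustration only.

```python
CRITERIA = [
    "fluency", "vocabulary", "grammar", "pronunciation",
    "repair skills", "task completion", "intelligibility",
]

def analytic_score(marks, weights=None, max_band=5):
    """Return a percentage from per-criterion marks on a 0..max_band scale.

    marks: dict mapping each criterion to the band awarded.
    weights: optional dict of relative weights (default: all equal).
    """
    weights = weights or {c: 1 for c in CRITERIA}
    # Refuse to score an incomplete profile.
    missing = [c for c in CRITERIA if c not in marks]
    if missing:
        raise ValueError(f"missing marks for: {missing}")
    earned = sum(marks[c] * weights[c] for c in CRITERIA)
    possible = sum(max_band * weights[c] for c in CRITERIA)
    return 100 * earned / possible
```

With equal weights, a candidate awarded band 3 on every criterion scores 60.0. Giving one criterion a higher weight is one way of expressing that it matters more for a given test.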



5- Scoring and interacting during oral tests

If we separate the role of scorer (or examiner) from the role of interlocutor (the examiner who guides and provokes conversation), the scorer can observe and assess, free from the responsibility of keeping the interaction with the candidate or candidates going.

e.g. in tests of speaking, we can put students in pairs or groups for certain tasks. This helps to relax students in a way that interlocutor-candidate interaction might not.



TEACHING FOR TESTS

Backwash (washback) effect: the effect of the nature of a test on teaching and learning. In other words, it is the potential impact of a test on test takers and their characteristics, on teaching and learning activities, and on the educational system and society.

Two kinds:

Harmful (negative) backwash: when the test and testing techniques are at variance with the objectives of the course.

Beneficial (positive) backwash: when the test and testing techniques support the objectives of the course.

In either case, if a test is regarded as important and the stakes are high, preparation for it can come to dominate all teaching and learning activities.



What does teaching for tests mean?

Exam teachers

They quite reasonably want their students to pass the tests and exams they are going to take, so their teaching becomes dominated by the test. Suffering from the backwash effect, they might stick rigidly to exam-format activities in class.

In such a situation, the format of the test determines the format of the class.

Non-exam teachers

They might use a range of different activities.



Many teachers believe that teaching exam classes is extremely satisfying because:

Students perceive a clear sense of purpose and are highly motivated to do as well as possible, so they are in some sense easier to teach than students whose focus is less clear.

In training students to develop good exam skills (e.g. working on their own, reviewing what they have done, learning to use reference tools, keeping an independent learning record, etc.), we push them towards autonomous learning.



However, to be a good exam-preparation teacher is not easy.

They need to be familiar with the test their students are taking, be able to answer their students' concerns and worries, and walk a fine line between good exam preparation and the washback effect. There are a number of things they can do in an exam class:

Train for test types

Discuss general exam skills

Do practice tests

Have fun

Ignore the test



1- Train for test types

Generally, we can make students familiar with the test items they will have to face so that they can give their best and the test can discover their level of English.

e.g. we can show them the various types of tests, help them to understand what the test designer is aiming for, help them to focus on what they are being asked to do and why, and so on.

2- Discuss general exam skills

We can remind students about general test and exam skills and teach them how to organize their work so that they can revise effectively.

e.g. help them to pace themselves so that they do not spend a disproportionate amount of time on only one part of the exam.

Remind them how easily they can find the answer by reading the question carefully, and



3- Do practice tests

It means giving students the chance to practice taking the test so that they get a feel for the experience.

During a course, students can sit practice papers or whole practice tests.

4- Have fun

Although students need to practice certain test types, this does not have to be done in a boring or tense way. Teachers can use a number of ways of having fun with tests and exams.

e.g. teachers can ask students to write their own test items, based on language they have been working on and the examples they have seen so far.



5- Ignore the test

Warning

Exam teachers should be careful: if class time is spent only on discussing exam techniques and taking practice tests, lessons may become monotonous. In other words, in such classes there is a possibility that general English improvement will be sacrificed to exam preparation.

To avoid this problem, we need to ignore the exam from time to time so that we have opportunities to work on general language issues and to encourage students to take part in the kind of motivating activities that are appropriate for all English lessons.



THANK YOU
