la notes (1 7 & 9)

Chapter 1 Overview of assessment: Context, issues and trends

Upload: hakim-azman

Post on 21-Feb-2017


TRANSCRIPT

Page 1: La notes (1 7 & 9)

Chapter 1 Overview of assessment: Context, issues and trends

Page 2

DEFINITION OF TERMS – test, measurement, evaluation, and assessment
• A test is a subset of assessment intended to measure a test-taker's language proficiency, knowledge, performance or skills.
• Brown defined a test as a process of quantifying a test-taker's performance according to explicit procedures or rules.
• Assessment: the process of observing and measuring learning. It is an ongoing process in educational practice, which involves a multitude of methodological techniques. It can consist of tests, projects, and portfolios.
• Evaluation involves the interpretation of information. When a tester or marker evaluates, s/he "values" the results in such a way that the worth of the performance is conveyed to the test-taker.
• Measurement is the assigning of numbers to certain attributes of objects, events, or people according to a rule-governed system.

Page 3

The relationship between tests, measurement and assessment

Page 4

Stages/phases of development of the examination system in Malaysia:
• Pre-Independence
• Razak Report
• Rahman Talib Report
• Cabinet Report
• Malaysia Education Blueprint (2013-2025)

Page 5

The achievements of Malaysia Examination Syndicate (MES)

Page 6

Chapter 2 Role and Purposes of Assessment in T & L

Page 7

Framework
• Reasons / Purposes of Assessment
• Assessment of Learning / Assessment for Learning

Page 8

Assessment OF Learning
• the use of a task or an activity to measure, record, and report on a student's level of achievement with regard to specific learning expectations.
• this type of assessment is also known as summative assessment.
• provides the focus to improve student achievement, gives everyone the information they need to improve student achievement, and applies the pressure needed to motivate teachers to work harder to teach and learn.

Page 9

AOL

Page 10

Assessment FOR Learning
• the use of a task or an activity for the purpose of determining student progress during a unit or block of instruction.
• roughly equivalent to formative assessment: assessment intended to promote further improvement of student learning during the learning process.
• commonly known as formative and diagnostic assessment.
• students are provided valuable feedback on their own learning.

Page 11

Importance of AFL
• reflects a view of learning in which assessment helps students learn better, rather than just achieve a better mark
• involves assessment activities as part of learning and to inform the planning of future learning
• includes clear goals for the learning activity
• provides effective feedback that motivates the learner and can lead to improvement
• reflects a belief that all students can improve
• encourages self-assessment and peer assessment as part of the regular classroom routines
• involves teachers, students and parents reflecting on evidence
• is inclusive of all learners.

Page 12

Types of tests
Henning (1987) identifies six kinds of information that tests provide about students. They are:
• Diagnosis and feedback
• Screening and selection
• Placement
• Program evaluation
• Providing research criteria
• Assessment of attitudes and socio-psychological differences

Page 13

Type of test – Explanation

1. Proficiency tests
- designed to assess the overall language ability of students at varying levels.
- usually developed by external bodies such as examination boards like Educational Testing Service (ETS) or Cambridge ESOL.
- standardized.

2. Achievement tests
- to see what a student has learned with regard to stated course outcomes.
- usually administered at the mid-point and end-point of the semester or academic year.
- generally based on the specific course content or on the course objectives.
- cumulative, covering material drawn from an entire course or semester.

3. Diagnostic tests
- seek to identify those language areas in which a student needs further help.
- crucial for further course activities and providing students with remediation.
- placement tests often serve a dual function of both placement and diagnosis (Harris & McCann, 1994; Davies et al., 1999).

4. Aptitude tests
- designed to measure general ability or capacity to learn a foreign language a priori (before taking a course) and ultimate predicted success in that undertaking.
- designed to apply to the classroom learning of any language.

5. Progress tests
- measure the progress that students are making towards defined course or programme goals.
- administered at various stages throughout a language course to see what the students have learned.

Page 14

6. Placement tests
- designed to assess students' level of language ability for placement in an appropriate course or class.
- indicate the level at which a student will learn most effectively.
- main aim is to create groups which are homogeneous in level.

Page 15

Malaysian Context (KSSR) – School-Based Assessment
Purpose:
1. to realign the education system from one that focuses on academic excellence to a more holistic one
2. to ensure a more systematic mastery of knowledge by emphasising the assessment of each child
3. to achieve the aspiration of the National Philosophy of Education towards developing well-rounded learners (JERIS)
4. to reduce exam-oriented learning among learners
5. to evaluate learners' learning progress

Page 16

Malaysian context ctd. – SBE features:
• Assessment for and of learning
• Standard-referenced assessment (Performance Standard)
• Formative tests which are assessed using Bands 1 to 6, HOTS (Higher Order Thinking Skills)
• Holistic
• Integrated

Page 17

SBE Components:
Academic:
• School Assessment (using Performance Standards)
• Centralised Assessment
Non-academic:
• Physical Activities, Sports and Co-curricular Assessment (PAJSK, e.g. SEGAK)
• Psychometric/Psychological Tests (Aptitude test, Personality test)

Page 18

SBE Instrument (WHO): Teachers
Rationale:
• can continuously monitor their pupils' growth
• can provide constructive feedback to help improve pupils' learning abilities
• better understand the context and environment most conducive to assessing pupils
• appraise and provide feedback based on the Performance Standards

HOW: Observation, Performance, Project, Product, Hands-on, Written Essays, Pencil and Paper, Worksheet, Open-ended discussion, Quizzes, Checklist, Homework.

Page 19

Performance Standard: a set of statements detailing the achievement and mastery of an individual within a certain discipline, in a specific period of study, based on an identified benchmark.

Page 20

Chapter 3 Basic Testing Terminology

Page 21

Framework – Types of Tests
• Norm-Referenced and Criterion-Referenced
• Formative and Summative
• Objective and Subjective

Page 22

Norm-Referenced Test (NRT) vs Criterion-Referenced Test (CRT, mastery tests)

Definition
- NRT: a test that measures a student's achievement as compared to other students in the group. Designed to yield a normal curve: 50% above, 50% below.
- CRT: an approach that provides information on a student's mastery based on a criterion specified by the teacher. Anyone who meets the criterion can get a high score.

Purpose
- NRT: determine performance differences among individuals and groups.
- CRT: determine learning mastery based on a specified criterion and standard.

Test Item
- NRT: from easy to difficult level, and able to discriminate examinees' ability.
- CRT: guided by minimum achievement in the related objectives.

Frequency
- NRT: continuous assessment in the classroom.
- CRT: continuous assessment.

Appropriateness
- NRT: summative evaluation.
- CRT: formative evaluation.

Example
- NRT: public exams such as UPSR, PMR, SPM, and STPM.
- CRT: mastery tests such as monthly tests, coursework, projects, and exercises in the classroom.

Page 23

Norm-Referenced Test vs Criterion-Referenced Test

Purpose
- NRT: to rank each pupil with respect to the achievement of others in broad areas of knowledge; to discriminate between high and low achievers; to show how a student's performance compares to that of other test-takers.
- CRT: to determine whether each student has achieved specific skills or concepts; to find out how much students know before instruction begins and after it has finished; to classify students according to whether they have met an established standard.

Content
- NRT: measures broad skill areas sampled from a variety of textbooks, syllabi, and the judgment of curriculum experts.
- CRT: measures specific skills which make up a designated curriculum; each skill is expressed as an instructional objective.

Item characteristics
- NRT: each skill is usually tested by only a few items; items vary in difficulty; items are selected that discriminate between high and low achievers.
- CRT: each skill is tested by at least 4 items in order to obtain an adequate sample of pupil performance; guessing is minimised; the items which test any given skill are parallel in difficulty.

SET B: Q1a)
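The contrast in the table above can be made concrete in code. The sketch below (hypothetical scores and a hypothetical mastery cutoff of 80) shows how the same raw score is interpreted relative to the group under norm-referencing, but against a fixed criterion under criterion-referencing.

```python
# Illustrates how the SAME raw score is read differently under
# norm-referenced vs criterion-referenced testing (hypothetical data).

def percentile_rank(score, group):
    """Norm-referenced: percentage of the group scoring below this score."""
    below = sum(1 for s in group if s < score)
    return 100 * below / len(group)

def mastery(score, cutoff=80):
    """Criterion-referenced: pass/fail against a fixed criterion."""
    return "mastered" if score >= cutoff else "not yet mastered"

class_scores = [45, 52, 58, 60, 63, 67, 70, 74, 79, 85]

print(percentile_rank(74, class_scores))  # 70.0 -> above most peers
print(mastery(74))                        # not yet mastered (below cutoff 80)
```

A pupil scoring 74 outranks 70% of the group (a good norm-referenced result) yet still fails the criterion, which is exactly the distinction the table draws.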

Page 24

Norm-Referenced Test (Normal Curve)
• represents the norm or average performance of a population and the scores that are above and below the average within that population.
• includes percentile ranks, standard scores, and other statistics for the norm group on which the test was standardized.
• a certain percentage of the norm group falls within various ranges along the normal curve.
• depending on the range within which test scores fall, scores correspond to various descriptors ranging from deficient to superior.
• an examinee's test score is compared to that of a norm group by converting the examinee's raw scores into derived or scale scores.
• test-makers design the test so that most students will score near the middle, and only a few will score low (the left side of the curve) or high (the right side of the curve).
• scores are usually reported as percentile ranks, ranging from the 1st percentile to the 99th percentile, with the average student's score set at the 50th percentile.
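The conversion from raw scores to derived scores described above can be sketched numerically. The norm-group mean of 50 and standard deviation of 10 below are hypothetical (a typical T-score scale); the percentile comes from the normal cumulative distribution.

```python
import math

def z_score(raw, mean, sd):
    """Derived (standard) score: distance from the norm-group mean in SD units."""
    return (raw - mean) / sd

def normal_percentile(z):
    """Percentage of the norm group expected to fall below z on a normal curve."""
    return 50 * (1 + math.erf(z / math.sqrt(2)))

# Hypothetical norm group: mean 50, sd 10
print(round(normal_percentile(z_score(50, 50, 10))))  # 50 -> the average student
print(round(normal_percentile(z_score(60, 50, 10))))  # 84 -> one SD above the mean
```

This shows why an average examinee lands at the 50th percentile, and why scores one standard deviation above the mean already outrank roughly 84% of the norm group.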

Page 25

Positive Skew
• positive skew is when the long tail is on the positive side of the peak; some people say it is "skewed to the right".
• the mean is on the right of the peak value.
• the mean is greater than the mode.
• the distribution has scores clustered to the left, with the tail extending to the right.

Page 26

Negative Skew
• the majority of the scores fall toward the upper end.
• the curve is not symmetrical and has more scores on the higher end of the distribution, which will tend to reduce the reliability of the test.
• also called the mastery curve.

Problems:
• scores are scrunched up around one point, making it difficult to make decisions, as many pupils will be around that same point.
• skewed distributions also create problems because they indicate violations of the assumption of normality that underlies many of the other statistics used to study test validity (James Dean Brown, 1997).
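The direction of skew described on these two pages can be checked numerically. The score lists below are hypothetical: one easy test where most pupils score high (mastery curve) and one hard test where most score low.

```python
from statistics import mean

def skewness(xs):
    """Sample skewness: positive when the long tail extends to the right."""
    m = mean(xs)
    n = len(xs)
    sd = (sum((x - m) ** 2 for x in xs) / n) ** 0.5
    return sum((x - m) ** 3 for x in xs) / (n * sd ** 3)

easy_test = [8, 9, 9, 10, 10, 10, 9, 8, 3, 2]  # most pupils score high
hard_test = [2, 1, 1, 0, 0, 0, 1, 2, 7, 8]     # most pupils score low

print(skewness(easy_test) < 0)  # True: negative skew, tail to the left
print(skewness(hard_test) > 0)  # True: positive skew, tail to the right
```

A negative coefficient matches the mastery curve (scores bunched at the upper end); a positive one matches the too-hard test, where the mean is dragged above the mode.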

Page 27

Characteristics: Formative vs Summative

Relation to instruction
- Formative: occurs during instruction.
- Summative: occurs after instruction.

Frequency
- Formative: occurs on an ongoing basis.
- Summative: occurs at a particular point in time to determine what students know.

Relation to grading
- Formative: not graded; information is used as feedback to students and teachers, and mastery is not expected when students are first introduced to a concept.
- Summative: graded.

Student's role
- Formative: active engagement, self-assessment.
- Summative: passive engagement in design and monitoring.

Requirements for use
- Formative: clearly defined learning targets that students understand; clearly defined criteria for success that students understand; use of descriptive rather than evaluative feedback.
- Summative: a well-designed assessment blueprint that outlines the learning targets; well-designed test items using best practices.

Examples
- Formative: a process; observations, interviews, evidence from work samples, paper-and-pencil tasks.
- Summative: a final assessment.

Purpose
- Formative: designed to provide the information needed to adjust teaching and learning.
- Summative: designed to provide information about the amount of learning that has occurred at a particular point.

Page 28

Formative vs Summative: Assessment FOR Learning (AFL) vs Assessment OF Learning (AOL)

AFL involves both teachers and students in ongoing dialogue, descriptive feedback, and reflection throughout instruction.

Elaboration
- AOL: evaluates student learning at the end of an instructional unit by comparing it against some standard or benchmark; specific learning outcomes and standards are the reference points; grade levels may be the benchmarks for reporting; rubrics can be given to students before they begin working on a particular project so they know what is expected of them for each of the criteria.
- AFL: helps students identify their strengths and weaknesses and target areas that need work; recognises where students are struggling so problems can be addressed immediately; gains as much information as possible about what the student has achieved, what has not been achieved, and what the student requires to best facilitate further progress; involves students and gives them opportunities to express their understandings.

Benefits
- AFL: creates clear expectations; includes different levels of difficulty.
- AOL: makes a judgment of student competency.

Page 29

Formative vs Summative examples

Formative:
Exit slips: Ask students to solve one problem or answer one question on a small piece of paper. Students hand in the slips as "exit tickets" before they pass to their next class, go to lunch, or transition to another activity. The slips give teachers a way to quickly check progress toward skills mastery.

Graphic organizers: When students complete mind maps or graphic organizers that show relationships between concepts, they're engaging in higher-level thinking. These organizers allow teachers to monitor student thinking about topics and lessons in progress.

Self-assessments: One way to check for student understanding is to simply ask students to rate their learning. They can use a numerical scale, a thumbs up or down, or even smiley faces to show how confident they feel about their understanding of a topic.

Think-pair-share: Ask a question, give students time to think about it, pair students with a partner, and have students share their ideas. By listening in on the conversations, teachers can check student understanding and assess any misconceptions. Students learn from each other when discussing their ideas on a topic.

Observation: Watching how students solve a problem can lead to further information about misunderstanding.

Discussion: Hearing how students reply to their peers can help a teacher better understand a student's level of understanding.

Categorizing: Let students sort ideas into self-selected categories. Ask them to explain why such concepts go together. This will give you some insight into how students view topics.

Summative examples:
• Multiple choice, true/false, matching
• Short answer
• Fill in the blank
• One- or two-sentence response
• Portfolios: allow students to collect evidence of their learning throughout the unit, quarter, semester, or year, rather than being judged on a number from a test taken one time.
• Projects: allow students to synthesize many concepts into one product or process. They require students to address real-world issues and put their learning to use, demonstrating multiple related skills.
• Performance tasks: like mini-projects; they can be completed in a few hours, yet still require students to show mastery of a broad topic.

Page 30

Set A Q1b) Benefits of integrating formative and summative assessment

• The integration of summative assessments with formative practices can make the assessment process more meaningful for students by providing regular feedback that supports learning whilst also contributing towards an overall picture of their learning.

• Integrated assessment practices can also help learners to understand connections between learning and assessment. Developing students’ active involvement as assessors of their own learning supports them in life-long learning beyond formal education.

• The integration of assessments facilitates the accumulation of evidence which can be used for both formative and summative purposes over time, reducing ‘teaching to the test’.

Page 31

Objective vs Subjective items

Objective
- items with a single correct response; regardless of who scores a set of responses, an identical score will be obtained.
- the subjective judgment of the scorer does not influence an individual's score.
- also known as "selected-response" and "structured-response" items.
- include multiple-choice, matching and alternative-choice items.
- tend to assess lower-level skills such as knowledge and comprehension.
- relatively easy to administer, score and analyse.

Subjective
- items that typically do not have a single correct response.
- the subjective judgments of the scorer are an integral part of the scoring process.
- also known as "free-response", "constructed-response" and "supply-type" items.
- include short-answer and essay items.
- require students to produce what they know.
- easy to construct.

Page 32

5 Basic Terminology in Objective Tests
1. Receptive or selective response: items for which the test-taker chooses from a set of responses (a selected type of response) rather than creating a response, as in a supply type.
2. Stem: every multiple-choice item consists of a stem (the 'body' of the item that presents a stimulus); the stem is the question or assignment in an item. It may be a complete or open, positive or negative sentence. The stem must be short, simple, compact and clear, yet must not easily give away the right answer.
3. Options or alternatives: the list of possible responses to a test item; there are usually between three and five to choose from.
4. Key: the correct response, or the best one. In a good item, the correct answer is not obvious compared to the distractors.
5. Distractors: 'disturbers' included to distract students from selecting the correct answer. An excellent distractor is almost the same as the correct answer, but is not.
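The five parts above fit together naturally as a data structure. This is a minimal sketch with hypothetical field names and a hypothetical grammar item; the point is only how stem, options, key, and distractors relate.

```python
# Sketch of a multiple-choice item: stem, options, key, distractors.
from dataclasses import dataclass, field

@dataclass
class MCQItem:
    stem: str          # the question or assignment presented to the test-taker
    options: list      # all alternatives shown (usually 3 to 5)
    key: str           # the correct (or best) response

    @property
    def distractors(self):
        """Every option that is not the key acts as a 'disturber'."""
        return [o for o in self.options if o != self.key]

    def score(self, response):
        """Selective response: the test-taker chooses rather than supplies."""
        return 1 if response == self.key else 0

item = MCQItem(
    stem="Ali ____ to school every day.",
    options=["go", "goes", "going", "gone"],
    key="goes",
)
print(item.distractors)    # ['go', 'going', 'gone']
print(item.score("goes"))  # 1
```

Note that the distractors are simply the options minus the key, which is why writing plausible distractors, not the key itself, is where most of the design effort goes.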

Page 33

SET B Q1b) Objective tests: Strengths vs Weaknesses
- Strength: quick grading. Weakness: difficult to design; good distractors must be considered.
- Strength: high inter-rater reliability; requires no judgment from the scorer. Weakness: the guessing effect is considerable.
- Strength: easy to administer, especially for a big group. Weakness: low validity.
- Strength: wide coverage of topics in the outlined curriculum. Weakness: difficult to construct HOTS questions.
- Strength: precision in testing specific skills. Weakness: may test the skill rather than the content.

Page 34

General Guidelines for Objective Test Items

MCQ:
i. design each item to measure a single objective;
ii. state both stem and options as simply and directly as possible;
iii. make certain that the intended answer is clearly the one correct one;
iv. (optional) use item indices to accept, discard or revise items.
Also:
1. must have only one correct answer;
2. format the items vertically, not horizontally;
3. avoid using "All of the above", "None of the above", or other special distractors;
4. use the author's examples as a basis for developing your items;
5. avoid trick items which will mislead or deceive examinees into answering incorrectly.

Alternate-choice items:
An alternate-choice test item is a simple declarative sentence, one portion of which is given with two different wordings, e.g. "Ali seems to be (a) eager (b) hesitant in making a decision to further his studies." The examinee's task is to choose the alternative that makes the sentence most nearly true.
• The rate of guessing is high: it is difficult to write good alternate choices that cover all aspects.
• Takes a shorter time: examiners take a shorter time to evaluate the examinee.
• Trick questions are seldom appropriate: examiners need to test the examinee directly.
• Avoid taking statements directly from the text and placing them out of context: out-of-context statements test not the examinees' understanding but their ability to find answers.
• Use symbols other than T/F or Y/N: for example, examiners could have examinees underline the correct answers.

Page 35

General Guidelines for Subjective Test Items

Short answer:
Short-answer questions are open-ended questions that require students to create an answer. They are commonly used in examinations to assess basic knowledge and understanding (low cognitive levels) of a topic before more in-depth assessment questions are asked on the topic.
- Design short-answer items which are an appropriate assessment of the learning objective.
- Make sure the content of the short-answer question measures knowledge appropriate to the desired learning goal.
- Express the questions with clear wording and language appropriate to the student population.
- Ensure there is only one clearly correct answer to each question.
- Ensure that the item clearly specifies how the question should be answered.
- Write the instructions clearly so as to specify the desired knowledge and specificity of response.
- Set the questions explicitly and precisely.
- Direct questions are better than those which require completing a sentence.
- Let the students know what your marking style is like: is bullet-point format acceptable, or does it have to be an essay format?
- Prepare a structured marking sheet; allocate marks or part-marks for acceptable answer(s).

Essay items:
- Do not make the correct answer a "giveaway" word that could be guessed by students who do not really know the information.
- Avoid giving grammatical cues or other cues to the correct answer; avoid using statements taken directly from the curriculum.
- Develop grading criteria that list all acceptable answers to the test item; have subject-matter experts determine the acceptable answers.
- Clearly state questions, not only to make essay tests easier for students to answer, but also to make the responses easier to evaluate.
- Specify and define the mental process you want the students to perform (e.g., analyze, synthesize, compare, contrast); do not assume the learner is practised in the process.
- Avoid writing essay questions that require only factual knowledge, such as questions beginning with interrogative pronouns (who, when, why, where).
- Avoid vague, ambiguous, or non-specific verbs (consider, examine, discuss, explain) unless you include specific instructions for developing responses.
- Have each student answer all the questions; do not offer options for questions.
- Structure the question to minimize subjective interpretations.

Page 36

Chapter 4 Basic Principles of Assessment
SET A SECTION B (1)
SET B Q2 a)

Page 37

Reliability (Brown)
A reliable test:
• is consistent and dependable: if the same test is given to the same pupil or a matched pupil on two different occasions, it should yield similar results.
• is consistent in its conditions across two or more administrations.
• gives clear directions for scoring/evaluation.
• has uniform rubrics for scoring/evaluation.
• lends itself to consistent application of those rubrics by the scorer.
• contains items/tasks that are unambiguous to the test-taker.

Page 38

Factors contributing to the UNRELIABILITY of a test
1. Student-related reliability: temporary illness, fatigue, a 'bad day', anxiety, etc., which make an observed score deviate from one's true score.
2. Rater reliability: human error and bias while scoring. Inter-rater unreliability occurs when two or more scorers award inconsistent scores on the same test, due to unclear scoring criteria, fatigue, bias, or carelessness.
3. Test administration reliability: the conditions in which the test is administered, such as noise, room lighting, variation in temperature, and the condition of tables and chairs.
4. Test reliability: the nature of the test itself can cause measurement errors, e.g. the duration of the test (too long, or too tightly timed) or poorly written test items (ambiguous, generic, or having more than one answer).
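Inter-rater consistency, as described above, is commonly checked by correlating two raters' scores on the same scripts. This is a sketch with hypothetical essay scores; a Pearson coefficient near 1 suggests the raters apply the criteria consistently, while a low value points to unclear criteria or rater bias.

```python
# Hypothetical inter-rater reliability check: Pearson correlation
# between two raters' scores on the same ten essays.

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

rater_1 = [12, 15, 10, 18, 14, 16, 11, 13, 17, 15]
rater_2 = [13, 14, 11, 17, 15, 16, 10, 12, 18, 14]

print(round(pearson(rater_1, rater_2), 2))  # close to 1 -> raters agree
```

In practice a marking scheme and rater training are the levers for raising this figure, as the page above notes.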

Page 39

Validity
• the second characteristic of good tests is validity, which refers to whether the test is actually measuring what it claims to measure.
• the extent to which inferences made from assessment results are appropriate, meaningful and useful in terms of the purpose of the assessment (Gronlund, 1998).

A valid test:
1. measures exactly what it proposes to measure
2. does not measure irrelevant or 'contaminating' variables
3. relies as much as possible on empirical evidence (performance)
4. involves performance that samples the test's criterion (objective)
5. offers useful, meaningful information about a test-taker's ability
6. is supported by a theoretical rationale or argument

Page 40

Face Validity: do the assessment items appear to be appropriate?
• "determined impressionistically; for example, by asking students whether the examination was appropriate to their expectations" (Henning, 1987).
• the degree to which a test looks right, and appears to measure the knowledge or abilities it claims to measure, based on the subjective judgement of the examinees who take it, the administrative personnel who decide on its use, and other psychometrically unsophisticated observers.

Face validity is high if the test has:
1. a well-constructed, expected format with familiar tasks
2. tasks that are clearly doable within the allotted time limit
3. items that are clear and uncomplicated
4. directions that are crystal clear
5. tasks that relate to the students' course work
6. a difficulty level that presents a reasonable challenge

Page 41

Content Validity: does the assessment content cover what you want to assess? Have satisfactory samples of language and language skills been selected for testing?
• "whether or not the content of the test is sufficiently representative and comprehensive for the test to be a valid measure of what it is supposed to measure" (Henning, 1987).
• a test has content validity "if it samples the subject matter about which conclusions are to be drawn, and if it requires the test-taker to perform the behaviour that is being measured" (Mousavi, 2002).
• content validity can be verified through the use of a Table of Test Specifications, which:
1. makes sure all content domains are represented in the test
2. gives detailed information on each content area
3. specifies the level of skills
4. specifies the status of difficulty
5. specifies the number of items, and item representation for rating in each content area, skill or topic

Page 42

Construct Validity: are you measuring what you think you're measuring? Is the test based on the best available theory of language and language use?
• the extent to which a test measures a theoretical construct or attribute.
• proficiency, communicative competence, and fluency are examples of linguistic constructs; self-esteem and motivation are psychological constructs.

Page 43

Criterion-Related Validity: usually expressed as a correlation between the test in question and the criterion measure.
• Concurrent (parallel) validity: can you use the current test score to estimate scores on other criteria? Does the test correlate with other existing measures? It is the extent to which a procedure correlates with the current behaviour of subjects; in practice, the use of another, more reputable and recognised test to validate one's own test.
• Predictive validity: is it accurate to use your existing students' scores to predict future students' scores? Does the test successfully predict future outcomes? It is the extent to which a procedure allows accurate prediction of a subject's future behaviour.

Page 44

Consequential Validity
• encompasses all of the consequences of a test, including its accuracy in measuring the intended criteria, its impact on the preparation of test-takers, its effect on the learner, and the social consequences of a test's interpretation and use.

Page 45

Practicality
• refers to the logistical and administrative issues involved in making, giving, and scoring an assessment instrument.
• includes "cost, time to construct and administer, ease of scoring and ease of reporting the results" (Mousavi, 2009).

A practical test:
1. stays within budgetary limits
2. stays within appropriate time constraints
3. is relatively easy to administer
4. appropriately utilizes available human resources
5. does not exceed available material resources
6. has a scoring/evaluation procedure that is specific and time-efficient

Page 46

Objectivity
• refers to the consistency of the teachers/examiners who mark the answer scripts.
• the extent to which an examiner awards the same score to the same answer script.

Objectivity is high if examiners are able to give the same score to similar answers, guided by the marking scheme.
Objective tests = highest objectivity; subjective tests = lowest objectivity.

Page 47

Authenticity
• "the degree of correspondence of the characteristics of a given language test task to the features of a target language task"

Authenticity is high if:
1. the language in the test is as natural as possible
2. items are contextualized
3. topics are meaningful
4. some thematic organization is provided for items
5. tasks correspond to real-world tasks

Page 48

Washback: the impact that tests have on teaching and learning.

Teacher
- Positive washback: induces teachers to cover their subject more thoroughly; improves teaching strategies; encourages a positive teaching and learning process.
- Negative washback: encourages teachers to build a "teaching to the test" curriculum; teachers may not fulfil the curriculum standard; the teaching of skills may be neglected.

Student
- Positive washback: makes students work harder.
- Negative washback: brings anxiety and distorts performance; makes students form a negative judgment of tests.

Decision makers
- Positive washback: use the authority of high-stakes testing to achieve goals, to improve teaching, and to support the introduction of a new curriculum.
- Negative washback: overwhelmingly use tests to promote their political agendas.

Page 49

Interpretability
• the test should be written in clear, correct and simple language.
• avoid ambiguous questions and instructions.
• clarity is essential to let the pupils know exactly what the examiner wants them to do.
• difficulty: the test questions should be appropriate in difficulty, neither too hard nor too easy.
• difficulty should be progressive, to reduce stress and tension.

Page 50

Chapter 5 Designing Language Classroom Tests

Page 51

Stages of Test Construction

1. Determining
1) what it is one wants to know
2) for what purpose
Aspects (questions that need to be answered): the examinees; the kind of test; the (stated) purpose; the abilities tested; the accuracy of results; the importance of the backwash effect; the scope of the test; and the constraints set by the unavailability of expertise, facilities, and time for construction, administration, and scoring.

2. Planning
1) determine the content
Aspects: the purpose (described); the characteristics of the test-takers, i.e. the nature of the population of examinees for whom the test is being designed; and a plan for evaluating the qualities of test usefulness (reliability, validity, authenticity, practicality, interactiveness, and impact).

Page 52

Planning (continued): the nature of the ability we want measured; identify resources; a plan for the allocation and management of resources; format and timing; criteria; levels of performance; scoring procedures.

3. Writing
Test-item writers' characteristics:
• experienced in test construction
• quite knowledgeable of the content of the test
• have the capacity to use language clearly and economically
• ready to sacrifice time and energy
Other aspects:
• Sampling: test constructors choose widely from the whole area of the course content (not including EVERYTHING under the course content in one version of the test).
• Decisions regarding content validity and beneficial backwash.
You've written it well when it is a representative sample of the course material.

Page 53: La notes (1 7 & 9)

Stages of Test Construction

Explanation

Preparing
You have to…
(/) Understand the major principles and techniques, and gain experience, before preparing test items.

AVOID preparing
• Test items which can be answered through test-wiseness.
Test-wiseness: examinees utilise the characteristics and formats of the test to guess the correct answer.

Reviewing
Principles for reviewing test items:
• The test should not be reviewed immediately after its construction, but after some considerable time.
• Other teachers or testers should review it. In a language test, it is preferable for native speakers, if available, to review the test.

Pre-testing
• The tester should administer the newly developed test to a group of examinees similar to the target group. PURPOSE: to analyse every individual item as well as the whole test.
• Numerical data (test results) should be collected to check the efficiency of each item; this should include item facility and item discrimination.

Page 54: La notes (1 7 & 9)

Stages of Test Construction

Explanation

Validating
• Identify IF: Item Facility (IF) shows the extent to which an item is easy or difficult.
• IF = number of correct responses (Σc) / total number of candidates (N)
• Conversely, item difficulty = number of wrong responses (Σw) / N
• The results of these equations range from 0 to 1. An item with a facility index of 0 is too difficult, and one with an index of 1 is too easy. The ideal item has a value of 0.5, and the acceptable range for item facility is 0.37–0.63: below 0.37 an item is difficult, and above 0.63 it is easy.

Items that are too easy or too hard lower reliability.
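The item facility calculation above can be sketched in a few lines of code; the function names, sample data, and the thresholds passed as parameters are illustrative only.

```python
# Sketch of the Item Facility (IF) calculation described above:
# IF = number of correct responses / total number of candidates.

def item_facility(responses):
    """responses: list of booleans, True meaning a correct answer."""
    return sum(responses) / len(responses)

def classify(if_value, low=0.37, high=0.63):
    """Apply the acceptability range from the notes (0.37 - 0.63)."""
    if if_value < low:
        return "difficult"
    if if_value > high:
        return "easy"
    return "acceptable"

# 30 candidates, 15 of whom answered the item correctly -> IF = 0.5 (ideal)
answers = [True] * 15 + [False] * 15
fi = item_facility(answers)
print(fi, classify(fi))  # 0.5 acceptable
```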

Page 55: La notes (1 7 & 9)

Preparing the Test Blueprint / Test Specifications
• Test specs = an outline of your test (what it will "look like") plus your guiding plan for designing an instrument that effectively fulfils your desired principles, especially validity.
• They include the following:
- a description of the test's content
- item types (methods, such as multiple-choice, cloze, etc.)
- tasks (e.g. written essay, reading a short passage, etc.)
- the skills to be included
- how the test will be scored
- how results will be reported to students
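As one possible way to make such a plan concrete, the specification could be captured as structured data and checked for completeness before item writing begins. Every field name and value below is a hypothetical illustration, not a prescribed format.

```python
# Hypothetical test specification captured as data, mirroring the
# fields listed above (content, item types, tasks, skills, scoring,
# reporting). Values are illustrative only.

test_spec = {
    "content": "Unit 3: describing people and places",
    "item_types": ["multiple-choice", "cloze"],
    "tasks": ["written essay", "reading a short passage"],
    "skills": ["reading", "writing"],
    "scoring": "analytic rubric, 0-20 per task",
    "reporting": "score band plus written feedback to students",
}

REQUIRED = {"content", "item_types", "tasks", "skills", "scoring", "reporting"}

def is_complete(spec):
    """A spec is usable only if every planning field is present and filled."""
    return REQUIRED <= spec.keys() and all(spec[k] for k in REQUIRED)

print(is_complete(test_spec))  # True
```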

Page 56: La notes (1 7 & 9)

What is an item?
• A tool, instrument, instruction or question used to get feedback from test-takers
• Evidence of something that is being measured
• Useful information for consideration in measuring or asserting a construct
• Can be classified as either a recall item or a thinking item
• Recall item: an item that requires one to recall information in order to answer
• Thinking item: an item that requires test-takers to use their thinking skills to attempt it

Page 57: La notes (1 7 & 9)

Sequential steps in designing test specs
• A broad outline of how the test will be organised
• Which of the eight sub-skills you will test
• What the various tasks and item types will be
• How results will be scored, reported to students, and used in future classes (washback)

Remember to…
• Know the purpose of the test you are creating
• Know as precisely as possible what it is you want to test
• Not conduct a test hastily
• Examine carefully the objectives for the unit you are testing

Page 58: La notes (1 7 & 9)

Bloom's Taxonomy (Revised)
• Def: A systematic way of describing how a learner's performance develops from simple to complex levels across the affective, psychomotor and cognitive domains of learning.

Page 59: La notes (1 7 & 9)
Page 60: La notes (1 7 & 9)

The Cognitive Dimension Process

Page 61: La notes (1 7 & 9)

The Cognitive Dimension Process

Page 62: La notes (1 7 & 9)

Level 3C - 3

Page 63: La notes (1 7 & 9)
Page 64: La notes (1 7 & 9)
Page 65: La notes (1 7 & 9)

Categories & Cognitive Processes Definition

Factual Knowledge The basic elements students must know to be acquainted with a discipline or solve problems in it

Conceptual Knowledge The interrelationships among the basic elements within a larger structure that enable them to function together

Procedural Knowledge How to do something, methods of inquiry, and criteria for using skills, algorithms, techniques, and methods

Metacognitive Knowledge Knowledge of cognition in general as well as awareness and knowledge of one’s own cognition

The Knowledge Domain

Page 66: La notes (1 7 & 9)

SOLO Taxonomy
• Def: (Structure of the Observed Learning Outcome) a systematic way of describing how a learner's performance develops from simple to complex levels in their learning.
• There are 5 stages, namely: Prestructural, Unistructural and Multistructural, which form a quantitative phase, and Relational and Extended Abstract, which form a qualitative phase (refer to Figure 1.0)
• A means of classifying learning outcomes in terms of their complexity, enabling teachers to assess students' work in terms of its quality.

Page 67: La notes (1 7 & 9)

Figure 1.0

Page 68: La notes (1 7 & 9)
Page 69: La notes (1 7 & 9)

Functions of SOLO taxonomy
An integrated strategy, to be used:
• In lesson design (intended learning outcomes)
• In task guidance
• In formative and summative assessment
• In deconstructing exam questions to understand the marks awarded
• As a vehicle for self-assessment and peer-assessment

Page 70: La notes (1 7 & 9)

Advantages of SOLO taxonomy

Structure of the taxonomy
• Encourages viewing learning as an ongoing process, moving from simple recall of facts towards deeper understanding; learning is a series of interconnected webs that can be built upon and extended.
• Consists of a series of cycles (especially between the Unistructural, Multistructural and Relational levels), which allow for the development of breadth of knowledge as well as depth.
In turn…
• Creates students who are "self-regulating, self-evaluating learners who were well motivated by learning."

SOLO-based techniques
• The use of constructive alignment encourages teachers to be more explicit when creating learning objectives, focusing on what the student should be able to do and at which level.
In turn…
• Students are able to make progress, and rubrics can be created for use in class to make the process explicit to the student.

Its HOTS properties
• Scaffolds in-depth discussion
In turn…
• Encourages students to develop interpretations, use research and critical thinking effectively to develop their own answers, and write essays that engage with the critical conversation of the field.
• May also be helpful in providing a range of techniques for differentiated learning.

Page 71: La notes (1 7 & 9)

Proponents of the SOLO taxonomy say…
• It is a model of learning outcomes that helps schools develop a common understanding.
• It is a 'framework for developing the quality of assessment' and is 'easily communicable to students'.
• Hattie outlines three levels of understanding: surface, deep and conceptual. He indicates that: "The most powerful model for understanding these three levels and integrating them into learning intentions and success criteria is the SOLO model."

Page 72: La notes (1 7 & 9)

Critics of the SOLO taxonomy say…
• There is potential to misjudge the level of functioning.
• It has 'conceptual ambiguity'; the 'categorisation' is 'unstable'.
• The structure is referred to as a hierarchy, hence concerns arise when complex processes, such as human thought, are categorised in this manner.

Page 73: La notes (1 7 & 9)

Guidelines for constructing test items

Guideline Elaboration

Aim of test
• Developed to precisely measure the objectives prescribed by the blueprint
• Meets quality standards

Range of topics to be tested
• Measures the test-takers' ability or proficiency in applying the knowledge and principles of the topics that they have learnt

Range of skills to be tested
• Items should have cognitive characteristics exemplifying understanding, problem-solving, critical thinking, analysis, synthesis, evaluation and interpretation rather than just declarative knowledge.
• (Use Bloom's taxonomy as a tool in item writing)

Test format
Needs a logical and consistent stimulus format. Why?
For test item writers: it helps expedite the laborious process of writing test items and supplies a format for asking basic questions.
For test-takers:
• The questioning process in itself does not add unnecessary difficulty to answering questions
• Test-takers can quickly read and understand the questions, since the format is expected
Page 74: La notes (1 7 & 9)

Guideline Elaboration

International and Cultural Considerations (bias)
Refrain from:
• the use of slang
• geographic references
• historical references or dates (holidays)
…that may not be understood by an international examinee.

Level of difficulty
Assure that the test…
• Has a planned number of questions at each level of difficulty
• Is able to distinguish mastery and non-mastery performance states
• Weak students can answer easy items
• Intermediate language proficiency students can answer easy and moderate items
• High language proficiency students can answer easy, moderate and advanced test items
• Encompasses all three levels of difficulty

Page 75: La notes (1 7 & 9)

Test format
• Refers to the layout of questions on a test. For example, the format of a test could be two essay questions, 50 multiple-choice questions, etc.

*Note: If you wish to know the outlines of some large-scale standardised tests, please refer to pages 64 & 65 in the PPG Module

Page 76: La notes (1 7 & 9)

Chapter 6Assessing Language

Skills Content

Page 77: La notes (1 7 & 9)

Types of test items to assess language skills

Language Skills Elaboration

Listening
Two kinds of listening tests:
• Tests that test specific aspects of listening, like sound discrimination
• Task-based tests which test skills in accomplishing different types of listening tasks considered important for the students being tested

Four types of listening performance from which assessment could be considered:

Intensive: Listening for perception of the components (phonemes, words, intonation, discourse markers, etc.) of a larger stretch of language.

Responsive: Listening to a relatively short stretch of language (a greeting, question, command, comprehension check, etc.) in order to make an equally short response.

Selective: Processing stretches of discourse, such as short monologues of several minutes, in order to "scan" for certain information, for example listening for names, numbers, grammatical categories, directions (in a map exercise), or certain facts and events.

Extensive: Listening to develop a top-down, global understanding of spoken language, for example listening to a conversation and deriving a comprehensive message or purpose, and listening for the gist and making inferences.

Page 78: La notes (1 7 & 9)

Language Skills Elaboration

Speaking
Objective test: tests skills such as…
• Pronunciation
• Knowledge of what language is appropriate in different situations
• Language required for doing different things, like describing, giving directions, giving instructions, etc.

Integrative task-based test: involves finding out whether pupils can perform different tasks using spoken language that is appropriate for the purpose and the context. For example:
• Describing scenes shown in a picture
• Participating in a discussion about a given topic
• Narrating a story, etc.

CATEGORIES FOR ORAL ASSESSMENT (refer to the yellow table)

Page 79: La notes (1 7 & 9)

Category Elaboration

Imitative
• The ability to imitate a word or phrase or possibly a sentence / pronunciation
• A number of prosodic (intonation, rhythm, etc.), lexical, and grammatical properties of language may be included

Intensive
• The production of short stretches of oral language designed to demonstrate competence in a narrow band of grammatical, phrasal, lexical, or phonological relationships.
• E.g. directed response tasks (requests for specific production of speech), reading aloud, sentence and dialogue completion, limited picture-cued tasks including simple sentences, and translation up to the simple-sentence level.

Responsive
• Interaction and test comprehension, but at the somewhat limited level of very short conversations, standard greetings, small talk, and simple requests and comments.
• The stimulus is almost always a spoken prompt (to preserve authenticity), with one or two follow-up questions or retorts.

Interactive
• Increased length and complexity compared with responsive.
• May include multiple exchanges and/or multiple participants.
• Two types: (a) transactional language, which has the purpose of exchanging specific information, and (b) interpersonal exchanges, which have the purpose of maintaining social relationships.

Extensive
• Speeches, oral presentations, and storytelling, during which the opportunity for oral interaction from listeners is either highly limited (perhaps to nonverbal responses) or ruled out altogether.
• Language style is more deliberative (planning is involved)
• May include informal monologue, such as casually delivered speech (e.g. recalling a vacation in the mountains, conveying recipes, recounting the plot of a novel or movie).

Page 80: La notes (1 7 & 9)

Language Skills Elaboration

Reading

Types of reading:

Skimming: Inspecting a lengthy passage rapidly

Scanning: Locating specific information within a short period of time

Receptive / Intensive: A form of reading aimed at discovering exactly what the author seeks to convey

Responsive: Responding to some point in a reading text through writing or by answering questions

Page 81: La notes (1 7 & 9)

Meaning conveyed through reading text

Grammatical meaning: Meanings that are expressed through linguistic structures, such as complex and simple sentences, and the correct interpretation of those structures.

Informational meaning: The concepts or messages contained in the text. May be assessed through various means such as summary and précis writing.

Discourse meaning: The perception of rhetorical functions conveyed by the text.

Writer's tone: The writer's tone – whether it is cynical, sarcastic, sad, etc.

Page 82: La notes (1 7 & 9)

Language Skills Elaboration

Writing

Imitative
• The ability to spell correctly and to perceive phoneme–grapheme correspondences in the English spelling system
• The mechanics of writing
• Form is the primary focus, while context and meaning are of secondary concern.

Intensive (controlled)
• Producing appropriate vocabulary within a context, collocations and idioms, and correct grammatical features up to the length of a sentence.

Responsive
• Performing at a limited discourse level, connecting sentences into a paragraph and creating a logically connected sequence of two or three paragraphs.
• Tasks relate to pedagogical directives, lists of criteria, outlines, and other guidelines.
• E.g. brief narratives and descriptions, short reports, lab reports, summaries, brief responses to reading, and interpretations of charts and graphs.
• Form-focused attention is mostly at the discourse level, with a strong emphasis on context and meaning.

Extensive
• Implies successful management of all the processes and strategies of writing for all purposes, up to the length of, for example, an essay.
• The focus is on achieving a purpose, organising and developing ideas logically, using details to support or illustrate ideas, demonstrating syntactic and lexical variety, and engaging in the process of multiple drafts to achieve a final product.
• Focus on grammatical form is limited to occasional editing and proofreading of a draft.

Page 83: La notes (1 7 & 9)

Brown’s (Assessing Skills) Skill Type Test item

Listening Intensive Listening • Recognizing phonological and morphological elements• Paraphrase recognition

Responsive Listening • Responding to a stimulus; conversation, requests

Selective Listening • Listening cloze• Information transfer• Sentence repetition

Extensive Listening • Dictation• Communicative stimulus-response tasks• Authentic listening tasks

Speaking Intensive Speaking • Directed response tasks• Read-Aloud tasks• Sentence/dialogue completion tasks and oral questionnaires• Picture-cued tasks

Responsive Speaking • Q & A• Giving instructions and directions• Paraphrasing

Interactive Speaking • Interview• Role-play• Discussions and conversations• Games

Extensive speaking • Oral presentations• Picture-cued storytelling • Retelling a story, news event

Page 84: La notes (1 7 & 9)

Skill Type Test item

Reading Perceptive reading • Reading aloud• Written response• Multiple-choice• Picture-cued items

Selective reading • Matching tasks• Editing tasks• Picture-cued tasks• Gap-filling tasks

Interactive reading • Cloze tasks• Impromptu reading + comprehension questions• Short answer tasks• Editing longer texts• Scanning• Ordering tasks• Information transfer; reading charts, maps, graphs, diagrams

Extensive reading • Skimming tasks• Summarizing and responding • Notetaking and outlining

Writing Imitative writing • Writing letters, words and punctuation• Spelling tasks and detecting phoneme – grapheme correspondences

Intensive (Controlled) writing • Dictation and dicto-comp• Grammatical transformation tasks• Picture-cued tasks• Vocabulary assessment tasks• Ordering tasks• Short answer and sentence completion tasks

Page 85: La notes (1 7 & 9)

Skill Type Test item

Writing Responsive and extensive writing • Paraphrasing• Guided Q & A• Paragraph constructions tasks• Strategic options• Standardized tests of responsive writing

Grammar & Vocabulary

Selected response • Multiple-choice tasks• Discrimination tasks• Noticing tasks or consciousness-raising tasks

Limited production • Gap-filling tasks• Short-answer tasks• Dialogue-completion tasks

Extended production • Information gap tasks• Role-play or simulation tasks

Page 86: La notes (1 7 & 9)

Objective and Subjective Test

Objective test
• Tests that are graded objectively
• Includes multiple-choice tests, true–false items and matching items
• Similar to select-type tests, where students are expected to select or choose the answer from a list of options

Subjective test
• Involves subjectivity in grading
• Includes essays and short-answer questions
• Similar to supply-type tests, as students are expected to supply the answer through their essays

Subjective + objective
• Dictation tests and fill-in-the-blank tests, as well as interviews and role plays

Page 87: La notes (1 7 & 9)

Type of test : according to how students are expected to respond

Selected response: Students do not create any language but rather select the answer from a given list.

Constructed response: Students produce language by writing, speaking, or doing something else.

Personal response: Students produce language, but each student's response may differ from the others', and students can "communicate what they want to communicate".

Selected response: true–false, matching, multiple choice
Constructed response: fill-in, short answer, performance test
Personal response: conferences, portfolios, self- and peer assessments

Page 88: La notes (1 7 & 9)

Types of test items to assess language content

Page 89: La notes (1 7 & 9)

Discrete: Language is seen as being made up of smaller units, and it may be possible to test language by testing each unit at a time.

Integrative: Language is an integrated whole which cannot be broken up into smaller units or elements.

Page 90: La notes (1 7 & 9)

Communicative test
• Students have to produce the language in an interactive setting involving some degree of unpredictability, which is typical of any language interaction situation.

Page 91: La notes (1 7 & 9)

The three principles of communicative tests are :

• involve performance;
• are authentic; and
• are scored on real-life outcomes.

Page 92: La notes (1 7 & 9)

Limitations in applying the communicative test
• Issues of practicality, especially the amount of time and extent of organisation required to allow such communicative elements to emerge.

Advantages in applying the communicative test
• Valid language tasks that are purposeful and can stimulate positive washback in teaching and learning.

Page 93: La notes (1 7 & 9)

Chapter 7 Scoring, grading and assessment criteria

Page 94: La notes (1 7 & 9)

Scoring approaches

Objective
• Relies on quantified methods of evaluating students' writing

Holistic
• The reader (examiner) reacts to the student's composition as a whole, and a single score is awarded to the writing
• Each score on the scale is accompanied by general descriptors of ability
• Related: primary trait scoring

Analytical
• Raters assess students' performance on a variety of categories which are hypothesised to make up the skill of writing
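A minimal sketch of how an analytic score might be combined: each category is rated separately and then weighted into one composite. The categories and weights below are hypothetical illustrations, not a prescribed scale.

```python
# Illustrative analytic scoring: rate each category separately, then
# combine with weights. Categories and weights are hypothetical.

WEIGHTS = {"content": 0.30, "organization": 0.20,
           "vocabulary": 0.20, "grammar": 0.20, "mechanics": 0.10}

def analytic_score(ratings, weights=WEIGHTS):
    """ratings: dict of category -> score on a 0-100 scale."""
    return sum(weights[cat] * ratings[cat] for cat in weights)

ratings = {"content": 80, "organization": 70, "vocabulary": 75,
           "grammar": 60, "mechanics": 90}
print(analytic_score(ratings))  # composite score: 74.0
```

A holistic approach, by contrast, would collapse all of this into a single judged score, which is quicker but hides the per-category diagnostic detail.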

Page 95: La notes (1 7 & 9)

Comparison between approaches

Holistic
Advantages:
• Quickly graded
• Provides a public standard that is understood by teachers and students alike
• Relatively high degree of rater reliability
• Applicable to the assessment of many different topics
• Emphasises the students' strengths rather than their weaknesses
Disadvantages:
• The single score may actually mask differences across individual compositions
• Does not provide a lot of diagnostic feedback

Analytical
Advantages:
• Provides clear guidelines for grading, in the form of the various components
• Allows graders to consciously address important aspects of writing
Disadvantages:
• Writing ability is unnaturally split up into components

Objective
Advantages:
• Emphasises the students' strengths rather than their weaknesses
Disadvantages:
• Some degree of subjectivity is still involved
• Accentuates negative aspects of the learner's writing without giving credit for what they can do well

Page 96: La notes (1 7 & 9)

Questions you can attempt…
• Describe, with examples, how holistic and analytical rubrics can be used to assess Year 6 pupils' writing based on the following skill:
- Write simple factual descriptions of things, events, scenes and what one saw and did.
• Characteristics of each approach

Page 97: La notes (1 7 & 9)

Chapter 9 Reporting of Assessment

Data

Page 98: La notes (1 7 & 9)

Purposes of reporting
• The main purpose of tests is to obtain information concerning a particular behaviour or characteristic.
• To evaluate the effectiveness of one's own teaching or instructional approach and implement the necessary changes.
• Based on the information obtained from tests, several different types of decisions can be made.

Page 99: La notes (1 7 & 9)
Page 100: La notes (1 7 & 9)

Reporting methods

Norm-referenced assessment and reporting: Assessing and reporting a student's achievement and progress in comparison to other students.

Criterion-referenced assessment and reporting: Assessing and reporting a student's achievement and progress in comparison to predetermined criteria. An outcomes approach to assessment will provide information about student achievement to enable reporting against a standards framework.

An outcomes approach acknowledges that students, regardless of their class or grade, can be working towards syllabus outcomes anywhere along the learning continuum.
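The contrast between the two reporting methods can be sketched in a few lines: norm-referenced reporting asks where a score sits relative to the group, while criterion-referenced reporting asks whether it meets a predetermined standard. The cohort scores and the 60-mark mastery cut-off below are hypothetical examples.

```python
# Norm-referenced vs criterion-referenced reporting, illustrated.

def percentile_rank(score, cohort):
    """Norm-referenced: percentage of the cohort scoring below this score."""
    return 100 * sum(s < score for s in cohort) / len(cohort)

def mastery(score, cutoff=60):
    """Criterion-referenced: compare the score to a fixed standard."""
    return "mastered" if score >= cutoff else "not yet mastered"

cohort = [45, 50, 55, 62, 70, 74, 81, 88]
print(percentile_rank(62, cohort))  # 37.5 (above 3 of 8 peers)
print(mastery(62))                  # mastered
```

Note how the same score of 62 looks weak in norm-referenced terms yet counts as mastery against the criterion, which is exactly why the two reporting methods can tell different stories about one student.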

Page 101: La notes (1 7 & 9)

Principles of effective and informative assessment and reporting

Has clear, direct links with outcomes

Is integral to teaching and learning

Is balanced, comprehensive and varied

Is valid

Is fair

Engages the learner

Values teacher judgement

Is time efficient and manageable

Recognises individual achievement and progress

Involves a whole school approach

Actively involves parents

Conveys meaningful and useful information

Page 102: La notes (1 7 & 9)

Chapter 10Issues and Concerns related to assessment in Malaysian

Primary Schools

Page 103: La notes (1 7 & 9)

Components of PBS

School assessment: Refers to written tests that assess subject learning. The test questions and marking schemes are developed, administered, scored, and reported by school teachers based on guidance from LP.

Central assessment: Refers to written tests, project work, or oral tests (for languages) that assess subject learning. LP develops the test questions and marking schemes. The tests are, however, administered and marked by school teachers.

Psychometric assessment: Refers to aptitude tests and a personality inventory that assess students' skills, interests, aptitude, attitude and personality. Aptitude tests are used to assess students' innate and acquired abilities, for example in thinking and problem solving. The personality inventory is used to identify the key traits and characteristics that make up a student's personality. LP develops these instruments and provides guidelines for use.

Physical, sports, and co-curricular activities assessment: Refers to assessments of student performance and participation in physical and health education, sports, uniformed bodies, clubs, and other non-school-sponsored activities.

Page 104: La notes (1 7 & 9)

Benefits of PBS
• Enables students to be assessed on a broader range of output over a longer period of time.
• Provides teachers with more regular information to take appropriate remedial actions for their students.
• Will hopefully reduce the overall emphasis on teaching to the test, so that teachers can focus more time on delivering meaningful learning as stipulated in the curriculum.