
CHAPTER 3: DEVELOPMENT OF TOOLS FOR CLASSROOM-BASED ASSESSMENT

This material is originally prepared and contains digital signatures. Further reproduction is prohibited without prior permission from the authors.

The painstaking process of classroom assessment dictates the deployment of varied assessment tools to obtain a substantial view of the quality of learning in the classroom. Classroom assessment tools are employed to collect information that can further be interpreted, analyzed, and synthesized to facilitate developmental feedback to students during and after their engagement with the instructional process. They provide useful and objective measures of learning outcomes. The assessment information can speak of how well the learners are doing in a class, how effectively the teachers deliver the instruction, and what more can be done to ensure an effective instructional process. The more teachers know about what and how students are learning, the better they can plan meaningful learning activities.

Learning Outcomes: At the end of the chapter, the students are able to:

1. identify the different types of tests;
2. implement the essential steps in the test development process, which include:
   a. re-examination of the target outcomes;
   b. determining the desired competencies to be measured;
   c. preparing a Table of Specification (TOS); and
   d. construction of valid and appropriate classroom assessment tests for measuring learning outcomes;
3. calculate the validity and reliability of a prepared test;
4. identify blunders in a constructed test; and
5. illustrate the test development process.

Test Development Process

The process of test construction for classroom testing applies the same initial steps as the construction of any instrument designed to measure a psychological construct. The process of test development involves three key phases and 12 steps, as illustrated in Figure 11.

Figure 11. Test Development Process


Planning phase. This is where the learning outcomes to be assessed and competencies to be measured are specified (what to test). Based on the target outcomes and competencies, the teacher decides on the type and method of assessment to be used (how to test). A table of specifications is prepared to guide the item construction phase.

The important steps in planning for a test are:

1. Re-examine the instructional outcomes. Review the set instructional outcomes. Do they cover the various levels of the learning taxonomy? Do they require the application of higher-order thinking skills (HOTS) or evoke critical thinking?

2. Determine the competencies to be measured. What knowledge, skills, and values are students expected to learn or master?

3. Decide on the type and method of assessment to use. What test can appropriately measure whether the set instructional outcomes are achieved? Can the test cover the learning outcomes intended and essential to be achieved? What test format is best to use? How many items can practically be given within the set period of time?

4. Prepare a Table of Specification (TOS).

A table of specifications is a test blueprint that details the content areas to be covered by a test, the classification of test items according to test type/format, and the item numbers or placement in the test, so as to achieve a fair and balanced sampling of the skills to be tested. There are several formats for preparing a TOS.

Format 1:

Content | Number of Items
1. Importance of Research | 6
2. Types of Research | 12
3. Qualities of a Good Researcher | 8
4. The Research Process | 14
Total | 40


Format 2:

Topics | Cognitive Level | Type of Test | Item Number | Total Points
1. Importance of Research | Remembering | Enumeration | 9-14 | 6
2. Types of Research | Evaluating | Constructed response | 15-26 | 12
3. Qualities of a Good Researcher | Understanding | True or False | 1-8 | 8
4. The Research Process | Creating | Creating a diagram | 27-40 | 14
Total | | | | 40

Format 3:

Specific Objectives | No. of Class Sessions | No. of Items | K-C | A | HOTS | Item Distribution
1. List the importance of research | 1 ½ | 6 | / | | | 9-14
2. Identify and justify the type of research that will best address a given research question | 3 | 12 | | | / | 15-26
3. Distinguish a statement that describes a good researcher | 2 | 8 | / | | | 1-8
4. Create a diagram to illustrate the research process | 3 ½ | 14 | | | / | 27-40
Total | 10 | 40 | | | |

K – Knowledge; C – Comprehension; HOTS – Higher Order Thinking Skills


Format 4:

Content | Class Sessions (by hour) | Rem | Und | App | An | Eval | Crea | Total Items | Item Dist
1. Importance of Research | 1 ½ | / | | | | | | 6 | 9-14
2. Types of Research | 3 | | | | | / | | 12 | 15-26
3. Qualities of a Good Researcher | 2 | | / | | | | | 8 | 1-8
4. The Research Process | 3 ½ | | | | | | / | 14 | 27-40
Total | 10 | | | | | | | 40 |

Rem – Remembering; Und – Understanding; App – Application; An – Analysis; Eval – Evaluating; Crea – Creating

In deciding on the number of items per subtopic, the formula below is observed:

Number of items = (number of class sessions ÷ total number of class sessions) × desired total number of items

Example: For the topic on the importance of research, the given are: number of class sessions = 1 ½; desired total number of items = 40; total number of class sessions = 10.

Number of items = (1 ½ ÷ 10) × 40 = 6 items
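This proportional allocation is easy to script. The following Python sketch mirrors the Format 1 example (the topic names and session counts come from the tables above; the use of round() is an assumption about how fractional results would be handled):

# Allocate test items to topics in proportion to class sessions:
# items = (topic sessions / total sessions) x desired total items.
sessions = {
    "Importance of Research": 1.5,
    "Types of Research": 3,
    "Qualities of a Good Researcher": 2,
    "The Research Process": 3.5,
}
total_items = 40
total_sessions = sum(sessions.values())  # 10

for topic, hours in sessions.items():
    n_items = round(hours / total_sessions * total_items)
    print(f"{topic}: {n_items} items")
# Prints 6, 12, 8, and 14 items, totaling 40.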


The length of time, the type of test, and the type of item used are also factors considered in determining the number of items to be constructed in a test. Gabuyo (2012) presents the estimated average time to answer each type of test (see Table 4).

Table 4. Average Time to Answer Each Type of Test

TEST TYPE | AVERAGE TIME TO ANSWER
True-false | 30 seconds
Multiple-choice | 60 seconds
Multiple-choice (higher-level learning objectives) | 90 seconds
Short answer | 120 seconds
Completion | 60 seconds
Matching | 30 seconds per response
Short essay | 10-15 minutes
Extended essay | 30 minutes
Visual image | 30 seconds
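These averages can also be used to check whether a planned test fits the class period. Below is a minimal Python sketch; the test plan is hypothetical, and the per-item times follow Table 4:

# Rough answering-time estimate using the per-item averages in Table 4.
seconds_per_item = {
    "true-false": 30,
    "multiple-choice": 60,
    "multiple-choice (higher-level)": 90,
    "short answer": 120,
    "completion": 60,
}

# Hypothetical test plan: (test type, number of items).
plan = [("true-false", 10), ("multiple-choice", 20), ("short answer", 5)]

total_seconds = sum(seconds_per_item[kind] * n for kind, n in plan)
print(f"Estimated answering time: {total_seconds / 60:.0f} minutes")  # 35 minutes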

Test Design and Construction Phase. This is where test items are designed and constructed following the appropriate item format for the specified learning outcomes of instruction. The test items are dependent upon the educational outcomes and materials/topics to be tested. This phase includes the following steps:

1. Item Construction. According to Osterlind (1989), the perils of writing test items without adequate forethought are great. Decisions about persons, programs, projects, and materials are often made on the basis of test scores. If a test is made up of items haphazardly written by untutored persons, the resulting decisions could be erroneous. Such errors can sometimes have serious consequences for learners, teachers, and the institution as a whole. Performances, programs, projects, and materials could be misjudged. Obviously, such a disservice to examinees, as well as to the evaluation process, should be avoided if at all possible.

To help classroom teachers improve the quality of test construction, Kubiszyn and Borich (2007) suggested general guidelines for writing test items, consisting of the following:

- Begin writing items far enough in advance that you will have time to revise them.
- Match items to the intended outcomes at an appropriate level of difficulty to provide a valid measure of the instructional objectives. Limit the question to the skill being assessed.
- Be sure each item deals with an important aspect of the content area and not with trivia.
- Be sure the problem posed is clear and unambiguous.
- Be sure each item is independent of all other items. The answer to one item should not be required as a condition for answering the next item. A hint to one answer should not be embedded in another item.
- Be sure the item has one correct or best answer on which experts would agree.


- Prevent unintended clues to the answer in the statement or question. Grammatical inconsistencies such as "a" or "an" give clues to the correct answer to those students who are not well prepared for the test.
- Avoid replication of the textbook in writing test items. Do not quote directly from textual materials.
- Avoid tricky questions in an achievement test. Do not waste time testing how well the students can interpret your intentions.
- Try to write items that require higher-order thinking skills.

Types of Test

To create effective tests, the teacher needs to familiarize himself/herself with the different types of tests and avoid any pitfalls in test construction through helpful and definitive guidelines.

Figure 12. Types of Test

Objective Test. This test consists of questions or items that require factual answers. It can be scored quickly and unambiguously by the teacher or anyone who has the answer key. The response options are often structured so that they can easily be marked as correct or incorrect, thus minimizing subjectivity or bias on the part of the scorer.

a. Selection Test. In this test type, the students select the best possible answer/s from the choices that are already given and do not need to recall facts or information from memory.

[Figure 12 diagram: Objective Test branches into Selection Test (True-False, Matching, Multiple Choice) and Supply Test (Short Answer, Completion); Subjective Test branches into Essay (Restricted, Extended) and Performance Test (Simulated Performance, Product-based).]


a.1 True-false test contains items with only two fixed choices (binary options). The students simply recognize a correct or an incorrect statement, which reflects their knowledge and understanding of the facts or information presented.

TRUE-FALSE TEST

Advantages:
- True-false items are easy to formulate.
- The score is more objective than in an essay test.
- It covers a large range of content in a short span of time.
- The test is easy to formulate and quick to check and score.
- It is easier to prepare compared to multiple-choice and matching-type tests.

Disadvantages:
- There is a high probability of guessing.
- It is not well-suited for measuring complex mental processes.
- It often measures low-level thinking skills that are limited to the ability to recognize, recall, and understand information.

Guidelines

- Keep the statement direct, brief, and concise but complete.
- Each statement should focus on a single idea, unless it is intended to show a cause-and-effect relationship.
- Use approximately the same number of true and false statements.
- Don't copy statements directly from the textbook.
- Specify clearly in the directions where and how the students should mark their answers.
- Arrange the true and false items in random order to minimize the chance for students to detect a pattern of responses.
- BEWARE of using:
  o trivial and tricky questions;
  o opinion-based statements, unless the statement is attributed to an author, expert, or proponent;
  o superlatives such as best, worst, largest, etc.;
  o negatives or double negatives; if these cannot be avoided, bold or underline the negative words to call the attention of the examinees; and
  o clues to the correct choice through specific determiners such as some, sometimes, and many, which tend to appear in true statements, and never, always, all, and none, which tend to appear in false statements.


a.2 Matching type test provides two columns for learners to connect or match words, phrases, or sentences. Column A on the left side contains the descriptions, called premises, and Column B on the right side contains the options for answers, called responses. The items in Column A are numbered, while the items in Column B are labeled with capital letters. The convention is for learners to match the given response on the right with the premise on the left.

MATCHING TYPE TEST

Advantages:
- It is easy to construct, check, and grade.
- It can cover a lot of content in a given test.
- It provides accurate, efficient, and reliable test scores.
- It is best suited for measuring the student's ability to make associations.
- The effect of guessing is less compared to true-false and multiple-choice tests.

Disadvantages:
- It assesses only low levels of the cognitive domain, such as simple recall or memorization of information.
- Answering matching questions is time-consuming for students.

Guidelines

- The descriptions and options must be short and straightforward.
- Keep the descriptions and options homogeneous or interconnected by themes.
- Place all descriptions on the left side, marked as Column A, and the options (expressed in shorter form) on the right side, marked as Column B.
- Make all descriptions and options appear on the same page.
- Allow more options than descriptions, or indicate in the directions that options may be used more than once, to decrease the chance of guessing.
- Specify the basis for matching in the directions.
- Avoid too many correct answers.
- When using names, always include the complete name (first name and surname) to avoid ambiguities.
- Arrange the answer choices in a logical order (chronological or alphabetical) to help the examinee locate the correct answer quickly.
- Give a minimum of three items and a maximum of seven items for the elementary level, and a maximum of seventeen items for the secondary and tertiary levels.


a.3 Multiple-choice test requires test-takers to choose the correct answer from the list of options given. It includes three parts: the stem, the keyed option, and the incorrect options or alternatives. The stem represents the problem or question usually expressed in the completion form or question form. The key option is the correct answer. The incorrect options or alternatives are also called the distractors or foils.


MULTIPLE-CHOICE TEST

Advantages:
- It can be scored and analyzed efficiently, quickly, and reliably.
- It measures learning outcomes at various levels, from knowledge to evaluation.
- It can measure almost any educational objective.
- It measures broad samples of content within a short span of time.
- Its questions/items can further be analyzed in terms of validity and reliability.
- If item analysis is applied, it can reveal the difficulty of an item and its ability to discriminate between high- and low-performing students.

Disadvantages:
- The development of good items in a test is time-consuming.
- Plausible distractors are hard to formulate.
- Test scores can be influenced by other factors, such as test-wiseness or the reading ability of the examinees.
- It is not effective in assessing the problem-solving skills of students.
- It is not applicable when measuring the ability to organize and express ideas.

Guidelines

- Phrase each question concisely. Use simple, precise, and unambiguous wording.
- Avoid the use of trivial and tricky questions.
- Use three to five options to challenge critical thinking and discourage guessing.
- Present a diagram, drawing, or illustration when students are asked to apply, analyze, or evaluate ideas.
- Use tables, figures, or charts when students are required to interpret ideas.
- Use pictures, if possible, when students are required to apply concepts and principles.



b. Supply Test. The supply test, otherwise known as the constructed-response test, requires students to create and supply their own answers or perform a certain task to show mastery of knowledge or skills, rather than choosing an answer to a question. It includes short-answer, completion-type, and essay-type items. These tests can be categorized as either objective or subjective. They are in objective form when definite answers are asked of the examinees and the scoring is stable, that is, not influenced by the judgment of the scorers. On the other hand, they are in subjective form when students are allowed to answer the test items in their own words or using their original ideas.

Guidelines

The stem should:
o be written in question or completion form; if blanks are provided in completion form, they are placed at the end, NOT at the beginning or in the middle of the sentence/statement;
o be clear and concise (not using excessive/irrelevant words);
o avoid negative words such as not or except; if these cannot be avoided, write them in bold or capital letters to call the attention of the examinee; and
o be free from grammatical clues and errors.

Options:
o are arranged in a logical order;
o are marked with capital letters;
o are listed vertically beneath the stem;
o provide only one correct or clearly best answer per item;
o are kept independent and do not overlap with options in other items;
o are homogeneous in content to raise the difficulty level of an item;
o are of uniform or equal length as much as possible; and
o avoid or sparingly use the phrases "all of the above" and "none of the above."

Distractors:
o should be plausible and effective but not so attractive that most students mistake them for the correct answer; each distractor should be chosen by at least 5% of the examinees, but not by more than those choosing the keyed answer;
o should be equally familiar to the examinees; and
o should not be constructed for the purpose of tricking the examinees.


b.1 Short-answer test contains items that ask students to provide exact answers. Rather than simply choosing from several options provided, the examinees either provide clearly-defined answers or compose short phrases for answers.

SHORT ANSWER TEST

Advantages:
- It takes a shorter time to answer than an essay test.
- The answers are generally definite and thus easier to check and score.
- There is greater objectivity and reliability in scoring than in essay tests.
- It has more extensive topic coverage compared to an essay test.
- The answers to test questions/items are not pre-selected but supplied by the examinees.
- There is less chance of guessing.
- Its preparation and administration are easier than for an essay test.

Disadvantages:
- More emphasis is placed on rote learning.
- It cannot measure ability and attitude.
- It is weak in measuring language skill/expression.
- Objectivity and accuracy in scoring may be influenced by the examinee's handwriting and spelling skills.

Guidelines

- Clearly specify in the test directions how the question should be answered.
- Frame questions/items using words that are easily understood by the examinees.
- Restate the material; do not copy exact wordings from the text.
- Make sure that the items call for factually correct answers.


b.2 Completion test or fill-in-the-blank test requires examinees to supply word/s, phrase/s, symbol/s, or number/s to complete a statement.

COMPLETION TEST

Advantages:
- It is easy to construct.
- It minimizes guessing.
- It has wider coverage in terms of content.

Disadvantages:
- It is more difficult or tedious to score than other objective types of tests.
- It is typically not suitable for measuring complex learning outcomes.

Guidelines
- Only the keywords to be supplied by the examinees should be omitted.
- The item should require a single-word answer or a brief answer.
- Use only one blank per item. Preferably, place it at the end of the statement.
- Blanks provided should be equal in length, and their length should provide sufficient space for the answer.
- Do not use indefinite statements that allow varying answers.
- Indicate the units (e.g., cm, ft, in.) when items require numerical answers.
- Avoid grammatical clues such as "a" or "an."
- Do not copy exact sentences from textbooks.

Subjective Test. This test allows the students to organize and present answers in their own words or using their original ideas. It can be influenced by the judgment or opinion of the examinees and the scorers; nevertheless, it allows assessment of aspects of students' performance that are complex and qualitative. Questions raised may elicit varied answers that can be expressed in several ways.

b.3 Essay test is a subjective type of test that requires examinees to structure a long written response to answer a question. This test measures complex cognitive skills or processes and is usually scored on an opinion basis. It may require the examinees to give definitions, provide interpretations, make evaluations or comparisons, contrast concepts, and demonstrate knowledge of relationships (Morrow et al., 2016).

b.3.1 Restricted response essays set limits on the content and response given by the students. Limitations on the form of the response are well specified in the given essay question/item. Example: Point out the limitations of the objective type of test in 300 words. Your answer will be scored in terms of content and organization (5 pts.), quality of writing (3 pts.), and grammar usage and mechanics (2 pts.).


b.3.2 Extended response essays allow the students wide latitude of expression in terms of the length and complexity of the response. This format is best suited to measure the examinee's ability to organize, integrate, synthesize, and evaluate ideas.

Example: Is a valid test reliable? Thoroughly discuss your answer.

Scoring Rubric:

Descriptor | Points
The essay demonstrates complete knowledge and understanding of the topic. It uses clear and precise language. | 4
The essay demonstrates very good knowledge and understanding of the topic. It uses clear language with occasional lapses. | 3
The essay demonstrates good knowledge and understanding of the topic. It uses clear and precise language for the most part. | 2
The essay demonstrates little knowledge and understanding of the topic. The language is only partly clear and accurate. | 1
The essay demonstrates no real knowledge and understanding of the topic. The language is unclear and inaccurate. | 0


c. Performance Test. This assessment type requires students to perform a task or activity, or to do an actual demonstration of essential and observable skills, including the creation of products. This may be in the form of simulated performance or work samples.

c.1 Simulated performance requires examinees to carry out the basic rudiments of a skill in a realistic context rather than simply choosing an answer from a ready-made list. Examples are recitals, dramatic enactments, role-playing, participating in debates, public speaking, and entrepreneurial activities.

ESSAY TEST

Advantages:
- It is most useful in assessing higher-order thinking skills.
- It is best for developing logical thinking and critical reasoning.
- It takes less time and is easy to construct.
- It largely eliminates guessing.
- It can effectively reveal personality and measure opinions and attitudes.
- It gives examinees freedom to plan their answers and respond within broad limits.

Disadvantages:
- It is difficult to check and score.
- Its scoring procedures tend to be inconsistent and unreliable.
- Test effectiveness is difficult to analyze and establish.
- Its reliability is often low because of the subjective scoring of the answers.
- It does not allow a larger sampling of content.
- It encourages bluffing.
- Scoring may be affected by good handwriting, neatness, grammar, etc.
- It entails excessive use of time for answering.
- Scores may be affected by personal biases or previous impressions.

Guidelines

- Use rubrics for scoring essay answers.
- Do not begin essay questions with "who" or "what."
- Use clear and unambiguous wording in the essay questions/items.
- Indicate the value or assigned points for every essay question/item.
- Require all examinees to answer all of the same essay questions for valid and objective scoring.
- Keep the students anonymous while checking the essay answers.
- Evaluate all answers to one question before going on to the next.
- Make sure that students have ample time to answer the essay test.


c.2 Product-based assessment focuses on the final assessable output rather than on the actual performance of making the product. Examples are portfolios, multimedia presentations, posters, ads, and bulletin boards.

2. Test Assembling. After constructing the test items, arrange them. There are two steps in assembling the test: (1) packaging the test and (2) reproducing the test. Gabuyo (2012) sets the following guidelines for assembling the test:

- Group all test items with a similar format.
- Arrange test items from easy to difficult.
- Space the test items for easy reading.
- Keep items and options on the same page.
- Place illustrations near their descriptions.
- Check the answer key.
- Decide where to record the answers.

3. Writing directions. All test directions must be complete, explicit, and simply worded. The type of answer elicited from the learners must be clearly specified. The number of items to which the directions apply, how to record the answers, the basis on which to select answers, the criteria for scoring or the scoring system, and the time allotted for each type of test (if so required) should also be indicated in the test directions.

Example:

WEAK
Direction: Choose the best answer for each given question.

BETTER
Direction: Study the rubric below and identify the correct answer for the questions that follow (marked as items 6, 7, 8, 9, and 10). Write the CAPITAL LETTER corresponding to the correct answer on the space provided before each number. (1 point each)

PERFORMANCE TEST

Advantages:
- It can measure complex learning outcomes in a natural setting.
- Students can apply the knowledge, skills, and values learned.
- It promotes more active student engagement in an activity.
- It can help identify the students' strengths and weaknesses.
- It provides a more realistic way of assessing performance.

Disadvantages:
- Scoring procedures are generally subjective and unreliable.
- It demands a great amount of time for preparation, administration, and scoring.
- It can possibly be costly.
- It relies heavily on students' creativity and drive.

Guidelines
- Focus on the skill or product to be tested. It should relate to the predetermined learning outcomes.
- Provide clear directions on the task or product required. Clearly communicate expectations.
- Minimize dependence on skills that are not relevant to the intended purpose.
- Use rubrics to rate the performance or product.


4. Checking the assembled test items. Before reproducing the test, it is very important to proofread the test items first for typographical and grammatical errors and make the necessary corrections, if any. If possible, let others examine the test. This can save time during the examination, as it prevents distractions to the students.

Table 5. Checklist for Checking the Test Items (answer each item Yes or No)

1. Are the test items appropriate to measure the set learning outcomes?
2. Does the test allow learners to demonstrate a range of learning?
3. Are the directions clear, complete, and precise?
4. Does the test use simple and unambiguous wording?
5. Does the test include varied test types?
6. Can the test be answered within the allotted time?
7. Are items of the same format grouped together?
8. Are the test types arranged logically, from simple to more complex?
9. Is the test free of tricky items and unnecessary clues?
10. Does the test provide realistic and fair guidelines for scoring?
11. Is the test free of spelling, punctuation, and grammatical errors?
12. Does the test, as a whole, have student appeal?


Reviewing Phase. In this phase, the items in the test are examined in terms of their validity and usability. The test is then administered to a sample group for reliability testing. The initial results are subjected to an item analysis to determine the discrimination and difficulty indexes of the test before it is rolled out to the actual participants.

1. Validating the test. This is done by checking the relevance, appropriateness, and meaningfulness of the test relative to the purpose it claims to serve. This is an imperative requirement for the accurate application and interpretation of the test results. There are different types of validity:

a. Content validity establishes the relevance and representativeness of the assessment instrument with respect to the behavior or targeted construct it is designed to measure. It answers the question: Does the test fully represent what it is designed to measure? This form of validation assesses the quality of the items on a test. Experts are asked to judge the test items in reference to the learning outcomes and/or instructional objectives. The preparation of a Table of Specification (TOS) before test construction strengthens the content validity of a test.

b. Criterion-Related validity measures how well scores on one measure relate to or predict scores on theoretically similar measures. It addresses the question: Do the results from a test correspond to a different test of the same thing? Example: The scores obtained by the students in the teacher-made and standardized reading tests indicate similar levels of performance. This validation can further be classified into two – concurrent and predictive validity.

b.1 Concurrent validity. Concurrent means occurring or existing side by side. In this form of validation, the criterion and predictor data are collected simultaneously; hence, you can estimate individual performance on different tests at approximately the same time. This approach is best used when you want to diagnose the students' current criterion status (Gabuyo, 2012).

Example: A teacher gives his/her students a test designed to measure language ability. The scores the students obtain can be compared with their scores on a recognized and duly validated test tool already held or kept by the school. The scores on the teacher-made test can be correlated with the scores on the validated test tool using a statistical formula such as the Pearson Product Moment Coefficient of Correlation to establish its validity.


b.2 Predictive validity. This approach to criterion validity uses the student's current test result to estimate his/her performance on some criterion measure administered at some point in the future. It is determined through correlational, regression, or path analyses, in which the coefficient between the test of interest and the criterion assessment serves as the index measure.

Example: SAT/ACT scores are used by higher education institutions to predict a student's potential to succeed in his/her chosen career in college.

c. Construct Validity. This type of validity determines how well a test measures what it is designed to measure: the ability of an assessment tool to measure a theoretical or unobservable quality that it claims to test. Is the test constructed in a way that it successfully tests what it claims to test? Does the test measure the concept or construct it is intended to measure? Construct validity has two sub-types:

c.1 Convergent validity establishes that a test has a high level of correlation with another test that measures the same construct. Example: If the instruments that measure self-concept and self-worth yield scores that are close enough, or with a high level of correlation, the two measurements converge. A high level of correlation between the two tests underpins their validity.

c.2 Discriminant validity shows low or no correlation between two tests measuring different constructs. Example: If scores measuring self-worth and depression do not converge, the instruments used are measuring different constructs. The low or absent correlation indicates discriminant validity between the two tests on self-worth and depression.

Factors Affecting the Validity of a Test

The following factors can lower the validity of a test:

o Ambiguity
o Unclear directions
o Errors in test scoring
o Inappropriate length of the test
o Flaws in test administration
o Poor construction of test items
o Identifiable clues or patterns for answers
o Inappropriate level of difficulty of test items


o Incorrect arrangement of test types and test items
o Insufficient time to answer the entire test
o Difficult-to-understand or incomprehensible wording

Guidelines to Improve the Validity of a Test

To improve the validity of a test, observe the following guidelines:

o Review your stated objectives/outcomes.
o Make sure that your assessment method matches your set objectives/outcomes.
o Obtain feedback from other teachers concerning the assessment method, procedure, and format.
o Involve students. Have them look over the prepared instrument and identify difficulties.
o Keep the test to a reasonable length.
o Ensure the proper administration of the test.

Validity Coefficient

The validity coefficient is a statistical index used to report evidence of validity for intended interpretations of test scores, defined as the magnitude of the correlation between test scores and a criterion variable (Encyclopedia of Measurement and Statistics, 2017). In most cases, it is the computed value of rxy using the Pearson r formula. It is reported as a number between 0 and 1.00 that indicates the magnitude of the relationship. As a general rule, the higher the validity coefficient, the more beneficial it is to use the test. According to the US Department of Labor Employment and Training Administration (1999), validity coefficients of r = .21 to r = .35 are typical for a single test. Validities for selection systems that use multiple tests will probably be higher, because different tools are used to measure or predict different aspects of performance, whereas a single test is more likely to measure or predict fewer aspects of total performance. Validity coefficients can be interpreted as follows:

Table 6. Validity Coefficient Value

Validity coefficient value | Interpretation
above .35 | Very Beneficial
.21 - .35 | Likely to be Useful
.11 - .20 | Depends on Circumstances
below .11 | Unlikely to be Useful

Example: Teacher Johnny wanted to know if the 30-item test he prepared to measure his students' mathematical ability is valid. He administered the test to his 15 high school students and compared the results with another test that is already recognized or acknowledged for its validity, using it as the criterion. Is the test developed by Teacher Johnny valid? The following table shows the results of the two tests.

Determine the validity coefficient using the Pearson r (the sums are taken from the table below):

rxy = [N∑xy – (∑x)(∑y)] / √{[N∑x² – (∑x)²][N∑y² – (∑y)²]}

Students | Scores in Math Test (x) | Scores in Criterion Test (y) | xy | x² | y²
1 | 14 | 12 | 168 | 196 | 144
2 | 20 | 27 | 540 | 400 | 729
3 | 25 | 25 | 625 | 625 | 625
4 | 16 | 17 | 272 | 256 | 289
5 | 30 | 30 | 900 | 900 | 900
6 | 23 | 25 | 575 | 529 | 625
7 | 10 | 16 | 160 | 100 | 256
8 | 28 | 29 | 812 | 784 | 841
9 | 20 | 19 | 380 | 400 | 361
10 | 18 | 23 | 414 | 324 | 529
11 | 9 | 7 | 63 | 81 | 49
12 | 27 | 29 | 783 | 729 | 841
13 | 29 | 26 | 754 | 841 | 676
14 | 24 | 25 | 600 | 576 | 625
15 | 5 | 15 | 75 | 25 | 225
Total | ∑x = 298 | ∑y = 325 | ∑xy = 7121 | ∑x² = 6766 | ∑y² = 7715

rxy = [15(7121) – (298)(325)] / √{[15(6766) – (298)²][15(7715) – (325)²]}
rxy = 9965 / √[(12686)(10100)]
rxy = 9965 / 11,319.39
rxy = 0.88

Interpretation: The correlation coefficient is 0.88, which indicates that the validity of the test is high and that the test is therefore very beneficial for its intended purpose.

Coefficient of Determination

Another way of interpreting the findings is to consider the squared correlation coefficient (rxy)², otherwise known as the coefficient of determination. This is a statistical measure that indicates how much of the variation in the criterion can be accounted for by the teacher-made test (the predictor).

Example: Using the preceding value rxy = 0.88, the coefficient of determination is 0.88² = 0.7744. This means that 77.44% of the variance in the students' scores can be attributed to the test, while 22.56% (100.00% – 77.44%) cannot be attributed to the test.
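For readers who want to verify the computation, here is a short Python sketch that re-applies the Pearson r formula to Teacher Johnny's scores (only the raw x and y columns are needed; the sums are recomputed in code):

# Pearson r from raw scores, as in the Teacher Johnny example.
x = [14, 20, 25, 16, 30, 23, 10, 28, 20, 18, 9, 27, 29, 24, 5]   # math test
y = [12, 27, 25, 17, 30, 25, 16, 29, 19, 23, 7, 29, 26, 25, 15]  # criterion test

n = len(x)
sx, sy = sum(x), sum(y)
sxy = sum(a * b for a, b in zip(x, y))
sx2, sy2 = sum(a * a for a in x), sum(b * b for b in y)

r = (n * sxy - sx * sy) / ((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2)) ** 0.5
print(f"validity coefficient r = {r:.2f}")   # 0.88
# Coefficient of determination; full precision gives about 0.775
# (the text squares the rounded 0.88 to get 0.7744).
print(f"coefficient of determination = {r * r:.4f}")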

2. Pilot testing. This step means conducting a "test rehearsal" with a try-out or sample group to test the validity of a test. A selected group of learners tries answering the test, and their obtained scores provide useful feedback prior to the deployment of the test to the target group of examinees. The data gathered from this experimental procedure help the teacher perform a preliminary analysis of the feasibility, practicality, and usability of the techniques, methods, and tools used in the test. Early detection of probable problems and difficulties can take place before the actual test administration, eventually revealing aspects of the test and its conduct that need to be refined or improved. The following are important considerations in the pilot testing of a test:

o Create a plan. Identify the smaller sample to be tested. Determine the time duration, cost, correspondences, and the persons to collaborate with in the conduct of a test.

o Prepare for the pilot test. Set the time and venue for the conduct of the pilot test. Check on the condition of the test-takers. Try to eliminate aspects of the environment that could threaten the validity of the test, such as the lighting, ventilation, and orderliness of the test/examination room.

o Deploy the test. Regulate the test-takers and the condition of the venue. Make sure that the test can obtain truthful and objective information. Keep an eye on any form of distraction or interruption during the conduct of the test. Address the specific needs of the test-takers. Provide clear and complete answers in case they ask questions or clarifications about the test.

o Assess and evaluate the pilot test. Reflect on the pilot-testing activity that took place. Identify flaws in the process and devise a plan so they can be avoided during the actual test. Organize and collate scores for further analysis.



3. Item Analysis.

After the conduct of pilot-testing, the teachers score and look into the quality of each item in the test. This procedure helps the teachers identify good items by doing an item analysis. Item analysis is a process that examines student responses to individual test items (questions) in order to assess the quality of those items and the test as a whole (University of Washington-Office of Educational Assessment, 2020). This is certainly vital in improving items in later tests. Also, the teachers can clearly spot and eliminate ambiguous or misleading items, design appropriate remediation, and construct a test bank for future use.

The most common method employed for item-analysis is the Upper-Lower (U-L) index method (Stocklein cited in Gutierrez, 2020). This analysis provides teachers with three types of information which include (a) difficulty index, (b) discrimination index, and (c) distractor or option-response analysis.

The difficulty index is determined in terms of the proportion of students in the upper 27% and lower 27% group who answered a test item correctly.

The steps (Stocklein, cited in Gutierrez, 2020) are as follows:
1. Score the test papers and arrange the total scores from highest to lowest.
2. Sort the top 27% and the bottom 27% of the papers.
3. Tally the correct answers to each item made by each student/test-taker in the upper 27% group.
4. Repeat Step 3, this time for the lower 27% group.
5. Get the percentage of the upper group that obtained the correct answer and call this U.
6. Repeat Step 5 for the lower group and call this L.
7. Get the average of the U and L percentages (the difficulty index).
8. Get the difference between the U and L percentages (the discrimination index).

Formula:

Difficulty Index = (% of the upper group who got the item right + % of the lower group who got the item right) / 2

To interpret the obtained difficulty index, use Table 7. A good or retained item must have acceptable indexes of both difficulty and discrimination. The acceptable index of difficulty ranges from 0.41 to 0.60, while the acceptable index of discrimination ranges from +0.20 to +1.00.

Table 7. Index Range for Level of Difficulty

Range of Difficulty Index | Description
0.00 – 0.20 | Very Difficult
0.21 – 0.40 | Difficult
0.41 – 0.60 | Moderately Difficult
0.61 – 0.80 | Easy
0.81 – 1.00 | Very Easy


The discrimination index is the power of a test item to discriminate between those who scored high (the upper 27%) and those who scored low (the lower 27%) on the total test.

Formula:

Discrimination Index = %U – %L

Table 8. Index Range for Level of Discrimination

Index Range | Discrimination Level
0.19 and below | Poor item; should be eliminated or revised
0.20 – 0.29 | Marginal item; needs some revision
0.30 – 0.39 | Reasonably good item, but possibly subject to improvement
0.40 and above | Very good item

For interpretation and decision:

If an item has an acceptable level of difficulty (index between 0.41 and 0.60) and of discrimination (index 0.40 and above), it is considered a good item and must be retained. If an item is unacceptable in either the difficulty or the discrimination index, it is considered fair and must be revised. Finally, if an item is unacceptable in both indices, it is considered a poor item and must therefore be rejected.

Table 9. Guide to Making Decision for the Retention, Revision, and Rejection of Test Items

Difficulty Index (0.41 – 0.60) | Discrimination Index (0.40 and above) | Remarks | Decision
acceptable | acceptable | good item | retain
acceptable | not acceptable | fair | revise
not acceptable | acceptable | fair | revise
not acceptable | not acceptable | poor | reject


SAMPLE RESULTS OF ITEM ANALYSIS (number of students tested = 50)

Upper 27% = 50 × 0.27 = 13.5, or 14 students
Lower 27% = 50 × 0.27 = 13.5, or 14 students

Item No. | Upper 27% (of 14) | % | Lower 27% (of 14) | % | Diff Index | Remarks | Discr Index | Remarks | Decision
1 | 12 | 0.86 | 3 | 0.21 | 0.54 | moderately difficult | 0.65 | very good | retain
2 | 14 | 1.00 | 7 | 0.50 | 0.75 | easy | 0.50 | very good | revise
3 | 7 | 0.50 | 10 | 0.71 | 0.61 | easy | -0.21 | poor | reject
4 | 12 | 0.86 | 6 | 0.43 | 0.65 | easy | 0.43 | very good | revise
5 | 10 | 0.71 | 4 | 0.29 | 0.50 | moderately difficult | 0.42 | very good | retain

For item 1, Difficulty Index:

Under Upper 27%: U = 12/14 = 0.86
Under Lower 27%: L = 3/14 = 0.21
Difficulty Index = (0.86 + 0.21) / 2 = 0.54

For item 1, Discrimination Index:

Discrimination Index = %Upper – %Lower = 0.86 – 0.21 = 0.65
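The same U-L computations, together with the Table 9 decision rules, can be expressed in a few lines of Python. This sketch uses the counts from the sample table above (the 0.40 discrimination cutoff follows the decision table):

# U-L index method: difficulty, discrimination, and a decision per item.
group_size = 14  # upper/lower 27% of 50 examinees, rounded up

# (item number, correct answers in upper 27%, correct answers in lower 27%)
items = [(1, 12, 3), (2, 14, 7), (3, 7, 10), (4, 12, 6), (5, 10, 4)]

for number, upper, lower in items:
    u, l = upper / group_size, lower / group_size
    difficulty = (u + l) / 2
    discrimination = u - l
    diff_ok = 0.41 <= difficulty <= 0.60
    disc_ok = discrimination >= 0.40
    if diff_ok and disc_ok:
        decision = "retain"
    elif diff_ok or disc_ok:
        decision = "revise"
    else:
        decision = "reject"
    print(f"item {number}: difficulty={difficulty:.2f}, "
          f"discrimination={discrimination:.2f} -> {decision}")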

The distractor analysis examines how effectively the incorrect choices contribute to the quality of an item in a multiple-choice test. It addresses the performance of the incorrect response options, called the distractors. A distractor should be plausible enough that it can be chosen by examinees who are not sufficiently knowledgeable in the content area. On the other hand, it must not be so attractive that it is chosen by a greater proportion of the examinees than the keyed option (the right answer). The proportion of examinees who choose the keyed option should be more or less equivalent to the p-value, or difficulty index.



Example: Let us assume that 100 students took the test. If A is the key (right answer) and the item difficulty is 0.70, then 70 students answered correctly. What about the remaining 30 students and the effectiveness of the three distractors?

Difficulty Index | A (key) | B | C | D | Remarks
0.70 | 70 | 30 | 0 | 0 | If the remaining 30 students all chose B, then options C and D are useless in their role as distractors.
0.70 | 70 | 15 | 15 | 0 | If the remaining students selected options B and C, then option D is a useless distractor.
0.70 | 70 | 10 | 10 | 10 | This is an ideal situation: each of the three distractors was selected by 10 students, so options B, C, and D all appear to be plausible distractors.
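A distractor tally of this kind is straightforward to automate. The sketch below uses hypothetical response counts and flags any distractor chosen by fewer than 5% of the examinees, following the earlier guideline:

# Distractor analysis: share of examinees choosing each option.
responses = {"A": 70, "B": 10, "C": 10, "D": 10}  # A is the keyed option
key = "A"
total = sum(responses.values())

print(f"difficulty index (p-value) = {responses[key] / total:.2f}")
for option, count in responses.items():
    if option == key:
        continue
    share = count / total
    verdict = "plausible" if share >= 0.05 else "useless (consider revising)"
    print(f"distractor {option}: chosen by {share:.0%} -> {verdict}")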

4. Reliability Testing. Effective assessments are dependable and consistent, yielding reliable evidence (Suskie, 2018). Reliability refers to the consistency of a measure, which can be affected by the clarity of the assessment tool and the capability of those who use it. Reliable assessment tools generate repeatable and consistent results over time (test-retest), across items (internal consistency), and across different raters or evaluators (inter-rater reliability). If the test tool or instrument is unreliable, it cannot produce a valid outcome.

a. Test-retest reliability indicates the repeatability of test scores when the test is administered twice to the same group of examinees with a time interval in between (e.g., a two-week interval). The two sets of scores are correlated using the Pearson Product Moment Coefficient of Correlation (r), which establishes the measure of reliability. The reliability coefficient is expressed in a range from 0.00 to 1.00, interpreted as follows (see Table 10).


Table 10. Level of Reliability Coefficient (Navarro & Santos, 2012)

Reliability Coefficient | Interpretation
0.90 and above | Excellent reliability; the test is at the level of the best standardized tests.
0.81 – 0.90 | Very good for a classroom test.
0.71 – 0.80 | Good for a classroom test; there are probably a few items that need to be improved.
0.61 – 0.70 | Somewhat low; the test needs to be supplemented by other measures (e.g., more tests) to determine grades, and there are probably some items that could be improved.
0.51 – 0.60 | Suggests a need for revision of the test, unless it is quite short (ten or fewer items); the test definitely needs to be supplemented by other measures (e.g., more tests) for grading.
0.50 and below | Questionable reliability; the test should not contribute heavily to the course grade, and it needs revision.

Example: Professor Oz constructed a reading test and subjected it to the test-retest method using 15 students to ensure its reliability. The table shows the scores obtained by the examinees during the first and second administrations of the test, observing a 15-day interval in between. Is the test reliable? The formula for the Pearson r is:

r = (n∑xy - (∑x)(∑y)) / √[(n∑x² - (∑x)²)(n∑y² - (∑y)²)]


Student   Test Scores x (1st Run)   Retest Scores y (2nd Run)   xy     x²     y²
1         17                        22                          374    289    484
2         49                        28                          1372   2401   784
3         26                        12                          312    676    144
4         32                        30                          960    1024   900
5         12                        40                          480    144    1600
6         27                        31                          837    729    961
7         18                        17                          306    324    289
8         38                        27                          1026   1444   729
9         33                        40                          1320   1089   1600
10        24                        30                          720    576    900
11        9                         18                          162    81     324
12        34                        35                          1190   1156   1225
13        33                        41                          1353   1089   1681
14        46                        38                          1748   2116   1444
15        29                        39                          1131   841    1521
          ∑x = 427                  ∑y = 448                    ∑xy = 13291   ∑x² = 13979   ∑y² = 14586

r = (n∑xy - (∑x)(∑y)) / √[(n∑x² - (∑x)²)(n∑y² - (∑y)²)]

r = (15(13291) - (427)(448)) / √[(15(13979) - (427)²)(15(14586) - (448)²)]

r = (199365 - 191296) / √[(209685 - 182329)(218790 - 200704)]

r = 8069 / √[(27356)(18086)]

r = 8069 / 22243.22

r = 0.36

The obtained rxy = 0.36, which is below 0.50, indicates that the test has a questionable level of reliability. It should not contribute heavily to the course grade and therefore needs to be thoroughly reviewed and revised.
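
The same computation can be scripted. Below is a minimal Python sketch, assuming the 15 score pairs from the table above; it applies the raw-score Pearson formula and reproduces r ≈ 0.36.

from math import sqrt

def pearson_r(x, y):
    # raw-score (computational) form of the Pearson product-moment correlation
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sx2 = sum(a * a for a in x)
    sy2 = sum(b * b for b in y)
    return (n * sxy - sx * sy) / sqrt((n * sx2 - sx ** 2) * (n * sy2 - sy ** 2))

first_run  = [17, 49, 26, 32, 12, 27, 18, 38, 33, 24, 9, 34, 33, 46, 29]
second_run = [22, 28, 12, 30, 40, 31, 17, 27, 40, 30, 18, 35, 41, 38, 39]
print(round(pearson_r(first_run, second_run), 2))  # 0.36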


b. Internal consistency gauges how well the items in an instrument produce consistent or similar results across multiple items measuring the same construct. If people's responses to the different items do not correlate with each other, it no longer makes sense to claim that the test items are all measuring the same underlying construct.

Another approach to measuring internal consistency is the split-half correlation. In this method, all items that measure the same thing are randomly split into two sets, such as the first and second halves of the items or the even- and odd-numbered items. A score is computed for each set of items, and the relationship between the two sets of scores is examined. The Pearson r can be applied to find the correlation coefficient between the two halves.

The Kuder-Richardson Formula 20, or KR-20, also checks the internal consistency of a test instrument with binary or dichotomous responses such as true or false, right or wrong. It is similar in spirit to the split-half method. A correct answer is scored 1, while 0 is assigned for an incorrect answer. The formula is:

ρKR20 = (k / (k - 1)) × (1 - ∑pⱼqⱼ / σ²)

Where:
k = number of test questions/items
pⱼ = proportion of examinees passing item j
qⱼ = proportion of examinees failing item j
σ² = variance of the total scores of all the people taking the test

Example: A true-false test with 10 questions is administered to 17 students. The results are listed in the table that follows. Determine the reliability of the test using the Kuder-Richardson Formula 20.


Kuder-Richardson Formula 20

Student   Q1    Q2    Q3    Q4    Q5    Q6    Q7    Q8    Q9    Q10   Total
1         1     1     1     1     1     1     1     1     1     1     10
2         1     1     1     1     1     1     1     1     0     0     8
3         1     1     1     1     1     1     1     1     0     0     8
4         1     1     1     1     1     1     1     1     1     0     9
5         1     1     1     1     1     1     1     1     1     0     9
6         0     0     1     1     1     1     1     1     0     0     6
7         0     0     0     0     0     0     1     0     0     1     2
8         0     1     1     1     1     1     1     1     1     0     8
9         0     1     1     1     1     1     1     1     1     0     8
10        0     1     0     0     1     0     0     0     0     0     2
11        0     1     1     1     1     1     1     1     1     0     8
12        0     0     0     0     0     1     1     1     1     1     5
13        1     1     1     1     1     0     1     1     1     1     9
14        1     1     1     1     1     1     0     1     1     1     9
15        1     0     0     0     1     1     0     1     1     0     5
16        1     1     1     0     0     0     1     1     1     1     7
17        1     1     1     0     0     0     1     0     1     1     6
Total     10    13    13    11    13    12    14    14    12    7     119
p         0.588 0.765 0.765 0.647 0.765 0.706 0.824 0.824 0.706 0.412
q         0.412 0.235 0.235 0.353 0.235 0.294 0.176 0.176 0.294 0.588
pq        0.242 0.180 0.180 0.228 0.180 0.208 0.145 0.145 0.208 0.242   ∑pq = 1.958

k (number of items) = 10
σ² (variance of the total scores) = 5.294
ρKR20 = (10/9) × (1 - 1.958/5.294) = 0.70

The yielded ρKR20 value of 0.70 means that the reliability of the instrument is somewhat low but already within the acceptable range (0.60 or higher). The test, however, needs to be supplemented by other measures to serve as a basis for grades, and some items probably need to be improved.
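
For checking, the KR-20 value can also be computed programmatically. Below is a minimal Python sketch, assuming the 17 × 10 matrix of 1/0 scores above and using the population variance of the totals (matching the σ² = 5.294 shown in the table).

def kr20(score_matrix):
    n = len(score_matrix)          # number of examinees
    k = len(score_matrix[0])       # number of items
    totals = [sum(row) for row in score_matrix]
    mean = sum(totals) / n
    variance = sum((t - mean) ** 2 for t in totals) / n   # population variance of totals
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in score_matrix) / n       # proportion passing item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / variance)

data = [
    [1,1,1,1,1,1,1,1,1,1], [1,1,1,1,1,1,1,1,0,0], [1,1,1,1,1,1,1,1,0,0],
    [1,1,1,1,1,1,1,1,1,0], [1,1,1,1,1,1,1,1,1,0], [0,0,1,1,1,1,1,1,0,0],
    [0,0,0,0,0,0,1,0,0,1], [0,1,1,1,1,1,1,1,1,0], [0,1,1,1,1,1,1,1,1,0],
    [0,1,0,0,1,0,0,0,0,0], [0,1,1,1,1,1,1,1,1,0], [0,0,0,0,0,1,1,1,1,1],
    [1,1,1,1,1,0,1,1,1,1], [1,1,1,1,1,1,0,1,1,1], [1,0,0,0,1,1,0,1,1,0],
    [1,1,1,0,0,0,1,1,1,1], [1,1,1,0,0,0,1,0,1,1],
]
print(round(kr20(data), 2))  # 0.70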

The computation can also be carried out in Microsoft Excel with formulas such as the following (the cell references assume student numbers in A3:A19, item scores in B3:K19 under headers B2:K2, total scores in L3:L19, and the pq row in B24:K24):

Cell   Entity   Formula
B26    k        =COUNTA(B2:K2)
L24    ∑pq      =SUM(B24:K24)
B27    σ²       =VARP(L3:L19)
B28    ρKR20    =(B26/(B26-1))*(1-(L24/B27))

Cronbach’s alpha is the most common measure of internal consistency when there are multiple response options for each item in the test. An instrument that uses


the Likert Scale to do assessments in the affective domain can apply this formula to ensure its ability to draw reliable answers from the respondents. The formula for Cronbach’s coefficient alpha is:

α = (k / (k - 1)) × (1 - ∑σᵢ² / σₜ²)

Where:
k = the number of items in the measure
σᵢ² = the variance of each individual item
σₜ² = the variance of the total scores on the index

This computation is easier and more efficiently done using Microsoft Excel. Example: A researcher used a Likert scale for respondents to rate their attitude toward Mathematics as a subject. The respondents showed their agreement or disagreement with the different items raised in the questionnaire. The responses were coded from the lowest (strongly disagree), with an assigned score of 1, to the highest (strongly agree), with a designated score of 5. The table reflects the answers of the 19 respondents to the 5-item questionnaire.

Cronbach’s Alpha using Microsoft Excel

Respondent   Q1   Q2   Q3   Q4   Q5

1 4 4 4 4 4

2 4 4 4 4 4

3 3 3 3 3 3

4 4 5 4 5 5

5 2 2 2 2 2

6 4 4 4 5 5

7 5 2 3 4 5

8 4 4 5 4 5

9 4 3 2 5 3

10 4 4 4 5 5

11 5 2 3 4 5

12 2 3 3 4 5

13 3 3 3 3 3

14 5 4 4 4 5

15 4 4 4 4 5

16 2 3 1 4 3

17 4 4 4 5 5

18 5 5 5 5 5

19 5 5 5 5 5


Step 1. On the Data tab, click Data Analysis > Anova: Two-Factor Without Replication > OK.

Step 2. When the dialog box appears, highlight the scores to fill in the Input Range, select New Worksheet Ply, and click OK.


Step 3. The generated output will appear in a new sheet.

Step 4. Apply the formula below, using the mean squares from the ANOVA output (MS for Rows, i.e., the respondents, and MS for Error):

α = 1 - (MS_error / MS_rows)

α = 1 - (0.38889 / 3.25142)

α = 1 - 0.119606

α = 0.8804

The obtained coefficient of 0.88 indicates that the instrument has a very high level of reliability for assessing students’ attitude toward Math as a subject.
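
As a cross-check on the ANOVA route, Cronbach’s alpha can be computed directly from its defining formula. Below is a minimal Python sketch, assuming the 19 × 5 Likert matrix above; it reproduces α ≈ 0.88.

def cronbach_alpha(matrix):
    n = len(matrix)                # number of respondents
    k = len(matrix[0])             # number of items

    def sample_var(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values) / (len(values) - 1)

    # variance of each item column and of the respondents' total scores
    item_vars = [sample_var([row[j] for row in matrix]) for j in range(k)]
    total_var = sample_var([sum(row) for row in matrix])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

likert = [
    [4,4,4,4,4], [4,4,4,4,4], [3,3,3,3,3], [4,5,4,5,5], [2,2,2,2,2],
    [4,4,4,5,5], [5,2,3,4,5], [4,4,5,4,5], [4,3,2,5,3], [4,4,4,5,5],
    [5,2,3,4,5], [2,3,3,4,5], [3,3,3,3,3], [5,4,4,4,5], [4,4,4,4,5],
    [2,3,1,4,3], [4,4,4,5,5], [5,5,5,5,5], [5,5,5,5,5],
]
print(round(cronbach_alpha(likert), 2))  # 0.88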

c. Inter-rater reliability measures the extent to which two or more raters or examiners agree in their judgments or scoring of a test. This method addresses the issue of consistency in the implementation of a grading system. The basic measure of inter-rater reliability is the percent agreement between two or more raters.


For example, two raters observed five learners who executed the basic steps in a folk dance. The table shows how each demonstration of skill was rated.

Students Rater 1 Rater 2

1 2 3

2 4 4

3 1 4

4 5 5

5 3 3

The two raters agreed on 3 out of 5 scores. Percent agreement is determined by (a) counting the number of ratings in agreement, which is 3; (b) counting the total number of ratings, which is 5; (c) dividing the number of ratings in agreement by the total number of ratings (3/5); and (d) converting the result (0.60) to a percentage (60%). The 60% agreement is at an acceptable level of reliability.

If there are multiple raters, add columns for each pairing of raters and indicate their agreement.

Example:

Student   Rater 1   Rater 2   Rater 3   R1&R2   R1&R3   R2&R3   Agreement
1         4         3         4         0       1       0       1/3
2         1         2         3         0       0       0       0/3
3         2         4         4         0       0       1       1/3
4         5         5         5         1       1       1       3/3
5         3         3         4         1       0       0       1/3

Calculate the mean of the fractions in the agreement column:

(1/3 + 0/3 + 1/3 + 3/3 + 1/3) ÷ 5 = 2 ÷ 5 = 0.40

The obtained result (40%) falls below 0.50, which suggests that the scoring has a questionable level of reliability. Revision of the test and the adoption of more measures to provide a valid assessment of students’ performance are called for.
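
The pairwise tally generalizes to any number of raters. Below is a minimal Python sketch, assuming each rater’s scores are kept in a separate list aligned by student; it reproduces the 40% mean agreement above.

from itertools import combinations

def percent_agreement(ratings):
    """ratings: one list of scores per rater, aligned by student."""
    pairs = list(combinations(range(len(ratings)), 2))   # all rater pairings
    agreements = 0
    for student in range(len(ratings[0])):
        # count how many rater pairs gave this student identical scores
        agreements += sum(ratings[a][student] == ratings[b][student] for a, b in pairs)
    return agreements / (len(pairs) * len(ratings[0]))

rater1 = [4, 1, 2, 5, 3]
rater2 = [3, 2, 4, 5, 3]
rater3 = [4, 3, 4, 5, 4]
print(percent_agreement([rater1, rater2, rater3]))  # 0.4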


Activity A. Preparing a Table of Specification. Revisit the answers you indicated in Activity I-B in Module 2. Finalize your choice of topic related to your field of discipline and the target learning outcomes. Based on the finalized topic and the formulated learning outcomes, prepare a Table of Specification using Format 4. Write your output on a separate sheet observing the following layout:

a. Orientation – portrait
b. Margin – 1” at the top, bottom, right, and left
c. Size – A4 (8.3 by 11.7 in)
d. Spacing – single

Activity B. Test Design and Construction. Based on the chosen topic and the specified learning outcomes, construct a test (unit or periodic test). Use a separate sheet with the layout indicated in Activity A. (1 point each)

Exercises.

I. Identification. Specify on the blank before each number the type of objective test being described in each of the items below.

_______________ 1. This test asks examinees to perform a task or activity.
_______________ 2. A test consists of incomplete statements to be filled in by the examinees.
_______________ 3. A test that presents a question to be answered by the examinees in a word or phrase.
_______________ 4. An objective test that requires examinees to choose only one correct answer from the three or more options provided.
_______________ 5. This test gives a situation or a question that can be addressed by having students construct a rather long response of up to several paragraphs.


II. Short Answer. Study the data in the table below and answer the questions that follow. Write your answers in the spaces provided. (1 point each)

Using the results of a 5-item test given to 60 students, Teacher Trixie wanted to determine which items need to be retained, revised, or rejected. She encoded the data and determined the number of students who got each of the 5 items correct in both the upper and lower 27% groups.

Item Analysis Results

Item Number Upper 27% Lower 27%

1 16 16

2 10 3

3 4 0

4 15 10

5 12 4

1. Which item/s has/have an acceptable level of difficulty?

__________________________

2. Which item does/do NOT have an acceptable discrimination index?

__________________________

3. Which item/s need/s to be retained?

__________________________

4. Which item/s need/s to be revised?

__________________________

5. Which item/s need/s to be rejected?

__________________________


III. Identifying Errors. Each item below contains blunders/errors/violations of the guidelines of test construction. Identify them and reflect them in the box provided at the right. (3 pts. each)

SENTENCE COMPLETION Fill in the blanks with the correct word to complete the statement.

1. A test is a ___________ used to establish the quality, __________, or reliability of ________,

TRUE OR FALSE Tell whether the statement is true or false.

2. Scoring an essay test is always difficult.

MULTIPLE CHOICE Choose the best answer.
3. Which is the best way of dealing with discipline problem in the classroom?
   A. Always give test
   B. Impose punishment
   C. Talking with the child in private
   D. Must involve the parents

ESSAY Construct an essay to answer the question. 4. List the 7-step path to making “ethical decisions.”

List them in their correct progressive order.


MATCHING TYPE Match column A with B. Letter only.

A                          B
___1. Multiple choice      A. Most difficult to score
___2. True-False           B. Students can simply make guesses
___3. Short Answer         C. measures greatest variety of learning outcomes
                           D. Least useful for educational diagnosis


IV. Critiquing. In 2 to 3 sentences, state your comment on the validity of the assessment practices of the teachers reflected in the following scenarios. (5 points each)


Scoring Rubric

Level               Score      Description
Exemplary           5 points   The comment/remark is accurate with all main points included. There is a clear and logical presentation of ideas. Sentences are well-structured and free from grammatical and/or syntactic errors.
Very Good           4 points   The comment/remark is accurate, but there are minor problems in logic and construction. Few grammatical/syntactic errors are found.
Good                3 points   One or two major points are missing in the comment/remark. Few grammatical/syntactic errors are found.
Needs Improvement   2 points   The answer does not convey a full understanding of the lesson. The quality of writing is inferior.
Unsatisfactory      1 point    The answer is inaccurate or deviates from what is asked. Sentences are disorganized and contain major grammatical/syntactic errors.

Scenario 1. Teacher Luna constructed a 50-item summative test in Filipino for Grade Six pupils. She prepared a TOS according to the pre-determined topics outlined in the course program. Having been given a special assignment, she missed delivering almost half of the topics she was supposed to cover. To make up for her absences, she distributed hand-outs so students could proceed despite her failure to hold regular classes. Will the test provide a valid result?

Please write your answer here.



Scenario 2. Teacher May set this learning outcome for her students: at the end of the lesson, the students are able to demonstrate proficiency in writing and verbal skills. Considering the time and effort she would spend checking the papers, she finally opted to give a short-answer test in which students would still be required to construct short sentences. Does her assessment method match her target outcome? Will she be able to measure what she is supposed to measure?

Please write your answer here.

Scenario 3. Sir Ben gave a 120-item multiple-choice test in Math for his college students to answer in one hour. Due to lack of time, more than 50% of the students were not able to finish. The students appealed that the remaining unanswered items not be counted and that their scores be based only on the number of items they were able to finish. Sir Ben finally conceded to the students’ request. Is his decision proper? Will it not invalidate the test?

Please write your answer here.


V. Problem-Solving. Perform what is asked.

Teacher Marsh prepared a 30-item test to measure the students’ level of comprehension. She wanted to find out if the instrument she used can elicit stable results over a period of time, so she administered the test twice to a group of learners with a gap of 15 days in between. Using the data in the table, calculate the reliability coefficient of the instrument and state your interpretation.

You will be graded based on the following:

Process          5 points (steps)
Answer           3 points (result of computation)
Interpretation   2 points (description based on the reliability index)
Total            10 points

Students 1st Run 2nd Run

1 22 21

2 13 19

3 24 24

4 25 19

5 16 18

6 3 12

7 23 26

8 21 25

9 22 25

10 15 20

11 16 19

12 19 18

13 21 22

14 3 14

15 4 9

16 2 12

17 16 23

18 8 13

19 26 24

20 24 30

21 30 30

22 16 14

Answer: _________________ Interpretation: __________________________________________________________

Please show your process here.

Page 40: CHAPTER 3: DEVELOPMENT OF TOOLS FOR CLASSROOM-BASED ASSESSMENT · CHAPTER 3: DEVELOPMENT OF TOOLS FOR CLASSROOM-BASED ASSESSMENT This material is originally prepared and contains

CHAPTER 3: DEVELOPMENT OF TOOLS FOR CLASSROOM-BASED ASSESSMENT

This material is originally prepared and contains digital signatures. Further reproduction is prohibited without prior permission from the authors. Page 88

VI. Illustrating the Concept. Present a creative and inventive illustration that depicts the test development process. Label every phase and the corresponding activities done by the teacher/assessor for a clearer representation of the process. (10 pts.)

Criteria for scoring:

Content (maximum of 5 points) – accurate and reflects complete understanding of the topic
Presentation (maximum of 3 points) – neat, organized, logical, creative, and original
Mechanics (maximum of 2 points) – free of spelling, grammar, and punctuation errors

Please write your answer here.


References:

Encyclopedia of Measurement and Statistics. (2017). Validity coefficient. Retrieved September 4, 2020, from http://methods.sagepub.com/Reference//encyclopedia-of-measurement-and-statistics/n470.xml#:~:text=The%20validity%20coefficient%20is%20a,intended%20meaning%20of%20the%20test)

Gabuyo, Y. (2012). Assessment of Learning 1. Manila: Rex Book Store.

Gutierrez, D. (2007). Assessment of Learning Outcomes (Cognitive Domain). Malabon, Metro Manila: Kerusso Publishing House.

Kubiszyn, T., & Borich, G. (2007). Educational Testing and Measurement: Classroom Application and Practice (8th ed.). Wiley/Jossey-Bass.

Morrow, J., et al. (2016). Measurement and Evaluation in Human Performance (5th ed.). USA: Thomson-Shore, Inc.

Navarro, R., & Santos, R. (2012). Assessment of Learning Outcomes (2nd ed.). Manila, Philippines: Lorimar Publishing, Inc.

Osterlind, S. J. (1989). What is constructing test items? In Constructing Test Items (Evaluation in Education and Human Services, Vol. 25). Springer, Dordrecht. Retrieved August 21, 2020, from https://link.springer.com/chapter/10.1007%2F978-94-009-1071-3_1

Suskie, L. (2018). Assessing Student Learning: A Common Sense Guide. USA: Jossey-Bass.

University of Washington, Office of Educational Assessment. (2020). Understanding item analysis. Retrieved September 5, 2020, from https://www.washington.edu/assessment/scanning-scoring/scoring/reports/item-analysis/

US Department of Labor, Employment and Training Administration. (1999). Understanding test quality: Concepts of reliability and validity. Retrieved September 4, 2020, from https://hr-guide.com/Testing_and_Assessment/Reliability_and_Validity.htm#:~:text=As%20a%20general%20rule%2C%20the,typical%20for%20a%20single%20test