Comprehensive Material for Measurement and Evaluation

Upload: mary-child

Post on 14-Apr-2018


7/30/2019, Comprehensive Material for Measurement and Evaluation, TCP.TIP_Rungduin, 54 pages

    Technological Institute of the Philippines

    College of Education

    Center for Teaching Excellence

    Teaching Certificate Program

    Introduction to Measurement and Evaluation

    Discussion Point 1:

    Introduction to Measurement and Evaluation

    The Necessity of Evaluation in Teaching

    To teach without evaluation is a contradiction in terms.

    By its very nature, teaching requires innumerable judgments to be made by the

    teacher, the school administrators, parents and the pupils themselves.

    Teachers are obligated to assemble, analyze, and utilize whatever evidence can be

    brought forward to make the most effective decisions (evaluations) for the benefit of

    the students in their classes. Among these decisions are the following:

    1. The nature of the subject matter that should be taught at each grade level;
    2. Which aspects of the curriculum need to be eliminated, modified, or included as a function of the current level of student knowledge and attitudes;
    3. How instruction can be improved to ensure that students learn;
    4. How pupils should be organized within the classroom to maximize learning;
    5. How teachers can tell if students are able to retain knowledge;
    6. Which students are in need of remedial or advanced work;
    7. Which students will benefit from placement in special programs for the mentally retarded, emotionally disturbed, or physically handicapped;
    8. Which children should be referred to the school counselor, psychologist, speech therapist, nurse, or social worker; and
    9. How each pupil's progress can be explained most clearly and effectively.

    The Relationship between Teaching and Evaluation

    The purpose of teaching is to improve the knowledge, behaviors, and attitudes of

    students.

    Teachers want students to increase the amount of knowledge they possess and to

    decrease the amount of forgetting.

    Teaching consists of at least four interrelated elements (Glaser and DeCecco,

    1968):

    1. Developing instructional objectives
       Teachers need to know what they are attempting to accomplish and cannot leave such matters to chance. Students improve when they make progress toward clearly defined objectives.

       Clearly defined instructional objectives serve at least two roles:
       a. They help the teacher recognize student improvement by clarifying what it is the teacher wants to accomplish; and
       b. They imply the way in which the goals will be evaluated.


    2. Evaluating the student's entering behavior
       Individual differences (academic achievement, sexual preference, social class, notes from previous teachers, former school or former location, physical characteristics, knowledge of an older brother or sister, and family background).

       Teaching methods are effective only if they are considered in relationship to the background of the student.

    3. Selecting an instructional strategy
       If student background is important in selecting an instructional strategy, the teacher will have to become familiar with those procedures used to measure and evaluate those backgrounds.

    4. Providing for an evaluation of the student's performance
       Performance assessment may suggest that a program is ineffective because the objectives are unrealistic or because the entering behavior was not considered adequately.

       Evaluation can determine whether instructional objectives have been met; it provides evidence that students have the necessary entering behavior, and it helps to evaluate the adequacy of an instructional strategy.

    Test: an instrument or systematic procedure for measuring a sample of behavior. (Answers the question: How well does the individual perform, either in comparison with others or in comparison with a domain of performance tasks?)

    Measurement: the process of obtaining a numerical description of the degree to which an individual possesses a particular characteristic. (Answers the question: How much?)

    Evaluation: the systematic process of collecting, analyzing, and interpreting information to determine the extent to which pupils are achieving instructional objectives. (Answers the question: How good?)


    Measurement

    Measurement involves the assigning of numbers to attributes or characteristics of

    persons, objects, or events according to explicit formulations or rules.

    Educational measurement requires the quantification of attributes according to

    specified rules.

    Characteristics of Scales of Measurement (from least to most complex)

    Nominal (least complex)
    - Definition: scale involving the classification of objects, persons, or events into discrete categories
    - Uses and examples: plate numbers, Social Security numbers, names of people, places, and objects, numbers to identify athletes
    - Limitations: cannot specify quantitative differences among categories

    Ordinal
    - Definition: scale involving ranking of objects, persons, traits, or abilities without regard to equality of differences
    - Uses and examples: letter grades (ratings from excellent to failing), military ranks, order of finishing a test
    - Limitations: restricted to specifying relative differences without regard to the absolute amount of difference

    Interval
    - Definition: scale having equal differences between successive categories
    - Uses and examples: temperature, grades, scores
    - Limitations: ratios are meaningless; the zero point is arbitrarily defined

    Ratio (most complex)
    - Definition: scale having an absolute zero and equal intervals
    - Uses and examples: distance, weight, time required to learn a skill or subject
    - Limitations: none, except that few educational variables have ratio characteristics
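    As a rough illustration of the scales above, here is a minimal Python sketch of which summary statistics remain meaningful at each level. All data values are hypothetical examples, not from the original material:

```python
# Which statistics are meaningful at each scale of measurement.
# All data values below are hypothetical illustrations.
from statistics import mode, median, mean

jersey_numbers = [23, 7, 23, 45]     # nominal: numbers are labels only
finish_order   = [1, 2, 3, 4]        # ordinal: rank order only
celsius_temps  = [10.0, 20.0, 30.0]  # interval: equal units, arbitrary zero
times_minutes  = [12.0, 24.0, 6.0]   # ratio: true zero, ratios meaningful

# Nominal: only counting categories (the mode) is meaningful.
print(mode(jersey_numbers))          # 23 (most frequent label)

# Ordinal: the median (middle rank) is meaningful; differences are not.
print(median(finish_order))          # 2.5

# Interval: means and differences are meaningful, but ratios are not
# (20 C is not "twice as hot" as 10 C because 0 C is arbitrary).
print(mean(celsius_temps))           # 20.0

# Ratio: ratios are meaningful (24 minutes is twice 12 minutes).
print(times_minutes[1] / times_minutes[0])  # 2.0
```

    Each scale inherits the meaningful operations of the scales below it, which is why the table orders them from least to most complex.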


    Testing

    A test may be defined as a task or series of tasks used to obtain systematic observations presumed to be representative of educational or psychological traits and/or attributes.

    Typically, tests require examinees to respond to items or tasks from which the examiner infers something about the attribute being measured.

    Tests and other measurement instruments serve a variety of purposes:

    1. Selection. To determine which persons will be admitted to or denied admittance to an institution or organization.
    2. Placement. To help individuals determine which of several programs they will pursue.
    3. Diagnosis and remediation. To help discover the nature of the specific problems individuals may have.
    4. Feedback
    5. Motivation and guidance of learning
    6. Program and curriculum improvement
    7. Theory development

    Tests may be classified by how they are administered (individually or in groups), how they are scored (objectively or subjectively), what sort of response they emphasize (power or speed), what type of response subjects make (performance or pencil-and-paper), what they attempt to measure (sample or sign), and the nature of the groups being compared (teacher-made or standardized).

    1. Individual and group tests
       Some tests are administered on a one-to-one basis during careful oral questioning (e.g., individual intelligence tests), whereas others can be administered to a group of individuals.

    2. Objective and subjective tests
       An objective test is one on which equally competent scorers will obtain the same scores (e.g., multiple-choice tests), whereas a subjective test is one on which the scores are influenced by the opinion or judgment of the person doing the scoring (e.g., essay tests).

    3. Power and speed tests
       A speed test measures the number of items that an individual can complete in a given time, whereas a power test measures the level of performance under ample time conditions. Power test items usually are arranged in order of increasing difficulty.

    Relationship between power and speed tests


                 Power            Speed
    Time         Generous         Limited
    Difficulty   Relatively hard  Relatively easy

    (A partially speeded test falls between the two.)

    4. Performance and paper-and-pencil tests
       Performance tests require examinees to perform a task rather than answer questions. They are usually administered individually so that the examiner can count the number of errors committed by the student and can measure how long each task takes.

       Paper-and-pencil tests are almost always given in a group situation in which students are asked to write their answers on paper.

    5. Sample and sign tests
       Sample tests measure a sample of a student's total behavior. Sign tests are administered to distinguish one group of individuals from another.

    6. Teacher-made and standardized tests
       Teacher-made tests are constructed by teachers for use within their own classrooms. Their effectiveness depends on the skill of the teacher and his or her knowledge of test construction.

       Standardized tests are constructed by test specialists working with curriculum experts and teachers. They are standardized in that they have been administered and scored under standard and uniform conditions so that results from different classes and different schools may be compared.

    7. Mastery and survey tests
       Some achievement tests measure the degree of mastery of a limited set of specific learning outcomes, whereas others measure pupils' general level of achievement over a broad range of outcomes.

    8. Supply and selection tests
       Some tests require examinees to supply the answer (e.g., essay tests), whereas others require them to select the correct response from a set of alternatives (e.g., multiple-choice tests).


    Evaluation

    Evaluation is a process through which a value judgment or decision is made from a

    variety of observations and from the background and training of the evaluator.

    General Principles of Evaluation

    1. Determining and clarifying what is to be evaluated always has priority in the evaluation process.
    2. Evaluation techniques should be selected according to the purpose to be served.
    3. Comprehensive evaluation requires a variety of evaluation techniques.
    4. Proper use of evaluation techniques requires an awareness of both their limitations and their strengths.
    5. Evaluation is a means to an end, not an end in itself.

    Reasons for Using Tests and Other Measurements

    Basis for Classification: Nature of Measurement

    - Maximum performance
      Function: determines what individuals can do when performing at their best
      Illustrative instruments: aptitude tests, achievement tests

    - Typical performance
      Function: determines what individuals will do under natural conditions
      Illustrative instruments: attitude, interest, and personality inventories; observational techniques; peer appraisal

    Basis for Classification: Use in Classroom Instruction

    - Placement
      Function: determines prerequisite skills, degree of mastery of course objectives, and/or best mode of learning
      Illustrative instruments: readiness tests, aptitude tests, pretests on course objectives, self-report inventories, observational techniques

    - Formative
      Function: determines learning progress, provides feedback to reinforce learning, and corrects learning errors
      Illustrative instruments: teacher-made mastery tests, custom-made tests from test publishers, observational techniques

    - Diagnostic
      Function: determines causes (intellectual, physical, emotional, environmental) of persistent learning difficulties
      Illustrative instruments: published diagnostic tests, teacher-made diagnostic tests, observational techniques

    - Summative
      Function: determines end-of-course achievement for assigning grades or certifying mastery of objectives
      Illustrative instruments: teacher-made survey tests, performance rating scales, product scales

    Basis for Classification: Method of Interpreting Results

    - Criterion referenced
      Function: describes pupil performance according to a specified domain of clearly defined learning tasks (e.g., adds single-digit whole numbers)
      Illustrative instruments: teacher-made mastery tests, custom-made tests from test publishers, observational techniques

    - Norm referenced
      Function: describes pupil performance according to relative position in some known group (e.g., ranks tenth in a classroom group of 30)
      Illustrative instruments: standardized aptitude and achievement tests, teacher-made survey tests, interest inventories, adjustment inventories

    Motivation and Guidance of Learning

    Tests can be used to motivate and guide students to learn, and because pupils study for the type of examination they expect to take, it is the teacher's responsibility to construct examinations that measure important course objectives.

    Program and Curriculum Improvement: Formative and Summative Evaluations


    1. Formative Evaluation. Formative evaluation is used to monitor learning progress during instruction and to provide continuous feedback to both pupil and teacher concerning learning successes and failures.

    2. Summative Evaluation. Summative evaluation typically comes at the end of a course (or unit) of instruction. It is designed to determine the extent to which the instructional objectives have been achieved and is used primarily for assigning course grades or certifying pupil mastery of the intended learning outcomes.

    Norm-Referenced and Criterion-Referenced Measurement

    Evaluation procedures can also be classified according to how the results are

    interpreted. There are two basic ways of interpreting pupil performance on tests and other

    evaluation instruments. One is to describe the performance in terms of the relative position

    held in some known group (e.g., typed better than 90 percent of the class members). The

    other is to directly describe the specific performance that was demonstrated (e.g., typed 40

    words per minute without error). The first type of interpretation is called norm referenced;

    the second, criterion referenced. Both types of interpretation are useful.

    Some Basic Terminologies

    1. Norm-referenced test: a test designed to provide a measure of performance that is interpretable in terms of an individual's relative standing in some known group.

    2. Criterion-referenced test: a test designed to provide a measure of performance that is interpretable in terms of a clearly defined and delimited domain of learning tasks.

    3. Objective-referenced test: a test designed to provide a measure of performance that is interpretable in terms of a specific instructional objective. (Many objective-referenced tests are called criterion-referenced tests by their developers.)

    Other terms that are less often used have meanings similar to criterion referenced: content referenced, domain referenced, and universe referenced.

    Comparison of Norm-Referenced Tests (NRTs) and Criterion-Referenced Tests (CRTs)

    Common Characteristics of NRTs and CRTs

    1. Both require specification of the achievement domain to be measured.
    2. Both require a relevant and representative sample of test items.
    3. Both use the same types of test items.
    4. Both use the same rules for item writing (except for item difficulty).
    5. Both are judged by the same qualities of goodness (validity and reliability).
    6. Both are useful in educational measurement.

    Differences Between NRTs and CRTs (but these are only a matter of emphasis)


    1. An NRT typically covers a large domain of learning tasks, with just a few items measuring each specific task. A CRT typically focuses on a delimited domain of learning tasks, with a relatively large number of items measuring each specific task.

    2. An NRT emphasizes discrimination among individuals in terms of relative level of learning. A CRT emphasizes description of what learning tasks individuals can and cannot perform.

    3. An NRT favors items of average difficulty and typically omits easy items. A CRT matches item difficulty to learning tasks, without altering item difficulty or omitting easy items.

    4. An NRT is used primarily (but not exclusively) for survey testing. A CRT is used primarily (but not exclusively) for mastery testing.

    5. NRT interpretation requires a clearly defined group. CRT interpretation requires a clearly defined and delimited achievement domain.

    Strictly speaking, norm referenced and criterion referenced refer only to the method of interpreting the results. These types of interpretation are likely to be most meaningful and useful, however, when tests (and other evaluation instruments) are specifically designed for the type of interpretation to be made. Thus, we can use the terms criterion referenced and norm referenced as broad categories for classifying tests and other evaluation techniques. Tests that are specifically built to maximize each type of interpretation have much in common, and it is impossible to determine the type of test by examining the test itself. Rather, it is in the construction and use of the tests that the differences can be noted. A key feature in constructing norm-referenced tests is the selection of items of average difficulty and the elimination of items that all pupils are likely to answer correctly. This procedure provides a wide spread of scores so that discrimination among pupils at various levels of achievement can be made more reliably. This is useful for decisions based on relative achievement, such as selection, grouping, and relative grading. In contrast, a key feature in constructing criterion-referenced tests is the selection of items that are directly relevant to the learning outcomes to be measured, without regard to the items' ability to discriminate among pupils. If the learning tasks are easy, the test items will be easy, and if the learning tasks are difficult, the test items will be difficult. Here the main purpose is to describe the specific knowledge and skills that each pupil can demonstrate, which is useful for planning both group and individual instruction.

    Norm-Referenced Test          Combined Type           Criterion-Referenced Test
    Discrimination among pupils   Dual interpretation     Description of performance
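    The two interpretations can be sketched in a few lines of Python. The scores, the pupil, and the mastery cutoff below are hypothetical illustrations, not part of the original material:

```python
# Norm-referenced vs criterion-referenced interpretation of one score.
# All numbers here are hypothetical illustrations.

class_scores = [55, 60, 68, 72, 75, 80, 85, 90, 94, 98]  # percent correct
pupil_score = 80

# Norm-referenced: describe the score by relative position in a known group.
below = sum(1 for s in class_scores if s < pupil_score)
percentile_rank = 100 * below / len(class_scores)
print(f"scored higher than {percentile_rank:.0f}% of the class")

# Criterion-referenced: describe the score against a defined domain cutoff,
# regardless of how classmates performed.
mastery_cutoff = 75  # e.g., performs 75% of the domain's tasks correctly
print("mastered" if pupil_score >= mastery_cutoff else "not yet mastered")
```

    Note that the same raw score supports both statements; only the frame of reference (the group versus the domain) changes.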

    Other Descriptive Terms


    Some of the other terms that are frequently used in describing tests are presented

    here as contrasting test types, but some are simply the ends of a continuum (e.g., speed

    versus power tests).

    1. Informal Versus Standardized Tests. Informal tests are those constructed by classroom teachers, whereas those designed by test specialists and administered, scored, and interpreted under standard conditions are called standardized tests.

    2. Individual Versus Group Tests. Some tests are administered on a one-to-one basis during careful oral questioning (e.g., individual intelligence tests), whereas others can be administered to a group of individuals.

    3. Mastery Versus Survey Tests. Some achievement tests measure the degree of mastery of a limited set of specific learning outcomes, whereas others measure pupils' general level of achievement over a broad range of outcomes. Mastery tests are typically criterion referenced, and survey tests tend to be norm referenced, but some criterion-referenced interpretations are also possible with carefully prepared survey tests.

    4. Supply Versus Selection Tests. Some tests require examinees to supply the answer (e.g., essay tests), whereas others require them to select the correct response from a set of alternatives (e.g., multiple-choice tests).

    5. Speed Versus Power Tests. A speed test measures the number of items that an individual can complete in a given time, whereas a power test measures the level of performance under ample time conditions. Power test items usually are arranged in order of increasing difficulty.

    6. Objective Versus Subjective Tests. An objective test is one on which equally competent scorers will obtain the same scores (e.g., multiple-choice tests), whereas a subjective test is one on which the scores are influenced by the opinion or judgment of the person doing the scoring (e.g., essay tests).


    Discussion Point 2:

    Preparing Instructional Objectives

    Instructional objectives play a key role in the instructional process. When properly

    stated, they serve as guides for both teaching and evaluation. A clear description of

    the intended outcomes of instruction aids in selecting relevant materials and

    methods of instruction, in monitoring pupil learning progress, in selecting or

    constructing appropriate evaluation procedures, and in conveying instructional intent

    to others.

    In preparing instructional objectives, it is possible to focus on different aspects of

    instruction.

    An educational goal is a general aim or purpose of education that is stated as a broad, long-range outcome to work toward. Goals are used primarily in policy making and general program planning (e.g., "Develop proficiency in the basic skills of reading, writing, and arithmetic").

    A general instructional objective is an intended outcome of instruction that has been stated in general enough terms to encompass a set of specific learning outcomes (e.g., "Comprehends the literal meaning of written material").

    A specific learning outcome is an intended outcome of instruction that has been stated in terms of specific and observable pupil performance (e.g., "Identifies details that are explicitly stated in a passage"). A set of specific learning outcomes describes a sample of the types of performance that learners will be able to exhibit when they have achieved a general instructional objective (also called specific objectives, performance objectives, behavioral objectives, and measurable objectives).

    Pupil Performance is any measurable or observable pupil response in the cognitive,

    affective, or psychomotor area that is a result of learning.

    Dimensions of Instructional Objectives

    1. Mastery vs. developmental outcomes
       Mastery objectives are typically concerned with relatively simple knowledge and skill outcomes (e.g., adds two single-digit numbers with sums of ten or less).

       Developmental outcomes are concerned with objectives that can never be fully achieved; pupils show varying degrees of progress along a continuum of development.

    2. Ultimate vs. immediate objectives
       Ultimate objectives are those concerned with the typical performance of individuals in the actual situations they will face in the future. For example, good citizenship is reflected in adult life through voting behavior, interest in community affairs, and the like; safety consciousness shows up in safe driving, safe work habits, and obeying safety rules in daily activities.

       Immediate objectives should be closely related to the ultimate situation. For example, can pupils apply basic skills to practical situations? Such objectives, calling for the application of knowledge and skills, aid in the transfer of skills to ultimate situations and should be on any list of objectives.


    3. Single-course vs. multiple-course objectives
       Whether these areas are the shared responsibility of several teachers depends on the grade level and the school's goals (e.g., in some schools, every teacher is considered a teacher of basic skills).

       Areas containing multiple-course objectives:
       Reading     Computer skills   Creativity
       Writing     Study skills      Citizenship
       Speaking    Library skills    Health

    Selection of Instructional Objectives

    1. Types of learning outcomes to consider
    2. Taxonomy of educational objectives
    3. Use of published lists of objectives
    4. Review of your own teaching materials and methods

    Begin with a Simple Framework: Knowledge, Understanding, Application

    Reading

    K = Knows vocabulary
    U = Reads with comprehension

    A = Reads a wide variety of printed materials

    Writing

    K = Knows the mechanics of writing

    U = Understands grammatical principles in writing

    A = Writes complete sentences (paragraph, theme)

    Math

    K = Knows the number system and basic operations.

    U = Understands math concepts and processes.

    A = Solves math problems accurately and efficiently

    Criteria for Selecting Appropriate Objectives

    1. Do the objectives include all important outcomes?
    2. Are the objectives in harmony with the general goals of the school?
    3. Are the objectives in harmony with sound principles of learning?
    4. Are the objectives realistic in terms of the pupils' abilities and the time and facilities available?


    Stating the Specific Learning Outcomes

    1. Focus on action verbs.
       Examples:
       a. Understands the meaning of terms.
          1. Defines the terms in own words
          2. Identifies the meaning of a term in context
          3. Differentiates between proper and improper usage of a term
          4. Distinguishes between two similar terms on the basis of meaning
          5. Writes an original sentence using the term
       b. Demonstrates skill in critical thinking.
          1. Distinguishes between fact and opinion
          2. Distinguishes between relevant and irrelevant information
          3. Identifies fallacious reasoning in written material
          4. Identifies the limitations of given data
          5. Identifies the assumptions underlying conclusions

    2. Keep outcomes free of specific content so that they can be used in various units of study.
       Poor:   Identifies the last ten presidents of the Psychological Association of the Philippines.
       Better: Identifies important historical figures.

       Poor:   Identifies the parts of the brain.
       Better: Identifies the parts of a given structure.


    Discussion Point 3:

    Achieving Different Types of Learning Outcomes

    Achieving Cognitive Learning

    Teaching Fact, Factual Information, and Knowledge

    Basic Concepts

    1. Fact: something that has happened, an event, or an actual state of affairs
    2. Factual information: information discriminated by many individuals who share the same cultural background and accepted as correct or appropriate
    3. Information: anything that is discriminated by an individual
    4. Knowledge: factual information that is learned initially and then remembered

    Types of Knowledge

    1. General: knowledge that applies to many different situations
    2. Domain-specific: knowledge that pertains to a particular task or subject
    3. Declarative: knowledge of verbal information (facts, beliefs, opinions)
    4. Procedural: knowledge of how a task is performed
    5. Conditional: knowing when and why to use declarative and procedural knowledge

    Three Categories of Knowledge

    1. Knowledge of specifics: isolated facts, remembered separately
    2. Knowledge of ways and means: conventions, trends and sequences, classification and categories, criteria and methodology
    3. Knowledge of abstraction: laws, theories, and principles


    Application of Principles in Teaching and Learning Factual Information

    Principle 1: Organizing learning material through meaningful association facilitates acquisition of information.
    Classroom application: group items
    - according to common attributes
    - by relationships

    Principle 2: Transition from old to new materials facilitates acquisition of information.
    Classroom application:
    - organize the material to a higher level of generality
    - use an advance organizer that is more general, abstract, inclusive, and comparative
    - utilize related prior knowledge of students as advance organizers

    Principle 3: Proper sequencing of materials facilitates acquisition of information.
    Classroom application: order subject matter
    - according to regularity of the structure
    - according to the responses available to the learner
    - according to similarity of different stimuli

    Principle 4: Appropriate practice facilitates acquisition of information.
    Classroom application:
    - provide adequate practice through use of knowledge in situation, relationship, distribution of sessions, and review of a small amount of material, then at increasingly larger intervals
    - reinforce practice through confirmation of correct responses

    Principle 5: Independent evaluation facilitates acquisition of information.
    Classroom application: provide a mechanism for learners to evaluate their own responses.

    Teaching Concepts and Principles

    Basic Concepts of Concept and Principles

    1. Concept: essentially an idea or an understanding of what something is; a category used to group similar events, ideas, objects, or people; organized information about the properties of one or more things
    2. Principle: a relationship among two or more concepts

    Classification of Principles
    1. Cause and effect: if-then relationship
    2. Probability: prediction in the actual sense
    3. Correlation: prediction based on a wide range of phenomena
    4. Axioms: rules


    Instances of Concepts

    1. Positive - specific example of a concept
    2. Negative - non-example of a concept

    Attributes of Concepts

    1. Learnability - some concepts are more readily learned than others
    2. Usability - some concepts can be used more than others
    3. Validity - the extent to which experts agree on its attributes
    4. Generality - the higher the concept, the more general it is
    5. Power - the extent to which a concept facilitates learning of other concepts
    6. Structure - internally consistent organization
    7. Instance perceptibility - the extent to which instances can be sensed
    8. Instance numerousness - the number of instances, ranging from one to an infinite number

    Four Levels of Concept Attainment

    1. Concrete
    2. Identity
    3. Classificatory
    4. Formal

    Four Components in Any Concept Development

    1. Name of concept
    2. Definition
    3. Relevant and irrelevant attributes
    4. Examples and non-examples

    Simple Procedures in Concept Analysis

    1. State attributes and non-attributes
    2. Give examples and non-examples
    3. Indicate relationships of a concept to other concepts
    4. Identify the principles in which the concept is used
    5. Use the concept in solving problems


    Application of Principles in Teaching and Learning of Concepts and Principles

    1. Awareness of attributes facilitates concept learning.
       Manage instruction by
       - Guiding learners to identify the critical attributes
       - Using examples and non-examples from which the attributes of the concept can be identified
       - Utilizing activities where instances of the concept can be directly observed
       - Providing for overgeneralizing and undergeneralizing to establish the limits of the concept
       - Providing the right amount of variation and repetition
       - Varying irrelevant dimensions so that the relevant dimensions may be identified easily
    2. Correct language for concepts facilitates concept learning.
       - Teach relevant names and labels associated with concept attributes
    3. Proper sequencing of instances facilitates concept learning.
       Concept development should proceed
       - From simple to complex examples
       - From concrete to abstract
       - From parts to whole
       - From whole to parts
       Present
       - A larger number of instances of one concept
       - Instances of high dominance rather than low dominance
       - Positive and negative instances of the concept rather than all positive or all negative instances
       Present instances of a concept simultaneously rather than successively.
    4. Guided student discovery facilitates concept learning.
       Guide students' discovery of concepts through
       - Encounters with real and meaningful problems
       - Gathering accurate information
       - A responsive environment
       - Prompt and accurate feedback
    5. Concept application facilitates concept learning.
       Conduct meaningful applications of concepts by
       - Drawing on the learners' experiences
       - Observing related situations
       - Encountering life-like situations
    6. Independent evaluation facilitates concept learning.
       Arrange for independent evaluation by
       - Creating an attitude of seeking and searching
       - Arranging for self-evaluation of the adequacy of one's concepts
       - Assisting learners to evaluate their concepts and their methods of evaluating them.


    Developing Problem Solving Abilities

    Basic Concepts

    1. Problem - a felt difficulty or a question for which a solution may be found only by a process of thinking
    2. Thinking - the recall and reorganization of facts and theories that occur at a time when the individual is faced with obstacles and problems
    3. Reasoning - productive thinking in which previous experiences are organized or combined in new ways to solve problems
    4. Problem solving - creating new solutions to remove a felt difficulty

    Steps in Problem Solving

    1. Felt need
    2. Recognizing a problem situation
    3. Gathering data
    4. Evaluating the possible solutions
    5. Testing and verification
    6. Making generalizations or conclusions


    Principles for Developing Problem Solving Abilities and their Applications in Classroom Situations

    1. Recognizing difficulties in a situation facilitates problem solving.
       Assist students to
       - Identify solvable, significant problems
       - State problems themselves
    2. Delimiting the problem facilitates problem solving.
       Guide students in
       - Analyzing the situation related to the problem
       - Determining problems of immediate concern
       - Delimiting the problem
       - Stating problems with opportunity for securing progress towards a solution
       - Deciding the form in which the solutions might appear
       - Using the information-processing skill of selective attention
    3. Using new methods for arriving at a conclusion facilitates problem solving.
       Help students in
       - Locating needed information
       - Acquiring the necessary background information, concepts, and principles for dealing with the problem
       - Developing their minimum reference list
       - Identifying various sources of information
       - Drawing information from their own experiences
       - Deciding on a uniform system for writing a bibliography
    4. Generating possible solutions through applying knowledge and methods to the problem situation facilitates problem solving.
       Lead students to generate solutions through
       - Brainstorming sessions
       - Processing information
       - Analyzing information in terms of the larger problems
       - Incorporating diverse information
       - Eliminating overlapping and discrepancies
    5. Reaching problem solutions through inferring and testing hypotheses facilitates problem solving.
       Develop the skills of the students in
       - Drawing hypotheses
       - Stating hypotheses
       - Testing hypotheses


    Developing Creativity

    Basic Concepts

    1. Restructuring - conceiving of a problem in a new or different way
    2. Incubation - unconscious work toward a solution while one is away from the problem
    3. Divergent thinking - coming up with many possible solutions
    4. Convergent thinking - narrowing possibilities to the single answer
    5. Creativity - the occurrence of uncommon or unusual but appropriate responses; imaginative, original thinking

    Characteristics of a Creative Individual

    1. Has a high degree of intellectual capacity
    2. Genuinely values intellectual matters
    3. Values own independence and autonomy
    4. Is verbally fluent
    5. Enjoys aesthetic impressions
    6. Is productive
    7. Is concerned with philosophical problems
    8. Has a high aspiration level for self
    9. Has a wide range of interests
    10. Thinks in unusual ways
    11. Is an interesting, arresting person
    12. Appears straightforward, candid
    13. Behaves in an ethically consistent manner

    Principles for Developing Creativity and their Applications in Classroom Teaching

    1. Production of novel forms of ideas through expressing oneself by figural, verbal and physical means facilitates development of creativity.
       Model creative behaviors such as
       - Curiosity
       - Inquiry
       - Divergent production
       Provide opportunities for
       - Expression in many media: language, rhythm, music and art
       - Divergent production through figural, verbal and physical means
       - Creative processes
       - Valuing creative achievement
       - Production of ideas that cannot be scored right or wrong
       Develop a continuing program for developing creative abilities.
    2. Associating success in creative efforts with a high level of creative experience facilitates the development of creativity.
       Respect
       - Unusual questions
       - Imaginative, creative ideas
       Reward
       - Creative efforts
       - Unique productions


    Achieving Psychomotor Learning

    Basic Concepts

    1. Capacity - an individual's potential power to do a certain task
    2. Ability - actual power to perform an act, physically and mentally
    3. Skill - level of proficiency attained in carrying out sequences of action in a consistent way

    Characteristics of Skilled Performance

    1. Less attention to the specific movements (voluntary to involuntary)
    2. Better differentiation of cues
    3. More rapid feedback and correction of movements
    4. Greater speed and coordination
    5. Greater stability under a variety of environmental conditions

    Phases of Motor Skill Learning

    1. Cognitive phase - understanding the task
    2. Organizing phase - associating responses with particular cues and integrating responses
    3. Perfecting phase - executing performance in automatic fashion


    Application of Principles in Developing Psychomotor Skills in Classroom Teaching

    1. Attending to the characteristics of the skill and assessing one's own related abilities facilitates motor skill learning.
       Analyze the psychomotor skill in terms of the learner's abilities and developmental level
       - To determine the specific abilities necessary to perform it
       - To arrange the component abilities in order
       - To help students master them
    2. Observing and imitating a model facilitates initial learning of skills and movements.
       Demonstrate and describe the
       - Entire procedure, as an advance organizer
       - Correct components of motor abilities
       - Links of the motor chain in sequence
       - Skill again, step by step
    3. Guiding initial responses verbally and physically facilitates learning of motor skills.
       Provide verbal guidance to
       - Give learners a feeling of security
       - Direct attention to more adequate techniques
       - Promote insight into the factors related to successful performance of the task
       Provide physical guidance to
       - Facilitate making correct responses initially
       - Correct wrong responses immediately
    4. Practicing under desirable conditions facilitates the learning of skills through eliminating errors and strengthening and refining correct responses and form.
       Conduct practice of skills
       - Close to the actual conditions where the skill will be used
       - From whole-to-part arrangement
       - Through repetitive drills on the same materials
       - By distributed rather than massed practice
       - With intervals of rest long enough to overcome fatigue but not so long that forgetting occurs
    5. Knowledge of results facilitates skill learning.
       Provide informational feedback on
       - Correct and incorrect responses
       - Adequate and inadequate responses
       - Correct or incorrect verbal remarks
       Feedback may be secured from
       - Verbal analysis
       - Chart analysis
       - Taped performance
    6. Evaluating one's own performance facilitates mastery of skills.
       Arrange self-evaluation of the learner's performance through
       - Discussion
       - Analysis
       - Assessment


    Achieving Affective Learning

    Developing Attitudes and Values

    1. Affective - pertains to emotions or feelings rather than thought
    2. Affective learning - consists of responses acquired as one evaluates the meaning of an idea, object, person, or event in terms of his view of the world

    Main Elements

    1. Tastes - likes or dislikes of a particular animal, color, or flavor
    2. Attitudes - learned, emotionally toned predispositions to react in a consistent way, favorable or unfavorable, toward a person, object or idea
    3. Values - inner core beliefs and internalized standards serving as norms of behavior

    Defining Attributes of Attitudes

    1. Learnability - all attitudes are learned
    2. Stability - learned attitudes become stronger and more enduring
    3. Personal-societal significance - attitudes are of high importance to the individual and society
    4. Affective-cognitive content - attitudes have both factual information and emotions associated with an object


    Application of Principles in Developing Attitudes and Values in Classroom Teaching

    1. Recognizing an attitude facilitates its initial learning.
       Guide students in
       - Identifying the attitudes and values to be developed
       - Defining the terminal behavior expected of them
    2. Observing and imitating a model facilitates initial attitude learning.
       The teacher provides
       - Different types of exemplary models
       - Opportunities to examine carefully instructional materials in terms of the attitudes and values presented
       The teacher sets a good example.
    3. Positive attitudes toward a person, event or object facilitate affective learning.
       Provide for pleasant and positive emotional experiences by
       - Showing warmth and enthusiasm toward students
       - Keeping personal prejudices under control
       - Allowing students to express their own value commitments
       - Demonstrating interest in the subject matter
       - Making it possible for each student to experience success
    4. Getting information about a person, event, or object influences initial attitude learning and later commitment to group-held attitudes.
       Guide learners to extend their informative experiences by
       - Undergoing direct experiences
       - Listening to group lectures and discussions
       - Engaging in extensive reading
       - Participating in related activities
    5. Interacting in primary groups influences initial attitude learning and later commitment to group-held attitudes.
       Facilitate interaction in primary groups through
       - Group planning
       - Group discussion
       - Group decision making
       - Role-playing
    6. Practicing an attitude facilitates stable organization.
       The practice context should
       - Regard the teacher as an exemplary model manifesting interest in the students
       - Be characterized by a positive climate
       - Confirm learner responses with positive remarks, an approving nod, and a smile
    7. Purposeful learning facilitates effective attitude acquisition and modification.
       Guide learners to engage in independent attitude cultivation through
       - Providing opportunities for them to think about their own attitudes
       - Writing about open-ended themes


    Discussion Point 4:

    Test Construction, Reliability and Validity

    STEP I: CONTENT VALIDATION

    Content validity is the degree to which the test represents the essence, the topics, and the areas that the test is designed to measure.

    It is considered the most crucial procedure in the test construction process because content validity sets the pace for the succeeding validity and reliability measures.

    1.1 Documentary analysis or pre-survey. At this stage, one must have familiarized him/herself with the theoretical constructs directly related to the test one is planning.

    1.2 Development of a Table of Specification. Determining the areas or concepts that will represent the nature of the variable being measured, and the relative emphasis of each area, is essentially judgmental.

    A detailed TS includes areas or concepts, objectives, number of items, and the percentage or proportion of items in each area.

    It is advisable to make a 50 to 100 percent allowance in the construction of items.

    Sample

    Table of Specification (first draft) for an Introduction to Psychology Unit Exam

    The LEARNING OBJECTIVES column (with sub-headings K, C, A, A, S, E) is left blank in this first draft.

    AREAS                        NUMBER     PLACEMENT OF ITEMS                               PERCENTAGE
                                 OF ITEMS
    I. History of Psychology        15      1, 6, 13, 14, 22, 23, 24, 32, 33, 49, 50,         21.43 %
                                            51, 59, 65, 65
    II. Branches of Psychology      15      7, 12, 15, 16, 20, 21, 31, 34, 47, 48, 52,        21.43 %
                                            58, 59, 61, 70
    III. Schools of Psychology      20      3, 4, 5, 17, 19, 25, 26, 30, 35, 36, 39, 40,      28.57 %
                                            44, 46, 53, 54, 60, 62, 67, 69
    IV. Research Methods            20      2, 8, 9, 10, 11, 18, 27, 28, 29, 37, 38, 41,      28.57 %
                                            42, 43, 45, 55, 56, 63, 64, 68
    Total                           70                                                         100 %

    Building a Table of Specifications:

    1. Obtaining a list of instructional objectives
    2. Outlining the course content
    3. Preparing a two-way chart
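    The percentage column of a table of specification is simple proportion arithmetic. A short sketch (the area names and item counts are taken from the sample TS above; the script itself is only illustrative):

    ```python
    # Sketch: compute the percentage column of a table of specification
    # from the number of items per area (counts from the sample TS above).
    areas = {
        "History of Psychology": 15,
        "Branches of Psychology": 15,
        "Schools of Psychology": 20,
        "Research Methods": 20,
    }

    total = sum(areas.values())  # 70 items in the draft
    for area, n_items in areas.items():
        pct = 100 * n_items / total
        print(f"{area}: {n_items} items = {pct:.2f} %")
    ```

    Running this reproduces the 21.43 % and 28.57 % figures in the sample table.
    
    
    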


    Table of Specifications for a Summative Third-Grade Social Studies Test

    1.3 Consultation with experts. At this point it is advisable to consult with your thesis adviser or with authorities who have the expertise to make judgments about the representativeness or relevance of the entries made in your TS.

    1.4 Item writing. At this stage you should know what types of items you are supposed to construct: the type of instrument, format, scaling and scoring techniques.

    STEP II: FACE VALIDATION

    Face validity, the crudest type of validity, pertains to whether the test looks valid; that is, whether, by the face of the instrument, it looks like it can measure what you intend to measure.

    This type of validity cannot stand alone, especially in research at the graduate level.

    2.1 Item inspection. Have the initial draft of the instrument inspected by a group of evaluators - thesis adviser, test construction experts, and experts/professionals whose specialization is related to the subject matter at hand. A simple rating sheet may be used:

    ITEM / ITEM NO.    SUITABLE    NOT SUITABLE    NEEDS REVISION

    2.2 Inter-judge consistency. You may collate the data gathered from the evaluators for analysis. You have to look at the agreement or consistency of the judgments they made on each of the items.


    STEP III: FIRST TRIAL RUN

    At this stage you must already have a stencil of your first draft as a result of steps I and II. Try out your test on a sample that is comparable to your target population or to your final sample. This try-out should be large enough to provide meaningful computations.

    STEP IV: ITEM ANALYSIS

    Both the reliability and validity of any test depend largely on the characteristics of the items. High validity and reliability can be built into the instrument in advance through item analysis.

    According to Likert, item analysis can be used as an objective check in determining whether the members of the group react differentially to the battery; that is, item analysis indicates whether those persons who fall toward one end of the attitude continuum on the battery do so on the particular statement, and vice versa.

    THE U-L INDEX METHOD

    This method is appropriate for a test whose criterion is measured along a continuous scale and whose individual items are scored right or wrong, or negative or positive.

    Steps in Using the U-L Index Method

    1. Score the tests and arrange the scores from lowest to highest based on the total scores.

    Sample scores from a 10-item test (n = 30):

    List of scores        Arranged scores, lowest to highest
    2  9                  1  5
    3  8                  2  5
    5  9                  2  5
    9  5                  2  6
    5  2                  2  6
    4  3                  2  6
    2  4                  3  8
    6  4                  3  8
    8  6                  3  8
    8  2                  3  8
    8  2                  4  8
    9  1                  4  9
    3  8                  4  9
    6  4                  4  9
    3  5                  5  9


    2. Separate the top 27 % and the bottom 27 % of the cases.
       27 % of 30 is 8.1, or 8 cases (30 x .27 = 8.1).

    Arranged scores, lowest to highest (n = 30):

    1  5
    2  5
    2  5
    2  6
    2  6
    2  6
    3  8
    3  8
    3  8
    3  8
    4  8
    4  9
    4  9
    4  9
    5  9

    The 8 lowest scores form the bottom 27 % and the 8 highest scores form the top 27 %.

    3. Prepare a tally sheet. Tally the number of cases from each group who got each item right, then convert the tallies into frequencies.

    ITEM NO.   UPPER 27 %              LOWER 27 %
               tally       frequency   tally   frequency
    1          IIIII III       8       II          2
    2          IIIII           5       II          2
    3          IIIII III       8       I           1
    4          IIIII I         6       I           1
    5          IIIII III       8       II          2
    6          IIIII II        7       III         3
    7          IIIII I         6       I           1
    8          IIIII           5       I           1
    9          IIIII III       8       II          2
    10         IIIII II        7       II          2


    4. Compute the proportion (p) of cases who got each item right in each group.

    U or L = f / n

    ITEM NO.   UPPER 27 % (n = 30)   LOWER 27 % (n = 30)
               f       p             f       p
    1          8                     2
    2          5                     2
    3          8                     1
    4          6                     1
    5          8                     2
    6          7                     3
    7          6                     1
    8          5                     1
    9          8                     2
    10         7                     2

    5. Compute the discrimination index of each item.

    Discrimination index refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Thus, a good test item separates the bright from the poor respondents.

    Ds = Pu - Pl

    Where: Ds is the discrimination index
           Pu is the proportion of the upper 27 %
           Pl is the proportion of the lower 27 %

    (Extend the table above with a Ds column: for each item, enter Pu, Pl, and their difference.)


    6. Compute the difficulty index of each item.

    Difficulty index is the percentage of the respondents who got the item right. It can also be interpreted as how easy or how difficult an item is.

    Df = (Pu + Pl) / 2

    Where: Df is the difficulty index
           Pu is the proportion of the upper 27 %
           Pl is the proportion of the lower 27 %

    (Add a Df column to the table: for each item, average Pu and Pl.)

    7. Decide whether to retain each item based on two ranges.

    (Add a Decision column to the table, applying the criteria below to each item's Ds and Df.)

    Items with difficulty indices within .20 to .80 and discrimination indices within .30 to .80 are retained.


    The following interpretation of discrimination indices can be obtained from the Chung-Teh Fan item analysis table:

    .40 and above  - very good item
    .30 - .39      - reasonably good item, but possibly subject to improvement
    .20 - .29      - marginal item, usually needing improvement
    .19 and below  - poor item, to be rejected, improved or revised

    Difficulty indices can be interpreted as follows:

    .00 - .20      - very difficult
    .21 - .80      - moderately difficult
    .81 - 1.00     - very easy
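    The U-L steps above can be sketched end to end. One assumption here: the proportions are computed over the 8 cases in each 27 % group (the sample tables leave the p columns blank, so this follows the common within-group convention):

    ```python
    # U-L index sketch: frequencies of correct answers in the upper and
    # lower 27 % groups, copied from the tally sheet above.
    upper = [8, 5, 8, 6, 8, 7, 6, 5, 8, 7]
    lower = [2, 2, 1, 1, 2, 3, 1, 1, 2, 2]
    n_group = 8  # 27 % of 30 respondents is 8 cases per group

    for item, (fu, fl) in enumerate(zip(upper, lower), start=1):
        pu, pl = fu / n_group, fl / n_group   # U or L = f / n
        ds = pu - pl                          # discrimination index
        df = (pu + pl) / 2                    # difficulty index
        keep = 0.30 <= ds <= 0.80 and 0.20 <= df <= 0.80
        print(f"item {item}: Ds={ds:.2f} Df={df:.2f} "
              f"{'retain' if keep else 'reject'}")
    ```

    Under this convention, items whose Ds falls outside .30 to .80 (for example, items answered correctly by the entire upper group but almost none of the lower group) are flagged rather than retained.
    
    
    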

    CRITERION OF INTERNAL CONSISTENCY

    This method is somewhat similar to the U-L Index Method in that two criterion groups, the high group and the low group, are employed to judge the discriminatory power of an item. However, in this method, Likert recommends the use of the high 10 percent and the low 10 percent groups.

    Steps in Criterion of Internal Consistency Method

    1. List all the scores of the respondents and get the high 10 percent and the low 10 percent. Write their respective scores for each item.

    Sample scores from a 10-item test using a 4-point scale (n = 50):

                   TEST ITEMS
    Respondents    1  2  3  4  5  6  7  8  9  10
    High 10 %
    A              4  4  4  4  4  3  4  2  4  3
    B              4  2  4  2  2  4  4  2  4  3
    C              4  4  4  4  4  4  4  1  3  3
    D              4  4  4  4  2  4  4  2  4  4
    E              3  4  4  3  4  2  2  2  4  4
    Low 10 %
    F              4  1  1  2  3  2  3  4  3  2
    G              4  1  2  2  4  2  2  3  3  2
    H              3  2  2  1  3  3  1  4  2  2
    I              3  2  2  1  4  2  3  4  2  2
    J              3  2  2  2  4  1  3  3  1  2


    2. Get the summation of the high group and the low group.
       (Extend the table above with two rows, "Sum of high group" and "Sum of low group", giving each item's column total within each group.)

    3. Get the difference of the groups.
       (Add a "Difference" row: for each item, subtract the sum of the low group from the sum of the high group.)


    4. To see the difference between scores, ranking will be helpful.
       (Add a "Rank" row, ranking the items by the size of the difference; larger high-low differences indicate more discriminating items.)
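    Steps 2 to 4 can be sketched directly from the high and low 10 % table above (scores copied from the sample; ties in the ranking are broken by item order, which is an arbitrary convention for this sketch):

    ```python
    # Criterion of internal consistency sketch: item-wise sums of the
    # high and low 10 % groups, their differences, and item ranks.
    high = [  # respondents A to E
        [4, 4, 4, 4, 4, 3, 4, 2, 4, 3],
        [4, 2, 4, 2, 2, 4, 4, 2, 4, 3],
        [4, 4, 4, 4, 4, 4, 4, 1, 3, 3],
        [4, 4, 4, 4, 2, 4, 4, 2, 4, 4],
        [3, 4, 4, 3, 4, 2, 2, 2, 4, 4],
    ]
    low = [  # respondents F to J
        [4, 1, 1, 2, 3, 2, 3, 4, 3, 2],
        [4, 1, 2, 2, 4, 2, 2, 3, 3, 2],
        [3, 2, 2, 1, 3, 3, 1, 4, 2, 2],
        [3, 2, 2, 1, 4, 2, 3, 4, 2, 2],
        [3, 2, 2, 2, 4, 1, 3, 3, 1, 2],
    ]

    sum_high = [sum(col) for col in zip(*high)]  # "Sum of high group" row
    sum_low = [sum(col) for col in zip(*low)]    # "Sum of low group" row
    diff = [h - l for h, l in zip(sum_high, sum_low)]  # "Difference" row

    # rank 1 = largest difference = most discriminating item
    order = sorted(range(10), key=lambda i: diff[i], reverse=True)
    rank = [order.index(i) + 1 for i in range(10)]
    print(diff)
    print(rank)
    ```

    Note how items whose difference is small or negative (the high group scoring no better, or worse, than the low group) stand out immediately.
    
    
    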

    PEARSON PRODUCT-MOMENT CORRELATION METHOD

    This item analysis technique is used for tests of continuous scaling with three (3) or more scale points. There is a total score, which serves as the X criterion, and an item score, which serves as the Y criterion. This is done for every item; therefore, if the draft consists of 60 items, there should be 60 correlation coefficients computed.


    Steps in the Pearson Product-Moment Correlation Method

    1. Find the X and Y scores, where X is the respondent's total score and Y is the item score.

    Sample scores on item no. 1 in a 75-item test (n = 10):

    Respondents    X    Y
    A             30    4
    B             43    5
    C             53    3
    D             45    4
    E             70    2
    F             45    3
    G             68    4
    H             48    5
    I             38    2
    J             45    4
    TOTAL        485   36

    2. Square all the X and Y scores.
       (Extend the table with X² and Y² columns. For these data, the totals are ΣX² = 24905 and ΣY² = 140.)


    3. Multiply each X by its Y.
       (Add an XY column to the table and get its total, ΣXY.)

    4. Given the above data, compute the Pearson r.

    rxy = [nΣXY - (ΣX)(ΣY)] / √{[nΣX² - (ΣX)²][nΣY² - (ΣY)²]}

    where: rxy = correlation between X and Y
           ΣX  = sum of total scores
           ΣY  = sum of item scores
           ΣXY = sum of the products of X and Y
           ΣX² = sum of the squared total scores
           ΣY² = sum of the squared item scores

    A significant coefficient reflects a good item, while an insignificant coefficient reflects a poor item. Most researchers consider a coefficient of .30 and above as indicating a good item.

    To interpret the correlation coefficient (r) values obtained, the following classification may be applied:

    +.00 - +.20  = negligible correlation
    +.21 - +.40  = low or slight correlation
    +.41 - +.70  = marked or moderate correlation
    +.71 - +.90  = high relationship
    +.91 - +.99  = very high correlation
    +1.00        = perfect correlation
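    Applying the formula to the sample data above gives a concrete check of the arithmetic (a sketch; only the X and Y values are taken from the sample table):

    ```python
    from math import sqrt

    # Pearson product-moment item analysis sketch:
    # X = total test score, Y = score on item no. 1 (n = 10).
    x = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]
    y = [4, 5, 3, 4, 2, 3, 4, 5, 2, 4]
    n = len(x)

    sx, sy = sum(x), sum(y)                 # 485 and 36, as in the table
    sxy = sum(a * b for a, b in zip(x, y))  # sum of the XY products
    sx2 = sum(a * a for a in x)             # 24905, as in the table
    sy2 = sum(b * b for b in y)             # 140, as in the table

    r = (n * sxy - sx * sy) / sqrt((n * sx2 - sx**2) * (n * sy2 - sy**2))
    print(round(r, 2))  # -0.24: below .30, so item no. 1 would be judged poor
    ```

    For these data the item-total correlation is actually negative, so under the ".30 and above" rule item no. 1 would be rejected or revised.
    
    
    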

    nxy (x) (y)

    [nx2

    (x)2] [ny

    2 (y)

    2]
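    The four steps above can be checked with a short script. This is an illustrative sketch (not part of the source material) using the ten sample respondents:

```python
# Item-analysis sketch: Pearson r between total test scores (X) and the
# scores on item no. 1 (Y), using the raw-score formula from the text.
from math import sqrt

X = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]  # total scores
Y = [4, 5, 3, 4, 2, 3, 4, 5, 2, 4]            # item scores

n = len(X)
sum_x, sum_y = sum(X), sum(Y)
sum_x2 = sum(x * x for x in X)                # 24905
sum_y2 = sum(y * y for y in Y)                # 140
sum_xy = sum(x * y for x, y in zip(X, Y))

r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)
print(round(r, 2))  # -0.24
```

    The resulting r of about −.24 falls below the .30 cut-off (and is negative), so item no. 1 in this sample would be judged a poor item.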


    POINT-BISERIAL CORRELATION METHOD

    This is applied to tests with a dichotomous scoring system (yes/no, right/wrong, improved/not improved). Unlike the Pearson Product-Moment method, the Y criterion is scored either 1 or 0.
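    As a sketch with hypothetical data (the formula below is the standard point-biserial formula, which the text does not spell out), the coefficient can be computed from the mean total scores of those who passed and failed the item:

```python
# Point-biserial correlation sketch, HYPOTHETICAL data: ten total scores (X)
# and a dichotomously scored item (Y: right = 1, wrong = 0).
# r_pb = (Mp - Mq) / s_x * sqrt(p * q), where Mp and Mq are the mean total
# scores of pupils who passed / failed the item, s_x is the population
# standard deviation of all total scores, and p, q are the proportions
# passing and failing.
from math import sqrt
from statistics import mean, pstdev

X = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]  # total scores
Y = [0, 1, 1, 1, 1, 0, 1, 1, 0, 1]            # item scores (1 = right)

passed = [x for x, y in zip(X, Y) if y == 1]
failed = [x for x, y in zip(X, Y) if y == 0]
p = len(passed) / len(X)
q = 1 - p

r_pb = (mean(passed) - mean(failed)) / pstdev(X) * sqrt(p * q)
print(round(r_pb, 2))
```

    Numerically this equals the Pearson r computed directly between X and the 0/1 item scores, which is why the point-biserial is treated as a special case of the Pearson method.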

    USING TWO OR MORE TECHNIQUES

    Basically, this is a combination of two or more item analysis techniques. Although item analysis is laborious, some researchers play safe by going through this process. This is done to ensure a more accurate quantitative judgment.

    STEP V: SECOND TRIAL RUN OR FINAL TEST ADMINISTRATION

    More often than not, the second trial run becomes the final run. This means that for the second trial run one may administer the draft resulting from the item analysis to one's final sample.

    Necessary adjustments can still be made before finally administering the instrument to the final sample.

    STEP VI: EVALUATION OF THE TEST

    After the final run, the test can now be evaluated statistically for its final validity and reliability.

    6.1 Evaluation of reliability

    THE SPLIT-HALF RELIABILITY

    The most common technique for evaluating reliability is the odd-even split-half technique. This is done by splitting the test into two, with the odd-numbered items as one half and the even-numbered items as the other.

    Through the use of the Pearson Product-Moment Correlation, the reliability of half of the instrument can be determined. A reliability coefficient of this type is often called a coefficient of internal consistency.

    Through the use of the Spearman-Brown Prophecy Formula, the reliability of the entire instrument can be obtained.

    r11 = 2r / (1 + r)

    where: r11 = reliability of the whole test

    r = reliability of the half test

    What would be the reliability of the whole test if the computed coefficient from

    the odd-even method is r = .63?
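    One way to check the answer is a two-line sketch (the .63 value is the example figure from the question above):

```python
# Spearman-Brown Prophecy Formula: reliability of the whole test from the
# reliability of the half test, r11 = 2r / (1 + r).
def spearman_brown(r_half: float) -> float:
    return 2 * r_half / (1 + r_half)

# Worked answer for a half-test reliability of .63:
print(round(spearman_brown(0.63), 2))  # 0.77
```

    So a half-test coefficient of .63 steps up to a whole-test reliability of about .77.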

    The Kuder-Richardson Formula 20 can also be used to determine the reliability of the entire test, and at the same time it avoids the problem that may arise in using the Spearman-Brown Prophecy Formula.

    Steps in Kuder-Richardson Formula 20


    1. Check the test by giving 1 for every correct answer and 0 for every wrong answer, and get each item's frequency (f).

    ITEMS   A  B  C  D  E  F  G  H  I  J  K  L  M  N    f
    1       1  1  1  1  1  1  1  1  1  1  1  1  0  0   12
    2       1  1  1  1  1  1  1  1  1  1  1  1  0  0   12
    3       1  1  1  1  1  1  1  1  1  1  1  0  0  0   11
    4       1  1  1  1  1  1  1  1  1  1  0  0  0  0   10
    5       1  1  1  1  1  1  1  1  1  1  0  0  0  0   10
    6       1  1  1  1  1  1  1  1  1  1  0  0  0  0   10
    7       1  1  1  1  1  1  1  1  1  0  0  0  0  0    9
    8       0  1  1  1  1  1  0  1  1  0  1  0  0  0    8
    9       0  1  1  1  1  1  1  1  0  0  0  0  0  0    8
    10      0  0  1  0  0  1  1  0  1  0  0  0  0  0    4
    total   7  9 10  9  9 10  9  9  9  6  4  2  0  0

    2. Find the proportion passing each item (pi) and then the proportion failing each item (qi). pi is computed by dividing the number of respondents who got the correct answer by the total number of respondents, while qi is computed by subtracting the computed pi from 1.

    pi = (no. of students with the correct answer) / (total number of respondents)

    qi = 1 − pi

    ITEMS    f    pi    qi
    1       12
    2       12
    3       11
    4       10
    5       10
    6       10
    7        9
    8        8
    9        8
    10       4
    total


    3. Multiply pi and qi.

    ITEMS    f    pi    qi    piqi
    1       12
    2       12
    3       11
    4       10
    5       10
    6       10
    7        9
    8        8
    9        8
    10       4
    total                   1.9509

    4. Compute the variance (s²) of the instrument:

    x̄ = Σx / n

    s² = Σ(x − x̄)² / (n − 1)

    RESPONDENTS    x    (x − x̄)    (x − x̄)²
    A              7
    B              9
    C             10
    D              9
    E              9
    F             10
    G              9
    H              9
    I              9
    J              6
    K              4
    L              2
    M              0
    N              0
    total

    5. Compute the Kuder-Richardson Formula 20:

    rtt = [k / (k − 1)] × [1 − Σpiqi / s²]
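    Assuming the response matrix is as transcribed above, the five steps can be carried out in a short script (an illustrative sketch, not part of the source material):

```python
# KR-20 sketch, following steps 1-5 with the 10-item x 14-pupil response
# matrix shown above (rows = items, columns = respondents A-N).
matrix = [
    [1,1,1,1,1,1,1,1,1,1,1,1,0,0],
    [1,1,1,1,1,1,1,1,1,1,1,1,0,0],
    [1,1,1,1,1,1,1,1,1,1,1,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,1,0,0,0,0],
    [1,1,1,1,1,1,1,1,1,0,0,0,0,0],
    [0,1,1,1,1,1,0,1,1,0,1,0,0,0],
    [0,1,1,1,1,1,1,1,0,0,0,0,0,0],
    [0,0,1,0,0,1,1,0,1,0,0,0,0,0],
]
k = len(matrix)       # number of items
n = len(matrix[0])    # number of respondents

# Steps 2-3: pi = f/n, qi = 1 - pi, summed products pi*qi
sum_pq = sum((f / n) * (1 - f / n) for f in (sum(row) for row in matrix))

# Step 4: variance of the respondents' total scores (n - 1 denominator)
totals = [sum(row[j] for row in matrix) for j in range(n)]
mean_x = sum(totals) / n
s2 = sum((x - mean_x) ** 2 for x in totals) / (n - 1)

# Step 5: KR-20
rtt = (k / (k - 1)) * (1 - sum_pq / s2)
print(round(rtt, 2))  # 0.95
```

    The resulting rtt of about .95 indicates a highly reliable test for this sample.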


    THE TEST-RETEST RELIABILITY

    This is also called the coefficient of stability. To calculate the coefficient, the test is administered twice to the same sample with a given time interval.

    The Pearson r is then calculated to determine the reliability of the instrument. The most critical problem in this technique is determining the correct time interval between the two testings; generally, two weeks or so.

    PARALLEL-FORM OR ALTERNATE-FORM RELIABILITY

    The coefficient of equivalence is computed by administering two parallel or equivalent forms of the test to the same group of individuals.

    This technique is also referred to as the method of equivalent forms. The coefficient obtained is also called the coefficient of equivalence.

    6.2 Evaluation of validity

    CRITERION-RELATED VALIDITY

    Criterion-related validity is a very common type of validity, and it is primarily statistical. It is a correlation between a set of scores (or some other predictor) and an external measure. This external measure is called the criterion.

    A correlation coefficient is then run between the two sets of measurements. In actual practice, several predictors are used; a multiple r would then be computed between these predictors and the criterion.

    The difficulty usually met in this type of validity is in selecting or judging which criterion should be used to validate the measure at hand.

    It is also called predictive validity.

    CONSTRUCT VALIDITY

    Construct validity is determined by investigating the psychological qualities, traits, or factors measured by a test.

    It is often called concept validity because it is concerned not so much with a high validity coefficient as with the theory and concept behind the test. Likewise, it involves discovering positive correlations between and among the variables/constructs that define the concept.


    Discussion Point 5:

    Constructing Objective Test Items: Multiple-Choice Form

    Objective test items are not limited to the measurement of simple learning outcomes. The multiple-choice item can measure at both the knowledge and understanding levels and is also free of many of the limitations of other forms of objective items.

    The multiple-choice item is generally recognized as the most widely applicable and useful type of objective test item.

    It more effectively measures many of the simple learning outcomes measured by the short-answer item, the true-false item, and the matching exercise.

    It measures a variety of the more complex outcomes in the knowledge, understanding and application areas.

    This flexibility, plus the higher-quality items usually found in the multiple-choice form, has led to its extensive use in achievement testing.

    CHARACTERISTICS OF MULTIPLE-CHOICE ITEMS

    A multiple-choice item consists of a problem and a list of suggested solutions.

    The problem may be stated as a direct question or an incomplete statement and is called the stem of the item.

    The list of suggested solutions may include words, numbers, symbols, or phrases; these are called alternatives (also called choices or options).

    The pupil is typically requested to read the stem and the list of alternatives and to select the one correct, or best, alternative.

    The correct alternative in each item is called simply the answer, and the remaining alternatives are called distracters (also called decoys or foils). These incorrect alternatives receive their name from their intended function: to distract those pupils who are in doubt about the correct answer.

    Whether to use a direct question or an incomplete statement in the stem depends on several factors.

    The direct-question form is easier to write, is more natural for younger pupils, and is more likely to present a clearly formulated problem.

    On the other hand, the incomplete statement is more concise, and if skillfully phrased, it too can present a well-defined problem.

    A common procedure is to start each stem as a direct question, shifting to the incomplete-statement form only when the clarity of the problem can be retained and greater conciseness achieved.


    Examples: Direct-question form

    Which one of the following cities is the capital of the Philippines?

    a. Manila
    b. Parañaque
    c. Pasay
    d. Taguig

    Incomplete-statement form

    The capital of the Philippines is ______.

    a. Manila
    b. Parañaque
    c. Pasay
    d. Taguig

    Examples: Best-answer type

    Which one of the following factors contributed most to the selection of Manila as the capital of the Philippines?

    a. Central location
    b. Good climate
    c. Good highways
    d. Large population

    The best-answer type of multiple-choice item tends to be more difficult than the correct-answer type. This is due partly to the finer discriminations called for and partly to the fact that such items are used to measure more complex learning outcomes. The best-answer type is especially useful for measuring learning outcomes that require the understanding, application or interpretation of factual information.

    USES OF MULTIPLE-CHOICE ITEMS

    The multiple-choice item is the most versatile type of test item available. It can measure a variety of learning outcomes from simple to complex, and it is adaptable to most types of subject-matter content. The uses described here show only its function in measuring some of the more common learning outcomes in the knowledge, understanding and application areas; more complex outcomes can be measured using modified forms of the multiple-choice item.

    Measuring Knowledge Outcomes

    Knowledge of Terminology. A simple but basic learning outcome measured by the multiple-choice item is knowledge of terminology. For this purpose, pupils can be requested to show their knowledge of a particular term by selecting a word that has the same meaning as the given term or by choosing a definition of the term. Special uses of a term can also be measured by having pupils identify the meaning of the term when used in context.


    Knowledge of Specific Facts. Another learning outcome basic to all school subjects is knowledge of specific facts. It is important in its own right, and it provides a necessary basis for developing understanding, thinking skills, and other complex learning outcomes. Multiple-choice items designed to measure specific facts can take many different forms, but questions of the who, what, when, and where variety are most common.


    Knowledge of Principles. Knowledge of principles is also an important learning outcome in most school subjects. Multiple-choice items can be constructed to measure knowledge of principles as easily as those designed to measure knowledge of specific facts. The items appear a bit more difficult, but this is because principles are more complex than isolated facts.

    Knowledge of Methods and Procedures. Another common learning outcome readily adaptable to the multiple-choice form is knowledge of methods and procedures. In some cases we might want to measure knowledge of procedures before we permit pupils to practice in a particular area (e.g., laboratory procedures). In other cases, knowledge of methods and procedures may be an important learning outcome in its own right (e.g., knowledge of governmental procedures).


    Measuring Outcomes at the Understanding and Application Levels

    Many teachers limit the use of multiple-choice items to the knowledge area because they believe that all objective-type items are restricted to the measurement of relatively simple learning outcomes. Although this is true of most of the other types of objective items, the multiple-choice item is especially adaptable to the measurement of more complex learning outcomes.

    In reviewing the following items, it is important to keep in mind that such items measure learning outcomes beyond factual knowledge only if the applications and interpretations are new to the pupils. Any specific applications or interpretations of knowledge can, of course, be taught directly to the pupils as any other fact is taught. When this is done, and the test item contains the same problem situations and solutions used in teaching, it is obvious that the pupils can be given credit for no more than mere retention of factual knowledge. To measure understanding and application, an element of novelty must be included in the test items.


    Ability to Identify Application of Facts and Principles. A common method of determining whether pupils' learning has gone beyond the mere memorization of a fact or principle is to ask them to identify its correct application in a situation that is new to them.

    Ability to Interpret Cause-and-Effect Relationships. Understanding can frequently be measured by asking pupils to interpret various relationships among facts. One of the most important relationships in this regard, and one common to most subject-matter areas, is the cause-and-effect relationship. Understanding of such relationships can be measured by presenting pupils with a specific cause-and-effect relationship and asking them to identify the reason that best accounts for it.

    Ability to Justify Methods and Procedures. Another phase of understanding important in various subject-matter areas is concerned with methods and procedures. A pupil might know the correct method or sequence of steps in carrying out a procedure without being able to explain why it is the best method or sequence of steps. At the understanding level we are interested in the pupil's ability to justify the use of a particular method or procedure. This can be measured with multiple-choice items by asking the pupils to select the best of several possible explanations of a method or procedure.


    Advantages and Limitations of Multiple-Choice Items

    The multiple-choice item is one of the most widely applicable test items for measuring achievement. It can effectively measure various types of knowledge and complex learning outcomes. In addition to this flexibility, it is free from some of the shortcomings characteristic of the other item types. The ambiguity and vagueness that frequently are present in the short-answer item are avoided because the alternatives better structure the situation. The short-answer item can be answered in many different ways, but the multiple-choice item restricts the pupil's response to a specific area.

    Poor: Jose Rizal was born in _______.

    Better: Jose Rizal was born in

    A. Cavite
    B. Laguna
    C. Manila
    D. Quezon

    One advantage of the multiple-choice item over the true-false item is that pupils

    cannot receive credit for simply knowing that a statement is incorrect; they must also know

    what is correct.

    T F The degree to which a test measures what it purports to measure is reliability.

    The degree to which a test measures what it purports to measure is

    A. Objectivity

    B. Reliability

    C. Standardization

    D. Validity

    Another advantage of the multiple-choice item over the true-false item is the greater reliability per item. Because the number of alternatives is increased from two to four or five, the opportunity for guessing the correct answer is reduced, and reliability is correspondingly increased. The effect of increasing the number of alternatives for each item is similar to that of increasing the length of the test.

    Using the best-answer type of multiple-choice item also circumvents a difficulty associated with the true-false item: obtaining statements that are true or false without qualification. This makes it possible to measure learning outcomes in the numerous subject-matter areas in which solutions to problems are not absolutely true or false but vary in degree of appropriateness (e.g., best method, best reason, best interpretation).

    Another advantage of the multiple-choice item over the matching exercise is that the need for homogeneous material is avoided. The matching exercise, which is essentially a modified form of the multiple-choice item, requires a series of related ideas to form the list of premises and alternative responses. In many content areas it is difficult to obtain enough homogeneous material to prepare effective matching exercises.


    Two other desirable characteristics of the multiple-choice item are worthy of mention. First, it is relatively free from response set; that is, pupils generally do not favor a particular alternative when they do not know the answer. Second, using a number of plausible alternatives makes the results amenable to diagnosis: the kinds of incorrect alternatives pupils select provide clues to factual errors and misunderstandings that need correction.

    The wide applicability of the multiple-choice item, plus its advantages, makes it easier to construct high-quality test items in this form than in any of the other forms. This does not mean that good multiple-choice items can be constructed without effort. But for a given amount of effort, multiple-choice items will tend to be of higher quality than short-answer, true-false, or matching-type items in the same area.

    Despite its superiority, the multiple-choice item does have limitations.

    1. As with all other paper-and-pencil tests, it is limited to learning outcomes at the verbal level. The problems presented to pupils are verbal problems, free from the many irrelevant factors present in natural situations. Also, the applications pupils are asked to make are verbal applications, free from the personal commitment necessary for application in natural situations. In short, the multiple-choice item, like other paper-and-pencil tests, measures whether the pupil knows or understands what to do when confronted with a problem situation, but it cannot determine how