Comprehensive Material for Measurement and Evaluation (Transcript)
7/30/2019
TCP.TIP_Rungduin
Technological Institute of the Philippines
College of Education
Center for Teaching Excellence
Teaching Certificate Program
Introduction to Measurement and Evaluation
Discussion Point 1:
Introduction to Measurement and Evaluation
The Necessity of Evaluation in Teaching
To teach without evaluation is a contradiction in terms.
By its very nature, teaching requires innumerable judgments to be made by the
teacher, the school administrators, parents and the pupils themselves.
Teachers are obligated to assemble, analyze, and utilize whatever evidence can be
brought forward to make the most effective decisions (evaluations) for the benefit of
the students in their classes. Among these decisions are the following:
1. The nature of the subject matter that should be taught at each grade level;
2. Which aspects of the curriculum need to be eliminated, modified or included
as a function of the current level of student knowledge and attitudes;
3. How instruction can be improved to ensure that students learn;
4. How pupils should be organized within the classroom to maximize learning;
5. How teachers can tell if students are able to retain knowledge;
6. Which students are in need of remedial or advanced work;
7. Which students will benefit from placement in special programs for the
mentally retarded, emotionally disturbed, or physically handicapped;
8. Which children should be referred to the school counselor, psychologist,
speech therapist, nurse or social worker; and
9. How each pupil's progress can be explained most clearly and effectively.
The Relationship between Teaching and Evaluation
The purpose of teaching is to improve the knowledge, behaviors, and attitudes of
students.
Teachers want students to increase the amount of knowledge they possess and to
decrease the amount of forgetting.
Teaching consists of at least four interrelated elements (Glaser and DeCecco,
1968):
1. Developing instructional objectives
Teachers need to know what they are attempting to accomplish and
cannot leave such matters to chance. Students improve when they make
progress toward clearly defined objectives.
Clearly defined instructional objectives serve at least two roles:
a. They help the teacher recognize student improvement by clarifying
what it is the teacher wants to accomplish; and
b. They imply the way in which the goals will be evaluated.
2. Evaluating the student's entering behavior
Individual differences (academic achievement, sexual preference, social
class, notes from previous teachers, former school or former location,
physical characteristics, knowledge of an older brother or sister, and family
background)
Teaching methods are effective only if they are considered in relationship
to the background of the student.
3. Selecting an instructional strategy
If student background is important in selecting an instructional strategy,
the teacher will have to become familiar with the procedures used to
measure and evaluate those backgrounds.
4. Providing for an evaluation of the student's performance
Performance assessment may suggest that a program is ineffective
because the objectives are unrealistic or because the entering behavior
was not considered adequately.
Evaluation can determine whether instructional objectives have been
met; it provides evidence that students have the necessary entering
behavior, and it helps to evaluate the adequacy of an instructional
strategy.
Test: an instrument or systematic procedure for measuring a sample of behavior.
(Answers the question "How well does the individual perform, either in comparison with
others or in comparison with a domain of performance tasks?")
Measurement: the process of obtaining a numerical description of the degree to which an
individual possesses a particular characteristic. (Answers the question "How much?")
Evaluation: the systematic process of collecting, analyzing and interpreting information to
determine the extent to which pupils are achieving instructional objectives. (Answers the
question "How good?")
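The three definitions above can be kept apart with a small numerical illustration. The sketch below is not from the source; the five-item answer key and the 80% mastery cutoff are invented for the example. The test is the instrument, measurement is the scoring, and evaluation is the value judgment applied to the score:

```python
# Illustrative sketch (hypothetical answer key and cutoff, not from the source).
answer_key = ["B", "D", "A", "C", "B"]   # a 5-item test: the instrument

def measure(responses, key):
    """Measurement: obtain a numerical description (the raw score)."""
    return sum(r == k for r, k in zip(responses, key))

def evaluate(score, total, mastery_cutoff=0.8):
    """Evaluation: a value judgment ('How good?') against a criterion."""
    return "mastery" if score / total >= mastery_cutoff else "non-mastery"

responses = ["B", "D", "A", "C", "A"]    # one pupil's answers
score = measure(responses, answer_key)
print(score, evaluate(score, len(answer_key)))  # prints: 4 mastery
```

The same score could instead be judged against the class distribution, which is the norm-referenced interpretation discussed later in this material.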
Measurement
Measurement involves the assigning of numbers to attributes or characteristics of
persons, objects, or events according to explicit formulations or rules.
Educational measurement requires the quantification of attributes according to
specified rules.
Characteristics of Scales of Measurement (ordered from least to most complex)

Nominal
  Definition: scale involving the classification of objects, persons, or events
    into discrete categories
  Uses and examples: plate numbers, Social Security numbers, names of people,
    places, objects, numbers to identify athletes
  Limitations: cannot specify quantitative differences among categories

Ordinal
  Definition: scale involving ranking of objects, persons, traits or abilities
    without regard to equality of differences
  Uses and examples: letter grades (ratings from excellent to failing), military
    ranks, order of finishing a test
  Limitations: restricted to specifying relative differences without regard to
    absolute amount of difference

Interval
  Definition: scale having equal differences between successive categories
  Uses and examples: temperature, grades, scores
  Limitations: ratios are meaningless; the zero point is arbitrarily defined

Ratio
  Definition: scale having an absolute zero and equal intervals
  Uses and examples: distance, weight, time required to learn a skill or subject
  Limitations: none, except that few educational variables have ratio
    characteristics
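The limitation noted for interval scales, that ratios are meaningless when the zero point is arbitrary, can be checked with a little arithmetic. The sketch below is illustrative only; the temperatures and weights are invented for the example:

```python
# Why ratios fail on an interval scale: a "twice as much" claim does not
# survive a change of units when the zero point is arbitrary.

def c_to_f(c):
    """Celsius to Fahrenheit: an interval-scale unit change (shifts the zero)."""
    return c * 9 / 5 + 32

# 20 degrees C looks like "twice" 10 degrees C, but the same temperatures in
# Fahrenheit are 68 and 50, which do not stand in a 2:1 ratio.
print(c_to_f(20) / c_to_f(10))      # prints: 1.36

def kg_to_lb(kg):
    """Kilograms to pounds: a ratio-scale unit change (zero stays zero)."""
    return kg * 2.20462

# On a ratio scale, the ratio is preserved under any change of units.
print(kg_to_lb(20) / kg_to_lb(10))  # prints: 2.0
```

This is why statements like "20 kg is twice 10 kg" are meaningful while "20 degrees is twice as hot as 10 degrees" is not.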
Testing
A test may be defined as a task or series of tasks used to obtain systematic
observations presumed to be representative of educational or psychological traits
and/or attributes.
Typically, tests require examinees to respond to items or tasks from which the
examiner infers something about the attribute being measured.
Tests and other measurement instruments serve a variety of purposes:
1. Selection. To determine which persons will be admitted to or denied
admittance to an institution or organization.
2. Placement. To help individuals determine which of several programs they will
pursue.
3. Diagnosis and remediation. To help discover the nature of the specific
problems individuals may have.
4. Feedback
5. Motivation and guidance of learning
6. Program and curriculum improvement
7. Theory development
Tests may be classified by how they are administered (individually or in groups), how
they are scored (objectively or subjectively), what sort of response they emphasize
(power or speed), what type of response subjects make (performance or pencil-and-
paper), what they attempt to measure (sample or sign), and the nature of the groups
being compared (teacher-made or standardized).
1. Individual and group tests
Some tests are administered on a one-to-one basis during careful oral
questioning (e.g., individual intelligence tests), whereas others can be
administered to a group of individuals.
2. Objective and subjective tests
An objective test is one on which equally competent scorers will obtain
the same scores (e.g., multiple-choice tests), whereas a subjective test is
one on which the scores are influenced by the opinion or judgment of the
person doing the scoring (e.g., essay tests).
3. Power and speed tests
A speed test measures the number of items that an individual can
complete in a given time, whereas a power test measures the level of
performance under ample time conditions. Power test items usually are
arranged in order of increasing difficulty.
Relationship between power and speed tests
            Power            Speed
Time        Generous         Limited
Difficulty  Relatively hard  Relatively easy
(Tests falling between the two extremes are partially speeded.)
4. Performance and paper-and-pencil tests
Performance tests require examinees to perform a task rather than
answer questions. They are usually administered individually so that the
examiner can count the number of errors committed by the student and
can measure how long each task takes.
Pencil-and-paper tests are almost always given in a group situation in
which students are asked to write their answers on paper.
5. Sample and sign tests
Sample tests measure a sample of a student's total behavior.
Sign tests are administered to distinguish one group of individuals from
another.
6. Teacher-made and standardized tests
Teacher-made tests are constructed by teachers for use within their own
classrooms. Their effectiveness depends on the skill of the teacher and
his or her knowledge of test construction.
Standardized tests are constructed by test specialists working with
curriculum experts and teachers. They are standardized in that they have
been administered and scored under standard and uniform conditions so
that results from different classes and different schools may be compared.
7. Mastery and survey tests
Some achievement tests measure the degree of mastery of a limited set
of specific learning outcomes, whereas others measure pupils' general
level of achievement over a broad range of outcomes.
8. Supply and selection tests
Some tests require examinees to supply the answer (e.g., essay tests),
whereas others require them to select the correct response from the set
of alternatives (e.g., multiple-choice tests).
Evaluation
Evaluation is a process through which a value judgment or decision is made from a
variety of observations and from the background and training of the evaluator.
General Principles of Evaluation
1. Determining and clarifying what is to be evaluated always has priority in the
evaluation process.
2. Evaluation techniques should be selected according to the purpose to be
served.
3. Comprehensive evaluation requires a variety of evaluation techniques.
4. Proper use of evaluation techniques requires an awareness of both their
limitations and their strengths.
5. Evaluation is a means to an end, not an end in itself.

Reasons for Using Tests and Other Measurements
Basis for classification: Nature of Measurement

  Maximum performance
    Function: determines what individuals can do when performing at their best
    Illustrative instruments: aptitude tests, achievement tests

  Typical performance
    Function: determines what individuals will do under natural conditions
    Illustrative instruments: attitude, interest and personality inventories;
    observational techniques; peer appraisal

Basis for classification: Use in Classroom Instruction

  Placement
    Function: determines prerequisite skills, degree of mastery of course
    objectives, and/or best mode of learning
    Illustrative instruments: readiness tests, aptitude tests, pretests on
    course objectives, self-report inventories, observational techniques

  Formative
    Function: determines learning progress, provides feedback to reinforce
    learning, and corrects learning errors
    Illustrative instruments: teacher-made mastery tests, custom-made tests
    from test publishers, observational techniques

  Diagnostic
    Function: determines causes (intellectual, physical, emotional,
    environmental) of persistent learning difficulties
    Illustrative instruments: published diagnostic tests, teacher-made
    diagnostic tests, observational techniques

  Summative
    Function: determines end-of-course achievement for assigning grades or
    certifying mastery of objectives
    Illustrative instruments: teacher-made survey tests, performance rating
    scales, product scales

Basis for classification: Method of Interpreting Results

  Criterion referenced
    Function: describes pupil performance according to a specified domain of
    clearly defined learning tasks (e.g., adds single-digit whole numbers)
    Illustrative instruments: teacher-made mastery tests, custom-made tests
    from test publishers, observational techniques

  Norm referenced
    Function: describes pupil performance according to relative position in
    some known group (e.g., ranks tenth in a classroom group of 30)
    Illustrative instruments: standardized aptitude and achievement tests,
    teacher-made survey tests, interest inventories, adjustment inventories
Motivation and Guidance of Learning
Tests can be used to motivate and guide students to learn, and because pupils study
for the type of examination they expect to take, it is the teacher's responsibility to
construct examinations that measure important course objectives.
Program and Curriculum Improvement: Formative and Summative Evaluations
1. Formative Evaluation. Formative evaluation is used to monitor learning
progress during instruction and provide continuous feedback to both pupil
and teacher concerning learning successes and failures.
2. Summative Evaluation. Summative evaluation typically comes at the end of a
course (or unit) of instruction. It is designed to determine the extent to which
the instructional objectives have been achieved and is used primarily for
assigning course grades or certifying pupil mastery of the intended learning
outcomes.
Norm-Referenced and Criterion-Referenced Measurement
Evaluation procedures can also be classified according to how the results are
interpreted. There are two basic ways of interpreting pupil performance on tests and other
evaluation instruments. One is to describe the performance in terms of the relative position
held in some known group (e.g., typed better than 90 percent of the class members). The
other is to directly describe the specific performance that was demonstrated (e.g., typed 40
words per minute without error). The first type of interpretation is called norm referenced;
the second criterion referenced. Both types of interpretation are useful.
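The two interpretations of the typing example can be sketched numerically. In this illustrative Python snippet (not from the source), the class scores, the 40-item test length, and the pupil's score are all invented for the example:

```python
# One raw score, two interpretations (hypothetical class data).
scores = [22, 25, 28, 30, 31, 33, 35, 36, 38, 40]  # class scores on a 40-item test
pupil_score = 36

# Norm-referenced: describe relative position in a known group.
below = sum(s < pupil_score for s in scores)
percentile = 100 * below / len(scores)
print(f"scored higher than {percentile:.0f}% of the class")

# Criterion-referenced: describe performance against the defined task domain.
percent_correct = 100 * pupil_score / 40
print(f"answered {percent_correct:.0f}% of the domain items correctly")
```

Note that the norm-referenced statement would change if the group changed, while the criterion-referenced statement would not; this is why the former requires a clearly defined group and the latter a clearly defined achievement domain.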
Some Basic Terminologies
1. Norm-referenced test: a test designed to provide a measure of performance that
is interpretable in terms of an individual's relative standing in some known group.
2. Criterion-referenced test: a test designed to provide a measure of performance
that is interpretable in terms of a clearly defined and delimited domain of learning
tasks.
3. Objective-referenced test: a test designed to provide a measure of performance
that is interpretable in terms of a specific instructional objective. (Many objective-
referenced tests are called criterion-referenced tests by their developers.)
Other terms that are less often used have meanings similar to criterion referenced:
content referenced, domain referenced, and universe referenced.
Comparison of Norm-Referenced Tests (NRTs) and Criterion-Referenced Tests (CRTs)
Common Characteristics of NRTs and CRTs
1. Both require specification of the achievement domain to be measured.
2. Both require a relevant and representative sample of test items.
3. Both use the same types of test items.
4. Both use the same rules for item writing (except for item difficulty).
5. Both are judged by the same qualities of goodness (validity and reliability).
6. Both are useful in educational measurement.
Differences Between NRTs and CRTs (but the difference is only a matter of emphasis)
1. NRT typically covers a large domain of learning tasks, with just a few items
measuring each specific task.
CRT typically focuses on a delimited domain of learning tasks, with a relatively large
number of items measuring each specific task.
2. NRT emphasizes discrimination among individuals in terms of relative level of
learning.
CRT emphasizes description of what learning tasks individuals can and cannot
perform.
3. NRT favors items of average difficulty and typically omits easy items.
CRT matches item difficulty to learning tasks, without altering item difficulty or
omitting easy items.
4. NRT is used primarily (but not exclusively) for survey testing.
CRT is used primarily (but not exclusively) for mastery testing.
5. NRT interpretation requires a clearly defined group.
CRT interpretation requires a clearly defined and delimited achievement domain.
Strictly speaking, norm referenced and criterion referenced refer only to the method of
interpreting the results. These types of interpretation are likely to be most meaningful and
useful, however, when tests (and other evaluation instruments) are specifically designed for
the type of interpretation to be made. Thus, we can use the terms criterion referenced and
norm referenced as broad categories for classifying tests and other evaluation techniques.
Tests that are specifically built to maximize each type of interpretation have much in
common, and it is impossible to determine the type of test from examining the
test itself. Rather, it is in the construction and use of the tests that the differences can be
noted. A key feature in constructing norm-referenced tests is the selection of items of
average difficulty and the elimination of items that all pupils are likely to answer correctly.
This procedure provides a wide spread of scores so that discrimination among pupils at
various levels of achievement can be more reliably made. This is useful for decisions based
on relative achievement, such as selection, grouping and relative grading. In contrast, a key
feature in constructing criterion-referenced tests is the selection of items that are directly
relevant to the learning outcomes to be measured, without regard to the items' ability to
discriminate among pupils. If the learning tasks are easy, the test items will be easy, and if
the learning tasks are difficult, the test items will be difficult. Here the main purpose is to
describe the specific knowledge and skills that each pupil can demonstrate, which is useful
for planning both group and individual instruction.
Norm-Referenced Test           Combined Type           Criterion-Referenced Test
(discrimination among pupils)  (dual interpretation)   (description of performance)
Other Descriptive Terms
Some of the other terms that are frequently used in describing tests are presented
here as contrasting test types, but some are simply the ends of a continuum (e.g., speed
versus power tests).
1. Informal Versus Standardized Tests. Informal tests are those constructed by
classroom teachers, whereas those designed by test specialists and administered,
scored, and interpreted under standard conditions are called standardized tests.
2. Individual Versus Group Tests. Some tests are administered on a one-to-one basis
during careful oral questioning (e.g., individual intelligence tests), whereas others can
be administered to a group of individuals.
3. Mastery Versus Survey Tests. Some achievement tests measure the degree of
mastery of a limited set of specific learning outcomes, whereas others measure
pupils' general level of achievement over a broad range of outcomes. Mastery tests are
typically criterion referenced, and survey tests tend to be norm referenced, but some
criterion-referenced interpretations are also possible with carefully prepared survey
tests.
4. Supply Versus Selection Tests. Some tests require examinees to supply the answer
(e.g., essay tests), whereas others require them to select the correct response from
the set of alternatives (e.g., multiple-choice tests).
5. Speed Versus Power Tests. A speed test measures the number of items that an
individual can complete in a given time, whereas a power test measures the level of
performance under ample time conditions. Power test items usually are arranged in
order of increasing difficulty.
6. Objective Versus Subjective Tests. An objective test is one on which equally
competent scorers will obtain the same scores (e.g., multiple-choice tests), whereas
a subjective test is one on which the scores are influenced by the opinion or judgment
of the person doing the scoring (e.g., essay tests).
Discussion Point 2:
Preparing Instructional Objectives
Instructional objectives play a key role in the instructional process. When properly
stated, they serve as guides for both teaching and evaluation. A clear description of
the intended outcomes of instruction aids in selecting relevant materials and
methods of instruction, in monitoring pupil learning progress, in selecting or
constructing appropriate evaluation procedures, and in conveying instructional intent
to others.
In preparing instructional objectives, it is possible to focus on different aspects of
instruction.
Educational goal: a general aim or purpose of education that is stated as a broad,
long-range outcome to work toward. Goals are used primarily in policy making and
general program planning (e.g., "Develop proficiency in the basic skills of reading,
writing and arithmetic.")
General instructional objective: an intended outcome of instruction that has been
stated in general enough terms to encompass a set of specific learning outcomes
(e.g., "Comprehends the literal meaning of written material").
Specific learning outcome: an intended outcome of instruction that has been
stated in terms of specific and observable pupil performance (e.g., "Identifies details
that are explicitly stated in a passage."). A set of specific learning outcomes describes
a sample of the types of performance that learners will be able to exhibit when they
have achieved a general instructional objective (also called specific objectives,
performance objectives, behavioral objectives, and measurable objectives).
Pupil performance: any measurable or observable pupil response in the cognitive,
affective, or psychomotor area that is a result of learning.
Dimensions of Instructional Objectives
1. Mastery vs. Developmental Outcomes
Mastery objectives are typically concerned with relatively simple
knowledge and skill outcomes (e.g., adds two single-digit numbers with
sums of ten or less).
Developmental outcomes are concerned with objectives that can never be
fully achieved; they describe varying degrees of pupil progress along a
continuum of development.
2. Ultimate vs. Immediate Objectives
Ultimate objectives are those concerned with the typical performance of
individuals in the actual situations they will face in the future. For
example, good citizenship is reflected in adult life through voting
behavior, interest in community affairs, and the like; safety consciousness
shows up in safe driving and safe work habits and in obeying safety rules
in daily activities.
Immediate objectives should be closely related to the ultimate situation.
For example, can pupils apply basic skills to practical situations? Such
objectives, calling for the application of knowledge and skills, aid in the
transfer of skills to ultimate situations and should be on any list of
objectives.
3. Single-course vs. Multiple-course Objectives
Areas Containing Multiple-Course Objectives
Whether these areas are the shared responsibility of several teachers depends
on the grade level and the school's goals (e.g., in some schools, every teacher
is considered a teacher of basic skills).
Reading    Computer skills    Creativity
Writing    Study skills       Citizenship
Speaking   Library skills     Health
Selection of Instructional Objectives
1. Types of learning outcomes to consider
2. Taxonomy of educational objectives
3. Use of published lists of objectives
4. Review of your own teaching materials and methods
Begin with a Simple Framework: Knowledge, Understanding, Application
Reading
K = Knows vocabulary
U = Reads with comprehension
A = Reads a wide variety of printed materials
Writing
K = Knows the mechanics of writing
U = Understands grammatical principles in writing
A = Writes complete sentences (paragraph, theme)
Math
K = Knows the number system and basic operations.
U = Understands math concepts and processes.
A = Solves math problems accurately and efficiently
Criteria for Selecting Appropriate Objectives
1. Do the objectives include all important outcomes?
2. Are the objectives in harmony with the general goals of the school?
3. Are the objectives in harmony with sound principles of learning?
4. Are the objectives realistic in terms of the pupils' abilities and the time and
facilities available?
Stating the Specific Learning Outcomes
1. Focus on action verbs
Examples:
a. Understands the meaning of terms
1. Defines the terms in own words
2. Identifies the meaning of a term in context
3. Differentiates between proper and improper usage of a term
4. Distinguishes between two similar terms on the basis of meaning
5. Writes an original sentence using the term
b. Demonstrates skill in critical thinking
1. Distinguishes between fact and opinion
2. Distinguishes between relevant and irrelevant information
3. Identifies fallacious reasoning in written material
4. Identifies the limitations of given data
5. Identifies the assumptions underlying conclusions
2. Keep the outcomes free of specific content so that they can be used in
various units of study
Poor: Identifies the last ten presidents of the Psychological Association
of the Philippines.
Better: Identifies important historical figures.
Poor: Identifies the parts of the brain.
Better: Identifies the parts of a given structure.
Discussion Point 3:
Achieving Different Types of Learning Outcomes
Achieving Cognitive Learning
Teaching Fact, Factual Information, and Knowledge
Basic Concepts
1. Fact: something that has happened, an event or an actual state of affairs
2. Factual information: information discriminated by many individuals who share
the same cultural background and accepted as correct or appropriate
3. Information: anything that is discriminated by an individual
4. Knowledge: factual information that is learned initially and then remembered

Types of Knowledge
1. General: knowledge that applies to many different situations
2. Domain-specific: knowledge that pertains to a particular task or subject
3. Declarative: knowledge of verbal information: facts, beliefs, opinions
4. Procedural: knowledge of how a task is performed
5. Conditional: knowing when and why to use declarative and procedural
knowledge

Three Categories of Knowledge
1. Knowledge of specifics: isolated facts remembered separately
2. Knowledge of ways and means: conventions, trends and sequences,
classifications and categories, criteria and methodology
3. Knowledge of abstractions: laws, theories and principles
Application of Principles in Teaching and Learning Factual Information
Principle 1: Organizing learning material through meaningful association
facilitates acquisition of information.
  Application: group items
    o according to common attributes
    o by relationships

Principle 2: Transition from old to new materials facilitates acquisition of
information.
  Applications:
    o organize the material to a higher level of generality
    o use an advance organizer that is more general, abstract, inclusive and
      comparative
    o utilize related prior knowledge of students as advance organizers

Principle 3: Proper sequencing of materials facilitates acquisition of
information.
  Application: order subject matter
    o according to regularity of the structure
    o according to the responses available to the learner
    o according to similarity of different stimuli

Principle 4: Appropriate practice facilitates acquisition of information.
  Applications:
    o provide adequate practice through use of knowledge in situation,
      relationship, distribution of sessions, and review of a small amount of
      material, then at increasingly larger intervals
    o reinforce practice through confirmation of correct responses

Principle 5: Independent evaluation facilitates acquisition of information.
  Application: provide a mechanism for learners to evaluate their own responses.
Teaching Concepts and Principles
Basic Concepts of Concepts and Principles
1. Concept: essentially an idea or an understanding of what something is; a
category used to group similar events, ideas, objects or people; organized
information about the properties of one or more things
2. Principle: a relationship among two or more concepts

Classification of Principles
1. Cause and effect: if-then relationship
2. Probability: prediction on actual sense
3. Correlation: prediction based on a wide range of phenomena
4. Axioms: rules
Instances of Concepts
1. Positive: a specific example of a concept
2. Negative: a non-example of a concept

Attributes of Concepts
1. Learnability: some concepts are more readily learned than others
2. Usability: some concepts can be used more than others
3. Validity: the extent to which experts agree on its attributes
4. Generality: the higher the concept, the more general it is
5. Power: the extent to which a concept facilitates learning of other concepts
6. Structure: internally consistent organization
7. Instance perceptibility: the extent to which instances of the concept can be
sensed
8. Instance numerousness: the number of instances, ranging from one to an
infinite number

Four Levels of Concept Attainment
1. Concrete
2. Identity
3. Classificatory
4. Formal

Four Components in Any Concept Development
1. Name of concept
2. Definition
3. Relevant and irrelevant attributes
4. Examples and non-examples

Simple Procedures in Concept Analysis
1. State attributes and non-attributes
2. Give examples and non-examples
3. Indicate relationships of a concept to other concepts
4. Identify the principles in which the concept is used
5. Use the concept in solving problems
Application of Principles in Teaching and Learning of Concepts and Principles
Principle 1: Awareness of attributes facilitates concept learning.
  Manage instruction by:
    o guiding learners to identify the critical attributes
    o using examples and non-examples from which the attributes of the
      concept can be identified
    o utilizing activities where instances of the concept can be directly
      observed
    o providing for overgeneralizing and undergeneralizing to establish the
      limits of the concept
    o providing the right amount of variation and repetition
    o varying irrelevant dimensions so that the relevant dimensions may be
      identified easily

Principle 2: Correct language for concepts facilitates concept learning.
  Teach relevant names and labels associated with the concept's attributes.

Principle 3: Proper sequencing of instances facilitates concept learning.
  Concept development should proceed:
    o from simple to complex examples
    o from concrete to abstract
    o from parts to whole
    o from whole to parts
  Present:
    o a larger number of instances of one concept
    o instances of high dominance rather than low dominance
    o positive and negative instances of the concept rather than all positive
      or all negative instances
  Present instances of the concept simultaneously rather than successively.

Principle 4: Guided student discovery facilitates concept learning.
  Guide students' discovery of concepts through:
    o encounters with real and meaningful problems
    o gathering accurate information
    o a responsive environment
    o prompt and accurate feedback

Principle 5: Concept application facilitates concept learning.
  Conduct meaningful applications of concepts by:
    o drawing on the learners' experiences
    o observing related situations
    o encountering life-like situations

Principle 6: Independent evaluation facilitates concept learning.
  Arrange for independent evaluation by:
    o creating an attitude of seeking and searching
    o arranging for self-evaluation of the adequacy of one's concepts
    o assisting learners to evaluate their concepts and their methods of
      evaluating them
Developing Problem Solving Abilities
Basic Concepts
1. Problem - a felt difficulty or a question for which a solution may be found only by a process of thinking
2. Thinking - the recall and reorganization of facts and theories that occur when the individual is faced with obstacles and problems
3. Reasoning - productive thinking in which previous experiences are organized or combined in new ways to solve problems
4. Problem solving - creating new solutions to remove a felt difficulty

Steps in Problem Solving

1. Felt need
2. Recognizing a problem situation
3. Gathering data
4. Evaluating possible solutions
5. Testing and verification
6. Making a generalization or conclusion
Principles for Developing Problem Solving Abilities and their Applications in Classroom Situations

1. Recognizing difficulties in a situation facilitates problem solving.
Assist students to:
- Identify solvable, significant problems
- State problems themselves

2. Delimiting the problem facilitates problem solving.
Guide students in:
- Analyzing the situation related to the problem
- Determining problems of immediate concern
- Delimiting the problem
- Stating problems in a way that offers an opportunity for securing progress toward a solution
- Deciding the form in which the solutions might appear
- Using the information-processing skill of selective attention

3. Using new methods for arriving at a conclusion facilitates problem solving.
Help students in:
- Locating needed information
- Acquiring the necessary background information, concepts, and principles for dealing with the problem
- Developing their minimum reference list
- Identifying various sources of information
- Drawing information from their own experiences
- Deciding on a uniform system for writing a bibliography

4. Generating possible solutions through applying knowledge and methods to the problem situation facilitates problem solving.
Lead students to generate solutions through:
- Brainstorming sessions
- Processing information
- Analyzing information in terms of the larger problems
- Incorporating diverse information
- Eliminating overlaps and discrepancies

5. Inferring and testing hypotheses facilitates problem solving.
Develop the students' skills in:
- Drawing hypotheses
- Stating hypotheses
- Testing hypotheses
Developing Creativity
Basic Concepts
1. Restructuring - conceiving of a problem in a new or different way
2. Incubation - unconscious work toward a solution while one is away from the problem
3. Divergent thinking - coming up with many possible solutions
4. Convergent thinking - narrowing the possibilities to a single answer
5. Creativity - the occurrence of uncommon or unusual but appropriate responses; imaginative, original thinking

Characteristics of a Creative Individual

1. Has a high degree of intellectual capacity
2. Genuinely values intellectual matters
3. Values own independence and autonomy
4. Is verbally fluent
5. Enjoys aesthetic impressions
6. Is productive
7. Is concerned with philosophical problems
8. Has a high aspiration level for self
9. Has a wide range of interests
10. Thinks in unusual ways
11. Is an interesting, arresting person
12. Appears straightforward and candid
13. Behaves in an ethically consistent manner
Principles for Developing Creativity and their Applications in Classroom Teaching
Principles and their applications in classroom teaching:

1. Production of novel forms or ideas through expressing oneself by figural, verbal, and physical means facilitates the development of creativity.
Model creative behaviors such as:
- Curiosity
- Inquiry
- Divergent production
Provide opportunities for:
- Expression in many media: language, rhythm, music, and art
- Divergent production through figural, verbal, and physical means
- Creative processes
- Valuing creative achievement
- Production of ideas that cannot be scored right or wrong
Develop a continuing program for developing creative abilities.

2. Associating success in creative efforts with a high level of creative experience facilitates the development of creativity.
Respect:
- Unusual questions
- Imaginative, creative ideas
Reward:
- Creative efforts
- Unique productions
Achieving Psychomotor Learning
Basic Concepts
1. Capacity - the individual's potential power to do a certain task
2. Ability - the actual power to perform an act, physically and mentally
3. Skill - the level of proficiency attained in carrying out sequences of action in a consistent way
Characteristics of Skilled Performances
1. Less attention to the specific movement (voluntary to involuntary)
2. Better differentiation of cues
3. More rapid feedback and correction of movements
4. Greater speed and coordination
5. Greater stability under a variety of environmental conditions
Phases of Motor Skills Learning
1. Cognitive phase - understanding the task
2. Organizing phase - associating responses with particular cues and integrating responses
3. Perfecting phase - executing the performance in automatic fashion
Application of Principles in Developing Psychomotor Skills in Classroom Teaching
Principles and their applications in classroom teaching:

1. Attending to the characteristics of the skill and assessing one's own related abilities facilitates motor skill learning.
Analyze the psychomotor skill in terms of the learners' abilities and developmental level:
- To determine the specific abilities necessary to perform it
- To arrange the component abilities in order
- To help students master them

2. Observing and imitating a model facilitates initial learning of skills and movements.
Demonstrate and describe:
- The entire procedure, as an advance organizer
- The correct components of the motor abilities
- The links of the motor chain in sequence
- The skill again, step by step

3. Guiding initial responses verbally and physically facilitates learning of motor skills.
Provide verbal guidance to:
- Give learners a feeling of security
- Direct attention to more adequate techniques
- Promote insight into the factors related to successful performance of the task
Provide physical guidance to:
- Facilitate making correct responses initially
- Correct wrong responses immediately

4. Practicing under desirable conditions facilitates the learning of skills by eliminating errors and strengthening and refining correct responses and form.
Conduct practice of skills:
- Close to the actual conditions where the skill will be used
- From whole-to-part arrangement
- Through repetitive drills on the same materials
- By distributed rather than massed practice
- With intervals of rest long enough to overcome fatigue but not so long that forgetting occurs

5. Knowledge of results facilitates skill learning.
Provide informational feedback on:
- Correct and incorrect responses
- Adequate and inadequate responses
- Correct or incorrect verbal remarks
Feedback may be secured from:
- Verbal analysis
- Chart analysis
- Taped performance

6. Evaluating one's own performance facilitates mastery of skills.
Arrange for self-evaluation of the learner's performance through:
- Discussion
- Analysis
- Assessment
Achieving Affective Learning
Developing Attitudes and Values
1. Affective - pertains to emotions or feelings rather than thought
2. Affective learning - consists of responses acquired as one evaluates the meaning of an idea, object, person, or event in terms of his view of the world

Main Elements

1. Taste - like or dislike of a particular animal, color, or flavor
2. Attitudes - learned, emotionally toned predispositions to react in a consistent way, favorably or unfavorably, toward a person, object, or idea
3. Values - inner core beliefs and internalized standards serving as norms of behavior

Defining Attributes of Attitudes

1. Learnability - all attitudes are learned
2. Stability - learned attitudes become stronger and enduring
3. Personal-societal significance - attitudes are of high importance to the individual and society
4. Affective-cognitive content - attitudes have both factual information and emotions associated with an object
Application of Principles in Developing Attitudes and Values in Classroom Teaching
Principles and their applications in classroom teaching:

1. Recognizing an attitude facilitates its initial learning.
Guide students in:
- Identifying the attitudes and values to be developed
- Defining the terminal behavior expected of them

2. Observing and imitating a model facilitates initial attitude learning.
The teacher provides:
- Different types of exemplary models
- Opportunities to examine instructional materials carefully in terms of the attitudes and values presented
The teacher sets a good example.

3. Positive attitudes toward a person, event, or object facilitate affective learning.
Provide pleasant and positive emotional experiences by:
- Showing warmth and enthusiasm toward students
- Keeping personal prejudices under control
- Allowing students to express their own value commitments
- Demonstrating interest in the subject matter
- Making it possible for each student to experience success

4. Getting information about a person, event, or object influences initial attitude learning and later commitment to group-held attitudes.
Guide learners to extend their informative experiences by:
- Undergoing direct experiences
- Listening to group lectures and discussions
- Engaging in extensive reading
- Participating in related activities

5. Interacting in primary groups influences initial attitude learning and later commitment to group-held attitudes.
Facilitate interaction in primary groups through:
- Group planning
- Group discussion
- Group decision making
- Role-playing

6. Practicing an attitude facilitates its stable organization.
The practice context should:
- Present the teacher as an exemplary model manifesting interest in the students
- Be characterized by a positive climate
- Confirm learner responses with positive remarks, an approving nod, and a smile

7. Purposeful learning facilitates effective attitude acquisition and modification.
Guide learners to engage in independent attitude cultivation by:
- Providing opportunities for them to think about their own attitudes
- Writing about open-ended themes
Discussion Point 4:
Test Construction, Reliability and Validity
STEP I: CONTENT VALIDATION
It is the degree to which the test represents the essence, the topics, and the areas that the test is designed to measure.

It is considered the most crucial procedure in the test construction process because content validity sets the pace for the succeeding validity and reliability measures.
1.1 Documentary analysis or pre-survey. At this stage, one must have familiarized oneself with the theoretical constructs directly related to the test one is planning.

1.2 Development of a Table of Specification. Determining the areas or concepts that will represent the nature of the variable being measured, and the relative emphasis of each area, is essentially judgmental.

A detailed Table of Specification (TS) includes the areas or concepts, the objectives, the number of items, and the percentage or proportion of items in each area.

It is advisable to make a 50 to 100 percent allowance in the construction of items.
Sample

Table of Specification (first draft) for an Introduction to Psychology Unit Exam
(Columns: AREAS, LEARNING OBJECTIVES, NUMBER OF ITEMS, PLACEMENT OF ITEMS under K, C, A, A, S, E, and PERCENTAGE)

I. History of Psychology - 15 items - placed at 1, 6, 13, 14, 22, 23, 24, 32, 33, 49, 50, 51, 59, 65, 65 - 21.43 %
II. Branches of Psychology - 15 items - placed at 7, 12, 15, 16, 20, 21, 31, 34, 47, 48, 52, 58, 59, 61, 70 - 21.43 %
III. Schools of Psychology - 20 items - placed at 3, 4, 5, 17, 19, 25, 26, 30, 35, 36, 39, 40, 44, 46, 53, 54, 60, 62, 67, 69 - 28.57 %
IV. Research Methods - 20 items - placed at 2, 8, 9, 10, 11, 18, 27, 28, 29, 37, 38, 41, 42, 43, 45, 55, 56, 63, 64, 68 - 28.57 %
Total - 70 items - 100 %
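The percentage column above is each area's item count taken as a share of the 70-item total. A short Python sketch of that arithmetic:

```python
# Percentage of items per area in the sample Table of Specification.
areas = {
    "History of Psychology": 15,
    "Branches of Psychology": 15,
    "Schools of Psychology": 20,
    "Research Methods": 20,
}

total = sum(areas.values())           # 70 items overall
for name, n_items in areas.items():
    share = 100 * n_items / total     # relative emphasis of the area
    print(f"{name}: {n_items} items, {share:.2f} %")
```

This reproduces the 21.43 % and 28.57 % figures shown in the table.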
Building a Table of Specifications:
1. Obtaining a list of instructional objectives
2. Outlining the course content
3. Preparing a two-way chart
Table of Specifications for a Summative Third-Grade Social Studies Test
1.3 Consultation with experts. At this point it is advisable to consult your thesis adviser or other authorities who have the expertise to make judgments about the representativeness or relevance of the entries in your TS.

1.4 Item writing. At this stage you should know what types of items you are supposed to construct: the type of instrument, format, scaling, and scoring techniques.
STEP II: FACE VALIDATION

Face validity, the crudest type of validity, pertains to whether the test looks valid; that is, whether, on the face of the instrument, it looks like it can measure what you intend to measure.

This type of validity cannot stand alone, especially for research at the graduate level.

2.1 Item inspection. Have the initial draft of the instrument inspected by a group of evaluators: the thesis adviser, test construction experts, and experts/professionals whose specializations are related to the subject matter at hand.
ITEM / ITEM NO.     SUITABLE     NOT SUITABLE     NEEDS REVISION
2.2 Inter-judge consistency. You may collate the data gathered from the evaluators for analysis. You have to look at the agreement or consistency of the judgments they made on each of the items.
STEP III: FIRST TRIAL RUN

At this stage you must already have a stencil of your first draft as a result of Steps I and II. Try out your test on a sample that is comparable to your target population or to your final sample. This tryout should be large enough to provide meaningful computations.
STEP IV: ITEM ANALYSIS

Both the reliability and the validity of any test depend largely on the characteristics of the items. High validity and reliability can be built into an instrument in advance through item analysis.

According to Likert, item analysis can be used as an objective check in determining whether the members of the group react differentially to the battery; that is, item analysis indicates whether those persons who fall toward one end of the attitude continuum on the battery do so on the particular statement, and vice versa.
THE U-L INDEX METHOD

Appropriate for tests whose criterion is measured along a continuous scale and whose individual items are scored right or wrong, or negative or positive.

Steps in Using the U-L Index Method
1. Score the tests and arrange them from lowest to highest based on the total scores.

Sample scores from a 10-item test (n = 30)

List of scores (as recorded):
2 9   3 8   5 9   9 5   5 2
4 3   2 4   6 4   8 6   8 2
8 2   9 1   3 8   6 4   3 5

Arranged from lowest to highest:
1 5   2 5   2 5   2 6   2 6
2 6   3 8   3 8   3 8   3 8
4 8   4 9   4 9   4 9   5 9
2. Separate the top 27 % and the bottom 27 % of the cases. 27 % of 30 is 8.1, or 8 (30 x .27 = 8.1).

From the arranged scores, the bottom 27 % consists of the 8 lowest scores (1, 2, 2, 2, 2, 2, 3, 3) and the top 27 % consists of the 8 highest scores (8, 8, 8, 8, 9, 9, 9, 9).
3. Prepare a tally sheet. Tally the number of cases from each group who got the item right, for each item, and then convert the tallies into frequencies.

ITEM NO.   UPPER 27 % tally   frequency   LOWER 27 % tally   frequency
1          IIIII III          8           II                 2
2          IIIII              5           II                 2
3          IIIII III          8           I                  1
4          IIIII I            6           I                  1
5          IIIII III          8           II                 2
6          IIIII II           7           III                3
7          IIIII I            6           I                  1
8          IIIII              5           I                  1
9          IIIII III          8           II                 2
10         IIIII II           7           II                 2
4. Compute the proportion for each group on each item:

U or L = f / n

where f is the number of cases in the group who got the item right and n is the number of cases.

ITEM NO.   UPPER 27 % f   p (Pu)   LOWER 27 % f   p (Pl)
1          8                       2
2          5                       2
3          8                       1
4          6                       1
5          8                       2
6          7                       3
7          6                       1
8          5                       1
9          8                       2
10         7                       2
5. Compute the discrimination index of each item.

Discrimination index refers to the degree to which an item differentiates correctly among test takers in the behavior that the test is designed to measure. Thus, a good test item separates the bright from the poor respondents.

Ds = Pu - Pl

Where: Ds is the discrimination index
Pu is the proportion of the upper 27 %
Pl is the proportion of the lower 27 %
ITEM NO.   UPPER 27 % f   p (Pu)   LOWER 27 % f   p (Pl)   Ds
1          8                       2
2          5                       2
3          8                       1
4          6                       1
5          8                       2
6          7                       3
7          6                       1
8          5                       1
9          8                       2
10         7                       2
6. Compute the difficulty index of each item.

Difficulty index is the percentage of the respondents who got the item right. It can also be interpreted as how easy or how difficult an item is.

Df = (Pu + Pl) / 2

Where: Df is the difficulty index
Pu is the proportion of the upper 27 %
Pl is the proportion of the lower 27 %
ITEM NO.   UPPER 27 % f   p (Pu)   LOWER 27 % f   p (Pl)   Ds   Df
1          8                       2
2          5                       2
3          8                       1
4          6                       1
5          8                       2
6          7                       3
7          6                       1
8          5                       1
9          8                       2
10         7                       2
7. Decide whether to retain an item based on two ranges.

ITEM NO.   UPPER 27 % f   p   LOWER 27 % f   p   Ds   Df   Decision
1          8                  2
2          5                  2
3          8                  1
4          6                  1
5          8                  2
6          7                  3
7          6                  1
8          5                  1
9          8                  2
10         7                  2
Items with difficulty indices within .20 to .80 and discrimination indices within .30 to .80 are retained.
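Steps 4 through 7 can be sketched in Python for the sample frequencies above. One assumption: the handout leaves the p columns blank, so this sketch takes each proportion as the item frequency divided by the size of its 27 % group (8 of the 30 cases).

```python
# U-L index method: proportions, discrimination (Ds), difficulty (Df),
# and the retention decision for the 10 sample items.
upper_f = [8, 5, 8, 6, 8, 7, 6, 5, 8, 7]  # correct answers, upper 27 %
lower_f = [2, 2, 1, 1, 2, 3, 1, 1, 2, 2]  # correct answers, lower 27 %
group_n = 8                               # 27 % of 30 cases, rounded

results = []
for fu, fl in zip(upper_f, lower_f):
    pu, pl = fu / group_n, fl / group_n   # step 4: proportions
    ds = pu - pl                          # step 5: discrimination index
    df = (pu + pl) / 2                    # step 6: difficulty index
    keep = 0.20 <= df <= 0.80 and 0.30 <= ds <= 0.80
    results.append((round(ds, 2), round(df, 2), "retain" if keep else "reject"))

for item_no, row in enumerate(results, start=1):
    print(item_no, row)
```

Under this reading, item 3 (Ds = .88) falls outside the .30 to .80 discrimination range and would be rejected.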
The Chung-Teh Fan item analysis table can be used to interpret the discrimination indices of the items:

.40 and above - very good item
.30 - .39     - reasonably good item, but possibly subject to improvement
.20 - .29     - marginal item, usually needing improvement
.19 and below - poor item, to be rejected, improved, or revised

Difficulty indices can be interpreted as follows:

.00 - .20   - very difficult
.21 - .80   - moderately difficult
.81 - 1.00  - very easy
CRITERION OF INTERNAL CONSISTENCY

This is somewhat similar to the U-L Index Method, in that two criterion groups, the high group and the low group, are employed to judge the discriminatory power of an item. In this method, however, Likert recommends using the high 10 percent and the low 10 percent groups.
Steps in the Criterion of Internal Consistency Method

1. List all the scores of the respondents and take the high 10 percent and the low 10 percent. Write their respective scores for each item.

Sample scores from a 10-item test using a 4-point scale (n = 50)

Respondents          Item: 1  2  3  4  5  6  7  8  9  10
High 10 %    A             4  4  4  4  4  3  4  2  4  3
             B             4  2  4  2  2  4  4  2  4  3
             C             4  4  4  4  4  4  4  1  3  3
             D             4  4  4  4  2  4  4  2  4  4
             E             3  4  4  3  4  2  2  2  4  4
Low 10 %     F             4  1  1  2  3  2  3  4  3  2
             G             4  1  2  2  4  2  2  3  3  2
             H             3  2  2  1  3  3  1  4  2  2
             I             3  2  2  1  4  2  3  4  2  2
             J             3  2  2  2  4  1  3  3  1  2
2. Get the sum of each item for the high group (rows A-E) and for the low group (rows F-J).
3. Get the difference between the two sums for each item (sum of the high group minus sum of the low group).
4. To compare the differences across items, ranking them is helpful.
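Steps 2 through 4 can be sketched in Python for the sample scores above:

```python
# Criterion of internal consistency: per-item sums for the high and low
# 10 % groups, their differences, and a simple ranking of the differences.
high = [  # respondents A-E, items 1-10
    [4, 4, 4, 4, 4, 3, 4, 2, 4, 3],
    [4, 2, 4, 2, 2, 4, 4, 2, 4, 3],
    [4, 4, 4, 4, 4, 4, 4, 1, 3, 3],
    [4, 4, 4, 4, 2, 4, 4, 2, 4, 4],
    [3, 4, 4, 3, 4, 2, 2, 2, 4, 4],
]
low = [  # respondents F-J, items 1-10
    [4, 1, 1, 2, 3, 2, 3, 4, 3, 2],
    [4, 1, 2, 2, 4, 2, 2, 3, 3, 2],
    [3, 2, 2, 1, 3, 3, 1, 4, 2, 2],
    [3, 2, 2, 1, 4, 2, 3, 4, 2, 2],
    [3, 2, 2, 2, 4, 1, 3, 3, 1, 2],
]

sum_high = [sum(col) for col in zip(*high)]        # step 2
sum_low = [sum(col) for col in zip(*low)]          # step 2
diff = [h - l for h, l in zip(sum_high, sum_low)]  # step 3

# Step 4: rank items from largest to smallest difference
# (rank 1 = most discriminating; ties broken by item order).
order = sorted(range(len(diff)), key=lambda i: diff[i], reverse=True)
rank = [0] * len(diff)
for r, i in enumerate(order, start=1):
    rank[i] = r

print("differences:", diff)
print("ranks:      ", rank)
```

Items 5 and 8 come out with negative differences: the low group actually outscored the high group on them, which marks them as poor discriminators.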
PEARSON PRODUCT-MOMENT CORRELATION METHOD

This item analysis technique is used for tests with continuous scaling of three (3) or more scale points. The total score serves as the X criterion, and the item score serves as the Y criterion. This is done for every item; therefore, if the draft consists of 60 items, there should be 60 correlation coefficients computed.
Steps in the Pearson Product-Moment Correlation Method

1. Find the X and Y scores, where X is the total score of each respondent and Y is the item score.

Sample scores for item no. 1 in a 75-item test (n = 10)

Respondents     X     Y
A               30    4
B               43    5
C               53    3
D               45    4
E               70    2
F               45    3
G               68    4
H               48    5
I               38    2
J               45    4
TOTAL           485   36
2. Square all the X and Y scores. For the sample data, the totals are ΣX² = 24905 and ΣY² = 140.
3. Multiply each X by its Y and total the products to obtain ΣXY.
4. Given the above data, compute the Pearson r:

rxy = [nΣxy - (Σx)(Σy)] / √([nΣx² - (Σx)²][nΣy² - (Σy)²])

where: rxy = correlation between X and Y
Σx = sum of the total scores
Σy = sum of the item scores
Σxy = sum of the products of X and Y
Σx² = sum of the squared total scores
Σy² = sum of the squared item scores

A significant coefficient reflects a good item, while an insignificant one reflects a poor item. Most researchers consider a coefficient of .30 and above as indicating a good item.
To interpret the correlation coefficient values (r) obtained, the following classification may be applied:

+.00 - +.20 = negligible correlation
+.21 - +.40 = low or slight correlation
+.41 - +.70 = marked or moderate correlation
+.71 - +.90 = high correlation
+.91 - +.99 = very high correlation
+1.00       = perfect correlation
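The handout leaves the computation to the reader; plugging the sample data into the formula gives the following sketch:

```python
# Pearson r between total scores (X) and item-1 scores (Y) for the
# sample data above; the intermediate sums match the handout's totals.
from math import sqrt

x = [30, 43, 53, 45, 70, 45, 68, 48, 38, 45]  # total scores, A-J
y = [4, 5, 3, 4, 2, 3, 4, 5, 2, 4]            # item no. 1 scores
n = len(x)

sx, sy = sum(x), sum(y)                       # 485 and 36
sxx = sum(v * v for v in x)                   # 24905
syy = sum(v * v for v in y)                   # 140
sxy = sum(a * b for a, b in zip(x, y))

r = (n * sxy - sx * sy) / sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
print(round(r, 2))
```

For this sample, r comes out at about -.24, a negligible (and negative) correlation, so item no. 1 would be judged a poor item by the .30 criterion.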
POINT-BISERIAL CORRELATION METHOD

This is applied to tests with a dichotomous scoring system (yes/no, right/wrong, improved/not improved). Unlike the Pearson Product-Moment method, the Y criterion is scored either 1 or 0.
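The handout does not spell out the computation, so the following is a sketch using the standard point-biserial formula r_pb = (Mp - Mq) / s * sqrt(p * q), where Mp and Mq are the mean total scores of those scoring 1 and 0 on the item, s is the standard deviation of all total scores, and p and q are the proportions scoring 1 and 0. The sample data are made up for illustration.

```python
# Point-biserial correlation between a dichotomous item and total scores.
from math import sqrt

def point_biserial(totals, item):
    """item entries are 1 (right) or 0 (wrong)."""
    n = len(totals)
    mean = sum(totals) / n
    s = sqrt(sum((t - mean) ** 2 for t in totals) / n)  # population SD
    ones = [t for t, i in zip(totals, item) if i == 1]
    zeros = [t for t, i in zip(totals, item) if i == 0]
    p = len(ones) / n
    m_p = sum(ones) / len(ones)
    m_q = sum(zeros) / len(zeros)
    return (m_p - m_q) / s * sqrt(p * (1 - p))

# Hypothetical data: the high scorers got the item right and the low
# scorers did not, so the coefficient should be strongly positive.
print(point_biserial([10, 9, 8, 7, 3, 2], [1, 1, 1, 1, 0, 0]))
```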
USING TWO OR MORE TECHNIQUES

Basically, this is a combination of two or more item analysis techniques. Although item analysis is laborious, some researchers play safe by going through this process to ensure a more accurate quantitative judgment.
STEP V: SECOND TRIAL RUN OR FINAL TEST ADMINISTRATION

More often than not, the second trial run becomes the final run. This means that for the second trial run one may administer the draft resulting from the item analysis to one's final sample.

Necessary adjustments can still be made before the instrument is finally administered to the final sample.
STEP VI: EVALUATION OF THE TEST

After the final run, the test can be evaluated statistically for its final validity and reliability.

6.1 Evaluation of reliability

THE SPLIT-HALF RELIABILITY

The most common technique for evaluating split-half reliability is the odd-even technique. This is done by splitting the test into two halves: the odd-numbered items as one half, and the even-numbered items as the other.

Using the Pearson Product-Moment Correlation, the reliability of half of the instrument can be determined. The reliability coefficient of this type is often called a coefficient of internal consistency.

Using the Spearman-Brown Prophecy Formula, the reliability of the entire instrument can be obtained.
r11 = 2r / (1 + r)

where: r11 = reliability of the whole test
r = reliability of the half test

What would be the reliability of the whole test if the computed coefficient from the odd-even method is r = .63?
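The worked question above is a one-line computation; a short Python sketch:

```python
# Spearman-Brown Prophecy Formula: step up a half-test reliability
# to the reliability of the whole test.
def spearman_brown(r_half):
    return 2 * r_half / (1 + r_half)

# Odd-even coefficient from the question: r = .63
print(round(spearman_brown(0.63), 2))
```

So a half-test coefficient of .63 steps up to a whole-test reliability of about .77.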
The Kuder-Richardson Formula 20 can also be used to determine the reliability of the entire test, and at the same time it avoids the problem that may arise in using the Spearman-Brown Prophecy Formula.

Steps in Kuder-Richardson Formula 20
1. Check the test, giving 1 for every correct answer and 0 for every wrong answer, and get the frequency of correct answers for each item.

ITEMS   Respondents: A B C D E F G H I J K L M N    f
1                    1 1 1 1 1 1 1 1 1 1 1 1 0 0    12
2                    1 1 1 1 1 1 1 1 1 1 1 1 0 0    12
3                    1 1 1 1 1 1 1 1 1 1 1 0 0 0    11
4                    1 1 1 1 1 1 1 1 1 1 0 0 0 0    10
5                    1 1 1 1 1 1 1 1 1 1 0 0 0 0    10
6                    1 1 1 1 1 1 1 1 1 1 0 0 0 0    10
7                    1 1 1 1 1 1 1 1 1 0 0 0 0 0    9
8                    0 1 1 1 1 1 0 1 1 0 1 0 0 0    8
9                    0 1 1 1 1 1 1 1 0 0 0 0 0 0    8
10                   0 0 1 0 0 1 1 0 1 0 0 0 0 0    4
total                7 9 10 9 9 10 9 9 9 6 4 2 0 0
2. Find the proportion passing each item (pi) and then the proportion failing each item (qi). pi is computed by dividing the number of respondents who got the item correct by the total number of respondents, while qi is computed by subtracting the computed pi from 1.

pi = (no. of students with the correct answer) / (total number of respondents)
qi = 1 - pi

ITEMS    f     pi    qi
1        12
2        12
3        11
4        10
5        10
6        10
7        9
8        8
9        8
10       4
3. Multiply pi by qi for each item and total the products.

ITEMS    f     pi    qi    piqi
1        12
2        12
3        11
4        10
5        10
6        10
7        9
8        8
9        8
10       4
total                      1.9509
4. Compute the variance (s²) of the instrument:

x̄ = Σx / n

s² = Σ(x - x̄)² / (n - 1)

RESPONDENTS    x     (x - x̄)    (x - x̄)²
A              7
B              9
C              10
D              9
E              9
F              10
G              9
H              9
I              9
J              6
K              4
L              2
M              0
N              0
total

5. Compute the Kuder-Richardson Formula 20:

rtt = [k / (k - 1)] [1 - (Σpiqi / s²)]

where k is the number of items.
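Steps 1 through 5 can be carried out in Python for the 14-respondent sample above; the Σpiqi below matches the handout's 1.9509 up to rounding.

```python
# Kuder-Richardson Formula 20 for the sample 10-item, 14-respondent test.
item_f = [12, 12, 11, 10, 10, 10, 9, 8, 8, 4]           # f per item
totals = [7, 9, 10, 9, 9, 10, 9, 9, 9, 6, 4, 2, 0, 0]   # x per respondent

n = len(totals)  # 14 respondents
k = len(item_f)  # 10 items

# Steps 2-3: pi = f / n, qi = 1 - pi, summed products
pq_sum = sum((f / n) * (1 - f / n) for f in item_f)

# Step 4: variance of the total scores, with n - 1 in the denominator
mean = sum(totals) / n
s2 = sum((x - mean) ** 2 for x in totals) / (n - 1)

# Step 5: KR-20
r_tt = (k / (k - 1)) * (1 - pq_sum / s2)
print(round(pq_sum, 2), round(s2, 2), round(r_tt, 2))
```

The sample test comes out highly reliable, with a KR-20 coefficient of about .95.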
THE TEST-RETEST RELIABILITY

This is also called the coefficient of stability. To calculate the coefficient, the test is administered twice to the same sample with a given time interval. The Pearson r is then calculated to determine the reliability of the instrument.

The most critical problem in this technique is determining the correct time interval between the two testings. Generally, it is two weeks or so.

PARALLEL FORM OR ALTERNATE FORM RELIABILITY

The coefficient of equivalence is computed by administering two parallel or equivalent forms of the test to the same group of individuals. This technique is also referred to as the method of equivalent forms, and the coefficient obtained is also called the coefficient of equivalence.
6.2 Evaluation of validityEvaluation of validityEvaluation of validityEvaluation of validityCRITERIONCRITERIONCRITERIONCRITERION----RELATED VALIDITYRELATED VALIDITYRELATED VALIDITYRELATED VALIDITY
Criterion-related validity is a very common type of validity, and it is primarily statistical. It is a correlation between a set of scores, or some other predictor, and an external measure; this external measure is called the criterion. A correlation coefficient is then computed between the two sets of measurements. In actual practice, several predictors are used, and a multiple R is then computed between these predictors and the criterion. The difficulty usually met in this type of validity is in selecting or judging which criterion should be used to validate the measure at hand.
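The multiple-R idea mentioned above can be sketched as follows: regress the criterion on the predictors by least squares, then correlate predicted with observed criterion scores. All names and data here are illustrative, not from the material.

```python
# Hedged sketch: multiple R between several predictors and a criterion.
import numpy as np

# rows = examinees; columns = two predictor scores (e.g., two subtest scores)
predictors = np.array([[10, 8], [9, 9], [7, 6], [5, 7], [4, 3], [2, 4]], dtype=float)
criterion = np.array([85, 88, 70, 65, 55, 48], dtype=float)  # external measure

# Add an intercept column and fit criterion = X b by least squares.
X = np.column_stack([np.ones(len(criterion)), predictors])
b, *_ = np.linalg.lstsq(X, criterion, rcond=None)
predicted = X @ b

# Multiple R = Pearson correlation between predicted and observed criterion.
R = np.corrcoef(predicted, criterion)[0, 1]
print(round(R, 2))
```

The closer R is to 1.0, the better the predictor set anticipates the criterion, which is why this type is also called predictive validity.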
It is also called predictive validity.

CONSTRUCT VALIDITY
Construct validity is determined by investigating the psychological qualities, traits, or factors measured by a test. It is often called concept validity because it is concerned not so much with a high validity coefficient as with the theory and concept behind the test. Likewise, it involves discovering positive correlations between and among the variables/constructs that define the concept.
Discussion Point 5:
Constructing Objective Test Items: Multiple-Choice Form
Objective test items are not limited to the measurement of simple learning outcomes. The multiple-choice item can measure at both the knowledge and understanding levels and is also free of many of the limitations of other forms of objective items.
The multiple-choice item is generally recognized as the most widely applicable and useful type of objective test item. It more effectively measures many of the simple learning outcomes measured by the short-answer item, the true-false item, and the matching exercise. It also measures a variety of the more complex outcomes in the knowledge, understanding, and application areas. This flexibility, plus the higher-quality items usually found in the multiple-choice form, has led to its extensive use in achievement testing.
CHARACTERISTICS OF MULTIPLE-CHOICE ITEMS
A multiple-choice item consists of a problem and a list of suggested solutions. The problem may be stated as a direct question or an incomplete statement and is called the stem of the item. The list of suggested solutions may include words, numbers, symbols, or phrases, and these are called alternatives (also called choices or options). The pupil is typically requested to read the stem and the list of alternatives and to select the one correct, or best, alternative. The correct alternative in each item is called simply the answer, and the remaining alternatives are called distracters (also called decoys or foils). These incorrect alternatives receive their name from their intended function: to distract those pupils who are in doubt about the correct answer.
Whether to use a direct question or an incomplete statement in the stem depends on several factors. The direct-question form is easier to write, is more natural for younger pupils, and is more likely to present a clearly formulated problem. On the other hand, the incomplete statement is more concise, and if skillfully phrased, it too can present a well-defined problem. A common procedure is to start each stem as a direct question, shifting to the incomplete-statement form only when the clarity of the problem can be retained and greater conciseness achieved.
Examples: Direct-question form
Which one of the following cities is the capital of the Philippines?
a. Manila
b. Parañaque
c. Pasay
d. Taguig
Incomplete sentence form
The capital of the Philippines is in ______.
a. Manila
b. Parañaque
c. Pasay
d. Taguig
Examples: Best-answer type
Which one of the following factors contributed most to the selection of Manila as the capital of the Philippines?
a. Central location
b. Good climate
c. Good highways
d. Large population
The best-answer type of multiple-choice item tends to be more difficult than the correct-answer type. This is due partly to the finer discriminations called for and partly to the fact that such items are used to measure more complex learning outcomes. The best-answer type is especially useful for measuring learning outcomes that require the understanding, application, or interpretation of factual information.
USES OF MULTIPLE-CHOICE ITEMS
The multiple-choice item is the most versatile type of test item available. It can measure a variety of learning outcomes from simple to complex, and it is adaptable to most types of subject-matter content. The uses discussed here show only its function in measuring some of the more common learning outcomes in the knowledge, understanding, and application areas; more complex outcomes can be measured using modified forms of the multiple-choice item.
Measuring Knowledge Outcomes
Knowledge of Terminology. A simple but basic learning outcome measured by the multiple-choice item is knowledge of terminology. For this purpose, pupils can be requested to show their knowledge of a particular term by selecting a word that has the same meaning as the given term or by choosing a definition of the term. Special uses of a term can also be measured by having pupils identify the meaning of the term when used in context.
Knowledge of Specific Facts. Another learning outcome basic to all school subjects is knowledge of specific facts. It is important in its own right, and it provides a necessary basis for developing understanding, thinking skills, and other complex learning outcomes. Multiple-choice items designed to measure specific facts can take many different forms, but questions of the who, what, when, and where variety are most common.
Knowledge of Principles. Knowledge of principles is also an important learning outcome in most school subjects. Multiple-choice items can be constructed to measure knowledge of principles as easily as those designed to measure knowledge of specific facts. The items appear a bit more difficult, but this is because principles are more complex than isolated facts.
Knowledge of Methods and Procedures. Another common learning outcome readily adaptable to the multiple-choice form is knowledge of methods and procedures. In some cases we might want to measure knowledge of procedures before we permit pupils to practice in a particular area (e.g., laboratory procedures). In other cases, knowledge of methods and procedures may be an important learning outcome in its own right (e.g., knowledge of governmental procedures).
Measuring Outcomes at the Understanding and Application Levels
Many teachers limit the use of multiple-choice items to the knowledge area because they believe that all objective-type items are restricted to the measurement of relatively simple learning outcomes. Although this is true of most of the other types of objective items, the multiple-choice item is especially adaptable to the measurement of more complex learning outcomes.
In reviewing the following items, it is important to keep in mind that such items measure learning outcomes beyond factual knowledge only if the applications and interpretations are new to the pupils. Any specific applications or interpretations of knowledge can, of course, be taught directly to the pupils as any other fact is taught. When this is done, and the test item contains the same problem situations and solutions used in teaching, it is obvious that the pupils can be given credit for no more than mere retention of factual knowledge. To measure understanding and application, an element of novelty must be included in the test items.
Ability to Identify Applications of Facts and Principles. A common method of determining whether pupils' learning has gone beyond the mere memorization of a fact or principle is to ask them to identify its correct application in a situation that is new to the pupil.
Ability to Interpret Cause-and-Effect Relationships. Understanding can frequently be measured by asking pupils to interpret various relationships among facts. One of the most important relationships in this regard, and one common to most subject-matter areas, is the cause-and-effect relationship. Understanding of such relationships can be measured by presenting pupils with a specific cause-and-effect relationship and asking them to identify the reason that best accounts for it.
Ability to Justify Methods and Procedures. Another phase of understanding important in various subject-matter areas is concerned with methods and procedures. A pupil might know the correct method or sequence of steps in carrying out a procedure without being able to explain why it is the best method or sequence of steps. At the understanding level we are interested in the pupil's ability to justify the use of a particular method or procedure. This can be measured with multiple-choice items by asking the pupils to select the best of several possible explanations of a method or procedure.
Advantages and Limitations of Multiple-Choice Items
The multiple-choice item is one of the most widely applicable test items for measuring achievement. It can effectively measure various types of knowledge and complex learning outcomes. In addition to this flexibility, it is free from some of the shortcomings characteristic of the other item types. The ambiguity and vagueness that frequently are present in the short-answer item are avoided because the alternatives better structure the situation. The short-answer item can be answered in many different ways, but the multiple-choice item restricts the pupil's response to a specific area.
Poor: Jose Rizal was born in _______.
Better: Jose Rizal was born in
A. Cavite
B. Laguna
C. Manila
D. Quezon
One advantage of the multiple-choice item over the true-false item is that pupils
cannot receive credit for simply knowing that a statement is incorrect; they must also know
what is correct.
T F The degree to which a test measures what it purports to measure is reliability.
The degree to which a test measures what it purports to measure is
A. Objectivity
B. Reliability
C. Standardization
D. Validity
Another advantage of the multiple-choice item over the true-false item is the greater reliability per item. Because the number of alternatives is increased from two to four or five, the opportunity for guessing the correct answer is reduced, and reliability is correspondingly increased. The effect of increasing the number of alternatives for each item is similar to that of increasing the length of the test.
Using the best-answer type of multiple-choice item also circumvents a difficulty associated with the true-false item: obtaining statements that are true or false without qualification. This makes it possible to measure learning outcomes in the numerous subject-matter areas in which solutions to problems are not absolutely true or false but vary in degree of appropriateness (e.g., best method, best reason, best interpretation).
Another advantage of the multiple-choice item over the matching exercise is that the need for homogeneous material is avoided. The matching exercise, which is essentially a modified form of the multiple-choice item, requires a series of related ideas to form the list of premises and alternative responses. In many content areas it is difficult to obtain enough homogeneous material to prepare effective matching exercises.
Two other desirable characteristics of the multiple-choice item are worthy of mention. First, it is relatively free from response set; that is, pupils generally do not favor a particular alternative when they do not know the answer. Second, using a number of plausible alternatives makes the results amenable to diagnosis: the kinds of incorrect alternatives pupils select provide clues to factual errors and misunderstandings that need correction.
The wide applicability of the multiple-choice item, plus its advantages, makes it easier to construct high-quality test items in this form than in any of the other forms. This does not mean that good multiple-choice items can be constructed without effort. But for a given amount of effort, multiple-choice items will tend to be of higher quality than short-answer, true-false, or matching-type items in the same area.
Despite its superiority, the multiple-choice item does have limitations.
1. As with all other paper-and-pencil tests, it is limited to learning outcomes at the verbal level. The problems presented to pupils are verbal problems, free from the many irrelevant factors present in natural situations. Also, the applications pupils are asked to make are verbal applications, free from the personal commitment necessary for application in natural situations. In short, the multiple-choice item, like other paper-and-pencil tests, measures whether the pupil knows or understands what to do when confronted with a problem situation, but it cannot determine how