FACULTY OF EDUCATION
DEPARTMENT OF SCIENCE EDUCATION
APPLICATION OF ITEM RESPONSE THEORY IN THE
DEVELOPMENT AND VALIDATION OF MULTIPLE
CHOICE TEST IN ECONOMICS
ANI, ELIZABETH NGOZIKA
PG/M.Ed/10/52484
TITLE PAGE
APPLICATION OF ITEM RESPONSE THEORY IN THE DEVELOPMENT
AND VALIDATION OF MULTIPLE CHOICE TEST IN ECONOMICS
BY
ANI, ELIZABETH NGOZIKA
PG/M.Ed/10/52484
A RESEARCH PROPOSAL PRESENTED TO THE DEPARTMENT OF
SCIENCE EDUCATION,
UNIVERSITY OF NIGERIA, NSUKKA
IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE
AWARD OF MASTER OF EDUCATION IN MEASUREMENT AND
EVALUATION (M.ED).
MAY, 2014
APPROVAL PAGE
This project has been approved for the Department of Science Education,
University of Nigeria, Nsukka.
By
Dr. B. C. Madu
Supervisor
Professor Z. C. Njoku
Head of Department
Professor O. A. Afemikhe
External Examiner
Dr. J. C. Onuoha
Internal Examiner
Professor Uju C. Umo
Dean, Faculty of Education
CERTIFICATION
Ani, Elizabeth Ngozika, a postgraduate student in the Department of Science
Education with Registration Number PG/M.Ed/10/52484, has satisfactorily completed
the requirements for the course and research work for the degree of Master of
Education in Measurement and Evaluation. The work embodied in this thesis is original
and has not been submitted in part or in full for any other diploma or degree of this or
any other university.
Ani, Elizabeth Ngozika
Student
Dr. B. C. Madu
Supervisor
DEDICATION
This project report is dedicated to God Almighty and to my beloved father, the late
Chief Matthew Chukwmaeze Ani, who did not live to reap the fruits of his labour.
ACKNOWLEDGEMENTS
The researcher sincerely thanks God Almighty for His love and guidance
throughout the period of this research work. The researcher also wishes to acknowledge,
with a deep sense of gratitude, the co-operation, help and encouragement of all those
who in one way or another contributed to the success of this research.
First among them is the researcher’s supervisor, Dr. B. C. Madu, whose
patience, guidance, fatherly advice, selfless service and dedication helped to bring
this work to a successful completion. The researcher’s special thanks go to Mr.
Christian Ugwuanyi and Dr. John Agah for their useful corrections and direction
during the proposal stage. The researcher’s special gratitude goes to the panel
members for their inputs and encouragement. The researcher also appreciates the
effort of Dr. (Mrs.) E. Umobong, who contributed immensely to the analysis
and completion of the work.
The researcher is grateful to her friends and colleagues Nze Blessing,
Mrs. Violet Nwabufor, Mrs. Judith Kanu, Mrs. Rose Okoye, Ike Francis and India
Vershima. Finally, the researcher is indebted to her parents, brothers and sisters: Joe,
John, Ben, Chris, Goddy, Justina, Stella, Anayo (Nwanyi oma) and Ogbobe, for their
prayers and financial support throughout the period of her study.
Ani, Elizabeth Ngozika
TABLE OF CONTENTS
Title Page
Approval Page
Certification
Dedication
Acknowledgements
Table of Contents
List of Tables
Abstract
CHAPTER ONE: INTRODUCTION
Background to the Study
Statement of the Problem
Purpose of the Study
Significance of the Study
Scope of the Study
Research Questions
Research Hypotheses
CHAPTER TWO: LITERATURE REVIEW
Conceptual Framework
Concept of Achievement Test
Qualities of a Test
Item Analysis
Differential Item Functioning (DIF)
Standard Error of Measurement (SEM)
Concept of Gender
Analysis of Fit
Schematic Representation of Conceptual Framework
Theoretical Framework
Classical Test Theory
Item Response Theory
Review of Empirical Studies
Summary of Literature Review
CHAPTER THREE: RESEARCH METHOD
Research Design
Area of the Study
Population of the Study
Sample and Sampling Technique
Instrument for Data Collection
Validation of the Instrument
Reliability of the Instrument
Method of Data Collection
Method of Data Analysis
CHAPTER FOUR: RESULTS
Research Question 1
Research Question 2
Research Question 3
Research Question 4
Research Question 5
Research Question 6
Research Hypothesis I
Research Hypothesis II
Summary of the Findings
CHAPTER FIVE: DISCUSSION OF FINDINGS, CONCLUSION, IMPLICATIONS, RECOMMENDATIONS AND SUMMARY OF THE STUDY
Discussion of Findings
Conclusion
Educational Implications
Recommendations
Limitation of the Study
Suggestions for Further Studies
Summary of the Study
References
APPENDICES
A: Data for Area of Study
B: Population Data
C: Sampling Data
D: Instrument
E: Scoring Guide
F: Table of Specification
G: Reliability Test
H: 3PL Model Analysis of Economics Achievement Test
I: DIF Model Analysis of Economics Achievement Test
LIST OF TABLES
Table 1: Standard errors of measurement of the test items of the multiple choice test
in Economics based on the three-parameter logistic (3PL) model
Table 2: Fit statistics of the multiple choice test in Economics based on the
three-parameter logistic (3PL) model
Table 3: Item threshold values (difficulty estimates) of the items of the multiple
choice test in Economics based on the three-parameter logistic (3PL) model
Table 4: Item parameters of the test items of the multiple choice test in Economics
based on the three-parameter logistic (3PL) model
Table 5: Guessing parameters of the test items of the multiple choice questions in
Economics based on the three-parameter logistic (3PL) model
Table 6: Model for group differential item functioning of the test items of the multiple
choice test in Economics
ABSTRACT
The study applied item response theory in the development and validation of a multiple
choice test in Economics. An instrumentation research design was used for the study. A
sample of 1005 senior secondary school II (SS II) Economics students was randomly
selected from 46 government co-educational schools. Six research questions and two
hypotheses guided the study. A 50-item Economics multiple choice test developed by the
researcher was used for data collection. The instrument was subjected to face and content
validation by three experts, two from the Department of Science Education and one
from the Department of Economics. A reliability index of 0.89 was obtained. The data
generated were analyzed using the maximum likelihood estimation technique of the
BILOG-MG computer program. The analysis revealed that all 50 Economics test items
survived; therefore, the final instrument developed for assessing students’ ability in
Economics contained 50 items with appropriate indices. The results showed that 49
items of the Economics multiple choice test were reliable based on the three-parameter
logistic (3PL) model. The findings also showed that thirty-one (31) items of the
Economics multiple choice test were difficult. The findings further revealed that the
items functioned differentially between male and female students. Based on the findings,
recommendations were made, including that examination bodies and teachers
should encourage and adopt IRT in developing test items for measuring students’
ability in Economics.
CHAPTER ONE
INTRODUCTION
Background to the Study
Economics is one of the senior secondary school subjects that require
assessment to ascertain students’ basic knowledge, skills and understanding of the
concepts and the nature of economic problems in any society. Economics has been
defined variously by many authorities. These different definitions arise because
Economics studies human behavior, and human beings behave differently. Mankiw (2001)
defined Economics as the study of how society manages its scarce resources. Egunjobi
and Egwakhide (2010) opined that Economics is the study of human endeavors in
respect of production, distribution, exchange and consumption. Economics, according
to Orji (2002), is the science of scarcity and choice. This implies that when resources
are limited in quantity relative to their uses, they are scarce, and the fact of scarcity
forces the individual to choose among alternatives. In Nigeria, Economics
came into the secondary school curriculum in 1966 (Obemeata, 1991). The objectives
of studying Economics, according to Asadu (2001), are:
• to enable students to acquire knowledge for the practical solution of the
economic problems of Nigerian society, developing countries and the world at
large.
• to prepare and encourage students to be cautious and effective in the
management of scarce resources.
• to equip students with the basic principles of economics necessary for useful
living.
• to increase students’ respect for the dignity of labour and their appreciation of
the economic, cultural and social values of society.
The objectives discussed suggest that the study of Economics is a form
of learning in which the knowledge, skills and habits of a group of people are transferred
from one generation to the next through teaching, training or research. Learning is
simply described as a change in behavior as a result of experience (Maduewesi, 1999).
According to Black and William (2009), learning is tied to effective assessment, that is,
to monitoring students’ progress and feeding that information back to them. Because
learning is unpredictable, assessment is necessary to make adaptive adjustments to
instruction, but assessment processes themselves affect the learner’s willingness,
desire and capacity to learn (Harlen & Deakin-Crick, 2002). Assessment is the
systematic collection, review and use of information about educational programs to
improve student learning. In the view of Huba and Freed (2000), assessment is the
process of gathering and discussing information from multiple and diverse sources in
order to develop a deep understanding of what students know, understand, and can do
with their knowledge as a result of their educational experiences. This idea can be
seen in the Federal Republic of Nigeria (FRN) policy on education concerning
continuous assessment, which is supposed to be implemented at all levels of the
educational system for both adult and young learners (FRN, 2004). This type of
assessment can be effected through the use of achievement tests. Malcolm (2003)
viewed an achievement test as an examination designed to assess how much knowledge a
person has in a certain area or set of areas. The following are some objectives of
achievement tests:
• To measure whether students possess the pre-requisite skills needed to succeed
in any unit or whether the students have achieved the objectives of the planned
instruction.
• To monitor students' learning and to provide on-going feedback to both
students and teachers during the teaching-learning process.
• To identify students’ learning difficulties, whether persistent or recurring.
• To assign grades.
These objectives can be achieved through the use of different assessment
instruments, such as essay tests and objective tests, which teachers use
depending on the aims of the measurement. The focus of this study is on objective
tests. The objective test is one of the assessment instruments used in testing or assessing
students’ academic achievement in any given instruction. In objective tests, such as
multiple choice questions, respondents are required to select the best possible answer
(or answers) from a list of choices (Okoro, 2006).
Multiple choice items consist of a stem and a set of options. The stem is the beginning
part of the item that presents the problem to be solved, a question asked of the
respondent, or an incomplete statement to be completed, as well as any other relevant
information. The options are the possible answers from which the examinee can choose,
with the correct answer called the key and the incorrect answers called distracters.
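To make this anatomy concrete, the sketch below represents one item as a stem, lettered options and a key, scores each examinee’s choice dichotomously (1 for the key, 0 for any distracter) and tallies how often each option was chosen. The item, the option letters and the ten responses are hypothetical illustrations, not material from this study’s instrument.

```python
from collections import Counter

# Hypothetical item for illustration only; not taken from the study's instrument.
item = {
    "stem": "Which of the following is a direct tax?",
    "options": {
        "A": "Value added tax",
        "B": "Personal income tax",
        "C": "Import duty",
        "D": "Excise duty",
    },
    "key": "B",
}


def score_response(choice, key):
    """Dichotomous scoring: 1 if the examinee chose the key, otherwise 0."""
    return 1 if choice == key else 0


def option_counts(choices, options):
    """Number of examinees who chose each option (the key and every distracter)."""
    counts = Counter(choices)
    return {label: counts.get(label, 0) for label in options}


# Hypothetical choices made by ten examinees.
choices = ["B", "A", "B", "C", "B", "B", "D", "A", "B", "B"]
scores = [score_response(c, item["key"]) for c in choices]
tally = option_counts(choices, item["options"])
# A distracter that nobody chooses is not attracting weaker examinees.
non_functioning = [o for o in item["options"] if o != item["key"] and tally[o] == 0]

print(scores)           # [1, 0, 1, 0, 1, 1, 0, 0, 1, 1]
print(tally)            # {'A': 2, 'B': 6, 'C': 1, 'D': 1}
print(non_functioning)  # []
```

The 0/1 scores produced this way are the input that both classical item analysis and IRT calibration work from; the option tally supports the distracter analysis mentioned later in this section.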
Test scores obtained from multiple choice questions are used to assess the
competence of students. Some advantages of multiple choice questions reported in
the literature are as follows. Multiple choice items can be used to measure both
the lower and higher levels of the cognitive domain (Onunkwo, 2002). Multiple
choice tests, unlike essay tests, allow the teacher to ask a large number of questions
that adequately cover the course content (Okoro, 2006). Bush (2001) noted that, in
multiple choice questions, test-takers can increase their probability of guessing the
right answer by eliminating unlikely choices. Multiple choice tests are generally much
more objective, because they are mostly self-administered and scorers can apply a
scoring key that allows them to agree perfectly (Meredith, Joyce & Walter, 2007).
However, all assessment instruments must satisfy the criteria of reliability, validity,
objectivity and usability (Anene & Ndubisi, 2003).
Reliability is conceived in terms of the consistency or dependability of a
measuring instrument (Abonyi, 2011). This implies that if a test in Economics were
administered an infinite number of times, it would be expected to generate responses
that vary a little from trial to trial as a result of measurement error. Therefore, for any
measuring instrument, the smaller the error, the greater the reliability, and the greater
the error, the smaller the reliability. An individual’s score on a test can be viewed as the
combined result of a true score and measurement error. The index of measurement
error used in interpreting individual scores is called the standard error of
measurement. The standard error of measurement, according to Onunkwo (2002),
is the standard deviation of a series of measurements taken on the same
individual. Validity refers to the extent to which an instrument measures what it is
designed to measure (Nworgu, 2006). A test with high validity will accurately measure
the particular qualities it is supposed to measure. The objectivity of a test
refers to whether its scores are undistorted by the biases of the individuals who
administer and score it, while the usability of a test is the extent to which it provides the
teacher or test administrator with clear instructions that can be put into practice without a
great deal of difficulty or confusion. In other words, a test in Economics is usable if it
does not force students to waste time working out how to record their
answers. Nevertheless, instrument development in Economics requires more than
determining the reliability, validity, objectivity and usability of the items. Other
indices, such as item difficulty, item discrimination and distracter effectiveness, are
required to determine the quality of the instrument.
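The classical indices named above can be computed directly from a matrix of scored (0/1) responses. The sketch below is a minimal illustration under stated assumptions: it uses KR-20 as the reliability coefficient (the text does not say which coefficient produced the study’s 0.89), the classical formula SEM = SD × √(1 − reliability), the proportion correct as item difficulty, and an upper-half minus lower-half index as item discrimination (with real data the upper and lower 27% are often used instead). The six-by-five response matrix is hypothetical.

```python
import math

# Hypothetical scored responses: rows are examinees, columns are items (1 = correct).
R = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 0],
    [1, 1, 0, 1, 0],
    [1, 0, 1, 0, 0],
    [1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]


def item_difficulty(matrix):
    """CTT difficulty index p: proportion of examinees answering each item correctly."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def total_score_variance(matrix):
    """Population variance of examinees' total scores."""
    totals = [sum(row) for row in matrix]
    mean = sum(totals) / len(totals)
    return sum((t - mean) ** 2 for t in totals) / len(totals)


def kr20(matrix):
    """Kuder-Richardson formula 20 reliability for dichotomously scored items."""
    k = len(matrix[0])
    sum_pq = sum(p * (1 - p) for p in item_difficulty(matrix))
    return (k / (k - 1)) * (1 - sum_pq / total_score_variance(matrix))


def standard_error_of_measurement(matrix, reliability):
    """Classical SEM: standard deviation of total scores times sqrt(1 - reliability)."""
    return math.sqrt(total_score_variance(matrix)) * math.sqrt(1 - reliability)


def discrimination_index(matrix):
    """Upper-lower index D: p in the top half minus p in the bottom half,
    with examinees ranked by total score."""
    order = sorted(range(len(matrix)), key=lambda i: sum(matrix[i]), reverse=True)
    half = len(matrix) // 2
    p_upper = item_difficulty([matrix[i] for i in order[:half]])
    p_lower = item_difficulty([matrix[i] for i in order[-half:]])
    return [u - l for u, l in zip(p_upper, p_lower)]


r = kr20(R)
print(round(r, 3))                                     # 0.81
print(round(standard_error_of_measurement(R, r), 3))   # 0.745
print([round(p, 2) for p in item_difficulty(R)])       # [0.83, 0.5, 0.5, 0.5, 0.17]
print([round(d, 2) for d in discrimination_index(R)])  # [0.33, 1.0, 0.33, 1.0, 0.33]
```

On real data the same functions would be applied to the full examinees-by-items score matrix; the hand-sized matrix here only keeps the arithmetic checkable.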
Unfortunately, Economics teachers often do not determine these qualities of their
tests. The reason may be that teachers assume classroom questions do not require
these qualities, or that they lack the knowledge needed to set quality tests. This may
contribute to students’ failure in West African Examinations Council (WAEC) examinations.
The procedures for determining these indices, or parameters, of the items of an
instrument depend on the measurement theory used. The two distinct measurement
theories are Classical Test Theory (CTT) and Item Response Theory (IRT).
Classical test theory is based on the true score theory, which views the observed score
(X) as the sum of a true score (T) and an error component (E), that is, X = T + E
(Adedoyin, 2010). The observed score of a test-taker is usually seen as an estimate of the
true score of the test-taker plus or minus some unobservable measurement error (Crocker
& Algina, 2008). An advantage of classical test theory is that it is relatively simple
and easy to interpret. CTT does not have a complex theoretical model relating an
examinee’s ability to success on a particular item. Instead, CTT considers a pool of
examinees collectively and empirically examines their success on a particular item.
However, CTT can be criticized because item difficulty can vary depending on the
sample of test-takers. Therefore, it is difficult to compare test-takers’ results
across different tests. Secondly, Npkone (2001) asserted that the proportion of
examinees in a sample who get an item correct changes from a sample whose mean
ability is high to one whose mean ability is low.
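The sample dependence described above can be illustrated numerically. The sketch below assumes that an item behaves according to a three-parameter logistic model with fixed, hypothetical parameters (a = 1.0, b = 0.0, c = 0.2, scaling constant D = 1.7) and that abilities in each group are normally distributed. It computes the expected CTT difficulty index (proportion correct) in a low-ability, an average and a high-ability group: the item’s parameters never change, yet its p-value does.

```python
import math


def p_correct(theta, a, b, c, D=1.7):
    """Three-parameter logistic probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def expected_p_value(a, b, c, mean, sd=1.0, steps=2001):
    """Expected proportion correct (CTT difficulty index) in a group whose abilities
    are normal with the given mean and sd, by numerical integration on a grid."""
    lo, hi = mean - 6 * sd, mean + 6 * sd
    h = (hi - lo) / (steps - 1)
    weighted, total_weight = 0.0, 0.0
    for i in range(steps):
        theta = lo + i * h
        w = math.exp(-0.5 * ((theta - mean) / sd) ** 2)  # unnormalized normal density
        weighted += w * p_correct(theta, a, b, c)
        total_weight += w
    return weighted / total_weight


a, b, c = 1.0, 0.0, 0.2  # hypothetical item parameters, identical in every group
p_low = expected_p_value(a, b, c, mean=-1.0)
p_mid = expected_p_value(a, b, c, mean=0.0)
p_high = expected_p_value(a, b, c, mean=1.0)
print(round(p_low, 2), round(p_mid, 2), round(p_high, 2))
```

An IRT calibration would, in principle, recover the same b for this item from either group once the groups are placed on a common scale, which is the invariance property attributed to IRT in the paragraphs that follow.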
However, despite its limitations, CTT is still used to derive estimates from achievement
tests in secondary schools. For instance, students’ achievement in Economics is often
subjected to statistical measures such as the mean, standard deviation, etc. These
statistics change for a test when another sample from the same population of students
is used; the estimates or indices obtained depend on which sample is drawn from the
student population. In other words, there is heavy dependence on students’ total
(aggregate) scores on a test, while achievement on individual items is not determined.
Therefore, to ensure effective teaching and learning of Economics in schools, an
achievement test that focuses on attainment on individual items will have better utility
than one based on students’ aggregate scores. An educational measurement scale that
has ratio-scale properties and sample-independent attributes, and that reports students’
ability at both the item and the total instrument level, can be developed with the
measurement theory called Item Response Theory (IRT), otherwise known as modern
test theory. For some researchers, Item Response Theory (IRT) is the answer to the
limitations of classical test theory (Troy-Gerard, 2004). Item response theory is a
modeling technique that tries to describe the relationship between an examinee’s test
performance and the latent trait underlying the performance (Henard, 2000). Reeve
(2002) describes item response theory as a body of theory describing the application
of mathematical models to data from questionnaires and tests as a basis for measuring
things such as abilities and attitudes. Item Response Theory (IRT) looks at the
examinee’s performance by using item distributions based on the examinee’s
probability of success on a latent variable. In IRT, item statistics, also referred to as
parameters, are estimated and interpreted. Under IRT, the parameters of persons are
invariant across items, and the parameters of items are invariant across different
populations of persons. IRT brings greater flexibility and provides more sophisticated
information, which allows the reliability of an assessment to be improved.
According to Nenty (2004), invariance is the bedrock of objectivity in physical
measurement, and the lack of it raises many questions about the scientific nature of
psychological measurement. Item response theory is a collection of different models
showing the relationship between a participant’s responses to an item and the underlying
latent trait (Ercikan & Koh, 2005). These models were originally developed for items
that are scored dichotomously (correct or incorrect), but the concepts and methods of
IRT extend to a wide variety of polytomous models for all types of psychological
variables that are measured by rating scales of various kinds (Vander & Hambleton,
1997). IRT models assume that the performance of an examinee can be completely
predicted or explained from one or more abilities. For dichotomous items, IRT commonly
models the probability of a correct answer with one of three logistic models. The
one-parameter logistic (1PL) model expresses the probability of a correct answer by
allowing each item on an achievement test to have its own difficulty parameter. The
two-parameter logistic (2PL) model additionally models each item’s level of
discrimination between high- and low-ability students, while the three-parameter
logistic (3PL) model adds a third item parameter, called the pseudo-guessing parameter,
which reflects the probability that an examinee with a very low trait level will correctly
answer an item solely by guessing. This implies that students can correctly answer an
item in an achievement test by guessing.
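The three logistic models described above differ only in which item parameters are free. A minimal sketch, assuming the common parameterization P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))) with scaling constant D = 1.7 (some programs and texts use D = 1): fixing a = 1 and c = 0 gives the 1PL model, fixing only c = 0 gives the 2PL model, and leaving all three free gives the 3PL model. The item parameters in the loop are hypothetical.

```python
import math


def irf(theta, b, a=1.0, c=0.0, D=1.7):
    """Item response function for the logistic IRT models.

    theta: examinee ability; b: difficulty; a: discrimination;
    c: pseudo-guessing (lower asymptote); D: scaling constant.
    1PL -> only b varies (a = 1, c = 0); 2PL -> b and a (c = 0); 3PL -> b, a and c.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item parameters, for illustration only.
for theta in (-3.0, -1.0, 0.0, 1.0, 3.0):
    p1 = irf(theta, b=0.5)                # 1PL
    p2 = irf(theta, b=0.5, a=1.4)         # 2PL
    p3 = irf(theta, b=0.5, a=1.4, c=0.2)  # 3PL
    print(f"theta={theta:+.1f}  1PL={p1:.3f}  2PL={p2:.3f}  3PL={p3:.3f}")
```

Two properties are worth noting. Under the 3PL model the probability never falls below c however low θ is, which is how the pseudo-guessing parameter captures guessing. And at θ = b the probability is (1 + c)/2 rather than 0.5, so when c > 0 the difficulty b marks the point of steepest slope rather than the 50% point.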
Obinne (2012) observed that guessing is giving an answer or making a
judgment about something without being sure of all the facts. The three-parameter
(guessing) model gives the probability that an individual of a given ability responds
correctly to an item with a given difficulty index, discrimination index and guessing
index. The model assumes that the three parameters (difficulty, discrimination and
guessing) are necessary for an accurate and valid estimate of the relationship between
the probability of a correct response to an item and the trait level (ability) of an
individual. Within the latent trait test model, the internal validity of a test is assessed in
terms of the statistical fit of each item to the model. Fit to the model also implies that
item discriminations are uniform and substantial and that there are no errors in item
scoring. It also indicates that guessing has had a negligible effect on test scores. IRT
models are extremely helpful for assessment instruments such as an Economics
achievement test when trying to understand students’ abilities by examining their test
performance. An Economics achievement test should also be fair to all examinees. A test
instrument is said to be fair when two groups of equal ability with respect to the
construct measured by the test earn the same score on each item of the test.
Comparing the results of subgroups indicates which items are functioning differently
for different groups of students. If the test is not fair, that is, if it yields different scores
for subgroups (for instance, by gender), it is said to suffer from
Differential Item Functioning (DIF). Differential item functioning is a collection of
statistical methods that indicate items that are functioning differently for
different groups of students (Madu, 2012). This implies that differential item
functioning would occur in an Economics achievement test if the item response
functions (IRFs) of an item differed between two groups. In the view of Meredith,
Joyce and Walter (2007), differential item functioning means that individuals of equal
ability but from different subgroups (e.g., males and females) do not have the same
probability of earning the same score. Gender is a broad analytic concept which
highlights women’s roles and responsibilities in relation to those of men. Gender
relates to the difference in sex (that is, either male or female) and how this quality
affects dispositions and perceptions toward life and academic activities (Okoh,
2007). Hence, instruments developed for measuring achievement in Economics
may suffer from differential item functioning if they lack the basic qualities that a test
instrument should have. Moreover, even when they possess some of these qualities, they
are based on the CTT framework, in which a large p-value difference and an
item-by-group interaction may label an item as biased when in fact no bias exists.
Therefore, a measurement theory that focuses on item-level rather than aggregate-level
performance in analyzing Economics achievement tests is the concern of this study.
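One way to operationalize the IRT definition of DIF given above is to compare an item’s difficulty estimates from separate calibrations of the two groups. The sketch below is a simplified illustration, not the BILOG-MG DIF procedure used in the study: it assumes the male (reference) and female (focal) estimates are already on a common ability scale, forms z = (b_focal − b_reference) / √(SE_reference² + SE_focal²), and flags the item when |z| exceeds 1.96 (the .05 level, two-tailed). All parameter estimates are hypothetical.

```python
import math


def difficulty_dif_z(b_ref, se_ref, b_focal, se_focal):
    """z statistic for the difference between an item's difficulty estimates in
    the reference and focal groups (estimates assumed to share a common scale)."""
    return (b_focal - b_ref) / math.sqrt(se_ref ** 2 + se_focal ** 2)


def flags_dif(z, critical=1.96):
    """Two-tailed test at the .05 level by default."""
    return abs(z) > critical


# Hypothetical estimates: (b_male, se_male, b_female, se_female) for two items.
items = {
    "item_1": (0.40, 0.10, 0.85, 0.12),
    "item_2": (0.30, 0.10, 0.38, 0.10),
}
for name, (b_m, se_m, b_f, se_f) in items.items():
    z = difficulty_dif_z(b_m, se_m, b_f, se_f)
    print(name, round(z, 2), "DIF" if flags_dif(z) else "no DIF")
```

A positive z here means the item was harder for the focal (female) group than for males of the same ability; in practice the size of the difference is often judged as well, not only its statistical significance.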
Statement of the Problem
The Federal Republic of Nigeria (FRN, 2004), in its National Policy on Education, places
great emphasis on continuous assessment, which is necessary at all levels of
education. Under this policy, teachers assess the knowledge, skills and abilities of
students in Economics at the senior secondary school level. Every assessment is
expected to treat all test-takers equally, but instruments that teachers develop through
classical test theory hardly accomplish this purpose. This is because CTT results are
group dependent, and item statistics such as item difficulty and item discrimination
are also group dependent.
Based on these limitations of instruments developed under classical test
theory, the researcher designed this study using a modern measurement theory to
ensure objectivity in the measurement of students’ scores in analyzing Economics
multiple choice test items. Therefore, the question addressed is: would item response
theory influence the development and validation of a multiple choice test in
Economics?
Purpose of the Study
The main purpose of this study was to apply item response theory in the
development and validation of a multiple choice test in Economics. Specifically, the
study determined the:
1. Standard errors of measurement of the test items of the multiple choice test in
Economics.
2. Fit of the items of the Economics multiple choice test using three-parameter
logistic (3PL) model.
3. Difficulty parameter of the test items of the multiple choice test in
Economics.
4. Discrimination parameter of the test items of the multiple choice test in
Economics.
5. Guessing parameter of the test items of the multiple choice test in Economics.
6. Differential item functioning of the test items of the multiple choice test in
Economics with respect to gender.
Significance of the Study
The results of this study have both theoretical and practical significance.
Theoretically, item response theory, a paradigm for the design, analysis and scoring of
tests, questionnaires and similar instruments measuring abilities, attitudes or other
variables, was used to show the relationship between students’ test performance and
the latent trait underlying the performance. The theory also provides a better view of
the information each question provides about a student.
In practical terms, this study is expected to benefit teachers, curriculum planners,
students and guidance counselors.
This study should help teachers to understand the steps involved in test
development. This would enable teachers to set quality questions in school which may
have qualities similar to those of external examination questions. It may also show
teachers that the performance of students in external examinations depends on the
quality of the questions or assessments they set in school. Teachers should find this
study useful, as it helps to ensure fuller reporting of the achievement of examinees by
providing ideas for meaningful interpretation of examinees’ results through the
person-by-item encounter (latent trait model) during examination. The study
would report the examinees’ achievement by classifying the examinees into ability
levels on each of the items based on item response theory (IRT), using the item response
function (IRF). Economics teachers can use the instrument to predict the probability
of examinees correctly answering any given item if the examinees’ ability levels
are known.
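Reporting each examinee’s ability from the item response functions can be sketched as follows. Assuming item parameters have already been calibrated (the three items here are hypothetical), an examinee’s ability θ is estimated by maximizing the 3PL log-likelihood of his or her 0/1 response pattern over a grid of θ values, and the precision of that estimate is summarized by the IRT standard error of measurement, SEM(θ) = 1/√I(θ), where I(θ) is the test information. A grid search is used only because it is short and robust; BILOG-MG offers its own scoring methods, which this sketch does not reproduce.

```python
import math

D = 1.7  # scaling constant (assumed)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def item_information(theta, a, b, c):
    """Fisher information of one 3PL item at ability theta."""
    p = p3pl(theta, a, b, c)
    return (D * a) ** 2 * ((1 - p) / p) * ((p - c) / (1 - c)) ** 2


def log_likelihood(theta, responses, items):
    """Log-likelihood of a 0/1 response pattern given (a, b, c) for each item."""
    ll = 0.0
    for u, (a, b, c) in zip(responses, items):
        p = p3pl(theta, a, b, c)
        ll += u * math.log(p) + (1 - u) * math.log(1 - p)
    return ll


def ml_ability(responses, items, lo=-4.0, hi=4.0, step=0.01):
    """Grid-search maximum likelihood estimate of theta. All-correct or all-wrong
    patterns have no finite maximum and simply end up at a grid boundary."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


def sem_at(theta, items):
    """IRT standard error of measurement: 1 / sqrt(test information)."""
    info = sum(item_information(theta, a, b, c) for a, b, c in items)
    return 1 / math.sqrt(info)


# Hypothetical calibrated items (a, b, c) and one examinee's responses.
items = [(1.2, -1.0, 0.20), (0.9, 0.0, 0.25), (1.5, 1.0, 0.15)]
responses = [1, 1, 0]
theta_hat = ml_ability(responses, items)
print(round(theta_hat, 2), round(sem_at(theta_hat, items), 2))
# Predicted probability that this examinee answers a further hypothetical item correctly:
print(round(p3pl(theta_hat, 1.0, 0.5, 0.2), 2))
```

With only three items the SEM is large; on a 50-item test the information sums over many more items, so ability is estimated much more precisely, especially where item difficulties cluster.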
To curriculum planners, this study provides a basis for reforming curricular goals
and objectives. Its usefulness lies in providing empirical data to enable them to plan
a functional curriculum that takes into consideration the development and
validation of achievement tests in subjects such as Economics. This should encourage
and guide teachers to develop and set quality questions in school.
To students, the study would explain how to interpret their performance in
Economics when assessed using the developed instrument. It should enable them to
understand the relationship between their performance on each question they
answered and the underlying latent trait.
For guidance counselors, the findings of this study would help them to understand
the performance of students on each question, as revealed by the teachers’ tests. This
should enable them to determine the strengths and weaknesses of each student and to
advise students from time to time on the factors that affect their performance, their
academic life and their choice of career.
Scope of the Study
Application of item response theory in the development and validation of a
multiple choice test in Economics was limited to SS2 Economics students in senior
secondary schools in Nsukka Education Zone of Enugu State. SS2 students were
chosen because the topics used in the instrument of this study are contained in the SS2
scheme of work. The content scope includes: demand and supply, financial
institutions, public finance, labour force, alternative economic systems, theory of cost,
and inflation. These topics were selected from the SS2 Economics syllabus. They were
chosen because students always find them difficult to understand during classroom
teaching and learning.
Research Questions
The following research questions were posed to guide this study.
1. What are the standard errors of measurement of the test items of the multiple
choice test in Economics?
2. How do the items of the Economics multiple choice test fit the three-parameter
logistic (3PL) model?
3. What are the difficulty parameters of the test items of the multiple choice test
in Economics?
4. What are the discrimination parameters of the test items of multiple choice test
in Economics?
5. What are the guessing parameters of the test items of the multiple choice test in
Economics examinations?
6. What is the differential item functioning of the test items of the multiple
choice test in Economics with respect to gender?
Research Hypotheses
The following null hypotheses (H0) were formulated and tested at the .05 level of
significance.
1. H01: There is no significant fit of the items of the Economics multiple choice
test to the three-parameter logistic (3PL) model.
2. H02: The test items of the multiple choice test in Economics do not function
differentially between male and female SS II Economics students.
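Hypothesis H01 turns on how well each item’s observed responses agree with its fitted 3PL curve. A common way to check this, sketched below as an illustration rather than the exact BILOG-MG statistic (BILOG-MG reports a likelihood-ratio chi-square computed over ability intervals), is a Pearson-type statistic in the spirit of Yen’s Q1: rank examinees by estimated ability, split them into groups, and in each group compare the observed proportion correct with the mean model-predicted probability. The abilities, responses and item parameters are hypothetical, and the final step of referring the statistic to a chi-square distribution at the .05 level (with degrees of freedom depending on the number of groups and estimated parameters) is left out.

```python
import math

D = 1.7  # scaling constant (assumed)


def p3pl(theta, a, b, c):
    """3PL probability of a correct response."""
    return c + (1 - c) / (1 + math.exp(-D * a * (theta - b)))


def item_fit_chi_square(thetas, responses, a, b, c, n_groups=5):
    """Pearson-type item-fit statistic (in the spirit of Yen's Q1).

    Examinees are sorted by ability and split into n_groups groups of roughly equal
    size (assumes len(thetas) >= n_groups); each group contributes
    N_j * (O_j - E_j)^2 / (E_j * (1 - E_j)), where O_j is the observed proportion
    correct and E_j the mean predicted probability in group j.
    """
    order = sorted(range(len(thetas)), key=lambda i: thetas[i])
    size = len(order) / n_groups
    chi2 = 0.0
    for g in range(n_groups):
        members = order[int(round(g * size)):int(round((g + 1) * size))]
        n = len(members)
        observed = sum(responses[i] for i in members) / n
        expected = sum(p3pl(thetas[i], a, b, c) for i in members) / n
        chi2 += n * (observed - expected) ** 2 / (expected * (1 - expected))
    return chi2


# Hypothetical ability estimates and two response patterns to the same item.
thetas = [-2.0, -1.5, -1.0, -0.5, 0.0, 0.5, 1.0, 1.5, 2.0, 2.5]
consistent = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]  # success rises with ability
reversed_ = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]   # success falls with ability
a, b, c = 1.0, 0.0, 0.0
print(round(item_fit_chi_square(thetas, consistent, a, b, c), 2))
print(round(item_fit_chi_square(thetas, reversed_, a, b, c), 2))
```

A large value relative to the chi-square critical value indicates misfit; the reversed pattern, in which weaker examinees succeed more often than stronger ones, produces a far larger statistic than the consistent one.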
CHAPTER TWO
LITERATURE REVIEW
In this chapter, the researcher presents a review of literature related to the
present study. The review is organized under the following: conceptual framework,
theoretical framework, empirical studies and summary of literature review.
Conceptual Framework
• Concept of Achievement Test
• Procedures for Development of a Test
• Qualities of a Test
• Item Analysis
• Differential Item Functioning (DIF)
• Standard Errors of Measurement
• Concept of Gender
• Analysis of Fit
Theoretical Framework
• Classical Test Theory
• Item Response Theory
Empirical Studies
• Studies on Development and Validation of Instrument
• Studies on Item Response Theory
Summary of Literature Review
Conceptual Framework
Concept of Achievement Test
An achievement test is an examination designed to assess how much
knowledge a person has in a certain area or set of areas as a result of teaching. Ali
(2006) viewed an achievement test as an instrument administered to an individual or
group as a stimulus to elicit certain desired or expected responses which represent
his/her ability. Every measuring instrument, such as a test, is expected to possess certain
qualities so that whatever information is obtained with it can be acceptable (Ezeh &
Onah, 2005). Any test, and indeed any evaluation instrument, must satisfy the criteria
of reliability, validity and objectivity (Anene & Ndubuisi, 2003). Achievement
tests may be classified into teacher-made tests and standardized tests. Teacher-made tests
are teachers' own tests (Onunkwo, 2002); they are tests constructed by individual
teachers in their schools for assessing their students/pupils. Ifeakor (2011) opined that
a standardized test is one that has norms. Norms are a set of descriptive data which
make it possible to determine the standing of a candidate in relation to a specified
reference group. Standardized tests provide a uniform set of questions, instructions
and method of administration. Tests for measuring the achievement of objectives in
the cognitive domain fall mainly into two categories: the essay test and the objective test.
Onunkwo (2002) defined an essay test as a test in which students are required to provide
answers to questions and which offers students the opportunity to organize and express their
ideas in writing. Objective tests, such as the multiple choice test which is the focus of
this study, can assume two forms: the first may be a direct question which the
testees are requested to answer, while the second is an incomplete statement
posed to the testees to complete (Onunkwo, 2002). Whatever its form, any
multiple choice item has two parts, namely the stem and the alternatives (i.e., the
answer options). The stem is the direct question or incomplete statement, while the
alternatives are the options from which the testees are instructed to pick the one
which is most correct.
Procedures for Development of a Test
The development of a test involves a number of steps, described below.
Content Analysis: A test developer should have a clear outline of the subject matter or content
of the subject on which the test is being developed. Content analysis means that the
test developer should look at the relevant subject content on which the test is to be based
and find out what the content is all about (Anene & Ndubisi, 2003).
Review of Instructional Objectives: The second step in the development of a test is the
review of instructional objectives. According to Anene and Ndubisi (2003),
instructional objectives are those behavioural changes, which a teacher expects to
notice in his students after they have been exposed to a particular topic. Therefore, a
test developer must be sure of the instructional objectives because these are the traits
he should be testing for in the testee.
Development of Test Blue Print: A table of specification is a plan or guide for test
preparation (Okolo, 2006). It specifies the number of questions to be asked
on each topic or course unit, and the number of questions on recall of facts,
comprehension, application, etc.
A Sample of a Test Blue-print for a 50 Item Achievement Test.
Following the example in the table, the number of questions for each cell is worked
out, and this serves as the guide for constructing the test items.
Item Writing: This involves writing the items of the test as guided by the test
blueprint (Harbor-Peters, 1999).
Face Validation: This deals with what a scale appears to measure based on the various
items. Face validity, according to Polit and Hungler (2002), is the process of sending
scale items to experts in the field of the subject matter for criticism.
Item Review: Item review, in the view of Anene and Ndubisi (2003), involves looking
closely at the individual test items that have been written and choosing those that are
most appropriate, so that in the end the items that survive the scrutiny are
used in the trial testing.
Trial testing: This involves administering the validated test to a large representative
sample of the students for whom the test was designed (Anastasi & Urbina, 2002).
Sample test blueprint for a 50-item achievement test (number of items per cell)

Content         Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
                (40%)      (25%)          (20%)        (5%)      (5%)       (5%)        (100%)
Topic A (30%)   6          4              3            1         0          1           15
Topic B (10%)   2          1              1            0         0          0           4
Topic C (25%)   5          3              3            1         0          1           13
Topic D (20%)   4          3              2            0         1          1           11
Topic E (15%)   3          2              2            0         0          0           7
Total (100%)    20         13             11           2         1          3           50
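The cell counts in a blueprint such as the sample table are obtained by splitting the total number of items first by topic weight and then by objective weight. The Python sketch below assumes whole-number percentage weights, and its function name `blueprint` is hypothetical. It returns the unrounded expected count for each cell, which the test developer then rounds by judgement so that rows and columns add up to the chosen total. The sample table appears to have been rounded and adjusted in this way, so not every raw value matches a published cell. For example, Topic A × Knowledge gives exactly 6, but the raw row total for Topic B is 5 rather than 4.

```python
def blueprint(total_items, topic_weights, objective_weights):
    """Unrounded expected number of items in each topic x objective cell.

    Weights are whole-number percentages; each set must sum to 100.
    cell = total_items * topic% * objective% / (100 * 100)
    """
    if sum(topic_weights.values()) != 100 or sum(objective_weights.values()) != 100:
        raise ValueError("topic and objective weights must each sum to 100")
    return {
        topic: {obj: total_items * tw * ow / 10000 for obj, ow in objective_weights.items()}
        for topic, tw in topic_weights.items()
    }


topics = {"Topic A": 30, "Topic B": 10, "Topic C": 25, "Topic D": 20, "Topic E": 15}
objectives = {"Knowledge": 40, "Comprehension": 25, "Application": 20,
              "Analysis": 5, "Synthesis": 5, "Evaluation": 5}

cells = blueprint(50, topics, objectives)
for topic, row in cells.items():
    # Raw values; the developer rounds these so that row and column totals add up.
    print(topic, row, "row total:", sum(row.values()))
```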
Item Analysis: In test construction, item analysis is the last step the researcher
takes into consideration. Anene and Ndubisi (2003) asserted that item analysis
involves the analysis of responses to the individual items in the test. The items are
subjected to statistical analysis, so that those that pass the analysis are selected for the
final form of the test, while those that fail are either discarded or modified and tried
out again. All these procedures are used in classical test theory and can also be applied
in item response theory.
Qualities of a Test
Measuring instruments used in psychology and education include tests, rating scales,
checklists, questionnaires and inventories. These instruments must possess certain
desirable qualities in order to be used as vital tools in psychological and educational
decisions. These qualities are validity, reliability, objectivity, and usability.
Concept of Validity
Validity centers on whether the instrument measures what it is intended to
measure. Ezeh (2003) stated that validity of a test refers to the extent to which a test
measures what it is supposed to measure and nothing else. Therefore, the validity of a
test depends on the purpose for which the test was developed. This means that a test,
which is valid for assessing achievement in S.S. II Economics, may not be valid for
assessing achievement in S.S. III Economics.
Types of Validity
There are four types of validity, namely content validity, criterion-related
validity, face validity and construct validity.
Content Validity
Content validity refers to the extent to which the test measures both the subject
matter content and the instructional objectives designed for a given course (Ezeh,
2003). It is the most appropriate form of validity for achievement test. A test blue
print or table of specification is used to ensure a systematic coverage of the entire
course content and instructional objectives.
Face Validity: This refers to the appropriateness of the test in relation to the course on
which the test is based (Anikweze, 2010). A test has face validity when it appears valid to
the examinees who take it, the personnel who administer it and even other untrained
observers.
Criterion Validity
This type of validity indicates the extent to which students who have been
taught based on the objectives being measured score higher on the test of those
objectives than students who have not been taught (Anikweze, 2010). If a proficiency
test has criterion validity, then students should score lower on it when it is used as a
pretest than when it is used as a posttest. Criterion validity is obtained by correlating the
two sets of scores from the two administrations. Examples of criterion-related validity include
concurrent and predictive validity.
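Criterion validity, like the test-retest and equivalent-forms reliability methods discussed later in this chapter, rests on correlating two sets of scores from the same examinees. Below is a minimal Python sketch of Pearson's product-moment correlation. The score lists are hypothetical and do not come from this study.

```python
import math


def pearson_r(x, y):
    """Pearson product-moment correlation between two paired lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two paired score lists of equal length (n >= 2)")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a score list with no variance has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical scores of six learners on a new test and on a criterion measure.
new_test = [12, 15, 9, 20, 17, 11]
criterion = [50, 58, 41, 72, 66, 47]
print(round(pearson_r(new_test, criterion), 3))
```

A coefficient close to 1 would suggest that the new test ranks and spaces learners much as the criterion does.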
Concurrent Validity
Concurrent validity deals with how present performance could be used to
estimate some other current measure of performance. For instance, the West African
Senior School Certificate Examination results could be used to predict performance in
the University Matriculation Examination. In the view of Martyn (2009), concurrent
validity measures the test against a benchmark test, and a high correlation indicates that
the test has strong criterion validity. For instance, if the scores on a test already
known to be valid are highly correlated with scores on a selection test administered to the
same group of learners, then concurrent validity is obtained for the selection test.
Predictive Validity
Onunkwo (2002) opined that predictive validity is the most relevant for
intelligence tests, aptitude tests, interest and attitude tests. All tests used in the selection
of candidates (say, into education, business, industry, the armed forces, etc.) or in predicting
future performance/achievement must demonstrate high predictive validity.
Construct Validity
Ifeakor (2011) related construct validity to educational and psychological
traits (constructs) that cannot be seen with the eyes; their existence can only be inferred from
the manifested characteristics or behaviour ascribed to them. These traits include attitude,
creativity, intelligence, speed of reading, ability, interest, aptitude, etc. If a test is able
to measure such a psychological trait, then the test has construct validity.
Concept of Reliability
Reliability of a test relates to the degree of consistency or stability which the
test exhibits. According to Ezeh and Onah (2005), reliability can be seen as the degree
of consistency of two or more measures of the same thing. According to Eboh (2009),
reliability refers to the degree to which a given measurement procedure will give the
same description of a phenomenon if that measurement is repeated. It therefore
concerns whether a particular technique will always yield the same result if repeatedly
applied to the same object. For instance, if Ngozi and Ifeoma each obtained scores of
70% in a given test and, three days later, the same test was re-administered to the
same class and their scores were 60% and 45% respectively, then the test is said to be
unreliable because the scores are inconsistent.
Method of Measuring Reliability
The degree of consistency of a test is expressed as a coefficient called the
coefficient of reliability. In most cases, this is determined by correlating two sets of
scores obtained independently from the test. The reliability coefficient has been
defined as a description of the loss in efficiency of estimation resulting from
measurement error (Ferguson, 2011). It is therefore interpreted directly as the
proportion of true variance. For instance, if the reliability coefficient obtained for a
test is 0.90, then 90% of the variance in the test scores is due to true variance, while the
remaining 10% is attributable to chance factors or error variance.
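Under the classical model X = T + E, the true and error scores are assumed to be uncorrelated. With that assumption, reading the reliability coefficient as the proportion of true variance can be written out as follows. This is the standard textbook identity, applied here to the 0.90 example:

```latex
X = T + E, \qquad \sigma_X^2 = \sigma_T^2 + \sigma_E^2 \quad (\operatorname{Cov}(T, E) = 0)

r_{XX'} = \frac{\sigma_T^2}{\sigma_X^2} = 1 - \frac{\sigma_E^2}{\sigma_X^2}

r_{XX'} = 0.90 \;\Rightarrow\; \sigma_T^2 = 0.90\,\sigma_X^2, \qquad \sigma_E^2 = 0.10\,\sigma_X^2
```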
There are four types of reliability.
Stability
This is the correlation between two successive measurements with the same
test. Stability is the ability of the same test to give the same result whenever it is
administered to the same subjects within a given time interval (Harbor-Peters,
1999). This measure of stability, often called a test-retest estimate of reliability, is
obtained by administering a test to a group of individuals, re-administering the same
test to the same individuals at a later date, and correlating the two sets of scores using
Pearson's r or Spearman's rank correlation coefficient.
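When test-retest scores are correlated with Spearman's rank coefficient, the scores are first converted to ranks, with tied scores sharing their mean rank, and Pearson's r is then computed on those ranks. The Python sketch below illustrates this. The helper names and the five students' scores are hypothetical.

```python
import math


def average_ranks(scores):
    """Rank scores from lowest (rank 1) upward; tied scores share their mean rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Extend j over every score tied with the one at position i.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rank correlation: Pearson's r computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical percentage scores of five students on a test and its retest.
first = [70, 62, 55, 80, 48]
retest = [68, 65, 50, 78, 52]
print(spearman_rho(first, retest))
```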
Equivalent Forms Reliability
Equivalent forms reliability is obtained from the successive administration of two parallel
forms of the same test. It is also referred to as the parallel or alternate
form reliability method (Onunkwo, 2006). The two equivalent forms of a particular
test are administered to the same group of students. The students take
one form of the test (say, Form A) on the first occasion and a comparable
form of that test (say, Form B) on the second occasion. Their scores on the two forms
(i.e., A and B) are then correlated using Pearson's r. The coefficient so computed
represents the equivalent-form reliability of the instrument.
Internal Consistency Reliability
Meredith, Joyce and Walter (2007) stressed that internal consistency is an
approach to estimating test score reliability that involves examination of the individual
items of the test.
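The text does not name a particular internal-consistency coefficient. For dichotomously scored multiple choice items, one widely used option is the Kuder-Richardson Formula 20 (KR-20), to which Cronbach's alpha reduces for 0/1 items. The sketch below computes KR-20 = k/(k − 1) × (1 − Σpq / σ²), where σ² is the variance of the total scores. It assumes a persons × items matrix of 0/1 scores and population (divide-by-N) variances. The function name `kr20` and the data are hypothetical.

```python
def kr20(responses):
    """Kuder-Richardson Formula 20 for a persons x items matrix of 0/1 scores."""
    n_persons = len(responses)
    k = len(responses[0])
    if k < 2:
        raise ValueError("KR-20 needs at least two items")
    totals = [sum(row) for row in responses]
    mean = sum(totals) / n_persons
    var_total = sum((t - mean) ** 2 for t in totals) / n_persons  # population variance
    if var_total == 0:
        raise ValueError("total scores have no variance")
    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n_persons  # proportion correct on item j
        sum_pq += p * (1 - p)
    return (k / (k - 1)) * (1 - sum_pq / var_total)


# Hypothetical 0/1 responses of four examinees to three items.
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
print(kr20(data))
```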
Objectivity of a Test
The objectivity of a test refers to the degree to which equally competent scorers
obtain the same results. In Economics essay testing, as well as in the use of
various observational procedures, the results depend to a large extent upon the person
doing the scoring. Different persons get different results, and even the same person
may get different results at different times. Such inconsistency in scoring has an
adverse effect on the reliability of the measures obtained, for the test scores then
reflect the opinions and biases of the scorer as well as the differences among pupils in
the characteristic being measured.
Usability of a Test
Usability of a measuring instrument refers to the practicability of the
instrument. Harbor-Peters (1999) noted that it is the ability of a test to serve the
educational purpose it is designed to serve. The usability of a test has implications for the
decisions taken on the test result. For instance, if a test is developed and validated for
use in schools, the cost of purchasing and administering such a test should be
affordable by the school. But where a valid test is developed and schools cannot
afford to purchase it, or cannot use it within the school time frame, such a test is
not usable. These qualities of a test instrument also apply in item response
theory.
Item Analysis
Item analysis has to do with the assessment of the adequacy of each of the
items that make up the test/instrument. During item analysis each of the items is
assessed in terms of its difficulty, discrimination, and distractor index (Abonyi, 2011).
Denga (2003) opined that item analysis is a process of assessing students’ responses to
each item in order to judge the quality or worth of the test. Item analysis is focused
upon answering two basic questions:
• How difficult is each item for the students?
• To what extent did each item discriminate between good and poor students?
To answer these questions, it is necessary to compute a statistical index called the
difficulty index for each item and another statistical index called the discrimination
index.
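The two classical indices can be computed directly from a matrix of 0/1 item scores. The Python sketch below defines the difficulty index p as the proportion of all examinees answering an item correctly. It defines the discrimination index D as the difference between the proportions correct in an upper and a lower group ranked by total score. Using the top and bottom 27% as these groups is a common convention assumed here, not something the text specifies. The function name and data are hypothetical.

```python
def item_analysis(responses, group_fraction=0.27):
    """Classical difficulty (p) and upper-lower discrimination (D) per item.

    responses: persons x items matrix of 0/1 scores.
    p = proportion of all examinees answering the item correctly.
    D = (correct in upper group - correct in lower group) / group size.
    """
    n = len(responses)
    g = max(1, round(n * group_fraction))  # size of each extreme group
    if 2 * g > n:
        raise ValueError("upper and lower groups would overlap")
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    upper, lower = ranked[:g], ranked[-g:]
    results = []
    for j in range(len(responses[0])):
        p = sum(row[j] for row in responses) / n
        d = (sum(row[j] for row in upper) - sum(row[j] for row in lower)) / g
        results.append((p, d))
    return results


# Hypothetical responses of four examinees to three items; with so few examinees,
# half of the group is used as the upper group and half as the lower group.
data = [[1, 1, 1],
        [1, 1, 0],
        [1, 0, 0],
        [0, 0, 0]]
for j, (p, d) in enumerate(item_analysis(data, group_fraction=0.5), start=1):
    print(f"item {j}: p = {p:.2f}, D = {d:.2f}")
```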
Differential Item Functioning (DIF)
This refers to differences in the functioning of items across groups, oftentimes
demographic, which are matched on the latent trait or, more generally, the attribute
being measured by the items or test (Osterlind & Everson, 2009). It is important to note
that when examining items for DIF, the groups must be matched on the measured
attribute, otherwise this may result in inaccurate detection of DIF.
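The text does not say which DIF procedure is used. The Python sketch below illustrates the matching requirement with the Mantel-Haenszel procedure, one common classical method and an assumption here. Examinees are stratified by total score, and a 2 × 2 table (group × right/wrong) is formed at each score level. These tables are pooled into a common odds ratio α, and α is often reported on the ETS delta scale as MH D-DIF = −2.35 ln α. A value of α near 1 (delta near 0) suggests negligible DIF. The group labels, function names and data are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_alpha(item, total, group, reference="male"):
    """Mantel-Haenszel common odds ratio for one 0/1 item.

    item      : 0/1 scores on the studied item
    total     : matching score (e.g. total test score) for each examinee
    group     : group label for each examinee; `reference` marks the reference group
    Examinees are stratified by total score so the groups are compared at equal ability.
    """
    strata = defaultdict(lambda: [0, 0, 0, 0])  # [A, B, C, D] per score level
    for x, score, g in zip(item, total, group):
        cell = strata[score]
        if g == reference:
            cell[0 if x == 1 else 1] += 1  # A = reference right, B = reference wrong
        else:
            cell[2 if x == 1 else 3] += 1  # C = focal right, D = focal wrong
    num = den = 0.0
    for a, b, c, d in strata.values():
        t = a + b + c + d
        num += a * d / t
        den += b * c / t
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    return num / den


def mh_d_dif(alpha):
    """ETS delta scale, MH D-DIF = -2.35 ln(alpha); negative values favour the reference group."""
    return -2.35 * math.log(alpha)


# Hypothetical data: eight examinees, all with the same total score of 5.
item = [1, 1, 1, 0, 1, 0, 0, 0]
total = [5] * 8
group = ["male"] * 4 + ["female"] * 4
alpha = mantel_haenszel_alpha(item, total, group)
print(alpha, mh_d_dif(alpha))
```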
Standard Error of Measurement (S.E.M)
Any time a student takes a test, there is a possibility that the raw score
(observed score) obtained may be less or more than the score the student should have
received (true score). The difference between the observed score and the true score is
called the error score: student observed score = student true score + student error
score. According to Chatterji (2003), the standard error of measurement is a statistical
estimate of the amount of random error in assessment results or scores.
Meredith, Joyce and Walter (2007) indicated that the standard error of measurement allows
one to determine the probable range within which an individual's true score falls. The
standard error of measurement helps us to understand that the scores obtained in any
educational measurement are only estimates and may be considerably different from
individuals' presumed true scores.
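In classical test theory, the standard error of measurement is commonly computed as SEM = SD × √(1 − r), where SD is the standard deviation of observed scores and r is the reliability coefficient. The probable range for a true score is then taken as the observed score ± z × SEM. The Python sketch below assumes that formula and normally distributed errors, and the figures are hypothetical. This is the classical, test-level SEM. Under item response theory, precision is instead expressed as a standard error that varies with ability and is derived from the test information function.

```python
import math


def standard_error_of_measurement(sd, reliability):
    """Classical SEM: SD of observed scores times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return sd * math.sqrt(1.0 - reliability)


def true_score_band(observed, sd, reliability, z=1.96):
    """Observed score +/- z * SEM (z = 1.96 gives roughly 95% if errors are normal)."""
    sem = standard_error_of_measurement(sd, reliability)
    return observed - z * sem, observed + z * sem


# Hypothetical test with SD = 10 and reliability 0.91, and a student scoring 60.
print(standard_error_of_measurement(10, 0.91))
print(true_score_band(60, 10, 0.91))
```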
Concept of Gender
Gender is the range of physical, mental, and behavioral characteristics
pertaining to, and differentiating between, masculinity and femininity. According to
Lee (2001), gender is an ascribed attribute that differentiates feminine from masculine.
Differences in academic achievement due to gender are of crucial interest to
educationists. The World Health Organization (2002) defines gender as the result of
socially constructed ideas about the behavior, actions, and roles a particular sex
performs. Okeke (2006) described gender as the socially or culturally constructed
characteristics, qualities, behaviours and roles which different societies ascribe to
females and males.
Analysis of Fit
The analysis of statistical fit is a check on internal validity (Obinne, 2013).
Within the latent trait test model, the internal validity of a test is assessed in terms of
the statistical fit of each item to the model. According to Korashy (1995), if the fit
statistic of an item is acceptable, then the item is valid. IRT has three commonly used
models for dichotomously scored items: the one-parameter, two-parameter and
three-parameter models. If a given set of items fits the model, this is evidence that the
items refer to a unidimensional ability. In the one-parameter case, fit to the model also
implies that item discriminations are uniform and substantial and that there are no errors
in item scoring. A large positive fit statistic indicates misfit, while a statistic close
to one indicates better fit.
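The rule that a statistic near one indicates good fit matches mean-square fit statistics. The Python sketch below computes one of these, the unweighted (outfit) mean-square, for a single item. It is the mean, over examinees, of the squared standardized residual (x − P)² / [P(1 − P)], where P is the model probability. The sketch makes three assumptions. First, outfit is only one of several item-fit statistics and is not necessarily the one used in this study. Second, the 3PL form is written without a scaling constant. Third, abilities and item parameters have already been estimated elsewhere. All names and values are hypothetical.

```python
import math


def p3pl(theta, a, b, c):
    """3PL probability of a correct response (no scaling constant D)."""
    return c + (1 - c) / (1 + math.exp(-a * (theta - b)))


def outfit_mnsq(responses, thetas, a, b, c):
    """Outfit mean-square for one item: mean squared standardized residual.

    responses : 0/1 scores of each examinee on this item
    thetas    : previously estimated ability of each examinee
    Values near 1 indicate fit; values well above 1 indicate misfit.
    """
    total = 0.0
    for x, theta in zip(responses, thetas):
        p = p3pl(theta, a, b, c)
        total += (x - p) ** 2 / (p * (1 - p))
    return total / len(responses)


# Hypothetical item (a = 1.2, b = 0.0, c = 0.2) answered by five examinees.
print(outfit_mnsq([0, 0, 1, 1, 1], [-1.5, -0.5, 0.0, 0.8, 1.6], 1.2, 0.0, 0.2))
```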
Schematic Representation of Conceptual Framework
Figure 1: Schema of the steps in test development (content analysis, development of test blueprint, item writing, item review)
The schema above indicates the steps involved in the development of
an instrument. These steps relate to the following key variables: development
of Economics multiple choice questions, qualities of a test, and validation of
Economics multiple choice questions. The framework thus describes how the
researcher develops and validates an instrument consisting of Economics multiple choice
questions.
Theoretical Framework
The application of item response theory in the development and validation of
multiple choice questions in Economics is best described by two theories. The two
theories discussed in this study are classical test theory and item response theory.
An overview of Classical Test Theory
Classical test theory (CTT) is also known as the True Score Model (TSM). The
basic idea behind the theory is that the observed score (X) is made up of two components,
the true score and the error score (Anikweze, 2010). CTT is concerned with the
relationship between these three variables X, T, and E. This relationship is used to
discuss the quality of the scores. The true score reflects the exact value of the
respondent's ability or attitude. Mathematically, it is written as
X = T + E
where X = observed score
T = true score
E = error.
Mehrens and Lehmann (1978) emphasized that classical test theory describes
how errors of measurement can influence the observed scores. Onunkwo (2002)
indicated that the observed score (X) is the simple sum of a true score (T) and an
error score (E), where E reflects the effect of extraneous influences on the measurement
process at the time of measurement. For instance, a child's mood at the time of
measurement may increase or decrease his test performance at that particular
measurement. The assumption of classical test theory is, however, difficult to realize in
practical situations because, aside from random errors, the scores the testing instruments
yield for each testee always differ from the person's true ability or
characteristics.
An overview of Item Response Theory
Item Response Theory (IRT) is commonly used to create a response curve
(probability of a student with a particular ability to answer the question correctly) for
each item and/or to create a scaled score for the whole test based on what is known
about each item (Windy & Carl, 2010). According to Osterlind (2012), item response theory
is an approach to modern educational and psychological measurement that
posits a particular notion about cognition and sets forth sophisticated statistics to
appraise cognitive processes. Its objective is to reliably calibrate individuals and test
stimuli (i.e., items and exercises) on a common scale that is interpreted to show the
individuals' ability or proficiency and specified characteristics of the test stimuli. IRT
is applicable to many practical testing problems, such as generalizability of test
results, various item analyses, examining test bias and differential item functioning,
equating test forms, estimating construct parameters, domain scoring, and adaptive
testing. Nering and Ostini (2010) describe item response theory, also known as latent trait
theory, strong true score theory, or modern mental test theory, as a paradigm for the
design, analysis, and scoring of tests, questionnaires and similar instruments measuring
abilities, attitudes, or other variables. Unlike simpler alternatives for creating scales
and evaluating questionnaire responses, it does not assume that each item is equally
difficult. Palmieri (2012) explained that it is a model-based version of test theory that
uses a mathematical function to describe the relationship between a person’s standing
on a latent trait and his/her item responses. When an appropriate model is selected, the
likelihood that a person will respond to an item in the keyed direction is modeled as a
function of the person's standing on the underlying construct and of the item's
characteristics, such as its difficulty and discrimination. Item response theory is also
a mathematical model that describes how people interact with test items (Embretson &
Reise, 2000). In IRT persons and items are located on the same continuum. Most IRT
models assume that the latent variable is represented by a unidimensional continuum.
In addition, for an item to have any utility it must be able to differentiate among
persons located at different points along a continuum. An item’s capacity to
differentiate among persons reduces our uncertainty about their locations. This
capacity to differentiate among people with different locations may be held constant
or allowed to vary across an instrument’s items. Therefore, individuals are
characterized in terms of their locations on the latent variable and, at a minimum,
items are characterized with respect to their locations and capacity to discriminate
among persons.
Assumptions of item response theory
• Unidimensionality of the Test
• Local Independence
• Item characteristic curve
Unidimensionality
The IRT model is based on the assumption that the items are measuring a single
continuous latent variable θ ranging from -∞ to +∞ (Reeve, 2000). This implies that
the performance of each examinee is assumed to be governed by a single factor,
referred to as ability (though it should be noted that ability is a generic convention
used in measurement, referring to the construct and does not imply innate cognitive
potential). The assumption of unidimensionality means that a set of items and/or a test
measure(s) only one latent trait (θ), and local independence refers to the assumption
that there is no statistical relationship between examinees’ responses to the pairs of
items in a test, once the primary trait measured by the test is removed (Kyung, 2013).
Local Independence
Item responses are independent of one another given ability: once a person's ability
level is known, his/her responses to the items are independent of one another.
This is one of the hallmark assumptions in IRT, and it makes many things possible (it
is also important for estimating examinee trait levels). Conditional independence
provides us with statistically independent probabilities for items. In the words of Reeve
(2000), the assumption of local independence asserts that responses to an item are
independent of responses to another item once the underlying variable measured by the
scale is controlled for. This concept is related to that of unidimensionality: if one trait
determines success on each item, then examinee ability is the only thing that
systematically affects item performance. Local independence means that if the trait
level is held constant, there should be no association among the item responses.
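Local independence is what allows the probability of a whole response pattern to be written as the product of the separate item probabilities at a given ability. Maximizing that product over ability gives a maximum-likelihood ability estimate, which is why the assumption matters for estimating trait levels. The Python sketch below assumes 2PL items with hypothetical parameters. It uses a crude grid search for illustration only, whereas calibration programs use proper numerical optimization.

```python
import math


def p2pl(theta, a, b):
    """2PL probability of a correct response."""
    return 1 / (1 + math.exp(-a * (theta - b)))


def pattern_likelihood(pattern, theta, items):
    """Under local independence, L(theta) is the product of item probabilities."""
    like = 1.0
    for x, (a, b) in zip(pattern, items):
        p = p2pl(theta, a, b)
        like *= p if x == 1 else 1 - p
    return like


def ml_theta_grid(pattern, items, lo=-4.0, hi=4.0, step=0.01):
    """Crude maximum-likelihood ability estimate by grid search (illustration only)."""
    n = int(round((hi - lo) / step))
    grid = [lo + i * step for i in range(n + 1)]
    return max(grid, key=lambda t: pattern_likelihood(pattern, t, items))


# Hypothetical items (a, b): an easy item answered correctly, a hard one answered wrongly.
items = [(1.0, -1.0), (1.0, 1.0)]
print(ml_theta_grid([1, 0], items))
```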
Item Characteristic Curve
The Item Characteristic Curve (ICC) or Item Characteristic Function (ICF) is a
mathematical function that relates the probability of success on an item to the ability
measured by the item set or test that contains it. It is a basic building block of item
response theory; all the other constructs of the theory depend upon this curve (Baker,
2001). The item characteristic curve gives a clear distinction among different latent
trait models. There are two technical properties of an item characteristic curve that are
used to describe it. The first is the difficulty of the item. Under item response theory,
the difficulty of an item describes where the item functions along the ability scale. For
example, an easy item functions among the low-ability examinees and a hard item
functions among the high-ability examinees; thus, difficulty is a location index.
The second technical property is discrimination, which describes how well an
item can differentiate between examinees having abilities below the item location and
those having abilities above the item location. This property essentially reflects the
steepness of the item characteristic curve in its middle section.
Figure 2: A typical item characteristic curve.
The probability of correct response is near zero at the lowest levels of ability. It
increases until at the highest levels of ability, the probability of correct response
approaches 1. This S-shaped curve describes the relationship between the probability
of correct response to an item and the ability scale.
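The two technical properties of the curve can be checked numerically. For a logistic curve with no guessing, the probability of a correct response is exactly 0.5 at θ = b, which is why difficulty acts as a location index. The slope at that point is a/4, so a larger discrimination gives a steeper middle section. With a guessing parameter c, these become (1 + c)/2 and (1 − c)a/4. The Python sketch below uses hypothetical item values and a central-difference slope.

```python
import math


def icc(theta, a, b, c=0.0):
    """Logistic item characteristic curve (c = 0 gives the 2PL curve)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def icc_slope(theta, a, b, c=0.0, h=1e-5):
    """Central-difference estimate of the ICC's slope at theta."""
    return (icc(theta + h, a, b, c) - icc(theta - h, a, b, c)) / (2 * h)


# Hypothetical items sharing difficulty b = 0 but differing in discrimination a.
for a in (0.5, 1.0, 2.0):
    print(f"a={a}: P(b)={icc(0.0, a, 0.0):.2f}, slope at b={icc_slope(0.0, a, 0.0):.4f}")
```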
Item Response Parameter
IRT parameters are not dependent on the sample used to generate the
parameters, and are assumed to be invariant (within a linear transformation) across
divergent groups within a research population and across populations (Reeve, 2002).
IRT models are described by the number of parameters they make use of. The
three-parameter logistic (3PL) model is so named because it employs three item parameters:
item difficulty, discrimination and guessing.
The equation for the three-parameter model is:
$$P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$$
Where:
b is the difficulty parameter
a is the discrimination parameter
c is the guessing parameter, and
θ is the ability level
The parameter c is the probability of getting the item correct by guessing alone.
It is important to note that by definition, the value of c does not vary as a function of
the ability level. Thus, the lowest and highest ability examinees have the same
probability of getting the item correct by guessing. The two-parameter logistic (2PL)
model assumes that the data involve no guessing, but that items can vary in terms of
location (b_i), i.e. difficulty, and discrimination (a_i). The equation for the two-parameter
model is given below:
$$P(\theta) = \frac{1}{1 + e^{-L}} = \frac{1}{1 + e^{-a(\theta - b)}}$$
Where:
e is the constant 2.718 (the base of natural logarithms)
b is the difficulty parameter
a is the discrimination parameter
L = a(θ − b) is the logistic deviate (logit), and
θ is the ability level.
The difficulty parameter, denoted by b, is defined as the point on the ability scale at which
the probability of correct response to the item is 0.5. The one-parameter logistic (1PL)
model assumes that the data involve no guessing and that all items have the same
discrimination, so items are described by a single parameter, their location or
difficulty (b_i). One-parameter models have the property of specific objectivity, meaning
that the rank order of item difficulties is the same for all respondents regardless of
ability, and the rank order of person abilities is the same for all items regardless of
difficulty.
The equation for the one-parameter model is given by the following:
$$P(\theta) = \frac{1}{1 + e^{-1(\theta - b)}}$$
Where: b is the difficulty parameter and θ is the ability level.
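The three logistic models are nested. The 2PL is the 3PL with c = 0, and the 1PL is the 2PL with a = 1. The Python sketch below expresses all three with one hypothetical function. It follows the logistic equations in this section, which omit the scaling constant D = 1.7 that some texts include. It also does not guard against floating-point overflow for extreme values of a(θ − b).

```python
import math


def p_irt(theta, b, a=1.0, c=0.0):
    """Logistic IRT probability of a correct response.

    3PL: a, b and c all free; 2PL: c = 0; 1PL: a = 1 and c = 0.
    """
    if not 0.0 <= c < 1.0:
        raise ValueError("guessing parameter c must lie in [0, 1)")
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


theta = 0.5
print(p_irt(theta, b=0.0))                 # 1PL
print(p_irt(theta, b=0.0, a=1.8))          # 2PL
print(p_irt(theta, b=0.0, a=1.8, c=0.25))  # 3PL
```

For the 3PL, the curve never falls below c, because even the lowest-ability examinee can guess. At θ = b the 3PL probability is c + (1 − c)/2 rather than 0.5.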
The above review shows that item response theory, unlike classical test theory, is a
modern theory that describes students' ability using item-by-item performance.
Therefore, this study focused on item response theory.
Review of Empirical Studies
In this section, empirical studies that have been carried out are presented. This
is to ascertain trends, agreements and disagreements with the intention of establishing
ground for comparison of the findings of this present study.
Studies on Item Response Theory
Nkpone (2001) carried out a study on the application of latent trait models in
the development and standardization of physics achievement test for senior secondary
schools. The study estimated the item parameters using the one-parameter
logistic model, with 359 senior secondary school students used for the
study. Results showed that the items ranged in difficulty from -1.49 to 0.49. The
estimated values are consistent with the way the Physics Achievement Test (PAT)
items were written to increase in difficulty within each content area. Approximately
22 of the 60 items were easy, with difficulty levels less than zero, and about 37 items
were difficult, with levels greater than zero. The mean of the difficulty estimates was
zero and the standard deviation was 0.31, suggesting that there was little variability in
scores among the subjects. The study also estimated item parameters using the 2PL model.
The results showed that the items were moderately difficult and had uniform
discrimination indices ranging from 1.76 to 0.39. Item difficulty indices ranged from
1.66 to 0.69. Effective discriminating power consists of items with discrimination
indices greater than 0.8 and an item difficulty index displaying a rectangular
distribution from -2.0 to 2.0 for the latent trait 2PL model. The study further
estimated the standard error for each of the PAT items. It was found that the standard
errors ranged from 0.0578 to 0.0518, with a mean standard error of 0.17; that is,
17% of the total variance was attributed to unreliability while 83% was attributed to true
variance. The largest standard error was less than 10% of the range of
standard error values. This means that the difficulty indices were estimated with
excellent precision. This study relates to the current one, especially in the use of
logistic parameter models. However, the researcher did not include the three-parameter
logistic (3PL) model. The sample of the study was also small compared with that of the
present study. This is the basis for this study.
Obinne (2008) carried out a study to examine the psychometric properties
of the items of the Biology examinations conducted by the National Examinations
Council (NECO) and the West African Examinations Council (WAEC) using
item response theory (IRT). The study adopted an instrumentation research design.
Research questions and hypotheses were formulated, tested, and analyzed. The sample
was made up of 1800 senior secondary year three students from 36 secondary schools
in the urban and rural areas of Benue State. The multistage stratified sampling
technique was used. The NECO and WAEC Biology examination questions from 200-
2002 were the instruments for data collection. Maximum likelihood estimation
technique (using BILOG MG computer programme) was used to analyze the research
questions, according to IRT procedures. The t-test was used to test the hypotheses. It
was found that the Biology examination items from the two examination bodies were
equally reliable and valid. Biology items in the NECO-conducted examination for
2001 were more difficult than those of WAEC of the same year. WAEC items were
more prone to guessing than NECO items. It was recommended that IRT
procedures should be adopted by all examination bodies in Nigeria so that our
measurement problems could be put to rest. This study relates to the current one,
especially in the area of design and the method of data analysis for research questions, but
differs in the area of study and the method of data analysis for testing hypotheses.
Orangi and Dorani (2010) conducted a study to develop a social studies achievement test
for first-grade high school students based on item response theory (IRT). The sample
consisted of 321 high school students in Tehran, selected by multi-stage cluster
sampling. The study adopted an instrumentation research design. The first step was to
prepare two parallel multiple-choice forms which concentrated on the educational
objectives on the one hand and on the content of the lessons on the other. These forms
were administered in three preliminary stages. In the first stage, the questions were
analyzed for ambiguity in their composition, the comprehension of expressions was tested,
and the like; the second, practical stage consisted of determining the difficulty level
of the questions, the students' ability to recognize the questions, and the degree of
association between each question and the overall score. Both forms were administered to
the sample group at a ten-day interval. The results showed that the constructed forms
were highly reliable, were acceptable under analysis based on classical test theory, and
were consistent with the three-parameter model of Item Response Theory. Judging by the
item characteristic curves, both forms were most informative for students of average
ability. A rank-percentile norm was also formulated for both sexes. This study shares one
feature with the current study, namely the design, but differs in sampling technique and
sample size.
Studies on Development and Validation of Instrument
Bradley and Herrin (2004) carried out a study to develop and validate an instrument to
measure knowledge of evidence-based practice and searching skills. The aim of the study
was to develop and validate three instruments which measure knowledge about searching for
and critically appraising scientific articles (evidence-based practice, EBP). Twenty-three
questions were collected from previous studies and modified by an expert panel. These
questions were then administered to 55 delegates before and after two international
conferences on EBP, and the responses were assessed for discriminative ability and
internal consistency. Five questions were discarded and three instruments of six
questions each were developed. Finally, the instruments were revalidated in a randomized
controlled trial comparing two educational interventions at the University of Oslo,
Norway, with 166 of 175 eligible medical students. In the revalidation, the instruments
showed a satisfactory level of discriminant validity (p < 0.05) but borderline levels of
internal consistency (Cronbach's α 0.52-0.61). More research was considered necessary to
develop a suitable instrument which includes questions on searching for evidence. The
study relates to the current study in that both develop and validate instruments;
however, it did not mention the research design used, the sampling technique or the
sample size. On this premise lies the rationale for the current study to specify its
research design, sample, sampling technique and sample size.
Jeffrey and Wendy (2006) developed and validated an instrument to assess secondary
school students' perceptions of assessment tasks. Following a review of literature, a
five-scale instrument of 40 items was trialed with a sample of 658 science students in 11
English secondary schools. Based on internal consistency reliability data and exploratory
factor analysis, refinement decisions resulted in a five-scale instrument called the
Perceptions of Assessment Tasks Inventory (PATI). The scales of the PATI are Congruence
with planned learning, Authenticity, Student consultation, Transparency and Diversity.
The current study similarly aims to develop and validate an instrument. However, that
study did not mention the research design or the sampling technique used, hence the need
for the current study to describe its research design, sample, sampling technique and
sample size.
Okoro (2010) conducted a study to develop and validate an extracurricular instructional
package in social studies. The population consisted of all the JS1 students in Rivers
State, and random and stratified sampling techniques were used to draw one hundred and
sixty students from the population. To achieve the objectives of the study, two research
questions and three hypotheses were formulated, and four instruments were developed. The
design adopted was experimental. The validated extracurricular instructional package
(EIP) was presented to the experimental group, while the control group was taught the
same social studies topics using the conventional approach. The major findings were that
JS1 students taught with the extracurricular instructional package (1) developed a more
cooperative attitude to work, (2) exhibited cordial relationships with others, and (3)
developed a positive attitude to work. It was recommended that (1) teachers' workload
should be restructured to accommodate their involvement in extracurricular programmes,
and (2) more flexible time-release from teaching, or restructuring to allow time for
activities during the school day, should be provided. However, the design and sampling
technique of that study differ from those of the current study, and its sample is very
small compared with that of the current study.
All the empirical studies reviewed so far revolve around the major variables of the
current study, such as development, validation and item response theory. Therefore, the
researcher deems it appropriate to review them in this study, as they help in
understanding what researchers have done before and the gap between such studies and the
present study.
Summary of Literature Review
In this summary, the literature has been reviewed under the basic issues related to the
topic of this study. The conceptual framework covered the concept of achievement test,
procedures for the development of a test, qualities of a test, item analysis,
differential item functioning and the concept of gender. The literature reviewed on
conceptual issues revealed some of the qualities of a test which will guide the
researcher in setting questions on different concepts in Economics. The literature on the
theoretical framework of the study covered item response theory and classical test
theory. These theories will guide the researcher in choosing the appropriate measurement
framework for analyzing the scores of the students. From the empirical studies reviewed,
some studies were carried out on development, validation and item response theory.
The empirical studies provided information on studies related to the present study.
However, none of the studies reviewed focused on the application of item response theory
in the development and validation of an instrument measuring achievement in Economics.
This aspect of theory is important in modeling the relationship between an unobserved
(latent) variable, usually conceptualized as the examinee's ability, and the probability
of the examinee responding correctly to any particular item. On this premise lies the
rationale for this study.
CHAPTER THREE
RESEARCH METHOD
In this chapter, the researcher describes the procedures that were adopted for
the study. The procedures are the design of the study, the area of the study,
population, sample and sampling technique, instrument for data collection, validation
of instrument, reliability of the instrument, methods of data collection and method of
data analysis.
Design of the Study
The design of this study is an instrumentation research design. According to Ali
(2006), an instrumentation research design is one in which the major thrust of the study
is geared entirely towards the development and standardization of an instrument whose
psychometric properties (validity, reliability, usability, etc.) are empirically
determined. The design is appropriate for this study because the researcher developed
multiple-choice test items in Economics and analyzed them with reference to their
psychometric properties.
Area of the Study
This study was conducted in Nsukka Education Zone of Enugu State. Nsukka Education Zone
is made up of three Local Government Areas (LGAs), namely Nsukka, Igbo-Etiti and
Uzo-Uwani. The zone has 58 secondary schools, distributed as follows: Nsukka has 30
secondary schools, Igbo-Etiti has 16 and Uzo-Uwani has 12. The distribution of the
schools according to LGA, type and ownership is shown in Appendix A (page 74).
Population of the Study
The population of this study comprised all the senior secondary school two (SS2)
Economics students in the 46 government co-education senior secondary schools in Nsukka
Education Zone. Out of the 58 government secondary schools in Nsukka Education Zone, 46
are co-education schools while 12 are single-sex schools. The population of SS2
Economics students in the 46 government co-education senior secondary schools in Nsukka
Education Zone is three thousand seven hundred and ninety-five (3795). The distribution
is as follows: Nsukka Local Government Area has twenty-two (22) senior secondary schools
with a population of two thousand and seventy-nine (2079) SS2 Economics students;
Igbo-Etiti has thirteen (13) senior secondary schools with one thousand two hundred and
thirty-two (1232) SS2 Economics students; while Uzo-Uwani has eleven (11) senior
secondary schools with four hundred and eighty-four (484) SS2 Economics students (Post
Primary School Management Board (PPSMB) Nsukka, 2013/2014 academic session; see Appendix
B, page 77). Co-education senior secondary schools were used because this study has
gender as a variable.
Sample and Sampling Technique
The sample size of this study was one thousand and five (1005) SS2 Economics students
from the 46 government co-education senior secondary schools in Nsukka Education Zone.
The proportionate stratified random sampling technique was adopted so that each local
government area was represented in proportion to its population of SS2 Economics
students. On this basis, a sample of five hundred and fifty-one (551) Economics students
was randomly selected from a population of 2079 Economics students in Nsukka LGA; a
sample of three hundred and twenty-six (326) Economics students was randomly selected
from the population of 1232 Economics students in Igbo-Etiti LGA; and a sample of one
hundred and twenty-eight (128) was randomly selected from a population of four hundred
and eighty-four (484) Economics students in Uzo-Uwani LGA. All SS2 Economics students in
each sampled school were used. In all, there were 462 males and 543 females, giving a
total of 1005. This is in line with Nworgu's (2006) recommendation that proportionate
stratified random sampling ensures greater representativeness of the sample relative to
the population (See Appendix C, page 80). A sample of 1005 was used because adequate
sample sizes for IRT analysis should not be less than 1000; Kim (2006) indicated that
1000 examinees can be depended upon to give adequate parameter estimation results.
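The proportionate allocation described above can be checked with a short calculation: each LGA's share of the sample equals the total sample size multiplied by that LGA's share of the population, n_h = n × N_h / N. The Python sketch below uses the population figures reported above. Rounding to the nearest whole student is an assumption, and the function name `proportionate_allocation` is purely illustrative.

```python
# Proportionate stratified allocation: n_h = n * N_h / N, rounded to a whole student.
# Population figures are those reported above (PPSMB Nsukka, 2013/2014).
# Assumption: simple rounding to the nearest integer.

POPULATION = {
    "Nsukka": 2079,
    "Igbo-Etiti": 1232,
    "Uzo-Uwani": 484,
}
TOTAL_SAMPLE = 1005


def proportionate_allocation(strata: dict, n: int) -> dict:
    """Allocate a total sample n across strata in proportion to stratum size."""
    big_n = sum(strata.values())
    return {name: round(n * size / big_n) for name, size in strata.items()}


if __name__ == "__main__":
    alloc = proportionate_allocation(POPULATION, TOTAL_SAMPLE)
    for lga, n_h in alloc.items():
        share = n_h / POPULATION[lga]
        print(f"{lga}: {n_h} of {POPULATION[lga]} students ({share:.1%})")
    print("Total sampled:", sum(alloc.values()))
```

With these figures the rounded allocations (551, 326 and 128) happen to sum exactly to 1005. With other population figures, independent rounding can leave the total one or two students off, and a largest-remainder adjustment would then be needed.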
Instrument for Data Collection
The instrument for data collection for this study was the Economics Multiple Choice
Test (EMT) developed by the researcher. The EMT was based on the following topics drawn
from the SS2 syllabus: demand and supply, financial institutions, public finance, labour
force, alternative economic systems, theory of cost and inflation. The instrument
consists of 50 multiple choice questions, each with four options (A-D). One mark was
given for each correct response and zero for each incorrect response (See Appendix D,
page 81). A scoring guide containing the answers to all fifty (50) multiple choice
questions was also developed by the researcher (See Appendix E, page 89).
Validation of Instrument
The instrument was subjected to face validation and content validation. For face
validation, the instrument was given to three experts: two lecturers in Measurement and
Evaluation, Department of Science Education, and one from the Department of Economics,
all in the University of Nigeria, Nsukka. The experts were asked to examine the
instrument with respect to:
• Whether the questions correspond to the table of specification;
• The structure and clarity of the questions;
• Whether the answers to the questions tally with those in the marking scheme.
The corrections and suggestions of these experts helped in modifying the items in the
EMT. Content validation of the test was carried out by preparing a table of specification
based on the six levels of the cognitive domain of Bloom's taxonomy of educational
objectives (See Appendix F, page 90). The comments and recommendations of these experts
have been incorporated in the final version of the instrument.
Reliability of the Instrument
The EMT was administered to twenty-five (25) SS2 Economics students in Nsukka
Education Zone. The school used was outside the sample of the study but has some degree
of similarity with the sampled schools. Their responses were scored and analyzed using
the Kuder-Richardson formula 20 (KR-20) to determine the internal consistency
(reliability) of the instrument. A reliability index of 0.89 was obtained (See Appendix G,
page 91). KR-20 was used because the items were dichotomously scored and the test was
administered only once.
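For reference, the KR-20 coefficient is k/(k - 1) × (1 - Σpq / σ²), where k is the number of items, p and q are the proportions of examinees answering an item correctly and incorrectly, and σ² is the variance of the total scores. The Python sketch below computes it from a 0/1 response matrix. It is an illustration only, not the computation actually performed on the pilot data; the function name `kr20` and the small example matrix are hypothetical, and dividing the total-score variance by N rather than N - 1 is an assumption (textbooks differ on this).

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously (0/1) scored items.

    responses[i][j] is examinee i's score (0 or 1) on item j.
    KR-20 = k/(k-1) * (1 - sum(p_j * q_j) / var_total), where p_j is the
    proportion correct on item j, q_j = 1 - p_j, and var_total is the
    population variance (divisor N) of the examinees' total scores.
    """
    n = len(responses)
    k = len(responses[0])
    if n < 2 or k < 2:
        raise ValueError("need at least two examinees and two items")

    totals = [sum(row) for row in responses]
    mean = sum(totals) / n
    var_total = sum((t - mean) ** 2 for t in totals) / n
    if var_total == 0:
        raise ValueError("total scores have zero variance")

    sum_pq = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n  # item difficulty (proportion correct)
        sum_pq += p * (1 - p)

    return (k / (k - 1)) * (1 - sum_pq / var_total)


if __name__ == "__main__":
    # Hypothetical 4-examinee, 3-item matrix, for illustration only.
    toy = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
    print(f"KR-20 = {kr20(toy):.2f}")
```

For the invented matrix above, the item p-values are 0.75, 0.50 and 0.25, Σpq = 0.625 and the total-score variance is 1.25, so KR-20 = 1.5 × (1 - 0.5) = 0.75.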
Method of Data Collection
The data for this study were collected using the Economics Multiple Choice Test
(EMT). The researcher visited the sampled schools to collect the data, and copies of the
instrument were administered to the students with the assistance of the Economics teacher
in each sampled school. The test was administered under conducive conditions and lasted
50 minutes. The instrument was administered once, and the scripts were retrieved
immediately for recording and analysis.
Method of Data Analysis
The research questions were answered using the maximum likelihood estimation technique
of BILOG-MG (version 3) under the three-parameter logistic (3PL) model, while the
hypotheses were tested using the differential item functioning (DIF) model of BILOG-MG
(version 3). BILOG-MG is a software program for IRT analysis of dichotomous
(correct/incorrect) data, including item fit and differential item functioning.
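The three-parameter logistic (3PL) model underlying the analysis gives the probability that an examinee of ability θ answers an item correctly as P(θ) = c + (1 - c) / (1 + exp(-Da(θ - b))), where a is the discrimination (slope), b the difficulty (threshold) and c the pseudo-guessing parameter (lower asymptote). The Python sketch below evaluates this function. It illustrates the model only and does not reproduce BILOG-MG's marginal maximum likelihood estimation of the parameters; the function name is illustrative, and the scaling constant D = 1.7 is an assumption about the metric of the reported estimates.

```python
import math

# Assumption: logistic scaling constant D = 1.7 (the "normal metric").
# The text does not state which metric the reported estimates use.
D = 1.7


def p_correct_3pl(theta: float, a: float, b: float, c: float, d: float = D) -> float:
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: discrimination (slope); b: difficulty (threshold);
    c: pseudo-guessing parameter (lower asymptote).
    """
    z = d * a * (theta - b)
    return c + (1.0 - c) / (1.0 + math.exp(-z))


if __name__ == "__main__":
    # Item 23 as reported in Tables 3, 4 and 5: b = 0.26, a = 1.10, c = 0.32.
    a23, b23, c23 = 1.10, 0.26, 0.32
    for theta in (-3.0, -1.0, 0.0, b23, 1.0, 3.0):
        p = p_correct_3pl(theta, a23, b23, c23)
        print(f"theta = {theta:+.2f}  P(correct) = {p:.3f}")
```

At θ = b the exponent is zero, so P = c + (1 - c)/2 whatever the value of D; for item 23 this gives 0.32 + 0.34 = 0.66. As θ decreases, P approaches c = 0.32, the chance of a correct answer for very low-ability examinees.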
CHAPTER FOUR
RESULTS
In this chapter, the researcher presents the results obtained from the data in this
study. The results are presented based on research questions and hypotheses.
Research Question One: What are the standard errors of measurement of the test
items of the multiple choice test in Economics?
Table 1: Standard errors of measurement of the test items of the multiple choice test
in Economics based on three-parameter logistic (3PL) model.
Item S.E Item S.E Item S.E
1 0.44 19 0.10 37 0.06
2 0.27 20 0.08 38 0.14
3 0.12 21 0.08 39 0.05
4 0.09 22 0.09 40 0.12
5 0.10 23 0.13 41 0.07
6 0.10 24 0.07 42 0.05
7 0.16 25 0.08 43 0.07
8 0.06 26 0.15 44 0.16
9 0.09 27 0.33 45 0.05
10 0.10 28 0.08 46 0.15
11 0.22 29 0.07 47 0.09
12 0.14 30 0.08 48 0.07
13 0.05 31 0.06 49 0.16
14 0.11 32 0.24 50 0.20
15 0.09 33 0.08
16 0.36 34 0.09
17 0.58 35 0.09
18 0.06 36 0.07
The result in Table 1 shows the standard errors of measurement of the test items of
the multiple choice questions in Economics based on the three parameter logistic (3PL)
model. Based on the data in Table 1, all the items with the exception of item 17 have
standard errors ranging from 0.05 to 0.44. Therefore, forty-nine (49) items (98%) had
standard errors below 0.50, and one (1) item (2%), item 17, had a standard error above
0.50 (0.58). A standard error below 0.50 indicates high reliability, while a standard
error above 0.50 indicates low reliability. The high reliability of most items indicates
consistency in measuring the students' ability in Economics.
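The counts above follow from applying the 0.50 criterion to the values in Table 1. The Python sketch below re-applies it as a check. The values are transcribed from Table 1, the function name is illustrative, and treating a value of exactly 0.50 as "high reliability" is an assumption (no item has that value).

```python
# Standard errors transcribed from Table 1, in item order (items 1-50).
SE = [
    0.44, 0.27, 0.12, 0.09, 0.10, 0.10, 0.16, 0.06, 0.09, 0.10,  # items 1-10
    0.22, 0.14, 0.05, 0.11, 0.09, 0.36, 0.58, 0.06, 0.10, 0.08,  # items 11-20
    0.08, 0.09, 0.13, 0.07, 0.08, 0.15, 0.33, 0.08, 0.07, 0.08,  # items 21-30
    0.06, 0.24, 0.08, 0.09, 0.09, 0.07, 0.06, 0.14, 0.05, 0.12,  # items 31-40
    0.07, 0.05, 0.07, 0.16, 0.05, 0.15, 0.09, 0.07, 0.16, 0.20,  # items 41-50
]
CUTOFF = 0.50  # criterion used in this study


def classify_by_se(values, cutoff=CUTOFF):
    """Split item numbers into high-reliability (SE <= cutoff) and low-reliability (SE > cutoff)."""
    high = [item for item, se in enumerate(values, start=1) if se <= cutoff]
    low = [item for item, se in enumerate(values, start=1) if se > cutoff]
    return high, low


if __name__ == "__main__":
    high, low = classify_by_se(SE)
    print(f"{len(high)} items with SE <= {CUTOFF}; low-reliability items: {low}")
```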
Research Question Two: How do the items of the Economics multiple choice test fit
the three-parameter logistic (3PL) model?
Table 2: Fit statistics of the Economics multiple choice test based on three parameter
logistic (3PL) model.
Item Chi.sq. Prob Item Chi.Sq. Prob Item Chi.Sq. Prob
1 51.2 0.10 19 76.0 0.00* 37 52.1 0.00*
2 37.6 0.00* 20 43.9 0.03* 38 29.3 0.00*
3 67.3 0.12 21 31.7 0.09 39 23.2 0.26
4 48.5 0.00* 22 44.2 0.18 40 179.9 0.00*
5 51.9 0.20 23 77.4 0.00* 41 77.8 0.04*
6 47.6 0.00* 24 13.7 0.06 42 23.8 0.00*
7 96.6 0.15 25 40.0 0.00* 43 70.4 0.00*
8 30.5 0.00* 26 84.2 0.13 44 116.5 0.00*
9 90.9 0.14 27 18.0 0.02* 45 26.1 0.09
10 46.7 0.05 28 79.0 0.06 46 41.0 0.00*
11 79.4 0.00* 29 46.0 0.07 47 138.4 0.13
12 57.1 0.16 30 43.7 0.00* 48 33.5 0.00*
13 31.5 0.03* 31 21.3 0.00* 49 45.5 0.02*
14 18.2 0.01* 32 103.4 0.08 50 94.3 0.09
15 55.0 0.08 33 48.7 0.00*
16 35.2 0.00* 34 45.4 0.01*
17 31.4 0.07 35 92.6 0.24
18 84.2 0.00* 36 55.2 0.00*
*Significant
Table 2 presents the chi-square goodness-of-fit analysis for the items of the
multiple choice questions in Economics based on the three parameter logistic (3PL)
model. The probability values associated with the chi-square statistics ranged from 0.00
to 0.26. Based on the data in Table 2, twenty-nine (29) items (58%), that is, items 2, 4,
6, 8, 11, 13, 14, 16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43,
44, 46, 48 and 49, did not fit the three parameter model because their probability values
were below the .05 level of significance. Twenty-one (21) items (42%), that is, items 1,
3, 5, 7, 9, 10, 12, 15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47 and 50, fitted the
three parameter model because their probability values were at or above the .05 level of
significance; these items are not marked with an asterisk. In other words, the
chi-square statistics of 29 items were statistically significant, while those of 21
items were not. The criterion for item fit/misfit was set at the .05 level of
significance.
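Applying the .05 criterion to the probability values in Table 2 reproduces the 29/21 split. The Python sketch below does this as a check. The values are transcribed from Table 2 and the function name is illustrative. Note that item 10, with a probability of exactly 0.05, counts as fitting, because only values below .05 are flagged.

```python
# Probability values of the chi-square fit statistics, transcribed from Table 2 (items 1-50).
P_FIT = [
    0.10, 0.00, 0.12, 0.00, 0.20, 0.00, 0.15, 0.00, 0.14, 0.05,  # items 1-10
    0.00, 0.16, 0.03, 0.01, 0.08, 0.00, 0.07, 0.00, 0.00, 0.03,  # items 11-20
    0.09, 0.18, 0.00, 0.06, 0.00, 0.13, 0.02, 0.06, 0.07, 0.00,  # items 21-30
    0.00, 0.08, 0.00, 0.01, 0.24, 0.00, 0.00, 0.00, 0.26, 0.00,  # items 31-40
    0.04, 0.00, 0.00, 0.00, 0.09, 0.00, 0.13, 0.00, 0.02, 0.09,  # items 41-50
]
ALPHA = 0.05


def split_by_fit(p_values, alpha=ALPHA):
    """Items with p < alpha are flagged as misfitting; the rest are treated as fitting."""
    fit = [item for item, p in enumerate(p_values, start=1) if p >= alpha]
    misfit = [item for item, p in enumerate(p_values, start=1) if p < alpha]
    return fit, misfit


if __name__ == "__main__":
    fit, misfit = split_by_fit(P_FIT)
    print(f"{len(fit)} items fit the 3PL model: {fit}")
    print(f"{len(misfit)} items misfit: {misfit}")
```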
Research Question Three: What are the difficulty parameters of the items of the
multiple choice test in Economics?
Table 3: Item threshold values (difficulty estimates) of the items of the multiple
choice test in Economics based on three parameter logistic (3PL) model.
Item Threshold Item Threshold Item Threshold
1 1.17 19 -0.52 37 0.44
2 -1.15 20 0.35 38 0.06
3 -0.35 21 -0.59 39 0.30
4 -0.44 22 0.13 40 0.27
5 -0.35 23 0.26 41 -0.34
6 -0.27 24 0.19 42 0.24
7 -0.16 25 -0.30 43 0.34
8 -0.07 26 -0.60 44 0.14
9 0.73 27 -1.49 45 -0.18
10 -0.61 28 -0.59 46 -1.12
11 -0.27 29 0.04 47 -0.59
12 -0.94 30 -0.73 48 0.08
13 0.17 31 -0.14 49 -0.64
14 -0.71 32 -1.38 50 -0.38
15 -0.48 33 -0.46
16 -2.10 34 -0.22
17 -2.38 35 0.10
18 -0.06 36 -0.09
Table 3 shows that thirty-three (33) items (66%), that is, items 2, 3, 4, 5, 6, 7, 8,
10, 11, 12, 14, 15, 16, 17, 18, 19, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 36, 41, 45,
46, 47, 49 and 50, had negative difficulty estimates, while seventeen (17) items (34%),
that is, items 1, 9, 13, 20, 22, 23, 24, 29, 35, 37, 38, 39, 40, 42, 43, 44 and 48, had
positive difficulty estimates; all fifty estimates lie within the b-value range of -3 to
+3. The negative estimates imply that 33 items are easy, while the positive estimates
imply that 17 items are difficult. Based on this information, none of the items was
rejected in terms of difficulty level.
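The sign-based classification can be reproduced from Table 3 as a check on the counts. In the Python sketch below, the values are transcribed from Table 3 and the names are illustrative. Reading a negative b as "easy" assumes that the ability scale is centred at 0, so that a negative threshold lies below the average examinee's ability.

```python
# Difficulty (threshold) estimates transcribed from Table 3, in item order (items 1-50).
THRESHOLDS = [
    1.17, -1.15, -0.35, -0.44, -0.35, -0.27, -0.16, -0.07, 0.73, -0.61,   # items 1-10
    -0.27, -0.94, 0.17, -0.71, -0.48, -2.10, -2.38, -0.06, -0.52, 0.35,   # items 11-20
    -0.59, 0.13, 0.26, 0.19, -0.30, -0.60, -1.49, -0.59, 0.04, -0.73,     # items 21-30
    -0.14, -1.38, -0.46, -0.22, 0.10, -0.09, 0.44, 0.06, 0.30, 0.27,      # items 31-40
    -0.34, 0.24, 0.34, 0.14, -0.18, -1.12, -0.59, 0.08, -0.64, -0.38,     # items 41-50
]


def split_by_sign(values):
    """Return (easy, difficult) item numbers: b < 0 is easy, b > 0 is difficult."""
    easy = [item for item, b in enumerate(values, start=1) if b < 0]
    difficult = [item for item, b in enumerate(values, start=1) if b > 0]
    return easy, difficult


if __name__ == "__main__":
    easy, difficult = split_by_sign(THRESHOLDS)
    print(f"{len(easy)} easy items, {len(difficult)} difficult items")
    print("All within -3 to +3:", all(-3 <= b <= 3 for b in THRESHOLDS))
```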
Research Question Four: What are the discrimination parameters of the test items of
the multiple choice test in Economics?
Table 4: Item parameters of the test items of the multiple choice test in Economics
based on three parameter logistic (3PL) model.
Item Slope Item Slope Item Slope
1 0.12 19 0.51 37 3.30
2 0.20 20 0.56 38 1.02
3 0.39 21 0.67 39 0.91
4 0.49 22 0.13 40 1.29
5 0.45 23 1.10 41 0.72
6 0.44 24 1.21 42 0.85
7 0.58 25 0.60 43 1.71
8 0.79 26 0.32 44 0.90
9 0.97 27 0.19 45 0.97
10 0.52 28 0.66 46 0.38
11 0.43 29 1.29 47 0.57
12 0.39 30 0.74 48 0.67
13 0.96 31 0.97 49 0.32
14 0.47 32 0.25 50 0.21
15 0.51 33 0.60
16 0.23 34 1.14
17 0.13 35 0.45
18 0.88 36 0.61
Table 4 reveals that ten (10) items (20%), that is, items 1, 2, 16, 17, 22, 26, 27,
32, 49 and 50, with slopes within the range .01 - .34, had very low discriminating
values, while eighteen (18) items (36%), that is, items 3, 4, 5, 6, 7, 10, 11, 12, 14,
15, 19, 20, 25, 33, 35, 36, 46 and 47, within the range .35 - .64, had low discriminating
values. Also, twenty (20) items (40%), that is, items 8, 9, 13, 18, 21, 23, 24, 28, 29,
30, 31, 34, 38, 39, 40, 41, 42, 44, 45 and 48, within the range .65 - 1.34, had moderate
discriminating values, and two (2) items (4%), items 43 and 37, had values of 1.71 and
3.30 respectively, meaning that these two items have very high discriminating power.
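The value ranges used above (.01-.34 very low, .35-.64 low, .65-1.34 moderate) appear to correspond to the verbal labels for discrimination given by Baker (2001), in which 1.35-1.69 is "high" and values of about 1.70 and above are "very high". The Python sketch below applies those bands to the slopes in Table 4. The values are transcribed from Table 4, the function name is illustrative, and the treatment of boundary values (for example, exactly 1.70) is an assumption.

```python
from collections import Counter

# Slopes (a-parameters) transcribed from Table 4, in item order (items 1-50).
SLOPES = [
    0.12, 0.20, 0.39, 0.49, 0.45, 0.44, 0.58, 0.79, 0.97, 0.52,  # items 1-10
    0.43, 0.39, 0.96, 0.47, 0.51, 0.23, 0.13, 0.88, 0.51, 0.56,  # items 11-20
    0.67, 0.13, 1.10, 1.21, 0.60, 0.32, 0.19, 0.66, 1.29, 0.74,  # items 21-30
    0.97, 0.25, 0.60, 1.14, 0.45, 0.61, 3.30, 1.02, 0.91, 1.29,  # items 31-40
    0.72, 0.85, 1.71, 0.90, 0.97, 0.38, 0.57, 0.67, 0.32, 0.21,  # items 41-50
]


def discrimination_band(a: float) -> str:
    """Verbal label for a discrimination (slope) value.

    Assumed bands: very low .01-.34, low .35-.64, moderate .65-1.34,
    high 1.35-1.69, very high 1.70 and above; values <= 0 are labelled "none".
    """
    if a <= 0:
        return "none"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


if __name__ == "__main__":
    labels = [discrimination_band(a) for a in SLOPES]
    print(Counter(labels))
    print("Very high:", [i for i, lab in enumerate(labels, start=1) if lab == "very high"])
```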
Research Question Five: What are the guessing parameters of the test items of the
multiple choice test in Economics?
Table 5: Guessing parameters of the test items of the multiple choice test in
Economics based on three parameter logistic (3PL) model.
Item Asymptote Item Asymptote Item Asymptote
1 0.08 18 0.15 35 0.10
2 0.03 19 0.00 36 0.00
3 0.09 20 0.12 37 0.00
4 0.00 21 0.00 38 0.24
5 0.10 22 0.07 39 0.00
6 0.00 23 0.32 40 0.40
7 0.01 24 0.00 41 0.11
8 0.08 25 0.04 42 0.00
9 0.18 26 0.00 43 0.25
10 0.00 27 0.00 44 0.23
11 0.02 28 0.17 45 0.00
12 0.00 29 0.09 46 0.00
13 0.05 30 0.00 47 0.07
14 0.01 31 0.00 48 0.00
15 0.00 32 0.05 49 0.15
16 0.02 33 0.00 50 0.00
17 0.13 34 0.16
Table 5 shows the guessing (asymptote) values of the items of the multiple choice
questions in Economics based on the three parameter logistic (3PL) model. The c-values in
Table 5 range from 0.00 to 0.40. Forty-five (45) items (90%), that is, items 1, 2, 3, 4,
5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, …, 37,
39, 41, 42, 45, 46, 47, 48, 49 and 50, fall within the c-value range of 0.00 to 0.20,
which shows that these items are desirable, since the probability of answering them
correctly by mere guessing is low. Five (5) items (10%), that is, items 23, 38, 40, 43
and 44, have c-values above 0.20, which shows that these items are not very good, since
the probability of answering them correctly by mere guessing is high.
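The 0.20 cut-off for the pseudo-guessing (c) parameter can be applied to Table 5 in the same way. In the Python sketch below, the values are transcribed from Table 5 and the function name is illustrative. The sketch flags items whose c-value exceeds 0.20 and reports the largest c-value in the table.

```python
# Pseudo-guessing (c) values transcribed from Table 5, in item order (items 1-50).
C_VALUES = [
    0.08, 0.03, 0.09, 0.00, 0.10, 0.00, 0.01, 0.08, 0.18, 0.00,  # items 1-10
    0.02, 0.00, 0.05, 0.01, 0.00, 0.02, 0.13, 0.15, 0.00, 0.12,  # items 11-20
    0.00, 0.07, 0.32, 0.00, 0.04, 0.00, 0.00, 0.17, 0.09, 0.00,  # items 21-30
    0.00, 0.05, 0.00, 0.16, 0.10, 0.00, 0.00, 0.24, 0.00, 0.40,  # items 31-40
    0.11, 0.00, 0.25, 0.23, 0.00, 0.00, 0.07, 0.00, 0.15, 0.00,  # items 41-50
]
C_CUTOFF = 0.20  # criterion used in the text


def flag_guessing(values, cutoff=C_CUTOFF):
    """Return the item numbers whose lower asymptote (c) exceeds the cutoff."""
    return [item for item, c in enumerate(values, start=1) if c > cutoff]


if __name__ == "__main__":
    flagged = flag_guessing(C_VALUES)
    largest = max(C_VALUES)
    print(f"{len(flagged)} items with c > {C_CUTOFF}: {flagged}")
    print(f"Largest c-value: {largest:.2f} (item {C_VALUES.index(largest) + 1})")
```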
Research Question Six: What is the differential item functioning of the test items of
the multiple choice test in Economics with respect to gender?
Table 6: Model for group differential item functioning of the test items of the multiple
choice test in Economics
Item P (Male) Chi.Sq (Male) P (Female) Chi.Sq (Female)
1 0.00 120.2* 0.00 266.8*
2 0.00 68.2* 0.00 113.5*
3 0.72 5.3* 0.00 22.2*
4 0.49 7.4* 0.48 7.5*
5 0.89 3.5* 0.03 16.8*
6 0.15 11.9* 0.00 40.8*
7 0.00 26.3* 0.00 36.9*
8 0.00 23.7* 0.00 37.6*
9 0.6 6.2* 0.59 6.5*
10 0.9 3.0* 0.92 3.2*
11 0.76 5.0* 6.0 6.4*
12 0.58 6.6* 0.01 19.8*
13 0.00 40.0* 0.00 105.6*
14 0.72 10.7 0.70 10.7
15 0.61 6.3* 0.00 22.3*
16 0.00 49.4* 0.00 109.2*
17 0.00 92.8* 0.00 242.6*
18 0.00 89.8* 0.00 90.4*
19 0.00 30.1* 0.00 30.0*
20 0.10 13.2* 0.00 20.3*
21 0.29 9.5 0.01 9.5
22 0.00 27.5* 0.00 147.4*
23 0.97 2.1* 0.83 4.2*
24 0.00 80.2* 0.00 134.8*
25 0.00 20.7* 0.81 4.5*
26 0.04 16.1* 0.00 71.9*
27 0.00 107.2 0.00 107.2
28 0.85 4.0* 0.57 6.7*
29 0.55 15.2* 0.00 200.0*
30 0.00 23.8* 0.00 45.0*
31 0.24 10.4* 0.00 101.8*
32 0.32 9.2* 0.00 61.0*
33 0.54 6.9* 0.65 5.9*
34 0.32 9.0* 0.00 21.0*
35 0.99 0.9* 0.00 68.6*
36 0.04 15.7* 0.13 12.4*
37 0.23 10.4* 0.30 9.4*
38 0.25 10.1* 0.83 4.2*
39 0.00 44.9* 0.00 78.9*
40 0.24 10.4* 0.07 14.5*
41 0.00 31.1* 0.19 11.1*
42 0.00 31.1* 0.00 68.4*
43 0.00 22.8* 0.19 11.2*
44 0.72 5.3* 0.00 24.5*
45 0.00 99.8* 0.00 83.8*
46 0.00 13.3 0.46 13.3
47 0.79 4.7* 0.98 2.0*
48 0.02 18.1* 0.02 18.0*
49 0.02 17.9* 0.00 76.2*
50 0.00 141.6* 0.00 228.0*
Table 6 shows the probability values and chi-square statistics for group differential
item functioning of the test items of the multiple choice questions in Economics. The
data indicate that Differential Item Functioning (DIF) effects were observed in 46 items
(92%), that is, items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20,
22, 23, 24, 25, 26, 28 … 45, 47, 48, 49 and 50. This shows that 46 items were identified
as significantly exhibiting differential functioning between male and female students.
Four (4) items (8%), that is, items 14, 21, 27 and 46, were identified as not exhibiting
differential functioning between male and female students. For these four items, this is
taken to reflect unidimensionality of the ability measured, with item discriminations
that are uniform and substantial across the two groups. The chi-square values were used
to determine the presence of the differential item effect.
Research Hypothesis One: There is no significant fit between the items of the
Economics multiple choice test and the three-parameter model. The chi-square
goodness-of-fit test was used to test whether the items of the Economics multiple choice
questions fit the model. The data for testing hypothesis one are presented in Table 2.
The result shows that twenty-nine (29) items (58%), that is, items 2, 4, 6, 8, 11, 13, 14,
16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44, 46, 48 and 49,
did not fit the three parameter model because their probability values were below the .05
level of significance. Twenty-one (21) items (42%), that is, items 1, 3, 5, 7, 9, 10, 12,
15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47 and 50, fitted the three parameter
model because their probability values were at or above the .05 level of significance.
On this premise, the null hypothesis, which states that there is no significant fit
between the items of the Economics multiple choice test and the three-parameter model,
was accepted for 29 items and rejected for 21 items.
Research Hypothesis Two: The test items of the multiple choice test in Economics do
not function differentially between male and female SS2 Economics students. The model
for group differential item functioning was used to test whether there is a differential
functioning effect between male and female students on the Economics multiple choice
questions. The data for testing hypothesis two are presented in Table 6. The data in
Table 6 indicate that forty-six (46) items, that is, items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 28 … 45, 47, 48, 49 and 50, were
identified as significantly exhibiting differential item functioning between male and
female students at the .05 level of significance, while four (4) items, that is, items
14, 21, 27 and 46, do not function differentially between male and female students. The
results also indicate that, out of the 50 test items of the Economics multiple choice
questions, male and female students performed differently on 46 items and showed no
difference on 4 items.
Summary of Findings
Based on the results of the analysis of data presented in this chapter, the
following findings were established:
1. That forty-nine (49) items had standard errors below 0.50, indicating high
reliability of the test items, while one (1) item indicated low reliability.
2. That twenty-nine (29) items were statistically significant, indicating that the
items did not fit the three parameter model, while twenty-one (21) items were not
statistically significant, indicating that the items fitted the three parameter
model.
3. That thirty-three (33) items within the b-value range of -3 to +3 had negative
difficulty estimates, indicating easy items, while seventeen (17) items within the
same range had positive difficulty estimates, indicating difficult items.
4. That ten (10) items had very low discriminating values, eighteen (18) items had
low discriminating values, twenty (20) items had moderate discriminating values and
two (2) items had very high discriminating values.
5. That forty-five (45) items were considered desirable, meaning that the
probability of getting an answer correctly by mere guessing is low, while five (5)
items were considered not very good, meaning that the probability of getting an
answer correctly by mere guessing is high.
6. The results of the analysis indicated that 46 items functioned differentially
between male and female Economics students, while 4 items showed no difference.
CHAPTER FIVE
DISCUSSION OF FINDINGS, CONCLUSION, IMPLICATIONS,
RECOMMENDATIONS AND SUMMARY
In this chapter, the results are discussed based on the analyzed data. Conclusions
based on the results are also drawn. The limitations of the study, implications of the study
and recommendations for further studies are indicated; the summary of the entire study is
presented.
Discussion of Findings
The discussion of findings is based on the following:
1. Standard errors of measurement of the Economics multiple choice test.
2. Fit statistics of Economics multiple choice test.
3. Item threshold values (difficulty estimates) of Economics multiple choice questions.
4. Item parameters of the Economics multiple choice test.
5. Guessing parameters of the Economics multiple choice test.
6. Differential item functioning of the Economics multiple choice test with respect to
gender.
Standard errors of measurement of the Economics multiple choice test.
The findings of the study revealed the standard errors of measurement of the test
items of the multiple choice questions in Economics based on the three parameter logistic
(3PL) model. Forty-nine (49) items (98%), that is, items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 13, 14, 15, 16, 18, 19, 20, 21, 22, 23, 24, 25 … 50, had an S.E below 0.50, which
indicates high reliability, while one (1) item (2%), item 17, had an S.E of 0.58, which
indicates low reliability. The reliability of an instrument ensures the consistency of
its measurements. For any measuring instrument, the smaller the error, the greater the
reliability, and the greater the error, the smaller the reliability. This is why
Baumgartner (2002) noted that in latent trait analysis the difficulty index of every item
in a test is accompanied by its standard error, and the smaller the standard error, the
better the item. This finding also agrees with Meredith et al. (2007) that as the
reliability coefficient increases, the standard error of measurement becomes smaller.
This result is in agreement with Obinne (2008) that an S.E of 0.50 and below is described
as high reliability, while an S.E above 0.50 is described as low reliability.
Fit statistics of the Economics multiple choice test.
The findings of the study revealed the fit statistics of the Economics multiple
choice questions. Twenty-nine (29) items (58%), that is, items 2, 4, 6, 8, 11, 13, 14,
16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44, 46, 48 and
49, did not fit the three-parameter model, while twenty-one (21) items (42%), that is,
items 1, 3, 5, 7, 9, 10, 12, 15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47 and
50, fitted the three-parameter model. Nkpone (2001) asserted that in latent trait
models, fit to the model implies validity: item discriminations are uniform and
substantial, and there is no error in scoring. The fit or misfit of every item in
research question 2 was judged at the 0.05 level of significance. This corresponds
with Adedoyin (2010), who used a chi-square test and selected as fitting the model
those items whose probability exceeded the 0.05 alpha level.
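The fit decision used above (retain an item when the chi-square probability exceeds 0.05) can be sketched in Python. This is an assumption-laden illustration: BILOG-MG computes its own item-fit chi-square and degrees of freedom from grouped examinees, and this sketch only converts a reported chi-square value and its degrees of freedom into a p-value and applies the 0.05 rule. The helper names and the example statistics are hypothetical; the p-value comes from the standard power series for the regularized incomplete gamma function, so no third-party library is needed.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with df degrees of freedom.

    Uses the power series for the regularized lower incomplete gamma function
    P(s, z) with s = df / 2 and z = x / 2; the upper tail is then 1 - P(s, z).
    """
    if df <= 0:
        raise ValueError("degrees of freedom must be positive")
    if x <= 0.0:
        return 1.0
    s, z = df / 2.0, x / 2.0
    term = 1.0 / s  # n = 0 term of the series sum z**n / (s (s + 1) ... (s + n))
    total = term
    n = 0
    while term > total * 1e-15 and n < 10_000:
        n += 1
        term *= z / (s + n)
        total += term
    lower = math.exp(s * math.log(z) - z - math.lgamma(s)) * total
    return max(0.0, 1.0 - lower)


def item_fits(chi_square: float, df: int, alpha: float = 0.05) -> bool:
    """Retain the item as fitting when its p-value exceeds alpha."""
    return chi2_sf(chi_square, df) > alpha


if __name__ == "__main__":
    # Hypothetical chi-square fit statistics (value, degrees of freedom).
    for name, chi_sq, df in [("item A", 6.2, 8), ("item B", 21.4, 8)]:
        p = chi2_sf(chi_sq, df)
        print(name, f"p = {p:.3f}", "fits" if item_fits(chi_sq, df) else "misfits")
```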
Item threshold values (difficulty estimates) of the Economics multiple choice test.
The findings of this study revealed that items 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14,
15, 16, 17, 18, 19, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 36, 41, 45, 46, 47, 49 and
50 in research question 3 had negative difficulty estimates, while items 1, 9, 13, 20,
22, 23, 24, 29, 35, 37, 38, 39, 40, 42, 43, 44 and 48 in Table 3 of research question
3 had positive difficulty estimates. The finding agrees with Chong (2013) that the
difficulty (threshold) parameter tells us how easy or how difficult an item is. It
also corresponds with Obinne (2008), who stated that negative difficulty estimates
indicate easy items while positive difficulty estimates indicate hard items. The items
were selected within the b-value range of -3 to +3, which corresponds with Baker
(2001): theoretically, difficulty values can range from -∞ to +∞, but in practice
they usually lie in the range of -3 to +3.
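The sign convention for difficulty estimates can be illustrated with the three-parameter logistic (3PL) item characteristic function, P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))). The Python sketch below is illustrative only: the scaling constant D = 1.7 is a common convention assumed here (the text does not state which metric BILOG-MG used), the function names are hypothetical and the item parameters are invented. It shows that an examinee of average ability (θ = 0) has a higher chance of success on an item with negative b than on one with positive b, which is why negative estimates are read as easy items.

```python
import math

D = 1.7  # Assumed logistic scaling constant (a common convention).


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def describe_difficulty(b: float) -> str:
    """Read a difficulty estimate using the sign convention in the text."""
    if not -3.0 <= b <= 3.0:
        return "outside the usual -3 to +3 range"
    if b < 0:
        return "easy"
    if b > 0:
        return "hard"
    return "average"


if __name__ == "__main__":
    # Two hypothetical items that differ only in difficulty.
    for b in (-1.0, 1.0):
        p = p_3pl(theta=0.0, a=1.0, b=b, c=0.2)
        print(f"b = {b:+.1f} ({describe_difficulty(b)}): P(correct | theta = 0) = {p:.2f}")
```

At θ = b the probability is exactly halfway between c and 1, which is why b is also called the item threshold.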
Item discrimination parameters of the Economics multiple choice test.
The findings of the study revealed that ten (10) items (20%), namely items 1, 2, 16,
17, 22, 26, 27, 32, 49 and 50, had very low discrimination values, while eighteen (18)
items (36%), namely items 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 19, 20, 25, 33, 35, 36, 46
and 47, had low discrimination values. Also, twenty (20) items (40%), namely items 8,
9, 13, 18, 21, 23, 24, 28, 29, 30, 31, 34, 38, 39, 40, 41, 42, 44, 45 and 48, had
moderate discrimination values, and two (2) items (4%), items 37 and 43, had very high
discrimination values. The discrimination parameter indicates how well an item
discriminates between respondents below and above the item threshold parameter, as
shown by the slope of the item characteristic curve (Reeve & Fayers, 2005). This
classification follows Baker (2001), who described the ranges of item discrimination
values as follows: very low, .01 to .34; low, .35 to .64; moderate, .65 to 1.34; high,
1.35 to 1.69; and very high, 1.70 and above.
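Baker's (2001) labels can be applied mechanically to estimated a-values, and the percentages reported above follow directly from the item counts. In the Python sketch below the function names are hypothetical, and the interval boundaries are treated as half-open (for example, 0.35 counts as low), which is an assumption about values that fall between Baker's printed ranges. The counts in `STUDY_COUNTS` are taken from the findings above; nothing else is study data.

```python
from typing import Dict


def baker_discrimination_label(a: float) -> str:
    """Verbal label for a discrimination (a) estimate, after Baker (2001)."""
    if a <= 0.0:
        return "none or negative"
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


def percentages(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert category counts into percentages of all items."""
    total = sum(counts.values())
    return {label: 100.0 * n / total for label, n in counts.items()}


# Item counts per category, from the findings for the 50-item test.
STUDY_COUNTS = {"very low": 10, "low": 18, "moderate": 20, "very high": 2}

if __name__ == "__main__":
    print(baker_discrimination_label(0.82))  # a hypothetical a-value
    for label, pct in percentages(STUDY_COUNTS).items():
        print(f"{label}: {pct:.0f}%")
```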
Guessing parameters of the Economics multiple choice test.
The findings of the study revealed that forty-five (45) items, that is, items 1 to
22, 24, 25, 26, 27, …, 37, 39, 41, 42, 45, 46, 47, 48, 49 and 50, had guessing values
(c-values) within the range of 0.00 to 0.20 and are therefore desirable. This lower
c-value range indicates that the probability of answering an item correctly by mere
guessing is low. Five (5) items, that is, items 23, 38, 40, 43 and 44, were considered
not very good, because their higher c-values indicate that the probability of
answering correctly by mere guessing is high. The finding is supported by Karami
(2010), who noted that the lower the c-value, the better, since a low c-value indicates
a lower probability that low-ability examinees answer correctly by mere guessing.
Harris (2005) concluded that items with c-values of 0.30 or greater are considered not
very good, whereas c-values of 0.20 or lower are desirable. In like manner, Akindele
(2003) noted that items do not have perfect c-values because examinees do not guess
randomly when they do not know the answer.
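The meaning of the c-parameter can be seen from the 3PL function: as ability falls, the probability of a correct response levels off at c rather than at zero. The Python sketch below assumes D = 1.7, uses hypothetical item parameters and hypothetical function names, and is not the BILOG-MG estimation itself. `guessing_label` applies the split used in the findings (0.00 to 0.20 desirable, higher values not very good); under Harris's (2005) stricter wording, "not very good" starts at 0.30, so items between 0.20 and 0.30 fall in a grey zone.

```python
import math

D = 1.7  # Assumed logistic scaling constant.


def p_3pl(theta: float, a: float, b: float, c: float) -> float:
    """Probability of a correct response under the 3PL model."""
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


def guessing_label(c: float, cutoff: float = 0.20) -> str:
    """Split used in the findings: c from 0.00 to 0.20 is desirable."""
    return "desirable" if 0.0 <= c <= cutoff else "not very good"


if __name__ == "__main__":
    # Hypothetical items: identical a and b, different guessing parameters.
    for c in (0.15, 0.28):
        floor = p_3pl(theta=-4.0, a=1.0, b=0.0, c=c)  # a very low-ability examinee
        print(f"c = {c:.2f} ({guessing_label(c)}): P(correct | theta = -4) = {floor:.3f}")
```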
Differential item functioning of the Economics multiple choice test with respect to
gender.
The findings of this study revealed that forty-six (46) items exhibited differential
item functioning (DIF), that is, every item except items 14, 21, 27 and 46. These four
(4) items (8%), items 14, 21, 27 and 46, were identified as not functioning
differentially between male and female students. This finding is consistent with Davis
(2002), who observed that items are sometimes found to behave differently in distinct
groups, such as groups defined by gender or language (for example, loading on different
dimensions in a multidimensional factor analysis, or having markedly different mean
item scores). In other words, two examinees with the same latent trait value but
differing in other characteristics may have different probabilities of a correct
response. The findings were determined at the 0.05 level of significance. The findings
of Madu (2012) are in line with those of this study: Madu found that thirty-nine (39)
items in a mathematics test (starred) exhibited significant differential item
functioning between male and female examinees at the .05 level of significance, while
11 items did not function differentially between male and female examinees.
Conclusion
Based on the findings of the study, the following conclusions were drawn:
1. That forty-nine (49) items indicated high reliability, while one (1) item
indicated low reliability.
2. That twenty-one (21) items fitted the three-parameter model, while twenty-nine
(29) items did not fit the three-parameter model.
3. That thirty-three (33) items were easy, while seventeen (17) items were
difficult.
4. That ten (10) items had very low discrimination values, eighteen (18) items had
low discrimination values, twenty (20) items had moderate discrimination values and
two (2) items had high discrimination values.
5. That forty-five (45) items were considered desirable, meaning that the
probability of answering them correctly by mere guessing is low, while five (5) items
were considered not very good, because the probability of answering them correctly
by mere guessing is high.
6. That forty-six (46) items in the Economics test functioned differentially between
male and female students.
Educational Implications of the Study
The findings of this study have clear educational implications for teachers,
examination bodies, psychometricians and test developers.
From the findings of the study, it is possible for Economics teachers to identify
the difficulty of each item. The implication is that teachers should, as much as
possible, set questions that are neither very easy nor very difficult. It is also
possible to detect items that function differently for male and female students. To
ensure that tests are effective, teachers, especially Economics teachers, should set
and administer items that are fair to all students in the interest of quality
education. The findings on the guessing parameter also imply that teachers who set
multiple choice questions should note that the probability of students answering
correctly by guessing can be kept low.
Since the psychometric properties of items determine the quality of a test, teachers
should, as much as possible, set tests that have adequate psychometric properties.
For examination bodies such as the West African Examinations Council, the National
Examinations Council and others, the implication is that, since Item Response Theory
(IRT) was designed to overcome the limitations of Classical Test Theory (CTT),
teachers, examination bodies and psychometricians should be encouraged to adopt IRT in
developing test items used to measure students' ability, especially in Economics.
The findings also have implications for test developers: IRT makes it possible to
compare different items in terms of their discrimination values, difficulty, fit
statistics and standard errors.
Recommendations
Based on the findings of the study, the following recommendations were made:
1. Psychometricians and measurement experts should organize workshops to educate
teachers on the implications of quality tests. They should also train teachers in
the modern measurement framework called IRT and in the interpretations it involves.
2. Examination bodies and teachers should be encouraged to adopt IRT in developing
test items used to measure students' ability in Economics.
3. The Ministry of Education and universities should help students who are
interested in research on item response theory to obtain the necessary software and
computer packages.
4. It is imperative to determine how well the items in an instrument fit the chosen
IRT model, such as the one-parameter, two-parameter or three-parameter logistic
model.
5. The differential item functioning of items should be properly determined in test
instruments to avoid gender bias.
Limitations of the Study
1. IRT is a new concept in educational measurement in Nigeria; hence, obtaining
relevant literature and studies related to Nigeria was very difficult.
2. In Nigeria, software packages for Item Response Theory (IRT) analysis are not
available, and measurement experts who know how to calibrate items with IRT packages
are very few.
Suggestions for Further Studies
The researcher suggests that further studies be carried out in the following areas:
1. Application of item response theory in the development and validation of
multiple choice tests in other subjects such as Commerce, Government and Geography.
2. Detection of differential item functioning in Economics multiple choice tests.
3. A replication of this study over a wider geographical area, if possible the whole
of Enugu State.
Summary of the Study
The study investigated the application of item response theory in the
development and validation of multiple choice questions in Economics. Item response
theory was seen as an important aspect of measurement theory for estimating the
latent traits of students. Reported limitations of classical test theory motivated
the researcher to embark on this study in order to determine the latent traits of
students using item-level rather than aggregate-level performance. The study also
examined guessing and differential item functioning effects. Six research questions
guided the study, and two null hypotheses were formulated and tested at the 0.05
level of significance.
From the literature review, the concept of the achievement test, procedures for
test development, qualities of a test, item analysis, differential item functioning
(DIF), standard errors of measurement, the concept of gender and analysis of fit were
discussed. A theoretical review and a review of empirical studies were also presented.
The empirical studies provided information on studies related to the present one;
however, none of the studies reviewed focused on the application of item response
theory in the development and validation of an instrument measuring achievement in
Economics. The study adopted an instrumentation research design. The population
comprised SS2 Economics students in the 46 government co-educational senior secondary
schools in Nsukka education zone, from which a sample of 1005 students was drawn. The
researcher developed an instrument titled Economics Multiple Choice Test (EMT), with a
reliability coefficient of 0.89, and used it to carry out the study. The data obtained
were subjected to statistical analysis: the research questions were answered using the
maximum likelihood estimation technique of BILOG-MG V3 under the 3PL model, while the
hypotheses were tested using the DIF model of BILOG-MG V3.
The analysis of the data indicated the following:
1. That 98% of the items had standard errors indicating high reliability, while 2%
had standard errors indicating low reliability.
2. That twenty-nine (58%) of the items did not fit the three-parameter logistic (3PL)
model, while 42% fitted the three-parameter model.
3. That 66% of the items were easy, while 34% were difficult.
4. That 20% of the items had very low discrimination values, 36% had low
discrimination values, 40% had moderate discrimination values and 4% had high
discrimination values.
5. That 90% of the items had low c-values, implying that the probability of answering
them correctly by mere guessing is low, while for the remaining 10% the probability
of answering correctly by mere guessing is high.
6. That 46 items functioned differentially for male and female Economics students,
while 4 items showed no difference.
Following the discussion of the findings, the educational implications of the study
were enumerated. It was recommended that psychometricians and measurement experts
organize workshops to educate teachers on the implications of quality tests and train
them in the modern measurement framework called IRT and the interpretations it
involves. The limitations of the study were highlighted and suggestions for further
studies were made. Based on the findings of the study, it was concluded that 98% of
the items had standard errors below 0.50, indicating high reliability.
REFERENCES
Abonyi, O. S. (2011). Instrumentation in behavioral research: A practical
approach. Enugu: TIMEX Publishing Company.
Adedoyin, O. O. (2010). Investigating the invariance of person parameter estimates
based on classical test and item response theories. International Journal of
Educational Science. Retrieved November 30, 2012, from
http://www.uniBotswana./journal/education/science
Adeyegbe, S. (2004). History of West African Examinations Council. Retrieved
October 12, 2012, from http://www.waecnigeria.org/home.htm
Akindele, B. P. (2003). The development of an item bank for selection tests into
Nigerian universities: an exploratory study. Unpublished doctoral dissertation,
University of Ibadan, Nigeria.
Ali, A. (2006). Conducting research in education and the social sciences. Enugu:
Tashiwa Networks Ltd.
Anaekwe, M.C. (2007). Basic research methods and statistics in education and social
sciences (2nd ed.). Onitsha: Sofie Publicity and Printry Limited.
Anastasi, A., & Urbina, S. (2002). Psychological testing. New York: Prentice Hall.
Anene, G. U., & Ndubisi, O. G. (2003). Test development process. In B. G. Nworgu
(Ed.), Educational measurement and evaluation: Theory and practice
(pp.110-122). Nsukka: University Trust Publishers.
Anikweze, C. M. (2010). Measurement and evaluation: For teacher education (2nd
ed.). Enugu: SNAAP Press Ltd.
Asadu, I. N. (2001). Trend in student’s enrolment and performance in senior
secondary certificate examination in Economics. Unpublished doctoral
dissertation, University of Nigeria, Nsukka.
Baker, F. B. (2001). The basics of item response theory (2nd ed.). United States of
America: ERIC Clearinghouse on Assessment and Evaluation.
Baumgartner, T. A. (2002). Conducting and reading research in health and human
performance (3rd ed.). New York: McGraw-Hill Higher Education.
Bhakta, B., Tennant, A., Horton, M., Lawton, G., & Andrich, D. (2005). Using item
response theory to explore the psychometric properties of extended matching
questions examination in undergraduate medical education. BMC Medical Education,
5, 9. Retrieved October 12, 2012, from http://www.biomedcentral.com/1472-6920/5/9.
doi:10.1186/1472-6920-5-9
Black, P. J., & William, D. (2009). Assessment and classroom learning. Assessment in
Education, 5, 7-74.
Bradley, P., & Herrin, J. (2004). Development and validation of an instrument to
measure knowledge of evidence-based practice and searching skills. Medical
Education Online. Retrieved April 25, 2013, from http://www.med-ed-online.org
Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of
further and higher education, 25(2), 157-163.
Chatterji, M. (2003). Designing and using tools for educational assessment. Journals
of Education. Retrieved January 21, 2014, from
http://www.columbia.edu/~mb1434/EdAssess.htm
Chong, H. Y. (2013). A simple guide to the Item Response Theory (IRT) and Rasch
modeling. Retrieved March 2013, from http://www.creative-wisdom.com
Crocker, L. & Algina, J. (2008). Introduction to classical and modern test theory.
Fort Worth: Harcourt Brace Jovanovich.
Davis, L. L. (2002). Strategies for controlling item exposure in computerized adaptive
testing with polytomously scored items. Unpublished doctoral dissertation,
University of Texas at Austin.
De-Jong, M. G., Steenkamp, E. M., Fox, J., & Baumgartner, H. F. (2008). Using item
response theory to measure extreme response style in marketing research: A
global investigation. Journal of Marketing Research, 45(1). Retrieved October
14, 2012, from http://www.journl.marketingpower.com
Denga, D. I. (2003). Educational measurement: continuous assessment and
psychological testing. Calabar: Rapid educational publisher Ltd.
Douglas, G. (1987). Latent trait measurement models. In T. Smart (Ed.), Educational
research measurement model: An international handbook (pp. 240-259). New Jersey:
Alnold Press.
Ebo, E. C. (2009). Social and economic research: Principles and methods (2nd ed.).
Enugu: African Institute for Applied Economics.
Egunjobi, A., & Egwaikhide, F. (2010). Economics for Senior Secondary School.
Lagos: Macmillan Nigeria Publishers Ltd.
Emaikwu, S. O. (2011). Issues in test item bias in public examinations in Nigeria and
implications for testing. International Journal of Academic Research in
Progressive Education and Development, 1(1), 40.
Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists.
Mahwah, NJ: Lawrence Erlbaum Associates.
Ercikan, K., & Koh, K. (2005). Construct comparability of the English and French
versions of TIMSS. International Journal of Testing, 5, 23-35.
Eze, C. O., & Onah, P. C. (2005). Measurement evaluation in education. Enugu:
Computer Edge Publishers.
Ezeh, D.N. (2003). Reliability and validation of tests. In B. G. Nworgu (Ed.),
Educational measurement and evaluation: Theory and practice (pp.123-135).
Nsukka: University Trust Publishers.
Federal Republic of Nigeria (FRN) (2004). National Policy on Education (4th ed.).
Lagos: NERDC press.
Falayajo, W. (1986). Philosophy and theory of continuous assessment. A paper
presented at a workshop for inspectors of education in Ondo State, Nigeria, 4th
December.
Ferguson, G. A. (2011). A bi-factor analysis of reliability coefficients. British
Journal of Psychology. Retrieved September 5, 2012, from Wiley Online Library
(general section).
Hambleton, R. K., & Swaminathan, H. (1991). Item response theory: Principles and
applications. Boston: Kluwer-Nijhoff.
Harbor-Peters, V. F. (1999). Noteworthy points on measurement & evaluation. Enugu:
Snap Press Ltd.
Harlen, W., & Deakin-Crick, R. (2002). A systematic review of the impact of
summative assessment and tests on students’ motivation for learning. In EPPI-
Centre (Ed.), Research evidence in education library (1.1 ed., pp. 153–).
London, UK: University of London Institute of Education Social Science
Research Unit.
Harris, D. (2005). Comparison of 1-, 2-, and 3-parameter IRT models. Educational
Measurement: Issues and Practice. doi:10.1111/j.1745-3992.1989.tb00313.x
Henard, D. H. (2000). Item response theory. In L. Grimm & P. Yarnold (Eds.),
Reading and understanding more multivariate statistics (Vol. II, pp. 67-97).
Washington, DC: American Psychological Association.
Huba, M. E. & Freed, J. E. (2000). Learner-centered assessment on college campuses:
Shifting the focus from teaching to learning. Boston, MA: Allyn & Bacon.
Ifeakor, A. C. (2011). Psychological measurement & evaluation in Education: Issues
and application. Onitsha: Folmech Printing and Publishing Co.Ltd.
Jeff, J., Sridhar, M., & Beverly, M. (2006). Estimating student proficiency using an
item response theory. Journal of Intelligent Tutorial System, 4053, 473-480.
Retrieved September 5, 2012, from http://www.link.springer.com/10.1007%2.pdf
Jeffrey, P. D., & Wendy, M. K. (2006). Development and validation of an instrument
to assess secondary school students' perceptions of assessment tasks. Journals
of Educational Studies, 32(1). Retrieved June 5, 2013, from
http://www.unilorin.edu.ng
Karami, H. (2010). A differential item functioning analysis of a language proficiency
test: An investigation of background knowledge bias. Unpublished Master's thesis,
University of Tehran, Iran.
Kim, S. (2006). A comparative study of item response theory fixed parameter
calibration methods. Journal of Educational Measurement. Retrieved January 30,
2013, from http://www.measuredprogress.org/learning
Korashy, A. F. (1995). Applying the Rasch model to the selection of items for mental
ability test. Educational and Psychological Measurement, 55(5), 753-763.
Kyung, T. H. (2013). Windows software that generates IRT parameters and item
responses: Research and Evaluation Program Methods (REMP). University of
Massachusetts Amherst.
Lee, J. (2001). Inter State variation in rural students’ achievement and schooling
conditions. Retrieved June 15, 2013, from http://www.ericdigest.org/2002
MacDonald, P., & Paunonen, S. V. (2002). A Monte Carlo comparison of item and
person statistics based on item response theory versus classical test theory.
Educational and Psychological Measurement, 62(6), 921-943. Retrieved May 30,
2012, from http://www.eri.ed.gov/ERICW
Madu, B. C. (2012). Analysis of Gender-Related Differential Item Functioning in
Mathematics Multiple Choice Items Administered by West African Examination
Council (WAEC). Journal of Education and Practice, 3(8). ISSN 2222-1735 (Paper),
2222-288X (Online). Retrieved May 15, 2012.
Maduewesi, U.B. (1999). Curriculum implementation and instruction. Onitsha: West
and Solomon publishing COY LTD.
Malcolm, T. (2003). An achievement test. Retrieved November 20, 2013, from
http://www.wisegeek.com/what-is-an-achievement-test.htm
Mankiw, N. G. (2001). Principles of economics (2nd ed.). Fort Worth: Harcourt
Publishers.
Martyn, S. (2009). Face validity. Journal of Educational Measurement. Retrieved
December, 10, 2013 from Explorable.com: http://explorable.com/face-validity.
Mehrens, W. A., & Lehmann, I. J. (1978). Measurement and evaluation in education and
psychology (2nd ed.). New York: Holt, Rinehart and Winston Inc.
Meredith, D. G., Joyce, P. G., & Walter, R. B. (2007). Educational research: An
introduction (8th ed.). United States of America: Pearson Press.
Ndalichako, J.L & Rogers, W.T. (1997). Comparison of finite state score theory,
classical test theory and item response theory in scoring multiple-choice Item.
Educational and psychological measurement, 57, 580-589.
Neal, D. J., Corbin, W. R., & Fromme, K. (2006). Measurement of alcohol-related
consequences among high school and college students: Application of item-
response models to the Rutgers Alcohol Problem Index. Psychological
Assessment, 18, 402-414.
Nenty, H. J. (2004). From Classical Test Theory (CTT) to Item Response Theory
(IRT): An introduction to a desirable transition. In: OA Afemikhe, JG
Adewale (Eds.): Issues in Educational Measurement and Evaluation in
Nigeria. Institute of Education, University of Ibadan, Ibadan, Nigeria, pp.372-
384.
Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory
models. New York: Routledge.
Nering, M. L. & Ostini, R. (2006). Polytomous item response theory models.
Thousand Oaks, CA: Sage.
Nkpone, H.L. (2001). Application of latent trait models in the development and
standardization of physics achievement test for senior secondary students.
Unpublished doctoral dissertation, University of Nigeria, Nsukka.
Nworgu, B.G. (2006). Introduction to Educational Measurement and evaluation:
theory and practice (2nd ed.). Nsukka: Hallman Publisher.
Obemeata, J. O. (1991). Pupil’s perspective of the purpose of economics education in
Nigerian secondary grammar school. West African Journal of Education, 21(2).
Retrieved December 12, 2012, from http://www.unilorin.edu.ng/journal/education
Obidiegwu, U. J. (2008). Development and validation of Physical Education
Achievement Test (PEAT) for adult learners in Anambra State. Selected works.
Retrieved April 22, 2013, from http://works.bepress.com/druche_Obidiegwu/6
Obinne, A.D.E. (2008). Psychometric properties of senior certificate biology
examinations conducted by West African Examinations council: Application of
item response theory. Unpublished doctoral dissertation, University of Nigeria,
Nsukka.
Obinne, A. D. E. (2012). Using IRT in determining test item prone to guessing.
Retrieved June 20, 2013, from http://dx.doi.org/wje.v2n1p91
Obinne, A.D.E. (2013). Test item validity: item response theory (IRT) perspective for
Nigeria. Research Journal in Organizational Psychology & Educational
Studies 2(1). Retrieved January, 28, 2014, from www.emergingresource.org
Ohuche, R. O. & Ukeje, S.A (1977). Testing and evaluation in education. Lagos:
African Educational Resources.
Okeke, F. N. (2006). Women and leadership in higher education; facing international
challenges and maximizing opportunities. Association of Common Wealth
University Bulletin, 147, 14-17.
Okoh, E.E. (2007). Correlates of marital adjustment among married persons in Delta
state: Implication for guidance and counselling. Unpublished PhD Thesis
University of Benin, Benin city.
Okoro, O.M. (2006). Measurement and evaluation in education. Uruowulu-Obosi:
Pacific Publishers Ltd.
Okoro, C. O. (2010). Development and validation of extracurricular instructional
package in social studies. Faculty of Education, University of Port Harcourt, Port
Harcourt, Rivers State, Nigeria. Journals of Academia. Retrieved May 30, 2013,
from http://www.sciencepub.net
Olusola, O., Adesope, C., Gress, L. Z. & Nesbit, J. C. (2008). Validating the
psychometric properties of achievement goal questionnaire using item response
theory. Presented at the 2008 Annual Meeting of the Canadian Society for the
Study of Education, May 31- June 3, Vancouver, B. C., Canada.
Onunkwo, G. I. N. (2002). Fundamentals of education measurement and evaluation.
Owerri: Cape Publishers Int’l Ltd.
Orangi, A. M., & Dorani, K. (2010). Developing a social studies achievement test for
high school students based on item-response theory (IRT). Journal of
Psychological Models and Methods, 1(1), 1-13. Retrieved July 11, 2013, from the
Scientific Information Database (SID).
Orji, K.O. (2002). Basic Principles for Agricultural Project Policy Analysis. Nsukka:
Price Publishers.
Osterlind, S. J. (2012). Item response theory. Journals of home school and academic
learning. Retrieved November, 30, 2012, from
http://www.education.com>Home>School and Academics>
classroom learning.
Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning. Thousand
Oaks, CA: Sage Publishing.
Palmieri, P.A. (2012). Item response theory method and application gaining support
as assessment instrument. Retrieved December, 18, 2012, from
http://www.istss.org/publications.
Polit, D. F. & Hungler, B. P. (2002). Nursing research principles and methods (8th
ed.). Philadelphia: Lippincott.
Reeve, B. B. (2000). Item and scale-level analysis of clinical and non-clinical
sample responses to the MMPI-2 depression scales employing item response
theory. Unpublished doctoral dissertation, University of North Carolina at
Chapel Hill.
Reeve, B. B. (2002). An introduction to modern measurement theory. Bethesda,
Maryland: National Cancer Institute.
Reeve, B. B., & Fayers, P. (2005). Applying item response theory modeling for
evaluating questionnaire items and scale properties. In P. Fayers & R. D.
Hays (Eds.), Assessing quality of life in clinical trials: Methods and practice
(2nd ed.). USA: Oxford University Press. Retrieved September 11, from
http://cancer. Unic.edu/research/faculty/display member-plone.asp?ID-694.
Reise, S. P., & Waller, N. G. (1990). Fitting the two-parameter model to personality
data. Applied psychological measurement, 14, 45-58.
Robbins, L. (1932). An essay on the nature and significance of economic science
(2nd ed.). London: Macmillan.
Thissen, D., & Stemberg, R. (1988). Test validity. Journals of Education Testing and
Measurement. Retrieved December 18, 2012.
Troy-Gerard, C. (2004). An empirical comparison of item response theory and
classical test theory item/person statistics. Unpublished doctoral dissertation,
Texas A&M University.
Vander, L. W. J., & Hambleton, R. K. (1997). Handbook of modern item response
theory. New York: Springer-Verlag.
Wendy, K. A. & Carl, E. W. (2010). Development and validation of instruments to
measure learning of expert-like thinking. International journal of science
education. Retrieved December, 28, 2013, from http://www.informaworld.
com/smpp/title~content=t713737283
World Health Organization (2002). "Gender and Reproductive Rights: Working
Definitions". Retrieved June 15, 2013 from http://www.ericdigest. org/2002.
Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C.
R. Rao & S. Sinharay (Eds.), Handbook of statistics and Psychometrics
(pp.45–79). Amsterdam, the Netherlands: Elsevier Science Publisher.
Zumbo, B. D. (2007). Three Generations of DIF Analyses: Considering Where It Has
Been, Where It Is Now, and Where It Is Going. Language Assessment
Quarterly, 4(2), 223–233.
APPENDIX A
AREA OF THE STUDY
LIST OF SCHOOLS IN NSUKKA EDUCATION ZONE
S/N SCHOOLS IN NSUKKA LOCAL GOVERNMENT AREA SS2 M SS2 F TOTAL
1 S T C Nsukka 372 - 372
2 Nsukka High sch. Nsukka 338 - 338
3 Q R S S Nsukka - 215 215
4 Com. Sec. Sch. Isienu 100 148 248
5 Urban Girls Sec. Sch. Nsukka - 169 169
6 Opi High Sch. Opi 44 36 80
7 Com. Sec. Sch. Lejja 28 40 68
8 Com. Sec. Sch. Edem 63 88 151
9 Com. High Sch. Umabor 50 73 123
10 Com. Sec. Sch. Ehendiagu 8 2 10
11 Com. Sec. Sch. Okpuje 40 45 85
12 Com. Sec.Sch. Ibagwani 79 85 164
13 Com.Sec. Sch. Obimo 14 32 46
14 Com. Sec. Sch. Obukpa 51 76 127
15 Com. Sec. Sch. Edeoballa 140 160 300
16 Com. Sec. Sch. Ezebunagu 15 20 35
17 St. Cyprians Girls Sec. Sch. Nsukka - 230 230
18 Com. Sec. Sch. Nru Nsukka 55 101 156
19 Model Sec. Sch. Nsukka 52 116 168
20 Girls Sec. Sch. Opi - 31 31
21 Com. Sec. Sch. Alor-uno 25 29 54
22 Com. Sec. Sch. Opi agu 9 4 13
23 Lejja high sch. Lejja 31 42 73
24 Agu Umabor Sec Sch. Umabor 18 20 38
25 Urban Boys Sec. Sch. Nsukka 121 - 121
26 Comm. Sec. Sch. Akpotoro Obimo 19 38 57
27 Comm. Sec. Sch. Okutu 10 12 22
28 Edemani High Sch. Edemani 16 23 39
29 Comm. Sec. Sch. Breme 12 10 22
30 Comm. Sec. Sch. Ajona Obimo - - -
SCHOOLS IN IGBO-ETITI LOCAL GOVERNMENT
1 Premier Sec. Sch. Ukehe 90 92 182
2 St.James Aku 26 - 26
3 Com. High Sch. Ukehe 48 57 105
4 Girls Sec. Sch. Aku - 96 96
5 Com. Sec. Sch. Ozalla 35 20 55
6 Com. Sec. Sch. Ohodo 33 37 70
7 Com. High Sch. Ekwegbe 54 54 108
8 Com.Sec. Sch. Ukopi 22 34 56
9 Oranadu Com. Sch. Ukehe 44 50 94
10 Com. Sec. Sch. Ohebe Dim 61 73 134
11 Com. Sec. Sch. Umunko 40 50 90
12 Com. Sec. Sch. Aku 93 71 164
13 Com. Sec. Sch. Umunna 30 29 59
14 Igbo-Etiti Sec. Sch. Ikolo 24 27 51
15 Akutara Sec Sch. Ohodo 30 34 64
16 Comp. Sec Sch. Diogbe - - -
SCHOOLS IN UZO-UWANI LOCAL GOVERNMENT AREA
1 Adada Sec. Sch. Nkpologu 18 9 27
2 Uzo-Uwani Sec. Sch. Adani 20 30 50
3 Attah Mem. High. Sch. Adaba 6 5 11
4 Girls Sec. Sch. Umulokpa - 36 36
5 Com. Sec Sch. Abbi-ugbene 30 36 66
6 Com. Sec. Sch. Upkata 30 13 43
7 Com. Sec. Sch.Nimbo 53 29 82
8 Com. Sec. Sch. Ogurugu 27 20 47
9 Uvuru Sec. Sch. Uvuru 20 20 40
10 Com. High. Sch. Nrobo 20 20 40
11 Welfare Sec. Sch. Opanda 10 4 14
12 Com. Sec. Sch. Ugbene-Ajima 28 36 64
Source: Post Primary School Management Board (PPSMB) Nsukka 2013/2014
academic session.
APPENDIX B
POPULATION OF THE STUDY
LIST OF CO-EDUCATION GOVERNMENT SENIOR SECONDARY
SCHOOLS IN NSUKKA EDUCATION ZONE
S/N SCHOOLS IN NSUKKA LOCAL GOVERNMENT AREA SS2 M SS2 F TOTAL
1 Com. Sec. Sch. Isienu 100 148 248
2 Opi High Sch. Opi 44 36 80
3 Com. Sec. Sch. Lejja 28 40 68
4 Com. Sec. Sch. Edem 63 88 151
5 Com. High Sch. Umabor 50 73 123
6 Com. Sec. Sch. Ehendiagu 8 2 10
7 Com. Sec. Sch. Okpuje 40 45 85
8 Com. Sec.Sch. Ibagwani 79 85 164
9 Com.Sec. Sch. Obimo 14 32 46
10 Com. Sec. Sch. Obukpa 51 76 127
11 Com. Sec. Sch. Edeoballa 140 160 300
12 Com. Sec. Sch. Ezebunagu 15 20 35
13 Com. Sec. Sch. Nru Nsukka 55 101 156
14 Model Sec. Sch. Nsukka 52 116 168
15 Com. Sec. Sch. Alor-uno 25 29 54
16 Com. Sec. Sch. Opi agu 9 4 13
17 Lejja High Sch. Lejja 31 42 73
18 Agu Umabor Sec Sch. Umabor 18 20 38
19 Comm. Sec. Sch. Akpotoro Obimo 19 38 57
20 Comm. Sec. Sch. Okutu 10 12 22
21 Edemani High Sch. Edemani 16 23 39
22 Comm. Sec. Sch. Breme 12 10 22
GRAND TOTAL 879 1200 2079
SCHOOLS IN IGBO-ETITI LOCAL GOVERNMENT
1 Premier Sec. Sch. Ukehe 90 92 182
2 Com. High Sch. Ukehe 48 57 105
3 Com. Sec. Sch. Ozalla 35 20 55
4 Com. Sec. Sch. Ohodo 33 37 70
5 Com. High Sch. Ekwegbe 54 54 108
6 Com. Sec. Sch. Ukopi 22 34 56
7 Oranadu Com. Sch. Ukehe 44 50 94
8 Com. Sec. Sch. Ohebe Dim 61 73 134
9 Com. Sec. Sch. Umunko 40 50 90
10 Com. Sec. Sch. Aku 93 71 164
11 Com. Sec. Sch. Umunna 30 29 59
12 Igbo-Etiti Sec. Sch. Ikolo 24 27 51
13 Akutara Sec Sch. Ohodo 30 34 64
GRAND TOTAL 604 628 1232
SCHOOLS IN UZO-UWANI LOCAL GOVERNMENT AREA
1 Adada Sec. Sch. Nkpologu 18 9 27
2 Uzo-Uwani Sec. Sch. Adani 20 30 50
3 Attah Mem. High. Sch. Adaba 6 5 11
4 Com. Sec Sch. Abbi-ugbene 30 36 66
5 Com. Sec. Sch. Ukpata 30 13 43
6 Com. Sec. Sch. Nimbo 53 29 82
7 Com. Sec. Sch. Ogurugu 27 20 47
8 Uvuru Sec. Sch. Uvuru 20 20 40
9 Com. High Sch. Nrobo 20 20 40
10 Welfare Sec. Sch. Opanda 10 4 14
11 Com. Sec. Sch. Ugbene-Ajima 28 36 64
GRAND TOTAL 262 222 484
Source: Post Primary School Management Board (PPSMB) Nsukka, 2013/2014 academic session.
APPENDIX C
SAMPLE OF THE STUDY
POPULATION DISTRIBUTION OF 3795 SS2 ECONOMICS STUDENTS IN
46 GOVERNMENT CO-EDUCATION SENIOR SECONDARY SCHOOLS IN
NSUKKA EDUCATION ZONE ACCORDING TO LOCAL GOVERNMENT
AREA.
The researcher wishes to draw a sample of 1005 from this population.
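Nsukka local government area
$$\text{Male} = \frac{879 \times 1005}{3795} \approx 233, \qquad \text{Female} = \frac{1200 \times 1005}{3795} \approx 318, \qquad \text{Total} = 551$$
Igbo-Etiti local government area
$$\text{Male} = \frac{604 \times 1005}{3795} \approx 160, \qquad \text{Female} = \frac{628 \times 1005}{3795} \approx 166, \qquad \text{Total} = 326$$
Uzo-Uwani local government area
$$\text{Male} = \frac{262 \times 1005}{3795} \approx 69, \qquad \text{Female} = \frac{222 \times 1005}{3795} \approx 59, \qquad \text{Total} = 128$$
Grand total = 551 + 326 + 128 = 1005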
The total sample size is 1005. Because each stratum was sampled in proportion to its size, the
relative proportions of the Nsukka, Igbo-Etiti and Uzo-Uwani strata in the sample are the same
(up to rounding) as their relative proportions in the population.
LOCAL GOVERNMENT AREA   Nsukka           Igbo-Etiti       Uzo-Uwani        Total
                        Male   Female    Male   Female    Male   Female
Population size         879    1200      604    628       262    222
LGA total               2079             1232             484              3795
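The allocation above is proportional stratified sampling: each sex-by-LGA stratum receives n × N_h / N places, rounded to the nearest whole student. The short Python sketch below reproduces the figures. The dictionary layout and the function name `proportional_allocation` are illustrative choices, not part of the original study, and plain rounding is assumed; it happens to sum to exactly 1005 here, but in general rounded shares need not add up to n.

```python
# Proportional stratified allocation (illustrative sketch of Appendix C).
# Stratum sizes are the SS2 Economics populations listed in Appendix B.
POPULATION = {
    ("Nsukka", "Male"): 879,
    ("Nsukka", "Female"): 1200,
    ("Igbo-Etiti", "Male"): 604,
    ("Igbo-Etiti", "Female"): 628,
    ("Uzo-Uwani", "Male"): 262,
    ("Uzo-Uwani", "Female"): 222,
}


def proportional_allocation(strata: dict, sample_size: int) -> dict:
    """Give each stratum sample_size * N_h / N places, rounded to the nearest integer."""
    total = sum(strata.values())
    return {key: round(size * sample_size / total) for key, size in strata.items()}


if __name__ == "__main__":
    allocation = proportional_allocation(POPULATION, 1005)
    for (lga, sex), n_h in allocation.items():
        print(f"{lga:<11} {sex:<6} {n_h:>4}")
    print("Total", sum(allocation.values()))
```

If the rounded shares ever failed to add up to the target, a common remedy is largest-remainder rounding; that refinement is not needed for these figures.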
APPENDIX D
INSTRUMENT
ECONOMICS MULTIPLE CHOICE TEST (EMT) FOR SENIOR SECONDARY
SCHOOL ECONOMICS STUDENTS
CLASS: SS2
Time: 1 hour
Instruction: Answer all questions. For each question, identify the correct option, lettered A to D.
Please tick (✓) the box that applies to you.
SEX: Male Female
NAME: ---------------------------------------
1. A cooperative bank is an institution established for the main purpose of
A. Mobilizing savings of the cooperative societies for bank deposits
B. Accepting risks and losses as they occur in business
C. Financing personal buildings
D. Providing long-term and medium-term loans for the development of companies
2. Where a commodity takes an insignificant proportion of the consumer's income,
demand for it will be
A. Unitary elastic
B. Price inelastic
C. Fairly elastic
D. Income inelastic
3. The liquidity ratio of a commercial bank refers to the
A. Total amount of cash for the bank’s treasury
B. Total amount of cash for the bank in the central bank
C. Proportion of the bank's cash that should be on loan
D. Proportion of the bank’s total assets which should be held in cash and liquid
form.
4. The demand curve for a commodity is downward sloping because the consumer
will pay
A. Less as the marginal utility falls
B. More as the marginal utility falls
C. Less as the total utility falls
D. More as the average utility falls
5. A decrease in the demand for a product X resulted in a decrease in the demand for
another product Y. The demand for X and Y is
A. Derived
B. Composite
C. Joint
D. Competitive
6. The main feature of regressive taxation is that its rate
A. Is higher when income is higher
B. Is equal for all categories of people
C. Remains constant when income increases
D. Reduces when income increases
7. Goods for which demand rises as income rises are called
A. Complementary goods
B. Inferior goods
C. Normal goods
D. Substitute goods
8. The demand curve is
A. Downward sloping from left to right
B. Downward sloping from right to left
C. Upward sloping from right to left
D. Drawn vertically
9. Which of the following is not a function of central bank?
A. Acceptance of deposits from the customers
B. Banker to commercial banks
C. Banker to the government
D. Lender of last resort
10. The capitalist economic system is characterized by all the following except
A. Private ownership of the means of production
B. Inheritance
C. Profit motive
D. Ownership and management of the means of production are vested in the state
11. Which of these is not a characteristic feature of inflation?
A. Too much money chasing too few goods
B. A fall in employment opportunity
C. Too much money in circulation
D. A fall in the value of money
12. Which of the following is an instrument of control applied by the central bank
to ensure the smooth running of the economy?
A. Bank standard
B. Deposit slip
C. Bank draft
D. Use of reserve ratio
13. The upward slope of the supply curve indicates that
A. More will be supplied as price rises
B. Less will be supplied as price rises
C. Supply is not a function of price
D. Supply is static and demand is dynamic
14. Equilibrium price is reached when
A. Demand is less than supply
B. Supply is greater than supply
C. Demand equals supply
D. None of the above
15. When a small change in price brings about a bigger change in the quantity
supplied, the supply is
A. Relatively elastic
B. Relatively inelastic
C. Perfectly inelastic
D. Unitarily inelastic
16. Development banks mainly provide
A. Savings account facilities for a developing economy
B. Foreign exchange facilities for importers and exporters
C. Capital for development of special banks
D. Capital for development of schools
17. Mortgage banks give loans to investors on long term basis to
A. Finance agriculture
B. Establish banks
C. Acquire machinery
D. Build houses
18. Limitations to mobility of labour include the following except
A. Poor salary and wages
B. Provision of good working condition
C. Good climatic condition
D. Provision of social amenities
19. Geographical mobility of labour indicates
A. The movement of workers from one occupation to another
B. The movement of workers within the same industry or from one industry to another
C. The movement of workers from one part of a country to another
D. The movement of workers from one geographical location to another
20. The central bank controls credit in the economy through the use of
A. Legal tender
B. Travellers' cheques
C. Foreign exchange instruments
D. Open market operation
21. A commercial bank is able to create money by
A. Printing
B. Maintaining reserves
C. Creating a demand deposit as it gives a new loan
D. Issuing cheques to depositors
22. The system whereby the ownership and management of the means of production
are vested in both the private and public sectors is known as
A. Socialist economy
B. Mixed economy
C. Capitalist economy
D. Communist economy
23. Which of the following defines inflation?
A. A buoyant economy
B. A reduction in taxes
C. A continuous rise in prices
D. A continuous fall in prices
24. A change in supply is mainly caused by
A. Change in income
B. Weather condition
C. Changes in the price of the commodity
D. A change in taste and fashion
25. A demand schedule shows the quantities of goods that are
A. Bought at given prices at a time
B. Supplied at given prices at a time
C. Produced at given prices at a time
D. Reserved for future consumption
26. The point of interaction between the demand curve and supply curve is called
A. The point of intersection
B. The point of supply and demand curve
C. The equal point of demand and supply
D. The Equilibrium point
27. If the central bank intends to increase the money supply through open market
operations, then it will
A. Sell securities in the open market
B. Buy securities in the open market
C. Issue more currency notes
D. Give loans to commercial banks
28. Banks create money by
A. Giving drafts to customers
B. Printing more money
C. Lending out deposits to borrowers
D. Issuing cheques
29. The notion of short-run and long-run periods is responsible for grouping costs into
A. Fixed and variable
B. Implicit and explicit
C. Average and total
D. Capital and running
30. The total amount of goods that can be bought at a given price and at a particular
period of time is called
A. Demand
B. Supply
C. Market
D. Production
31. The central bank controls the activities of other banks by all but one of the following
A. Taxation
B. The purchase or sale of government bonds on the open market
C. Special deposits
D. The use of bank rate
32. The implicit cost that economists consider but accountants do not is
A. Fixed cost
B. Variable cost
C. Opportunity cost
D. Marginal cost
33. A tax is said to be regressive when the proportion paid by the
A. Rich pay a greater proportion of their income than the poor
B. Rich pay an equal amount with the poor
C. The low-income group is higher than that paid by the higher income earners
D. The low-income does not pay income
34. The law of demand says that the
A. Higher the price the lower the quantity demanded
B. Higher the price the lower the quantity demanded
C. Lower the price the lower the quantity demanded
D. Lower the price the higher the quantity demanded
35. A Deficit budget is usually drawn up during
A. Economics supply
B. Full employment
C. Inflationary period
D. Economic recession
36. Given that the fixed cost is N500.00, the variable cost is N1,500.00 and output is 50 units,
what will be the average cost of producing one unit?
A. N21000
B. N60.00
C. N50.00
D. N40.00
37. A surplus budget means
A. Government spends more money than it receives as revenue
B. Government spends less than it actually receives as revenue
C. When the desired level of full employment exists in the economy
D. Government spends equal money with what it receives
38. The term mobility of labour refers to
A. Movement of workers from one country to another
B. Movement of workers from one occupation and geographical area to another
C. Movement of workers from one occupation to another
D. Movement of workers from one geographical area to another
39. Public finance is the study of the methods employed by government
A. To raise revenue and how it spends the revenue and manages the national debt
B. To give employment opportunity
C. To produce goods and services
D. To raise revenue without spending
40. The following are the principles of good taxation except
A. Equity taxation
B. Benefit principle
C. Indirect tax
D. Economy
41. The following are the types of demand except
A. Derived demand
B. Comprehensive demand
C. Joint or complementary demand
D. Competitive demand
42. The type of Economic system in which the ownership and management of all
means of production are vested in the hands of private individuals is known as
A. Capitalism
B. Socialism
C. Communism
D. Mixed economy
43. The full meaning of (VAT) is
A. Value additional tax
B. Variable added tax
C. Value application tax
D. Value added tax
44. The type of bank that requires collateral security from customers before issuing
loan is called
A. Central bank
B. First bank
C. Commercial bank
D. Insurance company
45. One of the features of mixed economy is that
A. Resources are jointly owned by public and private sectors
B. It involves a great deal of central economic planning
C. Inheritance
D. Ownership is privately owned
46. The following factors influence the size of labour force except
A. Total population of the country
B. Role of women in the society
C. Retirement
D. Income
47. The following are reasons for labour mobility except
A. Promotion or transfer of workers
B. Bad management and lack of job security
C. Provision of good working conditions
D. Regular promotion and payment of salary
48. Commercial banks settle their inter-bank indebtedness through the
A. Merchant bank
B. Central bank
C. Development bank
D. Stock exchange
49. The following factors affect efficiency of labour except
A. Health
B. Working conditions
C. Specialization and division of labour
D. Poor salary
50. Demand for labour indicates the
A. Number of workers that are due for retirement
B. Number of workers that are needed for promotion
C. Number of workers that are brought into close contact with one another
D. Number of workers that are needed by producers to take part in productive
activities.
APPENDIX E
SCORING GUIDE FOR ECONOMICS MULTIPLE CHOICE TEST (EMT)
S/N 1 2 3 4 5 6 7 8 9 10 11 12 13
Answer A B A C C D B A A D A D A
S/N 14 15 16 17 18 19 20 21 22 23 24 25 26
Answer C B C D A D D C B C A B D
S/N 27 28 29 30 31 32 33 34 35 36 37 38 39
Answer D C A C A C C D D D B B A
S/N 40 41 42 43 44 45 46 47 48 49 50
Answer C B A B C A D C B D D
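Scoring against this guide is an item-by-item comparison: one mark for each response that matches the keyed option. The Python sketch below assumes each candidate's answers are stored as a 50-character string of the letters A to D in item order, with any other character for an omitted item. That storage format and the function names are hypothetical, since the study does not say how responses were recorded; the key string is transcribed from the guide above.

```python
from typing import List

# Answer key transcribed from the scoring guide (items 1-50, in order).
KEY = (
    "ABACCDBAADADA"  # items 1-13
    "CBCDADDCBCABD"  # items 14-26
    "DCACACCDDDBBA"  # items 27-39
    "CBABCADCBDD"    # items 40-50
)
assert len(KEY) == 50


def item_vector(responses: str, key: str = KEY) -> List[int]:
    """Return the 0/1 scored vector (1 = keyed option chosen) for one candidate."""
    if len(responses) != len(key):
        raise ValueError(f"expected {len(key)} responses, got {len(responses)}")
    return [int(given.upper() == correct) for given, correct in zip(responses, key)]


def score(responses: str, key: str = KEY) -> int:
    """Total number of items answered correctly (one mark per item)."""
    return sum(item_vector(responses, key))
```

The 0/1 vectors are the form of data that both the KR-20 computation and an IRT calibration program work from, so it is convenient to keep them rather than only the total score.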
APPENDIX F
TABLE OF SPECIFICATION FOR SS2 ECONOMICS MULTIPLE CHOICE
TEST (EMT) FOR SS2 ECONOMICS STUDENTS
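Entries show the number of items, with the item numbers in brackets.
S/N | Content area | Cont. % | Knowledge (35%) | Comprehension (15%) | Application (15%) | Analysis (25%) | Synthesis (5%) | Evaluation (5%) | Total
1 | Demand and Supply | 30% | 5 (2, 5, 24, 30, 25) | 2 (8, 15) | 2 (13, 4) | 4 (7, 14, 26, 34) | - | 1 (41) | 14
2 | Financial institution | 20% | 4 (3, 9, 28, 1) | 2 (48, 12) | 2 (17, 20) | 3 (16, 44, 27) | 1 (31) | 1 (21) | 13
3 | Public finance | 15% | 2 (33, 40) | 2 (37, 43) | - | 2 (6, 39) | - | 1 (35) | 7
4 | Labour force | 15% | 2 (18, 50) | 2 (19, 46) | 1 (38) | 2 (47, 49) | - | - | 7
5 | Alternative economic system | 10% | 2 (10, 45) | 1 (22) | 1 (42) | 1 (44) | - | - | 5
6 | Theory of Cost | 5% | 1 (36) | - | - | 1 (32) | - | - | 2
7 | Inflation | 5% | 1 (23) | - | - | 1 (11) | - | - | 2
Total | | 100% | 17 | 9 | 6 | 14 | 1 | 3 | 50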
APPENDIX G
COMPUTATION OF KR20 RELIABILITY CO-EFFICIENT FOR SS2
ECONOMICS MULTIPLE CHOICE TEST (EMT)
Item | No. passing | No. failing | Proportion passing (p) | Proportion failing (q) | pq
1 | 12 | 13 | 0.48 | 0.52 | 0.2496
2 | 17 | 8 | 0.68 | 0.32 | 0.2176
3 | 14 | 11 | 0.56 | 0.44 | 0.2464
4 | 17 | 8 | 0.68 | 0.32 | 0.2176
5 | 16 | 9 | 0.64 | 0.36 | 0.2304
6 | 18 | 7 | 0.72 | 0.28 | 0.2016
7 | 15 | 10 | 0.60 | 0.40 | 0.2400
8 | 15 | 10 | 0.60 | 0.40 | 0.2400
9 | 18 | 7 | 0.72 | 0.28 | 0.2016
10 | 13 | 12 | 0.52 | 0.48 | 0.2496
11 | 16 | 9 | 0.64 | 0.36 | 0.2304
12 | 18 | 7 | 0.72 | 0.28 | 0.2016
13 | 17 | 8 | 0.68 | 0.32 | 0.2176
14 | 14 | 11 | 0.56 | 0.44 | 0.2464
15 | 15 | 10 | 0.60 | 0.40 | 0.2400
16 | 17 | 8 | 0.68 | 0.32 | 0.2176
17 | 13 | 12 | 0.52 | 0.48 | 0.2496
18 | 15 | 10 | 0.60 | 0.40 | 0.2400
19 | 15 | 10 | 0.60 | 0.40 | 0.2400
20 | 15 | 10 | 0.60 | 0.40 | 0.2400
21 | 16 | 9 | 0.64 | 0.36 | 0.2304
22 | 13 | 12 | 0.52 | 0.48 | 0.2496
23 | 18 | 7 | 0.72 | 0.28 | 0.2016
24 | 14 | 11 | 0.56 | 0.44 | 0.2464
25 | 14 | 11 | 0.56 | 0.44 | 0.2464
26 | 13 | 12 | 0.52 | 0.48 | 0.2496
27 | 18 | 7 | 0.72 | 0.28 | 0.2016
28 | 14 | 11 | 0.56 | 0.44 | 0.2464
29 | 15 | 10 | 0.60 | 0.40 | 0.2400
30 | 13 | 12 | 0.52 | 0.48 | 0.2496
31 | 18 | 7 | 0.72 | 0.28 | 0.2016
32 | 15 | 10 | 0.60 | 0.40 | 0.2400
33 | 16 | 9 | 0.64 | 0.36 | 0.2304
34 | 14 | 11 | 0.56 | 0.44 | 0.2464
35 | 14 | 11 | 0.56 | 0.44 | 0.2464
36 | 15 | 10 | 0.60 | 0.40 | 0.2400
37 | 14 | 11 | 0.56 | 0.44 | 0.2464
38 | 14 | 11 | 0.56 | 0.44 | 0.2464
39 | 14 | 11 | 0.56 | 0.44 | 0.2464
40 | 16 | 9 | 0.64 | 0.36 | 0.2304
41 | 14 | 11 | 0.56 | 0.44 | 0.2464
42 | 17 | 8 | 0.68 | 0.32 | 0.2176
43 | 14 | 11 | 0.56 | 0.44 | 0.2464
44 | 16 | 9 | 0.64 | 0.36 | 0.2304
45 | 16 | 9 | 0.64 | 0.36 | 0.2304
46 | 16 | 9 | 0.64 | 0.36 | 0.2304
47 | 15 | 10 | 0.60 | 0.40 | 0.2400
48 | 15 | 10 | 0.60 | 0.40 | 0.2400
49 | 16 | 9 | 0.64 | 0.36 | 0.2304
50 | 17 | 8 | 0.68 | 0.32 | 0.2176
Total | | | | | 11.6832
Here K = 50 is the number of items and N = 25 is the number of examinees, with ΣX = 764 and ΣX² = 25646.
$$\bar{X} = \frac{\sum X}{N} = \frac{764}{25} = 30.56$$
$$S^2 = \frac{\sum X^2 - \left(\sum X\right)^2 / N}{N} = \frac{25646 - (764)^2/25}{25} = \frac{25646 - 23347.84}{25} = \frac{2298.16}{25} = 91.9264$$
$$KR_{20} = \frac{K}{K-1}\left(1 - \frac{\sum pq}{S^2}\right) = \frac{50}{50-1}\left(1 - \frac{11.6832}{91.9264}\right)$$
$$KR_{20} = 1.020408 \times (1 - 0.127093) = 1.020408 \times 0.872907 = 0.8907$$
KR20 ≈ 0.89
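The KR-20 steps above can be checked in a few lines. The Python sketch below takes the number of students passing each item (from the table), the number of examinees (25) and the two sums of total scores quoted in the computation (ΣX = 764 and ΣX² = 25646). It uses the same variance as the hand calculation, with N rather than N − 1 in the denominator. The function name `kr20` and the list name `PASSING` are illustrative.

```python
from typing import Sequence

# Number of the 25 examinees passing each item (items 1-50, from the table above).
PASSING = [
    12, 17, 14, 17, 16, 18, 15, 15, 18, 13,
    16, 18, 17, 14, 15, 17, 13, 15, 15, 15,
    16, 13, 18, 14, 14, 13, 18, 14, 15, 13,
    18, 15, 16, 14, 14, 15, 14, 14, 14, 16,
    14, 17, 14, 16, 16, 16, 15, 15, 16, 17,
]


def kr20(passing: Sequence[int], n_examinees: int, sum_x: float, sum_x2: float) -> float:
    """Kuder-Richardson formula 20 from item pass counts and total-score sums."""
    k = len(passing)
    # Sum of p*q over items, where p is the proportion passing and q = 1 - p.
    sum_pq = sum((c / n_examinees) * (1 - c / n_examinees) for c in passing)
    # Variance of total scores, dividing by N as in the hand computation.
    variance = (sum_x2 - sum_x ** 2 / n_examinees) / n_examinees
    return k / (k - 1) * (1 - sum_pq / variance)


print(round(kr20(PASSING, 25, 764, 25646), 2))  # prints 0.89
```

As an internal check, every total score is a sum of item scores, so `sum(PASSING)` must equal ΣX; it does (764).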
APPENDIX H
BILOG-MG V3.0
REV 19990329.1300
BILOG-MG ITEM MAINTENANCE PROGRAM: LOGISTIC ITEM RESPONSE
MODEL
*** BILOG-MG ITEM MAINTENANCE PROGRAM ***
*** PHASE 2 ***
3PL MODEL ANALYSIS OF ECONOMICS ACHIEVEMENT TEST
>CALIB ACCel = 1.0000;
CALIBRATION PARAMETERS
======================
MAXIMUM NUMBER OF EM CYCLES: 20
MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION: 0.0100
ACCELERATION CONSTANT: 1.0000
LATENT DISTRIBUTION: NORMAL PRIOR FOR EACH GROUP
PLOT EMPIRICAL VS. FITTED ICC'S: NO
DATA HANDLING: DATA ON SCRATCH FILE
CONSTRAINT DISTRIBUTION ON ASYMPTOTES: YES
CONSTRAINT DISTRIBUTION ON SLOPES: YES
CONSTRAINT DISTRIBUTION ON THRESHOLDS: NO
SOURCE OF ITEM CONSTRAINT DISTIBUTION
MEANS AND STANDARD DEVIATIONS: PROGRAM DEFAULTS
SUBTEST TEST0001; ITEM PARAMETERS AFTER CYCLE 13
ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
ITEM0001 | -0.137 | 0.117 | 1.174 | 0.116 | 0.080 | 51.2 8.0
| 0.042* | 0.025* | 0.437* | 0.025* | 0.015* | (0.10000)
| | | | | |
ITEM0002 | 0.231 | 0.200 | -1.153 | 0.196 | 0.030 | 37.6 8.0
| 0.040* | 0.033* | 0.270* | 0.033* | 0.010* | (0.0000)
| | | | | |
ITEM0003 | 0.138 | 0.390 | -0.354 | 0.364 | 0.093 | 67.3 8.0
| 0.045* | 0.040* | 0.122* | 0.037* | 0.018* | (0.1200)
| | | | | |
ITEM0004 | 0.218 | 0.491 | -0.444 | 0.440 | 0.001 | 48.5 8.0
| 0.042* | 0.043* | 0.093* | 0.039* | 0.006* | (0.0000)
| | | | | |
ITEM0005 | 0.161 | 0.454 | -0.354 | 0.413 | 0.101 | 51.9 8.0
| 0.041* | 0.044* | 0.096* | 0.040* | 0.005* | (0.2034)
| | | | | |
ITEM0006 | 0.118 | 0.439 | -0.268 | 0.402 | 0.000 | 47.6 8.0
| 0.040* | 0.045* | 0.099* | 0.041* | 0.003* | (0.0000)
| | | | | |
ITEM0007 | 0.094 | 0.582 | -0.161 | 0.503 | 0.012 | 96.6 8.0
| 0.084* | 0.066* | 0.159* | 0.057* | 0.048* | (0.1544)
| | | | | |
ITEM0008 | 0.057 | 0.793 | -0.072 | 0.622 | 0.082 | 30.5 8.0
| 0.045* | 0.059* | 0.058* | 0.046* | 0.003* | (0.0002)
| | | | | |
ITEM0009 | -0.709 | 0.967 | 0.733 | 0.695 | 0.177 | 90.9 8.0
| 0.172* | 0.169* | 0.087* | 0.121* | 0.032* | (0.1441)
| | | | | |
ITEM0010 | 0.318 | 0.524 | -0.606 | 0.464 | 0.002 | 46.7 8.0
| 0.045* | 0.045* | 0.096* | 0.040* | 0.014* | (0.0500)
| | | | | |
ITEM0011 | 0.116 | 0.425 | -0.274 | 0.392 | 0.020 | 79.4 8.0
| 0.086* | 0.051* | 0.221* | 0.047* | 0.062* | (0.0000)
| | | | | |
ITEM0012 | 0.361 | 0.386 | -0.936 | 0.360 | 0.001 | 57.1 8.0
| 0.042* | 0.041* | 0.140* | 0.039* | 0.007* | (0.1633)
| | | | | |
ITEM0013 | -0.159 | 0.958 | 0.166 | 0.692 | 0.051 | 31.5 8.0
| 0.050* | 0.070* | 0.050* | 0.050* | 0.007* | (0.0343)
| | | | | |
ITEM0014 | 0.332 | 0.467 | -0.711 | 0.423 | 0.011 | 18.2 8.0
| 0.042* | 0.044* | 0.109* | 0.040* | 0.005* | (0.0113)
| | | | | |
ITEM0015 | 0.245 | 0.514 | -0.478 | 0.457 | 0.001 | 55.0 8.0
| 0.042* | 0.043* | 0.093* | 0.039* | 0.004* | (0.0812)
| | | | | |
ITEM0016 | 0.488 | 0.233 | -2.096 | 0.227 | 0.021 | 35.2 8.0
| 0.042* | 0.037* | 0.358* | 0.036* | 0.009* | (0.0001)
| | | | | |
ITEM0017 | 0.310 | 0.130 | -2.378 | 0.129 | 0.134 | 31.4 8.0
| 0.040* | 0.027* | 0.577* | 0.027* | 0.010* | (0.0745)
| | | | | |
ITEM0018 | 0.049 | 0.881 | -0.056 | 0.661 | 0.146 | 84.2 8.0
| 0.048* | 0.061* | 0.056* | 0.046* | 0.002* | (0.0000)
| | | | | |
ITEM0019 | 0.262 | 0.510 | -0.515 | 0.454 | 0.000 | 76.0 8.0
| 0.041* | 0.048* | 0.095* | 0.043* | 0.003* | (0.0000)
| | | | | |
ITEM0020 | -0.193 | 0.559 | 0.346 | 0.488 | 0.122 | 43.9 8.0
| 0.042* | 0.052* | 0.075* | 0.046* | 0.003* | (0.0333)
| | | | | |
ITEM0021 | 0.392 | 0.668 | -0.588 | 0.555 | 0.001 | 31.7 8.0
| 0.045* | 0.049* | 0.076* | 0.041* | 0.005* | (0.0912)
| | | | | |
ITEM0022 | -0.147 | 1.133 | 0.130 | 0.750 | 0.070 | 44.2 8.0
| 0.117* | 0.130* | 0.093* | 0.086* | 0.036* | (0.1823)
| | | | | |
ITEM0023 | -0.285 | 1.094 | 0.261 | 0.738 | 0.323 | 77.4 8.0
| 0.170* | 0.175* | 0.125* | 0.118* | 0.041* | (0.0000)
| | | | | |
ITEM0024 | -0.233 | 1.213 | 0.192 | 0.772 | 0.007 | 13.7 8.0
| 0.095* | 0.132* | 0.065* | 0.084* | 0.023* | (0.0566)
| | | | | |
ITEM0025 | 0.176 | 0.597 | -0.295 | 0.513 | 0.041 | 40.0 8.0
| 0.044* | 0.045* | 0.079* | 0.039* | 0.009* | (0.0000)
| | | | | |
ITEM0026 | 0.190 | 0.318 | -0.597 | 0.303 | 0.003 | 84.2 8.0
| 0.044* | 0.039* | 0.154* | 0.037* | 0.019* | (0.1296)
| | | | | |
ITEM0027 | 0.280 | 0.188 | -1.493 | 0.185 | 0.001 | 18.0 8.0
| 0.040* | 0.033* | 0.329* | 0.032* | 0.009* | (0.0215)
| | | | | |
ITEM0028 | 0.383 | 0.655 | -0.586 | 0.548 | 0.001 | 79.0 8.0
| 0.046* | 0.055* | 0.076* | 0.046* | 0.008* | (0.0635)
| | | | | |
ITEM0029 | -0.056 | 1.287 | 0.044 | 0.790 | 0.092 | 46.0 8.0
| 0.096* | 0.138* | 0.072* | 0.085* | 0.026* | (0.0723)
| | | | | |
ITEM0030 | 0.539 | 0.742 | -0.727 | 0.596 | 0.000 | 43.7 8.0
| 0.048* | 0.058* | 0.077* | 0.046* | 0.003* | (0.0000)
| | | | | |
ITEM0031 | 0.140 | 0.969 | -0.144 | 0.696 | 0.002 | 21.3 8.0
| 0.052* | 0.070* | 0.055* | 0.050* | 0.012* | (0.0034)
| | | | | |
ITEM0032 | 0.345 | 0.249 | -1.383 | 0.242 | 0.052 | 103.4 8.0
| 0.041* | 0.035* | 0.242* | 0.034* | 0.012* | (0.08209)
| | | | | |
ITEM0033 | 0.276 | 0.600 | -0.461 | 0.515 | 0.001 | 48.7 8.0
| 0.043* | 0.049* | 0.079* | 0.042* | 0.005* | (0.000)
| | | | | |
ITEM0034 | -0.252 | 1.143 | 0.221 | 0.752 | 0.160 | 45.4 8.0
| 0.132* | 0.157* | 0.094* | 0.104* | 0.038* | (0.0148)
| | | | | |
ITEM0035 | -0.045 | 0.447 | 0.101 | 0.408 | 0.100 | 92.6 8.0
| 0.040* | 0.047* | 0.089* | 0.043* | 0.003* | (0.2357)
| | | | | |
ITEM0036 | 0.057 | 0.614 | -0.092 | 0.523 | 0.000 | 55.2 8.0
| 0.042* | 0.055* | 0.070* | 0.046* | 0.003* | (0.0000)
| | | | | |
ITEM0037 | -1.464 | 3.297 | 0.444 | 0.957 | 0.000 | 52.1 8.0
| 0.605* | 0.988* | 0.063* | 0.287* | 0.025* | (0.0000)
| | | | | |
ITEM0038 | -0.058 | 1.016 | 0.057 | 0.713 | 0.242 | 29.3 8.0
| 0.145* | 0.155* | 0.138* | 0.108* | 0.051* | (0.0003)
| | | | | |
ITEM0039 | -0.275 | 0.929 | 0.295 | 0.681 | 0.000 | 23.2 8.0
| 0.053* | 0.067* | 0.050* | 0.049* | 0.001* | (0.2647)
| | | | | |
ITEM0040 | -0.346 | 1.289 | 0.268 | 0.790 | 0.404 | 179.9 8.0
| 0.196* | 0.201* | 0.120* | 0.123* | 0.037* | (0.0000)
| | | | | |
ITEM0041 | 0.242 | 0.722 | -0.335 | 0.585 | 0.114 | 77.8 8.0
| 0.045* | 0.053* | 0.066* | 0.043* | 0.007* | (0.04498)
| | | | | |
ITEM0042 | -0.203 | 0.845 | 0.240 | 0.645 | 0.000 | 23.8 8.0
| 0.048* | 0.065* | 0.053* | 0.049* | 0.001* | (0.0006)
| | | | | |
ITEM0043 | -0.590 | 1.714 | 0.344 | 0.864 | 0.251 | 70.4 8.0
| 0.190* | 0.283* | 0.068* | 0.143* | 0.029* | (0.0000)
| | | | | |
ITEM0044 | -0.120 | 0.890 | 0.135 | 0.665 | 0.234 | 116.5 8.0
| 0.153* | 0.137* | 0.157* | 0.102* | 0.053* | (0.0000)
| | | | | |
ITEM0045 | 0.173 | 0.973 | -0.177 | 0.697 | 0.000 | 26.1 8.0
| 0.049* | 0.065* | 0.054* | 0.047* | 0.002* | (0.0875)
| | | | | |
ITEM0046 | 0.425 | 0.380 | -1.117 | 0.356 | 0.002 | 41.0 8.0
| 0.043* | 0.041* | 0.150* | 0.038* | 0.011* | (0.0000)
| | | | | |
ITEM0047 | 0.334 | 0.568 | -0.587 | 0.494 | 0.071 | 138.4 8.0
| 0.043* | 0.048* | 0.087* | 0.042* | 0.005* | (0.1341)
| | | | | |
ITEM0048 | -0.052 | 0.667 | 0.078 | 0.555 | 0.001 | 33.5 8.0
| 0.044* | 0.051* | 0.065* | 0.042* | 0.005* | (0.0001)
| | | | | |
ITEM0049 | 0.205 | 0.319 | -0.643 | 0.304 | 0.153 | 45.5 8.0
| 0.045* | 0.039* | 0.157* | 0.037* | 0.020* | (0.02293)
| | | | | |
ITEM0050 | 0.079 | 0.207 | -0.379 | 0.203 | 0.001 | 94.3 8.0
| 0.039* | 0.035* | 0.199* | 0.035* | 0.009* | (0.0861)
-----------------------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.019468 2876.8 387.0
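Each row of the calibration table above describes one three-parameter logistic (3PL) item. SLOPE is the discrimination a, THRESHOLD is the difficulty b and ASYMPTOTE is the lower asymptote (pseudo-guessing) c. The printed INTERCEPT and LOADING columns are consistent with intercept = −a·b and loading = a/√(1 + a²); for ITEM0001, −0.117 × 1.174 ≈ −0.137 and 0.117/√1.0137 ≈ 0.116. The Python sketch below evaluates the 3PL item characteristic curve for such parameters. The scaling constant D is an assumption: BILOG-MG uses D = 1.7 unless its LOGISTIC option is set (then D = 1), and this output does not show which applied, so D is left as a parameter. The function names are illustrative.

```python
import math


def p_3pl(theta: float, a: float, b: float, c: float, d: float = 1.7) -> float:
    """Probability of a correct response under the 3PL model.

    a = discrimination (SLOPE), b = difficulty (THRESHOLD),
    c = lower asymptote (ASYMPTOTE), d = scaling constant (assumed 1.7).
    Intended for ordinary ability ranges; very extreme theta values could
    overflow math.exp.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


def intercept(a: float, b: float) -> float:
    """Intercept implied by slope and threshold: -a * b."""
    return -a * b


def loading(a: float) -> float:
    """Factor loading implied by slope a: a / sqrt(1 + a^2)."""
    return a / math.sqrt(1.0 + a * a)


# (slope, threshold, asymptote) for two items from the table above.
ITEMS = {"ITEM0001": (0.117, 1.174, 0.080), "ITEM0037": (3.297, 0.444, 0.000)}

if __name__ == "__main__":
    for name, (a, b, c) in ITEMS.items():
        probs = [p_3pl(t, a, b, c) for t in (-2, -1, 0, 1, 2)]
        print(name, round(intercept(a, b), 3), round(loading(a), 3),
              " ".join(f"{p:.2f}" for p in probs))
```

At θ = b the curve is halfway between c and 1, so an item's threshold is the ability at which the probability of success is (1 + c)/2. A flat curve such as ITEM0001 (a = 0.117) barely separates weak from strong students, while ITEM0037 (a = 3.297) rises steeply near θ = 0.444.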
APPENDIX I
BILOG-MG V3.0
REV 19990329.1300
BILOG-MG ITEM MAINTENANCE PROGRAM: LOGISTIC ITEM RESPONSE
MODEL
*** BILOG-MG ITEM MAINTENANCE PROGRAM ***
*** PHASE 2 ***
DIF MODEL ANALYSIS OF ECONOMICS ACHIEVEMENT TEST BY GENDER
>CALIB ACCel = 1.0000;
CALIBRATION PARAMETERS
======================
MAXIMUM NUMBER OF EM CYCLES: 20
MAXIMUM NUMBER OF NEWTON CYCLES: 2
CONVERGENCE CRITERION: 0.0100
ACCELERATION CONSTANT: 1.0000
LATENT DISTRIBUTION: EMPIRICAL PRIOR FOR EACH GROUP
ESTIMATED CONCURRENTLY
WITH ITEM PARAMETERS
REFERENCE GROUP: 1
PLOT EMPIRICAL VS. FITTED ICC'S: NO
DATA HANDLING: DATA ON SCRATCH FILE
CONSTRAINT DISTRIBUTION ON SLOPES: NO
CONSTRAINT DISTRIBUTION ON THRESHOLDS: NO
GROUP 1 MALE ; ITEM PARAMETERS AFTER CYCLE 3
ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
ITEM0001 | -0.102 | 0.535 | 0.190 | 0.472 | 0.000 | 120.2 8.0
| 0.060* | 0.007* | 0.113* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0002 | 0.231 | 0.535 | -0.431 | 0.472 | 0.000 | 68.2 8.0
| 0.063* | 0.007* | 0.118* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0003 | 0.264 | 0.535 | -0.492 | 0.472 | 0.000 | 5.3 8.0
| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.7242)
| | | | | |
ITEM0004 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 7.4 8.0
| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.4953)
| | | | | |
ITEM0005 | 0.181 | 0.535 | -0.338 | 0.472 | 0.000 | 3.5 8.0
| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.8999)
| | | | | |
ITEM0006 | 0.197 | 0.535 | -0.369 | 0.472 | 0.000 | 11.9 8.0
| 0.066* | 0.007* | 0.124* | 0.006* | 0.000* | (0.1559)
| | | | | |
ITEM0007 | 0.148 | 0.535 | -0.276 | 0.472 | 0.000 | 26.3 8.0
| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0009)
| | | | | |
ITEM0008 | 0.083 | 0.535 | -0.155 | 0.472 | 0.000 | 23.7 8.0
| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.0026)
| | | | | |
ITEM0009 | -0.231 | 0.535 | 0.432 | 0.472 | 0.000 | 6.2 8.0
| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.6230)
| | | | | |
ITEM0010 | 0.417 | 0.535 | -0.779 | 0.472 | 0.000 | 3.0 8.0
| 0.070* | 0.007* | 0.131* | 0.006* | 0.000* | (0.9334)
| | | | | |
ITEM0011 | 0.214 | 0.535 | -0.399 | 0.472 | 0.000 | 5.0 8.0
| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.7601)
| | | | | |
ITEM0012 | 0.400 | 0.535 | -0.747 | 0.472 | 0.000 | 6.6 8.0
| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.5837)
| | | | | |
ITEM0013 | 0.018 | 0.535 | -0.034 | 0.472 | 0.000 | 40.0 8.0
| 0.075* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0014 | 0.280 | 0.535 | -0.524 | 0.472 | 0.000 | 10.7 8.0
| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.2223)
| | | | | |
ITEM0015 | 0.383 | 0.535 | -0.714 | 0.472 | 0.000 | 6.3 8.0
| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.6171)
| | | | | |
ITEM0016 | 0.515 | 0.535 | -0.963 | 0.472 | 0.000 | 49.4 8.0
| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0017 | 0.340 | 0.535 | -0.635 | 0.472 | 0.000 | 92.8 8.0
| 0.062* | 0.007* | 0.117* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0018 | 0.018 | 0.535 | -0.034 | 0.472 | 0.000 | 89.8 8.0
| 0.078* | 0.007* | 0.145* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0019 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 30.1 8.0
| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.0002)
| | | | | |
ITEM0020 | -0.183 | 0.535 | 0.341 | 0.472 | 0.000 | 13.2 8.0
| 0.070* | 0.007* | 0.132* | 0.006* | 0.000* | (0.1053)
| | | | | |
ITEM0021 | 0.357 | 0.535 | -0.666 | 0.472 | 0.000 | 9.5 8.0
| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.2983)
| | | | | |
ITEM0022 | 0.075 | 0.535 | -0.140 | 0.472 | 0.000 | 27.5 8.0
| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0006)
| | | | | |
ITEM0023 | 0.470 | 0.535 | -0.878 | 0.472 | 0.000 | 2.1 8.0
| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.9792)
| | | | | |
ITEM0024 | -0.126 | 0.535 | 0.236 | 0.472 | 0.000 | 80.2 8.0
| 0.078* | 0.007* | 0.147* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0025 | 0.264 | 0.535 | -0.492 | 0.472 | 0.000 | 20.7 8.0
| 0.074* | 0.007* | 0.139* | 0.006* | 0.000* | (0.0079)
| | | | | |
ITEM0026 | 0.206 | 0.535 | -0.384 | 0.472 | 0.000 | 16.1 8.0
| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0413)
| | | | | |
ITEM0027 | 0.231 | 0.535 | -0.431 | 0.472 | 0.000 | 107.2 8.0
| 0.062* | 0.007* | 0.115* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0028 | 0.409 | 0.535 | -0.763 | 0.472 | 0.000 | 4.0 8.0
| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.8573)
| | | | | |
ITEM0029 | 0.205 | 0.535 | -0.384 | 0.472 | 0.000 | 15.2 8.0
| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.0555)
| | | | | |
ITEM0030 | 0.452 | 0.535 | -0.845 | 0.472 | 0.000 | 23.8 8.0
| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.0024)
| | | | | |
ITEM0031 | 0.156 | 0.535 | -0.292 | 0.472 | 0.000 | 10.4 8.0
| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.2405)
| | | | | |
ITEM0032 | 0.497 | 0.535 | -0.929 | 0.472 | 0.000 | 9.2 8.0
| 0.073* | 0.007* | 0.136* | 0.006* | 0.000* | (0.3239)
| | | | | |
ITEM0033 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 6.9 8.0
| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.5420)
| | | | | |
ITEM0034 | 0.010 | 0.535 | -0.019 | 0.472 | 0.000 | 9.0 8.0
| 0.072* | 0.007* | 0.134* | 0.006* | 0.000* | (0.3430)
| | | | | |
ITEM0035 | 0.019 | 0.535 | -0.035 | 0.472 | 0.000 | 0.9 8.0
| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.9958)
| | | | | |
ITEM0036 | 0.043 | 0.535 | -0.080 | 0.472 | 0.000 | 15.7 8.0
| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.0472)
| | | | | |
ITEM0037 | 0.255 | 0.535 | -0.477 | 0.472 | 0.000 | 10.4 8.0
| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.2357)
| | | | | |
ITEM0038 | 0.382 | 0.535 | -0.714 | 0.472 | 0.000 | 10.1 8.0
| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.2557)
| | | | | |
ITEM0039 | -0.248 | 0.535 | 0.463 | 0.472 | 0.000 | 44.9 7.0
| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0040 | 0.552 | 0.535 | -1.031 | 0.472 | 0.000 | 10.4 8.0
| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.2403)
| | | | | |
ITEM0041 | 0.272 | 0.535 | -0.508 | 0.472 | 0.000 | 31.1 8.0
| 0.076* | 0.007* | 0.142* | 0.006* | 0.000* | (0.0001)
| | | | | |
ITEM0042 | -0.223 | 0.535 | 0.417 | 0.472 | 0.000 | 31.1 8.0
| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0001)
| | | | | |
ITEM0043 | 0.272 | 0.535 | -0.508 | 0.472 | 0.000 | 22.8 8.0
| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0037)
| | | | | |
ITEM0044 | 0.409 | 0.535 | -0.763 | 0.472 | 0.000 | 5.3 8.0
| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.7217)
| | | | | |
ITEM0045 | 0.205 | 0.535 | -0.383 | 0.472 | 0.000 | 99.8 8.0
| 0.081* | 0.007* | 0.152* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0046 | 0.534 | 0.535 | -0.997 | 0.472 | 0.000 | 13.3 8.0
| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.1025)
| | | | | |
ITEM0047 | 0.206 | 0.535 | -0.384 | 0.472 | 0.000 | 4.7 8.0
| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.7911)
| | | | | |
ITEM0048 | -0.046 | 0.535 | 0.086 | 0.472 | 0.000 | 18.1 8.0
| 0.073* | 0.007* | 0.136* | 0.006* | 0.000* | (0.0208)
| | | | | |
ITEM0049 | 0.197 | 0.535 | -0.369 | 0.472 | 0.000 | 17.9 8.0
| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0220)
| | | | | |
ITEM0050 | 0.083 | 0.535 | -0.156 | 0.472 | 0.000 | 141.6 8.0
| 0.060* | 0.007* | 0.112* | 0.006* | 0.000* | (0.0000)
-------------------------------------------------------------------------------
* STANDARD ERROR
LARGEST CHANGE = 0.005386 3540.2 378.0
(0.0000)
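In this DIF run the output shows one common slope (0.535) for every item and an asymptote of 0.000, so the two groups can differ only in item threshold. One way to read the male table above against the female table that follows is to subtract each item's male threshold from its female threshold and divide by the combined standard error. The Python sketch below does this for items 1 to 5, with values copied from the two tables. It is an illustrative hand check only. BILOG-MG's own DIF procedure compares thresholds after adjusting for the overall difference between the groups, and that adjustment is not reproduced here. The dictionary and function names are hypothetical.

```python
import math

# (threshold, standard error) for items 1-5, copied from the GROUP 1 MALE
# and GROUP 2 FEMALE tables of this BILOG-MG run.
MALE = {1: (0.190, 0.113), 2: (-0.431, 0.118), 3: (-0.492, 0.129),
        4: (-0.446, 0.125), 5: (-0.338, 0.127)}
FEMALE = {1: (0.377, 0.082), 2: (-0.518, 0.088), 3: (-0.167, 0.091),
          4: (-0.410, 0.097), 5: (-0.304, 0.092)}


def threshold_contrast(male: dict, female: dict) -> dict:
    """Female-minus-male threshold difference and its z ratio for each item.

    Unadjusted: no correction is made for the overall difference between groups.
    """
    out = {}
    for item, (b_m, se_m) in male.items():
        b_f, se_f = female[item]
        diff = b_f - b_m
        z = diff / math.sqrt(se_m ** 2 + se_f ** 2)
        out[item] = (diff, z)
    return out


if __name__ == "__main__":
    for item, (diff, z) in threshold_contrast(MALE, FEMALE).items():
        print(f"ITEM{item:04d}  diff = {diff:+.3f}  z = {z:+.2f}")
```

A positive difference means the item behaved as harder for the female group than for the male group. Any conclusion about DIF should rest on the program's adjusted results rather than on this unadjusted contrast.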
GROUP 2 FEMALE ; ITEM PARAMETERS AFTER CYCLE 3
ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF
S.E. S.E. S.E. S.E. S.E. (PROB)
-------------------------------------------------------------------------------
ITEM0001 | -0.202 | 0.535 | 0.377 | 0.472 | 0.000 | 266.8 8.0
| 0.044* | 0.007* | 0.082* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0002 | 0.277 | 0.535 | -0.518 | 0.472 | 0.000 | 113.5 8.0
| 0.047* | 0.007* | 0.088* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0003 | 0.089 | 0.535 | -0.167 | 0.472 | 0.000 | 22.2 8.0
| 0.048* | 0.007* | 0.091* | 0.006* | 0.000* | (0.0046)
| | | | | |
ITEM0004 | 0.219 | 0.535 | -0.410 | 0.472 | 0.000 | 7.5 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.4884)
| | | | | |
ITEM0005 | 0.163 | 0.535 | -0.304 | 0.472 | 0.000 | 16.8 8.0
| 0.049* | 0.007* | 0.092* | 0.006* | 0.000* | (0.0323)
| | | | | |
ITEM0006 | 0.085 | 0.535 | -0.159 | 0.472 | 0.000 | 40.8 8.0
| 0.049* | 0.007* | 0.091* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0007 | 0.098 | 0.535 | -0.182 | 0.472 | 0.000 | 36.9 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0008 | 0.067 | 0.535 | -0.126 | 0.472 | 0.000 | 37.6 8.0
| 0.054* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0009 | -0.129 | 0.535 | 0.241 | 0.472 | 0.000 | 6.5 8.0
| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.5931)
| | | | | |
ITEM0010 | 0.277 | 0.535 | -0.517 | 0.472 | 0.000 | 3.2 8.0
| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.9211)
| | | | | |
ITEM0011 | 0.093 | 0.535 | -0.175 | 0.472 | 0.000 | 6.4 8.0
| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.6034)
| | | | | |
ITEM0012 | 0.381 | 0.535 | -0.712 | 0.472 | 0.000 | 19.8 8.0
| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0110)
| | | | | |
ITEM0013 | -0.142 | 0.535 | 0.266 | 0.472 | 0.000 | 105.6 8.0
| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0014 | 0.381 | 0.535 | -0.712 | 0.472 | 0.000 | 10.7 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.7028)
| | | | | |
ITEM0015 | 0.184 | 0.535 | -0.344 | 0.472 | 0.000 | 22.3 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0044)
| | | | | |
ITEM0016 | 0.572 | 0.535 | -1.069 | 0.472 | 0.000 | 109.2 8.0
| 0.049* | 0.007* | 0.092* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0017 | 0.368 | 0.535 | -0.687 | 0.472 | 0.000 | 242.6 8.0
| 0.045* | 0.007* | 0.085* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0018 | 0.097 | 0.535 | -0.182 | 0.472 | 0.000 | 90.4 8.0
| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0019 | 0.286 | 0.535 | -0.534 | 0.472 | 0.000 | 30.0 8.0
| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0002)
| | | | | |
ITEM0020 | -0.185 | 0.535 | 0.345 | 0.472 | 0.000 | 20.3 8.0
| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0094)
| | | | | |
ITEM0021 | 0.385 | 0.535 | -0.720 | 0.472 | 0.000 | 9.8 8.0
| 0.054* | 0.007* | 0.102* | 0.006* | 0.000* | (0.0188)
| | | | | |
ITEM0022 | 0.041 | 0.535 | -0.077 | 0.472 | 0.000 | 147.4 8.0
| 0.057* | 0.007* | 0.107* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0023 | 0.317 | 0.535 | -0.592 | 0.472 | 0.000 | 4.2 8.0
| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.8359)
| | | | | |
ITEM0024 | -0.070 | 0.535 | 0.130 | 0.472 | 0.000 | 134.8 8.0
| 0.057* | 0.007* | 0.106* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0025 | 0.137 | 0.535 | -0.255 | 0.472 | 0.000 | 4.5 8.0
| 0.051* | 0.007* | 0.096* | 0.006* | 0.000* | (0.8131)
| | | | | |
ITEM0026 | 0.211 | 0.535 | -0.394 | 0.472 | 0.000 | 71.9 8.0
| 0.048* | 0.007* | 0.089* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0027 | 0.368 | 0.535 | -0.686 | 0.472 | 0.000 | 107.2 8.0
| 0.047* | 0.007* | 0.088* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0028 | 0.349 | 0.535 | -0.652 | 0.472 | 0.000 | 6.7 8.0
| 0.054* | 0.007* | 0.100* | 0.006* | 0.000* | (0.5697)
| | | | | |
ITEM0029 | 0.132 | 0.535 | -0.246 | 0.472 | 0.000 | 200.0 8.0
| 0.059* | 0.007* | 0.110* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0030 | 0.513 | 0.535 | -0.958 | 0.472 | 0.000 | 45.0 8.0
| 0.056* | 0.007* | 0.105* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0031 | 0.141 | 0.535 | -0.262 | 0.472 | 0.000 | 101.8 8.0
| 0.057* | 0.007* | 0.106* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0032 | 0.331 | 0.535 | -0.618 | 0.472 | 0.000 | 61.0 8.0
| 0.048* | 0.007* | 0.090* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0033 | 0.295 | 0.535 | -0.551 | 0.472 | 0.000 | 5.9 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.6534)
| | | | | |
ITEM0034 | 0.189 | 0.535 | -0.352 | 0.472 | 0.000 | 21.0 8.0
| 0.053* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0071)
| | | | | |
ITEM0035 | -0.082 | 0.535 | 0.153 | 0.472 | 0.000 | 68.6 8.0
| 0.048* | 0.007* | 0.090* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0036 | 0.076 | 0.535 | -0.142 | 0.472 | 0.000 | 12.4 8.0
| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.1341)
| | | | | |
ITEM0037 | 0.313 | 0.535 | -0.584 | 0.472 | 0.000 | 9.4 8.0
| 0.052* | 0.007* | 0.098* | 0.006* | 0.000* | (0.3085)
| | | | | |
ITEM0038 | 0.344 | 0.535 | -0.643 | 0.472 | 0.000 | 4.2 8.0
| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.8371)
| | | | | |
ITEM0039 | -0.151 | 0.535 | 0.282 | 0.472 | 0.000 | 78.9 8.0
| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0040 | 0.489 | 0.535 | -0.913 | 0.472 | 0.000 | 14.5 8.0
| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.0702)
| | | | | |
ITEM0041 | 0.215 | 0.535 | -0.401 | 0.472 | 0.000 | 11.1 8.0
| 0.053* | 0.007* | 0.098* | 0.006* | 0.000* | (0.1975)
| | | | | |
ITEM0042 | -0.099 | 0.535 | 0.186 | 0.472 | 0.000 | 68.4 8.0
| 0.055* | 0.007* | 0.102* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0043 | 0.167 | 0.535 | -0.312 | 0.472 | 0.000 | 11.2 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.1931)
| | | | | |
ITEM0044 | 0.232 | 0.535 | -0.434 | 0.472 | 0.000 | 24.5 8.0
| 0.053* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0019)
| | | | | |
ITEM0045 | 0.149 | 0.535 | -0.279 | 0.472 | 0.000 | 83.8 8.0
| 0.056* | 0.007* | 0.104* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0046 | 0.418 | 0.535 | -0.781 | 0.472 | 0.000 | 13.3 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.4978)
| | | | | |
ITEM0047 | 0.404 | 0.535 | -0.755 | 0.472 | 0.000 | 2.0 8.0
| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.9803)
| | | | | |
ITEM0048 | -0.027 | 0.535 | 0.050 | 0.472 | 0.000 | 18.0 8.0
| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0214)
| | | | | |
ITEM0049 | 0.242 | 0.535 | -0.451 | 0.472 | 0.000 | 76.2 8.0
| 0.048* | 0.007* | 0.089* | 0.006* | 0.000* | (0.0000)
| | | | | |
ITEM0050 | 0.085 | 0.535 | -0.159 | 0.472 | 0.000 | 228.0 8.0
| 0.045* | 0.007* | 0.083* | 0.006* | 0.000* | (0.0000)
-------------------------------------------------------------------------------
* STANDARD ERROR
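The PROB value in parentheses under each CHISQ/DF pair is the upper-tail probability of a chi-square item-fit statistic. Small values flag items whose observed responses depart from the fitted model. Every item in this listing has DF = 8, and for an even number of degrees of freedom that probability has an exact closed form that needs only the standard library. The sketch below recomputes it for a few Group 2 items. The helper name and the 0.05 cut-off used to label misfit are illustrative assumptions, not criteria stated in this listing. CHISQ is printed to one decimal, so the recomputed probabilities match the printed ones only to about two decimal places.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(X > x) for a chi-square variable with an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!,
    which is exact only when df is a positive even integer (df = 8 here).
    """
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0  # k = 0 term of the series
    for k in range(1, df // 2):
        term *= half / k  # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total


# Group 2 rows copied from the listing: (item, CHISQ, DF, printed PROB)
rows = [
    ("ITEM0004", 7.5, 8, 0.4884),
    ("ITEM0010", 3.2, 8, 0.9211),
    ("ITEM0023", 4.2, 8, 0.8359),
    ("ITEM0047", 2.0, 8, 0.9803),
]

ALPHA = 0.05  # hypothetical misfit cut-off, chosen only for illustration

for item, chisq, df, printed in rows:
    p = chi2_upper_tail_even_df(chisq, df)
    verdict = "misfit" if p < ALPHA else "adequate fit"
    print(f"{item}: p={p:.4f} (printed {printed:.4f}) -> {verdict} at alpha = {ALPHA}")
```

The same function applied to a large statistic such as ITEM0001's 266.8 returns a probability far below 0.0001, consistent with the printed (0.0000).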