FACULTY OF EDUCATION

DEPARTMENT OF SCIENCE EDUCATION

APPLICATION OF ITEM RESPONSE THEORY IN THE DEVELOPMENT AND VALIDATION OF MULTIPLE CHOICE TEST IN ECONOMICS

ANI, ELIZABETH NGOZIKA

PG/M.Ed/10/52484

TITLE PAGE

APPLICATION OF ITEM RESPONSE THEORY IN THE DEVELOPMENT

AND VALIDATION OF MULTIPLE CHOICE TEST IN ECONOMICS

BY

ANI, ELIZABETH NGOZIKA

PG/M.Ed/10/52484

A RESEARCH PROPOSAL PRESENTED TO THE DEPARTMENT OF

SCIENCE EDUCATION,

UNIVERSITY OF NIGERIA, NSUKKA

IN PARTIAL FULFILMENT OF THE REQUIREMENTS FOR THE

AWARD OF MASTER OF EDUCATION IN MEASUREMENT AND

EVALUATION (M.ED).

MAY, 2014

APPROVAL PAGE

This project has been approved for the Department of Science Education,

University of Nigeria Nsukka.

By

Dr. B. C. Madu (Supervisor)

Professor Z. C. Njoku (Head of Department)

Professor O. A. Afemikhe (External Examiner)

Dr. J. C. Onuoha (Internal Examiner)

Professor Uju C. Umo (Dean, Faculty of Education)

CERTIFICATION

Ani, Elizabeth Ngozika, a postgraduate student in the Department of Science Education with registration number PG/M.Ed/10/52484, has satisfactorily completed the requirements for the course and research work for the degree of Master of Education in Measurement and Evaluation. The work embodied in this thesis is original and has not been submitted in part or in full for any other diploma or degree of this or any other university.

Ani, Elizabeth Ngozika (Student)

Dr. B. C. Madu (Supervisor)

DEDICATION

This project report is dedicated to God Almighty and my beloved father, late

Chief Matthew Chukwmaeze Ani who did not live to reap the fruits of his labour.

ACKNOWLEDGEMENTS

The researcher sincerely appreciates God Almighty for his love and guidance throughout the period of this research work. The researcher wishes to acknowledge, with a deep sense of gratitude, the co-operation, help and encouragement of all those who in one way or another contributed to the success of this research.

First among them is the researcher’s supervisor, Dr. B. C. Madu, whose patience, guidance, fatherly advice, selfless service and dedication helped to bring this work to a successful completion. The researcher’s special thanks go to Mr. Christian Ugwuanyi and Dr. John Agah for their useful corrections and direction during the proposal stage. The researcher’s special gratitude goes to the panel members for their inputs and encouragement. The researcher also appreciates the effort of Dr. (Mrs.) E. Umobong, who contributed immensely to the analysis and completion of the work.

The researcher is grateful to her friends and colleagues Nze Blessing, Mrs. Violet Nwabufor, Mrs. Judith Kanu, Mrs. Rose Okoye, Ike Francis and India Vershima. Finally, the researcher is indebted to her parents, brothers and sisters, namely Joe, John, Ben, Chris, Goddy, Justina, Stella, Anayo (Nwanyi oma) and Ogbobe, for their prayers and financial support throughout the period of her study.

Ani, Elizabeth Ngozika

TABLE OF CONTENTS

Title page
Approval page
Certification
Dedication
Acknowledgements
Table of Contents
List of Tables
Abstract

CHAPTER ONE: INTRODUCTION
Background of the Study
Statement of the Problem
Purpose of the Study
Significance of the Study
Scope of the Study
Research Questions
Research Hypotheses

CHAPTER TWO: LITERATURE REVIEW
Conceptual Framework
Concept of Achievement Test
Qualities of a Test
Item Analysis
Differential Item Functioning (DIF)
Standard Error of Measurement (S.E.M)
Concept of Gender
Analysis of Fit
Schematic Representation of Conceptual Framework
Theoretical Framework
Classical Test Theory
Item Response Theory
Review of Empirical Studies
Summary of Literature Review

CHAPTER THREE: RESEARCH METHOD
Research Design
Area of the Study
Population of the Study
Sample and Sampling Technique
Instrument for Data Collection
Validation of the Instrument
Reliability of the Instrument
Method of Data Collection
Method of Data Analysis

CHAPTER FOUR: RESULTS
Research Question 1
Research Hypothesis I
Research Hypothesis II
Summary of the Findings

CHAPTER FIVE: DISCUSSION OF FINDINGS, CONCLUSION, IMPLICATIONS, RECOMMENDATIONS AND SUMMARY OF THE STUDY
Discussion of Findings
Conclusion
Educational Implications
Recommendations
Limitation of the Study
Suggestions for Further Studies
Summary of the Study

References

APPENDICES
A: Data for Area of Study
B: Population Data
C: Sampling Data
D: Instrument
E: Scoring Guide
F: Table of Specification
G: Reliability Test
H: 3PL Model Analysis of Economics Achievement Test
I: DIF Model Analysis of Economics Achievement Test

LIST OF TABLES

Table 1: Standard errors of measurement of the test items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model

Table 2: Fit statistics of the multiple choice test based on the three-parameter logistic (3PL) model

Table 3: Item threshold values (difficulty estimates) of the items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model

Table 4: Item parameters of the test items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model

Table 5: Guessing parameters of the test items of the multiple choice questions in Economics based on the three-parameter logistic (3PL) model

Table 6: Model for group differential item functioning of the test items of the multiple choice test in Economics

ABSTRACT

The study applied item response theory in the development and validation of a multiple choice test in Economics. An instrumentation research design was used for the study. A sample of 1005 senior secondary school II Economics students was randomly selected from 46 government co-educational schools. To guide the study, six research questions were posed and two hypotheses were formulated. A 50-item Economics multiple choice test developed by the researcher was used for data collection. To ensure its validity, the instrument was subjected to face and content validation by three experts, two from the Department of Science Education and one from the Economics department. A reliability index of 0.89 was obtained. The data generated from the study were analyzed using the maximum likelihood estimation technique of the BILOG-MG computer program. The analysis revealed that all 50 test items survived; the final instrument developed for assessing students’ ability in Economics therefore contained 50 items with the appropriate indices. The results showed that 49 items of the multiple choice test in Economics were reliable based on the three-parameter logistic (3PL) model. The findings also showed that thirty-one (31) items of the multiple choice test in Economics were difficult. The findings further revealed that items function differentially between male and female students in Economics. Based on the findings, recommendations were made, including that examination bodies and teachers should adopt and encourage the use of IRT in developing test items for measuring students’ ability in Economics.

CHAPTER ONE

INTRODUCTION

Background to the Study

Economics is one of the senior secondary school subjects that require assessment to ascertain students’ basic knowledge, skills and understanding of the concepts and the nature of economic problems in any society. Economics has been defined variously by many authorities. These different definitions arise because Economics studies human behavior and people behave differently. Mankiw (2001) defined Economics as the study of how society manages its scarce resources. Egunjobi and Egwakhide (2010) opined that Economics is the study of human endeavors in respect of production, distribution, exchange and consumption. Economics, according to Orji (2002), is the science of scarcity and choice. This implies that when resources are limited in quantity relative to their uses, they are scarce, and scarcity forces the individual to choose among alternatives. In Nigeria, Economics came into the secondary school curriculum in 1966 (Obemeata, 1991). The objectives of studying Economics, according to Asadu (2001), are:

• to enable students to acquire knowledge for the practical solution of the economic problems of Nigerian societies, developing countries and the world at large.

• to prepare and encourage students to be cautious and effective in the management of scarce resources.

• to equip students with the basic principles of economics necessary for useful living.

• to increase students’ respect for the dignity of labour and their appreciation of the economic, cultural and social values of the society.

The objectives discussed tend to suggest that the study of Economics is a form of learning in which the knowledge, skills and habits of a group of people are transferred from one generation to the next through teaching, training or research. Learning is simply described as a change in behavior as a result of experience (Maduewesi, 1999). According to Black and William (2009), learning is tied to effective assessment by monitoring students’ progress and feeding that information back to students. Because learning is unpredictable, assessment is necessary to make adaptive adjustments to instruction, but assessment processes themselves affect the learner’s willingness, desire and capacity to learn (Harlen & Deakin-Crick, 2002). Assessment is the systematic collection, review and use of information about educational programs to improve student learning. In the view of Huba and Freed (2000), assessment is the process of gathering and discussing information from multiple and diverse sources in order to develop a deep understanding of what students know, understand and can do with their knowledge as a result of their educational experiences. This idea can be seen in the Federal Republic of Nigeria (FRN) policy on education concerning continuous assessment, which is supposed to be implemented at all levels of the educational system for both adult and young learners (FRN, 2004). This type of assessment can be effected through the use of achievement tests. Malcolm (2003) viewed an achievement test as an examination designed to assess how much knowledge a person has in a certain area or set of areas. The following are some objectives of achievement tests:

• To measure whether students possess the prerequisite skills needed to succeed in any unit, or whether the students have achieved the objectives of the planned instruction.

• To monitor students’ learning and to provide ongoing feedback to both students and teachers during the teaching-learning process.

• To identify students’ learning difficulties, whether persistent or recurring.

• To assign grades.

These objectives can be achieved with different assessment instruments, such as essay tests and objective tests, which the teacher uses depending on the aims of the measurement. The focus of this study is on objective tests. The objective test is one of the assessment instruments used in testing or assessing students’ academic achievement in any given instruction. In objective tests such as multiple choice questions, respondents are required to select the best possible answer (or answers) from a list of choices (Okoro, 2006). Multiple choice items consist of a stem and a set of options. The stem is the beginning part of the item that presents the problem to be solved, a question asked of the respondent, or an incomplete statement to be completed, as well as any other relevant information. The options are the possible answers that the examinee can choose from, with the correct answer called the key and the incorrect answers called distractors.

Test scores obtained from multiple choice questions are used to assess the competence of the students. Several advantages of multiple choice questions are reported in the literature. Multiple choice test items can be used to measure both the lower and higher levels of the cognitive domain (Onunkwo, 2002). Multiple choice tests, unlike essay tests, allow the teacher to ask a large number of questions that adequately cover the course content (Okoro, 2006). Bush (2001) noted that in multiple choice questions test-takers can increase their probability of guessing the right answer to a question by eliminating unlikely choices. Multiple choice tests are generally much more objective, because they are mostly self-administered and scorers can apply a scoring key which allows them to agree perfectly (Meredith, Joyce & Walter, 2007).

However, all assessment instruments must satisfy the criteria of reliability, validity, objectivity and usability (Anene & Ndubisi, 2003). Reliability is conceived in relation to the extent of consistency or dependability of a measuring instrument (Abonyi, 2011). This implies that if any test in Economics were to be administered an infinite number of times, it would be expected to generate responses that vary a little from trial to trial as a result of measurement error. Therefore, for any measuring instrument, the smaller the error, the greater the reliability, and the greater the error, the smaller the reliability. Individual scores on a test can be viewed as the combined result of the true score and measurement error. The type of measurement error that is used in interpreting individual scores is called the standard error of measurement. The standard error of measurement, according to Onunkwo (2002), is the standard deviation of a series of measurements taken on the same individual. Validity refers to the extent to which an instrument measures what it is designed to measure (Nworgu, 2006). A test with high validity will measure accurately the particular qualities it is supposed to measure. The objectivity of a test refers to whether its scores are undistorted by the biases of the individuals who administer and score it, while the usability of a test is the extent to which it provides the teacher or test administrator with clear instructions that can be put into practice without a great deal of difficulty or confusion. In other words, a test in Economics is usable if it does not force students to waste their time working out how to record their answers. Nevertheless, instrument development in Economics requires more than the determination of the reliability, validity, objectivity and usability of the items. Other indices, such as item difficulty, item discrimination and distractor performance, are required to determine the quality of the instrument.
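The link between reliability and measurement error described above can be illustrated with the usual classical formula, in which the standard error of measurement equals the standard deviation of observed scores multiplied by the square root of one minus the reliability coefficient. The sketch below is illustrative only: the function name and the score standard deviation of 8 marks are assumptions, while 0.89 is the reliability index reported in the abstract.

```python
import math


def standard_error_of_measurement(score_sd: float, reliability: float) -> float:
    """Classical SEM: SD of observed scores times sqrt(1 - reliability)."""
    if not 0.0 <= reliability <= 1.0:
        raise ValueError("reliability must lie between 0 and 1")
    return score_sd * math.sqrt(1.0 - reliability)


# score_sd=8.0 is a hypothetical value chosen for illustration;
# reliability=0.89 is the index reported in the abstract.
sem = standard_error_of_measurement(score_sd=8.0, reliability=0.89)

# A perfectly reliable test would have no measurement error at all.
sem_perfect = standard_error_of_measurement(score_sd=8.0, reliability=1.0)
```

Under these assumed values the error band around an individual score is roughly two to three marks, and it shrinks to zero as reliability approaches one, which matches the statement that smaller error means greater reliability.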

Unfortunately, Economics teachers do not usually determine these qualities for the tests they set. The reason may be that teachers assume their questions do not require these qualities, or that they lack the knowledge needed to set quality tests. This may contribute to students’ failure in West African Examinations Council (WAEC) examinations. However, the procedures for determining these indices, or parameters, of the items of an instrument depend on the measurement theory used. The two distinct measurement theories are Classical Test Theory (CTT) and Item Response Theory (IRT). Classical test theory is based on the true score theory, which views the observed score (X) as a combination of the true score (T) and an error component (E) (Adedoyin, 2010). The observed score of a test-taker is usually seen as an estimate of the true score of the test-taker plus or minus some unobservable measurement error (Crocker & Algina, 2008). An advantage of classical test theory is that it is relatively simple and easy to interpret. CTT does not have a complex theoretical model relating an examinee’s ability to success on a particular item. Instead, CTT considers a pool of examinees collectively and empirically examines their ability to succeed on a particular item. However, CTT can be criticized because item difficulty can vary depending on the sample of test-takers who take the test. It is therefore difficult to compare test-takers’ results across different tests. In addition, Npkone (2001) asserted that the proportion of examinees in a sample who get an item correct changes from a sample whose mean ability is high to one whose mean ability is low.
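The sample dependence of the classical item difficulty index can be seen in a small worked example. In CTT the difficulty of an item is the proportion of examinees who answer it correctly (the p-value). The response data below are hypothetical and serve only to show that the same item yields different indices in samples of different mean ability.

```python
def item_p_value(responses):
    """CTT item difficulty: proportion of correct responses (1 = correct, 0 = wrong)."""
    return sum(responses) / len(responses)


# Hypothetical responses of two samples of ten examinees to the SAME item.
high_ability_sample = [1, 1, 1, 0, 1, 1, 1, 1, 0, 1]
low_ability_sample = [0, 1, 0, 0, 1, 0, 0, 1, 0, 0]

p_high = item_p_value(high_ability_sample)  # the item looks easy in this sample
p_low = item_p_value(low_ability_sample)    # the item looks hard in this sample
```

The item itself has not changed between the two samples; only the examinees have. This is the limitation that motivates the sample-independent item parameters of item response theory.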

However, despite the limitations of CTT, it is still used to describe the estimates of achievement tests in secondary schools. For instance, students’ achievements in Economics are often summarized with statistical measures such as the mean and standard deviation. These statistics change for a test when another sample from the same population of students is used. The estimates or indices obtained therefore depend on the samples drawn from the student population. In other words, there is heavy dependence on the student’s total (aggregate) score in a test, while achievement on individual items is not determined. Therefore, to ensure effective teaching and learning of Economics in schools, an achievement test that focuses on attainment on individual items will have better utility than one that focuses on students’ aggregate scores. An educational measurement scale that has ratio-scale properties, sample-independent attributes and students’ ability reported at both item and total-instrument levels can be developed with the measurement theory called Item Response Theory (IRT), otherwise known as modern test theory.

Item Response Theory (IRT) is, for some researchers, the answer to the limitations of classical test theory (Troy-Gerard, 2004). Item response theory is a modeling technique that tries to describe the relationship between an examinee’s test performance and the latent trait underlying the performance (Henard, 2000). Reeve (2002) describes item response theory as a body of theory describing the application of mathematical models to data from questionnaires and tests as a basis for measuring things such as abilities and attitudes. IRT looks at the examinee’s performance by using item distributions based on the examinee’s probability of success on a latent variable. In IRT, item statistics, also referred to as parameters, are estimated and interpreted. Under IRT, the parameters of the persons are invariant across items, and the parameters of the items are invariant across different populations of persons. IRT brings greater flexibility and provides more sophisticated information, which allows for the improvement of the reliability of an assessment. According to Nenty (2004), invariance is the bedrock of objectivity in physical measurement, and the lack of it raises a lot of questions about the scientific nature of psychological measurement.

Item response theory is a collection of different models showing the relationship between a participant’s responses on an item and the underlying latent trait (Ercikan & Koh, 2005). These models were originally developed for items that are scored dichotomously (correct or incorrect), but the concepts and methods of IRT extend to a wide variety of polytomous models for all types of psychological variables that are measured by rating scales of various kinds (Vander & Hambleton, 1997). IRT models assume that the performance of an examinee can be completely predicted or explained from one or more abilities. IRT models the probability of a correct answer with one of three logistic models. The one-parameter logistic (1PL) model addresses the probability of a correct answer by allowing each question, for instance each question on an achievement test, to have its own difficulty parameter. The two-parameter logistic (2PL) model additionally models each item’s level of discrimination between high- and low-ability students, while the three-parameter logistic (3PL) model adds a third item parameter, called the pseudo-guessing parameter, which reflects the probability that an examinee with a very low trait level will correctly answer an item solely by guessing. This implies that students can correctly answer an item in an achievement test by guessing.
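The three logistic models described above can be written as one function in which the 1PL and 2PL models are special cases of the 3PL model. The sketch below uses the standard 3PL form, P(θ) = c + (1 − c) / (1 + exp(−D·a·(θ − b))), where θ is ability, b is difficulty, a is discrimination and c is the pseudo-guessing parameter. The function name, the default values and the optional scaling constant D are assumptions made for illustration, and the code is not the estimation routine of BILOG-MG.

```python
import math


def irt_probability(theta, b, a=1.0, c=0.0, D=1.0):
    """Probability of a correct response under the 1PL, 2PL or 3PL model.

    theta: examinee ability; b: item difficulty;
    a: item discrimination (a=1, c=0 gives the 1PL model);
    c: pseudo-guessing parameter (c=0 gives the 2PL model);
    D: optional scaling constant (1.7 is sometimes used; 1.0 assumed here).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# 1PL: ability equal to difficulty gives an even chance of success.
p_1pl = irt_probability(theta=0.0, b=0.0)

# 2PL: a more discriminating item separates abilities more sharply.
p_2pl = irt_probability(theta=1.0, b=0.0, a=2.0)

# 3PL: with c = 0.25 (e.g. four options), even a very weak examinee
# retains about a 25% chance of answering correctly by guessing.
p_3pl_average = irt_probability(theta=0.0, b=0.0, a=1.0, c=0.25)
p_3pl_very_low = irt_probability(theta=-10.0, b=0.0, a=1.0, c=0.25)
```

With c set to 0.25, the probability for an examinee of very low ability stays close to 0.25 instead of falling to zero, which is the sense in which the third parameter captures success by guessing.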

Obinne (2012) observed that guessing is giving an answer or making a judgment about something without being sure of all the facts. The three-parameter (guessing) model gives the probability that an individual with a given ability responds correctly to an item with a given difficulty index, discrimination index and guessing index. The model assumes that the three parameters (difficulty, discrimination and guessing) are necessary for a valid estimate of the relationship between the probability of a correct response to an item and the trait level (ability) of an individual. Within the latent trait test model, the internal validity of a test is assessed in terms of the statistical fit of each item to the model. Fit to the model also implies that item discriminations are uniform and substantial, and that there are no errors in item scoring. It also indicates that guessing has had a negligible effect on test scores. IRT models are extremely helpful for assessment instruments such as an Economics achievement test when trying to understand students’ abilities by examining their test performance.

An Economics achievement test should be fair to all examinees. A test instrument is said to be fair when two groups of equal ability with respect to the construct measured by the test earn the same score on each item of the test. Comparing the results of subgroups indicates which items are functioning differently for different groups of students. If the test is not fair, or yields different scores for subgroups of equal ability (for instance, by gender), it is said to suffer from Differential Item Functioning (DIF). Differential item functioning is a collection of statistical methods that indicate which items are functioning differently for different groups of students (Madu, 2012). This implies that differential item functioning would occur in an Economics achievement test if the Item Response Function (IRF) for an item is different for two groups. In the view of Meredith, Joyce and Walter (2007), differential item functioning means that individuals of equal ability but from different subgroups (e.g., males and females) do not have the same probability of earning the same score. Gender is a broad analytic concept which highlights women’s roles and responsibilities in relation to those of men. Gender relates to the difference in sex (that is, either male or female) and how this quality affects people’s dispositions and perceptions toward life and academic activities (Okoh, 2007). Hence, instruments developed for measuring achievement in Economics may suffer from differential item functioning if they lack the basic qualities that a test instrument should have. Moreover, even when they have some of these qualities, they are based on the CTT framework, in which a large p-value difference and an item-by-group interaction may label an item as biased when in fact no bias exists. The concern of this study is therefore the type of measurement theory that supports item-level rather than aggregate-level analysis of performance on an Economics achievement test.
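The idea that an item shows differential item functioning when its item response function differs between two groups can be sketched as follows. The example evaluates a 3PL item response function separately for two groups and reports the largest gap in the probability of a correct response across a range of abilities. The item parameters, the group labels and the helper names are hypothetical, and this simple comparison of curves is an illustration of the concept, not the DIF procedure implemented in BILOG-MG.

```python
import math


def three_pl(theta, a, b, c):
    """3PL item response function: probability of a correct response."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def max_irf_gap(params_ref, params_focal, lo=-3.0, hi=3.0, steps=61):
    """Largest absolute difference between two IRFs over an ability grid.

    Each params tuple is (a, b, c), estimated separately for one group.
    """
    gap = 0.0
    for i in range(steps):
        theta = lo + (hi - lo) * i / (steps - 1)
        diff = abs(three_pl(theta, *params_ref) - three_pl(theta, *params_focal))
        gap = max(gap, diff)
    return gap


# Hypothetical (a, b, c) estimates for one item in two groups.
male_params = (1.2, 0.0, 0.2)
female_params = (1.2, 0.5, 0.2)  # same item appears harder for this group

gap_with_dif = max_irf_gap(male_params, female_params)
gap_without_dif = max_irf_gap(male_params, male_params)
```

When the two groups share the same item parameters the gap is zero, meaning examinees of equal ability have equal chances of success. A non-zero gap means that examinees of the same ability but different gender have different probabilities of answering the item correctly, which is the definition of DIF given above.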

Statement of the Problem

The Federal Republic of Nigeria (FRN) Policy on Education (2004) places strong emphasis on continuous assessment, which is required at all levels of education. Under this policy, teachers assess the knowledge, skills and abilities of students in Economics at senior secondary school. Every assessment is expected to treat test-takers equally, but the instruments that teachers develop through classical test theory hardly accomplish this purpose. This is because such instruments are group dependent, and item statistics such as item difficulty and item discrimination are also group dependent.

Based on these limitations of instruments developed under classical test theory, the researcher designed this study using a modern measurement theory to ensure objectivity in the measurement of students’ scores when analyzing Economics multiple choice test items. Therefore, the question addressed is: would item response theory influence the development and validation of a multiple choice test in Economics?

Purpose of the Study

The main purpose of this study was to apply item response theory in the development and validation of a multiple choice test in Economics. Specifically, the study determined the:

1. Standard errors of measurement of the test items of the multiple choice test in Economics.

2. Fit of the items of the Economics multiple choice test using the three-parameter logistic (3PL) model.

3. Difficulty parameters of the test items of the multiple choice test in Economics.

4. Discrimination parameters of the test items of the multiple choice test in Economics.

5. Guessing parameters of the test items of the multiple choice test in Economics.

6. Differential item functioning of the test items of the multiple choice test in Economics with respect to gender.

Significance of the Study

The results of this study have both theoretical and practical significance. Theoretically, item response theory, a paradigm for the design, analysis and scoring of tests, questionnaires and similar instruments measuring abilities, attitudes or other variables, was used to show the relationship between a student’s test performance and the latent trait underlying that performance. The theory also provides a better view of the information each question provides about a student.

Practically, this study is expected to benefit teachers, curriculum planners, students and guidance counsellors.

This study should help teachers to understand the steps involved in test development. This should enable teachers to set quality questions in school that have similar qualities to external examination questions. It may also give teachers the insight that students’ performance in external examinations depends on the quality of the questions or assessments they set in school. Teachers should find this study useful because it helps to ensure the fullest possible reporting of examinees’ achievement by providing ideas for the meaningful interpretation of examinees’ results through the person-by-item encounter (latent trait model) during an examination. The study would report examinees’ achievement by classifying them into ability levels on each of the items, based on item response theory (IRT) using the item response function (IRF). Economics teachers can use the instrument to predict the probability of examinees correctly answering any given item if the examinees’ ability levels are known.

For curriculum planners, this study provides a basis for reforming curricular goals and objectives. Its usefulness lies in providing empirical data that enable them to plan a functional curriculum, taking into consideration the development and validation of achievement tests in subjects such as Economics. This should encourage and guide teachers to develop and set quality questions in school.

For students, the study would shed light on the interpretation of their performance in Economics when assessed using the developed instrument. It should enable them to understand the relationship between their performance on each question they answered and the underlying latent trait.

For guidance counsellors, the findings of this study would help them to understand the performance of students on each question as reported by the teachers. This should enable them to determine the strengths and weaknesses of each student, and so help them to advise students from time to time on the factors that affect their performance, their academic life and their choice of career.

Scope of the Study

The application of item response theory in the development and validation of a multiple choice test in Economics was limited to SS2 Economics students in senior secondary schools in Nsukka Education Zone of Enugu State. The SS2 students were chosen because the topics used in the instrument of this study are contained in the SS2 scheme of work. The content scope includes demand and supply, financial institutions, public finance, labour force, alternative economic systems, theory of cost and inflation. These topics were selected from the SS2 Economics syllabus because students often find them difficult to understand during classroom teaching and learning.

Research Questions

The following research questions were posed to guide this study.

1. What are the standard errors of measurement of the test items of the multiple

choice test in Economics?

2. How do the items of the Economics multiple choice test fit the three-parameter

logistic (3PL) model?

3. What are the difficulty parameters of the test items of the multiple choice test

in Economics?

4. What are the discrimination parameters of the test items of the multiple choice test in Economics?

5. What are the guessing parameters of the test items of the multiple choice test in Economics?

6. What is the differential item functioning of the test items of the multiple choice test in Economics with respect to gender?

Research Hypotheses

The following null hypotheses (H0) were formulated and tested at the .05 level of significance.

1. H01: There is no significant fit of the items of the Economics multiple choice test to the three-parameter logistic (3PL) model.

2. H02: The test items of the multiple choice test in Economics do not function differentially between male and female SS2 Economics students.

CHAPTER TWO

LITERATURE REVIEW

In this chapter, the researcher presents a review of literature related to the present study. The review is organized under the following headings: conceptual framework, theoretical framework, empirical studies and summary of literature review.

Conceptual Framework

• Concept of Achievement Test

• Procedures for Development of a Test

• Qualities of a Test

• Item Analysis

• Differential Item Functioning (DIF)

• Standard Errors of Measurement

• Concept of Gender

• Analysis of Fit

Theoretical Framework

• Classical Test Theory

• Item Response Theory

Empirical Studies

• Studies on Development and Validation of Instrument

• Studies on Item Response Theory

Summary of Literature Review

Conceptual Framework

Concept of Achievement Test

An achievement test is an examination designed to assess how much

knowledge a person has in a certain area or set of areas as a result of teaching. Ali

(2006) viewed an achievement test as an instrument administered to an individual or

group as a stimulus to elicit certain desired or expected responses which represent

his/her ability. Every measuring instrument, such as a test, is expected to possess certain

qualities so that whatever information is obtained with it can be acceptable (Ezeh &

Onah, 2005). Any test, and indeed any evaluation instrument, must satisfy the criteria

of reliability, validity as well as objectivity (Anene & Ndubisi, 2003). Achievement

tests may be classified into teacher-made tests and standardized tests. Teacher-made tests

are teachers’ own tests (Onunkwo, 2002). They are tests constructed by individual

teachers in their schools for assessing their students/pupils. Ifeakor (2011) opined that

a standardized test is one that has norms. Norms are a set of descriptive data which

make it possible to determine the standing of a candidate in relation to a specified

reference group. Standardized tests provide a uniform set of questions, instructions

and method of administration. Tests for measuring the achievement of objectives in

the cognitive domain fall mainly into two categories: the essay test and the objective test.

Onunkwo (2002) defined the essay test as a test in which students are required to provide

answers to questions and which offers students the opportunity to organize and express their

ideas in writing. The objective test, such as the multiple choice test which is the focus of

this study, can assume two forms: the first is a direct question which the

testees are requested to answer, while the second is an incomplete question

posed to the testees for completion (Onunkwo, 2002). Whatever form it takes, any

multiple choice item has two parts, namely the stem and the alternatives (i.e., the

answer options). The stem is the direct question or incomplete question, while the

alternatives are the options from which the testees are instructed to pick the one

which is most correct.

Procedures for Development of a Test

In the development of a test, a number of steps are involved. These are:

Content Analysis: A test developer should have a clear outline of the subject matter or content

of the subject on which the test is being developed. Content analysis means that the

test developer should look at the relevant subject content on which the test is to be based

and find out what the content is all about (Anene & Ndubisi, 2003).

Review of Instructional Objectives: The second step in the development of a test is the

review of instructional objectives. According to Anene and Ndubisi (2003),

instructional objectives are those behavioural changes which a teacher expects to

notice in his students after they have been exposed to a particular topic. Therefore, a

test developer must be sure of the instructional objectives because these are the traits

he should be testing for in the testee.

Development of Test Blue Print: A test blue print, or table of specification, is a plan or guide for test

preparation (Okolo, 2006). It specifies the number of questions to be asked

on each topic or course unit, and the number of questions on recall of facts,

comprehension, application, etc.

A Sample of a Test Blue-print for a 50 Item Achievement Test.

Following the above example, the number of questions for each cell is worked

out, and this serves as the guide for constructing the test items.
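As a rough illustration of how the cell counts can be worked out, the sketch below multiplies the total number of items by the topic weight and the cognitive-level weight of each cell. The weights are those of the sample blue print for a 50 item test; the function name `blueprint_cells` is only illustrative. The products are left unrounded, because the test developer still has to round and adjust them by judgement so that rows and columns add up to whole numbers.

```python
def blueprint_cells(n_items, topic_weights, level_weights):
    """Return the unrounded number of items for every topic-by-level cell."""
    return {
        topic: {level: n_items * t_w * l_w for level, l_w in level_weights.items()}
        for topic, t_w in topic_weights.items()
    }


# Weights taken from the sample blue print (proportions of the whole test).
topic_weights = {"Topic A": 0.30, "Topic B": 0.10, "Topic C": 0.25,
                 "Topic D": 0.20, "Topic E": 0.15}
level_weights = {"Knowledge": 0.40, "Comprehension": 0.25, "Application": 0.20,
                 "Analysis": 0.05, "Synthesis": 0.05, "Evaluation": 0.05}

cells = blueprint_cells(50, topic_weights, level_weights)
grand_total = sum(sum(row.values()) for row in cells.values())

for topic, row in cells.items():
    # Print each row rounded to two decimals for inspection.
    print(topic, {level: round(value, 2) for level, value in row.items()})
print("Total items:", round(grand_total, 2))
```

Small cells (for example 0.75 of an item) cannot be written as fractions of a question, which is why the whole-number entries of a finished blue print may differ slightly from these raw products.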

Item Writing: This involves the writing of the items of the test as guided by the test

blue print (Harbor-Peters, 1999).

Face Validation: This deals with what a scale appears to measure based on the various

items. Face validity, according to Polit and Hungler (2002), is the process of sending

scale items to experts in the field of the subject matter for criticism.

Item Review: Item review, in the view of Anene and Ndubisi (2003), involves looking

closely at the individual test items that have been written and choosing those that are

most appropriate, so that at the end those that survive the scrutiny are used

in the trial testing.

Trial Testing: This involves administering the validated test to a large representative

sample of the students for whom the test was designed (Anastasi & Urbina, 2002).

Content          Knowledge  Comprehension  Application  Analysis  Synthesis  Evaluation  Total
                 40%        25%            20%          5%        5%         5%          100%

Topic A (30%)    6          4              3            1         0          1           15

Topic B (10%)    2          1              1            0         0          0           4

Topic C (25%)    5          3              3            1         0          1           13

Topic D (20%)    4          3              2            0         1          1           11

Topic E (15%)    3          2              2            0         0          0           7

Total (100%)     20         13             11           2         1          3           50

Item Analysis: In test construction, item analysis is the last step the researcher

takes into consideration. Anene and Ndubisi (2003) asserted that item analysis

involves the analysis of responses to individual items that are in the test. The items are

subjected to statistical analysis, so that those that pass the analysis are selected for the

final form of the test while those that fail are either discarded, or modified and tried

out again. All these procedures are used in classical test theory and can also be used

in item response theory.

Qualities of a Test

Measuring instruments used in psychology and education include tests, rating scales,

checklists, questionnaires and inventories. These instruments must possess certain

desirable qualities in order to be used as vital tools in psychological and educational

decisions. These qualities are validity, reliability, objectivity, and usability.

Concept of Validity

Validity centers on whether the instrument measures what it is intended to

measure. Ezeh (2003) stated that validity of a test refers to the extent to which a test

measures what it is supposed to measure and nothing else. Therefore, the validity of a

test depends on the purpose for which the test was developed. This means that a test,

which is valid for assessing achievement in S.S. II Economics, may not be valid for

assessing achievement in S.S. III Economics.

Types of Validity

There are four types of validity, namely content validity, criterion-related

validity, face validity and construct validity.

Content Validity

Content validity refers to the extent to which the test measures both the subject

matter content and the instructional objectives designed for a given course (Ezeh,

2003). It is the most appropriate form of validity for an achievement test. A test blue

print or table of specification is used to ensure a systematic coverage of the entire

course content and instructional objectives.

Face Validity: This refers to the appropriateness of the test in relation to the course on

which the test is based (Anikweze, 2010). A test has face validity when it appears valid to

examinees who take it, personnel who administer it and even to other untrained

observers.

Criterion Validity

This type of validity indicates the extent to which students who have been

taught based on the objectives being measured score higher on the test of those

objectives than students who have not been taught (Anikweze, 2010). If a test of

proficiency has criterion validity, then students should score lower on it when it is used as

a pretest than when it is used as a posttest. Criterion validity is obtained by correlating the

two sets of scores from two testings. Examples of criterion-related validity include

concurrent and predictive validity.

Concurrent Validity

Concurrent validity deals with how present performance could be used to

estimate some other current measure of performance. For instance, the West African

Senior School Certificate Examination results could be used to predict performance in

the University Matriculation Examination. In the view of Martyn (2009), concurrent

validity measures the test against a benchmark test and high correlation indicates that

the test has strong criterion validity. For instance, if the scores from a test already

known to be valid are highly correlated with the scores from a selection test administered to the

same group of learners, then concurrent validity is obtained for the selection test.

Predictive Validity

Onunkwo (2002) opined that predictive validity is the most relevant for

intelligence tests, aptitude tests, interest and attitude tests. All tests used in selection

of candidates (say into education, business, industry, armed forces, etc.) or in predicting

future performance/achievement must demonstrate high predictive validity.

Construct Validity

Ifeakor (2011) viewed constructs as those educational and psychological

traits that cannot be seen with the eyes; their existence can only be inferred from

manifested characteristics or behavior ascribed to them. These traits can be attitude,

creativity, intelligence, speed of reading, ability, interest, aptitude, etc. If a test is able

to measure such psychological traits, then the test has construct validity.

Concept of Reliability

Reliability of a test relates to the degree of consistency or stability which the

test exhibits. According to Eze and Onah (2005), reliability can be seen as the degree

of consistency of two or more measures of the same thing. According to Eboh (2009),

reliability refers to the degree to which a given measurement procedure will give the

same description of a phenomenon if that measurement is repeated. It therefore

concerns whether a particular technique will always yield the same result if repeatedly

applied to the same object. For instance, if Ngozi and Ifeoma each obtained scores of

70% in a given test and three days later, when the same test was re-administered to the

same class, their scores were 60% and 45% respectively, then the test is said to be

unreliable because the scores are inconsistent.

Method of Measuring Reliability

The degree of consistency of a test is expressed as a coefficient called the

coefficient of reliability. This, in most cases, is determined by correlating two sets of

scores independently obtained from the test. The reliability coefficient has been

defined as a description of the loss in efficiency of estimation resulting from

measurement error (Ferguson, 2011). It is therefore interpreted directly as the

proportion of true variance. For instance, if the reliability coefficient obtained for a

test is 0.90, then 90% of the variance in the test scores is due to true variance while the

remaining 10% is attributable to chance factors or error variance.

There are four types of reliability.

Stability

This is the correlation between two successive measurements with the same

test. Stability is the ability of the same test to give the same result whenever it is

administered to the same subjects within a given time interval (Harbor-Peters,

1999). This measure of stability, often called a test-retest estimate of reliability, is

obtained by administering a test to a group of individuals, re-administering the same

test to the same individuals at a later date, and correlating the two sets of scores using

Pearson’s r or Spearman’s rank correlation.
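The test-retest procedure described above can be sketched in a few lines. The example computes Pearson's r between two administrations of the same test; the score lists are hypothetical and the function name `pearson_r` is only illustrative.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products and sums of squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical scores of six students on two administrations of the same test.
first_administration = [70, 55, 62, 48, 80, 66]
second_administration = [68, 57, 60, 50, 78, 69]

stability = pearson_r(first_administration, second_administration)
print("Test-retest (stability) coefficient:", round(stability, 3))
```

A coefficient close to 1 indicates that students kept roughly the same relative standing on both occasions.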

Equivalent Forms Reliability

Equivalent forms reliability is estimated by the successive administration of two parallel

forms of the same test. In other words, it is also referred to as the parallel or alternate

form reliability method (Onunkwo, 2006). The two equivalent forms of a particular

test are administered to the same group of students. The students are given

one form of a test (say, Form A) on the first occasion and a comparable

form of that test (say, Form B) on the second occasion. Their scores in the two forms

(i.e., A and B) are then correlated with Pearson’s r. The coefficient so computed

represents the equivalent-form reliability of the instrument.

Internal Consistency Reliability

Meredith, Joyce and Walter (2007) stressed that internal consistency is an

approach to estimating test score reliability that involves examination of the individual

items of the test.
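One widely used internal consistency estimate for dichotomously scored (right/wrong) items is the Kuder-Richardson formula 20 (KR-20). The text does not name a specific formula, so KR-20 is an assumption made here for illustration. The response matrix is hypothetical, and the population variance of total scores is used, which is also an assumption.

```python
from statistics import pvariance


def kr20(responses):
    """KR-20 for a list of examinee rows, each a list of 0/1 item scores."""
    n_examinees = len(responses)
    n_items = len(responses[0])
    totals = [sum(row) for row in responses]
    total_variance = pvariance(totals)  # population variance of total scores
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in responses) / n_examinees  # proportion correct
        sum_pq += p * (1.0 - p)                              # item variance p*q
    return (n_items / (n_items - 1)) * (1.0 - sum_pq / total_variance)


# Hypothetical responses: 6 examinees by 4 items (1 = correct, 0 = wrong).
responses = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

reliability = kr20(responses)
print("KR-20:", round(reliability, 3))
```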

Objectivity of a Test

The objectivity of a test refers to the degree to which equally competent scorers

obtain the same results. In Economics objective testing, as well as in the use of

various observational procedures, the results depend to a large extent upon the person

doing the scoring. Different persons get different results, and even the same person

may get different results at different times. Such inconsistency in scoring has an

adverse effect on the reliability of the measures obtained, for the test scores now

reflect the opinions and biases of the scorer as well as the differences among pupils in

the characteristic being measured.

Usability of a Test

Usability of a measuring instrument refers to the practicability of the

instrument. Harbor-Peters (1999) noted that it is the ability of a test to serve the

educational purpose it is designed to serve. The usability of a test has implications for the

decisions taken on the test result. For instance, if a test is developed and validated for

use in schools, the cost of purchasing and administering such a test should be

affordable by the school. But where a valid test is developed and schools cannot

afford to purchase it or cannot use it within the limits of the school time frame, such a test is

not usable. These qualities of a test instrument also apply in item response

theory.

Item Analysis

Item analysis has to do with the assessment of the adequacy of each of the

items that make up the test/instrument. During item analysis each of the items is

assessed in terms of its difficulty, discrimination, and distractor index (Abonyi, 2011).

Denga (2003) opined that item analysis is a process of assessing students’ responses to

each item in order to judge the quality or worth of the test. Item analysis is focused

upon answering two basic questions:

• How difficult is each item for the students?

• To what extent did each item discriminate between good and poor students?

To answer these questions, it is necessary to compute two statistical indices for each

item: the difficulty index and the discrimination index.
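The two indices can be sketched as follows. The difficulty index is taken as the proportion of examinees answering the item correctly, and the discrimination index as the difference between the proportions correct in the upper and lower scoring groups. The use of upper and lower groups of about 27% is a common classical convention assumed here, not a detail stated in the text, and the response matrix is hypothetical.

```python
def item_indices(responses, fraction=0.27):
    """Return a (difficulty, discrimination) pair for every item.

    responses: list of examinee rows, each a list of 0/1 item scores.
    fraction:  share of examinees placed in each of the upper and lower groups.
    """
    n_examinees = len(responses)
    n_items = len(responses[0])
    # Rank examinees by total score, highest first.
    ranked = sorted(responses, key=sum, reverse=True)
    group_size = max(1, round(n_examinees * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    indices = []
    for j in range(n_items):
        difficulty = sum(row[j] for row in responses) / n_examinees
        discrimination = (sum(row[j] for row in upper)
                          - sum(row[j] for row in lower)) / group_size
        indices.append((difficulty, discrimination))
    return indices


# Hypothetical responses: 6 examinees by 4 items (1 = correct, 0 = wrong).
answers = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 0],
]

indices = item_indices(answers)
for number, (p, d) in enumerate(indices, start=1):
    print(f"Item {number}: difficulty = {p:.2f}, discrimination = {d:.2f}")
```

Under this convention a higher difficulty index means an easier item, and a discrimination index near zero or below zero flags an item that fails to separate stronger from weaker students.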

Differential Item Functioning (DIF)

This refers to differences in the functioning of items across groups, oftentimes

demographic, which are matched on the latent trait or more generally the attribute

being measured by the items or test (Osterlind & Everson, 2009). It is important to note

that when examining items for DIF, the groups must be matched on the measured

attribute, otherwise this may result in inaccurate detection of DIF.
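The matching requirement can be illustrated with the Mantel-Haenszel common odds ratio, one widely used DIF statistic. It is chosen here only as an example and is not necessarily the procedure used in this study. Examinees are grouped into strata by a matching score, and the odds of a correct response in the reference and focal groups are compared within each stratum. All counts below are hypothetical.

```python
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio across matched strata.

    records: iterable of (group, matching_score, item_correct) tuples, where
    group is "ref" or "focal" and item_correct is 1 or 0.
    """
    # For every matching score keep [correct, incorrect] counts per group.
    strata = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        strata[score][group][0 if correct else 1] += 1
    numerator = denominator = 0.0
    for table in strata.values():
        a, b = table["ref"]    # reference group: correct, incorrect
        c, d = table["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    return numerator / denominator


def expand(group, score, n_correct, n_incorrect):
    """Helper that turns counts into individual records."""
    return [(group, score, 1)] * n_correct + [(group, score, 0)] * n_incorrect


# Hypothetical data: within each score level both groups perform alike.
records = (
    expand("ref", 1, 2, 2) + expand("focal", 1, 2, 2)
    + expand("ref", 2, 3, 1) + expand("focal", 2, 3, 1)
)

odds_ratio = mantel_haenszel_odds_ratio(records)
print("Mantel-Haenszel odds ratio:", odds_ratio)
```

An odds ratio near 1 suggests that, among examinees of matched ability, the item functions alike in both groups; values far from 1 would suggest possible DIF.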

Standard Error of Measurement (S.E.M)

Any time a student takes a test, there is a possibility that the raw score

(observed score) obtained may be less or more than the score the student should have

received (true score). The difference between the observed score and the true score is

called the error score: student observed score = student true score + student error

score. According to Chatterji (2003), standard error of measurement is a statistical

estimate of the amount of random error in the assessment of results or scores.

Meredith, Joyce and Walter (2007) indicated that standard error of measurement allows

you to determine the probable range within which the individual’s true score falls. The

standard error of measurement helps us to understand that the scores obtained in one

educational measurement are only estimates and may be considerably different from

individuals’ presumed true scores.
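In classical test theory the standard error of measurement is commonly computed from the standard deviation of the test scores and the reliability coefficient as SEM = SD × sqrt(1 - reliability). The sketch below applies this formula and builds a band of one SEM on either side of an observed score. The numerical values and function names are hypothetical.

```python
from math import sqrt


def standard_error_of_measurement(sd, reliability):
    """Classical SEM from the score standard deviation and the reliability."""
    return sd * sqrt(1.0 - reliability)


def score_band(observed, sem, z=1.0):
    """Range of z standard errors on either side of the observed score."""
    return observed - z * sem, observed + z * sem


# Hypothetical test: standard deviation of 10 marks and reliability of 0.91.
sem = standard_error_of_measurement(sd=10.0, reliability=0.91)
low, high = score_band(observed=60.0, sem=sem)
print(f"SEM = {sem:.2f}; probable range of the true score: {low:.1f} to {high:.1f}")
```

The more reliable the test, the smaller the SEM and the narrower the band around each observed score.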

Concept of Gender

Gender is the range of physical, mental, and behavioral characteristics

pertaining to, and differentiating between, masculinity and femininity. According to

Lee (2001), gender is an ascribed attribute that differentiates feminine from masculine.

The difference in academic achievement due to gender differences is crucial to

educationists. The World Health Organization (2002) defines gender as the result of

socially constructed ideas about the behavior, actions, and role a particular sex

performs. Okeke (2006) described gender as socially or culturally constructed

characteristics, qualities, behaviours and roles which different societies ascribe to

females and males.

Analysis of Fit

The analysis of statistical fit is a check on internal validity (Obinne, 2013).

Within the latent trait test model, the internal validity of a test is assessed in terms of

the statistical fit of each item to the model. According to Korashy (1995), if the fit

statistic of an item is acceptable, then the item is valid. IRT has three models:

one-parameter, two-parameter and three-parameter models. If a given set of items fits

the model, this is evidence that the items refer to a unidimensional ability. Fit to the

model also implies that item discriminations are uniform and substantial and that there

are no errors in item scoring. However, a large positive fit statistic indicates misfit,

while a low statistic nearer to one indicates better fit.

Schematic Representation of Conceptual Framework

[Figure 1: Schema. Labels recoverable from the diagram: Steps in Test Development; Content Analysis; Development of test blue print; Item Writing; Item Review]

The schema above indicates that there are steps involved in the development of

an instrument. This could be influenced by the following key variables: development

of Economics multiple-choice questions, qualities of a test and validation of

Economics multiple-choice questions. The above framework describes how the

researcher develops and validates an instrument made up of Economics multiple-choice

questions.

Theoretical Framework

The application of item response theory in the development and validation of

multiple choice questions in Economics is best described by two theories. The two

theories discussed in this study are classical test theory and item response theory.

An Overview of Classical Test Theory

Classical test theory (CTT) is also known as the True Score Model (TSM). The

basic idea behind the theory is that the observed score (X) is made up of two components,

the true score (T) and the error score (E) (Anikweze, 2010). CTT is concerned with the

relationship between these three variables X, T, and E. This relationship is used to

discuss the quality of the scores. The true score reflects the exact value of the

respondent’s ability or attitude. Mathematically, it is written as

X = T + E

Where X = observed score

T = true score

E = error.

Mehrens and Lehmann (1978) emphasized that classical test theory describes

how errors of measurement can influence the observed scores. Onunkwo (2002)

indicated that the observed score (X) is the simple sum of a true score (T) and an

error score (E), where the error score reflects the effect of extraneous influences on the

measurement process at the time of measurement. For instance, a child’s mood at the time of

measurement may increase or decrease his test performance at that particular

measurement. The assumption of classical test theory is, however, difficult to realize in

practical situations because, aside from random errors, what the testing instrument

measures for each testee always differs from the person’s true ability or

characteristics.

An Overview of Item Response Theory

Item Response Theory (IRT) is commonly used to create a response curve

(probability of a student with a particular ability to answer the question correctly) for

each item and/or to create a scaled score for the whole test based on what is known

about each item (Windy & Carl, 2010). Item response theory, according to Osterlind

(2012), is an approach to modern educational and psychological measurement that

posits a particular notion about cognition and sets forth sophisticated statistics to

appraise cognitive processes. Its objective is to reliably calibrate individuals and test

stimuli (i.e., items and exercises) on a common scale that is interpreted to show the

individuals' ability or proficiency and specified characteristics of the test stimuli. IRT

is applicable to many practical testing problems, such as generalizability of test

results, various item analyses, examining test bias and differential item functioning,

equating test forms, estimating construct parameters, domain scoring, and adaptive

testing. Nering and Ostini (2010) see item response theory, also known as latent trait

theory, strong true score theory, or modern mental test theory, as a paradigm for the

design, analysis, and scoring of tests, questionnaires and similar instruments measuring

abilities, attitudes, or other variables. Unlike simpler alternatives for creating scales

and evaluating questionnaire responses, it does not assume that each item is equally

difficult. Palmieri (2012) explained that it is a model-based version of test theory that

uses a mathematical function to describe the relationship between a person’s standing

on a latent trait and his/her item responses. When an appropriate model is selected, the

likelihood that a person will respond to an item in the keyed direction is modeled as a function of

the person’s standing on the underlying construct (the performance level on the trait being

measured) and the characteristics of the item, such as its difficulty and

discrimination. Item response theory is also

a mathematical model that describes how people interact with test items (Embretson &

Reise, 2000). In IRT persons and items are located on the same continuum. Most IRT

models assume that the latent variable is represented by a unidimensional continuum.

In addition, for an item to have any utility it must be able to differentiate among

persons located at different points along a continuum. An item’s capacity to

differentiate among persons reduces our uncertainty about their locations. This

capacity to differentiate among people with different locations may be held constant

or allowed to vary across an instrument’s items. Therefore, individuals are

characterized in terms of their locations on the latent variable and, at a minimum,

items are characterized with respect to their locations and capacity to discriminate

among persons.

Assumptions of item response theory

• Unidimensionality of the Test

• Local Independence

• Item characteristics curve.

Unidimensionality

The IRT model is based on the assumption that the items are measuring a single

continuous latent variable θ ranging from -∞ to +∞ (Reeve, 2000). This implies that

the performance of each examinee is assumed to be governed by a single factor,

referred to as ability (though it should be noted that ability is a generic convention

used in measurement, referring to the construct and does not imply innate cognitive

potential). The assumption of unidimensionality means that a set of items and/or a test

measure(s) only one latent trait (θ), and local independence refers to the assumption

that there is no statistical relationship between examinees’ responses to the pairs of

items in a test, once the primary trait measured by the test is removed (Kyung, 2013).

Local Independence

Item responses are independent of one another given ability: once a

person’s ability level is known, that person’s responses to items are independent of one another.

This is one of the hallmark assumptions in IRT, and it makes many things possible (it

will also be important for estimating examinee trait levels). Conditional independence

provides us with statistically independent probabilities for item responses. In the words of Reeve

(2000), the assumption of local independence asserts that responses to an item are

independent of responses to another item after controlling for the underlying variable

measured by the scale. This concept is related to that of unidimensionality: if one trait

determines success on each item, then examinee ability is the only thing that

systematically affects item performance. Local independence means that if the trait

level is held constant, there should be no association among the item responses.
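A practical consequence of local independence is that, for a fixed ability level, the probability of a whole response pattern is simply the product of the individual item probabilities. The sketch below assumes a two-parameter logistic item response function for illustration; the item parameters and function names are hypothetical.

```python
from itertools import product
from math import exp, prod


def p_correct(theta, a, b):
    """Probability of a correct response (two-parameter logistic form, assumed)."""
    return 1.0 / (1.0 + exp(-a * (theta - b)))


def pattern_probability(theta, items, responses):
    """Probability of a 0/1 response pattern at ability theta.

    Under local independence this is the product over items of P for a
    correct response and (1 - P) for an incorrect one.
    """
    return prod(
        p_correct(theta, a, b) if u == 1 else 1.0 - p_correct(theta, a, b)
        for (a, b), u in zip(items, responses)
    )


# Hypothetical items given as (discrimination, difficulty) pairs.
items = [(1.0, 0.0), (1.0, 0.0), (1.5, 0.8)]

# Two items with difficulty equal to theta: each has P = 0.5, so P(1, 0) = 0.25.
two_item_probability = pattern_probability(0.0, items[:2], [1, 0])

# Probabilities of all possible patterns on the three items add up to 1.
all_patterns_total = sum(
    pattern_probability(0.0, items, pattern) for pattern in product([0, 1], repeat=3)
)
print(two_item_probability, round(all_patterns_total, 6))
```

This product form is what makes it possible to write the likelihood of an examinee's responses and, from it, to estimate the examinee's trait level.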

Item Characteristic Curve

The Item Characteristic Curve (ICC) or Item Characteristic Function (ICF) is a

mathematical function that relates the probability of success on an item to the ability

measured by the item set or test that contains it. It is a basic building block of item

response theory; all the other constructs of the theory depend upon this curve (Baker,

2001). The item characteristics curve gives a clear distinction among different latent

trait models. There are two technical properties of an item characteristic curve that are

used to describe it. The first is the difficulty of the item. Under item response theory,

the difficulty of an item describes where the item functions along the ability scale. For

example, an easy item functions among the low-ability examinees and a hard item

functions among the high-ability examinees; thus, difficulty is a location index.

The second technical property is discrimination, which describes how well an

item can differentiate between examinees having abilities below the item location and

those having abilities above the item location. This property essentially reflects the

steepness of the item characteristic curve in its middle section.

Figure 2: A diagram of a typical item characteristics curve.

The probability of correct response is near zero at the lowest levels of ability. It

increases until at the highest levels of ability, the probability of correct response

approaches 1. This S-shaped curve describes the relationship between the probability

of correct response to an item and the ability scale.

Item Response Parameter

IRT parameters are not dependent on the sample used to generate the

parameters, and are assumed to be invariant (within a linear transformation) across

divergent groups within a research population and across populations (Reeve, 2002).

IRT models are described by the number of parameters they make use of. The

three-parameter logistic (3PL) model is so named because it employs three item parameters:

item difficulty, discrimination and guessing.

The equation for the three-parameter model is:

$$P(\theta) = c + (1 - c)\,\frac{1}{1 + e^{-a(\theta - b)}}$$

Where:

b is the difficulty parameter

a is the discrimination parameter

c is the guessing parameter and

θ is the ability level
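The three-parameter model can be evaluated directly from its equation. The sketch below computes the probability of a correct response for a hypothetical item over a range of ability levels; the parameter values and the function name `p_3pl` are illustrative only.

```python
from math import exp


def p_3pl(theta, a, b, c):
    """Three-parameter logistic model: P(theta) = c + (1 - c) / (1 + e^(-a(theta - b)))."""
    return c + (1.0 - c) / (1.0 + exp(-a * (theta - b)))


# Hypothetical item: discrimination 1.2, difficulty 0.5, guessing 0.2.
a, b, c = 1.2, 0.5, 0.2

for theta in (-3.0, -1.0, 0.5, 2.0, 3.0):
    print(f"theta = {theta:+.1f}  P = {p_3pl(theta, a, b, c):.3f}")

# At theta equal to b the probability is halfway between c and 1.
midpoint = p_3pl(b, a, b, c)
# For very low ability the curve flattens out at the guessing parameter c.
lower_asymptote = p_3pl(-50.0, a, b, c)
```

Two features of the curve are worth noting: it never falls below c, and at an ability equal to b the probability is (1 + c)/2 rather than 0.5.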

The parameter c is the probability of getting the item correct by guessing alone.

It is important to note that by definition, the value of c does not vary as a function of

the ability level. Thus, the lowest and highest ability examinees have the same

probability of getting the item correct by guessing. The two-parameter logistic (2PL)

model assumes that the data have no guessing, but that items can vary in terms of

location (bi), i.e., difficulty, and discrimination (ai). The equation for the two-parameter

model is given below:

$$P(\theta) = \frac{1}{1 + e^{-L}} = \frac{1}{1 + e^{-a(\theta - b)}}$$

Where: e is the constant 2.718 (the base of natural logarithms),

b is the difficulty parameter, a is the discrimination parameter,

L = a(θ - b) is the logistic deviate (logit) and θ is an ability level. The difficulty

parameter, denoted by b, is defined as the point on the ability scale at which the probability

of correct response to the item is 0.5. The one-parameter logistic (1PL) model assumes that

all items have the same discrimination and that there is no guessing. Items are described only by a single

parameter, location or difficulty (bi). The resulting one-parameter models

have the property of specific objectivity, meaning that the rank of the item difficulty is

the same for all respondents independent of ability, and that the rank of the person

ability is the same for all items independent of difficulty.

The equation for the one-parameter model is given by the following:

$$P(\theta) = \frac{1}{1 + e^{-1(\theta - b)}}$$

Where: b is the difficulty parameter and θ is the ability level
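The property of specific objectivity can be checked numerically. Under the one-parameter model the log-odds of a correct response is θ - b, so the gap in log-odds between two items equals the difference in their difficulties at every ability level. The difficulty values and function names below are hypothetical.

```python
from math import exp, log


def p_1pl(theta, b):
    """One-parameter logistic model: P(theta) = 1 / (1 + e^(-(theta - b)))."""
    return 1.0 / (1.0 + exp(-(theta - b)))


def logit(p):
    """Log-odds of a probability."""
    return log(p / (1.0 - p))


# Two hypothetical items: an easier one and a harder one.
b_easy, b_hard = -0.5, 1.0

# Difference in log-odds between the two items at several ability levels.
gaps = [
    logit(p_1pl(theta, b_easy)) - logit(p_1pl(theta, b_hard))
    for theta in (-2.0, 0.0, 2.0)
]
print([round(gap, 6) for gap in gaps])
```

Because the gap does not depend on θ, the easier item stays easier by the same amount for weak and strong examinees alike, which is what keeps the ranking of item difficulties the same for all respondents.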

The theories reviewed above show that item response theory, the more modern theory,

describes students’ ability using item-by-item performance, which classical test

theory does not. Therefore, the study focused on item response theory.

Review of Empirical Studies

In this section, empirical studies that have been carried out are presented. This

is to ascertain trends, agreements and disagreements with the intention of establishing

ground for comparison of the findings of this present study.

Studies on Item Response Theory

Nkpone (2001) carried out a study on the application of latent trait models in

the development and standardization of physics achievement test for senior secondary

schools. The study determined the estimates of the item parameters using the

one-parameter logistic model, with 359 senior secondary school students used for the

study. Results showed that the items ranged in difficulty from -1.49 to 0.49. The

estimated values show consistency with the way the Physics Achievement Test (PAT)

items were written to increase in difficulty within each content area. Approximately

22 items out of 60 items were easy, with difficulty levels less than zero. About 37 items

were difficult, with difficulty levels of more than zero. The mean of the difficulty estimates was

zero and the standard deviation was 0.31, suggesting that there was little variability in

scores among the subjects. The study also estimated item parameters using the 2PL model.

The result showed that the items were moderately difficult and had uniform

discrimination indices ranging from 1.76 to 0.39. Item difficulty indices ranged from

1.66 to 0.69. Effective discriminating power requires items with discrimination

indices greater than 0.8 and item difficulty indices displaying a rectangular

distribution from -2.0 to 2.0 under the latent trait 2PL model. The study further

estimated the standard error for each of the PAT items. It was found that the standard

errors ranged from 0.0578 to 0.0518, with the mean of the standard error as 0.17, or

17% of the total variance attributed to unreliability, while 83% was attributed to true

variance. The largest standard error was less than 10% of the range of

standard error values. This means that the difficulty indices were estimated with

excellent precision. This study relates to the current one, especially in the use of

parameter logistic models. However, the researcher did not include the three-parameter

logistic (3PL) model. The sample of the study was also very small compared with that of the

present study. This is the basis for this study.

Obinne (2008) carried out a study to examine the psychometric properties

of the items of the Biology examinations conducted by the National Examinations

Council (NECO) and the West African Examinations Council (WAEC) using

Item Response Theory (IRT). The study adopted an instrumentation research design.

Research questions and hypotheses were formulated, tested, and analyzed. The sample

was made up of 1800 senior secondary year three students from 36 secondary schools

in the urban and rural areas of Benue State. The multistage stratified sampling

technique was used. The NECO and WAEC Biology examination questions from 200-

2002 were the instruments for data collection. Maximum likelihood estimation

technique (using BILOG MG computer programme) was used to analyze the research

questions, according to IRT procedures. The t-test was used to test the hypotheses. It

was found that the Biology examination items from the two examination bodies were

equally reliable and valid. Biology items in the NECO-conducted examination for

2001 were more difficult than those of WAEC of the same year. WEAC items were

more prone to guessing than those of NECO items. It was recommended that IRT

procedures should be adopted by all examination bodies in Nigeria so that our

measurement problems could be put to rest. This study relates to the current one

especially in the area of design, the method of data analysis for research questions but

differ in the area of study and method of data analysis for testing hypothesis.

Orangi and Dorani (2010) conducted a study to develop a social studies achievement test for first-grade high school students based on item response theory (IRT). The sample consisted of 321 high school students in Tehran, selected through multi-stage cluster sampling. The study adopted an instrumentation research design. The first step was to prepare two parallel multiple-choice forms that concentrated on the educational objectives on the one hand and on the content of the lessons on the other. These forms were administered in three preliminary stages. In the first stage, the questions were analyzed for ambiguity in their composition and the comprehension of expressions was tested. The second stage, which was the practical stage, consisted of determining the difficulty level of the questions, the students' ability to recognize questions, and the level of interdependence of the questions with the overall score. Both forms were administered to the sample group at a ten-day interval. The results showed that the constructed forms were of high reliability, were supported by the analysis based on the classical method, and were also in accordance with the three parameters of Item Response Theory. Taking the item characteristic curves into account, both forms yielded information mainly for students of average ability. A rank-percentile norm for both sexes was also formulated. That study shares its design with the current study but differs in sampling technique and sample size.

Studies on Development and Validation of Instrument

Bradley and Herrin (2004) carried out a study to develop and validate instruments to measure knowledge of evidence-based practice (EBP) and searching skills. The aim was to develop and validate three instruments that measure knowledge about searching for and critically appraising scientific articles. Twenty three questions were collected from previous studies and modified by an expert panel. These questions were then administered to 55 delegates before and after two international conferences on EBP, and the responses were assessed for discriminative ability and internal consistency. Five questions were discarded and three instruments of six questions each were developed. Finally, the instruments were revalidated in a randomized controlled trial comparing two educational interventions at the University of Oslo, Norway, completed by 166 of 175 eligible medical students. In the revalidation, the instruments showed a satisfactory level of discriminative validity (p<0.05) but borderline levels of internal consistency (Cronbach’s α 0.52-0.61). The authors concluded that more research is needed to develop a suitable instrument that includes questions on searching for evidence. That study agrees with the current one in developing and validating an instrument, but it did not mention the research design, the sampling technique or the sample size used. On this premise lies the rationale for the current study to specify its research design, sample, sampling technique and sample size.

Jeffrey and Wendy (2006) sought to develop and validate an instrument to assess secondary school students’ perceptions of assessment tasks. Following a review of the literature, a five-scale instrument of 40 items was trialled with a sample of 658 science students in 11 English secondary schools. Based on internal consistency reliability data and exploratory factor analysis, refinement decisions resulted in a five-scale instrument called the Perceptions of Assessment Tasks Inventory (PATI). The scales of the PATI are Congruence with planned learning, Authenticity, Student consultation, Transparency and Diversity. The current study shares the aim of developing and validating an instrument. That study did not mention the research design or the sampling technique used, hence the need for the current study to describe its research design, sample, sampling technique and sample size.

Okoro (2010) conducted a study to develop and validate an extracurricular instructional package in social studies. The population consisted of all the JS1 students in Rivers State. Random and stratified sampling techniques were used to draw one hundred and sixty students from the population. Two research questions and three hypotheses were formulated, and four instruments were developed. The study adopted an experimental design. The validated extracurricular instructional package (EIP) was presented to the experimental group while the control group was taught the same social studies topics using the conventional approach. The major findings were that JS1 students taught with the extracurricular instructional package (1) developed a more cooperative attitude to work, (2) exhibited cordial relationships with others and (3) developed a positive attitude to work. The recommendations were that (1) teachers' workload should be restructured to accommodate their involvement in extracurricular programmes and (2) more flexible time-release from teaching should be provided to allow time for activities during the school day. However, the design and sampling technique differ from those of the current study, and the sample is very small compared with that of the current study.

All the empirical studies reviewed so far revolve around the major variables of the current study, namely development, validation and item response theory. The researcher therefore deems it appropriate to review them in this study, as they help in understanding what researchers have done before and the gap between such studies and the present study.

Summary of Literature Review

The literature was reviewed under the basic issues related to the topic of this study. The conceptual framework covered the concept of achievement test, procedures for development of a test, qualities of a test, item analysis, differential item functioning and the concept of gender. The literature reviewed on conceptual issues revealed some of the qualities of a test which will guide the researcher in setting questions on different concepts in Economics. Literature on the theoretical framework of the study covered item response theory and classical test theory. These theories will guide the researcher in choosing the appropriate measurement framework for analyzing the scores of the students. From the empirical studies reviewed, some studies were carried out on development, validation and item response theory.

The empirical studies provided information on work related to the present study. However, none of the studies reviewed focused on the application of item response theory in the development and validation of an instrument measuring achievement in Economics. The theory is important in modelling the relationship between a latent variable, usually conceptualized as the examinee's ability, and the probability of the examinee responding correctly to any particular item. On this premise lies the rationale for this study.

CHAPTER THREE

RESEARCH METHOD

In this chapter, the researcher describes the procedures that were adopted for

the study. The procedures are the design of the study, the area of the study,

population, sample and sampling technique, instrument for data collection, validation

of instrument, reliability of the instrument, methods of data collection and method of

data analysis.

Design of the Study

The design of this study is an instrumentation research design. According to Ali (2006), a study follows an instrumentation research design when its major thrust is geared entirely towards the development and standardization of an instrument whose psychometric properties (validity, reliability, usability, etc.) are empirically determined. The design is appropriate for this study because the researcher developed multiple-choice test items in Economics that were analyzed with reference to their psychometric properties.

Area of the Study

This study was conducted in Nsukka Education Zone of Enugu State. Nsukka Education Zone is made up of three Local Government Areas (LGAs), namely Nsukka, Igbo-Etiti and Uzo-Uwani. The zone has 58 secondary schools, distributed as follows: Nsukka has 30, Igbo-Etiti has 16 and Uzo-Uwani has 12. The distribution of the schools according to LGA, type and ownership is shown in Appendix A, page 74.

Population of the Study

The population of this study comprised all the senior secondary two (SS2) Economics students in the 46 government co-educational senior secondary schools in Nsukka Education Zone. Out of the 58 government secondary schools in the zone, 46 are co-educational while 12 are single-sex schools. The population of SS2 Economics students in the 46 co-educational schools is three thousand seven hundred and ninety five (3795), distributed as follows: Nsukka LGA has twenty two (22) schools with two thousand and seventy nine (2079) SS2 Economics students; Igbo-Etiti has thirteen (13) schools with one thousand two hundred and thirty two (1232); and Uzo-Uwani has eleven (11) schools with four hundred and eighty four (484) (Post Primary School Management Board (PPSMB) Nsukka, 2013/2014 academic session; see Appendix B, page 77). Co-educational senior secondary schools were used because this study has gender as a variable.

Sample and Sampling Technique

The sample for this study was one thousand and five (1005) SS2 Economics students from the 46 government co-educational senior secondary schools in Nsukka Education Zone. A proportionate stratified random sampling technique was adopted so that SS2 Economics students were drawn proportionately from each local government area. On this basis, five hundred and fifty one (551) students were randomly selected from the population of 2079 Economics students in Nsukka LGA, three hundred and twenty six (326) from the population of 1232 in Igbo-Etiti LGA, and one hundred and twenty eight (128) from the population of four hundred and eighty four (484) in Uzo-Uwani LGA. All SS2 Economics students in each sampled school were used. In all, there were 462 males and 543 females, giving a total of 1005 (see Appendix C, page 80). This is in line with Nworgu’s (2006) observation that proportionate stratified random sampling ensures greater representativeness of the sample relative to the population. A sample of 1005 was used because adequate sample sizes for IRT should not be less than 1000; Kim (2006) indicated that 1000 examinees can be depended upon to give adequate parameter estimation results.
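The proportionate allocation reported in this section can be checked with a short calculation. The following is a minimal sketch, assuming that each stratum's share of the sample is simply rounded to the nearest whole number; the function name `proportionate_allocation` is illustrative and not part of the study.

```python
# Stratum populations reported in the text (SS2 Economics students per LGA).
population = {"Nsukka": 2079, "Igbo-Etiti": 1232, "Uzo-Uwani": 484}
total_sample = 1005


def proportionate_allocation(strata, n):
    """Allocate a sample of size n to strata in proportion to their sizes.

    Assumption: simple rounding of each share. Rounding does not always
    return shares that add up to n, so the total should be checked.
    """
    total = sum(strata.values())
    return {name: round(n * size / total) for name, size in strata.items()}


allocation = proportionate_allocation(population, total_sample)
for lga, size in allocation.items():
    print(lga, size)
print("Total:", sum(allocation.values()))
```

With the reported figures, the rounded shares happen to add up exactly to the sample size; with other figures an adjustment step may be required.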

Instrument for Data Collection

The instrument for data collection was the Economics Multiple-Choice Test (EMT) developed by the researcher. The EMT was based on the following topics drawn from the SS2 syllabus: demand and supply, financial institutions, public finance, labour force, alternative economic systems, theory of cost and inflation. The instrument consists of 50 multiple-choice questions, each with four options (A-D). One mark was given for each correct response and zero for each incorrect response (see Appendix D, page 81). A scoring guide containing the answers to all fifty (50) multiple-choice questions was also developed by the researcher (see Appendix E, page 89).

Validation of Instrument

The instrument was subjected to face validation and content validation. For face validation, the instrument was given to three experts: two lecturers in Measurement and Evaluation, Department of Science Education, and one from the Department of Economics, all at the University of Nigeria, Nsukka. The experts were asked to examine the instrument with respect to:

• Whether the questions correspond to the table of specifications;

• The structure and clarity of the questions;

• Whether the answers to the questions tally with the ones in the marking scheme.

The corrections and suggestions of these experts helped in modifying the items in the EMT. Content validation of the test was carried out by preparing the table of specifications based on the six levels of the cognitive domain of Bloom’s taxonomy of educational objectives (see Appendix F, page 90). The comments and recommendations of these experts have been incorporated in the final version of the instrument.

Reliability of the Instrument

The EMT was administered to twenty five (25) SS2 Economics students in a school in Nsukka Education Zone. The school was outside the sample of the study but has some degree of similarity with the sampled schools. The responses were scored and analyzed using the Kuder-Richardson formula 20 (KR-20) to determine the internal consistency (reliability) of the instrument, and a reliability index of 0.89 was obtained (see Appendix G, page 91). KR-20 was used because the items were dichotomously scored and the test was administered once.
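For readers unfamiliar with the KR-20 coefficient, the computation can be sketched as follows. This is an illustrative sketch only: the toy response matrix is hypothetical (it is not the pilot data in Appendix G), and the population variance of total scores is assumed, whereas some texts use the sample variance.

```python
def kr20(responses):
    """Kuder-Richardson formula 20 for dichotomously scored items.

    responses: list of examinee rows, each a list of 0/1 item scores.
    Assumption: the variance of total scores is the population variance.
    """
    n = len(responses)        # number of examinees
    k = len(responses[0])     # number of items
    totals = [sum(row) for row in responses]
    mean_total = sum(totals) / n
    var_total = sum((t - mean_total) ** 2 for t in totals) / n
    # Sum of p*q over items, where p is the proportion answering correctly.
    pq_sum = 0.0
    for j in range(k):
        p = sum(row[j] for row in responses) / n
        pq_sum += p * (1 - p)
    return (k / (k - 1)) * (1 - pq_sum / var_total)


# Hypothetical scores of five examinees on four items.
toy_scores = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
]
print(round(kr20(toy_scores), 2))
```

The coefficient rises as the item variances (p times q) become small relative to the variance of the total scores.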

Method of Data Collection

The data for this study were collected using the Economics Multiple-Choice Test (EMT). The researcher visited the sampled schools, and copies of the instrument were administered to the students with the assistance of the Economics teachers in the respective schools. The test was administered under conducive conditions and lasted for 50 minutes. The instrument was administered once and retrieved immediately for scoring and analysis.

Method of Data Analysis

The research questions were answered using the maximum likelihood estimation technique under the 3PL model in the BILOG-MG V3 computer program, while the hypotheses were tested using the DIF model in BILOG-MG V3. BILOG-MG is a software program for IRT analysis of dichotomous (correct/incorrect) data, including item fit and differential item functioning.
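The three-parameter logistic model that underlies the analysis can be illustrated with a small function. This is a sketch of the standard 3PL item response function, not of BILOG-MG's estimation routine; the scaling constant `d` is an assumption (1.0 gives the logistic metric, 1.7 approximates the normal ogive metric), and the example parameters are those reported for item 40 in Tables 3 to 5.

```python
import math


def p_correct_3pl(theta, a, b, c, d=1.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: slope (discrimination), b: threshold (difficulty),
    c: lower asymptote (guessing)
    d: scaling constant (assumed 1.0 here, i.e. the logistic metric)
    """
    return c + (1.0 - c) / (1.0 + math.exp(-d * a * (theta - b)))


# Item 40: slope 1.29, threshold 0.27, asymptote 0.40.
for theta in (-2.0, 0.27, 2.0):
    print(theta, round(p_correct_3pl(theta, 1.29, 0.27, 0.40), 3))
```

At an ability equal to the threshold, the probability is halfway between the asymptote and 1, which is why a high asymptote raises the chance of success even for examinees of low ability.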

CHAPTER FOUR

RESULTS

In this chapter, the researcher presents the results obtained from the data in this

study. The results are presented based on research questions and hypotheses.

Research Question One: What are the standard errors of measurement of the test

items of the multiple choice test in Economics?

Table 1: Standard errors of measurement of the test items of the multiple choice test

in Economics based on three-parameter logistic (3PL) model.

Item S.E Item S.E Item S.E

1 0.44 19 0.10 37 0.06

2 0.27 20 0.08 38 0.14

3 0.12 21 0.08 39 0.05

4 0.09 22 0.09 40 0.12

5 0.10 23 0.13 41 0.07

6 0.10 24 0.07 42 0.05

7 0.16 25 0.08 43 0.07

8 0.06 26 0.15 44 0.16

9 0.09 27 0.33 45 0.05

10 0.10 28 0.08 46 0.15

11 0.22 29 0.07 47 0.09

12 0.14 30 0.08 48 0.07

13 0.05 31 0.06 49 0.16

14 0.11 32 0.24 50 0.20

15 0.09 33 0.08

16 0.36 34 0.09

17 0.58 35 0.09

18 0.06 36 0.07

Table 1 shows the standard errors of measurement of the items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model. All the items except item 17 had standard errors between 0.05 and 0.44. Therefore, forty nine (49) items (98%) had standard errors below 0.50 and one (1) item (2%), item 17, had a standard error above 0.50 (0.58). A standard error below 0.50 indicates high reliability while a standard error above 0.50 indicates low reliability. The high reliability of the 49 items indicates consistency in measuring the students’ ability in Economics.

Research Question Two: How do the items of the Economics multiple choice test fit

the three-parameter logistic (3PL) model?

Table 2: Fit statistics of the Economics multiple choice test based on three parameter

logistic (3PL) model.

Item Chi.Sq. Prob Item Chi.Sq. Prob Item Chi.Sq. Prob

1 51.2 0.10 19 76.0 0.00* 37 52.1 0.00*

2 37.6 0.00* 20 43.9 0.03* 38 29.3 0.00*

3 67.3 0.12 21 31.7 0.09 39 23.2 0.26

4 48.5 0.00* 22 44.2 0.18 40 179.9 0.00*

5 51.9 0.20 23 77.4 0.00* 41 77.8 0.04*

6 47.6 0.00* 24 13.7 0.06 42 23.8 0.00*

7 96.6 0.15 25 40.0 0.00* 43 70.4 0.00*

8 30.5 0.00* 26 84.2 0.13 44 116.5 0.00*

9 90.9 0.14 27 18.0 0.02* 45 26.1 0.09

10 46.7 0.05 28 79.0 0.06 46 41.0 0.00*

11 79.4 0.00* 29 46.0 0.07 47 138.4 0.13

12 57.1 0.16 30 43.7 0.00* 48 33.5 0.00*

13 31.5 0.03* 31 21.3 0.00* 49 45.5 0.02*

14 18.2 0.01* 32 103.4 0.08 50 94.3 0.09

15 55.0 0.08 33 48.7 0.00*

16 35.2 0.00* 34 45.4 0.01*

17 31.4 0.07 35 92.6 0.24

18 84.2 0.00* 36 55.2 0.00*

*Significant

Table 2 presents the chi-square goodness-of-fit analysis for the items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model. The probability values associated with the chi-square values ranged from 0.00 to 0.26. Twenty nine (29) items (58%), that is items 2, 4, 6, 8, 11, 13, 14, 16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44, 46, 48 and 49, did not fit the three-parameter model because their probability values were below the .05 level of significance. Twenty one (21) items (42%), that is items 1, 3, 5, 7, 9, 10, 12, 15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47 and 50, fitted the three-parameter model because their probability values were at or above the .05 level of significance. These items are not marked with an asterisk. This implies that the chi-square values of 29 items were statistically significant while those of 21 items were not. The criterion for item fit/misfit was the .05 level of significance.
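The fit/misfit decision rule can be expressed as a simple comparison of each item's probability value with the significance level. The sketch below uses a few probability values copied from Table 2; the function name `fits_model` is illustrative, and treating a probability exactly equal to .05 as a fit is an assumption that matches how item 10 is classified in the table.

```python
ALPHA = 0.05

# Probability values for a few items, copied from Table 2.
fit_probs = {1: 0.10, 2: 0.00, 3: 0.12, 10: 0.05, 39: 0.26, 41: 0.04}


def fits_model(prob, alpha=ALPHA):
    """An item is taken to fit when its chi-square probability is not below alpha."""
    return prob >= alpha


misfit_items = [item for item, p in fit_probs.items() if not fits_model(p)]
fit_items = [item for item, p in fit_probs.items() if fits_model(p)]
print("Misfit:", misfit_items)
print("Fit:", fit_items)
```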

Research Question Three: What are the difficulty parameters of the items of the

multiple choice test in Economics?

Table 3: Item threshold values (difficulty estimates) of the items of the multiple

choice test in Economics based on three parameter logistic (3PL) model.

Item Threshold Item Threshold Item Threshold

1 1.17 19 -0.52 37 0.44

2 -1.15 20 0.35 38 0.06

3 -0.35 21 -0.59 39 0.30

4 -0.44 22 0.13 40 0.27

5 -0.35 23 0.26 41 -0.34

6 -0.27 24 0.19 42 0.24

7 -0.16 25 -0.30 43 0.34

8 -0.07 26 -0.60 44 0.14

9 0.73 27 -1.49 45 -0.18

10 -0.61 28 -0.59 46 -1.12

11 -0.27 29 0.04 47 -0.59

12 -0.94 30 -0.73 48 0.08

13 0.17 31 -0.14 49 -0.64

14 -0.71 32 -1.38 50 -0.38

15 -0.48 33 -0.46

16 -2.10 34 -0.22

17 -2.38 35 0.10

18 -0.06 36 -0.09

Table 3 shows that thirty three (33) items (66%), that is items 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16, 17, 18, 19, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 36, 41, 45, 46, 47, 49 and 50, had negative difficulty estimates, while seventeen (17) items (34%), that is items 1, 9, 13, 20, 22, 23, 24, 29, 35, 37, 38, 39, 40, 42, 43, 44 and 48, had positive difficulty estimates. All the estimates fall within the b-value range of -3 to +3. The negative estimates imply that 33 items are easy, while the positive estimates imply that 17 items are difficult. Since all the estimates fall within this range, none of the items was rejected in terms of difficulty level.

Research Question Four: What are the discrimination parameters of the test items of

the multiple choice test in Economics?

Table 4: Item parameters of the test items of the multiple choice test in Economics

based on three parameter logistic (3PL) model.

Item Slope Item Slope Item Slope

1 0.12 19 0.51 37 3.30

2 0.20 20 0.56 38 1.02

3 0.39 21 0.67 39 0.91

4 0.49 22 0.13 40 1.29

5 0.45 23 1.10 41 0.72

6 0.44 24 1.21 42 0.85

7 0.58 25 0.60 43 1.71

8 0.79 26 0.32 44 0.90

9 0.97 27 0.19 45 0.97

10 0.52 28 0.66 46 0.38

11 0.43 29 1.29 47 0.57

12 0.39 30 0.74 48 0.67

13 0.96 31 0.97 49 0.32

14 0.47 32 0.25 50 0.21

15 0.51 33 0.60

16 0.23 34 1.14

17 0.13 35 0.45

18 0.88 36 0.61

Table 4 reveals that ten (10) items (20%), that is items 1, 2, 16, 17, 22, 26, 27, 32, 49 and 50, had slopes within the range of .01 - .34, indicating very low discrimination, while eighteen (18) items (36%), that is items 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 19, 20, 25, 33, 35, 36, 46 and 47, had slopes within the range of .35 - .64, indicating low discrimination. Also, twenty (20) items (40%), that is items 8, 9, 13, 18, 21, 23, 24, 28, 29, 30, 31, 34, 38, 39, 40, 41, 42, 44, 45 and 48, had slopes within the range of .65 - 1.34, indicating moderate discrimination. Two (2) items (4%), items 43 and 37, had slopes of 1.71 and 3.30 respectively, meaning that these two items had very high discriminating power.
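The grouping of items by slope can be reproduced from the values in Table 4. The sketch below is illustrative: the cut-offs for the first three bands are those stated in the text, while the single label "high" for every slope above 1.34 is an assumption made here for simplicity.

```python
from collections import Counter

# Slopes for items 1 to 50, copied from Table 4 in item order.
slopes = [
    0.12, 0.20, 0.39, 0.49, 0.45, 0.44, 0.58, 0.79, 0.97, 0.52,
    0.43, 0.39, 0.96, 0.47, 0.51, 0.23, 0.13, 0.88, 0.51, 0.56,
    0.67, 0.13, 1.10, 1.21, 0.60, 0.32, 0.19, 0.66, 1.29, 0.74,
    0.97, 0.25, 0.60, 1.14, 0.45, 0.61, 3.30, 1.02, 0.91, 1.29,
    0.72, 0.85, 1.71, 0.90, 0.97, 0.38, 0.57, 0.67, 0.32, 0.21,
]


def discrimination_band(slope):
    """Label a slope using the ranges given in the text."""
    if slope <= 0.34:
        return "very low"
    if slope <= 0.64:
        return "low"
    if slope <= 1.34:
        return "moderate"
    return "high"  # assumption: one label for all slopes above 1.34


band_counts = Counter(discrimination_band(s) for s in slopes)
high_items = [i for i, s in enumerate(slopes, start=1) if s > 1.34]
print(dict(band_counts))
print("Items above 1.34:", high_items)
```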

Research Question Five: What are the guessing parameters of the test items of the

multiple choice test in Economics?

Table 5: Guessing parameters of the test items of the multiple choice test in

Economics based on three parameter logistic (3PL) model.

Item Asymptote Item Asymptote Item Asymptote

1 0.08 18 0.15 35 0.10

2 0.03 19 0.00 36 0.00

3 0.09 20 0.12 37 0.00

4 0.00 21 0.00 38 0.24

5 0.10 22 0.07 39 0.00

6 0.00 23 0.32 40 0.40

7 0.01 24 0.00 41 0.11

8 0.08 25 0.04 42 0.00

9 0.18 26 0.00 43 0.25

10 0.00 27 0.00 44 0.23

11 0.02 28 0.17 45 0.00

12 0.00 29 0.09 46 0.00

13 0.05 30 0.00 47 0.07

14 0.01 31 0.00 48 0.00

15 0.00 32 0.05 49 0.15

16 0.02 33 0.00 50 0.00

17 0.13 34 0.16

Table 5 shows the guessing (asymptote) values of the items of the multiple choice test in Economics based on the three-parameter logistic (3PL) model. The values ranged from 0.00 to 0.40. Forty five (45) items (90%), that is items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 24, 25, 26, 27, … 37, 39, 41, 42, 45, 46, 47, 48, 49 and 50, fall within the c-value range of 0.00 to 0.20, which shows that the items are desirable and that the probability of answering them correctly by mere guessing is low. Five (5) items (10%), that is items 23, 38, 40, 43 and 44, had c-values above 0.20 (0.23 to 0.40), which shows that the items are not very good and that the probability of answering them correctly by mere guessing is high.
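The screening of items by guessing parameter can be checked against the values in Table 5. The sketch below is illustrative and assumes the 0.20 cut-off used in the text; the variable names are not part of the study.

```python
# Asymptote (guessing) values for items 1 to 50, copied from Table 5 in item order.
asymptotes = [
    0.08, 0.03, 0.09, 0.00, 0.10, 0.00, 0.01, 0.08, 0.18, 0.00,
    0.02, 0.00, 0.05, 0.01, 0.00, 0.02, 0.13, 0.15, 0.00, 0.12,
    0.00, 0.07, 0.32, 0.00, 0.04, 0.00, 0.00, 0.17, 0.09, 0.00,
    0.00, 0.05, 0.00, 0.16, 0.10, 0.00, 0.00, 0.24, 0.00, 0.40,
    0.11, 0.00, 0.25, 0.23, 0.00, 0.00, 0.07, 0.00, 0.15, 0.00,
]

CUTOFF = 0.20  # c-values above this are treated as prone to guessing

prone_to_guessing = [i for i, c in enumerate(asymptotes, start=1) if c > CUTOFF]
desirable_count = len(asymptotes) - len(prone_to_guessing)
print("Items above the cut-off:", prone_to_guessing)
print("Items at or below the cut-off:", desirable_count)
```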

Research Question Six: What is the differential item functioning of the test items of

the multiple choice test in Economics with respect to gender?

Table 6: Model for group differential item functioning of the test items of the multiple

choice test in Economics

Item  Group   P     Chi.Sq    Item  Group   P     Chi.Sq
1     Male    0.00  120.2*    28    Male    0.85  4.0*
      Female  0.00  266.8*          Female  0.57  6.7*
2     Male    0.00  68.2*     29    Male    0.55  15.2*
      Female  0.00  113.5*          Female  0.00  200.0*
3     Male    0.72  5.3*      30    Male    0.00  23.8*
      Female  0.00  22.2*           Female  0.00  45.0*
4     Male    0.49  7.4*      31    Male    0.24  10.4*
      Female  0.48  7.5*            Female  0.00  101.8*
5     Male    0.89  3.5*      32    Male    0.32  9.2*
      Female  0.03  16.8*           Female  0.00  61.0*
6     Male    0.15  11.9*     33    Male    0.54  6.9*
      Female  0.00  40.8*           Female  0.65  5.9*
7     Male    0.00  26.3*     34    Male    0.32  9.0*
      Female  0.00  36.9*           Female  0.00  21.0*
8     Male    0.00  23.7*     35    Male    0.99  0.9*
      Female  0.00  37.6*           Female  0.00  68.6*
9     Male    0.6   6.2*      36    Male    0.04  15.7*
      Female  0.59  6.5*            Female  0.13  12.4*
10    Male    0.9   3.0*      37    Male    0.23  10.4*
      Female  0.92  3.2*            Female  0.30  9.4*
11    Male    0.76  5.0*      38    Male    0.25  10.1*
      Female  6.0   6.4*            Female  0.83  4.2*
12    Male    0.58  6.6*      39    Male    0.00  44.9*
      Female  0.01  19.8*           Female  0.00  78.9*
13    Male    0.00  40.0*     40    Male    0.24  10.4*
      Female  0.00  105.6*          Female  0.07  14.5*
14    Male    0.72  10.7      41    Male    0.00  31.1*
      Female  0.70  10.7            Female  0.19  11.1*
15    Male    0.61  6.3*      42    Male    0.00  31.1*
      Female  0.00  22.3*           Female  0.00  68.4*
16    Male    0.00  49.4*     43    Male    0.00  22.8*
      Female  0.00  109.2*          Female  0.19  11.2*
17    Male    0.00  92.8*     44    Male    0.72  5.3*
      Female  0.00  242.6*          Female  0.00  24.5*
18    Male    0.00  89.8*     45    Male    0.00  99.8*
      Female  0.00  90.4*           Female  0.00  83.8*
19    Male    0.00  30.1*     46    Male    0.00  13.3
      Female  0.00  30.0*           Female  0.46  13.3
20    Male    0.10  13.2*     47    Male    0.79  4.7*
      Female  0.00  20.3*           Female  0.98  2.0*
21    Male    0.29  9.5       48    Male    0.02  18.1*
      Female  0.01  9.5             Female  0.02  18.0*
22    Male    0.00  27.5*     49    Male    0.02  17.9*
      Female  0.00  147.4*          Female  0.00  76.2*
23    Male    0.97  2.1*      50    Male    0.00  141.6*
      Female  0.83  4.2*            Female  0.00  228.0*
24    Male    0.00  80.2*
      Female  0.00  134.8*
25    Male    0.00  20.7*
      Female  0.81  4.5*
26    Male    0.04  16.1*
      Female  0.00  71.9*
27    Male    0.00  107.2
      Female  0.00  107.2

Table 6 shows the results of the model for group differential item functioning of the items of the multiple choice test in Economics. The results indicate that differential item functioning (DIF) effects were observed in 46 items (92%), that is items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 28 … 45, 47, 48, 49 and 50. These 46 items were identified as significantly exhibiting differential functioning between male and female students. Four (4) items (8%), that is items 14, 21, 27 and 46, were identified as not exhibiting differential functioning between male and female students. This reflects unidimensionality of the ability measured and indicates that the item discriminations are uniform and substantial. The chi-square values were used to detect the differential item functioning effect.

Research Hypothesis One: There is no significant fit between the items of the Economics multiple choice test and the three-parameter model. The chi-square goodness-of-fit test was used to test whether the items of the Economics multiple choice test fit the model. The data for testing hypothesis one are presented in Table 2. The results show that twenty nine (29) items (58%), that is items 2, 4, 6, 8, 11, 13, 14, 16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44, 46, 48 and 49, did not fit the three-parameter model because their probability values were below the .05 level of significance. Twenty one (21) items (42%), that is items 1, 3, 5, 7, 9, 10, 12, 15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47 and 50, fitted the three-parameter model because their probability values were at or above the .05 level of significance. On this basis, the null hypothesis, which states that there is no significant fit between the items of the Economics multiple choice test and the three-parameter model, was accepted for 29 items and rejected for 21 items.

Research Hypothesis Two: The test items of the multiple choice test in Economics do not function differentially between male and female SS2 Economics students. The model for group differential item functioning was used to test whether the items of the Economics multiple choice test function differentially between male and female students. The data for testing hypothesis two are presented in Table 6. The data indicate that forty six (46) items, that is items 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 15, 16, 17, 18, 19, 20, 22, 23, 24, 25, 26, 28 … 45, 47, 48, 49 and 50, were identified as significantly exhibiting differential item functioning between male and female students at the .05 level of significance, while four (4) items, that is items 14, 21, 27 and 46, do not function differentially between male and female students. Thus, out of the 50 items of the Economics multiple choice test, male and female students performed differently on 46 items and did not differ on 4 items.

Summary of Findings

Based on the results of the analysis of data presented in this chapter, the

following findings were established:

1. Forty nine (49) items had standard errors below 0.50, indicating high reliability, while one (1) item had a standard error above 0.50, indicating low reliability.

2. Twenty nine (29) items were statistically significant, indicating that they did not fit the three-parameter model, while twenty one (21) items were not statistically significant, indicating that they fitted the three-parameter model.

3. Thirty three (33) items within the b-value range of -3 to +3 had negative difficulty estimates, indicating easy items, while seventeen (17) items within the same range had positive difficulty estimates, indicating difficult items.

4. Ten (10) items had very low discriminating values, eighteen (18) items had low discriminating values, twenty (20) items had moderate discriminating values and two (2) items had very high discriminating values.

5. Forty five (45) items were considered desirable, meaning that the probability of answering them correctly by mere guessing is low, while five (5) items were considered not very good, meaning that the probability of answering them correctly by mere guessing is high.

6. Male and female Economics students functioned differentially on 46 items and showed no difference on 4 items.

CHAPTER FIVE

DISCUSSION OF FINDINGS, CONCLUSION, IMPLICATIONS,

RECOMMENDATIONS AND SUMMARY

In this chapter, the results are discussed based on the analyzed data. Conclusions

based on the results are also drawn. The limitations of the study, implications of the study

and recommendations for further studies are indicated; the summary of the entire study is

presented.

Discussion of Findings

The discussion of findings is based on the following:

1. Standard errors of measurement of the Economics multiple choice test.

2. Fit statistics of Economics multiple choice test.

3. Item threshold values (difficulty estimates) of Economics multiple choice questions.

4. Item parameters of the Economics multiple choice test.

5. Guessing parameters of the Economics multiple choice test.

6. Differential item functioning of the Economics multiple choice test with respect to

gender.

Standard errors of measurement of the Economics multiple choice test.

The findings of the study revealed standard error of measurement of the test

items of multiple choice questions in Economics based on three parameter logistic

(3pl) model. From the findings, forty nine (49) items (98%) that is item items 1, 2, 3,

4, 5, 6, 7, 8, 9, 10, 11, 12, 13 14, 15, 16, 16, 18, 19, 20, 21, 22, 23, 24, 25………..50

had S.E below 0.50 which indicates high reliability while (1) item (2%) of item 17 had

S.E of 0.58 which indicates low reliability. The reliability of the instrument ensures

the consistency of the test instrument. For any measuring instrument, the smaller the

error, the greater the reliability while the greater the error, the smaller the reliability.

This is why (Baumgartner, 2002) said that the difficulty index of every item in a test is

accompanied by its standard error in latent trait analysis and the smaller the standard

error the better the item. This finding also agrees with Meredith et al (2007) that if

reliability coefficient increases, the standard error of measurement becomes smaller.

This result is in agreement with Obinne (2008) that S.E of 0.50 and below is described

as high reliability, while S.E above 0.50 is described as low reliability.
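The inverse relationship between reliability and standard error described above can be illustrated with the classical test theory expression for the standard error of measurement. This expression is a standard textbook result and is not taken from the study's own analysis, whose S.E. values come from the 3PL calibration, so it serves only as an analogy. The symbols $s_x$ (standard deviation of observed scores) and $r_{xx}$ (reliability coefficient) are introduced here for illustration, and the standard deviation of 10 is a hypothetical value.

```latex
\[
\mathrm{SEM} = s_x \sqrt{1 - r_{xx}}
\]
% Hypothetical worked example with s_x = 10 (assumed):
\[
r_{xx} = 0.89:\quad \mathrm{SEM} = 10\sqrt{0.11} \approx 3.32
\qquad
r_{xx} = 0.95:\quad \mathrm{SEM} = 10\sqrt{0.05} \approx 2.24
\]
```

With the score spread held fixed, raising the reliability coefficient from 0.89 to 0.95 lowers the standard error, which is the direction of the relationship the cited authors describe.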

Fit statistics of the Economics multiple choice test.

The findings of the study revealed the fit statistics of the Economics multiple choice
questions. The results indicated that twenty-nine (29) items (58%), that is, items 2, 4, 6,
8, 11, 13, 14, 16, 18, 19, 20, 23, 25, 27, 30, 31, 33, 34, 36, 37, 38, 40, 41, 42, 43, 44,
46, 48 and 49, did not fit the three-parameter model, while twenty-one (21) items (42%),
that is, items 1, 3, 5, 7, 9, 10, 12, 15, 17, 21, 22, 24, 26, 28, 29, 32, 35, 39, 45, 47
and 50, fitted the three-parameter model. Nkpone (2001) asserted that in latent trait
models, a fit to the model implies validity: item discriminations are uniform and
substantial, and there is no error in scoring. The criterion for item fit or misfit in
research question 2 was determined at the 0.05 level of significance. The findings
correspond with Adedoyin (2010), who used the chi-square test and selected as fitting the
model those items whose probability was greater than the 0.05 alpha level.
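The selection rule described above (an item fits when its chi-square probability exceeds the alpha level) can be sketched in a few lines. This is a minimal illustration only: the function name `split_by_fit` and the p-values are hypothetical, and the sketch does not reproduce how BILOG-MG computes its fit statistics.

```python
# Sketch: separate fitting from misfitting items using chi-square p-values.
# The p-values below are hypothetical and are NOT the study's results.
ALPHA = 0.05


def split_by_fit(p_values, alpha=ALPHA):
    """Return (fitting, misfitting) lists of item numbers.

    An item is treated as fitting the model when its chi-square
    probability is greater than alpha, the criterion the text
    attributes to Adedoyin (2010).
    """
    fitting = sorted(item for item, p in p_values.items() if p > alpha)
    misfitting = sorted(item for item, p in p_values.items() if p <= alpha)
    return fitting, misfitting


# Hypothetical probabilities for four items.
hypothetical_p = {1: 0.31, 2: 0.01, 3: 0.27, 4: 0.04}
fitting, misfitting = split_by_fit(hypothetical_p)
print("fit:", fitting)
print("misfit:", misfitting)
```

An item whose probability equals alpha exactly is counted as misfitting here, because the stated criterion requires a probability strictly greater than alpha.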

Item threshold values (difficulty estimates) of Economics multiple choice test.

The findings of this study revealed that items 2, 3, 4, 5, 6, 7, 8, 10, 11, 12, 14, 15, 16,
17, 18, 19, 21, 25, 26, 27, 28, 30, 31, 32, 33, 34, 36, 41, 45, 46, 47, 49 and 50 in
research question 3 had negative difficulty estimates, while items 1, 9, 13, 20, 22, 23,
24, 29, 35, 37, 38, 39, 40, 42, 43, 44 and 48 in Table 3 of research question 3 had
positive difficulty estimates. The finding agrees with Chong (2013) that the difficulty
(threshold) parameter value tells us how easy or how difficult an item is. It also
corresponds with Obinne (2008) that negative difficulty estimates indicate easy items while
positive difficulty estimates indicate hard items. The items were selected based on the
b-value range of -3 to +3, which corresponds with Baker (2001) that although difficulty
values can theoretically range from -∞ to +∞, in practice they usually lie in the range of
-3 to +3.
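To make the role of the difficulty (b) parameter concrete, the following sketch evaluates the three-parameter logistic item response function for two hypothetical items that differ only in difficulty. The function name `p_correct_3pl` and all parameter values are illustrative assumptions. No scaling constant (such as D = 1.7) is applied, and whether the study's calibration used one is not stated in the text.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability
    a: discrimination, b: difficulty (threshold), c: guessing (lower asymptote)
    No scaling constant is applied; that choice is an assumption.
    """
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


# Two hypothetical items that differ only in difficulty.
easy = p_correct_3pl(theta=0.0, a=1.0, b=-1.0, c=0.2)  # negative b
hard = p_correct_3pl(theta=0.0, a=1.0, b=1.0, c=0.2)   # positive b

# An examinee of average ability is more likely to answer the
# negative-b item correctly than the positive-b item.
print(round(easy, 3), round(hard, 3))

# When ability equals difficulty, the probability lies halfway
# between the guessing level c and 1.
midpoint = p_correct_3pl(theta=1.0, a=1.0, b=1.0, c=0.2)
```

This is consistent with reading negative b-values as easy items and positive b-values as hard items.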

Item parameters of the Economics multiple choice test.

The findings of the study revealed that ten (10) items (20%), that is, items 1, 2, 16, 17,
22, 26, 27, 32, 49 and 50, had very low discriminating values, while eighteen (18) items
(36%), that is, items 3, 4, 5, 6, 7, 10, 11, 12, 14, 15, 19, 20, 25, 33, 35, 36, 46 and 47,
had low discriminating values. Also, twenty (20) items (40%), that is, items 8, 9, 13, 18,
21, 23, 24, 28, 29, 30, 31, 34, 38, 39, 40, 41, 42, 44, 45 and 48, had moderate
discriminating values, and two (2) items (4%), items 37 and 43, had very high
discriminating values. The discrimination parameter indicates how well an item
discriminates between respondents below and above the item threshold parameter, as
indicated by the slope of the item characteristic curve (Reeve & Fayers, 2005). This result
is in agreement with Baker (2001), who described the range of values for item
discrimination as follows: very low, .01 - .34; low, .35 - .64; moderate, .65 - 1.34; high,
1.35 - 1.69; and very high, 1.70 and above.
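The verbal labels attributed to Baker (2001) can be applied mechanically to a set of discrimination (a) estimates. The sketch below is one way to do so; the function name `classify_discrimination`, the handling of values below .01 and the sample a-values are assumptions made for illustration and are not the study's estimates.

```python
def classify_discrimination(a):
    """Label a discrimination (a) value using the ranges quoted from Baker (2001).

    Values that fall between two quoted ranges (for example 0.345) are
    assigned to the lower label; that tie-breaking rule is an assumption.
    """
    if a < 0.01:
        return "none"  # below the quoted ranges; this label is an assumption
    if a < 0.35:
        return "very low"
    if a < 0.65:
        return "low"
    if a < 1.35:
        return "moderate"
    if a < 1.70:
        return "high"
    return "very high"


# Hypothetical a-values, one per category.
sample = [0.20, 0.50, 1.00, 1.50, 1.90]
labels = [classify_discrimination(a) for a in sample]
print(labels)
```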

Guessing parameters of the Economics multiple choice test.

The findings of the study revealed that forty-five (45) items, that is, items 1 to 22, 24
to 37, 39, 41, 42 and 45 to 50, had guessing values (c-values) within the range of 0.00 to
0.20 and were therefore considered desirable. This lower c-value range indicates that the
probability of getting an answer correct by mere guessing is low. Five (5) items, that is,
items 23, 38, 40, 43 and 44, were considered not very good; their higher c-values indicate
that the probability of getting an answer correct by mere guessing is high. The finding is
supported by Karami (2010) that the lower the c-value, the better, since it indicates a
lower probability of low-ability examinees getting the answer correct by mere guessing.
Harris (2005) concluded that items with c-values of 0.30 or greater are considered not very
good, whereas c-values of 0.20 or lower are desirable. In like manner, Akindele (2003)
noted that items do not have perfect c-values because examinees do not guess randomly when
they do not know the answer.
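The cut-offs attributed to Harris (2005) can be written as a simple rule. In the sketch below, the function name `classify_guessing` is hypothetical, and the label "intermediate" for c-values strictly between 0.20 and 0.30 is an assumption, since the quoted rule does not classify that interval.

```python
def classify_guessing(c):
    """Label a guessing (c) value using the cut-offs quoted from Harris (2005)."""
    if not 0.0 <= c <= 1.0:
        raise ValueError("c is a probability and must lie between 0 and 1")
    if c <= 0.20:
        return "desirable"
    if c >= 0.30:
        return "not very good"
    return "intermediate"  # not classified by the quoted rule; an assumption


# Hypothetical c-values, not the study's estimates.
for c in (0.05, 0.20, 0.25, 0.33):
    print(c, classify_guessing(c))
```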

Differential item functioning of the Economics multiple choice test with respect

to gender.

The findings of this study revealed that forty-six (46) items (92%), that is, all items
except items 14, 21, 27 and 46, exhibited differential item functioning (DIF), while four
(4) items (8%), that is, items 14, 21, 27 and 46, were identified as not exhibiting
differential functioning between male and female students. This finding agrees with Davis
(2002) that items are sometimes found to behave differently in distinct groups, such as
gender or language groups (for example, loading on different dimensions in a
multi-dimensional factor analysis, or having largely different mean item scores). In other
words, two examinees with the same latent trait value but differing in other
characteristics may have differing probabilities of response. The findings were determined
at the 0.05 level of significance. The findings of Madu (2012) are in line with those of
this study: in that study, thirty-nine (39) items in the mathematics test (starred) were
identified as significantly exhibiting differential item functioning between male and
female examinees at the .05 level of significance, while 11 items did not function
differentially between male and female examinees.
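The idea that two examinees with the same latent trait value can have different response probabilities is sketched below for a single hypothetical item calibrated separately for two groups. The parameter values and the function names `p_correct_3pl` and `unsigned_area` are illustrative assumptions. The area between the two item characteristic curves is used here only as a simple descriptive index of DIF; it is not the procedure BILOG-MG applies, and it carries no significance test.

```python
import math


def p_correct_3pl(theta, a, b, c):
    """3PL probability of a correct response (no scaling constant; an assumption)."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def unsigned_area(params_1, params_2, lo=-3.0, hi=3.0, steps=600):
    """Approximate the area between two groups' item characteristic curves.

    Uses the midpoint rule over the ability range [lo, hi].
    """
    width = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        theta = lo + (i + 0.5) * width
        gap = p_correct_3pl(theta, *params_1) - p_correct_3pl(theta, *params_2)
        total += abs(gap)
    return total * width


# Hypothetical (a, b, c) estimates for one item in each group.
male = (1.0, -0.5, 0.2)
female = (1.0, 0.5, 0.2)

# Same ability (theta = 0), different probability of success.
same_ability_gap = p_correct_3pl(0.0, *male) - p_correct_3pl(0.0, *female)
print(round(same_ability_gap, 3))
print(round(unsigned_area(male, female), 3))
```

An item with identical parameters in both groups gives an area of zero, which corresponds to the four items reported as showing no differential functioning.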

Conclusion

Based on the findings, the following conclusions were drawn:

1. That forty-nine (49) items indicated high reliability while one (1) item indicated low
reliability.

2. That twenty-one (21) items fitted the three-parameter model while twenty-nine (29) items
did not fit the three-parameter model.

3. That thirty-three (33) items were easy while seventeen (17) items were difficult.

4. That ten (10) items indicated very low discriminating values, eighteen (18) items
indicated low discriminating values, twenty (20) items indicated moderate discriminating
values and two (2) items indicated high discriminating values.

5. That forty-five (45) items were considered desirable, meaning that the probability of
getting an answer correct by mere guessing is low, while five (5) items were considered not
very good, meaning that the probability of getting an answer correct by mere guessing is
high.

6. That items in the Economics test functioned differentially between male and female
students.

Educational Implications of the Study

The findings of this study have educational implications for teachers, examination bodies,
psychometricians and test developers.

From the findings of the study, it is possible for Economics teachers to identify the
difficulty of each item. The implication is that teachers should, as much as possible, set
questions that are neither very easy nor very difficult. It is also possible to detect
items that function differently for male and female students. Teachers, especially
Economics teachers, should therefore set and administer items that are fair in order to
ensure quality education. The study also has implications for the guessing parameter: every
teacher setting multiple choice questions should note that the probability of students
getting the correct answer by guessing can still be low.

Since psychometric properties affect the quality of questions, teachers should, as much as
possible, set tests that have adequate psychometric properties.

For examination bodies such as the West African Examinations Council, the National
Examinations Council and others, the implication is that, since Item Response Theory (IRT)
was designed to overcome the limitations of Classical Test Theory (CTT), teachers,
examination bodies and psychometricians should be encouraged to adopt IRT in developing
test items used in measuring students' ability, especially in Economics.

For test developers, the findings imply that different items can be compared in order to
assess their discrimination values, difficulty, fit statistics and standard errors.

Recommendations

Based on the findings of the study, the following recommendations were made:

1. Psychometricians and measurement experts should organize workshops to educate teachers
on the implications of quality tests. They should also train teachers in the modern
measurement framework called IRT and in the interpretations it involves.

2. Examination bodies and teachers should be encouraged to adopt IRT in developing test
items used in measuring students' ability in Economics.

3. The education ministry and universities should assist students who are interested in
research on item response theory to obtain the necessary software and computer packages.

4. It is imperative to determine how well the items in an instrument fit the IRT models,
such as the one-parameter, two-parameter and three-parameter logistic models.

5. The differential item functioning of items in a test instrument should be properly
determined in order to avoid gender differences.

Limitations of the Study

1. IRT is a new concept in educational measurement in Nigeria; hence, obtaining

relevant literature and studies relating to Nigeria was very difficult.

2. In Nigeria, software packages for Item Response Theory (IRT) analysis are

not available, and measurement experts who know how to calibrate items with an

IRT package are very few.

Suggestions for Further Studies

The researcher suggests that further studies be carried out in the following areas:

1. Application of item response theory in the development and validation of

multiple choice tests in other subjects such as Commerce, Government and

Geography.

2. Detection of differential item functioning in Economics multiple choice tests.

3. A replication of this study using a wider geographical area, if possible the

whole of Enugu State.

Summary of the Study

The study investigated the application of item response theory in the development and
validation of multiple choice questions in Economics. Item response theory was seen as an
important aspect of measurement theory that determines the latent trait of the students.
Reports on the limitations of classical test theory motivated the researcher to embark on
this study to determine the latent traits of students using item-level performance instead
of aggregate-level performance. The study also examined guessing and differential item
functioning effects. Six research questions guided the study, and two null hypotheses were
formulated and tested at the 0.05 level of significance.

In the literature review, the concept of achievement test, procedures for development of a
test, qualities of a test, item analysis, Differential Item Functioning (DIF), standard
errors of measurement, concept of gender and analysis of fit were discussed. A theoretical
review and a review of empirical studies were also presented. The empirical studies
gathered information on studies related to the present study. However, none of the studies
reviewed focused on the application of item response theory in the development and
validation of an instrument measuring achievement in Economics. The study adopted an
instrumentation design. It covered 46 government co-educational schools in Nsukka Education
Zone, with a sample of 1005 students. The researcher developed an instrument titled
Economics Multiple Choice Test (EMT), with a reliability coefficient of 0.89, which was
used to carry out the study. The data obtained were subjected to statistical analysis. The
maximum likelihood estimation technique of the BILOG-MG V3 computer program, under the 3PL
model, was used to answer the research questions, while the DIF model of BILOG-MG V3 was
used to test the hypotheses.

The analysis of the data indicated:

1. That 98% of the items had standard errors that indicated high reliability while 2% had
standard errors that indicated low reliability.

2. That 58% of the items did not fit the three-parameter logistic (3PL) model while 42%
fitted the model.

3. That 66% of the items were easy while 34% of the items were difficult.

4. That 20% of the items had very low discriminating values, 36% had low discriminating
values, 40% had moderate discriminating values and 4% had high discriminating values.

5. That 90% of the items had low c-values, which implies that the probability of a student
getting the answer correct by mere guessing is low, while 10% had high c-values, which
implies that the probability of getting an answer correct by mere guessing is high.

6. That 46 items functioned differentially between male and female Economics students while
4 items did not.

Following the discussion of the findings, the educational implications of the study were
enumerated. It was recommended that psychometricians and measurement experts should
organize workshops to educate teachers on the implications of quality tests. They should
also train teachers in the modern measurement framework called IRT and in the
interpretations it involves. The limitations of the study were highlighted and suggestions
for further studies were made. Based on the findings of the study, it was concluded that
98% of the items had standard error below 0.50, indicating high reliability.

REFERENCES

Abonyi, O. S. (2011). Instrumentation in behavioral research: A practical

approach. Enugu: TIMEX Publishing Company.

Adedoyin, O. O. (2010). Investigating the invariance of person parameter estimates

based on classical test and item response theories. International journal of

educational science. Retrieved November 30, 2012, from

http://www.uniBotswana./journal/education/science

Adeyegbe, S. (2004). History of West African Examinations Council. Retrieved

October 12, 2012, from http://www.waecnigeria.org/home.htm.

Akindele, B. P. (2003). The development of an item bank for selection tests into

Nigerian universities: an exploratory study. Unpublished doctoral dissertation,

University of Ibadan, Nigeria.

Ali, A. (2006). Conducting research in education and the social sciences. Enugu:

Tashiwa Networks Ltd.

Anaekwe, M.C. (2007). Basic research methods and statistics in education and social

sciences (2nd ed.). Onitsha: Sofie Publicity and Printry Limited.

Anastasi, A., & Urbina, S. (2002). Psychological testing. New York: Prentice Hall.

Anene, G. U., & Ndubisi, O.G (2003). Test development process. In B. G. Nworgu

(Ed.), Educational measurement and evaluation: Theory and practice

(pp.110-122). Nsukka: University Trust Publishers.

Anikweze, C. M. (2010). Measurement and evaluation: For teacher education. (2nd

ed.). Enugu: SNAAP Press Ltd.

Asadu, I. N. (2001). Trend in student’s enrolment and performance in senior

secondary certificate examination in Economics. Unpublished doctoral

dissertation, University of Nigeria, Nsukka.

Baker, F. B. (2001). The basics of item response theory (2nd ed.). United States of

America: ERIC Clearinghouse on Assessment and Evaluation.

Baumgartner, T. A. (2002). Conducting and reading research in health and human

performance (3rd ed.). New York: McGraw-Hill Higher Education.

Bhakta, B., Tennant, A., Horton, M., Lawton, G., & Andrich, D. (2005). Using item

response theory to explore the psychometric properties of extended matching

questions examination in undergraduate medical education. Journal of medical

education, 5(9). Retrieved October, 12, 2012, from

http://www.biomedcentral.com/1472-6920/5/9. doi: 10.1186/1472-6920-5-9.

Black, P. J., & Wiliam, D. (2009). Assessment and classroom learning. Assessment in

education. 5, 7-74

Bradley, P., & Herrin, J. (2004). Development and validation of an instrument to

measure knowledge of evidence-based practice and searching skills. Med

Educ. Online Retrieved April, 25, 2013, from http://www.med-ed-online.org.

Bush, M. (2001). A multiple choice test that rewards partial knowledge. Journal of

further and higher education, 25(2), 157-163.

Chatterji, M. (2003). Designing and using tools for educational assessment. Journals

of education: Retrieved January, 21, 2014 from

http://www.columbia.edu/~mb1434/EdAssess.htm

Chong, H. Y. (2013). A Simple guide to the Item Response Theory (IRT) and Rasch

modeling. Retrieved March, 2013, from http://www.creative-wisdom.com.

Crocker, L. & Algina, J. (2008). Introduction to classical and modern test theory.

Fort Worth: Harcourt Brace Jovanovich.

Davis, L. L. (2002). Strategies for controlling item exposure in computerized adaptive

testing with polytomously scored items. Unpublished doctoral dissertation,

University of Texas at Austin.

De-Jong, M. G., Steenkamp, E.M., Fox, J. & Baumgartner, H. F. (2008). Using item

response theory to measure extreme response style in marketing research: A

global investigation. Journal of marketing research: 45(1), Retrieved October,

14, 2012, from http://www.journl.marketing power.com

Denga, D. I. (2003). Educational measurement: continuous assessment and

psychological testing. Calabar: Rapid educational publisher Ltd.

Douglas, G. (1987). Latent trait measurement models. In T. Smart (Ed.), Educational

research measurement model: An international handbook (pp. 240-259). New Jersey:

Alnold Press.

Ebo, E. C. (2009). Social and economic research: Principles and methods (2nd ed.).

Enugu: African Institute for Applied Economics.

Egunjobi, A., & Egwaikhide, F. (2010). Economics for Senior Secondary School.

Lagos: Macmillan Nigeria Publishers Ltd.

Emaikwu, S.O. (2011). Issues in test item bias in public examinations in Nigeria and

implications for testing. International journals of academic research in

progressive education and development. 1(1) (pp.40)

Embretson, S. E., & Reise, S. P. (2000). Item response theory for psychologists.

Mahwah, NJ: Lawrence Erlbaum Associates.

Ercikan, K. & Koh, K. (2005). Construct comparability of the English and French

versions of TIMSS. International journal of testing (5), 23-35.

Eze, C.O. & Onah, P.C (2005) Measurement evaluation in education. Enugu:

Computer Edge Publishers.

Ezeh, D.N. (2003). Reliability and validation of tests. In B. G. Nworgu (Ed.),

Educational measurement and evaluation: Theory and practice (pp.123-135).

Nsukka: University Trust Publishers.

Federal Republic of Nigeria (FRN) (2004). National Policy on Education (4th ed.).

Lagos: NERDC press.

Falayajo, W. (1986). Philosophy and theory of continuous assessment. A paper

presented at a workshop for inspectors of education in Ondo state, Nigeria, 4th

December.

Ferguson, G. A. (2011). A bi-factor analysis of reliability coefficients: the British

journal of psychology. Retrieved September, 5, 2012 from general section-

Wiley online library.

Hambleton, R. K., & Swaminathan, H. (1991). Item response theory: Principles and

applications. Boston: Kluwer-Nijhoff.

Harbor-Peters, V. F. (1999). Noteworthy points on measurement & evaluation. Enugu:

Snap Press Ltd.

Harlen, W., & Deakin-Crick, R. (2002). A systematic review of the impact of

summative assessment and tests on students’ motivation for learning. In EPPI-

Centre (Ed.), Research evidence in education library (1.1 ed., pp. 153–).

London, UK: University of London Institute of Education Social Science

Research Unit.

Harris, D. (2005). Educational measurement issues and practice: comparison of 1-, 2-,

and 3- parameter IRT models. DOI: 10.1111/j.1745-3992.1989.tb00313.x.

Henard, D. H. (2000). Item response theory. In L. Grimm & P. Yarnold (Eds.),

Reading and understanding more multivariate statistics (Vol. II, pp. 67-97).

Washington, DC: American Psychological Association.

Huba, M. E. & Freed, J. E. (2000). Learner-centered assessment on college campuses:

Shifting the focus from teaching to learning. Boston, MA: Allyn & Bacon.

Ifeakor, A. C. (2011). Psychological measurement & evaluation in Education: Issues

and application. Onitsha: Folmech Printing and Publishing Co.Ltd.

Jeff, J., Sridhar, M., & Beverly, M. (2006), Estimating student proficiency using an

item response theory. Journal of intelligent tutorial system.4053, 473-480.

Retrieved September, 5, 2012 from http://www.link.springer.com/10.1007%2.pdf.

Jeffrey, P. D. & Wendy, M. K. (2006). Development and validation of an instrument

to assess secondary school students’ perceptions of assessment tasks. Journals

of Educational Studies, (32) 1. Retrieved June, 5, 2013, from

http://www.unilorin.edu.ng.

Karami, H. (2010). A Differential Item Functioning analysis of a language proficiency

test: an investigation of background knowledge bias. Unpublished Master's Thesis.

University of Tehran, Iran.

Kim, S. (2006). A comparative study of Item response theory fixed parameter

calibration methods. Journal of Educational Measurement. Retrieved January, 30,

2013 from http://www.measuredprogress.org/learning

Korashy, A.F. (1995). Applying the Rasch model to the selection of items for mental

ability test. Educational and Psychological Measurement, 55(5) 753-763.

Kyung, T. H. (2013). Windows software that generates IRT parameters and item

responses: Research and Evaluation Program Methods (REMP). University of

Massachusetts Amherst.

Lee, J. (2001). Inter State variation in rural students’ achievement and schooling

conditions. Retrieved June, 15, 2013 from http://www.ericdigest.org/2002

MacDonald, P. & Paunonen, S.Y. (2002). A Monte Carlo comparison of item and

Person statistics based on item response theory versus classical test theory:

International journals of Measurement 62(6): 91-943. Retrieved May, 30,

2012, from http://www.eri.ed.gov/ERICW.

Madu, B. C. (2012). Analysis of Gender-Related Differential Item Functioning in

Mathematics Multiple Choice Items Administered by West African

Examination council (WAEC). Journal of Education and Practice. Retrieved

May, 15, 2012, from ISSN /2222.1735 (Paper) 2222-288X (Online) Vol 3, N0.

8 2012.

Maduewesi, U.B. (1999). Curriculum implementation and instruction. Onitsha: West

and Solomon publishing COY LTD.

Malcolm, T. (2003). An achievement test. Retrieved November, 20, 2013, from

http://www.wisegeek.com/what-is-an-achievement-test.htm

Mankiw, N. G. (2001). Principles of Economics (2nd ed.). Fort Worth: Harcourt

Publishers.

Martyn, S. (2009). Face validity. Journal of Educational Measurement. Retrieved

December, 10, 2013 from Explorable.com: http://explorable.com/face-validity.

Mehrens, W.A. & Lehmann, I.J. (1978). Measurement and evaluation in education &

psychology (2nd ed.). New York: Holt, Rinehart and Winston Inc.

Meredith, D. G., Joyce, P. G., & Walter, R. B. (2007). Educational research: An

introduction (8th ed.). United States of America: Pearson Press.

Ndalichako, J.L & Rogers, W.T. (1997). Comparison of finite state score theory,

classical test theory and item response theory in scoring multiple-choice Item.

Educational and psychological measurement, 57, 580-589.

Neal, D. J., Corbin, W. R., & Fromme, K. (2006). Measurement of alcohol-related

consequences among high school and college students: Application of item-

response models to the Rutgers Alcohol Problem Index. Psychological

Assessment, 18, 402-414.

Nenty, H. J. (2004). From Classical Test Theory (CTT) to Item Response Theory

(IRT): An introduction to a desirable transition. In: OA Afemikhe, JG

Adewale (Eds.): Issues in Educational Measurement and Evaluation in

Nigeria. Institute of Education, University of Ibadan, Ibadan, Nigeria, pp.372-

384.

Nering, M. L., & Ostini, R. (2010). Handbook of polytomous item response theory

models. New York: Routledge.

Nering, M. L. & Ostini, R. (2006). Polytomous item response theory models.

Thousand Oaks, CA: Sage.

Nkpone, H.L. (2001). Application of latent trait models in the development and

standardization of physics achievement test for senior secondary students.

Unpublished doctoral dissertation, University of Nigeria, Nsukka.

Nworgu, B.G. (2006). Introduction to Educational Measurement and evaluation:

theory and practice (2nd ed.). Nsukka: Hallman Publisher.

Obemeata, J.O. (1991). Pupil’s perspective of the purpose of economics education in

Nigerian secondary grammar school. West African Journal of Education.

21(2). Retrieved December, 12, 2012, from

http://www.unilorin.edu.ng/journal/education.

Obidiegwu, U. J. (2008). Development and Validation of Physical Education

Achievement test (PEAT) for adult learners in Anambra state. Selected work.

Retrieved April, 22, 2013, from http://works.bepress.com druche_Obidiegwu/6.

Obinne, A.D.E. (2008). Psychometric properties of senior certificate biology

examinations conducted by West African Examinations council: Application of

item response theory. Unpublished doctoral dissertation, University of Nigeria,

Nsukka.

Obinne, A. D. E. (2012). Using IRT in determining test item prone to guessing.

Retrieved June, 20, 2013, from http://dx.doi.org/wje.v2n1p91.

Obinne, A.D.E. (2013). Test item validity: item response theory (IRT) perspective for

Nigeria. Research Journal in Organizational Psychology & Educational

Studies 2(1). Retrieved January, 28, 2014, from www.emergingresource.org

Ohuche, R. O. & Ukeje, S.A (1977). Testing and evaluation in education. Lagos:

African Educational Resources.

Okeke, F. N. (2006). Women and leadership in higher education; facing international

challenges and maximizing opportunities. Association of Common Wealth

University Bulletin, 147, 14-17.

Okoh, E.E. (2007). Correlates of marital adjustment among married persons in Delta

state: Implication for guidance and counselling. Unpublished PhD Thesis

University of Benin, Benin city.

Okoro, O.M. (2006). Measurement and evaluation in education. Uruowulu-Obosi:

Pacific Publishers Ltd.

Okoro, C. O. (2010). Development and validation of extracurricular instructional

package in social studies. Faculty of education university of Port Harcourt, Port

Harcourt River state Nigeria. Journals of academia Retrieved May, 30, 2013,

from http://www.sciencepub.net

Olusola, O., Adesope, C., Gress, L. Z. & Nesbit, J. C. (2008). Validating the

psychometric properties of achievement goal questionnaire using item response

theory. Presented at the 2008 Annual Meeting of the Canadian Society for the

Study of Education, May 31- June 3, Vancouver, B. C., Canada.

Onunkwo, G .I .N. (2002). Fundamentals of education measurement and evaluation.

Owerri: Cape Publishers Int’l Ltd.

Orangi, A. M., & Dorani, K. (2010). Developing a social studies achievement test for

high school students based on item-response theory (IRT). Journal of

psychological models and methods: 1(1); 1-13. Retrieved, July, 11, 2013, from

http://www. Scientific Information Database (SID).

Orji, K.O. (2002). Basic Principles for Agricultural Project Policy Analysis. Nsukka:

Price Publishers.

Osterlind, S. J. (2012). Item response theory. Journals of home school and academic

learning. Retrieved November, 30, 2012, from

http://www.education.com>Home>School and Academics>

classroom learning.

Osterlind, S. J., & Everson, H. T. (2009). Differential item functioning. Thousand

Oaks, CA: Sage Publishing.

Palmieri, P.A. (2012). Item response theory method and application gaining support

as assessment instrument. Retrieved December, 18, 2012, from

http://www.istss.org/publications.

Polit, D. F. & Hungler, B. P. (2002). Nursing research principles and methods (8th

ed.). Philadelphia: Lippincott.

Reeve, B. B. (2000). Item and scale-level analysis of clinical and non-clinical

sample responses to the MMPI-2 depression scales employing item response

theory. Unpublished doctoral dissertation, University of North Carolina at

Chapel Hill.

Reeve, B. B. (2002). An introduction to modern measurement theory. Bethesda,

Maryland: National Cancer Institute.

Reeve, B. B. & Fayers, P. (2005). Applying item response theory modeling for

evaluating questionnaire items and scale properties. In P. Fayers and R.D.

Hays (Eds.), Assessing quality of life in clinical trials: Methods and practice

(2nd ed.). USA: Oxford university press. Retrieved September, 11, from

http://cancer. Unic.edu/research/faculty/display member-plone.asp?ID-694.

Reise, S. P., & Waller, N. G. (1990). Fitting the two-parameter model to personality

data. Applied psychological measurement, 14, 45-58.

Robbins, L. (1932). An Essay on the Nature and Significance of Economic Science

(2nd ed.). London: Macmillan.

Thissen, D. & Stemberg, R. (1988). Test validity. Journals of education testing and

measurement. Retrieved December, 18, 2012.

Troy-Gerard, C. (2004). An empirical comparison of item response theory and

classical test theory item/person statistics. Unpublished doctoral dissertation,

Texas A&M University.

Vander, L. W. J., & Hambleton, R. K. (1997). Handbook of modern item response

theory. New York: Springer-Verlag.

Wendy, K. A. & Carl, E. W. (2010). Development and validation of instruments to

measure learning of expert-like thinking. International journal of science

education. Retrieved December, 28, 2013, from

http://www.informaworld.com/smpp/title~content=t713737283

World Health Organization (2002). "Gender and Reproductive Rights: Working

Definitions". Retrieved June 15, 2013 from http://www.ericdigest.org/2002.

Zumbo, B. D. (2007). Validity: Foundational issues and statistical methodology. In C.

R. Rao & S. Sinharay (Eds.), Handbook of statistics and Psychometrics

(pp.45–79). Amsterdam, the Netherlands: Elsevier Science Publisher.

Zumbo, B. D. (2007). Three Generations of DIF Analyses: Considering Where It Has

Been, Where It Is Now, and Where It Is Going. Language Assessment

Quarterly, 4(2), 223–233.

APPENDIX A

AREA OF THE STUDY

LIST OF SCHOOLS IN NSUKKA EDUCATION ZONE

SS2 SS2

S/N NUMBER OF SCHOOLS IN NSUKKA

LOCAL GOVERNMENT AREA

M F TOTAL

1 S T C Nsukka 372 - 372

2 Nsukka High sch. Nsukka 338 - 338

3 Q R S S Nsukka - 215 215

4 Com. Sec. Sch. Isienu 100 148 248

5 Urban Girls Sec. Sch. Nsukka - 169 169

6 Opi High Sch. Opi 44 36 80

7 Com. Sec. Sch. Lejja 28 40 68

8 Com. Sec. Sch. Edem 63 88 151

9 Com. High Sch. Umabor 50 73 123

10 Com. Sec. Sch. Ehendiagu 8 2 10

11 Com. Sec. Sch. Okpuje 40 45 85

12 Com. Sec.Sch. Ibagwani 79 85 164

13 Com.Sec. Sch. Obimo 14 32 46

14 Com. Sec. Sch. Obukpa 51 76 127

15 Com. Sec. Sch. Edeoballa 140 160 300

77

16 Com. Sec. Sch. Ezebunagu 15 20 35

17 St. Cyprians Girls Sec. Sch. Nsukka - 230 230

18 Com. Sec. Sch. Nru Nsukka 55 101 156

19 Model Sec. Sch. Nsukka 52 116 168

20 Girl sec. Sch. Opi - 31 31

21 Com. Sec. Sch. Alor-uno 25 29 54

22 Com. Sec. Sch. Opi agu 9 4 13

23 Lejja high sch. Lejja 31 42 73

24 Agu Umabor Sec Sch. Umabor 18 20 38

25 Urban Boys Sec. Sch. Nsukka 121 - 121

26 Comm. Sec. Sch. Akpotoro Obimo 19 38 57

27 Comm. Sec. Sch. Okutu 10 12 22

28 Edemani High Sch. Edemani 16 23 39

29 Comm. Sec. Sch. Breme 12 10 22

30 Comm. Sec. Sch. Ajona Obimo - - -

SCHOOLS IN IGBO-ETITI LOCAL GOVERNMENT

1 Premier Sec. Sch. Ukehe 90 92 182

2 St.James Aku 26 - 26

3 Com. High Sch. Ukehe 48 57 105

4 Girls Sec. Sch. Aku - 96 96

5 Com. Sec. Sch. Ozalla 35 20 55

6 Com. Sec. Sch. Ohodo 33 37 70

7 Com. High Sch. Ekwegbe 54 54 108

8 Com.Sec. Sch. Ukopi 22 34 56

9 Oranadu Com. Sch. Ukehe 44 50 94

10 Com. Sec. Sch. Ohebe Dim 61 73 134

11 Com. Sec. Sch. Umunko 40 50 90

12 Com. Sec. Sch. Aku 93 71 164

13 Com. Sch. Sch. Umunna 30 29 59

14 Igb-Etiti Sec. Sch. Ikolo 24 27 51

15 Akutara Sec Sch. Ohodo 30 34 64

16 Comp. Sec Sch. Diogbe - - -

SCHOOLS IN UZO-UWANI LOCAL GOVERNMENT AREA

1 Adada Sec. Sch. Nkpologu 18 9 27

2 Uzo-Uwani Sec. Sch. Adani 20 30 50

3 Attah Mem. High. Sch. Adaba 6 5 11

4 Girls Sec. Sch. Umulokpa - 36 36

5 Com. Sec Sch. Abbi-ugbene 30 36 66

6 Com. Sec. Sch. Upkata 30 13 43

7 Com. Sec. Sch.Nimbo 53 29 82

8 Com. Sec. Sch. Ogurugu 27 20 47

9 Uvuru Sec.Sch. Uvuru 20 20 40

10 Com. High. Sch. Nrobo 20 20 40

11 Welfare Sec. Sch Opanda 10 4 14

12 Com. Sec Sch. Ugbene-Ajima 28 36 64

Source: Post Primary School Management Board (PPSMB) Nsukka, 2013/2014

academic session.

APPENDIX B

POPULATION OF THE STUDY

LIST OF CO-EDUCATION GOVERNMENT SENIOR SECONDARY

SCHOOLS IN NSUKKA EDUCATION ZONE

SCHOOLS IN NSUKKA LOCAL GOVERNMENT AREA (SS2 ENROLMENT)

S/N School M F TOTAL

1 Com. Sec. Sch. Isienu 100 148 248

2 Opi High Sch. Opi 44 36 80

3 Com. Sec. Sch. Lejja 28 40 68

4 Com. Sec. Sch. Edem 63 88 151

5 Com. High Sch. Umabor 50 73 123

6 Com. Sec. Sch. Ehendiagu 8 2 10

7 Com. Sec. Sch. Okpuje 40 45 85

8 Com. Sec.Sch. Ibagwani 79 85 164

9 Com.Sec. Sch. Obimo 14 32 46

10 Com. Sec. Sch. Obukpa 51 76 127

11 Com. Sec. Sch. Edeoballa 140 160 300

12 Com. Sec. Sch. Ezebunagu 15 20 35

13 Com. Sec. Sch. Nru Nsukka 55 101 156

14 Model Sec. Sch. Nsukka 52 116 168

15 Com. Sec. Sch. Alor-uno 25 29 54

16 Com. Sec. Sch. Opi agu 9 4 13

17 Lejja high sch. Lejja 31 42 73

18 Agu Umabor Sec Sch. Umabor 18 20 38

19 Comm. Sec. Sch. Akpotoro Obimo 19 38 57

20 Comm. Sec. Sch. Okutu 10 12 22

21 Edemani High Sch. Edemani 16 23 39

22 Comm. Sec. Sch. Breme 12 10 22

GRAND TOTAL 879 1200 2079

SCHOOLS IN IGBO-ETITI LOCAL GOVERNMENT

1 Premier Sec. Sch. Ukehe 90 92 182

2 Com. High Sch. Ukehe 48 57 105

3 Com. Sec. Sch. Ozalla 35 20 55

4 Com. Sec. Sch. Ohodo 33 37 70

5 Com. High Sch. Ekwegbe 54 54 108

6 Com.Sec. Sch. Ukopi 22 34 56

7 Oranadu Com. Sch. Ukehe 44 50 94

8 Com. Sec. Sch. Ohebe Dim 61 73 134

9 Com. Sec. Sch. Umunko 40 50 90

10 Com. Sec. Sch. Aku 93 71 164

11 Com. Sch. Sch. Umunna 30 29 59

12 Igb-Etiti Sec. Sch. Ikolo 24 27 51

13 Akutara Sec Sch. Ohodo 30 34 64

GRAND TOTAL 604 628 1232

SCHOOLS IN UZO-UWANI LOCAL GOVERNMENT AREA

1 Adada Sec. Sch. Nkpologu 18 9 27

2 Uzo-Uwani Sec. Sch. Adani 20 30 50

3 Attah Mem. High. Sch. Adaba 6 5 11

4 Com. Sec Sch. Abbi-ugbene 30 36 66

5 Com. Sec. Sch. Upkata 30 13 43

6 Com. Sec. Sch.Nimbo 53 29 82

7 Com. Sec. Sch. Ogurugu 27 20 47

8 Uvuru Sec.Sch. Uvuru 20 20 40

9 Com. High. Sch. Nrobo 20 20 40

10 Welfare Sec. Sch Opanda 10 4 14

11 Com. Sec Sch. Ugbene-Ajima 28 36 64

GRAND TOTAL 262 222 484

Source: Post Primary School Management Board (PPSMB) Nsukka, 2013/2014

academic session.

APPENDIX C

SAMPLE OF THE STUDY

POPULATION DISTRIBUTION OF 3795 SS2 ECONOMICS STUDENTS IN

46 GOVERNMENT CO-EDUCATION SENIOR SECONDARY SCHOOLS IN

NSUKKA EDUCATION ZONE ACCORDING TO LOCAL GOVERNMENT

AREA.

The researcher wishes to draw a sample of 1005 from this population.

Nsukka local government area

Male: $\frac{879 \times 1005}{3795} \approx 233$; Female: $\frac{1200 \times 1005}{3795} \approx 318$; total = 551

Igbo-Etiti local government area

Male: $\frac{604 \times 1005}{3795} \approx 160$; Female: $\frac{628 \times 1005}{3795} \approx 166$; total = 326

Uzo-Uwani local government area

Male: $\frac{262 \times 1005}{3795} \approx 69$; Female: $\frac{222 \times 1005}{3795} \approx 59$; total = 128

Grand total = 551 + 326 + 128 = 1005

The total sample size is 1005. The relative proportions of the Nsukka, Igbo-Etiti

and Uzo-Uwani strata in the sample match their relative proportions in the

population, up to rounding.
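
The allocation above can be reproduced with a short script. This is only an illustrative sketch: the function name `proportional_allocation` and the dictionary layout are assumptions made for this example, while the stratum sizes and the sample size of 1005 are taken from this appendix.

```python
def proportional_allocation(strata, sample_size):
    """Allocate sample_size across strata in proportion to stratum size.

    Each share is rounded to the nearest whole student, as in the
    hand computation above.
    """
    population = sum(strata.values())
    return {
        name: round(size * sample_size / population)
        for name, size in strata.items()
    }


# Stratum sizes (SS2 Economics students) from Appendix B.
strata = {
    ("Nsukka", "Male"): 879,
    ("Nsukka", "Female"): 1200,
    ("Igbo-Etiti", "Male"): 604,
    ("Igbo-Etiti", "Female"): 628,
    ("Uzo-Uwani", "Male"): 262,
    ("Uzo-Uwani", "Female"): 222,
}

allocation = proportional_allocation(strata, 1005)
for (area, sex), n in allocation.items():
    print(f"{area:<11} {sex:<6} {n}")
print("Total:", sum(allocation.values()))
```

Rounding each stratum separately does not guarantee that the parts add up to the target in general; for these figures the rounded shares happen to sum to 1005.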


            Nsukka          Igbo-Etiti      Uzo-Uwani       Total
            Male   Female   Male   Female   Male   Female
Size        879    1200     604    628      262    222
Subtotal    2079            1232            484             3795

APPENDIX D

INSTRUMENT

ECONOMICS MULTIPLE CHOICE TEST (EMT) FOR SENIOR SECONDARY

SCHOOL ECONOMICS STUDENTS

CLASS: SS2

Time: 1 hour

Instruction: Answer all questions. Identify the correct option lettered A-D for each

question

Please indicate by ticking (✓) in the box provided as applicable to you.

SEX: Male [ ] Female [ ]

NAME: ---------------------------------------

1. A cooperative bank is an institution established for the main purpose of

A. Mobilizing savings of the cooperative societies for bank deposits

B. Established to accept risks and losses as they occur in business

C. To finance personal buildings

D. Providing long-term and medium-term loans for the development of companies

2. Where a commodity takes an insignificant proportion of the consumer’s income,

demand for it will be

A. Unitary elastic

B. Price inelastic

C. Fairly elastic

D. Income inelastic

3. The liquidity ratio of a commercial bank refers to the

A. Total amount of cash for the bank’s treasury

B. Total amount of cash for the bank in the central bank

C. Proportion of the bank cash that should be on loan

D. Proportion of the bank’s total assets which should be held in cash and liquid

form.

4. The demand curve for a commodity is downward sloping because the consumer

will pay

A. Less as the marginal utility falls

B. More as the marginal utility falls

C. Less as the total utility falls

D. More as the average utility falls

5. A decrease in the demand for a product X resulted in a decrease in the demand for

another product Y. The demand for X and Y is

A. Derived

B. Composite

C. Joint

D. Competitive

6. The main feature of regressive taxation is that its rate

A. Is higher when income is higher

B. Is equal tax for all categories of people

C. Remains constant when income increases

D. Reduces when income increases

7. Goods for which demand rises as income rises

A. Complementary goods

B. Inferior goods

C. Normal goods

D. Substitute goods

8. The demand curve is

A. Downward sloping from left to right

B. Downward sloping from right to left

C. Upward sloping from right to left

D. Drawn vertically

9. Which of the following is not a function of a central bank?

A. Acceptance of deposits from the customers

B. Bankers to commercial banks

C. Bankers to the government

D. Lenders of last resort

10. The capitalist economic system is characterized by all the following except

A. Private ownership of the means of production

B. Inheritance

C. Profit motive

D. Ownership and management of the means of production are vested in the state

11. One of these is not a characteristic feature of inflation

A. Too much money chasing too few goods

B. A fall in employment opportunity

C. Too much money in circulation

D. A fall in the value of money

12. Which of the following is the instrument of control applied by the central bank

to ensure smooth running of economy?

A. Bank standard

B. Deposit slip

C. Bank draft

D. Use of reserve ratio

13. The upward slope of the supply curve indicates that

A. More will be supplied as price rises

B. Less will be supplied as prices

C. Supply is not a function of price

D. Supply is static and demand is dynamic

14. Equilibrium price is reached when

A. Demand is less than supply

B. Supply is greater than supply

C. Demand equals supply

D. None of the above

15. When a small change in price brings about a bigger change in the quantity

supplied, the supply is

A. Relatively elastic

B. Relatively inelastic

C. Perfectly inelastic

D. Unitarily inelastic

16. Development banks mainly provide

A. Savings account facilities for a developing economy

B. Foreign exchange facilities for importers and exporters

C. Capital for development of special banks

D. Capital for development of schools

17. Mortgage banks give loans to investors on long term basis to

A. Finance agriculture

B. Establish banks

C. Acquire machinery

D. Build houses

18. Limitations to mobility of labour include the following except

A. Poor salary and wages

B. Provision of good working condition

C. Good climatic condition

D. Provision of social amenities

19. Geographical mobility of labour indicates

A. The movement of workers from one occupation to another

B. The workers within the same industry or from one industry to another

C. The movement of workers from one part of a country to another

D. The movement of workers from one geographical location to another

20. The central bank controls credit in the economy through the use of

A. Legal tender

B. Travellers cheques

C. Foreign exchange instruments

D. Open market operation

21. A commercial bank is able to create money by

A. Printing

B. Maintaining reserves

C. Creating a demand deposit as it gives a new loan

D. Issuing cheques to depositors

22. The system whereby the ownership and management of the means of production

are vested in both the private and public sectors is known as

A. Socialist economy

B. Mixed economy

C. Capitalist economy

D. Communist economy

23. Which of the following defines inflation?

A. A buoyant economy

B. A reduction in taxes

C. A continuous rise in prices

D. A continuous fall in prices

24. A change in supply is mainly caused by

A. Change in income

B. Weather condition

C. Changes in the price of the commodity

D. A change in taste and fashion

25. A demand schedule shows the quantities of goods that are

A. Bought at given prices at a time

B. Supplied at given prices at a time

C. Produced at given prices at a time

D. Reserved for future consumption

26. The point of interaction between the demand curve and supply curve is called

A. The point of intersection

B. The point of supply and demand curve

C. The equal point of demand and supply

D. The Equilibrium point

27. If the central bank intends to increase the money supply through open market

operations, then it will

A. Sell securities in the open market

B. Buy securities in the open market

C. Issue more currency notes

D. Give loan to the commercial bank.

28. Banks create money by

A. Giving draft to customers

B. Printing more money

C. Lending out deposits to borrowers

D. Issuing cheques

29. The notion of short run and long run period is responsible for grouping cost into

A. Fixed and variable

B. Implicit and explicit

C. Average and total

D. Capital and running

30. The total amount of goods that can be bought at a given price and at a particular

period of time

A. Demand

B. Supply

C. Market

D. Production

31. The central bank controls the activities of other banks by all but one of the following

A. Taxation

B. The purchase or sale of government bonds on the open market

C. Special deposits

D. The use of bank rate

32. The implicit cost that economists consider but accountants do not is

A. Fixed cost

B. Variable cost

C. Opportunity cost

D. Marginal cost

33. A tax is said to be regressive when the proportion paid by the

A. Rich pay a greater proportion of their income than the poor

B. Rich pay an equal amount with the poor

C. The low-income group is higher than that paid by the higher income earners

D. The low-income does not pay income

34. The law of demand says that the

A. Higher the price the lower the quantity demanded

B. Higher the price the lower the quantity demanded

C. Lower the price the lower the quantity demanded

D. Lower the price the higher the quantity demanded

35. A Deficit budget is usually drawn up during

A. Economics supply

B. Full employment

C. Inflationary period

D. Economic recession

36. Given that the fixed cost is N500.00 variable cost is N1500 and output is 50 units.

What will be the average cost of producing one unit?

A. N21000

B. N60.00

C. N50.00

D. N 40.00

37. A surplus budget means

A. Government spends more money than it receives as revenue

B. Government spends less than it actually receives as revenue

C. When the desired level of full employment exists in the economy

D. Government spends equal money with what it receives

38. The term mobility of labour refers to

A. Movement of workers from one country to another

B. Movement of workers from one occupation and geographical area to another

C. Movement of workers from one occupation to another

D. Movement of workers from one geographical area to another

39. Public finance is the study of methods employed by government

A. To raise revenue and how it spends the revenue and manages the national debt

B. To give employment opportunity

C. Produce goods and services

D. To raise revenue without spending

40. The following are the principles of good taxation except

A. Equity taxation

B. Benefit principle

C. Indirect tax

D. Economy

41. The following are the types of demand except

A. Derived demand

B. Comprehensive demand

C. Joint or complementary demand

D. Competitive demand

42. The type of economic system in which the ownership and management of all

means of production are vested in the hands of private individuals is known as

A. Capitalism

B. Socialism

C. Communism

D. Mixed economy

43. The full meaning of (VAT) is

A. Value additional tax

B. Variable added tax

C. Value application tax

D. Value added tax

44. The type of bank that requires collateral security from customers before issuing

loan is called

A. Central bank

B. First bank

C. Commercial bank

D. Insurance company

45. One of the features of mixed economy is that

A. Resources are jointly owned by public and private sectors

B. It involves a great deal of central economic planning

C. Inheritance

D. Ownership is privately owned

46. The following factors influence the size of labour force except

A. Total population of the country

B. Role of women in the society

C. Retirement

D. Income

47. The following are reasons for labour mobility except

A. Promotion or transfer of workers

B. Bad management and lack of job security

C. Provision of good working conditions

D. Regular promotion and payment of salary

48. Commercial banks settle their inter-bank indebtedness through credit in the

economy through the use of

A. Merchant bank

B. Central bank

C. Development bank

D. Stock exchange

49. The following factors affect efficiency of labour except

A. Health

B. Working conditions

C. Specialization and division of labour

D. Poor salary

50. Demand for labour indicates the

A. Number of workers that are due for retirement

B. Number of workers that are needed for promotion

C. Number of workers that are brought into close contact with one another

D. Number of workers that are needed by producers to take part in productive

activities.

APPENDIX E

SCORING GUIDE FOR ECONOMICS MULTIPLE CHOICE TEST (EMT)

S/N 1 2 3 4 5 6 7 8 9 10 11 12 13

Answer A B A C C D B A A D A D A

S/N 14 15 16 17 18 19 20 21 22 23 24 25 26

Answer C B C D A D D C B C A B D

S/N 27 28 29 30 31 32 33 34 35 36 37 38 39

Answer D C A C A C C D D D B B A

S/N 40 41 42 43 44 45 46 47 48 49 50

Answer C B A B C A D C B D D
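
As an illustration of how the scoring guide might be applied, the sketch below marks one candidate's answers against the key, awarding one mark per correct item. The names `ANSWER_KEY` and `score_responses` are hypothetical and the one-mark-per-item rule is an assumption; the key itself is transcribed from the guide above.

```python
# Answer key transcribed from the scoring guide, items 1-50 in order.
ANSWER_KEY = (
    "ABACCDBAADADA"   # items 1-13
    "CBCDADDCBCABD"   # items 14-26
    "DCACACCDDDBBA"   # items 27-39
    "CBABCADCBDD"     # items 40-50
)


def score_responses(responses, key=ANSWER_KEY):
    """Return the number of responses that match the key.

    `responses` is a string with one character per item; any character
    other than the keyed letter (for example "-" for an omitted item)
    earns no mark.
    """
    if len(responses) != len(key):
        raise ValueError(f"expected {len(key)} responses, got {len(responses)}")
    return sum(given.upper() == keyed for given, keyed in zip(responses, key))


# Example: a candidate who answers "A" to every item.
print(score_responses("A" * 50))
```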

APPENDIX F

TABLE OF SPECIFICATION FOR SS2 ECONOMICS MULTIPLE CHOICE

TEST (EMT) FOR SS2 ECONOMICS STUDENTS

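S/N | Content area | Content % | Knowledge (35%) | Comprehension (15%) | Application (15%) | Analysis (25%) | Synthesis (5%) | Evaluation (5%) | Total
----------------------------------------------------------------------------------------------------------------------------------------------------
1 | Demand and Supply | 30% | 5 (2, 5, 24, 30, 25) | 2 (8, 15) | 2 (13, 4) | 4 (7, 14, 26, 34) | - | 1 (41) | 14
2 | Financial institution | 20% | 4 (3, 9, 28, 1) | 2 (48, 12) | 2 (17, 20) | 3 (16, 44, 27) | 1 (31) | 1 (21) | 13
3 | Public finance | 15% | 2 (33, 40) | 2 (37, 43) | - | 2 (6, 39) | - | 1 (35) | 7
4 | Labour force | 15% | 2 (18, 50) | 2 (19, 46) | 1 (38) | 2 (47, 49) | - | - | 7
5 | Alternative economic system | 10% | 2 (10, 45) | 1 (22) | 1 (42) | 1 (44) | - | - | 5
6 | Theory of Cost | 5% | 1 (36) | - | - | 1 (32) | - | - | 2
7 | Inflation | 5% | 1 (23) | - | - | 1 (11) | - | - | 2
  | Total | 100% | 17 | 9 | 6 | 14 | 1 | 3 | 50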

APPENDIX G

COMPUTATION OF KR20 RELIABILITY CO-EFFICIENT FOR SS2

ECONOMICS MULTIPLE CHOICE TEST (EMT)

Item  No. passing  No. failing  Proportion passing (p)  Proportion failing (q)  pq

1 12 13 0.48 0.52 0.2496

2 17 8 0.68 0.32 0.2176

3 14 11 0.56 0.44 0.2464

4 17 8 0.68 0.32 0.2176

5 16 9 0.64 0.36 0.2304

6 18 7 0.72 0.28 0.2016

7 15 10 0.6 0.4 0.24

8 15 10 0.6 0.4 0.24

9 18 7 0.72 0.28 0.2016

10 13 12 0.52 0.48 0.2496

11 16 9 0.64 0.36 0.2304

12 18 7 0.72 0.28 0.2016

13 17 8 0.68 0.32 0.2117

14 14 11 0.56 0.44 0.2464

15 15 10 0.6 0.4 0.24

16 17 8 0.68 0.32 0.2176

17 13 12 0.52 0.48 0.2496

18 15 10 0.6 0.4 0.24

19 15 10 0.6 0.4 0.24

20 15 10 0.6 0.4 0.24

21 16 9 0.64 0.36 0.2304

22 13 12 0.52 0.48 0.2496

23 18 7 0.72 0.28 0.2016

24 14 11 0.56 0.44 0.2464

25 14 11 0.56 0.44 0.2464

26 13 12 0.52 0.48 0.2496

27 18 7 0.72 0.28 0.2016

28 14 11 0.56 0.44 0.2464

29 15 10 0.6 0.4 0.24

30 13 12 0.52 0.48 0.2496

31 18 7 0.72 0.28 0.2016

32 15 10 0.6 0.4 0.24

33 16 9 0.64 0.36 0.2304

34 14 11 0.56 0.44 0.2464

35 14 11 0.56 0.44 0.2464

36 15 10 0.6 0.4 0.24

37 14 11 0.56 0.44 0.2464

38 14 11 0.56 0.44 0.2464

39 14 11 0.56 0.44 0.2464

40 16 9 0.64 0.36 0.2304

41 14 11 0.56 0.44 0.2464

42 17 8 0.68 0.32 0.2176

43 14 11 0.56 0.44 0.2464

44 16 9 0.64 0.36 0.2304

45 16 9 0.64 0.36 0.2304

46 16 9 0.64 0.36 0.2304

47 15 10 0.6 0.4 0.24

48 15 10 0.6 0.4 0.24

49 16 9 0.64 0.36 0.2304

50 17 8 0.68 0.32 0.2176

Total 11.6773

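Mean:

$$\bar{X} = \frac{\sum X}{N} = \frac{764}{25} = 30.56$$

Variance of total scores:

$$S^2 = \frac{\sum X^2 - (\sum X)^2 / n}{n} = \frac{25646 - (764)^2 / 25}{25} = \frac{25646 - 23347.84}{25} = \frac{2298.16}{25} = 91.9264$$

KR20 reliability coefficient:

$$KR_{20} = \frac{K}{K-1}\left(1 - \frac{\sum pq}{S^2}\right) = \frac{50}{50-1}\left(1 - \frac{11.6773}{91.9264}\right) = \frac{50}{49}\left(1 - \frac{11.6773}{91.9264}\right)$$

$$KR_{20} = 1.020408163 \times (1 - 0.127028797) = 1.020408163 \times 0.872971203 = 0.890786941$$

KR20 = 0.89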
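
The computation above can be checked with a few lines of code. This is a minimal sketch: the function names are illustrative assumptions, and the inputs (ΣX = 764, ΣX² = 25646, N = 25, K = 50, Σpq = 11.6773) are the figures reported in this appendix.

```python
def total_score_variance(sum_x, sum_x2, n):
    """Variance of total scores with divisor n, as used in the appendix."""
    return (sum_x2 - sum_x ** 2 / n) / n


def kr20(k, sum_pq, variance):
    """Kuder-Richardson formula 20 for a test of k dichotomous items."""
    return (k / (k - 1)) * (1 - sum_pq / variance)


variance = total_score_variance(sum_x=764, sum_x2=25646, n=25)
reliability = kr20(k=50, sum_pq=11.6773, variance=variance)

print(f"S^2  = {variance:.4f}")
print(f"KR20 = {reliability:.2f}")
```

One detail worth noting: for item 13 the product 0.68 × 0.32 is 0.2176 rather than the tabulated 0.2117, so recomputing Σpq from the p and q columns gives 11.6832 instead of 11.6773. The coefficient still rounds to 0.89 either way.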

APPENDIX H

BILOG-MG V3.0

REV 19990329.1300

BILOG-MG ITEM MAINTENANCE PROGRAM: LOGISTIC ITEM RESPONSE

MODEL

*** BILOG-MG ITEM MAINTENANCE PROGRAM ***

*** PHASE 2 ***

3PL MODEL ANALYSIS OF ECONOMICS ACHIEVEMENT TEST

0

>CALIB ACCel = 1.0000;

CALIBRATION PARAMETERS

======================

MAXIMUM NUMBER OF EM CYCLES: 20

MAXIMUM NUMBER OF NEWTON CYCLES: 2

CONVERGENCE CRITERION: 0.0100

ACCELERATION CONSTANT: 1.0000

LATENT DISTRIBUTION: NORMAL PRIOR FOR EACH GROUP

PLOT EMPIRICAL VS. FITTED ICC'S: NO

DATA HANDLING: DATA ON SCRATCH FILE

CONSTRAINT DISTRIBUTION ON ASYMPTOTES: YES

CONSTRAINT DISTRIBUTION ON SLOPES: YES

CONSTRAINT DISTRIBUTION ON THRESHOLDS: NO

SOURCE OF ITEM CONSTRAINT DISTIBUTION

MEANS AND STANDARD DEVIATIONS: PROGRAM DEFAULTS

SUBTEST TEST0001; ITEM PARAMETERS AFTER CYCLE 13

ITEM INTERCEPT SLOPE THRESHOLD LOADING ASYMPTOTE CHISQ DF

S.E. S.E. S.E. S.E. S.E. (PROB)

-------------------------------------------------------------------------------

ITEM0001 | -0.137 | 0.117 | 1.174 | 0.116 | 0.080 | 51.2 8.0

| 0.042* | 0.025* | 0.437* | 0.025* | 0.015* | (0.10000)

| | | | | |

ITEM0002 | 0.231 | 0.200 | -1.153 | 0.196 | 0.030 | 37.6 8.0

| 0.040* | 0.033* | 0.270* | 0.033* | 0.010* | (0.0000)

| | | | | |

ITEM0003 | 0.138 | 0.390 | -0.354 | 0.364 | 0.093 | 67.3 8.0

| 0.045* | 0.040* | 0.122* | 0.037* | 0.018* | (0.1200)

| | | | | |

ITEM0004 | 0.218 | 0.491 | -0.444 | 0.440 | 0.001 | 48.5 8.0

| 0.042* | 0.043* | 0.093* | 0.039* | 0.006* | (0.0000)

| | | | | |

ITEM0005 | 0.161 | 0.454 | -0.354 | 0.413 | 0.101 | 51.9 8.0

| 0.041* | 0.044* | 0.096* | 0.040* | 0.005* | (0.2034)

| | | | | |

ITEM0006 | 0.118 | 0.439 | -0.268 | 0.402 | 0.000 | 47.6 8.0

| 0.040* | 0.045* | 0.099* | 0.041* | 0.003* | (0.0000)

| | | | | |

ITEM0007 | 0.094 | 0.582 | -0.161 | 0.503 | 0.012 | 96.6 8.0

| 0.084* | 0.066* | 0.159* | 0.057* | 0.048* | (0.1544)

| | | | | |

ITEM0008 | 0.057 | 0.793 | -0.072 | 0.622 | 0.082 | 30.5 8.0

| 0.045* | 0.059* | 0.058* | 0.046* | 0.003* | (0.0002)

| | | | | |

ITEM0009 | -0.709 | 0.967 | 0.733 | 0.695 | 0.177 | 90.9 8.0

| 0.172* | 0.169* | 0.087* | 0.121* | 0.032* | (0.1441)

| | | | | |

ITEM0010 | 0.318 | 0.524 | -0.606 | 0.464 | 0.002 | 46.7 8.0

| 0.045* | 0.045* | 0.096* | 0.040* | 0.014* | (0.0500)

| | | | | |

ITEM0011 | 0.116 | 0.425 | -0.274 | 0.392 | 0.020 | 79.4 8.0

| 0.086* | 0.051* | 0.221* | 0.047* | 0.062* | (0.0000)

| | | | | |

ITEM0012 | 0.361 | 0.386 | -0.936 | 0.360 | 0.001 | 57.1 8.0

| 0.042* | 0.041* | 0.140* | 0.039* | 0.007* | (0.1633)

| | | | | |

ITEM0013 | -0.159 | 0.958 | 0.166 | 0.692 | 0.051 | 31.5 8.0

| 0.050* | 0.070* | 0.050* | 0.050* | 0.007* | (0.0343)

| | | | | |

ITEM0014 | 0.332 | 0.467 | -0.711 | 0.423 | 0.011 | 18.2 8.0

| 0.042* | 0.044* | 0.109* | 0.040* | 0.005* | (0.0113)

| | | | | |

ITEM0015 | 0.245 | 0.514 | -0.478 | 0.457 | 0.001 | 55.0 8.0

| 0.042* | 0.043* | 0.093* | 0.039* | 0.004* | (0.0812)

| | | | | |

ITEM0016 | 0.488 | 0.233 | -2.096 | 0.227 | 0.021 | 35.2 8.0

| 0.042* | 0.037* | 0.358* | 0.036* | 0.009* | (0.0001)

| | | | | |

ITEM0017 | 0.310 | 0.130 | -2.378 | 0.129 | 0.134 | 31.4 8.0

| 0.040* | 0.027* | 0.577* | 0.027* | 0.010* | (0.0745)

| | | | | |

ITEM0018 | 0.049 | 0.881 | -0.056 | 0.661 | 0.146 | 84.2 8.0

| 0.048* | 0.061* | 0.056* | 0.046* | 0.002* | (0.0000)

| | | | | |

ITEM0019 | 0.262 | 0.510 | -0.515 | 0.454 | 0.000 | 76.0 8.0

| 0.041* | 0.048* | 0.095* | 0.043* | 0.003* | (0.0000)

| | | | | |

ITEM0020 | -0.193 | 0.559 | 0.346 | 0.488 | 0.122 | 43.9 8.0

| 0.042* | 0.052* | 0.075* | 0.046* | 0.003* | (0.0333)

| | | | | |

ITEM0021 | 0.392 | 0.668 | -0.588 | 0.555 | 0.001 | 31.7 8.0

| 0.045* | 0.049* | 0.076* | 0.041* | 0.005* | (0.0912)

| | | | | |

ITEM0022 | -0.147 | 1.133 | 0.130 | 0.750 | 0.070 | 44.2 8.0

| 0.117* | 0.130* | 0.093* | 0.086* | 0.036* | (0.1823)

| | | | | |

ITEM0023 | -0.285 | 1.094 | 0.261 | 0.738 | 0.323 | 77.4 8.0

| 0.170* | 0.175* | 0.125* | 0.118* | 0.041* | (0.0000)

| | | | | |

ITEM0024 | -0.233 | 1.213 | 0.192 | 0.772 | 0.007 | 13.7 8.0

| 0.095* | 0.132* | 0.065* | 0.084* | 0.023* | (0.0566)

| | | | | |

ITEM0025 | 0.176 | 0.597 | -0.295 | 0.513 | 0.041 | 40.0 8.0

| 0.044* | 0.045* | 0.079* | 0.039* | 0.009* | (0.0000)

| | | | | |

ITEM0026 | 0.190 | 0.318 | -0.597 | 0.303 | 0.003 | 84.2 8.0

| 0.044* | 0.039* | 0.154* | 0.037* | 0.019* | (0.1296)

| | | | | |

ITEM0027 | 0.280 | 0.188 | -1.493 | 0.185 | 0.001 | 18.0 8.0

| 0.040* | 0.033* | 0.329* | 0.032* | 0.009* | (0.0215)

| | | | | |

ITEM0028 | 0.383 | 0.655 | -0.586 | 0.548 | 0.001 | 79.0 8.0

| 0.046* | 0.055* | 0.076* | 0.046* | 0.008* | (0.0635)

| | | | | |

ITEM0029 | -0.056 | 1.287 | 0.044 | 0.790 | 0.092 | 46.0 8.0

| 0.096* | 0.138* | 0.072* | 0.085* | 0.026* | (0.0723)

| | | | | |

ITEM0030 | 0.539 | 0.742 | -0.727 | 0.596 | 0.000 | 43.7 8.0

| 0.048* | 0.058* | 0.077* | 0.046* | 0.003* | (0.0000)

| | | | | |

ITEM0031 | 0.140 | 0.969 | -0.144 | 0.696 | 0.002 | 21.3 8.0

| 0.052* | 0.070* | 0.055* | 0.050* | 0.012* | (0.0034)

| | | | | |

ITEM0032 | 0.345 | 0.249 | -1.383 | 0.242 | 0.052 | 103.4 8.0

| 0.041* | 0.035* | 0.242* | 0.034* | 0.012* | (0.08209)

| | | | | |

ITEM0033 | 0.276 | 0.600 | -0.461 | 0.515 | 0.001 | 48.7 8.0

| 0.043* | 0.049* | 0.079* | 0.042* | 0.005* | (0.000)

| | | | | |

ITEM0034 | -0.252 | 1.143 | 0.221 | 0.752 | 0.160 | 45.4 8.0

| 0.132* | 0.157* | 0.094* | 0.104* | 0.038* | (0.0148)

| | | | | |

ITEM0035 | -0.045 | 0.447 | 0.101 | 0.408 | 0.100 | 92.6 8.0

| 0.040* | 0.047* | 0.089* | 0.043* | 0.003* | (0.2357)

| | | | | |

ITEM0036 | 0.057 | 0.614 | -0.092 | 0.523 | 0.000 | 55.2 8.0

| 0.042* | 0.055* | 0.070* | 0.046* | 0.003* | (0.0000)

| | | | | |

ITEM0037 | -1.464 | 3.297 | 0.444 | 0.957 | 0.000 | 52.1 8.0

| 0.605* | 0.988* | 0.063* | 0.287* | 0.025* | (0.0000)

| | | | | |

ITEM0038 | -0.058 | 1.016 | 0.057 | 0.713 | 0.242 | 29.3 8.0

| 0.145* | 0.155* | 0.138* | 0.108* | 0.051* | (0.0003)

| | | | | |

ITEM0039 | -0.275 | 0.929 | 0.295 | 0.681 | 0.000 | 23.2 8.0

| 0.053* | 0.067* | 0.050* | 0.049* | 0.001* | (0.2647)

| | | | | |

ITEM0040 | -0.346 | 1.289 | 0.268 | 0.790 | 0.404 | 179.9 8.0

| 0.196* | 0.201* | 0.120* | 0.123* | 0.037* | (0.0000)

| | | | | |

ITEM0041 | 0.242 | 0.722 | -0.335 | 0.585 | 0.114 | 77.8 8.0

| 0.045* | 0.053* | 0.066* | 0.043* | 0.007* | (0.04498)

| | | | | |

ITEM0042 | -0.203 | 0.845 | 0.240 | 0.645 | 0.000 | 23.8 8.0

| 0.048* | 0.065* | 0.053* | 0.049* | 0.001* | (0.0006)

| | | | | |

ITEM0043 | -0.590 | 1.714 | 0.344 | 0.864 | 0.251 | 70.4 8.0

| 0.190* | 0.283* | 0.068* | 0.143* | 0.029* | (0.0000)

| | | | | |

ITEM0044 | -0.120 | 0.890 | 0.135 | 0.665 | 0.234 | 116.5 8.0

| 0.153* | 0.137* | 0.157* | 0.102* | 0.053* | (0.0000)

| | | | | |

ITEM0045 | 0.173 | 0.973 | -0.177 | 0.697 | 0.000 | 26.1 8.0

| 0.049* | 0.065* | 0.054* | 0.047* | 0.002* | (0.0875)

| | | | | |

ITEM0046 | 0.425 | 0.380 | -1.117 | 0.356 | 0.002 | 41.0 8.0

| 0.043* | 0.041* | 0.150* | 0.038* | 0.011* | (0.0000)

| | | | | |

ITEM0047 | 0.334 | 0.568 | -0.587 | 0.494 | 0.071 | 138.4 8.0

| 0.043* | 0.048* | 0.087* | 0.042* | 0.005* | (0.1341)

| | | | | |

ITEM0048 | -0.052 | 0.667 | 0.078 | 0.555 | 0.001 | 33.5 8.0

| 0.044* | 0.051* | 0.065* | 0.042* | 0.005* | (0.0001)

| | | | | |

ITEM0049 | 0.205 | 0.319 | -0.643 | 0.304 | 0.153 | 45.5 8.0

| 0.045* | 0.039* | 0.157* | 0.037* | 0.020* | (0.02293)

| | | | | |

ITEM0050 | 0.079 | 0.207 | -0.379 | 0.203 | 0.001 | 94.3 8.0

| 0.039* | 0.035* | 0.199* | 0.035* | 0.009* | (0.0861)

-----------------------------------------------------------------------------

* STANDARD ERROR

LARGEST CHANGE = 0.019468 2876.8 387.0
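
To help read the columns of the table above, the sketch below shows how the listed quantities appear to relate to one another and how a three-parameter logistic (3PL) item characteristic curve is evaluated from them. The function names are illustrative, and the scaling constant `d` is an assumption (1.0 for the logistic metric, 1.7 for the normal-ogive metric); which one applies depends on the options used in the run and is not shown in this output.

```python
import math


def p_correct_3pl(theta, slope, threshold, asymptote, d=1.0):
    """Probability of a correct response under the 3PL model.

    slope = a (discrimination), threshold = b (difficulty),
    asymptote = c (lower asymptote, or pseudo-guessing).
    d is the scaling constant; its value here is an assumption.
    """
    z = d * slope * (theta - threshold)
    return asymptote + (1 - asymptote) / (1 + math.exp(-z))


def intercept_from(slope, threshold):
    """INTERCEPT column: minus slope times threshold."""
    return -slope * threshold


def loading_from(slope):
    """LOADING column: slope / sqrt(1 + slope^2)."""
    return slope / math.sqrt(1 + slope ** 2)


# ITEM0001 from the table: slope 0.117, threshold 1.174, asymptote 0.080.
print(round(intercept_from(0.117, 1.174), 3))   # compare with -0.137
print(round(loading_from(0.117), 3))            # compare with 0.116
print(round(p_correct_3pl(1.174, 0.117, 1.174, 0.080), 2))
```

At an ability equal to the item threshold, the 3PL probability is halfway between the asymptote and 1, which is why the last line does not depend on the choice of `d`.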

APPENDIX I

BILOG-MG V3.0

REV 19990329.1300

BILOG-MG ITEM MAINTENANCE PROGRAM: LOGISTIC ITEM RESPONSE

MODEL

*** BILOG-MG ITEM MAINTENANCE PROGRAM ***

*** PHASE 2 ***

DIF MODEL ANALYSIS OF ECONOMICS ACHIEVEMENT TEST BY GENDER

0

>CALIB ACCel = 1.0000;

CALIBRATION PARAMETERS

======================

MAXIMUM NUMBER OF EM CYCLES: 20

MAXIMUM NUMBER OF NEWTON CYCLES: 2

CONVERGENCE CRITERION: 0.0100

ACCELERATION CONSTANT: 1.0000

LATENT DISTRIBUTION: EMPIRICAL PRIOR FOR EACH GROUP

ESTIMATED CONCURRENTLY

WITH ITEM PARAMETERS

REFERENCE GROUP: 1

PLOT EMPIRICAL VS. FITTED ICC'S: NO

DATA HANDLING: DATA ON SCRATCH FILE

CONSTRAINT DISTRIBUTION ON SLOPES: NO

CONSTRAINT DISTRIBUTION ON THRESHOLDS: NO 1

GROUP 1 MALE ; ITEM PARAMETERS AFTER CYCLE 3



-------------------------------------------------------------------------------

ITEM0001 | -0.102 | 0.535 | 0.190 | 0.472 | 0.000 | 120.2 8.0

| 0.060* | 0.007* | 0.113* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0002 | 0.231 | 0.535 | -0.431 | 0.472 | 0.000 | 68.2 8.0

| 0.063* | 0.007* | 0.118* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0003 | 0.264 | 0.535 | -0.492 | 0.472 | 0.000 | 5.3 8.0

| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.7242)

| | | | | |

ITEM0004 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 7.4 8.0

| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.4953)

| | | | | |

ITEM0005 | 0.181 | 0.535 | -0.338 | 0.472 | 0.000 | 3.5 8.0

| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.8999)

| | | | | |

ITEM0006 | 0.197 | 0.535 | -0.369 | 0.472 | 0.000 | 11.9 8.0

| 0.066* | 0.007* | 0.124* | 0.006* | 0.000* | (0.1559)

| | | | | |

ITEM0007 | 0.148 | 0.535 | -0.276 | 0.472 | 0.000 | 26.3 8.0

| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0009)

| | | | | |

ITEM0008 | 0.083 | 0.535 | -0.155 | 0.472 | 0.000 | 23.7 8.0

| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.0026)

| | | | | |

ITEM0009 | -0.231 | 0.535 | 0.432 | 0.472 | 0.000 | 6.2 8.0

| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.6230)

| | | | | |

ITEM0010 | 0.417 | 0.535 | -0.779 | 0.472 | 0.000 | 3.0 8.0

| 0.070* | 0.007* | 0.131* | 0.006* | 0.000* | (0.9334)

| | | | | |

ITEM0011 | 0.214 | 0.535 | -0.399 | 0.472 | 0.000 | 5.0 8.0

| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.7601)

| | | | | |

ITEM0012 | 0.400 | 0.535 | -0.747 | 0.472 | 0.000 | 6.6 8.0

| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.5837)

| | | | | |

ITEM0013 | 0.018 | 0.535 | -0.034 | 0.472 | 0.000 | 40.0 8.0

| 0.075* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0014 | 0.280 | 0.535 | -0.524 | 0.472 | 0.000 | 10.7 8.0

| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.2223)

| | | | | |

ITEM0015 | 0.383 | 0.535 | -0.714 | 0.472 | 0.000 | 6.3 8.0

| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.6171)

| | | | | |

ITEM0016 | 0.515 | 0.535 | -0.963 | 0.472 | 0.000 | 49.4 8.0

| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0017 | 0.340 | 0.535 | -0.635 | 0.472 | 0.000 | 92.8 8.0

| 0.062* | 0.007* | 0.117* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0018 | 0.018 | 0.535 | -0.034 | 0.472 | 0.000 | 89.8 8.0

| 0.078* | 0.007* | 0.145* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0019 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 30.1 8.0

| 0.068* | 0.007* | 0.127* | 0.006* | 0.000* | (0.0002)

| | | | | |

ITEM0020 | -0.183 | 0.535 | 0.341 | 0.472 | 0.000 | 13.2 8.0

| 0.070* | 0.007* | 0.132* | 0.006* | 0.000* | (0.1053)

| | | | | |

ITEM0021 | 0.357 | 0.535 | -0.666 | 0.472 | 0.000 | 9.5 8.0

| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.2983)

| | | | | |

ITEM0022 | 0.075 | 0.535 | -0.140 | 0.472 | 0.000 | 27.5 8.0

| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0006)

| | | | | |

ITEM0023 | 0.470 | 0.535 | -0.878 | 0.472 | 0.000 | 2.1 8.0

| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.9792)

| | | | | |

ITEM0024 | -0.126 | 0.535 | 0.236 | 0.472 | 0.000 | 80.2 8.0

| 0.078* | 0.007* | 0.147* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0025 | 0.264 | 0.535 | -0.492 | 0.472 | 0.000 | 20.7 8.0

| 0.074* | 0.007* | 0.139* | 0.006* | 0.000* | (0.0079)

| | | | | |

ITEM0026 | 0.206 | 0.535 | -0.384 | 0.472 | 0.000 | 16.1 8.0

| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0413)

| | | | | |

ITEM0027 | 0.231 | 0.535 | -0.431 | 0.472 | 0.000 | 107.2 8.0

| 0.062* | 0.007* | 0.115* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0028 | 0.409 | 0.535 | -0.763 | 0.472 | 0.000 | 4.0 8.0

| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.8573)

| | | | | |

ITEM0029 | 0.205 | 0.535 | -0.384 | 0.472 | 0.000 | 15.2 8.0

| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.0555)

| | | | | |

ITEM0030 | 0.452 | 0.535 | -0.845 | 0.472 | 0.000 | 23.8 8.0

| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.0024)

| | | | | |

ITEM0031 | 0.156 | 0.535 | -0.292 | 0.472 | 0.000 | 10.4 8.0

| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.2405)

| | | | | |

ITEM0032 | 0.497 | 0.535 | -0.929 | 0.472 | 0.000 | 9.2 8.0

| 0.073* | 0.007* | 0.136* | 0.006* | 0.000* | (0.3239)

| | | | | |

ITEM0033 | 0.239 | 0.535 | -0.446 | 0.472 | 0.000 | 6.9 8.0

| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.5420)

| | | | | |

ITEM0034 | 0.010 | 0.535 | -0.019 | 0.472 | 0.000 | 9.0 8.0

| 0.072* | 0.007* | 0.134* | 0.006* | 0.000* | (0.3430)

| | | | | |

ITEM0035 | 0.019 | 0.535 | -0.035 | 0.472 | 0.000 | 0.9 8.0

| 0.069* | 0.007* | 0.129* | 0.006* | 0.000* | (0.9958)

| | | | | |

ITEM0036 | 0.043 | 0.535 | -0.080 | 0.472 | 0.000 | 15.7 8.0

| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.0472)

| | | | | |

ITEM0037 | 0.255 | 0.535 | -0.477 | 0.472 | 0.000 | 10.4 8.0

| 0.074* | 0.007* | 0.138* | 0.006* | 0.000* | (0.2357)

| | | | | |

ITEM0038 | 0.382 | 0.535 | -0.714 | 0.472 | 0.000 | 10.1 8.0

| 0.075* | 0.007* | 0.140* | 0.006* | 0.000* | (0.2557)

| | | | | |

ITEM0039 | -0.248 | 0.535 | 0.463 | 0.472 | 0.000 | 44.9 7.0

| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0040 | 0.552 | 0.535 | -1.031 | 0.472 | 0.000 | 10.4 8.0

| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.2403)

| | | | | |

ITEM0041 | 0.272 | 0.535 | -0.508 | 0.472 | 0.000 | 31.1 8.0

| 0.076* | 0.007* | 0.142* | 0.006* | 0.000* | (0.0001)

| | | | | |

ITEM0042 | -0.223 | 0.535 | 0.417 | 0.472 | 0.000 | 31.1 8.0

| 0.073* | 0.007* | 0.137* | 0.006* | 0.000* | (0.0001)

| | | | | |

ITEM0043 | 0.272 | 0.535 | -0.508 | 0.472 | 0.000 | 22.8 8.0

| 0.076* | 0.007* | 0.141* | 0.006* | 0.000* | (0.0037)

| | | | | |

ITEM0044 | 0.409 | 0.535 | -0.763 | 0.472 | 0.000 | 5.3 8.0

| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.7217)

| | | | | |

ITEM0045 | 0.205 | 0.535 | -0.383 | 0.472 | 0.000 | 99.8 8.0

| 0.081* | 0.007* | 0.152* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0046 | 0.534 | 0.535 | -0.997 | 0.472 | 0.000 | 13.3 8.0

| 0.071* | 0.007* | 0.132* | 0.006* | 0.000* | (0.1025)

| | | | | |

ITEM0047 | 0.206 | 0.535 | -0.384 | 0.472 | 0.000 | 4.7 8.0

| 0.071* | 0.007* | 0.133* | 0.006* | 0.000* | (0.7911)

| | | | | |

ITEM0048 | -0.046 | 0.535 | 0.086 | 0.472 | 0.000 | 18.1 8.0

| 0.073* | 0.007* | 0.136* | 0.006* | 0.000* | (0.0208)

| | | | | |

ITEM0049 | 0.197 | 0.535 | -0.369 | 0.472 | 0.000 | 17.9 8.0

| 0.067* | 0.007* | 0.125* | 0.006* | 0.000* | (0.0220)

| | | | | |

ITEM0050 | 0.083 | 0.535 | -0.156 | 0.472 | 0.000 | 141.6 8.0

| 0.060* | 0.007* | 0.112* | 0.006* | 0.000* | (0.0000)

-------------------------------------------------------------------------------

* STANDARD ERROR

LARGEST CHANGE = 0.005386 3540.2 378.0

(0.0000)

GROUP 2 FEMALE ; ITEM PARAMETERS AFTER CYCLE 3



-------------------------------------------------------------------------------

ITEM0001 | -0.202 | 0.535 | 0.377 | 0.472 | 0.000 | 266.8 8.0

| 0.044* | 0.007* | 0.082* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0002 | 0.277 | 0.535 | -0.518 | 0.472 | 0.000 | 113.5 8.0

| 0.047* | 0.007* | 0.088* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0003 | 0.089 | 0.535 | -0.167 | 0.472 | 0.000 | 22.2 8.0

| 0.048* | 0.007* | 0.091* | 0.006* | 0.000* | (0.0046)

| | | | | |

ITEM0004 | 0.219 | 0.535 | -0.410 | 0.472 | 0.000 | 7.5 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.4884)

| | | | | |

ITEM0005 | 0.163 | 0.535 | -0.304 | 0.472 | 0.000 | 16.8 8.0

| 0.049* | 0.007* | 0.092* | 0.006* | 0.000* | (0.0323)

| | | | | |

ITEM0006 | 0.085 | 0.535 | -0.159 | 0.472 | 0.000 | 40.8 8.0

| 0.049* | 0.007* | 0.091* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0007 | 0.098 | 0.535 | -0.182 | 0.472 | 0.000 | 36.9 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0008 | 0.067 | 0.535 | -0.126 | 0.472 | 0.000 | 37.6 8.0

| 0.054* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0009 | -0.129 | 0.535 | 0.241 | 0.472 | 0.000 | 6.5 8.0

| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.5931)

| | | | | |

ITEM0010 | 0.277 | 0.535 | -0.517 | 0.472 | 0.000 | 3.2 8.0

| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.9211)

| | | | | |

ITEM0011 | 0.093 | 0.535 | -0.175 | 0.472 | 0.000 | 6.4 8.0

| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.6034)

| | | | | |

ITEM0012 | 0.381 | 0.535 | -0.712 | 0.472 | 0.000 | 19.8 8.0

| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0110)

| | | | | |

ITEM0013 | -0.142 | 0.535 | 0.266 | 0.472 | 0.000 | 105.6 8.0

| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0014 | 0.381 | 0.535 | -0.712 | 0.472 | 0.000 | 10.7 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.7028)

| | | | | |

ITEM0015 | 0.184 | 0.535 | -0.344 | 0.472 | 0.000 | 22.3 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0044)

| | | | | |

ITEM0016 | 0.572 | 0.535 | -1.069 | 0.472 | 0.000 | 109.2 8.0

| 0.049* | 0.007* | 0.092* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0017 | 0.368 | 0.535 | -0.687 | 0.472 | 0.000 | 242.6 8.0

| 0.045* | 0.007* | 0.085* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0018 | 0.097 | 0.535 | -0.182 | 0.472 | 0.000 | 90.4 8.0

| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0019 | 0.286 | 0.535 | -0.534 | 0.472 | 0.000 | 30.0 8.0

| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0002)

| | | | | |

ITEM0020 | -0.185 | 0.535 | 0.345 | 0.472 | 0.000 | 20.3 8.0

| 0.050* | 0.007* | 0.093* | 0.006* | 0.000* | (0.0094)

| | | | | |

ITEM0021 | 0.385 | 0.535 | -0.720 | 0.472 | 0.000 | 9.8 8.0

| 0.054* | 0.007* | 0.102* | 0.006* | 0.000* | (0.0188)

| | | | | |

ITEM0022 | 0.041 | 0.535 | -0.077 | 0.472 | 0.000 | 147.4 8.0

| 0.057* | 0.007* | 0.107* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0023 | 0.317 | 0.535 | -0.592 | 0.472 | 0.000 | 4.2 8.0

| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.8359)

| | | | | |

ITEM0024 | -0.070 | 0.535 | 0.130 | 0.472 | 0.000 | 134.8 8.0

| 0.057* | 0.007* | 0.106* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0025 | 0.137 | 0.535 | -0.255 | 0.472 | 0.000 | 4.5 8.0

| 0.051* | 0.007* | 0.096* | 0.006* | 0.000* | (0.8131)

| | | | | |

ITEM0026 | 0.211 | 0.535 | -0.394 | 0.472 | 0.000 | 71.9 8.0

| 0.048* | 0.007* | 0.089* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0027 | 0.368 | 0.535 | -0.686 | 0.472 | 0.000 | 107.2 8.0

| 0.047* | 0.007* | 0.088* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0028 | 0.349 | 0.535 | -0.652 | 0.472 | 0.000 | 6.7 8.0

| 0.054* | 0.007* | 0.100* | 0.006* | 0.000* | (0.5697)

| | | | | |

ITEM0029 | 0.132 | 0.535 | -0.246 | 0.472 | 0.000 | 200.0 8.0

| 0.059* | 0.007* | 0.110* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0030 | 0.513 | 0.535 | -0.958 | 0.472 | 0.000 | 45.0 8.0

| 0.056* | 0.007* | 0.105* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0031 | 0.141 | 0.535 | -0.262 | 0.472 | 0.000 | 101.8 8.0

| 0.057* | 0.007* | 0.106* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0032 | 0.331 | 0.535 | -0.618 | 0.472 | 0.000 | 61.0 8.0

| 0.048* | 0.007* | 0.090* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0033 | 0.295 | 0.535 | -0.551 | 0.472 | 0.000 | 5.9 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.6534)

| | | | | |

ITEM0034 | 0.189 | 0.535 | -0.352 | 0.472 | 0.000 | 21.0 8.0

| 0.053* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0071)

| | | | | |

ITEM0035 | -0.082 | 0.535 | 0.153 | 0.472 | 0.000 | 68.6 8.0

| 0.048* | 0.007* | 0.090* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0036 | 0.076 | 0.535 | -0.142 | 0.472 | 0.000 | 12.4 8.0

| 0.051* | 0.007* | 0.095* | 0.006* | 0.000* | (0.1341)

| | | | | |

ITEM0037 | 0.313 | 0.535 | -0.584 | 0.472 | 0.000 | 9.4 8.0

| 0.052* | 0.007* | 0.098* | 0.006* | 0.000* | (0.3085)

| | | | | |

ITEM0038 | 0.344 | 0.535 | -0.643 | 0.472 | 0.000 | 4.2 8.0

| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.8371)

| | | | | |

ITEM0039 | -0.151 | 0.535 | 0.282 | 0.472 | 0.000 | 78.9 8.0

| 0.055* | 0.007* | 0.103* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0040 | 0.489 | 0.535 | -0.913 | 0.472 | 0.000 | 14.5 8.0

| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.0702)

| | | | | |

ITEM0041 | 0.215 | 0.535 | -0.401 | 0.472 | 0.000 | 11.1 8.0

| 0.053* | 0.007* | 0.098* | 0.006* | 0.000* | (0.1975)

| | | | | |

ITEM0042 | -0.099 | 0.535 | 0.186 | 0.472 | 0.000 | 68.4 8.0

| 0.055* | 0.007* | 0.102* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0043 | 0.167 | 0.535 | -0.312 | 0.472 | 0.000 | 11.2 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.1931)

| | | | | |

ITEM0044 | 0.232 | 0.535 | -0.434 | 0.472 | 0.000 | 24.5 8.0

| 0.053* | 0.007* | 0.100* | 0.006* | 0.000* | (0.0019)

| | | | | |

ITEM0045 | 0.149 | 0.535 | -0.279 | 0.472 | 0.000 | 83.8 8.0

| 0.056* | 0.007* | 0.104* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0046 | 0.418 | 0.535 | -0.781 | 0.472 | 0.000 | 13.3 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.4978)

| | | | | |

ITEM0047 | 0.404 | 0.535 | -0.755 | 0.472 | 0.000 | 2.0 8.0

| 0.053* | 0.007* | 0.099* | 0.006* | 0.000* | (0.9803)

| | | | | |

ITEM0048 | -0.027 | 0.535 | 0.050 | 0.472 | 0.000 | 18.0 8.0

| 0.052* | 0.007* | 0.097* | 0.006* | 0.000* | (0.0214)

| | | | | |

ITEM0049 | 0.242 | 0.535 | -0.451 | 0.472 | 0.000 | 76.2 8.0

| 0.048* | 0.007* | 0.089* | 0.006* | 0.000* | (0.0000)

| | | | | |

ITEM0050 | 0.085 | 0.535 | -0.159 | 0.472 | 0.000 | 228.0 8.0

| 0.045* | 0.007* | 0.083* | 0.006* | 0.000* | (0.0000)

-------------------------------------------------------------------------------

* STANDARD ERROR
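
The columns in the listings above appear to be linked by simple identities, and a short check can make those links explicit. The sketch below is illustrative only and is not part of the program output. It assumes the columns follow the usual layout of intercept, slope, threshold, loading, asymptote, chi-square and degrees of freedom, with threshold = -intercept / slope and loading = slope / sqrt(1 + slope^2). The function names are hypothetical, and the chi-square tail probability uses the closed form that holds only for even degrees of freedom.

```python
import math


def threshold(intercept, slope):
    """Threshold (difficulty) implied by an intercept and a slope."""
    return -intercept / slope


def loading(slope):
    """Factor loading implied by a slope on the normal-ogive scale."""
    return slope / math.sqrt(1.0 + slope ** 2)


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability, closed form for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term = 1.0   # (x/2)^k / k! for k = 0
    total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


if __name__ == "__main__":
    # Values copied from the male-group row for ITEM0025 (assumed column order).
    print(round(threshold(0.264, 0.535), 3))
    print(round(loading(0.535), 3))
    print(round(chi2_sf_even_df(20.7, 8), 4))
```

Because the printed values are rounded to three decimals (one decimal for chi-square), recomputed figures can differ from the listing in the last digit.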
