test design
Post on 07-Nov-2014
Language Assessment Design Dordt College Placement Exam
Vicky Fang & Hala Sun
Original Test Design: Placement Exam ii
Table of Contents
BACKGROUND INFORMATION
  OVERVIEW
    History
    Target population
    Purpose of the placement test
    Test description
  TEST DEVELOPMENT PROCESS
  GENERAL ADMINISTRATION PROCEDURES
  CONSTRUCTS
    I. Listening comprehension
    II. Grammar
    III. Reading comprehension
    IV. Writing ability
    V. Oral skills
SCORING AND INTERPRETATION
ANALYSIS
  APPLYING WESCHE’S (1983) FOUR COMPONENTS
  APPLYING SWAIN’S (1983) FOUR PRINCIPLES
EXAMINING VALIDITY AND RELIABILITY
  VALIDITY OF M-C QUESTIONS
    Item facility
    Distractor analysis
    Item discrimination
    Response frequency distribution
  RELIABILITY
    Inter-rater reliability
SUBTEST RELATIONSHIPS
DISCUSSION
CONCLUSION
REFERENCES
Background and Information
Overview
History. In 2012, as a language assessment project with Dr. Kathleen M. Bailey, we
(Hala Sun and Vicky Fang of the Monterey Institute of International Studies [MIIS]) re-designed
the Dordt College Placement Test (DCPT) (Hala Sun is a Dordt College alumna). The DCPT is a
specialized assessment tool to measure incoming international and exchange students’ academic
English language proficiency, specifically their listening, reading, writing, and speaking skills as
well as their grammatical knowledge. Unlike the previous DCPT, this newly designed test
includes a section called “Grammar.”
Concurrent with this language assessment project, we also designed an academic writing
course curriculum for the English for Academic Purposes (EAP) program at Dordt for our
curriculum design project. As part of the design process, we surveyed the current and the past
international students and interviewed the EAP and the English instructors at Dordt College.
Based on our needs analysis, we learned that the need to improve international students’
grammatical competence was crucial. Furthermore, we also found out that the DCPT, which
determines whether students need to take the EAP courses during their first semester, was
designed 16 years ago in 1996 by the current EAP course instructor, Sanneke Kok. We strongly
felt the need to “update” the stimulus materials presented in the previous DCPT because we
believe that the relevancy and the authenticity of stimulus materials affect the abilities that we
want to assess (Wesche, 1983). Based on our interview, Instructor Kok had attempted to update
the DCPT, but due to limited time and resources, she was only able to make minor changes to the
scoring rubrics 2 years ago; the content and the type of test methods were not revised. The
stimulus material for the listening comprehension subtest (a mock lecture) was slightly changed
over the years--the professor giving the lecture was changed. In addition, there had been no tests
conducted to examine the reliability and the validity of the DCPT (Sanneke Kok, personal
communication, September 24, 2012).
Applying all the concepts from our language assessment course, we envisioned this test
to be comprehensive and appropriate to the needs of the stakeholders. This newly designed
DCPT is still “organic” and may need follow-up revisions upon administering this test to
incoming international students at Dordt College. Nevertheless, we feel confident about the
foundations of this test because (1) we designed this test, following the decision-making format
presented by Alderson, Clapham, and Wall (1995); (2) we pre-piloted and piloted the new DCPT
with the current international students at Dordt; and (3) we ran various statistical tests to ensure
the validity and the reliability of this test.
Target population. All students admitted to Dordt College whose English is not their
native language (this includes exchange students and ESL students) are required to take the
DCPT. According to Instructor Kok, the number of international students admitted to Dordt
varies every year, but on average, 10 to 12 students take her EAP courses every semester. To
“pass” the old DCPT, students need to score at least 80% on the essay writing and 70% on
each of the remaining subtests (listening, reading comprehension, and speaking). Through our
needs analysis, we found out that the English professors have high academic expectations of
their students, especially in writing and grammar competency. Therefore, we decided to keep the
current 70%–80% standard except for one minor change. We included the newly added subtest,
grammar, into the 80% standard category. The students who do not score 80% or above on both
the essay writing and the grammar subtests are required to take the EAP reading and writing
course (Academic Writing from Sources). Similarly, if students do not score 70% or above on
each of the listening, reading comprehension, and speaking subtests, they have to take the EAP
speaking and listening course (Academic Interaction).
Purpose of the placement test. The revised DCPT is designed to provide an accurate
evaluation of international students’ academic English language skills, assessing their potential
to be successful in their college academic life. Specifically, this test helps determine whether
these international students have sufficient academic English skills and knowledge to take the
general courses at Dordt, especially English courses. International students who do not “pass”
the placement exam have to take either or both of the EAP courses offered in their first semester.
Once the international students complete these EAP courses, then they can register for general
English core courses.
As we re-designed the DCPT, we constantly made sure that the constructs assessed in our
DCPT matched the two EAP courses offered. We reviewed the curricula of these two courses
because we wanted to examine whether the areas or skills in which students need further
improvement, based on the results of their DCPT, are covered in the current EAP courses.
Currently, the Academic Writing from Sources course helps students to improve their academic
reading and writing skills, especially focusing on how to integrate various sources and to make
appropriate citations using standard documentation styles. The Academic Interaction course
focuses on helping students to develop and strengthen their speaking and listening skills of
academic English.
Test description. To understand the new DCPT, we must first examine the original
placement test created by the EAP instructor, Sanneke Kok. The previous DCPT had the
following four subtests:
Subtest 1: Oral interview. This subtest required the test-takers to answer several
questions posed by a test administrator. This subtest assessed the test-takers’ oral fluency and
accuracy in speaking.
Subtest 2: Mini article. This subtest comprised 10 multiple-choice questions based on a
sample reading. This subtest assessed the test-takers’ vocabulary knowledge and reading
comprehension.
Subtest 3: Mini lecture. This subtest required the test-takers to watch a video clip of a
mock lecture, testing their listening comprehension. After watching the clip, students had to
answer the true or false questions presented orally by the lecturer in the video. In addition,
students had to fill in the missing blanks of the given table.
Subtest 4: Writing prompt. For this subtest, the test-takers had to compose an essay
according to the given prompt. This subtest assessed the test-takers’ academic writing ability,
which includes organization skills and grammar.
Subtests 1, 2, and 3 were used to determine whether the test-takers needed to take the
Academic Interaction course, and subtest 4 was used to decide whether the test-takers had to take
the Academic Writing from Sources course.
For our newly designed test, we have an overarching theme of language learning. This test has
the following five subtests:
Subtest 1: Listening comprehension. This subtest consists of five short-answer questions
and measures the test-takers’ ability to comprehend a speech from a video clip (a TED Talk). In this
video, the presenter discusses the concept of “English mania” and the implications of the spread or
the dominance of the English language. The test-takers have to identify various pieces of important
information from the video to be able to answer the short-answer questions. The maximum
allotted time for this subtest is 10 minutes.
Subtest 2: Grammar. This is a cloze-elide subtest, in which the test-takers have to
identify 15 “extra” words that make the sentence(s) within the given text ungrammatical. The
test-takers are required to cross out these extra words. The instructions indicate that there are
exactly 15 “extra” words to cross out. This given text, taken from the New York Times
newspaper, relates to the topic of language learning and immersion. The maximum allotted time
for this subtest is 5 minutes.
Subtest 3: Reading comprehension. This third subtest consists of 10 multiple-choice (M-
C) questions and measures the test-takers’ vocabulary knowledge, reading comprehension, and
grammar. There are two articles within this subtest, each having five M-C questions. The first
article is a short narrative story of the world’s oldest learner. The second part is an excerpt of an
article that discusses the influence of mother tongue and language learning experience. The
maximum allotted time for this subtest is 20 minutes.
Subtest 4: Mini-essay writing. This subtest measures the test-takers’ academic writing
ability. Specifically, the essay’s content and organization are assessed as well as the test-takers’
correct use of grammar. This subtest requires the test-takers to explain using specific examples
whether or not they think learning English is important. The maximum allotted time for this
subtest is 20 minutes. The test-takers are required to write at least 180 words.
Subtest 5: Oral interview. This subtest measures the test-takers’ speaking ability,
specifically their fluency and accuracy in speech (e.g., grammar, pronunciation, and coherence).
Furthermore, their content and comprehension are also assessed by examining the relevancy of
their answer to the given prompt including their examples to support their stance. The test-takers
are given 2 minutes to prepare and up to 3 minutes to answer the prompt. The prompt is written
as follows:
In the United States, many universities require students to learn an additional language
other than their native language. Do you support the idea that university students should
be required to learn an additional language (other than their native languages)? Why or
why not?
For this new DCPT, subtests 1, 3, and 5 will be used to determine whether the test-takers
need to take the Academic Interaction course, and subtests 2 and 4 will be used to decide whether
the test-takers have to take the Academic Writing from Sources course.
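The placement rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function and variable names are hypothetical, the points possible are those reported in Table 2, and the example scores belong to an invented test-taker rather than to any pilot participant.

```python
# Points possible per subtest, as reported in Table 2.
POINTS_POSSIBLE = {"listening": 10, "grammar": 15, "reading": 10, "writing": 100, "oral": 40}

# Hypothetical structure: subtest groupings and cutoffs (in percent) follow the text.
COURSE_RULES = {
    "Academic Interaction": (("listening", "reading", "oral"), 70),
    "Academic Writing from Sources": (("grammar", "writing"), 80),
}


def required_courses(scores):
    """Return the EAP courses a test-taker must take, given raw subtest scores."""
    courses = []
    for course, (subtests, cutoff_percent) in COURSE_RULES.items():
        # A test-taker is exempt only if every subtest in the group meets the cutoff.
        exempt = all(
            scores[name] * 100 >= cutoff_percent * POINTS_POSSIBLE[name]
            for name in subtests
        )
        if not exempt:
            courses.append(course)
    return courses


# Invented test-taker: strong in grammar and writing, weaker in reading.
example = {"listening": 10, "grammar": 13, "reading": 6, "writing": 90, "oral": 38}
print(required_courses(example))
```

The comparison is done on integers scaled by 100 so that a score sitting exactly on a cutoff (for example, 7 out of 10) is not affected by floating-point rounding.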
Test Development Process
The following table shows the steps we took to design this new DCPT:
Table 1
Dordt College Placement Test Development Process
Step 1: Decision-making
- Examined the (old) DCPT
- Familiarized ourselves with the target population and the setting (including college goals and curricula)
- Conducted a needs analysis of the stakeholders
- Chose the constructs and the types of subtests
- Provided definition for each construct
- Determined the test methods for each construct
Step 2: Designing
- Gathered relevant, useful, and motivating stimulus materials
- Designed one subtest at a time
- Allocated specific time for each subtest
- Created the scoring criteria for each subtest
Step 3: Pre-piloting
- Pre-piloted the test with 3 TESOL MIIS classmates and the course instructor
- Revised the test based on the feedback and test results (e.g., lessened the time allotted for
each subtest and revised M-C items that were misleading, confusing, or too
obvious)
Step 4: Piloting
- Sent the revised DCPT to Instructor Kok to pilot/administer this test
- Instructor Kok returned 10 current international students’ DCPT, including the recorded
oral interview via DVD
- Scored the exams
Step 5: Analysis
- Analyzed the validity of the M-C subtest, using item facility, item discriminability,
distractor analysis, and response frequency distribution
- Analyzed the reliability of the objectively scored subtests using the split-half method
- Analyzed the reliability of the scorers using inter-rater reliability
Step 6: Reflections & Revisions
- Decided to make the oral interview prompt simpler (some students did not answer the
question asked)
- Made minor changes to the oral scoring criteria
- Created a model or an example for the cloze-elide test (instead of crossing out the words,
some students underlined or circled the words)
- Added more lines in the essay; some students did not meet the minimum word
requirement; we assume that the test-takers concluded their writing when they saw the
lines ending or lacking
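Two of the analyses named in Step 5 can be illustrated with a short Python sketch. It rests on several assumptions that the table does not state: the response matrix is invented, item facility is taken as the proportion of test-takers answering an item correctly, the split is by odd and even items, and the Spearman-Brown correction is applied to the half-test correlation. All function names are hypothetical.

```python
import math


def item_facility(item_responses):
    """Proportion of test-takers who answered one item correctly (1 = correct, 0 = wrong)."""
    return sum(item_responses) / len(item_responses)


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def split_half_reliability(score_matrix):
    """Odd-even split-half reliability with the Spearman-Brown correction (assumed)."""
    odd_half = [sum(row[0::2]) for row in score_matrix]   # items 1, 3, 5, ...
    even_half = [sum(row[1::2]) for row in score_matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd_half, even_half)
    return 2 * r_half / (1 + r_half)


# Invented responses: one row per test-taker, one column per item.
responses = [
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 0, 0],
]
first_item = [row[0] for row in responses]
print(item_facility(first_item))
print(split_half_reliability(responses))
```

The sketch does not guard against a half-test with zero variance, which would make the correlation undefined; with only 10 pilot test-takers, that case is worth checking before interpreting the coefficient.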
General Administration Procedures
For test administration, Dordt College has two teams—the logistics team and the oral
interview team. The logistics team members are student volunteers recruited by Instructor Kok in
advance. Instructor Kok provides a 1-hour training to the student volunteers. We adapted the
current logistics guide and made minor changes. See Appendix A for the new logistics guide we
created and Appendix B for the original guide.
In addition, Instructor Kok recruits the oral interview team, which she refers to as
the Entrance Interview for International/ESL Students (EIIS) team. Every year, there are about
five to six EIIS teams, each consisting of three faculty members from various
disciplines (both male and female). Similar to the logistics team, the EIIS team members receive
a 1-hour training from Instructor Kok. During the training session, Instructor Kok briefly
discusses the topic of language acquisition, as well as the benefits of being an EIIS team
member, such as gaining a “snap shot” of the new international students’ abilities and needs
(personal communication, November 14, 2012). For a sample oral interview schedule (provided
by Instructor Kok), see Appendix C.
Constructs
With Alderson, Clapham, and Wall’s (1995) guidance on developing test specifications,
we identified the five constructs for the new placement test. These are listening comprehension,
grammatical knowledge, reading comprehension, writing ability, and oral skills. In addressing
the issue of test methods, Bailey (1998) points out that indirect tests may fail to provide valid
assessment of a construct and may also have negative washback on test-takers. Wesche (1983)
also argues that integrative and direct tests are better at predicting students’ use of the target
language in real life. Thus, when designing the placement test, we tried to incorporate direct and
integrative test methods to measure each construct.
I. Listening comprehension. In defining the listening construct, Buck (2001) argues that
listening tests need to be contextualized, “knowledge-independent”, “require fast, automatic,
on-line processing of texts” and “go beyond literal meaning” (p. 113). In keeping with this
definition, we made a listening task that requires test-takers to respond to five short-answer
questions after watching a four-minute video. By doing so, we simulated a mini lecture to test
students’ abilities to recall specific words as well as students’ comprehension of the overall
speech.
II. Grammar. From our interview with Instructor Kok as well as the four English
professors at Dordt College, we learned that the institution’s educational philosophy
emphasizes students’ grammatical competence. Therefore, we added the grammar
section in designing the test. In defining the concept of grammar, the Longman dictionary (2010)
states that “it usually takes into account the meanings and functions these sentences have in the
overall system of the language” (p. 252). Citing Larsen-Freeman (1991, 1997), Brown (2010)
also argues that grammatical knowledge includes grammatical forms, grammatical meanings, and
pragmatic meaning. To implement the idea that grammatical forms are intimately associated
with grammatical meanings as well as pragmatic meaning, we inserted the grammar problems
that English as a Second Language (ESL) learners may encounter into an article. The grammar
problems we addressed in the test include the use of articles and prepositions, adjective usage, verb
tense and subject-verb agreement. These grammar problems were intentionally selected from the
grammar criteria addressed in the analytic rubric of writing (see Appendix D for the scoring
criteria). By doing this, we hope to raise the test-takers’ awareness of these grammar problems
when they compose their writings.
III. Reading comprehension. Hedgcock and Ferris (2009) mention that from a bottom-
up view of reading, the reader starts from small units such as words and works towards large
units such as written discourse; from the top-down view, the reader’s understanding of a text is
the product of the reader’s background knowledge of the text and the information given by the
text. Thus, we designed items that lead the test-takers to adopt both approaches to
comprehend the reading passage (Alderson, 2000). The bottom-up items include questions
asking for the interpretation of specific words. The top-down items include questions that require
the test-takers to paraphrase a sentence and recognize the implied message of a text. We included
two readings in the section, which consists of 10 multiple-choice questions.
IV. Writing ability. To study academic writing, students need to master the process of
structuring ideas into a piece of writing which shares the convention of a specific type of text
(Ferris & Hedgcock, 1998). To measure the students’ writing skills, we decided to assess the
students’ ability to write an expository essay, which is a common essay genre that college
students often encounter in academic life (Purdue Online Writing Lab, 2010). Thus, test-takers
need to write an essay of about 180-250 words to state, explain, and support their views on the
given prompt. Based on Purdue Online Writing Lab (2012), the structure of the expository essay
consists of the following main components:
- A clear, concise, and defined thesis statement that occurs in the first paragraph of the
essay.
- Clear and logical transitions between the introduction, body and conclusion.
- Body paragraphs that include evidential support (whether factual, logical, statistical, or
anecdotal).
We used these descriptions to revise the analytic rubric developed by Instructor Kok to assess
students’ writing ability.
V. Oral skills. Luoma (2004) defines speaking tasks as “activities that involve speakers
in using language for the purpose of achieving a particular goal or objective in a particular
speaking situation” (p. 31). To effectively assess the construct, we created a prompt that requires
test-takers to expound on an argument based on a given topic. Test-takers will have two minutes
to prepare their speech and three minutes to perform their speech orally. During the preparation
time, test-takers are also allowed to jot down some notes for their speech.
By having students relate the issue to a familiar environment, we hope the students will
gain confidence in discussing the topic. We also hope to maximize their opportunity to express
themselves in English by providing concrete personal examples.
Scoring and Interpretations
Different scoring criteria are used to evaluate each construct. Reading comprehension and
Grammar are both objectively scored subtests. The Reading comprehension subtest uses
multiple-choice questions to test students’ reading ability. There are 10 multiple-choice
questions, each worth 1 point. For the Grammar subtest, we created a cloze-elide test method to
measure students’ grammatical knowledge. The test-taker receives one point for each extra word
that he/she crosses out in the article. If the test-taker crosses out the wrong word, no points are
deducted from his/her score. The cloze-elide test contains 15 extra words, so the grammar subtest
is worth 15 points.
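The cloze-elide scoring rule can be expressed as a small Python sketch. Representing the key and the test-taker's crossings as sets of word positions is an assumption made for illustration, as are the function name and the example data.

```python
def score_cloze_elide(crossed_out, extra_words):
    """Score one point per correctly crossed-out extra word.

    Both arguments are collections of word positions in the text (an assumed
    representation). Wrongly crossed-out words are ignored, not penalized.
    """
    return len(set(crossed_out) & set(extra_words))


# Invented key with three extra words (the actual subtest has 15).
key = {4, 11, 27}
print(score_cloze_elide([4, 27, 30], key))  # position 30 is a wrong crossing
```

Using a set intersection means that crossing out the same word twice cannot earn a second point, and the maximum score is always the number of extra words in the key.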
We used both the exact-word method and the acceptable-word method to evaluate the listening
comprehension construct. Bailey (1998) introduces these two scoring methods for evaluating cloze tests.
Under the exact-word scoring method, the test-taker gets credit only when he/she writes down the
exact word in the response. In contrast, with acceptable-word scoring, the test-taker can get
credit when his/her response is “grammatically correct” and “makes good sense in the context”
(p. 61). Both methods have merits and demerits. We used the exact-word method for
evaluating responses that require accurate information from the listening, and we used the
acceptable-word method to assess the test-taker’s comprehension of the overall content of the
listening. For each item scored by the acceptable-word method, we made a list of acceptable
answers. The listening subtest is worth 10 points in total, and each question is worth 2 points.
One point is deducted if the test-taker does not respond to the acceptable-word questions in a
complete sentence.
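As a rough illustration of how the two scoring methods differ, the following Python sketch scores a single listening item. It is a simplification under stated assumptions: simple text matching stands in for the rater's judgment, whether a response is a complete sentence is supplied by the rater as a flag, and the function names and example answers are invented rather than taken from the actual answer key.

```python
import re

POINTS_PER_ITEM = 2


def tokens(text):
    """Lowercase a response and split it into word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score_exact(response, exact_word):
    """Exact-word method: full credit only if the exact key word appears."""
    return POINTS_PER_ITEM if exact_word.lower() in tokens(response) else 0


def score_acceptable(response, acceptable_answers, is_complete_sentence):
    """Acceptable-word method: credit if any listed answer appears in the response.

    One point is deducted when the response is not a complete sentence.
    """
    # Pad with spaces so that only whole words or phrases can match.
    text = " " + " ".join(tokens(response)) + " "
    matched = any(" " + " ".join(tokens(a)) + " " in text for a in acceptable_answers)
    if not matched:
        return 0
    return POINTS_PER_ITEM if is_complete_sentence else POINTS_PER_ITEM - 1


# Invented key and responses, for illustration only.
print(score_exact("It is China.", "china"))
print(score_acceptable("They want a better life.", ["better life", "opportunity"], True))
print(score_acceptable("better life", ["better life", "opportunity"], False))
```

In practice a rater would still need to judge paraphrases that are not on the prepared list, so a sketch like this is better suited to checking a tally than to replacing the rater.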
The oral and writing subtests are both scored subjectively according to the respective
analytic rubrics. In setting up the rubrics for the oral presentation and essay writing, we revised
the analytic rubrics used in the original placement test. The analytic rubric for writing evaluates
three aspects: content, organization, and grammar (see Appendix D). Based on the
needs analysis we conducted for our curriculum design project, we knew that both the
international students and the English department at Dordt College value grammatical
competence in language learning. Therefore, we kept grammar weighted at 50% of the total
possible writing score (100 points).
The rubric for the oral test was retained at first, but we found that this rubric was not
appropriate to score the oral test we designed. The old DCPT oral test was in the form of an
interview, so the rubrics included comprehension of the interview questions. However, the oral
test we designed is a presentation in a given scenario, so our rubrics need to assess whether the
student appropriately provides an answer based on the given prompt or not. In designing the new
rubrics for the oral test, we emphasized three main aspects of a speech: content, accuracy and
fluency (see Appendix D for the scoring criteria). The new rubrics increased the total points of
the oral test to 40.
Table 2 presents our descriptive statistics based on the results of the new DCPT:
Table 2
Dordt College Placement Test Descriptive Statistics (N = 10)
Test        Points Possible   Mean     Mode     Median   Range   Standard Deviation   Variance
Listening   10                8.4      10       10       8       2.76                 7.6
Grammar     15                9        10, 13   10       14      4.64                 21.56
Reading     10                6.7      8, 6     7        7       1.95                 3.79
Writing     100               69.4     N/A      71.75    52.5    16.84                283.6
Oral        40                28.9     N/A      29       19.5    5.27                 27.82
Total       175               122.4    N/A      127.75   101     31.46                344.37
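The subtest rows of Table 2 can be reproduced with Python's standard `statistics` module. The sketch below uses the listening and grammar scores listed in Tables 3 and 4; the helper name `describe` is hypothetical. The reported standard deviations and variances are consistent with the sample (n - 1) formulas, which is what `statistics.stdev` and `statistics.variance` compute.

```python
import statistics

# Subtest scores for the 10 pilot test-takers (from Tables 3 and 4).
listening = [10, 10, 9, 2, 5, 8, 10, 10, 10, 10]
grammar = [14, 13, 10, 0, 9, 11, 13, 8, 2, 10]


def describe(scores):
    """Return the descriptive statistics reported in Table 2 for one subtest."""
    return {
        "mean": round(statistics.mean(scores), 2),
        "mode": sorted(statistics.multimode(scores)),  # all most-frequent scores
        "median": statistics.median(scores),
        "range": max(scores) - min(scores),
        "sd": round(statistics.stdev(scores), 2),           # sample standard deviation
        "variance": round(statistics.variance(scores), 2),  # sample variance
    }


print(describe(listening))
print(describe(grammar))
```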
With the exception of the listening and reading subtests, which are both scored out of 10 points,
the subtests are graded on different scales. The subtest scores are not aggregated; instead, each
is used separately so that the EAP/English Department can decide whether an international student
has to take either or both of the EAP courses. Tables 3 and 4 present the subtest scores, and the
comments accompanying these tables explain how the subtest scores are used to make placement
decisions (Alderson et al., 1995).
Table 3
Placement Test Scores for Academic Interaction
Subtest (Points Possible)
Learner   Listening (10)   Reading (10)   Oral (40)
1         10               8              R1 (30) R2 (33) = 31.5
2         10               6              R1 (38) R2 (38) = 38
3         9                6              R1 (28) R2 (26) = 27.5
4         2                2              R1 (17) R2 (20) = 18.5
5         5                7              R1 (30) R2 (27) = 28.5
6         8                6              R1 (34) R2 (34) = 34
7         10               8              R1 (30) R2 (30) = 30
8         10               7              R1 (23) R2 (26) = 24.5
9         10               8              R1 (28) R2 (31) = 29.5
10        10               9              R1 (27) R2 (27) = 27
Mean      8.4              6.7            28.9
To be exempt from the Academic Interaction course, students must obtain a score of 70% or
higher on each of these three subtests: listening comprehension, reading comprehension, and oral
presentation. To be exempt from the Academic Writing from Sources course, students must obtain
a score of 80% or higher on each of these two subtests: grammar and mini-essay writing.
To further analyze students’ scores on each subtest, we created frequency polygons (Figures 1
and 2) for the listening subtest, for which we used partially subjective scoring, and for the
reading and grammar subtests, for which we used objective scoring.
Table 4
Placement Test Scores for Academic Writing
Subtest (Points Possible)
Learner   Grammar (15)   Writing (100)
1         14             R1 (77) R2 (75) = 76
2         13             R1 (90) R2 (90) = 90
3         10             R1 (85) R2 (85) = 85
4         0              R1 (36) R2 (38) = 37.5
5         9              R1 (62) R2 (61) = 61.5
6         11             R1 (79) R2 (77) = 78
7         13             R1 (87) R2 (87) = 87
8         8              R1 (53) R2 (57) = 55
9         2              R1 (57) R2 (56) = 56.5
10        10             R1 (68) R2 (67) = 67.5
Mean      9              69.4
Figure 1
Frequency Polygon for Listening and Reading Subtests
Figure 2
Frequency Polygon for Grammar Subtest
By looking at the frequency polygons and the descriptive results from Table 3 and Table
4, we wondered whether the listening comprehension subtest is too easy for the students. The
mean of the listening subtest is 8.4, well above the 70% cutoff of 7 points out of 10.
On the reading subtest, although the mean is only 6.7 out of 10, 60% of the students
obtained a score of 70% or higher. In contrast, in the grammar, writing, and oral subtests,
only 20–30% of the students met the requirements. Based on these scores, it seems that these
students need to improve their writing and oral skills with an emphasis on grammatical
competence.
Analysis
Applying Wesche’s (1983) Four Components
The following table shows the application of Wesche’s (1983) four components
framework to our test:
Table 5
Wesche’s (1983) Framework

Subtest: Listening
Stimulus Materials: The test-taker watches a video clip of “English Mania” presented by Jay Walker (2009). The test also contains five short-answer questions related to the content of the video.
Task Posed to the Learner: The test-taker must watch and listen to the video and identify important information.
Learner’s Response: The test-taker must write down their responses to the questions.
Scoring Criteria*: Questions 2 and 3 (requiring specific numbers and country names) are marked using the exact-word method. The remaining questions are marked using the acceptable-word method. Students are given either 2 points or 0 points. For Question 3, partial credit (1 pt) is given when at least two correct countries are mentioned.

Subtest: Grammar
Stimulus Materials: The test-taker reads an article from the New York Times (Bahanoo, 2012).
Task Posed to the Learner: The test-taker must identify 15 extra words inserted within sentences that make the sentences ungrammatical based on the structural rules of English; the test-taker must pay attention to the details of the reading to find multiple grammar errors, such as use of articles and tenses.
Learner’s Response: The test-taker must cross out the extra words.
Scoring Criteria*: The test-taker gets points when he/she crosses out the exact incorrect words.

Subtest: Reading
Stimulus Materials: The test-taker reads 1 long passage and 1 short passage. The test contains 5 multiple-choice questions for each passage.
Task Posed to the Learner: The test-taker must identify the main ideas of the readings and define the meaning of the words within the given context.
Learner’s Response: The test-taker must circle the letter representing the answer to a question.
Scoring Criteria*: The test-taker gets points when he/she circles the correct letters of the multiple-choice questions, as determined by the established key.

Subtest: Mini-essay Writing
Stimulus Materials: An essay prompt is presented to the test-taker.
Task Posed to the Learner: The test-taker must read and respond to the given prompt. He/She must compose an organized piece of writing with sufficient examples and correct use of vocabulary and grammar.
Learner’s Response: The test-taker must write an essay of about 180-250 words that states, explains, and supports his/her opinion on the given prompt.
Scoring Criteria*: The test-taker’s essay is subjectively scored based on an analytic rubric set by the test designers. The rubric consists of three sections: content, organization, and grammar.

Subtest: Oral Interview
Stimulus Materials: A role-play scenario is given to the test-taker.
Task Posed to the Learner: The test-taker must read the prompt, understand the context, and adopt the role given in the scenario.
Learner’s Response: The test-taker has 2 minutes to prepare a persuasive speech that states, explains, and supports his/her opinion on the given topic and must deliver it within 3 minutes.
Scoring Criteria*: The test-taker’s speech is subjectively scored based on an analytic rubric set by the test designers. The rubric evaluates two aspects of a speech: content, and fluency and accuracy.

*Note. The keys and rubrics of the scoring criteria were all pre-established by the test designers,
although the rubric of the oral interview was modified subject to the students’ responses from the
piloting tests.
Wesche (1983) points out the importance of using authentic materials in language testing.
Therefore, the stimulus materials we selected to include in the test were sourced from
authoritative publications, such as the New York Times and National Geographic Learning.
We also decided to create a theme-based test to help scaffold students’ knowledge, as well as to
make the testing constructs more integrated with one another. Considering the background of our
test-takers, we chose “language learning” as the overarching theme, because all test-takers share
an experience of learning an additional language, English. In addition, we sequenced the test
from the receptive skills (listening, grammar and reading) to the productive skills (speaking and
writing) to enhance the production stage of the exam.
Applying Swain’s (1980) Four Principles of Communicative Language Development
The following table shows how we applied Swain’s four principles to our test:
Table 6
Swain’s (1980) Framework
Listening
  Start from somewhere: Our choice of this procedure is motivated by our intention to simulate
    an academic situation in which students are given a lecture.
  Concentrate on content: Since the test-takers are international students, the topic of
    English learning is relevant to them, and the video also serves to activate test-takers’
    schemata.
  Bias for best: The test-takers can get visual support besides the audio input. Also, they
    are allowed to take notes when watching the video. Spelling errors are not marked in
    test-takers’ responses to the comprehension questions.
  Work for washback: The test-takers can:
    Experience a situation of taking a real academic lecture.
    Practice note-taking skills.
Grammar
  Start from somewhere: Citing Larsen-Freeman (1991, 1997), Brown (2010) defines grammatical
    knowledge as comprising grammatical forms, grammatical meanings and pragmatic meanings.
  Concentrate on content: Students can relate the content to their own experience in language
    learning.
  Bias for best: The subtest assesses multiple grammar points, such as use of articles,
    adjectives and verb tense.
  Work for washback: The test-takers can:
    Learn to pay attention to the details of the reading passages.
    Know the meanings associated with the grammatical forms.
Reading
  Start from somewhere: The design of the subtest was driven by both the top-down and the
    bottom-up processing of reading comprehension (Longman dictionary, 2012).
  Concentrate on content: Consistent with the content of the previous subtests, the two
    articles are also about language learning.
  Bias for best: The definitions of some difficult vocabulary terms are given in the test.
    Key words and key sentences are either underlined or bolded for attention. Paragraphs are
    marked with alphabetic letters for the convenience of reference.
  Work for washback: The test-takers can:
    Expand their vocabulary knowledge.
    Learn to use context to interpret the meanings of the words.
    Identify the main ideas from the readings.
    Paraphrase the reading.
Mini-essay Writing
  Start from somewhere: Through the essay writing task, we are able to identify students’
    strengths and weaknesses, including grammar usage and vocabulary knowledge.
  Concentrate on content: The essay prompt, whether learning English is important or not, has
    been developed through the previous subtests.
  Bias for best: The test-takers can use the materials provided on the test to support their
    opinions.
  Work for washback: The test-takers can:
    Write in a simulated academic context.
    Compose an argumentative essay.
    Incorporate sufficient sources into the writing.
Oral Interview
  Start from somewhere: Besides our concern to use a direct test to measure the test-takers’
    oral competence, the construct of the oral test was also inspired by the frequent
    situations where students are required to orally express their opinions supported by
    examples in academic settings.
  Concentrate on content: The content is related to the theme of the test, language learning.
  Bias for best: The test-takers can use the materials provided on the test to support their
    opinions. The test-takers have 2 minutes to prepare and jot down some notes for their
    speech.
  Work for washback: The test-takers can:
    Experience a simulated academic presentation.
    Give a persuasive speech.
Examining Validity and Reliability
The Dordt College Placement Test (DCPT) is an important test not only for the English
Department but also for the international students. Since the results of this test will be used to
decide whether the incoming students need to take the English for Academic Purposes (EAP)
classes in their first semester, we had to make sure that our newly revised test is valid and
reliable. Therefore, we piloted the DCPT with the current EAP students at Dordt and decided to
conduct several analyses on our subtests, including Item Facility (I.F.), Item Discrimination
(I.D.), Distractor Analyses, Response Frequency Distribution, Split Half Reliability, Inter-Rater
Reliability and Subtest Relationships. Specifically for validity, we analyzed one of the
objectively scored portions of our test, the reading comprehension subtest, using I.F., Distractor
Analyses, I.D., and Response Frequency Distribution. To test the reliability, we evaluated the
subjectively scored parts of our test using Inter-Rater Reliability and one of the objectively
scored parts, the multiple-choice (M-C) test, using Split Half Reliability. Finally, we assessed the
correlation between scores on each of our subtests and the total test.
As mentioned previously, for our reading comprehension subtest, we designed an M-C
test. Bailey (1998) notes that many teachers and test-makers use an M-C test as a method to
assess students’ ability because of the ease of test administration and scoring. Moreover, students
may perceive an M-C test to be much “fairer and/or more reliable” since this test can be scored
objectively (Bailey, 1998, p. 130). Despite its scoring practicality, the reality is that it is difficult
to design an M-C test. In fact, Bailey (1998) mentions that it is quite labor-intensive because test-
makers need to consider various factors. For instance, getting the question (stems) and the
options right takes time. To ensure that our M-C subtest is working well and that this subtest
provides valid information the college needs to make placement decisions, we conducted four
different types of validity analyses.
Validity of M-C Questions
Item facility. Item facility (I.F.) is “an index of how easy an individual item was for the
people who took it” (Bailey, 1998, p. 132). To calculate the I.F. of the reading comprehension
multiple-choice (M-C) subtest, we used the following formula, taken from Bailey (1998, p. 132):
I.F. = # of test-takers answering the item correctly ÷ # of test-takers
According to Bailey (1998), the I.F. number ranges from 0.0, which signifies that every
test-taker missed the item, to 1.0, which means everyone answered the item correctly. Table 7
represents the I.F. data for DCPT reading comprehension M-C subtest.
Table 7
Reading Comprehension Subtest Item Facility (n=10)
Item    Students who answered the item correctly    Item Facility (I.F.)
1       8                                           0.80 (80%)
2       8                                           0.80
3       10                                          1.00
4       9                                           0.90
5       9                                           0.90
6       3                                           0.30
7       6                                           0.60
8       4                                           0.40
9       5                                           0.50
10      5                                           0.50
Average I.F. = 0.67
Oller (1979) states that “items falling somewhere between about 0.15 and 0.85 are
usually preferred” (p. 247). Based on our I.F. data, item 3 (1.00) and items 4 and 5 (0.90) need
serious revisions, because Oller claims that “in tests that are intended to reveal differences
among the students who are better and worse performers on whatever is being tested, there is
nothing gained by including test items that every student answers correctly or that every student
answers incorrectly” (p. 246). Excluding items 3, 4, and 5, the remaining items fall well within
Oller’s preferred range. Half of the items (items 6, 7, 8, 9, and 10) appear to be in the medium
difficulty range, from 0.30 to 0.60. Based solely on these results, we would not change the items
with medium difficulty (items 6 to 10), but would revisit items 3, 4, and 5.
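The I.F. calculation and the range check described above can be sketched in a few lines of Python. This is only an illustration using the counts reported in Table 7; the variable and function names are our own, and the 0.15 to 0.85 bounds are Oller's (1979) approximate preferred range as quoted above.

```python
# Number of test-takers (out of 10) who answered each item correctly (Table 7).
correct_counts = {1: 8, 2: 8, 3: 10, 4: 9, 5: 9, 6: 3, 7: 6, 8: 4, 9: 5, 10: 5}
N_TEST_TAKERS = 10

# Oller's (1979) approximate preferred I.F. range, as quoted in the text.
LOW, HIGH = 0.15, 0.85


def item_facility(n_correct, n_total):
    """I.F. = test-takers answering the item correctly / all test-takers."""
    return n_correct / n_total


facility = {item: item_facility(c, N_TEST_TAKERS) for item, c in correct_counts.items()}
average_if = sum(facility.values()) / len(facility)

# Items whose I.F. falls outside the preferred range.
flagged = [item for item, f in facility.items() if not LOW <= f <= HIGH]

print(f"Average I.F. = {average_if:.2f}")
print("Items outside the preferred range:", flagged)
```

With the Table 7 counts, the items flagged by this check are the same three items (3, 4, and 5) singled out in the discussion above.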
Distractor analysis. We conducted a Distractor Analysis to improve the validity of our
M-C test and to make sure that each option for each test item was “distracting.” Also, since we
assume that students have variable skills and knowledge, we do not want to have a test item
option that is too obvious and serves no purpose in terms of distinguishing students who know
and who do not know the correct answer. Table 8 shows the number of students that selected
each option.
Table 8
Reading Comprehension Subtest Distractor Analysis (n=10)
Item A B C D
1 8* 2 0 0
2 0 8* 1 1
3 0 10* 0 0
4 0 0 9* 1
5 9* 0 0 1
6 2 2 3 3*
7 2 2 0 6*
8 5 0 4* 1
9 1 4 5* 0
10 5* 1 2 2
Note. (*) indicates the correct answer to the item.
Based on the results of the Distractor Analysis, we can see that items 1 through 5 need
attention. Previously, our I.F. Analysis indicated that items 3, 4, and 5 should be revised because
these items were too easy for the students. In the Distractor Analysis, we can verify that options
in items 3 to 5 should be changed considerably because the majority of the students chose one
option over the others. Options in items 1, 2, 7, 8, and 9 should also be reviewed to
ensure that all of the options are distracting, as they are in items 6 and 10.
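One simple way to operationalize this check is to list, for each item, the distractors that no test-taker selected. The sketch below uses the counts from Table 8; the data layout and the function name are illustrative assumptions, not part of the original analysis.

```python
# Option counts per item from Table 8, paired with the keyed (correct) option.
responses = {
    1: ({"A": 8, "B": 2, "C": 0, "D": 0}, "A"),
    2: ({"A": 0, "B": 8, "C": 1, "D": 1}, "B"),
    3: ({"A": 0, "B": 10, "C": 0, "D": 0}, "B"),
    4: ({"A": 0, "B": 0, "C": 9, "D": 1}, "C"),
    5: ({"A": 9, "B": 0, "C": 0, "D": 1}, "A"),
    6: ({"A": 2, "B": 2, "C": 3, "D": 3}, "D"),
    7: ({"A": 2, "B": 2, "C": 0, "D": 6}, "D"),
    8: ({"A": 5, "B": 0, "C": 4, "D": 1}, "C"),
    9: ({"A": 1, "B": 4, "C": 5, "D": 0}, "C"),
    10: ({"A": 5, "B": 1, "C": 2, "D": 2}, "A"),
}


def unused_distractors(counts, key):
    """Return the incorrect options that nobody selected."""
    return [opt for opt, n in counts.items() if opt != key and n == 0]


unused = {item: unused_distractors(counts, key) for item, (counts, key) in responses.items()}

# Items in which every distractor attracted at least one test-taker.
fully_distracting = [item for item, opts in unused.items() if not opts]

for item, opts in unused.items():
    print(item, "unused distractors:", opts or "none")
```

On these counts, only items 6 and 10 have no unused distractors, which matches the observation above.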
Item discrimination. The Item Discrimination method was used to find out how the top
scorers and low scorers performed on each item in the M-C subtest. Using Flanagan’s method of
computing item discriminability (Oller, 1979), the top scorers and the low scorers were ranked
based on the scores of the entire exam, including the other four subtests. We took the top 33% of
the exams and the bottom 33% of the exams and calculated the I.D. using the following formula,
taken from Bailey (1998, p. 136):
I.D. = [(# of high scorers who got the item right) – (# of low scorers who got the item right)] ÷ [33% × (total # of students [10])]
Using Flanagan’s formula, Table 9 reflects the calculated I.D. values for the M-C subtest.
Table 9
Reading Comprehension Subtest Item Discrimination (n=10)
Item    High scorers (top three)    Low scorers (bottom three)    Item Discrimination
        with correct answers        with correct answers          (I.D.)
1       2                           2                              0.00
2       3                           1                              0.61
3       3                           3                              0.00
4       3                           2                              0.30
5       3                           2                              0.30
6       1                           1                              0.00
7       2                           2                              0.00
8       2                           1                              0.30
9       2                           1                              0.30
10      1                           2                             −0.30
Average I.D. = 0.151
Investigating the I.D. value is helpful to us as test makers because we are able to know
whether our low I.F. items were truly difficult and our high I.F. items were too easy for the
students. Table 10 shows the I.F. and I.D. values side by side. We will use Table 10 to better
analyze the results of the I.D. values.
Table 10
Reading Comprehension Subtest Item Discrimination and Item Facility (n=10)
Item    High scorers (top three)    Low scorers (bottom three)    Item Discrimination    Item Facility
        with correct answers        with correct answers          (I.D.)                 (I.F.)
1       2                           2                              0.00                  0.80 (80%)
2       3                           1                              0.61                  0.80
3       3                           3                              0.00                  1.00
4       3                           2                              0.30                  0.90
5       3                           2                              0.30                  0.90
6       1                           1                              0.00                  0.30
7       2                           2                              0.00                  0.60
8       2                           1                              0.30                  0.40
9       2                           1                              0.30                  0.50
10      1                           2                             −0.30                  0.50
Based solely on the I.D. values, items 1, 3, 6, 7, and 10 should probably be revised since
Oller’s (1979) lowest acceptable value is 0.25. From our previous discussion on Item Facility,
we mentioned that item 3 should be addressed because the item was too easy for the students
(IF=1.0). In addition, we also mentioned keeping the items 6, 7, and 10 because these items had
medium difficulty and were in Oller’s preferred I.F. range; but our I.D. values for items 6 and 7
show that they need to be revisited because equal numbers of top scoring students and low
scoring students answered these items correctly. In fact, in item 10 (I.D. = −.30), two high
scorers missed the item while only one low scorer incorrectly answered the item. Oller points out
that “we would be disturbed if we found an item that good readers (high scorers) tended to miss
more frequently than weak readers (low scorers)” (1979, p. 251). However, we need to consider
two important factors before making changes to any items. First, because our sample size is
small (n=10), it is difficult to decide whether to revise these items using Oller’s recommended
range, especially for items 4 and 5, where the high I.F. value (0.90) signals “change” but the I.D.
value (0.30) indicates that the items are acceptable. If our sample size were larger, there might
be more variance in our results, and we could analyze these items better.
Second, the top scorers and the low scorers were divided based on the results of the entire
test, which consists of five subtests evaluating different language constructs. The DCPT that we
designed heavily emphasizes grammatical competence and writing skills; it was designed as such
based on our needs analysis. As a result, we may have considered students with high grammar
knowledge and writing skills as part of the top three high scorers, but we are analyzing the
performance of students’ reading comprehension ability. Students with high reading
comprehension skills may have scored lower in other sections, and thus, may not have been
included in the top three high scorer sample. Therefore, keeping these factors in mind, we will
revisit and mindfully revise the items that need attention.
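The I.D. computation discussed in this section can be sketched as follows, using the high- and low-scorer counts from Table 9. The names are illustrative; the denominator follows the formula given above (33% of the 10 test-takers, i.e. 3.3), and 0.25 is the lowest acceptable value attributed to Oller (1979) in the text.

```python
# Correct answers among the top three and bottom three scorers (Table 9).
high_correct = {1: 2, 2: 3, 3: 3, 4: 3, 5: 3, 6: 1, 7: 2, 8: 2, 9: 2, 10: 1}
low_correct = {1: 2, 2: 1, 3: 3, 4: 2, 5: 2, 6: 1, 7: 2, 8: 1, 9: 1, 10: 2}

N_TEST_TAKERS = 10
GROUP_SHARE = 0.33      # top and bottom 33% of the exams
MIN_ACCEPTABLE = 0.25   # lowest acceptable I.D. cited from Oller (1979)


def item_discrimination(high, low, n_total=N_TEST_TAKERS, share=GROUP_SHARE):
    """I.D. = (high scorers correct - low scorers correct) / (share * n_total)."""
    return (high - low) / (share * n_total)


discrimination = {
    item: item_discrimination(high_correct[item], low_correct[item])
    for item in high_correct
}

# Items that fall below the minimum acceptable discrimination value.
below_minimum = [item for item, d in discrimination.items() if d < MIN_ACCEPTABLE]

for item, d in discrimination.items():
    print(f"Item {item}: I.D. = {d:.2f}")
print("Below the minimum:", below_minimum)
```

The flagged list reproduces the items named in the discussion (1, 3, 6, 7, and 10), with item 10 being the only one that discriminates negatively.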
Response frequency distribution. Prior to looking at the results of the Response
Frequency Distribution, Table 11 provides a brief overview of the items that need attention
and/or revision based on I.F., Distractor Analysis, and I.D. analysis:
Table 11
Overview of Items that Need Attention
Analysis Item(s)
Item Facility 3, 4, and 5
Distractor Analysis 3, 4, and 5 (maybe 1, 2, 7, 8, and 9)
Item Discriminability 1, 3, 6, 7, and particularly 10 (-0.30)
Based on the information from Table 11, items 3, 4, 5, and 10 (since it is showing a
negative discrimination) seem to need revisions, and items 7, 8, and 10 should be revisited. To
further assist us in making decisions and analyzing the validity of our M-C subtest, we conducted
the Response Frequency Distribution. Response Frequency Distribution analysis is a useful
method because it provides a detailed picture of what the top scorers and low scorers answered
for each item; this analysis reflects the combination of the Distractor Analysis and I.D. analysis
(see Table 12).
Table 12
Response Frequency Distribution on Reading Comprehension Subtest
Item    High/Low Scorers    A     B     C     D
1       High                2*    1     0     0
        Low                 2*    1     0     0
2       High                0     3*    0     0
        Low                 0     1*    1     1
3       High                0     3*    0     0
        Low                 0     3*    0     0
4       High                0     0     3*    0
        Low                 0     0     2*    1
5       High                3*    0     0     0
        Low                 2*    0     0     1
6       High                1     0     1     1*
        Low                 1     1     0     1*
7       High                0     1     0     2*
        Low                 1     0     0     2*
8       High                1     0     2*    0
        Low                 2     0     1*    0
9       High                0     1     2*    0
        Low                 1     1     1*    0
10      High                1*    1     0     1
        Low                 2*    0     1     0
Note. (*) indicates the correct answer to the item.
As shown in Table 12, items 1, 3, 7, and 10 need to be revisited because these items did
not discriminate between the high scorers and the low scorers. For items 1, 3, and 7, there was
an equal number of high scorers and low scorers who answered the items correctly; for item 10, as
mentioned previously, there were more low scorers who had the right answer than high scorers.
Item 6 also had an equal number of high and low scorers, but in terms of Distractor Analysis, this
item is ideal. In addition, it is important to note that since we only selected the top three and
bottom three scorers, and because there are only four options, despite the even distribution, there
will still be a “0” number in our Response Frequency Distribution data, as is the case in items
2, 6, and 9. The results in items 2, 8, and 9 are quite interesting because a majority (if not all
three) of the top scorers answered these items correctly, while the low scorers were distracted by
other options. Referring back to Table 11, the items that were easy were items 3, 4, and 5. Thus,
we consider items 2, 8, and 9 as useful items to separate the performance of high and low
scorers. Finally, based on Table 12, it is worth revisiting items 4 and 5 because all three high
scorers as well as 2 out of 3 low scorers answered these items correctly.
Through several validity analyses, we have learned that some of the items—either the
questions or the options—need to be reviewed. Table 13 shows the overall summary of items
that need attention.
Table 13
Overall Summary of Items that Need Attention
Analysis                              Item(s)
Item Facility                         3, 4, and 5
Distractor Analysis                   3, 4, and 5
Item Discriminability                 1, 3, 6, 7, and particularly 10 (-0.30)
Response Frequency Analysis           1, 3, 7, and 10
Overall Items That Need Attention:    1, 3, 4, 5, 7, and 10
Reliability
Aside from ensuring validity, it is important to examine whether a test is reliable.
According to Brown (2005), “test reliability is defined as the extent to which the results can be
considered consistent or stable” (p. 175). As mentioned above, the DCPT consists of five
subtests, in which the reading and grammar subtests are objectively scored. We consider the
listening subtest as a partially subjectively scored test because the scoring criteria adopted the
acceptable word method. Internal-consistency measures can only be applied to the objectively
scored tests. We also are not able to calculate the internal-consistency of the grammar subtest
because it is a cloze-elide test, and each item is dependent on one another. Therefore, we only
measured the internal-consistency reliability of the reading subtest, which is composed of 10 M-
C items. We used the split-half method and the Spearman-Brown prophecy formula to estimate
the full subtest reliability (see Table 14).
Table 14
Internal Consistency Measures of the Reading M-C Subtest
Subtest                                                      Reading
Split Half Reliability                                       0.56
Reliability after using Spearman Brown Prophecy Formula      0.72
Standard Deviation                                           1.85
Standard Error of Measurement (SEM)                          0.98
Points Possible                                              10.00
The reliability result of 0.72 using the Spearman Brown formula means that the scores of the
reading M-C subtest are 72% consistent, with a 28% measurement error (100% − 72% = 28%). The
statistical results also suggest that our reading subtest has good reliability considering the
internal consistency. We claim as such in consideration of the following factors:
1). The sample size of the test is small—only 10 test-takers.
2). The test is first launched and is fairly new.
3). The total number of the items (10) on the M-C subtest is small.
In addition, Brown (2005) points out that all the methods used to estimate the internal-
consistency reliability underestimate the actual value. Therefore, we are confident enough to
conclude that the reading M-C subtest of the DCPT is fairly reliable.
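The step from the split-half coefficient to the full-test estimate can be illustrated with the Spearman-Brown prophecy formula in its general form, r_full = k·r / (1 + (k − 1)·r), where k = 2 when two half-tests are combined. The function name below is our own; the input value is the split-half reliability reported in Table 14.

```python
def spearman_brown(half_reliability, length_factor=2):
    """Estimate the reliability of a test lengthened by `length_factor`.

    With length_factor=2 this converts a split-half correlation into an
    estimate for the full-length test.
    """
    r = half_reliability
    return length_factor * r / (1 + (length_factor - 1) * r)


split_half = 0.56  # split-half reliability of the reading subtest (Table 14)
full_test = spearman_brown(split_half)
error_share = 1 - full_test

print(f"Full-test reliability: {full_test:.2f}")
print(f"Share attributed to measurement error: {error_share:.2f}")
```

The same function also shows why a short subtest limits reliability: under the formula's assumptions, a larger length factor raises the estimate, which is consistent with the third consideration listed above.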
With the reliability estimate, we calculated the standard error of measurement (SEM)
using the following formula, in which $S$ stands for the standard deviation and $r_{xx'}$ stands
for the reliability estimate for the test:
$$SEM = S\sqrt{1 - r_{xx'}}$$
(Brown, 2005, p. 189)
As the formula suggests, the SEM value is related to the internal consistency of the test. It
refers to the possible score range a test-taker can get if he/she takes the test repeatedly. In other
words, it expresses the precision of test scores. The SEM value we obtained is 0.98, which can
be rounded up to 1 point. Thus, if a test-taker gets a score of 7 on the reading subtest, his/her true
ability score lies with a certain level of probability in between 6 and 8. Considering the fact that
the total points of the reading subtest is 10 and each item is worth 1 point, the 0.98 value of the
SEM is quite good and reasonable. Thus, our SEM value further supports the reliability of our
reading comprehension section.
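As a quick check, the SEM and the resulting score band can be reproduced from the standard deviation and the reliability estimate reported for the reading subtest. The function name and the observed score of 7 are illustrative; the band of plus or minus one SEM is the usual single-SEM interval, not a statement about an exact probability level.

```python
import math


def standard_error_of_measurement(standard_deviation, reliability):
    """SEM = S * sqrt(1 - r), with r the reliability estimate of the test."""
    return standard_deviation * math.sqrt(1 - reliability)


sd_reading = 1.85           # standard deviation of the reading subtest
reliability_reading = 0.72  # Spearman-Brown adjusted reliability

sem = standard_error_of_measurement(sd_reading, reliability_reading)

# Band of one SEM on either side of an observed score of 7.
observed = 7
band = (observed - sem, observed + sem)

print(f"SEM = {sem:.2f}")
print(f"Score band for an observed score of {observed}: {band[0]:.1f} to {band[1]:.1f}")
```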
Inter-rater reliability. Since the reliability of the reading section is confirmed, we now
proceed to examine the reliability of our subjectively scored tests—the oral and the essay writing
subtests. Using analytic scoring rubrics, we each rated the tests; upon scoring, the inter-rater
reliability was measured. We calculated the final score of each of the subjectively scored
sections by averaging our ratings (Rater 1’s rating and Rater 2’s rating). According to Bailey
(1998), coefficient alpha is usually used to compare the scoring of the two raters. To calculate
the coefficient alpha, the variance for each rater and the total variance for both raters were
computed (see Table 15 and Table 16).
Table 15
Inter-rater Reliability for Oral Test
Learner Rater 1 Rater 2 Rater 1 + Rater 2
1 30 33 63
2 38 38 76
3 28 24 52
4 17 22 39
5 30 27 57
6 34 34 68
7 30 37 67
8 23 26 49
9 28 32 60
10 27 27 54
Mean 29 30 59
Standard Deviation 5.41 5.25 10.13
Variance 29.25 27.60 102.65
Coefficient Alpha = .89
As Table 15 suggests, the standard deviation for Rater 1 was slightly higher than the
standard deviation of Rater 2, but the mean for Rater 1’s scores was marginally lower than the
mean for Rater 2’s scores. Thus, we can infer that Rater 1 was slightly tougher and had a little
more variability in scoring the oral test. Moreover, as shown in Table 15, the calculated
coefficient alpha was 0.89. Bailey (1998) mentions, “the closer the value is to the whole number
1.00, the greater the inter-rater reliability” (p. 182). Therefore, based on the coefficient alpha
value, the ratings on the oral test were quite reliable.
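The coefficient alpha in Table 15 can be reproduced from the two raters' scores. The sketch below assumes the standard form of the statistic, alpha = k/(k − 1) × (1 − sum of rater variances / variance of summed ratings), with k raters, and uses population variances, which is what matches the variance row of Table 15. The function name is our own.

```python
from statistics import pvariance

# Oral test ratings for learners 1-10 (Table 15).
rater1 = [30, 38, 28, 17, 30, 34, 30, 23, 28, 27]
rater2 = [33, 38, 24, 22, 27, 34, 37, 26, 32, 27]


def coefficient_alpha(*raters):
    """Coefficient alpha treating each rater as one 'item'.

    alpha = k / (k - 1) * (1 - sum(rater variances) / variance(summed ratings))
    """
    k = len(raters)
    totals = [sum(scores) for scores in zip(*raters)]
    rater_variance_sum = sum(pvariance(scores) for scores in raters)
    return (k / (k - 1)) * (1 - rater_variance_sum / pvariance(totals))


alpha_oral = coefficient_alpha(rater1, rater2)

print(f"Variance, Rater 1: {pvariance(rater1):.2f}")
print(f"Variance, Rater 2: {pvariance(rater2):.2f}")
print(f"Coefficient alpha: {alpha_oral:.2f}")
```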
Table 16
Inter-rater Reliability for Essay Writing Test
Learner Rater 1 Rater 2 Rater 1 + Rater 2
1 77 82 159
2 90 90 180
3 85 85 170
4 33 41 74
5 62 61 123
6 82 75 157
7 92 85 177
8 53 62 115
9 59 54 113
10 76 67 143
Mean 70.9 70.2 141.10
Standard Deviation 17.81 15.07 32.43
Variance 317.29 226.96 1051.49
Coefficient Alpha = .96
Similar to the analysis of the results shown in Table 15, the means and the standard
deviations in Table 16 indicate that Rater 1’s ratings were slightly more lenient and had more
variability in scoring the writing subtest than Rater 2’s. The coefficient alpha, 0.96, indicates an
extremely high correlation between the two raters. In other words, the ratings of the two raters
were highly reliable.
Since the ratings of the two raters on both subjectively scored subtests are quite reliable,
we infer that one contributing factor may be the use of analytic scoring
scales. The analytic scoring rubrics outline the components of writing, such as content,
organization and grammar, in detail, so raters can easily identify the measurable concepts when
evaluating the tests. It is also worth noting that the inter-rater reliability of the writing
subtest is higher than that of the oral subtest. One reason for this difference may be that the
analytic scoring scale of the writing subtest is more specific and detailed than that of the oral subtest.
Although the inter-rater reliability appears to be substantial, we acknowledge our
limitations—that we both served as test-developers and raters. In real-life cases, the raters are
often not involved with the test development. Although trainings are given to ensure the
reliability of raters, raters sometimes have different perspectives and interpretations of the
scoring criteria. However, in our case, since we created the scoring criteria, we knew exactly
what we were looking for. To ensure the successful application of the DCPT, we recommend
that raters attend a rater conference, where they can be trained on how to score each criterion of
the analytic scoring rubrics before they start to evaluate the test.
Subtest Relationships
To further strengthen the validity of our test, we evaluated the relationship between each
pair of the subtests by conducting a statistical analysis of the correlation among the subtests of
the DCPT. As mentioned before, the DCPT consists of five subtests. The listening
comprehension subtest is worth 10 points; the grammar section is worth 15 points; the reading
comprehension section has 10 total points; the writing section contains 100 points in total; and
the oral subtest is worth 40 points (a total of 175 points). At first, we used the raw score formula
to calculate Pearson’s r, which is the correlation coefficient between each pair of the subtests.
However, since each subtest has different scoring scales and points, we decided to convert all the
subtest scores to a standardized scale—z scores—to calculate Pearson’s r. Interestingly, there
was no difference in the results between using z scores and raw scores. Table 17 reflects the
statistical results of the subtest relationship.
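The agreement between the raw-score and z-score results is expected, because Pearson's r is unchanged when either variable is shifted or rescaled by a positive constant. The sketch below illustrates this with the summed rater scores from Tables 15 and 16. It assumes that learner numbers refer to the same students in both tables and that summed ratings can stand in for the averaged final scores (the two differ only by a constant factor); the function names are our own.

```python
from statistics import mean, pstdev


def pearson_r(x, y):
    """Pearson product-moment correlation between two equal-length lists."""
    mx, my = mean(x), mean(y)
    covariation = sum((a - mx) * (b - my) for a, b in zip(x, y))
    spread_x = sum((a - mx) ** 2 for a in x) ** 0.5
    spread_y = sum((b - my) ** 2 for b in y) ** 0.5
    return covariation / (spread_x * spread_y)


def z_scores(values):
    """Standardize values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


# Rater 1 + Rater 2 totals per learner (Tables 15 and 16).
oral = [63, 76, 52, 39, 57, 68, 67, 49, 60, 54]
writing = [159, 180, 170, 74, 123, 157, 177, 115, 113, 143]

r_raw = pearson_r(oral, writing)
r_z = pearson_r(z_scores(oral), z_scores(writing))

print(f"r from raw scores: {r_raw:.2f}")
print(f"r from z scores:   {r_z:.2f}")
```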
Table 17
Subtest Relationships: Correlation Coefficients (Pearson’s r)
Test         Oral    Writing    Reading    Grammar    Listening
Listening    0.57    0.67       0.79       0.58       -
Grammar      0.69    0.89       0.52       -          0.58
Reading      0.44    0.47       -          0.52       0.79
Writing      0.78    -          0.47       0.89       0.67
Oral         -       0.78       0.44       0.69       0.57
As evident in Table 17, the values of the correlation coefficient are all positive. This
positive correlation suggests that as scores in one subtest increase, so will the scores in another
subtest. Thus, if a test-taker improves his/her performance on the listening comprehension,
he/she may also perform better on any of the other four subtests.
Brown (2005) notes that “relatively strong correlations would be those that range from
+0.80 to +1.0, or −0.80 to −1.0” (p. 141). The greatest value indicated in Table 17 is 0.89, which
is the correlation coefficient between the grammar and writing subtests. The correlation
coefficient between oral and writing is 0.78, and the correlation coefficient between listening
and reading is 0.79. Both of them can be rounded up to 0.80. These three high values indicate a
strong correlation between the paired subtests mentioned in comparison to the other pairs of
subtests. For example, the reading and writing subtests as well as the reading and oral subtests
show relatively low correlations (0.47 and 0.44, respectively).
The high correlation between grammar and writing is expected, because half of the scores
in the writing section depend mainly on the test-taker’s grammatical competence. However, we
cannot claim that our grammar section measures the same construct as the writing section. Oller
(1979) strongly argues that a low correlation does not indicate that two tests are measuring
different constructs, nor does high correlation indicate that two tests are measuring the same
constructs. In fact, there are a lot of factors that may impact the correlation between two tests.
Oller also points out that “high correlations have been observed between a wide variety of testing
techniques with a wide range of tested populations” (p. 193). Since our test is fairly new and our
sample size of test-takers is small, it is normal that we do not have consistently high subtest
correlations.
To observe whether two different tests measure the same thing, we decided to square the
correlation coefficients to obtain the values of overlapping variance. Table 18 reflects the values
of overlapping variance between each pair of the subtests.
Table 18
r-squared for Subtest Relationships: Overlapping Variance
Test         Oral    Writing    Reading    Grammar    Listening
Listening    0.32    0.45       0.62       0.34       -
Grammar      0.48    0.79       0.27       -          0.34
Reading      0.19    0.22       -          0.27       0.62
Writing      0.61    -          0.22       0.79       0.45
Oral         -       0.61       0.19       0.48       0.32
The highest value in Table 18, 0.79, is the overlapping variance between the grammar
and writing subtests. This means that the writing section and the grammar section share almost
80% overlapping variance in measuring the same construct, which is grammatical competence.
As we mentioned previously, 50% of the scores in the writing section are dependent on
grammar. Based on our needs analysis of Dordt College ESL students and the English
Department, Dordt College strongly values grammatical competence and the writing skills of
students. Therefore, we were glad to see that the grammar and writing sections show high
reliability and a strong relationship within the DCPT.
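The overlapping-variance values are simply the squares of the Pearson coefficients in Table 17. A minimal sketch, using the rounded coefficients as reported there (the pair labels and dictionary layout are our own):

```python
# Pearson's r for each pair of subtests, as reported in Table 17.
pearson_r = {
    ("listening", "oral"): 0.57,
    ("listening", "writing"): 0.67,
    ("listening", "reading"): 0.79,
    ("grammar", "listening"): 0.58,
    ("grammar", "oral"): 0.69,
    ("grammar", "writing"): 0.89,
    ("grammar", "reading"): 0.52,
    ("oral", "reading"): 0.44,
    ("reading", "writing"): 0.47,
    ("oral", "writing"): 0.78,
}

# Overlapping variance is the squared correlation coefficient.
overlap = {pair: round(r ** 2, 2) for pair, r in pearson_r.items()}

# List the pairs from the largest to the smallest shared variance.
for pair, shared in sorted(overlap.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{pair[0]} / {pair[1]}: {shared:.2f}")
```

Squaring makes the contrast between pairs more visible: a coefficient of 0.89 corresponds to about 79% shared variance, whereas a coefficient of 0.44 corresponds to only about 19%.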
Discussion
According to Oller (1979), there are four traditional criteria for evaluating tests—validity,
reliability, practicality, and washback. First, the validity of a test refers to “how well the test does
what it is supposed to do, namely, to inform us about the examinee’s progress toward some goal
in a curriculum […] or to differentiate levels of ability among various examinees on some task”
(p. 4). In terms of face validity, although a few items from the M-C subtest need improvement, we
feel that overall the newly revised DCPT has face validity because the test was designed
carefully considering the language constructs that need to be tested. Moreover, we made sure that
the overall difficulty level of the entire test was appropriate, the instructions were clear, and the
tasks were uncomplicated. It is also worth mentioning that we pre-piloted and piloted this test to
the current international students who are taking EAP classes at Dordt College.
According to Mousavi (1999), a test is considered valid when the content of the test
measures the language skills and structures that it is meant to measure. To ensure that the
content of our test is valid, especially to Dordt College, we reviewed and adapted all the skills
that were tested on the original DCPT. However, based on our needs analysis, we found out that
the English Department at Dordt, as well as the past and current international students,
considered grammatical competence as an important language skill. Therefore, with Alderson,
Clapham, and Wall’s (1995) guidance on developing test specification, we identified five
constructs for the new placement test—listening comprehension, grammatical knowledge,
reading comprehension, writing ability, and oral skills. We also used Wesche’s (1983) four
components framework (see Appendix G) and Swain’s (1980) four principles of communicative
language test development (see Appendix H) as our guidance when developing and validating
the content of our test. Finally, since the target population for this test is incoming international
students who are language learners, we selected “language learning” as the overarching theme of
the exam.
Oller’s (1979) second criterion for evaluating tests is reliability. Oller states that
“reliability of a test is a matter of how consistently it produces similar results on different
occasions under similar circumstances” (p. 4). Furthermore, Baker (1989) defines
reliability as “stability in the measure” (p. 60). Based on the reliability analyses that we
conducted, we consider the DCPT a reliable test of incoming international students’ overall
English language abilities. Using the split-half method, we confirmed the reliability of the
reading M-C subtest. To examine the reliability of the subjectively rated scores, we used
coefficient alpha to estimate inter-rater reliability and found that the two raters’ ratings
were highly consistent. Finally, we evaluated the subtest relationships to determine the
strength of the correlation between each pair of subtests. All of the resulting correlation
coefficients were positive.
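The reliability estimates named in this paragraph can be illustrated with a short sketch. It is not the computation actually used for the DCPT: the function names, the odd-even split, the Spearman-Brown correction, and the idea of treating each rater as one "item" in coefficient alpha are assumptions made for illustration, since the report does not state how the halves were formed or which software was used.

```python
from statistics import mean, pvariance


def pearson(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def split_half_reliability(item_matrix):
    """Odd-even split-half reliability with the Spearman-Brown correction.

    item_matrix: one row per test-taker, one 0/1 entry per M-C item.
    The odd-even split is an assumption; other splits are possible.
    """
    odd = [sum(row[0::2]) for row in item_matrix]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in item_matrix]  # items 2, 4, 6, ...
    r_half = pearson(odd, even)
    return 2 * r_half / (1 + r_half)


def coefficient_alpha(ratings):
    """Coefficient (Cronbach's) alpha, treating each rater as one 'item'.

    ratings: one row per test-taker, one column per rater.
    """
    k = len(ratings[0])
    rater_vars = [pvariance([row[j] for row in ratings]) for j in range(k)]
    total_var = pvariance([sum(row) for row in ratings])
    return (k / (k - 1)) * (1 - sum(rater_vars) / total_var)


# Hypothetical data for illustration only (not DCPT pilot results).
demo_items = [[1, 1, 1, 0], [1, 0, 1, 1], [0, 0, 1, 0], [0, 0, 0, 0]]
demo_ratings = [[78, 80], [65, 70], [90, 88], [55, 60]]
print(split_half_reliability(demo_items))
print(coefficient_alpha(demo_ratings))
```

Values close to 1 would indicate consistent halves or consistent raters; the same `pearson` helper could also be applied to pairs of subtest totals to examine the subtest relationships.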
The third criterion we used to evaluate our test is practicality. According to Oller (1979),
practicality includes the “preparation, administration, scoring, and interpretation of the test” (p.
4). To ensure the practicality of our test, we pre-piloted and piloted our exam to measure and
adjust the time limits. We also referred to the test administration specification sent by the EAP
instructor, which we adapted and modified (see Appendix A for Test Administration
Procedures). In addition, we made sure that the entire exam, especially the oral interview
subtest, was not too lengthy, not only for the benefit of the students but also to ease scoring
and interpretation and to lessen the burden on the volunteer students and faculty members who
are involved in the test administration and scoring.
Finally, it is important to consider the washback of a test, “the effect a test has on
teaching and learning” (Bailey, 1998, p. 249). Applying Swain’s (1980) four principles of
communicative language test development, we expect the newly revised DCPT to have the
washback shown in Table 19 for each subtest.
Table 19
Washback of Subtests: Applying Swain’s (1980) Framework
Subtest | Washback (the test-takers can:)
Listening Comprehension | Experience the situation of attending a real academic lecture; practice note-taking skills.
Grammar | Learn to pay attention to the details of the reading passages; recognize that meanings are associated with grammatical forms.
Reading Comprehension | Expand their vocabulary knowledge; learn to use context to interpret the meanings of words; identify the main ideas of the readings.
Mini-Essay Writing | Write in a simulated academic context; compose an argumentative essay; incorporate sufficient sources into the writing.
Oral Interview | Experience a simulated academic presentation; give a persuasive speech.
Conclusion
In the process of designing the DCPT, we used Wesche’s (1983) four-component
framework (see Appendix G) and Swain’s (1980) four principles of communicative language
test development (see Appendix H) to ensure the quality of the test. Test specifications were
also used to guide the test development process and to establish good comparability of scores
across test forms (Alderson, Clapham, & Wall, 1995). Furthermore, we pre-piloted the test with
three native (or near-native) English speakers before piloting it with students from the target
group, current Dordt College ESL students. The pre-piloting stage allowed us to revisit some of
our test items and to readjust the time allotted to each subtest to increase the practicality of
the test. Thanks to the support of the ESL Department of Dordt College, our test was piloted in
the environment where it would actually be adopted and administered. Therefore, the results of
the DCPT were highly informative and indicative of its future performance.
The latter sections of this report discussed the validity and reliability results of the
DCPT. In summary, the overall test is valid because its content measures the language skills
and structures that it is meant to measure (Mousavi, 1999). We conducted four statistical
analyses (item facility, distractor analysis, item discrimination, and response frequency
distribution) to test the quality of the M-C items on the reading subtest. Although our
statistical results show that some items need to be revisited to strengthen the overall
validity of the reading subtest (see Table 13), we can confirm the overall validity of our
reading subtest, especially considering that this is a new test. In addition, our reliability
procedures yielded positive results, showing significant inter-rater reliability. Therefore,
based on all the statistical results, we can safely conclude that the DCPT is valid and reliable for
its use as a placement test. We can also confirm the practicality of the DCPT because we
carefully designed the test and conducted several piloting procedures. Finally, we sought to
ensure positive washback from the DCPT by applying Swain’s (1980) four principles of
communicative language test development.
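Two of the item analyses mentioned above, item facility and item discrimination, can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure used in this report: the function names are hypothetical, responses are assumed to be scored 0/1, and the high and low groups are assumed to be the top and bottom thirds of test-takers ranked by total score. Distractor analysis and response frequency distribution are not shown.

```python
def item_facility(responses, item):
    """Proportion of test-takers who answered the item correctly.

    responses: one row per test-taker, one 0/1 entry per M-C item.
    item: zero-based index of the item being analyzed.
    """
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=1 / 3):
    """Item facility of the high scorers minus that of the low scorers.

    Test-takers are ranked by total score; the top and bottom `fraction`
    (an assumed one third by default) form the two comparison groups.
    """
    ranked = sorted(responses, key=sum, reverse=True)
    n = max(1, round(len(ranked) * fraction))
    top, bottom = ranked[:n], ranked[-n:]
    return item_facility(top, item) - item_facility(bottom, item)


# Hypothetical 0/1 responses from six test-takers on three items.
demo = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
for i in range(3):
    print(i + 1, item_facility(demo, i), item_discrimination(demo, i))
```

An item that most test-takers answer correctly has a facility near 1, and an item that high scorers answer correctly far more often than low scorers has a discrimination near 1. Items with low or negative discrimination are natural candidates for revision.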
References
Alderson, J. C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation.
Cambridge: Cambridge University Press.
Angeli, E., Wagner, J., Lawrick, E., Moore, K., Anderson, M., Soderlund, L., & Brizee, A.
(2012, May 30). General format. Retrieved from
http://owl.english.purdue.edu/owl/resource/560/01/
Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York
Times. Retrieved from
http://www.nytimes.com/2012/04/03/science/how-immersion-helps-to-learn-a-new-language.html
Bailey, K. M. (1998). Learning about language assessment: Dilemmas, decisions, and
directions. Boston, MA: Heinle & Heinle Publishers.
Baker, D. (1989). Language testing: A critical survey and practical guide. London:
Edward Arnold.
Brown, H. D. (2010). Language assessment: Principles and classroom practices. New York, NY:
Pearson Education.
Deutscher, G. (2010, August 26). Does your language shape how you think? The New York
Times. Retrieved from
http://www.nytimes.com/2010/08/29/magazine/29language-t.html?pagewanted=all
Ferris, D., & Hedgcock, J. (Forthcoming). Teaching L2 composition: Purpose, process, and
practice (3rd ed.). New York, NY: Routledge.
Mousavi, S. A. (1999). A dictionary of language testing. Tehran: Rahnama Publications.
Oller, J. W. (1979). Language tests at school. London: Longman Group.
Richards, J. C., & Schmidt, R. (2010). Longman dictionary of language teaching & applied
linguistics. London, UK: Pearson Education Limited.
Swain, M. (1984). Large-scale communicative language testing: A case study. In S. J. Savignon
& M. Berns (Eds.), Initiatives in communicative language teaching (pp. 185–201).
Reading, MA: Addison-Wesley.
Vargo, M., & Blass, L. (2013). Pathways 1: Reading, writing, and critical thinking. Boston:
National Geographic Learning.
Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from
http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html
Wesche, M. B. (1983). Communicative testing in a second language. Modern Language Journal,
67, 41–55.
Appendices
Appendix A: New Logistics Guide
Instructions for Logistics Team for Test Administration
DATE
NAMES OF ADMINISTRATORS
9:00 AM – 12:00 noon
General Information
1. There will be a reception desk in front of the circulation desk in the library. All students
taking the interview will be told to report to the reception desk. Your team members will
be at that desk to welcome students as they arrive. There will also be a few chairs for
students who may have to wait a minute for your attention.
2. You will have a schedule with a list of the interview rooms, interview teams, interview
times, and student names. As each student comes to the reception desk, you will find the
interview station where they are expected and bring them to the door/entrance.
3. After the first part of the EIIS, the oral interview, students will be directed to return to the
reception desk, where one of you will take the student to the appropriate place for the
next part of the interview.
4. Here follows a list of locations for the various parts of the interview:
Part I: Listening: watching a TED Talk video on YouTube and answering
questions: computer bank in the TRC. Each student will need a set of headphones. These
are available from a librarian at the circulation desk.
Parts II, III, and IV: Grammar, Reading, and Essay Writing: reading an article,
answering objective questions, and writing an essay: large tables or individual chairs
with a writing surface on the right side of the Teaching Resource Center (TRC). Place one
student at each table or chair. Be sure each student has plenty of elbow space and privacy.
Part V: Oral Interview: assigned station; see schedule.
Turn to the next page
Specific Procedure
1. Part I: Listening: Give students pages 1, 2, and 3. Take students to one of the
computers in the TRC and give them a set of headphones. Point out and remind students
to carefully read the instructions on pages 2 and 3. Also point out that page 2 should be
used for taking notes on the video presentation. Tell students that they have 10 minutes to
complete this section. Note the starting and stopping times. Collect the pages and place
in the student folder, from which I will retrieve them. Guide them to return to the
reception desk.
2. Parts II & III: Grammar & Reading: When students return to the reception desk, give
them a copy of pages 4, 5, 6, 7, and 8. Bring students to a table or chair in the Teaching
Resource Center (TRC). Instruct the students to read the instructions on each page
carefully. Tell the students they have 25 minutes to complete Parts II and III. Write down
each student’s starting and stopping times on the sheet provided. Collect the pages and place
them in the student folder, from which I will retrieve them.
3. Part IV: Essay Writing: At the same location (TRC), give students a copy of pages 9
and 10. Remind students to read all of the instructions before they begin to write.
Remind them to write at least 180 words. Students have 20 minutes to complete this
essay. Record starting and stopping times. If the student does not come to you, you
should go to the student and inform him or her that time is up. When the student has
completed the essay, collect the pages and place in the student folder, from which I will
retrieve them. Ask them to sign up for an oral interview.
4. Photocopies and distribution—As soon as you have each student’s essay/paragraph,
make two copies (the reference librarian will give you money) and bring the three copies
of the paragraph to the team that interviewed the student. (Be sure the team is not in the
middle of an interview with another student.) If the team is waiting till all paragraphs are
done, store the three copies in the student folder and make sure the copies get to the right
place.
5. Part V: Oral Interview: As students come in, they will be helped right away, or asked to
take a seat until a team member is available. Greet the student and ask for the student’s
name. Check your schedule to see where the student will be interviewed and accompany
the student to the entrance of the interview station. The interview team will take over
from there.
THANK YOU!
Appendix B: Original Logistics Guide
Instructions for Logistics Team for EIIS Administration August 24, 2012
Kerrie Best, Fanny Gonzales Garcia, Giovi Romero, Yushin Tsai
9:00 AM – 12:00 noon
General Information
1. There will be a reception desk in front of the circulation desk in the library. All students
taking the interview will be told to report to the reception desk. Your team members will
be at that desk to welcome students as they arrive. There will also be a few chairs for
students who may have to wait a minute for your attention.
2. You will have a schedule with a list of the interview rooms, interview teams, interview
times, and student names. As each student comes to the reception desk, you will find the
interview station where they are expected and bring them to the door/entrance.
3. After the first part of the EIIS, the oral interview, students will be directed to return to the
reception desk, where one of you will take the student to the appropriate place for the
next part of the interview.
4. Here follows a list of locations for the various parts of the interview:
a. Part I: Oral Interview: assigned station; see schedule
b. Part II: Reading of article and answering objective questions: large tables or
individual chairs with writing surface on the right side of the Teaching Resource
Center (TRC). Place one student at each table or chair. Be sure each student has
plenty of elbow space and privacy.
c. Part III: watching video lecture and answering questions: computer bank in
TRC. Each student will need a set of headphones. These are available from a
librarian at the circulation desk.
d. Part IV: writing prompt: large tables or individual chairs with writing surfaces on
right hand side of TRC
Specific Procedure
1. Part I--As students come in, they will be helped right away, or asked to take a seat
until a team member is available. Greet the student and ask for the student’s name.
Check your schedule to see where the student will be interviewed and accompany the
student to the entrance of the interview station. The interview team will take over
from there.
2. Part II—When students return to the reception desk, give them a copy of pages 9 and
10. Bring students to a table or chair in the Teaching Resources Center (TRC).
Instruct the student to read the instructions on both pages carefully. Tell the students
they have 20 minutes to complete this part of the interview. Write down the student
starting and stopping times on the sheet provided. If students do not return to you 20
minutes after they have started, go to the CRC and politely inform them that time is
up. Collect the pages and place in the student folder, from which I will retrieve them.
3. Part III—Give students pages 16, 17, and 18. Take students to one of the computers
in the TRC and give them a set of headphones. Point out and remind students to
carefully read the instructions on page 18. Also point out that page 17 should be used
for taking notes on the video-taped lecture and that the chart on page 16 is a copy of a
chart shown briefly during the lecture. Tell students that they have 20 minutes to
complete this section. Note the starting and stopping times. Again, if the student
does not come to you after 20 minutes, you should go to the student. You should
collect the answer sheet, page 18, but instruct the student to keep the chart and the
notes for use with the final part of the interview.
4. Part IV—Take students, with their pages 16 and 17, back to a table or chair in the
TRC. Give them copies of pages 20 and 21 (this is one sheet that has the instructions
and a lined area for writing the essay). Remind students to read all of the instructions
before they begin to write. Remind them also that they can refer to their notes.
Students have 30 minutes to complete this final part of the interview. Record starting
and stopping times. Ask students to bring their completed “mini-essay” to one of the
team at the reception desk. As always, if the student does not come to you, you
should go to the student and inform him or her that time is up. When the student has
completed this final part of the interview, please direct him or her to the Commons
for lunch.
5. Photocopies and distribution—As soon as you have each student’s essay/paragraph,
make two copies (the reference librarian will give you money) and bring the three
copies of the paragraph to the team that interviewed the student. (Be sure the team is
not in the middle of an interview with another student.) If the team is waiting till all
paragraphs are done, store the three copies in the student folder and make sure the
copies get to the right place.
THANK YOU! THANK YOU! THANK YOU!
Appendix C: Oral Interview Schedule Sample
INTERVIEW SCHEDULE
ENTRANCE INTERVIEW FOR INTERNATIONAL/ESL STUDENTS
Friday August 24, 2012, John and Louise Hulst Library, Upper Level
Interview stations and teams:
ROOM 262: L. VAN BEEK, J. VERSLUIS, C. HENTGES
ROOM 263: H. SCHAAP, D. ROTH, S. GRONECK
ROOM 264: L. ZUIDEMA, B. KUIPER, K. SANDOUKA
ALCOVE: M. DENGLER, N. VAN GAALEN, A. FOREMAN
REFERENCE CORNER: S. TAYLOR, I. MULDER, M. DRISSEL
ORAL INTERVIEW BEGINS | ROOM 262 | ROOM 263 | ROOM 264 | ALCOVE | REFERENCE CORNER
9:00 AM (10:30) | Ivy Mang’eli (Kenya, ex.) | Winnie Obiero (Kenya, fr.) | Yonatan Ashenafi (Ethiopia, fr.) | Henry Murray (Panama, tr.) | Juan Benitez Gonzalez (Paraguay, fr.)
9:20 AM (10:50) | Alba Garcia Macias (Mexico, fr.) | Eun Hye Jee (South Korea, ex.) | Young In Kim (South Korea, ex.) | Eui Shin Kim (South Korea, ex.) | Ju Eun Park (South Korea, ex.)
9:40 AM (11:10) | Bit Null Ryu (South Korea, ex.) | Jung Eun Sun (South Korea, ex.) | Fortunate Magara (Uganda, ex.) | David Baldusi Alves (Brazil, fr.) | Ji Eun Kim (South Korea, ex.)
10:00 AM (11:30) | Carolyne Muthoni (Kenya, fr.) | Dong Hyun Park (South Korea, fr.) | | |
There will be a reception desk in front of the main circulation desk of the library and a
logistics team to welcome and move our students to and from various parts of the interview.
Team members are: Kerrie, Giovi, Yuhsin, Fanny, and Sanneke Kok, Coordinator of
Academic Services for International Students.
Appendix D: Answer Key with Scoring Criteria
I. Listening Comprehension: Watch the TED Talk video “English Mania” presented by Jay
Walker (about 4 minutes) and answer the following questions. Please answer in less than 50
words (ALL ANSWERS MUST BE IN COMPLETE SENTENCES EXCEPT FOR QUESTIONS 2
& 3).
Transcript¹
Let's talk about manias. Let's start with Beatle mania: hysterical teenagers, crying, screaming,
pandemonium. Sports mania: deafening crowds, all for one idea -- get the ball in the net. Okay,
religious mania: there's rapture, there's weeping, there's visions. Manias can be good. Manias can
be alarming. Or manias can be deadly.
The world has a new mania. A mania for learning English. Listen as Chinese students practice
their English by screaming it.
Teacher: ... change my life!
Students: I will change my life.
T: I don't want to let my parents down.
S: I don't want to let my parents down.
T: I don't ever want to let my country down.
S: I don't ever want to let my country down.
T: Most importantly ... S: Most importantly ...
T: I don't want to let myself down.
S: I don't want to let myself down.
Jay Walker: How many people are trying to learn English worldwide? Two billion of them.
Students: A t-shirt. A dress.
JW: In Latin America, in India, in Southeast Asia, and most of all in China. If you are a Chinese
student you start learning English in the third grade, by law. That's why this year China will
become the world's largest English-speaking country. (Laughter) Why English? In a single word:
Opportunity. Opportunity for a better life, a job, to be able to pay for school, or put better food
on the table. Imagine a student taking a giant test for three full days. Her score on this one
test literally determines her future. She studies 12 hours a day for three years to prepare. 25
percent of her grade is based on English. It's called the Gaokao, and 80 million high school
Chinese students have already taken this grueling test. The intensity to learn English is almost
unimaginable, unless you witness it.
Teacher: Perfect! Students: Perfect!
T: Perfect! S: Perfect!
T: I want to speak perfect English.
S: I want to speak perfect English.
T: I want to speak -- S: I want to speak --
T: perfect English. S: perfect English.
T: I want to change my life!
S: I want to change my life!
1 Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from
http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html
JW: So is English mania good or bad? Is English a tsunami, washing away other languages? Not
likely. English is the world's second language. Your native language is your life. But with
English you can become part of a wider conversation: a global conversation about global
problems, like climate change or poverty, or hunger or disease. The world has other universal
languages. Mathematics is the language of science. Music is the language of emotions. And now
English is becoming the language of problem-solving. Not because America is pushing it, but
because the world is pulling it. So English mania is a turning point. Like the harnessing of
electricity in our cities or the fall of the Berlin Wall, English represents hope for a better future --
a future where the world has a common language to solve its common problems.
Short-answer questions: Spelling errors are allowed. Deduct 1 pt when a sentence is not
“complete,” except for questions 2 & 3 (2 pts per question = 10 pts total).
1. In your own words, define the word “mania.”
*Acceptable words: enthusiasm, passion, craze, popular trend generating wide
enthusiasm, hysteria, craziness, alarming
2. How many people are trying to learn English worldwide?
*Answer: 2 (two) billion
3. Name at least 3 countries/regions that the speaker mentioned that have manias for English.
*Answer: Latin America, India, Southeast Asia, and China
4. According to the speaker, why are so many people trying to learn English?
*Answer: 2 pts (full credit): opportunity, a better life, hope, language of problem solving,
world’s second language; 1 pt (“acceptable words”): job, pay for school, put better
food on the table, academic achievement; 0 pts: no mention of any of these words
5. What is the speaker’s opinion on English mania?
*Answer: 2 pts: English mania is more “positive” than negative; English mania is positive;
it is a “turning point”; 0 pts: no mention of “good.”
II. Grammar: The passage was taken from the New York Times newspaper, published on April
3, 2012. Read the following passage and cross out 15 “extra” words that make the sentences
grammatically incorrect (15 pts total).
Example: The boys is are singing the national anthem.
“How Immersion Helps to Learn a Language”²
Answer key: The crossed out words are bolded
Learning (1) a the foreign language is never easy, but contrary to common wisdom, it is possible
for adults to process a language (2) a the same way (3) a the native speaker does. And over time,
processing improves even when the skill goes unused, researchers are reporting.
For (4) there their study, (5) in on the journal PloS One, the scientists used an artificial language
of 13 words, completely different from English. “It’s totally (6) unpractical impractical to
follow someone to high proficiency because it takes years and years,” said the lead author,
Michael Ullman, a neuroscientist at Georgetown University Medical Center.
The language dealt with pieces and moves in (7) a the computer game, and the researchers tested
proficiency by asking test subjects to play (8) a the game.
The subjects (9) are were split into two groups. One group studied the language in a formal
classroom setting, while the other (10) was were trained through immersion.
After five months, both groups retained the language (11) even though because they had not
used it at all, and both displayed brain processing similar to that of a native speaker. But the
immersion group displayed the full brain patterns (12) for of a native speaker, Dr. Ullman said.
The research has several applications, Dr. Ullman said.
“This should help us understand how foreign-language learners can achieve native like
processing with (13) increase increased practice,” he said. “It makes sense that you’d want to
have your brain process like (14) a the native speaker.”
And though it may (15) take takes time, and more research, the work “also could or should help
in rehabilitation of people with traumatic brain injury,” he added.
2 Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York Times. Retrieved from
http://www.nytimes.com/2012/04/03/science/how-immersion-helps-to-learn-a-new-language.html
III. Reading Comprehension: (10 pts- 1pt each)
Passage 1: “The World’s Oldest First Grader”
1. Based on the passage, we can infer that before 2003, primary education in Kenya was:
a. Not cheap
b. Not available
c. Prohibited
d. Free
2. Why was Maruge motivated to study?
a. To be in one of the top five students in his class.
b. To use his education to read the Bible.
c. To become the school’s student leader.
d. To study Swahili, English, and math.
3. Who did NOT want Maruge to be in school?
a. Kenyan government
b. First grade parents
c. Jane Obinchu
d. None of the above
4. The main idea in paragraph (E) is:
a. People were fighting and burning houses in the village.
b. It was too difficult to live in a tent at a refugee camp.
c. Maruge did not stop studying, even during those difficult times.
d. Maruge taught other residents of the home to read and write.
5. The main idea in paragraph (G) is:
a. Maruge was an inspiration to other adult Kenyans.
b. Kenyans enjoyed the movie The First Grader.
c. Thoma Litei decided to go to school to learn.
d. The First Grader was created after Maruge’s death.
Passage 2:
1. The author’s attitude to Whorf’s theory is
a. Ambivalent
b. Neutral
c. Supportive
d. Contemptuous
2. The word trauma in the passage is closest in meaning to
a. Physical injury
b. Torture
c. Emergency
d. Agony
3. All of the following can be inferred from the text EXCEPT
a. Learning our mother tongue can lead to positive experiences.
b. The influence of mother tongue on our thoughts is significant.
c. Whorf’s theory was based on hard facts and solid common sense.
d. Whorf failed to provide any evidence to support his theory.
4. The author uses the word crash-landed to imply that Whorf’s theory was _________ hard facts
and solid common sense.
a. in favor of
b. based on
c. inconsistent with
d. critical of
5. Which of the sentences below best expresses the essential information in the boldfaced
sentence in the passage?
a. Exploring the relationship between the mother tongue and our thoughts was
frowned upon for decades.
b. People reacted severely and they explored the relationship between the mother tongue
and our thought.
c. Whorf’s theory succeeded in exploring the relationship between the mother tongue and
our thoughts.
d. Whorf’s claims were so credible that no researcher made an attempt to dishonor Whorf
for decades.
IV. Mini-essay writing: Write a mini-essay of about 180-250 words according to the following
prompt. You will be tested on the following criteria: content, organization, and grammar.
Do you think learning English is important? Why or why not? Please provide personal
examples to support your stance (Total 100 pts).
Content | Scoring: circle the appropriate score
Clearly relates to or answers the given topic or question | Clear 5-4-3-2-1-0 Missing
Gives sufficient examples/references | Sufficient 5-4-3-2-1-0 Lacking
Clear connection between examples/references and main ideas | Clear 5-4-3-2-1-0 Missing
Correct use of vocabulary words | Correct 5-4-3-2-1-0 Incorrect
Sufficient number of words (180-250) | Target #: 5; 160-179 words: 4; 140-159 words: 3; 120-139 words: 2; 100-119 words: 1; less than 100 words: 0
Subtotal: points for content _________/25
Organization | Scoring: circle the appropriate score
Topic or introductory sentence | Clear 5-4, Not Clear 3-2-1, Missing 0
Concluding sentence | Clear 5-4, Not Clear 3-2-1, Missing 0
Coherence (logical progression and development of ideas, good flow) | Always 5-4, Sometimes 3-2-1, Never 0
Cohesion (good connections between sentences) | Always 5-4, Sometimes 3-2-1, Never 0
Sentence variety (both simple and compound and/or complex) | Good Variety 5-4, Some Variety 3-2-1, Never 0
Subtotal: points for organization ________/25
Grammar | Scoring: take off one point for each error in the categories indicated. Circle the # of remaining pts.
Correct spelling (subtract 1 pt. ea. new error) | 5 4 3 2 1 0
Correct use of articles and prepositions | 5 4 3 2 1 0
Standard capitalization | 5 4 3 2 1 0
Standard punctuation (periods, commas, semicolons) | 5 4 3 2 1 0
Standard sentence word order | 5 4 3 2 1 0
Agreement between subjects & verbs, nouns and pronouns/antecedents | 5 4 3 2 1 0
Correct verb tense and usage | 5 4 3 2 1 0
Correct adverb and adjective usage | 5 4 3 2 1 0
Appropriately placed phrasal modifiers | 5 4 3 2 1 0
Standard academic diction (avoidance of slang and informal language) | 5 4 3 2 1 0
Subtotal: points for grammar _______/50
TOTAL POINTS _______/100
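The word-count bands and the three subtotals in this rubric amount to simple arithmetic, which a rater could check with a sketch like the one below. The function names are hypothetical and not part of the DCPT materials. The rubric does not say how an essay longer than 250 words should be scored, so the sketch assumes that any essay of 180 words or more earns the full 5 points.

```python
def word_count_points(n_words):
    """Map an essay's word count to the 0-5 'sufficient number of words' score."""
    if n_words >= 180:
        # Assumption: essays above the 250-word target are not penalized,
        # because the rubric is silent on that case.
        return 5
    for floor, points in ((160, 4), (140, 3), (120, 2), (100, 1)):
        if n_words >= floor:
            return points
    return 0


def essay_total(content, organization, grammar):
    """Add the three subtotals after checking each against its maximum."""
    limits = (("content", content, 25),
              ("organization", organization, 25),
              ("grammar", grammar, 50))
    for name, score, maximum in limits:
        if not 0 <= score <= maximum:
            raise ValueError(f"{name} subtotal must be between 0 and {maximum}")
    return content + organization + grammar


# Hypothetical essay: 172 words, with made-up subtotals.
print(word_count_points(172))
print(essay_total(content=20, organization=18, grammar=41))
```

Checking each subtotal against its maximum catches a common tallying slip, such as entering a grammar score on the 25-point scale used for content and organization.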
V. Oral Interview: 25 pts total
In the United States, many universities require students to learn an additional language other
than their native language.
Do you think universities in your home country should require students to learn an additional
language (other than your native language)? Why or why not?
You have 2 minutes to prepare. Use the space below to write down an outline or important points
that you want to discuss. You will be given a maximum of 3 minutes to answer the question. You
can use your notes while you talk, but do not read aloud what you have written. Please relate
the issue to your personal experience and cultural background.
*For this subjectively scored portion, the following criteria will be assessed: Content,
Accuracy, and Fluency; we will use an “analytic scale”:
Oral Interview Criteria
Content | Scoring: circle the appropriate score
Clearly relates to or answers the given topic or question | Clear 5-4-3-2-1-0 Missing
Gives adequate and meaningful examples/references | Sufficient 5-4-3-2-1-0 Lacking
Clear connection between examples/references and main ideas | Clear 5-4-3-2-1-0 Missing
Correct use of vocabulary words | Correct 5-4-3-2-1-0 Incorrect
Accuracy
Correct use of grammar | Correct 5-4-3-2-1-0 Incorrect
Clear pronunciation of words | Clear 5-4, Not Clear 3-2-1, Missing 0
Fluency
Coherence (logical progression and development of ideas, good flow) | Always 5-4, Sometimes 3-2-1, Never 0
Fluency in speech (with little circumlocution and few hesitations) | Fluent 5-4, Somewhat Fluent 3-2-1, Not Fluent 0
TOTAL POINTS ______/40
Appendix E: Dordt College Placement Test (DCPT)
Instructions: The placement test consists of 5 sections (about 1 hour
total).
For the listening comprehension, the test instructor will play a short
video.
I. Listening Comprehension (10 minutes)
II. Grammar (5 minutes)
III. Reading Comprehension (20 minutes)
IV. Mini-Essay Writing (20 minutes)
After you complete all four sections, submit your test to the test
instructor and schedule a time for the oral interview section. The test
instructor will provide you with the oral interview section of the test. You
will be interviewed individually, and the interview will be audio-recorded.
V. Oral Interview (about 5 minutes)
Name: ___________________
I. Listening Comprehension: Short-Answer Questions
Watch the TED Talk video “World’s English Mania” presented by Jay Walker (about 4
minutes).³ Use the space below to take notes. After watching the video, answer the following five
questions. Please answer in less than 50 words (ALL ANSWERS MUST BE IN COMPLETE
SENTENCES EXCEPT FOR QUESTIONS 2 & 3).
3 Walker, J. (2009, February). Jay Walker: World’s English mania [Video file]. Retrieved from
http://www.ted.com/talks/lang/en/jay_walker_on_the_world_s_english_mania.html
Use this space to take notes
TURN TO NEXT PAGE FOR QUESTIONS
Please answer in less than 50 words (ALL ANSWERS MUST BE IN COMPLETE
SENTENCES EXCEPT FOR QUESTIONS 2 & 3)
1. In your own words, define the word “mania.”
2. How many people are trying to learn English worldwide?
3. Name at least 3 places (countries or regions) the speaker mentioned that HAVE
manias for English.
4. According to the speaker, why are so many people trying to learn English?
5. What is the speaker’s opinion of English manias?
II. Grammar: The passage was taken from The New York Times, published on April
3, 2012.⁴ Read the following passage and cross out the 15 “extra” words that make the sentences
grammatically incorrect.
Example: The boys is are singing the national anthem.
“How Immersion Helps to Learn a Language”
Learning a the foreign language is never easy, but contrary to common wisdom, it is possible for
adults to process a language a the same way a the native speaker does. And over time, processing
improves even when the skill goes unused, researchers are reporting.
For there their study, in on the journal PloS One, the scientists used an artificial language of 13
words, completely different from English. “It’s totally unpractical impractical to follow someone
to high proficiency because it takes years and years,” said the lead author, Michael Ullman, a
neuroscientist at Georgetown University Medical Center.
The language dealt with pieces and moves in a the computer game, and the researchers tested
proficiency by asking test subjects to play a the game.
The subjects are were split into two groups. One group studied the language in a formal
classroom setting, while the other was were trained through immersion.
After five months, both groups retained the language even though because they had not used it at
all, and both displayed brain processing similar to that of a native speaker. But the immersion
group displayed the full brain patterns for of a native speaker, Dr. Ullman said.
The research has several applications, Dr. Ullman said.
“This should help us understand how foreign-language learners can achieve native like
processing with increase increased practice,” he said. “It makes sense that you’d want to have
your brain process like a the native speaker.”
And though it may take takes time, and more research, the work “also could or should help in
rehabilitation of people with traumatic brain injury,” he added.
4 Bahanoo, S. (2012, April 3). How immersion helps to learn a new language. The New York Times. Retrieved from
http://www.nytimes.com/2012/04/03/science/how-immersion-helps-to-learn-a-new-language.html
III. Reading Comprehension:
Passage 1: The passage was taken from a National Geographic Learning textbook.⁵ Read the passage
below and answer the multiple-choice questions that follow. Circle the letter of the
best answer.
“The World’s Oldest First Grader”
(A) On January 12, 2004, Kimani Maruge knocked on the door of the primary school in his
village in Kenya. It was the first day of school, and he was ready to start learning. The
teacher let him in and gave him a desk. The new student sat down with the rest of the first
graders—six- and seven-year-old boys and girls. However, Kimani Maruge was not an
ordinary first grader. He was 84 years old—the world’s oldest first grader.

(B) Kimani Maruge was born in Kenya in 1920. At that time, primary education in Kenya
was not free, and Maruge’s family didn’t have enough money to pay for school. When
Maruge grew up, he worked hard as a farmer. In the 1950s, he fought with other Kenyans
against the British colonists. After years of fighting, Kenya became independent in 1963.

(C) In 2003, the Kenyan government began offering free primary education to everyone, and
Maruge wanted an education, too. However, it wasn’t always easy for Maruge to attend
school. Many of the first graders’ parents didn’t want an old man in their children’s class.
School officials said that a primary education was only for children. But the school
principal, Jane Obinchu, believed Maruge was right. With her help, he was able to stay in
school.

(D) Maruge was a motivated and successful student. In fact, he was one of the top five
students in his first grade class. In second grade, Maruge became the school’s student
leader. He went as far as seventh grade, the final year of primary school. Over the years,
Maruge studied Swahili, English, and math. He wanted to use his education to read the
Bible and to study veterinary medicine.

(E) In 2008, there were problems in Kenya after an election. People were fighting and
burning houses in Maruge’s village. Maruge moved to a refugee camp for safety and
lived in a tent. However, even during those difficult times he continued to go to school.
Later that year, he moved to a home for the elderly. He continued going to school, and
even taught other residents of the home to read and write.

(F) In 2005, Maruge flew in a plane for the first time in his life. He traveled to New York
City, where he gave a speech at the United Nations. He spoke about the importance of
education and asked for help to educate the people of Kenya. Maruge also wanted to
improve primary education for children in Africa.

(G) Maruge died in 2009, at age 89. However, his story lives on. The 2010 movie The First
Grader showed Maruge’s amazing fight to get an education. Many older Kenyans
decided to start school after seeing The First Grader. One of those people was 19-year-old
Thoma Litei. Litei said, “I knew it was not too late. I wanted to read, and to know
more language, so I came [to school] to learn. That is why it is important for his story to
be known.”

5 The passage was printed in Vargo, M. & Blass, L. (2013). Pathways 1: Reading, writing, and critical thinking.
Boston: National Geographic Learning.
1. Based on the passage, we can infer that before 2003, primary education in Kenya was:
a. Not cheap
b. Not available
c. Prohibited
d. Free
2. Why was Maruge motivated to study?
a. To be one of the top five students in his class.
b. To use his education to read the Bible.
c. To become the school’s student leader.
d. To study Swahili, English, and math.
3. Who did NOT want Maruge to be in school?
a. Kenyan government
b. First grade parents
c. Jane Obinchu
d. None of the above
4. The main idea in paragraph (E) is:
a. People were fighting and burning houses in the village.
b. It was too difficult to live in a tent at a refugee camp.
c. Maruge did not stop studying, even during those difficult times.
d. Maruge taught other residents of the home to read and write.
5. The main idea in paragraph (G) is:
a. Maruge was an inspiration to other adult Kenyans.
b. Kenyans enjoyed the movie The First Grader.
c. Thoma Litei decided to go to school to learn.
d. The First Grader was created after Maruge’s death.
Passage 2: The following extract was taken from the article “Does Your Language Shape How
You Think?” published in The New York Times Magazine.⁶ Read the passage below and answer
the multiple-choice questions following the passage. Circle the letter of the best answer.
Benjamin Lee Whorf’s theory crash-landed on hard facts and solid common sense, when
it transpired¹ that there had never actually been any evidence to support his fantastic claims. The
reaction was so severe that for decades, any attempts to explore the influence of the mother
tongue on our thoughts were relegated² to the loony³ fringes⁴ of disrepute⁵. But 70 years on,
it is surely time to put the trauma of Whorf behind us. And in the last few years, new research
has revealed that when we learn our mother tongue, we do after all acquire certain habits of
thought that shape our experience in significant and often surprising ways.
1. The author’s attitude to Whorf’s theory is
a. Ambivalent
b. Neutral
c. Supportive
d. Contemptuous
2. The word trauma in the passage is closest in meaning to
a. Physical injury
b. Torture
c. Emergency
d. Agony
3. All of the following can be inferred from the text EXCEPT
a. Learning our mother tongue can lead to positive experiences.
b. The influence of mother tongue on our thoughts is significant.
c. Whorf’s theory was based on hard facts and solid common sense.
d. Whorf failed to provide any evidence to support his theory.
Turn to the next page
6 Deutscher, G. (2010, August 26). Does your language shape how you think? The New York Times. Retrieved from
http://www.nytimes.com/2010/08/29/magazine/29language-t.html?pagewanted=all
Vocabulary word-bank:
1. transpire: occur, happen 2. relegate: assign, transfer 3. loony: crazy
4. fringe: border, trimming 5. disrepute: dishonor
4. The author uses the word crash-landed to imply that Whorf’s theory was _________ hard facts
and solid common sense.
a. in favor of
b. based on
c. inconsistent with
d. critical of
5. Which of the sentences below best expresses the essential information in the boldfaced
sentence in the passage?
a. Exploring the relationship between the mother tongue and our thoughts was frowned
upon for decades.
b. People reacted severely and they explored the relationship between the mother tongue
and our thoughts.
c. Whorf’s theory succeeded in exploring the relationship between the mother tongue and
our thoughts.
d. Whorf’s claims were so credible that no researcher made an attempt to dishonor Whorf
for decades.
IV. Mini-essay writing
Write a mini-essay of about 180-250 words in response to the following prompt. You will be assessed
on the following criteria: content, organization, and grammar. Feel free to use the back page for
more space.
Do you think learning English is important? Why or why not? Please provide personal
examples to support your stance (in addition, you may refer to what you have learned from the
video “English Mania”).
END OF SECTION IV
SUBMIT YOUR TEST AND SCHEDULE AN ORAL INTERVIEW
NAME:_________________
V. Oral Interview:
In the United States, many universities require students to learn an additional language other
than their native language.
Do you think universities in your home country should require students to learn an additional
language (other than your native language)? Why or why not?
You have 2 minutes to prepare. Use the space below to write down an outline or important points
that you want to discuss. You will be given a maximum of 3 minutes to answer the question. You
may refer to your notes while speaking, but do not read aloud what you have written. Please relate the
issue to your personal experience and cultural background.
Appendix F: Getting Started Worksheet (Alderson, Clapham, & Wall, 1995)
Worksheet for Getting Started on Your Original Test
1. What is the purpose of the test? (How will the information you gather be used? Are you
measuring achievement or progress? Are you placing students in a program?)
This test serves as an entrance (placement) test for incoming international students (normally
10-12 students per semester). These international students either attend the college for all four years
(we’ll call them “regular” students) or for just one year as “exchange” students. All international
students have to take this test, which determines whether they can take the general English
core requirement (e.g., ENG 101). Students who do not pass this test are required to
take the ESL courses (Reading & Writing and/or Speaking & Listening). There is only one ESL level.
Students will be required to take either one of the ESL courses or both.
2. What sort of learners will be taking the test? (Describe the 2LLs’ age, first language[s],
purpose for learning the target language, etc.)
All international students who are admitted to Dordt College will be required to take this test. As
mentioned previously, these international students could either be regular or exchange students.
They come from many different countries and have diverse L1s. However, based on our interview with
the ESL professor, most of the students are from South Korea; there are some students from
Turkey, Mexico, and various African countries. The age group ranges from 18 to 25 years old.
3. What language skills should be tested (reading, writing, speaking and/or listening)?
We will be testing all four skills. Since there are not very many international students (around
10-12 per semester) and only one ESL level, we decided to revise the current placement/entrance
exam. Dordt College is Hala Sun’s alma mater.
4. What language elements should be tested (grammar, vocabulary, pronunciation, speech acts,
etc.)?
I. Listening comprehension: Content, Comprehension
II. Grammar: Grammar
III. Reading comprehension: Vocabulary, Comprehension, Grammar
IV. Mini-essay writing: Content, Organization, Grammar
V. Oral interview: Content, Fluency and Accuracy (Grammar, Pronunciation, Coherence, and
Fluency)
5. What target language situation is envisaged for the test, and is this to be simulated in some way
in the test content and method? (For instance, is this a test of academic French? Of English for
international TAs? Of Japanese for hotel workers in California?)
English for Academic Purposes in college
6. What text types should be chosen as stimulus material—written and/or spoken?
I. Listening comprehension: one video/audio clip of approximately four minutes (the speech
“World’s English Mania,” taken from TED Talks)
II. Grammar: 1 written text with grammatical errors
III. Reading comprehension: 2 written texts (academic in nature)
IV. Mini-essay writing: 1 written essay question
V. Oral interview: The administrator will give students a role-play scenario, in which students
will have two minutes to prepare their speech and three minutes to perform their speech orally.
7. What sort of tasks are required -- discrete point, integrative, simulated ‘authentic’,
objectively assessable? (That is, what will the test-takers actually do?)
I. Listening comprehension: Students will watch and listen to a video clip (a speech from Ted
Talk). Students can take notes while watching/listening to the video clip. Students will then
answer 5 short answer questions (they can answer the questions while they are
watching/listening to the video).
II. Grammar: This is a cloze elide test. Students have to read a text and cross out 15 “extra”
words that make the sentences grammatically incorrect. This requires students’ editing skills as
well as their knowledge of grammar. It is also objectively assessable, since there are
exact answers (the words that need to be crossed out). Test scorers will count only the correctly
crossed-out words; students do not lose points for crossing out a word that should stay. (Students
will not be told about this aspect of the scoring method, so that they do not simply cross out as
many words as they can.)
III. Reading comprehension: This test is objectively assessable (multiple-choice questions).
After reading two passages, students are required to answer the MC-questions by choosing the
best answer (questions will cover comprehension, vocabulary, and grammar aspects).
IV. Mini-essay writing: For this test, students have to read the prompt (subject/question/topic of
the essay) and write an essay by hand; the essay must be at least 180 and at most
250 words long.
V. Oral interview: It is an integrative test examining the use of language elements (Grammar,
Vocabulary, Fluency, Comprehension, and Pronunciation). The test takers will have two minutes
to prepare their speech and three minutes to perform their speech orally. There will be an
analytic scale to assess these language elements.
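The grammar scoring rule in item II above (credit only for correctly crossed-out words, no deduction for wrong crossings) can be sketched in Python. This is a minimal illustration, not the test designers' procedure: the function name `score_cloze_elide` and the idea of identifying each word by its position in the text are assumptions made for the example, which uses the sample sentence from the test instructions.

```python
def score_cloze_elide(key_positions, crossed_positions):
    """Return the number of correctly crossed-out words.

    key_positions: positions of the "extra" words in the answer key.
    crossed_positions: positions the test-taker crossed out.
    Wrong crossings are ignored, so they never lower the score.
    """
    return len(set(key_positions) & set(crossed_positions))


# Sample sentence from the test instructions; "is" (position 2) is the extra word.
words = ["The", "boys", "is", "are", "singing", "the", "national", "anthem."]
key = {2}

# A test-taker who crosses out "is" and, wrongly, the second "the" still earns 1 point.
crossed = {2, 5}
print(score_cloze_elide(key, crossed))
```

Because wrong crossings cost nothing, crossing out every word would earn full marks under this rule, which is the reason given above for not telling students about it.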
8. What test methods (what item formats) are to be used? (One multiple-choice subtest is
required.)
I. Listening comprehension: 5 short answer questions
II. Grammar: 1 cloze elide test; crossing out “extra” (grammatically incorrect) words from 1
written text (15 crossed-out words in total)
III. Reading comprehension: two sets of 5 multiple-choice questions (10 questions in total)
IV. Mini-essay writing: 1 written essay question
V. Oral interview: Responding to 1 question (given orally)
9. How many sections should the test have, how long should they be and how will they
be differentiated? (There will be at least three sections – more if you are working with another
student.)
I. Listening comprehension: about 10 minutes
II. Grammar: 5 minutes to read and cross out “extra”/grammatically incorrect words
III. Reading comprehension: 20 minutes
IV. Mini-essay writing: 20 minutes to answer 1 essay question
V. Oral interview: about 5 minutes
10. How many items are required for each section? What is the relative weight for each
item?
I. Listening comprehension: 5 short answer questions (each is worth 2 points; 10 pts max)
II. Grammar: 1 written text with 15 “extra”/grammatically incorrect words (15 items; 1 pt each;
15 pts max)
III. Reading comprehension: 10 questions (two sections of 5 questions; 1 pt each; 10 pts max)
IV. Mini-essay writing: 1 essay question (scored on multiple criteria rather than as a single point; 100 pts max)
V. Oral interview: One 3-minute speech (40 pts max)
TOTAL Maximum Points: 175 pts
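The section weights listed above can be checked with a short Python sketch. The point values come from the list; the names `SECTION_MAX_POINTS`, `total_max_points`, and `section_shares` are illustrative assumptions, not part of the test design.

```python
# Maximum points per section, as listed in the worksheet answer above.
SECTION_MAX_POINTS = {
    "Listening comprehension": 10,
    "Grammar": 15,
    "Reading comprehension": 10,
    "Mini-essay writing": 100,
    "Oral interview": 40,
}


def total_max_points(section_points):
    """Sum the maximum points over all sections."""
    return sum(section_points.values())


def section_shares(section_points):
    """Return each section's share of the total as a fraction between 0 and 1."""
    total = total_max_points(section_points)
    return {name: points / total for name, points in section_points.items()}


print(total_max_points(SECTION_MAX_POINTS))
for name, share in section_shares(SECTION_MAX_POINTS).items():
    print(f"{name}: {share:.1%}")
```

Under these weights the mini-essay alone carries 100 of the 175 points, while listening, grammar, and reading together carry 35.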
11. What rubrics are to be used as instructions for candidates? (That is, what instructions
and guidance are printed in the test and/or announced by the test administrator?)
Instructions and guidance are printed in the test in English. For the oral interview, the test administrator
will read the instructions to the student. The student will then be given 2 minutes to prepare
his/her speech in response to the prompt. Once the time is up, the test administrator notifies the
student and gives him/her 3 minutes to respond. For listening comprehension, the test administrator
will play the audio/video file. Students can take notes and proceed to answer the short-answer
questions as they listen to/watch the clip. The audio/video file will be played only once.
12. Which criteria will be used for assessment by markers? (In other words, describe how
the answer key will be developed for the objectively scored portion, and explain the
rating system for the subjectively scored portion.)
I. Listening comprehension: For this subjectively scored portion, the following criteria will be
assessed: Content and Comprehension (understanding the main points).
Listening Comprehension Criteria
Spelling errors are allowed. Deduct 1 pt when an answer is not a complete sentence, except for
questions 2 & 3. Total possible points: 10 pts.
1. In your own words, define the word “mania.”
*Acceptable words: enthusiasm, passion, desire, craze, popular trend generating wide
enthusiasm, hysteria, craziness, alarming, deeply fascinated
2. How many people are trying to learn English worldwide?
*Answer: 2 (two) billion
3. Name at least 3 countries/regions the speaker mentioned that have manias for English.
*Answer: Latin America, India, Southeast Asia, and China; 2 pts when three correct places
are mentioned; only 1 pt when two places are correct (one is incorrect); 0
pts for no answer or when none of these places is mentioned
4. According to the speaker, why are so many people trying to learn English?
*Answer: 2 pts (full credit): opportunity, for a better life, hope, language of problem solving, world’s
second language; 1 pt (“acceptable words”): job, pay for school, put better
food on the table, academic achievement; 0 pts: no mention of any of these words
5. What is the speaker’s opinion of English mania?
*Answer: 2 pts: English mania is more “positive” than negative; English mania is positive; it is a
“turning point”; an answer that the speaker’s opinion is neutral is also OK. 0 pts: no mention of “good.”
II. Grammar: Objectively scored. Each “extra” word (choice a), together with the correct word
(choice b), is treated as one item (e.g., Question 1). Students get one mark for each correct answer (15
total).
III. Reading comprehension: MC questions. Students get one mark for each correct answer
(10 total). After piloting this test, we will run the following analyses: item discriminability, item
facility, distractor analysis, and response frequency distribution. These analyses will tell us
more about how the questions and the answer choices we wrote perform.
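Two of the analyses named above can be sketched in Python. The source does not give formulas, so this sketch assumes a common formulation: item facility as the proportion of test-takers who answer an item correctly, and item discrimination as the difference in facility between the highest- and lowest-scoring groups. The upper/lower-third split, the function names, and the response matrix are all illustrative assumptions; the data are made up and are not pilot results.

```python
def item_facility(responses, item):
    """Proportion of test-takers who answered `item` correctly (1 = correct, 0 = wrong)."""
    return sum(row[item] for row in responses) / len(responses)


def item_discrimination(responses, item, fraction=1 / 3):
    """Facility in the top-scoring group minus facility in the bottom-scoring group."""
    ranked = sorted(responses, key=sum, reverse=True)  # highest total score first
    group_size = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:group_size], ranked[-group_size:]
    return item_facility(upper, item) - item_facility(lower, item)


# Made-up responses: 6 test-takers (rows) on 3 multiple-choice items (columns).
sample = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 0],
]

for item in range(3):
    print(item, round(item_facility(sample, item), 2), item_discrimination(sample, item))
```

An item with facility near 0 or 1 tells the designers little about differences among test-takers, and a low or negative discrimination value flags an item that stronger test-takers do not answer better than weaker ones.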
IV. Mini-essay writing: Subjectively scored. The following criteria will be assessed: Content,
Organization, and Grammar. We will calculate the scores based on our essay criteria, using the
analytic scoring system below. We will use interrater reliability
to check the consistency of the scoring in this section.
Essay Criteria
Content (Scoring: circle the appropriate score)
Clearly relates or answers to the given topic or question: Clear 5—4—3—2—1—0 Missing
Gives sufficient examples/references: Sufficient 5—4—3—2—1—0 Lacking
Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect
Sufficient number of words (180-250): Target #: 5; 160-179 words: 4; 140-159 words: 3; 120-139 words: 2; 100-119 words: 1; less than 100 words: 0
Subtotal: points for content _________/25
Organization (Scoring: circle the appropriate score)
Topic or introductory sentence: Clear 5—4—Not Clear 3—2—1—Missing 0
Concluding sentence: Clear 5—4—Not Clear 3—2—1—Missing 0
Coherence (logical progression and development of ideas, good flow): Always 5—4—Sometimes 3—2—1—Never 0
Cohesion (good connections between sentences): Always 5—4—Sometimes 3—2—1—Never 0
Sentence variety (both simple and compound and/or complex): Good Variety 5—4—Some Variety 3—2—1—Never 0
Subtotal: points for organization ________/25
Grammar (Scoring: take off one point for each error in the categories indicated. Circle the # of remaining pts.)
Correct spelling (subtract 1 pt for each new error): 5 4 3 2 1 0
Correct use of articles and prepositions: 5 4 3 2 1 0
Standard capitalization: 5 4 3 2 1 0
Standard punctuation (periods, commas, semicolons): 5 4 3 2 1 0
Standard sentence word order: 5 4 3 2 1 0
Agreement between subjects & verbs, nouns and pronouns/antecedents: 5 4 3 2 1 0
Correct verb tense and usage: 5 4 3 2 1 0
Correct adverb and adjective usage: 5 4 3 2 1 0
Appropriately placed phrasal modifiers: 5 4 3 2 1 0
Standard academic diction (avoidance of slang and informal language): 5 4 3 2 1 0
Subtotal: points for grammar _______/50
TOTAL POINTS _______/100
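The word-count row of the content rubric and the way the three subtotals add up to 100 can be sketched in Python. The bands are taken from the rubric above; the function names are illustrative. The rubric does not say how to score an essay longer than 250 words, so this sketch simply gives it the target score, which is an assumption.

```python
# (lower bound of band, points), checked from the highest band down.
WORD_COUNT_BANDS = [(180, 5), (160, 4), (140, 3), (120, 2), (100, 1)]


def word_count_points(n_words):
    """Points for the 'Sufficient number of words' row of the content rubric.

    Assumption: essays over 250 words are scored like the 180-250 target band,
    because the rubric does not define a score for them.
    """
    for lower_bound, points in WORD_COUNT_BANDS:
        if n_words >= lower_bound:
            return points
    return 0  # fewer than 100 words


def essay_total(content, organization, grammar):
    """Add the three subtotals after checking them against the rubric maxima."""
    limits = {"content": (content, 25), "organization": (organization, 25), "grammar": (grammar, 50)}
    for name, (score, maximum) in limits.items():
        if not 0 <= score <= maximum:
            raise ValueError(f"{name} subtotal must be between 0 and {maximum}")
    return content + organization + grammar


print(word_count_points(172))   # falls in the 160-179 band
print(essay_total(20, 18, 41))  # out of 100
```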
V. Oral interview: For this subjectively scored portion, the following criteria will be assessed:
Grammar, Vocabulary, Fluency, Comprehension, and Pronunciation. We will use an analytic
scale, and we will use interrater reliability to check the consistency of the scoring in this section.
Oral Interview Criteria
Content (Scoring: circle the appropriate score)
Clearly relates or answers to the given topic or question: Clear 5—4—3—2—1—0 Missing
Gives adequate and meaningful examples/references: Sufficient 5—4—3—2—1—0 Lacking
Clear connection between examples/references and main ideas: Clear 5—4—3—2—1—0 Missing
Correct use of vocabulary words: Correct 5—4—3—2—1—0 Incorrect
Accuracy
Correct use of grammar: Correct 5—4—3—2—1—0 Incorrect
Clear pronunciation of words: Clear 5—4—Not Clear 3—2—1—0 Missing
Fluency
Coherence (logical progression and development of ideas, good flow): Always 5—4—Sometimes 3—2—1—0 Never
Fluency in speech (little circumlocution and few hesitations): Fluent 5—4—Somewhat Fluent 3—2—1—0 Not Fluent
TOTAL POINTS ______/40
Appendix G
Wesche’s (1983) Four Components Framework

Subtest: Listening
Stimulus Materials: The test-taker watches a video clip of “English Mania” presented by Jay Walker (2009). The test also contains five short-answer questions related to the content of the video.
Task Posed to the Learner: The test-taker must watch and listen to the video and identify important information.
Learner’s Response: The test-taker must write down their responses to the questions.
Scoring Criteria*: Questions 2 and 3 (requiring a specific number and country names) are marked using the exact word method. The remaining questions are marked using the acceptable word method. Students are given either 2 points or 0 points. For Question 3, partial credit (1 pt) is given when at least two correct countries are mentioned.

Subtest: Grammar
Stimulus Materials: The test-taker reads an article from the New York Times (Bahanoo, 2012).
Task Posed to the Learner: The test-taker must identify 15 extra words inserted within sentences that make the sentences ungrammatical based on the structural rules of English; the test-taker must pay attention to the details of the reading to find multiple grammar errors, such as in the use of articles and tenses.
Learner’s Response: The test-taker must cross out the extra words.
Scoring Criteria*: The test-taker gets points when he/she crosses out the exact incorrect words.
*Note. The keys and rubrics of the scoring criteria were all pre-established by the test designers,
although the rubric of the oral interview was modified subject to the students’ responses from the
piloting tests.

Appendix G (cont.)
Wesche’s (1983) Four Components Framework

Subtest: Reading
Stimulus Materials: The test-taker reads 1 long passage and 1 short passage. The test contains 5 multiple-choice questions for each passage.
Task Posed to the Learner: The test-taker must identify the main ideas of the readings and define the meaning of the words within the given context.
Learner’s Response: The test-taker must circle the letter representing the answer to a question.
Scoring Criteria*: The test-taker gets points when they circle the correct letters of the multiple-choice questions, as determined by the established key.

Subtest: Mini-essay Writing
Stimulus Materials: An essay prompt is presented to the test-taker.
Task Posed to the Learner: The test-taker must read and respond to the given prompt. He/She must compose an organized piece of writing with sufficient examples and correct use of vocabulary and grammar.
Learner’s Response: The test-taker must write an essay of about 180-250 words that states, explains, and supports his/her opinion on the given prompt.
Scoring Criteria*: The test-taker’s essay is subjectively scored based on an analytic rubric set by the test designers. The rubric consists of three sections: content, organization, and grammar.

Subtest: Oral Interview
Stimulus Materials: A role-play scenario is given to the test-taker.
Task Posed to the Learner: The test-taker must read the prompt, understand the context, and adopt the role given in the scenario.
Learner’s Response: The test-taker must take 2 minutes to prepare a persuasive speech that states, explains, and supports his/her opinion on the given topic and deliver it within 3 minutes.
Scoring Criteria*: The test-taker’s speech is subjectively scored based on an analytic rubric set by the test designers. The rubric evaluates two aspects of a speech: content, and fluency and accuracy.
Appendix H
Swain’s (1980) Four Principles of Communicative Language Test Development

Subtest: Listening
Start from somewhere: Our choice of this procedure is motivated by our intention to simulate an academic situation in which students are given a lecture.
Concentrate on content: Since the test-takers are international students, the topic of English learning is relevant to them, and the video also serves to activate test-takers’ schemata.
Bias for best: The test-takers get visual support in addition to the audio input. Also, they are allowed to take notes while watching the video. Spelling errors are not marked in test-takers’ responses to the comprehension questions.
Work for washback: The test-takers can (1) experience a situation like attending a real academic lecture; (2) practice note-taking skills.

Subtest: Grammar
Start from somewhere: Citing Larsen-Freeman (1991, 1997), Brown (2010) defines grammatical knowledge as comprising grammatical forms, grammatical meanings, and pragmatic meanings.
Concentrate on content: Students can relate the content to their own experience in language learning.
Bias for best: The subtest assesses multiple grammar points, such as the use of articles, adjectives, and verb tense.
Work for washback: The test-takers can (1) learn to pay attention to the details of the reading passages; (2) know that meanings are associated with grammatical forms.
Appendix H (cont.)
Swain’s (1980) Four Principles of Communicative Language Test Development

Subtest: Reading
Start from somewhere: The design of the subtest was driven by both the top-down processing and the bottom-up processing of reading comprehension (Richards & Schmidt, 2010).
Concentrate on content: Consistent with the content of the previous subtests, the two articles are also about language learning.
Bias for best: The definitions of some difficult vocabulary terms are given in the test. Key words and key sentences are either underlined or bolded for attention. Paragraphs are marked with alphabetic letters for convenience of reference.
Work for washback: The test-takers can (1) expand their vocabulary knowledge; (2) learn to use context to interpret the meanings of words; (3) identify the main ideas of the readings; (4) paraphrase the reading.

Subtest: Mini-essay Writing
Start from somewhere: Through the essay writing task, we are able to identify students’ strengths and weaknesses, including grammar usage and vocabulary knowledge.
Concentrate on content: The essay prompt, whether learning English is important or not, has been developed through the previous subtests.
Bias for best: The test-takers can use the materials provided on the test to support their opinions.
Work for washback: The test-takers can (1) write in a simulated academic context; (2) compose an argumentative essay; (3) incorporate sufficient sources into the writing.

Subtest: Oral Interview
Start from somewhere: Besides the concern of using a direct test to measure the test-takers’ oral competence, the construct of the oral test was also inspired by the frequent situations in academic settings where students are required to express their opinions orally and support them with examples.
Concentrate on content: The content is related to the theme of the test, language learning.
Bias for best: The test-takers can use the materials provided on the test to support their opinions. The test-takers have 2 minutes to prepare and jot down some notes for their speech.
Work for washback: The test-takers can (1) experience a simulated academic presentation; (2) give a persuasive speech.