A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR
CHILDREN ENTERING PRE-KINDERGARTEN
by
MI-YOUNG WEBB
(Under the Direction of Seock-Ho Kim)
ABSTRACT
The purpose of this study was to determine the psychometric characteristics of a phonological awareness assessment for pre-kindergarten children based on Messick’s (1989) framework for unitary construct validity. Four hundred and fifteen pre-kindergarten children were given eight tasks of phonological awareness drawn from “The Phonological Awareness Test” (Robertson & Salter, 1997). Four aspects of construct validity (the content, substantive, structural, and external aspects) were examined. The item analysis indicated high internal consistency; however, the items in each task were fairly difficult for this age group. Factor analysis with varimax rotation revealed that two factors may underlie the phonological awareness measurement. Although the effect size was small, multiple regression analysis indicated that a linear combination of two tasks had statistically significant predictive validity for beginning alphabet sound knowledge in pre-kindergarten.
INDEX WORDS: Validation, Messick’s unitary construct validity, Reading, Phonological awareness, Assessment
A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR
CHILDREN ENTERING PRE-KINDERGARTEN
by
MI-YOUNG WEBB
B.S., The Cheongju University, South Korea, 1997
A Thesis Submitted to the Graduate Faculty of The University of Georgia in Partial
Fulfillment of the Requirements for the Degree
MASTER OF ARTS
ATHENS, GEORGIA
2003
© 2003
Mi-Young Webb
All Rights Reserved
A CONSTRUCT VALIDATION STUDY OF PHONOLOGICAL AWARENESS FOR
CHILDREN ENTERING PRE-KINDERGARTEN
by
MI-YOUNG WEBB
Major Professor: Seock-Ho Kim

Committee: Steve Olejnik
           Paula Schwanenflugel

Electronic Version Approved:

Maureen Grasso
Dean of the Graduate School
The University of Georgia
August 2003
TABLE OF CONTENTS

LIST OF TABLES
LIST OF FIGURES
CHAPTER
I INTRODUCTION
   Reading and Academic Performance
   The Components of Reading Acquisition
   Overview
II PHONOLOGICAL AWARENESS
   Definition of Phonological Awareness
   The Role of Phonological Awareness in Reading Acquisition
   Developmental Sequence of Phonological Awareness
   Validity Test for Phonological Awareness Tasks
   Purpose of the Study
III VALIDITY
   Traditional Conception of Validity
   Unified Conception of Validity
   Validity as Integrated Evidence
   Facets of the Unitary Validity
IV METHOD
   Participants
   Materials
   Procedure
V RESULTS
   The Content Aspect of Construct Validity
   The Substantive Aspect of Construct Validity
   The Structural Aspect of Construct Validity
   The External Aspect of Construct Validity
VI DISCUSSION
   The Content Aspect of Construct Validity
   The Substantive Aspect of Construct Validity
   The Structural Aspect of Construct Validity
   The Generalizability Aspect of Construct Validity
   The External Aspect of Construct Validity
   The Consequential Aspect of Construct Validity
VII CONCLUSION
REFERENCES
APPENDIX: PHONOLOGICAL AWARENESS TEST
LIST OF TABLES

Table 1: The Maximum Scores, the Means, and the Standard Deviations for Phonological Awareness Tasks Based on the Preliminary Item Condition
Table 2: The Maximum Scores, the Means, and the Standard Deviations for Phonological Awareness Tasks Based on the Actual Item Condition
Table 3: Coefficients Alpha and the Standard Errors of Measurement for Phonological Awareness Tasks Based on the Preliminary Item Condition
Table 4: Coefficients Alpha and the Standard Errors of Measurement for Phonological Awareness Tasks Based on the Actual Item Condition
Table 5: The Mean Levels of Task Difficulty of Phonological Awareness
Table 6: Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item Condition
Table 7: Item Analyses for Rhyming Discrimination Task Based on the Actual Item Condition
Table 8: Item Analyses for Syllable Segmentation Task Based on the Preliminary Item Condition
Table 9: Item Analyses for Syllable Segmentation Task Based on the Actual Item Condition
Table 10: Item Analyses for Initial Isolation Task Based on the Preliminary Item Condition
Table 11: Item Analyses for Initial Isolation Task Based on the Actual Item Condition
Table 12: Item Analyses for Phoneme Blending Task Based on the Preliminary Item Condition
Table 13: Item Analyses for Phoneme Blending Task Based on the Actual Item Condition
Table 14: Intercorrelations among the Phonological Awareness Tasks
Table 15: Factors, Eigenvalues, and Percentage of Variance Accounted For
Table 16: Factor Loadings for One-Factor Solution
Table 17: Factor Loadings for Two-Factor Solution after Varimax Rotation
Table 18: The Means and the Standard Deviations of Alphabet Sound Upper and Lower Case Knowledge Tests
Table 19: Predictive Correlations between Phonological Awareness Tasks and Alphabet Sound Upper and Lower Case Knowledge Tests
Table 20: The Means and the Standard Deviations of Phonological Awareness Tasks by Gender Groups
Table 21: The Means and the Standard Deviations of Phonological Awareness Tasks by Ethnicity Group
Table 22: The Means and the Standard Deviations of Phonological Awareness Tasks by Socioeconomic Group
LIST OF FIGURES

Figure 1: Developmental Sequence of Phonological Awareness
Figure 2: Facets of Unitary Validity
Figure 3: Plot of Eigenvalues and Factors of Scree Test
Figure 4: The Procedure for Assessment Construction and Validation
I. INTRODUCTION
Reading and Academic Performance
Research in early reading acquisition has received considerable attention because
children’s early reading skills have a strong and continuous relationship with their later
academic performance. Children who learn to read early and well are more likely to become familiar with print and to build knowledge across domains (Cunningham & Stanovich, 1997). On the other hand, children who experience difficulties in learning to read at an early age tend to continue to struggle with reading over time regardless of remedial services (Johnston & Allington, 1991) and to fall behind in other academic areas that depend heavily on reading skills (Stanovich, 1986; Chall, Jacobs, & Baldwin, 1990; Stevenson & Newman, 1986).
The Components of Reading Acquisition
No single factor determines the emergence of literacy because reading development involves complex cognitive processes and multiple activities. Some studies have indicated positive longitudinal correlations between oral language skills and reading (Bishop & Adams, 1990). Other research suggests that vocabulary skills significantly influence learning to read (Wagner, Torgesen, Rashotte, Hecht, Barker, Burgess, Donahue, & Garon, 1997). Whitehurst and Lonigan (1998) proposed three components of emergent literacy: oral language skills, phonological processing abilities, and print knowledge. Lonigan, Burgess, and Anthony (2000) found that phonological sensitivity and letter knowledge explained 54 % of the variation in children’s decoding skills. Despite differing research views on the components of emergent literacy, a substantial body of research has revealed a significant and continuing relationship between phonological awareness and the acquisition of early reading and spelling (Bradley & Bryant, 1983; Goswami & Bryant, 1990). Much research has suggested that children’s implicit understanding of and ability to manipulate the sound system of language, known as phonological awareness, is a crucial precursor to the emergence of early literacy. Because of the important role of phonological awareness in young children’s literacy, a considerable amount of research has tried to operationalize the concept of phonological awareness.
Overview
This study investigates measures of phonological awareness for pre-kindergarten
children in terms of their psychometric characteristics. This study will focus on how the framework for unitary construct validity suggested by Messick (1989) can be implemented in practice. Before the validation study, previous research on phonological awareness, including the relationship between phonological awareness and early reading acquisition, the developmental sequence of phonological awareness, and validity studies of phonological awareness tasks, will be briefly reviewed in the next section.
II. PHONOLOGICAL AWARENESS
Definition of Phonological Awareness
Because phonological awareness involves understanding that words can be
divided into segments of sound smaller than a syllable and learning about individual
phonemes, one must know what a phoneme is in order to understand the concept of
phonological awareness (Torgesen & Mathes, 2000). A phoneme is the smallest unit of sound in a language that makes a difference in meaning. Phonemic awareness
– a subset of phonological awareness – refers to the awareness that spoken language
consists of a sequence of phonemes (Yopp & Yopp, 2000).
Broadly speaking, phonological awareness refers to the sensitivity to or explicit
awareness of and the ability to manipulate the sound units in spoken language. Thus,
phonological awareness includes the ability to generate and recognize rhyming words, to
count syllables, to segment a word into phonemes, and to separate the beginning of a word
from its ending. Beginning readers should understand the fundamental principle that
speech can be segmented and these sound units can be represented by printed forms
(Liberman, Shankweiler, Fischer, & Carter, 1974). Without phonological awareness
young children have difficulty in understanding how alphabetic transcription works, and
consequently, their ability to learn to read is hindered (Torgesen, 1999; Blachman, 1994;
Liberman, Shankweiler, & Liberman, 1989).
The Role of Phonological Awareness in Reading Acquisition
Overwhelming evidence from a variety of populations and tasks has indicated a
strong and specific relationship between phonological awareness and early acquisition of
reading and spelling (Adams, 1990; Bradley & Bryant, 1983; Bryant, MacLean, &
Bradley, 1990; Goswami & Bryant, 1990; Stanovich, 1992; Wagner & Torgesen, 1987).
Children who have better abilities in analyzing and manipulating rhymes, syllables, and
phonemes are better at learning to read than children who have difficulties in acquiring
these skills. The relationship between phonological awareness and early reading
acquisition is present even after such factors as intelligence, vocabulary skills, and
listening comprehension are partialled out (Bryant, MacLean, Bradley, & Crossland,
1990; Stanovich, 1992; Wagner & Torgesen, 1987).
Some researchers have explained that the complex relationship between the
sounds of speech and the signs of print makes it difficult for young readers to perceive
the phonemic segments in speech (Liberman, 1978; Torgesen & Mathes, 2000). For
example, the three phonemic segments of the written word lag overlap with one another in articulation (coarticulation) and create a single blended sound in speech production. This coarticulation of phonemes within words makes it difficult for beginning readers to identify phonemes as distinct parts of speech. Also, letters and phonemes do not always correspond consistently, which means that graphic symbols only approximately represent the sounds of speech in different words
(Liberman, Cooper, Shankweiler, & Studdert-Kennedy, 1967).
Torgesen and Mathes (2000) explain that phonological awareness is not the only
determinant of the early acquisition of reading but it is a critical precursor to effective
reading skills. Phonological awareness promotes children’s understanding of the
relationship between speech and alphabetic orthography. Children must understand that
speech is comprised of sound segments at the level of phonemes in order to read the
words in print (Blachman, 1994; Liberman, Shankweiler, & Liberman, 1989; Yopp &
Yopp, 2000). Also, phonological awareness helps children perceive the categories of
common sounds that are represented by common letters. The ability to observe the
correspondence between letters and sounds in words reinforces children’s knowledge of
common spelling patterns and accurate recognition of whole words that come up in print
repeatedly (Bryant, MacLean, & Bradley, 1990; Goswami, 1986, 1988; Torgesen &
Mathes, 2000). Finally, phonological awareness enables children to produce possible
words in context from partially sounded-out words by drawing on similar phonemes
in words. Indeed, children who are quick to develop the ability to analyze and to
construct a connection between sound segments and letters almost invariably become
better readers than children who have difficulties in developing these skills (Share &
Stanovich, 1995).
Developmental Sequence of Phonological Awareness
Numerous studies and interventions using various tasks of phonological awareness have found that, regardless of task requirements, phonological awareness tasks account for a large portion of the common variance of the construct that underlies the
measurement. In addition, these studies have demonstrated the different developmental
levels of task difficulty (Adams, 1990; Stahl & Murray, 1994; Stanovich, Cunningham, &
Cramer, 1984; Yopp, 1988). Understanding the developmental sequence of phonological
awareness is important because it is directly related to the issues of validity of
assessment. Different tasks involve different levels of cognitive and linguistic abilities or
age-appropriateness; thus, the child’s assessed level of phonological awareness might be
greatly determined by the complexity of the tasks (Backman, 1983; Burt, Holm, & Dodd,
1999).
Generally, the ability to analyze larger units (rhyme and syllable) is developed
prior to the ability to analyze smaller units (phoneme). Hoien, Lundberg, Stanovich, and
Bjaalid (1995) outlined that sensitivity to rhyme is thought to mark the beginning of the developmental continuum of phonological awareness, phoneme segmentation the end of the continuum, and syllable segmentation an intermediate level. Children as young as 3 years of age show sensitivity to rhyme, which is a
more global aspect of sound structure of words (Lonigan, Burgess, Anthony, & Barker,
1998; MacLean, Bryant, & Bradley, 1987). Children’s knowledge of nursery rhymes at
age 3 is significantly related to the measure of rhyme detection a year later (MacLean et
al., 1987), and early sensitivity to rhyme and alliteration predicts later awareness of
phonemes which plays an important role in reading development (Bryant et al., 1990).
There is a ceiling effect on rhyme detection and production tasks at the kindergarten
level, and most children are able to blend and segment words at the syllable level. Nonetheless, they cannot yet segment words into a series of phonemes at this age
(Blachman, 1994; Stanovich et al., 1984; Yopp, 1988). By the end of first grade, the
majority of children can manipulate phonemes. They can add, delete, or move phonemes
and generate words. More specific developmental processes of phonological awareness
can be found in Figure 1 (cf. Hill, 1999; Torgesen & Mathes, 2000).
Validity Test for Phonological Awareness Tasks
As discussed earlier, a great amount of research using various measures has
focused on the concept of phonological awareness and has found converging evidence that performances on phonological awareness tasks are intercorrelated with one another. Furthermore, regardless of the measures that have been used, phonological awareness tasks have shared a large portion of the total variance, which, in turn, provides evidence for
construct validity of phonological awareness (Hoien et al., 1995; Stanovich et al., 1984;
Yopp, 1988). Two examples of test validity for phonological awareness tasks are briefly
discussed in this section.
Yopp (1988) administered 10 commonly used phonological awareness tasks,
including a rhyming task, auditory discrimination, phoneme blending, phoneme counting,
phoneme deletion, phoneme segmentation, sound isolation, and word-to-word matching
task, to 96 kindergarten children with an average age of 5 years, 10 months. She found
that the phoneme deletion was the most difficult task, and the rhyming was the easiest
task. She conducted a principal factor analysis with oblique rotation and found that the
first factor accounted for 58.7 % of the variance and the second factor accounted for an
additional 9.5 % of the variance. In addition, phoneme blending, phoneme counting,
phoneme segmentation, and sound isolation all loaded highly on the first factor and the
two phoneme deletion tasks loaded highly on the second factor. She labeled the first
factor as “Simple Phonemic Awareness”, and the second factor as “Compound Phonemic
Awareness”. A stepwise regression analysis was also conducted, with the score on the
learning rate test as the dependent variable and 10 tests of phonological awareness as
predictors. The sound isolation task explained 52 % of the variance, and phoneme
deletion task explained 10 % of the variance in the learning rate test.
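A regression of this general kind, in which phonological awareness task scores are used to predict a criterion, can be sketched as follows. This is only an illustrative sketch: the arrays are hypothetical, and plain ordinary least squares on the full predictor set is shown rather than the stepwise procedure Yopp (1988) reported.

```python
import numpy as np

# Hypothetical data: 10 task scores (columns) predicting a learning-rate criterion.
rng = np.random.default_rng(4)
tasks = rng.normal(size=(96, 10))
learning_rate = tasks @ rng.normal(size=10) + rng.normal(scale=0.5, size=96)

# Ordinary least squares with an intercept column.
X = np.column_stack([np.ones(len(tasks)), tasks])
coef, _, _, _ = np.linalg.lstsq(X, learning_rate, rcond=None)

# Proportion of criterion variance explained by the full set of predictors (R^2).
fitted = X @ coef
r_squared = 1 - ((learning_rate - fitted) ** 2).sum() / \
                ((learning_rate - learning_rate.mean()) ** 2).sum()
print(f"R^2 = {r_squared:.2f}")
```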
Hoien, Lundberg, Stanovich, and Bjaalid (1995) utilized a very large sample size
to examine the differential validity of the different levels of phonological awareness. Six
types of phonological awareness tasks including rhyme recognition, syllable counting,
phoneme counting, initial phoneme matching, initial phoneme deletion, and phoneme
blending were administered to 128 Norwegian preschool children. The average age of
the children was 6 years, 11 months. A principal factor analysis using varimax rotation
revealed a three-factor solution. Initial phoneme matching, initial phoneme deletion,
phoneme blending, and phoneme counting were found highly loaded on the first factor
which accounted for 38.6 % of the variance. Syllable counting loaded highest on the
second factor, and rhyme recognition loaded highest on the third factor. The second and
the third factors accounted for 18.4 % and 17.6 % of the variance respectively. Hoien et
al. (1995) concluded that the study results indicated that preschool children without any formal reading instruction and with very limited reading skills showed phonemic
awareness.
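To make the analytic approach used in these studies (and later in the present study) concrete, the sketch below runs an exploratory factor analysis with varimax rotation on a persons-by-tasks score matrix and prints the eigenvalues of the correlation matrix for a scree test. It is a minimal sketch: the `scores` array, the task names, and the choice of two factors are hypothetical, and scikit-learn's maximum-likelihood estimator is used in place of the principal factoring reported by Yopp (1988) and Hoien et al. (1995).

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

# Hypothetical data: rows are children, columns are phonological awareness tasks.
rng = np.random.default_rng(0)
scores = rng.integers(0, 11, size=(415, 6)).astype(float)
tasks = ["rhyme", "syllable_seg", "initial_iso",
         "phoneme_blend", "syllable_blend", "phoneme_count"]

# Standardize the task scores so the analysis works on the correlation metric.
z = (scores - scores.mean(axis=0)) / scores.std(axis=0, ddof=1)

# Eigenvalues of the correlation matrix, e.g., for a scree test.
eigenvalues = np.linalg.eigvalsh(np.corrcoef(scores, rowvar=False))[::-1]
print("Eigenvalues:", np.round(eigenvalues, 2))

# Two-factor maximum-likelihood factor analysis with varimax rotation.
fa = FactorAnalysis(n_components=2, rotation="varimax", random_state=0)
fa.fit(z)
for task, loading in zip(tasks, fa.components_.T):
    print(f"{task:15s} {loading[0]:6.2f} {loading[1]:6.2f}")
```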
Purpose of the Study
The studies of Yopp (1988) and Hoien et al. (1995) used large sample sizes and
included a variety of tasks to systematically investigate the concept of phonological
awareness of 5 to 6 years-old children. Similarly, most of studies relating to
phonological awareness have assessed preliterate children at the school entry, prior to
formal reading instruction. Compared with this aspect, there has been much less research
focused on the development of phonological awareness at the preschool age level,
specifically at age of four; nevertheless, the considerable evidence has indicated that
preschool children as young as the age of 3 show implicit knowledge of phonological
awareness (Bryant et al., 1990; MacLean et al., 1987).
The purpose of this study was to conduct a validity study regarding the off- level
use of The Phonological Awareness Test (Robertson & Salter, 1997) for identifying
phonological awareness in preliterate pre-kindergarten children using Messick’s (1989)
framework for unitary construct validity. Because validity is the most important
consideration in a test development and use, traditional view of validity and six aspects of
the unitary concept of validity proposed by Messick (1989, 1995) are briefly reviewed
prior to the validation process for phonological awareness tasks.
III. VALIDITY
Validity is “the degree to which evidence and theory support the interpretations of
test scores entailed by proposed uses of tests” (AERA, APA, & NCME, 1999, p. 9).
Accordingly, validation is the most crucial procedure in test development and use
because it is a process of collecting evidence to support the intended interpretation of test
scores and implications of the score meaning.
Traditional Conception of Validity
The conception of validity has gradually shifted from numerous specific criterion validities to a few distinct validity types and finally to a unitary validity concept (Messick,
1989). Although there has been increasing emphasis on construct validity as a unitary
conception of validity, three or four different types of validity have been commonly
utilized in various assessment settings since the early 1950s. The traditional view of
validity argues that the types or aspects of validity depend on the inferences to be drawn
from the test scores and the implications of entailed test interpretations. These separate
types of validity and the limitation of the traditional conception of validity are briefly
discussed.
Content Validity
Content validity refers to the degree to which the content of test samples
represents the content of a particular behavioral domain of interest. Content validity is
primarily concerned with adequate sampling of the content of the domain. The
knowledge and skills that are measured by the test items should be representative of the larger domain of knowledge and skills. Another aspect of content validity involves the
format of the test such as clarity of questions or directions and appropriateness of
language. Content validity is evaluated based on the professional judgment about the
domain relevance and representativeness of the content according to specific criteria or
objectives. Based upon the agreement in judgments by a panel of content experts, test
developers revise or select the final items. Hence, content validation consists of specifying the universe of item content and the item-selection procedures (Messick, 1989).
Content validity is important because it accumulates judgmental evidence to
support the domain relevance and representativeness of the test content, which bear upon
the nature of score inferences supported by other evidence. However, Messick (1989)
argues that using content validity as the solitary validity evidence has a critical limitation.
Content validity does not take into consideration the response processes, the internal and
external structure of the test, or performance differences; thus, it does not provide enough
evidence supporting inferences to be made from the test scores.
Criterion-related Validity
Criterion-related validity is the degree to which the test scores are systematically
associated with one or more external criteria considered to directly measure the same
variable. There are two aspects of criterion-related validity – predictive and concurrent
criterion-related validity. Predictive validity refers to the extent to which the test scores
predict the future performance on the criteria, and concurrent validity indicates the extent
to which the test scores estimate the present performance on the criteria. Therefore,
criterion-related validity is a matter of how the test scores accurately predict criteria
performance. Criterion-related validity is evaluated based on the level of empirical
relationship, commonly estimated by correlations or regressions, between the test scores
and criteria scores. For this reason, determining appropriate criteria is a critical step in
criterion-related validation.
Criterion-related validity is not about the pattern of relationships between test
scores and other measures, but rather it is about prediction which is more concerned with
non-causal dependence. Furthermore, criterion-related validity relies very heavily on the
empirical relationships with selected external measures. For this reason, criterion-related
validity may be too narrow to reflect the definition of validity because it does not
consider any other sources of evidence besides specific test-criterion relationships
(Messick, 1989).
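To make the idea concrete, the brief sketch below estimates a predictive validity coefficient as the Pearson correlation between a predictor task score and a later criterion score, together with a simple least-squares prediction line. It is only an illustrative sketch; the arrays and variable names are hypothetical and do not come from the thesis data.

```python
import numpy as np
from scipy import stats

# Hypothetical scores: a phonological awareness task given at pre-k entry
# (predictor) and an alphabet-sound criterion measured later.
task_score = np.array([2, 5, 7, 3, 8, 6, 4, 9, 1, 5], dtype=float)
criterion = np.array([3, 6, 8, 2, 9, 7, 5, 10, 2, 6], dtype=float)

# Predictive validity coefficient: Pearson r between predictor and criterion.
r, p_value = stats.pearsonr(task_score, criterion)
print(f"predictive validity r = {r:.2f} (p = {p_value:.3f})")

# Simple regression line for predicting the criterion from the task score.
slope, intercept, _, _, _ = stats.linregress(task_score, criterion)
print(f"predicted criterion = {intercept:.2f} + {slope:.2f} * task score")
```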
Construct Validity
Construct validity refers to the extent to which test scores support the presence of
the psychological construct that underlies the measurement. In this manner, construct
validity is concerned with abstract and theoretical traits such as self-esteem, motivation,
temperament, and creativity. Construct validation begins with the operational definition
of the construct based on the literature reviews and theoretical reasoning. The process of
operationalizing the concept is similar to the process of content validation. Operational
definition is a process of defining the theoretical terms and specifying the hypotheses for
the legitimate experimental procedures for applying a theory (Messick, 1989). After the
construct is operationally defined, the hypotheses – the relationships between the
measures of concepts – are logically and empirically examined. In this process, it is
crucial to evaluate the test items for bias or construct-irrelevant variance that systematically influences the test scores. Finally, the empirical evidence is interpreted in terms of whether it is consistent with the hypotheses or with rival theories.
Construct validity can be assessed by internal and external test structures, which
examine the pattern of relationships among item scores or between test scores and other
measures. Construct validity also involves study of performance differences over time,
across groups, and different settings in response to experimental treatment. On that
ground, construct validity is an integration of any evidence to support the meaning of the
test scores (Messick, 1989).
Unified Conception of Validity
Traditional distinct types of validity – content, criterion-related, and construct
validity – have been widely utilized in various assessment settings. However, it is
common that inferences to be drawn from the test scores require multiple types of validation approaches rather than just one (e.g., Cronbach & Meehl, 1955). Moreover,
content validity as sole validity evidence is insufficient because it does not reflect on the
internal and external test structures and response processes. Thus, it does not provide
evidence that bears on inferences to be made from the test scores. Likewise, criterion-
related validity strictly depends on the specific test-criterion relationships and does not
consider any other sorts of evidence. On that account, Messick (1995) argues that the
traditional conception of validity is fragmented and incomplete because it fails to take
into consideration the evidence for the actual and potential consequences of score
interpretation and use. In addition, he argues that the types of validity are not
alternatives but supplements of one another because all of these forms of evidence
fundamentally support the interpretation and implication of the test scores. Hence, the
relation between the evidence and the inferences should determine the focus of the validation approach rather than a type of validity (Messick, 1989). This is why validity is
identified as a unitary concept.
In Messick’s (1989, 1995) view, construct validity incorporates content relevance
and representativeness as well as criterion-relatedness since information about the
domain content relevance and about the specific criterion-relationships predicted by the
test scores clearly influences score interpretation. Therefore, construct validity comprises
almost all aspects of validity evidence. A unitary conception of validity should intermix
considerations of content, criteria, and consequences into a construct framework to
empirically test the rational hypotheses about the interpretation and utility of the test
scores (Messick, 1989, 1995).
Messick’s new unified concept of validity heavily emphasizes both score
meaning and social values in test interpretation and use. Messick (1989, 1995) suggests
six distinguishable aspects of construct validity to address the multiple and interrelated
validity questions to justify score interpretation and use. These are the content, substantive, structural, generalizability, external, and consequential aspects of construct validity.
Descriptions of these six aspects are outlined to guide the validation of phonological
awareness tasks.
The Content Aspect of Construct Validity
Test content refers to the “themes, wording, and formats of the items, tasks, or
questions on a test as well as guidelines for procedures regarding administration and
scoring” (AERA et al., 1999, p. 11). Hence, the content aspect of construct validity
subsumes theoretical and empirical analyses of adequacy of content relevance,
representativeness, and technical quality (Messick, 1989, 1995). This validation process
is to gather the construct-relevant sources of task difficulty and to guide the rational
development and scoring of performance tasks and other assessment formats.
The sources of invalidity are worth addressing because they can occur mostly
during the theoretical and empirical specification of the construct domain, that is, the content aspect of
validation (Benson, 1998). According to Messick (1989, 1995), one of the threats to
validity is known as “Construct Underrepresentation”. Construct underrepresentation
occurs when the assessment is defined too narrowly and fails to adequately cover the
important theoretical domain of the construct. Another threat to validity is “Construct-Irrelevancy”. Construct-irrelevant variance occurs when the assessment is defined too broadly and contains excess reliable variance associated with a distinct construct in addition
to the focal construct. That is, aspects of the task are extraneous to the focal construct
and make the task irrelevantly difficult or easy for particular individuals or groups.
In essence, evidence about content is primarily concerned with the basis for
specifying the boundaries and structure of the construct to be assessed. The construct and
test content domain are carefully evaluated through the professional judgments of a panel of experts and documentation that addresses the potential sources of irrelevant difficulty or easiness requiring further analysis, as well as the sampled domain processes in terms of their
functional importance (AERA et al., 1999; Messick, 1989, 1995).
On that ground, one needs to consider the definition of phonological awareness –
the sensitivity to or awareness of, and the ability to manipulate the sound units in spoken
language. Then, phonological awareness tasks should be designed to assess an individual’s awareness of and ability to manipulate the spoken-language segments that make up
words. Regarding the sources of invalidity, understanding the developmental sequence
of phonological awareness in children is important because the difficulty and complexity
of the tasks directly influence children’s performances. The age of subjects and
demographic characteristics should be addressed in this validation step.
The Substantive Aspect of Construct Validity
The substantive aspect of construct validity requires engagement between judged
content relevance and representativeness and empirical response consistency or
performance regularity in the assessment tasks (Loevinger, 1957; Messick, 1989).
Theoretical and empirical analyses of response processes provide evidence for
appropriate sampling of domain and accrue empirical evidence for sampled processes
that are actually engaged by respondents in task performance.
Inferences about processes involved in performance are generally developed by
analyzing individual responses such as eye movements, response times, performance
strategies, or responses to particular items. Empirical evidence of response consistency
also derives from correlation patterns among parts of the test and between the test and
other variables or from consistency in response times for task segments (AERA et al.,
1999; Messick, 1995). In addition to evaluating the responses to tasks, the scoring rubrics
or scoring guidelines should be carefully reviewed for the appropriateness of scoring
processes to the intended interpretation or construct definition.
In brief, the matter of test content entails not only the content representativeness
of the construct measure but also the process representation of the construct and the
degree to which these processes are reflective of construct measurement. The content representativeness of the test items needs to be assessed in terms of the empirical domain structure that underlies the ultimate test form and score interpretation (Messick, 1989, 1995). Therefore, the process for scoring and recording responses should be clearly indicated.
The Structural Aspect of Construct Validity
The structural aspect of validity refers to “the extent to which structural relations
between test items parallel the structural relations of other manifestations of the trait
being measured” (Loevinger, 1957, p. 661). The analyses of internal structure of a test
are to determine the degree of the relationships among test items and the intended
structure of the theoretical domain. Thus, the structural aspect of construct validity
examines the consistency or fidelity of the scoring structure related to the structure of the
construct domain.
The structural aspect can be assessed by various statistical methods such as
intercorrelation among the items and subscales, exploratory and confirmatory factor
analysis, and item response theory. The specific types of analysis and interpretations of
the results rely on the implication and utility of the test scores (AERA et al., 1999). For
instance, if a set of test items of increasing difficulty is of interest, empirical analyses of
the number of items answered correctly or the pattern of scoring key should be provided.
The structural aspect of validity also includes the appropriateness and adequacy of
scaling and equating procedures using item response theory. The adequacy of scaling is
the degree to which the relative weights for different types of items are consistent with
the construct interpretation of the test results (Miller & Linn, 2000).
Indeed, the structural component of construct validity includes both the selection
or construction of relevant assessment tasks and the logical development of construct-based scoring criteria, guidelines, and rubrics. The internal structure of the assessment, including the intercorrelations among the items and subtests, the degree of homogeneity in the test,
and the dimensionality of the interitem structure should be consistent and reflect the
internal structure of the construct domain (Messick, 1989, 1995). In this aspect, item
analyses including item difficulty, item discrimination, internal consistency, and factor
analysis should be reviewed in addition to the scoring guidelines or the procedure of
scoring on the phonological awareness tasks.
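For illustration, the sketch below computes the usual item statistics mentioned here from a 0/1 item-response matrix: item difficulty (proportion correct), item discrimination (corrected item-total correlation), and coefficient alpha. It is a minimal sketch; the response matrix is hypothetical and is not the thesis data.

```python
import numpy as np

# Hypothetical 0/1 responses: rows are children, columns are items of one task.
rng = np.random.default_rng(1)
responses = (rng.random((415, 10)) < 0.35).astype(float)

n_items = responses.shape[1]
total = responses.sum(axis=1)

# Item difficulty: proportion of children answering each item correctly.
difficulty = responses.mean(axis=0)

# Item discrimination: correlation of each item with the total excluding that item.
discrimination = np.array([
    np.corrcoef(responses[:, i], total - responses[:, i])[0, 1]
    for i in range(n_items)
])

# Coefficient alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
item_var = responses.var(axis=0, ddof=1)
alpha = n_items / (n_items - 1) * (1 - item_var.sum() / total.var(ddof=1))

print("difficulty:", np.round(difficulty, 2))
print("discrimination:", np.round(discrimination, 2))
print(f"coefficient alpha = {alpha:.2f}")
```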
The Generalizability Aspect of Construct Validity
Generalizability is concerned with the numerous factors such as sampling
fluctuations and reliability of measures that contribute to systematic variability in
behavior and performance. Generalizability refers to the degree to which a construct
interpretation empirically generalizes to and across population groups (population
generalizability), situations or settings (ecological generalizability), time periods
(temporal generalizability), and task domains (task generalizability) (Messick, 1989).
For example, ecological generalizability involves the sources of invalidity from the
standardization of test materials and administration conditions. As another example,
population generalizability examines the test scores across random samples of diverse
ethnic groups in order to indicate that the test measures the same construct in these
populations. In addition, the limits of score meaning are also influenced by the degree of
generalizability across observers or raters of the task performances.
The degree of generalizability of construct meaning across contexts can be
evaluated by assessing the degree to which test scores reflect comparable patterns of
relationships with other measures or similar responsiveness to treatment across groups,
situations, times, and tasks (Messick, 1989). Also, generalizability theory applies analysis of variance models and random variance components to estimate universe-score variance and examines the consistency of the assessment procedures
under different conditions of population groups or tasks (Miller & Linn, 2000). The
generalizability aspect of validity evidence is determined by the degree of correlation of
the assessment tasks with other tasks representing the construct, by the nature of the
construct assessed, and by the scope of its theoretical applicability (Messick, 1989, 1995).
In summary, generalizability is primarily concerned with sources of measurement
error associated with the sampling of tasks, occasions, and raters which underlie
traditional reliability. The generalizability study presents an evidential basis for
judgments of the test interpretation and use across various contexts.
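As a concrete illustration of how such variance components are estimated, the sketch below treats a fully crossed persons-by-tasks design, estimates the person, task, and residual components from the ANOVA mean squares, and forms a generalizability coefficient for relative decisions. It is a minimal sketch under this assumed one-facet design; the score matrix is hypothetical.

```python
import numpy as np

# Hypothetical persons x tasks score matrix (one-facet, fully crossed p x t design).
rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=2.0, size=(415, 6))
n_p, n_t = X.shape

grand = X.mean()
ss_p = n_t * ((X.mean(axis=1) - grand) ** 2).sum()   # persons
ss_t = n_p * ((X.mean(axis=0) - grand) ** 2).sum()   # tasks
ss_res = ((X - grand) ** 2).sum() - ss_p - ss_t      # person x task interaction + error

ms_p = ss_p / (n_p - 1)
ms_t = ss_t / (n_t - 1)
ms_res = ss_res / ((n_p - 1) * (n_t - 1))

# Estimated variance components for persons, tasks, and residual.
var_res = ms_res
var_p = max((ms_p - ms_res) / n_t, 0.0)
var_t = max((ms_t - ms_res) / n_p, 0.0)

# Generalizability coefficient for relative decisions based on n_t tasks.
g_coef = var_p / (var_p + var_res / n_t)
print(f"var_p = {var_p:.3f}, var_t = {var_t:.3f}, var_res = {var_res:.3f}")
print(f"G coefficient (relative, {n_t} tasks) = {g_coef:.3f}")
```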
The External Aspect of Construct Validity
The external aspect refers to the degree to which the relationships of test scores
with other measures and non-assessment behaviors or performances reflect the expected
relations in the theory of construct being assessed (Loevinger, 1957). Indeed, “the
construct represented in the assessment should rationally account for the external pattern
of correlations” (Messick, 1995, p. 746).
The external component of validity evidence fundamentally depends on the
correlations between the total score of assessment and any subscores. Accordingly, the
external aspect can be established by the theoretical bases for the obtained patterns and
by structural equation models to reproduce the observed correlations in construct –
consistency. According to Benson (1998), multitrait-multimethod matrix procedure
connects the structural and external stages of validation. The multitrait-multimethod
matrix generates two important correlation patterns. One is the “convergent validity
coefficient”, which indicates the relationships between the test scores and other measures
of the same construct on theoretical grounds. Another correlation pattern is the
“discriminant validity coefficient” that specifies the relationships between the test scores
and measures of distinct constructs (AERA et al., 1999; Benson, 1998; Messick, 1995).
In addition, group differentiation also can be relevant if the theoretical construct suggests
the presence or absence of the group differences in the proposed test interpretation.
Contrasting the mean scores across gender, ethnic, and socioeconomic groups is an example of this approach.
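A group contrast of this kind can be carried out as in the sketch below, which compares task means for two groups with an independent-samples t test and a standardized mean difference. It is a minimal sketch with hypothetical scores, not the thesis data.

```python
import numpy as np
from scipy import stats

# Hypothetical task scores for two groups (e.g., boys and girls).
rng = np.random.default_rng(3)
boys = rng.normal(loc=4.8, scale=2.1, size=213)
girls = rng.normal(loc=5.1, scale=2.0, size=202)

# Independent-samples t test for the mean difference between groups.
t_stat, p_value = stats.ttest_ind(boys, girls)

# Cohen's d using the pooled standard deviation.
pooled_sd = np.sqrt(((len(boys) - 1) * boys.var(ddof=1) +
                     (len(girls) - 1) * girls.var(ddof=1)) /
                    (len(boys) + len(girls) - 2))
d = (boys.mean() - girls.mean()) / pooled_sd

print(f"t = {t_stat:.2f}, p = {p_value:.3f}, Cohen's d = {d:.2f}")
```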
In short, in the external stage of validation, the meaning of the test scores is verified externally by assessing the relevance of their potential relationships with other criterion measures. Test validation, in essence, is meant to ensure that empirical evidence of such relations supports the use of the scores for the applied purpose.
The Consequential Aspect of Construct Validity
The consequential aspect appraises the intended and unintended consequences of
test uses and implications of score interpretation. AERA et al. (1999) addresses the
distinction between validity evidence about consequences and issues of social policy. If
consequences of assessment are traced to any sources of invalidity such as construct
underrepresentation or construct-irrelevant variance, they are directly related to validity.
Hence, consequences as validity evidence affect or change the score interpretations and
implications of score meaning (Miller & Linn, 2000).
Consequences of assessment are either intended or unintended. Intended
consequences include improved instructional or educational practices, a test used in
placement decisions, or selections of effective treatment in therapy. On the other hand,
unintended or adverse consequences include bias in the assessment, unfairness of
assessment, and misinterpretations for certain individuals or groups. Fundamentally, the
measurement concern is with any negative impact on individuals or groups that derives from any source of invalidity. For example, low scores should not occur
because the test measures knowledge or skills unrelated to the construct domain. Also, low
scores should not occur because the assessment contains something, unintended to be part of the construct, that is sensitive for particular individuals or groups.
It is clear that the consequential aspect of validity evidence comprises the value
implication of score interpretations as a basis for actions in addition to actual and
potential consequences of test use (Messick, 1995). Since consequences as a source of
evidence for validity affect the inferences and use of the assessment, the value
implications of score interpretations should be addressed as a part of validity framework
(Messick, 1989; Miller & Linn, 2000).
Validity as Integrated Evidence
The six aspects of construct validity are emphasized as a unified concept that
addresses score-based interpretations, utility of scores, and value implications as a basis
for action. Validity rationale eventually accumulates various sources of evidence to
provide a sound scientific basis for the intended interpretation of test score for specific
use (AERA et al., 1999). Thus, integrating various components of evidence involves
appropriate sampling of domain, relevant assessment task construction procedures,
adequate score reliability, proper test administration and scoring procedures, accurate
score scaling and equating, standard setting, and careful attention to test invalidity.
These aspects of validity should be viewed as interdependent and complementary
forms of validity evidence rather than distinct and substitutable validity types. Indeed,
evidence relevant to all six aspects needs to be integrated into an overall validity
judgment to support score-based interpretations and action implications. Once again, the
unified concept of validity brings considerations of content, criteria, and consequences
together into a construct framework for testing rational hypotheses about theoretical and
score-based inferences (Messick, 1989).
Facets of the Unitary Validity
The unified concept of validity is highlighted because it integrates the
appropriateness, meaningfulness, and usefulness of score-based inferences. Messick
(1989, 1995) suggests two interconnected facets of the unitary validity concept as a way
of cutting and combining validity evidence. The facets of validity prevent excessive reliance on selected forms of evidence and emphasize the supplementary
role of content- and criterion-related inferences to applied decisions and actions based on
the test scores.
The sources of justification of the testing (evidence or consequence) and the
function or outcome of the testing (interpretation or use) generate a four-fold
classification as presented in Figure 2. The evidential basis of test interpretation is
construct validity because construct validity means evidence and rationales support the
score meaning. The evidential basis of test use is also construct validity because it
involves the score meaning. Also, the evidential basis of test use is supported by
evidence for the relevance and utility of the test to the specific applied purpose and
setting. The consequential basis of test interpretation is the evaluation of value
implications of score meaning and is construct validity since the score interpretation is
necessary to assess the value implications. Finally, the consequential basis of test use is
the evaluation of both actual and potential social consequences of applied testing. The
social consequences also involve evidence of score meaning, of relevance, and of utility.
IV. METHOD
This study utilized data obtained from an ongoing study of pre-kindergarten literacy development by Hamilton, Schwanenflugel, Neuharth-Pritchett, and Restrepo. The
descriptions presented here were based on the information provided by these original
investigators.
Participants
A total of 415 pre-kindergarten children (213 boys and 202 girls) participated in
the study. The initial investigators recruited participants at the pre-kindergarten
registration in spring of 2002 in three Northeastern Georgia school districts. Children
were attending 26 public elementary schools in these three school districts. The children’s ages ranged from 4 years to 5 years, 7 months, with an average age of 4 years, 6 months at the time school started in August 2002. The sample was ethnically diverse: 41.7 % (n = 173) were African-American, 33.4 % (n = 139) were Caucasian, 18.5 % (n = 77) were Hispanic, 5 % (n = 21) were Asian, and 1.4 % (n = 6) were Bi-Racial. English was the first language for 75.8 % (n = 314) of the children, 20.4 % (n = 85) spoke Spanish as a first language, and 3.8 % (n = 16) spoke another first language. Children were predominantly drawn from a low to lower-middle socioeconomic population: 32.9 % (n = 137) of the children were reported to receive free or reduced-price lunch, and 71 % (n = 295) of the children’s families were reported to earn less than $25,000 per year.
The majority of children in this age population did not have any detectable letter knowledge prior to entering pre-kindergarten. None of the children had acquired any reading skills relevant to the given tasks at the pre-kindergarten level.
Materials
Phonological Awareness Tasks
A subset of The Phonological Awareness Test (Robertson & Salter, 1997) was
used to assess phonological awareness of pre-kindergarten children in this study. The
Phonological Awareness Test was designed to diagnose deficits in phonological
processing and phoneme-grapheme correspondence. The intended population of The
Phonological Awareness Test is five through nine years of age. The Phonological
Awareness Test included rhyming, segmentation, isolation, deletion, substitution,
blending, graphemes, and decoding subtests.
Eight phonological awareness tasks were drawn from The Phonological
Awareness Test by the initial investigators, Schwanenflugel and Blake. The initial investigators included the tasks that were considered to be potentially significant predictors of reading ability in previous studies and the tasks that were to be included in the
intervention. The rhyming discrimination, sentence segmentation, syllable segmentation,
initial isolation, syllable blending, phoneme blending, consonant graphemes, and long
and short vowel graphemes tasks were included to assess the child’s phonological awareness
in this study. However, instructions were modified slightly and ceiling rules were created
because of the age of the participants. Each of the tasks is described in detail as follows.
The actual task items and correct responses are presented in the Appendix.
Rhyming Discrimination: The rhyming discrimination task was to measure the
child’s ability to identify rhyming words presented in pairs. The examiner said to the
child, “I am going to say two words and ask you if they rhyme. Listen carefully. Do
these words rhyme? Fan – man.” Then the child should respond with either “yes” or
“no”. The examiner indicated whether each response was correct or incorrect, and
provided the correct response, “Fan – man. Yes, they do rhyme.” If the child responded
with other than “yes” or “no”, the examiner repeated the question to elicit a “yes” or “no”
response. The stimulus phrase, “Do these words rhyme?” could be repeated, but no other
prompts were given to the examinees. The actual ten task items were administered to the
child who responded correctly to at least one of the three practice items. Thus, the child
who responded to all three practice items incorrectly was excluded from the task
administration. Practice items included “Fan – man”, “Fan – tan”, and “Fan – dog”.
Only words to which the child responded correctly on their own were scored as correct, with a
possible score range of 0 to 10, excluding the three practice items. The examiner stopped
administering the task if there were three consecutive wrong items in the child’s
responses.
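The administration and scoring rules described above (administer the task only if the child passes at least one of the three practice items, stop after three consecutive incorrect responses, and count the number correct among the administered items) can be summarized in the sketch below. It is a hypothetical helper written to restate the rules, not a script used in the original study.

```python
from typing import List, Optional

def score_task(practice_correct: List[bool],
               item_correct: List[bool],
               stop_after: int = 3) -> Optional[int]:
    """Score one phonological awareness task under the study's ceiling rule.

    practice_correct: outcomes of the three practice items.
    item_correct: outcomes of the (up to ten) actual items, in order.
    Returns the number correct, or None if the task was not administered
    because the child missed all practice items.
    """
    # The task is administered only if at least one practice item was correct.
    if not any(practice_correct):
        return None

    score = 0
    consecutive_wrong = 0
    for correct in item_correct:
        if correct:
            score += 1
            consecutive_wrong = 0
        else:
            consecutive_wrong += 1
            # Stop administering after three consecutive incorrect responses.
            if consecutive_wrong == stop_after:
                break
    return score

# Example: child passes practice, answers two items correctly, then reaches the ceiling.
print(score_task([True, False, True],
                 [True, True, False, False, False, True, True, True, True, True]))  # -> 2
```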
Sentence Segmentation: The purpose of the sentence segmentation task was to assess the child’s ability to divide sentences into their constituent words. The examiner
told the child, “I am going to say a sentence, and I want you to clap one time for each
word I say. My house is big. Now, clap it with me.” The examiner said the sentence
again and clapped as she/he said each word. “My – house – is – big. Now, you try it by
yourself. My house is big.” The child should respond with clapping four times, while
she/he repeated the sentence word by word. The examiner indicated whether the child’s
response was correct or incorrect. If the child responded incorrectly, the examiner
repeated the sentence and asked the child to clap with her/him. The stimulus phrase,
“Clap one time for each word I say.” was given to the examinees without any other
prompts. Three practice items, including “My – house – is – big.”, “My – name – is -
_______.", and "I – like – dogs.” were given prior to the actual task items. However, the
task administration took a considerably long time for this age population. The initial investigators decided that this task was too long for the concentration level of this age group. The sentence segmentation task was dropped from the battery after it was administered to about 50 pre-kindergarten children.
Syllable Segmentation: The purpose of the syllable segmentation task was to
assess the child’s ability to divide the words into syllables. The examiner told the child,
“I am going to say a word, and I want you to clap one time for each word part or syllable
I say. Saturday. Now, clap it with me.” The examiner said the word again and clapped
once as she/he said each syllable. “Sat – ur – day. Now, try it by yourself.” The words,
including “Saturday”, “Friday”, and “Dog” were given as practice items. The child
should respond with claps, one for each syllable as the child said the word by syllable.
The examiner acknowledged a correct response. If the child responded incorrectly, the
examiner repeated the word and asked the child to clap with her/him. The stimulus
phrase, “Clap one time for each syllable in the word.” was repeated, but no other prompts
were given to the child. After three practice trials, the actual task items were
administered to the child who responded to at least one of the three practice items
correctly; hence, the child was excluded from the task administration if he/she responded
to all three practice items incorrectly. Only words that the child responded to correctly
on their own were scored as correct. The examiner stopped the task administration if the
child responded to three consecutive items incorrectly. The child’s score was the number
of correct responses, with a possible score range of 0 to 10, excluding the three practice
items.
Initial Isolation: The initial isolation task was to measure the child’s ability to
identify the initial phoneme in a word. The examiner began the task by saying, “I am
going to say a word, and I want you to tell me the beginning or first sound in the word.
What is the beginning sound in the word CAT?” The child should respond with /k/ or
“kuh”. The examiner gave feedback by saying, “That is correct.” or by saying, “The
beginning sound in CAT is /k/.” The stimulus phrase, “What is the beginning sound in
_______.”, was given to the child. The examiner emphasized the word “sound” if the
child gave letter names; however, she/he scored the item as incorrect and did not repeat the item. After the three practice trials, including “CAT”, “MAD”, and “JANE”, the
examiner administered the actual task items to the child who correctly responded to at
least one of the three practice items. The items that the child responded to correctly on
their own were scored as correct. Scores had a possible range of 0 to 10 correct,
excluding the three practice items. The task administration stopped if the child responded
to three consecutive items incorrectly.
Syllable Blending: The syllable blending task was designed to assess the child’s
ability to blend individually presented syllables to form a word. The examiner told the
child, “I will say the parts of a word. You guess what the word is. What word is this?
Ta – ble.” The examiner paused for one second between syllables. If the child responded
with table as a whole word without pausing between syllables, the child’s response was
scored as correct. The examiner indicated whether each response was correct or incorrect.
If the child repeated the word in parts, the examiner told the child, “Say it faster, like this,
table.” Three practice items, including ta – ble, mo – ther, and he – llo were given to the
examinees before the administration of actual ten task items. However, the task
administration took too long for this age population. The task was dropped from the
battery after it was administered to 50 pre-kindergarten children.
Phoneme Blending: The purpose of the phoneme blending task was to measure
the child’s ability to blend phonemes together to form a word when phonemes were
presented individually. The examiner told the child, “I will say the sounds of a word.
You guess what the word is. What word is this? /P – o – p/.” The examiner paused for
one second between sounds. The child should respond with the word pop without
pausing or distorting any sounds. If the child repeated the sounds as given by the
examiner, she/he was told, “Say it faster, like this pop.” Each child was given three
practice items, including /p – o – p/, /d – o – g/, and /c – a – t/, prior to administration of
the test items. The examiner acknowledged a correct response. If the child responded
incorrectly, the examiner said, “/p – o – p/ is pop.” The stimulus phrase, “What word is
this?” was given to the child without any other prompts. The examiner administered the
actual task items to the child who responded correctly to at least one of the three practice
items. The child’ score was based on the total number of correct responses, with a
possible range of 0 to 10 correct, excluding the three practice items. When there were
three consecutive wrong items in the child’s responses, the examiner stopped
administering the task.
Consonants Graphemes: The consonants graphemes task was to assess the
child’s knowledge of sound and symbol correspondence when the letters were
individually presented. The task was not given to children who did not know the letters in their own names. The examiner told the child, "I am going to show you some
letters. I want you to tell me what sound each letter makes.” Some of the letters had two
acceptable sounds. For instance, if the child responded with /k/ or /s/ for the letter c, the
examiner scored the item as correct. However, the consonants graphemes task took too long to administer, and the initial investigators decided to drop it from the battery after it had been administered to 50 children.
Long and Short Vowels Graphemes: The purpose of this task was to measure
the child’s knowledge of sound and symbol correspondence of vowels. The examiner
showed the vowels cards to the child and said to the child, “I am going to show you some
letters. I want you to tell me what sound each letter makes.” The task was given to the
children who knew the letters in his or her name. If the child responded with one vowel
sound, the examiner said to the child, “Tell me the other sound this letter makes.” There
were no practice items for this task. However, the administration for this task was too
long for this age population. The task was dropped from the battery after the task was
administered to 50 children.
Criterion Measure
There are many different measures that can be employed to appraise the criterion-
related validity. The initial investigators developed an alphabet knowledge test to
measure the child’s ability to identify the letter names and sounds of the alphabet. An
alphabet test was included to determine the predictive validity of each of the phonological
awareness tasks.
Alphabet Knowledge Test: An alphabet test was to assess the child’s knowledge
of alphabet letter names and sound correspondence. The examiner showed the child a list
of upper and lower case of letters presented in a random order. The examiner pointed to
each letter sequentially and asked the child, “Do you know what this is?” If the child
responded with a correct letter name, he or she was asked, “What sound does it make?”
The examiner recorded the child's responses as either correct or incorrect on the
paper. Any correct pronunciation of the given letter was deemed letter sound knowledge.
For example, if the alphabet letter ‘C’ or ‘c’ was pronounced /k/, /s/, or /ch/, the child’s
response was scored as correct for the letter sound knowledge. The alphabet test
included four subtests: letter name knowledge (upper and lower case) and letter sound knowledge (upper and lower case). Each of the alphabet subtests consisted of 16 upper case or 16 lower case letters presented in a random order. Only the alphabet letters that the child responded to correctly on his or her own were scored as correct. Scores on the alphabet tests ranged from 0
to 16.
Procedure
Assessment Procedure
Assessment of the phonological awareness tasks took place over a three-month period, from August through October of the pre-kindergarten year of 2002. Fifteen
examiners were trained by the initial investigators for two days prior to the assessment
session. The initial investigators observed the assessment process for a week in order to ensure that the examiners were fully familiar with the administration and scoring procedures.
The number of sessions needed to complete the assessment depended on the examinees' levels of concentration and frustration. Each of the phonological awareness tasks was
administered individually in a quiet room. Items in each task were directly drawn from
The Phonological Awareness Test (Robertson & Salter, 1997), and were given to the
examinees in sequential order. The phonological awareness tasks were administered in random order to avoid order effects. All examinees were given three practice items prior to the actual task items. The ten actual task items were administered only to examinees who responded to at least one of the three practice items correctly. In consideration of the examinees' age, frustration, and concentration levels, the
task administration was stopped if the examinees responded to three consecutive items
incorrectly. If the child was losing track of the task, the examiner went back to the
practice items to remind the child of the task.
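The starting and termination rules just described amount to a simple scoring routine. The sketch below is only an illustration of those two rules under assumed 0/1 response vectors; the function name and data layout are hypothetical and are not part of the original scoring materials.

```python
def score_task(practice, items):
    """Apply the administration rules described above to one task.

    practice: 0/1 correctness for the three practice items.
    items: 0/1 correctness for the ten actual task items, in order.
    Returns the number of correct actual items, or None if the child
    did not qualify (all three practice items answered incorrectly).
    """
    if sum(practice) == 0:                 # starting rule: need at least one practice item correct
        return None
    score, consecutive_wrong = 0, 0
    for response in items:
        if response == 1:
            score += 1
            consecutive_wrong = 0
        else:
            consecutive_wrong += 1
            if consecutive_wrong == 3:     # termination rule: stop after three misses in a row
                break
    return score

# Hypothetical child: qualifies on one practice item, then administration
# stops after the fourth through sixth items are missed consecutively.
print(score_task([0, 1, 0], [1, 1, 1, 0, 0, 0, 1, 1, 1, 1]))  # -> 3
```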
The criterion measure, the alphabet knowledge test, was given to the examinees during January and February of 2003. The alphabet test was
administered by a new set of assessors who were similarly trained.
Validation Procedure
The validation study for the phonological awareness tasks in this study focused on
the content, substantive, structural, and external aspects of construct validity proposed by
Messick (1989). Each aspect of validation procedures is briefly reviewed as follows.
The content aspect of validity began with a literature review of the relationship between phonological awareness and the reading skills of three- to seven-year-old children. The initial investigators selected phonological awareness tasks that were considered to be related to later reading skills. The content aspect of construct validity was enhanced
by a pilot study with 19 pre-kindergarten children and 11 kindergarten children.
The substantive aspect of construct validity focused on the age-appropriateness of
task administration. The initial investigators reconstructed guidelines for task
administration and scoring procedures. Because the age population in this study was
younger than the intended population of The Phonological Awareness Test, the
investigators set the ceiling for all subtasks. The actual task items were administered only to examinees who responded to at least one of the three practice items correctly. Moreover, if the examinees responded to three consecutive items incorrectly, or if the examinees showed symptoms of frustration, the examiners stopped the task
administrations. The examiners were trained on the phonological awareness task
administration and scoring procedures by the initial investigators for two days. The
assessment process was observed by the investigators to ensure that the examiners were fully familiar with the phonological awareness task administration and scoring procedures. In addition, the mean performances and standard deviations were calculated, as well as the internal consistency of each task using coefficient alpha.
The structural aspect of construct validity was established by the empirical
analyses of item difficulty, item discrimination, and intercorrelations among the tasks.
Factor analysis was also conducted to evaluate the internal structure of the assessment.
Finally, as a part of the external aspect of validity, criterion-relatedness was
evaluated by multiple regression analysis with total score on the alphabet upper and
lower sound knowledge test as the dependent variable and the scores on phonological
awareness tasks as the independent variables. In addition to the multiple regression
analysis, the correlation coefficients between the alphabet name and sound knowledge tests and the phonological awareness tasks were calculated. The external aspect of construct validity also included group differentiation in phonological awareness performance by gender, ethnicity, and socioeconomic status.
V. RESULTS
The Content Aspect of Construct Validity
The descriptions of the phonological awareness tasks are presented in the Appendix.
In addition to the task descriptions, the Appendix displays the items and correct
responses, including the three practice items and the ten actual items.
The Substantive Aspect of Construct Validity
Descriptive Statistics
Table 1 and Table 2 summarize subjects’ performances on the tasks of
phonological awareness. The possible maximum scores, the mean scores, and the
standard deviations are presented, as well as the internal consistency of each task for this
sample.
Table 1 is based on the scores that took into consideration practice items. Recall
that the actual task items were administered to subjects who responded to at least one of
the three practice items correctly. If the subject was given the actual task items, the first item, labeled in the current context as the 'preliminary item', was scored as correct. Likewise, if the computations of the means, the standard deviations, and the reliabilities included the preliminary item, this was called the 'preliminary item condition'. Hence, Table 1
had a possible score range of 0 to 11, and a score of 0 indicated that the child responded
to all three practice items incorrectly. Table 2 summarizes subjects' performance based on the actual task items. If the computations of the means, standard deviations, and reliabilities were based only on the ten actual task items, this was called the 'actual item condition'.
A possible score range in the actual item condition was 0 to 10.
In both the preliminary item condition and the actual item condition, the rhyming discrimination task had the highest mean score (M = 3.64, SD = 3.88 and M = 3.10, SD = 3.46, respectively). On the other hand, the initial isolation task had the lowest mean performance among the tasks in both conditions (M = 0.87, SD = 2.54 and M = 0.68, SD = 2.29, respectively). In the actual item condition, the phoneme blending task also had a low mean score of 0.74, with a
standard deviation of 1.86.
Task Reliability
The reliability of each task of phonological awareness was determined by
coefficient alpha. Table 3 displays the coefficient alpha of each task of phonological
awareness, as well as standard error of measurement in the preliminary item condition,
which took into consideration three practice items. Table 4 presents the coefficient alpha
and standard error of measurement of each task of phonological awareness based on the actual item condition. According to Hills (1981), a reliability coefficient should be at least .85 if the test is to be used to make decisions about individuals. The reliability coefficients indicated that all four phonological awareness tasks had high internal consistencies, with α > .85. In both the preliminary item condition and the actual item condition, the initial isolation task had the highest internal consistency, with coefficient alphas of .97 and .98, respectively. In contrast, the syllable segmentation task had the lowest internal consistency, with coefficient alphas of .89 and .88, respectively.
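For readers who want to reproduce this kind of reliability summary, the sketch below shows one way to compute coefficient alpha and the standard error of measurement from a matrix of dichotomously scored responses. It is a minimal illustration on toy data, not the computation actually used for Tables 3 and 4.

```python
import numpy as np

def cronbach_alpha(item_scores):
    """Coefficient alpha for an (examinees x items) matrix of 0/1 scores."""
    x = np.asarray(item_scores, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()   # sum of the item variances
    total_variance = x.sum(axis=1).var(ddof=1)     # variance of the total scores
    return (k / (k - 1)) * (1.0 - item_variances / total_variance)

def standard_error_of_measurement(item_scores):
    """SEM = SD of total scores * sqrt(1 - alpha)."""
    totals = np.asarray(item_scores, dtype=float).sum(axis=1)
    return totals.std(ddof=1) * np.sqrt(1.0 - cronbach_alpha(item_scores))

# Toy 0/1 response matrix (five examinees, four items); illustration only.
scores = [[1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 1, 1], [0, 0, 0, 0], [1, 1, 0, 0]]
print(round(cronbach_alpha(scores), 3), round(standard_error_of_measurement(scores), 3))
```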
The Structural Aspect of Construct Validity
Item Analyses
All of the items on the phonological awareness tasks were dichotomously scored.
The difficulty level of each task was obtained by dividing the total score mean by the number of items on the task. Table 5 displays the mean difficulty levels. Examinees experienced the greatest difficulty with the initial isolation task, which required identifying the beginning phonemes in words (P = .079 in the preliminary item condition, and P = .067 in the actual item condition). The rhyming discrimination task proved to be the easiest
among the tasks (P = .330 in the preliminary item condition and P = .310 in the actual
item condition).
Because examinees experienced great difficulty with some of the tasks, item
analyses were conducted based on the number of examinees who actually responded to
the item in addition to the total number of examinees. The item difficulty corresponded
to the proportion of examinees who responded to the item correctly. The value of point
biserial correlation between an item score and total score was used for item
discrimination. A point biserial correlation coefficient of .350 or greater is considered
to differentiate relatively high ability examinees from relatively low ability examinees.
None of the items across the phonological awareness tasks had item discrimination that
was less than .350. The results are presented for respective tasks below.
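Both statistics can be computed directly from a 0/1 response matrix. In the sketch below, the matrix passed in can be restricted either to the total sample or to only the examinees who actually responded to the items, matching the two analysis conditions described above; the function is illustrative, not the analysis code used in the study.

```python
import numpy as np
from scipy import stats

def item_analysis(item_scores):
    """Item difficulty and point-biserial discrimination for 0/1 scores.

    item_scores: (examinees x items) array. Difficulty is the proportion of
    examinees answering each item correctly; discrimination is the point
    biserial correlation between each item score and the total score.
    """
    x = np.asarray(item_scores, dtype=float)
    difficulty = x.mean(axis=0)
    total = x.sum(axis=1)
    discrimination = np.array([
        stats.pointbiserialr(x[:, j], total)[0] for j in range(x.shape[1])
    ])
    return difficulty, discrimination
```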
Rhyming Discrimination: Tables 6 and 7 display the results of item analyses on
the rhyming discrimination task. The item discrimination ranged from .484 to .823 in the
preliminary item condition, and ranged from .471 to .817 in the actual item condition.
Approximately 46% of the examinees responded to all three practice items incorrectly and did not qualify to take the task. Examinees were more likely to have difficulty in detecting the non-rhyme words than in detecting the rhyme words. All of the non-rhyme words had item difficulty levels of .169 to .222 based on the total number of examinees, and of .393 to .484 based on the number of examinees who actually responded to the items. Although the levels of item difficulty were expected to systematically decrease as the task administration progressed, the item difficulties seemed to be unsystematically distributed.
Syllable Segmentation: About 54 % of examinees responded to at least one of
the three practice items correctly. Tables 8 and 9 show the item difficulty and item
discrimination of the syllable segmentation task. In the preliminary item condition, the
item discrimination ranged from .466 to .727. In the actual item condition, item
discrimination ranged from .475 to .742. Examinees had greater difficulty with words containing more syllables (e.g., watermelon or kindergarten) than with words containing fewer syllables (e.g., pizza or candy). All of the four-syllable words had item difficulties of less than .100 when the item analyses were based on the total number of examinees. On the other
hand, those items had slightly higher levels of item difficulty of .162 to .204 when the
analyses were based on the number of examinees who actually responded to the items.
The levels of item difficulty seemed to be unsystematically distributed on the syllable
segmentation task.
Initial Isolation: Examinees had the greatest difficulty with the initial isolation task.
Only 19 % of the examinees responded to at least one of the three practice items
correctly. Tables 10 and 11 summarize the item difficulty and item discrimination on the
initial isolation task. Item discrimination ranged from .609 to .953 in the preliminary
item condition, and ranged from .839 to .955 in the actual item condition. When the item
analyses were conducted based on the total number of examinees, the levels of item
difficulty seemed to systematically decrease. Moreover, none of the actual task items had a difficulty level greater than .083. In contrast, the levels of item difficulty seemed to be unsystematically distributed when the item analyses were based on the number of examinees who actually responded to the items, and the item difficulty levels increased dramatically under that condition. The initial isolation task seemed to be too difficult for this age group.
Phoneme Blending: Tables 12 and 13 display the item analyses of the phoneme blending task. The actual items of the phoneme blending task were administered to about 31% of the total examinees, indicating that about 69% of the examinees responded to all
three practice items incorrectly. Item discrimination ranged from .566 to .772 in the
preliminary item condition, and ranged from .577 to .755 in the actual item condition.
Although more examinees responded correctly to at least one of the three practice items on the phoneme blending task than on the initial isolation task, examinees seemed to have more difficulty with the actual task items on the phoneme blending task. When the analyses were based on the number of examinees who actually responded to the items, none of the items had an item difficulty level greater than .50 except the first item (P = .598). Furthermore, the levels of item difficulty seemed to systematically decrease whether the analyses were based on the number of examinees who actually responded or on the total number of examinees. The phoneme blending task also seemed to be too difficult for this age group.
Task Intercorrelations
The interrelationships among the phonological awareness tasks are shown in the correlation matrix in Table 14. The correlation coefficients were computed based on the actual item condition. Using the Bonferroni approach to control for Type I error across the six correlations (.05/6 = .0083), all of the tasks were significantly correlated with one another. The tasks that correlated most highly were the initial isolation and phoneme blending tasks (r = .51, p < .001). The syllable segmentation and phoneme blending tasks had the lowest correlation coefficient (r = .32, p < .001). The percentage of variance accounted for by the significant correlations ranged from 10.2% to 26%, indicating medium to large relationships (J. Cohen & P. Cohen, 1983).
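This Bonferroni-adjusted screening of the task intercorrelations can be expressed compactly. The sketch below assumes a matrix of task total scores and a list of task labels; it is an illustration of the procedure rather than the analysis code actually used.

```python
import numpy as np
from itertools import combinations
from scipy import stats

def bonferroni_correlations(scores, names, alpha=0.05):
    """Pairwise Pearson correlations with a Bonferroni-adjusted alpha level.

    scores: (examinees x tasks) array of task totals; names: task labels.
    With four tasks there are six pairs, so the adjusted level is .05/6 = .0083.
    """
    n_tasks = len(names)
    n_pairs = n_tasks * (n_tasks - 1) // 2
    threshold = alpha / n_pairs
    for i, j in combinations(range(n_tasks), 2):
        r, p = stats.pearsonr(scores[:, i], scores[:, j])
        flag = "significant" if p < threshold else "n.s."
        print(f"{names[i]} x {names[j]}: r = {r:.2f}, p = {p:.4f} ({flag})")
```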
Factor Analysis
A principal component factor analysis was carried out on the correlation matrix of
phonological awareness tasks (see Table 14 for correlations). The KMO (Kaiser–Meyer–Olkin measure of sampling adequacy) of .722 indicated that the correlation matrix of phonological awareness tasks was of 'middling' adequacy for factoring. Two criteria were used to determine the number of factors to rotate: the eigenvalues-greater-than-one criterion and the scree test. Table 15 displays the eigenvalues and the percentage of variance
and the scree test. Table 15 displays the eigenvalues and the percentage of variance
accounted for. The eigenvalues indicate the variance accounted for by each factor, and
SPSS extracts the number of factors that have eigenvalues greater than one (Green,
Salkind, & Akey, 1997). Only the first factor exceeded the eigenvalues-greater-than-one
criterion for the number of factors, and it accounted for 54.8% of the total variance. The factor loadings under the eigenvalues-greater-than-one criterion are presented in Table 16.
The plot of eigenvalues indicated that a two-factor solution might also be
appropriate, especially given that an additional 18.3% of the variance was accounted for (see Figure 3). Two factors were extracted by specifying the number of factors in the
analysis, and were rotated using a varimax procedure. Table 17 presents the loadings of
the phonological awareness tasks on the factors after a varimax rotation, revealing that the smaller-unit tasks, phoneme blending and initial isolation, loaded highly on Factor 1, whereas the larger-unit tasks, rhyming discrimination and syllable segmentation, loaded highly on Factor 2. This implies that two factors might underlie the measurement of the four phonological awareness tasks.
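The eigenvalue extraction and varimax rotation reported here can be approximated from the task correlation matrix alone. The sketch below assumes that the 4 x 4 correlation matrix of Table 14 is available as a NumPy array R; it implements a generic principal-component extraction and the standard iterative varimax algorithm, not the exact SPSS routine used in the study, and it omits the KMO statistic.

```python
import numpy as np

def principal_components(R, n_factors):
    """Unrotated principal-component loadings from a correlation matrix R."""
    eigenvalues, eigenvectors = np.linalg.eigh(R)          # ascending order
    order = np.argsort(eigenvalues)[::-1]
    eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]
    loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])
    return eigenvalues, loadings

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    """Varimax rotation of a loading matrix (standard iterative SVD algorithm)."""
    L = np.asarray(loadings, dtype=float)
    p, k = L.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3 - (gamma / p) * rotated @ np.diag((rotated ** 2).sum(axis=0)))
        )
        rotation = u @ vt
        if criterion != 0.0 and s.sum() / criterion < 1 + tol:
            break
        criterion = s.sum()
    return L @ rotation

# Example usage (R is the 4 x 4 task correlation matrix from Table 14):
# eigenvalues, loadings = principal_components(R, n_factors=2)
# rotated_loadings = varimax(loadings)
# percent_variance = 100 * eigenvalues / R.shape[0]
```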
The External Aspect of Construct Validity
Relationships to Alphabet Knowledge Test
The mean performances and the standard deviations on four tests of alphabet
name and sound knowledge are displayed in Table 18, including the possible maximum
scores. The letter name knowledge-upper case test had the highest mean score (M =
12.06, SD = 9.92), and the letter sound knowledge-lower case test had the lowest mean score
(M = 4.03, SD = 6.67). The predictive correlations between four tasks of phonological
awareness and four tests of alphabet knowledge are presented in Table 19. The
correlation coefficients were computed based on the number of examinees who actually
responded to the phonological awareness task items. Using the Bonferroni method to
control for Type I error across the 16 correlations, a p-value of less than .0031 (.05/16 = .0031) was required for significance. None of the predictive correlations between the phonological awareness tasks and the alphabet knowledge tests were statistically significant. The initial isolation task had the highest
correlation with the letter sound knowledge-upper case test (r = .25, n = 36, p = .139).
The phoneme blending task had the lowest correlation with the letter name knowledge-
lower case test (r = -.03, n = 78, p = .767).
Regression Analysis
A forward regression analysis was conducted with a total score on the alphabet
sound-upper and lower case tests as the dependent variable and the four tasks of
phonological awareness as the independent variables. The regression analysis was
conducted based on the total number of examinees. The mean performance on the
alphabet sound knowledge test was 9.06, with a standard deviation of 13.61.
A linear combination of two tasks, initial isolation and phoneme blending, made a significant contribution to explaining the variation in the alphabet sound knowledge test, F(2, 398) = 5.45, p = .005. The sample multiple correlation was .163, indicating that
approximately 2 % of the variance of the alphabet sound knowledge test in the sample
can be accounted for by the linear combination of initial isolation task and phonemes
blending task. The regression equation is shown below.
Predicted Alphabet Sound = 1.12 (Initial Isolation) – 0.90 (Phoneme Blending) + 9.08
The squared cross-validated correlation coefficient was calculated to evaluate how useful the sample regression equation would be when applied to other examinees in the population (Browne, 1975). The squared cross-validated correlation coefficient was fairly small (Rcv² = .019) and was similar in value to the squared sample multiple correlation coefficient (R² = .027, given R = .163).
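For illustration, the sketch below fits the same kind of two-predictor ordinary least squares model and estimates shrinkage with a leave-one-out cross-validated R². The leave-one-out estimate is a simple empirical stand-in for, not a reproduction of, Browne's (1975) analytic formula, and the array names X (initial isolation and phoneme blending scores) and y (alphabet sound totals) are assumptions.

```python
import numpy as np

def ols_fit(X, y):
    """Ordinary least squares; returns coefficients with the intercept last."""
    X1 = np.column_stack([X, np.ones(len(y))])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return beta

def r_squared(X, y, beta):
    """Squared multiple correlation for a fitted OLS model."""
    X1 = np.column_stack([X, np.ones(len(y))])
    resid = y - X1 @ beta
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

def loo_cv_r_squared(X, y):
    """Leave-one-out cross-validated R^2: refit without each case in turn,
    predict the held-out case, and score the pooled predictions."""
    predictions = np.empty(len(y))
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta = ols_fit(X[keep], y[keep])
        predictions[i] = np.append(X[i], 1.0) @ beta
    resid = y - predictions
    return 1.0 - resid @ resid / ((y - y.mean()) @ (y - y.mean()))

# Example usage (assumed arrays):
# beta = ols_fit(X, y)
# sample_r2 = r_squared(X, y, beta)
# cross_validated_r2 = loo_cv_r_squared(X, y)
```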
Group Differentiation
Gender Differences: A series of independent samples t-tests was conducted to evaluate the relationship between gender and performance on each of the phonological awareness tasks. The Bonferroni procedure was used to control for Type I error across the tests, with a p-value of less than .0125 (.05/4) required for significance. The mean performances and standard deviations on each of the phonological awareness tasks are shown in Table 20. Effect size, an index of practical importance, was calculated as the standardized mean difference. The independent samples t-tests indicated that the groups
did not significantly differ on the following tasks: rhyming discrimination (t (394) =
0.136, p = .892, d = .014); syllable segmentation (t (392) = 0.627, p = .531, d = .063);
initial isolation (t (391) = -0.045, p = .964, d = -.004); and phoneme blending (t (391) = -
0.658, p = .511, d = -.070).
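Each of these comparisons pairs an independent samples t-test with a standardized mean difference. A minimal sketch, assuming two arrays of task scores (one per gender group), is shown below; it is not the analysis code used in the study.

```python
import numpy as np
from scipy import stats

def group_comparison(group_a, group_b):
    """Independent samples t-test plus Cohen's d (pooled-SD standardized mean difference)."""
    a, b = np.asarray(group_a, dtype=float), np.asarray(group_b, dtype=float)
    t, p = stats.ttest_ind(a, b)                     # equal-variance t-test
    pooled_sd = np.sqrt(((len(a) - 1) * a.var(ddof=1) + (len(b) - 1) * b.var(ddof=1))
                        / (len(a) + len(b) - 2))
    d = (a.mean() - b.mean()) / pooled_sd
    return t, p, d

# With four tasks, compare each p-value to the Bonferroni-adjusted level .05 / 4 = .0125.
```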
Ethnicity Differences: Table 21 displays the means and standard deviations on each task of phonological awareness by ethnic group. A series of one-way
analysis of variance was conducted to determine whether there were differences between
ethnic groups. The Bonferroni method was used to control for the Type I error rate
across the tests (.05/4 = .0125). The ANOVA results revealed no statistically significant differences among the ethnic groups' performances on the phonological awareness tasks: rhyming discrimination (F(4, 312) = 2.42, p = .049, partial η² = .030); syllable segmentation (F(4, 310) = 0.38, p = .826, partial η² = .005); initial isolation (F(4, 309) = 1.16, p = .328, partial η² = .015); and phoneme blending (F(4, 309) = 0.57, p = .687, partial η² = .007).
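The same pattern, a one-way ANOVA followed by an effect size, can be sketched as follows. In a one-way design, partial eta squared reduces to SS_between / SS_total; the group arrays are assumed inputs, not the study data.

```python
import numpy as np
from scipy import stats

def one_way_anova_with_eta_squared(groups):
    """One-way ANOVA across groups plus eta squared.

    groups: a list of 1-D arrays of task scores, one array per ethnic group.
    """
    f, p = stats.f_oneway(*groups)
    all_scores = np.concatenate([np.asarray(g, dtype=float) for g in groups])
    grand_mean = all_scores.mean()
    ss_between = sum(len(g) * (np.mean(g) - grand_mean) ** 2 for g in groups)
    ss_total = ((all_scores - grand_mean) ** 2).sum()
    return f, p, ss_between / ss_total
```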
Socioeconomic Differences: The two socioeconomic groups were identified based on whether or not the child received free or reduced-price lunch. Approximately 30% of the participants received free or reduced-price lunch and were identified as the lower socioeconomic group. The mean performances and standard deviations on each of
the phonological awareness tasks are shown in Table 22. A series of independent samples t-tests was conducted to examine the relationship between socioeconomic status and performance on the phonological awareness tasks, using the Bonferroni method to control for Type I error across the tests (.05/4 = .0125). The independent samples t-tests indicated no significant relationship between socioeconomic status and
the performances on the phonological awareness tasks: rhyming discrimination (t (381) =
0.491, p = .624, d = .061); syllable segmentation (t (379) = 0.236, p = .814, d = .027);
initial isolation (t (378) = -0.676, p = .500, d = -.077); and phoneme blending (t (378) =
1.137, p = .256, d = .130).
VI. DISCUSSION
The current study was designed to examine the psychometric characteristics of phonological awareness assessment in pre-kindergarten children. The distinguishing feature of the validation study is that it pursues the six distinguishable and interdependent aspects of unitary construct validity suggested by Messick (1989). Based upon this theoretical framework, the study aimed to empirically integrate various components of evidence to form an overall validity judgment that supports the intended score interpretation and the implications of score meaning. The aspects of construct validity on which the study focused and the limitations of the study are discussed in the following sections, along with a restatement of the six aspects of construct validity.
The Content Aspect of Construct Validity
The validity evidence about the content is to set up the theoretical and empirical
basis for specifying the boundaries and the structure of the construct domain to be
assessed. The theoretical domain entails the scientific theory about the construct,
previous research, and one’s own observations. The empirical domain involves the
specific set of observed variables that measure the construct (Benson, 1998). Hence, a
matter for discussion about the content-related evidence is to address the professional
judgment and documentation to ensure all important parts of the construct domain are
covered (Messick, 1995).
The content-related validation was accomplished primarily by examining previous research about phonological awareness development in young children. The
initial investigators, Schwanenflugel and Blake, reviewed approximately 64 studies using
a wide variety of phonological awareness tasks to measure three to seven-year-old
children’s knowledge of the sound segments with intent to design phonological
awareness intervention for ongoing research, “PAVEd for Success” (Hamilton,
Schwanenflugel, Neuharth-Pritchett, & Restrepo, 2002). They summarized the studies by
the population age, the types of tasks used, the types of study design, and the findings of
the study. The initial investigators selected a subset of eight tasks of phonological
awareness that were considered to be significantly related to later reading and decoding skills. The initial investigators also took into consideration the developmental progression of phonological awareness. They included tasks that were
considered to be the beginning of the developmental continuum, such as rhyme and
syllable tasks in order to measure the beginning levels of phonological awareness. The
phoneme and grapheme tasks were included to assess the later development of the
phonological awareness. The tasks and the items were directly drawn from The
Phonological Awareness Test (Robertson & Salter, 1997). The initial investigators
conducted a pilot study with 19 pre-kindergarten and 11 kindergarten children from a local elementary school, with parental consent, during December and January of 2002.
The initial investigators systematically investigated and brought the boundaries of
theoretical domain into focus based on the series of previous studies concerning the
construct. Furthermore, the tasks and the items used in the study were drawn from an instrument with established norms. Nonetheless, some of the phonological awareness tasks in the study proved to be potentially incompatible with this age group during task administration (cf. the item analysis results). For instance, some of the tasks were dropped from the battery because the administration took too long for the age level. For another example, the initial isolation and phoneme blending tasks
seemed to be too difficult for this age population. This might be due to the fact that the
intended age population of The Phonological Awareness Test was discordant with the age
population of the study. The test manual indicates that administering The Phonological
Awareness Test to children younger than 5 years may not be appropriate since they are
normally not developmentally ready to perform all of the assessment tasks in The
Phonological Awareness Test. Yet, the test manual points out that it is left to the researcher's discretion whether the administration of particular tasks would be beneficial for obtaining useful information (Robertson & Salter, 1997).
As discussed earlier, understanding the developmental sequence of phonological
awareness is important because the different developmental levels of task difficulty are
directly related to the issues of assessment validity. The child’s assessed level of
phonological awareness can be dramatically affected by the difficulty or complexity of
the tasks; that is, the different types of tasks depend on the different levels of cognitive
and linguistic abilities of the child.
Task items that exceed the subjects' attention span or developmental level of task difficulty may lead to construct invalidity because such tasks are irrelevantly difficult for the age group. Therefore, the tasks should be revised for this age level. For example, the initial investigators might need to reconstruct items by using words that are more familiar to this age group.
One way of reconstructing the tasks or items is to convene a panel of experts to evaluate content and format relevance. The content experts' judgment about the
degree to which the item reflects the content defined by the facet of the domain
specification can provide ongoing professional test-development and systematic
documentation of the consensus of multiple judges (Messick, 1989). The judgment of
experts on content relevance can be numerically summarized using statistical techniques [e.g., the index of item congruence (Hambleton, 1980)]. The index of item congruence ranges
from -1 to +1, with the highest value of +1 indicating that all content experts agree that
the item is congruent with the domain specification. In addition, factor analysis or multidimensional scaling of relevance ratings by multiple experts can be a useful tool for examining the theoretical boundaries of the construct for the purpose of content validation (Benson, 1998; Messick, 1989).
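As a rough illustration of how such expert judgments can be summarized numerically, the sketch below simply averages -1/0/+1 congruence ratings per item. It is a simplified stand-in, not Hambleton's (1980) index of item-objective congruence, and the ratings shown are hypothetical.

```python
import numpy as np

def congruence_summary(ratings):
    """Simplified per-item congruence summary: the mean of expert ratings,
    where each expert rates an item -1 (not congruent), 0 (unsure), or +1
    (congruent) with the domain specification. The value ranges from -1 to
    +1 and reaches +1 only when every expert judges the item congruent.

    ratings: (items x experts) array of -1/0/+1 judgments.
    """
    return np.asarray(ratings, dtype=float).mean(axis=1)

# Three items rated by four hypothetical experts.
print(congruence_summary([[ 1,  1, 1,  1],    # unanimous agreement  -> 1.00
                          [ 1,  0, 1, -1],    # mixed judgments      -> 0.25
                          [-1, -1, 0, -1]]))  # mostly not congruent -> -0.75
```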
The Substantive Aspect of Construct Validity
The substantive component of construct validity incorporates the content
properties and the response consistencies. Indeed, the substantive aspect is to provide
theoretical rationales and empirical evidence of response consistencies or performance
regularities that manifest the domain specifications (Loevinger, 1957; Messick, 1989,
1995).
The substantive aspect of validity in the present study focused on the structure of
task administration and scoring procedures in order to make subjective judgments and to
show that the scores were based on the completion of a process. Regarding the
concentration levels and the cognitive abilities of the age population, the initial
investigators set up two types of ceiling for all the subtasks. The ceiling for
administration starting rule was that the actual task items were administered to subjects
who responded to at least one of the three practice items correctly. Then, the ceiling for
administration termination rule was that the task administration was stopped if there were
three consecutive incorrect items in the responses. Setting the ceiling for the tasks might
be one of the reasons that some of the tasks turned out to be too difficult. For instance,
the majority of the subjects did not qualify to take the actual task items on the initial isolation and phoneme blending tasks. Terminating the task administration after three consecutive incorrect responses also reduced the number of respondents as the administration progressed. The reduction in the number of respondents might influence the levels of item difficulty. Use of the ceiling for both the starting rule and the termination rule may require careful inspection because the observed set of responses
used to estimate the subjects’ abilities to successfully perform the task would be
restricted by the application of the ceiling.
The empirical evidence of response consistency in the study was derived from the
correlation patterns among the items on each task. The internal consistency was
measured by coefficient alpha, revealing that all four tasks had high internal consistencies, with α > .85. The high internal consistencies of the phonological awareness tasks in the present study are likely a consequence of the task difficulty or of setting the ceiling for the tasks. For example, if a particular task was difficult for most of the subjects, the variance would be small, and the task reliability would increase. Accordingly, one should be cautious in interpreting the coefficient alpha
since the task reliability is an important consideration in task selection, and the task
reliability can be affected by multiple factors, such as the variance, the length of the task,
or the quality of instrument itself.
In addition, Messick (1989) suggests a combined convergent-discriminant
strategy for test construction as an elaboration of substantive approach. The convergent-
discriminant strategy is to develop measures of two or more distinct constructs at the same
time. If items in the combined pool correlate more highly with their own purported construct score than with the scores for other constructs, the items are kept on a given construct scale. Hence, item selection could be systematically based upon convergent
and discriminant evidence, while method contaminants could be suppressed at the same
time. The present study was not able to apply such a strategy since the items were drawn from a commercial assessment instrument. If the whole task construction process had been employed, it would have been feasible to conduct such an elaboration of the substantive approach, investigating convergent and discriminant evidence for item selection in order to provide explicit reference to task coverage and to attune the item pool rationally to the nature of the construct.
The Structural Aspect of Construct Validity
The structural aspect of construct validity entails the analyses of internal structure
of the task that appraise the relationships among the task items and the theory of the
construct domain. Messick (1995) notes that the structural aspect of validity should
evaluate not only the selection or construction of assessment tasks related to the domain
construct but also the rational development of construct-based scoring criteria, rubrics,
and guidelines. The structural component of validation in the study subsumes the
empirical analyses of item difficulty, item discrimination, and factor analysis in addition
to the task intercorrelations.
Item Analyses
The results of item analyses obtained in the current study agree with previous
studies regarding the levels of task difficulty. Generally, the rhyme task is thought to be the easiest, and phoneme deletion or phoneme segmentation is considered to be the most difficult among the phonological awareness tasks (Hoien et al., 1995; Stanovich et al.,
1984; Yopp, 1988). Likewise, the present study found that rhyming discrimination was
the easiest, while initial isolation was the most difficult among the phonological
awareness tasks used in the study.
As noted earlier, the item analyses were conducted under two sets of conditions.
First, there was the preliminary item condition that took into consideration three practice
items. In this case, the preliminary item was scored as 1 if the subject responded to at
least one of the three practice items correctly; otherwise, it was scored as 0 on the
preliminary item. Then, there was the actual item condition which considered only the
actual task items. Therefore, the item difficulties of the preliminary items in Tables 6, 8, 10, and 12 represent the proportion of subjects who qualified to take the actual task items.
This was based on the assumption that the score of 0 on the preliminary item condition
differed from the score of 0 on the actual item condition.
The second set of item analyses conditions relied on the number of subjects
considered in the data analyses. The item analyses were conducted based on the total number of subjects (N = 415), and based on the number of subjects who actually responded to the items. Indeed, the subjects who had responded to three consecutive items incorrectly were excluded from the latter set of item analyses.
This was because the examinees experienced great difficulty with some of the tasks, and
this strategy was to ensure more suitable item analyses.
It was assumed that the item difficulties on each of the tasks would systematically
decrease as the task administrations progressed because of the ceiling for termination
rule. The results of item analyses indicated that the levels of item difficulty were
unsystematically distributed on rhyming discrimination and syllable segmentation tasks
when both the total number of subjects and the actual number of respondents on the items
were considered for the data analyses. The item difficulties on the initial isolation task seemed to systematically decrease as the administration progressed when the item
analyses were based on the total number of subjects. However, the levels of item
difficulty were unsystematically distributed when the actual number of respondents on
the items was applied to the data analyses. In contrast, the levels of item difficulty
seemed to systematically decrease on the phoneme blending task based on both the total
number of subjects and the actual number of respondents on the items.
Interestingly, there were great discrepancies in the levels of item difficulty on the
initial isolation task when the item analyses were based on the total number of subjects
and based on the number of subjects who actually responded to the items. When the
analyses considered the number of subjects who actually responded to the items, the
levels of item difficulty increased greatly. The item difficulties ranged from .053 (the last
actual item, laugh) to .190 (preliminary item) when the data analyses were based on the
total number of subjects. On the other hand, the item difficulties ranged from .190
(preliminary item) to .857 (the fourth actual item, fudge) when the item analyses were
based on the number of subjects who actually responded to the items. Similarly, the
levels of item difficulty on the phoneme blending task ranged from .031 (the last actual item, /s – l – i – p – çr/) to .308 (preliminary item) when the total number of subjects was applied to the item analyses. When the item analyses were based on the number of subjects who actually responded to the items, the levels of item difficulty ranged from .216 (the second actual item, /n – ç/) to .598 (the first actual item, /b – oi/) (see Tables 10 and 12).
Item discrimination for each item was estimated by the point biserial correlation
coefficient. None of the items on any of the tasks had item discrimination of less
than .35, revealing that the items discriminated well between the subjects with relatively
high abilities and relatively low abilities.
Separate item analyses on all three practice items, instead of on the combined preliminary item, would provide valuable information for evaluating the appropriateness of the ceiling and for estimating subjects' abilities to successfully perform the tasks. That is, the ability of subjects who responded to all three practice items incorrectly is likely to differ from the ability of subjects who responded to only one practice item incorrectly. In that sense, it would be desirable to record all of the subjects' responses on the practice items, in addition to the actual task items, so that more detailed empirical analyses of those items could be used to estimate the subjects' potential abilities to successfully perform the tasks.
Task Intercorrelations
Findings of previous studies indicated that various tasks to measure the
knowledge of sound segments were correlated with one another (e.g., Hoien et al., 1995;
Stanovich et al., 1984; Yopp, 1988). Likewise, the four tasks of phonological awareness
used in the study were significantly intercorrelated, suggesting that they tap much of the
same construct that underlies the measurements.
Factor Analysis
Since statistically significant interrelationships were obtained in the correlation
matrix, a principal component factor analysis was conducted in order to examine the
underlying structure. Using the eigenvalues-greater-than-one criterion, the factor
analysis extracted one factor, which accounted for 54.8 % of the total variance. Each of
the tasks strongly loaded on the factor, revealing that the construct may explain each task
well. Yopp (1988) conducted factor analysis based on the ten tasks of phonological
awareness and yielded a two-factor solution. She labeled the first factor as Simple
Phonemic Awareness and the second factor as Compounded Phonemic Awareness.
Hoien and his associates (1995) conducted a factor analysis with six tasks of phonological awareness and found a three-factor solution: a phoneme factor, a syllable factor, and a rhyme factor. The present study does not agree with these two studies
regarding the dimensionality. This might be due to the fact that the current study
conducted factor analysis based on a limited number of tasks.
The plot of eigenvalues was also used to determine the number of factors to rotate
and showed that a two-factor solution might also be appropriate. Since an additional 18.3% of the total variance was accounted for by the second factor, the factor analysis extracted two factors by specifying the number of factors in the analysis. The phoneme blending and initial isolation tasks loaded highly on the first factor, with loadings of .88 and .79, respectively, while the rhyming discrimination and syllable segmentation tasks loaded highly on the second factor, with loadings of .81 and .81, respectively. This finding is somewhat consistent
with the findings of Hoien and his colleagues (1995). Their findings indicated that the
ability to analyze the smaller units, phonemes, is separable from the ability to analyze the larger units, rhymes or syllables. Since the scree test yields a more accurate analysis than the eigenvalues-greater-than-one criterion (Green, Salkind, & Akey, 1997) and the second factor accounted for a large amount of the total variance, it is concluded that two factors underlie the construct of phonological awareness. However, it would be more advisable
to conduct confirmatory factor analysis to verify the underlying structure of phonological
awareness found in the present study.
The Generalizability Aspect of Construct Validity
The generalizability component is to examine the replicability or consistency of
assessment results across population groups, situations, time periods, and task domains,
in order to set boundaries of score meaning (Messick, 1995). According to Messick (1989), the generality of construct meaning can be evaluated by any or all of the techniques of construct validation. Assessing comparable correlation patterns with other measures, examining test scores across random samples of different groups (e.g., ethnic, cultural, or SES groups), and combining indicators of test-retest reliability and construct meaning are examples of techniques to appraise the generalizability of score
meaning. Therefore, the present study also assesses the generality of construct meaning, since the purpose of the study was to empirically follow the construct validation process advocated by Messick (1989), although it provides only limited evidence about the consistency of assessment results across multiple levels of random facets of phonological awareness assessment.
Devising a more direct way to appraise the generalizability aspect of construct
validity would be beneficial. For example, Benson (1998) recommended generalizability theory as a useful method to differentiate types of errors in measurement
and to provide evidence for how well the empirical domain represents the theoretical
domain. Furthermore, she suggests an informative set of studies that includes
confirmatory factor analysis and generalizability theory. Confirmatory factor analysis is
designed to determine how well the specific set of observed variables fit the structure of
the theoretical domain, and generalizability theory is to evaluate how adequately the
items are representative of the empirical domain.
The External Aspect of Construct Validity
The external aspect of construct validity is to evaluate how well the assessed construct empirically correlates in an expected way with different constructs and
characteristics of the subjects. The evidence about the external structure becomes
especially important if the assessment results are used for selection, placement, licensure,
or program evaluation (Messick, 1995). To establish external evidence, the present study examines the empirical relationships between the phonological awareness tasks and the alphabet knowledge tests through correlation coefficients and multiple regression analysis, as well as group differentiation.
Relationships to Alphabet Knowledge Tests
None of the phonological awareness tasks were statistically significantly correlated with the four tests of alphabet knowledge. This finding contradicts the
findings of Lonigan and his associates (2000) that there was a predictive relation between
phonological awareness and later letter knowledge. This conflict might be due to the difference in the time interval between the administration of the phonological awareness tasks and the
alphabet knowledge test. Lonigan and his associates (2000) had about a 12-month time
interval between the phonological awareness tasks at time 1 and the letter knowledge test
at time 2. On the other hand, the current study administered the alphabet name and sound knowledge tests after about a four-month interval. These non-significant
correlations between the phonological awareness tasks and the alphabet knowledge tests
might also be the results of unknown characteristics of the subjects which might affect
the test scores in the present study. The subjects in the study came from various ethnic and language backgrounds. Thus, there might be some outliers or confounding variables, such as limited English proficiency or speech impairment, that affected the score interpretation. For example, examining the outliers through the residuals would be beneficial, as would gathering more detailed information about language-related impairments. Further investigation with data collected later than in the current study, together with more explicit reading tests, might provide a clearer explanation of whether phonological awareness is a significant predictor of later reading and decoding skills.
Regression Analysis
A linear combination of the initial isolation and phoneme blending tasks made a statistically significant contribution to accounting for the variance in the alphabet sound-upper and lower case test, although only 2% of the variance in the alphabet sound test was accounted for by the linear combination of phonological awareness tasks, and the squared cross-validated correlation coefficient was a similarly small .019. The result of
regression analysis in the study is similar to the finding of Hoien and his colleagues
(1995) that phonemic awareness proved to be a more potent predictor of early reading
acquisition than syllable or rhyme tasks. Because the linear combination of initial
isolation and phoneme blending tasks explained only 2 % of the variance of the alphabet
sound knowledge test, one needs to be cautious in interpreting the result of the regression analysis. In order to evaluate the relationship between phonological awareness and reading development more precisely, including a more explicit measure of reading and decoding skills assessed in children one or more years later might be useful.
Group Differentiation
The current study's results are consistent with the findings of Burt and her associates (1999) that there is no significant gender difference in performance on the
phonological awareness tasks. The current study also reports that there are no
statistically significant differences in phonological awareness task performances between
the lower socioeconomic (SES) group and the upper SES group; in contrast, Burt and her
colleagues (1999) found that the upper socioeconomic group significantly outperformed
the lower group. Since the subjects were ethnically diverse, and about 24.2% of the subjects spoke a language other than English as a first language, the present study also took into
consideration ethnic differences. There were no statistically significant differences
among the different ethnic groups. The present study found that the tasks of phonological awareness do not seem to show gender, SES, or ethnicity differences when tested separately.
In addition, a study with a multitrait-multimethod matrix and structural equation modeling can provide valuable information about the external structure of the assessments. A multitrait-multimethod matrix can provide an empirical collection of convergent and discriminant evidence by displaying all of the intercorrelations generated when each of
several constructs or traits is measured by each of several methods. Therefore, the
multitrait-multimethod matrix allows estimating the relative contributions of trait and
method variance related to the particular construct measures (Messick, 1989).
Conducting such a method would be beneficial because the multitrait-multimethod matrix entails sound judgment about the constructs to be included in the matrix and offers provisional evidence to support the nomological validity of the construct. Benson (1998) suggests structural equation modeling (SEM) as a way to examine the external aspect of construct validation. The measurement model of SEM links a specific set of items to the hypothesized structure of the construct, and the structural model links the constructs with the nomological network, which consists of the theoretical constructs and the hypothesized relationships among them.
The Consequential Aspect of Construct Validity
The consequential component of construct validity is fundamentally concerned with any negative implications for individuals or groups due to construct underrepresentation or construct-irrelevant variance. Although some of the tasks used in the current study might be too difficult for this age group, the levels of task difficulty did not seem to affect the scores of certain individuals or groups, such as different ethnic or SES groups. Additionally, the tasks used to measure the subjects' sensitivity to, or ability to analyze, the spoken language segments that comprise words agree with the purpose of the instrument, which was designed to diagnose deficits in phonological awareness and phoneme-grapheme correspondence.
As noted earlier, validity evidence relevant to all six aspects needs to be accumulated into an overall validity judgment to support score-based interpretations and action implications. This process includes sampling a relevant domain, constructing relevant assessment tasks, using appropriate task administration and scoring procedures, and paying careful attention to potential task invalidity. Figure 4 displays the assessment construction
procedures corresponding to the six aspects of construct validation procedures.
VII. CONCLUSION
The present study provides information about the psychometric characteristics of
phonological awareness assessment in pre-kindergarten children. Most of all, the study
aims to empirically implement the theoretical framework for unitary construct validity
that integrates various sources of evidence to support the validity of the score derived
from the test.
The current study confirms previous findings regarding the developmental
levels of task difficulty. Although some of the tasks seemed to be too difficult for this
age level, the study found that two factors underlie the construct of phonological
awareness. These two factors accounted for 73.12% of the total variance, supporting the structural concept of phonological awareness. Furthermore, a linear combination of the initial isolation and phoneme blending tasks, from the first factor, supports the predictive validity for the initial stage of reading acquisition, although the practical importance was fairly small. In addition, the initial investigators modified the technical quality of the testing system to establish standards appropriate for the age level, as well as appropriate task administration and scoring procedures. The levels of task difficulty do not seem to affect
certain types of individuals or groups. From the various components of the validity
evidence, it is concluded that the tasks of phonological awareness in the study provide
valuable information about the knowledge of sound segments in pre-kindergarten
children.
Indeed, the present study carries out the unitary conception of construct validation
that accumulates content, criteria, and consequences together to form a scientific basis for
addressing score-based interpretations, utility of score meaning, and value implications as
a ground for action. One should note that validity is a matter of degree rather than an all-or-none property. The degree to which the score interpretation and implications of score meaning remain valid across individuals or populations, across settings, or across task contexts is a continual issue because the interpretation of scores on the construct changes as social conditions shift (Benson, 1998; Messick, 1989). This is why validity is an evolving property, while validation is a continual process. Therefore, ongoing validation studies are necessary to reestablish validity in order for a test to remain valid over time.
REFERENCES
Adams, M. J. (1990). Beginning to read: Thinking and learning about print.
Cambridge, MA: MIT Press.
American Educational Research Association, American Psychological Association, &
National Council on Measurement in Education (1999). Standards for
educational and psychological testing. Washington DC: Author.
Backman, J. (1983). The role of psycholinguistic skills in reading acquisition: A look at
early readers. Reading Research Quarterly, 18, 466-479.
Benson, J. (1998). Developing a strong program of construct validation: A test anxiety
example. Educational Measurement: Issues and Practice, 17(1), 10-17, 22.
Bishop, D. V. M., & Adams, C. (1990). A prospective study of the relationship between
specific language impairment, phonological disorders, and reading retardation.
Journal of Child Psychology and Psychiatry and Allied Disciplines, 31, 1027-
1050.
Blachman, B. A. (1994). Early literacy acquisition: The role of phonological awareness.
In G. P. Wallach & K. G. Butler (Eds.), Language learning disabilities in school-
age children and adolescents: Some principles and applications (pp. 253-274).
New York, NY: Macmillan.
Bradley, L. L., & Bryant, P. E. (1983). Categorizing sounds and learning to read: A
causal connection. Nature, 301, 419-421.
Browne, M. W. (1975). Predictive validity of a linear regression equation. British
Journal of Mathematical and Statistical Psychology, 28, 79-87.
Bryant, P. E., MacLean, M., & Bradley, L. L. (1990). Rhyme, language, and children’s
reading. Applied Psycholinguistics, 11, 237-252.
Bryant, P. E., MacLean, M., Bradley, L. L., & Crossland, J. (1990). Rhyme and
alliteration, phoneme detection, and learning to read. Developmental Psychology,
26, 429-438.
Burt, L., Holm, A., & Dodd, B. (1999). Phonological awareness skills of 4-year-old
British children: An assessment and developmental data. International Journal of
Language & Communication Disorders, 34, 311-335.
Chall, J. S., Jacobs, V., & Baldwin, L. (1990). The reading crisis: Why poor children fall
behind. Cambridge, MA: Harvard University Press.
Cohen, J., & Cohen, P. (1983). Applied multiple regression/correlation analysis for the
behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum.
Cronbach, L., & Meehl, P. (1955). Construct validity in psychological tests.
Psychological Bulletin, 52, 281-302.
Cunningham, A. E., & Stanovich, K. E. (1997). Early reading acquisition and its relation
to reading experience and ability 10 years later. Developmental Psychology, 33,
934-945.
Goswami, U. (1986). Children’s use of analogy in learning to read: A developmental
study. Journal of Experimental Child Psychology, 42, 73-83.
Goswami, U. (1988). Children’s use of analogy in learning to spell. British Journal of
Developmental Psychology, 6, 21-33.
Goswami, U., & Bryant, P. (1990). Phonological skills and learning to read. Hillsdale,
NJ: Lawrence Erlbaum.
Green, S. B., Salkind, N. J., & Akey, T. M. (1997). Using SPSS for windows: Analyzing
and understanding data (2nd ed.). Upper Saddle River, NJ: Prentice Hall.
Hambleton, R. K. (1980). Test score validity and standard-setting methods. In R. A.
Berk (Ed.), Criterion-referenced measurement: The state of the art (pp. 80-123).
Baltimore, MD: Johns Hopkins University Press.
Hamilton, C. E., Schwanenflugel, P., Neuharth-Pritchett, S., & Restrepo, M. A. (2002).
Data from ongoing research, PAVEd for Success. Unpublished.
Hill, S. (1999). Phonics. York, ME: Stenhouse Publishers.
Hills, J. R. (1981). Measurement and evaluation in the classroom (2nd ed.). Columbus,
OH: Charles E. Merrill.
Hoien, T., Lundberg, I., Stanovich, K. E., & Bjaalid, I. (1995). Components of
phonological awareness. Reading and Writing: An Interdisciplinary Journal, 7,
171-188.
Johnston, P., & Allington, R. (1991). Remediation. In R. Barr, M. Kamil, P. Mosenthal,
& P. D. Pearson (Eds.), Handbook of reading research (pp. 984-1012). New
York: Longman.
Liberman, A. M., Cooper, F. S., Shankweiler, D. P., & Studdert-Kennedy, M. (1967).
Perception of the speech code. Psychological Review, 74, 431-461.
Liberman, I. Y. (1978). Segmentation of the spoken word and reading acquisition.
Bulletin of the Orton Society, 23, 65-77.
Liberman, I. Y., Shankweiler, D. P., Fischer, F. W., & Carter, B. (1974). Explicit
syllable and phoneme segmentation in the young child. Journal of Experimental
Child Psychology, 18, 201-212.
Liberman, I. Y., Shankweiler, D. P., & Liberman, A. M. (1989). The alphabetic principle
and learning to read. In D. Shankweiler & I. Y. Liberman (Eds.), Phonology and
reading disability: Solving the reading puzzle (pp. 1-33). Ann Arbor: University
of Michigan Press.
Loevinger, J. (1957). Objective tests as instruments of psychological theory.
Psychological Reports, 3, 635-694.
Lonigan, C. J., Burgess, S. R., Anthony, J. L., & Barker, T. A. (1998). Development of
phonological sensitivity in 2-to-5-year-old children. Journal of Educational
Psychology, 90, 294-311.
Lonigan, C. J., Burgess, S. R., & Anthony, J. L. (2000). Development of emergent
literacy and early reading skills in preschool children: Evidence from a latent-
variable longitudinal study. Developmental Psychology, 36, 596-613.
MacLean, M., Bryant, P. E., & Bradley, L. L. (1987). Rhymes, nursery rhymes, and
reading in early childhood. Merrill-Palmer Quarterly, 33, 11-37.
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed.,
pp. 13-103). New York: Macmillan.
Messick, S. (1995). Validity of psychological assessment: Validation of inferences from
persons’ responses and performances as scientific inquiry into score meaning.
American Psychologist, 50, 741-749.
Miller, M. D., & Linn, R. L. (2000). Validation of performance based assessments.
Applied Psychological Measurement, 24, 367-378.
Robertson, C., & Salter, W. (1997). The phonological awareness test. East Moline, IL:
LinguiSystems.
Share, D. L., & Stanovich, K. E. (1995). Cognitive processes in early reading
development: Accommodating individual differences into a model of acquisition.
Issues in Education: Contributions from Educational Psychology, 1, 1-57.
Stahl, S. A., & Murray, B. A. (1994). Defining phonological awareness and its
relationship to early reading. Journal of Educational Psychology, 86, 221-234.
Stanovich, K. E. (1986). Matthew effects in reading: Some consequences of individual
differences in the acquisition of literacy. Reading Research Quarterly, 21, 360-
407.
Stanovich, K. E. (1992). Speculations on the cause and consequences of individual
differences in early reading acquisition. In P. B. Gough, L. C. Ehri, & R. Treiman
(Eds.), Reading acquisition (pp. 307-342). Hillsdale, NJ: Lawrence Erlbaum.
Stanovich, K. E., Cunningham, A. E., & Cramer, B. B. (1984). Assessing phonological
awareness in kindergarten children: Issues of task comparability. Journal of
Experimental Child Psychology, 38, 175-190.
Stevenson, H. W., & Newman, R. S. (1986). Long-term prediction of achievement and
attitudes in mathematics and reading. Child Development, 57, 646-659.
Sulzby, E., & Teale, W. (1991). Emergent literacy. In R. Barr, M. Kamil, P.
Mosenthal, & P. D. Pearson (Eds.), Handbook of reading research (pp. 727-758).
New York: Longman.
Torgesen, J. K. (1999). Phonologically based reading disabilities: Toward a coherent
theory of one kind of learning disability. In R. J. Sternberg & L. Spear-Swerling
(Eds.), Perspectives on learning disabilities: Biological, cognitive, contextual (pp.
106-135). Boulder, CO: Westview Press.
Torgesen, J. K., & Mathes, P. G. (2000). A basic guide to understanding, assessing, and
teaching phonological awareness. Austin, TX: PRO-ED.
Wagner, R. K., & Torgesen, J. K. (1987). The nature of phonological processing and its
causal role in the acquisition of reading skills. Psychological Bulletin, 101, 192-
212.
Wagner, R. K., Torgesen, J. K., Rashotte, C. A., Hecht, S. A., Barker, T. A., Burgess, S.
R., Donahue, J., & Garon, T. (1997). Changing relations between phonological
processing abilities and word-level reading as children develop from beginning to
skilled readers: A 5-year longitudinal study. Developmental Psychology, 33, 468-
479.
Whitehurst, G. J., & Lonigan, C. J. (1998). Child development and emergent literacy.
Child Development, 69, 848-872.
Yopp, H. K. (1988). The validity and reliability of phonemic awareness tests. Reading
Research Quarterly, 23, 159-177.
Yopp, H. K., & Yopp, R. H. (2000). Supporting phonemic awareness development in the
classroom. The Reading Teacher, 54, 130-143.
Table 1
The Maximum Scores, the Means, and the Standard Deviations for Phonological
Awareness Tasks Based on the Preliminary Item Condition
Task Max. Score M SD N
Rhyming discrimination 11 3.64 3.88 415
Syllable segmentation 11 2.35 2.93 415
Initial isolation 11 0.87 2.54 415
Phonemes blending 11 1.05 2.17 415
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
Table 2
The Maximum Scores, the Means, and the Standard Deviations for Phonological
Awareness Tasks Based on the Actual Item Condition
Task Max. Score M SD N
Rhyming discrimination 10 3.10 3.46 415
Syllable segmentation 10 1.81 2.58 415
Initial isolation 10 0.68 2.29 415
Phonemes blending 10 0.74 1.86 415
Table 3
Coefficients Alpha and Standard Errors of Measurement for Phonological
Awareness Tasks Based on the Preliminary Item Condition
Task α SEM N
Rhyming discrimination .93 1.02 415
Syllable segmentation .88 0.98 415
Initial isolation .97 0.45 415
Phonemes blending .89 0.71 415
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
Table 4
Coefficients Alpha and Standard Errors of Measurement for Phonological
Awareness Tasks Based on the Actual Item Condition
Task α SEM N
Rhyming discrimination .92 0.98 415
Syllable segmentation .88 0.93 415
Initial isolation .98 0.34 415
Phonemes blending .89 0.60 415
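The coefficient alpha and SEM values in Tables 3 and 4 are related by a simple formula: SEM equals the standard deviation of total scores multiplied by the square root of (1 minus alpha). As a rough illustration only, not the original computation, one way to obtain both quantities from an examinee-by-item matrix of 0/1 scores (the array name is a hypothetical placeholder) is:

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Coefficient alpha for an (examinees x items) matrix of 0/1 scores."""
    k = item_scores.shape[1]
    sum_item_var = item_scores.var(axis=0, ddof=1).sum()
    total_var = item_scores.sum(axis=1).var(ddof=1)
    return (k / (k - 1)) * (1.0 - sum_item_var / total_var)

def standard_error_of_measurement(item_scores: np.ndarray) -> float:
    """SEM = standard deviation of total scores * sqrt(1 - alpha)."""
    total_sd = item_scores.sum(axis=1).std(ddof=1)
    return total_sd * np.sqrt(1.0 - cronbach_alpha(item_scores))
```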
Table 5
The Mean Levels of Task Difficulty of Phonological Awareness Tasks
Task Preliminary item condition Actual item condition N
Rhyming discrimination .330 .310 415
Syllable segmentation .214 .181 415
Initial isolation .079 .067 415
Phonemes blending .095 .074 415
Table 6
Item Analyses for Rhyming Discrimination Task Based on the Preliminary Item
Condition
Item Item difficultya Item difficultyb nb Item discriminationa
Preliminary .542 .542 415 .823
book – look .412 .760 225 .759
fun – run .417 .772 224 .782
ring – rat .222 .414 222 .484
box – mess .222 .449 205 .560
fish – dish .371 .762 202 .812
mop – hop .357 .767 193 .813
shoe – fan .219 .484 188 .601
sweater – better .347 .778 185 .802
camper – hamper .361 .829 181 .817
pudding – table .169 .393 178 .565
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
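The two item-difficulty columns and the discrimination column in Tables 6 through 13 can be computed from an item-response matrix, as in the sketch below. This is illustrative only: it uses an item-total (point-biserial) correlation as the discrimination index, which is an assumption about the index used here, and the array names are hypothetical.

```python
import numpy as np

def item_statistics(scores: np.ndarray, administered: np.ndarray):
    """Item difficulty and an item-total (point-biserial) discrimination index.

    scores       -- (examinees x items) 0/1 matrix, with 0 for items never reached
    administered -- boolean matrix of the same shape marking items actually given
    """
    difficulty_all = scores.mean(axis=0)                    # based on all N examinees
    difficulty_reached = scores.sum(axis=0) / administered.sum(axis=0)  # responders only
    totals = scores.sum(axis=1)
    discrimination = np.array(
        [np.corrcoef(scores[:, j], totals)[0, 1] for j in range(scores.shape[1])]
    )
    return difficulty_all, difficulty_reached, discrimination
```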
Table 7
Item Analyses for Rhyming Discrimination Task Based on the Actual Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
book – look .412 .760 225 .736
fun – run .417 .772 224 .761
ring – rat .222 .414 222 .471
box – mess .222 .449 205 .558
fish – dish .371 .762 202 .809
mop – hop .357 .767 193 .814
shoe – fan .219 .484 188 .604
sweater – better .347 .778 185 .803
camper – hamper .361 .829 181 .817
pudding – table .169 .393 178 .576
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 8
Item Analyses for Syllable Segmentation Task Based on the Preliminary Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
Preliminary .540 .540 415 .650
pizza .316 .585 224 .672
watermelon .087 .162 222 .466
fix .337 .639 219 .575
calendar .166 .375 184 .352
television .089 .204 181 .572
moose .275 .659 173 .634
elephant .106 .273 161 .587
pillow .178 .481 154 .695
kindergarten .070 .195 149 .565
candy .190 .552 143 .727
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 9
Item Analyses for Syllable Segmentation Task Based on the Actual Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
pizza .316 .585 224 .633
watermelon .087 .162 222 .475
fix .337 .639 219 .514
calendar .166 .375 184 .664
television .089 .204 181 .597
moose .275 .659 173 .604
elephant .106 .273 161 .608
pillow .178 .481 154 .709
kindergarten .070 .195 149 .596
candy .190 .552 143 .742
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 10
Item Analyses for Initial Isolation Task Based on the Preliminary Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
Preliminary .190 .190 415 .609
bite .082 .430 79 .953
toy .075 .397 78 .897
dinosaur .065 .355 76 .862
fudge .072 .857 35 .927
nose .072 .833 36 .900
apple .065 .750 36 .881
garage .063 .844 36 .722
happy .063 .743 35 .890
chalk .065 .794 34 .862
laugh .053 .647 34 .827
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 11
Item Analyses for Initial Isolation Task Based on the Actual Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
bite .082 .430 79 .955
toy .075 .397 78 .898
dinosaur .065 .355 76 .867
fudge .072 .857 35 .934
nose .072 .833 36 .904
apple .065 .750 36 .888
garage .063 .844 36 .848
happy .063 .743 35 .901
chalk .065 .794 34 .867
laugh .053 .647 34 .839
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 12
Item Analyses for Phoneme Blending Task Based on the Preliminary Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
Preliminary .308 .308 415 .600
/b – oi/ .183 .598 127 .772
/n – ç/ .065 .216 125 .579
/p – ö/ .067 .286 98 .647
/s – i – t/ .092 .396 96 .651
/f – l – î/ .087 .456 70 .743
/m – ou – s/ .082 .472 72 .663
/k – î – n – d/ .051 .313 67 .630
/s – n – a – p/ .043 .327 55 .646
/m – i – l – k/ .043 .316 57 .566
/s – l – i – p – çr/ .031 .236 55 .589
Note. In the preliminary item condition, the preliminary item is scored 0 if the examinee
responded to all three practice items incorrectly; otherwise, it is scored 1.
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 13
Item Analyses for Phoneme Blending Task Based on the Actual Item Condition
Item Item difficultya Item difficultyb nb Item discriminationa
/b – oi/ .183 .598 127 .706
/n – ç/ .065 .216 125 .577
/p – ö/ .067 .286 98 .656
/s – i – t/ .092 .396 96 .640
/f – l – î/ .087 .456 70 .755
/m – ou – s/ .082 .472 72 .662
/k – î – n – d/ .051 .313 67 .653
/s – n – a – p/ .043 .327 55 .679
/m – i – l – k/ .043 .316 57 .583
/s – l – i – p – çr/ .031 .236 55 .624
a Item difficulties and item discrimination are based on the total number of examinees (N = 415).
b Item difficulties are based on the number of examinees who actually responded to the items.
Table 14
Intercorrelations among the Phonological Awareness Tasks
Task 1 2 3 4
1. Rhyming Discrimination — .40 .36 .36
2. Syllables Segmentation — .43 .32
3. Initial Isolation — .51
4. Phonemes Blending —
Note. Computations are based on the actual item condition.
Table 15
Factors, Eigenvalues, and Percentage of Variance Accounted for
Factor Eigenvalue Percentage of Variance Cumulative Percentage of Variance
1 2.19 54.83 54.83
2 .73 18.29 73.12
3 .62 15.44 88.56
4 .46 11.44 100.00
Note. Factor analysis is conducted based on the actual item condition.
Table 16
Factor Loadings for One-Factor Solution
Task Factor
Rhyming discrimination .70
Syllables segmentation .72
Initial isolation .79
Phonemes blending .75
Note. Factor analysis is conducted based on the actual item condition.
Table 17
Factor Loadings for Two-Factor Solution after Varimax Rotation
Task Factor 1 Factor 2
Rhyming Discrimination .20 .81
Syllables Segmentation .22 .81
Initial Isolation .79 .32
Phonemes Blending .88 .15
Note. Factor analysis is conducted based on the actual item condition.
Table 18
The Means and the Standard Deviations of Alphabet Knowledge Tests
Test Max. Score M SD N
Letter name knowledge-upper case 16 12.06 9.92 415
Letter sound knowledge-upper case 16 5.03 7.21 415
Letter name knowledge- lower case 16 9.55 8.80 415
Letter sound knowledge-lower case 16 4.03 6.67 415
Table 19
Predictive Correlations between Phonological Awareness Tasks and Alphabet
Knowledge Tests
                                     Rhyming           Syllables         Initial           Phonemes
Test                                 discrimination    segmentation      isolation         blending
Letter name knowledge-upper case     .07 (p = .298)    .16 (p = .033)    .08 (p = .666)    -.03 (p = .767)
Letter sound knowledge-upper case    .05 (p = .434)    .20 (p = .006)    .25 (p = .139)    .12 (p = .300)
Letter name knowledge-lower case     .07 (p = .258)    .17 (p = .021)    .09 (p = .605)    -.06 (p = .629)
Letter sound knowledge-lower case    .05 (p = .479)    .20 (p = .007)    .25 (p = .140)    .15 (p = .179)
N                                    212               184               36                78
Note. Correlation coefficients are computed based on the number of examinees who
actually responded to the items on the phonological awareness tasks.
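Because children who never reached a task contribute no item responses, the correlations in Table 19 are computed on different subsamples (N = 212, 184, 36, and 78). The following sketch shows one way such pairwise correlations and p values might be obtained; the file name, column names, and the "_reached" indicator columns are hypothetical placeholders rather than the study's actual data layout.

```python
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("pa_scores.csv")  # hypothetical file and columns
tasks = ["rhyme", "syllable", "isolation", "blending"]
criteria = ["name_upper", "sound_upper", "name_lower", "sound_lower"]

for task in tasks:
    responders = df[df[task + "_reached"] == 1]  # children who reached this task
    for criterion in criteria:
        r, p = pearsonr(responders[task], responders[criterion])
        print(f"{task} vs {criterion}: r = {r:.2f}, p = {p:.3f}, n = {len(responders)}")
```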
Table 20
The Means and Standard Deviations of Phonological Awareness Tasks by Gender Group
                              Male                    Female
Task                          M      SD     N         M      SD     N
Rhyming discrimination 3.18 3.47 200 3.13 3.45 196
Syllables segmentation 1.89 2.70 199 1.73 2.40 195
Initial isolation 0.68 2.31 198 0.69 2.32 195
Phonemes blending 0.68 1.91 198 0.81 1.80 195
Note. The analysis is based on the total number of examinees in the actual item condition.
Table 21
The Means and the Standard Deviations of Phonological Awareness Tasks by Ethnic
Group
                          African-American    Asian          Bi-Racial      Caucasian      Hispanic
Task                      M      SD           M     SD       M     SD       M     SD       M     SD
Rhyming discrimination    2.64   3.23         1.94  2.82     3.75  2.87     3.48  3.58     4.02  3.85
Syllables segmentation    1.75   2.49         1.56  2.39     2.25  2.63     1.58  2.40     2.03  2.86
Initial isolation         0.67   2.35         0.00  0.00     0.00  0.00     0.50  1.89     1.12  2.96
Phonemes blending         0.63   1.79         0.19  0.75     0.75  1.50     0.75  1.75     0.88  1.97
N                         128                 16             4              106            60
Note. The analysis is based on the total number of examinees in the actual item condition.
Table 22
The Means and the Standard Deviations of Phonological Awareness Tasks by
Socioeconomic Group
Lower group Upper group
Task M SD M SD
Rhyming discrimination 3.33 3.25 3.14 3.55
Syllables segmentation 1.86 2.63 1.79 2.56
Initial isolation 0.57 2.14 0.75 2.43
Phonemes blending 0.90 2.05 0.66 1.76
N 115 265
Note. The analysis is based on the total number of examinees in the actual item condition.
Socioeconomic status is based on whether or not the subject receives free or reduced-price
lunch.
Figure 1
Developmental Sequence of Phonological Awareness
Age Development in phonological awareness tasks
3-year-olds Can recite nursery rhymes.
4-year-olds Can detect if two words rhyme.
Can produce a rhyme for a simple word.
5-year-olds Can understand the components of sounds that make them the
same or different.
Can isolate and pronounce the initial sound in a word.
Can blend and segment words into the syllabic units.
6-year-olds Can isolate and pronounce sounds in up to three-phoneme words.
Can blend the sounds in four-phoneme words.
7-year-olds Can manipulate phonemes, including adding, deleting, and moving
any phonemes to generate designated words.
Figure 2
Facets of the Unitary Validity
                        Test Interpretation                          Test Use
Evidential Basis        Construct Validity                           Construct Validity + Relevance and Utility
Consequential Basis     Construct Validity + Value Implications      Construct Validity + Relevance and Utility +
                                                                     Value Implications + Social Consequences
Figure 3
Plot of Eigenvalues and Factors of Scree Test
[Scree plot: eigenvalues (y-axis, 0.0 to 2.5) plotted against factor number (x-axis, factors 1 through 4).]
Figure 4
The Procedure for Assessment Construction and Construct Validation
Assessment construction steps, the aspect of validity they address, and the corresponding validation procedures:

Content aspect
• Assessment construction: specifying cognitive outcomes / taxonomy of objectives; table of specification; developing assessment tasks (construction of items)
• Validation procedure: specifying the domain of the construct (previous research and observation); construct underrepresentation and construct irrelevancy; index of item congruence

Substantive aspect
• Assessment construction: developing answer keys; developing scoring rubrics; developing models for scoring
• Validation procedure: administration and scoring considerations

Structural aspect
• Assessment construction: evaluating assessment instruments (task reliability); summarizing measurement data; gathering information about item analysis
• Validation procedure: item and subscale intercorrelations; item analysis; factor analysis; item response theory; multitrait-multimethod matrix

Generalizability aspect
• Validation procedure: generalizability theory; meta-analysis

External aspect
• Validation procedure: multitrait-multimethod matrix; group differentiation; correlations with other measures; regression analysis; structural equation modeling

Consequential aspect
• Assessment construction: selecting items from the information about item analysis and item bias detection; developing a question / item file
• Validation procedure: detecting item bias and fair selection; evaluating intended / unintended consequences of score interpretation and use; evaluating the impact of test invalidity
APPENDIX: PHONOLOGICAL AWARENESS TEST (PAT)
Ceiling for all subtests: Stop the administration if all of the three practice items are
wrong, or when there are 3 consecutive wrong items. If child is
losing track of the task, go back to the example to remind the
child of the task.
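The ceiling rule above can be expressed compactly in code. The sketch below is only an illustration of the rule as written (all three practice items wrong, or three consecutive errors, ends the subtest); it is not part of the published test materials, and the function name is a hypothetical placeholder.

```python
def raw_score(practice_correct, item_correct):
    """Score one PAT subtest under the ceiling rule described above.

    practice_correct -- list of True/False for the three practice items
    item_correct     -- list of True/False for the ten scored items, in order
    """
    if not any(practice_correct):       # all three practice items wrong: stop at once
        return 0
    score, consecutive_wrong = 0, 0
    for correct in item_correct:
        if correct:
            score += 1
            consecutive_wrong = 0
        else:
            consecutive_wrong += 1
            if consecutive_wrong == 3:  # three consecutive wrong items: discontinue
                break
    return score
```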
Name: _______________________________________
Date of Administration: _________________________
Examiner: ____________________________________
Summary of Results
Test Raw Score
Rhyming Discrimination
Sentence Segmentation
Syllable Segmentation
Initial Isolation
Syllable Blending
Phoneme Blending
Consonants Graphemes
Long & Short Vowels Graphemes
Rhyming Discrimination
“I am going to say two words and ask you if they rhyme. Listen carefully. Do these
words rhyme? Fan – man.”
Stimulus phrase: “Do these words rhyme? _____ - _____ ”
Practice items: 1. Fan – man (yes), 2. Fan – tan (yes), 3. Fan – dog (no).
Item
Correct Response
Examinee’s
Response
Score
book – look Yes 1 0
fun – run Yes 1 0
ring – rat No 1 0
box - mess No 1 0
fish – dish Yes 1 0
mop – hop Yes 1 0
shoe – fan No 1 0
sweater - better Yes 1 0
camper - hamper Yes 1 0
pudding - table No 1 0
TOTAL SCORE
Sentence Segmentation
“I am going to say a sentence, and I want you to clap one time for each word I say. My
house is big. Now, clap it with me.” Say the sentences again and clap once as you say
each word. “My – house – is – big. Now, you try it by yourself. My house is big.”
Stimulus phrase: “Clap one time for each word I say. ____________________”
Practice items: 1. My – house – is – big. (4 claps) 2. My – name – is – _____. (4 claps)
3. I – like – dogs. (3 claps)
Item
Correct
Response
Examinee’s
Response
Score
He can swim 3 claps 1 0
My cat is black 4 claps 1 0
I am very tall 4 claps 1 0
My dad’s car won’t start 5 claps 1 0
That flower is pretty 4 claps 1 0
Some cows give milk 4 claps 1 0
The clown has big feet 5 claps 1 0
Let’s go to school 4 claps 1 0
I have ten books 4 claps 1 0
The kite is flying high 5 claps 1 0
TOTAL SCORE
Syllable Segmentation
“I am going to say a word, and I want you to clap one time for each word part or syllable
I say. Saturday. Now, clap it with me.” Say the word and clap once as you say each
syllable. “Sat – ur – day. Now, you try it by yourself. Saturday.”
Stimulus phrase: “Clap one time for each syllable in the word _____.”
Practice items: 1. Sat – ur – day (3 claps) 2. Fri – day (2 claps) 3. Dog (1 clap)
Item
Correct Response
Examinee’s
Response
Score
Pizza 2 claps 1 0
watermelon 4 claps 1 0
Fix 1 clap 1 0
calendar 3 claps 1 0
television 4 claps 1 0
moose 1 clap 1 0
elephant 3 claps 1 0
pillow 2 claps 1 0
kindergarten 4 claps 1 0
candy 2 claps 1 0
TOTAL SCORE
Initial Isolation
“I am going to say a word, and I want you to tell me the beginning or first sound in the
word. What’s the beginning sound in the word CAT?”
Stimulus phrase: “What’s the beginning sound in the word _____?”
Practice items: 1. CAT /k/ 2. MAD /m/ 3. JANE /j/
Item
Correct Response
Examinee’s
Response
Score
Bite /b/ 1 0
Toy /t/ 1 0
dinosaur /d/ 1 0
fudge /f/ 1 0
Nose /n/ 1 0
Apple /a/ 1 0
garage /g/ 1 0
happy /h/ 1 0
Chalk /ch/ 1 0
Laugh /l/ 1 0
TOTAL SCORE
Syllable Blending
“I’ll say the parts of a word. You guess what the word is. What word is this?” Pause for
one second between syllables. “ta – ble” If the child repeats the word in parts, say “Say
it faster, like this, table.”
Stimulus phrase: “What word is this? _____ .”
Practice items: 1. ta – ble (table) 2. mo – ther (mother) 3. he – llo (hello)
Item
Correct Response
Examinee’s
Response
Score
win - dow window 1 0
flow – er flower 1 0
can – dy candy 1 0
com – pu – ter computer 1 0
moun - tain mountain 1 0
bas – ket basket 1 0
tel – e – phone telephone 1 0
croc – o – dile crocodile 1 0
dic – tion – ar – y dictionary 1 0
con – ver – ti – ble convertible 1 0
TOTAL SCORE
Phoneme Blending
“I’ll say the sounds. You guess what the word is. What word is this?” Pause for one
second between sounds. “p – o – p” If the child repeats the word by sounds, say, “Say
it faster, like this, pop.”
Stimulus phrase: “What word is this? _____ .”
Practice items: 1. p – o – p (pop) 2. d – o – g (dog) 3. c – a – t (cat)
Item
Correct Response
Examinee’s
Response
Score
/b – oi/ boy 1 0
/n – ç/ knee 1 0
/p – ö/ paw 1 0
/s – i – t/ sit 1 0
/f – l – î/ fly 1 0
/m – ou – s/ mouse 1 0
/k – î – n – d/ kind 1 0
/s – n – a – p/ snap 1 0
/m – i – l – k/ milk 1 0
/s – l – i – p – çr/ slipper 1 0
TOTAL SCORE
Consonants Graphemes – Discontinue if the child gets 8 consecutive letters wrong, and
does not know those in his or her name.
“I’m going to show you some letters. I want you to tell me what sound each letter
makes.”
Stimulus phrase: “Tell me what sound this makes.”
Note: If the student gives one correct sound of /c, g, s/, prompt for the other sound by
asking, “What’s another sound this makes?” If the student is able to provide one correct
sound, score the item as correct.
Use the graphemes booklet for this subtest.
Item
Correct
Response
Examinee’s
Response
Score
Item
Correct
Response
Examinee’s
Response
Score
b /b/ 1 0 n /n/ 1 0
c /k, s/ 1 0 p /p/ 1 0
d /d/ 1 0 q /k, kw/ 1 0
f /f/ 1 0 r /r/ 1 0
g /g, j/ 1 0 s /s, z/ 1 0
h /h/ 1 0 t /t/ 1 0
j /j/ 1 0 v /v/ 1 0
k /k/ 1 0 w /w/ 1 0
l /l/ 1 0 x /eks, z, ks/ 1 0
m /m/ 1 0 z /z/ 1 0
TOTAL SCORE
Long & Short Vowels Graphemes
Use the same vowel card to elicit both the short and long vowel sounds below. If
necessary, prompt with “Now, tell me the other sound this letter makes.”
Note: Use the vowel sounds booklet for this subtest.
Item
Correct Response
Examinee’s
Response
Score
A /a/ as in bat 1 0
A /â/ as in cake 1 0
E /e/ as in met 1 0
E /ç/ as in me 1 0
I /i/ as in sit 1 0
I /î/ as in high 1 0
O /o/ as in top 1 0
O /ô/ as in over 1 0
U /u/ as in but 1 0
U /û/ as in use or tool 1 0
TOTAL SCORE