A COMPARISON OF RANKING METHODS
FOR NORMALIZING SCORES
by
SHIRA R. SOLOMON
DISSERTATION
Submitted to the Graduate School
of Wayne State University,
Detroit, Michigan
in partial fulfillment of the requirements
for the degree of
DOCTOR OF PHILOSOPHY
2008
MAJOR: EVALUATION AND RESEARCH
Approved by:
______________________________ Advisor Date
______________________________
______________________________
______________________________
UMI Number: 3303509

Copyright 2008 by Solomon, Shira R.
All rights reserved.

UMI Microform 3303509
Copyright 2008 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.

ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
DEDICATION
To my maternal grandmother, Mary Karabenick Brooks, whose love of art,
literature, and music has gone hand in hand with her concern for social welfare. To
my paternal grandmother, Frances Hechtman Solomon, who played the cards she
was dealt with style and wit.
ACKNOWLEDGEMENTS
A dissertation is largely a solitary project, yet it builds on the contributions of
many. I have been standing on many shoulders.
First, I would like to thank my major advisor, Professor Shlomo Sawilowsky,
whose grand passion for argument made him someone I could relate to, and made
statistics seem worth doing. Dr. Sawilowsky has been generous with his time,
technical help, and the spirited exegeses that put this discipline in its true human
context.
Professors Gail Fahoome, Judith Abrams, and Leonard Kaplan have
brought a great deal to this dissertation and to my graduate experience. Dr.
Fahoome has been an excellent teacher, consistently insightful and reassuringly
low-key. I lucked into meeting Dr. Abrams through my research assistantship with
the medical school. Her assistance and advice have been invaluable. Dr. Kaplan
paid me the extraordinary compliment of joining my committee on the brink of his
retirement. I am indebted to each of these professors for their intellectual integrity
and their simple kindness.
I regret the untimely passing of Professor Donald Marcotte, who would have
been proud to see this dissertation completed. Dr. Marcotte provided a wonderful
initiation into the world of statistics, with his perennial admonition that the faster
you can solve problems, the more time you have to enjoy life.
When it came time to apply for this doctoral program, I reached out to the
professors who knew me best. I did not find them, in the end, in the ideological
combat zone of my master’s program or in the artful arena of my literary studies. I
found them within the seminary walls, among the rabbis and professors who taught
me Talmud. Studying Talmud helped me to stop thinking so much and just learn.
For accomplishing this ingenious feat, and for supporting all my educational
adventures, I would like to thank Professor David Kraemer, Rabbi Leonard Levy,
and Professor Mayer Rabinowitz.
To Bruce Chapman, the teacher who forced inspiration to the forefront,
where it belongs: Here’s to you, Captain. To my great friends, Regina DiNunzio,
Tom Kilroe, Katy Potter, and Deborah Mougoue, who keep me on my toes.
My parents, Carole and Elliot Solomon, have been the staunchest
advocates of this reckless leap. Their unrelenting curiosity and unvarnished
pleasure in my pursuits has given me strength. And Mark Sawasky, my constant
friend and fan and love, becomes a bigger mensch every day.
TABLE OF CONTENTS
DEDICATION ............................................................................................................ ii
ACKNOWLEDGEMENTS ........................................................................................ iii
LIST OF TABLES.................................................................................................... vii
LIST OF FIGURES................................................................................................... ix
CHAPTERS
CHAPTER 1 – Introduction..........................................................1
Research problem................................................................................5
Importance of the problem...................................................................6
Assumptions and limitations ................................................................7
Definitions ............................................................................................8
CHAPTER 2 – Literature review...................................................................10
Mental testing and the normal distribution .........................................10
Norm-referencing and the T score .....................................................11
Nonnormality observed ......................................................................13
Statistical considerations ...................................................................14
Standardizing transformations ...........................................................21
Approaches to creating normal scores ..............................................28
CHAPTER 3 – Methodology.........................................................................32
Programming specifications...............................................................33
Sample sizes......................................................................................33
Number of Monte Carlo repetitions ....................................................33
Achievement and psychometric distributions.....................................33
Presentation of results .......................................................................34
CHAPTER 4 – Results .................................................................................43
CHAPTER 5 – Conclusion............................................................................89
Discussion..........................................................................................92
Moment 1—mean ..............................................................................92
Moment 2—standard deviation ..........................................................92
Moment 3—skewness........................................................................95
Moment 4—kurtosis ...........................................................................95
Recommendations.............................................................................96
REFERENCES........................................................................................................98
ABSTRACT ...........................................................................................................110
AUTOBIOGRAPHICAL STATEMENT...................................................................112
LIST OF TABLES
Table 1. Differences among Ranking Methods in Attaining Target Moments .........25
Table 2. Smooth Symmetric—Accuracy of T Scores on Means..............................45
Table 3. Smooth Symmetric—Accuracy of T Scores on Standard Deviations ........46
Table 4. Smooth Symmetric—Accuracy of T Scores on Skewness ........................47
Table 5. Smooth Symmetric—Accuracy of T Scores on Kurtosis ...........................48
Table 6. Discrete Mass at Zero—Accuracy of T Scores on Means.........................49
Table 7. Discrete Mass at Zero—Accuracy of T Scores on Standard Deviations ...50
Table 8. Discrete Mass at Zero—Accuracy of T Scores on Skewness ...................51
Table 9. Discrete Mass at Zero—Accuracy of T Scores on Kurtosis.......................52
Table 10. Extreme Asymmetric, Growth—Accuracy of T Scores on Means ...........53
Table 11. Extreme Asymmetric, Growth—Accuracy of T Scores on Standard
Deviations................................................................................................................54
Table 12. Extreme Asymmetric, Growth—Accuracy of T Scores on Skewness......55
Table 13. Extreme Asymmetric, Growth—Accuracy of T Scores on Kurtosis .........56
Table 14. Digit Preference—Accuracy of T Scores on Means….............................57
Table 15. Digit Preference—Accuracy of T Scores on Standard Deviations...........58
Table 16. Digit Preference—Accuracy of T Scores on Skewness...........................59
Table 17. Digit Preference—Accuracy of T Scores on Kurtosis ..............................60
Table 18. Multimodal Lumpy—Accuracy of T Scores on Means.............................61
Table 19. Multimodal Lumpy—Accuracy of T Scores on Standard Deviations .......62
Table 20. Multimodal Lumpy—Accuracy of T Scores on Skewness .......................63
Table 21. Multimodal Lumpy—Accuracy of T Scores on Kurtosis...........................64
Table 22. Mass at Zero with Gap—Accuracy of T Scores on Means......................65
Table 23. Mass at Zero with Gap—Accuracy of T Scores on Standard
Deviations................................................................................................................66
Table 24. Mass at Zero with Gap—Accuracy of T Scores on Skewness ................67
Table 25. Mass at Zero with Gap—Accuracy of T Scores on Kurtosis....................68
Table 26. Extreme Asymmetric, Decay—Accuracy of T Scores on Means.............69
Table 27. Extreme Asymmetric, Decay—Accuracy of T Scores on Standard
Deviations................................................................................................................70
Table 28. Extreme Asymmetric, Decay—Accuracy of T Scores on Skewness .......71
Table 29. Extreme Asymmetric, Decay—Accuracy of T Scores on Kurtosis...........72
Table 30. Extreme Bimodal—Accuracy of T Scores on Means...............................73
Table 31. Extreme Bimodal—Accuracy of T Scores on Standard Deviations .........74
Table 32. Extreme Bimodal—Accuracy of T Scores on Skewness .........................75
Table 33. Extreme Bimodal—Accuracy of T Scores on Kurtosis ............................76
Table 34. Deviation from Target, Summarized by Moment, Sample Size, and
Distribution ..............................................................................................................90
Table 35. Winning Approximations, Summarized by Moment, Sample Size, and
Distribution ..............................................................................................................91
LIST OF FIGURES
Figure 1. Comparison of Scores in a Normal Distribution .........................................3
Figure 2. Distribution of T Scores Using Blom’s Approximation: Good fit on all four
moments .................................................................................................................26
Figure 3. Distribution of T Scores Using Blom’s Approximation: Poor fit on second
and third moments ..................................................................................................27
Figure 4. Distribution of T Scores Using Blom’s Approximation: Poor fit on fourth
moment ...................................................................................................................28
Figure 5. Achievement: Smooth Symmetric ............................................................35
Figure 6. Achievement: Discrete Mass at Zero .......................................................36
Figure 7. Achievement: Extreme Asymmetric, Growth............................................37
Figure 8. Achievement: Digit Preference.................................................................38
Figure 9. Achievement: Multimodal Lumpy .............................................................39
Figure 10. Psychometric: Mass at Zero with Gap....................................................40
Figure 11. Psychometric: Extreme Asymmetric, Decay...........................................41
Figure 12. Psychometric: Extreme Bimodal ............................................................42
Figure 13. Smooth Symmetric: Power curve for deviation range of standard
deviation..................................................................................................................78
Figure 14. Smooth Symmetric: Power curve for deviation range of kurtosis ...........78
Figure 15. Discrete Mass at Zero: Power curve for deviation range of standard
deviation..................................................................................................................79
Figure 16. Discrete Mass at Zero: Power curve for deviation range of kurtosis ......79
Figure 17. Extreme Asymmetric, Growth: Power curve for deviation range of
standard deviation ...................................................................................................80
Figure 18. Extreme Asymmetric, Growth: Power curve for deviation range of
kurtosis ....................................................................................................................80
Figure 19. Digit Preference: Power curve for deviation range of standard
deviation..................................................................................................................81
Figure 20. Digit Preference: Power curve for deviation range of kurtosis................81
Figure 21. Multimodal Lumpy: Power curve for deviation range of standard
deviation..................................................................................................................82
Figure 22. Multimodal Lumpy: Power curve for deviation range of kurtosis ............82
Figure 23. Mass at Zero with Gap: Power curve for deviation range of standard
deviation..................................................................................................................83
Figure 24. Mass at Zero with Gap: Power curve for deviation range of kurtosis .....83
Figure 25. Extreme Asymmetric, Decay: Power curve for deviation range of
standard deviation ...................................................................................................84
Figure 26. Extreme Asymmetric, Decay: Power curve for deviation range of
kurtosis ....................................................................................................................84
Figure 27. Extreme Bimodal: Power curve for deviation range of standard
deviation..................................................................................................................85
Figure 28. Extreme Bimodal: Power curve for deviation range of kurtosis ..............85
Figure 29. Smooth Symmetric: Power curve for deviation range of standard
deviation with inclusion of large sample sizes .........................................................87
Figure 30. Digit Preference: Power curve for deviation range of standard deviation
with inclusion of large sample sizes ........................................................................87
Figure 31. Mass at Zero with Gap: Power curve for deviation range of kurtosis with
inclusion of large sample sizes................................................................................88
CHAPTER 1
INTRODUCTION
To those who believe that “the purpose of data analysis is to analyze data better”
it is clearly wise to learn what a procedure really seems to be telling us about.
(J. W. Tukey, 1962)
Standardized tests can be used to determine aptitude or achievement
(Thorndike, 1982). Whether the goal of a test is to measure differences in ability,
personality, or mastery of a subject, it is necessary to analyze individual scores
relative to others in the group and also to analyze group scores relative to other
group scores (Angoff, 1971; AERA, APA, & NCME, 1999; Netemeyer, Bearden, &
Sharma, 2003). Scores are ultimately interpreted according to the purpose of
the test. For example, academic aptitude tests are likely to be interpreted
competitively, with high performing students favored for scholarships or admission
to selective programs and low performing students targeted for remediation.
Achievement tests are typically interpreted in the light of performance benchmarks
and used to measure the adequacy of teaching methods or school performance.
Analysis for either purpose requires a frame of reference for the interpretation of
raw scores (Aiken, 1994).
Standardization and normalization are two ways of defining the frame of
reference for a distribution of test scores. Both types of score conversions, or
transformations, mathematically modify raw score values (Osborne, 2002). The
defining feature of standard scores is that they use standard deviations to describe
scores’ distance from the mean, thereby creating equal units of measure within a
given score distribution. Standard scores may be modified to change the scale’s
number system (Angoff, 1984), but unless distributions of standard scores are
normalized, they will retain the shape of the original score distribution. Therefore,
standardization may enable effective analysis of individual scores within a single
test, but it does not lead to meaningful comparisons between tests.
Normalization surmounts this limitation by equalizing the areas under the
curve that correspond with scores’ successive intervals along the curve.
Normalization is considered a type of area transformation because it “redefines the
unit separations” (Angoff, 1984, p. 36), changing the shape of the distribution itself.
Normalization has two great strengths, the first of which is shared by
standardization: 1) it transforms ordinal scales into continuous scales, which are
mathematically tractable; and 2) it superimposes a normal curve onto nonnormal
distributional shapes, allowing for between-test comparisons.
Normal scores may be scaled to make them easier to interpret. For
example, the formula T = 10Z + 50 replaces normalized standard scores with T
scores, which have a mean of 50 and a standard deviation of 10. Many normal
score systems are assigned means and standard deviations that correspond with
the T score. For example, the College Entrance Board’s Scholastic Aptitude Test
(SAT) Verbal and Mathematical sections are scaled to a mean of 500 and a
standard deviation of 100. Thus, T scores fall between 20 and 80 and SAT scores
fall between 200 and 800. Other normalized standard scores include normal curve
equivalent (NCE) scores, which have a mean of 50, a standard deviation of 21,
and a score range of 1-99; Wechsler scales, which have a mean of 100, a
standard deviation of 15, and a 95% score range of 55-145; and stanines, which
have a mean of 5, a standard deviation of 2, and a finite score range of 1-9.
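The scale systems above are all linear rescalings of the normal deviate Z, with the stanine additionally rounded and clamped to its finite range. A minimal sketch (the function name is illustrative; the constants are the means and standard deviations given above, with the NCE centered at 50):

```python
def scale_scores(z):
    """Apply the scaled-score systems described above to one normal
    deviate Z. Each is a linear rescaling m * Z + b; the stanine is
    also rounded and clamped to its finite 1-9 range."""
    return {
        "T": 10 * z + 50,
        "SAT": 100 * z + 500,
        "NCE": 21 * z + 50,
        "Wechsler": 15 * z + 100,
        "stanine": min(9, max(1, round(2 * z + 5))),
    }

scale_scores(0.0)   # center of the distribution: T 50, SAT 500, stanine 5
```

A deviate at the center of the distribution (Z = 0) maps to the mean of every scale, while Z = 3 already saturates the stanine scale at 9.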
Figure 1. Comparison of scores in a normal distribution. (Adapted from Test
Service Bulletin of The Psychological Corporation, 1955)
The first step in the process of converting raw scores into T scores or other
scaled, normal scores is a ranking of the raw scores according to their relative
placement on the unit normal distribution. This means that the raw scores will no
longer be used to characterize the test score distribution. Instead, raw scores will
be replaced by an estimate of their normal probability deviates. Whereas raw
scores originally refer to individual coordinates, they are transformed to become
components in the two-dimensional spaces, or categories, which comprise the
area under the normal distribution. Once these normal probability deviates, or Z
scores, are obtained, the desired mean and standard deviation are applied. In the
case of T scores, Z scores are multiplied by 10 and assigned a mean of 50.
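The conversion just described (rank the raw scores, estimate each rank's cumulative proportion, take the corresponding normal probability deviate, then scale) can be sketched as follows. This is a minimal illustration, assuming the r / (n + 1) proportion estimate and ignoring tied scores:

```python
from statistics import NormalDist

def t_scores(raw):
    """Convert raw scores to T scores by rank-based normalization:
    rank each score, estimate its cumulative proportion on the unit
    normal distribution (here with r / (n + 1)), look up the normal
    probability deviate Z, and apply T = 10Z + 50."""
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r                      # rank 1..n by raw-score order
    nd = NormalDist()                     # unit normal distribution
    return [10 * nd.inv_cdf(r / (n + 1)) + 50 for r in ranks]
```

With five raw scores, for instance, the median score receives rank 3, a cumulative proportion of 0.5, a deviate of Z = 0, and hence a T score of exactly 50.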
A number of ranking methods that improve the accuracy and efficiency of
the traditional percentile method have been developed in the last 60 years. These
ranking methods are sometimes referred to as proportion estimates because they
approximate where the ordinal scores fall along a normal distribution and how
much of the corresponding area under the curve the ranked, cumulative
proportions occupy. The most prominent of these procedures, based on their
inclusion in widely used computer statistical software (e.g., SPSS, 2006), are those
attributed to Van der Waerden (1952, 1953a, 1953b; Lehmann, 1975), Blom
(1958), and Tukey (1962), and the Rankit procedure (Ipsen & Jerne, 1944; Bliss,
1956). These proportion estimates have been explored to various degrees in the
context of hypothesis testing, where the focus is necessarily on the properties of
these estimates in the tails of a distribution. In the context of standardized testing,
however, the body of the distribution—that is, the 95% of the curve that lies
between the tails—is the focus. To date, there has been no empirical comparison
of these ranking methods as they apply to standardized testing.
When normalizing standard scores, practitioners need to know the
comparative effects of their selected ranking method on the transformed score
outcomes. Specifically, during the transformation of Z scores into T scores, the
practitioner would benefit from knowing each method’s potential accuracy and how
frequently it is capable of attaining a specific level of accuracy. Conversely, each
method’s likely degree and frequency of inaccuracy should be taken into account.
For T scores, the criteria for comparing ranking methods are the accuracy and
frequency of random scores’ attainment of a mean of 50 and a standard deviation
of 10. The standard deviation of T scores is not, by itself, a useful point of
comparison because it is computed around the mean; its degree of accuracy
derives from that of the mean and cannot serve as an independent reference
point. However, the accuracy of the standard deviation (that is, how nearly and how
frequently it attains a value of 10) is equally important once the value of the mean
has been shown to be 50.
T scores express only the first and second moments of the distribution,
central tendency (mean) and variability (standard deviation), but they may also be
affected by the third and fourth moments, asymmetry (skewness) and peakedness
(kurtosis). Although each of these ranking methods is designed to produce a unit
normal score distribution, they may not achieve ideal skewness and kurtosis. A
normal curve is perfectly symmetrical, meaning it has zero skew. A kurtosis of
three (3) means the shape of the curve is neither more peaked nor more flat than
the shape of an idealized normal distribution. It is necessary to examine the
skewness and kurtosis of T scores, in addition to their means and standard
deviations, in order to fully evaluate each ranking method’s effectiveness in
normalizing test scores.
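Evaluating a ranking method on all four moments amounts to computing the following sample statistics for each batch of T scores. A minimal sketch, using the population-moment form and reporting kurtosis on the scale where an ideal normal curve equals 3:

```python
from statistics import fmean

def four_moments(x):
    """Mean, standard deviation, skewness, and kurtosis of a score list.
    A perfectly normal distribution has zero skew and a kurtosis of 3
    on this (non-excess) scale."""
    n = len(x)
    m = fmean(x)
    devs = [v - m for v in x]
    var = sum(d * d for d in devs) / n
    sd = var ** 0.5
    skew = sum(d ** 3 for d in devs) / (n * sd ** 3)
    kurt = sum(d ** 4 for d in devs) / (n * var ** 2)
    return m, sd, skew, kurt
```

Any symmetric score list yields zero skewness; how far its kurtosis falls from 3 measures whether it is more peaked or flatter than the normal curve.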
Research Problem
Given the importance of transforming Z scores to a scale that preserves a
mean of 50 and a standard deviation of 10, this study aims to empirically
demonstrate the relative accuracy of the Blom, Tukey, Van der Waerden, and
Rankit approximations for the purpose of normalizing test scores. It will compare
their accuracy in terms of achieving the T score’s specified mean and standard
deviation and unit normal skewness and kurtosis, among small and large sample
sizes in an array of real, nonnormal distributions. Although this objective is an
applied one, the investigation will benefit the theoretical advancement of area
estimation under the normal distribution.
Importance of the Problem
Standardized test scores, even scores abiding by the familiar T score scale,
are notoriously difficult to interpret (Micceri, 1990). Most test-takers, parents, and
even many educators, would be at a loss to explain exactly what a score of 39, 73,
or 428 means in conventional terms, such as pass/fail, percentage of questions
answered correctly, or performance relative to other test-takers. The matter is
complicated by standard error. Once error is computed and added/subtracted from
a given test score, it reveals a range of possible true scores.
Thus, a standard error of three would produce a range spanning six points: it
would show the score 52 to be potentially as low as 49 or as high as 55. This
example assumes that the mean is 50. However, if a different ranking method
produces a mean of 51, the test-taker’s score would be between 50 and 56—or
combining the two methods’ results, between 49 and 56. If yet another method
produces a mean closer to 49, then theoretically, a test-taker’s true score could lie
anywhere between 48 and 56. The potential range of true scores expands with
each alternate method of computing the normalized score. Error is not a fixed
quantity; it may vary across computational methods as well as sample sizes and
statistical distributions.
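The widening described above can be made concrete: shift the error band by each method's offset from the nominal mean of 50 and take the union of the resulting ranges. A hypothetical sketch (the function and its arguments are illustrative, not from the study):

```python
def true_score_range(observed, se, method_means):
    """Union of the possible true-score ranges for one observed score
    when each ranking method may center the scale at a slightly
    different mean. Each method's band is observed +/- se, shifted by
    that method's offset from the nominal mean of 50."""
    lo = min(observed - se + (m - 50) for m in method_means)
    hi = max(observed + se + (m - 50) for m in method_means)
    return lo, hi

true_score_range(52, 3, [50, 51, 49])   # -> (48, 56), as in the example above
```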
The accuracy, both in terms of degree and frequency, of the four most
visible ranking methods has not been established. Blom, Tukey, Van der Waerden,
and Rankit each contribute a ranking formula that approximates a normal
distribution, given a set of raw scores or nonnormalized standard scores. However,
the formulas themselves have not been systematically compared for their first four
moments’ accuracy in terms of normally distributed data. Nor have they been
compared in the harsher glare of nonnormal distributions, which are prevalent in
the fields of education and psychology (Micceri, 1989). Small samples are also
common in real data and are known to have different statistical properties than
large samples (Conover, 1980). In general, real data can be assumed to behave
differently than data that is based on theoretical distributions, even if these are
nonnormal (Stigler, 1977).
Assumptions and Limitations
A series of Monte Carlo simulations will draw samples of different sizes from
eight different empirically established population distributions. These eight
distributions, though extensive in their representation of real achievement and
psychometric test scores, do not represent all possible distributions that could
occur in educational and psychological testing, or in social and behavioral science
investigations more generally. Nor do the sample sizes represent every possible
increment. However, both the sample size increments and the range of
distributional types are assumed to be sufficient for the purpose of outlining the
comparative accuracy and reliability of the ranking methods in real settings.
Although the interpretation of results need not be restricted to educational and
psychological data, similar distributional types may be most often found in these
domains.
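The simulation design can be sketched as follows: draw repeated samples with replacement from an empirical population, normalize each one, and record how far a target moment falls from its nominal value. This illustration assumes the r / (n + 1) proportion estimate and tracks only the standard deviation's deviation from 10; the study itself examines all four methods, all four moments, and eight real distributions:

```python
import random
from statistics import NormalDist, fmean

def t_scores(raw):
    # minimal rank-to-T conversion using r / (n + 1) as the proportion estimate
    n = len(raw)
    order = sorted(range(n), key=lambda i: raw[i])
    ranks = [0] * n
    for r, i in enumerate(order, start=1):
        ranks[i] = r
    nd = NormalDist()
    return [10 * nd.inv_cdf(r / (n + 1)) + 50 for r in ranks]

def sd_deviation(population, n, reps=1000, seed=1):
    """Mean absolute deviation of the T-score standard deviation from
    its target of 10, over `reps` Monte Carlo samples of size n drawn
    with replacement from an empirical population distribution."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = [rng.choice(population) for _ in range(n)]
        t = t_scores(sample)
        m = fmean(t)
        total += abs((sum((v - m) ** 2 for v in t) / n) ** 0.5 - 10)
    return total / reps
```

Seeding the generator makes each experimental condition reproducible, which is the usual practice when comparing methods on identical streams of random samples.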
Definitions
Z scores Raw scores or random variables that have undergone
the standardizing transformation (X – µ) / σ, where µ is
the population mean and σ is the population standard deviation.
Also called unmodified standard scores.
Normal scores Raw scores or standard scores that have undergone a
normalizing transformation such that the ordinal
rankings of scores correspond to their probability
deviates on the unit normal distribution.
T scores Raw scores or standard scores that have undergone
the scaling transformation 10Z + 50, where Z is the
normal probability deviate corresponding to the ordinal
rank of the original raw or standard score.
Proportion estimates Approximation formulas estimating the cumulative
areas under a unit normal distribution that fall below the
ordinal rankings of test scores.
Rankit approximation A proportion estimate using the formula (r - 1/2) / n. *
Van der Waerden’s approximation A proportion estimate using the formula
r / (n + 1), where r is the rank, ranging from 1 to n.
Blom’s approximation A proportion estimate using the formula (r - 3/8) / (n +
1/4).
Tukey’s approximation A proportion estimate using the formula (r - 1/3) / (n +
1/3).
Monte Carlo simulation A statistical experiment modeled on a computer that
uses an iterative random sampling process, usually with
replacement of data values, to demonstrate the
behavior of statistical methods under specified
conditions.
* Notation for these four approximation formulas varies in the literature: 1) r is used
interchangeably with i and k; and 2) n is used interchangeably with w.
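The four approximation formulas defined above differ only in small offsets to the rank and sample size, but those offsets matter most toward the tails. A sketch comparing the normal probability deviates each method assigns to the ranks of a sample:

```python
from statistics import NormalDist

# The four proportion estimates defined above (r = rank, n = sample size).
ESTIMATES = {
    "Rankit":          lambda r, n: (r - 1/2) / n,
    "Van der Waerden": lambda r, n: r / (n + 1),
    "Blom":            lambda r, n: (r - 3/8) / (n + 1/4),
    "Tukey":           lambda r, n: (r - 1/3) / (n + 1/3),
}

def deviates(n):
    """Normal probability deviates each method assigns to ranks 1..n."""
    nd = NormalDist()
    return {name: [nd.inv_cdf(f(r, n)) for r in range(1, n + 1)]
            for name, f in ESTIMATES.items()}
```

For n = 10, all four methods place the middle ranks at nearly identical deviates, but Rankit assigns the top rank a proportion of 0.95 versus Van der Waerden's 10/11 ≈ 0.91, pushing its extreme scores deepest into the tails.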
CHAPTER 2
LITERATURE REVIEW
The development of ranking methods stems from two related enterprises:
the psychological effort to measure mental phenomena and the statistical effort to
calculate the area under the unit normal distribution. Knowledge, intellectual ability,
and personality are psychological objects that can only be measured indirectly, not
by direct observation (Dunn-Rankin, 1983). The scales that describe them are
hierarchical—they result in higher or lower scores—but these scores do not
express exact quantities of test-takers’ proficiency or attitudes.
Likert scales, which are ordinal, and multiple choice items, which produce
discrete score scales, result in numbers that are meaningless without purposeful
statistical interpretation (Nanna & Sawilowsky, 1998). Measures with unevenly
spaced increments interfere with the interpretation of test scores against
performance benchmarks, the longitudinal linking of test editions, and the equating
of parallel forms of large-scale tests (Aiken, 1987). They also threaten the
robustness and power of the parametric statistical procedures that are
conventionally used to analyze standardized test scores (Friedman, 1937;
Sawilowsky & Blair, 1992).
Mental Testing and the Normal Distribution
Standardized test scores present a unique set of statistical considerations
because the scoring system may be devised for different purposes. Mehrens and
Lehmann (1987) characterized these purposes as instructional, guidance,
administrative, or research, but admittedly, these purposes often overlap. If the
purpose of a test is to discriminate between test-takers’ ability or achievement
levels, the scoring system would create maximum variability between scores. If its
purpose is to evaluate students’ progress toward a specified objective, then the
degree of variability between scores is less relevant. Apart from the natural range
of test-takers’ aptitude, subject-matter proficiency, and range of attitudes or
personality characteristics, a test’s design has a strong influence on its score
distribution.
Norm-Referencing and the T Score
The history of testing is fraught with incorrect distributional assumptions.
According to Angoff (1984), “the assumption underlying the search for equal units
was that mental ability is fundamentally normally distributed and that equal
segments on the base line of a normal curve would pace off equal units of mental
ability” (p. 11). McCall (1939) devised the T score scale on this same assumption,
naming it after the educational and psychological measurement pioneers
Thorndike and Terman (Walker & Lev, 1969). McCall derived a normal scale by
randomly selecting individuals from a population that was presumed to be
homogeneous, testing them, creating a distribution from their scores, and
transforming their percentile ranks to normal deviate scores with a preassigned
mean of 50 and standard deviation of 10. Today, this method would be considered
appropriate for norm-referencing a test to a target population, but thoroughly
inappropriate for determining any true ability distribution. Although there is no
reason to assume that cognitive phenomena are normally distributed, norm-referencing can be useful for comparing individuals’ performance to others in the
same population.
Even when norming makes correct distributional assumptions, it can be
problematic. Angoff (1971) argued against normative scoring systems that have
built-in, definitional, or inherent meaning. These meanings are liable to be lost over
time or to become irrelevant. Aiken (1994) cautioned that norms can become
outdated even more quickly in certain circumstances: “for example, changes in
school curricula may necessitate restandardizing and perhaps modifying and
reconstructing an achievement test every 5 years or so” (p. 78). Furthermore, scales
can function independently of direct representation. For example, inches, pounds,
and degrees Fahrenheit no longer reference their original object for most
Americans, but serve as effective measures nonetheless, due to their familiarity
and reliability. Likewise, the T score owes much of its usefulness to its
longstanding place as the scale of choice.
Despite these arguments, Mehrens and Lehmann (1987) viewed norm-
referencing as the basis for most testing theory and practice. It is “useful in
aptitude testing where we wish to make differential predictions. It is also very
useful to achievement testing”(p.18). They also noted that standardized tests are
often used in both norm-referenced and criterion-referenced contexts; they may be
constructed and interpreted to simultaneously compare a student’s performance
relative to other students in the target test-taking population as well as to evaluate
the student’s absolute knowledge of a subject. Norms may be referenced to
national, regional, and local standards; age and grade; mental age; percentiles; or
standard scores that are a function of a specific group’s performance.
Nonnormality Observed
According to Nunnally (1978), “test scores are seldom normally
distributed”(p.160). Micceri (1989) demonstrated the extent of this phenomenon in
the social and behavioral sciences by evaluating the distributional characteristics
of 440 real data sets collected from the fields of education and psychology.
Standardized scores from national, statewide, and districtwide test scores
accounted for 40% of them. Sources included the Comprehensive Test of Basic
Skills (CTBS), the California Achievement Tests, the Comprehensive Assessment
Program, the Stanford Reading tests, the Scholastic Aptitude Tests (SATs), the
College Board subject area tests, the American College Tests (ACTs), the
Graduate Record Examinations (GREs), Florida Teacher Certification
Examinations for adults, and Florida State Assessment Program test scores for 3rd, 5th, 8th, 10th, and 11th grades.
Micceri summarized the tail weights, asymmetry, modality, and digit
preferences for the ability measures, psychometric measures, criterion/mastery
measures, and gain scores. Over the 440 data sets, Micceri found that only 19
(4.3%) approximated the normal distribution. No achievement measure’s scores
exhibited symmetry, smoothness, unimodality, or tail weights that were similar to
the Gaussian distribution. Underscoring the conclusion that normality is virtually
nonexistent in educational and psychological data, none of the 440 data sets
passed the Kolmogorov-Smirnov test of normality at alpha = .01, including the 19
that were relatively symmetric with light tails. The data collected from this study
highlight the prevalence of nonnormality in real social and behavioral science data
sets:
The great variety of shapes and forms suggests that respondent samples themselves consist of a variety of extremely heterogeneous subgroups, varying within populations on different yet similar traits that influence scores for specific measures. When this is considered in addition to the expected dependency inherent in such measures, it is somewhat unnerving to even dare think that the distributions studied here may not represent most of the distribution types to be found among the true populations of ability and psychometric measures. (Micceri, 1989, p.162)

Furthermore, it is unlikely that the central limit theorem will rehabilitate the
demonstrated prevalence of nonnormal data sets in applied settings. Tapia and
Thompson (1978) warned against the “fallacious overgeneralization of central limit
theorem properties from sample means to individual scores”(cited in Micceri, 1989,
p.163). Although sample means may increasingly approximate the normal
distribution as sample sizes increase (Student, 1908), it is wrong to assume that
the original population of scores is normally distributed. According to Friedman
(1937), “this is especially apt to be the case with social and economic data, where
the normal distribution is likely to be the exception rather than the rule”(p.675).
Statistical Considerations
There has been considerable empirical evidence that raw and standardized
test scores are nonnormally distributed in the social and behavioral sciences. In
addition to Micceri (1989), numerous authors have raised concerns regarding the
assumption of normally distributed data (Pearson, 1895; Wilson & Hilferty, 1929;
Allport, 1934; Simon, 1955; Tukey & McLaughlin, 1963; Andrews et al., 1972;
Pearson & Please, 1975; Stigler, 1977; Bradley, 1978; Tapia & Thompson, 1978;
Tan, 1982; Sawilowsky & Blair, 1992). Bradley (1977) summarized the rationale for
adopting a statistical approach that responds to the fundamental nonnormality of
most real data:
One often hears the objection that if a distribution has a bizarre shape one should simply find and control the variable responsible for it. This outlook is appropriate enough to the area of quality control, but it is inappropriate to the behavioral sciences, and perhaps other areas, where the experimenter, even if he knew about the culprit variable and its influence upon population shape, is generally not interested in eliminating an assignable cause, but rather in coping with (i.e., drawing inferences about) a population in which it is free to vary. (p.149)
The prevalence of nonnormal distributions in education, psychology, and related
disciplines calls for a closer look at transformation procedures in the domain of
achievement and psychometric test scoring.
Transformations take many forms, ranging from the unadjusted linear
transformation to the logarithmic, square root, arc-sine, reciprocal, and inverse
normal scores transformations. Percentiles may also be staging a comeback.
Zimmerman and Zumbo (2005) argued that “a transformation to percentiles or
deciles is also similar to various normalizing transformations” insofar as those
transformations “bring sample values from nonnormal populations closer to a
normal distribution”(p.636). Percentile ranks denote the percentage of scores
falling below a certain point on the frequency distribution. They compared the
assignment of percentile values to raw scores with the assignment of ranks to raw
scores.
Traditionally, ranking was done by computing percentile ranks for the raw
scores, then finding the corresponding values from a normal probability
distribution. Today, statistical ranking formulas such as the Blom, Tukey, Van der
Waerden, and Rankit are used to estimate the normal probability deviates. Both
percentiles and statistical ranking methods minimize several types of deviations
from normality, but according to Zimmerman and Zumbo, “the percentile
transformation preserves the relative magnitude of scores between samples as
well as within samples”(p.635). This may be advantageous in certain
circumstances, but normalizing transformations have enduring appeal due to their
familiarity and efficiency.
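The contrast among these statistical ranking formulas can be made concrete. The sketch below is illustrative only: Blom’s rule (i − 3/8)/(n + 1/4) and the Rankit rule (i − 1/2)/n are quoted later in this chapter, while the Tukey (i − 1/3)/(n + 1/3) and Van der Waerden i/(n + 1) forms are the standard textbook versions and are an assumption here.

```python
from statistics import NormalDist

# Proportion-estimation (plotting-position) rules in their commonly
# tabulated forms.
FORMULAS = {
    "Blom":            lambda i, n: (i - 0.375) / (n + 0.25),
    "Tukey":           lambda i, n: (i - 1 / 3) / (n + 1 / 3),
    "Van der Waerden": lambda i, n: i / (n + 1),
    "Rankit":          lambda i, n: (i - 0.5) / n,
}

def normal_scores(n, method):
    """Expected normal deviates for ranks i = 1..n under the given rule."""
    p = FORMULAS[method]
    return [NormalDist().inv_cdf(p(i, n)) for i in range(1, n + 1)]

for name in FORMULAS:
    z = normal_scores(5, name)
    print(f"{name:>15}: largest deviate = {z[-1]:.3f}")
```

All four rules keep the estimated proportions strictly inside (0, 1); they differ mainly in how far toward the tails they place the extreme ranks, which is where their deviate estimates diverge most.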
History of normalizing transformations. An ordinal scale presents only score
ranks, without any reference to the distance between those ranks. There is no way
of knowing whether the distance between ranks (for example, the second-highest
and third-highest scores in a set) is similar to that between other ranks in the set.
Theorists have proposed proportion estimation formulas to deduce the average
distance between ranks based on what is known about the properties of the unit
normal distribution.
As described by Harter (1961):
The problem of order statistics has received a great deal of attention from statisticians dating at least as far back as a paper by Karl Pearson (1902) giving a solution of a generalization of a problem proposed by Galton (1902). The generalized problem is that of finding the average difference between the pth and the (p+1)th individuals in a sample of size n when the sample is arranged in an order of magnitude. (p.151)
Other early attempts at characterizing variance among ordinal scales include Irwin
(1925); Tippet (1925); Thurstone (1928); Pearson and Pearson (1931); Fisher and
Yates (1938, 1953); Ipsen and Jerne (1944); Hastings, Mosteller, Tukey, and
Winsor (1947); Wilks (1948); Godwin (1949); Federer (1951); Mosteller (1951);
Bradley and Terry (1952); Scheffé (1952); Cadwell (1953); Pearson and Hartley
(1954); Blom (1954); Kendall (1955); and Harter (1959).
The pursuit of a useful way to characterize the difference between ordinal
points on a scale has primarily stemmed from the concerns of hypothesis testing.
This context has driven a focus on interval estimates and the extremes of the
normal distribution, because these are the areas that define the null hypothesis.
Testing, on the other hand, is primarily concerned with the differences which
characterize the body of the score distribution. In many research settings, ordinal
scales are often mathematically transformed into continuous scales in order to be
analyzed using parametric methods. According to Tukey (1957):
The analysis of data usually proceeds more easily if (1) effects are additive; (2) the error variability is constant; (3) the error distribution is symmetrical and possibly nearly normal.
The conventional purposes of transformation are to increase the degrees of approximation to which these desirable properties hold (p.609).
Transforming scales to a higher level of measurement leads to the problem of
gaps. “It is inevitable that gaps occur in the conversions when there are more scale
score points than raw score points, and gaps may be more of a problem for some
transformation methods and tests than for others.”(Chang, 2006, p.927). For this
reason, Bartlett advised “that even when measurements are available it may be
safer to analyze by use of ranks”(1947, p.50) by transforming them to expected
normal scores. “It is reasonable to assume that if the ranked data were replaced by
expected normal scores, the validity of the analysis of variance would be
somewhat improved”(p.50).
Transforming ordinal data into a continuous scale has been popular since
Fisher and Yates tabled the normal deviates in 1938. According to Wimberly
(1975):
An inherently linear relationship among the T-scores of different variables is free of mismatched kurtoses, skewnesses, and standard deviations which attenuate correlations or which lead to artificial non-linearities in regressions. Furthermore, the T-score transformation should generally result in a more nearly normal distribution than that provided by other transformations such as those from logarithms, exponents, or roots. (p.694)
T scores also have the advantage of being the most familiar scale, thus facilitating
score interpretation. The prime importance of interpretability has been stressed by
Petersen et al. (1989), Kolen and Brennan (2004), and Chang (2006).
Blom (1954) observed that “nearly all the transformations used hitherto in
the literature for normalization of binomial and related variables can be developed
from a common starting point”(p.303). Blom was referring to the use of the normal
probability integral to solve tail and confidence problems associated with certain
transformations, but this generalization holds conceptual value as well. The fact
that test scores are ordinal can be understood as the statistical point of origin for
the advantages and liabilities of normalizing transformations.
Transformation controversies. There has been considerable debate about
the statistical properties of various data transformations in the context of
hypothesis testing. This literature originally concerned the robustness of
parametric statistics such as the analysis of variance (ANOVA) to Type I error
(Glass, Peckham, & Sanders, 1972). Many early studies concluded that
transformations are unnecessary for ANOVA because the F test is impervious to
Type I error except in cases of heterogeneity of variance and unequal sample
sizes. Srivastava (1959), Games and Lucas (1966), and Donaldson (1968)
explored both Type I and Type II error rates for the F test among nonnormally
distributed data, suggesting that the test’s power increased in cases of extreme
skew and acute kurtosis.
Levine and Dunlap (1982) argued that power can generally be increased by
transforming skewed and heteroscedastic data. They took issue with the more
conservative approach of Games and Lucas, who “viewed transformation of data
as defensible only if it produced Type I error rates closer to the nominal
significance level when the null hypothesis was true and a lowered probability of
Type II errors (i.e., higher power) when the null hypothesis was false”(p.273). For
Levine and Dunlap, data transformations can do more than minimize error under
specific violations of ANOVA assumptions. They can be used for the express
purpose of increasing power.
Games (1983) proceeded to redefine the argument by repositioning
skewness among the other three moments (central tendency, variability, and
kurtosis) that are changed by normalizing transformations. Power fluctuations
should be seen as resulting from the combination of transformed moments, not
skewness alone. Furthermore, Games argued that normalizing transformations
should not be undertaken out of a mechanistic desire to correct skew and increase
power. In line with Bradley (1978), Games (1983) held that “if Y has been
designated as the appropriate scale for psychological interpretation, then the
observation that Y is skewed is certainly an inadequate basis to cause one to
switch to a curvilinear transformation”(p.385-6).
Games also questioned the process of selecting transformations for
variance stabilization and normalization. “It is possible that a variance stabilizing
transformation may not be normalizing, and vice versa”(p.386), especially with
small samples. Games criticized Levine and Dunlap for not recognizing the
complexity of the decision to transform and the difficulty of evaluating the
appropriateness of specific transformations for specific purposes. Finally, Games
asserted that Levine and Dunlap generated their findings under irrelevant
statistical conditions (their sample data were neither skewed nor heteroscedastic),
which led to a facile conclusion. “Nobody in the literature has advocated taking
such data and applying a transformation”(p.386).
Levine and Dunlap (1983) disputed Games’ (1983) criticism, foremost the
assertion that transformations ought to be undertaken exclusively to correct skew.
Claiming that empirical demonstrations are insufficient, they invoked Kendall and
Stuart’s (1979) mathematical proof that the independent samples t test is the most
powerful statistical test in the case of normal, homoscedastic data. In short order,
Games (1984) rebutted Levine and Dunlap based on their “failure to distinguish
theoretical models from empirical data”(p.345), resulting in a fatal
misrepresentation of the behavior of empirical data.
Levine, Liukkonen, and Levine “partially resolved”(1992, p.680) this debate
by developing a statistic that identifies the effect of variance-stabilizing,
symmetrizing transformations on power. In line with Levine and Dunlap (1982,
1983), they concluded, albeit tentatively, that normalizing transformations could
indeed increase power for highly skewed data with equal sample sizes. This
represents a concession to Games’ (1983) emphasis on the dictates of observed
data: “In the absence of knowledge about the population distribution, we must rely
on the data itself to give clues as to which transformation to use”(p.691).
The Games-Levine controversy concerned the implications of
transformations for inferential statistical tests such as ANOVA. Here,
transformations may help to better meet parametric statistics’ underlying
assumptions and thereby reduce Type I and Type II errors. As this exchange
demonstrated, however, it is difficult to determine when it is justified to use a
transformation. The answer lies in the characteristics of the population, which can
only be inferred. Even when egregious assumption violations seem to warrant a
transformation, it is not known to what extent the transformation corrects the
condition. Finally, once a transformation has changed the data’s original metric,
the resulting test statistic may become unintelligible in terms of the research
question (Bradley, 1978; Games, 1983).
In descriptive statistics, on the other hand, transformations serve to clarify
non-intuitive test scores. For example, the normalizing T score transformation
takes raw scores from any number of different metrics, few of which would be
familiar to a test taker, teacher, or administrator, and gives them a common
framework. Therefore, the T score is immune to the restrictions of normalizing
transformations in hypothesis testing scenarios.
Standardizing Transformations
Although standard scores may be assigned any mean and standard
deviation through linear scaling, the Z score transformation, which produces a
mean of 0 and a standard deviation of 1 for normally distributed variables, is the
baseline standardization technique (Walker & Lev, 1969; Mehrens & Lehmann,
1980; Hinkle, Wiersma, & Jurs, 2003). In the case of normally distributed data, Z
scores are produced by dividing the deviation score (the difference between raw
scores and the mean of their distribution) by the standard deviation. However, Z
scores can be difficult to interpret because they produce decimals and negative
numbers. Because nearly all of the scores fall between -3 and +3, small changes in
decimals may imply large changes in performance. Also, because half the scores
are negative, it gives the impression to the uninitiated that half of the examinees
obtained an extremely poor outcome.
Linear scaling techniques. These problems can be remedied by multiplying
standard scores by a number sufficiently large to render decimal places trivial, then
adding a number large enough to eliminate negative numbers. The most common
type of modified standard score is one that multiplies Z scores by 10 to obtain their
standard deviation from the scaled mean of 50 (Cronbach, 1976; Kline, 2000). This
linear, scaling modification is sometimes confused with the T score formula, which
is a nonlinear, normalizing transformation. On the surface, the T score formula
resembles the modified standardization formula, but it operates on a different
principle. In the modified standard score formula Xm = 10Z + 50, Z is a standard
score, the product of the standardizing transformation (X – µ) / σ; in the T score
formula T = 10Z + 50, Z refers not to the standard score but to the normal deviate
corresponding to that score. McCall used a simple linear transformation to convert
a group of norm-referenced standard scores into T scores.
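The distinction can be made concrete with a small sketch. The skewed raw scores below are invented for illustration; both functions compute “10Z + 50,” but the linear version plugs in the ordinary standard score while the T score version plugs in the normal deviate of the score’s mid-rank percentile (a common convention, assumed here). Only the latter changes the shape of the distribution.

```python
from statistics import NormalDist, mean, pstdev

def modified_standard(scores):
    """Linear scaling: Xm = 10Z + 50, with Z the ordinary standard score."""
    m, s = mean(scores), pstdev(scores)
    return [10 * (x - m) / s + 50 for x in scores]

def t_score(scores):
    """Normalizing: T = 10Z + 50, with Z the normal deviate of the
    score's mid-rank percentile."""
    n = len(scores)
    out = []
    for x in scores:
        below = sum(1 for y in scores if y < x)
        equal = sum(1 for y in scores if y == x)
        z = NormalDist().inv_cdf((below + 0.5 * equal) / n)
        out.append(10 * z + 50)
    return out

skewed = [1, 1, 2, 2, 3, 4, 6, 9, 14, 30]   # positively skewed raw scores
print([round(v, 1) for v in modified_standard(skewed)])
print([round(v, 1) for v in t_score(skewed)])
```

The linearly scaled scores stretch far above 50 but stay bunched below it, preserving the original skew; the rank-based T scores spread roughly symmetrically about 50.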
The utility of modified standard scores is severely restricted by the nature of
achievement and psychometric test scores. Modified standard scores can only be
obtained for continuous data because they require computation of the mean.
However, most educational and psychological test scores are on a discrete scale,
not a continuous scale (Lester & Bishop, 2000). Furthermore, linear
transformations retain the shape of the original distribution. If a variable’s original
distribution is Gaussian, its transformed distribution will also be normal. If an
observed distribution manifests substantial skew, excessive or too little kurtosis, or
multimodality, these non-Gaussian features will be maintained in the transformed
distribution.
This is problematic for a wide range of practitioners because it is common
practice for educators to compare or combine scores on separate tests and for
testing companies to reference new versions of their tests to earlier versions.
Standard scores such as Z will not suffice for these purposes because they do not
account for differing score distributions between tests. Comparing scores from a
symmetric distribution with those from a negatively skewed distribution, for
example, will give more weight to the scores at the lower range of the skewed
curve than to those at the lower range of the symmetric curve (Horst, 1931).
For example, Wright (1973) described a scenario where standardization
would lend itself to the unequal weighting of test scores:
Some subjects, such as mathematics, tend to have widely dispersed scores while other subjects, such as English Composition, tend to have narrowly dispersed scores. Thus a student who is excellent in both subjects will find his mathematics grade of more value to his average than his English grade; the converse is of course true for the student who is poor in both subjects. If
you wish to have all subjects equally weighted you must perform a transformation that will equate their dispersions (p.4).
This scenario illustrates the necessity of normalizing transformations, which are
curvilinear, for rendering standard deviations uniform across test score
distributions. However, normalizing transformations may also mitigate the
inequitable interpretation of asymmetrical score distributions. A test score
distribution that is positively skewed has more variability than normal on the lower
end; therefore, cut points that are determined according to a specific standard
score or a standard deviation are likely to refer too many students to remedial
services.
Using Area Transformations to Normalize Score Distributions
Whereas linear transformations facilitate the interpretation of continuously
scaled, normally distributed raw scores, normalizing transformations create a
continuously scaled, normal distribution where there was none. According to
Petersen, Kolen, and Hoover (1989), there is not a good theoretical rationale for
normalizing transformations. They are undertaken for applied objectives. Linear
scaling transformations make standard scores easier to interpret, but they retain
the limitations of unmodified standard scores. They cannot be used to compare
scores from different tests, and they are statistically inappropriate for the analysis
of data from ordinally scaled instruments.
Establishing population normality is pivotal to the scoring and interpretation
of large-scale tests because it makes uniform the central tendency, variability,
symmetry, and peakedness of score distributions. Using area transformations to
rank random scores of different variables not only attempts to equate their means
and homogenize their variance, it also aims to create conformity in the third and
fourth moments, skewness and kurtosis. The following table illustrates the relative
accuracy of the Blom, Tukey, Van der Waerden, and Rankit approximations in
achieving the target moments of the unit normal distribution, with the first two
moments scaled to the T. These four transformations are performed on the same
10 scores from a smooth symmetric distribution.
Table 1

Differences among Ranking Methods in Attaining Target Moments

Entries are computed value / distance from target.

                  Mean (50)        SD (10)          Skew (0)         Kurt (3)
Blom              50.010 / 0.010   9.355 / 0.645    0.008 / 0.008    2.588 / 0.412
Tukey             50.009 / 0.009   9.211 / 0.789    0.008 / 0.008    2.559 / 0.441
Van der Waerden   50.007 / 0.007   8.266 / 1.734    0.009 / 0.009    2.384 / 0.616
Rankit            50.011 / 0.011   9.839 / 0.161    0.007 / 0.007    2.696 / 0.304
All four ranking methods appear to be extremely accurate on the mean, with
the average deviation from target only 0.009. The difference between the most and
least accurate ranking methods on the mean is 0.004. Similarly, skewness shows
only slight deviation from target and negligible variability between methods.
Considerably more variability emerges on standard deviations and kurtosis,
however. The average distance from the target standard deviation is 0.832. Van
der Waerden’s approximation returns a deviation value that is ten times greater
than Rankit’s. Even the most accurate method is still nearly two-tenths of a
standard deviation off target. Kurtosis shows a similar pattern to standard
deviations, but with less average distance from target and variability within
deviation values. Rankit again is the most accurate, with half as much distance
from target kurtosis as Van der Waerden’s approximation. The average deviation
value for all four ranking methods on kurtosis is nearly half a point, 0.443.
Taking several variables from standardized assessment scores of infant
characteristics, the following graphs represent score distributions that have been
normalized using Blom’s ranking method. In all three examples (Figures 2 – 4),
Blom’s procedure has produced highly accurate means (corresponding to the
target T score mean of 50). However, Figure 3 shows a smaller than normal
standard deviation and a negative skew, and Figure 4 shows excessive kurtosis.
Figure 2. Distribution of T scores using Blom’s approximation: Good fit on all
four moments.
Figure 3. Distribution of T scores using Blom’s approximation: Poor fit on
second and third moments.
Figure 4. Distribution of T scores using Blom’s approximation: Poor fit on
fourth moment.
Approaches to Creating Normal Scores
Van der Waerden’s approximation. Tarter (2000) described Van der
Waerden’s approximation as “a useful nonparametric inferential procedure…based
on inverse Normal scores”(p.221). Normal scores are sometimes characterized as
quantiles, or equal unit portions of the area under a normal curve corresponding
with the number of observations comprising a sample. Van der Waerden (1952,
1953a, 1953b) suggested that quantiles be computed not strictly on the basis of
ranks, but according to the rank of a given score value relative to the sample size
(Conover, 1980).
Blom’s approximation. Harter (1961) noted that “there has been an
argument of long-standing between advocates of the approximations
corresponding to α = 0 and α = 0·5, neither of which is correct”(p.154). Blom (1954,
1958) observed the values of alpha to increase as the number of observations
increases, with the lowest value being 0.330. “For a given n, α is least for i = 1,
rises quickly to a peak for a relatively small value of i, and then drops off
slowly”(Harter, 1961, p.154). This reflects a nonlinear relationship between a
score’s rank in a sample and its normal deviate. Because “Blom conjectured that α
always lies in the interval (0·33, 0·50),” explained Harter, “he suggested the use of
α = 3/8 as a compromise value” (1961, p.154). Harter found the “compromise
value” of 3/8, or 0.375, appropriate for small samples but otherwise too low.
There is evidence that Blom envisioned a specific application of his normal
scores approximation. By his own evaluation: “We find that, in the special case of
a normal distribution, the plotting rule Pi = (i – 3/8) / (n + ¼) leads to a practically unbiased estimate of σ”(Blom, 1958, p.145). Blom
understood the empirical phenomenon of a normal distribution to be uncommon,
although it is not clear how he viewed the relative benefits of this formula in other
circumstances. Blom concurred with Chernoff and Lieberman (1954) that
“the plotting rule Pi = (i – 1/2) / n leads to a biased estimate of σ”(Blom,
1958, p.145). He suggested that this rule may be more efficient for large samples,
but his own formula promises higher efficiency, along with unbiasedness, with
small samples. Brown and Hettmansperger (1996) saw Blom’s approximation as
an outgrowth of the quantile function, which “suggests Φ⁻¹(i/n) or Φ⁻¹[i/(n+1)]”(p.1669). They considered Blom’s formula to be the most accurate
approximation of the normal deviate.
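Blom’s unbiasedness claim is easy to check by simulation. The sketch below is a quick illustration, not part of the dissertation’s method: it estimates σ as the slope of an ordered normal sample regressed through the origin on Blom’s scores, and averages that estimate over many samples.

```python
import random
from statistics import NormalDist

def blom_scores(n):
    """Expected normal deviates under Blom's rule p_i = (i - 3/8)/(n + 1/4)."""
    return [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
            for i in range(1, n + 1)]

def sigma_hat(sample, z):
    """Slope of the ordered sample on the Blom scores: an estimate of sigma."""
    x = sorted(sample)
    return sum(zi * xi for zi, xi in zip(z, x)) / sum(zi * zi for zi in z)

random.seed(1)
n, reps, true_sigma = 10, 2000, 1.0
z = blom_scores(n)
est = [sigma_hat([random.gauss(0, true_sigma) for _ in range(n)], z)
       for _ in range(reps)]
print(f"mean sigma estimate over {reps} samples: {sum(est) / reps:.3f}")
```

With the true σ set to 1, the average estimate lands very close to 1, consistent with Blom’s description of the rule as “practically unbiased” for normal data.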
Rankit approximation. Bliss, Greenwood, and White (1956) credited Ipsen
and Jerne (1944) with coining the term “rankit,” but Bliss is credited with
developing the technique as it is now used. Bliss et al. refined this approximation in
their study of the effects of different insecticides and fungicides on the flavor of
apples. Its design drew on Scheffé’s advancements in paired comparison
research, which sought to account for magnitude and direction of preference, in
addition to preference itself. “The transformation of degree of preference to rankits
is a simple extension of Scheffé’s analysis in least squares”(Bliss et al., 1956,
p.399). In this way, “the proportion of choices…could be transformed to a normal
deviate…and the deviates for each sample averaged. These averages or scores
would measure the spacing on the hypothetical preference scale”(p.386).
Thus, the Rankit itself was transformed, from an array of observations that
are transformed into a single mean deviate, to the normalizing procedure that
effects this transformation. Bliss et al. found the Rankit approximation to be more
convenient and computationally efficient than the Thurstone-Mosteller, Bradley-
Terry, Kendall, and Scheffé techniques, even though “despite differences in the
underlying model and method of analysis, the treatment rankings on a preference
scale were substantially the same”(p.401). Rankit is also a plotting method for the
comparison of “ordered residuals against normal order statistics, which is used to
detect outliers and to check distributional assumptions”(Davison & Gigli, 1989,
p.211).
Tukey’s approximation. Tukey (1957) considered normalizing
transformations to be the most important type of data “re-expression”(Hoaglin,
2003, p.313). Pearson and Tukey (1965) affirmed their use for the analysis of
observed data, “graduating empirical data” and methodological investigations,
“providing possible parent distributions as foundations for the mathematical study,
analytical or empirical, of the properties of statistical procedures”(p.533). They
posited the sufficiency of approximations for these purposes, which “are unlikely to
require unusually high precision”(p.533). It seems that Tukey may have proposed
his approximation, which he characterized as “simple and surely an adequate
approximation to what is claimed to be optimum”(1962, p.22), as a refinement of
Blom’s.
CHAPTER 3
METHODOLOGY
The purpose of this study is to empirically demonstrate the comparative
accuracy of Van der Waerden’s, Blom’s, Tukey’s, and the Rankit approximations
for the purpose of normalizing standardized test scores. It will compare their
accuracy in terms of achieving the T score’s specified mean and standard
deviation and the unit normal distribution’s skewness and kurtosis among small
and large sample sizes for a variety of real, nonnormal data sets.
Procedure
A computer program will be written that computes normal scores using the
four proportion estimation formulas under investigation. These normal scores will
be computed for each successive iteration of randomly sampled raw scores drawn
from various real data sets.
The four different sets of normal scores will then be scaled to T scores. The
first four moments of the distribution will be calculated from these T scores for
each sample size in each population. Absolute values will be computed by
subtracting T score means from 50, standard deviations from 10, skewness values
from 0, and kurtosis values from 3. These absolute values will be sorted into like
bins; next, they will be ranked in order of proximity to the target mean, standard
deviation, skewness, and kurtosis.
Both the absolute values representing the T scores’ divergence from the
target values and the scores’ relative ranks in terms of accuracy on each criterion
will be reported.
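The procedure can be sketched for a single sample and a single ranking method. The scores below are invented for illustration and the sketch uses only the Rankit rule; the full study repeats these steps over many Monte Carlo iterations, sample sizes, and all four formulas.

```python
from statistics import NormalDist, mean, pstdev

def rankit_t_scores(sample):
    """Rank the sample, estimate normal deviates with the Rankit rule
    p_i = (i - 1/2)/n, and scale to T scores (the other three rules
    differ only in the plotting formula)."""
    n = len(sample)
    order = sorted(range(n), key=lambda k: sample[k])
    t = [0.0] * n
    for rank, k in enumerate(order, start=1):
        z = NormalDist().inv_cdf((rank - 0.5) / n)
        t[k] = 10 * z + 50
    return t

def moment_distances(t):
    """Absolute distance of the first four moments from the T-score
    targets (mean 50, SD 10) and unit-normal targets (skew 0, kurt 3)."""
    n, m, s = len(t), mean(t), pstdev(t)
    skew = sum(((x - m) / s) ** 3 for x in t) / n
    kurt = sum(((x - m) / s) ** 4 for x in t) / n
    return abs(m - 50), abs(s - 10), abs(skew - 0), abs(kurt - 3)

t = rankit_t_scores([4, 9, 11, 12, 17, 5, 8, 12, 20, 21])
print([round(d, 3) for d in moment_distances(t)])
```

Because the ranking formula is symmetric, the mean and skewness targets are hit essentially exactly for any sample; the interesting comparisons, as in the study, are on the standard deviation and kurtosis.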
Programming Specifications
Compaq Visual Fortran Professional Edition 6.6c will be run on a Microsoft
Windows XP platform. Fortran was chosen for its large processing capacity and
speed of execution. This is important for Monte Carlo simulations, which typically
require from thousands to millions of iterations.
Subroutine POP (Sawilowsky, Blair, & Micceri, 1990) is based on eight
distributions described by Micceri (1989). POP uses subroutines RNSET and
RNUND (IMSL, 1987). RNUND generates pseudorandom numbers from a uniform
distribution, and RNSET initializes a random seed for use in IMSL random number
generators (Visual Numerics, 1994). Subroutine RANKS (Sawilowsky, 1987) ranks
sorted data.
Sample Sizes
The simulation will be conducted on samples of size n = 5, 10, 15, 20, 25,
30, 35, 40, 45, 50, 100, 200, 500, and 1,000 selected from a theoretical normal
distribution, and from each of the eight Micceri (1989) data sets.
Number of Monte Carlo Repetitions
The goal is to compare the relative accuracy of four ranking methods. For that purpose, 10,000 iterations should suffice to resolve any ties to three decimal places.
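The simulation loop can be sketched as follows. This is a Python illustration with a placeholder population and a fixed seed standing in for the IMSL seed initialization; it is not the Micceri data or the study's Fortran code.

```python
import random
import statistics

def monte_carlo_mean_deviation(population, n, reps=10_000, target=50.0, seed=1):
    """Repeatedly draw samples of size n from the population, compute a
    statistic (here the sample mean), and return its average absolute
    deviation from the target value over all repetitions."""
    rng = random.Random(seed)  # fixed seed, analogous to initializing the generator
    devs = []
    for _ in range(reps):
        sample = [rng.choice(population) for _ in range(n)]
        devs.append(abs(statistics.mean(sample) - target))
    return sum(devs) / reps

# Placeholder population centered on 50 (illustrative only).
pop = [48, 49, 50, 51, 52] * 200
print(round(monte_carlo_mean_deviation(pop, n=10, reps=2000), 3))
```

In the actual study the statistic would be each moment of the T scores produced by a given ranking method, accumulated over 10,000 repetitions per sample size and population.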
Achievement and Psychometric Distributions
Micceri (1989) computed three indices of symmetry/asymmetry and two
indices of tail weight for each of the 440 large data sets he examined (for 70% of
which, n ≥ 1,000), grouped by data type: achievement/ability (accounting for 231 of
the measures), psychometric (125), criterion/mastery (35), and gain scores (49).
Eight distributions were identified based on specified levels of symmetry and tail
weight contamination. Sawilowsky, Blair, and Micceri (1990) translated these
results into a Fortran subroutine using achievement and psychometric measures
that best represented the distributional characteristics described by Micceri
(1989).
Achievement distributions. The following five distributions were drawn from achievement measures: Smooth Symmetric, Discrete Mass at Zero, Extreme Asymmetric – Growth, Digit Preference, and Multimodal Lumpy. These distributions are illustrated in Figures 5 through 9.
Psychometric distributions. Mass at Zero with Gap, Extreme Asymmetric – Decay, and Extreme Bimodal were drawn from psychometric measures. These distributions are illustrated in Figures 10 through 12.
All eight achievement and psychometric distributions are nonnormal.
Presentation of Results
Tables will document each ranking method’s performance in terms of
attaining the T score’s specified mean (50) and standard deviation (10), and the
skewness (0) and kurtosis (3) of the unit normal distribution.
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 5. Achievement: Smooth Symmetric. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (0 ≤ x ≤ 27)
Mean: 13.19
Median: 13.00
Variance: 24.11
Skewness: 0.01
Kurtosis: 2.66
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 6. Achievement: Discrete Mass at Zero. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (0 ≤ x ≤ 27)
Mean: 12.92
Median: 13.00
Variance: 19.54
Skewness: -0.03
Kurtosis: 3.31
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 7. Achievement: Extreme Asymmetric – Growth. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (4 ≤ x ≤ 30)
Mean: 24.50
Median: 27.00
Variance: 33.52
Skewness: -1.33
Kurtosis: 4.11
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 8. Achievement: Digit Preference. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (420 ≤ x ≤ 635)
Mean: 536.95
Median: 535.00
Variance: 1416.77
Skewness: -0.07
Kurtosis: 2.76
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 9. Achievement: Multimodal Lumpy. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (0 ≤ x ≤ 43)
Mean: 21.15
Median: 18.00
Variance: 141.61
Skewness: 0.19
Kurtosis: 1.80
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 10. Psychometric: Mass at Zero with Gap. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (0 ≤ x ≤ 16)
Mean: 1.85
Median: 0
Variance: 14.44
Skewness: 1.65
Kurtosis: 3.98
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 11. Psychometric: Extreme Asymmetric – Decay. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (10 ≤ x ≤ 30)
Mean: 13.67
Median: 11.00
Variance: 33.06
Skewness: 1.64
Kurtosis: 4.52
[Histogram: Score (x-axis) by Frequency (y-axis).]
Figure 12. Psychometric: Extreme Bimodal. (Sawilowsky & Fahoome, 2003)
Basic characteristics of this distribution:
Range: (0 ≤ x ≤ 5)
Mean: 2.97
Median: 4.00
Variance: 2.86
Skewness: -0.80
Kurtosis: 1.30
CHAPTER 4
RESULTS
The purpose of this study was to compare the accuracy of the Blom, Tukey,
Van der Waerden, and Rankit approximations. The following 32 tables present the
results. They show the absolute and relative accuracy of the four approximations in
attaining the target moments of the normal distribution at the values established by
the T score scale. The tables are organized sequentially according to distribution
and moment. Study results for the mean, the standard deviation, skewness, and
kurtosis appear in the same order for each of the eight distributions described in
Chapter 3. All numbers are rounded to the third decimal place.
The accuracy of the four ranking methods on the T score is given in two
forms. The first, which appears to the left of the backslash ( \ ), represents the
statistic’s rank relative to the other three approximations. The number to the right
of the backslash represents an actual value, not a rank. The top half of each table
displays the relative ranks and absolute values of approximated scores’ deviation
from the target value of the given moment. For example, the T score’s target
standard deviation is 10. Therefore, the deviation value represents the absolute
value of the distance of each approximation from 10. Two ranking methods that
produce a standard deviation of 9.8 or 10.2 would have the same deviation
value: 0.2. The bottom half of the tables displays the ranks and values of the root
mean square (RMS). RMS values, which represent the magnitude of difference
between scores, are derived by taking the standard deviations of each set of
mean, standard deviation, skewness, and kurtosis values. Both deviation from
target (the top half of the tables) and RMS (the bottom half) compare the four
approximations’ variability. Whereas deviation from target computes each ranking
method’s hit rate, or how frequently it is accurate, RMS evaluates the degree of
difference between the methods’ performance. It is possible for an approximation
to have different ranks in terms of deviation from target and magnitude of
deviation.
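The two accuracy summaries, and the range reported alongside them, can be sketched as follows. This is an illustrative Python rendering, assuming the per-iteration statistics have already been collected; the function names are not from the dissertation's program.

```python
import statistics

def deviation_from_target(stat_values, target):
    """Average absolute distance of a statistic from its target value
    across iterations (the top half of each table)."""
    return sum(abs(v - target) for v in stat_values) / len(stat_values)

def rms(stat_values):
    """Magnitude of difference, taken here as the standard deviation of
    the collected statistic values (the bottom half of each table)."""
    return statistics.pstdev(stat_values)

def row_range(values):
    """Difference between the highest and lowest values in a table row."""
    return max(values) - min(values)

# Example: simulated T score means over a few iterations (target 50).
means = [49.8, 50.1, 50.0, 49.9, 50.2]
print(round(deviation_from_target(means, 50), 3))  # -> 0.12
print(round(rms(means), 3))                        # -> 0.141
print(round(row_range(means), 3))                  # -> 0.4
```

A method can thus rank well on deviation from target while ranking differently on RMS, since the first measures closeness to the target and the second measures spread.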
The rank, which is the first number in each column, is a whole number when
the approximation method achieves the same rank over 10,000 Monte Carlo runs.
It is a decimal when this is not the case. However, unlike deviation ranks, RMS
ranks correspond to a single statistic: the standard deviation of the respective
statistic’s average performance across 10,000 random draws. Therefore, ties are
possible between RMS ranks. There are 18 instances of tied RMS ranks
distributed among nine tables. Ties are broken by assigning to each tied rank the
average value of the tied ranks and the missing rank. For example, the two-way tie
(1, 1, 3, 4) is missing the rank of (2). The first two ranks are reassigned as the
mean of (1) and (2): (1.5, 1.5, 3, 4). Three-way ties, which are rare, are broken in
the same way: (1, 1, 1, 4) becomes (2, 2, 2, 4), representing the midpoint of (1)
and the missing ranks of (2) and (3).
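The tie-breaking rule described above is the familiar midrank assignment; it can be expressed compactly as follows (an illustrative helper, not code from the source program):

```python
def break_ties(ranks):
    """Midrank tie-breaking: each tied group receives the mean of the
    rank positions it occupies, e.g. (1, 1, 3, 4) -> (1.5, 1.5, 3, 4)
    and (1, 1, 1, 4) -> (2, 2, 2, 4)."""
    out = []
    for r in ranks:
        smaller = sum(1 for x in ranks if x < r)  # positions below the tied group
        count = sum(1 for x in ranks if x == r)   # size of the tied group
        out.append(smaller + (count + 1) / 2)     # mean of the occupied positions
    return out

print(break_ties([1, 1, 3, 4]))  # -> [1.5, 1.5, 3.0, 4.0]
print(break_ties([1, 1, 1, 4]))  # -> [2.0, 2.0, 2.0, 4.0]
```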
The final statistic that is provided in the tables is the range for deviation from
target and RMS. In both cases, the range represents the difference between the
highest and the lowest values (not the ranks) in each row. The larger the range,
the more the deviation and RMS ranks are likely to matter. Following the 32 tables
documenting accuracy, a series of figures explores the deviation range.
Table 2
Smooth Symmetric—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.686 \ 0.000 1.360 \ 0.000 1.633 \ 0.000 1.358 \ 0.000
10 1.720 \ 0.000 1.619 \ 0.000 1.680 \ 0.000 1.747 \ 0.000
15 1.794 \ 0.000 1.805 \ 0.000 1.776 \ 0.000 1.836 \ 0.000
20 1.892 \ 0.000 1.835 \ 0.000 1.914 \ 0.000 1.963 \ 0.000
25 1.801 \ 0.000 1.845 \ 0.000 1.819 \ 0.000 1.814 \ 0.000
30 1.913 \ 0.000 2.067 \ 0.000 1.928 \ 0.000 1.828 \ 0.000
35 2.006 \ 0.000 2.079 \ 0.000 1.945 \ 0.000 2.115 \ 0.000
40 1.981 \ 0.000 2.074 \ 0.000 2.017 \ 0.000 2.037 \ 0.000
45 1.906 \ 0.000 1.923 \ 0.000 1.903 \ 0.000 1.923 \ 0.000
50 2.043 \ 0.000 2.047 \ 0.000 1.944 \ 0.000 1.955 \ 0.000
100 2.136 \ 0.000 2.157 \ 0.000 2.153 \ 0.000 2.161 \ 0.000
200 2.244 \ 0.000 2.284 \ 0.000 2.310 \ 0.000 2.317 \ 0.000
500 2.429 \ 0.000 2.445 \ 0.000 2.433 \ 0.000 2.425 \ 0.000
1000 2.466 \ 0.000 2.457 \ 0.000 2.465 \ 0.000 2.471 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 1.500 \ 0.000 1.500 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
10 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000 1.500 \ 0.000
15 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
20 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
40 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 3
Smooth Symmetric—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 1.987 3.000 \ 2.177 4.000 \ 3.372 2.000 \ 2.089 1.385
10 1.000 \ 1.161 3.000 \ 1.296 4.000 \ 2.185 2.000 \ 1.185 1.024
15 1.998 \ 0.844 3.000 \ 0.951 4.000 \ 1.667 1.002 \ 0.842 0.825
20 2.000 \ 0.671 3.000 \ 0.760 4.000 \ 1.367 1.000 \ 0.659 0.708
25 2.000 \ 0.561 3.000 \ 0.638 4.000 \ 1.168 1.000 \ 0.544 0.624
30 2.000 \ 0.485 3.000 \ 0.554 4.000 \ 1.026 1.000 \ 0.465 0.561
35 2.000 \ 0.429 3.000 \ 0.491 4.000 \ 0.918 1.000 \ 0.408 0.510
40 2.000 \ 0.386 3.000 \ 0.442 4.000 \ 0.833 1.000 \ 0.364 0.469
45 2.000 \ 0.351 3.000 \ 0.403 4.000 \ 0.764 1.000 \ 0.329 0.435
50 2.000 \ 0.323 3.000 \ 0.371 4.000 \ 0.707 1.000 \ 0.300 0.407
100 2.000 \ 0.186 3.000 \ 0.215 4.000 \ 0.421 1.000 \ 0.167 0.254
200 2.000 \ 0.111 3.000 \ 0.128 4.000 \ 0.250 1.000 \ 0.010 0.240
500 2.000 \ 0.006 3.000 \ 0.007 4.000 \ 0.128 1.000 \ 0.005 0.123
1000 2.000 \ 0.004 3.000 \ 0.005 4.000 \ 0.008 1.000 \ 0.004 0.004
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 3.000 \ 0.008 4.000 \ 0.008 2.000 \ 0.007 1.000 \ 0.007 0.001
10 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001
15 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001
20 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001
25 3.000 \ 0.003 4.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001
30 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001
35 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
40 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
45 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
50 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.000 0.001
200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
Table 4
Smooth Symmetric—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 3.717 \ 0.000 2.895 \ 0.000 1.295 \ 0.000 2.093 \ 0.000 0.000
10 3.936 \ 0.001 1.914 \ 0.000 1.232 \ 0.001 2.919 \ 0.001 0.001
15 1.013 \ 0.001 2.994 \ 0.001 3.989 \ 0.105 2.004 \ 0.001 0.104
20 2.006 \ 0.140 2.997 \ 0.140 3.987 \ 0.146 1.010 \ 0.140 0.006
25 1.995 \ 0.122 3.000 \ 0.122 4.000 \ 0.127 1.007 \ 0.122 0.005
30 2.000 \ 0.007 3.000 \ 0.007 4.000 \ 0.007 1.000 \ 0.007 0.000
35 1.993 \ 0.001 2.997 \ 0.001 3.994 \ 0.001 1.016 \ 0.001 0.000
40 2.116 \ 0.000 2.908 \ 0.000 3.732 \ 0.000 1.244 \ 0.000 0.000
45 2.007 \ 0.001 3.000 \ 0.001 3.989 \ 0.001 1.008 \ 0.001 0.000
50 2.020 \ 0.141 2.989 \ 0.141 3.965 \ 0.145 1.027 \ 0.141 0.004
100 2.937 \ 0.002 2.055 \ 0.002 1.170 \ 0.002 3.838 \ 0.002 0.000
200 2.930 \ 0.003 2.063 \ 0.003 1.190 \ 0.003 3.817 \ 0.003 0.000
500 2.897 \ 0.003 2.082 \ 0.003 1.233 \ 0.003 3.788 \ 0.003 0.000
1000 2.875 \ 0.003 2.094 \ 0.003 1.288 \ 0.003 3.743 \ 0.003 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 1.000 \ 0.002 2.000 \ 0.002 4.000 \ 0.002 3.000 \ 0.002 0.000
10 1.000 \ 0.256 3.000 \ 0.259 4.000 \ 0.279 2.000 \ 0.258 0.023
15 4.000 \ 0.521 2.000 \ 0.520 1.000 \ 0.515 3.000 \ 0.520 0.006
20 1.000 \ 0.446 3.000 \ 0.447 4.000 \ 0.456 2.000 \ 0.446 0.010
25 4.000 \ 0.570 2.000 \ 0.567 1.000 \ 0.551 3.000 \ 0.570 0.019
30 4.000 \ 0.453 2.000 \ 0.450 1.000 \ 0.436 3.000 \ 0.452 0.017
35 3.000 \ 0.479 2.000 \ 0.477 1.000 \ 0.461 4.000 \ 0.479 0.018
40 3.000 \ 0.612 2.000 \ 0.611 1.000 \ 0.560 4.000 \ 0.613 0.053
45 3.000 \ 0.587 2.000 \ 0.586 1.000 \ 0.578 4.000 \ 0.587 0.009
50 3.000 \ 0.607 2.000 \ 0.605 1.000 \ 0.593 4.000 \ 0.608 0.015
100 3.000 \ 0.565 2.000 \ 0.565 1.000 \ 0.564 4.000 \ 0.565 0.001
200 3.000 \ 0.555 2.000 \ 0.555 4.000 \ 0.555 1.000 \ 0.555 0.000
500 3.000 \ 0.549 2.000 \ 0.549 4.000 \ 0.549 1.000 \ 0.549 0.000
1000 3.000 \ 0.555 2.000 \ 0.555 1.000 \ 0.555 4.000 \ 0.555 0.000
Table 5
Smooth Symmetric—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.148 2.000 \ 1.155 4.000 \ 1.195 3.000 \ 1.156 0.047
10 1.000 \ 1.106 3.000 \ 1.111 4.000 \ 1.138 2.000 \ 1.110 0.032
15 1.000 \ 1.092 3.000 \ 1.095 4.000 \ 1.115 2.000 \ 1.093 0.023
20 1.001 \ 1.058 3.000 \ 1.061 4.000 \ 1.079 2.002 \ 1.058 0.021
25 1.922 \ 1.019 3.000 \ 1.022 4.000 \ 1.040 1.078 \ 1.019 0.021
30 2.000 \ 0.981 3.000 \ 0.983 4.000 \ 1.000 1.000 \ 0.980 0.020
35 2.000 \ 0.957 3.000 \ 0.959 4.000 \ 0.975 1.000 \ 0.956 0.019
40 2.000 \ 0.953 3.000 \ 0.956 4.000 \ 0.970 1.000 \ 0.953 0.017
45 2.000 \ 0.979 3.000 \ 0.980 4.000 \ 0.993 1.000 \ 0.978 0.015
50 2.000 \ 1.014 3.000 \ 1.016 4.000 \ 1.028 1.000 \ 1.013 0.015
100 2.000 \ 0.957 3.000 \ 0.960 4.000 \ 0.976 1.000 \ 0.956 0.020
200 2.000 \ 0.948 3.000 \ 0.950 4.000 \ 0.961 1.000 \ 0.947 0.014
500 2.000 \ 0.942 3.000 \ 0.943 4.000 \ 0.949 1.000 \ 0.941 0.008
1000 2.000 \ 0.940 3.000 \ 0.940 4.000 \ 0.944 1.000 \ 0.939 0.005
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 1.000 \ 0.006 2.000 \ 0.006 4.000 \ 0.006 3.000 \ 0.006 0.000
10 2.000 \ 0.310 3.000 \ 0.311 4.000 \ 0.313 1.000 \ 0.310 0.003
15 1.000 \ 0.434 3.000 \ 0.435 4.000 \ 0.438 2.000 \ 0.435 0.004
20 1.000 \ 0.402 3.000 \ 0.403 4.000 \ 0.411 2.000 \ 0.402 0.009
25 4.000 \ 0.470 2.000 \ 0.469 1.000 \ 0.462 3.000 \ 0.470 0.008
30 4.000 \ 0.456 2.000 \ 0.456 1.000 \ 0.452 3.000 \ 0.456 0.004
35 3.000 \ 0.444 2.000 \ 0.444 1.000 \ 0.443 4.000 \ 0.444 0.001
40 3.000 \ 0.462 2.000 \ 0.461 1.000 \ 0.457 4.000 \ 0.462 0.005
45 3.000 \ 0.500 2.000 \ 0.500 1.000 \ 0.498 4.000 \ 0.500 0.002
50 3.000 \ 0.495 2.000 \ 0.495 1.000 \ 0.494 4.000 \ 0.495 0.001
100 2.000 \ 0.476 3.000 \ 0.476 4.000 \ 0.476 1.000 \ 0.476 0.000
200 2.000 \ 0.477 3.000 \ 0.477 4.000 \ 0.478 1.000 \ 0.477 0.001
500 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 1.000 \ 0.472 0.000
1000 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.473 1.000 \ 0.472 0.001
Table 6
Discrete Mass at Zero—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.811 \ 0.000 1.403 \ 0.000 1.594 \ 0.000 1.295 \ 0.000
10 1.761 \ 0.000 1.640 \ 0.000 1.711 \ 0.000 1.700 \ 0.000
15 1.774 \ 0.000 1.827 \ 0.000 1.796 \ 0.000 1.866 \ 0.000
20 1.902 \ 0.000 1.845 \ 0.000 1.934 \ 0.000 1.970 \ 0.000
25 1.796 \ 0.000 1.857 \ 0.000 1.840 \ 0.000 1.791 \ 0.000
30 1.937 \ 0.000 2.066 \ 0.000 1.947 \ 0.000 1.853 \ 0.000
35 1.982 \ 0.000 2.078 \ 0.000 1.957 \ 0.000 2.158 \ 0.000
40 1.987 \ 0.000 2.103 \ 0.000 2.007 \ 0.000 2.021 \ 0.000
45 1.924 \ 0.000 1.932 \ 0.000 1.913 \ 0.000 1.908 \ 0.000
50 2.072 \ 0.000 2.008 \ 0.000 1.975 \ 0.000 1.971 \ 0.000
100 2.127 \ 0.000 2.202 \ 0.000 2.136 \ 0.000 2.187 \ 0.000
200 2.266 \ 0.000 2.292 \ 0.000 2.303 \ 0.000 2.330 \ 0.000
500 2.441 \ 0.000 2.435 \ 0.000 2.415 \ 0.000 2.439 \ 0.000
1000 2.456 \ 0.000 2.458 \ 0.000 2.492 \ 0.000 2.463 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
15 1.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000
20 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000
25 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
40 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 7
Discrete Mass at Zero—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 2.049 3.000 \ 2.237 4.000 \ 3.421 2.000 \ 2.149 1.372
10 1.000 \ 1.182 3.000 \ 1.316 4.000 \ 2.202 2.000 \ 1.205 1.020
15 1.997 \ 0.853 3.000 \ 0.959 4.000 \ 1.675 1.003 \ 0.851 0.824
20 2.000 \ 0.675 3.000 \ 0.764 4.000 \ 1.370 1.000 \ 0.663 0.707
25 2.000 \ 0.571 3.000 \ 0.648 4.000 \ 1.176 1.000 \ 0.553 0.623
30 2.000 \ 0.496 3.000 \ 0.564 4.000 \ 1.035 1.000 \ 0.476 0.559
35 2.000 \ 0.440 3.000 \ 0.501 4.000 \ 0.927 1.000 \ 0.418 0.509
40 2.000 \ 0.396 3.000 \ 0.452 4.000 \ 0.842 1.000 \ 0.374 0.468
45 3.000 \ 0.368 3.000 \ 0.412 4.000 \ 0.773 1.000 \ 0.338 0.435
50 2.000 \ 0.333 3.000 \ 0.381 4.000 \ 0.716 1.000 \ 0.310 0.406
100 2.000 \ 0.195 3.000 \ 0.224 4.000 \ 0.429 1.000 \ 0.176 0.253
200 2.000 \ 0.120 3.000 \ 0.137 4.000 \ 0.258 1.000 \ 0.106 0.152
500 2.000 \ 0.007 3.000 \ 0.008 4.000 \ 0.137 1.000 \ 0.006 0.131
1000 2.000 \ 0.005 3.000 \ 0.006 4.000 \ 0.009 1.000 \ 0.005 0.004
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.117 3.000 \ 0.114 2.000 \ 0.113 1.000 \ 0.009 0.108
10 4.000 \ 0.008 3.000 \ 0.008 2.000 \ 0.008 1.000 \ 0.006 0.002
15 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001
20 3.000 \ 0.004 4.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001
25 3.000 \ 0.003 4.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001
30 4.000 \ 0.004 3.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.003 0.001
35 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000
40 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
45 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
Table 8
Discrete Mass at Zero—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 3.740 \ 0.001 2.914 \ 0.001 1.264 \ 0.001 2.083 \ 0.001 0.000
10 1.005 \ 0.004 2.999 \ 0.005 3.996 \ 0.006 2.001 \ 0.005 0.002
15 2.880 \ 0.006 2.302 \ 0.006 2.238 \ 0.004 2.579 \ 0.006 0.002
20 3.983 \ 0.007 2.007 \ 0.006 1.023 \ 0.005 2.987 \ 0.007 0.002
25 2.274 \ 0.003 2.382 \ 0.004 3.197 \ 0.005 2.147 \ 0.003 0.002
30 2.005 \ 0.127 2.994 \ 0.128 3.985 \ 0.134 1.015 \ 0.127 0.007
35 2.017 \ 0.139 2.989 \ 0.140 3.968 \ 0.145 1.026 \ 0.139 0.006
40 2.000 \ 0.119 3.000 \ 0.120 3.999 \ 0.123 1.001 \ 0.119 0.004
45 2.003 \ 0.007 2.997 \ 0.007 3.992 \ 0.007 1.008 \ 0.007 0.000
50 2.007 \ 0.002 2.999 \ 0.002 3.972 \ 0.002 1.023 \ 0.002 0.000
100 2.881 \ 0.001 2.074 \ 0.001 1.339 \ 0.000 3.706 \ 0.001 0.001
200 3.001 \ 0.003 2.019 \ 0.003 1.069 \ 0.003 3.912 \ 0.003 0.000
500 2.965 \ 0.003 2.026 \ 0.003 1.033 \ 0.003 3.975 \ 0.003 0.000
1000 2.863 \ 0.001 2.127 \ 0.001 1.403 \ 0.001 3.607 \ 0.001 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.010 3.000 \ 0.010 1.000 \ 0.009 2.000 \ 0.010 0.001
10 4.000 \ 0.469 2.000 \ 0.467 1.000 \ 0.459 3.000 \ 0.467 0.010
15 1.000 \ 0.313 3.000 \ 0.313 4.000 \ 0.320 2.000 \ 0.313 0.007
20 1.000 \ 0.473 3.000 \ 0.473 4.000 \ 0.479 2.000 \ 0.473 0.006
25 4.000 \ 0.382 2.000 \ 0.382 1.000 \ 0.382 3.000 \ 0.382 0.000
30 4.000 \ 0.526 2.000 \ 0.525 1.000 \ 0.524 3.000 \ 0.526 0.002
35 3.000 \ 0.535 2.000 \ 0.535 1.000 \ 0.520 4.000 \ 0.535 0.015
40 3.000 \ 0.608 2.000 \ 0.607 1.000 \ 0.598 4.000 \ 0.609 0.011
45 3.000 \ 0.535 2.000 \ 0.559 1.000 \ 0.551 4.000 \ 0.560 0.025
50 3.000 \ 0.566 2.000 \ 0.565 1.000 \ 0.558 4.000 \ 0.567 0.009
100 3.000 \ 0.559 2.000 \ 0.559 1.000 \ 0.558 4.000 \ 0.558 0.001
200 3.000 \ 0.555 2.000 \ 0.555 4.000 \ 0.555 1.000 \ 0.555 0.000
500 4.000 \ 0.552 2.000 \ 0.552 1.000 \ 0.552 3.000 \ 0.552 0.000
1000 2.000 \ 0.542 1.000 \ 0.542 4.000 \ 0.542 3.000 \ 0.542 0.000
Table 9
Discrete Mass at Zero—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.164 2.000 \ 1.171 4.000 \ 1.208 3.000 \ 1.172 0.044
10 1.000 \ 1.109 2.100 \ 1.114 3.100 \ 1.139 2.000 \ 1.112 0.030
15 1.001 \ 1.070 3.000 \ 1.074 3.999 \ 1.098 2.000 \ 1.072 0.028
20 1.000 \ 1.016 3.000 \ 1.020 4.000 \ 1.047 2.000 \ 1.017 0.031
25 1.001 \ 1.078 3.000 \ 1.081 4.000 \ 1.095 1.999 \ 1.078 0.017
30 2.000 \ 1.075 3.000 \ 1.077 4.000 \ 1.091 1.000 \ 1.075 0.016
35 2.000 \ 1.044 3.000 \ 1.046 4.000 \ 1.060 1.000 \ 1.043 0.017
40 2.000 \ 0.996 3.000 \ 0.999 3.999 \ 1.012 1.000 \ 0.996 0.016
45 2.000 \ 0.953 3.000 \ 0.955 4.000 \ 0.968 1.000 \ 0.953 0.015
50 2.000 \ 0.945 3.000 \ 0.946 4.000 \ 0.959 1.000 \ 0.944 0.015
100 2.000 \ 1.081 3.000 \ 1.082 3.999 \ 1.088 1.001 \ 1.080 0.008
200 2.000 \ 0.949 3.000 \ 0.950 4.000 \ 0.961 1.000 \ 0.947 0.014
500 2.000 \ 0.942 3.000 \ 0.943 4.000 \ 0.949 1.000 \ 0.941 0.008
1000 2.000 \ 1.081 3.000 \ 1.081 3.999 \ 1.082 1.001 \ 1.081 0.001
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.010 3.000 \ 0.010 1.000 \ 0.009 2.000 \ 0.010 0.001
10 1.000 \ 0.304 3.000 \ 0.306 4.000 \ 0.320 2.000 \ 0.305 0.016
15 1.000 \ 0.339 3.000 \ 0.340 4.000 \ 0.348 2.000 \ 0.339 0.009
20 1.000 \ 0.326 3.000 \ 0.327 4.000 \ 0.332 2.000 \ 0.326 0.006
25 1.000 \ 0.397 3.000 \ 0.397 4.000 \ 0.398 2.000 \ 0.397 0.001
30 2.000 \ 0.502 3.000 \ 0.502 4.000 \ 0.505 1.000 \ 0.502 0.003
35 4.000 \ 0.354 2.000 \ 0.354 1.000 \ 0.354 3.000 \ 0.354 0.000
40 2.000 \ 0.468 3.000 \ 0.468 4.000 \ 0.469 1.000 \ 0.468 0.001
45 3.000 \ 0.503 2.000 \ 0.503 1.000 \ 0.500 4.000 \ 0.503 0.003
50 3.000 \ 0.465 2.000 \ 0.465 1.000 \ 0.464 4.000 \ 0.466 0.002
100 2.000 \ 0.494 3.000 \ 0.494 4.000 \ 0.495 1.000 \ 0.494 0.001
200 2.000 \ 0.480 3.000 \ 0.480 4.000 \ 0.480 1.000 \ 0.480 0.000
500 2.000 \ 0.477 3.000 \ 0.477 4.000 \ 0.477 1.000 \ 0.477 0.000
1000 2.000 \ 0.473 3.000 \ 0.473 4.000 \ 0.473 1.000 \ 0.473 0.000
Table 10
Extreme Asymmetric, Growth—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.738 \ 0.000 1.411 \ 0.000 1.579 \ 0.000 1.196 \ 0.000
10 1.796 \ 0.000 1.669 \ 0.000 1.837 \ 0.000 1.630 \ 0.000
15 1.827 \ 0.000 1.810 \ 0.000 1.846 \ 0.000 1.778 \ 0.000
20 2.039 \ 0.000 1.878 \ 0.000 1.936 \ 0.000 1.923 \ 0.000
25 1.822 \ 0.000 1.853 \ 0.000 1.825 \ 0.000 1.880 \ 0.000
30 1.977 \ 0.000 2.051 \ 0.000 2.004 \ 0.000 1.882 \ 0.000
35 2.076 \ 0.000 2.076 \ 0.000 2.049 \ 0.000 2.028 \ 0.000
40 2.091 \ 0.000 2.033 \ 0.000 2.041 \ 0.000 1.988 \ 0.000
45 1.943 \ 0.000 1.958 \ 0.000 1.881 \ 0.000 1.926 \ 0.000
50 2.043 \ 0.000 2.025 \ 0.000 1.967 \ 0.000 1.988 \ 0.000
100 2.143 \ 0.000 2.235 \ 0.000 2.192 \ 0.000 2.141 \ 0.000
200 2.276 \ 0.000 2.310 \ 0.000 2.314 \ 0.000 2.390 \ 0.000
500 2.411 \ 0.000 2.449 \ 0.000 2.474 \ 0.000 2.437 \ 0.000
1000 2.477 \ 0.000 2.474 \ 0.000 2.457 \ 0.000 2.477 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 1.500 \ 0.000 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000
10 1.000 \ 0.000 2.000 \ 0.000 3.500 \ 0.000 3.500 \ 0.000
15 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000
20 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
40 4.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 11
Extreme Asymmetric, Growth—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 2.075 3.000 \ 2.263 4.000 \ 3.442 2.000 \ 2.176 1.367
10 1.000 \ 1.243 3.000 \ 1.375 4.000 \ 2.250 2.000 \ 1.265 1.007
15 1.984 \ 0.934 3.000 \ 1.038 4.000 \ 1.739 1.016 \ 0.932 0.807
20 2.000 \ 0.769 3.000 \ 0.855 4.000 \ 1.446 1.000 \ 0.756 0.690
25 2.000 \ 0.666 3.000 \ 0.740 4.000 \ 1.253 1.000 \ 0.649 0.604
30 2.000 \ 0.601 3.000 \ 0.666 4.000 \ 1.120 1.000 \ 0.581 0.539
35 2.000 \ 0.551 3.000 \ 0.609 4.000 \ 1.018 1.000 \ 0.530 0.488
40 2.000 \ 0.524 3.000 \ 0.577 4.000 \ 0.947 1.000 \ 0.502 0.445
45 2.000 \ 0.484 3.000 \ 0.532 4.000 \ 0.874 1.000 \ 0.462 0.412
50 2.000 \ 0.440 3.000 \ 0.485 4.000 \ 0.804 1.000 \ 0.418 0.386
100 2.000 \ 0.320 3.000 \ 0.347 4.000 \ 0.538 1.000 \ 0.302 0.236
200 2.000 \ 0.258 3.000 \ 0.273 4.000 \ 0.384 1.000 \ 0.245 0.139
500 2.000 \ 0.213 3.000 \ 0.220 4.000 \ 0.273 1.000 \ 0.205 0.068
1000 2.000 \ 0.197 3.000 \ 0.201 4.000 \ 0.230 1.000 \ 0.194 0.036
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.204 3.000 \ 0.200 2.000 \ 0.198 1.000 \ 0.163 0.041
10 4.000 \ 0.169 3.000 \ 0.168 2.000 \ 0.165 1.000 \ 0.144 0.025
15 3.000 \ 0.007 4.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.001
20 3.000 \ 0.005 4.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.004 0.001
25 4.000 \ 0.010 3.000 \ 0.010 2.000 \ 0.010 1.000 \ 0.008 0.002
30 4.000 \ 0.009 3.000 \ 0.009 2.000 \ 0.009 1.000 \ 0.008 0.001
35 4.000 \ 0.006 3.000 \ 0.006 2.000 \ 0.006 1.000 \ 0.005 0.001
40 4.000 \ 0.005 3.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.005 0.000
45 4.000 \ 0.007 3.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.001
50 4.000 \ 0.005 3.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.005 0.000
100 4.000 \ 0.004 3.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.004 0.000
200 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.003 0.000
500 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000
1000 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
Table 12
Extreme Asymmetric, Growth—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 4.000 \ 0.005 3.000 \ 0.005 1.000 \ 0.005 2.000 \ 0.005 0.000
10 1.076 \ 0.001 2.973 \ 0.001 3.928 \ 0.001 2.024 \ 0.001 0.000
15 3.424 \ 0.004 2.163 \ 0.004 1.624 \ 0.004 2.790 \ 0.004 0.000
20 1.582 \ 0.107 2.997 \ 0.107 3.988 \ 0.109 1.433 \ 0.107 0.002
25 1.998 \ 0.139 2.998 \ 0.139 3.994 \ 0.143 1.010 \ 0.139 0.004
30 2.000 \ 0.176 2.998 \ 0.177 3.994 \ 0.182 1.008 \ 0.176 0.006
35 2.004 \ 0.142 3.000 \ 0.142 3.992 \ 0.148 1.007 \ 0.141 0.007
40 2.000 \ 0.002 2.999 \ 0.002 3.998 \ 0.003 1.003 \ 0.002 0.001
45 2.149 \ 0.009 2.968 \ 0.009 3.798 \ 0.009 1.085 \ 0.009 0.000
50 3.000 \ 0.010 2.000 \ 0.010 1.000 \ 0.008 4.000 \ 0.010 0.002
100 2.006 \ 0.166 2.996 \ 0.167 3.990 \ 0.168 1.008 \ 0.166 0.002
200 2.591 \ 0.170 2.420 \ 0.169 2.316 \ 0.164 2.677 \ 0.171 0.007
500 2.615 \ 0.174 2.388 \ 0.174 2.193 \ 0.171 2.804 \ 0.175 0.004
1000 2.620 \ 0.176 2.379 \ 0.175 2.160 \ 0.174 2.841 \ 0.176 0.002
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.113 3.000 \ 0.111 1.000 \ 0.010 2.000 \ 0.110 0.103
10 3.000 \ 0.413 2.000 \ 0.412 1.000 \ 0.410 4.000 \ 0.413 0.003
15 4.000 \ 0.481 2.000 \ 0.478 1.000 \ 0.457 3.000 \ 0.480 0.024
20 4.000 \ 0.654 2.000 \ 0.652 1.000 \ 0.640 3.000 \ 0.654 0.014
25 3.000 \ 0.600 2.000 \ 0.597 1.000 \ 0.580 4.000 \ 0.600 0.020
30 4.000 \ 0.504 2.000 \ 0.503 1.000 \ 0.498 3.000 \ 0.504 0.006
35 3.000 \ 0.668 2.000 \ 0.668 1.000 \ 0.665 4.000 \ 0.669 0.004
40 3.000 \ 0.649 2.000 \ 0.648 1.000 \ 0.644 4.000 \ 0.649 0.005
45 3.000 \ 0.666 2.000 \ 0.665 1.000 \ 0.663 4.000 \ 0.666 0.003
50 2.000 \ 0.500 3.000 \ 0.500 4.000 \ 0.505 1.000 \ 0.500 0.005
100 3.000 \ 0.541 2.000 \ 0.540 1.000 \ 0.538 4.000 \ 0.541 0.003
200 3.000 \ 0.596 2.000 \ 0.596 1.000 \ 0.595 4.000 \ 0.596 0.001
500 3.000 \ 0.576 2.000 \ 0.576 1.000 \ 0.576 4.000 \ 0.576 0.000
1000 3.000 \ 0.590 2.000 \ 0.590 1.000 \ 0.590 4.000 \ 0.590 0.000
Table 13
Extreme Asymmetric, Growth—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.176 2.000 \ 1.182 4.000 \ 1.219 3.000 \ 1.183 0.043
10 1.000 \ 1.063 3.000 \ 1.064 4.000 \ 1.093 2.000 \ 1.066 0.030
15 1.000 \ 1.018 3.000 \ 1.022 4.000 \ 1.042 2.000 \ 1.019 0.024
20 1.000 \ 1.008 3.000 \ 1.010 4.000 \ 1.028 2.000 \ 1.008 0.020
25 2.000 \ 1.025 3.000 \ 1.027 4.000 \ 1.041 1.000 \ 1.025 0.016
30 2.000 \ 1.078 3.000 \ 1.079 4.000 \ 1.091 1.000 \ 1.077 0.014
35 2.000 \ 1.085 3.000 \ 1.087 4.000 \ 1.098 1.000 \ 1.085 0.013
40 2.000 \ 1.116 3.000 \ 1.117 4.000 \ 1.124 1.000 \ 1.115 0.009
45 2.000 \ 1.066 3.000 \ 1.067 4.000 \ 1.079 1.000 \ 1.065 0.014
50 2.000 \ 1.082 3.000 \ 1.083 4.000 \ 1.094 1.000 \ 1.081 0.013
100 2.001 \ 1.044 2.999 \ 1.045 3.998 \ 1.051 1.001 \ 1.043 0.008
200 2.001 \ 1.023 3.000 \ 1.024 3.999 \ 1.030 1.002 \ 1.022 0.008
500 2.001 \ 1.019 3.000 \ 1.020 3.999 \ 1.023 1.002 \ 1.019 0.004
1000 2.001 \ 1.018 3.000 \ 1.018 3.999 \ 1.020 1.002 \ 1.018 0.002
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.010 3.000 \ 0.009 1.000 \ 0.008 2.000 \ 0.009 0.002
10 1.000 \ 0.254 3.000 \ 0.256 4.000 \ 0.263 2.000 \ 0.255 0.009
15 4.000 \ 0.371 2.000 \ 0.369 1.000 \ 0.355 3.000 \ 0.371 0.016
20 2.000 \ 0.456 3.000 \ 0.456 4.000 \ 0.457 1.000 \ 0.456 0.001
25 3.000 \ 0.423 2.000 \ 0.423 1.000 \ 0.421 4.000 \ 0.423 0.002
30 4.000 \ 0.556 2.000 \ 0.556 1.000 \ 0.554 3.000 \ 0.556 0.002
35 2.000 \ 0.544 3.000 \ 0.545 4.000 \ 0.548 1.000 \ 0.544 0.004
40 4.000 \ 0.552 2.000 \ 0.552 1.000 \ 0.551 3.000 \ 0.552 0.001
45 1.000 \ 0.569 3.000 \ 0.569 4.000 \ 0.570 2.000 \ 0.569 0.001
50 1.000 \ 0.433 3.000 \ 0.433 4.000 \ 0.435 2.000 \ 0.433 0.002
100 3.000 \ 0.474 2.000 \ 0.474 4.000 \ 0.475 1.000 \ 0.474 0.001
200 3.000 \ 0.516 2.000 \ 0.516 1.000 \ 0.516 4.000 \ 0.516 0.000
500 3.000 \ 0.503 2.000 \ 0.503 1.000 \ 0.503 4.000 \ 0.503 0.000
1000 3.000 \ 0.511 2.000 \ 0.511 1.000 \ 0.511 1.000 \ 0.511 0.000
Table 14
Digit Preference—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.570 \ 0.000 1.371 \ 0.000 1.690 \ 0.000 1.184 \ 0.000
10 1.678 \ 0.000 1.644 \ 0.000 1.648 \ 0.000 1.771 \ 0.000
15 1.815 \ 0.000 1.816 \ 0.000 1.725 \ 0.000 1.838 \ 0.000
20 1.883 \ 0.000 1.825 \ 0.000 1.877 \ 0.000 1.986 \ 0.000
25 1.797 \ 0.000 1.870 \ 0.000 1.794 \ 0.000 1.799 \ 0.000
30 1.893 \ 0.000 2.072 \ 0.000 1.957 \ 0.000 1.775 \ 0.000
35 1.997 \ 0.000 2.048 \ 0.000 1.965 \ 0.000 2.075 \ 0.000
40 1.966 \ 0.000 2.048 \ 0.000 2.041 \ 0.000 2.025 \ 0.000
45 1.861 \ 0.000 1.954 \ 0.000 1.863 \ 0.000 1.922 \ 0.000
50 2.049 \ 0.000 2.026 \ 0.000 1.918 \ 0.000 1.922 \ 0.000
100 2.119 \ 0.000 2.185 \ 0.000 2.129 \ 0.000 2.094 \ 0.000
200 2.277 \ 0.000 2.272 \ 0.000 2.254 \ 0.000 2.292 \ 0.000
500 2.427 \ 0.000 2.418 \ 0.000 2.420 \ 0.000 2.429 \ 0.000
1000 2.459 \ 0.000 2.453 \ 0.000 2.456 \ 0.000 2.471 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000
10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
15 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000
20 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000
40 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 15
Digit Preference—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 1.975 3.000 \ 2.166 4.000 \ 3.361 2.000 \ 2.077 1.386
10 1.000 \ 1.130 3.000 \ 1.265 4.000 \ 2.159 2.000 \ 1.153 1.029
15 2.000 \ 0.819 3.000 \ 0.926 4.000 \ 1.645 1.000 \ 0.817 0.828
20 2.000 \ 0.652 3.000 \ 0.742 4.000 \ 1.350 1.000 \ 0.640 0.710
25 2.000 \ 0.543 3.000 \ 0.620 4.000 \ 1.152 1.000 \ 0.526 0.626
30 2.000 \ 0.468 3.000 \ 0.537 4.000 \ 1.010 1.000 \ 0.448 0.562
35 2.000 \ 0.413 3.000 \ 0.474 4.000 \ 0.903 1.000 \ 0.391 0.512
40 2.000 \ 0.372 3.000 \ 0.428 4.000 \ 0.820 1.000 \ 0.349 0.471
45 2.000 \ 0.336 3.000 \ 0.388 4.000 \ 0.750 1.000 \ 0.314 0.436
50 2.000 \ 0.309 3.000 \ 0.357 4.000 \ 0.695 1.000 \ 0.287 0.408
100 2.000 \ 0.176 3.000 \ 0.205 4.000 \ 0.411 1.000 \ 0.156 0.255
200 2.000 \ 0.102 3.000 \ 0.119 4.000 \ 0.241 1.000 \ 0.009 0.232
500 2.000 \ 0.005 3.000 \ 0.006 4.000 \ 0.119 1.000 \ 0.004 0.115
1000 2.000 \ 0.003 3.000 \ 0.004 4.000 \ 0.007 1.000 \ 0.003 0.004
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.118 3.000 \ 0.115 2.000 \ 0.114 1.000 \ 0.009 0.109
10 4.000 \ 0.002 3.000 \ 0.007 2.000 \ 0.007 1.000 \ 0.006 0.005
15 3.000 \ 0.002 4.000 \ 0.002 1.500 \ 0.002 1.500 \ 0.002 0.000
20 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
25 3.000 \ 0.002 4.000 \ 0.003 2.000 \ 0.002 1.000 \ 0.002 0.000
30 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000
35 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
40 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
45 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
1000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
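The rank entries in the RMS panels pair each method's deviation with its position among the four methods (B, T, V, R) at that sample size, ranked in ascending order with ties receiving the average of the tied positions (hence entries such as 2.500 and 1.500). A minimal sketch of that convention, using the n = 5 row of the RMS panel in Table 15 as input; the ranking rule is inferred from the tables, not taken from the dissertation's code:

```python
def average_ranks(values):
    """Rank values ascending (1 = smallest), giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(values):
        j = i
        # extend j over the run of values tied with values[order[i]]
        while j + 1 < len(values) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

# Deviations for B, T, V, R from Table 15, RMS panel, n = 5
deviations = {"B": 0.118, "T": 0.115, "V": 0.114, "R": 0.009}
print(dict(zip(deviations, average_ranks(list(deviations.values())))))
# {'B': 4.0, 'T': 3.0, 'V': 2.0, 'R': 1.0}
```

This reproduces the ranks 4, 3, 2, 1 printed in that row. The fractional, non-tied ranks in the mean-deviation panels (e.g., 1.570) presumably average ranks across simulation replications and are not reproduced by this per-row sketch.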
Table 16
Digit Preference—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 3.414 \ 0.000 2.817 \ 0.000 1.580 \ 0.000 2.189 \ 0.000 0.000
10 3.998 \ 0.000 2.000 \ 0.000 1.003 \ 0.000 2.999 \ 0.000 0.000
15 3.998 \ 0.002 2.000 \ 0.002 1.003 \ 0.000 2.999 \ 0.002 0.002
20 1.088 \ 0.010 3.000 \ 0.100 4.000 \ 0.109 1.912 \ 0.010 0.099
25 2.002 \ 0.135 2.999 \ 0.136 3.996 \ 0.141 1.002 \ 0.135 0.006
30 2.004 \ 0.122 2.998 \ 0.123 3.997 \ 0.127 1.000 \ 0.122 0.005
35 2.001 \ 0.008 2.999 \ 0.008 3.997 \ 0.008 1.003 \ 0.008 0.000
40 2.020 \ 0.003 2.973 \ 0.003 3.938 \ 0.003 1.069 \ 0.003 0.000
45 2.336 \ 0.003 2.732 \ 0.003 3.184 \ 0.003 1.748 \ 0.003 0.000
50 2.001 \ 0.009 2.983 \ 0.009 3.995 \ 0.010 1.021 \ 0.009 0.001
100 2.958 \ 0.003 2.045 \ 0.003 1.128 \ 0.003 3.869 \ 0.003 0.000
200 2.924 \ 0.003 2.055 \ 0.003 1.169 \ 0.003 3.851 \ 0.003 0.000
500 2.839 \ 0.003 2.094 \ 0.003 1.287 \ 0.003 3.781 \ 0.003 0.000
1000 2.883 \ 0.001 2.110 \ 0.001 1.345 \ 0.001 3.662 \ 0.001 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.001 3.000 \ 0.001 1.000 \ 0.001 2.000 \ 0.001 0.000
10 4.000 \ 0.313 2.000 \ 0.311 1.000 \ 0.298 3.000 \ 0.311 0.015
15 4.000 \ 0.562 2.000 \ 0.561 1.000 \ 0.557 3.000 \ 0.561 0.005
20 2.000 \ 0.371 3.000 \ 0.372 4.000 \ 0.385 1.000 \ 0.371 0.014
25 4.000 \ 0.533 2.000 \ 0.530 1.000 \ 0.518 3.000 \ 0.532 0.015
30 3.000 \ 0.520 2.000 \ 0.520 1.000 \ 0.520 4.000 \ 0.520 0.000
35 3.000 \ 0.420 2.000 \ 0.418 1.000 \ 0.410 4.000 \ 0.420 0.010
40 3.000 \ 0.486 2.000 \ 0.485 1.000 \ 0.477 4.000 \ 0.486 0.009
45 3.000 \ 0.587 2.000 \ 0.586 1.000 \ 0.574 4.000 \ 0.588 0.014
50 3.000 \ 0.620 2.000 \ 0.618 1.000 \ 0.606 4.000 \ 0.621 0.015
100 3.000 \ 0.564 2.000 \ 0.564 1.000 \ 0.563 4.000 \ 0.564 0.001
200 2.000 \ 0.553 3.000 \ 0.553 4.000 \ 0.554 1.000 \ 0.553 0.001
500 2.000 \ 0.552 3.000 \ 0.552 4.000 \ 0.553 1.000 \ 0.552 0.001
1000 3.000 \ 0.539 2.000 \ 0.539 1.000 \ 0.539 4.000 \ 0.539 0.000
Table 17
Digit Preference—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.153 2.000 \ 1.160 4.000 \ 1.200 3.000 \ 1.162 0.047
10 1.000 \ 1.057 3.000 \ 1.062 4.000 \ 1.095 2.000 \ 0.061 1.034
15 1.000 \ 1.072 3.000 \ 1.076 4.000 \ 1.099 2.000 \ 1.074 0.027
20 1.000 \ 1.082 3.000 \ 1.085 4.000 \ 1.102 2.000 \ 1.083 0.020
25 1.949 \ 1.054 3.000 \ 1.056 4.000 \ 1.073 1.051 \ 1.054 0.019
30 2.000 \ 1.020 3.000 \ 1.023 4.000 \ 1.039 1.000 \ 1.020 0.019
35 2.000 \ 0.976 3.000 \ 0.978 4.000 \ 0.993 1.000 \ 0.975 0.018
40 2.000 \ 0.947 3.000 \ 0.949 4.000 \ 0.964 1.000 \ 0.947 0.017
45 2.000 \ 0.956 3.000 \ 0.958 4.000 \ 0.971 1.000 \ 0.955 0.016
50 2.000 \ 0.969 3.000 \ 0.971 4.000 \ 0.983 1.000 \ 0.968 0.015
100 2.000 \ 0.955 3.000 \ 0.958 4.000 \ 0.974 1.000 \ 0.954 0.020
200 2.000 \ 0.946 3.000 \ 0.940 4.000 \ 0.959 1.000 \ 0.945 0.019
500 2.000 \ 0.940 3.000 \ 0.941 4.000 \ 0.947 1.000 \ 0.939 0.008
1000 2.000 \ 1.083 3.000 \ 1.083 3.999 \ 1.084 1.001 \ 1.083 0.001
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.121 3.000 \ 0.121 1.000 \ 0.117 2.000 \ 0.121 0.004
10 1.000 \ 0.328 3.000 \ 0.329 4.000 \ 0.333 2.000 \ 0.329 0.005
15 1.000 \ 0.321 2.000 \ 0.321 4.000 \ 0.323 3.000 \ 0.321 0.002
20 1.000 \ 0.344 3.000 \ 0.345 4.000 \ 0.354 2.000 \ 0.344 0.010
25 4.000 \ 0.433 2.000 \ 0.431 1.000 \ 0.425 3.000 \ 0.432 0.008
30 2.000 \ 0.467 3.000 \ 0.467 4.000 \ 0.472 1.000 \ 0.466 0.006
35 4.000 \ 0.355 3.000 \ 0.355 1.000 \ 0.355 2.000 \ 0.355 0.000
40 3.000 \ 0.433 2.000 \ 0.433 1.000 \ 0.432 4.000 \ 0.434 0.001
45 4.000 \ 0.459 2.000 \ 0.459 1.000 \ 0.458 3.000 \ 0.459 0.001
50 3.000 \ 0.496 2.000 \ 0.496 1.000 \ 0.492 4.000 \ 0.497 0.005
100 2.000 \ 0.481 3.000 \ 0.481 4.000 \ 0.481 1.000 \ 0.481 0.000
200 2.000 \ 0.480 3.000 \ 0.480 4.000 \ 0.480 1.000 \ 0.480 0.000
500 2.000 \ 0.476 3.000 \ 0.476 4.000 \ 0.476 1.000 \ 0.476 0.000
1000 2.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 1.000 \ 0.472 0.000
Table 18
Multimodal Lumpy—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.506 \ 0.000 1.383 \ 0.000 1.615 \ 0.000 1.300 \ 0.000
10 1.824 \ 0.000 1.619 \ 0.000 1.677 \ 0.000 1.667 \ 0.000
15 1.809 \ 0.000 1.842 \ 0.000 1.716 \ 0.000 1.839 \ 0.000
20 1.878 \ 0.000 1.786 \ 0.000 1.877 \ 0.000 1.948 \ 0.000
25 1.724 \ 0.000 1.876 \ 0.000 1.799 \ 0.000 1.850 \ 0.000
30 1.863 \ 0.000 2.072 \ 0.000 1.955 \ 0.000 1.750 \ 0.000
35 1.957 \ 0.000 2.054 \ 0.000 1.944 \ 0.000 2.050 \ 0.000
40 1.962 \ 0.000 2.014 \ 0.000 2.065 \ 0.000 1.967 \ 0.000
45 1.845 \ 0.000 1.938 \ 0.000 1.819 \ 0.000 1.905 \ 0.000
50 2.032 \ 0.000 1.988 \ 0.000 1.943 \ 0.000 1.937 \ 0.000
100 2.103 \ 0.000 2.170 \ 0.000 2.120 \ 0.000 2.090 \ 0.000
200 2.247 \ 0.000 2.268 \ 0.000 2.219 \ 0.000 2.265 \ 0.000
500 2.381 \ 0.000 2.413 \ 0.000 2.443 \ 0.000 2.401 \ 0.000
1000 2.460 \ 0.000 2.457 \ 0.000 2.443 \ 0.000 2.444 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 1.500 \ 0.000 1.500 \ 0.000 3.500 \ 0.000 3.500 \ 0.000
10 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000
15 1.000 \ 0.000 3.000 \ 0.000 3.000 \ 0.000 3.000 \ 0.000
20 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
40 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 19
Multimodal Lumpy—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 1.963 3.000 \ 2.154 4.000 \ 3.351 2.000 \ 2.065 1.388
10 1.000 \ 1.123 3.000 \ 1.258 4.000 \ 2.151 2.000 \ 1.146 1.028
15 2.000 \ 0.810 3.000 \ 0.917 4.000 \ 1.636 1.000 \ 0.809 0.827
20 2.000 \ 0.643 3.000 \ 0.733 4.000 \ 1.341 1.000 \ 0.631 0.710
25 2.000 \ 0.537 3.000 \ 0.615 4.000 \ 1.145 1.000 \ 0.520 0.625
30 2.000 \ 0.464 3.000 \ 0.533 4.000 \ 1.005 1.000 \ 0.444 0.561
35 2.000 \ 0.410 3.000 \ 0.471 4.000 \ 0.898 1.000 \ 0.388 0.510
40 2.000 \ 0.368 3.000 \ 0.424 4.000 \ 0.815 1.000 \ 0.346 0.469
45 2.000 \ 0.334 3.000 \ 0.386 4.000 \ 0.747 1.000 \ 0.311 0.436
50 2.000 \ 0.307 3.000 \ 0.355 4.000 \ 0.691 1.000 \ 0.285 0.406
100 2.000 \ 0.178 3.000 \ 0.206 4.000 \ 0.411 1.000 \ 0.158 0.253
200 2.000 \ 0.105 3.000 \ 0.122 4.000 \ 0.242 1.000 \ 0.009 0.233
500 2.000 \ 0.006 3.000 \ 0.006 4.000 \ 0.123 1.000 \ 0.005 0.118
1000 2.000 \ 0.004 3.000 \ 0.004 4.000 \ 0.008 1.000 \ 0.004 0.004
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.118 3.000 \ 0.115 2.000 \ 0.114 1.000 \ 0.009 0.109
10 4.000 \ 0.003 3.000 \ 0.003 2.000 \ 0.003 1.000 \ 0.002 0.001
15 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000
20 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
25 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.002 0.000
30 3.000 \ 0.002 4.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
35 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
40 3.000 \ 0.001 4.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
45 4.000 \ 0.002 3.000 \ 0.002 2.000 \ 0.002 1.000 \ 0.001 0.001
50 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
100 4.000 \ 0.001 3.000 \ 0.001 2.000 \ 0.001 1.000 \ 0.001 0.000
200 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
500 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
1000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 0.000
Table 20
Multimodal Lumpy—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 3.989 \ 0.000 2.996 \ 0.000 1.012 \ 0.000 2.003 \ 0.000 0.000
10 1.399 \ 0.003 2.912 \ 0.003 3.742 \ 0.003 1.948 \ 0.003 0.000
15 1.000 \ 0.009 3.000 \ 0.009 4.000 \ 0.009 2.000 \ 0.009 0.000
20 1.565 \ 0.128 2.999 \ 0.129 3.996 \ 0.134 1.439 \ 0.128 0.006
25 1.994 \ 0.121 2.999 \ 0.122 3.998 \ 0.128 1.009 \ 0.121 0.007
30 2.005 \ 0.006 2.998 \ 0.006 3.993 \ 0.007 1.004 \ 0.006 0.001
35 2.853 \ 0.118 2.104 \ 0.115 1.429 \ 0.009 3.614 \ 0.118 0.109
40 3.024 \ 0.120 2.001 \ 0.118 1.004 \ 0.100 3.972 \ 0.121 0.021
45 2.191 \ 0.005 2.735 \ 0.005 3.402 \ 0.006 1.673 \ 0.005 0.001
50 2.001 \ 0.144 2.999 \ 0.145 3.998 \ 0.149 1.002 \ 0.144 0.005
100 2.928 \ 0.003 2.054 \ 0.003 1.168 \ 0.003 3.851 \ 0.003 0.000
200 2.925 \ 0.003 2.062 \ 0.003 1.186 \ 0.003 3.827 \ 0.003 0.000
500 2.917 \ 0.003 2.079 \ 0.003 1.207 \ 0.003 3.797 \ 0.003 0.000
1000 2.884 \ 0.003 2.104 \ 0.003 1.286 \ 0.003 3.726 \ 0.003 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.010 3.000 \ 0.009 1.000 \ 0.009 2.000 \ 0.009 0.001
10 4.000 \ 0.572 2.000 \ 0.567 1.000 \ 0.534 3.000 \ 0.568 0.038
15 4.000 \ 0.490 2.000 \ 0.486 1.000 \ 0.461 3.000 \ 0.489 0.029
20 4.000 \ 0.684 2.000 \ 0.682 1.000 \ 0.672 3.000 \ 0.684 0.012
25 4.000 \ 0.566 2.000 \ 0.564 1.000 \ 0.553 3.000 \ 0.566 0.013
30 2.000 \ 0.399 3.000 \ 0.399 4.000 \ 0.401 1.000 \ 0.399 0.002
35 2.000 \ 0.477 3.000 \ 0.478 4.000 \ 0.482 1.000 \ 0.477 0.005
40 2.000 \ 0.588 3.000 \ 0.589 4.000 \ 0.596 1.000 \ 0.588 0.008
45 2.000 \ 0.448 3.000 \ 0.448 4.000 \ 0.449 1.000 \ 0.447 0.002
50 2.000 \ 0.484 3.000 \ 0.484 4.000 \ 0.485 1.000 \ 0.480 0.005
100 3.000 \ 0.559 2.000 \ 0.559 1.000 \ 0.559 4.000 \ 0.559 0.000
200 2.000 \ 0.556 3.000 \ 0.556 4.000 \ 0.557 1.000 \ 0.556 0.001
500 2.000 \ 0.550 3.000 \ 0.550 4.000 \ 0.550 1.000 \ 0.550 0.000
1000 2.000 \ 0.561 3.000 \ 0.561 4.000 \ 0.561 1.000 \ 0.561 0.000
Table 21
Multimodal Lumpy—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.150 2.000 \ 1.157 4.000 \ 1.197 3.000 \ 1.158 0.047
10 1.000 \ 1.043 3.000 \ 1.047 4.000 \ 1.077 2.000 \ 1.046 0.034
15 1.000 \ 1.022 3.000 \ 1.026 4.000 \ 1.050 2.000 \ 1.024 0.028
20 1.000 \ 1.042 3.000 \ 1.045 4.000 \ 1.064 2.000 \ 1.043 0.022
25 1.999 \ 1.083 3.000 \ 1.086 4.000 \ 1.101 1.001 \ 1.083 0.018
30 2.000 \ 1.108 3.000 \ 1.110 4.000 \ 1.123 1.000 \ 1.107 0.016
35 2.000 \ 1.049 3.000 \ 1.052 4.000 \ 1.070 1.000 \ 1.048 0.022
40 2.000 \ 1.021 3.000 \ 1.024 4.000 \ 1.044 1.000 \ 1.020 0.024
45 2.000 \ 1.105 3.000 \ 1.107 4.000 \ 1.117 1.000 \ 1.105 0.012
50 2.000 \ 1.072 3.000 \ 1.074 3.999 \ 1.084 1.001 \ 1.072 0.012
100 2.000 \ 0.960 3.000 \ 0.962 4.000 \ 0.978 1.000 \ 0.958 0.020
200 2.000 \ 0.950 3.000 \ 0.952 4.000 \ 0.962 1.000 \ 0.949 0.013
500 2.000 \ 0.943 3.000 \ 0.944 4.000 \ 0.950 1.000 \ 0.942 0.008
1000 2.000 \ 0.942 3.000 \ 0.942 4.000 \ 0.942 1.000 \ 0.941 0.001
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.121 3.000 \ 0.121 1.000 \ 0.117 2.000 \ 0.121 0.004
10 4.000 \ 0.410 2.000 \ 0.403 1.000 \ 0.390 3.000 \ 0.403 0.020
15 4.000 \ 0.426 2.000 \ 0.424 1.000 \ 0.415 3.000 \ 0.425 0.011
20 1.000 \ 0.480 3.000 \ 0.481 4.000 \ 0.485 2.000 \ 0.480 0.005
25 2.000 \ 0.416 3.000 \ 0.416 4.000 \ 0.420 1.000 \ 0.416 0.004
30 2.000 \ 0.348 3.000 \ 0.348 4.000 \ 0.355 1.000 \ 0.347 0.008
35 2.000 \ 0.435 3.000 \ 0.435 4.000 \ 0.439 1.000 \ 0.435 0.004
40 2.000 \ 0.463 3.000 \ 0.464 4.000 \ 0.473 1.000 \ 0.463 0.010
45 2.000 \ 0.399 3.000 \ 0.399 4.000 \ 0.400 1.000 \ 0.399 0.001
50 2.000 \ 0.432 3.000 \ 0.432 4.000 \ 0.434 1.000 \ 0.432 0.002
100 1.000 \ 0.472 3.000 \ 0.472 4.000 \ 0.472 2.000 \ 0.472 0.000
200 2.000 \ 0.475 3.000 \ 0.475 4.000 \ 0.475 1.000 \ 0.474 0.001
500 2.000 \ 0.471 3.000 \ 0.471 4.000 \ 0.471 1.000 \ 0.471 0.000
1000 2.000 \ 0.478 3.000 \ 0.478 4.000 \ 0.478 1.000 \ 0.478 0.000
Table 22
Mass at Zero with Gap—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.190 \ 0.000 1.342 \ 0.000 1.561 \ 0.000 1.001 \ 0.000
10 1.989 \ 0.000 1.520 \ 0.000 2.024 \ 0.000 2.105 \ 0.000
15 1.619 \ 0.000 2.125 \ 0.000 2.031 \ 0.000 1.882 \ 0.000
20 1.535 \ 0.000 1.665 \ 0.000 2.191 \ 0.000 2.268 \ 0.000
25 2.016 \ 0.000 1.550 \ 0.000 1.807 \ 0.000 2.747 \ 0.000
30 2.103 \ 0.000 2.111 \ 0.000 2.396 \ 0.000 2.036 \ 0.000
35 2.618 \ 0.000 1.833 \ 0.000 2.457 \ 0.000 1.926 \ 0.000
40 2.503 \ 0.000 1.926 \ 0.000 1.804 \ 0.000 2.093 \ 0.000
45 2.514 \ 0.000 2.011 \ 0.000 1.863 \ 0.000 2.242 \ 0.000
50 2.410 \ 0.000 2.026 \ 0.000 1.932 \ 0.000 2.659 \ 0.000
100 2.350 \ 0.000 2.218 \ 0.000 2.594 \ 0.000 2.267 \ 0.000
200 2.352 \ 0.000 2.365 \ 0.000 2.273 \ 0.000 2.695 \ 0.000
500 2.542 \ 0.000 2.393 \ 0.000 2.161 \ 0.000 2.814 \ 0.000
1000 2.538 \ 0.000 2.331 \ 0.000 2.547 \ 0.000 2.523 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
10 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
15 2.500 \ 0.000 4.000 \ 0.000 2.500 \ 0.000 1.000 \ 0.000
20 3.000 \ 0.000 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000
30 3.000 \ 0.000 1.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
35 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000 1.000 \ 0.000
40 2.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000
50 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 2.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 2.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 1.000 \ 0.000
Table 23
Mass at Zero with Gap—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 6.281 2.186 \ 6.372 2.779 \ 6.937 1.593 \ 6.331 0.656
10 1.000 \ 4.589 2.721 \ 4.678 3.581 \ 5.255 1.860 \ 4.608 0.666
15 1.382 \ 3.898 2.909 \ 3.976 3.864 \ 4.491 1.572 \ 3.900 0.593
20 1.957 \ 3.255 2.987 \ 3.323 3.981 \ 3.786 1.036 \ 3.246 0.540
25 2.000 \ 2.976 3.000 \ 3.035 4.000 \ 3.444 1.000 \ 2.962 0.482
30 2.000 \ 2.940 3.000 \ 2.993 4.000 \ 3.358 1.000 \ 2.925 0.433
35 1.999 \ 3.215 2.999 \ 3.262 3.998 \ 3.589 1.000 \ 3.199 0.390
40 2.000 \ 3.163 2.999 \ 3.206 3.999 \ 3.506 1.000 \ 3.146 0.360
45 2.000 \ 3.113 3.000 \ 3.153 4.000 \ 3.430 1.000 \ 3.096 0.334
50 2.000 \ 3.078 3.000 \ 3.114 4.000 \ 3.373 1.000 \ 3.060 0.313
100 2.000 \ 2.909 3.000 \ 2.931 4.000 \ 3.089 1.000 \ 2.893 0.196
200 2.000 \ 2.802 3.000 \ 2.815 4.000 \ 2.909 1.000 \ 2.790 0.119
500 2.000 \ 2.759 3.000 \ 2.765 4.000 \ 2.811 1.000 \ 2.752 0.059
1000 1.957 \ 2.746 2.810 \ 2.750 4.000 \ 2.776 1.233 \ 2.743 0.033
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 2.750 3.000 \ 2.717 2.000 \ 2.687 1.000 \ 2.283 0.467
10 4.000 \ 2.676 3.000 \ 2.668 2.000 \ 2.634 1.000 \ 2.362 0.314
15 3.000 \ 0.904 4.000 \ 0.906 2.000 \ 0.899 1.000 \ 0.867 0.037
20 4.000 \ 0.645 3.000 \ 0.644 2.000 \ 0.642 1.000 \ 0.626 0.019
25 4.000 \ 0.520 3.000 \ 0.519 2.000 \ 0.519 1.000 \ 0.511 0.009
30 4.000 \ 0.524 3.000 \ 0.523 2.000 \ 0.523 1.000 \ 0.517 0.007
35 4.000 \ 0.826 3.000 \ 0.824 2.000 \ 0.824 1.000 \ 0.814 0.012
40 4.000 \ 0.623 3.000 \ 0.623 2.000 \ 0.622 1.000 \ 0.617 0.006
45 4.000 \ 0.681 3.000 \ 0.680 2.000 \ 0.680 1.000 \ 0.677 0.004
50 4.000 \ 0.662 3.000 \ 0.661 2.000 \ 0.661 1.000 \ 0.658 0.004
100 1.000 \ 0.348 2.000 \ 0.348 4.000 \ 0.348 3.000 \ 0.348 0.000
200 1.000 \ 0.264 2.000 \ 0.265 3.000 \ 0.265 4.000 \ 0.265 0.001
500 1.000 \ 0.169 2.000 \ 0.169 3.000 \ 0.169 4.000 \ 0.170 0.001
1000 1.000 \ 0.119 2.000 \ 0.119 4.000 \ 0.119 3.000 \ 0.119 0.000
Table 24
Mass at Zero with Gap—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 4.000 \ 0.719 3.000 \ 0.718 1.000 \ 0.715 2.000 \ 0.718 0.004
10 4.000 \ 0.688 2.000 \ 0.687 1.000 \ 0.684 3.000 \ 0.687 0.004
15 3.999 \ 0.675 2.000 \ 0.675 1.000 \ 0.672 3.000 \ 0.675 0.003
20 4.000 \ 0.761 2.000 \ 0.760 1.000 \ 0.753 3.000 \ 0.761 0.008
25 3.998 \ 0.774 2.000 \ 0.772 1.000 \ 0.764 3.002 \ 0.774 0.010
30 3.056 \ 0.808 2.000 \ 0.806 1.000 \ 0.794 3.944 \ 0.808 0.014
35 2.999 \ 0.692 2.000 \ 0.691 1.001 \ 0.689 4.000 \ 0.692 0.003
40 3.002 \ 0.676 2.000 \ 0.676 1.001 \ 0.674 3.997 \ 0.676 0.002
45 2.999 \ 0.676 1.999 \ 0.676 1.002 \ 0.674 3.999 \ 0.676 0.002
50 2.997 \ 0.694 2.003 \ 0.694 1.001 \ 0.692 3.997 \ 0.695 0.003
100 3.000 \ 0.701 2.000 \ 0.701 1.000 \ 0.699 4.000 \ 0.701 0.002
200 3.000 \ 0.747 2.000 \ 0.747 1.000 \ 0.743 4.000 \ 0.748 0.005
500 3.000 \ 0.749 2.000 \ 0.749 1.000 \ 0.746 4.000 \ 0.749 0.003
1000 2.999 \ 0.701 2.000 \ 0.700 1.001 \ 0.700 3.999 \ 0.701 0.001
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.491 3.000 \ 0.491 1.000 \ 0.491 2.000 \ 0.491 0.000
10 4.000 \ 0.575 2.000 \ 0.574 1.000 \ 0.573 3.000 \ 0.574 0.002
15 1.000 \ 0.676 3.000 \ 0.676 4.000 \ 0.677 2.000 \ 0.676 0.001
20 1.000 \ 0.692 3.000 \ 0.692 4.000 \ 0.693 2.000 \ 0.692 0.001
25 2.000 \ 0.626 3.000 \ 0.626 4.000 \ 0.632 1.000 \ 0.625 0.007
30 1.000 \ 0.751 3.000 \ 0.753 4.000 \ 0.764 2.000 \ 0.751 0.002
35 2.000 \ 0.684 3.000 \ 0.685 4.000 \ 0.686 1.000 \ 0.684 0.002
40 2.000 \ 0.658 3.000 \ 0.658 4.000 \ 0.659 1.000 \ 0.658 0.001
45 2.000 \ 0.686 3.000 \ 0.686 4.000 \ 0.687 1.000 \ 0.686 0.001
50 2.000 \ 0.693 3.000 \ 0.693 4.000 \ 0.694 1.000 \ 0.693 0.001
100 2.000 \ 0.714 3.000 \ 0.714 4.000 \ 0.714 1.000 \ 0.714 0.000
200 2.000 \ 0.681 3.000 \ 0.682 4.000 \ 0.685 1.000 \ 0.681 0.004
500 2.000 \ 0.691 3.000 \ 0.691 4.000 \ 0.693 1.000 \ 0.691 0.002
1000 2.000 \ 0.715 3.000 \ 0.715 4.000 \ 0.715 1.000 \ 0.715 0.000
Table 25
Mass at Zero with Gap—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 0.144 2.000 \ 0.145 4.000 \ 0.150 3.000 \ 0.145 0.006
10 1.000 \ 0.183 2.500 \ 0.184 4.000 \ 0.190 2.500 \ 0.184 0.007
15 1.001 \ 0.199 3.000 \ 0.200 3.999 \ 0.206 2.000 \ 0.200 0.007
20 1.001 \ 0.517 3.000 \ 0.518 3.999 \ 0.528 2.000 \ 0.517 0.011
25 1.000 \ 0.846 3.000 \ 0.848 4.000 \ 0.858 2.000 \ 0.846 0.012
30 1.742 \ 0.779 3.000 \ 0.782 4.000 \ 0.797 1.258 \ 0.780 0.018
35 2.000 \ 0.180 2.999 \ 0.181 4.000 \ 0.185 1.000 \ 0.180 0.005
40 1.997 \ 0.202 3.000 \ 0.203 3.999 \ 0.206 1.003 \ 0.202 0.004
45 2.000 \ 0.199 3.000 \ 0.200 3.997 \ 0.203 1.002 \ 0.199 0.004
50 2.001 \ 0.172 2.999 \ 0.172 3.998 \ 0.175 1.002 \ 0.171 0.004
100 2.000 \ 0.163 3.000 \ 0.164 3.998 \ 0.166 1.001 \ 0.163 0.003
200 2.000 \ 0.423 3.000 \ 0.424 4.000 \ 0.430 1.000 \ 0.423 0.007
500 2.000 \ 0.421 3.000 \ 0.422 4.000 \ 0.425 1.000 \ 0.421 0.004
1000 2.000 \ 0.164 2.999 \ 0.164 3.999 \ 0.164 1.002 \ 0.164 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 1.000 \ 0.807 2.000 \ 0.809 4.000 \ 0.819 3.000 \ 0.809 0.012
10 1.000 \ 0.675 3.000 \ 0.679 4.000 \ 0.705 2.000 \ 0.678 0.030
15 1.000 \ 0.667 3.000 \ 0.670 4.000 \ 0.691 2.000 \ 0.669 0.024
20 1.000 \ 0.615 3.000 \ 0.617 4.000 \ 0.628 2.000 \ 0.615 0.013
25 2.000 \ 0.945 3.000 \ 0.945 4.000 \ 0.951 1.000 \ 0.945 0.006
30 1.000 \ 0.900 3.000 \ 0.902 4.000 \ 0.913 2.000 \ 0.900 0.013
35 2.000 \ 0.740 3.000 \ 0.740 4.000 \ 0.745 1.000 \ 0.740 0.005
40 2.000 \ 0.754 3.000 \ 0.755 4.000 \ 0.756 1.000 \ 0.754 0.002
45 2.000 \ 0.715 3.000 \ 0.716 4.000 \ 0.724 1.000 \ 0.715 0.009
50 2.000 \ 0.611 3.000 \ 0.613 4.000 \ 0.622 1.000 \ 0.611 0.011
100 2.000 \ 0.687 3.000 \ 0.688 4.000 \ 0.691 1.000 \ 0.687 0.004
200 2.000 \ 0.940 3.000 \ 0.940 4.000 \ 0.942 1.000 \ 0.940 0.002
500 2.000 \ 0.945 3.000 \ 0.945 4.000 \ 0.946 1.000 \ 0.945 0.001
1000 2.000 \ 0.657 3.000 \ 0.657 4.000 \ 0.657 1.000 \ 0.656 0.001
Table 26
Extreme Asymmetric, Decay—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.872 \ 0.000 1.520 \ 0.000 1.575 \ 0.000 1.032 \ 0.000
10 1.748 \ 0.000 1.604 \ 0.000 1.930 \ 0.000 2.158 \ 0.000
15 1.703 \ 0.000 1.945 \ 0.000 1.967 \ 0.000 1.854 \ 0.000
20 2.455 \ 0.000 1.720 \ 0.000 2.076 \ 0.000 1.546 \ 0.000
25 1.918 \ 0.000 1.684 \ 0.000 1.823 \ 0.000 2.114 \ 0.000
30 2.150 \ 0.000 1.895 \ 0.000 2.018 \ 0.000 1.992 \ 0.000
35 2.203 \ 0.000 2.284 \ 0.000 2.063 \ 0.000 2.092 \ 0.000
40 1.705 \ 0.000 2.564 \ 0.000 1.867 \ 0.000 2.313 \ 0.000
45 2.109 \ 0.000 2.300 \ 0.000 1.912 \ 0.000 1.831 \ 0.000
50 1.890 \ 0.000 2.431 \ 0.000 2.297 \ 0.000 1.955 \ 0.000
100 2.272 \ 0.000 2.329 \ 0.000 2.166 \ 0.000 2.466 \ 0.000
200 2.503 \ 0.000 2.275 \ 0.000 2.392 \ 0.000 2.389 \ 0.000
500 2.293 \ 0.000 2.640 \ 0.000 2.485 \ 0.000 2.444 \ 0.000
1000 2.528 \ 0.000 2.582 \ 0.000 2.447 \ 0.000 2.379 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
10 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000
15 3.000 \ 0.000 4.000 \ 0.000 1.000 \ 0.000 2.000 \ 0.000
20 3.000 \ 0.000 4.000 \ 0.000 1.500 \ 0.000 1.500 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
40 3.000 \ 0.000 2.000 \ 0.000 1.000 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 27
Extreme Asymmetric, Decay—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 3.410 2.873 \ 3.566 3.810 \ 4.547 1.937 \ 3.494 1.137
10 1.000 \ 2.171 2.995 \ 2.287 3.993 \ 3.059 1.998 \ 2.190 0.888
15 1.974 \ 1.801 3.000 \ 1.892 4.000 \ 2.511 1.026 \ 1.798 0.713
20 2.000 \ 1.611 3.000 \ 1.687 4.000 \ 2.207 1.000 \ 1.599 0.608
25 2.000 \ 1.495 3.000 \ 1.560 4.000 \ 2.011 1.000 \ 1.480 0.531
30 2.000 \ 1.319 3.000 \ 1.376 4.000 \ 1.779 1.000 \ 1.301 0.478
35 2.000 \ 1.260 3.000 \ 1.312 4.000 \ 1.674 1.000 \ 1.241 0.433
40 2.000 \ 1.228 3.000 \ 1.275 4.000 \ 1.604 1.000 \ 1.209 0.395
45 2.000 \ 1.203 3.000 \ 1.246 4.000 \ 1.548 1.000 \ 1.184 0.364
50 2.000 \ 1.184 3.000 \ 1.224 4.000 \ 1.504 1.000 \ 1.165 0.339
100 2.000 \ 1.129 3.000 \ 1.152 4.000 \ 1.317 1.000 \ 1.114 0.203
200 2.000 \ 1.055 3.000 \ 1.068 4.000 \ 1.162 1.000 \ 1.044 0.118
500 2.000 \ 1.017 3.000 \ 1.022 4.000 \ 1.066 1.000 \ 1.010 0.056
1000 2.022 \ 1.003 2.911 \ 1.006 4.000 \ 1.030 1.067 \ 1.001 0.029
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.307 3.000 \ 0.303 2.000 \ 0.300 1.000 \ 0.257 0.050
10 4.000 \ 0.460 3.000 \ 0.459 2.000 \ 0.454 1.000 \ 0.414 0.046
15 4.000 \ 0.365 3.000 \ 0.365 2.000 \ 0.362 1.000 \ 0.340 0.025
20 4.000 \ 0.451 3.000 \ 0.450 2.000 \ 0.447 1.000 \ 0.428 0.023
25 4.000 \ 0.301 3.000 \ 0.300 2.000 \ 0.299 1.000 \ 0.287 0.014
30 4.000 \ 0.283 3.000 \ 0.283 2.000 \ 0.282 1.000 \ 0.273 0.010
35 4.000 \ 0.251 3.000 \ 0.250 2.000 \ 0.249 1.000 \ 0.240 0.011
40 4.000 \ 0.251 3.000 \ 0.250 2.000 \ 0.249 1.000 \ 0.239 0.012
45 4.000 \ 0.215 3.000 \ 0.215 2.000 \ 0.214 1.000 \ 0.209 0.006
50 4.000 \ 0.215 3.000 \ 0.215 2.000 \ 0.214 1.000 \ 0.209 0.006
100 4.000 \ 0.176 3.000 \ 0.175 2.000 \ 0.175 1.000 \ 0.174 0.002
200 4.000 \ 0.121 3.000 \ 0.120 2.000 \ 0.120 1.000 \ 0.119 0.002
500 4.000 \ 0.008 3.000 \ 0.008 2.000 \ 0.008 1.000 \ 0.008 0.000
1000 2.000 \ 0.005 3.000 \ 0.005 4.000 \ 0.005 1.000 \ 0.005 0.000
Table 28
Extreme Asymmetric, Decay—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 4.000 \ 0.668 3.000 \ 0.666 1.000 \ 0.653 2.000 \ 0.665 0.015
10 4.000 \ 0.647 2.000 \ 0.645 1.000 \ 0.635 3.000 \ 0.645 0.012
15 3.998 \ 0.635 2.000 \ 0.633 1.001 \ 0.625 3.001 \ 0.634 0.010
20 3.994 \ 0.605 2.000 \ 0.604 1.000 \ 0.597 3.006 \ 0.605 0.008
25 3.000 \ 0.578 2.000 \ 0.577 1.000 \ 0.571 3.999 \ 0.578 0.007
30 3.000 \ 0.323 2.000 \ 0.322 1.000 \ 0.315 4.000 \ 0.323 0.008
35 3.000 \ 0.235 2.000 \ 0.234 1.001 \ 0.226 3.999 \ 0.235 0.009
40 3.000 \ 0.156 2.000 \ 0.155 1.000 \ 0.146 4.000 \ 0.157 0.011
45 3.000 \ 0.101 2.001 \ 0.010 1.001 \ 0.010 4.000 \ 0.102 0.092
50 3.000 \ 0.009 2.000 \ 0.009 1.000 \ 0.008 3.999 \ 0.009 0.001
100 3.000 \ 0.591 2.000 \ 0.590 1.000 \ 0.588 4.000 \ 0.591 0.003
200 3.000 \ 0.501 2.000 \ 0.500 1.000 \ 0.493 4.000 \ 0.502 0.009
500 3.000 \ 0.505 2.000 \ 0.504 1.000 \ 0.501 4.000 \ 0.505 0.004
1000 3.000 \ 0.505 2.000 \ 0.505 1.000 \ 0.503 4.000 \ 0.506 0.003
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.208 3.000 \ 0.208 1.000 \ 0.207 2.000 \ 0.208 0.001
10 1.000 \ 0.587 3.000 \ 0.588 4.000 \ 0.599 2.000 \ 0.588 0.012
15 4.000 \ 0.612 3.000 \ 0.612 1.000 \ 0.611 2.000 \ 0.612 0.001
20 3.000 \ 0.611 2.000 \ 0.611 4.000 \ 0.611 1.000 \ 0.611 0.000
25 2.000 \ 0.594 3.000 \ 0.594 4.000 \ 0.596 1.000 \ 0.594 0.002
30 2.000 \ 0.551 3.000 \ 0.551 4.000 \ 0.551 1.000 \ 0.551 0.000
35 3.000 \ 0.654 2.000 \ 0.653 1.000 \ 0.649 4.000 \ 0.654 0.005
40 4.000 \ 0.695 2.000 \ 0.694 1.000 \ 0.693 3.000 \ 0.695 0.002
45 3.000 \ 0.717 2.000 \ 0.717 1.000 \ 0.713 4.000 \ 0.717 0.004
50 2.000 \ 0.682 3.000 \ 0.682 4.000 \ 0.682 1.000 \ 0.682 0.000
100 2.000 \ 0.652 3.000 \ 0.652 4.000 \ 0.654 1.000 \ 0.652 0.002
200 2.000 \ 0.562 3.000 \ 0.562 4.000 \ 0.566 1.000 \ 0.561 0.005
500 2.000 \ 0.573 3.000 \ 0.573 4.000 \ 0.576 1.000 \ 0.573 0.003
1000 2.000 \ 0.571 3.000 \ 0.571 4.000 \ 0.573 1.000 \ 0.571 0.002
Table 29
Extreme Asymmetric, Decay—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 0.857 2.000 \ 0.861 4.000 \ 0.886 3.000 \ 0.862 0.029
10 1.000 \ 0.848 3.000 \ 0.851 4.000 \ 0.865 2.000 \ 0.850 0.017
15 1.001 \ 0.851 3.000 \ 0.853 3.999 \ 0.864 2.000 \ 0.852 0.013
20 1.000 \ 0.877 3.000 \ 0.878 4.000 \ 0.887 2.000 \ 0.877 0.010
25 2.000 \ 0.903 3.000 \ 0.904 4.000 \ 0.912 1.005 \ 0.903 0.009
30 2.000 \ 1.047 3.000 \ 1.048 4.000 \ 1.055 1.000 \ 1.046 0.009
35 2.000 \ 1.119 3.000 \ 1.120 4.000 \ 1.127 1.000 \ 1.119 0.008
40 2.000 \ 1.171 3.000 \ 1.172 4.000 \ 1.180 1.000 \ 1.171 0.009
45 2.000 \ 1.190 3.000 \ 1.191 3.999 \ 1.198 1.001 \ 1.190 0.008
50 2.001 \ 1.178 3.000 \ 1.179 3.999 \ 1.184 1.000 \ 1.178 0.006
100 2.001 \ 0.879 2.999 \ 0.879 4.000 \ 0.882 1.000 \ 0.879 0.003
200 2.000 \ 0.936 3.000 \ 0.937 4.000 \ 0.942 1.000 \ 0.936 0.006
500 2.000 \ 0.933 3.000 \ 0.934 4.000 \ 0.936 1.000 \ 0.933 0.003
1000 2.000 \ 0.933 3.000 \ 0.933 4.000 \ 0.934 1.000 \ 0.932 0.002
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.006 3.000 \ 0.006 1.000 \ 0.005 2.000 \ 0.005 0.001
10 1.000 \ 0.726 3.000 \ 0.729 4.000 \ 0.749 2.000 \ 0.728 0.023
15 2.000 \ 0.815 3.000 \ 0.816 4.000 \ 0.818 1.000 \ 0.815 0.003
20 1.000 \ 0.716 3.000 \ 0.717 4.000 \ 0.723 2.000 \ 0.716 0.007
25 2.000 \ 0.786 3.000 \ 0.787 4.000 \ 0.790 1.000 \ 0.786 0.004
30 2.000 \ 0.783 3.000 \ 0.784 4.000 \ 0.787 1.000 \ 0.783 0.004
35 1.000 \ 0.581 2.000 \ 0.581 4.000 \ 0.582 3.000 \ 0.581 0.001
40 4.000 \ 0.662 2.000 \ 0.661 1.000 \ 0.660 3.000 \ 0.662 0.002
45 3.000 \ 0.649 2.000 \ 0.648 1.000 \ 0.647 4.000 \ 0.649 0.002
50 2.000 \ 0.633 3.000 \ 0.633 4.000 \ 0.634 1.000 \ 0.633 0.001
100 2.000 \ 0.831 3.000 \ 0.831 4.000 \ 0.831 1.000 \ 0.831 0.000
200 2.000 \ 0.541 3.000 \ 0.541 4.000 \ 0.543 1.000 \ 0.541 0.002
500 2.000 \ 0.556 3.000 \ 0.556 4.000 \ 0.557 1.000 \ 0.556 0.001
1000 2.000 \ 0.563 3.000 \ 0.563 4.000 \ 0.564 1.000 \ 0.563 0.001
Table 30
Extreme Bimodal—Accuracy of T Scores on Means
Deviation from Target (50)
n Rank \ Value
B T V R
5 1.550 \ 0.000 1.590 \ 0.000 1.514 \ 0.000 1.075 \ 0.000
10 1.817 \ 0.000 1.602 \ 0.000 2.090 \ 0.000 1.741 \ 0.000
15 1.775 \ 0.000 2.050 \ 0.000 1.920 \ 0.000 1.693 \ 0.000
20 2.088 \ 0.000 1.717 \ 0.000 2.187 \ 0.000 1.928 \ 0.000
25 1.930 \ 0.000 1.722 \ 0.000 1.963 \ 0.000 2.033 \ 0.000
30 2.135 \ 0.000 1.913 \ 0.000 2.035 \ 0.000 2.069 \ 0.000
35 2.196 \ 0.000 2.195 \ 0.000 2.118 \ 0.000 2.131 \ 0.000
40 1.983 \ 0.000 2.133 \ 0.000 2.112 \ 0.000 2.226 \ 0.000
45 1.903 \ 0.000 2.309 \ 0.000 1.921 \ 0.000 1.936 \ 0.000
50 2.152 \ 0.000 2.085 \ 0.000 2.109 \ 0.000 2.057 \ 0.000
100 2.226 \ 0.000 2.390 \ 0.000 2.135 \ 0.000 2.351 \ 0.000
200 2.443 \ 0.000 2.348 \ 0.000 2.373 \ 0.000 2.346 \ 0.000
500 2.451 \ 0.000 2.501 \ 0.000 2.468 \ 0.000 2.450 \ 0.000
1000 2.476 \ 0.000 2.515 \ 0.000 2.511 \ 0.000 2.441 \ 0.000
Magnitude of Deviation (RMS)
n Rank \ Value
B T V R
5 2.000 \ 0.000 2.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
10 1.000 \ 0.000 2.500 \ 0.000 2.500 \ 0.000 4.000 \ 0.000
15 1.000 \ 0.000 2.000 \ 0.000 3.500 \ 0.000 3.500 \ 0.000
20 1.000 \ 0.000 4.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000
25 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
30 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
35 1.000 \ 0.000 3.000 \ 0.000 2.000 \ 0.000 4.000 \ 0.000
40 2.000 \ 0.000 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
45 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
50 1.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000 2.000 \ 0.000
100 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
200 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
500 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
1000 1.000 \ 0.000 2.000 \ 0.000 3.000 \ 0.000 4.000 \ 0.000
Table 31
Extreme Bimodal—Accuracy of T Scores on Standard Deviations
Deviation from Target (10)
n Rank \ Value Range
B T V R
5 1.000 \ 2.639 2.993 \ 2.811 3.989 \ 3.899 1.996 \ 2.730 1.260
10 1.000 \ 1.865 3.000 \ 1.982 4.000 \ 2.761 2.000 \ 1.882 0.896
15 1.998 \ 1.570 3.000 \ 1.659 4.000 \ 2.271 1.000 \ 1.565 0.701
20 2.000 \ 1.413 3.000 \ 1.486 4.000 \ 1.994 1.000 \ 1.400 0.594
25 2.000 \ 1.318 3.000 \ 1.381 4.000 \ 1.820 1.000 \ 1.302 0.518
30 2.000 \ 1.269 3.000 \ 1.324 4.000 \ 1.708 1.000 \ 1.252 0.456
35 2.000 \ 1.218 3.000 \ 1.266 4.000 \ 1.611 1.000 \ 1.200 0.411
40 2.000 \ 1.178 3.000 \ 1.222 4.000 \ 1.534 1.000 \ 1.160 0.374
45 2.000 \ 1.142 3.000 \ 1.182 4.000 \ 1.468 1.000 \ 1.123 0.345
50 2.000 \ 1.078 3.000 \ 1.115 4.000 \ 1.382 1.000 \ 1.060 0.322
100 2.000 \ 0.996 3.000 \ 1.018 4.000 \ 1.174 1.000 \ 0.981 0.193
200 2.000 \ 0.931 3.000 \ 0.943 4.000 \ 1.035 1.000 \ 0.921 0.114
500 2.000 \ 0.886 3.000 \ 0.892 4.000 \ 0.936 1.000 \ 0.879 0.057
1000 1.956 \ 0.869 2.986 \ 0.872 4.000 \ 0.897 1.058 \ 0.866 0.031
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.379 3.000 \ 0.371 2.000 \ 0.367 1.000 \ 0.298 0.081
10 4.000 \ 0.421 3.000 \ 0.421 2.000 \ 0.416 1.000 \ 0.379 0.042
15 3.000 \ 0.270 4.000 \ 0.270 2.000 \ 0.266 1.000 \ 0.240 0.030
20 4.000 \ 0.273 3.000 \ 0.273 2.000 \ 0.269 1.000 \ 0.246 0.027
25 4.000 \ 0.206 3.000 \ 0.205 2.000 \ 0.202 1.000 \ 0.184 0.022
30 4.000 \ 0.185 3.000 \ 0.184 2.000 \ 0.182 1.000 \ 0.167 0.018
35 4.000 \ 0.181 3.000 \ 0.180 2.000 \ 0.178 1.000 \ 0.162 0.019
40 4.000 \ 0.162 3.000 \ 0.161 2.000 \ 0.159 1.000 \ 0.145 0.017
45 4.000 \ 0.130 3.000 \ 0.129 2.000 \ 0.127 1.000 \ 0.115 0.015
50 4.000 \ 0.156 3.000 \ 0.155 2.000 \ 0.153 1.000 \ 0.140 0.016
100 4.000 \ 0.106 3.000 \ 0.105 2.000 \ 0.104 1.000 \ 0.010 0.096
200 4.000 \ 0.009 3.000 \ 0.009 2.000 \ 0.009 1.000 \ 0.008 0.001
500 4.000 \ 0.006 3.000 \ 0.005 2.000 \ 0.005 1.000 \ 0.005 0.001
1000 4.000 \ 0.004 3.000 \ 0.004 2.000 \ 0.004 1.000 \ 0.004 0.000
Table 32
Extreme Bimodal—Accuracy of T Scores on Skewness
Deviation from Target (0)
n Rank \ Value Range
B T V R
5 3.463 \ 0.003 2.813 \ 0.003 1.552 \ 0.003 2.172 \ 0.003 0.000
10 1.001 \ 0.162 3.000 \ 0.162 3.999 \ 0.166 2.000 \ 0.162 0.004
15 1.003 \ 0.155 3.000 \ 0.155 3.997 \ 0.159 2.000 \ 0.155 0.004
20 1.987 \ 0.149 2.999 \ 0.150 3.997 \ 0.154 1.018 \ 0.149 0.005
25 2.038 \ 0.136 2.768 \ 0.136 3.313 \ 0.138 1.882 \ 0.136 0.002
30 2.990 \ 0.307 2.002 \ 0.307 1.013 \ 0.306 3.995 \ 0.307 0.001
35 2.996 \ 0.304 2.002 \ 0.304 1.003 \ 0.303 3.999 \ 0.304 0.001
40 2.873 \ 0.309 2.018 \ 0.309 1.171 \ 0.309 3.939 \ 0.310 0.001
45 2.999 \ 0.293 2.002 \ 0.293 1.005 \ 0.292 3.995 \ 0.293 0.001
50 2.012 \ 0.170 2.984 \ 0.170 3.956 \ 0.171 1.047 \ 0.170 0.001
100 2.372 \ 0.006 2.645 \ 0.006 2.953 \ 0.006 2.031 \ 0.006 0.000
200 2.377 \ 0.006 2.627 \ 0.006 2.900 \ 0.006 2.099 \ 0.006 0.000
500 2.390 \ 0.005 2.608 \ 0.005 2.838 \ 0.006 2.164 \ 0.005 0.001
1000 2.032 \ 0.318 2.968 \ 0.318 3.905 \ 0.318 1.095 \ 0.318 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.365 3.000 \ 0.363 1.000 \ 0.352 2.000 \ 0.363 0.013
10 1.000 \ 0.571 3.000 \ 0.572 4.000 \ 0.575 2.000 \ 0.571 0.004
15 2.000 \ 0.664 3.000 \ 0.664 4.000 \ 0.666 1.000 \ 0.663 0.003
20 2.000 \ 0.595 3.000 \ 0.595 4.000 \ 0.597 1.000 \ 0.595 0.002
25 2.000 \ 0.723 3.000 \ 0.723 4.000 \ 0.724 1.000 \ 0.723 0.001
30 4.000 \ 0.562 2.000 \ 0.561 1.000 \ 0.559 3.000 \ 0.562 0.003
35 3.000 \ 0.577 2.000 \ 0.576 1.000 \ 0.576 4.000 \ 0.577 0.001
40 3.000 \ 0.624 2.000 \ 0.623 1.000 \ 0.622 4.000 \ 0.623 0.002
45 3.000 \ 0.584 2.000 \ 0.583 1.000 \ 0.582 4.000 \ 0.584 0.002
50 3.000 \ 0.567 2.000 \ 0.566 1.000 \ 0.563 4.000 \ 0.567 0.004
100 3.000 \ 0.661 2.000 \ 0.661 1.000 \ 0.660 4.000 \ 0.661 0.001
200 3.000 \ 0.654 2.000 \ 0.654 1.000 \ 0.654 4.000 \ 0.654 0.000
500 2.000 \ 0.665 3.000 \ 0.665 4.000 \ 0.665 1.000 \ 0.665 0.000
1000 3.000 \ 0.646 2.000 \ 0.646 1.000 \ 0.646 4.000 \ 0.646 0.000
Table 33
Extreme Bimodal—Accuracy of T Scores on Kurtosis
Deviation from Target (3)
n Rank \ Value Range
B T V R
5 1.000 \ 1.235 2.000 \ 1.238 4.000 \ 1.259 3.000 \ 1.239 0.024
10 1.004 \ 1.209 2.999 \ 1.211 3.996 \ 1.219 2.001 \ 1.210 0.010
15 1.002 \ 1.198 3.000 \ 1.199 3.996 \ 1.203 2.003 \ 1.199 0.005
20 1.010 \ 1.203 2.996 \ 1.203 3.988 \ 1.206 2.006 \ 1.203 0.003
25 1.333 \ 1.125 2.994 \ 1.126 3.982 \ 1.130 1.691 \ 1.125 0.005
30 2.000 \ 1.119 3.000 \ 1.120 4.000 \ 1.123 1.000 \ 1.119 0.004
35 2.000 \ 1.136 3.000 \ 1.136 4.000 \ 1.139 1.000 \ 1.136 0.003
40 2.000 \ 1.128 3.000 \ 1.128 4.000 \ 1.131 1.000 \ 1.128 0.003
45 2.000 \ 1.107 3.000 \ 1.107 4.000 \ 1.110 1.000 \ 1.107 0.003
50 2.000 \ 1.089 3.000 \ 1.089 4.000 \ 1.092 1.000 \ 1.089 0.003
100 1.936 \ 1.115 2.994 \ 1.115 3.971 \ 1.118 1.093 \ 1.115 0.003
200 1.953 \ 1.109 2.934 \ 1.109 3.817 \ 1.112 1.292 \ 1.109 0.003
500 2.039 \ 1.104 2.960 \ 1.105 3.867 \ 1.106 1.133 \ 1.104 0.002
1000 1.999 \ 1.085 2.998 \ 1.085 3.996 \ 1.085 1.007 \ 1.085 0.000
Magnitude of Deviation (RMS)
n Rank \ Value Range
B T V R
5 4.000 \ 0.310 3.000 \ 0.307 1.000 \ 0.290 2.000 \ 0.306 0.020
10 4.000 \ 0.593 2.000 \ 0.592 1.000 \ 0.591 3.000 \ 0.592 0.002
15 2.000 \ 0.725 3.000 \ 0.725 4.000 \ 0.726 1.000 \ 0.725 0.001
20 1.000 \ 0.557 3.000 \ 0.557 4.000 \ 0.557 2.000 \ 0.557 0.000
25 2.000 \ 0.657 3.000 \ 0.657 4.000 \ 0.658 1.000 \ 0.657 0.001
30 2.000 \ 0.500 3.000 \ 0.500 4.000 \ 0.501 1.000 \ 0.500 0.001
35 4.000 \ 0.589 2.000 \ 0.589 1.000 \ 0.589 3.000 \ 0.589 0.000
40 3.000 \ 0.600 2.000 \ 0.600 1.000 \ 0.599 4.000 \ 0.600 0.001
45 3.000 \ 0.634 2.000 \ 0.634 1.000 \ 0.633 4.000 \ 0.634 0.001
50 3.000 \ 0.550 2.000 \ 0.550 1.000 \ 0.547 4.000 \ 0.551 0.004
100 3.000 \ 0.645 2.000 \ 0.645 1.000 \ 0.644 4.000 \ 0.645 0.001
200 1.000 \ 0.636 3.000 \ 0.636 4.000 \ 0.636 2.000 \ 0.636 0.000
500 2.000 \ 0.631 3.000 \ 0.631 4.000 \ 0.631 1.000 \ 0.631 0.000
1000 3.000 \ 0.667 2.000 \ 0.667 1.000 \ 0.667 4.000 \ 0.667 0.000
The 16 figures that follow plot the range of deviation values for each
distribution against a power curve. The power curve is a regression model that
follows the formula Y = b₀X^b₁. Curve fitting is only possible for the deviation range
on the second and fourth moments, standard deviation and kurtosis. The first and
third moments, mean and skewness, either contain zeros, which make
transformations impossible, or lack sufficient variability to make curve fitting
worthwhile.
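As an illustration (not part of the original study), the fitting step can be sketched computationally. Because Y = b₀X^b₁ linearizes to log Y = log b₀ + b₁ log X, ordinary least squares on the log-transformed data recovers both coefficients. The example below fits the standard-deviation deviation ranges for the Extreme Bimodal distribution reported in Table 31; the function name is illustrative.

```python
import math

def fit_power_curve(xs, ys):
    """Fit Y = b0 * X**b1 by least squares on the log-transformed data:
    log Y = log b0 + b1 * log X is an ordinary linear regression."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    mx = sum(lx) / len(lx)
    my = sum(ly) / len(ly)
    b1 = (sum((x - mx) * (y - my) for x, y in zip(lx, ly))
          / sum((x - mx) ** 2 for x in lx))
    b0 = math.exp(my - b1 * mx)
    return b0, b1

# Deviation ranges for the standard deviation, Extreme Bimodal (Table 31)
ns = [5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
ranges = [1.260, 0.896, 0.701, 0.594, 0.518, 0.456,
          0.411, 0.374, 0.345, 0.322]
b0, b1 = fit_power_curve(ns, ranges)  # b1 < 0: the range shrinks with n
```

A negative exponent b₁ captures the shrinking deviation range as sample size grows, which is the pattern the figures display.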
Only the first 10 sample sizes, which increase in increments of five from n =
5 to n = 50, are used for this initial set of figures. Typically, more statistical
variability occurs among smaller samples. This Monte Carlo study was designed to
comprehensively document the ranking methods’ performance at small sample
sizes and to evaluate these trends at larger sample sizes. To serve this end,
several of the small-sample regression models are fitted a second time with the
addition of four sample sizes: n = 100, n = 200, n = 500, and n = 1,000.
Figure 13. Smooth Symmetric: Power curve for deviation range of
standard deviation.
Figure 14. Smooth Symmetric: Power curve for deviation range of kurtosis.
Figure 15. Discrete Mass at Zero: Power curve for deviation range of
standard deviation.
Figure 16. Discrete Mass at Zero: Power curve for deviation range of
kurtosis.
Figure 17. Extreme Asymmetric, Growth: Power curve for deviation range of
standard deviation.
Figure 18. Extreme Asymmetric, Growth: Power curve for deviation range of
kurtosis.
Figure 19. Digit Preference: Power curve for deviation range of
standard deviation.
Figure 20. Digit Preference: Power curve for deviation range of kurtosis.
Figure 21. Multimodal Lumpy: Power curve for deviation range of
standard deviation.
Figure 22. Multimodal Lumpy: Power curve for deviation range of kurtosis.
Figure 23. Mass at Zero with Gap: Power curve for deviation range of
standard deviation.
Figure 24. Mass at Zero with Gap: Power curve for deviation range of
kurtosis.
Figure 25. Extreme Asymmetric, Decay: Power curve for deviation range of
standard deviation.
Figure 26. Extreme Asymmetric, Decay: Power curve for deviation range of
kurtosis.
Figure 27. Extreme Bimodal: Power curve for deviation range of standard
deviation.
Figure 28. Extreme Bimodal: Power curve for deviation range of kurtosis.
Selected power curves are fitted a second time with the addition of the
larger sample sizes. Figure 29 shows that the Smooth Symmetric power curve
remains intact when the larger sample sizes are included; this curve can be
compared with Figure 13. Figure 30 shows how the fit of the Digit Preference
power curve is rectified when larger sample sizes are included; compare this
curve with Figure 19. Figure 31 shows the Mass at Zero with Gap distribution,
which achieves an extremely poor fit when only small samples are included
(see Figure 24) but assumes the basic shape of the power curve with the
addition of larger samples.
Together, these three large sample curves illustrate that to whatever extent
predictive patterns are established when n ≤ 50, those regression slopes either
improve in fit or continue to hold when sample sizes increase. Therefore, it does
not seem warranted to present a complete set of power curves with the larger
sample sizes.
Figure 29. Smooth Symmetric: Power curve for deviation range of standard
deviation with inclusion of large sample sizes.
Figure 30. Digit Preference: Power curve for deviation range of standard
deviation with inclusion of large sample sizes.
Figure 31. Mass at Zero with Gap: Power curve for deviation range of
kurtosis with inclusion of large sample sizes.
CHAPTER 5
CONCLUSION
The purpose of this study was to compare the accuracy of the Blom, Tukey,
Van der Waerden, and Rankit approximations in attaining the target moments of
the normal distribution. Means and standard deviations were scaled to the T to
facilitate interpretation in the context of standardized testing in education.
Accuracy was conceptualized in both relative and absolute terms, as expressed in
ranks and absolute values throughout the results tables. Deviation from target and
magnitude of deviation framed the comparison of accuracy measures.
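The four approximations differ only in the plotting position they assign to each rank before the inverse normal transformation. The following sketch (not the study's own code) uses their standard textbook forms and ignores ties; the exact constants should be taken from the study's method definitions, and the function names are illustrative.

```python
from statistics import NormalDist

# Standard plotting positions for rank i of n observations (1-based);
# these are the textbook forms of the four approximations compared here.
POSITIONS = {
    "Blom":            lambda i, n: (i - 0.375) / (n + 0.25),
    "Tukey":           lambda i, n: (i - 1 / 3) / (n + 1 / 3),
    "Van der Waerden": lambda i, n: i / (n + 1),
    "Rankit":          lambda i, n: (i - 0.5) / n,
}

def t_scores(data, method="Rankit"):
    """Rank-normalize a tie-free sample and rescale to T (mean 50, SD 10)."""
    inv = NormalDist().inv_cdf
    p = POSITIONS[method]
    n = len(data)
    order = sorted(range(n), key=lambda k: data[k])
    out = [0.0] * n
    for rank, k in enumerate(order, start=1):
        out[k] = 50 + 10 * inv(p(rank, n))
    return out
```

Because the positions are symmetric about 0.5, the middle observation of an odd-sized Rankit sample receives T = 50 exactly, and paired extreme scores average to 50.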
A Monte Carlo simulation allowed the ranking methods’ performance to be
experimentally evaluated under a variety of real distributional conditions. Each
entry in the tables is the product of 10,000 iterations of a random selection
process. Replicating this experiment would produce slightly different numerical
values due to the random processes it involves. However, the design is sufficiently
powerful that the outcome of the comparisons would be identical.
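The evaluation loop can be sketched as follows (a simplification for illustration, not the study's FORTRAN program): repeatedly draw a sample, rank-normalize it, compute the moment of interest, and average the deviation from target over iterations. The two-point generator below is a crude stand-in for the Extreme Bimodal data set, with midranks used for ties; all names are illustrative.

```python
import random
from collections import Counter
from statistics import NormalDist, pstdev

def rankit_t(data):
    """T scores via the Rankit position (r - 0.5)/n, with midranks for ties."""
    inv = NormalDist().inv_cdf
    n = len(data)
    counts = Counter(data)
    midrank, start = {}, 1
    for v in sorted(counts):
        midrank[v] = start + (counts[v] - 1) / 2  # average rank of tied block
        start += counts[v]
    return [50 + 10 * inv((midrank[v] - 0.5) / n) for v in data]

def mc_sd_deviation(n, iters=500, seed=7):
    """Average |SD(T) - 10| over Monte Carlo draws from a two-point
    'extreme bimodal' stand-in (not the study's real data set)."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(iters):
        sample = [rng.choice((0, 1)) for _ in range(n)]
        total += abs(pstdev(rankit_t(sample)) - 10)
    return total / iters
```

With heavily tied two-point data the normalized standard deviation falls well short of the target of 10, which echoes the large Extreme Bimodal deviations reported in Table 31.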
The final two tables summarize the major findings according to moment,
sample size, and distribution. Table 34 presents the average deviation ranks and
values and Table 35 identifies the winning approximations by name. In Table 35,
hyphens ( - ) indicate that all values for the mean are zero. Forward slashes ( / )
indicate that three out of four values for skewness are tied.
Table 34
Deviation from Target, Summarized by Moment, Sample Size, and Distribution
Blom Tukey Van der Waerden Rankit
Rank Value Rank Value Rank Value Rank Value Range
Moment
Mean 2.045 0.000 2.022 0.000 2.034 0.000 2.026 0.000 0.000
Standard Deviation 1.859 1.142 2.985 1.186 3.982 1.603 1.146 1.119 0.484
Skewness 2.668 0.192 2.477 0.192 2.269 0.191 2.586 0.192 0.001
Kurtosis 1.687 0.947 2.915 0.941 3.988 0.952 1.394 0.930 0.022
Sample Size
5 ≤ n ≤ 50 1.976 0.609 2.585 0.628 3.103 0.769 1.720 0.603 0.166
100 ≤ n ≤ 1000 2.231 0.435 2.599 0.423 2.962 0.447 1.883 0.416 0.031
Distribution
Smooth Symmetric 2.007 0.393 2.643 0.411 3.196 0.531 1.653 0.391 0.140
Discr Mass Zero 2.033 0.404 2.608 0.421 3.136 0.539 1.715 0.403 0.136
Asym – Growth 1.995 0.453 2.670 0.470 3.257 0.583 1.596 0.452 0.131
Digit Preference 2.039 0.390 2.622 0.408 3.131 0.527 1.692 0.370 0.158
Multimod Lumpy 1.987 0.412 2.624 0.396 3.126 0.510 1.737 0.376 0.134
Mass Zero w/Gap 2.239 1.129 2.465 1.126 2.747 1.204 2.103 1.113 0.092
Asym – Decay 2.238 0.726 2.528 0.739 2.765 0.835 2.046 0.725 0.109
Extreme Bimodal 1.980 0.655 2.649 0.669 3.190 0.765 1.753 0.654 0.112
Table 35
Winning Approximations, Summarized by Moment, Sample Size, and Distribution
1st Place 2nd Place 3rd Place 4th Place
Rank \ Value Rank \ Value Rank \ Value Rank \ Value
Moment
Mean T \ - V \ - R \ - B \ -
Standard Deviation R \ R B \ B T \ T V \ V
Skewness V \ V T \ B/T/R R \ B/T/R B \ B/T/R
Kurtosis R \ R B \ T T \ B V \ V
Sample Size
5 ≤ n ≤ 50 R \ R B \ B T \ T V \ V
100 ≤ n ≤ 1000 R \ R B \ T T \ B V \ V
Distribution
Achievement
Smooth Symmetric R \ R B \ B T \ T V \ V
Discrete Mass at Zero R \ R B \ B T \ T V \ V
Asymmetric – Growth R \ R B \ B T \ T V \ V
Digit Preference R \ R B \ B T \ T V \ V
Multimodal Lumpy R \ R B \ T T \ B V \ V
Psychometric
Mass at Zero with Gap R \ R B \ T T \ B V \ V
Asymmetric – Decay R \ R B \ B T \ T V \ V
Extreme Bimodal R \ R B \ B T \ T V \ V
Discussion
Moment 1—Mean
All four ranking methods attain the target value of 50 for the mean.
Differences appear in the numerical results only after the third decimal place, and
are therefore meaningless in terms of practical application. Most mean deviation
values are machine-constant zeros, meaning they are zero at least until the sixth
decimal place. Although these differences are reflected in the deviation and
magnitude ranks, they do not merit further summary statistics, such as deviation or
RMS range.
Moment 2—Standard Deviation
The absolute and relative accuracy of the four ranking methods in attaining
the target standard deviation differ substantially. Their average absolute deviation
from the target T score standard deviation is 1.263. This means that the
practitioner who uses any of the four ranking methods to normalize test scores
without reference to sample size or distribution can expect to obtain an estimated
standard deviation of 8.737 – 11.263. Adding the test instrument’s standard error
to this compounds the problem. An instrument with a standard error of three (± 3)
and a Z score of two would incur a final T score between 67.474 and 72.526, whose true
range would be between 64.474 and 75.526, for a total of 11.052. Even a standard
error half this size would lead to a true score range of 65.974 to 74.026, or 8.052.
Thus, a standard deviation that is off target by 1.263 would combine with a
standard error of ± 1.5 to nearly triple the size of the true score range, from a
theorized range of three to an actual range of more than eight; the actual
range is 268% of the theorized range. As the standard error increases, the
relative difference between the theorized and actual score ranges diminishes:
at a standard error of three, the ratio is 184%; at a standard error of four,
it becomes 163%.
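These percentages follow from simple arithmetic, verified in the sketch below (an illustration, not part of the study): at Z = 2, a standard deviation that is off target by d widens the T-score interval by 2·Z·d = 4d on top of the 2·SE width implied by the standard error alone.

```python
def range_ratio(sd_dev, se, z=2.0):
    """Error-compounded T-score range as a percentage of the range
    implied by the instrument's standard error alone."""
    added = 2 * z * sd_dev   # width contributed by the off-target SD
    theorized = 2 * se       # width from the standard error alone
    return (added + theorized) / theorized * 100

# Reproduces the percentages in the text (average SD deviation 1.263)
ratios = [round(range_ratio(1.263, se)) for se in (1.5, 3.0, 4.0)]
```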
The smallest observed deviation from the target standard deviation
occurred in the Multimodal Lumpy distribution, for which Rankit obtained an
average deviation value of 0.509. Van der Waerden’s method performed at its
worst in the Mass at Zero with Gap distribution, obtaining a 3.768 deviation value.
In applied terms, this means that a practitioner using Van der Waerden’s formula
for normalizing a standardized test score of Z = 2 could obtain a T score as low as
62.464 or as high as 77.536. Adding in a relatively modest standard error of two
and rounding to the nearest whole number, a test-taker’s strong performance could
result in an actual score as low as 60 or as high as 80. This range would indicate
that the test-taker’s true performance falls somewhere between the 74th and
the 99th percentile. Such information would be useless for any real testing
purpose. On
the other hand, the practitioner who uses Rankit with a Multimodal Lumpy
distribution (Z = 2) would obtain a T score between 68.982 and 71.018. Including a
standard error of two and rounding, the test-taker would see a final score of 67 to
73. A true score range of six is clearly preferable to a range of 20. However, even
the best ranking method produces an estimated half point deviation from the T
score’s target standard deviation. In one of the best applied scenarios, this means
that the true score range would still be 151% of the range implied by the
standardized test instrument’s stated standard error.
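The applied T-score intervals quoted above can be reproduced the same way (again a verification sketch): at Z = 2, a deviation d from the target standard deviation of 10 yields scores between 50 + (10 − d)·Z and 50 + (10 + d)·Z, with the instrument's standard error added on each side.

```python
def t_interval(sd_dev, z=2.0, se=0.0):
    """T-score interval when the normalized SD misses its target of 10
    by sd_dev, with an optional instrument standard error folded in."""
    lo = 50 + (10 - sd_dev) * z - se
    hi = 50 + (10 + sd_dev) * z + se
    return lo, hi

vdw = t_interval(3.768)     # Van der Waerden, Mass at Zero with Gap
rankit = t_interval(0.509)  # Rankit, Multimodal Lumpy
```

Adding se=2.0 to each call and rounding recovers the 60-to-80 and 67-to-73 score ranges discussed in the text.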
When assessing the potential impact of selecting a ranking method on the
outcome of T scores, the average deviation score range (Table 34) may be
misleading. It indicates that the difference between the highest and the lowest
deviation values from the target standard deviation is less than a half point (0.484).
However, the gulf between the highest and lowest deviation values across all
distributions is vast: 3.259. By the same token, the average deviation range among
samples of 5 through 50 is 0.614, compared to 0.117 among samples of 100
through 1,000. Much more variability in the extent of ranking methods’ deviation
from target occurs among small samples than among large samples.
RMS values may provide additional insight here. As anticipated, the
magnitude of deviation from target as expressed in RMS values is highest for
samples of n = 5, with only one exception. The highest RMS for Extreme
Asymmetric – Decay is found at sample size 10. The highest RMS among all
sample sizes is 2.750, which is found at sample size 5 in the psychometric
distribution Mass at Zero with Gap. Four of the five achievement distributions attain
an RMS of zero at sample size 200. This lack of deviation magnitude holds for the
larger sample sizes as well. Among achievement distributions, only Extreme
Asymmetric – Growth does not attain a low RMS of zero. At sample size 200, it
reaches 0.003, which tapers off to 0.001 by n = 1,000. The average RMS range
among the five achievement distributions is 0.111 and among the three
psychometric distributions, 1.154. The average RMS range among all eight
distributions is 0.158, with most RMS variability found among the psychometric
distributions and the smaller samples. Curiously, all the worst RMS values belong
to Blom, yet Blom achieves second place in terms of relative and absolute
deviation from target (Table 35). This suggests that Blom’s approximation may
work by sacrificing some technical precision for reliability.
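For reference, the "Magnitude of Deviation (RMS)" statistic reported throughout the tables is taken here to be the usual root mean square of per-iteration deviations from the target moment (an assumption about the study's computation):

```python
import math

def rms_deviation(values, target):
    """Root-mean-square deviation of observed moment values from a target."""
    return math.sqrt(sum((v - target) ** 2 for v in values) / len(values))
```

On this definition a method can be close to target on average yet show a large RMS, which is the pattern noted for Blom: accurate average deviation, but the worst spread.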
Moment 3—Skewness
The four ranking methods’ average deviation from the target skewness
value of the normal distribution is 0.192. The psychometric distribution Mass at
Zero with Gap contains the worst deviation value for skewness and the
achievement distribution Digit Preference contains the best. Blom and Tukey tie at
0.719 for the worst skewness performance in a given distribution, and Blom and
Rankit tie at 0.022 for the best. Ranking methods should not be selected on the
basis of their deviation from target skewness values because the deviation
quantities are small and the differences between them are negligible. Table 35
shows a three-way tie for second place between Blom, Tukey, and Rankit. Van der
Waerden scores its only first-place finish in this case, with a mere 0.001 margin of
win. Furthermore, it is not clear how deviations from normal skewness may affect
test scoring or interpretation.
Moment 4—Kurtosis
Kurtosis values show greater deviation from target than skewness values
but less than standard deviations. The average deviation value for kurtosis across
all sample sizes and distributions is 0.943. The average deviation is higher by
0.222 for the smaller samples (n ≤ 50) than for the larger sample sizes. Although
the difference between the highest deviation value on any distribution and the
lowest (1.145 on Extreme Bimodal and 0.328 on Mass at Zero with Gap,
respectively) appears substantial at 0.817, the overall difference between the best-
performing and worst-performing ranking methods is only 0.022 on kurtosis. RMS
values support the conclusion that differences on kurtosis are likely to have little
practical meaning. The psychometric distributions have higher RMS values than
the achievement distributions by an average of 0.176. The highest RMS value,
0.946, occurs at sample size 500 in the Mass at Zero with Gap distribution. The
third-highest RMS value, 0.726, occurs at sample size 15 in the Extreme Bimodal
distribution. The highest RMS value among achievement distributions, 0.556,
occurs at sample size 30 in the Extreme Asymmetric – Growth distribution.
However, all the lowest RMS scores on distributions occur at the smallest sample
size, n = 5. Aside from the question of how kurtosis considerations may actually
affect test scoring or interpretation, RMS values display irregularity and
nonconformity with the patterns established by the second and third moments.
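The skewness and kurtosis figures discussed above are moment ratios. The sketch below states the conventional definitions under which the normal targets are 0 and 3 (non-excess kurtosis), matching the target of 3 used in Tables 33 and 34; names are illustrative.

```python
from statistics import fmean

def skew_kurt(xs):
    """Moment-based skewness m3 / m2**1.5 and non-excess kurtosis
    m4 / m2**2; a normal population has skewness 0 and kurtosis 3."""
    m = fmean(xs)
    m2 = fmean([(x - m) ** 2 for x in xs])
    m3 = fmean([(x - m) ** 3 for x in xs])
    m4 = fmean([(x - m) ** 4 for x in xs])
    return m3 / m2 ** 1.5, m4 / m2 ** 2
```

A perfectly symmetric sample returns skewness 0; flat, two-valued samples of the Extreme Bimodal kind return kurtosis well below 3, consistent with the large kurtosis deviations reported for that distribution.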
Recommendations
The Blom, Tukey, Van der Waerden, and Rankit approximations display
considerable variability on the even moments, standard deviation and kurtosis.
Only standard deviation, however, has known practical implications for test scoring
and interpretation. Results for the odd moments, mean and skewness, may
contribute to the analytical pursuit of area estimation under the normal distribution.
The great variability between and within ranking methods on the standard
deviation suggests that practitioners should consider both sample size and
distribution when selecting a normalizing procedure.
Small samples and skewed distributions aggravate the inaccuracy of all
ranking methods. However, substantial differences between methods and
deviations from target are found among large samples and relatively symmetric
distributions as well. Therefore, scores from large samples should be plotted to
observe population variance, in addition to propensity scores, tail weight, modality,
and symmetry. Practitioners, including analysts, educators, and administrators,
should also be advised that most test scores are less accurate than they appear.
Caution should be exercised when making decisions based on standardized test
performance.
Table 35 simplifies this selection. Rankit is the most accurate method on the
standard deviation and on kurtosis when sample size and distribution are not taken
into account; it is the most accurate method among both small and large samples;
and it is the most accurate method among both achievement and psychometric
distributions. Van der Waerden’s approximation consistently performs the worst
across sample sizes and distributions. In most cases, Blom’s method comes in
second place and Tukey’s, third. The exceptions are trivial for applied purposes.
It would be useful to perform a more exhaustive empirical study of these
ranking methods to better describe their patterns. It would also be of theoretical
value to analyze the mathematical properties of their differences. More research
can be done in both theoretical and applied domains. However, for the purpose of
normalizing test scores in the social and behavioral sciences, these results suffice.
REFERENCES
Aiken, L. R. (1987). Formulas for equating ratings on different scales. Educational
and Psychological Measurement, 47(1): 51-54.
Aiken, L. R. (1994). Psychological Testing and Assessment, 8th
Ed. Boston: Allyn
and Bacon.
Allport, F. M. (1934). The J-curve hypothesis of conforming behavior. Journal of
Social Psychology, 5: 141-183.
American Educational Research Association (AERA), American Psychological
Association (APA), & National Council on Measurement in Education
(NCME) (1999). Standards for Educational and Psychological Testing.
Washington, D.C.: AERA.
Andrews, D. F., Bickel, P. J., Hampel, F. R., Huber, P. J., Rogers, W. H., & Tukey,
J. W. (1972). Robust estimates of location survey and advances. Princeton:
Princeton University Press.
Angoff, W. H. (1971). Scales, norms, and equivalent scores. In Thorndike, R. L.,
Ed. Educational Measurement, 2nd
Ed. Washington, D.C.: American Council
on Education.
—(1984). Scales, Norms, and Equivalent Scores. Princeton: Educational Testing
Service.
Bartlett, M. S. (1947). The use of transformations. Biometrics, 3(1): 39-52.
Retrieved August 6, 2007 from JSTOR database.
Blair, R. C. & Higgins, J. J. (1980). A comparison of the power of the Wilcoxon’s
rank-sum statistic to that of the Student’s t statistic under various non-
normal distributions. Journal of Educational Statistics, 5: 309-35.
Blair, R. C. & Higgins, J. J. (1985). Comparison of the power of the paired samples
t test to that of Wilcoxon’s signed-ranks test under various population
shapes. Psychological Bulletin, 97: 119-28.
Bliss, C. I., Greenwood, M. L., & White, E. S. (1956). A rankit analysis of paired
comparisons for measuring the effect of sprays on flavor. Biometrics, 12(4):
381-403. Retrieved March 26, 2007 from JSTOR database.
Blom, G. (1954). Transformation of the binomial, negative binomial, Poisson and
χ2 distributions. Biometrika, 41(3/4): 302-316.
Blom, G. (1958). Statistical Estimates and Transformed Beta-Variables. New York:
John Wiley & Sons.
Box, G. E. P. & Cox, D. R. (1964). An analysis of transformations. Journal of the
Royal Statistical Society, 26: 211-252.
Bradley, R. A. & Terry, M. E. (1952). Rank analysis of incomplete block designs I.
The method of paired comparisons. Biometrika, 39: 324-345.
Bradley, J. V. (1977). A common situation conducive to bizarre distribution shapes.
The American Statistician, 31: 147-150.
Bradley, J. V. (1978). Robustness? British Journal of Mathematical and Statistical
Psychology, 31: 144-152.
Brown, B. M. & Hettmansperger, T. P. (1996). Normal scores, normal plots, and
tests for normality. Journal of the American Statistical Association,
91(436):1668-1675. Retrieved August 6, 2007 from JSTOR database.
Cadwell, J. H. (1953). The distribution of quasi-ranges in samples from a normal
population. Annals of Mathematical Statistics, 24: 603-13.
Chang, S. W. (2006). Methods in scaling the basic competence test. Educational
and Psychological Measurement, 66: 907-929.
Conover, W. J. (1980). Practical Nonparametric Statistics. New York: John Wiley &
Sons.
Cronbach, L. J. (1976). Essentials of Psychological Testing, 3rd
Ed. New York:
Harper & Row.
Davison, A. C. & Gigli, A. (1989). Deviance residuals and normal scores plots.
Biometrika, 76(2): 211-221. Retrieved August 3, 2007 from JSTOR
database.
Donaldson, T. S. (1968). Robustness of the F-Test to errors of both kinds and the
correlation between the numerator and the denominator of the F-ratio.
Journal of the American Statistical Association, 63: 660-676.
Dunn-Rankin, P. (1983). Scaling Methods. Hillsdale: Lawrence Erlbaum
Associates.
Federer, W. T. (1951). Evaluation of Variance Components from a Group of
Experiments with Multiple Classifications. Iowa Agricultural Experiment
Station Research Bulletin, 380.
Fisher, R. A. & Yates, F. (1938). Statistical Tables for Biological, Agricultural and
Medical Research. Edinburgh: Oliver and Boyd.
Fisher, R. A. & Yates, F. (1953). Statistical Tables for Biological, Agricultural and
Medical Research, 4th
Ed. London: Oliver and Boyd.
Friedman, M. (1937). The use of ranks to avoid the assumption of normality
implicit in the analysis of variance. Journal of the American Statistical
Association, 32(200): 675-701.
Galton, F. (1902). The most suitable proportion between the value of first and
second prizes. Biometrika, 1(4): 385-90.
Games, P. A. (1983). Curvilinear transformations of the dependent variable.
Psychological Bulletin, 93(2): 382-387.
Games, P. A. (1984). Data transformations, power, and skew: A rebuttal to Levine
and Dunlap. Psychological Bulletin, 95(2): 345-347.
Games, P. A. & Lucas, P. A. (1966). Power of the analysis of variance of
independent groups on non-normal and normally transformed data.
Educational and Psychological Measurement, 26: 311-327.
Glass, G. V., Peckham, P. D., Sanders, J. R. (1972). Consequences of failure to
meet the assumptions underlying the fixed effect analysis of variance and
covariance. Review of Educational Research, 42: 237-288.
Godwin, H. J. (1949). On the estimation of dispersion by linear systematic
statistics. Biometrika, 36: 92-100.
Gosset, W. S. (“Student”) (1908). The probable error of a mean. Biometrika, 6(1):
1-25.
Harter, H. L. (1959). The use of sample quasi-ranges in estimating population
standard deviation. Annals of Mathematical Statistics, 30: 980-99.
Harter, H. L. (1961). Expected values of normal order statistics. Biometrika,
48(1/2): 151-165. Retrieved August 3, 2007 from JSTOR database.
Hastings, C., Mosteller, F., Tukey, J. W., & Winsor, C. P. (1947). Low moments for
small samples: A comparative study of order statistics. Annals of
Mathematical Statistics, 18: 413-26.
Hinkle, D. E., Wiersma, W., & Jurs, S. G. (2003). Applied Statistics for the
Behavioral Sciences, 5th
Ed. Boston: Houghton Mifflin.
Hoaglin, D. C. (2003). John W. Tukey and data analysis. Statistical Science, 18(3):
311-318. Retrieved August 3, 2007 from JSTOR database.
Horst, P. (1931). Obtaining comparable scores from distributions of dissimilar
shape. Journal of the American Statistical Association, 26(176): 455-460.
Retrieved August 23, 2007 from JSTOR database.
Ipsen, J. & Jerne, N. (1944). Graphical evaluation of the distribution of small
experimental series. Acta Pathologica, Microbiologica et Immunologica
Scandinavica, 21: 343-361.
Irwin, J. O. (1925). The further theory of Francis Galton’s individual difference
problem. Biometrika, 17: 100-28.
Kendall, M. G. (1955). Further contributions to the theory of paired comparisons.
Biometrics, 11: 43-62.
Kendall, M. G. & Stuart, A. (1979). The Advanced Theory of Statistics, 4th
Ed., Vol.
2. New York: MacMillan.
Kline, P. (2000). Handbook of Psychological Testing, 2nd
Ed. London: Routledge.
Kolen, M. J. & Brennan, R. L. (2004). Test Equating, Scaling, and Linking:
Methods and practices, 2nd
Ed. New York: Springer Science+Business
Media.
Lehmann, E. L. (1975). Nonparametrics: Statistical Methods Based on Ranks. San
Francisco: Holden-Day.
Lester, P. E. & Bishop, L. K. (2000). Handbook of Tests and Measurement in
Education and the Social Sciences, 2nd
Ed. Lanham, MD: Scarecrow Press.
Levine, D. W. & Dunlap, W. P. (1982). Power of the F Test with skewed data:
Should one transform or not? Psychological Bulletin, 92(1): 272-280.
Levine, D. W. & Dunlap, W. P. (1983). Data transformation, power, and skew: A
rejoinder to Games. Psychological Bulletin, 93(3): 596-599.
Levine, A., Liukkonen, J., & Levine, D. W. (1992). Predicting power changes under
transformations in ANOVA tests. Communications in Statistics, 21: 679-92.
McCall, W. A. (1939). Measurement. New York: MacMillan.
Mehrens, W. A. & Lehmann, I. J. (1980). Standardized Tests in Education, 3rd
Ed.
New York: Holt, Rinehart and Winston.
Mehrens, W. A. & Lehmann, I. J. (1987). Using Standardized Tests in Education,
4th
Ed. New York: Longman.
Micceri, T. (1989). The unicorn, the normal curve, and other improbable creatures.
Psychological Bulletin, 105(1): 156-166.
Micceri, T. (1990). Proportions, pitfalls and pendulums. Educational and
Psychological Measurement, 50(4): 769-74.
Mosteller, F. (1951). Remarks on the method of paired comparisons: I. The least
squares solution assuming equal standard deviations and equal
correlations. Psychometrika, 16: 3-9.
Netemeyer, R. G., Bearden, W. O. & Sharma, S. (2003). Scaling Procedures:
Issues and Applications. Thousand Oaks: Sage Publications.
Nanna, M. J. & Sawilowsky, S. S. (1998). Analysis of Likert scale data in disability
and medical rehabilitation research. Psychological Methods, 3(1): 55-67.
Nunnally, J. C. (1978). Psychometric Theory. New York: McGraw-Hill.
Osborne, J. W. (2002). Normalizing data transformations. ERIC Digest,
ED470204. Available online: www.eric.ed.gov
Pearson, K. (1895). Contributions to the mathematical theory of evolution: II. Skew
variation in homogeneous material. Philosophical Transactions of the Royal
Society, Series A, 186: 343-414.
Pearson, K. (1902). Note on Francis Galton’s problem. Biometrika, 1(4): 390-9.
Pearson, K. & Hartley, H. O. (1954). Biometrika Tables for Statisticians, I.
Cambridge University Press for the Biometrika Trustees.
Pearson, K. & Pearson, M. (1931). On the mean character and variance of a
ranked individual, and on the mean and variance of the intervals between
ranked individuals. Biometrika, 23: 364-387.
Pearson, E. S. & Please, N. W. (1975). Relation between the shape of a
population distribution and the robustness of four simple test statistics.
Biometrika, 62: 223-241.
Pearson, E. S. & Tukey, J. W. (1965). Approximate means and standard
deviations based on distances between percentage points of frequency
curves. Biometrika, 52(3/4): 533-546. Retrieved August 6, 2007 from
JSTOR database.
Petersen, N. S., Kolen, M. J., & Hoover, H. D. (1989). Scaling, norming, and
equating. In R. L. Linn (Ed.), Educational Measurement, 3rd Ed. New York:
American Council on Education and Macmillan.
The Psychological Corporation (1955). Methods of expressing test scores. Test
Service Bulletin, 48: 7-10.
Sawilowsky, S., Blair, R. C., & Micceri, T. (1990). A PC FORTRAN subroutine
library of psychology and education data sets. Psychometrika, 55: 729.
Sawilowsky, S. & Blair, R. C. (1992). A more realistic look at the robustness and
Type II error properties of the t test to departures from population normality.
Psychological Bulletin, 111(2): 352-360.
Sawilowsky, S. & Fahoome, G. (2003). Statistics Through Monte Carlo Simulation
with Fortran. Oak Park: JMASM.
Scheffé, H. (1952). An analysis of variance for paired comparisons. Journal of the
American Statistical Association, 47: 381-400.
Simon, H. A. (1955). On a class of skew distribution functions. Biometrika, 42: 425-
440.
Srivastava, A. B. L. (1959). Effect of non-normality on the power of the analysis of
variance test. Biometrika, 46: 114-122.
SPSS (2006). Statistical Package for the Social Sciences (SPSS) 15.0 for
Windows. Author.
Stigler, S. M. (1977). Do robust estimators work with real data? The Annals of
Statistics, 5(6): 1055-1098.
Student—See Gosset, W. S.
Tan, W. Y. (1982). Sampling distributions and robustness of t, F and variance-ratio
in two samples and ANOVA models with respect to departures from
normality. Communications in Statistics, A11: 2485-2511.
Tapia, R. A. & Thompson, J. R. (1978). Nonparametric Probability Density
Estimation. Baltimore: Johns Hopkins University Press.
Tarter, M. E. (2000). Statistical Curves and Parameters: Choosing an Appropriate
Approach. Natick: A K Peters.
Thissen, D. & Wainer, H. (2001). Test Scoring. Mahwah: Lawrence Erlbaum
Associates.
Thorndike, R. L. (1982). Applied Psychometrics. Boston: Houghton Mifflin.
Thurstone, L. L. (1928). Attitudes can be measured. The American Journal of
Sociology, 33(4): 529-554.
Tindal, G. (1987). The effect of different metrics on interpretations of change in
program evaluation. Remedial and Special Education, 8(5): 19-28.
Tippett, L. H. C. (1925). On the extreme individuals and the range of samples
taken from a normal population. Biometrika, 17: 364-387.
Tukey, J. W. (1957). On the comparative anatomy of transformations. The Annals
of Mathematical Statistics, 28(3): 602-632. Retrieved March 26, 2007 from
JSTOR database.
Tukey, J. W. (1962). The future of data analysis. The Annals of Mathematical
Statistics, 33(1): 1-67. Retrieved August 3, 2007 from JSTOR database.
Tukey, J. W. & McLaughlin, D. H. (1963). Less vulnerable confidence and
significance procedures for location based on a single sample:
Trimming/Winsorization. Sankhyā: The Indian Journal of Statistics, Series A,
25: 331-351.
Van der Waerden, B. L. (1952/1953a). Order tests for the two-sample problem and
their power. Proceedings Koninklijke Nederlandse Akademie van
Wetenschappen (A), 55 (Indagationes Mathematicae 14): 453-458, & 56
(Indagationes Mathematicae 15): 303-316.
Van der Waerden, B. L. (1953b). Testing a distribution function. Proceedings
Koninklijke Nederlandse Akademie van Wetenschappen (A), 56
(Indagationes Mathematicae 15): 201-207.
Visual Numerics (1994). IMSL Stat/Library: FORTRAN subroutines for statistical
applications, Vol. 1. Houston: Author.
Walker, H. M. & Lev, J. (1969). Elementary Statistical Methods, 3rd Ed. New York:
Holt, Rinehart and Winston.
Wilks, S. S. (1948). Order statistics. Bulletin of the American Mathematical Society,
54: 6-50.
Wilson, E. B. & Hilferty, M. M. (1929). Note on C. S. Peirce’s experimental
discussion of the law of errors. Proceedings of the National Academy of
Sciences, 15: 120-125.
Wimberley, R. C. (1975). A program for the T-score normal standardizing
transformation. Educational and Psychological Measurement, 35: 693-695.
Wright, E. N. (1973). Examinations, marks, grades and scales: A working paper.
Ontario: Toronto Board of Education.
Zimmerman, D. W. & Zumbo, B. D. (2005). Can percentiles replace raw scores in
the statistical analysis of test data? Educational and Psychological
Measurement, 65: 613-638. Retrieved March 7, 2007 from
http://epm.sagepub.com
ABSTRACT
A COMPARISON OF RANKING METHODS FOR NORMALIZING SCORES
by
SHIRA R. SOLOMON
May 2008
Advisor: Shlomo S. Sawilowsky
Major: Evaluation and Research
Degree: Doctor of Philosophy
Normalizing transformations define the frame of reference for standardized
test score distributions, allowing for meaningful comparisons between tests.
Normalization equalizes the intervals between data points by approximating where
ordinal scores fall along a normal distribution and how much of the corresponding
area under the curve the ranked, cumulative proportions occupy. The most
prominent among such ranking methods are the Blom, Tukey, Van der Waerden,
and Rankit approximations. The purpose of this study was to provide an empirical
comparison of these ranking methods as they apply to standardized test scoring
and interpretation.
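As an illustrative sketch only (not code from the dissertation), the four ranking methods can be expressed by their conventional plotting-position formulas, which convert a score's rank into a cumulative proportion that the inverse normal CDF then maps to a normal deviate; the function and variable names below are assumptions for illustration, and ties are not handled:

```python
from statistics import NormalDist

# Conventional plotting-position formulas for rank i (1-based) in a sample
# of size n. These are the standard textbook definitions of the four
# approximations compared in the study.
FORMULAS = {
    "Blom":            lambda i, n: (i - 0.375) / (n + 0.25),
    "Tukey":           lambda i, n: (i - 1 / 3) / (n + 1 / 3),
    "Van der Waerden": lambda i, n: i / (n + 1),
    "Rankit":          lambda i, n: (i - 0.5) / n,
}

def normalize(scores, method="Rankit"):
    """Map each raw score to a T score (mean 50, SD 10) via its rank:
    rank -> cumulative proportion -> normal deviate -> 50 + 10z.
    Ties are ignored in this simplified sketch."""
    n = len(scores)
    prop = FORMULAS[method]
    # Indices of the scores in ascending order (rank 1 = smallest).
    order = sorted(range(n), key=lambda k: scores[k])
    t_scores = [0.0] * n
    for rank, k in enumerate(order, start=1):
        z = NormalDist().inv_cdf(prop(rank, n))  # normal deviate for this rank
        t_scores[k] = 50 + 10 * z
    return t_scores
```

For example, `normalize([3, 1, 4, 1.5, 9])` assigns the median score a T score of exactly 50 under Rankit, with the remaining scores placed symmetrically around it.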
A series of Monte Carlo simulations was performed to compare the accuracy
of these methods in achieving the T score’s specified mean and standard
deviation and the unit normal distribution’s skewness and kurtosis. Eight
nonnormal distributions of real achievement and psychometric data were used
at 10 small and four large sample sizes. All four ranking methods were found
to be accurate on the odd moments but displayed considerable deviation from
target values on the even moments. The standard deviation showed the most
variability on both accuracy measures: deviation from target and magnitude
of deviation.
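An accuracy check of this kind amounts to computing the first four moments of the normalized scores and comparing them with the targets (mean 50, SD 10, skewness 0, and the normal kurtosis of 3). The following is a minimal sketch under those assumptions; it uses simple population (biased) moment formulas, which may differ from the dissertation's exact estimators:

```python
import math

def moments(xs):
    """Return (mean, standard deviation, skewness, kurtosis) of a sample,
    using population (biased) formulas for simplicity. For T scores the
    target values are 50, 10, 0, and 3, respectively."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in xs) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in xs) / (n * sd ** 4)
    return mean, sd, skew, kurt
```

A method's deviation from target on, say, the standard deviation is then simply `abs(sd - 10)` for each simulated sample, aggregated across replications.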
The substantial variability between and within ranking methods on the
standard deviation suggests that practitioners should consider both sample
size and distribution when selecting a normalizing procedure. Nevertheless,
Rankit was the most accurate method across small and large samples, across
achievement and psychometric distributions, and overall, whereas Van der
Waerden’s approximation consistently performed the worst across sample sizes
and distributions. These results indicate that Rankit should be the default
selection for score normalization in the social and behavioral sciences.
AUTOBIOGRAPHICAL STATEMENT
SHIRA R. SOLOMON
Prior to her doctoral work in Educational Evaluation and Research at
Wayne State University, Shira Solomon received a Bachelor of Arts in
Comparative Literature from Columbia University (1994), a Bachelor of Arts in
Talmud and Rabbinic Literature from the Jewish Theological Seminary of America
(1994), and a Master of Science in Teaching from the New School for Social
Research (1997).
She has taught English Language Arts in New York City public schools,
English as a Foreign Language at National Taiwan University, English as a Second
Language at Wayne State University’s English Language Institute, and research
methods for allied health and human services at Madonna University and
University of Detroit Mercy. She has also worked as a technical writer, a
communications writer, and an entertainment writer. She trained in social and
behavioral health research at the Institute of Gerontology and the School of
Medicine at Wayne State University.
Ms. Solomon’s research interests include assessment methodology and
the epidemiological investigation of literacy.