Statistical Reasoning
at the Secondary Tertiary Interface
Therese Maree Wilson
BSc (Hons)
A thesis submitted in fulfilment of the requirements of
the degree of Doctor of Philosophy
in the School of Mathematical Sciences,
Queensland University of Technology.
November 2006
Keywords
statistical reasoning
statistical thinking
statistical literacy
numeracy
secondary/tertiary interface
Rasch analysis
partial credit model
attitudes towards statistics
self-efficacy
assessment
statistical education
introductory data analysis course
mathematical thinking
Abstract
Each year thousands of students enrol in introductory statistics courses at universities
throughout Australia, bringing with them formal and informal statistical knowledge
and reasoning, as well as a wide range of basic numeracy skills, mathematical
inclinations and attitudes towards statistics, which have the potential to impact on
their ability to develop statistically. This research develops and investigates
measures of each of these components for students at the interface of secondary and
tertiary education, and investigates the relationships that exist between them, and a
range of background variables. The focus of the research is on measuring and
analysing levels and abilities in statistical reasoning for a range of students at the
tertiary interface, with particular interest also in investigating their basic numeracy
skills and how these may or may not link with statistical reasoning allowing for other
variables and factors. Information from three cohorts in an introductory data analysis
course, whose focus is real data investigations, provides the basis for the research. This
course is compulsory for all students in degree programs associated with all sciences
or mathematics.
The research discusses and reports on the development of questionnaires to measure
numeracy and statistical reasoning and the students’ attitudes and reflections on their
prior school experiences with statistics. Students’ attitudes are found to be generally
positive, particularly with regard to their self-efficacy. They are also in no doubt as
to the links that exist between mathematics and statistics.
The Numeracy Questionnaire, developed to measure pre-calculus skills relevant to an
introductory data analysis course which emphasises real data investigations,
demonstrates that many students who have completed a basic algebra and calculus
senior school subject struggle with skills which are in the pre-senior curricula.
Direct examination of the responses helps to understand where and why difficulties
tend to occur. Rasch analysis is used to validate the questionnaire and assist in the
description of levels of skill. General linear models demonstrate that a student’s
numeracy score depends on the result obtained in senior mathematics, whether or not
the student is a mathematics student, gender, whether or not higher level
mathematics has been studied, self-efficacy and year. The research indicates that
either the pre-senior curricula need strengthening or that exposure to mathematics
beyond the core senior course is required to establish confidence with basic skills
particularly when applied to new contexts and multi-step situations.
The Statistical Reasoning Questionnaire (SRQ) is developed for use in the Australian
context at the secondary/tertiary interface. As with the Numeracy Questionnaire,
detailed examination of the responses provides much insight into the range and
features of statistical reasoning at this level. Rasch analyses, both dichotomous and
polychotomous, are used to establish the appropriateness of this instrument as a
measuring tool at this level. The polychotomous, Rasch partial credit model is also
used to define a new approach to scoring a statistical reasoning instrument and
enables development and application of a hierarchical model and measures levels of
statistical reasoning appropriate at the school/tertiary interface.
General linear models indicate that numeracy is a highly significant predictor of
statistical reasoning allowing for all other variables including tertiary entrance score
and students’ backgrounds and self-efficacy. Further investigation demonstrates that
this relationship is not limited to more difficult or overtly mathematical items on the
SRQ.
Performance on the end of semester component of assessment in the course is shown
to depend on statistical reasoning at the beginning of semester as measured by the
partial credit model, allowing for all other variables. Because of the dominance of
the relationship between statistical reasoning (as measured by the SRQ) and
numeracy on entry, some further analysis of the end of semester assessment is
carried out. This includes noting the higher attrition rates for students with less
mathematical backgrounds and lower numeracy.
Contents
CHAPTER 1 INTRODUCTION 1
1.1 Motivation 1
1.2 Aims and Scope of the Study 2
1.3 An Overview of the Statistics Education Literature 2
1.3.1 Statistical literacy, reasoning and thinking 3
1.3.2 Curricula and assessment 7
1.3.3 Research instruments for measuring statistical reasoning 9
1.3.4 Student learning 10
1.3.5 Student background 10
1.3.6 Student attitudes 13
1.3.7 Impact of technology 14
1.4 Literacy, Reasoning, Understanding or Thinking? 14
1.5 Structure of the Research 15
1.6 Outline of the Thesis 18
CHAPTER 2 THE CONTEXT OF THE STUDY 21
2.1 Introduction 21
2.2 The Course: Statistical Data Analysis 1 21
2.3 Demographics and Educational Backgrounds of Students 26
CHAPTER 3 ATTITUDES TOWARDS MATHEMATICS AND
STATISTICS 33
3.1 Introduction 33
3.2 Construction of the 2004 Attitudinal Survey 34
3.3 Results from the 2004 Attitudinal Survey 38
3.3.1 Affective responses of students towards statistics 39
3.3.2 Students’ perceived value in studying statistics 40
3.3.3 The motivation of students to learn statistics 42
3.3.4 The links which students perceive between statistics and mathematics 43
3.3.5 How students see the use of statistics in society 44
3.3.6 Students’ perceived difficulty of statistics 44
3.3.7 Self-efficacy of students in statistics and mathematics 45
3.4 2004 Attitudes Follow-up 47
3.5 Construction of the 2005 Attitudinal Survey 49
3.6 Results from the 2005 Attitudinal Survey 50
3.7 Discussion 58
CHAPTER 4 NUMERACY 61
4.1 Introduction 61
4.2 Construction of the Numeracy Questionnaire 63
4.3 Results of the Numeracy Questionnaire 68
4.3.1 Consideration of student responses 71
4.3.2 Results from Rasch analysis 81
4.3.3 Levels of thinking 83
4.3.4 What influences students’ scores? 86
4.3.5 What influences responses to individual questions? 92
4.4 Discussion 95
CHAPTER 5 THE STATISTICAL REASONING QUESTIONNAIRE 97
5.1 Introduction 97
5.2 Construction of the Statistical Reasoning Questionnaire (SRQ) 100
5.3 Results of the SRQ 107
5.3.1 Student responses to individual items 108
5.4 Discussion - Suitability of the SRQ 124
CHAPTER 6 RASCH ANALYSES FOR THE STATISTICAL
REASONING QUESTIONNAIRE 127
6.1 Introduction 127
6.2 Dichotomous Rasch Model 127
6.3 Polychotomous Rasch Model 131
6.3.1 Consistency of item step codes and difficulties 147
6.3.2 The SRQPC Score 152
6.4 Expected Responses to the SRQ 154
6.5 Discussion - Suitability of the SRQ 157
CHAPTER 7 FACTORS WHICH INFLUENCE STATISTICAL
REASONING 159
7.1 Introduction 159
7.2 Predictors of Statistical Reasoning of Incoming Students 160
7.2.1 Results for the 2004 cohort 161
7.2.2 Results for the 2005 cohort 162
7.2.3 Combining the 2004 and 2005 cohorts 162
7.2.4 Synthesising the results 164
7.2.5 Recent school leavers 165
7.2.6 Incorporating levels of numeracy 166
7.3 Predictors of the SRQPC Scores 168
7.4 Further Exploration of the Link between Numeracy and Statistical
Reasoning 170
7.5 Predictors of the Exam Component of Assessment 173
7.6 Discussion 178
CHAPTER 8 IMPLICATIONS 183
8.1 Introduction 183
8.2 Implications within the Research 183
8.2.1 Extent and limits of the study 188
8.3 Implications for Teaching, Assessment and Advising 190
8.4 Implications for Future Research 193
APPENDIX A. BACKGROUND INFORMATION SURVEY 196
APPENDIX B. 2004 ATTITUDINAL SURVEY 197
APPENDIX C. 2004 FOLLOW-UP ATTITUDINAL SURVEY 199
APPENDIX D. 2005 ATTITUDINAL SURVEY 200
APPENDIX E. NUMERACY QUESTIONNAIRE 201
APPENDIX F. STATISTICAL REASONING QUESTIONNAIRE 206
APPENDIX G. RESPONSES TO THE SRQ 215
APPENDIX H. PROJECT DESCRIPTION AND CRITERIA 220
APPENDIX I. END OF SEMESTER EXAM 224
BIBLIOGRAPHY 239
List of Figures
Figure 1.1 The concept map illustrates the structure of the literature. 4
Figure 2.1 Years since graduating from high-school, by cohort 28
Figure 2.2 OP scores reported by students for each cohort and each
program type 28
Figure 2.3 Level of maths reported as having been studied for each cohort 30
Figure 2.4 Maths B results by cohort 30
Figure 3.1 Responses for the affect aspect of attitudes tend to be positive. 39
Figure 3.2 Responses for the value aspect of attitudes are generally positive. 40
Figure 3.3 Responses for the motivation aspect of attitudes are somewhat
contradictory. 42
Figure 3.4 Responses for the links aspect of attitudes are strongly positive. 43
Figure 3.5 Responses for the use aspect of attitudes are negative. 44
Figure 3.6 Responses for the difficulty aspect of attitudes are relatively
neutral. 44
Figure 3.7 Responses for the self-efficacy aspect of attitudes tend to be
positive. 45
Figure 3.8 Responses for the Likert items repeated in 2005 51
Figure 3.9 Thoughts on statistics according to whether or not it was
considered beneficial in grade 11 and 12 57
Figure 4.1 The distribution of total Numeracy Scores is similar across
the two cohorts. 70
Figure 4.2 The variable map shows the students (on the left hand side) and
items (on the right hand side) displayed on a single logistic scale. 84
Figure 4.3 The residual plots for the model in Equation 4.2 show no
systematic concerns. 89
Figure 4.4 The residual plots for the model in Equation 4.3 show slight
indication of long-tailedness. 91
Figure 5.1 The distribution of SRQ Scores is consistent across the three
cohorts. 108
Figure 6.1 The variable map for the dichotomous Rasch model displays the
item and person parameter estimates. 130
Figure 6.2 The variable map for the Rasch partial credit model displays the
person ability and item step difficulty estimates. 146
Figure 6.3 The distribution of SRQPC Scores is essentially consistent across
the three cohorts. 153
Figure 6.4 Scatterplot showing differences in expected responses between
Maths groups 156
Figure 7.1 Residual plots for the model of Equation 7.1 show no systematic
concerns. 163
Figure 7.2 Residual plots for the model of Equation 7.2 show no systematic
concerns. 164
Figure 7.3 Residual plots for the model in Equation 7.7 indicate some left
skewness. 169
Figure 7.4 Using the transformed response variable, residual plots
demonstrate no systematic concerns. 170
Figure 7.5 Boxplots of Introductory and Intermediate Numeracy Scores by
response to SRQ_9 172
Figure 7.6 Residual plots for the model in Equation 7.7 show some indication
of skewness. 176
List of Tables
Table 2.1 Topics covered in MAB101 23
Table 2.2 Assessment tasks in MAB101 in 2004, 2005 25
Table 2.3 Percentage of male and female students for each cohort 26
Table 2.4 Numbers of students surveyed, by program type, gender and
cohort 27
Table 3.1 What was beneficial – as reported by students who considered
statistics beneficial in grade 11 and 12 54
Table 3.2 What wasn’t beneficial – as reported by students who considered
statistics not beneficial in grade 11 and 12 55
Table 3.3 When I think of probability and statistics at school, I think of … 56
Table 4.1 Course breakdown of student cohort and respondents over two
years 69
Table 4.2 Items and responses to Numeracy Questionnaire in order of
difficulty 72
Table 4.3 Measures of fit from Rasch analysis indicate that the model fits
well. 82
Table 4.4 Significance of predictors in the logistic regression for each item
on the Numeracy Questionnaire 93
Table 4.5 Student responses to item N_7 according to mathematical
background 95
Table 5.1 Questions in the SRQ are chosen to assess all aspects of reasoning
at the full range of levels. 106
Table 6.1 Fit statistics for the dichotomous Rasch model 128
Table 6.2 Each item response was coded for the Rasch partial credit model
on the basis of a substantive framework. 136
Table 6.3 Overall fit statistics for the Rasch partial credit model 144
Table 6.4 Individual item fit statistics for the Rasch partial credit model 145
Table 6.5 Expected responses to the SRQ predicted by the Rasch partial
credit model 155
Table 7.1 SRQ items for which successful and unsuccessful students reflect
a significant difference in mean Numeracy Scores 171
Table G Student responses to the SRQ 215
Statement of Original Authorship
The work contained in this thesis has not been previously submitted for a degree or
diploma at any other higher education institution. To the best of my knowledge and
belief, the thesis contains no material previously published or written by another
person except where due reference is made.
Signature: ____________________
Date: ___________
Acknowledgements
I wish to acknowledge the friends and colleagues whose support and companionship
made this process much easier and far more enjoyable than it would otherwise have
been, in particular:
• my supervisor, mentor and friend, Helen MacGillivray, whose professional
input and personal encouragement got me started and kept me going;
• my husband, Richard, whose commitment and devotion enabled me to focus
on the task;
• my children, Katie and Stephen, whose patience and understanding made it
possible;
• my friend, Christine, who shared the journey and prayed me through it.
Thank you.
Even youths grow tired and weary, and young men stumble and fall;
but those who hope in the Lord will renew their strength.
They will soar on wings like eagles;
they will run and not grow weary,
they will walk and not be faint.
(Isaiah 40:30-31)
Chapter 1 Introduction
1.1 Motivation
It is over twenty years since Barry Jones (1982) noted that:
Australia is an information society in which more people are employed in collecting,
storing, retrieving, amending and disseminating data than in producing food, fibres and
minerals and manufacturing products.
It is entirely appropriate then, that each year thousands of students enrol in
introductory statistics courses at universities: some by choice, most as a compulsory
component of a chosen program.
The awareness of this information society is reflected in recent educational reforms,
and the study of statistics has received increasing prominence at the school level,
with Chance and Data now forming a component of all Australian primary and junior
secondary curricula. Hence students enrolling in introductory tertiary courses do so
with varying degrees of statistical skill and experience. Yet it is not only this formal
statistical knowledge with which a student embarks on tertiary study. Students also
bring with them a collection of informal knowledge which they have accumulated
from different sources. As well as both these forms of statistical knowledge, a
number of components such as basic numeracy, mathematical inclinations, attitudes
towards statistics and self-efficacy also have the potential to impact on a student’s
ability to develop statistical thinking within the context of an introductory tertiary
course.
With the growth of statistics as a discipline in its own right, there has been an
increasing tendency amongst some of the statistical community to focus on the
differences between mathematics and statistics rather than the commonalities. This
has resulted in an inclination to devalue the role of numerical and mathematical skill
in the ability of students to develop statistical thinking. Establishing and quantifying
the importance of this role has the potential to facilitate a better understanding of the
programs and support which students need in order to develop their statistical
thinking at the secondary/tertiary interface.
1.2 Aims and Scope of the Study
The fundamental aim of this research is to investigate, explore and model the
statistical thinking of students at the interface of secondary and tertiary education. A
profile of students entering a general first year statistics unit is formed with respect
to: mathematical and general academic background; attitude, beliefs and self-efficacy
with regard to mathematics and statistics; basic numeracy; and statistical reasoning.
Relationships between these factors as well as course outcomes are explored, with
the strength of the relationship between statistical reasoning and numeracy being of
particular interest.
While the research has been conducted within the context of a single specific
introductory data analysis subject, it is believed that the broad findings are applicable
across a range of such courses, particularly within the Australian setting where
statistics courses are traditionally included in an undergraduate program. The subject
under investigation is conducted in a School of Mathematical Sciences and
completed within programs associated with science. While this setting is not
uncommon, particularly in Australia, in the past such students have less often been
the subjects of research into the development of statistical thinking than their social
science counterparts.
1.3 An Overview of the Statistics Education Literature
In the past ten to twenty years there has been a considerable increase in research in
the area of the teaching and learning of statistics, to the point where statistical
education is now a research field in its own right, supporting entire journals such as
the Journal of Statistics Education (JSE) and the Statistics Education Research
Journal (SERJ) as well as regular international conferences, of which the most
celebrated is the International Conference on Teaching Statistics (ICOTS), and
numerous local meetings throughout the world. Because of the breadth of issues
which touch on the research reported in this thesis, it is fitting to include an overview
of the broader field in which this work sits. The concept map in Figure 1.1 indicates
the links between the different aspects of the literature and assists in illustrating the
complexities of the research.
1.3.1 Statistical literacy, reasoning and thinking
The concepts of statistical understanding, literacy, reasoning and thinking form the
basis of much work and it is generally agreed that the attainment or improvement of
one or more of these should be the ultimate goal of all statistics courses (Garfield, et
al., 2002). However, in much of the literature, the terms statistical understanding,
statistical literacy, statistical reasoning and statistical thinking are used
interchangeably and frequently without definition (delMas, 2002a). Rumsey (2002),
Garfield (2002) and Chance (2002) seek to clarify the differences between three of
these concepts in a collection of articles, published following a symposium at the
2000 Annual Meetings of the American Educational Research Association (AERA).
Further elaborations and clarifications in continuation of this work are attempted in
Ben-Zvi and Garfield (2004).
The term statistical literacy most often refers to the type of understanding needed to
be a good consumer of statistics. After quoting a number of definitions for statistical
literacy, from
… the ability to understand statistical concepts and reason at the most basic level
(Snell, 1999)
to
People’s ability to interpret and critically evaluate statistical information and data-based
arguments appearing in diverse media channels, and their ability to discuss their
opinions regarding such statistical information (Gal, 2000),
Figure 1.1 The concept map illustrates the structure of the literature.
[Concept map node labels: Statistical Literacy, Reasoning & Thinking; What it is; How it works; What to teach; How to teach; What to assess; How to assess; Student Learning (What, How); Technology; Gaps in skills; Attitudes & Beliefs; Misconceptions; Background; Course; Self-Efficacy; Reasoning]
Rumsey (2002) elects instead to define two different concepts: statistical competence
as
the basic knowledge that underlies statistical reasoning and thinking
and statistical citizenship as
the ultimate goal of developing the ability to function as an educated person in today’s
age of information.
Her concept of statistical competence then mirrors Snell’s early definition of
statistical literacy, while her statistical citizenship reflects Gal’s early definition.
In a more recent exposition of statistical literacy, Gal (2004) rejects a common
understanding of statistical literacy as defining a basic set of skills required to
function successfully, for a broader understanding which involves the ability of
citizens to be successful data consumers. This relies upon the ability to interpret and
critically evaluate statistical information and to discuss and communicate reactions to
this information. Based on this definition, Gal builds a model of statistical literacy
founded on both knowledge and dispositional elements.
By analysing over 3000 school students’ responses in a number of studies, Watson
and Callingham (2003) argue that statistical literacy is a hierarchical construct. They
identify six levels of statistical literacy from idiosyncratic literacy where personal
beliefs and experience dominate engagement with context, through to critical
mathematical literacy which requires a questioning engagement with context and
proportional reasoning.
Garfield and Gal (1999) broadly define statistical reasoning as
the way people reason with statistical ideas and make sense of statistical information.
On the basis of research into students’ reasoning about samples, Garfield (2002)
further defines five hierarchical levels of statistical reasoning, ranging from
idiosyncratic reasoning, where the student knows and uses some statistical words
and symbols but often without understanding, through to integrated process
reasoning, where the student fully integrates all components, resulting in a complete
understanding. These levels are similar to Watson and Callingham’s six levels of
statistical literacy, but the different setting and reliance on verbal descriptors make
them difficult to compare directly.
Statistical thinking generally refers to having a statistical mindset or approach to
problems. Chance (2002) lists a number of definitions, all of which include the
importance of data, the presence of variability and the modelling or explanation of
variation.
In an attempt to develop a framework for the workings of statistical thinking as
practised by statisticians, Wild and Pfannkuch (1999) put forward a four dimensional
model in which each of the dimensions: the investigative cycle, types of thinking, the
interrogative cycle and dispositions, is described in detail. Aside from some further
elaboration (Pfannkuch and Wild, 2004), this thorough and complex model has
received little further development. It describes a level of thinking higher than that
defined by Chance (2002) and beyond the level of thinking at the secondary/tertiary
interface.
In describing statistical reasoning and attempting to distinguish it from statistical
thinking, delMas (2004) acknowledges that statistical reasoning and thinking may
both be involved when working on a single task, and that this implies that they
cannot be distinguished by the content of a problem. He claims however that the two
can be distinguished by the nature of a task. He describes statistical thinking as
involving tasks such as knowing when and how to apply statistical procedures, and
statistical reasoning as involving tasks such as explaining why results were produced
or how a conclusion is justified. There is an implication that statistical thinking
requires a degree of creativity which is not involved in statistical reasoning.
The term statistical understanding is even less clearly defined than the other three
concepts. Even within the series of articles aimed at clarifying statistical literacy,
reasoning and thinking, the use of statistical understanding varies. DelMas (2002a)
introduces the articles by describing statistical literacy, reasoning and thinking as
“types of understanding”, implying that statistical understanding is a broader concept
encompassing the other three. Garfield defines reasoning in terms of understanding
and Rumsey uses statistical understanding without any clear definition. In
summarizing the articles by Chance, Garfield and Rumsey, delMas (2002b) refers to
“statistical understanding, reasoning and thinking”, presumably taking statistical
understanding to be synonymous with statistical literacy.
Despite attempts to clarify these terms, there remains a degree of overlap, such that
continuing imprecision of their usage within the literature is inevitable. Broers
(2006) argues that, on the basis of this confusion and imprecision, a more appropriate
goal for statistics education is that of statistical knowledge which he argues is more
consistently definable and measurable.
1.3.2 Curricula and assessment
Clarification and modelling of the concepts and workings of statistical understanding
lead to an increased focus on what should be taught and assessed in introductory
statistics courses and how such teaching and assessment can be most effectively
carried out. Hogg (1991) calls educators to task for blaming difficulties on ill-
prepared and unmotivated students, while being poor teachers with little interest in
self-improvement. He emphasises the need for goals to be carefully selected and
course-specific. Moore (1997) pushes even more strongly for statisticians to
reconsider the content of introductory courses. His suggestion to emphasise
statistical thinking, rather than theory and computation, through the use of more data,
teaching of more concepts and fewer recipes and derivations, along with the
automation of computations and graphics, is now the norm for many courses. As a
result of studying the relationship between students’ perceptions of learning and
teaching, Petocz and Reid (2003) call for teachers to focus less on content and more
on encouraging students to appreciate the wider impact of thinking statistically.
Moore (1997) also emphasises a change in the context of statistical learning. He
argues that
our teaching must avoid the professionals’ fallacy of imagining that our first courses are
a step in the training of statisticians.
On the other hand, opening statistical vistas for undergraduate students provides
them with a greater understanding of the purpose and structure of statistics (Sowey,
1998). While the majority of students in such courses may never become
statisticians, some will and, hopefully, some will be attracted to statistics through
such courses (Wild, 2006). The danger in Moore’s approach is that the needs of
these students will be ignored if courses fail to provide the breadth and depth
on which to base further study, or even a perspective which encourages it. Indeed
much of the research in statistics education is performed from the perspective of
students who are uncomfortable with mathematics with little regard for the more
mathematically capable.
While the considerable discussion surrounding course content has resulted in
substantial changes to most curricula (Garfield, et al., 2002), there remains a
significant need to document the effects of this change, identify the most effective
educational techniques and build models of how students come to understand
statistical concepts (Chance and Garfield, 2002). In one effort to identify effective
educational techniques, delMas et al. (1999) report on research into the use of
computer simulations to improve student understanding of sample distributions and
the Central Limit Theorem. They report that student understanding is optimised by
requiring students to predict the nature of the sample before carrying out simulations.
Meletiou-Mavrotheris and Lee (2002), believing that students’ statistical
understanding is limited by a deterministic mindset, improve student outcomes in an
introductory, non-mathematical course by adapting the curriculum to include
numerous activities aimed at specifically creating an awareness of variation in
everyday contexts.
Changes in content have led to changes in assessment practices with the call for
teachers to assess what they value (Chance, 2002). However, Garfield et al. (2002),
in a survey of US statistics educators, report that of all areas of statistics education,
assessment practices have undergone the least reform. Garfield and Gal (1999) claim
that traditional tests are narrow and insufficient. Their list of alternative assessment
items includes
• individual or group projects
• portfolios of student work
• concept maps
• critiques of statistical ideas or issues in the news
• objective format questions aimed at assessing higher level thinking
• minute papers, where students have a minute immediately following a class
to write something about what they have learnt.
Assessment practices can have an explicit educational impact. MacGillivray (1998;
2002; 2005) explains how the use of own-choice group projects can synthesise
knowledge for meaningful use.
1.3.3 Research instruments for measuring statistical reasoning
Aside from the assessment issues within the context of a specific course, there is the
question of how to assess the broader concepts of statistical literacy, reasoning and
thinking. While believing that these concepts are best assessed via one-on-one
communication and in-depth tasks such as projects, Garfield (2003) acknowledges
the need, particularly for use in research, for the development of an instrument which
is both easy to administer and score, and which accurately measures statistical
reasoning. The Statistical Reasoning Assessment (SRA) was developed and
validated to measure the effectiveness of a new statistics curriculum for high-school
students in the US (Konold, 1990; Garfield, 1991, 2003) and has been used in a
variety of settings since. This instrument consists of twenty multiple-choice
questions, which purport to provide scores on eight separate scales of correct
reasoning and eight scales of incorrect reasoning. However, tests of validity and
reliability have yielded low scores (Garfield, 1998). Of particular concern are the
extremely low correlations achieved between SRA scores and course outcomes
(Garfield, 2003). Whether this is an indication that
statistical reasoning and misconceptions are unrelated to students’ performance in a first
statistical course,
as Garfield suggests, or an indication of a weakness in the instrument is unknown.
An adaptation of the SRA by Sundre (2003) is aimed at addressing issues of low
internal consistency and improving the ease of scoring. Tempelaar (2004) argues
that a better use of the SRA is to combine correct and incorrect reasoning scores, and
uses this approach in a structural equation model which analyses relationships
between statistical reasoning abilities, attitudes and learning approaches (Tempelaar,
2006).
A pen and paper survey has also been designed to assess school students’
understanding of variation (Watson, et al., 2003). Sixteen multipart questions require
short answer responses and often an explanation of the students’ reasoning, making
this instrument more difficult to score than Garfield’s SRA. Responses are then
coded and combined using an Item Response model which suggests four increasing
levels of understanding: prerequisites for variation, partial recognition of variation,
applications of variation and critical aspects of variation.
1.3.4 Student learning
Just as teaching practices necessarily influence what students learn, an improved
understanding of how students learn should likewise impact good teaching practices.
In a study of school students from grade 3 to 9, Watson and Moritz (1999) identify
two unistructural-multistructural-relational (U-M-R) learning cycles operating in the
development of understanding relating to the comparison of two data sets.
Responses in the first cycle compare data sets of equal size but do not recognise the
issue of non-equal sample size, while responses in the second cycle recognise and
resolve this issue. Similar learning cycles are identified in the development of
students’ concept of average (Watson and Moritz, 2000).
Petocz and Reid (2001) use in-depth interviews and phenomenography to model
students’ different concepts of learning in statistics. They describe students’
conceptions in terms of six hierarchical levels: doing, collecting, applying, linking,
expanding and changing. At the lowest level, doing, students see learning as simply
performing the required statistical activities to pass the assessment. At the highest
level, changing, learning is seen as being about using statistical concepts in order to
change their views.
1.3.5 Student background
Constructivist pedagogy is based on the belief that students
construct their own knowledge by combining their present experiences with their
existing conceptions (Moore, 1997).
Hence it is important, when modelling student learning, to consider the impact of a
number of background issues, including students’ prior statistical abilities,
mathematical skills and attitudes towards statistics.
Gal (2004) comments that:
... the current knowledge base about statistical literacy of school or university students
and of adults in general is patchy,
and points to the need for further research and empirical information. A profile of
what statistical understanding can be expected of school students in the areas of data
collection, data tabulation and representation, data reduction, and interpretation and
inference is being constructed based on Rasch modelling (Reading, 2002).
A survey of introductory statistics students in the US (Albert, 2003) shows that
students generally do not have a clear understanding of probability. While their
ability to calculate probabilities under an equally likely assumption is well
developed, they are much less able to use a frequency interpretation or a subjective
viewpoint of probability.
Konold (1995) indicates that students enter statistics courses with strongly held but
incorrect intuitions which are often extremely difficult to alter and that the situation
is further complicated by a student’s ability to simultaneously hold multiple
contradictory beliefs. Considerable time and effort has been put into documenting
and explaining a variety of misconceptions which people apply to make sense of
probabilities in everyday situations (Kahneman, et al., 1982; Shaughnessy, 1992).
The representative heuristic, by which people determine the likelihood of an event by
how well it represents a population, has received considerable attention. Under the
representative heuristic, the sequence of coin tosses HTTH is considered to be more
likely than the sequence HHHH. Hirsch and O’Donnell (2001) construct an
instrument to identify students who hold this misconception. Konold (1989) argues
that many misconceptions are the result, not of incorrect probabilistic reasoning, but
of an approach which always aims to predict the outcome of a single trial, rather than
the likelihood of a single event in a series of trials. He calls this the outcome
approach. Garfield and Chance (2000) include scales for measuring eight such
misconceptions in their Statistical Reasoning Assessment instrument.
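The coin-toss comparison above can be checked by direct enumeration. The following is a minimal sketch, assuming a fair coin and independent tosses (assumptions made here for illustration only; the function names are hypothetical). It shows that every specific ordered sequence of four tosses is equally likely, whereas the number of heads is not uniformly distributed, which is the feature the heuristic confuses with the likelihood of the sequence itself.

```python
from fractions import Fraction
from itertools import product

# Assumption for illustration: a fair coin with independent tosses.
P_HEAD = Fraction(1, 2)


def sequence_probability(sequence: str) -> Fraction:
    """Probability of observing one specific ordered sequence of H/T."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= P_HEAD if toss == "H" else 1 - P_HEAD
    return prob


def probability_of_head_count(n_tosses: int, n_heads: int) -> Fraction:
    """Probability that n_tosses tosses contain exactly n_heads heads, in any order."""
    total = Fraction(0)
    for outcome in product("HT", repeat=n_tosses):
        if outcome.count("H") == n_heads:
            total += sequence_probability("".join(outcome))
    return total


print(sequence_probability("HTTH"))     # 1/16
print(sequence_probability("HHHH"))     # 1/16
print(probability_of_head_count(4, 2))  # 3/8
print(probability_of_head_count(4, 4))  # 1/16
```

The two named sequences have identical probability; it is only the class of outcomes "two heads in four tosses" that is more likely than "four heads in four tosses".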
Although statistics is now frequently considered not to be a subfield of mathematics,
it is nonetheless true that statistics makes heavy and essential use of mathematics
(Moore, 1997).
Gal (2004) touches briefly on the numeracy requirements for statistical literacy,
acknowledging that there is some debate among statisticians regarding the level of
mathematical skill needed to understand statistical concepts. It is acknowledged as
one of the challenges of teaching statistics that:
Many students have difficulty with the underlying mathematics (such as fractions,
decimals, algebraic formulas), and that interferes with learning the related statistical
content. (Ben-Zvi and Garfield, 2004)
However, perhaps as a result of the democratisation of mathematics (Vere-Jones,
1995), there has been little formal research into the effect of mathematical
background on the development of statistical understanding in students. Cuthbert
and MacGillivray (2003) show that gaps in assumed skills form a barrier for learning
in mathematics courses for engineers. Gnaldi (2003) shows that in a statistics course
for psychologists, the statistical understanding of students at the end of the course
depends on students’ basic numeracy, rather than the number or level of previous
mathematics courses the students have undertaken. In modelling statistics as a
second language, Lalonde and Gardner (1993) conclude that mathematical ability
influences success in an introductory statistics course for psychologists both directly
and through mathematical anxiety. An analysis of background factors of students in
a UK business statistics unit shows that there are no simple predictors of success or
failure (Pokorny and Pokorny, 2005).
Before the effect of mathematical ability on statistical understanding can be
considered, the level of such ability amongst students needs to be understood. While
studies such as those conducted by the International Association for the Evaluation
of Educational Achievement (IEA) have investigated the level of mathematical skill
of school students, little formal research has been undertaken into the numeracy
levels of students entering university.
1.3.6 Student attitudes
Gal et al. (1997) point to the influence of attitudes and beliefs about statistics on
student learning. A number of instruments have been designed to measure attitudes
and beliefs towards statistics. Roberts and Bilderback (1980) construct the Statistics
Attitude Survey (SAS) using a 5-point Likert scale with 34 items. Wise (1985)
argues that many items on the SAS measure past achievement rather than attitude
and are inappropriate if students have no prior statistical experience. He proposes
instead another 5-point Likert scale: the Attitudes Toward Statistics (ATS). Roberts
and Reese (1987) compare the SAS and ATS on 280 students and conclude that there
is little difference between the two instruments with a correlation of 0.88. Waters et
al. (1988) reach a similar conclusion on the basis of 302 students. Sutarso (1992)
designs a 10-point Likert scale purported to consist of six factors: student’s interest
and future applicability, relationship and impact of the instructor, attitude toward
statistical tools, self-confidence, parental influence, and initiative and extra effort in
learning statistics. A survey consisting of 28 7-point Likert-type items designed by
Schau et al. (1995) includes scales of affect, cognitive competence, value and
difficulty.
Gal et al. (1997) stress the need to develop ways, other than Likert-type scales, of
assessing attitudes. In particular they point to the need to allow students to explain
their responses. All existing instruments are constructed from the supposition that
many student difficulties with statistics are the result of anxiety. While items are
framed in both a positive and negative light, there are no items that suggest that
statistics is dull or simplistic, as more mathematically capable students sometimes
report.
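When items are framed in both a positive and a negative light, responses to the negatively worded items are commonly reverse-coded before item scores are combined. The following is a minimal sketch of that step, assuming a 5-point scale; the item names and responses are invented for illustration and are not taken from the SAS, the ATS or any other published instrument.

```python
# Hypothetical illustration: item names and responses are invented.
SCALE_MIN, SCALE_MAX = 1, 5          # assumed 5-point Likert scale
NEGATIVE_ITEMS = {"stats_is_scary"}  # assumed negatively worded items


def reverse_code(response: int) -> int:
    """Map 1<->5, 2<->4 and 3<->3 on the assumed 5-point scale."""
    if not SCALE_MIN <= response <= SCALE_MAX:
        raise ValueError(f"response {response} outside {SCALE_MIN}-{SCALE_MAX}")
    return SCALE_MIN + SCALE_MAX - response


def attitude_score(responses: dict[str, int]) -> int:
    """Sum item responses after reverse-coding the negatively worded items."""
    total = 0
    for item, response in responses.items():
        total += reverse_code(response) if item in NEGATIVE_ITEMS else response
    return total


student = {"stats_is_useful": 4, "stats_is_scary": 2, "i_enjoy_stats": 3}
print(attitude_score(student))  # 4 + (6 - 2) + 3 = 11
```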
According to Bandura’s (1986) social cognitive theory, self-efficacy, that is one’s
own confidence in one’s ability to succeed in a particular task, strongly influences
one’s actual ability to succeed in that task. Self-efficacy influences choices, thought
patterns, reactions, effort expended and perseverance while struggling with a task.
Pajares and Miller (1995) provide evidence that, as judgements of self-efficacy are
task-specific, accurate measures of self-efficacy need to be based on task-specific
items in the same area. Finney and Schraw (2003) argue that previously used
measures of self-efficacy in statistics are too general in nature to accurately measure
the construct. They develop the Current Statistics Self-Efficacy (CSSE) and Self-
Efficacy to Learn Statistics (SELS) scales, which have been trialled on an
introductory statistics course in educational psychology. The difficulty with their
scales is that the items are so specific to the material to be covered in the course and
to which students have had no previous exposure, that scores on the CSSE
administered at the beginning of the course are predictably low and uncorrelated with
course outcomes.
1.3.7 Impact of technology
Pea (1987) describes the effect of technology on mathematics education in terms of
an amplifier metaphor, i.e. that with computers students can carry out many more
calculations in a much shorter time and with much higher accuracy, but with minimal
change to the quality of education. By comparison, Ben-Zvi (2000) proposes a more
optimistic reorganisation metaphor, arguing that appropriate use of technology can
bring about structural change by allowing the student to focus on higher order tasks.
This belief places improved use of technology in the role of a major mechanism for
the reform in statistics education which the increased presence of technology has
largely motivated, as,
The increasing use of computers, not just within the discipline but in society in general
has placed an increasing premium on qualitative reasoning in general and on statistical
reasoning in particular (Cobb, 1999).
1.4 Literacy, Reasoning, Understanding or Thinking?
The terms statistical understanding, statistical reasoning, statistical literacy and
statistical thinking are all used, often interchangeably, in the literature. Definitions
and elaborations of these terms have been described in Section 1.3.1. Because a
degree of overlap exists between these concepts and in order to simplify discussion,
we will use the term statistical reasoning to encompass statistical reasoning and
literacy and, at a fundamental level, statistical thinking. At its most in-depth level,
statistical thinking refers to the complexity of thinking utilised by professional
statisticians and described by Wild and Pfannkuch (1999). The statistical thinking
included in this research is restricted to the introductory level.
1.5 Structure of the Research
This study has been conducted within the context of an introductory data analysis
course delivered to undergraduate students over three cohorts: semesters I and II
2004 and semester I 2005. A small pilot study was conducted in the summer
semester prior to semester I 2004. The students’ courses are broadly associated with
science and the majority are enrolled in their first semester of tertiary education.
Ethical approval for the research was granted by the university’s ethics committee,
and students who participated did so on a voluntary basis after indicating their
agreement in writing.
Data were collected by the use of questionnaires administered during class. These
questionnaires were used to gather information on students’ mathematical and
general educational backgrounds, attitudes towards mathematics and statistics, basic
numeracy skills and statistical reasoning. Students were asked to provide their
student identification number as a means of connecting the different surveys. The
participation rate was very high amongst those students who attended classes during
the survey period. In a large course such as this there are a substantial number of
students who do not attend classes at any time during semester and a significant
number who join after the first week. Approximately seventy-five percent of
enrolled students participated in the study over the three cohorts. Some data from
course assessment are also used in part of the analysis.
Although a number of instruments exist for measuring attitudes towards statistics, it
was felt that none completely met the needs of this study as their design had been for
use in non-mathematical or non-scientific service courses. A Likert-style instrument
was designed, covering the aspects of affective reactions towards statistics; the
perceived value in studying statistics; the perceived difficulty of statistics; motivation
to study statistics; perceived links between mathematics and statistics; the use of
statistics in society; and students’ self-efficacy regarding mathematics and statistics.
In the area of self-efficacy, particular attention was given to constructing items which
were specifically relevant to an introductory data analysis course but also within the
students’ previous experience.
This initial Attitudinal Survey was administered in Semester I 2004 with a similar
follow-up survey at the end of semester. Due to the small number and non-
representative nature of respondents to the follow-up survey, as well as the
complexities of interpretation of the meaning of comparisons of initial and follow-up
responses, a follow-up procedure was not completed with later cohorts. As initial
analyses with the 2004 data indicated that only the self-efficacy component of the
attitudinal survey was a significant predictor of statistical reasoning, the opportunity
was taken in 2005 to adjust the survey, including items more specific to high school
experiences of statistics. It should be noted that attitudes and beliefs about statistics
do not constitute a major emphasis of this research but are included for completeness
and to improve the strength of conclusions regarding the investigation into statistical
reasoning and its relationship with variables such as numeracy and student
background.
Although there has been an increasing trend in recent years to develop and
administer diagnostic surveys to incoming tertiary students to gauge their levels of
mathematical preparation, the instruments used for these surveys have tended to be
constructed to meet specific needs. A tool designed specifically to measure the pre-
calculus skills relevant to an introductory data analysis course and appropriate at the
secondary/tertiary interface could not be located in the literature and needed to be
constructed for this survey. The multiple-choice Numeracy Questionnaire designed
for this purpose was administered at the beginning of semester to each of the three
cohorts in the study. The mathematical skills on which the Numeracy Questionnaire
focuses are those skills commonly associated with an introductory data analysis
course, namely: handling of fractions, percentages and decimals; application of
operations; evaluation of simple expressions; substitution into expressions; and
handling simple equalities and inequalities. Use of calculators was not permitted for
the Numeracy Questionnaire.
The Statistical Reasoning Questionnaire (SRQ) was developed specifically for this
study to assess the statistical reasoning of Australian students at the
secondary/tertiary interface. This instrument was informed by the SRA, a US
instrument which was not entirely appropriate in the Australian context, and by
research at the Australian primary and secondary school level. The short-answer,
multiple-choice SRQ was administered at the beginning of semester to each of the
three cohorts in the study with the addition of one question and substitution of
another in 2005. Use of calculators was permitted for the SRQ.
The analysis of the Numeracy Questionnaire and the construction and analysis of the
Statistical Reasoning Questionnaire, together with the analysis of relationships
between these and other variables, form the major thrust of this research.
The pressures of course requirements and the need to maximise student cooperation
prevented a follow-up Statistical Reasoning Questionnaire from being administered
as part of this study. Indeed, what would constitute a suitable follow-up statistical
reasoning instrument at the end of an introductory tertiary course is a matter for
further research. However, the performance of students in course assessment should
be considered as an important aspect of their statistical development, provided the
assessment is both authentic and relates to objectives that address statistical
reasoning. Hence the end of semester data from the course assessment are used as a
measure here, recognising that richer measures may be developed in future research.
This research uses quantitative statistical procedures to analyse relatively large data
sets. Because of the lack of previous research which considers the different aspects
of students’ development at this level, this research is investigative and exploratory
in nature. Descriptive procedures and exploratory data analysis are used to explain
the results of each questionnaire. Item response theory, in the form of Rasch
dichotomous and partial credit models, is used to validate the Numeracy and
Statistical Reasoning Questionnaires and to further understand the implication of
student responses. Logistic regression is used to identify factors which determine
whether a student’s response to individual items on the Numeracy Questionnaire is
likely to be correct or incorrect. Modelling of numeracy and statistical reasoning is
done through general linear models, the use of which is substantiated through
residual analysis. More complex modelling procedures such as structural equation
modelling are considered inappropriate without some form of pre-existing model.
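For readers unfamiliar with the dichotomous Rasch model mentioned above, the following is a minimal sketch of its standard form, in which the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both expressed in logits. The function name and the numerical values are hypothetical and chosen only for illustration; they are not estimates from this study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """P(correct response) under the dichotomous Rasch model (parameters in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values for illustration only.
print(rasch_probability(0.0, 0.0))            # 0.5: ability equals difficulty
print(round(rasch_probability(1.0, 0.0), 3))  # 0.731: ability one logit above difficulty

# Expected number correct on a hypothetical three-item test for one student.
item_difficulties = [-1.0, 0.0, 1.5]
expected_total = sum(rasch_probability(0.5, b) for b in item_difficulties)
print(round(expected_total, 2))
```

Because persons and items are placed on the same logit scale, items can be ordered by difficulty and students by ability along a single line, which is what allows levels of understanding to be described from the fitted model.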
1.6 Outline of the Thesis
Chapter 2 describes the context of this research in terms of the course, MAB101
Statistical Data Analysis 1, and the students who are the subjects of the study. The
course is outlined, including its purpose, aims, content and methods; and the student
population of MAB101 is described through the information obtained from the
Background Information Survey completed by students as part of the study.
In Chapter 3, the construction and findings of the Attitudinal Surveys are described.
Findings from the initial 2004 survey demonstrate a generally positive attitude
towards statistics with students being confident of the links that exist between
mathematics and statistics. They are, however, suspicious of its use in society and
demonstrate internal conflict regarding their motivation to study it although their
confidence in their ability to do so is high. A follow-up survey which was completed
by fewer students appears to indicate that the attitudes of this smaller group tend to
be less positive at the end of the semester than they are at the beginning but there are
difficulties with the interpretation of follow-up results. These and the reasons for not
repeating the follow-up procedure in 2005 are also discussed. In an adjusted 2005
survey, the belief of many students is that statistics at school is beneficial only in the
final two years of high school.
Chapter 4 develops and analyses a Numeracy Questionnaire to measure the level of
basic numeracy possessed by students at the interface of secondary and tertiary
education, embarking on a degree program associated with science. The structure of
this multiple-choice questionnaire, emphasising skills needed for an introductory
data analysis subject, is explained and the results of the questionnaire amongst
MAB101 students over two years are presented and discussed. Validity of the
questionnaire is confirmed through fitting a dichotomous Rasch model to the data.
This model is also used to define five levels of increasing understanding
demonstrated by the students. General linear models are used to show that students’
total scores on the Numeracy Questionnaire are related to their result in high school
mathematics, whether or not they can be considered to be a maths student, their
gender, the level of mathematics they have previously studied and their self-efficacy.
Logistic regression is also applied to each individual item on the Numeracy
Questionnaire to identify which of these factors are significant indicators of success
or failure for the item. This chapter demonstrates how tertiary educators need greater
awareness of the lack of emphasis on mathematical skills in the pre-senior years, and
the extent of consolidation needed at the senior school level for students to be able to
apply these skills at the tertiary level. The results of this chapter are of significance
in themselves and have been accepted for publication as a paper Counting on the
Basics: Mathematical skills amongst tertiary entrants (Wilson and MacGillivray,
2007).
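As an aid to interpreting item-level logistic regression, note that when a model contains an intercept and a single binary factor, the fitted coefficient for that factor equals the log odds ratio of the corresponding two-by-two table of counts. The following minimal sketch computes that quantity; the counts are invented for illustration and are not results from the Numeracy Questionnaire.

```python
import math


def log_odds_ratio(correct_with: int, incorrect_with: int,
                   correct_without: int, incorrect_without: int) -> float:
    """Log odds ratio of a correct response, comparing students with and without a factor."""
    odds_with = correct_with / incorrect_with
    odds_without = correct_without / incorrect_without
    return math.log(odds_with / odds_without)


# Hypothetical counts: 60 of 100 students with the factor answer correctly,
# compared with 40 of 100 students without it.
coefficient = log_odds_ratio(60, 40, 40, 60)
print(round(coefficient, 3))            # 0.811
print(round(math.exp(coefficient), 2))  # 2.25, the odds ratio
```

A positive value indicates that the odds of a correct response are higher for students with the factor; models with several factors require iterative fitting in statistical software.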
In Chapter 5, we introduce an instrument, the Statistical Reasoning Questionnaire
(SRQ), for use at the interface of secondary and tertiary education. Construction of
the SRQ is described, drawing on the work of Garfield (1991; 2003), and Watson
and Callingham (2003). Particular attention is given to avoiding questions based on
combinatorial reasoning and those relying on examples of coin tossing and dice
throwing. Complete details of the responses of the students in the study to the
individual items of the SRQ are given, demonstrating the range of abilities present in
students at the secondary/tertiary interface.
In Chapter 6, the Statistical Reasoning Questionnaire is analysed using the
techniques of Rasch methods. Two different approaches are taken. In the first
approach, responses are scored dichotomously and the simple Rasch model fitted as
with the Numeracy Questionnaire in Chapter 4. This dichotomous approach is used
as a basis for the definition of the SRQ Score. In the second approach, responses are
scored polychotomously and the more complex Rasch partial credit model fitted to
the data. This model forms the foundation for the introduction of a score which we
will denote by the SRQPC Score. The construction of this SRQPC Score is a
significant development and extension of the work of Watson and Callingham (2003)
and applies their framework of statistical literacy in a new scoring approach. The
Rasch partial credit model is also used to investigate the expected responses of
students to individual items in the SRQ, indicating questions which are heavily
influenced by the level of mathematics previously studied. This chapter is concluded
with a critical examination of the SRQ confirming its suitability for use as an
instrument for measuring statistical reasoning at the secondary/tertiary interface.
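The difference between the two scoring approaches can be illustrated with the standard form of the Rasch partial credit model, in which an item scored 0, 1, ..., m has one step difficulty for each transition between adjacent scores. The following is a minimal sketch under that standard formulation; the function names and the step difficulties are hypothetical and are not estimates from the SRQ.

```python
import math


def pcm_category_probabilities(ability: float,
                               step_difficulties: list[float]) -> list[float]:
    """Probabilities of scores 0..m under the Rasch partial credit model (logits)."""
    # Cumulative sums of (ability - step difficulty); the score-0 term is an empty sum.
    cumulative = [0.0]
    for step in step_difficulties:
        cumulative.append(cumulative[-1] + (ability - step))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(ability: float, step_difficulties: list[float]) -> float:
    """Expected item score for a student of the given ability."""
    probs = pcm_category_probabilities(ability, step_difficulties)
    return sum(score * p for score, p in enumerate(probs))


# Hypothetical item with two steps, i.e. possible scores 0, 1 and 2.
probs = pcm_category_probabilities(0.0, [-1.0, 1.0])
print([round(p, 3) for p in probs])                  # [0.212, 0.576, 0.212]
print(round(expected_score(0.0, [-1.0, 1.0]), 3))    # 1.0
```

With a single step difficulty the function returns the dichotomous Rasch probabilities, so the partial credit model can be read as a generalisation of the simpler model to items with more than two score categories.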
Chapter 7 specifically addresses the major aim of this study in better understanding
factors which influence statistical thinking at the secondary/tertiary interface. In this
chapter, two distinct aspects of this are considered. The first of these examines
performance on the Statistical Reasoning Questionnaire for incoming students, using
general linear models to describe each of the SRQ Score and SRQPC Score in terms
of students’ numeracy, attitudes, mathematical backgrounds and demographic
variables. The major implication of this modelling is the usefulness and importance
of the students’ Numeracy Scores in explaining statistical reasoning. This feature is
such that numeracy dominates the prediction in nearly all models, with both
introductory and intermediate components being significant. The second aspect
considers students’ performance on the end of semester section of the assessment. In
this case, modelling is dominated by the effect of tertiary entrance score with the
SRQPC Score helping to explain the result in a way the SRQ Score does not. The
relationship between numeracy and individual items of the SRQ is also investigated
more closely, indicating that the link between numeracy and statistical reasoning is
not restricted to items which are more difficult or more obviously mathematical by
nature.
The thesis concludes with a summary of the research and its implications for the
teaching, assessment and advising of students. Possibilities for further research are
considered.
Chapter 2 The Context of the Study
2.1 Introduction
This study has been conducted within the context of an introductory data analysis
course delivered to undergraduate students. In this chapter, this context is described
in terms of the course and the students. Section 2.2 outlines the course, MAB101
Statistical Data Analysis 1: its purpose, aims, content and methods. Section 2.3
describes the student population of MAB101 through the information obtained from
the Background Information Survey completed by students as part of this study.
2.2 The Course: Statistical Data Analysis 1
Statistical Data Analysis 1, MAB101, is delivered by the School of Mathematical
Sciences at the Queensland University of Technology. It services students enrolled
in a range of science-oriented degree programs including: Mathematics, all the areas
of Applied Science, Biotechnology Innovation and Education; and double degree
programs involving mathematics or science, such as: Applied Science/Mathematics,
Applied Science/Information Technology, Applied Science/Law, Applied
Science/Business, Applied Science/Education, Arts/Applied Science,
Mathematics/Information Technology, Engineering/Mathematics and
Mathematics/Business. For almost all students in these programs, MAB101 is a
compulsory subject, most commonly scheduled for the first year and with the biggest
groups in the first semester of tertiary study. The course is given thrice each year,
including a summer semester with small numbers of students. In 2004, 450 students
enrolled in it in first semester and 130 in second semester, while in 2005, 330
students enrolled in it in first semester, and 180 in second.
MAB101 is unusual in the sense that while it acts as a service course for most
students, for others it is a core course in their major field of study. Students enrolled
in an applied science degree majoring in biochemistry, for example, may be unlikely
to fit in any other mathematics or statistics course during their three year program,
while those undertaking a Bachelor of Mathematics, a Bachelor of Applied Science
majoring in Mathematics, or a double degree incorporating the Bachelor of
Mathematics, may complete up to half of their mathematics program in statistics.
This dual role means that MAB101, by necessity, aims to equip all students with the
tools necessary to apply basic statistical analyses to data drawn from a range of
contexts, with minimal mathematical derivation, while providing a foundation for
techniques studied in further statistics courses.
According to the official course description, MAB101 aims to provide students with
the essential grounding in statistical concepts, methods and analysis of data suitable for
application to real issues and as a basis for handling data and variation in all areas of
modern science, technology, industry and associated fields.
It builds on the statistics component of high school mathematics, commencing with
the organisation, exploration and presentation of data. It emphasises choice of
appropriate techniques for presentation and analysis of data, interpretation of results
and reporting of conclusions. Key statistical skills and concepts are provided as a
foundation for more advanced statistics courses.
MAB101 teaches introductory data analysis from exploratory data analysis through
to multiple regression. Aspects of probability included in the course are limited to
the estimation of probabilities from data; understanding p-values; and calculating
normal probabilities. Probability and basic distribution theory are covered in a
separate course taken by fewer students. Material which is frequently taught in an
introductory data analysis course but is not included in MAB101 includes:
probability rules, distributions and mathematical aspects of introductory statistics.
All ANOVA and regression in MAB101 are done through computer software. Also,
models and simple formulae are made available in words or symbols with students
free to choose their preferred form. Table 2.1 lists the topics which are covered in
the course together with the approximate number of 50 minute lectures spent on each
section of material.
Topic | Lectures
Planning investigations, collecting, handling and presenting data: types of data; types of variables; coding data; design of data spreadsheet; bar charts; pie charts; contingency tables; dot plots; histograms; stem & leaf plots; boxplots; scatterplots. | 4
Data features & summary statistics: mean; median; quartiles; standard deviation; variance; skewness; parameters, models & estimates. | 2
Chi-squared tests (used to introduce hypothesis testing & p-values): testing proportions; testing independence. | 3
Normal data and applications of normal distribution: working with normal probabilities; behaviour of sample statistics, particularly average. | 3
Interval estimation: confidence interval for µ, σ known and unknown; confidence interval for µ1 - µ2, σ1, σ2 known; confidence interval for µ1 - µ2, σ1 = σ2 unknown; confidence interval for µ1 - µ2, paired; confidence interval for p; tolerance intervals; calculating sample size for desired precision. | 5
Hypothesis testing for one or two means, variances and proportions: for µ = µ0, σ known and unknown; for µ1 - µ2 = d, σ1, σ2 known; for µ1 - µ2 = d, σ1 = σ2 unknown; for µ1 - µ2 = d, paired; for σ1 = σ2; for p = p0; for p1 = p2; for σ = σ0. | 4
ANOVA: one way; randomized blocks; two way, interaction, plots, general linear model; multiple comparisons; residuals, test for equal variance. | 6
Regression: simple linear; residual plots and diagnostics; multiple regression. | 6
Table 2.1 Topics covered in MAB101
The content of MAB101 is presented to students in a manner that is rich in both data
and contexts. All topics are introduced, explained and illustrated using examples
which are of interest to students and with features relevant to the wide range of
disciplines from which the students are drawn. Most of these examples use data sets
which have been collected by past students as part of their assessment. These have a
range of variables and any particular data set is often used in different sections of the
course to illustrate selection as well as application of techniques, and the component
parts of a complex data investigation.
Classes in MAB101 consist of three 50 minute lectures per week and one 50 minute
practical class. Lectures build on the textbook Data Analysis: Introductory methods
in context (MacGillivray, 2004) and any PowerPoint slides or extra examples or
discussion are made available to students online after lectures have been delivered.
Much lecture time is devoted to examples, questions and computer demonstrations.
A weekly online diary keeps students informed of the activities carried out in classes.
In practical classes the focus is on students applying the methods they have learned
to given data sets using the statistical package Minitab. Ample guidance is provided
in written form and by experienced tutors. Students are also actively encouraged to
attempt the exercises given in the textbook (which either do not rely on computer
usage or are exercises on using given computer output), with solutions provided
online progressively throughout the course. Students are continually reminded that
learning in statistics comes only via doing, and course delivery and assessment are
structured to encourage students to take this approach. Analysis of variance and
regression are covered only through use of statistical software.
Assistance with the core learning of the course is provided through continuous
assessment, namely: fortnightly quizzes, a mid-semester test and a group project.
The full assessment schedule for 2004 and 2005 is shown in Table 2.2. Quizzes consist
mainly of multiple-choice and fill-in-the-blank questions, but these questions are
carefully designed to emphasise the selection, application and interpretation of
techniques. Because the main purpose of the quizzes is to encourage learning, after
the first quiz, students are allowed up to five days to complete each quiz. The mid-
semester test, like the end of semester exam, consists of questions which are longer
than, but similar in style to, those used in quizzes. All the assessment is constructed
around real datasets with computer output to be used in answering questions. As the
datasets usually involve many variables, a single dataset is often used more than once
to assess different aspects of the course.
A significant facilitator of learning in MAB101 is the group project. Groups of three
to four students choose a topic, plan an investigation, collect and analyse data and
report on their study. Students are required to synthesise the concepts and techniques
presented in the course as they work through the planning, investigating, analysing
and reporting stages of the project. The benefits to learning of this style of task have
been reported elsewhere (MacGillivray, 1998, 2002, 2005). A copy of the project
description and criteria is included in Appendix H.
Assessment                                 Weight
Fortnightly quizzes     best 5 out of 6    10% total
Mid-semester test                          10%
Group project                              20%
End of semester exam                       max 60%
Optional essay          optional           10% [1]
Table 2.2 Assessment tasks in MAB101 in 2004 and 2005
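The weighting in Table 2.2, together with the optional essay rule in its footnote, can be sketched as a small calculation. This is only an illustration under stated assumptions: every component is taken to be marked on a 0 to 100 scale, the quizzes are assumed to be equally weighted, and "provided this advantages the student" is read as taking the larger of the two possible totals. The function name `final_mark` and its parameters are hypothetical and do not come from the course documentation.

```python
def final_mark(quizzes, mid_semester, project, exam, essay=None):
    """Combine component marks (each assumed to be on a 0-100 scale)."""
    # Best 5 of the 6 fortnightly quizzes, worth 10% in total.
    best_five = sorted(quizzes, reverse=True)[:5]
    quiz_part = 0.10 * sum(best_five) / 5

    # Components whose weight does not depend on the essay.
    fixed = quiz_part + 0.10 * mid_semester + 0.20 * project

    # Without an essay the exam carries the remaining 60%.
    without_essay = fixed + 0.60 * exam
    if essay is None:
        return without_essay

    # With an essay: essay 10%, exam scaled to 50%.
    # Assumption: the essay scheme is applied only when it gives a higher total.
    with_essay = fixed + 0.10 * essay + 0.50 * exam
    return max(without_essay, with_essay)
```

For example, `final_mark([50, 60, 70, 80, 90, 0], 70, 80, 60, essay=90)` drops the quiz scored 0 and uses the essay scheme, because the essay mark is higher than the exam mark.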
The topics listed in Table 2.1 describe the statistical content which forms the basis of
MAB101. The methods and techniques of this content are the fundamental skills
which students who complete the course with a sound passing grade can be expected
to take from the course and with them into their own area of study. The statistical
skills assessed within the course constitute the measurable (Broers, 2006) and
therefore assessable goals of the course.
[1] If a student chooses to submit an optional essay (on aspects of how statistics revolutionised science in the twentieth century) it is worth 10% and the exam is scaled to 50%, provided this advantages the student.
The skills assessed within MAB101 incorporate statistical thinking, although the
degree to which this is acquired by students depends on whether they participate in
all the learning experiences, as well as their inherent capabilities and background.
The content- and data-rich environment described above is the means by which
statistical thinking is taught. Varied contexts provide the opportunities to develop an
appreciation of the omnipresence of variation as well as a “general awareness and
critical perspective” (Gal, 2004) of the use of statistical methods. The use of real
data sets, particularly including numerous variables extraneous to the method being
illustrated, encourages in the student’s mind the formation of links and relationships
between concepts and techniques. This in turn develops a mindset which is, in
essence, statistical thinking.
2.3 Demographics and Educational Backgrounds of Students
Data for this study was collected from a total of 782 MAB101 students (382 in
semester I 2004, 103 in semester II 2004, and 297 in semester I 2005) who provided
identifying information and completed at least one survey instrument. Of these
identifiable students, participation rates for the Background Information Survey (see
Appendix A) completed in classes during the first week of semester were: 90% in
first semester 2004, 67% in second semester 2004, and 82% in first semester 2005.
Demographic information such as gender and course enrolment could be obtained for
most students who had not completed the Background Information Survey, while
information such as mathematical background could not.
In all three cohorts there were similar numbers of male and female students with
slightly more males in both first semester groups and slightly more females in the
second semester group. (See Table 2.3.)
Gender    I_04   II_04   I_05   Total
female      49      52     47      48
male        51      48     53      52
Table 2.3 Percentage of male and female students for each cohort
Students were drawn from 36 different university programs. In Table 2.4, these are
shown by cohort and have been classified as Education, including double degrees
involving education (Edu); Mathematics, including double degrees one of which is
mathematics, and Applied Science, maths major (Mat); Other double degrees
(ODD); Applied Science degree, non-maths major (ASn); Biotechnology (Bit); Other
(Oth). Over half the students are non-maths majors enrolled in an Applied Science
degree.
Program        I_04            II_04            I_05        Total
type       male  female    male  female    male  female      (%)
Edu          23      11       8       5      12      17      9.7
Mat          42      20      12       0      36      21     17.2
ODD           7       9       8       9      10      13      7.2
ASn         101     131      19      39      79      73     56.5
Bit          16      14       0       1      13      13      7.4
Oth           4       0       2       0       7       2      1.9
Table 2.4 Numbers of students surveyed, by program type, gender and cohort
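As a rough cross-check, the gender split for each cohort can be recomputed from the counts in Table 2.4. The sketch below simply pools the counts over program types. The names `COUNTS` and `percent_male_by_cohort` are illustrative only, and the percentages in Table 2.3 may be based on a slightly different set of students, so exact agreement should not be assumed.

```python
# Counts transcribed from Table 2.4: program type -> cohort -> (male, female).
COUNTS = {
    "Edu": {"I_04": (23, 11), "II_04": (8, 5), "I_05": (12, 17)},
    "Mat": {"I_04": (42, 20), "II_04": (12, 0), "I_05": (36, 21)},
    "ODD": {"I_04": (7, 9), "II_04": (8, 9), "I_05": (10, 13)},
    "ASn": {"I_04": (101, 131), "II_04": (19, 39), "I_05": (79, 73)},
    "Bit": {"I_04": (16, 14), "II_04": (0, 1), "I_05": (13, 13)},
    "Oth": {"I_04": (4, 0), "II_04": (2, 0), "I_05": (7, 2)},
}


def percent_male_by_cohort(counts):
    """Pool counts over program types and return the percentage of males per cohort."""
    totals = {}
    for by_cohort in counts.values():
        for cohort, (male, female) in by_cohort.items():
            m, f = totals.get(cohort, (0, 0))
            totals[cohort] = (m + male, f + female)
    return {cohort: 100 * m / (m + f) for cohort, (m, f) in totals.items()}


percent_male = percent_male_by_cohort(COUNTS)
```

Rounding the values in `percent_male` to whole numbers allows a direct comparison with the "male" row of Table 2.3.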
Students were also classified as maths or non-maths students, based on their program
and specific subject enrolment. This variable, used in analyses in Chapters 4 and 7,
indicates essentially an interest in mathematics and probably an intention for further
study in the area. As well as those classified in Table 2.4 as being in a Mathematics
program, approximately 85% of the education students and a small number of other
double degree and other students were classified as maths students. Over all
cohorts, 26% were classified as maths students, the figure ranging from 23% in
second semester 2004 to 28% in first semester 2005. Over the three cohorts there is
dependence between gender and maths students (p-value < 0.001) with 64% of maths
students and 47% of non-maths students being male. There is however substantial
variation between cohorts with the percentage of maths students who are male
ranging from 56% in first semester 2005 to 79% in second semester 2004.
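A test of dependence of this kind can be sketched as a Pearson chi-square test on a 2x2 table of counts. The thesis does not state which test produced the reported p-value, so the choice of test here is an assumption. The counts for the maths/non-maths classification are also not given in this section, so the example uses the Mathematics program row of Table 2.4 (pooled over cohorts) against all other programs as a stand-in. This is a narrower grouping than the maths-student variable, and its percentages will not match the 64% and 47% quoted above. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value); no continuity correction is applied.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)

    statistic = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row_totals[i] * col_totals[j] / n
            statistic += (observed - expected) ** 2 / expected

    # For 1 degree of freedom, P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Rows: Mathematics programs (Mat), all other programs; columns: male, female.
# Counts pooled over the three cohorts in Table 2.4 (a stand-in grouping).
mat_vs_rest = [[90, 41], [309, 337]]
statistic, p_value = chi_square_2x2(mat_vs_rest)
```

A small `p_value` indicates that the proportion of males differs between the two groups by more than would be expected from chance alone.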
[Figure 2.1: bar charts of per cent of students by Years Since School (YSS: 0, 1, 2, 3-5, 6-10, >10), one panel for each cohort (I_04, I_05, II_04). Percent within levels of Cohort. Missing values excluded.]
Figure 2.1 Years since graduating from high-school, by cohort
[Figure 2.2: two plots of reported OP scores. Left: OP by Cohort, I_04 (197), I_05 (211), II_04 (62). Right: OP by Program, Sci_non-maths (260), Biotech (33), Education (43), Maths (87), Double_deg (38).]
Figure 2.2 OP scores reported by students for each cohort and each program
type. Smaller OP values indicate higher achievement. Values in brackets represent number of students responding to this item.
Students were asked in the Background Information Survey to report the year in
which they completed high school. Figure 2.1 illustrates the responses for those 76%
of identifiable students who completed this item. In the two first semester cohorts,
55% to 60% of students had completed high school in the previous year with another
18% in their second year out of school. In the second semester 2004 cohort fewer
students (45%) had completed high school in the previous year and 23% in the year
before. This difference is due largely to the different scheduling of MAB101 in
some programs or to students changing courses. In all three cohorts approximately
97% of students had completed 12 years of schooling and almost 90% had attended
high school in Queensland. For approximately 75% of students in both first semester
cohorts this was their first semester at QUT.
Tertiary entrance in Queensland is determined largely on the basis of a single score,
the OP (overall position) score. OP scores range from 1 to 25, with 1 being the
highest score. As part of the Background Information Survey, students were asked to
record their OP score or an equivalent measure. Scores which were reported in
another form (e.g. tertiary entrance scores from other Australian states) were
converted to an OP equivalent using standard national information (QUT, 2004).
Because the survey used in first semester 2004 was an adaptation of a form routinely
completed by students, this section was described as optional and hence a number of
students who completed the Background Information Survey in the first cohort did
not report their OP score. The non-response rate for this item was 43% in that cohort
whereas it was only 10 to 15% in the other two cohorts. While the indications are
clear that this non-reporting of OP scores is non-random, it is difficult to predict the
exact nature of the bias. Although it appears that large numerical scores are
underreported, this cannot be assumed in any formal analysis. For this reason
considerable care needs to be exercised for this particular cohort when interpreting
any statistical models involving OP scores. (See Chapters 4 and 7.) Figure 2.2
shows the distribution of reported OP scores for each cohort and according to the
program division used in Table 2.4, except for the small group (9) of Others.
The median OP for each of the three cohorts is 7. The largest OP cut-off for any of
the programs represented in MAB101 is 13 (for an applied science degree and some
education degrees). Most programs have a cut-off of 10 or 12 but some courses
represented here have cut-offs as tight as 4 (for a science/law double degree).
Students reporting an OP score outside these cut-offs are generally mature age
students for whom special entry rules apply. Note that the median OP for students
undertaking a mathematics program is 3 although the cut-off for this course is 12.
[Figure 2.3: bar charts of per cent of students by level of maths (below_B, Maths_B, above_B), one panel for each cohort (I_04, I_05, II_04). Percent within levels of Cohort.]
Figure 2.3 Level of maths reported as having been studied for each cohort.
[Figure 2.4: bar charts of per cent of students by Maths B result (N, P, D), one panel for each cohort (I_04, I_05, II_04). Percent within levels of Cohort.]
Figure 2.4 Maths B results by cohort
Students were also asked to report the level of mathematics which they had studied at
high school and any mathematics courses which they had studied since leaving
school. A wide variety of mathematical backgrounds were reported. For the purpose
of this study, these backgrounds have been summarised by comparison with the core
Queensland algebra and calculus based subject, Maths B. Maths B, or an equivalent
standard, is assumed, but not enforced, prior knowledge for entry to the Science
faculty and hence into MAB101. Students are classified as having studied a level of
mathematics below Maths B, at Maths B or above Maths B. Students classified as
having studied above Maths B had either taken an extension mathematics subject in
high school (e.g. Maths C in Queensland) or had already studied some tertiary level
mathematics. The results of this classification are shown by cohort in Figure 2.3.
For the two first semester cohorts, (of the 92 to 95% of students by whom this
information was provided) 4 to 6% were classified as below Maths B, 38% above
Maths B and the remainder at Maths B. For the second semester 2004 cohort, (of the
88% who responded) only 2% were below Maths B, 50% at Maths B and 48% above
Maths B.
It was also noted, from responses to the survey question on previous mathematics
courses, which students had some prior exposure to a statistics course, with the
percentage being 6 to 7% for each cohort. This information however has not been
used in any of the analyses of statistical reasoning as the level of statistics exposure
in this group is exceedingly variable and includes courses which cover little more
statistics than the standard high school curriculum with Maths B, as well as students
who had previously enrolled in MAB101 but withdrawn or failed. It should be noted
that most of the “failures” in this course are students who attend little of the course.
Results of high school mathematics subjects (and higher subjects where they had
been studied) were also requested from students and provided by 80% of identifiable
students. These results were variously expressed depending on the particular subject
studied. The most convenient classification for use in analyses was to describe the
Maths B (or equivalent) result as one of three levels: D, for students obtaining a
distinction or higher (the top two levels on a school-based five-point scale); P, for
other passing grades; N, for those students who had failed to pass or not studied
Maths B. These results are shown in Figure 2.4.
For all three cohorts the proportion of distinctions is between 55 and 60%. Not
surprisingly there are strong relationships between the Maths B result and both the
level of maths studied and whether or not a student is a maths student. For maths
students the proportion obtaining a D as a Maths B result is 87% and for students
who have studied beyond Maths B the proportion is 82%.
Combining the information on all demographic and background variables, it is
apparent that a good deal of consistency exists between the two first semester
cohorts. The variable which notably departs from this trend is the reported OP score
and, as has been described, this departure is clearly due to non-random
non-reporting by students in semester I 2004. In most aspects, the semester II cohort
departs more noticeably but not substantially from the other two cohorts. This cohort
was also considerably smaller than the other two.
Chapter 3 Attitudes Towards Mathematics and Statistics
3.1 Introduction
Gal and Ginsburg (1994) assert that many students arrive at statistics courses with
affective reactions and attitudes towards statistics which impact upon both their
learning process and learning outcomes, and argue the importance of educators being
sensitive to these factors. The purpose of including measures of non-cognitive
factors in this study was firstly to improve understanding of the profile of students
entering the introductory data analysis course, MAB101, and secondly to investigate
the possible impact of these factors on statistical learning.
While authors such as McLeod (1992) enunciate differences between the concepts of
attitudes and beliefs towards mathematics, this study follows the nomenclature of
most statistical education research, using the term attitudes to include both attitudes
and beliefs. The understanding is that these are relatively stable responses which
involve both positive and negative feelings (unlike emotions which are generally
unstable).
A number of instruments exist for measuring attitudes towards statistics. The
Statistics Attitude Survey (Roberts and Bilderback, 1980), the Attitudes Towards
Statistics (Wise, 1985), the Students’ Attitudes Towards Statistics (Sutarso, 1992)
and the Survey of Attitudes Toward Statistics (Schau, et al., 1995) are all based on
five to ten point Likert scales which claim to measure one or more constructs related
to a student’s attitudes towards statistics. Although there has been some debate over
which of these is the best scale, it was felt that none completely met the needs of
this study. In general these instruments have been designed for use in non-
mathematical and frequently non-scientific service courses where students may be
expected to demonstrate a considerable level of mathematical anxiety. Accordingly
these scales fail to include items which allow for a student’s frustration with lack of
mathematical connections as may be experienced by more mathematically capable
students in the context of early statistical experiences. As MAB101 is delivered by a
mathematical sciences department and services mathematics majors and mostly
scientifically-based students, it was considered important to gauge the full range of
student attitudes in this study.
This chapter describes the construction and results of the Attitudinal Surveys
administered to students in this study. Section 3.2 explains the survey conducted in
2004 while in Section 3.3 the subsequent findings are described. A follow-up
Attitudinal Survey was administered at the end of the semester in 2004, the results of
which are explained in Section 3.4. Section 3.4 also explains the difficulties with the
follow-up results and the reasons for not repeating a follow-up in the following year.
As only one component of the 2004 Attitudinal Survey, that of self-efficacy, was
significant in the analysis of Statistical Reasoning (described in Section 7.2), it was
decided to adjust the Attitudinal Survey in 2005, retaining only the self-efficacy
component and replacing the other items with a selection of items which focuses on
statistical experiences at the school level. The 2005 Attitudinal Survey is described
in Section 3.5 and the responses analysed in Section 3.6. Section 3.7 concludes this
chapter with a discussion of the results.
3.2 Construction of the 2004 Attitudinal Survey
Existing surveys of attitudes towards statistics are all formulated in the style of Likert
scales, from five to ten point. In a Likert scale respondents are requested to rate their
level of agreement with a selection of statements. Although Gal et al. (1997) suggest
a move away from this style of instrument, the ease of administration and scoring,
together with the need to maximise the cooperation of students who were being
requested to complete a number of questionnaires, made a Likert scale the instrument
of choice for this study. Students were asked to rate their level of agreement with
nineteen items from strongly disagree through to strongly agree. Space was provided
on the reverse side of the form for students to explain their responses. However,
very few students took this opportunity and the majority of those who did added very
little information to what they had supplied via the ratings. Hence all analysis and
discussion of responses in this chapter are restricted to the Likert scale responses.
The Survey of Attitudes Toward Statistics (Schau, et al., 1995) measures attitudes on
the basis of four separate subscales: affect, value, difficulty and cognitive
competence. These subscales directed the construction of the 2004 Attitudinal
Survey used in this study. Our survey includes items focusing on affect and value,
combines cognitive competence with difficulty, and also includes motivation,
perceived links between mathematics and statistics, use and self-efficacy. The
complete 2004 Attitudinal Survey can be found in Appendix B.
As is the custom with Likert scales, the survey contains a mix of positively and
negatively worded items to discourage unthinking responses. When calculating
scores for groups of items, scoring is reversed for those items which are negatively
worded.
Measures of affect focus on the feelings and emotions which tend to be engendered
in the student by statistics. In the 2004 Attitudinal Survey, the following three items
measure affect:
• Statistics is boring (A_1);
• I don’t like statistics because there never seems to be a right or wrong answer
(A_5);
• I feel insecure when I have to do a statistics problem (A_6).
Item A_5 investigates the feelings that students might have regarding the uncertain
nature of statistics. Sometimes students with a strong mathematical or scientific bent
entertain a black and white approach to the world and so feel uncomfortable with the
degree of subjective interpretation sometimes required in data analysis.
Attitudes regarding value reflect the importance or worth which students attribute to
their learning of statistics. This may be in relation to their field of study, their future
employment prospects or perhaps their daily life. In this survey, value is measured
by four items:
• Statistics will be valuable in my chosen career (A_4);
• Statistical skills will make me more employable (A_8);
• I use statistics in my everyday life (A_9);
• Understanding statistics is important in modern society (A_12).
Schau et al. (1995) differentiate between cognitive competence and difficulty. They
use cognitive competence to describe how difficult the individual personally finds
statistics, while difficulty describes the individual’s broader perception of the
difficulty and complexity of statistics. In formulating the 2004 Attitudinal Survey
one item was chosen from each of these aspects and the two have been combined
under the broad heading of difficulty. The two items are:
• I find statistics easy (A_3);
• Statistics is a complicated subject (A_7).
A further attitudinal aspect which is investigated by the survey is the students’
motivation to learn statistics. Two items relate to this aspect:
• I want to learn more statistics (A_11);
• I am taking this statistics unit only because I have to (A_14).
Given the apparently contradictory nature of these items, one might expect a high
degree of negative correlation between the responses.
A further aspect of attitude which was considered relevant to this study was the link
which students perceive as existing between mathematics and statistics, which we
measured via the items:
• I would do better at statistics if I were better at maths (A_10);
• If you are good at maths you are more likely to understand basic statistical
concepts (A_13).
As the growth of statistics as a discipline in its own right has taken place, and the
emphasis in statistical education on reasoning rather than computation has
developed, the impression is sometimes given by researchers that any association in
the minds of students between mathematics and statistics is likely to be a barrier to
statistical learning. For example, Gal et al. (1997) refer to:
Beliefs about the extent to which statistics is a part of mathematics or requires
mathematical skills (e.g. statistics is all computations).
This assumption and its logical consequences need closer examination. For students
who have a positive attitude towards mathematics, such a link surely encourages a
positive attitude towards statistics.
One item has been included in the survey to measure students’ attitude towards the
use of statistics in society. This item
• Statistics can be used to justify almost anything (A_2),
constitutes the aspect use.
The final five items of this survey are aimed at obtaining a broad measure of
students’ self-efficacy regarding mathematics and statistics. Self-efficacy is one’s
confidence in one’s own ability to succeed in a particular task. As outlined in
Chapter 1, the emphasis in the literature on the need for measures of self-efficacy to
be task-specific has led to previous measures consisting largely of items with which
beginning students would be unfamiliar. This has been avoided in the Attitudinal
Survey by focusing on students’ perception of their ability regarding aspects crucial
to an introductory statistics course, to which they should have had previous exposure,
as well as more general aspects of ability. The five self-efficacy items are:
• I am good at maths (A_15);
• I expect to do well in this unit (A_16);
• I am not confident of my ability to read and interpret information presented
graphically (A_17);
• I expect to be able to do the computing necessary for this unit (A_18);
• I expect to have trouble determining which procedure to use to answer
questions (A_19).
For each of the nineteen items on the 2004 Attitudinal Survey, students were asked to
rate their level of agreement on a five-point Likert scale from strongly disagree
through to strongly agree. When scores were formulated from the survey, responses
were scored as -2, for strongly disagree, through to 2, for strongly agree, with the
scoring reversed for items which are negatively framed. Scores were calculated for
each of the seven attitudinal aspects described above by averaging over the items in
that aspect. If a general attitude scale is desired, it can be calculated as the total score
over the aspects: affect, value, motivation, use and one of the two difficulty items [2].
We believe that the links aspect should not be included in such a scale as we feel
there is some question over which direction constitutes a more positive general
attitude. Self-efficacy is possibly best kept as a separate scale.
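The scoring scheme described above can be sketched in a few lines. The grouping of items into aspects follows Section 3.2. The set of negatively worded items is our reading of the item wording and should be checked against Appendix B. Averaging over only those items a student answered is an additional assumption, since the treatment of missing responses is not described here. The names `LIKERT`, `ASPECTS`, `NEGATIVE`, `item_score` and `aspect_scores` are illustrative only.

```python
# Response codes scored from -2 (strongly disagree) to 2 (strongly agree).
LIKERT = {"SD": -2, "D": -1, "N": 0, "A": 1, "SA": 2}

# Items grouped by attitudinal aspect, as listed in Section 3.2.
ASPECTS = {
    "affect": ["A_1", "A_5", "A_6"],
    "value": ["A_4", "A_8", "A_9", "A_12"],
    "difficulty": ["A_3", "A_7"],
    "motivation": ["A_11", "A_14"],
    "links": ["A_10", "A_13"],
    "use": ["A_2"],
    "self_efficacy": ["A_15", "A_16", "A_17", "A_18", "A_19"],
}

# Assumption: items judged to be negatively worded from their text.
NEGATIVE = {"A_1", "A_2", "A_5", "A_6", "A_7", "A_14", "A_17", "A_19"}


def item_score(item, response):
    """Score one response, reversing the sign for negatively worded items."""
    score = LIKERT[response]
    return -score if item in NEGATIVE else score


def aspect_scores(responses):
    """Average item scores within each aspect.

    `responses` maps item labels to response codes. Items that were not
    answered are skipped (an assumption); an aspect with no answered
    items gets None.
    """
    result = {}
    for aspect, items in ASPECTS.items():
        scores = [item_score(i, responses[i]) for i in items if i in responses]
        result[aspect] = sum(scores) / len(scores) if scores else None
    return result
```

For instance, a student who strongly disagrees that statistics is boring (A_1) receives 2 for that item after reversal, so larger scores always correspond to more positive attitudes.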
3.3 Results from the 2004 Attitudinal Survey
As with all instruments in this study, the 2004 Attitudinal Survey was completed by
MAB101 students in class during the first week of semester. The survey was
completed by 301 students in first semester and a further 94 students in second
semester. Despite the differences between these cohorts described in Chapter 2, the
pattern of response for most items in the Attitudinal Survey was very similar for the
two cohorts. In order to discuss differences where they do occur, the two cohorts
have been kept separate in the discussion which follows.
Figures 3.1 to 3.7 illustrate the distribution of responses for each cohort within each
item. Items are grouped by aspect. The ordering of responses in the plots (from
strongly disagree through to strongly agree, or from strongly agree through to
strongly disagree) has been chosen so that, for each item, responses from left to right
demonstrate what is generally considered to be an increasingly positive attitude. For
the link items, which could be ordered in either direction, greater acknowledgement
of a link between mathematics and statistics has been taken as a more positive
attitude.
[2] Item A_7, which states “Statistics is a complicated subject”, is possibly best excluded as it could be seen to reflect greater exposure to statistics rather than a negative attitude.
3.3.1 Affective responses of students towards statistics
Figure 3.1 indicates that the feelings engendered by statistics are generally positive,
with one item, A_1 “Statistics is boring”, having a very strong neutral tendency.
Both the neutrality of this item and the general positivity of the affect aspect are a
little less pronounced in second semester than in first.
[Figure 3.1: dot plots of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for items:
A_1 Statistics is boring;
A_5 I don't like statistics because there never seems to be a right or wrong answer;
A_6 I feel insecure when I have to do a statistics problem.]
Figure 3.1 Responses for the affect aspect of attitudes tend to be positive.
3.3.2 Students’ perceived value in studying statistics
Figure 3.2 indicates that these students have a strong positive attitude regarding the
value of statistics in society. This is particularly evident with respect to the value of
statistics for future employment and in modern society. However, this positivity
does not extend to the value of statistics in daily life. The pattern of response to this
item (A_9) is interesting in its even split among “disagree”, “neutral” and “agree”,
rather than a strong neutral tendency. This item demonstrates the diversity of
appreciation which students have for the role of statistics in everyday life. There are
no apparent differences in the value aspect between the two cohorts.
[Figure 3.2: dot plots of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for items:
A_4 Statistics will be valuable in my chosen career;
A_8 Statistical skills will make me more employable;
A_9 I use statistics in my everyday life;
A_12 Understanding statistics is important in modern society.]
Figure 3.2 Responses for the value aspect of attitudes are generally positive.
3.3.3 The motivation of students to learn statistics
Figure 3.3 indicates that the students’ attitudes regarding motivation are somewhat
contradictory. While there is strong agreement with the statement “I am taking this
statistics unit only because I have to,” there is also a relatively positive response to “I
want to learn more statistics.” Given that MAB101 is a compulsory unit for most of
its students, the first of these is not surprising and the second should therefore be
taken as an encouraging approach in spite of this compulsion. There is some
suggestion that the proportion of students responding “strongly agree” to the first of
these items (A_14) is slightly higher in second semester than in first.
[Figure 3.3: dot plots of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for items:
A_11 I want to learn more statistics;
A_14 I am taking this statistics unit only because I have to.]
Figure 3.3 Responses for the motivation aspect of attitudes are somewhat
contradictory.
3.3.4 The links which students perceive between statistics and mathematics
Figure 3.4 indicates that these students are in no doubt as to the links that exist
between mathematics and statistics. This is strongly evidenced in the non-personal
item A_13. The less positive personal link in A_10 should be interpreted in the light
of responses to A_15 “I am good at maths” (see Figure 3.7). Given that the students
generally see themselves as being good at maths, agreement that they would do
better at statistics if they were better at maths should be interpreted as
acknowledgement of a strong link between the two.
[Figure 3.4: dot plots of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for items:
A_10 I would do better at statistics if I were better at maths;
A_13 If you are good at maths you are more likely to understand basic statistical concepts.]
Figure 3.4 Responses for the links aspect of attitudes are strongly positive.
3.3.5 How students see the use of statistics in society
Figure 3.5 indicates a generally negative attitude towards the use of statistics in
society. Because of the wording of the question, this attitude is probably best
interpreted as one of cynicism.
[Figure 3.5: dot plot of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for item:
A_2 Statistics can be used to justify almost anything.]
Figure 3.5 Responses for the use aspect of attitudes are negative.
3.3.6 Students’ perceived difficulty of statistics
Figure 3.6 indicates a relatively neutral attitude with regard to difficulty. The
attitudes are a little more negative with regard to the complexity of the subject than
the personal experience of it.
[Figure 3.6: dot plots of responses (SD, D, N, A, SA) by cohort (I_04, II_04) for items:
A_3 I find statistics easy;
A_7 Statistics is a complicated subject.]
Figure 3.6 Responses for the difficulty aspect of attitudes are relatively neutral.
3.3.7 Self-efficacy of students in statistics and mathematics
Figure 3.7 indicates that these students have a decidedly positive self-efficacy
regarding mathematics and statistics. The single item A_19 regarding ability to
choose a correct procedure has a neutral response. In general the responses are very
similar for the two cohorts although the second semester group is a little more
confident regarding their ability to do well in the subject. This is most likely due to
greater experience at university.
[Figure 3.7: responses by cohort (I_04, II_04). Panels: A_15 “I am good at maths” (scale SA, A, N, D, SD); A_16 “I expect to do well in this unit” (SA to SD); A_17 “I am not confident of my ability to read and interpret information presented graphically” (SD to SA); A_18 “I expect to be able to do the computing necessary for this unit” (SA to SD); A_19 “I expect to have trouble determining which procedure to use to answer questions” (SD to SA).]
Figure 3.7 Responses for the self-efficacy aspect of attitudes tend to be positive.
In summary, the students demonstrate a generally positive attitude towards statistics.
They are, however, suspicious of its use in society and demonstrate internal conflict
regarding their motivation to study it. The first and second semester cohorts of 2004
had very few and only minor differences between them.
3.4 2004 Attitudes Follow-up
In first semester 2004, students were asked to complete a follow-up attitudes survey
during the final week of the course. The form of this was identical to the initial
survey apart from minor changes in wording which made the survey appropriate to
this timing. The complete follow-up survey is found in Appendix C.
While 301 students had completed the survey at the beginning of the semester, only
194 completed the follow-up, 56 of whom could not be matched with their initial
data. Hence there were only 138 students for whom initial and end of semester
attitudes could be compared.
Comparing final and initial attitudes of this smaller group of students showed that the
students’ attitudes tended to be less positive at the end of the semester than they had
been at the beginning. For each of the attitudinal aspects, affect, value, motivation,
links, difficulty and self-efficacy, the follow-up score was significantly less than the
initial score (p<0.001 from pairwise comparisons within each aspect). The only
aspect for which this was not the case was the single item aspect use where the
follow-up score was just significantly greater than the initial score (p=0.05), possibly
reflecting a less cynical attitude towards statistics.
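The matching arithmetic and the kind of paired comparison described above can be sketched as follows. This is an illustration only: the text does not say which paired test was used, so a paired t statistic is assumed here, and the scores are hypothetical values rather than study data.

```python
import math
import statistics

# Counts reported in the text.
followup_respondents = 194
unmatched_followups = 56
matched = followup_respondents - unmatched_followups  # students with both surveys


def paired_t(initial, followup):
    """Paired t statistic for follow-up minus initial scores (assumed test)."""
    diffs = [f - i for i, f in zip(initial, followup)]
    mean_diff = statistics.mean(diffs)
    std_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return mean_diff / std_error


# Hypothetical aspect scores for five matched students (illustrative only).
initial_scores = [4.0, 3.5, 4.5, 3.0, 4.0]
followup_scores = [3.5, 3.0, 4.0, 3.0, 3.5]
t_stat = paired_t(initial_scores, followup_scores)
print(matched, round(t_stat, 2))
```

A negative statistic corresponds to follow-up scores that are lower than the initial scores for the same students.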
However, the general approval rating for the course is high, particularly for a
compulsory course, with 65% of students rating the course as good or very good and
a further 31% rating it as satisfactory in the official QUT student evaluation of the
course.³ Given this apparent inconsistency, it is important to consider more carefully
how to interpret the follow-up results.
One issue which should be considered is the change during the course in students’
perspectives. Section 3.3 described the generally positive attitudes of the incoming
students. Observation and informal feedback from students during semester have
indicated that students tend to find during the course that there is more to statistics
than they had previously believed. Their earlier, narrower view may have been
encouraged by a repetitive and relatively trivial coverage of statistics at school level. Hence the
demonstrated change in attitudes may reflect a growing awareness of the depth and
complexity of statistics as students are exposed to a wider range of applications and
techniques. The slight evidence of a decrease in cynicism demonstrated in the use
aspect may be related to this growing appreciation. Thus it is possible that the
changes in attitudes from the beginning to the end of a course such as MAB101 are
complex.
A second factor to consider is the obvious confounding which exists with the time of
follow-up. The low response rate at follow-up is not only a problem in itself but is
also indicative of this confounding: students are experiencing maximum pressure as
the semester draws to a close. This is a time when significant assessment is both
current and impending in this and all other courses. Many students skip classes when
under deadline pressure. Most are feeling overwhelmed with work. It is therefore
not unexpected that such feelings would considerably influence the results of an
attitudinal survey.
³ These results were obtained in I_2005. I_2004 were surveyed by different means, but II_2004 results were only slightly lower.
A third factor which could introduce further complexity is that the respondents for
the follow-up survey are not likely to be a random selection from the initial
respondents. Whether they are more or less likely to be struggling students, and
therefore the nature of any likely bias in their attitudes, is not known.
Finding a better time to administer the follow-up attitudinal survey is a complex
problem to solve. If follow-up is attempted sufficiently before the end of the
semester as to overcome this difficulty, then students will not have experienced the
complete impact of the course. The only alternative, to survey students after
the completion of the course, is equally unsatisfactory. Although this could be easily
carried out using email lists, it would be unlikely that many students would respond
and in such situations it is usually found that those who do respond tend to be those
who are more emphatic in their attitudes than the complete cohort.
As described below in Section 3.5, the Attitudinal Survey was changed in 2005 to
more closely investigate experiences of statistics at the school level. This change,
together with the complications of interpretation described above, meant that follow-
up surveys were not conducted with subsequent cohorts in the study.
3.5 Construction of the 2005 Attitudinal Survey
After the completion of the 2004 data collection stage of this study, initial modelling
was carried out to investigate relationships between students’ backgrounds (see
Chapter 2), attitudes, basic numeracy skills (see Chapter 4) and statistical reasoning
(see Chapter 5). (Chapter 7 describes the relationships between these and the
techniques and results of this modelling over the entire study.) All of the attitudinal
aspects described above (Section 3.2) were included as possible predictors of
statistical reasoning, while only the self-efficacy component was considered as a
relevant predictor for numeracy. In these initial models, self-efficacy was the only
attitudinal aspect which was convincingly significant in contributing to the
explanation of either numeracy or statistical reasoning. An interaction between
Numeracy Score and affect was significant in modelling statistical reasoning but was
seen as being due to a small number of unusual observations. Some of the other
aspects appeared significant in complex and seemingly spurious interactions for
subgroups of students. For these reasons, apart from the self-efficacy aspect, much
of the 2004 Attitudinal Survey was not administered in 2005. Rather, the
opportunity was taken to survey the students more explicitly on their school
experience of statistics.
In 2005, students were asked to respond to six Likert-style items: A_15 to A_19 on
self-efficacy and A_14 on motivation, of the 2004 Attitudinal Survey. Students were
also asked to complete the statement:
When I think of probability and statistics at school, I think of...
and to indicate whether or not they found statistics beneficial at each of three
educational stages:
• grade 11 and 12 (senior secondary)
• grade 8 to 10 (junior secondary)
• grade 4 to 7 (middle to upper primary),
and to indicate what was or was not beneficial. The complete 2005 Attitudinal
Survey can be found in Appendix D.
3.6 Results from the 2005 Attitudinal Survey
The 2005 Attitudinal Survey was completed by 247 MAB101 students during class
in the first week of first semester. Responses to the open-ended questions were
completed on the questionnaire while responses to the six items from the 2004
survey were recorded on digitally scanned sheets. Twenty-one students completed
the open-ended questions only.
Figure 3.8 illustrates the responses for 226 students to the six Likert-style items. For
each item in the self-efficacy scale, responses are generally more positive than for
2004. Hence this group of students would appear to be even more confident than in
2004. The responses to item A_14, “I am taking this unit only because I have to,”
are more negative than in 2004, particularly with regard to the proportion who
strongly agree with this statement.
[Figure 3.8: responses of the 2005 cohort. Panels: A_14 “I am taking this statistics unit only because I have to” (scale SD, D, N, A, SA); A_15 “I am good at maths” (SA to SD); A_16 “I expect to do well in this unit” (SA to SD); A_17 “I am not confident of my ability to read and interpret information presented graphically” (SD to SA); A_18 “I expect to be able to do the computing necessary for this unit” (SA to SD); A_19 “I expect to have trouble determining which procedure to use to answer questions” (SD to SA).]
Figure 3.8 Responses for the Likert items repeated in 2005
Responses to the question
Did you find probability and statistics (sometimes called chance and data)
beneficial
a) in grades 11 and 12?
b) in grades 8 to 10?
c) in grade 4 to 7?⁴
are summarised in Table 3.1. Nineteen of the 247 students did not respond to any of
the three components of this question. Of the students who responded to at least one
of the components, there were 11% who felt that statistics had not been beneficial at
any level and another 3% whose only expressed opinion was that it was not
beneficial. There were 25% of students who felt it had been beneficial at all levels.
Perhaps the most notable feature of these figures is that 78% of students felt that
probability and statistics in grades 11 and 12 was beneficial and that 29% felt it was
beneficial only at that level, with another 9% indicating that it was beneficial in
grades 11 and 12 but not expressing an opinion regarding earlier years. While the
response for the senior years is pleasing, it raises the question of the standing of the
chance and data strands in the Queensland P-10 syllabus undertaken by these
students.
In response to the open-ended question of what was or was not beneficial, most
students who responded referred to the usefulness (or lack thereof) of their learning
with regard to life or education, or to specific or generic skills they had acquired.
Table 3.1 gives a summary of the benefits reported by those students who considered
statistics beneficial in grade 11 and 12.
Specific skills which students felt they had acquired included reading results of
surveys, finding probabilities and creating graphs, while generic skills included
problem solving, logic, and reading and understanding ‘things’. Students who felt
statistics had been beneficial to other subjects generally referred to using it to handle
data in science subjects at school, while one student mentioned its use in a business
subject. Students who felt statistics was useful for further study generally presumed
that what they had learnt at school would be helpful in MAB101.
⁴ Of those who expressed an opinion for at least one level, 13% expressed no opinion regarding grades 4 to 7, and 11% expressed no opinion before grades 11 and 12, possibly due to lack of recall.
What was beneficial                    Percent
useful for further study                    13
useful for life                              8
useful for other subjects                    9
useful for further study and job             1
useful for further study and life            1
useful for other subjects and life           2
practical                                    1
gain specific skills                        12
gain generic skills                          6
better mark                                  2
interesting                                  2
challenging maths                            1
different maths                              1
easy                                         1
fun maths                                    1
good foundation                              2
builds on previous years                     1
assignment                                   1
understood gambling                          1
no response                                 34
Table 3.1 What was beneficial – as reported by students who considered statistics beneficial in grade 11 and 12
Table 3.2 summarises the reasons given by those students who considered statistics
not to be beneficial in grade 11 and 12. These students most often referred to the
lack of practical experience with statistics, with one student commenting:
We learnt about it but we didn’t apply it to real life situations. We didn’t get to use the
stuff we learnt.
There are clearly substantial differences in students’ statistical experiences at school.
Of those students who felt statistics was beneficial at one stage of high school and
not at another, only a few expanded on both what was and what was not beneficial.
It was possible to find cases where Student A reported a specific benefit at Stage 1 of
schooling and a specific non-benefit at Stage 2, while Student B reported the same
benefit at Stage 2 and the same non-benefit at Stage 1. Some of this is attributable to
external factors. Different schools and teachers provide different learning
experiences within the same curriculum. However, some differences are part of the
students’ subjective learning experience. The implication from some responses, for
example:
Too young to care,
(from a student who felt statistics was beneficial only in grade 11 and 12)
is that a certain level of maturity is required before much learning can be interpreted
(even retrospectively) as beneficial. The challenge for teachers to engage students in
meaningful applications is ever-present.
What wasn’t beneficial       Percent
not practical                     15
couldn't see use                   9
not interested                     9
not used yet                       7
not useful for life                7
difficult                          7
not sufficient depth               4
too much repetition                4
poorly taught                      2
too many numbers                   2
no response                       34
Table 3.2 What wasn’t beneficial – as reported by students who considered statistics not beneficial in grade 11 and 12
Responses to the item:
When I think of probability and statistics at school, I think of…
are summarised in Table 3.3, with percentages of students who mentioned each
concept. As the item is open-ended, students are able to list as many thoughts as
they wish. Hence, responding with one of the concepts listed does not exclude a
student from responding with another.
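Because each student may mention several concepts, the percentages in a table of this kind need not sum to 100. A minimal sketch of such a multi-response tally is given below; the coded responses are hypothetical and the function name is chosen for illustration only.

```python
from collections import Counter

# Hypothetical coded responses: each student may mention several concepts.
responses = [
    {"data representations", "data"},
    {"experimental"},
    {"data representations", "negative experience"},
    {"data"},
]


def mention_percentages(coded):
    """Percentage of students mentioning each concept (multi-response)."""
    counts = Counter(concept for student in coded for concept in student)
    return {concept: 100 * n / len(coded) for concept, n in counts.items()}


pct = mention_percentages(responses)
print(pct["data representations"], sum(pct.values()))
```

Each percentage is taken over all students, so a student who lists two concepts contributes to two rows.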
…I think of…            Percent   Responses included
Data representations         28   graphs; pie charts; charts; tables
Data                         21   data; data manipulation
Experimental                 20   data collection; experiments; surveys; assignments
Negative experience          17   negative experiences; boring/tedious; little meaning/no point; repetitive
Calculations                 13   fractions; percentages; calculations; maths
Concrete materials⁵          13   coins; dice; marbles; cards
Effort                       10   basic/simplistic; hard effort; complex; easy; probability easy/stats difficult; different; applications difficult
Probability                   9   chance; tree diagrams; permutations and combinations
Process                       7   conclusions; problem solving; analysis; skills
Statistical measures          6   mean/median/mode; hypothesis tests; st. dev/sigma/correlation
Applications                  5   gambling; other specific examples
Distributions                 4   bell curves; distributions; hypothesis tests; continuous/discrete; frequencies; p and q
Classroom                     4   teacher; class
Positive experience           3   useful; positive experience
Words                         2   definitions/formulae/jargon; census
Computers                     1   computing; graphics calculator
Table 3.3 When I think of probability and statistics at school, I think of …
⁵ Although referred to as “concrete materials” there is no knowledge of whether the student is recalling handling, observing or simply discussing these objects.
Most common among students’ responses to this item are data representations and
data. One pleasing aspect of the students’ responses is that 20% of students mention
some form of hands-on experience (experimental) such as data collection,
experiments, surveys and assignments, reflecting an increasing emphasis on such
activities. Experiences such as these have more potential to result in genuine
learning (Moore, 1997; MacGillivray, 2002) than exercises using materials or
examples such as dice, coins, cards and marbles which have been recalled by 13% of
students.
Figure 3.9 further elucidates the list in Table 3.3 by focussing on the top seven
responses and illustrating percentages for those students who considered statistics
beneficial in grade 11 and 12, and those who did not, as well as the whole group. In
most cases the thoughts of the groups are comparable. The most obvious difference is
that the percentage of students who refer to some sort of negative experience is, as
expected, considerably higher among students who felt that statistics was not
beneficial in grade 11 and 12.
[Figure 3.9: percentages (axis 0 to 40) for the categories data summaries, data, experiments, negative experience, calculations, concrete materials and effort; series: All, Ben, Not Ben.]
Figure 3.9 Thoughts on statistics according to whether or not it was considered
beneficial in grade 11 and 12
Two other aspects are worthy of comment. Firstly, the percentage of students who
thought of concrete materials (i.e. dice, coins, cards or marbles) was 19% among
students who felt statistics was not beneficial and only 11% among those who felt it
was. While there is no way of knowing from the data which stage of education the
students are associating with concrete materials, this response might imply that such
examples do not help students appreciate the benefits of statistics.
Secondly, the percentage of students who mentioned concepts related to effort was
12% for those who felt statistics was beneficial and only 4% for those who did not.
It should be noted that aspects related to effort included reports of statistics being
both easy and difficult, simplistic and complex, with one student commenting that
statistics was easy and probability was difficult. Again it cannot be assumed that the
stage of education associated with the memory of effort is grade 11 to 12. However,
if there is any implication to be taken from this response, it must be that helping
students to appreciate the benefit of statistics requires engaging them in the process
and possibly revealing some of the complexity of the area.
3.7 Discussion
Acknowledging the potential impact of non-cognitive factors on student learning, the
Attitudinal Surveys described in this chapter were administered to MAB101 students
in order to better understand the attitudinal profiles of these students. The 2004
survey used Likert scales to measure: Affect, Value, Motivation, Links between
mathematics and statistics, Use of statistics in society, Difficulty and Self-efficacy.
In 2005, the self-efficacy component was repeated and specific school experience of
statistics was surveyed.
In general, the 2004 cohort of students demonstrated positive attitudes towards
statistics with regard to their affective feelings and the value of learning statistics.
They displayed relatively neutral attitudes regarding the difficulty of statistics,
conflicting attitudes regarding their own motivation to learn statistics and negative
attitudes with regard to the use of statistics in society. There was no doubt in the
students’ minds as to the link that exists between statistics and mathematics. The
self-efficacy of the students was positive as was the case when this section of the
survey was repeated with the 2005 cohort.
When considered as a group, MAB101 students present at the secondary/tertiary
interface with a generally positive attitude, particularly regarding their confidence in
their own ability to learn statistics. This confidence makes these students different
from those often discussed in research into students’ attitudes towards statistics,
especially those in psychology statistics courses where such non-cognitive research
is often based. However, this positive attitude needs to be interpreted in the light of
the lack of breadth and depth in statistics to which these students have been exposed
at school level.
In exploring the students’ statistical experiences at school via the 2005 Attitudinal
Survey, 85% of students felt that during at least one of the three stages of schooling
(grade 4-7, grade 8-10, grades 11 and 12) the study of statistics had been beneficial.
Approximately a quarter of students felt that it had been beneficial at all three stages
while over a third felt it had been beneficial during grades 11 and 12 only. Over
three-quarters felt that the study of statistics had been beneficial in at least grades 11
and 12.
In discussing the perceived benefits of statistics, students most commonly referred to
its usefulness in other areas or to general and specific skills they had acquired.
Reasons statistics was considered non-beneficial referred most commonly to it being
not practical or to students’ inability to see its usefulness, with some
acknowledgement that student immaturity may be responsible for this perception.
Students in the 2005 cohort gave a variety of responses to the item:
When I think of probability and statistics at school, I think of…
Comparisons of these responses on the basis of whether or not students felt statistics
had been beneficial in grades 11 and 12 suggest that engaging students in a way that
connects with the effort they are required to exercise may increase their perceived
benefit of their study.
While school experiences of statistics are varied, the attitudes towards statistics of
incoming MAB101 students can be summarised as generally positive, particularly in
the area of self-efficacy. Addressing specific non-cognitive issues with individual
students is always necessary. In MAB101, this requires supporting those who are
traditionally seen as needing encouragement due to poor background or motivation,
while being aware of the challenge experienced by those who are encountering a
broader perspective to statistics than that to which they have been accustomed, and
the challenge of engaging those students who would prefer a more mathematical
approach to statistics.
Chapter 4 Numeracy
4.1 Introduction
Over the past forty years, numerous studies at the national and international level
have been aimed at determining levels of numeracy and mathematical skills
possessed by students at different educational stages. Such studies include those
conducted by the International Association for the Evaluation of Educational
Achievement (IEA), extending from the First International Mathematics Study
(FIMS 1963-1967) to the most recent Trends in International Mathematics and
Science Study (TIMSS 2002/03). Most of these have concentrated on primary and
junior secondary school, the upper age limit generally being fourteen to fifteen years
– the end of compulsory schooling in most countries.
The British Cockcroft Report (Cockcroft, 1982) first popularised the term
‘numeracy’, giving an informal definition of:
an “at-home-ness” with numbers and an ability to make use of the mathematical skills
which enable an individual to cope with the mathematical demands of his everyday life.
While educational literature differentiates between the terms ‘numeracy’,
‘quantitative literacy’ and ‘mathematical skill’, in many situations such differences
are insubstantial and Cockcroft’s definition brings to our attention some important
matters. In particular it emphasises that both familiarity and skills are needed to
achieve applicability, as well as highlighting the fact that a desired level of
competence depends on the specific demands of an individual’s circumstances.
Students beginning a degree program at university in a quantitative area such as
science will find that a certain level of mathematical skill is assumed for
understanding and successful completion of their course. There is a tendency to
think that the completion of an algebra and calculus based senior school mathematics
subject provides more than sufficient mathematical preparation for such a course. In
some cases such a mathematics course is set as a formal prerequisite while in others
it is taken to be assumed prior knowledge. In either situation, the experience of
academics has been that the range of mathematical preparation with which students
arrive at university, even amongst those who have completed the school course in
question, is such that any assumption can be made only with caution (Coutis et al.,
2002). Although this has resulted in many universities recognising the need to
provide support for students in this area, little formal research has to date been
carried out into the mathematical skills of students at the secondary/tertiary interface.
It has been shown that a lack of mathematical skills provides a barrier for learning in
engineering courses (Cuthbert and MacGillivray, 2003) and that basic numeracy is a
predictor of outcomes in introductory statistics (Gnaldi, 2003). Educators in the
tertiary context require a sound awareness of the level of skills which students are
likely to possess if they hope to understand the difficulties which students encounter
and provide them with the resources to overcome such difficulties. In particular, this
includes an awareness of the level of confidence retained by students from their pre-
senior school mathematics study.
In this chapter, a Numeracy Questionnaire to measure the level of basic numeracy
possessed by students at the interface of secondary and tertiary education, embarking
on a degree program associated with science, is developed and analysed. Section 4.2
describes the structure of this multiple-choice questionnaire. In Section 4.3.1, the
results of the questionnaire amongst 566 students over two years are presented and
discussed. A dichotomous Rasch model is fitted to the data in Section 4.3.2 and used
in Section 4.3.3 to define five levels of understanding demonstrated by the students.
In Section 4.3.4, general linear models are used to investigate relationships between
the students’ total scores on the Numeracy Questionnaire and their demographic,
mathematical and attitudinal backgrounds. Section 4.3.5 focuses on each individual
item on the Numeracy Questionnaire and applies logistic regression to identify which
of the factors determined as significant in Section 4.3.4 are significant indicators of
success or failure for the item. Section 4.4 concludes this chapter by discussing the
implications at the secondary/tertiary interface of the results of the Numeracy
Questionnaire with respect to the importance of sufficient mathematical study at
secondary level.
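For readers unfamiliar with the dichotomous Rasch model mentioned above, its standard form gives the probability of a correct response as a logistic function of the difference between a student's ability and an item's difficulty. The sketch below uses hypothetical ability and difficulty values (in logits), not estimates from this study, and the function name is chosen for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct) under the dichotomous Rasch model; arguments in logits."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Hypothetical values for illustration only.
p_matched = rasch_probability(0.0, 0.0)      # ability equals item difficulty
p_easy_item = rasch_probability(1.0, -0.5)   # able student, easier item
print(p_matched, round(p_easy_item, 3))
```

When ability equals difficulty the probability of success is one half; it rises as ability exceeds difficulty.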
4.2 Construction of the Numeracy Questionnaire
Over the past decade there has been an increasing trend to develop and administer
diagnostic surveys to incoming tertiary students to gauge their levels of mathematical
preparation. The instruments used for these surveys have tended to be constructed
for specific needs and the information obtained from them has rarely been
disseminated. Hence a tool designed specifically to measure the pre-calculus skills
relevant to an introductory data analysis course and appropriate at the
secondary/tertiary interface, could not be located in the literature and needed to be
constructed for this survey.
The Numeracy Questionnaire was developed specifically to assess the numeracy
skills considered relevant to the unit MAB101, Statistical Data Analysis 1, at the
Queensland University of Technology, but would be appropriate as a tool for
measuring pre-calculus mathematical skills of students entering any tertiary course
which required such skills. As described in Chapter 2, students in MAB101 are
enrolled in a science or broadly scientific degree program and the majority undertake
the unit in their first semester of study.
The Numeracy Questionnaire consists of 21 multiple-choice items which students
could complete in less than thirty minutes. Given that students were also asked to
complete the Statistical Reasoning Questionnaire described in Chapter 5, as well as
the Background Information Survey of Chapter 2 and Attitudinal Surveys described
in Chapter 3, it was felt that the multiple-choice format would both maximise student
cooperation, and minimise marking and processing difficulties. So as to enable very
basic questions to be asked, students were requested not to use calculators, with
numbers being deliberately chosen for ease of manipulation.
The mathematical skills on which the Numeracy Questionnaire focuses are those
skills commonly associated with an introductory data analysis course but it is
designed to assess basic skills well below the assumed level of entry to the course.
(The complete Numeracy Questionnaire can be found in Appendix E.) For each
question, distracters were carefully chosen to reflect common or possible errors.
These are summarised, together with the proportion of students who selected each
response, in Table 4.2. Both the general and more specific skills for each question
are listed below. N_i is the item number on the questionnaire.
Handling of fractions, percentages and decimals:
N_1 Convert a percentage to a common fraction;
N_2 Convert a percentage less than one to a common fraction;
N_3 Convert a common fraction to a percentage;
N_4 Calculate the percentage of a group given the percentages of two
unequal subgroups;
N_5 Calculate a percentage of a group;
N_6 Calculate percentages in a two-step problem;
N_8 Add two decimal fractions;
N_9 Add two common fractions;
N_10 Add three common fractions;
N_16 Order a set of positive and negative, common and decimal fractions;
N_17 Order a set of common fractions;
Application of operations:
(N_4, N_5, N_6) These items involve percentages in an applied context
N_7 Divide a class into groups of maximum size;
Evaluation of simple expressions:
N_11 Evaluate a simple rational expression;
N_12 Evaluate the square root of the sum of two fractions;
N_13 Evaluate a simple expression using order of operations;
Substitution and evaluation of expressions:
N_14 Substitute into an expression which requires estimation of a surd;
N_15 Evaluate a multi-step substitution;
Solving of equalities and inequalities:
N_18 Rearrange an expression involving 1
1 x;
N_19 Solve a simple linear inequality;
N_20 Solve a rational inequality with the unknown on the denominator;
N_21 Solve a pair of simple linear inequalities.
The emphasis in the Numeracy Questionnaire is intentionally put on basic skills
rather than any higher level thinking such as problem solving or mathematical
reasoning. In Chapter 7 of this thesis, relationships are shown to exist between these
basic skills and statistical reasoning. In particular it is demonstrated that a link exists
between the ability to manipulate fractions and the ability to utilise full and complete
proportional reasoning.
The first six questions (N_1 to N_6) involve the use of percentages. The application
of this area to an introductory statistics course is transparent, both in calculation and
interpretation. A student, for example, needs to be comfortable with converting
0.1% to 0.001 in the statement: “This factor is significant at the 0.1% level,” before
they can hope to understand the statistical implications.
The questions N_4 to N_7 require students to apply basic skills to practical
situations, perhaps combining two or more steps (N_4 and N_6) or reinterpreting the
mathematical answer to make sense in context (N_7). All these skills are commonly
applied in any introductory statistics course.
Questions N_8 to N_13 involve simple calculations, such as adding two decimals
(N_8), two fractions (N_9), three fractions (N_10) and applying order of operations
(N_11). Such skills are all needed in simple calculations, in correctly using
calculator processes, and in understanding basic quantitative arguments.
An ability to perform basic calculations such as those demonstrated in N_1 through
to N_11 cannot be replaced by technology. Many teachers at the introductory
tertiary level have experienced situations where blind faith in a calculator, together
with an inability to manually check at least the scale of the computation, have led a
student to have difficulties in using a calculator, or to defend an error of sizeable
proportion on the basis that “the calculator said so”.
Questions N_14 and N_15 require students to substitute values and evaluate an
expression. In N_14, the expression is of the form used to calculate a pooled sample
standard deviation. N_15 is a form similar to a test of two proportions. Experience
has shown that some students have difficulty understanding what is happening in
such expressions, no matter what calculating technique is used.
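As a rough illustration of the two substitution items, the sketch below evaluates both expressions with the values given in Table 4.2. The function names are hypothetical, and the N_15 expression is read here as (p1 - p2)/(p(1 - p)) with p the pooled proportion, which is an interpretation of the table entry rather than a quotation of it.

```python
from fractions import Fraction
from math import sqrt


def pooled_sd(n, m, s, t):
    """Pooled sample standard deviation for groups of size n and m
    with sample standard deviations s and t (the form used in N_14)."""
    return sqrt(((n - 1) * s**2 + (m - 1) * t**2) / (n + m - 2))


def two_proportion_ratio(x1, n1, x2, n2):
    """(p1 - p2) / (p * (1 - p)), with p the pooled proportion.
    Exact fractions are used so the answer can be compared with the
    fractional choices offered in N_15 (assumed form of the expression)."""
    p1 = Fraction(x1, n1)
    p2 = Fraction(x2, n2)
    p = Fraction(x1 + x2, n1 + n2)
    return (p1 - p2) / (p * (1 - p))


# Values as given in items N_14 and N_15
n14 = round(pooled_sd(25, 13, 2, 4), 2)
n15 = two_proportion_ratio(10, 25, 15, 75)
print(n14)
print(n15)
```

Working with exact fractions in the second function mirrors what a student is expected to do by hand: the numbers reduce to simple fractions at every step.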
Questions N_16 and N_17 involve ordering positive and negative fractions and
decimals. These questions further test students’ understanding of value.
Question N_18 asks students to rearrange the equation:
cxb
a=
− 1.
Experience has shown that students have difficulty understanding and handling an
expression such as this. Handling of fractions is often a problem area and it appears
that the combination of horizontal and sloped lines to indicate a fraction can add
considerably to the problem. Combined with the appearance in a fraction of
algebraic entities rather than simply numbers, this creates a significant barrier to
understanding. Students encounter such an expression during statistics in the form of
standardised sample means, such as $\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$.
Questions N_19 to N_21 involve the use of inequalities. Students encounter
inequalities in statistics in a number of situations.
During MAB101 students rely on calculators and computers to perform necessary
calculations. However, part of the motivation for this study was the indication that
an understanding of, familiarity with and ease of handling basic numerical and
algebraic expressions may be important to their development of statistical thinking.
The Numeracy Questionnaire was included in this research to clarify the degree to
which this is the case. The results of this aspect of the work are discussed in Chapter 7.
Comparison with the new Year 1 to 10 mathematics syllabus for Queensland shows
that most of the items in the questionnaire require a level of understanding which
would be expected of students aged 10 to 15 years, requiring demonstration of
outcomes such as:
compare and order whole numbers and common decimal fractions of any size, making
connections between key percentages and fractions;
(Learning Outcome N4.1 p20, (Queensland Studies Authority, 2004))
and
identify and solve addition and subtraction problems involving rational numbers.
(Learning Outcome N6.2 p21, (Queensland Studies Authority, 2004)).
The first of these (designated a level 4 outcome in the syllabus) would be typically
expected of 12 year-old students, while the second (designated a level 6 outcome)
would be expected of 15 year-olds. These skills would be commonly represented in
mathematics syllabi for this age group both nationally and internationally. The
remaining questions involve evaluation of simple rational expressions and solving
simple inequalities, skills which students would be expected to consolidate within the
context of an algebra and calculus based senior school mathematics course, if not
during the mathematics of pre-senior compulsory schooling.
4.3 Results of the Numeracy Questionnaire
MAB101 students completed the Numeracy Questionnaire in class during the first
week of semester. All students who attended classes were encouraged to complete
the questionnaire although there was no compulsion to do so. While it was expected
that many students would take less than twenty minutes to complete the tasks and
nearly all less than thirty, students were encouraged to take as long as they needed.
Hence speed of calculation did not affect students’ results. It was clearly explained
to students that neither their participation nor results would have any bearing on their
course, but would be used solely for research purposes. In the second year of the
study, students received feedback on their score and were given the opportunity to
compare their answers to the correct answers, but few availed themselves of this.
The Numeracy Questionnaire was administered to three cohorts of students: first
semester 2004, second semester 2004 and first semester 2005. Due to a complication
with the administration of the survey⁶ in second semester 2004, it was only
completed by a smaller and less representative proportion of the class. For this
reason only two cohorts, first semester 2004 and 2005, are included in the study.
Where students were included in the study within both these cohorts, only one set of
data was included. This was selected such that it belonged to the cohort in which the
student had completed a larger number of survey instruments over the entire study,
or, where an equal number had been completed, the most recent cohort was selected.
There were four students for whom such a decision needed to be made with regard to
the Numeracy Questionnaire.
As has been described in Chapter 2, background and demographic information,
including gender, course of study, tertiary entrance score (OP score), level of
mathematics previously studied and results obtained therein, was also provided by
the students. The Numeracy Questionnaire was completed by 562 students over two
years, with 548 of these providing the information required to match their answers
with their demographic and background information.
6 During the class in which the questionnaire was scheduled, the university administration elected to conduct a fire evacuation drill. Course requirements made it difficult to administer the questionnaire in the following class.
Of the 548 ‘identifiable’ students who completed the skills questionnaire, 48.0%
were female and 52.0% male (the same proportions as for the total number of
identifiable students reported in Section 2.3). These students represented a total of
30 different courses which are summarised in Table 4.1 with percentages given for
all students over the two cohorts and those who completed the Numeracy
Questionnaire. As described in Chapter 2, the Queensland year 12 algebra and
calculus based mathematics course, Maths B, or its equivalent is the assumed level of
mathematics for MAB101. This standard was reported as having been studied by
53.1% of students who completed the Numeracy Questionnaire, 4.9% reported a
lower level of mathematical preparation, 35.6% had a higher level (either advanced
high school mathematics or previous tertiary study) and 6.4% did not report their
mathematical background.
Students were also classified as ‘maths’ or ‘non-maths’ students. Maths students
included all those who were studying a mathematics degree or double degree
including mathematics, as well as applied science students majoring in mathematics
and education students with mathematics as one of their teaching subjects. Under
this system, 27% of students in the study were classified as maths students,
essentially the same as the figure of 28% for those who completed the skills
questionnaire.
| Course | % of total⁷ | % of numeracy respondents |
|---|---|---|
| education or double degree involving education | 9.7 | 8.8 |
| maths or double degree involving maths; applied science (maths) | 17.2 | 19.7 |
| other double degree | 7.2 | 6.0 |
| applied science (non-maths) | 56.5 | 55.5 |
| biotechnology | 7.4 | 7.9 |
| other | 1.9 | 2.2 |
Table 4.1 Course breakdown of student cohort and respondents over two years
7 Reported in Section 2.3
For the 562 students who completed the skills questionnaire, total scores ranged from
4 to 21, with a mean of 13.6, standard deviation of 4.1 and a median of 14. Twenty
students scored the maximum possible score of 21. The distributions of the two
cohorts are illustrated in Figure 4.1.
Although the two distributions are remarkably similar, a slight decrease in the
number of lower responses in 2005 resulted in quartiles which are higher by one
mark and a larger mean (p=0.011 from the t-test) over 2004. This statistically
significant difference contributes to some differences which appear between the
years in the analysis of background predictors (Section 4.3.4). However, closer
inspection of the ordering and distribution of responses showed so few differences
between the cohorts, that combining the years was considered a valid and the most
succinct way to describe the responses to individual items.
Descriptive Statistics: Numeracy Total

| Variable | Year | N | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| N_Total | 04 | 303 | 13.224 | 0.238 | 4.135 | 4 | 10 | 14 | 16 | 21 |
| N_Total | 05 | 259 | 14.104 | 0.251 | 4.040 | 4 | 11 | 14 | 17 | 21 |
[Figure: Numeracy Total (vertical axis, 5.0 to 22.5) plotted by Year (I_04 and I_05).]
Figure 4.1 The distribution of total Numeracy Scores is similar across the two cohorts.
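The reported difference between the cohort means can be checked approximately from the descriptive statistics alone. The sketch below assumes an unpooled (Welch-type) standard error and a normal approximation to the reference distribution; the thesis does not state which form of t-test was used, so this is an illustration under those assumptions, and the function name is hypothetical.

```python
from math import erfc, sqrt


def two_sample_t(mean1, sd1, n1, mean2, sd2, n2):
    """t statistic for a difference in means using an unpooled standard error."""
    standard_error = sqrt(sd1**2 / n1 + sd2**2 / n2)
    return (mean2 - mean1) / standard_error


# Summary statistics for the 2004 and 2005 cohorts (Descriptive Statistics above)
t_stat = two_sample_t(13.224, 4.135, 303, 14.104, 4.040, 259)

# Two-sided p-value from the standard normal distribution; with more than
# 500 degrees of freedom this is close to the t-distribution value.
p_approx = erfc(abs(t_stat) / sqrt(2))

print(round(t_stat, 2), round(p_approx, 3))
```

With samples of this size the choice between the pooled and unpooled forms makes little practical difference to the result.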
4.3.1 Consideration of student responses
The complete questionnaire and the percentage of students who gave each response,
together with a brief explanation of the error reflected in each distracter, can be
found in Table 4.2. As well as reporting the responses for the entire group, each
response is reported separately on the basis of the division of students into those who
have and have not studied Maths B. Questions are ordered by increasing difficulty
(as measured by student outcomes) and the correct response is listed first. In the
actual questionnaire, responses are arranged randomly.
An examination of groups of questions shows how the success rate falls rapidly as
the simplicity of the question decreases. Lack of familiarity, multi-step problems
and abstraction with the introduction of letters cause obvious difficulties for students.
As an example, consider a group of questions involving fractions, extending from
basic operations with fractions through to the application of fractions in the form of
manipulation of rational expressions. Question N_9, which requires addition of two
fractions, has a success rate of 84% which falls to 78% for adding three fractions
(N_10). For students who have not completed Maths B, the competency with
fractions is extremely limited with both these questions having a success rate close to
50%.
When a square root is also involved (N_12), the success rate for the complete cohort
falls to 56% and to 33% for those without Maths B. It is interesting to note here how
the addition of a step to a question causes students to regress in their skills. In
question N_9, 8.9% of all students added two fractions by either just adding the
denominators or by adding the numerators and denominators. When the square root
is added to the problem, 26.1% of students make either of these two errors (be it
before or after taking the square root).
When students are asked to apply their understanding of fractions to manipulate an
algebraic expression (N_18), the success drops to 48%, and to 42% when the
equality becomes an inequality (N_20). The three items (N_12, N_18 and N_20) all
have a success rate of around 30% for students without Maths B. Question (N_15),
Level one⁸

N_5. Possible subject grades at a particular institution are 1 to 7, with 7 being the highest. In a particular class of 200, 5% of students were given a 1 or a 2. The number of students receiving a 1 or a 2 was:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 10 | 95.7 | 96.3 | 88.9 | Nearly all could manage this. |
| 5 | 1.6 | 1.4 | 3.7 | Forgot that reference group was 200. |
| 40 | 2.3 | 1.9 | 7.4 | Used 5% = 1/5. |
| 57 | 0.2 | 0.2 | 0.0 | Assumed equally likely groups although told otherwise. |
| 100 | 0.2 | 0.2 | 0.0 | Used 5% = 1/2. |
| No response. | 0.0 | 0.0 | 0.0 | |

N_8. 0.66 + 0.55 is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 1.21 | 92.2 | 92.6 | 92.6 | Another high success rate. |
| 0.1111 | 0.0 | 0.0 | 0.0 | Does not trade. |
| 0.121 | 4.3 | 4.1 | 0.0 | Shifts decimal point. |
| 1.111 | 3.6 | 3.3 | 7.4 | Cannot trade correctly. |
| 12.1 | 0.0 | 0.0 | 0.0 | Shifts decimal point. |
| No response. | 0.0 | 0.0 | 0.0 | |

N_1. Written as a fraction in its simplest form, 20% is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 1/5 | 87.4 | 88.5 | 74.1 | Most students can do this. |
| 1/20 | 1.4 | 1.4 | 3.7 | Since 10% is 1/10. |
| 20/100 | 7.8 | 6.8 | 18.5 | Did not simplify. |
| 2/5 | 2.5 | 2.7 | 0.0 | |
| 1/2 | 0.4 | 0.2 | 0.0 | Guessing. |
| No response. | 0.5 | 0.4 | 3.7 | |
8 The levels included in this table result from the Rasch analysis and are described in Section 4.3.3
N_11. $\frac{5 \times 4^2 + 9 \times 2^2}{4}$ is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 29 | 87.2 | 87.0 | 88.9 | Well done for those without Maths B. |
| 41 | 2.7 | 2.9 | 0.0 | Cancel 4 and 4² to give 5 + 9×4. |
| 56 | 5.0 | 4.9 | 3.7 | Cancel 4 into 4² to give 5×4 + 9×4. |
| 89 | 3.0 | 3.1 | 3.7 | Cancel 4 and 2² to give 5×16 + 9. |
| 181 | 1.1 | 1.0 | 0.0 | Thinks a × b² = (a × b)². |
| No response. | 0.9 | 0.8 | 3.7 | |
| No response hereafter.⁹ | 0.2 | 0.2 | 0.0 | |

Level two

N_9. 1/6 + 1/5 is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 11/30 | 83.6 | 86.2 | 48.1 | Note the difference for without Maths B. |
| 2/30 | 6.2 | 5.8 | 18.5 | Finds common denominator but doesn’t convert numerator. |
| 1/11 | 2.7 | 2.5 | 11.1 | Adds denominators. |
| 2/11 | 6.2 | 4.7 | 22.2 | Adds numerators and denominators. |
| 5/6 | 0.9 | 0.6 | 0.0 | No idea. |
| No response. | 0.4 | 0.2 | 0.0 | |

N_7. A group of 340 students must be divided into lab classes with a maximum of 30 students in each. The smallest number of lab classes needed is:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 12 | 78.6 | 81.0 | 66.7 | |
| 10 | 6.6 | 5.6 | 0.0 | ‘Near enough’. |
| 11 | 5.7 | 5.4 | 7.4 | Round down. |
| 11.3 | 8.0 | 7.4 | 14.8 | No practical application. |
| 13 | 0.7 | 0.4 | 7.4 | |
| No response. | 0.4 | 0.2 | 3.7 | |
9 Three or more missing items at the end of the questionnaire were treated differently in the Rasch analysis. (See §4.3.)
N_3. Written as a percentage correct to 1 decimal place, the fraction 1/6 is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 16.7% | 78.3 | 79.2 | 74.1 | |
| 6.0% | 3.4 | 2.9 | 7.4 | |
| 12.5% | 10.0 | 10.1 | 7.4 | |
| 60.0% | 2.0 | 1.6 | 7.4 | |
| 66.7% | 5.9 | 5.8 | 3.7 | |
| No response. | 0.5 | 0.4 | 0.0 | |

N_10. 1/4 + 2/3 + 5/6 is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 1 3/4 | 77.6 | 80.1 | 51.9 | Adding 3 fractions is harder for most. |
| 8/13 | 7.3 | 4.9 | 37.0 | Adds numerators and denominators; slightly more than in N_9. |
| 13/72 | 1.4 | 1.7 | 0.0 | Finds common denominator but numerator is sum of denominators. |
| 35/72 | 1.8 | 1.9 | 3.7 | Finds common denominator but tries to cross-multiply for numerator. |
| 51/12 | 10.3 | 10.3 | 3.7 | Can find common denominator, but not numerator; more convincing than correct answer which has been simplified. |
| No response. | 1.6 | 1.2 | 3.7 | |

Level three

N_6. In the same class (as in question 5), 85% of students were awarded a grade from 3 to 6. The number of students receiving a grade of 7 was:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 20 | 69.2 | 71.8 | 44.4 | Add one step and the success rate reduces considerably. |
| 10 | 2.7 | 1.2 | 11.1 | Used 100 as reference group. |
| 15 | 1.8 | 2.0 | 0.0 | Combination of errors above and below. |
| 30 | 4.6 | 3.9 | 14.8 | Forgot to remove the 5%. |
| 170 | 21.5 | 21.0 | 25.9 | = 85%; perhaps did not read the question. |
| No response. | 0.2 | 0.0 | 3.7 | |
N_2. Written as a fraction in its simplest form, 0.1% is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 1/1000 | 61.2 | 62.8 | 66.7 | A percentage <1 is much harder. |
| 1/100 | 9.3 | 8.0 | 14.8 | |
| 1/10 | 25.8 | 25.9 | 14.8 | Confused with 0.1. |
| 10/100 | 3.7 | 3.3 | 3.7 | |
| 9/10 | 0.0 | 0.0 | 0.0 | |
| No response. | 0.0 | 0.0 | 0.0 | |

N_19. The solution to the inequality 2a + 5 < 12 is:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| a < 7/2 | 62.8 | 64.0 | 51.9 | |
| a > −7/2 | 3.9 | 3.5 | 3.7 | Take 2a over without changing sign. |
| a = 3 | 12.3 | 11.1 | 14.8 | An integer that works. |
| a < 1 | 7.1 | 6.8 | 14.8 | Divide 12 by 2, then subtract 5. |
| a = 7/2 | 10.3 | 11.1 | 7.4 | Solve the equality. |
| No response. | 0.4 | 0.4 | 0.0 | |
| No response hereafter. | 3.2 | 3.1 | 7.4 | |
Level four

N_12. $\sqrt{\frac{1}{9} + \frac{1}{4}}$ is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| (√13)/6 | 56.2 | 58.0 | 33.3 | Add 1 step and the success rate with fractions reduces considerably. |
| 1/5 | 3.4 | 3.3 | 3.7 | Take square roots then add denominators. |
| 1/√13 | 15.0 | 13.0 | 33.3 | Just add denominators – compare with same error in N_9. |
| 2/5 | 7.7 | 7.2 | 18.5 | Take square roots then add numerators and denominators. |
| 5/6 | 12.3 | 13.0 | 3.7 | Take square roots first then add fractions. |
| No response. | 5.0 | 4.9 | 7.4 | |
| No response hereafter. | 0.5 | 0.6 | 0.0 | |

N_17. Which of the following sets of values is correctly ordered from smallest to largest?

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 1/3, 3/5, 13/20, 4/6, 8/7 | 54.1 | 55.4 | 44.4 | Slightly better than N_16. |
| 13/20, 8/7, 4/6, 3/5, 1/3 | 17.3 | 15.8 | 25.9 | Descending denominators. |
| 8/7, 4/6, 13/20, 3/5, 1/3 | 2.9 | 2.9 | 3.7 | Descending. |
| 1/3, 3/5, 4/6, 8/7, 13/20 | 7.7 | 7.2 | 7.4 | Ascending numerators & denominators. |
| 1/3, 13/20, 4/6, 3/5, 8/7 | 14.1 | 14.4 | 11.1 | 3/5 > 4/6. |
| No response. | 2.3 | 2.7 | 0.0 | |
| No response hereafter. | 1.8 | 1.6 | 7.4 | |
N_21. The solution to the pair of inequalities a + 7 < 16 and 3a > 6 is:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 2 < a < 9 | 51.6 | 53.5 | 33.3 | |
| a = 3.75 | 3.6 | 2.7 | 11.1 | Solution to the pair of simultaneous equations. |
| a = 8 | 7.1 | 6.4 | 3.7 | An integer that works. |
| 2 > a < 9 | 24.0 | 24.7 | 22.2 | A complete lack of understanding of inequality signs. |
| a < 9 | 6.9 | 6.6 | 14.8 | Solution to the first equation and satisfies 3×9 > 6. |
| No response. | 3.6 | 3.1 | 7.4 | |
| No response N_19 on. | 3.2 | 3.1 | 7.4 | |

N_4. A class consists of 80 males and 120 females. A non-compulsory excursion is attended by 20% of the male students and 30% of the females. The percentage of the class which attends the excursion is:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 26% | 52.1 | 53.3 | 40.7 | |
| 25% | 15.8 | 15.0 | 29.6 | Average of 20% and 30%. |
| 28% | 17.1 | 16.9 | 18.5 | Used 30% = 1/3. |
| 52% | 11.9 | 11.5 | 11.1 | Forgot that reference group was 200. |
| 56% | 2.1 | 2.3 | 0.0 | Combination of previous 2 errors. |
| No response. | 0.9 | 1.0 | 0.0 | |
No response. 0.9 1.0 0.0 1 -0.05 0.05 0.5 0.555
− 50.4 49.0 51.9
1-0.05 0.05 0.55 0.55− 3.4 3.3 7.4 Doesn’t understand decimal places.
1-0.05 0.05 0.5 0.555− 34.2 35.6 25.9 Cannot order negatives.
1-0.05 0.05 0.5 0.555− 6.1 6.4 0.0 No idea about negatives.
1 -0.05 0.05 0.55 0.55− 4.2 4.1 7.4 Two mistakes.
No response. 0.4 0.4 0.0
N_16 Which of the following sets of values is correctly ordered from smallest to largest?
No response hereafter. 1.4 1.2 7.4
N_13. 20 + 30 ÷ 5 × 2 is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 32 | 50.9 | 52.1 | 40.7 | Poorly done. |
| 5 | 16.7 | 17.3 | 11.1 | (20 + 30) ÷ (5 × 2) |
| 20 | 6.4 | 4.3 | 25.9 | Left to right. |
| 23 | 24.2 | 24.3 | 22.2 | Multiplication before division – misunderstands BOMDAS. |
| 52 | 0.7 | 0.8 | 0.0 | (20 + 30 ÷ 5) × 2 |
| No response. | 0.5 | 0.6 | 0.0 | |
| No response hereafter. | 0.5 | 0.6 | 0.0 | |
1bc
xa
=−
47.7 49.4 25.9
1ax
b c−
= 18.9 18.3 18.5 Just swap x & c – visually appealing.
1ax
bc−
= 20.5 19.8 29.6 1
Does not understand 1 x
1a b
xc
=+
4.6 4.3 11.1 1
Take 1 over first. Confused by 1 x
1x
ac
b
=−
4.6 4.3 7.4
1 1a ab x b x
−= −
No response. 1.4 1.6 0.0
N_18
The solution for to theequation:
-1 is given by:
x
a cb x
=
No response hereafter. 2.3 2.3 7.4
N_14. When n = 25, m = 13, s = 2, t = 4, the expression $\sqrt{\frac{(n-1)s^2 + (m-1)t^2}{n + m - 2}}$ (correct to 2 decimal places) is equal to:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 2.83 | 46.4 | 48.4 | 25.9 | Less than half can do without a calculator. |
| 3.94 | 16.6 | 15.8 | 29.6 | √(a + b) = √a + √b |
| 6.00 | 11.9 | 11.5 | 14.8 | As above with incorrect simplification. |
| 6.93 | 16.6 | 16.0 | 18.5 | Incorrect cancelling. |
| 11.31 | 5.0 | 4.9 | 3.7 | Thinks a × b² = (a × b)². |
| No response. | 2.5 | 2.5 | 0.0 | |
| No response hereafter. | 1.1 | 0.8 | 7.4 | |
Level five

N_20. The solution to the inequality $\frac{48}{a} < \frac{1}{4}$ is:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| a > 192 | 42.0 | 44.2 | 29.6 | Another very low success rate. |
| a < 1/12 | 3.9 | 3.5 | 7.4 | |
| a < 12 | 10.5 | 10.1 | 11.1 | |
| a < 192 | 27.6 | 27.4 | 22.2 | Inverted both sides without changing direction of inequality. |
| a > 12 | 11.2 | 10.5 | 18.5 | |
| No response. | 1.6 | 1.2 | 3.7 | |
| No response N_19 on. | 3.2 | 3.1 | 7.4 | |

N_15. Given that $p_1 = \frac{x_1}{n_1}$, $p_2 = \frac{x_2}{n_2}$, $p = \frac{x_1 + x_2}{n_1 + n_2}$, the value of $\frac{p_1 - p_2}{p(1 - p)}$ when $x_1 = 10$, $x_2 = 15$, $n_1 = 25$, $n_2 = 75$ is given by:

| Choice | % all (566) | % done Maths B (489) | % not done Maths B (27) | Comments on responses |
|---|---|---|---|---|
| 16/15 | 37.7 | 39.5 | 22.2 | Lowest success rate. |
| 1/4 | 10.3 | 9.7 | 22.2 | This is p. |
| 4/5 | 17.4 | 17.7 | 7.4 | Forgets (1 − p). |
| 5/6 | 17.8 | 17.3 | 29.6 | Uses p = p1 + p2. |
| 108/77 | 8.4 | 8.0 | 0.0 | x2 and n1 confused. |
| No response. | 6.9 | 6.6 | 11.1 | |
| No response hereafter. | 1.4 | 1.2 | 7.4 | Can’t do (or be bothered) – highest non-participation. |
Table 4.2 Items and responses to Numeracy Questionnaire in order of difficulty
which students found most difficult with a success rate of 38% (22% without Maths
B), requires multi-step substitution of fractions into an algebraic rational expression.
The numbers involved in N_15 have been deliberately chosen so that a little
experience and comfort with fractions and a minimal degree of persistence should
produce success. Interestingly, this question also had the highest non-response rate
of 9%, (19% without Maths B) indicating that comfort and persistence are not
present in many students when handling fractions.
A smaller group of questions involving practical application of percentages again
shows how the addition of steps in the procedure decreases the success rate.
Question N_5, requiring students to find five percent of 200 students, has the highest
success rate of 96% (89% without Maths B). This success rate falls rapidly to 69%
(44% without Maths B) when another step is added to this question (N_6). When the
percentage of two subgroups has to be converted to the percentage of the whole
group (N_4), the success rate falls to only 52% (41% without Maths B).
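The three percentage items discussed above can be worked through in a few lines. The sketch below uses the numbers from items N_5, N_6 and N_4 in Table 4.2; the variable names are illustrative only.

```python
# N_5 and N_6: a class of 200 students
class_size = 200
low_grades = class_size * 5 // 100                 # 5% received a 1 or a 2
grade_seven = class_size * (100 - 5 - 85) // 100   # what remains after grades 1-2 and 3-6

# N_4: percentages of two unequal subgroups combined into one percentage
males, females = 80, 120
attendees = males * 20 // 100 + females * 30 // 100
overall_pct = 100 * attendees / (males + females)

# The most common wrong answers in N_4 for comparison
simple_average = (20 + 30) / 2                     # ignores the unequal group sizes
attendees_as_pct = attendees                       # forgets the reference group is 200

print(low_grades, grade_seven, overall_pct, simple_average, attendees_as_pct)
```

The correct answer to N_4 is a weighted average, which sits closer to the percentage for the larger subgroup than the simple average does.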
There are three items, N_1, N_2 and N_3, which involve converting between
common fractions and percentages. N_1 is the simplest, involving the value twenty
percent; N_3 is more difficult as the fraction 1/6 is less familiar to the students;
and N_2 is more difficult again as the percentage is less than one. The success rate for
the entire cohort falls from 87% through 78% to 61% for these items. For those
students without Maths B, the struggle begins much earlier with a success rate of
74% for N_1 and hence does not show the same decrease, with success rates of 74%
and 67% for N_3 and N_2, respectively. Item N_2 is one where students without
Maths B slightly outperform those with Maths B; the contrast in this question is
more in the incorrect choice. The pattern of students without Maths B struggling
sooner (and so not demonstrating as great a decrease in success with item
complexity) is generally reflected throughout the questionnaire.
Some of the errors which students have made in these items could legitimately be
attributed to carelessness. Perhaps 22% of students did not correctly read N_6 and
thought they were only finding eighty-five percent. Also, 17% of students in N_4
calculated one third rather than thirty percent of 120. However, this carelessness,
which tends to increase with problem complexity, is one aspect of the students’ gap
in skills and is likely to cause problems in the students’ application of techniques in
their fields of study.
4.3.2 Results from Rasch analysis
In order to further investigate and better understand the levels of numerical
competency represented by students in the study, a dichotomous Rasch model was
fitted to the responses to the Numeracy Questionnaire for the two cohorts. This
Rasch model describes the probability of a person of ability $\beta_n$ succeeding at an item
of difficulty $\delta_i$, in terms of the logistic equation:
Equation 4.1
$$P(X_{ni} = x) = \frac{\exp\left(x(\beta_n - \delta_i)\right)}{1 + \exp(\beta_n - \delta_i)}; \quad x = 0, 1; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L;$$
where $X_{ni} = 1$ for success, or $0$ for failure, is the response of person $n$ on item $i$.
This is one of a family of models developed by Georg Rasch (Rasch, 1960) and
used over the past twenty years in the areas of education, psychology and sociology
as an alternative to traditional test theory. It has been applied widely in school
mathematics studies (Wilson, 1992) and in surveys such as the Third International
Mathematics and Science Study (TIMSS) (Lokan, et al., 1997). The Rasch model assumes that the
questionnaire is measuring an underlying one-dimensional and hierarchical trait and
that the items are independent of one another. Model diagnostics are used to
measure the fit of the data to the model and in so doing provide evidence of the
validity of the questionnaire (Wright, 1999; Watson and Callingham, 2003). This
model has the advantage that sufficient statistics exist for $\beta_n$ and $\delta_i$, with $R_n$, the
total score for person $n$, being sufficient for $\beta_n$, and $S_i$, the number of people
responding correctly to item $i$, being sufficient for $\delta_i$. The existence of these
sufficient statistics allows item difficulty and person ability to be separately
estimated on a single scale (Keeves and Alagumalai, 1999). When $\beta_n = \delta_i$, person
$n$ has a probability of 0.5 of succeeding at item $i$.
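The behaviour of Equation 4.1 can be explored with a short function. This is a minimal sketch of the dichotomous Rasch probability only, not of the estimation procedure, and the function name is hypothetical.

```python
from math import exp


def rasch_probability(beta, delta, x=1):
    """Probability that a person of ability beta gives response x
    (1 = success, 0 = failure) to an item of difficulty delta."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return exp(x * (beta - delta)) / (1 + exp(beta - delta))


# Ability equal to difficulty gives an even chance of success
print(rasch_probability(0.8, 0.8))

# The chance of success rises as ability exceeds difficulty
print(rasch_probability(2.0, 0.0), rasch_probability(-2.0, 0.0))
```

Because only the difference between ability and difficulty enters the formula, persons and items can be placed on the same logit scale.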
The Rasch model was fitted using Quest software (Adams and Khoo, 1996). For the
analysis, missing answers were considered as incorrect, apart from in cases where
students left three or more items blank at the end of the questionnaire in which case
they were treated as missing. As students were given virtually unlimited time, it was
felt that those who left a sequence of questions blank at the end of the paper had
chosen to proceed no further, whereas for individual missed responses it was
considered that the student could not determine the correct answer. Nineteen
students received perfect scores and are not included in the analysis as they and their
results cannot contribute to the estimation process.
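The treatment of blank answers described above can be sketched as a small scoring routine. This is an illustration of the stated rule, not the procedure used in Quest; the function name, the use of None for a blank and the data layout are all assumptions.

```python
def code_responses(raw, key, trailing_blank_threshold=3):
    """Code one student's answers as 1 (correct), 0 (incorrect) or None (missing).

    raw: the student's chosen options in questionnaire order, None for a blank.
    key: the correct option for each item.
    A run of trailing_blank_threshold or more blanks at the end of the paper
    is treated as missing; any other blank is scored as incorrect.
    """
    trailing_blanks = 0
    for answer in reversed(raw):
        if answer is None:
            trailing_blanks += 1
        else:
            break

    if trailing_blanks >= trailing_blank_threshold:
        first_missing = len(raw) - trailing_blanks
    else:
        first_missing = len(raw)

    coded = []
    for position, (answer, correct) in enumerate(zip(raw, key)):
        if position >= first_missing:
            coded.append(None)      # student chose to proceed no further
        elif answer is None:
            coded.append(0)         # isolated blank: could not determine the answer
        else:
            coded.append(1 if answer == correct else 0)
    return coded


key = ["a", "a", "a", "a", "a", "a"]
stopped_early = code_responses(["a", None, "b", None, None, None], key)
skipped_some = code_responses(["a", "a", "b", "a", None, None], key)
print(stopped_early)
print(skipped_some)
```

In the first example the three blanks at the end are coded as missing, while the isolated blank at the second item counts as incorrect; in the second, two trailing blanks fall below the threshold and are scored as incorrect.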
One measure of fit used in Rasch modelling is the item infit mean square. The item
infit mean squares are the means of the weighted squared standardised residuals for
each item, averaged over students. For each of the 21 items in the questionnaire, the
infit mean square fell between 0.85 and 1.21. According to Keeves and Alagumalai
(1999), an item is generally accepted as fitting the Rasch model if the infit mean
square lies between 0.77 and 1.30, although some researchers would prefer a more
restricted range of 0.83 to 1.20. This provides evidence that the items are all
consistent with the underlying construct being measured by the questionnaire (in this
case, basic numeracy skill), and is a measure comparable to the concept of internal
validity used in traditional test theory (Wright, 1999).
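For a single item, the infit mean square can be sketched as follows. The code uses the usual information-weighted form (the sum of squared residuals divided by the sum of the model variances); whether Quest computes the statistic in exactly this way is an assumption, and the function name and example data are hypothetical.

```python
from math import exp


def infit_mean_square(responses, abilities, difficulty):
    """Information-weighted mean square fit statistic for one item.

    responses: 0/1 scores of each student on the item.
    abilities: the corresponding ability estimates in logits.
    difficulty: the item difficulty estimate in logits.
    """
    squared_residuals = 0.0
    total_variance = 0.0
    for x, beta in zip(responses, abilities):
        p = exp(beta - difficulty) / (1 + exp(beta - difficulty))
        squared_residuals += (x - p) ** 2
        total_variance += p * (1 - p)
    return squared_residuals / total_variance


# Responses that follow ability too neatly give a value below 1 (overfit)
neat = infit_mean_square([0, 0, 1, 1], [-2.0, -1.0, 1.0, 2.0], 0.0)

# Responses that run against ability give a value above 1 (misfit)
erratic = infit_mean_square([1, 1, 0, 0], [-2.0, -1.0, 1.0, 2.0], 0.0)

print(round(neat, 2), round(erratic, 2))
```

Values near 1 indicate that the amount of unexpected variation in the responses is about what the model predicts.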
Table 4.3 presents the estimates of the item and person separation reliabilities and the
overall fit measures.
| | Separation reliability | Infit mean square: mean | Infit mean square: SD |
|---|---|---|---|
| item | 0.99 | 1.00 | 0.09 |
| person | 0.74 | 1.00 | 0.20 |
Table 4.3 Measures of fit from Rasch analysis indicate that the model fits well.
The item separation reliability index is the proportion of the total variance of item
estimates that is associated with parameter variance. The person separation
reliability is the corresponding index for persons. Large values of these indices (i.e.
close to one) are indicative of better estimation of parameters. (See Keeves and
Masters (1999) p275 for a detailed definition.)
For the Numeracy Questionnaire, the item separation index of 0.99 is very high,
providing evidence that the items give a good spread of difficulty which suggests that
the level of basic numeracy skills can be measured by the questionnaire. The person
separation reliability is acceptable at 0.74, providing evidence that the questionnaire
is of an appropriate level of difficulty for the students. The average item infit mean
square and the average person infit mean square are equal to the expected value of
1.00, suggesting that the model is appropriate for the data and hence that the
questionnaire is measuring a one-dimensional construct.
The variable map in Figure 4.2 shows the students (on the left hand side) and items
(on the right hand side) displayed on a single logistic scale. The level at which an
item appears is called the threshold. This is the level of ability at which a student has
a 50% chance of answering the question correctly. The map provides a convenient
visual display of the relative difficulty of the items. From the map, there appears to
be a group of students of very high ability, as well as the nineteen students who are
not included in the analysis because they received perfect scores. Although the case
separation reliability and case infit mean square provide evidence that the
questionnaire is of an appropriate level of difficulty for the students, the distribution
of items along the variable map verifies that, as intended, the questionnaire does not
reach into the upper levels of numeracy skill. This is not surprising as the
questionnaire is designed to assess basic skills below the assumed level of entry into
the course. In this sense the questionnaire is acting as a remedial diagnostic tool.
Questions at a higher level would need to be included in order to assess the full range
of numeracy skill of the cohort.
4.3.3 Levels of thinking
Consideration of the variable map produced by the Rasch analysis, together with
question complexity has been used to divide the items into five levels of difficulty
Item Estimates (Thresholds) (N = 566 L = 21 Probability Level=0.50)
----------------------------------------------------------------------------
  4.0                        |
                             |
                             |
                             |
                   XXXXXXXXX |
                             |
                             |
                             |
  3.0                        |
                             |
                             |
                    XXXXXXXX |
                             |
                             |
                XXXXXXXXXXXX |
                             |
  2.0                        |
            XXXXXXXXXXXXXXXX |
                             |
                             | 15
             XXXXXXXXXXXXXXX |
                             | 20
        XXXXXXXXXXXXXXXXXXXX |
                             | 14
  1.0     XXXXXXXXXXXXXXXXXX | 16 18
                             | 4 13 21
                XXXXXXXXXXXX | 17
                             | 12
              XXXXXXXXXXXXXX |
                             | 2
               XXXXXXXXXXXXX | 19
  0.0                      X |
                  XXXXXXXXXX | 6
                             |
                XXXXXXXXXXXX |
                             |
                      XXXXXX | 3 10
                           X | 7
                      XXXXXX |
 -1.0                        | 9
                         XXX |
                             |
                             | 1 11
                        XXXX |
                             |
                             |
                          XX |
 -2.0                        | 8
                             |
                             |
                             |
                             |
                             | 5
                             |
                             |
 -3.0                        |
----------------------------------------------------------------------------
Each X represents 3 students
============================================================================
[Figure annotations: bands marking Level 5 (top) through Level 1 (bottom).]
Figure 4.2 The variable map shows the students (on the left hand side) and items (on the right hand side) displayed on a single logistic scale.
and the students into five levels of ability as indicated in Figure 4.2. Students at
level one are characterised by simple, common, single-step thinking. It can be
expected that a student at this level successfully:
• understands and uses common percentages;
• adds decimals;
• performs a simple calculation involving a combination of basic operations on
whole numbers.
At level two, students’ thinking is still simple and single-step but has progressed to
encompass less common applications. A student at this level can be expected to
successfully:
• understand and use less common fractions and percentages;
• add fractions with different denominators;
• solve simple practical problems.
At level three, students’ thinking has progressed to the two-step stage and is
beginning to encompass abstract notation in the use of simple algebra. It can be
expected that a student at this level:
• understands percentages less than one;
• solves a two-step problem involving percentages;
• uses abstract notation in a simple question;
• solves a simple linear inequality.
The thinking of students at level four has progressed to the multi-step stage with
abstract thinking continuing into more complex applications. It can be expected that
a student at this level successfully:
• solves a multi-step problem involving percentages;
• performs calculations requiring order of operations and combinations of fractions
and square roots;
• orders positive and negative fractions and decimals;
• substitutes into complex expressions, approximating if necessary to obtain an
answer;
• manipulates abstract notation in a multi-step problem;
• rearranges a rational expression;
• solves a pair of simple inequalities.
At level five, a student’s thinking is multi-step, synthesising concepts with persistent
use of abstract notation. At this level a student can successfully:
• perform a multi-step substitution;
• solve a rational inequality with the unknown on the denominator.
Positioning of the students’ ability levels in Figure 4.2 shows that most of the
students in the study have begun tertiary studies operating at level three, four or five,
although there are some at level two and a small percentage of students who are still
characterised by level one thinking. It must be emphasised again that the Numeracy
Questionnaire covers junior high school mathematics, not the assumed senior
mathematics knowledge.
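The single logistic scale of Figure 4.2 can be illustrated with a short sketch. It assumes the dichotomous Rasch model, in which the probability of a correct response depends only on the difference between a student's ability and an item's difficulty, and equals 0.50 when the two coincide (the probability level at which item thresholds are placed on the map). The function name and the ability and difficulty values are illustrative assumptions and are not estimates from this study.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model.

    Both arguments are in logits, the unit of the variable map.
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


# Illustrative (hypothetical) values in logits, not estimates from the study.
student_ability = 1.0
items = [("easy", -2.0), ("matched", 1.0), ("hard", 3.0)]

for label, difficulty in items:
    p = rasch_probability(student_ability, difficulty)
    print(f"{label:8s} item at {difficulty:+.1f} logits: P(correct) = {p:.2f}")
```

A student located well above an item on the map is therefore very likely to answer it correctly, while a student located well below it is unlikely to do so.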
4.3.4 What influences students’ scores?
General linear model analysis was used to explore relationships between students’
scores on the Numeracy Questionnaire and their demographic and mathematical
backgrounds and attitudes. When describing linear models the convention used in
this thesis is to italicise the names of variables so as to aid the discussion thereof.
The variables used in modelling the total Numeracy Scores were formed from data
obtained in the Background Information Survey (See Section 2.3 and Appendix A)
and are described below.
Gender male/female;
1st Semester dichotomous variable =1 for students for whom this was
their first year of enrolment at QUT;
Years Since School continuous variable calculated from response to ‘Year
finished high school’;
OP continuous variable - OP (overall position) or equivalent
score;
Maths Student dichotomous variable =1 if enrolled in a mathematics
degree, a double degree including mathematics, an applied
science degree majoring in mathematics, or an education
degree with mathematics as a teaching subject;
Repeat dichotomous variable =1 if student had previously failed
MAB101; constructed from information supplied on
mathematics subjects previously studied at QUT and
course records.
The data collected from the Attitudes Survey should not, for the most part, be
expected to relate directly to the results of the Numeracy Questionnaire. However,
as the aspect self-efficacy involves mathematics and statistics, it is appropriate to
include this variable in the modelling process for numeracy. Hence we also include
the variable:
Self-Efficacy continuous variable with possible values ranging from
-10 to 10; constructed as the sum of responses to the five
relevant items on the Attitudes Survey and described in
Section 3.2.
Students also provided information on the mathematics subjects they had studied at
school and the results they obtained. The nesting of these variables made variable
definition somewhat complex and several possibilities were experimented with
during initial analyses. The most informative variables constructed and used for
analysis were:
Maths B Result a categorical variable with three levels:
D = distinction (an A or B standard at school level; a 6 or
7 standard in the university equivalent subject),
P = pass (any other passing grade),
N = failed or not attempted (includes all students who
have not successfully completed Maths B);
Higher Maths a dichotomous variable =1 for students who have studied
mathematics beyond Maths B, generally either the
extension maths subject (Maths C) at high school or at
least some university level mathematics.
A final dichotomous variable was included to describe the cohort to which a student
belonged:
Year 04 or 05.
The initial form used to collect background information was adapted from a form
which students in MAB101 had, in previous years, been routinely asked to complete.
As explained in Section 2.3, on this form the OP was described as “optional”. This
remained on the form in 2004 and hence resulted in an underreporting of OP scores
in 2004. Although this non-reporting was understandably non-random, the nature of
the bias was difficult to predict. This wording and the associated non-reporting were
avoided in 2005, but the inclusion and interpretation of tertiary entrance scores in
2004 or when the two years are combined or compared requires considerable caution
and care.
The advantage of using general linear models is that all available variables can be
investigated simultaneously. In the complex situation of dealing with numerous
variables which are expected to be interdependent, this type of statistical procedure is
necessary in order to examine the effect of variables in the presence of others that
cannot be ignored. In such situations a multitude of two-way correlations can be
misleading and almost impossible to interpret correctly. Two-way correlations are
not reported in this thesis so as to avoid their problems. An exception to this is made
in Section 8.2 in order to facilitate the comparison of results found in this work with
those quoted in previous research.
The approach taken in this thesis to fitting general linear models is to use a
combination of backwards elimination and forwards selection. Beginning with all
available main effects and all allowable two-way interactions in the model,
insignificant terms are deleted until all terms remaining in the model are significant.
Individual terms are then reconsidered and included in the reduced model if they are
significant. Not all two-way interactions can be fitted in the initial model. Clearly,
for example, no repeating student is in his or her first semester at QUT, and hence an
interaction between Repeat and 1st Semester cannot be included. Given the multiple
testing required in the case of interactions, a Bonferroni approach is taken, enforcing
more stringent requirements for significance of interactions.
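The effect of the Bonferroni adjustment on interactions can be sketched as follows. The nominal significance level and the exact number of interaction tests are not stated above, so the 0.05 level, the list of candidate main effects (with OP excluded) and the single disallowed pair are assumptions made only for illustration.

```python
from itertools import combinations

# Candidate main effects when OP is not allowed for (names follow the text).
main_effects = [
    "Gender", "1st Semester", "Years Since School", "Maths Student", "Repeat",
    "Self-Efficacy", "Maths B Result", "Higher Maths", "Year",
]

# Pairs that cannot be fitted; only this one is named in the text.
disallowed = {frozenset({"Repeat", "1st Semester"})}

interactions = [
    pair for pair in combinations(main_effects, 2)
    if frozenset(pair) not in disallowed
]

nominal_alpha = 0.05  # assumed nominal level, not taken from the thesis
bonferroni_alpha = nominal_alpha / len(interactions)

print(f"{len(interactions)} candidate two-way interactions")
print(f"per-interaction significance level: {bonferroni_alpha:.5f}")
```

Under these assumptions an interaction would need a p-value far smaller than that required of a main effect before being retained, which is the sense in which the requirement is more stringent.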
Under this technique, for the two years combined and not allowing for OP, the model
which best describes the Numeracy Score involves as significant predictors: Maths B
Result (p<0.001), Maths Student (p<0.001), Gender (p=0.002), Higher Maths
(p=0.003), Self-Efficacy (p=0.003) and Year (p=0.035). This model explains 29.4%
of the variation in the Numeracy Score, and residual analyses display no systematic
concerns with the model (see Figure 4.3).
[Figure: Residual Plots for Numeracy, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 4.3 The residual plots for the model in Equation 4.2 show no systematic
concerns.
The fitted equation from this linear model is given by:
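Equation 4.2
\[
\begin{aligned}
E[\textit{Numeracy}] ={}& \underset{(\text{s.e.}=0.37)}{10.44}
 + \underset{(\text{s.e.}=0.35)}{1.10} \times (\textit{Gender} = \text{male})
 + \underset{(\text{s.e.}=0.41)}{1.23} \times (\textit{Higher Maths} = 1) \\
&+ \underset{(\text{s.e.}=0.40)}{1.57} \times (\textit{Maths B Result} = \text{D})
 - \underset{(\text{s.e.}=0.64)}{0.86} \times (\textit{Maths B Result} = \text{F/N}) \\
&+ \underset{(\text{s.e.}=0.45)}{1.69} \times (\textit{Maths Student} = 1)
 + \underset{(\text{s.e.}=0.066)}{0.20} \times \textit{Self-Efficacy}
 + \underset{(\text{s.e.}=0.34)}{0.72} \times (\textit{Year} = 05).
\end{aligned}
\]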
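To make the fitted equation concrete, the following sketch evaluates it for two hypothetical students. The coefficients are those read from Equation 4.2; the function name, the dictionary keys and the two student profiles are assumptions introduced here for illustration, with a pass (P) in Maths B taken as the baseline category because only the D and F/N levels carry coefficients.

```python
# Coefficients as read from Equation 4.2 (OP not allowed for).
COEFFICIENTS = {
    "intercept": 10.44,
    "male": 1.10,
    "higher_maths": 1.23,
    "maths_b_D": 1.57,
    "maths_b_FN": -0.86,
    "maths_student": 1.69,
    "self_efficacy": 0.20,
    "year_05": 0.72,
}


def expected_numeracy(male, higher_maths, maths_b, maths_student, self_efficacy, year_05):
    """Fitted Numeracy Score; maths_b is 'D', 'P' or 'F/N' ('P' is the baseline)."""
    c = COEFFICIENTS
    score = c["intercept"]
    score += c["male"] * male
    score += c["higher_maths"] * higher_maths
    score += {"D": c["maths_b_D"], "P": 0.0, "F/N": c["maths_b_FN"]}[maths_b]
    score += c["maths_student"] * maths_student
    score += c["self_efficacy"] * self_efficacy
    score += c["year_05"] * year_05
    return score


# Two hypothetical students, chosen only to show the spread of fitted values.
baseline = expected_numeracy(0, 0, "P", 0, 0, 0)
strong = expected_numeracy(1, 1, "D", 1, 5, 1)
print(f"baseline profile: {baseline:.2f}")
print(f"strong profile:   {strong:.2f}")
```

Each indicator adds its coefficient only when the corresponding condition holds, so the baseline profile returns the intercept alone.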
When the predictor variables: Maths B Result, Maths Student, Higher Maths and
Self-Efficacy are examined, it is not surprising to find that they are highly inter-
dependent. Maths students are more likely to have studied at a higher level, to have
obtained a higher Maths B result and to have higher self-efficacy. Similarly those
who have studied higher level maths tend to have better Maths B results and higher
self-efficacy. In fact all the two way relationships between these four variables are
highly significant (p<0.001) and in the expected direction. It is all the more
interesting then that each of these four variables contributes a significant (albeit
small) increase to the Numeracy Score when all the other variables have been
allowed for.
A similar effect is noted for the variable Gender. Males are more likely to be maths
students, to have studied higher level maths and to have higher self-efficacy, but not
necessarily to have obtained better results in Maths B. Still there is an additional
advantage to the skills score in being male, even after allowing for the other
variables.
Although TIMSS 2002/03 showed that there was no significant difference between
genders in overall scores in Australian schools at age approximately 13 years, there
was a significant difference in the number section of the TIMSS assessment for this
group with males outperforming females (Thomas and Fleming, 2004). As this
section of the TIMSS assessment covered the topics of whole numbers, fractions
and decimals, integers, ratio, proportion and percent, the topics most heavily
represented in the Numeracy Questionnaire, the higher scores of males here may not
be unexpected.
The difference between the two years may be attributable to a difference in the two
cohorts or perhaps to different administrative practices in the two years. As
described in Section 4.3, in 2005 students were told they would receive feedback on
their performance on the Numeracy Questionnaire for their own benefit, and this may
have motivated some students to exercise greater effort. When the two years are
considered separately (without including OP), significant predictors for each year are
subsets of the predictors for combined years. For 2004, the best and most
interpretable model (R-squared=21.4%) involves Maths Student (p<0.001), Higher
Maths (p<0.001) and Self-Efficacy (p=0.002); while for 2005 (R-squared=29.7%) it
involves Maths Student (p<0.001), Maths B Result (p<0.001) and Gender (p=0.001).
When the OP score is included in the analysis there is in fact little difference in the
best model with Gender (p<0.001), Higher Maths (p<0.001), Maths B Result
(p=0.001), Self-Efficacy (p=0.001), Maths Student (p=0.005) and Year (p=0.017) all
remaining significant. OP is also significant in this model with p=0.024. The R-
squared value increases to 39% due partially to the extra variation in Numeracy
which has been explained by allowing for OP, but also to the drop in error degrees of
freedom (from 418 to 304) caused by the number of students who did not report an
OP or equivalent score. The residual plots for the model, given in Figure 4.4, show a
slight indication of some long-tailedness in the distribution but not sufficient to be of
concern.
[Figure: Residual Plots for Numeracy, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 4.4 The residual plots for the model in Equation 4.3 show slight indication
of long-tailedness.
The fitted equation for the linear model allowing for OP is given by:
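Equation 4.3
\[
\begin{aligned}
E[\textit{Numeracy}] ={}& \underset{(\text{s.e.}=0.66)}{11.26}
 + \underset{(\text{s.e.}=0.36)}{1.34} \times (\textit{Gender} = \text{male})
 + \underset{(\text{s.e.}=0.43)}{1.67} \times (\textit{Higher Maths} = 1) \\
&+ \underset{(\text{s.e.}=0.44)}{1.02} \times (\textit{Maths B Result} = \text{D})
 - \underset{(\text{s.e.}=0.75)}{1.69} \times (\textit{Maths B Result} = \text{F/N}) \\
&+ \underset{(\text{s.e.}=0.48)}{1.37} \times (\textit{Maths Student} = 1)
 + \underset{(\text{s.e.}=0.070)}{0.23} \times \textit{Self-Efficacy}
 - \underset{(\text{s.e.}=0.066)}{0.15} \times \textit{OP} \\
&+ \underset{(\text{s.e.}=0.36)}{0.87} \times (\textit{Year} = 05).
\end{aligned}
\]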
It should be noted in Equation 4.3 that a smaller numerical value of OP is an
indicator of higher achievement and hence the negative coefficient of OP is to be
expected. OP scores range from a high of 1 to a low of 25, although the lowest score
over the two cohorts is 17. OP of course is also highly correlated with Maths B
Result, Maths Student, Higher Maths and Self-Efficacy. Students with better Maths
B results and those with higher self-efficacy tend to have better OP scores as do
those studying mathematics and those who have previously studied higher level
mathematics. Indeed, all the correlations between each pair of these five variables
are highly significant and in the expected direction. Given these interrelationships
and the non-random non-reporting of OP in one year of the study, it is both
interesting and surprising that exactly the same variables appear in Equations 4.2 and
4.3 with very similar coefficients. The fact that even when OP is allowed for, the
variables measuring the result in Maths B, whether or not higher level mathematics
has been studied, the student’s interest in mathematics (one implication of whether a
student is a maths student or otherwise) and the student’s self-efficacy in
mathematics, each contribute a significant (albeit small) increase to the Numeracy
Score is of importance.
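The link between the coefficients, their standard errors and the reported p-values can be checked roughly by dividing each coefficient by its standard error. The sketch below does this for the single-degree-of-freedom terms, using the values read from Equation 4.3. The normal approximation is an assumption made here for simplicity (the reported p-values come from the fitted model itself), and the published coefficients are rounded, so only approximate agreement should be expected.

```python
import math

# (coefficient, standard error) pairs as read from Equation 4.3,
# restricted to the terms with a single degree of freedom.
TERMS = {
    "Gender (male)": (1.34, 0.36),
    "Higher Maths": (1.67, 0.43),
    "Maths Student": (1.37, 0.48),
    "Self-Efficacy": (0.23, 0.070),
    "OP": (-0.15, 0.066),
    "Year (05)": (0.87, 0.36),
}


def two_sided_p(coefficient, standard_error):
    """Two-sided p-value for coefficient / s.e. under a normal approximation."""
    z = coefficient / standard_error
    return math.erfc(abs(z) / math.sqrt(2.0))


p_values = {name: two_sided_p(b, se) for name, (b, se) in TERMS.items()}
for name, p in p_values.items():
    b, se = TERMS[name]
    print(f"{name:15s} ratio = {b / se:6.2f}   approx p = {p:.4f}")
```

The ordering of the approximate p-values can then be compared with the ordering reported for the model that allows for OP.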
4.3.5 What influences responses to individual questions?
For each item in the skills questionnaire, a logistic regression was performed on the
dichotomous response variable correct/incorrect using the six predictors (excluding
OP) which were significant in explaining total scores, namely: Gender; Maths B
Result; Higher Maths; Maths Student; Self-Efficacy and Year. This allows us to see
which of the students’ characteristics are most important in explaining their success
at a particular question, remembering that the presence or absence of one
characteristic is dependent on the presence or absence of the others. Table 4.4 gives
these results with questions arranged in increasing order of difficulty. The p-values
indicated are those obtained while allowing for all other significant variables. While
acknowledging that some spurious results may be included here as a result of much
testing, all individually significant terms are noted for investigative purposes.
Columns: Question | Gender | Maths B Result | Higher Maths | Maths Student | Self-Efficacy | Year
(The markers for each question are listed left to right in column order; empty cells are not shown, so the column under which each marker falls cannot be read from this listing.)

level one
  N_5
  N_8     * +  **
  N_1     ***
  N_11    * +
level two
  N_9     ***  ***  ***
  N_7     *  ***
  N_3     ***  ***
  N_10    ***  ***  *
level three
  N_6     ***
  N_2     ***
  N_19    ***  ***  *
level four
  N_12    ***  ***  *
  N_17    **  ***  **
  N_21    ***  ***
  N_4     ***
  N_16    ***
  N_13    ***  ***
  N_18    ***  *  **
  N_14    ***  *  *
level five
  N_20    ***  *  *
  N_15    *  ***  *

*** p<0.005, ** 0.005<p<0.01, * 0.01<p<0.05; + opposite direction to that expected
Table 4.4 Significance of predictors in the logistic regression for each item on the Numeracy Questionnaire
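The caution about spurious results arising from much testing can be quantified with a simple count. The sketch assumes one test per predictor per item and, for the purpose of the calculation only, that no predictor has any real effect; the variable names are illustrative.

```python
n_items = 21        # items on the Numeracy Questionnaire (L = 21 in Figure 4.2)
n_predictors = 6    # Gender, Maths B Result, Higher Maths, Maths Student,
                    # Self-Efficacy, Year
n_tests = n_items * n_predictors

# Expected number of spuriously significant cells if every null hypothesis
# were true, at each threshold used in the legend of Table 4.4.
expected_false_positives = {alpha: n_tests * alpha for alpha in (0.05, 0.01, 0.005)}

for alpha, count in expected_false_positives.items():
    print(f"alpha = {alpha}: about {count:.1f} false positives among {n_tests} tests")
```

On this reckoning a handful of the single-asterisk entries could be expected by chance alone, whereas entries at the most stringent threshold are much less likely to be spurious.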
Given that the level of items in the questionnaire is consistent with junior high school
mathematics, it is particularly interesting to note for which questions success is
strongly influenced by studying higher level (i.e. beyond Maths B) mathematics.
Apart from N_15 (an item which was found difficult across the range of student
characteristics), questions which require application of fractions beyond addition
(N_12, N_17, N_18, N_20) are all dependent on students having studied higher level
mathematics. This is consistent with comments made earlier regarding students’
tendency to regress in basic skills when application thereof is required.
Other questions which depend on higher level mathematics include those involving
inequalities (N_19, N_20, N_21). Interestingly, two of these questions (N_19,
N_20) also depend on being a maths student. Anecdotal evidence from teachers
suggests that inequalities are currently de-emphasised in the Queensland high school
curriculum. This is undoubtedly the reason that success in this area is heavily
dependent on studying higher level mathematics and having an interest in
mathematics.
One further question for which Higher Maths is a significant predictor is N_7. This
question asks:
A group of 340 students must be divided into lab classes with a maximum of 30
students in each. The smallest number of lab classes needed is:
It is important to emphasise that such a practical question is significantly influenced
by the study of mathematics beyond the standard senior algebra and calculus based
course because it is not sufficiently realised how much higher level mathematics
develops generic problem-solving skills. Further consideration of responses to this
question is even more enlightening. Table 4.5 gives the percentage of students who
gave each possible response for the three groups: those who had not studied Maths
B; those who had studied at the Maths B level; and those who had studied beyond
Maths B.
Not only does the success rate continue to increase with the level of mathematics
studied, but the percentage who choose the meaningless answer of 11.3 also falls
noticeably. Many students are encouraged to choose not to study Maths B or higher
level mathematics on the basis that algebra and calculus are more specialised and less
applied and are not necessarily linked with ‘real life’ skills. This example clearly
questions such beliefs.
Response        % students        % students       % students
                below Maths B     at Maths B       above Maths B
10                   0.0               7.6              2.6
11                   7.4               7.2              2.6
11.3                14.8               9.6              4.1
12                  66.7              75.3             89.7
13                   7.4               0.3              0.5
no response          3.7               0.0              0.5
Table 4.5 Student responses to item N_7 according to mathematical background
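The arithmetic behind item N_7 is brief, and the following sketch shows why 11.3 is a meaningless answer in this context: it is the unrounded quotient, whereas the question requires rounding up to a whole number of classes. The variable names are illustrative.

```python
import math

students = 340
max_per_class = 30

exact_ratio = students / max_per_class     # the unrounded quotient
classes_needed = math.ceil(exact_ratio)    # a part-filled class is still a class
left_over = students - 11 * max_per_class  # students unplaced if only 11 classes run

print(f"{students} / {max_per_class} = {exact_ratio:.1f}")
print(f"smallest number of lab classes: {classes_needed}")
print(f"students left over with 11 classes: {left_over}")
```

Rounding the quotient down leaves students without a class, so the second step (interpreting the quotient in context) is what distinguishes the correct response from the others in Table 4.5.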
4.4 Discussion
The debate as to whether or not educational standards are falling is common among
educators, the media and the general public. This study does not attempt to address
this issue. Undoubtedly high school graduate capabilities have risen in some areas
and fallen in others. However tertiary educators need to be aware of which skills can
or cannot be assumed of the majority of their incoming students.
A senior algebra and calculus based mathematics course is a prerequisite for many
scientific or quantitative tertiary courses, and if not a formal prerequisite, the content
of senior mathematics is likely to be considered ‘assumed prior knowledge’. Such a
course depends crucially on pre-senior mathematics. When commenting on tertiary
students’ backgrounds, many tertiary educators consider only the courses of senior
schooling. In considering mathematical skills it is essential to consider the effects of
the pre-senior years. Senior algebra and calculus based mathematics courses will
consolidate pre-senior mathematics, but the extent of consolidation clearly depends
on the students’ pre-senior mathematics experiences. Hence tertiary educators need
greater awareness of the emphasis (or lack of emphasis) on mathematical skills in the
pre-senior years, and the extent of consolidation needed at the senior school and even
tertiary level for students to be able to apply these skills in the new or multi-step
situations that characterise so many tertiary areas. Observing that 34% of the cohorts
in both years are operating with a numeracy ability at or below the lower boundary of
level three, as defined by this questionnaire, emphasises that tertiary educators
should be aware that individuals, even with a senior algebra and calculus based
background, may need help with some of the basics, particularly in unfamiliar or
multi-step situations.
There is a growing tendency within universities to maximise potential student intake
by reducing the number of prerequisite subjects. In quantitative areas this has
resulted in the removal of higher level mathematics as a prerequisite for any field of
study, and of the standard algebra and calculus based course for many areas. Due to
the increased variety of subjects available at senior high-school level, this removal of
mathematics as a prerequisite has resulted in a number of senior students opting for a
non-algebra, non-calculus based alternative or no mathematics at all. This decision
is rarely challenged, and is even encouraged, by parents and guidance officers who
consider that unless specific mathematical content will be used in further study, then
it is not valuable. There is a general lack of understanding across the community of
the generic problem-solving skills that are acquired by students in studying specific
mathematical skills, and of the amount of study of mathematics that is required to
attain the Cockcroft (1982) ‘at-home-ness’ with mathematics. Compare this with
language studies in which the study of literature is rarely questioned.
This study demonstrates that in order to have full and confident use of basic pre-
senior mathematical skills, mathematics needs to be studied beyond, and preferably
well beyond, the pre-senior level. This enables students to consolidate their basic
mathematical skills to the point where they can be confidently and reliably used and
applied, particularly in multi-step problems. Without such consolidation, skills
which may be known in isolation are lost in application and the ‘at-home-ness with
numbers and ability to make use of mathematical skills’ of Cockcroft’s numeracy is
not developed. The conclusion which must be drawn by those advising school
students, is that students should be encouraged to continue with mathematics to the
highest level of their ability. Only then can they be certain of maximising their
competency with even basic mathematical skills.
Chapter 5 The Statistical Reasoning Questionnaire
97
Chapter 5 The Statistical Reasoning
Questionnaire
5.1 Introduction
The educational reforms in statistics, which have taken place over the last ten to
twenty years, could be largely summarised by the observation that the generally
agreed upon course goal is the development of statistical reasoning. This approach
has included highlighting the importance of authentic assessment that focuses on
statistical reasoning and the need for specific research evaluation tools. While it is
generally agreed that statistical reasoning is best assessed via one-on-one
communication and the analysis of student work in in-depth tasks such as projects, it
is also acknowledged that instruments are needed which are easy to administer and
score for the purpose of accurately evaluating statistical reasoning, particularly in a
research environment (Garfield, 2003).
The Statistical Reasoning Assessment (SRA) was developed in order to measure the
effectiveness of a new statistics curriculum for high-school students in the US
(Konold, 1990; Garfield, 1991, 2003) and has been used in a variety of settings since.
This instrument, which consists of 20 multiple-choice questions, focuses on
reasoning about data, representations of data, statistical measures, uncertainty,
samples and association. In the development stage of the SRA, it was reported that
low internal reliability coefficients suggested that the instrument did not measure a
single trait, leading to some confusion as to how best to interpret the SRA results
(Garfield, 2003). One interpretation involves the calculation of eight scales of
correct reasoning and eight scales of incorrect reasoning (or misconceptions). Each
of these 16 scales is measured by between one and five items with most items
contributing to both a correct and an incorrect reasoning scale and some items
contributing to more than one incorrect reasoning scale depending on the distracter
which is chosen. Garfield (2003) reports these 16 scales in a cross-cultural
comparison but relies on the aggregate correct reasoning score and the aggregate
incorrect reasoning score for statistical analysis.
Questioning the use of aggregate scores given their reported low reliability,
Tempelaar (2004) prefers to analyse the 16 individual scales, each of which is
assumed to be reliable, as well as the correct and incorrect reasoning aggregates.
Tempelaar also suggests (and later uses in the context of structural equation
modelling (Tempelaar, 2006)) combining individual correct and incorrect reasoning
scales in accordance with the results of factor analysis. Under such a scheme, pairs
of correct and incorrect reasoning scales would be chosen for combination by virtue
of large negative correlations. As Tempelaar describes, these correlations are a result
of the design of the scales since the pairs largely consist of correct responses (in the
case of the correct reasoning scale) and selected incorrect responses (in the case of
the incorrect reasoning scale) to the same questions. Such a solution seems to be
little more than a complex construction of a partial credit scheme where the choice of
some distracters (namely those indicating a particular misconception) is seen to be
“more wrong” than the choice of other incorrect responses.
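That such negative correlations follow from the design of the scales, rather than from anything about the students, can be seen in a small simulation. In this sketch every simulated student answers every item with the same probabilities, so there is no variation in reasoning ability at all; the response probabilities, the number of items and the function name are hypothetical values chosen for illustration.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
n_students, n_items = 2000, 4

# Hypothetical response probabilities, identical for every item and student.
options = ["correct", "misconception", "other"]
weights = [0.5, 0.3, 0.2]

correct_scale, misconception_scale = [], []
for _ in range(n_students):
    responses = rng.choices(options, weights=weights, k=n_items)
    correct_scale.append(responses.count("correct"))
    misconception_scale.append(responses.count("misconception"))

r = pearson(correct_scale, misconception_scale)
print(f"correlation between the paired scales: {r:.2f}")
```

Because a response counted on the correct reasoning scale cannot also be counted on the paired incorrect reasoning scale, the two scores are negatively correlated even in the absence of any underlying trait.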
One difficulty with analysing the individual scales from the SRA would appear to be
the small number of items (between one and five) on which each scale is based. A
larger number of items with a range of difficulty would be preferable to meaningfully
interpret individual scales. This is particularly the case if comparisons are to be
made between the results for different scales as Tempelaar has done. Of course,
increasing the number of items on each scale would quickly lead to a questionnaire
of considerable length and hence impracticality with regard to its administration.
The Quantitative Reasoning Quotient (QRQ) (Sundre, 2003) is an adaptation of the
SRA developed to address the limitation of low internal consistency and to improve
the ease of scoring with regard to some of the more complex items on the SRA. In
particular, items which require students to identify the correct rationale from a list
were adapted so that students needed to indicate their agreement or disagreement
with each correct or incorrect rationale presented. The aim of this was to elicit
further information regarding students’ misconceptions from single items, consistent
with the view (Konold, 1995) that students can simultaneously hold multiple
contradictory beliefs, and also to increase the number of items from the same
domain, hopefully leading to an improvement in internal reliability.
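The expectation that more items from the same domain should improve internal reliability can be illustrated with the Spearman-Brown prediction formula, a standard psychometric result that is introduced here for illustration and is not part of the QRQ work itself. The starting reliability of 0.30 and the function name are hypothetical, and the formula assumes that the added items are comparable to the existing ones.

```python
def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a scale is lengthened by `length_factor`
    using items comparable to the existing ones."""
    return length_factor * reliability / (1.0 + (length_factor - 1.0) * reliability)


# Hypothetical starting point: a short scale with reliability 0.30.
short_scale = 0.30
for factor in (1, 2, 3, 5):
    predicted = spearman_brown(short_scale, factor)
    print(f"{factor}x as many items: predicted reliability {predicted:.2f}")
```

The predicted gain diminishes as the scale grows, which is consistent with the practical concern that a questionnaire long enough to yield reliable individual scales would be difficult to administer.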
In keeping with the use of the SRA, the QRQ focuses on scales of correct and
incorrect reasoning, rather than a total score. It includes three additional scales of
correct reasoning and seven additional scales of incorrect reasoning, although many
of the added scales of incorrect reasoning simply indicate a lack of a particular aspect
of correct reasoning, for example “failure to recognise potential sources of bias and
error”. Identification of such deficiencies seems to add little to the understanding of
statistical reasoning beyond the information contained in the corresponding correct
reasoning scale. This is in contrast to the majority of the original incorrect reasoning
scales which measure previously documented misconceptions such as an outcome
orientation (Konold, 1989) and the representativeness misconception (Kahneman et
al., 1982; Shaughnessy, 1992) which are applied within and across a range of
situations.
In another approach, Watson and Callingham (2003) use Item Response Theory to
analyse 80 questionnaire items administered to over 3000 Australian school children
and argue that statistical literacy is a single construct. While the collection of much
of these data was via interview, some involved the administration of a pen and paper
instrument focusing specifically on the understanding of variation. Included in the
80 items are a number of items also included in the SRA, as well as items specific to
the school curriculum and items taken from the media. One conclusion from this
work would be that it should be possible to measure statistical reasoning as a single
trait without resorting to individual scales which consider various aspects of correct
and incorrect reasoning. This need not detract from the analysis of individual items
or combinations thereof with the intent of adding to the understanding of students’
reasoning and misconceptions.
In this chapter, we introduce an instrument, the Statistical Reasoning Questionnaire
(SRQ), for use at the interface of secondary and tertiary education. Section 5.2
describes the construction of the SRQ, drawing on the work of Garfield, and Watson
and Callingham. Section 5.3 details the responses of the students in the study to the
individual items of the SRQ. Section 5.4 begins a discussion of the suitability of the
SRQ as an instrument to measure statistical reasoning at the secondary/tertiary
interface. This discussion continues in Section 6.5 after further analyses of the SRQ
are carried out in Chapter 6 using Rasch methods.
5.2 Construction of the Statistical Reasoning Questionnaire
(SRQ)
The Statistical Reasoning Questionnaire (SRQ) was developed specifically for this
study to assess the statistical reasoning of students entering the unit MAB101,
Statistical Data Analysis 1, at the Queensland University of Technology. As
described in Chapter 2, students in this unit are enrolled in a science degree or a
degree program associated with science. In constructing the SRQ we considered
each item of the Statistical Reasoning Assessment (SRA) (Konold, 1990; Garfield,
1991, 2003) and the items of Watson and Callingham (2003), in the context of
statistical reasoning at the secondary/tertiary interface. We chose three items
common to these two instruments and adapted two more. Another two items from
the SRA and six items from the Watson and Callingham Questionnaire were
included. The remaining nine items were developed based on experience at the
secondary/tertiary interface, with five of these specifically relevant to Australian
school syllabi, particularly the officially assumed knowledge for the unit. Table 5.1
indicates the sources and aspects of reasoning of the 22 items. The complete SRQ
can be found in Appendix F and the questions are also given in the discussion of
Section 5.3.
In developing the SRQ, we referred to the six aspects of statistical reasoning listed by
Garfield (2003). These are:
• Reasoning about data: recognising data types and understanding
implications for use;
• Reasoning about representations of data: being able to read and interpret a
graph and understand how it represents a sample; seeing beyond individual
data points to general characteristics of the distribution of the data;
• Reasoning about statistical measures: understanding and using appropriate
measures of central tendency and variation to describe and compare data;
knowing that such summaries are better for large samples;
• Reasoning about uncertainty: understanding the concept of probability to
make decisions about uncertain events; being able to determine and compare
simple probabilities;
• Reasoning about samples: understanding the relationship between a sample
and a population; knowing the importance of size and representativeness;
• Reasoning about association: being able to interpret relationships between
two variables, including reading of scatter plots and two-way tables; knowing
the difference between association and causality.
This list does not explicitly mention reasoning about variation which we also
included in our reference framework. Garfield considered the above aspects of
statistical reasoning to encompass the skills covered by a new US high school
curriculum which the SRA was designed to assess. The essential elements of these
aspects, together with reasoning about variation, were considered by us to be an
appropriate basis for assessing the statistical reasoning of students at the
secondary/tertiary interface, although we place less emphasis on determining and
comparing simple probabilities.
The first of Garfield’s aspects, reasoning about data, is assessed only implicitly in the
SRA. MAB101 includes specific and explicit teaching regarding types of data and
this emphasis is continued throughout the course in the teaching of techniques for
data analysis. This approach increases continuity within the course and helps
students to acquire skills in selecting statistical tools appropriately. While it would
be desirable to know the preparedness of students for this approach, it is difficult to
assess such understanding without relying on specifically taught vocabulary such as
the terms categorical and continuous variables. Hence in the SRQ, assessment of
reasoning about data, in the Garfield sense of recognising and using data types,
remains implicit.
The second aspect, reasoning about representations of data, is also covered only
implicitly by the SRA. As the students at the secondary/tertiary interface in this
study have generally been well exposed to reasoning about representations of data,
and the construction of items which assess this specific aspect of reasoning is
uncomplicated, it was felt that this was a distinct area of reasoning which warranted
explicit coverage in the SRQ. One aspect we identified in this area of reasoning is the
commonly held “prettier is better” misconception, by which people believe that a
more complex graphical representation is preferable to a simpler version. This
misconception is frequently developed during primary school where increased access
to computer programs enables students to select an embellished graph without regard
for understanding. The example used in the SRQ is a three-dimensional pie graph.
Absent from Garfield’s list of aspects of statistical reasoning is reasoning about
variation. This is somewhat unexpected given the emphasis in the literature on the
recognition and understanding of variation as a vital component of statistical
reasoning (see for example, Chance and Garfield (2002), Meletiou-Mavrotheris and
Lee (2002) and Watson et al. (2003)). However, some aspects of reasoning about
variation are covered in the SRA with two questions contributing to a correct
reasoning skill labelled “understands sampling variability”, which may be seen as a
component of reasoning about samples. Perhaps reasoning about variation is so
fundamental to statistical reasoning that, like reasoning about data, it too is thought
to be implicit in most aspects of reasoning. In the SRQ, reasoning about variation
features implicitly in many items and explicitly in SRQ_2, SRQ_10, SRQ_7, SRQ_9,
SRQ_13 and SRQ_16. The last four of these items also describe reasoning about
samples and hence we refer to this aspect as Reasoning about Samples and Variation.
One aspect of the SRA which was eliminated from the SRQ is a heavy reliance on
questions involving tossing of coins and rolling of dice, with seven out of twenty
questions of the SRA being framed in that context. Such reliance is typical of
traditional examples in the area of chance. There appear to be two main reasons for
this. The first is that because these contexts are familiar, they can be made accessible
to students with a minimum of explanation, a useful feature particularly with regard
to assessment items. The second reason for the reliance on these contexts is that
much of the research into the development of young children’s understanding of
chance is set in this context. The second reason depends partly on the first and also
on the fact that experiments with dice and coins (also cards, coloured marbles and
spinners) can be easily carried out in the classroom.
It has been previously argued (Gal and Garfield, 1997; Wild and Pfannkuch, 1999)
that context is crucial in statistical reasoning and that students must be involved in
genuine contexts in order to fully develop their statistical reasoning. While educators
undoubtedly intend that dice and coins are simple representations of more general
and complex contexts, the assumption that students of any age at an introductory
level are capable of making the leap from dice, coins and cards to people, days and
products is unfounded. Research indicates that errors in probabilistic reasoning are
frequently associated with problems referring to everyday life (Kahneman, et al.,
1982) even for those with some formal training (Kahneman and Tversky, 2000). It
may be that over-reliance on artificial and simple contexts prevents students from
taking skills and understanding to real world contexts. Hence we believe that such
artificial examples are better avoided wherever possible in examples and assessment.
The four questions involving dice in the SRA constitute the section “uses
combinatorial reasoning” of the scale “correctly computes probability” and are also
intended to detect either the outcome orientation misconception (in which a
probability is interpreted in terms of an individual outcome rather than a series of
events) or the equiprobability bias (which assumes that all events under consideration
are equally likely). However, the heavy reliance of these questions on calculation
means that incorrect responses which are attributed to particular misconceptions are
at least as likely to be the result of incorrect numerical calculation as an application
of a misconception. An example of this is found in the SRA question which asks
which of the outcomes, 5 black and 1 white, or 6 black, is more likely when a die
with five black sides and one white is rolled six times. These two possible
outcomes are too close in probability to be intuitive to most people and hence require
the application of the binomial distribution. Such use of combinatorics to calculate
probabilities was an earlier foundation of Queensland senior high-school
mathematics classes in probability, but is misplaced when included in an
introductory data analysis course. It is an example of emphasising calculation, and in
fact irrelevant calculation, possibly at the expense of understanding. Perhaps the
relative emphasis on questions of this nature in the SRA reflects an emphasis in the
curriculum which the SRA was designed to assess. However, as such specific
knowledge of the behaviour of probability distributions and combinatorial reasoning
is not required for MAB101, nor is it an emphasis in the current core Queensland
senior high-school mathematics curriculum, the SRQ does not include any questions
of this nature.
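The calculation demanded by the SRA die question can be sketched as follows. This is only an illustration: it assumes six independent rolls (the number implied by the two outcomes being compared), and the helper name binomial_pmf is ours rather than anything used in the SRA or the SRQ.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


n_rolls = 6      # assumption: six rolls, as implied by the two outcomes
p_black = 5 / 6  # five of the six faces are black

p_five_black = binomial_pmf(5, n_rolls, p_black)  # 5 black and 1 white
p_six_black = binomial_pmf(6, n_rolls, p_black)   # 6 black

print(f"P(5 black, 1 white) = {p_five_black:.4f}")
print(f"P(6 black)          = {p_six_black:.4f}")
```

Under these assumptions the two probabilities are about 0.40 and 0.33, which supports the point that the comparison rests on calculation rather than intuition.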
The three coin tossing items in the SRA are intended to constitute the correct
reasoning scale “understands independence” and are also intended to detect the
representativeness misconception by which people determine the likelihood of an
event by how well it represents a population. For example, one such question reads:
Which of the following sequences is most likely to result from flipping a fair coin 5
times?
a) HHHTT
b) THHTH
c) THTTT
d) HTHTH
e) all four sequences are equally likely.
(Responses a, b and d are taken to indicate a representativeness misconception.)
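As a brief check of why response e is taken as correct, the sketch below computes the probability of each listed sequence, assuming a fair coin and independent tosses. The function name sequence_probability is illustrative only.

```python
from fractions import Fraction

# The four specific sequences offered in the SRA item.
sequences = ["HHHTT", "THHTH", "THTTT", "HTHTH"]

p_head = Fraction(1, 2)  # assumption: a fair coin


def sequence_probability(seq: str) -> Fraction:
    """Probability of one specific ordered sequence of independent tosses."""
    prob = Fraction(1)
    for toss in seq:
        prob *= p_head if toss == "H" else 1 - p_head
    return prob


probabilities = {seq: sequence_probability(seq) for seq in sequences}
for seq, prob in probabilities.items():
    print(seq, prob)
```

Each ordered sequence of five tosses has probability 1/32, regardless of how "random" it looks.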
Care needs to be taken with the interpretation of this scale as the application of
independence in these three items is within the single setting of binomial trials. It is
highly likely that a student may have developed an understanding of independence in
this setting without extending it to other contexts. It is in the application of
independence to real world contexts that students (and professionals) frequently
come to grief. The SRQ uses just one coin tossing item, set in a sporting context for
greatest familiarity.
An emphasis in the work of Watson (1993; 1997) has been on the use of examples
from the media or set in a media context to assess statistical literacy. For much of
society, the media are the most common source of statistical information and provide
a context which is appreciable to a wide range of students including those at the
secondary/tertiary interface. Hence, a number of items of this nature are adapted
from Watson and Callingham (2003) for the SRQ.
A group of curriculum-specific questions (SRQ_17 to SRQ_20) were developed for
the SRQ requiring an understanding of quantiles. These questions are set in the
context of data which has been transformed by taking the logarithm to base ten, and
hence require students to combine two distinct areas of knowledge from their senior
mathematics studies. Because of the specific curriculum-dependent nature of this set
of questions and the level of mathematical knowledge they require, they are not
included in some of the analyses which are described in Chapter 7 where
relationships between statistical reasoning, numeracy and background are explored.
Another curriculum-specific question (SRQ_22) was added to the SRQ in its second
year, based on the use of the normal distribution. Inclusion of this question was
prompted by classroom observation that first-year students appeared to have less
confidence with the normal distribution than in previous years. This question
requires students to transform to the standard normal distribution and to interpret
probability as an area under the curve, utilising the symmetry of the distribution.
Another change to the SRQ in its second year was the removal of SRQ_4 and
addition of SRQ_21. The first of these questions was removed because of an
extremely high success rate of 96%. In fact the replacement question proved to be
almost as easy for students with a success rate of 94%. In the analyses of Chapter 7,
only questions common to both years are included.
As well as selecting questions to cover the different aspects of reasoning, items were
intentionally chosen with varying degrees of difficulty. In the Watson and
Callingham study, Rasch analysis is used to define levels of reasoning from Level 1:
Idiosyncratic to Level 6: Critical mathematical. As eleven of the items on the SRQ
appear in or are similar to the Watson and Callingham study, including some
common to the SRA, the Rasch analysis variable map could be used to select items
whose responses covered the full range of levels. For each question on the SRQ,
Table 5.1 reports the level of understanding demonstrated by a fully correct
response to that or a similar question in the Watson and Callingham study. As a partial
credit model was used in that study, the lower levels of understanding, particularly
levels 1 and 2, are generally consistent with partially correct responses to items.
Also included in Table 5.1 is the source of each question, as well as aspects of
reasoning, skill and misconception assessed by each item.

Question | Source | Aspect (Reasoning about…) | Skill | Misconception | Watson & Callingham Level
SRQ_1 | SRA; W&C | Statistical measures | Understands algorithm for mean | Mean is most common value; Mean is middle value | 4
SRQ_2 | SRA; W&C* | Statistical measures; Variation | Understands how to select an appropriate average | Mean is most common value; Mean calculated regardless of outliers | 6
SRQ_3 | SRA; W&C** | Uncertainty | Understands probability as a ratio | | 4
SRQ_4 | SRA; W&C | Uncertainty | Correctly interprets probabilities | Outcome approach | 3
SRQ_5 | W&C | Uncertainty | Correctly interprets probabilities | Conjunction; Availability | 3
SRQ_6 | W&C | Uncertainty | Understands independence | Representativeness | 4
SRQ_7 | SRA; W&C | Samples; Variation | Understands importance of large samples | Law of small numbers; Outcome approach | 5
SRQ_8 | W&C | Representations of data | Knows that sum of percentages should be 100% | | 5
SRQ_9 | W&C | Samples; Variation | Understands sampling-re-sampling process | | 6
SRQ_10 | SRA | Uncertainty; Variation | Correctly interprets probabilities | Outcome approach |
SRQ_11 | W&C | Representations of data | Recognises importance of scale on graph | | 6
SRQ_12 | | Representations of data | Can choose an appropriate graph | Prettier is better |
SRQ_13 | SRA | Samples; Variation | Understands sampling variation | Law of small numbers |
SRQ_14 | W&C | Statistical measures; Association | Understands importance of rate | Correlation implies causality | 6
SRQ_15 | | Association | Distinguishes between correlation and causality | Correlation implies causality |
SRQ_16 | | Samples; Variation | Understands sampling variation | |
SRQ_17 | | Statistical measures | Can calculate a median | |
SRQ_18 | | Statistical measures | Can calculate a lower quartile | |
SRQ_19 | | Statistical measures | Understands effect of data transformation on median | |
SRQ_20 | | Statistical measures | Understands effect of data transformation | |
SRQ_21 | | Representations of data | Recognises importance of scale on graph | |
SRQ_22 | | Uncertainty | Understands use of standard normal curve | |

Table 5.1 Questions in the SRQ are chosen to assess all aspects of reasoning at the full range of levels
* SRQ_2 was a multiple-choice question in SRA and W&C
** SRQ_3 used less obvious values than in SRA and W&C
5.3 Results of the SRQ
MAB101 students completed the Statistical Reasoning Questionnaire in class during
the first week of semester. For the SRQ, students were permitted to use calculators
(unlike the Numeracy Questionnaire). Although the majority of students found the
time in class ample, a handful of students who required more time opted to complete
the questionnaire at home. It was clearly explained to students that, although the
questionnaire was relevant to their studies in the subject, the results obtained would
have no bearing on their grade, but would be used solely for research purposes. The
majority of students who attended classes during the first week of semester were
willing to participate, with approximately 66% of enrolled students completing the
SRQ.
The SRQ was administered to three separate cohorts of students: first semester 2004,
second semester 2004 and first semester 2005. Where students were included in the
study within more than one cohort, only one set of data was included. This was
selected such that it belonged to the cohort in which the student had completed a
larger number of survey instruments over the entire study, or, where an equal number
had been completed, the most recent cohort was selected. There were ten students
for whom such a decision needed to be made with regard to the SRQ.
The distribution of results over the three cohorts was remarkably similar. For the
nineteen questions common to both years, Figure 5.1 shows the summary statistics
and boxplots of the total score for each of the three cohorts. A one-way ANOVA for
differences in the mean scores gives p>0.3. Further inspection of responses to
individual questions demonstrated further consistency between the three cohorts. For
this reason the groups are pooled for the discussion of individual questions which
follows.
Descriptive Statistics: SRQ_Com

Variable  Cohort    N   Mean  SE Mean  StDev
SRQ_Com   I_04    300  9.193    0.166  2.877
          I_05    237  8.865    0.196  3.011
          II_04    75  9.267    0.322  2.787

Variable  Cohort  Minimum  Q1  Median  Q3  Maximum
SRQ_Com   I_04          0   7       9  11       18
          I_05          2   7       9  11       18
          II_04         4   7       9  11       15
[Boxplot of SRQ Total (Common questions) by Cohort; x-axis: Cohort (I_04, I_05, II_04); y-axis: SRQ_Com, scale 0 to 20]
Figure 5.1 The distribution of SRQ Scores is consistent across the three cohorts.
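The one-way ANOVA quoted for the three cohorts can be approximately reproduced from the summary statistics reported with Figure 5.1. The sketch below is a reconstruction under stated assumptions, not the original analysis: it uses the rounded N, mean and StDev values, and it relies on the closed-form tail probability of the F distribution that holds only when the numerator degrees of freedom equal 2 (three groups).

```python
# Summary statistics as reported with Figure 5.1:
# cohort -> (N, mean, standard deviation)
cohorts = {
    "I_04": (300, 9.193, 2.877),
    "I_05": (237, 8.865, 3.011),
    "II_04": (75, 9.267, 2.787),
}

n_total = sum(n for n, _, _ in cohorts.values())
grand_mean = sum(n * m for n, m, _ in cohorts.values()) / n_total

# Between-groups and within-groups sums of squares.
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in cohorts.values())
ss_within = sum((n - 1) * s**2 for n, _, s in cohorts.values())

df_between = len(cohorts) - 1      # 2
df_within = n_total - len(cohorts)  # 609

f_stat = (ss_between / df_between) / (ss_within / df_within)

# For an F distribution with 2 numerator degrees of freedom,
# P(F > x) = (1 + 2x / df_within) ** (-df_within / 2).
p_value = (1 + 2 * f_stat / df_within) ** (-df_within / 2)

print(f"F({df_between}, {df_within}) = {f_stat:.3f}, p = {p_value:.3f}")
```

With these rounded inputs the F statistic is close to 1 and the p-value is roughly 0.36, in line with the reported p>0.3.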
5.3.1 Student responses to individual items
This section discusses students’ responses to individual items with reference to the
main aspect of reasoning assessed. All items, but not all responses, are discussed
here. A complete summary of responses can be found in Appendix G.
Reasoning about representations of data
Items SRQ_8, SRQ_11, SRQ_12 and SRQ_21 assess reasoning about representations
of data. SRQ_8 requires students to critically read a pie chart from the media and
recognise that it sums to more than 100%.
SRQ_8 Is there a problem with the following pie
chart? If so, identify the problem.
A fully correct response was given by 72% of students while another 6% recognised
that the pie chart was not drawn in correct proportion but did not take the final step
of checking the percentages.
In item SRQ_12, students are required to choose between a two-dimensional and a
three-dimensional pie chart, giving reasons for their choice.
SRQ_12 Another group of students carried out a survey at the local library
regarding the most frequent reason for using the internet. They
produced the following two graphs.
[Graph A and Graph B: pie charts titled “Internet Usage”, one two-dimensional and one
three-dimensional, each showing: contact friends 33%; study 22%; entertainment 17%;
work 14%; information 6%; other 8%]
Which of the graphs, A or B, would you recommend the students use
and why?
Fifty-eight percent of students chose the two-dimensional graph, justifying their
choice with explicit or implicit reference to the greater accuracy of the two-
dimensional representation. While 28% preferred the three-dimensional version,
only 7% of students demonstrated a “prettier is better” misconception by choosing
the three-dimensional graph and giving an incorrect reason such as “it’s more
professional.” This result is pleasing, as the issue has been emphasised in recent years in
professional development workshops and other advice to teachers.
Items SRQ_11 and SRQ_21 both require an appreciation of the importance of scale
in reading a graph.
SRQ_11 A group of students recorded the number of years their families had
lived in their town. Here are two graphs that the students drew to
illustrate their results.
Graph 1
[Dot plot of the results; x-axis tick labels: 0 1 2 3 4 5 6 10 11 12 13 14 17 25 37]
YEARS IN TOWN
Graph 2
[Dot plot of the results; x-axis tick labels: 0 5 10 15 20 25 30 35]
YEARS IN TOWN
Which of these two graphs (1 or 2) would you recommend the
students use and why?
SRQ_21 The graph below shows the number of members of the American
Mathematical Society.
[Bar graph headed “Join the ever increasing number of professionals who enjoy the
benefits of AMS membership!”; y-axis: Membership in thousands, tick labels 18, 20, 22,
24, 26, 28, 29; x-axis: 1987, 1988, 1989, 1990, 1991]
Circle any of the following statements which are correct:
a) Membership of the AMS in 1991 was twice what it was in 1989.
b) Between 1987 and 1991 membership of the AMS doubled every
two years.
c) Membership of the AMS could reasonably be expected to reach
50000 by 1992.
d) None of the above statements is correct.
In item SRQ_21 (introduced in the second year of the questionnaire), students need
to read the scale on the y-axis of the graph and not rely on the visual illusion created
by the size of the bars. Nearly all students (94%) did this correctly. Item SRQ_11 is
more difficult (with a 69% success rate), although a non-constant scale, this time along
the x-axis, is still the focus of the question. The most likely reason for the degree of
difference in the correct response rate to these questions is that in SRQ_21, the form
of the question requires students to make a numerical comparison which prompts
them to read the values on the graph rather than rely on the visual image. It is
possible that some of these students answer correctly without even noticing the
discrepancy between the numerical and visual interpretation of the graph. Item
SRQ_11, by comparison, is open-ended and does not prompt a numerical reading of
the graph so that students are less likely to detect the changing scale.
Reasoning about statistical measures
Items SRQ_1 and SRQ_2 assess reasoning about statistical measures, as do the more
curriculum-based items SRQ_17 to SRQ_20. SRQ_1 requires an interpretation of
the statement that the average number of children is 2.2.
SRQ_1 To get the average number of children per family in a small town, a
teacher counted the total number of children in the town. She then
divided by 50, the total number of families. The average number of
children per family was 2.2.
Which of the following is certain to be true?
a) Half of the families in the town have more than 2 children.
b) More families in the town have 3 children than have 2 children.
c) There are a total of 110 children in the town.
d) There are 2.2 children in the town for every adult.
e) The most common number of children in a family is 2.
f) None of the above.
In the SRA component scales, a correct answer to this question is counted towards
the correct reasoning scale “understands how to select an appropriate average”. This
interpretation is perhaps a little generous as all selection has already been carried out
for the student. Rather, the question requires interpretation of the mean in terms of
the algebraic unpacking of the algorithm. In this question students are able to choose
more than one response. Two-thirds of students chose only the correct response to
this question. The misconception that the mean is always the middle value was
demonstrated by 3% of students, while the belief that the mean is the most common
value was demonstrated by 19%.
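The algebraic unpacking required by SRQ_1 can be illustrated with a short sketch. The town constructed below is entirely hypothetical; it is chosen only to show that a mean of 2.2 over 50 families fixes the total at 110 children while leaving the other listed statements free to be false.

```python
from collections import Counter

n_families = 50
mean_children = 2.2

# Unpacking the algorithm for the mean: total = mean x number of families.
total_children = round(mean_children * n_families)

# Hypothetical town (illustration only): 28 families with no children
# and 22 families with 5 children each.
children_per_family = [0] * 28 + [5] * 22
counts = Counter(children_per_family)

hypothetical_mean = sum(children_per_family) / len(children_per_family)
families_with_more_than_2 = sum(1 for c in children_per_family if c > 2)
most_common_number = counts.most_common(1)[0][0]

print(total_children, hypothetical_mean)
print(families_with_more_than_2, counts[3], counts[2], most_common_number)
```

In this hypothetical town the mean is still 2.2 and the total is still 110, yet fewer than half of the families have more than 2 children, no family has 2 or 3 children, and the most common number of children is 0.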
SRQ_2 requires the selection of an appropriate average for a sample in which, given
the context, one value is clearly an error.
SRQ_2 A small object was weighed on the same scales separately by nine
students in a science lab. The weights (in grams) recorded by each
student are shown below.
6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3
The true weight could be estimated in several ways.
How would you estimate it?
Only 37% of students demonstrated their appreciation of the context of the question
by removing the outlier before calculating the mean. An interesting response, given
by 9% of students, was to delete both the maximum and minimum values before
calculating the mean. Although this response shows some degree of understanding,
it demonstrates the same lack of engagement with context as many other responses.
The mode, either on its own or in combination with another measure, was suggested
as an appropriate average by 10% of students; this reflects a known problem with
school syllabi and school texts in incorrectly including mode as a ‘measure of centre’
without qualification.
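The effect of the erroneous value in SRQ_2 on each of the estimates discussed above can be seen in a small sketch using Python's standard statistics module. Treating 15.3 as the error is the contextual judgement described in the text; the variable names are illustrative.

```python
import statistics

# Weights (in grams) recorded by the nine students in SRQ_2.
weights = [6.3, 6.0, 6.0, 15.3, 6.1, 6.3, 6.2, 6.15, 6.3]

mean_all = statistics.mean(weights)

# Remove the value that is clearly an error in context.
without_outlier = [w for w in weights if w != 15.3]
mean_without_outlier = statistics.mean(without_outlier)

# Delete both the maximum and the minimum before averaging.
trimmed = sorted(weights)[1:-1]
mean_trimmed = statistics.mean(trimmed)

median_all = statistics.median(weights)
mode_all = statistics.mode(weights)

print(f"mean of all values:       {mean_all:.3f}")
print(f"mean without 15.3:        {mean_without_outlier:.3f}")
print(f"mean without max and min: {mean_trimmed:.3f}")
print(f"median: {median_all}, mode: {mode_all}")
```

The mean of all nine values is about 7.18 g, well above every plausible reading, whereas the mean without the outlier is about 6.17 g and the median is 6.2 g.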
As well as being specifically curriculum-dependent, SRQ_17 to SRQ_20 are more
computation and terminology based than most of the other questions. They assess
whether students can find the median (SRQ_17) and first quartile (SRQ_18) of the
data set and whether they understand the relationship between transformed and
original data (SRQ_19 and SRQ_20).
Total rainfall (in millimetres) for the month of November 2003 was collected from 25
weather stations over a region of northern Australia. The data below represent the
logarithms (to base 10) of the rainfall. These values have been ordered from
smallest to largest for ease of manipulation.
0.00 1.04 1.04 1.18 1.36
1.54 1.56 1.59 1.62 1.62
1.64 1.64 1.71 1.71 1.75
1.76 1.77 1.77 1.80 1.86
1.92 1.95 2.03 2.14 2.41
SRQ_17 What is the median of the above data set (i.e. the median of the log
of the rainfalls)?
SRQ_18 What is the lower quartile of the data set?
SRQ_19 Using your answer to question SRQ_17, calculate the median total
rainfall (in millimetres) for the region.
SRQ_20 What was the highest total rainfall recorded for the month in the
region?
The median (the 13th value in a set of 25) was correctly identified by 67% of
students, with another 8% choosing a nearly correct value (such as the 12th). Finding
the first quartile proved far more difficult. The 7th or the average of the 6th and 7th
values was accepted as a correct response, with 14% of students giving one of these.
It is particularly interesting to note that 23% of students gave a range of values (such
as 0 to 1.56) in response to this question. This concept that a quartile is a range of
values appears to be quite prevalent at high school and is one of which tertiary
educators are generally unaware. Although the idea of dividing the distribution into
quarters may still be part of the students’ understanding, it takes some rethinking on
the students’ part to move their understanding of the terminology from a range of
values to a single value and they are likely to need some specific teaching to do so.
The results of SRQ_19 and SRQ_20 show that very few students know how to back-
transform from the logarithm of the data to the original data despite studying logs at
school. In SRQ_19, 17% of students gave the correct transformation of the median,
6% reported the median without transformation, 11% gave an incorrect but plausible
transformation (i.e. one involving logs, exponents or powers of 10) while 53% gave
no response. In responding to SRQ_20, 17% of students gave the correct
transformation of the maximum, 39% reported the maximum without transformation,
11% gave an incorrect but plausible transformation and 30% gave no response. The
difference between the numbers of students who simply gave the recorded value in
the two questions is almost certainly due to the hint given in SRQ_19 to use the
answer to SRQ_17. With the hint and the preceding question, the students knew that
they needed to do something to the data but did not know what, hence the high
non-response rate. Without the hint (even though the context remains unchanged and
SRQ_17 has already been attempted), it is very probable that the students did not
realise that the maximum needed to be back-transformed.
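The calculations behind SRQ_17 to SRQ_20 can be sketched as follows, using the rainfall data set given above. The two quartile values follow the conventions accepted as correct in the marking (the 7th value, or the average of the 6th and 7th); the back-transformation simply raises 10 to the power of the logged value, which preserves the median and maximum because the logarithm is an increasing function.

```python
import statistics

# Logarithms (base 10) of total rainfall in mm, ordered smallest to largest.
log_rain = [
    0.00, 1.04, 1.04, 1.18, 1.36,
    1.54, 1.56, 1.59, 1.62, 1.62,
    1.64, 1.64, 1.71, 1.71, 1.75,
    1.76, 1.77, 1.77, 1.80, 1.86,
    1.92, 1.95, 2.03, 2.14, 2.41,
]

# SRQ_17: the median of 25 ordered values is the 13th value.
median_log = statistics.median(log_rain)

# SRQ_18: the two answers accepted as the lower quartile.
q1_seventh = log_rain[6]
q1_average = (log_rain[5] + log_rain[6]) / 2

# SRQ_19 and SRQ_20: back-transform from log10 to millimetres.
median_mm = 10 ** median_log
max_mm = 10 ** max(log_rain)

print(f"median of logs: {median_log}")
print(f"lower quartile: {q1_seventh} or {q1_average:.2f}")
print(f"median rainfall: {median_mm:.1f} mm")
print(f"highest rainfall: {max_mm:.1f} mm")
```

The back-transformed median is about 51.3 mm and the highest rainfall about 257.0 mm.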
Reasoning about uncertainty
Reasoning about uncertainty is addressed by SRQ_3 through to SRQ_6. SRQ_3
requires students to understand probability in terms of a ratio.
SRQ_3 Box A and Box B are filled with red and blue marbles as follows.
Box A: 12 red, 8 blue          Box B: 30 red, 20 blue
Each box is shaken. In order to win a ticket to a sporting match, you
need to get a blue marble, but you are only allowed to pick out one
marble without looking. Which box should you choose?
a) Box A (with 12 red and 8 blue).
b) Box B (with 30 red and 20 blue).
c) It doesn’t matter which box is chosen.
An accurate comparison of the two probabilities was made by 82% of students.
Rough working on the questionnaire and classroom discussion later in the semester
suggested that the majority of errors in this question were the result of inaccurate
calculation rather than the misunderstanding of probabilities. A typical error was to
calculate the ratio of red to blue marbles in one box but red to all marbles in the
other. It is suggested that a lack of confidence and familiarity with common
fractions (see Section 4.3.1) is the cause of this error. This item receives further
attention in Section 7.4.
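The comparison required in SRQ_3, and the typical error described above, can be sketched with exact fractions. Which box received which ratio in any particular student's working is not recorded, so the assignment in the second half of the sketch is an assumption made for illustration.

```python
from fractions import Fraction

# Marble counts given in SRQ_3.
box_a = {"red": 12, "blue": 8}
box_b = {"red": 30, "blue": 20}


def p_blue(box: dict) -> Fraction:
    """Probability of drawing a blue marble: blue over all marbles."""
    return Fraction(box["blue"], box["red"] + box["blue"])


p_blue_a = p_blue(box_a)
p_blue_b = p_blue(box_b)
print(p_blue_a, p_blue_b)

# The typical error: mixing two different ratios (assumed assignment).
red_to_blue_a = Fraction(box_a["red"], box_a["blue"])                # 3/2
red_to_all_b = Fraction(box_b["red"], box_b["red"] + box_b["blue"])  # 3/5
print(red_to_blue_a, red_to_all_b)
```

Both probabilities reduce to 2/5, so the choice of box does not matter, whereas the mixed ratios 3/2 and 3/5 are not comparable and wrongly suggest a difference.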
SRQ_4, requiring a correct interpretation of a probability in a daily context, was
correctly handled by nearly all students (96%) in 2004, leading to its non-inclusion
in the 2005 version of the SRQ.
SRQ_4 A bottle of medicine has the following printed on it: WARNING: For
applications to skin areas there is a 15% chance of getting a rash. If
you get a rash, consult your doctor. How would you interpret this?
a) Don’t use the medicine on your skin – there’s a good chance of
getting a rash.
b) For application to the skin, apply only 15% of the recommended
dose.
c) If you get a rash, it will probably involve only 15% of the skin.
d) About 15 out of every 100 people who use this medicine get a
rash.
e) There is hardly any chance of getting a rash using this
medicine.
The choice of the distracter which interprets a 15% chance as ‘hardly ever’ is
classified as demonstration of an outcome approach by Garfield (2003).
At this degree of difficulty, demonstration of this misconception is minimal (2% of
students). In SRQ_10, however, 44% of students demonstrated this
same misconception by claiming that a weather prediction which gave a 70% chance
of rain was very good if it rained on 85 to 100% of days for which such a prediction
was made. This result is comparable to that (46%) reported by Konold (1995).
SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of
their weather forecasts. They searched their records for those days
when the forecaster had reported a 70% chance of rain. For those
particular days (that is, those days for which the forecast was stated
as a 70% chance of rain), they compared the forecast with records
of whether or not it actually rained.
The forecast of 70% chance of rain can be considered very accurate
if it rained on:
a) 95% - 100% of those days
b) 85% - 94% of those days
c) 75% - 84% of those days
d) 65% - 74% of those days
e) 55% - 64% of those days
In the SRQ 46% of students answered this question correctly.
SRQ_5 requires students to understand probabilities involved in two events and their
conjunction.
SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of
the following is least likely?
a) The man is over 55.
b) The man has had a heart attack.
c) The man is over 55 and has had a heart attack.
Both incorrect responses are evidence of the conjunction fallacy (in this case
believing that being over 55 and having a heart attack is more probable than either
one of the individual events). No doubt the belief stems from the use
of an availability heuristic (Kahneman, et al., 1982), as it is easier to call to mind
examples of the conjunction than of the individual events. Sixty-two percent of
students answered this question correctly.
SRQ_6 requires students to understand independence in a coin tossing context.
SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in
your previous 9 matches. For the next 4 tosses of the coin, you
choose heads. Tails comes up 4 times. For the 5th toss, what
should you choose?
a) Heads
b) Tails
c) It doesn’t matter
What is the probability of getting heads on this 5th toss?
What is the probability of getting tails on this 5th toss?
Note: assume only fair coins are used in tosses at cricket matches!
It was thought that the wording of this question in first semester 2004, which
included the phrase “Suppose you decide to choose heads from now on”, may have
led to some students believing that heads was the correct choice for the 5th toss while
acknowledging that the probabilities of heads and tails were equal. For this reason,
the leading phrase was deleted in second semester and subsequently. In fact, the
percentage of students who chose heads but gave both probabilities as equal to a half
fluctuated from 9% to 11% to 7% across the three cohorts. Because this change was
minimal and inconsistent, the results from the three cohorts have been pooled
in the analyses despite the slightly different wording. A complete understanding of
independence in the context of this question was demonstrated by 76% of students.
SRQ_22 regarding interpretation of normal probabilities was introduced to the SRQ
in 2005 because it was observed in 2004 that a slight change in emphasis at school
level (see below) was having a large effect on students’ abilities with visualisation
and sketching in this context.
SRQ_22 The heights of first year female university students are normally
distributed with a mean of 165 cm and a variance of 4 cm². The
graphs below are all of the standard normal distribution, that is,
normal with mean 0 and variance 1. Choose the graph in which the
shaded area gives the probability that a randomly chosen first year
female student has a height of more than 161 cm.
[Five graphs of the standard normal curve, labelled a) to e), each with a different
shaded area; horizontal axes marked from -3 to 3]
A correct response involves the ability to transform from a given mean and variance
to a standard normal distribution and to interpret a probability as the area under the
curve, utilising the symmetry of the distribution. This is core content in all school
mathematics syllabi done by these students. A complete understanding was
demonstrated by 26% of students. One of the possible reasons for the difficulty
students experienced with this question is the increasing emphasis on graphics
calculators in senior high school mathematics. While judicious use of this technology
can enhance a student’s understanding, inappropriate use can have the opposite effect
(Ben-Zvi, 2000). When students were required to transform, draw a diagram,
manipulate and look up tables to calculate a probability, they acquired skills which
could be both transferred and generalised. When a graphics calculator is used to
calculate a normal probability, three or four numbers are entered and an answer
obtained with little understanding being gained regarding the visualisation of the
normal distribution or the relationship between values, areas and probabilities, and
almost no skills acquired which can be readily transferred to a different technological
environment.
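The transformation that SRQ_22 relies on can be written out as a short calculation. This is a sketch only, using the mean and variance stated in the item; the function name `standard_normal_cdf` is an illustrative helper, not something used in the thesis.

```python
from math import erf, sqrt

def standard_normal_cdf(z: float) -> float:
    """Cumulative probability P(Z <= z) for a standard normal variable."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))

mean_cm = 165.0
variance_cm2 = 4.0
sd_cm = sqrt(variance_cm2)            # standard deviation is 2 cm

# Standardise the height of 161 cm.
z = (161.0 - mean_cm) / sd_cm         # -2.0

# P(height > 161) equals the area to the right of z = -2 under the
# standard normal curve, which by symmetry equals the area left of z = 2.
p_taller = 1.0 - standard_normal_cdf(z)
print(z, round(p_taller, 4))          # -2.0 0.9772
```

The correct graph is therefore the one in which everything to the right of -2 is shaded.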
Reasoning about samples and variation
Reasoning about samples is assessed by SRQ_7, SRQ_9, SRQ_13 and SRQ_16.
These items also include reasoning about variation more explicitly than other items.
SRQ_7 assesses an understanding of the importance of large samples, asking
whether a decision would be better made on the basis of a large or small sample.
SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. She
wants whichever car will break down the least. First she read in
Consumer Reports that for 400 cars of each type, the Toyota had
more breakdowns than the Honda. Then she talked to three friends.
Two were Toyota owners, who had no major breakdowns. The other
friend used to own a Honda, but it had lots of break-downs, so he
sold it. He said he’d never buy another Honda. Which car should
Mrs Jones buy?
a) Mrs Jones should buy the Toyota, because her friend had so
much trouble with his Honda, while the other friends had no
trouble with their Toyotas.
b) She should buy the Honda, because the information about
break-downs in Consumer Reports is based on many cases,
not just one or two cases.
c) It doesn’t matter which car she buys. Whichever type she gets,
she could still be unlucky and get stuck with a particular car that
would need a lot of repairs.
Sixty-seven percent of students demonstrated their understanding of the importance
of large samples. The response that the choice does not matter, as anything could
happen, is said to indicate an outcome approach (Garfield, 2003). While 30% of
students chose this response, it is difficult to know whether or not they would employ
such an attitude in a real life situation. Perhaps their response is a little flippant,
exhibiting a general mistrust of survey results.
In SRQ_9 students are asked to estimate the size of a population on the basis of a tag
and recapture process.
SRQ_9 A farmer wants to know how many fish are in his dam. He took out
200 fish and tagged each of them. He put the tagged fish back in
the dam and let them get mixed with the others. On the second day,
he took out 250 fish in a random manner, and found that 25 of them
were tagged. Estimate how many fish are in the dam.
While a situation such as this is most likely outside the experience of students at this
level, an intuitive understanding of estimation, samples and representativeness
should result in a correct response as demonstrated by 55% of students, with another
11% giving a ‘nearly correct’ answer (usually the result of miscalculating the ratio or
taking the wrong ratio, illustrating once again a lack of confidence with fractions).
This question was notable for the number of different responses: 60 in all. Seventeen
percent of students gave the response 425 or similar. This value is obtained by
counting up the number of distinct fish, tagged and untagged, which are seen in the
process. Such a response demonstrates a lack of appreciation for variation and the
underlying concept of estimation which is surprising for students at this level.
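The proportional reasoning behind the correct response to SRQ_9, and the counting that produces the response of 425, can be set side by side. This is a minimal sketch using only the numbers given in the item; the variable names are illustrative.

```python
tagged_first_day = 200        # fish tagged and returned to the dam
caught_second_day = 250       # size of the second random sample
tagged_in_second_catch = 25   # tagged fish found in the second sample

# Proportional estimate: the tagged proportion in the sample (25/250)
# is taken to represent the tagged proportion in the dam (200/N).
estimated_population = tagged_first_day * caught_second_day / tagged_in_second_catch
print(estimated_population)   # 2000.0

# The '425' response simply counts the distinct fish that were seen.
distinct_fish_seen = tagged_first_day + caught_second_day - tagged_in_second_catch
print(distinct_fish_seen)     # 425
```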
Question SRQ_13, the hospital problem, originating from Kahneman and Tversky
(1972), requires an appreciation of the variation in large and small samples.
SRQ_13 Half of all newborns are girls and half are boys. Hospital A records
an average of 50 births a day. Hospital B records an average of 10
births a day. On a particular day, which hospital is more likely to
record 80% or more female births?
a) Hospital A (with an average of 50 births a day).
b) Hospital B (with an average of 10 births a day).
c) The two hospitals are equally likely to record such an event.
Students who believe that the likelihood of observing 80% or more female births is
the same regardless of sample size are said to be susceptible to the “law of small
numbers”. This misconception was demonstrated by 58% of students, while 35%
demonstrated an appreciation of the importance of sample size by correctly selecting
the smaller hospital. In Kahneman and Tversky’s original sample of 15 to 18 year-
olds, a similar percentage of students (56%) demonstrated the misconception of “the
law of small numbers”, but the choice between the large and the small hospital was
divided almost equally, whereas in this group only 6% chose the larger hospital (1%
did not respond).
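The effect of sample size in the hospital problem can be made concrete with an exact binomial calculation. The sketch below makes a simplifying assumption that is not in the item: each hospital is treated as recording exactly its average number of births (10 or 50) on the day in question, with each birth independently a girl with probability one half.

```python
from math import comb

def prob_at_least_80_percent_girls(births: int) -> float:
    """P(at least 80% of `births` are girls) when each birth is a girl with probability 0.5."""
    # Smallest whole number of girls that is at least 80% of the births
    # (integer ceiling of 4 * births / 5, avoiding floating-point rounding).
    threshold = -(-4 * births // 5)
    favourable = sum(comb(births, k) for k in range(threshold, births + 1))
    return favourable / 2 ** births

p_small = prob_at_least_80_percent_girls(10)   # Hospital B
p_large = prob_at_least_80_percent_girls(50)   # Hospital A
print(p_small)             # 0.0546875
print(p_large < p_small)   # True
```

Under these assumptions the smaller hospital sees such a day roughly once in eighteen days, while for the larger hospital the event is far rarer.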
In SRQ_16, students were asked to decide which, if any, of three samples
demonstrated greater than expected variation from a theoretical set of proportions.
One sample was significantly different from the theoretical model, while the other
two were well within acceptable variation.
SRQ_16 A Brisbane City Council brochure states that on a typical summer
weekend, users of the Goodwill Bridge fall into the following age
groups.
Age group percentage of users
0-10 5
11-20 20
21-30 40
31-40 15
41-50 8
51-60 5
61-70 5
71+ 2
One typical summer weekend, 100 people were observed crossing
the bridge. Which of the data sets given below would cause you to
question the information in the brochure?
a) Set A only.
b) Set B only.
c) Set C only.
d) Set A and B only.
e) Set A and C only.
f) Set B and C only.
g) Set A, B and C.
h) None of A, B or C.
Percentage of users in each data set
Age group   Set A   Set B   Set C
0-10          5      10       7
11-20        13      20      19
21-30        32      45      43
31-40        14      10      13
41-50        13       5      10
51-60        12       5       3
61-70        11       3       4
71+           0       2       1
Forty-two percent of students correctly identified the single, clearly unusual sample;
38% felt that none of the three samples threw doubt on the theoretical proportions;
10% chose the extreme sample together with others; while the remainder chose
samples other than the truly anomalous sample. This suggests that most incoming
students have an understanding of sample variation, but that the extent of acceptable
variation is difficult for them to assess.
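The text does not state how "acceptable variation" was judged for the three data sets. One conventional check, assumed here purely for illustration, is a chi-square goodness-of-fit statistic against the brochure percentages, compared with the standard 5% critical value for 7 degrees of freedom (14.067). Note that the expected count in the 71+ group is only 2, so the chi-square approximation is rough; the sketch is indicative rather than a formal test.

```python
# Brochure percentages double as expected counts for a sample of 100 people.
expected = [5, 20, 40, 15, 8, 5, 5, 2]

observed = {
    "A": [5, 13, 32, 14, 13, 12, 11, 0],
    "B": [10, 20, 45, 10, 5, 5, 3, 2],
    "C": [7, 19, 43, 13, 10, 3, 4, 1],
}

# Upper 5% point of the chi-square distribution with 8 - 1 = 7 degrees of
# freedom (standard table value; an assumption of this sketch, not the thesis).
CRITICAL_VALUE = 14.067

def chi_square_statistic(obs, exp):
    """Sum of (observed - expected)^2 / expected over the age groups."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

statistics = {name: chi_square_statistic(obs, expected) for name, obs in observed.items()}
flagged = [name for name, value in statistics.items() if value > CRITICAL_VALUE]

for name, value in statistics.items():
    print(name, round(value, 2))
print(flagged)   # ['A']
```

On this criterion only Set A departs markedly from the brochure, which agrees with the description of one clearly unusual sample and two within acceptable variation.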
Reasoning about association
SRQ_14 and SRQ_15 both deal with reasoning about association. In SRQ_14, a
researcher draws a conclusion regarding heart-related deaths and vehicle-registration
on the basis of an increase in the numbers of both.
SRQ_14 A local newspaper published the following article:
Family car is killing us, says researcher
Twenty years of research has convinced Mr Robinson that motoring is a health hazard. Studying figures from the Australian Bureau of Statistics, Mr Robinson has produced graphs which show quite dramatically that as the numbers of new vehicle registrations increase, so have the numbers of deaths due to heart-related causes.
Do you agree with Mr Robinson’s findings? (Please explain your
response.)
In SRQ_15, a conclusion is drawn relating academic achievement to learning to play
a musical instrument.
SRQ_15 A music teacher was pleased to read the following article in a
professional journal:
Instrumental Lessons Improve OPs.
Research has shown that learning to play a musical instrument during primary school increases a child’s chance of attaining a good OP. In a longitudinal study, students enrolled in primary schools across Queensland since 1990 have been followed until they graduated from high school. Of those students who were involved in an instrument program, 20% obtained an OP of 5 or better, while the figure was 15% for those not involved in instrumental music.
Do you agree with these research findings? (Please explain your
response.)
A fully correct response to SRQ_15 required the discussion of other factors
(effectively confounders) which may have influenced the result. A similar response
to SRQ_14 was not credited as fully correct as a more fundamental issue in this
situation was seen to be the realisation that due to an increasing population, some
sort of rate would need to be calculated, even if more subtle aspects were then
considered. Only 16% of students identified the increasing population as a problem
in SRQ_14, while 27% discussed possible confounders. Interestingly in SRQ_15
only 18% raised the issue of confounders. In fact, while only 8% of students agreed
with the researcher’s conclusion in SRQ_14 (3% non-response rate), 40% did so in
SRQ_15 (8% non-response rate). Undoubtedly this difference demonstrates the
extent to which prior beliefs are able to influence one’s ability to think critically
regarding analysis of data, as while a link between tertiary entrance scores and
musical exposure is plausible, a link between heart disease and car registrations is
not. Another way in which this is demonstrated is by the most common (31%)
response to SRQ_14, namely disagreement with the findings solely on the basis that
the two factors are “unrelated” or that the result is “just coincidence”. In this
statement students are justifying their response with prior belief, without legitimate
criticism of the analysis. These two items illustrate the tension that exists for the
user of statistics between personal preconception, statistically supported studies
(good or otherwise) and input from other sources.
5.4 Discussion - Suitability of the SRQ
The Statistical Reasoning Questionnaire was developed to assess the statistical
understanding of students at the interface of secondary and tertiary education
enrolling in a science or science-related degree program in an Australian university.
Its development drew on the work of Garfield (1991; 2003) whose Statistical
Reasoning Assessment was designed for students of similar age who had undertaken
a new high school statistics curriculum in the US, and the work of Watson and
Callingham (2003) who had interviewed Australian school children from primary to
junior high school. In designing the SRQ, particular attention was given to selecting
questions at an appropriate range of levels and avoiding questions based on
combinatorial reasoning and relying on artificial examples of coin tossing and dice
throwing.
In administering the SRQ to the three cohorts of students, the distribution of total
scores remained remarkably consistent. This consistency of outcomes is an
indication of the reliability of the questionnaire as a measuring tool and its suitability
for students at this level. The range of student success rate for different items (from
14% to 96%) is indicative of its appropriateness for the wide range of abilities and
backgrounds present in these students at the secondary/tertiary interface. Further
analysis of the questionnaire is reported in Chapter 6.
Chapter 6 Rasch Analyses for the Statistical
Reasoning Questionnaire
6.1 Introduction
In this chapter, the Statistical Reasoning Questionnaire is analysed using Rasch
methods. Two different approaches are taken. In the first
approach, described in Section 6.2, responses are scored dichotomously and the
simple Rasch model fitted as with the Numeracy Questionnaire in Chapter 4. This
dichotomous approach is used as a basis for the definition of the SRQ Score.
In the second approach, described in Section 6.3, responses are scored
polychotomously and the more complex Rasch partial credit model fitted to the data.
This model forms the foundation for the introduction of the SRQPC Score. The
construction of this SRQPC Score is a significant development and extension of the
work of Watson and Callingham (2003) and applies their framework of statistical
literacy in a new scoring approach. In Section 6.4, the Rasch partial credit model is
used to investigate the expected responses of students to individual items in the SRQ.
Section 6.5 concludes this chapter with a critical examination of the SRQ regarding
its suitability for use at the secondary/tertiary interface.
6.2 Dichotomous Rasch Model
The simplest technique for combining items on the SRQ is to assign to each fully
correct response a score of one and to all other responses a score of zero. Under this
system, the total score is simply the number of fully correct responses.
As described in Chapter 4, the dichotomous Rasch model is given by:
Equation 6.1
\[
P(X_{ni} = x) = \frac{\exp\bigl(x(\beta_n - \delta_i)\bigr)}{1 + \exp(\beta_n - \delta_i)}; \quad x = 0, 1; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L;
\]
where $X_{ni}$ is the response of person $n$ on item $i$, $\beta_n$ is the ability parameter
associated with person $n$ and $\delta_i$ is the difficulty parameter for item $i$.
This model has the advantage that the existence of sufficient statistics, $R_n$ (the total
score for person $n$) for $\beta_n$, and $S_i$ (the number of correct responses to item $i$) for $\delta_i$,
allows the estimation of item difficulty and person ability on a single scale (Keeves
and Alagumalai, 1999).
When $\beta_n = \delta_i$, person $n$ has a probability of 0.5 of succeeding at item $i$.
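Equation 6.1 translates directly into a small function. This is a sketch only; the function name and the numerical ability and difficulty values are hypothetical and are not estimates from the SRQ data.

```python
from math import exp

def rasch_probability(x: int, ability: float, difficulty: float) -> float:
    """P(X_ni = x) under the dichotomous Rasch model of Equation 6.1."""
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1 for the dichotomous model")
    logit = ability - difficulty
    return exp(x * logit) / (1.0 + exp(logit))

# When ability equals difficulty the probability of success is 0.5.
print(rasch_probability(1, ability=0.7, difficulty=0.7))   # 0.5

# A person one logit above the item difficulty succeeds more often than not.
print(rasch_probability(1, ability=1.0, difficulty=0.0) > 0.5)   # True
```

The two response probabilities for any person and item sum to one, since the numerators are 1 and exp(β − δ) over the common denominator.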
This model was fitted to the 22 SRQ items administered to three cohorts (semester I
2004, semester II 2004 and semester I 2005) totalling 612 students. Overall person
and item statistics are given in Table 6.1.
item separation reliability     0.99    item infit mean square     mean = 1.00, SD = 0.10
person separation reliability   0.62    person infit mean square   mean = 1.01, SD = 0.25
Table 6.1 Fit statistics for the dichotomous Rasch model
The statistics in Table 6.1 are indicative of a good fit. The average item and
person infit mean square values are close to the expected value of one, suggesting that
items on the questionnaire are measuring a single, one-dimensional construct. The
item separation reliability index of 0.99 provides evidence that the items provide a good
spread of difficulty, although the person separation reliability index of 0.62 is a little
low.
Considering the items individually, most item infit mean square values fell between
0.90 and 1.13. The exceptions to this were item 20 which had an infit mean square
of 0.80 and item 19 with an infit mean square of 0.77. According to Keeves and
Alagumalai (1999), an item is generally accepted as fitting the Rasch model if it has
an infit mean square lying between 0.77 and 1.30, although some researchers would
prefer a more restricted range of 0.83 to 1.20. Although items 19 and 20 satisfy the
broader requirements, they are clearly borderline cases. Both these items (together
with items 17 and 18) are classified as curriculum-based items as they require an
understanding of the relatively mathematical concept of a logarithm and, in the case
of item 19 (and 17), as they require familiarity with the specific statistical
terminology of “median” (and “quartile” in the case of item 18). The lack of fit of
item 19 is discussed further in Section 6.3 with respect to the polychotomous Rasch
model.
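For readers unfamiliar with the statistic, the infit mean square for an item can be sketched as the sum of squared residuals divided by the sum of the model variances (the usual information-weighted form). The code below is illustrative only: the abilities and responses are hypothetical toy values, not SRQ data, and the acceptance range is the 0.77 to 1.30 interval quoted above from Keeves and Alagumalai (1999).

```python
from math import exp

def rasch_p(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return exp(ability - difficulty) / (1.0 + exp(ability - difficulty))

def infit_mean_square(responses, abilities, difficulty):
    """Information-weighted mean square for one item across persons."""
    squared_residuals = 0.0
    information = 0.0
    for x, beta in zip(responses, abilities):
        p = rasch_p(beta, difficulty)
        squared_residuals += (x - p) ** 2      # observed minus expected, squared
        information += p * (1.0 - p)           # model variance of the response
    return squared_residuals / information

def within_accepted_range(infit, lower=0.77, upper=1.30):
    """True if the infit mean square lies in the quoted acceptance interval."""
    return lower <= infit <= upper

# Hypothetical toy data: five persons answering one item of difficulty 0.
abilities = [-1.5, -0.5, 0.0, 0.5, 1.5]
responses = [0, 0, 1, 1, 1]
value = infit_mean_square(responses, abilities, difficulty=0.0)
print(round(value, 2), within_accepted_range(value))
```

Values well below one, as for items 19 and 20, indicate responses that are more predictable than the model expects; values well above one indicate noisier responses.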
The variable map shown in Figure 6.1 displays students on the left-hand side and
items on the right-hand side, illustrating the parameters $\beta_n$ and $\delta_i$ respectively, on
a single logistic scale. It can be seen that the items cover most of the range of
students’ abilities, although it should be noted that coverage in the upper range of
ability is highly dependent on curriculum-based items. It may not be possible to
measure the full range of students’ statistical reasoning independent of curriculum on
entering university. During analysis of the SRQ, there were also a number of
indications that higher statistical reasoning at the secondary/tertiary interface may
not be able to be separated from mathematical reasoning. These are possible
questions for further research.
While the Rasch model is able to calculate parameter estimates for all items,
including those which were not administered to all cohorts, only items which were
administered to all students can be meaningfully included when combining items into
a single scale. Given the more curriculum-dependent nature of items 17 to 20, it was
also decided that these four items would be better not included in the formation of an
overall score for use in investigating dependence on other variables. Hence the SRQ
Score, which is used for further analysis in Chapter 7, consists of the items SRQ_1 to
SRQ_3 and SRQ_5 to SRQ_16 (SRQ_4 having been discarded in 2005 when
SRQ_21 and SRQ_22 were introduced).
SRQ all - 1_04, 2_04, 1_05:
----------------------------------------------------------------------------
Item Estimates (Thresholds)
all on all (N = 612 L = 22 Probability Level=0.50)
----------------------------------------------------------------------------
  4.0                      |
                           |
                           |
                           |
                           |
                           |
                           |
  3.0                      |
                           |
                           |
                           |
                         X |
                           |
                           | 18
  2.0                   XX | 14
                           | 19 20
                         X | 15
                       XXX |
                        XX |
                       XXX | 22
                       XXX |
  1.0              XXXXXXX |
                      XXXX | 13
                  XXXXXXXX | 2
                      XXXX | 16
                 XXXXXXXXX |
                    XXXXXX | 10
  0.0       XXXXXXXXXXXXXX | 9
                           |
       XXXXXXXXXXXXXXXXXXX | 11
                           | 12
             XXXXXXXXXXXXX | 5
                           | 1 7 17
                 XXXXXXXXX |
 -1.0                      | 8
                  XXXXXXXX |
                           | 6
                       XXX |
                           | 3
                        XX |
                           |
 -2.0                      |
                         X |
                           |
                           |
                           |
                         X |
                           | 21
 -3.0                      |
                           | 4
                           |
                           |
                           |
                           |
                           |
 -4.0                      |
----------------------------------------------------------------------------
Each X represents 5 students
=====================================
Figure 6.1 The variable map for the dichotomous Rasch model displays the item and person parameter estimates.
6.3 Polychotomous Rasch Model
A more sophisticated approach to scoring the Statistical Reasoning Questionnaire is
to allow partial credit for responses in a way that reflects the level of understanding
demonstrated. Masters (1982) described the use of a more complex Rasch model to
allow for polychotomous response variables. This Rasch partial credit model has
been used by Watson and Callingham (2003) in their study of statistical literacy
(which included items on which some SRQ items were based) and Reading (2002) in
her study of statistical understanding.
The Rasch partial credit model is given by:
Equation 6.2
\[
P(X_{ni} = x) =
\begin{cases}
\dfrac{\exp\Bigl(\sum_{j=1}^{x} (\beta_n - \delta_{ij})\Bigr)}{1 + \sum_{k=1}^{m_i} \exp\Bigl(\sum_{j=1}^{k} (\beta_n - \delta_{ij})\Bigr)}; & x = 1, 2, \ldots, m_i; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L; \\[3ex]
\dfrac{1}{1 + \sum_{k=1}^{m_i} \exp\Bigl(\sum_{j=1}^{k} (\beta_n - \delta_{ij})\Bigr)}; & x = 0;
\end{cases}
\]
where $X_{ni}$ is the response of person $n$ on item $i$, $\beta_n$ is the ability parameter for
person $n$ and $\delta_{ij}$ is the difficulty parameter associated with giving response $j$ rather
than $j - 1$ on item $i$.
In this model the number of possible responses, $m_i + 1$, is allowed to vary between
items.
The model effectively simplifies to a dichotomous model at each response boundary
within an item, so that, conditioning on having either response $x - 1$ or $x$, we have:
Equation 6.3
\[
P\bigl(X_{ni} = x \mid X_{ni} \in \{x - 1, x\}\bigr) = \frac{\exp(\beta_n - \delta_{ix})}{1 + \exp(\beta_n - \delta_{ix})}; \quad x = 1, 2, \ldots, m_i; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L.
\]
So when a person’s ability parameter $\beta_n$ is equal to $\delta_{ix}$, there is a probability of 0.5
of giving response $x$ to item $i$, given that the response choice is $x - 1$ or $x$. This
stepwise interpretation of the partial credit model is consistent with increasing values
of the response $x$ reflecting increasing levels of understanding. However, because
the explanation is conditional on the response being equal to either $x - 1$ or $x$, it is
not necessary for the $\delta_{ix}$ to be increasing with $x$. In the case where the $\delta_{ix}$ are
unordered, there exists at least one response value which is never the most probable
for any value of $\beta$ (Masters, 1988a).
As in the dichotomous model, one of the benefits of the polychotomous Rasch model
is that sufficient statistics exist for $\beta_n$ and $\delta_{ij}$, with $R_n$, the total score for person $n$,
being sufficient for $\beta_n$ and the number of students who give response $j$ to item $i$
being sufficient for $\delta_{ij}$.
In his initial development of the partial credit model, Masters (1982) states that the
numerical response values $X_{ni}$ “indicate only the ordering of the response categories
and are not used as category ‘weights’”. However, mathematically the use of $R_n$ in
estimating $\beta_n$ contradicts this assertion since, for example, a maximum response
value of four for one item gives that item more weight in estimating person ability
than an item with a maximum response of two. Further difficulty with the model
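The two properties just described, the 0.5 conditional probability at a step difficulty and the existence of a never-modal category when step difficulties are unordered, can be checked numerically. The sketch below implements the category probabilities of Equation 6.2; the function names and all step difficulty values are hypothetical choices made for illustration.

```python
from math import exp

def pcm_probabilities(ability, step_difficulties):
    """Category probabilities P(X = 0), ..., P(X = m) under the partial credit model."""
    # Cumulative sums of (ability - delta_ij); the empty sum for x = 0 is zero.
    exponents = [0.0]
    for delta in step_difficulties:
        exponents.append(exponents[-1] + (ability - delta))
    weights = [exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]

def modal_category(ability, step_difficulties):
    """Index of the most probable response category (first one in case of ties)."""
    probs = pcm_probabilities(ability, step_difficulties)
    return max(range(len(probs)), key=probs.__getitem__)

# Ability equal to the second step difficulty: responses 1 and 2 are equally
# likely, so the conditional probability of response 2 is 0.5.
probs = pcm_probabilities(0.5, [-1.0, 0.5])
conditional = probs[2] / (probs[1] + probs[2])
print(conditional)   # 0.5

# Unordered step difficulties (1.0 then -1.0): scan a grid of abilities and
# record which categories are ever the most probable.
grid = [b / 10 for b in range(-40, 41)]
modal_unordered = {modal_category(b, [1.0, -1.0]) for b in grid}
print(sorted(modal_unordered))   # [0, 2]
```

With these unordered values the middle category is never the most probable response at any ability on the grid, illustrating the property attributed to Masters (1988a).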
was raised by Jansen and Roskam (1986) who argue essentially that the estimate of a
person’s ability should be invariant under a change in coding of responses,
particularly by joining categories. They showed that the partial credit model, as
described in Masters (1982), does not satisfy this requirement and hence questioned
its use.
The most convincing response to these difficulties with the partial credit model is to
ensure that the values assigned as item responses are numerically meaningful.
Wilson and Masters (1993) describe this as a “response framework” where response
values have a “uniform substantive meaning” across all items. Wilson and Masters
also reparameterised the partial credit model in a way that allows the inclusion of
null categories, often an important further requirement if such a response framework
is to be implemented consistently across items. A null category is one in which no
response occurs either by design in the coding process or by observation in the data.
The presence of a null category results in the allowable response codes being non-
consecutive.
In scoring the SRQ items for a partial credit model, we began by looking to Watson
and Callingham’s application of the partial credit model to statistical literacy. As a
number of SRQ items are drawn from this study it was hoped that the same scoring
system could be implemented and extended. The intention of Watson and
Callingham was that the codes they used reflect the cognitive frameworks of Biggs
and Collis (1982; 1991) and Watson’s own three-tiered framework established in
previous work (Watson, 1997). However, inspection of codes assigned to various
questions indicated problems with regard to the meaningfulness of codes across
questions. As an example, consider the following two questions from Watson and
Callingham:
Box A and Box B are filled with red and blue marbles as follows.
Box A Box B
(Box contents: 60 red, 40 blue; 6 red, 4 blue.)
Each box is shaken. In order to win a ticket to a sporting match, you
need to get a blue marble, but you are only allowed to pick out one
marble without looking. Which box should you choose?

A farmer wants to know how many fish are in his dam. He took out
200 fish and tagged each of them. He put the tagged fish back in
the dam and let them get mixed with the others. On the second day,
he took out 250 fish in a random manner, and found that 25 of them
were tagged. Estimate how many fish are in the dam.
The first of these questions appears in the Statistical Reasoning Questionnaire as
SRQ_3 with slightly more difficult numbers, while the second question appears as
SRQ_9 in this form.
For the box question, scoring by Watson and Callingham assigned a code of three for
the correct choice, it doesn’t matter, accompanied by an explanation of correct
proportional reasoning. A response which relied on the absolute number of marbles
rather than proportions was assigned a code of two, while a response that anything
could happen or idiosyncratic reasoning was assigned a code of one. A code of zero
was restricted to no response. By contrast, for the fish question, a correct response of
2000 was assigned a code of one and an incorrect or no response was assigned a code
of zero. Under this coding, when the Rasch partial credit model is applied to these
two questions, a correct response to the box question carries greater weight than a
correct response to the fish question. One defensible position would be to argue
that, given that a correct response to each question requires the use of proportional
reasoning, the highest response code for the two questions should be the same. It
could also be argued that the level of understanding required to correctly answer the
fish question is greater than that required for the box question and that therefore the
fish question should receive greater weighting. In fact there may be incorrect
responses to the fish question which demonstrate the same level of understanding as
the fully correct response to the box question.
An inspection of coding for the complete set of Watson and Callingham items
reveals that, while there appears to be a response framework applied within each
item, there has been no equating or moderating of numerical codes across items to
satisfy a ‘uniform substantive meaning’ across all items. If the latter is not satisfied,
a response to one item may be attributed the same numerical code as a response to
another item, but the two responses may reflect a different level of understanding
within the underlying overall response framework.
One reason for this occurrence with the Watson and Callingham scoring system is
the restriction to consecutive values for scores. Given the Wilson and Masters
reparameterisation of the partial credit model to allow null categories, this is not
necessary and a coding system which incorporates jumps can be legitimately applied.
Watson and Callingham used the results of their Rasch partial credit analysis,
together with their previously hypothesised framework to describe six hierarchical
levels of understanding within statistical literacy, namely:
1. Idiosyncratic – relying on idiosyncratic engagement with context,
tautological use of terminology and fundamental mathematical skills;
2. Informal – relying on informal engagement with context, reflecting intuitive
beliefs, single aspects of terminology and basic one-step calculation;
3. Inconsistent – requiring selective engagement with context, conclusions
without justification, qualitative use of statistics;
4. Consistent non-critical – requiring non-critical engagement with context,
multiple aspects of terminology, some appreciation of variation, basic
quantitative statistical skills;
5. Critical – requiring critical engagement with context, appropriate use of
terminology, qualitative statistical skills but not including proportional
reasoning;
6. Critical mathematical – requiring critical and questioning engagement with
context; understanding of subtle aspects of language, use of proportional
reasoning.
Despite the problems in the Watson and Callingham coding system, the relative
positioning of item step difficulties produced by the Rasch analysis results in the
above hierarchy being consistent and well described. It is suspected that, due to the
number of items used (80), the effect of code variation between items may not have
significantly affected the outcome of the analysis. In any case, the levels are well
defined by the descriptors given above. We have used this hierarchy to inform
choices of levels in building a response framework for scoring the SRQ.
The descriptors of the levels in the Watson and Callingham hierarchy, together with
the Rasch output underpinning those descriptors, were used to score item responses to
the SRQ according to the levels of understanding which they reflect. Table 6.2 gives
Item number Item Response
W&C initial
code
W&C final level
Into SRQPC Explanation for change Out of
SRQPC
There are a total of 110 children - only response. 3 ? 5 4
There are 2.2 children for every adult. 2 3 3 3 The most common number of children in a family is 2.
2 3 3 3
None of the above. 2 3 3 3 Multiple responses. 2 3 3 3 Half the families have more than 2 children.
1 3 3 3
More families have 3 children than have 2 children.
1 3 3 3
SRQ_1
To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true?
No response. 0 0 0
Initial code 3 doesn’t appear on the W&C map although discussion suggests it is also at level 3. We scored at level 5 on the basis that critical reasoning is needed to unpack the mean algorithm.
0 Exclude outlier to calculate mean. 3 6 6 5 Use median; use median with outlier excluded. 3 6 5 5
Uncertain about outlier; calculate mean; mean with max & min excluded.
?; 2; ? ?; 4; ? 4 4
Mode; mean of max & min; do it again multiple times & average.
3; ?; ? 6; ?; ? 3 3
6; discard outlier but then unclear ?; ? ?; ? 2 3 repeat once only ? ? 1 3
SRQ_2
A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below. 6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3 The true weight could be estimated in several ways. How would you estimate it?
other; no response 0 0 0
Given the context, at tertiary level, the median is considered less ideal than the mean with outlier excluded and the mode unacceptable. Hence needed to differentiate between responses that were all coded as completely correct by W&C.
0
It doesn’t matter – correct reasoning. 3 4 4 4
Box A/B – numeric explanation. 2 2 1 1
Box A/B - idiosyncratic reasoning. 1 2 1 1 SRQ_3
Box A and Box B are filled with red and blue marbles as follows. Box A Box B
Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?
No response. 0 0 0
No explanation was required for the SRQ. Selecting A or B is considered sufficiently low level for tertiary entrance to score as 1.
0
Table 6.2 continued over
30 red 20 blue
12 red 8 blue
Item number Item Response
W&C initial
code
W&C final level
Into SRQPC Explanation for change Out of
SRQPC
Table 6.2 (continued). Bracketed figures after each response are [W&C initial code / W&C final level / Into SRQPC / Out of SRQPC]; where only two figures appear they are [Into SRQPC / Out of SRQPC].
SRQ_4
Item: A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?
Responses:
  About 15 out of every 100 people who use this medicine get a rash; or this with either of next 2. [2 / 3 / 3 / 3]
  Don’t use the medicine on your skin – there’s a good chance of getting a rash. [1 / 1 / 1 / 2]
  There is hardly any chance of getting a rash using this medicine. [1 / 1 / 1 / 2]
  For application to the skin, apply only 15% of the recommended dose. [0 / 0 / 0 / 0]
  If you get a rash, it will probably involve only 15% of the skin. [0 / 0 / 0 / 0]
  No response. [0 / 0 / 0 / 0]
SRQ_5
Item: An Australian male is rushed to hospital in an ambulance. Which of the following is least likely?
Responses:
  The man is over 55 and has had a heart attack. [1 / 3 / 3 / 4]
  The man is over 55. [0 / 0 / 1 / 1]
  The man has had a heart attack. [0 / 0 / 1 / 1]
  No response. [0 / 0 / 0 / 0]
Explanation for change: Question on W&C requires estimation of probability of each event. SRQ coding is by analogy. Generally restrict code 0 to no response.
SRQ_6
Item: As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose? What is the probability of getting heads on this 5th toss? What is the probability of getting tails on this 5th toss?
Responses:
  Either – 0.5, 0.5. [2 / 4 / 4 / 4]
  Heads; tails – 0.5, 0.5. [1 / 2 / 2 / 3]
  No choice – 0.5, 0.5. [? / ? / 2 / 3]
  Either – any or no prob given. [1 / 2 / 2 / 3]
  Heads; tails – probs add to 1. [0 / 0 / 1 / 3]
  Other. [0 / 0 / 0 / 0]
  No response. [0 / 0 / 0 / 0]
Explanation for change: Some credit deserved for realising probabilities should sum to one.
SRQ_7
Item: Mrs Jones wants to buy a new car, either a Honda or a Toyota. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs. Which car should Mrs Jones buy?
Responses:
  Buy the Honda, the information in Consumer Reports is based on many cases... [3 / 5 / 5 / 4]
  It doesn’t matter which car she buys. She could still be unlucky. [2 / 2 / 2 / 2]
  Buy the Toyota, because her friend had so much trouble with his Honda, ... [1 / 1 / 1 / 1]
  No response. [0 / 0 / 0 / 0]
Table 6.2 (continued). Bracketed figures after each response are [W&C initial code / W&C final level / Into SRQPC / Out of SRQPC]; where only two figures appear they are [Into SRQPC / Out of SRQPC].
SRQ_8
Item: Is there a problem with the following pie chart? If so, identify the problem.
Responses:
  Adds to more than 100%. [2 / 5 / 5 / 4]
  Out of proportion. [2 / 5 / 5 / 4]
  “Other” too large; content and heading inconsistent. [1 / 4 / 4 / 4]
  Uncertain re importance of >100%. [? / ? / 3 / 4]
  Not enough info. [? / ? / 2 / 4]
  Other. [? / ? / 1 / 4]
  Style issues; no response. [0 / 0 / 0 / 0]
Explanation for change: Could have separated the first two responses, but chose to remain consistent with W&C. Inserted levels at lower end.
SRQ_9
Item: A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.
Responses:
  2000. [1 / 6 / 6 / 5]
  Responses that have calculated 10% then stopped. [0 / 0 / 5 / 5]
  Incorrect responses between 1000 and 3000. [0 / 0 / 4 / 4]
  Responses >3000 or between 600 & 800. [0 / 0 / 2 / 4]
  Responses between 250 and 550. [0 / 0 / 1 / 3]
  No response; <250. [0 / 0 / 0 / 0]
Explanation for change: Top code has been maintained as for W&C output. Partial credit created. Responses between 1000 & 3000 appear to be result of calculating proportions incorrectly. Responses between 250 and 550 appear to be result of counting observed fish – eg 200+(250-25) – no recognition of sampling.
SRQ_10
Item: The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days ..., they compared the forecast with records of whether or not it actually rained. The forecast of 70% chance of rain can be considered very accurate if it rained on:
Responses:
  65% - 74% of those days. [5 / 5]
  75% - 84% of those days. [3 / 5]
  95% - 100% of those days. [2 / 2]
  85% - 94% of those days. [2 / 2]
  55% - 64% of those days. [1 / 1]
  No response. [0 / 0]
Explanation for change: Not in W&C. Correct response demonstrates critical thinking consistent with level 5.
Table 6.2 (continued). Bracketed figures after each response are [W&C initial code / W&C final level / Into SRQPC / Out of SRQPC]; where only two figures appear they are [Into SRQPC / Out of SRQPC].
SRQ_11
Item: A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results: Graph 1: not to scale; Graph 2: to scale. Which of the graphs would you recommend the students use and why?
Responses:
  Graph 2 – uniform scale; more accurate; easier to use. [3 / 4 / 4 / 4]
  Graph 2 – clearer; looks better; no reason. [2 / 3 / 3 / 4]
  Graph 2 – grouped; other. [1 / 2 / 2 / 2]
  Graph 1 – any reason. [? / ? / 2 / 2]
  No response. [0 / 0 / 0 / 0]
Explanation for change: Difference in ages of subjects (between W&C and SRQ) makes it difficult to equate preference explanations.
SRQ_12
Item: Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs: Graph A: 2D pie graph; Graph B: 3D pie graph. Which of the graphs would you recommend the students use and why?
Responses:
  Graph A – size of pieces reflects proportions; more accurate; easier to compare pieces; clearer. [4 / 5]
  Graph A – can be done by hand; simpler; all the same; no reason. [3 / 4]
  Either; Graph B any reason. [2 / 2]
  No response. [0 / 0]
Explanation for change: Not in W&C. Assignment of codes consistent with previous question.
SRQ_13
Item: Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?
Responses:
  Hospital B. [6 / 5]
  The two hospitals are equally likely to record such an event. [3 / 3]
  Hospital A. [2 / 2]
  No response. [0 / 0]
Explanation for change: Not in W&C. Correct response requires critical mathematical understanding consistent with code 6. “Equally likely” implies an understanding of proportions being equal – a higher level understanding than choosing the larger absolute value in hospital A.
Table 6.2 (continued). Bracketed figures after each response are [W&C initial code / W&C final level / Into SRQPC / Out of SRQPC]; where only two figures appear they are [Into SRQPC / Out of SRQPC].
SRQ_14
Item: A local newspaper published the following article: (regarding increase in vehicle registrations being linked to heart disease) Do you agree with Mr Robinson’s findings?
Responses:
  Disagree – population increase so need to use rate. [6 / 6]
  Disagree – other factors. [5 / 5]
  Disagree – correlation doesn’t imply causation; just coincidence; not enough info. [4 / 4]
  Agree – large sample size. [3 / 4]
  Disagree – other or no reason. [2 / 3]
  Agree – explains a possible link; other. [2 / 3]
  Agree – no reason. [1 / 3]
  No response. [0 / 0]
Explanation for change: The W&C item is phrased in terms of “What questions would you ask?” Coded at only 3 levels with top code (2) awarded for questions regarding existence of other causes, or how the two are linked. For SRQPC code 6 was restricted to the identification of need for rate - considered a critical mathematical response.
SRQ_15
Item: A music teacher was pleased to read the following article in a professional journal: (regarding link between OP and music lessons) Do you agree with these research findings?
Responses:
  Disagree – other factors. [5 / 6]
  Disagree – correlation doesn’t imply causation; just coincidence; not enough info; want to check data. [4 / 5]
  Disagree – not enough difference in %; too much difference in sample sizes; conditional probability confused. [3 / 5]
  Agree – large sample size. [3 / 5]
  Disagree – other or no reason. [2 / 3]
  Agree – statistical info was supplied; explains possible link; other. [2 / 3]
  Agree – personal experience; no reason. [1 / 3]
  No response. [0 / 0]
Explanation for change: Not in W&C. Coded to maintain consistency with previous item.
Table 6.2 (continued). Bracketed figures after each response are [W&C initial code / W&C final level / Into SRQPC / Out of SRQPC]; where only two figures appear they are [Into SRQPC / Out of SRQPC].
SRQ_16
Item: A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups. (table given) One typical summer weekend, 100 people were observed crossing the bridge. Which of the data sets given below would cause you to question the information in the brochure? A, B, C are in decreasing order of difference from brochure.
Responses:
  A only. [5 / 5]
  A & B only; A, B & C; none. [4 / 4]
  Can’t tell. [3 / 4]
  B only; C only; A & C only; B & C only. [2 / 3]
  Circled individual values (sem1 2004). [1 / 3]
  No response. [0 / 0]
Explanation for change: Not in W&C. Correct choice requires appreciation of variation, consistent with level 5.
SRQ_17 to 20
Item: Total rainfall (in millimetres) for the month of November 2003 was collected from 25 weather stations over a region of northern Australia. The data below represent the logarithms (to base 10) of the rainfall. These values have been ordered from smallest to largest for ease of manipulation.
Explanation for change: Not in W&C.
SRQ_17
Item: What is the median of the above data set (i.e. the median of the log of the rainfalls)?
Responses:
  Correct (1.71 - 13th obs). [4 / 4]
  Data value near median; max/2; middle row; mean of middle row. [3 / 4]
  Mean; modes. [2 / 4]
  Other. [1 / 4]
  No response. [0 / 0]
SRQ_18
Item: What is the lower quartile of the data set?
Responses:
  Correct (7th or 6.5th obs). [5 / 6]
  6.25th obs.; posn of quartile given not value. [4 / 6]
  Data value near quartile; med/2; max/4; first quarter of data. [3 / 5]
  3rd quartile. [2 / 5]
  Other. [1 / 4]
  No response. [0 / 0]
Explanation for change: Correct response requires more terminology than for SRQ_17.
Table 6.2 (continued). Bracketed figures after each response are [Into SRQPC / Out of SRQPC]; a dash indicates that no level was obtained.
SRQ_19
Item: Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region.
Responses:
  Correct. [6 / -]
  Incorrect trans of median involving logs, exp or powers of 10. [5 / -]
  Other trans of median. [4 / -]
  Median; trans of mean. [3 / -]
  Other. [1 / -]
  No response. [0 / -]
Explanation for change: Correct response has greater mathematical requirements than for SRQ_18.
SRQ_20
Item: What was the highest total rainfall recorded for the month in the region?
Responses:
  Correct. [6 / 6]
  Incorrect trans of max involving logs, exp or powers of 10. [5 / 5]
  Max. [3 / 4]
  Other. [1 / 4]
  No response. [0 / 0]
Explanation for change: As for SRQ_19.
SRQ_21
Item: The graph below shows the number of members of the American Mathematical Society: (Graph is out of proportion)
Responses:
  None of the above statements is correct. [2 / 2]
  Membership of the AMS in 1991 was twice what it was in 1989. [1 / 1]
  Between 1987 and 1991 membership of the AMS doubled every two years. [1 / 1]
  Membership of the AMS could reasonably be expected to reach 50000 by 1992. [0 / 0]
  No response. [0 / 0]
Explanation for change: Straightforward graph (at tertiary level) consistent with level 2.
SRQ_22
Item: The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.
Responses:
  Shaded area Z<2. [6 / 5]
  Shaded area Z<-2. [4 / 5]
  Shaded area Z>1. [3 / 4]
  Shaded area Z>2. [3 / 4]
  Shaded area 0<Z<1. [1 / 4]
  No response. [0 / 0]
Explanation for change: Correct response requires use of symmetry – mathematical requirement consistent with level 6.
Table 6.2 Each item response was coded for the Rasch partial credit model on the basis of a substantive framework.
Chapter 6 Rasch Analyses for the Statistical Reasoning Questionnaire
143
the justification for the scoring applied to each item. The procedure applied was as
follows. Where an SRQ item was used in the Watson and Callingham study, a
response the same or similar to one described by Watson and Callingham received a
score equal to the level of understanding in which that response occurred in the
Watson and Callingham Rasch variable map. (A few minor deviations from this are
explained in Table 6.2.) Where the SRQ identified partially correct responses not
identified by Watson and Callingham, these were scored below the maximum level
of the item in the Watson and Callingham Rasch output, endeavouring to maintain
consistency with our framework. Where an SRQ item did not have an equivalent
within the Watson and Callingham study, responses were scored to reflect our
framework.
Having scored the SRQ responses to our framework, the Rasch partial credit model
was fitted to the 612 students from three cohorts, using the Quest (Adams and Khoo,
1996) program. Quest uses a joint maximum likelihood procedure to estimate the
parameters of the model in its reparameterised form with the advantage of catering
for null categories. In fitting the model, students who score full or zero marks and
items which are answered fully correctly by all or no students cannot contribute to
the estimation process. As there was one student who scored full marks under the
partial credit scoring system, there were 611 students who contributed to the fit of the
model.[10]
[10] Scoring of SRQ_8 gave the highest code to students who noted the pie graph was out of proportion but did not take the final step of noting that it added to more than 100%, the only answer counted as correct in the dichotomous model. As the student in question fell into this category, he was not deleted from the estimation process in the dichotomous case.
Table 6.3 shows the fit statistics for the analysis. The average infit mean square
values for both items and persons are again very pleasing, suggesting that the Rasch
partial credit model fits the data well. The item separation reliability index at 0.88 is
not as high as it was for the dichotomous model. Item separation reliability improves
as the number of persons increases. With 611 students it is probably not surprising
that for 22 items, the dichotomous model gave a very high item reliability. In the
partial credit model, the number of item parameters to be estimated increases
substantially and hence a larger number of students is needed to maintain precision.
Given that our partial credit model allows for up to six levels, an item reliability of
0.88 is still quite pleasing. On the other hand our person separation reliability index
has risen slightly from 0.62 in the dichotomous case to 0.72 in fitting the partial
credit model. The infit mean square of each item was also examined to determine
the fit of individual items to the model. As explained in Section 6.2, infit mean
square values should lie between 0.77 and 1.30, and values outside the narrower
range 0.83 to 1.20 may also be of concern. The infit mean square for each item is
given in Table 6.4. It would appear that item 19, with an infit mean square of 0.71,
does not sufficiently fit the Rasch partial credit model and that item 20, at 0.80, may
also be a cause for some concern.
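The infit mean square used here is an information-weighted fit statistic. The following is a minimal sketch of how it could be computed and checked against the two ranges quoted above, assuming the model-expected score and score variance for each student on the item are already available. The function names and the two-student toy input are illustrative only and are not taken from the Quest output.

```python
def infit_mean_square(observed, expected, variance):
    """Information-weighted mean square for one item.

    observed: scores the students actually obtained on the item
    expected: model-expected scores for the same students
    variance: model variances of those scores
    """
    squared_residuals = sum((x - e) ** 2 for x, e in zip(observed, expected))
    total_information = sum(variance)
    return squared_residuals / total_information


def fit_flag(infit, strict=(0.77, 1.30), cautious=(0.83, 1.20)):
    """Classify an infit value using the two ranges quoted in the text."""
    if not strict[0] <= infit <= strict[1]:
        return "misfit"
    if not cautious[0] <= infit <= cautious[1]:
        return "possible concern"
    return "acceptable"


# Hypothetical toy input: two students, each with expected score 0.5.
toy_infit = infit_mean_square([1, 0], [0.5, 0.5], [0.25, 0.25])
```

Applied to the values in Table 6.4, `fit_flag(0.71)` and `fit_flag(0.80)` reproduce the judgements made for items 19 and 20 with item 19 included.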
                                 All 22 items          Item 19 excluded
item separation reliability      0.88                  0.87
item infit mean square           mean=1.00 SD=0.11     mean=1.00 SD=0.08
person separation reliability    0.72                  0.66
person infit mean square         mean=0.98 SD=0.36     mean=1.00 SD=0.36
Table 6.3 Overall fit statistics for the Rasch partial credit model
One of the reasons an item may not fit the Rasch model is due to dependencies on
other items. It is possible that SRQ_19 falls into this category as it specifically asks
for a transformation to the median which had been identified in SRQ_17. Care was
taken, however, in scoring SRQ_19 to ensure that full credit was given in the case of
students who gave an incorrect response to SRQ_17 and then applied a correct
transformation to that response. As has been discussed in Section 5.3.1, a
comparison of the non-response rates to SRQ_19 and 20 indicates that the form of
SRQ_19 influenced students’ responses by including the hint “using your answer to
question 17”. It is likely that this influence contributed to the lack of fit of SRQ_19.
Because of its lack of fit, item 19 was deleted and the model refitted. The overall
measures of fit for this new model are included in Table 6.3 and the individual item
infit mean squares in Table 6.4. Looking at the fit of individual items with item 19
excluded, for most items there has been very little change; however, those with more
extreme infit mean squares (measured roughly by the distance from 1) have become
less extreme. This results in item 20 no longer being a cause for concern. This fit of
the Rasch partial credit model, with SRQ_19 excluded, using the scoring described
in Table 6.2 (in the column “into SRQPC”) and based on 611 students from three
cohorts is the fit used in further discussion and analysis.
        Infit Mean Square            Infit Mean Square
Item    19 in    19 out      Item    19 in    19 out
  1     0.96     0.95         12     1.05     1.03
  2     1.00     0.99         13     1.18     1.14
  3     0.97     0.95         14     0.95     0.94
  4     0.99     0.98         15     1.08     1.05
  5     1.04     1.03         16     0.99     0.98
  6     0.98     0.98         17     0.90     0.91
  7     1.14     1.11         18     0.97     0.96
  8     1.18     1.17         19     0.71     --
  9     1.03     0.97         20     0.80     0.89
 10     1.06     1.02         21     0.94     0.94
 11     0.97     0.96         22     1.14     1.11
Table 6.4 Individual item fit statistics for the Rasch partial credit model
As for the dichotomous model, the Rasch partial credit analysis provides a variable
map which displays item and case estimates on the same logistic scale. While the
case estimates are the $\beta_n$ of Equation 6.2, the item estimates are not the $\delta_{ij}$, which,
as explained earlier, need not be ordered. Rather they are the threshold estimates
from a further alternative parameterisation (Masters, 1988b). The threshold for an
item step is the ability level required in order to have an unconditional probability of
0.5 of passing that step. Thresholds must, by definition, be ordered.
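The relationship between step parameters and thresholds can be illustrated numerically. The sketch below assumes the usual step parameterisation of the partial credit model and reads the threshold for a step as the ability at which the probability of scoring at or above that step is 0.5. The function names and the step values are hypothetical, and this is not a description of how Quest computes its estimates.

```python
import math


def category_probabilities(beta, deltas):
    """Partial credit model probabilities of scoring 0..m on one item."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def threshold(step, deltas, lo=-10.0, hi=10.0, tol=1e-8):
    """Ability at which P(score >= step) equals 0.5, found by bisection."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        p_at_least = sum(category_probabilities(mid, deltas)[step:])
        if p_at_least < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical, deliberately disordered step parameters for a three-step item.
deltas = [0.5, -0.2, 1.0]
thresholds = [threshold(k, deltas) for k in range(1, len(deltas) + 1)]
```

Although the step parameters in this example are not in increasing order, the resulting thresholds are, because the probability of scoring at or above step k can never be smaller than that of scoring at or above step k + 1.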
A study of the variable map shown in Figure 6.2 is revealing regarding the mutual
coverage of students and items. (It should be noted that each x on the map represents
five students. Asterisks have been added to the map to indicate the highest, 1.95, and
lowest, -0.30, abilities recorded.) At the lower end of the spectrum, there is a good
selection of items and steps which are indicative of that level, with the most basic
item steps reaching well below the lowest level of student ability. At the upper end
of the spectrum however, this is not the case. There are fewer items in the upper
ranges of student ability with no item reaching beyond the most able student. It
SRQ Partial Credit using 1_04 2_04 1_05 - item19 deleted
-------------------------------------------------------------------------------------
Item Estimates (Thresholds) all on all (N = 611 L = 21 Probability Level=0.50)
-------------------------------------------------------------------------------------
 2.0                      * |
                            | 14.6
                            |
                          X |
                          X | 15.5
                        XXX | 18.4 18.5 20.6
                        XXX |
                     XXXXXX | 15.4
               XXXXXXXXXXXX | 20.5 22.6
 1.0           XXXXXXXXXXXX | 2.5 2.6 13.6 16.5
              XXXXXXXXXXXXX | 10.5 14.5 18.2 18.3 22.4
         XXXXXXXXXXXXXXXXXX |
      XXXXXXXXXXXXXXXXXXXXX | 9.5 9.6 10.3 12.4 15.3
           XXXXXXXXXXXXXXXX | 5.3 7.5 9.2 9.4 12.3 17.4 18.1 20.3 22.3
                  XXXXXXXXX | 1.5 17.3 20.1
                       XXXX | 6.4 8.5 11.4 17.1 17.2
                        XXX | 3.4 8.2 8.3 8.4 14.4
                          X | 8.1 11.3 14.3 16.4
 0.0                        | 2.4 9.1 15.2 16.3 22.1
                          X | 2.3 6.2
                            | 2.2 16.2
                            | 6.1 14.2
                          * | 1.3 14.1 15.1 16.1
                            | 2.1 13.3
                            | 4.3
                            | 12.2
                            | 11.2 13.2
-1.0                        |
                            | 7.2 21.2
                            | 4.1
                            | 10.2
                            | 10.1
                            | 7.1
                            | 21.1
                            |
                            |
-2.0                        |
                            | 5.1
                            |
                            |
                            |
                            |
                            |
                            |
                            |
-3.0                        |
                            |
                            |
                            |
                            |
                            |
                            |
                            |
                            | 3.1
-4.0                        |
-------------------------------------------------------------------------------------
Each X represents 5 students
=====================================================================================
Figure 6.2 The variable map for the Rasch partial credit model displays the person ability and item step difficulty estimates.
(Map annotations: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6.)
should also be noted that in the upper tail there is a heavy reliance on the curriculum-
based items 17 to 20. An ideal extension of the SRQ would be to include more items
which reached into and beyond the higher levels of student ability at the
secondary/tertiary interface. Further discussion of this is undertaken in Section 6.5.
The variable map can also be used to help define levels of understanding. As item
step codes were developed on the basis of a substantive framework, these too can be
incorporated into the definition of these levels. Hence six levels of understanding
were mapped out in the following way. Firstly, the cut-off between level one and
two was chosen so that all item steps in level one had an item step code of one.
Secondly, the cut-off between levels two and three was chosen so that no item step
code greater than two appeared in level two. Finally, cut-offs between levels three
and four, four and five, and five and six were chosen so that levels two, three, four
and five all spanned approximately equal widths on the logistic scale. This divided
the map naturally into six levels with cut-offs at -1.30, -0.65, 0.00, 0.65 and 1.30 on
the logistic scale. Using the initial item step codes to help define these levels ensures
some degree of consistency between these levels and those of the Watson and
Callingham hierarchy.
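Once the cut-offs are fixed, assigning a level to any estimate on the logistic scale is a simple lookup. The sketch below uses the five cut-offs quoted above; the function name is illustrative, and the treatment of an estimate falling exactly on a cut-off (placed in the higher level) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Cut-offs between levels 1/2, 2/3, 3/4, 4/5 and 5/6, in logits (from the text).
CUT_OFFS = [-1.30, -0.65, 0.00, 0.65, 1.30]


def level_of(estimate):
    """Return the level of understanding (1 to 6) for a logit estimate."""
    return bisect_right(CUT_OFFS, estimate) + 1
```

For example, `level_of(1.95)` and `level_of(-0.30)` give the levels corresponding to the highest and lowest student abilities marked on the variable map.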
The final column, “Out of SRQPC” of Table 6.2 gives the levels at which each coded
response to each item appears in the Rasch variable map. If the response framework
based on the six levels of Watson and Callingham has been applied consistently to
the SRQ and if the same framework is applicable to describe understanding at the
secondary/tertiary interface, then it should be expected that the item difficulty levels
resulting from the Rasch partial credit model would be consistent with the item
response codes which were used in the model. In the discussion which follows, each
item is examined for its consistency or otherwise.
6.3.1 Consistency of item step codes and difficulties
In SRQ_1, which required the unpacking of the algorithm for the mean, the fully
correct response, which had been coded at level 5, was mapped to level 4 by the
Rasch analysis. Level 5 had been selected for this response as it required a
mathematical understanding seen to be consistent with this level. It should be noted
here that in assigning levels to the item step difficulties of the Rasch analysis we are
categorising a continuous variable. As this response to SRQ_1 appears towards the
top of level 4, there is in fact little inconsistency demonstrated here.
In SRQ_2, which requires the calculation of an appropriate average, the fully correct
response, coded at level 6, maps to level 5, and lower level responses, coded at 1 and
2, map to level 3. In so doing, responses coded at 6 (mean with outlier removed)
demonstrate essentially the same ability level (1.05) as responses coded at level 5
(median) with ability level 0.97. Part of the reason for this may simply be that,
because of the small percentage of students coded at level 5 (3.6%) compared to 38%
coded at 6, the Rasch analysis cannot differentiate between the two responses. It is
surprising that the lowest responses map to level 3; however, this too may be partially
due to the small percentage of students giving each of these responses. One possible
approach to this would be to revise the scoring system to ensure a more uniform
distribution of responses across categories (Keeves and Masters, 1999). Rather than
taking this approach at the initial stage, we have chosen to define codes to
differentiate meaningfully between responses where possible but will later use the
levels of understanding defined by the Rasch partial credit model.
In SRQ_3, the marble question, initial coding is confirmed by the Rasch analysis.
There is one other item where the Rasch analysis results in complete confirmation of
coding, that is SRQ_21, the item with an out of scale graph reporting on increasing
membership of a society.
In SRQ_4, the item about rash medication, the top code is confirmed at level 3, while
a partially correct response coded as 1 appears at level 2 in the Rasch map. Given
that 96% of students had this item correct, it is probably impossible to glean much
from the level of understanding demonstrated by a partially correct response. In fact,
in follow-up discussion, at least one student of very high ability reported thinking
that there must be more required by this question than the obvious response, a
common problem for students presented with questions well below their ability.
In SRQ_5, the correct response, which required not being misled by the conjunction
fallacy, mapped from code 3 to the top of level 4 in the Rasch analysis. This is one
of few items whose top code is mapped to a higher level of understanding than had
been predicted. The reason here may be that for the analogous item in the Watson
and Callingham study which requires the three probabilities to be estimated, students
are less likely to fall prone to the conjunction fallacy than in the SRQ multiple-
choice question.
In SRQ_6, the item about winning the toss in a game of cricket, the correct response
is confirmed at level 4 but lower responses, coded at level 1 and 2, move up to level
3. Responses coded as 2 included those who chose heads or tails despite knowing
that each had a probability of 0.5 of occurring. It may be the case that the numerical
calculation reflects the students’ level of understanding but the choice of heads or
tails follows their own internal rules of choice which can operate simultaneously
without apparent conflict.
In SRQ_7, the new car item, a correct response mapped from code 5 to level 4. It is
suspected that this is one item where there is a difference in the application of levels
of understanding between the school students in Watson and Callingham’s study and
those at the secondary/tertiary interface in this study. For these students, at the
secondary/tertiary stage, a lower level of understanding is reflected in a correct
response to this item.
The results of the Rasch partial credit analysis on SRQ_8, the pie chart summing to
more than 100%, were initially surprising. In this item all responses, coded 1
through to 5, mapped to level 4 understanding. Hence this item has no real
discriminatory power at this level. The suggested reason for this is that not
recognising the problem on the graph does not mean that it would not have been
understood as a problem had it been identified for the students. While it may have
been expected that a more critical eye for such detail would be consistent with higher
levels of understanding, the results of the analysis show this not to be the case.
SRQ_9 is the question requiring estimation of the number of fish in the dam. The
fully correct response coded at 6 mapped to level 5, as did responses of 10% (coded
as 5). Given the small number (less than 1%) of code 5s, it is not surprising that the
model could not discriminate between these two responses. However, given the
requirement of proportional reasoning for the correct answer, it was expected that
this item would reach into level 6. For this item, the lower response codes, 2 and 4,
both mapped to level 4, while code 1 maps to level 3. Examining the item step
difficulties in their continuous form, aside from the gap between code 0 and code 1,
the biggest jump appears between codes 1 and 2. This can be fully explained by the fact that, from
code 2 up, some understanding of sampling and proportions is required. For many
responses coded 4 and possibly even 2, it appears to be lack of basic mathematical
skills rather than lack of statistical understanding which has been demonstrated. (See
Section 7.4 for further discussion of this issue.) This item is also one of a number for
which the lowest level of understanding is higher than expected. Unlike in SRQ_2, it
is not caused here by a small percentage response in the lowest groups resulting in
lack of discrimination with the higher code. It appears here that simply being able or
willing to give a response is demonstrative of a reasonable level of understanding.
This may be consistent with the fact that most of the students in the study are
operating at level 4 or 5.
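The arithmetic behind the main response categories for SRQ_9 can be set out explicitly. The sketch below uses only the figures given in the item and in Table 6.2; the variable names are illustrative, and the proportional estimate assumes that the tagged fish were thoroughly mixed before the second catch.

```python
tagged_first_day = 200
caught_second_day = 250
tagged_in_second_catch = 25

# Proportion of the second catch that was tagged: the "10%" partial response.
tagged_proportion = tagged_in_second_catch / caught_second_day

# Fully correct response: assume the same proportion of the whole dam is tagged.
estimated_total = tagged_first_day * caught_second_day / tagged_in_second_catch

# Counting only the fish actually seen, with no recognition of sampling.
observed_fish = tagged_first_day + (caught_second_day - tagged_in_second_catch)
```

The last quantity falls in the range 250 to 550, which is why responses in that range were interpreted as counting observed fish.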
In SRQ_10, the weather prediction item, the only difference between coding and
final levels is that the response 75-84%, coded as 3, has mapped to level 5 with the
correct response, 65-74%. It is likely that students giving the response 75-84%
understood the importance of a 70% chance in the question and possibly, reading
more into the question than had been intended, interpreted this as “at least 70%”.
In SRQ_11, the graph of number of years in the town drawn not to scale, there is a
minor adjustment with code 3 mapping to level 4 along with the top response code.
The lower code 2 remains unchanged. In SRQ_12, the choice between a two and
three dimensional pie chart, the top code 4 maps to level 5 and code 3 to level 4. In
fact these two codes lie respectively just above and below the level 4/5 border.
For SRQ_13, the hospital question, the correct response, coded at 6, maps to level 5.
It is worth noting here that no completely multiple-choice item has a step difficulty
which appears in level 6. In fact all level 6 responses require either a calculation or a
significant explanation.
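The reasoning behind the correct response to SRQ_13 can be checked with a short binomial calculation. This sketch assumes, for illustration only, that the hospitals record exactly 50 and 10 births on the day in question and that births are independent with probability 0.5 of being female; the function name is hypothetical.

```python
from math import comb


def prob_at_least_80_percent_girls(births):
    """P(at least 80% of `births` are girls) when each birth is a girl with probability 0.5."""
    fewest_girls = -(-4 * births // 5)  # ceiling of 0.8 * births, in integers
    favourable = sum(comb(births, girls) for girls in range(fewest_girls, births + 1))
    return favourable / 2 ** births


p_hospital_a = prob_at_least_80_percent_girls(50)
p_hospital_b = prob_at_least_80_percent_girls(10)
```

Under these assumptions the smaller hospital has a probability of 56/1024 (about 5%) of recording 80% or more female births, while for the larger hospital the probability is far smaller, reflecting the greater variability of proportions in small samples.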
SRQ_14 and SRQ_15 require a critical assessment of study results. These two items
had been coded to maintain consistency between them. Code 6 was restricted to
SRQ_14 which required proportional reasoning in terms of identification of the fact
that a rate was required. In SRQ_14, codes 6, 5, 4 and 3 mapped to levels 6, 5, 4 and
4 respectively, while in SRQ_15, codes 5, 4 and 3 mapped to levels 6, 5 and 5
respectively. Clearly the extra level of critical thinking required in SRQ_15 where
the study results appeared more plausible, added sufficient difficulty to shift
responses, which appear to be the same across the two questions, into a higher level
of understanding. In both items the lowest codes, 1 and 2 mapped to level 3, again a
reflection of the degree of effort required to provide a response.
In the Goodwill Bridge question, SRQ_16, the top responses, coded 5 and 4, mapped
to levels 5 and 4 respectively. Code 3 which also mapped to level 4 represented less
than 1% of students. Again codes 1 and 2 mapped to level 3.
SRQ_17 to 20 are the rainfall questions. (SRQ_19 has been excluded from the
Rasch analysis.) SRQ_17, 18 and 20 are characterised by high non-response rates:
18%, 36% and 30% respectively. For all three questions the lowest non-zero code
maps to level 4, an indication of the level of ability demonstrated by being willing to
provide any response to this set of items. Like SRQ_8, SRQ_17 which asks for the
median rainfall has effectively no discriminatory power at tertiary level with all
codes (1 to 4) mapping to level 4. In SRQ_18, requesting the lower quartile, codes 4
(representing less than 1% of students) and 5 map to level 6, codes 2 and 3 to level 5.
These responses all demonstrate a reasonable, if not perfectly correct, knowledge of
more advanced terminology than is required by other questions. For SRQ_20, requiring a
mathematical transformation, codes 5 and 6 mapped to levels 5 and 6, while codes 1
and 3 mapped to level 4.
SRQ_21 was a question of low difficulty with response codes being confirmed at
levels 1 and 2 by the Rasch partial credit analysis. SRQ_22 required knowledge of
the normal distribution. As for SRQ_17, 18 and 20, this item had a relatively high
non-response rate (16%), and the lowest codes, 1 and 3, both mapped to level 4. Both
codes 4 and 6 mapped to level 5. Given that a fully correct response to this item
demonstrated a degree of familiarity with the normal distribution sufficient to utilise
its symmetry, it is a little surprising that this response did not appear as level 6
understanding. However this is consistent with the observation noted previously that
no completely multiple-choice question had an item step difficulty in level 6.
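The use of symmetry required by SRQ_22 can be made explicit with a short calculation. The sketch below uses only the mean and variance given in the item; the variable names are illustrative.

```python
from statistics import NormalDist

mean_height = 165.0
sd_height = 4.0 ** 0.5  # the variance of 4 cm^2 gives a standard deviation of 2 cm

z = (161.0 - mean_height) / sd_height  # standardised value of 161 cm

standard_normal = NormalDist()  # mean 0, standard deviation 1
p_taller = 1 - standard_normal.cdf(z)  # P(Z > -2)
p_by_symmetry = standard_normal.cdf(-z)  # P(Z < 2)
```

The two probabilities agree, which is why the graph shading Z<2 is the correct choice even though the standardised value of 161 cm is -2.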
6.3.2 The SRQPC Score
In general it can be concluded that the six levels of statistical literacy described by
Watson and Callingham can be used as a framework for statistical understanding at
the secondary/tertiary interface with one or two minor adjustments. Firstly, the
restriction of proportional reasoning to level 6, critical mathematical, is not necessary
at this stage of education. Such understanding could be lowered to level 5, critical,
or even level 4, consistent non-critical. Secondly, an emphasis on more
advanced terminology could be added at level 6. It is suggested however that more
items reaching into the critical mathematical level should be developed to better
define the description of this level. All students in this study demonstrated a level of
understanding above the two lowest levels described in that framework: idiosyncratic
and informal. A small number of students were operating with an inconsistent
understanding. Approximately 27% demonstrated a consistent non-critical
understanding, 66% a critical understanding and 6% a critical mathematical
understanding.
Having used the Rasch partial credit model to attribute individual item responses to
levels of understanding, these levels could then be used to develop and calculate a
new score based on the SRQ. Rather than using the dichotomous approach of the
SRQ Score described in Section 6.2, the new score uses the levels of
understanding resulting from the Rasch partial credit analysis. We shall call this the
SRQPC Score. As was the case for the SRQ Score, the SRQPC Score only uses
items which were administered in both years of the study. Excluded from the Rasch
analysis due to lack of fit to the model, SRQ_19 was not included. Due to the lack of
discriminatory power of SRQ_8 and SRQ_17, described above, these items were also
not included. Although SRQ_18 and 20 could have been included, it was felt that
they should also be excluded to maintain greater consistency with the original SRQ
Score. Hence the SRQPC Score is calculated as the sum of the fitted levels of
understanding demonstrated by the item steps achieved for the fourteen items:
SRQ_1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15 and 16.
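The calculation just described can be sketched in a few lines of Python. This is an illustration only: the function name, the dictionary layout and the example levels are hypothetical, and only the list of fourteen items is taken from the text. How non-response is scored is not restated here, so the sketch simply requires a level for every item.

```python
# Items contributing to the SRQPC Score, as listed in the text.
SRQPC_ITEMS = [1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15, 16]


def srqpc_score(levels):
    """Sum the fitted levels of understanding over the fourteen SRQPC items.

    `levels` maps an item number to the level attributed to the student's
    response by the partial credit analysis (hypothetical data layout).
    """
    missing = [item for item in SRQPC_ITEMS if item not in levels]
    if missing:
        raise ValueError(f"no level recorded for items: {missing}")
    return sum(levels[item] for item in SRQPC_ITEMS)


# Hypothetical student: level 4 on every item except level 5 on items 1 and 9.
example = {item: 4 for item in SRQPC_ITEMS}
example[1] = 5
example[9] = 5
print(srqpc_score(example))
```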
As for the aggregate SRQ Score, the SRQPC Score was examined for consistency
across cohorts (see Figure 6.3). While the SRQPC Score does not appear as consistent
as the SRQ Score, a one-way ANOVA does not indicate any differences between
means across cohorts (p>0.8). As described in Section 2.3, the second semester
(II_04) cohort is routinely a smaller group with backgrounds which tend to differ
from the larger class. Both the size and the different nature of this group explain the
apparent difference in shape of the distribution.
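As a rough check on the comparison of cohort means, a one-way ANOVA can be approximately reconstructed from the descriptive statistics reported for the three cohorts in this section. The following is a sketch rather than the analysis actually run: it works from rounded summary statistics, so the result is only approximate, and the closed-form p-value used here is valid only because there are exactly three groups (two numerator degrees of freedom). The function and variable names are hypothetical.

```python
# Summary statistics for SRQPC by cohort, taken from the descriptive
# statistics reported in this section: (n, mean, standard deviation).
cohorts = {
    "I_04": (300, 52.343, 6.329),
    "I_05": (237, 52.021, 6.342),
    "II_04": (74, 51.919, 7.626),
}


def one_way_anova(groups):
    """Return (F, df_between, df_within) from per-group (n, mean, sd)."""
    total_n = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / total_n
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd ** 2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = total_n - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


f_stat, df1, df2 = one_way_anova(list(cohorts.values()))

# With exactly three groups (df1 == 2) the upper-tail F probability has a
# closed form, so no statistics library is needed for this special case.
p_value = (1 + 2 * f_stat / df2) ** (-df2 / 2)

print(f"F({df1}, {df2}) = {f_stat:.3f}, p = {p_value:.2f}")
```

The F statistic obtained this way is well below one and the p-value is close to 0.8, which is in line with the conclusion that the cohort means do not differ.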
One effect of using the SRQPC Score rather than the original aggregate score is the
lengthening of the lower tail without much change in the bulk of the distribution.
This score accentuates those students whose understanding is particularly poor.
Descriptive Statistics: SRQPC

Variable  Cohort    N    Mean  SE Mean  StDev
SRQPC     I_04    300  52.343    0.365  6.329
          I_05    237  52.021    0.412  6.342
          II_04    74  51.919    0.887  7.626

Variable  Cohort  Minimum  Q1  Median     Q3  Maximum
SRQPC     I_04         26  49    53.5  57.00       66
          I_05         23  48    52.0  56.00       66
          II_04        24  47    54.0  57.25       64
[Boxplot of SRQPC by Cohort: SRQPC score (vertical axis, 20 to 70) for cohorts I_04, I_05 and II_04]
Figure 6.3 The distribution of SRQPC Scores is essentially consistent across the
three cohorts.
6.4 Expected Responses to the SRQ
Using the Rasch partial credit model, it is possible to determine the estimated
expected response to an item for a person of given ability. In this section, the
expected response code to each item on the SRQ (excluding SRQ_19) is calculated
for a person of average ability within three groups of students, namely, those who
have not completed Maths B, those who have studied at, but not beyond, Maths B
level, and those who have studied beyond Maths B.
The estimated expected responses are calculated from the model in Equation 6.2,
using estimates from the fitted model for the $\delta_{ij}$ and the average β estimate for each
of the three groups. The β estimates obtained are:
\begin{align*}
\hat{\beta}_0 &= 0.73 && \text{for students below Maths B level;} \\
\hat{\beta}_1 &= 0.77 && \text{for students at Maths B level;} \\
\hat{\beta}_2 &= 0.93 && \text{for students beyond Maths B level.}
\end{align*}
Using these ability estimates and each item step difficulty, $\delta_{ij}$, the expected response
to item i for a person of average ability in maths group g is calculated as:
\[
\sum_{x=0}^{m_i} x \, \hat{P}(X_{gi} = x); \quad i = 1, \ldots, 18, 20, \ldots, 22; \quad g = 0, 1, 2;
\]
where $\hat{P}(X_{gi} = x)$ is calculated from Equation 6.2.
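The calculation can be illustrated with a short Python sketch. It assumes that Equation 6.2 takes the standard partial credit form, in which the log-odds of response code x over code x-1 equal the ability minus the corresponding item step difficulty. The function names and the step difficulties used in the example are hypothetical; only the three group ability estimates are taken from the text.

```python
import math


def pcm_probabilities(beta, deltas):
    """Category probabilities for one item under a partial credit model.

    `deltas` holds the item step difficulties (delta_i1, ..., delta_im), so the
    item has response codes 0..m. Assumes the standard form in which the
    log-odds of code x over code x-1 is beta - delta_ix.
    """
    # Cumulative sums of (beta - delta_ij); the empty sum for code 0 is 0.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(exponents)
    weights = [math.exp(e - top) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_response(beta, deltas):
    """Expected response code: sum over x of x * P(X = x)."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(beta, deltas)))


# Group ability estimates quoted in the text.
betas = {"below Maths B": 0.73, "at Maths B": 0.77, "beyond Maths B": 0.93}

# Hypothetical step difficulties for a four-category item (not thesis values).
hypothetical_deltas = [-1.0, 0.5, 1.5]

for group, beta in betas.items():
    print(group, round(expected_response(beta, hypothetical_deltas), 2))
```

Because the expected response increases with ability for any fixed set of step difficulties, the ordering of the three groups is the same for every item; only the size of the gap changes.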
Table 6.5 gives the estimated expected level of response for each group, together
with the maximum possible response code allocated for each item.
The closeness of the average β estimates for students below Maths B and those at
Maths B results in little difference in expected responses between these two groups
of students. This tends to reflect the small number of students below Maths B (n=28)
included in the data. In considering the difference in expected responses between
students at Maths B level and those beyond Maths B, it must be realised that the
degree to which the second group can outperform the first depends on the proximity
of students at Maths B level to the maximum possible response for an item. Figure
6.4 plots the difference in expected response to each item between the groups beyond
and at Maths B, and also between the groups at and below Maths B. These
differences are plotted against the expected response for the at Maths B group as a
proportion of the maximum possible response.
Item     Below Maths B  At Maths B  Beyond Maths B  Max Response
SRQ_1              4.2         4.3             4.4             5
SRQ_2              4.5         4.5             4.8             6
SRQ_3              3.4         3.5             3.6             4
SRQ_4              2.9         2.9             2.9             3
SRQ_5              2.2         2.2             2.3             3
SRQ_6              3.3         3.4             3.6             4
SRQ_7              3.9         4.0             4.3             5
SRQ_8              4.3         4.4             4.6             5
SRQ_9              3.8         4.1             4.8             6
SRQ_10             3.3         3.4             3.7             5
SRQ_11             3.4         3.4             3.5             4
SRQ_12             3.1         3.1             3.3             4
SRQ_13             3.7         3.8             4.2             6
SRQ_14             4.1         4.2             4.4             6
SRQ_15             2.6         2.7             3.0             5
SRQ_16             4.1         4.1             4.3             5
SRQ_17             3.0         3.1             3.4             4
SRQ_18             1.5         1.6             2.0             5
SRQ_20             2.5         2.6             3.3             6
SRQ_21             1.9         1.9             1.9             2
SRQ_22             2.8         3.0             3.6             6
Table 6.5 Expected responses to the SRQ predicted by the Rasch partial credit model
While the differences are small between the at Maths B and below Maths B groups,
the nature of the Rasch model is such that the pattern of differences is the same as for
the comparison between the beyond Maths B and at Maths B groups which can be
seen more clearly. As expected, there is a general decrease in difference as the
expected response for the at Maths B group gets closer to the maximum possible
response.
[Scatterplot: difference in expected response (vertical axis, 0.0 to 0.8) against expected response for At Maths B as a proportion of Max response (horizontal axis, 0.3 to 1.0); two series, Beyond-At_MathsB and At-Below_MathsB, with points labelled by item number]
Figure 6.4 Scatterplot showing differences in expected responses between Maths
groups
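The quantities plotted in Figure 6.4 can be approximately recomputed from Table 6.5. The sketch below does this in Python. The values are copied from the table, so they carry its one-decimal rounding and the differences will not match the plotted points exactly. The function and variable names are hypothetical.

```python
# (below, at, beyond, max response) for each item, copied from Table 6.5.
table_6_5 = {
    "SRQ_1": (4.2, 4.3, 4.4, 5), "SRQ_2": (4.5, 4.5, 4.8, 6),
    "SRQ_3": (3.4, 3.5, 3.6, 4), "SRQ_4": (2.9, 2.9, 2.9, 3),
    "SRQ_5": (2.2, 2.2, 2.3, 3), "SRQ_6": (3.3, 3.4, 3.6, 4),
    "SRQ_7": (3.9, 4.0, 4.3, 5), "SRQ_8": (4.3, 4.4, 4.6, 5),
    "SRQ_9": (3.8, 4.1, 4.8, 6), "SRQ_10": (3.3, 3.4, 3.7, 5),
    "SRQ_11": (3.4, 3.4, 3.5, 4), "SRQ_12": (3.1, 3.1, 3.3, 4),
    "SRQ_13": (3.7, 3.8, 4.2, 6), "SRQ_14": (4.1, 4.2, 4.4, 6),
    "SRQ_15": (2.6, 2.7, 3.0, 5), "SRQ_16": (4.1, 4.1, 4.3, 5),
    "SRQ_17": (3.0, 3.1, 3.4, 4), "SRQ_18": (1.5, 1.6, 2.0, 5),
    "SRQ_20": (2.5, 2.6, 3.3, 6), "SRQ_21": (1.9, 1.9, 1.9, 2),
    "SRQ_22": (2.8, 3.0, 3.6, 6),
}


def figure_points(table):
    """Return item -> (at / max, beyond - at, at - below), rounded."""
    points = {}
    for item, (below, at, beyond, max_code) in table.items():
        points[item] = (
            round(at / max_code, 2),
            round(beyond - at, 1),
            round(at - below, 1),
        )
    return points


points = figure_points(table_6_5)

# Items ordered by the size of the beyond-minus-at difference (largest first).
ranked = sorted(points, key=lambda item: points[item][1], reverse=True)
for item in ranked[:5]:
    proportion, beyond_at, at_below = points[item]
    print(f"{item}: at/max={proportion:.2f}, "
          f"beyond-at={beyond_at:.1f}, at-below={at_below:.1f}")
```

On these rounded figures the largest differences between the beyond and at Maths B groups occur for SRQ_9, SRQ_20 and SRQ_22.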
Items which are of most interest are those which appear to differ from this general
pattern. The most notable of these is SRQ_9, with SRQ_20 and SRQ_22 also of
some interest. The item SRQ_20 requires students to recognise the need, and to know
how, to find the maximum rainfall given the logarithms of the rainfalls. Although
logs are covered in the Maths B curriculum, the reinforcement of Maths B in higher
maths and greater experience in a broader range of applications gives those students
with higher level maths a clear advantage.
Item SRQ_22 involves the normal distribution. In this case, students with higher
maths are less likely to have had more experience in this specific area. As described
in Section 5.3.1, students’ difficulties with this item may in part be due to a reliance
on graphics calculators in working with distributions. It is possible that students who
have studied more maths have developed better visualisation abilities and more
transferable skills which are being demonstrated in this item.
Item SRQ_9 is notable as showing the greatest advantage for students with higher
level maths despite being of average difficulty. This item involves estimating the
fish population in a dam on the basis of a tag and recapture procedure. It is unlikely
that students in any of the three groups would have previously met a
situation of this nature within the school curriculum. The advantage of the higher
maths students in this item appears to rest on their ability to problem solve and to
handle multi-step tasks. In Section 7.4, the relationship of performance in this item
to numeracy is discussed with emphasis on confidence with handling fractions. In
Section 4.3.5 it was noted that students who have studied beyond Maths B
demonstrate greater ability in items on the Numeracy Questionnaire which involve
application of fractions. This greater ability will give these students an added
advantage in this item.
6.5 Discussion - Suitability of the SRQ
In Chapter 5, the appropriateness of the Statistical Reasoning Questionnaire as a tool
for measuring statistical reasoning at the secondary/tertiary interface was indicated.
Despite this confirmation, an application of both a dichotomous Rasch analysis and a
Rasch partial credit analysis indicates that the complete coverage of student abilities
by items may be a little lacking. In particular, the SRQ has a large number of item
step difficulties below the range of student abilities in the study and insufficient item
step difficulties at and above the maximum student ability. An interesting aspect of
this is that most of the SRQ items which do reach into the higher levels of ability are
part of a group of items requiring a slightly higher level of mathematical knowledge
as well as a greater appreciation of statistical terminology. However, as indicated by
the analysis in Chapter 7, it may be that this is necessary to assess higher student
ability in statistical reasoning.
It is felt that in order to measure more accurately a fuller range of statistical
understanding at the secondary/tertiary interface, more items reaching into the higher
levels of understanding should be included. Such items may well include an
appreciation of terminology, but care needs to be taken to separate reliance on
mathematical manipulation. (The concurrence of terminology and mathematical
knowledge is one of the complications with the items SRQ_17 to SRQ_20.) The
development of items which measure higher statistical understanding (particularly
apart from probabilistic reasoning) at this educational level remains an area for
further research, which should be informed by the results in Chapter 7.
While acknowledging the need for more items which assess higher levels of
understanding, reassurance should be taken from the distribution of total SRQ
Scores. When an assessment is overly simple for a group of students the effect on
the scores should be to apply an artificial ceiling. The resultant distribution would be
skewed with a cluster of observations in the upper tail. An examination of the
distributions (see Figure 5.1) shows that this is not the case for the total SRQ Scores.
Rather, all three cohorts demonstrate symmetric distributions, indicating that a ceiling
effect is not in operation.
In acknowledging the tension between the consistency and distribution of SRQ
Scores and the possibility for improvement of the SRQ, notice also needs to be taken
of its purpose within this research. In a study such as this, validity of outcomes
depends largely on the cooperation of students and in particular of students who
represent as accurately as possible all those within the study population. As
described in Chapter 2, the students enrolled in MAB101 are drawn from a wide
variety of backgrounds. Even moderately difficult items designed to measure the
ability of higher level students would be well beyond the ability of others. In order
to maximise a student’s cooperation, care needs to be taken to minimise the number
of such difficult items which, particularly at first glance, would be a disincentive to
less able or less motivated students. On the basis of the symmetry and consistency of
the distribution of students’ scores, it is argued that the SRQ is of an appropriate
level to assess the statistical reasoning of students at the secondary/tertiary interface,
balancing the competing factors of student ability and cooperation.
Chapter 7 Factors Which Influence Statistical Reasoning
7.1 Introduction
The major aim of this study is to better understand factors which influence statistical
thinking at the secondary/tertiary interface. In this chapter, two distinct aspects of
this are considered within the context of MAB101. The first of these examines
performance on the Statistical Reasoning Questionnaire for incoming students, while
the second looks at students’ performance on the end of semester section of the
assessment. The pressures of course requirements and the need to maximise student
cooperation prevented a follow-up Statistical Reasoning Questionnaire from being
administered as part of this study; moreover, the performance of students in course
assessment should be considered as an important aspect of their statistical
development.
Because of the differences between the first and second semester cohorts, described
in Chapter 2, and the difficulties with the administration of the Numeracy
Questionnaire in Semester II 2004, explained in Chapter 4, only the Semester I 2004
and Semester I 2005 cohorts are included in this chapter. In Section 7.2, general
linear models are used to describe the SRQ Score in terms of students’ numeracy,
attitudes, mathematical backgrounds and demographic variables. Similar modelling
carried out in relation to the SRQPC Score (defined in Section 6.3.2) is described in
Section 7.3. Section 7.4 investigates more closely the relationship between
numeracy and individual items of the SRQ. The modelling of students’ course
outcomes via general linear models is described in Section 7.5. Section 7.6
concludes this chapter with a discussion of these results.
7.2 Predictors of Statistical Reasoning of Incoming Students
As described in Section 6.2, the SRQ Score used here is formed as the total number
of correct responses to items common to the SRQ in both 2004 and 2005, excluding
those items considered to be more curriculum-based in nature. Items thus included
are SRQ_1 to SRQ_3 and SRQ_5 to SRQ_16. Variables included as possible predictors
for the SRQ Score were those considered in the analysis of Numeracy Scores
(see Section 4.3.4), together with the Numeracy Score itself and separate aspects of
attitudes towards statistics. Hence the allowable predictor variables are:
Gender male/female;
1st Semester dichotomous variable =1 for students for whom this was
their first year of enrolment at QUT;
Years Since School continuous variable;
OP continuous variable - OP (overall position) or equivalent
score; ranges from 1 to 25; numerically smaller values of
OP indicate a better performance;
Maths Student dichotomous variable =1 if enrolled in a mathematics
degree, a double degree including mathematics, an applied
science degree majoring in mathematics, or an education
degree with mathematics as a teaching subject;
Repeat dichotomous variable =1 if student had previously failed
MAB101;
Maths B Result a categorical variable with three levels:
D = distinction (an A or B standard at school level; a 6 or
7 standard in the university equivalent subject),
P = pass (any other passing grade),
N = failed or not attempted (includes all students who
have not successfully completed Maths B);
Higher Maths a dichotomous variable =1 for students who have studied
mathematics beyond Maths B, generally either the
extension maths subject (Maths C) at high school or at
least some university level mathematics;
Numeracy a continuous variable given by the total score on the
Numeracy Questionnaire, with range 0 to 21;
Affect continuous variable with possible values ranging from
-2 to 2; constructed as the mean of responses to the five
relevant items on the Attitudes Survey and described in
Section 3.2;
Value as for Affect – based on four items;
Difficulty as for Affect – based on two items;
Motivation as for Affect – based on two items;
Link as for Affect – based on two items;
Use as for Affect – based on one item;
Self-Efficacy continuous variable with possible values ranging from
-10 to 10; constructed as the sum of responses to the five
relevant items on the Attitudes Survey and described in
Section 3.2.
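The construction of the attitude variables can be illustrated as follows. This is a sketch under an inferred assumption: the stated ranges (a mean between -2 and 2, and a sum of five items between -10 and 10) suggest that each item response is coded from -2 to 2. The function names and example responses are hypothetical.

```python
from statistics import mean


def affect_score(responses):
    """Mean of item responses, each assumed to be coded from -2 to 2."""
    return mean(responses)


def self_efficacy_score(responses):
    """Sum of the five item responses, each assumed to be coded from -2 to 2."""
    if len(responses) != 5:
        raise ValueError("Self-Efficacy is built from five items")
    return sum(responses)


# Hypothetical responses from one student.
print(affect_score([1, 2, 0, 1, 1]))          # mean of five Affect items
print(self_efficacy_score([2, 1, -1, 0, 2]))  # sum of five Self-Efficacy items
```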
As described in Section 4.3.4, general linear models with a combination of
backwards elimination and forwards selection were used to describe relationships
between SRQ Score and the predictor variables. Because of the complications in the
reporting of OP scores (see Section 2.3) modelling was carried out both with OP as a
possible predictor and without.
7.2.1 Results for the 2004 cohort
An initial analysis was performed using only the 2004 data. With OP in the model,
significant predictors for SRQ Score are: Numeracy (p=0.001), OP (p=0.001),
Gender (p=0.012) and Years Since School (p=0.040). Further investigation revealed
that the significance of Years Since School is due entirely to three students for whom
Years Since School is greater than ten. A more robust model involves Numeracy
(p<0.001), Gender (p=0.005) and OP (p=0.024), and has an adjusted R-squared
value of 22%. In this model SRQ Score increases with improved Numeracy and OP,
with males outperforming females. With OP excluded from the model, significant
predictors are Numeracy (p<0.001) and Gender (p=0.004), with an adjusted R-
squared of 22%.
Because the various aspects of attitudes were not significant in explaining the SRQ
Score in 2004, and because of the desire to investigate issues specifically relevant to high school
experiences of statistics (see Section 3.5), the format of the Attitudinal Survey was
changed in 2005. The Affect, Value, Difficulty, Motivation, Link and Use aspects
were dropped, while the Self-Efficacy component was retained because of the
significance of this aspect in explaining numeracy. (See Section 4.3.4.)
7.2.2 Results for the 2005 cohort
Results for 2005 were also considered separately. Allowing for OP, significant
predictors are Numeracy (p<0.001), OP (p=0.001) and Years Since School (p<0.001)
with an adjusted R-squared of 30%. Further investigation of the Years Since School
term indicated that it is not the result of a small number of older students as in 2004.
Rather, for 2005, even when restricted to students with Years Since School less than
five, this variable is still a significant predictor with higher values resulting in higher
SRQ Scores. Without OP in the model, significant predictors are Numeracy
(p<0.001) and Maths B Result (p=0.009). The effect of Maths B Result in this model
is such that there is benefit in SRQ for students with Maths B Result = D over the
others, but no significant difference between those with Maths B = P and those who
have failed or not completed Maths B.
7.2.3 Combining the 2004 and 2005 cohorts
The two years were also considered in a combined analysis with an indicator variable
allowing for year and possible two-way interactions with Year included. With OP
included, the best model for the combined years has an adjusted R-squared of 25%
and significant predictors: Numeracy (p<0.001), OP (p<0.001) and Years Since
School (p<0.001). The fitted equation for this combined model is:
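Equation 7.1
\[
E[\text{SRQ}] = \underset{(\text{s.e.}=0.65)}{6.40} + \underset{(\text{s.e.}=0.035)}{0.21} \times \text{Numeracy} - \underset{(\text{s.e.}=0.045)}{0.19} \times \text{OP} + \underset{(\text{s.e.}=0.047)}{0.20} \times \text{Years Since School}.
\]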
Recall that the negative coefficient for OP is as expected since smaller values of OP
score indicate a better performance. The residual plots, shown in Figure 7.1, show
no systematic concerns with the model.
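To make the fitted model concrete, Equation 7.1 can be evaluated for a given student. The following Python sketch uses the coefficients quoted in the equation. The function name and the example student are hypothetical.

```python
def expected_srq(numeracy, op, years_since_school):
    """Fitted mean SRQ Score from Equation 7.1 (combined cohorts, OP included)."""
    return 6.40 + 0.21 * numeracy - 0.19 * op + 0.20 * years_since_school


# Hypothetical student: Numeracy 15 out of 21, OP 8, straight from school.
print(round(expected_srq(15, 8, 0), 2))

# Holding the other predictors fixed, one extra Numeracy point adds 0.21 and a
# one-step worse (numerically larger) OP subtracts 0.19.
print(round(expected_srq(16, 8, 0) - expected_srq(15, 8, 0), 2))
print(round(expected_srq(15, 9, 0) - expected_srq(15, 8, 0), 2))
```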
[Residual plots for SRQ on Numeracy, OP and Years Since School, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 7.1 Residual plots for the model of Equation 7.1 show no systematic
concerns.
Without OP, the best model has as significant predictors, Numeracy (p<0.001), Self-
Efficacy (p=0.011) and Gender (p=0.039) and an adjusted R-squared of 21%. The
fitted equation for this model is:
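Equation 7.2
\[
E[\text{SRQ}] = \underset{(\text{s.e.}=1.15)}{4.50} + \underset{(\text{s.e.}=0.22)}{0.45} \times (\text{Gender} = \text{Male}) + \underset{(\text{s.e.}=0.027)}{0.23} \times \text{Numeracy} + \underset{(\text{s.e.}=0.039)}{0.10} \times \text{Self-Efficacy}.
\]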
Again the residual plots shown in Figure 7.2 show no concerns.
[Residual plots for SRQ on Gender, Numeracy, and Self-Efficacy, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 7.2 Residual plots for the model of Equation 7.2 show no systematic
concerns.
7.2.4 Synthesising the results
In synthesising the results of the analyses for the two years both individually and
combined, a few points are notable. Firstly, it is clearly evident that in all models,
with or without OP score included, the score on the Numeracy Questionnaire is a
highly significant predictor of the score on the SRQ and is a more useful predictor
than other variables formed from the mathematical backgrounds of students. When
the OP score is included in the model, it too is highly significant in addition to the
Numeracy Score. Both these variables work in the expected direction, that is,
increasing Numeracy increases SRQ and improving OP increases SRQ.
The significance of gender differs between the two years. In 2004, Gender is
significant both with and without OP, while in 2005 it is not. When the two years
are combined, Gender is significant without OP but not with. In all cases where
Gender is significant, males outperform females in the SRQ in addition to their
advantage in the Numeracy Questionnaire. (See Section 4.3.4)
When OP is included in the model, Years Since School is significant in 2005 but not
in 2004. It is likely that this difference between years is related to the under-
reporting of OP scores in 2004. It is not surprising that students with higher values
of Years Since School are less likely to report their OP score. Investigation of the
two years indicates that in the 2004 data, there is much stronger evidence than in
2005 of this relationship between the non-reporting of OP and increasing values of
Years Since School. For the 2005 data, the coefficient of Years Since School in the
model is positive, suggesting that mature-age students in general have experience
which improves their statistical reasoning above what their numeracy and OP scores
together would predict. The same effect is not evident in the 2004 data possibly
because fewer such students have provided an OP score.
When OP score is not included, Years Since School is not significant. Investigation
of the Years Since School variable shows that, considering pairwise correlations, it is
uncorrelated with both SRQ and Numeracy but positively correlated with OP; that is,
larger OP scores (indicating lower achievement) are linked with larger values of
Years Since School. Hence a reasonable interpretation of the positive coefficient of
Years Since School in models for SRQ which include OP, is as a correction for the
OP score amongst older students.
In general, when OP score is not included in the model, less variation in SRQ can be
explained. Other variables, such as Self-Efficacy in the combined cohorts and Maths
B Result in 2005, become useful predictors, but these do not explain SRQ as well as
OP. In all cases, the Numeracy Score remains pre-eminent in its usefulness in
explaining statistical reasoning.
7.2.5 Recent school leavers
In order to focus on those students who had graduated more recently from high
school, the above analyses (for each year individually and combined) were repeated
considering only those students who had values for Years Since School equal to zero
or one. This removed 192 students from the total combined years' data of 484
students. Eighty-two of these had a Years Since School value of two or more, while
110 had a missing value for this variable.
The results for these smaller groups were similar to those for the complete groups.
When considering only these students at the immediate secondary/tertiary interface,
for the 2004 group, Gender is not significant in predicting SRQ whether or not OP is
included. Without OP, Self-Efficacy (p=0.008) becomes a useful predictor in 2004,
while in 2005, Maths B Result is not significant when OP is not allowed for. When
the two years are combined, Gender is not significant but there is a significant
difference between years which acts in the opposite direction to the Year effect for
Numeracy. The two fitted equations for years combined are:
Equation 7.3
\[
E[\text{SRQ}] = \underset{(\text{s.e.}=0.76)}{6.52} + \underset{(\text{s.e.}=0.040)}{0.21} \times \text{Numeracy} - \underset{(\text{s.e.}=0.052)}{0.22} \times \text{OP}
\]
when OP is included, and,
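Equation 7.4
\[
E[\text{SRQ}] = \underset{(\text{s.e.}=0.55)}{3.78} + \underset{(\text{s.e.}=0.036)}{0.25} \times \text{Numeracy} + \underset{(\text{s.e.}=0.052)}{0.15} \times \text{Self-Efficacy} + \underset{(\text{s.e.}=0.28)}{0.63} \times (\text{Year} = 2004)
\]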
when OP is not included. Hence when consideration is restricted to only those
students more specifically at the secondary/tertiary interface, the importance of
numeracy in predicting statistical reasoning is not altered.
7.2.6 Incorporating levels of numeracy
Using the levels of numeracy defined in Section 4.3.3 by the Rasch analysis, the
Numeracy Score was broken into two components, introductory numeracy and
intermediate numeracy. The variables thus defined:
Intro-Numeracy a continuous variable given by the total score
corresponding to levels 1 to 3 on the Numeracy
Questionnaire (i.e. items N_1-N_3, N_5-N_11 and N_19)
with range 0 to 11;
Inter-Numeracy a continuous variable given by the total score
corresponding to levels 4 and 5 on the Numeracy
Questionnaire (i.e. items N_4, N_12-N_18, N_20-N_21)
with range 0 to 10;
were included as possible predictors for SRQ, using the complete 2004/2005 data set.
Allowing for OP, the best model for SRQ involves both components of numeracy,
Intro-Numeracy (p=0.021) and Inter-Numeracy (p=0.001) as well as OP (p<0.001),
and Years Since School (p<0.001). This model has an adjusted R-squared of 25%
and the fitted equation:
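Equation 7.5
\[
\begin{aligned}
E[\text{SRQ}] ={}& \underset{(\text{s.e.}=0.77)}{6.46} + \underset{(\text{s.e.}=0.083)}{0.19} \times \text{Intro-Numeracy} + \underset{(\text{s.e.}=0.062)}{0.21} \times \text{Inter-Numeracy} \\
& - \underset{(\text{s.e.}=0.045)}{0.19} \times \text{OP} + \underset{(\text{s.e.}=0.048)}{0.21} \times \text{Years Since School}.
\end{aligned}
\]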
This model is virtually identical to the corresponding model (Equation 7.1) which
considers the Numeracy Score as a single entity, explains the variation in SRQ
equally well, and gives essentially equal weighting to the two components of the
Numeracy Score.
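The split of the Numeracy Score into its two components, as defined at the start of this section, can be sketched in Python. The item numbers are those listed in the definitions of Intro-Numeracy and Inter-Numeracy. The function name, the data layout and the example student are hypothetical.

```python
# Item numbers for the two components, as listed in the variable definitions.
INTRO_ITEMS = [1, 2, 3, 5, 6, 7, 8, 9, 10, 11, 19]     # levels 1 to 3
INTER_ITEMS = [4, 12, 13, 14, 15, 16, 17, 18, 20, 21]  # levels 4 and 5


def numeracy_components(correct):
    """Split a Numeracy Score into introductory and intermediate parts.

    `correct` maps item number (1 to 21) to True/False (hypothetical layout).
    """
    intro = sum(1 for item in INTRO_ITEMS if correct.get(item, False))
    inter = sum(1 for item in INTER_ITEMS if correct.get(item, False))
    return intro, inter


# Hypothetical student who answers items N_1 to N_12 correctly and no others.
answers = {item: item <= 12 for item in range(1, 22)}
intro, inter = numeracy_components(answers)
print(intro, inter, intro + inter)
```

The two components always sum to the total Numeracy Score, which is why a model giving them equal weight, such as Equation 7.5, is nearly indistinguishable from one using the total.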
Without allowing for OP, the best model involves the predictors Intro-Numeracy
(p<0.001), Inter-Numeracy (p=0.003), Self-Efficacy (p=0.010) and Gender
(p=0.030). This model has an adjusted R-squared of 21% and the fitted equation:
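Equation 7.6
\[
\begin{aligned}
E[\text{SRQ}] ={}& \underset{(\text{s.e.}=0.48)}{3.88} + \underset{(\text{s.e.}=0.22)}{0.47} \times (\text{Gender} = \text{Male}) + \underset{(\text{s.e.}=0.064)}{0.34} \times \text{Intro-Numeracy} \\
& + \underset{(\text{s.e.}=0.049)}{0.15} \times \text{Inter-Numeracy} + \underset{(\text{s.e.}=0.039)}{0.10} \times \text{Self-Efficacy}.
\end{aligned}
\]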
Again, this model has the same significant variables as the corresponding model
(Equation 7.2) which considers the Numeracy Score as a single entity and explains
the variation in SRQ equally well. However, without OP in the model, SRQ is best
predicted by giving approximately twice the weighting to the introductory
component of the Numeracy Score as to the intermediate component.
The most important point to be noted from these models is that, contrary to what may
appear intuitive, both introductory and intermediate components of the Numeracy
Score are significant in explaining statistical reasoning. This is despite the fact that
the SRQ requires very few mathematical calculations of the type included in the
introductory component of the Numeracy Questionnaire and no manipulations of the
type included in the intermediate component. Again we see evidence that ability
with mathematical skills tends to be linked with statistical reasoning even if the
reasoning does not explicitly require those specific mathematical skills.
7.3 Predictors of the SRQPC Scores
In Section 6.3.2, the results of the Rasch partial credit model were used to create the
SRQPC Score in which responses to the SRQ are scored according to the level of
reasoning they reflect. As for the SRQ Score, general linear models were used to
determine the best predictors for the SRQPC Score.
This modelling resulted in the same combination of predictor variables for SRQPC as
for SRQ, namely: Numeracy (p<0.001), Years Since School (p<0.001) and OP
(p=0.010), with an adjusted R-squared of 20% and the fitted equation:
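Equation 7.7
\[
E[\text{SRQPC}] = \underset{(\text{s.e.}=1.70)}{47.00} + \underset{(\text{s.e.}=0.093)}{0.54} \times \text{Numeracy} - \underset{(\text{s.e.}=0.12)}{0.30} \times \text{OP} + \underset{(\text{s.e.}=0.12)}{0.44} \times \text{Years Since School}.
\]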
However, the residual plots, shown in Figure 7.3, indicate a degree of left skewness.
[Residual plots for SRQPC on Numeracy, OP, Years Since School, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 7.3 Residual plots for the model in Equation 7.7 indicate some left
skewness.
A standard technique for handling skewness is to apply a Box-Cox transformation,
that is, a transformation of the form $Y^{\lambda}$ to the response variable, Y, such that the
transformed data is closer to normal. For the SRQPC data, choosing the value of λ
which minimises the variance of the transformed variable, hence maximising the
likelihood, gives a 95% confidence interval for λ of 1.90 to 3.32. Hence, the value
λ=3 is chosen as the most appropriate. When this transformation is applied and the
modelling procedure repeated for the transformed data, the resulting best model
contains the same predictor variables as for SRQPC, (and SRQ), namely Numeracy,
Years Since School and OP. The residual plots for this model, shown in Figure 7.4,
demonstrate no systematic concerns. It should be emphasised that this procedure is
used here simply to check that the significant variables are unchanged by
transforming to obtain more normal residuals. It is not suggested that a
transformation of the SRQPC be used. The interest lies not in predicting the SRQPC
but in investigating its nature and what influences it.
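The idea behind the choice of the Box-Cox parameter can be sketched in Python. This is a simplified illustration, not the procedure used in the analysis above: it treats the response on its own, with no predictor variables, and searches a grid of candidate values for the one with the largest profile log-likelihood. The data are synthetic and hypothetical, constructed so that their cube is close to normal, and the function names are likewise hypothetical.

```python
import math
from statistics import NormalDist


def boxcox_loglik(y, lam):
    """Profile log-likelihood of a Box-Cox transformation with parameter lam.

    Sketch for a response with no predictors; all y must be positive.
    """
    n = len(y)
    if abs(lam) < 1e-12:
        z = [math.log(v) for v in y]  # limiting case lam = 0
    else:
        z = [(v ** lam - 1.0) / lam for v in y]
    z_bar = sum(z) / n
    variance = sum((v - z_bar) ** 2 for v in z) / n
    # Normal log-likelihood term plus the Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(v) for v in y)


def best_lambda(y, grid):
    """Grid value of lambda with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: boxcox_loglik(y, lam))


# Hypothetical left-skewed scores: cube roots of evenly spread normal
# quantiles, so that the cube of the data is close to normal by construction.
n = 200
quantiles = [NormalDist().inv_cdf((k + 0.5) / n) for k in range(n)]
scores = [(150000 + 50000 * q) ** (1 / 3) for q in quantiles]

grid = [k / 10 for k in range(-20, 61)]  # lambda from -2.0 to 6.0
print(best_lambda(scores, grid))
```

A value of λ greater than one stretches the upper part of the scale relative to the lower part, which is what pulls in a long lower tail of the kind seen in the SRQPC residuals.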
If the Numeracy Score is split into its two components, then again the predictor
variables are the same as for SRQ. No further modelling of the SRQPC Score was
carried out because of the similarity to the SRQ with respect to important predictors,
and because of the difficulties of interpreting the transformed variable.
[Residual Plots for lambda=3, in four panels: Normal Probability Plot of the Residuals (Percent against Standardized Residual); Residuals Versus the Fitted Values (Standardized Residual against Fitted Value); Histogram of the Residuals (Frequency against Standardized Residual); Residuals Versus the Order of the Data (Standardized Residual against Observation Order)]
Figure 7.4 Using the transformed response variable, residual plots demonstrate no systematic concerns.
7.4 Further Exploration of the Link between Numeracy and
Statistical Reasoning
As well as considering the SRQ and SRQPC Scores, individual items on the SRQ
were investigated regarding their relationship with numeracy. For each item in the
SRQ, the mean Numeracy Scores were compared, using t-tests, for the group of
students who had the item correct and those who had it incorrect. This comparison
was made for both the introductory and intermediate Numeracy Scores as well as the
total score. Table 7.1 shows the results of these comparisons. For those items with a
highly significant difference in mean scores for each of introductory, intermediate
and total numeracy, a 95% confidence interval for the mean difference in the total
score is also quoted.
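The item-by-item comparison described above can be sketched as follows. This is an illustration only: the scores are hypothetical, the function name is not from the thesis, and a large-sample normal quantile is assumed in place of the t quantile used by a t-test (the two are close for cohorts of several hundred students).

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(correct, incorrect, level=0.95):
    """Difference in group means with a large-sample (normal) confidence interval."""
    diff = mean(correct) - mean(incorrect)
    # Welch-style standard error: the two groups may have unequal variances.
    se = sqrt(variance(correct) / len(correct) + variance(incorrect) / len(incorrect))
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return diff, diff - z * se, diff + z * se


# Hypothetical total Numeracy Scores, grouped by success on one SRQ item.
numeracy_correct = [10, 12, 14, 16]
numeracy_incorrect = [8, 9, 10, 11]
diff, lower, upper = mean_difference_ci(numeracy_correct, numeracy_incorrect)
```

An interval that excludes zero, like those quoted in Table 7.1, indicates that students who answered the item correctly had higher mean numeracy.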
Items SRQ_1, SRQ_3, SRQ_9, SRQ_19 and SRQ_20 are notable, as for these items
the difference in Numeracy Scores between the students who succeed at the item and
those who do not is greatest. In the case of SRQ_19 and SRQ_20, which involve
logarithms, no explanation of this relationship is required. SRQ_1 is the item which
requires an appreciation of the algorithm for the mean. Its relationship to numeracy
is evidence of the claim made in Section 5.3.1 that a correct response to this item
reflects something other than the students being able to select an appropriate average
as is implied in the SRA component scales.
The remaining two items in this group, SRQ_3 and SRQ_9, both require some ability
with fractions. Given that the item SRQ_3 (involving drawing marbles from boxes)
is relatively easy for students at this level with a success rate of 82%, its strong
relationship to numeracy is interesting. As explained in Section 5.3.1, it is found
with this item that most errors are the result of miscalculations rather than
misunderstandings of probability. There is evidence here that errors resulting from
apparent carelessness (such as calculating the ratio of red to blue marbles in one box
and red to all marbles in the other), even with the use of a calculator, are not entirely
random, but likely to be related to a lack of confidence and familiarity with handling
common fractions in any situation.
Item      Introductory   Intermediate   Total      95% CI for mean diff.
          Numeracy       Numeracy       Numeracy   in total numeracy
SRQ_1     ***            ***            ***        (2.4, 3.9)
SRQ_2     ***            ***            ***        (1.2, 2.7)
SRQ_3     ***            ***            ***        (2.1, 4.0)
SRQ_4     -              -              -
SRQ_5     ***            ***            ***        (0.7, 2.2)
SRQ_6     ***            -              *
SRQ_7     *              -              *
SRQ_8     -              -              -
SRQ_9     ***            ***            ***        (2.4, 3.8)
SRQ_10    ***            ***            ***        (1.4, 2.8)
SRQ_11    -              -              -
SRQ_12    -              -              -
SRQ_13    -              **             **
SRQ_14    *              -              -
SRQ_15    -              -              -
SRQ_16    -              -              -
SRQ_17    ***            -              ***
SRQ_18    -              -              -
SRQ_19    ***            ***            ***        (2.0, 3.8)
SRQ_20    ***            ***            ***        (2.6, 4.3)
SRQ_21    -              **             **
SRQ_22    -              -              -
*** p<0.005, ** 0.005<p<0.01, * 0.01<p<0.05
Table 7.1 SRQ items for which successful and unsuccessful students reflect a significant difference in mean Numeracy Scores
Responses to SRQ_9 (the tag and recapture fish question) were investigated more
closely regarding their relationship with both the introductory and intermediate levels
of numeracy. In Section 6.3, through the Rasch analysis, responses to this item were
scored as:
0 no response, or response < 200;
3 responses between 200 and 525;
4 other incorrect responses – usually the result of calculating an
incorrect proportion;
5 correct response (2000); 10% (did not quite complete).
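The coding scheme listed above can be written as a small function. This is a sketch under stated assumptions: the function name is illustrative, a blank answer is represented by None, the boundaries 200 and 525 are taken as belonging to level 3, and the answer "10%" is assumed to be recorded as text.

```python
def srq9_level(response):
    """Map a raw SRQ_9 answer to the fitted response level 0, 3, 4 or 5.

    `response` is None for a blank, a number, or the string "10%".
    Treating 200 and 525 as inclusive bounds of level 3 is an assumption.
    """
    if response is None:
        return 0
    if response == "10%" or response == 2000:
        return 5          # correct, or correct reasoning not quite completed
    if response < 200:
        return 0
    if response <= 525:
        return 3
    return 4              # other incorrect responses


# Hypothetical raw answers covering each branch of the scheme.
levels = [srq9_level(r) for r in (None, 150, 400, 1800, "10%", 2000)]
```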
[Two boxplot panels: Introductory Numeracy Score and Intermediate Numeracy Score, each plotted against response level for SRQ_9 (0, 3, 4, 5).]
Figure 7.5 Boxplots of Introductory and Intermediate Numeracy Scores by response to SRQ_9
Figure 7.5 gives the boxplots for the introductory and intermediate Numeracy Scores
of students, according to their fitted response code for item SRQ_9.
The boxplots indicate that students giving a level 5 response to SRQ_9 tend to score
more highly on both components of numeracy. There is a suggestion that those
responding at level 4 tend to have lower scores than those responding at level 3 on
the introductory numeracy but not on the intermediate numeracy. As the
introductory component of the Numeracy Questionnaire includes the items which
require handling of fractions, this is evidence of a relationship between the
manipulation of fractions in simple arithmetic and the successful application of
proportional reasoning. That is, students responding at level 4 are most likely being
prevented from completing their statistical reasoning by a lack of competency in
handling fractions.
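The comparison that the boxplots support can also be made numerically by summarising scores within each response level. The sketch below uses hypothetical data and an illustrative function name; it is not the analysis code of the thesis.

```python
from collections import defaultdict
from statistics import median


def summarise_by_level(levels, scores):
    """Group scores by response level and return {level: (min, median, max, n)}."""
    groups = defaultdict(list)
    for level, score in zip(levels, scores):
        groups[level].append(score)
    return {
        level: (min(vals), median(vals), max(vals), len(vals))
        for level, vals in sorted(groups.items())
    }


# Hypothetical SRQ_9 response levels and introductory Numeracy Scores.
levels = [0, 3, 3, 4, 4, 5, 5, 5]
intro_numeracy = [5, 8, 9, 6, 7, 9, 10, 11]
summary = summarise_by_level(levels, intro_numeracy)
```

In these made-up data the level 4 median falls below the level 3 median, which is the kind of pattern the text reports for introductory numeracy.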
The demonstration of a link between handling and manipulating fractions, as
assessed in the Numeracy Questionnaire, and using full and complete proportional
reasoning, as required for success at both SRQ_3 and SRQ_9, is of importance.
While manipulation and reasoning may well be considered as different abilities, the
link shown here suggests that they cannot be separated.
7.5 Predictors of the Exam Component of Assessment
The assessment schedule for MAB101 is given in Table 2.2. The assessment focus
in this section is on the end of semester exam, as it is a measure of students’ overall
individual learning of the core knowledge and skills presented in the course. The
mid-semester test focuses only on specific sections of the course. Quizzes, designed
largely to encourage engagement with the material, are often completed with a
substantial amount of assistance from tutors and other students. The project is an
excellent tool for developing statistical thinking but has the disadvantage in research
of being completed in groups and therefore may not necessarily be accurate as a tool
for measuring individual learning. However, the end of semester exam reflects the
skills and understanding developed through the projects. Analysis of data as well as
interaction with students in helping with projects throughout the semester indicates
that those with the best understanding in their projects also tend to demonstrate better
understanding and performance in the exam.
In both 2004 and 2005, the end of semester exam contributed 60% of the assessment
for MAB101, or 50% for students who completed an optional essay. In both years,
the exam paper offered more marks than were required for full marks, in order to
give students some flexibility of choice. Because of the diversity of students, a
considerable number obtained more than the requisite full marks. Hence, for this
analysis, raw exam scores were scaled as a percentage of the maximum marks
available and the two years combined.
Of those students who sat the exam at the end of semester, approximately 80% had
completed at least one of the Attitudes Survey, Numeracy Questionnaire or
Statistical Reasoning Questionnaire at the beginning of semester. Students who were
not surveyed consist predominantly of those who enrolled after the first week of
semester and those who attended very few classes even from week one.
Approximately 80% of students involved in the survey process sat the final exam.
Those not doing so include students who formally withdrew without penalty in the
first weeks of the course and those who failed to complete the course without notice.
Not surprisingly, the participants in both the survey process and the final exam are
not entirely representative of the complete group. In the final exam, those students
who had participated in the survey process outperformed those who had not, with a
95% confidence interval for the mean difference in exam mark of (5.2, 13.0).
Conversely, both introductory and intermediate levels of numeracy as well as SRQ
and SRQPC Scores were significantly higher for students who sat the exam
compared to those who did not. Interestingly however, there is no indication of a
difference in OP scores between these two groups. When possible relationships are
investigated between the various background variables and whether or not a student
sat the exam, the only significant relationship is with the variable Maths B Result,
with students who have failed or not completed Maths B less likely to sit the exam.
This is consistent with the observation that students without Maths B tend to need
extra support with all concepts throughout the semester, not only those that are
specifically associated with mathematics.
It has already been observed in this chapter that numeracy, mathematical
backgrounds, OP and self-efficacy feature in explaining the SRQ and the SRQPC
Scores. General linear models were used to investigate the simultaneous significance
of students’ mathematical backgrounds, statistical reasoning and numeracy at the
beginning of semester on their end of semester exam score. That is, knowing that
numeracy, mathematical backgrounds and OP are significant in explaining SRQ, the
use of general linear models explores what happens when we consider their
combined effects on the end of semester exam score. Using the SRQ Score to
measure statistical reasoning, the best model for Exam Score has an adjusted R-
squared of 26% and significant predictors: OP (p<0.001), Year (p<0.001), Maths
Student (p=0.003) and an interaction between Maths Student and Year (p=0.005).
Neither Numeracy nor SRQ is significant in the presence of these variables (which, it
must be remembered, are related to them). However, if the SRQPC Score is used to
measure statistical reasoning, it is a useful predictor of the final exam score even in
the presence of other variables which are related to it. Using SRQPC, the best model
for Exam Score has an adjusted R-squared of 32% and significant predictor
variables: OP (p<0.001), Year (p<0.001), SRQPC (p=0.009), Gender (p=0.009),
Years Since School (p=0.015), Maths Student (p=0.015) and an interaction between
Maths Student and Year (p=0.011). The residuals for this model, shown in Figure
7.6, indicate no real concern. The fitted equation for this model is given by:
Equation 7.8
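\begin{align*}
E[\text{Exam Score}] ={}& \underset{(\text{s.e.}=11.04)}{43.38}
  + \underset{(\text{s.e.}=2.14)}{5.62} \times (\text{Gender} = \text{female}) \\
&+ \underset{(\text{s.e.}=3.37)}{0.20} \times (\text{Maths Student} = 1;\ \text{Year} = 04) \\
&+ \underset{(\text{s.e.}=2.50)}{2.33} \times (\text{Maths Student} = 0;\ \text{Year} = 05) \\
&+ \underset{(\text{s.e.}=3.36)}{14.33} \times (\text{Maths Student} = 1;\ \text{Year} = 05) \\
&+ \underset{(\text{s.e.}=0.19)}{0.50} \times \text{SRQPC}
  - \underset{(\text{s.e.}=0.37)}{2.43} \times \text{OP}
  + \underset{(\text{s.e.}=0.44)}{1.09} \times \text{Years Since School}.
\end{align*}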
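To show how the terms of Equation 7.8 combine, the fitted equation can be evaluated for a given student. The sketch below reads the coefficients from the equation and assumes that the baseline category is a male, non-maths student in the 2004 cohort; the function name and the example student are hypothetical.

```python
# Coefficients as read from Equation 7.8; the baseline is assumed to be a
# male, non-maths student in the 2004 cohort.
INTERCEPT = 43.38
FEMALE = 5.62
MATHS_BY_YEAR = {(1, 2004): 0.20, (0, 2005): 2.33, (1, 2005): 14.33}
SRQPC_SLOPE = 0.50
OP_SLOPE = -2.43
YEARS_SINCE_SCHOOL_SLOPE = 1.09


def expected_exam_score(female, maths_student, year, srqpc, op, years_since_school):
    """Expected scaled exam score (percent) under the fitted equation."""
    score = INTERCEPT
    score += FEMALE if female else 0.0
    score += MATHS_BY_YEAR.get((int(maths_student), year), 0.0)
    score += SRQPC_SLOPE * srqpc
    score += OP_SLOPE * op
    score += YEARS_SINCE_SCHOOL_SLOPE * years_since_school
    return score


# Hypothetical student: female maths student, 2005 cohort, SRQPC 50, OP 5.
example = expected_exam_score(True, True, 2005, 50, 5, 0)
# Contribution of the SRQPC term at the observed extremes of 23 and 66.
srqpc_range = (SRQPC_SLOPE * 23, SRQPC_SLOPE * 66)
```

As emphasised in the text, the individual terms should not be interpreted in isolation; the function is only a convenient way of seeing the size of each contribution.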
[Four-panel residual plots for Exam on Gender, Maths Student, Year, SRQPC, OP, Years Since School, Maths Student x Year: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.]
Figure 7.6 Residual plots for the model in Equation 7.8 show some indication of
skewness.
Observed SRQPC Scores over the two cohorts range from 23 to 66, and hence the
predicted contribution to exam score from SRQPC ranges from 11.5 to 33. The
interaction between Year and Maths Student is a feature of the two cohorts which
needs to be allowed for in the model. Investigation of the exam scores and also of
marks for other assessment items indicates that in 2005 there was a significant group
of maths students who dominated the high achievers. Another feature of this model
is that unlike for the Numeracy and Statistical Reasoning Questionnaires, females
significantly outperform males. This is also the case for other assessment items and
is not uncommon in performance measures. However, it needs to be remembered
that at least some association exists between Gender, Maths Student and Year as
described in Section 2.3. It may be that part of the role of Gender in Equation 7.8 is
to adjust for the combined effects of Maths Student and Year. It must be emphasised
that none of these terms should be interpreted in isolation. Our main focus here is on
investigating the possible role of the SRQPC Score in explaining the exam score in
the presence of all other variables available to us.
The OP score is a measure of effort and commitment as well as ability, which are all
components of achievement in the end of semester exam. If OP is not included in
modelling Exam Score, but SRQPC is, then the best model involves SRQPC
(p<0.001), Gender (p=0.001), Maths Student (p<0.001), Year (p<0.001), Maths B
Result (p=0.014) and an interaction between Maths Student and Year (p=0.002), but
the adjusted R-squared falls to 20%. The Numeracy Score is not significant here in
the presence of SRQPC, but if SRQPC is forced out of the model, then a model
involving Gender (p<0.001), Year (p<0.001), Maths Student (p=0.001), Numeracy
(p=0.002), Maths B Result (p=0.003), Years Since School (p=0.030) and interactions
between Maths Student and Year (p=0.004) and Gender and Maths B Result
(p=0.013) also has an adjusted R-squared of 20%.
The above results, taken in conjunction with the results of previous chapters, are of
great importance for statistical educators, statistical education researchers and
curriculum developers and advisors. We have seen that both introductory and
intermediate numeracy and mathematical backgrounds or inclinations are highly
significant in explaining SRQ and SRQPC Scores even though these involve no
questions usually regarded as requiring mathematical background.
This section has now investigated the end of semester exam component of the
assessment which represents the statistical operational knowledge and skills learnt
during the semester. The statistical operational knowledge and skills of this course
are those endorsed and emphasised by the statistical education reform movement,
focussing on data-driven statistical reasoning with negligible calculations,
manipulations or considerations of the “mathematical” type. However, not only do
mathematical background and skills contribute significantly to the statistical
understanding and operational knowledge learnt in this apparently non-mathematical
course, but the SRQPC Score, even in the presence of these significant mathematical
variables, is also important in explaining the extent to which students develop
statistical operational skills and understanding during the semester. Neither the
Statistical Reasoning Questionnaire nor the end of
semester exam involves overt mathematics. What is emerging throughout this
research and analysis of data is that the link between mathematical and statistical
thinking and skills goes beyond any specific numerate techniques that might appear
in basic statistics.
7.6 Discussion
This chapter has considered the relationships between statistical reasoning of
MAB101 students, their numeracy skills, attitudes towards statistics, mathematical
backgrounds and demographic variables, both at the beginning and end of the course.
At the beginning of the course statistical reasoning was measured using the
Statistical Reasoning Questionnaire developed in Section 5.2, while at the end of the
course, performance on the end of semester section of the assessment was used as an
indicator of statistical reasoning.
Use of the same SRQ instrument, or a comparable end of semester SRQ, was not
possible due to circumstances surrounding these two time points and the aims of the
study. The wide range of statistical backgrounds and experience with which students
enter the course necessitated the use of an instrument which is virtually technique
free at the beginning of the study. While comparisons may have been of interest had
a similar instrument been administered at the end of the course, such an instrument
would be perceived by students as irrelevant to their work on projects, assignments
and study towards the end of semester, and hence viewed as an imposition, resulting
in low student cooperation and even resentment.
In measuring statistical reasoning based on the SRQ, both the SRQ and SRQPC
Scores were used. The more symmetric distribution of the SRQ residuals made it
more appropriate for use in general linear models; however, models formed using the
SRQPC Score were virtually identical to those based on the SRQ Score.
The major implication of the modelling used to describe the statistical reasoning
scores of students at the beginning of MAB101 is the usefulness and importance of
the students’ Numeracy Scores, with higher Numeracy Scores corresponding to
higher SRQ Scores. This feature is such that numeracy dominates the prediction of
SRQ Score in nearly all models, whether or not the OP score is included.
Furthermore, when the Numeracy Score is broken into two measures, the
introductory and intermediate components, both of these together are needed to best
describe the SRQ Score. This is despite the lack of mathematical calculations
required in the SRQ.
When included in the model, the OP score is also highly significant. The negative
coefficient of OP is as expected since smaller OP scores indicate higher achievement.
With OP score acting as a general measure of ability, the important feature of this
model is the significance of Numeracy after allowing for OP.
The linear models developed indicate that there is a positive effect on SRQ Scores
from the number of years since leaving school in addition to the effect of numeracy
and OP scores. Investigation shows that this is most likely acting to correct the
effect of larger OP scores amongst older students. While these older students do not
in general have higher SRQ Scores, such a correction essentially extends the
applicability of the model to include these students.
With numeracy, OP and the number of years since school allowed for, none of the
other variables is helpful in explaining statistical reasoning. This includes the
various aspects of attitudes towards statistics described in Chapter 3, as well as
mathematical background and demographic variables. (It should be remembered,
however, that self-efficacy, gender and other background variables are useful
predictors of numeracy. See Section 4.3.4.) Without allowing for OP, self-efficacy,
Maths B result and possibly gender help, in addition to numeracy, to explain
statistical reasoning.
Examining the relationship between Numeracy Scores and individual items on the
Statistical Reasoning Questionnaire indicates that the link between numeracy and
statistical reasoning is not restricted to items which are more difficult or more
obviously mathematical by nature. In particular, the successful use of even a basic
level of proportional reasoning appears to be related to the level of basic numeracy
skills.
The end of semester component of the assessment in MAB101 measures the
statistical operational knowledge and understanding developed during the semester in
a data-driven, non-mathematical course strongly aligned with the principles and
practices of the statistical education reform movement. The effect of the OP score on
the marks on the end of semester component of the assessment can most probably be
explained by the impact on both these measures of a combination of effort,
commitment and ability. With OP in the model, the number of years since school is
significant, possibly again correcting for the effect of OP.
Using the SRQPC Score to describe statistical reasoning helps to explain the exam
score in a way that the SRQ Score does not. It is possible that a contribution to this
is that there is a slightly lower correlation between SRQPC and OP than between
SRQ and OP. It is of interest that the use of levels of partial understanding in
formulating the SRQPC helps to explain marks in assessment that bring together the
operational knowledge and interpretation developed over the semester. That is, the
partial understanding measured by the SRQPC Score appears to provide a measure of
potential to succeed in development within the course.
There is also a significant effect of gender on exam mark. However, unlike in the
Numeracy Scores where males outperform females, in exam marks females
outperform males.
In the presence of other predictors which themselves depend on the Numeracy Score
(as well as an interaction between year and maths students which is a feature of the
two cohorts) the Numeracy Score is not additionally significant in predicting the
exam mark. This is perhaps not surprising given its strong influence on SRQ and
SRQPC. This remains the case even if OP is not included in the model. However,
without OP and without the benefit of the SRQPC Score, the Numeracy Score is
again a highly significant predictor of statistical skills and understanding.
In investigating exam scores, we are automatically excluding students who have
withdrawn from, or failed to complete, the course. Comparisons between students
who did and did not sit the exam indicate that those who did not sit it had lower scores
at the beginning of semester in statistical reasoning and both introductory and
intermediate levels of numeracy. Notably however, there is no significant difference
in OP scores between these two groups. Also, the only background variable that is
significantly related to sitting the end of semester exam is the Maths B result.
Students who have failed or not completed Maths B are more likely to not complete
MAB101 than those who have successfully completed Maths B.
Hence, despite the dominance of OP score in predicting course outcome as measured
by the end of semester section of the assessment, care needs to be taken to account
for other features. In particular, it must be noted that students who enter MAB101
with lower levels of numeracy and statistical reasoning, as well as those without
Maths B, may need extra support to persist to the end of the course.
Chapter 8 Implications
8.1 Introduction
This chapter concludes the thesis with the implications of this research. Section 8.2
considers the key findings within the research and in so doing summarises the work.
Some aspects of the extent and limits of the study are discussed in Section 8.2.1.
Section 8.3 discusses implications of the work for teaching, assessment and advising
students. The thesis concludes with Section 8.4 which considers possibilities for
further research.
8.2 Implications within the Research
The focus on the nature and promotion of statistical literacy, reasoning and thinking
in the statistics education literature, and on the differences between statistics and
mathematics amongst some members of the statistical community, has at times
resulted in the devaluing of the role of mathematics and mathematical thinking in the
development of statistical reasoning. In contrast to this, there is a general
acknowledgement that a lack of capability with underlying mathematics interferes
with statistical learning (Ben-Zvi and Garfield, 2004). However, there has been a
paucity of research underpinning this “truth” particularly amongst students in
programs associated with science. The major implication of this work, which
investigates statistical thinking at the interface of secondary and tertiary education
and the factors which impact upon it, is that basic numeracy skill is a highly important
predictor of statistical reasoning and that this relationship extends to areas which are
not overtly mathematical in nature. In particular, this has been demonstrated through
the significance of numeracy in predicting statistical reasoning for students entering
an introductory data analysis subject in a program associated with science.
In Chapter 7 of this thesis, it has been shown that this significance is such that
numeracy dominates the prediction of statistical reasoning in nearly all models,
whether or not allowance is made for the students’ tertiary entrance scores.
Furthermore, when the Numeracy Score is divided into an introductory and an
intermediate score, both of these together are useful in predicting statistical
reasoning. The importance of this finding cannot be overstated. While most
researchers in statistical education would acknowledge the importance of
introductory numeracy skills in developing statistical reasoning, some might question
the relevance of the more algebraic skills reflected in the intermediate component of
the Numeracy Questionnaire. However, this importance is clearly demonstrated in
the analyses of Chapter 7. Furthermore, investigation of success or failure in
individual statistical reasoning items indicates that the strength of the link between
statistical reasoning and numeracy is not dependent upon the difficulty of the item or
on the degree of mathematical ability apparent in the item. As is often the case with
mathematics within other disciplines, the generic skills of mathematics are at least as
important as the specific skills.
The SRA was developed to assess statistical reasoning at the secondary/tertiary
interface and to determine the success of a new high school statistics course in the
US (Konold, 1990; Garfield, 1991, 2003). Our belief that some aspects of the SRA,
particularly an emphasis on combinatorial reasoning, make it inappropriate for use in
totality in the current Australian context, has led us to develop a new Statistical
Reasoning Questionnaire. This SRQ draws on the SRA and on substantial work of
Watson (Watson and Callingham, 2003; Watson, et al., 2003) in the Australian
primary and junior secondary school context.
Tests of validity and reliability of the SRA (Garfield, 1998) and low correlations
with course outcomes (Garfield, 2003) have led to questioning over its use as a single
measure of statistical reasoning and instead to a reliance on a number of component
scales of correct reasoning and misconceptions (Tempelaar, 2004). However,
Watson and Callingham’s (2003) application of a Rasch partial credit model
demonstrates that statistical literacy is a one-dimensional hierarchical construct. The
implication of this is that it can validly be described by a single measure. In Chapter
6 of this thesis, a Rasch partial credit model is applied to the SRQ. The fit of the
model supports Watson and Callingham’s findings and indicates that levels of
reasoning can be described and are demonstrated by students at the
secondary/tertiary interface in a manner similar to Watson and Callingham’s
description at the upper primary and junior secondary level.
The consistent and symmetric distribution of results of the SRQ, together with the
results of the Rasch analyses, provide evidence in Chapters 5 and 6 that the SRQ is
an appropriate tool for use in measuring general statistical reasoning at the
secondary/tertiary interface in the Australian context, while leaving room for future
development of items that reach into the higher levels of statistical reasoning
demonstrated by some students at this level.
The SRQPC Score, based on the results of the Rasch partial credit analysis, is
developed in Chapter 6 as a new approach to scoring a statistical reasoning
questionnaire. In calculating the SRQPC Score, the level of understanding
demonstrated by an item response, as defined through the Rasch partial credit
analysis, is used as the score for that item. By incorporating students’ levels of
understanding, this score allows for a more meaningful description of the range of
student reasoning. The significance of this score rather than the dichotomously-
based SRQ Score in analysis of the end of semester exam component of assessment
emphasises the fact that this scoring technique is a useful development in itself.
The Rasch partial credit model has also been used to estimate the expected responses
of students to individual items on the SRQ which have then been compared for
students with different mathematical backgrounds. Combining this information with
the relationship between numeracy and success in individual items of the SRQ
brings to light, in particular, the difficulty which students with poorer mathematical
skills experience in successfully applying proportional reasoning in a statistical
context.
In Chapter 7 the end of semester section of the course assessment is used as a
representation of individual learning in the course. Although worth 50 to 60% of the
overall assessment, performance in the exam provides a measure of overall learning
in the course, including practical and project work. Tertiary entrance score and the
newly introduced SRQPC Score are both highly significant in explaining exam
scores. While, in the presence of these predictors, numeracy is not helpful in
explaining exam score for those who sat the exam, those who did not sit the exam
tend to demonstrate lower numeracy than those who did. Also, students without the
core senior algebra and calculus based mathematics course, Maths B, tend to be more
likely not to sit the exam. The implications of this for teaching at the
secondary/tertiary interface are raised in Section 8.3.
In development of the Statistical Reasoning Assessment, Garfield (2003) comments
that the correlations between SRA score and a range of course outcomes were
“extremely low”, and concludes that statistical reasoning and performance in an
introductory statistics course are “unrelated”. However, in this study, all two-way
correlations between the variables: Exam, SRQ, SRQPC, Intro-Numeracy, Inter-
Numeracy, Self-efficacy and OP, were highly significant (p<0.001) with the
correlations between Exam and each of the statistical reasoning scores, and between
Exam and each of the Numeracy Scores falling between 0.2 and 0.3, and the
correlation between Exam and OP being -0.47. (Recall that numerically smaller OP
scores indicate a better performance.) Considering two-way correlations in
investigating student performance data, particularly at tertiary level, requires caution
with a number of aspects. Correlations, even partial correlations, cannot handle the
complexity of the bigger picture. The general linear modelling of Chapter 7
investigates all available variables simultaneously, and indicates that the SRQPC
Score helps to explain the exam mark after allowing for the tertiary entrance score
(assumed to be measuring general ability and application). As with R-squared in
general linear modelling, the actual sizes of correlations depend on the amount of
natural or inherent variation present in the variables under consideration. All tertiary
level educational data involve great inherent variation due to the multiplicity of
issues involved as well as characteristics and backgrounds of individuals, particularly
in large introductory statistics classes. The focus of this research lies not in
prediction or analysis of the many and varied individuals who undertake introductory
tertiary statistics courses. Rather it is in measuring, unpacking and understanding the
statistical reasoning and numeracy which students bring to such courses (in areas
associated with science), and in investigating the inter-related links between these, all
background variables available and student performance.
Chapter 8 Implications
187
The Numeracy Questionnaire, described and analysed in Chapter 4, has been shown
to be an appropriate tool for measuring pre-calculus skills relevant to an introductory
data analysis course at the secondary/tertiary interface. Performances in this
questionnaire demonstrate that many students struggle with pre-calculus skills,
despite having undertaken Maths B. The results of general linear modelling of
Numeracy Scores indicate the significance of undertaking mathematics beyond
Maths B, as well as performance in Maths B, self-efficacy and gender in predicting
Numeracy Score. Logistic regression indicates that success with inequalities and
application of fractions are particularly dependent on having studied beyond Maths
B. The inability to apply fractions is especially relevant to the development of
statistical thinking, where it can impede full and complete proportional reasoning, as
indicated in the analysis of the SRQ results. It is important to note that these pre-
calculus skills are supposed to be developed before senior secondary schooling. The
implications are that the development is so insufficient that without Maths B students
have very little numeracy skill and that even more than Maths B is required to ensure
complete confidence with essential basic numeracy.
Results of the Attitudinal Surveys, described in Chapter 3, indicate that the students
on whom this research is based arrive at the secondary/tertiary interface with
generally positive attitudes towards statistics, convinced of the links that exist
between mathematics and statistics and confident of their own ability to succeed in
the area. Many believe, however, that the study of statistics at school has been
beneficial only in the final two years of high-school. By investigating the effect of a
variety of attitudinal components on statistical reasoning in Chapter 7, only the self-
efficacy component has been shown to be influential. Self-efficacy is a useful
predictor of numeracy allowing for all of gender, previous study of higher level
mathematics, Maths B result, tertiary entrance score and classification as a
mathematics student; and also of statistical reasoning allowing for numeracy and
gender. These models indicate that even after taking account of background and
ability factors, a student’s confidence in their own ability to succeed in mathematics
and statistics is positively related to their ability to do so. However, there are also
indications that students discover at tertiary level that statistics is more complex than
they had thought.
8.2.1 Extent and limits of the study
This study has been conducted in the context of students at the secondary/tertiary
interface enrolled in programs associated with science. To derive implications of any
study beyond its immediate context requires sufficient information for benchmarking
across contexts. Although the cohorts of this study are enrolled in degree programs
broadly associated with science, the background information demonstrates the wide
diversity of students in the cohorts with respect to a range of variables. One
classification of the students, for example, is into maths and non-maths students, but
it must be remembered that this refers to choice of course and not just (or even
necessarily!) mathematical ability. Mathematically-capable students are found
across all tertiary programs including humanities, law and social sciences.
Correspondingly there are significant numbers of students in science programs who
are mathematically-averse and highly apprehensive about statistics. Hence, with due
care, the main findings of the study, described in Section 8.2, may be applicable in
more general settings.
The numeracy questionnaire reflects relevant aspects of the pre-senior curricula
across Australia, but the questions themselves and the analysis of the questionnaire
provide excellent benchmarking references for other contexts. The development of
the statistical reasoning questionnaire drew upon international studies at the
school/tertiary interface in the US and across school levels in Australia, plus
considerations of particular relevance to the Australian school context. Decisions
about inclusion of questions common to these international studies were based on the
balance of value of the questions themselves and of providing commonality in
national and international comparisons. Within Australia, although there are some
differences in the approach to statistics within the senior school curricula, most
notably between New South Wales11 and the other states, the curriculum-specific
questions refer to common curriculum elements. It is felt that the effectiveness of the
Attitudinal Survey is uncertain and that further exploration of attitudes towards
statistics, particularly amongst more mathematically inclined students could be
valuable.
11 Particularly at the senior school level, New South Wales has not yet adopted a data-based approach to statistical concepts within their mathematics syllabi
The general linear modelling used in Chapters 4 and 7 needs to be interpreted in the
manner in which it was intended. Although the term “predictor” is used, and fitted
equations are given, there is no intention that these relationships be used to predict
outcomes for individual student profiles. The amount of variation left unexplained in
the data after models are fitted is indicative of the inherent variation among students
which prevents such prediction. The purpose of the modelling is to investigate and
unpack the relationships between statistical reasoning and numeracy as well as
background variables and student performance. The only instance where the
predictive nature of models may be of use is in highlighting students who achieve
significantly lower results in numeracy or statistical reasoning than their
backgrounds would predict, and in recommending such students for future support.
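The one predictive use mentioned above, highlighting students whose results fall well below what their backgrounds would suggest, can be sketched as a check on standardised residuals. The following is a minimal illustration only, assuming a single background predictor, an ordinary least squares line and an arbitrary cut-off of two residual standard deviations; the function names and the small data set are invented for the example and are not taken from the study.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x for a single predictor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return a, b


def flag_low(x, y, threshold=-2.0):
    """Indices of cases whose standardised residual falls below threshold."""
    a, b = fit_line(x, y)
    residuals = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    # Residuals from a fit with an intercept average zero, so their
    # standard deviation gives a simple scale for comparison.
    scale = pstdev(residuals)
    return [i for i, r in enumerate(residuals) if r / scale < threshold]


# Made-up illustrative values, not data from the study.
background = [1, 2, 3, 4, 5, 6, 7, 8]
numeracy = [10, 12, 13, 15, 8, 18, 20, 21]
print(flag_low(background, numeracy))
```

In this invented example only the fifth student (index 4) is flagged. A fuller version would use all of the background variables in the fitted models, and the cut-off would be a matter of judgement rather than a fixed rule.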
The variation between students also raises the issue of other variables not included in
this study which could be measured and included in the modelling. One of these is
student attendance at classes. Further work on measuring and including this variable
is currently being undertaken.
Due to practical and ethical considerations of students, the study did not include the
development of an instrument to measure general statistical reasoning at the end of
the semester. The end of semester component of assessment is a measure of the
operational knowledge attained by the student throughout the course. It should be
noted that students prepare and take their own summaries into this exam, and that the
time provided is such that all students finish within, mostly well within, the time
given. The analysis of the end of semester component of the assessment is oriented to
investigating the links between the measures provided by the instruments, the
background variables and a measure of knowledge, understanding and application
ability in introductory statistical data analysis. A better measure of the full benefit to
the student of the course would be the overall final result that incorporates the project
mark and continuous assessment. However this result is likely to be more heavily
influenced by other variables not included in this study, such as class attendance and
participation.
8.3 Implications for Teaching, Assessment and Advising
In teaching at the secondary/tertiary interface, educators need to be aware of and
sensitive to the level of numeracy skills which students are likely to possess. This
involves an understanding of the manner in which difficulties are likely to be
exhibited in areas which stem from upper-primary and lower-secondary levels even
amongst students who have completed an algebra and calculus based senior
mathematics subject. This lack of solid grounding is particularly likely to be
demonstrated when basic skills are applied in new contexts and multi-step situations.
Already many universities and departments within universities provide support for
students whose basic quantitative skills do not meet the needs of their course. This
research reinforces the need for such support mechanisms and demonstrates their
requirement regardless of whether or not the students in question officially possess
the formal prerequisites. Clearly demonstrated in this research is the role of further
study in reinforcing basic skills. Educators at the secondary/tertiary interface, while
desiring to move students on to applications and higher levels of understanding, may
well benefit their students by intentionally providing opportunities to reinforce basic
skills in new contexts rather than avoiding aspects which are known to cause
problems.
This research brings into question the tendency in universities to remove formal
mathematical prerequisites. Pressure on high-school curricula and the lack of
appreciation in the general community of the generic skills developed through
mathematics result in fewer students undertaking algebra and calculus based
mathematics courses when they are not formally required for further study. This
research indicates that such students are in danger of not developing even their basic
numeracy skills due to lack of reinforcement.
This tendency to remove an algebra and calculus based formal prerequisite is
sometimes supported by tertiary educators who, on experiencing the lack of ability
by some who have studied such a course, deem it to have been of no use. What these
educators are failing to recognise is that firstly the problems are often founded in
inadequate grounding in earlier years, and secondly the only way to maximise
students’ abilities is to ensure they experience as much reinforcement as possible
through the highest level of mathematics they are capable of studying.
Those who advise students also need to be aware of the impact of studying
mathematics and higher level mathematics in particular. The improvement in basic
numeracy skills, ability to handle multi-step problems, and the transferability of
skills across contexts, reaches well beyond the specific content of any particular
mathematics course. This research suggests that these capabilities are all maximised
by studying more mathematics.
This research has implications for the teaching of introductory statistics in a course
broadly associated with science at the secondary/tertiary interface. Given the
strength of the relationship between statistical reasoning and numeracy, the clear
recognition of this link in students’ minds, and the students’ general confidence in
their own ability in mathematics and statistics, it is reasonable to expect that
attempting to teach an introductory data analysis course in this context, without
sufficient acknowledgement of its mathematical links, is likely to be counter-
productive. A better approach is to acknowledge and build on a simple but distinctly
quantitative foundation.
Course outcomes in this study imply that students who enter the course without the
mathematical background of an algebra and calculus based course are no less likely
than others to succeed provided they persist until the end of the course. This
observation, however, is limited to those who do persist and therefore should not be
taken as reason to encourage more students of lesser mathematical background to
undertake the course. The higher attrition rate of these students, the observation of
the difficulties which they have with all areas of the course and the advantages of
consolidation, clearly imply that wherever possible such students should be
encouraged to obtain an algebra and calculus based grounding before encountering
introductory data analysis.
In some sectors of the statistical community, there has been an argument put forward
to move the teaching of statistics from the school mathematics curricula into another
discipline area, such as geography, or into a new discipline area of its own. This
discussion is generally based on the perceived inability of school mathematics
teachers to teach statistical thinking. These statisticians have succumbed to the
danger of focusing on the differences between mathematics and statistics rather than
their commonalities. This research indicates the folly of such an approach as it
would result in the teaching of statistics at school level becoming de-quantified,
contrary to the relationship indicated in this research. Also the forced choice these
students would undoubtedly have to make, away from any higher level mathematics
and often any core algebra and calculus based mathematics, would deprive them of
any opportunity to consolidate basic numeracy skills. While inadequacies with
current school curricula and their teaching are acknowledged, and indicated in this
research through the fact that a majority of students felt statistics at school level had
been beneficial only in the final two years, such an approach is unlikely to have a
productive impact on the statistical thinking of students at the secondary/tertiary
interface. A better approach rests in increased and improved teaching of statistical
thinking to current and prospective school teachers. The statistical community
would do better to focus on supporting mathematics teachers in their professional
development, rather than encouraging the artificial removal of statistics from their
discipline.
The effectiveness of the tools used in this research has implications for assessment of
students at the secondary/tertiary interface. Conflict between the need for task-
specificity and familiarity to students has led in the past to doubt over the validity of
measures of self-efficacy in statistics (Finney and Schraw, 2003). The significance
of self-efficacy in predicting numeracy and statistical reasoning in this study suggests
that the tool which has been used here is a valid measure of self-efficacy in
mathematics and statistics at this level. This measure consists of only five items
which were considered both relevant to the course and within the students’ previous
experience. Its usefulness suggests that if desired, the development of a more
detailed measure would not be difficult, but that even this simple tool can be used to
glean useful information. Furthermore, the lack of significance of other components
of student attitude indicates that questionnaires aimed at measuring attitudes towards
statistics could be made more informative by focussing specifically on self-efficacy.
Refining and adding items which measure this dimension, and seeking to understand its
impact on performance, is likely to be more productive than developing measures of
general attitudes.
The work of Watson and Callingham (2003) indicates that it is valid to measure
statistical reasoning of school students on a single hierarchical scale without
resorting to a collection of subscales. This study supports the use of such a scale at
the secondary/tertiary interface in the form of the Statistical Reasoning
Questionnaire. While there is evidence that items reaching into higher levels of
statistical reasoning should be developed to measure the full range of ability, the
current collection of items is appropriate as a screening tool at this level.
Furthermore, summarising the results of the Statistical Reasoning Questionnaire in
the form of the SRQPC Score incorporates students’ levels of understanding and
allows for a more meaningful expression of the range of statistical ability.
8.4 Implications for Future Research
The main emphasis of future research indicated by this work is in the area of the
assessment of statistical reasoning. Although much work has been done in this area,
it is felt that there is substantial room for the development of further items
appropriate for use at the secondary/tertiary interface. The analysis of the SRQ in
this thesis indicates that more items could be developed which reach into the higher
levels demonstrated by some students in this study. A particular challenge is the
problem of how to formulate such items without relying on specific terminology or
mathematical ability. In fact, whether it is possible or desirable to do so is perhaps
doubtful, given that understanding statistical terminology is an aspect of statistical
reasoning, and given the relationship that exists between statistical reasoning and
numeracy. A better approach may be to focus on developing items which incorporate
these aspects separately.
A further challenge is the assessment of general statistical reasoning at the
conclusion of an introductory data analysis course. This issue involves both the
construction of an appropriate instrument and its administration to students. Given
the students’ common background from the course, items for such an instrument
would be able to incorporate more statistical terminology, as well as applications of
specific techniques. Although it has been done in other studies, the use of such a
research instrument at the end of semester was seen as counterproductive for students
in this study. Hence the administration of an instrument may be more difficult than
its design. It could also be argued that the real measure of course success is the level
of statistical reasoning and use of statistics with which students operate at some time
after completion of the course. A study which surveyed students’ reasoning some
months after conclusion of the introductory data analysis course could provide more
information, but would be difficult to administer and would incur severe disadvantages
in terms of response rates and possible response bias.
The Rasch analyses used in this study have shown that, at the point of entry to
tertiary education, statistical reasoning appears to be a single construct. In
developing an instrument to measure statistical reasoning at the end of an
introductory course and beyond, it would be interesting to determine whether
statistical reasoning could still be considered as a single construct given the broader
range of topic areas which would need to be included.
In developing the SRQPC Score the results of the Rasch partial credit model were
used. Another approach to using this model, which has been used by Reading
(2002), is to analyse students’ ability estimates on the logistic scale. These scores
could be modelled in a similar way to the SRQPC Scores. There are also
possibilities for future work in synthesising the information obtained from the
students in this study with Reading’s development of a profile of statistical
understanding.
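For readers less familiar with the partial credit model referred to above, the sketch below computes the category probabilities and expected score for a single item, given a student's ability estimate on the logit scale. It uses the standard form of the Rasch partial credit model; the function names and the step difficulties are hypothetical, and the sketch is not a description of how the SRQPC Score itself was derived.

```python
import math


def pcm_probabilities(theta, deltas):
    """Category probabilities under the Rasch partial credit model.

    theta  : person ability on the logit scale
    deltas : step difficulties delta_1..delta_m for one item,
             giving m + 1 score categories (0..m)
    """
    # Cumulative sums of (theta - delta_j); the score-0 category has sum 0.
    cumulative = [0.0]
    for delta in deltas:
        cumulative.append(cumulative[-1] + (theta - delta))
    # Subtract the maximum before exponentiating for numerical stability.
    top = max(cumulative)
    weights = [math.exp(c - top) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_score(theta, deltas):
    """Expected item score for a person of ability theta."""
    probs = pcm_probabilities(theta, deltas)
    return sum(k * p for k, p in enumerate(probs))


# Hypothetical step difficulties for an item scored 0, 1 or 2.
steps = [-0.5, 1.0]
for theta in (-2.0, 0.0, 2.0):
    print(theta, [round(p, 3) for p in pcm_probabilities(theta, steps)])
```

With a single step difficulty the function reduces to the dichotomous Rasch model, so a student whose ability equals the step difficulty has probability 0.5 of scoring 1.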
A separate idea to pursue is in the area of students’ attitudes. The discrepancy which
was discussed in Section 3.4 between the students’ approval of the course and their
apparent decline in attitudes towards statistics, raises questions about the perspective
with which students arrive at tertiary level. Informal discussion with students
suggests that it is only as the course proceeds that they begin to appreciate the
breadth and depth of the discipline.
It is to the advantage of us all if more of the best and brightest come into statistics
education systems and leave them better educated, with better developed thinking skills,
and a greater appreciation for the power of statistics. (Wild, 2006)
Further research into students’ initial and developing attitudes and how to encourage
their appreciation of the field may assist in encouraging good students into statistics,
a goal which is worth pursuing.
“When I grow up I want to be a statistician!” It would be nice but it just doesn’t
happen, does it? The most critically important element for the future health of our
discipline is an ample supply of very bright young minds entering it. (Wild, 2006)
Appendix A. Background Information Survey
Survey of Students – Semester 1 2005 MAB101 Statistical Data Analysis 1
Name (underline surname): …………………………………………………………
Title (Mr, Ms, Miss, Mrs, other): ………….
Student Number: ……..…………
Course Code: ……………………..
Major: ………………......………
(Education students) Discipline X: ………….… Discipline Y: ………...…………
Schooling completed: grade 12 / up to grade 11 only / up to grade 10 only
Year in which you finished school: …….
Where: Qld / NSW / Other Aust. state …… / Overseas
School: ………………………………………………....…………………………
School results: OP or Tertiary Entrance Rank: ……….
Maths subjects studied in grade 12:
Qld Maths A Result ………
Qld Maths B Result ………
Qld Maths C Result ………
Other: ……............................ Result ………
………………............ Result ………
Maths subjects studied in grade 11:
Qld Maths A Result ………
Qld Maths B Result ………
Qld Maths C Result ………
………........……… Result ………
……………........… Result ………
or Alternative Entry:
MAB105 at QUT Result ………
QUT Maths bridging course Result ………
Other ……………….……….. Result ………
Is this your 1st semester at QUT? yes / no
Maths subjects studied at QUT to date (if applicable): …………..…….. …………..……..
Have you studied other maths or statistics subjects at any tertiary institution?
Maths subjects studied: ……………………………… Result: …….. Year: ……....
Appendix B. 2004 Attitudinal Survey
Survey of Attitudes About Statistics
DIRECTIONS: The questions below are designed to research your attitudes and beliefs about statistics. Read each statement carefully and choose your response from A (strongly disagree) through to E (strongly agree). If your response is “don’t know”, choose C for Neutral. Please mark your response on the mark sheet provided. To facilitate data matching, please record your student number on the mark sheet and (should you use it) on the reverse side of this page. Names should not be included. If you wish to elaborate on any of your responses, then there is space over the page to do so. (For example, you may wish to refer to various stages of education.)
Response scale: A = Strongly Disagree, B = Disagree, C = Neutral, D = Agree, E = Strongly Agree
1. Statistics is boring. A B C D E
2. Statistics can be used to justify almost anything. A B C D E
3. I find statistics easy. A B C D E
4. Statistics will be valuable in my chosen career. A B C D E
5. I don’t like statistics because there never seems to be a right or wrong answer. A B C D E
6. I feel insecure when I have to do a statistics problem. A B C D E
7. Statistics is a complicated subject. A B C D E
8. Statistical skills will make me more employable. A B C D E
9. I use statistics in my everyday life. A B C D E
10. I would do better at statistics if I were better at maths. A B C D E
11. I want to learn more statistics. A B C D E
12. Understanding statistics is important in modern society. A B C D E
13. If you are good at maths you are more likely to understand basic statistical concepts. A B C D E
14. I am taking this statistics unit only because I have to. A B C D E
15. I am good at maths. A B C D E
16. I expect to do well in this unit. A B C D E
17. I am not confident of my ability to read and interpret information presented graphically. A B C D E
18. I expect to be able to do the computing necessary for this unit. A B C D E
19. I expect to have trouble determining which procedure to use to answer questions. A B C D E
This side is only for optional comments.
Student Number:
You may wish to explain or elaborate on some of your responses. Use the space provided below to do so for as many or as few statements as you wish.
1. Statistics is boring.
2. Statistics can be used to justify almost anything.
3. I find statistics easy.
4. Statistics will be valuable in my chosen career.
5. I don’t like statistics because there never seems to be a right or wrong answer.
6. I feel insecure when I have to do a statistics problem.
7. Statistics is a complicated subject.
8. Statistical skills will make me more employable.
9. I use statistics in my everyday life.
10. I would do better at statistics if I were better at maths.
11. I want to learn more statistics.
12. Understanding statistics is important in modern society.
13. If you are good at maths you are more likely to understand basic statistical concepts.
14. I am taking this statistics unit only because I have to.
15. I am good at maths.
16. I expect to do well in this unit.
17. I am not confident of my ability to read and interpret information presented graphically.
18. I expect to be able to do the computing necessary for this unit.
19. I expect to have trouble determining which procedure to use to answer questions.
Appendix C. 2004 Follow-up Attitudinal Survey
Survey of Attitudes About Statistics
DIRECTIONS: The questions below are designed to research your attitudes and beliefs about statistics. Read each statement carefully and choose your response from A (strongly disagree) through to E (strongly agree). If your response is “don’t know”, choose C for Neutral. Please mark your response on the mark sheet provided. To facilitate data matching, please record your student number on the mark sheet. Names should not be included.
Response scale: A = Strongly Disagree, B = Disagree, C = Neutral, D = Agree, E = Strongly Agree
1. Statistics is boring. A B C D E
2. Statistics can be used to justify almost anything. A B C D E
3. I find statistics easy. A B C D E
4. Statistics will be valuable in my chosen career. A B C D E
5. I don’t like statistics because there never seems to be a right or wrong answer. A B C D E
6. I feel insecure when I have to do a statistics problem. A B C D E
7. Statistics is a complicated subject. A B C D E
8. Statistical skills will make me more employable. A B C D E
9. I use statistics in my everyday life. A B C D E
10. I would do better at statistics if I were better at maths. A B C D E
11. I want to learn more statistics. A B C D E
12. Understanding statistics is important in modern society. A B C D E
13. If you are good at maths you are more likely to understand basic statistical concepts. A B C D E
14. I am taking this statistics unit only because I have to. A B C D E
15. I am good at maths. A B C D E
16. I expect to do well in this unit. A B C D E
17. I am not confident of my ability to read and interpret information presented graphically. A B C D E
18. I have been able to do the computing necessary for this unit. A B C D E
19. When I use statistics in the future, I expect that I will have trouble determining which procedure to use. A B C D E
Appendix D. 2005 Attitudinal Survey
Your feelings about Statistics
Complete this statement: When I think of probability and statistics at school, I think of
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
Did you find probability and statistics (sometimes called chance and data) beneficial
in grades 11&12 yes/no
in grades 8 to 10 yes/no
in grades 4 to 7 yes/no
What did you or didn’t you find beneficial about it?
___________________________________________________________________
___________________________________________________________________
___________________________________________________________________
Please respond to the following questions on the computer mark sheet provided. Read each statement carefully and choose your response from A (strongly agree) through to E (strongly disagree). If your response is “don’t know”, choose C for Neutral. To facilitate data matching, please record your student number on the mark sheet.
Response scale: A = Strongly Agree, B = Agree, C = Neutral, D = Disagree, E = Strongly Disagree
1. I am taking this statistics unit only because I have to. A B C D E
2. I am good at maths. A B C D E
3. I expect to do well in this unit. A B C D E
4. I am not confident of my ability to read and interpret information presented graphically. A B C D E
5. I expect to be able to do the computing necessary for this unit. A B C D E
6. I expect to have trouble determining which procedure to use to answer questions. A B C D E
Appendix E. Numeracy Questionnaire
For each question, choose the correct answer and mark it on the answer sheet provided.
Please record your student number to facilitate data matching. Names should not be recorded.
Do not use calculators.
1. Written as a fraction in its simplest form, 20% is equal to:
(a) $\frac{1}{20}$ (d) $\frac{2}{5}$
(b) $\frac{20}{100}$ (e) $\frac{1}{2}$
(c) $\frac{1}{5}$
2. Written as a fraction in its simplest form, 0.1% is equal to:
(a) $\frac{1}{1000}$ (d) $\frac{10}{100}$
(b) $\frac{1}{100}$ (e) $\frac{9}{10}$
(c) $\frac{1}{10}$
3. Written as a percentage correct to 1 decimal place, the fraction $\frac{1}{6}$ is equal to:
(a) 6.0% (d) 60.0%
(b) 12.5% (e) 66.7%
(c) 16.7%
4. A class consists of 80 males and 120 females. A non-compulsory excursion is attended by 20% of the male students and 30% of the females. The percentage of the class which attends the excursion is:
(a) 25% (d) 52%
(b) 26% (e) 56%
(c) 28%
5. Possible subject grades at a particular institution are 1 to 7, with 7 being the highest. In a particular class of 200, 5% of students were given a 1 or a 2.
The number of students receiving a 1 or a 2 was:
(a) 5 (d) 57
(b) 10 (e) 100
(c) 40
6. In the same class (as in question 5), 85% of students were awarded a grade
from 3 to 6. The number of students receiving a grade of 7 was:
(a) 10 (d) 30
(b) 15 (e) 170
(c) 20
7. A group of 340 students must be divided into lab classes with a maximum of
30 students in each. The smallest number of lab classes needed is:
(a) 10 (d) 12
(b) 11 (e) 13
(c) 11.3
8. 0.66 + 0.55 is equal to:
(a) 0.1111 (d) 1.21
(b) 0.121 (e) 12.1
(c) 1.111
9. $\frac{1}{6} + \frac{1}{5}$ is equal to:
(a) $\frac{2}{30}$ (d) $\frac{11}{30}$
(b) $\frac{1}{11}$ (e) $\frac{5}{6}$
(c) $\frac{2}{11}$
10. $\frac{1}{4} + \frac{2}{3} + \frac{5}{6}$ is equal to:
(a) $\frac{8}{13}$ (d) $1\frac{5}{12}$
(b) $\frac{13}{72}$ (e) $1\frac{3}{4}$
(c) $\frac{35}{72}$
11. $\frac{5 \times 4^2 + 9 \times 2^2}{4}$ is equal to:
(a) 29 (d) 89
(b) 41 (e) 181
(c) 56
12. 41
91
+ is equal to:
(a) 51
(d) 613
(b) 131
(e) 65
(c) 52
13. $20 + 30 \div 5 \times 2$ is equal to:
(a) 5 (d) 32
(b) 20 (e) 52
(c) 23
14. When $n = 25$, $m = 13$, $s = 2$, $t = 4$, the expression $\sqrt{\frac{(n-1)s^2 + (m-1)t^2}{n+m-2}}$ (correct to 2 decimal places) is equal to:
(a) 2.83 (d) 6.93
(b) 3.94 (e) 11.31
(c) 6.00
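15. Given that $p_1 = \frac{x_1}{n_1}$, $p_2 = \frac{x_2}{n_2}$ and $p = \frac{x_1 + x_2}{n_1 + n_2}$, the value of $\frac{p_1 - p_2}{p(1-p)}$, when $x_1 = 10$, $x_2 = 15$, $n_1 = 25$ and $n_2 = 75$, is given by:
(a) $\frac{1}{4}$ (d) $\frac{16}{15}$
(b) $\frac{4}{5}$ (e) $\frac{108}{77}$
(c) $\frac{5}{6}$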
16. Which of the following sets of values is correctly ordered from smallest to largest?
(a) -0.05, $-\frac{1}{5}$, 0.05, 0.55, 0.5
(b) -0.05, $-\frac{1}{5}$, 0.05, 0.5, 0.55
(c) $-\frac{1}{5}$, -0.05, 0.05, 0.5, 0.55
(d) -0.05, 0.05, $-\frac{1}{5}$, 0.5, 0.55
(e) $-\frac{1}{5}$, -0.05, 0.05, 0.55, 0.5
17. Which of the following sets of values is correctly ordered from smallest to largest?
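(a) $\frac{13}{20}$, $\frac{8}{7}$, $\frac{4}{6}$, $\frac{3}{5}$, $\frac{1}{3}$
(b) $\frac{8}{7}$, $\frac{4}{6}$, $\frac{13}{20}$, $\frac{3}{5}$, $\frac{1}{3}$
(c) $\frac{1}{3}$, $\frac{3}{5}$, $\frac{4}{6}$, $\frac{8}{7}$, $\frac{13}{20}$
(d) $\frac{1}{3}$, $\frac{3}{5}$, $\frac{13}{20}$, $\frac{4}{6}$, $\frac{8}{7}$
(e) $\frac{1}{3}$, $\frac{13}{20}$, $\frac{4}{6}$, $\frac{3}{5}$, $\frac{8}{7}$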
18. The solution for x to the equation: cxb
a=
− 1 is given by:
(a) cb
ax
1−= (d)
1+=
cba
x
(b) 1−
=abc
x (e) c
ba
x−
=1
(c) bc
ax
1−=
19. The solution to the inequality: $2a + 5 < 12$ is:
(a) $a > -\frac{7}{2}$ (d) $a < 1$
(b) $a < \frac{7}{2}$ (e) $a = \frac{7}{2}$
(c) $a = 3$
20. The solution to the inequality: 4148
<a
is:
(a) 121
<a (d) 12>a
(b) 12<a (e) 192>a
(c) 192<a
21. The solution to the pair of inequalities $a + 7 < 16$ and $3a > 6$ is:
(a) $a = 3.75$ (d) $a < 9$
(b) $a = 8$ (e) $2 < a < 9$
(c) $2 > a < 9$
Appendix F. Statistical Reasoning Questionnaire12
SRQ_1 To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true:
a) Half of the families in the town have more than 2 children.
b) More families in the town have 3 children than have 2 children.
c) There are a total of 110 children in the town.
d) There are 2.2 children in the town for every adult.
e) The most common number of children in a family is 2.
f) None of the above.
SRQ_2 A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below.
6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3
The true weight could be estimated in several ways.
How would you estimate it?
SRQ_3 Box A and Box B are filled with red and blue marbles as follows.
Box A: 12 red, 8 blue
Box B: 30 red, 20 blue
Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?
a) Box A (with 12 red and 8 blue).
b) Box B (with 30 red and 20 blue).
c) It doesn’t matter which box is chosen.
[12] The Statistical Reasoning Questionnaire given here combines items over two years. SRQ_21 and SRQ_22 were not in the 2004 version, and SRQ_4 was not in 2005.
SRQ_4 A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?
a) Don’t use the medicine on your skin – there’s a good chance of getting a rash.
b) For application to the skin, apply only 15% of the recommended dose.
c) If you get a rash, it will probably involve only 15% of the skin.
d) About 15 out of every 100 people who use this medicine get a rash.
e) There is hardly any chance of getting a rash using this medicine.
SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of the following is least likely:
a) The man is over 55.
b) The man has had a heart attack.
c) The man is over 55 and has had a heart attack.
SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose?
a) Heads
b) Tails
c) It doesn’t matter
What is the probability of getting heads on this 5th toss?______
What is the probability of getting tails on this 5th toss?______
Note: assume only fair coins are used in tosses at cricket matches!
SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. She wants whichever car will break down the least. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs, so he sold it. He said he’d never buy another Honda. Which car should Mrs Jones buy?
a) Mrs Jones should buy the Toyota, because her friend had so much trouble with his Honda, while the other friends had no trouble with their Toyotas.
b) She should buy the Honda, because the information about break-downs in Consumer Reports is based on many cases, not just one or two cases.
c) It doesn’t matter which car she buys. Whichever type she gets, she could still be unlucky and get stuck with a particular car that would need a lot of repairs.
SRQ_8 Is there a problem with the following pie chart? If so, identify the problem.
SRQ_9 A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.
SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days (that is, those days for which the forecast was stated as a 70% chance of rain), they compared the forecast with records of whether or not it actually rained.
The forecast of 70% chance of rain can be considered very accurate if it rained on:
a) 95% - 100% of those days
b) 85% - 94% of those days
c) 75% - 84% of those days
d) 65% - 74% of those days
e) 55% - 64% of those days
space for working
SRQ_11 A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results.
Graph 1: [dot plot; horizontal axis YEARS IN TOWN, not to scale, marked only at the values 0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 17, 25, 37]
Graph 2: [dot plot of the same data; horizontal axis YEARS IN TOWN, to scale, marked at 0, 5, 10, 15, 20, 25, 30, 35]
Which of these two graphs (1 or 2) would you recommend the students use and why?
SRQ_12 Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs.
Graph A: [2D pie graph titled "Internet Usage"]
Graph B: [3D pie graph titled "Internet Usage"]
Both graphs show the same data: contact friends 33%, study 22%, entertainment 17%, work 14%, information 6%, other 8%.
Which of the graphs, A or B, would you recommend the students use and why?
SRQ_13 Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?
a) Hospital A (with an average of 50 births a day).
b) Hospital B (with an average of 10 births a day).
c) The two hospitals are equally likely to record such an event.
SRQ_14 A local newspaper published the following article:
Family car is killing us, says researcher
Twenty years of research has convinced Mr Robinson that motoring is a health hazard. Studying figures from the Australian Bureau of Statistics, Mr Robinson has produced graphs which show quite dramatically that as the numbers of new vehicle registrations increase, so have the numbers of deaths due to heart-related causes.
Do you agree with Mr Robinson’s findings? (Please explain your response.)
SRQ_15 A music teacher was pleased to read the following article in a professional journal:
Instrumental Lessons Improve OPs.
Research has shown that learning to play a musical instrument during primary school increases a child’s chance of attaining a good OP. In a longitudinal study, students enrolled in primary schools across Queensland since 1990 have been followed until they graduated from high school. Of those students who were involved in an instrument program, 20% obtained an OP of 5 or better, while the figure was 15% for those not involved in instrumental music.
Do you agree with these research findings? (Please explain your response.)
SRQ_16 A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups.
Age group   Percentage of users
0-10        5
11-20       20
21-30       40
31-40       15
41-50       8
51-60       5
61-70       5
71+         2
One typical summer weekend, 100 people were observed crossing the bridge.
Which of the data sets given below would cause you to question the information in the brochure?
a) Set A only.
b) Set B only.
c) Set C only.
d) Set A and B only.
e) Set A and C only.
f) Set B and C only.
g) Set A, B and C.
h) None of A, B or C.
            Percentage of users
Age group   Set A   Set B   Set C
0-10        5       10      7
11-20       13      20      19
21-30       32      45      43
31-40       14      10      13
41-50       13      5       10
51-60       12      5       3
61-70       11      3       4
71+         0       2       1
The following data relates to questions 17 to 20
Total rainfall (in millimetres) for the month of November 2003 was collected from 25 weather stations over a region of northern Australia. The data below represent the logarithms (to base 10) of the rainfall. These values have been ordered from smallest to largest for ease of manipulation.
0.00 1.04 1.04 1.18 1.36
1.54 1.56 1.59 1.62 1.62
1.64 1.64 1.71 1.71 1.75
1.76 1.77 1.77 1.80 1.86
1.92 1.95 2.03 2.14 2.41
SRQ_17 What is the median of the above data set (i.e. the median of the log of the rainfalls)?
SRQ_18 What is the lower quartile of the data set?
SRQ_19 Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region.
SRQ_20 What was the highest total rainfall recorded for the month in the region?
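The four rainfall questions involve only order statistics and a back-transformation from the log scale. As an illustration, the sketch below uses Python's standard library. The lower quartile is shown under two common conventions (the 7th observation, and the midpoint of the 6th and 7th), since textbooks differ on this; the variable names are hypothetical.

```python
from statistics import median

# Log10 of total rainfall (mm) at the 25 stations, already ordered.
log_rain = [
    0.00, 1.04, 1.04, 1.18, 1.36,
    1.54, 1.56, 1.59, 1.62, 1.62,
    1.64, 1.64, 1.71, 1.71, 1.75,
    1.76, 1.77, 1.77, 1.80, 1.86,
    1.92, 1.95, 2.03, 2.14, 2.41,
]

# With 25 ordered values the median is the 13th observation.
log_median = median(log_rain)

# Lower quartile under two conventions.
q1_seventh = log_rain[6]                        # 7th observation
q1_midpoint = (log_rain[5] + log_rain[6]) / 2   # "6.5th" observation

# Undo the base-10 logarithm to return to millimetres.
median_rain_mm = 10 ** log_median
max_rain_mm = 10 ** max(log_rain)

print(log_median, q1_seventh, round(median_rain_mm, 1), round(max_rain_mm))
# prints: 1.71 1.56 51.3 257
```

Because the logarithm is an increasing function and the median here is a single observation, back-transforming the median of the logs gives the median rainfall itself.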
SRQ_21 The graph below shows the number of members of the American Mathematical Society.
[Graph headed "Join the ever increasing number of professionals who enjoy the benefits of AMS membership!"; vertical axis: Membership in thousands, with marks 18, 20, 22, 24, 26, 28, 29; horizontal axis: 1987, 1988, 1989, 1990, 1991]
Circle any of the following statements which are correct:
a) Membership of the AMS in 1991 was twice what it was in 1989.
b) Between 1987 and 1991 membership of the AMS doubled every two years.
c) Membership of the AMS could reasonably be expected to reach 50000 by 1992.
d) None of the above statements is correct.
SRQ_22 The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.
[Five graphs, labelled a) to e), each showing the standard normal curve on a horizontal axis from -3 to 3 with a different shaded area]
Appendix G. Responses to the SRQ
Table G gives the responses to the SRQ, pooled across all three cohorts. For
multiple choice questions, the percentage of students responding to each choice is
given. For free answer items, a summary description of responses is given.
Responses are ordered according to decreasing levels of understanding.
Item number Item Response Percentage
SRQ_1 To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true?
c) There are a total of 110 children - only response. 65.7
d) There are 2.2 children for every adult. 0.3
e) The most common number of children in a family is 2. 17.2
f) None of the above. 8.0
Multiple responses. 2.6
a) Half the families have more than 2 children. 2.8
b) More families have 3 children than have 2 children. 1.6
No response. 1.8
SRQ_2 A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below.
6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3
The true weight could be estimated in several ways. How would you estimate it?
Exclude outlier to calculate mean. 37.1
Use median; use median with outlier excluded. 3.4
Uncertain about outlier; calculate mean. 33.4
Mean with max & min excluded. 8.8
Mode. 6.7
Mean of max & min; do it again multiple times & average. 0.6
6; discard outlier but then unclear repeat once only. 1.5
Multiple responses. 4.4
Other; no response. 3.9
SRQ_3 Box A and Box B are filled with red and blue marbles as follows. Box A: 12 red, 8 blue. Box B: 30 red, 20 blue. Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?
c) It doesn’t matter. 81.7
a) Box A. 12.9
b) Box B. 5.2
No response. 0.2
SRQ_4 A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?
d) About 15 out of every 100 people who use this medicine get a rash; or this with either of next 2. 95.7
a) Don’t use the medicine on your skin – there’s a good chance of getting a rash. 1.1
e) There is hardly any chance of getting a rash using this medicine. 2.1
b) For application to the skin, apply only 15% of the recommended dose. 0.3
c) If you get a rash, it will probably involve only 15% of the skin. 0.8
No response. ---
SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of the following is least likely?
c) The man is over 55 and has had a heart attack. 62.2
a) The man is over 55. 20.8
b) The man has had a heart attack. 15.2
No response. 1.8
SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose? What is the probability of getting heads on this 5th toss? What is the probability of getting tails on this 5th toss?
Either – 0.5, 0.5. 76.5
Heads; tails – 0.5, 0.5. 10.5
No choice – 0.5, 0.5. 0.2
Either – any or no prob given. 4.4
Heads; tails – probs add to 1. 3.6
Other. 3.4
No response. 1.5
SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs. Which car should Mrs Jones buy?
b) Buy the Honda, the information in Consumer Reports is based on many cases... 67.4
c) It doesn’t matter which car she buys. She could still be unlucky. 29.9
a) Buy the Toyota, because her friend had so much trouble with his Honda, ... 2.0
No response. 0.7
SRQ_8 Is there a problem with the following pie chart? If so, identify the problem.
Adds to more than 100%. 72.4
Out of proportion. 5.2
“Other” too large; content and heading inconsistent. 8.8
Uncertain re importance of >100%. 0.3
Not enough info. 1.0
Other. 3.1
Style issues; no response. 9.1
SRQ_9 A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.
2000. 55.2
Responses that have calculated 10% then stopped. 0.3
Incorrect responses between 1000 and 3000. 11.1
Responses >3000 or between 600 & 800. 4.6
Responses between 250 and 550. 16.9
No response; <250. 11.9
SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days ..., they compared the forecast with records of whether or not it actually rained. The forecast of 70% chance of rain can be considered very accurate if it rained on:
d) 65% - 74% of those days. 46.4
c) 75% - 84% of those days. 8.0
a) 95% - 100% of those days. 35.9
b) 85% - 94% of those days. 8.3
e) 55% - 64% of those days. 0.5
No response. 0.8
SRQ_11 A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results (Graph 1: not to scale; Graph 2: to scale). Which of the graphs would you recommend the students use and why?
Graph 2 – uniform scale; more accurate; easier to use. 68.6
Graph 2 – clearer; looks better; no reason. 9.8
Graph 2 – grouped; other. 5.4
Graph 1 – any reason; either. 14.9
No response. 1.3
SRQ_12 Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs (Graph A: 2D pie graph; Graph B: 3D pie graph). Which of the graphs would you recommend the students use and why?
Graph A – size of pieces reflects proportions; more accurate; easier to compare pieces; clearer. 57.9
Graph A – can be done by hand; simpler; all the same; no reason. 4.8
Either; Graph B any reason. 35.0
No response. 2.3
SRQ_13 Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?
b) Hospital B. 34.6
c) The two hospitals are equally likely to record such an event. 58.5
a) Hospital A. 5.9
No response. 1.0
SRQ_14 A local newspaper published the following article: (regarding increase in vehicle registrations being linked to heart disease) Do you agree with Mr Robinson’s findings?
Disagree – population increase so need to use rate. 15.6
Disagree – other factors. 29.0
Disagree – correlation doesn’t imply causation; just coincidence; not enough info. 37.6
Agree – large sample size. 1.5
Disagree – other or no reason. 7.8
Agree – explains a possible link; other. 5.1
Agree – no reason. 0.5
No response. 2.9
SRQ_15 A music teacher was pleased to read the following article in a professional journal: (regarding link between OP and music lessons) Do you agree with these research findings?
Disagree – other factors. 18.2
Disagree – correlation doesn’t imply causation; just coincidence; not enough info; want to check data. 9.5
Disagree – not enough difference in %; too much difference in sample sizes; conditional probability confused. 19.0
Agree – large sample size. 7.5
Disagree – other or no reason. 5.6
Agree – statistical info was supplied; explains possible link; other. 25.7
Agree – personal experience; no reason. 7.0
No response. 7.5
SRQ_16 A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups. (table given) One typical summer weekend, 100 people were observed crossing the bridge. Which of the data sets given below would cause you to question the information in the brochure? (A, B, C are in decreasing order of difference from brochure.)
a) A only. 42.2
b) B only. 1.6
c) C only. 2.1
d) A & B only. 5.4
e) A & C only. 1.6
f) B & C only. 2.1
g) A, B & C. 3.3
h) None. 37.6
No response. 4.1
SRQ_17 to 20 Total rainfall (in millimetres) for the month of November 2003 was collected from 25 weather stations over a region of northern Australia. The data below represent the logarithms (to base 10) of the rainfall. These values have been ordered from smallest to largest for ease of manipulation.
SRQ_17 What is the median of the above data set (i.e. the median of the log of the rainfalls)?
Correct (1.71 - 13th obs). 66.8
Data value near median; max/2; middle row; mean of middle row. 9.2
Mean; modes. 3.4
Other. 2.3
No response. 18.3
SRQ_18 What is the lower quartile of the data set?
Correct (7th or 6.5th obs). 14.1
6.25th obs.; posn of quartile given not value. 0.5
Data value near quartile; med/2; max/4; first quarter of data. 26.2
3rd quartile. 0.3
Other. 22.7
No response. 36.2
SRQ_19 Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region.
Correct. 17.0
Incorrect trans of median involving logs, exp or powers of 10. 11.1
Other trans of median. 4.4
Median; trans of mean. 6.4
Other. 7.9
No response. 53.2
SRQ_20 What was the highest total rainfall recorded for the month in the region?
Correct. 16.7
Incorrect trans of max involving logs, exp or powers of 10. 11.1
Max. 39.3
Other. 2.8
No response. 30.1
SRQ_21 The graph below shows the number of members of the American Mathematical Society: (Graph is out of proportion)
None of the above statements is correct. 93.7
Membership of the AMS in 1991 was twice what it was in 1989. 0.4
Between 1987 and 1991 membership of the AMS doubled every two years. 3.0
Membership of the AMS could reasonably be expected to reach 50000 by 1992. 1.3
Multiple responses. 1.7
No response. 0
SRQ_22 The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.
d) Shaded area Z<2. 26.2
c) Shaded area Z<-2. 11.8
a) Shaded area Z>1. 23.2
e) Shaded area Z>2. 5.9
b) Shaded area 0<Z<1. 17.3
No response. 15.6
Table G Student responses to the SRQ
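Three of the items in Table G have answers that follow from a short calculation: the tag-and-recapture estimate in SRQ_9, the comparison of hospitals in SRQ_13 and the standardisation in SRQ_22. The sketch below is one way to check them. It assumes, for SRQ_13, exactly 50 and 10 births on the day and independent births with probability 0.5 of a girl; the function and variable names are hypothetical.

```python
from math import comb
from statistics import NormalDist

# SRQ_9: the tagged proportion in the second sample (25/250) is taken as
# an estimate of the tagged proportion in the dam (200/N).
tagged, sample_size, tagged_in_sample = 200, 250, 25
fish_estimate = tagged * sample_size / tagged_in_sample

# SRQ_13: probability that at least 80% of a day's births are girls.
def prob_at_least_80_percent_girls(births: int) -> float:
    threshold = -(-8 * births // 10)  # smallest whole number >= 0.8 * births
    favourable = sum(comb(births, k) for k in range(threshold, births + 1))
    return favourable / 2**births

hospital_a = prob_at_least_80_percent_girls(50)
hospital_b = prob_at_least_80_percent_girls(10)

# SRQ_22: mean 165 cm and variance 4 cm^2, so the standard deviation is 2 cm.
z = (161 - 165) / 2
p_taller = 1 - NormalDist(mu=165, sigma=2).cdf(161)

print(fish_estimate, hospital_b > hospital_a, z, round(p_taller, 3))
# prints: 2000.0 True -2.0 0.977
```

By symmetry P(Z > -2) equals P(Z < 2), which corresponds to the response listed first for SRQ_22, and the smaller hospital is the more likely to record an extreme proportion.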
Appendix H. Project Description and Criteria
QUEENSLAND UNIVERSITY OF TECHNOLOGY
SCHOOL OF MATHEMATICAL SCIENCES
MAB101 Statistical Data Analysis I
Group Project in Data Collection, Presentation and Analysis
Weight: This project is weighted at 20% of the final mark for MAB101.
Aims: The aims of this project are to bring together ideas from the entire unit, and to enable you to obtain hands-on experience in statistical involvement in experiments or studies, from the beginning of the exercise right through to its conclusion. It will further allow you opportunities to apply your learning in MAB101 to real situations.
Objectives: Through this project, you will get practical experience in statistical data planning, collection and analysis, and it is also an excellent way of helping your understanding and learning in MAB101.
Conditions and Deadlines: The project must be done in groups of 3-4 students. You should aim to organize your groups in the first third of the semester. Groups often form from the practical classes; however, you may combine with students in other practical classes if you wish. Each group should submit a brief informal description (one per group) by email by the end of Week 6 or 7, confirming the names of the students in the group, and describing the context, data to be collected and/or "plan of attack". The description you hand in will assist in your planning, and to help you with feedback – it is not assessable. Make sure you identify in this brief report the variables you are going to collect data on, and the subjects on which data are going to be collected/observed – that is, the names of the columns of your planned spreadsheet/worksheet, and what each row will refer to. Your lecturer will send a comment by return email. Note that this brief informal description carries NO weight towards assessment AT ALL.
Resources: These specifications are available on the MAB101 website at http://olt.qut.edu.au/sci/mab101/ under General Information and under Assessment.
You will be able to look at past MAB101 projects during the support sessions from now on and during practical classes later in the semester. You may also refer to: Practical development of Statistical Understanding: A Project Based Approach by MacGillivray and Hayes. This book is a guide for doing good statistical projects and will also give you some ideas for suitable topics. Note that previous projects are like published materials – neither data nor reports can be copied, and they must be referenced in the usual way if you wish to refer to them. Copies of the guide are available in the Reserve Section of the library. (Note: if not please let us know).
Format: The project is to be presented as a hard copy document. A project cover sheet will be provided on the olt. The reports should not be unnecessarily long - concise relevant reporting can still cover all points of interest and be more informative than too much detail. The raw data should be included on a disk.
Brief outline of project requirements: You are required to identify a context of interest to you, collect relevant data, explore and comment on features of the data, analyse the data using techniques you meet in MAB101, and write a report. You should select and use appropriate statistical tools introduced in MAB101 for some analysis and discussion of the data. The context you choose should be of interest to the group. You must then decide which data to collect and how to collect them. The report should include description of the circumstances and any practical problems encountered in collecting the data. Keep in mind that a reader should be able to either repeat your study or build on it. Don't be too ambitious with your project! If you have difficulties or suffer a brief inspirational vacuum, ask for advice!
The project can be considered as having three components:
• identifying and describing a context and issues of interest; planning and collecting of relevant data; quality of data and discussion of context/problems;
• handling and processing data; summarising, exploring and commenting on features of the data; statistical modelling;
• using statistical tools for statistical analysis and interpretation of the data in the context/issues.
The points below are augmented by questions and discussion in the project manual on the Web http://www.maths.qut.edu.au/MAB893/manual.htm and in the project reference book.
The first component is very important with its emphasis on planning and considering practical aspects of data, obtaining good quality data, and understanding the aspects of data that affect statistical assumptions and analysis. It consists of identifying a context to be investigated; identifying what is of interest; identifying relevant variables; identifying which data to collect and how to collect them; and noting clearly the circumstances and any practical problems in collecting the data, for later interpretation and in case queries arise during subsequent comment on features and analysis.
Include in your report:
• clear identification of the situation/context, its interest to the group, and issues of interest/importance;
• clear identification of planning/observation, including which variables were observed and how.
Before you start collecting data, check whether there may be information you may regret not including.
The second component refers to the handling, processing, exploring and modelling of the data, and preparing it for analysis. Presenting features and information in the data includes use of well-chosen graphs, plots and summaries. Note that there is some overlap with the first component in the handling of data: some projects may require more thought in the setting up phase, some may require more thought in the processing of the data. Use of graphs, plots and summaries is not separated from more formal analysis, as graphical and descriptive aspects can feature in preliminaries, in conjunction with more formal analysis, or to illustrate points of the analysis. Note that graphical and descriptive forms appropriate to the data should be chosen. Don't give every graph you can think of, and check if there is information in the data that you haven't presented. This component also involves using statistical tools in exploratory analysis and in identifying models to be formally assessed.
The third component is as important as the first two and also overlaps them, but is particularly significant in helping you in learning the application of the data analysis tools being introduced throughout the semester. The project reference book includes examples of the types of datasets and situations that can be analysed, but don’t forget that the project reference book has model projects and is also used by students in units that do a little more than we do in MAB101. In addition, most sections of the MAB101 text give examples for which the tools of that section are useful. One of the aims of the project is to enable you to use the methods of MAB101 on your own data. Sophisticated data analysis is NOT expected: the aim is to make good use of the introductory tools of MAB101.
Note that:
• analysis techniques appropriate to the data and context should be applied - don't use the scattergun approach of trying everything in case something is appropriate; and
• check to see if there is relevant information in the data that you haven't analysed. (See the project reference for various examples.) For example, is there a time factor? Are some variables linked in some way? Could scaled versions of two different variables be compared?
Descriptions of criteria and standards
Criteria (i) Identifying and describing a context and issues of interest; planning and collecting of relevant data; quality of data and discussion of context/problems
Mark    Description of standard
6.5-7   Thoughtful ideas translated into planning to obtain sound data with a range of variables and observations for investigation of a range of issues. Description of context and practical details sufficient for reader to repeat. Evidence of teamwork.
5-6     Ideas translated into planning to obtain data with variables and observations for investigation of issues. Attempt at description of context and practical details. Some evidence of teamwork.
3-4     Limited data with little description of context and practical details.
1-2     Very little data; poor quality data; little description or teamwork.
Criteria (ii) Handling and processing data; understanding data and variables and issues to be explored; summarising, exploring and commenting on features of the data
Mark    Description of standard
5-6     Correct identification of variables, types of variables & subjects. Demonstrated understanding of nature of data and issues to be explored. Good data entry and preparation of data for analysis. Correct and judicious selection and use of graphs, tables. Range of graphs presents most features of data.
3-4     Attempt at identification of variables & subjects. Limited understanding of nature of data. Some issues identified. A mixture of correct/incorrect, judicious/non-judicious selections of graphs. Data entry and preparation adequate.
1-2     Limited data with little understanding of data, and issues. Negligible or incorrect presentations.
Criteria (iii) Using statistical tools for statistical analysis and interpretation of the data in the context/issues
Mark    Description of standard
6-7     Judicious choices of statistical procedures to analyse range of issues. Mostly correct use and technical interpretation of selected statistical procedures. Synthesis of results and appropriate discussion.
3-5     Choices of statistical procedures to analyse range of issues. Mixture of correct and incorrect use and technical interpretation of statistical procedures. Reasonable attempt at appropriate discussion.
1-2     Negligible or incorrect choices and applications of statistical procedures.
Appendix I. End of Semester Exam
The following is a typical MAB101 end of semester exam. Students are given two
hours to complete the paper. They provide their own double-sided one page
summary of the course material and are permitted to use any calculator.
QUESTION 1
The flight of paper planes (R. Alcaraz, J. Mulholland, A. Tidmarsh, S. Williams 2004)
The group investigated variables that might affect the distance and the flight time of different paper aeroplanes. The experiment was conducted in an enclosed space to minimise the influence of the weather. Three different plane designs were made using 3 different types of paper, and each combination was thrown four times by different throwers. For each throw, the flight time, distance, type of landing (nosedive/glide), position on landing (upright/not) and whether there had been any obstacles, were all recorded. All flights took place on the same day in the same location.
(a) The number of variables in this dataset is (circle your answer)
A: 6 B: 7 C: 8 D: 9
(b) Name the continuous variables in this dataset
………………………………………………………………………………………
…………………………………………………………………………………….
(c) When the data in this study are entered on a spreadsheet (or Minitab worksheet),
the rows correspond to (circle your answer)
A: designs B: throws C: paper D: throwers
Question 1 continued overleaf /cont……. MAB101T1.051
(d) The boxplots below are of the flight times in seconds, classified by design and type of paper. For each of the statements, indicate in the column provided whether or not the statement is an appropriate one to make based ONLY on the boxplots below.
[Figure: Boxplot of Flight Time vs Design, Paper. Vertical axis: Flight Time (0 to 6); horizontal axis: Design (stingray glider, nick's paper aeroplane, generic), each split by Paper (rice, plain, cartridge).]
Tick if appropriate Place a X if not
One design is clearly better by the criteria of length of flight time
The plot suggests that different paper types might suit different designs
The average flight time for the stingray glider in cartridge is 2 seconds
The variability does not seem to depend on design or type of paper
The standard deviation of the flight times for the generic design in rice paper is approximately 1 second
In Nick’s design, half of the cartridge paper planes flew longer than three-quarters of the rice paper planes
In the generic design, the rice paper planes had the most variable and the most skew times
Some of the observations should be discarded
(e) The stem-and-leaf plot below is of the distance in metres travelled by all the paper planes made of plain paper. Use this plot to answer the following questions.
Stem-and-leaf of Distance_plain N = 48
Leaf Unit = 0.10
  1    2  0
  4    3  299
 14    4  0035699999
 21    5  1234666
 (7)   6  0222799
 20    7  000022348
 11    8  0335777
  4    9  046
  1   10  9
(i) The median of the distance for plain paper is …………….
(ii) The lower quartile of the distance for plain paper is ………..
(iii) The probability that the distance travelled by a plain paper plane is more than 6 metres is estimated directly from the data by………….
(iv) Assuming the flight distances of plain paper planes are normal with a mean of 6.5 metres and a standard deviation of 2 metres, the probability that a plain paper plane flies more than 6 metres is given by
………………………………………………………………………
………………………………………………………………………….
(SHOW YOUR CALCULATIONS)
(15 marks)
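The summaries asked for in Question 1(e) can be illustrated with a short Python sketch. It is not part of the original exam paper. It assumes the leaves are read at face value (the display truncates to the leaf unit of 0.10, so a recorded 6.0 may in fact be slightly above 6 metres), and it uses one of several common quartile conventions, which may differ from the one taught in the course. All variable names are illustrative.

```python
from statistics import NormalDist, median, quantiles

# Stem-and-leaf rows from Question 1(e): stem -> leaves (leaf unit = 0.10).
stems = {
    2: "0", 3: "299", 4: "0035699999", 5: "1234666", 6: "0222799",
    7: "000022348", 8: "0335777", 9: "046", 10: "9",
}

# Work in tenths of a metre (integers) so that comparisons are exact.
tenths = sorted(
    10 * stem + int(leaf) for stem, leaves in stems.items() for leaf in leaves
)
distances = [t / 10 for t in tenths]
n = len(distances)

# (i) and (ii): median and lower quartile ("inclusive" is an assumed convention).
med = median(distances)
q1, _, q3 = quantiles(distances, n=4, method="inclusive")

# (iii): proportion of observed distances strictly greater than 6 metres.
count_over_6 = sum(t > 60 for t in tenths)
prop_over_6 = count_over_6 / n

# (iv): P(X > 6) under the stated Normal(mean 6.5, sd 2) assumption.
normal_over_6 = 1 - NormalDist(mu=6.5, sigma=2).cdf(6)

print(n, med, q1, prop_over_6, round(normal_over_6, 4))
```

The direct estimate in (iii) uses only the observed counts, whereas (iv) relies entirely on the assumed normal model, so the two values need not agree.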
QUESTION 2
The table below classifies the dress type and age group for people during lunchtime in a busy city street.
Rows: dress   Columns: age
          <30  30-39  40-49  50-59  >59  All
casual     15     43     19     10    4   91
neutral    17     41     31     19   20  128
smart       7     19     28     19    8   81
All        39    103     78     48   32  300
(a) In carrying out a statistical test to assess whether the type of dress depends on age group, the expected number in the 50-59 age group dressing casually is given by …………………………..
(b) Which statistical tables will you use in carrying out the above test? [Remember to give all information needed in order to use the tables]
…………..………
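For a test of independence on a two-way table, each expected count is (row total × column total) / grand total, and the degrees of freedom are (rows − 1) × (columns − 1). The following sketch, which is not part of the original exam paper, applies these standard formulas to the dress-by-age table; the variable names are illustrative only.

```python
# Observed counts from the Question 2 table (rows: dress, columns: age group).
age_groups = ["<30", "30-39", "40-49", "50-59", ">59"]
observed = {
    "casual":  [15, 43, 19, 10, 4],
    "neutral": [17, 41, 31, 19, 20],
    "smart":   [7, 19, 28, 19, 8],
}

row_totals = {dress: sum(counts) for dress, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# Expected count under independence: row total * column total / grand total.
expected = {
    dress: [row_totals[dress] * ct / grand_total for ct in col_totals]
    for dress in observed
}

# Pearson chi-square statistic and its degrees of freedom.
chi_square = sum(
    (o - e) ** 2 / e
    for dress in observed
    for o, e in zip(observed[dress], expected[dress])
)
df = (len(observed) - 1) * (len(age_groups) - 1)

casual_50_59 = expected["casual"][age_groups.index("50-59")]
print(casual_50_59, df, round(chi_square, 2))
```

The degrees of freedom are the extra information needed to look up the chi-square tables.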
(c) The p-value for the test is 0.003. From this we can say (circle your response):
(1) there is strong evidence that dress type is independent of age group
(2) there is strong evidence that dress type depends on the age group
(3) there is a 0.3% chance that dress type is independent of age group.
(d) Based on the above data, an approximate 95% confidence interval for the probability that a person in the <30 age group dresses smartly is given by
$$\left( c - 1.96\sqrt{\frac{c(1-c)}{n}},\ c + 1.96\sqrt{\frac{c(1-c)}{n}} \right)$$
where
(1) n is given by ………
(2) c is given by ………….
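As an illustration only (not part of the original exam paper), the interval can be evaluated by taking n as the number of people observed in the <30 age group and c as the observed proportion of that group dressing smartly, both read from the table in this question. The variable names are hypothetical.

```python
from math import sqrt

# Counts read from the Question 2 table: 7 of the 39 people aged <30 dressed smartly.
smart_under_30 = 7
n = 39
c = smart_under_30 / n

# Approximate 95% interval: c -/+ 1.96 * sqrt(c(1 - c) / n).
half_width = 1.96 * sqrt(c * (1 - c) / n)
lower, upper = c - half_width, c + half_width

print(round(c, 4), round(lower, 4), round(upper, 4))
```

With only 39 observations the interval is wide, which is worth bearing in mind when interpreting the estimate.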
(e) The data above was also separated into males and females and tests for independence of dress type and age group carried out for each gender. The p-value for the females was 0.703, and the p-value for the males was 0.000. Comment on these results in a single sentence.
…………………………………………………………………………………………
…………………………………………………………………………………………
…………………………………………………………………………………………
(e) It is claimed that in daytime, the 30-39 year age group is more likely to dress smartly than the <30 age group. Let p1 be the probability that a 30-39 year-old in the city during the day dresses smartly, and let p2 be the probability that a <30 year-old in the city during the day dresses smartly.
(iv) The test statistic to test this is given by
$$\frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{m}+\frac{1}{n}\right)}}$$
where
(i) (m, n) are given by ………………..
(ii) and $\hat{p}$ is given by …………………
(12 marks)
QUESTION 3
Soluble aspirin (D.Raffelt, T.Do, E.Nguyen, C.Dance 2004)
To investigate the time to dissolve different types of soluble aspirin tablets, an experiment was conducted with 5 brands, using the factors water temperature (room/fridge), pH of water (neutral/acidic), and water type (normal/added salt). Aspirin was classified as dissolved once it had broken up and dissociated from the surface of the water.
(a) For the normal, neutral water samples, the following summary statistics were obtained
Variable   Temp     N   Mean  StDev  Median
Time(sec)  Fridge  15  79.20  31.96   65.00
           Room    15  68.40  26.16   56.00
(i) Using the above data, a 95% confidence interval for the expected time to dissolve aspirin tablets in normal, neutral, room temperature water is given by 68.4 – c 26.16/√n, 68.4 + c 26.16/√n, where
(1) n is ………………
(2) c is ………………
(ii) It is desired to quote an interval giving a range of time in which most aspirin tablets will dissolve. Which should be used? (circle your response)
(1) a confidence interval.
(2) a tolerance interval.
(iii) It is desired to estimate the expected time to dissolve aspirin in normal, neutral, room temperature water to within 10 seconds with a confidence of 95%. Using 26.2 seconds for the standard deviation of time to dissolve, it is suggested that the number of observations needed should be at least
………………………………………………………………………………
………………………………………………………………………………
……………………………………………………………………………
[SHOW YOUR CALCULATIONS]
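One common approach to part (iii), sketched here as an illustration rather than as the exam's model answer, treats the standard deviation as known and uses the normal critical value 1.96, so that the required number of observations is the smallest integer n with 1.96 × σ / √n ≤ 10. The use of 1.96 (rather than a t value) is an assumption of this sketch.

```python
from math import ceil

z = 1.96        # assumed normal critical value for 95% confidence
sigma = 26.2    # standard deviation quoted in the question (seconds)
margin = 10     # desired half-width of the interval (seconds)

# Solve z * sigma / sqrt(n) <= margin for n and round up to a whole observation.
n_exact = (z * sigma / margin) ** 2
n_required = ceil(n_exact)

print(round(n_exact, 2), n_required)
```

Rounding up rather than to the nearest integer guarantees the stated precision is met.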
(iv) A 95% confidence interval for the standard deviation of the time to dissolve aspirin tablets in normal, neutral, room temperature water is given by
$$\left( \frac{\ldots\ldots \times 26.16}{\ldots\ldots},\ \frac{\ldots\ldots \times 26.16}{\ldots\ldots} \right)$$
(v) Assuming equal variances, and testing that there is no difference between the expected
time to dissolve aspirin in normal, neutral water at room temperature or from the fridge, the tables used for the test are
……………….
[Remember to give all information needed in order to use the tables]
(vi) The test statistic to carry out the test of (v) is given by
[Answer template: a fraction with a difference (………… − …………) in the numerator and a sum (………… + …………) in the denominator.]
(13 marks)
(b) For the times to dissolve aspirin tablets in normal water at room temperature, the following was obtained
Two-way ANOVA: Time(sec) versus pH, Brand
Source       DF       SS       MS      F      P
pH            1     70.5    70.53   2.35  0.141
Brand         4   6522.9  1630.72  54.36  0.000
Interaction   4   4160.5  1040.12  34.67  0.000
Error        20    600.0    30.00
Total        29  11353.9
S = 5.477 R-Sq = 94.72% R-Sq(adj) = 92.34%
(i) The p-value of 0.141 tells us that ……………………………………………
………………………………………………………………………………….
(ii) The p-value of 0.000 for Brand tells us that …………………………………
……………………………………………………………………………….
(iii) The p-value of 0.000 for Interaction tells us that ………………………………
…………………………………………………………………………………… ……………………………………………………………………………………
(iv) Give a single sentence comment on each of the plots below …………………
…………………………………………………………………………………………
…………………………………………………………………………………………………………… …………………………………………………………………………………………………………… ……………………………………………………………………………………………………………
[Figure: Interaction Plot (data means) for Time(sec). Vertical axis: Mean (40 to 120); horizontal axis: Brand (Solprin, Disprin, Codox, Codis, Aspro Clear); separate lines for pH (acidic, Neutral).]
[Figure: Test for Equal Variances for Time(sec). 95% Bonferroni Confidence Intervals for StDevs, by pH (Neutral, acidic) and Brand (Solprin, Disprin, Codox, Codis, Aspro Clear); horizontal axis 0 to 200. Bartlett's Test: Test Statistic 12.05, P-Value 0.210. Levene's Test: Test Statistic 0.80, P-Value 0.619.]
(8 marks)
QUESTION 4
Reflexes (K.Beakey, N.Hand, J.Rolfe 2004)
An experiment was conducted to investigate human reflexes. A ruler was dropped (15.2cm above the hand and by the same group member) on the count of three and the aim was to catch the ruler as quickly as possible. A fluorescent and a clear ruler were used, with each subject tested with each ruler for both left and right hands, and the distance of the catch from the bottom of the ruler noted if the subject caught the ruler. The age and dominant hand of each subject were noted. The order of rulers was randomised for each subject. All the subjects caught the fluorescent ruler with their right hand, but some missed the other rulers.
(a) Below is part of the output for a one-way ANOVA on the reflex distance for the fluorescent ruler caught by the right hand.
One-way ANOVA: Right Fluorescent versus Age group
Source      DF      SS      MS      F
Age group    2   717.6   ……   ……
Error       ……   ……   ……
Total       39  1922.2
(i) Fill in the blank spaces in the above table
(ii) Use the output below to comment on the comparisons across the age groups
…………………………………………………………………………………
………………………………………………………………………………..
Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Age group
Individual confidence level = 98.04%

Age group = 0-34 subtracted from:
Age group   Lower  Center   Upper  -----+---------+---------+---------+----
35-69       0.362   6.475  12.588                  (---------*---------)
70+         4.270   9.229  14.188                        (-------*--------)
                                   -----+---------+---------+---------+----
                                      -6.0       0.0       6.0      12.0

Age group = 35-69 subtracted from:
Age group   Lower  Center   Upper  -----+---------+---------+---------+----
70+        -3.772   2.754   9.279           (----------*---------)
                                   -----+---------+---------+---------+----
                                      -6.0       0.0       6.0      12.0
(7 marks)
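The blanks in a one-way ANOVA table follow from the standard identities: the degrees of freedom and sums of squares add up to their totals, each mean square is SS/DF, and F is the ratio of the two mean squares. The sketch below is an illustration and not part of the original exam paper; it applies these identities to the values printed in the partial table of part (a), using hypothetical variable names.

```python
# Values printed in the partial one-way ANOVA table of Question 4(a).
df_group, ss_group = 2, 717.6
df_total, ss_total = 39, 1922.2

# Degrees of freedom and sums of squares are additive.
df_error = df_total - df_group
ss_error = ss_total - ss_group

# Mean squares are SS / DF; F compares between-group to within-group variation.
ms_group = ss_group / df_group
ms_error = ss_error / df_error
f_stat = ms_group / ms_error

print(df_error, round(ss_error, 1), round(ms_group, 1),
      round(ms_error, 2), round(f_stat, 2))
```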
(b) The output below looks at how the reflex distance on the fluorescent ruler caught by the right hand is affected by age group and dominant hand. Give (BRIEFLY) the information provided by this output.
…………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………..
……………………………………………………………………………………….
Analysis of Variance for Right Fluorescent, using Adjusted SS for Tests
Source                DF   Seq SS   Adj SS  Adj MS     F      P
Age group              2   717.61   389.08  194.54  6.30  0.005
R/L Handed             1    75.28    30.51   30.51  0.99  0.327
Age group*R/L Handed   2    78.68    78.68   39.34  1.27  0.293
Error                 34  1050.63  1050.63   30.90
Total                 39  1922.20
[Figure: Residuals Versus the Fitted Values (response is Right Fluorescent). Vertical axis: Standardized Residual (-3 to 3); horizontal axis: Fitted Value (15.0 to 27.5).]
(7 marks)
(c) The output below investigates how the reflex length for the clear ruler caught by the right hand is related to the reflex lengths for the other three catches (provided all four are caught) and age of the subject. Give (BRIEFLY) the information provided by this output.
………………………………………………………………………………… ………………………………………………………………………………….
………………………………………………………………………………………… ………………………………………………………………………………………… ………………………………………………………………………………………
Regression Analysis: Right Clear versus Right Fluore, Left Fluores, ...
The regression equation is
Right Clear = 7.64 + 0.330 Right Fluorescent + 0.158 Left Fluorescent + 0.0475 Age + 0.077 Left Clear
33 cases used, 7 cases contain missing values
Predictor             Coef  SE Coef     T      P
Constant             7.642    2.607  2.93  0.007
Right Fluorescent   0.3301   0.1339  2.47  0.020
Left Fluorescent    0.1584   0.1334  1.19  0.245
Age                0.04745  0.03926  1.21  0.237
Left Clear          0.0771   0.1364  0.57  0.576
S = 3.95723 R-Sq = 57.0% R-Sq(adj) = 50.9%
Unusual Observations
           Right   Right
Obs  Fluorescent   Clear     Fit  SE Fit  Residual  St Resid
 38         28.9  19.000  21.707   2.720    -2.707   -0.94 X
X denotes an observation whose X value gives it large influence.
[Figure: Residuals Versus Age (response is Right Clear). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Age (10 to 90).]
[Figure: Residuals Versus the Fitted Values (response is Right Clear). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Fitted Value (10 to 28).]
[Figure: Residuals Versus Right Fluorescent (response is Right Clear). Vertical axis: Standardized Residual (-2 to 2); horizontal axis: Right Fluorescent (0 to 30).]
[Figure: Normal Probability Plot of the Residuals (response is Right Clear). Vertical axis: Percent (1 to 99); horizontal axis: Standardized Residual (-3 to 3).]
(8 marks)
QUESTION 5
In the paper plane experiment of Question 1 above, the relationship between flight time (in seconds) and distance travelled (in metres) was analysed for the design called Nick’s paper aeroplane.
Regression Analysis: time_n versus dist_n
The regression equation is
time_n = 1.15 + 0.0823 dist_n
Predictor     Coef  SE Coef     T      P
Constant    1.1546   0.2508  4.60  0.000
dist_n     0.08228  0.04637  1.77  0.083
S = 0.672386 R-Sq = 6.4% R-Sq(adj) = 4.4%
(a) What is the p-value of 0.083 telling us? ………………………………………..
…………………………………………………………………………………………..
(b) What is the R-sq telling us? ………………………………………..
…………………………………………………………………………………………..
(c) The following output continues investigating the above relationship.
MTB > let c16='dist_n'**2
MTB > let c17='dist_n'**3
Regression Analysis: time_n versus dist_n, C16, C17
The regression equation is
time_n = 3.47 - 1.44 dist_n + 0.277 C16 - 0.0148 C17
Predictor       Coef   SE Coef      T      P
Constant      3.4703    0.6858   5.06  0.000
dist_n       -1.4446    0.4889  -2.95  0.005
C16           0.2774    0.1059   2.62  0.012
C17        -0.014782  0.006928  -2.13  0.038
S = 0.595347 R-Sq = 29.8% R-Sq(adj) = 25.0%
Analysis of Variance
Source          DF       SS      MS     F      P
Regression       3   6.6249  2.2083  6.23  0.001
Residual Error  44  15.5953  0.3544
Total           47  22.2202
Unusual Observations
Obs  dist_n  time_n     Fit  SE Fit  Residual  St Resid
 12    9.35  1.2180  2.1356  0.3965   -0.9176   -2.07RX
 13    1.46  3.1330  1.9066  0.2372    1.2264    2.25R
 15    0.57  2.9030  2.7343  0.4587    0.1687    0.44 X
 16    2.84  2.5150  1.2669  0.1447    1.2481    2.16R
 27    2.02  0.3360  1.5625  0.1736   -1.2265   -2.15R
 46    3.88  2.6720  1.1787  0.1266    1.4933    2.57R
R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
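The Minitab commands create the square and cube of dist_n as new columns, so the fitted model is a cubic polynomial in distance. As an illustration only (not part of the original exam paper), the sketch below evaluates that cubic using the printed coefficients, reproduces the Fit and Residual entries for one listed observation, and recovers R-Sq from the ANOVA sums of squares. The function and variable names are hypothetical, and small discrepancies are expected because the printed coefficients are rounded.

```python
# Coefficients from the printed cubic regression of time_n on dist_n.
b0, b1, b2, b3 = 3.4703, -1.4446, 0.2774, -0.014782


def fitted_time(dist):
    """Fitted flight time (seconds) for a given distance (metres)."""
    return b0 + b1 * dist + b2 * dist ** 2 + b3 * dist ** 3


# Observation 13 in the "Unusual Observations" table: dist_n = 1.46, time_n = 3.1330.
fit_13 = fitted_time(1.46)
residual_13 = 3.1330 - fit_13

# R-Sq is the regression sum of squares as a fraction of the total.
ss_regression, ss_total = 6.6249, 22.2202
r_squared = ss_regression / ss_total

print(round(fit_13, 4), round(residual_13, 4), round(r_squared, 3))
```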
(i) What was done to produce the above output? …………………………………….
…………………………………………………………………………………………..
(ii) What does the p-value of 0.005 tell us? ……………………………………………
…………………………………………………………………………………………..
(iii) What does the p-value of 0.012 tell us? …………………………………………..
………………………………………………………………………………………….
(iv) What does the p-value of 0.038 tell us? …………………………………………..
…………………………………………………………………………………………
(v) What do the plots below tell us? …………………………………………………..
………………………………………………………………………………………….. …………………………………………………………………………………………..
[Figure: Residuals Versus dist_n (response is time_n). Vertical axis: Standardized Residual (-2 to 3); horizontal axis: dist_n (0 to 9).]
[Figure: Normal Probability Plot of the Residuals (response is time_n). Vertical axis: Percent (1 to 99); horizontal axis: Standardized Residual (-3 to 3).]
(10 marks)
END OF PAPER
238
Bibliography
239
Bibliography
Adams, R.J. and Khoo, S.T. (1996). Quest: The Interactive Test Analysis System.
Version 2.1. Melbourne: Australian Council for Educational Research.
Albert, J.H. (2003). College Students' Conceptions of Probability. The American
Statistician, 57 (1), 37-45.
Bandura, A. (1986), Social Foundations of Thought and Action: A Social Cognitive
Theory, Englewood Cliffs, NJ: Prentice Hall.
Ben-Zvi, D. (2000). Toward Understanding the Role of Technological Tools in
Statistical Learning. Mathematical Thinking and Learning, 2 (1-2), 127-155.
Ben-Zvi, D. and Garfield, J. (eds.) (2004), The Challenge of Developing Statistical
Literacy, Reasoning and Thinking, Dordrecht, The Netherlands: Kluwer
Academic Publishers.
Biggs, J.B. and Collis, K.F. (1982), Evaluating the Quality of Learning: The Solo
Taxonomy, New York: Academic Press.
Biggs, J.B. and Collis, K.F. (1991), "Multimodal Learning and the Quality of
Intelligent Behaviour," in Intelligence: Reconceptualization and
Measurement, ed. H. A. H. Rowe, Hillsdale, NJ: Lawrence Erlbaum, pp. 57-
76.
Broers, N.J. (2006). Learning Goals: The Primacy of Statistical Knowledge.
Proceedings of the 7th International Conference on Teaching Statistics:
Working Cooperatively in Statistics Education, Salvador, Brazil: International
Statistical Institute.
Bibliography
240
Chance, B.L. (2002). Components of Statistical Thinking and Implications for
Instruction and Assessment. Journal of Statistics Education, 10 (3),
www.amstat.org/publications/jse/secure/v10n3/chance.html.
Chance, B.L. and Garfield, J. (2002). New Approaches to Gathering Data on Student
Learning for Research in Statistics Education. Statistics Education Research
Journal, 1 (2), 38-41.
Cobb, P. (1999). Individual and Collective Mathematical Development: The Case of
Statistical Data Analysis. Mathematical Thinking and Learning, 1, 5-43.
Cockcroft, W.H. (1982), "Mathematics Counts : Report of the Committee of Inquiry
into the Teaching of Mathematics in Schools," Technical, Her Majesty's
Stationery Office.
Coutis, P., Cuthbert, R. and MacGillivray, H. (2002). Bridging the Gap between
Assumed Knowledge and Reality: A Case for Supplementary Learning
Support Programs in Tertiary Mathematics. Proceedings of the Engineering
Mathematics and Applications Conference, The Institution of Engineers,
Australia, pp. 97-102.
Cuthbert, R. and MacGillivray, H.L. (2003). Investigating Weaknesses in the
Underpinning Mathematical Confidence of First Year Engineering Students.
Proceedings of the Australasian Engineering Education Conference, The
Institution of Engineers, Australia, pp. 358-368.
delMas, R. (2002a). Statistical Literacy Reasoning and Learning. Journal of
Statistics Education, 10 (3),
www.amstat.org/publications/jse/v10n3/delmas_intro.html.
delMas, R. (2002b). Statistical Literacy, Reasoning and Learning: A Commentary.
Journal of Statistics Education, 10 (3),
www.amstat.org/publications/jse/v10n3/delmas_discussion.html.
delMas, R. (2004), "A Comparison of Mathematical and Statistical Reasoning," in
The Challenge of Developing Statistical Literacy, Reasoning and Thinking,
Bibliography
241
eds. D. Ben-Zvi and J. Garfield, Dordrecht, The Netherlands: Kluwer
Academic Publishers, pp. 79-95.
delMas, R., Garfield, J. and Chance, B.L. (1999). A Model of Classroom Research in
Action: Developing Simulation Activities to Improve Students' Statistical
Reasoning. Journal of Statistics Education, 7 (3),
www.amstat.org/publications/jse/secure/v7n3/demas.cfm.
Finney, S.J. and Schraw, G. (2003). Self-Efficacy Beliefs in College Statistics
Courses. Contemporary Educational Psychology, 28 (2), 161-186.
Gal, I. (ed.) (2000), Adult Numeracy Development: Theory, Research, Practice,
Cresskill, NJ: Hampton Press.
Gal, I. (2004), "Statistical Literacy: Meanings, Components, Responsibilities," in The
Challenge of Developing Statistical Literacy, Reasoning and Thinking, eds.
D. Ben-Zvi and J. Garfield, Dordrecht, The Netherlands: Kluwer Academic
Publishers, pp. 47-78.
Gal, I. and Garfield, J. (1997), The Assessment Challenge in Statistics Education (1st
ed.), IOS Press & International Statistical Institute.
Gal, I. and Ginsberg, L. (1994). The Role of Beliefs and Attitudes in Learning
Statistics: Towards an Assessment Framework. Journal of Statistics
Education, 2 (2), www.amstat.org/publications/jse/v2n2/gal.html.
Gal, I., Ginsberg, L. and Schau, C. (1997), "Monitoring Attitudes and Beliefs in
Statistics Education," in The Assessment Challenge in Statistics Education,
eds. I. Gal and J. Garfield, Amsterdam: IOS Press, pp. 37-51.
Garfield, J. (1991). Evaluating Students' Understanding of Statistics: Development of
the Statistical Reasoning Assessment. Proceedings of the Thirteenth Annual
Meeting of the North American Chapter of the International Group for the
Psychology of Mathematics Education, Blacksburg, VA, pp. 1-7.
Garfield, J. (1998). The Statistical Reasoning Assessment: Development and
Validation of a Research Tool. Proceedings of the Fifth International
Bibliography
242
Conference on Teaching Statistics, Singapore: International Statistical
Institute, pp. 781-786.
Garfield, J. (2002). The Challenge of Developing Statistical Reasoning. Journal of
Statistics Education, 10 (3),
www.amstat.org/publications/jse/v10n3/garfield.html.
Garfield, J. (2003). Assessing Statistical Reasoning. Statistics Education Research
Journal, 2 (1), 22-38.
Garfield, J. and Chance, B.L. (2000). Assessment in Statistics Education Issues and
Challenges. Mathematical Thinking and Learning, 2 (1-2), 99-125.
Garfield, J. and Gal, I. (1999). Assessment and Statistics Education: Current
Challenges and Directions. International Statistical Review, 67 (1), 1-12.
Garfield, J., Hogg, R.V., Schau, C. and Whittinghill, D. (2002). First Courses in
Statistical Science: The Status of Educational Reform Efforts. Journal of
Statistics Education, 10 (2),
www.amstat.org/publications/jse/v10n2/garfield.html.
Gnaldi, M. (2003). Students' Numeracy and Their Achievement of Learning
Outcomes in a Statistics Course for Psychologists. Unpublished M.Sc.,
University of Glasgow, Faculty of Statistics.
Hirsch, L. and O'Donnell, A. (2001). Representativeness in Statistical Reasoning:
Identifying and Assessing Misconceptions. Journal of Statistics Education, 9
(2), www.amstat.org/publications/jse/v9n2/hirsch.html.
Hogg, R.V. (1991). Statistical Education: Improvements Are Badly Needed. The
American Statistician, 45, 342-343.
Jansen, P.G.W. and Roskam, E.E. (1986). Latent Trait Models and Dichotomization
of Grade Responses. Psychometrika, 51 (1), 69-91.
Jones, B. (1982), Sleepers, Wake! : Technology and the Future of Work., Melbourne:
Oxford University Press.
Bibliography
243
Kahneman, D., Slovic, P. and Tversky, A. (eds.) (1982), Judgement under
Uncertainty: Heuristics and Biases, Cambridge University Press.
Kahneman, D. and Tversky, A. (1972). Subjective Probability: A Judgement of
Representativeness. Cognitive Psychology, 3, 430-454.
Kahneman, D. and Tversky, A. (eds.) (2000), Choices, Values and Frames,
Cambridge: Cambridge University Press.
Keeves, J.P. and Alagumalai, S. (1999), "New Approaches to Measurement," in
Advances in Measurement in Educational Research and Assessment, eds. G.
N. Masters and J. P. Keeves, Oxford: Pergamon, pp. 23-42.
Keeves, J.P. and Masters, G.N. (1999), "Issues in Educational Measurement," in
Advances in Measurement in Educational Research and Assessment (1st ed.),
eds. G. N. Masters and J. P. Keeves, Oxford: Pergamon, pp. 268-281.
Konold, C. (1989). Informal Conceptions of Probability. Cognition and Instruction, 6
(1), 59-98.
Konold, C. (1990), "Chanceplus: A Computer-Based Curriculum for Probability and
Statistics," Technical, Scientific Reasoning Research Institute, University of
Massachusetts, Amherst.
Konold, C. (1995). Issues in Assessing Conceptual Understanding in Probability and
Statistics. Journal of Statistics Education, 3 (1),
www.amstat.org/publications/jse/v3n1/konold.html.
Lalonde, R.N. and Gardner, R.C. (1993). Statistics as a Second Language? A Model
for Predicting Performance in Psychology Students. Canadian Journal of
Behavioural Science, 25 (1), 108-130.
Lokan, J., Ford, P. and Greenwood, L. (1997), Maths and Science on the Line:
Australian Middle Primary Students' Performance in the Third International
Mathematics and Science Study, Camberwell, Vic: Australian Council for
Educational Research.
Bibliography
244
MacGillivray, H. (2005). Helping Students Find Their Statistical Voices.
Proceedings of the ISI / IASE Satellite on Statistics Education and the
Communication of Statistics, Sydney, Australia: ISI, Voorburg, The
Netherlands.
MacGillivray, H.L. (1998). Developing and Synthesizing Statistical Skills for Real
Situations through Student Projects. Proceedings of the Fifth International
Conference on Teaching Statistics, Singapore: International Statistical
Institute, pp. 1149-1155.
MacGillivray, H.L. (2002). Lessons from Engineering Student Projects in Statistics.
Proceedings of the Australian Engineering Education Conference, Canberra,
Australia: The Institution of Engineers, Australia, pp. 225-230.
MacGillivray, H.L. (2004), Data Analysis: Introductory Methods in Context (1st
ed.), Australia: Pearson - SprintPrint.
Masters, G.N. (1982). A Rasch Model for Partial Credit Scoring. Psychometrika, 47
(2), 149-174.
Masters, G.N. (1988a). The Analysis of Partial Credit Scoring. Applied Measurement
in Education, 1 (4), 279-297.
Masters, G.N. (1988b), "Measurement Models for Ordered Response Categories," in
Latent Trait and Latent Class Models, eds. R. Langeheine and J. Rost, New
York: Plenum.
McLeod, D.B. (1992), "Research on Affect in Mathematics Education: A
Reconceptualization," in Handbook of Research on Mathematics Teaching
and Learning, ed. D. A. Grouws, New York: Macmillan, pp. 575-596.
Meletiou-Mavrotheris, M. and Lee, C. (2002). Teaching Students the Stochastic
Nature of Statistical Concepts in an Introductory Statistics Course. Statistics
Education Research Journal, 1 (2), 22-37.
Moore, D. (1997). New Pedagogy and New Content: The Case of Statistics (with
Discussion). International Statistical Review, 65 (2), 123-137.
Bibliography
245
Pajares, F. and Millerb, M.D. (1995). Mathematics Self-Efficacy and Mathematics
Performances: The Need for Specificity in Assessment. Journal of
Counseling Psychology, 42 (2), 190-198.
Pea, R.D. (1987), "Cognitive Technologies for Mathematics Education," in Cognitive
Science and Mathematics Education, ed. H. Schoenfeld, Hillsdale, NJ:
Lawrence Erlbaum Associates, Inc., pp. 89-122.
Petocz, P. and Reid, A. (2001). Students' Experience of Learning in Statistics.
Quaestiones Mathematicae, Suppl 1, 37-45.
Petocz, P. and Reid, A. (2003). Relationships between Students' Experience of
Learning Statistics and Teaching Statistics. Statistics Education Research
Journal, 2 (2), 39-53.
Pfannkuch, M. and Wild, C.J. (2004), "Towards an Understanding of Statistical
Thinking," in The Challenge of Developing Statistical Literacy, Reasoning
and Thinking, eds. D. Ben-Zvi and J. Garfield, Dordrecht, The Netherlands:
Kluwer Academic Publishers, pp. 17-46.
Pokorny, M. and Pokorny, H. (2005). Widening Participation on Higher Education:
Student Quantitative Skills and Independent Learning as Impediments to
Progression. International Journal of Mathematical Education in Science and
Technology, 36 (5), 445-467.
Queensland Studies Authority. (2004). Mathematics: Year 1 to 10 Syllabus.
Accessed 11 Feb 2005, from
www.qsa.qld.edu.au/yrs1to10/kla/mathematics/docs/syllabus/syllabus.pdf.
Queensland University of Technology. (2004). Making Inroads. Accessed April
2004, from
www.studentservices.qut.edu.au/applying/lodging/undergraduate/inroads.jsp.
Rasch, G. (1960), Probabilistic Models for Some Intelligence and Attainment Tests
(reprinted 1980 ed.), Chicago: University of Chicago Press.
Bibliography
246
Reading, C. (2002). Profile for Statistical Understanding. Proceedings of the Sixth
International Conference on Teaching Statistics: Developing a statistically
literate society, Cape Town, South Africa: International Statistical Institute.
Roberts, D. and Bilderback, E. (1980). Reliability and Validity of a Statistical
Attitude Survey. Educational and Psychological Measurement, 40, 235-238.
Roberts, D. and Reese, C. (1987). A Comparison of Two Scales Measuring Attitudes
Towards Statistics. Educational and Psychological Measurement, 47, 759-
764.
Rumsey, D. (2002). Statistical Literacy as a Goal for Introductory Statistics Courses.
Journal of Statistics Education, 10 (3),
www.amstat.org/publications/jse/v10n3/rumsey2.html.
Schau, C., Stevens, J., Dauphinee, T.L. and Del Vecchio, A. (1995). The
Development and Validation of the Survey of Attitudes Towards Statistics.
Educational and Psychological Measurement, 55, 868-875.
Shaughnessy, J.M. (1992), "Research in Probability and Statistics: Reflections and
Directions," in Handbook of Research on Mathematics Teaching and
Learning, ed. D. A. Grouws, Macmillan Publishing Company, pp. 465-494.
Snell, L. (1999). Using "Chance" Media to Promote Statistical Literacy. Paper
presented at the 1999 Joint Statistical Meetings.
Sowey, E.R. (1998). Statistical Vistas: Perspectives on Purpose and Structure.
Journal of Statistics Education, 6 (2),
www.amstat.org/publications/jse/v2n2/gal.html.
Sundre, D.L. (2003). Assessment of Quantitative Reasoning to Enhance Educational
Quality. Proceedings of the paper presented at the American Educational
Research Association meeting, Chicago, Illinois.
Sutarso, T. (1992). Students' Attitudes Towards Statistics. Proceedings of the Annual
Meeting of the Mid-South Educational Research Association.
Bibliography
247
Tempelaar, D. (2004). Statistical Reasoning Assessment: An Analysis of the SRA
Instrument. Proceedings of the ARTIST Roundtable Conference on
Assessment in Statistics, Lawrence University.
Tempelaar, D. (2006). A Structural Equation Model Analyzing the Relationship
Students' Statistical Reasoning Abilities, Their Attitudes toward Statistics,
and Learning Approaches. Proceedings of the 7th International Conference
on Teaching Statistics: Working Cooperatively in Statistics Education,
Salvador, Brazil: International Statistical Institute.
Thomas, S. and Fleming, N. (2004), Summing It Up: Mathematics Achievement in
Australian Schools in TIMSS 2002, Camberwell, Vic: Australian Council for
Educational Research.
Vere-Jones, D. (1995). The Coming of Age of Statistical Education. International
Statistical Review, 63, 3-23.
Waters, L., Martelli, T., Zakrajsek, T. and Popovich, P. (1988). Attitudes toward
Statistics: An Evaluation of Multiple Measures. Educational and
Psychological Measurement, 48, 513-518.
Watson, J.M. (1993), "Introducing the Language of Probability through the Media,"
in Communicating Mathematics - Perspectives from Current Research and
Classroom Practice in Australia, eds. M. Stephens, A. Wayward, D. Clarke
and J. Izard, Melbourne: Australian Council for Educational Research, pp.
119-139.
Watson, J.M. (1997), "Assessing Statistical Thinking Using the Media," in The
Assessment Challenge in Statistics Education, eds. I. Gal and J. Garfield,
Amsterdam: IOS Press, pp. 107-121.
Watson, J.M. and Callingham, R. (2003). Statistical Literacy: A Complex
Hierarchical Construct. Statistics Education Research Journal, 2 (2), 3-46.
Watson, J.M., Kelly, B.A., Callingham, R.A. and Shaughnessy, J.M. (2003). The
Measurement of School Students' Understanding of Statistical Variation.
Bibliography
248
International Journal of Mathematical Education in Science and Technology,
34 (1), 1-29.
Watson, J.M. and Moritz, J. (1999). The Beginning of Statistical Inference:
Comparing Two Data Sets. Educational Studies in Mathematics, 37 (2), 145-
168.
Watson, J.M. and Moritz, J. (2000). The Longitudinal Development of
Understanding of Average. Mathematical Thinking and Learning, 2 (1&2),
11-50.
Wild, C.J. (2006). On Cooperation and Competition. Proceedings of the 7th
International Conference on Teaching Statistics: Working Cooperatively in
Statistics Education, Salvador, Brazil: International Statistical Institute.
Wild, C.J. and Pfannkuch, M. (1999). Statistical Thinking in Empirical Enquiry (with
Discussion). International Statistical Review, 67 (3), 223-265.
Wilson, M. (1992), "Measuring Levels of Mathematical Understanding," in
Mathematics Assessment and Evaluation, ed. T. A. Romberg, Albany: State
University of New York Press, pp. 213-241.
Wilson, M. and Masters, G.N. (1993). The Partial Credit Model and Null Categories.
Psychometrika, 58 (1), 87-99.
Wilson, T.M. and MacGillivray, H.L. (2007). Counting on the Basics: Mathematical
Skills Amongst Tertiary Entrants. International Journal of Mathematical
Education in Science and Technology, 38 (1), 19-41.
Wise, S. (1985). The Development and Validation of a Scale Measuring Attitudes
toward Statistics. Educational and Psychological Measurement, 45, 401-405.
Wright, B.D. (1999), "Rasch Measurement Models," in Advances in Measurement in
Educational Research and Assessment (1st ed.), eds. G. N. Masters and J. P.
Keeves, Oxford: Pergamon, pp. 85-97.