Statistical Reasoning

at the Secondary Tertiary Interface

Therese Maree Wilson

BSc (Hons)

A thesis submitted in fulfilment of the requirements of

the degree of Doctor of Philosophy

in the School of Mathematical Sciences,

Queensland University of Technology.

November 2006

Keywords

statistical reasoning

statistical thinking

statistical literacy

numeracy

secondary/tertiary interface

Rasch analysis

partial credit model

attitudes towards statistics

self-efficacy

assessment

statistical education

introductory data analysis course

mathematical thinking

Abstract

Each year thousands of students enrol in introductory statistics courses at universities

throughout Australia, bringing with them formal and informal statistical knowledge

and reasoning, as well as a wide range of basic numeracy skills, mathematical

inclinations and attitudes towards statistics, which have the potential to impact on

their ability to develop statistically. This research develops and investigates

measures of each of these components for students at the interface of secondary and

tertiary education, and investigates the relationships that exist between them, and a

range of background variables. The focus of the research is on measuring and

analysing levels and abilities in statistical reasoning for a range of students at the

tertiary interface, with particular interest also in investigating their basic numeracy

skills and how these may or may not link with statistical reasoning allowing for other

variables and factors. Information from three cohorts in an introductory data analysis

course, whose focus is real data investigations, provides the basis for the research. This

course is compulsory for all students in degree programs associated with all sciences

or mathematics.

The research discusses and reports on the development of questionnaires to measure

numeracy and statistical reasoning and the students’ attitudes and reflections on their

prior school experiences with statistics. Students’ attitudes are found to be generally

positive, particularly with regard to their self-efficacy. They are also in no doubt as

to the links that exist between mathematics and statistics.

The Numeracy Questionnaire, developed to measure pre-calculus skills relevant to an

introductory data analysis course which emphasises real data investigations,

demonstrates that many students who have completed a basic algebra and calculus

senior school subject struggle with skills which are in the pre-senior curricula.

Direct examination of the responses helps to understand where and why difficulties

tend to occur. Rasch analysis is used to validate the questionnaire and assist in the

description of levels of skill. General linear models demonstrate that a student’s

numeracy score depends on the result obtained in senior mathematics, whether or not

the student is a mathematics student, gender, whether or not higher level

mathematics has been studied, self-efficacy and year. The research indicates that

either the pre-senior curricula need strengthening or that exposure to mathematics

beyond the core senior course is required to establish confidence with basic skills

particularly when applied to new contexts and multi-step situations.

The Statistical Reasoning Questionnaire (SRQ) is developed for use in the Australian

context at the secondary/tertiary interface. As with the Numeracy Questionnaire,

detailed examination of the responses provides much insight into the range and

features of statistical reasoning at this level. Rasch analyses, both dichotomous and

polychotomous, are used to establish the appropriateness of this instrument as a

measuring tool at this level. The polychotomous Rasch partial credit model is also

used to define a new approach to scoring a statistical reasoning instrument and

enables development and application of a hierarchical model and measures levels of

statistical reasoning appropriate at the school/tertiary interface.

General linear models indicate that numeracy is a highly significant predictor of

statistical reasoning allowing for all other variables including tertiary entrance score

and students’ backgrounds and self-efficacy. Further investigation demonstrates that

this relationship is not limited to more difficult or overtly mathematical items on the

SRQ.

Performance on the end of semester component of assessment in the course is shown

to depend on statistical reasoning at the beginning of semester as measured by the

partial credit model, allowing for all other variables. Because of the dominance of

the relationship between statistical reasoning (as measured by the SRQ) and

numeracy on entry, some further analysis of the end of semester assessment is

carried out. This includes noting the higher attrition rates for students with less

mathematical backgrounds and lower numeracy.

Contents

CHAPTER 1 INTRODUCTION
1.1 Motivation
1.2 Aims and Scope of the Study
1.3 An Overview of the Statistics Education Literature
1.3.1 Statistical literacy, reasoning and thinking
1.3.2 Curricula and assessment
1.3.3 Research instruments for measuring statistical reasoning
1.3.4 Student learning
1.3.5 Student background
1.3.6 Student attitudes
1.3.7 Impact of technology
1.4 Literacy, Reasoning, Understanding or Thinking?
1.5 Structure of the Research
1.6 Outline of the Thesis

CHAPTER 2 THE CONTEXT OF THE STUDY
2.1 Introduction
2.2 The Course: Statistical Data Analysis 1
2.3 Demographics and Educational Backgrounds of Students

CHAPTER 3 ATTITUDES TOWARDS MATHEMATICS AND STATISTICS
3.1 Introduction
3.2 Construction of the 2004 Attitudinal Survey
3.3 Results from the 2004 Attitudinal Survey
3.3.1 Affective responses of students towards statistics
3.3.2 Students’ perceived value in studying statistics
3.3.3 The motivation of students to learn statistics
3.3.4 The links which students perceive between statistics and mathematics
3.3.5 How students see the use of statistics in society
3.3.6 Students’ perceived difficulty of statistics
3.3.7 Self-efficacy of students in statistics and mathematics
3.4 2004 Attitudes Follow-up
3.5 Construction of the 2005 Attitudinal Survey
3.6 Results from the 2005 Attitudinal Survey
3.7 Discussion

CHAPTER 4 NUMERACY
4.1 Introduction
4.2 Construction of the Numeracy Questionnaire
4.3 Results of the Numeracy Questionnaire
4.3.1 Consideration of student responses
4.3.2 Results from Rasch analysis
4.3.3 Levels of thinking
4.3.4 What influences students’ scores?
4.3.5 What influences responses to individual questions?
4.4 Discussion

CHAPTER 5 THE STATISTICAL REASONING QUESTIONNAIRE
5.1 Introduction
5.2 Construction of the Statistical Reasoning Questionnaire (SRQ)
5.3 Results of the SRQ
5.3.1 Student responses to individual items
5.4 Discussion - Suitability of the SRQ

CHAPTER 6 RASCH ANALYSES FOR THE STATISTICAL REASONING QUESTIONNAIRE
6.1 Introduction
6.2 Dichotomous Rasch Model
6.3 Polychotomous Rasch Model
6.3.1 Consistency of item step codes and difficulties
6.3.2 The SRQPC Score
6.4 Expected Responses to the SRQ
6.5 Discussion - Suitability of the SRQ

CHAPTER 7 FACTORS WHICH INFLUENCE STATISTICAL REASONING
7.2 Predictors of Statistical Reasoning of Incoming Students
7.2.1 Results for the 2004 cohort
7.2.2 Results for the 2005 cohort
7.2.3 Combining the 2004 and 2005 cohorts
7.2.4 Synthesising the results
7.2.5 Recent school leavers
7.2.6 Incorporating levels of numeracy
7.3 Predictors of the SRQPC Scores
7.4 Further Exploration of the Link between Numeracy and Statistical Reasoning
7.5 Predictors of the Exam Component of Assessment
7.6 Discussion

CHAPTER 8 IMPLICATIONS
8.2 Implications within the Research
8.2.1 Extent and limits of the study
8.3 Implications for Teaching, Assessment and Advising
8.4 Implications for Future Research

APPENDIX A. BACKGROUND INFORMATION SURVEY
APPENDIX B. 2004 ATTITUDINAL SURVEY
APPENDIX C. 2004 FOLLOW-UP ATTITUDINAL SURVEY
APPENDIX D. 2005 ATTITUDINAL SURVEY
APPENDIX E. NUMERACY QUESTIONNAIRE
APPENDIX F. STATISTICAL REASONING QUESTIONNAIRE
APPENDIX G. RESPONSES TO THE SRQ
APPENDIX H. PROJECT DESCRIPTION AND CRITERIA
APPENDIX I. END OF SEMESTER EXAM

BIBLIOGRAPHY

List of Figures

Figure 1.1 The concept map illustrates the structure of the literature.
Figure 2.1 Years since graduating from high-school, by cohort
Figure 2.2 OP scores reported by students for each cohort and each program type
Figure 2.3 Level of maths reported as having been studied for each cohort
Figure 2.4 Maths B results by cohort
Figure 3.1 Responses for the affect aspect of attitudes tend to be positive.
Figure 3.2 Responses for the value aspect of attitudes are generally positive.
Figure 3.3 Responses for the motivation aspect of attitudes are somewhat contradictory.
Figure 3.4 Responses for the links aspect of attitudes are strongly positive.
Figure 3.5 Responses for the use aspect of attitudes are negative.
Figure 3.6 Responses for the difficulty aspect of attitudes are relatively neutral.
Figure 3.7 Responses for the self-efficacy aspect of attitudes tend to be positive.
Figure 3.8 Responses for the Likert items repeated in 2005
Figure 3.9 Thoughts on statistics according to whether or not it was considered beneficial in grade 11 and 12
Figure 4.1 The distribution of total Numeracy Scores is similar across the two cohorts.
Figure 4.2 The variable map shows the students (on the left hand side) and items (on the right hand side) displayed on a single logistic scale.
Figure 4.3 The residual plots for the model in Equation 4.2 show no systematic concerns.
Figure 4.4 The residual plots for the model in Equation 4.3 show slight indication of long-tailedness.
Figure 5.1 The distribution of SRQ Scores is consistent across the three cohorts.
Figure 6.1 The variable map for the dichotomous Rasch model displays the item and person parameter estimates.
Figure 6.2 The variable map for the Rasch partial credit model displays the person ability and item step difficulty estimates.
Figure 6.3 The distribution of SRQPC Scores is essentially consistent across the three cohorts.
Figure 6.4 Scatterplot showing differences in expected responses between Maths groups
Figure 7.1 Residual plots for the model of Equation 7.1 show no systematic concerns.


concerns. 164

Figure 7.3 Residual plots for the model in Equation 7.7 indicate some left skewness.
Figure 7.4 Using the transformed response variable, residual plots demonstrate no systematic concerns.
Figure 7.5 Boxplots of Introductory and Intermediate Numeracy Scores by response to SRQ_9
Figure 7.6 Residual plots for the model in Equation 7.7 show some indication of skewness.

List of Tables

Table 2.1 Topics covered in MAB101
Table 2.2 Assessment tasks in MAB101 in 2004, 2005
Table 2.3 Percentage of male and female students for each cohort
Table 2.4 Numbers of students surveyed, by program type, gender and cohort
Table 3.1 What was beneficial – as reported by students who considered statistics beneficial in grade 11 and 12
Table 3.2 What wasn’t beneficial – as reported by students who considered statistics not beneficial in grade 11 and 12
Table 3.3 When I think of probability and statistics at school, I think of …
Table 4.1 Course breakdown of student cohort and respondents over two years
Table 4.2 Items and responses to Numeracy Questionnaire in order of difficulty
Table 4.3 Measures of fit from Rasch analysis indicate that the model fits well.
Table 4.4 Significance of predictors in the logistic regression for each item on the Numeracy Questionnaire
Table 4.5 Student responses to item N_7 according to mathematical background
Table 5.1 Questions in the SRQ are chosen to assess all aspects of reasoning at the full range of levels.
Table 6.1 Fit statistics for the dichotomous Rasch model
Table 6.2 Each item response was coded for the Rasch partial credit model on the basis of a substantive framework.
Table 6.3 Overall fit statistics for the Rasch partial credit model
Table 6.4 Individual item fit statistics for the Rasch partial credit model
Table 6.5 Expected responses to the SRQ predicted by the Rasch partial credit model
Table 7.1 SRQ items for which successful and unsuccessful students reflect a significant difference in mean Numeracy Scores
Table G Student responses to the SRQ

Statement of Original Authorship

The work contained in this thesis has not been previously submitted for a degree or

diploma at any other higher education institution. To the best of my knowledge and

belief, the thesis contains no material previously published or written by another

person except where due reference is made.

Signature: ____________________

Date: ___________

Acknowledgements

I wish to acknowledge the friends and colleagues whose support and companionship

made this process much easier and far more enjoyable than it would otherwise have

been, in particular:

• my supervisor, mentor and friend, Helen MacGillivray, whose professional

input and personal encouragement got me started and kept me going;

• my husband, Richard, whose commitment and devotion enabled me to focus

on the task;

• my children, Katie and Stephen, whose patience and understanding made it

possible;

• my friend, Christine, who shared the journey and prayed me through it.

Thank you.

Even youths grow tired and weary, and young men stumble and fall;

but those who hope in the Lord will renew their strength.

They will soar on wings like eagles;

they will run and not grow weary,

they will walk and not be faint.

(Isaiah 40:30-31)

Chapter 1 Introduction


1.1 Motivation

It is over twenty years since Barry Jones (1982) noted that:

Australia is an information society in which more people are employed in collecting,

storing, retrieving, amending and disseminating data than in producing food, fibres and

minerals and manufacturing products.

It is entirely appropriate then, that each year thousands of students enrol in

introductory statistics courses at universities: some by choice, most as a compulsory

component of a chosen program.

The awareness of this information society is reflected in recent educational reforms,

and the study of statistics has received increasing prominence at the school level,

with Chance and Data now forming a component of all Australian primary and junior

secondary curricula. Hence students enrolling in introductory tertiary courses do so

with varying degrees of statistical skill and experience. Yet it is not only this formal

statistical knowledge with which a student embarks on tertiary study. Students also

bring with them a collection of informal knowledge which they have accumulated

from different sources. As well as both these forms of statistical knowledge, a

number of components such as basic numeracy, mathematical inclinations, attitudes

towards statistics and self-efficacy also have the potential to impact on a student’s

ability to develop statistical thinking within the context of an introductory tertiary

course.

With the growth of statistics as a discipline in its own right, there has been an

increasing tendency amongst some of the statistical community to focus on the

differences between mathematics and statistics rather than the commonalities. This

has resulted in an inclination to devalue the role of numerical and mathematical skill

in the ability of students to develop statistical thinking. Establishing and quantifying


the importance of this role has the potential to facilitate a better understanding of the

programs and support which students need in order to develop their statistical

thinking at the secondary/tertiary interface.

1.2 Aims and Scope of the Study

The fundamental aim of this research is to investigate, explore and model the

statistical thinking of students at the interface of secondary and tertiary education. A

profile of students entering a general first year statistics unit is formed with respect

to: mathematical and general academic background; attitude, beliefs and self-efficacy

with regard to mathematics and statistics; basic numeracy; and statistical reasoning.

Relationships between these factors as well as course outcomes are explored, with

the strength of the relationship between statistical reasoning and numeracy being of

particular interest.

While the research has been conducted within the context of a single specific

introductory data analysis subject, it is believed that the broad findings are applicable

across a range of such courses, particularly within the Australian setting where

statistics courses are traditionally included in an undergraduate program. The subject

under investigation is conducted in a School of Mathematical Sciences and

completed within programs associated with science. While this setting is not

uncommon, particularly in Australia, in the past such students have less often been

the subjects of research into the development of statistical thinking than their social

science counterparts.

1.3 An Overview of the Statistics Education Literature

In the past ten to twenty years there has been a considerable increase in research in

the area of the teaching and learning of statistics, to the point where statistical

education is now a research field in its own right, supporting entire journals such as

the Journal of Statistics Education (JSE) and the Statistics Education Research

Journal (SERJ) as well as regular international conferences, of which the most

celebrated is the International Conference on Teaching Statistics (ICOTS), and

numerous local meetings throughout the world. Because of the breadth of issues

which touch on the research reported in this thesis, it is fitting to include an overview

of the broader field in which this work sits. The concept map in Figure 1.1 indicates

the links between the different aspects of the literature and assists in illustrating the

complexities of the research.

1.3.1 Statistical literacy, reasoning and thinking

The concepts of statistical understanding, literacy, reasoning and thinking form the

basis of much work and it is generally agreed that the attainment or improvement of

one or more of these should be the ultimate goal of all statistics courses (Garfield, et

al., 2002). However, in much of the literature, the terms statistical understanding,

statistical literacy, statistical reasoning and statistical thinking are used

interchangeably and frequently without definition (delMas, 2002a). Rumsey (2002),

Garfield (2002) and Chance (2002) seek to clarify the differences between three of

these concepts in a collection of articles, published following a symposium at the

2000 Annual Meetings of the American Educational Research Association (AERA).

Further elaborations and clarifications in continuation of this work are attempted in

Ben-Zvi and Garfield (2004).

The term statistical literacy most often refers to the type of understanding needed to
be a good consumer of statistics. After quoting a number of definitions for statistical
literacy, from

… the ability to understand statistical concepts and reason at the most basic level
(Snell, 1999)

to

People’s ability to interpret and critically evaluate statistical information and data-based
arguments appearing in diverse media channels, and their ability to discuss their
opinions regarding such statistical information (Gal, 2000),

Rumsey (2002) elects instead to define two different concepts: statistical competence
as

the basic knowledge that underlies statistical reasoning and thinking

and statistical citizenship as

the ultimate goal of developing the ability to function as an educated person in today’s
age of information.

Her concept of statistical competence then mirrors Snell’s early definition of
statistical literacy, while her statistical citizenship reflects Gal’s early definition.

Figure 1.1 The concept map illustrates the structure of the literature.
[Concept map not reproduced. Node labels: Statistical Literacy, Reasoning & Thinking; What it is; How it works; What to teach; How to teach; What to assess; How to assess; Student Learning; What; How; Technology; Gaps in skills; Attitudes & Beliefs; Misconceptions; Background; Course; Self-Efficacy; Reasoning.]

In a more recent exposition of statistical literacy, Gal (2004) rejects a common

understanding of statistical literacy as defining a basic set of skills required to

function successfully, for a broader understanding which involves the ability of

citizens to be successful data consumers. This relies upon the ability to interpret and

critically evaluate statistical information and to discuss and communicate reactions to

this information. Based on this definition, Gal builds a model of statistical literacy

founded on both knowledge and dispositional elements.

By analysing over 3000 school students’ responses in a number of studies, Watson

and Callingham (2003) argue that statistical literacy is a hierarchical construct. They

identify six levels of statistical literacy from idiosyncratic literacy where personal

beliefs and experience dominate engagement with context, through to critical

mathematical literacy which requires a questioning engagement with context and

proportional reasoning.

Garfield and Gal (1999) broadly define statistical reasoning as

the way people reason with statistical ideas and make sense of statistical information.

On the basis of research into students’ reasoning about samples, Garfield (2002)

further defines five hierarchical levels of statistical reasoning, ranging from

idiosyncratic reasoning, where the student knows and uses some statistical words

and symbols but often without understanding, through to integrated process

reasoning, where the student fully integrates all components, resulting in a complete


understanding. These levels are similar to Watson and Callingham’s six levels of

statistical literacy, but the different setting and reliance on verbal descriptors makes

them difficult to compare directly.

Statistical thinking generally refers to having a statistical mindset or approach to

problems. Chance (2002) lists a number of definitions, all of which include the

importance of data, the presence of variability and the modelling or explanation of

variation.

In an attempt to develop a framework for the workings of statistical thinking as

practised by statisticians, Wild and Pfannkuch (1999) put forward a four dimensional

model in which each of the dimensions: the investigative cycle, types of thinking, the

interrogative cycle and dispositions, is described in detail. Aside from some further

elaboration (Pfannkuch and Wild, 2004), this thorough and complex model has

received little further development. It describes a level of thinking higher than that

defined by Chance (2002) and beyond the level of thinking at the secondary/tertiary

interface.

In describing statistical reasoning and attempting to distinguish it from statistical

thinking, delMas (2004) acknowledges that statistical reasoning and thinking may

both be involved when working on a single task, and that this implies that they

cannot be distinguished by the content of a problem. He claims however that the two

can be distinguished by the nature of a task. He describes statistical thinking as

involving tasks such as knowing when and how to apply statistical procedures, and

statistical reasoning as involving tasks such as explaining why results were produced

or how a conclusion is justified. There is an implication that statistical thinking

requires a degree of creativity which is not involved in statistical reasoning.

The term statistical understanding is even less clearly defined than the other three

concepts. Even within the series of articles aimed at clarifying statistical literacy,

reasoning and thinking, the use of statistical understanding varies. DelMas (2002a)

introduces the articles by describing statistical literacy, reasoning and thinking as

“types of understanding”, implying that statistical understanding is a broader concept

encompassing the other three. Garfield defines reasoning in terms of understanding

and Rumsey uses statistical understanding without any clear definition. In


summarising the articles by Chance, Garfield and Rumsey, delMas (2002b) refers to

“statistical understanding, reasoning and thinking”, presumably taking statistical

understanding to be synonymous with statistical literacy.

Despite attempts to clarify these terms, there remains a degree of overlap, such that

continuing imprecision of their usage within the literature is inevitable. Broers

(2006) argues that, on the basis of this confusion and imprecision, a more appropriate

goal for statistics education is that of statistical knowledge which he argues is more

consistently definable and measurable.

1.3.2 Curricula and assessment

Clarification and modelling of the concepts and workings of statistical understanding

lead to an increased focus on what should be taught and assessed in introductory

statistics courses and how such teaching and assessment can be most effectively

carried out. Hogg (1991) calls educators to task for blaming difficulties on ill-prepared

and unmotivated students, while being poor teachers with little interest in

self-improvement. He emphasises the need for goals to be carefully selected and

course-specific. Moore (1997) pushes even more strongly for statisticians to

reconsider the content of introductory courses. His suggestion to emphasise

statistical thinking, rather than theory and computation, through the use of more data,

teaching of more concepts and fewer recipes and derivations, along with the

automation of computations and graphics, is now the norm for many courses. As a

result of studying the relationship between students’ perceptions of learning and

teaching, Petocz and Reid (2003) call for teachers to focus less on content and more

on encouraging students to appreciate the wider impact of thinking statistically.

Moore (1997) also emphasises a change in the context of statistical learning. He

argues that

our teaching must avoid the professionals’ fallacy of imagining that our first courses are

a step in the training of statisticians.

On the other hand, opening statistical vistas for undergraduate students provides

them with a greater understanding of the purpose and structure of statistics (Sowey,

1998). While the majority of students in such courses may never become


statisticians, some will and, hopefully, some will be attracted to statistics through

such courses (Wild, 2006). The danger in Moore’s approach is that the needs of

these students will be ignored by failing to provide the breadth and depth necessary

on which to base further study or even a perspective which encourages it. Indeed

much of the research in statistics education is performed from the perspective of

students who are uncomfortable with mathematics with little regard for the more

mathematically capable.

While the considerable discussion surrounding course content has resulted in

substantial changes to most curricula (Garfield, et al., 2002), there remains a

significant need to document the effects of this change, identify the most effective

educational techniques and build models of how students come to understand

statistical concepts (Chance and Garfield, 2002). In one effort to identify effective

educational techniques, delMas et al. (1999) report on research into the use of

computer simulations to improve student understanding of sampling distributions and

the Central Limit Theorem. They report that student understanding is optimised by

requiring students to predict the nature of the sample before carrying out simulations.

Meletiou-Mavrotheris and Lee (2002), believing that students’ statistical

understanding is limited by a deterministic mindset, improve student outcomes in an

introductory, non-mathematical course by adapting the curriculum to include

numerous activities aimed at specifically creating an awareness of variation in

everyday contexts.

Changes in content have led to changes in assessment practices with the call for

teachers to assess what they value (Chance, 2002). However, Garfield et al. (2002),

in a survey of US statistics educators, report that of all areas of statistics education,

assessment practices have undergone the least reform. Garfield and Gal (1999) claim

that traditional tests are narrow and insufficient. Their list of alternative assessment

items includes

• individual or group projects

• portfolios of student work

• concept maps


• critiques of statistical ideas or issues in the news

• objective format questions aimed at assessing higher level thinking

• minute papers, where students have a minute immediately following a class

to write something about what they have learnt.

Assessment practices can have an explicit educational impact. MacGillivray (1998;

2002; 2005) explains how the use of own-choice group projects can synthesise

knowledge for meaningful use.

1.3.3 Research instruments for measuring statistical reasoning

Aside from the assessment issues within the context of a specific course, there is the

question of how to assess the broader concepts of statistical literacy, reasoning and

thinking. While believing that these concepts are best assessed via one-on-one

communication and in-depth tasks such as projects, Garfield (2003) acknowledges

the need, particularly for use in research, for the development of an instrument which

is both easy to administer and score, and which accurately measures statistical

reasoning. The Statistical Reasoning Assessment (SRA) was developed and

validated to measure the effectiveness of a new statistics curriculum for high-school

students in the US (Konold, 1990; Garfield, 1991, 2003) and has been used in a

variety of settings since. This instrument consists of twenty multiple-choice

questions, which purport to provide scores on eight separate scales of correct

reasoning and eight scales of incorrect reasoning. However, tests of validity and

reliability have yielded low scores (Garfield, 1998). Of particular concern are the

extremely low correlations achieved between SRA scores and course outcomes

(Garfield, 2003). Whether this is an indication that

statistical reasoning and misconceptions are unrelated to students’ performance in a first

statistical course,

as Garfield suggests, or an indication of a weakness in the instrument is unknown.

An adaptation of the SRA by Sundre (2003) is aimed at addressing issues of low

internal consistency and improving the ease of scoring. Tempelaar (2004) argues

that a better use of the SRA is to combine correct and incorrect reasoning scores, and


uses this approach in a structural equation model which analyses relationships

between statistical reasoning abilities, attitudes and learning approaches (Tempelaar,

2006).

A pen and paper survey has also been designed to assess school students’

understanding of variation (Watson, et al., 2003). Sixteen multipart questions require

short answer responses and often an explanation of the students’ reasoning, making

this instrument more difficult to score than Garfield’s SRA. Responses are then

coded and combined using an Item Response model which suggests four increasing

levels of understanding: prerequisites for variation, partial recognition of variation,

applications of variation and critical aspects of variation.

1.3.4 Student learning

Just as teaching practices necessarily influence what students learn, an improved

understanding of how students learn should likewise impact good teaching practices.

In a study of school students from grade 3 to 9, Watson and Moritz (1999) identify

two unistructural-multistructural-relational (U-M-R) learning cycles operating in the

development of understanding relating to the comparison of two data sets.

Responses in the first cycle compare data sets of equal size but do not recognise the

issue of non-equal sample size, while responses in the second cycle recognise and

resolve this issue. Similar learning cycles are identified in the development of

students’ concept of average (Watson and Moritz, 2000).

Petocz and Reid (2001) use in-depth interviews and phenomenography to model

students’ different concepts of learning in statistics. They describe students’

conceptions in terms of six hierarchical levels: doing, collecting, applying, linking,

expanding and changing. At the lowest level, doing, students see learning as simply

performing the required statistical activities to pass the assessment. At the highest

level, changing, learning is seen as being about using statistical concepts in order to

change their views.

1.3.5 Student background

Constructivist pedagogy is based on the belief that students



construct their own knowledge by combining their present experiences with their

existing conceptions (Moore, 1997).

Hence it is important when modelling student learning, to consider the impact of a

number of background issues, including students’ prior statistical abilities,

mathematical skills and attitudes towards statistics.

Gal (2004) comments that:

... the current knowledge base about statistical literacy of school or university students

and of adults in general is patchy,

and points to the need for further research and empirical information. A profile of

what statistical understanding can be expected of school students in the areas of data

collection, data tabulation and representation, data reduction, and interpretation and

inference is being constructed based on Rasch modelling (Reading, 2002).

A survey of introductory statistics students in the US (Albert, 2003) shows that

students generally do not have a clear understanding of probability. While their

ability to calculate probabilities under an equally likely assumption is well

developed, they are much less able to use a frequency interpretation or a subjective

viewpoint of probability.

Konold (1995) indicates that students enter statistics courses with strongly held but

incorrect intuitions which are often extremely difficult to alter and that the situation

is further complicated by a student’s ability to simultaneously hold multiple

contradictory beliefs. Considerable time and effort have been put into documenting

and explaining a variety of misconceptions which people apply to make sense of

probabilities in everyday situations (Kahneman, et al., 1982; Shaughnessy, 1992).

The representativeness heuristic, by which people determine the likelihood of an event by

how well it represents a population, has received considerable attention. Under the

representativeness heuristic, the sequence of coin tosses HTTH is considered to be more

likely than the sequence HHHH. Hirsch and O’Donnell (2001) construct an

instrument to identify students who hold this misconception. Konold (1989) argues

that many misconceptions are the result, not of incorrect probabilistic reasoning, but

of an approach which always aims to predict the outcome of a single trial, rather than



the likelihood of a single event in a series of trials. He calls this the outcome

approach. Garfield and Chance (2000) include scales for measuring eight such

misconceptions in their Statistical Reasoning Assessment instrument.
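
The coin-toss comparison above can be checked by direct enumeration. The following is a minimal sketch, assuming a fair coin and independent tosses; the function names are illustrative and are not taken from any of the cited instruments. It shows that each exact sequence of four tosses has the same probability, while the class of outcomes with two heads in any order is more likely than the class with four heads, which is one plausible source of the intuition.

```python
from fractions import Fraction
from itertools import product


def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one exact sequence of independent tosses, e.g. "HTTH"."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob


def heads_count_probability(n_tosses, n_heads, p_heads=Fraction(1, 2)):
    """Probability of observing n_heads heads in any order in n_tosses tosses."""
    return sum(
        sequence_probability(seq, p_heads)
        for seq in product("HT", repeat=n_tosses)
        if seq.count("H") == n_heads
    )


# Exact sequences are equally likely under a fair coin (assumption).
print(sequence_probability("HTTH"), sequence_probability("HHHH"))
# Counts of heads are not equally likely.
print(heads_count_probability(4, 2), heads_count_probability(4, 4))
```

Under these assumptions both exact sequences have probability 1/16, whereas two heads in any order has probability 3/8.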

Although statistics is now frequently considered not to be a subfield of mathematics,

it is nonetheless true that statistics makes heavy and essential use of mathematics

(Moore, 1997).

Gal (2004) touches briefly on the numeracy requirements for statistical literacy,

acknowledging that there is some debate among statisticians regarding the level of

mathematical skill needed to understand statistical concepts. It is acknowledged as

one of the challenges of teaching statistics that:

Many students have difficulty with the underlying mathematics (such as fractions,

decimals, algebraic formulas), and that interferes with learning the related statistical

content. (Ben-Zvi and Garfield, 2004)

However, perhaps as a result of the democratisation of mathematics (Vere-Jones,

1995), there has been little formal research into the effect of mathematical

background on the development of statistical understanding in students. Cuthbert

and MacGillivray (2003) show that gaps in assumed skills form a barrier for learning

in mathematics courses for engineers. Gnaldi (2003) shows that in a statistics course

for psychologists, the statistical understanding of students at the end of the course

depends on students’ basic numeracy, rather than the number or level of previous

mathematics courses the students have undertaken. In modelling statistics as a

second language, Lalonde and Gardner (1993) conclude that mathematical ability

influences success in an introductory statistics course for psychologists both directly

and through mathematical anxiety. An analysis of background factors of students in

a UK business statistics unit shows that there are no simple predictors of success or

failure (Pokorny and Pokorny, 2005).

Before the effect of mathematical ability on statistical understanding can be

considered, the level of such ability amongst students needs to be understood. While

studies such as those conducted by the International Association for the Evaluation

of Educational Achievement (IEA) have investigated the level of mathematical skill



of school students, little formal research has been undertaken into the numeracy

levels of students entering university.

1.3.6 Student attitudes

Gal et al. (1997) point to the influence of attitudes and beliefs about statistics on

student learning. A number of instruments have been designed to measure attitudes

and beliefs towards statistics. Roberts and Bilderback (1980) construct the Statistics

Attitude Survey (SAS) using a 5-point Likert scale with 34 items. Wise (1985)

argues that many items on the SAS measure past achievement rather than attitude

and are inappropriate if students have no prior statistical experience. He proposes

instead another 5-point Likert scale: the Attitudes Toward Statistics (ATS). Roberts

and Reese (1987) compare the SAS and ATS on 280 students and conclude that there

is little difference between the two instruments with a correlation of 0.88. Waters et

al. (1988) reach a similar conclusion on the basis of 302 students. Sutarso (1992)

designs a 10-point Likert scale purported to consist of six factors: student’s interest

and future applicability, relationship and impact of the instructor, attitude toward

statistical tools, self-confidence, parental influence, and initiative and extra effort in

learning statistics. A survey consisting of 28 7-point Likert-type items designed by

Schau et al. (1995) includes scales of affect, cognitive competence, value and

difficulty.

Gal et al. (1997) stress the need to develop ways, other than Likert-type scales, of

assessing attitudes. In particular they point to the need to allow students to explain

their responses. All existing instruments are constructed from the supposition that

many student difficulties with statistics are the result of anxiety. While items are

framed in both a positive and negative light, there are no items that suggest that

statistics is dull or simplistic, as more mathematically capable students sometimes

report.

According to Bandura’s (1986) social cognitive theory, self-efficacy, that is one’s

own confidence in one’s ability to succeed in a particular task, strongly influences

one’s actual ability to succeed in that task. Self-efficacy influences choices, thought

patterns, reactions, effort expended and perseverance while struggling with a task.

Pajares and Miller (1995) provide evidence that, as judgements of self-efficacy are



task-specific, accurate measures of self-efficacy need to be based on task-specific

items in the same area. Finney and Schraw (2003) argue that previously used

measures of self-efficacy in statistics are too general in nature to accurately measure

the construct. They develop the Current Statistics Self-Efficacy (CSSE) and Self-

Efficacy to Learn Statistics (SELS) scales, which have been trialled on an

introductory statistics course in educational psychology. The difficulty with their

scales is that the items are so specific to the material to be covered in the course and

to which students have had no previous exposure, that scores on the CSSE

administered at the beginning of the course are predictably low and uncorrelated with

course outcomes.

1.3.7 Impact of technology

Pea (1987) describes the effect of technology on mathematics education in terms of

an amplifier metaphor, i.e. that with computers students can carry out many more

calculations in a much shorter time and with much higher accuracy, but with minimal

change to the quality of education. By comparison, Ben-Zvi (2000) proposes a more

optimistic reorganisation metaphor, arguing that appropriate use of technology can

bring about structural change by allowing the student to focus on higher order tasks.

This belief places improved use of technology in the role of a major mechanism for

the reform in statistics education which the increased presence of technology has

largely motivated, as,

The increasing use of computers, not just within the discipline but in society in general

has placed an increasing premium on qualitative reasoning in general and on statistical

reasoning in particular (Cobb, 1999).

1.4 Literacy, Reasoning, Understanding or Thinking?

The terms statistical understanding, statistical reasoning, statistical literacy and

statistical thinking are all used, often interchangeably, in the literature. Definitions

and elaborations of these terms have been described in Section 1.3.1. Because a

degree of overlap exists between these concepts and in order to simplify discussion,



we will use the term statistical reasoning to encompass statistical reasoning and

literacy and, at a fundamental level, statistical thinking. At its most in-depth level,

statistical thinking refers to the complexity of thinking utilised by professional

statisticians and described by Wild and Pfannkuch (1999). The statistical thinking

included in this research is restricted to the introductory level.

1.5 Structure of the Research

This study has been conducted within the context of an introductory data analysis

course delivered to undergraduate students over three cohorts: semesters I and II

2004 and semester I 2005. A small pilot study was conducted in the summer

semester prior to semester I 2004. The students’ courses are broadly associated with

science and the majority are enrolled in their first semester of tertiary education.

Ethical approval for the research was granted by the university’s ethics committee,

and students who participated did so on a voluntary basis after indicating their

agreement in writing.

Data were collected by the use of questionnaires administered during class. These

questionnaires were used to gather information on students’ mathematical and

general educational backgrounds, attitudes towards mathematics and statistics, basic

numeracy skills and statistical reasoning. Students were asked to provide their

student identification number as a means of connecting the different surveys. The

participation rate was very high amongst those students who attended classes during

the survey period. In a large course such as this there are a substantial number of

students who do not attend classes at any time during semester and a significant

number who join after the first week. Approximately seventy-five percent of

enrolled students participated in the study over the three cohorts. Some data from

course assessment are also used in part of the analysis.

Although a number of instruments exist for measuring attitudes towards statistics, it

was felt that none completely met the needs of this study as their design had been for

use in non-mathematical or non-scientific service courses. A Likert-style instrument

was designed, covering the aspects of affective reactions towards statistics; the



perceived value in studying statistics; the perceived difficulty of statistics; motivation

to study statistics; perceived links between mathematics and statistics; the use of

statistics in society; and students’ self-efficacy regarding mathematics and statistics.

In the area of self-efficacy, particular attention was given to constructing items which

were specifically relevant to an introductory data analysis course but also within the

students’ previous experience.

This initial Attitudinal Survey was administered in Semester I 2004 with a similar

follow-up survey at the end of semester. Due to the small number and non-

representative nature of respondents to the follow-up survey, as well as the

complexities of interpretation of the meaning of comparisons of initial and follow-up

responses, a follow-up procedure was not completed with later cohorts. As initial

analyses with the 2004 data indicated that only the self-efficacy component of the

attitudinal survey was a significant predictor of statistical reasoning, the opportunity

was taken in 2005 to adjust the survey, including items more specific to high school

experiences of statistics. It should be noted that attitudes and beliefs about statistics

do not constitute a major emphasis of this research but are included for completeness

and to improve the strength of conclusions regarding the investigation into statistical

reasoning and its relationship with variables such as numeracy and student

background.

Although there has been an increasing trend in recent years to develop and

administer diagnostic surveys to incoming tertiary students to gauge their levels of

mathematical preparation, the instruments used for these surveys have tended to be

constructed to meet specific needs. A tool designed specifically to measure the pre-

calculus skills relevant to an introductory data analysis course and appropriate at the

secondary/tertiary interface could not be located in the literature and needed to be

constructed for this survey. The multiple-choice Numeracy Questionnaire designed

for this purpose was administered at the beginning of semester to each of the three

cohorts in the study. The mathematical skills on which the Numeracy Questionnaire

focuses are those skills commonly associated with an introductory data analysis

course, namely: handling of fractions, percentages and decimals; application of

operations; evaluation of simple expressions; substitution into expressions; and



handling simple equalities and inequalities. Use of calculators was not permitted for

the Numeracy Questionnaire.

The Statistical Reasoning Questionnaire (SRQ) was developed specifically for this

study to assess the statistical reasoning of Australian students at the

secondary/tertiary interface. This instrument was informed by the SRA, a US

instrument which was not entirely appropriate in the Australian context, and by

research at the Australian primary and secondary school level. The short-answer,

multiple-choice SRQ was administered at the beginning of semester to each of the

three cohorts in the study with the addition of one question and substitution of

another in 2005. Use of calculators was permitted for the SRQ.

The analysis of the Numeracy Questionnaire and the construction and analysis of the

Statistical Reasoning Questionnaire, together with the analysis of relationships

between these and other variables, form the major thrust of this research.

The pressures of course requirements and the need to maximise student cooperation

prevented a follow-up Statistical Reasoning Questionnaire from being administered

as part of this study. Indeed, what would constitute a suitable follow-up statistical

reasoning instrument at the end of an introductory tertiary course is a matter for

further research. However, the performance of students in course assessment should

be considered as an important aspect of their statistical development, provided the

assessment is both authentic and relates to objectives that address statistical

reasoning. Hence the end of semester data from the course assessment are used as a

measure here, recognising that richer measures may be developed in future research.

This research uses quantitative statistical procedures to analyse relatively large data

sets. Because of the lack of previous research which considers the different aspects

of students’ development at this level, this research is investigative and exploratory

in nature. Descriptive procedures and exploratory data analysis are used to explain

the results of each questionnaire. Item response theory, in the form of Rasch

dichotomous and partial credit models, is used to validate the Numeracy and

Statistical Reasoning Questionnaires and to further understand the implication of

student responses. Logistic regression is used to identify factors which determine

whether a student’s response to individual items on the Numeracy Questionnaire is



likely to be correct or incorrect. Modelling of numeracy and statistical reasoning is

done through general linear models, the use of which is substantiated through

residual analysis. More complex modelling procedures such as structural equation

modelling are considered inappropriate without some form of pre-existing model.
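
As an illustration of the dichotomous Rasch model mentioned above, the following minimal sketch uses the standard textbook form, in which the probability of a correct response depends only on the difference between a person's ability and an item's difficulty, both measured in logits. The model equations are not stated at this point in the thesis, so this form, the function names and the example values are all assumptions made for illustration.

```python
import math


def rasch_probability(ability, difficulty):
    """P(correct response) under the dichotomous Rasch model.

    Both arguments are on the logit scale (assumed standard parameterisation).
    """
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def expected_total_score(ability, difficulties):
    """Expected number of correct responses over a set of items."""
    return sum(rasch_probability(ability, b) for b in difficulties)


# Hypothetical item difficulties, not values estimated in this research.
example_difficulties = [-1.0, 0.0, 1.5]
print(rasch_probability(1.0, 0.0))
print(expected_total_score(0.0, example_difficulties))
```

When ability equals difficulty the probability of success is 0.5, and an ability one logit above the item difficulty gives a probability of about 0.73.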

1.6 Outline of the Thesis

Chapter 2 describes the context of this research in terms of the course, MAB101

Statistical Data Analysis 1, and the students who are the subjects of the study. The

course is outlined, including its purpose, aims, content and methods; and the student

population of MAB101 is described through the information obtained from the

Background Information Survey completed by students as part of the study.

In Chapter 3, the construction and findings of the Attitudinal Surveys are described.

Findings from the initial 2004 survey demonstrate a generally positive attitude

towards statistics with students being confident of the links that exist between

mathematics and statistics. They are, however, suspicious of its use in society and

demonstrate internal conflict regarding their motivation to study it although their

confidence in their ability to do so is high. A follow-up survey which was completed

by fewer students appears to indicate that the attitudes of this smaller group tend to

be less positive at the end of the semester than they are at the beginning but there are

difficulties with the interpretation of follow-up results. These and the reasons for not

repeating the follow-up procedure in 2005 are also discussed. In an adjusted 2005

survey, the belief of many students is that statistics at school is beneficial only in the

final two years of high school.

Chapter 4 develops and analyses a Numeracy Questionnaire to measure the level of

basic numeracy possessed by students at the interface of secondary and tertiary

education, embarking on a degree program associated with science. The structure of

this multiple-choice questionnaire, emphasising skills needed for an introductory

data analysis subject, is explained and the results of the questionnaire amongst

MAB101 students over two years are presented and discussed. Validity of the

questionnaire is confirmed through fitting a dichotomous Rasch model to the data.



This model is also used to define five levels of increasing understanding

demonstrated by the students. General linear models are used to show that students’

total scores on the Numeracy Questionnaire are related to their result in high school

mathematics, whether or not they can be considered to be a maths student, their

gender, the level of mathematics they have previously studied and their self-efficacy.

Logistic regression is also applied to each individual item on the Numeracy

Questionnaire to identify which of these factors are significant indicators of success

or failure for the item. This chapter demonstrates how tertiary educators need greater

awareness of the lack of emphasis on mathematical skills in the pre-senior years, and

the extent of consolidation needed at the senior school level for students to be able to

apply these skills at the tertiary level. The results of this chapter are of significance

in themselves and have been accepted for publication as a paper Counting on the

Basics: Mathematical skills amongst tertiary entrants (Wilson and MacGillivray,

2007).
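
To indicate how an item-level logistic regression of the kind described above is read, the sketch below converts a set of fitted coefficients into a predicted probability of a correct response and into odds ratios. The predictor names and coefficient values are entirely hypothetical; they are not results from this research, and the fitting step itself is omitted.

```python
import math


def probability_correct(intercept, coefficients, predictors):
    """Predicted P(correct) from a logistic regression.

    coefficients and predictors are dicts keyed by predictor name.
    """
    log_odds = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-log_odds))


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in a predictor."""
    return math.exp(coefficient)


# Hypothetical coefficients for a single item (illustration only).
coefficients = {"self_efficacy": 0.4, "advanced_maths": 0.9}
student = {"self_efficacy": 1.0, "advanced_maths": 1.0}
print(probability_correct(-0.5, coefficients, student))
print(odds_ratio(coefficients["advanced_maths"]))
```

A coefficient of zero corresponds to an odds ratio of one, meaning the predictor does not change the odds of answering the item correctly.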

In Chapter 5, we introduce an instrument, the Statistical Reasoning Questionnaire

(SRQ), for use at the interface of secondary and tertiary education. Construction of

the SRQ is described, drawing on the work of Garfield (1991; 2003), and Watson

and Callingham (2003). Particular attention is given to avoiding questions based on

combinatorial reasoning and those relying on examples of coin tossing and dice

throwing. Complete details of the responses of the students in the study to the

individual items of the SRQ are given, demonstrating the range of abilities present in

students at the secondary/tertiary interface.

In Chapter 6, the Statistical Reasoning Questionnaire is analysed using the

techniques of Rasch methods. Two different approaches are taken. In the first

approach, responses are scored dichotomously and the simple Rasch model fitted as

with the Numeracy Questionnaire in Chapter 4. This dichotomous approach is used

as a basis for the definition of the SRQ Score. In the second approach, responses are

scored polychotomously and the more complex Rasch partial credit model fitted to

the data. This model forms the foundation for the introduction of a score which we

will denote by the SRQPC Score. The construction of this SRQPC Score is a

significant development and extension of the work of Watson and Callingham (2003)

and applies their framework of statistical literacy in a new scoring approach. The



Rasch partial credit model is also used to investigate the expected responses of

students to individual items in the SRQ, indicating questions which are heavily

influenced by the level of mathematics previously studied. This chapter is concluded

with a critical examination of the SRQ confirming its suitability for use as an

instrument for measuring statistical reasoning at the secondary/tertiary interface.
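
The difference between dichotomous and partial credit scoring can be sketched as follows. This is a minimal illustration assuming the standard form of the Rasch partial credit model, in which an item with m steps has step difficulties on the logit scale and a response may score 0, 1, ..., m. The function names and step values are hypothetical and are not estimates from the SRQ.

```python
import math


def partial_credit_probabilities(ability, step_difficulties):
    """Probabilities of scoring 0..m on one item under the partial credit model.

    The numerator for score k is exp(sum of (ability - step_j) for j = 1..k),
    with the empty sum for k = 0 equal to zero.
    """
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (ability - delta))
    weights = [math.exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]


def expected_item_score(ability, step_difficulties):
    """Expected score on the item for a person of the given ability."""
    probs = partial_credit_probabilities(ability, step_difficulties)
    return sum(score * p for score, p in enumerate(probs))


# Hypothetical two-step item (scores 0, 1, 2).
print(partial_credit_probabilities(0.5, [-0.5, 1.0]))
print(expected_item_score(0.5, [-0.5, 1.0]))
```

With a single step the model reduces to the dichotomous Rasch model, which is why the two scoring approaches can be compared on the same logit scale.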

Chapter 7 specifically addresses the major aim of this study in better understanding

factors which influence statistical thinking at the secondary/tertiary interface. In this

chapter, two distinct aspects of this are considered. The first of these examines

performance on the Statistical Reasoning Questionnaire for incoming students, using

general linear models to describe each of the SRQ Score and SRQPC Score in terms

of students’ numeracy, attitudes, mathematical backgrounds and demographic

variables. The major implication of this modelling is the usefulness and importance

of the students’ Numeracy Scores in explaining statistical reasoning. This feature is

such that numeracy dominates the prediction in nearly all models, with both

introductory and intermediate components being significant. The second aspect

considers students’ performance on the end of semester section of the assessment. In

this case, modelling is dominated by the effect of tertiary entrance score with the

SRQPC Score helping to explain the result in a way the SRQ Score does not. The

relationship between numeracy and individual items of the SRQ is also investigated

more closely, indicating that the link between numeracy and statistical reasoning is

not restricted to items which are more difficult or more obviously mathematical by

nature.

The thesis concludes with a summary of the research and its implications for the

teaching, assessment and advising of students. Possibilities for further research are

considered.

Chapter 2 The Context of the Study



2.1 Introduction

This study has been conducted within the context of an introductory data analysis

course delivered to undergraduate students. In this chapter, this context is described

in terms of the course and the students. Section 2.2 outlines the course, MAB101

Statistical Data Analysis 1: its purpose, aims, content and methods. Section 2.3

describes the student population of MAB101 through the information obtained from

the Background Information Survey completed by students as part of this study.

2.2 The Course: Statistical Data Analysis 1

Statistical Data Analysis 1, MAB101, is delivered by the School of Mathematical

Sciences at the Queensland University of Technology. It services students enrolled

in a range of science-oriented degree programs including: Mathematics, all the areas

of Applied Science, Biotechnology Innovation and Education; and double degree

programs involving mathematics or science, such as: Applied Science/Mathematics,

Applied Science/Information Technology, Applied Science/Law, Applied

Science/Business, Applied Science/Education, Arts/Applied Science,

Mathematics/Information Technology, Engineering/Mathematics and

Mathematics/Business. For almost all students in these programs, MAB101 is a

compulsory subject, most commonly scheduled for the first year and with the biggest

groups in the first semester of tertiary study. The course is given thrice each year,

including a summer semester with small numbers of students. In 2004, 450 students

enrolled in it in first semester and 130 in second semester, while in 2005, 330

students enrolled in it in first semester, and 180 in second.



MAB101 is unusual in the sense that while it acts as a service course for most

students, for others it is a core course in their major field of study. Students enrolled

in an applied science degree majoring in biochemistry, for example, may be unlikely

to fit in any other mathematics or statistics course during their three year program,

while those undertaking a Bachelor of Mathematics, a Bachelor of Applied Science

majoring in Mathematics, or a double degree incorporating the Bachelor of

Mathematics, may complete up to half of their mathematics program in statistics.

This dual role means that MAB101, by necessity, aims to equip all students with the

tools necessary to apply basic statistical analyses to data drawn from a range of

contexts, with minimal mathematical derivation, while providing a foundation for

techniques studied in further statistics courses.

According to the official course description, MAB101 aims to provide students with

the essential grounding in statistical concepts, methods and analysis of data suitable for

application to real issues and as a basis for handling data and variation in all areas of

modern science, technology, industry and associated fields.

It builds on the statistics component of high school mathematics, commencing with

the organisation, exploration and presentation of data. It emphasises choice of

appropriate techniques for presentation and analysis of data, interpretation of results

and reporting of conclusions. Key statistical skills and concepts are provided as a

foundation for more advanced statistics courses.

MAB101 teaches introductory data analysis from exploratory data analysis through

to multiple regression. Aspects of probability included in the course are limited to

the estimation of probabilities from data; understanding p-values; and calculating

normal probabilities. Probability and basic distribution theory are covered in a

separate course taken by fewer students. Material which is frequently taught in an

introductory data analysis course but is not included in MAB101 includes:

probability rules, distributions and mathematical aspects of introductory statistics.

All ANOVA and regression in MAB101 are done through computer software. Also,

models and simple formulae are made available in words or symbols with students

free to choose their preferred form. Table 2.1 lists the topics which are covered in



the course together with the approximate number of 50 minute lectures spent on each

section of material.

Topic | Lectures

Planning investigations, collecting, handling and presenting data: types of data; types of variables; coding data; design of data spreadsheet; bar charts; pie charts; contingency tables; dot plots; histograms; stem & leaf plots; boxplots; scatterplots. | 4

Data features & summary statistics: mean; median; quartiles; standard deviation; variance; skewness; parameters, models & estimates. | 2

Chi-squared tests (used to introduce hypothesis testing & p-values): testing proportions; testing independence. | 3

Normal data and applications of normal distribution: working with normal probabilities; behaviour of sample statistics, particularly average. | 3

Interval estimation: confidence interval for µ, σ known and unknown; confidence interval for µ1 - µ2, σ1, σ2 known; confidence interval for µ1 - µ2, σ1 = σ2 unknown; confidence interval for µ1 - µ2, paired; confidence interval for p; tolerance intervals; calculating sample size for desired precision. | 5

Hypothesis testing for one or two means, variances and proportions: for µ = µ0, σ known and unknown; for µ1 - µ2 = d, σ1, σ2 known; for µ1 - µ2 = d, σ1 = σ2 unknown; for µ1 - µ2 = d, paired; for σ1 = σ2; for p = p0; for p1 = p2; for σ = σ0. | 4

ANOVA: one way; randomized blocks; two way, interaction, plots, general linear model; multiple comparisons; residuals, test for equal variance. | 6

Regression: simple linear; residual plots and diagnostics; multiple regression. | 6

Table 2.1 Topics covered in MAB101



The content of MAB101 is presented to students in a manner that is rich in both data

and contexts. All topics are introduced, explained and illustrated using examples

which are of interest to students and with features relevant to the wide range of

disciplines from which the students are drawn. Most of these examples use data sets

which have been collected by past students as part of their assessment. These have a

range of variables and any particular data set is often used in different sections of the

course to illustrate selection as well as application of techniques, and the component

parts of a complex data investigation.

Classes in MAB101 consist of three 50 minute lectures per week and one 50 minute

practical class. Lectures build on the textbook Data Analysis: Introductory methods

in context (MacGillivray, 2004) and any PowerPoint slides or extra examples or

discussion are made available to students online after lectures have been delivered.

Much lecture time is devoted to examples, questions and computer demonstrations.

A weekly online diary keeps students informed of the activities carried out in classes.

In practical classes the focus is on students applying the methods they have learned

to given data sets using the statistical package Minitab. Ample guidance is provided

in written form and by experienced tutors. Students are also actively encouraged to

attempt the exercises given in the textbook (which either do not rely on computer

usage or are exercises on using given computer output), with solutions provided

online progressively throughout the course. Students are continually reminded that

learning in statistics comes only via doing, and course delivery and assessment are

structured to encourage students to take this approach. Analysis of variance and

regression are covered only through use of statistical software.

Assistance with the core learning of the course is provided through continuous

assessment, namely: fortnightly quizzes, a mid-semester test and a group project.

The full assessment schedule for 2004 and 2005 is shown in Table 2.2. Quizzes consist

mainly of multiple-choice and fill-in-the-blank questions, but these questions are

carefully designed to emphasise the selection, application and interpretation of

techniques. Because the main purpose of the quizzes is to encourage learning, after

the first quiz, students are allowed up to five days to complete each quiz. The mid-

semester test, like the end of semester exam, consists of questions which are longer

than, but similar in style to, those used in quizzes. All the assessment is constructed


around real datasets with computer output to be used in answering questions. As the

datasets usually involve many variables, a single dataset is often used more than once

to assess different aspects of the course.

A significant facilitator of learning in MAB101 is the group project. Groups of three

to four students choose a topic, plan an investigation, collect and analyse data and

report on their study. Students are required to synthesise the concepts and techniques

presented in the course as they work through the planning, investigating, analysing

and reporting stages of the project. The benefits to learning of this style of task have

been reported elsewhere (MacGillivray, 1998, 2002, 2005). A copy of the project

description and criteria is included in Appendix H.

Assessment                                Weight
Fortnightly quizzes (best 5 out of 6)     10% total
Mid-semester test                         10%
Group project                             20%
End of semester exam                      max 60%
Optional essay                            optional 10%¹

Table 2.2 Assessment tasks in MAB101 in 2004 and 2005
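The weighting in Table 2.2, together with the optional essay rule in footnote 1, amounts to a small calculation. The sketch below is illustrative only. The function name `final_mark`, the use of percentage marks (0 to 100) for each task, and the equal weighting of the five best quizzes are assumptions, not details given in this thesis.

```python
def final_mark(quizzes, mid_sem, project, exam, essay=None):
    """Combine task marks (each 0-100) using the Table 2.2 weights.

    Assumption: the five best quizzes count equally towards the 10% quiz weight.
    """
    best_five = sorted(quizzes, reverse=True)[:5]
    quiz_part = 0.10 * sum(best_five) / len(best_five)
    base = quiz_part + 0.10 * mid_sem + 0.20 * project

    # Without an essay the exam carries its maximum weight of 60%.
    without_essay = base + 0.60 * exam
    if essay is None:
        return without_essay

    # With an essay the exam is scaled to 50% and the essay is worth 10%,
    # but only if this advantages the student.
    with_essay = base + 0.50 * exam + 0.10 * essay
    return max(without_essay, with_essay)
```

Taking the maximum of the two schemes is one way to express "provided this advantages the student"; a weak essay then simply leaves the exam at its full 60% weight.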

The topics listed in Table 2.1 describe the statistical content which forms the basis of

MAB101. The methods and techniques of this content are the fundamental skills

which students who complete the course with a sound passing grade can be expected

to take from the course and with them into their own area of study. The statistical

skills assessed within the course constitute the measurable (Broers, 2006) and

therefore assessable goals of the course.

1 If a student chooses to submit an optional essay (on aspects of how statistics revolutionised science in the twentieth century) it is worth 10% and the exam is scaled to 50%, provided this advantages the student.

The skills assessed within MAB101 incorporate statistical thinking, although the

degree to which this is acquired by students depends on whether they participate in

all the learning experiences, as well as their inherent capabilities and background.

The content and data rich environment described above is the means by which

statistical thinking is taught. Varied contexts provide the opportunities to develop an

appreciation of the omnipresence of variation as well as a “general awareness and

critical perspective” (Gal, 2004) of the use of statistical methods. The use of real

data sets, particularly including numerous variables extraneous to the method being

illustrated, encourages in the student’s mind the formation of links and relationships

between concepts and techniques. This in turn develops a mindset which is, in

essence, statistical thinking.

2.3 Demographics and Educational Backgrounds of Students

Data for this study was collected from a total of 782 MAB101 students (382 in

semester I 2004, 103 in semester II 2004, and 297 in semester I 2005) who provided

identifying information and completed at least one survey instrument. Of these

identifiable students, participation rates for the Background Information Survey (see

Appendix A) completed in classes during the first week of semester were: 90% in

first semester 2004, 67% in second semester 2004, and 82% in first semester 2005.

Demographic information such as gender and course enrolment could be obtained for

most students who had not completed the Background Information Survey, while

information such as mathematical background could not.

In all three cohorts there were similar numbers of male and female students with

slightly more males in both first semester groups and slightly more females in the

second semester group. (See Table 2.3.)

Gender   I_04   II_04   I_05   Total
female     49      52     47      48
male       51      48     53      52

Table 2.3 Percentage of male and female students for each cohort


Students were drawn from 36 different university programs. In Table 2.4, these are

shown by cohort and have been classified as Education, including double degrees

involving education (Edu); Mathematics, including double degrees one of which is

mathematics, and Applied Science, maths major (Mat); Other double degrees

(ODD); Applied Science degree, non-maths major (ASn); Biotechnology (Bit); Other

(Oth). Over half the students are non-maths majors enrolled in an Applied Science

degree.

                   I_04            II_04            I_05         Total
Program type   male  female    male  female    male  female      (%)
Edu              23      11       8       5      12      17      9.7
Mat              42      20      12       0      36      21     17.2
ODD               7       9       8       9      10      13      7.2
ASn             101     131      19      39      79      73     56.5
Bit              16      14       0       1      13      13      7.4
Oth               4       0       2       0       7       2      1.9

Table 2.4 Numbers of students surveyed, by program type, gender and cohort

Students were also classified as maths or non-maths students, based on their program

and specific subject enrolment. This variable, used in analyses in Chapters 4 and 7,

indicates essentially an interest in mathematics and probably an intention for further

study in the area. As well as those classified in Table 2.4 as being in a Mathematics

program, approximately 85% of the education students and a small number of other

double degree and other students, were classified as maths students. Over all

cohorts, 26% were classified as maths students, the figure ranging from 23% in

second semester 2004 to 28% in first semester 2005. Over the three cohorts there is

dependence between gender and maths students (p-value < 0.001) with 64% of maths

students and 47% of non-maths students being male. There is however substantial

variation between cohorts with the percentage of maths students who are male

ranging from 56% in first semester 2005 to 79% in second semester 2004.
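The dependence between gender and maths-student status reported above can be illustrated with a chi-square test of independence on a 2×2 table. The following is a rough sketch only. The thesis does not state which test was used, so the Pearson chi-square statistic is an assumption, and the counts are approximations reconstructed here from the rounded percentages (26% maths students, 64% and 47% male) applied to all 782 identifiable students; they are not the actual survey counts.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat


# Approximate counts rebuilt from rounded percentages (an assumption).
n_total = 782
n_maths = round(0.26 * n_total)
n_non_maths = n_total - n_maths
male_maths = round(0.64 * n_maths)
male_non_maths = round(0.47 * n_non_maths)

# Rows: maths, non-maths students. Columns: male, female.
table = [
    [male_maths, n_maths - male_maths],
    [male_non_maths, n_non_maths - male_non_maths],
]
stat = chi_square_2x2(table)

CRITICAL_0_001 = 10.828  # chi-square critical value, 1 df, alpha = 0.001
print(stat > CRITICAL_0_001)
```

With these approximate counts the statistic lies well above the critical value, which is consistent with the reported p-value below 0.001.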


[Figure: per cent of students by Years Since School (YSS; categories 0, 1, 2, 3-5, 6-10, >10), one panel per cohort (I_04, I_05, II_04). Percent within levels of Cohort. Missing values excluded.]

Figure 2.1 Years since graduating from high-school, by cohort

[Figure: two plots of OP score. First plot by cohort: II_04 (62), I_05 (211), I_04 (197). Second plot by program: Double_deg (38), Maths (87), Education (43), Biotech (33), Sci_non-maths (260).]

Figure 2.2 OP scores reported by students for each cohort and each program type. Smaller OP values indicate higher achievement. Values in brackets represent number of students responding to this item.


Students were asked in the Background Information Survey to report the year in

which they completed high school. Figure 2.1 illustrates the responses for those 76%

of identifiable students who completed this item. In the two first semester cohorts,

55% to 60% of students had completed high school in the previous year with another

18% in their second year out of school. In the second semester 2004 cohort fewer

students (45%) had completed high school in the previous year and 23% in the year

before. This difference is due largely to the different scheduling of MAB101 in

some programs or to students changing courses. In all three cohorts approximately

97% of students had completed 12 years of schooling and almost 90% had attended

high school in Queensland. For approximately 75% of students in both first semester

cohorts this was their first semester at QUT.

Tertiary entrance in Queensland is determined largely on the basis of a single score,

the OP (overall position) score. OP scores range from 1 to 25, with 1 being the

highest score. As part of the Background Information Survey, students were asked to

record their OP score or an equivalent measure. Scores which were reported in

another form (e.g. tertiary entrance scores from other Australian states) were

converted to an OP equivalent using standard national information (QUT, 2004).

Because the survey used in first semester 2004 was an adaptation of a form routinely

completed by students, this section was described as optional and hence a number of

students who completed the Background Information Survey in the first cohort did

not report their OP score. The non-response rate for this item was 43% in that cohort

whereas it was only 10 to 15% in the other two cohorts. While the indications are

clear that this non-reporting of OP scores is non-random, it is difficult to predict the

exact nature of the bias. Although it appears that large numerical scores are

underreported, this cannot be assumed in any formal analysis. For this reason

considerable care needs to be exercised for this particular cohort when interpreting

any statistical models involving OP scores. (See Chapters 4 and 7.) Figure 2.2

shows the distribution of reported OP scores for each cohort and according to the

program division used in Table 2.4, except for the small group (9) of Others.

The median OP for each of the three cohorts is 7. The largest OP cut-off for any of

the programs represented in MAB101 is 13 (for an applied science degree and some

education degrees). Most programs have a cut-off of 10 or 12 but some courses


represented here have cut-offs as tight as 4 (for a science/law double degree).

Students reporting an OP score outside these cut-offs are generally mature age

students for whom special entry rules apply. Note that the median OP for students

undertaking a mathematics program is 3 although the cut-off for this course is 12.

[Figure: per cent of students at each level of maths (below_B, Maths_B, above_B), one panel per cohort (I_04, I_05, II_04). Percent within levels of Cohort.]

Figure 2.3 Level of maths reported as having been studied for each cohort.

[Figure: per cent of students with each Maths B result (N, P, D), one panel per cohort (I_04, I_05, II_04). Percent within levels of Cohort.]

Figure 2.4 Maths B results by cohort


Students were also asked to report the level of mathematics which they had studied at

high school and any mathematics courses which they had studied since leaving

school. A wide variety of mathematical backgrounds were reported. For the purpose

of this study, these backgrounds have been summarised by comparison with the core

Queensland algebra and calculus based subject, Maths B. Maths B, or an equivalent

standard, is assumed (but not enforced) prior knowledge for entry to the Science

faculty and hence into MAB101. Students are classified as having studied a level of

mathematics below Maths B, at Maths B or above Maths B. Students classified as

having studied above Maths B had either taken an extension mathematics subject in

high school (e.g. Maths C in Queensland) or had already studied some tertiary level

mathematics. The results of this classification are shown by cohort in Figure 2.3.

For the two first semester cohorts (of the 92 to 95% of students for whom this

information was available), 4 to 6% were classified as below Maths B, 38% as above

Maths B and the remainder as at Maths B. For the second semester 2004 cohort (of the

88% who responded), only 2% were below Maths B, 50% at Maths B and 48% above

Maths B.

It was also noted, from responses to the survey question on previous mathematics

courses, which students had some prior exposure to a statistics course, with the

percentage being 6 to 7% for each cohort. This information however has not been

used in any of the analyses of statistical reasoning as the level of statistics exposure

in this group is exceedingly variable and includes courses which cover little more

statistics than the standard high school curriculum with Maths B, as well as students

who had previously enrolled in MAB101 but withdrawn or failed. It should be noted

that most of the “failures” in this course are students who attend little of the course.

Results of high school mathematics subjects (and higher subjects where they had

been studied) were also requested from students and provided by 80% of identifiable

students. These results were variously expressed depending on the particular subject

studied. The most convenient classification for use in analyses was to describe the

Maths B (or equivalent) result as one of three levels: D, for students obtaining a

distinction or higher (the top two levels on a school-based five-point scale); P, for

other passing grades; N, for those students who had failed to pass or not studied

Maths B. These results are shown in Figure 2.4.
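The three-level classification of Maths B results can be written as a simple mapping. The sketch below is hypothetical: the grade labels (`VHA`, `HA`, `SA`, `LA`, `VLA`) are assumed names for the school-based five-point scale, the assumption that the middle level is the lowest passing grade is not stated in the text, and the function name `maths_b_level` is introduced here for illustration only.

```python
from typing import Optional

# Assumed labels for the five-point school scale, highest first.
FIVE_POINT_SCALE = ["VHA", "HA", "SA", "LA", "VLA"]
# Assumption: the middle level is the lowest passing grade.
PASSING = {"VHA", "HA", "SA"}


def maths_b_level(result: Optional[str]) -> str:
    """Classify a Maths B (or equivalent) result as D, P or N.

    `None` stands for a student who did not study Maths B.
    """
    if result is None:
        return "N"
    if result not in FIVE_POINT_SCALE:
        raise ValueError(f"unknown result: {result!r}")
    if result in FIVE_POINT_SCALE[:2]:  # top two levels: distinction or higher
        return "D"
    if result in PASSING:  # other passing grades
        return "P"
    return "N"  # failed to pass
```

Results reported on other scales would first need converting to this form, as the text notes that results were variously expressed depending on the subject studied.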


For all three cohorts the proportion of distinctions is between 55 and 60%. Not

surprisingly there are strong relationships between the Maths B result and both the

level of maths studied and whether or not a student is a maths student. For maths

students the proportion obtaining a D as a Maths B result is 87% and for students

who have studied beyond Maths B the proportion is 82%.

Combining the information on all demographic and background variables, it is

apparent that a good deal of consistency exists between the two first semester

cohorts. The variable which notably departs from this trend is the reported OP score

and, as has been described, this departure is clearly due to non-random non-

reporting by students in semester I 2004. In most aspects, the semester II cohort

departs more noticeably but not substantially from the other two cohorts. This cohort

was also considerably smaller than the other two.

Chapter 3 Attitudes Towards Mathematics and Statistics

3.1 Introduction

Gal and Ginsburg (1994) assert that many students arrive at statistics courses with

affective reactions and attitudes towards statistics which impact upon both their

learning process and learning outcomes, and argue the importance of educators being

sensitive to these factors. The purpose of including measures of non-cognitive

factors in this study was firstly to improve understanding of the profile of students

entering the introductory data analysis course, MAB101, and secondly to investigate

the possible impact of these factors on statistical learning.

While authors such as McLeod (1992) enunciate differences between the concepts of

attitudes and beliefs towards mathematics, this study follows the nomenclature of

most statistical education research, using the term attitudes to include both attitudes

and beliefs. The understanding is that these are relatively stable responses which

involve both positive and negative feelings (unlike emotions which are generally

unstable).

A number of instruments exist for measuring attitudes towards statistics. The

Statistics Attitude Survey (Roberts and Bilderback, 1980), the Attitudes Towards

Statistics (Wise, 1985), the Students’ Attitudes Towards Statistics (Sutarso, 1992)

and the Survey of Attitudes Toward Statistics (Schau et al., 1995) are all based on

five to ten point Likert scales which claim to measure one or more constructs related

to a student’s attitudes towards statistics. Although there has been some debate

about which of these is the best scale, it was felt that none completely met the needs of

this study. In general these instruments have been designed for use in non-

mathematical and frequently non-scientific service courses where students may be

expected to demonstrate a considerable level of mathematical anxiety. Accordingly


these scales fail to include items which allow for a student’s frustration with lack of

mathematical connections as may be experienced by more mathematically capable

students in the context of early statistical experiences. As MAB101 is delivered by a

mathematical sciences department and services mathematics majors and mostly

scientifically-based students, it was considered important to gauge the full range of

student attitudes in this study.

This chapter describes the construction and results of the Attitudinal Surveys

administered to students in this study. Section 3.2 explains the survey conducted in

2004 while in Section 3.3 the subsequent findings are described. A follow-up

Attitudinal Survey was administered at the end of the semester in 2004, the results of

which are explained in Section 3.4. Section 3.4 also explains the difficulties with the

follow-up results and the reasons for not repeating a follow-up in the following year.

As only one component of the 2004 Attitudinal Survey, that of self-efficacy, was

significant in the analysis of Statistical Reasoning (described in Section 7.2), it was

decided to adjust the Attitudinal Survey in 2005, retaining only the self-efficacy

component and replacing the other items with a selection of items which focuses on

statistical experiences at the school level. The 2005 Attitudinal Survey is described

in Section 3.5 and the responses analysed in Section 3.6. Section 3.7 concludes this

chapter with a discussion of the results.

3.2 Construction of the 2004 Attitudinal Survey

Existing surveys of attitudes towards statistics are all formulated in the style of Likert

scales, from five to ten point. In a Likert scale respondents are requested to rate their

level of agreement with a selection of statements. Although Gal et al. (1997) suggest

a move away from this style of instrument, the ease of administration and scoring,

together with the need to maximise the cooperation of students who were being

requested to complete a number of questionnaires, made a Likert scale the instrument

of choice for this study. Students were asked to rate their level of agreement with

nineteen items from strongly disagree through to strongly agree. Space was provided

on the reverse side of the form for students to explain their responses. However,


very few students took this opportunity and the majority of those who did added very

little information to what they had supplied via the ratings. Hence all analysis and

discussion of responses in this chapter are restricted to the Likert scale responses.

The Survey of Attitudes Toward Statistics (Schau et al., 1995) measures attitudes on

the basis of four separate subscales: affect, value, difficulty and cognitive

competence. These subscales directed the construction of the 2004 Attitudinal

Survey used in this study. Our survey includes items focusing on affect and value,

combines cognitive competence with difficulty, and also includes motivation,

perceived links between mathematics and statistics, use and self-efficacy. The

complete 2004 Attitudinal Survey can be found in Appendix B.

As is the custom with Likert scales, the survey contains a mix of positively and

negatively worded items to discourage unthinking responses. When calculating

scores for groups of items, scoring is reversed for those items which are negatively

worded.

Measures of affect focus on the feelings and emotions which tend to be engendered

in the student by statistics. In the 2004 Attitudinal Survey, the following three items

measure affect:

• Statistics is boring (A_1);

• I don’t like statistics because there never seems to be a right or wrong answer

(A_5);

• I feel insecure when I have to do a statistics problem (A_6).

Item A_5 investigates the feelings that students might have regarding the uncertain

nature of statistics. Sometimes students with a strong mathematical or scientific bent

entertain a black and white approach to the world and so feel uncomfortable with the

degree of subjective interpretation sometimes required in data analysis.

Attitudes regarding value reflect the importance or worth which students attribute to

their learning of statistics. This may be in relation to their field of study, their future

employment prospects or perhaps their daily life. In this survey, value is measured

by four items:


• Statistics will be valuable in my chosen career (A_4);

• Statistical skills will make me more employable (A_8);

• I use statistics in my everyday life (A_9);

• Understanding statistics is important in modern society (A_12).

Schau et al. (1995) differentiate between cognitive competence and difficulty. They

use cognitive competence to describe how difficult the individual personally finds

statistics, while difficulty describes the individual’s broader perception of the

difficulty and complexity of statistics. In formulating the 2004 Attitudinal Survey

one item was chosen from each of these aspects and the two have been combined

under the broad heading of difficulty. The two items are:

• I find statistics easy (A_3);

• Statistics is a complicated subject (A_7).

A further attitudinal aspect which is investigated by the survey is the students’

motivation to learn statistics. Two items relate to this aspect:

• I want to learn more statistics (A_11);

• I am taking this statistics unit only because I have to (A_14).

From the apparently contradictory nature of these items, one might expect a high

degree of negative correlation between the responses.

A further aspect of attitude which was considered relevant to this study was the link

which students perceive as existing between mathematics and statistics, which we

measured via the items:

• I would do better at statistics if I were better at maths (A_10);

• If you are good at maths you are more likely to understand basic statistical

concepts (A_13).

As the growth of statistics as a discipline in its own right has taken place, and the

emphasis in statistical education on reasoning rather than computation has


developed, the impression is sometimes given by researchers that any association in

the minds of students between mathematics and statistics is likely to be a barrier to

statistical learning. For example, Gal et al. (1997) refer to:

Beliefs about the extent to which statistics is a part of mathematics or requires

mathematical skills (e.g. statistics is all computations).

This assumption and its logical consequences need closer examination. For students

who have a positive attitude towards mathematics, such a link surely encourages a

positive attitude towards statistics.

One item has been included in the survey to measure students’ attitude towards the

use of statistics in society. This item

• Statistics can be used to justify almost anything (A_2),

constitutes the aspect use.

The final five items of this survey are aimed at obtaining a broad measure of

students’ self-efficacy regarding mathematics and statistics. Self-efficacy is one’s

confidence in one’s own ability to succeed in a particular task. As outlined in

Chapter 1, the emphasis in the literature on the need for measures of self-efficacy to

be task-specific has led to previous measures consisting largely of items with which

beginning students would be unfamiliar. This has been avoided in the Attitudinal

Survey by focusing on students’ perception of their ability regarding aspects crucial

to an introductory statistics course, to which they should have had previous exposure,

as well as more general aspects of ability. The five self-efficacy items are:

• I am good at maths (A_15);

• I expect to do well in this unit (A_16);

• I am not confident of my ability to read and interpret information presented

graphically (A_17);

• I expect to be able to do the computing necessary for this unit (A_18);

• I expect to have trouble determining which procedure to use to answer

questions (A_19).


For each of the nineteen items on the 2004 Attitudinal Survey, students were asked to

rate their level of agreement on a five-point Likert scale from strongly disagree

through to strongly agree. When scores were formulated from the survey, responses

were scored as -2, for strongly disagree, through to 2, for strongly agree, with the

scoring reversed for items which are negatively framed. Scores were calculated for

each of the seven attitudinal aspects described above by averaging over the items in

that aspect. If a general attitude scale is desired, it can be calculated as the total score

over the aspects: affect, value, motivation, use and one of the two difficulty items (see footnote 2).

We believe that the links aspect should not be included in such a scale as we feel

there is some question over which direction constitutes a more positive general

attitude. Self-efficacy is possibly best kept as a separate scale.
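The scoring scheme described above (responses scored from -2 to 2, reversed for negatively worded items, then averaged within each aspect) can be sketched as follows. This is a minimal illustration rather than the procedure actually used in the study. The grouping of items into aspects follows Section 3.2, but the set of items treated as negatively worded is an assumption based on their wording, and the names `RESPONSE_SCORES`, `NEGATIVE_ITEMS`, `item_score` and `aspect_scores` are introduced here for illustration.

```python
from statistics import mean

# Five-point Likert responses scored from -2 to 2.
RESPONSE_SCORES = {"SD": -2, "D": -1, "N": 0, "A": 1, "SA": 2}

# Items grouped by attitudinal aspect, as listed in Section 3.2.
ASPECTS = {
    "affect": ["A_1", "A_5", "A_6"],
    "value": ["A_4", "A_8", "A_9", "A_12"],
    "difficulty": ["A_3", "A_7"],
    "motivation": ["A_11", "A_14"],
    "links": ["A_10", "A_13"],
    "use": ["A_2"],
    "self_efficacy": ["A_15", "A_16", "A_17", "A_18", "A_19"],
}

# Assumption: items judged to be negatively worded from their text.
NEGATIVE_ITEMS = {"A_1", "A_2", "A_5", "A_6", "A_7", "A_14", "A_17", "A_19"}


def item_score(item, response):
    """Score one response, reversing the sign for negatively worded items."""
    score = RESPONSE_SCORES[response]
    return -score if item in NEGATIVE_ITEMS else score


def aspect_scores(responses):
    """Average the item scores within each aspect for one student.

    `responses` maps item codes (e.g. "A_1") to one of SD, D, N, A, SA.
    """
    return {
        aspect: mean(item_score(item, responses[item]) for item in items)
        for aspect, items in ASPECTS.items()
    }
```

A student who answers "agree" to every item would, under these assumptions, score 1 on value and links, -1 on affect and use, and 0 on difficulty and motivation, which shows why reversing negatively worded items matters before averaging.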

3.3 Results from the 2004 Attitudinal Survey

As with all instruments in this study, the 2004 Attitudinal Survey was completed by

MAB101 students in class during the first week of semester. The survey was

completed by 301 students in first semester and a further 94 students in second

semester. Despite the differences between these cohorts described in Chapter 2, the

pattern of response for most items in the Attitudinal Survey was very similar for the

two cohorts. In order to discuss differences where they do occur, the two cohorts

have been kept separate in the discussion which follows.

Figures 3.1 to 3.7 illustrate the distribution of responses for each cohort within each

item. Items are grouped by aspect. The ordering of responses in the plots (from

strongly disagree through to strongly agree, or from strongly agree through to

strongly disagree) has been chosen so that, for each item, responses from left to right

demonstrate what is generally considered to be an increasingly positive attitude. For

the link items, which could be ordered in either direction, greater acknowledgement

of a link between mathematics and statistics has been taken as a more positive

attitude.

2 Item A_7, which states “Statistics is a complicated subject”, is possibly best excluded, as agreement with it could reflect greater exposure to statistics rather than a negative attitude.


3.3.1 Affective responses of students towards statistics

Figure 3.1 indicates that the feelings engendered by statistics are generally positive,

with one item, A_1 “Statistics is boring” having a very strong neutral tendency.

Both the neutrality of this item and the general positivity of the affect aspect are a

little less pronounced in second semester than in first.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_1 Statistics is boring; A_5 I don’t like statistics because there never seems to be a right or wrong answer; A_6 I feel insecure when I have to do a statistics problem. Each symbol represents up to 5 observations.]

Figure 3.1 Responses for the affect aspect of attitudes tend to be positive.

3.3.2 Students’ perceived value in studying statistics

Figure 3.2 indicates that these students have a strong positive attitude regarding the

value of statistics in society. This is particularly evident with respect to the value of

statistics for future employment and in modern society. However, this positivity

does not extend to the value of statistics in daily life. The pattern of response to this

item (A_9) is interesting in its even split among “disagree”, “neutral” and “agree”,

rather than a strong neutral tendency. This item demonstrates the diversity of

appreciation which students have for the role of statistics in everyday life. There are

no apparent differences in the value aspect between the two cohorts.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_4 Statistics will be valuable in my chosen career; A_8 Statistical skills will make me more employable; A_9 I use statistics in my everyday life; A_12 Understanding statistics is important in modern society.]

Figure 3.2 Responses for the value aspect of attitudes are generally positive.

3.3.3 The motivation of students to learn statistics

Figure 3.3 indicates that the students’ attitudes regarding motivation are somewhat

contradictory. While there is strong agreement with the statement “I am taking this

statistics unit only because I have to,” there is also a relatively positive response to “I

want to learn more statistics.” Given that MAB101 is a compulsory unit for most of

its students, the first of these is not surprising and the second should therefore be

taken as an encouraging sign in spite of this compulsion. There is some

suggestion that the proportion of students responding “strongly agree” to the first of

these items (A_14) is slightly higher in second semester than in first.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_11 I want to learn more statistics; A_14 I am taking this statistics unit only because I have to.]

Figure 3.3 Responses for the motivation aspect of attitudes are somewhat contradictory.

3.3.4 The links which students perceive between statistics and mathematics

Figure 3.4 indicates that these students are in no doubt as to the links that exist

between mathematics and statistics. This is strongly evidenced in the non-personal

item A_13. The less positive personal link in A_10 should be interpreted in the light

of responses to A_15 “I am good at maths” (see Figure 3.7). Given that the students

generally see themselves as being good at maths, agreement that they would do

better at statistics if they were better at maths should be interpreted as

acknowledgement of a strong link between the two.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_10 I would do better at statistics if I were better at maths; A_13 If you are good at maths you are more likely to understand basic statistical concepts.]

Figure 3.4 Responses for the links aspect of attitudes are strongly positive.

3.3.5 How students see the use of statistics in society

Figure 3.5 indicates a generally negative attitude towards the use of statistics in

society. Because of the wording of the question, this attitude is probably best

interpreted as one of cynicism.

[Figure: plot of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for item A_2 Statistics can be used to justify almost anything.]

Figure 3.5 Responses for the use aspect of attitudes are negative.

3.3.6 Students’ perceived difficulty of statistics

Figure 3.6 indicates a relatively neutral attitude with regard to difficulty. The

attitudes are a little more negative with regard to the complexity of the subject than

the personal experience of it.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_3 I find statistics easy; A_7 Statistics is a complicated subject.]

Figure 3.6 Responses for the difficulty aspect of attitudes are relatively neutral.

3.3.7 Self-efficacy of students in statistics and mathematics

Figure 3.7 indicates that these students have a decidedly positive self-efficacy

regarding mathematics and statistics. The single item A_19 regarding ability to

choose a correct procedure has a neutral response. In general the responses are very

similar for the two cohorts although the second semester group is a little more

confident regarding their ability to do well in the subject. This is most likely due to

greater experience at university.

[Figure: plots of responses (strongly disagree, SD, through to strongly agree, SA) by cohort (I_04, II_04) for items A_15 I am good at maths; A_16 I expect to do well in this unit; A_17 I am not confident of my ability to read and interpret information presented graphically; A_18 I expect to be able to do the computing necessary for this unit; A_19 I expect to have trouble determining which procedure to use to answer questions.]

Figure 3.7 Responses for the self-efficacy aspect of attitudes tend to be positive.

In summary, the students demonstrate a generally positive attitude towards statistics.

They are, however, suspicious of its use in society and demonstrate internal conflict

regarding their motivation to study it. The first and second semester cohorts of 2004

differed in only a few minor respects.

3.4 2004 Attitudes Follow-up

In first semester 2004, students were asked to complete a follow-up attitudes survey

during the final week of the course. The form of this was identical to the initial

survey apart from minor changes in wording which made the survey appropriate to

this timing. The complete follow-up survey is found in Appendix C.

While 301 students had completed the survey at the beginning of the semester, only

194 completed the follow-up, 56 of whom could not be matched with their initial

data. Hence there were only 138 students for whom initial and end of semester

attitudes could be compared.

Comparing final and initial attitudes of this smaller group of students showed that the

students’ attitudes tended to be less positive at the end of the semester than they had

been at the beginning. For each of the attitudinal aspects, affect, value, motivation,

links, difficulty and self-efficacy, the follow-up score was significantly less than the


initial score (p<0.001 from pairwise comparisons within each aspect). The only

aspect for which this was not the case was the single item aspect use where the

follow-up score was just significantly greater than the initial score (p=0.05), possibly

reflecting a less cynical attitude towards statistics.
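The matching and pairwise comparison described above can be sketched as follows. This is a minimal illustration only: the student identifiers and aspect scores are hypothetical, the scoring of an aspect as a mean of item codes from 1 to 5 is an assumption, and a paired t statistic is used as one common choice because the specific test applied is not stated here. Records that appear in only one survey are dropped, in the same way that 56 of the 194 follow-up responses could not be matched, leaving 138.

```python
import math
import statistics

# Hypothetical records: student id -> score on one attitudinal aspect (1 to 5).
initial = {"s01": 4.2, "s02": 3.8, "s03": 4.0, "s04": 3.5, "s05": 4.4}
followup = {"s02": 3.4, "s03": 3.9, "s04": 3.0, "s05": 4.1, "s99": 2.5}

def paired_t(before, after):
    """Return (n, mean difference, t statistic) for students present in both surveys."""
    matched = sorted(before.keys() & after.keys())  # unmatched ids are dropped
    diffs = [after[s] - before[s] for s in matched]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(n)  # standard error of the mean difference
    return n, mean_diff, mean_diff / std_err

n, mean_diff, t = paired_t(initial, followup)
print(n, round(mean_diff, 3), round(t, 2))
```

A negative mean difference corresponds to a follow-up score below the initial score; a p-value would then be obtained from the t distribution with n - 1 degrees of freedom.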

However, the general approval rating for the course is high, particularly for a

compulsory course, with 65% of students rating the course as good or very good and

a further 31% rating it as satisfactory in the official QUT student evaluation of the

course.³ Given this apparent inconsistency, it is important to consider more carefully

how to interpret the follow-up results.

One issue which should be considered is the change during the course in students’

perspectives. Section 3.3 described the generally positive attitudes of the incoming

students. Observation and informal feedback from students during semester has

indicated that students tend to find during the course that there is more to statistics

than they had previously believed. This belief may have been encouraged by a

repetitive and relatively trivial coverage of statistics at school level. Hence the

demonstrated change in attitudes may reflect a growing awareness of the depth and

complexity of statistics as students are exposed to a wider range of applications and

techniques. The slight evidence of a decrease in cynicism demonstrated in the use

aspect may be related to this growing appreciation. Thus it is possible that the

changes in attitudes from the beginning to the end of a course such as MAB101 are

complex.

A second factor to consider is the obvious confounding which exists with the time of

follow-up. The low response rate at follow-up is not only a problem in itself but is

also indicative of this, namely that students are experiencing maximum pressure as

the semester draws to a close. This is a time when significant assessment is both

current and impending in this and all other courses. Many students skip classes when

under deadline pressure. Most are feeling overwhelmed with work. It is therefore

not unexpected that such feelings would considerably influence the results of an

attitudinal survey.

3 These results were obtained in I_2005. I_2004 were surveyed by different means, but II_2004 results were only slightly lower.


A third factor which could introduce further complexity is that the respondents for

the follow-up survey are not likely to be a random selection from the initial

respondents. Whether they are more or less likely to be struggling students, and

therefore the nature of any likely bias in their attitudes, is not known.

Finding a better time to administer the follow-up attitudinal survey is a complex

problem to solve. If follow-up is attempted sufficiently before the end of the

semester as to overcome this difficulty, then students will not have experienced the

complete impact of the course. The only other alternative, to survey students after

the completion of the course, is equally unsatisfactory. Although this could be easily

carried out using email lists, it would be unlikely that many students would respond

and in such situations it is usually found that those who do respond tend to be those

who are more emphatic in their attitudes than the complete cohort.

As described below in Section 3.5, the Attitudinal Survey was changed in 2005 to

more closely investigate experiences of statistics at the school level. This change,

together with the complications of interpretation described above, meant that follow-

up surveys were not conducted with subsequent cohorts in the study.

3.5 Construction of the 2005 Attitudinal Survey

After the completion of the 2004 data collection stage of this study, initial modelling

was carried out to investigate relationships between students’ backgrounds (see

Chapter 2), attitudes, basic numeracy skills (see Chapter 4) and statistical reasoning

(see Chapter 5). (Chapter 7 describes the relationships between these and the

techniques and results of this modelling over the entire study.) All of the attitudinal

aspects described above (Section 3.2) were included as possible predictors of

statistical reasoning, while only the self-efficacy component was considered as a

relevant predictor for numeracy. In these initial models, self-efficacy was the only

attitudinal aspect which was convincingly significant in contributing to the

explanation of either numeracy or statistical reasoning. An interaction between

Numeracy Score and affect was significant in modelling statistical reasoning but was

seen as being due to a small number of unusual observations. Some of the other


aspects appeared significant in complex and seemingly spurious interactions for

subgroups of students. For these reasons, apart from the self-efficacy aspect, much

of the 2004 Attitudinal Survey was not administered in 2005. Rather, the

opportunity was taken to survey the students more explicitly on their school

experience of statistics.

In 2005, students were asked to respond to six Likert-style items: A_15 to A_19 on

self-efficacy and A_14 on motivation, of the 2004 Attitudinal Survey. Students were

also asked to complete the statement:

When I think of probability and statistics at school, I think of...

and to indicate whether or not they found statistics beneficial at each of three

educational stages:

• grade 11 and 12 (senior secondary)

• grade 8 to 10 (junior secondary)

• grade 4 to 7 (middle to upper primary),

and to indicate what was or was not beneficial. The complete 2005 Attitudinal

Survey can be found in Appendix D.

3.6 Results from the 2005 Attitudinal Survey

The 2005 Attitudinal Survey was completed by 247 MAB101 students during class

in the first week of first semester. Responses to the open-ended questions were

completed on the questionnaire while responses to the six items from the 2004

survey were recorded on digitally scanned sheets. Twenty-one students completed

the open-ended questions only.

Figure 3.8 illustrates the responses for 226 students to the six Likert-style items. For

each item in the self-efficacy scale, responses are generally more positive than for

2004. Hence this group of students would appear to be even more confident than in

2004. The responses to item A_14, “I am taking this unit only because I have to,”


are more negative than in 2004, particularly with regard to the proportion who

strongly agree with this statement.

[Figure 3.8 panels: distribution of responses for the 2005 cohort; categories ordered SD, D, N, A, SA for A_14, A_17 and A_19, and SA, A, N, D, SD for A_15, A_16 and A_18]

A_14 I am taking this statistics unit only because I have to

A_15 I am good at maths

A_16 I expect to do well in this unit

A_17 I am not confident of my ability to read and interpret information presented graphically

A_18 I expect to be able to do the computing necessary for this unit

A_19 I expect to have trouble determining which procedure to use to answer questions

Figure 3.8 Responses for the Likert items repeated in 2005
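In the figure panels of this chapter, the response categories run from SA to SD for positively worded items and from SD to SA for negatively worded ones (A_14, A_17 and A_19 here), apparently so that the more positive attitude appears at the same end. If item responses are to be combined into a single self-efficacy score, the negatively worded items would need to be reverse scored first. A minimal sketch follows, assuming a coding of 1 to 5 and hypothetical responses; neither the coding nor the function name is taken from the survey itself.

```python
# Assumed numeric coding of the five response categories.
CODES = {"SD": 1, "D": 2, "N": 3, "A": 4, "SA": 5}
# Negatively worded items, for which agreement indicates a less positive attitude.
NEGATIVE_ITEMS = {"A_14", "A_17", "A_19"}

def attitude_score(item, response):
    """Score a response so that 5 always denotes the most positive attitude."""
    raw = CODES[response]
    return 6 - raw if item in NEGATIVE_ITEMS else raw

# Hypothetical responses from one student to the self-efficacy items.
answers = {"A_15": "A", "A_16": "SA", "A_17": "D", "A_18": "A", "A_19": "N"}
scores = {item: attitude_score(item, r) for item, r in answers.items()}
self_efficacy = sum(scores.values()) / len(scores)
print(scores, self_efficacy)
```

Under this coding, disagreeing with A_17 ("I am not confident...") contributes the same score as agreeing with A_15 ("I am good at maths").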


Responses to the question

Did you find probability and statistics (sometimes called chance and data)

beneficial

a) in grades 11 and 12?

b) in grades 8 to 10?

c) in grade 4 to 7?⁴

are summarised in Table 3.1. Nineteen of the 247 students did not respond to any of

the three components of this question. Of the students who responded to at least one

of the components, there were 11% who felt that statistics had not been beneficial at

any level and another 3% whose only expressed opinion was that it was not

beneficial. There were 25% of students who felt it had been beneficial at all levels.

Perhaps the most notable feature of these figures is that 78% of students felt that

probability and statistics in grades 11 and 12 was beneficial and that 29% felt it was

beneficial only at that level, with another 9% indicating that it was beneficial in

grades 11 and 12 but not expressing an opinion regarding earlier years. While the

response for the senior years is pleasing, it raises the question of the standing of the

chance and data strands in the Queensland P-10 syllabus undertaken by these

students.
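The categories used in this summary depend on how each student's three answers are combined, including answers left blank. The sketch below shows one possible way of classifying a response pattern and of computing percentages among those who answered at least one component. The data are hypothetical and the classification rules are an assumption made for illustration; they are not necessarily the rules applied in the study.

```python
from collections import Counter

# Each response is (senior, junior, primary): True = beneficial,
# False = not beneficial, None = no opinion given. Data are hypothetical.
responses = [
    (True, True, True),
    (True, False, False),
    (True, None, None),
    (False, False, False),
    (False, None, None),
    (None, None, None),
    (True, False, True),
]

def classify(resp):
    """Assign one response pattern to a summary category (assumed rules)."""
    senior, junior, primary = resp
    if all(r is None for r in resp):
        return "no response"
    if all(r is True for r in resp):
        return "beneficial at all levels"
    if all(r is False for r in resp):
        return "not beneficial at any level"
    if True not in resp:
        return "only opinion: not beneficial"
    if senior is True and junior is False and primary is False:
        return "beneficial in grades 11 and 12 only"
    if senior is True and junior is None and primary is None:
        return "beneficial in grades 11 and 12, no opinion on earlier years"
    return "other mixed pattern"

counts = Counter(classify(r) for r in responses)
answered = sum(v for k, v in counts.items() if k != "no response")
for label, count in counts.items():
    if label != "no response":
        print(f"{label}: {100 * count / answered:.0f}%")
```

Percentages are taken over students who answered at least one component, which mirrors the denominator described in the text.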

In response to the open-ended question of what was or was not beneficial, most

students who responded referred to the usefulness (or lack thereof) of their learning

with regard to life or education, or to specific or generic skills they had acquired.

Table 3.1 gives a summary of the benefits reported by those students who considered

statistics beneficial in grade 11 and 12.

Specific skills which students felt they had acquired included reading results of

surveys, finding probabilities and creating graphs, while generic skills included

problem solving, logic, and reading and understanding ‘things’. Students who felt

statistics had been beneficial to other subjects generally referred to using it to handle

data in science subjects at school, while one student mentioned its use in a business

4 Of those who expressed an opinion for at least one level, 13% expressed no opinion regarding grades 4 to 7, and 11% expressed no opinion before grades 11 and 12, possibly due to lack of recall.


subject. Students who felt statistics was useful for further study generally presumed

that what they had learnt at school would be helpful in MAB101.

What was beneficial                      Percent
useful for further study                    13
useful for life                              8
useful for other subjects                    9
useful for further study and job             1
useful for further study and life            1
useful for other subjects and life           2
practical                                    1
gain specific skills                        12
gain generic skills                          6
better mark                                  2
interesting                                  2
challenging maths                            1
different maths                              1
easy                                         1
fun maths                                    1
good foundation                              2
builds on previous years                     1
assignment                                   1
understood gambling                          1
no response                                 34

Table 3.1 What was beneficial – as reported by students who considered statistics beneficial in grade 11 and 12

Table 3.2 summarises the reasons given by those students who considered statistics

not to be beneficial in grade 11 and 12. These students most often referred to the

lack of practical experience with statistics, with one student commenting:

We learnt about it but we didn’t apply it to real life situations. We didn’t get to use the

stuff we learnt.

There are clearly substantial differences in students’ statistical experiences at school.

Of those students who felt statistics was beneficial at one stage of high school and

not at another, only a few expanded on both what was and what was not beneficial.

It was possible to find cases where Student A reported a specific benefit at Stage 1 of

schooling and a specific non-benefit at Stage 2, while Student B reported the same


benefit at Stage 2 and the same non-benefit at Stage 1. Some of this is attributable to

external factors. Different schools and teachers provide different learning

experiences within the same curriculum. However, some differences are part of the

students’ subjective learning experience. The implication from some responses, for

example:

Too young to care,

(from a student who felt statistics was beneficial only in grade 11 and 12)

is that a certain level of maturity is required before much learning can be interpreted

(even retrospectively) as beneficial. The challenge for teachers to engage students in

meaningful applications is ever-present.

What wasn’t beneficial                   Percent
not practical                               15
couldn't see use                             9
not interested                               9
not used yet                                 7
not useful for life                          7
difficult                                    7
not sufficient depth                         4
too much repetition                          4
poorly taught                                2
too many numbers                             2
no response                                 34

Table 3.2 What wasn’t beneficial – as reported by students who considered statistics not beneficial in grade 11 and 12

Responses to the item:

When I think of probability and statistics at school, I think of…

are summarised in Table 3.3, with percentages of students who mentioned each

concept. As the item is open-ended, students are able to list as many thoughts as

they wish. Hence, responding with one of the concepts listed does not exclude a

student from responding with another.


…I think of…                                             Percent
Data representations                                        28
    graphs; pie charts; charts; tables
Data                                                        21
    data; data manipulation
Experimental                                                20
    data collection; experiments; surveys; assignments
Negative experience                                         17
    negative experiences; boring/tedious; little meaning/no point; repetitive
Calculations                                                13
    fractions; percentages; calculations; maths
Concrete materials⁵                                         13
    coins; dice; marbles; cards
Effort                                                      10
    basic/simplistic; hard effort; complex; easy; probability easy/stats difficult; different; applications difficult
Probability                                                  9
    chance; tree diagrams; permutations and combinations
Process                                                      7
    conclusions; problem solving; analysis; skills
Statistical measures                                         6
    mean/median/mode; hypothesis tests; st. dev/sigma/correlation
Applications                                                 5
    gambling; other specific examples
Distributions                                                4
    bell curves; distributions; hypothesis tests; continuous/discrete; frequencies; p and q
Classroom                                                    4
    teacher; class
Positive experience                                          3
    useful; positive experience
Words                                                        2
    definitions/formulae/jargon; census
Computers                                                    1
    computing; graphics calculator

Table 3.3 When I think of probability and statistics at school, I think of …

5 Although referred to as “concrete materials” there is no knowledge of whether the student is recalling handling, observing or simply discussing these objects.
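Because each student could mention several concepts, the percentages in Table 3.3 are calculated per concept over all students and together sum to more than 100. The following sketch illustrates this kind of multiple-response tally. The coded responses are hypothetical and the variable names are illustrative only.

```python
from collections import Counter

# Hypothetical coded responses: each student may mention several concepts.
coded = [
    {"data representations", "calculations"},
    {"data", "experimental"},
    {"negative experience"},
    {"data representations", "data", "concrete materials"},
]

# Count the number of students mentioning each concept.
mentions = Counter(concept for student in coded for concept in student)
# Express each count as a percentage of all students, not of all mentions.
percent = {c: 100 * k / len(coded) for c, k in mentions.items()}

for concept, p in sorted(percent.items(), key=lambda kv: -kv[1]):
    print(f"{concept:22s} {p:5.1f}")
print("total:", sum(percent.values()))  # exceeds 100 because categories overlap
```

Dividing by the number of students rather than the number of mentions is what allows the column to exceed 100 in total.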


Most common among students’ responses to this item are data representations and

data. One pleasing aspect of the students’ responses is that 20% of students mention

some form of hands-on experience (experimental) such as data collection,

experiments, surveys and assignments, reflecting an increasing emphasis on such

activities. Experiences such as these have more potential to result in genuine

learning (Moore, 1997; MacGillivray, 2002) than exercises using materials or

examples such as dice, coins, cards and marbles which have been recalled by 13% of

students.

Figure 3.9 further elucidates the list in Table 3.3 by focussing on the top seven

responses and illustrating percentages for those students who considered statistics

beneficial in grade 11 and 12, and those who did not, as well as the whole group. In

most cases the thoughts of the groups are comparable. The most obvious difference

is that the percentage of students who refer to some sort of negative experience is, as

expected, considerably higher among students who felt that statistics was not

beneficial in grade 11 and 12.

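[Figure 3.9: bar chart with a vertical axis from 0 to 40 showing the percentage of students mentioning each of the top seven concepts (data summaries, data, experiments, negative experience, calculations, concrete materials, effort), for three groups: All, Ben and Not Ben]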

Figure 3.9 Thoughts on statistics according to whether or not it was considered

beneficial in grade 11 and 12

Two other aspects are worthy of comment. Firstly, the percentage of students who

thought of concrete materials (i.e. dice, coins, cards or marbles) was 19% among


students who felt statistics was not beneficial and only 11% among those who felt it

was. While there is no way of knowing from the data which stage of education the

students are associating with concrete materials, this response might imply that such

examples do not help students appreciate the benefits of statistics.

Secondly, the percentage of students who mentioned concepts related to effort was

12% for those who felt statistics was beneficial and only 4% for those who did not.

It should be noted that aspects related to effort included reports of statistics being

both easy and difficult, simplistic and complex, with one student commenting that

statistics was easy and probability was difficult. Again it cannot be assumed that the

stage of education associated with the memory of effort is grade 11 to 12. However,

if there is any implication to be taken from this response, it must be that helping

students to appreciate the benefit of statistics requires engaging them in the process

and possibly revealing some of the complexity of the area.

3.7 Discussion

Acknowledging the potential impact of non-cognitive factors on student learning, the

Attitudinal Surveys described in this chapter were administered to MAB101 students

in order to better understand the attitudinal profiles of these students. The 2004

survey used Likert scales to measure: Affect, Value, Motivation, Links between

mathematics and statistics, Use of statistics in society, Difficulty and Self-efficacy.

In 2005, the self-efficacy component was repeated and specific school experience of

statistics was surveyed.

In general, the 2004 cohort of students demonstrated positive attitudes towards

statistics with regard to their affective feelings and the value of learning statistics.

They displayed relatively neutral attitudes regarding the difficulty of statistics,

conflicting attitudes regarding their own motivation to learn statistics and negative

attitudes with regard to the use of statistics in society. There was no doubt in the

students’ minds as to the link that exists between statistics and mathematics. The

self-efficacy of the students was positive as was the case when this section of the

survey was repeated with the 2005 cohort.


When considered as a group, MAB101 students present at the secondary/tertiary

interface with a generally positive attitude, particularly regarding their confidence in

their own ability to learn statistics. This confidence makes these students different

from those often discussed in research into students’ attitudes towards statistics,

especially those in psychology statistics courses where such non-cognitive research

is often based. However, this positive attitude needs to be interpreted in the light of

the lack of breadth and depth in statistics to which these students have been exposed

at school level.

In exploring the students’ statistical experiences at school via the 2005 Attitudinal

Survey, 85% of students felt that during at least one of the three stages of schooling

(grade 4-7, grade 8-10, grades 11 and 12) the study of statistics had been beneficial.

Approximately a quarter of students felt that it had been beneficial at all three stages

while over a third felt it had been beneficial during grades 11 and 12 only. Over

three-quarters felt that the study of statistics had been beneficial in at least grades 11

and 12.

In discussing the perceived benefits of statistics, students most commonly referred to

its usefulness in other areas or to general and specific skills they had acquired.

Reasons statistics was considered non-beneficial referred most commonly to it being

not practical or to students’ inability to see its usefulness, with some

acknowledgement that student immaturity may be responsible for this perception.

Students in the 2005 cohort gave a variety of responses to the item:

When I think of probability and statistics at school, I think of…

Comparisons of these responses on the basis of whether or not students felt statistics

had been beneficial in grades 11 and 12 suggest that engaging students in a way that

connects with the effort they are required to exercise may increase the benefit they

perceive in their study.

While school experiences of statistics are varied, the attitudes towards statistics of

incoming MAB101 students can be summarised as generally positive, particularly in

the area of self-efficacy. Addressing specific non-cognitive issues with individual

students is always necessary. In MAB101, this requires supporting those who are


traditionally seen as needing encouragement due to poor background or motivation,

while being aware of the challenge experienced by those who are encountering a

broader perspective to statistics than that to which they have been accustomed, and

the challenge of engaging those students who would prefer a more mathematical

approach to statistics.

Chapter 4 Numeracy

4.1 Introduction

Over the past forty years, numerous studies at the national and international level

have been aimed at determining levels of numeracy and mathematical skills

possessed by students at different educational stages. Such studies include those

conducted by the International Association for the Evaluation of Educational

Achievement (IEA), extending from the First International Mathematics Study

(FIMS 1963-1967) to the most recent Trends in International Mathematics and

Science Study (TIMSS 2002/03). Most of these have concentrated on primary and

junior secondary school, the upper age limit generally being fourteen to fifteen years

– the end of compulsory schooling in most countries.

The British Cockcroft Report (Cockcroft, 1982) first popularised the term

‘numeracy’, giving an informal definition of:

an “at-home-ness” with numbers and an ability to make use of the mathematical skills

which enable an individual to cope with the mathematical demands of his everyday life.

While educational literature differentiates between the terms ‘numeracy’,

‘quantitative literacy’ and ‘mathematical skill’, in many situations such differences

are insubstantial and Cockcroft’s definition brings to our attention some important

matters. In particular it emphasises that both familiarity and skills are needed to

achieve applicability, as well as highlighting the fact that a desired level of

competence depends on the specific demands of an individual’s circumstances.

Students beginning a degree program at university in a quantitative area such as

science will find that a certain level of mathematical skill is assumed for

understanding and successful completion of their course. There is a tendency to

think that the completion of an algebra and calculus based senior school mathematics

subject provides more than sufficient mathematical preparation for such a course. In

some cases such a mathematics course is set as a formal prerequisite while in others

it is taken to be assumed prior knowledge. In either situation, the experience of

academics has been that the range of mathematical preparation with which students

arrive at university, even amongst those who have completed the school course in

question, is such that any assumption can be made only with caution (Coutis, et al.,

2002). Although this has resulted in many universities recognising the need to

provide support for students in this area, little formal research has to date been

carried out into the mathematical skills of students at the secondary/tertiary interface.

It has been shown that a lack of mathematical skills provides a barrier for learning in

engineering courses (Cuthbert and MacGillivray, 2003) and that basic numeracy is a

predictor of outcomes in introductory statistics (Gnaldi, 2003). Educators in the

tertiary context require a sound awareness of the level of skills which students are

likely to possess if they hope to understand the difficulties which students encounter

and provide them with the resources to overcome such difficulties. In particular, this

includes an awareness of the level of confidence retained by students from their pre-

senior school mathematics study.

In this chapter, a Numeracy Questionnaire to measure the level of basic numeracy

possessed by students at the interface of secondary and tertiary education, embarking

on a degree program associated with science, is developed and analysed. Section 4.2

describes the structure of this multiple-choice questionnaire. In Section 4.3.1, the

results of the questionnaire amongst 566 students over two years are presented and

discussed. A dichotomous Rasch model is fitted to the data in Section 4.3.2 and used

in Section 4.3.3 to define five levels of understanding demonstrated by the students.

In Section 4.3.4, general linear models are used to investigate relationships between

the students’ total scores on the Numeracy Questionnaire and their demographic,

mathematical and attitudinal backgrounds. Section 4.3.5 focuses on each individual

item on the Numeracy Questionnaire and applies logistic regression to identify which

of the factors determined as significant in Section 4.3.4 are significant indicators of

success or failure for the item. Section 4.4 concludes this chapter by discussing the

implications at the secondary/tertiary interface of the results of the Numeracy

Questionnaire with respect to the importance of sufficient mathematical study at

secondary level.
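For readers unfamiliar with the dichotomous Rasch model mentioned above, its standard form gives the probability of a correct response as a logistic function of the difference between a student's ability and an item's difficulty. The sketch below uses this standard form with hypothetical values on the logit scale; the function and parameter names are illustrative and are not taken from the analysis in Section 4.3.2.

```python
import math

def rasch_probability(ability, difficulty):
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))

# Hypothetical abilities for an item of difficulty 0.5 (logit scale).
for ability in (-1.0, 0.0, 1.0, 2.0):
    print(ability, round(rasch_probability(ability, difficulty=0.5), 3))
```

When ability equals difficulty the probability of success is one half, which is what allows students and items to be placed on a common scale.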

4.2 Construction of the Numeracy Questionnaire

Over the past decade there has been an increasing trend to develop and administer

diagnostic surveys to incoming tertiary students to gauge their levels of mathematical

preparation. The instruments used for these surveys have tended to be constructed

for specific needs and the information obtained from them has rarely been

disseminated. Hence a tool designed specifically to measure the pre-calculus skills

relevant to an introductory data analysis course and appropriate at the

secondary/tertiary interface, could not be located in the literature and needed to be

constructed for this survey.

The Numeracy Questionnaire was developed specifically to assess the numeracy

skills considered relevant to the unit MAB101, Statistical Data Analysis 1, at the

Queensland University of Technology, but would be appropriate as a tool for

measuring pre-calculus mathematical skills of students entering any tertiary course

which required such skills. As described in Chapter 2, students in MAB101 are

enrolled in a science or broadly scientific degree program and the majority undertake

the unit in their first semester of study.

The Numeracy Questionnaire consists of 21 multiple-choice items which students

could complete in less than thirty minutes. Given that students were also asked to

complete the Statistical Reasoning Questionnaire described in Chapter 5, as well as

the Background Information Survey of Chapter 2 and Attitudinal Surveys described

in Chapter 3, it was felt that the multiple-choice format would both maximise student

cooperation, and minimise marking and processing difficulties. So as to enable very

basic questions to be asked, students were requested not to use calculators, with

numbers being deliberately chosen for ease of manipulation.

The mathematical skills on which the Numeracy Questionnaire focuses are those

skills commonly associated with an introductory data analysis course but it is

designed to assess basic skills well below the assumed level of entry to the course.

(The complete Numeracy Questionnaire can be found in Appendix E.) For each

question, distracters were carefully chosen to reflect common or possible errors.

These are summarised, together with the proportion of students who selected each

response, in Table 4.2. Both the general and more specific skills for each question

are listed below. N_i is the item number on the questionnaire.

Handling of fractions, percentages and decimals:

N_1 Convert a percentage to a common fraction;

N_2 Convert a percentage less than one to a common fraction;

N_3 Convert a common fraction to a percentage;

N_4 Calculate the percentage of a group given the percentages of two

unequal subgroups;

N_5 Calculate a percentage of a group;

N_6 Calculate percentages in a two-step problem;

N_8 Add two decimal fractions;

N_9 Add two common fractions;

N_10 Add three common fractions;

N_16 Order a set of positive and negative, common and decimal fractions;

N_17 Order a set of common fractions;

Application of operations

(N_4, N_5, N_6) These items involve percentages in an applied context

N_7 Divide a class into groups of maximum size;

Evaluation of simple expressions

N_11 Evaluate a simple rational expression;

N_12 Evaluate the square root of the sum of two fractions;

N_13 Evaluate a simple expression using order of operations;

Substitution and evaluation of expressions

N_14 Substitute into an expression which requires estimation of a surd;

N_15 Evaluate a multi-step substitution;

Solving of equalities and inequalities

N_18 Rearrange an expression involving 1

1 x;

N_19 Solve a simple linear inequality;

N_20 Solve a rational inequality with the unknown on the denominator;

N_21 Solve a pair of simple linear inequalities.

The emphasis in the Numeracy Questionnaire is intentionally put on basic skills

rather than any higher level thinking such as problem solving or mathematical

reasoning. In Chapter 7 of this thesis, relationships are shown to exist between these

basic skills and statistical reasoning. In particular it is demonstrated that a link exists

between the ability to manipulate fractions and the ability to utilise full and complete

proportional reasoning.

The first six questions (N_1 to N_6) involve the use of percentages. The application

of this area to an introductory statistics course is transparent, both in calculation and

interpretation. A student, for example, needs to be comfortable with converting

0.1% to 0.001 in the statement: “This factor is significant at the 0.1% level,” before

they can hope to understand the statistical implications.
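The conversion in this example can be checked exactly, since a percentage is a number divided by 100. A minimal sketch using exact fractions follows; the function name is illustrative only.

```python
from fractions import Fraction

def percent_to_fraction(percent_text):
    """Convert a percentage written as text, e.g. "0.1", to an exact fraction."""
    return Fraction(percent_text) / 100

level = percent_to_fraction("0.1")
print(level, float(level))
```

Working with exact fractions makes the intermediate step visible: 0.1% is one tenth of one hundredth, that is, one in a thousand.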

The questions N_4 to N_7 require students to apply basic skills to practical

situations, perhaps combining two or more steps (N_4 and N_6) or reinterpreting the

mathematical answer to make sense in context (N_7). All these skills are commonly

applied in any introductory statistics course.
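The kind of reinterpretation required in N_7 can be illustrated with a small example. The numbers below are hypothetical, and it is assumed that the item concerns the number of groups needed when group size is capped; the actual item appears in Appendix E.

```python
import math

def groups_needed(class_size, max_group_size):
    """Smallest number of groups such that no group exceeds max_group_size."""
    return math.ceil(class_size / max_group_size)

print(30 / 4)                # the raw quotient, which is not a whole number
print(groups_needed(30, 4))  # rounded up, since seven groups of four hold only 28
```

The mathematical answer is the quotient, but the answer that makes sense in context is the next whole number above it.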

Questions N_8 to N_13 involve simple calculations, such as adding two decimals

(N_8), two fractions (N_9), three fractions (N_10) and applying order of operations

(N_11). Such skills are all needed in simple calculations, in correctly using

calculator processes, and in understanding basic quantitative arguments.

An ability to perform basic calculations such as those demonstrated in N_1 through

to N_11 cannot be replaced by technology. Many teachers at the introductory

tertiary level have experienced situations where blind faith in a calculator, together

with an inability to manually check at least the scale of the computation, have led a

student to have difficulties in using a calculator, or to defend an error of sizeable

proportion on the basis that “the calculator said so”.

Questions N_14 and N_15 require students to substitute values and evaluate an

expression. In N_14, the expression is of the form used to calculate a pooled sample

standard deviation. N_15 is a form similar to a test of two proportions. Experience

has shown that some students have difficulty understanding what is happening in

such expressions, no matter what calculating technique is used.
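The structure of an expression such as that in N_14 can be illustrated with a short sketch. It assumes the usual pooled sample standard deviation formula and the values n = 25, m = 13, s = 2 and t = 4 as they appear in the item in Table 4.2; the function and variable names are illustrative only and are not part of the questionnaire.

```python
import math


def pooled_sd(n, m, s, t):
    """Pooled sample standard deviation for two samples.

    n, m: sample sizes; s, t: the two sample standard deviations.
    """
    weighted_sum = (n - 1) * s**2 + (m - 1) * t**2
    return math.sqrt(weighted_sum / (n + m - 2))


# Values assumed from item N_14: (24*4 + 12*16) / 36 = 8 under the root.
value = pooled_sd(n=25, m=13, s=2, t=4)
rounded = round(value, 2)
```

Working through the substitution by hand gives 288/36 = 8 under the square root, which shows that the item can be completed without a calculator by a student who recognises the scale of each term.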

Questions N_16 and N_17 involve ordering positive and negative fractions and

decimals. These questions further test students’ understanding of value.

Question N_18 asks students to rearrange the equation:

$$\frac{a}{b} - \frac{1}{x} = c.$$

Experience has shown that students have difficulty understanding and handling an

expression such as this. Handling of fractions is often a problem area and it appears

that the combination of horizontal and sloped lines to indicate a fraction can add

considerably to the problem. Combined with the appearance in a fraction of

algebraic entities rather than simply numbers, this creates a significant barrier to

understanding. Students encounter such an expression during statistics in the form of

standardised sample means, such as:

$$\frac{\bar{x} - \mu}{\sigma / \sqrt{n}}.$$

Questions N_19 to N_21 involve the use of inequalities. Students encounter

inequalities in statistics in a number of situations.

During MAB101 students rely on calculators and computers to perform necessary

calculations. However, part of the motivation for this study was the indication that

an understanding of, familiarity with and ease of handling basic numerical and

algebraic expressions may be important to their development of statistical thinking.

The Numeracy Questionnaire was included in this research to clarify the degree to

which this is the case. The results of this aspect of the work are discussed in Chapter

7.

Comparison with the new Year 1 to 10 mathematics syllabus for Queensland shows

that most of the items in the questionnaire require a level of understanding which

would be expected of students aged 10 to 15 years, requiring demonstration of

outcomes such as:

compare and order whole numbers and common decimal fractions of any size, making

connections between key percentages and fractions;

(Learning Outcome N4.1 p20, (Queensland Studies Authority, 2004))

and

identify and solve addition and subtraction problems involving rational numbers.

(Learning Outcome N6.2 p21, (Queensland Studies Authority, 2004)).

The first of these (designated a level 4 outcome in the syllabus) would be typically

expected of 12 year-old students, while the second (designated a level 6 outcome)

would be expected of 15 year-olds. These skills would be commonly represented in

mathematics syllabi for this age group both nationally and internationally. The

remaining questions involve evaluation of simple rational expressions and solving

simple inequalities, skills which students would be expected to consolidate within the

context of an algebra and calculus based senior school mathematics course, if not

during the mathematics of pre-senior compulsory schooling.

4.3 Results of the Numeracy Questionnaire

MAB101 students completed the Numeracy Questionnaire in class during the first

week of semester. All students who attended classes were encouraged to complete

the questionnaire although there was no compulsion to do so. While it was expected

that many students would take less than twenty minutes to complete the tasks and

nearly all less than thirty, students were encouraged to take as long as they needed.

Hence speed of calculation did not affect students’ results. It was clearly explained

to students that neither their participation nor results would have any bearing on their

course, but would be used solely for research purposes. In the second year of the

study, students received feedback on their score and were given the opportunity to

compare their answers to the correct answers, but few availed themselves of this.

The Numeracy Questionnaire was administered to three cohorts of students: first

semester 2004, second semester 2004 and first semester 2005. Due to a complication

with the administration of the survey (see footnote 6) in second semester 2004, it was only

completed by a smaller and less representative proportion of the class. For this

reason only two cohorts, first semester 2004 and 2005, are included in the study.

Where students were included in the study within both these cohorts, only one set of

data was included. This was selected such that it belonged to the cohort in which the

student had completed a larger number of survey instruments over the entire study,

or, where an equal number had been completed, the most recent cohort was selected.

There were four students for whom such a decision needed to be made with regard to

the Numeracy Questionnaire.

As has been described in Chapter 2, background and demographic information,

including gender, course of study, tertiary entrance score (OP score), level of

mathematics previously studied and results obtained therein, was also provided by

the students. The Numeracy Questionnaire was completed by 562 students over two

years, with 548 of these providing the information required to match their answers

with their demographic and background information.

6 During the class in which the questionnaire was scheduled, the university administration elected to conduct a fire evacuation drill. Course requirements made it difficult to administer the questionnaire in the following class.

Of the 548 ‘identifiable’ students who completed the skills questionnaire, 48.0%

were female and 52.0% male (the same proportions as for the total number of

identifiable students reported in Section 2.3). These students represented a total of

30 different courses which are summarised in Table 4.1 with percentages given for

all students over the two cohorts and those who completed the Numeracy

Questionnaire. As described in Chapter 2, the Queensland year 12 algebra and

calculus based mathematics course, Maths B, or its equivalent is the assumed level of

mathematics for MAB101. This standard was reported as having been studied by

53.1% of students who completed the Numeracy Questionnaire, 4.9% reported a

lower level of mathematical preparation, 35.6% had a higher level (either advanced

high school mathematics or previous tertiary study) and 6.4% did not report their

mathematical background.

Students were also classified as ‘maths’ or ‘non-maths’ students. Maths students

included all those who were studying a mathematics degree or double degree

including mathematics, as well as applied science students majoring in mathematics

and education students with mathematics as one of their teaching subjects. Under

this system, 27% of students in the study were classified as maths students,

essentially the same as the figure of 28% for those who completed the skills

questionnaire.

Course                                                             % of total          % of numeracy
                                                                   (see footnote 7)    respondents
education or double degree involving education                            9.7               8.8
maths or double degree involving maths; applied science (maths)          17.2              19.7
other double degree                                                       7.2               6.0
applied science (non-maths)                                              56.5              55.5
biotechnology                                                             7.4               7.9
other                                                                     1.9               2.2

Table 4.1 Course breakdown of student cohort and respondents over two years

7 Reported in Section 2.3

For the 562 students who completed the skills questionnaire, total scores ranged from

4 to 21, with a mean of 13.6, standard deviation of 4.1 and a median of 14. Twenty

students scored the maximum possible score of 21. The distributions of the two

cohorts are illustrated in Figure 4.1.

Although the two distributions are remarkably similar, a slight decrease in the

number of lower responses in 2005 resulted in quartiles which are higher by one

mark and a larger mean (p=0.011 from the t-test) over 2004. This statistically

significant difference contributes to some differences which appear between the

years in the analysis of background predictors (Section 4.3.4). However, closer

inspection of the ordering and distribution of responses showed so few differences

between the cohorts, that combining the years was considered a valid and the most

succinct way to describe the responses to individual items.

Descriptive Statistics: Numeracy Total

Variable  Year    N    Mean  SE Mean  StDev
N_Total   04    303  13.224    0.238  4.135
          05    259  14.104    0.251  4.040

Variable  Year  Minimum  Q1  Median  Q3  Maximum
N_Total   04          4  10      14  16       21
          05          4  11      14  17       21
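The reported comparison of cohort means can be approximately reproduced from the summary statistics alone. The sketch below is an illustration only: it assumes an unequal-variance form of the two-sample t statistic and uses a normal approximation for the two-sided p-value (reasonable with more than 500 students), neither of which is stated in the text. Variable names are hypothetical.

```python
import math

# Summary statistics for the two cohorts, as given in the descriptive output.
n04, mean04, sd04 = 303, 13.224, 4.135
n05, mean05, sd05 = 259, 14.104, 4.040

# Standard error of the difference in means (unequal-variance form, assumed).
se_diff = math.sqrt(sd04**2 / n04 + sd05**2 / n05)
t_stat = (mean05 - mean04) / se_diff

# Two-sided p-value under a normal approximation to the t distribution.
p_approx = math.erfc(abs(t_stat) / math.sqrt(2))
```

Under these assumptions the statistic is a little above 2.5 and the approximate p-value rounds to 0.011, in line with the value quoted in the text.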

[Figure 4.1: Numeracy Total (vertical axis, scale 5.0 to 22.5) plotted by Year (I_04, I_05).]

Figure 4.1 The distribution of total Numeracy Scores is similar across the two cohorts.

4.3.1 Consideration of student responses

The complete questionnaire and the percentage of students who gave each response,

together with a brief explanation of the error reflected in each distracter, can be

found in Table 4.2. As well as reporting the responses for the entire group, each

response is reported separately on the basis of the division of students into those who

have and have not studied Maths B. Questions are ordered by increasing difficulty

(as measured by student outcomes) and the correct response is listed first. In the

actual questionnaire, responses are arranged randomly.

An examination of groups of questions shows how the success rate falls rapidly as

the complexity of the question increases. Lack of familiarity, multi-step problems

and abstraction with the introduction of letters cause obvious difficulties for students.

As an example, consider a group of questions involving fractions, extending from

basic operations with fractions through to the application of fractions in the form of

manipulation of rational expressions. Question N_9, which requires addition of two

fractions, has a success rate of 84% which falls to 78% for adding three fractions

(N_10). For students who have not completed Maths B, the competency with

fractions is extremely limited with both these questions having a success rate close to

50%.

When a square root is also involved (N_12), the success rate for the complete cohort

falls to 56% and to 33% for those without Maths B. It is interesting to note here how

the addition of a step to a question causes students to regress in their skills. In

question N_9, 8.9% of all students added two fractions by either just adding the

denominators or by adding the numerators and denominators. When the square root

is added to the problem, 26.1% of students make either of these two errors (be it

before or after taking the square root).
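The fraction items and the two common errors discussed above can be checked with exact arithmetic. The following sketch assumes the item wordings as they appear in Table 4.2 (N_9: 1/6 + 1/5; N_10: 1/4 + 2/3 + 5/6; N_12: the square root of 1/9 + 1/4); the variable names are illustrative.

```python
from fractions import Fraction

# N_9: correct addition of two fractions.
correct_n9 = Fraction(1, 6) + Fraction(1, 5)

# The two errors described in the text for N_9.
adds_denominators = Fraction(1, 6 + 5)               # "just adding the denominators"
adds_numerators_and_denominators = Fraction(1 + 1, 6 + 5)

# N_10: correct addition of three fractions.
correct_n10 = Fraction(1, 4) + Fraction(2, 3) + Fraction(5, 6)

# N_12: the quantity under the square root; the answer is sqrt(13)/6.
under_root_n12 = Fraction(1, 9) + Fraction(1, 4)
```

The exact results (11/30, 7/4 and 13/36) correspond to the correct choices listed first for each item, while 1/11 and 2/11 correspond to the two distracters that together attracted 8.9% of responses to N_9.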

When students are asked to apply their understanding of fractions to manipulate an

algebraic expression (N_18), the success drops to 48%, and to 42% when the

equality becomes an inequality (N_20). The three items (N_12, N_18 and N_20) all

have a success rate of around 30% for students without Maths B. Question (N_15),

Table 4.2 columns: question and choices (correct response listed first); % response for all students (566), for students who have done Maths B (489) and for students who have not done Maths B (27); comments on responses.

Level one (see footnote 8)

N_5 Possible subject grades at a particular institution are 1 to 7, with 7 being the highest. In a particular class of 200, 5% of students were given a 1 or a 2. The number of students receiving a 1 or a 2 was:

Choice                   All  Maths B  No Maths B   Comment
10                      95.7     96.3        88.9   Nearly all could manage this.
5                        1.6      1.4         3.7   Forgot that reference group was 200.
40                       2.3      1.9         7.4   Used 5% = 1/5.
57                       0.2      0.2         0.0   Assumed equally likely groups although told otherwise.
100                      0.2      0.2         0.0   Used 5% = 1/2.
No response              0.0      0.0         0.0

N_8 0.66 + 0.55 is equal to:

Choice                   All  Maths B  No Maths B   Comment
1.21                    92.2     92.6        92.6   Another high success rate.
0.1111                   0.0      0.0         0.0   Does not trade.
0.121                    4.3      4.1         0.0   Shifts decimal point.
1.111                    3.6      3.3         7.4   Cannot trade correctly.
12.1                     0.0      0.0         0.0   Shifts decimal point.
No response              0.0      0.0         0.0

N_1 Written as a fraction in its simplest form, 20% is equal to:

Choice                   All  Maths B  No Maths B   Comment
1/5                     87.4     88.5        74.1   Most students can do this.
1/20                     1.4      1.4         3.7   Since 10% is 1/10.
20/100                   7.8      6.8        18.5   Did not simplify.
2/5                      2.5      2.7         0.0
1/2                      0.4      0.2         0.0   Guessing
No response              0.5      0.4         3.7

8 The levels included in this table result from the Rasch analysis and are described in Section 4.3.3

N_11 (5 × 4^2 + 9 × 2^2) / 4 is equal to:

Choice                   All  Maths B  No Maths B   Comment
29                      87.2     87.0        88.9   Well done for those without Maths B
41                       2.7      2.9         0.0   Cancel 4 and 4^2 to give 5 + 9×4.
56                       5.0      4.9         3.7   Cancel 4 into 4^2 to give 5×4 + 9×4.
89                       3.0      3.1         3.7   Cancel 4 and 2^2 to give 5×16 + 9.
181                      1.1      1.0         0.0   Thinks a × b^2 = (a × b)^2
No response hereafter    0.2      0.2         0.0   (see footnote 9)

Level two

N_9 1/6 + 1/5 is equal to:

Choice                   All  Maths B  No Maths B   Comment
11/30                   83.6     86.2        48.1   Note the difference for without Maths B.
2/30                     6.2      5.8        18.5   Finds common denominator but doesn’t convert numerator.
1/11                     2.7      2.5        11.1   Adds denominators.
2/11                     6.2      4.7        22.2   Adds numerators and denominators.
5/6                      0.9      0.6         0.0   No idea.
No response              0.4      0.2         0.0

N_7 A group of 340 students must be divided into lab classes with a maximum of 30 students in each. The smallest number of lab classes needed is:

Choice                   All  Maths B  No Maths B   Comment
12                      78.6     81.0        66.7
10                       6.6      5.6         0.0   ‘Near enough’.
11                       5.7      5.4         7.4   Round down.
11.3                     8.0      7.4        14.8   No practical application.
13                       0.7      0.4         7.4



9 Three or more missing items at the end of the questionnaire were treated differently in the Rasch analysis. (See §4.3.)

N_3 Written as a percentage correct to 1 decimal place, the fraction 1/6 is equal to:

Choice                   All  Maths B  No Maths B   Comment
16.7%                   78.3     79.2        74.1
6.0%                     3.4      2.9         7.4
12.5%                   10.0     10.1         7.4
60.0%                    2.0      1.6         7.4
66.7%                    5.9      5.8         3.7
No response              0.5      0.4         0.0

N_10 1/4 + 2/3 + 5/6 is equal to:

Choice                   All  Maths B  No Maths B   Comment
1 3/4                   77.6     80.1        51.9   Adding 3 fractions is harder for most.
8/13                     7.3      4.9        37.0   Adds numerators and denominators - slightly more than in N_9.
13/72                    1.4      1.7         0.0   Finds common denominator but numerator is sum of denominators.
35/72                    1.8      1.9         3.7   Finds common denominator but tries to cross-multiply for numerator.
1 5/12                  10.3     10.3         3.7   Can find common denominator, but not numerator; more convincing than correct answer which has been simplified.
No response              1.6      1.2         3.7

Level three

N_6 In the same class (as in question 5), 85% of students were awarded a grade from 3 to 6. The number of students receiving a grade of 7 was:

Choice                   All  Maths B  No Maths B   Comment
20                      69.2     71.8        44.4   Add one step and the success rate reduces considerably.
10                       2.7      1.2        11.1   Used 100 as reference group.
15                       1.8      2.0         0.0   Combination of errors above and below.
30                       4.6      3.9        14.8   Forgot to remove the 5%.
170                     21.5     21.0        25.9   = 85% - perhaps did not read the question.



N_2 Written as a fraction in its simplest form, 0.1% is equal to:

Choice                   All  Maths B  No Maths B   Comment
1/1000                  61.2     62.8        66.7   A percentage <1 is much harder.
1/100                    9.3      8.0        14.8
1/10                    25.8     25.9        14.8   Confused with 0.1.
10/100                   3.7      3.3         3.7
9/10                     0.0      0.0         0.0
No response              0.0      0.0         0.0

N_19 The solution to the inequality 2a + 5 < 12 is:

Choice                   All  Maths B  No Maths B   Comment
a < 7/2                 62.8     64.0        51.9
a > -7/2                 3.9      3.5         3.7   Take 2a over without changing sign.
a = 3                   12.3     11.1        14.8   An integer that works.
a < 1                    7.1      6.8        14.8   Divide 12 by 2, then subtract 5.
a = 7/2                 10.3     11.1         7.4   Solve the equality.
No response hereafter    3.2      3.1         7.4


Level four

N_12 √(1/9 + 1/4) is equal to:

Choice                           All  Maths B  No Maths B   Comment
√13/6                           56.2     58.0        33.3   Add 1 step and the success rate with fractions reduces considerably.
1/5                              3.4      3.3         3.7   Take square roots then add denominators.
1/√13                           15.0     13.0        33.3   Just add denominators – compare with same error in N_9.
2/5                              7.7      7.2        18.5   Take square roots then add numerators and denominators.
5/6                             12.3     13.0         3.7   Take square roots first then add fractions.

N_17 Which of the following sets of values is correctly ordered from smallest to largest?

Choice                           All  Maths B  No Maths B   Comment
1/3, 3/5, 13/20, 4/6, 8/7       54.1     55.4        44.4   Slightly better than N_16.
13/20, 8/7, 4/6, 3/5, 1/3       17.3     15.8        25.9   Descending denominators.
8/7, 4/6, 13/20, 3/5, 1/3        2.9      2.9         3.7   Descending.
1/3, 3/5, 4/6, 8/7, 13/20        7.7      7.2         7.4   Ascending numerators & denominators.
1/3, 13/20, 4/6, 3/5, 8/7       14.1     14.4        11.1   3/5 > 4/6



N_21 The solution to the pair of inequalities a + 7 < 16 and 3a > 6 is:

Choice                           All  Maths B  No Maths B   Comment
2 < a < 9                       51.6     53.5        33.3
a = 3.75                         3.6      2.7        11.1   Solution to the pair of simultaneous equations.
a = 8                            7.1      6.4         3.7   An integer that works.
2 > a < 9                       24.0     24.7        22.2   A complete lack of understanding of inequality signs.
a < 9                            6.9      6.6        14.8   Solution to the first equation and satisfies 3×9 > 6
No response N_19 on              3.2      3.1         7.4

N_4 A class consists of 80 males and 120 females. A non-compulsory excursion is attended by 20% of the male students and 30% of the females. The percentage of the class which attends the excursion is:

Choice                           All  Maths B  No Maths B   Comment
26%                             52.1     53.3        40.7
25%                             15.8     15.0        29.6   Average of 20% and 30%.
28%                             17.1     16.9        18.5   Used 30% = 1/3
52%                             11.9     11.5        11.1   Forgot that reference group was 200.
56%                              2.1      2.3         0.0   Combination of previous 2 errors.
No response                      0.9      1.0         0.0

N_16 Which of the following sets of values is correctly ordered from smallest to largest?

Choice                           All  Maths B  No Maths B   Comment
-1/5, -0.05, 0.05, 0.5, 0.55    50.4     49.0        51.9
[alternative ordering]           3.4      3.3         7.4   Doesn’t understand decimal places.
[alternative ordering]          34.2     35.6        25.9   Cannot order negatives.
[alternative ordering]           6.1      6.4         0.0   No idea about negatives.
[alternative ordering]           4.2      4.1         7.4   Two mistakes.



N_13 20 + 30 ÷ 5 × 2 is equal to:

Choice                           All  Maths B  No Maths B   Comment
32                              50.9     52.1        40.7   Poorly done.
5                               16.7     17.3        11.1   (20 + 30) ÷ (5 × 2)
20                               6.4      4.3        25.9   Left to right.
23                              24.2     24.3        22.2   Multiplication before division – misunderstands BOMDAS.
52                               0.7      0.8         0.0   (20 + 30 ÷ 5) × 2

N_18 The solution for x to the equation a/b - 1/x = c is given by:

Choice                           All  Maths B  No Maths B   Comment
x = 1/(a/b - c)                 47.7     49.4        25.9
x = a/b - 1/c                   18.9     18.3        18.5   Just swap x & c – visually appealing.
[alternative rearrangement]     20.5     19.8        29.6   Does not understand 1/x
[alternative rearrangement]      4.6      4.3        11.1   Take 1 over first. Confused by 1/x
[alternative rearrangement]      4.6      4.3         7.4   a/b - 1/x = (a - 1)/(b - x)

N_14 When n = 25, m = 13, s = 2, t = 4, the expression √(((n - 1)s^2 + (m - 1)t^2) / (n + m - 2)) (correct to 2 decimal places) is equal to:

Choice                           All  Maths B  No Maths B   Comment
2.83                            46.4     48.4        25.9   Less than half can do without a calculator.
3.94                            16.6     15.8        29.6   √(a + b) = √a + √b
6.00                            11.9     11.5        14.8   As above with incorrect simplification.
6.93                            16.6     16.0        18.5   Incorrect cancelling.
11.31                            5.0      4.9         3.7   Thinks a × b^2 = (a × b)^2

Level five

N_20 The solution to the inequality 48/a < 1/4 is:

Choice                   All  Maths B  No Maths B   Comment
a > 192                 42.0     44.2        29.6   Another very low success rate.
a < 1/12                 3.9      3.5         7.4
a < 12                  10.5     10.1        11.1
a < 192                 27.6     27.4        22.2   Inverted both sides without changing direction of inequality.
a > 12                  11.2     10.5        18.5
No response              1.6      1.2         3.7
No response N_19 on      3.2      3.1         7.4

N_15 Given that p1 = x1/n1, p2 = x2/n2 and p = (x1 + x2)/(n1 + n2), the value of (p1 - p2) / (p(1 - p)) when x1 = 10, x2 = 15, n1 = 25, n2 = 75 is given by:

Choice                   All  Maths B  No Maths B   Comment
16/15                   37.7     39.5        22.2   Lowest success rate.
1/4                     10.3      9.7        22.2   This is p.
4/5                     17.4     17.7         7.4   Forgets (1 - p)
5/6                     17.8     17.3        29.6   Uses p = p1 + p2
108/77                   8.4      8.0         0.0   x2 and n1 confused.
No response              6.9      6.6        11.1
No response hereafter    1.4      1.2         7.4   Can’t do (or be bothered) – highest non-participation.

Table 4.2 Items and responses to Numeracy Questionnaire in order of difficulty

which students found most difficult with a success rate of 38% (22% without Maths

B), requires multi-step substitution of fractions into an algebraic rational expression.

The numbers involved in N_15 have been deliberately chosen so that a little

experience and comfort with fractions and a minimal degree of persistence should

produce success. Interestingly, this question also had the highest non-response rate

of 9% (19% without Maths B), indicating that comfort and persistence are not

present in many students when handling fractions.
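The claim that the numbers in N_15 work out cleanly can be checked with exact fractions. This is a sketch only, assuming the item as it appears in Table 4.2, namely the value of (p1 - p2) / (p(1 - p)) with x1 = 10, x2 = 15, n1 = 25 and n2 = 75; the function name is hypothetical.

```python
from fractions import Fraction


def n15_value(x1, x2, n1, n2):
    """Evaluate (p1 - p2) / (p * (1 - p)) exactly, as assumed for item N_15."""
    p1 = Fraction(x1, n1)            # 10/25 = 2/5
    p2 = Fraction(x2, n2)            # 15/75 = 1/5
    p = Fraction(x1 + x2, n1 + n2)   # 25/100 = 1/4
    return (p1 - p2) / (p * (1 - p))


answer = n15_value(10, 15, 25, 75)

# The distracter in which (1 - p) is omitted from the denominator.
forgets_complement = (Fraction(10, 25) - Fraction(15, 75)) / Fraction(25, 100)
```

Each intermediate quantity reduces to a simple fraction (2/5, 1/5 and 1/4), so the only real demand of the item is keeping track of the steps.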

A smaller group of questions involving practical application of percentages again

shows how the addition of steps in the procedure decreases the success rate.

Question N_5, requiring students to find five percent of 200 students, has the highest

success rate of 96% (89% without Maths B). This success rate falls rapidly to 69%

(44% without Maths B) when another step is added to this question (N_6). When the

percentage of two subgroups has to be converted to the percentage of the whole

group (N_4), the success rate falls to only 52% (41% without Maths B).
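The three percentage items differ only in the number of steps required, which a short calculation makes explicit. The sketch below uses the class sizes and percentages given in items N_5, N_6 and N_4 in Table 4.2; the variable names are illustrative.

```python
# N_5: one step, 5% of a class of 200.
class_size = 200
low_grades = 5 * class_size // 100            # students given a 1 or a 2

# N_6: a second step, removing the 5% and the 85% to leave those with a 7.
mid_grades = 85 * class_size // 100           # students given a 3 to 6
top_grade = class_size - low_grades - mid_grades

# N_4: two subgroup percentages converted to a percentage of the whole class.
males, females = 80, 120
attendees = 20 * males // 100 + 30 * females // 100
percent_attending = 100 * attendees / (males + females)
```

The common wrong answers in Table 4.2 correspond to stopping part way through these steps, for example reporting the 170 students with grades 3 to 6 in N_6 or the 52 attendees in N_4 without relating them back to the class of 200.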

There are three items, N_1, N_2 and N_3, which involve converting between

common fractions and percentages. N_1 is the simplest, involving the value twenty

percent, N_3 becomes more difficult as the fraction 1/6 is less familiar to the students

and N_2 more difficult again as the percentage is less than one. The success rate for

the entire cohort falls from 87% through 78% to 61% for these items. For those

students without Maths B, the struggle begins much earlier with a success rate of

74% for N_1 and hence does not show the same decrease, with success rates of 74%

and 67% for N_3 and N_2, respectively. Item N_2 is one where students without

Maths B slightly outperform those with Maths B; the contrast in this question is

more in the incorrect choice. The pattern of students without Maths B struggling

sooner (and so not demonstrating as great a decrease in success with item

complexity) is generally reflected throughout the questionnaire.

Some of the errors which students have made in these items could legitimately be

attributed to carelessness. Perhaps 22% of students did not correctly read N_6 and

thought they were only finding eighty-five percent. Also, 17% of students in N_4

calculated one third rather than thirty percent of 120. However, this carelessness,

which tends to increase with problem complexity, is one aspect of the students’ gap

in skills and is likely to cause problems in the students’ application of techniques in

their fields of study.

4.3.2 Results from Rasch analysis

In order to further investigate and better understand the levels of numerical

competency represented by students in the study, a dichotomous Rasch model was

fitted to the responses to the Numeracy Questionnaire for the two cohorts. This

Rasch model describes the probability of a person of ability $\beta_n$ succeeding at an item

of difficulty $\delta_i$, in terms of the logistic equation:

Equation 4.1

$$P(X_{ni} = x) = \frac{\exp\big(x(\beta_n - \delta_i)\big)}{1 + \exp(\beta_n - \delta_i)}; \quad x = 0, 1; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L;$$

where $X_{ni} = 1$ for success, or 0 for failure, is the response of person $n$ on item $i$.

This is one of a family of models developed by Georg Rasch (Rasch, 1960) and

used over the past twenty years in the areas of education, psychology and sociology

as an alternative to traditional test theory and applied widely in school mathematics

studies (Wilson, 1992) and surveys such as the Third International Mathematics and

Science Study (TIMSS) (Lokan, et al., 1997). The Rasch model assumes that the

questionnaire is measuring an underlying one-dimensional and hierarchical trait and

that the items are independent of one another. Model diagnostics are used to

measure the fit of the data to the model and in so doing provide evidence of the

validity of the questionnaire (Wright, 1999; Watson and Callingham, 2003). This

model has the advantage that sufficient statistics exist for $\beta_n$ and $\delta_i$, with $R_n$, the

total score for person $n$, being sufficient for $\beta_n$, and $S_i$, the number of people

responding correctly to item $i$, being sufficient for $\delta_i$. The existence of these

sufficient statistics allows item difficulty and person ability to be separately

estimated on a single scale (Keeves and Alagumalai, 1999). When $\beta_n = \delta_i$, person

n has a probability of 0.5 of succeeding at item i .
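The behaviour of the dichotomous Rasch model in Equation 4.1 can be seen in a minimal sketch. The function below is an illustration written for this discussion, not the Quest implementation; its name and arguments are hypothetical.

```python
import math


def rasch_probability(beta, delta, x=1):
    """P(X = x) for a person of ability beta on an item of difficulty delta.

    x = 1 denotes success and x = 0 denotes failure, as in Equation 4.1.
    """
    if x not in (0, 1):
        raise ValueError("x must be 0 or 1")
    return math.exp(x * (beta - delta)) / (1 + math.exp(beta - delta))


# Ability equal to difficulty gives an even chance of success.
p_equal = rasch_probability(beta=1.0, delta=1.0)

# A person one logit above the item difficulty.
p_one_logit_above = rasch_probability(beta=1.5, delta=0.5)
```

Only the difference between ability and difficulty enters the formula, which is why persons and items can be placed on the same logit scale.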

The Rasch model was fitted using Quest software (Adams and Khoo, 1996). For the

analysis, missing answers were considered as incorrect, apart from in cases where

students left three or more items blank at the end of the questionnaire in which case

they were treated as missing. As students were given virtually unlimited time, it was

felt that those who left a sequence of questions blank at the end of the paper had

chosen to proceed no further, whereas for individual missed responses it was

considered that the student could not determine the correct answer. Nineteen

students received perfect scores and are not included in the analysis as they and their

results cannot contribute to the estimation process.

One measure of fit used in Rasch modelling is the item infit mean square. The item

infit mean squares are the means of the weighted squared standardised residuals for

each item, averaged over students. For each of the 21 items in the questionnaire, the

infit mean square fell between 0.85 and 1.21. According to Keeves and Alagumalai

(1999), an item is generally accepted as fitting the Rasch model if the infit mean

square lies between 0.77 and 1.30, although some researchers would prefer a more

restricted range of 0.83 to 1.20. This provides evidence that the items are all

consistent with the underlying construct being measured by the questionnaire (in this

case, basic numeracy skill), and is a measure comparable to the concept of internal

validity used in traditional test theory (Wright, 1999).
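As an illustration of how an item infit mean square may be computed, the sketch below uses the commonly quoted information-weighted form: the sum of squared residuals divided by the sum of the model variances. This is an assumption about the general definition and not a reproduction of the Quest calculation; the data and names are hypothetical.

```python
import math


def rasch_p(beta, delta):
    """Probability of success under the dichotomous Rasch model."""
    return 1 / (1 + math.exp(-(beta - delta)))


def item_infit_mean_square(responses, abilities, difficulty):
    """Information-weighted mean square for one item.

    responses: 0/1 scores on the item; abilities: ability estimates in logits
    for the same persons; difficulty: the item difficulty estimate in logits.
    """
    squared_residuals = 0.0
    information = 0.0
    for x, beta in zip(responses, abilities):
        p = rasch_p(beta, difficulty)
        squared_residuals += (x - p) ** 2   # observed minus expected, squared
        information += p * (1 - p)          # model variance of the response
    return squared_residuals / information


# Hypothetical data: two persons whose ability equals the item difficulty.
example = item_infit_mean_square([1, 0], [0.0, 0.0], 0.0)
```

In this hypothetical case the observed variation matches the variation the model predicts, so the statistic equals its expected value of 1; values well above 1 indicate more erratic responses than the model expects, and values well below 1 indicate responses that are more deterministic than expected.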

Table 4.3 presents the estimates of the item and person separation reliabilities and the

overall fit measures.

item separation reliability     0.99    item infit mean square     mean = 1.00   SD = 0.09
person separation reliability   0.74    person infit mean square   mean = 1.00   SD = 0.20

Table 4.3 Measures of fit from Rasch analysis indicate that the model fits well.

The item separation reliability index is the proportion of the total variance of item

estimates that is associated with parameter variance. The person separation

reliability is the corresponding index for persons. Large values of these indices (i.e.

close to one) are indicative of better estimation of parameters. (See Keeves and

Masters (1999) p275 for a detailed definition.)

For the Numeracy Questionnaire, the item separation index of 0.99 is very high,

providing evidence that the items give a good spread of difficulty which suggests that

the level of basic numeracy skills can be measured by the questionnaire. The person

separation reliability is acceptable at 0.74, providing evidence that the questionnaire

is of an appropriate level of difficulty for the students. The average item infit mean

square and the average person infit mean square are equal to the expected value of

1.00, suggesting that the model is appropriate for the data and hence that the

questionnaire is measuring a one-dimensional construct.

The variable map in Figure 4.2 shows the students (on the left hand side) and items

(on the right hand side) displayed on a single logistic scale. The level at which an

item appears is called the threshold. This is the level of ability at which a student has

a 50% chance of answering the question correctly. The map provides a convenient

visual display of the relative difficulty of the items. From the map, there appears to

be a group of students of very high ability, as well as the nineteen students who are

not included in the analysis because they received perfect scores. Although the person

separation reliability and person infit mean square provide evidence that the

questionnaire is of an appropriate level of difficulty for the students, the distribution

of items along the variable map verifies that, as intended, the questionnaire does not

reach into the upper levels of numeracy skill. This is not surprising as the

questionnaire is designed to assess basic skills below the assumed level of entry into

the course. In this sense the questionnaire is acting as a remedial diagnostic tool.

Questions at a higher level would need to be included in order to assess the full range

of numeracy skill of the cohort.

4.3.3 Levels of thinking

Consideration of the variable map produced by the Rasch analysis, together with

question complexity has been used to divide the items into five levels of difficulty

Item Estimates (Thresholds) (N = 566 L = 21 Probability Level=0.50)
----------------------------------------------------------------------------
  4.0                        |
                             |
                             |
                             |
                   XXXXXXXXX |
                             |
                             |
                             |
  3.0                        |
                             |
                             |
                    XXXXXXXX |
                             |
                             |
                XXXXXXXXXXXX |
                             |
  2.0                        |
            XXXXXXXXXXXXXXXX |
                             |
                             | 15
             XXXXXXXXXXXXXXX |
                             | 20
        XXXXXXXXXXXXXXXXXXXX |
                             | 14
  1.0     XXXXXXXXXXXXXXXXXX | 16 18
                             | 4 13 21
                XXXXXXXXXXXX | 17
                             | 12
              XXXXXXXXXXXXXX |
                             | 2
               XXXXXXXXXXXXX | 19
  0.0                      X |
                  XXXXXXXXXX | 6
                             |
                XXXXXXXXXXXX |
                             |
                      XXXXXX | 3 10
                           X | 7
                      XXXXXX |
 -1.0                        | 9
                         XXX |
                             |
                             | 1 11
                        XXXX |
                             |
                             |
                          XX |
 -2.0                        | 8
                             |
                             |
                             |
                             |
                             | 5
                             |
                             |
 -3.0                        |
----------------------------------------------------------------------------
Each X represents 3 students
============================================================================

Figure 4.2 The variable map shows the students (on the left hand side) and items (on the right hand side) displayed on a single logistic scale.

Level bands marked on the map, from top to bottom: Level 5, Level 4, Level 3, Level 2, Level 1.

and the students into five levels of ability as indicated in Figure 4.2. Students at

level one are characterised by simple, common, single-step thinking. It can be

expected that a student at this level successfully:

• understands and uses common percentages;

• adds decimals;

• performs a simple calculation involving a combination of basic operations on

whole numbers.

At level two, students’ thinking is still simple and single-step but has progressed to

encompass less common applications. A student at this level can be expected to

successfully:

• understand and use less common fractions and percentages;

• add fractions with different denominators;

• solve simple practical problems.

At level three, students’ thinking has progressed to the two-step stage and is

beginning to encompass abstract notation in the use of simple algebra. It can be

expected that a student at this level:

• understands percentages less than one;

• solves a two-step problem involving percentages;

• uses abstract notation in a simple question;

• solves a simple linear inequality.

The thinking of students at level four has progressed to the multi-step stage with

abstract thinking continuing into more complex applications. It can be expected that

a student at this level successfully:

• solves a multi-step problem involving percentages;

• performs calculations requiring order of operations; combinations of fractions

and square roots;

• orders positive and negative fractions and decimals;

• substitutes into complex expressions, approximating if necessary to obtain an

answer;

• manipulates abstract notation in a multi-step problem;

• rearranges a rational expression;

• solves a pair of simple inequalities.

At level five, a student’s thinking is multi-step, synthesising concepts with persistent

use of abstract notation. At this level a student can successfully:

• perform a multi-step substitution;

• solve a rational inequality with the unknown on the denominator.

Positioning of the students’ ability levels in Figure 4.2 shows that most of the

students in the study have begun tertiary studies operating at level three, four or five,

although there are some at level two and a small percentage of students who are still

characterised by level one thinking. It must be emphasised again that the Numeracy

Questionnaire covers junior high school mathematics, not the assumed senior

mathematics knowledge.

4.3.4 What influences students’ scores?

General linear model analysis was used to explore relationships between students’

scores on the Numeracy Questionnaire and their demographic and mathematical

backgrounds and attitudes. When describing linear models the convention used in

this thesis is to italicise the names of variables so as to aid the discussion thereof.

The variables used in modelling the total Numeracy Scores were formed from data

obtained in the Background Information Survey (See Section 2.3 and Appendix A)

and are described below.

Gender: male/female;

1st Semester: dichotomous variable, =1 for students for whom this was their first year of enrolment at QUT;

Years Since School: continuous variable calculated from response to ‘Year finished high school’;

OP: continuous variable, OP (overall position) or equivalent score;

Maths Student: dichotomous variable, =1 if enrolled in a mathematics degree, a double degree including mathematics, an applied science degree majoring in mathematics, or an education degree with mathematics as a teaching subject;

Repeat: dichotomous variable, =1 if student had previously failed MAB101; constructed from information supplied on mathematics subjects previously studied at QUT and course records.

The data collected from the Attitudes Survey should not, for the most part, be

expected to relate directly to the results of the Numeracy Questionnaire. However,

as the aspect self-efficacy involves mathematics and statistics, it is appropriate to

include this variable in the modelling process for numeracy. Hence we also include

the variable:

Self-Efficacy: continuous variable with possible values ranging from -10 to 10; constructed as the sum of responses to the five relevant items on the Attitudes Survey and described in Section 3.2.

Students also provided information on the mathematics subjects they had studied at

school and the results they obtained. The nesting of these variables made variable

definition somewhat complex and several possibilities were experimented with

during initial analyses. The most informative variables constructed and used for

analysis were:

Maths B Result: a categorical variable with three levels:

D = distinction (an A or B standard at school level; a 6 or 7 standard in the university equivalent subject),

P = pass (any other passing grade),

N = failed or not attempted (includes all students who have not successfully completed Maths B);

Higher Maths: a dichotomous variable, =1 for students who have studied mathematics beyond Maths B, generally either the extension maths subject (Maths C) at high school or at least some university level mathematics.

A final dichotomous variable was included to describe the cohort to which a student

belonged:

Year: 04 or 05.

The initial form used to collect background information was adapted from a form

which students in MAB101 had, in previous years, been routinely asked to complete.

As explained in Section 2.3, on this form the OP was described as “optional”. This

remained on the form in 2004 and hence resulted in an underreporting of OP scores

in 2004. Although this non-reporting was understandably non-random, the nature of

the bias was difficult to predict. This wording and the associated non-reporting were

avoided in 2005, but the inclusion and interpretation of tertiary entrance scores in

2004 or when the two years are combined or compared requires considerable caution

and care.

The advantage of using general linear models is that all available variables can be

investigated simultaneously. In the complex situation of dealing with numerous

variables which are expected to be interdependent, this type of statistical procedure is

necessary in order to examine the effect of variables in the presence of others that

cannot be ignored. In such situations a multitude of two-way correlations can be

misleading and almost impossible to interpret correctly. Two-way correlations are

not reported in this thesis so as to avoid these problems. An exception is made

in Section 8.2 in order to facilitate the comparison of results found in this work with

those quoted in previous research.

The approach taken in this thesis to fitting general linear models is to use a

combination of backwards elimination and forwards selection. Beginning with all

available main effects and all allowable two-way interactions in the model,

insignificant terms are deleted until all terms remaining in the model are significant.

Individual terms are then reconsidered and included in the reduced model if they are

significant. Not all two-way interactions can be fitted in the initial model. Clearly,

for example, no repeating student is in his or her first semester at QUT, and hence an

interaction between Repeat and 1st Semester cannot be included. Given the multiple

testing required in the case of interactions, a Bonferroni approach is taken, enforcing

more stringent requirements for significance of interactions.
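The elimination stage of this procedure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the thesis does not state the significance level, the software used, or the exact Bonferroni divisor, so the 0.05 level, the division by the number of interactions initially fitted, the function name `backward_eliminate`, the `fit_pvalues` callback and the toy p-values are all hypothetical. The later step of reconsidering deleted terms is not shown.

```python
from typing import Callable, Dict, List


def backward_eliminate(
    terms: List[str],
    fit_pvalues: Callable[[List[str]], Dict[str, float]],
    alpha: float = 0.05,  # assumed level; not stated in the thesis
) -> List[str]:
    """Drop the least significant term until all remaining terms are significant.

    Terms containing ':' are treated as two-way interactions and are held to a
    Bonferroni-style threshold: alpha divided by the number of interactions
    fitted in the initial model (an assumed choice of divisor).
    """
    current = list(terms)
    n_interactions = sum(":" in t for t in current)
    inter_alpha = alpha / n_interactions if n_interactions else alpha

    while current:
        pvals = fit_pvalues(current)  # refit the model with the current terms
        failing = [
            t for t in current
            if pvals[t] > (inter_alpha if ":" in t else alpha)
        ]
        if not failing:
            break
        # Remove the single least significant failing term, then refit.
        current.remove(max(failing, key=lambda t: pvals[t]))
    return current


# Toy p-values, invented for illustration only (a real fit would change
# the p-values each time a term is removed).
toy_p = {
    "Gender": 0.002,
    "Year": 0.035,
    "Repeat": 0.40,
    "Gender:Year": 0.03,
    "Gender:Repeat": 0.60,
}
kept = backward_eliminate(list(toy_p), lambda ts: {t: toy_p[t] for t in ts})
print(kept)
```

In the toy run the interaction with p = 0.03 would pass an unadjusted 0.05 test but fails the stricter interaction threshold of 0.025, which is the effect of the Bonferroni approach.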

Under this technique, for the two years combined and not allowing for OP, the model

which best describes the Numeracy Score involves as significant predictors: Maths B

Result (p<0.001), Maths Student (p<0.001), Gender (p=0.002), Higher Maths

(p=0.003), Self-Efficacy (p=0.003) and Year (p=0.035). This model explains 29.4%

of the variation in the Numeracy Score, and residual analyses display no systematic

concerns with the model (see Figure 4.3).

[Figure: Residual Plots for Numeracy. Four panels: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values; Histogram of the Residuals; Residuals Versus the Order of the Data.]

Figure 4.3 The residual plots for the model in Equation 4.2 show no systematic concerns

The fitted equation from this linear model is given by:

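Equation 4.2

\[
\begin{aligned}
E[\textit{Numeracy}] ={}& \underset{(\text{s.e.}=0.37)}{10.44}
 + \underset{(\text{s.e.}=0.35)}{1.10} \times (\textit{Gender} = \text{male})
 + \underset{(\text{s.e.}=0.41)}{1.23} \times (\textit{Higher Maths} = 1) \\
&+ \underset{(\text{s.e.}=0.40)}{1.57} \times (\textit{Maths B Result} = \text{D})
 - \underset{(\text{s.e.}=0.64)}{0.86} \times (\textit{Maths B Result} = \text{F/N}) \\
&+ \underset{(\text{s.e.}=0.45)}{1.69} \times (\textit{Maths Student} = 1)
 + \underset{(\text{s.e.}=0.066)}{0.20} \times \textit{Self-Efficacy}
 + \underset{(\text{s.e.}=0.34)}{0.72} \times (\textit{Year} = 05).
\end{aligned}
\]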
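To make the size of these effects concrete, the coefficients of Equation 4.2 can be applied to particular student profiles. The sketch below is only an illustration: the function name `expected_numeracy`, its argument names and the two example profiles are hypothetical, and it assumes that a pass (P) in Maths B is the reference category, since only the D and F/N levels appear in the equation.

```python
# Coefficients transcribed from Equation 4.2 (model without OP).
COEF = {
    "intercept": 10.44,
    "male": 1.10,
    "higher_maths": 1.23,
    "maths_b_D": 1.57,
    "maths_b_FN": -0.86,
    "maths_student": 1.69,
    "self_efficacy": 0.20,
    "year_05": 0.72,
}


def expected_numeracy(male, higher_maths, maths_b, maths_student,
                      self_efficacy, year_05):
    """Expected Numeracy Score under Equation 4.2.

    maths_b is 'D', 'P' or 'F/N'; 'P' is assumed to be the reference level.
    self_efficacy is the summed attitude score (-10 to 10).
    """
    if maths_b not in ("D", "P", "F/N"):
        raise ValueError("maths_b must be 'D', 'P' or 'F/N'")
    score = COEF["intercept"]
    score += COEF["male"] * bool(male)
    score += COEF["higher_maths"] * bool(higher_maths)
    score += COEF["maths_b_D"] * (maths_b == "D")
    score += COEF["maths_b_FN"] * (maths_b == "F/N")
    score += COEF["maths_student"] * bool(maths_student)
    score += COEF["self_efficacy"] * self_efficacy
    score += COEF["year_05"] * bool(year_05)
    return score


# Hypothetical profiles, chosen only to show the range of fitted values.
baseline = expected_numeracy(False, False, "P", False, 0, False)
strong = expected_numeracy(True, True, "D", True, 5, True)
print(round(baseline, 2), round(strong, 2))
```

The first profile returns the intercept alone; the second adds every positive term, with a self-efficacy score of 5 contributing 0.20 × 5 = 1.00.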

When the predictor variables: Maths B Result, Maths Student, Higher Maths and

Self-Efficacy are examined, it is not surprising to find that they are highly

interdependent. Maths students are more likely to have studied at a higher level, to have

obtained a higher Maths B result and to have higher self-efficacy. Similarly those

who have studied higher level maths tend to have better Maths B results and higher

self-efficacy. In fact all the two-way relationships between these four variables are

highly significant (p<0.001) and in the expected direction. It is all the more

interesting then that each of these four variables contributes a significant (albeit

small) increase to the Numeracy Score when all the other variables have been

allowed for.

A similar effect is noted for the variable Gender. Males are more likely to be maths

students, to have studied higher level maths and to have higher self-efficacy, but not

necessarily to have obtained better results in Maths B. Still, there is an additional

advantage in the Numeracy Score associated with being male, even after allowing for

the other variables.

Although TIMSS 2002/03 showed that there was no significant difference between

genders in overall scores in Australian schools at approximately 13 years of age, there

was a significant difference in the number section of the TIMSS assessment for this

group with males outperforming females (Thomas and Fleming, 2004). As this

section of the TIMSS assessment covered the topics of whole numbers, fractions

and decimals, integers, ratio, proportion and percent, the topics most heavily

represented in the Numeracy Questionnaire, the higher scores of males here may not

be unexpected.

The difference between the two years may be attributable to a difference in the two

cohorts or perhaps to different administrative practices in the two years. As

described in Section 4.3, in 2005 students were told they would receive feedback on

their performance on the Numeracy Questionnaire for their own benefit, and this may

have motivated some students to exercise greater effort. When the two years are

considered separately (without including OP), significant predictors for each year are

subsets of the predictors for combined years. For 2004, the best and most

interpretable model (R-squared=21.4%) involves Maths Student (p<0.001), Higher

Maths (p<0.001) and Self-Efficacy (p=0.002); while for 2005 (R-squared=29.7%) it

involves Maths Student (p<0.001), Maths B Result (p<0.001) and Gender (p=0.001).

When the OP score is included in the analysis there is in fact little difference in the

best model with Gender (p<0.001), Higher Maths (p<0.001), Maths B Result

(p=0.001), Self-Efficacy (p=0.001), Maths Student (p=0.005) and Year (p=0.017) all

remaining significant. OP is also significant in this model with p=0.024. The R-

squared value increases to 39% due partially to the extra variation in Numeracy

which has been explained by allowing for OP, but also to the drop in error degrees of

freedom (from 418 to 304) caused by the number of students who did not report an

OP or equivalent score. The residual plots for the model, given in Figure 4.4, show a

slight indication of some long-tailedness in the distribution but not sufficient to be of

concern.


[Figure: Residual Plots for Numeracy. Recovered panel titles: Normal Probability Plot of the Residuals; Residuals Versus the Fitted Values. Recovered axis labels: Percent, Fitted Value, Standardized Residual, Frequency, Observation Order.]

Figure 4.4 The residual plots for the model in Equation 4.3 show slight indication of long-tailedness.

The fitted equation for the linear model allowing for OP is given by:

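Equation 4.3

\[
\begin{aligned}
E[\textit{Numeracy}] ={}& \underset{(\text{s.e.}=0.66)}{11.26}
 + \underset{(\text{s.e.}=0.36)}{1.34} \times (\textit{Gender} = \text{male})
 + \underset{(\text{s.e.}=0.43)}{1.67} \times (\textit{Higher Maths} = 1) \\
&+ \underset{(\text{s.e.}=0.44)}{1.02} \times (\textit{Maths B Result} = \text{D})
 - \underset{(\text{s.e.}=0.75)}{1.69} \times (\textit{Maths B Result} = \text{F/N}) \\
&+ \underset{(\text{s.e.}=0.48)}{1.37} \times (\textit{Maths Student} = 1)
 + \underset{(\text{s.e.}=0.070)}{0.23} \times \textit{Self-Efficacy}
 - \underset{(\text{s.e.}=0.066)}{0.15} \times \textit{OP} \\
&+ \underset{(\text{s.e.}=0.36)}{0.87} \times (\textit{Year} = 05).
\end{aligned}
\]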

It should be noted in Equation 4.3 that a smaller numerical value of OP is an

indicator of higher achievement and hence the negative coefficient of OP is to be

expected. OP scores range from a high of 1 to a low of 25, although the lowest score

over the two cohorts is 17. OP of course is also highly correlated with Maths B

Result, Maths Student, Higher Maths and Self-Efficacy. Students with better Maths

B results and those with higher self-efficacy tend to have better OP scores as do

those studying mathematics and those who have previously studied higher level

mathematics. Indeed, all the correlations between each pair of these five variables

are highly significant and in the expected direction. Given these interrelationships

and the non-random non-reporting of OP in one year of the study, it is both

interesting and surprising that exactly the same variables appear in Equations 4.2 and

4.3 with very similar coefficients. It is important that, even when OP is allowed for,

the result in Maths B, whether or not higher level mathematics has been studied, the

student’s interest in mathematics (one implication of whether a student is a maths

student or otherwise) and the student’s self-efficacy in mathematics each contribute a

significant (albeit small) increase to the Numeracy Score.
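As a worked illustration of the OP coefficient in Equation 4.3, consider two students who are alike on every other variable, one with the best possible OP of 1 and one with an OP of 17, the lowest reported in the two cohorts. The symbol \(\Delta\) is notation introduced here for the difference in expected scores; the comparison uses only the fitted coefficient and the OP values quoted above.

```latex
\[
\Delta E[\textit{Numeracy}] = -0.15 \times (1 - 17) = 0.15 \times 16 = 2.4
\]
```

Across the full range of OP observed in the study, the model therefore attributes a difference of about 2.4 points to OP, holding the other predictors fixed.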

4.3.5 What influences responses to individual questions?

For each item in the Numeracy Questionnaire, a logistic regression was performed on the

dichotomous response variable correct/incorrect using the six predictors (excluding

OP) which were significant in explaining total scores, namely: Gender; Maths B

Result; Higher Maths; Maths Student; Self-Efficacy and Year. This allows us to see

which of the students’ characteristics are most important in explaining their success

at a particular question, remembering that the presence or absence of one

characteristic is dependent on the presence or absence of the others. Table 4.4 gives

these results with questions arranged in increasing order of difficulty. The p-values

indicated are those obtained while allowing for all other significant variables. While

acknowledging that some spurious results may be included here as a result of much

testing, all individually significant terms are noted for investigative purposes.
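The form of the model fitted for each item can be written out as follows. This is a sketch under stated assumptions: the subscripts \(i\) (student) and \(j\) (item), the coefficients \(\beta_{kj}\) and the indicator coding of the categorical predictors are notation introduced here, chosen to mirror the coding of the linear model for total scores; the thesis does not give the parameterisation explicitly.

```latex
\[
\begin{aligned}
\log\frac{P(Y_{ij}=1)}{1-P(Y_{ij}=1)} ={}& \beta_{0j}
 + \beta_{1j}\,[\textit{Gender}_i=\text{male}]
 + \beta_{2j}\,[\textit{Maths B Result}_i=\text{D}]
 + \beta_{3j}\,[\textit{Maths B Result}_i=\text{F/N}] \\
&+ \beta_{4j}\,[\textit{Higher Maths}_i=1]
 + \beta_{5j}\,[\textit{Maths Student}_i=1]
 + \beta_{6j}\,\textit{Self-Efficacy}_i
 + \beta_{7j}\,[\textit{Year}_i=05]
\end{aligned}
\]
```

Here \(Y_{ij}=1\) when student \(i\) answers item \(j\) correctly, and a positive coefficient means higher odds of a correct response with the other predictors held fixed.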

Question   Gender   Maths B Result   Higher Maths   Maths Student   Self-Efficacy   Year

level one
N_5
N_8 * + **
N_1 ***
N_11 * +

level two
N_9 *** *** ***
N_7 * ***
N_3 *** ***
N_10 *** *** *

level three
N_6 ***
N_2 ***
N_19 *** *** *

level four
N_12 *** *** *
N_17 ** *** **
N_21 *** ***
N_4 ***
N_16 ***
N_13 *** ***
N_18 *** * **
N_14 *** * *

level five
N_20 *** * *
N_15 * *** *

*** p<0.005, ** 0.005<p<0.01, * 0.01<p<0.05; + opposite direction to that expected

Table 4.4 Significance of predictors in the logistic regression for each item on the Numeracy Questionnaire
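The caution about spurious results can be given a rough scale. Table 4.4 covers 21 items and six predictors, so a simple count gives the number of tests and the number of significant results expected by chance alone. The sketch assumes one test per predictor per item, a 0.05 level (the weakest level flagged in the table legend) and no true effects at all; these are simplifying assumptions made here, not calculations reported in the thesis.

```python
n_items = 21        # items N_1 to N_21 listed in Table 4.4
n_predictors = 6    # Gender, Maths B Result, Higher Maths, Maths Student,
                    # Self-Efficacy, Year
alpha = 0.05        # assumed level: the weakest flag (*) in the table legend

n_tests = n_items * n_predictors
# Expected number of "significant" results if no predictor had any effect.
expected_false_positives = n_tests * alpha

print(n_tests, round(expected_false_positives, 1))
```

Under these assumptions roughly six of the 126 entries could be flagged by chance, which is why isolated single-star results in the table are best read as prompts for investigation rather than as findings.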

Given that the level of items in the questionnaire is consistent with junior high school

mathematics, it is particularly interesting to note for which questions success is

strongly influenced by studying higher level (i.e. beyond Maths B) mathematics.

Apart from N_15 (an item which was found difficult across the range of student

characteristics), questions which require application of fractions beyond addition

(N_12, N_17, N_18, N_20) are all dependent on students having studied higher level

mathematics. This is consistent with comments made earlier regarding students’

tendency to regress in basic skills when application thereof is required.

Other questions which depend on higher level mathematics include those involving

inequalities (N_19, N_20, N_21). Interestingly, two of these questions (N_19,

N_20) also depend on being a maths student. Anecdotal evidence from teachers

suggests that inequalities are currently de-emphasised in the Queensland high school

curriculum. This is undoubtedly the reason that success in this area is heavily

dependent on studying higher level mathematics and having an interest in

mathematics.

One further question for which Higher Maths is a significant predictor is N_7. This

question asks:

A group of 340 students must be divided into lab classes with a maximum of 30

students in each. The smallest number of lab classes needed is:

It is important to emphasise that such a practical question is significantly influenced

by the study of mathematics beyond the standard senior algebra and calculus based

course because it is not sufficiently realised how much higher level mathematics

develops generic problem-solving skills. Further consideration of responses to this

question is even more enlightening. Table 4.5 gives the percentage of students who

gave each possible response for the three groups: those who had not studied Maths

B; those who had studied at the Maths B level; and those who had studied beyond

Maths B.

Not only does the success rate continue to increase with the level of mathematics

studied, but the percentage who choose the meaningless answer of 11.3 also falls

noticeably. Many students are encouraged to choose not to study Maths B or higher

level mathematics on the basis that algebra and calculus are more specialised and less

applied and are not necessarily linked with ‘real life’ skills. This example clearly

questions such beliefs.

Response       % students        % students      % students
               below Maths B     at Maths B      above Maths B

10                  0.0               7.6              2.6
11                  7.4               7.2              2.6
11.3               14.8               9.6              4.1
12                 66.7              75.3             89.7
13                  7.4               0.3              0.5
no response         3.7               0.0              0.5

Table 4.5 Student responses to item N_7 according to mathematical background
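The response options in Table 4.5 can be checked against the arithmetic of item N_7. The short sketch below, with variable names chosen here for illustration, shows why 11.3 is the raw quotient rather than an answer, and why the quotient must be rounded up rather than down.

```python
import math

students = 340
max_per_class = 30

# The raw quotient is not a whole number of classes.
exact_ratio = students / max_per_class

# Rounding up gives the smallest number of classes that seats everyone.
classes_needed = math.ceil(students / max_per_class)

# Rounding down would leave some students without a place.
seats_with_11 = 11 * max_per_class
unplaced_with_11 = students - seats_with_11

print(round(exact_ratio, 1), classes_needed, unplaced_with_11)
```

Eleven classes seat only 330 students, leaving ten unplaced, so twelve is the smallest feasible number of classes.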

4.4 Discussion

The debate as to whether or not educational standards are falling is common among

educators, the media and the general public. This study does not attempt to address

this issue. Undoubtedly high school graduate capabilities have risen in some areas

and fallen in others. However tertiary educators need to be aware of which skills can

or cannot be assumed of the majority of their incoming students.

A senior algebra and calculus based mathematics course is a prerequisite for many

scientific or quantitative tertiary courses, and if not a formal prerequisite, the content

of senior mathematics is likely to be considered ‘assumed prior knowledge’. Such a

course depends crucially on pre-senior mathematics. When commenting on tertiary

students’ backgrounds, many tertiary educators consider only the courses of senior

schooling. In considering mathematical skills it is essential to consider the effects of

the pre-senior years. Senior algebra and calculus based mathematics courses will

consolidate pre-senior mathematics, but the extent of consolidation clearly depends

on the students’ pre-senior mathematics experiences. Hence tertiary educators need

greater awareness of the emphasis (or lack of emphasis) on mathematical skills in the

pre-senior years, and the extent of consolidation needed at the senior school and even

tertiary level for students to be able to apply these skills in the new or multi-step

situations that characterise so many tertiary areas. Observing that 34% of the cohorts

in both years are operating with a numeracy ability at or below the lower boundary of

level three, as defined by this questionnaire, emphasises that tertiary educators

should be aware that individuals, even with a senior algebra and calculus based

background, may need help with some of the basics, particularly in unfamiliar or

multi-step situations.

There is a growing tendency within universities to maximise potential student intake

by reducing the number of prerequisite subjects. In quantitative areas this has

resulted in the removal of higher level mathematics as a prerequisite for any field of

study, and of the standard algebra and calculus based course for many areas. Due to

the increased variety of subjects available at senior high-school level, this removal of

mathematics as a prerequisite has resulted in a number of senior students opting for a

non-algebra, non-calculus based alternative or no mathematics at all. This decision

is rarely challenged, and is even encouraged, by parents and guidance officers who

consider that unless specific mathematical content will be used in further study, then

it is not valuable. There is a general lack of understanding across the community of

the generic problem-solving skills that are acquired by students in studying specific

mathematical skills, and of the amount of study of mathematics that is required to

attain the Cockcroft (1982) ‘at-home-ness’ with mathematics. Compare this with

language studies in which the study of literature is rarely questioned.

This study demonstrates that in order to have full and confident use of basic pre-

senior mathematical skills, mathematics needs to be studied beyond, and preferably

well beyond, the pre-senior level. This enables students to consolidate their basic

mathematical skills to the point where they can be confidently and reliably used and

applied, particularly in multi-step problems. Without such consolidation, skills

which may be known in isolation are lost in application and the ‘at-home-ness with

numbers and ability to make use of mathematical skills’ of Cockcroft’s numeracy is

not developed. The conclusion which must be drawn by those advising school

students is that students should be encouraged to continue with mathematics to the

highest level of their ability. Only then can they be certain of maximising their

competency with even basic mathematical skills.

Chapter 5 The Statistical Reasoning Questionnaire

5.1 Introduction

The educational reforms on statistics, which have taken place over the last ten to

twenty years, could be largely summarised by the observation that the generally

agreed upon course goal is the development of statistical reasoning. This approach

has included highlighting the importance of authentic assessment that focuses on

statistical reasoning and the need for specific research evaluation tools. While it is

generally agreed that statistical reasoning is best assessed via one-on-one

communication and the analysis of student work in in-depth tasks such as projects, it

is also acknowledged that instruments are needed which are easy to administer and

score for the purpose of accurately evaluating statistical reasoning, particularly in a

research environment (Garfield, 2003).

The Statistical Reasoning Assessment (SRA) was developed in order to measure the

effectiveness of a new statistics curriculum for high-school students in the US

(Konold, 1990; Garfield, 1991, 2003) and has been used in a variety of settings since.

This instrument, which consists of 20 multiple-choice questions, focuses on

reasoning about data, representations of data, statistical measures, uncertainty,

samples and association. In the development stage of the SRA, it was reported that

low internal reliability coefficients suggested that the instrument did not measure a

single trait, leading to some confusion as to how best to interpret the SRA results

(Garfield, 2003). One interpretation involves the calculation of eight scales of

correct reasoning and eight scales of incorrect reasoning (or misconceptions). Each

of these 16 scales is measured by between one and five items with most items

contributing to both a correct and an incorrect reasoning scale and some items

contributing to more than one incorrect reasoning scale depending on the distracter


which is chosen. Garfield (2003) reports these 16 scales in a cross-cultural

comparison but relies on the aggregate correct reasoning score and the aggregate

incorrect reasoning score for statistical analysis.
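To illustrate what an internal reliability coefficient measures, the sketch below computes Cronbach's alpha, one widely used coefficient of this kind. The cited work is not quoted here as naming the coefficient, so the choice of Cronbach's alpha is an assumption, and the two small response matrices are invented for illustration and are not SRA data.

```python
from statistics import variance
from typing import List


def cronbach_alpha(scores: List[List[float]]) -> float:
    """Cronbach's alpha; scores[r][i] is respondent r's score on item i."""
    k = len(scores[0])
    # Variance of each item across respondents.
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the respondents' total scores.
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical 0/1 responses: four respondents, three items.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]  # items always agree
mixed = [[1, 0, 1], [0, 1, 0], [1, 1, 1], [0, 0, 0]]       # items partly disagree

alpha_consistent = cronbach_alpha(consistent)
alpha_mixed = cronbach_alpha(mixed)
print(round(alpha_consistent, 2), round(alpha_mixed, 2))
```

When items rise and fall together the coefficient approaches 1; when they partly disagree it falls, which is the pattern that leads to doubt about whether a set of items measures a single trait.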

Questioning the use of aggregate scores given their reported low reliability,

Tempelaar (2004) prefers to analyse the 16 individual scales, each of which is

assumed to be reliable, as well as the correct and incorrect reasoning aggregates.

Tempelaar also suggests (and later uses in the context of structural equation

modelling (Tempelaar, 2006)) combining individual correct and incorrect reasoning

scales in accordance with the results of factor analysis. Under such a scheme, pairs

of correct and incorrect reasoning scales would be chosen for combination by virtue

of large negative correlations. As Tempelaar describes, these correlations are a result

of the design of the scales since the pairs largely consist of correct responses (in the

case of the correct reasoning scale) and selected incorrect responses (in the case of

the incorrect reasoning scale) to the same questions. Such a solution seems to be

little more than a complex construction of a partial credit scheme where the choice of

some distracters (namely those indicating a particular misconception) is seen to be

“more wrong” than the choice of other incorrect responses.

One difficulty with analysing the individual scales from the SRA would appear to be

the small number of items (between one and five) on which each scale is based. A

larger number of items with a range of difficulty would be preferable to meaningfully

interpret individual scales. This is particularly the case if comparisons are to be

made between the results for different scales as Tempelaar has done. Of course,

increasing the number of items on each scale would quickly lead to a questionnaire

of considerable length and hence impracticality with regard to its administration.

The Quantitative Reasoning Quotient (QRQ) (Sundre, 2003) is an adaptation of the

SRA developed to address the limitation of low internal consistency and to improve

the ease of scoring with regard to some of the more complex items on the SRA. In

particular, items which require students to identify the correct rationale from a list

were adapted so that students needed to indicate their agreement or disagreement

with each correct or incorrect rationale presented. The aim of this was to elicit

further information regarding students’ misconceptions from single items, consistent


with the view (Konold, 1995) that students can simultaneously hold multiple

contradictory beliefs, and also to increase the number of items from the same

domain, hopefully leading to an improvement in internal reliability.

In keeping with the use of the SRA, the QRQ focuses on scales of correct and

incorrect reasoning, rather than a total score. It includes three additional scales of

correct reasoning and seven additional scales of incorrect reasoning, although many

of the added scales of incorrect reasoning simply indicate a lack of a particular aspect

of correct reasoning, for example “failure to recognise potential sources of bias and

error”. Identification of such deficiencies seems to add little to the understanding of

statistical reasoning beyond the information contained in the corresponding correct

reasoning scale. This is in contrast to the majority of the original incorrect reasoning

scales which measure previously documented misconceptions such as an outcome

orientation (Konold, 1989) and the representativeness misconception (Kahneman et

al., 1982; Shaughnessy, 1992) which are applied within and across a range of

situations.

In another approach, Watson and Callingham (2003) use Item Response Theory to

analyse 80 questionnaire items administered to over 3000 Australian school children

and argue that statistical literacy is a single construct. While the collection of much

of these data was via interview, some involved the administration of a pen and paper

instrument focusing specifically on the understanding of variation. Included in the

80 items are a number of items also included in the SRA, as well as items specific to

the school curriculum and items taken from the media. One conclusion from this

work would be that it should be possible to measure statistical reasoning as a single

trait without resorting to individual scales which consider various aspects of correct

and incorrect reasoning. This need not detract from the analysis of individual items

or combinations thereof with the intent of adding to the understanding of students’

reasoning and misconceptions.

In this chapter, we introduce an instrument, the Statistical Reasoning Questionnaire

(SRQ), for use at the interface of secondary and tertiary education. Section 5.2

describes the construction of the SRQ, drawing on the work of Garfield, and Watson

and Callingham. Section 5.3 details the responses of the students in the study to the


individual items of the SRQ. Section 5.4 begins a discussion of the suitability of the

SRQ as an instrument to measure statistical reasoning at the secondary/tertiary

interface. This discussion continues in Section 6.5 after further analyses of the SRQ

are carried out in Chapter 6 using Rasch methods.

5.2 Construction of the Statistical Reasoning Questionnaire (SRQ)

The Statistical Reasoning Questionnaire (SRQ) was developed specifically for this

study to assess the statistical reasoning of students entering the unit MAB101,

Statistical Data Analysis 1, at the Queensland University of Technology. As

described in Chapter 2, students in this unit are enrolled in a science degree or a

degree program associated with science. In constructing the SRQ we considered

each item of the Statistical Reasoning Assessment (SRA) (Konold, 1990; Garfield,

1991, 2003) and the items of Watson and Callingham (2003), in the context of

statistical reasoning at the secondary/tertiary interface. We chose three items

common to these two instruments and adapted two more. Another two items from

the SRA and six items from the Watson and Callingham Questionnaire were

included. The remaining nine items were developed based on experience at the

secondary/tertiary interface, with five of these specifically relevant to Australian

school syllabi, particularly the officially assumed knowledge for the unit. Table 5.1

indicates the sources and aspects of reasoning of the 22 items. The complete SRQ

can be found in Appendix F and the questions are also given in the discussion of

Section 5.3.
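The counts given above for the sources of the SRQ items can be tallied to confirm that they account for all 22 items. The dictionary keys below are short labels written here for illustration; the counts are those stated in the text.

```python
# Item counts by source, as stated in the description of the SRQ.
srq_item_sources = {
    "common to the SRA and Watson and Callingham": 3,
    "adapted from existing items": 2,
    "other SRA items": 2,
    "other Watson and Callingham items": 6,
    "newly developed for this study": 9,
}

total_items = sum(srq_item_sources.values())
print(total_items)
```

The five groups sum to 22, matching the number of items indicated for Table 5.1.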

In developing the SRQ, we referred to the six aspects of statistical reasoning listed by

Garfield (2003). These are:

• Reasoning about data: recognising data types and understanding

implications for use;


• Reasoning about representations of data: being able to read and interpret a

graph and understand how it represents a sample; seeing beyond individual

data points to general characteristics of the distribution of the data;

• Reasoning about statistical measures: understanding and using appropriate

measures of central tendency and variation to describe and compare data;

knowing that such summaries are better for large samples;

• Reasoning about uncertainty: understanding the concept of probability to

make decisions about uncertain events; being able to determine and compare

simple probabilities;

• Reasoning about samples: understanding the relationship between a sample

and a population; knowing the importance of size and representativeness;

• Reasoning about association: being able to interpret relationships between

two variables, including reading of scatter plots and two-way tables; knowing

the difference between association and causality.

This list does not explicitly mention reasoning about variation which we also

included in our reference framework. Garfield considered the above aspects of

statistical reasoning to encompass the skills covered by a new US high school

curriculum which the SRA was designed to assess. The essential elements of these

aspects, together with reasoning about variation, were considered by us to be an

appropriate basis for assessing the statistical reasoning of students at the

secondary/tertiary interface, although we place less emphasis on determining and

comparing simple probabilities.

The first of Garfield’s aspects, reasoning about data, is assessed only implicitly in the

SRA. MAB101 includes specific and explicit teaching regarding types of data and

this emphasis is continued throughout the course in the teaching of techniques for

data analysis. This approach increases continuity within the course and helps

students to acquire skills in selecting statistical tools appropriately. While it would

be desirable to know the preparedness of students for this approach, it is difficult to

assess such understanding without relying on specifically taught vocabulary such as

the terms categorical and continuous variables. Hence in the SRQ, assessment of


reasoning about data, in the Garfield sense of recognising and using data types,

remains implicit.

The second aspect, reasoning about representations of data, is also covered only

implicitly by the SRA. As the students at the secondary/tertiary interface in this

study have generally been well exposed to reasoning about representations of data,

and the construction of items which assess this specific aspect of reasoning is

uncomplicated, it was felt that this was a distinct area of reasoning which warranted

explicit coverage in the SRQ. One aspect we identified in this area of reasoning is a

commonly held misconception “prettier is better” by which people believe that a

more complex graphical representation is preferable to a simpler version. This

misconception is frequently developed during primary school where increased access

to computer programs enables students to select an embellished graph without regard

for understanding. The example used in the SRQ is a three-dimensional pie graph.

Absent from Garfield’s list of aspects of statistical reasoning is reasoning about

variation. This is somewhat unexpected given the emphasis in the literature on the

recognition and understanding of variation as a vital component of statistical

reasoning (see for example, Chance and Garfield (2002), Meletiou-Mavrotheris and

Lee (2002) and Watson et al. (2003)). However, some aspects of reasoning about

variation are covered in the SRA with two questions contributing to a correct

reasoning skill labelled “understands sampling variability”, which may be seen as a

component of reasoning about samples. Perhaps reasoning about variation is so

fundamental to statistical reasoning that, like reasoning about data, it too is thought

to be implicit in most aspects of reasoning. In the SRQ, reasoning about variation

features implicitly in many items and explicitly in SRQ_2, SRQ_10, SRQ_7, SRQ_9,

SRQ_13 and SRQ_16. The last four of these items also describe reasoning about

samples and hence we refer to this aspect as Reasoning about Samples and Variation.

One aspect of the SRA which was eliminated from the SRQ is a heavy reliance on

questions involving tossing of coins and rolling of dice, with seven out of twenty

questions of the SRA being framed in that context. Such reliance is typical of

traditional examples in the area of chance. There appear to be two main reasons for

this. The first is that because these contexts are familiar, they can be made accessible



to students with a minimum of explanation, a useful feature particularly with regard

to assessment items. The second reason for the reliance on these contexts is that

much of the research into the development of young children’s understanding of

chance is set in this context. The second reason depends partly on the first and also

on the fact that experiments with dice and coins (also cards, coloured marbles and

spinners) can be easily carried out in the classroom.

It has been previously argued (Gal and Garfield, 1997; Wild and Pfannkuch, 1999)

that context is crucial in statistical reasoning and that students must be involved in

genuine contexts in order to fully develop their statistical reasoning. While educators

undoubtedly intend that dice and coins are simple representations of more general

and complex contexts, the assumption that students of any age at an introductory

level are capable of making the leap from dice, coins and cards to people, days and

products is unfounded. Research indicates that errors in probabilistic reasoning are

frequently associated with problems referring to everyday life (Kahneman, et al.,

1982) even for those with some formal training (Kahneman and Tversky, 2000). It

may be that over-reliance on artificial and simple contexts prevents students from

taking skills and understanding to real world contexts. Hence we believe that such

artificial examples are better avoided wherever possible in examples and assessment.

The four questions involving dice in the SRA constitute the section “uses

combinatorial reasoning” of the scale “correctly computes probability” and are also

intended to detect either the outcome orientation misconception (in which a

probability is interpreted in terms of an individual outcome rather than a series of

events) or the equiprobability bias (which assumes that all events under consideration

are equally likely). However, the heavy reliance of these questions on calculation

means that incorrect responses which are attributed to particular misconceptions are

at least as likely to be the result of incorrect numerical calculation as an application

of a misconception. An example of this is found in the SRA question which asks

which of the outcomes: 5 black and 1 white, or 6 black, is more likely when a die

with five black sides and one white side is rolled six times. These two possible

outcomes are too close in probability to be intuitive to most people and hence require

the application of the binomial distribution. Such use of combinatorics to calculate

probabilities was an earlier foundation of Queensland senior high-school



mathematics classes in probability, but is misplaced when included in an

introductory data analysis course. It is an example of emphasising calculation, and in

fact irrelevant calculation, possibly at the expense of understanding. Perhaps the

relative emphasis on questions of this nature in the SRA reflects an emphasis in the

curriculum which the SRA was designed to assess. However, as such specific

knowledge of the behaviour of probability distributions and combinatorial reasoning

is not required for MAB101, nor is it an emphasis in the current core Queensland

senior high-school mathematics curriculum, the SRQ does not include any questions

of this nature.
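
To illustrate why the two outcomes in the die question are hard to separate by intuition, the following minimal sketch computes both probabilities from the binomial distribution. It assumes a fair die with five black faces and one white face and six rolls (the number of results in each of the two listed outcomes); the function name binomial_pmf is illustrative only.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

p_black = Fraction(5, 6)   # five of the six faces are black
n_rolls = 6                # assumption: six rolls, one result per roll

p_five_black = binomial_pmf(5, n_rolls, p_black)   # 5 black and 1 white
p_six_black = binomial_pmf(6, n_rolls, p_black)    # 6 black

print(float(p_five_black))   # about 0.40
print(float(p_six_black))    # about 0.33
```

Under these assumptions the two probabilities differ by only about 0.07, which is consistent with the point that a correct answer depends on calculation rather than intuition.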

The three coin tossing items in the SRA are intended to constitute the correct

reasoning scale “understands independence” and are also intended to detect the

representativeness misconception by which people determine the likelihood of an

event by how well it represents a population. For example, one such question reads:

Which of the following sequences is most likely to result from flipping a fair coin 5

times?

a) HHHTT

b) THHTH

c) THTTT

d) HTHTH

e) all four sequences are equally likely.

(Responses a, b and d are taken to indicate a representativeness misconception.)
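
As a brief check of the intended answer, the sketch below computes the probability of each specific ordered sequence, assuming independent tosses of a fair coin. The helper name sequence_probability is hypothetical.

```python
from fractions import Fraction

def sequence_probability(sequence, p_heads=Fraction(1, 2)):
    """Probability of one specific ordered sequence of independent tosses."""
    prob = Fraction(1)
    for toss in sequence:
        prob *= p_heads if toss == "H" else 1 - p_heads
    return prob

sequences = ["HHHTT", "THHTH", "THTTT", "HTHTH"]
probabilities = {s: sequence_probability(s) for s in sequences}

for s, p in probabilities.items():
    print(s, p)   # every specific sequence has probability 1/32
```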

Care needs to be taken with the interpretation of this scale as the application of

independence in these three items is within the single setting of binomial trials. It is

highly likely that a student may have developed an understanding of independence in

this setting without extending it to other contexts. It is in the application of

independence to real world contexts that students (and professionals) frequently

come to grief. The SRQ uses just one coin tossing item, set in a sporting context for

greatest familiarity.

An emphasis in the work of Watson (1993; 1997) has been on the use of examples

from the media or set in a media context to assess statistical literacy. For much of



society, the media are the most common source of statistical information and provide

a context which is appreciable to a wide range of students including those at the

secondary/tertiary interface. Hence, a number of items of this nature are adapted

from Watson and Callingham (2003) for the SRQ.

A group of curriculum-specific questions (SRQ_17 to SRQ_20) were developed for

the SRQ requiring an understanding of quantiles. These questions are set in the

context of data which has been transformed by taking the logarithm to base ten, and

hence require students to combine two distinct areas of knowledge from their senior

mathematics studies. Because of the specific curriculum-dependent nature of this set

of questions and the level of mathematical knowledge they require, they are not

included in some of the analyses which are described in Chapter 7 where

relationships between statistical reasoning, numeracy and background are explored.

Another curriculum-specific question (SRQ_22) was added to the SRQ in its second

year, based on the use of the normal distribution. Inclusion of this question was

prompted by classroom observation that first-year students appeared to have less

confidence with the normal distribution than in previous years. This question

requires students to transform to the standard normal distribution and to interpret

probability as an area under the curve, utilising the symmetry of the distribution.

Another change to the SRQ in its second year was the removal of SRQ_4 and

addition of SRQ_21. The first of these questions was removed because of an

extremely high success rate of 96%. In fact the replacement question proved to be

almost as easy for students with a success rate of 94%. In the analyses of Chapter 7,

only questions common to both years are included.

As well as selecting questions to cover the different aspects of reasoning, items were

intentionally chosen with varying degrees of difficulty. In the Watson and

Callingham study, Rasch analysis is used to define levels of reasoning from Level 1:

Idiosyncratic to Level 6: Critical mathematical. As eleven of the items on the SRQ

appear in or are similar to the Watson and Callingham study, including some

common to the SRA, the Rasch analysis variable map could be used to select items

whose responses covered the full range of levels. For each question on the SRQ,

Table 5.1 reports the levels of understanding demonstrated by a fully correct

response to that or similar question in the Watson and Callingham study. As a partial

credit model was used in that study, the lower levels of understanding, particularly

levels 1 and 2, are generally consistent with partially correct responses to items.

Also included in Table 5.1 is the source of each question, as well as aspects of

reasoning, skill and misconception assessed by each item.

| Question | Source | Aspect (Reasoning about…) | Skill | Misconception | Watson & Callingham Level |
|---|---|---|---|---|---|
| SRQ_1 | SRA; W&C | Statistical measures | Understands algorithm for mean | Mean is most common value; Mean is middle value | 4 |
| SRQ_2 | SRA; W&C* | Statistical measures; Variation | Understands how to select an appropriate average | Mean is most common value; Mean calculated regardless of outliers | 6 |
| SRQ_3 | SRA; W&C** | Uncertainty | Understands probability as a ratio | | 4 |
| SRQ_4 | SRA; W&C | Uncertainty | Correctly interprets probabilities | Outcome approach | 3 |
| SRQ_5 | W&C | Uncertainty | Correctly interprets probabilities | Conjunction; Availability | 3 |
| SRQ_6 | W&C | Uncertainty | Understands independence | Representativeness | 4 |
| SRQ_7 | SRA; W&C | Samples; Variation | Understands importance of large samples | Law of small numbers; Outcome approach | 5 |
| SRQ_8 | W&C | Representations of data | Knows that sum of percentages should be 100% | | 5 |
| SRQ_9 | W&C | Samples; Variation | Understands sampling-re-sampling process | | 6 |
| SRQ_10 | SRA | Uncertainty; Variation | Correctly interprets probabilities | Outcome approach | |
| SRQ_11 | W&C | Representations of data | Recognises importance of scale on graph | | 6 |
| SRQ_12 | | Representations of data | Can choose an appropriate graph | Prettier is better | |
| SRQ_13 | SRA | Samples; Variation | Understands sampling variation | Law of small numbers | |
| SRQ_14 | W&C | Statistical measures; Association | Understands importance of rate | Correlation implies causality | 6 |
| SRQ_15 | | Association | Distinguishes between correlation and causality | Correlation implies causality | |
| SRQ_16 | | Samples; Variation | Understands sampling variation | | |
| SRQ_17 | | Statistical measures | Can calculate a median | | |
| SRQ_18 | | Statistical measures | Can calculate a lower quartile | | |
| SRQ_19 | | Statistical measures | Understands effect of data transformation on median | | |
| SRQ_20 | | Statistical measures | Understands effect of data transformation | | |
| SRQ_21 | | Representations of data | Recognises importance of scale on graph | | |
| SRQ_22 | | Uncertainty | Understands use of standard normal curve | | |

Table 5.1 Questions in the SRQ are chosen to assess all aspects of reasoning at the full range of levels

* SRQ_2 was a multiple-choice question in SRA and W&C

** SRQ_3 used less obvious values than in SRA and W&C

5.3 Results of the SRQ

MAB101 students completed the Statistical Reasoning Questionnaire in class during

the first week of semester. For the SRQ, students were permitted to use calculators

(unlike the Numeracy Questionnaire). Although the majority of students found the

time in class ample, a handful of students who required more time opted to complete

the questionnaire at home. It was clearly explained to students that, although the

questionnaire was relevant to their studies in the subject, the results obtained would

have no bearing on their grade, but would be used solely for research purposes. The

majority of students who attended classes during the first week of semester were

willing to participate, with approximately 66% of enrolled students completing the

SRQ.

The SRQ was administered to three separate cohorts of students: first semester 2004,

second semester 2004 and first semester 2005. Where students were included in the

study within more than one cohort, only one set of data was included. This was

selected such that it belonged to the cohort in which the student had completed a

larger number of survey instruments over the entire study, or, where an equal number

had been completed, the most recent cohort was selected. There were ten students

for whom such a decision needed to be made with regard to the SRQ.

The distribution of results over the three cohorts was remarkably similar. For the

nineteen questions common to both years, Figure 5.1 shows the summary statistics

and boxplots of the total score for each of the three cohorts. A one-way ANOVA for

differences in the mean scores gives p>0.3. Further inspection of responses to



individual questions demonstrated further consistency between the three cohorts. For

this reason the groups are pooled for the discussion of individual questions which

follows.
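
As a rough consistency check on the reported p>0.3, a one-way ANOVA can be reconstructed from the cohort sizes, means and standard deviations given in Figure 5.1. The sketch below is an illustration built from those rounded summary values, not necessarily the computation used for the thesis; it relies on the fact that, with exactly two numerator degrees of freedom, the upper tail of the F distribution has a simple closed form.

```python
# Summary statistics as reported in Figure 5.1: (n, mean, standard deviation)
cohorts = {
    "I_04": (300, 9.193, 2.877),
    "I_05": (237, 8.865, 3.011),
    "II_04": (75, 9.267, 2.787),
}

n_total = sum(n for n, _, _ in cohorts.values())
grand_mean = sum(n * m for n, m, _ in cohorts.values()) / n_total

# Between-group and within-group sums of squares
ss_between = sum(n * (m - grand_mean) ** 2 for n, m, _ in cohorts.values())
ss_within = sum((n - 1) * s ** 2 for n, _, s in cohorts.values())

df_between = len(cohorts) - 1      # 2 for three cohorts
df_within = n_total - len(cohorts)
f_stat = (ss_between / df_between) / (ss_within / df_within)

# For 2 numerator degrees of freedom: P(F > f) = (d2 / (d2 + 2 f)) ** (d2 / 2)
assert df_between == 2
p_value = (df_within / (df_within + 2 * f_stat)) ** (df_within / 2)

print(round(f_stat, 2), round(p_value, 2))
```

The F statistic obtained this way is close to 1 and the p-value lies between 0.3 and 0.4, in line with the result quoted above.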

Descriptive Statistics: SRQ_Com

| Variable | Cohort | N | Mean | SE Mean | StDev | Minimum | Q1 | Median | Q3 | Maximum |
|---|---|---|---|---|---|---|---|---|---|---|
| SRQ_Com | I_04 | 300 | 9.193 | 0.166 | 2.877 | 0 | 7 | 9 | 11 | 18 |
| | I_05 | 237 | 8.865 | 0.196 | 3.011 | 2 | 7 | 9 | 11 | 18 |
| | II_04 | 75 | 9.267 | 0.322 | 2.787 | 4 | 7 | 9 | 11 | 15 |

[Boxplot of SRQ Total (Common questions) by Cohort: vertical axis SRQ_Com, scaled 0 to 20; horizontal axis Cohort, with groups I_04, I_05 and II_04]

Figure 5.1 The distribution of SRQ Scores is consistent across the three cohorts.

5.3.1 Student responses to individual items

This section discusses students’ responses to individual items with reference to the

main aspect of reasoning assessed. All items, but not all responses, are discussed

here. A complete summary of responses can be found in Appendix G.

Reasoning about representations of data

Items SRQ_8, SRQ_11, SRQ_12 and SRQ_21 assess reasoning about representations

of data. SRQ_8 requires students to critically read a pie chart from the media and

recognise that it sums to more than 100%.



SRQ_8 Is there a problem with the following pie

chart? If so, identify the problem.

A fully correct response was given by 72% of students while another 6% recognised

that the pie chart was not drawn in correct proportion but did not take the final step

of checking the percentages.

In item SRQ_12, students are required to choose between a two-dimensional and a

three-dimensional pie chart, giving reasons for their choice.

SRQ_12 Another group of students carried out a survey at the local library

regarding the most frequent reason for using the internet. They

produced the following two graphs.

Graph A Graph B

[Two pie charts, each titled "Internet Usage", one two-dimensional and one three-dimensional, showing the same data: contact friends 33%, study 22%, entertainment 17%, work 14%, information 6%, other 8%]

Which of the graphs, A or B, would you recommend the students use

and why?

Fifty-eight percent of students chose the two-dimensional graph, justifying their

choice with explicit or implicit reference to the greater accuracy of the two-

dimensional representation. While 28% preferred the three-dimensional version,

only 7% of students demonstrated a “prettier is better” misconception by choosing



the three-dimensional graph giving an incorrect reason such as “it’s more

professional.” This is pleasing, as the issue has been emphasised in recent years in

professional development workshops and other advice to teachers.

Items SRQ_11 and SRQ_21 both require an appreciation of the importance of scale

in reading a graph.

SRQ_11 A group of students recorded the number of years their families had

lived in their town. Here are two graphs that the students drew to

illustrate their results.

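Graph 1

[Plot of x marks above a horizontal axis labelled YEARS IN TOWN; the axis shows only the values 0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 17, 25 and 37]

Graph 2

[Plot above a horizontal axis labelled YEARS IN TOWN, with a uniform scale marked 0, 5, 10, 15, 20, 25, 30, 35]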

Which of these two graphs (1 or 2) would you recommend the

students use and why?

SRQ_21 The graph below shows the number of members of the American

Mathematical Society.

[Bar graph headed "Join the ever increasing number of professionals who enjoy the benefits of AMS membership!"; vertical axis: Membership in thousands, labelled 18, 20, 22, 24, 26, 28, 29; horizontal axis: 1987, 1988, 1989, 1990, 1991]



Circle any of the following statements which are correct:

a) Membership of the AMS in 1991 was twice what it was in 1989.

b) Between 1987 and 1991 membership of the AMS doubled every

two years.

c) Membership of the AMS could reasonably be expected to reach

50000 by 1992.

d) None of the above statements is correct.

In item SRQ_21 (introduced in the second year of the questionnaire), students need

to read the scale on the y-axis of the graph and not rely on the visual illusion created

by the size of the bars. Nearly all students (94%) did this correctly. Item SRQ_11 is

more difficult (with a 69% success rate), although a non-constant scale, this time along

the x-axis, is still the focus of the question. The most likely reason for the degree of

difference in the correct response rate to these questions is that in SRQ_21, the form

of the question requires students to make a numerical comparison which prompts

them to read the values on the graph rather than rely on the visual image. It is

possible that some of these students answer correctly without even noticing the

discrepancy between the numerical and visual interpretation of the graph. Item

SRQ_11, by comparison, is open-ended and does not prompt a numerical reading of

the graph so that students are less likely to detect the changing scale.

Reasoning about statistical measures

Items SRQ_1 and SRQ_2 assess reasoning about statistical measures, as do the more

curriculum-based items SRQ_17 to SRQ_20. SRQ_1 requires an interpretation of

the statement that the average number of children is 2.2.

SRQ_1 To get the average number of children per family in a small town, a

teacher counted the total number of children in the town. She then

divided by 50, the total number of families. The average number of

children per family was 2.2.

Which of the following is certain to be true?



a) Half of the families in the town have more than 2 children.

b) More families in the town have 3 children than have 2 children.

c) There are a total of 110 children in the town.

d) There are 2.2 children in the town for every adult.

e) The most common number of children in a family is 2.

f) None of the above.

In the SRA component scales, a correct answer to this question is counted towards

the correct reasoning scale “understands how to select an appropriate average”. This

interpretation is perhaps a little generous as all selection has already been carried out

for the student. Rather, the question requires interpretation of the mean in terms of

the algebraic unpacking of the algorithm. In this question students are able to choose

more than one response. Two-thirds of students chose only the correct response to

this question. The misconception that the mean is always the middle value was

demonstrated by 3% of students, while the belief that the mean is the most common

value was demonstrated by 19%.
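
The "algebraic unpacking" referred to above can be shown with a small sketch. The family sizes below are hypothetical, chosen only so that 50 families have a mean of 2.2 children; they show that the total of 110 follows from the algorithm for the mean, while the other statements need not hold.

```python
# Hypothetical town: 30 families with 1 child and 20 families with 4 children.
children_per_family = [1] * 30 + [4] * 20

n_families = len(children_per_family)          # 50
total_children = sum(children_per_family)      # 110
mean_children = total_children / n_families    # 2.2

# Statement c) follows from the algorithm: total = mean x number of families.
print(total_children, mean_children)

# The other statements can fail for the same mean:
share_over_two = sum(c > 2 for c in children_per_family) / n_families
most_common = max(set(children_per_family), key=children_per_family.count)
print(share_over_two)   # 0.4, so a) is not certain
print(most_common)      # 1, so e) is not certain
```

In this hypothetical town no family has exactly 2 or 3 children, so statement b) also fails.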

SRQ_2 requires the selection of an appropriate average for a sample in which, given

the context, one value is clearly an error.

SRQ_2 A small object was weighed on the same scales separately by nine

students in a science lab. The weights (in grams) recorded by each

student are shown below.

6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3

The true weight could be estimated in several ways.

How would you estimate it?

Only 37% of students demonstrated their appreciation of the context of the question

by removing the outlier before calculating the mean. An interesting response, given

by 9% of students, was to delete both the maximum and minimum values before



calculating the mean. Although this response shows some degree of understanding,

it demonstrates the same lack of engagement with context as many other responses.

The mode, either on its own or in combination with another measure, was suggested

as an appropriate average by 10% of students; this reflects a known problem with

school syllabi and school texts in incorrectly including mode as a ‘measure of centre’

without qualification.
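
The effect of the erroneous value on each candidate estimate can be seen in a short sketch using the nine weights from SRQ_2. Treating 15.3 as a recording error is the assumption made in the discussion above; the variable names are illustrative.

```python
from statistics import mean, median

weights = [6.3, 6.0, 6.0, 15.3, 6.1, 6.3, 6.2, 6.15, 6.3]   # grams, from SRQ_2

mean_all = mean(weights)                 # pulled upwards by 15.3

# Treat 15.3 as a recording error and drop it before averaging.
without_outlier = [w for w in weights if w != 15.3]
mean_without_outlier = mean(without_outlier)

# Alternative given by some students: drop both the minimum and the maximum.
trimmed = sorted(weights)[1:-1]
mean_trimmed = mean(trimmed)

median_all = median(weights)             # resistant to the single outlier

print(round(mean_all, 2), round(mean_without_outlier, 2),
      round(mean_trimmed, 2), median_all)
```

The mean of all nine values is above 7 grams, whereas the other three estimates all fall between 6.1 and 6.2 grams.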

As well as being specifically curriculum-dependent, SRQ_17 to SRQ_20 are more

computation and terminology based than most of the other questions. They assess

whether students can find the median (SRQ_17) and first quartile (SRQ_18) of the

data set and whether they understand the relationship between transformed and

original data (SRQ_19 and SRQ_20).

Total rainfall (in millimetres) for the month of November 2003 was collected from 25

weather stations over a region of northern Australia. The data below represent the

logarithms (to base 10) of the rainfall. These values have been ordered from

smallest to largest for ease of manipulation.

0.00 1.04 1.04 1.18 1.36

1.54 1.56 1.59 1.62 1.62

1.64 1.64 1.71 1.71 1.75

1.76 1.77 1.77 1.80 1.86

1.92 1.95 2.03 2.14 2.41

SRQ_17 What is the median of the above data set (i.e. the median of the log

of the rainfalls)?

SRQ_18 What is the lower quartile of the data set?

SRQ_19 Using your answer to question SRQ_17, calculate the median total

rainfall (in millimetres) for the region.

SRQ_20 What was the highest total rainfall recorded for the month in the

region?

The median (the 13th value in a set of 25) was correctly identified by 67% of

students with another 8% choosing a nearly correct value (such as the 12th). Finding

the first quartile proved far more difficult. The 7th or the average of the 6th and 7th



values was accepted as a correct response, with 14% of students giving one of these.

It is particularly interesting to note that 23% of students gave a range of values (such

as 0 to 1.56) in response to this question. This concept that a quartile is a range of

values appears to be quite prevalent at high school and is one of which tertiary

educators are generally unaware. Although the idea of dividing the distribution into

quarters may still be part of the students’ understanding, it takes some rethinking on

the students’ part to move their understanding of the terminology from a range of

values to a single value and they are likely to need some specific teaching to do so.
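
For reference, the single-value answers to SRQ_17 and SRQ_18 described above can be obtained with a few lines of code. This is a minimal sketch using the ordered data from the question; the two quartile conventions shown are the ones stated above as accepted.

```python
from statistics import median

# log10 of November 2003 rainfall (mm) at 25 stations, already ordered
log_rain = [
    0.00, 1.04, 1.04, 1.18, 1.36,
    1.54, 1.56, 1.59, 1.62, 1.62,
    1.64, 1.64, 1.71, 1.71, 1.75,
    1.76, 1.77, 1.77, 1.80, 1.86,
    1.92, 1.95, 2.03, 2.14, 2.41,
]

n = len(log_rain)                  # 25
median_log = median(log_rain)      # the 13th ordered value

# Two conventions accepted as correct for the lower quartile:
q1_seventh = log_rain[6]                       # the 7th ordered value
q1_average = (log_rain[5] + log_rain[6]) / 2   # mean of the 6th and 7th

print(n, median_log, q1_seventh, round(q1_average, 2))
```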

The results of SRQ_19 and SRQ_20 show that very few students know how to back-

transform from the logarithm of the data to the original data despite studying logs at

school. In SRQ_19, 17% of students gave the correct transformation of the median,

6% reported the median without transformation, 11% gave an incorrect but plausible

transformation (i.e. one involving logs, exponents or powers of 10) while 53% gave

no response. In responding to SRQ_20, 17% of students gave the correct

transformation of the maximum, 39% reported the maximum without transformation,

11% gave an incorrect but plausible transformation and 30% gave no response. The

difference between the numbers of students who simply gave the recorded value in

the two questions is almost certainly due to the hint given in SRQ_19 to use the

answer to SRQ_17. With the hint and the preceding question, the students knew that

they needed to do something to the data but did not know what, hence the high

non-response rate. Without the hint (even though the context remains unchanged and

SRQ_17 has already been attempted), it is very probable that the students would not

realise that the maximum needed to be back-transformed.
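
The back-transformation required by SRQ_19 and SRQ_20 is simply the inverse of the base-ten logarithm. The sketch below assumes the median (1.71) and maximum (2.41) of the logged data as read from the data set in the question.

```python
log_median = 1.71    # median of the log10 rainfalls (13th of 25 values)
log_maximum = 2.41   # largest of the log10 rainfalls

# The inverse of log10 is raising 10 to that power.
median_rainfall_mm = 10 ** log_median
maximum_rainfall_mm = 10 ** log_maximum

print(round(median_rainfall_mm, 1))    # roughly 51 mm
print(round(maximum_rainfall_mm, 1))   # roughly 257 mm
```

Because the logarithm preserves order and the number of observations is odd, the back-transformed median is the median of the original rainfalls.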

Reasoning about uncertainty

Reasoning about uncertainty is addressed by SRQ_3 through to SRQ_6. SRQ_3

requires students to understand probability in terms of a ratio.



SRQ_3 Box A and Box B are filled with red and blue marbles as follows.

Box A Box B

Each box is shaken. In order to win a ticket to a sporting match, you

need to get a blue marble, but you are only allowed to pick out one

marble without looking. Which box should you choose?

a) Box A (with 12 red and 8 blue).

b) Box B (with 30 red and 20 blue).

c) It doesn’t matter which box is chosen.

An accurate comparison of the two probabilities was made by 82% of students.

Rough working on the questionnaire and classroom discussion later in the semester

suggested that the majority of errors in this question were the result of inaccurate

calculation rather than the misunderstanding of probabilities. A typical error was to

calculate the ratio of red to blue marbles in one box but red to all marbles in the

other. It is suggested that a lack of confidence and familiarity with common

fractions (see section 4.3.1) is the cause of this error. This item receives further

attention in Section 7.4.
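
The comparison required by SRQ_3, and the typical error described above, can be sketched with exact fractions. The dictionary layout and variable names are illustrative only.

```python
from fractions import Fraction

boxes = {"A": {"red": 12, "blue": 8}, "B": {"red": 30, "blue": 20}}

# Probability of blue = blue marbles / all marbles in the box
p_blue = {
    name: Fraction(c["blue"], c["red"] + c["blue"]) for name, c in boxes.items()
}
print(p_blue["A"], p_blue["B"])   # both reduce to 2/5

# The typical error mixes two different kinds of ratio:
red_to_blue_a = Fraction(boxes["A"]["red"], boxes["A"]["blue"])        # 3/2
red_to_all_b = Fraction(boxes["B"]["red"], sum(boxes["B"].values()))   # 3/5
print(red_to_blue_a == red_to_all_b)   # False, although the boxes are equivalent
```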

SRQ_4, requiring a correct interpretation of a probability in a daily context, was

correctly handled by nearly all students (96%) in 2004, leading to its non-inclusion

in the 2005 version of the SRQ.

SRQ_4 A bottle of medicine has the following printed on it: WARNING: For

applications to skin areas there is a 15% chance of getting a rash. If

you get a rash, consult your doctor. How would you interpret this?

a) Don’t use the medicine on your skin – there’s a good chance of

getting a rash.

b) For application to the skin, apply only 15% of the recommended

dose.


c) If you get a rash, it will probably involve only 15% of the skin.

d) About 15 out of every 100 people who use this medicine get a

rash.

e) There is hardly any chance of getting a rash using this

medicine.

The choice of the distracter which interprets a 15% chance as ‘hardly ever’ is

classified as demonstration of an outcome approach by Garfield (2003).

Among students at this level, demonstration of this misconception at this degree of

difficulty is minimal (2% of students). In SRQ_10, however, 44% of students demonstrated this

same misconception by claiming that a weather prediction which gave a 70% chance

of rain was very good if it rained on 85 to 100% of days for which such a prediction

was made. This result is comparable to that (46%) reported by Konold (1995).

SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of

their weather forecasts. They searched their records for those days

when the forecaster had reported a 70% chance of rain. For those

particular days (that is, those days for which the forecast was stated

as a 70% chance of rain), they compared the forecast with records

of whether or not it actually rained.

The forecast of 70% chance of rain can be considered very accurate

if it rained on:

a) 95% - 100% of those days

b) 85% - 94% of those days

c) 75% - 84% of those days

d) 65% - 74% of those days

e) 55% - 64% of those days

In the SRQ 46% of students answered this question correctly.
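
The reasoning behind the correct response to SRQ_10 can be illustrated by simulation. The sketch below assumes a perfectly calibrated forecaster and a hypothetical record of 10,000 days on which a 70% chance of rain was forecast; both are assumptions made for illustration.

```python
import random

random.seed(1)    # arbitrary seed so the sketch is repeatable

n_days = 10_000   # hypothetical number of days with a 70% forecast
p_rain = 0.70     # assumption: the forecast is perfectly calibrated

rainy_days = sum(random.random() < p_rain for _ in range(n_days))
observed_share = rainy_days / n_days

print(round(observed_share, 3))        # close to 0.70
print(0.65 <= observed_share < 0.75)   # inside the band of option d)
```

If rain fell on 85% to 100% of such days, the forecast would have understated the chance of rain rather than being very accurate.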



SRQ_5 requires students to understand probabilities involved in two events and their

conjunction.

SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of

the following is least likely?

a) The man is over 55.

b) The man has had a heart attack.

c) The man is over 55 and has had a heart attack.

Both incorrect responses are evidence of the conjunction fallacy (in this case

believing that being over 55 and having a heart attack is more probable than either

one of the individual events). No doubt the belief arises from the use

of an availability heuristic (Kahneman, et al., 1982), as it is easier to call to mind

examples of the conjunction than of the individual events. Sixty-two percent of

students answered this question correctly.
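
The rule underlying SRQ_5 is that a conjunction can never be more probable than either of its component events. The counts in the sketch below are invented purely for illustration and are not data from the study.

```python
# Hypothetical counts for 1000 male ambulance patients (illustration only)
over_55_and_heart_attack = 120
over_55_only = 380
heart_attack_only = 60
neither = 440

total = over_55_and_heart_attack + over_55_only + heart_attack_only + neither

p_over_55 = (over_55_and_heart_attack + over_55_only) / total
p_heart_attack = (over_55_and_heart_attack + heart_attack_only) / total
p_both = over_55_and_heart_attack / total

# Everyone counted in the conjunction is also counted in each single event,
# so the conjunction cannot be the more probable of the three.
print(p_over_55, p_heart_attack, p_both)
print(p_both <= min(p_over_55, p_heart_attack))
```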

SRQ_6 requires students to understand independence in a coin tossing context.

SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in

your previous 9 matches. For the next 4 tosses of the coin, you

choose heads. Tails comes up 4 times. For the 5th toss, what

should you choose?

a) Heads

b) Tails

c) It doesn’t matter

What is the probability of getting heads on this 5th toss?

What is the probability of getting tails on this 5th toss?

Note: assume only fair coins are used in tosses at cricket matches!

It was thought that the wording of this question in first semester 2004, which

included the phrase “Suppose you decide to choose heads from now on”, may have

led to some students believing that heads was the correct choice for the 5th toss while



acknowledging that the probabilities of heads and tails were equal. For this reason,

the leading phrase was deleted in second semester and subsequently. In fact, the

percentage of students who chose heads but gave both probabilities as equal to a half

fluctuated from 9%, through 11%, to 7% across the three cohorts. Due to this

minimal and inconsistent change, the results from the three cohorts have been pooled

in the analyses despite the slightly different wording. A complete understanding of

independence in the context of this question was demonstrated by 76% of students.

SRQ_22 regarding interpretation of normal probabilities was introduced to the SRQ

in 2005 because it was observed in 2004 that a slight change in emphasis at school

level (see below) was having a large effect on students’ abilities with visualisation

and sketching in this context.

SRQ_22 The heights of first year female university students are normally

distributed with a mean of 165 cm and a variance of 4 cm². The

graphs below are all of the standard normal distribution, that is,

normal with mean 0 and variance 1. Choose the graph in which the

shaded area gives the probability that a randomly chosen first year

female student has a height of more than 161 cm.

[Five graphs, labelled a) to e), each showing the standard normal curve on a horizontal axis from -3 to 3 with a different shaded area]



A correct response involves the ability to transform from a given mean and variance

to a standard normal distribution and to interpret a probability as the area under the

curve, utilising the symmetry of the distribution. This is core content in all school

mathematics syllabi studied by these students. A complete understanding was

demonstrated by 26% of students. One of the possible reasons for the difficulty

students experienced with this question is the increasing emphasis on graphics

calculators in senior high school mathematics. While judicious use of this technology

can enhance a student’s understanding, inappropriate use can have the opposite effect

(Ben-Zvi, 2000). When students were required to transform, draw a diagram,

manipulate and look up tables to calculate a probability, they acquired skills which

could be both transferred and generalised. When a graphics calculator is used to

calculate a normal probability, three or four numbers are entered and an answer

obtained with little understanding being gained regarding the visualisation of the

normal distribution or the relationship between values, areas and probabilities, and

almost no skills acquired which can be readily transferred to a different technological

environment.
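
The steps needed for a correct response to SRQ_22 (transforming, interpreting the probability as an area, and using symmetry) can be written out explicitly. The sketch below uses the NormalDist class from Python's standard library; the variable names are illustrative.

```python
from statistics import NormalDist

mean_height = 165.0       # cm
sd_height = 4.0 ** 0.5    # a variance of 4 cm^2 gives a standard deviation of 2 cm
threshold = 161.0         # cm

# Step 1: transform to the standard normal scale.
z = (threshold - mean_height) / sd_height    # two standard deviations below the mean

# Step 2: the required probability is the area to the right of z.
p_taller = 1 - NormalDist().cdf(z)

# Step 3: by symmetry this equals the area to the left of -z.
p_by_symmetry = NormalDist().cdf(-z)

print(z, round(p_taller, 4))
```

The correct graph is therefore the one in which the area to the right of -2 is shaded, an area of a little under 0.98.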

Reasoning about samples and variation

Reasoning about samples is assessed by SRQ_7, SRQ_9, SRQ_13 and SRQ_16.

These items also include reasoning about variation more explicitly than other items.

SRQ_7 assesses an understanding of the importance of large samples, asking

whether a decision would be better made on the basis of a large or small sample.

SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. She

wants whichever car will break down the least. First she read in

Consumer Reports that for 400 cars of each type, the Toyota had

more breakdowns than the Honda. Then she talked to three friends.

Two were Toyota owners, who had no major breakdowns. The other

friend used to own a Honda, but it had lots of break-downs, so he

sold it. He said he’d never buy another Honda. Which car should

Mrs Jones buy?



a) Mrs Jones should buy the Toyota, because her friend had so

much trouble with his Honda, while the other friends had no

trouble with their Toyotas.

b) She should buy the Honda, because the information about

break-downs in Consumer Reports is based on many cases,

not just one or two cases.

c) It doesn’t matter which car she buys. Whichever type she gets,

she could still be unlucky and get stuck with a particular car that

would need a lot of repairs.

Sixty-seven percent of students demonstrated their understanding of the importance

of large samples. The response that the choice does not matter, as anything could

happen, is said to indicate an outcome approach (Garfield, 2003). While 30% of

students chose this response, it is difficult to know whether or not they would employ

such an attitude in a real life situation. Perhaps their response is a little flippant,

exhibiting a general mistrust of survey results.

In SRQ_9 students are asked to estimate the size of a population on the basis of a tag

and recapture process.

SRQ_9 A farmer wants to know how many fish are in his dam. He took out

200 fish and tagged each of them. He put the tagged fish back in

the dam and let them get mixed with the others. On the second day,

he took out 250 fish in a random manner, and found that 25 of them

were tagged. Estimate how many fish are in the dam.

While a situation such as this is most likely outside the experience of students at this

level, an intuitive understanding of estimation, samples and representativeness

should result in a correct response as demonstrated by 55% of students, with another

11% giving a ‘nearly correct’ answer (usually the result of an incorrect ratio or

taking the wrong ratio, illustrating once again a lack of confidence with fractions).

This question was notable for the number of different responses: 60 in all. Seventeen



percent of students gave the response 425 or similar. This value is obtained by

counting up the number of distinct fish, tagged and untagged, which are seen in the

process. Such a response demonstrates a lack of appreciation for variation and the

underlying concept of estimation which is surprising for students at this level.
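The two most common numerical answers to SRQ_9 can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the function name is hypothetical, and it assumes the simple proportional estimate in which the tagged fraction of the second sample is equated with the tagged fraction of the whole dam.

```python
def tag_recapture_estimate(tagged, second_sample, tagged_in_sample):
    """Proportional estimate of population size.

    Assumes tagged / population is approximately
    tagged_in_sample / second_sample.
    """
    return tagged * second_sample / tagged_in_sample


# Figures from SRQ_9: 200 fish tagged, 250 caught later, 25 of those tagged.
estimate = tag_recapture_estimate(200, 250, 25)

# The common incorrect answer counts only the distinct fish actually seen.
distinct_seen = 200 + (250 - 25)

print(estimate)       # 2000.0
print(distinct_seen)  # 425
```

The first value treats the second catch as a sample from the dam; the second treats the fish observed as if they were the whole population.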

Question SRQ_13, the hospital problem, originating from Kahneman and Tversky

(1972), requires an appreciation of the variation in large and small samples.

SRQ_13 Half of all newborns are girls and half are boys. Hospital A records

an average of 50 births a day. Hospital B records an average of 10

births a day. On a particular day, which hospital is more likely to

record 80% or more female births?

a) Hospital A (with an average of 50 births a day).

b) Hospital B (with an average of 10 births a day).

c) The two hospitals are equally likely to record such an event.

Students who believe that the likelihood of observing 80% or more female births is

the same regardless of sample size are said to be susceptible to the “law of small

numbers”. This misconception was demonstrated by 58% of students, while 35%

demonstrated an appreciation of the importance of sample size by correctly selecting

the smaller hospital. In Kahneman and Tversky’s original sample of 15 to 18 year-

olds, a similar percentage of students (56%) demonstrated the misconception of “the

law of small numbers”, but the choice between the large and the small hospital was

divided almost equally, whereas in this group only 6% chose the larger hospital (1%

did not respond).
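The size of the effect in the hospital problem can be checked directly with the binomial distribution. The sketch below is an illustration under simplifying assumptions that are not part of the item: each hospital is taken to record exactly 10 or exactly 50 births on the day in question, and each birth is independently a girl with probability 0.5. The function name is hypothetical.

```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))


# 80% of 10 births is 8 girls; 80% of 50 births is 40 girls.
p_small = prob_at_least(10, 8)
p_large = prob_at_least(50, 40)

print(f"Hospital B (10 births): {p_small:.4f}")
print(f"Hospital A (50 births): {p_large:.6f}")
```

Under these assumptions the smaller hospital records such a day with probability 56/1024, roughly 5%, while for the larger hospital the probability is several orders of magnitude smaller.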

In SRQ_16, students were asked to decide which, if any, of three samples

demonstrated greater than expected variation from a theoretical set of proportions.

One sample was significantly different from the theoretical model, while the other

two were well within acceptable variation.



SRQ_16 A Brisbane City Council brochure states that on a typical summer

weekend, users of the Goodwill Bridge fall into the following age

groups.

Age group percentage of users

0-10 5

11-20 20

21-30 40

31-40 15

41-50 8

51-60 5

61-70 5

71+ 2

One typical summer weekend, 100 people were observed crossing

the bridge. Which of the data sets given below would cause you to

question the information in the brochure?

a) Set A only.

b) Set B only.

c) Set C only.

d) Set A and B only.

e) Set A and C only.

f) Set B and C only.

g) Set A, B and C.

h) None of A, B or C.

Percentage of users in each data set

Age group | Set A | Set B | Set C
0-10 | 5 | 10 | 7
11-20 | 13 | 20 | 19
21-30 | 32 | 45 | 43
31-40 | 14 | 10 | 13
41-50 | 13 | 5 | 10
51-60 | 12 | 5 | 3
61-70 | 11 | 3 | 4
71+ | 0 | 2 | 1



Forty-two percent of students correctly identified the single, clearly unusual sample;

38% felt that none of the three samples threw doubt on the theoretical proportions;

10% chose the extreme sample together with others; while the remainder chose

samples other than the truly anomalous sample. This suggests that most incoming

students have an understanding of sample variation, but that the extent of acceptable

variation is difficult for them to assess.
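One way to quantify "acceptable variation" for the three data sets in SRQ_16 is a chi-square goodness-of-fit statistic. This is an assumption made for illustration: the text does not state which criterion was used to classify the samples. Because 100 people were observed, the brochure percentages serve directly as expected counts. The critical value 14.067 is the upper 5% point of the chi-square distribution with 7 degrees of freedom; with expected counts as small as 2 and 5 the approximation is rough.

```python
# Brochure percentages (expected counts for a sample of 100 people).
brochure = [5, 20, 40, 15, 8, 5, 5, 2]

# Observed counts for the three data sets in SRQ_16.
samples = {
    "A": [5, 13, 32, 14, 13, 12, 11, 0],
    "B": [10, 20, 45, 10, 5, 5, 3, 2],
    "C": [7, 19, 43, 13, 10, 3, 4, 1],
}

# Upper 5% point of chi-square with 8 - 1 = 7 degrees of freedom.
CRITICAL_5PCT_DF7 = 14.067


def chi_square(observed, expected):
    """Pearson chi-square goodness-of-fit statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


chi2_values = {name: chi_square(obs, brochure) for name, obs in samples.items()}
flagged = [name for name, value in chi2_values.items()
           if value > CRITICAL_5PCT_DF7]

for name, value in chi2_values.items():
    print(f"Set {name}: chi-square = {value:.2f}")
print("Flagged:", flagged)
```

Under this assumed criterion only Set A exceeds the critical value, which is consistent with the description of one clearly unusual sample and two within acceptable variation.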

Reasoning about association

SRQ_14 and SRQ_15 both deal with reasoning about association. In SRQ_14, a

researcher draws a conclusion regarding heart-related deaths and vehicle-registration

on the basis of an increase in the numbers of both.

SRQ_14 A local newspaper published the following article:

Family car is killing us, says researcher

Twenty years of research has convinced Mr Robinson that motoring is a health hazard. Studying figures from the Australian Bureau of Statistics, Mr Robinson has produced graphs which show quite dramatically that as the numbers of new vehicle registrations increase, so have the numbers of deaths due to heart-related causes.

Do you agree with Mr Robinson’s findings? (Please explain your

response.)

In SRQ_15, a conclusion is drawn relating academic achievement to learning to play

a musical instrument.

SRQ_15 A music teacher was pleased to read the following article in a

professional journal:

Instrumental Lessons Improve OPs.

Research has shown that learning to play a musical instrument during primary school increases a child’s chance of attaining a good OP. In a longitudinal study, students enrolled in primary schools across Queensland since 1990 have been followed until they graduated from high school. Of those students who were involved in an instrument program, 20% obtained an OP of 5 or better, while the figure was 15% for those not involved in instrumental music.

Do you agree with these research findings? (Please explain your

response.)



A fully correct response to SRQ_15 required the discussion of other factors

(effectively confounders) which may have influenced the result. A similar response

to SRQ_14 was not credited as fully correct as a more fundamental issue in this

situation was seen to be the realisation that due to an increasing population, some

sort of rate would need to be calculated, even if more subtle aspects were then

considered. Only 16% of students identified the increasing population as a problem

in SRQ_14, while 27% discussed possible confounders. Interestingly in SRQ_15

only 18% raised the issue of confounders. In fact, while only 8% of students agreed

with the researcher’s conclusion in SRQ_14 (3% non-response rate), 40% did so in

SRQ_15 (8% non-response rate). Undoubtedly this difference demonstrates the

extent to which prior beliefs are able to influence one’s ability to think critically

regarding analysis of data, as while a link between tertiary entrance scores and

musical exposure is plausible, a link between heart disease and car registrations is

not. Another way in which this is demonstrated is by the most common (31%)

response to SRQ_14, namely disagreement with the findings solely on the basis that

the two factors are “unrelated” or that the result is “just coincidence”. In this

statement students are justifying their response with prior belief, without legitimate

criticism of the analysis. These two items illustrate the tension that exists for the

user of statistics between personal preconception, statistically supported studies

(good or otherwise) and input from other sources.

5.4 Discussion - Suitability of the SRQ

The Statistical Reasoning Questionnaire was developed to assess the statistical

understanding of students at the interface of secondary and tertiary education

enrolling in a science or science-related degree program in an Australian university.

Its development drew on the work of Garfield (1991; 2003) whose Statistical

Reasoning Assessment was designed for students of similar age who had undertaken

a new high school statistics curriculum in the US, and the work of Watson and

Callingham (2003) who had interviewed Australian school children from primary to

junior high school. In designing the SRQ, particular attention was given to selecting

questions at an appropriate range of levels and avoiding questions based on



combinatorial reasoning and relying on artificial examples of coin tossing and dice

throwing.

In administering the SRQ to the three cohorts of students, the distribution of total

scores remained remarkably consistent. This consistency of outcomes is an

indication of the reliability of the questionnaire as a measuring tool and its suitability

for students at this level. The range of student success rate for different items (from

14% to 96%) is indicative of its appropriateness for the wide range of abilities and

backgrounds present in these students at the secondary/tertiary interface. Further

analysis of the questionnaire is reported in Chapter 6.

Chapter 6 Rasch Analyses for the Statistical Reasoning Questionnaire


6.1 Introduction

In this chapter, the Statistical Reasoning Questionnaire is analysed using the

techniques of Rasch methods. Two different approaches are taken. In the first

approach, described in Section 6.2, responses are scored dichotomously and the

simple Rasch model fitted as with the Numeracy Questionnaire in Chapter 4. This

dichotomous approach is used as a basis for the definition of the SRQ Score.

In the second approach, described in Section 6.3, responses are scored

polychotomously and the more complex Rasch partial credit model fitted to the data.

This model forms the foundation for the introduction of the SRQPC Score. The

construction of this SRQPC Score is a significant development and extension of the

work of Watson and Callingham (2003) and applies their framework of statistical

literacy in a new scoring approach. In Section 6.4, the Rasch partial credit model is

used to investigate the expected responses of students to individual items in the SRQ.

Section 6.5 concludes this chapter with a critical examination of the SRQ regarding

its suitability for use at the secondary/tertiary interface.

6.2 Dichotomous Rasch Model

The simplest technique for combining items on the SRQ is to assign to each fully

correct response a score of one and to all other responses a score of zero. Under this

system, the total score is simply the number of fully correct responses.

As described in Chapter 4, the dichotomous Rasch model is given by:



Equation 6.1

$$P(X_{ni} = x) = \frac{\exp\big(x(\beta_n - \delta_i)\big)}{1 + \exp(\beta_n - \delta_i)}; \quad x = 0, 1; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L$$

where $X_{ni}$ is the response of person $n$ on item $i$, $\beta_n$ is the ability parameter

associated with person $n$ and $\delta_i$ is the difficulty parameter for item $i$.

This model has the advantage that the existence of sufficient statistics, $R_n$ (the total

score for person $n$) for $\beta_n$, and $S_i$ (the number of correct responses to item $i$) for $\delta_i$,

allows the estimation of item difficulty and person ability on a single scale (Keeves

and Alagumalai, 1999).

When $\beta_n = \delta_i$, person $n$ has a probability of 0.5 of succeeding at item $i$.
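Equation 6.1 can be evaluated numerically as a check on the statement above. The following is a minimal sketch; the function name and the parameter values are hypothetical and are not estimates from the SRQ data.

```python
from math import exp


def rasch_prob(x, beta, delta):
    """Dichotomous Rasch model (Equation 6.1).

    x is the scored response (0 or 1), beta the person ability and
    delta the item difficulty, both in logits.
    """
    return exp(x * (beta - delta)) / (1 + exp(beta - delta))


# Ability equal to difficulty gives a success probability of 0.5.
print(rasch_prob(1, beta=0.3, delta=0.3))

# A person one logit above the item difficulty succeeds more often.
print(round(rasch_prob(1, beta=1.3, delta=0.3), 3))
```

The two response probabilities for any person and item sum to one, and the success probability depends only on the difference between ability and difficulty.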

This model was fitted to the 22 SRQ items administered to three cohorts (semester I

2004, semester II 2004 and semester I 2005) totalling 612 students. Overall person

and item statistics are given in Table 6.1.

Statistic | Items | Persons
Separation reliability | 0.99 | 0.62
Infit mean square, mean | 1.00 | 1.01
Infit mean square, SD | 0.10 | 0.25

Table 6.1 Fit statistics for the dichotomous Rasch model

The statistics in Table 6.1 are indicative of a good fit. The mean item and

person infit mean square values are close to the expected value of one, suggesting that

items on the questionnaire are measuring a single, one-dimensional construct. The

item separation reliability index of 0.99 provides evidence that the items provide a good

spread of difficulty, although the person separation reliability index of 0.62 is a little

low.

Considering the items individually, most item infit mean square values fell between

0.90 and 1.13. The exceptions to this were item 20 which had an infit mean square



of 0.80 and item 19 with an infit mean square of 0.77. According to Keeves and

Alagumalai (1999), an item is generally accepted as fitting the Rasch model if it has

an infit mean square lying between 0.77 and 1.30, although some researchers would

prefer a more restricted range of 0.83 to 1.20. Although items 19 and 20 satisfy the

broader requirements, they are clearly borderline cases. Both these items (together

with items 17 and 18) are classified as curriculum-based items as they require an

understanding of the relatively mathematical concept of a logarithm and, in the case

of item 19 (and 17), as they require familiarity with the specific statistical

terminology of “median” (and “quartile” in the case of item 18). The lack of fit of

item 19 is discussed further in Section 6.3 with respect to the polychotomous Rasch

model.
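For readers unfamiliar with the infit mean square, the sketch below shows the usual information-weighted definition for a single dichotomous item: the sum of squared residuals divided by the sum of the model variances. It is an illustration with made-up values, not the output of the software used for this analysis, and it treats the abilities and the item difficulty as known rather than estimated.

```python
from math import exp


def p_correct(beta, delta):
    """Success probability under the dichotomous Rasch model."""
    return 1 / (1 + exp(-(beta - delta)))


def item_infit(responses, betas, delta):
    """Information-weighted (infit) mean square for one item.

    responses are 0/1 scores, betas the corresponding person abilities,
    delta the item difficulty.
    """
    expected = [p_correct(b, delta) for b in betas]
    squared_residuals = sum((x - p) ** 2 for x, p in zip(responses, expected))
    model_variance = sum(p * (1 - p) for p in expected)
    return squared_residuals / model_variance


# Hypothetical data: four people whose ability equals the item difficulty.
print(item_infit([1, 0, 1, 0], betas=[0.0, 0.0, 0.0, 0.0], delta=0.0))
```

Values near one indicate that the observed variation in responses is about what the model predicts; values well below one, as for items 19 and 20, indicate responses that are more predictable than the model expects.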

The variable map shown in Figure 6.1 displays items on the right-hand side and

students on the left-hand side, illustrating the parameters $\delta_i$ and $\beta_n$ respectively, on

a single logistic scale. It can be seen that the items cover most of the range of

students’ abilities, although it should be noted that coverage in the upper range of

ability is highly dependent on curriculum-based items. It may not be possible to

measure the full range of students’ statistical reasoning independent of curriculum on

entering university. During analysis of the SRQ, there were also a number of

indications that higher statistical reasoning at the secondary/tertiary interface may

not be able to be separated from mathematical reasoning. These are possible

questions for further research.

While the Rasch model is able to calculate parameter estimates for all items,

including those which were not administered to all cohorts, only items which were

administered to all students can be meaningfully included when combining items into

a single scale. Given the more curriculum-dependent nature of items 17 to 20, it was

also decided that these four items would be better not included in the formation of an

overall score for use in investigating dependence on other variables. Hence the SRQ

Score, which is used for further analysis in Chapter 7, consists of the items SRQ_1 to

SRQ_3 and SRQ_5 to SRQ_16 (SRQ_4 having been discarded in 2005 when

SRQ_21 and SRQ_22 were introduced).



SRQ all - 1_04, 2_04, 1_05:
----------------------------------------------------------------------------
Item Estimates (Thresholds)   all on all (N = 612 L = 22 Probability Level=0.50)
----------------------------------------------------------------------------
  4.0                      |
                           |
                           |
                           |
                           |
                           |
                           |
  3.0                      |
                           |
                           |
                           |
                         X |
                           |
                           | 18
  2.0                   XX | 14
                           | 19 20
                         X | 15
                       XXX |
                        XX |
                       XXX | 22
                       XXX |
  1.0              XXXXXXX |
                      XXXX | 13
                  XXXXXXXX | 2
                      XXXX | 16
                 XXXXXXXXX |
                    XXXXXX | 10
  0.0       XXXXXXXXXXXXXX | 9
                           |
       XXXXXXXXXXXXXXXXXXX | 11
                           | 12
             XXXXXXXXXXXXX | 5
                           | 1 7 17
                 XXXXXXXXX |
 -1.0                      | 8
                  XXXXXXXX |
                           | 6
                       XXX |
                           | 3
                        XX |
                           |
 -2.0                      |
                         X |
                           |
                           |
                           |
                         X |
                           | 21
 -3.0                      |
                           | 4
                           |
                           |
                           |
                           |
                           |
 -4.0                      |
----------------------------------------------------------------------------
Each X represents 5 students

Figure 6.1 The variable map for the dichotomous Rasch model displays the item and person parameter estimates.



6.3 Polychotomous Rasch Model

A more sophisticated approach to scoring the Statistical Reasoning Questionnaire is

to allow partial credit for responses in a way that reflects the level of understanding

demonstrated. Masters (1982) described the use of a more complex Rasch model to

allow for polychotomous response variables. This Rasch partial credit model has

been used by Watson and Callingham (2003) in their study of statistical literacy

(which included items on which some SRQ items were based) and Reading (2002) in

her study of statistical understanding.

The Rasch partial credit model is given by:

Equation 6.2

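$$P(X_{ni} = x) = \frac{\exp\left(\sum_{j=1}^{x} (\beta_n - \delta_{ij})\right)}{1 + \sum_{k=1}^{m_i} \exp\left(\sum_{j=1}^{k} (\beta_n - \delta_{ij})\right)}; \quad x = 1, 2, \ldots, m_i; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L;$$

$$P(X_{ni} = x) = \frac{1}{1 + \sum_{k=1}^{m_i} \exp\left(\sum_{j=1}^{k} (\beta_n - \delta_{ij})\right)}; \quad x = 0;$$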

where $X_{ni}$ is the response of person $n$ on item $i$, $\beta_n$ is the ability parameter for

person $n$ and $\delta_{ij}$ is the difficulty parameter associated with giving response $j$ rather

than $j - 1$ on item $i$.

In this model the number of possible responses, $m_i + 1$, is allowed to vary between

items.
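The category probabilities in Equation 6.2 can be computed by accumulating the terms $\beta_n - \delta_{ij}$. The following is a minimal sketch; the function name and the step difficulties are hypothetical, chosen only to show that the number of categories may differ between items.

```python
from math import exp


def pcm_probs(beta, deltas):
    """Category probabilities under the Rasch partial credit model.

    deltas = [delta_i1, ..., delta_im] are the step difficulties of one
    item; the result is [P(X=0), P(X=1), ..., P(X=m)].
    """
    numerators = [1.0]          # response 0 contributes exp(0) = 1
    cumulative = 0.0
    for delta in deltas:
        cumulative += beta - delta
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [value / total for value in numerators]


# Hypothetical items with different numbers of response categories.
three_step_item = pcm_probs(beta=0.5, deltas=[-1.0, 0.0, 1.5])
one_step_item = pcm_probs(beta=0.5, deltas=[0.2])

print([round(p, 3) for p in three_step_item])  # four categories
print([round(p, 3) for p in one_step_item])    # two categories
```

With a single step difficulty the model reduces to the dichotomous form of Equation 6.1.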

The model effectively simplifies to a dichotomous model at each response boundary

within an item, so that, conditioning on having either response $x - 1$ or $x$, we have:

Equation 6.3

$$P\big(X_{ni} = x \mid X_{ni} \in \{x - 1, x\}\big) = \frac{\exp(\beta_n - \delta_{ix})}{1 + \exp(\beta_n - \delta_{ix})}; \quad x = 1, 2, \ldots, m_i; \quad n = 1, \ldots, N; \quad i = 1, \ldots, L.$$



So when a person’s ability parameter $\beta_n$ is equal to $\delta_{ix}$, there is a probability of 0.5

of giving response $x$ to item $i$, given that the response choice is $x - 1$ or $x$. This

stepwise interpretation of the partial credit model is consistent with increasing values

of the response $x$ reflecting increasing levels of understanding. However, because

the explanation is conditional on the response being equal to either $x - 1$ or $x$, it is

not necessary for the $\delta_{ix}$ to be increasing with $x$. In the case where the $\delta_{ix}$ are

unordered, there exists at least one response value which is never the most probable

for any value of $\beta$ (Masters, 1988a).
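Both properties described above can be checked numerically. The sketch below re-implements the partial credit probabilities so that it stands alone; the function name, the step difficulties and the grid of ability values are all hypothetical choices made for illustration.

```python
from math import exp


def pcm_probs(beta, deltas):
    """Partial credit category probabilities [P(X=0), ..., P(X=m)]."""
    numerators = [1.0]
    cumulative = 0.0
    for delta in deltas:
        cumulative += beta - delta
        numerators.append(exp(cumulative))
    total = sum(numerators)
    return [value / total for value in numerators]


# 1. Conditional on the response being x - 1 or x, the model is dichotomous:
#    when beta equals delta_ix the conditional probability is 0.5.
ordered = [-1.0, 0.4, 1.2]
x = 2
probs = pcm_probs(beta=ordered[x - 1], deltas=ordered)
conditional = probs[x] / (probs[x - 1] + probs[x])
print(round(conditional, 6))

# 2. With unordered step difficulties, some response is never the most
#    probable. Here response 1 is never modal on a grid from -5 to 5 logits.
unordered = [1.0, -1.0]
grid = [b / 10 for b in range(-50, 51)]
modal = {max(range(3), key=lambda r: pcm_probs(b, unordered)[r]) for b in grid}
print(sorted(modal))
```

In the second case a response of 1 would need an ability of at least 1 to beat a response of 0 and of at most -1 to beat a response of 2, which cannot both hold.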

As in the dichotomous model, one of the benefits of the polychotomous Rasch model

is that sufficient statistics exist for $\beta_n$ and $\delta_{ij}$, with $R_n$, the total score for person $n$,

being sufficient for $\beta_n$ and the number of students who give response $j$ to item $i$

being sufficient for $\delta_{ij}$.

In his initial development of the partial credit model, Masters (1982) states that the

numerical response values $X_{ni}$ “indicate only the ordering of the response categories

and are not used as category ‘weights’”. However, mathematically the use of $R_n$ in

estimating $\beta_n$ contradicts this assertion since, for example, a maximum response

value of four for one item gives that item more weight in estimating person ability

than an item with a maximum response of two. Further difficulty with the model

was raised by Jansen and Roskam (1986) who argue essentially that the estimate of a

person’s ability should be invariant under a change in coding of responses,

particularly by joining categories. They showed that the partial credit model, as

described in Masters (1982), does not satisfy this requirement and hence questioned

its use.

The most convincing response to these difficulties with the partial credit model is to

ensure that the values assigned as item responses are numerically meaningful.

Wilson and Masters (1993) describe this as a “response framework” where response

values have a “uniform substantive meaning” across all items. Wilson and Masters

also reparameterised the partial credit model in a way that allows the inclusion of

null categories, often an important further requirement if such a response framework



is to be implemented consistently across items. A null category is one in which no

response occurs either by design in the coding process or by observation in the data.

The presence of a null category results in the allowable response codes being non-

consecutive.

In scoring the SRQ items for a partial credit model, we began by looking to Watson

and Callingham’s application of the partial credit model to statistical literacy. As a

number of SRQ items are drawn from this study it was hoped that the same scoring

system could be implemented and extended. The intention of Watson and

Callingham was that the codes they used reflect the cognitive frameworks of Biggs

and Collis (1982; 1991) and Watson’s own three-tiered framework established in

previous work (Watson, 1997). However, inspection of codes assigned to various

questions indicated problems with regard to the meaningfulness of codes across

questions. As an example, consider the following two questions from Watson and

Callingham:

Box A and Box B are filled with red and blue marbles as follows.

[Box A and Box B shown as diagrams; the contents given are 60 red, 40 blue and 6 red, 4 blue]

Each box is shaken. In order to win a ticket to a sporting match, you

need to get a blue marble, but you are only allowed to pick out one

marble without looking. Which box should you choose?

A farmer wants to know how many fish are in his dam. He took out

200 fish and tagged each of them. He put the tagged fish back in

the dam and let them get mixed with the others. On the second day,

he took out 250 fish in a random manner, and found that 25 of them

were tagged. Estimate how many fish are in the dam.



The first of these questions appears in the Statistical Reasoning Questionnaire as

SRQ_3 with slightly more difficult numbers, while the second question appears as

SRQ_9 in this form.

For the box question, scoring by Watson and Callingham assigned a code of three for

the correct choice, it doesn’t matter, accompanied by an explanation of correct

proportional reasoning. A response which relied on the absolute number of marbles

rather than proportions was assigned a code of two, while a response that anything

could happen or idiosyncratic reasoning was assigned a code of one. A code of zero

was restricted to no response. By contrast, for the fish question, a correct response of

2000 was assigned a code of one and an incorrect or no response was assigned a code

of zero. Under this coding, when the Rasch partial credit model is applied to these

two questions, a correct response to the box question carries greater weight than a

correct response to the fish question. One defensible position would be to argue

that, given that a correct response to each question requires the use of proportional

reasoning, the highest response code for the two questions should be the same. It

could also be argued that the level of understanding required to correctly answer the

fish question is greater than that required for the box question and that therefore the

fish question should receive greater weighting. In fact there may be incorrect

responses to the fish question which demonstrate the same level of understanding as

the fully correct response to the box question.

An inspection of coding for the complete set of Watson and Callingham items

reveals that, while there appears to be a response framework applied within each

item, there has been no equating or moderating of numerical codes across items to

satisfy a ‘uniform substantive meaning’ across all items. If the latter is not satisfied,

a response to one item may be attributed the same numerical code as a response to

another item, but the two responses may reflect a different level of understanding

within the underlying overall response framework.

One reason for this occurrence with the Watson and Callingham scoring system is

the restriction to consecutive values for scores. Given the Wilson and Masters

reparameterisation of the partial credit model to allow null categories, this is not

necessary and a coding system which incorporates jumps can be legitimately applied.



Watson and Callingham used the results of their Rasch partial credit analysis,

together with their previously hypothesised framework to describe six hierarchical

levels of understanding within statistical literacy, namely:

1. Idiosyncratic – relying on idiosyncratic engagement with context,

tautological use of terminology and fundamental mathematical skills;

2. Informal – relying on informal engagement with context, reflecting intuitive

beliefs, single aspects of terminology and basic one-step calculation;

3. Inconsistent – requiring selective engagement with context, conclusions

without justification, qualitative use of statistics;

4. Consistent non-critical – requiring non-critical engagement with context,

multiple aspects of terminology, some appreciation of variation, basic

quantitative statistical skills;

5. Critical – requiring critical engagement with context, appropriate use of

terminology, qualitative statistical skills but not including proportional

reasoning;

6. Critical mathematical – requiring critical and questioning engagement with

context; understanding of subtle aspects of language, use of proportional

reasoning.

Despite the problems in the Watson and Callingham coding system, the relative

positioning of item step difficulties produced by the Rasch analysis results in the

above hierarchy being consistent and well described. It is suspected that, due to the

number of items used (80), the effect of code variation between items may not have

significantly affected the outcome of the analysis. In any case, the levels are well

defined by the descriptors given above. We have used this hierarchy to inform

choices of levels in building a response framework for scoring the SRQ.

The descriptors of the levels in the Watson and Callingham hierarchy, together with

the Rasch output underpinning those descriptors, were used to score item responses to

the SRQ according to the levels of understanding which they reflect. Table 6.2 gives

Item number Item Response

W&C initial

code

W&C final level

Into SRQPC Explanation for change Out of

SRQPC

There are a total of 110 children - only response. 3 ? 5 4

There are 2.2 children for every adult. 2 3 3 3 The most common number of children in a family is 2.

2 3 3 3

None of the above. 2 3 3 3 Multiple responses. 2 3 3 3 Half the families have more than 2 children.

1 3 3 3

More families have 3 children than have 2 children.

1 3 3 3

SRQ_1

To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true?

No response. 0 0 0

Initial code 3 doesn’t appear on the W&C map although discussion suggests it is also at level 3. We scored at level 5 on the basis that critical reasoning is needed to unpack the mean algorithm.

0 Exclude outlier to calculate mean. 3 6 6 5 Use median; use median with outlier excluded. 3 6 5 5

Uncertain about outlier; calculate mean; mean with max & min excluded.

?; 2; ? ?; 4; ? 4 4

Mode; mean of max & min; do it again multiple times & average.

3; ?; ? 6; ?; ? 3 3

6; discard outlier but then unclear ?; ? ?; ? 2 3 repeat once only ? ? 1 3

SRQ_2

A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below. 6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3 The true weight could be estimated in several ways. How would you estimate it?

other; no response 0 0 0

Given the context, at tertiary level, the median is considered less ideal than the mean with outlier excluded and the mode unacceptable. Hence needed to differentiate between responses that were all coded as completely correct by W&C.

0

It doesn’t matter – correct reasoning. 3 4 4 4

Box A/B – numeric explanation. 2 2 1 1

Box A/B - idiosyncratic reasoning. 1 2 1 1 SRQ_3

Box A and Box B are filled with red and blue marbles as follows. Box A Box B

Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?

No response. 0 0 0

No explanation was required for the SRQ. Selecting A or B is considered sufficiently low level for tertiary entrance to score as 1.

0


30 red 20 blue

12 red 8 blue


W&C initial

code

W&C final level


SRQPC

About 15 out of every 100 people who use this medicine get a rash; or this with either of next 2

2 3 3 3

Don’t use the medicine on your skin – there’s a good chance of getting a rash.

1 1 1 2

There is hardly any chance of getting a rash using this medicine.

1 1 1 2

For application to the skin, apply only 15% of the recommended dose.

0 0 0 0

If you get a rash, it will probably involve only 15% of the skin. 0 0 0 0

SRQ_4

A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?

No response. 0 0 0

0 The man is over 55 and has had a heart attack. 1 3 3 4

The man is over 55. 0 0 1 1 The man has had a heart attack. 0 0 1 1

SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of the following is least likely?

No response. 0 0 0

Question on W&C requires estimation of probability of each event. SRQ coding is by analogy. Generally restrict code 0 to no response. 0

Either – 0.5, 0.5. 2 4 4 4 Heads; tails – 0.5, 0.5. 1 2 2 3 No choice – 0.5, 0.5. ? ? 2 3 Either – any or no prob given. 1 2 2 3 Heads; tails – probs add to 1. 0 0 1 3 Other. 0 0 0 0

SRQ_6

As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose? What is the probability of getting heads on this 5th toss? What is the probability of getting tails on this 5th toss? No response. 0 0 0

Some credit deserved for realising probabilities should sum to one.

0 Buy the Honda, the information in Consumer Reports is based on many cases...

3 5 5 4

It doesn’t matter which car she buys. She could still be unlucky.

2 2 2 2

Buy the Toyota, because her friend had so much trouble with his Honda, ...

1 1 1 1

SRQ_7

Mrs Jones wants to buy a new car, either a Honda or a Toyota. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs. Which car should Mrs Jones buy? No response 0 0 0

0

Table 6.2 conti nued over


W&C initial

code

W&C final level


SRQPC

Adds to more than 100% 2 5 5 4

Out of proportion. 2 5 5 4 “Other” too large; content and heading inconsistent.

1 4 4 4

Uncertain re importance of >100%. ? ? 3 4

Not enough info. ? ? 2 4

Other. ? ? 1 4

SRQ_8 Is there a problem with the following pie chart? If so, identify the problem.

Style issues; no response. 0 0 0

Could have separated the first two responses, but chose to remain consistent with W&C. Inserted levels at lower end.

0

2000 1 6 6 5

Responses that have calculated 10% then stopped. 0 0 5 5

Incorrect responses between 1000 and 3000. 0 0 4 4

Responses >3000 or between 600 & 800. 0 0 2 4

Responses between 250 and 550. 0 0 1 3

SRQ_9

A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.

No response; <250. 0 0 0

Top code has been maintained as for W&C output. Partial credit created. Responses between 1000 & 3000 appear to be result of calculating proportions incorrectly. Responses between 250 and 550 appear to be result of counting observed fish – eg 200+(250-25) – no recognition of sampling. 0

SRQ_10
The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days ..., they compared the forecast with records of whether or not it actually rained. The forecast of 70% chance of rain can be considered very accurate if it rained on:

Into SRQPC  Out of SRQPC  Response
5           5             65% - 74% of those days.
3           5             75% - 84% of those days.
2           2             95% - 100% of those days.
2           2             85% - 94% of those days.
1           1             55% - 64% of those days.
0           0             No response.

Comments: Not in W&C. Correct response demonstrates critical thinking consistent with level 5.



SRQ_11
A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results: Graph 1: not to scale. Graph 2: to scale. Which of the graphs would you recommend the students use and why?

W&C initial code  W&C final level  Into SRQPC  Out of SRQPC  Response
3                 4                4           4             Graph 2 – uniform scale; more accurate; easier to use.
2                 3                3           4             Graph 2 – clearer; looks better; no reason.
1                 2                2           2             Graph 2 – grouped; other.
?                 ?                2           2             Graph 1 – any reason.
0                 0                0           0             No response.

Comments: Difference in ages of subjects (between W&C and SRQ) makes it difficult to equate preference explanations.

SRQ_12
Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs: Graph A: 2D pie graph. Graph B: 3D pie graph. Which of the graphs would you recommend the students use and why?

Into SRQPC  Out of SRQPC  Response
4           5             Graph A – size of pieces reflects proportions; more accurate; easier to compare pieces; clearer.
3           4             Graph A – can be done by hand; simpler; all the same; no reason.
2           2             Either; Graph B any reason.
0           0             No response.

Comments: Not in W&C. Assignment of codes consistent with previous question.

SRQ_13
Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?

Into SRQPC  Out of SRQPC  Response
6           5             Hospital B.
3           3             The two hospitals are equally likely to record such an event.
2           2             Hospital A.
0           0             No response.

Comments: Not in W&C. Correct response requires critical mathematical understanding consistent with code 6. “Equally likely” implies an understanding of proportions being equal, a higher level understanding than choosing the larger absolute value in hospital A.



SRQ_14
A local newspaper published the following article: (regarding increase in vehicle registrations being linked to heart disease) Do you agree with Mr Robinson’s findings?

Into SRQPC  Out of SRQPC  Response
6           6             Disagree – population increase so need to use rate.
5           5             Disagree – other factors.
4           4             Disagree – correlation doesn’t imply causation; just coincidence; not enough info.
3           4             Agree – large sample size.
2           3             Disagree – other or no reason.
2           3             Agree – explains a possible link; other.
1           3             Agree – no reason.
0           0             No response.

Comments: The W&C item is phrased in terms of “What questions would you ask?” Coded at only 3 levels with top code (2) awarded for questions regarding existence of other causes, or how the two are linked. For SRQPC code 6 was restricted to the identification of need for rate, considered a critical mathematical response.

SRQ_15
A music teacher was pleased to read the following article in a professional journal: (regarding link between OP and music lessons) Do you agree with these research findings?

Into SRQPC  Out of SRQPC  Response
5           6             Disagree – other factors.
4           5             Disagree – correlation doesn’t imply causation; just coincidence; not enough info; want to check data.
3           5             Disagree – not enough difference in %; too much difference in sample sizes; conditional probability confused.
3           5             Agree – large sample size.
2           3             Disagree – other or no reason.
2           3             Agree – statistical info was supplied; explains possible link; other.
1           3             Agree – personal experience; no reason.
0           0             No response.

Comments: Not in W&C. Coded to maintain consistency with previous item.



SRQ_16
A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups. (table given) One typical summer weekend, 100 people were observed crossing the bridge. Which of the data sets given below would cause you to question the information in the brochure? A, B, C are in decreasing order of difference from brochure.

Into SRQPC  Out of SRQPC  Response
5           5             A only.
4           4             A & B only; A, B & C; none.
3           4             Can’t tell.
2           3             B only; C only; A & C only; B & C only.
1           3             Circled individual values (sem1 2004).
0           0             No response.

Comments: Not in W&C. Correct choice requires appreciation of variation, consistent with level 5.

SRQ_17 to 20
Total rainfall (in millimetres) for the month of November 2003 was collected from 25 weather stations over a region of northern Australia. The data below represent the logarithms (to base 10) of the rainfall. These values have been ordered from smallest to largest for ease of manipulation.

Comments: Not in W&C.

SRQ_17
What is the median of the above data set (i.e. the median of the log of the rainfalls)?

Into SRQPC  Out of SRQPC  Response
4           4             Correct (1.71 - 13th obs)
3           4             Data value near median; max/2; middle row; mean of middle row.
2           4             Mean; modes.
1           4             Other.
0           0             No response.

SRQ_18

Into SRQPC  Out of SRQPC  Response
5           6             Correct (7th or 6.5th obs)
4           6             6.25th obs.; posn of quartile given not value.
3           5             Data value near quartile; med/2; max/4; first quarter of data.
2           5             3rd quartile.
1           4             Other.
0           0             No response.

Comments: Correct response requires more terminology than for SRQ_17.



SRQ_19
Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region.

Into SRQPC  Out of SRQPC  Response
6           -             Correct
5           -             Incorrect trans of median involving logs, exp or powers of 10
4           -             Other trans of median
3           -             Median; trans of mean
1           -             Other
0           -             No response

Comments: Correct response has greater mathematical requirements than for SRQ_18.

SRQ_20
What was the highest total rainfall recorded for the month in the region?

Into SRQPC  Out of SRQPC  Response
6           6             Correct
5           5             Incorrect trans of max involving logs, exp or powers of 10
3           4             Max
1           4             Other
0           0             No response

Comments: As for SRQ_19.

SRQ_21
The graph below shows the number of members of the American Mathematical Society: (Graph is out of proportion)

Into SRQPC  Out of SRQPC  Response
2           2             None of the above statements is correct.
1           1             Membership of the AMS in 1991 was twice what it was in 1989.
1           1             Between 1987 and 1991 membership of the AMS doubled every two years.
0           0             Membership of the AMS could reasonably be expected to reach 50000 by 1992.
0           0             No response.

Comments: Straightforward graph (at tertiary level) consistent with level 2.

SRQ_22
The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.

Into SRQPC  Out of SRQPC  Response
6           5             Shaded area Z<2.
4           5             Shaded area Z<-2.
3           4             Shaded area Z>1.
3           4             Shaded area Z>2.
1           4             Shaded area 0<Z<1.
0           0             No response.

Comments: Correct response requires use of symmetry, a mathematical requirement consistent with level 6.

Table 6.2 Each item response was coded for the Rasch partial credit model on the basis of a substantive framework.



the justification for the scoring applied to each item. The procedure applied was as

follows. Where an SRQ item was used in the Watson and Callingham study, a

response the same or similar to one described by Watson and Callingham received a

score equal to the level of understanding in which that response occurred in the

Watson and Callingham Rasch variable map. (A few minor deviations from this are

explained in Table 6.2.) Where the SRQ identified partially correct responses not

identified by Watson and Callingham, these were scored below the maximum level

of the item in the Watson and Callingham Rasch output, endeavouring to maintain

consistency with our framework. Where an SRQ item did not have an equivalent

within the Watson and Callingham study, responses were scored to reflect our

framework.

Having scored the SRQ responses according to our framework, the Rasch partial credit model

was fitted to the 612 students from three cohorts, using the Quest (Adams and Khoo,

1996) program. Quest uses a joint maximum likelihood procedure to estimate the

parameters of the model in its reparameterised form with the advantage of catering

for null categories. In fitting the model, students who score full or zero marks and

items which are answered fully correctly by all or no students cannot contribute to

the estimation process. As there was one student who scored full marks under the

partial credit scoring system, there were 611 students who contributed to the fit of the

model. 10

Table 6.3 shows the fit statistics for the analysis. The average infit mean square values for both items and persons (1.00 and 0.98 respectively) are again very close to 1, suggesting that the Rasch partial credit model fits the data well. The item separation reliability index, at 0.88, is not as high as it was for the dichotomous model. Item separation reliability improves as the number of persons increases. With 611 students it is probably not surprising that, for 22 items, the dichotomous model gave a very high item reliability. In the partial credit model, the number of item parameters to be estimated increases substantially and hence a larger number of students is needed to maintain precision. Given that our partial credit model allows for up to six levels, an item reliability of 0.88 is still quite pleasing. On the other hand, our person separation reliability index has risen slightly from 0.62 in the dichotomous case to 0.72 in fitting the partial credit model. The infit mean square of each item was also examined to determine the fit of individual items to the model. As explained in Section 6.2, infit mean square values are expected to lie between 0.77 and 1.30, and those which lie outside 0.83 to 1.20 may also be of concern. The infit mean square for each item is given in Table 6.4. It would appear that item 19, with an infit mean square of 0.71, does not sufficiently fit the Rasch partial credit model and that item 20, at 0.80, may also be a cause for some concern.

10 Scoring of SRQ_8 gave the highest code to students who noted the pie graph was out of proportion but did not take the final step of noting that it added to more than 100%, the only answer counted as correct in the dichotomous model. As the student in question fell into this category, he was not deleted from the estimation process in the dichotomous case.

                               All 22 items         Item 19 excluded
item separation reliability    0.88                 0.87
item infit mean square         mean=1.00 SD=0.11    mean=1.00 SD=0.08
person separation reliability  0.72                 0.66
person infit mean square       mean=0.98 SD=0.36    mean=1.00 SD=0.36

Table 6.3 Overall fit statistics for the Rasch partial credit model

One reason an item may not fit the Rasch model is dependence on other items. It is possible that SRQ_19 falls into this category, as it specifically asks for a transformation of the median which had been identified in SRQ_17. Care was taken, however, in scoring SRQ_19 to ensure that full credit was given to students who gave an incorrect response to SRQ_17 and then applied a correct transformation to that response. As discussed in Section 5.3.1, a comparison of the non-response rates to SRQ_19 and 20 indicates that the form of SRQ_19 influenced students’ responses by including the hint “using your answer to question 17”. It is likely that this influence contributed to the lack of fit of SRQ_19.

Because of its lack of fit, item 19 was deleted and the model refitted. The overall measures of fit for this new model are included in Table 6.3 and the individual item infit mean squares in Table 6.4. With item 19 excluded, the fit of most individual items has changed very little; however, those with more extreme infit mean squares (measured roughly by the distance from 1) have become less extreme. As a result, item 20 is no longer a cause for concern. This fit of the Rasch partial credit model, with SRQ_19 excluded, using the scoring described in Table 6.2 (in the column “into SRQPC”) and based on 611 students from three cohorts, is the fit used in further discussion and analysis.

        Infit Mean Square            Infit Mean Square
Item    19 in    19 out      Item    19 in    19 out
1       0.96     0.95        12      1.05     1.03
2       1.00     0.99        13      1.18     1.14
3       0.97     0.95        14      0.95     0.94
4       0.99     0.98        15      1.08     1.05
5       1.04     1.03        16      0.99     0.98
6       0.98     0.98        17      0.90     0.91
7       1.14     1.11        18      0.97     0.96
8       1.18     1.17        19      0.71     --
9       1.03     0.97        20      0.80     0.89
10      1.06     1.02        21      0.94     0.94
11      0.97     0.96        22      1.14     1.11

Table 6.4 Individual item fit statistics for the Rasch partial credit model
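The screening of individual items described above can be sketched in a few lines of Python. The infit values are those of the "19 in" column of Table 6.4, and the two ranges are the ones quoted from Section 6.2. The function name and the category labels are illustrative choices made here, not part of the Quest output.

```python
# Infit mean squares from Table 6.4 (all 22 items in the model).
INFIT_ALL_22 = {
    1: 0.96, 2: 1.00, 3: 0.97, 4: 0.99, 5: 1.04, 6: 0.98, 7: 1.14,
    8: 1.18, 9: 1.03, 10: 1.06, 11: 0.97, 12: 1.05, 13: 1.18, 14: 0.95,
    15: 1.08, 16: 0.99, 17: 0.90, 18: 0.97, 19: 0.71, 20: 0.80,
    21: 0.94, 22: 1.14,
}

# Ranges quoted in the text: outside LOOSE is misfit,
# outside STRICT (but inside LOOSE) may be of concern.
LOOSE = (0.77, 1.30)
STRICT = (0.83, 1.20)


def classify_infit(value):
    """Label an infit mean square (labels are illustrative)."""
    if not LOOSE[0] <= value <= LOOSE[1]:
        return "misfit"
    if not STRICT[0] <= value <= STRICT[1]:
        return "possible concern"
    return "acceptable"


flagged = {
    item: classify_infit(value)
    for item, value in INFIT_ALL_22.items()
    if classify_infit(value) != "acceptable"
}
print(flagged)
```

Applied to the tabulated values, only items 19 and 20 are flagged, in agreement with the discussion above.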

As for the dichotomous model, the Rasch partial credit analysis provides a variable

map which displays item and case estimates on the same logistic scale. While the

case estimates are the $\beta_n$ of Equation 6.2, the item estimates are not the $\delta_{ij}$, which,

as explained earlier, need not be ordered. Rather they are the threshold estimates

from a further alternative parameterisation (Masters, 1988b). The threshold for an

item step is the ability level required in order to have an unconditional probability of

0.5 of passing that step. Thresholds must, by definition, be ordered.
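The distinction between step difficulties and thresholds can be illustrated with a small sketch. It assumes that Equation 6.2 has the standard partial credit form, in which the probability of category x is proportional to the exponential of the sum of (β − δ_k) over the first x steps, and it reads "passing that step" as scoring at that step or higher. Both readings are assumptions made for the illustration, and the step difficulties used are hypothetical values, not estimates from the SRQ data.

```python
import math


def pcm_probs(beta, deltas):
    """Category probabilities for one item under the partial credit model
    (standard form assumed). deltas[k-1] is the difficulty of step k."""
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    largest = max(exponents)  # subtract the maximum for numerical stability
    weights = [math.exp(e - largest) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def threshold(deltas, step, lo=-10.0, hi=10.0, tol=1e-8):
    """Ability at which P(X >= step) = 0.5, found by bisection.
    P(X >= step) increases with ability, so bisection is safe."""
    def p_at_least(beta):
        return sum(pcm_probs(beta, deltas)[step:])

    while hi - lo > tol:
        mid = (lo + hi) / 2
        if p_at_least(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Hypothetical, deliberately disordered step difficulties.
HYPOTHETICAL_DELTAS = [0.8, -0.4, 0.3]
thresholds = [threshold(HYPOTHETICAL_DELTAS, k) for k in (1, 2, 3)]
print(thresholds)
```

Although the second step difficulty is lower than the first, the resulting thresholds come out in increasing order, because P(X ≥ k) can only decrease as k increases.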

The variable map in Figure 6.2 shows how well the students and items cover one another on the logistic scale. (Each X on the map represents five students. Asterisks have been added to the map to indicate the highest, 1.95, and lowest, -0.30, abilities recorded.) At the lower end of the scale, there is a good selection of items and steps which are indicative of that level, with the most basic item steps reaching well below the lowest level of student ability. At the upper end, however, this is not the case: there are fewer items in the upper ranges of student ability, and no item reaches beyond the most able student. It should also be noted that in the upper tail there is a heavy reliance on the curriculum-based items 17 to 20. An ideal extension of the SRQ would include more items which reach into and beyond the higher levels of student ability at the secondary/tertiary interface. Further discussion of this is undertaken in Section 6.5.

SRQ Partial Credit using 1_04 2_04 1_05 - item19 deleted
-------------------------------------------------------------------------------------
Item Estimates (Thresholds)  all on all (N = 611 L = 21 Probability Level=0.50)
-------------------------------------------------------------------------------------
 2.0                       * |
                             | 14.6
                             |
                           X |
                           X | 15.5
                         XXX | 18.4 18.5 20.6
                         XXX |
                      XXXXXX | 15.4
                XXXXXXXXXXXX | 20.5 22.6
 1.0            XXXXXXXXXXXX | 2.5 2.6 13.6 16.5
               XXXXXXXXXXXXX | 10.5 14.5 18.2 18.3 22.4
          XXXXXXXXXXXXXXXXXX |
       XXXXXXXXXXXXXXXXXXXXX | 9.5 9.6 10.3 12.4 15.3
            XXXXXXXXXXXXXXXX | 5.3 7.5 9.2 9.4 12.3 17.4 18.1 20.3 22.3
                   XXXXXXXXX | 1.5 17.3 20.1
                        XXXX | 6.4 8.5 11.4 17.1 17.2
                         XXX | 3.4 8.2 8.3 8.4 14.4
                           X | 8.1 11.3 14.3 16.4
 0.0                         | 2.4 9.1 15.2 16.3 22.1
                           X | 2.3 6.2
                             | 2.2 16.2
                             | 6.1 14.2
                           * | 1.3 14.1 15.1 16.1
                             | 2.1 13.3
                             | 4.3
                             | 12.2
                             | 11.2 13.2
-1.0                         |
                             | 7.2 21.2
                             | 4.1
                             | 10.2
                             | 10.1
                             | 7.1
                             | 21.1
                             |
                             |
                             |
-2.0                         |
                             | 5.1
                             |
                             |
                             |
                             |
                             |
                             |
                             |
-3.0                         |
                             |
                             |
                             |
                             |
                             |
                             |
                             |
                             | 3.1
-4.0                         |
-------------------------------------------------------------------------------------
Each X represents 5 students
=====================================================================================

Figure 6.2 The variable map for the Rasch partial credit model displays the person ability and item step difficulty estimates.

[Side labels in the original figure: Level 1, Level 2, Level 3, Level 4, Level 5, Level 6]

The variable map can also be used to help define levels of understanding. As item

step codes were developed on the basis of a substantive framework, these too can be

incorporated into the definition of these levels. Hence six levels of understanding

were mapped out in the following way. Firstly, the cut-off between level one and

two was chosen so that all item steps in level one had an item step code of one.

Secondly, the cut-off between levels two and three was chosen so that no item step

code greater than two appeared in level two. Finally, cut-offs between levels three

and four, four and five, and five and six were chosen so that levels two, three, four

and five all spanned approximately equal widths on the logistic scale. This divided

the map naturally into six levels with cut-offs at -1.30, -0.65, 0.00, 0.65 and 1.30 on

the logistic scale. Using the initial item step codes to help define these levels ensures

some degree of consistency between these levels and those of the Watson and

Callingham hierarchy.
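The assignment of a level to any threshold on the logistic scale follows directly from the five cut-offs. A minimal sketch is given below; the function name is illustrative, and the treatment of a value falling exactly on a cut-off (assigned here to the upper level) is an assumption, since the text does not specify it.

```python
from bisect import bisect_right

# Cut-offs between levels 1/2, 2/3, 3/4, 4/5 and 5/6 on the logistic scale.
CUTOFFS = [-1.30, -0.65, 0.00, 0.65, 1.30]


def level_of(threshold):
    """Return the level of understanding (1 to 6) for a threshold estimate.
    A value exactly on a cut-off is placed in the upper level (assumption)."""
    return bisect_right(CUTOFFS, threshold) + 1


# The two top steps of SRQ_2 discussed in Section 6.3.1 (0.97 and 1.05).
print(level_of(0.97), level_of(1.05))
```

Both of the SRQ_2 values quoted in Section 6.3.1 fall between 0.65 and 1.30 and are therefore placed in level 5.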

The final column, “Out of SRQPC” of Table 6.2 gives the levels at which each coded

response to each item appears in the Rasch variable map. If the response framework

based on the six levels of Watson and Callingham has been applied consistently to

the SRQ and if the same framework is applicable to describe understanding at the

secondary/tertiary interface, then it should be expected that the item difficulty levels

resulting from the Rasch partial credit model would be consistent with the item

response codes which were used in the model. In the discussion which follows, each

item is examined for its consistency or otherwise.

6.3.1 Consistency of item step codes and difficulties

In SRQ_1, which required the unpacking of the algorithm for the mean, the fully

correct response, which had been coded at level 5, was mapped to level 4 by the

Rasch analysis. Level 5 had been selected for this response as it required a

mathematical understanding seen to be consistent with this level. It should be noted



here that in assigning levels to the item step difficulties of the Rasch analysis we are

categorising a continuous variable. As this response to SRQ_1 appears towards the

top of level 4, there is in fact little inconsistency demonstrated here.

In SRQ_2, which requires the calculation of an appropriate average, the fully correct

response, coded at level 6, maps to level 5, and lower level responses, coded at 1 and

2, map to level 3. As a result, responses coded at 6 (mean with outlier removed)

demonstrate essentially the same ability level (1.05) as responses coded at level 5

(median) with ability level 0.97. Part of the reason for this may simply be that,

because of the small percentage of students coded at level 5 (3.6%) compared to 38%

coded at 6, the Rasch analysis cannot differentiate between the two responses. It is

surprising that the lowest responses map to level 3; however, this too may be partially

due to the small percentage of students giving each of these responses. One possible

approach to this would be to revise the scoring system to ensure a more uniform

distribution of responses across categories (Keeves and Masters, 1999). Rather than

taking this approach at the initial stage, we have chosen to define codes to

differentiate meaningfully between responses where possible but will later use the

levels of understanding defined by the Rasch partial credit model.

In SRQ_3, the marble question, initial coding is confirmed by the Rasch analysis.

There is one other item where the Rasch analysis results in complete confirmation of

coding, that is SRQ_21, the item with an out of scale graph reporting on increasing

membership of a society.

In SRQ_4, the item about rash medication, the top code is confirmed at level 3, while

a partially correct response coded as 1 appears at level 2 in the Rasch map. Given

that 96% of students had this item correct, it is probably impossible to glean much

from the level of understanding demonstrated by a partially correct response. In fact,

in follow-up discussion, at least one student of very high ability reported thinking

that there must be more required by this question than the obvious response, a

common problem for students presented with questions well below their ability.

In SRQ_5, the correct response, which required not being misled by the conjunction fallacy, mapped from code 3 to the top of level 4 in the Rasch analysis. This is one of few items whose top code is mapped to a higher level of understanding than had been predicted. The reason here may be that in the analogous item in the Watson and Callingham study, which requires the three probabilities to be estimated, students are less likely to fall prey to the conjunction fallacy than in the SRQ multiple-choice question.

In SRQ_6, the item about winning the toss in a game of cricket, the correct response

is confirmed at level 4 but lower responses, coded at level 1 and 2, move up to level

3. Responses coded as 2 included those who chose heads or tails despite knowing

that each had a probability of 0.5 of occurring. It may be the case that the numerical

calculation reflects the students’ level of understanding but the choice of heads or

tails follows their own internal rules of choice which can operate simultaneously

without apparent conflict.

In SRQ_7, the new car item, a correct response mapped from code 5 to level 4. It is

suspected that this is one item where there is a difference in the application of levels

of understanding between the school students in Watson and Callingham’s study and

those at the secondary/tertiary interface in this study. For these students, at the

secondary/tertiary stage, a lower level of understanding is reflected in a correct

response to this item.

The results of the Rasch partial credit analysis on SRQ_8, the pie chart summing to

more than 100%, were initially surprising. In this item all responses, coded 1

through to 5, mapped to level 4 understanding. Hence this item has no real

discriminatory power at this level. The suggested reason for this is that not

recognising the problem on the graph does not mean that it would not have been

understood as a problem had it been identified for the students. While it may have

been expected that a more critical eye for such detail would be consistent with higher

levels of understanding, the results of the analysis show this not to be the case.

SRQ_9 is the question requiring estimation of the number of fish in the dam. The

fully correct response coded at 6 mapped to level 5, as did responses of 10% (coded

as 5). Given the small number (less than 1%) of code 5s, it is not surprising that the

model could not discriminate between these two responses. However, given the

requirement of proportional reasoning for the correct answer, it was expected that

this item would reach into level 6. For this item, the lower response codes, 2 and 4,



both mapped to level 4, while code 1 maps to level 3. Examining the item step

difficulties in their continuous form, aside from the gap between code 0 and code 1,

the biggest jump appears between codes 1 and 2. This is readily explained: from

code 2 up, some understanding of sampling and proportions is required. For many

responses coded 4 and possibly even 2, it appears to be lack of basic mathematical

skills rather than lack of statistical understanding which has been demonstrated. (See

Section 7.4 for further discussion of this issue.) This item is also one of a number for

which the lowest level of understanding is higher than expected. Unlike in SRQ_2, it

is not caused here by a small percentage response in the lowest groups resulting in

lack of discrimination with the higher code. It appears here that simply being able or

willing to give a response is demonstrative of a reasonable level of understanding.

This may be consistent with the fact that most of the students in the study are

operating at level 4 or 5.
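The proportional reasoning behind the fully correct response to SRQ_9, and the counting error described in Table 6.2, can be made concrete as follows. The estimator used is the usual capture-recapture ratio, commonly known as the Lincoln-Petersen estimator; the thesis does not name it, so the label and the function name are additions made here for illustration.

```python
def lincoln_petersen(tagged_first, caught_second, tagged_in_second):
    """Capture-recapture estimate of population size: the tagged proportion
    in the second sample is taken to match that in the whole population."""
    if tagged_in_second == 0:
        raise ValueError("no tagged fish recaptured; estimate undefined")
    return tagged_first * caught_second / tagged_in_second


# Figures from SRQ_9: 200 tagged, 250 caught on day two, 25 of them tagged.
estimate = lincoln_petersen(200, 250, 25)

# The error noted in Table 6.2: counting observed fish, ignoring sampling.
counting_error = 200 + (250 - 25)

print(estimate, counting_error)
```

Since 25 of 250 fish (10%) were tagged, the 200 tagged fish are estimated to be 10% of the dam, giving 2000. The counting approach gives 425, which lies in the 250 to 550 range coded as 1.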

In SRQ_10, the weather prediction item, the only difference between coding and

final levels is that the response 75-84%, coded as 3, has mapped to level 5 with the

correct response, 65-74%. It is likely that students giving the response 75-84%

understood the importance of a 70% chance in the question and possibly, reading

more into the question than had been intended, interpreted this as “at least 70%”.

In SRQ_11, the graph of number of years in the town drawn not to scale, there is a

minor adjustment with code 3 mapping to level 4 along with the top response code.

The lower code 2 remains unchanged. In SRQ_12, the choice between a two-dimensional and a

three-dimensional pie chart, the top code 4 maps to level 5 and code 3 to level 4. In

fact these two codes lie respectively just above and below the level 4/5 border.

For SRQ_13, the hospital question, the correct response, coded at 6, maps to level 5.

It is worth noting here that no completely multiple-choice item has a step difficulty

which appears in level 6. In fact all level 6 responses require either a calculation or a

significant explanation.
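The reasoning required for SRQ_13 can be checked with a short binomial calculation. The sketch below assumes, for simplicity, that each hospital records exactly its average number of births on the day in question (50 and 10) and that births are independent with probability 0.5 of a girl; the question itself speaks only of averages.

```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# 80% or more girls: at least 8 of 10 births, or at least 40 of 50.
p_hospital_b = prob_at_least(10, 8)
p_hospital_a = prob_at_least(50, 40)

print(p_hospital_b, p_hospital_a)
```

Under these assumptions the smaller hospital records such a day with probability 56/1024, a little over 5%, while for the larger hospital the event is far rarer, which is why Hospital B is the correct response.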

SRQ_14 and SRQ_15 require a critical assessment of study results. These two items had been coded to maintain consistency between them. Code 6 was restricted to SRQ_14, where it required proportional reasoning in the form of recognising that a rate was needed. In SRQ_14, codes 6, 5, 4 and 3 mapped to levels 6, 5, 4 and 4 respectively, while in SRQ_15, codes 5, 4 and 3 mapped to levels 6, 5 and 5 respectively. Clearly the extra level of critical thinking required in SRQ_15, where the study results appeared more plausible, added sufficient difficulty to shift responses, which appear to be the same across the two questions, into a higher level of understanding. In both items the lowest codes, 1 and 2, mapped to level 3, again a reflection of the degree of effort required to provide a response.

In the Goodwill Bridge question, SRQ_16, the top responses, coded 5 and 4, mapped

to levels 5 and 4 respectively. Code 3 which also mapped to level 4 represented less

than 1% of students. Again codes 1 and 2 mapped to level 3.

SRQ_17 to 20 are the rainfall questions. (SRQ_19 has been excluded from the

Rasch analysis.) SRQ_17, 18 and 20 are characterised by high non-response rates:

18%, 36% and 30% respectively. For all three questions the lowest non-zero code

maps to level 4, an indication of the level of ability demonstrated by being willing to

provide any response to this set of items. Like SRQ_8, SRQ_17 which asks for the

median rainfall has effectively no discriminatory power at tertiary level with all

codes (1 to 4) mapping to level 4. In SRQ_18, requesting the lower quartile, codes 4

(representing less than 1% of students) and 5 map to level 6, codes 2 and 3 to level 5.

These responses all demonstrate a reasonable, if not perfectly correct knowledge of

more advanced terminology than other questions. For SRQ_20, requiring a

mathematical transformation, codes 5 and 6 mapped to levels 5 and 6, while codes 1

and 3 mapped to level 4.
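The calculations behind the rainfall items are brief and can be sketched as follows. The data set itself is not reproduced here, so the sketch uses only the sample size (25), the median of the logged values recorded in Table 6.2 (1.71), and the base-10 back-transformation. The (n+1)/4 rule for the quartile position is one common convention, adopted here as an assumption; Table 6.2 accepts either the 7th or the 6.5th observation.

```python
def median_position(n):
    """Position of the median in an ordered sample of size n."""
    return (n + 1) / 2


def lower_quartile_position(n):
    """Position of the lower quartile using the (n + 1) / 4 convention."""
    return (n + 1) / 4


def back_transform(log10_value):
    """Undo a base-10 logarithm."""
    return 10 ** log10_value


n_stations = 25
median_log_rainfall = 1.71  # 13th observation, from Table 6.2

median_rainfall_mm = back_transform(median_log_rainfall)
print(median_position(n_stations), lower_quartile_position(n_stations))
print(round(median_rainfall_mm, 1))
```

Because the logarithm is monotone, the median of the rainfall is obtained by raising 10 to the median of the logs, roughly 51 mm. The same back-transformation applied to the largest logged value answers SRQ_20.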

SRQ_21 was a question of low difficulty with response codes being confirmed at

levels 1 and 2 by the Rasch partial credit analysis. SRQ_22 required knowledge of

the normal distribution. As for SRQ_17, 18 and 20, this item had a relatively high

non-response rate (16%), and the lowest codes, 1 and 3, both mapped to level 4. Both

codes 4 and 6 mapped to level 5. Given that a fully correct response to this item

demonstrated a degree of familiarity with the normal distribution sufficient to utilise

its symmetry, it is a little surprising that this response did not appear as level 6

understanding. However this is consistent with the observation noted previously that

no completely multiple-choice question had an item step difficulty in level 6.
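The use of symmetry required by SRQ_22 can be verified numerically. The sketch below standardises the height of 161 cm using the stated mean (165 cm) and variance (4 cm²) and evaluates the standard normal distribution function through the error function from the Python standard library; the helper name is illustrative.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z < z) for a standard normal variable Z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


mean_cm, variance_cm2, height_cm = 165, 4, 161

z = (height_cm - mean_cm) / sqrt(variance_cm2)  # standardised height
p_taller = 1 - standard_normal_cdf(z)           # P(Z > z)
p_by_symmetry = standard_normal_cdf(-z)         # P(Z < -z)

print(z, round(p_taller, 4), round(p_by_symmetry, 4))
```

The standardised value is -2, so the required probability is P(Z > -2), which by symmetry equals P(Z < 2), the shaded area in the response coded 6.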



6.3.2 The SRQPC Score

In general it can be concluded that the six levels of statistical literacy described by

Watson and Callingham can be used as a framework for statistical understanding at

the secondary/tertiary interface with one or two minor adjustments. Firstly, the

restriction of proportional reasoning to level 6, critical mathematical, is not necessary

at this stage of education. Such understanding could be lowered to level 5, critical,

or even level 4, consistent non-critical. Secondly, an added emphasis on more

advanced terminology could be added at level 6. It is suggested however that more

items reaching into the critical mathematical level should be developed to better

define the description of this level. All students in this study demonstrated a level of

understanding above the two lowest levels described in that framework: idiosyncratic

and informal. A small number of students were operating with an inconsistent

understanding. Approximately 27% demonstrated a consistent non-critical

understanding, 66% a critical understanding and 6% a critical mathematical

understanding.

Once the Rasch partial credit model had been used to attribute individual item responses to levels of understanding, these levels could then be used to develop and calculate a new score based on the SRQ. Rather than using the dichotomous approach of the SRQ Score described in Section 6.2, the new score uses the levels of understanding resulting from the Rasch partial credit analysis. We shall call this the SRQPC Score. As was the case for the SRQ Score, the SRQPC Score only uses items which were administered in both years of the study. SRQ_19, having been excluded from the Rasch analysis due to lack of fit to the model, was not included. Due to the lack of discriminatory power of SRQ_8 and SRQ_17, described above, these items were also not included. Although SRQ_18 and 20 could have been included, it was felt that they should also be excluded to maintain greater consistency with the original SRQ Score. Hence the SRQPC Score is calculated as the sum of the fitted levels of understanding demonstrated by the item steps achieved for the fourteen items: SRQ_1, 2, 3, 5, 6, 7, 9, 10, 11, 12, 13, 14, 15 and 16.
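The calculation of the score can be sketched for a subset of the items. The mappings from response code to fitted level below are taken from the "into" and "out of" SRQPC columns of Table 6.2 for SRQ_7, SRQ_9 and SRQ_10 only; the full score sums over all fourteen items. The student responses shown are hypothetical, and the function name is illustrative.

```python
# Response code -> fitted level of understanding, from Table 6.2.
FITTED_LEVELS = {
    "SRQ_7": {0: 0, 1: 1, 2: 2, 5: 4},
    "SRQ_9": {0: 0, 1: 3, 2: 4, 4: 4, 5: 5, 6: 5},
    "SRQ_10": {0: 0, 1: 1, 2: 2, 3: 5, 5: 5},
}


def srqpc_partial_score(codes):
    """Sum the fitted levels for the response codes a student achieved.
    Only the three items listed in FITTED_LEVELS are covered here."""
    return sum(FITTED_LEVELS[item][code] for item, code in codes.items())


# Hypothetical student: correct on SRQ_7, a proportion error on SRQ_9,
# and the 75% - 84% response on SRQ_10.
hypothetical_student = {"SRQ_7": 5, "SRQ_9": 4, "SRQ_10": 3}
print(srqpc_partial_score(hypothetical_student))
```

Note that the score adds fitted levels rather than the original codes, so responses which the Rasch analysis could not separate contribute equally.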

As for the aggregate SRQ Score, the SRQPC Score was examined for consistency across cohorts (see Figure 6.3). While the SRQPC Score does not appear as consistent as the SRQ Score, a one-way ANOVA does not indicate any differences between means across cohorts (p>0.8). As described in Section 2.3, the second semester (II_04) cohort is routinely a smaller group with backgrounds which tend to differ from the larger class. Both the size and the different nature of this group explain the apparent difference in shape of the distribution.

One effect of using the SRQPC Score rather than the original aggregate score, is the

lengthening of the lower tail without much change in the bulk of the distribution.

This score accentuates those students whose understanding is particularly poor.

Descriptive Statistics: SRQPC

Variable  Cohort  N    Mean    SE Mean  StDev
SRQPC     I_04    300  52.343  0.365    6.329
          I_05    237  52.021  0.412    6.342
          II_04   74   51.919  0.887    7.626

Variable  Cohort  Minimum  Q1  Median  Q3     Maximum
SRQPC     I_04    26       49  53.5    57.00  66
          I_05    23       48  52.0    56.00  66
          II_04   24       47  54.0    57.25  64

[Boxplot of SRQPC by Cohort: SRQPC on the vertical axis (20 to 70), one box for each of the cohorts I_04, I_05 and II_04]

Figure 6.3 The distribution of SRQPC Scores is essentially consistent across the three cohorts.
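The one-way ANOVA can be approximately reconstructed from the summary statistics alone. The sketch below uses the cohort sizes, means and standard deviations printed above. Because these are rounded, the result only approximates the analysis of the raw scores. The closed form used for the p-value is a property of the F distribution when the numerator has two degrees of freedom, which is the case for three groups; the function names are illustrative.

```python
# (N, mean, standard deviation) for each cohort, from the descriptive statistics.
COHORTS = {
    "I_04": (300, 52.343, 6.329),
    "I_05": (237, 52.021, 6.342),
    "II_04": (74, 51.919, 7.626),
}


def anova_from_summary(groups):
    """One-way ANOVA F statistic computed from group summaries."""
    n_total = sum(n for n, _, _ in groups)
    grand_mean = sum(n * mean for n, mean, _ in groups) / n_total
    ss_between = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in groups)
    ss_within = sum((n - 1) * sd**2 for n, _, sd in groups)
    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


def f_upper_tail_two_numerator_df(f_stat, df_within):
    """P(F > f_stat) for an F(2, df_within) distribution (exact closed form)."""
    return (1 + 2 * f_stat / df_within) ** (-df_within / 2)


f_stat, df_between, df_within = anova_from_summary(list(COHORTS.values()))
p_value = f_upper_tail_two_numerator_df(f_stat, df_within)
print(round(f_stat, 3), df_between, df_within, round(p_value, 2))
```

The F statistic is well below 1 and the p-value is in the region of 0.8, consistent with the conclusion that the cohort means do not differ.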



6.4 Expected Responses to the SRQ

Using the Rasch partial credit model, it is possible to determine the estimated

expected response to an item for a person of given ability. In this section, the

expected response code to each item on the SRQ (excluding SRQ_19) is calculated

for a person of average ability within three groups of students, namely, those who

have not completed Maths B, those who have studied at, but not beyond, Maths B

level, and those who have studied beyond Maths B.

The estimated expected responses are calculated from the model in Equation 6.2, using estimates from the fitted model for the $\delta_{ij}$ and the average β estimate for each of the three groups. The β estimates obtained are:

\[
\begin{aligned}
\hat{\beta}_0 &= 0.73 \quad \text{for students below Maths B level;}\\
\hat{\beta}_1 &= 0.77 \quad \text{for students at Maths B level;}\\
\hat{\beta}_2 &= 0.93 \quad \text{for students beyond Maths B level.}
\end{aligned}
\]

Using these ability estimates and each item step difficulty, ijδ , the expected response

to item i for a person of average ability in maths group g, is calculated as:

( )0

ˆ ; 1,...,18,20,..22; 0,1,2;im

gix

xP X x i g=

= = =∑

where ( )ˆgiP X x= is calculated from Equation 6.2.
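The calculation can be sketched in code. Equation 6.2 is not reproduced in this section, so the sketch below assumes it has the standard partial credit form, with response categories coded consecutively from 0 to the item maximum. The ability values are the three group averages quoted above; the step difficulties and function names are hypothetical and serve only to illustrate the arithmetic.

```python
import math


def pcm_probabilities(beta, deltas):
    """P(X = x) for x = 0..m under the (assumed) standard partial credit model.

    beta   -- person ability
    deltas -- step difficulties delta_i1..delta_im for a single item
    """
    # Cumulative sums of (beta - delta_ij); the sum for x = 0 is defined as 0.
    exponents = [0.0]
    for delta in deltas:
        exponents.append(exponents[-1] + (beta - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


def expected_response(beta, deltas):
    """Expected response: the sum over x of x * P(X = x)."""
    return sum(x * p for x, p in enumerate(pcm_probabilities(beta, deltas)))


# Average ability estimates for the three maths groups (from the text).
group_betas = {"below Maths B": 0.73, "at Maths B": 0.77, "beyond Maths B": 0.93}

# Hypothetical step difficulties for an item with maximum response code 3.
hypothetical_deltas = [-1.5, 0.2, 1.1]

for group, beta in group_betas.items():
    print(group, round(expected_response(beta, hypothetical_deltas), 2))
```

Because the expected response increases with ability, the three groups are always ordered in the same way for every item; only the size of the gap varies with the item's step difficulties.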

Table 6.5 gives the estimated expected level of response for each group, together

with the maximum possible response code allocated for each item.

The closeness of the average β estimates for students below Maths B and those at

Maths B results in little difference in expected responses between these two groups

of students. This tends to reflect the small number of students below Maths B (n=28)

included in the data. In considering the difference in expected responses between

students at Maths B level and those beyond Maths B, it must be realised that the

degree to which the second group can outperform the first depends on the proximity


of students at Maths B level to the maximum possible response for an item. Figure

6.4 plots the difference in expected response to each item between the groups beyond

and at Maths B, and also between the groups at and below Maths B. These

differences are plotted against the expected response for the at Maths B group as a

proportion of the maximum possible response.

Item      Below Maths B   At Maths B   Beyond Maths B   Max Response
SRQ_1          4.2           4.3            4.4              5
SRQ_2          4.5           4.5            4.8              6
SRQ_3          3.4           3.5            3.6              4
SRQ_4          2.9           2.9            2.9              3
SRQ_5          2.2           2.2            2.3              3
SRQ_6          3.3           3.4            3.6              4
SRQ_7          3.9           4.0            4.3              5
SRQ_8          4.3           4.4            4.6              5
SRQ_9          3.8           4.1            4.8              6
SRQ_10         3.3           3.4            3.7              5
SRQ_11         3.4           3.4            3.5              4
SRQ_12         3.1           3.1            3.3              4
SRQ_13         3.7           3.8            4.2              6
SRQ_14         4.1           4.2            4.4              6
SRQ_15         2.6           2.7            3.0              5
SRQ_16         4.1           4.1            4.3              5
SRQ_17         3.0           3.1            3.4              4
SRQ_18         1.5           1.6            2.0              5
SRQ_20         2.5           2.6            3.3              6
SRQ_21         1.9           1.9            1.9              2
SRQ_22         2.8           3.0            3.6              6

Table 6.5 Expected responses to the SRQ predicted by the Rasch partial credit model

While the differences are small between the at Maths B and below Maths B groups,

the nature of the Rasch model is such that the pattern of differences is the same as for

the comparison between the beyond Maths B and at Maths B groups, which can be

seen more clearly. As expected, there is a general decrease in difference as the


expected response for the at Maths B group gets closer to the maximum possible

response.
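The quantities plotted in Figure 6.4 follow directly from Table 6.5. As a rough sketch (the function name is hypothetical, and the inputs are the rounded table values rather than the unrounded model output), each item's coordinates can be computed as follows.

```python
# (below, at, beyond, max) expected responses copied from Table 6.5.
table_6_5 = {
    "SRQ_1": (4.2, 4.3, 4.4, 5), "SRQ_2": (4.5, 4.5, 4.8, 6),
    "SRQ_3": (3.4, 3.5, 3.6, 4), "SRQ_4": (2.9, 2.9, 2.9, 3),
    "SRQ_5": (2.2, 2.2, 2.3, 3), "SRQ_6": (3.3, 3.4, 3.6, 4),
    "SRQ_7": (3.9, 4.0, 4.3, 5), "SRQ_8": (4.3, 4.4, 4.6, 5),
    "SRQ_9": (3.8, 4.1, 4.8, 6), "SRQ_10": (3.3, 3.4, 3.7, 5),
    "SRQ_11": (3.4, 3.4, 3.5, 4), "SRQ_12": (3.1, 3.1, 3.3, 4),
    "SRQ_13": (3.7, 3.8, 4.2, 6), "SRQ_14": (4.1, 4.2, 4.4, 6),
    "SRQ_15": (2.6, 2.7, 3.0, 5), "SRQ_16": (4.1, 4.1, 4.3, 5),
    "SRQ_17": (3.0, 3.1, 3.4, 4), "SRQ_18": (1.5, 1.6, 2.0, 5),
    "SRQ_20": (2.5, 2.6, 3.3, 6), "SRQ_21": (1.9, 1.9, 1.9, 2),
    "SRQ_22": (2.8, 3.0, 3.6, 6),
}


def figure_6_4_points(table):
    """Return {item: (at / max, beyond - at, at - below)}, rounded for display."""
    points = {}
    for item, (below, at, beyond, maximum) in table.items():
        points[item] = (
            round(at / maximum, 2),   # horizontal coordinate
            round(beyond - at, 1),    # Beyond-At difference
            round(at - below, 1),     # At-Below difference
        )
    return points


points = figure_6_4_points(table_6_5)

# Items with the three largest Beyond-minus-At differences.
largest = sorted(points, key=lambda item: points[item][1], reverse=True)[:3]
print(largest)
```

On the rounded table values, the three largest Beyond-minus-At differences belong to SRQ_9, SRQ_20 and SRQ_22.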

[Figure: Scatterplot of difference in expected response (vertical axis, 0.0 to 0.8) against expected response for At Maths B as a proportion of Max response (horizontal axis, 0.3 to 1.0). Two series, Beyond-At_MathsB and At-Below_MathsB, with points labelled by SRQ item number.]

Figure 6.4 Scatterplot showing differences in expected responses between Maths

groups

Items which are of most interest are those which appear to differ from this general

pattern. The most notable of these is SRQ_9, with SRQ_20 and SRQ_22 also of

some interest. The item SRQ_20 requires students to recognise the need, and to know

how, to find the maximum rainfall given the logarithms of the rainfalls. Although

logs are covered in the Maths B curriculum, the reinforcement of Maths B in higher

maths and greater experience in a broader range of applications gives those students

with higher level maths a clear advantage.

Item SRQ_22 involves the normal distribution. In this case, students with higher

maths are less likely to have had more experience in this specific area. As described

in Section 5.3.1, students’ difficulties with this item may in part be due to a reliance

on graphics calculators in working with distributions. It is possible that students who

have studied more maths have developed better visualisation abilities and more

transferable skills which are being demonstrated in this item.


Item SRQ_9 is notable as showing the greatest advantage for students with higher

level maths despite being of average difficulty. This item involves estimating the

fish population in a dam on the basis of a tag and recapture procedure. It is unlikely

that any of the students in the three groups would previously have met a

situation of this nature within the school curriculum. The advantage of the higher

maths students in this item appears to rest on their ability to problem solve and to

handle multi-step tasks. In Section 7.4, the relationship of performance in this item

to numeracy is discussed with emphasis on confidence with handling fractions. In

Section 4.3.5 it was noted that students who have studied beyond Maths B

demonstrate greater ability in items on the Numeracy Questionnaire which involve

application of fractions. This greater ability will give these students an added

advantage in this item.

6.5 Discussion - Suitability of the SRQ

In Chapter 5, the appropriateness of the Statistical Reasoning Questionnaire as a tool

for measuring statistical reasoning at the secondary/tertiary interface was indicated.

Despite this confirmation, an application of both a dichotomous Rasch analysis and a

Rasch partial credit analysis indicates that the complete coverage of student abilities

by items may be a little lacking. In particular, the SRQ has a large number of item

step difficulties below the range of student abilities in the study and insufficient item

step difficulties at and above the maximum student ability. An interesting aspect of

this is that most of the SRQ items which do reach into the higher levels of ability are

part of a group of items requiring a slightly higher level of mathematical knowledge

as well as a greater appreciation of statistical terminology. However, as indicated by

the analysis in Chapter 7, it may be that this is necessary to assess higher student

ability in statistical reasoning.

It is felt that in order to measure more accurately a fuller range of statistical

understanding at the secondary/tertiary interface, more items reaching into the higher

levels of understanding should be included. Such items may well include an

appreciation of terminology, but care needs to be taken to separate reliance on


mathematical manipulation. (The concurrence of terminology and mathematical

knowledge is one of the complications with the items SRQ_17 to SRQ_20.) The

development of items which measure higher statistical understanding (particularly

apart from probabilistic reasoning) at this educational level remains an area for

further research, which should be informed by the results in Chapter 7.

While acknowledging the need for more items which assess higher levels of

understanding, reassurance should be taken from the distribution of total SRQ

Scores. When an assessment is overly simple for a group of students the effect on

the scores should be to apply an artificial ceiling. The resultant distribution would be

skewed with a cluster of observations in the upper tail. An examination of the

distributions (see Figure 5.1) shows that this is not the case for the total SRQ Scores.

Rather all three cohorts demonstrate symmetric distributions, indicating that a ceiling

effect is not in operation.
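The ceiling-effect argument can be illustrated with a simple skewness calculation. The sketch below uses the moment coefficient of skewness, which is one common choice and not necessarily the check applied in the thesis; both score lists are hypothetical.

```python
def sample_skewness(scores):
    """Moment coefficient of skewness, m3 / m2**1.5 (0 for a symmetric sample)."""
    n = len(scores)
    mean = sum(scores) / n
    m2 = sum((s - mean) ** 2 for s in scores) / n
    m3 = sum((s - mean) ** 3 for s in scores) / n
    return m3 / m2 ** 1.5


# Hypothetical scores out of 15, for illustration only.
symmetric_scores = [5, 6, 7, 7, 8, 8, 8, 9, 9, 10, 11]
ceiling_scores = [9, 11, 12, 13, 14, 14, 15, 15, 15, 15, 15]

print(sample_skewness(symmetric_scores))
print(sample_skewness(ceiling_scores))
```

A test that is too easy piles scores up against the maximum and leaves a long lower tail, which shows up as clearly negative skewness; a symmetric set of scores gives a value near zero.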

In acknowledging the tension between the consistency and distribution of SRQ

Scores and the possibility for improvement of the SRQ, notice also needs to be taken

of its purpose within this research. In a study such as this, validity of outcomes

depends largely on the cooperation of students and in particular of students who

represent as accurately as possible all those within the study population. As

described in Chapter 2, the students enrolled in MAB101 are drawn from a wide

variety of backgrounds. Even moderately difficult items designed to measure the

ability of higher level students would be well beyond the ability of others. In order

to maximise a student’s cooperation, care needs to be taken to minimise the number

of such difficult items which, particularly at first glance, would be a disincentive to

less able or less motivated students. On the basis of the symmetry and consistency of

the distribution of students’ scores, it is argued that the SRQ is of an appropriate

level to assess the statistical reasoning of students at the secondary/tertiary interface,

balancing the competing factors of student ability and cooperation.

Chapter 7 Factors Which Influence Statistical

Reasoning

7.1 Introduction

The major aim of this study is to better understand factors which influence statistical

thinking at the secondary/tertiary interface. In this chapter, two distinct aspects of

this are considered within the context of MAB101. The first of these examines

performance on the Statistical Reasoning Questionnaire for incoming students, while

the second looks at students’ performance on the end of semester section of the

assessment. The pressures of course requirements and the need to maximise student

cooperation prevented a follow-up Statistical Reasoning Questionnaire from being

administered as part of this study; moreover, the performance of students in course

assessment should be considered as an important aspect of their statistical

development.

Because of the differences between the first and second semester cohorts, described

in Chapter 2, and the difficulties with the administration of the Numeracy

Questionnaire in Semester II 2004, explained in Chapter 4, only the Semester I 2004

and Semester I 2005 cohorts are included in this chapter. In Section 7.2, general

linear models are used to describe the SRQ Score in terms of students’ numeracy,

attitudes, mathematical backgrounds and demographic variables. Similar modelling

carried out in relation to the SRQPC Score (defined in Section 6.3.2) is described in

Section 7.3. Section 7.4 investigates more closely the relationship between

numeracy and individual items of the SRQ. The modelling of students’ course

outcomes via general linear models is described in Section 7.5. Section 7.6

concludes this chapter with a discussion of these results.


7.2 Predictors of Statistical Reasoning of Incoming Students

As described in Section 6.2, the SRQ Score used here is formed as the total number

of correct responses to items common to the SRQ in both 2004 and 2005, excluding

those items considered to be more curriculum-based in nature. Items thus included

are SRQ_1 to SRQ_3 and SRQ_5 to SRQ_16. The variables considered as possible

predictors for the SRQ Score were those considered in the analysis of Numeracy Scores

(see Section 4.3.4), together with the Numeracy Score itself and separate aspects of

attitudes towards statistics. Hence the allowable predictor variables are:

Gender: male/female;

1st Semester: dichotomous variable =1 for students for whom this was their first year of enrolment at QUT;

Years Since School: continuous variable;

OP: continuous variable - OP (overall position) or equivalent score; ranges from 1 to 25; numerically smaller values of OP indicate a better performance;

Maths Student: dichotomous variable =1 if enrolled in a mathematics degree, a double degree including mathematics, an applied science degree majoring in mathematics, or an education degree with mathematics as a teaching subject;

Repeat: dichotomous variable =1 if student had previously failed MAB101;

Maths B Result: a categorical variable with three levels:

    D = distinction (an A or B standard at school level; a 6 or 7 standard in the university equivalent subject),

    P = pass (any other passing grade),

    N = failed or not attempted (includes all students who have not successfully completed Maths B);

Higher Maths: a dichotomous variable =1 for students who have studied mathematics beyond Maths B, generally either the extension maths subject (Maths C) at high school or at least some university level mathematics;

Numeracy: a continuous variable given by the total score on the Numeracy Questionnaire, with range 0 to 21;

Affect: continuous variable with possible values ranging from -2 to 2; constructed as the mean of responses to the five Affect items (see Section 3.2);

Value: as for Affect – based on four items;

Difficulty: as for Affect – based on two items;

Motivation: as for Affect – based on two items;

Link: as for Affect – based on two items;

Use: as for Affect – based on one item;

Self-Efficacy: continuous variable with possible values ranging from -10 to 10; constructed as the sum of responses to the five Self-Efficacy items (see Section 3.2).

As described in Section 4.3.4, general linear models with a combination of

backwards elimination and forwards selection were used to describe relationships

between SRQ Score and the predictor variables. Because of the complications in the

reporting of OP scores (see Section 2.3) modelling was carried out both with OP as a

possible predictor and without.

7.2.1 Results for the 2004 cohort

An initial analysis was performed using only the 2004 data. With OP in the model,

significant predictors for SRQ Score are: Numeracy (p=0.001), OP (p=0.001),

Gender (p=0.012) and Years Since School (p=0.040). Further investigation revealed

that the significance of Years Since School is due entirely to three students for whom


Years Since School is greater than ten. A more robust model involves Numeracy

(p<0.001), Gender (p=0.005) and OP (p=0.024), and has an adjusted R-squared

value of 22%. In this model SRQ Score increases with improved Numeracy and OP,

with males outperforming females. With OP excluded from the model, significant

predictors are Numeracy (p<0.001) and Gender (p=0.004), with an adjusted R-

squared of 22%.

Due to the insignificance of the various aspects of attitudes in explaining the SRQ

Score in 2004 and the desire to investigate issues specifically relevant to high school

experiences of statistics (see Section 3.5), the format of the Attitudinal Survey was

changed in 2005. The Affect, Value, Difficulty, Motivation, Link and Use aspects

were dropped, while the Self-Efficacy component was retained because of the

significance of this aspect in explaining numeracy. (See Section 4.3.4.)

7.2.2 Results for the 2005 cohort

Results for 2005 were also considered separately. Allowing for OP, significant

predictors are Numeracy (p<0.001), OP (p=0.001) and Years Since School (p<0.001)

with an adjusted R-squared of 30%. Further investigation of the Years Since School

term indicated that it is not the result of a small number of older students as in 2004.

Rather, for 2005, even when restricted to students with Years Since School less than

five, this variable is still a significant predictor with higher values resulting in higher

SRQ Scores. Without OP in the model, significant predictors are Numeracy

(p<0.001) and Maths B Result (p=0.009). The effect of Maths B Result in this model

is such that there is benefit in SRQ for students with Maths B Result = D over the

others, but no significant difference between those with Maths B = P and those who

have failed or not completed Maths B.

7.2.3 Combining the 2004 and 2005 cohorts

The two years were also considered in a combined analysis with an indicator variable

allowing for year and possible two-way interactions with Year included. With OP

included, the best model for the combined years has an adjusted R-squared of 25%

and significant predictors: Numeracy (p<0.001), OP (p<0.001) and Years Since

School (p<0.001). The fitted equation for this combined model is:


Equation 7.1

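$$E[\text{SRQ}] = \underset{(\text{s.e.}=0.65)}{6.40} + \underset{(\text{s.e.}=0.035)}{0.21} \times \text{Numeracy} - \underset{(\text{s.e.}=0.045)}{0.19} \times \text{OP} + \underset{(\text{s.e.}=0.047)}{0.20} \times \text{Years Since School}.$$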

Recall that the negative coefficient for OP is as expected since smaller values of OP

score indicate a better performance. The residual plots, shown in Figure 7.1, show

no systematic concerns with the model.
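As a small worked illustration of how Equation 7.1 is read, the sketch below evaluates the fitted equation for two hypothetical students who differ only in OP. The coefficients are those reported in the equation; the student profiles and the function name are assumptions made for the example.

```python
def expected_srq(numeracy, op, years_since_school):
    """Fitted SRQ Score from Equation 7.1 (coefficients as reported in the text)."""
    return 6.40 + 0.21 * numeracy - 0.19 * op + 0.20 * years_since_school


# Two hypothetical school leavers with the same Numeracy Score but different OP.
student_a = expected_srq(numeracy=15, op=5, years_since_school=0)
student_b = expected_srq(numeracy=15, op=10, years_since_school=0)

print(round(student_a, 2), round(student_b, 2))
```

The student with the numerically smaller (better) OP has the higher fitted SRQ Score, which is the direction implied by the negative OP coefficient.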

[Figure: Residual plots for SRQ on Numeracy, OP and Years Since School. Four panels: Percent against Standardized Residual; Standardized Residual against Fitted Value; Frequency against Standardized Residual; Standardized Residual against Observation Order.]

Figure 7.1 Residual plots for the model in Equation 7.1 show no systematic concerns.

Without OP, the best model has as significant predictors, Numeracy (p<0.001), Self-

Efficacy (p=0.011) and Gender (p=0.039) and an adjusted R-squared of 21%. The

fitted equation for this model is:

Equation 7.2

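$$E[\text{SRQ}] = \underset{(\text{s.e.}=1.15)}{4.50} + \underset{(\text{s.e.}=0.22)}{0.45} \times (\text{Gender} = \text{Male}) + \underset{(\text{s.e.}=0.027)}{0.23} \times \text{Numeracy} + \underset{(\text{s.e.}=0.039)}{0.10} \times \text{Self-Efficacy}.$$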

Again the residual plots shown in Figure 7.2 show no concerns.
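The reported p-values can be related to the coefficients and standard errors in Equation 7.2 by a rough calculation. The sketch below assumes that the standard errors are listed in the same order as the terms of the equation, and uses a standard normal reference distribution in place of the t distribution, which is a reasonable approximation for a sample of this size. The function name is hypothetical.

```python
import math


def wald_z_and_p(coefficient, standard_error):
    """Return (z, two-sided p) using a standard normal reference distribution."""
    z = coefficient / standard_error
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


# (estimate, s.e.) pairs read from Equation 7.2; the pairing is an assumption.
terms = {
    "Gender = Male": (0.45, 0.22),
    "Numeracy": (0.23, 0.027),
    "Self-Efficacy": (0.10, 0.039),
}

for name, (estimate, se) in terms.items():
    z, p = wald_z_and_p(estimate, se)
    print(f"{name}: z = {z:.2f}, p = {p:.3f}")
```

The approximate p-values obtained this way are close to those quoted for the model, which supports the assumed pairing of standard errors with coefficients.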



[Figure: Residual plots for SRQ on Gender, Numeracy, and Self-Efficacy. Four panels: Percent against Standardized Residual; Standardized Residual against Fitted Value; Frequency against Standardized Residual; Standardized Residual against Observation Order.]

Figure 7.2 Residual plots for the model in Equation 7.2 show no concerns.

7.2.4 Synthesising the results

In synthesising the results of the analyses for the two years both individually and

combined, a few points are notable. Firstly, it is clearly evident that in all models,

with or without OP score included, the score on the Numeracy Questionnaire is a

highly significant predictor of the score on the SRQ and is a more useful predictor

than other variables formed from the mathematical backgrounds of students. When

the OP score is included in the model, it too is highly significant in addition to the

Numeracy Score. Both these variables work in the expected direction, that is,

increasing Numeracy increases SRQ and improving OP increases SRQ.

The significance of gender differs between the two years. In 2004, Gender is

significant both with and without OP, while in 2005 it is not. When the two years

are combined, Gender is significant without OP but not with. In all cases where

Gender is significant, males outperform females in the SRQ in addition to their

advantage in the Numeracy Questionnaire. (See Section 4.3.4)

When OP is included in the model, Years Since School is significant in 2005 but not

in 2004. It is likely that this difference between years is related to the under-

reporting of OP scores in 2004. It is not surprising that students with higher values

of Years Since School are less likely to report their OP score. Investigation of the

two years indicates that in the 2004 data, there is much stronger evidence than in


2005 of this relationship between the non-reporting of OP and increasing values of

Years Since School. For the 2005 data, the coefficient of Years Since School in the

model is positive, suggesting that mature-age students in general have experience

which improves their statistical reasoning above what their numeracy and OP scores

together would predict. The same effect is not evident in the 2004 data possibly

because fewer such students have provided an OP score.

When OP score is not included, Years Since School is not significant. Investigation

of the Years Since School variable shows that, considering pairwise correlations, it is

uncorrelated with both SRQ and Numeracy but positively correlated with OP, that is

larger OP scores (indicating lower achievement) are linked with larger values of

Years Since School. Hence a reasonable interpretation of the positive coefficient of

Years Since School in models for SRQ which include OP, is as a correction for the

OP score amongst older students.

In general, when OP score is not included in the model, less variation in SRQ can be

explained. Other variables, such as Self-Efficacy in the combined cohorts and Maths

B Result in 2005, become useful predictors, but these do not explain SRQ as well as

OP. In all cases, the Numeracy Score remains pre-eminent in its usefulness in

explaining statistical reasoning.

7.2.5 Recent school leavers

In order to focus on those students who had graduated more recently from high

school, the above analyses (for each year individually and combined) were repeated

considering only those students who had values for Years Since School equal to zero

or one. This removed 192 students from the total combined years' data of 484

students. Eighty-two of these had a Years Since School value of two or more, while

110 had a missing value for this variable.

The results for these smaller groups were similar to those for the complete groups.

When considering only these students at the immediate secondary/tertiary interface,

for the 2004 group, Gender is not significant in predicting SRQ whether or not OP is

included. Without OP, Self-Efficacy (p=0.008) becomes a useful predictor in 2004,

while in 2005, Maths B Result is not significant when OP is not allowed for. When


the two years are combined, Gender is not significant but there is a significant

difference between years which acts in the opposite direction to the Year effect for

Numeracy. The two fitted equations for years combined are:

Equation 7.3

$$E[\text{SRQ}] = \underset{(\text{s.e.}=0.76)}{6.52} + \underset{(\text{s.e.}=0.040)}{0.21} \times \text{Numeracy} - \underset{(\text{s.e.}=0.052)}{0.22} \times \text{OP}$$

when OP is included, and,

Equation 7.4

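$$E[\text{SRQ}] = \underset{(\text{s.e.}=0.55)}{3.78} + \underset{(\text{s.e.}=0.036)}{0.25} \times \text{Numeracy} + \underset{(\text{s.e.}=0.052)}{0.15} \times \text{Self-Efficacy} + \underset{(\text{s.e.}=0.28)}{0.63} \times (\text{Year} = 2004)$$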

when OP is not included. Hence when consideration is restricted to only those

students more specifically at the secondary/tertiary interface, the importance of

numeracy in predicting statistical reasoning is not altered.

7.2.6 Incorporating levels of numeracy

Using the levels of numeracy defined in Section 4.3.3 by the Rasch analysis, the

Numeracy Score was broken into two components, introductory numeracy and

intermediate numeracy. The variables thus defined:

Intro-Numeracy: a continuous variable given by the total score corresponding to levels 1 to 3 on the Numeracy Questionnaire (i.e. items N_1-N_3, N_5-N_11 and N_19) with range 0 to 11;

Inter-Numeracy: a continuous variable given by the total score corresponding to levels 4 and 5 on the Numeracy Questionnaire (i.e. items N_4, N_12-N_18, N_20-N_21) with range 0 to 10;

were included as possible predictors for SRQ, using the complete 2004/2005 data set.

Allowing for OP, the best model for SRQ involves both components of numeracy,

Intro-Numeracy (p=0.021) and Inter-Numeracy (p=0.001) as well as OP (p<0.001),

and Years Since School (p<0.001). This model has an adjusted R-squared of 25%

and the fitted equation:

Equation 7.5

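$$E[\text{SRQ}] = \underset{(\text{s.e.}=0.77)}{6.46} + \underset{(\text{s.e.}=0.083)}{0.19} \times \text{Intro-Numeracy} + \underset{(\text{s.e.}=0.062)}{0.21} \times \text{Inter-Numeracy} - \underset{(\text{s.e.}=0.045)}{0.19} \times \text{OP} + \underset{(\text{s.e.}=0.048)}{0.21} \times \text{Years Since School}.$$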

This model is virtually identical to the corresponding model (Equation 7.1) which

considers the Numeracy Score as a single entity, explains the variation in SRQ

equally well, and gives essentially equal weighting to the two components of the

Numeracy Score.

Without allowing for OP, the best model involves the predictors Intro-Numeracy

(p<0.001), Inter-Numeracy (p=0.003), Self-Efficacy (p=0.010) and Gender

(p=0.030). This model has an adjusted R-squared of 21% and the fitted equation:

Equation 7.6

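$$E[\text{SRQ}] = \underset{(\text{s.e.}=0.48)}{3.88} + \underset{(\text{s.e.}=0.22)}{0.47} \times (\text{Gender} = \text{Male}) + \underset{(\text{s.e.}=0.064)}{0.34} \times \text{Intro-Numeracy} + \underset{(\text{s.e.}=0.049)}{0.15} \times \text{Inter-Numeracy} + \underset{(\text{s.e.}=0.039)}{0.10} \times \text{Self-Efficacy}.$$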

Again, this model has the same significant variables as the corresponding model

(Equation 7.2) which considers the Numeracy Score as a single entity and explains

the variation in SRQ equally well. However, without OP in the model, SRQ is best


predicted by giving approximately twice the weighting to the introductory

component of the Numeracy Score as to the intermediate component.

The most important point to be noted from these models is that, contrary to what may

appear intuitive, both introductory and intermediate components of the Numeracy

Score are significant in explaining statistical reasoning. This is despite the fact that

the SRQ requires very few mathematical calculations of the type included in the

introductory component of the Numeracy Questionnaire and no manipulations of the

type included in the intermediate component. Again we see evidence that ability

with mathematical skills tends to be linked with statistical reasoning even if the

reasoning does not explicitly require those specific mathematical skills.

7.3 Predictors of the SRQPC Scores

In Section 6.3.2, the results of the Rasch partial credit model were used to create the

SRQPC Score in which responses to the SRQ are scored according to the level of

reasoning they reflect. As for the SRQ Score, general linear models were used to

determine the best predictors for the SRQPC Score.

This modelling resulted in the same combination of predictor variables for SRQPC as

for SRQ, namely: Numeracy (p<0.001), Years Since School (p<0.001) and OP

(p=0.010), with an adjusted R-squared of 20% and the fitted equation:

Equation 7.7

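$$E[\text{SRQPC}] = \underset{(\text{s.e.}=1.70)}{47.00} + \underset{(\text{s.e.}=0.093)}{0.54} \times \text{Numeracy} - \underset{(\text{s.e.}=0.12)}{0.30} \times \text{OP} + \underset{(\text{s.e.}=0.12)}{0.44} \times \text{Years Since School}.$$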

However, the residual plots, shown in Figure 7.3, indicate a degree of left skewness.



[Figure: Residual plots for SRQPC on Numeracy, OP, Years Since School. Four panels: Percent against Standardized Residual; Standardized Residual against Fitted Value; Frequency against Standardized Residual; Standardized Residual against Observation Order.]

Figure 7.3 Residual plots for the model in Equation 7.7 indicate some left

skewness.

A standard technique for handling skewness is to apply a Box-Cox transformation,

that is, a transformation of the form $Y^{\lambda}$ to the response variable, $Y$, such that the

transformed data is closer to normal. For the SRQPC data, choosing the value of $\lambda$

which minimises the variance of the transformed variable, hence maximising the

likelihood, gives a 95% confidence interval for $\lambda$ of 1.90 to 3.32. Hence, the value

$\lambda = 3$ is chosen as the most appropriate. When this transformation is applied and the

modelling procedure repeated for the transformed data, the resulting best model

contains the same predictor variables as for SRQPC, (and SRQ), namely Numeracy,

Years Since School and OP. The residual plots for this model, shown in Figure 7.4

demonstrate no systematic concerns. It should be emphasised that this procedure is

used here simply to check that the significant variables are unchanged by

transforming to obtain more normal residuals. It is not suggested that a

transformation of the SRQPC be used. The interest lies not in predicting the SRQPC

but in investigating its nature and what influences it.
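The choice of the Box-Cox parameter can be sketched in a simplified setting. The code below is a one-sample illustration with no predictors, not the regression-based procedure used for the SRQPC model: it evaluates the usual profile log-likelihood over a grid of candidate values and returns the maximiser. The scores are hypothetical, and the function names are introduced only for this example.

```python
import math


def boxcox_transform(values, lam):
    """Box-Cox transform: (y**lam - 1) / lam, or log(y) when lam is 0."""
    if lam == 0:
        return [math.log(y) for y in values]
    return [(y ** lam - 1.0) / lam for y in values]


def profile_log_likelihood(values, lam):
    """Profile log-likelihood of lam for a normal model with a constant mean."""
    n = len(values)
    z = boxcox_transform(values, lam)
    mean = sum(z) / n
    variance = sum((v - mean) ** 2 for v in z) / n
    # The second term is the log of the Jacobian of the transformation.
    return -0.5 * n * math.log(variance) + (lam - 1.0) * sum(math.log(y) for y in values)


def best_lambda(values, grid):
    """Grid value of lam with the largest profile log-likelihood."""
    return max(grid, key=lambda lam: profile_log_likelihood(values, lam))


# Hypothetical left-skewed scores (not the SRQPC data).
scores = [31, 40, 44, 47, 49, 50, 52, 53, 54, 55, 56, 57, 58, 59, 60]
grid = [step / 4 for step in range(-8, 21)]  # -2.0, -1.75, ..., 5.0

print(best_lambda(scores, grid))
```

In practice a convenient rounded value inside the confidence interval is used rather than the exact maximiser, as with the choice of 3 above.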

If the Numeracy Score is split into its two components, then again the predictor

variables are the same as for SRQ. No further modelling of the SRQPC Score was

carried out because of the similarity to the SRQ with respect to important predictors,

and because of the difficulties of interpreting the transformed variable.



[Figure: Residual Plots for lambda=3. Four panels: Percent against Standardized Residual; Standardized Residual against Fitted Value; Frequency against Standardized Residual; Standardized Residual against Observation Order.]

Figure 7.4 Using the transformed response variable, residual plots demonstrate no systematic concerns.

7.4 Further Exploration of the Link between Numeracy and

Statistical Reasoning

As well as considering the SRQ and SRQPC Scores, individual items on the SRQ

were investigated regarding their relationship with numeracy. For each item in the

SRQ, the mean Numeracy Scores were compared, using t-tests, for the group of

students who had the item correct and those who had it incorrect. This comparison

was made for both the introductory and intermediate Numeracy Scores as well as the

total score. Table 7.1 shows the results of these comparisons. For those items with a

highly significant difference in mean scores for each of introductory, intermediate

and total numeracy, a 95% confidence interval for the mean difference in the total

score is also quoted.
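The item-by-item comparison can be sketched as a two-sample calculation. The thesis does not state whether pooled or unpooled variances were used, so the sketch below assumes the unpooled (Welch) form, and it builds a rough 95% interval with the large-sample multiplier 1.96 rather than a t quantile. The scores and names are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (difference in means, standard error, t statistic) for two samples."""
    diff = mean(group_a) - mean(group_b)
    se = math.sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return diff, se, diff / se


# Hypothetical Numeracy Scores (range 0 to 21), for illustration only.
numeracy_item_correct = [17, 15, 18, 14, 16, 19, 13, 17, 16, 15]
numeracy_item_incorrect = [12, 14, 11, 15, 13, 10, 14, 12]

diff, se, t_stat = welch_t(numeracy_item_correct, numeracy_item_incorrect)

# Rough 95% interval for the mean difference (normal multiplier, an approximation).
interval = (diff - 1.96 * se, diff + 1.96 * se)
print(round(diff, 3), round(t_stat, 2), tuple(round(b, 2) for b in interval))
```

With the real data, the two groups for each item are the students who answered the item correctly and those who did not, and the interval corresponds to the final column of Table 7.1.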

Items SRQ_1, SRQ_3, SRQ_9, SRQ_19 and SRQ_20 are notable, as for these items

the difference in Numeracy Scores between the students who succeed at the item and

those who do not, is greatest. In the case of SRQ_19 and SRQ_20 which involve

logarithms, no explanation of this relationship is required. SRQ_1 is the item which

requires an appreciation of the algorithm for the mean. Its relationship to numeracy

is evidence of the claim made in Section 5.3.1 that a correct response to this item

reflects something other than the students being able to select an appropriate average

as is implied in the SRA component scales.


The remaining two items in this group, SRQ_3 and SRQ_9, both require some ability

with fractions. Given that the item SRQ_3 (involving drawing marbles from boxes)

is relatively easy for students at this level with a success rate of 82%, its strong

relationship to numeracy is interesting. As explained in Section 5.3.1, it is found

with this item that most errors are the result of miscalculations rather than

misunderstandings of probability. There is evidence here that errors resulting from

apparent carelessness (such as calculating the ratio of red to blue marbles in one box

and red to all marbles in the other), even with the use of a calculator, are not entirely

random, but likely to be related to a lack of confidence and familiarity with handling

common fractions in any situation.

Item      Introductory   Intermediate   Total      95% CI for mean diff.
          Numeracy       Numeracy       Numeracy   in total numeracy
SRQ_1     ***            ***            ***        (2.4,3.9)
SRQ_2     ***            ***            ***        (1.2,2.7)
SRQ_3     ***            ***            ***        (2.1,4.0)
SRQ_4     -              -              -
SRQ_5     ***            ***            ***        (0.7,2.2)
SRQ_6     ***            -              *
SRQ_7     *              -              *
SRQ_8     -              -              -
SRQ_9     ***            ***            ***        (2.4,3.8)
SRQ_10    ***            ***            ***        (1.4,2.8)
SRQ_11    -              -              -
SRQ_12    -              -              -
SRQ_13    -              **             **
SRQ_14    *              -              -
SRQ_15    -              -              -
SRQ_16    -              -              -
SRQ_17    ***            -              ***
SRQ_18    -              -              -
SRQ_19    ***            ***            ***        (2.0,3.8)
SRQ_20    ***            ***            ***        (2.6,4.3)
SRQ_21    -              **             **
SRQ_22    -              -              -

*** p<0.005, ** 0.005<p<0.01, * 0.01<p<0.05

Table 7.1 SRQ items for which successful and unsuccessful students reflect a significant difference in mean Numeracy Scores


172

Responses to SRQ_9 (the tag and recapture fish question) were investigated more

closely regarding their relationship with both the introductory and intermediate levels

of numeracy. In Section 6.3, through the Rasch analysis, responses to this item were

scored as:

0 no response, or response < 200;

3 responses between 200 and 525;

4 other incorrect responses – usually the result of calculating an

incorrect proportion;

5 correct response (2000); 10% (did not quite complete).

[Figure: Two boxplot panels, Introductory Numeracy Score (scaled from 2 to 11) and Intermediate Numeracy Score (scaled from 0 to 10), each plotted against response level for SRQ_9 (0, 3, 4, 5).]

Figure 7.5 Boxplots of Introductory and Intermediate Numeracy Scores by response to SRQ_9


Figure 7.5 gives the boxplots for the introductory and intermediate Numeracy Scores

of students, according to their fitted response code for item SRQ_9.

The boxplots indicate that students giving a level 5 response to SRQ_9 tend to score

more highly on both components of numeracy. There is a suggestion that those

responding at level 4 tend to have lower scores than those responding at level 3 on

the introductory numeracy but not on the intermediate numeracy. As the

introductory component of the Numeracy Questionnaire includes the items which

require handling of fractions, this is evidence of a relationship between the

manipulation of fractions in simple arithmetic and the successful application of

proportional reasoning. That is, students responding at level 4 are most likely being

prevented from completing their statistical reasoning by a lack of competency in

handling fractions.

The demonstration of a link between handling and manipulating fractions, as

assessed in the Numeracy Questionnaire, and using full and complete proportional

reasoning, as required for success at both SRQ_3 and SRQ_9, is of importance.

While manipulation and reasoning may well be considered as different abilities, the

link shown here suggests that they cannot be separated.

7.5 Predictors of the Exam Component of Assessment

The assessment schedule for MAB101 is given in Table 2.2. The assessment focus

in this section is on the end of semester exam, as it is a measure of students’ overall

individual learning of the core knowledge and skills presented in the course. The

mid-semester test focuses only on specific sections of the course. Quizzes, designed

largely to encourage engagement with the material, are often completed with a

substantial amount of assistance from tutors and other students. The project is an

excellent tool for developing statistical thinking but has the disadvantage in research

of being completed in groups and therefore may not necessarily be accurate as a tool

for measuring individual learning. However, the end of semester exam reflects the

skills and understanding developed through the projects. Analysis of data as well as

interaction with students in helping with projects throughout the semester indicates



that those with the best understanding in their projects also tend to demonstrate better

understanding and performance in the exam.

In both 2004 and 2005, the end of semester exam contributed 60% of the assessment

for MAB101, or 50% for students who completed an optional essay. In both years,

the maximum marks available on the exam paper were more than what was required

for full marks in order to give students some flexibility of choice, but because of the

diversity of students, a considerable number of students obtained more than the

requisite full marks. Hence, for this analysis, raw exam scores were scaled as a

percentage of the maximum marks available and the two years combined.
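The scaling described above is a simple rescaling to a percentage. A minimal sketch follows; the function name and the marks used in the usage note are hypothetical and are not taken from the MAB101 exam papers.

```python
def scale_exam_score(raw_marks, max_marks_available):
    """Express a raw exam mark as a percentage of the maximum marks
    available on the paper (not of the marks required for full marks)."""
    if max_marks_available <= 0:
        raise ValueError("max_marks_available must be positive")
    return 100.0 * raw_marks / max_marks_available
```

With a hypothetical paper offering 120 marks, a raw mark of 90 would be scaled to `scale_exam_score(90, 120)`, that is, 75.0.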

Of those students who sat the exam at the end of semester, approximately 80% had

completed at least one of the Attitudes Survey, Numeracy Questionnaire or

Statistical Reasoning Questionnaire at the beginning of semester. Students who were

not surveyed consist predominantly of those who enrolled after the first week of

semester and those who attended very few classes even from week one.

Approximately 80% of students involved in the survey process sat the final exam.

Those not doing so include students who formally withdrew without penalty in the

first weeks of the course and those who failed to complete the course without notice.

Not surprisingly, the participants in both the survey process and the final exam are

not entirely representative of the complete group. In the final exam, those students

who had participated in the survey process outperformed those who had not, with a

95% confidence interval for the mean difference in exam mark of (5.2, 13.0).

Conversely, both introductory and intermediate levels of numeracy as well as SRQ

and SRQPC Scores were significantly higher for students who sat the exam

compared to those who did not. Interestingly, however, there is no indication of a

difference in OP scores between these two groups. When possible relationships are

investigated between the various background variables and whether or not a student

sat the exam, the only significant relationship is with the variable Maths B Result,

with students who have failed or not completed Maths B less likely to sit the exam.

This is consistent with the observation that students without Maths B tend to need

extra support with all concepts throughout the semester, not only those that are

specifically associated with mathematics.
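An interval such as the 95% confidence interval for the mean difference in exam mark quoted above can be computed along the following lines. The thesis does not state which procedure was used, so this sketch assumes a large-sample normal approximation with unpooled variances; the function name and any data passed to it are hypothetical.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def mean_difference_ci(group_a, group_b, level=0.95):
    """Approximate confidence interval for mean(group_a) - mean(group_b).

    Uses a normal critical value and unpooled sample variances
    (an assumed method, chosen for simplicity).
    """
    z = NormalDist().inv_cdf(0.5 + level / 2)
    diff = mean(group_a) - mean(group_b)
    std_err = sqrt(variance(group_a) / len(group_a)
                   + variance(group_b) / len(group_b))
    return diff - z * std_err, diff + z * std_err
```

An interval that excludes zero, as (5.2, 13.0) does, corresponds to a significant difference between the two groups at the matching level.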



It has already been observed in this chapter that numeracy, mathematical

backgrounds, OP and self-efficacy feature in explaining the SRQ and the SRQPC

Scores. General linear models were used to investigate the simultaneous significance

of students’ mathematical backgrounds, statistical reasoning and numeracy at the

beginning of semester on their end of semester exam score. That is, knowing that

numeracy, mathematical backgrounds and OP are significant in explaining SRQ, the

use of general linear models explores what happens when we consider their

combined effects on the end of semester exam score. Using the SRQ Score to

measure statistical reasoning, the best model for Exam Score has an adjusted R-

squared of 26% and significant predictors: OP (p<0.001), Year (p<0.001), Maths

Student (p=0.003) and an interaction between Maths Student and Year (p=0.005).

Neither Numeracy nor SRQ is significant in the presence of these variables (which, it

must be remembered, are related to them). However, if the SRQPC Score is used to

measure statistical reasoning, it is a useful predictor of the final exam score even in

the presence of other variables which are related to it. Using SRQPC, the best model

for Exam Score has an adjusted R-squared of 32% and significant predictor

variables: OP (p<0.001), Year (p<0.001), SRQPC (p=0.009), Gender (p=0.009),

Years Since School (p=0.015), Maths Student (p=0.015) and an interaction between

Maths Student and Year (p=0.011). The residuals for this model, shown in Figure

7.6, indicate no real concern. The fitted equation for this model is given by:

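Equation 7.8

\[
\begin{aligned}
E[\text{Exam Score}] ={} & \underset{(\text{s.e.}=11.04)}{43.38}
 + \underset{(\text{s.e.}=2.14)}{5.62} \times (\text{Gender} = \text{female}) \\
& + \underset{(\text{s.e.}=3.37)}{0.20} \times (\text{Maths Student} = 1;\ \text{Year} = 04) \\
& + \underset{(\text{s.e.}=2.50)}{2.33} \times (\text{Maths Student} = 0;\ \text{Year} = 05) \\
& + \underset{(\text{s.e.}=3.36)}{14.33} \times (\text{Maths Student} = 1;\ \text{Year} = 05) \\
& + \underset{(\text{s.e.}=0.19)}{0.50} \times \text{SRQPC}
 - \underset{(\text{s.e.}=0.37)}{2.43} \times \text{OP}
 + \underset{(\text{s.e.}=0.44)}{1.09} \times \text{Years Since School}.
\end{aligned}
\]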
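To make the structure of the fitted equation concrete, the sketch below evaluates it for a given profile. The coefficients are those printed in Equation 7.8; the function name, the argument names and the coding of Year as 2004 or 2005 are assumptions made for the illustration. It is intended as an aid to reading the equation, not as a tool for predicting individual results.

```python
def expected_exam_score(female, maths_student, year, srqpc, op,
                        years_since_school):
    """Evaluate the fitted equation for Exam Score (Equation 7.8)."""
    if year not in (2004, 2005):
        raise ValueError("the model covers the 2004 and 2005 cohorts only")
    # Maths Student x Year effects; non-maths students in 2004 are the
    # baseline group and so contribute nothing here.
    cell_effect = {(True, 2004): 0.20, (False, 2005): 2.33,
                   (True, 2005): 14.33}
    score = 43.38                                   # intercept
    score += 5.62 if female else 0.0                # Gender = female
    score += cell_effect.get((bool(maths_student), year), 0.0)
    score += 0.50 * srqpc - 2.43 * op + 1.09 * years_since_school
    return score
```

For instance, a female maths student in the 2005 cohort with SRQPC 50, OP 5 and no years since school has an expected score of 43.38 + 5.62 + 14.33 + 25.00 - 12.15 = 76.18.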




[Figure 7.6: four residual plot panels for the model "Exam on Gender, Maths Student, Year, SRQPC, OP, Years Since School, Maths Student x Year": a normal probability plot (Percent against Standardized Residual), Standardized Residual against Fitted Value, a histogram (Frequency against Standardized Residual), and Standardized Residual against Observation Order.]

Figure 7.6 Residual plots for the model in Equation 7.8 show some indication of

skewness.

Observed SRQPC Scores over the two cohorts range from 23 to 66, and hence the

predicted contribution to exam score from SRQPC ranges from 11.5 to 33. The

interaction between Year and Maths Student is a feature of the two cohorts which

needs to be allowed for in the model. Investigation of the exam scores and also of

marks for other assessment items indicates that in 2005 there was a significant group

of maths students who dominated the high achievers. Another feature of this model

is that unlike for the Numeracy and Statistical Reasoning Questionnaires, females

significantly outperform males. This is also the case for other assessment items and

is not uncommon in performance measures. However, it needs to be remembered

that at least some association exists between Gender, Maths Student and Year as

described in Section 2.3. It may be that part of the role of Gender in Equation 7.8 is

to adjust for the combined effects of Maths Student and Year. It must be emphasised

that none of these terms should be interpreted in isolation. Our main focus here is on

investigating the possible role of the SRQPC Score in explaining the exam score in

the presence of all other variables available to us.

The OP score is a measure of effort and commitment as well as ability, which are all

components of achievement in the end of semester exam. If OP is not included in

modelling Exam Score, but SRQPC is, then the best model involves SRQPC

(p<0.001), Gender (p=0.001), Maths Student (p<0.001), Year (p<0.001), Maths B

Result (p=0.014) and an interaction between Maths Student and Year (p=0.002), but



the adjusted R-squared falls to 20%. The Numeracy Score is not significant here in

the presence of SRQPC, but if SRQPC is forced out of the model, then a model

involving Gender (p<0.001), Year (p<0.001), Maths Student (p=0.001), Numeracy

(p=0.002), Maths B Result (p=0.003), Years Since School (p=0.030) and interactions

between Maths Student and Year (p=0.004) and Gender and Maths B Result

(p=0.013) also has an adjusted R-squared of 20%.
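The models compared above contain different numbers of terms, which is why adjusted R-squared rather than raw R-squared is quoted. The sketch below uses the standard adjustment formula; the sample size and number of terms in the usage note are hypothetical and are not the values from the thesis.

```python
def adjusted_r_squared(r_squared, n_obs, n_params):
    """Adjusted R-squared for a linear model.

    n_params counts the fitted terms excluding the intercept.
    """
    if n_obs - n_params - 1 <= 0:
        raise ValueError("not enough observations for this many parameters")
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_params - 1)
```

With hypothetical values, `adjusted_r_squared(0.5, 101, 10)` is about 0.444, showing how the adjustment penalises each additional term.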

The above results, taken in conjunction with the results of previous chapters, are of

great importance for statistical educators, statistical education researchers and

curriculum developers and advisors. We have seen that both introductory and

intermediate numeracy and mathematical backgrounds or inclinations are highly

significant in explaining SRQ and SRQPC Scores even though these involve no

questions usually regarded as requiring mathematical background.

This section has now investigated the end of semester exam component of the

assessment which represents the statistical operational knowledge and skills learnt

during the semester. The statistical operational knowledge and skills of this course

are those endorsed and emphasised by the statistical education reform movement,

focussing on data-driven statistical reasoning with negligible calculations,

manipulations or considerations of the “mathematical” type. However, not only do

mathematical background and skills contribute significantly to the

statistical understanding and operational knowledge learnt in this apparently non-

mathematical course, but we now also see that the SRQPC Score, even in the

presence of these significant mathematical variables, is important in explaining the

extent to which students develop statistical operational skills and understanding

during the semester. Neither the Statistical Reasoning Questionnaire nor the end of

semester exam involves overt mathematics. What is emerging throughout this

research and analysis of data is that the link between mathematical and statistical

thinking and skills goes beyond any specific numerate techniques that might appear

in basic statistics.



7.6 Discussion

This chapter has considered the relationships between statistical reasoning of

MAB101 students, their numeracy skills, attitudes towards statistics, mathematical

backgrounds and demographic variables, both at the beginning and end of the course.

At the beginning of the course statistical reasoning was measured using the

Statistical Reasoning Questionnaire developed in Section 5.2, while at the end of the

course, performance on the end of semester section of the assessment was used as an

indicator of statistical reasoning.

Use of the same SRQ instrument, or a comparable end of semester SRQ, was not

possible due to circumstances surrounding these two time points and the aims of the

study. The wide range of statistical backgrounds and experience with which students

enter the course necessitated the use of an instrument which is virtually technique

free at the beginning of the study. While comparisons may have been of interest had

a similar instrument been administered at the end of the course, such an instrument

would be perceived by students as irrelevant to their work on projects, assignments

and study towards the end of semester, and hence viewed as an imposition, resulting

in low student cooperation and even resentment.

In measuring statistical reasoning based on the SRQ, both the SRQ and SRQPC

Scores were used. The more symmetric distribution of the SRQ residuals made it

more appropriate for use in general linear models; however, models formed using the

SRQPC Score were virtually identical to those based on the SRQ Score.

The major implication of the modelling used to describe the statistical reasoning

scores of students at the beginning of MAB101 is the usefulness and importance of

the students’ Numeracy Scores, with higher Numeracy Scores corresponding to

higher SRQ Scores. This feature is such that numeracy dominates the prediction of

SRQ Score in nearly all models, whether or not the OP score is included.

Furthermore, when the Numeracy Score is broken into two measures, the

introductory and intermediate components, both of these together are needed to best

describe the SRQ Score. This is despite the lack of mathematical calculations

required in the SRQ.



When included in the model, the OP score is also highly significant. The negative

coefficient of OP is as expected since smaller OP scores indicate higher achievement.

With OP score acting as a general measure of ability, the important feature of this

model is the significance of Numeracy after allowing for OP.

The linear models developed indicate that there is a positive effect on SRQ Scores

from the number of years since leaving school in addition to the effect of numeracy

and OP scores. Investigation shows that this is most likely acting to correct the

effect of larger OP scores amongst older students. While these older students do not

in general have higher SRQ Scores, such a correction essentially extends the

applicability of the model to include these students.

With numeracy, OP and the number of years since school allowed for, none of the

other variables is helpful in explaining statistical reasoning. This includes the

various aspects of attitudes towards statistics described in Chapter 3, as well as

mathematical background and demographic variables. (It should be remembered,

however, that self-efficacy, gender and other background variables are useful

predictors of numeracy. See Section 4.3.4.) Without allowing for OP, in addition to

numeracy, self-efficacy, Maths B result and possibly gender further help to explain

statistical reasoning.

Examining the relationship between Numeracy Scores and individual items on the

Statistical Reasoning Questionnaire indicates that the link between numeracy and

statistical reasoning is not restricted to items which are more difficult or more

obviously mathematical by nature. In particular, the successful use of even a basic

level of proportional reasoning appears to be related to the level of basic numeracy

skills.

The end of semester component of the assessment in MAB101 measures the

statistical operational knowledge and understanding developed during the semester in

a data-driven, non-mathematical course strongly aligned with the principles and

practices of the statistical education reform movement. The effect of the OP score on

the marks on the end of semester component of the assessment can most probably be

explained by the impact on both these measures of a combination of effort,



commitment and ability. With OP in the model, the number of years since school is

significant, possibly again correcting for the effect of OP.

Using the SRQPC Score to describe statistical reasoning helps to explain the exam

score in a way that the SRQ Score does not. It is possible that a contribution to this

is that there is a slightly lower correlation between SRQPC and OP than between

SRQ and OP. It is of interest that the use of levels of partial understanding in

formulating the SRQPC helps to explain marks in assessment that bring together the

operational knowledge and interpretation developed over the semester. That is, the

partial understanding measured by the SRQPC Score appears to provide a measure of

potential to succeed in development within the course.

There is also a significant effect of gender on exam mark. However, unlike in the

Numeracy Scores where males outperform females, in exam marks females

outperform males.

In the presence of other predictors which themselves depend on the Numeracy Score

(as well as an interaction between year and maths students which is a feature of the

two cohorts) the Numeracy Score is not additionally significant in predicting the

exam mark. This is perhaps not surprising given its strong influence on SRQ and

SRQPC. This remains the case even if OP is not included in the model. However,

without OP and without the benefit of the SRQPC Score, the Numeracy Score is

again a highly significant predictor of statistical skills and understanding.

In investigating exam scores, we are automatically excluding students who have

withdrawn from, or failed to complete, the course. Comparisons between students

who did and did not sit the exam indicate that those who did not sit it have lower scores

at the beginning of semester in statistical reasoning and both introductory and

intermediate levels of numeracy. Notably, however, there is no significant difference

in OP scores between these two groups. Also, the only background variable that is

significantly related to sitting the end of semester exam is the Maths B result.

Students who have failed or not completed Maths B are more likely to not complete

MAB101 than those who have successfully completed Maths B.



Hence, despite the dominance of OP score in predicting course outcome as measured

by the end of semester section of the assessment, care needs to be taken to account

for other features. In particular, it must be noted that students who enter MAB101

with lower levels of numeracy and statistical reasoning as well as those without

Maths B, may need extra support to persist to the end of the course.

Chapter 8 Implications



8.1 Introduction

This chapter concludes the thesis with the implications of this research. Section 8.2

considers the key findings within the research and in so doing summarises the work.

Some aspects of the extent and limits of the study are discussed in Section 8.2.1.

Section 8.3 discusses implications of the work for teaching, assessment and advising

students. The thesis concludes with Section 8.4 which considers possibilities for

further research.

8.2 Implications within the Research

The focus on the nature and promotion of statistical literacy, reasoning and thinking

in the statistics education literature, and on the differences between statistics and

mathematics amongst some members of the statistical community, has at times

resulted in the devaluing of the role of mathematics and mathematical thinking in the

development of statistical reasoning. In contrast to this, there is a general

acknowledgement that a lack of capability with underlying mathematics interferes

with statistical learning (Ben-Zvi and Garfield, 2004). However, there has been a

paucity of research underpinning this “truth” particularly amongst students in

programs associated with science. Through investigating statistical thinking at the

interface of secondary and tertiary education and the factors which impact upon it,

the major implication of this work is that basic numeracy skill is a highly important

predictor of statistical reasoning and that this relationship extends to areas which are

not overtly mathematical in nature. In particular, this has been demonstrated through

the significance of numeracy in predicting statistical reasoning for students entering

an introductory data analysis subject in a program associated with science.



In Chapter 7 of this thesis, it has been shown that this significance is such that

numeracy dominates the prediction of statistical reasoning in nearly all models,

whether or not allowance is made for the students’ tertiary entrance scores.

Furthermore, when the Numeracy Score is divided into an introductory and an

intermediate score, both of these together are useful in predicting statistical

reasoning. The importance of this finding cannot be overstated. While most

researchers in statistical education would acknowledge the importance of

introductory numeracy skills in developing statistical reasoning, some might question

the relevance of the more algebraic skills reflected in the intermediate component of

the Numeracy Questionnaire. However, this importance is clearly demonstrated in

the analyses of Chapter 7. Furthermore, investigation of success or failure in

individual statistical reasoning items indicates that the strength of the link between

statistical reasoning and numeracy is not dependent upon the difficulty of the item or

on the degree of mathematical ability apparent in the item. As is often the case with

mathematics within other disciplines, the generic skills of mathematics are at least as

important as the specific skills.

The SRA was developed to assess statistical reasoning at the secondary/tertiary

interface and to determine the success of a new high school statistics course in the

US (Konold, 1990; Garfield, 1991, 2003). Our belief that some aspects of the SRA,

particularly an emphasis on combinatorial reasoning, make it inappropriate for use in

totality in the current Australian context, has led us to develop a new Statistical

Reasoning Questionnaire. This SRQ draws on the SRA and on substantial work of

Watson (Watson and Callingham, 2003; Watson, et al., 2003) in the Australian

primary and junior secondary school context.

Tests of validity and reliability of the SRA (Garfield, 1998) and low correlations

with course outcomes (Garfield, 2003) have led to questioning over its use as a single

measure of statistical reasoning and instead to a reliance on a number of component

scales of correct reasoning and misconceptions (Tempelaar, 2004). However,

Watson and Callingham’s (2003) application of a Rasch partial credit model

demonstrates that statistical literacy is a one-dimensional hierarchical construct. The

implication of this is that it can validly be described by a single measure. In Chapter

6 of this thesis, a Rasch partial credit model is applied to the SRQ. The fit of the



model supports Watson and Callingham’s findings and indicates that levels of

reasoning can be described and are demonstrated by students at the

secondary/tertiary interface in a manner similar to Watson and Callingham’s

description at the upper primary and junior secondary level.
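For readers unfamiliar with the partial credit model, the sketch below computes the category probabilities for a single item in the usual Rasch partial credit form, in which the log-odds of each successive step depend on the difference between the person's ability and that step's difficulty. The function and parameter names are illustrative, and the parameterisation actually used in Chapter 6 is not reproduced in this section, so the code should be read as the textbook form of the model rather than as the thesis's implementation.

```python
from math import exp


def pcm_category_probabilities(theta, step_difficulties):
    """Probabilities of scoring 0..m on one item under the partial credit model.

    theta             -- person ability (logits)
    step_difficulties -- difficulties delta_1..delta_m of the m steps
    """
    # Cumulative sums of (theta - delta_k); the empty sum for score 0 is 0.
    cumulative = [0.0]
    for delta in step_difficulties:
        cumulative.append(cumulative[-1] + (theta - delta))
    weights = [exp(c) for c in cumulative]
    total = sum(weights)
    return [w / total for w in weights]
```

When ability equals every step difficulty, all categories are equally likely; as ability rises above the step difficulties, probability shifts towards the higher response levels.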

The consistent and symmetric distribution of results of the SRQ, together with the

results of the Rasch analyses, provide evidence in Chapters 5 and 6 that the SRQ is

an appropriate tool for use in measuring general statistical reasoning at the

secondary/tertiary interface in the Australian context, while leaving room for future

development of items that reach into the higher levels of statistical reasoning

demonstrated by some students at this level.

The SRQPC Score, based on the results of the Rasch partial credit analysis, is

developed in Chapter 6 as a new approach to scoring a statistical reasoning

questionnaire. In calculating the SRQPC Score, the level of understanding

demonstrated by an item response, as defined through the Rasch partial credit

analysis, is used as the score for that item. By incorporating students’ levels of

understanding, this score allows for a more meaningful description of the range of

student reasoning. The significance of this score rather than the dichotomously-

based SRQ Score in analysis of the end of semester exam component of assessment

emphasises the fact that this scoring technique is a useful development in itself.
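The contrast between the two scores can be sketched as follows. The sketch assumes that the SRQPC Score is the sum of the fitted response levels across items and that the dichotomous SRQ Score counts the items answered at the top level; both are assumptions made for illustration, since the exact scoring rules are set out in Chapter 6 rather than here. The function and argument names are hypothetical.

```python
def srq_scores(item_levels, top_levels):
    """Return (dichotomous score, partial credit score) for one student.

    item_levels -- fitted response level achieved on each item
    top_levels  -- level corresponding to a fully correct response per item
    """
    if len(item_levels) != len(top_levels):
        raise ValueError("one top level is needed for every item")
    # Dichotomous scoring: credit only for fully correct responses.
    dichotomous = sum(1 for level, top in zip(item_levels, top_levels)
                      if level == top)
    # Partial credit scoring: every level of understanding contributes.
    partial_credit = sum(item_levels)
    return dichotomous, partial_credit
```

Two students with the same dichotomous score can therefore differ in partial credit score when one of them shows more partial understanding on the items answered incorrectly.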

The Rasch partial credit model has also been used to estimate the expected responses

of students to individual items on the SRQ which have then been compared for

students with different mathematical backgrounds. Combining this information with

the relationship between numeracy and success in individual items of the SRQ

particularly brings to light the difficulty which students with poorer mathematical

skills experience in successfully applying proportional reasoning in a statistical

context.

In Chapter 7 the end of semester section of the course assessment is used as a

representation of individual learning in the course. Although worth 50 to 60% of the

overall assessment, performance in the exam provides a measure of overall learning

in the course, including practical and project work. Tertiary entrance score and the

newly introduced SRQPC Score are both highly significant in explaining exam



scores. While, in the presence of these predictors, numeracy is not helpful in

explaining exam score for those who sat the exam, those who did not sit the exam

tend to demonstrate lower numeracy than those who did. Also, students without the

core senior algebra and calculus based mathematics course, Maths B, tend to be more

likely not to sit the exam. The implications of this for teaching at the

secondary/tertiary interface are raised in Section 8.3.

In development of the Statistical Reasoning Assessment, Garfield (2003) comments

that the correlations between SRA score and a range of course outcomes were

“extremely low”, and concludes that statistical reasoning and performance in an

introductory statistics course are “unrelated”. However, in this study, all two-way

correlations between the variables: Exam, SRQ, SRQPC, Intro-Numeracy, Inter-

Numeracy, Self-efficacy and OP, were highly significant (p<0.001) with the

correlations between Exam and each of the statistical reasoning scores, and between

Exam and each of the Numeracy Scores falling between 0.2 and 0.3, and the

correlation between Exam and OP being -0.47. (Recall that numerically smaller OP

scores indicate a better performance.) Considering two-way correlations in

investigating student performance data, particularly at tertiary level, requires caution

with a number of aspects. Correlations, even partial correlations, cannot handle the

complexity of the bigger picture. The general linear modelling of Chapter 7

investigates all available variables simultaneously, and indicates that the SRQPC

Score helps to explain the exam mark after allowing for the tertiary entrance score

(assumed to be measuring general ability and application). As with R-squared in

general linear modelling, the actual sizes of correlations depend on the amount of

natural or inherent variation present in the variables under consideration. All tertiary

level educational data involve great inherent variation due to the multiplicity of

issues involved as well as characteristics and backgrounds of individuals, particularly

in large introductory statistics classes. The focus of this research lies not in

prediction or analysis of the many and varied individuals who undertake introductory

tertiary statistics courses. Rather it is in measuring, unpacking and understanding the

statistical reasoning and numeracy which students bring to such courses (in areas

associated with science), and in investigating the inter-related links between these, all

background variables available and student performance.



The Numeracy Questionnaire, described and analysed in Chapter 4, has been shown

to be an appropriate tool for measuring pre-calculus skills relevant to an introductory

data analysis course at the secondary/tertiary interface. Performances in this

questionnaire demonstrate that many students struggle with pre-calculus skills,

despite having undertaken Maths B. The results of general linear modelling of

Numeracy Scores indicate the significance of undertaking mathematics beyond

Maths B, as well as performance in Maths B, self-efficacy and gender in predicting

Numeracy Score. Logistic regression indicates that success with inequalities and

application of fractions are particularly dependent on having studied beyond Maths

B. The inability to apply fractions is especially relevant to the development of

statistical thinking, where it can impede full and complete proportional reasoning, as

indicated in the analysis of the SRQ results. It is important to note that these pre-

calculus skills are supposed to be developed before senior secondary schooling. The

implications are that the development is so insufficient that without Maths B students

have very little numeracy skill and that even more than Maths B is required to ensure

complete confidence with essential basic numeracy.

Results of the Attitudinal Surveys, described in Chapter 3, indicate that the students

on whom this research is based arrive at the secondary/tertiary interface with

generally positive attitudes towards statistics, convinced of the links that exist

between mathematics and statistics and confident of their own ability to succeed in

the area. Many believe, however, that the study of statistics at school has been

beneficial only in the final two years of high-school. By investigating the effect of a

variety of attitudinal components on statistical reasoning in Chapter 7, only the self-

efficacy component has been shown to be influential. Self-efficacy is a useful

predictor of numeracy allowing for all of gender, previous study of higher level

mathematics, Maths B result, tertiary entrance score and classification as a

mathematics student; and also of statistical reasoning allowing for numeracy and

gender. These models indicate that even after taking account of background and

ability factors, a student’s confidence in their own ability to succeed in mathematics

and statistics is positively related to their ability to do so. However, there are also

indications that students discover at tertiary level that statistics is more complex than

they had thought.



8.2.1 Extent and limits of the study

This study has been conducted in the context of students at the secondary/tertiary

interface enrolled in programs associated with science. To derive implications of any

study beyond its immediate context requires sufficient information for benchmarking

across contexts. Although the cohorts of this study are enrolled in degree programs

broadly associated with science, the background information demonstrates the wide

diversity of students in the cohorts with respect to a range of variables. One

classification of the students, for example, is into maths and non-maths students, but

it must be remembered that this refers to choice of course and not just (or even

necessarily!) mathematical ability. Mathematically-capable students are found

across all tertiary programs including humanities, law and social sciences.

Correspondingly there are significant numbers of students in science programs who

are mathematically-averse and highly apprehensive about statistics. Hence, with due

care, the main findings of the study, described in Section 8.2, may be applicable in

more general settings.

The numeracy questionnaire reflects relevant aspects of the pre-senior curricula

across Australia, but the questions themselves and the analysis of the questionnaire

provide excellent benchmarking references for other contexts. The development of

the statistical reasoning questionnaire drew upon international studies at the

school/tertiary interface in the US and across school levels in Australia, plus

considerations of particular relevance to the Australian school context. Decisions

about inclusion of questions common to these international studies were based on the

balance of value of the questions themselves and of providing commonality in

national and international comparisons. Within Australia, although there are some

differences in the approach to statistics within the senior school curricula, most

notably between New South Wales[11] and the other states, the curriculum-specific

questions refer to common curriculum elements. It is felt that the effectiveness of the

Attitudinal Survey is uncertain and that further exploration of attitudes towards

statistics, particularly amongst more mathematically inclined students, could be

valuable.

11 Particularly at the senior school level, New South Wales has not yet adopted a data-based approach to statistical concepts within its mathematics syllabi.



The general linear modelling used in Chapters 4 and 7 needs to be interpreted in the

manner in which it was intended. Although the term “predictor” is used, and fitted

equations are given, there is no intention that these relationships be used to predict

outcomes for individual student profiles. The amount of variation left unexplained in

the data after models are fitted is indicative of the inherent variation among students

which prevents such prediction. The purpose of the modelling is to investigate and

unpack the relationships between statistical reasoning and numeracy as well as

background variables and student performance. The only instance where the

predictive nature of models may be of use is to highlight students who achieve

significantly lower results in numeracy or statistical reasoning than their

backgrounds would predict, and to recommend such students for future support.
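
The screening use described above can be illustrated with a minimal sketch. It assumes a single hypothetical background score as the only predictor, fits a straight line by ordinary least squares, and flags students whose result falls well below the fitted value. The data, the function names and the cut-off of -1.5 residual standard deviations are illustrative assumptions only; they are not the models, variables or thresholds used in this thesis.

```python
from statistics import mean, stdev


def fit_line(x, y):
    """Ordinary least squares with one predictor: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def flag_low_achievers(x, y, threshold=-1.5):
    """Indices of students whose standardised residual is below the threshold."""
    intercept, slope = fit_line(x, y)
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    # Rough standardisation only; a formal analysis would use the residual
    # standard error with n - 2 degrees of freedom.
    scale = stdev(residuals)
    return [i for i, r in enumerate(residuals) if r / scale < threshold]


# Hypothetical data: a background score and a numeracy score for eight students.
background = [1, 2, 3, 4, 5, 6, 7, 8]
numeracy = [10, 12, 13, 15, 9, 19, 20, 22]
flagged = flag_low_achievers(background, numeracy)
print(flagged)
```

With these made-up scores only the fifth student (index 4) is flagged. The point of the sketch is the direction of use: the fitted relationship identifies unexpectedly low results for follow-up support, rather than predicting an individual's outcome.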

The variation between students also raises the issue of other variables not included in

this study which could be measured and included in the modelling. One of these is

student attendance at classes. Further work on measuring and including this variable

is currently being undertaken.

Due to practical and ethical considerations of students, the study did not include the

development of an instrument to measure general statistical reasoning at the end of

the semester. The end of semester component of assessment is a measure of the

operational knowledge attained by the student throughout the course. It should be

noted that students prepare and take their own summaries into this exam, and that the

time provided is such that all students finish within, mostly well within, the time

given. The analysis of the end of semester component of the assessment is oriented to

investigating the links between the measures provided by the instruments, the

background variables and a measure of knowledge, understanding and application

ability in introductory statistical data analysis. A better measure of the full benefit to

the student of the course would be the overall final result that incorporates the project

mark and continuous assessment. However, this result is likely to be more heavily

influenced by other variables not included in this study, such as class attendance and

participation.



8.3 Implications for Teaching, Assessment and Advising

In teaching at the secondary/tertiary interface, educators need to be aware of and

sensitive to the level of numeracy skills which students are likely to possess. This

involves an understanding of the manner in which difficulties are likely to be

exhibited in areas which stem from upper-primary and lower-secondary levels even

amongst students who have completed an algebra and calculus based senior

mathematics subject. This lack of solid grounding is particularly likely to be

demonstrated when basic skills are applied in new contexts and multi-step situations.

Already many universities and departments within universities provide support for

students whose basic quantitative skills do not meet the needs of their course. This

research reinforces the need for such support mechanisms and demonstrates their

requirement regardless of whether or not the students in question officially possess

the formal prerequisites. Clearly demonstrated in this research is the role of further

study in reinforcing basic skills. Educators at the secondary/tertiary interface, while

desiring to move students on to applications and higher levels of understanding, may

well benefit their students by intentionally providing opportunities to reinforce basic

skills in new contexts rather than avoiding aspects which are known to cause

problems.

This research brings into question the tendency in universities to remove formal

mathematical prerequisites. Pressure on high-school curricula and the lack of

appreciation in the general community of the generic skills developed through

mathematics result in fewer students undertaking algebra and calculus based

mathematics courses when they are not formally required for further study. This

research indicates that such students are in danger of not developing even their basic

numeracy skills due to lack of reinforcement.

This tendency to remove an algebra and calculus based formal prerequisite is

sometimes supported by tertiary educators who, on experiencing the lack of ability

by some who have studied such a course, deem it to have been of no use. What these

educators are failing to recognise is that, firstly, the problems are often founded in

inadequate grounding in earlier years and, secondly, the only way to maximise



students’ abilities is to ensure they experience as much reinforcement as possible

through the highest level of mathematics they are capable of studying.

Those who advise students also need to be aware of the impact of studying

mathematics and higher level mathematics in particular. The improvement in basic

numeracy skills, ability to handle multi-step problems, and the transferability of

skills across contexts reaches well beyond the specific content of any particular

mathematics course. This research suggests that these capabilities are all maximised

by studying more mathematics.

This research has implications for the teaching of introductory statistics in a course

broadly associated with science at the secondary/tertiary interface. Given the

strength of the relationship between statistical reasoning and numeracy, the clear

recognition of this link in students’ minds, and the students’ general confidence in

their own ability in mathematics and statistics, it is reasonable to expect that

attempting to teach an introductory data analysis course in this context, without

sufficient acknowledgement of its mathematical links, is likely to be counter-

productive. A better approach is to acknowledge and build on a simple but distinctly

quantitative foundation.

Course outcomes in this study imply that students who enter the course without the

mathematical background of an algebra and calculus based course are no less likely

than others to succeed provided they persist until the end of the course. This

observation, however, is limited to those who do persist and therefore should not be

taken as reason to encourage more students of lesser mathematical background to

undertake the course. The higher attrition rate of these students, the observation of

the difficulties which they have with all areas of the course and the advantages of

consolidation, clearly imply that wherever possible such students should be

encouraged to obtain an algebra and calculus based grounding before encountering

introductory data analysis.

In some sectors of the statistical community, there has been an argument put forward

to move the teaching of statistics from the school mathematics curricula into another

discipline area, such as geography, or into a new discipline area of its own. This

discussion is generally based on the perceived inability of school mathematics



teachers to teach statistical thinking. These statisticians have succumbed to the

danger of focusing on the differences between mathematics and statistics rather than

their commonalities. This research indicates the folly of such an approach as it

would result in the teaching of statistics at school level becoming de-quantified,

contrary to the relationship indicated in this research. Also the forced choice these

students would undoubtedly have to make, away from any higher level mathematics

and often any core algebra and calculus based mathematics, would deprive them of

any opportunity to consolidate basic numeracy skills. While inadequacies with

current school curricula and their teaching are acknowledged, and indicated in this

research through the fact that a majority of students felt statistics at school level had

been beneficial only in the final two years, such an approach is unlikely to have a

productive impact on the statistical thinking of students at the secondary/tertiary

interface. A better approach rests in increased and improved teaching of statistical

thinking to current and prospective school teachers. The statistical community

would do better to focus on supporting mathematics teachers in their professional

development, rather than encouraging the artificial removal of statistics from their

discipline.

The effectiveness of the tools used in this research has implications for assessment of

students at the secondary/tertiary interface. Conflict between the need for task-

specificity and familiarity to students has led in the past to doubt over the validity of

measures of self-efficacy in statistics (Finney and Schraw, 2003). The significance

of self-efficacy in predicting numeracy and statistical reasoning in this study suggests

that the tool which has been used here is a valid measure of self-efficacy in

mathematics and statistics at this level. This measure consists of only five items

which were considered both relevant to the course and within the students’ previous

experience. Its usefulness suggests that if desired, the development of a more

detailed measure would not be difficult, but that even this simple tool can be used to

glean useful information. Furthermore, the lack of significance of other components

of student attitude indicates that questionnaires aimed at measuring attitudes towards

statistics could be made more informative by focussing specifically on self-efficacy.

Refining and adding items which measure this dimension and help to explain its impact



on performance is likely to be more productive than developing measures of general

attitudes.

The work of Watson and Callingham (2003) indicates that it is valid to measure

statistical reasoning of school students on a single hierarchical scale without

resorting to a collection of subscales. This study supports the use of such a scale at

the secondary/tertiary interface in the form of the Statistical Reasoning

Questionnaire. While there is evidence that items reaching into higher levels of

statistical reasoning should be developed to measure the full range of ability, the

current collection of items is appropriate as a screening tool at this level.

Furthermore, summarising the results of the Statistical Reasoning Questionnaire in

the form of the SRQPC Score incorporates students’ levels of understanding and

allows for a more meaningful expression of the range of statistical ability.

8.4 Implications for Future Research

The main emphasis of future research indicated by this work is in the area of the

assessment of statistical reasoning. Although much work has been done in this area,

it is felt that there is substantial room for the development of further items

appropriate for use at the secondary/tertiary interface. The analysis of the SRQ in

this thesis indicates that more items could be developed which reach into the higher

levels demonstrated by some students in this study. A particular challenge is the

problem of how to formulate such items without relying on specific terminology or

mathematical ability. In fact, whether it is possible or desirable to do so is perhaps

doubtful, given that understanding statistical terminology is an aspect of statistical

reasoning, and given the relationship that exists between statistical reasoning and

numeracy. A better approach may be to focus on developing items which incorporate

these aspects separately.

A further challenge is the assessment of general statistical reasoning at the

conclusion of an introductory data analysis course. This issue involves both the

construction of an appropriate instrument and its administration to students. Given

the students’ common background from the course, items for such an instrument



would be able to incorporate more statistical terminology, as well as applications of

specific techniques. Although it has been done in other studies, the use of such a

research instrument at the end of semester was seen as counterproductive for students

in this study. Hence the administration of an instrument may be more difficult than

its design. It could also be argued that the real measure of course success is the level

of statistical reasoning and use of statistics with which students operate at some time

after completion of the course. A study which surveyed students’ reasoning some

months after conclusion of the introductory data analysis course could provide more

information, but would be difficult to administer and would incur severe disadvantages in

the area of rates and possible bias of response.

The Rasch analyses used in this study have shown that, at the point of entry to

tertiary education, statistical reasoning appears to be a single construct. In

developing an instrument to measure statistical reasoning at the end of an

introductory course and beyond, it would be interesting to determine whether

statistical reasoning could still be considered as a single construct given the broader

range of topic areas which would need to be included.

In developing the SRQPC Score the results of the Rasch partial credit model were

used. Another approach to using this model, which has been used by Reading

(2002), is to analyse students’ ability estimates on the logistic scale. These scores

could be modelled in a similar way to the SRQPC Scores. There are also

possibilities for future work in synthesising the information obtained from the

students in this study with Reading’s development of a profile of statistical

understanding.
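
For readers unfamiliar with the logistic scale referred to above, the following sketch shows how the partial credit model, in its standard form, links an ability estimate to the probability of each score category on a single item. The ability value and step difficulties are hypothetical and chosen only for illustration; they are not estimates from the SRQ analysis.

```python
import math


def pcm_probabilities(theta, thresholds):
    """Category probabilities for one item under the Rasch partial credit model.

    theta: person ability on the logit (logistic) scale.
    thresholds: step difficulties delta_1..delta_m for an item scored 0..m.
    """
    # Cumulative sums of (theta - delta_k); the empty sum for score 0 is 0.
    exponents = [0.0]
    for delta in thresholds:
        exponents.append(exponents[-1] + (theta - delta))
    weights = [math.exp(e) for e in exponents]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item scored 0, 1 or 2, and a hypothetical ability of 0.5 logits.
probs = pcm_probabilities(theta=0.5, thresholds=[-1.0, 1.0])
expected_score = sum(score * p for score, p in enumerate(probs))
print(probs, expected_score)
```

When an item has a single step, the expression reduces to the dichotomous Rasch model, so an ability equal to the step difficulty gives a probability of one half for each score.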

A separate idea to pursue is in the area of students’ attitudes. The discrepancy which

was discussed in Section 3.4 between the students’ approval of the course and their

apparent decline in attitudes towards statistics, raises questions about the perspective

with which students arrive at tertiary level. Informal discussion with students

suggests that it is only as the course proceeds that they begin to appreciate the

breadth and depth of the discipline.



It is to the advantage of us all if more of the best and brightest come into statistics

education systems and leave them better educated, with better developed thinking skills,

and a greater appreciation for the power of statistics. (Wild, 2006)

Further research into students’ initial and developing attitudes and how to encourage

their appreciation of the field may assist in encouraging good students into statistics -

a goal which is worth pursuing.

“When I grow up I want to be a statistician!” It would be nice but it just doesn’t

happen, does it? The most critically important element for the future health of our

discipline is an ample supply of very bright young minds entering it. (Wild, 2006)


Appendix A. Background Information Survey

Survey of Students – Semester 1 2005 MAB101 Statistical Data Analysis 1

Name (underline surname): …………………………………………………………

Title (Mr, Ms, Miss, Mrs, other): ………….

Student Number: ……..…………

Course Code: ……………………..

Major: ………………......………

(Education students) Discipline X: ………….… Discipline Y: ………...…………

Schooling completed: grade 12 / up to grade 11 only / up to grade 10 only

Year in which you finished school: …….

Where: Qld / NSW / Other Aust. state …… / Overseas

School: ………………………………………………....…………………………

School results: OP or Tertiary Entrance Rank: ……….

Maths subjects studied in grade 12        Maths subjects studied in grade 11
Qld Maths A          Result ………        Qld Maths A          Result ………
Qld Maths B          Result ………        Qld Maths B          Result ………
Qld Maths C          Result ………        Qld Maths C          Result ………
Other: ……........   Result ………        ………........………   Result ………
………………............   Result ………        ……………........…   Result ………

or Alternative Entry

MAB105 at QUT Result ………

QUT Maths bridging course Result ………

Other ……………….……….. Result ………

Is this your 1st semester at QUT yes no

QUT to date (if applicable) Maths subjects studied

…………..…….. …………..……..

Have you studied other maths or statistics subjects at any tertiary institution? Maths subjects studied Result Year ……………………………… …….. ……....


Appendix B. 2004 Attitudinal Survey

Survey of Attitudes About Statistics

DIRECTIONS: The questions below are designed to research your attitudes and beliefs about statistics. Read each statement carefully and choose your response from A (strongly disagree) through to E (strongly agree). If your response is “don’t know”, choose C for Neutral. Please mark your response on the mark sheet provided. To facilitate data matching, please record your student number on the mark sheet and (should you use it) on the reverse side of this page. Names should not be included. If you wish to elaborate on any of your responses, then there is space over the page to do so. (For example, you may wish to refer to various stages of education.)

Response scale: A = Strongly Disagree, B = Disagree, C = Neutral, D = Agree, E = Strongly Agree

1. Statistics is boring. A B C D E

2. Statistics can be used to justify almost anything. A B C D E

3. I find statistics easy. A B C D E

4. Statistics will be valuable in my chosen career. A B C D E

5. I don’t like statistics because there never seems to be a right or wrong answer.

A B C D E

6. I feel insecure when I have to do a statistics problem. A B C D E

7. Statistics is a complicated subject. A B C D E

8. Statistical skills will make me more employable. A B C D E

9. I use statistics in my everyday life. A B C D E

10. I would do better at statistics if I were better at maths. A B C D E

11. I want to learn more statistics. A B C D E

12. Understanding statistics is important in modern society. A B C D E

13. If you are good at maths you are more likely to understand basic statistical concepts.

A B C D E

14. I am taking this statistics unit only because I have to. A B C D E

15. I am good at maths. A B C D E

16. I expect to do well in this unit. A B C D E

17. I am not confident of my ability to read and interpret information presented graphically.

A B C D E

18. I expect to be able to do the computing necessary for this unit.

A B C D E

19. I expect to have trouble determining which procedure to use to answer questions.

A B C D E


This side is only for optional comments.

Student Number:

You may wish to explain or elaborate on some of your responses. Use the space provided below to do so for as many or as few statements as you wish.

1. Statistics is boring.

2. Statistics can be used to justify almost anything.

3. I find statistics easy.

4. Statistics will be valuable in my chosen career.

5. I don’t like statistics because there never seems to be a right or wrong answer.

6. I feel insecure when I have to do a statistics problem.

7. Statistics is a complicated subject.

8. Statistical skills will make me more employable.

9. I use statistics in my everyday life.

10. I would do better at statistics if I were better at maths.

11. I want to learn more statistics.

12. Understanding statistics is important in modern society.

13. If you are good at maths you are more likely to understand basic statistical concepts.

14. I am taking this statistics unit only because I have to.

15. I am good at maths.

16. I expect to do well in this unit.

17. I am not confident of my ability to read and interpret information presented graphically.

18. I expect to be able to do the computing necessary for this unit.

19. I expect to have trouble determining which procedure to use to answer questions.


Appendix C. 2004 Follow-up Attitudinal Survey

Survey of Attitudes About Statistics

DIRECTIONS: The questions below are designed to research your attitudes and beliefs about statistics. Read each statement carefully and choose your response from A (strongly disagree) through to E (strongly agree). If your response is “don’t know”, choose C for Neutral. Please mark your response on the mark sheet provided. To facilitate data matching, please record your student number on the mark sheet. Names should not be included.

Response scale: A = Strongly Disagree, B = Disagree, C = Neutral, D = Agree, E = Strongly Agree

1. Statistics is boring. A B C D E

2. Statistics can be used to justify almost anything. A B C D E

3. I find statistics easy. A B C D E

4. Statistics will be valuable in my chosen career. A B C D E

5. I don’t like statistics because there never seems to be a right or wrong answer.

A B C D E

6. I feel insecure when I have to do a statistics problem. A B C D E

7. Statistics is a complicated subject. A B C D E

8. Statistical skills will make me more employable. A B C D E

9. I use statistics in my everyday life. A B C D E

10. I would do better at statistics if I were better at maths. A B C D E

11. I want to learn more statistics. A B C D E

12. Understanding statistics is important in modern society. A B C D E

13. If you are good at maths you are more likely to understand basic statistical concepts.

A B C D E





A B C D E

18. I have been able to do the computing necessary for this unit.

A B C D E

19. When I use statistics in the future, I expect that I will have trouble determining which procedure to use.

A B C D E


Appendix D. 2005 Attitudinal Survey

Your feelings about Statistics

Complete this statement: When I think of probability and statistics at school, I think of

___________________________________________________________________

___________________________________________________________________

___________________________________________________________________

Did you find probability and statistics (sometimes called chance and data) beneficial

in grades 11&12 yes/no

in grades 8 to 10 yes/no

in grades 4 to 7 yes/no

What did you or didn’t you find beneficial about it?

___________________________________________________________________

___________________________________________________________________

___________________________________________________________________

Please respond to the following questions on the computer mark sheet provided. Read each statement carefully and choose your response from A (strongly agree) through to E (strongly disagree). If your response is “don’t know”, choose C for Neutral. To facilitate data matching, please record your student number on the mark sheet.

Response scale: A = Strongly Agree, B = Agree, C = Neutral, D = Disagree, E = Strongly Disagree





A B C D E

5. I expect to be able to do the computing necessary for this unit.

A B C D E

6. I expect to have trouble determining which procedure to use to answer questions.

A B C D E


Appendix E. Numeracy Questionnaire

For each question, choose the correct answer and mark it on the answer sheet provided.

Please record your student number to facilitate data matching. Names should not be recorded.

Do not use calculators.

1. Written as a fraction in its simplest form, 20% is equal to:

(a) $\frac{1}{20}$ (d) $\frac{2}{5}$

(b) $\frac{20}{100}$ (e) $\frac{1}{2}$

(c) $\frac{1}{5}$

2. Written as a fraction in its simplest form, 0.1% is equal to:

(a) $\frac{1}{1000}$ (d) $\frac{10}{100}$

(b) $\frac{1}{100}$ (e) $\frac{9}{10}$

(c) $\frac{1}{10}$

3. Written as a percentage correct to 1 decimal place, the fraction $\frac{1}{6}$ is equal to:

(a) 6.0% (d) 60.0%

(b) 12.5% (e) 66.7%

(c) 16.7%

4. A class consists of 80 males and 120 females. A non-compulsory excursion is attended by 20% of the male students and 30% of the females. The percentage of the class which attends the excursion is:

(a) 25% (d) 52%

(b) 26% (e) 56%

(c) 28%



5. Possible subject grades at a particular institution are 1 to 7, with 7 being the highest. In a particular class of 200, 5% of students were given a 1 or a 2.

The number of students receiving a 1 or a 2 was:

(a) 5 (d) 57

(b) 10 (e) 100

(c) 40

6. In the same class (as in question 5), 85% of students were awarded a grade

from 3 to 6. The number of students receiving a grade of 7 was:

(a) 10 (d) 30

(b) 15 (e) 170

(c) 20

7. A group of 340 students must be divided into lab classes with a maximum of

30 students in each. The smallest number of lab classes needed is:

(a) 10 (d) 12

(b) 11 (e) 13

(c) 11.3

8. 0.66 + 0.55 is equal to:

(a) 0.1111 (d) 1.21

(b) 0.121 (e) 12.1

(c) 1.111

9. $\frac{1}{6} + \frac{1}{5}$ is equal to:

(a) $\frac{2}{30}$ (d) $\frac{11}{30}$

(b) $\frac{1}{11}$ (e) $\frac{5}{6}$

(c) $\frac{2}{11}$



10. $\frac{1}{4} + \frac{2}{3} + \frac{5}{6}$ is equal to:

(a) $\frac{8}{13}$ (d) $1\frac{5}{12}$

(b) $\frac{13}{72}$ (e) $1\frac{3}{4}$

(c) $\frac{35}{72}$

11. 4

2945 22 ×+× is equal to:

(a) 29 (d) 89

(b) 41 (e) 181

(c) 56

12. 41

91

+ is equal to:

(a) 51

(d) 613

(b) 131

(e) 65

(c) 52

13. $20 + 30 \div 5 \times 2$ is equal to:

(a) 5 (d) 32

(b) 20 (e) 52

(c) 23

14. When $n = 25$, $m = 13$, $s = 2$, $t = 4$, the expression $\sqrt{\frac{(n-1)s^2 + (m-1)t^2}{n+m-2}}$ (correct to 2 decimal places) is equal to:

(a) 2.83 (d) 6.93

(b) 3.94 (e) 11.31

(c) 6.00



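15. Given that $p_1 = \frac{x_1}{n_1}$, $p_2 = \frac{x_2}{n_2}$ and $p = \frac{x_1 + x_2}{n_1 + n_2}$, the value of $\frac{p_1 - p_2}{p(1 - p)}$, when $x_1 = 10$, $x_2 = 15$, $n_1 = 25$ and $n_2 = 75$, is given by:

(a) $\frac{1}{4}$ (d) $\frac{16}{15}$

(b) $\frac{4}{5}$ (e) $\frac{108}{77}$

(c) $\frac{5}{6}$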

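16. Which of the following sets of values is correctly ordered from smallest to largest?

(a) -0.05, $-\frac{1}{5}$, 0.05, 0.55, 0.5

(b) -0.05, $-\frac{1}{5}$, 0.05, 0.5, 0.55

(c) $-\frac{1}{5}$, -0.05, 0.05, 0.5, 0.55

(d) -0.05, 0.05, $-\frac{1}{5}$, 0.5, 0.55

(e) $-\frac{1}{5}$, -0.05, 0.05, 0.55, 0.5

17. Which of the following sets of values is correctly ordered from smallest to largest?

(a) $\frac{13}{20}$, $\frac{8}{7}$, $\frac{4}{6}$, $\frac{3}{5}$, $\frac{1}{3}$

(b) $\frac{8}{7}$, $\frac{4}{6}$, $\frac{13}{20}$, $\frac{3}{5}$, $\frac{1}{3}$

(c) $\frac{1}{3}$, $\frac{3}{5}$, $\frac{4}{6}$, $\frac{8}{7}$, $\frac{13}{20}$

(d) $\frac{1}{3}$, $\frac{3}{5}$, $\frac{13}{20}$, $\frac{4}{6}$, $\frac{8}{7}$

(e) $\frac{1}{3}$, $\frac{13}{20}$, $\frac{4}{6}$, $\frac{3}{5}$, $\frac{8}{7}$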

18. The solution for x to the equation: cxb

a=

− 1 is given by:

(a) cb

ax

1−= (d)

1+=

cba

x

(b) 1−

=abc

x (e) c

ba

x−

=1

(c) bc

ax

1−=



19. The solution to the inequality: $2a + 5 < 12$ is:

(a) $a > -\frac{7}{2}$ (d) $a < 1$

(b) $a < \frac{7}{2}$ (e) $a = \frac{7}{2}$

(c) $a = 3$

20. The solution to the inequality: $\frac{a}{48} < \frac{1}{4}$ is:

(a) $a < \frac{1}{12}$ (d) $a > 12$

(b) $a < 12$ (e) $a > 192$

(c) $a < 192$

21. The solution to the pair of inequalities $a + 7 < 16$ and $3a > 6$ is:

(a) $a = 3.75$ (d) $a < 9$

(b) $a = 8$ (e) $2 < a < 9$

(c) $2 > a < 9$


Appendix F. Statistical Reasoning Questionnaire12

SRQ_1 To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true:

a) Half of the families in the town have more than 2 children.

b) More families in the town have 3 children than have 2 children.

c) There are a total of 110 children in the town.

d) There are 2.2 children in the town for every adult.

e) The most common number of children in a family is 2.

f) None of the above.

SRQ_2 A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below.

6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3

The true weight could be estimated in several ways.

How would you estimate it?

SRQ_3 Box A and Box B are filled with red and blue marbles as follows.

[Figure: Box A contains 12 red and 8 blue marbles; Box B contains 30 red and 20 blue marbles.]

Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?

a) Box A (with 12 red and 8 blue).

b) Box B (with 30 red and 20 blue).

c) It doesn’t matter which box is chosen.

12 The Statistical Reasoning Questionnaire given here combines items over two years. SRQ_21 and SRQ_22 were not in the 2004 version, and SRQ_4 was not in 2005.




SRQ_4 A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?

a) Don’t use the medicine on your skin – there’s a good chance of getting a rash.

b) For application to the skin, apply only 15% of the recommended dose.

c) If you get a rash, it will probably involve only 15% of the skin.

d) About 15 out of every 100 people who use this medicine get a rash.

e) There is hardly any chance of getting a rash using this medicine.

SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of the following is least likely:

a) The man is over 55.

b) The man has had a heart attack.

c) The man is over 55 and has had a heart attack.

SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose?

a) Heads

b) Tails

c) It doesn’t matter

What is the probability of getting heads on this 5th toss?______

What is the probability of getting tails on this 5th toss?______

Note: assume only fair coins are used in tosses at cricket matches!

SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. She wants whichever car will break down the least. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs, so he sold it. He said he’d never buy another Honda. Which car should Mrs Jones buy?

a) Mrs Jones should buy the Toyota, because her friend had so much trouble with his Honda, while the other friends had no trouble with their Toyotas.

b) She should buy the Honda, because the information about break-downs in Consumer Reports is based on many cases, not just one or two cases.

c) It doesn’t matter which car she buys. Whichever type she gets, she could still be unlucky and get stuck with a particular car that would need a lot of repairs.




SRQ_9 A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.

SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days (that is, those days for which the forecast was stated as a 70% chance of rain), they compared the forecast with records of whether or not it actually rained.

The forecast of 70% chance of rain can be considered very accurate if it rained on:

a) 95% - 100% of those days

b) 85% - 94% of those days

c) 75% - 84% of those days

d) 65% - 74% of those days

e) 55% - 64% of those days

space for working



SRQ_11 A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results.

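Graph 1: [figure not reproduced; horizontal axis “YEARS IN TOWN” with labels 0, 1, 2, 3, 4, 5, 6, 10, 11, 12, 13, 14, 17, 25, 37]

Graph 2: [figure not reproduced; horizontal axis “YEARS IN TOWN” with labels 0, 5, 10, 15, 20, 25, 30, 35]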

Which of these two graphs (1 or 2) would you recommend the students use and why?

SRQ_12 Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs.

Graph A and Graph B: [figures not reproduced; both are titled “Internet Usage” and show the same data: contact friends 33%, study 22%, entertainment 17%, work 14%, information 6%, other 8%]

Which of the graphs, A or B, would you recommend the students use and why?



SRQ_13 Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?

a) Hospital A (with an average of 50 births a day).

b) Hospital B (with an average of 10 births a day).

c) The two hospitals are equally likely to record such an event.
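The role of sample size in SRQ_13 can be checked with a direct binomial calculation. The sketch below makes a simplifying assumption that is not in the item: each hospital is treated as having exactly its average number of births (50 and 10) on the day in question, with each birth independently a girl with probability 0.5. The function name is hypothetical.

```python
from math import comb


def prob_at_least(n, k_min, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_min, n + 1))


# 80% of 10 births is 8 girls; 80% of 50 births is 40 girls.
p_hospital_b = prob_at_least(10, 8)
p_hospital_a = prob_at_least(50, 40)

print(f"Hospital B (10 births): {p_hospital_b:.4f}")
print(f"Hospital A (50 births): {p_hospital_a:.6f}")
```

Under these assumptions the smaller hospital has the larger probability, because proportions from small samples vary more around 50%.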

SRQ_14 A local newspaper published the following article:

Family car is killing us, says researcher

Twenty years of research has convinced Mr Robinson that motoring is a health hazard. Studying figures from the Australian Bureau of Statistics, Mr Robinson has produced graphs which show quite dramatically that as the numbers of new vehicle registrations increase, so have the numbers of deaths due to heart-related causes.

Do you agree with Mr Robinson’s findings? (Please explain your response.)

SRQ_15 A music teacher was pleased to read the following article in a professional journal:

Instrumental Lessons Improve OPs.

Research has shown that learning to play a musical instrument during primary school increases a child’s chance of attaining a good OP. In a longitudinal study, students enrolled in primary schools across Queensland since 1990 have been followed until they graduated from high school. Of those students who were involved in an instrument program, 20% obtained an OP of 5 or better, while the figure was 15% for those not involved in instrumental music.

Do you agree with these research findings? (Please explain your response.)



SRQ_16 A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups.

Age group    Percentage of users
0-10          5
11-20        20
21-30        40
31-40        15
41-50         8
51-60         5
61-70         5
71+           2

One typical summer weekend, 100 people were observed crossing the bridge.

Which of the data sets given below would cause you to question the information in the brochure?

a) Set A only.

b) Set B only.

c) Set C only.

d) Set A and B only.

e) Set A and C only.

f) Set B and C only.

g) Set A, B and C.

h) None of A, B or C.

             Percentage of users
Age group    Set A    Set B    Set C
0-10            5       10        7
11-20          13       20       19
21-30          32       45       43
31-40          14       10       13
41-50          13        5       10
51-60          12        5        3
61-70          11        3        4
71+             0        2        1
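One way to quantify how far each observed set in SRQ_16 lies from the brochure is a chi-square style discrepancy, summing (observed - expected)^2 / expected over the age groups. This is a sketch under assumptions, not the marking scheme of the questionnaire: the brochure percentages are taken as expected counts for 100 people, and the helper name is hypothetical. Several expected counts are small, so the numbers are best read as a ranking of the sets rather than as formal test statistics.

```python
# Brochure percentages, used as expected counts for 100 observed people.
brochure = [5, 20, 40, 15, 8, 5, 5, 2]

observed = {
    "A": [5, 13, 32, 14, 13, 12, 11, 0],
    "B": [10, 20, 45, 10, 5, 5, 3, 2],
    "C": [7, 19, 43, 13, 10, 3, 4, 1],
}


def chi_square_discrepancy(obs, exp):
    """Sum of (observed - expected)^2 / expected over all categories."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))


discrepancy = {name: chi_square_discrepancy(obs, brochure) for name, obs in observed.items()}
for name, value in discrepancy.items():
    print(name, round(value, 2))
```

The resulting ordering, with Set A furthest from the brochure and Set C closest, agrees with the note in Appendix G that A, B and C are in decreasing order of difference from the brochure.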



The following data relates to questions 17 to 20


0.00 1.04 1.04 1.18 1.36

1.54 1.56 1.59 1.62 1.62

1.64 1.64 1.71 1.71 1.75

1.76 1.77 1.77 1.80 1.86

1.92 1.95 2.03 2.14 2.41



SRQ_19 Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region.




SRQ_21 The graph below shows the number of members of the American Mathematical Society.

Graph (not reproduced). Caption: Join the ever increasing number of professionals who enjoy the benefits of AMS membership! Vertical axis: Membership in thousands, marked 18, 20, 22, 24, 26, 28, 29. Horizontal axis: 1987, 1988, 1989, 1990, 1991.

Circle any of the following statements which are correct:

a) Membership of the AMS in 1991 was twice what it was in 1989.

b) Between 1987 and 1991 membership of the AMS doubled every two years.

c) Membership of the AMS could reasonably be expected to reach 50000 by 1992.

d) None of the above statements is correct.



SRQ_22 The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.

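Graphs a) to e) (not reproduced): five plots of the standard normal density on a horizontal axis from -3 to 3, each with a different shaded area. Appendix G describes the shaded areas as a) Z>1, b) 0<Z<1, c) Z<-2, d) Z<2, e) Z>2.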

Appendix G Responses to the SRQ


Table G gives the responses to the SRQ, pooled across all three cohorts. For

multiple choice questions, the percentage of students responding to each choice is

given. For free answer items, a summary description of responses is given.

Responses are ordered according to decreasing levels of understanding.

Item number Item Response Percentage

SRQ_1 To get the average number of children per family in a small town, a teacher counted the total number of children in the town. She then divided by 50, the total number of families. The average number of children per family was 2.2. Which of the following is certain to be true?

c) There are a total of 110 children - only response. 65.7
d) There are 2.2 children for every adult. 0.3
e) The most common number of children in a family is 2. 17.2
f) None of the above. 8.0
Multiple responses. 2.6
a) Half the families have more than 2 children. 2.8
b) More families have 3 children than have 2 children. 1.6
No response. 1.8

SRQ_2 A small object was weighed on the same scales separately by nine students in a science lab. The weights (in grams) recorded by each student are shown below.

6.3 6.0 6.0 15.3 6.1 6.3 6.2 6.15 6.3

The true weight could be estimated in several ways. How would you estimate it?

Exclude outlier to calculate mean. 37.1
Use median; use median with outlier excluded. 3.4
Uncertain about outlier; calculate mean. 33.4
Mean with max & min excluded. 8.8
Mode. 6.7
Mean of max & min; do it again multiple times & average 0.6
6; discard outlier but then unclear repeat once only 1.5
Multiple responses 4.4
Other; no response 3.9

SRQ_3 Box A and Box B are filled with red and blue marbles as follows.

Box A: 30 red, 20 blue
Box B: 12 red, 8 blue

Each box is shaken. In order to win a ticket to a sporting match, you need to get a blue marble, but you are only allowed to pick out one marble without looking. Which box should you choose?

c) It doesn’t matter. 81.7
a) Box A. 12.9
b) Box B. 5.2
No response. 0.2




SRQ_4 A bottle of medicine has the following printed on it: WARNING: For applications to skin areas there is a 15% chance of getting a rash. If you get a rash, consult your doctor. How would you interpret this?

d) About 15 out of every 100 people who use this medicine get a rash; or this with either of next 2. 95.7
a) Don’t use the medicine on your skin – there’s a good chance of getting a rash. 1.1
e) There is hardly any chance of getting a rash using this medicine. 2.1
b) For application to the skin, apply only 15% of the recommended dose. 0.3
c) If you get a rash, it will probably involve only 15% of the skin. 0.8
No response. ---

SRQ_5 An Australian male is rushed to hospital in an ambulance. Which of the following is least likely?

c) The man is over 55 and has had a heart attack. 62.2
a) The man is over 55. 20.8
b) The man has had a heart attack. 15.2
No response. 1.8

SRQ_6 As captain of your cricket team you have lost 8 out of 9 tosses in your previous 9 matches. For the next 4 tosses of the coin, you choose heads. Tails comes up 4 times. For the 5th toss, what should you choose? What is the probability of getting heads on this 5th toss? What is the probability of getting tails on this 5th toss?

Either – 0.5, 0.5. 76.5
Heads; tails – 0.5, 0.5. 10.5
No choice – 0.5, 0.5. 0.2
Either – any or no prob given. 4.4
Heads; tails – probs add to 1. 3.6
Other. 3.4
No response. 1.5

SRQ_7 Mrs Jones wants to buy a new car, either a Honda or a Toyota. First she read in Consumer Reports that for 400 cars of each type, the Toyota had more breakdowns than the Honda. Then she talked to three friends. Two were Toyota owners, who had no major breakdowns. The other friend used to own a Honda, but it had lots of break-downs. Which car should Mrs Jones buy?

b) Buy the Honda, the information in Consumer Reports is based on many cases... 67.4
c) It doesn’t matter which car she buys. She could still be unlucky. 29.9
a) Buy the Toyota, because her friend had so much trouble with his Honda, ... 2.0
No response 0.7

Adds to more than 100% 72.4

Out of proportion. 5.2

“Other” too large; content and heading inconsistent. 8.8

Uncertain re importance of >100%. 0.3

Not enough info. 1.0

Other. 3.1


Style issues; no response. 9.1





SRQ_9 A farmer wants to know how many fish are in his dam. He took out 200 fish and tagged each of them. He put the tagged fish back in the dam and let them get mixed with the others. On the second day, he took out 250 fish in a random manner, and found that 25 of them were tagged. Estimate how many fish are in the dam.

2000 55.2
Responses that have calculated 10% then stopped. 0.3
Incorrect responses between 1000 and 3000. 11.1
Responses >3000 or between 600 & 800. 4.6
Responses between 250 and 550. 16.9
No response; <250. 11.9

SRQ_10 The Bureau of Meteorology wanted to determine the accuracy of their weather forecasts. They searched their records for those days when the forecaster had reported a 70% chance of rain. For those particular days ..., they compared the forecast with records of whether or not it actually rained. The forecast of 70% chance of rain can be considered very accurate if it rained on:

d) 65% - 74% of those days. 46.4
c) 75% - 84% of those days. 8.0
a) 95% - 100% of those days. 35.9
b) 85% - 94% of those days. 8.3
e) 55% - 64% of those days. 0.5
No response. 0.8

SRQ_11 A group of students recorded the number of years their families had lived in their town. Here are two graphs that the students drew to illustrate their results: Graph 1: not to scale. Graph 2: to scale. Which of the graphs would you recommend the students use and why?

Graph 2 – uniform scale; more accurate; easier to use. 68.6
Graph 2 – clearer; looks better; no reason. 9.8
Graph 2 – grouped; other. 5.4
Graph 1 – any reason; either. 14.9
No response. 1.31

SRQ_12 Another group of students carried out a survey at the local library regarding the most frequent reason for using the internet. They produced the following two graphs: Graph A: 2D pie graph. Graph B: 3D pie graph. Which of the graphs would you recommend the students use and why?

Graph A – size of pieces reflects proportions; more accurate; easier to compare pieces; clearer. 57.9
Graph A – can be done by hand; simpler; all the same; no reason. 4.8
Either; Graph B any reason. 35.0
No response. 2.3

SRQ_13 Half of all newborns are girls and half are boys. Hospital A records an average of 50 births a day. Hospital B records an average of 10 births a day. On a particular day, which hospital is more likely to record 80% or more female births?

b) Hospital B. 34.6
c) The two hospitals are equally likely to record such an event. 58.5
a) Hospital A. 5.9
No response. 1.0

SRQ_14 A local newspaper published the following article: (regarding increase in vehicle registrations being linked to heart disease) Do you agree with Mr Robinson’s findings?

Disagree – population increase so need to use rate. 15.6
Disagree – other factors. 29.0
Disagree – correlation doesn’t imply causation; just coincidence; not enough info. 37.6
Agree – large sample size. 1.5
Disagree – other or no reason. 7.8
Agree – explains a possible link; other. 5.1
Agree – no reason. 0.5
No response. 2.9





SRQ_15 A music teacher was pleased to read the following article in a professional journal: (regarding link between OP and music lessons) Do you agree with these research findings?

Disagree – other factors. 18.2
Disagree – correlation doesn’t imply causation; just coincidence; not enough info; want to check data. 9.5
Disagree – not enough difference in %; too much difference in sample sizes; conditional probability confused. 19.0
Agree – large sample size. 7.5
Disagree – other or no reason. 5.6
Agree – statistical info was supplied; explains possible link; other. 25.7
Agree – personal experience; no reason. 7.0
No response. 7.5

SRQ_16 A Brisbane City Council brochure states that on a typical summer weekend, users of the Goodwill Bridge fall into the following age groups. (table given) One typical summer weekend, 100 people were observed crossing the bridge. Which of the data sets given below would cause you to question the information in the brochure? A, B, C are in decreasing order of difference from brochure.

a) A only. 42.2
b) B only. 1.6
c) C only. 2.1
d) A & B only. 5.4
e) A & C only. 1.6
f) B & C only. 2.1
g) A, B & C. 3.3
h) none 37.6
No response 4.1

SRQ_17 to 20


Correct (1.71 - 13th obs) 66.8

Data value near median; max/2; middle row; mean of middle row. 9.2

Mean; modes. 3.4

Other. 2.3


No response. 18.3

Correct (7th or 6.5th obs) 14.1

6.25th obs.; posn of quartile given not value. 0.5

Data value near quartile; med/2; max/4; first quarter of data. 26.2

3rd quartile. 0.3

Other. 22.7


No response. 36.2





SRQ_19 Using your answer to question SRQ_17, calculate the median total rainfall (in millimetres) for the region

Correct 17.0
Incorrect trans of median involving logs, exp or powers of 10 11.1
Other trans of median 4.4
Median; trans of mean 6.4
Other 7.9
No response 53.2

Correct 16.7
Incorrect trans of max involving logs, exp or powers of 10 11.1
Max 39.3
Other 2.8
No response 30.1

SRQ_21 The graph below shows the number of members of the American Mathematical Society: (Graph is out of proportion)

None of the above statements is correct. 93.7
Membership of the AMS in 1991 was twice what it was in 1989. 0.4
Between 1987 and 1991 membership of the AMS doubled every two years. 3.0
Membership of the AMS could reasonably be expected to reach 50000 by 1992. 1.3
Multiple responses. 1.7
No response. 0

SRQ_22 The heights of first year female university students are normally distributed with a mean of 165 cm and a variance of 4 cm². The graphs below are all of the standard normal distribution, that is, normal with mean 0 and variance 1. Choose the graph in which the shaded area gives the probability that a randomly chosen first year female student has a height of more than 161 cm.

d) Shaded area Z<2. 26.2
c) Shaded area Z<-2. 11.8
a) Shaded area Z>1. 23.2
e) Shaded area Z>2. 5.9
b) Shaded area 0<Z<1. 17.3
No response. 15.6

Table G Student responses to the SRQ
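The standardisation behind SRQ_22 can be sketched numerically. With mean 165 cm and variance 4 cm², the standard deviation is 2 cm, so a height of 161 cm corresponds to z = -2, and P(Z > -2) equals P(Z < 2) by symmetry. The code below is an illustrative check using only the Python standard library; the helper name is hypothetical.

```python
from math import erf, sqrt


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z, via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


mean, variance, height = 165.0, 4.0, 161.0
z = (height - mean) / sqrt(variance)       # (161 - 165) / 2 = -2.0

p_taller = 1.0 - standard_normal_cdf(z)    # P(Z > -2)
p_symmetric = standard_normal_cdf(-z)      # P(Z < 2), the same area by symmetry

print(z, round(p_taller, 4), round(p_symmetric, 4))
```

Note that the variance must be converted to a standard deviation before standardising; dividing by 4 instead of 2 would give z = -1.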

Appendix H Project Description and Criteria


QUEENSLAND UNIVERSITY OF TECHNOLOGY

SCHOOL OF MATHEMATICAL SCIENCES

MAB101 Statistical Data Analysis I

Group Project in Data Collection, Presentation and Analysis

Weight: This project is weighted at 20% of the final mark for MAB101.

Aims: The aims of this project are to bring together ideas from the entire unit, and to enable you to obtain hands-on experience in statistical involvement in experiments or studies, from the beginning of the exercise right through to its conclusion. It will further allow you opportunities to apply your learning in MAB101 to real situations.

Objectives: Through this project, you will get practical experience in statistical data planning, collection and analysis, and it is also an excellent way of helping your understanding and learning in MAB101.

Conditions and Deadlines: The project must be done in groups of 3-4 students. You should aim to organize your groups in the first third of the semester. Groups often form from the practical classes, however, you may combine with students in other practical classes if you wish. Each group should submit a brief informal description (one per group) by email by the end of Week 6 or 7, confirming the names of the students in the group, and describing the context, data to be collected and/or "plan of attack". The description you hand in will assist in your planning, and to help you with feedback – it is not assessable. Make sure you identify in this brief report the variables you are going to collect data on, and the subjects on which data are going to be collected/observed – that is, the names of the columns of your planned spreadsheet/worksheet, and what each row will refer to. Your lecturer will send a comment by return email. Note that this brief informal description carries NO weight towards assessment AT ALL.

Resources: These specifications are available on the MAB101 website at http://olt.qut.edu.au/sci/mab101/ under General Information and under Assessment.
You will be able to look at past MAB101 projects during the support sessions from now on and during practical classes later in the semester. You may also refer to: Practical development of Statistical Understanding: A Project Based Approach by MacGillivray and Hayes. This book is a guide for doing good statistical projects and will also give you some ideas for suitable topics. Note that previous projects are like published materials – neither data nor reports can be copied and



must be referenced in the usual way if you wish to refer to them. Copies of the guide are available in the Reserve Section of the library. (Note: if not please let us know).

Format: The project is to be presented as a hard copy document. A project cover sheet will be provided on the olt. The reports should not be unnecessarily long - concise relevant reporting can still cover all points of interest and be more informative than too much detail. The raw data should be included on a disk.

Brief outline of project requirements: You are required to identify a context of interest to you, collect relevant data, explore and comment on features of the data, analyse the data using techniques you meet in MAB101, and write a report. You should select and use appropriate statistical tools introduced in MAB101 for some analysis and discussion of the data. The context you choose should be of interest to the group. You must then decide which data to collect and how to collect them. The report should include description of the circumstances and any practical problems encountered in collecting the data. Keep in mind that a reader should be able to either repeat your study or build on it. Don't be too ambitious with your project! If you have difficulties or suffer a brief inspirational vacuum, ask for advice!

The project can be considered as having three components:

• identifying and describing a context and issues of interest; planning and collecting of relevant data; quality of data and discussion of context/problems;

• handling and processing data; summarising, exploring and commenting on features of the data; statistical modelling;

• using statistical tools for statistical analysis and interpretation of the data in the context/issues.

The points below are augmented by questions and discussion in the project manual on the Web http://www.maths.qut.edu.au/MAB893/manual.htm and in the project reference book.
The first component is very important with its emphasis on planning and considering practical aspects of data, obtaining good quality data, and understanding the aspects of data that affect statistical assumptions and analysis. It consists of identifying a context to be investigated; identifying what is of interest; identifying relevant variables; identifying which data to collect and how to collect them; and noting clearly the circumstances and any practical problems in collecting the data, for later interpretation and in case queries arise during subsequent comment on features and analysis. Include in your report:

• clear identification of the situation/context, its interest to the group, and issues of interest/importance;

• clear identification of planning/observation, including which variables were observed and how.

Before you start collecting data, check whether there may be information you may regret not including.



The second component refers to the handling, processing, exploring and modelling of the data, and preparing it for analysis. Presenting features and information in the data includes use of well-chosen graphs, plots and summaries. Note that there is some overlap with the first component in the handling of data: some projects may require more thought in the setting up phase, some may require more thought in the processing of the data. Use of graphs, plots and summaries is not separated from more formal analysis, as graphical and descriptive aspects can feature in preliminaries, in conjunction with more formal analysis, or to illustrate points of the analysis. Note that graphical and descriptive forms appropriate to the data should be chosen. Don't give every graph you can think of, and check if there is information in the data that you haven't presented. This component also involves using statistical tools in exploratory analysis and in identifying models to be formally assessed.

The third component is as important as the first two and also overlaps them, but is particularly significant in helping you in learning the application of the data analysis tools being introduced throughout the semester. The project reference book includes examples of the types of datasets and situations that can be analysed, but don’t forget that the project reference book has model projects and is also used by students in units that do a little more than we do in MAB101. In addition, most sections of the MAB101 text give examples for which the tools of that section are useful. One of the aims of the project is to enable you to use the methods of MAB101 on your own data. Sophisticated data analysis is NOT expected: the aim is to make good use of the introductory tools of MAB101. Note that:

• analysis techniques appropriate to the data and context should be applied - don't use the scattergun approach of trying everything in case something is appropriate; and

• check to see if there is relevant information in the data that you haven't analysed. (See the project reference for various examples) For example, is there a time factor? Are some variables linked in some way? Could scaled versions of two different variables be compared?



Descriptions of criteria and standards

Criteria (i) Identifying and describing a context and issues of interest; planning and collecting of relevant data; quality of data and discussion of context/problems

Mark Description of standard

6.5-7

Thoughtful ideas translated into planning to obtain sound data with a range of variables and observations for investigation of a range of issues. Description of context and practical details sufficient for reader to repeat. Evidence of teamwork.

5-6 Ideas translated into planning to obtain data with variables and observations for investigation of issues. Attempt at description of context and practical details. Some evidence of teamwork.

3-4 Limited data with little description of context and practical details.

1-2 Very little data; poor quality data; little description or teamwork

Criteria (ii) handling and processing data; understanding data and variables and issues to be explored; summarising, exploring and commenting on features of the data


5-6

Correct identification of variables, types of variables & subjects. Demonstrated understanding of nature of data and issues to be explored. Good data entry and preparation of data for analysis. Correct and judicious selection and use of graphs, tables. Range of graphs presents most features of data.

3-4

Attempt at identification of variables & subjects. Limited understanding of nature of data. Some issues identified. A mixture of correct/incorrect, judicious/non-judicious selections of graphs. Data entry and preparation adequate.

1-2 Limited data with little understanding of data, and issues. Negligible or incorrect presentations

Criteria (iii) using statistical tools for statistical analysis and interpretation of the data in the context/issues


6-7 Judicious choices of statistical procedures to analyse range of issues. Mostly correct use and technical interpretation of selected statistical procedures. Synthesis of results and appropriate discussion.

3-5 Choices of statistical procedures to analyse range of issues. Mixture of correct and incorrect use and technical interpretation of statistical procedures. Reasonable attempt at appropriate discussion.

1-2 Negligible or incorrect choices and applications of statistical procedures

Appendix I End of Semester Exam


The following is a typical MAB101 end of semester exam. Students are given two

hours to complete the paper. They provide their own double-sided one page

summary of the course material and are permitted to use any calculator.

QUESTION 1 The flight of paper planes (R.Alcaraz, J.Mulholland, A.Tidmarsh, S.Williams 2004) The group investigated variables that might affect the distance and the flight time of different paper aeroplanes. The experiment was conducted in an enclosed space to minimise the influence of the weather. Three different plane designs were made using 3 different types of paper, and each combination was thrown four times by different throwers. For each throw, the flight time, distance, type of landing (nosedive/glide), position on landing (upright/not) and whether there had been any obstacles, were all recorded. All flights took place on the same day in the same location.

(a) The number of variables in this dataset is (circle your answer)

A: 6 B: 7 C: 8 D: 9

(b) Name the continuous variables in this dataset

………………………………………………………………………………………

…………………………………………………………………………………….

(c) When the data in this study are entered on a spreadsheet (or Minitab worksheet),

the rows correspond to (circle your answer)

A: designs B: throws C: paper D: throwers

Question 1 continued overleaf /cont……. MAB101T1.051



(d) The boxplots below are of the flight times in seconds, classified by design and type of paper. For each of the statements, indicate in the column provided whether or not the statement is an appropriate one to make based ONLY on the boxplots below.

Boxplot of Flight Time vs Design, Paper (figure not reproduced). Vertical axis: Flight Time, marked 0 to 6. Horizontal axis: Design (generic, nick's paper aeroplane, stingray glider), each split by Paper (cartridge, plain, rice).

Tick if appropriate. Place a X if not.

One design is clearly better by the criteria of length of flight time

The plot suggests that different paper types might suit different designs

The average flight time for the stingray glider in cartridge is 2 seconds

The variability does not seem to depend on design or type of paper

The standard deviation of the flight times for the generic design in rice paper is approximately 1 second

In Nick’s design, half of the cartridge paper planes flew longer than three-quarters of the rice paper planes

In the generic design, the rice paper planes had the most variable and the most skew times

Some of the observations should be discarded

(e) The stem-and-leaf plot below is of the distance in metres travelled by all the paper planes made of plain paper. Use this plot to answer the following questions.

Stem-and-leaf of Distance_plain  N = 48
Leaf Unit = 0.10

  1    2  0
  4    3  299
 14    4  0035699999
 21    5  1234666
 (7)   6  0222799
 20    7  000022348
 11    8  0335777
  4    9  046
  1   10  9


(i) The median of the distance for plain paper is …………….

(ii) The lower quartile of the distance for plain paper is ………..

(iii) The probability that the distance travelled by a plain paper plane is more than 6 metres is estimated directly from the data by………….

(iv) Assuming the flight distances of plain paper planes are normal with a mean of 6.5 metres and a standard deviation of 2 metres, the probability that a plain paper plane flies more than 6 metres is given by

………………………………………………………………………

………………………………………………………………………….

(SHOW YOUR CALCULATIONS)

(15 marks) /cont……. MAB101T1.051
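The calculations in Question 1(e) can be sketched by decoding the stem-and-leaf plot back into the 48 distances. This is an illustrative check, not the exam's marking scheme: the dictionary of stems and leaves is transcribed from the plot above, and the lower quartile is left out because its value depends on the quartile convention used. Part (iv) uses the normal model stated in the question (mean 6.5 m, standard deviation 2 m).

```python
from math import erf, sqrt
from statistics import median

# Stem-and-leaf rows from the plot (leaf unit 0.10, so stem 6 with leaf 2 is 6.2 m).
rows = {2: "0", 3: "299", 4: "0035699999", 5: "1234666", 6: "0222799",
        7: "000022348", 8: "0335777", 9: "046", 10: "9"}

# Work in tenths of a metre to avoid floating-point noise, then convert to metres.
tenths = sorted(stem * 10 + int(leaf) for stem, leaves in rows.items() for leaf in leaves)
distances = [t / 10 for t in tenths]

n = len(distances)                               # number of observations
sample_median = median(distances)                # average of the 24th and 25th values
prop_over_6 = sum(t > 60 for t in tenths) / n    # empirical estimate of P(distance > 6 m)


def standard_normal_cdf(z):
    """P(Z <= z) for a standard normal Z."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


# Part (iv): P(X > 6) for X ~ Normal(6.5, 2^2), that is P(Z > -0.25).
p_model = 1.0 - standard_normal_cdf((6.0 - 6.5) / 2.0)

print(n, sample_median, round(prop_over_6, 3), round(p_model, 3))
```

The empirical proportion counts only distances strictly greater than 6.0 m, so the single observation recorded as 6.0 is excluded.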



QUESTION 2

The table below classifies the dress type and age group for people during lunchtime in a busy city street.

Rows: dress   Columns: age

           <30   30-39   40-49   50-59   >59   All
casual      15      43      19      10     4    91
neutral     17      41      31      19    20   128
smart        7      19      28      19     8    81
All         39     103      78      48    32   300

(a) In carrying out a statistical test to assess whether the type of dress depends on age group, the expected number in the 50-59 age group dressing casually is given by …………………………..

(b) Which statistical tables will you use in carrying out the above test? [Remember to give all information needed in order to use the tables]

…………..………

(c) The p-value for the test is 0.003. From this we can say (circle your response):

(1) there is strong evidence that dress type is independent of age group

(2) there is strong evidence that dress type depends on the age group

(3) there is a 0.3% chance that dress type is independent of age group.

(d) Based on the above data, an approximate 95% confidence interval for the probability that a person in the <30 age group dresses smartly is given by

$$\left( c - 1.96\sqrt{\frac{c(1-c)}{n}},\; c + 1.96\sqrt{\frac{c(1-c)}{n}} \right)$$

where

(1) n is given by ………

(2) c is given by ………….

(e) The data above was also separated into males and females and tests for independence of dress type and age group carried out for each gender. The p-value for the females was 0.703, and the p-value for the males was 0.000. Comment on these results in a single sentence.

…………………………………………………………………………………………

…………………………………………………………………………………………

…………………………………………………………………………………………
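The quantities asked for in parts (a), (b) and (d) of Question 2 can be sketched from the dress by age table. This is an illustration rather than the official solution: the expected count uses the usual independence formula (row total × column total / grand total), the degrees of freedom are (rows - 1) × (columns - 1), and the interval uses the form c ± 1.96 √(c(1 - c)/n) with c taken as the observed proportion of smart dressers among the under 30s. Variable and function names are hypothetical.

```python
from math import sqrt

ages = ["<30", "30-39", "40-49", "50-59", ">59"]
table = {
    "casual":  [15, 43, 19, 10, 4],
    "neutral": [17, 41, 31, 19, 20],
    "smart":   [7, 19, 28, 19, 8],
}
col_totals = [sum(row[j] for row in table.values()) for j in range(len(ages))]
grand_total = sum(col_totals)


def expected(dress, age):
    """Expected count under independence: row total x column total / grand total."""
    return sum(table[dress]) * col_totals[ages.index(age)] / grand_total


# Part (a): expected number of casual dressers aged 50-59, i.e. 91 x 48 / 300.
e_casual_50s = expected("casual", "50-59")

# Part (b): degrees of freedom for the chi-square tables.
df = (len(table) - 1) * (len(ages) - 1)

# Part (d): approximate 95% interval for P(smart | <30), with c = 7/39 and n = 39.
n = col_totals[ages.index("<30")]
c = table["smart"][ages.index("<30")] / n
half_width = 1.96 * sqrt(c * (1 - c) / n)
interval = (c - half_width, c + half_width)

print(e_casual_50s, df, tuple(round(x, 3) for x in interval))
```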


(e) It is claimed that in daytime, the 30-39 year age group is more likely to dress smartly than the <30 age group. Let p1 be the probability that a 30-39 year-old in the city during the day dresses smartly, and let p2 be the probability that a <30 year-old in the city during the day dresses smartly.

(iv) The test statistic to test this is given by

$$\frac{\hat{p}_2 - \hat{p}_1}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{m}+\frac{1}{n}\right)}}$$

where

(i) (m, n) are given by ………………..

(ii) and p is given by …………………

(12 marks)
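The pooled two-proportion statistic in the last part of Question 2 can be sketched with the counts from the dress by age table. Several choices here are assumptions rather than the exam's stated answer: the group sizes are taken as 103 (30-39) and 39 (<30), the pooled proportion as the combined share of smart dressers in the two groups, and the difference is computed as p̂2 - p̂1, following the order in which the subscripts appear in the source. Reversing the order only changes the sign of the statistic.

```python
from math import sqrt

smart_30s, total_30s = 19, 103   # 30-39 age group: smart dressers and group size
smart_u30, total_u30 = 7, 39     # <30 age group: smart dressers and group size

p1_hat = smart_30s / total_30s   # estimate of p1 (30-39 year-olds)
p2_hat = smart_u30 / total_u30   # estimate of p2 (<30 year-olds)

# Pooled estimate under the hypothesis p1 = p2: 26 smart dressers out of 142 people.
p_pooled = (smart_30s + smart_u30) / (total_30s + total_u30)

standard_error = sqrt(p_pooled * (1 - p_pooled) * (1 / total_30s + 1 / total_u30))
z = (p2_hat - p1_hat) / standard_error

print(round(p1_hat, 3), round(p2_hat, 3), round(p_pooled, 3), round(z, 3))
```

The two sample proportions are very close, so the statistic is near zero under either sign convention.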


QUESTION 3

Soluble aspirin (D.Raffelt, T.Do, E.Nguyen, C.Dance 2004)

To investigate the time to dissolve different types of soluble aspirin tablets, an experiment was conducted with 5 brands, using the factors water temperature (room/fridge), pH of water (neutral/acidic), and water type (normal/added salt). Aspirin was classified as dissolved once it had broken up and dissociated from the surface of the water.

(a) For the normal, neutral water samples, the following summary statistics were obtained

Variable    Temp     N    Mean    StDev   Median
Time(sec)   Fridge   15   79.20   31.96   65.00
            Room     15   68.40   26.16   56.00

(i) Using the above data, a 95% confidence interval for the expected time to dissolve aspirin tablets in normal, neutral, room temperature water is given by 68.4 – c 26.16/√n, 68.4 + c 26.16/√n, where

(1) n is

………………

(2) c is ………………

(ii) It is desired to quote an interval giving a range of time in which most aspirin tablets will dissolve. Which should be used? (circle your response)

(1) a confidence interval.

(2) a tolerance interval.

(iii) It is desired to estimate the expected time to dissolve aspirin in normal, neutral, room temperature water to within 10 seconds with a confidence of 95%. Using 26.2 seconds for the standard deviation of time to dissolve, it is suggested that the number of observations needed should be at least

………………………………………………………………………………

………………………………………………………………………………

……………………………………………………………………………

[SHOW YOUR CALCULATIONS]
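The sample size calculation in part (iii) can be sketched as follows. The sketch assumes that "to within 10 seconds" means the half-width of the 95% interval should be at most 10 seconds, and that the normal multiplier 1.96 is used with the quoted standard deviation of 26.2 seconds; the function name is hypothetical.

```python
from math import ceil


def sample_size_for_margin(sigma, margin, z=1.96):
    """Smallest n for which z * sigma / sqrt(n) is at most the required margin."""
    return ceil((z * sigma / margin) ** 2)


# (1.96 * 26.2 / 10)^2 is about 26.37, which rounds up to 27 observations.
n_needed = sample_size_for_margin(sigma=26.2, margin=10.0)
print(n_needed)  # 27
```

Rounding up rather than to the nearest integer ensures the required precision is met.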


(iv) A 95% confidence interval for the standard deviation of the time to dissolve aspirin tablets in normal, neutral, room temperature water is given by

( ………… × 26.16 / ………… , ………… × 26.16 / ………… )

(v) Assuming equal variances, and testing that there is no difference between the expected time to dissolve aspirin in normal, neutral water at room temperature or from the fridge, the tables used for the test are

……………….

[Remember to give all information needed in order to use the tables]

(vi) The test statistic to carry out the test of (v) is given by

[Answer template: a fraction with numerator ………… − ………… and a denominator, containing a sum, to be completed.]

(13 marks)

(b) For the times to dissolve aspirin tablets in normal water at room temperature, the following was obtained

Two-way ANOVA: Time(sec) versus pH, Brand

Source        DF        SS        MS       F       P
pH             1      70.5     70.53    2.35   0.141
Brand          4    6522.9   1630.72   54.36   0.000
Interaction    4    4160.5   1040.12   34.67   0.000
Error         20     600.0     30.00
Total         29   11353.9

S = 5.477 R-Sq = 94.72% R-Sq(adj) = 92.34%

(i) The p-value of 0.141 tells us that ……………………………………………

………………………………………………………………………………….

(ii) The p-value of 0.000 for Brand tells us that …………………………………

……………………………………………………………………………….


(iii) The p-value of 0.000 for Interaction tells us that ………………………………

…………………………………………………………………………………… ……………………………………………………………………………………

(iv) Give a single sentence comment on each of the plots below …………………

…………………………………………………………………………………………

…………………………………………………………………………………………………………… …………………………………………………………………………………………………………… ……………………………………………………………………………………………………………

[Figure: Interaction Plot (data means) for Time(sec). Mean time (vertical axis, 40 to 120) against Brand (Solprin, Disprin, Codox, Codis, Aspro Clear), with separate lines for pH (acidic, Neutral).]

[Figure: Test for Equal Variances for Time(sec). 95% Bonferroni Confidence Intervals for StDevs (horizontal axis, 0 to 200) for each combination of pH and Brand. Bartlett's Test: Test Statistic 12.05, P-Value 0.210. Levene's Test: Test Statistic 0.80, P-Value 0.619.]

(8 marks)
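The summary lines of the two-way ANOVA output in Question 3(b) can be reproduced from the table entries. The sketch below is only an arithmetic check, using the standard relationships F = MS / MS(Error), S = √MS(Error) and R-Sq = 1 − SS(Error) / SS(Total). The variable names are illustrative, and the p-values are not recomputed because that would need an F distribution routine outside the standard library.

```python
import math

# Mean squares copied from the two-way ANOVA output in Question 3(b).
mean_squares = {"pH": 70.53, "Brand": 1630.72, "Interaction": 1040.12}
ms_error = 30.00
ss_error = 600.0
ss_total = 11353.9

# Each F statistic is the term's mean square divided by the error mean square.
f_ratios = {term: ms / ms_error for term, ms in mean_squares.items()}

# S is the square root of the error mean square.
s = math.sqrt(ms_error)

# R-Sq is the share of the total sum of squares not left in the error term.
r_sq = 1 - ss_error / ss_total

for term, f in f_ratios.items():
    print(f"{term}: F = {f:.2f}")
print(f"S = {s:.3f}, R-Sq = {100 * r_sq:.2f}%")
```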

QUESTION 4

Reflexes (K.Beakey, N.Hand, J.Rolfe 2004)

An experiment was conducted to investigate human reflexes. A ruler was dropped (15.2cm above the hand and by the same group member) on the count of three and the aim was to catch the ruler as quickly as possible. A fluorescent and a clear ruler were used, with each subject tested with each ruler for both left and right hands, and the distance of the catch from the bottom of the ruler noted if the subject caught the ruler. The age and dominant hand of each subject were noted. The order of rulers was randomised for each subject. All the subjects caught the fluorescent ruler with their right hand, but some missed the other rulers.

(a) Below is part of the output for a one-way ANOVA on the reflex distance for the fluorescent ruler caught by the right hand.

One-way ANOVA: Right Fluorescent versus Age group

Source      DF      SS      MS       F
Age group    2   717.6  ……………  ……………
Error      ………  ……………  ……………
Total       39  1922.2

(i) Fill in the blank spaces in the above table

(ii) Use the output below to comment on the comparisons across the age groups

…………………………………………………………………………………
………………………………………………………………………………..

Tukey 95% Simultaneous Confidence Intervals
All Pairwise Comparisons among Levels of Age group

Individual confidence level = 98.04%

Age group = 0-34 subtracted from:

Age group   Lower  Center   Upper  -----+---------+---------+---------+----
35-69       0.362   6.475  12.588                  (---------*---------)
70+         4.270   9.229  14.188                        (-------*--------)
                                   -----+---------+---------+---------+----
                                      -6.0       0.0       6.0      12.0

Age group = 35-69 subtracted from:

Age group   Lower  Center   Upper  -----+---------+---------+---------+----
70+        -3.772   2.754   9.279           (----------*---------)
                                   -----+---------+---------+---------+----
                                      -6.0       0.0       6.0      12.0

(7 marks)
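The blank cells in the one-way ANOVA table of part (a)(i) follow from the entries that are given, because degrees of freedom and sums of squares add up to their totals and each mean square is a sum of squares divided by its degrees of freedom. The sketch below illustrates that arithmetic; the function name `complete_one_way_anova` and the dictionary keys are illustrative, not part of the exam paper or of Minitab.

```python
def complete_one_way_anova(df_factor, ss_factor, df_total, ss_total):
    """Fill in the missing cells of a one-way ANOVA table."""
    # Error terms are what remains after removing the factor from the total.
    df_error = df_total - df_factor
    ss_error = ss_total - ss_factor
    # Mean square = sum of squares / degrees of freedom.
    ms_factor = ss_factor / df_factor
    ms_error = ss_error / df_error
    return {
        "df_error": df_error,
        "ss_error": ss_error,
        "ms_factor": ms_factor,
        "ms_error": ms_error,
        "F": ms_factor / ms_error,
    }


# Entries given in the table: Age group (DF 2, SS 717.6), Total (DF 39, SS 1922.2).
table = complete_one_way_anova(df_factor=2, ss_factor=717.6, df_total=39, ss_total=1922.2)
for name, value in table.items():
    print(f"{name}: {value:.2f}")
```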


(b) The output below looks at how the reflex distance on the fluorescent ruler caught by the right hand is affected by age group and dominant hand. Give (BRIEFLY) the information provided by this output.

…………………………………………………………………………………………
…………………………………………………………………………………………
………………………………………………………………………………………..
……………………………………………………………………………………….

Analysis of Variance for Right Fluorescent, using Adjusted SS for Tests

Source                DF   Seq SS   Adj SS  Adj MS     F      P
Age group              2   717.61   389.08  194.54  6.30  0.005
R/L Handed             1    75.28    30.51   30.51  0.99  0.327
Age group*R/L Handed   2    78.68    78.68   39.34  1.27  0.293
Error                 34  1050.63  1050.63   30.90
Total                 39  1922.20

[Figure: Residuals Versus the Fitted Values (response is Right Fluorescent). Standardized residual (vertical axis, -3 to 3) against fitted value (horizontal axis, 15.0 to 27.5).]

(7 marks)

(c) The output over investigates how the reflex length for the clear ruler caught by the right hand is related to the reflex lengths for the other three catches (provided all four are caught) and age of the subject. Give (BRIEFLY) the information provided by this output.


………………………………………………………………………………… ………………………………………………………………………………….

………………………………………………………………………………………… ………………………………………………………………………………………… ………………………………………………………………………………………

Regression Analysis: Right Clear versus Right Fluore, Left Fluores, ...

The regression equation is
Right Clear = 7.64 + 0.330 Right Fluorescent + 0.158 Left Fluorescent + 0.0475 Age + 0.077 Left Clear

33 cases used, 7 cases contain missing values

Predictor             Coef  SE Coef     T      P
Constant             7.642    2.607  2.93  0.007
Right Fluorescent   0.3301   0.1339  2.47  0.020
Left Fluorescent    0.1584   0.1334  1.19  0.245
Age                0.04745  0.03926  1.21  0.237
Left Clear          0.0771   0.1364  0.57  0.576

S = 3.95723   R-Sq = 57.0%   R-Sq(adj) = 50.9%

Unusual Observations

           Right   Right
Obs  Fluorescent   Clear     Fit  SE Fit  Residual  St Resid
 38         28.9  19.000  21.707   2.720    -2.707     -0.94 X

X denotes an observation whose X value gives it large influence.

[Figure: Residuals Versus Age (response is Right Clear). Standardized residual (vertical axis, -2 to 2) against Age (horizontal axis, 10 to 90).]

[Figure: Residuals Versus the Fitted Values (response is Right Clear). Standardized residual (vertical axis, -2 to 2) against fitted value (horizontal axis, 10 to 28).]

[Figure: Residuals Versus Right Fluorescent (response is Right Clear). Standardized residual (vertical axis, -2 to 2) against Right Fluorescent (horizontal axis, 0 to 30).]

[Figure: Normal Probability Plot of the Residuals (response is Right Clear). Percent (vertical axis, 1 to 99) against standardized residual (horizontal axis, -3 to 3).]

(8 marks)

QUESTION 5

In the paper plane experiment of Question 1 above, the relationship between flight time (in seconds) and distance travelled (in metres) was analysed for the design called Nick’s paper aeroplane.

Regression Analysis: time_n versus dist_n

The regression equation is
time_n = 1.15 + 0.0823 dist_n

Predictor     Coef  SE Coef     T      P
Constant    1.1546   0.2508  4.60  0.000
dist_n     0.08228  0.04637  1.77  0.083

S = 0.672386   R-Sq = 6.4%   R-Sq(adj) = 4.4%

(a) What is the p-value of 0.083 telling us? ………………………………………..

…………………………………………………………………………………………..

(b) What is the R-sq telling us? ………………………………………..

…………………………………………………………………………………………..

(c) The following output continues investigating the above relationship.

MTB > let c16='dist_n'**2
MTB > let c17='dist_n'**3

Regression Analysis: time_n versus dist_n, C16, C17

The regression equation is
time_n = 3.47 - 1.44 dist_n + 0.277 C16 - 0.0148 C17

Predictor       Coef   SE Coef      T      P
Constant      3.4703    0.6858   5.06  0.000
dist_n       -1.4446    0.4889  -2.95  0.005
C16           0.2774    0.1059   2.62  0.012
C17        -0.014782  0.006928  -2.13  0.038

S = 0.595347   R-Sq = 29.8%   R-Sq(adj) = 25.0%

Analysis of Variance

Source          DF       SS      MS     F      P
Regression       3   6.6249  2.2083  6.23  0.001
Residual Error  44  15.5953  0.3544
Total           47  22.2202

Unusual Observations

Obs  dist_n  time_n     Fit  SE Fit  Residual  St Resid
 12    9.35  1.2180  2.1356  0.3965   -0.9176     -2.07RX
 13    1.46  3.1330  1.9066  0.2372    1.2264      2.25R
 15    0.57  2.9030  2.7343  0.4587    0.1687      0.44 X
 16    2.84  2.5150  1.2669  0.1447    1.2481      2.16R
 27    2.02  0.3360  1.5625  0.1736   -1.2265     -2.15R
 46    3.88  2.6720  1.1787  0.1266    1.4933      2.57R

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.


(i) What was done to produce the above output? …………………………………….

…………………………………………………………………………………………..

(ii) What does the p-value of 0.005 tell us? ……………………………………………

…………………………………………………………………………………………..

(iii) What does the p-value of 0.012 tell us? …………………………………………..

………………………………………………………………………………………….

(iv) What does the p-value of 0.038 tell us? …………………………………………..

…………………………………………………………………………………………

(v) What do the plots below tell us? …………………………………………………..

………………………………………………………………………………………….. …………………………………………………………………………………………..

[Figure: Residuals Versus dist_n (response is time_n). Standardized residual (vertical axis, -2 to 3) against dist_n (horizontal axis, 0 to 9).]

[Figure: Normal Probability Plot of the Residuals (response is time_n). Percent (vertical axis, 1 to 99) against standardized residual (horizontal axis, -3 to 3).]

(10 marks)
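The second regression output in Question 5(c) can be read more easily by evaluating the fitted equation directly. The sketch below assumes, from the two `let` commands, that C16 and C17 hold the squared and cubed values of dist_n, and compares the resulting fitted times with the Fit column of the Unusual Observations table. The function name `fitted_time` is illustrative, and small differences from the reported fits are expected because the printed coefficients are rounded.

```python
def fitted_time(dist: float) -> float:
    """Fitted flight time from the cubic regression in Question 5(c)."""
    # Coefficients copied from the Minitab coefficient table; the C16 and C17
    # terms are taken to be dist_n squared and dist_n cubed.
    return 3.4703 - 1.4446 * dist + 0.2774 * dist**2 - 0.014782 * dist**3


# (dist_n, time_n, reported Fit) for observations 13, 16 and 46.
rows = [(1.46, 3.1330, 1.9066), (2.84, 2.5150, 1.2669), (3.88, 2.6720, 1.1787)]

for dist, time, reported_fit in rows:
    fit = fitted_time(dist)
    # Residual = observed time minus fitted time.
    print(f"dist_n={dist}: fit={fit:.4f} (reported {reported_fit}), residual={time - fit:.4f}")
```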

END OF PAPER

