
(Achievement Standard 91581)

Richard Arnold and Cushla Thomson

Overview – 3.9 Bivariate Data

The most common standard in Level 3 statistics (16 000 students in 2013)

4 credits

Internally assessed

Around 5 weeks spent teaching then assessing the topic (varies across schools and courses – depends on how many standards students are taking)

This PowerPoint, NZQA documents and examples of student work related to 3.9 are on http://padlet.com/cushla_thomson/27oubb6k0yq9 or http://bit.ly/1HnYCNn

Basic idea

Students are given a multivariate dataset with some background information on the variables. They have to choose 2 numeric variables and investigate the relationship between them using linear regression.

They write a report about their investigation

Almost all schools use iNZight for the analysis

At Achieved level: investigate bivariate data

At Merit level: investigate bivariate data with justification

At Excellence level: investigate bivariate data with statistical insight

The investigation cycle - PPDAC

In all Level 3 statistics standards students must show that they’ve used this cycle:

Problem

Plan

Data

Analysis

Conclusion

Requirements for Achieved

For Achieved the student is required to investigate bivariate measurement data. This involves showing evidence of using each component of the statistical enquiry cycle.

This is NZQA-speak for the following skills. The student must:

Write a question about the relationship between two quantitative variables: “I want to know what the relationship is between the ____ of ____ (in m) and the _____ of ______ (in cm)”

Identify the explanatory and response variables

Use iNZight or Excel to draw a scatterplot

Comment on the sign and strength of the relationship (focus on the scatterplot visually before using the value of r to back it up)

Find the line of best fit and use it to make a prediction

Form a conclusion about the question
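The "line of best fit and prediction" step can be sketched in a few lines of code. This is a minimal illustration of ordinary least squares, not the procedure iNZight itself runs; the data values and the names `fit_line`, `heights` and `arm_spans` are made up for the example.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx              # slope: change in y per unit change in x
    a = y_bar - b * x_bar      # intercept: the line passes through (x_bar, y_bar)
    return a, b

# Made-up illustrative data: explanatory = height (cm), response = arm span (cm)
heights = [150, 160, 170, 180, 190]
arm_spans = [148, 161, 169, 182, 190]

a, b = fit_line(heights, arm_spans)
predicted = a + b * 175        # prediction for a height inside the observed range

print(f"arm span = {a:.2f} + {b:.2f} * height")
print(f"predicted arm span at 175 cm: {predicted:.2f} cm")
```

For these made-up values the slope is 1.05 and the intercept is -8.5, so the prediction at 175 cm is about 175.25 cm. The prediction is an interpolation because 175 cm lies within the observed heights.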

Requirements for Merit

Investigate bivariate data with justification

Code for:

All of the above plus links to context throughout the report

Not many stupid statements...

Requirements for Excellence

The student must have done what is required for Merit plus used research to deepen the investigation

Loads of context

Residual plot used to look for problems with model:

Non-constant variance

Non-linear model

Prediction intervals (informally)

Students might fit a non-linear model if that makes sense

Unusual points will be discussed and possibly removed

Students might investigate a second pair of variables

No recipe for Excellence but that doesn’t stop students trying to find one!

Big gap between Achieved and Excellence

See the Padlet link for examples of what is needed for each of the 3 passing levels: http://bit.ly/1HnYCNn

Achieved is mechanical and requires minimal understanding – rote learning will get students there

Excellence requires more communication of understanding and ability to link concepts than many strong first year university students display

Bivariate Data in STAT 193

Correlation (1.5 lectures)

Compute Pearson’s r (gcalc)

Rank data (by hand)

Compute Spearman’s r (gcalc: Pearson’s on the ranks)

Plot data, choose appropriate correlation coefficient

Comment on strength and direction
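The idea that Spearman's coefficient is Pearson's r computed on the ranks can be checked directly. The sketch below assumes average ranks for ties; the function names and the data are hypothetical and are not taken from the course materials.

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson's correlation coefficient r."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    return sxy / sqrt(sxx * syy)

def ranks(values):
    """Rank from 1 (smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the run of values tied with position i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result

def spearman_r(x, y):
    """Spearman's coefficient: Pearson's r applied to the ranks."""
    return pearson_r(ranks(x), ranks(y))

# Made-up data: increasing but clearly non-linear
x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 100]

print(f"Pearson r  = {pearson_r(x, y):.3f}")
print(f"Spearman r = {spearman_r(x, y):.3f}")
```

Because y always increases with x, the ranks agree perfectly and Spearman's coefficient is 1, while Pearson's r is below 1 because the relationship is not linear. This is the kind of contrast that motivates plotting the data before choosing a coefficient.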

Regression (3 lectures)

Draw scatterplot (manually)

Fit regression line a + bX (gcalc)

Draw regression line (manually)

Predictions (gcalc or using formula)

Compute residuals (manually)

Draw residual plot (manually)

Assess regression assumptions using residual plot (linear, constant variance, normal data)

Compute $R^2$ (gcalc) and interpret
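The residual and $R^2$ calculations listed above can be sketched as follows. This is an illustration under stated assumptions, not the gcalc procedure: a residual is taken as observed minus fitted, $R^2$ is computed as 1 - SSE/SST, and the data are made up.

```python
from statistics import mean

# Made-up illustrative data (not from the course)
x = [150, 160, 170, 180, 190]
y = [148, 161, 169, 182, 190]

# Least-squares line y = a + b*x
x_bar, y_bar = mean(x), mean(y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b = sxy / sxx
a = y_bar - b * x_bar

# Residual = observed y minus fitted y
fitted = [a + b * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# R^2 = proportion of the variation in y explained by the line
sse = sum(e ** 2 for e in residuals)          # unexplained variation
sst = sum((yi - y_bar) ** 2 for yi in y)      # total variation
r_squared = 1 - sse / sst

for xi, e in zip(x, residuals):
    print(f"x = {xi}: residual = {e:+.2f}")
print(f"R^2 = {r_squared:.4f}")
```

Plotting the residuals against x gives the residual plot. A curved pattern suggests the relationship is not linear, and a fan shape suggests non-constant variance.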

Assessment

Context is spelled out to a large degree (X and Y are identified)

Need to consider nature of data and relationship to select correlation coefficient

No inference (no test of slope, or confidence intervals)

No transformation of variables to make relationship linear

Linear relationships only (non-linear in 2nd year)

Statistical Thinking

Some consideration of correlation and causation

Need to express confidence in interpolated and extrapolated predictions

Limited requirement to criticise experimental design

What they find easy

Calculator work (fitting the line, computing r)

Ranking the data

Drawing a scatterplot

Choosing the correlation coefficient

What they find hard

Interpretation of intercept and slope

The meaning of the residual plot: connection of residuals to assumptions

Big picture – correlation and causation

Using appropriate language

Independence

This material is early in the course, so they can practise it, and they often choose this question on the exam

Richard Arnold and Cushla Thomson

Overview – 3.12 – Statistical Reports

Standard taken by relatively few students (4300 in 2013)

E.g. taken by all Level 3 stats students at Onslow and Rongotai but not taken at Wellington Girls’, Tawa

4 credits

Externally assessed in the November exams

Around 4 weeks spent teaching then assessing the topic

My view: a great standard to wrap around lots of big ideas in stats – we teach this standard last.

Student view: ugh. Too wordy and we have to read the minds of the examiners.

Hard to get Excellence in – 3% E grades last year

Heavy reading and writing component – challenging for international students and local students with poor literacy skills

This PowerPoint, NZQA documents and the 2013 exam paper for 3.12 are on http://padlet.com/cushla_thomson/ymlwvp86vk6z or http://bit.ly/1qZyca8

3 main parts

Inference for poll results – proportions

Survey design; observational studies vs. experiments – critiquing studies and reports of studies

Non-sampling error

Inference for proportions

Inference for proportions in the context of poll results

The only place in Level 3 where students are assessed on classical confidence intervals – BUT…

Students use rules of thumb instead of exact CI (unless a rogue teacher teaches them the exact CIs anyway)

$p \pm \frac{1}{\sqrt{n}}$ for values of $p$ between 30% and 70%

Difference of two proportions from one group

Difference of two proportions from 2 groups
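The single-proportion rule of thumb can be compared with the usual large-sample 95% margin of error to see why it is restricted to proportions between 30% and 70%. This is a sketch only: the function names, the poll size of 1000 and the value z = 1.96 are assumptions made for the illustration.

```python
from math import sqrt

def rule_of_thumb_moe(n):
    """Rule-of-thumb margin of error for a poll of size n: 1 / sqrt(n)."""
    return 1 / sqrt(n)

def large_sample_moe(p, n, z=1.96):
    """Large-sample 95% margin of error: z * sqrt(p(1 - p) / n)."""
    return z * sqrt(p * (1 - p) / n)

n = 1000  # hypothetical poll size
print(f"rule of thumb: {rule_of_thumb_moe(n):.4f}")
for p in (0.3, 0.5, 0.7, 0.9):
    print(f"p = {p:.1f}: large-sample margin = {large_sample_moe(p, n):.4f}")

# Rule-of-thumb interval for a hypothetical poll result of 42%
p_poll = 0.42
low = p_poll - rule_of_thumb_moe(n)
high = p_poll + rule_of_thumb_moe(n)
print(f"interval: {low:.3f} to {high:.3f}")
```

At p = 0.5 the large-sample margin is 0.98/√n, so the rule of thumb is very close and slightly conservative. Between 30% and 70% it stays reasonably close, but at p = 0.9 it clearly overstates the margin of error.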

Experiments vs observational studies

Focus is on critiquing studies and newspaper reports

Difference between observational study and experiment

Can’t prove causation with observational study

Confounding variables

Experimental design principles:

Control and treatment groups

Random allocation

Blinding

Replication

Non-sampling error

Focus is on critiquing studies and reports

Sampling methods (good and bad)

Self-selection bias

Interviewer effects

Non-response bias

Questioner effects

Question wording problems

Transfer of findings beyond study population

Statistical reports in STAT 193

Inference for a proportion (1.5 lectures)

Taught as part of large sample estimation and testing

Comes after the Central Limit Theorem

Estimates and large sample confidence intervals

$\hat{p} \pm Z \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$

Margin of Error

Sample size calculation
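The large-sample interval, margin of error and sample size calculation can be sketched together. This is a hedged illustration rather than the course's gcalc workflow: the function names and poll numbers are hypothetical, and the sample size formula $n = Z^2 p(1 - p) / E^2$ uses the conservative choice p = 0.5 by default.

```python
from math import ceil, sqrt
from statistics import NormalDist

def z_value(confidence=0.95):
    """Two-sided standard Normal critical value, e.g. about 1.96 for 95%."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def proportion_ci(successes, n, confidence=0.95):
    """Large-sample interval p_hat +/- Z * sqrt(p_hat(1 - p_hat) / n)."""
    p_hat = successes / n
    moe = z_value(confidence) * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - moe, p_hat + moe, moe

def sample_size(moe, confidence=0.95, p=0.5):
    """Smallest n whose margin of error is at most moe (p = 0.5 is the worst case)."""
    z = z_value(confidence)
    return ceil(z ** 2 * p * (1 - p) / moe ** 2)

# Hypothetical poll: 420 of 1000 respondents in favour
low, high, moe = proportion_ci(420, 1000)
print(f"95% CI: {low:.3f} to {high:.3f} (margin of error {moe:.3f})")

# Sample size needed for a margin of error of 3 percentage points
print("n needed:", sample_size(0.03))
```

For a 3 percentage point margin at 95% confidence the formula gives n = 1068, which is consistent with the poll sizes of around 1000 that are commonly reported.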

Inference for a proportion (1.5 lectures)

Hypothesis test for a single sample $H_0: p = p_0$

Rejection region (inverse Normal: gcalc)

P-value (TEST function in gcalc)
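A single-sample test of $H_0: p = p_0$ can be sketched with both the rejection-region and P-value approaches. The sketch assumes a two-sided alternative and the large-sample Normal approximation; the function name and the counts are hypothetical and do not reproduce the calculator's TEST function.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion_test(successes, n, p0, alpha=0.05):
    """Two-sided large-sample z-test of H0: p = p0."""
    p_hat = successes / n
    se0 = sqrt(p0 * (1 - p0) / n)                    # standard error assuming H0 is true
    z = (p_hat - p0) / se0                           # test statistic
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # reject when |z| > critical
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided P-value
    return z, critical, p_value, abs(z) > critical

# Hypothetical data: 560 of 1000 respondents agree; test H0: p = 0.5
z, critical, p_value, reject = one_sample_proportion_test(560, 1000, 0.5)
print(f"z = {z:.3f}, critical value = {critical:.3f}")
print(f"P-value = {p_value:.5f}, reject H0: {reject}")
```

The two approaches always agree: the test statistic falls in the rejection region exactly when the P-value is below alpha. A one-sided test would use `1 - alpha` in the critical value and drop the factor of 2 in the P-value.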

Survey Design/Experimental Design Non-sampling error

Polls are used as examples for proportions, teaching sampling variability

No other explicit survey design material

Informal discussion of poor question design, leading questions, selection bias, observation/intervention: but not examined

This material mostly delayed until 2nd and 3rd year.

Assessment

Not required to draw questions out of a large amount of text

Context is spelled out to a large degree (Hypotheses are identified)

Only limited need for interpretation of findings

Need to know formulae for confidence intervals, margins of error, test statistics

Need to be able to identify and sketch a rejection region, critical value

Statistical Thinking

Some consideration of correlation and causation

Limited requirement to criticise experimental design

Appreciation of sampling variability, impact of sample size

What they find easy

Calculator work (entering components into the test function of the calculator)

Substituting into a formula, once they know which one to use

What they find hard

Using appropriate language

Rejection regions for one- and two-sided tests

Identifying the correct formula (small vs. large sample)

Sampling variability as a concept

Difference between sample and population

This material is in the middle of the course, but the small sample material that follows it (t-tests) makes it harder for them to recall this material. They often skip this question on the exam.
