Some statistical ideas
Marian Scott
Statistics, University of Glasgow
September 2011
What shall we cover?
• Why might we need some statistical skills?
• Statistical inference: what is it?
• How to handle variation
• Exploring data
• Probability models
• Inferential tools: hypothesis tests and confidence intervals
Why quantify?
We need statistical skills to:
• Make sense of numerical information
• Summarise data
• Present results (graphically)
• Test hypotheses
• Construct models
• Decision making: which areas should be restricted?
• Prediction: what is the trend in temperature? Can we predict its level in 2050?
• Decision making: is it safe to eat fish?
• Regulatory: have emission control agreements reduced air pollutants?
• Understanding: when did things happen in the past?
Observed nitrogen signals in rivers, lakes
and groundwater in Europe (EEA).
• What is a trend and how should we evaluate it?
• How sure are we?
Trends in seasons over Europe (Global Change Biology, 2006)
• 21 countries, 125,000 studies, 542 plant and 19 animal species, 1971-2000
• Spring is on average 6 to 8 days earlier than it was 30 years ago
• Analysis of 254 national time series: the pattern of observed change in spring matches measured national warming (correlation coefficient –0.69, P < 0.001)
• What do the statistical terms mean?
Spatial patterns of change
• Spatial patterns of change may be important
• Changes in the start and end of the growing season between two years (1961 and 2004) are heterogeneous
Data types
• Numerical: a variable may be either continuous or discrete.
– For a discrete variable, the values taken are whole numbers (e.g. number of invertebrates).
– For a continuous variable, the values taken are real numbers (e.g. pH, alkalinity, DOC, temperature).
• Categorical: a limited number of categories or classes exist; each member of the sample belongs to one and only one of the classes.
– Compliance is a nominal categorical variable since the categories are unordered.
– Level of diluent (e.g. recorded as low, medium, high) would be an ordinal categorical variable since the different classes are ordered.
the statistical process
• A process that allows inferences about properties of a large collection of things (the population) to be made based on observations on a small number of individuals belonging to the population (the sample).
• The use of valid statistical sampling techniques increases the chance that a set of specimens (the sample, in the collective sense) is collected in a manner that is representative of the population.
What is the population?
• The population is the set of all items that could be sampled, such as all fish in a lake, all people living in the UK, all trees in a spatially defined forest, or all 20-g soil samples from a field. Appropriate specification of the population includes a description of its spatial extent and perhaps its temporal stability
What are the sampling units?
In some cases, sampling units are discrete entities (e.g. animals, trees), but in others the sampling unit might be investigator-defined and arbitrarily sized.
Example: technetium in shellfish
The objective here is to provide a measure (the average) of technetium in shellfish (e.g. lobsters for human consumption) for the west coast of Scotland.
• Population is all lobsters on the west coast
• Sampling unit is an individual animal
Variability exists amongst the sampling units and hence within the population.
• Summarising data: means, medians and other such statistics
• Plotting data: histograms, boxplots, stem and leaf plots, scatterplots
[Figure: boxplot with the median, lower quartile and upper quartile labelled]
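The summaries named on these slides (mean, median, lower and upper quartile) can be computed in a few lines. This is a minimal Python sketch using only the standard library; the values in `fs` are made up for illustration and are not taken from the slides, and the "inclusive" quartile rule is one of several conventions in use.

```python
import statistics

# Hypothetical right-skewed sample (illustrative values only, not real data).
fs = [2, 3, 3, 4, 5, 6, 8, 10, 14, 90]

mean = statistics.mean(fs)
median = statistics.median(fs)
# quantiles(n=4) returns three cut points: lower quartile, median, upper quartile.
lower_q, q2, upper_q = statistics.quantiles(fs, n=4, method="inclusive")

print(f"mean = {mean}, median = {median}")
print(f"lower quartile = {lower_q}, upper quartile = {upper_q}")
print(f"interquartile range = {upper_q - lower_q}")
```

With skewed data like this, the single large value pulls the mean well above the median, which is why a boxplot (built from the median and quartiles) is often the more robust picture.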
Example: bathing water quality
• All bathing water sites are classified as either ‘Excellent’, ‘Good’, ‘Sufficient’ or ‘Poor’ in terms of the quantities of two different microbiological indicator bacteria:
– Faecal Streptococci (FS)
– Faecal Coliforms (FC)
• ‘Sufficient’ is the minimum standard that bathing water sites are required to meet
• Classification for each site is based on the 90th & 95th percentiles of samples over the most recent 4 bathing seasons
joint work with Ruth Haggarty, Claire Ferguson
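The slide does not say how the 90th and 95th percentiles are calculated. As a hedged illustration only, the Python sketch below assumes the log10-Normal approach often used for microbiological counts: take log10 of the counts, then back-transform mean + z × standard deviation. The function name `lognormal_percentile` and the counts in `fs` are hypothetical.

```python
import math
import statistics

def lognormal_percentile(counts, z):
    """Percentile estimate assuming log10(counts) is roughly Normal.

    z is the standard Normal quantile (1.282 for the 90th percentile,
    1.645 for the 95th). All counts must be positive.
    """
    logs = [math.log10(c) for c in counts]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)  # sample standard deviation
    return 10 ** (mu + z * sigma)

# Hypothetical FS counts per 100 ml (illustrative values only).
fs = [4, 7, 10, 12, 15, 20, 28, 35, 60, 150]

p90 = lognormal_percentile(fs, 1.282)
p95 = lognormal_percentile(fs, 1.645)
print(f"90th percentile ~ {p90:.0f}, 95th percentile ~ {p95:.0f} per 100 ml")
```

Whether this parametric estimate or a simple empirical percentile is appropriate depends on how well the log-transformed data follow a Normal distribution.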
Preliminary Analysis
• There is considerable variation
– across different sites
– within the same site across different years
• Distribution of data is highly skewed, with evidence of outliers and in some cases bimodality
[Figure: boxplots of FS by year (2004 to 2007), SEPA location code 114567]
• Probability models: the Normal especially
• checking distributional assumptions
[Figure: histogram and Normal Q-Q plot of FS (per 100 ml), and histogram and Normal Q-Q plot of log10(FS), SEPA location code 4556]
Modelling continuous variables: checking Normality
• Normal probability plot
• Should show a straight line
• p-value of test is also reported (null: data are Normally distributed)
[Figure: Normal probability plot of C1. Legend: Mean 0.1211, StDev 1.015, N 100, AD 0.361, P-Value 0.439]
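The idea behind a Normal probability (Q-Q) plot can be sketched in Python with the standard library: sort the sample, pair each value with the matching standard Normal quantile, and see how close the pairs lie to a straight line. The sketch uses simulated log-Normal values as a stand-in for skewed counts, and it summarises straightness with a simple correlation, which is a substitute for (not the same as) the Anderson-Darling test reported on the slide. The helper names `qq_points` and `pearson_r` are hypothetical.

```python
import math
import random
from statistics import NormalDist

def qq_points(sample):
    """Return (theoretical, observed) quantile lists for a Normal Q-Q plot."""
    n = len(sample)
    observed = sorted(sample)
    # Plotting positions (i - 0.5) / n avoid the undefined quantiles at 0 and 1.
    theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return theoretical, observed

def pearson_r(x, y):
    """Pearson correlation, written out to keep the sketch self-contained."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(1)  # fixed seed so the run is repeatable
# Simulated right-skewed values (log-Normal), not real monitoring data.
raw = [rng.lognormvariate(3.0, 1.0) for _ in range(200)]
logged = [math.log10(v) for v in raw]

r_raw = pearson_r(*qq_points(raw))
r_log = pearson_r(*qq_points(logged))
print(f"Q-Q straightness: raw = {r_raw:.3f}, log10 = {r_log:.3f}")
```

A value close to 1 for the log-transformed data, and a clearly lower value for the raw data, mirrors what the histograms and Q-Q plots of FS and log10(FS) are meant to show.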
Statistical inference
• Confidence intervals
• Hypothesis testing and the p-value
• Statistical significance vs real-world importance
• A formal statistical procedure: confidence intervals
Confidence intervals- an alternative to hypothesis testing
• A confidence interval is a range of credible values for the population parameter. The confidence coefficient is the percentage of times that the method will in the long run capture the true population parameter.
• A common form is: sample estimate ± 2 × estimated standard error
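As a worked illustration of the "estimate plus or minus two standard errors" form, here is a minimal Python sketch for the mean of a sample, where the estimated standard error is s / sqrt(n). The function name `approx_ci_mean` and the measurements in `values` are hypothetical.

```python
import math
import statistics

def approx_ci_mean(sample, multiplier=2.0):
    """Approximate 95% confidence interval for the population mean.

    Uses estimate +/- multiplier * estimated standard error, where the
    standard error of the mean is s / sqrt(n).
    """
    n = len(sample)
    estimate = statistics.mean(sample)
    std_error = statistics.stdev(sample) / math.sqrt(n)
    return estimate - multiplier * std_error, estimate + multiplier * std_error

# Hypothetical measurements (illustrative values only).
values = [7.1, 6.8, 7.4, 7.0, 6.9, 7.3, 7.2, 6.7, 7.0, 7.1]
low, high = approx_ci_mean(values)
print(f"mean = {statistics.mean(values):.2f}, approx 95% CI = ({low:.2f}, {high:.2f})")
```

The multiplier 2 is a rounded version of the Normal value 1.96; for small samples the corresponding t-distribution multiplier is somewhat larger, so the interval above is slightly too narrow when n is small.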
• Another formal inferential procedure: hypothesis testing
Hypothesis Testing
• Null hypothesis: usually ‘no effect’
• Alternative hypothesis: ‘effect’
• Make a decision based on the evidence (the data)
• There is a risk of getting it wrong!
• Two types of error:
– reject null when we shouldn’t: Type I
– don’t reject null when we should: Type II
Significance Levels
• We cannot reduce probabilities of both Type I and Type II errors to zero.
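• So we control the probability of a Type I error. This is referred to as the significance level. The p-value computed from the data is compared with it: the null hypothesis is rejected when the p-value falls below the significance level.
• Generally a significance level of 0.05 (rejecting when p < 0.05) is considered a reasonable risk of a Type I error (beyond reasonable doubt).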
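A small simulation can make the Type I error idea concrete: if the null hypothesis is true and we reject whenever p < 0.05, we should wrongly reject in roughly 5% of studies. This Python sketch assumes a two-sided z-test with known standard deviation, chosen only because it needs nothing beyond the standard library; the function names and simulation settings are hypothetical.

```python
import math
import random
from statistics import NormalDist

def z_test_p_value(sample, mu0, sigma):
    """Two-sided p-value for H0: mean == mu0, with known sigma (z-test)."""
    n = len(sample)
    z = (sum(sample) / n - mu0) / (sigma / math.sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

def type_i_error_rate(alpha=0.05, n=20, n_sims=2000, seed=42):
    """Fraction of simulated studies that reject a TRUE null hypothesis."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]  # null is true here
        if z_test_p_value(sample, mu0=0.0, sigma=1.0) < alpha:
            rejections += 1
    return rejections / n_sims

rate = type_i_error_rate()
print(f"Observed Type I error rate at alpha = 0.05: {rate:.3f}")
```

The observed rate should sit close to the chosen significance level, which is the sense in which the significance level "controls" the Type I error.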
Statistical Significance vs. Practical Importance
• Statistical significance is concerned with the ability to discriminate between treatments given the background variation.
• Practical importance relates to the scientific domain and is concerned with scientific discovery and explanation.
Power
Power is related to Type II error:
power = 1 - probability of making a Type II error
Aim: to keep power as high as possible (also related to sample size calculations)
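The link between power, Type II error and sample size can be sketched for one simple case. The Python example below assumes a two-sided z-test with known standard deviation, which is an illustrative choice rather than anything stated on the slide; `z_test_power` and the effect size of 0.5 are hypothetical.

```python
import math
from statistics import NormalDist

def z_test_power(effect, sigma, n, alpha=0.05):
    """Power of a two-sided z-test to detect a true mean shift of `effect`."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    shift = effect * math.sqrt(n) / sigma
    # Probability that the test statistic falls in either rejection region.
    return std_normal.cdf(-z_crit + shift) + std_normal.cdf(-z_crit - shift)

for n in (10, 20, 50, 100):
    power = z_test_power(effect=0.5, sigma=1.0, n=n)
    print(f"n = {n:3d}: power = {power:.2f}, Type II error = {1 - power:.2f}")
```

Running the calculation in reverse (finding the smallest n that reaches a target power such as 0.8) is the basis of a sample size calculation.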
• Relationships: linear or otherwise
Correlations and linear relationships
• Pearson correlation
• Strength of linear relationship
• Simple indicator lying between –1 and +1
• Check your plots for linearity
Interpreting correlations
• The correlation coefficient is a measure of the strength of the linear association between two variables.
• If the relationship is non-linear, the coefficient can still be evaluated and may appear sensible, so beware: plot the data first.
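The warning about non-linear relationships is easy to demonstrate. In this minimal Python sketch the three `y` series are constructed examples (not data from the slides): an exact straight line, an exact quadratic, and an exponential curve. The helper `pearson_r` is written out by hand so the example has no dependencies.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

x = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * v + 1 for v in x]          # exact straight line
curved = [v ** 2 for v in x]             # exact, but symmetric and non-linear
exponential = [math.exp(v) for v in x]   # exact, monotone but strongly curved

r_linear = pearson_r(x, linear)
r_curved = pearson_r(x, curved)
r_exponential = pearson_r(x, exponential)
print(f"linear: {r_linear:.2f}, quadratic: {r_curved:.2f}, exponential: {r_exponential:.2f}")
```

The quadratic case gives a correlation of essentially zero even though y is completely determined by x, and the exponential case gives a fairly high value that hides the curvature. Only a plot reveals either problem.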
[Figure: three scatterplots: log P vs log Fe, log N vs log Fe, and log P vs log N. The values 0.167, 0.134 and 0.380 are printed alongside the plots.]
• what is a statistical model?
Statistical models
• Outcomes or Responses: these are the results of the practical work and are sometimes referred to as ‘dependent variables’.
• Causes or Explanations: these are the conditions or environment within which the outcomes or responses have been observed and are sometimes referred to as ‘independent variables’, but more commonly known as covariates.
Specifying a statistical model
• Models specify the way in which outcomes and causes link together, e.g.
• Chl-a ~ Temperature
• There should be an additional item on the right-hand side, giving the formula:
• Chl-a ~ Temperature + Error
• This says that Chl-a depends on temperature, but that there is also some random variability (error)
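To show what "Chl-a ~ Temperature + Error" means in practice, here is a minimal Python sketch that fits a straight line by least squares and treats what is left over as the error term. It assumes the simplest possible form of the model (a straight line); the function `fit_line` and the temperature and Chl-a values are hypothetical, made up for illustration.

```python
def fit_line(x, y):
    """Least-squares estimates for y = intercept + slope * x + error."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical temperature (deg C) and Chl-a readings (illustrative only).
temperature = [8.0, 10.0, 12.0, 14.0, 16.0, 18.0]
chl_a = [2.1, 2.9, 3.2, 4.1, 4.4, 5.3]

intercept, slope = fit_line(temperature, chl_a)
fitted = [intercept + slope * t for t in temperature]
# Residuals are the estimated "Error" part: observed minus fitted.
residuals = [obs - fit for obs, fit in zip(chl_a, fitted)]

print(f"Chl-a = {intercept:.2f} + {slope:.2f} * Temperature + error")
print("residuals:", [round(r, 2) for r in residuals])
```

In R, the same kind of model is fitted with `lm()` using a formula of the form response ~ explanatory variable.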
Example 1: are atmospheric SO2 concentrations declining?
• Measurements made at a monitoring station over a 20 year period
• A complex statistical model was developed to describe the pattern; the model apportions the variation among ‘trend’, seasonality and residual variation
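The slides do not give the form of the model used for the SO2 series. As a deliberately simplified stand-in, the Python sketch below splits a monthly series into a straight-line trend, a repeating monthly pattern and a residual. The series is synthetic (generated inside the code, not the GB02 data), and `fit_trend` and `decompose` are hypothetical helper names.

```python
import math

def fit_trend(y):
    """Least-squares straight line through y against time index 0, 1, 2, ..."""
    n = len(y)
    t_mean, y_mean = (n - 1) / 2, sum(y) / n
    stt = sum((t - t_mean) ** 2 for t in range(n))
    sty = sum((t - t_mean) * (v - y_mean) for t, v in enumerate(y))
    slope = sty / stt
    return y_mean - slope * t_mean, slope

def decompose(y, period=12):
    """Split a monthly series into trend + seasonal + residual components."""
    intercept, slope = fit_trend(y)
    trend = [intercept + slope * t for t in range(len(y))]
    detrended = [v - tr for v, tr in zip(y, trend)]
    # Seasonal effect = average detrended value for each calendar month.
    month_means = [
        sum(detrended[m::period]) / len(detrended[m::period]) for m in range(period)
    ]
    seasonal = [month_means[t % period] for t in range(len(y))]
    residual = [d - s for d, s in zip(detrended, seasonal)]
    return trend, seasonal, residual, slope

# Synthetic 20-year monthly series: declining trend plus an annual cycle.
series = [8.0 - 0.02 * t + 1.5 * math.sin(2 * math.pi * t / 12) for t in range(240)]
trend, seasonal, residual, slope = decompose(series)
print(f"estimated trend: {slope:.4f} units per month")
```

A negative estimated slope corresponds to a declining concentration. A fuller analysis would estimate trend and seasonality jointly, allow a non-linear trend, and attach a confidence interval to the slope before concluding that concentrations are declining.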
[Figure: SO2 monitored in GB02, plotted against observation number]
[Figure: plot of SO2 against time (1980 to 1995), monitored in GB02; lines = Model 3]
summary
• hypothesis tests and confidence intervals are used to make inferences
• we build statistical models to explore relationships and explain variation
• a general linear modelling framework is very flexible
• assumptions should be checked.
Statistics might be needed where?
• designing and evaluating monitoring and sampling networks; sampling strategies
• the analysis of observational records, (e.g. past climate indicators, water quality, pollutant trends); trends, spatio-temporal modelling, dealing with variation
• the study and modelling of extreme events (e.g. sea levels, flood prediction) for prediction and management of future occurrences; extremes, risk modelling, uncertainty
• evaluating the state of the environment; trends, uncertainty, prediction
Statistics might be needed where?
• the use of complex computer models to simulate the whole earth system (e.g. climate change and the carbon cycle); uncertainty, model evaluation
• the analysis of observational records, (e.g. past climate indicators, water quality, pollutant trends); trends, spatio-temporal modelling, dealing with variation
• the study and modelling of extreme events (e.g. sea levels, flood prediction) for prediction and management of future occurrences; extremes
• the evaluation and quantification of risk and uncertainty (e.g. volcanic or earthquake prediction); uncertainty, prediction
Statistics and the environment
• Appropriate statistical models can
– give added value to your data,
– give better descriptions of complex change behaviour,
– begin to tease out climate-change-driven effects in environmental quality, and
– handle natural variation.
• Greater, innovative statistical analysis is needed for environmental science
Statistics and the environment
As environmental scientists, we need to try to ensure that data are gathered under good statistical principles and that they are not left in the filing cabinet.
We need to ensure that good environmental science is served by good statistical science.
Environmental science should be “data and information rich”.
Statistics training
• we have chosen a number of key statistical topics to cover- there are many others
• each topic will be covered in a general sense but will also have practical examples for you to work through with guidance
• the main software tool will be R, which is freely available
• there should be lots of opportunities to ask questions