given a sample from some population: what is a good “summary” value which well describes the...
Measures of Location
• Given a sample from some population:
  • What is a good “summary” value which describes the sample well?
• We will look at:
  • Average (arithmetic mean)
  • Median
  • Mode
For reference see (available on-line): “The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures”
LA Mohammed, B Found, M Caligiuri and D Rogers, J Forensic Sci 56(1), S136-S141 (2011)
Histogram Points of Interest
• Velocity for the first segment of genuine signatures in the (soon to be classic) Mohammed et al. study.
• What is a good summary number?
• How spread out is the data? (We will talk about this later)
• Arithmetic sample mean (average):
  • The sum of the data divided by the number of observations:
  • Intuitive formula: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$
  • Fancy formula: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$
• Example from LAM study:
  • Compute the average absolute size of segment 1 for the genuine signature of subject 2:

Subj. 2; Gen; Seg. 1    Absolute Size (cm)
1                       0.0548
2                       0.2951
3                       0.1026
4                       0.1005
5                       0.2491
6                       0.1287
7                       0.0496
8                       0.2299
9                       0.256
10                      0.0538
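The average asked for here can be checked with a few lines of R. This is only a sketch: the vector name `abs_size` is made up for illustration, and the values are the ten measurements listed above.

```r
# Absolute size (cm), segment 1, genuine signatures, subject 2
# (the ten values from the table; `abs_size` is an illustrative name)
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

n <- length(abs_size)

# "Intuitive formula": add everything up and divide by n
x_bar_manual <- sum(abs_size) / n

# Built-in equivalent
x_bar <- mean(abs_size)

print(x_bar)  # about 0.152 cm
```

Both routes give the same number; `mean()` is simply the convenient form.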
• Example:
  • More useful: Consider again Absolute Average Velocity for Genuine Signatures across all writers in the LAM study:
92 subjects × 10 measurements/subject = 920 velocity measurements
Average Absolute Average Velocity:
• Follow up question:
  • Is there a difference in the Abs. Avg. Veloc. for Genuine signatures vs. Disguised signatures (DWM and DNM)?
[Figure panels: Genuine, DWM, DNM]
• We will learn how to answer this, but not yet.
• Sample median:
  • Ordering the n pieces of data from smallest value to largest value, the median is the “middle value”:
  • If n is odd, median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ largest data point.
  • If n is even, median is the average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ largest data points.
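The odd/even rule can be written out directly in R. The helper name `sample_median` is hypothetical; it is here only to mirror the two cases above, and R's built-in `median()` does the same job. The data are the ten absolute-size values from the earlier example.

```r
# Hypothetical helper that follows the odd/even rule literally
sample_median <- function(x) {
  x_sorted <- sort(x)          # order from smallest to largest
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    x_sorted[(n + 1) / 2]      # n odd: the middle value
  } else {
    # n even: average of the (n/2)th and (n/2 + 1)th values
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2
  }
}

abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

sample_median(abs_size)  # n = 10, so the 5th and 6th sorted values are averaged
median(abs_size)         # built-in, same result
```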
• Example:
  • Median of Average Absolute Velocity for Genuine Signatures, LAM:
[Figure label: Avg]
• Sample mode:
  • Needs careful definition but basically:
  • The data value that occurs the most
[Figure labels: Avg, Med, mode = 9.2541]
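R has no built-in function for the statistical mode, so here is one possible sketch using a frequency table. The helper name `sample_mode` and the toy data are made up for illustration. For continuous measurements such as velocities, exact repeats are rare, so in practice the data would probably need to be rounded or binned first; that is part of the "careful definition".

```r
# Hypothetical helper: return the value(s) that occur most often
sample_mode <- function(x) {
  counts <- table(x)                                # frequency of each distinct value
  as.numeric(names(counts)[counts == max(counts)])  # keep the most frequent one(s)
}

toy <- c(2, 3, 3, 5, 7, 3, 5)  # made-up data
sample_mode(toy)               # 3 occurs three times

sample_mode(c(1, 1, 2, 2))     # ties: both values are returned
```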
• Some trivia:
  • Nice and symmetric: Mean = Median = Mode
[Figure labels: Mean, Modes]
Toss out the largest 5% and smallest 5% of the data
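The slide does not name the statistic, but tossing out both tails and then averaging is what is usually called a trimmed mean. Assuming that is what is meant, R's `mean()` has a `trim` argument giving the fraction dropped from each end. The data below are made up to show the effect of a single extreme value.

```r
# Made-up data: nineteen ordinary values and one extreme outlier
x <- c(1:19, 1000)

mean(x)               # pulled far upward by the outlier
mean(x, trim = 0.05)  # drops the smallest 5% and largest 5% first
median(x)             # for comparison
```

With 20 observations, 5% from each end is one data point, so the trimmed mean here is the average of 2 through 19.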
Measures of Data Spread
• Sample variance:
  • (Almost) the average of squared deviations from the sample mean:
    $s^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$
    where $x_i$ is data point $i$, $\bar{x}$ is the sample mean, and there are $n$ data points.
• Standard deviation is $s = \sqrt{s^2}$
• The sample average and standard dev. are the most common measures of central tendency and spread
• Sample average and standard dev. have the same units
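A short sketch of the variance formula in R, reusing the ten absolute-size values from the earlier example. Variable names are illustrative. The manual calculation is there only to show the n - 1 divisor; `var()` and `sd()` do the same thing.

```r
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

n <- length(abs_size)
x_bar <- mean(abs_size)

# Sum of squared deviations from the sample mean, divided by n - 1
s2_manual <- sum((abs_size - x_bar)^2) / (n - 1)

s2 <- var(abs_size)  # built-in sample variance (also uses n - 1)
s  <- sd(abs_size)   # standard deviation, same units as the data (cm)

c(variance = s2, std_dev = s)
```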
• If you have “enough” data, you can fit a smooth probability density function to the histogram
• Trivia: The famous (standardized) “Bell Curve”
  • Also called “normal” and “Gaussian”
  • Mean = 0
  • Std Dev = 1
  • Units are in Std Devs
  • ~68% of the data lie within ±1s, ~95% within ±2s, and ~99.7% within ±3s
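The percentages for the standardized bell curve can be recovered from R's normal cumulative distribution function `pnorm()`. The helper name `within_k_sd` is made up for this sketch.

```r
# Fraction of a standard normal distribution lying within +/- k standard deviations
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

coverage <- sapply(1:3, within_k_sd)
round(100 * coverage, 1)  # roughly 68.3, 95.4 and 99.7 percent
```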
• Sample range:
  • The difference between the largest and smallest value in the sample
  • Very sensitive to outliers (extreme observations)
• Percentiles:
  • The pth percentile data value, x, means that p-percent of the data are less than or equal to x.
  • Median = 50th percentile
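Range and percentiles are one-liners in R. The sketch below reuses the ten absolute-size values from the earlier example. Note that `quantile()` interpolates between data points by default, so on small samples its answers can differ slightly from the “less than or equal to” definition given above.

```r
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

# range() returns c(min, max); the sample range is their difference
sample_range <- diff(range(abs_size))
sample_range

# 25th, 50th and 75th percentiles
pct <- quantile(abs_size, probs = c(0.25, 0.50, 0.75))
pct

# The 50th percentile agrees with the median
median(abs_size)
```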
[Figure labels: 1st percentile = 1.52003, 99th percentile = 1.52008]
Confidence Intervals
• A confidence interval (CI) gives a range in which a true population parameter may be found.
• Specifically, (1-α)×100% CIs for a parameter, constructed from a random sample (of a given sample size), will contain the true value of the parameter approximately (1-α)×100% of the time.
• α is called the “level of significance”
• Different from tolerance and prediction intervals
• Caution: IT IS NOT CORRECT to say that there is a (1-α)×100% probability that the true value of a parameter is between the bounds of any given CI.
• Graphical representation of 90% CIs for a parameter:
  • Take a sample. Compute a CI.
  • Here 90% of the CIs contain the true value of the parameter
[Figure: CIs from repeated samples plotted against the true value of the parameter]
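The “take a sample, compute a CI” picture can be imitated by simulation. Everything in this sketch is made up for illustration (a normal population with mean 10 and standard deviation 2, samples of size 25, 2000 repetitions); the point is only that about 90% of the intervals end up containing the true mean, while any single interval either contains it or does not.

```r
set.seed(42)       # fixed seed so the run is reproducible

true_mean <- 10    # assumed "true" population mean (made up)
n_reps <- 2000     # number of repeated samples

covers <- replicate(n_reps, {
  x <- rnorm(25, mean = true_mean, sd = 2)    # take a sample
  ci <- t.test(x, conf.level = 0.90)$conf.int # compute a 90% CI for the mean
  ci[1] <= true_mean && true_mean <= ci[2]    # does it contain the true value?
})

coverage <- mean(covers)
coverage  # should land close to 0.90
```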
• Construction of a CI for a mean depends on:
  • Sample size n
  • Standard error for means, $s_{\bar{x}} = s/\sqrt{n}$
  • Level of confidence 1-α
    • α is significance level
    • Use α to compute tc-value
• (1-α)×100% CI for population mean using a sample average and standard error is:
  $\left[\bar{x} - t_c\, s_{\bar{x}},\ \bar{x} + t_c\, s_{\bar{x}}\right]$
• Compute a 99% confidence interval for the mean using this sample set:

Fragment #    Fragment nD
1             1.52005
2             1.52003
3             1.52001
4             1.52004
5             1.52000
6             1.52001
7             1.52008
8             1.52011
9             1.52008
10            1.52008
11            1.52008

Putting this together (intermediate values shown rounded): [1.52005 - (3.17)(0.00001), 1.52005 + (3.17)(0.00001)]
99% CI for the mean = [1.52002, 1.52009]
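The same interval can be reproduced in R from the eleven fragment values. This is a sketch with illustrative variable names; `qt()` supplies the tc-value for n - 1 = 10 degrees of freedom, and `t.test()` gives the interval directly as a cross-check.

```r
# Refractive index (nD) of the 11 fragments listed above
nD <- c(1.52005, 1.52003, 1.52001, 1.52004, 1.52000, 1.52001,
        1.52008, 1.52011, 1.52008, 1.52008, 1.52008)

n     <- length(nD)
x_bar <- mean(nD)            # sample average
se    <- sd(nD) / sqrt(n)    # standard error of the mean
alpha <- 0.01                # 99% confidence
t_c   <- qt(1 - alpha / 2, df = n - 1)  # two-sided critical t value, about 3.17

ci <- c(x_bar - t_c * se, x_bar + t_c * se)
round(ci, 5)

# Cross-check with the built-in one-sample t procedure
t.test(nD, conf.level = 0.99)$conf.int
```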
Hypothesis Testing
• A hypothesis is an assumption about a statistic.
• Form a hypothesis about the statistic
  • H0, the null hypothesis
• Identify the alternative hypothesis, Ha
• “Accept” H0 or “Reject” H0 in favour of Ha at a certain confidence level (1-α)×100%
  • Technically, “Accept” means “Do not Reject”
• The testing is done with respect to how sample values of the statistic are distributed
  • Student’s-t
  • Gaussian
  • Binomial
  • Poisson
  • Bootstrap, etc.
• Hypothesis testing can go wrong:

                   H0 is really true                 H0 is really false
Test rejects H0    Type I error. Probability is α    OK
Test accepts H0    OK                                Type II error. Probability is β

• 1-β is called the test’s power
• Do the thicknesses of float glass differ from non float glass?
• How can we use a computer to decide?
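One way to see that the Type I error probability is α is to simulate many tests in which H0 really is true. The set-up below is entirely made up (two groups of 15 drawn from the same normal population, 2000 repetitions, α = 0.05); roughly 5% of the tests should reject H0 anyway.

```r
set.seed(1)      # fixed seed so the run is reproducible

alpha  <- 0.05   # significance level
n_reps <- 2000   # number of simulated experiments

rejects <- replicate(n_reps, {
  a <- rnorm(15)                  # both groups come from the same population,
  b <- rnorm(15)                  # so H0 (no difference in means) is really true
  t.test(a, b)$p.value < alpha    # TRUE means the test (wrongly) rejects H0
})

type1_rate <- mean(rejects)
type1_rate  # should be near alpha
```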
Importing External Data From a Spreadsheet
• Use R function read.csv:
  • Import (fake) float glass thickness data in file glass_thickness_simulated.csv:
read.csv("/Path/to/your/data/glass_thickness_simulated.csv", header=TRUE)
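A self-contained sketch of the import-then-test workflow follows. The layout of glass_thickness_simulated.csv is not shown in these slides, so the column names `type` and `thickness`, the group labels, and the simulated numbers are all assumptions; the sketch writes its own small CSV to a temporary file so that it runs anywhere.

```r
set.seed(1)

# Made-up stand-in for the real file: two glass types, 20 thickness values each
sim <- data.frame(
  type      = rep(c("float", "non_float"), each = 20),
  thickness = c(rnorm(20, mean = 4.0, sd = 0.1),
                rnorm(20, mean = 4.2, sd = 0.1))
)
csv_path <- tempfile(fileext = ".csv")
write.csv(sim, csv_path, row.names = FALSE)

# Import the data the same way as for a real spreadsheet export
glass <- read.csv(csv_path, header = TRUE)
head(glass)

# Two-sample t-test: H0 says the mean thicknesses are equal
res <- t.test(thickness ~ type, data = glass)
res$p.value  # a small p-value leads to rejecting H0 at the chosen alpha
```

With the real file, only the path and the column names in the formula would need to change.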
Analysis of Variance
• Standard hypothesis testing is great for comparing two statistics.
  • What if we have more than two statistics to compare?
• Use analysis of variance (ANOVA)
• Note that the statistics to be compared must all be of the same type
  • Usually the statistic is an average “response” for different experimental conditions or treatments.
• H0 for ANOVA
  • The values being compared are not statistically different at the (1-α)×100% level of confidence
• Ha for ANOVA
  • At least one of the values being compared is statistically distinct.
• ANOVA computes an F-statistic from the data and compares it to a critical Fc value determined by:
  • Level of confidence
  • D.O.F. 1 = # of levels - 1
  • D.O.F. 2 = # of obs. - # of levels
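The F-statistic versus Fc comparison can be sketched with R's `aov()` and `qf()`. The data are made up (three levels A, B, C with 10 observations each, and level C shifted), and the variable names are illustrative.

```r
set.seed(7)

# Made-up data: 3 levels, 10 observations per level; level C has a shifted mean
dat <- data.frame(
  level    = factor(rep(c("A", "B", "C"), each = 10)),
  response = c(rnorm(10, mean = 0), rnorm(10, mean = 0), rnorm(10, mean = 3))
)

fit <- aov(response ~ level, data = dat)
tab <- summary(fit)[[1]]            # the ANOVA table
f_stat <- tab[["F value"]][1]       # F-statistic computed from the data

alpha <- 0.05
dof1 <- nlevels(dat$level) - 1          # number of levels - 1
dof2 <- nrow(dat) - nlevels(dat$level)  # number of obs. - number of levels
f_c  <- qf(1 - alpha, dof1, dof2)       # critical F value

reject_h0 <- f_stat > f_c  # TRUE: at least one level mean is distinct
c(F = f_stat, Fc = f_c)
```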
• Levels are “categorical variables” and can be:
• Group names
• Experimental conditions
• Experimental treatments