given a sample from some population: what is a good “summary” value which well describes the...


Upload: bertina-judith-burns

Post on 26-Dec-2015


Page 1: Given a sample from some population: What is a good “summary” value which well describes the sample? We will look at: Average (arithmetic mean) Median

Measures of Location

• Given a sample from some population:

• What is a good “summary” value that describes the sample well?

• We will look at:

• Average (arithmetic mean)

• Median

• Mode

For reference see (available on-line): “The Dynamic Character of Disguised Behaviour for Text-based, Mixed and Stylized Signatures”

LA Mohammed, B Found, M Caligiuri and D Rogers, J Forensic Sci 56(1), S136-S141 (2011)

Page 2:

Histogram Points of Interest

• Velocity for the first segment of genuine signatures in the (soon-to-be-classic) Mohammed et al. study.

• What is a good summary number?

• How spread out are the data? (We will talk about this later.)

Page 3:

Measures of Location

• Arithmetic sample mean (average):

• The sum of the data divided by the number of observations:

Intuitive formula: $\bar{x} = \dfrac{x_1 + x_2 + \cdots + x_n}{n}$

Fancy formula: $\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_i$

Page 4:

Measures of Location

• Example from LAM study:

• Compute the average absolute size of segment 1 for the genuine signature of subject 2:

Subj. 2; Gen; Seg. 1    Absolute Size (cm)
1                       0.0548
2                       0.2951
3                       0.1026
4                       0.1005
5                       0.2491
6                       0.1287
7                       0.0496
8                       0.2299
9                       0.256
10                      0.0538
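As a minimal sketch of the arithmetic on this slide, the ten absolute-size values for subject 2 can be averaged in R both "by hand" and with the built-in `mean()`. The variable names are illustrative and are not taken from the slides.

```r
# Absolute size (cm), segment 1, genuine signatures, subject 2 (values from the slide)
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

# "Intuitive" formula: add everything up, divide by the number of observations
avg_by_hand <- sum(abs_size) / length(abs_size)

# Built-in equivalent
avg_builtin <- mean(abs_size)

print(avg_by_hand)
print(avg_builtin)
```

Both lines should print the same value, about 0.152 cm.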

Page 5:

Measures of Location

• Example:

• More useful: Consider again the Absolute Average Velocity for Genuine Signatures across all writers in the LAM study:

92 subjects × 10 measurements/subject = 920 velocity measurements

Average Absolute Average Velocity:

Page 6:

Measures of Location

• Follow up question:

• Is there a difference in the Abs. Avg. Veloc. for Genuine signatures vs. Disguised signatures (DWM and DNM)?

[Figure panels: Genuine, DWM, DNM]

• We will learn how to answer this, but not yet.

Page 7:

Measures of Location

• Sample median:

• Ordering the n pieces of data from smallest value to largest value, the median is the “middle value”:

• If n is odd, the median is the $\left(\frac{n+1}{2}\right)^{\text{th}}$ largest data point.

• If n is even, the median is the average of the $\left(\frac{n}{2}\right)^{\text{th}}$ and $\left(\frac{n}{2}+1\right)^{\text{th}}$ largest data points.
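The odd/even rule for the median can be sketched in a few lines of R. The helper `median_by_rule` and the two small samples are hypothetical illustrations, not part of the slides; base R's `median()` does the same job.

```r
# Hypothetical helper: median via the "middle value" rule
median_by_rule <- function(x) {
  x_sorted <- sort(x)            # order from smallest to largest
  n <- length(x_sorted)
  if (n %% 2 == 1) {
    x_sorted[(n + 1) / 2]        # n odd: the single middle value
  } else {
    (x_sorted[n / 2] + x_sorted[n / 2 + 1]) / 2   # n even: average the two middle values
  }
}

odd_sample  <- c(7, 1, 5)        # made-up data, n = 3
even_sample <- c(7, 1, 5, 3)     # made-up data, n = 4

print(median_by_rule(odd_sample))    # 5
print(median_by_rule(even_sample))   # 4
```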

Page 8:

Measures of Location

• Example:

• Median of Average Absolute Velocity for Genuine Signatures, LAM:

[Figure annotation: Avg]

Page 9:

Measures of Location

• Sample mode:

• Needs careful definition but basically:

• The data value that occurs the most

mode = 9.2541

[Figure annotations: Avg, Med]
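Base R has no built-in function for the statistical mode, so one possible sketch counts occurrences with `table()`. The helper `sample_modes` is hypothetical and assumes numeric data with repeated values. For continuous measurements such as velocities the values would usually need to be binned or rounded first, which is part of the "careful definition" mentioned above.

```r
# Hypothetical helper: return every value that ties for the highest count
sample_modes <- function(x) {
  counts <- table(x)
  # names(counts) are character strings; convert back, assuming numeric data
  as.numeric(names(counts)[counts == max(counts)])
}

print(sample_modes(c(2, 3, 3, 5, 7, 7, 7, 9)))   # one mode: 7
print(sample_modes(c(1, 1, 2, 2, 3)))            # two modes: 1 and 2
```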

Page 10:

Measures of Location

• Some trivia:

Nice and symmetric: Mean = Median = Mode

[Figure labels: Mean, Modes]

Page 11:

Measures of Location

Page 12:

Measures of Location

Toss out the largest 5% and smallest 5% of the data
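Assuming this slide describes a trimmed mean (averaging what is left after discarding both tails), R's `mean()` supports it through the `trim` argument, which is the fraction dropped from each end. The data below are made up to show the effect of two extreme values.

```r
# Made-up sample with one extreme value at each end (illustrative only)
x <- c(-50, 1:18, 100)     # n = 20, so 5% is one observation per tail

plain_mean   <- mean(x)                # pulled around by -50 and 100
trimmed_mean <- mean(x, trim = 0.05)   # drops the smallest 5% and largest 5% first

print(plain_mean)     # 11.05
print(trimmed_mean)   # 9.5
```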

Page 13:

Measures of Data Spread

• Sample variance:

• (Almost) the average of squared deviations from the sample mean.

$s^2 = \dfrac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2$

where $x_i$ is data point i, $\bar{x}$ is the sample mean, and there are n data points.

• Standard deviation is $s = \sqrt{s^2}$

• The sample average and standard deviation are the most common measures of central tendency and spread

• Sample average and standard deviation have the same units
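A short sketch of the variance and standard deviation formulas, reusing the subject 2 absolute-size values from Page 4. The hand-written versions divide by n - 1, which is also what R's built-in `var()` and `sd()` do. Variable names are illustrative.

```r
# Absolute size (cm), subject 2, genuine signatures, segment 1 (from Page 4)
abs_size <- c(0.0548, 0.2951, 0.1026, 0.1005, 0.2491,
              0.1287, 0.0496, 0.2299, 0.2560, 0.0538)

n     <- length(abs_size)
x_bar <- mean(abs_size)

# Sample variance: sum of squared deviations divided by (n - 1)
s2_by_hand <- sum((abs_size - x_bar)^2) / (n - 1)

# Standard deviation: square root of the variance, same units as the data (cm)
s_by_hand <- sqrt(s2_by_hand)

print(c(s2_by_hand, var(abs_size)))   # the two entries should agree
print(c(s_by_hand, sd(abs_size)))     # the two entries should agree
```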

Page 14:

Measures of Data Spread

• If you have “enough” data, you can fit a smooth probability density function to the histogram
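One way to draw a smooth curve over a histogram in R is a kernel density estimate via `density()`. This is only a sketch on simulated data; the slides do not say which fitting method was used, and a parametric fit would be another option.

```r
set.seed(1)                          # reproducible made-up data
x <- rnorm(500, mean = 10, sd = 2)   # stand-in for "enough" measurements

# Histogram on the density scale so the curve and bars are comparable
hist(x, freq = FALSE, main = "Histogram with smooth density estimate")

dens <- density(x)                   # kernel density estimate (Gaussian kernel by default)
lines(dens)                          # overlay the smooth curve
```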

Page 15:

Measures of Data Spread

• Trivia: The famous (standardized) “Bell Curve”

• Also called “normal” and “Gaussian”

• Mean = 0

• Std Dev = 1

• Units are in Std Devs

Fraction of the data within:

±1s: ~ 68%

±2s: ~ 95%

±3s: ~ 99.7%
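The percentages for the standardized bell curve can be checked with R's normal cumulative distribution function `pnorm()`. This is a small sketch; `k` is just the number of standard deviations on each side of the mean.

```r
k <- 1:3                               # 1, 2 and 3 standard deviations
coverage <- pnorm(k) - pnorm(-k)       # area under the standard normal between -k and +k

print(round(100 * coverage, 1))        # 68.3 95.4 99.7
```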

Page 16:

Measures of Data Spread

Page 17:

Measures of Data Spread

• Sample range:

• The difference between the largest and smallest value in the sample

• Very sensitive to outliers (extreme observations)

• Percentiles:

• The pth percentile data value, x, means that p percent of the data are less than or equal to x.

• Median = 50th percentile
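A brief sketch of the range and percentile ideas on made-up data. Note that R's `quantile()` interpolates between data points by default and offers several alternative definitions through its `type` argument, so its results can differ slightly from the "less than or equal to" definition on small samples.

```r
x <- c(12, 15, 11, 19, 14, 13, 95)    # made-up data; 95 is an outlier

# Sample range: largest minus smallest value
sample_range <- max(x) - min(x)               # 95 - 11 = 84

# Dropping the single outlier shrinks the range dramatically
range_no_outlier <- diff(range(x[x < 95]))    # 19 - 11 = 8

# The 50th percentile is the median
p50 <- quantile(x, probs = 0.50, names = FALSE)

print(sample_range)
print(range_no_outlier)
print(c(p50, median(x)))
```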

Page 18:

Measures of Data Spread

1st percentile: 1.52003

99th percentile: 1.52008

Page 19:

Measures of Data Spread

Page 20:

Confidence Intervals

• A confidence interval (CI) gives a range in which a true population parameter may be found.

• Specifically, (1-α)×100% CIs for a parameter, constructed from a random sample (of a given sample size), will contain the true value of the parameter approximately (1-α)×100% of the time.

• α is called the “level of significance”

• Different from tolerance and prediction intervals

Page 21:

Confidence Intervals

• Caution: IT IS NOT CORRECT to say that there is a (1-α)×100% probability that the true value of a parameter is between the bounds of any given CI.

Graphical representation of 90% CIs for a parameter:

[Figure: repeated “Take a sample. Compute a CI.” steps plotted against the true value of the parameter. Here 90% of the CIs contain the true value of the parameter.]
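The "take a sample, compute a CI" picture can be imitated with a small simulation. This is a sketch under assumed settings (normal data, true mean 10, samples of size 20, 1000 repetitions), none of which come from the slides. The fraction of intervals that contain the true mean should come out close to 90%.

```r
set.seed(42)
true_mean <- 10      # "true value of parameter" (known only because we simulate)
n_reps    <- 1000    # number of times we take a sample and compute a CI
n         <- 20      # observations per sample

covers <- replicate(n_reps, {
  x  <- rnorm(n, mean = true_mean, sd = 2)        # take a sample
  ci <- t.test(x, conf.level = 0.90)$conf.int     # compute a 90% CI for the mean
  ci[1] <= true_mean && true_mean <= ci[2]        # did this CI catch the true value?
})

coverage_rate <- mean(covers)   # fraction of CIs containing the true mean
print(coverage_rate)
```

The coverage statement is about the procedure over many repetitions, not about any single interval, which is the point of the caution above.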

Page 22:

Confidence Intervals

• Construction of a CI for a mean depends on:

• Sample size n

• Standard error for means, $s_{\bar{x}} = s/\sqrt{n}$

• Level of confidence 1-α

• α is significance level

• Use α to compute tc-value

• (1-α)×100% CI for population mean using a sample average and standard error is:

$[\bar{x} - t_c\, s_{\bar{x}},\ \bar{x} + t_c\, s_{\bar{x}}]$

Page 23:

Confidence Intervals

• Compute a 99% confidence interval for the mean using this sample set:

Fragment #    Fragment nD
1             1.52005
2             1.52003
3             1.52001
4             1.52004
5             1.52000
6             1.52001
7             1.52008
8             1.52011
9             1.52008
10            1.52008
11            1.52008

Putting this together: [1.52005 - (3.17)(0.00001), 1.52005 + (3.17)(0.00001)]

99% CI for the mean = [1.52002, 1.52009]
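The same 99% interval can be reproduced in R from the eleven fragment values. The sketch below computes it step by step with `qt()` and then cross-checks against `t.test()`. Variable names are illustrative.

```r
# Refractive index (nD) of the 11 glass fragments from the slide
nD <- c(1.52005, 1.52003, 1.52001, 1.52004, 1.52000, 1.52001,
        1.52008, 1.52011, 1.52008, 1.52008, 1.52008)

n     <- length(nD)
x_bar <- mean(nD)                  # sample average
se    <- sd(nD) / sqrt(n)          # standard error of the mean
alpha <- 0.01                      # 99% confidence
t_c   <- qt(1 - alpha / 2, df = n - 1)   # two-sided critical t-value, n - 1 d.o.f.

ci_by_hand <- c(x_bar - t_c * se, x_bar + t_c * se)

# Cross-check with the built-in one-sample t procedure
ci_ttest <- t.test(nD, conf.level = 0.99)$conf.int

print(round(t_c, 2))          # 3.17
print(round(ci_by_hand, 5))   # 1.52002 1.52009
```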

Page 24:

Confidence Intervals

Page 25:

Hypothesis Testing

• A hypothesis is an assumption about a statistic.

• Form a hypothesis about the statistic

• H0, the null hypothesis

• Identify the alternative hypothesis, Ha

• “Accept” H0 or “Reject” H0 in favour of Ha at a certain confidence level (1-α)×100%

• Technically, “Accept” means “Do not Reject”

• The testing is done with respect to how sample values of the statistic are distributed

• Student’s-t

• Gaussian

• Binomial

• Poisson

• Bootstrap, etc.

Page 26:

Hypothesis Testing

• Hypothesis testing can go wrong:

                    H0 is really true                 H0 is really false
Test rejects H0     Type I error. Probability is α    OK
Test accepts H0     OK                                Type II error. Probability is β

• 1-β is called the test’s power

• Do the thicknesses of float glass differ from non float glass?

• How can we use a computer to decide?

Page 27:

Importing External Data From a Spreadsheet

• Use R function read.csv:

• Import (fake) float glass thickness data in file glass_thickness_simulated.csv:

read.csv("/Path/to/your/data/glass_thickness_simulated.csv", header=TRUE)

Page 28:

Hypothesis Testing
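A minimal sketch of how the float vs. non-float thickness question might be put to the computer with a two-sample t-test. The layout of the real CSV file is not shown in the slides, so the data frame below is simulated and its column names (`type`, `thickness`), group means and sample sizes are assumptions made for illustration.

```r
set.seed(7)

# Simulated stand-in for the imported data; column names are hypothetical
glass <- data.frame(
  type      = rep(c("float", "non_float"), each = 30),
  thickness = c(rnorm(30, mean = 4.0, sd = 0.1),    # made-up float glass thicknesses
                rnorm(30, mean = 4.5, sd = 0.1))    # made-up non-float thicknesses
)

# H0: the two mean thicknesses are equal; Ha: they differ
result <- t.test(thickness ~ type, data = glass)   # Welch two-sample t-test by default

alpha     <- 0.05
reject_h0 <- result$p.value < alpha   # TRUE means "Reject H0 in favour of Ha"

print(result$p.value)
print(reject_h0)
```

With group means this far apart relative to the spread, the test rejects H0. With real data the conclusion depends on the measurements.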

Page 29:

Analysis of Variance

• Standard hypothesis testing is great for comparing two statistics.

• What if we have more than two statistics to compare?

• Use analysis of variance (ANOVA)

• Note that the statistics to be compared must all be of the same type

• Usually the statistic is an average “response” for different experimental conditions or treatments.

Page 30:

Analysis of Variance

• H0 for ANOVA

• The values being compared are not statistically different at the (1-α)×100% level of confidence

• Ha for ANOVA

• At least one of the values being compared is statistically distinct.

• ANOVA computes an F-statistic from the data and compares it to a critical Fc value for

• Level of confidence

• D.O.F. 1 = # of levels - 1

• D.O.F. 2 = # of obs. - # of levels
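The F-statistic, the two degrees of freedom and the critical value can be sketched with R's `aov()` and `qf()`. The three groups below are simulated, and the level names, group means and the 95% confidence level are assumptions chosen for illustration.

```r
set.seed(3)

# Simulated responses for three hypothetical levels, 10 observations each
dat <- data.frame(
  level    = factor(rep(c("A", "B", "C"), each = 10)),
  response = c(rnorm(10, mean = 5.0, sd = 0.5),
               rnorm(10, mean = 5.2, sd = 0.5),
               rnorm(10, mean = 7.0, sd = 0.5))
)

fit <- aov(response ~ level, data = dat)
tab <- summary(fit)[[1]]              # ANOVA table: Df, Sum Sq, Mean Sq, F value, Pr(>F)

dof1 <- nlevels(dat$level) - 1            # number of levels - 1
dof2 <- nrow(dat) - nlevels(dat$level)    # number of obs. - number of levels

f_stat <- tab[["F value"]][1]                    # F-statistic computed from the data
f_crit <- qf(1 - 0.05, df1 = dof1, df2 = dof2)   # critical Fc at 95% confidence

print(tab)
print(c(F = f_stat, Fc = f_crit))     # reject H0 when F exceeds Fc
```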

Page 31:

Analysis of Variance

• Levels are “categorical variables” and can be:

• Group names

• Experimental conditions

• Experimental treatments

Page 32:

Analysis of Variance