TRANSCRIPT
Physics 116C, 4/3/06
D. Pellett
Introduction to Statistics and Error Analysis
References: Data Reduction and Error Analysis for the Physical Sciences by Bevington and Robinson
Particle Data Group notes on probability and statistics, etc., online at http://pdg.lbl.gov/2004/reviews/contents_sports.html (reference: S. Eidelman et al., Phys. Lett. B 592, 1 (2004))
Any original presentations copyright D. Pellett 2006
Crucial Issues for Experimenters
• Accuracy of data
• Probability distributions for data
• Statistical parameters and estimates: µ, σ
• Propagation of errors
• Comparison with theory
• Curve fitting (also Essick, Ch. 8)
• Significance of results
• Chi-square test, confidence intervals
• Physics 116C applications:
• Radioactive decay; mean life of nuclide; Johnson noise
Measurement and Uncertainty Estimation
• Error: “Difference between the measured and true value”
• But: the “true value” is unknown...
• Make measurement and estimate uncertainty due to
• blunders: oops! – e.g., wrote down the wrong number – redo carefully... may appear as a statistical "outlier"
• random errors (statistical fluctuations): differ from trial to trial but average toward a better value – e.g., repeated measurements with a meter stick
• systematic errors: reproducible discrepancies – e.g., measurements with a cold metal ruler read large since the scale has contracted
Accuracy vs. Precision
• Accuracy: how close to true value
• Precision: how well the result has been determined - e.g., how many decimal places for the result
• Example: can improve length measurement with vernier caliper
• Can be precise
• Can also be accurate if careful
• But if you hold it away from the piece you are measuring and estimate the length by sight, the precision will be the same but the accuracy will suffer...
• Understand use of significant figures (see Bevington, Ch. 1)
• Often better to state error estimate: l = 1.423 ± 0.003 m
Estimating Statistical Uncertainty
• Repeated measurements to understand uncertainties
• Mean value gives better estimate
• Leads to study of probability and statistics
• Parent distributions – those of larger, perhaps infinite set of possible measurements (parent population) from which a finite sample is drawn
• Sample distributions
• Example: make N = 100 length measurements x_i
• Histogram results (frequency distribution)
• Calculate sample mean x̄, sample standard deviation s
• Compare with parent distribution (assumed Gaussian)
Statistics
Parent distribution mean:
\mu \equiv \lim_{N\to\infty} \frac{1}{N} \sum_{i=1}^{N} x_i

Variance:
\sigma^2 \equiv \lim_{N\to\infty} \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \right] = \lim_{N\to\infty} \left[ \frac{1}{N} \sum_{i=1}^{N} x_i^2 \right] - \mu^2

Sample mean:
\bar{x} \equiv \frac{1}{N} \sum_{i=1}^{N} x_i

Sample variance (unbiased estimate of the variance from the sample):
s^2 \equiv \frac{1}{N-1} \sum_{i=1}^{N} (x_i - \bar{x})^2

Standard deviations: \sigma = parent standard deviation; s = sample standard deviation.
(Divide by (N − 1) since the sample mean is used.)
Statistics (continued)
• Some other statistics:
• mode (most probable value)
• median (half the values less, half greater, in an infinite population). For example, for a Gaussian the median equals the mean.
• Finite sample (according to MathWorld): put the samples in increasing order. If the number of samples is odd, the value of the sample at the center is the median; if even, take the average of the values of the two samples on either side of the center. Reference: http://mathworld.wolfram.com/StatisticalMedian.html
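The finite-sample rule quoted from MathWorld translates directly into code (a small sketch; the function name is illustrative):

```python
def median(xs):
    # Put the samples in increasing order.
    ordered = sorted(xs)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the center sample is the median.
        return ordered[mid]
    # Even count: average the two samples on either side of the center.
    return 0.5 * (ordered[mid - 1] + ordered[mid])
```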
Distributions and Expectation Values
• Probability density function (p.d.f.) p(x) (following Bevington notation; usually the p.d.f. is written f(x)). The area of a strip of width \Delta x under the p(x) curve is the normalized probability of x falling within that \Delta x interval.
• Cumulative distribution function P(x) (again, this is usually written F(x)). P(a) is the probability that x \le a:
P(x) = \int_{-\infty}^{x} p(x') \, dx'
• Expectation value of a function of a random variable:
\langle u(x) \rangle = \int_{-\infty}^{\infty} u(x) \, p(x) \, dx
• For a discrete distribution:
\langle u(x) \rangle = \sum_i u(x_i) \, P(x_i)
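The discrete expectation-value sum can be illustrated with a fair die (a hypothetical example; all names are illustrative):

```python
def expectation(u, xs, probs):
    # <u(x)> = sum over i of u(x_i) * P(x_i)
    return sum(u(x) * p for x, p in zip(xs, probs))

faces = [1, 2, 3, 4, 5, 6]
probs = [1 / 6] * 6                                   # fair die
mean = expectation(lambda x: x, faces, probs)         # <x> = 3.5
mean_sq = expectation(lambda x: x * x, faces, probs)  # <x^2>
variance = mean_sq - mean ** 2                        # <x^2> - <x>^2
```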
Parent and Sample Distributions
• Simulate results of N = 100 repeated measurements x_i (random samples) from a parent Gaussian distribution f(x) = p_Gauss(x; \mu, \sigma)
• Calculate and compare the sample mean and sample standard deviation
• Compare the sample histogram with N f(x) \Delta x (where \Delta x = bin width)
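The simulation described above might look like this in Python (a sketch; µ, σ, and the seed are chosen arbitrarily for illustration):

```python
import random
import statistics

random.seed(42)
mu, sigma, N = 1.423, 0.003, 100           # illustrative parent parameters
samples = [random.gauss(mu, sigma) for _ in range(N)]

xbar = statistics.mean(samples)            # estimate of mu
s = statistics.stdev(samples)              # (N-1)-normalized estimate of sigma
# A histogram bin of width dx centered at x should hold roughly
# N * f(x) * dx entries, where f is the parent Gaussian p.d.f.
```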
Gaussian Distribution
Joint p.d.f. Statistics
Joint probability density function: f(x, y)

Marginal p.d.f.'s in x and y (all but one variable unobserved):
f_1(x) = \int_{-\infty}^{\infty} f(x, y) \, dy \qquad f_2(y) = \int_{-\infty}^{\infty} f(x, y) \, dx

Mean values of x, y (expectation values under the joint p.d.f.):
\mu_x = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x \, f(x, y) \, dx \, dy \qquad \mu_y = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} y \, f(x, y) \, dx \, dy

Covariance:
\mathrm{cov}[x, y] \equiv \langle (x - \mu_x)(y - \mu_y) \rangle = \langle xy \rangle - \mu_x \mu_y

Correlation coefficient:
\rho_{xy} \equiv \frac{\mathrm{cov}[x, y]}{\sigma_x \sigma_y} \qquad (-1 \le \rho_{xy} \le 1)

x, y are independent iff f(x, y) = f_1(x) f_2(y).
Independence implies \rho_{xy} = 0 (but the converse is not true).
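These definitions can be checked numerically with sample-based estimates (a sketch using population, 1/N, normalization; all names are illustrative):

```python
import math

def covariance(xs, ys):
    # cov[x,y] = <xy> - mu_x * mu_y, with 1/N (population) normalization
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum(x * y for x, y in zip(xs, ys)) / n - mx * my

def correlation(xs, ys):
    # rho_xy = cov[x,y] / (sigma_x * sigma_y); note cov[x,x] = variance
    sx = math.sqrt(covariance(xs, xs))
    sy = math.sqrt(covariance(ys, ys))
    return covariance(xs, ys) / (sx * sy)
```

For perfectly linearly related data (e.g., y = 2x), `correlation` returns 1, the upper bound quoted above.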
Example: 40,000 Samples from Joint p.d.f.
root [14] h2.GetCovariance(1,2)
(const Stat_t)(-1.81279999091343846e-05)
root [15] h2.GetCorrelationFactor(1,2)
(const Stat_t)(-6.44433599986376094e-05)
root [16] h2.GetRMS(1)
(const Stat_t)6.98881747937568965e-01
root [17] h2.GetRMS(2)
(const Stat_t)4.02501975150788061e-01
root [18] h2.Integral()
(const Stat_t)4.00000000000000000e+04
• Here f(x,y) is a product of two independent Gaussians in x and y (x, y are independent) – plots done with ROOT (root.cern.ch)
• Marginal distribution histograms can be found from sums of rows or columns (add sums in the margins of the chart)
\sigma_x = 0.70, \quad \sigma_y = 0.40
\mathrm{cov}[x, y] = -1.81 \times 10^{-5}
\rho_{xy} = -6.4 \times 10^{-5}
Example: Variables Not Independent
• Sum of two functions of x and y (each a Gaussian similar to prev. plot)
• Correlation coefficient = 0.77
Common Distributions and Properties
• Typical probability distributions
• Uniform distribution
• Gaussian distribution
• Central limit theorem and LabVIEW example
• Lorentzian distribution (a.k.a. Cauchy, Breit-Wigner)
• “Propagation of errors” overview (more later)
• Generation of Pseudorandom distributions with LabVIEW
• Next -
• Counting statistics and “square root of N”
• Binomial distribution and Gaussian limit for large n
• Poisson distribution, relation to binomial, Gaussian
Gaussian Histogram Generation VI
• Generate random number from uniform distribution with rand()
• Transform to sample from unit normal distribution using Inverse Normal Dist. vi
• Scale from z to desired mean and std. dev.
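The same recipe can be sketched in ordinary Python, using the standard library's `NormalDist.inv_cdf` in place of the LabVIEW Inverse Normal Dist. vi (names and parameter values are illustrative):

```python
import random
from statistics import NormalDist

def gauss_sample(mu, sigma, rng=random):
    # 1) Generate a uniform random number, like rand().
    u = 0.0
    while u == 0.0:              # inv_cdf needs u strictly inside (0, 1)
        u = rng.random()
    # 2) Transform to a sample from the unit normal via the inverse CDF.
    z = NormalDist().inv_cdf(u)
    # 3) Scale from z to the desired mean and standard deviation.
    return mu + sigma * z

random.seed(1)
samples = [gauss_sample(5.0, 2.0) for _ in range(10000)]
```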
Central Limit Theorem Example
• Sum of 6 samples from a uniform distribution approximates a Gaussian with mean = 3 and standard deviation = 0.707
• Cutoff at 4.24 standard deviations (the sum is confined to [0, 6])
• Probability for a Gaussian to exceed 4.24 standard deviations is 2.3 × 10^-5
Central Limit Example VI
• Add 6 samples from uniform distribution
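A Python sketch of the same experiment (the trial count and seed are chosen arbitrarily):

```python
import random
import statistics

random.seed(0)
# Each trial: add 6 samples from a uniform(0, 1) distribution.
sums = [sum(random.random() for _ in range(6)) for _ in range(20000)]

m = statistics.mean(sums)   # should be near 6 * 0.5 = 3
s = statistics.stdev(sums)  # should be near sqrt(6 * 1/12) = 0.707
```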
Overview: Propagation of Errors
• Brief overview
• Suppose we have x = f(u,v), and u and v are uncorrelated random variables with Gaussian distributions.
• Expand f in a Taylor series around x0 = f(u0,v0), where u0, v0 are the mean values of u and v, keeping the lowest-order terms:
• The distribution of \Delta x is a bivariate distribution in \Delta u and \Delta v. Under suitable conditions (see Bevington Ch. 3) we can approximate \sigma_x (the standard deviation of \Delta x) by
\Delta x \equiv x - x_0 = \left( \frac{\partial f}{\partial u} \right) \Delta u + \left( \frac{\partial f}{\partial v} \right) \Delta v

\sigma_x^2 \approx \left( \frac{\partial f}{\partial u} \right)^2 \sigma_u^2 + \left( \frac{\partial f}{\partial v} \right)^2 \sigma_v^2
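The approximation can be checked numerically (a sketch using central-difference partial derivatives; the example function x = uv and all numbers are illustrative):

```python
import math

def propagate(f, u0, v0, sigma_u, sigma_v, h=1e-6):
    # Partial derivatives of f at (u0, v0) by central differences.
    dfdu = (f(u0 + h, v0) - f(u0 - h, v0)) / (2 * h)
    dfdv = (f(u0, v0 + h) - f(u0, v0 - h)) / (2 * h)
    # sigma_x^2 ~ (df/du)^2 sigma_u^2 + (df/dv)^2 sigma_v^2
    return math.sqrt((dfdu * sigma_u) ** 2 + (dfdv * sigma_v) ** 2)

# Example: x = u * v, so sigma_x^2 = v^2 sigma_u^2 + u^2 sigma_v^2.
# With u = 2 +/- 0.1 and v = 3 +/- 0.2 this gives sigma_x = 0.5.
sigma_x = propagate(lambda u, v: u * v, 2.0, 3.0, 0.1, 0.2)
```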
Binomial Distribution Summary
• The binomial distribution arises in flipping coins or performing a similar experiment with exactly two possible outcomes repeatedly with independent trials.
• Call the outcome of a single trial “heads” with probability p
or “tails” with probability (1-p), allowing the possibility of unequal probabilities.
• Then the probability for getting x heads with n trials is
• The distribution has \mu = np and \sigma^2 = npq, where q = 1 - p.
• For n >>1, the discrete distribution can be approximated by a Gaussian p.d.f. with the above mean and standard deviation.
P_B(x; n, p) = \frac{n!}{x!(n-x)!} \, p^x (1-p)^{n-x}

(\sigma = \sqrt{npq})
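A quick numerical check of the summary above (a sketch; the values of n and p are arbitrary):

```python
from math import comb

def binom_pmf(x, n, p):
    # P_B(x; n, p) = n! / (x! (n-x)!) * p^x * (1-p)^(n-x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 10, 0.3
# Mean and variance from the p.m.f. should equal np and npq.
mean = sum(x * binom_pmf(x, n, p) for x in range(n + 1))
var = sum(x**2 * binom_pmf(x, n, p) for x in range(n + 1)) - mean**2
```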