Chapter 3
Some basic concepts of statistics
Population versus Sample
Population
• Numbers that describe the population are called _________________
• The population mean is represented by ________
• The population variance is represented by ________
Sample
• Numbers that describe the sample are called __________________
• The sample mean is represented by ________
• The sample variance is represented by ________
Sample mean and variance
Use the following data set: 5, 9, 8, 7, 6, 5, 8, 4, 1
• Calculate the sample mean:
• Calculate the sample variance:
• Calculate the sample standard deviation:
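These calculations can be checked in R (a quick sketch, not part of the original slides):

```r
# Data set from the slide
y <- c(5, 9, 8, 7, 6, 5, 8, 4, 1)

mean(y)   # sample mean: sum(y)/n = 53/9, about 5.889
var(y)    # sample variance: sum((y - mean(y))^2)/(n - 1), about 6.111
sd(y)     # sample standard deviation: sqrt(var(y)), about 2.472
```

Note that `var()` and `sd()` use the n − 1 (sample) denominator, matching the sample-variance formula.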
Population Mean and Standard Deviation
• μ = E(Y) = Σ yi·p(yi)
• Population variance: σ² = Σ (yi − μ)²·p(yi); population standard deviation: σ = √σ²
Use the following information to calculate the population mean, variance, and standard deviation:

Y    P(Y)
1    0.1
2    0.6
3    0.2
4    0.1
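A sketch in R of the discrete-distribution formulas above, applied to this table (not part of the original slides):

```r
# Probability table from the slide
y <- c(1, 2, 3, 4)
p <- c(0.1, 0.6, 0.2, 0.1)

mu     <- sum(y * p)            # population mean: 2.3
sigma2 <- sum((y - mu)^2 * p)   # population variance: 0.61
sigma  <- sqrt(sigma2)          # population standard deviation: about 0.781
c(mu, sigma2, sigma)
```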
Sampling distribution
• The distribution of all possible values of y-bar with n = 50.
• E(y-bar) = μ
• Var(y-bar) = σ²/n
Section 3.3 Summarizing Information in Populations and Samples: The Finite Population Case
• If the population is infinitely large, we can sample without replacement and still treat the selections as independent (the probability of selecting any observation does not change from draw to draw).
• However, if the population is finite, the probabilities of selecting elements change as more elements are selected.
(Example: rolling a die versus selecting cards from a standard 52-card deck)
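The die-versus-deck contrast can be made concrete with a small calculation (a sketch; the numbers follow from basic counting and are not in the slides):

```r
# Die rolls: each roll is independent, so P(six) never changes
p_six <- 1/6           # the same on every roll

# Card draws WITHOUT replacement: probabilities shift as cards are removed
p_ace_1 <- 4/52        # P(first card is an ace)
p_ace_2 <- 3/51        # P(second card is an ace | first was an ace)
c(p_ace_1, p_ace_2)    # about 0.0769 vs 0.0588: the draws are not independent
```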
Estimating a population total
• We will represent the population total as τ and its estimator (the statistic) as τ-hat.
• More to come on this in the next few chapters.
Sampling without replacement
• The same idea can be used with sampling without replacement, but the probabilities become more difficult to find (STT 315 helps you understand how to calculate these).
3.4 Sampling distribution
• In your introductory statistics class, you discovered that the sampling distribution of y-bar was normally distributed (if n was large enough) with mean μ and standard deviation σ/√n.
Tchebysheff’s theorem
• If n is NOT large enough to invoke the CLT and the population distribution is NOT normal, then we can still use Tchebysheff's theorem to get a lower bound: for any k > 1, at least 1 − 1/k² of the observations will fall within k standard deviations of the mean (this is a LOWER BOUND!!). Therefore: within 1 standard deviation, at least 0% (not very useful); within 2 standard deviations, at least 75%; within 3, at least 88.9%.
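A quick empirical check of the bound in R (a sketch with an assumed skewed population; the gamma parameters mirror the simulation code used later in these slides):

```r
set.seed(1)
# Strongly skewed, clearly non-normal data
x <- rgamma(10000, shape = 0.5, scale = 9)
m <- mean(x)
s <- sd(x)

# Compare the empirical proportion within k standard deviations
# with Tchebysheff's lower bound 1 - 1/k^2
for (k in 2:3) {
  emp   <- mean(abs(x - m) < k * s)
  bound <- 1 - 1/k^2
  cat("k =", k, " empirical:", round(emp, 3), " bound:", bound, "\n")
}
```

The empirical proportions comfortably exceed the bounds, which is the point: Tchebysheff guarantees a floor, not a good approximation.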
Finite population size
All the theory in your introductory statistics class (and so far in this class) assumes INDEPENDENT observations (an infinite population, or one so large that we can treat it as infinite).
What happens when this is not true?
R code:
x <- rgamma(80, shape = 0.5, scale = 9)
hist(x)

x.bar.dist <- function(x, n) {
  xbar <- vector(length = 1000)
  for (i in 1:1000) {
    temp <- sample(x, n, replace = FALSE)
    xbar[i] <- mean(temp)
  }
  return(xbar)
}
R code:
x.bar.dist1 <- function(n) {
  xbar <- vector(length = 1000)
  for (i in 1:1000) {
    temp <- rgamma(n, shape = 0.5, scale = 9)
    xbar[i] <- mean(temp)
  }
  return(xbar)
}
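To tie this simulation back to the theory of Section 3.4, the infinite-population sampler can be checked against E(y-bar) = μ and Var(y-bar) = σ²/n (a self-contained sketch that restates the simulation compactly; for gamma(shape = 0.5, scale = 9), μ = 0.5 × 9 = 4.5 and σ² = 0.5 × 9² = 40.5):

```r
set.seed(42)
# 1000 sample means, each from a fresh gamma sample of size n
# (i.e., sampling from an effectively infinite population)
n <- 50
xbar <- replicate(1000, mean(rgamma(n, shape = 0.5, scale = 9)))

mean(xbar)   # close to mu = 4.5
var(xbar)    # close to sigma^2 / n = 40.5 / 50 = 0.81
hist(xbar)   # roughly bell-shaped, as the CLT predicts
```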
3.5 Covariance and Correlation
• The relationship between two random variables is measured by the covariance.
• The covariance indicates how two variables “covary”.
• A positive covariance indicates a positive association.
• A negative covariance indicates a negative association.
• Zero covariance indicates no association (NOT necessarily independence!!!)
More on Covariance
• We calculate covariance by Cov(y1, y2) = E[(y1 − μ1)(y2 − μ2)].
• Look at graphs to discuss covariance (a measure of LINEAR dependency).
• However, covariance depends on the scale of the two variables.
• Correlation “standardizes” the covariance:
• Correlation: ρ = Cov(y1, y2) / (σ1σ2)
• Note that −1 ≤ ρ ≤ 1
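A simulated illustration of the scale-dependence point in R (a sketch with made-up variables, not from the slides):

```r
set.seed(7)
y1 <- rnorm(1000)
y2 <- 2 * y1 + rnorm(1000)   # positively associated with y1

cov(y1, y2)        # positive, but its size depends on the scale of the variables
cov(10 * y1, y2)   # rescaling y1 inflates the covariance tenfold

cor(y1, y2)        # standardized: always between -1 and 1
cor(10 * y1, y2)   # unchanged by the rescaling
```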
3.6 Estimation
• Since we do not know parameters, we estimate them with statistics!! If θ is the parameter of interest, then θ-hat is the estimator of θ. We want the following properties to hold:
1. E(θ-hat) = θ (the estimator is unbiased)
2. V(θ-hat) = σ²(θ-hat) is small (the estimator has small variance)
Error of Estimation and Bounds
• The error of estimation is defined as |θ-hat − θ|.
• Set a bound B on this error of estimation such that P(|θ-hat − θ| < B) = 1 − α.
• The value of B (the bound) can be thought of as the margin of error. In fact, this is how confidence intervals are constructed (when the sampling distribution of the statistic is normally distributed).
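In the normal case, the familiar intro-stats choice is B = z(α/2) · σ/√n (a sketch assuming that standard formula; the σ and n values below are made up for illustration):

```r
# Margin of error B such that P(|theta-hat - theta| < B) = 1 - alpha,
# assuming the estimator is normally distributed
alpha <- 0.05
sigma <- 6.4    # hypothetical population standard deviation
n     <- 50

B <- qnorm(1 - alpha/2) * sigma / sqrt(n)
B   # about 1.77: with probability 0.95 the estimate is within B of the parameter
```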