Posted on 11-Jan-2016
AP Statistics Chapters 0 & 1 Review
Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of several groups or categories. A quantitative variable takes numeric values for which arithmetic operations make sense.
The distribution of a variable tells us what values the variable takes
on and how often it takes on those values.
Statistical inference involves drawing conclusions about a large group, called the population, by gathering information from a smaller subgroup, called the sample.
The main statistical designs for producing data are surveys, experiments and observational studies.
In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses.
In an experiment, we deliberately do something to individuals in order to
observe their response.
What two types of graphs are typically used for categorical variables?
Bar graphs and pie charts.
What types of graphs are typically used for quantitative variables?
Dotplots, stemplots and histograms.
Please know:
Cumulative frequency histogram
Relative frequency histogram
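A relative frequency histogram plots each bin's count divided by n, and a cumulative frequency histogram plots the running total of those proportions. The sketch below builds both columns for equal-width bins. The helper name frequency_table and its bin_width and start parameters are hypothetical, and the half-open bin convention [low, high) is an assumption; textbooks and calculators may place boundary values differently. The sample data are the numbers from the median example in this review.

```python
from collections import Counter

def frequency_table(data, bin_width, start):
    """Hypothetical helper: build histogram rows for equal-width bins.

    Each row is (bin_low, bin_high, count, relative_freq, cumulative_relative_freq).
    Bins are half-open, [low, low + bin_width), and every value must be >= start.
    """
    n = len(data)
    # Bin index for each observation: 0 for [start, start + bin_width), 1 for the next bin, ...
    counts = Counter(int((x - start) // bin_width) for x in data)
    rows = []
    running = 0
    for k in range(max(counts) + 1):
        count = counts.get(k, 0)  # empty bins still get a row with count 0
        running += count
        low = start + k * bin_width
        rows.append((low, low + bin_width, count, count / n, running / n))
    return rows

for low, high, count, rel, cum in frequency_table([13, 25, 28, 36, 47], bin_width=10, start=10):
    print(f"[{low}, {high}): count={count}, relative={rel:.2f}, cumulative={cum:.2f}")
```

The last cumulative value is always 1.00, because every observation has been counted by the final bin.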
When you describe a distribution, pay special attention to the following features:
Shape: the overall pattern, symmetric or skewed. The length of the “tails” tells us whether a graph (i.e., distribution) is left-skewed (the left tail is longer) or right-skewed (the right tail is longer).
Modes: the values that occur most often (i.e., peaks). Unimodal means one major peak; bimodal means two major peaks.
Center: the middle of the distribution. The two most common measures of center are the mean and the median.
Spread: how varied (i.e., spread out) the data are. The IQR and standard deviation are probably the two most common measures of spread.
Outliers: any value(s) that fall outside the overall pattern.
When you have to describe a distribution, don't get mad, CUSS: Center, Unusual features, Spread, Shape.
Measuring Center: The Mean & Median
To calculate the mean, add the values of the observations and divide by the number of observations. The mean of a sample is denoted x̄, pronounced "x-bar." The mean of a population is denoted μ, the Greek letter mu.
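Written as a worked equation (the symbols n for the number of observations and x_1, …, x_n for the observations are notation introduced here for illustration), the mean and its value for the data set 13, 25, 28, 36, 47 from the example in this review are:

$$\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i$$

$$\bar{x} = \frac{13 + 25 + 28 + 36 + 47}{5} = \frac{149}{5} = 29.8$$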
Measuring Center: The Median
The median (denoted by M) is the midpoint of a distribution. To calculate the median:
1. Order the observations from smallest to largest.
2. If the number of observations is odd, the median is simply the middle value in the list. You can find its location by counting (n+1)/2 observations from the bottom (or top).
3. If the number of observations is even, average the two middle numbers. The location (n+1)/2, counted from the bottom or top of the list, now falls halfway between those two middle observations.
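The three steps translate directly into code. This is a minimal sketch; median_by_steps is a hypothetical name (Python's standard library already provides statistics.median), written out only to mirror the steps. Positions in the text count from 1, while Python indexes from 0, so the middle position (n+1)/2 corresponds to index n // 2 when n is odd.

```python
def median_by_steps(values):
    """Hypothetical helper: median computed by the three steps above."""
    if not values:
        raise ValueError("median requires at least one observation")
    # Step 1: order the observations from smallest to largest
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2  # 0-based index of the middle value (odd n) or upper-middle value (even n)
    if n % 2 == 1:
        # Step 2: odd n, so the median is the single middle value
        return ordered[mid]
    # Step 3: even n, so average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_by_steps([47, 13, 36, 25, 28]))
print(median_by_steps([13, 25, 28, 36, 47, 104]))
```

Sorting first means the input can be in any order, as in the first call.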
EXAMPLE: Consider the following set of numbers: 13, 25, 28, 36, 47.
M = 28 (the (5+1)/2 = 3rd value), x̄ = 29.8
Now, consider adding a 6th number, say 104.
M = (28 + 36)/2 = 32, x̄ ≈ 42.17
We say that the median is an outlier-resistant measure of center, while the mean is not: adding 104 moved the median only from 28 to 32, but pulled the mean from 29.8 up to about 42.17.
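Python's standard-library statistics module provides mean and median functions, so the example can be checked in a few lines. This sketch uses only those two functions and the numbers from the example above.

```python
from statistics import mean, median

original = [13, 25, 28, 36, 47]
with_outlier = original + [104]

for label, data in [("original", original), ("with 104", with_outlier)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
# original: mean = 29.80, median = 28
# with 104: mean = 42.17, median = 32.0
```

The single large value shifts the mean by more than 12 but the median by only 4, which is what "outlier resistant" means in practice.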
Mean versus Median
The mean and median of a roughly symmetric distribution will be close together. If the distribution is exactly symmetric, the mean and median are equal. In a skewed distribution, the mean is pulled farther out into the long tail than the median, so the median is usually the better measure of center.
In descriptions of data, the “average” value of a variable is usually referred to as the mean whereas the “typical” value is usually referred to as the median.
Measuring Spread: The Quartiles
One way to measure spread, or variability, is to calculate the range, which is the difference between the largest and smallest observations.
Another way to describe the spread of a distribution is by considering different percentiles. The pth percentile of a distribution is the value that has p percent of the observations at or below it. The median is the 50th percentile. The 25th percentile is called the 1st quartile (Q1), while the 75th percentile is called the 3rd quartile (Q3).
The Five-Number Summary and Boxplots
The five-number summary of a set of observations consists of the smallest value, the 1st quartile, the median, the 3rd quartile and the
largest value.
The five-number summary can be presented visually by a boxplot.
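The five-number summary can be computed as sketched below. The quartile rule here is an assumption: Q1 and Q3 are taken as the medians of the lower and upper halves of the ordered data, leaving the overall median out of both halves when n is odd. Other software may interpolate differently and report slightly different quartiles. The helper name five_number_summary is hypothetical.

```python
from statistics import median

def five_number_summary(values):
    """Hypothetical helper: return (min, Q1, median, Q3, max).

    Quartiles are medians of the lower and upper halves; when n is odd,
    the overall median is excluded from both halves.
    """
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    half = n // 2
    lower = ordered[:half]            # observations below the median position
    upper = ordered[half + n % 2:]    # observations above the median position
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

print(five_number_summary([13, 25, 28, 36, 47]))
print(five_number_summary([13, 25, 28, 36, 47, 104]))
```

These five values are exactly what a boxplot draws: the box runs from Q1 to Q3 with a line at the median, and the whiskers reach toward the minimum and maximum.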
The 1.5 × IQR Rule for Outliers
The distance between the 1st and 3rd quartiles is called the interquartile range, which is abbreviated IQR for
obvious reasons.
The quartiles and IQR are resistant to changes in either tail of a distribution.
Since the median and the IQR are resistant to outliers, they should be used when describing a skewed distribution.
We will call a data value a “suspected” outlier if it falls more than 1.5 x IQR above Q3 or below
Q1.
In a modified boxplot, the whiskers extend only to values not “flagged” as outliers, and asterisks are used to denote any outliers.
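The 1.5 × IQR rule and the modified-boxplot whiskers can be sketched as follows. This is a minimal illustration assuming the same "median of each half" quartile convention used for the five-number summary; the names quartiles and flag_outliers are hypothetical. The fences are Q1 − 1.5 × IQR and Q3 + 1.5 × IQR, and the whiskers stop at the most extreme values that are not flagged.

```python
from statistics import median

def quartiles(values):
    """Hypothetical helper: Q1 and Q3 as medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[half + len(ordered) % 2:])

def flag_outliers(values, k=1.5):
    """Hypothetical helper: return (low_fence, high_fence, outliers, whiskers)."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    # A value is a suspected outlier if it lies more than k * IQR below Q1 or above Q3
    outliers = [x for x in values if x < low_fence or x > high_fence]
    # Modified-boxplot whiskers end at the most extreme values that are not flagged
    kept = [x for x in values if low_fence <= x <= high_fence]
    whiskers = (min(kept), max(kept))
    return low_fence, high_fence, outliers, whiskers

print(flag_outliers([13, 25, 28, 36, 47, 104]))
```

For the data with 104 added, Q1 = 25 and Q3 = 47, so IQR = 22 and the upper fence is 47 + 33 = 80; 104 is flagged, and the upper whisker stops at 47.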
Measuring Spread: The Standard Deviation
The standard deviation measures spread by determining how far each value is from the mean and then “averaging” these distances.
The standard deviation of a sample is denoted by s.
The standard deviation of a population is denoted σ, the Greek letter sigma.
The following formula is used to compute the standard deviation of a sample:
$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2}$$
The variance of a set of observations, s² (or σ² for a population), is simply the square of the standard deviation:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
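The formula can be followed term by term in code: compute x̄, square each deviation from it, divide the sum by n − 1 to get the variance, then take the square root. The sketch below does this (sample_sd is a hypothetical name) and compares the result with statistics.stdev from Python's standard library, which also divides by n − 1; statistics.pstdev is the population version, which divides by n.

```python
import math
import statistics

def sample_sd(values):
    """Hypothetical helper: sample standard deviation s, following the formula."""
    n = len(values)
    if n < 2:
        raise ValueError("s needs at least two observations")
    x_bar = sum(values) / n
    # Squared deviations from the mean, (x_i - x_bar)^2
    squared_devs = [(x - x_bar) ** 2 for x in values]
    # Sample variance s^2 divides by n - 1, not n
    variance = sum(squared_devs) / (n - 1)
    return math.sqrt(variance)

data = [13, 25, 28, 36, 47]
print(sample_sd(data), statistics.stdev(data))
print(sample_sd([5, 5, 5]))  # identical values, so there is no spread
```

For the example data the squared deviations sum to 642.8, so s² = 642.8 / 4 = 160.7 and s ≈ 12.68.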
Properties of the Standard Deviation
1. s measures spread about the mean and should be used only when the mean is used as the measure of center.
2. s = 0 only when there is no spread/variability (i.e., all the values are the same). Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.
3. s, like the mean x̄, is not resistant to outliers. A few outliers can make s very large. Distributions with outliers and strongly skewed distributions can have very large standard deviations, so the number s does not give much helpful information about such distributions.
Choosing Measures of Center and Spread
The five number summary, in particular the median and the IQR, is usually better than the mean and standard deviation for describing a
skewed distribution or a distribution with strong outliers.
Use x̄ and s only for reasonably symmetric distributions that are free of outliers.
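To see why the median and IQR are preferred when outliers are present, the sketch below compares s with the IQR for the example data before and after adding 104. The iqr helper is hypothetical and assumes the "median of each half" quartile convention; stdev comes from Python's statistics module.

```python
from statistics import median, stdev

def iqr(values):
    """Hypothetical helper: Q3 - Q1, with quartiles as medians of the two halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[half + len(ordered) % 2:]) - median(ordered[:half])

original = [13, 25, 28, 36, 47]
with_outlier = original + [104]

for label, data in [("original", original), ("with 104", with_outlier)]:
    print(f"{label}: s = {stdev(data):.2f}, IQR = {iqr(data)}")
# original: s = 12.68, IQR = 22.5
# with 104: s = 32.34, IQR = 22
```

One outlier more than doubles s, while the IQR barely changes.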
Adding the same number, a, to each observation adds a to the measure of center but does not affect the measure of spread.
Multiplying each observation by the same number, b, multiplies the measure of center by b and the measure of spread by |b| (a measure of spread is never negative).
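Both rules can be checked numerically. In this sketch the shift a = 10 and scale b = 3 are arbitrary illustrative values, and summarize is a hypothetical helper returning the mean, median and sample standard deviation from Python's statistics module.

```python
from statistics import mean, median, stdev

def summarize(values):
    """Hypothetical helper: (mean, median, sample standard deviation)."""
    return mean(values), median(values), stdev(values)

data = [13, 25, 28, 36, 47]
a, b = 10, 3  # arbitrary shift and scale chosen for illustration

shifted = [x + a for x in data]  # center moves by a, spread unchanged
scaled = [b * x for x in data]   # center and spread both multiplied by b (b > 0 here)

print(summarize(data))
print(summarize(shifted))
print(summarize(scaled))
```

With a negative b, such as -3, the center is still multiplied by b, but s is multiplied by |b| = 3.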