
Posted on 11-Jan-2016


AP Statistics: Chapters 0 & 1 Review

Variables fall into two main categories: A categorical, or qualitative, variable places an individual into one of several groups or categories. A quantitative variable takes numeric values for which arithmetic operations make sense.

The distribution of a variable tells us what values the variable takes on and how often it takes on those values.

Statistical inference involves drawing conclusions about a large group, called the population, by gathering information from a smaller subgroup, called the sample.

The main statistical designs for producing data are surveys, experiments, and observational studies.

In an observational study, we observe individuals and measure variables of interest but do not attempt to influence the responses.

In an experiment, we deliberately do something to individuals in order to observe their response.

What two types of graphs are typically used for categorical variables?

Bar graphs and pie charts

What types of graphs are typically used for quantitative variables?

Stem plots, dot plots, and histograms

Please know:

Cumulative frequency histogram

Relative frequency histogram
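To make the two histogram types concrete, here is a minimal Python sketch that builds the table behind them. For each bin it reports the count, the relative frequency (count divided by n), and the cumulative relative frequency (the running total of relative frequencies, which is what a cumulative frequency histogram displays). The scores and bin edges are hypothetical, and the bins are assumed to be half-open intervals [lo, hi), except the last one, which also includes its right edge.

```python
def frequency_table(data, edges):
    """Return one row per bin: ((lo, hi), count, relative, cumulative_relative).

    Bins are half-open intervals [lo, hi); the last bin also includes its
    right edge so the largest edge value is not dropped.
    """
    n = len(data)
    rows = []
    running = 0  # running total of counts, used for the cumulative column
    last = len(edges) - 2
    for k in range(len(edges) - 1):
        lo, hi = edges[k], edges[k + 1]
        count = sum(1 for x in data if lo <= x < hi or (k == last and x == hi))
        running += count
        rows.append(((lo, hi), count, count / n, running / n))
    return rows


# Hypothetical test scores, grouped into bins of width 10.
scores = [62, 67, 71, 74, 75, 78, 81, 83, 85, 88, 91, 95]
for (lo, hi), count, rel, cum in frequency_table(scores, [60, 70, 80, 90, 100]):
    print(f"{lo}-{hi}: count={count}, relative={rel:.3f}, cumulative={cum:.3f}")
```

Plotting the relative column against the bins gives a relative frequency histogram. Plotting the cumulative column gives the cumulative version, which always ends at 1 (100%).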

When you describe a distribution, pay special attention to the …

Shape: the overall pattern, symmetric or skewed. The length of the "tails" tells us whether a graph (i.e., distribution) is left-skewed (the left tail is longest) or right-skewed (the right tail is longest).

Modes: the values that occur most often (i.e., peaks). Unimodal means one major peak; bimodal means two major peaks.

Center: the middle. The two most common measures of center are the mean and the median.

Spread: how varied (i.e., spread out) the data are. The IQR and the standard deviation are probably the two most common measures of spread.

Outliers: any value(s) that fall outside the overall pattern.
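For data that take only a handful of exact values, you can find the modes by counting. The following is a minimal sketch using Python's standard `statistics.multimode` (Python 3.8+), which returns every value tied for the highest count. The sibling counts are hypothetical. For a histogram of continuous data, the peaks are bins rather than single values, so this sketch covers only the simple discrete case.

```python
from statistics import multimode

# Hypothetical data: number of siblings reported by 12 students.
siblings = [0, 1, 1, 2, 2, 2, 3, 1, 2, 0, 1, 4]

# multimode returns all values tied for the highest frequency,
# in the order they first appear in the data.
modes = multimode(siblings)
print(modes, "->", len(modes), "mode(s)")
```

In this data the values 1 and 2 each occur four times, so this distribution would be described as bimodal.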

When you have to describe a distribution, don't get mad, CUSS: Center, Unusual features (such as outliers), Spread, Shape.

Measuring Center: The Mean & Median

To calculate the mean, add the values of the observations and divide by the number of observations. The mean of a sample is denoted x̄, pronounced "x-bar." The mean of a population is denoted μ, the Greek letter mu.
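The mean calculation translates directly into code. The following is a minimal Python sketch; the function name `mean` is chosen only for illustration. The same arithmetic gives x̄ when the list is a sample and μ when the list contains the entire population.

```python
def mean(values):
    """Add the observations and divide by how many there are."""
    if len(values) == 0:
        raise ValueError("the mean needs at least one observation")
    return sum(values) / len(values)


print(mean([13, 25, 28, 36, 47]))  # 149 / 5
```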

Measuring Center: The Median

The median (denoted by M) is the midpoint of a distribution. To calculate the median:

1. Order the observations from smallest to largest.

2. If the number of observations n is odd, the median is simply the middle value in the list. You can find its location by counting (n + 1)/2 observations from the bottom (or top) of the list.

3. If n is even, average the two middle numbers. The location of the median is again (n + 1)/2 from the bottom or top of the list, which now falls halfway between the two middle observations.
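The three steps above can be sketched in Python as follows; this is one minimal way to write it, not the only one. Python lists are indexed from 0. When n is odd, the 1-based location (n + 1)/2 of the middle value becomes index n // 2. When n is even, the two middle values are at indices n // 2 - 1 and n // 2.

```python
def median(values):
    """Median M of a list of numbers, following the three steps above."""
    ordered = sorted(values)  # Step 1: order from smallest to largest
    n = len(ordered)
    if n == 0:
        raise ValueError("the median needs at least one observation")
    mid = n // 2
    if n % 2 == 1:
        # Step 2: odd n, the middle value is at 1-based location (n + 1) / 2
        return ordered[mid]
    # Step 3: even n, average the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([47, 13, 36, 25, 28]))       # odd n
print(median([13, 25, 28, 36, 47, 104]))  # even n
```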

EXAMPLE: Consider the following set of numbers: 13, 25, 28, 36, 47.

M = 28    x̄ = 29.8

Now, consider adding a 6th number, say 104.

M = 32    x̄ ≈ 42.17

We say that the median is an outlier-resistant measure of center, while the mean is not.
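You can check the example with Python's standard `statistics` module. Its `mean` and `median` functions use the same definitions given above. Here is a minimal sketch:

```python
from statistics import mean, median

original = [13, 25, 28, 36, 47]
with_outlier = original + [104]  # add the 6th number from the example

for label, data in [("original", original), ("with 104", with_outlier)]:
    print(f"{label}: M = {median(data)}, x-bar = {mean(data):.2f}")
```

Adding 104 moves the median only from 28 to 32, but it pulls the mean from 29.8 up to about 42.17. This is the sense in which the median is resistant and the mean is not.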

Mean versus Median

The mean and median of a roughly symmetric distribution will be close together. If the distribution is exactly symmetric, the mean and median are equal. In a skewed distribution, the mean is pulled farther out into the long tail than the median, so the median is usually the better measure of center.

In descriptions of data, the "average" value of a variable usually refers to the mean, whereas the "typical" value usually refers to the median.

Measuring Spread: The Quartiles

One way to measure spread, or variability, is to calculate the range, which is the difference between the largest and smallest observations.

Another way to describe the spread of a distribution is by considering different percentiles. The pth percentile of a distribution is the value that has p percent of the observations at or below it. The median is the 50th percentile. The 25th percentile is called the first quartile (Q1), while the 75th percentile is called the third quartile (Q3).
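The following is a minimal Python sketch of the range and of a percentile computed directly from the "at or below" definition above. It uses the nearest-rank convention: take the smallest observation that has at least p percent of the data at or below it. Textbooks, calculators, and software use several other conventions, so quartiles from this function may not match theirs. The function names are hypothetical.

```python
import math


def data_range(values):
    """Range: largest observation minus smallest observation."""
    return max(values) - min(values)


def percentile_nearest_rank(values, p):
    """Smallest observation with at least p percent of the data at or below it."""
    if not 0 < p <= 100:
        raise ValueError("p must satisfy 0 < p <= 100")
    ordered = sorted(values)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the ordered list
    return ordered[rank - 1]


data = [13, 25, 28, 36, 47]
print(data_range(data))                   # 47 - 13
print(percentile_nearest_rank(data, 50))  # the median
print(percentile_nearest_rank(data, 25))  # one version of Q1
```

For these five numbers, this convention gives a 25th percentile of 25. Taking the median of the lower half (13 and 25) would instead give 19. Both are legitimate choices, which is why it is worth knowing which method a given calculator uses.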

The Five-Number Summary and Boxplots

The five-number summary of a set of observations consists of the smallest value, the first quartile, the median, the third quartile, and the largest value.

The five-number summary can be presented visually by a boxplot.
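The five-number summary can be sketched in Python as follows. This sketch assumes one common quartile convention: Q1 and Q3 are the medians of the lower and upper halves of the ordered data, and the overall median is left out of both halves when n is odd. This is the approach usually taught in AP Statistics and, as far as we know, the one used by TI-83/84 calculators. Other software may interpolate and report slightly different quartiles.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2 :]  # values above the median position
    return ordered[0], _median(lower), _median(ordered), _median(upper), ordered[-1]


print(five_number_summary([13, 25, 28, 36, 47]))
print(five_number_summary([13, 25, 28, 36, 47, 104]))
```

These five values are what a boxplot draws. The box runs from Q1 to Q3 with a line at the median, and the whiskers reach to the minimum and maximum.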

The 1.5 × IQR Rule for Outliers

The distance between the first and third quartiles, Q3 − Q1, is called the interquartile range, which is abbreviated IQR for obvious reasons.

The quartiles and IQR are resistant to changes in either tail of a distribution.

Since the median and the IQR are resistant to outliers, they should be used when describing a skewed distribution.

We will call a data value a "suspected" outlier if it falls more than 1.5 × IQR above Q3 or below Q1.

In a modified boxplot, the whiskers extend only to values not "flagged" as outliers, and asterisks are used to denote any outliers.
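The following is a minimal Python sketch of the 1.5 × IQR rule and of where a modified boxplot's whiskers would end. The sketch assumes that quartiles are found as medians of the lower and upper halves of the ordered data, excluding the overall median when n is odd. A different quartile convention can move the fences slightly. The function names are hypothetical.

```python
def _median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 == 1 else (ordered[mid - 1] + ordered[mid]) / 2


def outlier_check(values):
    """Apply the 1.5 x IQR rule; return fences, flagged values, and whisker ends."""
    ordered = sorted(values)
    n = len(ordered)
    if n < 2:
        raise ValueError("need at least two observations")
    q1 = _median(ordered[: n // 2])
    q3 = _median(ordered[(n + 1) // 2 :])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    flagged = [x for x in ordered if x < low_fence or x > high_fence]
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    # In a modified boxplot the whiskers stop at the most extreme unflagged values.
    whiskers = (inside[0], inside[-1])
    return (low_fence, high_fence), flagged, whiskers


print(outlier_check([13, 25, 28, 36, 47, 104]))
```

For the six numbers from the median example, Q1 = 25 and Q3 = 47, so IQR = 22. The fences are therefore 25 − 33 = −8 and 47 + 33 = 80. Only 104 is flagged, and the upper whisker stops at 47.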

Measuring Spread: The Standard Deviation

The standard deviation measures spread by determining how far each value is from the mean and then "averaging" these distances.

The standard deviation of a sample is denoted by s.

The standard deviation of a population is denoted σ, the Greek letter sigma.

The following formula is used to compute the standard deviation of a sample:

$$s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}$$

The variance of a set of observations, s² (or σ² for a population), is simply the square of the standard deviation:

$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2$$
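The sample formulas translate line for line into Python. The following is a minimal sketch with hypothetical function names. It divides the sum of squared deviations by n − 1, as in the formula for s. For a population, σ² would divide by N instead; Python's standard library provides this as `statistics.pvariance` and `statistics.pstdev`.

```python
import math


def sample_variance(values):
    """s^2: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("a sample variance needs at least two observations")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


def sample_sd(values):
    """s: the square root of the sample variance."""
    return math.sqrt(sample_variance(values))


data = [13, 25, 28, 36, 47]
print(round(sample_variance(data), 2), round(sample_sd(data), 2))
```

For the example data, the squared deviations from x̄ = 29.8 add up to 642.8, so s² = 642.8 / 4 = 160.7 and s ≈ 12.68.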

Properties of the Standard Deviation

1. s measures spread about the mean and should be used only when the mean is used as the measure of center.

2. s = 0 only when there is no spread/variability (i.e., all the values are the same). Otherwise, s > 0. As the observations become more spread out about their mean, s gets larger.

3. s, like the mean x̄, is not resistant to outliers. A few outliers can make s very large. Distributions with outliers and strongly skewed distributions tend to have large standard deviations, so s does not give much helpful information about such distributions.
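Properties 2 and 3 are easy to see numerically. The following is a minimal sketch using Python's standard `statistics.stdev`, which uses the n − 1 formula for s. It applies this function to the example data from this review and to a hypothetical list of identical values:

```python
from statistics import stdev

original = [13, 25, 28, 36, 47]
with_outlier = original + [104]

print(stdev([7, 7, 7, 7]))            # property 2: no spread, so s = 0
print(round(stdev(original), 2))      # s for the five original numbers
print(round(stdev(with_outlier), 2))  # property 3: one outlier inflates s
```

Adding the single value 104 raises s from about 12.68 to about 32.34, more than doubling it.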

Choosing Measures of Center and Spread

The five-number summary, in particular the median and the IQR, is usually better than the mean and standard deviation for describing a skewed distribution or a distribution with strong outliers.

Use x̄ and s only for reasonably symmetric distributions that are free of outliers.

Adding the same number, a, to each observation adds a to the measure of center but does not affect the measure of spread.

Multiplying each observation by the same number, b, multiplies the measure of center by b and the measure of spread by |b| (a measure of spread can never be negative).
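Both rules can be checked numerically. The following is a minimal Python sketch using the standard `statistics` module. The constants a = 10 and b = −2 are hypothetical; b is chosen to be negative to show that the spread is scaled by |b| rather than by b.

```python
import math
from statistics import mean, stdev

data = [13, 25, 28, 36, 47]
a, b = 10, -2  # hypothetical constants for illustration

shifted = [x + a for x in data]  # add a to every observation
scaled = [b * x for x in data]   # multiply every observation by b

# Adding a: the center moves by a, the spread is unchanged.
print(math.isclose(mean(shifted), mean(data) + a), math.isclose(stdev(shifted), stdev(data)))

# Multiplying by b: the center is multiplied by b, the spread by |b|.
print(math.isclose(mean(scaled), b * mean(data)), math.isclose(stdev(scaled), abs(b) * stdev(data)))
```

The same two rules hold for the median and the IQR.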
