sda 3e chapter 2 (2)
Post on 07-Apr-2018
-
8/4/2019 SDA 3E Chapter 2 (2)
1/40
2007 Pearson Education
Chapter 2: Displaying and Summarizing Data
Part 2: Descriptive Statistics
-
Excel Support
Excel statistical functions
Analysis Toolpak tools
PHStat tools and procedures
-
Descriptive Statistics
Frequency distributions and histograms
Measures of central tendency
Measures of dispersion
-
Terminology and Notation
Parameter: a measurable characteristic of a population; $\mu$ is a parameter, $\bar{x}$ is not
$x_i$ represents the $i$-th observation
$\sum$ indicates the operation of addition
$N$ is the size of the population; $n$ is the size of the sample
$f_i$ is the number of observations in cell $i$ of a frequency distribution
-
Frequency Distribution
Tabular summary showing the frequency of observations in each of several non-overlapping (mutually exclusive) classes, or cells
Relative frequency: fraction or proportion of observations that fall within a cell
Cumulative frequency: proportion or percentage of observations that fall below the upper limit of a cell
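The three definitions above can be tabulated together. The following is a minimal Python sketch, not the chapter's Excel procedure; the function name `frequency_distribution`, the equal-width half-open cells, and the sample values are all assumptions made for illustration.

```python
from collections import Counter

def frequency_distribution(data, lower, width, n_cells):
    """Tabulate frequency, relative frequency and cumulative relative frequency.

    Cell i covers [lower + i*width, lower + (i+1)*width). All values are
    assumed to fall inside the overall range covered by the cells.
    """
    counts = Counter(int((x - lower) // width) for x in data)
    n = len(data)
    rows, running = [], 0
    for i in range(n_cells):
        f = counts.get(i, 0)
        running += f
        rows.append({
            "cell": (lower + i * width, lower + (i + 1) * width),
            "frequency": f,
            "relative": f / n,        # fraction of observations in this cell
            "cumulative": running / n,  # fraction below the cell's upper limit
        })
    return rows

# Hypothetical sample values, not taken from the chapter.
sample = [3, 7, 8, 12, 14, 15, 19, 21, 22, 28]
table = frequency_distribution(sample, lower=0, width=10, n_cells=3)
for row in table:
    print(row)
```

The cumulative column of the last cell is always 1.0 when every observation falls inside one of the cells.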
-
Home Runs - 2004 Baseball Season
-
Histogram
Column chart representing a frequency distribution
-
Excel Tool: Histogram
Excel Menu > Tools > Data Analysis > Histogram
Specify range of data
Define and specify bin range (recommended)
Select output options (always check Chart Output)
-
Good Practice Guidelines
Cell intervals should be of equal width.
Choose the width using the formula (largest value - smallest value)/number of cells, but round to reasonable values (e.g., 97 to 100)
Choose somewhere between 5 and 15 cells to provide a useful picture of the data
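The width guideline is simple arithmetic followed by a judgment call on rounding. A small Python sketch, assuming hypothetical data and a helper `round_up_to` whose rounding step is chosen by the analyst:

```python
import math

def cell_width(data, n_cells):
    """Raw width from the guideline: (largest - smallest) / number of cells."""
    return (max(data) - min(data)) / n_cells

def round_up_to(value, step):
    """Round value up to the next multiple of step (the step is a judgment call)."""
    return math.ceil(value / step) * step

# Hypothetical data chosen so that the raw width matches the "97 to 100" example.
values = [12, 45, 67, 103, 150, 188, 240, 305, 399, 497]
raw = cell_width(values, n_cells=5)  # (497 - 12) / 5
width = round_up_to(raw, step=50)    # rounded up to a convenient value
print(raw, width)
```

Rounding up rather than down keeps the largest observation inside the last cell.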
-
Excel Frequency Function
Define bins
Select a range of cells adjacent to the bin range (if continuous data, add one empty cell below this range as an overflow cell)
Enter the formula =FREQUENCY(range of data, range of bins) and press Ctrl-Shift-Enter simultaneously.
Construct a histogram using the Chart Wizard for a column chart.
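For readers working outside Excel, the counting done by FREQUENCY can be approximated in a few lines of Python. This sketch assumes each bin value is an upper limit that is included in its bin, with one extra overflow slot for values above the last bin; the function name `frequency` is made up for the example.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Count observations per bin.

    A value x is counted in the first bin whose upper limit is >= x.
    The final slot is the overflow count for values above the largest bin.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for x in data:
        counts[bisect_left(bins, x)] += 1
    return counts

# Hypothetical data and bin limits.
counts = frequency([5, 10, 11, 20, 25, 31], bins=[10, 20, 30])
print(counts)
```

The result has one more entry than there are bins, which is why the slide recommends selecting an extra cell below the output range.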
-
Arithmetic Mean
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Excel function AVERAGE(range)
-
Example: Team Home Runs
Mean = 2605/14 = 186.1
-
Properties of the Mean
Meaningful for interval and ratio data
All data used in the calculation
Unique for every set of data
Affected by unusually large or small observations (outliers)
The only measure of central tendency where the sum of the deviations of each value from the measure is zero; i.e., $\sum (x_i - \bar{x}) = 0$
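The zero-sum property is easy to check numerically. A short Python sketch with made-up values, comparing deviations from the mean with deviations from the median:

```python
import math

data = [1, 2, 3, 4, 100]               # hypothetical values with one outlier
mean = sum(data) / len(data)
median = sorted(data)[len(data) // 2]  # middle value; valid here because n is odd

sum_dev_mean = sum(x - mean for x in data)      # deviations from the mean
sum_dev_median = sum(x - median for x in data)  # deviations from the median

print(mean, sum_dev_mean)      # deviations from the mean cancel out
print(median, sum_dev_median)  # deviations from the median generally do not
```

With floating-point data the first sum may differ from zero by a tiny rounding error, so a tolerance-based comparison such as `math.isclose` is safer than testing for exact equality.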
-
Median
Middle value when data are ordered from smallest to largest. This results in an equal number of observations above the median as below it.
Unique for each set of data
Not affected by extremes
Meaningful for ratio, interval, and ordinal data
Excel function MEDIAN(range)
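To illustrate how little the median moves when an extreme value is added, here is a small Python sketch using the standard `statistics` module; the numbers are invented for the example.

```python
import statistics

values = [30, 32, 35, 38, 40]    # hypothetical observations
with_outlier = values + [400]    # same data plus one extreme value

mean_before = statistics.mean(values)
median_before = statistics.median(values)

mean_after = statistics.mean(with_outlier)
# With an even number of observations the median is the average
# of the two middle values, here (35 + 38) / 2.
median_after = statistics.median(with_outlier)

print(mean_before, median_before)
print(mean_after, median_after)
```

The mean jumps from 35 to roughly 95.8, while the median only moves from 35 to 36.5.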
-
Mode
Observation that occurs most frequently; for grouped data, the midpoint of the cell with the largest frequency (approximate value)
Useful when data consist of a small number of unique values
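Both forms of the mode can be sketched in Python. The raw-data case uses `statistics.mode` and `statistics.multimode` (Python 3.8 or later); the grouped case takes the midpoint of the cell with the largest frequency. All values and cell limits below are hypothetical.

```python
import statistics

# Raw data with a small number of unique values.
observations = [1, 2, 2, 3, 3, 3, 4]
single_mode = statistics.mode(observations)

# multimode returns every value tied for the highest count,
# which is useful when the data have two peaks.
two_modes = statistics.multimode([1, 1, 2, 3, 3])

# Grouped data: (cell limits, frequency) pairs.
cells = [((0, 10), 4), ((10, 20), 9), ((20, 30), 5)]
(lower, upper), _ = max(cells, key=lambda cell: cell[1])
grouped_mode = (lower + upper) / 2  # midpoint of the modal cell (approximate)

print(single_mode, two_modes, grouped_mode)
```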
-
Bimodal Distribution
-
Midrange
Average of the largest and smallest observations
Useful for very small samples, but extreme values can distort the result
-
Measures of Dispersion
Dispersion: the degree of variation in the data. E.g., {48, 49, 50, 51, 52} and {10, 30, 50, 70, 90}
Range: difference between the maximum and minimum observations
Same issues as with midrange
-
Variance
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
-
Calculations
Variance = 17534.93/14 = 1252.49
Excel functions: VAR, VARP, STDEV, STDEVP
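Python's standard `statistics` module has close counterparts to these four Excel functions. The pairing below (VARP with `pvariance`, VAR with `variance`, STDEVP with `pstdev`, STDEV with `stdev`) rests on the population versus sample divisors; the data set is the more dispersed of the two example sets from the dispersion slide.

```python
import statistics

data = [10, 30, 50, 70, 90]

pop_var = statistics.pvariance(data)    # divides by N, like VARP
sample_var = statistics.variance(data)  # divides by n - 1, like VAR
pop_sd = statistics.pstdev(data)        # square root of pop_var, like STDEVP
sample_sd = statistics.stdev(data)      # square root of sample_var, like STDEV

print(pop_var, sample_var)
print(pop_sd, sample_sd)
```

The squared deviations from the mean of 50 sum to 4000, so the population variance is 4000/5 = 800 and the sample variance is 4000/4 = 1000.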
-
Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
The standard deviation has the same units of measurement as the original data, unlike the variance
-
Chebyshev's Theorem
For any set of data, the proportion of values that lie within k standard deviations of the mean is at least $1 - 1/k^2$, for any k > 1
For k = 2, at least 3/4, or 75%, of the data lie within 2 standard deviations of the mean
For k = 3, at least 8/9, or 89%, lie within 3 standard deviations of the mean
For k = 10, at least 99/100, or 99%, of the data lie within 10 standard deviations of the mean
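The bound can be compared against the actual proportion in any data set. A minimal Python sketch, assuming the population standard deviation is used and with a made-up, deliberately skewed data set; the function names are illustrative only.

```python
import statistics

def chebyshev_bound(k):
    """Lower bound 1 - 1/k^2 on the proportion within k standard deviations."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def fraction_within(data, k):
    """Observed proportion of values within k population standard deviations of the mean."""
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    inside = sum(1 for x in data if abs(x - mu) <= k * sigma)
    return inside / len(data)

skewed = [1, 2, 2, 3, 3, 3, 4, 4, 50]  # hypothetical data with one extreme value
for k in (2, 3):
    print(k, chebyshev_bound(k), fraction_within(skewed, k))
```

The observed proportion is usually well above the bound, because the theorem has to hold for every possible distribution.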
-
Example
Mean = 28.87; standard deviation = 21.92
3 standard deviations about the mean: [-36.9, 94.6]
2 standard deviations about the mean: [-15.0, 72.7]
-
Grouped Data: Calculation of Mean
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} f_i x_i}{n}$
Population: $\mu = \frac{\sum_{i=1}^{N} f_i x_i}{N}$
In a frequency distribution, replace $x_i$ with a representative value (e.g., midpoint)
-
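Grouped Data: Calculation of Variance
Sample: $s^2 = \frac{\sum_{i=1}^{n} f_i (x_i - \bar{x})^2}{n - 1}$
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} f_i (x_i - \mu)^2}{N}$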
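The grouped-data calculations weight each representative value by its cell frequency. The Python sketch below assumes the usual frequency-weighted formulas with cell midpoints as the representative values; the function names, midpoints, and frequencies are hypothetical.

```python
def grouped_mean(midpoints, freqs):
    """Frequency-weighted mean: sum(f_i * x_i) / total count."""
    n = sum(freqs)
    return sum(f * x for f, x in zip(freqs, midpoints)) / n

def grouped_variance(midpoints, freqs, sample=True):
    """Frequency-weighted variance; divides by n - 1 for a sample, n for a population."""
    n = sum(freqs)
    mean = grouped_mean(midpoints, freqs)
    squared = sum(f * (x - mean) ** 2 for f, x in zip(freqs, midpoints))
    return squared / (n - 1 if sample else n)

midpoints = [5, 15, 25]  # midpoints of the cells 0-10, 10-20, 20-30
freqs = [3, 4, 3]        # number of observations in each cell

g_mean = grouped_mean(midpoints, freqs)
g_var_sample = grouped_variance(midpoints, freqs, sample=True)
g_var_pop = grouped_variance(midpoints, freqs, sample=False)
print(g_mean, g_var_sample, g_var_pop)
```

Because every observation in a cell is treated as if it equaled the midpoint, these results approximate the values that would be computed from the raw data.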
-
Coefficient of Variation
CV = Standard Deviation / Mean
CV is dimensionless, and therefore is useful when comparing data sets that are scaled differently.
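The scale-free property can be checked by expressing the same measurements in two units. A short Python sketch, assuming the sample standard deviation and a nonzero mean; the heights are invented.

```python
import statistics

def coefficient_of_variation(data):
    """Sample standard deviation divided by the mean (the mean must be nonzero)."""
    return statistics.stdev(data) / statistics.fmean(data)

heights_cm = [150, 160, 170, 180, 190]     # hypothetical measurements
heights_m = [x / 100 for x in heights_cm]  # same data in metres

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)
print(cv_cm, cv_m)
```

The standard deviation differs by a factor of 100 between the two lists, but the CV is the same up to rounding error.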
-
Skewness
Coefficient of skewness (CS)
-0.5 < CS < 0.5 indicates relative symmetry
Figure panels: Relatively Symmetric; Positively Skewed
-
Excel Tool: Descriptive Statistics
Excel menu > Tools > Data Analysis > Descriptive Statistics
-
Data Profiles (Fractiles)
Describe the location and spread of data over its range
Quartiles: a division of a data set into four equal parts; shows the points below which 25%, 50%, 75% and 100% of the observations lie (25% is the first quartile, 75% is the third quartile, etc.)
Deciles: a division of a data set into 10 equal parts; shows the points below which 10%, 20%, etc. of the observations lie
Percentiles: a division of a data set into 100 equal parts; shows the points below which k percent of the observations lie
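Quartiles, deciles, and percentiles are all cut points of the same kind and differ only in the number of parts. A Python sketch using `statistics.quantiles` (Python 3.8 or later) on a made-up data set; note that software packages use slightly different interpolation rules, so Excel or PHStat may report slightly different cut points on small data sets.

```python
import statistics

data = list(range(1, 100))  # hypothetical data: the integers 1 to 99

quartiles = statistics.quantiles(data, n=4)      # 3 cut points
deciles = statistics.quantiles(data, n=10)       # 9 cut points
percentiles = statistics.quantiles(data, n=100)  # 99 cut points

print(quartiles)
print(deciles)
```

Dividing into n parts yields n - 1 cut points; the 100% point in the quartile definition above is simply the maximum.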
-
Proportion
Fraction of data that has a certain characteristic
Use the Excel function COUNTIF(data range, criteria) to count observations meeting a criterion to compute proportions.
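The same count-then-divide logic can be written in Python. In this sketch the helper `proportion` and the data values are hypothetical; the predicate plays the role of the COUNTIF criterion.

```python
def proportion(data, predicate):
    """Fraction of observations for which predicate(x) is true."""
    matches = sum(1 for x in data if predicate(x))  # analogous to COUNTIF
    return matches / len(data)

# Hypothetical team totals, not the chapter's data.
totals = [120, 150, 186, 200, 215, 242, 170, 135]
share_200_plus = proportion(totals, lambda x: x >= 200)
print(share_200_plus)
```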
-
Box and Whisker Plots
Display minimum, first quartile ($Q_1$), median, third quartile ($Q_3$), and maximum values graphically
Figure labels, left to right: min, 1st quartile, median, 3rd quartile, max
-
PHStat Tool: Box and Whisker Plot
PHStat menu > Descriptive Statistics > Box and Whisker Plot
Enter data range
Choose type of data set
Check box for five-number summary
-
Stem and Leaf Display
Each number is divided into two parts, x | y: x = stem, and y = leaf
Stem = cell; leaf = value within cell
Number Stem Leaf
117 11 7
113 11 3
124 12 4
126 12 6
Stem and leaf display aggregates and sorts all leaves within the same stem:
11 | 37
12 | 46
-
Stem and Leaf
Stem unit is a power of 10; the higher the stem unit, the more aggregation of data
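The effect of the stem unit can be seen by building the display by hand. This Python sketch assumes non-negative integers and keeps the full remainder as the leaf (a printed display would normally keep only the leading leaf digit); the function name and the four numbers are illustrative.

```python
from collections import defaultdict

def stem_and_leaf(numbers, stem_unit=10):
    """Group sorted values by stem = x // stem_unit, with leaf = x % stem_unit."""
    stems = defaultdict(list)
    for x in sorted(numbers):
        stem, leaf = divmod(x, stem_unit)
        stems[stem].append(leaf)
    return dict(stems)

numbers = [117, 113, 124, 126]  # hypothetical values with stems 11 and 12
by_tens = stem_and_leaf(numbers, stem_unit=10)
by_hundreds = stem_and_leaf(numbers, stem_unit=100)

for stem, leaves in by_tens.items():
    print(stem, "|", " ".join(str(leaf) for leaf in leaves))
```

With a stem unit of 100 all four values fall under the single stem 1, which is the extra aggregation the slide describes.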
-
PHStat Tool: Stem and Leaf Display
PHStat menu > Descriptive Statistics > Stem and Leaf Display
Enter data range
Select stem unit or autocalculation
Check Summary Statistics box
-
Dot Scale Diagram
PHStat menu > Descriptive Statistics > Dot Scale Diagram
-
Statistical Relationships
Correlation: a measure of strength of linear relationship between two variables
Correlation coefficient: $\rho_{x,y} = \frac{\operatorname{cov}(X, Y)}{\sigma_x \sigma_y}$
Covariance: average of the products of the deviations of each variable from its mean; describes how two variables move together
$\operatorname{cov}(X, Y) = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$
Sample correlation coefficient: $r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n - 1) s_x s_y}$
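The covariance and the two correlation coefficients can be computed directly from their definitions. A Python sketch with hypothetical paired data; the function names are made up, and the population and sample versions are written out separately to show that they give the same correlation.

```python
import statistics

def population_covariance(xs, ys):
    """Average product of deviations from the means (divides by N)."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def population_correlation(xs, ys):
    """Covariance divided by the product of the population standard deviations."""
    return population_covariance(xs, ys) / (statistics.pstdev(xs) * statistics.pstdev(ys))

def sample_correlation(xs, ys):
    """Sum of deviation products divided by (n - 1) * s_x * s_y."""
    n = len(xs)
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    products = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return products / ((n - 1) * statistics.stdev(xs) * statistics.stdev(ys))

xs = [1, 2, 3, 4, 5]  # hypothetical paired observations
ys = [2, 4, 5, 4, 5]

cov_xy = population_covariance(xs, ys)
rho = population_correlation(xs, ys)
r = sample_correlation(xs, ys)
print(cov_xy, rho, r)
```

The divisors N and n - 1 cancel between numerator and denominator, so both formulas return the same value for a given data set, always between -1 and 1.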
-
Examples of Correlation
Negative correlation
Positive correlation
No correlation
-
Excel Tool: Correlation
Excel menu > Tools > Data Analysis > Correlation
-
Correlation Tool Results