Basic Elements of Descriptive Statistics
7/28/2019
What is Statistics?
Statistics is the science of describing or making inferences about the world from a sample of data.
Statistics has two branches: descriptive and inferential.
Descriptive Statistics
Descriptive statistics are methods for organizing and summarizing data.
For example, tables or graphs are used to organize data, and descriptive values are used to summarize data.
Inferential Statistics
Two main methods:
1. Estimation
The sample statistic is used to estimate a population parameter.
A confidence interval about the estimate is constructed.
2. Hypothesis testing
A null hypothesis is put forward.
Analysis of the data is then used to determine whether to reject it.
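The estimation step can be illustrated with a minimal Python sketch. It assumes a large enough sample for the normal approximation. The function name `approx_confidence_interval` and the sample values are hypothetical. The sketch computes a 95% confidence interval for a population mean as x-bar +/- z * s / sqrt(n).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev

def approx_confidence_interval(sample, level=0.95):
    """Normal-approximation confidence interval for the population mean.

    Uses x-bar +/- z * s / sqrt(n). This is a large-sample approximation;
    small samples would normally use a t critical value instead of z."""
    n = len(sample)
    x_bar = mean(sample)
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * stdev(sample) / sqrt(n)
    return x_bar - half_width, x_bar + half_width

# Hypothetical sample of measurements, for illustration only.
sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7, 12.0, 12.1]
low, high = approx_confidence_interval(sample)
print(f"estimate = {mean(sample):.2f}, 95% CI = ({low:.2f}, {high:.2f})")
```

With only ten observations, as here, a t critical value (about 2.26 for 9 degrees of freedom) would usually replace z, which gives a slightly wider interval.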
Definitions
A variable is a characteristic or condition that can change or take on different values.
A datum is a single observation of the variable being measured.
Data are a collection of observations.
The goal of statistics is to help researchers organize and interpret the data.
TYPES OF DATA
VARIABLES
QUALITATIVE: NOMINAL, ORDINAL
QUANTITATIVE: INTERVAL, RATIO (each either discrete or continuous)
Nominal data
Nominal or categorical data consist of categories that cannot be rank ordered; each category is simply different.
The categories cannot be placed in any order, and no judgement can be made about the relative size of, or distance between, one category and another.
What does this mean? No mathematical operations can be performed on the categories relative to each other; we can only count how many observations fall into each category.
Therefore, nominal data reflect qualitative differences rather than quantitative ones.
Nominal data
Examples:
What is your gender?
(please tick)
Male
Female
Did you enjoy the film?
(please tick)
Yes
No
Nominal data
Systems for measuring nominal data must ensure that the categories are mutually exclusive (each observation fits exactly one category) and that the system of measurement is exhaustive (every observation fits some category).
Variables that have only two possible responses, e.g. Yes or No, are known as dichotomies.
Ordinal data
Ordinal data consist of categories that can be rank ordered.
As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
What does this mean? We can make order-based statistical judgements (for example, finding the median category), but only limited mathematics is meaningful.
Ordinal data
Example:
How satisfied are you with the level of
service you have received? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
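Ordinal categories can be ranked but not averaged, so the median is a natural summary. The minimal Python sketch below maps the satisfaction labels to their ranks and returns the median category. The response list and the function name `median_category` are hypothetical. `statistics.median_low` is used so that the result is always an observed category and never an average of two ranks.

```python
from statistics import median_low

# Ordered categories of the satisfaction question, lowest to highest.
SCALE = ["Very dissatisfied", "Somewhat dissatisfied", "Neutral",
         "Somewhat satisfied", "Very satisfied"]
RANK = {label: i for i, label in enumerate(SCALE)}

def median_category(responses):
    """Median of ordinal responses, returned as a category label.

    Labels are replaced by ranks; median_low picks an actual rank, so
    no averaging between categories takes place."""
    ranks = [RANK[r] for r in responses]
    return SCALE[median_low(ranks)]

# Hypothetical responses, for illustration only.
responses = ["Neutral", "Very satisfied", "Somewhat satisfied",
             "Somewhat dissatisfied", "Somewhat satisfied"]
print(median_category(responses))  # Somewhat satisfied
```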
Interval and ratio data
Both interval and ratio data are examples of scale data.
Scale data:
are in numeric format (50, 100, 150)
can be measured on a continuous scale
have a distance between values that can be observed and, as a result, measured
can be placed in rank order
Interval data
Interval data are measured on a continuous scale that has no true zero point. Examples:
Time of day moves along a continuous measure of seconds, minutes and so on, and has no true zero point in time.
Temperature in degrees Celsius or Fahrenheit moves along a continuous measure of degrees and has no true zero.
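Interval data have no true zero, so differences between values are meaningful but ratios are not. In the small Python sketch below, the conversion function name is illustrative. A 10-degree Celsius difference converts consistently to an 18-degree Fahrenheit difference. The "twice as warm" ratio, however, does not survive the conversion.

```python
def celsius_to_fahrenheit(c):
    """Convert degrees Celsius to degrees Fahrenheit."""
    return c * 9 / 5 + 32

c_low, c_high = 10.0, 20.0
f_low, f_high = celsius_to_fahrenheit(c_low), celsius_to_fahrenheit(c_high)

# Differences keep their meaning: a 10 C gap is always an 18 F gap.
print(c_high - c_low, f_high - f_low)  # 10.0 18.0
# Ratios do not: 20 C / 10 C = 2, but 68 F / 50 F = 1.36,
# because neither scale has a true zero point.
print(c_high / c_low, f_high / f_low)  # 2.0 1.36
```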
Ratio data
Ratio data can be measured on a continuous scale and have a true zero point. Examples:
Age
Weight
Height
Ratio data can also be measured on a discrete scale with a true zero point. Example:
Number of children
Hierarchical data order
These levels of measurement can be placed in hierarchical order, from the highest level (ratio) to the lowest (nominal):
Ratio
Interval
Ordinal
Nominal
Population
The entire group of individuals is called the
population.
Sample
Usually populations are so large that a researcher cannot examine the entire group. Therefore, a sample is selected to represent the population in a research study. The goal is to use the results obtained from the sample to help answer questions about the population.
Why sample?
Measuring all units (trees, products, birds, etc.) is
impractical, if not impossible.
Sampling just a few units saves money.
Sampling just a few units saves time.
Some measurements are destructive:
cutting down trees to inspect ring patterns or for stem analysis
capturing wildlife to examine their morphology, etc.
Sampling makes statistical methods attractive and powerful.
Parameter versus Statistic
A descriptive value for a population is called a parameter, and a descriptive value for a sample is called a statistic.
Population: parameter
Sample: statistic
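The distinction can be made concrete with a minimal Python sketch. It uses a simulated, entirely hypothetical population of ages. The mean of the whole population is a parameter. The mean of a simple random sample drawn from it is a statistic that estimates that parameter.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 people, simulated for illustration.
rng = random.Random(42)  # fixed seed so the run is reproducible
population = [rng.randint(18, 80) for _ in range(1000)]

# Parameter: computed from every member of the population.
mu = mean(population)

# Statistic: computed from a simple random sample of 50 members.
sample = rng.sample(population, 50)
x_bar = mean(sample)

print(f"parameter mu = {mu:.2f}")
print(f"statistic x-bar = {x_bar:.2f}")
```

Re-running with a different seed gives a different sample and hence a different x-bar. The parameter mu is fixed for a given population.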
Statistical tools
Tables
One-way frequency table
Number of passengers Frequency
2 2
4 23
5 41
6 18
7 8
8 1
For nominal, ordinal and discrete variables.
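A one-way frequency table is a count of how often each value occurs. The minimal Python sketch below builds one from raw observations with `collections.Counter`. The observations and the function name `frequency_table` are hypothetical. They are not the data behind the table above.

```python
from collections import Counter

def frequency_table(values):
    """Return (value, frequency) pairs sorted by value."""
    return sorted(Counter(values).items())

# Hypothetical raw observations: number of passengers per vehicle.
observations = [4, 5, 5, 2, 6, 4, 5, 7, 5, 6]
for value, freq in frequency_table(observations):
    print(value, freq)
```

Because `Counter` works on any hashable values, the same function also covers nominal and ordinal categories given as strings.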
Tables
Two-way frequency table
Sex\ Hobby Dance Sports TV Total
Men 2 10 8 20
Women 16 6 8 30
Total 18 16 16 50
For nominal, ordinal and discrete variables.
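The totals in a two-way table are row and column sums. A short Python sketch takes the cell counts from the two-way table above and recomputes the totals as a consistency check. The variable names are illustrative.

```python
# Cell counts copied from the two-way table above (rows: sex, columns: hobby).
hobbies = ["Dance", "Sports", "TV"]
counts = {
    "Men": [2, 10, 8],
    "Women": [16, 6, 8],
}

# Row totals: sum across each row.
row_totals = {sex: sum(cells) for sex, cells in counts.items()}
# Column totals: zip(*rows) regroups the cells column by column.
col_totals = [sum(column) for column in zip(*counts.values())]
grand_total = sum(row_totals.values())

print("Sex\\Hobby", *hobbies, "Total")
for sex, cells in counts.items():
    print(sex, *cells, row_totals[sex])
print("Total", *col_totals, grand_total)
```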
Tables
Frequency table
Age Frequency Percentage (%)
10-14 2 5
15-19 16 40
20-24 18 45
25-29 3 7.5
30-34 1 2.5
Total 40 100
For quantitative variables.
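For a quantitative variable, the values are first grouped into classes, and then each class is counted and expressed as a percentage of the total. The minimal Python sketch below does this for integer ages in classes of width 5. The function name `grouped_frequency` and the ages are hypothetical.

```python
def grouped_frequency(values, start, width, n_classes):
    """Count integer data into classes start..start+width-1, and so on.

    Returns (class label, frequency, percentage) rows. Percentages are
    relative to len(values), so values outside every class would make
    the percentages sum to less than 100."""
    n = len(values)
    rows = []
    for k in range(n_classes):
        low = start + k * width
        high = low + width - 1
        freq = sum(1 for v in values if low <= v <= high)
        rows.append((f"{low}-{high}", freq, 100 * freq / n))
    return rows

# Hypothetical ages, for illustration only (not the data behind the table above).
ages = [12, 17, 18, 21, 22, 23, 26, 31]
for label, freq, pct in grouped_frequency(ages, start=10, width=5, n_classes=5):
    print(label, freq, pct)
```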
Graphs
Bar chart
Pie chart
Pictograms
Histogram
Density plot
Scatter plot
Time series plot
Boxplot
Graphs
For nominal, ordinal and discrete variables.
Graphs
Statistical pictograms
Not recommended
Graphs
Only for numerical variables
Graph examples on the web
Recommended book
Darrell Huff, Cómo mentir con estadísticas (Spanish edition of How to Lie with Statistics)
http://www.laeditorialvirtual.com.ar/Pages2/Huff_Darrell/Huff_ComoMentirConEstadisticas.html#_Toc334380216
A cartoon
Recommended videos
http://www.youtube.com/watch?v=nUJNstRFvvo
http://www.youtube.com/watch?v=ETbc8GIhfHo
Measures of Central Tendency
A measure of central tendency is a value that represents a typical, or central, entry of a data set. The three most commonly used measures of central tendency are the mean, the median, and the mode.
Mean
The mean of a data set is the sum of the data entries divided by the number of entries.
Population mean (mu): $\mu = \frac{\sum x}{N}$
Sample mean (x-bar): $\bar{x} = \frac{\sum x}{n}$
Mean
Example: the following are the ages of all seven employees of a small company:
53 32 61 57 39 44 57
Calculate the population mean.
Add the ages and divide by 7:
$\mu = \frac{\sum x}{N} = \frac{343}{7} = 49$ years
The mean age of the employees is 49 years.
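The mean formula translates directly into code. A minimal Python sketch follows; the function name `arithmetic_mean` is illustrative. The same computation gives mu when the data are a whole population and x-bar when they are a sample.

```python
def arithmetic_mean(values):
    """Sum of the data entries divided by the number of entries."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)

ages = [53, 32, 61, 57, 39, 44, 57]  # all seven employees
print(arithmetic_mean(ages))  # 49.0
```

The standard library's `statistics.mean` gives the same result.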
Median
The median of a data set is the value that lies in the middle of the data when the data set is ordered. If the data set has an odd number of entries, the median is the middle data entry. If the data set has an even number of entries, the median is the mean of the two middle data entries.
Example: calculate the median age of the seven employees.
53 32 61 57 39 44 57
To find the median, sort the data.
32 39 44 53 57 57 61
The median age of the employees is 53 years.
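Both cases of the median rule, odd and even count, can be written out in a minimal Python sketch. The function name `median_value` is illustrative. The sketch behaves like the standard library's `statistics.median`.

```python
def median_value(values):
    """Middle entry of the sorted data (odd count), or the mean of the
    two middle entries (even count)."""
    xs = sorted(values)
    n = len(xs)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return xs[mid]
    return (xs[mid - 1] + xs[mid]) / 2

ages = [53, 32, 61, 57, 39, 44, 57]
print(median_value(ages))         # 53
print(median_value(ages + [29]))  # 48.5
```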
Mode
The mode of a data set is the data entry or category that occurs with the greatest frequency. If no entry is repeated, the data set has no mode. If two entries occur with the same greatest frequency, each entry is a mode and the data set is called bimodal.
Example: find the mode of the ages of the seven employees.
53 32 61 57 39 44 57
The mode is 57 because it occurs the most times.
An outlier is a datum that is far from the other data in the data set.
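The three mode rules (no mode, a single mode, ties) can be written as a minimal Python sketch. The function name `modes` is illustrative. The standard library's `statistics.multimode` handles one case differently. When no entry is repeated, it returns every value instead of "no mode", so the convention used here is coded explicitly.

```python
from collections import Counter

def modes(data):
    """Mode(s) following the conventions on this slide.

    Returns [] when no entry is repeated (no mode), a single value for
    one mode, and every tied value otherwise (two values = bimodal).
    Assumes `data` is non-empty and its values can be sorted."""
    counts = Counter(data)
    top = max(counts.values())
    if top == 1:
        return []
    return sorted(value for value, count in counts.items() if count == top)

ages = [53, 32, 61, 57, 39, 44, 57]
print(modes(ages))             # [57]
print(modes([1, 2, 3]))        # []
print(modes([4, 4, 7, 7, 9]))  # [4, 7]  (bimodal)
```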
Comparing the Mean, Median and Mode
Example: A 29-year-old employee joins the company and the ages of the employees are now:
53 32 61 57 39 44 57 29
Recalculate the mean, the median, and the mode. Which measure of central tendency was affected when this new age was added?
Mean = 46.5
Median = 48.5
Mode = 57
The mean takes every value into account, so it is pulled toward the new low value.
The median and mode are resistant to extreme values: the mode is unchanged, and the median moves only because an eighth entry changes which entries sit in the middle, not because of how small the new value is.
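The comparison can be checked with Python's `statistics` module. The sketch adds a second, hypothetical low value (19) in place of 29. With 19, the median stays at 48.5 while the mean drops further. The median depends only on which entries are in the middle, whereas the mean depends on every entry's size.

```python
from statistics import mean, median, mode

ages = [53, 32, 61, 57, 39, 44, 57]

def summarize(data):
    """(mean, median, mode) of a data set."""
    return mean(data), median(data), mode(data)

print(summarize(ages))          # (49, 53, 57)
print(summarize(ages + [29]))   # (46.5, 48.5, 57)
print(summarize(ages + [19]))   # hypothetical lower value: (45.25, 48.5, 57)
```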
Weighted Mean
A weighted mean is the mean of a data set whose entries have varying weights. A weighted mean is given by
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w}$
where $w$ is the weight of each entry $x$.
Example: grades in a statistics class are weighted as follows. Tests are worth 50% of the grade, homework is worth 30% of the grade and the final is worth 20% of the grade. A student receives a total of 80 points on tests, 100 points on homework, and 85 points on his final. What is his current grade?
Weighted Mean
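Begin by organizing the data in a table.
Source Score (x) Weight (w) x·w
Tests 80 0.50 40
Homework 100 0.30 30
Final 85 0.20 17
$\sum w = 1.00$ and $\sum (x \cdot w) = 87$, so
$\bar{x} = \frac{\sum (x \cdot w)}{\sum w} = \frac{87}{1.00} = 87$
The student's current grade is 87%.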
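The weighted-mean formula can be written as a short Python function; the name `weighted_mean` is illustrative. The sketch writes the weights as percentages (50, 30, 20). Multiplying every weight by the same constant does not change a weighted mean, so this gives the same 87 as 0.50, 0.30, 0.20 while avoiding floating-point rounding noise.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum(x * w) / sum(w)."""
    if len(values) != len(weights):
        raise ValueError("need exactly one weight per value")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(x * w for x, w in zip(values, weights)) / total_weight

scores = [80, 100, 85]  # tests, homework, final
weights = [50, 30, 20]  # the same weights written as percentages
print(weighted_mean(scores, weights))  # 87.0
```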
Shapes of Distributions
A frequency distribution is symmetric when a vertical line can be drawn through the middle of a graph of the distribution and the resulting halves are approximately mirror images.
A frequency distribution is uniform (or rectangular) when all entries, or classes, in the distribution have equal frequencies. A uniform distribution is also symmetric.
A frequency distribution is skewed if the tail of the graph elongates more to one side than to the other. A distribution is skewed left (negatively skewed) if its tail extends to the left. A distribution is skewed right (positively skewed) if its tail extends to the right.
Summary of Shapes of Distributions
Symmetric: Mean = Median
Skewed right (positively skewed): Mean > Median > Mode
Skewed left (negatively skewed): Mean < Median < Mode
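The mean-versus-median comparison can serve as a rough skewness check. The minimal Python sketch below applies it to a hypothetical right-skewed data set. The function name `describe_skew` is illustrative. The comparison is a rule of thumb that usually, but not always, matches the shape of the histogram.

```python
from statistics import mean, median, mode

def describe_skew(data):
    """Rough shape label from comparing the mean with the median."""
    m, md = mean(data), median(data)
    if m > md:
        return "skewed right (mean > median)"
    if m < md:
        return "skewed left (mean < median)"
    return "symmetric (mean = median)"

# Hypothetical data: mostly small values with a long tail to the right.
data = [2, 3, 3, 3, 4, 5, 9, 15]
print(mean(data), median(data), mode(data))  # 5.5 3.5 3
print(describe_skew(data))                   # skewed right (mean > median)
```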
Measures of Variation
The mean is a good indicator of the central tendency of a set of data, but it does not provide the whole picture about the data set.
Example 1: comparison of the distribution of two data sets
Data set Values Mean Median
A 5 6 7 8 9 7 7
B 1 2 7 12 13 7 7
Note: Both distributions have the same mean and median, but beyond that they are quite different. In distribution A, 7 is a fairly typical value, but in distribution B most of the values differ quite a bit from 7. What is needed here is some measure of the dispersion, or spread, of the data. The following example further illustrates the importance of measuring the variability in a data set.
Example 2: Suppose that in a hospital, each patient's pulse rate is taken in the morning, at noon, and in the evening. On a certain day, the pulse rates were:
Patient Readings Mean Median
A 72 76 74 74 74
B 72 91 59 74 72
Note: The mean pulse rate is the same for both patients. While patient A's pulse rate is stable, patient B's fluctuates widely.
Range
The range of a data set is the difference between the maximum and minimum data entries in the set.
Range = (Maximum data entry) - (Minimum data entry)
Example:
The following data are the closing prices for a certain
stock on ten successive Fridays. Find the range.
Stock 56 56 57 58 61 63 63 67 67 67
The range is 67 - 56 = 11.
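The range is a one-line computation. The Python sketch below (function name illustrative) reproduces the stock example. It also applies the range to data sets A and B from Example 1, which share the same mean and median but have very different spreads.

```python
def data_range(values):
    """Range = (maximum data entry) - (minimum data entry)."""
    return max(values) - min(values)

stock = [56, 56, 57, 58, 61, 63, 63, 67, 67, 67]
print(data_range(stock))  # 11

# Data sets A and B from Example 1: same mean and median, different spread.
print(data_range([5, 6, 7, 8, 9]), data_range([1, 2, 7, 12, 13]))  # 4 12
```

Because it uses only the two most extreme entries, the range is very sensitive to outliers.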
Population Variance and Standard Deviation
The population variance of a population data set of N entries is
Population variance (sigma squared): $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
The population standard deviation of a population data set of N entries is the square root of the population variance.
Population standard deviation (sigma): $\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$
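The population formulas divide by N. The minimal Python sketch below applies them to the seven employee ages, treated as the whole population, and compares the results with the standard library's `pvariance` and `pstdev`. The function names are illustrative. The squared deviations from mu = 49 sum to 702, so sigma squared = 702 / 7.

```python
from math import sqrt
from statistics import pstdev, pvariance

def population_variance(values):
    """sigma^2 = sum((x - mu)^2) / N, dividing by N (the population size)."""
    n = len(values)
    mu = sum(values) / n
    return sum((x - mu) ** 2 for x in values) / n

def population_std(values):
    """sigma = square root of the population variance."""
    return sqrt(population_variance(values))

# The seven employees are the whole population of interest here.
ages = [53, 32, 61, 57, 39, 44, 57]
print(round(population_variance(ages), 2))  # 100.29
print(round(population_std(ages), 2))       # 10.01
# Standard-library equivalents, for comparison.
print(round(pvariance(ages), 2), round(pstdev(ages), 2))
```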
Sample Variance and Standard Deviation
The sample variance of a sample data set of n entries is
Sample variance (s squared): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
The sample standard deviation of a sample data set of n entries is the square root of the sample variance.
Sample standard deviation (s): $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
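The sample formulas divide by n - 1 instead of N. The minimal Python sketch below treats each patient's three pulse readings from Example 2 as a sample, for illustration. The function name is illustrative, and the results are checked against `statistics.variance` and `statistics.stdev`. Both patients have mean 74, but s is 2 for patient A and about 16.09 for patient B, which matches "stable" versus "fluctuates widely".

```python
from math import sqrt
from statistics import stdev, variance

def sample_variance(values):
    """s^2 = sum((x - x_bar)^2) / (n - 1)."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two entries")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)

patient_a = [72, 76, 74]
patient_b = [72, 91, 59]
for name, readings in [("A", patient_a), ("B", patient_b)]:
    s2 = sample_variance(readings)
    print(name, s2, round(sqrt(s2), 2))
# A 4.0 2.0
# B 259.0 16.09
```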
Interpreting Standard Deviation
When interpreting standard deviation, remember that it is a measure of the typical amount an entry deviates from the mean. The more the entries are spread out, the greater the standard deviation.
Figure: two histograms (Frequency vs. Data value), each with mean x-bar = 4: one with spread-out entries and s = 1.18, and one in which every entry equals 4, so s = 0.
Measures of Position
Quartiles
The three quartiles, Q1, Q2, and Q3, approximately divide an ordered data set into four equal parts: about 25% of the data lie at or below Q1, 50% at or below Q2 (the median), and 75% at or below Q3.
Q1 is the median of the data below Q2.
Q3 is the median of the data above Q2.
Finding Quartiles
Example: The quiz scores for 15 students are listed below. Find the first, second and third quartiles of the scores.
28 43 48 51 43 30 55 44 48 33 45 37 37 42 38
Order the data.
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55
Q2 = 43 is the median. The lower half (28 30 33 37 37 38 42) has median Q1 = 37, and the upper half (43 44 45 48 48 51 55) has median Q3 = 48.
About one fourth of the students score 37 or less; about one half score 43 or less; and about three fourths score 48 or less.
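The median-of-halves method used in this example can be sketched in Python; the function name `quartiles` is illustrative. Several conventions exist for computing quartiles. For these 15 scores, the standard library's `statistics.quantiles`, with its default "exclusive" method, happens to give the same values, but the conventions can disagree on other data sets.

```python
from statistics import median, quantiles

def quartiles(data):
    """Q1, Q2, Q3 by the median-of-halves method.

    When the number of entries is odd, the median itself is left out of
    both halves. Needs at least two entries."""
    xs = sorted(data)
    n = len(xs)
    half = n // 2
    lower = xs[:half]
    upper = xs[half + 1:] if n % 2 == 1 else xs[half:]
    return median(lower), median(xs), median(upper)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
print(quartiles(scores))       # (37, 43, 48)
print(quantiles(scores, n=4))  # [37.0, 43.0, 48.0]
```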
Interquartile Range
The interquartile range (IQR) of a data set is the difference between the third and first quartiles.
Interquartile range (IQR) = Q3 - Q1
Example:
The quartiles for 15 quiz scores are listed below. Find the interquartile range.
Q1 = 37 Q2 = 43 Q3 = 48
IQR = Q3 - Q1 = 48 - 37 = 11
The quiz scores in the middle portion of the data set vary by at most 11 points.
Box and Whisker Plot (boxplot)
A box-and-whisker plot is an exploratory data analysis tool
that highlights the important features of a data set.
The five-number summary is used to draw the graph.
The minimum entry
Q1
Q2 (median)
Q3
The maximum entry
Example:
Use the data from the 15 quiz scores to draw a box-and-whisker plot.
28 30 33 37 37 38 42 43 43 44 45 48 48 51 55
Box and Whisker Plot
Five-number summary
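The minimum entry: 28
Q1: 37
Q2 (median): 43
Q3: 48
The maximum entry: 55
Figure: box-and-whisker plot of the quiz scores on an axis running from 28 to 56; the whiskers extend to 28 and 55, and the box spans Q1 = 37 to Q3 = 48 with a line at the median, 43.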
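The five-number summary that a box-and-whisker plot is drawn from can be computed in a few lines, with the IQR as the length of the box. The Python sketch below takes the quartiles from `statistics.quantiles` with its default "exclusive" method. For these 15 quiz scores, that method matches the values listed on the slide. The function name `five_number_summary` is illustrative.

```python
from statistics import quantiles

def five_number_summary(data):
    """(minimum, Q1, median, Q3, maximum).

    Quartiles come from statistics.quantiles with its default 'exclusive'
    method; other quartile conventions can give slightly different
    values on some data sets."""
    xs = sorted(data)
    q1, q2, q3 = quantiles(xs, n=4)
    return xs[0], q1, q2, q3, xs[-1]

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
low, q1, q2, q3, high = five_number_summary(scores)
print(low, q1, q2, q3, high)  # 28 37.0 43.0 48.0 55
print("IQR =", q3 - q1)       # IQR = 11.0
```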
Parts of a boxplot
Percentiles and Deciles
Percentiles divide an ordered data set into 100 parts. There are 99 percentiles: P1, P2, P3, ..., P99.
Deciles divide an ordered data set into 10 parts. There are 9 deciles: D1, D2, D3, ..., D9.
Example: A test score at the 80th percentile (P80, which is the same point as D8) is greater than 80% of the test scores and less than or equal to the remaining 20%.
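A score's percentile rank can be computed as the percentage of entries below it. The cut points themselves can be obtained from `statistics.quantiles`. The minimal Python sketch below uses the quiz scores from the quartile example. The function name `percentile_rank` is illustrative, and "strictly below" is only one of several conventions in use.

```python
from statistics import quantiles

def percentile_rank(data, score):
    """Percentage of entries strictly below `score`.

    One of several conventions; others also count half of the entries
    equal to `score`."""
    below = sum(1 for x in data if x < score)
    return 100 * below / len(data)

scores = [28, 43, 48, 51, 43, 30, 55, 44, 48, 33, 45, 37, 37, 42, 38]
print(round(percentile_rank(scores, 48), 1))  # 73.3

deciles = quantiles(scores, n=10)       # 9 cut points: D1 ... D9
percentiles = quantiles(scores, n=100)  # 99 cut points: P1 ... P99
print(len(deciles), len(percentiles))   # 9 99
# D8 and P80 mark the same 80% cut point.
print(deciles[7], percentiles[79])      # 48.0 48.0
```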
Design matrix
Sex Age Smoke Country Married
Female 23 Yes USA Yes
Male 43 Yes Colombia Yes
Male 19 No Brazil Yes
Male 23 Yes Brazil No
Female 56 No Canada Yes
Female 78 Yes USA Yes
Male 54 No Spain No
Male 76 Yes Colombia No
Female 43 No Peru Yes
5 Variables
10 Individuals
Dimension 10 x 5
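A design matrix like this one can be stored as a list of rows, with one row per individual and one column per variable. Its dimension is then (number of individuals) x (number of variables). The minimal Python sketch below uses only the first three rows of the table, with "No" for negative answers. The helper name `dimension` is illustrative.

```python
# First three rows of the design matrix above (one row per individual).
columns = ["Sex", "Age", "Smoke", "Country", "Married"]
rows = [
    ["Female", 23, "Yes", "USA", "Yes"],
    ["Male", 43, "Yes", "Colombia", "Yes"],
    ["Male", 19, "No", "Brazil", "Yes"],
]

def dimension(matrix):
    """(number of individuals, number of variables) of a list of rows."""
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    if any(len(row) != n_cols for row in matrix):
        raise ValueError("every individual needs a value for every variable")
    return n_rows, n_cols

print(dimension(rows))  # (3, 5)

# A column holds all observations of one variable.
ages = [row[columns.index("Age")] for row in rows]
print(ages)  # [23, 43, 19]
```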