lecture 4: measure of dispersion - university of new...
TRANSCRIPT
Lecture 4 Measure of Dispersion
Donglei Du(dduunbedu)
Faculty of Business Administration University of New Brunswick NB Canada FrederictonE3B 9Y2
Donglei Du (UNB) ADM 2623 Business Statistics 1 59
Table of contents1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 2 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 3 59
Introduction
Dispersion (aka variability scatter or spread)) characterizes howstretched or squeezed of the data
A measure of statistical dispersion is a nonnegative real number thatis zero if all the data are the same and increases as the data becomemore diverse
Dispersion is contrasted with location or central tendency andtogether they are the most used properties of distributions
There are many types of dispersion measures
RangeMean Absolute DeviationVarianceStandard Deviation
Donglei Du (UNB) ADM 2623 Business Statistics 4 59
Range
Range = maxminusmin
Donglei Du (UNB) ADM 2623 Business Statistics 5 59
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Table of contents1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 2 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 3 59
Introduction
Dispersion (aka variability scatter or spread)) characterizes howstretched or squeezed of the data
A measure of statistical dispersion is a nonnegative real number thatis zero if all the data are the same and increases as the data becomemore diverse
Dispersion is contrasted with location or central tendency andtogether they are the most used properties of distributions
There are many types of dispersion measures
RangeMean Absolute DeviationVarianceStandard Deviation
Donglei Du (UNB) ADM 2623 Business Statistics 4 59
Range
Range = maxminusmin
Donglei Du (UNB) ADM 2623 Business Statistics 5 59
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 3 59
Introduction
Dispersion (aka variability scatter or spread)) characterizes howstretched or squeezed of the data
A measure of statistical dispersion is a nonnegative real number thatis zero if all the data are the same and increases as the data becomemore diverse
Dispersion is contrasted with location or central tendency andtogether they are the most used properties of distributions
There are many types of dispersion measures
RangeMean Absolute DeviationVarianceStandard Deviation
Donglei Du (UNB) ADM 2623 Business Statistics 4 59
Range
Range = maxminusmin
Donglei Du (UNB) ADM 2623 Business Statistics 5 59
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Introduction
Dispersion (aka variability scatter or spread)) characterizes howstretched or squeezed of the data
A measure of statistical dispersion is a nonnegative real number thatis zero if all the data are the same and increases as the data becomemore diverse
Dispersion is contrasted with location or central tendency andtogether they are the most used properties of distributions
There are many types of dispersion measures
RangeMean Absolute DeviationVarianceStandard Deviation
Donglei Du (UNB) ADM 2623 Business Statistics 4 59
Range
Range = maxminusmin
Donglei Du (UNB) ADM 2623 Business Statistics 5 59
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Range
Range = maxminusmin
Donglei Du (UNB) ADM 2623 Business Statistics 5 59
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example Find the range of the following data 10 7 6 2 5 4 8 1
Range = 10minus 1 = 9
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtrange(sec_01A)
[1] 130 189
Range = 189minus 130 = 59
Donglei Du (UNB) ADM 2623 Business Statistics 6 59
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Properties of Range
Only two values are used in its calculation
It is influenced by an extreme value (non-robust)
It is easy to compute and understand
Donglei Du (UNB) ADM 2623 Business Statistics 7 59
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Mean Absolute Deviation
The Mean Absolute Deviation of a set of n numbers
MAD =|x1 minus micro|+ + |xn minus micro|
n
|x ‐ ||x1 ‐ |
|x ‐ |
|x3 ‐ |
|x4 ‐ |
x1 x4x2 x3
|x2 |
Donglei Du (UNB) ADM 2623 Business Statistics 8 59
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example A sample of four executives received the following bonuseslast year ($000) 140 150 170 160
Problem Determine the MAD
Solution
x =14 + 15 + 17 + 16
4=
62
4= 155
MAD =|14minus 155|+ |15minus 155|+ |17minus 155|+ |16minus 155|
4
=4
4= 1
Donglei Du (UNB) ADM 2623 Business Statistics 9 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
gtmad(sec_01A center=mean(sec_01A)constant = 1)
[1] 65
Donglei Du (UNB) ADM 2623 Business Statistics 10 59
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Properties of MAD
All values are used in the calculation
It is not unduly influenced by large or small values (robust)
The absolute values are difficult to manipulate
Donglei Du (UNB) ADM 2623 Business Statistics 11 59
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Variance
The variance of a set of n numbers as population
Var = σ2 =(x1 minus micro)2 + + (xn minus micro)2
n(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
n(computational formula)
The variance of a set of n numbers as sample
S2 =(x1 minus micro)2 + + (xn minus micro)2
nminus 1(conceptual formula)
=
nsumi=1
x2i minus
(nsum
i=1xi
)2
n
nminus 1(computational formula)
Donglei Du (UNB) ADM 2623 Business Statistics 12 59
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Standard Deviation
The Standard Deviation is the square root of variance
Other notations σ for population and S for sample
Donglei Du (UNB) ADM 2623 Business Statistics 13 59
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example Conceptual formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the mean variance and Standard Deviation
Solution Mean and variance
micro =10 + 11 + 13
3=
34
3asymp 1133333333333
σ2 asymp (10minus 1133)2 + (11minus 1133)2 + (13minus 1133)2
3
=1769 + 01089 + 27889
3=
46668
3= 15556
Standard Deviationσ asymp 1247237
Donglei Du (UNB) ADM 2623 Business Statistics 14 59
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example Computational formula
Example The hourly wages earned by three students are $10 $11$13
Problem Find the variance and Standard Deviation
Solution
Variance
σ2 =(102 + 112 + 132)minus (10+11+13)2
3
3
=390minus 1156
3
3=
390minus 38533
3=
467
3= 1555667
Standard Deviationσ asymp 1247665
If the above is sample then σ2 asymp 233335 and σ asymp 1527531
Donglei Du (UNB) ADM 2623 Business Statistics 15 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Sample Variance
gtvar(sec_01A)
[1] 1463228
if you want the population variance
gtvar(sec_01A)(length(sec_01A)-1)(length(sec_01A))
[1] 1439628
Sample standard deviation
gtsd(sec_01A)
[1] 120964
if you want the population SD
gtsd(sec_01A)sqrt((length(sec_01A)-1)(length(sec_01A)))
[1] 1199845
Donglei Du (UNB) ADM 2623 Business Statistics 16 59
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Note
Conceptual formula may have accumulated rounding error
Computational formula only has rounding error towards the end
Donglei Du (UNB) ADM 2623 Business Statistics 17 59
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Properties of Variancestandard deviation
All values are used in the calculation
It is not extremely influenced by outliers (non-robust)
The units of variance are awkward the square of the original unitsTherefore standard deviation is more natural since it recovers heoriginal units
Donglei Du (UNB) ADM 2623 Business Statistics 18 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 19 59
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
VarianceStandard Deviation for Grouped Data
The variance of a sample of data organized in a frequency distributionis computed by the following formula
S2 =
ksumi=1
fix2i minus
(ksumi1fixi
)2
n
nminus 1
where fi is the class frequency and xi is the class midpoint for Classi = 1 k
Donglei Du (UNB) ADM 2623 Business Statistics 20 59
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example Recall the weight example from Chapter 2
class freq (fi) mid point (xi) fixi fix2i
[130 140) 3 135 405 54675
[140 150) 12 145 1740 252300
[150 160) 23 155 3565 552575
[160 170) 14 165 2310 381150
[170 180) 6 175 1050 183750
[180 190] 4 185 740 136900
62 9810 1561350
Solution The VarianceStandard Deviation are
S2 =1 561 350minus 98102
62
62minus 1asymp 1500793
S asymp 1225069
The real sample varianceSD for the raw data is 1463228120964
Donglei Du (UNB) ADM 2623 Business Statistics 21 59
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Range for grouped data
The range of a sample of data organized in a frequency distribution iscomputed by the following formula
Range = upper limit of the last class - lower limit of the first class
Donglei Du (UNB) ADM 2623 Business Statistics 22 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 23 59
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Variation (CV)
The Coefficient of Variation (CV) for a data set defined as the ratioof the standard deviation to the mean
CV =
σmicro for populationSx for sample
It is also known as unitized risk or the variation coefficient
The absolute value of the CV is sometimes known as relative standarddeviation (RSD) which is expressed as a percentage
It shows the extent of variability in relation to mean of the population
It is a normalized measure of dispersion of a probability distribution orfrequency distributionCV is closely related to some important concepts in other disciplines
The reciprocal of CV is related to the Sharpe ratio in FinanceWilliam Forsyth Sharpe is the winner of the 1990 Nobel Memorial Prizein Economic Sciences
The reciprocal of CV is one definition of the signal-to-noise ratio inimage processing
Donglei Du (UNB) ADM 2623 Business Statistics 24 59
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example Rates of return over the past 6 years for two mutual fundsare shown below
Fund A 83 -60 189 -57 236 20
Fund B 12 -48 64 102 253 14
Problem Which fund has higher riskSolution The CVrsquos are given as
CVA =1319
985= 134
CVB =1029
842= 122
There is more variability in Fund A as compared to Fund BTherefore Fund A is riskierAlternatively if we look at the reciprocals of CVs which are theSharpe ratios Then we always prefer the fund with higher Sharperatio which says that for every unit of risk taken how much returnyou can obtainDonglei Du (UNB) ADM 2623 Business Statistics 25 59
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Properties of Coefficient of Variation
All values are used in the calculation
It is only defined for ratio level of data
The actual value of the CV is independent of the unit in which themeasurement has been taken so it is a dimensionless number
For comparison between data sets with different units orwidely different means one should use the coefficient of variationinstead of the standard deviation
Donglei Du (UNB) ADM 2623 Business Statistics 26 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 27 59
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Skewness
Skewness is a measure of the extent to which a probability distributionof a real-valued random variable rdquoleansrdquo to one side of the mean Theskewness value can be positive or negative or even undefined
The Coefficient of Skewness for a data set
Skew = E[(Xminusmicro
σ
)3 ]=micro3
σ3
where micro3 is the third moment about the mean micro σ is the standarddeviation and E is the expectation operator
Donglei Du (UNB) ADM 2623 Business Statistics 28 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
Skewness
gtskewness(sec_01A)
[1] 04267608
Donglei Du (UNB) ADM 2623 Business Statistics 29 59
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Skewness Risk
Skewness risk is the risk that results when observations are not spreadsymmetrically around an average value but instead have a skeweddistribution
As a result the mean and the median can be different
Skewness risk can arise in any quantitative model that assumes asymmetric distribution (such as the normal distribution) but is appliedto skewed data
Ignoring skewness risk by assuming that variables are symmetricallydistributed when they are not will cause any model to understate therisk of variables with high skewness
Reference Mandelbrot Benoit B and Hudson Richard L The(mis)behaviour of markets a fractal view of risk ruin and reward2004
Donglei Du (UNB) ADM 2623 Business Statistics 30 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 31 59
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Excess Kurtosis Kurt
Kurtosis (meaning curved arching) is a measure of the rdquopeakednessrdquoof the probability distribution
The Excess Kurtosis for a data set
Kurt =E[(X minus micro)4]
(E[(X minus micro)2])2minus 3 =
micro4
σ4minus 3
where micro4 is the fourth moment about the mean and σ is the standarddeviationThe Coefficient of excess kurtosis provides a comparison of the shapeof a given distribution to that of the normal distribution
Negative excess kurtosis platykurticsub-Gaussian A low kurtosisimplies a more rounded peak and shorterthinner tails such asuniform Bernoulli distributionsPositive excess kurtosis leptokurticsuper-Gaussian A high kurtosisimplies a sharper peak and longerfatter tails such as the Studentrsquost-distribution exponential Poisson and the logistic distributionZero excess kurtosis are called mesokurtic or mesokurtotic such asNormal distribution
Donglei Du (UNB) ADM 2623 Business Statistics 32 59
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example the weight example (weightcsv)
The R code
gtweight lt- readcsv(weightcsv)
gtsec_01Alt-weight$Weight01A2013Fall
kurtosis
gtkurtosis(sec_01A)
[1] -006660501
Donglei Du (UNB) ADM 2623 Business Statistics 33 59
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Kurtosis Risk
Kurtosis risk is the risk that results when a statistical model assumesthe normal distribution but is applied to observations that do notcluster as much near the average but rather have more of a tendencyto populate the extremes either far above or far below the average
Kurtosis risk is commonly referred to as rdquofat tailrdquo risk The rdquofat tailrdquometaphor explicitly describes the situation of having moreobservations at either extreme than the tails of the normaldistribution would suggest therefore the tails are rdquofatterrdquo
Ignoring kurtosis risk will cause any model to understate the risk ofvariables with high kurtosis
Donglei Du (UNB) ADM 2623 Business Statistics 34 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 35 59
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Chebyshevrsquos Theorem
Given a data set with mean micro and standard deviation σ for anyconstant k the minimum proportion of the values that lie within kstandard deviations of the mean is at least
1minus 1
k2
2
11k
+k‐k
kk
+k k
Donglei Du (UNB) ADM 2623 Business Statistics 36 59
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Given a data set of five values 140 150 170 160 150
Calculate mean and standard deviation micro = 154 and σ asymp 10198
For k = 11 there should be at least
1minus 1
112asymp 17
within the interval
[microminus kσ micro+ kσ] = [154minus 11(10198) 154 + 11(10198)]
= [1428 1652]
Let us count the real percentage Among the five three of which(namely 150 150 160) are in the above interval Therefore thereare three out of five ie 60 percent which is greater than 17percent So the theorem is correct though not very accurate
Donglei Du (UNB) ADM 2623 Business Statistics 37 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 38 59
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Example
Example The average of the marks of 65 students in a test was704 The standard deviation was 24
Problem 1 About what percent of studentsrsquo marks were between656 and 752
Solution Based on Chebychevrsquos Theorem 75 since656 = 704minus (2)(24) and 752 = 704 + (2)(24)
Problem 2 About what percent of studentsrsquo marks were between632 and 776
Solution Based on Chebychevrsquos Theorem 889 since632 = 704minus (3)(24) and 776 = 704 + (3)(24)
Donglei Du (UNB) ADM 2623 Business Statistics 39 59
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
The Empirical rule
We can obtain better estimation if we know more
For any symmetrical bell-shaped distribution (ie Normaldistribution)
About 68 of the observations will lie within 1 standard deviation ofthe meanAbout 95 of the observations will lie within 2 standard deviation ofthe meanVirtually all the observations (997) will be within 3 standarddeviation of the mean
Donglei Du (UNB) ADM 2623 Business Statistics 40 59
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
The Empirical rule
Donglei Du (UNB) ADM 2623 Business Statistics 41 59
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Comparing Chebyshevrsquos Theorem with The Empirical rule
k Chebyshevrsquos Theorem The Empirical rule
1 0 682 75 953 889 997
Donglei Du (UNB) ADM 2623 Business Statistics 42 59
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
An Approximation for Range
Based on the last claim in the empirical rule we can approximate therange using the following formula
Range asymp 6σ
Or more often we can use the above to estimate standard deviationfrom range (such as in Project Management or QualityManagement)
σ asymp Range
6
Example If the range for a normally distributed data is 60 given aapproximation of the standard deviation
Solution σ asymp 606 = 10
Donglei Du (UNB) ADM 2623 Business Statistics 43 59
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Why you should check the normal assumption first
Model risk is everywhere
Black Swans are more than you think
Donglei Du (UNB) ADM 2623 Business Statistics 44 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 45 59
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Correlation Scatterplot for the weight vsheight collected form the guessing game in the first class
Coefficient of Correlation is a measure of the linear correlation(dependence) between two sets of data x = (x1 xn) andy = (y1 yn)
A scatter plot pairs up values of two quantitative variables in a dataset and display them as geometric points inside a Cartesian diagram
Donglei Du (UNB) ADM 2623 Business Statistics 46 59
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Scatterplot Example
160 170 180 190
130
140
150
160
170
180
190
Weight
Hei
ght
The R code
gtweight_height lt- readcsv(weight_height_01Acsv)
gtplot(weight_height$weight_01A weight_height$height_01A
xlab=Weight ylab=Height)
Donglei Du (UNB) ADM 2623 Business Statistics 47 59
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Correlation r
Pearsonrsquos correlation coefficient between two variables is defined asthe covariance of the two variables divided by the product of theirstandard deviations
rxy =
sumni=1(Xi minus X)(Yi minus Y )radicsumn
i=1(Xi minus X)2radicsumn
i=1(Yi minus Y )2
=
nnsumi=1
xiyi minusnsumi=1
xinsumi=1
yiradicn
nsumi=1
xi minus(n
nsumi=1
xi
)2radicn
nsumi=1
yi minus(n
nsumi=1
yi
)2
Donglei Du (UNB) ADM 2623 Business Statistics 48 59
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Correlation Interpretation
Donglei Du (UNB) ADM 2623 Business Statistics 49 59
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Coefficient of Correlation Example
For the the weight vs height collected form the guessing game in thefirst class
The Coefficient of Correlation is
r asymp 02856416
Indicated weak positive linear correlation
The R code
gt weight_height lt- readcsv(weight_height_01Acsv)
gt cor(weight_height$weight_01A weight_height$height_01A
use = everything method = pearson)
[1] 02856416
Donglei Du (UNB) ADM 2623 Business Statistics 50 59
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Layout1 Measure of Dispersion scale parameter
IntroductionRangeMean Absolute DeviationVariance and Standard DeviationRange for grouped dataVarianceStandard Deviation for Grouped DataRange for grouped data
2 Coefficient of Variation (CV)3 Coefficient of Skewness (optional)
Skewness Risk4 Coefficient of Kurtosis (optional)
Kurtosis Risk5 Chebyshevrsquos Theorem and The Empirical rule
Chebyshevrsquos TheoremThe Empirical rule
6 Correlation Analysis7 Case study
Donglei Du (UNB) ADM 2623 Business Statistics 51 59
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 1 Wisdom of the Crowd
The aggregation of information in groups resultsin decisions that are often better than could havebeen made by any single member of the group
We collect some data from the guessing game in the first class Theimpression is that the crowd can be very wise
The question is when the crowd is wise We given a mathematicalexplanation based on the concepts of mean and standard deviation
For more explanations please refer to Surowiecki James (2005)The Wisdom of Crowds Why the Many Are Smarter Than the Fewand How Collective Wisdom Shapes Business Economies Societiesand Nations
Donglei Du (UNB) ADM 2623 Business Statistics 52 59
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 1 Wisdom of the Crowd Mathematical explanation
Let xi (i = 1 n) be the individual predictions and let θ be thetrue value
The crowdrsquos overall prediction
micro =
nsumi=1
xi
n
The crowdrsquos (squared) error
(microminus θ)2 =
nsumi=1
xi
nminus θ
2
=
nsumi=1
(xi minus θ)2
nminus
nsumi=1
(xi minus micro)2
n
= Biasminus σ2 = Bias - Variance
Diversity of opinion Each person should have private informationindependent of each other
Donglei Du (UNB) ADM 2623 Business Statistics 53 59
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 Friendship paradox Why are your friends morepopular than you
On average most people have fewer friends thantheir friends have Scott L Feld (1991)
References Feld Scott L (1991) rdquoWhy your friends have morefriends than you dordquo American Journal of Sociology 96 (6)1464-147
Donglei Du (UNB) ADM 2623 Business Statistics 54 59
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 Friendship paradox An example
References [hyphen]httpwwweconomistcomblogseconomist-explains201304economist-explains-why-friends-more-popular-paradox
Donglei Du (UNB) ADM 2623 Business Statistics 55 59
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 Friendship paradox An example
Person degree
Alice 1Bob 3
Chole 2Dave 2
Individualrsquos average number of friends is
micro =8
4= 2
Individualrsquos friendsrsquo average number of friends is
12 + 22 + 22 + 32
1 + 2 + 2 + 3=
18
8= 225
Donglei Du (UNB) ADM 2623 Business Statistics 56 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 57 59
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 Friendship paradox Mathematical explanation
Let xi (i = 1 n) be the degrees of a network
Individualrsquos average number of friends is
micro =
nsumi=1
xi
n
Individualrsquos friendsrsquo average number of friends is
nsumi=1
x2i
nsumi=1
xi
= micro+σ2
micro
Donglei Du (UNB) ADM 2623 Business Statistics 58 59
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-
Case 2 A video by Nicholas Christakis How socialnetworks predict epidemics
Here is the youtube link
Donglei Du (UNB) ADM 2623 Business Statistics 59 59
- Measure of Dispersion scale parameter
-
- Introduction
- Range
- Mean Absolute Deviation
- Variance and Standard Deviation
- Range for grouped data
- VarianceStandard Deviation for Grouped Data
- Range for grouped data
-
- Coefficient of Variation (CV)
- Coefficient of Skewness (optional)
-
- Skewness Risk
-
- Coefficient of Kurtosis (optional)
-
- Kurtosis Risk
-
- Chebyshevs Theorem and The Empirical rule
-
- Chebyshevs Theorem
- The Empirical rule
-
- Correlation Analysis
- Case study
-