MGMT 276: Statistical Inference in Management
Spring, 2014
Green sheets
My last name starts with a letter somewhere between
A. A – D
B. E – L
C. M – R
D. S – Z
Please click in
Schedule of readings
Before next exam (February 18th):
Please read chapters 1 - 4 & Appendix D & E in Lind
Please read Chapters 1, 5, 6 and 13 in Plous
Chapter 1: Selective Perception
Chapter 5: Plasticity
Chapter 6: Effects of Question Wording and Framing
Chapter 13: Anchoring and Adjustment
By the end of lecture today 2/6/14
Use this as your study guide
Correlational methodology
Strength of correlation versus direction
Positive vs Negative correlation
Strong vs Moderate vs Weak correlation
Characteristics of a distribution
Remember to hold onto homework until we have a chance to cover it
Review of Homework Worksheet
Count   Proportion   Percent   Out of 1,000,000
130     .10          10        100,000
104     .08          8         80,000
325     .25          25        250,000
455     .35          35        350,000
286     .22          22        220,000
Notice Gillian asked 1300 people
130+104+325+455+286=1300
130/1300 = .10
.10x100=10
.10 x 1,000,000 = 100,000
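The worksheet arithmetic above can be sketched in a few lines of Python. This is only an illustration: the five counts are the ones summed on the slide, and the variable names are made up for the example.

```python
# Counts for the five response categories on the worksheet (they sum to 1300).
counts = [130, 104, 325, 455, 286]
total = sum(counts)

# proportion = count / total
# percent    = proportion x 100
# projected  = proportion x 1,000,000 (the figure used on the worksheet)
proportions = [c / total for c in counts]
percents = [round(p * 100) for p in proportions]
projected = [round(p * 1_000_000) for p in proportions]

for c, p, pct, proj in zip(counts, proportions, percents, projected):
    print(f"{c:>4}  {p:.2f}  {pct:>3}%  {proj:>9,}")
```

Rounding is applied only to tidy up floating-point noise; the proportions themselves are exact two-decimal values here.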
Review of Homework Worksheet
=correl(A2:A11,B2:B11) = -0.9226648007
Strong
Negative
Down
-0.9227
This shows a strong negative relationship (r = -0.92) between the amount spent on snacks and the age of the moviegoer
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Correlation r (actual number)
Scatterplot displays relationships between two continuous variables
Correlation: Measure of how two variables co-occur and also can be used for prediction
Range between -1 and +1
The closer to zero the weaker the relationship and the worse the prediction
Positive or negative
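For anyone who wants to see what a correlation function like Excel's =correl is computing, here is a minimal Python sketch of the Pearson correlation. The function name pearson_r and the small data set are made up for illustration; they are not the homework data.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: co-variation of x and y, scaled to fall between -1 and +1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products of deviations from the means
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Sums of squared deviations for each variable
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: as age goes up, snack spending goes down
ages = [10, 20, 30, 40, 50]
snack_spend = [9.0, 8.0, 6.5, 4.0, 3.0]

r = pearson_r(ages, snack_spend)
direction = "positive" if r > 0 else "negative" if r < 0 else "zero"
print(f"r = {r:.2f} ({direction})")
```

The sign of r gives the direction, and how far r sits from zero gives the strength.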
Correlation - How do numerical values change?
Let’s estimate the correlation coefficient for each of the following
r = +1.0 r = -1.0 r = +.80
r = -.50 r = 0.0
http://neyman.stat.uiuc.edu/~stat100/cuwu/Games.html
http://argyll.epsb.ca/jreed/math9/strand4/scatterPlot.htm
r = +0.97
This shows a strong positive relationship (r = 0.97) between the appraised price of the house and its eventual sales price
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
r = -0.48
This shows a moderate negative relationship (r = -0.48) between the amount of pectin in orange juice and its sweetness
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
r = -0.91
This shows a strong negative relationship (r = -0.91) between the distance that a golf ball is hit and the accuracy of the drive
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
r = 0.61
This shows a moderate positive relationship (r = 0.61) between the length of stay in a hospital and the number of services provided
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
[Scatterplot: Height of Mothers (in) versus Height of Daughters (inches); axis ticks in inches from 48 to 76]
This shows the strong positive (r = +0.8) relationship between the heights of daughters (in inches) and the heights of their mothers (in inches).
Both axes and values are labeled
Both axes have real numbers listed
Variable name is listed clearly (on each axis)
Description includes:
Both variables
Strength (weak, moderate, strong)
Direction (positive, negative)
Estimated value (actual number)
Break into groups of 2 or 3
Each person hand in own worksheet. Be sure to list your name and names of all others in your group
Use examples that are different from those in lecture
1. Describe one positive correlation. Draw a scatterplot (label axes)
2. Describe one negative correlation. Draw a scatterplot (label axes)
3. Describe one zero correlation. Draw a scatterplot (label axes)
4. Describe one perfect correlation (positive or negative). Draw a scatterplot (label axes)
5. Describe a curvilinear relationship. Draw a scatterplot (label axes)
Hand in your homework
Must be complete and must be stapled
Sample versus census
How is a census different from a sample?
Census measures each person in the specific population
Sample measures a subset of the population and infers about the population – representative sample is good
What’s better?
Use of existing survey data
U.S. Census
Family size, fertility, occupation
The General Social Survey
Surveys a sample of US citizens on over 1,000 items
Same questions asked each year
You’ve completed constructing your questionnaire… what’s the best way to get responders?
Population (census) versus sample
Parameter versus statistic
Parameter – Measurement or characteristic of the population
Usually unknown (only estimated)
Usually represented by Greek letters (µ, pronounced “mu” or “mew”)
Statistic – Numerical value calculated from a sample
Usually represented by Roman letters (x̄, pronounced “x bar”)
Simple random sampling: each person from the population has an equal probability of being included
Sample frame = how you define population
Let’s take a sample… a random sample
Question: Average weight of U of A football player
Sample frame: population of the U of A football team
Random number table – List of random numbers
Pick 64th name on the list (64 is just an example here)
Or, you can use Excel to provide a number for the random sample: =RANDBETWEEN(1,115)
Pick 24th name on the list
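The same idea can be sketched in Python. This is an illustration only: the frame is just the roster positions 1 to 115 (matching the range in the Excel formula), and the sample size of 10 and the seed are arbitrary choices for the example.

```python
import random

# Hypothetical sampling frame: positions 1..115 on a roster list
frame = list(range(1, 116))

# Fixed seed only so the sketch gives the same picks each run;
# drop the seed for a fresh random draw.
rng = random.Random(276)

# One random position, the counterpart of =RANDBETWEEN(1,115)
one_pick = rng.randint(1, 115)

# A simple random sample of 10 positions, drawn without replacement,
# so every position has an equal chance of being included
sample = rng.sample(frame, k=10)

print("Pick name number", one_pick)
print("Sample of 10 positions:", sorted(sample))
```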
Systematic random sampling: A probability sampling technique that involves selecting every kth person from a sampling frame
You pick the number
Other examples of systematic random sampling
1) check every 2000th light bulb
2) survey every 10th voter
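A minimal Python sketch of taking every kth person from a frame. The function name systematic_sample, the frame of 115 positions and k = 10 are assumptions for illustration. Starting at a random position within the first k is a common convention that the slide does not spell out.

```python
import random

def systematic_sample(frame, k, rng):
    """Take every kth element of the frame, starting at a random spot among the first k."""
    start = rng.randrange(k)      # random starting index 0..k-1 (an assumed convention)
    return frame[start::k]        # then every kth element after that

# Hypothetical frame: positions 1..115 on a list
frame = list(range(1, 116))
rng = random.Random(276)          # fixed seed only for repeatability

picked = systematic_sample(frame, 10, rng)
print(picked)
```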
Stratified sampling: sampling technique that involves dividing a population into subgroups (or strata) and then selecting samples from each of these groups
- sampling technique can maintain ratios for the different groups
Average number of speeding tickets
17.7% of sample are Pre-business majors
4.6% of sample are Psychology majors
2.8% of sample are Biology majors
2.4% of sample are Architecture majors
etc
Average cost for text books for a semester
12% of sample is from California
7% of sample is from Texas
6% of sample is from Florida
6% from New York
4% from Illinois
4% from Ohio
4% from Pennsylvania
3% from Michigan
etc
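Here is one way to sketch a stratified sample that keeps the group ratios, in Python. Everything in it is hypothetical: the function name stratified_sample, the made-up population of 1,000 students in three majors, and the 10% sampling fraction.

```python
import random

def stratified_sample(population, stratum_of, fraction, rng):
    """Sample the same fraction from every stratum so the group ratios are maintained."""
    strata = {}
    for person in population:
        strata.setdefault(stratum_of(person), []).append(person)
    sample = []
    for members in strata.values():
        n = round(len(members) * fraction)   # stratum's share of the sample
        sample.extend(rng.sample(members, n))
    return sample

# Hypothetical population: (major, student id) pairs
population = (
    [("Pre-business", i) for i in range(600)]
    + [("Psychology", i) for i in range(300)]
    + [("Biology", i) for i in range(100)]
)
rng = random.Random(276)                     # fixed seed only for repeatability

sample = stratified_sample(population, lambda person: person[0], 0.10, rng)
for major in ("Pre-business", "Psychology", "Biology"):
    count = sum(1 for person in sample if person[0] == major)
    print(major, count)
```

Because each major is sampled at the same rate, the sample has the same 60/30/10 split as the made-up population.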
Cluster sampling: sampling technique that divides a population into subgroups (or clusters) by region or physical space. Can either measure everyone or select samples for each cluster
Textbook prices: Southwest schools, Midwest schools, Northwest schools, etc
Average student income, survey by: Old Main area, near McClelland, around Main Gate, etc
Patient satisfaction for hospital: 7th floor (near maternity ward), 5th floor (near physical rehab), 2nd floor (near trauma center), etc
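A rough Python sketch of the two options (measure everyone in a cluster, or sample within it). The patient IDs are invented, and randomly choosing which floors to visit is an assumption borrowed from the usual textbook description of cluster sampling rather than something stated on the slide.

```python
import random

# Hypothetical clusters: hospital floor -> patient IDs on that floor
clusters = {
    "7th floor": ["p01", "p02", "p03", "p04"],
    "5th floor": ["p05", "p06", "p07", "p08", "p09"],
    "2nd floor": ["p10", "p11", "p12"],
}
rng = random.Random(276)   # fixed seed only for repeatability

# Assumed step: randomly choose which clusters (floors) to visit
chosen_floors = rng.sample(sorted(clusters), k=2)

# Option 1: measure everyone in the chosen clusters
measure_everyone = [p for floor in chosen_floors for p in clusters[floor]]

# Option 2: select a sample (here 2 patients) within each chosen cluster
sample_within = [
    p for floor in chosen_floors for p in rng.sample(clusters[floor], k=2)
]

print(chosen_floors)
print(measure_everyone)
print(sample_within)
```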
Snowball sampling: a non-random technique in which one or more members of a population are located and used to lead the researcher to other members of the population
Used when we don’t have any other way of finding them - also vulnerable to biases
Convenience sampling: sampling technique that involves sampling people nearby.
A non-random sample and vulnerable to bias
Judgment sampling: sampling technique that involves sampling people who an expert says would be useful.
A non-random sample and vulnerable to bias
Non-random sampling is vulnerable to bias
Overview
Frequency distributions
The normal curve
Mean, Median, Mode, Trimmed Mean
Standard deviation, Variance, Range
Mean Absolute Deviation
Skewed right, skewed left, unimodal, bimodal, symmetric
Challenge yourself as we work through characteristics of distributions to try to categorize each concept as a measure of 1) central tendency, 2) dispersion or 3) shape
Another example: How many kids in your family?
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Measures of Central Tendency (Measures of location)
The mean, median and mode
Measures of “location”: Where on the number line the scores tend to cluster
Mean: The balance point of a distribution. Found by adding up all observations and then dividing by the number of observations
Mean for a sample: Σx / n = mean = x̄
Mean for a population: ΣX / N = mean = µ (mu)
Note:
Σ = add up
x or X = scores
n or N = number of scores
Number of kids in family: 1, 3, 1, 4, 2, 4, 2, 8, 2, 14
Σx / n = 41 / 10 = mean = 4.1
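The sample mean for the family-size data can be checked with a short Python sketch (the variable names are just for illustration):

```python
# Number of kids in each of the 10 families from the slide
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

total = sum(kids)      # Σx: add up all the scores
n = len(kids)          # n: number of scores
mean = total / n       # x̄ = Σx / n

print(f"{total} / {n} = {mean}")
```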
How many kids are in your family?
What is the most common family size?
Median: The middle value when observations are ordered from least to most (or most to least)
1, 3, 1, 4, 2, 4, 2, 8, 2, 14
1, 1, 2, 2, 2, 3, 4, 4, 8, 14
Ordered from most to least: 14, 8, 4, 4, 3, 2, 2, 2, 1, 1
The two middle values are 2 and 3
Median = 2.5
Median always has a percentile rank of 50% regardless of shape of distribution
If there appear to be two medians, take the mean of the two: (2 + 3) / 2 = 2.5
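A small Python sketch of the median rule, including the case with two middle values. The function name median is our own; Python's built-in statistics.median follows the same rule.

```python
def median(values):
    """Middle value of the ordered scores; with two middle values, take their mean."""
    ordered = sorted(values)          # order from least to most
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: a single middle value
        return ordered[mid]
    # even count: mean of the two middle values
    return (ordered[mid - 1] + ordered[mid]) / 2

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]
print(sorted(kids))
print(median(kids))
```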
Mode: The value of the most frequent observation
Number of kids in family (ordered): 1, 1, 2, 2, 2, 3, 4, 4, 8, 14
Score   f
1       2
2       3
3       1
4       2
5       0
6       0
7       0
8       1
9       0
10      0
11      0
12      0
13      0
14      1
Please note: The mode is “2” because it is the most frequently occurring score. It occurs “3” times. “3” is not the mode, it is just the frequency for the value that is the mode
Bimodal distribution: If there are two most frequent observations
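The difference between the mode (the score) and its frequency can be sketched in Python. The helper name modes_of and the second, bimodal data set are made up for illustration.

```python
from collections import Counter

def modes_of(values):
    """Return every score that ties for the highest frequency (two scores = bimodal)."""
    freq = Counter(values)
    top = max(freq.values())                       # highest frequency
    return sorted(score for score, f in freq.items() if f == top)

kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]
freq = Counter(kids)                               # frequency table: score -> f

print("mode(s):", modes_of(kids))                  # the score that occurs most often
print("frequency of that score:", freq[modes_of(kids)[0]])

# Hypothetical bimodal example: two scores tie for most frequent
print("bimodal example:", modes_of([10, 10, 16, 16, 12]))
```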
What about central tendency for qualitative data?
Mode is good for nominal or ordinal data
Median can be used with ordinal data
Mean can be used with interval or ratio data
Measure of central tendency: describes how scores tend to cluster toward the center of the distribution
Normal distribution
In a normal distribution: mode = mean = median
In all distributions:
mode = tallest point
median = middle score
mean = balance point
Positively skewed distribution
In a positively skewed distribution: mode < median < mean
Note: mean is most affected by outliers or skewed distributions
Negatively skewed distribution
In a negatively skewed distribution: mean < median < mode
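As a quick check of these orderings, the family-size data from earlier in the lecture can be run through Python's statistics module. This is a sketch using one small data set, not a general test for skew.

```python
import statistics

# Family-size data from the lecture; the family with 14 kids is an outlier
kids = [1, 3, 1, 4, 2, 4, 2, 8, 2, 14]

mode = statistics.mode(kids)        # tallest point
median = statistics.median(kids)    # middle score
mean = statistics.mean(kids)        # balance point

print(mode, median, mean)

# The outlier pulls the mean up the most, giving the positively skewed pattern
if mode < median < mean:
    print("mode < median < mean: consistent with positive skew")
```

Dropping the 14 and re-running shows how much more the mean moves than the median or mode.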
Mode: The value of the most frequent observation
Bimodal distribution: Distribution with two most frequent observations (2 peaks)
Example: Ian coaches two boys baseball teams. One team is made up of 10-year-olds and the other is made up of 16-year-olds. When he measured the height of all of his players he found a bimodal distribution