1. Introduction to Statistics (Jan 2, 2012)
Statistics for Management
Ramesh [email protected]
Department of Mechanical Engineering, NIT Calicut, Kerala, India - 673 601.
Dedicated to
Professor S. G. Deshmukh
Objectives of this course…
• Appreciate the role of statistics in various decision making situations
• Summarize data with frequency distributions and graphic presentation.
• Interpret descriptive statistics for central tendency, dispersion and location
• Define and interpret probability. Utilize discrete and continuous probability distributions to determine probabilities in various managerial applications.
• Apply the central limit theorem to determine probabilities of sample means and compute and interpret point and interval estimates.
• Conduct hypothesis tests for means
• Utilize linear regression to estimate and predict variables.
• Understand basic concepts of design of experiments
• Understand importance of non-parametric tests
Lab/tutorial
• The laboratory sessions require a prior working knowledge of Excel. There will be quizzes/assignments every week. Lab assignments are to be submitted on the same day. Students will also be required to visit and consult useful web resources.
Mode of Evaluation and Grades
• Grades are based on total points earned from tests 1 and 2, lab/tutorial/assignments, mini-project, surprise quizzes and the end semester examination.
Component                                Weight
Test 1                                   15%
Test 2                                   15%
End Semester                             40%
Lab/tutorial/assignments (every week)    10%
Mini-Project                             10%
Surprise quizzes                         10%
Reference
• Meyer PL, Introductory Probability and Statistical Applications, Oxford and IBH Publishers
• Miller IR, Freund JE, Johnson R, Probability and Statistics for Engineers, Prentice-Hall (I) Ltd
• Walpole RE and Myers RH, Probability & Statistics for Engineers and Scientists, Macmillan
• Levin RI and Rubin DS, Statistics for Management, Pearson Education
• Levine D, Stephan D, Krehbiel T and Berenson M, Statistics for Managers using Microsoft Excel, Prentice Hall
Statistics..
• Plays an important role in many facets of human endeavour
• Occurs remarkably frequently in our
everyday lives
• It is often incorrectly thought of as just a
collection of data, graphs and diagrams
Statistics in Business
• Accounting: auditing and cost estimation
• Economics: regional, national, and international economic performance
• Finance: investments and portfolio management
• Management: human resources, compensation, and quality management
• Management Information Systems (ERP): performance of systems which gather, summarize, and disseminate information to various managerial levels
• Marketing: market analysis and consumer research
• International Business: market and demographic analysis
What is Statistics?
• Science of gathering, analyzing, interpreting,
and presenting data
• Branch of mathematics
• Facts and figures
• Measurement taken on a sample
Statistics is the scientific method that
enables us to make decisions as responsibly
as possible.
Statistics…
• The science of data to answer research questions
– Formulate a research question(s) (hypothesis)
– Collect data
– Analyze and summarize data
– Draw conclusions to answer research questions
• Statistical Inference
– In the presence of variation
Answers Questions from Everyday Life
• Business: Will a new marketing strategy be profitable?
• Industry: Will a product’s life exceed the warranty period?
• Medicine: Will this year’s flu vaccine reduce the chance of flu?
• Education: Will technology improve learning?
• Government: Will a change in interest rates affect inflation?
Statistics: Science of variability?
• Virtually everything varies
• Variation occurs among individuals
• Variation occurs within any one individual
as time passes
Can Statistics Be Trusted?
“There are three kinds of lies: lies, damned lies, and statistics.” -- Mark Twain
“It is easy to lie with statistics. But it is easier to lie without them.” -- Frederick Mosteller
“Figures won’t lie but liars will figure.” -- Charles Grosvenor
Population Versus Sample
• Population: the whole
– a collection of persons, objects, or items under study
– the entire group of individuals in a statistical study that we want information about
• Census: gathering data from the entire population
• Sample: a portion of the whole
– a subset of the population
– a part of the population from which we actually collect information, used to draw conclusions about the whole (statistical inference)
Statistics can be split into two
broad categories
1. Descriptive statistics
2. Statistical inference
Descriptive Statistics
• Collect data
– e.g. survey
• Present data
– e.g. tables and graphs
• Characterize data
– e.g. sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Descriptive statistics..
• Encompasses the following:
– Graphical or pictorial display
– Condensation of large masses of data into a
form such as tables
– Preparation of summary measures to give a
concise description of complex information
(e.g. an average figure)
– Exhibition of patterns that may be found in
sets of information
Inferential Statistics
• Estimation
– e.g. estimate the population mean weight using the sample mean weight
• Hypothesis testing
– e.g. test the claim that the population mean weight is 120 pounds
Drawing conclusions and/or making decisions concerning a population based on sample results.
Inferential Statistics..
• Especially relates to:
– Determining whether characteristics of a
situation are unusual or if they have
happened by chance
– Estimating values of numerical quantities and
determining the reliability of those estimates
– Using past occurrences to attempt to predict
the future
Process of Inferential Statistics
1. Population (parameter µ)
2. Select a random sample
3. Sample (statistic x̄)
4. Calculate x̄ to estimate µ
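The process above can be sketched in a few lines of Python. This is only an illustration: the population is simulated, and the figures used for it (1,000 weights centred on 120 pounds) and the sample size of 50 are assumptions, not values from the slides.

```python
import random
import statistics

# Hypothetical population: 1,000 simulated weights in pounds.
# The values are generated for illustration, not taken from the slides.
random.seed(42)
population = [random.gauss(120, 15) for _ in range(1000)]

# Parameter: computed from the whole population (usually unknown in practice).
mu = statistics.mean(population)

# Statistic: computed from a random sample drawn without replacement.
sample = random.sample(population, 50)
x_bar = statistics.mean(sample)

print(f"population mean (mu) = {mu:.2f}")
print(f"sample mean (x-bar)  = {x_bar:.2f}")
```

The sample mean will generally be close to, but not equal to, the population mean; that gap is what statistical inference has to account for.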
Population vs. Sample
• Population: measures used to describe the population are called parameters
• Sample: measures computed from sample data are called statistics
Parameter vs. Statistic
• Parameter — descriptive measure of the
population
– Usually represented by Greek letters
• Statistic — descriptive measure of a
sample
– Usually represented by Roman letters
Symbols for Population Parameters
µ denotes population mean
σ² denotes population variance
σ denotes population standard deviation
Symbols for Sample Statistics
x̄ denotes sample mean
S² denotes sample variance
S denotes sample standard deviation
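As a rough illustration of the two sets of symbols, Python's standard `statistics` module provides both population and sample versions of the variance and standard deviation. The data values below are made up for the example. The module's population functions divide by N and its sample functions divide by n - 1, a detail the slides do not state.

```python
import statistics

# Illustrative values only (not from the slides).
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

# Treating the data as a complete population (divide by N).
mu = statistics.mean(data)              # population mean
sigma_sq = statistics.pvariance(data)   # population variance
sigma = statistics.pstdev(data)         # population standard deviation

# Treating the same data as a sample (divide by n - 1).
x_bar = statistics.mean(data)           # sample mean
s_sq = statistics.variance(data)        # sample variance
s = statistics.stdev(data)              # sample standard deviation

print(mu, sigma_sq, sigma)
print(x_bar, round(s_sq, 3), round(s, 3))
```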
Types of Variables
• Categorical (qualitative) variables have values that can only be placed into categories, such as “yes” and “no.”
• Numerical (quantitative) variables have values that represent quantities.
Types of Variables
Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Data Measurement
• Nominal — Lowest level of measurement
• Ordinal
• Interval
• Ratio — Highest level of measurement
Levels of Measurement
• A nominal scale classifies data into distinct categories in which no ranking is implied.
Categorical Variable           Categories
Personal Computer Ownership    Yes / No
Type of Stocks Owned           Growth / Value / Other
Internet Provider              Microsoft Network / AOL
Levels of Measurement
• An ordinal scale classifies data into distinct categories in which ranking is implied.
Categorical Variable              Ordered Categories
Student class designation         Freshman, Sophomore, Junior, Senior
Product satisfaction              Satisfied, Neutral, Unsatisfied
Faculty rank                      Professor, Associate Professor, Assistant Professor, Instructor
Standard & Poor’s bond ratings    AAA, AA, A, BBB, BB, B, CCC, CC, C, DDD, DD, D
Student Grades                    A, B, C, D, F
Levels of Measurement
• An interval scale is an ordered scale in which the difference between measurements is a meaningful quantity but the measurements do not have a true zero point.
• A ratio scale is an ordered scale in which the difference between the measurements is a meaningful quantity and the measurements have a true zero point.
Interval and Ratio Scales
Usage Potential of Various
Levels of Data
Nominal
Ordinal
Interval
Ratio
Data Level, Operations, and Statistical Methods
Data Level   Meaningful Operations                                Statistical Methods
Nominal      Classifying and Counting                             Nonparametric
Ordinal      All of the above plus Ranking                        Nonparametric
Interval     All of the above plus Addition, Subtraction          Parametric
Ratio        All of the above plus Multiplication and Division    Parametric
Data preparation rules
• Data presented must be
– factual
– relevant
Before presentation always check:
• the source of the data
• that the data has been accurately transcribed
• that the figures are relevant to the problem
Methods of visual presentation of data
• Table
         1st Qtr   2nd Qtr   3rd Qtr   4th Qtr
East     20.4      27.4      90        20.4
West     30.6      38.6      34.6      31.6
North    45.9      46.9      45        43.9
Methods of visual presentation of data
• Graphs (figure: chart of East, West and North by quarter, 1st Qtr to 4th Qtr; vertical axis 0 to 90)
• Pie chart (figure: slices for 1st Qtr, 2nd Qtr, 3rd Qtr and 4th Qtr)
• Multiple bar chart (figure: horizontal bars for North, West and East by quarter; axis 0 to 100)
• Simple pictogram (figure: East, North and West by quarter; vertical axis 0 to 100)
Frequency distributions
• Frequency tables
Class Interval   Frequency   Cumulative Frequency
< 20             13          13
< 40             18          31
< 60             25          56
< 80             15          71
< 100             9          80
Observation Table
Frequency diagrams
(Figures: two charts of Frequency by class, < 20 to < 100, vertical axis 0 to 30; one chart of Cumulative Frequency by class, vertical axis 0 to 90)
Ungrouped Versus Grouped Data
• Ungrouped data
– have not been summarized in any way
– are also called raw data
• Grouped data
– have been organized into a frequency distribution
Example of Ungrouped Data
Ages of a Sample of Managers from XYZ
42  30  53  50  52  30  55  49  61  74
26  58  40  40  28  36  30  33  31  37
32  37  30  32  23  32  58  43  30  29
34  50  47  31  35  26  64  46  40  43
57  30  49  40  25  50  52  32  60  54
Frequency Distribution of
Ages
Class Interval Frequency
20-under 30 6
30-under 40 18
40-under 50 11
50-under 60 11
60-under 70 3
70-under 80 1
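The grouping step can be reproduced with a short Python sketch. It assumes that "20-under 30" means the lower bound is included and the upper bound is excluded; the variable names are illustrative.

```python
# Ages of the sample of managers, as listed in the ungrouped data.
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

width = 10
# One class per lower bound: 20, 30, ..., 70.
# An age belongs to a class when lower <= age < lower + width.
frequency = {
    f"{lo}-under {lo + width}": sum(lo <= age < lo + width for age in ages)
    for lo in range(20, 80, width)
}

for label, count in frequency.items():
    print(f"{label:<12}{count:>3}")
```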
Data Range
For the ages of the sample of managers:
Smallest = 23
Largest = 74
Range = Largest - Smallest
= 74 - 23
= 51
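A minimal Python check of the range calculation, using the same 50 ages (variable names are illustrative):

```python
# Ages of the sample of managers, as listed in the ungrouped data.
ages = [
    42, 30, 53, 50, 52, 30, 55, 49, 61, 74,
    26, 58, 40, 40, 28, 36, 30, 33, 31, 37,
    32, 37, 30, 32, 23, 32, 58, 43, 30, 29,
    34, 50, 47, 31, 35, 26, 64, 46, 40, 43,
    57, 30, 49, 40, 25, 50, 52, 32, 60, 54,
]

smallest = min(ages)
largest = max(ages)
data_range = largest - smallest

print(smallest, largest, data_range)  # 23 74 51
```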
Number of Classes and Class Width
• The number of classes should be between 5 and 15.
• Fewer than 5 classes cause excessive summarization.
• More than 15 classes leave too much detail.
• Class Width
– Divide the range by the number of classes for an approximate class width
– Round up to a convenient number
$\text{Approximate Class Width} = \frac{51}{6} = 8.5$
$\text{Class Width} = 10$
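The class width rule can be sketched as follows. The slides do not define "a convenient number"; this sketch assumes it means the next multiple of 10, and `round_up_to` is a hypothetical helper written for the example.

```python
import math

data_range = 51   # largest - smallest for the ages example
num_classes = 6   # a choice between 5 and 15

# Approximate class width: range divided by the number of classes.
approx_width = data_range / num_classes


def round_up_to(value, step):
    """Round value up to the next multiple of step (assumed 'convenient number')."""
    return math.ceil(value / step) * step


class_width = round_up_to(approx_width, 10)

print(approx_width, class_width)  # 8.5 10
```

A different step (for example 5) is an equally reasonable reading of "convenient" and would give a different width.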
Class Midpoint
$\text{Class Midpoint} = \frac{\text{beginning class endpoint} + \text{ending class endpoint}}{2} = \frac{30 + 40}{2} = 35$
$\text{Class Midpoint} = \text{class beginning point} + \frac{1}{2}(\text{class width}) = 30 + \frac{1}{2}(10) = 35$
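Both midpoint formulas can be checked with a small Python sketch (function names are illustrative). Applied to the classes 20-under 30 through 70-under 80, either one gives the same midpoints.

```python
def midpoint_from_endpoints(begin, end):
    """Midpoint as the average of the two class endpoints."""
    return (begin + end) / 2


def midpoint_from_width(begin, width):
    """Midpoint as the class beginning point plus half the class width."""
    return begin + width / 2


# Midpoints for classes starting at 20, 30, ..., 70 with width 10.
midpoints = [midpoint_from_width(lo, 10) for lo in range(20, 80, 10)]

print(midpoint_from_endpoints(30, 40))  # 35.0
print(midpoint_from_width(30, 10))      # 35.0
print(midpoints)
```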
Relative Frequency
Class Interval   Frequency   Relative Frequency
20-under 30       6           .12   (= 6/50)
30-under 40      18           .36   (= 18/50)
40-under 50      11           .22
50-under 60      11           .22
60-under 70       3           .06
70-under 80       1           .02
Total            50          1.00
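The relative frequency column is each class frequency divided by the total. A minimal Python sketch, with illustrative variable names:

```python
# Class frequencies for 20-under 30 through 70-under 80.
frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)

# Relative frequency = class frequency / total number of observations.
relative = [f / total for f in frequencies]

for f, r in zip(frequencies, relative):
    print(f"{f:>3}  {r:.2f}")
```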
Cumulative Frequency
Class Interval   Frequency   Cumulative Frequency
20-under 30       6           6
30-under 40      18          24   (= 18 + 6)
40-under 50      11          35   (= 11 + 24)
50-under 60      11          46
60-under 70       3          49
70-under 80       1          50
Total            50
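The running totals can be sketched with `itertools.accumulate` from the Python standard library; each entry is the current class frequency added to the previous cumulative value.

```python
from itertools import accumulate

# Class frequencies for 20-under 30 through 70-under 80.
frequencies = [6, 18, 11, 11, 3, 1]

# Running total: 6, 6 + 18, 24 + 11, ...
cumulative = list(accumulate(frequencies))

print(cumulative)  # [6, 24, 35, 46, 49, 50]
```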
Class Midpoints, Relative Frequencies, and Cumulative Frequencies
Class Interval   Frequency   Midpoint   Relative Frequency   Cumulative Frequency
20-under 30       6          25          .12                  6
30-under 40      18          35          .36                 24
40-under 50      11          45          .22                 35
50-under 60      11          55          .22                 46
60-under 70       3          65          .06                 49
70-under 80       1          75          .02                 50
Total            50                     1.00
Cumulative Relative Frequencies
Class Interval   Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
20-under 30       6           .12                  6                      .12
30-under 40      18           .36                 24                      .48
40-under 50      11           .22                 35                      .70
50-under 60      11           .22                 46                      .92
60-under 70       3           .06                 49                      .98
70-under 80       1           .02                 50                     1.00
Total            50          1.00
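A short Python sketch of the last column, assuming the cumulative relative frequency is computed as cumulative frequency divided by the total (summing the relative frequencies gives the same result):

```python
from itertools import accumulate

# Class frequencies for 20-under 30 through 70-under 80.
frequencies = [6, 18, 11, 11, 3, 1]
total = sum(frequencies)

cumulative = list(accumulate(frequencies))
# Cumulative relative frequency = cumulative frequency / total.
cumulative_relative = [c / total for c in cumulative]

for c, cr in zip(cumulative, cumulative_relative):
    print(f"{c:>3}  {cr:.2f}")
```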
Common Statistical Graphs
• Histogram -- vertical bar chart of frequencies
• Frequency Polygon -- line graph of frequencies
• Ogive -- line graph of cumulative frequencies
• Pie Chart -- proportional representation for
categories of a whole
• Stem and Leaf Plot
• Pareto Chart
• Scatter Plot
Histogram
Class Interval   Frequency
20-under 30       6
30-under 40      18
40-under 50      11
50-under 60      11
60-under 70       3
70-under 80       1
(Figure: histogram of Frequency, 0 to 20, against Years, 0 to 80)
Histogram Construction
(Figure: the same histogram, built from the frequency distribution of ages)
Frequency Polygon
(Figure: frequency polygon of Frequency, 0 to 20, against Years, 0 to 80, for the same frequency distribution)
Ogive
Class Interval   Cumulative Frequency
20-under 30       6
30-under 40      24
40-under 50      35
50-under 60      46
60-under 70      49
70-under 80      50
(Figure: ogive of cumulative Frequency, 0 to 60, against Years, 0 to 80)
Relative Frequency Ogive
Class Interval   Cumulative Relative Frequency
20-under 30       .12
30-under 40       .48
40-under 50       .70
50-under 60       .92
60-under 70       .98
70-under 80      1.00
(Figure: ogive of Cumulative Relative Frequency, 0.00 to 1.00, against Years, 0 to 80)
Complaints by Passengers
COMPLAINT            NUMBER    PROPORTION   DEGREES
Stations, etc.       28,000    .40          144.0
Train Performance    14,700    .21           75.6
Equipment            10,500    .15           54.0
Personnel             9,800    .14           50.4
Schedules, etc.       7,000    .10           36.0
Total                70,000   1.00          360.0
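The PROPORTION and DEGREES columns follow from the counts: each proportion is the count divided by the total, and each pie slice is that proportion of 360 degrees. A minimal Python sketch, with illustrative variable names:

```python
# Complaint counts from the table.
complaints = {
    "Stations, etc.": 28000,
    "Train Performance": 14700,
    "Equipment": 10500,
    "Personnel": 9800,
    "Schedules, etc.": 7000,
}

total = sum(complaints.values())

rows = []
for name, number in complaints.items():
    proportion = number / total   # share of all complaints
    degrees = proportion * 360    # angle of the pie slice
    rows.append((name, number, proportion, degrees))
    print(f"{name:<18}{number:>7}  {proportion:.2f}  {degrees:6.1f}")

# The slice angles should add up to a full circle.
total_degrees = sum(row[3] for row in rows)
```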
Complaints by Passengers
(Figure: pie chart. Stations, etc. 40%; Train Performance 21%; Equipment 15%; Personnel 14%; Schedules, etc. 10%)
Second Quarter Truck Production
Company   2d Quarter Truck Production
A         357,411
B         354,936
C         160,997
D          34,099
E          12,747
Totals    920,190
(Figure: pie chart of Second Quarter Truck Production. A 39%; B 39%; C 17%; D 4%; E 1%)
Pie Chart Calculations for Company A
Company   2d Quarter Truck Production   Proportion   Degrees
A         357,411                        .388        140
B         354,936                        .386        139
C         160,997                        .175         63
D          34,099                        .037         13
E          12,747                        .014          5
Totals    920,190                       1.000        360
$\frac{357{,}411}{920{,}190} = .388$
$.388 \times 360 \approx 140$
Pareto Chart
(Figure: Pareto chart of Frequency, 0 to 100, by cause: Poor Wiring, Short in Coil, Defective Plug, Other; second axis 0% to 100%)
Scatter Plot
Registered Vehicles (1000's)   Gasoline Sales (1000's of Gallons)
 5                              60
15                             120
 9                              90
15                             140
 7                              60
(Figure: scatter plot of Gasoline Sales, 0 to 200, against Registered Vehicles, 0 to 20)
Principles of Excellent Graphs
• The graph should not distort the data.
• The graph should not contain unnecessary adornments (sometimes referred to as chart junk).
• The scale on the vertical axis should begin at zero.
• All axes should be properly labeled.
• The graph should contain a title.
• The simplest possible graph should be used for a given set of data.
Graphical Errors: Chart Junk
• Bad Presentation (figure): Minimum Wage labeled 1960: $1.00, 1970: $1.60, 1980: $3.10, 1990: $3.80
• Good Presentation (figure): Minimum Wage in $, axis 0 to 4, charted by year for 1960, 1970, 1980 and 1990
Graphical Errors: Compressing the Vertical Axis
(Figure: two charts of Quarterly Sales in $ by quarter, Q1 to Q4, one labeled Good Presentation and one labeled Bad Presentation; one vertical axis runs 0 to 50, the other 0 to 200)
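Graphical Errors: No Zero Point on the Vertical Axis
Graphing the first six months of sales
• Bad Presentation (figure): Monthly Sales in $ for months J, F, M, A, M, J on a vertical axis running from 36 to 45, with no zero point
• Good Presentation (figure): Monthly Sales in $ for the same months on a vertical axis that shows the zero point (0, 36, 39, 42, 45)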
Thank You
• http://www.stats.gla.ac.uk/steps/glossary/presenting_data.html
• http://www.ilir.uiuc.edu/courses/lir593/