presentation on statistics for research lecture 7
Post on 20-Dec-2015
Presentation on Statistics for Research
Lecture 7
Contents
• What is statistics? Its scope
• Is statistics a science or an art? (debatable)
• Types of data
• Presentation of data
• Measures of central tendency
• Measures of variability
• Chi-square test
• t-test for testing the difference between two means
What is Statistics?
“Statistics is a body of methods or tools for obtaining knowledge.”
That is, statistics is a tool for obtaining knowledge.
Example: the correlation coefficient between height and weight is, say, +0.85 (a correlation coefficient always lies between -1 and +1).
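Assuming the Pearson (product-moment) correlation coefficient is meant, it is the sum of cross-products of deviations divided by the square root of the product of the two sums of squared deviations, so it always lies between -1 and +1. A minimal Python sketch; the function name `pearson_r` and the height and weight values are hypothetical, for illustration only.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of the same length (at least 2 values)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from the two means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Sums of squared deviations for each variable
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    # Hypothetical heights (cm) and weights (kg), not data from the lecture
    heights = [150, 160, 170, 180]
    weights = [50, 58, 66, 77]
    print(f"r = {pearson_r(heights, weights):.2f}")  # always between -1 and +1
```

A perfectly increasing straight-line relationship gives +1, a perfectly decreasing one gives -1.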
Functions of statistics:
• Presents facts in a definite form
• Simplifies large masses of figures and facilitates analysis
• Helps in formulating and testing hypotheses
• Helps in prediction
Scope of Statistics:
Vast, unlimited and ever increasing, e.g. biostatistics, industrial statistics, informatics, design of experiments in agricultural production, demography, queuing theory, stochastic processes, psychology, sociology, public administration, etc.
Types of Data
There are mainly three types of data:
1. Cross-sectional data
2. Time series data
3. Panel data
Cross-Sectional Data: Cross-sectional data refer to observations of many individuals (subjects, objects) at a given point in time.
Example: gross annual income for each of 1000 randomly chosen households in Dhaka City for the year 2009.
Example of cross-sectional data: income ('000 Tk) of 11 persons in the year 2000
Person     A    B    C    D    E    F    G    H    I    J    K
Income   234  210  187  342  124  234  321  123  128  187  301
Time Series Data:
Time series data, also called longitudinal data, refer to observations of a given unit made over time.
Example of time series data: income of one person over 10 years (in '000)
Year    Person X
1991    129
1992    131
1993    150
1994    170
1995    187
1996    293
1997    209
1998    210
1999    213
2000    240
Example of Time series data
Average gross annual income of, say, 1000 households randomly chosen from Dhaka City for 10 years 1991-2000.
Panel Data:
A panel data set contains observations on a number of units (e.g. subjects, objects) over time. Thus, panel data have characteristics of both time series and cross-sectional data.
Example of panel data
Values of the gross annual income for each of 1000 randomly chosen households in Dhaka City, collected for each of the 10 years from 1991 to 2000. Such data can be represented as a set of double-indexed values $\{V_{ij};\ i = 1, \dots, 10;\ j = 1, \dots, 1000\}$, where $i$ indexes the year and $j$ the household.
Example of panel data: income of 3 persons over 10 years (in '000), $V_{ij}$ with $i = 1, \dots, 10$ (years) and $j = 1, 2, 3$ (persons)
Year    Person X income    Person Y income    Person Z income
1991    129                131                87
1992    131                150                93
1993    150                170                70
1994    170                187                34
1995    187                293                87
1996    293                170                93
1997    209                187                70
1998    210                293                87
1999    213                209                16
2000    240                234                54
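The three data types can be seen as views of the same double-indexed values $V_{ij}$: fixing the year gives a cross-section, fixing the person gives a time series. A minimal Python sketch using the income table above; storing the panel as a dictionary keyed by (year, person) and the helper names `cross_section` and `time_series` are illustrative choices, not something the lecture prescribes.

```python
# Income ('000) for 1991..2000, taken from the panel table above
years = list(range(1991, 2001))
incomes = {
    "X": [129, 131, 150, 170, 187, 293, 209, 210, 213, 240],
    "Y": [131, 150, 170, 187, 293, 170, 187, 293, 209, 234],
    "Z": [87, 93, 70, 34, 87, 93, 70, 87, 16, 54],
}

# Double-indexed panel: V[(year, person)] = income
V = {
    (year, person): values[i]
    for person, values in incomes.items()
    for i, year in enumerate(years)
}


def cross_section(panel, year):
    """All units observed at one point in time."""
    return {person: value for (y, person), value in panel.items() if y == year}


def time_series(panel, person):
    """One unit observed over time; sorting the (year, person) keys puts years in order."""
    return [value for (year, p), value in sorted(panel.items()) if p == person]


print(cross_section(V, 2000))
print(time_series(V, "X"))
```

The time series extracted for Person X matches the single-person time series table earlier in the lecture.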
Example of panel data: income, expenditure and loan of Person X over 10 years (in '000), $V_{ij}$ with $i = 1, \dots, 10$ (years) and $j$ = income, expenditure, loan
Year    Person X income    Person X expenditure    Person X loan
1991    129                131                     87
1992    131                150                     93
1993    150                170                     70
1994    170                187                     34
1995    187                293                     87
1996    293                170                     93
1997    209                187                     70
1998    210                293                     87
1999    213                209                     16
2000    240                234                     54
Presentation of data
Pie chart, bar chart and column chart
[Figure: export quantity by products of year 2010, shown as a pie chart, a bar chart and a column chart: tea 125 (8%), RMG 800 (55%), Jute 325 (22%), others 225 (15%)]
Pie chart example
[Figure: pie chart of export value by products of year 2010: tea 8%, RMG 55%, Jute 22%, others 15%]
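Each slice of a pie chart shows one category's share of the total. A minimal Python sketch computing these shares from the 2010 export quantities on the chart slide (tea 125, RMG 800, Jute 325, others 225, total 1475); the helper name `percentage_shares` is hypothetical. Computed directly, the shares come to about 8.5%, 54.2%, 22.0% and 15.3%; the chart labels RMG as 55%.

```python
# 2010 export quantities by product, as listed on the chart slide
exports = {"tea": 125, "RMG": 800, "Jute": 325, "others": 225}


def percentage_shares(counts):
    """Each category's share of the total, in percent (the pie-chart labels)."""
    total = sum(counts.values())
    return {name: 100 * value / total for name, value in counts.items()}


for product, share in percentage_shares(exports).items():
    print(f"{product:>6}: {share:5.1f}%")
```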
Bar chart example: projected export value in crore dollars
[Figure: horizontal bar chart, export quantity axis 0 to 1000: tea 125, RMG 800, Jute 325, others 225]
Column chart example: projected export
[Figure: column chart by product (tea, RMG, Jute, others), vertical axis 0 to 1000]
MEASURES OF CENTRAL TENDENCY
What are measures of central tendency? The measures covered here are the mean, median and mode, together with the related quartile and percentile calculations.
Measures of Central Tendency
Mean: For a population or a sample, the mean is the arithmetic average of all values. The mean is a measure of central tendency, e.g. the mean age of CSC students is, say, 38.
The mean, symbolized by $\bar{X}$, is the sum of the observed values (here, the students' ages) divided by the number of students observed.
The following formula both defines and describes the procedure for finding the mean. For three values it is $\bar{X} = (X_1 + X_2 + X_3)/3$; in general, for $n$ values:
$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
For the values 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45 (n = 14), the mean is
$$\bar{X} = \frac{536}{14} \approx 38.3$$
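The mean formula translates directly into code. A minimal Python sketch applied to the 14 values listed above; the function name `mean` is illustrative, and the standard library's `statistics.mean` gives the same result.

```python
# The 14 values from the slide
values = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]


def mean(data):
    """Arithmetic mean: sum of the values divided by how many there are."""
    return sum(data) / len(data)


print(f"sum = {sum(values)}, n = {len(values)}, mean = {mean(values):.1f}")
```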
Meaning of Measures of Central Tendency
• Most observations lie at the mean level, and their frequency gradually declines on both sides.
[Figure: bell-shaped distribution of height in cm around the mean, with tail areas labelled 2.5%]
Values tend to cluster around the central (mean) value.
Median:
The median, symbolized by Md, is the value that lies at the middle of the distribution, so that half of the values are above the median and half are below it.
Computing the median is relatively straightforward.
The first step is to write the values in rank order, from lowest to highest.
The median is then simply the middle number. In the case below, the median is 38, because there are 15 values altogether, with 7 values above it and 7 values below it in rank order:
32 32 35 36 36 37 38 [38] 39 39 39 40 40 45 46
Median in case of an even number of values
The median is calculated as the mid-point of the two middle numbers: (38 + 39) / 2 = 38.5
32 35 36 36 37 38 [38 39] 39 39 40 40 42 45
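Both cases (odd and even number of values) fit in one short function. A minimal Python sketch; the name `median` is illustrative, and the standard library's `statistics.median` follows the same rule.

```python
def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)            # step 1: rank order, lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                      # odd count: a single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: average the two


odd_case = [32, 32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 45, 46]
even_case = [32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]
print(median(odd_case), median(even_case))
```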
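Mode: The mode is the value that occurs most often in a population or a sample. It can be considered the single value most typical of all the values.
Here the mode is 39, which occurs three times:
32 35 36 36 37 38 38 39 39 39 40 40 42 45
Shape of distribution if the mode is higher than the mean and median: for these values, mode (39) > median (38.5) > mean (≈ 38.3), which typically indicates a negatively (left-) skewed distribution, with a longer tail towards the lower values.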
Example: for the set of numbers 1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9, the mode is 3, which occurs most often (three times).
NB. Some populations have more than one mode; a distribution with two modes is called bimodal.
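Counting how often each value occurs finds the mode, and returning every value tied for the highest count also handles bimodal data. A minimal Python sketch using `collections.Counter`; the helper name `modes` is illustrative (the standard library's `statistics.multimode` behaves similarly).

```python
from collections import Counter


def modes(values):
    """All values that occur most often; more than one result means multimodal."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


print(modes([32, 35, 36, 36, 37, 38, 38, 39, 39, 39, 40, 40, 42, 45]))
print(modes([1, 2, 3, 7, 3, 8, 9, 5, 3, 8, 9]))
```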
Percentiles and Quartiles
Percentiles are like quartiles except that percentiles divide the set of data into 100 equal parts and quartiles divide the set of data into 4 equal parts.
Example: research methodology exam marks
Marks    No. of students (frequency)    Cum. no. of students (cumulative frequency)
76-80    9                              9
81-85    21                             30
86-90    18                             48
91-95    12                             60
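First quartile = 25th percentile
With 60 students in total, the first quartile is located at the 15th value from the bottom (25% of 60 = 15). The cumulative frequency is 9 up to 80 and 30 up to 85, so the 15th value, and hence the first quartile, falls in the interval 81-85.
Similarly, the third quartile is located at the 45th value (75% of 60 = 45). The cumulative frequency is 30 up to 85 and 48 up to 90, so the third quartile falls in the interval 86-90.
Percentile rank of the student who got 90 marks:
$$\text{Percentile rank} = \frac{\text{number of students scoring below 90}}{\text{total number of students}} \times 100 = \frac{47}{60} \times 100 \approx 78.3$$
i.e. about the 78th percentile. Here 47 is the 48 students with marks up to 90, minus the student in question.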
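Locating a quartile in grouped data amounts to walking down the cumulative frequencies until the running total reaches the required position. A minimal Python sketch using the exam-marks table above; the helper names `class_containing` and `percentile_rank` are hypothetical, and the count of 47 students below 90 is taken from the slide rather than computed. This only finds the class interval; pinning down an exact quartile value would need interpolation within the class, which the slide does not cover.

```python
# Grouped exam marks from the slide: (class interval, number of students)
classes = [("76-80", 9), ("81-85", 21), ("86-90", 18), ("91-95", 12)]


def class_containing(position, table):
    """Return the interval holding the position-th value counted from the bottom."""
    cumulative = 0
    for interval, freq in table:
        cumulative += freq              # running cumulative frequency
        if position <= cumulative:
            return interval
    raise ValueError("position exceeds total frequency")


def percentile_rank(num_below, total):
    """Percentage of observations lying below a given score."""
    return num_below / total * 100


total = sum(freq for _, freq in classes)   # 60 students
q1_pos = 0.25 * total                      # 15th value
q3_pos = 0.75 * total                      # 45th value
print(class_containing(q1_pos, classes), class_containing(q3_pos, classes))
print(round(percentile_rank(47, total), 1))
```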
Measures of Variability
Variability refers to the spread or dispersion of values (scores).
A distribution of scores is said to be highly variable if the scores differ widely from one another. Three common measures of dispersion are:
• Range
• Variance
• Standard deviation
Lecture 8
Importance of Variability
The following two data sets have the same mean. But do they reflect the same information?
No: data set B has more underweight babies.
Weight of newborn babies (pounds)
Data A    Data B
4         3
5         3
6         9
Average:  5    Average:  5
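Putting the mean next to a measure of spread makes the difference between the two data sets visible. A minimal Python sketch using the newborn weights above; the helper name `summarize` is illustrative, and the range is used as the spread measure.

```python
data_a = [4, 5, 6]   # newborn weights (pounds), data A
data_b = [3, 3, 9]   # newborn weights (pounds), data B


def summarize(data):
    """Return (mean, range) for a list of values."""
    mean = sum(data) / len(data)
    spread = max(data) - min(data)   # range: highest minus lowest
    return mean, spread


for name, data in (("A", data_a), ("B", data_b)):
    mean, spread = summarize(data)
    print(f"Data {name}: mean = {mean}, range = {spread}")
```

Both means are 5, but data B spreads over three times as wide a range (6 pounds against 2).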
Range
Range is the difference between the largest value and the smallest value: Range = highest value - lowest value.
Distribution 1: 32 35 36 36 37 38 40 42 42 43 43 45
Distribution 2: 32 32 33 33 33 34 34 34 34 34 35 45
Although the range is 45 - 32 = 13 for both distributions, it does not give a true picture of their variability: distribution 1 is spread fairly evenly between 32 and 45, while in distribution 2 all but one value lie between 32 and 35.
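Comparing the range with the standard deviation for the two distributions above shows what the range misses. A minimal Python sketch using the standard library's `statistics.stdev` (sample standard deviation, dividing by n - 1); the helper name `spread_summary` is illustrative. The ranges come out equal, while the standard deviations differ (about 3.99 for distribution 1 and 3.45 for distribution 2).

```python
import statistics

distribution_1 = [32, 35, 36, 36, 37, 38, 40, 42, 42, 43, 43, 45]
distribution_2 = [32, 32, 33, 33, 33, 34, 34, 34, 34, 34, 35, 45]


def spread_summary(data):
    """Return (range, sample standard deviation) of the data."""
    return max(data) - min(data), statistics.stdev(data)


for name, data in (("1", distribution_1), ("2", distribution_2)):
    data_range, sd = spread_summary(data)
    print(f"Distribution {name}: range = {data_range}, sd = {sd:.2f}")
```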
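Measures of Variability (Variance and Standard Deviation)
The variance, symbolized by $S^2$, is a measure of variability. It is based on the squared deviations of the values from the mean; for a sample, the sum of squared deviations is divided by $n - 1$:
$$S^2 = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}$$
Formula of Standard Deviation
The standard deviation is the positive square root of the variance:
$$S = \sqrt{\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n - 1}}$$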
Example of Variance and Standard Deviation
Series 1: 32 36 37 37 38 40 42 42 43 43 45 45
Mean X̄ = 480 / 12 = 40 kg
Student no.              1    2    3    4    5    6    7    8    9   10   11   12
Weight of student (kg)  32   36   37   37   38   40   42   42   43   43   45   45
Xi - X̄                  -8   -4   -3   -3   -2    0    2    2    3    3    5    5
(Xi - X̄)²               64   16    9    9    4    0    4    4    9    9   25   25
Sum of squares = 178
Therefore variance S² = 178 / (n - 1) = 178 / 11 ≈ 16.18
Standard deviation S = √16.18 ≈ 4.02 kg
A standard deviation of 4.02 kg means that the weights typically lie about 4 kg from the mean value of 40 kg.
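The variance and standard deviation formulas can be checked against the worked example with a few lines of code. A minimal Python sketch; the helper names `sample_variance` and `sample_sd` are illustrative. Running it on the 12 weights gives a sum of squared deviations of 178, a variance of about 16.18 and a standard deviation of about 4.02.

```python
import math

# Weights (kg) of the 12 students in the example
weights = [32, 36, 37, 37, 38, 40, 42, 42, 43, 43, 45, 45]


def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    sum_of_squares = sum((x - mean) ** 2 for x in values)
    return sum_of_squares / (n - 1)


def sample_sd(values):
    """Standard deviation: positive square root of the variance."""
    return math.sqrt(sample_variance(values))


print(f"variance = {sample_variance(weights):.2f}, sd = {sample_sd(weights):.2f}")
```

The standard library's `statistics.variance` and `statistics.stdev` also divide by n - 1; `statistics.pvariance` and `statistics.pstdev` divide by n, for a whole population.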
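Chi Square Test
Tests differences in qualitative (categorical) values, for example, whether people have a definite taste for colored cars compared to white cars.
Suppose 1000 cars are sold in Bangladesh in a month. If there were no preference for colored cars, then 500 white and 500 colored cars would be expected.
Chi-square test: whether Bangladeshi people have a preference for colored cars.
Type of color    Observed no. (O)    Expected no. (E)    O - E    (O - E)²    (O - E)²/E
White            400                 500                 -100     10000       20
Colored          600                 500                  100     10000       20
Total                                                                         40
$$\chi^2 = \sum \frac{(O - E)^2}{E} = 20 + 20 = 40$$
From the chi-square table, find the tabulated value for n - 1 = 2 - 1 = 1 degree of freedom, where n is the number of categories: 3.841 at the 5% level of significance and 6.635 at the 1% level.
Reject the null hypothesis (of no preference) if the calculated value is greater than the tabulated value at the 5% or 1% level of significance (95% or 99% confidence). Here 40 exceeds both, so the null hypothesis of no preference is rejected.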
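The test statistic and the decision rule can be sketched in a few lines. A minimal Python sketch for the car-colour example; the helper names are hypothetical, the critical values 3.841 and 6.635 are the standard chi-square table values for 1 degree of freedom, and the p-value uses the identity that a chi-square variable with 1 degree of freedom is the square of a standard normal, so its upper-tail probability is `erfc(sqrt(x / 2))`. That identity only holds for 1 degree of freedom.

```python
import math

observed = {"White": 400, "Colored": 600}
expected = {"White": 500, "Colored": 500}   # equal split if there is no preference


def chi_square(obs, exp):
    """Sum of (O - E)^2 / E over all categories."""
    return sum((obs[k] - exp[k]) ** 2 / exp[k] for k in obs)


def p_value_1df(statistic):
    """Upper-tail probability of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is Z^2 for a standard normal Z, so
    P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


stat = chi_square(observed, expected)
df = len(observed) - 1                      # 2 categories -> 1 degree of freedom
CRITICAL_1DF = {0.05: 3.841, 0.01: 6.635}   # standard table values for df = 1
for alpha, critical in CRITICAL_1DF.items():
    decision = "reject" if stat > critical else "do not reject"
    print(f"alpha = {alpha}: chi2 = {stat:.1f} vs {critical} -> {decision} H0")
print(f"p-value = {p_value_1df(stat):.2e}")
```

For more than two categories the degrees of freedom change, and the tabulated critical value (or a statistics library) for that number of degrees of freedom would be needed instead of the 1-df shortcut.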
The End