Dr. Shaikh Shaffi Ahamed, Ph.D., Associate Professor
Dept. of Family & Community Medicine
Statistics is the science of conducting studies to collect, organize, summarize, analyze, present, interpret and draw conclusions from data.
Data: any values (observations or measurements) that have been collected.
What Is Statistics?
1. Collecting Data, e.g., Sample, Survey, Observe, Simulate
2. Characterizing Data, e.g., Organize/Classify, Count, Summarize
3. Presenting Data, e.g., Tables, Charts, Statements
4. Interpreting Results, e.g., Infer, Conclude, Specify Confidence
Why? Data Analysis, Decision-Making
Biostatistics
(1) Statistics arising out of the biological sciences, particularly from the fields of medicine and public health.
(2) The methods used in dealing with statistics in the fields of medicine, biology and public health, for planning and conducting investigations in these branches and analyzing the data that arise from them.
BASIC CONCEPTS
Data: Set of values of one or more variables recorded on one or more observational units (singular: datum)
Categories of data
1. Primary data: observation, questionnaire, interviews & survey
2. Secondary data: census, medical records, registry
Sources of data
1. Routinely kept records
2. Surveys (census)
3. Experiments
4. External source
Dataset: Data for a set of variables collected on a group of persons.
Data Table: A dataset organized into a table, with one column for each variable and one row for each person.
Datasets and Data Tables
OBS AGE BMI FFNUM TEMP (°F) GENDER EXERCISE LEVEL QUESTION
1 26 23.2 0 61.0 0 1 1
2 30 30.2 9 65.5 1 3 2
3 32 28.9 17 59.6 1 3 4
4 37 22.4 1 68.4 1 2 3
5 33 25.5 7 64.5 0 3 5
6 29 22.3 1 70.2 0 2 2
7 32 23.0 0 67.3 0 1 1
8 33 26.3 1 72.8 0 3 1
9 32 22.2 3 71.5 0 1 4
10 33 29.1 5 63.2 1 1 4
11 26 20.8 2 69.1 0 1 3
12 34 20.9 4 73.6 0 2 3
13 31 36.3 1 66.3 0 2 5
14 31 36.4 0 66.9 1 1 5
15 27 28.6 2 70.2 1 2 2
16 36 27.5 2 68.5 1 3 3
17 35 25.6 143 67.8 1 3 4
18 31 21.2 11 70.7 1 1 2
19 36 22.7 8 69.8 0 2 1
20 33 28.1 3 67.8 0 2 1
Typical Data Table
Definitions for Variables
• AGE: Age in years
• BMI: Body mass index, weight/height² in kg/m²
• FFNUM: The average number of times eating “fast food” in a week
• TEMP: High temperature for the day
• GENDER: 1- Female 0- Male
• EXERCISE LEVEL: 1- Low 2- Medium 3- High
• QUESTION: What is your satisfaction rating for this Biostatistics session?
  1- Very Satisfied 2- Somewhat Satisfied 3- Neutral 4- Somewhat dissatisfied 5- Dissatisfied
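As a rough illustration of how the coded columns in the data table relate to the definitions, the sketch below decodes the first three rows and computes a BMI from the stated formula. The names `decode`, `bmi` and `COLUMNS` are made up for this example, and the weight and height passed to `bmi` are hypothetical values, not taken from the table.

```python
# Codebooks copied from the variable definitions in the slides.
GENDER = {0: "Male", 1: "Female"}
EXERCISE_LEVEL = {1: "Low", 2: "Medium", 3: "High"}
QUESTION = {
    1: "Very Satisfied",
    2: "Somewhat Satisfied",
    3: "Neutral",
    4: "Somewhat dissatisfied",
    5: "Dissatisfied",
}

COLUMNS = ("OBS", "AGE", "BMI", "FFNUM", "TEMP",
           "GENDER", "EXERCISE_LEVEL", "QUESTION")

# First three rows of the data table: one tuple per person.
rows = [
    (1, 26, 23.2, 0, 61.0, 0, 1, 1),
    (2, 30, 30.2, 9, 65.5, 1, 3, 2),
    (3, 32, 28.9, 17, 59.6, 1, 3, 4),
]


def decode(row):
    """Turn one coded row into a dict with readable category labels."""
    record = dict(zip(COLUMNS, row))
    record["GENDER"] = GENDER[record["GENDER"]]
    record["EXERCISE_LEVEL"] = EXERCISE_LEVEL[record["EXERCISE_LEVEL"]]
    record["QUESTION"] = QUESTION[record["QUESTION"]]
    return record


def bmi(weight_kg, height_m):
    """Body mass index: weight divided by height squared, in kg/m^2."""
    return weight_kg / height_m ** 2


decoded = [decode(row) for row in rows]
for record in decoded:
    print(record)

# Hypothetical person (70 kg, 1.75 m), not a row of the table.
print(round(bmi(70.0, 1.75), 1))
```

Keeping the numeric codes in the table and the labels in a separate codebook mirrors how the slide presents the data; the decoding step is only needed when the values are shown to a reader.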
Types of variables and data
• When gathering data, we collect data from individual cases on particular variables.
• A variable is a unit of data collection whose value can vary.
• Variables can be classified into types according to the level of mathematical scaling that can be carried out on the data.
• There are four types of data or levels of measurement:
1. Nominal
2. Ordinal
3. Interval
4. Ratio
Scales of Measurement
Terminology
• Categorical variables
• Quantity variables
• Nominal variables
• Ordinal variables
• Binary data
• Discrete and continuous data
• Interval and ratio variables
• Qualitative and quantitative traits/characteristics of data
Categorical data
• The objects being studied are grouped into categories based on some qualitative trait.
• The resulting data are merely labels or categories.
Examples of categorical data
• Eye color: blue, brown, black, green, etc.
• Smoking status: smoker, non-smoker
• Attitudes towards the death penalty: strongly disagree, disagree, neutral, agree, strongly agree
Categorical data
• Nominal data
• Ordinal data
Nominal data
A type of categorical data in which objects fall into unordered categories.
Studies measuring nominal data must ensure that each category is mutually exclusive and the system of measurement needs to be exhaustive.
Variables that have only two possible responses, e.g. Yes or No, are known as dichotomies.
Examples of Nominal data
• Type of car: BMW, Mercedes, Lexus, Toyota, etc.
• Ethnicity: White British, Afro-Caribbean, Asian, Arab, Chinese, other, etc.
• Smoking status: smoker, non-smoker
Binary Data
A type of categorical data in which there are only two categories.
Examples:
• Smoking status: smoker, non-smoker
• Attendance: present, absent
• Result of an exam: pass, fail
• Status of student: undergraduate, postgraduate
Ordinal data
• Ordinal data consist of categories that can be rank ordered.
• As with nominal data, the distance between categories cannot be calculated, but the categories can be ranked above or below each other.
Examples of Ordinal Data:
• Grades in exam: A+, A, B+, B, C+, C, D+, D, and Fail.
• Degree of illness: none, mild, moderate, acute, chronic.
• Opinion of students about stats classes: very unhappy, unhappy, neutral, happy, ecstatic!
Examples: Nominal data (Binary) & Ordinal data
What is your gender? (please tick)
Male
Female
Did you enjoy the teaching session ? (please tick)
Yes
No
What is your level of satisfaction with the new curriculum at the medical school? (please tick)
Very satisfied
Somewhat satisfied
Neutral
Somewhat dissatisfied
Very dissatisfied
QUANTITATIVE DATA
• The objects being studied are ‘measured’ based on some quantitative trait.
• The resulting data are a set of numbers.
Examples:
• Pulse rate
• Height
• Age
• Exam marks
• Time to complete a statistics test
• Number of cigarettes smoked
Quantitative data
• Discrete
• Continuous
Discrete Data
Only certain values are possible (there are gaps between the possible values). Implies counting.
Continuous Data
Theoretically, with a fine enough measuring device, any value is possible (there are no gaps between the possible values). Implies measuring.
Discrete data -- Gaps between possible values (e.g., number of children)
Continuous data -- Theoretically, no gaps between possible values (e.g., Hb)
Examples of Discrete Data:
• Number of children in a family
• Number of students passing a stats exam
• Number of crimes reported to the police
• Number of bicycles sold in a day
Generally, discrete data are counts. We would not expect to find 2.2 children in a family, 88.5 students passing an exam, 127.2 crimes being reported to the police, or half a bicycle being sold in one day.
Examples of Continuous Data:
• Age (in years)
• Height (in cm)
• Weight (in kg)
• Systolic BP, Hb, etc.
Generally, continuous data come from measurements.
Variables
• Category: Nominal, Ordinal
• Quantity: Discrete (counting), Continuous (measuring)
Interval variables
Examples:
• Fahrenheit temperature scale: zero is arbitrary; 40 degrees is not twice as hot as 20 degrees.
• IQ tests: there is no such thing as zero IQ; an IQ of 120 is not twice as intelligent as an IQ of 60.
Question: Can we assume that attitudinal data represent real, quantifiable measured categories (i.e., that ‘very happy’ is twice as happy as plain ‘happy’, or that ‘very unhappy’ means no happiness at all)? Statisticians are not in agreement on this.
Ratio variables
• Can be discrete or continuous data.
• The distance between any two adjacent units of measurement (intervals) is the same and there is a meaningful zero point.
Examples:
• Income: someone earning SR20,000 earns twice as much as someone who earns SR10,000.
• Height
• Weight
• Age
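The difference between interval and ratio scales can be checked with a little arithmetic. The sketch below, which is only an illustration, re-expresses the two Fahrenheit temperatures from the interval example on the Kelvin scale (a scale with a true zero) and compares the resulting ratio with the naive 40/20; the income figures are the ones from the ratio example. The function name `fahrenheit_to_kelvin` is chosen for this example.

```python
def fahrenheit_to_kelvin(deg_f):
    """Convert Fahrenheit to Kelvin, a temperature scale with a true zero."""
    return (deg_f - 32.0) * 5.0 / 9.0 + 273.15


# Interval scale: the zero of the Fahrenheit scale is arbitrary,
# so the ratio of two readings has no physical meaning.
naive_ratio = 40.0 / 20.0

# The same two temperatures expressed on a scale with a true zero.
kelvin_ratio = fahrenheit_to_kelvin(40.0) / fahrenheit_to_kelvin(20.0)

# Ratio scale: zero income means no income, so ratios are meaningful.
income_ratio = 20000 / 10000

print(f"Naive Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Ratio on the Kelvin scale: {kelvin_ratio:.3f}")
print(f"Income ratio: {income_ratio:.2f}")
```

The Fahrenheit readings differ by a factor of two only on paper; on a scale with a meaningful zero the two temperatures are within a few percent of each other, whereas the income ratio really is two.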
Hierarchical data order
• These levels of measurement can be placed in hierarchical order.
• Nominal data is the least complex and gives a simple measure of whether objects are the same or different.
• Ordinal data maintains the principles of nominal data but adds a measure of order to what is being observed.
• Interval data builds on ordinal data by allowing us to measure the distance between observations.
• Ratio data adds to interval data by including an absolute zero.
QUANTITATIVE DATA    QUALITATIVE DATA
Wt. (in kg)          underweight, normal & overweight
Ht. (in cm)          short, medium & tall
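Converting a quantitative variable into a qualitative one amounts to choosing cutoffs and assigning a label to each band. The slide does not give any cutoffs, so the values below (160 cm and 175 cm) are purely hypothetical, as are the sample heights and the helper name `categorize`; this is a minimal sketch of the idea rather than a recommended classification.

```python
def categorize(value, cutoffs, labels):
    """Return the label of the first band whose upper cutoff exceeds value.

    `labels` must have exactly one more entry than `cutoffs`; the last
    label covers every value at or above the final cutoff.
    """
    for cutoff, label in zip(cutoffs, labels):
        if value < cutoff:
            return label
    return labels[-1]


# Hypothetical cutoffs in cm; the slides do not specify any.
HEIGHT_CUTOFFS_CM = (160, 175)
HEIGHT_LABELS = ("short", "medium", "tall")

heights_cm = [152, 168, 181]  # made-up measurements
height_groups = [
    categorize(h, HEIGHT_CUTOFFS_CM, HEIGHT_LABELS) for h in heights_cm
]
print(list(zip(heights_cm, height_groups)))
```

Note that the conversion only goes one way: once heights are stored as short, medium or tall, the original measurements cannot be recovered.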
CLINIMETRICS
Clinimetrics is a science in which qualities are converted to meaningful quantities by using a scoring system.
Examples:
(1) Apgar score: based on appearance, pulse, grimace, activity and respiration; used for neonatal prognosis.
(2) Smoking index: no. of cigarettes, duration, filter or not, whether pipe, cigar, etc.
(3) APACHE (Acute Physiology and Chronic Health Evaluation) score: to quantify the severity of a patient's condition.
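A scoring system of this kind can be sketched in a few lines. The slide names the five Apgar components but not how they are scored; the sketch below assumes the conventional scheme in which each component is rated 0, 1 or 2 and the ratings are summed to a total between 0 and 10. The function name `apgar_score` and the example ratings are illustrative only.

```python
APGAR_COMPONENTS = ("appearance", "pulse", "grimace", "activity", "respiration")


def apgar_score(ratings):
    """Sum the five component ratings (each assumed to be 0, 1 or 2)."""
    for name in APGAR_COMPONENTS:
        if ratings[name] not in (0, 1, 2):
            raise ValueError(f"{name} must be rated 0, 1 or 2")
    return sum(ratings[name] for name in APGAR_COMPONENTS)


# Made-up ratings for one newborn, not clinical data.
example_ratings = {
    "appearance": 1,
    "pulse": 2,
    "grimace": 2,
    "activity": 2,
    "respiration": 2,
}
print(apgar_score(example_ratings))
```

Each qualitative observation becomes a small ordinal rating, and the sum gives a single number that can be tabulated and compared across patients.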
Data types – important?
• Why do we need to know what type of data we are dealing with?
• The data type or level of measurement influences the type of statistical analysis techniques that can be used when analysing data.
Frequency Distributions
What is a frequency distribution? A frequency distribution is an organization of raw data in tabular form, using classes (or intervals) and frequencies.
What is a frequency count? The frequency or the frequency count for a data value is the number of times the value occurs in the data set.
Frequency Distributions
• data distribution: pattern of variability
  - the center of a distribution
  - the ranges
  - the shapes
• simple frequency distributions
• grouped & ungrouped frequency distributions
Categorical or Qualitative Frequency Distributions
What is a categorical frequency distribution?
A categorical frequency distribution represents data that can be placed in specific categories, such as gender, blood group, hair color, etc.
Categorical or Qualitative Frequency Distributions -- Example
Example: The blood types of 25 blood donors are given below. Summarize the data using a frequency distribution.
AB B A O B
O B O A O
B O B B B
A O AB AB O
A B AB O A
Categorical Frequency Distribution for the Blood Types -- Example Continued
Note: The classes for the distribution are the blood types.
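The tally for the blood-type example can be reproduced with a short script. This is a minimal sketch using Python's `collections.Counter`; the 25 values are the donors' blood types from the example, read as five rows of five, and the percentage column is an addition for illustration.

```python
from collections import Counter

# The 25 blood types from the example, five donors per row.
blood_types = """
AB B A O B
O B O A O
B O B B B
A O AB AB O
A B AB O A
""".split()

frequency = Counter(blood_types)
n = len(blood_types)

# One row per class (blood type), then the total.
print(f"{'Class':<6}{'f':>3}{'%':>7}")
for blood_type in ("A", "B", "O", "AB"):
    count = frequency[blood_type]
    print(f"{blood_type:<6}{count:>3}{100 * count / n:>7.0f}")
print(f"{'Total':<6}{n:>3}")
```

Because the data are nominal, the order of the classes in the table is arbitrary; only the counts carry information.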
Quantitative Frequency Distributions -- Ungrouped
What is an ungrouped frequency distribution?
An ungrouped frequency distribution simply lists the data values with the corresponding frequency counts with which each value occurs.
Quantitative Frequency Distributions -- Ungrouped -- Example
Example: The at-rest pulse rates for 16 athletes at a meet were 57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, and 58. Summarize the information with an ungrouped frequency distribution.
Quantitative Frequency Distributions -- Ungrouped -- Example Continued
Note: The (ungrouped) classes are the observed values themselves.
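For the 16 pulse rates in the example, an ungrouped frequency distribution can be built by counting how often each observed value occurs. The sketch below is one way to do this with `collections.Counter`; the variable names are chosen for illustration.

```python
from collections import Counter

# At-rest pulse rates of the 16 athletes in the example.
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64,
               53, 54, 54, 55, 57, 55, 60, 58]

frequency = Counter(pulse_rates)

# The classes are the observed values themselves, listed in order.
ungrouped = sorted(frequency.items())

print(f"{'Pulse':>5} {'f':>3}")
for value, count in ungrouped:
    print(f"{value:>5} {count:>3}")
print(f"{'Total':>5} {len(pulse_rates):>3}")
```

Values that never occur (for example 59) simply do not appear as classes, which is what distinguishes this table from a grouped distribution with fixed intervals.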
Example of a simple frequency distribution (ungrouped)
5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Value  f
9      3
8      2
7      2
6      1
5      4
4      4
3      3
2      3
1      3
Σf = 25
Relative Frequency Distribution
Note: The relative frequency for a class is obtained by computing f/n.
Example of a simple frequency distribution
5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Value  f  rel f
9      3  .12
8      2  .08
7      2  .08
6      1  .04
5      4  .16
4      4  .16
3      3  .12
2      3  .12
1      3  .12
Σf = 25   Σrel f = 1.0
Cumulative Frequency and Cumulative Relative Frequency
Note: Table with relative and cumulative relative frequencies.
Example of a simple frequency distribution (ungrouped)
5 7 8 1 5 9 3 4 2 2 3 4 9 7 1 4 5 6 8 9 4 3 5 2 1
Value  f  cf  rel f  rel. cf
9      3  3   .12    .12
8      2  5   .08    .20
7      2  7   .08    .28
6      1  8   .04    .32
5      4  12  .16    .48
4      4  16  .16    .64
3      3  19  .12    .76
2      3  22  .12    .88
1      3  25  .12    1.0
Σf = 25   Σrel f = 1.0
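The four columns of this table (f, cf, rel f, rel. cf) can be derived from the 25 raw values in a few steps. The following is a minimal sketch; it accumulates from the largest value downward, as the table does, and the variable names are chosen for illustration.

```python
from collections import Counter
from itertools import accumulate

# The 25 raw values from the example.
scores = [5, 7, 8, 1, 5, 9, 3, 4, 2, 2, 3, 4, 9,
          7, 1, 4, 5, 6, 8, 9, 4, 3, 5, 2, 1]
n = len(scores)

counts = Counter(scores)
values = sorted(counts, reverse=True)   # 9 down to 1, as in the table

f = [counts[v] for v in values]         # frequency
cf = list(accumulate(f))                # cumulative frequency
rel_f = [x / n for x in f]              # relative frequency, f/n
rel_cf = [x / n for x in cf]            # cumulative relative frequency

print(f"{'Value':>5} {'f':>3} {'cf':>3} {'rel f':>6} {'rel cf':>7}")
for row in zip(values, f, cf, rel_f, rel_cf):
    print("{:>5} {:>3} {:>3} {:>6.2f} {:>7.2f}".format(*row))
print(f"Sum of f = {sum(f)}, sum of rel f = {sum(rel_f):.1f}")
```

The last cumulative frequency always equals n and the last cumulative relative frequency always equals 1.0, which is a quick check that no value was missed.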
Quantitative Frequency Distributions -- Grouped
What is a grouped frequency distribution? A grouped frequency distribution is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency counts) in each interval.
Tabulate the hemoglobin values of 30 adult male patients listed below

Patient No  Hb (g/dl)   Patient No  Hb (g/dl)   Patient No  Hb (g/dl)
1           12.0        11          11.2        21          14.9
2           11.9        12          13.6        22          12.2
3           11.5        13          10.8        23          12.2
4           14.2        14          12.3        24          11.4
5           12.3        15          12.3        25          10.7
6           13.0        16          15.7        26          12.5
7           10.5        17          12.6        27          11.8
8           12.8        18          9.1         28          15.1
9           13.2        19          12.9        29          13.4
10          11.2        20          14.6        30          13.1
Frequency distribution of 30 adult male patients by Hb

Hb (g/dl)      No. of patients
9.0 – 9.9      1
10.0 – 10.9    3
11.0 – 11.9    6
12.0 – 12.9    10
13.0 – 13.9    5
14.0 – 14.9    3
15.0 – 15.9    2
Total          30
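The grouped distribution for the hemoglobin data can be reproduced by assigning each value to a class of width 1 g/dl that starts at a whole number. The sketch below does this by taking the floor of each value; the variable names are chosen for illustration.

```python
import math
from collections import Counter

# Hb values (g/dl) of the 30 patients, in patient-number order.
hb = [12.0, 11.9, 11.5, 14.2, 12.3, 13.0, 10.5, 12.8, 13.2, 11.2,
      11.2, 13.6, 10.8, 12.3, 12.3, 15.7, 12.6, 9.1, 12.9, 14.6,
      14.9, 12.2, 12.2, 11.4, 10.7, 12.5, 11.8, 15.1, 13.4, 13.1]

# Key each value by the lower limit of its class (9 for 9.0 to 9.9, etc.).
grouped = Counter(math.floor(value) for value in hb)

print(f"{'Hb (g/dl)':<14}{'No. of patients':>15}")
for lower in range(min(grouped), max(grouped) + 1):
    label = f"{lower}.0 - {lower}.9"
    print(f"{label:<14}{grouped[lower]:>15}")
print(f"{'Total':<14}{len(hb):>15}")
```

Iterating over every class between the smallest and largest lower limit means an empty class would still be printed with a count of zero, which keeps the table's intervals contiguous.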
DIAGRAMS/GRAPHS
Categorical data
--- Bar diagram (one or two groups)
--- Pie diagram
Continuous data
--- Histogram
--- Frequency polygon (curve)
--- Stem-and-leaf plot
--- Box-and-whisker plot
Two-dimensional graphs: Basic Set-Up
Histograms
Frequency Polygons
Example data
68 63 42 27 30 36 28 32
79 27 22 28 24 25 44 65
43 25 74 51 36 42 28 31
28 25 45 12 57 51 12 32
49 38 42 27 31 50 38 21
16 24 64 47 23 22 43 27
49 28 23 19 11 52 46 31
30 43 49 12
Stem and leaf plot
Stem-and-leaf of Age N = 60
Leaf Unit = 1.0
6 1 122269
19 2 1223344555777788888
11 3 00111226688
13 4 2223334567999
5 5 01127
4 6 3458
2 7 49
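The display above (count, stem, leaves) can be rebuilt from the 60 ages by splitting each value into a tens digit (the stem) and a units digit (the leaf). The sketch below is one simple way to do it for two-digit values with a leaf unit of 1; the function name `stem_and_leaf` is chosen for this example.

```python
from collections import defaultdict

# The 60 ages from the example data.
ages = [68, 63, 42, 27, 30, 36, 28, 32,
        79, 27, 22, 28, 24, 25, 44, 65,
        43, 25, 74, 51, 36, 42, 28, 31,
        28, 25, 45, 12, 57, 51, 12, 32,
        49, 38, 42, 27, 31, 50, 38, 21,
        16, 24, 64, 47, 23, 22, 43, 27,
        49, 28, 23, 19, 11, 52, 46, 31,
        30, 43, 49, 12]


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted leaves (units digits)."""
    stems = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        stems[stem].append(leaf)
    return {
        stem: "".join(str(leaf) for leaf in leaves)
        for stem, leaves in sorted(stems.items())
    }


plot = stem_and_leaf(ages)
print(f"Stem-and-leaf of Age N = {len(ages)}")
for stem, leaves in plot.items():
    # Columns: number of leaves on the row, stem, leaves.
    print(f"{len(leaves):>3} {stem:>2} {leaves}")
```

Unlike a histogram, the plot keeps every original value recoverable: stem 6 with leaves 3458 reads back as 63, 64, 65 and 68.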
Bar Graphs
[Figure: bar graph. X axis: Risk factor (Smo, Alc, Chol, DM, HTN, NoExer, F-H); Y axis: Number (0 to 25); bar labels as extracted: 9, 12, 20, 16, 12, 8, 20]
The distribution of risk factors among cases with cardiovascular disease
The height of each bar indicates frequency
Frequency on the Y axis and categories of the variable on the X axis
The bars should be of equal width and should not touch the other bars
HIV cases enrollment in USA by gender
[Figure: bar chart. X axis: Year (1986 to 1992); Y axis: Enrollment (hundred); series: Men, Women]
Bar chart
HIV cases enrollment in USA by gender
[Figure: bar chart. X axis: Year (1986 to 1992); Y axis: Enrollment (thousands); series: Women, Men]
Stacked bar chart
Grouped Bar Graph
Pie diagram
Pie diagram: depicts the percentage represented by each alternative as a slice of a circular pie; the larger the slice, the greater the percentage.
[Figure: pie chart. Slice labels: 10%, 20%, 70%; legend: Mild, Moderate, Severe]
The prevalence of different degrees of hypertension in the population
Pie Chart
• Circular diagram: the total is 100%
• Divided into segments, each representing a category
• Decide adjacent category
• The amount for each category is proportional to the slice of the pie
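Since each slice is proportional to its category's share of the total, the angle of a slice is simply that share of 360 degrees. The short sketch below works this out for the three percentages shown in the hypertension pie chart; it deliberately leaves the percentages unlabeled, because the transcript does not show which percentage belongs to which degree of hypertension.

```python
# Percentages from the hypertension pie chart (which slice is which
# category is not recoverable from the transcript).
shares_percent = [10, 20, 70]

# Each slice's central angle is its share of the full 360-degree circle.
angles = [360 * share / 100 for share in shares_percent]

for share, angle in zip(shares_percent, angles):
    print(f"{share:>3}% of the total -> {angle:.0f} degrees")
print(f"Total: {sum(shares_percent)}% -> {sum(angles):.0f} degrees")
```

If the shares do not add up to 100%, the angles will not add up to 360 degrees, which is a useful check before drawing the chart.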
General rules for designing graphs
• A graph should have a self-explanatory legend
• A graph should help the reader to understand the data
• Axes labeled, units of measurement indicated
• Scales are important. Start with zero (otherwise use a // break)
• Avoid graphs with a three-dimensional impression; they may be misleading (the reader visualizes them less easily)
Thank You