Statistics

Dr. Ahmed Abd Elmaksoud
Lecturer of Anesthesiology & ICU
Faculty of Medicine, Ain Shams University

When guessing is an informed decision

Upload: colleen-parmeter

Post on 15-Dec-2015



Objectives

Define statistics and understand its terminology

Discuss the importance of, and need for, statistics in the medical field

Distinguish types of data and variables

Describe types of statistics and statistical tests

What is statistics?

Statistics is the science of collecting, describing, and analyzing data in order to reach a good decision.

Collecting data

Describing data

Analyzing data

Good Decision

Why do we need statistics?

Variability (atropine)

Causes:
Uncontrollable (too many) factors
Immeasurable factors
Unknown factors

Why do we need statistics?

Variability (atropine)

Effect of variability:
Large amount of data describing the same thing (many values for one variable)
No certainty: “deterministic vs probabilistic”
Sampling

Functions of statistics (new hypothetical β-blocker drug)

Describe (descriptive statistics)
Infer (inferential statistics)

Variables & Data

A variable is something whose value can vary. For example, age, gender and blood type are variables.

Data are the values you get when you measure a variable

                Mrs Brown    Mr Patel    Ms Manda
Age             32           24          20
Gender          Female       Male        Female
Blood type      O            O           A

The row labels (Age, Gender, Blood type) are the variables; the entries in each row are the data.

Types of variables

1. Qualitative variables (data)

a) Categorical variable: the values (data) of a categorical variable are categories

Gender (dichotomous, binary): Male, Female

Type of ICU admission: Medical, Surgical, Physical injuries, Poisoning, Others

Types of variables

1. Qualitative variables (data)

b) Ordinal variable: a categorical variable whose values are ordered

Degree of illness: Mild, Moderate, Severe

Physical status: ASA I, ASA II, ASA III, ASA IV, ASA V

Glasgow Coma Scale

Types of variables

2. Quantitative (numerical) variables

a) Discrete variable: episodes of myocardial ischemia

b) Continuous (interval, ratio) variable: cardiac index, creatinine clearance

Types of variables

Changing data scales

Sample and Population

A sample is a group (subset) taken from a population.

A population is the group of ALL individuals (entities) sharing specific characteristics.

Sample and Population

All human beings
Day case surgical patients undergoing general anesthesia
Low-risk CABG surgery patients
ICU patients with septic shock
Women undergoing CS under spinal anesthesia
Human skeletal muscle fibers
Cardiac muscles of rats

Descriptive statistics

Descriptive statistics is a series of procedures designed to illuminate the data, so that its principal characteristics and main features are revealed.

This may mean sorting the data by size, perhaps putting it into a table, maybe presenting it in an appropriate chart, or summarizing it numerically, and so on.

Descriptive statistics

Qualitative variables

Frequency
Relative frequency
Cumulative frequency
Cross tabulation (what about numerical variables? grouping)
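A minimal sketch of how these four summaries can be computed. The patient counts (12, 5, 20, 14) are hypothetical: they were chosen only so that the relative frequencies come out as 23.5%, 9.8%, 39.2% and 27.5%, the figures on the pie chart in this deck, and which percentage belongs to which complication is an assumption. The function names are illustrative.

```python
from collections import Counter

# Hypothetical observations: one recorded complication per patient.
# The counts (12, 5, 20, 14) are an assumption for illustration only.
complications = (
    ["Nausea"] * 12 + ["Vomiting"] * 5 + ["Pain"] * 20 + ["Cough"] * 14
)

def frequency_table(values):
    """Return rows of (category, frequency, relative, cumulative relative)."""
    counts = Counter(values)          # keeps categories in order of first appearance
    total = sum(counts.values())
    rows = []
    running = 0
    for category, count in counts.items():
        running += count
        rows.append((category, count, count / total, running / total))
    return rows

def cross_tabulate(row_values, column_values):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(row_values, column_values))

for category, freq, rel, cum in frequency_table(complications):
    print(f"{category:<10}{freq:>4}{rel:>8.1%}{cum:>8.1%}")
```

Cumulative frequency is most meaningful when the categories are ordered (ordinal data); for nominal categories such as these it depends on the arbitrary listing order.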

Descriptive statistics

Qualitative variables

In terms of describing data, an appropriate chart is almost always a good idea.

What ‘appropriate’ means depends primarily on the type of data, as well as on what particular features of it you want to explore.

Finally, a chart can often be used to illustrate or explain a complex situation for which a form of words or a table might be clumsy, lengthy or otherwise inadequate.

Descriptive statistics

Qualitative variables: charts

Pie chart

[Figure: pie chart "Postoperative Complications" with four slices (Nausea, Vomiting, Pain, Cough) labelled 23.5%, 9.8%, 39.2% and 27.5%]

Advantages:
1. Summarizes the data (area represents relative frequency)
2. Shows magnitude (relative frequency)

Disadvantages:
1. One variable only
2. Loses clarity if there are more than 4-5 categories
3. No cross tabulation (needs separate pies)

Descriptive statistics

Qualitative variables: charts

Pie chart
Bar chart: simple

[Figure: simple bar chart "Postoperative Complications"; x-axis: Nausea, Vomiting, Pain, Cough; y-axis: Number of patients, 0 to 25]

Alternative; width; spacing

[Figure: bar chart "Postoperative Complications"; x-axis: Nausea, Vomiting, Pain, Cough; y-axis: Number of patients (%), 0% to 45%]

Descriptive statistics

Qualitative variables: charts

Pie chart
Bar chart: simple, clustered

[Figure: clustered bar chart "Postoperative Complications"; x-axis: Nausea, Vomiting, Pain, Cough; y-axis: Number of patients (%), 0% to 45%; series: Group I, Group II]

Descriptive statistics

Qualitative variables: charts

Pie chart
Bar chart: simple, clustered, stacked

[Figure: clustered bar chart "Postoperative Complications"; x-axis: Nausea, Vomiting, Pain, Cough; y-axis: Number of patients (%), 0% to 45%; series: Group I, Group II]

[Figure: stacked bar chart "Postoperative Complications"; x-axis: Group I, Group II; y-axis: Number of patients (%), 0% to 100%; segments: Cough, Pain, Vomiting, Nausea]

Descriptive statistics

Qualitative variables: charts

Pie chart
Bar chart: simple, clustered, stacked
Histogram: one variable, no gaps between bars

Descriptive statistics

Qualitative variables: charts

Pie chart
Bar chart: simple, clustered, stacked
Histogram
Frequency polygon
Cumulative frequency polygon (ogive)

[Figure: chart "Postoperative Complications"; x-axis: Nausea, Vomiting, Pain, Cough; y-axis: Number of patients (%), 0% to 40%; series: Group I, Group II]

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location): Mean (average)

1, 1, 3, 5, 10

Advantages, disadvantages, ratio

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location): Median

1, 1, 3, 5, 10

1, 3, 5, 9

Odd vs even

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location): Mode

1, 1, 3, 5, 10

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location): Midrange

1, 1, 3, 5, 10

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

1, 1, 3, 5, 10
Mean = 4
Median = 3
Mode = 1
Midrange = 5.5
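As a quick check, the four measures can be computed for the example values with Python's standard `statistics` module. This is only a sketch; the midrange is taken by its usual definition, (smallest + largest) / 2, which gives 5.5 for these values.

```python
import statistics

data = [1, 1, 3, 5, 10]  # the example values from the slides

mean = statistics.mean(data)             # sum of the values / number of values
median = statistics.median(data)         # middle value of the sorted data
mode = statistics.mode(data)             # most frequent value
midrange = (min(data) + max(data)) / 2   # halfway between smallest and largest

# With an even number of values the median is the mean of the two middle ones.
median_even = statistics.median([1, 3, 5, 9])

print(mean, median, mode, midrange, median_even)
```

Note how the single large value (10) pulls the mean above the median, which is one reason the median is often preferred for skewed data.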

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

Measures of dispersion (summary measures of spread): Range

1, 1, 3, 5, 10

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

Measures of dispersion (summary measures of spread): Percentiles, quartiles

1, 1, 3, 5, 10

1, 2, 5, 6, 8, 9, 11, 13, 17, 20, 22

1st quartile, 3rd quartile

70th percentile

90th percentile
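A sketch of the quartiles and percentiles for the 11-value example, using `statistics.quantiles` from the Python standard library. Several conventions for locating percentiles exist and the slide does not say which one it uses, so both of the module's methods are shown as an assumption; on small samples they can disagree.

```python
import statistics

data = [1, 2, 5, 6, 8, 9, 11, 13, 17, 20, 22]  # the 11 values from the slide

# Two common conventions; they can give different quartiles on small samples.
q1_exc, q2_exc, q3_exc = statistics.quantiles(data, n=4, method="exclusive")
q1_inc, q2_inc, q3_inc = statistics.quantiles(data, n=4, method="inclusive")

# n=100 returns the 99 cut points P1..P99, so P70 sits at index 69.
percentiles = statistics.quantiles(data, n=100, method="inclusive")
p70, p90 = percentiles[69], percentiles[89]

print("exclusive quartiles:", q1_exc, q2_exc, q3_exc)
print("inclusive quartiles:", q1_inc, q2_inc, q3_inc)
print("P70, P90 (inclusive):", p70, p90)
```

Both methods agree on the median (the 2nd quartile), but not on the 1st and 3rd quartiles, so it is worth stating the method when reporting them.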

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

Measures of dispersion (summary measures of spread): Inter-quartile range (Q3 - Q1)

1, 1, 3, 5, 10

1st quartile, 3rd quartile

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

Measures of dispersion (summary measures of spread): Variance, standard deviation

1, 1, 3, 5, 10

Descriptive statistics

Numerical variables

Measures of central tendency (summary measures of location)

Measures of dispersion (summary measures of spread)

1, 1, 3, 5, 10
Range = 9
Inter-quartile range = 4
Variance = 11.2
Standard deviation = 3.35
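The figures 9, 4, 11.2 and 3.35 can be checked with a short sketch. Two things here are assumptions rather than statements from the slides: the quartile convention (`method="inclusive"`), and the observation that 11.2 and 3.35 correspond to the formulas that divide by n. The n - 1 versions are printed alongside for comparison.

```python
import statistics

data = [1, 1, 3, 5, 10]  # the example values from the slides

value_range = max(data) - min(data)

# "inclusive" quartiles are an assumption; other conventions give other IQRs.
q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Population formulas (divide by n), which reproduce the slide's figures.
variance_n = statistics.pvariance(data)
sd_n = statistics.pstdev(data)

# Sample formulas (divide by n - 1), the usual choice when data are a sample.
variance_n1 = statistics.variance(data)
sd_n1 = statistics.stdev(data)

print(value_range, iqr, variance_n, round(sd_n, 2), variance_n1, round(sd_n1, 2))
```

The standard deviation is the square root of the variance, so it is expressed in the same units as the data, which makes it easier to interpret.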

Descriptive statistics

Numerical variables

Inferential statistics (Informed guess)

Making inferences about population parameters from sample statistics

Standard error (SD of the statistic)

Confidence interval (95% CI)
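A minimal sketch of both ideas for a sample mean. The heart-rate values are hypothetical, and the interval uses the normal approximation (about 1.96 standard errors either side of the mean); for a sample this small a t-distribution multiplier would usually be preferred.

```python
import math
import statistics

# Hypothetical sample of heart rates (beats/min); not data from the lecture.
heart_rates = [72, 75, 70, 78, 74, 69, 80, 73, 76, 71]

n = len(heart_rates)
sample_mean = statistics.mean(heart_rates)
sample_sd = statistics.stdev(heart_rates)      # divides by n - 1

# Standard error of the mean: the SD of the sample mean viewed as a statistic.
standard_error = sample_sd / math.sqrt(n)

# Normal approximation: roughly 1.96 standard errors either side of the mean.
z = statistics.NormalDist().inv_cdf(0.975)
ci_low = sample_mean - z * standard_error
ci_high = sample_mean + z * standard_error

print(f"mean = {sample_mean:.1f}, SE = {standard_error:.2f}, "
      f"95% CI = ({ci_low:.1f}, {ci_high:.1f})")
```

A larger sample shrinks the standard error (it falls with the square root of n), and the confidence interval narrows accordingly.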

Inferential statistics (Informed guess)

Hypothesis testing

Almost all clinical research begins with a question. For example, is stress a risk factor for breast cancer?

To answer questions like this you have to transform the research question into a testable hypothesis called the null hypothesis, conventionally labeled H0.

This usually takes the following form:
H0: Stress is NOT a risk factor for breast cancer
H0: The drug has NO effect on mean heart rate

Inferential statistics (Informed guess)

Hypothesis testing

Null hypotheses reflect the conservative position of no difference, no risk, no effect, etc.

To test this null hypothesis, researchers will take samples and measure outcomes, and decide whether the data from the sample provide strong enough evidence to reject the null hypothesis or not.

If the evidence against the null hypothesis is strong enough for us to reject it, then we are implicitly accepting that some specified alternative hypothesis, usually labelled H1, is probably true.

Inferential statistics (Informed guess)

Hypothesis testing

Example: Is the new hypothetical β-blocker more effective than a conventional β-blocker (e.g. Inderal) in decreasing heart rate, or not?

Inferential statistics (Informed guess)

Hypothesis testing

Example: Let the mean heart rate of all people taking Inderal be μ1.

Let the mean heart rate of all people taking the new drug be μ2.
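With the two population means named, the usual formulation would be H0: the two means are equal, against H1: they differ. That formulation is an assumption here, since the slide's own notation did not survive extraction. The sketch below tests it with an exact permutation test, a technique chosen for illustration and not one named in the lecture, on hypothetical heart rates.

```python
from itertools import combinations

# Hypothetical heart rates (beats/min); not data from the lecture.
inderal = [72, 75, 70, 78, 74]
new_drug = [65, 68, 66, 70, 63]

def mean(values):
    return sum(values) / len(values)

def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test for a difference in means."""
    pooled = group_a + group_b
    observed = abs(mean(group_a) - mean(group_b))
    indices = range(len(pooled))
    extreme = total = 0
    # Under H0 the group labels are interchangeable, so try every relabelling.
    for chosen in combinations(indices, len(group_a)):
        chosen_set = set(chosen)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in indices if i not in chosen_set]
        total += 1
        # Small tolerance guards against floating-point rounding on ties.
        if abs(mean(a) - mean(b)) >= observed - 1e-12:
            extreme += 1
    return extreme / total

p_value = permutation_p_value(inderal, new_drug)
print(f"p = {p_value:.4f}")
```

The result is the proportion of relabellings that give a difference at least as large as the one observed; a small value is evidence against H0.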

Inferential statistics (Informed guess)

Type I & Type II Errors

                          H0 is true       H0 is false
Decision: accept H0       Good decision    Type II error
Decision: reject H0       Type I error     Good decision

Inferential statistics (Informed guess)

α: probability of committing a type I error

β: probability of committing a type II error

p-value: the probability of getting the outcome observed (or one more extreme), assuming the null hypothesis to be true.

Sample size & power of the study
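How sample size and power are linked can be sketched with the common normal-approximation formula for comparing two means, n per group = 2 (z for 1 - α/2 + z for power)² σ² / δ², where α is the type I error probability and power is 1 - β. The formula choice, the function name and the planning values below are assumptions for illustration, not taken from the lecture.

```python
import math
from statistics import NormalDist

def sample_size_per_group(sigma, delta, alpha=0.05, power=0.80):
    """Approximate n per group for comparing two means (two-sided test).

    sigma: assumed common standard deviation of the outcome
    delta: smallest difference in means worth detecting
    power: 1 - beta, the probability of detecting delta if it exists
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    n = 2 * ((z_alpha + z_beta) * sigma / delta) ** 2
    return math.ceil(n)

# Hypothetical planning values: SD of 10 beats/min, difference of 5 beats/min.
print(sample_size_per_group(sigma=10, delta=5))
print(sample_size_per_group(sigma=10, delta=5, power=0.90))
```

Asking for more power, a smaller α, or a smaller detectable difference all increase the required sample size.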

Inferential statistics (Informed guess)

How does it work? Probability distributions

Inferential statistics (Informed guess)

How does it work? Parametric vs non-parametric tests

Inferential statistics (Informed guess)

Some examples of hypothesis testing

Comparisons:
One sample
Two independent samples
Two dependent samples
More than two samples (independent or dependent)
Comparing two or more factors

Association

Inferential statistics (Informed guess)

Some examples of hypothesis testing

Comparisons:
One sample
Two independent samples
Two dependent samples
More than two samples (independent or dependent)

Association

Prediction

[Figure: plot of Weight (kg) against Age (y) with fitted line y = 2.03x + 8; x-axis: 0 to 14; y-axis: 0 to 45]
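A short sketch of what "prediction" means with this line: the slope and intercept are those printed on the chart, while the helper names and the least-squares fitting function (the standard way such a line is obtained) are illustrative additions.

```python
SLOPE = 2.03      # kg per year, from the line on the slide
INTERCEPT = 8.0   # kg, from the line on the slide

def predicted_weight(age_years):
    """Predicted weight (kg) from the fitted line y = 2.03x + 8."""
    return SLOPE * age_years + INTERCEPT

def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for paired data."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

for age in (2, 6, 10):
    print(f"age {age:>2} y -> {predicted_weight(age):.1f} kg")
```

Predictions are only trustworthy inside the range of ages that produced the line (0 to 14 years on the chart's axis); extrapolating beyond it is unsafe.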

Inferential statistics (Informed guess)

Some examples of hypothesis testing

Comparisons:
One sample
Two independent samples
Two dependent samples
More than two samples (independent or dependent)

Association

Prediction

Diagnostic test (dichotomous, continuous)

Diagnostic tests

                     Disease present    Disease absent
Test positive        TP                 FP
Test negative        FN                 TN
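The four cells of this table give the usual summary measures of a dichotomous diagnostic test. The sketch below uses hypothetical counts; the function name is illustrative.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Summary measures from the 2x2 table of test result against disease status."""
    return {
        "sensitivity": tp / (tp + fn),   # diseased patients who test positive
        "specificity": tn / (tn + fp),   # disease-free patients who test negative
        "ppv": tp / (tp + fp),           # positive tests that are truly diseased
        "npv": tn / (tn + fn),           # negative tests that are truly disease-free
    }

# Hypothetical counts, for illustration only.
measures = diagnostic_measures(tp=45, fp=10, fn=5, tn=90)
for name, value in measures.items():
    print(f"{name}: {value:.2f}")
```

Sensitivity and specificity describe the test itself, whereas the predictive values also depend on how common the disease is in the group tested.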

[Figure: ROC curve; x-axis: 1 - Specificity, 0.0 to 1.0; y-axis: Sensitivity, 0.0 to 1.0. Diagonal segments are produced by ties.]
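For a continuous test, each possible cut-off gives one pair of sensitivity and 1 - specificity, and the ROC curve joins those pairs. A minimal sketch with hypothetical test values, assuming higher values suggest disease:

```python
# Hypothetical continuous test results; higher values suggest disease.
diseased = [0.9, 0.8, 0.6, 0.55]
healthy = [0.7, 0.5, 0.4, 0.3]

def roc_points(diseased, healthy):
    """Return (1 - specificity, sensitivity) for each possible cut-off."""
    points = [(0.0, 0.0)]  # cut-off above every value: nobody tests positive
    for cutoff in sorted(set(diseased + healthy), reverse=True):
        sensitivity = sum(x >= cutoff for x in diseased) / len(diseased)
        false_positive_rate = sum(x >= cutoff for x in healthy) / len(healthy)
        points.append((false_positive_rate, sensitivity))
    return points

def area_under_curve(points):
    """Trapezoidal area under the ROC points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

points = roc_points(diseased, healthy)
print(points)
print(f"AUC = {area_under_curve(points):.3f}")
```

An area of 0.5 corresponds to a test no better than chance, and 1.0 to a test that separates the two groups perfectly.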

Inferential statistics (Informed guess)

Some examples of hypothesis testing

Comparisons:
One sample
Two independent samples
Two dependent samples
More than two samples (independent or dependent)

Association

Prediction

Diagnostic test (dichotomous, continuous)

Survival analysis (censored data)

Final words

If valid data are analyzed improperly, then the results become invalid and the conclusions may well be inappropriate.

At best, the net effect is to waste time, effort, and money for the project.

At worst, therapeutic decisions may well be based upon invalid conclusions and patients’ wellbeing may be jeopardized.

Thank you