types of data & variables statistical procedures for ......types of data: quantitative data...

14
Statistical Procedures for Biomedical Study Jaranit Kaewkungwal Faculty of Tropical Medicine Types of Data & Variables Types of Data QUALITATIVE Data expressed by type Data that has been described QUANTITATIVE Data classified by numeric value Data that has been measured or counted Adapted from: Dr. Craig Jackson, University of Central England QUALITATIVE and QUANTITATIVE data are not mutually exclusive Types of Data: Qualitative (Categorical) Data NOMINAL DATA values that the data may have do not have specific order values act as labels with no real meaning Binomial: two possible values (categories, states) Multinomial: more than two possible values (categories, states) e.g. Health status healthy =1 sick=2 e.g. Treatment new regimen = 1 standard regimen = 2 e.g. hair colour brown =1 blond =2 black =100 ORDINAL DATA values with some kind of ordering data that has been measured or counted e.g. social class: upper=1 middle = 2 working = 3 e.g. glioblastoma tumor grade: 1 2 3 4 5 e.g. position in a race: 1 st 2 nd 3 rd Adapted from: Dr. Craig Jackson, University of Central England

Upload: others

Post on 09-Sep-2020

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Statistical Procedures for Biomedical Study

Jaranit KaewkungwalFaculty of Tropical Medicine

Types of Data & Variables

Types of Data

QUALITATIVE

Data expressed by type

Data that has been described

QUANTITATIVE

Data classified by numeric value

Data that has been measured or counted

Adapted from: Dr. Craig Jackson, University of Central England

QUALITATIVE and QUANTITATIVE data are not mutually exclusive

Types of Data: Qualitative (Categorical) Data

NOMINAL DATA• values that the data may have do not have specific order• values act as labels with no real meaning• Binomial: two possible values (categories, states)• Multinomial: more than two possible values (categories, states)e.g. Health status healthy =1 sick=2e.g. Treatment new regimen = 1 standard regimen = 2e.g. hair colour brown =1 blond =2 black =100

ORDINAL DATA• values with some kind of ordering • data that has been measured or countede.g. social class: upper=1 middle = 2 working = 3e.g. glioblastoma tumor grade: 1 2 3 4 5e.g. position in a race: 1 st 2 nd 3 rd

Adapted from: Dr. Craig Jackson, University of Central England

Page 2: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Types of Data: Quantitative Data

DISCRETE• distinct or separate parts, with no finite detaile.g children in family

CONTINUOUS• between any two values, there would be a thirde.g between meters there are centimetres

INTERVAL• equal intervals between values and an arbitrary zero on the scalee.g temperature gradient

RATIO• equal intervals between values and an absolute zeroe.g body mass index

Adapted from: Dr. Craig Jackson, University of Central England

Examples of Data Coding

1 2 99Cat.

Example of Descriptive Statistics Constant vs. Variable

Variables are the specific properties that have the ability to take different values.

Constants are the specific properties that cannot vary or won’t be made to vary.

Page 3: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Terminology - Variables

INDEPENDENT(syn: treatment, predictor, input, exposure, explanatory variable) is a stimulus or activity that is identified or manipulated to predict the dependent variable e.g. new drug, sex, smokingDEPENDENT(syn: effect, criterion, outcome, output variable) is a response that the researcher wanted to assess and it happens due to the independent variables. e.g., cure rates, attitudes, performance on neuropsychological test

Adapted from: Dr. Craig Jackson, University of Central England

X Y X (independent) Y (dependent)

Terminology - Variables

EXTRANEOUSExtraneous variable is a variable that has a potential to distorts the relationship between dependent and independent variables.

• Controlled extraneous variables are recognized before the study is initiated and are controlled in the design and selection criteria.• Uncontrolled extraneous variables are recognized before or after the study is initiated but are not controlled in the design. They can be assessed and adjusted later by statistical methods.

e.g., diet, income, age

Adapted from: Dr. Craig Jackson, University of Central England

X (independent) Y (dependent)X1 (independent) Y (dependent)

X2 (independent)

Study Variables

Confounding Variable- When the effects of two or more variables cannot be separated.

“Condom Use increases the risk of STD”BUT ...

Explanation: Individuals with more partners are more likely to use condoms. But individuals with more partners are also more likely to get STD.

Study VariablesConfounding Variable- When the effects of two or more variables cannot be separated.

Page 4: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Types of Study

•Types of Observational Study• Descriptive(Parameter Estimation)

- Cross-sectional• Analytic

(Hypothesis Testing)- Cross-sectional- Case-Control- Cohort

• Types of Experimental Study• True Experimental

- Randomize Control Trial (RCT)• Quasi Experimental

• Other Types of MedicalResearch Study

• Diagnosis• Prognostic Factor Study

Types of Study Design

•Types of Observational Study• Descriptive(Parameter Estimation)

- Cross-sectional

• Analytic(Hypothesis Testing)

- Cross-sectional- Case-Control- Cohort

Types of Study Design:Descriptive Study

Sampling Techniques

Generalization/Inferential Statistics

P

X

•Types of Observational Study• Descriptive(Parameter Estimation)

- Cross-sectional

• Analytic(Hypothesis Testing)

- Cross-sectional- Case-Control- Cohort

Types of Study Design:Analytic Study

Generalization/

P1      X1

P2     X2

Ho:P1 = P2      X1 = X2

Page 5: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Types of Study Design:Analytic - Observational

•Types of Observational Study• Descriptive(Parameter Estimation)

- Cross-sectional• Analytic

(Hypothesis Testing)- Cross-sectional- Case-Control- Cohort

Cross‐sectionalExposure <‐> Outcome

Smoking (y/n)<‐> Cancer (y/n)Depress <‐> QOL

Case‐ControlOutcome  Exposure

Cancer  (y/n) Smoking (y/n)Stroke after surgery(y/n) surgery time 

CohortExposure  Outcome

Smoking  (y/n) Cancer (y/n)Diabetes(y/n) Cognitive & physical function score 

D D

E a b m

E c d n

o p N

Coho

rt

Case-Control

• Types of Experimental Study• True Experimental

- Randomize Control Trial (RCT)• Quasi Experimental

Types of Study Design:Analytic - Experimental

ExperimentalExposure  Outcome

Treatment  (A/B) Cure (y/n)Intensive counseling (y/n) Coping score

D D

E a b m

E c d n

o p N

Coho

rtR

CT

Case-Control

Subjects, i = 1,…,n

A

C

B

Randomize

Treatment groups

• Types of Experimental Study• True Experimental

- Randomize Control Trial (RCT)• Quasi Experimental

Truly randomized subjects

Not‐ truly randomized subjects

Types of Study Design:Analytic - Experimental

Subjects, i = 1,…,n

A

C

B

Treatment groupsBaseline Exposure groups Measurement times

Types of Study Design:Analytic – Repeated measures (Longitudinal)

Page 6: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Subjects, i = 1,…,n

A

C

B

Measurement times

Xi0 Xi1 Xi2 Xi3 Xi4

Treatment groupsBaseline Exposure groups

Types of Study Design:Analytic – Repeated measures (Longitudinal)

Subjects, i = 1,…,n

A

B

Study Closure

LostStudy closed

OutcomeOutcome

OutcomeOutcome of other cause

OutcomeStudy closed

Failure

Censor

Time from randomized to outcomes / censor / study closure

Treatment groupsBaseline Exposure groups

Types of Study Design:Analytic – Time-to-Event (Longitudinal)

Types of Statistics

Types of Statistics

• By Level of Underlying Distribution– Parametric Statistics– Non-parametric Statistics

Sampling Techniques

Generalization/Inferential Statistics

Page 7: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariable data analysis– Multivariate data analysis

Sampling Techniques

Generalization/Inferential Statistics

Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariable data analysis

Sampling Techniques

Generalization/Inferential Statistics

Describe Statistics of SampleP      X     SD

Descriptive Statistics

• Describe sample statistics– Categorical Variables

Proportion, Percent (%)– Continuous Variable

• Normal Distribution Mean SD

• Not normal DistributionMedian Range/IQR

Examples: Descriptive Statistics

Page 8: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Examples: Descriptive Statistics Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariable data analysis

Descriptive StudyEstimate Parameter in Population from  Statistics of Sample

P   P (95%CI of P)                        Ex:  20% (17%‐23%) = X (SD)   X (SE)  or X (95%CI of X)     Ex:  2.75 (1.60‐3.90) = 

Sampling Techniques

Generalization/Inferential Statistics

P

X

Inferential Statistics

• Estimate population parameters– Categorical Variables

Proportion, Percent (%) with 95%CI– Continuous Variable

Mean SE (or 95%CI)

Examples: Parameter Estimation

Page 9: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Examples: Parameter Estimation Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariable data analysis

Generalization/

P1      X1

P2     X2

Ho:P1 = P2      X1 = X2

Analytic StudyAssociation or Causation 

between Exposure and Outcome

Inferential Statistics

• Comparison between groups – Categorical outcome

Chi-square (2) test– Continuous Variable

• Normal Distributiont-test (2 groups) F-test/ANOVA (>= 2 groups)

• Not normal DistributionWilcoxon test, Mann-Whitney U test

Examples: 2‐test  & t‐test

Page 10: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Examples: Paired t-test Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariable data analysis

Analytic StudyAssociation or Causation 

between One or Many Exposures and Outcome

Example:SimpleY    XBP    Drug (A/B)BP   Age

MultipleY    X1 X2 X3BP   Drug(A/B)  Sex (M/F)  Age

Types of Statistics

Regression Model• Prediction of outcome from exposures• Hypothesis testing with adjusted (controlled) for confounding 

other variablesSimple     Y    X• BP    Drug (A/B)

BP = a + b Drug• BP   Age

BP = a + b AgeMultiple    Y    X1 X2 X3

BP   Drug(A/B)  Sex (M/F)  AgeBP = a + b1 Drug  + b2 Drug  + b3 Age BP = 100  +   3 Drug  +    5 Sex     ‐ 7 Age 

A=0 / B=1    M=0 / F=1

Multi-variables analysis

• Linear Regression

• Logistic Regression

• Poisson Regression

• Cox’s Proportional Hazard Regression

Y X1 X2 X3 …Continuous Continuous, Categorical

(normal dist)

Y X1 X2 X3 …Categorical Continuous, Categorical

(Binary, Ordinal, Multinomial))

Y X1 X2 X3 …Incidence / Count Continuous, Categorical

Y X1 X2 X3 …Time to event Continuous, Categorical

Page 11: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Typical Regression Models in Clinical Research

• Linear Regression

• Binary Logistic Regression

• Poisson Regression

• Cox’s Proportional Hazard Regression

Y X1 X2 X3 …Continuous Continuous, Categorical

Asthma scores Age Sex Smoking

Y X1 X2 X3 …Categorical Continuous, Categorical

Asthma(1) vs. No Asthma(0) Age Sex Smoking

Y X1 X2 X3 …Incidence / Count Continuous, Categorical

Asthma episodes in past year Age Sex Smoking

Y X1 X2 X3 …Time to event Continuous, Categorical

Time to asthma relapse Age Sex Smoking

Comparisons Among Different Regression Models

Linear regression model

OR = exp Binary logistic regression model

Cox regression modelHR = exp 

IRR = exp 

Poisson regression model

Note: If P0 (T) is small, when comparing two groups

Simple and Multiple Models Crude and Adjusted RRin Case-Control study

Page 12: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Relative Risk in Experimental Study Relative Risk in Experimental Study

Relative Risk in Experimental Study

4/9  = 0.441/9  = 0.11

0.44 – 0.110.44

วเิคราะห์แบบได้รับการรักษาครบตามทีก่าํหนดไว้ในแผนการวจิยั(Per‐Protocol – PP)

Relative Risk in Experimental Study

Page 13: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Example from RV144 (2009)

ITT Population

Modified ITT Population

IRR / HR and Efficacy in Clinical TrialExample from RV144 (2009)

ประสิทธิผล0.279 – 0.192

0.279

ประสิทธิผล1- RR = 1 - 0.192

0.279

Types of Statistics

• By Level of Generalization– Descriptive Statistics– Inferential Statistics

• Parameter Estimation• Hypothesis Testing

– Comparison– Association– Multivariate data analysis

Analytic StudyAssociation or Causation 

between Many Exposures and Many Outcomes Over Time

Example:Single OutcomeY    XY    X1 X2 X3BP   Drug(A/B)  Sex (M/F)  Age

Repeated OutcomesYt1 Yt2 Yt3 XYt1 Yt2 Yt3 X1 X2 X3BPt1 BPt2 BPt3 Drug(A/B)  Sex (M/F)  Age

Different Analytic Modelsin Longitudinal Data Analysis

• Generalized Linear Model (GLM)– Basic Statistical Methods for Hypothesis Testing

(t-test, F-test, Regressions)– Repeated Measures Analysis of Variance

• Generalized Estimating Equation (GEE)• Time-varying Covariates• Generalized Mixed Model• Survival & Time-to-event Model• Multiple Failures Model• Time series Model (secondary data)

Page 14: Types of Data & Variables Statistical Procedures for ......Types of Data: Quantitative Data DISCRETE • distinct or separate parts, with no finite detail e.g children in family CONTINUOUS

Example: Comparing means at each time point

Example: Comparing means at each time point and against baseline

Example: Comparing change over time

The End Statistical Procedures for Clinical Study