lecture 1 - introduction 1 - introduction.pdf · data introduction each column should be divided...


Post on 15-Jul-2020


Page 1: Lecture 1 - Introduction 1 - Introduction.pdf · Data introduction Each column should be divided into two columns: VSH I VSHI1h, VSHI2h VSH E VSHE1h, VSHE2h This column should be

Lecture 1 - INTRODUCTION

Definitions
Data collection: principles
Data classification: data types
Data presentation / visualization: tables, charts / graphs

Page 2

Definition

Statistics is the science of collecting, classifying, presenting and interpreting data, and using it to draw conclusions and to make decisions.

Page 3

Definitions

Descriptive statistics: collecting, classifying and presenting data.

Inferential statistics: interpreting data and results generated by descriptive statistics in order to draw conclusions and make decisions.

Biostatistics is the science that deals with the application of statistical methods in the life sciences.

Page 4

Definitions

A statistical population is a set of items that have one or more common characteristics and are subject to statistical research.

A statistical individual is an element of a statistical population, regardless of its nature.

A sample is a set of individuals selected from a statistical population by a defined procedure.

A feature that changes from individual to individual, or in the same individual over time or in response to environmental conditions, disease, medication etc., is called a variable or parameter.

Page 5

1. Data collection

Basic principle

It should not be possible to decompose the information entered into a cell of a table into simpler information.

[Example tables: WRONG, RIGHT]

Page 6

Wrong! Most cells contain information that can be separated.

For the left column it would be correct to create a separate column for each diagnosis (CICD, CICN, AP, API), in which we write only YES or NO.
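The split into one YES/NO column per diagnosis can be sketched in Python. This is only an illustration: the comma-separated input format and the helper name `split_diagnoses` are assumptions, and only the four diagnosis codes come from the slide.

```python
# Diagnosis codes named on the slide; each becomes its own YES/NO column.
DIAGNOSES = ["CICD", "CICN", "AP", "API"]


def split_diagnoses(cell: str) -> dict[str, str]:
    """Turn one composite cell into one YES/NO value per diagnosis.

    Assumes the codes are separated by commas (a hypothetical input format).
    """
    present = {code.strip().upper() for code in cell.split(",") if code.strip()}
    return {code: "YES" if code in present else "NO" for code in DIAGNOSES}


row = split_diagnoses("CICD, AP")
print(row)  # {'CICD': 'YES', 'CICN': 'NO', 'AP': 'YES', 'API': 'NO'}
```

Codes are compared as whole tokens, so a cell containing only API does not also mark AP as present.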

Page 7

Data entry

Each column should be divided into two columns:

VSH I: VSHI1h, VSHI2h
VSH E: VSHE1h, VSHE2h

This column should be replaced by two columns:

HTA, where we write YES or NO
Stage, where we write I / II / III / IV
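As a rough sketch of the second replacement, the function below splits a composite cell into the two columns HTA and Stage. The slide does not show the original cell contents, so the input format ("HTA II", or anything else for no HTA) and the name `split_hta` are assumptions.

```python
# Stage codes listed on the slide.
VALID_STAGES = {"I", "II", "III", "IV"}


def split_hta(cell: str) -> tuple[str, str]:
    """Split a composite cell such as "HTA II" into (HTA, Stage).

    The input format is hypothetical. Returns ("NO", "") when the cell
    does not start with HTA, so the Stage column stays empty.
    """
    parts = cell.strip().upper().split()
    if not parts or parts[0] != "HTA":
        return ("NO", "")
    stage = parts[1] if len(parts) > 1 and parts[1] in VALID_STAGES else ""
    return ("YES", stage)


print(split_hta("HTA II"))  # ('YES', 'II')
print(split_hta("no"))      # ('NO', '')
```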

Page 8

Table design - recommendations

Number the rows of the table, whether or not the software does it automatically.

Do not merge cells to group multiple columns / rows.

Do not make a separate table for each important category of patients.

Page 9

2. Data classification

Data types

Numerical data, discrete or continuous: age, height, weight, blood pressure, hemoglobin (Hb), glycemia (blood sugar).

Ordinal data: disease stage, discharge status. The codes used have a clear order.

Categorical data: disease code, blood type, hair color, preferred political party. The codes used do not have a clear order.

Alphanumeric data: name, surname, address, workplace, description of a disease.
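The practical difference between ordinal and categorical codes can be illustrated with a short Python sketch. The sample values and the `STAGE_ORDER` mapping are made up for illustration: ordinal codes can be ranked and sorted, while for categorical codes only counting is meaningful.

```python
from collections import Counter

# Ordinal codes carry an order, so each code can be mapped to a rank.
STAGE_ORDER = {"I": 1, "II": 2, "III": 3, "IV": 4}

stages = ["III", "I", "IV", "II", "III"]  # hypothetical sample
sorted_stages = sorted(stages, key=STAGE_ORDER.get)
print(sorted_stages)                     # ['I', 'II', 'III', 'III', 'IV']
print(max(stages, key=STAGE_ORDER.get))  # IV

# Categorical codes have no order; the meaningful summary is a count.
blood_types = ["A", "O", "B", "O", "AB", "O"]  # hypothetical sample
most_common = Counter(blood_types).most_common(1)
print(most_common)                       # [('O', 3)]
```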

Page 10

Data table

No.  Year  Name        Surname    Sex  Area   Age  Decade  Occupation     Stage
1    2008  CALOTA      LUCIA      F    RURAL  62   60-69   FARA OCUPATIE  III
2    2009  CONSTANTIN  MARIN      M    URBAN  55   50-59   FARA OCUPATIE  III
3    2007  FLOREA      ELENA      F    RURAL  83   80-89   PENSIONAR      II
4    2009  HOLT        MARIANA    F    URBAN  65   60-69   PENSIONAR      I
5    2010  IVANESCU    VIRGIL     M    RURAL  64   60-69   PENSIONAR      II
6    2012  LEPADAT     MARIN      M    URBAN  68   60-69   PENSIONAR      III
7    2011  MANOLACHE   EUGENIA    F    RURAL  39   30-39   SALARIAT       IV
8    2010  MARINESCU   DAN        M    RURAL  57   50-59   PENSIONAR      IV
9    2008  STAN        SANDU      M    URBAN  53   50-59   PENSIONAR      V
10   2007  NEAGU       MARIA      M    URBAN  53   50-59   PENSIONAR      III
11   2008  NEDELEA     GHEORGHE   F    RURAL  70   70-79   PENSIONAR      II
12   2009  ORZESCU     ION        M    URBAN  71   70-79   PENSIONAR      V
13   2011  PALIU       MARIN      F    RURAL  76   70-79   PENSIONAR      IV
14   2013  PISICA      MIHAIL     F    RURAL  72   70-79   PENSIONAR      III
15   2010  POPESCU     PETRE      M    URBAN  58   50-59   PENSIONAR      IV
16   2012  PREDA       ION        M    RURAL  45   40-49   SALARIAT       V
17   2009  ALBU        NICOLAE    M    RURAL  45   40-49   SALARIAT       V
18   2008  RADUCAN     ELISABETA  M    URBAN  62   60-69   FARA OCUPATIE  IV
19   2010  RADUCEANU   ION        M    URBAN  39   30-39   FARA OCUPATIE  III
20   2012  IONESCU     MARIA      M    URBAN  39   30-39   FARA OCUPATIE  IV

Occupation values are in Romanian: FARA OCUPATIE = no occupation, PENSIONAR = retired, SALARIAT = employee.
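In the data table, the Decade column follows directly from Age, so it can be computed instead of typed by hand. A minimal sketch, assuming ten-year classes labeled as in the table (the function name `decade` is hypothetical):

```python
def decade(age: int) -> str:
    """Return the ten-year class label for an age, e.g. 62 -> "60-69"."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# (Age, Decade) pairs copied from rows 1, 3 and 7 of the data table.
samples = [(62, "60-69"), (83, "80-89"), (39, "30-39")]
for age, expected in samples:
    print(age, decade(age), decade(age) == expected)
```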

Page 11

3. Data presentation

Frequency tables

Nr.    Age       Fi  Ficc  Ficd       fi     ficc     ficd
1      25 - 30    5     5   234    2.14%    2.14%  100.00%
2      30 - 35    6    11   229    2.56%    4.70%   97.86%
3      35 - 40    9    20   223    3.85%    8.55%   95.30%
4      40 - 45   26    46   214   11.11%   19.66%   91.45%
5      45 - 50   30    76   188   12.82%   32.48%   80.34%
6      50 - 55   50   126   158   21.37%   53.85%   67.52%
7      55 - 60   53   179   108   22.65%   76.50%   46.15%
8      60 - 65   32   211    55   13.68%   90.17%   23.50%
9      65 - 70   14   225    23    5.98%   96.15%    9.83%
10     70 - 75    5   230     9    2.14%   98.29%    3.85%
11     75 - 80    4   234     4    1.71%  100.00%    1.71%
Total           234                 100%

Fi = absolute frequency; Ficc and Ficd = ascending and descending cumulative absolute frequencies; fi, ficc, ficd = the corresponding relative frequencies.

Number of individuals = absolute frequency

Percentage = relative frequency
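The cumulative and relative columns of the frequency table can be recomputed from the absolute frequencies alone. The sketch below is one possible way to do it in Python; the variable names mirror the table headers, and the helper `pct` is an assumption introduced for formatting.

```python
from itertools import accumulate

# Age classes and absolute frequencies Fi, copied from the table.
classes = ["25 - 30", "30 - 35", "35 - 40", "40 - 45", "45 - 50", "50 - 55",
           "55 - 60", "60 - 65", "65 - 70", "70 - 75", "75 - 80"]
Fi = [5, 6, 9, 26, 30, 50, 53, 32, 14, 5, 4]

total = sum(Fi)                          # 234 individuals
Ficc = list(accumulate(Fi))              # ascending cumulative frequency
Ficd = list(accumulate(Fi[::-1]))[::-1]  # descending cumulative frequency


def pct(count: int) -> str:
    """Relative frequency of a count, as a percentage with two decimals."""
    return f"{100 * count / total:.2f}%"


for age, f, cc, cd in zip(classes, Fi, Ficc, Ficd):
    print(f"{age:8} {f:4} {cc:5} {cd:5} {pct(f):>8} {pct(cc):>8} {pct(cd):>8}")
```

Each descending cumulative value counts the individuals in that class and all later classes, which is why the first row holds the full total.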

Page 12

Frequency tables - example

Survival in breast cancer: 2456 patients, 12 month survival classes.

Sometimes the data are summarized in a table like the one above. Representing the data as a chart (e.g. a histogram) makes them much easier to understand.

Page 13

Data presentation

Incidence tables

Patients with diabetes mellitus may have retinopathy and nephropathy as major complications.

We say that we have a match if both complications are present or both are absent.

In this table there are 29 "+ with +" matches (cell a) and 172 "- with -" matches (cell d).
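Counting the cells of such a 2x2 incidence table from per-patient records could look like the Python sketch below. The slide names only cells a and d; assigning b to "+ with -" and c to "- with +" is an assumption, and the patient records are invented for illustration, not the data behind the slide.

```python
from collections import Counter


def incidence_table(pairs):
    """Count the four cells from (retinopathy, nephropathy) boolean pairs."""
    counts = Counter(pairs)
    return {
        "a": counts[(True, True)],    # + with +
        "b": counts[(True, False)],   # + with - (assumed cell name)
        "c": counts[(False, True)],   # - with + (assumed cell name)
        "d": counts[(False, False)],  # - with -
    }


def matches(table):
    """Number of patients with both complications present or both absent."""
    return table["a"] + table["d"]


# Hypothetical records, NOT the data behind the slide.
patients = [(True, True), (False, False), (True, False),
            (False, False), (False, True)]
table = incidence_table(patients)
print(table)           # {'a': 1, 'b': 1, 'c': 1, 'd': 2}
print(matches(table))  # 3
```

With the values quoted on the slide, the same sum gives a + d = 29 + 172 = 201 matches.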

Page 14

Incidence tables

This is a table of correlation between grooms’ and brides’ ages for weddings recorded in Dolj County in 1998. Horizontally we have the grooms’ ages and vertically the brides’ ages. In each cell we wrote the number of couples with ages in the corresponding age categories.

Is there a correlation between the ages of brides and grooms? Obviously, YES! Most cases are on the main diagonal of the table!

Page 15

Data visualization - charts

Pie chart: Men 243 (58.13%), Women 175 (41.87%)

Pie chart: Rural 180 (43.06%), Urban 238 (56.94%)

Column / bar chart: number of patients (y axis, 0 to 25) by admission year (x axis, 2006 to 2010); bar labels in order: 12, 14, 21, 25, 18

Page 16

Data visualization - charts

XY (scatter) chart: correlation between ESR at 1 h and 2 h (x axis: ESR 1 h, 0 to 100; y axis: ESR 2 h, 0 to 100)

ESR = erythrocyte sedimentation rate

Linear chart (evolution chart)
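A scatter chart of ESR at 1 h against ESR at 2 h shows a correlation visually; one common way to put a number on it is the Pearson correlation coefficient. The lecture does not name a coefficient, so this choice is an assumption, as are the function name `pearson_r` and the sample readings below, which are invented and not the data plotted on the slide.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)


# Hypothetical ESR readings, NOT the data plotted on the slide.
esr_1h = [5, 12, 20, 35, 50, 70]
esr_2h = [12, 25, 41, 60, 78, 95]
r = pearson_r(esr_1h, esr_2h)
print(f"r = {r:.3f}")
```

Values of r close to 1 correspond to points lying near a rising straight line, values close to -1 to a falling line, and values near 0 to no linear relationship.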