lecture 1 - introduction 1 - introduction.pdf · data introduction each column should be divided...

Lecture 1 - INTRODUCTION

Definitions
1. Data collection - principles
2. Data classification - data types
3. Data presentation / visualization - tables, charts / graphs

Definition

Statistics is the science of collecting, classifying, presenting and interpreting data, and of using it to draw conclusions and make decisions.

Definitions

Descriptive statistics: collecting, classifying and presenting data.

Inferential statistics: interpreting data and the results generated by descriptive statistics in order to draw conclusions and make decisions.

Biostatistics is the science that deals with the application of statistical methods in the life sciences.

Definitions

We call a statistical population a set of items that have one or more common characteristics and are the subject of statistical research.

We call a statistical individual any element of a statistical population, regardless of its nature.

A sample is a group of individuals selected from a statistical population by a defined procedure.
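One common example of a defined procedure is simple random sampling, in which every individual has the same chance of being selected. A minimal Python sketch, assuming the population is just a list of hypothetical patient IDs and using a fixed seed so that the same sample can be drawn again:

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct individuals, each equally likely to be chosen."""
    if n > len(population):
        raise ValueError("sample size cannot exceed population size")
    rng = random.Random(seed)  # a seeded generator makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical population: patient IDs 1..100
population = list(range(1, 101))
sample = simple_random_sample(population, 10, seed=42)
print(sorted(sample))
```

Recording the seed is part of making the procedure "defined": anyone can repeat the selection and obtain the same sample.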

A feature that changes from individual to individual, or in the same individual over time or in response to environmental conditions, disease, medication etc., is called a variable or parameter.

1. Data collection

Basic principle

It should not be possible to decompose the information entered into a table cell into simpler pieces of information:

Example tables: WRONG / RIGHT

Wrong! Most cells contain information that can be separated.

For the left column, it would be correct to create a separate column for each diagnosis (CICD, CICN, AP, API), in which we write only YES or NO.
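When data have already been typed with several diagnoses in one cell, the split into YES/NO columns can be done mechanically. A minimal Python sketch, assuming (hypothetically) that the codes in the original cell are separated by commas:

```python
# Diagnosis codes from the example table; each becomes its own YES/NO column
DIAGNOSES = ["CICD", "CICN", "AP", "API"]

def split_diagnoses(cell):
    """Turn a multi-diagnosis cell such as 'CICD, AP' into one YES/NO value per code.

    Assumes (hypothetically) that codes in the original cell are comma-separated.
    """
    present = {code.strip() for code in cell.split(",") if code.strip()}
    return {code: ("YES" if code in present else "NO") for code in DIAGNOSES}

print(split_diagnoses("CICD, AP"))
# {'CICD': 'YES', 'CICN': 'NO', 'AP': 'YES', 'API': 'NO'}
```

Matching whole codes rather than substrings matters here: a naive substring test would wrongly report AP as present in a cell that contains only API.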

Data entry

Each column should be divided into two columns:

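VSH I becomes VSHI1h and VSHI2h

VSH E becomes VSHE1h and VSHE2h

(VSH is the Romanian abbreviation for ESR; the new columns hold the readings at 1 h and at 2 h.)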

This column should be replaced by two columns:

HTA (arterial hypertension), where we write YES or NO
Stage, where we write I / II / III / IV
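The same split can be applied to existing records. A minimal Python sketch, assuming a hypothetical original format in which the cell reads "HTA II" when hypertension is present and is empty when it is absent:

```python
STAGES = ("I", "II", "III", "IV")

def split_hta(cell):
    """Split a combined cell into the two columns (HTA, Stage).

    Hypothetical input format: 'HTA II' when hypertension is present,
    an empty cell when it is absent.
    """
    parts = cell.upper().split()
    if not parts:
        return ("NO", "")  # no hypertension, so no stage to record
    if len(parts) == 2 and parts[0] == "HTA" and parts[1] in STAGES:
        return ("YES", parts[1])
    raise ValueError(f"unrecognised value: {cell!r}")

for value in ["HTA II", "", "hta iv"]:
    print(repr(value), "->", split_hta(value))
```

Raising an error on anything unexpected, instead of guessing, makes inconsistent entries visible so they can be corrected at the source.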

Table design - recommendations

Number the rows of the table, whether or not the software does it automatically.

Do not merge cells to group multiple columns / rows.

Do not create a separate table for each important category of patients.
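Keeping all patients in a single table does not prevent looking at one category at a time, because any subset can be extracted when needed. A minimal Python sketch with hypothetical records in which the area is an ordinary column:

```python
from collections import defaultdict

def group_by(rows, column):
    """Split one table into subsets by the value of a column, on demand."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)

# Hypothetical records: all patients in ONE table
patients = [
    {"no": 1, "area": "RURAL", "stage": "III"},
    {"no": 2, "area": "URBAN", "stage": "III"},
    {"no": 3, "area": "RURAL", "stage": "II"},
    {"no": 4, "area": "URBAN", "stage": "I"},
]
groups = group_by(patients, "area")
print({area: [p["no"] for p in rows] for area, rows in groups.items()})
# {'RURAL': [1, 3], 'URBAN': [2, 4]}
```

The same function groups by any other column (for example, stage) without restructuring anything, which separate per-category tables would not allow.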

2. Data classification

Data types

Numerical data, discrete or continuous: age, height, weight, blood pressure, hemoglobin (Hb), glycemia (blood sugar).

Ordinal data: disease stage, discharge status. The codes used have a clear order.

Categorical data: disease code, blood type, hair color, preferred political party. The codes used do not have a clear order.

Alphanumeric data: name, surname, address, workplace, description of a disease.
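The difference between ordinal and categorical codes shows in which operations make sense. A minimal Python sketch, assuming stage codes I to IV and ABO blood groups as examples: ordinal codes can be ranked and compared, while categorical codes can only be counted.

```python
from collections import Counter

# Ordinal data: the order of the codes is part of their meaning
STAGE_ORDER = ["I", "II", "III", "IV"]

def stage_rank(stage):
    """Position of a stage code in clinical order (I = 0, ..., IV = 3)."""
    return STAGE_ORDER.index(stage)

def is_more_advanced(a, b):
    """True if stage a comes after stage b in the clinical order."""
    return stage_rank(a) > stage_rank(b)

stages = ["III", "I", "IV", "II", "I"]
print(sorted(stages, key=stage_rank))  # ['I', 'I', 'II', 'III', 'IV']
print(is_more_advanced("III", "II"))   # True

# Categorical data: no order, so we only count how often each code occurs
blood_types = ["A", "O", "B", "A", "AB", "O", "A"]
print(Counter(blood_types))
```

If further codes are used (the data table also contains stage V), only STAGE_ORDER needs to be extended.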

Data table

No.  Year  Surname     First name  Sex  Area   Age  Decade  Occupation     Stage
1    2008  CALOTA      LUCIA       F    RURAL  62   60-69   NO OCCUPATION  III
2    2009  CONSTANTIN  MARIN       M    URBAN  55   50-59   NO OCCUPATION  III
3    2007  FLOREA      ELENA       F    RURAL  83   80-89   RETIRED        II
4    2009  HOLT        MARIANA     F    URBAN  65   60-69   RETIRED        I
5    2010  IVANESCU    VIRGIL      M    RURAL  64   60-69   RETIRED        II
6    2012  LEPADAT     MARIN       M    URBAN  68   60-69   RETIRED        III
7    2011  MANOLACHE   EUGENIA     F    RURAL  39   30-39   EMPLOYEE       IV
8    2010  MARINESCU   DAN         M    RURAL  57   50-59   RETIRED        IV
9    2008  STAN        SANDU       M    URBAN  53   50-59   RETIRED        V
10   2007  NEAGU       MARIA       M    URBAN  53   50-59   RETIRED        III
11   2008  NEDELEA     GHEORGHE    F    RURAL  70   70-79   RETIRED        II
12   2009  ORZESCU     ION         M    URBAN  71   70-79   RETIRED        V
13   2011  PALIU       MARIN       F    RURAL  76   70-79   RETIRED        IV
14   2013  PISICA      MIHAIL      F    RURAL  72   70-79   RETIRED        III
15   2010  POPESCU     PETRE       M    URBAN  58   50-59   RETIRED        IV
16   2012  PREDA       ION         M    RURAL  45   40-49   EMPLOYEE       V
17   2009  ALBU        NICOLAE     M    RURAL  45   40-49   EMPLOYEE       V
18   2008  RADUCAN     ELISABETA   M    URBAN  62   60-69   NO OCCUPATION  IV
19   2010  RADUCEANU   ION         M    URBAN  39   30-39   NO OCCUPATION  III
20   2012  IONESCU     MARIA       M    URBAN  39   30-39   NO OCCUPATION  IV
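The Decade column in the data table is not measured separately; it is derived from Age. A minimal Python sketch of that derivation, assuming decades are labelled as in the table (for example 60-69):

```python
def decade_label(age):
    """Return the decade class of an age, e.g. 62 -> '60-69'."""
    if age < 0:
        raise ValueError("age cannot be negative")
    start = (age // 10) * 10  # integer division drops the units digit
    return f"{start}-{start + 9}"

ages = [62, 55, 83, 65, 64, 68, 39, 57, 53, 53, 70, 71, 76, 72, 58, 45, 45, 62, 39, 39]
print([decade_label(a) for a in ages])
```

Applied to the ages in the table, it reproduces every value in the Decade column. Computing a derived column instead of typing it avoids transcription errors and keeps it consistent if an age is corrected.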

3. Data presentation

Frequency tables

No.  Age      Fi   Ficc  Ficd  fi      ficc     ficd
1    25 - 30  5    5     234   2.14%   2.14%    100.00%
2    30 - 35  6    11    229   2.56%   4.70%    97.86%
3    35 - 40  9    20    223   3.85%   8.55%    95.30%
4    40 - 45  26   46    214   11.11%  19.66%   91.45%
5    45 - 50  30   76    188   12.82%  32.48%   80.34%
6    50 - 55  50   126   158   21.37%  53.85%   67.52%
7    55 - 60  53   179   108   22.65%  76.50%   46.15%
8    60 - 65  32   211   55    13.68%  90.17%   23.50%
9    65 - 70  14   225   23    5.98%   96.15%   9.83%
10   70 - 75  5    230   9     2.14%   98.29%   3.85%
11   75 - 80  4    234   4     1.71%   100.00%  1.71%
Total         234              100%

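Fi = number of individuals (absolute frequency); fi = percentage (relative frequency).

Ficc, ficc = ascending cumulative frequency (from the first class up to the current one); Ficd, ficd = descending cumulative frequency (from the current class down to the last one).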
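All the cumulative and relative columns can be computed from the absolute frequencies Fi. A minimal Python sketch using the Fi values from the age table above; percentages are rounded to two decimals, as in the table:

```python
from itertools import accumulate

def frequency_table(counts):
    """Build Fi, Ficc, Ficd and the relative fi, ficc, ficd (in %) from class counts."""
    total = sum(counts)
    cum_up = list(accumulate(counts))                    # Ficc: first class .. current class
    cum_down = list(accumulate(reversed(counts)))[::-1]  # Ficd: current class .. last class
    rows = []
    for f, up, down in zip(counts, cum_up, cum_down):
        rows.append((f, up, down,
                     round(100 * f / total, 2),
                     round(100 * up / total, 2),
                     round(100 * down / total, 2)))
    return rows

# Fi for the age classes 25 - 30, 30 - 35, ..., 75 - 80
Fi = [5, 6, 9, 26, 30, 50, 53, 32, 14, 5, 4]
for row in frequency_table(Fi):
    print(row)
```

The last Ficc value and the first Ficd value both equal the total (234), which is a quick check that no class was missed.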

Frequency tables - example: survival in breast cancer (2456 patients, 12-month survival classes)

Sometimes the data are summarized in a table like the one above. Representing the data as a chart (e.g. a histogram) makes them much easier to understand.
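A histogram draws one bar per class, with length proportional to the class frequency. A minimal text-only Python sketch (no plotting library assumed), using the age classes and Fi values from the frequency table; the scale of one '#' per two patients is an arbitrary choice to keep the lines short:

```python
def text_histogram(labels, counts, per_mark=2):
    """Return one text bar per class; each '#' stands for about `per_mark` individuals."""
    width = max(len(label) for label in labels)
    lines = []
    for label, count in zip(labels, counts):
        bar = "#" * round(count / per_mark)
        lines.append(f"{label:>{width}} | {bar} {count}")
    return lines

# Age classes and Fi values from the frequency table (25 - 30 ... 75 - 80)
labels = [f"{start}-{start + 5}" for start in range(25, 80, 5)]
counts = [5, 6, 9, 26, 30, 50, 53, 32, 14, 5, 4]
print("\n".join(text_histogram(labels, counts)))
```

Even in this rough form, the peak in the 50-60 age range is visible at a glance, which is hard to see from the numbers alone.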

Data presentation

Incidence tables

Patients with diabetes mellitus may develop retinopathy and nephropathy as major complications.

We say that we have a match if both complications are present or both are absent.

In this table there are 29 matches with both complications present (+/+, cell a) and 172 matches with both absent (-/-, cell d).
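For two yes/no characteristics, the four cells of such a table can be counted directly from the patient records. A minimal Python sketch with hypothetical records; a and d are the match cells as in the text, while the placement of the two mismatch cells (b = retinopathy only, c = nephropathy only) is an assumption about the table's orientation:

```python
def two_by_two(pairs):
    """Count (retinopathy, nephropathy) pairs into cells a, b, c, d.

    a: both present, d: both absent (the matches);
    b: retinopathy only, c: nephropathy only (assumed orientation).
    """
    cells = {"a": 0, "b": 0, "c": 0, "d": 0}
    for retinopathy, nephropathy in pairs:
        if retinopathy and nephropathy:
            cells["a"] += 1
        elif retinopathy:
            cells["b"] += 1
        elif nephropathy:
            cells["c"] += 1
        else:
            cells["d"] += 1
    return cells

# Hypothetical records: (retinopathy present?, nephropathy present?)
patients = [(True, True), (False, False), (True, False), (False, False), (False, True)]
cells = two_by_two(patients)
print(cells, "matches:", cells["a"] + cells["d"])
# {'a': 1, 'b': 1, 'c': 1, 'd': 2} matches: 3
```

With the counts given in the text, the number of matches would be a + d = 29 + 172 = 201.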

Incidence tables

This is a table of the correlation between grooms' and brides' ages for weddings recorded in Dolj County in 1998. Horizontally we have the grooms' ages and vertically the brides' ages. Each cell contains the number of couples whose ages fall in the corresponding age categories. Is there a correlation between the ages of brides and grooms? Obviously, yes: most cases lie on the main diagonal of the table.
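A table like this is built by placing each couple in the cell given by the two age classes; the share of couples on the main diagonal then shows how often both partners fall in the same class. A minimal Python sketch, assuming 10-year age classes and a few hypothetical couples (not the Dolj County data):

```python
from collections import Counter

def age_class(age, width=10):
    """Lower bound of the age class, e.g. 27 -> 20 for 10-year classes."""
    return (age // width) * width

def cross_table(couples, width=10):
    """Count couples per (groom class, bride class) cell."""
    return Counter((age_class(g, width), age_class(b, width)) for g, b in couples)

def diagonal_share(table):
    """Fraction of couples whose two ages fall in the same class."""
    total = sum(table.values())
    on_diagonal = sum(n for (g, b), n in table.items() if g == b)
    return on_diagonal / total

# Hypothetical (groom age, bride age) pairs
couples = [(27, 24), (31, 29), (25, 23), (45, 41), (33, 35)]
table = cross_table(couples)
print(dict(table))
print(f"on diagonal: {diagonal_share(table):.0%}")  # on diagonal: 80%
```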

Data visualization - charts

Pie charts: Men 243 (58.13%), Women 175 (41.87%); Rural 180 (43.06%), Urban 238 (56.94%)

Columns / bars chart: number of patients by admission year (2006: 12, 2007: 14, 2008: 21, 2009: 25, 2010: 18)
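Each slice of a pie chart corresponds to a category's relative frequency. A minimal Python sketch that reproduces the percentages shown in the sex and area pie charts from their counts:

```python
def shares(counts):
    """Convert category counts to percentages rounded to two decimals."""
    total = sum(counts.values())
    return {name: round(100 * n / total, 2) for name, n in counts.items()}

print(shares({"Men": 243, "Women": 175}))    # {'Men': 58.13, 'Women': 41.87}
print(shares({"Rural": 180, "Urban": 238}))  # {'Rural': 43.06, 'Urban': 56.94}
```

Both charts describe the same 418 patients, which is why each pair of counts sums to the same total.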

Data visualization - charts

XY (scatter) chart: correlation between ESR at 1 h and ESR at 2 h (both axes from 0 to 100). ESR = erythrocyte sedimentation rate.

Linear chart (evolution chart)
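The visual impression given by a scatter chart can also be summarised by a single number. One common choice is Pearson's correlation coefficient r, which ranges from -1 to 1 and is close to 1 when the points lie near a rising straight line. A minimal Python sketch with hypothetical ESR readings:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length (at least 2 values)")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical ESR readings for five patients
esr_1h = [10, 22, 35, 48, 60]
esr_2h = [25, 40, 62, 80, 95]
print(round(pearson_r(esr_1h, esr_2h), 3))
```

The coefficient only measures linear association; the scatter chart itself should still be inspected, since very different point patterns can give similar values of r.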
