Lecture 1 - INTRODUCTION
Definitions
Data collection - principles
Data classification - data types
Data presentation / visualization - tables, charts / graphs
Definition
Statistics is the science of collecting, classifying, presenting and interpreting data, and of using it to draw conclusions and make decisions.
Definitions
Descriptive statistics - collecting, classifying and presenting data.
Inferential statistics - interpreting data and the results generated by descriptive statistics in order to draw conclusions and make decisions.
Biostatistics is the science that deals with the application of statistical methods in the life sciences.
Definitions
A statistical population is a set of items that have one or more characteristics in common and are the subject of statistical research.
A statistical individual is an element of a statistical population, regardless of its nature.
A sample is a set of individuals selected from a statistical population by a defined procedure.
A feature that changes from individual to individual, or within the same individual over time or in response to environmental conditions, disease, medication etc., is called a variable or parameter.
1. Data collection
Basic principle
It should not be possible to decompose the information entered in a table cell into simpler information:
[Figure: two example tables, one labeled WRONG and one labeled RIGHT]
Wrong! Most cells contain information that can be separated.
For the left column, it would be correct to create a separate column for each diagnosis (CICD, CICN, AP, API), in which we write only YES or NO.
Data entry
Each column should be divided into two columns:
VSH I -> VSHI1h, VSHI2h
VSH E -> VSHE1h, VSHE2h
(VSH is the Romanian abbreviation for ESR, the erythrocyte sedimentation rate.)
This column should be replaced by two columns:
HTA (arterial hypertension), where we write YES or NO
Stage, where we write I / II / III / IV
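The two corrections described in this section (one YES/NO column per diagnosis, and separate HTA and Stage columns) can be sketched in a few lines of Python. This is only an illustration: the raw cell formats (`"CICD, AP"`, `"HTA II"`), the comma separator and the function names `split_diagnoses` and `split_hta` are assumptions, since the original WRONG table is not reproduced in the transcript.

```python
DIAGNOSES = ["CICD", "CICN", "AP", "API"]  # diagnosis codes named in the lecture


def split_diagnoses(cell):
    """Turn one composite diagnosis cell into one YES/NO value per diagnosis."""
    # Assumption: codes inside a cell are separated by commas.
    present = {code.strip().upper() for code in cell.split(",") if code.strip()}
    return {code: "YES" if code in present else "NO" for code in DIAGNOSES}


def split_hta(cell):
    """Split a composite hypertension cell into an HTA flag and a Stage value."""
    # Assumption: the cell is empty, or looks like "HTA <stage>", e.g. "HTA II".
    parts = cell.split()
    if not parts or parts[0].upper() != "HTA":
        return {"HTA": "NO", "Stage": ""}
    stage = parts[1].upper() if len(parts) > 1 else ""
    return {"HTA": "YES", "Stage": stage}


row = {"Diagnosis": "CICD, AP", "HTA": "HTA II"}  # hypothetical raw record
clean = {**split_diagnoses(row["Diagnosis"]), **split_hta(row["HTA"])}
print(clean)
```

After the split, every cell holds a single piece of information, so each column can be counted or filtered directly.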
Table design - recommendations
Number the rows of the table, whether or not the software does it automatically.
Do not merge cells to group multiple columns or rows.
Do not make a separate table for each important category of patients.
2. Data classification
Data types
Numerical data, discrete or continuous - age, height, weight, blood pressure, hemoglobin (Hb), glycemia (blood sugar).
Ordinal data - disease stage, discharge status. The codes used have a clear order.
Categorical data - disease code, blood type, hair color, preferred political party. The codes used do not have a clear order.
Alphanumeric data - name, surname, address, workplace, description of a disease.
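The practical difference between ordinal and categorical data is which operations make sense on them. The sketch below is a minimal illustration with made-up sample values: ordinal codes can be mapped to ranks and sorted, while categorical codes can only be counted. The name `STAGE_RANK` is an assumption introduced for the example.

```python
from collections import Counter

# Ordinal data: the codes carry an order, so they can be ranked and sorted.
STAGE_RANK = {"I": 1, "II": 2, "III": 3, "IV": 4}

stages = ["III", "I", "IV", "II", "III"]  # hypothetical sample of disease stages
ordered = sorted(stages, key=STAGE_RANK.get)
print(ordered)  # ['I', 'II', 'III', 'III', 'IV']

# Categorical data: no meaningful order, so only counting makes sense.
blood_types = ["A", "O", "B", "O", "AB", "O"]  # hypothetical sample
counts = Counter(blood_types)
print(counts.most_common(1))  # [('O', 3)]
```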
Data table
No.  Year  Name        Surname    Sex  Area   Age  Decade  Occupation     Stage
1    2008  CALOTA      LUCIA      F    RURAL  62   60-69   FARA OCUPATIE  III
2    2009  CONSTANTIN  MARIN      M    URBAN  55   50-59   FARA OCUPATIE  III
3    2007  FLOREA      ELENA      F    RURAL  83   80-89   PENSIONAR      II
4    2009  HOLT        MARIANA    F    URBAN  65   60-69   PENSIONAR      I
5    2010  IVANESCU    VIRGIL     M    RURAL  64   60-69   PENSIONAR      II
6    2012  LEPADAT     MARIN      M    URBAN  68   60-69   PENSIONAR      III
7    2011  MANOLACHE   EUGENIA    F    RURAL  39   30-39   SALARIAT       IV
8    2010  MARINESCU   DAN        M    RURAL  57   50-59   PENSIONAR      IV
9    2008  STAN        SANDU      M    URBAN  53   50-59   PENSIONAR      V
10   2007  NEAGU       MARIA      M    URBAN  53   50-59   PENSIONAR      III
11   2008  NEDELEA     GHEORGHE   F    RURAL  70   70-79   PENSIONAR      II
12   2009  ORZESCU     ION        M    URBAN  71   70-79   PENSIONAR      V
13   2011  PALIU       MARIN      F    RURAL  76   70-79   PENSIONAR      IV
14   2013  PISICA      MIHAIL     F    RURAL  72   70-79   PENSIONAR      III
15   2010  POPESCU     PETRE      M    URBAN  58   50-59   PENSIONAR      IV
16   2012  PREDA       ION        M    RURAL  45   40-49   SALARIAT       V
17   2009  ALBU        NICOLAE    M    RURAL  45   40-49   SALARIAT       V
18   2008  RADUCAN     ELISABETA  M    URBAN  62   60-69   FARA OCUPATIE  IV
19   2010  RADUCEANU   ION        M    URBAN  39   30-39   FARA OCUPATIE  III
20   2012  IONESCU     MARIA      M    URBAN  39   30-39   FARA OCUPATIE  IV
Occupation values are in Romanian: FARA OCUPATIE = no occupation, PENSIONAR = retired, SALARIAT = employee.
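In the data table, the Decade column is derived from the numerical Age column, which turns a numerical variable into ordered classes. A minimal sketch of that derivation follows; the function name `decade` is an assumption, and the ages are copied from the table in row order.

```python
from collections import Counter


def decade(age):
    """Return the ten-year class of an age, e.g. 62 -> '60-69'."""
    low = (age // 10) * 10
    return f"{low}-{low + 9}"


# Age column of the 20-row data table, in row order.
ages = [62, 55, 83, 65, 64, 68, 39, 57, 53, 53,
        70, 71, 76, 72, 58, 45, 45, 62, 39, 39]

# Count how many individuals fall in each decade.
per_decade = Counter(decade(a) for a in ages)
for label in sorted(per_decade):
    print(label, per_decade[label])
```

Deriving the class from the age in code, rather than typing it by hand, keeps the two columns consistent.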
3. Data presentation
Frequency tables
Nr.    Age       Fi    Ficc   Ficd   fi        ficc      ficd
1      25 - 30   5     5      234    2.14%     2.14%     100.00%
2      30 - 35   6     11     229    2.56%     4.70%     97.86%
3      35 - 40   9     20     223    3.85%     8.55%     95.30%
4      40 - 45   26    46     214    11.11%    19.66%    91.45%
5      45 - 50   30    76     188    12.82%    32.48%    80.34%
6      50 - 55   50    126    158    21.37%    53.85%    67.52%
7      55 - 60   53    179    108    22.65%    76.50%    46.15%
8      60 - 65   32    211    55     13.68%    90.17%    23.50%
9      65 - 70   14    225    23     5.98%     96.15%    9.83%
10     70 - 75   5     230    9      2.14%     98.29%    3.85%
11     75 - 80   4     234    4      1.71%     100.00%   1.71%
Total            234                 100%
Number of individuals = absolute frequency
Percentage = relative frequency
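The columns of the frequency table can be recomputed from the absolute frequencies alone. The sketch below assumes that Ficc and Ficd are the ascending and descending cumulative frequencies, which is what the numbers in the table suggest, and that the lowercase columns are the same quantities as percentages of the total.

```python
from itertools import accumulate

# Absolute frequencies (Fi) of the 11 age classes, copied from the table.
Fi = [5, 6, 9, 26, 30, 50, 53, 32, 14, 5, 4]
total = sum(Fi)  # 234

# Ascending cumulative frequency: running total from the first class onward.
Ficc = list(accumulate(Fi))
# Descending cumulative frequency: running total from the last class backward.
Ficd = list(accumulate(reversed(Fi)))[::-1]

# Relative frequencies: the absolute values as percentages of the total.
fi = [100 * f / total for f in Fi]
ficc = [100 * c / total for c in Ficc]
ficd = [100 * c / total for c in Ficd]

for row in zip(Fi, Ficc, Ficd, fi, ficc, ficd):
    print("{:3d} {:4d} {:4d} {:6.2f}% {:7.2f}% {:7.2f}%".format(*row))
```

Each printed row should match the corresponding row of the table, for example `5 5 234 2.14% 2.14% 100.00%` for the first age class.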
Frequency tables - example
Survival in breast cancer
[Table: 2456 patients, 12-month survival classes]
Sometimes the data are summarized in a table like the one above. Representing the data as a chart (e.g. a histogram) makes them much easier to understand.
Data presentation
Incidence tables (contingency tables)
Patients with diabetes mellitus may have retinopathy and nephropathy as major complications.
We say that we have a match if both complications are present or both are absent.
In this table there are 29 + with + matches (cell a) and 172 - with - matches (cell d).
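Counting matches in a 2x2 table is a matter of adding the two diagonal cells. The sketch below uses the values of cells a and d quoted in the lecture; the values of cells b and c are not given in the transcript, so the ones used here are hypothetical placeholders that only make the example runnable. The function name `count_matches` is likewise an assumption.

```python
def count_matches(a, b, c, d):
    """2x2 table: a = both present, d = both absent, b and c = only one present.

    Returns the number of matches and their share of all patients.
    """
    matches = a + d
    total = a + b + c + d
    return matches, matches / total


# a and d come from the lecture. b and c are NOT given there: the values
# below are hypothetical placeholders, so the resulting share is illustrative.
matches, share = count_matches(a=29, b=10, c=15, d=172)
print(matches)  # 201
```

The number of matches (201) depends only on cells a and d; the share of matches would require the real values of cells b and c.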
Incidence tables
This is a table of correlation between grooms’ and brides’ ages for weddings recorded in Dolj County in 1998.
Horizontally we have grooms’ ages and vertically, brides’ ages.
Each cell contains the number of couples with ages in the corresponding age categories.
Is there a correlation between the ages of brides and grooms? Obviously, YES! Most cases are on the main diagonal of the table!
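The observation that most cases lie on the main diagonal can be made quantitative by computing the share of all couples that fall in diagonal cells. The sketch below is a rough illustration only: the table is a small made-up example, not the Dolj County data (which is not reproduced in the transcript), and the function name `diagonal_share` is an assumption.

```python
def diagonal_share(table):
    """Fraction of all counts that lie on the main diagonal of a square table."""
    total = sum(sum(row) for row in table)
    on_diagonal = sum(table[i][i] for i in range(len(table)))
    return on_diagonal / total


# Hypothetical 3x3 table (NOT the Dolj County data): rows are brides' age
# classes, columns are grooms' age classes, listed in the same order.
toy = [
    [40, 10, 0],
    [5, 30, 5],
    [0, 2, 8],
]
print(diagonal_share(toy))  # 0.78
```

A share close to 1 means that brides and grooms mostly fall in the same age class, which is the pattern the lecture points to.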
Data visualization - charts
[Pie chart: Men 243 (58.13%), Women 175 (41.87%)]
[Pie chart: Rural 180 (43.06%), Urban 238 (56.94%)]
[Column / bar chart: No. of patients by admission year, 2006-2010; data labels as extracted: 12, 14, 21, 25, 18]
[XY (scatter) chart: correlation between ESR at 1 h (x-axis) and ESR at 2 h (y-axis), both axes from 0 to 100. ESR = erythrocyte sedimentation rate]
[Linear chart (evolution chart)]