lp9_en

Upload: giorgos-doukas-karanasios

Post on 25-Sep-2015

  • Practice no 9/2014-2015 __________________________ UMF Carol Davila Medical Informatics & Biostatistics 1

    Practice No 9

    General information

    The first objective of this practice is to create a view (a data-entry questionnaire) for inserting data into files created with Epi Info 2005, software intended particularly for epidemiology. (Note that the files created by this software have the same format as Microsoft Access databases.) Its major advantage is the price: it is free of charge, yet it covers most of the data processing needed in medical research. Its strong point is the ability to create questionnaires (views) that accept only valid data. Its major weakness is the low quality of the diagrams it produces. A second objective of the practice is to show how elementary statistical processing is done and how diagrams are constructed with this software. During this practice:

    a) You will create database files, then questionnaires inside them, and then you will insert records;

    b) You will start the statistical processing of records, beginning with simple examples.

    Subjects
    35: Creating questionnaires in Epi Info
    36: Inserting data in Epi Info
    37: Primary statistical analysis of data from files

    Software used during practice: Epi Info 2005

    Subject 35: Creating questionnaires in Epi Info

    Epi Info is software for processing data organized as questionnaires and presenting the results in reports. Initially used in epidemiology, Epi Info is now also used successfully to process other biomedical data; it offers data management and statistical processing similar to SAS or SPSS, and it is freeware. The start page looks as follows:

    The main components of Epi Info are:
    Make View, a text editor used to define data fields on one or several pages of a view;
    Enter Data, which lists the questionnaires built with Make View, controls the data-entry process and allows searching for records;
    Analyze Data, used to analyze data stored in files created not only with Epi Info but also with dBase, FoxPro, Excel, etc.; its results may include lists, frequencies, tables and diagrams, typical of epidemiological studies;
    Create Maps, a tool for creating epidemiological maps;
    Create Reports, used to generate reports.
    Other components of the software are:
    NutStat, used to record and evaluate height, weight, head and thorax circumference measurements for children;
    StatCalc, used to compute statistics from data arranged in tables;
    Data Compare, used to identify differences between two tables;
    Table to View, used to generate a view from an existing data table;
    VisData, used to read data tables and change their properties;
    Epi Lock, which encrypts data to protect access to them and to facilitate transmission and backup;
    Compact, which compacts Microsoft Access databases.

    Epi Info also contains a help system describing the available features, a user manual, and an interactive program for creating epidemiological files.

    To create a questionnaire, use Make View, namely the command sequence: File > New > File name (name of the database: name_EPI) > Open > Name the View (Chest1 as the questionnaire name).

    On the left side there are three options for managing questionnaire pages: Add Page (inserts a new page after the existing ones), Insert Page (inserts a new page between two existing ones) and Delete Page (deletes the current page). The Program command allows you to program check operations that prevent errors during data entry. Inserting a new field in the current page of the view (on the right) is easy: right-click the position where the new field should appear (the grid helps to locate this position). Then, in the Field Definition dialog box, specify the characteristics of the field: name, type, size, value limits, codes, legal values, etc. The Field Definition dialog box is presented below. Notice that the default field type is Text.

    The questionnaire (view) you create will contain 15 fields:
    1. The personal ID (SSN). In the Question or Prompt edit box insert the text Social Security Number:, in the Field or Variable group choose Number as Type and ############# (i.e. 13 digits) as Pattern; finally, in the Field Name edit box insert SSN. (The short sequence SSN is the name of the field, while the longer sequence Social Security Number: is used as the label on screen.)
    2. Family name of the patient: text, at most 30 characters. In Question or Prompt insert Family name:, choose Text as Type and set Size to 30. Keep the field name proposed in Field Name.
    3. First name of the patient: treated similarly.
    4. Gender of the patient, with two possible values, F or M. In Question or Prompt insert Gender:, choose Text again as Type, but in the Code Tables group press the Legal Values button, then Create New, and key in the legal values F, then M. Again, keep the field name proposed in Field Name.

    5. Birth date of the patient, obviously a calendar date. To collect such data correctly, in Question or Prompt insert Birth date:, choose Date as Type and DD-MM-YYYY as Pattern. This time, insert the field name yourself in Field Name, for example BirthDate.
    6. Admission date of the patient: treated similarly (field name AdmissionDate).
    7. Edema, a variable with two possible values (Yes/No). In Question or Prompt insert Edema?, and choose Yes/No as Type. In this case, change the field name in Field Name to Oedema.
    Proceed similarly for the next three fields:
    8. Pleurisy.
    9. Palpitations.
    10. Cough.
    11. Temperature, a numeric variable taking values between 35 and 43. To set these limits, tick the Range check box and enter 35 as Lower and 43 as Upper.
    The last five fields (Edema, Pleurisy, Palpitations, Cough and Temperature) will be placed in a group named Symptoms. To create a group, select the fields (by dragging the mouse over them), then choose the command Group from the Insert menu. The resulting page may look similar to the following:
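    The checks configured above (a 13-digit pattern, a maximum text length, legal values, a date pattern and a numeric range) are what let a view reject erroneous input. The Python sketch below only illustrates that idea; it is not Epi Info code, and the RULES table and field names such as FamilyName are hypothetical stand-ins for some of fields 1 to 11.

```python
import re
from datetime import datetime

# Hypothetical rule table mirroring some of the fields defined above.
# It only illustrates validated data entry; it is not how Epi Info works internally.
RULES = {
    "SSN": {"pattern": r"\d{13}"},        # Number, Pattern of 13 digits
    "FamilyName": {"max_len": 30},        # Text, Size 30
    "Gender": {"legal": {"F", "M"}},      # Legal Values: F, M
    "BirthDate": {"date": "%d-%m-%Y"},    # Date, Pattern DD-MM-YYYY
    "Oedema": {"legal": {"Yes", "No"}},   # Yes/No field
    "Temperature": {"range": (35, 43)},   # Range: Lower 35, Upper 43
}


def validate(field, raw):
    """Return an error message for the typed value, or None if it is accepted."""
    rule = RULES[field]
    if "pattern" in rule and not re.fullmatch(rule["pattern"], raw):
        return f"{field}: value does not match the required pattern"
    if "max_len" in rule and len(raw) > rule["max_len"]:
        return f"{field}: longer than {rule['max_len']} characters"
    if "legal" in rule and raw not in rule["legal"]:
        return f"{field}: must be one of {sorted(rule['legal'])}"
    if "date" in rule:
        try:
            datetime.strptime(raw, rule["date"])  # also rejects dates like 31-02-2000
        except ValueError:
            return f"{field}: not a valid DD-MM-YYYY date"
    if "range" in rule:
        low, high = rule["range"]
        try:
            value = float(raw)
        except ValueError:
            return f"{field}: not a number"
        if not low <= value <= high:
            return f"{field}: outside the range {low}-{high}"
    return None


print(validate("Temperature", "37.2"))  # accepted, so None is printed
print(validate("Temperature", "44"))    # rejected: above the upper limit
print(validate("Gender", "X"))          # rejected: not a legal value
```

    Returning a message instead of raising an exception keeps the sketch close to the data-entry situation, where the operator is simply asked to retype the value.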

    Using the Add Page command (from the menu on the left side), add a new page and insert the last four fields in it:
    12. Employed, of Yes/No type.
    13. Number of children, of numeric type, with values between 0 and 14.
    14. Children, a grid (list-table) that will contain the name and the age of each child. In Question or Prompt insert Children:, and in the Code Tables group press the Grid button. Then, in the Enter Column Name for Grid combo box, insert the text Name of child and press Save Column. Do the same for Age of child.
    15. Age of the patient at admission, of numeric type.
    Obviously, if we know the birth date and the admission date, the age of the patient should be computed automatically. Such operations are done with the Program command on the left side. After the Program command is given, the screen is reorganized: the left side is now entitled Check and the right side Check Commands. Choose Age as the field whose value is to be computed, then the command Assign, and try to insert the computing expression (see the figure below), where DataNast and DataIntern denote the birth-date and admission-date fields (BirthDate and AdmissionDate above):
    =YEARS(DataNast, DataIntern)
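    The assignment computes the patient's age in completed years at admission. A minimal Python sketch of that arithmetic follows; the function name years_between and the dates are hypothetical, and the assumption that YEARS counts only completed years (an anniversary not yet reached in the final year does not count) should be checked against the Epi Info documentation.

```python
from datetime import date


def years_between(start, end):
    """Completed years from start to end, e.g. age at admission from the birth date."""
    years = end.year - start.year
    # One year less if the anniversary has not yet been reached in the end year.
    if (end.month, end.day) < (start.month, start.day):
        years -= 1
    return years


birth = date(1980, 5, 20)  # hypothetical birth date
print(years_between(birth, date(2015, 5, 19)))  # 34: birthday not reached yet
print(years_between(birth, date(2015, 5, 20)))  # 35
```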

    Probably you won't be successful. The reason should be clear: the fields AdmissionDate and BirthDate are on another page, so their values are not accessible for computation on this second page. As a solution, move the field Age from page 2 to page 1, using the Cut/Paste commands from the Edit menu. After this move the problem is solved.

    Subject 36: Inserting data in Epi Info

    Data can be entered directly from the File menu, using the Enter Data command. Alternatively, after leaving (closing) the Make View module, either select the Enter Data module directly from the Epi Info main page, or choose the Enter Data command from the Programs menu. In this case the required view (and project) must be chosen. (The project name_EPI.mdb is the one created previously.) Insert at least four records (that is, fill in the data fields for at least four persons, on both pages). Save the file name_EPI.mdb in your personal folder. The following figure shows the insertion of the admission date, on the first page, for the third record. Note that a standard font (MS Sans Serif) of size 14 pt was selected for all labels associated with field values.

    Subject 37: Primary statistical analysis of data from files

    Statistical results are obtained with the Analyze Data module. Inside this module, several commands are available in the command window on the left. The results are shown in the upper-right window (entitled Analysis Output). Below it, the window entitled Program Editor shows the previously executed commands; new commands can also be typed and executed here. The commands on the left are grouped: data processing commands (group Data), commands that operate on variables (group Variables), selection commands (group Select/If), elementary statistical analysis commands (group Statistics), etc.

    Read (Import) is the command used at the beginning of every new work session in the Analysis module. The imported data remain available for processing until a new Read (Import) command is given. The default data format is Epi 2000, but this can be changed: data can also be imported from other file types, such as various versions of Excel, FoxPro, Paradox or even hypertext documents. Epi Info comes with several sample projects for demonstration and self-study; the simplest is Sample.mdb. Execute the command:
    Read (Import) > Data Formats: Epi 2000 > Data Source: Sample.mdb > Show: Views > Views: viewBabyBloodPressure
    You will see that the full command is:
    READ 'C:\...\Epi_Info\Sample.mdb':viewBabyBloodPressure

    List, from the Statistics group, displays in tabular form (either Grid or HTML) the values of selected variables from the active data file. Remember that the star * means all: a star * in the Variables list means that the values of all variables will be shown. When only some variables are selected, only the values of those variables are shown. This command also allows some values in the active data file to be changed (Allow Updates). As an example, let us display only the values of the variables (i.e. fields) Birthweight, SystolicBlood and AgeInDays in tabular form (Display Mode: Grid). Of course, these fields have to be selected in the Variables list. The full command is:
    LIST Birthweight SystolicBlood AgeInDays GRIDTABLE
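    The column selection performed by List, including the star * standing for all variables, can be mimicked in a few lines of Python. This is only an illustration: list_records and the records below are hypothetical, with made-up values for the fields named above.

```python
def list_records(records, variables):
    """Return a header row plus one row per record; ["*"] selects all variables."""
    if variables == ["*"]:
        variables = list(records[0]) if records else []
    rows = [list(variables)]
    rows += [[record[name] for name in variables] for record in records]
    return rows


# Hypothetical rows; the values are invented for illustration only.
records = [
    {"Birthweight": 120, "SystolicBlood": 70, "AgeInDays": 2},
    {"Birthweight": 105, "SystolicBlood": 64, "AgeInDays": 5},
]

for row in list_records(records, ["Birthweight", "AgeInDays"]):
    print("\t".join(str(cell) for cell in row))
```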

    Frequencies, also from the Statistics group, is the first command to use when analyzing a new dataset: before further processing, we need some basic information about the distribution of the data. The command applies to both qualitative and quantitative variables; the result is a summary table containing all values of the variables specified in the Frequency of: list, together with the absolute frequencies (numbers of occurrences), the percentages and the cumulative percentages for each value of the variable. A bar sketch representing the percentages is attached to the table. The figure below shows the effect of the command FREQ Birthweight:
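    The table produced by FREQ can be sketched in Python: counts, percentages and cumulative percentages per value, plus 95% confidence limits for each percentage. The sketch computes the limits with the exact (Clopper-Pearson) binomial method; that this is Epi Info's method is an assumption, although it does reproduce the 0.2% to 30.2% limits quoted in this practice for 1 case out of 16. The function names and the sample answers are hypothetical.

```python
from collections import Counter
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def exact_ci(x, n, alpha=0.05):
    """Exact (Clopper-Pearson) two-sided confidence interval for the proportion x/n."""

    def solve(k, target):
        # binom_cdf(k, n, p) decreases as p grows; bisect for the p giving `target`.
        lo, hi = 0.0, 1.0
        for _ in range(100):
            mid = (lo + hi) / 2
            if binom_cdf(k, n, mid) > target:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    lower = 0.0 if x == 0 else solve(x - 1, 1 - alpha / 2)  # P(X >= x) = alpha/2
    upper = 1.0 if x == n else solve(x, alpha / 2)          # P(X <= x) = alpha/2
    return lower, upper


def frequency_table(values):
    """Rows of (value, count, percent, cumulative percent, 95% lower %, 95% upper %)."""
    n = len(values)
    cumulative = 0
    rows = []
    for value, count in sorted(Counter(values).items()):
        cumulative += count
        low, high = exact_ci(count, n)
        rows.append((value, count, 100 * count / n, 100 * cumulative / n,
                     100 * low, 100 * high))
    return rows


low, high = exact_ci(1, 16)
print(f"{100 * low:.1f}% to {100 * high:.1f}%")  # 0.2% to 30.2%

answers = ["No", "Yes", "No", "No", "Yes"]  # hypothetical answers to a Yes/No field
for row in frequency_table(answers):
    print("{}\t{}\t{:.1f}\t{:.1f}\t{:.1f}\t{:.1f}".format(*row))
```

    The wide interval for 1 case out of 16 shows why percentages based on very few records should be read with caution.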

    Notice that 95% confidence limits are given for each value of the variable. Read them as follows: we are 95% confident that the percentage of newborns weighing 90 oz lies somewhere between 0.2% and 30.2%. This result is based on only 1 of the 16 recorded cases. When a stratifying variable is specified, several frequency tables are obtained, one for each stratum.

    The command Means gives several measures of center and spread: the Mean, the Median, the quartiles (25% and 75%), the Minimum and Maximum values, the Mode (i.e. the value with the highest frequency), the Variance and the standard deviation (Std Dev). Obs is the total number of values of the variable, and Total is the sum of all values. The figure below shows the effect of the command MEANS AgeInDays:
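    A Python sketch of the same summary statistics, using the standard statistics module, may help in checking such output by hand. The function name and the AgeInDays values are hypothetical; the sample variance (divisor n - 1) and Python's inclusive quartile rule are assumptions, and Epi Info may compute the quartiles slightly differently.

```python
import statistics


def means_summary(values):
    """Statistics similar to those listed for the Means command."""
    data = sorted(values)
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "Obs": len(data),
        "Total": sum(data),
        "Mean": statistics.mean(data),
        "Variance": statistics.variance(data),  # sample variance (n - 1 divisor)
        "Std Dev": statistics.stdev(data),
        "Minimum": data[0],
        "25%": q1,
        "Median": statistics.median(data),
        "75%": q3,
        "Maximum": data[-1],
        "Mode": statistics.multimode(data),  # every value with the highest frequency
    }


ages = [1, 2, 2, 3, 4, 5, 7]  # hypothetical AgeInDays values
for name, value in means_summary(ages).items():
    print(f"{name:>9}: {value}")
```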

    The command Means may be used only for quantitative variables; for qualitative variables we limit ourselves to the command Frequencies.

    The command Select, from the Select/If group, selects the records that satisfy a given criterion. After selection, only these records are processed; the selection remains active until it is cancelled (Cancel Select). As an example, let us select the newborns whose age (in days) is greater than 3: in the Select Criteria: dialog insert the expression AgeInDays>3. A subsequent List command then gives the following result:
    The last two columns, entitled UniqueKey and RecStatus, are special fields of tables created with Epi Info. RecStatus stores the status of each record: for records marked as deleted its value is 0, for the others it is 1. UniqueKey is used to number the records automatically.
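    The effect of Select on the records being processed can be pictured as a filter. The Python sketch below assumes, for illustration, that records marked as deleted (RecStatus 0) are skipped as well; the select function and the rows are hypothetical.

```python
def select(records, criterion):
    """Keep the non-deleted records (RecStatus 1) that satisfy the criterion."""
    return [r for r in records if r["RecStatus"] == 1 and criterion(r)]


records = [  # hypothetical rows; UniqueKey numbers them automatically
    {"UniqueKey": 1, "AgeInDays": 2, "RecStatus": 1},
    {"UniqueKey": 2, "AgeInDays": 5, "RecStatus": 1},
    {"UniqueKey": 3, "AgeInDays": 7, "RecStatus": 0},  # marked as deleted
    {"UniqueKey": 4, "AgeInDays": 4, "RecStatus": 1},
]

selected = select(records, lambda r: r["AgeInDays"] > 3)  # like AgeInDays>3
print([r["UniqueKey"] for r in selected])  # [2, 4]
```

    Passing the criterion as a function keeps the filter general, much as the Select Criteria: dialog accepts any logical expression.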

    The command Header, from the Output group, inserts a text as a title for the results; its rendering characteristics (font, size, etc.) can also be specified. An example:
    HEADER 2 "Results for newborn children" (BOLD) TEXTFONT +4
    The command Type, from the same Output group, is similar: it inserts a string of characters or the content of a text file into the output stream (by default the screen, or the file specified by the RouteOut command). The command RouteOut redirects the results (the output stream) to a file specified by name; the redirection is ended by the command CloseOut. The results of commands such as Frequencies, List, etc. are written to the file previously named in the RouteOut command.

    Open the table viewEstriolAndBirthweight (from Sample.mdb) with the command Read (Import). Use the command RouteOut to redirect the results to the file named name_EBW (in the folder C:\Anul_2). Notice the extension of this file. Insert the title The estriol and the weight at birth with the command Header, activating the Bold and Italic options and choosing font size 7. Then insert the text Content of the file with the command Type, again with Bold and Italic, but with font size 5. Use the command List to see the values of the two variables Birthweight and Estriol, choosing the Web (HTML) display mode. Insert a new text, Statistical processing, keeping the same parameter values as above. Using the command Means, compute the statistics for the variable Birthweight, then for Estriol. Close the results file with the command CloseOut.

    Most of us would agree that information presented in diagrams is easier to convey and understand. The most commonly used diagrams are bar charts (Bar or Rotated Bar), pie charts and histograms. The first two are suitable for variables with a small number of values (especially qualitative ones). Histograms are suitable for summarizing variables with many numeric values (such as weights in grams or heights in centimeters), after grouping the values into intervals. The command Graph, from the Statistics group, represents variables from the active data file diagrammatically.

    As an example, open the table viewSmoke (from the source Sample.mdb) with the command Read (Import). Then, using the command Graph, present the values of the variable Sex in a bar chart: in the dialog box of the command, select Bar in the Graph Type: list and Sex in X-AXIS Main_Variable(s):. In Y-AXIS Show values of: keep the default value Count. The title of the diagram will be: Distribution of smokers by sex | created by ... (your name). After viewing the diagram on screen, export it (File > Export...) in JPG format, naming the file name_DIAGSX.jpg via Export Destination: File > Browse. Proceed similarly for the variable Race, but select the Rotated Bar type. Then, for the variable Marital, select the Pie type. Save the two diagrams, with suitable titles, in the files name_DIAGRC.jpg and name_DIAGMR.jpg, respectively. For the quantitative variable Age the suitable type is Histogram, for which the length of the grouping interval will be fixed at 10, and the first value will be. Save the resulting diagram in the file name_DIAGAGE.jpg. Which title is suitable?

    As another example, open the table viewOswego from the project Sample.mdb. Redirect the results to the file name_OSW. Every command should be accompanied by proper explanations. For the variable Age, compute the average for the healthy persons (criterion ill=No) and separately for the ill persons (ill=Yes). Represent the variables Age, Sex and Ill diagrammatically, save all diagrams in JPG format and insert them, together with your comments on what they show, in a document file named name_DIAGOSWEGO.doc.

    Create a questionnaire intended for entering only a small amount of data. Use the database name_MEDPL.mdb, in which you should create the view Quest2. Prepare this view for entering the following data:

    a) Code of patient (numeric, starting with 1);
    b) Gender (legal values M or F);
    c) Start date of the treatment;
    d) Type of treatment (legal values: genuine pill or placebo);
    e) Evaluation date;
    f) Result (legal values: totally cured, partially cured, not cured).

    Now insert (module Enter Data) at least 40 records, trying to balance the numbers by type of treatment (around 20 with the value genuine pill and around 20 with the value placebo).
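    For the Age histogram requested in the viewSmoke exercise, the values are first grouped into intervals of length 10. That grouping step can be sketched in Python as follows; histogram_counts, the start value passed to it and the ages are hypothetical, and the half-open intervals [low, low + 10) and the dropping of values below start are assumptions rather than documented Epi Info behavior.

```python
from collections import Counter


def histogram_counts(values, start, width=10):
    """Map each interval [low, low + width) to the number of values inside it."""
    # Interval index k covers [start + k*width, start + (k+1)*width).
    bins = Counter(int((v - start) // width) for v in values if v >= start)
    if not bins:
        return {}
    return {(start + k * width, start + (k + 1) * width): bins[k]
            for k in range(max(bins) + 1)}


ages = [23, 27, 31, 38, 45, 52, 58, 61]  # hypothetical ages
for (low, high), count in histogram_counts(ages, start=20).items():
    print(f"[{low}, {high}): {'*' * count}")
```

    Empty intervals are kept (with a count of 0) so that the bars of a histogram stay contiguous.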