Analysis of CDC Chronic Disease Indicators US compared with Georgia

D. Fullerton STAT 3010.W01 Final Project 07/19/09

DESCRIPTION

Applied Statistics final project using SAS to complete an epidemiology analysis of CDC Chronic Disease Indicators in US compared with the state of Georgia. Note, we are not very healthy in the state of Georgia.

STAT 3010.W01 Final Project: Analysis of Center for Disease Control Chronic Disease Indicators of the United States and Georgia for Year 2005

The aim of this report is to discuss the results of a statistical analysis of the Chronic Disease Indicators of the United States and Georgia for the year 2005, as published by the Center for Disease Control. The analysis had three objectives: 1) determine descriptive statistics and describe the distributions of the variables in the data set; 2) compare chronic disease indicator rates between the United States and Georgia separately for each of five categories; and 3) draw a random sample of 20 items from the data set, estimate the mean Chronic Disease Indicator rate in the United States and in Georgia using 95% and 99% confidence intervals, and determine whether those intervals captured the population mean rate of all 50 original data points. The analysis was performed in SAS 9.1.3 SP4, with graphics produced in SAS and Minitab 15.

This data set was chosen for its relevance to healthcare, its size, and its complexity. The five variables (three categorical and two quantitative) of the Center for Disease Control Chronic Disease Indicators of the United States and Georgia for Year 2005 were obtained by filtering a data set from the Center for Disease Control website (http://apps.nccd.cdc.gov/cdi/Default.aspx) and selecting a comparison between the United States and Georgia. The data and definitions were originally developed by the Council of State and Territorial Epidemiologists together with epidemiologists and chronic disease program directors at the state and federal levels. They were refined between 1999 and 2002, and the survey analyzed here was conducted for 2005.

These data have proved useful in Georgia for developing a database of the indicators by 19 health districts, available via the internet. In addition, the Division of Diabetes Translation at the Center for Disease Control uses the data to assist diabetes programs with their surveillance and epidemiological activities. Table 1 shows a short selection of the data, and the variable names used in Table 1 are described in Table 2. There are 50 data points from the year 2005; six other data points from different years were trimmed from the data set before analysis. Therefore, the results and analysis are valid only for the year 2005. The occurrences per 100,000 people in the United States and in Georgia are assessed by Chronic Disease Indicator category.

The assessment of the quantitative and categorical variables shows the following. Table 3 gives the descriptive statistics for the Chronic Disease Indicators of the United States and Georgia; in both, the mean is far larger than the median (102.23 versus 25.95 for the United States, and 100.37 versus 25.90 for Georgia). Figures 1 and 2 show that the distributions of occurrences for the United States and for Georgia are both unimodal and positively skewed, and Figures 3 and 4 further demonstrate this. Although the distributions are drastically skewed, no outliers are shown. Because of the skew, the most representative measure of central tendency is the median: 25.95 for the United States and 25.90 for Georgia.
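The gap between mean and median can be turned into a rough skewness figure. The sketch below is not part of the original SAS analysis; it applies Pearson's median skewness coefficient, 3 × (mean − median) / standard deviation, to the summary statistics reported in Table 3. The function name is illustrative only.

```python
def pearson_median_skewness(mean, median, std_dev):
    """Pearson's median (second) skewness coefficient: 3 * (mean - median) / sd."""
    return 3 * (mean - median) / std_dev

# Summary statistics copied from Table 3 (rates per 100,000 people).
us_skew = pearson_median_skewness(mean=102.23, median=25.95, std_dev=153.70)
ga_skew = pearson_median_skewness(mean=100.37, median=25.90, std_dev=163.10)

# Positive values indicate a right (positive) skew.
print(round(us_skew, 2), round(ga_skew, 2))
```

Both coefficients come out well above zero, which agrees with the positive skew seen in the histograms and supports reporting the median rather than the mean.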

Table 4 shows the frequency of indicators by category. Cancer dominates the data with 36 of the 50 observations; this modal category is 4.5 times as frequent as the next leading category, Cardiovascular Disease, with 8. Figures 5 and 6 reinforce this. It is notable, however, that Cancer has a broader range of results and is skewed, whereas Cardiovascular Disease has a more even distribution.
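As a cross-check on Table 4, the percent and cumulative columns can be rebuilt from the category counts alone. This is a minimal Python sketch, not the PROC FREQ call used in the report; the helper name `frequency_table` is hypothetical.

```python
# Category counts as reported in Table 4 (50 observations in total).
counts = {
    "Cancer": 36,
    "Cardiovascular Disease": 8,
    "Other Diseases and Risk Factors": 2,
    "Overarching Conditions": 2,
    "Tobacco and Alcohol": 2,
}

def frequency_table(counts):
    """Return rows of (category, freq, percent, cumulative freq, cumulative percent)."""
    total = sum(counts.values())
    rows, running = [], 0
    for category, freq in counts.items():
        running += freq
        rows.append((category, freq, 100 * freq / total, running, 100 * running / total))
    return rows

table = frequency_table(counts)
for row in table:
    print("%-32s %3d %7.2f %3d %7.2f" % row)

# How many times larger the modal category is than the runner-up.
ratio = counts["Cancer"] / counts["Cardiovascular Disease"]
print(ratio)
```

The ratio of Cancer to Cardiovascular Disease observations is the basis for the "over four times" comparison in the text.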

A new categorical variable, size, was created for the occurrences in the United States and in Georgia. The occurrences were divided into intervals roughly 150 wide (the SAS code in Appendix III uses cutoffs of 145, 300, 450, and 600). The contingency table in Table 5 shows that the Cancer statistics for the United States fall mostly in the “X-Small” range: 32 of the 36 Cancer data points were below the first cutoff. Occurrences for Georgia (Table 6) differ in that some results fall into the “Medium” range, and 50% of the Cardiovascular Disease results are in the “X-Small” category.
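The size classification can be sketched outside SAS as a lookup over half-open intervals. The Python below is an illustration rather than the report's own code: the cutoffs are copied from the data step in Appendix III, the function name `size_label` is hypothetical, and the rates are only the eight observations displayed in Table 1, so the counts do not reproduce Tables 5 and 6.

```python
from collections import Counter

# Upper bounds copied from the SAS data step in Appendix III (rates per 100,000).
CUTOFFS = [(145, "X-Small"), (300, "Small"), (450, "Medium"), (600, "Large")]

def size_label(rate):
    """Return the size class for a rate; each class covers [lower, upper)."""
    for upper, label in CUTOFFS:
        if rate < upper:
            return label
    return "X-Large"

# UNITED_STATES and GEORGIA values for the eight observations shown in Table 1.
us_rates = [9.3, 8.9, 469.8, 458.4, 188.6, 618.6, 1.3, 1.3]
ga_rates = [7.5, 8.1, 402.6, 452.0, 157.2, 711.1, 1.3, 1.5]

us_counts = Counter(size_label(r) for r in us_rates)
ga_counts = Counter(size_label(r) for r in ga_rates)
print(dict(us_counts))
print(dict(ga_counts))
```

Even in this small subset, a Georgia value (402.6) lands in “Medium” while no United States value does, which mirrors the difference between Tables 5 and 6.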

The categorical indicator is also shown in Figures 7 through 10. They stress again that Cancer is the leading category by far, at 72% of the observations overall. Figure 7 groups the three smallest categories into one slice, “Other”. The breakdowns by category for the United States and for Georgia continue to stress that Cancer and Cardiovascular Disease are the categories that warrant further study.

Figures 11 and 12 again show the breakdown of occurrences by the newly created variable, size. Each shows that most occurrences for both the United States and Georgia fall into the “X-Small” category, with a frequency of nearly 40 in each (38 and 39 of the 50 observations, per Tables 5 and 6). Figures 13 and 14 show the category of incidence by size as stacked bar charts for the United States and Georgia. Cancer results in the United States fall mostly in the “X-Small” category, and Cardiovascular Disease results mostly in the “Small” category. The results for Georgia show that “X-Small” leads in every category except Overarching Conditions and makes up the vast majority of the Cancer indicator.

Finally, a random sample of 20 data points was drawn in SAS. Both the 95% and 99% confidence intervals (Table 7) captured the population means of the full 50-point data set. For the United States, the intervals were 38.69 to 200.55 (95%) and 9.00 to 230.24 (99%), where the population mean is 102.23 (Table 3); for Georgia, they were 35.69 to 210.84 (95%) and 3.56 to 242.97 (99%), where the population mean is 100.37.
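The reported intervals can be checked for internal consistency without the raw sample. The sketch below assumes the intervals are ordinary Student t intervals for a mean with n = 20 (19 degrees of freedom), which is what PROC MEANS with CLM produces; the critical values are taken from a standard t table, and the function names are hypothetical. It backs out the sample mean and standard error from each 95% interval, rebuilds the 99% interval, and tests whether the population means from Table 3 fall inside.

```python
# Two-sided Student t critical values for 19 degrees of freedom (n = 20),
# taken from a standard t table rather than computed here.
T_95 = 2.0930
T_99 = 2.8609

def recover_mean_and_se(lower, upper, t_crit):
    """Back out the sample mean and standard error from a t interval."""
    mean = (lower + upper) / 2
    se = (upper - lower) / (2 * t_crit)
    return mean, se

def t_interval(mean, se, t_crit):
    """Return (lower, upper) for mean +/- t_crit * se."""
    return mean - t_crit * se, mean + t_crit * se

def contains(interval, value):
    lower, upper = interval
    return lower <= value <= upper

# 95% limits from Table 7; population means from Table 3.
us_mean, us_se = recover_mean_and_se(38.69, 200.55, T_95)
ga_mean, ga_se = recover_mean_and_se(35.69, 210.84, T_95)

us_99 = t_interval(us_mean, us_se, T_99)
ga_99 = t_interval(ga_mean, ga_se, T_99)

print([round(x, 2) for x in us_99], contains(us_99, 102.23))
print([round(x, 2) for x in ga_99], contains(ga_99, 100.37))
```

The rebuilt 99% limits agree with Table 7 to within rounding, and both population means lie inside both intervals. Note that the sample means implied by the intervals (about 119.6 and 123.3) are higher than the population means, which is unsurprising for a sample of 20 from a strongly skewed set of 50 values.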

APPENDIX I: SAS TABLES AND FIGURES

Table 1: Abbreviated Display of the Center for Disease Control Chronic Disease Indicators of the United States and Georgia for Year 2005

Obs | CATEGORY | INDICATOR | YEAR | MEASURE | UNITED_STATES | GEORGIA
1 | Tobacco and Alcohol | Chronic liver disease - mortality | 2005 | Crude Rate | 9.3 | 7.5
2 | Tobacco and Alcohol | Chronic liver disease - mortality | 2005 | Age-adjusted Rate | 8.9 | 8.1
3 | Cancer | Invasive cancer (all sites combined) - incidence | 2005 | Crude Rate | 469.8 | 402.6
4 | Cancer | Invasive cancer (all sites combined) - incidence | 2005 | Age-adjusted Rate | 458.4 | 452.0
5 | Cancer | Cancer (all sites combined) - mortality | 2005 | Crude Rate | 188.6 | 157.2
... | ... | ... | ... | ... | ... | ...
48 | Overarching Conditions | Premature mortality among adults aged 45-64 years | 2005 | Age-adjusted Rate | 618.6 | 711.1
49 | Other Diseases and Risk Factors | Asthma - mortality | 2005 | Crude Rate | 1.3 | 1.3
50 | Other Diseases and Risk Factors | Asthma - mortality | 2005 | Age-adjusted Rate | 1.3 | 1.5

NOTE: The data for other years were minimal and thus eliminated from this data set (the numeration “Obs” was added automatically by SAS).

Table 2: Summary of Variables Contained in Center for Disease Control Chronic Disease Indicators of the United States and Georgia for Year 2005

Variable Name | Label | General Type | Specific Type | Measurement Units
Obs | Observation number | Categorical | Identifier Variable | N/A
CATEGORY | Disease category | Categorical | Nominal | N/A
INDICATOR | Disease indicator | Categorical | Nominal | N/A
YEAR | Survey year (only 2005 used) | Categorical | Nominal | N/A
MEASURE | Crude or Age adjusted rate | Categorical | Nominal | N/A
UNITED_STATES | - | Quantitative | Interval/Ratio | Number of instances per 100,000 persons*
GEORGIA | - | Quantitative | Interval/Ratio | Number of instances per 100,000 persons*

* standardized by the direct method to the year 2000 standard U.S. population based on single years of age from the Census P25-1130 series estimates

Table 3: Descriptive Statistics of Center for Disease Control Chronic Disease Indicators of the United States and Georgia for Year 2005

Variable | N | Mean | Median | Std Dev | Range | Minimum | Maximum
UNITED_STATES | 50 | 102.23 | 25.95 | 153.70 | 628.60 | 1.30 | 629.90
GEORGIA | 50 | 100.37 | 25.90 | 163.10 | 719.70 | 1.30 | 721.00

Table 4: Frequency Table of Center for Disease Control Chronic Disease Indicators by Category

CATEGORY | Frequency | Percent | Cumulative Frequency | Cumulative Percent
Cancer | 36 | 72.00 | 36 | 72.00
Cardiovascular Disease | 8 | 16.00 | 44 | 88.00
Other Diseases and Risk Factors | 2 | 4.00 | 46 | 92.00
Overarching Conditions | 2 | 4.00 | 48 | 96.00
Tobacco and Alcohol | 2 | 4.00 | 50 | 100.00

Figure 1: Histogram of Occurrences United States (per 100,000 people)

[Histogram not reproduced. Horizontal axis: UNITED STATES, ticks from 0 to 600; vertical axis: Percent, ticks from 0 to 70.]

Figure 2: Histogram of Occurrences Georgia (per 100,000 people)

[Histogram not reproduced. Horizontal axis: GEORGIA, ticks from 0 to 720; vertical axis: Percent, ticks from 0 to 70.]

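Figure 3: Box Plot of Occurrences United States (year 2005 per 100,000 people)

[Box plot not reproduced. Horizontal axis: YEAR (2005); vertical axis: UNITED STATES, ticks from 0 to 800.]

Figure 4: Box Plot of Occurrences Georgia (year 2005 per 100,000 people)

[Box plot not reproduced. Horizontal axis: YEAR (2005); vertical axis: GEORGIA, ticks from 0 to 800.]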

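Figure 5: Side by Side Box Plot of Occurrences United States (per 100,000 people)

[Box plots not reproduced. Horizontal axis: CATEGORY (labels shown: Cancer, Tobacco and Alcohol); vertical axis: UNITED STATES, ticks from 0 to 800.]

Figure 6: Side by Side Box Plot of Occurrences Georgia (per 100,000 people)

[Box plots not reproduced. Horizontal axis: CATEGORY (labels shown: Cancer, Tobacco and Alcohol); vertical axis: GEORGIA, ticks from 0 to 800.]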

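Table 5: Contingency Table Category of Occurrences by United States Size

Each cell lists Frequency / Percent / Row Pct / Col Pct; the Total row and column list Frequency / Percent.

CATEGORY | Large | Small | X-Large | X-Small | Total
Cancer | 2 / 4.00 / 5.56 / 100.00 | 2 / 4.00 / 5.56 / 25.00 | 0 / 0.00 / 0.00 / 0.00 | 32 / 64.00 / 88.89 / 84.21 | 36 / 72.00
Cardiovascular Disease | 0 / 0.00 / 0.00 / 0.00 | 6 / 12.00 / 75.00 / 75.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 25.00 / 5.26 | 8 / 16.00
Other Diseases and Risk Factors | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 5.26 | 2 / 4.00
Overarching Conditions | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 100.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00
Tobacco and Alcohol | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 5.26 | 2 / 4.00
Total | 2 / 4.00 | 8 / 16.00 | 2 / 4.00 | 38 / 76.00 | 50 / 100.00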

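Table 6: Contingency Table Category of Occurrences by Georgia Size

Each cell lists Frequency / Percent / Row Pct / Col Pct; the Total row and column list Frequency / Percent.

CATEGORY | Large | Medium | Small | X-Large | X-Small | Total
Cancer | 1 / 2.00 / 2.78 / 100.00 | 1 / 2.00 / 2.78 / 50.00 | 3 / 6.00 / 8.33 / 50.00 | 0 / 0.00 / 0.00 / 0.00 | 31 / 62.00 / 86.11 / 79.49 | 36 / 72.00
Cardiovascular Disease | 0 / 0.00 / 0.00 / 0.00 | 1 / 2.00 / 12.50 / 50.00 | 3 / 6.00 / 37.50 / 50.00 | 0 / 0.00 / 0.00 / 0.00 | 4 / 8.00 / 50.00 / 10.26 | 8 / 16.00
Other Diseases and Risk Factors | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 5.13 | 2 / 4.00
Overarching Conditions | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 100.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00
Tobacco and Alcohol | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 0 / 0.00 / 0.00 / 0.00 | 2 / 4.00 / 100.00 / 5.13 | 2 / 4.00
Total | 1 / 2.00 | 2 / 4.00 | 6 / 12.00 | 2 / 4.00 | 39 / 78.00 | 50 / 100.00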

Figure 7: Pie Chart Category of Occurrences (per 100,000 people)

Figure 8: Pie Chart Category of Occurrences United States (per 100,000 people)

Figure 9: Pie Chart Category of Occurrences Georgia (per 100,000 people)

Figure 10: Bar Chart of Category of Occurrences (per 100,000 people)

Figure 11: Bar Chart of Category of Occurrences United States (per 100,000 people)

[Bar chart not reproduced. Horizontal axis: US_SIZE (Large, Small, X-Large, X-Small); vertical axis: FREQUENCY, ticks from 0 to 40.]

Figure 12: Bar Chart of Category of Occurrences Georgia (per 100,000 people)

[Bar chart not reproduced. Horizontal axis: GA_SIZE (Large, Medium, Small, X-Large, X-Small); vertical axis: FREQUENCY, ticks from 0 to 40.]

Figure 13: Stacked Bar Chart of Category of Occurrences United States (per 100,000 people)

Figure 14: Stacked Bar Chart of Category of Occurrences Georgia (per 100,000 people)

Table 7: 95% and 99% Confidence Intervals for United States and Georgia, Sample of 20

Variable | Label | N | Lower 95% CL for Mean | Upper 95% CL for Mean
UNITED_STATES | UNITED STATES | 20 | 38.69 | 200.55
GEORGIA | GEORGIA | 20 | 35.69 | 210.84

Variable | Label | N | Lower 99% CL for Mean | Upper 99% CL for Mean
UNITED_STATES | UNITED STATES | 20 | 9.00 | 230.24
GEORGIA | GEORGIA | 20 | 3.56 | 242.97

Appendix II: Figures Generated in Minitab

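Figure 15: Histogram of UNITED_STATES

[Histogram not reproduced. Horizontal axis: occurrences/100k, ticks from 0 to 640; vertical axis: Frequency, ticks from 0 to 25.]

Figure 16: Histogram of GEORGIA

[Histogram not reproduced. Horizontal axis: occurrences/100k, ticks from 0 to 700; vertical axis: Frequency, ticks from 0 to 30.]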

Figure 17: Boxplot of UNITED_STATES

[Box plot not reproduced. Vertical axis: occurrences/100k, ticks from 0 to 700.]

Figure 18: Boxplot of GEORGIA

[Box plot not reproduced. Vertical axis: occurrences/100k, ticks from 0 to 800.]

Figure 19: Boxplot of UNITED_STATES by CATEGORY

[Box plots not reproduced. Horizontal axis: CATEGORY; vertical axis: occurrences/100k, ticks from 0 to 700.]

Figure 20: Boxplot of GEORGIA by CATEGORY

[Box plots not reproduced. Horizontal axis: CATEGORY; vertical axis: occurrences/100k, ticks from 0 to 800.]

Figure 21: Pie Chart of CATEGORY

[Pie chart not reproduced. Legend (Category): Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol. Slice labels: 72.0%, 16.0%, 4.0%, 4.0%, 4.0%.]

Figure 22: Pie Chart of CATEGORY for UNITED_STATES

[Pie chart not reproduced. Legend (Category): Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol. Slice labels in extraction order: 0.3%, 24.5%, 0.0%, 27.6%, 47.5%.]

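Figure 23: Pie Chart of CATEGORY for GEORGIA

[Pie chart not reproduced. Legend (Category): Cancer, Cardiovascular Disease, Other Diseases and Risk Factors, Overarching Conditions, Tobacco and Alcohol. Slice labels in extraction order: 0.3%, 28.6%, 0.0%, 25.7%, 45.3%.]

Figure 24: Bar Chart of CATEGORY

[Bar chart not reproduced. Horizontal axis: CATEGORY; vertical axis: Count, ticks from 0 to 40.]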

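Figure 25: Bar Chart of United States Size

[Bar chart not reproduced. Horizontal axis: US_SIZE (Large, Small, X-Large, X-Small); vertical axis: Count, ticks from 0 to 40.]

Figure 26: Bar Chart of Georgia Size

[Bar chart not reproduced. Horizontal axis: GA_SIZE (Large, Medium, Small, X-Large, X-Small); vertical axis: Count, ticks from 0 to 40.]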

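Figure 27: Stacked Bar Chart of CATEGORY by United States Size

[Stacked bar chart not reproduced. Horizontal axis: CATEGORY; vertical axis: Count, ticks from 0 to 40; legend: US_SIZE (Large, Small, X-Large, X-Small).]

Figure 28: Stacked Bar Chart of CATEGORY by Georgia Size

[Stacked bar chart not reproduced. Horizontal axis: CATEGORY; vertical axis: Count, ticks from 0 to 40; legend: GA_SIZE (Large, Medium, Small, X-Large, X-Small).]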

Appendix III: SAS Code

* FULLERTON, STAT 3010.W01, FINAL PROJECT: DATA ANALYSIS OF Center for Disease Control Chronic Disease Indicators (CDC - CDI) of the United States and Georgia for Year 2005;

* SETTING SYSTEM OPTIONS;
DM 'LOG;CLEAR;OUT;CLEAR;';
OPTIONS LS=100 PS=75 FORMDLIM="=";
QUIT;

* Loading previously saved data set;
DATA NEWCDICDC;
  SET 'V:\final.project\CDICDC';
RUN;

* Copying the saved data into a temporary (WORK) data set;
DATA CDICDC;
  SET 'V:\final.project\CDICDC';
RUN;

* To view data in SAS;
PROC PRINT DATA = CDICDC;
RUN;

* SETTING LIBREF;
* Saving data as a permanent SAS data set;
LIBNAME W2 'V:\final.project';
DATA W2.CDICDC;
  SET CDICDC;
RUN;

* IMPORT CDC - CDI DATA;
PROC IMPORT
  DATAFILE = 'V:\final.project\FilChrDisIndCDC.xls'
  OUT = T1
  REPLACE;
RUN; QUIT;

* Variable View in SAS;
PROC CONTENTS DATA = W2.CDICDC;
RUN;

* Table 1 Dataset;
ODS RTF;
PROC PRINT DATA = W2.CDICDC;
  VAR CATEGORY INDICATOR YEAR MEASURE UNITED_STATES GEORGIA;
RUN;
ODS RTF CLOSE;

* Descriptive Statistics for Quantitative Variables;
ODS RTF;
PROC MEANS DATA = W2.CDICDC MAXDEC=2 N MEAN MEDIAN STD RANGE MIN MAX;
  VAR UNITED_STATES GEORGIA;
RUN;
ODS RTF CLOSE;

* Frequency Tables of Category Variables;
ODS RTF;
PROC FREQ DATA = W2.CDICDC;
  TABLES CATEGORY INDICATOR MEASURE;
RUN;
ODS RTF CLOSE;

* Histograms and Boxplots;
DM 'LOG; CLEAR; OUT; CLEAR;';
PROC UNIVARIATE DATA = W2.CDICDC;
  VAR UNITED_STATES GEORGIA;
  HISTOGRAM;
RUN;

PROC SORT DATA = W2.CDICDC;
  BY YEAR;
RUN;
PROC BOXPLOT DATA = W2.CDICDC;
  PLOT UNITED_STATES*YEAR;
  PLOT GEORGIA*YEAR;
RUN;

* Boxplot of Occurrences by Category;
DM 'LOG; CLEAR; OUT; CLEAR; GRAPH; CLEAR';
PROC SORT DATA = W2.CDICDC;
  BY CATEGORY;
RUN;
PROC BOXPLOT DATA = W2.CDICDC;
  PLOT UNITED_STATES*CATEGORY;
  PLOT GEORGIA*CATEGORY;
RUN;

* Creating new variable (size) for contingency table analysis;
* Note that the first cutoff is 145 and the remaining cutoffs are 300, 450, and 600;
* The redundant second SET T1 was dropped and the IF chain made mutually exclusive;
DM 'LOG;CLEAR;OUT;CLEAR';
DATA T1;
  SET T1;
  LENGTH US_SIZE $ 7 GA_SIZE $ 7;
  IF UNITED_STATES < 145 THEN US_SIZE = 'X-Small';
  ELSE IF UNITED_STATES < 300 THEN US_SIZE = 'Small';
  ELSE IF UNITED_STATES < 450 THEN US_SIZE = 'Medium';
  ELSE IF UNITED_STATES < 600 THEN US_SIZE = 'Large';
  ELSE US_SIZE = 'X-Large';
  IF GEORGIA < 145 THEN GA_SIZE = 'X-Small';
  ELSE IF GEORGIA < 300 THEN GA_SIZE = 'Small';
  ELSE IF GEORGIA < 450 THEN GA_SIZE = 'Medium';
  ELSE IF GEORGIA < 600 THEN GA_SIZE = 'Large';
  ELSE GA_SIZE = 'X-Large';
RUN;

PROC PRINT DATA = T1;
RUN;

* Contingency Tables;
DM 'LOG;CLEAR;OUT;CLEAR';

ODS RTF;
PROC FREQ DATA = T1;
  TABLES CATEGORY*US_SIZE;
RUN;
ODS RTF CLOSE;

ODS RTF;
PROC FREQ DATA = T1;
  TABLES CATEGORY*GA_SIZE;
RUN;
ODS RTF CLOSE;

* Pie Charts;
PROC GCHART DATA = W2.CDICDC;
  PIE CATEGORY;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN; QUIT;

PROC GCHART DATA = W2.CDICDC;
  PIE CATEGORY / SUMVAR = UNITED_STATES PERCENT = INSIDE;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN; QUIT;

PROC GCHART DATA = W2.CDICDC;
  PIE CATEGORY / SUMVAR = GEORGIA PERCENT = INSIDE;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN; QUIT;

* Bar Charts;
PROC GCHART DATA = W2.CDICDC;
  VBAR CATEGORY / TYPE = FREQ;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN;

PROC GCHART DATA = T1;
  VBAR US_SIZE / TYPE = FREQ;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN;

PROC GCHART DATA = T1;
  VBAR GA_SIZE / TYPE = FREQ;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN;

* Stacked Bar Charts;
PROC GCHART DATA = T1;
  VBAR CATEGORY / SUBGROUP = US_SIZE;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN;

PROC GCHART DATA = T1;
  VBAR CATEGORY / SUBGROUP = GA_SIZE;
  GOPTIONS HTEXT = 1;
  LEGEND;
RUN;

* Generate Random sample set of data with seed to replicate data;
* Each observation receives a uniform random key in GROUP;
DATA CDICDCN;
  SET W2.CDICDC;
  GROUP = RANUNI(123456);
RUN;
PROC PRINT DATA = CDICDCN;
RUN;

* Sort random data to show only the first 20 observations;
PROC SORT DATA = CDICDCN;
  BY GROUP;
RUN;
DATA CDICDCNN;
  SET CDICDCN;
  IF _n_ < 21;
RUN;
PROC PRINT DATA = CDICDCNN;
RUN;

* Confidence Intervals on ratio scale variables;
DM 'LOG;CLEAR;OUT;CLEAR;';

ODS RTF;
PROC MEANS DATA = CDICDCNN MAXDEC=2 N CLM ALPHA = .05;
  VAR UNITED_STATES GEORGIA;
RUN;
PROC MEANS DATA = CDICDCNN MAXDEC=2 N CLM ALPHA = .01;
  VAR UNITED_STATES GEORGIA;
RUN;
ODS RTF CLOSE;

* Export Data to Minitab;
* The output paths are written as V:\final.project\ to match the paths used earlier;
PROC EXPORT
  OUTFILE = 'V:\final.project\FilChrDisIndCDC.csv'
  DATA = W2.CDICDC
  REPLACE;
RUN;
PROC EXPORT
  OUTFILE = 'V:\final.project\FilChrDisIndCDCT1.csv'
  DATA = T1
  REPLACE;
RUN; QUIT;