

Introduction

• Relationship between climate and health widely studied
• Climatic temperature stress increases cardiovascular disease risk
• Solar UV radiation increases skin cancer risk
• Cancer incidence rates expected to rise dramatically
• Cancer risk factors: genetic vs. environmental
• Objective: Investigate possible link between climate and cancer incidence
• How: Knowledge discovery via text mining of biomedical literature
• Why: Efficient way to assist scientific hypothesis formation

Future Work

• Implement investigation linking actual transplant records and cancer incidence data

• Determine if higher association exists between skin cancer and kidney transplants in sunny climates

• Generate new hypotheses that may be tested with traditional, long-term epidemiological studies

• Develop model to predict future expected cancer incidence based on environmental factors

1. Crandall, C.G., & Gonzalez-Alonso, J. (2010). Cardiovascular Function in the Heat-Stressed Human. Acta Physiologica, 407-23.

2. Jemal, A., Center, M. M., & DeSantis, C. (2010). Global Patterns of Cancer Incidence and Mortality Rates and Trends. Cancer Epidemiology Biomarkers & Prevention, 1893-907.

3. U.S. cancer statistics: an interactive atlas [Image]. (n.d.). Retrieved from http://apps.nccd.cdc.gov/DCPC_INCA/DCPC_INCA.aspx

4. van der Leun, J. C., & de Gruijl, F. R. (2002). Climate change and skin cancer [Abstract]. Photochemical and Photobiological Sciences, 324-26.

Knowledge Discovery through Text Mining Biomedical Documents Relating Climate and Cancer Incidence

Katherine Gaster
Wofford College

Mentor: Georgia Tourassi, Ph.D.
Computational Sciences and Engineering Division

Research Alliance in Math and Science
Oak Ridge National Laboratory

http://info.ornl.gov/sites/rams2012/k_gaster

The Research Alliance in Math and Science program is sponsored by the Office of Advanced Scientific Computing Research, U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725. This work has been authored by a contractor of the U.S. Government; accordingly, the U.S. Government retains a nonexclusive, royalty-free license to publish or reproduce the published form of this contribution, or allow others to do so, for U.S. Government purposes. The author would like to thank Dr. Georgia Tourassi and Dr. Robert Patton for their support and assistance with this research. The author would also like to thank Kara Kruse and Rashida Askia for their continued support and assistance, and Debbie McCoy for the opportunity to perform this research and for her continued support and assistance.

Methodology

Study design
• Literature search to find relevant publications
• Text mining using Piranha software
• Investigate impact of Piranha’s key features

Changing threshold values
• Values from 0.0 to 1.0; intervals of 0.2
• Intervals halved three times

Changing stop word groups
• Combinations of groups
• In conjunction with varied threshold values

Category addition
• Cancer types; risk factors
• Used with varied threshold values; stop word groups
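The parameter sweep described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Piranha's interface: the stop word group names are hypothetical, and "intervals halved three times" is read here as repeating the 0.0 to 1.0 sweep at spacings of 0.1, 0.05, and 0.025.

```python
from itertools import combinations

def threshold_grid(step, lo=0.0, hi=1.0):
    """Evenly spaced threshold values from lo to hi, inclusive."""
    n = round((hi - lo) / step)
    return [round(lo + i * step, 6) for i in range(n + 1)]

# Start at intervals of 0.2, then halve the interval three times (assumed reading).
steps = [0.2 / 2**k for k in range(4)]  # 0.2, 0.1, 0.05, 0.025
grids = {step: threshold_grid(step) for step in steps}

# Hypothetical stop word group names; the poster does not list the actual groups.
stop_word_groups = ["general_english", "publication_terms", "biomedical_common"]

def group_combinations(groups):
    """Every subset of the groups, including the empty set (no stop words)."""
    return [combo
            for r in range(len(groups) + 1)
            for combo in combinations(groups, r)]

# Pair each coarse threshold value with each combination of stop word groups.
runs = [(threshold, combo)
        for threshold in grids[steps[0]]
        for combo in group_combinations(stop_word_groups)]
```

Each entry in `runs` is one candidate configuration; the finer grids can be substituted for `grids[steps[0]]` once a promising threshold range has been found.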

Figure 2. Expected association: breast or prostate cancer risk and genetic profile.
Piranha output: 6 documents; 1 Piranha category (unknown, 100%); 1 document category (All Article…, 100%). Top terms: herit, heritable; prostate, pros…; genetics, gene…; brca; population-based.

Results

Discovered expected associations
• Skin cancer risk and UV exposure
• Breast cancer / prostate cancer risk and genetic profile

Figure 1. Expected association: cancer risk and UV exposure.
Piranha output: 3 documents; 1 Piranha category (unknown, 100%); 1 document category (All Article…, 100%). Top terms: uv; photobiol, phot…; carcinogenesis.

Figure 3. Map depicting relative melanoma incidence for each U.S. state in 2008. Map title: Melanoma of the Skin Incidence Rates by State, 2008.

Figure 4. Unexpected association: kidney transplants and skin cancer.
Piranha output: 3 documents; 1 Piranha category (unknown, 100%); 1 document category (All Article…, 100%). Top terms: renal; nmsc, nmsc:, nmscs; dermatol; keratoses:, ker…; transplanted, t…
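To illustrate how a list of top terms like those reported with the figures might be produced, here is a minimal sketch that ranks terms by a smoothed TF-IDF score after stop word removal. TF-IDF is a stand-in chosen for illustration; the poster does not state which weighting Piranha uses, and the function names, sample sentences, and stop word set below are invented.

```python
import math
import re
from collections import Counter

def tokenize(text, stop_words):
    """Lowercase, split into alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in stop_words]

def top_terms(docs, stop_words=frozenset(), k=3):
    """Return the k terms with the highest summed TF-IDF score across docs."""
    tokenized = [tokenize(d, stop_words) for d in docs]
    n_docs = len(tokenized)
    # Document frequency: number of documents in which each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = Counter()
    for toks in tokenized:
        for term, count in Counter(toks).items():
            idf = math.log((1 + n_docs) / (1 + df[term])) + 1  # smoothed IDF
            scores[term] += (count / len(toks)) * idf
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]

# Invented example sentences, not taken from the study's documents.
docs = [
    "uv radiation and skin cancer risk",
    "uv exposure raises melanoma risk",
    "sun exposure and uv dose",
]
stop_words = {"and", "the", "of"}
terms = top_terms(docs, stop_words, k=3)
```

Changing the stop word set changes which terms survive tokenization, which is one way stop word groups can alter the reported top terms.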

Discovered unexpected association: increased skin cancer risk in kidney transplant recipients
• Obtained additional publications to investigate association further
• Completed literature search for publications referring to kidney transplants and skin cancer
• Analyzed supplementary documents with and without original documents

Conclusions

• Text mining revealed expected associations between climatic factors and cancer types

• Unexpected association discovered: increased skin cancer risk in kidney transplant recipients

• Supplementary articles clustered as expected

Figure 5. Document clustering of original articles. Original threshold values, no stop words.

Figure 6. Document clustering of original and supplementary articles. Original threshold values, no stop words.
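The role a threshold value plays in document clustering can be sketched as follows. This is an assumed, simplified scheme (single-link grouping on cosine similarity of raw term counts), not a description of Piranha's clustering algorithm, and the short document strings are invented placeholders for the original and supplementary articles.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster(docs, threshold):
    """Group documents whose pairwise similarity meets the threshold (single link)."""
    vectors = [Counter(d.lower().split()) for d in docs]
    parent = list(range(len(docs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(docs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Invented placeholder texts standing in for article content.
original = ["uv exposure skin cancer", "brca gene breast cancer"]
supplementary = ["renal transplant skin cancer",
                 "kidney transplant recipients skin cancer nmsc"]
clusters = cluster(original + supplementary, threshold=0.6)
```

In this toy setup, a threshold of 0.0 merges every document into one cluster and a threshold of 1.0 leaves each document alone, which is why sweeping the threshold changes how finely the articles separate.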