Comparing Two Human Disease Networks: Gene-Based and Symptom-Based Perspectives

Ibraheem Rehman, Cheryl Limer, Yousuf Shah, Zach Eaton, Carol Reynolds, Alan Troidl (Vestal High School, Vestal, NY, USA)

Kristie McHugh, Hiroki Sayama (Binghamton University, Binghamton, NY, USA)

Genki Ichinose (Anan National College of Technology, Japan)

Goh et al.’s human disease network [1] introduced a novel, network-based perspective to the understanding of human diseases and medicine. Their network was built using genes that are commonly associated with two diseases as links between them. In contrast, many diseases and ailments have traditionally been diagnosed by doctors based on the observed and experienced symptoms that are commonly associated with a given disease. While symptoms have the potential to be great indicators of specific diseases, they are not always accurate, because certain symptoms are common to many diseases. How are diseases, symptoms, and genes correlated? Is it possible to identify genes that account for certain symptoms? If we could find correlations and discrepancies between them using network analysis, we could reduce the risk of medical errors that arise when physicians diagnose genetic disorders solely on the basis of observable symptoms.

In this study, we hypothesized that there would be a positive correlation between the numbers of symptoms and genes shared by a pair of diseases. To test this hypothesis, we created a new network depicting disease-symptom relationships and compared it to Goh et al.’s network depicting disease-gene relationships. We found that the two networks had very different structures, and that there was essentially no correlation between the genetic and symptomatic similarities of diseases. Our methodology and findings may inform medical researchers and practitioners about sets of easily confused diseases that require particular attention in diagnosis and treatment.

Conclusion:
We constructed and compared two different networks of human diseases, one based on genes and the other based on symptoms. The two networks were found to be very different from each other. We found no significant correlation between genetic and symptomatic similarities, which suggests that diagnosis based on symptoms alone is not sufficient. Using the results, we were able to identify disease pairs with many similar symptoms but few common genes. We believe this is a first step toward finding the genetic diseases that are most commonly misdiagnosed.

Future research directions will include (1) expanding the number of diseases studied, (2) comparing our results with real-life examples of misdiagnoses, (3) investigating the symptoms themselves in more depth, and (4) conducting more advanced network analyses and statistical testing of the data.

References:
1. Goh, K. I., Cusick, M. E., Valle, D., Childs, B., Vidal, M., & Barabási, A. L. (2007). The human disease network. Proceedings of the National Academy of Sciences, 104(21), 8685-8690.

2. MedTech USA, Inc. (2013). Disease Information. Retrieved October 15, 2013, from DiagnosisPro: http://en.diagnosispro.com/disease_information/home/

Acknowledgments:
This research was supported in part by the National Science Foundation (NSF) Grants #1027752 and #1319152. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. We thank Jeff Schmidt for his assistance throughout our work.

The Vestal High School team’s trip to NetSci 2014 was also sponsored by: National Science Foundation, STEM Hub/Lockheed-Martin, Innovation Associates, BAE Systems, Stantec, and Vestal National Honor Society.

Results:
• The disease-gene network (Figure 1, above) is very sparse because each disease is associated with only certain genes.

• The gene-disease clusters show that diseases within a cluster differ only slightly in their associated genes. This explains the slight variation among some genetic diseases (e.g., Pfeiffer syndrome and craniosynostosis).

• The disease-symptom network (Figure 1, below) is very dense because some symptoms, such as pain and fatigue, are common to many diseases.

• The diseases on the outside of the disease-symptom network have distinctive symptoms that few other diseases share.

• Comparing the two networks suggests that diagnosing on symptoms alone is not sufficient. Gene tests must also be done.

• None of the scatter plots (Figure 2) shows any significant correlation.

• Therefore, our hypothesis was not supported. There was no positive correlation between genetic and symptomatic similarities.

• Our overall result is that diagnosis based on symptoms alone is not sufficient and gene tests must also be used.
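The visual reading of the scatter plots could be complemented by a correlation coefficient. The poster does not state which statistic, if any, was computed, so the sketch below uses Pearson's r purely as an assumption. The function name `pearson_r` and the toy values are hypothetical and are not the study's data.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of co-deviations and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


# Toy values, invented for illustration: one entry per disease pair.
toy_gene_similarity = [0, 1, 2, 3]
toy_symptom_similarity = [1, 3, 5, 7]
r = pearson_r(toy_gene_similarity, toy_symptom_similarity)
```

A value of r near zero across all disease pairs would be consistent with the "no significant correlation" reading, although a formal significance test would still be needed, as the future research directions note.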

Figure 2 panels:

A: Based on # of shared genes/symptoms

B: Based on geometric mean of gene/symptom overlap ratios

C: Based on A × B

Methods:
• We gathered the symptoms of approximately 2,700 genetic diseases from DiagnosisPro [2], using the same disease set as in Goh et al.’s gene-based disease network [1]. Diseases that were not in the database of the website were excluded. The data about genes associated with the diseases were obtained from Goh et al.’s website [1]. These data were stored in two Excel files.

• We used Python to process the data files and constructed two bipartite networks, one with genes and diseases and one with symptoms and diseases. Those networks were visualized using Gephi (see Fig. 1).

• We created scatter plots showing all the disease pairs arranged by their genetic and symptomatic similarities. We tried three similarity measures: (A) number of shared genes/symptoms, (B) geometric mean of gene/symptom overlap ratios (i.e., # of shared genes/symptoms over total # of associated genes/symptoms) between two diseases, and (C) product of A and B (see Fig. 2).

• In the plots, we identified diseases that were similar in symptoms but different in genes, possibly causing a high risk of misdiagnosis.

• To verify our results, we asked Dr. Afzal Ur Rehman for feedback.
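The three similarity measures described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' script: the function name `pair_similarities` and the toy data are hypothetical, and measure B is computed under the assumed reading that each disease's overlap ratio is the shared count divided by that disease's own total, so the geometric mean of the two ratios reduces to shared / sqrt(total1 × total2).

```python
from itertools import combinations
from math import sqrt


def pair_similarities(features_by_disease):
    """Return {(disease1, disease2): (A, B, C)} for every unordered pair.

    features_by_disease maps a disease name to the set of genes
    (or symptoms) associated with it.
    """
    result = {}
    for d1, d2 in combinations(sorted(features_by_disease), 2):
        f1, f2 = features_by_disease[d1], features_by_disease[d2]
        shared = len(f1 & f2)
        # A: raw number of shared genes/symptoms.
        a = shared
        # B: geometric mean of shared/len(f1) and shared/len(f2)
        # (assumed reading of the poster's definition).
        b = shared / sqrt(len(f1) * len(f2)) if f1 and f2 else 0.0
        # C: product of A and B.
        result[(d1, d2)] = (a, b, a * b)
    return result


# Toy data, invented for illustration only.
toy_symptoms = {
    "disease_x": {"pain", "fatigue", "fever"},
    "disease_y": {"pain", "fatigue", "rash", "cough"},
    "disease_z": {"hearing loss"},
}
sims = pair_similarities(toy_symptoms)
```

Running the same function once on the disease-gene mapping and once on the disease-symptom mapping would give the x and y coordinates of each disease pair in a scatter plot.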

Figure 1: Above: Disease-gene network. Below: Disease-symptom network. Diseases are shown in red, while genes/symptoms are in cyan. Node sizes are scaled according to their degrees.
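The bipartite structure and the node degrees used for sizing in Figure 1 can be illustrated with plain Python dictionaries. This is only a sketch under stated assumptions: the poster does not show the actual processing code, and the names `build_bipartite`, `degrees`, and the placeholder rows below are hypothetical.

```python
from collections import defaultdict


def build_bipartite(associations):
    """Build a bipartite network from (disease, feature) pairs.

    Returns two adjacency dicts: disease -> set of features and
    feature -> set of diseases. A feature is a gene or a symptom.
    """
    disease_to_features = defaultdict(set)
    feature_to_diseases = defaultdict(set)
    for disease, feature in associations:
        disease_to_features[disease].add(feature)
        feature_to_diseases[feature].add(disease)
    return dict(disease_to_features), dict(feature_to_diseases)


def degrees(adjacency):
    """Degree of each node, i.e. its number of neighbors."""
    return {node: len(neighbors) for node, neighbors in adjacency.items()}


# Placeholder rows, invented for illustration only.
toy_rows = [
    ("disease_x", "GENE1"),
    ("disease_y", "GENE1"),
    ("disease_y", "GENE2"),
    ("disease_z", "GENE3"),
]
diseases, genes = build_bipartite(toy_rows)
disease_degree = degrees(diseases)
gene_degree = degrees(genes)
```

In this toy network, a feature linked to many diseases gets a high degree, which is the kind of hub that makes the disease-symptom network dense.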

Figure 2: Scatter plots showing distributions of disease pairs, where x- and y-axes are genetic and symptomatic similarities of the two diseases, respectively.
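Disease pairs that sit high on the symptomatic axis but low on the genetic axis of such a plot can also be listed programmatically. The sketch below is an assumption about how that selection might be done; the poster describes identifying the pairs in the plots and gives no thresholds, so `flag_confusable_pairs`, its cutoff parameters, and the toy values are all hypothetical.

```python
def flag_confusable_pairs(gene_sim, symptom_sim, min_symptom_sim, max_gene_sim):
    """List disease pairs with high symptom similarity but low gene similarity.

    gene_sim and symptom_sim map a (disease1, disease2) pair to a similarity
    value; pairs missing from gene_sim are treated as sharing no genes.
    """
    flagged = [
        (pair, symptom_sim[pair], gene_sim.get(pair, 0))
        for pair in symptom_sim
        if symptom_sim[pair] >= min_symptom_sim
        and gene_sim.get(pair, 0) <= max_gene_sim
    ]
    # Most symptom-similar pairs first; ties broken by pair name.
    flagged.sort(key=lambda item: (-item[1], item[0]))
    return flagged


# Toy similarity values, invented for illustration only.
toy_gene_sim = {
    ("disease_x", "disease_y"): 0,
    ("disease_x", "disease_z"): 3,
    ("disease_y", "disease_z"): 0,
}
toy_symptom_sim = {
    ("disease_x", "disease_y"): 5,
    ("disease_x", "disease_z"): 4,
    ("disease_y", "disease_z"): 1,
}
flagged = flag_confusable_pairs(
    toy_gene_sim, toy_symptom_sim, min_symptom_sim=4, max_gene_sim=0
)
```

Any cutoffs would need to be chosen from the observed distributions, and flagged pairs would still need review by a clinician before being treated as misdiagnosis risks.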