disease symptoms network poster

Comparing Two Human Disease Networks: Gene-Based and Symptom-Based Perspectives

Ibraheem Rehman, Cheryl Limer, Yousuf Shah, Zach Eaton, Carol Reynolds, Alan Troidl (Vestal High School, Vestal, NY, USA)

Kristie McHugh, Hiroki Sayama (Binghamton University, Binghamton, NY, USA)

Genki Ichinose (Anan National College of Technology, Japan)

Goh et al.’s human disease network [1] introduced a novel, network-based perspective to the understanding of human diseases and medicine. Their network was built using genes that are commonly associated with two diseases as links between them. In contrast, many diseases and ailments have traditionally been diagnosed by doctors based on the observed and experienced symptoms that are commonly associated with a given disease. While symptoms have the potential to be great indicators of specific diseases, they are not always accurate. Certain symptoms are common to many diseases. How are diseases, symptoms, and genes correlated? Is it possible to identify genes that account for certain symptoms? If we could find correlations and discrepancies between them and analyze the data using network analysis, we could reduce the risk of medical errors that arise when physicians diagnose genetic disorders solely on the basis of observable symptoms.

In this study, we hypothesized that there would be a positive correlation between the numbers of symptoms and genes shared by a pair of diseases. To test this hypothesis, we created a new network depicting disease-symptom relationships and compared it to Goh et al.’s network depicting disease-gene relationships. We found that the two networks had very different structures, and that there was essentially no correlation between the genetic and symptomatic similarities of diseases. Our methodology and findings may inform medical researchers and practitioners about sets of easily confused diseases that require particular attention in diagnosis and treatment.

Conclusion:

We constructed and compared two different networks of human diseases, one based on genes and the other based on symptoms. Those two networks were found to be very different from each other. We found no significant correlation between genetic and symptomatic similarities, which indicates that diagnosis based on symptoms alone is not satisfactory. Using the results, we were able to identify disease pairs with many similar symptoms but few common genes. We believe this is the first step in being able to find the genetic diseases that are most commonly misdiagnosed.

Future research directions will include (1) expanding the number of diseases studied, (2) comparing our results with real-life examples of misdiagnoses, (3) investigating the symptoms themselves in more depth, and (4) conducting more advanced network analyses and statistical testing of the data.

References:

1. Goh, K. I., Cusick, M. E., Valle, D., Childs, B., Vidal, M., & Barabási, A. L. (2007). The human disease network. Proceedings of the National Academy of Sciences, 104(21), 8685-8690.

2. MedTech USA, Inc. (2013). Disease Information. Retrieved October 15, 2013, from DiagnosisPro: http://en.diagnosispro.com/disease_information/home/

Acknowledgments:

This research was supported in part by the National Science Foundation (NSF) Grants #1027752 and #1319152. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the NSF. We thank Jeff Schmidt for his assistance throughout our work.

The Vestal High School team’s trip to NetSci 2014 was also sponsored by: National Science Foundation, STEM Hub/Lockheed-Martin, Innovation Associates, BAE Systems, Stantec, and Vestal National Honor Society.

Results:

• The disease-gene network (Figure 1, above) is very sparse because each disease is associated with only certain genes.

• These gene-disease clusters show that as the genes involved vary, the diseases vary slightly. This explains the slight variation among some genetic diseases (e.g., Pfeiffer Syndrome and Craniosynostosis).

• The disease-symptom network (Figure 1, below) is very dense because there are symptoms that are common to all diseases, such as pain and fatigue.

• The diseases on the outside of the network have unique symptoms.

• By analyzing these networks, one can tell that diagnosing on symptoms alone is not satisfactory. Gene tests must also be done.

• None of the scatter plots (Figure 2) show any significant correlation.

• Therefore, our hypothesis was not supported. There was no positive correlation between genetic and symptomatic similarities.

• The result of our research was that diagnosing on symptoms alone is not satisfactory and gene tests must also be used.
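The poster does not say which statistic, if any, was used to judge the scatter plots. As one possible check, the following minimal sketch computes a Pearson correlation coefficient between the genetic and symptomatic similarity values of disease pairs. The choice of Pearson's r, the function name `pearson`, and the toy numbers are all assumptions made for illustration, not the authors' procedure or data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-deviations and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / sqrt(var_x * var_y)


# Hypothetical similarity values for four disease pairs (not real data).
genetic_similarity = [0, 1, 2, 3]
symptomatic_similarity = [2, 0, 3, 1]

r = pearson(genetic_similarity, symptomatic_similarity)
print(f"r = {r:.3f}")
```

A value of r near zero would be consistent with the absence of a linear relationship; a significance test on the real pair data would still be needed before drawing conclusions.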

Figure 2 panels:

A: Based on number of shared genes/symptoms

B: Based on geometric mean of gene/symptom overlap ratios

C: Based on A × B

Methods:

• We gathered the symptoms of approximately 2,700 genetic diseases from DiagnosisPro [2], using the same disease set as in Goh et al.’s gene-based disease network [1]. Diseases that were not in the database of the website were excluded. The data about genes associated with the diseases were obtained from Goh et al.’s website [1]. These data were stored in two Excel files.

• We used Python to process the data files and constructed two bipartite networks, one with genes and diseases and one with symptoms and diseases. Those networks were visualized using Gephi (see Fig. 1).

• We created scatter plots showing all the disease pairs arranged by their genetic and symptomatic similarities. We tried three similarity measures: (A) number of shared genes/symptoms, (B) geometric mean of gene/symptom overlap ratios (i.e., # of shared genes/symptoms over total # of associated genes/symptoms) between two diseases, and (C) product of A and B (see Fig. 2).

• In the plots, we identified diseases that were similar in symptoms but different in genes, possibly causing a high risk of misdiagnosis.

• To verify our results we asked Dr. Afzal Ur Rehman for feedback.
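The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code. The function names, the thresholds, and the toy records are hypothetical. Measure B is read here as the geometric mean of the two per-disease overlap ratios, i.e. shared / sqrt(|set 1| × |set 2|), which is one plausible interpretation of the description.

```python
from collections import defaultdict
from itertools import combinations
from math import sqrt


def build_bipartite(records):
    """Map each disease to the set of genes (or symptoms) linked to it."""
    neighbors = defaultdict(set)
    for disease, attribute in records:
        neighbors[disease].add(attribute)
    return dict(neighbors)


def similarity(set_1, set_2):
    """Return the measures (A, B, C) for one pair of diseases."""
    if not set_1 or not set_2:
        return 0, 0.0, 0.0
    shared = len(set_1 & set_2)                  # measure A
    # Assumed reading of B: geometric mean of shared/|set_1| and shared/|set_2|.
    b = shared / sqrt(len(set_1) * len(set_2))   # measure B
    return shared, b, shared * b                 # measure C = A x B


def pair_similarities(gene_map, symptom_map):
    """Genetic and symptomatic (A, B, C) for every pair of shared diseases."""
    diseases = sorted(set(gene_map) & set(symptom_map))
    return {
        (d1, d2): (similarity(gene_map[d1], gene_map[d2]),
                   similarity(symptom_map[d1], symptom_map[d2]))
        for d1, d2 in combinations(diseases, 2)
    }


def confusable_pairs(pairs, max_shared_genes=0, min_shared_symptoms=2):
    """Pairs with many shared symptoms but few shared genes (thresholds assumed)."""
    return [pair for pair, (gene_sim, symptom_sim) in pairs.items()
            if gene_sim[0] <= max_shared_genes
            and symptom_sim[0] >= min_shared_symptoms]


# Hypothetical toy records (disease, gene) and (disease, symptom); not real data.
gene_records = [("D1", "G1"), ("D1", "G2"), ("D2", "G2"), ("D3", "G3")]
symptom_records = [("D1", "pain"), ("D1", "fatigue"), ("D2", "pain"),
                   ("D3", "pain"), ("D3", "fatigue")]

pairs = pair_similarities(build_bipartite(gene_records),
                          build_bipartite(symptom_records))
print(confusable_pairs(pairs))
```

Each entry of `pairs` holds one point per similarity measure for the scatter plots: the genetic value gives the x-coordinate and the symptomatic value gives the y-coordinate.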

Figure 1: Above: Disease-gene network. Below: Disease-symptom network. Diseases are shown in red, while genes/symptoms are in cyan. Node sizes are scaled according to their degrees.

Figure 2: Scatter plots showing distributions of disease pairs, where x- and y-axes are genetic and symptomatic similarities of the two diseases, respectively.
