lsi based framework to predict gene regulatory information
TRANSCRIPT
BioMed CentralBMC Bioinformatics
ss
Open AcceMeeting abstractLSI based framework to predict gene regulatory informationSujoy Roy1,2, Lijing Xu1,2 and Ramin Homayouni*1,2Address: 1Department of Biology, University of Memphis, Memphis, TN 38152, USA and 2Bioinformatics Program, University of Memphis, Memphis, TN 38152, USA
Email: Ramin Homayouni* - [email protected]
* Corresponding author
BackgroundLatent Semantic Indexing (LSI), a vector space model forinformation retrieval, has shown promise in predictingfunctional relationships between genes using textualinformation in MEDLINE abstracts. The underlying prin-ciple is that genes may be represented as document vectorsin a multi-dimensional hyperspace, and the conceptualrelationship between any two genes is determined by thecosine of the angle between their vectors [1]. In this study,we sought to extend this concept for identification ofputative transcription factors (TFs) that regulate a groupof co-regulated genes. We hypothesized that co-expressedgenes identified by microarray experiments are function-ally related and that at least some of these genes have pre-viously been linked explicitly or implicitly to TFs in theliterature. A transcriptional module is then defined as a setof genes clustered together in LSI space with closelyrelated TFs (Figure 1). We devised a framework usingthese assumptions to identify transcriptional modulesfrom microarray and promoter motif data (Figure 2). Theframework requires as input, co-expressed genes from amicroarray dataset and a set of TFs that have consensusmotifs in the promoter regions of the co-expressed genes.Usually the set of such motif-derived TFs is large andmakes the identification of the critical ones difficult. Theframework first identifies functionally related clusters ofco-expressed genes based on their latent relationshipsfrom literature, and then adds to each cluster TFs that areclosely associated with the genes in the cluster. The puta-tive transcriptional modules are ranked based on the
degree of relative literature coherence amongst the entitiesin them.
Results and discussionThe LSI-based algorithm allows prediction of TFs basedon latent (implicit) relationships in the literature. A pre-liminary evaluation of our method using previously pub-lished knock-out experiments revealed that it hasreasonable recall and precision. A more rigorous evalua-tion of the method will require several additional TFknock-out microarray experiments. This work providesproof of principle that the combination of motif analysis
from UT-ORNL-KBRIN Bioinformatics Summit 2009Pikeville, TN, USA. 20–22 March 2009
Published: 25 June 2009
BMC Bioinformatics 2009, 10(Suppl 7):A5 doi:10.1186/1471-2105-10-S7-A5
<supplement> <title> <p>UT-ORNL-KBRIN Bioinformatics Summit 2009</p> </title> <editor>Eric C Rouchka and Julia Krushkal</editor> <note>Meeting abstracts – A single PDF containing all abstracts in this Supplement is available <a href="http://www.biomedcentral.com/content/pdf/1471-2105-10-S7-full.pdf">here</a>.</note> <url>http://www.biomedcentral.com/content/pdf/1471-2105-10-S7-info.pdf</url> </supplement>
This abstract is available from: http://www.biomedcentral.com/1471-2105/10/S7/A5
© 2009 Roy et al; licensee BioMed Central Ltd.
Gene clusters and transcriptional modules in LSI spaceFigure 1Gene clusters and transcriptional modules in LSI space. Gene document vectors (blue) are clustered together in LSI space based on the closeness of the angle between them. Next, transcription factor vectors (green) are added to the clusters if their cosine value is close to the average cosine of the gene cluster.
Page 1 of 2(page number not for citation purposes)
BMC Bioinformatics 2009, 10(Suppl 7):A5 http://www.biomedcentral.com/1471-2105/10/S7/A5
Publish with BioMed Central and every scientist can read your work free of charge
"BioMed Central will be the most significant development for disseminating the results of biomedical research in our lifetime."
Sir Paul Nurse, Cancer Research UK
Your research papers will be:
available free of charge to the entire biomedical community
peer reviewed and published immediately upon acceptance
cited in PubMed and archived on PubMed Central
yours — you keep the copyright
Submit your manuscript here:http://www.biomedcentral.com/info/publishing_adv.asp
BioMedcentral
Overview of FrameworkFigure 2Overview of Framework.
and LSI may be used to identify putative transcriptionalmodules from microarray data.
References1. Homayouni R, Heinrich K, Wei L, Berry MW: Gene clustering by
latent semantic indexing of MEDLINE abstracts. Bioinformatics2005, 21(1):104-115.
Page 2 of 2(page number not for citation purposes)