  • 8/10/2019 Using the Reactome Database

    1/16

    UNIT 8.7 Using the Reactome Database

    The completion of multiple genomes in recent years has led to an explosion of information about known and predicted gene products. This information explosion has been accelerated by the invention of high-throughput experimental techniques, such as microarrays (see Chapter 7), yeast two-hybrid screens, and ChIP-on-chip techniques, which allow experimentalists to ask questions about tens of thousands of genes simultaneously. As a result, biological researchers now face an embarrassment of riches: there is simply too much information to easily digest and interpret.

    One way to reduce the complexity of this information is to adopt a high-level view of biological pathways. A microarray experiment that changes the expression pattern of thousands of genes may affect only a small handful of biochemical pathways. Hence there is a high degree of interest in the bioinformatics community in creating pathway databases. The Reactome project, covered in this unit, is one such database. It is a curated collection of well-documented molecular reactions that span the gamut from simple intermediate metabolism (e.g., sugar catabolism) to complex cellular events such as the mitotic cell cycle. These reactions are gathered by experts in the field, peer reviewed, and edited by professional staff members prior to being published in the database. A semiautomated procedure supplements this information by identifying likely orthologous molecular reactions in mouse, rat, zebrafish, and other model organisms.

    The protocols in this unit illustrate how to use Reactome to learn the steps of a biological pathway and see how one pathway interacts with another. Basic Protocol 1 describes how to navigate and browse through the Reactome database. Basic Protocol 2 and Alternate Protocol 1 explain how to identify the pathways in which a molecule of interest is involved using either the common name or accession number, respectively. Basic Protocol 3 details how to use the Pathfinder tool to search the database for possible connections within and between pathways. Alternate Protocol 2 describes when and how to use the Advanced Search feature.

    NOTE: This information is based on Reactome in July 2004. Some of the Web pages may have changed somewhat since the unit was written.

    BASIC PROTOCOL 1

    BROWSING A REACTOME PATHWAY

    This protocol will introduce the basic navigational techniques needed to browse the

    Reactome database.

    Necessary Resources

    Hardware

    Computer capable of supporting a Web browser, and an Internet connection

    Software

    Any modern Web browser will work. The formatting of the Reactome pages may look best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.

    Contributed by Lincoln D. Stein

    Current Protocols in Bioinformatics (2004) 8.7.1-8.7.16

    Copyright © 2004 by John Wiley & Sons, Inc.

    Analyzing Molecular Interactions

    8.7.1

    Supplement 7


    Repair and Double-Strand Break Repair. The + marks mean that there are subheadings underneath the headings. Clicking on a + will expand the topic to show its subparts.

    The main screen, to the right of the navigation panel, contains the description of the pathway. This is the meat of the information contained within Reactome. The main screen begins with the authors, peer reviewers, and editors for this pathway, along with the date that the pathway was first released. This is followed by a text summation that describes the pathway. Below the summation are more details about the pathway, including the taxon in which the reaction occurs, the Gene Ontology classification(s) of the pathway, and the cellular compartment in which the pathway is known to occur. Further down are two important fields. The field that reads Equivalent event(s) in other organism(s) allows one to jump to the corresponding processes in the other model organism systems. The Participating molecules field lists all proteins, nucleic acids, small molecules, and complexes of these entities that are involved in any of the myriad aspects of DNA repair.

    3. Drill down into the Global Genomic Nucleotide Excision Repair subpathway as follows. The last entry in the navigation panel is Nucleotide Excision Repair. Click on it to open this level of the hierarchy, revealing the subentries Global Genomic NER (GG-NER) and Transcription-coupled NER (TC-NER). Click on Global Genomic NER (GG-NER) to reveal the page shown in Figure 8.7.3.

    Notice that the navigation panel has now expanded by a level to reveal the relationship between global genomic nucleotide excision repair and the more general pathways that it belongs to on the one hand, and the more specific pathways (DNA Damage Recognition..., Formation of incision complex..., etc.) on the other hand. Further, the highlighting in the reaction map is now restricted to the reactions that are involved in global genomic nucleotide excision repair. The main screen describes the process in text form, and is accompanied by a cartoon overview. A much smaller set of participating molecules (partially scrolled out of view in the figure) lists the proteins, complexes, and other molecules that participate in this process.

    Figure 8.7.3 The main screen after drilling down to the Global Genomic NER (GG-NER) subpathway. The navigation panel on the left has opened up to indicate the subpathways of GG-NER, and the highlighting in the reaction map now indicates the reactions involved in this subpathway only.

    Figure 8.7.4 An individual reaction. Notice that a single reaction arrow is highlighted in the reaction map, and that the information in the main screen now shows the constituent input and output molecular compounds that participate in this reaction.

    4. In order to drill down to the reaction level, continue to click on subpathways. Eventually the reaction level will be reached, where processes are described as the interactions of individual molecules. To see this, return to the navigation panel and click first on DNA Damage Recognition in GG-NER and then on XPC:HR23B complex binds to damaged DNA site with lesion [Homo sapiens] to go to the page shown in Figure 8.7.4.

    A reaction-level page is similar to the upper-level pages, with a few important differences. First of all, the reaction map on the reaction-level page highlights a single reaction arrow only, indicating that one is at the lowest level of a pathway. Second, several additional fields appear below the text description of the reaction. These new fields include Input, which lists the molecules that enter the reaction, and Output, which lists the molecules that result from the reaction. In the case of the current reaction, the inputs are the damaged DNA substrate and the XPC:HR23B nucleotide excision complex, while the output is the complex of XPC:HR23B with the damaged DNA. In other words, this reaction describes the binding of XPC:HR23B to damaged DNA prior to the subsequent enzymatic reactions that cleave the DNA and excise the damaged base pair.

    Two other new fields are also shown. Preceding event(s) describes the reaction that immediately precedes this one temporally, in this case XPC binds to HR23B forming a heterodimeric complex.... Following event(s) describes the reaction that immediately follows this one: Recruitment of repair factors to form preincision complex.... One can click on the preceding and following events to follow the reactions backward and forward in time.

    Figure 8.7.5 After clicking the Following event(s) link in the previous figure, the next step in the GG-NER pathway is displayed.

    5. Move to the next reaction by clicking on the Following event(s) link, Recruitment of repair factors to form preincision complex. This will lead to the page shown in Figure 8.7.5, which describes the recruitment of six new proteins and complexes to create a single complex bound at the site of the damaged DNA. This page shows a preceding event of XPC:HR23B complex binds to damaged DNA site with lesion [Homo sapiens], which is the page shown in Figure 8.7.4, and a following event of Formation of open bubble structure in DNA by helicases [Homo sapiens]. By clicking on the Following event(s) link, it would be possible to continue to follow the process forward in time.

    The relationship between the levels of the navigation bar on the one hand and the Preceding event(s) and Following event(s) links on the other hand may not be immediately clear. These represent two distinct ways of viewing pathways. The nested levels of the navigation bar reflect levels of abstraction in the conceptual organization of pathways. As one moves deeper into the hierarchy, the contents of the main screen become more and more specific and move closer to the biochemical reaction level. The Preceding event(s) and Following event(s) links, on the other hand, usually only appear when one is at the reaction level, and move backward and forward in time, remaining always at individual reactions. It might seem to be redundant to have this dual mode of navigation, but it is there for a good reason. Because biological knowledge is incomplete, there are many instances where it is known that something happens next, but the specific molecules that are involved in this next step are not yet characterized. In this case, the Following event(s) link will be missing, and one must step up in the hierarchy to a more general description of the pathway in order to connect to the next known, well characterized reaction in the process.


    Figure 8.7.5 also illustrates an important aspect of Reactome, the References section at

    the bottom of the screen. Every reaction described in the database is supported by some

    type of provenance. The three main types of provenance are direct literature citations, an

    indirect assertion made by arguing from protein-based similarity in a model organism, and

    an assertion made by the author of the module. In the case of direct literature citations,

    the citation describes experiments performed using a system derived from the taxon under

    consideration. For example, the first reference in the current reaction describes in vivo

    experiments performed on human tissue culture cells that provided direct evidence via

    molecular cross-linking of an association between the XPC:HR23B/DNA complex and the

    repair factors recruited during this step.

    Often, knowledge of human biology is derived from work on model organisms. If understanding of a reaction is derived from work on a model organism system, the references will describe those experiments. Internally, direct evidence and indirect evidence from model organisms are kept distinct, but the user interface does not currently reflect that fact.

    Finally, the high-level, more general pathways will usually be based on an assertion by the author of the module and supported by one or more review articles. Click on the author's name at the top of the main screen to see the list of review articles that describe the pathway.

    6. Reactome provides information about the subunits of a complex, as well as the larger ensembles of proteins that a complex participates in. In this example, from the Recruitment of repair factors to form preincision complex page, click on the TFIIH link in the Input section. This will load a page that contains information about the TFIIH (transcription factor IIH) complex (Fig. 8.7.6).

    Figure 8.7.6 This page describes the TFIIH protein. In addition to describing its subunit structure,

    the page notes all the macromolecular complexes and pathways in which TFIIH participates. The

    reaction map highlighting indicates that, in addition to DNA repair processes, TFIIH is involved in

    mRNA transcription (arrow).


    Because this page describes a molecule and not a reaction or pathway, there is no navigation

    panel on the left. However, the reaction map at the top of the page is still present, and it lights

    up to highlight the reactions in which TFIIH participates. Mousing over the highlighted

    pathways reveals that, in addition to the DNA excision repair pathway that has been browsed

    in the steps above, TFIIH also participates in PolII-mediated RNA transcription. This

    connection between RNA transcription and DNA repair might surprise biologists who are

    not well acquainted with DNA excision repair, and illustrates how Reactome bridges the

    disciplines.

    The section near the bottom of Figure 8.7.6 labeled Participates in processes lists all the pathways and reactions in which TFIIH participates. Although not shown in Figure 8.7.6, at the top of this section there is an extensive list of all the reactions in which the current molecule participates. This is organized in a hierarchical manner that mirrors the pathway hierarchy of the navigation panel. At the bottom, these events are organized into three groups: all events that produce TFIIH, all that consume it, and all that are catalyzed by it.

    7. To learn more about a protein subunit, click on the subunit of interest. In this case, one of the subunits of TFIIH is Cdk7 (shown in Fig. 8.7.6; it complexes with Cyclin H and MAT1 to form the CAK subcomplex, which in turn is one of the major components of TFIIH). Click on the Cdk7 link to load a page that describes it (Fig. 8.7.7). In addition to highlighting the DNA repair and RNA transcription constellations, the reaction map now shows highlighting in the Mitotic Cell Cycle constellation as well (upper left quadrant of the image), reflecting Cdk7's role as a cell-cycle checkpoint molecule.

    This page is called the reference entity page because it contains links to UniProt,

    Ensembl, and other reference databases that describe the molecule.

    Figure 8.7.7 The reference entity page describes the relationship between a molecule as it is

    represented in Reactome and one or more entries in a third-party database such as SwissProt.
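The two navigation modes described in Basic Protocol 1 can be pictured as two kinds of links on the same record: a parent link that climbs the pathway hierarchy, and a following link that moves forward in time. The Python sketch below is an illustration only. The Event class, its field names, and both helper functions are hypothetical and are not part of Reactome's software. Event names are copied from the steps above (one of them is truncated in the text and is left truncated), and parent links are filled in only where the text states them.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Event:
    """A pathway or reaction record. All field names are hypothetical."""
    name: str
    parent: Optional[str] = None  # enclosing, more general pathway
    following: List[str] = field(default_factory=list)  # next events in time


def walk_forward(events: Dict[str, Event], start: str) -> List[str]:
    """Follow the first 'Following event(s)' link until none is left."""
    path = [start]
    seen = {start}
    current = events[start]
    while current.following:
        next_name = current.following[0]
        if next_name in seen:  # stop rather than loop forever on a cycle
            break
        path.append(next_name)
        seen.add(next_name)
        current = events[next_name]
    return path


def ancestors(events: Dict[str, Event], name: str) -> List[str]:
    """List the enclosing pathways, from most specific to most general."""
    chain: List[str] = []
    parent = events[name].parent
    while parent is not None:
        chain.append(parent)
        parent = events[parent].parent
    return chain


# Names as they appear in the protocol text.
NER = "Nucleotide Excision Repair"
GG_NER = "Global Genomic NER (GG-NER)"
RECOGNITION = "DNA Damage Recognition in GG-NER"
DIMER = "XPC binds to HR23B forming a heterodimeric complex"  # truncated in the text
BINDING = "XPC:HR23B complex binds to damaged DNA site with lesion"
RECRUITMENT = "Recruitment of repair factors to form preincision complex"
BUBBLE = "Formation of open bubble structure in DNA by helicases"

EVENTS: Dict[str, Event] = {
    e.name: e
    for e in [
        Event(NER),
        Event(GG_NER, parent=NER),
        Event(RECOGNITION, parent=GG_NER),
        Event(DIMER, following=[BINDING]),  # parent not stated in the text
        Event(BINDING, parent=RECOGNITION, following=[RECRUITMENT]),
        Event(RECRUITMENT, following=[BUBBLE]),  # parent not stated in the text
        Event(BUBBLE),
    ]
}

if __name__ == "__main__":
    for step in walk_forward(EVENTS, DIMER):
        print(step)
    print(ancestors(EVENTS, BINDING))
```

When walk_forward stops because a following link is missing, ancestors gives the more general pathways to step up to, which mirrors the manual procedure of moving up a level in the navigation panel.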


    BASIC PROTOCOL 2

    FINDING THE PATHWAYS INVOLVING A GENE OR PROTEIN

    This protocol will describe how to identify pathways and reactions that involve a gene or

    protein of interest. For the purposes of illustration, the cyclin-dependent kinase 7 gene

    will be used, which has the following identifiers:

    Protein product:
      Common name: Cdk7
      UniProt (SwissProt): CDK7_HUMAN

    Gene:
      LocusLink: 1022
      GenBank: NM_001799
      Ensembl: ENSG00000134058

    See Alternate Protocol 1 to search by a database accession number rather than by a

    common name.

    Necessary Resources

    Hardware

    Computer capable of supporting a Web browser, and an Internet connection

    Software

    Any modern Web browser will work. The formatting of the Reactome pages may look best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.

    1. Point the browser to the Reactome home page at http://www.reactome.org.

    2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation

    to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side

    of the search bar), type Cdk7, then press the Enter key (or click the Go! button). This

    brings up the search results page shown in Figure 8.7.8.

    For now, ignore the text fields and buttons that occupy most of the real estate at the top of the page, and focus on the section at the bottom under the heading Found 8 instances in the following categories. This section tells the user that Reactome knows of 1 Literature reference, 1 summation, 4 ReferenceEntities, and 2 PhysicalEntities that have something to do with Cdk7. Summations are the text paragraphs that appear at the top of pages that describe pathways and reactions. ReferenceEntities are lists of protein and gene entries that appear in online genome databases. What are needed, although not apparent from the name, are the PhysicalEntities, which is the term that Reactome uses for anything that has mass, such as a macromolecule. The search interface will be modified in the near future to make it easier to interpret.

    Figure 8.7.8 Results from the quick search on the Reactome home page are displayed at the bottom of the full-featured Advanced Search page.

    Figure 8.7.9 Following the search results to the Cdk7 page displays the structure of Cdk7 and a hierarchical list of the pathways in which it is known to participate.

    3. Navigate to the Cdk7 entry by clicking on the 2 link that appears after the PhysicalEntity label. This will lead to a list of two entries in Reactome, MAT1 (also known as Cdk7 assembly factor) and Cdk7 itself. Click on the Cdk7 link. This will lead to the page shown in Figure 8.7.9.

    This page, which is similar to the TFIIH page shown in Figure 8.7.7, describes everything

    that Reactome knows about Cdk7, including its names in other online databases, the protein

    complexes that it belongs to, and the pathways and reactions that it participates in. Any of

    these links can be clicked to begin browsing the pathways involving Cdk7 as described in

    Basic Protocol 1.
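The way the search results in Basic Protocol 2 are grouped can be sketched as a simple tally by category. The Python below is only an illustration: the category counts and the two PhysicalEntity names come from the text above, but the tuple layout, the function names, and every entry marked as a placeholder are assumptions made for the example and do not reflect how Reactome stores or returns its results.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Hit = Tuple[str, str]  # (category, display name); a hypothetical layout


def summarize(hits: Iterable[Hit]) -> Counter:
    """Count the hits in each category, as on the search results page."""
    return Counter(category for category, _ in hits)


def in_category(hits: Iterable[Hit], category: str) -> List[str]:
    """Return the display names of the hits in one category."""
    return [name for cat, name in hits if cat == category]


# Counts follow the text: 1 + 1 + 4 + 2 = 8 instances for the query Cdk7.
# Only the two PhysicalEntity names are given in the text; the rest are placeholders.
HITS: List[Hit] = [
    ("LiteratureReference", "placeholder reference"),
    ("Summation", "placeholder summation"),
    ("ReferenceEntity", "placeholder entry 1"),
    ("ReferenceEntity", "placeholder entry 2"),
    ("ReferenceEntity", "placeholder entry 3"),
    ("ReferenceEntity", "placeholder entry 4"),
    ("PhysicalEntity", "MAT1"),
    ("PhysicalEntity", "Cdk7"),
]

if __name__ == "__main__":
    counts = summarize(HITS)
    print(f"Found {sum(counts.values())} instances in the following categories")
    for category, n in counts.items():
        print(f"  {category}: {n}")
    print(in_category(HITS, "PhysicalEntity"))
```

Selecting the PhysicalEntity group corresponds to step 3, where the list is narrowed to MAT1 and Cdk7.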

    ALTERNATE PROTOCOL 1

    FINDING THE PATHWAYS INVOLVING A GENE OR PROTEIN USING SwissProt, Ensembl, OR LocusLink NAME

    Instead of searching for a gene or protein using its common name, as described in Basic Protocol 2, one may wish to use the accession number by which it is known in SwissProt, Ensembl, or LocusLink. The steps for doing so, using a SwissProt accession number, are presented here. The same procedure works for Ensembl or LocusLink identifiers. However, Reactome does not currently recognize GenBank accession numbers, e.g., NM_001799, because of the redundancy of GenBank entries. If one wishes to find a protein based on its GenBank accession number, one should first use NCBI LocusLink to find the correct LocusLink number, and then use this number to access the appropriate entry in Reactome.
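The rule in the paragraph above (SwissProt, Ensembl, and LocusLink identifiers can be searched directly, while a GenBank accession must first be converted to a LocusLink number) can be written as a small decision function. The Python sketch below rests on several assumptions: the regular expressions are rough guesses at what each identifier type looks like, not official format definitions, and the one-entry lookup table merely stands in for a manual query of NCBI LocusLink. All function and variable names are hypothetical. The identifiers are the Cdk7 examples from Basic Protocol 2.

```python
import re
from typing import Dict, Optional

# Stand-in for looking the accession up in NCBI LocusLink by hand.
GENBANK_TO_LOCUSLINK: Dict[str, str] = {"NM_001799": "1022"}


def classify(identifier: str) -> str:
    """Guess the identifier type from its shape (approximate patterns)."""
    if re.fullmatch(r"ENSG\d{11}", identifier):
        return "Ensembl"
    if re.fullmatch(r"[NX][MR]_\d+", identifier):
        return "GenBank"
    if re.fullmatch(r"[A-Z0-9]+_[A-Z]+", identifier):
        return "SwissProt"
    if identifier.isdigit():
        return "LocusLink"
    return "common name"


def searchable_term(identifier: str) -> Optional[str]:
    """Return a term the search box should accept, or None if unresolved."""
    if classify(identifier) == "GenBank":
        # GenBank accessions are not recognized; convert to LocusLink first.
        return GENBANK_TO_LOCUSLINK.get(identifier)
    return identifier


if __name__ == "__main__":
    for term in ["Cdk7", "CDK7_HUMAN", "1022", "NM_001799", "ENSG00000134058"]:
        print(term, "->", classify(term), "->", searchable_term(term))
```

A GenBank accession that is missing from the lookup table returns None, which signals that the LocusLink number still has to be found before searching.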

    Necessary Resources

    Hardware

    Computer capable of supporting a Web browser, and an Internet connection


    Software

    Any modern Web browser will work. The formatting of the Reactome pages may look best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.

    1. Point the browser to the Reactome home page at http://www.reactome.org.

    2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side of the search bar), type CDK7_HUMAN, then press the Enter key (or click the Go! button).

    This brings up a reference entity page (see Basic Protocol 1, step 7) similar to the one

    shown in Figure 8.7.7.

    3. Navigate to the molecule or pathway of interest. The reference entity page is similar

    in most respects to the PhysicalEntity page shown in Figure 8.7.9. From here it is

    possible to navigate to the pathways and reactions in which Cdk7 takes part, view the

    complexes that contain Cdk7, or link to the PhysicalEntity page shown in Figure 8.7.9.

    ALTERNATE PROTOCOL 2

    USING ADVANCED SEARCH

    The simple searches shown in Basic Protocol 2 and Alternate Protocol 1 will suffice for many situations. However, the default search casts a very wide net and may return more hits than one wants. If this is the case, one may wish to use the Advanced Search, which gives much finer control over the search. To illustrate, this protocol describes how to search for pyruvate dehydrogenase, whose default search returns multiple hits on compounds, events, literature references, and other database entries.

    Necessary Resources

    Hardware

    Computer capable of supporting a Web browser, and an Internet connection

    Software

    Any modern Web browser will work. The formatting of the Reactome pages may look best using Internet Explorer 4.0 or higher, or Netscape 7.0 or higher.

    1. Point the Web browser to the Reactome home page at http://www.reactome.org.

    2. On the home page (Fig. 8.7.1), in the search bar near the top of the page (see annotation to step 1 of Basic Protocol 1), click the text box (second box from the right-hand side of the search bar), and type pyruvate dehydrogenase.

    Pyruvate dehydrogenase is a protein complex, and one might like to limit the search to

    database entries for complexes, as in step 3.

    3. Go to the pull-down menu on the far left of the search bar and change the scope of the search from everything to complexes, then press the Go! button to the right of the search bar.

    This will return a list of 13 complexes that contain the words pyruvate dehydrogenase, including FADH2-linked pyruvate dehydrogenase complex, pyruvate dehydrogenase E2 holoenzyme, S-acetyldihydrolipoamide linked, and pyruvate dehydrogenase E2 trimer.

    By default, the search will find matches in Homo sapiens. If one wishes to see matches in another species, one can change the search parameters as in step 4.


    Figure 8.7.10 After entering the names of the start and end compounds, the Pathfinder will display pull-down menus of candidate compounds known to Reactome. Pull down the menus in order to fine-tune Reactome's choice of compounds.

    Figure 8.7.11 The Pathfinder graphic display shows all the steps necessary to traverse from the starting compound to the ending compound.

    the menus and select the best match to what was intended. In the current example, Reactome got the start compound right, but guessed incorrectly for the end compound, returning an intermediate complex that happens to involve pol II transcription. Click on the end compound menu and select pol II transcription complex. If the sought-after item is not found at first, try rephrasing it and typing it into the appropriate text field, then pressing Enter. This will update the list of candidates in the pull-down menu.

    5. When the correct start and end compounds have been selected, press the Go! button

    at the bottom of the panel. In a few seconds the page will refresh and display a list

    of reactions that together connect the origin of replication to the pol II transcription

    complex. The found path traverses the DNA repair pathways, which involve both

    DNA replication and transcription factors. One can click on any of these steps to

    begin browsing Reactome at that point.


    6. Press the button labeled View in Pathway underneath the Pathfinder list. Provided that

    Java is installed and running on the system, a new image window will pop up that

    shows this pathway in a graphical form (Fig. 8.7.11).

    The user can interact with the pathway visualization in a limited manner in order to make it more visually appealing. To do this, press the button labeled Stop relaxing to stop the automatic layout process and fix the reaction boxes in place. Next, grab the boxes with the mouse and move them into the preferred positions.

    The Pathfinder visualization does not currently support exporting the display as a static image. However, it is possible to use the screenshot feature of one's local computer (Alt-Print Scr on the PC) in order to capture the pathway. Also note that the View in Pathway button will not appear unless Java is installed.
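The kind of connection the Pathfinder reports, a chain of reactions leading from a start compound to an end compound, can be illustrated with a breadth-first search over reactions that link inputs to outputs. This unit does not describe the algorithm that Pathfinder actually uses, so the Python sketch below is a stand-in technique and not a description of Reactome's implementation. The Reaction tuple, the function names, and the toy compounds A through E are all hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, NamedTuple, Optional, Tuple


class Reaction(NamedTuple):
    name: str
    inputs: Tuple[str, ...]
    outputs: Tuple[str, ...]


def find_path(reactions: Iterable[Reaction], start: str, end: str) -> Optional[List[str]]:
    """Return reaction names leading from start to end, or None if unconnected."""
    if start == end:
        return []
    # Index the reactions by each compound they consume.
    by_input: Dict[str, List[Reaction]] = {}
    for reaction in reactions:
        for compound in reaction.inputs:
            by_input.setdefault(compound, []).append(reaction)

    # came_from maps a compound to (previous compound, reaction that made it).
    came_from: Dict[str, Optional[Tuple[str, str]]] = {start: None}
    queue = deque([start])
    while queue:
        compound = queue.popleft()
        for reaction in by_input.get(compound, []):
            for product in reaction.outputs:
                if product in came_from:
                    continue  # already reached by a path at least as short
                came_from[product] = (compound, reaction.name)
                if product == end:
                    return _unwind(came_from, end)
                queue.append(product)
    return None


def _unwind(came_from: Dict[str, Optional[Tuple[str, str]]], end: str) -> List[str]:
    """Walk the back-pointers from end to start and reverse the result."""
    steps: List[str] = []
    node = end
    while came_from[node] is not None:
        previous, reaction_name = came_from[node]
        steps.append(reaction_name)
        node = previous
    steps.reverse()
    return steps


# Toy network with placeholder compounds; not real Reactome data.
TOY = [
    Reaction("R1", ("A",), ("B",)),
    Reaction("R2", ("B", "X"), ("C",)),
    Reaction("R3", ("A",), ("D",)),
    Reaction("R4", ("D",), ("E",)),
    Reaction("R5", ("E",), ("C",)),
]

if __name__ == "__main__":
    print(find_path(TOY, "A", "C"))
    print(find_path(TOY, "C", "A"))
```

Breadth-first search returns a route with the fewest reaction steps. This sketch treats a reaction as usable once any one of its inputs has been reached, which is a simplification made for the example.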

    COMMENTARY

    Background Information

    The Reactome project is a collaboration between Cold Spring Harbor Laboratory and The European Bioinformatics Institute, and aims to collect structured information on all the biological pathways in the human (Joshi-Tope et al., 2003; see Internet Resources for online version of this paper). The project is building its database by inviting faculty-level laboratory researchers to contribute a pathway or sub-pathway to the database. To achieve this, contributors are instructed on the use of a specialized piece of authoring software and are assisted in their work by a staff of curators based at the two institutions. After authoring, each pathway is checked for consistency both manually and automatically and then sent to one or more external peer reviewers. The pathway is published to the Web when all internal and external peer review is satisfactory. In many ways, the project resembles a review journal, except that its output is a database rather than a series of papers.

    In order to assist authors in organizing their domain of knowledge into a set of defined pathways, Reactome relies on frequent mini-jamborees of roughly a half-dozen authors. During these jamborees, which are held in conjunction with international meetings, authors working on a set of related pathways get together in the same room and work out the logical structure of their topic. This is also an opportunity for Reactome curators to train the authors in the use of the authoring software.

    Reactome uses a simple scheme for describing biological pathways in which all molecular interactions are defined as reactions. A reaction takes a series of inputs and transforms them into a series of outputs, where inputs and outputs are any type of molecular compound. For example, the reaction in which proinsulin is cleaved to form the α and β chains takes as its input proinsulin, and produces the insulin α and β polypeptides.

    Representing biology as a set of molecular reactions turns out to have broad expressive power, but sometimes the results are disorienting. For example, the reaction in which insulin binds to the insulin receptor takes as its inputs extracellular insulin and the extracellular portion of the insulin receptor, and produces as its output the complex of insulin and its receptor, which, in Reactome, is represented as a distinct molecular entity. The reaction by which extracellular glucose is transported into the cytosol transforms extracellular D-glucose into intracellular D-glucose. Hence, a search of Reactome for D-glucose will find both D-glucose (intracellular cytosolic) and D-glucose (extracellular).
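One way to see why a search for D-glucose returns two entries is to treat the cellular compartment as part of an entity's identity. The Python sketch below is a minimal illustration under that assumption. The PhysicalEntity class shown here, its two fields, and the search function are hypothetical and say nothing about Reactome's actual schema. The example entities are the ones named in the paragraph above.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class PhysicalEntity:
    """A molecule in a particular place; both fields define its identity."""
    name: str
    compartment: str


# Entities named in the text above; the list layout is an assumption.
ENTITIES = [
    PhysicalEntity("D-glucose", "extracellular"),
    PhysicalEntity("D-glucose", "cytosol"),
    PhysicalEntity("insulin", "extracellular"),
]


def search(entities: Iterable[PhysicalEntity], name: str) -> List[PhysicalEntity]:
    """Find every entity with the given name, whatever its compartment."""
    return [entity for entity in entities if entity.name == name]


if __name__ == "__main__":
    for hit in search(ENTITIES, "D-glucose"):
        print(f"{hit.name} ({hit.compartment})")
```

Under this model the glucose transport reaction has a different entity on each side: the extracellular form is its input and the cytosolic form is its output, even though the molecule is chemically unchanged.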

    In addition to inputs and outputs, Reactome reactions have a discrete set of additional attributes. For those reactions that are mediated by catalysts, the catalyst enzyme and its activity are noted. Reactions are also annotated using the cellular compartment in which they occur. While Reactome does not pretend to be a definitive source of information on the cellular location of macromolecules, its data model is set up to work smoothly with future databases of subcellular localization; information on the subcellular location of macromolecules will help automated path-prediction software distinguish plausible pathways from impossible ones. Finally, each reaction is supported by literature citations, either those reporting experiments performed directly in the human system, or those performed on model systems when there is high-quality protein similarity data to suggest that the same reaction is likely to occur in humans.
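The reaction attributes listed above (inputs, outputs, catalyst and its activity, compartment, and supporting citations) map naturally onto a small record type. The Python sketch below is a hypothetical rendering and not Reactome's actual data model: every class name, field name, and toy value is an assumption made for illustration. It also shows how such records could be grouped into events that produce, consume, or are catalyzed by a given molecule, the three groupings described for the TFIIH page in Basic Protocol 1.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Reaction:
    """One reaction with the attributes named in the text (names hypothetical)."""
    name: str
    inputs: List[str]
    outputs: List[str]
    compartment: str
    catalyst: Optional[str] = None           # set only for catalyzed reactions
    catalyst_activity: Optional[str] = None  # what the catalyst does
    citations: List[str] = field(default_factory=list)


def roles_of(entity: str, reactions: Iterable[Reaction]) -> Dict[str, List[str]]:
    """Group reaction names by the role the entity plays in each."""
    roles: Dict[str, List[str]] = {"produced by": [], "consumed by": [], "catalyzes": []}
    for reaction in reactions:
        if entity in reaction.outputs:
            roles["produced by"].append(reaction.name)
        if entity in reaction.inputs:
            roles["consumed by"].append(reaction.name)
        if reaction.catalyst == entity:
            roles["catalyzes"].append(reaction.name)
    return roles


# Toy reactions with placeholder names; not real Reactome data.
TOY = [
    Reaction("assembly", inputs=["S1", "S2"], outputs=["E"], compartment="nucleus"),
    Reaction("conversion", inputs=["P"], outputs=["Q"], compartment="nucleus",
             catalyst="E", catalyst_activity="placeholder activity"),
    Reaction("disassembly", inputs=["E"], outputs=["S1", "S2"], compartment="nucleus"),
]

if __name__ == "__main__":
    for role, names in roles_of("E", TOY).items():
        print(role, names)
```

Keeping the catalyst separate from the inputs and outputs is a design choice in this sketch: it lets a molecule appear in the catalyzes group without also being counted as consumed.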

    In order to assist with the comprehensibility of the resource, the reactions are annotated with text narratives and illustrations, and are organized into a series of discrete goal-driven pathways.

    Reactome is related to several other pathway databases, but has distinct methodologies and aims. The Human Protein Reference Database (HPRD; Peri et al., 2003) is also a hand-curated database of biological pathways. The HPRD focus, however, is to annotate individual proteins and their physical and genetic interactions. HPRD contains information derived from large-scale screening studies as well as individual papers that report pairwise interactions. A result of this methodology is that many of the interactions found in HPRD are speculative and subject to change. Reactome takes a much more conservative approach; it represents far fewer molecular interactions than HPRD does, but they are more likely to be correct and less subject to revision.

    HumanCyc (Krieger et al., 2004) is a database of biological pathways that uses a data model generally similar to Reactome, although the user interface and underlying database technology are quite different in detail. The focus of HumanCyc is intermediate metabolism, however. It tends to have more information on the creation and utilization of small molecules than does Reactome, but less information on such higher-level processes as transcription, translation, and the cell cycle.

    The Kyoto Encyclopedia of Genes and Genomes, or KEGG (Kanehisa et al., 2004), features an extensive set of biological pathway charts. Like HumanCyc, KEGG focuses on intermediate metabolism rather than higher-level pathways. Its data model differs fundamentally from Reactome's by representing the motivating force of all reactions in the form of catalyst activities via Enzyme Commission (EC) numbers. Because there is not a one-to-one mapping between EC activity and polypeptide, it can be problematic to relate a protein represented in SwissProt to a reaction represented in KEGG.

    Finally, the BioCarta project (http://www.biocarta.com) represents human biology as a series of colorful high-resolution diagrams. Unlike Reactome or the other projects mentioned earlier, these diagrams are the end product of the project; there is no underlying database. The focus of BioCarta is to be an education and visualization tool, rather than to support data mining and pattern discovery.

    The Reactome database is far from complete. At the time this module was written, Reactome covered just 8% of the human genome, a number conservatively estimated by dividing the number of human SwissProt entries that take part in Reactome reactions by the total number of human entries in the entire SwissProt database. Because all of the other pathway databases mentioned here are also incomplete, the biologist faces the daunting task of visiting each of these sites in an attempt to fill in the holes in one database's coverage with information from the others. The BioPAX project (http://www.biopax.org) promises to improve this situation by creating a standardized file format for representing biological pathways and reactions. Reactome and many of the other pathway databases have committed to exporting their data in BioPAX format. In the future, this will enable the databases to exchange pathways and to co-curate data, thereby accelerating the rate at which the gaps are closed.
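The coverage estimate described above is a simple ratio: the human SwissProt entries that take part in at least one Reactome reaction, divided by all human SwissProt entries. The Python sketch below works through that arithmetic. The entry identifiers and set sizes are invented placeholders chosen so that the ratio comes out at 8%; they are not the real counts, which this unit does not give. The function name and signature are likewise hypothetical.

```python
from typing import Set


def coverage(in_reactome: Set[str], all_human: Set[str]) -> float:
    """Fraction of the human reference entries that appear in Reactome."""
    if not all_human:
        raise ValueError("the reference set of human entries is empty")
    # Intersect first so that non-human entries cannot inflate the ratio.
    return len(in_reactome & all_human) / len(all_human)


# Placeholder data: 25 invented human entries, 2 of which occur in reactions.
ALL_HUMAN = {f"ENTRY{i:02d}" for i in range(25)}
IN_REACTOME = {"ENTRY03", "ENTRY17", "NOT_HUMAN"}

if __name__ == "__main__":
    print(f"{coverage(IN_REACTOME, ALL_HUMAN):.0%}")  # prints 8%
```

The estimate is conservative in the sense the text describes: an entry counts only if it actually participates in a reaction.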

    Reactome is a fully open-source project. All the software developed for use in Reactome is available for download and redistribution, and the data itself is available in a variety of formats. The Download link on the Reactome Web site provides instructions for obtaining data and software.

    The Reactome dataset is available as relational database tables in a format compatible with MySQL (http://www.mysql.com; UNIT 9.2) and as files compatible with the Protege-2000 knowledgebase editor (http://protege.stanford.edu) and will soon be available as tab-delimited text files.

    Literature Cited

    Joshi-Tope, G., Vastrik, I., Gopinath, G.R., Matthews, L., Schmidt, E., Gillespie, M., D'Eustachio, P., Jassal, B., Lewis, S., Wu, G., Birney, E., and Stein, L. 2003. The Genome Knowledgebase: A Resource for Biologists and Bioinformaticists. Cold Spring Harbor Symposia on Quantitative Biology LXVIII:237-244. Cold Spring Harbor Laboratory Press, Cold Spring Harbor, N.Y.

    Kanehisa, M., Goto, S., Kawashima, S., Okuno, Y., and Hattori, M. 2004. The KEGG resource for deciphering the genome. Nucleic Acids Res. 32:D277-D280.

    Krieger, C.J., Zhang, P., Mueller, L.A., Wang, A., Paley, S., Arnaud, M., Pick, J., Rhee, S.Y., and Karp, P.D. 2004. MetaCyc: A multiorganism database of metabolic pathways and enzymes. Nucleic Acids Res. 32:D438-D442.

    Peri, S., Navarro, J.D., Amanchy, R., Kristiansen, T.Z., Jonnalagadda, C.K., Surendranath, V., Niranjan, V., Muthusamy, B., Gandhi, T.K., Gronborg, M., Ibarrola, N., Deshpande, N., Shanker, K., Shivashankar, H.N., Rashmi, B.P., Ramya, M.A., Zhao, Z., Chandrika, K.N., Padma, N., Harsha, H.C., Yatish, A.J., Kavitha, M.P., Menezes, M., Choudhury, D.R., Suresh, S., Ghosh, N., Saravana, R., Chandran, S.,
