

Feature

Meeting Review: Bioinformatics and Medicine – From molecules to humans, virtual and real
Hinxton Hall Conference Centre, Genome Campus, Hinxton, Cambridge, UK – April 5th–7th

Roslin Russell*
MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK

*Correspondence to: MRC UK HGMP Resource Centre, Genome Campus, Hinxton, Cambridge CB10 1SB, UK.

    Received: 22 April 2002

    Accepted: 24 April 2002

    Abstract

The Industrialization Workshop Series aims to promote and discuss integration, automation, simulation, quality, availability and standards in the high-throughput life sciences. The main issues addressed are the transformation of bioinformatics and bioinformatics-based drug design into a robust discipline in industry, government, research institutes and academia. The latest workshop emphasized the influence of the post-genomic era on medicine and healthcare, with reference to advanced biological systems modeling and simulation, protein structure research, protein-protein interactions, metabolism and physiology. Speakers included Michael Ashburner, Kenneth Buetow, François Cambien, Cyrus Chothia, Jean Garnier, François Iris, Matthias Mann, Maya Natarajan, Peter Murray-Rust, Richard Mushlin, Barry Robson, David Rubin, Kosta Steliou, John Todd, Janet Thornton, Pim van der Eijk, Michael Vieth and Richard Ward. Copyright © 2002 John Wiley & Sons, Ltd.

Keywords: bioinformatics; medicine; virtual human; protein structure; personalized medicine; ontology

    Introduction

The conference was jointly organized and sponsored by the Deep Computing Institute of IBM (http://www.research.ibm.com/dci/), the Task Force on Bioinformatics of the International Union of Pure and Applied Biophysics (http://www.iupab.org/taskforces.html) and the European Bioinformatics Institute (EMBL-EBI, http://www.ebi.ac.uk/).

Barry Robson (IBM Deep Computing Institute) opened the workshop by stating that industrialization does not mean commercialization but high-throughput industrial strength. Last year saw the development of ‘clinical bioinformatics’, linking clinical data to genomics and requiring a large collective database that is ‘patient-centric’. This field will lead to improved diagnostic tools, therapeutics and patient care.

    Keynote speakers

François Cambien (Institut National de la Santé et de la Recherche Médicale, INSERM) presented his latest results [9] and gave an overview of the molecular and epidemiological genetics of cardiovascular disease and the different approaches towards genotyping. He stressed the importance of creating a catalogue of ‘all’ common polymorphisms in the human genome. Since the start of the Etude Cas-Témoin sur l’Infarctus du Myocarde (ECTIM) in 1988, the first large-scale study on myocardial infarction, recent figures put the total number of candidate genes and polymorphisms across 29 studies at 102 and 475 respectively, as recorded by GeneCanvas (www.genecanvas.org), a website supported by INSERM.

Comparative and Functional Genomics. Comp Funct Genom 2002; 3: 270–276. Published online 9 May 2002 in Wiley InterScience (www.interscience.wiley.com). DOI: 10.1002/cfg.178. Copyright © 2002 John Wiley & Sons, Ltd.

The project and others will continue to search for more polymorphisms, but how many are there, are they all relevant, and how do we make the most of this information? Little is known about the distribution and structure of polymorphisms across the genome, despite the availability of the almost complete sequence. On the assumption that there are 10–20 common polymorphisms per gene affecting coding and regulatory regions, and that there are around 30 000 human genes in total, it is estimated that there may be around 500 000 functionally related SNPs. The classification and exploitation of polymorphisms for the whole genome not only require data quality, storage, integration and accessibility, but also an understanding of data relevance and representation in relation to complex disorders. Lopez and Murray [5] ranked cardiovascular disease the fifth most common disease in 1990, but they estimate that by 2020 it will be the most common disease worldwide. There is a complex interplay between environmental and genetic determinants: the former may explain the difference in disease prevalence between population groups, and time-related changes, while the latter play a role in susceptibility. However, the expectations of finding genetic factors for multifactorial diseases may not be fulfilled, because many genes are involved and several models (such as the whole-genome approach) may be much too simplistic. Important correlations may also be found not only in phenotypes such as clinical data but also in biological systems. When integrated with epidemiology and Mendelian genetics, the application of high-throughput proteomic tools, as well as comparative investigations of different species, will provide a greater understanding of evolution and functional biology, and is expected to provide new leads for drug development or other approaches to maintain health.
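The 500 000 figure quoted above follows directly from the stated assumptions; a back-of-envelope check (using the text's own numbers, nothing else):

```python
# Assumptions from the text: 10-20 common polymorphisms per gene in
# coding and regulatory regions, ~30,000 human genes in total.
low = 10 * 30_000
high = 20 * 30_000
midpoint = (low + high) // 2
print(low, high, midpoint)  # 300000 600000 450000 -- of the order of 500,000
```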

Janet Thornton (EMBL-EBI, http://www.ebi.ac.uk/Thornton/index.html) presented results on the analysis of the domain structure of enzymes and how this knowledge can be interpreted in the context of the evolution of metabolic pathways [10,8,6]. The results from an analysis of small-molecule metabolic pathways in yeast support a ‘mosaic’ model of evolution, in which enzymes in metabolic pathways are chosen in an unsystematic way to meet the functional requirements of a metabolic pathway. Further analysis of the conservation of function across families of homologous enzymes shows that the degree of functional similarity varies according to the degree of sequence identity shared by enzyme homologue pairs. High sequence identity corresponds to almost identical function, while proteins with less than 30% sequence identity show significant functional differences. These results have been applied to the problem of genome annotation, and functional assignments have been catalogued in the Gene3D database (http://www.biochem.ucl.ac.uk/bsm/cath_new/Gene3D/) for a number of completed genomes, providing functional and structural annotation.
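The quantity underlying the 30% threshold described above is pairwise percent identity over an alignment. A minimal sketch of the calculation (the sequences are invented for illustration; real analyses use full alignment pipelines):

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over the gapless positions of two equal-length
    aligned sequences ('-' marks an alignment gap)."""
    pairs = [(x, y) for x, y in zip(a, b) if x != '-' and y != '-']
    if not pairs:
        return 0.0
    matches = sum(1 for x, y in pairs if x == y)
    return 100.0 * matches / len(pairs)

# Pairs above ~30% identity tend to share function; below that,
# functional divergence is common (the trend described in the text).
print(percent_identity("MKTAYIAKQR", "MKTAYIAKQR"))  # 100.0
print(percent_identity("MKTAYIAKQR", "MLSAYGAKTR"))  # 60.0
```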

    Virtual human biology

Michael Ashburner (University of Cambridge) spoke about recent developments and tools associated with the Gene Ontology Consortium (GO, http://www.geneontology.org/). GO uses a natural language ontology system and aims to provide a set of structured vocabularies for specific biological domains that can be used to describe gene products in any organism. There are GO terms for human, mouse, rat, worm, fly, amoeba, yeast, Arabidopsis and the more recently sequenced rice, and TIGR are now remodeling all their bacterial genomes. The three organizing principles (orthogonal ontologies) of GO are: biological process, molecular function and cellular component. There is a hierarchical structure within these groups that allows querying at different levels, and as the number of these parent-child relationships increases, the annotations become more complex. Many databases are now collaborating with GO, including MGI, LocusLink, SWISS-PROT and CGAP. Recent browsers and query tools attached to GO include the following: the MGI GO Browser; AmiGO (Berkeley, http://www.godatabase.org/); the CGAP GO Browser (http://cgap.nci.nih.gov/Genes/GOBrowser); the EP:GO Browser (http://ep.ebi.ac.uk/EP/GO/), which is built into EBI’s Expression Profiler (a set of tools for clustering and analysing microarray data); QuickGO (http://www.ebi.ac.uk/ego/), which is integrated into InterPro; and DAG-Edit (http://sourceforge.net/projects/geneontology), a purpose-built, downloadable Java application to maintain, edit and display GO terms. With the emerging development of ontologies in other domains, Global Open Biology Ontologies (GOBO), or Extended GO (EGO), has also been set up with the aim of being open source and DAG-Edit compatible, using non-overlapping defined terms and a common ID space.
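The querying-at-different-levels idea described above amounts to walking a directed acyclic graph of parent-child (is_a) relationships. A minimal sketch, with invented term names rather than real GO identifiers:

```python
# Each term may have several parents; querying a term means collecting
# its transitive ancestors, so annotations made at a specific level are
# visible from every more general level.
from collections import deque

parents = {
    "glucose metabolism": ["carbohydrate metabolism"],
    "carbohydrate metabolism": ["metabolism"],
    "metabolism": ["biological process"],
    "biological process": [],
}

def ancestors(term: str) -> set:
    """All terms reachable by following parent links (breadth-first)."""
    seen, queue = set(), deque([term])
    while queue:
        for p in parents[queue.popleft()]:
            if p not in seen:
                seen.add(p)
                queue.append(p)
    return seen

print(sorted(ancestors("glucose metabolism")))
# ['biological process', 'carbohydrate metabolism', 'metabolism']
```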

Richard Ward (Oak Ridge National Laboratory) is involved in the Virtual Human (VH) Project (http://www.ornl.gov/virtualhuman/), a human simulation tool. The ambitious goal of the VH is an integrated digital representation of the functioning human, from the molecular level to the living human and from conception to old age. Tools are being developed to represent the body’s processes from DNA molecules and proteins to cells, tissues, organs and gross anatomy. These tools include XML (Extensible Markup Language), where ontologies and a high-level description of the human body are important, as well as visualization tools and problem-solving environments. A forum has been established for the digital human effort to provide open source tools to represent the body’s processes: http://www.fas.org/dh/. The Physiome Project (http://www.physiome.org) is another comprehensive modeling environment, which aims to promote modeling for analysing integrative physiological function in terms of underlying structure and molecular mechanisms. There is a recognised need to adopt standards for data definition, storage and transfer, XML to replace standard data input, and databases for creating and developing XML schemas. There is also a need to establish integrative databases dealing with genomic, biophysiological, bioengineering and clinical data. XML descriptions are now becoming more common; examples include CellML for cellular function (http://www.cellml.org); SBML for systems biology modeling (http://xml.coverpages.org/sbml.html); and the group’s own prototype for physiological modeling, physioML (http://www.ornl.gov/virtualhuman). His group originally used a Java SAX parser for physioML but converted to a Java document object model instead, since new objects in the description are easily defined and it allows the integration of physiology and anatomy. However, ontologies have an advantage over XML because they provide a richer description of relationships (semantic rules) between objects, and interoperability between systems improves. The group is now creating a Digital Human (DH) for the development of an anatomical ontology. The DH requires a problem-solving environment for complex 3D-flow modeling, and for high-performance computing a three-tiered client/server architecture with Java RMI and NetSolve (http://icl.cs.utk.edu/netsolve) is used. He also mentioned that a common component architecture, currently used for climate modeling, is the way of the future for the VH project.
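The SAX-versus-DOM trade-off mentioned above can be shown in miniature: a document object model loads the whole tree into memory, so elements can be looked up and cross-referenced freely rather than processed in one streaming pass. The element and attribute names below are invented for illustration; they are not the actual physioML schema.

```python
import xml.etree.ElementTree as ET

doc = """
<organ name="heart">
  <compartment name="left_ventricle">
    <parameter name="end_diastolic_volume" value="120" units="ml"/>
    <parameter name="ejection_fraction" value="0.6" units="ratio"/>
  </compartment>
</organ>
"""

root = ET.fromstring(doc)
# With the full tree in memory, new element types need no new handler
# code -- the advantage over event-driven SAX parsing noted in the talk.
for param in root.iter("parameter"):
    print(param.get("name"), param.get("value"), param.get("units"))
```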

Protein structure and protein interaction

Jean Garnier (INRA, Versailles) gave an overview of protein structure prediction techniques, describing some of the more widely used tools. Structure prediction tools fall into three classes: those which predict structures from first principles, such as molecular dynamics simulations; those based on statistical analysis, such as hidden Markov models (HMMs); and those based on comparison to known structures, i.e. homology modeling. Jean Garnier stressed the importance of testing different techniques, referring to EVA and CASP. EVA is an automatic evaluation tool that continuously tests publicly available structure prediction servers by feeding them sequences for known new structures from the PDB and comparing the results with the determined structures. CASP is a public competition in which experts attempt to predict structures for proteins whose structures have been determined but not yet made public. So far, structure prediction is most successful for proteins with known folds. Identification of proteins whose structure may be similar to a novel protein is often difficult, and attempts to identify homologues of a new sequence often fail despite the existence of homologues with known structures, because linear sequence alignment is not as conserved as three-dimensional topology. Some recent ‘threading’ programs have been developed with better sensitivity than conventional alignment techniques for detecting distant homologues of novel sequences, such as FORESST (http://abs.cit.nih.gov/foresst/) [3] and FROST (http://www.inra.fr/bia/produits/logiciels/BD2html.php?nom=FROST). The former predicts which known fold best matches a novel sequence, using HMMs based on patterns of secondary structure in proteins of known structure. Secondary structure prediction algorithms are applied to the novel sequence to determine a predicted secondary structure topology, which is then fed to the HMM to identify known structures with matching topologies.
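The matching step described above can be caricatured very simply: reduce a predicted secondary structure to a topology string (H = helix, E = strand) and compare it against strings for known folds. FORESST uses HMMs for this; plain edit distance below is a deliberately crude stand-in, and the fold strings are illustrative only.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[-1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

# Toy fold library keyed by secondary-structure topology.
known_folds = {"TIM barrel": "EHEHEHEHEHEHEHEH", "four-helix bundle": "HHHH"}
query = "EHEHEHEHEHEHEH"  # predicted topology of a novel sequence
best = min(known_folds, key=lambda f: edit_distance(query, known_folds[f]))
print(best)  # TIM barrel
```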

Major evolutionary transitions have arisen as a result of changes in protein repertoires and changes in the expression of protein repertoires. Cyrus Chothia and Bernard de Bono (at the MRC Laboratory of Molecular Biology) have been studying the immunoglobulin superfamily as a model of whole-genome protein repertoires. In his talk, Dr Chothia discussed problems that have arisen from analyses of Ensembl in the context of attempts to determine the complete repertoire of immunoglobulin superfamily proteins in the human genome. Application of HMMs to predicted proteins from Ensembl to identify immunoglobulin superfamily proteins is hampered by various issues. Parts of some genes are absent, while parts of other genes are split and widely separated from each other. Consistency between releases of Ensembl is also a problem: each new Ensembl release contains new protein predictions, while predictions from previous releases may be deleted. The solution involved looking at all nine releases of Ensembl to collectively identify all immunoglobulin superfamily proteins from all the predicted proteins in these releases using HMMs. This was followed by removing redundant predictions and confirming the quality of the predictions that remain using GeneWise.

Environmental healthcare issues and bioinformatics / Individuality and the pursuit of personalised molecular medicine

Kenneth Buetow (National Cancer Institute) is actively involved in the Cancer Genome Anatomy Project (CGAP) and spoke about how combining bioinformatics and genomics gives an insight into the etiology of cancer. The most obvious challenges associated with utilizing large data collections are management, visualisation and interpretation. If bioinformatics is to succeed in integrating knowledge from diverse fields of research, the problem of standardizing terminology and language must be overcome. He discussed how the NCI is using the GO infrastructure in cancer research.

The annotation of genomes requires sensitive tools for the identification of homologous proteins, allowing proteins of unknown function to be linked to homologues with known function. Conventional sequence alignment tools are often insensitive to distant sequence relationships, even though the distantly related sequences may have similar structures and functions. Since structure is typically more conserved than sequence, the use of structure-based homologue recognition tools for genome annotation has significant value. Michael Sternberg (Imperial College) presented the latest results from his group on the development of genome annotation tools based on structural profiles [2,7]. His group has developed tools both to identify distantly related homologues and to assign function to those homologues. The 3D-PSSM tool (http://www.sbg.bio.ic.ac.uk/~3dpssm/) was developed to improve PSI-BLAST predictions and to detect distant sequence relationships using structural profiles from known proteins. An algorithm has also been developed for predicting functional sites in proteins, based on multiple sequence alignment of homologous sequences followed by identification of invariant polar residues. Spatially clustered groups of these residues are considered to be functional sites. Comparison of data for proteins with known functional sites with the multiple alignment predictions showed the system to be 94% accurate. He emphasized the extent to which automated methods can be used, but also how human insight can provide added value to genome annotation projects.
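The first step of the functional-site idea described above is mechanical: scan the columns of a multiple alignment and keep those that are invariant and polar. A minimal sketch (the sequences are made up, and the subsequent spatial-clustering step needs 3D coordinates, so it is omitted):

```python
# One-letter codes for polar / charged residues.
POLAR = set("STNQYCHKRDE")

alignment = [
    "MKHSTAGELR",
    "MRHSTGGDLK",
    "MLHSTAGNLR",
]

# A column qualifies if every sequence carries the same residue
# and that residue is polar.
invariant_polar = [
    (i, col[0])
    for i, col in enumerate(zip(*alignment))
    if len(set(col)) == 1 and col[0] in POLAR
]
print(invariant_polar)  # [(2, 'H'), (3, 'S'), (4, 'T')]
```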

It is proving difficult to develop novel therapeutic molecules for the treatment of multifactorial disorders based upon known targets, despite a steady increase in the number of such targets. In most cases, therapeutic molecules which successfully alleviate a disease do not directly affect the causative gene products; instead they act on the physiological mechanisms that are ultimately triggered. Therefore, there is a need to identify these physiological mechanisms, and an understanding of the underlying events should be the main aim of the research. However, technological and conceptual difficulties must be resolved for functionally targeted drug development to become a reality. Since it is well established that physiological changes result from and induce gene expression pattern changes, the latter, together with pathological states, could be used to determine the physiological mechanisms implicated. However, it is important to consider the isoforms and any protein modifications involved, the types of complexes they associate with and the nature of the systemic environment. François Iris (Hemispherx Biopharma Europe) presented BioGraph, a computer-assisted approach to the determination of phenotype-specific physiological mechanisms for drug discovery. He pointed out the data mining problems of ‘textual informatics’ and discussed how to approach them. BioGraph is an integrative, systematic approach structured as a four-phase physiological modeling process. It takes information from diverse forms and formats to create pertinent knowledge that is directly applied to the biological problems being addressed. In the first phase, data from various biological databases (for example, MedLine, OMIM, KEGG and SPAD) are mapped into a relational graph; in the second phase, the graph is interpreted in the context of the problem of interest, leading to a sub-graph that reveals potential pathological links. The third phase involves the construction of a theoretical physiological model entirely supported by the scientific literature, and the last phase identifies pathway-associated potential intervention points and therapeutic intervention modes. Direct experimental biological verification of the results is then required.
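The second phase described above, extracting a context-specific sub-graph from a relational graph, can be sketched as a bounded breadth-first search. The graph content below is invented for illustration; real BioGraph models are built from curated database records.

```python
from collections import deque

# Toy relational graph: entities linked by curated associations.
edges = {
    "phenotype X": ["gene A", "gene B"],
    "gene A": ["pathway P"],
    "gene B": ["pathway P", "gene C"],
    "pathway P": ["drug target T"],
    "gene C": [],
    "drug target T": [],
}

def subgraph(start: str, max_hops: int) -> set:
    """Entities within max_hops association steps of the start node."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, d = frontier.popleft()
        if d == max_hops:
            continue
        for nxt in edges.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, d + 1))
    return seen

print(sorted(subgraph("phenotype X", 2)))
# ['gene A', 'gene B', 'gene C', 'pathway P', 'phenotype X']
```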

Kosta Steliou (Boston University) gave an overview of population history and genomics. Studies of patient history and genealogy give important insights into the molecular basis of disease. He described how analysis of the mitochondrial genome can be used to segregate patients into medically relevant genetic subgroups. Research has shown that certain mutations in mitochondrial DNA are associated with diverse diseases such as cancer, infertility, diabetes, stroke, deafness and many others (http://www.mitoresearch.org). There are also associations between the A3243G mtDNA mutation and neurodegenerative disorders.

Researching and simulating biomolecular interaction networks and bioinformatics aspects of intercellular integration

Despite the ease with which SNPs can be identified, determining the biological significance of SNPs presents many challenges. In his presentation, John Todd (University of Cambridge) gave examples from his work on type 1 diabetes (T1D) and autoimmune disease. He pointed out that there are two general mechanisms by which SNPs have their effect: either by changing the regulation of the gene or by changing the precise amino acid sequence encoded by the gene. Analysis of the function of SNPs is complicated by epistasis (where the effects of one SNP may be masked by another). To deal with epistasis, large sample sizes are required, and Todd suggested that a case study of around 5000 samples is needed. This implies that SNP screening must become an industrial process once SNPs are recognized in all haplotypes. However, the technology is still approximately 12-fold too expensive, so more cost-effective methods are required. Genetics has to scale up to enable population-based studies, and he currently has funding for a case study of 10 000 DNA patient samples from the UK Paediatric T1D registry. The ‘registries’ provide a diverse range of cases, patient history and other relevant clinical data, such as causes of death. Another challenge in SNP research is understanding splicing complexity.

Matthias Mann (University of Southern Denmark and MDS Proteomics) presented the latest results from his collaborative work on high-throughput mass spectrometric protein complex identification [4]. Libraries of expressed cDNAs encoding ‘bait’ proteins are prepared by a high-throughput recombination procedure, such that the members of the library of expressed proteins are each linked to an epitope tag. This tag may be used to immunoprecipitate the expressed ‘bait’ protein, together with any proteins interacting with it, from whole-cell extracts. The components of the immunoprecipitated complex can then be identified by gel electrophoresis and peptide mass fingerprinting. The technique found three times more interactions than yeast two-hybrid technology in a comprehensive study of the yeast proteome. Some teething troubles were mentioned, such as ectopic expression of the tagged clones and a significant number of false positives. The filtered data set has been deposited in BIND, a powerful protein-protein interaction database (http://www.bind.ca/). Raw data can be obtained from http://www.mdsp.com/yeast.
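Peptide mass fingerprinting, the identification step mentioned above, matches observed peptide masses against masses computed from candidate sequences after an in-silico tryptic digest (cleavage after K or R). A heavily simplified sketch: the sequences, the truncated mass table and the tolerance are all illustrative, not real proteomic data.

```python
import re

WATER = 18.011  # approximate mass of the H2O added per peptide
RESIDUE = {"G": 57.021, "A": 71.037, "S": 87.032, "K": 128.095,
           "R": 156.101, "L": 113.084, "E": 129.043, "V": 99.068}

def tryptic_peptides(seq: str) -> list:
    """Cleave after every K or R (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])", seq) if p]

def peptide_mass(p: str) -> float:
    return sum(RESIDUE[a] for a in p) + WATER

def score(seq: str, observed: list, tol: float = 0.5) -> int:
    """Number of observed masses explained by the candidate sequence."""
    masses = [peptide_mass(p) for p in tryptic_peptides(seq)]
    return sum(any(abs(m - o) <= tol for m in masses) for o in observed)

candidates = {"protA": "GASKLEVR", "protB": "LLLLKGGGR"}
observed = [peptide_mass("GASK"), peptide_mass("LEVR")]  # toy spectrum of protA
best = max(candidates, key=lambda c: score(candidates[c], observed))
print(best)  # protA
```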

Matthias Mann also referred to his latest work, in collaboration with Angus Lamond in Dundee, on the characterisation of the nucleolus, a substructure within the cell nucleus [1]. This project aims to determine which proteins are found in the nucleolus, what they interact with, and what their function is. Nucleolar isolation followed by 2-D gel electrophoresis and peptide mass fingerprinting has been used to identify components of the nucleolus. Issues that have arisen from this project include how to confirm that proteins identified by this process actually reside in the nucleolus rather than being contaminants. Fluorescence localization using Yellow Fluorescent Protein-tagged clones was carried out to confirm nucleolar residence of the proteins identified by mass spectrometry.

Matthias Mann emphasized the need for standards to interpret proteomic data, particularly given the variety of analytical methods that are referred to as proteomic techniques. Predictive tools may become more valuable for functional pathways once more data are available, such as more in vivo data on phosphorylation sites to allow the identification of unknown proteins with phosphorylation sites.

David Rubin presented Cognia, a company that is developing proteomic database products. Cognia has licensed the right to distribute commercial versions of the TRANSFAC and TRANSPATH databases with improved querying tools and interfaces. Cognia is also developing a proprietary relational database product, the Protein Catabolism DataBase (PCDB), a comprehensive database on protein turnover and protein degradation pathways. This will be of value in a number of diseases in which protein turnover is believed to be aberrant, such as certain cancers, neurodegenerative diseases and inflammatory disorders. The PCDB includes data on degradation mechanisms, post-translational modifications, structure, protein-protein interactions and signal transduction. The PCDB product includes search tools to enable Boolean queries, structural searching and tools to enable tailored curation specific to particular organizations or projects. The design of the database is amenable to the inclusion of open source code.

Molecular docking is a technique for determining how well pairs of molecules interact with each other. Pairs of molecules are ‘docked’ in various orientations and the energy of each interaction is scored. The process is repeated to find the best interaction between the pair. Applications include ‘in silico’ drug screening to filter compound libraries prior to actual screening, and prediction of the binding conformation of hits from screening. Michael Vieth (Eli Lilly and Company) addressed various limitations of docking algorithms. The flexibility of proteins, and their ability to adopt novel conformations on ligand binding, are difficult to account for, as are changes in ligand conformation. Multiple X-ray structures and improved structure-activity relationships (SARs) from known protein/ligand complexes can improve predictions for new proteins and/or new ligands.
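The generate-and-score loop described above can be reduced to its skeleton: enumerate candidate poses, score each with an energy function, keep the best. Real docking scores full orientations of flexible molecules; the one-parameter scan below, over the separation of two atoms under a Lennard-Jones potential with roughly argon-like parameters, is purely illustrative.

```python
def lj_energy(r: float, sigma: float = 3.4, epsilon: float = 0.24) -> float:
    """12-6 Lennard-Jones potential: repulsive at short range,
    weakly attractive near the minimum."""
    s6 = (sigma / r) ** 6
    return 4 * epsilon * (s6 * s6 - s6)

# Candidate "poses": separations from 3.0 to ~6.0 angstroms.
candidates = [3.0 + 0.05 * i for i in range(60)]
best_r = min(candidates, key=lj_energy)
print(round(best_r, 2))  # 3.8 -- near the analytic minimum 2**(1/6)*sigma
```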

E-Medicine, medical research and healthcare in an Internet world

Maya Natarajan (Entropia, Inc., http://www.entropia.com/) presented results showing how ‘grid computing’, also known as ‘distributed computing’, can accelerate bioinformatics research. Using BLAST results and results from docking experiments from a pharmaceutical company, she demonstrated how moderately sized networks of PCs that are actively performing menial tasks can provide significant computational power for processor-intensive problems. The computational power derived from such networks grows linearly with the number of computing nodes within the network, and the use of such networks provides a cost-effective solution for high-performance computational needs.
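The linear scaling claimed above falls out of embarrassingly parallel workloads: N independent sequence-comparison tasks spread over k nodes take roughly N/k rounds. A sketch of the partitioning step (task and node names are illustrative):

```python
def partition(tasks: list, n_nodes: int) -> list:
    """Deal independent tasks round-robin across compute nodes."""
    buckets = [[] for _ in range(n_nodes)]
    for i, t in enumerate(tasks):
        buckets[i % n_nodes].append(t)
    return buckets

tasks = [f"seq{i:03d}" for i in range(10)]
for node, work in enumerate(partition(tasks, 3)):
    # Each node's share shrinks as nodes are added, hence the
    # near-linear speed-up for independent tasks.
    print(f"node{node}: {len(work)} tasks")
```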

Barry Robson’s presentation on ‘Health Care and Personalised Medicine’ was a visionary account of what the healthcare system might be like in the future and how this may be achieved. The medicine of the future will take advantage of personal genomic and expression data, but such data cannot be used to their full potential unless there is an effective link to, and accessibility of, patient records. IBM’s vision for a global healthcare system, the Secure Health and Medical Access Network (SHAMAN), provides the infrastructure for this goal. The future prospect of designing personal drugs in real time is achievable. The drugs will be designed for patients on high-performance servers at pharmaceutical companies and FDA offices. However, such a vision will mean that a change in the healthcare system is required. The development of SHAMAN essentially requires the encoding of patient records into digital form, and the team is working closely with a group in Oxford on bioethics issues. An XML standard is used for the clinical document architecture, and various forms of visualization tools are being developed. Mitochondrial DNA may be used as a ‘barcode’ for the record. A major problem is that the data are dirty, sparse and in a high-dimensional space. Algorithms are required to cope with this; for example, knowledge can be reduced into a few terms using mathematical terminologies. Extensions to SHAMAN will include a database of potential pathogens and also a database of animal models of human disease.

Richard Mushlin (IBM T. J. Watson Research Center) mentioned that digital patient data have been around for many years now, and the recent success of these systems is largely due to the diverse forms of data being structured according to standards and brought under the control of integrated applications. Several of these applications are complete enough that virtually no paperwork is involved. However, with the emergence of new multi-source data types such as personal genomic information, these systems must be able to accommodate them electronically, or risk a return to paper records. He emphasized the importance of integrating genomic data with clinical data for making scientific discoveries. Re-evaluation tools for the data will be required as more information is fed into the system.

Pim van der Eijk (OASIS, http://www.oasis-open.org/) gave a high-level introduction to XML and presented the results of a survey of several XML-based languages in biology, with emphasis on schema design, process issues and XML-based knowledge formalisms.

Peter Murray-Rust (Unilever Centre for Molecular Informatics) described an XML-based system for chemical nomenclature. He emphasized the need for agreed ontologies and discussed the problems of overlapping domains. Scientific publications can be used to create a knowledge base, or semantic web, which has great potential for yielding novel knowledge discoveries beyond what is presently found in traditional publications.
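A flavour of what XML-based chemical nomenclature looks like can be given with a small sketch. The fragment below is styled after Chemical Markup Language (CML), but is a simplified illustration rather than a complete or validated CML document.

```python
# Parse a simplified CML-style molecule description and derive a
# molecular formula from it. The fragment is illustrative only.
import xml.etree.ElementTree as ET

cml = """
<molecule id="methane">
  <atomArray>
    <atom id="a1" elementType="C"/>
    <atom id="a2" elementType="H"/>
    <atom id="a3" elementType="H"/>
    <atom id="a4" elementType="H"/>
    <atom id="a5" elementType="H"/>
  </atomArray>
</molecule>
"""

root = ET.fromstring(cml)
elements = [atom.get("elementType") for atom in root.iter("atom")]
formula = {e: elements.count(e) for e in sorted(set(elements))}
print(root.get("id"), formula)  # methane {'C': 1, 'H': 4}
```

Because the chemistry is carried in explicit markup rather than free text, such documents can be indexed, validated against an agreed ontology, and mined by machines, which is what makes published science usable as a knowledge base.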

    Conclusion

A diverse array of subjects was covered by the speakers, and a number of underlying themes emerged relating to the exploitation of genomic data, principally the need for standardization of both data structure and vocabulary. To achieve the integration of disciplines as varied as clinical genetics, structural genomics and proteomics, this process of standardization will need to be far-reaching and achieve global acceptance. Scientific publication can also provide and create a wealth of knowledge; however, publications will need to be in an accessible format for data integration and mining. Furthermore, if medical research is to evolve, academics in particular need to take an ‘industrial approach’ to research in order to achieve the necessary scale of data acquisition and management for genome-wide, population-based studies. To scale up successfully, research processes will require effective automation, particularly of genome annotation and the task of assigning functions to novel sequences. High-throughput structure analysis is beginning to provide automated methods of genome annotation. Accurate functional annotation of genomes will lead to better interpretation of both molecular and clinical genetics, allowing structural and functional information to be linked to clinical phenotypes. The ultimate outcome of this process of integration will be improved healthcare for all.

    References

1. Andersen JS, Lyon CE, Fox AH, et al. 2002. Directed proteomic analysis of the human nucleolus. Curr Biol 12: 1–11.

2. Bates PA, Kelley LA, MacCallum RM, Sternberg MJ. 2001. Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 45: 39–46.

3. Di Francesco V, Munson PJ, Garnier J. 1999. FORESST: fold recognition from secondary structure predictions of proteins. Bioinformatics 15: 131–140.

4. Ho Y, Gruhler A, Heilbut A, et al. 2002. Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature 415: 180–183.

5. Lopez AD, Murray CC. 1998. The global burden of disease, 1990–2020. Nat Med 4: 1241–1243.

6. Rison S, Teichmann SA, Thornton JM. 2002. Homology, pathway distance and chromosomal localisation of the small molecule metabolism enzymes in Escherichia coli. J Mol Biol 00: 00–00.

7. Smith GR, Sternberg MJ. 2002. Prediction of protein–protein interactions by docking methods. Curr Opin Struct Biol 12: 28–35.

8. Teichmann SA, Rison SC, Thornton JM, Riley M, Gough J, Chothia C. 2001. The evolution and structural anatomy of the small molecule metabolic pathways in Escherichia coli. J Mol Biol 311: 693–708.

9. Tiret L, Poirier O, Nicaud V, et al. 2002. Heterogeneity of linkage disequilibrium in human genes has implications for association studies of common diseases. Hum Mol Genet 11: 419–429.

10. Todd AE, Orengo CA, Thornton JM. 2001. Evolution of function in protein superfamilies, from a structural perspective. J Mol Biol 307: 1113–1143.

