From Genotype to Phenotype: Future Perspectives on Data and Services Integration
GEN2PHEN internal report created for a bioinformatics course. It provides a general overview of current Web2.0 applications and how they can be used in bioinformatics.
ADVANCED TOPICS IN COMPUTER ENGINEERING
BIOINFORMATICS
Doctoral Program in Computer Engineering, 2008/2009
Pedro Lopes
TABLE OF CONTENTS
Introduction – The GEN2PHEN Project
Integration Scenarios and Related Work
  Semantic Web
  Social Environments
  Integration
  Summary
Our Ongoing Developments
  DynamicFlow
  DiseaseCard
  Summary
Future Perspectives
  Cloud-computing
  Information Integration
  Data Visualization
  Summary
Conclusion
References
INTRODUCTION – THE GEN2PHEN PROJECT
Bioinformatics is emerging as one of the fastest-growing areas of computer science. Recent hardware and software developments show an evolution faster than Moore’s Law predictions. This development began with the Human Genome Project¹, which succeeded in decoding the complete human genetic code. This generated a tremendous amount of readily available information, and the scientific community rapidly started designing applications, increasing the amount of resources needed in this area. Following the Human Genome Project came the Human Variome Project², which aims to collect information about genome variations and their influence on human health. Along with the latter, the European Community is also sponsoring a bioinformatics project in its Seventh Framework Program: Genotype to Phenotype Databases: a Holistic Solution (GEN2PHEN)³.
¹ Human Genome Project: http://www.ornl.gov/sci/techresources/Human_Genome/home.shtml
² Human Variome Project: http://www.humanvariomeproject.org
³ GEN2PHEN: http://www.gen2phen.org
The GEN2PHEN Project is a collaborative project with 19 partners. Most of the partners are European institutions with relevant work in bioinformatics. GEN2PHEN is an ambitious project that aims to unify human and model organism genetic variation databases, allowing the creation of a central genome browser able to blend GEN2PHEN data and medical data. The overall goal is to create a complete biomedical knowledge environment. The strategy and objectives of this project may be divided into several research areas:
• Analyze the genotype-to-phenotype field and investigate current needs and practices in order to obtain complete knowledge of other ongoing projects with similar objectives. The active biology community must be consulted in order to develop an accurate state-of-the-art document that describes the general process in the field and enables the most correct definition of what this particular area is lacking and which models and technologies are being effectively used.
• Develop standards for the genotype-to-phenotype field of research in order to speed up the standardization process with new data models, nomenclature and technology standards.
• Create generic database components, services and integration infrastructures for the genotype-to-phenotype domain. These solutions will be mostly web applications applying new interface usability standards and customized to their end users. Solutions for genetic and genomic databases will be developed. This objective aims to create a central GEN2PHEN database crossing all the research areas, and a simpler application that can be deployed by any research group.
• Create data search and presentation solutions for genotype-to-phenotype knowledge. Applications designed to fulfil the previous objective won’t be complete without proper search mechanisms that encompass information distributed throughout different applications and architecture layers. The applications must also have an effective interface layer designed to respect the community’s requests.
• Facilitate the population of research and diagnostic genotype-to-phenotype databases by developing new tools and promoting them in the scientific community. The newly developed applications will also support more efficient methods for data insertion, allowing anyone to collaborate in this project.
• Build a major genotype-to-phenotype Internet portal, a GEN2PHEN knowledge centre. This portal will contain all GEN2PHEN-related information, ranging from calendars to databases, from publications to discussion forums.
• Deploy the developed solutions to the community in order to increase researchers’ interest and participation. Several resources will be devoted to advertising, explaining and training researchers in using the developed solutions.
The project’s main focus is on developing and promoting a new generation of applications that will aid different types of researchers in their scientific work and, at the same time, gather and integrate information from different sources to be shared with the community. GEN2PHEN applications have to be state-of-the-art web applications. It is important to research and study the most popular Web2.0 (and upcoming Web3.0) applications in order to improve developers’ knowledge about what captivates users, increasing the general biomedical community’s interest. This research should focus mainly on user interaction issues such as usability, interfaces, “quality of service” and overall user satisfaction. This new wave of applications has to address issues such as semantic data integration, user collaboration, information sharing and improvements to search engine algorithms.
Fig. 1 GEN2PHEN strategy
Developing a simple Rich Internet Application is, by now, a somewhat trivial process that does not require great software engineering and programming knowledge. However, bioinformatics and biomedicine don’t depend only on good-looking interfaces. What matters, and this is the difficult part, is what’s under the hood. Going deeper into the application’s composition, several issues arise, such as data integration, service integration, service orchestration, workflow composition, distributed processing, query expansion and object ontologies.
This report intends to give an overview of the GEN2PHEN project, with special emphasis on the problems of these next-generation web applications. Some solutions under ongoing development will be referred to, as well as systems in development in our workgroup, and how both can help in assessing GEN2PHEN application design.
INTEGRATION SCENARIOS AND RELATED WORK
First of all, it is necessary to understand to whom these new application paradigms will be important and why these generic GEN2PHEN goals are so significant. The biological and biomedical scientific community is watching an exponential increase in the information available. This growth leads, subsequently, to growth in the number of applications (web or desktop) that solve the same specific problems. Along with these new applications come new data sources and new services, and the heterogeneity among them is huge. The main issue one finds when doing scientific research is where to find information. A few years ago this was a problem because of the lack of applications and databases. Now, it is a problem because of the excessive amount of information available on every corner of the web.
Fig. 2 Web2.0 integration
From the users’ perspective, we believe they are looking for a central, unifying portal, customized to their personal status, where they can easily find all the information they need. This is the added value GEN2PHEN solutions may have. Currently, there are innumerable ongoing works focusing on this problem. However, there isn’t a universal solution to all the heterogeneity problems raised by data and service integration. And the problems don’t boil down to this; there are also the novel functionalities made possible by the semantic web [1] and the great developments made in information mining. Following Goble and Stevens’ work [2], one can conclude that not all is well in the kingdom of data integration in bioinformatics and that data integration has a long way to go in order to completely satisfy its initial goals.
The group of applications that should be studied may be divided into three main areas that are largely connected and that foster integration. There are developments in the semantic web and its application in biology, and in how the bridge between generic ontologies and biological ones can be made. Other groups are working on collaboration tools for the community, which offer better information sharing and productivity tools. The largest group is the integration one, which encompasses data integration, service integration, service orchestration, workflow composition and mashup applications.
SEMANTIC WEB
Semantic web developments have the main purpose of describing, with a predefined ontology, all the information existing on the web. The key semantic web components are RDF⁴, OWL⁵ and SPARQL⁶. RDF stands for Resource Description Framework and is a generic metadata model for describing online information and content. OWL is the Web Ontology Language, the ontology-authoring language usually associated with RDF Schema. SPARQL is a recursive acronym for SPARQL Protocol and RDF Query Language; it is a query language, with a syntax resembling SQL, for retrieving information stored in the RDF format. Implementing semantic web architectures is not a trivial task [3] for any kind of data. However, it is important to introduce these metadata structures and algorithms in bioinformatics, as they will become part of Web3.0.
⁴ Resource Description Framework: http://www.w3.org/RDF
⁵ Web Ontology Language: http://www.w3.org/2004/OWL
⁶ SPARQL Query Language for RDF: http://www.w3.org/TR/rdf-sparql-query
By applying semantic web concepts and technologies in bioinformatics, one can access, in a unified manner, several biological documents described with RDF. Automation of processes and improved machine-to-machine data exchange are also enabled by the application of these concepts. Belleau et al. propose Bio2RDF [4], a preliminary approach to creating an engine that provides RDF access to biological data distributed through several databases such as KEGG or NCBI. Bio2RDF⁷ makes all the data available on its website, using only the URL to locate the resources. Splendiani [5] also has a proposal to bring the semantic web to biology, but the implementation isn’t as advanced as Bio2RDF. These are the most recent implementations, but biology and medicine are very difficult scientific areas due to the complexity of defining a proper ontology that covers all the life sciences concepts and terms.
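The RDF triple model and the pattern-matching idea behind SPARQL can be illustrated with a minimal sketch in plain Python. This is not a real RDF store or SPARQL engine: triples are kept as tuples, `None` plays the role of a query variable, and every identifier (the `ex:` prefix, the gene and disease names, the `match` helper) is a made-up assumption for illustration only.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples; every "ex:" identifier is invented for this sketch.
TRIPLES = [
    ("ex:geneA", "ex:associatedWith", "ex:diseaseX"),
    ("ex:geneA", "ex:locatedOn", "ex:chromosome1"),
    ("ex:geneB", "ex:associatedWith", "ex:diseaseX"),
]


def match(triples: Iterable[Triple],
          s: Optional[str] = None,
          p: Optional[str] = None,
          o: Optional[str] = None) -> Iterator[Triple]:
    """Yield the triples that fit the pattern; None acts as a wildcard."""
    pattern = (s, p, o)
    for triple in triples:
        if all(want is None or want == got
               for want, got in zip(pattern, triple)):
            yield triple


# Roughly: SELECT ?gene WHERE { ?gene ex:associatedWith ex:diseaseX }
genes = sorted(subject for subject, _, _ in
               match(TRIPLES, p="ex:associatedWith", o="ex:diseaseX"))
print(genes)
```

Because every statement has the same subject, predicate, object shape, data from different sources can be merged into one list and queried with the same pattern, which is the property the unified access described here relies on.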
SOCIAL ENVIRONMENTS
Social networks and collaboration environments are some of the most popular Web2.0 applications. These applications connect users and allow them to share personal information, music, videos or any other type of data. Additionally, several small applications are developed to integrate information about different users or entertainment areas. For instance, a movies application would allow every user to describe their personal movie tastes; when used in a large-scale environment, it would give the developers important information about cinema that could be used to improve the advertisements shown to the user: a user who likes horror movies would have a greater probability of seeing horror movie ads than one who likes comedies. Facebook⁸ is one of the most widely used social web applications worldwide, with over 120 million users. Using the personal connections, personal preferences and other specific applications, Facebook’s owners have valuable market information. Like Facebook, MySpace⁹ and Google’s Orkut¹⁰ provide almost the same functionalities to users. Experiencing sustained growth is myExperiment¹¹, by Goble and De Roure [6], which is the first bioinformatics social network application where one can connect with others, share files (with a focus on Taverna workflows, detailed later in this report) and create scientific communities. Despite the focus on Taverna, myExperiment provides a rich scientific ecosystem, offering the community a wide range of tools essential in any social collaborative environment. myExperiment also offers access to its services through RESTful programming interfaces; thus, it is possible to build new applications on the framework or to use myExperiment data and tools to improve existing ones.
⁷ Bio2RDF: http://www.bio2rdf.org
⁸ Facebook: http://www.facebook.com
⁹ MySpace: http://www.myspace.com
¹⁰ Orkut: http://www.orkut.com
¹¹ myExperiment: http://www.myexperiment.org
INTEGRATION
Integration is one of the areas of bioinformatics in which most groups are interested and where most work is ongoing. Integration is a research area that includes the aforementioned semantic web and social networking tools, besides other fields such as mashups and workflows. A workflow is a simple sequence of logical steps or activities that are executed independently of each other [7]. Applying this generic concept to bioinformatics, one may assume that a workflow is an organized information flow, connecting distinct services and/or data sources in order to solve a problem in a modular manner. The most used solution for workflow building and execution is Taverna [8, 9]. Taverna is a Java-based desktop application offering a simple interface for workflow composition and execution. It can access several types of services, such as BioMoby [10] or generic WSDL web services. The major setback is that, to integrate services, one must define an XML integration component to assist in piping information from the output of service A to the input of service B. Taverna can also be used from within other applications, allowing access to the results of previously saved workflows or the execution of workflows in real time. One of myExperiment’s functionalities is workflow sharing: one may access a large workflow storage system and find solutions developed by others, or share one’s own workflow and important development information. Currently, Taverna’s greatest flaw is being desktop-based, as we are witnessing a shift in the computational paradigm: web applications are coming to dominate desktop ones.
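The workflow idea described here, where the output of service A is piped into the input of service B, can be sketched in a few lines of Python. This is only an illustration and has nothing to do with Taverna’s actual workflow format: both “services” are hypothetical local functions, and the gene identifier and sequence are invented.

```python
from typing import Callable, List

Step = Callable[[str], str]


def fetch_sequence(gene_id: str) -> str:
    """Hypothetical service A: look up a DNA sequence for a gene identifier."""
    fake_database = {"geneA": "ATGGCC"}  # invented data, stands in for a remote call
    return fake_database[gene_id]


def transcribe(dna: str) -> str:
    """Hypothetical service B: transcribe DNA into RNA (T becomes U)."""
    return dna.replace("T", "U")


def run_workflow(steps: List[Step], initial_input: str) -> str:
    """Run the steps in order, piping each output into the next input."""
    data = initial_input
    for step in steps:
        data = step(data)
    return data


result = run_workflow([fetch_sequence, transcribe], "geneA")
print(result)
```

Each step only has to agree with its neighbours on the data it exchanges, which is what makes the approach modular; in a real setting the functions would wrap remote service calls.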
Alongside workflows there are mashups. Mashups began in the music industry: they were simple mixes of several songs into a single song. With Web2.0, this idea crossed over to web applications. Mashups are web applications that combine information from a predefined collection of data sources or services in a single interface. We can consider a mashup a meta-application: it basically creates a new application by using functionalities provided by other applications. Online, there are several workflow/mashup building frameworks. It is important to mention Yahoo! Pipes¹² and Microsoft Popfly¹³ because they have remarkable interfaces and pre-built components to access the World Wide Web’s most popular websites. Bioinformaticians can use these tools with data from different data sources to develop new applications. Cheung et al. [11] pursued this approach to create a biomedical mashup application. However, these tools weren’t specifically designed for the life sciences. Therefore, several researchers are working on service integration frameworks: de Knikker et al. [12] have a basic web service choreography scenario; Bio-jETI, from Margaria et al. [13], is a similar solution, using the same principles as de Knikker. These tools share a common integration problem: the heterogeneity of the information sources doesn’t allow a fully automated integration solution. Each service stores and offers the data in its own model, increasing the difficulty of concept mapping and information exchange. There isn’t yet an automated tool that offers a simple integration interface allowing the use of components from any arbitrary service. BioMoby [10] is an initiative to create an ontology and central repository of bioinformatics resources. With this semantic framework, one can share or use online services created by others in an almost automated fashion [14]. The BioMoby¹⁴ central repository faces typical resource discovery problems such as validation and duplication. Anyone can add services, and the description provided or the service functionality may not be scientifically valid and may mislead users. Duplication of services is also a problem: there can be any number of services doing the same task, so it is difficult to choose which one best fits the desired requirements.
¹² Yahoo! Pipes: http://pipes.yahoo.com/pipes
¹³ Microsoft Popfly: http://www.popfly.com
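The concept mapping problem mentioned here can be made concrete with a small sketch: two services describe the same gene with different field names and value conventions, and a hand-written mapping brings both into one common model. The service names, field names and records below are all hypothetical and are not taken from any of the cited tools.

```python
from typing import Dict

# Hypothetical records returned by two services for the same gene.
SERVICE_A_RECORD = {"gene_symbol": "GENE1", "chrom": "7"}
SERVICE_B_RECORD = {"symbol": "GENE1", "chromosome": "chr7"}

# Hard-coded concept mappings: source field name -> common field name.
FIELD_MAPS: Dict[str, Dict[str, str]] = {
    "service_a": {"gene_symbol": "symbol", "chrom": "chromosome"},
    "service_b": {"symbol": "symbol", "chromosome": "chromosome"},
}


def normalise_chromosome(value: str) -> str:
    """Strip an optional 'chr' prefix so both conventions compare equal."""
    return value[3:] if value.startswith("chr") else value


def to_common_model(source: str, record: Dict[str, str]) -> Dict[str, str]:
    """Translate one service-specific record into the common model."""
    mapping = FIELD_MAPS[source]
    common = {mapping[key]: value for key, value in record.items() if key in mapping}
    common["chromosome"] = normalise_chromosome(common["chromosome"])
    return common


a = to_common_model("service_a", SERVICE_A_RECORD)
b = to_common_model("service_b", SERVICE_B_RECORD)
print(a == b, a)
```

Every new service needs a new entry in `FIELD_MAPS` written by a developer, which illustrates why this kind of integration does not scale automatically and why shared ontologies are attractive.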
Fig. 3 – Existing developments categories
SUMMARY
Fully automated and dynamic integration is the panacea that developers haven’t yet reached. Workflow and mashup solutions are the most popular ways to integrate services and data sources. However, both imply hard-coding several functionalities, increasing the dependency on developers to add new ones. Applying a semantic web approach to bioinformatics will empower developers to create more independent applications. Describing services and information semantically will allow automated communication between heterogeneous applications. This will enhance existing workflow and mashup applications: it will be easier for users to add new services to existing applications, making them developers of new meta-applications adjusted to their needs.
¹⁴ BioMoby: http://www.biomoby.org
OUR ONGOING DEVELOPMENTS
Our bioinformatics group is, like others, developing software solutions to problems in this specific area. Our earlier work did not focus on integration or the semantic web; it was mostly focused on aiding microarray laboratory research. ANACONDA [15] is a tool to study gene primary structure. The Microarray Information Database (MIND) [16] is a web application that helps researchers analyze microarray experiment results. More abstract than MIND is GeneBrowser [17], a tool for gene expression studies based on gene lists resulting from microarray experiments.
However, web trends and the association with projects like GEN2PHEN or ALERT¹⁵ made it necessary to expand our group’s application range. DynamicFlow [18, 19] is a web-based workflow management application providing Web2.0 semi-autonomous service integration. DiseaseCard [20] is an older application; however, it already implements basic collaboration and integration functionalities that later became famous with Web2.0. Further developments are being studied to implement semantic web engines, mashup applications and novel information visualization techniques.
DYNAMICFLOW
DynamicFlow is a framework for the dynamic integration of heterogeneous information sources. The main goal when developing this framework was to create a novel and agile interface for service integration. The application should have a usable, easy and intuitive interface for solving problems using a “divide and conquer” strategy: the main problem is divided into smaller tasks, each of which can be solved with a certain web service; the tasks are then combined, using the workflow metaphor, creating an information flow from task to task until the final solution is reached. This modular approach could be useful for researchers because it is closer to the plan they follow when solving problems in the wet lab: structuring the problem and then solving it iteratively, using simple tasks in a web application running in their browser.
¹⁵ ALERT Project: http://www.alert-project.org
Fig. 4 DynamicFlow framework model
One of DynamicFlow’s key elements is its innovative model. The three-layered model (Fig. 4) divides the application into: access, the bottom layer, containing the databases and the external services; design, the top layer, where user interactions such as workflow building occur, using AJAX technology and drag-and-drop metaphors; and core, the processing layer, which encompasses server-side processing on the application’s web server and client-side processing in the client’s browser. This division of the processing layer into two separate components is one of the framework’s main features. The web server processes client requests and connects to the authentication server and the framework’s DBMS, but service–application communication and data piping between tasks are processed on the client side, reducing server load and speeding up application execution, with gains in efficiency and response time. This semi-autonomous process of maintaining a valid information flow from one service to the next is possible due to the service definition standard that was previously defined. The standard follows a simple ontology and provides an easy way to edit the available services. Using it, the application can validate workflow consistency, execute the workflow and display intermediate results, all using the browser’s resources. It is a primitive form of semantics in an information integration application.
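How a service definition can support workflow consistency checks is sketched below. This is not the DynamicFlow service definition standard, whose details are not given in this report: the `ServiceDefinition` fields, the data type labels and the three example services are assumptions chosen only to show the principle that each task’s output must match the next task’s input.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ServiceDefinition:
    """Hypothetical, minimal description of a service."""
    name: str
    input_type: str
    output_type: str


def validate_workflow(steps: List[ServiceDefinition]) -> List[str]:
    """Return one message per broken link; an empty list means consistent."""
    errors = []
    for current, following in zip(steps, steps[1:]):
        if current.output_type != following.input_type:
            errors.append(
                f"{current.name} outputs {current.output_type!r} "
                f"but {following.name} expects {following.input_type!r}"
            )
    return errors


# Invented example services.
gene_lookup = ServiceDefinition("gene_lookup", "gene_id", "dna_sequence")
translate = ServiceDefinition("translate", "dna_sequence", "protein_sequence")
structure_search = ServiceDefinition("structure_search", "protein_sequence", "structure_id")

print(validate_workflow([gene_lookup, translate, structure_search]))
print(validate_workflow([gene_lookup, structure_search]))
```

A check of this kind needs only the definitions, not the services themselves, so it could run before any remote call is made.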
The work conducted resulted in a web application prototype that is available for testing and open to new developments. These new developments will address five main topics: perfecting the service definition standard, including semantic web technologies (RDF), interface improvements, new user interactions and widening the service range.
DISEASECARD
The DiseaseCard¹⁶ project began in 2003 with the objective of creating a rare disease link aggregator, integrating information from distributed and heterogeneous medical and genomic databases. The links were gathered by a web crawling engine and grouped into nodes representing concepts (Fig. 5). For instance, for the Peters anomaly¹⁷ disease, the node References contains all the reference sections of the NCBI OMIM¹⁸ database that refer to this disease, and the node Pathology contains Orphanet¹⁹ information about this disease. Along with the external information, each disease also has a forum entry, where any registered user can share their personal experience. A tree, similar to the one in Windows Explorer, shows all the nodes and their collections of links, displaying, in a unified interface, information from the genotype to the phenotype. As we want to gather as much information as possible, rare diseases are the main target due to their strong association between genotype and phenotype. It is important to mention that no database information is replicated: DiseaseCard only stores the links to the shared data. Modern concepts like integration (heterogeneous link gathering) and collaboration (public disease forums) were already considered when developing the system.
¹⁶ DiseaseCard: http://www.diseasecard.org
¹⁷ Peters Anomaly disease card: http://diseasecard.org/evaluateCard.do?diseaseid=604229
¹⁸ OMIM Home: http://www.ncbi.nlm.nih.gov/omim
¹⁹ Orphanet: http://www.orpha.net/consor/cgi-bin/index.php
Fig. 5 DiseaseCard concept map
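The link-aggregator idea, in which only links are stored and grouped under concept nodes, can be sketched as a small data structure. This is an illustration under assumptions and not DiseaseCard’s implementation: the class name, its methods and the `example.org` URLs are hypothetical.

```python
from collections import defaultdict
from typing import DefaultDict, List


class DiseaseCardSketch:
    """Hypothetical card: concept nodes that hold links, never the data itself."""

    def __init__(self, disease: str) -> None:
        self.disease = disease
        self.nodes: DefaultDict[str, List[str]] = defaultdict(list)

    def add_link(self, concept: str, url: str) -> None:
        """Attach a link to a concept node, ignoring exact duplicates."""
        if url not in self.nodes[concept]:
            self.nodes[concept].append(url)

    def render_tree(self) -> str:
        """Render the card as an indented tree: disease, concepts, links."""
        lines = [self.disease]
        for concept in sorted(self.nodes):
            lines.append(f"  {concept}")
            lines.extend(f"    {url}" for url in self.nodes[concept])
        return "\n".join(lines)


card = DiseaseCardSketch("Peters anomaly")
card.add_link("References", "http://example.org/omim/entry")      # placeholder URL
card.add_link("Pathology", "http://example.org/orphanet/entry")   # placeholder URL
card.add_link("References", "http://example.org/omim/entry")      # duplicate, ignored
tree = card.render_tree()
print(tree)
```

Keeping only URLs means the card never goes stale in content, but it does depend on the links staying valid, which is the maintenance issue a crawler has to handle.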
As the application got older, it lost quality: the web crawling engine didn’t automatically adapt to link changes and so, for several concepts, the resulting nodes were empty. In a preliminary analysis of the GEN2PHEN goals and how they can be achieved, we concluded that DiseaseCard was the most adequate solution and should be put under development again. After a careful analysis and the definition of an action plan, its operability was restored, the crawler was corrected, the interface got a new look and DiseaseCard is back on track.
As far as GEN2PHEN is concerned, DiseaseCard will be a simple way to achieve some of the initially proposed goals. In the future, adding GEN2PHEN-related databases and web portals is a priority in order to complete the application. The inclusion of semantics in DiseaseCard and in the portals it crawls will ease the crawling process and improve the precision of the results obtained. Information mining features are also being researched: even though it only stores links, DiseaseCard holds valuable information in those links, which can be useful in new types of queries.
SUMMARY
Both DynamicFlow and DiseaseCard are ongoing projects that will be developed within the GEN2PHEN perspective. The next section details new functionalities, interfaces and user interactions that can be implemented in either of these applications in order to improve their quality.
FUTURE PERSPECTIVES
Web2.0 changed the Internet forever. Developers no longer care only about what the application does, but also about what the users want it to do. Users are now the most important part of the Internet. They produce content, they have their own web footprint, and they are part of a new online community. If Web2.0 is the social web, Web3.0 may be the intelligent web. Although it may sound like science fiction, Web3.0 is nearer than one may think. Different platforms can communicate with each other automatically; “cloud-computing” is taking over the web; the web is getting intelligent with new semantics; distributed applications are being integrated. These facts, which were mere dreams a few years ago, are empowering the Internet with new solutions and establishing it as the platform for everything: productivity, entertainment, research, leisure…
CLOUD‐COMPUTING
New computing paradigms are changing the Internet at the architecture level. GRID [21] architectures are the new solution for distributed computing. Virtualization improvements [22] make virtual machines almost as powerful as real ones. “Cloud-computing” [23] uses the best of both to offer an online development environment. Microsoft with the Azure Services Platform²⁰, Amazon with the Elastic Compute Cloud²¹ and Google with its App Engine²² offer access to virtual machines where anyone can deploy applications, which use distributed resources to guarantee real-time scalability, flexibility and availability.
Following the same trend, new web applications and web application suites are replacing traditional desktop apps. For instance, Microsoft’s Live²³ suite offers almost all the Office suite tools online, and Google²⁴ also has the essential productivity tools online, in the “cloud”.
²⁰ Azure Services Platform: http://www.microsoft.com/azure/default.mspx
²¹ Amazon Elastic Compute Cloud: http://aws.amazon.com/ec2
²² Google App Engine: http://code.google.com/appengine
²³ Microsoft Live: http://www.live.com
²⁴ Google Apps: http://www.google.com/apps
INFORMATION INTEGRATION
Among information integration tools, one can explore mashups and web desktops. Popular mashup applications are personal and customizable web portals, made with gadgets that access almost any web application. Netvibes²⁵ is definitely the most complete personal portal on the Web. However, the most famous is Google’s iGoogle²⁶. Both offer, in a simple interface, the ability to customize a page with any gadgets we want. Available gadgets include e-mail access, calendars, to-do lists, news readers and almost any other interesting tool to include in a single portal.
Fig. 6 iGoogle gadget interface stub
Web desktops are web applications that simulate the traditional desktop environment: there’s wallpaper, icons to access applications, a trash bin, a taskbar and menus for applications. eyeOS²⁷ is a cloud computing operating system that allows any user to work online in a vast set of applications. Besides this, it is also an open source development platform: users can create their own applications and install them on their web desktop.
²⁵ Netvibes: http://www.netvibes.com
²⁶ iGoogle: http://www.google.com/ig
²⁷ eyeOS: http://eyeos.org
DATA VISUALIZATION
Another interesting area is data visualization. Traditionally, search results are listed with a simple description. However, new search engines like Viewzi²⁸ or Searchme²⁹ present results in different, much more visually appealing interfaces. Screenshots are taken from the pages and shown in grids or lists. Results are ordered by date to form a chronological sequence. Information is gathered from distinct search engines in order to better rank the results. Context relations are established among results to create a visual relational tree. These distinct visualizations of the same results are important, as they can offer distinctive insights into the same data. Aiming at improved user interaction and greater user satisfaction, these tools rely on AJAX, Flash or Silverlight to create captivating and usable interfaces.
Fig. 7 Viewzi result grid for gen2phen search
²⁸ Viewzi: http://www.viewzi.com
²⁹ Searchme: http://www.searchme.com
SUMMARY
All the applications and interfaces presented are new solutions that are being considered in several thematic fields. They represent the first step towards the next generation of web applications and open the door to a new level of user interaction.
This new wave of web applications will have repercussions for bioinformatics. New applications like iBioinformatics and BioDesktop, or new result visualization tools, could leave their mark on the bioinformatics world.
From the iGoogle and Netvibes examples, one could develop a similar portal, integrating gadgets and applications in a single interface. iBioinformatics or BioVibes would represent a leap forward in integration and personalization. If one could create a large range of services in the gadget repository, any researcher could customize the application according to their needs, thus creating their own personal meta-application.
BioDesktop or BiOS could be an eyeOS-based bioinformatics and biomedical web desktop. Following the desktop metaphor, one could create a web desktop implementation containing applications and tools useful for researchers. Any user could then have their own personal desktop online, customized according to their own needs and taste.
Integration plays a large role in the future of bioinformatics, but data visualization is also important. Web screenshots are useful to show a preview of the page we’re searching for. This idea could be applied to bioinformatics search results, showing pathway previews or protein structure previews. By arranging the results in grids or lists and using technologies like AJAX, Flash or Silverlight to create new interfaces, one could develop interesting and useful applications.
CONCLUSION
Bioinformatics applications are evolving. Evolution isn’t a simple process and
choosingtherightpath isn’ta trivial task.Thisevolutionprocess isusuallysustainedby
largeprojectsliketheHumanGenomeProjectafewyearsagoortheEuropeanGEN2PHEN
projectnow.
As bioinformatics is evolving, so are other software applications. The trend is to
move the software to theweb and tomake it available, freely, to the entireworld. This
processmaybecomplex,but intheend, thepositiveaspectsruleoverthetradeoffs that
havetobemade.
Forbioinformatics,continuingthisridealongwithstate‐of‐the‐artwebtechnologies
isatremendoustask.Thelifesciencesareaisdefinitelyoneoftheareaswheretheamount
ofdata is larger, andwhere thedifferencesbetweenapplications and services aremore
noticeable. This leads to an enormous complexity in integration heterogeneous
informationsources.
Despite these facts, several groups are working to solve integration problems, each
with its own approach. Semantic web concepts for better machine-to-machine
exchanges and "proprietary" integration frameworks using hard-coded concept mapping
are solutions currently under development. However, there is no perfect solution for
these problems. Fully automatic and dynamic information integration hasn't yet been
achieved and is still science fiction.
Hopefully, applying the perspectives presented here, together with more concepts from
success cases in other areas like entertainment or CRM, will enhance current bioinformatics
web applications and empower developers with tools to design new ones.
REFERENCES
1. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. Sci Am 284 (2001) 34-43
2. Goble, C., Stevens, R.: State of the nation in data integration for bioinformatics.
Journal of Biomedical Informatics 41 (2008) 687-693
3. Fielding, R.: Semantic Web Services Challenge: Architectural Styles and the Design
of Network-based Software Architectures. Semantic Web Services Challenge: Challenge on
Automating Web Services Mediation, Choreography and Discovery: 2006; Stanford
University, USA (2000)
4. Belleau, F., Nolin, M.-A., Tourigny, N., Rigault, P., Morissette, J.: Bio2RDF: Towards a
mashup to build bioinformatics knowledge systems. Journal of Biomedical Informatics 41
(2008) 706-716
5. Splendiani, A.: RDFScape: Semantic Web meets Systems Biology. BMC
Bioinformatics 9 (2008) S6
6. Goble, C.A., De Roure, D.C.: myExperiment: social networking for
workflow-using e-scientists. Proceedings of the 2nd workshop on Workflows in support of
large-scale science. ACM, Monterey, California, USA (2007)
7. Cardoso, J., Sheth, A.: Semantic E-Workflow Composition. Journal of Intelligent
Information Systems (2003)
8. Ludascher, B., Altintas, I., Berkley, C., Higgins, D., Jaeger, E., Jones, M., Lee, E.A.,
Tao, J., Zhao, Y.: Scientific Workflow Management and the Kepler System.
Research Articles, Concurrency and Computation: Practice & Experience 18 (2006) 1039-
1065
9. Oinn, T., Addis, M., Ferris, J., Marvin, D., Senger, M., Greenwood, M., Carver, T.,
Glover, K., Pocock, M.R., Wipat, A., Li, P.: Taverna: a tool for the composition and enactment
of bioinformatics workflows. Bioinformatics 20 (2004) 3045-3054
10. Wilkinson, M., Links, M.: BioMoby: An open source biological web services
proposal. Brief Bioinform 3 (2002) 331-341
11. Cheung, K.-H., Yip, K.Y., Townsend, J.P., Scotch, M.: HCLS 2.0/3.0: Health care and
life sciences data mashup using Web 2.0/3.0. Journal of Biomedical Informatics 41 (2008)
694-705
12. de Knikker, R., Guo, Y., Li, J.-l., Kwan, A., Yip, K., Cheung, D., Cheung, K.-H.: A web
services choreography scenario for interoperating bioinformatics applications. BMC
Bioinformatics 5 (2004) 25
13. Margaria, T., Kubczak, C., Steffen, B.: Bio-jETI: a service integration, design, and
provisioning platform for orchestrated bioinformatics processes. BMC Bioinformatics 9
(2008) S12
14. DiBernardo, M., Pottinger, R., Wilkinson, M.: Semi-automatic web service
composition for the life sciences using the BioMoby semantic web framework. Journal of
Biomedical Informatics 41 (2008) 837-847
15. Pinheiro, M., Afreixo, V., Moura, G., Freitas, A., Santos, M.A.S., Oliveira, J.L.:
Statistical, computational and visualization methodologies to unveil gene primary
structure features. Vol. 45, n.º 2 (2006) p. 163-168
16. Arrais, J., Carreto, L., Santos, M.A.S., Oliveira, J.L.: Collaborative work on microarrays
using MAGE-ML. MGED 9: The meeting of the Microarray Gene Expression Data Society
17. Arrais, J., Santos, B., Fernandes, J., Carreto, L., Santos, M.A.S., Oliveira, J.L.:
GeneBrowser: an approach for integration and functional classification of genomic data.
Vol. 4, n.º 3 (2007)
18. Lopes, P.: Service Integration for Knowledge Extraction. Electronics,
Telecommunications and Informatics Department, Vol. Master of Science. University of
Aveiro, Aveiro (2008)
19. Lopes, P., Arrais, J., Oliveira, J.L.: Dynamic Service Integration using Web-based
Workflows. In: Society, A.C. (ed.): 10th International Conference on Information
Integration and Web Applications & Services. Association for Computing Machinery, Linz,
Austria (2008) 622-625
20. Oliveira, J.L., Dias, G.M.S., Oliveira, I.F.C., Rocha, P.D.N.S.d., Hermosilla, I., Vicente, J.,
Spiteri, I., Martin-Sánchez, F., Pereira, A.M.M.d.S.: DISEASECARD: A Web-based Tool for
the Collaborative Integration of Genetic and Medical Information. 5th International
Symposium, ISBMDA 2004: Biological and Medical Data Analysis (2004) 409-417
21. Nadeem, F., Yousaf, M.M., Ali, M.: Grid Performance Prediction: Requirements,
Framework, and Models. Emerging Technologies, 2006. ICET '06. International Conference
on (2006) 695-702
22. Chen, W., Lu, H., Shen, L., Wang, Z., Xiao, N., Chen, D.: A Novel Hardware Assisted
Full Virtualization Technique. Young Computer Scientists, 2008. ICYCS 2008. The 9th
International Conference for (2008) 1292-1297
23. Vouk, M.A.: Cloud computing - Issues, research and implementations. Information
Technology Interfaces, 2008. ITI 2008. 30th International Conference on (2008) 31-40