Affordable biocomputing for everyone: using the Internet, freeware and open-source software
TRENDS in Biochemical Sciences Vol.27 No.11 November 2002 586 Forum
Public institutions such as hospitals and universities often have limited resources and until recently they have been unable to use the full power of computers as an integral part of the design and analysis of experiments. The high cost of bioanalytical software and the hardware necessary to run it meant that only private companies or well-funded organizations could afford professional biocomputing systems. Even standard office and reference management software is very costly; reference management software is priced at around $300, and the typical office package starts at $400 per license. However, the global adoption of the Internet, particularly within the biological and medical research community, has made a large number of professional tools freely available. Furthermore, with the availability of inexpensive, high-powered desktop computers and several advances in both the accessibility and maturation of freeware and open-source software (see Glossary), there is little need for customized computing platforms running proprietary and expensive office and bioinformatics software. It is now possible to create highly capable computer systems that enable the user to set up a complete working biocomputing platform with nothing more than a desktop computer and an Internet connection.
Here I describe how such a computer system could be created using Microsoft Windows or Macintosh OS, given that many computers come pre-configured with these operating systems (see Glossary). All of the software mentioned here is free, except for Microsoft Windows, and almost all of the software has reached a mature and stable development status (i.e. at least version 1.0). Links and details on the software presented can be found at http://www.madswichmann.dk/biocomputing.html.
System requirements
Hardware requirements have been a stumbling block for many users, but the processing power of modern computers greatly exceeds the requirements of most software. A PC with a 1.2 GHz processor, including a 20 GB hard drive and a 17-inch monitor, can now be purchased for less than $600. Obviously, more-powerful computer systems will increase the execution speed of programs, particularly for intensive tasks such as modeling of proteins and searching through large datasets. PCs with Windows and an Intel Pentium or AMD K6 processor running at 200 MHz will accommodate most computing needs, and with around 4 GB of total hard disk capacity there should be sufficient storage space for most operations. The graphic capabilities of the computer are less important and only become a bottleneck when one is working with visual tools, for instance visualizing molecules.
Previously, the Macintosh OS did not have the seemingly endless number of freeware applications that were available for Windows, but the recent release of Mac OS X has lured open-source developers to the platform, both as a challenge and because of the famous user-friendliness and style of Apple systems (see for example Unix software for your Mac [http://fink.sourceforge.net]). Currently, however, although free software is being developed, Mac OS users have fewer software options than Windows users. Even so, any Apple computer with a PowerPC processor and Mac OS 8.5 or higher should be sufficient for setting up a fully functional biocomputing system.
A biocomputing system based on freeware and open-source software
Open-source and free alternatives to commercial products have recently begun to appear. On a Windows computer, OpenOffice.org is a very capable and completely free office suite (Fig. 1), with every application an office will ever need, including software for word processing, spreadsheets, reference management, a database, presentations, editing formulae and drawing. It even imports and exports to competing file formats, such as Microsoft Office and WordPerfect. On Apple computers, OpenOffice.org has yet to reach a mature level, but it can nevertheless be downloaded as a ‘developer build’ for Mac OS X. Alternatively, AbiWord for Mac OS X or Nisus Writer 4.16 for Mac OS 8–9 are advanced word processors that feature spell-checking, a thesaurus and limited reference management. More advanced reference management can be achieved with Papyrus, which allows up to 200 references in the free version and imports from Medline, Silver Platter and other popular formats.
Fig. 1. OpenOffice.org is similar to other modern office suites, providing tools for word processing, spreadsheets, presentations, and drawing. In this screenshot, an article is being edited in the word processor with an image library open at the top. The red underline beneath some words shows mis-spelled or unknown words. On the right, a formatting dialog is open.
Computer Corner
http://tibs.trends.com 0968-0004/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0968-0004(02)02155-2
MacAnova is a statistical analysis package and, contrary to its name, is available for both Windows and Mac OS. It is not easy to use, but it does supply strong statistical and graphics tools. An equally impressive package is ViSta (for ‘Visual Statistics System’), which is also available for both platforms. However, this has an unconventional graphical user interface that takes some time to get used to.
For advanced photo-editing, The GIMP (Fig. 2) rivals Adobe Photoshop both in number and complexity of functions, but, of the two operating systems covered here, it is unfortunately only available for Windows. Instead, image editing on an Apple can be done in Scion Image. Although not as versatile as The GIMP, it has all the necessary features of a professional image editing application. Both programs are compatible with Photoshop filter plug-ins, giving access to thousands of free image tools.
Adobe portable document format (PDF) files are well suited for distribution of print-quality documents across many operating systems and hardware platforms (cross-platform). As an alternative to the Adobe Acrobat package, PDF files can be created with the free Ghostscript package, which converts almost any file format into the PDF format via an intermediate PostScript file. Alternatively, the incredibly simple PDF995, which is built on Ghostscript, installs as a printer driver and prints directly to a PDF file from any application.
Although both operating systems usually include an Internet browser, it is worth the effort to install the independent, cross-platform Mozilla browser suite. Mozilla includes an e-mail and newsgroup reader, a file transfer protocol (FTP) client and an Internet relay chat program. Installing the PubMed toolbar add-in leaves a permanent toolbar in the Mozilla web browser, providing quick and easy access to important web pages on the National Center for Biotechnology Information (NCBI) website and a direct search interface for many of the NCBI databases (PubMed, GenBank, OMIM, among others). Both OpenOffice.org and Mozilla include HTML and web-page editing software, making it straightforward to maintain a website.
When it comes to computational biology and in silico manipulation of DNA, RNA and protein sequences, a plethora of tools is available over the Internet, for instance through the Baylor College of Medicine BCM Search Launcher [1] or SeWeR, with its attractive user interface that even incorporates outside tools through a customization feature [2]. From both sites, it is possible to submit DNA, RNA or protein sequences for such diverse analyses as sequence alignments, contig assembly, translation, exon–intron identification, phylogenetic clustering, restriction enzyme digestion and homology searching. One can prepare primers for PCR with the Primer3 tool, which attempts to eliminate as many pitfalls of PCR as possible (e.g. primer-dimers and self-complementarity), and allows for specification of experimental variables; for example, primer degeneracy, melting temperature and ion concentration of buffers. The Biology Workbench, co-developed by University of Illinois and UC San Diego [3], is another Internet portal, which supplies many of the tools mentioned above, but also provides the useful option of saving sequences, files and results on the web server for future use.
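To make the primer pitfalls mentioned above more concrete, the following is a minimal Python sketch of three quick checks on a candidate primer: GC content, a rough melting temperature, and 3' self-complementarity. It is not the Primer3 algorithm. The melting temperature uses the simple Wallace rule, Tm = 2(A+T) + 4(G+C) °C, which is only a rough guide for short oligonucleotides, and all function names are hypothetical ones chosen for this illustration.

```python
"""Rough primer sanity checks. An illustrative sketch, not Primer3."""

# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def gc_fraction(seq: str) -> float:
    """Fraction of bases that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq: str) -> int:
    """Wallace rule estimate in degrees C: 2*(A+T) + 4*(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_3prime_self_match(seq: str) -> int:
    """Length of the longest 3' suffix whose reverse complement occurs
    somewhere in the primer, i.e. how many 3' bases could anneal to a
    second copy of the same primer (a crude primer-dimer indicator)."""
    seq = seq.upper()
    best = 0
    for k in range(1, len(seq) + 1):
        if reverse_complement(seq[-k:]) in seq:
            best = k
        else:
            # A longer suffix cannot match once a shorter one fails.
            break
    return best


if __name__ == "__main__":
    primer = "AAAAGCGC"  # made-up sequence, for illustration only
    print("GC fraction:", gc_fraction(primer))
    print("Wallace Tm (deg C):", wallace_tm(primer))
    print("3' self-match length:", longest_3prime_self_match(primer))
```

A long 3' self-match suggests that the primer may pair with itself; dedicated tools such as Primer3 weigh this together with many other factors and use more detailed thermodynamic models.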
Three-dimensional protein structures, for example from the Protein Data Bank [4], can be visualized and manipulated (mutation of amino acids, twisting and turning of side chains and truncation of the protein chain) with the Swiss-PdbViewer [5] (Fig. 3). Swiss-PdbViewer can export to Persistence of Vision Raytracer (POV-Ray), which produces spectacular visual representations of 3D protein models. Two similar programs, Visual Molecular Dynamics (VMD) and The Biodesigner, although still somewhat unstable, are great for creating pictures of protein models, but do not contain all the manipulation features of the Swiss-PdbViewer. Instead, they have the ability to produce high-quality images directly. Finally, if one wants to venture into protein homology modeling, ExPASy offers this over the Internet via the Swiss-Model website that walks the researcher through the process. Secondary-structure predictions of proteins are available from 3D-PSSM [6] and RNA and DNA can be folded and visualized through Zuker and Turner’s mFold server [7,8].
Fig. 2. The GIMP can be used for advanced image editing, supports Adobe Photoshop filter plug-ins and is compatible with most image formats. Here an autoradiograph is being edited with a number of dialog boxes controlling various settings.
The free open-source operating system GNU/Linux
Although it goes beyond the scope of this article, it would be an oversight not to mention the free and very successful operating system GNU/Linux, which is the progenitor of open-source software (for background information on GNU/Linux visit Linux Online at http://www.linux.org/). Apart from the software mentioned above, GNU/Linux offers advanced options, such as multi-computer clustering (for higher processing power), and a large number of software development tools and specialist programming tools. One relevant example is BioPerl, which is a complex and competent bioinformatics programming package that can tap directly into large public databanks and rapidly process hundreds of sequences. GNU/Linux is available for almost any contemporary computer and ~1000 free scientific programs can currently be downloaded from the Internet.
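As an illustration of what processing many sequences in a script can look like, here is a small sketch that reads FASTA-formatted text and reports the length and GC fraction of every record. It is written in Python using only the standard library and does not use BioPerl or reproduce its interface; the function names and the example records are made up for this illustration.

```python
"""Batch summary of FASTA records. A standard-library sketch."""
from typing import Iterator, List, Tuple


def parse_fasta(text: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line.upper())
    if header is not None:
        yield header, "".join(chunks)


def summarize(text: str) -> List[Tuple[str, int, float]]:
    """Return (identifier, length, GC fraction) for every record."""
    rows = []
    for header, seq in parse_fasta(text):
        ident = header.split()[0] if header else ""
        gc = (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0
        rows.append((ident, len(seq), round(gc, 3)))
    return rows


# Made-up records, for illustration only.
EXAMPLE = """>seq1 hypothetical example record
ATGCATGC
ATGC
>seq2 another made-up record
GGGGCCCCAA
"""

if __name__ == "__main__":
    for ident, length, gc in summarize(EXAMPLE):
        print(f"{ident}\t{length}\t{gc:.3f}")
```

Because `parse_fasta` is a generator, records are handled one at a time, so the same loop works whether the input holds two sequences or several hundred.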
Conclusion
The availability and maturation of open-source and free software now means it is possible to create a highly capable and stable biocomputing system for nothing more than the cost of low-budget hardware and an Internet connection. Many corporations (IBM, AOL-Time Warner and HP/Compaq) are supporting the open-source community, leading to continued efforts and growth within this field. Together with the powerful tools and databases available over the Internet, biological computing is no longer limited to well-funded organizations, but is widely available to the general public.
References
1 Smith, R.F. et al. (1996) BCM Search Launcher – an integrated interface to molecular biology data base search and analysis services available on the World Wide Web. Genome Res. 6, 454–462
2 Basu, M.K. (2001) SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services. Bioinformatics 17, 577–578
3 Subramaniam, S. (1998) The Biology Workbench – a seamless database and analysis environment for the biologist. Proteins 32, 1–2
4 Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242
5 Guex, N. and Peitsch, M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714–2723
6 Bates, P.A. et al. (2001) Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 45 (Suppl. 5), 39–46
7 Mathews, D.H. et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940
8 SantaLucia, J., Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A. 95, 1460–1465
Mads Wichmann Matthiessen
Dept of Medical Gastroenterology (54O3), Herlev Research Hospital, Herlev Ringvej 75, 2730 Herlev, Denmark. e-mail: [email protected]
Published online: 19 September 2002
Fig. 3. The Swiss-PdbViewer provides sophisticated visualization and modification tools for three-dimensional protein models. In this example, arginine 54 of HNRNP A1 (1HA1) is mutated to glutamic acid before resubmission to the Swiss-Model web server for energy minimization based on homology modeling. The right-hand panel allows easy selection of individual amino acids, which is very useful when preparing presentations. A rendering of the 3D structure made with POV-Ray, complete with lighting and shadows, is superimposed over the Swiss-PdbViewer.
Glossary
Freeware and open-source software
Open-source software is being developed with the programming code freely available for anyone to obtain, scrutinize and use. This effectively leads to strong peer-reviewing and editing of the code. In some cases thousands of programmers, ranging from teenage amateurs to corporate professionals, participate in software projects. With proper maintenance of the code, very high quality and secure programs can be developed because problems cannot be kept hidden and are addressed openly. A significant characteristic of open-source software is its extensive catalog of applications. Visiting SourceForge.org, a large repository of free software, shows that there are ~2500 projects in the category ‘Scientific/Engineering’ and software catering for almost any need can be found here. Freeware is a less well defined term, but is usually closed-source software that is being distributed for free.
For more information, visit The Free Software Foundation (www.fsf.org) or The Open-Source Initiative (www.opensource.org).
Operating system
The operating system (OS) on a computer is the software that administers all the other programs in the computer. The OS acts as a necessary software layer between the computer’s hardware and the programs by managing requests and communication between the two.