Affordable biocomputing for everyone: using the Internet, freeware and open-source software



TRENDS in Biochemical Sciences Vol.27 No.11 November 2002, p. 586 (Forum)

Public institutions such as hospitals and universities often have limited resources and, until recently, have been unable to use the full power of computers as an integral part of the design and analysis of experiments. The high cost of bioanalytical software and the hardware necessary to run it meant that only private companies or well-funded organizations could afford professional biocomputing systems. Even standard office and reference management software is costly: reference management software is priced at around $300, and the typical office package starts at $400 per license. However, the global adoption of the Internet, particularly within the biological and medical research community, has made a large number of professional tools freely available. Furthermore, with the availability of inexpensive, high-powered desktop computers and advances in both the accessibility and the maturity of FREEWARE and OPEN-SOURCE SOFTWARE (see Glossary), there is little need for customized computing platforms running proprietary and expensive office and bioinformatics software. It is now possible to set up a complete working biocomputing platform with nothing more than a desktop computer and an Internet connection.

Here I describe how such a computer system could be created using Microsoft Windows or Macintosh OS, given that many computers come pre-configured with these OPERATING SYSTEMS. All of the software mentioned here is free, except for Microsoft Windows, and almost all of the software has reached a mature and stable development status (i.e. at least version 1.0). Links and details on the software presented can be found at http://www.madswichmann.dk/biocomputing.html.

System requirements

Hardware requirements have been a stumbling block for many users, but the processing power of modern computers greatly exceeds the requirements of most software. A PC with a 1.2 GHz processor, a 20 GB hard drive and a 17-inch monitor can now be purchased for less than $600. Obviously, more-powerful computer systems will increase the execution speed of programs, particularly for intensive tasks such as modeling of proteins and searching through large datasets. PCs with Windows and an Intel Pentium or AMD K6 processor running at 200 MHz will accommodate most computing needs, and around 4 GB of total hard disk capacity should provide sufficient storage space for most operations. The graphics capabilities of the computer are less important and only become a bottleneck when one is working with visual tools, for instance when visualizing molecules.

Previously, the Macintosh OS did not have the seemingly endless number of freeware applications that were available for Windows, but the recent release of Mac OS X has lured open-source developers to the platform, both as a challenge and because of the famous user-friendliness and style of Apple systems (see for example Unix software for your Mac [http://fink.sourceforge.net]). Currently, however, although free software is being developed, Mac OS users have fewer software options than Windows users. Even so, any Apple computer with a PowerPC processor and Mac OS 8.5 or higher should be sufficient for setting up a fully functional biocomputing system.

A biocomputing system based on freeware and open-source software

Open-source and free alternatives to commercial products have recently begun to appear. On a Windows computer, OpenOffice.org is a very capable and completely free office suite (Fig. 1), with every application an office will ever need, including software for word processing, spreadsheets, reference management, a database, presentations, editing formulae and drawing. It even imports and exports to competing file formats, such as Microsoft Office and WordPerfect. On Apple computers, OpenOffice.org has yet to reach a mature level, but it can nevertheless be downloaded as a ‘developer build’ for Mac OS X. Alternatively, AbiWord for Mac OS X or Nisus Writer 4.16 for Mac OS 8–9 are advanced word processors that feature spell-checking, a thesaurus and limited reference management. More advanced reference management can be achieved with Papyrus, which allows up to 200 references in the free version and imports from Medline, Silver Platter and other popular formats.

Fig. 1. OpenOffice.org is similar to other modern office suites, providing tools for word processing, spreadsheets, presentations, and drawing. In this screenshot, an article is being edited in the word processor with an image-library open at the top. The red underline beneath some words shows mis-spelled or unknown words. On the right, a formatting dialog is open.

http://tibs.trends.com 0968-0004/02/$ – see front matter © 2002 Elsevier Science Ltd. All rights reserved. PII: S0968-0004(02)02155-2

MacAnova is a statistical analysis package that, contrary to its name, is available for both Windows and Mac OS. It is not easy to use, but it does supply strong statistical and graphics tools. An equally impressive package is ViSta (for ‘Visual Statistics System’), which is also available for both platforms. However, it has an unconventional graphical user interface that takes some time to get used to.

For advanced photo-editing, The GIMP (Fig. 2) rivals Adobe Photoshop both in number and complexity of functions, but unfortunately, of the two platforms considered here, it is only available for Windows. Instead, image editing on an Apple can be done in Scion Image. Although not as versatile as The GIMP, it has all the necessary features of a professional image editing application. Both programs are compatible with Photoshop filter plug-ins, giving access to thousands of free image tools.

Adobe portable document format files (PDF files) are well suited for distribution of print-quality documents across many operating systems and hardware platforms (cross platform). As an alternative to the Adobe Acrobat package, PDF files can be created with the free Ghostscript package that converts almost any file format into the PDF format via an intermediate PostScript file. Alternatively, the incredibly simple PDF995, which is built on Ghostscript, installs as a printer driver and prints directly to a PDF file from any application.
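The PostScript-to-PDF step described above can be scripted. The following is a minimal sketch in Python, assuming Ghostscript is installed and its command-line executable is reachable as `gs` (on Windows the console executable is typically named `gswin32c` or `gswin64c`). The function names `build_gs_command` and `ps_to_pdf` and the file name `figure.ps` are hypothetical, chosen only for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_gs_command(ps_path, pdf_path, gs_executable="gs"):
    """Return the Ghostscript argument list for a PostScript-to-PDF conversion."""
    return [
        gs_executable,
        "-dBATCH",            # exit when done instead of waiting at the prompt
        "-dNOPAUSE",          # do not pause between pages
        "-sDEVICE=pdfwrite",  # select the PDF output device
        f"-sOutputFile={pdf_path}",
        str(ps_path),
    ]


def ps_to_pdf(ps_path, gs_executable="gs"):
    """Convert one PostScript file to a PDF stored next to it (hypothetical helper)."""
    if shutil.which(gs_executable) is None:
        raise FileNotFoundError(f"Ghostscript executable '{gs_executable}' not found")
    pdf_path = Path(ps_path).with_suffix(".pdf")
    subprocess.run(build_gs_command(ps_path, pdf_path, gs_executable), check=True)
    return pdf_path


# Show the command that would be run, without invoking Ghostscript.
print(" ".join(build_gs_command("figure.ps", "figure.pdf")))
```

Calling `ps_to_pdf("figure.ps")` would then write `figure.pdf` alongside the input, provided the input file exists and Ghostscript is installed.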

Although both operating systems usually include an Internet browser, it is worth the effort to install the independent, cross-platform Mozilla browser suite. Mozilla includes an e-mail and newsgroup reader, a file transfer protocol (FTP) client and an Internet relay chat program. Installing the PubMed toolbar add-in leaves a permanent toolbar in the Mozilla web browser, providing quick and easy access to important web pages on the National Center for Biotechnology Information (NCBI) website and a direct search interface for many of the NCBI databases (PubMed, GenBank, OMIM, among others). Both OpenOffice.org and Mozilla include HTML and web-page editing software, making it straightforward to maintain a website.

When it comes to computational biology and in silico manipulation of DNA, RNA and protein sequences, a plethora of tools is available over the Internet, for instance through the Baylor College of Medicine BCM Search Launcher [1] or SeWeR, with its attractive user interface that even incorporates outside tools through a customization feature [2]. From both sites, it is possible to submit DNA, RNA or protein sequences for such diverse analyses as sequence alignments, contig assembly, translation, exon–intron identification, phylogenetic clustering, restriction enzyme digestion and homology searching. One can prepare primers for PCR with the Primer3 tool, which attempts to eliminate as many pitfalls of PCR as possible (e.g. primer-dimers and self-complementarity), and allows for specification of experimental variables; for example, primer degeneracy, melting temperature and ion concentration of buffers. The Biology Workbench, co-developed by University of Illinois and UC San Diego [3], is another Internet portal, which supplies many of the tools mentioned above, but also provides the useful option of saving sequences, files and results on the web server for future use.
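To make the primer pitfalls mentioned above concrete, here is a small Python sketch of the kind of checks involved. It is not how Primer3 works internally: the melting temperature uses the simple Wallace rule (2 °C per A/T, 4 °C per G/C), which is only a rough guide for short oligonucleotides, and the complementarity check is a plain longest-match search. All function names and the example primer are hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def gc_fraction(seq):
    """Fraction of bases that are G or C."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def wallace_tm(seq):
    """Rough melting temperature in deg C by the Wallace rule: 2(A+T) + 4(G+C)."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc


def longest_complementary_run(primer_a, primer_b):
    """Length of the longest stretch of primer_a that can pair with primer_b.

    Two primers can anneal (antiparallel) wherever primer_a shares a
    substring with the reverse complement of primer_b, so this is a
    longest-common-substring search by dynamic programming.
    """
    a = primer_a.upper()
    b = reverse_complement(primer_b)
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical primer, used only to exercise the functions.
primer = "GCGAATTCATGGCTAGCAAG"
print("GC fraction:", gc_fraction(primer))
print("Wallace Tm (deg C):", wallace_tm(primer))
print("Longest self-complementary run:", longest_complementary_run(primer, primer))
```

A long self-complementary run (comparing a primer with itself) hints at hairpins or self-dimers, and a long run between the forward and reverse primers hints at primer-dimers. Dedicated tools weigh these matches thermodynamically rather than by length alone.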

Fig. 2. The GIMP can be used for advanced image editing, supports Adobe Photoshop filter plug-ins and is compatible with most image formats. Here an autoradiograph is being edited with a number of dialog boxes controlling various settings.

Three-dimensional protein structures, for example from the Protein Data Bank [4], can be visualized and manipulated (mutation of amino acids, twisting and turning of side chains and truncation of the protein chain) with the Swiss-PdbViewer [5] (Fig. 3). Swiss-PdbViewer can export to Persistence of Vision Raytracer (POV-Ray), which produces spectacular visual representations of 3D protein models. Two similar programs, Visual Molecular Dynamics (VMD) and The Biodesigner, although still somewhat unstable, are great for creating pictures of protein models, but do not contain all the manipulation features of the Swiss-PdbViewer. Instead, they have the ability to produce high-quality images directly. Finally, if one wants to venture into protein homology modeling, ExPASy offers this over the Internet via the Swiss-Model website that walks the researcher through the process. Secondary-structure predictions of proteins are available from 3D-PSSM [6], and RNA and DNA can be folded and visualized through Zuker and Turner’s mFold server [7,8].
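Structure files from the Protein Data Bank are plain text, so simple measurements can also be scripted. The sketch below, in Python, reads the fixed-width `ATOM` records of the legacy PDB format and computes the distance between two alpha-carbon atoms. The helper names are hypothetical, the two atoms are synthetic coordinates generated by `format_atom` rather than data from a real entry, and the writer left-justifies atom names, which is a simplification of the real format.

```python
import math


def format_atom(serial, name, res_name, chain, res_seq, x, y, z):
    """Build a simplified fixed-width ATOM line (synthetic test data only)."""
    return (
        f"ATOM  {serial:>5} {name:<4} {res_name:>3} {chain}{res_seq:>4}    "
        f"{x:8.3f}{y:8.3f}{z:8.3f}"
    )


def parse_atom_line(line):
    """Extract selected fields from an ATOM record using its column layout."""
    return {
        "serial": int(line[6:11]),
        "name": line[12:16].strip(),
        "res_name": line[17:20].strip(),
        "chain": line[21],
        "res_seq": int(line[22:26]),
        # Coordinates are given in angstroms.
        "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
    }


def alpha_carbons(pdb_text):
    """Return the parsed CA atoms found in a block of PDB-format text."""
    atoms = [
        parse_atom_line(line)
        for line in pdb_text.splitlines()
        if line.startswith("ATOM")
    ]
    return [atom for atom in atoms if atom["name"] == "CA"]


# Two synthetic CA atoms; the coordinates are invented for the example.
pdb_text = "\n".join([
    format_atom(1, "CA", "ARG", "A", 54, 1.0, 2.0, 3.0),
    format_atom(2, "CA", "GLU", "A", 55, 4.0, 6.0, 3.0),
])
first, second = alpha_carbons(pdb_text)
print("CA-CA distance (angstroms):", math.dist(first["xyz"], second["xyz"]))
```

For real work, a viewer such as the Swiss-PdbViewer or an established parsing library is the safer choice, because actual entries contain alternate locations, insertion codes and `HETATM` records that this sketch ignores.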

The free open-source operating system GNU/Linux

Although it goes beyond the scope of this article, it would be an oversight not to mention the free and very successful operating system GNU/Linux, which is the progenitor of open-source software (for background information on GNU/Linux visit Linux Online at http://www.linux.org/). Apart from the software mentioned above, GNU/Linux offers advanced options, such as multi-computer clustering (for higher processing power), and a large number of software development tools and specialist programming tools. One relevant example is BioPerl, a complex and competent bioinformatics programming package that can tap directly into large public databanks and rapidly process hundreds of sequences. GNU/Linux is available for almost any contemporary computer, and ~1000 free scientific programs can currently be downloaded from the Internet.
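The idea of scripted batch processing can be illustrated without BioPerl itself. The Python sketch below (Python is used here as a stand-in, not because the article prescribes it) parses a multi-record FASTA text and reports the GC content of each sequence. It also builds a request URL for NCBI's E-utilities `efetch` service; that URL layout postdates this article and is included as an assumption, and no network request is made. The accession strings and function names are placeholders.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI E-utilities efetch service.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accessions, db="nucleotide"):
    """Build (but do not send) a request URL for FASTA records."""
    query = urlencode({
        "db": db,
        "id": ",".join(accessions),
        "rettype": "fasta",
        "retmode": "text",
    })
    return f"{EFETCH_URL}?{query}"


def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def gc_percent(seq):
    """Percentage of G and C bases in a sequence (0.0 for an empty one)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


# Invented records standing in for a downloaded batch.
example = ">seq1 hypothetical record\nATGCGC\nGGTA\n>seq2\nTTAATTAA\n"
for header, sequence in parse_fasta(example):
    print(header, len(sequence), round(gc_percent(sequence), 1))

print(efetch_fasta_url(["ACCESSION_1", "ACCESSION_2"]))
```

The same loop scales from two records to hundreds, which is the practical appeal of a scripting toolkit over submitting sequences one at a time through a web form.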

Conclusion

The availability and maturation of open-source and free software now means it is possible to create a highly capable and stable biocomputing system for nothing more than the cost of low-budget hardware and an Internet connection. Many corporations (IBM, AOL-Time Warner and HP/Compaq) are supporting the open-source community, leading to continued efforts and growth within this field. Together with the powerful tools and databases available over the Internet, biological computing is no longer limited to well-funded organizations, but is widely available to the general public.

References

1 Smith, R.F. et al. (1996) BCM Search Launcher – an integrated interface to molecular biology data base search and analysis services available on the World Wide Web. Genome Res. 6, 454–462

2 Basu, M.K. (2001) SeWeR: a customizable and integrated dynamic HTML interface to bioinformatics services. Bioinformatics 17, 577–578

3 Subramaniam, S. (1998) The Biology Workbench – a seamless database and analysis environment for the biologist. Proteins 32, 1–2

4 Berman, H.M. et al. (2000) The Protein Data Bank. Nucleic Acids Res. 28, 235–242

5 Guex, N. and Peitsch, M.C. (1997) SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis 18, 2714–2723

6 Bates, P.A. et al. (2001) Enhancement of protein modeling by human intervention in applying the automatic programs 3D-JIGSAW and 3D-PSSM. Proteins 45 (Suppl. 5), 39–46

7 Mathews, D.H. et al. (1999) Expanded sequence dependence of thermodynamic parameters improves prediction of RNA secondary structure. J. Mol. Biol. 288, 911–940

8 SantaLucia, J., Jr (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics. Proc. Natl. Acad. Sci. U. S. A. 95, 1460–1465

Mads Wichmann Matthiessen

Dept of Medical Gastroenterology (54O3), Herlev Research Hospital, Herlev Ringvej 75, 2730 Herlev, Denmark. e-mail: [email protected]

Published online: 19 September 2002


Fig. 3. The Swiss-PdbViewer provides sophisticated visualization and modification tools for three-dimensional protein models. In this example, arginine 54 of HNRNP A1 (1HA1) is mutated to glutamic acid before resubmission to the Swiss-Model web server for energy minimization based on homology modeling. The right-hand panel allows easy selection of individual amino acids, which is very useful when preparing presentations. A rendering of the 3D structure made with POV-Ray, complete with lighting and shadows, is superimposed over the Swiss-PdbViewer.

Glossary

Freeware and open-source software

Open-source software is developed with the programming code freely available for anyone to obtain, scrutinize and use. This effectively leads to strong peer-reviewing and editing of the code. In some cases thousands of programmers, ranging from teenage amateurs to corporate professionals, participate in software projects. With proper maintenance of the code, very high quality and secure programs can be developed because problems cannot be kept hidden and are addressed openly. A significant characteristic of open-source software is its extensive catalog of applications. Visiting SourceForge.org, a large repository of free software, shows that there are ~2500 projects in the category ‘Scientific/Engineering’ and software catering for almost any need can be found here. Freeware is a less well defined term, but is usually closed-source software that is being distributed for free.

For more information, visit The Free Software Foundation (www.fsf.org) or The Open-Source Initiative (www.opensource.org).

Operating system

The operating system (OS) on a computer is the software that administers all the other programs in the computer. The OS acts as a necessary software layer between the computer’s hardware and the programs by managing requests and communication between the two.