EMBOSS: New developments and extended data access (Peter Rice)
EBI is an Outstation of the European Molecular Biology Laboratory.
EMBOSS
European Molecular Biology Open Software Suite
Open-Bio Project Update 2011
Peter Rice
Alan Bleasby, Jon Ison,
Mahmut Uludag, Michael Schuster
BOSC 2011: EMBOSS
A quick introduction
• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 275+ applications
• 150+ third party applications in 15 associated packages
  • MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
• Project started 1996 at Sanger and Daresbury/HGMP
• Now based at EBI
• Release 1.0.0 15th July 2000
• Release 6.4.0 15th July 2011
• Funded by UK-BBSRC and EMBL-EBI
• Originally funded by the Wellcome Trust
• Additional funds from UK-MRC
Who do we serve?
• Expert software developers
  • Bioinformaticians
  • Computer scientists
• Expert users
  • Biology research community
  • Industry
• Scientific users
  • Biology research community
  • Industry
EMBOSS command line interface
• EMBOSS applications run from the command line
  • This is not the only interface
• There are over 100 interfaces and packaged systems available
  • Web: wEMBOSS, Mobyle
  • GUI: Jemboss
  • Web Services: SoapLab
  • Workflows: Galaxy, Taverna, Pipeline Pilot
  • Windows: mEMBOSS
• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface
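Because every option is declared in the application's ACD file, a wrapper mostly needs to turn option names and values into command line qualifiers. The following is a minimal Python sketch of that idea, not part of EMBOSS itself: `build_command` is a hypothetical helper, and the `seqret` call with the `tsw:hba_human` entry and the output file name is only an assumed example that requires a configured test database to actually run.

```python
import shlex


def build_command(program, qualifiers, auto=True):
    """Build an argv list for an EMBOSS program (hypothetical helper).

    qualifiers maps option names (as declared in the program's ACD file)
    to values; each becomes "-name value" on the command line.
    """
    argv = [program]
    for name, value in qualifiers.items():
        argv.append("-" + name)
        argv.append(str(value))
    if auto:
        # -auto switches off interactive prompting, which wrappers usually want.
        argv.append("-auto")
    return argv


# Assumed example: fetch one entry with seqret and write it to a file.
cmd = build_command(
    "seqret",
    {"sequence": "tsw:hba_human", "outseq": "hba_human.fasta"},
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` on a machine where EMBOSS is installed.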
EMBOSS Update
• Release 6.4.0 as usual on 15th July 2011
• New website: emboss.open-bio.org
• Three open source books: users, developers, admin
  • Cambridge University Press
Data sources for EMBOSS
• Server definitions
  • One server, 100+ databases
  • server:dbname as the database name
• Data access methods
  • Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
  • EBI REST and SOAP services
  • Data Resource Catalogue (DRCAT)
• emboss.standard file for all installations
  • IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup
• New applications
  • showserver, dbtell, servertell
New data types: input and output
• OBO ontology terms
• NCBI Taxonomy
• Data Resource Catalogue entries
• Text
• URL
• Cross-references:
  • dbname and identifier
  • data content
New query language
• SRS-like syntax
  • id lists: dbname:{ida,idb,idc}
  • or operator: dbname-{id:h* | des:hemoglobin}
  • and operator: dbname-{id:h* & des:hemoglobin}
  • eor (exclusive or) operator: dbname-{id:h* ^ des:hemoglobin}
• Compressed (20-fold) b+tree indexes
• New indexing applications (obo, taxon, drcat)
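The query forms listed on this slide are regular enough to compose programmatically. The sketch below builds the same strings in Python; the helper names `id_list_query` and `field_query` are hypothetical, and the code only reproduces the syntax as shown on the slide without validating it against an EMBOSS installation.

```python
# Operators as they appear on the slide: or, and, eor (exclusive or).
OPERATORS = {"or": "|", "and": "&", "eor": "^"}


def id_list_query(dbname, ids):
    """Compose an id-list query such as dbname:{ida,idb,idc}."""
    return dbname + ":{" + ",".join(ids) + "}"


def field_query(dbname, terms, operator):
    """Compose a field query such as dbname-{id:h* | des:hemoglobin}.

    terms is a list of (field, pattern) pairs; operator is "or", "and" or "eor".
    """
    symbol = OPERATORS[operator]  # raises KeyError for an unknown operator
    parts = [field + ":" + pattern for field, pattern in terms]
    return dbname + "-{" + (" " + symbol + " ").join(parts) + "}"


terms = [("id", "h*"), ("des", "hemoglobin")]
print(id_list_query("dbname", ["ida", "idb", "idc"]))
for op in ("or", "and", "eor"):
    print(field_query("dbname", terms, op))
```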
EDAM ontology
• EDAM defines topic, operation, data, format, identifier
  • ACD file application, inputs, outputs, parameters
  • DRCAT resources, queries, identifiers
  • SoapLab web services
  • Redefined EMBOSS program groups
• OBO format ontology
  • 2835 terms
  • Available throughout EMBOSS as database EDAM:
• New applications
  • EDAM namespace searches, relation queries
  • OBO ontology applications
  • GO, SO, and other OBO ontologies in release
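To illustrate what "OBO format" and "relation queries" involve, here is a minimal Python sketch that reads `[Term]` stanzas and follows `is_a` relations. It is not the EMBOSS implementation (which is ANSI C and index-based), it handles only the `id`, `name` and `is_a` tags, and the `EX:` identifiers in the sample are made up for illustration rather than taken from EDAM.

```python
def parse_obo_terms(text):
    """Parse [Term] stanzas from OBO text into {id: {"id", "name", "is_a"}}."""
    terms = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("!"):
            continue  # skip blank lines and comment lines
        if line.startswith("["):
            # Only [Term] stanzas are kept; [Typedef] and others are ignored.
            current = {"is_a": []} if line == "[Term]" else None
            continue
        if current is None:
            continue  # header lines or lines inside an ignored stanza
        tag, _, value = line.partition(":")
        value = value.split(" ! ")[0].strip()  # drop a trailing "! comment"
        if tag == "id":
            current["id"] = value
            terms[value] = current
        elif tag == "name":
            current["name"] = value
        elif tag == "is_a":
            current["is_a"].append(value)
    return terms


def ancestors(terms, term_id):
    """Return all terms reachable from term_id through is_a relations."""
    seen = []
    stack = list(terms[term_id]["is_a"])
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(terms.get(parent, {}).get("is_a", []))
    return seen


# Made-up sample ontology, for illustration only.
SAMPLE = """
format-version: 1.2

[Term]
id: EX:0001
name: data

[Term]
id: EX:0002
name: sequence
is_a: EX:0001 ! data

[Term]
id: EX:0003
name: protein sequence
is_a: EX:0002 ! sequence
"""

terms = parse_obo_terms(SAMPLE)
print(terms["EX:0003"]["name"], ancestors(terms, "EX:0003"))
```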
DRCAT Data Resource Catalogue
• Public Data Resources
  • EDAM annotations
  • UniProt and EMBL/GenBank/DDBJ cross-references
  • Query prototypes
  • Example identifiers for testing
  • 662 entries
  • Available in EMBOSS as database DRCAT:
• Applications:
  • Search by EDAM annotation
  • Search by 18 indexed fields
Ontologies: NCBI Taxonomy
• Parsers for “.dmp” files
• Indexed by dbxtax
• Navigation up, down, siblings (the usual suspects)
• Automatic cross references from sequence data
  • EMBL source line
  • UniProt OX lines
  • BioMart mart name (organism name)
  • etc.
• New applications
  • Search and retrieve from taxon hierarchy
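The "up, down, siblings" navigation can be sketched directly from the layout of the NCBI `nodes.dmp` file, where fields are separated by tab-pipe-tab and the first two fields are the taxon id and its parent id. The Python below is an illustrative in-memory version, not the indexed dbxtax approach used by EMBOSS; the function names are hypothetical and the four-node tree uses toy ids rather than real taxonomy entries.

```python
from collections import defaultdict


def parse_dmp_line(line):
    """Split one .dmp line into fields (separator is tab, pipe, tab)."""
    line = line.rstrip("\n")
    if line.endswith("\t|"):
        line = line[:-2]  # remove the end-of-record marker
    return line.split("\t|\t")


def load_nodes(lines):
    """Build parent and children maps from nodes.dmp-style lines."""
    parent = {}
    children = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        fields = parse_dmp_line(line)
        tax_id, parent_id = int(fields[0]), int(fields[1])
        parent[tax_id] = parent_id
        if tax_id != parent_id:  # the root node is its own parent
            children[parent_id].append(tax_id)
    return parent, children


def lineage(parent, tax_id):
    """Navigate up: the path from tax_id to the root."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def siblings(parent, children, tax_id):
    """Other taxa sharing the same parent; the root has none."""
    if parent[tax_id] == tax_id:
        return []
    return [t for t in children[parent[tax_id]] if t != tax_id]


# Toy tree with made-up ids: 1 is the root, 10 and 11 its children, 100 under 10.
TOY_NODES = [
    "1\t|\t1\t|\tno rank\t|\n",
    "10\t|\t1\t|\tgenus\t|\n",
    "11\t|\t1\t|\tgenus\t|\n",
    "100\t|\t10\t|\tspecies\t|\n",
]

parent, children = load_nodes(TOY_NODES)
print(lineage(parent, 100), children[1], siblings(parent, children, 10))
```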
Installation
• Release size increased
  • EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
• Associated packages
  • AXIS2C (SOAP web service access)
  • MySQL (Ensembl)
  • PostgreSQL (FlyBase)
• mEMBOSS for Windows
• Enhanced QA testing
  • Standard test set adapted for use on Windows and Unix
EMBOSS Interfaces and wrappers
• Two releases this year
  • Too many for other projects to keep up
• So we are obliged to help, starting with:
  • SoapLab2
  • Jemboss
  • Galaxy
  • Mobyle
  • … and anyone else who asks
• Interface generation should be automated
  • Tested during development
  • Changes highlighted before release
EMBOSS Future Plans
• Further development this year
  • Mapped short reads
  • Reference sequences
  • Sequence variation
  • Genome browser data format support
• Leaving EBI in December
  • … into the unknown
  • … still supporting EMBOSS and planning new developments
The EMBOSS Team
Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster
Acknowledgements
• EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider
• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop
• LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold
• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley
• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina
• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas
• Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research
• Open-Bio Foundation, SourceForge, ... and the British Antarctic Survey
http://emboss.open-bio.org
http://emboss.open-bio.org/wiki