Upload: bioinformatics-open-source-conference

Post on 28-Dec-2014

DESCRIPTION

EMBOSS: New developments and extended data access (Peter Rice)

TRANSCRIPT

Page 1: G09-Misc-EMBOSS

EBI is an Outstation of the European Molecular Biology Laboratory.

EMBOSS

European Molecular Biology Open Software Suite

Open-Bio Project Update 2011

Peter Rice [email protected]

Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

Page 2: G09-Misc-EMBOSS

BOSC 2011: EMBOSS 10.04.23

A quick introduction

• Open source package for sequence analysis
• ANSI C source code
• GPL licensed applications, LGPL libraries
• 275+ applications
• 150+ third party applications in 15 associated packages
  • MIRA, MEME, HMMER, PHYLIP, VIENNA, etc.
• Project started 1996 at Sanger and Daresbury/HGMP
• Now based at EBI
• Release 1.0.0 15th July 2000
• Release 6.4.0 15th July 2011
• Funded by UK-BBSRC and EMBL-EBI
• Originally funded by the Wellcome Trust
• Additional funds from UK-MRC

Page 3: G09-Misc-EMBOSS

Who do we serve?

• Expert software developers
  • Bioinformaticians
  • Computer scientists
• Expert users
  • Biology research community
  • Industry
• Scientific users
  • Biology research community
  • Industry

Page 4: G09-Misc-EMBOSS

EMBOSS command line interface

• EMBOSS applications run from the command line
  • This is not the only interface
• There are over 100 interfaces and packaged systems available
  • Web: wEMBOSS, Mobyle
  • GUI: Jemboss
  • Web Services: SoapLab
  • Workflows: Galaxy, Taverna, Pipeline Pilot
  • Windows: mEMBOSS
• All applications have a command definition file (.acd)
  • Defines all inputs, outputs, and other options
  • Read at startup
  • Contains all command line options with descriptions
  • Template for any other interface
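As a rough illustration of how a wrapper might turn a set of named options into an EMBOSS-style command line, here is a minimal Python sketch. It is not taken from the slides: the helper name `build_command` and the file names are hypothetical, and a real interface would take its option names and types from the application's .acd file rather than from a hand-written dictionary. The only EMBOSS conventions assumed are `-qualifier value` pairs, the `-no` prefix for switching a boolean qualifier off, and `-auto` to suppress prompting.

```python
import shlex


def build_command(program, options, auto=True):
    """Return an argv list for an EMBOSS-style command line.

    `options` maps qualifier names to values. True emits a bare
    "-name" flag, False emits "-noname", anything else emits
    "-name value".
    """
    argv = [program]
    for qualifier, value in options.items():
        if value is True:
            argv.append(f"-{qualifier}")
        elif value is False:
            argv.append(f"-no{qualifier}")
        else:
            argv.extend([f"-{qualifier}", str(value)])
    if auto:
        argv.append("-auto")  # run without interactive prompts
    return argv


# Hypothetical file names, for illustration only.
argv = build_command(
    "seqret", {"sequence": "input.fasta", "outseq": "output.fasta"}
)
print(shlex.join(argv))
```

The argv list can be passed to `subprocess.run` as-is, which avoids shell quoting problems when file names contain spaces.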

Page 5: G09-Misc-EMBOSS

EMBOSS Update

• Release 6.4.0 as usual on 15th July 2011
• New Website emboss.open-bio.org
• Three open source books: users, developers, admin
  • Cambridge University Press

Page 6: G09-Misc-EMBOSS

Data sources for EMBOSS

• Server definitions
  • One server, 100+ databases
  • server:dbname as the database name
• Data access methods
  • Ensembl, DAS, BioMart, CHADO, SRS, Entrez, MRS
  • EBI REST and SOAP services
  • Data Resource Catalogue (DRCAT)
• emboss.standard file for all installations
  • IF-ELSE-ENDIF to customize for SQL, AXIS2C, local setup
• New applications
  • showserver, dbtell, servertell
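The server:dbname naming scheme can be pictured with a small Python sketch. This only illustrates the idea of one server definition exposing many databases. It is not how EMBOSS resolves names internally, and the server and database names below (`exampleserver`, `dbone`, `dbtwo`) are made up.

```python
from typing import NamedTuple, Optional


class DatabaseName(NamedTuple):
    server: Optional[str]  # None when no server prefix is given
    dbname: str


def parse_database_name(name):
    """Split "server:dbname" into its parts; a bare name has no server."""
    server, sep, dbname = name.partition(":")
    if not sep:
        return DatabaseName(None, name)
    if not server or not dbname:
        raise ValueError(f"malformed database name: {name!r}")
    return DatabaseName(server, dbname)


# Hypothetical registry: one server definition, several databases.
SERVERS = {"exampleserver": {"dbone", "dbtwo"}}


def is_known(name):
    """Check a server:dbname reference against the toy registry."""
    parsed = parse_database_name(name)
    return parsed.dbname in SERVERS.get(parsed.server, set())


print(parse_database_name("exampleserver:dbone"))
print(is_known("exampleserver:dbone"), is_known("exampleserver:missing"))
```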

Page 7: G09-Misc-EMBOSS

New data types: input and output

• OBO ontology terms
• NCBI Taxonomy
• Data Resource Catalogue entries
• Text
• URL
• Cross-references:
  • dbname and identifier
  • data content

Page 8: G09-Misc-EMBOSS

New query language

• SRS-like syntax
  • id lists: dbname:{ida,idb,idc}
  • or operator: dbname-{id:h* | des:hemoglobin}
  • and operator: dbname-{id:h* & des:hemoglobin}
  • eor (exclusive or) operator: dbname-{id:h* ^ des:hemoglobin}
• Compressed (20-fold) b+tree indexes
• New indexing applications (obo, taxon, drcat)
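The query forms listed on this slide are plain strings, so a script can assemble them mechanically. The Python sketch below reproduces the slide's examples. The function names are hypothetical, and it does no escaping or validation, so it should be read as a string-formatting illustration rather than a full implementation of the query language.

```python
# Operator symbols as shown on the slide.
OPERATORS = {"or": "|", "and": "&", "eor": "^"}


def id_list_query(dbname, ids):
    """Build an id-list query such as dbname:{ida,idb,idc}."""
    return dbname + ":{" + ",".join(ids) + "}"


def field_query(dbname, operator, terms):
    """Build a field query such as dbname-{id:h* | des:hemoglobin}.

    `terms` is a sequence of (field, pattern) pairs joined by one operator.
    """
    symbol = OPERATORS[operator]
    joined = f" {symbol} ".join(f"{field}:{pattern}" for field, pattern in terms)
    return dbname + "-{" + joined + "}"


terms = [("id", "h*"), ("des", "hemoglobin")]
print(id_list_query("dbname", ["ida", "idb", "idc"]))
for op in ("or", "and", "eor"):
    print(field_query("dbname", op, terms))
```

On a shell command line such a query would normally need quoting, since `{`, `|`, `&`, `*` and `^` are all shell metacharacters.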

Page 9: G09-Misc-EMBOSS

EDAM ontology

• EDAM defines topic, operation, data, format, identifier
  • ACD file application, inputs, outputs, parameters
  • DRCAT resources, queries, identifiers
  • SoapLab web services
  • Redefined EMBOSS program groups
• OBO format ontology
  • 2835 terms
  • Available throughout EMBOSS as database EDAM:
• New applications
  • EDAM namespace searches, relation queries
  • OBO ontology applications
  • GO, SO, and other OBO ontologies in release

Page 10: G09-Misc-EMBOSS

DRCAT Data Resource Catalogue

• Public Data Resources
  • EDAM annotations
  • UniProt and EMBL/GenBank/DDBJ cross-references
  • Query prototypes
  • Example identifiers for testing
  • 662 entries
  • Available in EMBOSS as database DRCAT:
• Applications:
  • Search by EDAM annotation
  • Search by 18 indexed fields

Page 11: G09-Misc-EMBOSS

Ontologies: NCBI Taxonomy

• Parsers for “.dmp” files
• Indexed by dbxtax
• Navigation up, down, siblings (the usual suspects)
• Automatic cross references from sequence data
  • EMBL source line
  • UniProt OX lines
  • BioMart mart name (organism name)
  • etc.
• New applications
  • Search and retrieve from taxon hierarchy
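The up, down and sibling navigation mentioned above can be sketched in a few lines of Python. This is an independent illustration, not the EMBOSS implementation or its index format. It assumes the `nodes.dmp` layout of the NCBI Taxonomy dump, where fields are separated by a tab, a vertical bar and a tab, the first field is the taxon id, the second is the parent id, and the root is its own parent. The sample records use toy ids, not real taxa.

```python
from collections import defaultdict


def parse_nodes(lines):
    """Map each taxon id to its parent id from nodes.dmp-style lines."""
    parent = {}
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        parent[int(fields[0])] = int(fields[1])
    return parent


def build_children(parent):
    """Invert the parent map; the self-parented root is not its own child."""
    children = defaultdict(list)
    for tax_id, parent_id in parent.items():
        if tax_id != parent_id:
            children[parent_id].append(tax_id)
    return children


def lineage(parent, tax_id):
    """Walk up from tax_id to the root, returning the path of ids."""
    path = [tax_id]
    while parent[path[-1]] != path[-1]:
        path.append(parent[path[-1]])
    return path


def siblings(parent, children, tax_id):
    """Return the other taxa sharing tax_id's parent (none for the root)."""
    if parent[tax_id] == tax_id:
        return []
    return [t for t in children.get(parent[tax_id], []) if t != tax_id]


# Toy records in nodes.dmp layout: id, parent id, rank.
sample = [
    "1\t|\t1\t|\tno rank\t|\n",
    "2\t|\t1\t|\tno rank\t|\n",
    "3\t|\t1\t|\tno rank\t|\n",
    "4\t|\t2\t|\tno rank\t|\n",
]
parent = parse_nodes(sample)
children = build_children(parent)
print(lineage(parent, 4))             # up
print(children.get(2, []))            # down
print(siblings(parent, children, 2))  # siblings
```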

Page 12: G09-Misc-EMBOSS

Installation

• Release size increased
  • EDAM, DRCAT, NCBI Taxonomy, GO, plus index files
• Associated packages
  • AXIS2C (SOAP web service access)
  • MySQL (Ensembl)
  • PostgreSQL (FlyBase)
• mEMBOSS for Windows
• Enhanced QA testing
  • Standard test set adapted for use on Windows and Unix

Page 13: G09-Misc-EMBOSS

EMBOSS Interfaces and wrappers

• Two releases in this year
  • Too many for other projects to keep up
• So we are obliged to help, starting with:
  • SoapLab2
  • Jemboss
  • Galaxy
  • Mobyle
  • … and anyone else who asks
• Interface generation should be automated
  • Tested during development
  • Changes highlighted before release

Page 14: G09-Misc-EMBOSS

EMBOSS Future Plans

• Further development this year
  • Mapped short reads
  • Reference sequences
  • Sequence variation
  • Genome browser data format support

• Leaving EBI in December

• … into the unknown

• …still supporting EMBOSS and planning new developments

Page 15: G09-Misc-EMBOSS

The EMBOSS Team

Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster

Page 16: G09-Misc-EMBOSS

Acknowledgements

• EBI: Peter Rice, Alan Bleasby, Jon Ison, Mahmut Uludag, Michael Schuster, Martin Senger, Tom Oinn, Jaina Mistry, Rodrigo Lopez, Sharmilla Pillai, Hamish McWilliam, Syed Haider

• RFCGR/HGMP: Alan Bleasby, Jon Ison, Tim Carver, Hugh Morgan, Claude Beazley, Lisa Mullan, Damian Counsell, Gary Williams, Val Curwen, Mark Faller, Sinead O’Leary, Thon deBoer, Martin Bishop

• LION: Thomas Laurent, Bijay Jassal, Bren Vaughan, Thure Etzold

• Sanger Institute: Ian Longden, Richard Bruskiewich, Simon Kelley

• National bioinformatics service providers in: Norway, Spain, Italy, Netherlands, Germany, Belgium, Russia, China, Canada, Australia, Argentina

• Others: Catherine Letondal, Don Gilbert, Rodger Staden, Bill Pearson, Webb Miller, Marie-Laetitia Denayer, Amandine Schurmann, Gabriele Weiler, Luke McCarthy, David Mathog, David Bauer, Henrikki Almusa, Thomas Siegmund, Scott Markel, Darryl Leon, Bastien Chevreux, Ivo Hofacker, Kristoffer Rapacki, Matus Kalas

• Cambridge University Press, LION bioscience, IBM, Hewlett-Packard (Compaq), Apple, SGI, Sun, SciTegic, Microsoft Research

• Open-Bio Foundation, SourceForge, ... and the British Antarctic Survey

http://emboss.open-bio.org

http://emboss.open-bio.org/wiki