owen bosc2010 taverna2.2-cows

Post on 11-Jun-2015


Page 1: Owen bosc2010 taverna2.2-cows

http://taverna.org.uk

Analysing African and European cattle with Taverna 2.2

Stuart Owen

Based on the work by:

Professor Andy Brass and Mohammad Khodadadi, University of Manchester, UK

Harry Noyes and Steve Kemp, University of Liverpool, UK

BOSC2010 – Boston.

Page 2

Analysing African and European cattle with Taverna 2.2

A bioinformatics case study demonstrating the use of the Taverna 2 workflow system

This is a snapshot of some exciting science that is currently in progress.

Page 3

Analysing African and European cattle with Taverna 2.2

• 10,000 years of separation

• African livestock adaptations:

– Hardier

– Better disease resistance

• Potential outcomes:

– Food security

– Understanding resistance

– Understanding environmental conditions: drought, parasites

– Understanding diversity

http://news.bbc.co.uk/1/hi/science_and_environment/10403254.stm

http://www.sciencemag.org/cgi/content/full/328/5986/1640

Page 4

Workflow and phases

MAP → FILTER → ANALYSIS

Page 5

Workflow and phases

Input SNP file

Populate DB with starting SNPs and resource version numbers

LiftOver: maps between the UMD3 and BTA4 cow assemblies

Exon positions from ENSEMBL

Find SNPs in exon regions

PolyPhen to mark “dangerous” SNPs
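The slides do not show the workflow's internals. As a minimal sketch only, assuming MAP corresponds to LiftOver, FILTER to the exon lookup and ANALYSIS to PolyPhen, the chain of steps above can be pictured as stages that each take and return a list of SNP records, with a runner that records how many SNPs survive each stage. The lookup tables LIFT, EXONS and DAMAGING, and every function name here, are hypothetical toy stand-ins, not the real services or the authors' Taverna workflow.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Snp:
    chrom: str
    pos: int                          # position on the current assembly
    allele: str
    gene: Optional[str] = None        # filled in by the exon stage
    damaging: Optional[bool] = None   # filled in by the PolyPhen stand-in


# Hypothetical toy data standing in for the real services.
LIFT = {("1", 100): ("1", 90), ("1", 200): ("1", 210), ("2", 50): ("2", 55)}  # UMD3 -> BTA4
EXONS = {"1": [(80, 120, "GENE_A")], "2": [(40, 60, "GENE_B")]}              # BTA4 exons
DAMAGING = {("1", 90)}                                                        # "dangerous" SNPs


def lift_over(snps):
    """MAP: move each SNP to BTA4 coordinates; SNPs with no mapping are dropped."""
    out = []
    for s in snps:
        hit = LIFT.get((s.chrom, s.pos))
        if hit is not None:
            out.append(replace(s, chrom=hit[0], pos=hit[1]))
    return out


def in_exons(snps):
    """FILTER: keep SNPs strictly inside an exon and record the gene."""
    out = []
    for s in snps:
        for start, end, gene in EXONS.get(s.chrom, []):
            if start < s.pos < end:
                out.append(replace(s, gene=gene))
                break
    return out


def polyphen(snps):
    """ANALYSIS: mark each SNP as damaging or not (toy lookup, not PolyPhen)."""
    return [replace(s, damaging=(s.chrom, s.pos) in DAMAGING) for s in snps]


def run_pipeline(snps, stages):
    """Apply each named stage in order, recording how many SNPs survive each one."""
    counts = [("input", len(snps))]
    for name, stage in stages:
        snps = stage(snps)
        counts.append((name, len(snps)))
    return snps, counts


snps = [Snp("1", 100, "A/G"), Snp("1", 200, "C/T"), Snp("2", 50, "G/A"), Snp("3", 10, "T/C")]
result, counts = run_pipeline(snps, [("MAP", lift_over), ("FILTER", in_exons), ("ANALYSIS", polyphen)])
print(counts)  # [('input', 4), ('MAP', 3), ('FILTER', 2), ('ANALYSIS', 2)]
```

Keeping each stage as a plain list-in, list-out function mirrors how the same phases can be rerun unchanged on a different input SNP file.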

Page 6

A little more about the phases …

• The input SNP file comes from whole-genome sequencing of a Boran cow at 15-fold average coverage

– 11.9 million SNPs described.

– Produced by next-generation sequencing.

• All initial data is stored in a database, linked by a runID to the versions of ENSEMBL, LiftOver and PolyPhen used.

• LiftOver provides a mapping between two different reference cow assemblies:

– UMD3: the more accurate assembly

– BTA4: better annotated and ENSEMBL-friendly

– Store the BTA4 position, chromosome and allele in the database

– Filter out, but store, results where the base does not match between the two assemblies.
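A minimal sketch of the runID bookkeeping and the “filter out, but store” rule, using Python's built-in sqlite3 module. The table layout, column names and version strings are hypothetical, and comparing the base at the UMD3 and BTA4 positions is an assumption about what the base mismatch means; the slides do not describe the actual database schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE runs (
    run_id   INTEGER PRIMARY KEY,
    ensembl  TEXT NOT NULL,  -- resource versions used for this run
    liftover TEXT NOT NULL,
    polyphen TEXT NOT NULL
);
CREATE TABLE lifted_snps (
    run_id     INTEGER NOT NULL REFERENCES runs(run_id),
    umd3_chrom TEXT, umd3_pos INTEGER, umd3_base TEXT,
    bta4_chrom TEXT, bta4_pos INTEGER, bta4_base TEXT,
    allele     TEXT,
    base_match INTEGER NOT NULL  -- 1 if the base agrees in both assemblies
);
""")


def start_run(conn, ensembl, liftover, polyphen):
    """Record the resource versions and return the new runID."""
    cur = conn.execute(
        "INSERT INTO runs (ensembl, liftover, polyphen) VALUES (?, ?, ?)",
        (ensembl, liftover, polyphen))
    return cur.lastrowid


def store_lifted(conn, run_id, umd3, bta4, allele):
    """umd3 and bta4 are (chrom, pos, base); mismatches are stored, not discarded."""
    match = int(umd3[2] == bta4[2])
    conn.execute("INSERT INTO lifted_snps VALUES (?, ?, ?, ?, ?, ?, ?, ?, ?)",
                 (run_id, *umd3, *bta4, allele, match))


def passing_snps(conn, run_id):
    """SNPs that continue to the ENSEMBL phase: only those whose base matched."""
    return conn.execute(
        "SELECT bta4_chrom, bta4_pos, allele FROM lifted_snps "
        "WHERE run_id = ? AND base_match = 1 ORDER BY bta4_chrom, bta4_pos",
        (run_id,)).fetchall()


# Placeholder version strings and coordinates, for illustration only.
run = start_run(conn, "ensembl-vX", "liftover-vX", "polyphen-vX")
store_lifted(conn, run, ("5", 1000, "A"), ("5", 980, "A"), "A/G")
store_lifted(conn, run, ("5", 2000, "C"), ("5", 1990, "T"), "C/T")
print(passing_snps(conn, run))  # [('5', 980, 'A/G')]
```

Flagging mismatches with a column instead of deleting them keeps them queryable later, and tying every row to a runID means the results can always be traced back to the resource versions that produced them.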

Page 7

… A little more about the phases

• ENSEMBL is used to retrieve annotations about the SNPs: http://www.ensembl.org/

– For all SNPs that have the same base in both assemblies, we go over all the cow exons in ENSEMBL and check whether each SNP falls within any of them (exon start < SNP position < exon end); for matching SNPs we also store the geneID, allele, associated gene names and biotype.

– Filter out, but store, ENSEMBL/BTA4 mismatches.

– A second phase fetches the consequence of each SNP according to its BTA4 position.

– From this information, a PolyPhen input file is generated for all SNPs whose consequence is non-synonymous.

• A local instance of PolyPhen is queried with this file, generated from the ENSEMBL annotations, to estimate the degree to which each SNP changes the protein.

• The outcome is an annotated database of ~20,000 “interesting” SNPs
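The exon test on this slide (exon start < SNP position < exon end) can be sketched as a per-chromosome lookup over exons sorted by start, using Python's bisect module, followed by the non-synonymous selection that feeds PolyPhen. Everything here is a hypothetical illustration: the tuple layouts, the toy exons and the consequence label "non_synonymous", which stands in for whatever term the ENSEMBL consequence step actually returns. For brevity the lookup scans every exon that starts before the SNP; an interval tree would scale better to millions of SNPs.

```python
from bisect import bisect_left
from collections import defaultdict


def build_exon_index(exons):
    """exons: (chrom, start, end, gene_id, gene_name, biotype) tuples, BTA4 coordinates.
    Returns {chrom: (rows sorted by start, list of starts)} for bisect lookups."""
    by_chrom = defaultdict(list)
    for chrom, *row in exons:
        by_chrom[chrom].append(tuple(row))
    index = {}
    for chrom, rows in by_chrom.items():
        rows.sort()
        index[chrom] = (rows, [r[0] for r in rows])
    return index


def match_exon(index, chrom, pos):
    """Return an exon with start < pos < end (the slide's condition), or None."""
    rows, starts = index.get(chrom, ([], []))
    i = bisect_left(starts, pos)          # rows[:i] are exactly the exons with start < pos
    for start, end, *info in reversed(rows[:i]):
        if pos < end:                     # simple scan; an interval tree would scale better
            return (start, end, *info)
    return None


def annotate(index, snps):
    """snps: (chrom, pos, allele, consequence). Keep exonic SNPs with their gene details."""
    out = []
    for chrom, pos, allele, consequence in snps:
        exon = match_exon(index, chrom, pos)
        if exon is not None:
            _, _, gene_id, gene_name, biotype = exon
            out.append({"chrom": chrom, "pos": pos, "allele": allele,
                        "gene_id": gene_id, "gene_name": gene_name,
                        "biotype": biotype, "consequence": consequence})
    return out


def polyphen_candidates(annotated):
    """Only SNPs with a non-synonymous consequence go forward to PolyPhen."""
    return [a for a in annotated if a["consequence"] == "non_synonymous"]


# Toy exons and SNPs (hypothetical values).
idx = build_exon_index([("5", 100, 200, "GENE1", "abc1", "protein_coding"),
                        ("5", 150, 400, "GENE2", "xyz2", "protein_coding"),
                        ("7", 10, 20, "GENE3", "def3", "lincRNA")])
snps = [("5", 120, "A/G", "non_synonymous"),
        ("5", 300, "C/T", "synonymous"),
        ("5", 500, "G/A", "non_synonymous")]
annotated = annotate(idx, snps)
print([a["gene_id"] for a in polyphen_candidates(annotated)])  # ['GENE1']
```

The strict inequalities follow the slide as written; ENSEMBL exon coordinates are inclusive, so a real implementation may want start <= pos <= end instead.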

Page 8

Packaged as a sharable virtual machine image

Diagram, shown in two versions: 11.9 million SNPs → LiftOver → ENSEMBL → PolyPhen → results of 50,000 annotated SNPs; and the same pipeline producing 20,000 annotated SNPs + provenance.

Page 9

Packaged as a sharable virtual machine image

• LiftOver, Taverna, PolyPhen and the workflow are packaged as a virtual machine image.

– Everything (except ENSEMBL) runs locally.

– The full cow analysis takes 2 days; a previous approach would have taken an estimated 3 months for the PolyPhen phase alone.

• Results and the experiment can be distributed and shared as a complete package:

– Reusable

– Repeatable

– Reproducible

• Future plans to deploy the image on “the cloud”

Page 10

Packaged as a sharable virtual machine image

Diagram: the MAP → FILTER → ANALYSIS workflow, drawing on ENSEMBL, is repeated for each breed: Boran cow (producing the Boran Cow Annotated DB), Sheko cow, N’Dama cow, etc.

Page 11

Highlights of the new Taverna 2.2 features

• Officially released last Wednesday – July 7th 2010

• Loading and sharing of service sets

• Ability to load and edit workflows that contain services that are offline

• Reporting on the state of the workflow

• Tabular representation of a workflow run

• Retrying and parallelization of service calls

• Consistent representation of the intermediate and workflow results

• Pause/resume/cancel of a running workflow

• Command-line tool for executing workflows outside the workbench

• Faster, Better, Easier
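Purely to illustrate the idea behind retrying and parallelizing service calls, here is a generic Python sketch of the two behaviours: a wrapper that retries a failing call with exponential backoff, and a thread pool that issues several calls at once. It is not how Taverna implements these features; the function names, parameter names and the flaky service are hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def with_retries(call, max_retries=3, initial_delay=0.1, backoff=2.0):
    """Wrap a service call so failures are retried, waiting longer after each attempt."""
    def wrapped(item):
        delay = initial_delay
        for attempt in range(max_retries + 1):
            try:
                return call(item)
            except Exception:
                if attempt == max_retries:
                    raise                 # give up and report the last error
                time.sleep(delay)
                delay *= backoff
    return wrapped


def call_in_parallel(call, items, max_jobs=5):
    """Run the call on up to max_jobs items at once; results keep the input order."""
    with ThreadPoolExecutor(max_workers=max_jobs) as pool:
        return list(pool.map(call, items))


# Hypothetical flaky service: fails on the first call for each input, then succeeds.
attempts = {}


def flaky_service(x):
    attempts[x] = attempts.get(x, 0) + 1
    if attempts[x] == 1:
        raise ConnectionError("transient failure")
    return x * x


results = call_in_parallel(with_retries(flaky_service, initial_delay=0.01), range(6))
print(results)  # [0, 1, 4, 9, 16, 25]
```

Wrapping the call before handing it to the pool means each parallel job retries independently, so one slow or failing input does not hold up the others.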