Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility? EUAsiaGrid Workshop 4-6 May 2010 Chanditha Hapuarachchi Environmental Health Institute National Environment Agency


Grid enabling phylogenetic inference on virus sequences using BEAST - a possibility?

EUAsiaGrid Workshop

4-6 May 2010

Chanditha Hapuarachchi

Environmental Health Institute

National Environment Agency

Outline

Work scope

Analytical approach

Current limitations

What is expected from Grid-enabling?

Work scope

Understanding the molecular epidemiology of vector-borne infectious diseases in Singapore, with a view to using this information in disease control operations

Objectives

To determine the routes of pathogen migration (mainly Dengue and Chikungunya viruses)

To understand the evolutionary dynamics of pathogens

To understand the outbreak potential of pathogens within the country

What phylogenetic inferences are made?

Molecular epidemiology of DENV & CHIKV

Phylogenetic relationships (trees): BEAST, MEGA

Evolutionary dynamics (evolutionary rates, selection pressure, recombination, etc.): BEAST, HyPhy, etc.

Population dynamics (Bayesian skyline plots): BEAST

Temporo-spatial distribution of viruses: BEAST, NETWORK

BEAST is a multi-task software package

CHIKV whole genome tree with spatial model

[Figure: time-scaled tree; location legend: India, Sri Lanka, Singapore, Malaysia, Indian Ocean islands, Kenya; axis: time (yrs)]

Spatial distribution of different lineages of DENV in Singapore

However...

BEAST analysis is time-consuming and requires substantial computing power

Limitations of the BEAST approach?

Size of dataset

Length of sequences

No. of sequences

E.g. analyzing a dataset of ~90 CHIKV whole genomes (11.8 kb each) takes several days, depending on the available computing power

Limitations…

Analytical parameters

A basic analysis takes ~0.3 hrs per million states (Core 2 Duo, 2.1 GHz, 4 GB RAM, >50% CPU)

A general run involves a chain of at least 100 million states (~30 hrs)

The duration increases substantially with changing parameters

Incorporation of a spatial model (7 states) alone increases the runtime to ~0.4 hrs per million states

The ultimate duration depends on Effective Sample Size (ESS) values (general requirement >200)
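The runtime figures above are simple proportional arithmetic, which can be sketched as follows. The function name and the assumption that runtime scales linearly with chain length are illustrative only; the per-million-state rates are the ones quoted on the slide for that particular machine.

```python
def estimated_runtime_hours(chain_length_states, hours_per_million_states):
    """Rough runtime estimate, assuming time scales linearly with chain length."""
    return chain_length_states / 1_000_000 * hours_per_million_states


CHAIN_LENGTH = 100_000_000  # "at least 100 million" states, per the slide

# Rates quoted on the slide (Core 2 Duo, 2.1 GHz, 4 GB RAM)
basic_hours = estimated_runtime_hours(CHAIN_LENGTH, 0.3)
spatial_hours = estimated_runtime_hours(CHAIN_LENGTH, 0.4)

print(f"basic analysis:     ~{basic_hours:.0f} hrs")
print(f"with spatial model: ~{spatial_hours:.0f} hrs")
```

Under these assumptions the spatial model adds roughly ten hours to a 100-million-state run, before any extension needed to reach the ESS target.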

BEAST Tracer output window
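To make the ESS requirement concrete, here is a minimal sketch of one common textbook estimator: the number of samples divided by the integrated autocorrelation time, with the sum truncated at the first non-positive autocorrelation. This is an assumption for illustration and is not necessarily the estimator Tracer uses. The function name is hypothetical, and the input is assumed to be the post-burn-in samples of a single parameter.

```python
def effective_sample_size(trace):
    """Estimate ESS as n / (1 + 2 * sum of positive-lag autocorrelations).

    The sum stops at the first lag whose autocorrelation is <= 0.
    Pure Python and O(n^2), so only suitable for short traces.
    """
    n = len(trace)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(trace) / n
    centered = [x - mean for x in trace]
    variance = sum(c * c for c in centered) / n
    if variance == 0:
        raise ValueError("trace is constant; ESS is undefined")

    autocorr_time = 1.0
    for lag in range(1, n):
        autocov = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / n
        rho = autocov / variance
        if rho <= 0:
            break  # truncate once correlation is no longer positive
        autocorr_time += 2.0 * rho
    return n / autocorr_time


# A slowly drifting (highly autocorrelated) trace carries far fewer
# independent samples than its length suggests.
drifting = [float(i) for i in range(100)]
print(effective_sample_size(drifting) < len(drifting))
```

A strongly autocorrelated chain therefore needs more states to reach ESS >200, which is why the final run length cannot be fixed in advance.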

Limitations…

Number of parallel runs & users

More runs & users → lower analytical efficiency

A single run takes up >50% of CPU capacity

Why Grid-enable BEAST?

Enables efficient data analysis

parallel runs

multiple users

expanded datasets

Enhances data interpretation
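One way the "parallel runs" point could look in practice is to prepare several independent replicate runs of the same analysis, each with its own random seed and working directory, and hand each one to a separate grid node. The sketch below only builds the command lines and does not launch anything. The executable name `beast`, the input file name, and the `run_N` directory layout are assumptions that depend on the local installation; `-seed` is BEAST's command-line option for setting the random seed.

```python
from pathlib import Path


def build_replicate_commands(xml_path, n_replicates, base_seed=1000):
    """Return (workdir, command) pairs for independent BEAST replicate runs.

    Each replicate gets a distinct seed and its own directory so that
    output files from different runs do not overwrite each other.
    """
    xml_abs = Path(xml_path).resolve()  # absolute, since each run has its own cwd
    jobs = []
    for i in range(n_replicates):
        workdir = Path(f"run_{i + 1}")          # hypothetical directory layout
        command = ["beast",                     # assumed executable name
                   "-seed", str(base_seed + i),
                   str(xml_abs)]
        jobs.append((workdir, command))
    return jobs


# Hypothetical input file name, for illustration only.
for workdir, command in build_replicate_commands("chikv_spatial.xml", 4):
    print(workdir, " ".join(command))
```

Each pair could then be passed to `subprocess.run(command, cwd=workdir)` on a local machine, or translated into whatever job description the grid middleware expects.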

Can Grid-enabling help to improve the existing performance?