
Identification of human-to-human transmissibility factors in PB2 proteins of influenza A by large-scale mutual information analysis

Sixth International Conference on Bioinformatics (InCoB2007) Hong Kong, 28th August 2007

Olivo Miotto

Institute of Systems Science and Yong Loo Lin School of Medicine, National University of Singapore

AT Heiny, Tan Tin Wee
Yong Loo Lin School of Medicine, National University of Singapore

J Thomas August
Johns Hopkins University School of Medicine

Vladimir Brusic
Cancer Vaccine Center, Dana-Farber Cancer Inst.

Outline

Background

Mutual Information Analysis

Materials and Methods

Results

Discussion and conclusions

1. Background

Avian Flu: Is the Pandemic Coming?

Can H5N1 viruses spread amongst humans?

1918-1919

The Influenza A Virus

Serological subtyping:
- HA (Haemagglutinin): 16 subtypes
- NA (Neuraminidase): 9 subtypes

Other labelled components: viral RNA, matrix protein

Source: http://www.roche.com/pages/facets/10/viruse.htm

Avian vs Human Influenza

Wild Waterfowl
- Natural pool
- Over 100 subtypes observed
- Affects the digestive tract
- Often asymptomatic

Humans
- Only 4 subtypes transmitted human-to-human (H2H)
- Avian-to-human (A2H) infection in a small number of subtypes
- Affects the respiratory tract

Influenza Circulation

- Wild Waterfowl: Avian-to-Avian (A2A), > 100 subtypes
- Humans: Human-to-Human (H2H), only 4 subtypes
- Other hosts shown: Domestic Poultry, Swine

cf. Webster RG et al. (1992) Microbiol Rev, 56(1), 152-179.

Avian origins of pandemic strains

From: Belshe RB (2005) N Engl J Med, 353, 2209-2211.

Pressing Questions

What are the mechanisms of adaptation to human hosts?
- Which genes/products are involved?
- Can we identify mutations responsible for the capability to infect humans?
- Can we identify mutations responsible for adaptation to human-to-human transmission?
- Can we elucidate the role of such mutations?
- Can we assess the pandemic potential of current H5N1 (and/or other strains)?

Study goals

Analyze all influenza protein sequence data available
- Historical data
- Whole genome

Use statistical approaches to identify amino acid sites that characterize H2H transmissibility
- Compare H2H with non-H2H (A2A)
- Create an "adaptation map"

Use the information acquired to characterize individual isolates and strain evolution
- Map out the emergence of characteristic mutations
- Assess a strain's potential for H2H transmissibility

Why PB2?

Initial study performed on PB2 protein
- Internal protein, component of RNP
- Some experimentally determined functional regions
- Well-known involvement of the E627K mutation in mammalian and cold-temperature adaptation

From: http://www.omedon.co.uk/influenza/influenza/

Subbarao EK, London W, Murphy BR (1993) J Virol, 67(4), 1761-1764.

2. Mutual Information Analysis

Information Theory

Information Entropy H is a measure of uncertainty

H = -\sum_{e \in E} p_e \log_2 p_e

where e is an event from a possible set E, and p_e is the probability of e occurring

Lower entropy -> more predictable outcome

Entropy is affected by
- the number of outcomes
- their relative probabilities

Shannon CE (1948) Bell System Tech J, 27: 379-423,623-656.

Entropy in multiple alignments

In a multiple sequence alignment, we can treat
- each alignment site as a separate "variable"
- each observed residue at that site as a separate "event"
- the "event probability" as the fraction of sequences in the alignment that contain the residue

H = 0 at fully conserved positions
- Single, 100% predictable outcome

H increases when
- several residues are observed at the same position, and/or
- their probability is evenly distributed

Entropy is a measure of diversity

Both full sequences and sequence fragments can be used in entropy computation
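
The per-site entropy described above can be sketched in a few lines of Python. This is a minimal illustration, not the AVANA implementation: the function names and the toy alignment are hypothetical, base-2 logarithms are assumed, and gaps are treated like any other symbol.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the residues observed at one alignment site."""
    counts = Counter(column)
    total = len(column)
    # p * log2(1/p) is the same as -p * log2(p), written to avoid a negative zero
    return sum((n / total) * log2(total / n) for n in counts.values())


def alignment_entropy(sequences):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(column) for column in zip(*sequences)]


# Hypothetical toy alignment: columns 1-2 are conserved, columns 3-5 are split 50/50
toy_alignment = ["GETNL", "GETNI", "GEVSL", "GEVSI"]
entropies = alignment_entropy(toy_alignment)
print(entropies)
```

Conserved columns come out at zero, and a column split evenly between two residues comes out at one bit, which matches the qualitative behaviour listed on the slide.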

Entropy of Influenza A PB2 protein

based on alignment of 3132 sequences

Entropy in Sequence Alignments

S G W K E E L A V N Q P V Q E F E T F E I E
W E E K E E F A V Y I P L Q P F L T F G R L
G E S P E E N F V N V P H Q Y F Y T V E P M
G E S L E E A S V N G P F Q Y F Y T V E C L
W E S K E E N A V N V P H Q K F F T V L T M
T E N P E E E L F K V P F R V F F S L S H Y
K E T N E E P W F K K P M R E F Y S A W G L
G E T N E E E A F N V P R R V F F S V S N L
G E K N E E E A F K L P F R E F Y S V Q R V
E E Q S E S A E S Q Q P E E P F Y Q I L E L
G E Q V E S S E S Q E P H E E F Y Q I R T L
G E K Q E S S S S Y E P K E E F A Q C V L L
R E A Q E S Q A S N V P M E T F Y Q V R T L
H E R V E S A A S N V P M E T F Y Q I A E L
R E C H E V K A Q Y V P M L E F Y Q V K P W
G E S S E V A A Q N V P M L W F Y Q R H V M
G E A S E V E H Q N V P H L K F Y Q E G P P

Column labels in the figure: Z = zero entropy, M = medium entropy, H = high entropy

Comparing Alignments

AVIAN sequences

G E T N E E E A F N V P R R V F F S V S N L
G E T N E E E A F N V P R R V F F S V S N I
G E T N E E E A W N V P R R V F F S I S N L
G E T N E E E A F N V P R R V F F S V S N L
G E T N E E E A F N V P R R V F F S V S N I
G E T N E E E A F N V P R R V F F S I S N L
G E T N E E E A W N V P R R V F F S V S S L
G E T N E E E A F N V P R R V F F S V S S L

HUMAN sequences

G E V N E D E A F N V P R R V F F S A S N L
G E V N E D E A F N V P R R V F F S A S S I
G E V N E D E A F N V P R R V F F S A S N L
G E V N E D E A F N V P R R V F F S A S N I
G E G N E D E A F S V P R R V F F S A S N I
G E G N E D E A F S V P R R V F F S A S S I
G E G N E D E A F S V P R R V F F S A S N I
G E G N E D E A F S V P R R V F F S A S N L

Column labels in the figure: C = characteristic sites, Z = zero entropy, N = non-characteristic

Mutual Information

Mutual Information (MI) uses information entropy to measure the relationship between two variables
- The higher the MI, the more information about variable A can be obtained by knowing the value of variable B

MI(A, B) = H(A) + H(B) - H(A, B)

where H(A) and H(B) are the entropies of A and B, and H(A, B) is the joint entropy of A and B

Joint entropy is computed by considering each combination of the two variables as a separate outcome

Using MI to detect Characteristic Sites

At a characteristic site, the residue observed is strongly associated with a set of sequences
- E.g.: Arg -> Avian, Thr -> Human

This association is explored by measuring the MI of
- the residue observed at a site
- the label of the set in which it is observed

MI is in the range 0 – 1.0
- MI = 0.0 -> no statistically significant difference in the occurrence of residues in the two sets
- MI = 1.0 -> residues observed in one set are never observed in the other, and vice versa
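
As a rough illustration of the residue-versus-label MI described above, the following Python sketch computes MI at each site from the standard identity MI = H(residue) + H(label) - H(residue, label). The function names, the set labels used as strings, and the toy sequences are assumptions made for the example, and base-2 logarithms are assumed. With two sets, MI cannot exceed the entropy of the label, which is 1.0 when the sets are of equal size.

```python
from collections import Counter
from math import log2


def entropy(outcomes):
    """Shannon entropy (in bits) of a list of hashable outcomes."""
    counts = Counter(outcomes)
    total = sum(counts.values())
    return sum((n / total) * log2(total / n) for n in counts.values())


def site_mi(avian_column, human_column):
    """MI between the residue at one site and the label of the set it comes from."""
    residues = list(avian_column) + list(human_column)
    labels = ["A2A"] * len(avian_column) + ["H2H"] * len(human_column)
    # Joint entropy: every (residue, label) combination is a separate outcome
    joint = list(zip(residues, labels))
    return entropy(residues) + entropy(labels) - entropy(joint)


# Hypothetical toy sets, both extracted from the same alignment
avian = ["GETNL", "GETNI", "GETSL", "GETNI"]
human = ["GEVNL", "GEVSI", "GEVNL", "GEVNI"]

mi_per_site = [site_mi(a, h) for a, h in zip(zip(*avian), zip(*human))]
print([round(mi, 3) for mi in mi_per_site])
```

In this toy data only the third column separates the two sets completely (T in avian, V in human), so it is the only site that reaches the maximum; conserved columns and columns with the same residue mix in both sets score zero.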

Figure: Entropy and MI plotted along the PB2 protein for A2A (719 sequences) and H2H (1650 sequences). Spikes indicate characteristic sites.

3. Materials and Methods

The Antigenic Variability Analyzer (AVANA)

Source Sequences

Comprehensive set of PB2 proteins
- 3,132 protein sequences with accompanying metadata: host, subtype, country of isolation, year of isolation
- Extracted from NCBI Protein and Nucleotide databases (all proteins > 40,000 sequences)
- Automated aggregation, metadata extraction and metadata cleaning using the ABK software
- Multiple sequence alignment (MSA) using Muscle 3.6
- Manually verified and corrected metadata and MSA

Datasets

Three subsets produced for comparison
- A2A: Avian sequences for all subtypes, except those that circulate amongst humans (H1N1, H2N2, H3N2, H1N2) and H5N1
- H1N1H: Human sequences for H1N1
- HxN2H: Human sequences for H2N2, H3N2, H1N2

To retain alignment, subsets are extracted from a single MSA

H1N1 and HxN2 are separate co-circulating lineages

Webster RG et al. (1992). Microbiol Rev. 56(1), 152-179.

Identification of characteristic sites

Compare each of H1N1H, HxN2H against A2A
1. Pick sites with high MI (>0.4)
2. Identify characteristic variants of human transmission:
   - At least 4x more frequent in human than in avian set
   - Appear in at least 2% of human sequences
3. Identify avian characteristic variants
4. Discard site if >5% of human sequences contain avian variants
   - All sites with >2% avian variants were verified by hand

Merge catalogues of sites for H1N1H and HxN2H
- Keep only sites that are present in both catalogues
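
The selection steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the function names are hypothetical, the MI value for the site is taken as a precomputed input, and the rule for avian characteristic variants (step 3) is assumed to mirror the human rule, since the slide does not spell it out. The manual verification step is not modelled.

```python
from collections import Counter


def frequencies(column):
    """Relative frequency of each residue in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return {residue: n / total for residue, n in counts.items()}


def characteristic_variants(own, other, ratio=4.0, min_presence=0.02):
    """Residues at least `ratio` times more frequent in `own` than in `other`
    and present in at least `min_presence` of the `own` sequences."""
    own_f, other_f = frequencies(own), frequencies(other)
    return {r for r, f in own_f.items()
            if f >= min_presence and f >= ratio * other_f.get(r, 0.0)}


def classify_site(avian_col, human_col, mi, mi_cutoff=0.4, max_cross=0.05):
    """Return the characteristic variants of a site, or None if it is discarded."""
    if mi <= mi_cutoff:                       # step 1: high MI only
        return None
    human_vars = characteristic_variants(human_col, avian_col)   # step 2
    # Step 3: ASSUMPTION, the same rule applied in the opposite direction
    avian_vars = characteristic_variants(avian_col, human_col)
    if not human_vars:
        return None
    # Step 4: fraction of human sequences carrying an avian variant
    cross = sum(1 for r in human_col if r in avian_vars) / len(human_col)
    if cross > max_cross:
        return None
    return {"A2A": avian_vars, "H2H": human_vars, "x_presence": cross}


def merge_catalogues(h1n1_sites, hxn2_sites):
    """Keep only the sites present in both catalogues."""
    return sorted(set(h1n1_sites) & set(hxn2_sites))


# Hypothetical columns for a single site; the MI value is made up for the example
avian_col = "E" * 97 + "K" * 3
human_col = "K" * 98 + "E" * 2
result = classify_site(avian_col, human_col, mi=0.9)
print(result)
```

Each human subset (H1N1H, HxN2H) would be run against A2A separately, and `merge_catalogues` would then be applied to the two lists of surviving positions.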

4. Results

Results: 17 characteristic sites

Position  Char. variants  1st human  Conservation      X-presence
          A2A    H2H      isolate    A2A      H2H      of A2A
9         DE     NT       1933       98.57%   99.33%   0.49%
44        A      S        1940       96.82%   99.27%   0.61%
64        M      T        1933       97.29%   99.58%   0.30%
81        T      MV       1933       97.93%   99.27%   0.30%
105       TA     VM       1933       98.41%   99.45%   0.36%
199       A      S        1918       99.47%   99.76%   0.24%
271       TI     A        1940       98.59%   99.51%   0.37%
292       IV     T        1940       95.54%   99.15%   0.67%
368       R      K        1940       98.12%   99.33%   0.67%
475       L      M        1918       99.66%   99.76%   0.24%
567       DE     N        1918       98.28%   99.39%   0.55%
588       AV     I        1940       98.45%   99.63%   0.31%
613       VA     TI       1940       98.28%   99.32%   0.61%
627       E      K        1918       99.31%   99.76%   0.12%
661       A      T        1933       86.72%   99.39%   0.43%
674       AS     T        1933       95.69%   99.63%   0.18%
702       K      R        1918       89.70%   99.39%   0.49%

Additional columns: Naffakh 2000, Chen 2006 (per-site marks not preserved)

Chen GW et al. (2006) Emerg Infect Dis 12(9), 1353-1360. Naffakh N et al. (2000). J Gen Virol, 81, 1283-1291.

Functional Atlas of PB2 Adaptations

Figure: the 17 characteristic sites (9, 44, 64, 81, 105, 199, 271, 292, 368, 475, 567, 588, 613, 627, 661, 674, 702) with their A2A and H2H variants, placed along PB2 and annotated with functional regions: PB1 binding, NP binding, RNA cap binding, Nuclear Localization Signal.

http://www-micro.msb.le.ac.uk/3035/Orthomyxoviruses.html

Reconstructing adaptation timelines

Characteristic sites can show an "adaptation signature"
- A summary of mutations necessary for H2H adaptation

We can then characterize any PB2 sequence at these sites
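
One way to picture such a signature is as a string with one letter per characteristic site. The sketch below uses the 17 positions and variants reported in the Results table; everything else is an assumption made for illustration: the function name, the A/H/? encoding, and the premise that positions are 1-based indices into an ungapped PB2 sequence numbered like the alignment. The example sequence is synthetic, not a real isolate.

```python
# Position -> (A2A variants, H2H variants), as listed in the Results table
CHARACTERISTIC_SITES = {
    9: ("DE", "NT"), 44: ("A", "S"), 64: ("M", "T"), 81: ("T", "MV"),
    105: ("TA", "VM"), 199: ("A", "S"), 271: ("TI", "A"), 292: ("IV", "T"),
    368: ("R", "K"), 475: ("L", "M"), 567: ("DE", "N"), 588: ("AV", "I"),
    613: ("VA", "TI"), 627: ("E", "K"), 661: ("A", "T"), 674: ("AS", "T"),
    702: ("K", "R"),
}


def signature(pb2_sequence):
    """One letter per characteristic site, in position order:
    'A' = avian variant, 'H' = human variant, '?' = other or missing."""
    marks = []
    for position in sorted(CHARACTERISTIC_SITES):
        a2a, h2h = CHARACTERISTIC_SITES[position]
        # ASSUMPTION: 1-based positions into an ungapped sequence
        residue = pb2_sequence[position - 1] if position <= len(pb2_sequence) else ""
        if residue and residue in h2h:
            marks.append("H")
        elif residue and residue in a2a:
            marks.append("A")
        else:
            marks.append("?")
    return "".join(marks)


# Synthetic sequence: avian variants everywhere, plus the E627K mutation
residues = ["X"] * 759
for position, (a2a, _) in CHARACTERISTIC_SITES.items():
    residues[position - 1] = a2a[0]
residues[627 - 1] = "K"
example = "".join(residues)

print(signature(example))
```

Ordering signatures by year of isolation would then give a timeline of the kind shown on the following slides.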

Human Timeline over 3 pandemics

Circulating subtypes: H1N1 (1918-1957), H2N2 (1957-1968), H3N2 (1968-now)
- Spanish Flu, H1N1 (A/Brevig Mission/1/1918)
- 1918: Mostly Avian Signature
- 1940s: Fully H2H Signature
- 1957, 1968: No disruption by pandemics, no introduction of avian PB2 protein
- Remarkable stability, to present day
- Sporadic avian/swine infections

Swine Influenza Timeline

- Evidence of avian and human mutations
- Supports role of Swine as “mixing vessel”

H5N1: Timeline 1997-2006

- Presents H2H mutations more frequently than other avian strains
- H2H mutations usually do not persist
- H5N1 not “becoming” H2H

5. Discussion and conclusions

Discussion: Methodology

Detection of characteristic sites by MI has greater resolving power than previous approaches
- Allows multiple characteristic variants at a site

MI method allows large-scale analysis
- Thousands of sequences, strong support for findings
- Fragments can also be used

Sequence signatures are effective for recapitulating strain characteristics and understanding trends

Good metadata is necessary for quality analysis
- Luckily, this is largely available for Influenza
- Other viruses have poorer coverage

Discussion: Human Sequences

H2H variants show remarkable historical stability
- Resilience to HA and NA changes suggests limited interplay in adaptation between internal and external proteins

Location of characteristic sites in binding domains suggests complex interactions are involved in adaptation to H2H transmission
- Cataloguing characteristic sites in other RNP proteins may shed new light on their roles

Both current lineages of PB2 (H1N1, HxN2) have evolved from the same source (1918 Spanish Flu)
- No evidence of PB2 interchange between the two lineages

Discussion: Avian Sequences

Avian strains rarely show any H2H mutation
- 77% contain none (H5N1 excluded)
- Only one sequence had 3 out of 17 mutations

Spanish Flu had 5 H2H mutations
- Could be the minimum set, probably not optimal

H5N1 repeatedly exhibits H2H mutations, but they do not “stick”
- May account for its ability to jump the species barrier
- May indicate that H5N1 PB2 is far from suited for H2H
- Even the E627K mutation was not conserved

Reassortment is still possible, but how pathogenic?

Future Developments

Full Catalogue of Influenza Characteristic Sites
- Preliminary results (characteristic site count):

  Protein  Sites    Protein  Sites
  NP       18       M1       3
  PA       19       M2       10
  PB1      1        NS1      9
  PB1-F2   3        NS2      3

Characterization of subgroups of Influenza

Application of the method to other viruses

Release of AVANA tool

Acknowledgements and Thanks

Institute of Systems Science, NUS
- Funding support for this conference

Asif M Khan, KN Srinivasan
- Testing and feedback on AVANA tool

Partial Grant Support:
- National Institute of Allergy and Infectious Diseases, NIH: Grant No. 5 U19 AI56541, Contract No. HHSN2662-00400085C
- ImmunoGrid Project: EC Contract FP6-2004-IST-4, No. 028069
