1 Mapping Mutations in HIV RNA By Nimrod Bar-Yaakov [email protected] With co-operation of Dr. Zehava Grossman of the Israel’s Multi-Center AIDS Study Group, National HIV reference Laboratory in Tel-Hashomer.

Post on 21-Dec-2015


Page 1: 1 Mapping Mutations in HIV RNA By Nimrod Bar-Yaakov nimrod-b@orbotech.com With co-operation of Dr. Zehava Grossman of the Israel’s Multi-Center AIDS Study

1

Mapping Mutations in HIV RNA

By Nimrod Bar-Yaakov nimrod-b@orbotech.com

With the co-operation of Dr. Zehava Grossman of Israel’s Multi-Center AIDS Study Group, National HIV Reference Laboratory in Tel-Hashomer.

2

Today’s Topics

HIV: what it is and how it operates.

What is so important about the HIV DNA mutations?

Extracting the RNA sequence for analysis.

A naïve view of the HIV RNA sequences.

Locating the RNA mutations.

Analysis of the RNA mutation interactions.

3

Virus Overview

Viruses may be defined as acellular organisms whose genomes consist of nucleic acid, and which obligately replicate inside host cells using host metabolic machinery and ribosomes to form a pool of components. These components assemble into particles called VIRIONS, which serve to protect the genome and to transfer it to other cells.

4

Virus Animation

5

Virus Overview

The concept of a virus as an organism challenges the way we define life: viruses do not respire, nor do they display irritability; they do not move, nor do they grow. However, they most certainly do reproduce, and they may adapt to new hosts.

6

What is HIV?

HIV (human immunodeficiency virus) is a type of retrovirus that is responsible for the fatal illness Acquired Immunodeficiency Syndrome (AIDS).

Retrovirus: a virus that carries its genetic material in the form of RNA rather than DNA and has the enzyme reverse transcriptase, which can transcribe that RNA into DNA.

In most animals and plants, DNA is usually transcribed into RNA; hence "retro" is used to indicate the opposite direction.

7

How does HIV infect the body’s cells?

HIV begins its infection of a susceptible host cell by binding to the CD4 receptor on the host cell.

The genetic material of the virus, which is RNA, is released and undergoes reverse transcription into DNA, which enters the host cell nucleus, where it can be integrated into the genetic material of the cell.

Activation of the host cell results in the transcription of viral DNA into messenger RNA (mRNA), which is then translated into viral proteins.

The viral RNA and viral proteins assemble at the cell membrane into a new virus.

The virus then buds from the cell and is released to infect another cell.

8

Treatment related to the active RNA sites

The HIV DNA generates proteins that are essential to the virus life cycle. Medical treatments interfere with or block the operation of these proteins.

Reverse transcriptase medicines: inhibit the reverse transcription of the HIV RNA into DNA, which would otherwise be integrated into the cell’s DNA.

The HIV protease protein is required to process other HIV proteins into their functional forms. Protease inhibitor medicines act by blocking this critical maturation step.

9

RNA mutations

Environmental or biological processes may cause mutations in the HIV RNA.

The mutated HIV RNA is reverse-transcribed and merges into the infected cell’s DNA.

The generated amino-acid sequence is then altered.

A different protein is generated by the cell.

The altered protein may resist the medical treatment!
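To make the chain "mutated nucleotide, altered amino acid, altered protein" concrete, here is a minimal sketch. The six-base fragment and the four-entry codon table are hypothetical and chosen only for illustration; the sequence is written with DNA letters, as a sequencer would report it. It shows one substitution that changes the amino acid (Val to Ala) and one that is silent.

```python
# Minimal codon table: only the codons used in this illustration.
CODON_TO_AA = {
    "GTC": "Val",
    "GCC": "Ala",
    "GGA": "Gly",
    "GGG": "Gly",
}

def translate(seq):
    """Translate a nucleotide string codon by codon (length must be a multiple of 3)."""
    if len(seq) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return [CODON_TO_AA[seq[i:i + 3]] for i in range(0, len(seq), 3)]

def point_mutation(seq, index, new_base):
    """Return a copy of seq with the base at the 0-based index replaced."""
    return seq[:index] + new_base + seq[index + 1:]

reference = "GTCGGA"                          # hypothetical fragment: Val-Gly
missense = point_mutation(reference, 1, "C")  # GTC -> GCC changes Val to Ala
silent = point_mutation(reference, 5, "G")    # GGA -> GGG still encodes Gly

print(translate(reference))
print(translate(missense))
print(translate(silent))
```

Only the missense change can alter the protein; the silent one leaves the amino-acid sequence untouched even though the nucleotide sequence differs.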

10

Mutation families

The HIV RNA has a high mutation rate (about 1,000 times higher than that of a regular cell).

Fast evolutionary processes cause the best-adapted mutated viruses to increase their population in the infected body.

We’ll focus on 3 main mutation families:

Resistance mutations

Clade mutations

Other (noise/random)

11

The importance of identifying the resistance mutations

Selecting the best medical treatments.

Understanding the way different medicines interact with HIV.

Understanding the functional interpretation of the RNA sequence.

12

Extracting the RNA Sequence

The RNA sequences are transcribed into DNA sequences.

The DNA sequences are then multiplied several times.

A DNA sequencer ‘reads’ the aligned DNA sequences.

The decision on how to interpret a specific DNA segment is based on image-processing algorithms (which define the segment boundaries and find the best match for the segment pattern) and isn’t deterministic!

13

Sequence Alignment (from Ron Shamir’s Course)

14

15

Sequence Alignment

Before alignment

AtaaagakagggggacagctaaaagaggctctcTTAGACACAGGAGCAGATGATACAACTCTTTGGCAGCGaCCCCGTTGTCACaATAAAAATagGGGGACAGCTAAgGGagGcTAAAAGAGGCTCTCTTAGCACACAGGMGCAGAYGAYACAGTMCTTASCAAGAAATAAACTCTTTGGCAGCGACCCCTTGTcACAATAAAAGTAGAGGGACAGCTAAGGGAKGCTACTCTTTGGCAGCGaCCCCTTGTCACAATAAAAATAGGGGACAGCTAAGGGAGGCTCACTCTTTGGcAGCGACCCCTtGTCACAATAAAAGtAGGGGGaCAGCTAAAgGAGGCTaCTnTTnGRCAGCGaCCCCTTgTCYCARtAAAAATAGGGGGGCAGRTAARGGAGGCt

After Alignment

------------------------------ATAAAGAKAGGGGG-ACAG-CTAAAAGAGG
------------C-GACCCC--TTGTCACAATAARAATAGGGGG-ACAG-CTAAAAGAGG
ACTCTTTGGCAAC-GACCCC--TTGTCACAATAAGAGTAGGGGG-ACAG-CTAAAAGAGG
-CTCTTTGGCAAC-GA-CCCC-TTGTCACAGTAAAAATAGRAGG-ACAG-CTAAAAGAAG
ACTCTTTGGCAAC-GA-CCCC-TTGTCACAGTAAAAATAGGAGG-ACAG-CTAAAAGAAG
ACTCTTTGGCAAC-GA-CCCC-TTGTCACAGTAAAAATAGGAGG-ACAG-CTMAAAGAAG
ACTCTTTGGCAAC-GA-CCCC-TTGTCACAGTAAGAATAGGAGG-ACAG-CTAAAAGAAG

Degapping

---------------------------ATAAAGAKAGGGGGACAGCTAAAAGAGGC
------------CGACCCCTTGTCACAATAARAATAGGGGGACAGCTAAAAGAGGC
ACTCTTTGGCAACGACCCCTTGTCACAATAAGAGTAGGGGGACAGCTAAAAGAGGC
-CTCTTTGGCAACGACCCCTTGTCACAGTAAAAATAGRAGGACAGCTAAAAGAAGC
ACTCTTTGGCAACGACCCCTTGTCACAGTAAAAATAGGAGGACAGCTAAAAGAAGC
ACTCTTTGGCAACGACCCCTTGTCACAGTAAAAATAGGAGGACAGCTMAAAGAAGC
ACTCTTTGGCAACGACCCCTTGTCACAGTAAGAATAGGAGGACAGCTAAAAGAAGC
ACTCTTTGGCAACGACCCCTTGTCACAGTAAGAATAGGAGGACAGCTAAAAGAAGC

16

Reduction from Bio problem to CS Problem

Generate a consensus RNA sequence.

For each sequence, generate a matching binary sequence: each 1 represents a mismatch between the consensus and the original sequence, and each 0 represents a match.

Now we have a binary feature vector for each sample.

We can now calculate the correlations between the mutations and the treatment, and among the mutations themselves.
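The reduction can be sketched in a few lines. This is a minimal illustration and not the code used in the study: it assumes a simple majority-vote consensus per aligned column and treats gaps and ambiguity codes as ordinary symbols. The toy alignment and the names `consensus` and `mutation_vector` are hypothetical.

```python
from collections import Counter

def consensus(aligned):
    """Most common symbol in each column of equally long aligned sequences."""
    length = len(aligned[0])
    if any(len(s) != length for s in aligned):
        raise ValueError("all sequences must have the same aligned length")
    # Counter.most_common(1) returns the most frequent symbol of a column.
    return "".join(Counter(col).most_common(1)[0][0] for col in zip(*aligned))

def mutation_vector(seq, cons):
    """1 where seq differs from the consensus, 0 where it matches."""
    return [int(a != b) for a, b in zip(seq, cons)]

# Hypothetical toy alignment (not taken from the study data).
samples = ["ACGTAC", "ACGTAC", "ATGTAC", "ACGTGC"]
cons = consensus(samples)
matrix = [mutation_vector(s, cons) for s in samples]

print(cons)
for row in matrix:
    print(row)
```

Each row of `matrix` is the binary feature vector of one sample; stacking the rows gives the mutation matrix used in the following slides.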

17

From Sequences to Mutation Matrix

18

So where are the problems?

Curse of dimensionality

Noisy data

Sequenced data are of stochastic nature

Small number of samples

Clades and sub-clades

Vague definitions of independent variables values.

Silent mutations

Talk Bio language!

19

Data Overview

20

Frequencies of Mutation Occurrences

21

Filtering the Data

Mutations that occur fewer than 5 times at a specific RNA index cannot be considered significant (we’ll see this later in the Chi-square slides).

We’ll filter out all the mutations that occur fewer than 3 times and replace them with the consensus value, thus filtering out much of the noise.
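A minimal sketch of this filtering step on a binary mutation matrix, assuming one row per sample and one column per RNA index. Because 0 means "same as consensus", resetting a rare column to 0 is the same as replacing the mutation with the consensus value. The function name and the toy matrix are hypothetical; the threshold is a parameter.

```python
def filter_rare_mutations(matrix, min_count=3):
    """Zero out mutation columns that are set in fewer than min_count samples.

    matrix is a list of equal-length 0/1 rows (one row per sample); a 0 means
    "same as consensus", so zeroing a column restores the consensus value.
    """
    column_counts = [sum(col) for col in zip(*matrix)]
    keep = [count >= min_count for count in column_counts]
    return [[bit if k else 0 for bit, k in zip(row, keep)] for row in matrix]

# Hypothetical 4-sample, 3-site mutation matrix.
matrix = [
    [1, 0, 1],
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
filtered = filter_rare_mutations(matrix, min_count=3)

for row in filtered:
    print(row)
```

Only the first site, mutated in three samples, survives; the two sites mutated once each are reset to the consensus.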

22

Naïve clustering of Data

[Figure: clustering of 671 RNA samples using Centroid linkage, showing the clade distribution (A, C, B) and the treatment distribution (Treated, Non-Tr) within each cluster.]

Cluster sizes: 120, 9, 12, 59, 8, 29, 215, 65, 147, 7

Total cases: 671

23

Feature Extraction

Better to have a misdetection than a false alarm.

Filter the noisy data.

Work within the clades.

Locate the mutations (features) that are highly correlated with treatment.

Now we have only a few dozen features to work on.

24

Finding mutations and treatment correlation

For each RNA index i, we want to find whether P(Mut_in_i) is significantly different from P(Mut_in_i | Treatment).

We’ll use the Chi-square test at each index to find that.

25

Chi Square Overview

We will use the Chi-square test to check the probability that our observed results came from the same statistical population as the expected (chance) results.

A probability of less than 0.05 means that the results are significant, i.e. the populations are significantly different.

26

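Chi Square Calculations

Calculating the chi-square statistic from the observed counts O_i and the expected counts E_i in each of the k categories:

\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]

The probability Q that a \chi^2 value calculated for an experiment with d degrees of freedom (where d = k - 1) is due to chance is:

\[ Q(\chi^2, d) = \left[ 2^{d/2} \, \Gamma(d/2) \right]^{-1} \int_{\chi^2}^{\infty} t^{d/2 - 1} e^{-t/2} \, dt \]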

27

Example – Mutation V82A

Inputs: TreatedMut = 30, NonTreatedMut = 12, TotalTreated = 77, TotalUnTreated = 389

Observed   Treated   NonTreated   Total
Mut        30        12           42
NonMut     47        377          424
Total      77        389          466

Expected   Treated      NonTreated    Total
Mut        6.93991416   35.06008584   42
NonMut     70.0600858   353.9399142   424
Total                                 466

Chi Statistic: 10.71333

ChiVal: 9.751E-24
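The V82A example can be recomputed with a short sketch. It assumes a plain Pearson chi-square test on the 2x2 observed table with 1 degree of freedom and no continuity correction; the function name `chi_square_2x2` is hypothetical. Under these assumptions the p-value comes out on the order of 1e-23, in line with the ChiVal above; the "Chi Statistic" figure on the slide may have been derived differently and is not reproduced here.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value (1 degree of freedom) for a 2x2 table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of mutation and treatment.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value

# Observed counts from the V82A slide: rows = Mut / NonMut, columns = Treated / NonTreated.
observed = [[30, 12], [47, 377]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.3f}, p = {p_value:.3e}")
```

The expected counts computed inside the loop are the same ones listed in the Expected table (for example 42 * 77 / 466 for the Mut/Treated cell).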

28

Mutation Table

Clade  SequenceNum  Consensus  Mutation  AA pos  Consensus AA  Mut AA     ChiVal   TreatedMut  UntreatedMut
C      32           A          G         14      lys           arg        <0.05    44          33
C      39           G          A         16      Gly           Gly        <0.003   17          7
C      47           T          C         19      leu           pro        <0.05    45          38
C      50           A          G         20      lys           arg        <0.001   63          37
C      69           A          G         26      thr           thr        <0.05    9           4
C      96           A          C         35      glu           asp        <0.05    50          40
C      151          A          G         54      ileu          met (atg)  <0.001   22          10
C      155          A          G         55      ileu          val        <0.05    7           2
C      162          A          G         57      arg           arg        <0.05    7           2
C      203          C          T         71      ala           val        <0.001   20          8
C      211          A          T/G       74      thr           ser/ala    <0.001   57          39
C      217          C          T         76      leu           ser        <0.05    105         117
C      235          G          A         82      val           ile        <0.05    8           26
C      236          T          C         82      val           ala        <0.001   30          12
C      241          A          G         84      ileu          val        <0.02    12          6
C      259          T          A         90      leu           met        <0.001   41          22
C      276          C          T         95      cys           cys        <0.05    106         117

29

Calculating the mutations Correlations Matrix

Because the treatment is a major artifact in all the treatment mutations, we’ll have to find the correlations within the treated samples, comparing:

P(mut_A | Treat.) ~ P(mut_A | mut_B, Treat.)

Our Chi-square table will be (all in treated cases):

            Mut B   Non-Mut B   Total
Mut A
Non-Mut A
Total

30

Example correlation results

Total Samples: 466

Total Treated: 195

Columns: first site index (Site A), second site index (Site B), Chi Value, number of A mutations (A muts), number of B mutations (B muts), shared number of mutations (Shared), probability of mutation in A (P(A)), probability of mutation in A when B is also mutated (P(A|B)).

Site A   Site B   Chi Value   A muts   B muts   Shared   P(A)     P(A|B)
19       50       0           44       63       28       0.2256   0.4444
19       259      0           44       41       20       0.2256   0.4878
50       96       0.0001      63       50       27       0.3231   0.54
69       39       0.0001      9        17       4        0.0462   0.2353
151      203      0           22       20       12       0.1128   0.6
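The two probability columns follow directly from the counts, and the pairwise test can be sketched the same way. This is an illustration under assumptions, not the study's code: it builds the 2x2 table of mutation A versus mutation B among the treated samples and applies a Pearson chi-square test with 1 degree of freedom. The function name `cooccurrence_test` is hypothetical.

```python
import math

def cooccurrence_test(n_treated, n_a, n_b, n_shared):
    """Compare P(A) with P(A | B) among treated samples and test independence.

    Returns (p_a, p_a_given_b, chi_square, p_value) for the 2x2 table of
    mutation A versus mutation B, using 1 degree of freedom.
    """
    # Cells of the 2x2 table: rows = A mutated / not, columns = B mutated / not.
    table = [
        [n_shared, n_a - n_shared],
        [n_b - n_shared, n_treated - n_a - n_b + n_shared],
    ]
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_treated
            chi_square += (table[i][j] - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return n_a / n_treated, n_shared / n_b, chi_square, p_value

# Counts from the first row of the results (sites 19 and 50, 195 treated samples).
p_a, p_a_given_b, chi_square, p_value = cooccurrence_test(195, 44, 63, 28)
print(round(p_a, 4), round(p_a_given_b, 4), p_value < 0.001)
```

For this row the sketch reproduces P(A) = 44/195 and P(A|B) = 28/63 as listed, and the resulting p-value is small enough to display as 0 at four decimal places.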

31

Example – Mutation D30N

D30N is an important resistance mutation.

However, it appears at a frequency of 0.0258 in clade C compared with 0.0945 in clade B. What is the explanation for this?

Correlation analysis reveals that in clade B, D30N is highly correlated with other resistance mutations. In clade C it is not.

One hypothesis is that the clade B structure influences the connections between resistance mutations.

32

Using CART to find mutations interactions

A regression tree is a sequence of questions that can be answered as yes or no, plus a set of fitted response values. Each question asks whether a predictor satisfies a given condition.

In our research we will ask whether a mutation i (a value of 1 at index i) predicts the existence of mutation j (a value of 1 at index j).

This way we can identify relationships between the mutations.
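The question "does mutation i predict mutation j?" corresponds to a single yes/no split of a tree. The sketch below is a simplified stand-in for a full CART fit, not the tool used in the study: it scores only the root split, using Gini impurity (which for a 0/1 response is proportional to the variance a regression tree would reduce). The function names and the toy matrix are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def best_predictor(matrix, target):
    """Find the mutation index whose yes/no split best predicts column `target`.

    Returns (index, impurity_decrease); index is None if no split helps.
    """
    y = [row[target] for row in matrix]
    parent = gini(y)
    best_index, best_gain = None, 0.0
    for i in range(len(matrix[0])):
        if i == target:
            continue
        yes = [row[target] for row in matrix if row[i] == 1]
        no = [row[target] for row in matrix if row[i] == 0]
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(y)
        gain = parent - weighted
        if gain > best_gain:
            best_index, best_gain = i, gain
    return best_index, best_gain

# Hypothetical mutation matrix: column 2 is set exactly when column 0 is set.
matrix = [
    [1, 0, 1],
    [1, 1, 1],
    [0, 1, 0],
    [0, 0, 0],
]
index, gain = best_predictor(matrix, target=2)
print(index, gain)
```

A full tree would repeat the same search recursively inside each branch, which is how interactions among several mutations show up.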

33

CART results – D95M

34

Using clustering to find mutations patterns

We’ll cluster the mutation sample vectors in order to locate mutation patterns.

Our distance function will be the sum of differences between two samples.

We’ll use the Ward method to cluster nodes.
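For binary mutation vectors, the "sum of differences" can be read as the number of positions at which two samples disagree (the Hamming distance). That reading is an assumption, and the function names and toy vectors below are hypothetical. A minimal sketch of the pairwise distance matrix that a clustering routine would start from:

```python
def sample_distance(u, v):
    """Sum of absolute differences between two equal-length 0/1 vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))

def distance_matrix(samples):
    """Symmetric matrix of pairwise distances between all samples."""
    return [[sample_distance(u, v) for v in samples] for u in samples]

# Hypothetical mutation vectors for three samples.
samples = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
distances = distance_matrix(samples)

for row in distances:
    print(row)
```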

35

Ward Clustering

Centroid linkage uses the distance between the centroids of the two groups:

\[ d_{rs} = \lVert \bar{x}_r - \bar{x}_s \rVert_2 \]

where \bar{x}_r = \frac{1}{n_r} \sum_{i=1}^{n_r} x_{ri} is the centroid of the n_r objects in group r, and \bar{x}_s is defined similarly.

Ward linkage uses the incremental sum of squares; that is, the increase in the total within-group sum of squares as a result of joining groups r and s. It is given by

\[ d(r, s) = \frac{n_r \, n_s \, d_{rs}^2}{n_r + n_s} \]

where d_{rs} is the distance between cluster r and cluster s as defined in the Centroid linkage. The within-group sum of squares of a cluster is defined as the sum of the squares of the distances between all objects in the cluster and the centroid of the cluster.
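The Ward criterion can be checked numerically. The sketch below uses hypothetical toy clusters and assumes Euclidean geometry on the 0/1 vectors. It computes the increase in the within-group sum of squares in two ways: directly from the definition, and from the centroid distance scaled by n_r * n_s / (n_r + n_s).

```python
def centroid(cluster):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(cluster)
    return [sum(values) / n for values in zip(*cluster)]

def within_ss(cluster):
    """Sum of squared Euclidean distances from each vector to the cluster centroid."""
    c = centroid(cluster)
    return sum(sum((x - m) ** 2 for x, m in zip(vec, c)) for vec in cluster)

def ward_increment(r, s):
    """Increase in total within-group sum of squares caused by merging r and s."""
    cr, cs = centroid(r), centroid(s)
    d_sq = sum((a - b) ** 2 for a, b in zip(cr, cs))  # squared centroid distance
    return len(r) * len(s) * d_sq / (len(r) + len(s))

# Hypothetical clusters of binary mutation vectors.
r = [[1, 0, 1], [1, 0, 0]]
s = [[0, 1, 0], [0, 1, 1], [0, 0, 1]]
direct = within_ss(r + s) - within_ss(r) - within_ss(s)
formula = ward_increment(r, s)
print(direct, formula)
```

The two values agree up to floating-point rounding, which is why Ward linkage can be computed from centroids alone without revisiting every object at each merge.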

36

Cluster results

37

Using clustering to find mutations patterns

When we filter the mutations down to the significant ones only, we can see mutation patterns as a result of clustering.

[Figure: clustered mutation matrix, with axes labelled Mutations and Samples.]

38

What’s next?

Biological interpretation of the findings: locating amino-acid and protein functional changes. This may lead to a better understanding of resistance behavior.

Identifying new resistance mutations and specific treatment/resistance correlations.

Focusing on specific treatments and applying additional research in order to investigate the efficiency of such treatments.

39

The End!

Thank you for listening