RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Jia-Ming Chang 0509 Graph Algorithms and Their Applications to Bioinformatics 1/40

Upload: jia-ming-chang

Post on 27-Jun-2015


TRANSCRIPT

Page 1: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Jia-Ming Chang 0509 Graph Algorithms and Their Applications to Bioinformatics

1/40

Page 2: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

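Determine Protein Structure

X-ray
• Wavelength of about 1 Å, close to the distance between atoms
• Studies the behavior of molecules in the crystalline state
• Determines crystal structures, including protein structures

X-ray and structural biology: X-ray diffraction is used to determine the spatial position of every group and atom in a highly purified, crystallized protein.

Nuclear magnetic resonance (NMR)
NMR is a process that involves absorption by atomic nuclei. Certain nuclei have spin and a magnetic moment, so when they are exposed to a strong magnetic field they absorb electromagnetic radiation, as a result of the energy-level splitting induced by the field. Scientists also found that the molecular environment affects how nuclei in a magnetic field absorb radio waves, and this property is used to analyze molecular structure.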

AVANCE 800 AV IBMS, Sinica 2/40

Page 3: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

NMR – Nuclear Spin (1/5)

3/40

Page 4: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

NMR – Nuclear Spin (2/5)

4/40

Page 5: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

NMR - Magnetic Field (3/5)

5/40

Page 6: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

NMR – Resonance (4/5)

6/40

Page 7: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

NMR – Chemical Shift (5/5)

7/40

Page 8: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Chemical Shift Assignment (1/2)

Find out the chemical shift for each atom
• Backbone: Ca, Cb, C’, N, NH
• Spectra: HSQC, CBCANH, CBCA(CO)NH

[Figure: the atoms of one amino acid (N, H, C, and CH2/CH3 groups)]

8/40

Page 9: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Chemical Shift Assignment (2/2)

[Figure: a peptide backbone (-N-C-C-N-C-C-N-C-C-N-C-C-) with its side chains, annotated with chemical shift ranges in ppm: 18-23, 19-24, 16-20, 17-23, 31-34, 55-60; CH3 30-35]

9/40

Page 10: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

HSQC Spectra

HSQC peaks (1 chemical shift for one amino acid)

H       N        Intensity
8.109   118.60   65920032

HSQC

10/40

Page 11: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

CBCA(CO)NH Spectra

CBCA(CO)NH peaks (2 chemical shifts for one amino acid)

H       N        C       Intensity
8.116   118.25   16.37   79238811
8.109   118.60   36.52   65920032

11/40

Page 12: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

CBCANH Spectra

CBCANH peaks (4 chemical shifts for one amino acid)

Sign of the intensity: Ca (+), Cb (-)

H       N        C       Intensity
8.116   118.25   16.37   79238811
8.109   118.60   36.52   -65920032
8.117   118.90   61.58   -51223894
8.119   117.25   57.42   109928374
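The sign convention above (Ca positive, Cb negative) can be sketched in a few lines. This is only an illustration using the four peaks listed on this slide; the `Peak` type and the `carbon_type` helper are hypothetical names, not part of RIBRA.

```python
from typing import NamedTuple


class Peak(NamedTuple):
    """One CBCANH peak: H, N and C chemical shifts (ppm) plus intensity."""
    h: float
    n: float
    c: float
    intensity: float


def carbon_type(peak: Peak) -> str:
    """Label a CBCANH peak by the sign of its intensity: Ca (+), Cb (-)."""
    return "Ca" if peak.intensity > 0 else "Cb"


# The four peaks shown on the slide.
cbcanh_peaks = [
    Peak(8.116, 118.25, 16.37, 79238811),
    Peak(8.109, 118.60, 36.52, -65920032),
    Peak(8.117, 118.90, 61.58, -51223894),
    Peak(8.119, 117.25, 57.42, 109928374),
]

labels = [carbon_type(p) for p in cbcanh_peaks]
```

A real peak picker would also need an intensity threshold to discard noise; that step is left out here.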

12/40

Page 13: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

A Dataset Example

[Figure: example spectra with axes N and H; panels HSQC, HNCACB, and CBCA(CO)NH]

13/40

Page 14: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

A Perfect Spin System Group

CBCA(CO)NH peaks

N         H       C        Intensity       Carbon
113.293   7.897   56.294   1.64325e+008    Ca(i-1)
113.293   7.897   27.853   1.08099e+008    Cb(i-1)

CBCANH peaks

N         H      C        Intensity        Carbon
113.293   7.92   62.544   8.52851e+007     Ca(i)
113.293   7.92   56.294   4.71331e+007     Ca(i-1)
113.293   7.92   68.483   -8.54121e+007    Cb(i)
113.293   7.92   28.165   -3.49346e+007    Cb(i-1)

Resulting spin system

Ca(i-1)   Cb(i-1)   Ca(i)    Cb(i)
56.294    28.165    62.544   68.483
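The grouping on this slide can be sketched as follows: carbons that also appear in CBCA(CO)NH belong to residue i-1, the rest belong to residue i, and the sign of the CBCANH intensity separates Ca from Cb. The function name and the 0.5 ppm tolerance are assumptions made for illustration, not values taken from RIBRA.

```python
TOL = 0.5  # ppm; hypothetical tolerance for matching carbon shifts


def build_spin_system(cbcaconh, cbcanh, tol=TOL):
    """Combine two peak lists that share one (N, H) root into a spin system.

    cbcaconh: carbon shifts seen in CBCA(CO)NH (residue i-1 only).
    cbcanh:   (carbon shift, intensity) pairs from CBCANH (residues i-1 and i).
    """
    spin_system = {}
    for shift, intensity in cbcanh:
        kind = "Ca" if intensity > 0 else "Cb"          # sign convention
        is_prev = any(abs(shift - c) <= tol for c in cbcaconh)
        spin_system[f"{kind}(i-1)" if is_prev else f"{kind}(i)"] = shift
    return spin_system


# Carbon shifts and intensities from the slide.
cbcaconh = [56.294, 27.853]
cbcanh = [
    (62.544, 8.52851e7),
    (56.294, 4.71331e7),
    (68.483, -8.54121e7),
    (28.165, -3.49346e7),
]
spin_system = build_spin_system(cbcaconh, cbcanh)
```

With the slide's data this reproduces the spin system shown above. With missing or extra peaks the same rule gives incomplete or ambiguous results, which is the situation the later slides address.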

14/40

Page 15: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Coding

Translate the target protein sequence and spin systems into coding sequences based on the following table.

Atreya, H.S., K.V.R. Chary, and G. Govil, Automated NMR assignments of proteins for high throughput structure determination: TATAPRO II. Current Science, 2002. 83(11): p. 1372-1376.

15/40

Page 16: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Backbone Assignment

Goal
Assign chemical shifts to N, NH, Ca (and Cb) along the protein backbone.

General approaches
Generate spin systems
○ A spin system: an amino acid with known chemical shifts on its N, NH, Ca (and Cb).
Link spin systems

16/40

Page 17: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

17 /40

Backbone Assignment

DGRIGEIKGRKTLATPAVRRLAMENNIKLS

Page 18: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

18 /40

Blind Men’s Elephant

We cannot directly “see” the positions of these atoms (the 3D structure).

But we can measure a set of parameters (with constraints) on these atoms, which can help us infer their coordinates.

Each experiment can only determine a subset of the parameters (with noise).

To combine the parameters of different experiments, we need to stitch them together.

Page 19: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

A Peculiar Parking Lot (valet parking)

Information you have: the make of your car and the car parked in front of you (approximately).

Together with others, try to identify as many cars as possible (maximizing the overall satisfaction).

19 /43

Page 20: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Ambiguities

All 4 point experiments are mixed together

All 2 point experiments are mixed together

Each spin system can be mapped to several amino acids in the protein sequence

False positives, false negatives

20/40

Page 21: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Multiple Candidates

One spin system may be assigned to many places in a protein sequence.

Spin system (SS)

N       H      Ca(i-1)   Cb(i-1)   Ca(i)   Cb(i)
119.7   8.84   58.4      32.7      56.3    40.8

Protein Sequence: AKFERQHMDSSTSRNLTKDR

[Figure: four possible places for the spin system, each marked SS, along the protein sequence]

21/40

Page 22: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

False Positives and False Negatives

False positives
• Noise with high intensity
• Produce fake spin systems

False negatives
• Peaks with low intensity
• Missing peaks

In real wet-lab data, nearly 50% is noise (false positives).

22/40

Page 23: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

False Positive & False Negative

[Figure: spectra with axes N and H; panels HSQC, HNCACB, and CBCA(CO)NH; cases labeled Perfect, False Negative, and False Positive]

23/40

Page 24: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Ambiguous Spin System

N       H      C       Intensity
106.9   8.87   54.92   423879
106.9   8.87   40.35   524522

N        H      C       Intensity
106.91   8.85   59.7    235673
106.92   8.86   54.93   346234
106.91   8.86   61.5    432432
106.91   8.85   40.31   -335759
106.92   8.86   30.5    -483759

Two possible spin systems

N       H      Ca(i-1)   Cb(i-1)   Ca(i)   Cb(i)
106.1   8.85   54.93     40.31     59.7    30.5
106.1   8.85   61.5      40.31     59.7    30.5

24/40

Page 25: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Spin System Group

Nearest Neighboring (TATAPRO, RIBRA, GASA)

[Figure: spectra with axes N and H; panels HSQC, HNCACB, and CBCA(CO)NH]

25/40

Page 26: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Spin System Linking

Goal
Link spin systems into chains that are as long as possible.

Constraints
• Each spin system is uniquely assigned to a position of the target protein sequence.
• Two spin systems are linked only if the differences between the intra-residue chemical shifts of one and the inter-residue (i-1) chemical shifts of the other are less than the predefined thresholds.
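The linking constraint can be sketched as a threshold test on carbon shifts. The `SpinSystem` class, the `can_link` helper, and the 0.4 ppm thresholds are illustrative assumptions; the two example spin systems reuse values from the Spin System Positioning slide later in the deck.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SpinSystem:
    ca_prev: float  # Ca of residue i-1 (inter-residue)
    cb_prev: float  # Cb of residue i-1 (inter-residue)
    ca: float       # Ca of residue i (intra-residue)
    cb: float       # Cb of residue i (intra-residue); 0 when absent


CA_TOL = 0.4  # ppm; hypothetical threshold
CB_TOL = 0.4  # ppm; hypothetical threshold


def can_link(a: SpinSystem, b: SpinSystem,
             ca_tol: float = CA_TOL, cb_tol: float = CB_TOL) -> bool:
    """True if b may follow a: a's own shifts match b's (i-1) shifts."""
    return (abs(a.ca - b.ca_prev) <= ca_tol
            and abs(a.cb - b.cb_prev) <= cb_tol)


first = SpinSystem(55.266, 38.675, 44.555, 0.0)
second = SpinSystem(44.417, 0.0, 55.043, 30.04)

forward = can_link(first, second)   # 44.555 vs 44.417 and 0 vs 0
backward = can_link(second, first)  # Cb mismatch: 30.04 vs 38.675
```

Linking is directional: `first` can precede `second`, but not the other way around.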

26/40

Page 27: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Previous Approaches

Constrained bipartite matching problem*

Cannot deal with ambiguous links

[Figure: a legal matching and an illegal matching under constraints]

*Xu Y, Xu D, Kim D, Olman V, Razumovskaya J, Jiang T. Automated assignment of backbone NMR peaks using constrained bipartite matching. Computing in Science & Engineering 2002;4(1):50-62.

27/40

Page 28: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

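Natural Language Processing: Noise or Ambiguity?

Speech recognition: homophone selection

Sentence: 台北市一位小孩走失了 ("A child in Taipei City has gone missing")

Candidate words: 台北市 (Taipei City), 台北 (Taipei), 適宜 (suitable), 事宜 (matters), 一位 (one person), 一味 (blindly), 移位 (shift position), 小孩 (child), 走失 (go missing)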

28/40

Page 29: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

An Error-Tolerant Algorithm

29/40

Page 30: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Phrase, Sentence Combination

30/40

Page 31: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Spin System Positioning

Residue codes: D 50, G 10, R 40, I 50|51

Spin systems and their codes

Ca(i-1)   Cb(i-1)   Ca(i)    Cb(i)        Codes (i-1, i)
55.266    38.675    44.555   0        =>  50 10
44.417    0         55.043   30.04    =>  10 40
44.417    0         30.665   28.72    =>  10 40
55.356    29.782    60.044   37.541   =>  40 50

We assign spin system groups to a protein sequence according to their codes.
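Positioning by codes can be sketched as a lookup of adjacent residue pairs. The code table below contains only the four residues shown on this slide, and "I 50|51" is read here as "either code is admissible", which is an assumption. `candidate_positions` is a hypothetical helper name.

```python
# Residue codes from the slide; "I 50|51" is treated as two admissible codes.
RESIDUE_CODES = {"D": {50}, "G": {10}, "R": {40}, "I": {50, 51}}


def candidate_positions(sequence, code_pair, codes=RESIDUE_CODES):
    """Return 0-based indices i where residues i-1 and i match the code pair."""
    prev_code, own_code = code_pair
    return [
        i
        for i in range(1, len(sequence))
        if prev_code in codes[sequence[i - 1]] and own_code in codes[sequence[i]]
    ]


# Code pairs of the spin systems on the slide, placed on the fragment DGRI.
positions = {
    pair: candidate_positions("DGRI", pair)
    for pair in [(50, 10), (10, 40), (40, 50)]
}
```

On a full-length sequence a code pair usually matches several positions, which is why one spin system can have multiple candidate places.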

31/40

Page 32: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Link Spin System groups

55.266   38.675   44.555   0
44.417   0        55.043   30.04
44.417   0        30.665   28.72
55.356   29.782   60.044   37.541

[Figure: the spin system groups placed along D G R I and linked into Segment 1, Segment 2, and Segment 3]

32/40

Page 33: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Iterative Concatenation

[Figure: spin systems and segments placed along the sequence DGRI….FKJJREKL and concatenated step by step]

Step 1: spin systems (1, 2, …, 56)
Step 2: Segment 1, Segment 2, …
Step n-1: Segment 78, Segment 79, …
Step n: Segment 99

33/40

Page 34: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Conflict Segments

[Figure: Segments 71, 78, 79, 97, 98, and 99 placed along the sequence DGRIGEIKGRKTLATPAVRRLAMENNIKLS]

Two kinds of conflict segments
• Overlap (e.g. segment 71 and segment 99)
• Use of the same spin system (e.g. both segment 78 and segment 79 contain spin system 1)
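The two kinds of conflict can be sketched as one predicate. The `Segment` class, the positions, and the spin system ids below are made up for illustration (the slide does not give them); only the two rules come from the slide.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    name: str
    start: int           # 0-based sequence position of the first spin system
    spin_systems: tuple  # spin system ids, one per covered position

    @property
    def end(self) -> int:
        """Exclusive end position on the sequence."""
        return self.start + len(self.spin_systems)


def in_conflict(a: Segment, b: Segment) -> bool:
    """Segments conflict if they overlap or share a spin system."""
    overlap = a.start < b.end and b.start < a.end
    shared = bool(set(a.spin_systems) & set(b.spin_systems))
    return overlap or shared


# Hypothetical placements mirroring the slide's two examples.
seg71 = Segment("71", 0, (5, 6, 7))
seg99 = Segment("99", 2, (8, 9))    # overlaps segment 71 at position 2
seg78 = Segment("78", 10, (1, 2))
seg79 = Segment("79", 20, (1, 3))   # reuses spin system 1
```

Here `in_conflict(seg71, seg99)` holds because of overlap, `in_conflict(seg78, seg79)` holds because of the shared spin system, and `in_conflict(seg71, seg78)` does not hold.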

34/40

Page 35: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Independent Set

Subset S of vertices such that no two vertices in S are connected

www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt 35/40

Page 36: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Independent Set

Subset S of vertices such that no two vertices in S are connected

www.cs.rochester.edu/~stefanko/Teaching/06CS282/06-CSC282-17.ppt 36/40

Page 37: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

A Graph Model for Spin System Linking

G(V, E)
V: a set of nodes (segments).
E: {(u, v) | u, v ∈ V, u and v are in conflict}.

Goal
Assign as many non-conflicting segments as possible => find the maximum independent set of G.
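For intuition, the maximum independent set of a tiny conflict graph can be found by exhaustive search. This is a minimal sketch with a made-up four-node graph; it takes exponential time and is not the method used in RIBRA, which relies on an approximation algorithm.

```python
from itertools import combinations


def is_independent(nodes, edges):
    """True if no edge has both endpoints inside `nodes`."""
    return not any(u in nodes and v in nodes for u, v in edges)


def maximum_independent_set(vertices, edges):
    """Try subsets from largest to smallest; the empty set always qualifies."""
    for size in range(len(vertices), -1, -1):
        for subset in combinations(vertices, size):
            if is_independent(set(subset), edges):
                return set(subset)


# Hypothetical conflict graph: nodes are segments, edges are conflicts.
vertices = ["s1", "s2", "s3", "s4"]
edges = [("s1", "s2"), ("s3", "s4"), ("s2", "s3")]
best = maximum_independent_set(vertices, edges)
```

For this graph the search returns a set of two segments that can be assigned together without conflict.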

37/40

Page 38: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

An Example of G

Seq.: GEIKGRKTLATPAVRRLAMENNIKLSE

Segment1: SP12->SP13->SP14

Segment2: SP9->SP13->SP20->SP4

Segment3: SP8->SP15->SP21

Segment4: SP7->SP1->SP15->SP3

[Figure: the four segments placed on the sequence, and the conflict graph G on Seg1, Seg2, Seg3, and Seg4, with edges labeled SP13 and SP15 (shared spin systems) and two edges labeled Overlap]

38/40

Page 39: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Segment weight

The longer a segment is, the higher its weight.

The less frequently a segment occurs, the higher its weight.
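The slide gives only the two monotonic rules, not a formula. As a purely hypothetical illustration, the ratio below satisfies both rules; `segment_weight` and the meaning of `frequency` as a positive occurrence count are assumptions, not the weighting used in RIBRA.

```python
def segment_weight(length: int, frequency: int) -> float:
    """Hypothetical weight: grows with length, shrinks with frequency."""
    if length < 1 or frequency < 1:
        raise ValueError("length and frequency must be positive")
    return length / frequency


longer = segment_weight(4, 1)    # longer segment, seen once
shorter = segment_weight(3, 1)   # shorter segment, seen once
frequent = segment_weight(4, 2)  # same length, seen twice
```

Any function that increases with length and decreases with frequency would fit the slide equally well.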

39/40

Page 40: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Find Maximum Weight Independent Set of G (1/2)

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).

[Figure: the vertex set V around a vertex v, with parts labeled N(v) and Head_N(v)]

40/40

Page 41: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

Find Maximum Weight Independent Set of G (2/2)

Boppana, R. and M.M. Halldórsson, Approximating Maximum Independent Sets by Excluding Subgraphs. BIT, 1992. 32(2).

V
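To make the weighted problem concrete, here is a simple greedy heuristic: repeatedly take the heaviest remaining segment and discard its neighbors. This is not the Boppana and Halldórsson algorithm cited on these slides; it is a swapped-in baseline for illustration. The example edges are the two shared-spin-system conflicts from the "An Example of G" slide (SP13 and SP15); the overlap edges are omitted because they cannot be recovered from the transcript, and the weights are hypothetical (segment lengths).

```python
def greedy_weighted_independent_set(weights, edges):
    """Greedy heuristic for a heavy independent set.

    weights: {vertex: weight}; edges: iterable of (u, v) conflict pairs.
    Returns the chosen vertices in the order they were picked.
    """
    neighbors = {v: set() for v in weights}
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)

    chosen, blocked = [], set()
    # Heaviest first; ties broken by name so the result is deterministic.
    for v in sorted(weights, key=lambda x: (-weights[x], str(x))):
        if v not in blocked:
            chosen.append(v)
            blocked |= neighbors[v]
    return chosen


weights = {"Seg1": 3.0, "Seg2": 4.0, "Seg3": 3.0, "Seg4": 4.0}
edges = [("Seg1", "Seg2"), ("Seg3", "Seg4")]
selected = greedy_weighted_independent_set(weights, edges)
```

The greedy choice carries no optimality guarantee in general, which is one reason to prefer an algorithm with a proven approximation bound.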

41/40

Page 42: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

An Iterative Approach

We perform spin system generation and linking iteratively.

Three stages:
• Perfect spin systems
• Weak false negative spin systems
• Severe false negative spin systems

42/40

Page 43: RIBRA–an error-tolerant algorithm for the NMR backbone assignment problem

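Segment Extension

[Figure: segments chosen by the maximum independent set (MaxIndSet) placed along the sequence (extracted as DGR GEKGRKTLATPAVRRLAMENNIKLS) and extended, e.g. 99 into 99‘ and 97 into 97‘; other segment labels shown: 23, 24, 26, 27, 28, 29, 31, 32, 33, 45, 71, 77, 78]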

43/40