Needleman-Wunsch algorithm harshita
NEEDLEMAN-WUNSCH ALGORITHM
HARSHITA BHAWSAR, M.SC LIFE SCIENCE
NIT ROURKELA
What is the Needleman-Wunsch algorithm?
The Needleman–Wunsch algorithm is an algorithm used in bioinformatics to align protein or nucleotide sequences.
It performs a global alignment on two sequences. The algorithm was developed by Saul B. Needleman and
Christian D. Wunsch and published in 1970. It is an example of dynamic programming and was one of the
first applications of dynamic programming to compare biological sequences.
Even for relatively short sequences, there are many possible alignments, and it would take a long time to assess each one
individually to find the best.
The Needleman-Wunsch algorithm saves us the trouble of assessing all the possible alignments to find the best one.
The N-W algorithm takes time proportional to n^2 to find the best alignment of two sequences that are both n letters long.
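To get a feel for how many alignments there are, here is a minimal sketch that counts them. It assumes every path through the alignment grid (pair two letters, or put a gap in either sequence) counts as one alignment; the function name count_alignments is only illustrative.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def count_alignments(m: int, n: int) -> int:
    """Count the paths through the alignment grid for lengths m and n."""
    if m == 0 or n == 0:
        return 1  # only gaps can be placed once one sequence is used up
    return (
        count_alignments(m - 1, n - 1)  # pair the two last letters
        + count_alignments(m - 1, n)    # last letter of sequence 1 against a gap
        + count_alignments(m, n - 1)    # last letter of sequence 2 against a gap
    )


for n in (2, 4, 8):
    print(n, count_alignments(n, n))
```

Under this counting, two sequences of only 4 letters already have 321 possible alignments, and the number grows very quickly with n, while the N-W matrix for the same pair has only 25 boxes.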
Alignment methods
Alignment: arranging DNA/RNA or protein sequences to identify similarities.
2 types: global and local sequence alignment methods
Global: Needleman-Wunsch algorithm
Local: Smith-Waterman algorithm
These two dynamic programming alignment algorithms are guaranteed to give OPTIMAL alignments for a given scoring scheme.
Goals of sequence alignment
Measure the similarity
Observe patterns of sequence conservation between related biological species and variability of sequences over time.
Infer evolutionary relationships.
Algorithm
Steps
1. Initialization
2. Matrix fill or scoring
3. Traceback and alignment
RULES
Put the gap symbol (-) at the start of each sequence.
Fill the first column and last row with gap values.
For every other box, take the largest of:
Value of box beside + gap value
Value of box below + gap value
Diagonal value + {match/mismatch}
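The rule for a single box can be sketched in a few lines of Python. This is only an illustration: the name cell_score is made up here, and the scoring values (match +1, mismatch -1, gap -2) are assumed from the worked example in these slides.

```python
MATCH, MISMATCH, GAP = 1, -1, -2  # assumed: the values used in the worked example


def cell_score(beside: int, below: int, diagonal: int, a: str, b: str) -> int:
    """Score one box from its three neighbours and the two letters it pairs."""
    substitution = MATCH if a == b else MISMATCH
    return max(
        beside + GAP,             # come from the box beside: gap
        below + GAP,              # come from the box below: gap
        diagonal + substitution,  # come from the diagonal: match or mismatch
    )


# First box of the example: neighbours -2 and -2, diagonal 0, letters G and G.
print(cell_score(-2, -2, 0, "G", "G"))  # 1
```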
Let's see an example. TWO SEQUENCES WILL BE ALIGNED:
GATC (sequence 1)
GAGC (sequence 2)
Initialization
Create a matrix with M + 1 columns and N + 1 rows.
M = length of sequence 1
N = length of sequence 2
C
G
A
G
-    0
     -    G    A    T    C
Matrix Fill
Fill the first column and last row with gap values. For match = +1; mismatch = -1; gap = -2.
We fill in the values by adding the gap value to the value of the box beside.
C
G
A
G
-    0   -2   -4
     -    G    A    T    C
For match = +1; mismatch = -1; gap = -2
C   -8
G   -6
A   -4
G   -2
-    0   -2   -4   -6   -8
     -    G    A    T    C
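The initialization can be sketched as follows. Note one assumption: the code stores the gap row as row 0 (the top of the list), whereas the slides draw it at the bottom; the values are the same. The name init_matrix is illustrative.

```python
GAP = -2  # gap value from the example


def init_matrix(seq1: str, seq2: str) -> list[list[int]]:
    """Create an (N + 1) x (M + 1) matrix with the gap row and gap column filled."""
    cols = len(seq1) + 1  # M + 1 columns, sequence 1 runs along the columns
    rows = len(seq2) + 1  # N + 1 rows, sequence 2 runs along the rows
    matrix = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        matrix[0][j] = j * GAP  # gap row: 0, -2, -4, ...
    for i in range(rows):
        matrix[i][0] = i * GAP  # gap column: 0, -2, -4, ...
    return matrix


for row in init_matrix("GATC", "GAGC"):
    print(row)
```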
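Scoring
Parameters: match = +1; mismatch = -1; gap = -2
Each box takes the largest of:
Value of box beside + gap value
Value of box below + gap value
Diagonal value + {match/mismatch}
For the first box (G, G): beside = -2 + (-2) = -4; below = -2 + (-2) = -4; diagonal = 0 + 1 = 1. The box gets 1.
C   -8
G   -6
A   -4
G   -2    1
-    0   -2   -4   -6   -8
     -    G    A    T    C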
Scoring: match = +1; mismatch = -1; gap = -2
C   -8
G   -6
A   -4
G   -2    1   -1   -3   -5
-    0   -2   -4   -6   -8
     -    G    A    T    C
Continuing the procedure…
match = +1; mismatch = -1; gap = -2
C   -8   -5   -2   -1    2
G   -6   -3    0    1   -1
A   -4   -1    2    0   -2
G   -2    1   -1   -3   -5
-    0   -2   -4   -6   -8
     -    G    A    T    C
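The whole matrix fill can be reproduced with a short sketch, assuming the same scoring values. The function name fill_matrix is illustrative; the code keeps the gap row as row 0 and prints the rows in reverse so the output matches the layout of the slides.

```python
def fill_matrix(seq1: str, seq2: str, match: int = 1, mismatch: int = -1,
                gap: int = -2) -> list[list[int]]:
    """Fill the Needleman-Wunsch score matrix (sequence 1 along the columns)."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    f = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        f[0][j] = j * gap  # gap row
    for i in range(1, rows):
        f[i][0] = i * gap  # gap column
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            f[i][j] = max(
                f[i][j - 1] + gap,    # box beside
                f[i - 1][j] + gap,    # box below (in the slides' layout)
                f[i - 1][j - 1] + s,  # diagonal box
            )
    return f


matrix = fill_matrix("GATC", "GAGC")
# Print with the gap row at the bottom, as drawn in the slides.
for label, row in zip(reversed("-GAGC"), reversed(matrix)):
    print(label, " ".join(f"{value:3d}" for value in row))
```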
Traceback Step
After scoring is done, the maximum global alignment score is in the last box filled, the corner opposite the starting 0.
It may be negative or positive. The traceback step determines the actual alignment(s) that
result in the maximum score. In this step we work back towards the starting 0. Since we have kept pointers to all the predecessors, the traceback step becomes simple.
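We follow the pointers from the final score back to the starting 0. The boxes on the path are marked with *:
C    -8    -5    -2    -1    2*
G    -6    -3     0    1*    -1
A    -4    -1    2*     0    -2
G    -2    1*    -1    -3    -5
-    0*    -2    -4    -6    -8
      -     G     A     T     C
Every step is diagonal, so no gaps are needed. The optimal alignment is:
GAGC
GATC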
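The traceback can be sketched as below. One simplification, stated as an assumption: instead of storing pointers during the fill, this sketch re-checks which neighbour produced each score, which gives the same path. It returns a single optimal alignment, preferring the diagonal when there is a tie. The name needleman_wunsch is illustrative.

```python
def needleman_wunsch(seq1: str, seq2: str, match: int = 1, mismatch: int = -1,
                     gap: int = -2) -> tuple[int, str, str]:
    """Return (score, aligned seq1, aligned seq2) for one optimal global alignment."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    f = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        f[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            f[i][j] = max(f[i][j - 1] + gap, f[i - 1][j] + gap, f[i - 1][j - 1] + s)

    # Traceback: start at the final score and walk back to the starting 0.
    i, j = rows - 1, cols - 1
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            if f[i][j] == f[i - 1][j - 1] + s:  # diagonal: pair two letters
                top.append(seq1[j - 1])
                bottom.append(seq2[i - 1])
                i, j = i - 1, j - 1
                continue
        if j > 0 and f[i][j] == f[i][j - 1] + gap:  # gap in sequence 2
            top.append(seq1[j - 1])
            bottom.append("-")
            j -= 1
        else:  # gap in sequence 1
            top.append("-")
            bottom.append(seq2[i - 1])
            i -= 1
    return f[-1][-1], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("GATC", "GAGC"))  # (2, 'GATC', 'GAGC')
```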
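Other example: AGC and AACC
To read off the alignment we look at the pointers: a diagonal pointer pairs two letters of the sequences; a horizontal or vertical pointer gives a gap.
We get 4 optimal alignments, each with score -1:
A-GC    AG-C    -AGC    AGC-
AACC    AACC    AACC    AACC
In this matrix the starting 0 is at the top left:
     -    A    G    C
-    0   -2   -4   -6
A   -2    1   -1   -3
A   -4   -1    0   -2
C   -6   -3   -2    1
C   -8   -5   -4   -1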
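When several pointers lead out of a box, each branch gives a different optimal alignment. A minimal sketch that follows every branch is shown here, assuming the same scoring values as before; the name all_optimal_alignments is illustrative, and the ties are found by re-checking the neighbours rather than from stored pointers.

```python
def all_optimal_alignments(seq1: str, seq2: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2):
    """Return (score, list of (aligned seq1, aligned seq2)) for all optimal paths."""
    rows, cols = len(seq2) + 1, len(seq1) + 1
    f = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        f[0][j] = j * gap
    for i in range(1, rows):
        f[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            f[i][j] = max(f[i][j - 1] + gap, f[i - 1][j] + gap, f[i - 1][j - 1] + s)

    results = []

    def walk(i: int, j: int, top: str, bottom: str) -> None:
        # top and bottom are built backwards and reversed at the starting 0.
        if i == 0 and j == 0:
            results.append((top[::-1], bottom[::-1]))
            return
        if i > 0 and j > 0:
            s = match if seq1[j - 1] == seq2[i - 1] else mismatch
            if f[i][j] == f[i - 1][j - 1] + s:  # diagonal pointer
                walk(i - 1, j - 1, top + seq1[j - 1], bottom + seq2[i - 1])
        if j > 0 and f[i][j] == f[i][j - 1] + gap:  # gap in sequence 2
            walk(i, j - 1, top + seq1[j - 1], bottom + "-")
        if i > 0 and f[i][j] == f[i - 1][j] + gap:  # gap in sequence 1
            walk(i - 1, j, top + "-", bottom + seq2[i - 1])

    walk(rows - 1, cols - 1, "", "")
    return f[-1][-1], results


score, alignments = all_optimal_alignments("AGC", "AACC")
print("score:", score)
for top, bottom in alignments:
    print(top)
    print(bottom)
    print()
```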
Checking
We can also check whether our alignment is right by scoring it manually.
E.g.:
GAGC                    A-GC
GATC                    AACC
+1 +1 -1 +1 = 2         +1 -2 -1 +1 = -1
This score must equal the maximum score found in the traceback. If it does, the alignment is optimal.
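The manual check can also be written as a small helper. This is a sketch under the same scoring assumptions (match +1, mismatch -1, gap -2); the name alignment_score is illustrative.

```python
def alignment_score(top: str, bottom: str, match: int = 1, mismatch: int = -1,
                    gap: int = -2) -> int:
    """Score an alignment column by column; '-' marks a gap."""
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    total = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            total += gap
        elif a == b:
            total += match
        else:
            total += mismatch
    return total


print(alignment_score("GAGC", "GATC"))  # 2
print(alignment_score("A-GC", "AACC"))  # -1
```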