Page 1: Reconstruction of DNA sequencing by hybridization Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang ZHANGroup@aporc.org Institute of Applied Mathematics,

Reconstruction of DNA sequencing by hybridization

Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang

ZHANGroup@aporc.org
Institute of Applied Mathematics, AMSS, CAS


Bioinformatics

Human Genome Project
Large molecule data in biology, such as DNA and protein
Knowledge of mathematics, computer science, information science, physics, system science, management science as well as biology
Genomics
  DNA sequencing
  Gene prediction
  Sequence alignment


DNA Sequencing

…ACGTGACTGAGGACCGTGCGACTGAGACTGACTGGGTCTAGCTAGACTACGTTTTATATATATATACGTCGTCGTACTGATGACTAGATTACAGACTGATTTAGATACCTGACTGATTTTAAAAAAATATT…


DNA Sequencing (shotgun)

Target DNA is cut many times at random
Forward-reverse linked reads (~500 bp each) separated by a known distance


DNA Sequencing (SBH)

DNA array (DNA chip) with 4^3 = 64 probes
Target DNA: AAATGCG


Sequencing by Hybridization

Hybridize the target to an array containing a spot for each possible k-tuple (k-mer)
The spectrum of a sequence: the multiset of all its k-long substrings (k-tuples)
Goal: reconstruct the sequence from its spectrum
Pevzner (1989): reconstruction is polynomial
But …
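
The definition of the spectrum can be made concrete with a minimal sketch. The function name `spectrum` is an illustrative choice, not something taken from the slides; the target AAATGCG and k = 3 are the values used on the DNA chip slide.

```python
from collections import Counter

def spectrum(sequence: str, k: int) -> Counter:
    """Return the multiset of all k-long substrings (k-tuples) of sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))

spec = spectrum("AAATGCG", 3)
print(sorted(spec.elements()))  # ['AAA', 'AAT', 'ATG', 'GCG', 'TGC']
```

A `Counter` is used so that a k-tuple occurring several times in the target keeps its multiplicity, which matches the multiset definition above.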


Uniqueness of Reconstruction

Different sequences can have the same spectrum
Example: ACTAC and TACTA both have the spectrum {ACT, CTA, TAC}
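
The ambiguity can be checked by brute force. The sketch below enumerates every sequence whose k-tuple multiset equals a given spectrum; `reconstructions` is a hypothetical helper written for this illustration, and the search is exponential, so it only suits tiny inputs.

```python
from collections import Counter

def reconstructions(spec: Counter, k: int) -> list[str]:
    """Enumerate every sequence whose multiset of k-tuples equals spec."""
    total = sum(spec.values())
    remaining = Counter(spec)   # k-tuples not yet consumed
    found = set()

    def extend(seq: str, used: int) -> None:
        if used == total:
            found.add(seq)
            return
        for base in "ACGT":
            # The next k-tuple is the last k-1 letters plus one new base.
            nxt = (seq[-(k - 1):] if k > 1 else "") + base
            if remaining[nxt] > 0:
                remaining[nxt] -= 1
                extend(seq + base, used + 1)
                remaining[nxt] += 1

    for start in list(spec):
        remaining[start] -= 1
        extend(start, 1)
        remaining[start] += 1
    return sorted(found)

print(reconstructions(Counter(["ACT", "CTA", "TAC"]), 3))
# ['ACTAC', 'CTACT', 'TACTA']
```

Besides the two sequences named above, the enumeration also returns CTACT, which has the same three 3-tuples.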

Non-uniqueness Probability


Experiment Errors

Hybridization experiments are error prone
False negative error
  A k-tuple appears in the target DNA but does not appear in its measured spectrum
  Repetition of a k-tuple
False positive error
  A k-tuple does not appear in the target DNA but does appear in its measured spectrum


Sequencing by Hybridization

Target DNA: ……TTTTACGC……
Errors: positive (misread) / negative (missing, repetition)
Spectrum (ideal case): TTT TTT TTA TAC ACG CGC
Spectrum (with errors): TTT TTT TTA TAC ACG CGC TGA
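
A small sketch of how the error types could be told apart when the target is known, as in a simulation. It assumes the chip reports only presence or absence of each k-tuple, so the measured spectrum is a set. The function name `classify_errors` and the measured set below (TAC dropped, TGA added) are hypothetical and chosen only to exercise each case.

```python
from collections import Counter

def classify_errors(target: str, measured: set[str], k: int):
    """Compare the ideal spectrum of target with a measured set of k-tuples."""
    ideal = Counter(target[i:i + k] for i in range(len(target) - k + 1))
    # Negative error (missing): in the target but absent from the measurement.
    missing = sorted(t for t in ideal if t not in measured)
    # Negative error (repetition): detected, but its multiplicity is lost.
    repeated = sorted(t for t, c in ideal.items() if c > 1 and t in measured)
    # Positive error: measured although it never occurs in the target.
    false_positive = sorted(measured - set(ideal))
    return missing, repeated, false_positive

measured = {"TTT", "TTA", "ACG", "CGC", "TGA"}  # hypothetical measurement
print(classify_errors("TTTTACGC", measured, 3))
# (['TAC'], ['TTT'], ['TGA'])
```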


SBH Reconstruction Problem

In the case of error-free SBH experiments
  A desired solution of SBH is just a feasible solution including all k-tuples in the spectrum
For the general case
  There is no additional information except the spectrum and the length of the target DNA
  A feasible solution composed of a maximum cardinality subset of the spectrum is a reasonable desired solution
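
One possible reading of the maximum cardinality criterion is sketched below as an exhaustive search. It assumes that a feasible solution is a chain of distinct spectrum elements in which each element follows the previous one with a shift of 1 to k-1 positions, and that the total length may not exceed the known target length n. The function `best_reconstruction` and the example spectrum are hypothetical; this is not the authors' algorithm, and the search is exponential.

```python
def best_reconstruction(spectrum: set[str], n: int) -> tuple[int, str]:
    """Search for a chain of distinct k-tuples of length <= n using most tuples."""
    k = len(next(iter(spectrum)))   # assumes a non-empty spectrum
    tuples = sorted(spectrum)
    best = (0, "")

    def shifts(a: str, b: str) -> list[int]:
        # Shifts s such that b can follow a when moved s positions to the right.
        return [s for s in range(1, k) if a[s:] == b[:k - s]]

    def dfs(seq: str, last: str, used: set[str]) -> None:
        nonlocal best
        if len(used) > best[0]:
            best = (len(used), seq)
        for t in tuples:
            if t in used:
                continue
            for s in shifts(last, t):
                if len(seq) + s <= n:
                    used.add(t)
                    dfs(seq + t[k - s:], t, used)
                    used.remove(t)

    for t in tuples:
        dfs(t, t, {t})
    return best

# Spectrum of AAATGCG with ATG missing (negative) and CCA added (positive).
print(best_reconstruction({"AAA", "AAT", "TGC", "GCG", "CCA"}, 7))
# (4, 'AAATGCG')
```

A shift larger than 1 corresponds to k-tuples assumed lost to negative errors, and spectrum elements left out of the chain are treated as positive errors.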


SBH Reconstruction Problem

Ideal case (without repetitions and errors)
  Equivalent to finding an Eulerian path in a corresponding graph (Pevzner, 1989)
  A linear time algorithm (Fleischner, 1990)
General case is an NP-hard problem
  Branch and bound
  Heuristics
Extensions
  PSBH (Positional SBH)
  SBH with length error
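
The Eulerian path formulation for the ideal case can be sketched as follows. The graph has one vertex per (k-1)-mer and one edge per k-tuple, and the path is found with Hierholzer's algorithm. This is a generic textbook construction under those assumptions, not code from the cited papers, and `eulerian_reconstruct` is an illustrative name.

```python
from collections import defaultdict
from typing import List, Optional

def eulerian_reconstruct(spectrum: List[str]) -> Optional[str]:
    """Rebuild one sequence from an error-free spectrum, or return None."""
    if not spectrum:
        return None
    graph = defaultdict(list)   # (k-1)-mer -> successor (k-1)-mers
    balance = defaultdict(int)  # out-degree minus in-degree
    for kmer in spectrum:
        u, v = kmer[:-1], kmer[1:]
        graph[u].append(v)
        balance[u] += 1
        balance[v] -= 1
    starts = [node for node, b in balance.items() if b == 1]
    if any(abs(b) > 1 for b in balance.values()) or len(starts) > 1:
        return None             # degree conditions for an Eulerian path fail
    start = starts[0] if starts else spectrum[0][:-1]

    # Hierholzer's algorithm: walk unused edges, emit vertices on backtrack.
    stack, path = [start], []
    while stack:
        node = stack[-1]
        if graph[node]:
            stack.append(graph[node].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    if len(path) != len(spectrum) + 1:
        return None             # some k-tuples were unreachable
    return path[0] + "".join(node[-1] for node in path[1:])

print(eulerian_reconstruct(["AAA", "AAT", "ATG", "TGC", "GCG"]))  # AAATGCG
```

Each edge is visited once, so the running time is linear in the size of the spectrum. When several Eulerian paths exist, the function returns only one of them.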


Motivations

Give criteria that determine the most likely k-tuples at both ends and in the middle of all possible reconstructions of the target DNA
  These criteria greatly reduce ambiguities in the reconstruction of DNA
Transform the negative errors into positive errors
  This enables us to handle both types of errors easily
Separate the repetitions from both types of errors


Methods

Estimate the number of k-tuples that do not occur in a solution
  Adjacency matrix (connection matrix)
  Give a lower bound on the number of k-tuples that do not occur, over all solutions from k-tuple i to k-tuple j
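
The slides do not spell out how the bound is computed, so the following is only a sketch of one crude bound that an adjacency matrix makes possible. It assumes consecutive k-tuples overlap in k-1 letters, and counts the k-tuples that cannot lie on any path from i to j because they are unreachable from i or cannot reach j. The helper names are hypothetical, and the authors' actual bound may be sharper.

```python
def adjacency_matrix(tuples: list[str]) -> list[list[int]]:
    """A[i][j] = 1 if tuple j can directly follow tuple i (overlap of k-1)."""
    return [[1 if i != j and a[1:] == b[:-1] else 0
             for j, b in enumerate(tuples)]
            for i, a in enumerate(tuples)]

def reachable(adj: list[list[int]], src: int) -> set[int]:
    """Indices reachable from src (including src) by following edges."""
    seen, stack = {src}, [src]
    while stack:
        u = stack.pop()
        for v, edge in enumerate(adj[u]):
            if edge and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def lower_bound_unused(adj: list[list[int]], i: int, j: int) -> int:
    """Number of k-tuples that cannot occur on any path from i to j."""
    n = len(adj)
    transposed = [[adj[r][c] for r in range(n)] for c in range(n)]
    on_some_walk = reachable(adj, i) & reachable(transposed, j)
    return n - len(on_some_walk)

tuples = ["AAA", "AAT", "ATG", "TGC", "GCG", "TGA"]  # TGA is a hypothetical false positive
adj = adjacency_matrix(tuples)
print(lower_bound_unused(adj, 0, 4))  # 1
```

In the example, TGA can be reached from AAA but leads nowhere, so it cannot occur in any solution that starts at AAA and ends at GCG.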


Methods

Determine the most likely k-tuples at both ends
  Reconstruct from the most likely end pairs to get an upper bound for the SBH problem
  Purge the end pairs that cannot have a better solution than the current upper bound


Methods

Transform the negative errors into positive errors
  Artificial k-tuples
    Fill in all the possible gaps due to false negative errors
  Negative error level
    The maximal number of allowed consecutively missing k-tuples
    Reduce the number of artificial k-tuples
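
A sketch of how artificial k-tuples might be generated, under the assumption that two measured k-tuples separated by m missing ones overlap in k-1-m letters, so the merged string contains exactly m interior k-tuples. The function `artificial_tuples` and its `level` parameter (standing in for the negative error level) are hypothetical; the authors' procedure may differ in detail.

```python
def artificial_tuples(spectrum: set[str], level: int) -> set[str]:
    """Candidate k-tuples that would fill gaps of at most `level` missing tuples."""
    k = len(next(iter(spectrum)))   # assumes a non-empty spectrum
    artificial = set()
    for a in spectrum:
        for b in spectrum:
            if a == b:
                continue
            for missing in range(1, level + 1):
                overlap = k - 1 - missing
                if overlap < 1:
                    break           # gap too wide to be bridged by an overlap
                if a[-overlap:] == b[:overlap]:
                    merged = a + b[overlap:]
                    # Interior windows are the k-tuples assumed to be lost.
                    for p in range(1, missing + 1):
                        candidate = merged[p:p + k]
                        if candidate not in spectrum:
                            artificial.add(candidate)
    return artificial

# Spectrum of AAATGCG with ATG lost to a negative error.
print(artificial_tuples({"AAA", "AAT", "TGC", "GCG"}, level=1))  # {'ATG'}
```

Adding the artificial k-tuples to the measured spectrum turns each gap into a choice among extra candidates, which then behave like positive errors. A smaller `level` yields fewer candidates.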


Computational Experiments

109 DNA sequences from GenBank
Simulate the SBH experiments
Error models
  Randomly (probabilistic model)
  Systematically (one-base-mismatch model)
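
A simulated SBH experiment with random errors might look like the sketch below. The parameters `p_negative` (probability that a true k-tuple is dropped) and `n_positive` (number of spurious k-tuples added) are assumptions made for illustration; the slides do not give the parameters of the probabilistic model, and the one-base-mismatch model is not covered here.

```python
import random

def simulate_sbh(target: str, k: int, p_negative: float,
                 n_positive: int, seed: int = 0) -> set[str]:
    """Return a measured spectrum (a set) with random negative and positive errors."""
    rng = random.Random(seed)
    ideal = {target[i:i + k] for i in range(len(target) - k + 1)}
    # Negative errors: each true k-tuple is lost with probability p_negative.
    measured = {t for t in ideal if rng.random() >= p_negative}
    # Positive errors: add random k-tuples that do not occur in the target.
    n_positive = min(n_positive, 4 ** k - len(ideal))
    added = 0
    while added < n_positive:
        candidate = "".join(rng.choice("ACGT") for _ in range(k))
        if candidate not in ideal and candidate not in measured:
            measured.add(candidate)
            added += 1
    return measured

print(sorted(simulate_sbh("AAATGCG", 3, p_negative=0.2, n_positive=1)))
```

Using a set drops multiplicities, so repeated k-tuples are automatically recorded once, in line with the repetition case of the negative errors.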


Conclusions

Ideal case (without repetitions and errors) can be solved in polynomial time (Pevzner, 1989)
General case is an NP-hard problem
Design efficient algorithms

Ji-Hong Zhang, Ling-Yun Wu and Xiang-Sun Zhang. A new approach to the reconstruction of DNA sequencing by hybridization. Bioinformatics, vol 19(1), pages 14-21, 2003.

Xiang-Sun Zhang, Ji-Hong Zhang and Ling-Yun Wu. Combinatorial optimization problems in the positional DNA sequencing by hybridization and its algorithms. System Sciences and Mathematics, vol 3, 2002. (in Chinese)

Ling-Yun Wu, Ji-Hong Zhang and Xiang-Sun Zhang. Application of neural networks in the reconstruction of DNA sequencing by hybridization. In Proceedings of the 4th ISORA, 2002.