Signal Processing of DNA and Protein Sequences

Post on 25-May-2015

Nitesh Kumar Singh

SIGNAL PROCESSING OF PROTEIN SEQUENCES AND DNA

Signal -

A signal is a flow of information. Mathematically, signals are functions of an independent variable, such as time (for example, a speech signal) or position (for example, an image).

Biomedical Signal –

Electrical signals generated in a biological system (human or animal), or originating from a physiological process, due to electrochemical changes accompanied by the conduction of signals. Examples are the EEG and the ECG.

Signal Processing Methods –

Analog or Continuous Time Signal Processing

Digital or Discrete Time Signal Processing

Advantages of DSP over ASP -

Stable, robust, and accurate.

Flexible and easy to upgrade.

Easily stored.

Easy operation in a short time.

Multiplexing done by Integrated Services Digital Network (ISDN).

DSP In Biomedical Signals -

Processing of biomedical signals in the biological as well as the synthetic biological world. The signals are recorded and then processed digitally. Example: EEG, ECG, etc.

DSP in medical imaging. Example: CT scanners, ultrasound, endoscopes, etc.

Manufacturing of healthcare instruments. Example: heart rate meter, Aspect bispectral index (BIS) monitor.

For diagnostic purposes, such as analyzing heartbeat signals to check for abnormalities and, likewise, analyzing protein sequences to study the genomics of living beings.

Biomedical application domain using DSP -

Information gathering : Measurement of phenomena to understand the biological system.

Diagnosis : Detection of the malfunction, abnormality, pathology.

Monitoring : To obtain periodic or continuous information about the biological system.

Therapy and Control : Modify the behavior of the system and ensure the result.

Evaluation : Objective analysis, i.e. proof of performance, quality control, effect of treatment.

Processing of Biomedical Signals -

Transducers

Amplifiers and Filters

Analog to Digital conversion

Filtering to remove artifacts

Detection of events and components

Analysis of events and waves; Feature extraction

Pattern recognition, classification and diagnostic decisions

Computer aided diagnostic therapy

[Figure: block diagram in which biomedical signals pass through signal data acquisition and successive signal processing stages.]

IN THE GENOMICS WORLD

DNA and proteins are represented mathematically as character strings, in which each character is a letter of an alphabet.

For example, DNA has an alphabet of size 4, with the letters A, T, C, and G.

Proteins have an alphabet of size 20, one letter per amino acid.

REVISING SOME BIOLOGICAL FUNDAMENTALS

DNA:

It is made up of many linked smaller components, called nucleotides.

Each nucleotide is one of four types, designated A, G, T, and C, and each strand has a 3' end and a 5' end.

The 3' end of one nucleotide is linked to the 5' end of the next, and vice versa, by a strong covalent bond.

A strand is always read in a specific direction, from left to right: 5' → 3'.

Cont.

DNA occurs as a pair of strands, each complementary to the other.

The two nucleotide chains are held together by hydrogen bonds between complementary bases:

A = T (two hydrogen bonds)

C ≡ G (three hydrogen bonds)

The two strands of a DNA molecule run in opposite directions.
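The pairing and direction rules above can be sketched in a few lines of Python. This is only an illustration: the function name `reverse_complement` and the example strand are assumptions, not taken from the slides.

```python
# Watson-Crick pairing: A pairs with T, and C pairs with G.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(strand: str) -> str:
    """Return the partner strand, written in its own 5' -> 3' direction.

    The partner runs in the opposite direction, so the input is walked
    backwards while each base is swapped for its complement.
    """
    return "".join(COMPLEMENT[base] for base in reversed(strand))


print(reverse_complement("ATGC"))  # GCAT
```

Applying the function twice returns the original strand, which is a quick sanity check on the pairing table.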

CENTRAL DOGMA

Each DNA molecule is made up of two types of regions: genes and intergenic spaces.

Genes contain the information for making proteins; each gene is responsible for the production of a protein.

A gene is further divided into two kinds of sub-regions: introns and exons.

Genes are first transcribed into single-stranded RNA.

The introns are then removed from the RNA by the process of splicing, which yields the messenger RNA (mRNA).

Cont.

After splicing, each mRNA is read in groups of three adjacent bases.

Each group of three bases is called a codon. E.g., AGT, AAC, TGC, TAC, etc. (written here with the DNA letter T in place of the RNA letter U).

A codon specifies an amino acid, and the chain of amino acids defines a protein.

There are 64 (4 × 4 × 4) possible codons, but only 20 amino acids.

Many codons can specify a single amino acid (a many-to-one mapping).
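As a rough illustration of the many-to-one codon mapping, the sketch below builds the standard genetic code table and translates a short coding string. The compact 64-letter string is the usual way of writing the standard code with bases ordered T, C, A, G; the names `CODON_TABLE` and `translate` and the example inputs are assumptions made for this sketch.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per codon in T, C, A, G order
# (TTT, TTC, TTA, TTG, TCT, ...). '*' marks a stop codon.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO)
}


def translate(coding: str) -> str:
    """Translate a coding sequence codon by codon, ignoring a trailing partial codon."""
    usable = len(coding) - len(coding) % 3
    return "".join(CODON_TABLE[coding[i:i + 3]] for i in range(0, usable, 3))


print(len(CODON_TABLE))             # 64 codons
print(len(set(AMINO) - {"*"}))      # 20 amino acids
print(translate("AGTAACTGCTAC"))    # SNCY
```

The four example codons from the slide (AGT, AAC, TGC, TAC) map to serine, asparagine, cysteine, and tyrosine respectively.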

Cont.

The process of converting mRNA into protein is called translation.

Translation is aided by adapter molecules called transfer RNA, or tRNA.

DNA SEQUENCES AND DSP

Macromolecular biological sequences, corresponding to chains of nucleotides or amino acids, are analyzed by treating them as strings of characters; for DNA, the characters are “A,” “T,” “C,” and “G.” To apply DSP to these sequences, the characters are assigned numerical values.

Suppose we assign the number a to the character ‘A’, t to ‘T’, c to ‘C’, and g to ‘G’, where a, t, c, and g are complex numbers.

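Cont.

If we take t = a* and g = c* (where * denotes complex conjugation), then for a numerical sequence x[n] of length N we can get the complementary DNA sequence by conjugating and reversing it:

$$\tilde{x}[n] = x^{*}[N - 1 - n], \qquad n = 0, 1, \ldots, N - 1$$

We can also obtain numerical sequences for proteins by assigning numerical values to the amino acids.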
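A small sketch can check the conjugate-pair idea numerically. The specific values in `VALUES` are hypothetical; they were picked only because they satisfy t = a* and g = c*. The sketch also assumes that the complementary strand corresponds to the conjugate of the reversed numerical sequence.

```python
# Hypothetical complex values chosen so that t = a* and g = c*.
VALUES = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def to_numeric(seq):
    """Map a DNA string to a list of complex numbers."""
    return [VALUES[base] for base in seq]


def complementary_numeric(x):
    """Conjugate the reversed sequence: x~[n] = conj(x[N - 1 - n])."""
    return [value.conjugate() for value in reversed(x)]


strand = "ATGCCA"
partner = "".join(COMPLEMENT[base] for base in reversed(strand))

# Working on the numbers gives the same result as working on the letters.
print(complementary_numeric(to_numeric(strand)) == to_numeric(partner))  # True
```

The design point is that, with conjugate pairs, complementing a strand becomes an ordinary signal operation (conjugation plus index reversal) rather than a table lookup.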

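Indicator Sequence

The indicator sequence of adenine for a DNA sequence of length N is defined as:

$$u_A[n] = \begin{cases} 1, & x[n] = \mathrm{A} \\ 0, & \text{otherwise} \end{cases} \qquad n = 0, 1, \ldots, N - 1$$

where A denotes adenine and x[n] is the DNA sequence.

Similarly, we can obtain the indicator sequences for the other three bases.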
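The indicator sequences are simple to compute. In the sketch below, the helper name `indicator` and the example string are assumptions made for illustration.

```python
def indicator(seq: str, base: str) -> list:
    """Return 1 where `base` occurs in `seq` and 0 elsewhere."""
    return [1 if symbol == base else 0 for symbol in seq]


seq = "ATGCATAA"
indicators = {base: indicator(seq, base) for base in "ATCG"}

print(indicators["A"])  # [1, 0, 0, 0, 1, 0, 1, 1]

# At every position exactly one of the four indicators is 1.
print(all(sum(column) == 1 for column in zip(*indicators.values())))  # True
```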

Cont.

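The total spectrum of a symbolic sequence is often defined as the sum of the squared moduli of the DFTs of the indicator sequences, that is:

$$S[k] = |U_A[k]|^2 + |U_T[k]|^2 + |U_C[k]|^2 + |U_G[k]|^2$$

where U_A[k], U_T[k], U_C[k], and U_G[k] are the N-point DFTs of the indicator sequences of the four bases.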
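The total spectrum can be sketched with a direct DFT from the standard library. This is a minimal, unoptimized illustration; the function names and the artificial test sequence (a perfectly repeating three-letter pattern) are assumptions, and a real analysis would use an FFT on real data.

```python
import cmath


def dft(x):
    """Direct N-point DFT, O(N^2); adequate for short illustrative sequences."""
    N = len(x)
    return [
        sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
        for k in range(N)
    ]


def total_spectrum(seq):
    """Sum of the squared DFT magnitudes of the four indicator sequences."""
    spectra = [dft([1 if s == base else 0 for s in seq]) for base in "ATCG"]
    return [sum(abs(U[k]) ** 2 for U in spectra) for k in range(len(seq))]


seq = "ATG" * 30          # artificial sequence with an exact period of 3
S = total_spectrum(seq)
N = len(seq)

# Ignore k = 0 (the DC term) and look for the strongest remaining component.
peak = max(range(1, N // 2 + 1), key=lambda k: S[k])
print(peak, N // 3)       # 30 30
```

For this artificial input the strongest non-DC component sits at k = N/3, which is the period-3 frequency.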

Spectral Envelope

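Consider the n × 4 matrix whose columns are the indicator sequences of the four bases,

$$u = [\, u_A \;\; u_C \;\; u_G \;\; u_T \,]$$

and the vector of real weights,

$$w = [\, a \;\; c \;\; g \;\; t \,]^{T}$$

The sequence z = uw then corresponds to the mapping

A → a, C → c, G → g, T → t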
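The product z = uw can be written out directly. The sketch below only illustrates the mapping step; the weight values in `w` are arbitrary placeholders, and the search for weights that maximize spectral power, which a full spectral envelope analysis involves, is not attempted here.

```python
ORDER = "ACGT"                      # column order of the indicator matrix
seq = "ACGTTA"

# n x 4 indicator matrix: row i has a single 1 in the column of seq[i].
u = [[1 if symbol == base else 0 for base in ORDER] for symbol in seq]

# Hypothetical real weights (a, c, g, t).
w = [0.0, 1.0, 2.0, 3.0]

# Matrix-vector product z = u w, written without external libraries.
z = [sum(u_ij * w_j for u_ij, w_j in zip(row, w)) for row in u]

print(z)  # [0.0, 1.0, 2.0, 3.0, 3.0, 0.0]
```

Each entry of z is simply the weight of the base at that position, which is why the product is equivalent to the symbol-to-number mapping.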

DNA walk

It is a graphical representation of a DNA sequence, termed a “fractal landscape” or “DNA walk”.

In a random walk model, a walker moves either up (u(i) = +1) or down (u(i) = −1) one unit length for each step i of the walk.

In an uncorrelated walk, the direction of each step is independent of the previous steps.

In a correlated random walk, the direction of each step depends on the history (“memory”) of the walker.

Cont.

The DNA walk is defined by the rule that the walker steps up (u(i) = +1) if a pyrimidine (C or T) occurs at position i along the DNA chain, and steps down (u(i) = −1) if a purine (A or G) occurs at position i.

The degree of correlation in the base sequence can then be visualized directly by calculating the “net displacement” of the walker after l steps, y(l) = u(1) + u(2) + ... + u(l).
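The walk rule translates almost directly into code. In this sketch the function name `dna_walk` and the example sequence are assumptions; the net displacement is computed as the running sum of the steps.

```python
from itertools import accumulate

PYRIMIDINES = {"C", "T"}
PURINES = {"A", "G"}


def dna_walk(seq: str) -> list:
    """Return the net displacement after each step of the DNA walk.

    Step +1 for a pyrimidine (C, T), step -1 for a purine (A, G).
    """
    steps = []
    for base in seq:
        if base in PYRIMIDINES:
            steps.append(1)
        elif base in PURINES:
            steps.append(-1)
        else:
            raise ValueError(f"unexpected symbol: {base!r}")
    return list(accumulate(steps))


print(dna_walk("CTAGGC"))  # [1, 2, 1, 0, -1, 0]
```

Plotting the returned list against the position index gives the "landscape" picture of the sequence.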

Gene Prediction

Characteristics of protein-coding DNA regions:

Base sequences in the protein-coding regions of DNA molecules have a period-3 component because of the codon structure involved in the translation of base sequences into amino acids.

For example, in eukaryotes (organisms whose cells have a nucleus), this periodicity has mostly been observed within the exons and not within the introns.

Cont.

Filtering:

The fragment of the DNA sequence is filtered with an IIR antinotch filter, which passes a narrow band around the period-3 frequency 2π/3 and attenuates the rest.
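To give a feel for what such a filter does, here is a simplified stand-in: a two-pole IIR resonator centred at 2π/3. It is not necessarily the filter design used in the slides; the pole radius `r = 0.95`, the function names, and the artificial indicator sequences are all assumptions made for this sketch.

```python
import math


def antinotch(x, r=0.95, theta=2 * math.pi / 3):
    """Two-pole resonator: y[n] = x[n] + 2 r cos(theta) y[n-1] - r^2 y[n-2].

    The poles sit at r * exp(+/- j * theta), so components near `theta`
    are amplified and the rest are comparatively attenuated.
    """
    a1 = 2 * r * math.cos(theta)
    a2 = -r * r
    y = []
    y1 = y2 = 0.0          # previous two outputs
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y


def energy(y):
    """Sum of squared samples."""
    return sum(value * value for value in y)


period3 = [1, 0, 0] * 50   # indicator sequence with period 3
period2 = [1, 0] * 75      # indicator sequence with period 2

print(energy(antinotch(period3)) > energy(antinotch(period2)))  # True
```

Sliding such a filter along an indicator sequence and tracking the output power is one way to highlight stretches with strong period-3 content.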

Cont.

DNA Spectrogram:

The appearance of a spectrogram provides significant information about the signal.

DNA spectrograms provide local frequency information for all four bases. The information is carried by three numerical sequences x, y, and z, whose three magnitudes are displayed by superposition of the corresponding three primary colors:

red for x, green for y, blue for z

Cont.

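Identification of protein-coding DNA regions:

First, the DFTs of the indicator sequences of the four bases are calculated with the formula

$$U_b[k] = \sum_{n=0}^{N-1} u_b[n] \, e^{-j 2 \pi k n / N}, \qquad b \in \{\mathrm{A}, \mathrm{T}, \mathrm{C}, \mathrm{G}\}$$

The values at k = N/3, denoted A, T, C, and G, are then combined with the weights a, t, c, and g so that:

W = aA + tT + cC + gG.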
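The quantity W can be sketched for a single window as follows. The weights in `WEIGHTS` are hypothetical placeholders (the slides refer to optimized values but do not list them), the normalization by N² is a convenience chosen here, and the window length is assumed to be a multiple of 3 so that k = N/3 is an integer.

```python
import cmath

# Hypothetical complex weights a, t, c, g.
WEIGHTS = {"A": 1 + 1j, "T": 1 - 1j, "C": -1 - 1j, "G": -1 + 1j}


def period3_measure(window: str) -> float:
    """Return |W|^2 / N^2 with W = aA + tT + cC + gG evaluated at k = N/3.

    Because exactly one indicator is 1 at each position, the weighted sum
    of the four DFTs collapses to one sum over the mapped sequence.
    """
    N = len(window)
    if N == 0 or N % 3 != 0:
        raise ValueError("window length must be a positive multiple of 3")
    W = sum(
        WEIGHTS[base] * cmath.exp(-2j * cmath.pi * n / 3)
        for n, base in enumerate(window)
    )
    return abs(W) ** 2 / N ** 2


print(period3_measure("ATG" * 20) > period3_measure("AT" * 30))  # True
```

Evaluating this measure over a window that slides along the sequence gives a curve whose peaks suggest candidate coding regions.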

Color coding and color map approach

Since the number of primary colors is the same as the number of possible coding reading frames (three), a color-coding scheme is applied. In this scheme,

the value Θ = 0° is assigned to the color RED

the value Θ = 120° is assigned to the color BLUE

the value Θ = −120° is assigned to the color GREEN

Cont.

In-between values are color-coded in a linear manner in the RGB color space, in which the three axes labeled R, G, and B correspond to the primary colors red, green, and blue.

Cont.

In the color map, the intensity is modulated by the squared magnitude multiplied by 700 and clipped to the interval (0, 1).
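One possible reading of this color scheme is sketched below. It assumes that Θ is an angle in degrees, that "linear" means piecewise-linear blending between the two nearest anchor colors, and that the intensity scales all three channels equally; the function name `phase_to_rgb` is hypothetical.

```python
RED = (1.0, 0.0, 0.0)
GREEN = (0.0, 1.0, 0.0)
BLUE = (0.0, 0.0, 1.0)


def phase_to_rgb(theta_deg: float, magnitude_sq: float, gain: float = 700.0):
    """Map an angle and a squared magnitude to an (R, G, B) triple.

    Anchors: 0 deg -> red, 120 deg -> blue, -120 deg -> green.
    """
    theta = (theta_deg + 180.0) % 360.0 - 180.0      # wrap to [-180, 180)
    if 0.0 <= theta <= 120.0:
        start, end, frac = RED, BLUE, theta / 120.0
    elif -120.0 <= theta < 0.0:
        start, end, frac = RED, GREEN, -theta / 120.0
    else:                                            # between blue and green
        start, end, frac = BLUE, GREEN, (theta % 360.0 - 120.0) / 120.0
    # Intensity: squared magnitude times the gain, clipped to [0, 1].
    intensity = min(1.0, max(0.0, gain * magnitude_sq))
    return tuple(
        intensity * ((1.0 - frac) * s + frac * e) for s, e in zip(start, end)
    )


print(phase_to_rgb(0.0, 1.0))     # (1.0, 0.0, 0.0)
print(phase_to_rgb(120.0, 1.0))   # (0.0, 0.0, 1.0)
print(phase_to_rgb(-120.0, 1.0))  # (0.0, 1.0, 0.0)
```

With this mapping the hue indicates which of the three reading frames is suggested, while the brightness indicates how strong the period-3 component is.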

Disadvantages

The obstacles involved include the large amounts of data, the lack of complete a priori knowledge of the genome length, and the difficulty of recognizing nucleotide symbol identity with complete accuracy.

These impediments are typical of ones encountered in standard telecommunications problems.

When Fourier transforms are applied, the choice of symbolic-to-numeric mapping may either expose or hide some frequency information.

Furthermore, there might be no biochemical meaning for the ordering and arithmetic structure that results from the symbolic-to-numeric mapping.

Conclusion -

Signal processing-based computational and visual tools are meant to synergistically complement the character-string-domain tools that have been used successfully for many years by computer scientists.

The assignment of optimized, complex numerical values to nucleotides and amino acids provides a new computational framework, which may also result in new techniques for solving useful problems in bioinformatics, including sequence alignment, macromolecular structure analysis, and phylogeny.

A field of computer science, bioinformatics, has emerged, focusing on the use of computers for efficiently deriving, storing, and analyzing these character strings to help solve problems in molecular biology.

THANK YOU!!
