Bioinformatics, Dr. Aladdin Hamwieh, Khalid Al-Shamaa, Abdulqader Jighly, 2010-2011, Lecture 1...

Posted on 19-Dec-2015

Bioinformatics

Dr. Aladdin Hamwieh, Khalid Al-Shamaa, Abdulqader Jighly

2010-2011

Lecture 1: Introduction

Aleppo University, Faculty of Technical Engineering, Department of Biotechnology

Outline
• Definition
• Bioinformatics areas
• Bioinformatics data
  – Data types
  – Applications for these data
• Next-generation sequencing
• Bioinformatics algorithms
• Joint international programming initiatives

Definition
• Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline.

• Bioinformatics is the science of managing and analyzing biological data using advanced computing techniques.

• Bioinformatics applies principles of information science to make the vast, diverse, and complex life sciences data more understandable and useful.

Definition
• There are two extremes in bioinformatics work:
  – Tool users (biologists): know the biology and how to press the buttons, but have no clue what happens inside the program.
  – Tool shapers (informaticians): know the algorithms and how the tool works, but have no clue about the biology.

Bioinformatics areas

• Molecular sequence analysis
  1. Sequence alignment
  2. Sequence database searching
  3. Motif discovery
  4. Gene and promoter finding
  5. Reconstruction of evolutionary relationships
  6. Genome assembly and comparison
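Sequence alignment, the first item above, is classically solved by dynamic programming. The sketch below computes a Needleman-Wunsch global alignment and traces back one optimal alignment. The function name and the scoring scheme (match +1, mismatch -1, linear gap -2) are illustrative assumptions, not values taken from this lecture; real tools use substitution matrices and affine gap penalties.

```python
def needleman_wunsch(a: str, b: str, match: int = 1,
                     mismatch: int = -1, gap: int = -2) -> tuple[int, str, str]:
    """Global alignment of a and b with a linear gap penalty.

    Returns the optimal score and one optimal alignment (gaps shown as '-').
    The default scores are illustrative, not a standard scoring matrix.
    """
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align a[i-1] with b[j-1]
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a
    # Trace back from the bottom-right cell to recover one optimal alignment
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + sub:
                top.append(a[i - 1])
                bottom.append(b[j - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))
```

For example, `needleman_wunsch("GATTACA", "GATACA")` scores 4 (six matches and one gap).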

Bioinformatics areas

• Molecular structural analysis
  1. Protein structure analysis
  2. Nucleic acid structure analysis
  3. Comparison
  4. Classification
  5. Prediction
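Structure comparison is often summarized by the root-mean-square deviation (RMSD) between corresponding atoms. This is a minimal sketch that assumes the two structures are already superimposed and that atom i in one corresponds to atom i in the other; real comparison tools first compute an optimal superposition (for example with the Kabsch algorithm), which is omitted here. The function name is illustrative.

```python
import math

Coord = tuple[float, float, float]


def rmsd(coords_a: list[Coord], coords_b: list[Coord]) -> float:
    """RMSD between two equally long lists of 3D points.

    Assumes the structures are already superimposed (no rotation or
    translation fitting is done) and that points are paired by index.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))
```

Identical coordinate sets give 0.0; larger values mean the structures diverge more.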

Bioinformatics areas

• Molecular functional analysis
  1. Gene expression profiling
  2. Protein–protein interaction prediction
  3. Protein sub-cellular localization prediction
  4. Metabolic pathway reconstruction
  5. Simulation
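Gene expression profiling often looks for genes whose expression rises and falls together across conditions. Pearson correlation is one common similarity measure for this (Spearman correlation and other distances are also used). The sketch below assumes each profile lists expression values for the same ordered set of conditions; the gene names and values are hypothetical.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation between two expression profiles measured
    under the same ordered set of conditions."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a constant profile has undefined correlation")
    return cov / (sx * sy)


# Hypothetical profiles over four conditions
profiles = {
    "geneA": [1.0, 2.0, 3.0, 4.0],
    "geneB": [2.0, 4.0, 6.0, 8.0],   # co-expressed with geneA
    "geneC": [4.0, 3.0, 2.0, 1.0],   # anti-correlated with geneA
}
```

Values near +1 suggest co-expression, values near -1 opposite regulation.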

Bioinformatics data

There are several data types commonly used in bioinformatics.

The same data may be used in different areas.

Data types
• DNA sequences
• RNA sequences
• Expression (microarray) profile
• Proteome (X-ray, NMR) profile
• Metabolome profile
• Haplotype profile
• Phenotype profile

1- DNA sequences
• Simple sequence analysis
  – Database searching
  – Pairwise and multiple alignment
• Regulatory regions
• Gene finding
• Whole-genome annotation
• Comparative genomics
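The simplest form of gene finding scans DNA for open reading frames (ORFs): an ATG start codon followed in the same frame by a stop codon. This is a minimal sketch assuming the forward strand only, the standard genetic code, and no introns; practical gene finders rely on statistical models such as HMMs. The function name and the `min_codons` threshold are illustrative.

```python
START = "ATG"
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2) -> list[tuple[int, int]]:
    """Return 0-based, half-open (start, end) coordinates of forward-strand
    ORFs (ATG ... stop) in all three reading frames."""
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == START:
                start = i
            elif start is not None and codon in STOPS:
                # ORF length in codons, counting the stop codon
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)
```

For `"CCATGAAATAGCC"` this returns `[(2, 11)]`, i.e. the ORF `ATGAAATAG` in the third reading frame.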

2- RNAs
• Splice variants
• Tissue-specific expression
• 2D structure
• 3D structure
• Single-gene analysis
• Microarray

2D and 3D structure of tRNA

2D and 3D structure of rRNA
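RNA 2D (secondary) structure can be predicted in its simplest form with the Nussinov algorithm, a dynamic program that maximizes the number of nested base pairs. This sketch assumes Watson-Crick and G-U wobble pairs and a minimum hairpin loop of 3 unpaired bases. Production predictors minimize free energy with thermodynamic models instead, so treat this as illustrative only. The function name is hypothetical.

```python
def nussinov(rna: str, min_loop: int = 3) -> tuple[int, str]:
    """Maximize nested base pairs; return (pair count, dot-bracket string)."""
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    s = rna.upper()
    n = len(s)
    # dp[i][j] = maximum number of nested pairs within s[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])      # i or j left unpaired
            if (s[i], s[j]) in pairs:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):                   # split into two substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    # Trace back one optimal structure
    structure = ["."] * n
    stack = [(0, n - 1)] if n else []
    while stack:
        i, j = stack.pop()
        if j - i <= min_loop or dp[i][j] == 0:
            continue
        if dp[i][j] == dp[i + 1][j]:
            stack.append((i + 1, j))
        elif dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
        elif (s[i], s[j]) in pairs and dp[i][j] == dp[i + 1][j - 1] + 1:
            structure[i], structure[j] = "(", ")"
            stack.append((i + 1, j - 1))
        else:
            for k in range(i + 1, j):
                if dp[i][j] == dp[i][k] + dp[k + 1][j]:
                    stack.extend([(i, k), (k + 1, j)])
                    break
    return (dp[0][n - 1] if n else 0), "".join(structure)
```

For the hairpin `GGGAAAUCC` this yields 3 pairs, `(((...)))`.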

Microarray

• 20,000 to 60,000 short DNA probes of specified sequences are tethered in an ordered array on a small slide. Each probe corresponds to a particular short section of a gene.

• DNA microarrays measure RNA abundance using either one channel (one color) or two channels (two colors).

• Stanford-type microarrays are two-channel: by competitive hybridization they measure expression under a given condition (labeled with the red fluorescent dye Cy5) relative to its control (labeled with the green fluorescent dye Cy3).

• Affymetrix GeneChip arrays are single-channel: each chip is hybridized with one sample, so only one label is read (in the standard protocol, a biotin label detected with streptavidin-phycoerythrin rather than Cy5 or Cy3).
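On a two-channel array, each probe's relative expression is commonly summarized as the log2 ratio of the two dye intensities. This sketch assumes intensities that are already background-corrected and positive, and it applies no normalization between channels (real pipelines do). The function name and values are illustrative.

```python
import math


def log2_ratios(cy5: list[float], cy3: list[float]) -> list[float]:
    """Per-probe log2(Cy5 / Cy3) for a two-channel array.

    Positive: higher signal in the Cy5-labeled condition.
    Negative: higher signal in the Cy3-labeled control.
    Intensities must be > 0; no normalization is applied.
    """
    if len(cy5) != len(cy3):
        raise ValueError("channels must have the same number of probes")
    return [math.log2(red / green) for red, green in zip(cy5, cy3)]
```

With hypothetical intensities `[800, 100, 500]` (Cy5) and `[200, 400, 500]` (Cy3), the ratios are `[2.0, -2.0, 0.0]`: 4-fold up, 4-fold down, unchanged.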

3- Proteins
• Protein sequence analysis
  – Database searching
  – Pairwise and multiple alignment
• 2D structure
• 3D structure
• Classification of protein families
• Protein arrays
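Heuristic database-search tools such as BLAST begin by finding short exact word matches ("seeds") between the query and database sequences, then extend them into alignments. The sketch below shows only the seeding idea, using a hypothetical toy protein database. It performs no extension, substitution-matrix scoring, or significance statistics, and it is not BLAST's actual implementation. Function names are illustrative.

```python
from collections import defaultdict


def build_index(database: dict[str, str], k: int = 3) -> dict[str, list[tuple[str, int]]]:
    """Map every k-letter word to the (sequence id, offset) positions where it occurs."""
    index = defaultdict(list)
    for seq_id, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((seq_id, pos))
    return index


def seed_hits(query: str, index: dict[str, list[tuple[str, int]]], k: int = 3) -> dict[str, int]:
    """Count exact k-word matches between the query and each database sequence."""
    counts = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for seq_id, _ in index.get(query[pos:pos + k], []):
            counts[seq_id] += 1
    return dict(counts)


# Hypothetical toy database
db = {"prot1": "MKTAYIAKQR", "prot2": "GGSWLLPQQA"}
idx = build_index(db)
```

`seed_hits("TAYIAK", idx)` returns `{"prot1": 4}`, so only `prot1` would be passed on to the (omitted) extension step.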

3D structure

4- Metabolome and molecular biology
• Metabolic pathways
• Regulatory networks

These data help us understand systems biology.
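Metabolic pathways and regulatory networks are naturally stored as directed graphs (metabolites or genes as nodes, reactions or regulatory links as edges), so graph algorithms apply directly. This sketch uses breadth-first search to list everything reachable from a starting compound. The pathway fragment is a simplified, hand-picked piece of glycolysis with one branch into the pentose phosphate pathway, not a complete pathway; the function name is illustrative.

```python
from collections import deque


def reachable(graph: dict[str, list[str]], source: str) -> list[str]:
    """Breadth-first search: nodes reachable from `source`, in discovery order."""
    seen = {source}
    order = []
    queue = deque([source])
    while queue:
        node = queue.popleft()
        order.append(node)
        for nxt in graph.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return order


# Simplified pathway fragment (edges point from substrate to product)
pathway = {
    "glucose": ["glucose-6-phosphate"],
    "glucose-6-phosphate": ["fructose-6-phosphate", "6-phosphogluconolactone"],
    "fructose-6-phosphate": ["fructose-1,6-bisphosphate"],
}
```

Starting from `"fructose-6-phosphate"` only `fructose-1,6-bisphosphate` is downstream, while starting from `"glucose"` reaches all five compounds.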

5- Haplotype
• Molecular markers
  – RFLP
  – RAPD
  – SSR
  – ISSR
  – AFLP
  – DArT
  – SNP
  – …

SNP
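SNP markers are single-base positions that differ among individuals, and a haplotype is the combination of alleles one individual carries at those positions. Given sequences already aligned to the same coordinates, this sketch finds the polymorphic columns and reads off each individual's haplotype. It assumes phased (haplotype-level) sequences and no sequencing errors, and it skips columns containing gaps or ambiguous bases. The sequences and function names are hypothetical.

```python
def snp_sites(aligned: list[str]) -> list[int]:
    """0-based columns where aligned sequences carry different bases (A/C/G/T only)."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("need a non-empty set of equally long aligned sequences")
    sites = []
    for col in range(len(aligned[0])):
        bases = {s[col].upper() for s in aligned}
        # keep only true substitutions; skip gap or ambiguous columns
        if bases <= set("ACGT") and len(bases) > 1:
            sites.append(col)
    return sites


def haplotypes(aligned: list[str], sites: list[int]) -> list[str]:
    """Each sequence's alleles at the given SNP columns."""
    return ["".join(seq[i] for i in sites) for seq in aligned]


# Hypothetical aligned sequences from three individuals
individuals = ["ACGTACGT", "ACGTTCGT", "ACCTACGT"]
sites = snp_sites(individuals)
```

Here the SNPs are at columns 2 and 4, giving the haplotypes `GA`, `GT`, and `CA`.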

6- Phenotype
• Morphological data
• Physiological data
• Stress tolerance
• Pathogenic infections
• Disease resistance
• Cancer types
• …

Haplotype & Phenotype

Next Generation Sequencing

Platform          Launched              Read length (bp)   Reads/run   Throughput/run   Cost/Mb
SMRT              Target release 2010   964                NA          NA               NA
Helicos           2008                  28                 85M         2 GB             NA
AB SOLiD          2007                  25-35              170M        6 GB             $5.81 k
Illumina Solexa   2006                  35-70              120M        6 GB             $5.97 k
Roche GS FLX      2004                  250-400            400K        100 MB           $84.39
ABI 3730          2000                  800-1100           96          0.1 MB           High cost

Short reads assembly problems
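Assembly means reconstructing a longer sequence from many overlapping reads. One simple heuristic is greedy overlap merging: repeatedly join the two reads with the longest suffix-prefix overlap. This sketch assumes error-free reads from one strand and a hypothetical minimum overlap of 3 bases. It also shows why short reads are a problem: a repeat longer than the reads makes the overlaps ambiguous, and greedy merging can then join the wrong pieces. Modern short-read assemblers mostly use de Bruijn graphs instead. Function names and reads are illustrative.

```python
def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that equals a prefix of b (0 if < min_len)."""
    for length in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:length]):
            return length
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Merge the best-overlapping pair until no overlap >= min_len remains."""
    # Drop duplicates and reads contained in longer reads (deterministic order)
    contigs: list[str] = []
    for r in sorted(set(reads), key=lambda r: (-len(r), r)):
        if not any(r in c for c in contigs):
            contigs.append(r)
    while True:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs
        merged = contigs[best_i] + contigs[best_j][best_len:]
        # Remove the two merged contigs and anything now contained in the result
        contigs = [c for k, c in enumerate(contigs)
                   if k not in (best_i, best_j) and c not in merged]
        contigs.append(merged)


# Hypothetical 10-base reads sampled from one short sequence
reads = ["ATTAGACCTG", "AGACCTGCCG", "CCTGCCGGAA", "GCCGGAATAC"]
```

These four reads reassemble into the single contig `ATTAGACCTGCCGGAATAC`.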

Algorithms in bioinformatics
• String algorithms
• Dynamic programming
• Machine learning (NN, k-NN, SVM, GA, ...)
• Markov chain models
• Hidden Markov models
• Markov chain Monte Carlo (MCMC) algorithms
• Stochastic context-free grammars
• EM algorithms
• Gibbs sampling
• Clustering
• Tree algorithms (suffix trees)
• Graph algorithms
• Text analysis
• Hybrid/combinatorial techniques
• …
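Hidden Markov models, listed above, are usually decoded with the Viterbi algorithm, a dynamic program over hidden-state paths. This is a minimal sketch assuming a hypothetical two-state model in which `L` marks AT-rich and `H` marks GC-rich regions; every probability is invented for illustration, not estimated from data. It works in log space to avoid numerical underflow and assumes a non-empty sequence and non-zero probabilities.

```python
import math


def viterbi(seq, states, start_p, trans_p, emit_p) -> str:
    """Most probable hidden-state path for `seq` (states are one-letter names)."""
    # v[t][s] = best log-probability of any path that ends in state s at position t
    v = [{s: math.log(start_p[s]) + math.log(emit_p[s][seq[0]]) for s in states}]
    back = [{}]
    for t in range(1, len(seq)):
        v.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((p, v[t - 1][p] + math.log(trans_p[p][s])) for p in states),
                key=lambda item: item[1])
            v[t][s] = score + math.log(emit_p[s][seq[t]])
            back[t][s] = prev
    # Trace back from the best final state
    path = [max(states, key=lambda s: v[-1][s])]
    for t in range(len(seq) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return "".join(reversed(path))


# Hypothetical model parameters
STATES = ("L", "H")
START = {"L": 0.5, "H": 0.5}
TRANS = {"L": {"L": 0.9, "H": 0.1}, "H": {"L": 0.1, "H": 0.9}}
EMIT = {"L": {"A": 0.4, "T": 0.4, "C": 0.1, "G": 0.1},
        "H": {"A": 0.1, "T": 0.1, "C": 0.4, "G": 0.4}}
```

On `ATATGCGCGCATAT` the decoded path is `LLLLHHHHHHLLLL`: the GC-rich middle is assigned to state `H` because its six G/C emissions outweigh the cost of two state switches.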

Joint international programming initiatives

• BioPerl: http://www.bioperl.org/wiki/Main_Page

• Biopython: http://www.biopython.org/

• BioTcl: http://wiki.tcl.tk/12367

• BioJava: www.biojava.org/wiki/Main_Page

Thank You
