bioinformatics simple


Upload: nadeem-akhter

Post on 28-Aug-2014


Page 1: bioinformatics simple

Introduction to Bioinformatics

Page 2: bioinformatics simple

Bioinformatics

The science of collecting, analyzing, and conceptualizing biological data by applying informatics techniques.

Biology + Informatics = Bioinformatics

Page 3: bioinformatics simple

What is Bioinformatics?

Biological Data + Computer Analysis

Mouse Genome: 2.5 billion base pairs
Human Genome: 3 billion base pairs

Page 4: bioinformatics simple

Goals of Bioinformatics

• Manage biological information
• Organize biological information using databases
• Process, analyze, and visualize biological data
• Share biological information with the public over the Internet

Page 5: bioinformatics simple

Definition

Bio-informatics: Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules, on a large scale.

Bioinformatics is a practical discipline with many applications.

Page 6: bioinformatics simple

Bioinformatics

[Diagram: bioinformatics shown together with computational biology, systems biology, and genomics]

Page 7: bioinformatics simple

Biological Information

• Central Dogma of Molecular Biology
  DNA -> RNA -> Protein -> Phenotype -> DNA
  - Molecules: Sequence, Structure, Function, Interaction
  - Processes: Mechanism, Specificity, Regulation
• Central Paradigm for Bioinformatics
  Genomic Sequence Information -> mRNA (level) -> Protein Sequence -> Protein Structure -> Protein Function -> Protein Interaction -> Phenotype
  - Large Amounts of Information
  - Statistical Computer Processing
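The DNA -> RNA -> Protein steps of the central dogma can be illustrated with a minimal Python sketch. It assumes the input is the coding strand, read in frame from its first base, and it uses the standard genetic code. The names `transcribe` and `translate` are illustrative and do not come from the slides.

```python
from itertools import product

# Standard genetic code, listed in the conventional U, C, A, G codon order.
# "*" marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(coding_dna):
    """Return the mRNA for a coding-strand DNA sequence (T becomes U)."""
    return coding_dna.upper().replace("T", "U")


def translate(mrna):
    """Translate mRNA codon by codon, stopping at the first stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna)             # AUGGCCUAA
print(translate(mrna))  # MA
```

The sketch ignores everything after the protein sequence in the paradigm above (structure, function, interaction, phenotype), which is where most of the later bioinformatics topics come in.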

Page 8: bioinformatics simple

Methods of analyzing data

• Systems Analysis
• Information Theory
• Graph Theory
• Robotics
• Algorithms
• Artificial Intelligence
• Statistics

Page 9: bioinformatics simple

Domains of bioinformatics

• Bio-informatist: development of new software and algorithms
• Bio-informaticians: using different algorithms and computer software

Page 10: bioinformatics simple

Human genome project

• Could not have been achieved without bioinformatics
• Goals
  - Determine the sequence of the 3 billion DNA subunits
  - Discover all the human genes
  - Make them accessible for further biological study
• Then?
  - Need to bring together and store vast amounts of information from lab equipment and experiments, computer analysis, and human analysis
  - Make it visible to the world’s scientists

Page 11: bioinformatics simple

How to analyze information

Data
- Management
- Analysis
- Derive hypothesis
- Design and implement an in silico experiment
- Confirm in the wet lab

Page 12: bioinformatics simple

Why bioinformatics

1. Find an answer quickly
   - Most in silico biology is faster than in vitro
2. Massive amounts of data to analyze
   - Need to make use of all information
   - Not possible to do analysis by hand
   - Can’t organize and store information using only lab notebooks
   - Automation is key

However! Verification?

Page 13: bioinformatics simple

Applications

1. Computational biology
   - Computing methods for classical biology
   - Primarily concerned with evolutionary, population, and theoretical biology; cellular/molecular biology?
2. Medical informatics
   - Computing methods to improve communication, understanding, and management of medical data
   - Data manipulation

Page 14: bioinformatics simple

Applications (continued)

3. Chemoinformatics
   - Chemical and biological technology for drug design and development
4. Genomics
   - Analysis and comparison of the entire genome of a single species or of multiple species
   - Genomics existed before any genomes were completely sequenced, but in a very primitive state

Page 15: bioinformatics simple

Applications (continued)

5. Proteomics
   - Study of how the genome is expressed in proteins, and of how these proteins function and interact
   - Concerned with the actual states of specific cells, rather than the potential states described by the genome
6. Pharmacogenomics
   - The application of genomic methods to identify drug targets
   - For example, searching entire genomes for potential drug receptors, or studying gene expression patterns in tumors

Page 16: bioinformatics simple

7. Pharmacogenetics
   - The use of genomic methods to determine what causes variations in individual response to drug treatments
   - The goal is to identify drugs that may only be effective for subsets of patients, or to tailor drugs for specific individuals or groups

Page 17: bioinformatics simple

The “post-genomics” era

Main Goal: ?

• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics

Page 18: bioinformatics simple

Annotation

• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function

Page 19: bioinformatics simple

How do we identify a gene in a genome?

A gene is characterized by several features (promoter, ORF…); some are easier and some harder to detect…
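The ORF is one of the easier gene features to detect computationally. As a rough sketch, assuming only the forward strand is scanned and an ORF is defined as an ATG followed by the first in-frame stop codon, a scan might look like this in Python (the function name `find_orfs` and the `min_codons` threshold are hypothetical; real gene finders also model promoters, splicing, and the reverse strand):

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=2):
    """Return (start, end, sequence) for each forward-strand ORF.

    start is the 0-based index of the ATG, end is exclusive and includes
    the stop codon. min_codons counts the codons before the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                       # three forward reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                        # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                     # close it and keep scanning
    return orfs


print(find_orfs("CCATGAAATTTTAGGG"))
# [(2, 14, 'ATGAAATTTTAG')]
```

An ORF found this way is only a candidate gene; the harder-to-detect features mentioned above are what separate real genes from chance open frames.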

Page 20: bioinformatics simple

Comparative genomics

Page 21: bioinformatics simple

How human are chimps?

Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%.

Perhaps not surprising!

Page 22: bioinformatics simple

So where are we different?

Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
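One simple way to put a number on "how different" two aligned rows are is percent identity. The sketch below is an illustration under stated assumptions: the three rows are copied from the slide with the spacing around the gap characters removed, columns containing a gap are skipped rather than counted as mismatches, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of gap-free alignment columns in which two rows agree."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap or y == gap:
            continue              # skip columns with a gap in either row
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no gap-free columns to compare")
    return 100.0 * matches / compared


# Rows from the slide, gaps written as "-" (assumed reconstruction).
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

print(round(percent_identity(human, chimp), 1))
print(round(percent_identity(human, mouse), 1))
```

A 28-column toy alignment will not reproduce a genome-wide figure such as 1.23%; the point is only how the comparison is counted.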

Page 23: bioinformatics simple

Structural Genomics

Page 24: bioinformatics simple

The three-dimensional structure of a protein can tell much more than the sequence alone

• Protein-ligand complexes
• Functional sites
• Fold
• Evolutionary relationship
• Shape and electrostatics
• Active sites
• Protein complexes
• Biological processes

Page 25: bioinformatics simple

Resources and Databases

The different types of data are collected in databases:

• Sequence databases
• Structural databases
• Databases of experimental results

All databases are connected.

Page 26: bioinformatics simple

Sequence databases

• Gene database
• Genome database
• Disease-related mutation database

Page 27: bioinformatics simple

Structure Databases

• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available due to techniques such as NMR and X-ray crystallography

Page 28: bioinformatics simple

Databases of Experimental Results

• Data such as experimental microarray images: gene expression data
• Proteomic data: protein expression data
• Metabolic pathways, protein-protein interaction data, regulatory networks

Page 29: bioinformatics simple

Literature Databases

PubMed

Service of the National Library of Medicine

http://www.ncbi.nlm.nih.gov/pubmed/

Page 30: bioinformatics simple

Putting it all Together

• Each database contains specific information
• Like biological systems, these databases are also interrelated

Page 31: bioinformatics simple

GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, UniGene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

Page 32: bioinformatics simple

Applications I: Genomics

• Finding Genes in Genomic DNA
  - Introns
  - Exons
  - Promoters
• Characterizing Repeats in Genomic DNA
  - Statistics
  - Patterns
• Expression Analysis
  - Time Course Clustering
  - Identifying Regulatory Regions
  - Measuring Differences
• Genome Comparisons
  - Ortholog Families
  - Genome annotation
  - Evolutionary Phylogenetic trees
• Characterizing Intergenic Regions
  - Finding Pseudogenes
  - Patterns
• Duplications in the Genome
  - Large-scale genomic alignment
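Characterizing repeats often starts with simple counting statistics. As a rough sketch (the function names and the toy sequence are hypothetical, and real repeat finders are far more elaborate), overlapping k-mers can be tallied in Python:

```python
from collections import Counter


def kmer_counts(dna, k):
    """Count every overlapping substring of length k in a DNA sequence."""
    dna = dna.upper()
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


def repeated_kmers(dna, k, min_count=2):
    """Return only the k-mers that occur at least min_count times."""
    return {
        kmer: count
        for kmer, count in kmer_counts(dna, k).items()
        if count >= min_count
    }


print(repeated_kmers("ATATATCG", 2))
# {'AT': 3, 'TA': 2}
```

Whether a count is surprisingly high depends on sequence length and base composition, which is where the statistics on this slide come in.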

Page 33: bioinformatics simple

Application II: Protein Sequence

• Sequence Alignment
  - Non-exact string matching, gaps
  - How to align two strings optimally via Dynamic Programming
  - Local vs Global Alignment
  - Suboptimal Alignment
  - Hashing to increase speed (BLAST, FASTA)
  - Amino acid substitution scoring matrices
• Multiple Alignment and Consensus Patterns
  - How to align more than one sequence and then fuse the result in a consensus representation
  - Transitive Comparisons
  - HMMs, Profiles
  - Motifs
• Scoring schemes and Matching statistics
  - How to tell if a given alignment or match is statistically significant
  - A P-value (or an e-value)?
  - Score Distributions (extreme val. dist.)
  - Low Complexity Sequences
• Evolutionary Issues
  - Rates of mutation and change
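Optimal alignment of two strings via dynamic programming can be sketched in a few lines. The version below is a minimal global (Needleman-Wunsch style) alignment with a linear gap penalty. The scoring values (match +1, mismatch -1, gap -2) are arbitrary assumptions chosen for illustration; protein alignment would normally use a substitution matrix instead, and the function name `global_align` is hypothetical.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)

    def pair_score(i, j):
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + pair_score(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,                   # gap in b
                score[i][j - 1] + gap,                   # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + pair_score(i, j):
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(global_align("ACGT", "AGT"))
# (1, 'ACGT', 'A-GT')
```

The table has (n+1) x (m+1) cells, so time and memory grow with the product of the two lengths. That cost is the motivation for the hashing heuristics (BLAST, FASTA) listed above. Local alignment differs mainly in clamping cell scores at zero and tracing back from the best-scoring cell.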

Page 34: bioinformatics simple

Application III: Protein Structure

• Secondary Structure “Prediction”
  - Via Propensities
  - Neural Networks, Genetic Algorithm
  - Simple Statistics
  - Transmembrane Regions
  - Assessing Secondary Structure Prediction
• Tertiary Structure Prediction
  - Fold Recognition
  - Threading
  - Ab initio
• Function Prediction
  - Active site identification
• Relation of Sequence Similarity to Structural Similarity

Page 35: bioinformatics simple

Example Application IV: Finding Homologs

[Figure: only the label "Core" is recoverable]

Page 36: bioinformatics simple

Example Application IV: Overall Genome Characterization

• Overall Occurrence of a Certain Feature in the Genome
  - e.g. how many kinases in Yeast
• Compare Organisms and Tissues
  - Expression levels in Cancerous vs Normal Tissues
• Databases, Statistics

Page 37: bioinformatics simple

Thanks