bioinformatics simple

Post on 28-Aug-2014

Category:

Education

Introduction to Bioinformatics

Bioinformatics
The science of collecting, analyzing, and conceptualizing biological data by applying informatics techniques.
Biology + Informatics = Bioinformatics

What is Bioinformatics?
Biological Data + Computer Analysis
Mouse genome: 2.5 billion base pairs
Human genome: 3 billion base pairs
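To give a feel for these data volumes, here is a back-of-the-envelope sketch. It assumes each base is packed into 2 bits (enough for A, C, G, T); the function name and the 2-bit encoding are illustrative assumptions, and real sequence files with headers, ambiguity codes, or quality scores are larger.

```python
def packed_genome_bytes(base_pairs: int, bits_per_base: int = 2) -> int:
    """Bytes needed to store a sequence at a fixed number of bits per base."""
    total_bits = base_pairs * bits_per_base
    return (total_bits + 7) // 8  # round up to whole bytes


# Genome sizes taken from the slide.
human = packed_genome_bytes(3_000_000_000)
mouse = packed_genome_bytes(2_500_000_000)
print(f"Human: {human / 1e6:.0f} MB, mouse: {mouse / 1e6:.0f} MB")
```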

Goals of Bioinformatics
• Manage biological information
• Organize biological information using databases
• Process, analyze, and visualize biological data
• Share biological information with the public using the Internet

Definition
Bio-informatics: Bioinformatics is conceptualizing biology in terms of molecules (in the sense of physical chemistry) and applying “informatics” techniques (derived from disciplines such as applied math, CS, and statistics) to understand and organize the information associated with these molecules on a large scale.
Bioinformatics is a practical discipline with many applications.

Bioinformatics
[Diagram: bioinformatics in relation to computational biology, systems biology, and genomics]

Biological Information
Central Dogma of Molecular Biology
• DNA -> RNA -> Protein -> Phenotype -> DNA
• Molecules: sequence, structure, function, interaction
• Processes: mechanism, specificity, regulation
Central Paradigm for Bioinformatics
• Genomic sequence information -> mRNA (level) -> protein sequence -> protein structure -> protein function -> protein interaction -> phenotype
• Large amounts of information
• Statistical computer processing
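The DNA -> RNA -> Protein step of the central dogma can be sketched in a few lines. This is a minimal illustration, not a production tool: it assumes the standard genetic code, reads only the forward strand from position 0, and the function names are hypothetical. The 64-letter string lists amino acids in the conventional T, C, A, G codon order, with `*` marking stop codons.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def transcribe(dna: str) -> str:
    """Coding-strand DNA -> mRNA (replace T with U)."""
    return dna.upper().replace("T", "U")


def translate(mrna: str) -> str:
    """mRNA -> protein, reading codons from position 0 until a stop codon."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].upper().replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":  # stop codon ends translation
            break
        protein.append(amino_acid)
    return "".join(protein)


mrna = transcribe("ATGGCCTAA")
print(mrna, translate(mrna))
```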

Methods of analyzing data
• Systems Analysis
• Information Theory
• Graph Theory
• Robotics
• Algorithms
• Artificial Intelligence
• Statistics

Domains of bioinformatics
• Bio-informatists: development of new software and algorithms
• Bio-informaticians: using different algorithms and computer software

Human Genome Project
Could not have been achieved without bioinformatics
Goals:
• Sequence the 3 billion DNA subunits
• Discover all the human genes
• Make them accessible for further biological study
Then?
• Need to bring together and store vast amounts of information from lab equipment and experiments, computer analysis, and human analysis
• Make it visible to the world’s scientists

How to analyze information
Data
– Management
– Analysis
– Derive a hypothesis
– Design and implement an in silico experiment
– Confirm in the wet lab

Why bioinformatics?
1. Find an answer quickly
• Most in silico biology is faster than in vitro
2. Massive amounts of data to analyze
• Need to make use of all information
• Not possible to do analysis by hand
• Can’t organize and store information using only lab notebooks
• Automation is key
However! Verification?

Applications
1. Computational biology
• Computing methods for classical biology
• Primarily concerned with evolutionary, population, and theoretical biology; cellular/molecular biology?
2. Medical informatics
• Computing methods to improve communication, understanding, and management of medical data
• Data manipulation
3. Chemo-informatics
• Chemical and biological technology for drug design and development

4. Genomics
• Analysis and comparison of the entire genome of a single species or of multiple species
• Genomics existed before any genomes were completely sequenced, but in a very primitive state

5. Proteomics
• Study of how the genome is expressed in proteins, and of how these proteins function and interact
• Concerned with the actual states of specific cells, rather than the potential states described by the genome
6. Pharmacogenomics
• The application of genomic methods to identify drug targets
• For example, searching entire genomes for potential drug receptors, or studying gene expression patterns in tumors

7. Pharmacogenetics
• The use of genomic methods to determine what causes variations in individual response to drug treatments
• The goal is to identify drugs that may be effective only for subsets of patients, or to tailor drugs for specific individuals or groups

The “post-genomics” era
Main goal: ?
• Annotation
• Comparative genomics
• Structural genomics
• Functional genomics

Annotation
• Identify the genes within a given sequence of DNA
• Identify the sites which regulate the gene
• Predict the function

How do we identify a gene in a genome?
• A gene is characterized by several features (promoter, ORF…)
• Some are easier and some harder to detect…
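One of the easier features to detect is the open reading frame (ORF). The sketch below is a deliberately naive scan, assuming an ORF runs from an ATG start codon to the first in-frame stop codon; it looks only at the three forward-strand frames, and the function name and `min_codons` threshold are hypothetical. Real gene finders also handle the reverse strand, introns, and promoter signals.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna: str, min_codons: int = 2):
    """Return (start, end, sequence) tuples for forward-strand ORFs.

    start is the index of the ATG, end is one past the stop codon, and
    min_codons counts the codons before the stop (including the ATG).
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # open a candidate ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None  # close it and keep scanning this frame
    return orfs


print(find_orfs("CCATGAAATGACC"))
```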

Comparative genomics

How human are chimps?
• Comparison between the full drafts of the human and chimp genomes revealed that they differ by only 1.23%
• Perhaps not surprising!
• So where are we different?

Human  ATAGCGGGGGGATGCGGGCCCTATACCC
Chimp  ATAGGGG--GGATGCGGGCCCTATACCC
Mouse  ATAGCG---GGATGCGGCGC-TATACCA
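A comparison like the one above is usually summarized as percent identity. The sketch below assumes the sequences are already aligned (equal length, `-` for gaps) and skips any column containing a gap, which is only one of several conventions in use. The function name is hypothetical, and this toy fragment says nothing about the genome-wide 1.23% figure.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of gap-free aligned columns in which the two residues match."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    compared = matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # convention assumed here: ignore gap columns
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Aligned fragments from the slide.
human = "ATAGCGGGGGGATGCGGGCCCTATACCC"
chimp = "ATAGGGG--GGATGCGGGCCCTATACCC"
mouse = "ATAGCG---GGATGCGGCGC-TATACCA"

print(f"human vs chimp: {percent_identity(human, chimp):.1f}%")
print(f"human vs mouse: {percent_identity(human, mouse):.1f}%")
```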

Structural Genomics

The three-dimensional structure of a protein can tell much more than the sequence alone
• Protein-ligand complexes
• Functional sites
• Fold
• Evolutionary relationship
• Shape and electrostatics
• Active sites
• Protein complexes
• Biological processes

Resources and Databases
The different types of data are collected in databases:
• Sequence databases
• Structural databases
• Databases of experimental results
All databases are connected

Sequence databases
• Gene database
• Genome database
• Disease-related mutation database

Structure Databases
• 3-dimensional structures of proteins, nucleic acids, molecular complexes, etc.
• 3-D data is available due to techniques such as NMR and X-ray crystallography

Databases of Experimental Results
• Data such as experimental microarray images (gene expression data)
• Proteomic data (protein expression data)
• Metabolic pathways, protein-protein interaction data, regulatory networks

Literature Databases
PubMed
• Service of the National Library of Medicine
• http://www.ncbi.nlm.nih.gov/pubmed/

Putting it all Together
• Each database contains specific information
• Like other biological systems, these databases are also interrelated

GENOMIC DATA: GenBank, DDBJ, EMBL
ASSEMBLED GENOMES: GoldenPath, WormBase, TIGR
PROTEIN: PIR, SWISS-PROT
STRUCTURE: PDB, MMDB, SCOP
LITERATURE: PubMed
PATHWAY: KEGG, COG
DISEASE: LocusLink, OMIM, OMIA
GENES: RefSeq, AllGenes, GDB
SNPs: dbSNP
ESTs: dbEST, unigene
MOTIFS: BLOCKS, Pfam, Prosite
GENE EXPRESSION: Stanford MGDB, NetAffx, ArrayExpress

Applications I: Genomics
• Finding genes in genomic DNA: introns, exons, promoters
• Characterizing repeats in genomic DNA: statistics, patterns
• Expression analysis: time course clustering, identifying regulatory regions, measuring differences
• Genome comparisons: ortholog families, genome annotation, evolutionary phylogenetic trees
• Characterizing intergenic regions: finding pseudogenes, patterns
• Duplications in the genome: large-scale genomic alignment
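As a small illustration of the "statistics, patterns" side of characterizing repeats, the sketch below counts overlapping k-mers in a sequence; over-represented k-mers are a crude hint of repetitive DNA. The function name and the toy sequence are hypothetical, and dedicated repeat-finding tools do considerably more than this.

```python
from collections import Counter


def count_kmers(dna: str, k: int) -> Counter:
    """Count every overlapping substring of length k in the sequence."""
    if k <= 0:
        raise ValueError("k must be positive")
    dna = dna.upper()
    return Counter(dna[i:i + k] for i in range(len(dna) - k + 1))


# Toy sequence containing a short tandem repeat of "AT".
counts = count_kmers("GGATATATATCC", 2)
print(counts.most_common(3))
```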

Application II: Protein Sequence
• Sequence alignment
  – Non-exact string matching, gaps
  – How to align two strings optimally via dynamic programming
  – Local vs. global alignment
  – Suboptimal alignment
  – Hashing to increase speed (BLAST, FASTA)
  – Amino acid substitution scoring matrices
• Multiple alignment and consensus patterns
  – How to align more than one sequence and then fuse the result into a consensus representation
  – Transitive comparisons
  – HMMs, profiles
  – Motifs
• Scoring schemes and matching statistics
  – How to tell if a given alignment or match is statistically significant
  – A P-value (or an E-value)?
  – Score distributions (extreme value distribution)
  – Low-complexity sequences
• Evolutionary issues
  – Rates of mutation and change
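Aligning two strings optimally via dynamic programming can be sketched as follows. This is a minimal global (Needleman-Wunsch style) alignment, assuming a flat scoring scheme of +1 for a match, -1 for a mismatch, and -1 per gap position. Those values and the function name are illustrative assumptions; protein alignment in practice uses substitution matrices and more refined gap penalties.

```python
def global_align(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Return (score, aligned_a, aligned_b) for an optimal global alignment."""
    n, m = len(a), len(b)
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i: int, j: int) -> int:
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


best, top, bottom = global_align("ACGT", "AGT")
print(best)
print(top)
print(bottom)
```

A local (Smith-Waterman style) variant differs mainly in clamping each cell at zero and starting the traceback from the highest-scoring cell rather than the corner.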

Application III: Protein Structure
• Secondary structure “prediction”
  – Via propensities
  – Neural networks, genetic algorithms
  – Simple statistics
  – Transmembrane regions
  – Assessing secondary structure prediction
• Tertiary structure prediction
  – Fold recognition
  – Threading
  – Ab initio
• Function prediction
  – Active site identification
• Relation of sequence similarity to structural similarity
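The "propensities" behind simple secondary structure prediction are ratios of frequencies: how often a residue appears in a given state (say, helix) relative to how often it appears overall. The sketch below computes them from a labeled training set. The toy sequences and the H/E/C state labels are made up for illustration, and the function name is hypothetical; real propensity tables are derived from large sets of solved structures.

```python
from collections import Counter


def propensities(sequences, structures, state="H"):
    """Propensity of each residue for `state`.

    P(aa) = (count of aa in state / residues in state)
            / (count of aa overall / all residues)
    Values above 1 mean the residue is enriched in that state.
    """
    aa_total = Counter()
    aa_in_state = Counter()
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and structure strings must match in length")
        for aa, s in zip(seq, ss):
            aa_total[aa] += 1
            if s == state:
                aa_in_state[aa] += 1
    n_total = sum(aa_total.values())
    n_state = sum(aa_in_state.values())
    if n_state == 0:
        return {}
    return {
        aa: (aa_in_state[aa] / n_state) / (aa_total[aa] / n_total)
        for aa in aa_total
    }


# Hypothetical toy training data: H = helix, C = coil.
toy_sequences = ["AAGG", "AEGP"]
toy_structures = ["HHCC", "HHCC"]
print(propensities(toy_sequences, toy_structures, state="H"))
```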

Example Application IV: Finding Homologs

Core

Example Application IV: Overall Genome Characterization
• Overall occurrence of a certain feature in the genome, e.g. how many kinases in yeast
• Compare organisms and tissues: expression levels in cancerous vs. normal tissues
• Databases, statistics
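Comparing expression levels in cancerous vs. normal tissue is often summarized as a log2 fold change. The sketch below assumes replicate measurements per condition and adds a pseudocount to avoid dividing by zero; the function name, the pseudocount default, and the numbers are all hypothetical. Real analyses also normalize the data and test for statistical significance.

```python
import math
from statistics import mean


def log2_fold_change(tumor, normal, pseudocount: float = 1.0) -> float:
    """log2 of the ratio of mean expression, tumor over normal."""
    return math.log2((mean(tumor) + pseudocount) / (mean(normal) + pseudocount))


# Hypothetical expression values for one gene across replicates.
tumor_levels = [7.0, 7.0, 7.0]
normal_levels = [1.0, 1.0, 1.0]
print(log2_fold_change(tumor_levels, normal_levels))
```

A positive value means higher expression in the tumor samples, a negative value lower, and zero no change in the means.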

Thanks