Biology and Bioinformatics Gabor T. Marth Department of Biology, Boston College marth@bc.edu BI820 – Seminar in Quantitative and Computational Problems in Genomics

Post on 19-Dec-2015


Biology and Bioinformatics

Gabor T. Marth

Department of Biology, Boston College

marth@bc.edu

BI820 – Seminar in Quantitative and Computational Problems in Genomics

The animal cell

DNA – the carrier of the genetic code

DNA organization – chromosomes

Translation of genetic information

DNA sequencing informatics

DNA organization

Genome annotation

De novo gene prediction

Similarity-based gene prediction

Gene localization

Genetic mapping

Gene function

Expression analysis

Protein structure

RNA structure

Protein structure prediction

RNA structure prediction

DNA evolution

Evolution of chromosome organization

Evolution of gene structure

Evolution of DNA sequence

Comparative genomics

Phylogenetics

Mechanisms of molecular evolution

Sequence variations

• The Human Genome Project produced a reference genome sequence, about 99.9% of which is common to all humans

• sequence variations make our genetic makeup unique

• single-nucleotide polymorphisms (SNPs) are the most abundant type, but other types of variation exist and are important

Why do we care about variations?

phenotypic differences

demographic history

inherited diseases

How do we find polymorphisms?

• look at multiple sequences from the same genome region

• diverse sequence resources can be used: EST (expressed sequence tags), WGS (whole-genome shotgun reads), BAC (bacterial artificial chromosome clones)

• diversion: sequencing informatics

SNP discovery – Methods

Sequence clustering

Cluster refinement

Multiple alignment

SNP detection
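The last step of this pipeline can be sketched in a few lines, assuming the earlier steps have already produced an equal-length multiple alignment. The function name `candidate_snps`, the `min_minor_count` parameter, and the example sequences are illustrative and not taken from the slides; a real tool would also weigh base quality, which this sketch ignores.

```python
from collections import Counter


def candidate_snps(aligned, min_minor_count=1):
    """Scan alignment columns and report those with at least two different bases.

    aligned: list of equal-length strings (one aligned sequence per entry).
    Returns a list of (0-based position, {base: count}) tuples.
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("aligned sequences must have equal length")

    snps = []
    for pos in range(length):
        # Count only unambiguous bases; gaps ('-') and 'N' are skipped.
        counts = Counter(
            seq[pos].upper() for seq in aligned if seq[pos].upper() in "ACGT"
        )
        if len(counts) >= 2:
            # Second-largest count = number of sequences supporting the minor allele.
            minor = sorted(counts.values())[-2]
            if minor >= min_minor_count:
                snps.append((pos, dict(counts)))
    return snps


# Hypothetical two-sequence alignment differing at a single site.
reads = ["ACCTAGGAGACTGAACTTACTG", "ACCTAGGAGACCGAACTTACTG"]
print(candidate_snps(reads))
```

Raising `min_minor_count` is a crude way to suppress columns where only one read disagrees, which are more likely to be sequencing errors than true polymorphisms.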

SNP discovery – Computer tools

SNP discovery – Mining Projects

>CloneX
ACGTTGCAACGTGTCAATGCTGCA

>CloneY
ACGTTGCAACGTGTCAATGCTGCA

ACCTAGGAGACTGAACTTACTG
ACCTAGGAGACCGAACTTACTG

~30,000 clones

25,901 clones (7,122 finished, 18,779 draft with base quality values)

21,020 clone overlaps (124,356 fragment overlaps)

507,152 high-quality candidate SNPs (validation rate 83-96%)

Marth et al., Nature Genetics 2001

SNP databases and characteristics

• access to variation data

• SNP properties

• reliability of information

• characterizing known polymorphic sites in sample collections – genotyping

Where do variations come from?

• sequence variations are the result of mutation events

• mutations are propagated down through generations

[Figure: genealogy traced back to the MRCA (most recent common ancestor); sequences in the tree are TAAAAAT or TAACAAT, which differ at a single site]

Mutation rate

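[Figure: two genealogies, each traced back to an MRCA. In one, the sampled sequences accgttatgtaga and accgctatgtaga differ at 1 site; in the other, actgttatgtaga and accgctatataga differ at 3 sites]

• higher mutation rate (µ) gives rise to more SNPs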
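The dependence on µ can be made concrete with the standard neutral, constant-size expectation for the number of segregating sites, E[S] = θ · (1 + 1/2 + ... + 1/(n−1)) with θ = 4Nµ per site for a diploid population. The sketch below assumes that model; the function name and the parameter values are illustrative and not from the slides.

```python
def expected_segregating_sites(n_samples, n_e, mu_per_site, n_sites):
    """Expected number of segregating sites under the neutral constant-size model.

    n_samples:   number of sampled chromosomes (n >= 2)
    n_e:         diploid effective population size N (assumed constant)
    mu_per_site: mutation rate per site per generation
    n_sites:     length of the region in base pairs
    """
    if n_samples < 2:
        raise ValueError("need at least two sampled chromosomes")
    theta = 4 * n_e * mu_per_site * n_sites           # population mutation rate for the region
    a_n = sum(1.0 / i for i in range(1, n_samples))   # harmonic sum over 1..n-1
    return theta * a_n


# Illustrative values only: doubling mu doubles the expected SNP count.
for mu in (1e-8, 2e-8):
    print(mu, expected_segregating_sites(10, 10_000, mu, 100_000))
```

Because E[S] is linear in θ, the same formula also shows that a larger effective population size N raises the expected SNP count in the same way a higher µ does.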

Recombination

[Figure: recombination, illustrated with multiple copies of the sequence accgttatgtaga]

Demographic history

small (effective) population size N

large (effective) population size N

• different world populations have varying long-term effective population sizes (e.g. African N is larger than European)

Modeling

[Figure: four demographic histories drawn from past to present (stationary, expansion, collapse, bottleneck); for each history two distributions are plotted, MD (simulation) and AFS (direct form)]

Ancestral inference

[Figure: histogram over minor allele count (1 to 10), y-axis 0 to 0.15; labeled histories: bottleneck; modest but uninterrupted expansion]
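As a baseline for this kind of inference, the expected distribution of minor allele counts under a stationary (constant-size) neutral history has a closed form: the number of sites with i derived copies among n chromosomes is proportional to 1/i, and folding i with n−i gives the minor-allele-count spectrum. The sketch below assumes that standard result; the function name `folded_afs` and the sample size are illustrative and not from the slides.

```python
def folded_afs(n_chromosomes):
    """Expected folded allele frequency spectrum for a constant-size neutral model.

    Returns {minor_allele_count: expected fraction of SNPs}, for counts 1..n//2.
    """
    n = n_chromosomes
    if n < 2:
        raise ValueError("need at least two chromosomes")
    a_n = sum(1.0 / i for i in range(1, n))   # normalizing constant
    spectrum = {}
    for i in range(1, n // 2 + 1):
        # Derived-allele counts i and n - i give the same minor allele count.
        weight = 1.0 / i + 1.0 / (n - i)
        if i == n - i:
            weight /= 2                       # middle class is not double counted
        spectrum[i] = weight / a_n
    return spectrum


# Illustrative sample size: 20 chromosomes gives minor allele counts 1..10.
for count, fraction in folded_afs(20).items():
    print(count, round(fraction, 3))
```

An observed spectrum with an excess of rare alleles relative to this baseline is the classic signature of population expansion, while a deficit of rare alleles points toward a collapse or bottleneck.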

The signatures of selection

• mutations under selection influence the genealogy itself; for neutral mutations, the mutation process and the genealogy are decoupled

Association and haplotype structure

[Figure: plot with axes labeled Recombination Fraction, r² and Distance (kb); series labeled European, Asian, African American]

“linkage disequilibrium”

“haplotype blocks”
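Linkage disequilibrium between two biallelic sites is commonly summarized by r², computed from haplotype frequencies as r² = D² / (pA(1−pA) pB(1−pB)) with D = pAB − pA·pB. The sketch below assumes phased haplotypes with alleles coded 0/1; the function name and the toy data are illustrative and not from the slides.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites from phased haplotypes.

    haplotypes: list of (allele_at_site_1, allele_at_site_2), alleles coded 0 or 1.
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n                 # frequency of allele 1 at site 1
    p_b = sum(b for _, b in haplotypes) / n                 # frequency of allele 1 at site 2
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                                    # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


# Toy examples: complete association versus independent assortment.
print(r_squared([(0, 0), (0, 0), (1, 1), (1, 1)]))
print(r_squared([(0, 0), (0, 1), (1, 0), (1, 1)]))
```

Computing r² for every pair of sites and plotting it against the distance between them gives the kind of decay curve used to compare populations.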

Computer simulations: the Coalescent
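A minimal sketch of the idea, assuming the standard Kingman coalescent for a constant-size population: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k−1)/2, with time measured in units of 2N generations. The function names are illustrative and not from the slides; no mutation or recombination is modeled here.

```python
import random


def simulate_coalescent_times(n_samples, rng):
    """Draw the waiting times between coalescent events for n sampled lineages.

    Returns a list of n-1 times; entry j is the time spent with n-j lineages.
    """
    times = []
    for k in range(n_samples, 1, -1):
        rate = k * (k - 1) / 2          # number of lineage pairs that can coalesce
        times.append(rng.expovariate(rate))
    return times


def tmrca(n_samples, rng):
    """Time to the most recent common ancestor: sum of all waiting times."""
    return sum(simulate_coalescent_times(n_samples, rng))


rng = random.Random(1)                  # fixed seed so runs are repeatable
replicates = [tmrca(10, rng) for _ in range(5000)]
print(sum(replicates) / len(replicates))   # theory: 2 * (1 - 1/n) = 1.8 for n = 10
```

Changing how the rate depends on time (for example, dividing it by a relative population size) is the usual way such a simulation is extended to expansions, collapses, and bottlenecks.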

Medical utility?

clinical phenotype

molecular markers

?

functional understanding

Mapping disease-causing loci

genetic linkage

association between allele and phenotype
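One simple way to test for association between an allele and a phenotype is a chi-square test on allele counts in affected and unaffected individuals. The case-control design, the function name, and the counts below are assumptions made for illustration; the slides do not specify a test.

```python
def allelic_chi_square(case_a, case_b, ctrl_a, ctrl_b):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 allele-count table.

    Rows: cases, controls. Columns: counts of allele A and allele B.
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")

    chi2 = 0.0
    for r in range(2):
        for c in range(2):
            expected = row_totals[r] * col_totals[c] / total   # count expected under independence
            chi2 += (table[r][c] - expected) ** 2 / expected
    return chi2


# Hypothetical counts: allele A is more common among cases than controls.
print(allelic_chi_square(60, 40, 40, 60))
```

A statistic above about 3.84 corresponds to p < 0.05 for one degree of freedom, before any correction for testing many markers.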

Forensic applications