Post on 19-Dec-2015

Biology and Bioinformatics

Gabor T. Marth

Department of Biology, Boston College

marth@bc.edu

BI820 – Seminar in Quantitative and Computational Problems in Genomics

The animal cell

DNA – the carrier of the genetic code

DNA organization – chromosomes

Translation of genetic information

DNA sequencing informatics

DNA organization

Genome annotation

De novo gene prediction

Similarity-based gene prediction

Gene localization

Genetic mapping

Gene function

Expression analysis

Protein structure

RNA structure

Protein structure prediction

RNA structure prediction

DNA evolution

Evolution of chromosome organization

Evolution of gene structure

Evolution of DNA sequence

Comparative genomics

Phylogenetics

Mechanisms of molecular evolution

Sequence variations

• The Human Genome Project produced a reference genome sequence that is about 99.9% identical across individual humans

• sequence variations make our genetic makeup unique

SNP

• Single-nucleotide polymorphisms (SNPs) are most abundant, but other types of variations exist and are important

Why do we care about variations?

phenotypic differences

demographic history

inherited diseases

How do we find polymorphisms?

• look at multiple sequences from the same genome region

• diverse sequence resources can be used: EST, WGS, BAC

• diversion: sequencing informatics

SNP discovery -- Methods

Sequence clustering

Cluster refinement

Multiple alignment

SNP detection

SNP discovery – Computer tools

>CloneX
ACGTTGCAACGTGTCAATGCTGCA

>CloneY
ACGTTGCAACGTGTCAATGCTGCA

ACCTAGGAGACTGAACTTACTG
ACCTAGGAGACCGAACTTACTG
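The last step of the pipeline above (SNP detection in a multiple alignment) can be sketched for the simplest case of two aligned reads. This is a minimal illustration, not the tool used in the cited mining project: the function name `candidate_snps` is hypothetical, the reads are assumed to be already aligned and of equal length, and base quality values are ignored.

```python
def candidate_snps(seq_a, seq_b):
    """Return (position, base_a, base_b) for every mismatching column.

    Assumes the two sequences are already aligned and of equal length.
    Positions are 0-based. Columns containing anything other than
    A, C, G or T (for example N or a gap) are skipped.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(seq_a.upper(), seq_b.upper()))
        if a != b and a in "ACGT" and b in "ACGT"
    ]


# The two aligned reads shown on the slide.
read_1 = "ACCTAGGAGACTGAACTTACTG"
read_2 = "ACCTAGGAGACCGAACTTACTG"

snps = candidate_snps(read_1, read_2)
print(snps)  # [(11, 'T', 'C')]
```

A production caller would also weigh base quality values at each mismatching column before reporting a candidate, which this sketch leaves out.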

~ 30,000 clones

25,901 clones (7,122 finished, 18,779 draft with base quality values)

21,020 clone overlaps (124,356 fragment overlaps)

507,152 high-quality candidate SNPs (validation rate 83-96%)

Marth et al., Nature Genetics 2001

SNP discovery – Mining Projects

SNP databases and characteristics

• access to variation data

• SNP properties

• reliability of information

• characterizing known polymorphic sites in sample collections – genotyping

Where do variations come from?

• sequence variations are the result of mutation events

• mutations are propagated down through generations

[Figure: genealogy descending from the MRCA; the sampled sequences carry either TAAAAAT or TAACAAT]

Mutation rate

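[Figure: two genealogies, each descending from an MRCA. In the first, the sequences accgttatgtaga and accgctatgtaga differ at one site; in the second, actgttatgtaga and accgctatataga differ at three sites.]

• higher mutation rate (µ) gives rise to more SNPs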
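The link between µ and the number of SNPs can be made concrete with a standard result that is not on the slides: under a neutral model with constant population size, the expected number of segregating sites in a sample of n chromosomes is θ·(1 + 1/2 + ... + 1/(n-1)), with θ = 4Nµ per site for a diploid population. The sketch below assumes that model; the function name and the parameter values are illustrative only.

```python
def expected_segregating_sites(n_chromosomes, pop_size, mu_per_site, n_sites):
    """Expected number of segregating sites under the standard neutral model.

    Assumptions (not from the slides): diploid population of constant
    effective size pop_size, infinite-sites mutation, no recombination
    effects on the expectation.
    """
    theta = 4.0 * pop_size * mu_per_site * n_sites      # scaled mutation rate
    a_n = sum(1.0 / i for i in range(1, n_chromosomes))  # harmonic factor
    return theta * a_n


# Hypothetical values: 10 chromosomes, N = 10,000, 10 kb region.
low = expected_segregating_sites(10, 10_000, 1e-8, 10_000)
high = expected_segregating_sites(10, 10_000, 2e-8, 10_000)
print(low, high)  # doubling mu doubles the expected SNP count
```

The same formula shows why a larger effective population size N also yields more SNPs, since N and µ enter only through their product.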

Recombination

[Figure: recombination illustrated with copies of the sequence accgttatgtaga]

Demographic history

small (effective) population size N

large (effective) population size N

• different world populations have varying long-term effective population sizes (e.g. African N is larger than European)

Modeling

[Figure: demographic histories from past to present (stationary, expansion, collapse, bottleneck), each shown with its MD (simulation) and AFS (direct form) distributions]

Ancestral inference

[Figure: distribution over minor allele count (1 to 10); labels: bottleneck; modest but uninterrupted expansion]
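As a baseline for reading such spectra, the expected distribution over minor allele count can be computed for the stationary (constant-size) neutral case, where the expected number of sites with derived allele count i is proportional to 1/i. This is a textbook result rather than the "direct form" computation named on the slides, and the function name is hypothetical.

```python
def expected_folded_afs(n_chromosomes):
    """Expected fraction of SNPs at each minor allele count.

    Assumes a stationary neutral population (unfolded spectrum
    proportional to 1/i). Returns a dict mapping minor allele count
    (1 .. n // 2) to its expected share of all SNPs.
    """
    n = n_chromosomes
    folded = {}
    for i in range(1, n // 2 + 1):
        j = n - i  # the complementary derived-allele count
        # Counts i and n - i are indistinguishable once the spectrum is folded.
        folded[i] = 1.0 / i if i == j else 1.0 / i + 1.0 / j
    total = sum(folded.values())
    return {i: weight / total for i, weight in folded.items()}


# 20 sampled chromosomes give minor allele counts 1..10, as on the plot axis.
spectrum = expected_folded_afs(20)
for count, share in spectrum.items():
    print(count, round(share, 3))
```

Departures from this baseline are what carry the demographic signal: an excess of rare alleles is typically read as expansion, and a deficit as a bottleneck or collapse.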

The signatures of selection

• selective mutations influence the genealogy itself; in the case of neutral mutations the processes of mutation and genealogy are decoupled

Association and haplotype structure

[Figure: axes r2, recombination fraction, and distance (kb); populations: European, Asian, African American]

“linkage disequilibrium”

“haplotype blocks”
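The r2 statistic used to quantify linkage disequilibrium can be computed directly from phased haplotypes at two biallelic sites. The sketch below uses the usual definition r2 = D² / (pA(1-pA)pB(1-pB)) with D = pAB - pA·pB; the function name and the input encoding (two-character strings of '0' and '1') are assumptions made for illustration.

```python
def r_squared(haplotypes):
    """r^2 between two biallelic sites.

    haplotypes: list of 2-character strings, one per chromosome, where
    each character is '0' or '1' (the allele at site A, then site B).
    """
    n = len(haplotypes)
    p_a = sum(h[0] == "1" for h in haplotypes) / n   # frequency of allele 1 at A
    p_b = sum(h[1] == "1" for h in haplotypes) / n   # frequency of allele 1 at B
    p_ab = sum(h == "11" for h in haplotypes) / n    # frequency of haplotype 11
    d = p_ab - p_a * p_b                             # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic in the sample")
    return d * d / denom


print(r_squared(["00", "00", "11", "11"]))  # 1.0: complete association
print(r_squared(["00", "01", "10", "11"]))  # 0.0: no association
```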

Computer simulations: the Coalescent
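A minimal sketch of what a coalescent simulation does, assuming the standard neutral coalescent with constant population size: while k lineages remain, the waiting time to the next coalescence is exponential with rate k(k-1)/2, with time measured in units of 2N generations. Only the time to the MRCA is simulated here; the function name is hypothetical, and real simulators also track tree shape, mutation, and recombination.

```python
import random


def simulate_tmrca(n_samples, rng):
    """Simulate the time to the most recent common ancestor (TMRCA).

    Time is in units of 2N generations (diploid population of constant
    effective size N). Each loop iteration merges one pair of lineages.
    """
    t = 0.0
    k = n_samples
    while k > 1:
        rate = k * (k - 1) / 2.0      # number of lineage pairs that can coalesce
        t += rng.expovariate(rate)    # exponential waiting time
        k -= 1
    return t


rng = random.Random(1)  # fixed seed so the run is reproducible
n = 10
reps = 20_000
mean_tmrca = sum(simulate_tmrca(n, rng) for _ in range(reps)) / reps
expected = 2.0 * (1.0 - 1.0 / n)  # theoretical mean TMRCA in the same units
print(mean_tmrca, expected)
```

The simulated mean should land close to the theoretical value 2(1 - 1/n), which is a quick sanity check before layering mutations or a changing population size on top of the genealogy.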

Medical utility?

clinical phenotype

molecular markers

?

functional understanding

Mapping disease-causing loci

genetic linkage

association between allele and phenotype
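One common way to test for association between an allele and a phenotype is a chi-square test on a 2x2 table of allele counts in cases and controls. The slides do not name a specific test, so this is only one possible choice; the function name and the counts in the example are hypothetical.

```python
def allelic_chi_square(case_alt, case_ref, control_alt, control_ref):
    """Pearson chi-square statistic (1 degree of freedom) for a 2x2 table.

    Rows are cases and controls; columns are counts of the alternate
    and reference allele. No continuity correction is applied.
    """
    a, b, c, d = case_alt, case_ref, control_alt, control_ref
    n = a + b + c + d
    row_1, row_2 = a + b, c + d
    col_1, col_2 = a + c, b + d
    if 0 in (row_1, row_2, col_1, col_2):
        raise ValueError("every row and column total must be non-zero")
    return n * (a * d - b * c) ** 2 / (row_1 * row_2 * col_1 * col_2)


# Hypothetical counts: the allele is more frequent in cases than in controls.
stat = allelic_chi_square(case_alt=60, case_ref=40, control_alt=40, control_ref=60)
print(stat)  # 8.0
```

A statistic near zero means allele frequencies are similar in cases and controls; larger values indicate stronger evidence of association, to be compared against the chi-square distribution with one degree of freedom.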

Forensic applications
