Theory and Application of Multiple Sequence Alignments

Brett Pickett, PhD

a.k.a What is a Multiple Sequence Alignment,

How to Make One, and What to Do With It

Page 2

History

• Structure of DNA discovered (1953)

• First DNA genome (bacteriophage ΦX174) sequenced in 1977

• Human Genome Project begun in 1990

• First genome of a free-living organism (Haemophilus influenzae) sequenced in 1995

• Human “rough draft” completed in 2000

– NHGRI (public effort) vs. J. Craig Venter's Celera Genomics (private effort)

• Supercomputers were used to assemble the sequenced pieces of the human genome in the right order

Page 3

What is a Genome?

• Genetic material an organism needs to replicate

– Eukaryotes (humans): multiple chromosomes (46 in humans)

– Prokaryotes (bacteria): usually 1 chromosome

– Viruses: “what’s a chromosome?”

– Human genome: about 3.2 Gb; about 2 m of DNA in each cell

• 10 trillion cells in the human body × 2 m ≈ 2 × 10^13 m of DNA

• About 500,000 times around the Earth

• About 67 round trips to the sun

– Bacteria (580 kb – 10 Mb)

– Viruses (3.5 kb – 1.3 Mb)

http://www.rsc.org/chemsoc/timeline/pages/2001.html
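The DNA-length comparison can be checked with a few lines of arithmetic. The 10 trillion cells and 2 m of DNA per cell come from the slide; the Earth's equatorial circumference (about 40,075 km) and the mean Earth–sun distance (about 149.6 million km) are standard values added here as assumptions.

```python
CELLS = 10e12               # 10 trillion cells (figure used on the slide)
DNA_PER_CELL_M = 2.0        # about 2 m of DNA in each cell (slide)
EARTH_CIRCUMFERENCE_M = 4.0075e7   # about 40,075 km (added assumption)
EARTH_SUN_M = 1.496e11             # mean Earth-sun distance, about 149.6 million km (added assumption)

total_m = CELLS * DNA_PER_CELL_M               # total length of DNA in metres
laps = total_m / EARTH_CIRCUMFERENCE_M         # how many times it would wrap around the Earth
round_trips = total_m / (2 * EARTH_SUN_M)      # Earth -> sun -> Earth counts as one trip
print(f"total: {total_m:.1e} m, {laps:,.0f} laps of the Earth, {round_trips:.1f} round trips to the sun")
```

With these inputs the total is about 2 × 10^13 m, which works out to roughly 500,000 laps of the Earth and about 67 round trips to the sun.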

Page 4

Why are Genomes so Important?

• Encode the instructions for all of an organism's functions

– DNA → RNA → protein

• Unique to each organism

– Differences (mutations) can only be found by comparing genomes with each other

www.thednastore.com/images/cells/mrdna1.jpg

Page 5

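How are Sequences Made?

1. Make lots of copies of the original sequence (PCR)

2. Put the copies into a machine (sequencing reaction) that makes even more copies

3. Fluorescently labeled bases are incorporated at random positions into the new DNA molecules; each labeled base ends its copy, so fragments of every length are made

4. A laser makes the labeled bases glow (fluoresce) as the fragments pass a detector, and the computer records the order of the colors = sequence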

http://bjpsbiotech.edublogs.org/files/2007/12/electropherogram.jpg
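A toy sketch of the last step (turning the detected fluorescence into a sequence), assuming each peak has already been measured as four dye intensities in the order A, C, G, T. The readings are hypothetical; real base-calling software also handles peak spacing, overlapping signals and per-base quality scores.

```python
BASES = "ACGT"   # assumed order of the four dye channels in each reading

def call_bases(peaks):
    """Call one base per peak: whichever dye channel glows brightest."""
    sequence = []
    for intensities in peaks:
        brightest = max(range(len(BASES)), key=lambda k: intensities[k])
        sequence.append(BASES[brightest])
    return "".join(sequence)

# Hypothetical (A, C, G, T) intensity readings for four consecutive peaks
peaks = [(900, 40, 35, 20), (30, 50, 880, 60), (25, 810, 40, 30), (20, 35, 45, 950)]
print(call_bases(peaks))
```

The four peaks are read as the sequence AGCT.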

Page 6

What’s the Next Step?

• After a sequence is determined, then what?

• Make sense of it by comparing it with other related (homologous) sequences

– Multiple Sequence Alignment

Page 7

What is an Alignment?

• Lining up related (homologous) positions

– Allows comparison

[Figure: the same sequences shown unaligned and aligned]
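A small illustration of why lining up homologous positions matters for comparison, using made-up sequences. One extra base in the second sequence shifts every later position, so a naive position-by-position comparison of the unaligned sequences finds only 50% identity, while the aligned version (with a gap "-" inserted) is identical at every compared column. The `percent_identity` helper is hypothetical and skips gapped columns; other definitions of percent identity count gaps differently.

```python
def percent_identity(a, b):
    """Percent of identical columns among columns where neither aligned sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    identical = sum(x == y for x, y in compared)
    return 100.0 * identical / len(compared)

# Comparing position by position WITHOUT aligning: the extra T shifts everything after it
unaligned = 100.0 * sum(x == y for x, y in zip("ACGT", "ACTGT")) / 4
aligned = percent_identity("AC-GT", "ACTGT")
print(unaligned, aligned)
```

This prints 50.0 for the unaligned comparison and 100.0 for the aligned one.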

Page 8

Comparing Sequences (Genomes)

• Each organism's DNA contains a unique genetic “fingerprint”

• Similarity can reveal

– Related function

– Shared evolutionary history

education.vetmed.vt.edu/.../FINGERPRINT.jpg

Page 9

Aligning with Computational Methods

• Computers can’t “see” patterns

– Use math to find the best alignment by assigning scores:

– Match

– Mismatch

– Gap

• Internal: insertion / deletion (indel)

• Terminal: missing information?
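A minimal sketch of how an alignment gets a score, scoring each column as a match, a mismatch or a gap. It assumes the example values match = 5, mismatch = -2 and gap = -4 (borrowed from the manual-alignment slide) and charges every gap, internal or terminal, the same; many programs treat terminal gaps differently.

```python
MATCH, MISMATCH, GAP = 5, -2, -4   # example values from the manual-alignment slide

def column_score(x, y):
    """Score one aligned column: gap, match or mismatch."""
    if x == "-" or y == "-":
        return GAP
    return MATCH if x == y else MISMATCH

def alignment_score(a, b):
    """Total score of two already-aligned sequences of equal length."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    return sum(column_score(x, y) for x, y in zip(a, b))

print(alignment_score("ACGT-A", "ACCTTA"))
```

Here four matches (+20), one mismatch (-2) and one gap (-4) give a total of 14. A program searching for the best alignment compares totals like this one across many possible arrangements.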

Page 10

What is a Gap?

• Allows bases to be lined up even if the sequences have different lengths

– Insertions / deletions (indels)

• Impossible to tell from the alignment alone which sequence lost (or gained) the bases

– Terminal gaps

• The sequence is either naturally shorter or was artificially cut off

Page 11

Nucleotide Alignment

[Figure: example alignment with mismatches and gaps labeled]

• Custom scores

– Match

– Mismatch

– Gap-opening penalty

• Penalizes a position with no letter (the start of a gap). Why?

– Gap-extension penalty

• Little or no additional penalty for lengthening an existing gap. Why?

– The scores set the balance between accepting a mismatch and opening a gap
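Gap-opening and gap-extension scores are usually combined into an "affine" gap penalty: a gap of length L costs opening + (L − 1) × extension. A sketch with the example values opening = -4 and extension = 0 from the manual-alignment slide; some programs define the formula slightly differently (for example, charging the extension for every gap position).

```python
def gap_penalty(length, opening=-4, extension=0):
    """Affine gap penalty: the first gap position costs `opening`, each further one `extension`."""
    if length == 0:
        return 0
    return opening + (length - 1) * extension

def total_gap_penalty(aligned, opening=-4, extension=0):
    """Sum the penalties of every run of '-' characters in one aligned sequence."""
    total, run = 0, 0
    for ch in aligned + "X":   # the sentinel "X" closes a gap run at the very end
        if ch == "-":
            run += 1
        else:
            total += gap_penalty(run, opening, extension)
            run = 0
    return total

print(total_gap_penalty("ACG---TA"))                 # one 3-base gap
print(total_gap_penalty("A-CG-T-A"))                 # three separate 1-base gaps
print(total_gap_penalty("ACG---TA", extension=-1))   # same gap, small extension cost
```

One 3-base gap costs -4 while three scattered 1-base gaps cost -12 (and -6 for the single gap if extension is -1). This is the usual reasoning behind cheap extensions: one insertion or deletion event often adds or removes several bases at once, so one long gap is more plausible than many short ones.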

Page 12

Dynamic Programming

• Used to calculate the best-scoring alignment

– Breaks a very complicated problem into many smaller steps

– Stores each step's result and reuses it, so the computer never has to try every possible alignment

[Figure: scoring matrix with Sequence 1 along the top and Sequence 2 down the side; labels “Math” and “Read”]

http://www.myspacepimper.com/images/232763/Disney-s-Goofy-Baking-a-Cake.htm
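The idea can be written as a recurrence. This is the textbook Needleman–Wunsch form with a single gap score; the symbols are introduced here for illustration. Let F(i, j) be the best score for aligning the first i letters of Sequence 2 (x) with the first j letters of Sequence 1 (y), s(x_i, y_j) the match or mismatch score, and g the (negative) gap score. Each cell needs only three neighbors that are already filled in:

$$
F(i,j) = \max\begin{cases}
F(i-1,\,j-1) + s(x_i, y_j) & \text{(letters line up)}\\
F(i-1,\,j) + g & \text{(gap in Sequence 1)}\\
F(i,\,j-1) + g & \text{(gap in Sequence 2)}
\end{cases}
$$

For a global alignment the edges start at F(i, 0) = i·g and F(0, j) = j·g; filling them with 0 instead leaves leading (terminal) gaps unpenalized. The best overall score is read from the bottom-right cell, and the traceback follows the winning choices back to the edge of the matrix.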

Page 13

Manual Alignment

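Sequence 1 (top): A A T C     Sequence 2 (side): A T C

Final cell scores (row 0 and column 0 are all 0):

      -   A   A   T   C
  -   0   0   0   0   0
  A   0   5   5   1  -2
  T   0   1   3  10   6
  C   0  -2  -1   6  15

Candidate scores for each cell (from the left, the diagonal, above); the highest is kept:

A row: (-4, 5, -4) → 5 | (1, 5, -4) → 5 | (1, -2, -4) → 1 | (-3, -2, -4) → -2

T row: (-4, -2, 1) → 1 | (-3, 3, 1) → 3 | (-1, 10, -3) → 10 | (6, -1, -6) → 6

C row: (-4, -2, -3) → -2 | (-6, -1, -1) → -1 | (-5, 1, 6) → 6 | (2, 15, 2) → 15

Match = 5 Mismatch = -2 Gap Opening = -4 Gap Extension = 0

Traceback: follow the highest scores back to the beginning. Up or sideways = gap; diagonal = homology (the bases line up).

Sequence 1: A A T C
Sequence 2: - A T C

(Score 15 = three matches; the first A of Sequence 1 sits opposite a terminal gap, which costs nothing because row 0 is all zeros.)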
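The manual matrix can be recomputed with a short Python sketch. Assumptions: Sequence 1 (AATC) runs across the top and Sequence 2 (ATC) down the side; row 0 and column 0 are filled with zeros as on the slide; and every gap position costs -4 (a linear gap penalty), which is what reproduces the slide's numbers. Ties in the traceback prefer the diagonal; the function names are illustrative.

```python
MATCH, MISMATCH, GAP = 5, -2, -4   # gap cost charged for every gap position (linear)

def fill_matrix(top, side):
    """Fill the scoring matrix; row 0 and column 0 stay 0, so leading gaps are free."""
    score = [[0] * (len(top) + 1) for _ in range(len(side) + 1)]
    for i in range(1, len(side) + 1):
        for j in range(1, len(top) + 1):
            pair = MATCH if side[i - 1] == top[j - 1] else MISMATCH
            left = score[i][j - 1] + GAP        # gap in the side sequence
            diag = score[i - 1][j - 1] + pair   # the two letters line up
            up = score[i - 1][j] + GAP          # gap in the top sequence
            score[i][j] = max(left, diag, up)
    return score

def traceback(top, side, score):
    """Walk back from the bottom-right cell, preferring diagonal moves on ties."""
    i, j = len(side), len(top)
    out_top, out_side = [], []
    while i > 0 and j > 0:
        pair = MATCH if side[i - 1] == top[j - 1] else MISMATCH
        if score[i][j] == score[i - 1][j - 1] + pair:
            out_top.append(top[j - 1])
            out_side.append(side[i - 1])
            i, j = i - 1, j - 1
        elif score[i][j] == score[i - 1][j] + GAP:
            out_top.append("-")
            out_side.append(side[i - 1])
            i -= 1
        else:
            out_top.append(top[j - 1])
            out_side.append("-")
            j -= 1
    # Letters left over at the start sit opposite terminal gaps
    while j > 0:
        out_top.append(top[j - 1])
        out_side.append("-")
        j -= 1
    while i > 0:
        out_top.append("-")
        out_side.append(side[i - 1])
        i -= 1
    return "".join(reversed(out_top)), "".join(reversed(out_side))

matrix = fill_matrix("AATC", "ATC")
for row in matrix:
    print(row)
print(traceback("AATC", "ATC", matrix), matrix[-1][-1])
```

Running it prints the four rows of final scores and the traceback ('AATC', '-ATC') with a bottom-right score of 15.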

Page 14

Computer-Generated Alignment

• Much faster than we are

– A 2 GHz processor runs 2 billion clock cycles per second

– Doesn't get tired, make arithmetic mistakes, or get hand cramps

Page 15

Alignment Process

Page 16

Types of Alignment

• Global

– Aligns the entire sequences, end to end

– Permits gaps

– Always produces an alignment, even if the sequences are not homologous

• Local

– Aligns only the most similar (highest-scoring) region(s)

– Usually contains few or no gaps
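Local alignment is classically computed with the Smith–Waterman algorithm: the same dynamic-programming matrix, except that no cell may drop below 0 and the traceback starts from the highest-scoring cell. A minimal Python sketch, assuming the manual-alignment example scores (match 5, mismatch -2, gap -4 per position) and made-up sequences:

```python
MATCH, MISMATCH, GAP = 5, -2, -4   # same example scores as the manual alignment

def local_align(a, b):
    """Smith-Waterman local alignment with a linear gap penalty.
    Returns (best score, aligned part of a, aligned part of b)."""
    H = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    best, best_i, best_j = 0, 0, 0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
            # Scores never drop below 0: a poor region is simply left out of the local alignment
            H[i][j] = max(0, H[i - 1][j - 1] + pair, H[i - 1][j] + GAP, H[i][j - 1] + GAP)
            if H[i][j] > best:
                best, best_i, best_j = H[i][j], i, j
    # Trace back from the best cell until a zero is reached
    i, j = best_i, best_j
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        pair = MATCH if a[i - 1] == b[j - 1] else MISMATCH
        if H[i][j] == H[i - 1][j - 1] + pair:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + GAP:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))

print(local_align("TTTACGTAAA", "GGACGTCC"))
```

It returns (20, 'ACGT', 'ACGT'): only the shared ACGT block is reported, while a global alignment would also have to account for the unrelated ends.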

Page 17

Beware!

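• The computer is not always right

– Alignments can be:

• Optimal: highest score

• True: evolutionarily correct (not always the same as optimal)

– Alignments can be improved by hand

• It is hard for a computer to place indels (gaps) accurately

– Apply prior knowledge, e.g., codons: in protein-coding sequence, gaps should span whole codons (multiples of 3 bases) so the reading frame is kept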

[Figure: the same gap shown in the nucleotide sequence vs. the amino acid sequence]

-AAA CCC → Lys Pro

AA- ACC C → ??? Thr ?
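To see why codons matter, this sketch translates an aligned nucleotide string three characters at a time and flags codons that a gap splits. Assumptions: the codon table holds only the three codons used here (a hypothetical partial table), and the labels "???" and "?" mimic the figure's notation.

```python
# Partial codon table: only the codons in this example (the standard genetic code has 64).
CODONS = {"AAA": "Lys", "ACC": "Thr", "CCC": "Pro"}

def translate_aligned(seq):
    """Translate an aligned nucleotide string three characters at a time."""
    protein = []
    for k in range(0, len(seq), 3):
        codon = seq[k:k + 3]
        if codon == "---":
            protein.append("---")   # whole-codon gap: one amino acid missing, frame kept
        elif len(codon) < 3:
            protein.append("?")     # leftover bases that do not fill a codon
        elif "-" in codon:
            protein.append("???")   # gap inside a codon: frame broken
        else:
            protein.append(CODONS.get(codon, "Xaa"))  # Xaa = codon not in this partial table
    return protein

print(translate_aligned("AAACCC"))
print(translate_aligned("---AAACCC"))
print(translate_aligned("AA-ACCC"))
```

AAACCC gives Lys, Pro and ---AAACCC gives ---, Lys, Pro (one amino acid missing, frame intact), while AA-ACCC gives ???, Thr, ?.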

Page 18

BLAST

• Basic Local Alignment Search Tool

– One of the most widely used alignment tools

– Locally aligns 1 sequence (the query) against the sequences (subjects) in a database

• Uses a “heuristic” (shortcut) to reduce the number of sequences it actually has to align: it first looks for short matching “words” and extends only those hits

– Like using “Google” to find the most similar (likely homologous) sequences
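A toy sketch of the word-seeding idea behind BLAST's shortcut: index every k-letter word in the database and keep only the subjects that share words with the query. This is a simplification under stated assumptions (exact words only, k = 4, made-up subject names and sequences), not BLAST's actual implementation; real BLAST also extends the hits into alignments and reports their statistical significance.

```python
from collections import defaultdict

def build_index(database, k):
    """Map every k-letter word to the (subject name, position) pairs where it occurs."""
    index = defaultdict(list)
    for name, seq in database.items():
        for pos in range(len(seq) - k + 1):
            index[seq[pos:pos + k]].append((name, pos))
    return index

def candidate_subjects(query, index, k):
    """Count word hits per subject; only subjects with hits would go on to full alignment."""
    hits = defaultdict(int)
    for pos in range(len(query) - k + 1):
        for name, _ in index.get(query[pos:pos + k], []):
            hits[name] += 1
    return dict(hits)

# Made-up database of three subjects
database = {
    "subject1": "ATGGCGTACGTTAG",
    "subject2": "CCCCCCCCCCCC",
    "subject3": "TTACGTTTTT",
}
index = build_index(database, k=4)
print(candidate_subjects("GCGTACGT", index, k=4))
```

subject1 shares 5 words with the query and subject3 shares 2; subject2 shares none, so it is never aligned at all. That skipped work is where the speed comes from.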

Page 19

BLAST Input

Page 20

BLAST Output

Page 21

How Does This Impact Me?

• Human Microbiome Project – Sequence the bacteria living in the human body, e.g., in the intestines

• Billions of bacteria in each gram of excrement – Which ones make us sick? How different is the gut flora from person to person?

• Ocean virus metagenomics project – Get an idea of virus diversity across the globe

• A boat goes around N.A. collecting samples – Billions of viruses in each gallon of seawater

Page 22

How Does This Impact Me (cont’d)?

• Previously: take swabs and grow colonies on agar

– e.g., studying antimicrobial resistance in turkeys

• Sequencing removes the middle (culturing) step

• How to quickly assign a genus and species to new sequences?

– BLAST

• Project: new phages from ponds

Page 23

Other Uses for Alignments

Page 24

SNP Detection

• Single Nucleotide Polymorphism

– A single-base difference at one alignment position, found in at least one sequence

– May have biological significance, e.g.:

• Antibiotic resistance

• Changes that help avoid detection by the immune system

• Cause of a genetic disease (e.g., cystic fibrosis)
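A minimal sketch of the first step of SNP detection: scanning an alignment for columns where the sequences disagree. The alignment is made up, and gaps are ignored; a variable column is only a candidate SNP, since real pipelines also check sequencing quality and read depth before calling one.

```python
def find_snps(alignment):
    """Return (column index, base counts) for every column where the sequences disagree.
    Gap characters ('-') are ignored."""
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    snps = []
    for col in range(length):
        counts = {}
        for seq in alignment:
            if seq[col] != "-":
                counts[seq[col]] = counts.get(seq[col], 0) + 1
        if len(counts) > 1:
            snps.append((col, counts))
    return snps

aligned = ["ATGCTAGC", "ATGCTAGC", "ATGTTAGC", "ATGCTAAC"]
for col, counts in find_snps(aligned):
    print(f"position {col + 1}: {counts}")
```

Positions 4 (C in three sequences, T in one) and 7 (G in three, A in one) are reported; every other column is identical.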

Page 25

Phylogenetic Trees

• Computer generated by:

– Examining alignment

– Looking for shared mutations

• Show relationship(s) between sequences

– History of sequences

• Where they came from

• Genetic changes that have occurred

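[Figure: example phylogenetic tree of nine sequences (CY065067, CY061195, CY065107, GU562458, CY065059, CY098563, CY098130, CY065011, CY061578) with a clade, node, leaf and branch labeled; drawn with the iOS Phylogram App (Free)]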
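The slide does not say which tree-building method is used. As one simple, well-known example, the sketch below builds a tree with UPGMA (average-linkage clustering of pairwise distances), assuming p-distances (the fraction of differing ungapped positions) and made-up sequences s1 to s4. UPGMA assumes every lineage changes at the same rate, so analyses of real data more often use methods such as neighbor joining, maximum parsimony or maximum likelihood.

```python
from itertools import combinations

def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    return sum(x != y for x, y in pairs) / len(pairs)

def upgma(seqs):
    """Cluster aligned sequences with UPGMA; returns a Newick string without branch lengths."""
    sizes = {name: 1 for name in seqs}   # cluster label (Newick text) -> number of sequences
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(sizes) > 1:
        # Merge the two closest clusters
        a, b = min(combinations(sizes, 2), key=lambda pair: dist[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = f"({a},{b})"
        # Distance to the new cluster = size-weighted average of the old distances
        for other in sizes:
            dist[frozenset((merged, other))] = (
                size_a * dist[frozenset((a, other))] + size_b * dist[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    (root,) = sizes
    return root + ";"

seqs = {
    "s1": "AAAAAAAAAA",
    "s2": "AAAAAAAAAT",
    "s3": "AAAATTTTAA",
    "s4": "AAAATTTTTT",
}
print(upgma(seqs))
```

It prints ((s1,s2),(s3,s4));, grouping the two pairs that share the most mutations into separate clades.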

Page 26

Recombination

• Can occur in all types of organisms

– Eukaryotes

– Prokaryotes

– Viruses

• May change characteristics of the organism

– Make you sick (or not)

– Not recognized by the immune system

– Fast way of getting lots of genetic changes

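[Figure: Genome 1 and Genome 2 combined at a breakpoint during copying by the RNA-dependent RNA polymerase (RdRP), giving a daughter sequence with a major parent and a minor parent]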
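A minimal sketch of locating a breakpoint, assuming the daughter and both candidate parents are already aligned, there is exactly one breakpoint, and the daughter's left part comes from one parent and its right part from the other. It tries every split point and keeps the one where the daughter matches its parents at the most positions. The sequences are made up, and real recombination-detection software also tests statistical significance.

```python
def best_breakpoint(daughter, left_parent, right_parent):
    """Return (breakpoint, matching positions): daughter[:bp] is compared with
    left_parent and daughter[bp:] with right_parent; the first best split wins."""
    best = (0, -1)
    for bp in range(len(daughter) + 1):
        left = sum(d == p for d, p in zip(daughter[:bp], left_parent[:bp]))
        right = sum(d == p for d, p in zip(daughter[bp:], right_parent[bp:]))
        if left + right > best[1]:
            best = (bp, left + right)
    return best

parent1 = "GATTACAGATTACA"
parent2 = "GCTTAGAGCTTTGA"
daughter = parent1[:7] + parent2[7:]   # built with a switch after position 7
print(best_breakpoint(daughter, parent1, parent2))
```

It prints (6, 14). Split points 6, 7 and 8 all explain all 14 positions because the two parents agree at the 7th and 8th positions, so the breakpoint can only be narrowed to that interval; this sketch reports the first.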

Page 27

Reassortment

• Whole chromosomes (or genome segments, as in segmented viruses such as influenza) from one organism replace those of another

– May change characteristic of organism

• Make you sick (or not)

• Not recognized by immune system

• Fast way of getting lots of genetic changes

[Figure: segments from two parent genomes (+) combining (=) into a new, reassorted genome]

Page 28

Other Analysis Options

• Align Sequences

• Look for genetic changes (genotype) that are associated with traits (phenotype)

– Host

– How sick it makes you

– Drug resistance

– Inherited disease

• Do any mutations consistently accompany the traits?

– Genome-Wide Association Studies

http://lovestats.wordpress.com/dman/
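One way to ask whether a mutation "consistently accompanies" a trait is to count, at one alignment column, how often each base occurs with and without the trait, and apply Fisher's exact test to the resulting 2×2 table. This is a sketch under assumptions: made-up data for ten sequences, a single column, and a two-sided test computed directly from the hypergeometric distribution. A genome-wide study repeats this over many positions and must correct for multiple testing.

```python
from math import comb

def contingency(column, phenotypes, allele, trait):
    """2x2 counts: (allele & trait, allele & no trait, other & trait, other & no trait)."""
    a = b = c = d = 0
    for base, pheno in zip(column, phenotypes):
        if base == allele:
            if pheno == trait:
                a += 1
            else:
                b += 1
        elif pheno == trait:
            c += 1
        else:
            d += 1
    return a, b, c, d

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):  # hypergeometric probability that the top-left cell equals x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(col1, row1)
    # Sum every table with the same margins that is at most as probable as the observed one
    return sum(prob(x) for x in range(lowest, highest + 1) if prob(x) <= p_observed * (1 + 1e-9))

column = "TTTTTCCCCC"               # one alignment column, ten sequences (made-up data)
phenotypes = ["R"] * 5 + ["S"] * 5  # R = drug resistant, S = susceptible (made-up data)
table = contingency(column, phenotypes, allele="T", trait="R")
print(table, fisher_exact_two_sided(*table))
```

Here every T carries trait R and every C does not, giving the table (5, 0, 0, 5) and p = 2/252 ≈ 0.008.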

Page 29

Page 30

How Does an Alignment Get a Score?

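• Amino acids

– Identical >> similar >> dissimilar: identical amino acids score highest, chemically similar ones score less, and dissimilar ones score lowest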

Page 31

Score Lookup Table (Matrix)

• Symmetrical

• Positive scores on the diagonal (matches)

• Some mismatches get negative scores

• Some mismatches don’t (similar amino acids)
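A minimal sketch of a score lookup, using a tiny hypothetical table (NOT real BLOSUM or PAM values) that follows the rules above: positive scores on the diagonal, positive scores for some similar pairs, a negative score for a dissimilar pair, and symmetry, so only one triangle is stored.

```python
# Toy substitution scores: hypothetical values for illustration only, not BLOSUM62 or PAM.
# Only one triangle of the symmetric table is stored; lookups try both orders.
TOY_MATRIX = {
    ("L", "L"): 4, ("I", "I"): 4, ("K", "K"): 5, ("R", "R"): 5, ("D", "D"): 6,  # matches
    ("I", "L"): 2,   # similar (hydrophobic) residues: mismatch with a positive score
    ("K", "R"): 2,   # similar (positively charged) residues: positive
    ("D", "L"): -4,  # dissimilar residues: negative
}

def pair_score(x, y):
    """Look up a score using symmetry: score(x, y) == score(y, x)."""
    return TOY_MATRIX[(x, y)] if (x, y) in TOY_MATRIX else TOY_MATRIX[(y, x)]

def score_aligned(a, b):
    """Sum the substitution scores of two aligned, gap-free protein sequences."""
    return sum(pair_score(x, y) for x, y in zip(a, b))

print(score_aligned("LKD", "IRD"))  # similar residues throughout
print(score_aligned("LKD", "LKL"))  # one dissimilar substitution
```

With these toy values, LKD vs. IRD scores 2 + 2 + 6 = 10 and LKD vs. LKL scores 4 + 5 - 4 = 5. Real programs load a full published matrix such as BLOSUM62 instead.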