algorithmic approaches for biological data, lecture #23 · 2020. 1. 16. · algorithmic approaches...

169
Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York American Museum of Natural History 2 May 2016

Upload: others

Post on 24-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithmic Approaches for Biological Data, Lecture #23

Katherine St. John

City University of New YorkAmerican Museum of Natural History

2 May 2016

Page 2: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & SpaceComplexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Page 3: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & SpaceComplexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Page 4: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & SpaceComplexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Page 5: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & SpaceComplexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Page 6: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Outline

Project & Last Day Notes

Complexity Revisited: NP-hardness, Time & SpaceComplexity

Searching for Optimal Trees

Edit distances between trees

Tree Vectors

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 2 / 54

Page 7: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)

I Open Lab for questions about labs,Rosalind questions, etc.

For those who cannot make Monday, possible todo presentation this Wednesday (see me).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54

Page 8: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)

I Open Lab for questions about labs,Rosalind questions, etc.

For those who cannot make Monday, possible todo presentation this Wednesday (see me).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54

Page 9: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)

I Open Lab for questions about labs,Rosalind questions, etc.

For those who cannot make Monday, possible todo presentation this Wednesday (see me).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54

Page 10: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)

I Open Lab for questions about labs,Rosalind questions, etc.

For those who cannot make Monday, possible todo presentation this Wednesday (see me).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54

Page 11: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Project & Last Day Notes

Last week of lectures (and lab).

Next Monday (last day of class):

I Project Presentations (about 5 minutes):brief overview of biological question,techniques used, and results)

I Open Lab for questions about labs,Rosalind questions, etc.

For those who cannot make Monday, possible todo presentation this Wednesday (see me).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 3 / 54

Page 12: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Revisited: Running Times

Theoretical Estimates on Running Time

big-Oh notation

Complexity Classes

P vs. NP

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54

Page 13: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Revisited: Running Times

Theoretical Estimates on Running Time

big-Oh notation

Complexity Classes

P vs. NP

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54

Page 14: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Revisited: Running Times

Theoretical Estimates on Running Time

big-Oh notation

Complexity Classes

P vs. NP

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54

Page 15: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Revisited: Running Times

Theoretical Estimates on Running Time

big-Oh notation

Complexity Classes

P vs. NP

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 4 / 54

Page 16: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts oftime.

How long does the algorithm take proportional to n?

Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in

2002) that is hybrid of merge sort and insertion sort.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54

Page 17: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts oftime.

How long does the algorithm take proportional to n?

Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in

2002) that is hybrid of merge sort and insertion sort.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54

Page 18: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts oftime.

How long does the algorithm take proportional to n?

Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in

2002) that is hybrid of merge sort and insertion sort.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54

Page 19: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts oftime.

How long does the algorithm take proportional to n?

Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in

2002) that is hybrid of merge sort and insertion sort.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54

Page 20: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Comparing Algorithms

Measure the size of the problem, usually called n.

Example: for sorting cards, n is the number of cards.

Different approaches can take different amounts oftime.

How long does the algorithm take proportional to n?

Sorting Algorithms demoNot in demo is the built-in Python sort: timSort (invented by Tim Peters in

2002) that is hybrid of merge sort and insertion sort.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 5 / 54

Page 21: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 22: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 23: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 24: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 25: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 26: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))

if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 27: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 28: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 29: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analysis of Algorithms

How long does the algorithm take proportional to n?

If an algorithm looks at each element once (or aconstant number of times), the running time isproportional to n, the number of elements.

Then, the algorithm runs in linear time.

Would write “the running time is O(n).”(“big-Oh” notation).

Usually measure the worst-case running time.

Formally: If the running time of an algorithm is f (n)for n items, then we f (n) = O(g(n))if there exists N, c > 0 such that for all n > N,

f (n) < c · g(n)

(That is, after some point, f (n) is smaller thanc · g(n).)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 6 / 54

Page 30: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 31: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 32: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 33: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 34: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 35: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 36: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 37: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 38: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 39: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).

(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 40: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyzing Sorts by Running Times

The sorting algorithms vary in running time, depending onnumber of elements and type of data.

Sorting Demo

Thinking about the worst-case, how many operations areperformed in bubbleSort?

def bubbleSort(a): #Let n be # of elements in a.

for j in range(1,len(a)): #Will go through this loop n times.

for i in range(1,len(a)): #Will go through this loop n times.

if a[i-1] > a[i]: #Takes constant time.

a[i-1], a[i] = a[i], a[i-1] #Takes constant time.

The lines in the if statement take constant time, but areperformed n · n time.

Upper bound on running time is O(c · n · n) = O(n2).(For big-Oh notation, drop constants and keep only largest terms.)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 7 / 54

Page 41: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Classes: What is NP-hardness?

P?= NP: Roughly, if the answer to a

problem can be checked quickly, canit be computed quickly?

P stands for problems that can becomputed quickly (polynomial time).

NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).

Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54

Page 42: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Classes: What is NP-hardness?

P?= NP: Roughly, if the answer to a

problem can be checked quickly, canit be computed quickly?

P stands for problems that can becomputed quickly (polynomial time).

NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).

Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54

Page 43: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Complexity Classes: What is NP-hardness?

P?= NP: Roughly, if the answer to a

problem can be checked quickly, canit be computed quickly?

P stands for problems that can becomputed quickly (polynomial time).

NP stands for problems that can bechecked quickly (nondeterministicpolynomial time).

Example: given a geometry proof,you can check if its correct quickly,but knowing that, is there a quickway to find proofs?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 8 / 54

Page 44: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Millennium Prize Problems

In 2000, the Clay Mathematics Institute an-nounced million dollar prizes for:

Birch and Swinnerton-DyerConjecture

Hodge Conjecture

Navier-Stokes Equations

P vs NP

Poincare Conjecture

Riemann Hypothesis

Yang-Mills Theory

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 9 / 54

Page 45: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Millenium Prize Problems

Grigori Perelman, 1993

In 2010, CMI announced Grigori Perelmansolved:

Birch and Swinnerton-DyerConjecture

Hodge Conjecture

Navier-Stokes Equations

P vs NP

Poincare Conjecture

Riemann Hypothesis

Yang-Mills Theory

He turned down the prize.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 10 / 54

Page 46: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Millenium Prize Problems

Grigori Perelman, 1993

In 2010, CMI announced Grigori Perelmansolved:

Birch and Swinnerton-DyerConjecture

Hodge Conjecture

Navier-Stokes Equations

P vs NP

Poincare Conjecture

Riemann Hypothesis

Yang-Mills Theory

He turned down the prize.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 10 / 54

Page 47: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P vs. NP

Birch and Swinnerton-DyerConjecture

Hodge Conjecture

Navier-Stokes Equations

P vs. NP

Poincare Conjecture

Riemann Hypothesis

Yang-Mills Theory

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 11 / 54

Page 48: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Examples of NP Problems

���������� ���

����

������

������

�������

����

����������� �����

������������

�������������

�������������

����

��� ���� ���

!���

��"������� ���

#��� ������$��� ���

%���

��$������� ���

&���

����� ���� �

����'�������� �����������

�'�����(��)(��

*'�)���

������)�'���

��)���)�'���

+�)(����,����,-

������

��.�-�),

��������''�����

�����(��

��������

�,'�����

������)�

��.�

,�����

���)��,�

������,��

��)���,��

��������

.��������

.-����(

������

������)�(��

.�����(��

��'�*�)���

��+���

���

�)� ���

��.�������

�),�����

.+

+/

�/�/

��������)��

��.�)���.��,

��+������

� � � � � � �

���,����.��

�'��)�

�)������'�����

����/

�//)/�/

�/0/

��' /��)-'���

�������������

��.*����'���1�'��)���)

�)�������.�)���'���

��2 �"�3$��3��� ����&�4�

����

0 50 100 150 Miles

0 50 100 150 200 Kilometers

��'�)���

0 100 200 300 400 Miles

0 100 200 300 400 500 600 Kilometers

��)����

Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.

Mont Tremblant

MonctonPresque Isle

Yellowknife

Sand Spit

Prince RupertTerrace

Smithers

Fort St. John

Fort McMurrayPrince George

Kamloops

KelownaNanaimo

Penticton

CastlegarCranbrook

LethbridgeMedicine Hat

Thunder Bay

Sault Ste.Marie

North Bay

Sarnia

Grande Prairie

Sudbury

TimminsRouyn-Noranda

Kingston

Baie-Comeau

Wabush

Mont-Joli

Gaspe

Charlottetown

Bathurst

Fredericton

Saint John

Sydney

Goose Bay

Deer LakeGander

Îles de la Madeliene

Windsor

Vancouver

Toronto

Edmonton

Calgary

Winnipeg

Halifax

Ottawa

Victoria

London

City

Regina

Saskatoon

Cullaton LakeEnnadai Lake

Saguenay

Bangor

Miami

Orlando

West Palm Beach

Portland

Seattle

Boise

San Jose

Las Vegas

LOS ANGELES

San Diego

SAN FRANCISCO Oakland

DENVER

Sacramento

Salt Lake City

Tucson

Phoenix/ScottsdaleAlbuquerque

Charleston

Colorado Springs

Greenville/Spartanburg

Savannah

Baltimore

Birmingham

HOUSTON(INTERCONTINENTAL)

Louisville

Memphis

Milwaukee

Philadelphia

San Antonio

St. Louis

Tampa/St. Petersburg

Charlotte

CLEVELAND

Dallas/Fort Worth

Detroit

Jacksonville

Kansas City

NewOrleans

New York (La Guardia) (J.F. Kennedy)

Norfolk/Virginia Beach

Omaha

Albany

Atlanta

Austin

Boston

Columbia

Columbus

NashvilleOklahoma City

Raleigh/Durham

Richmond

WASHINGTON, DC (DULLES)

Hartford/Springfield

Cincinnati

Bozeman

Orange County

Portland

Providence

NEW YORK (NEWARK)

Greensboro/High Point/Winston-Salem

Lexington

GrandRapids

Ft. Lauderdale/Hollywood

Syracuse

Buffalo/Niagara Falls

KnoxvilleTulsa

El Paso

Honolulu

Manchester

Ft. Myers

Kahului

Indianapolis

Minneapolis

Dayton

Allentown

Madison

Pittsburgh

Appleton/Fox Cities

Burlington

CedarRapids/Iowa City

Wausau

DesMoines

Ft.Wayne

Green Bay

WhitePlains

Lansing

Moline

Rochester

SouthBend/Elkhart/Mishawaka

Springfield

Spokane

Wichita

Lincoln

Missoula

Rapid City

Reno/Tahoe

Charleston

Traverse City

Akron/Canton StateCollege

Jackson Hole

Kona

Burbank

Gunnison/CrestedButte

Hayden/SteamboatSprings

Montrose

Vail/Eagle

Fargo

Gillette

Rock Springs

Crescent City

Eureka

Aspen

Wilkes Barre/Scranton

Bakersfield

Charlottesville

Chico

Carlsbad

Cody/Yellowstone

Casper

Eugene

Fresno

SiouxFalls

GrandJunction

Medford

Pasco

Palm Springs

Santa Barbara

Roanoke

Imperial

Inyokern

Monterey

San Luis Obispo

Santa Maria

Yuma

Modesto

Springfield

Redmond

Redding

(Reagan National)

Bismarck

Peoria

Asheville

Augusta

Pensacola

Myrtle Beach

Fayetteville/Ft. Bragg

Gainesville

Hilton Head Island

Huntsville/Decatur

Jacksonville

Long Island/Islip

New Bern

Tri-Cities Regional

Wilmington

Newport News/Williamsburg

GreenvilleNorthwestArkansas

Great Falls

LittleRock

Billings

AltoonaJohnstown

Beckley

ShenandoahValley

ClarksburgMorgantown

Helena

KlamathFalls

North Bend

Midland/Odessa

Chattanooga

Gulfport/Biloxi

Huntington

New Haven

Williamsport

Jackson Montgomery

Mobile

Salisbury

Newburgh

Ft. WaltonBeach

Florence

Durango

Paducah

Brownsville

BatonRouge

Corpus Christi

Harlingen

Laredo

McAllen

Daytona

Lubbock

Amarillo

Dallas (Love)

Waco

College Station

Lafayette

Alexandria

LakeCharles

Shreveport

Beaumont/Pt. Arthur

Tyler

Monroe

Victoria

Erie

LiberalDodge City

Great BendGarden City

Hays

Prescott

Hilo

Flint

Long BeachFlagstaff

Midland/Saginaw

Parkersburg

Lynchburg

Elmira

Hyannis

Bar Harbor

Presque Isle

Nassau

Tallahassee

Treasure Cay

Cat IslandAndros Town

Nantucket

LOS ANGELES

SAN FRANCISCO

DENVER

Toronto

Honolulu

Ontario

Kahului

HarrisburgLincoln

Kona

Fargo

Casper SiouxFalls

Bismarck

IthacaBinghamton

Idaho Falls

Kalispell

Billings Duluth

Jackson

Salisbury

Muskegon

Brownsville

Corpus Christi

Harlingen

Laredo

McAllen

Eau Claire

Houghton

Minot

Pierre

Alliance

Chadron

Scottsbluff

Liberal

Kearney

Laramie

Huron

McCook

Dodge CityGreat Bend

Hays

AlamosaPuebloCortez

Farmington

TelluridePage/Lake Powell

Show LowPrescott

Moab

Worland

Sheridan

DickinsonMiles City

SidneyWilliston

Glasgow

Lewistown

Visalia

Hilo

Kapalua

Key West

GrandIsland

Vernal

North PlatteCheyenne

Riverton

LOS ANGELES

SAN FRANCISCO

DENVER

to Anchorage

Bimini

Freeport

George Town

North EleutheraGovernors Harbour

Marsh Harbour

Jamestown

Dubois

BradfordFranklin

Lewisburg

Clearfield

Sarasota/Bradenton

Plattsburgh

Melbourne

Killeen

Del Rio

Mammoth Lakes

Hobbs

Glendive

St. George

�������������

New York (Penn Station)

Boston

Newark(Liberty)

New Haven

Philadelphia

Washington, DC

Stamford

Wilmington

Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service

Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.

Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.

Sudoku: find a solution to a (large) Sudoku puzzle.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54

Page 49: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Examples of NP Problems

���������� ���

����

������

������

�������

����

����������� �����

������������

�������������

�������������

����

��� ���� ���

!���

��"������� ���

#��� ������$��� ���

%���

��$������� ���

&���

����� ���� �

����'�������� �����������

�'�����(��)(��

*'�)���

������)�'���

��)���)�'���

+�)(����,����,-

������

��.�-�),

��������''�����

�����(��

��������

�,'�����

������)�

��.�

,�����

���)��,�

������,��

��)���,��

��������

.��������

.-����(

������

������)�(��

.�����(��

��'�*�)���

��+���

���

�)� ���

��.�������

�),�����

.+

+/

�/�/

��������)��

��.�)���.��,

��+������

� � � � � � �

���,����.��

�'��)�

�)������'�����

����/

�//)/�/

�/0/

��' /��)-'���

�������������

��.*����'���1�'��)���)

�)�������.�)���'���

��2 �"�3$��3��� ����&�4�

����

0 50 100 150 Miles

0 50 100 150 200 Kilometers

��'�)���

0 100 200 300 400 Miles

0 100 200 300 400 500 600 Kilometers

��)����

Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.

Mont Tremblant

MonctonPresque Isle

Yellowknife

Sand Spit

Prince RupertTerrace

Smithers

Fort St. John

Fort McMurrayPrince George

Kamloops

KelownaNanaimo

Penticton

CastlegarCranbrook

LethbridgeMedicine Hat

Thunder Bay

Sault Ste.Marie

North Bay

Sarnia

Grande Prairie

Sudbury

TimminsRouyn-Noranda

Kingston

Baie-Comeau

Wabush

Mont-Joli

Gaspe

Charlottetown

Bathurst

Fredericton

Saint John

Sydney

Goose Bay

Deer LakeGander

Îles de la Madeliene

Windsor

Vancouver

Toronto

Edmonton

Calgary

Winnipeg

Halifax

Ottawa

Victoria

London

City

Regina

Saskatoon

Cullaton LakeEnnadai Lake

Saguenay

Bangor

Miami

Orlando

West Palm Beach

Portland

Seattle

Boise

San Jose

Las Vegas

LOS ANGELES

San Diego

SAN FRANCISCO Oakland

DENVER

Sacramento

Salt Lake City

Tucson

Phoenix/ScottsdaleAlbuquerque

Charleston

Colorado Springs

Greenville/Spartanburg

Savannah

Baltimore

Birmingham

HOUSTON(INTERCONTINENTAL)

Louisville

Memphis

Milwaukee

Philadelphia

San Antonio

St. Louis

Tampa/St. Petersburg

Charlotte

CLEVELAND

Dallas/Fort Worth

Detroit

Jacksonville

Kansas City

NewOrleans

New York (La Guardia) (J.F. Kennedy)

Norfolk/Virginia Beach

Omaha

Albany

Atlanta

Austin

Boston

Columbia

Columbus

NashvilleOklahoma City

Raleigh/Durham

Richmond

WASHINGTON, DC (DULLES)

Hartford/Springfield

Cincinnati

Bozeman

Orange County

Portland

Providence

NEW YORK (NEWARK)

Greensboro/High Point/Winston-Salem

Lexington

GrandRapids

Ft. Lauderdale/Hollywood

Syracuse

Buffalo/Niagara Falls

KnoxvilleTulsa

El Paso

Honolulu

Manchester

Ft. Myers

Kahului

Indianapolis

Minneapolis

Dayton

Allentown

Madison

Pittsburgh

Appleton/Fox Cities

Burlington

CedarRapids/Iowa City

Wausau

DesMoines

Ft.Wayne

Green Bay

WhitePlains

Lansing

Moline

Rochester

SouthBend/Elkhart/Mishawaka

Springfield

Spokane

Wichita

Lincoln

Missoula

Rapid City

Reno/Tahoe

Charleston

Traverse City

Akron/Canton StateCollege

Jackson Hole

Kona

Burbank

Gunnison/CrestedButte

Hayden/SteamboatSprings

Montrose

Vail/Eagle

Fargo

Gillette

Rock Springs

Crescent City

Eureka

Aspen

Wilkes Barre/Scranton

Bakersfield

Charlottesville

Chico

Carlsbad

Cody/Yellowstone

Casper

Eugene

Fresno

SiouxFalls

GrandJunction

Medford

Pasco

Palm Springs

Santa Barbara

Roanoke

Imperial

Inyokern

Monterey

San Luis Obispo

Santa Maria

Yuma

Modesto

Springfield

Redmond

Redding

(Reagan National)

Bismarck

Peoria

Asheville

Augusta

Pensacola

Myrtle Beach

Fayetteville/Ft. Bragg

Gainesville

Hilton Head Island

Huntsville/Decatur

Jacksonville

Long Island/Islip

New Bern

Tri-Cities Regional

Wilmington

Newport News/Williamsburg

GreenvilleNorthwestArkansas

Great Falls

LittleRock

Billings

AltoonaJohnstown

Beckley

ShenandoahValley

ClarksburgMorgantown

Helena

KlamathFalls

North Bend

Midland/Odessa

Chattanooga

Gulfport/Biloxi

Huntington

New Haven

Williamsport

Jackson Montgomery

Mobile

Salisbury

Newburgh

Ft. WaltonBeach

Florence

Durango

Paducah

Brownsville

BatonRouge

Corpus Christi

Harlingen

Laredo

McAllen

Daytona

Lubbock

Amarillo

Dallas (Love)

Waco

College Station

Lafayette

Alexandria

LakeCharles

Shreveport

Beaumont/Pt. Arthur

Tyler

Monroe

Victoria

Erie

LiberalDodge City

Great BendGarden City

Hays

Prescott

Hilo

Flint

Long BeachFlagstaff

Midland/Saginaw

Parkersburg

Lynchburg

Elmira

Hyannis

Bar Harbor

Presque Isle

Nassau

Tallahassee

Treasure Cay

Cat IslandAndros Town

Nantucket

LOS ANGELES

SAN FRANCISCO

DENVER

Toronto

Honolulu

Ontario

Kahului

HarrisburgLincoln

Kona

Fargo

Casper SiouxFalls

Bismarck

IthacaBinghamton

Idaho Falls

Kalispell

Billings Duluth

Jackson

Salisbury

Muskegon

Brownsville

Corpus Christi

Harlingen

Laredo

McAllen

Eau Claire

Houghton

Minot

Pierre

Alliance

Chadron

Scottsbluff

Liberal

Kearney

Laramie

Huron

McCook

Dodge CityGreat Bend

Hays

AlamosaPuebloCortez

Farmington

TelluridePage/Lake Powell

Show LowPrescott

Moab

Worland

Sheridan

DickinsonMiles City

SidneyWilliston

Glasgow

Lewistown

Visalia

Hilo

Kapalua

Key West

GrandIsland

Vernal

North PlatteCheyenne

Riverton

LOS ANGELES

SAN FRANCISCO

DENVER

to Anchorage

Bimini

Freeport

George Town

North EleutheraGovernors Harbour

Marsh Harbour

Jamestown

Dubois

BradfordFranklin

Lewisburg

Clearfield

Sarasota/Bradenton

Plattsburgh

Melbourne

Killeen

Del Rio

Mammoth Lakes

Hobbs

Glendive

St. George

�������������

New York (Penn Station)

Boston

Newark(Liberty)

New Haven

Philadelphia

Washington, DC

Stamford

Wilmington

Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service

Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.

Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.

Sudoku: find a solution to a (large) Sudoku puzzle.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54

Page 50: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Examples of NP Problems

���������� ���

����

������

������

�������

����

����������� �����

������������

�������������

�������������

����

��� ���� ���

!���

��"������� ���

#��� ������$��� ���

%���

��$������� ���

&���

����� ���� �

����'�������� �����������

�'�����(��)(��

*'�)���

������)�'���

��)���)�'���

+�)(����,����,-

������

��.�-�),

��������''�����

�����(��

��������

�,'�����

������)�

��.�

,�����

���)��,�

������,��

��)���,��

��������

.��������

.-����(

������

������)�(��

.�����(��

��'�*�)���

��+���

���

�)� ���

��.�������

�),�����

.+

+/

�/�/

��������)��

��.�)���.��,

��+������

� � � � � � �

���,����.��

�'��)�

�)������'�����

����/

�//)/�/

�/0/

��' /��)-'���

�������������

��.*����'���1�'��)���)

�)�������.�)���'���

��2 �"�3$��3��� ����&�4�

����

0 50 100 150 Miles

0 50 100 150 200 Kilometers

��'�)���

0 100 200 300 400 Miles

0 100 200 300 400 500 600 Kilometers

��)����

Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.

Mont Tremblant

MonctonPresque Isle

Yellowknife

Sand Spit

Prince RupertTerrace

Smithers

Fort St. John

Fort McMurrayPrince George

Kamloops

KelownaNanaimo

Penticton

CastlegarCranbrook

LethbridgeMedicine Hat

Thunder Bay

Sault Ste.Marie

North Bay

Sarnia

Grande Prairie

Sudbury

TimminsRouyn-Noranda

Kingston

Baie-Comeau

Wabush

Mont-Joli

Gaspe

Charlottetown

Bathurst

Fredericton

Saint John

Sydney

Goose Bay

Deer LakeGander

Îles de la Madeliene

Windsor

Vancouver

Toronto

Edmonton

Calgary

Winnipeg

Halifax

Ottawa

Victoria

London

City

Regina

Saskatoon

Cullaton LakeEnnadai Lake

Saguenay

Bangor

Miami

Orlando

West Palm Beach

Portland

Seattle

Boise

San Jose

Las Vegas

LOS ANGELES

San Diego

SAN FRANCISCO Oakland

DENVER

Sacramento

Salt Lake City

Tucson

Phoenix/ScottsdaleAlbuquerque

Charleston

Colorado Springs

Greenville/Spartanburg

Savannah

Baltimore

Birmingham

HOUSTON(INTERCONTINENTAL)

Louisville

Memphis

Milwaukee

Philadelphia

San Antonio

St. Louis

Tampa/St. Petersburg

Charlotte

CLEVELAND

Dallas/Fort Worth

Detroit

Jacksonville

Kansas City

NewOrleans

New York (La Guardia) (J.F. Kennedy)

Norfolk/Virginia Beach

Omaha

Albany

Atlanta

Austin

Boston

Columbia

Columbus

NashvilleOklahoma City

Raleigh/Durham

Richmond

WASHINGTON, DC (DULLES)

Hartford/Springfield

Cincinnati

Bozeman

Orange County

Portland

Providence

NEW YORK (NEWARK)

Greensboro/High Point/Winston-Salem

Lexington

GrandRapids

Ft. Lauderdale/Hollywood

Syracuse

Buffalo/Niagara Falls

KnoxvilleTulsa

El Paso

Honolulu

Manchester

Ft. Myers

Kahului

Indianapolis

Minneapolis

Dayton

Allentown

Madison

Pittsburgh

Appleton/Fox Cities

Burlington

CedarRapids/Iowa City

Wausau

DesMoines

Ft.Wayne

Green Bay

WhitePlains

Lansing

Moline

Rochester

SouthBend/Elkhart/Mishawaka

Springfield

Spokane

Wichita

Lincoln

Missoula

Rapid City

Reno/Tahoe

Charleston

Traverse City

Akron/Canton StateCollege

Jackson Hole

Kona

Burbank

Gunnison/CrestedButte

Hayden/SteamboatSprings

Montrose

Vail/Eagle

Fargo

Gillette

Rock Springs

Crescent City

Eureka

Aspen

Wilkes Barre/Scranton

Bakersfield

Charlottesville

Chico

Carlsbad

Cody/Yellowstone

Casper

Eugene

Fresno

SiouxFalls

GrandJunction

Medford

Pasco

Palm Springs

Santa Barbara

Roanoke

Imperial

Inyokern

Monterey

San Luis Obispo

Santa Maria

Yuma

Modesto

Springfield

Redmond

Redding

(Reagan National)

Bismarck

Peoria

Asheville

Augusta

Pensacola

Myrtle Beach

Fayetteville/Ft. Bragg

Gainesville

Hilton Head Island

Huntsville/Decatur

Jacksonville

Long Island/Islip

New Bern

Tri-Cities Regional

Wilmington

Newport News/Williamsburg

GreenvilleNorthwestArkansas

Great Falls

LittleRock

Billings

AltoonaJohnstown

Beckley

ShenandoahValley

ClarksburgMorgantown

Helena

KlamathFalls

North Bend

Midland/Odessa

Chattanooga

Gulfport/Biloxi

Huntington

New Haven

Williamsport

Jackson Montgomery

Mobile

Salisbury

Newburgh

Ft. WaltonBeach

Florence

Durango

Paducah

Brownsville

BatonRouge

Corpus Christi

Harlingen

Laredo

McAllen

Daytona

Lubbock

Amarillo

Dallas (Love)

Waco

College Station

Lafayette

Alexandria

LakeCharles

Shreveport

Beaumont/Pt. Arthur

Tyler

Monroe

Victoria

Erie

LiberalDodge City

Great BendGarden City

Hays

Prescott

Hilo

Flint

Long BeachFlagstaff

Midland/Saginaw

Parkersburg

Lynchburg

Elmira

Hyannis

Bar Harbor

Presque Isle

Nassau

Tallahassee

Treasure Cay

Cat IslandAndros Town

Nantucket

LOS ANGELES

SAN FRANCISCO

DENVER

Toronto

Honolulu

Ontario

Kahului

HarrisburgLincoln

Kona

Fargo

Casper SiouxFalls

Bismarck

IthacaBinghamton

Idaho Falls

Kalispell

Billings Duluth

Jackson

Salisbury

Muskegon

Brownsville

Corpus Christi

Harlingen

Laredo

McAllen

Eau Claire

Houghton

Minot

Pierre

Alliance

Chadron

Scottsbluff

Liberal

Kearney

Laramie

Huron

McCook

Dodge CityGreat Bend

Hays

AlamosaPuebloCortez

Farmington

TelluridePage/Lake Powell

Show LowPrescott

Moab

Worland

Sheridan

DickinsonMiles City

SidneyWilliston

Glasgow

Lewistown

Visalia

Hilo

Kapalua

Key West

GrandIsland

Vernal

North PlatteCheyenne

Riverton

LOS ANGELES

SAN FRANCISCO

DENVER

to Anchorage

Bimini

Freeport

George Town

North EleutheraGovernors Harbour

Marsh Harbour

Jamestown

Dubois

BradfordFranklin

Lewisburg

Clearfield

Sarasota/Bradenton

Plattsburgh

Melbourne

Killeen

Del Rio

Mammoth Lakes

Hobbs

Glendive

St. George

�������������

New York (Penn Station)

Boston

Newark(Liberty)

New Haven

Philadelphia

Washington, DC

Stamford

Wilmington

Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service

Traveling Salesman Problem (TSP): find the shortest paththat visits all cities.

Knapsack Problem: fill your backpack with the most valuableobjects without exceeding weight restrictions.

Sudoku: find a solution to a (large) Sudoku puzzle.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 12 / 54

Page 51: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP : Why Does It Matter?

wiki

If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then

Many security systems (suchas the Data EncryptionStandard (DES) used to sendATM/bank data) would beeasily breached.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 13 / 54

Page 52: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP : Why Does It Matter?

���������� ���

����

������

������

�������

����

����������� �����

������������

�������������

�������������

����

��� ���� ���

!���

��"������� ���

#��� ������$��� ���

%���

��$������� ���

&���

����� ���� �

����'�������� �����������

�'�����(��)(��

*'�)���

������)�'���

��)���)�'���

+�)(����,����,-

������

��.�-�),

��������''�����

�����(��

��������

�,'�����

������)�

��.�

,�����

���)��,�

������,��

��)���,��

��������

.��������

.-����(

������

������)�(��

.�����(��

��'�*�)���

��+���

���

�)� ���

��.�������

�),�����

.+

+/

�/�/

��������)��

��.�)���.��,

��+������

� � � � � � �

���,����.��

�'��)�

�)������'�����

����/

�//)/�/

�/0/

��' /��)-'���

�������������

��.*����'���1�'��)���)

�)�������.�)���'���

��2 �"�3$��3��� ����&�4�

����

0 50 100 150 Miles

0 50 100 150 200 Kilometers

��'�)���

0 100 200 300 400 Miles

0 100 200 300 400 500 600 Kilometers

��)����

Route lines reflect flights operated by United Airlines, ContinentalAirlines, Inc. and/or their regional partners. For accurate flightschedules, please see www.united.com or www.continental.com.© 2011 United Air Lines, Inc. All Rights Reserved.

Mont Tremblant

MonctonPresque Isle

Yellowknife

Sand Spit

Prince RupertTerrace

Smithers

Fort St. John

Fort McMurrayPrince George

Kamloops

KelownaNanaimo

Penticton

CastlegarCranbrook

LethbridgeMedicine Hat

Thunder Bay

Sault Ste.Marie

North Bay

Sarnia

Grande Prairie

Sudbury

TimminsRouyn-Noranda

Kingston

Baie-Comeau

Wabush

Mont-Joli

Gaspe

Charlottetown

Bathurst

Fredericton

Saint John

Sydney

Goose Bay

Deer LakeGander

Îles de la Madeliene

Windsor

Vancouver

Toronto

Edmonton

Calgary

Winnipeg

Halifax

Ottawa

Victoria

London

City

Regina

Saskatoon

Cullaton LakeEnnadai Lake

Saguenay

Bangor

Miami

Orlando

West Palm Beach

Portland

Seattle

Boise

San Jose

Las Vegas

LOS ANGELES

San Diego

SAN FRANCISCO Oakland

DENVER

Sacramento

Salt Lake City

Tucson

Phoenix/ScottsdaleAlbuquerque

Charleston

Colorado Springs

Greenville/Spartanburg

Savannah

Baltimore

Birmingham

HOUSTON(INTERCONTINENTAL)

Louisville

Memphis

Milwaukee

Philadelphia

San Antonio

St. Louis

Tampa/St. Petersburg

Charlotte

CLEVELAND

Dallas/Fort Worth

Detroit

Jacksonville

Kansas City

NewOrleans

New York (La Guardia) (J.F. Kennedy)

Norfolk/Virginia Beach

Omaha

Albany

Atlanta

Austin

Boston

Columbia

Columbus

NashvilleOklahoma City

Raleigh/Durham

Richmond

WASHINGTON, DC (DULLES)

Hartford/Springfield

Cincinnati

Bozeman

Orange County

Portland

Providence

NEW YORK (NEWARK)

Greensboro/High Point/Winston-Salem

Lexington

GrandRapids

Ft. Lauderdale/Hollywood

Syracuse

Buffalo/Niagara Falls

KnoxvilleTulsa

El Paso

Honolulu

Manchester

Ft. Myers

Kahului

Indianapolis

Minneapolis

Dayton

Allentown

Madison

Pittsburgh

Appleton/Fox Cities

Burlington

CedarRapids/Iowa City

Wausau

DesMoines

Ft.Wayne

Green Bay

WhitePlains

Lansing

Moline

Rochester

SouthBend/Elkhart/Mishawaka

Springfield

Spokane

Wichita

Lincoln

Missoula

Rapid City

Reno/Tahoe

Charleston

Traverse City

Akron/Canton StateCollege

Jackson Hole

Kona

Burbank

Gunnison/CrestedButte

Hayden/SteamboatSprings

Montrose

Vail/Eagle

Fargo

Gillette

Rock Springs

Crescent City

Eureka

Aspen

Wilkes Barre/Scranton

Bakersfield

Charlottesville

Chico

Carlsbad

Cody/Yellowstone

Casper

Eugene

Fresno

SiouxFalls

GrandJunction

Medford

Pasco

Palm Springs

Santa Barbara

Roanoke

Imperial

Inyokern

Monterey

San Luis Obispo

Santa Maria

Yuma

Modesto

Springfield

Redmond

Redding

(Reagan National)

Bismarck

Peoria

Asheville

Augusta

Pensacola

Myrtle Beach

Fayetteville/Ft. Bragg

Gainesville

Hilton Head Island

Huntsville/Decatur

Jacksonville

Long Island/Islip

New Bern

Tri-Cities Regional

Wilmington

Newport News/Williamsburg

GreenvilleNorthwestArkansas

Great Falls

LittleRock

Billings

AltoonaJohnstown

Beckley

ShenandoahValley

ClarksburgMorgantown

Helena

KlamathFalls

North Bend

Midland/Odessa

Chattanooga

Gulfport/Biloxi

Huntington

New Haven

Williamsport

Jackson Montgomery

Mobile

Salisbury

Newburgh

Ft. WaltonBeach

Florence

Durango

Paducah

Brownsville

BatonRouge

Corpus Christi

Harlingen

Laredo

McAllen

Daytona

Lubbock

Amarillo

Dallas (Love)

Waco

College Station

Lafayette

Alexandria

LakeCharles

Shreveport

Beaumont/Pt. Arthur

Tyler

Monroe

Victoria

Erie

LiberalDodge City

Great BendGarden City

Hays

Prescott

Hilo

Flint

Long BeachFlagstaff

Midland/Saginaw

Parkersburg

Lynchburg

Elmira

Hyannis

Bar Harbor

Presque Isle

Nassau

Tallahassee

Treasure Cay

Cat IslandAndros Town

Nantucket

LOS ANGELES

SAN FRANCISCO

DENVER

Toronto

Honolulu

Ontario

Kahului

HarrisburgLincoln

Kona

Fargo

Casper SiouxFalls

Bismarck

IthacaBinghamton

Idaho Falls

Kalispell

Billings Duluth

Jackson

Salisbury

Muskegon

Brownsville

Corpus Christi

Harlingen

Laredo

McAllen

Eau Claire

Houghton

Minot

Pierre

Alliance

Chadron

Scottsbluff

Liberal

Kearney

Laramie

Huron

McCook

Dodge CityGreat Bend

Hays

AlamosaPuebloCortez

Farmington

TelluridePage/Lake Powell

Show LowPrescott

Moab

Worland

Sheridan

DickinsonMiles City

SidneyWilliston

Glasgow

Lewistown

Visalia

Hilo

Kapalua

Key West

GrandIsland

Vernal

North PlatteCheyenne

Riverton

LOS ANGELES

SAN FRANCISCO

DENVER

to Anchorage

Bimini

Freeport

George Town

North EleutheraGovernors Harbour

Marsh Harbour

Jamestown

Dubois

BradfordFranklin

Lewisburg

Clearfield

Sarasota/Bradenton

Plattsburgh

Melbourne

Killeen

Del Rio

Mammoth Lakes

Hobbs

Glendive

St. George

�������������

New York (Penn Station)

Boston

Newark(Liberty)

New Haven

Philadelphia

Washington, DC

Stamford

Wilmington

Train RoutesCodeshare / OnePass ServiceOnePass Eligible Service

United Airlines

If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then

Scheduling and routingquestions (such as theKnapsack question andTraveling Salesman Problem)could be done efficiently.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 14 / 54

Page 53: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP : Why Does It Matter?

wiki

If you could quickly find solutionsto NP-hard problems (i.e. P=NP),then

Some hard biologicalquestions (such as proteinfolding) would be tractable.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 15 / 54

Page 54: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP

P?= NP: Roughly, if the answer to

a problem can be checked quickly,can it be computed quickly?

Solving this, will bring

fame,

fortune, and

change how algorithms aredesigned

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54

Page 55: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP

P?= NP: Roughly, if the answer to

a problem can be checked quickly,can it be computed quickly?

Solving this, will bring

fame,

fortune, and

change how algorithms aredesigned

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54

Page 56: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP

P?= NP: Roughly, if the answer to

a problem can be checked quickly,can it be computed quickly?

Solving this, will bring

fame,

fortune, and

change how algorithms aredesigned

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54

Page 57: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

P?= NP

P?= NP: Roughly, if the answer to

a problem can be checked quickly,can it be computed quickly?

Solving this, will bring

fame,

fortune, and

change how algorithms aredesigned

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 16 / 54

Page 58: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Analyze Running Time

Give upper bounds on the worst case running time of:

1 def double(n):

d = 2*n

return d

2 def sum(n):

s = 0

for i in range(n):

s += i

return s

3 def sum2(n):

return n*(n+1)/2

4 def findMin(numList):

m = numList[0]

for i in range(1,len(numList)):

if numList[i] < m:

m = numList[i]

return m

5 (Code from text):

6 (Code from text):

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 17 / 54

Page 59: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

How much space an algorithm uses canmatter.

If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.

Can easily overwhelm the memory onyour computer.

Can measure space usage as we did fortime complexity.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54

Page 60: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

How much space an algorithm uses canmatter.

If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.

Can easily overwhelm the memory onyour computer.

Can measure space usage as we did fortime complexity.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54

Page 61: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

How much space an algorithm uses canmatter.

If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.

Can easily overwhelm the memory onyour computer.

Can measure space usage as we did fortime complexity.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54

Page 62: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

How much space an algorithm uses canmatter.

If you want to align two long sequences(say 1 million bp each). The dynamicprogramming will require a matrix with 1million × 1 million entries, requiring106 × 106 = 1012 places of storage.

Can easily overwhelm the memory onyour computer.

Can measure space usage as we did fortime complexity.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 18 / 54

Page 63: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

insertionSort sorts “in place”, and use noadditional space.

Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54

Page 64: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

insertionSort sorts “in place”, and use noadditional space.

Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54

Page 65: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analyze Space Requirements

insertionSort sorts “in place”, and use noadditional space.

Our sequence alignment underNeedleman-Wunsch used an additionalO(n2) space to store the dynamicprogramming array.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 19 / 54

Page 66: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Analyze Space Requirement

The running time of bubblesort is analyzed above. Give upper bounds on theworst case space needed for:

1 def double(n):

d = 2*n

return d

2 def sum(n):

s = 0

for i in range(n):

s += i

return s

3 def sum2(n):

return n*(n+1)/2

4 def findMin(numList):

m = numList[0]

for i in range(1,len(numList)):

if numList[i] < m:

m = numList[i]

return m

5 def findMaxDist(numList):

n = len(numList)

d = np.zeros(n,n)

for i in range(1,n):

for j in range(1,n)

d[i,j] = abs(i,j)

return np.amax(d)

6 def findMaxDist2(numList):

n = len(numList)

m = 0

for i in range(1,n):

for j in range(1,n)

if abs(i,j) > m:

m = abs(i,j)

return m

7 (Code from text):

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 20 / 54

Page 67: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Complexity: Fixed Parameter Tractability

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.

Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54

Page 68: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Complexity: Fixed Parameter Tractability

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.

Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54

Page 69: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Complexity: Fixed Parameter Tractability

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.

For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54

Page 70: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

More Complexity: Fixed Parameter Tractability

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

A

J

E

D

H

C

F

G

BA

I

J

I

H

G

F

ED

C

B

Roughly, the ability to efficiently calculate instances that are smallwith respect to some parameter is called fixed parameter tractability.Though NP-hard, some problems can be solved in time polynomial inthe size of the input size but exponential in the size of a fixedparameter.Often, the parameter, k , will be the distance between the trees.For example, the distance between the two trees can be calculated byshrinking the common regions and focusing on the differences, whichcan be bounded by k.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 21 / 54

Page 71: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap: Small Parsimony Problem

Last Week: given a tree with leaveslabeled by sequences, computed theparsimony score of the tree.

Thinking in terms of time complexity:How long does it take?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 22 / 54

Page 72: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap: Small Parsimony Problem

Last Week: given a tree with leaveslabeled by sequences, computed theparsimony score of the tree.

Thinking in terms of time complexity:How long does it take?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 22 / 54

Page 73: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 74: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.

I Output: The parsimony score of the tree(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 75: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 76: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 77: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structure

I Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 78: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 79: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 80: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Algorithm Design: Scoring Trees Under Parsimony

AMNH

How do you code this?

I Input: A tree and sequences on the leaves.I Output: The parsimony score of the tree

(with respect to the leaf labels).

What data structures do you need?

I Tree structureI Count of the number changes

Algorithm:

I First pass: Starting at the leaves, label theinternal leaves (with possible multiple labels).

I Second pass: Starting at the root, choose alabeling, then work towards the leavesminimizing the conflicts.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 23 / 54

Page 81: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 82: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 83: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 84: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.

F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 85: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 86: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: set

F Has functions for union and intersection ofsets.

F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 87: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.

F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 88: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 89: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 90: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position:

F If overlap, use that label.F If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1)

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 24 / 54

Page 91: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

First pass: Starting at the leaves, label the internal

leaves (with possible multiple labels):

I Given labels for children, compute label forthe parent:A T A T G

A A T T G → A AT AT T GI Go position by position: for-loop

F If overlap, use that label. if-statementF If no overlap, use the union.

I Useful Python container type: setF Has functions for union and intersection of

sets.F s1 = set(l1) set operations

s2 = set(l2)

print s1.intersection(s2)

print s1.union(s2)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 25 / 54

Page 92: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G → A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 93: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:

A AT AT T G → A A T T GI For all other nodes, compare to the parent:

A T A T G

AT AT G ACT G → A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 94: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:

A T A T G

AT AT G ACT G → A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 95: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G

→ A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 96: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G → A T G T G

I Go position by position:F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 97: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G → A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 98: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G → A T G T GI Go position by position:

F If overlap, use that label.F If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 26 / 54

Page 99: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Fitch’s Algorithm: Pseudocode

AMNH

Second pass: Starting at the root, choose a labeling,

then work towards the leaves minimizing the conflicts.

I At root, choose one labeling:A AT AT T G → A A T T G

I For all other nodes, compare to the parent:A T A T G

AT AT G ACT G → A T G T GI Go position by position: for-loop

F If overlap, use that label. if-statementF If no overlap, choose label from child.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 27 / 54

Page 100: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Analyze Fitch’s Algorithm

AMNH

What is the running time and space require-ments for:

First pass: Starting at the leaves, label theinternal leaves (with possible multiplelabels).

Second pass: Starting at the root, choosea labeling, then work towards the leavesminimizing the conflicts.

Print out all tree on n leaves.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 28 / 54

Page 101: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Searching for Optimal Trees

Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.

Visually: think of the trees on a 2D map andthe height above sea level is the score.

Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54

Page 102: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Searching for Optimal Trees

Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.

Visually: think of the trees on a 2D map andthe height above sea level is the score.

Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54

Page 103: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Searching for Optimal Trees

Large Parsimony Problem: Given sequencesfor leaves, find the optimal scoring tree.

Visually: think of the trees on a 2D map andthe height above sea level is the score.

Works for any optimality criteria (i.e. sameanalogy works for maximum likelihood).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 29 / 54

Page 104: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

polymaps.org

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 30 / 54

Page 105: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Sampling:

Choose 1000 random points.

Find height at each point.

Output the sampled point with largestheight.

Will you reach the highest point?

Only if very lucky or a very dense sample.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54

Page 106: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Sampling:

Choose 1000 random points.

Find height at each point.

Output the sampled point with largestheight.

Will you reach the highest point?

Only if very lucky or a very dense sample.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54

Page 107: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Sampling:

Choose 1000 random points.

Find height at each point.

Output the sampled point with largestheight.

Will you reach the highest point?

Only if very lucky or a very dense sample.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54

Page 108: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Sampling:

Choose 1000 random points.

Find height at each point.

Output the sampled point with largestheight.

Will you reach the highest point?

Only if very lucky or a very dense sample.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54

Page 109: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Sampling:

Choose 1000 random points.

Find height at each point.

Output the sampled point with largestheight.

Will you reach the highest point?

Only if very lucky or a very dense sample.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 31 / 54

Page 110: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 111: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 112: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 113: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 114: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 115: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 116: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Analogy: Find the Highest Point

Hill Climbing:

Start at the harbor.

Can see 25 meters in all directions.

Walk upwards, repeat.

Will you reach the highest point?

Maybe, but maybe not.

I Could reach small peaks, but missthe larger ones.

I Start in multiple places to see more.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 32 / 54

Page 117: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 118: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 119: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a point

I Choose the next point from its neighbors (e.g. best scoring)I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 120: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)

I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 121: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 122: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Local Search Techniques

Goal: Find the point with the optimal score

Local search techniques prevail:

I Begin with a pointI Choose the next point from its neighbors (e.g. best scoring)I Repeat

Many variations on the theme: branch-and-bound, MCMC, geneticalgorithms,...

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 33 / 54

Page 123: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54

Page 124: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54

Page 125: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54

Page 126: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 34 / 54

Page 127: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

NNI Metric

C

ED

BA

G

F →

A

E

C

D

B

GF

The NNI distance between two trees is the minimal number of movesneeded to transform one to the other (NP-hard, DasGupta et al. ‘97).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 35 / 54

Page 128: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

SPR Distance

F

G

ED

C

A B A B

F

G

ED

CA B

F

G

ED

C

SPR distance is the minimal number of moves that transforms onetree into the other.

SPR for rooted trees is NP-hard (Bordewich & Semple ‘05).

SPR for unrooted trees is NP-hard (Hickey et al. ‘08).

SAT-based heuristic (Bonet & S ‘09).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 36 / 54

Page 129: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

SPR Distance

F

G

ED

C

A B A B

F

G

ED

CA B

F

G

ED

C

SPR distance is the minimal number of moves that transforms onetree into the other.

SPR for rooted trees is NP-hard (Bordewich & Semple ‘05).

SPR for unrooted trees is NP-hard (Hickey et al. ‘08).

SAT-based heuristic (Bonet & S ‘09).

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 36 / 54

Page 130: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

TBR Distance

F

G

ED

C

A B

F

G

ED

C

A B

F

G

ED

C

A B BA

CD

EF

G

TBR distance is the minimal number of moves that transforms onetree into the other.

TBR is NP-hard and FPT. (Allen & Steel ‘01)

TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;

Bordewich, McCartin, & Semple ‘08)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54

Page 131: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

TBR Distance

F

G

ED

C

A B

F

G

ED

C

A B

F

G

ED

C

A B BA

CD

EF

G

TBR distance is the minimal number of moves that transforms onetree into the other.

TBR is NP-hard and FPT. (Allen & Steel ‘01)

TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;

Bordewich, McCartin, & Semple ‘08)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54

Page 132: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

TBR Distance

F

G

ED

C

A B

F

G

ED

C

A B

F

G

ED

C

A B BA

CD

EF

G

TBR distance is the minimal number of moves that transforms onetree into the other.

TBR is NP-hard and FPT. (Allen & Steel ‘01)

TBR has a linear time 5-approximation and a polynomial time3-approximation (Amenta, Bonet, Mahindru, & S ‘06;

Bordewich, McCartin, & Semple ‘08)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 37 / 54

Page 133: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Tree Rearrangements

C D E

A

B

G

F

CF

A

B

D

E

GE

A

C

D

G

F

BA

F

C

B

D

E

GNearest Neighbor Interchange Subtree Prune & Regraft Tree Bisection & Reconnection

(NNI) (SPR) (TBR)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 38 / 54

Page 134: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Tree Rearrangements

C D E

A

B

G

F

CF

A

B

D

E

GE

A

C

D

G

F

BA

F

C

B

D

E

GNearest Neighbor Interchange Subtree Prune & Regraft Tree Bisection & Reconnection

(NNI) (SPR) (TBR)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 38 / 54

Page 135: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Metrics Matter

NNI & SPR Isoscope

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 39 / 54

Page 136: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Metrics Matter

NNI & SPR Isoscope

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 39 / 54

Page 137: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Landscapes

Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)

A treespace with assigned scores is often called a landscape.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 40 / 54

Page 138: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

What does the landscape look like?

Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).

(from wikipedia)

If very smooth, ‘hill climbing’ will workwell.

The phylogeny problem is that of finding those trees which optimise some function of the input data.

We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.

If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54

Page 139: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

What does the landscape look like?

Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).

(from wikipedia)

If very smooth, ‘hill climbing’ will workwell.

The phylogeny problem is that of finding those trees which optimise some function of the input data.

We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.

If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54

Page 140: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

What does the landscape look like?

Each landscape depends on the number of taxa and the score of each tree(usually derived from the inputted character sequences).

(from wikipedia)

If very smooth, ‘hill climbing’ will workwell.

The phylogeny problem is that of finding those trees which optimise some function of the input data.

We may end up with several trees optimising some optimality criterion (say parsimony) and a completely different set oftrees optimising another criterion. There are hundreds of phylogenetic optimality criteria! This is a problem, but it's not theone in which I'm interested today.

If very rugged, need more sophisticatedsearches that use the underlying struc-ture of the space.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 41 / 54

Page 141: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Adjusting Search Space

SPR metric NNI metric

Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)

The same data, organized by different tree metrics.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 42 / 54

Page 142: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Attraction Basins

resalliance.org

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 43 / 54

Page 143: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Adjusting Search Space

SPR metric NNI metric

Parsimony score for compatible characters for n = 7 (Urheim, Ford, & S, submitted)

Simplest Case: for compatible character sequences (‘perfect data’):

Under SPR, there is a single attraction basin.

Under NNI, multiple attraction basins occur even for perfect data.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 44 / 54

Page 144: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Tree Rearrangements

F

G

ED

C

A B

For the tree on the left:

1 What are the NNI neighbors?

2 Give a SPR neighbor that is not a NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is NNI distance 3 but SPRdistance 1.

5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54

Page 145: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Tree Rearrangements

F

G

ED

C

A B

For the tree on the left:

1 What are the NNI neighbors?

2 Give a SPR neighbor that is not a NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is NNI distance 3 but SPRdistance 1.

5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54

Page 146: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Tree Rearrangements

F

G

ED

C

A B

For the tree on the left:

1 What are the NNI neighbors?

2 Give a SPR neighbor that is not a NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is NNI distance 3 but SPRdistance 1.

5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54

Page 147: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Tree Rearrangements

F

G

ED

C

A B

For the tree on the left:

1 What are the NNI neighbors?

2 Give a SPR neighbor that is not a NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is NNI distance 3 but SPRdistance 1.

5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54

Page 148: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Tree Rearrangements

F

G

ED

C

A B

For the tree on the left:

1 What are the NNI neighbors?

2 Give a SPR neighbor that is not a NNI neighbor.

3 Give a TBR neighbor that is not an SPR neighbor.

4 Find a tree that is NNI distance 3 but SPRdistance 1.

5 Write an algorithm for given a tree T and aspecific edge/node, the two NNI neighbors aroundthat edge.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 45 / 54

Page 149: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Weighted Trees

Philippe et al., ‘05

Branch weights are part of the model.

Indicated by length of edges in drawing.Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54

Page 150: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Weighted Trees

Philippe et al., ‘05

Branch weights are part of the model.Indicated by length of edges in drawing.

Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54

Page 151: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Weighted Trees

Philippe et al., ‘05

Branch weights are part of the model.Indicated by length of edges in drawing.Two classic trees with same underlying topology.

The metrics and search spaces above treat them as identical.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54

Page 152: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Weighted Trees

Philippe et al., ‘05

Branch weights are part of the model.Indicated by length of edges in drawing.Two classic trees with same underlying topology.The metrics and search spaces above treat them as identical.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 46 / 54

Page 153: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 47 / 54

Page 154: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Popular Tree Metrics

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGerberaGazaniaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

NicotianaCampanulaScaevolaStokesiaDimorphothecaSenecioGazaniaGerberaEchinopsFeliciaTagetesChromolaenaBlennospermaCoreopsisVernoniaCacosmiaCichoriumAchilleaCarthamnusFlaveriaPiptocarpaHelianthusTragopogonChrysanthemumEupatoriumLactucaBarnadesiaDasyphyllum

Those based on tree rearrangements:

Subtree Prune and Regraft (SPR)

Tree Bisection and Reconnection (TBR)

Nearest Neighbor Interchange (NNI)

Used for Searching for Optimal Trees, NP-hard

Those based on comparing tree vectors:

Robinson-Foulds (RF)

Rooted Triples (RT)

Quartet Distance

Billera-Holmes-Vogtmann (BHV or geodesic))

Used for comparing trees, poly time

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 47 / 54

Page 155: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Trees as Vectors

1

2 3 4

5

T1

1

2 4 3

5

T2T0

12|345

124|35

123|45K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 48 / 54

Page 156: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Trees as Vectors

1

2 3 4

5

T1

1

2 4 3

5

T2T0

12|345

124|35

123|45

1|2

34

52|1

34

53|1

24

54|1

23

55|1

23

41

2|3

45

13|2

45

14|2

35

15|2

34

23|1

45

24|1

35

25|1

34

34|1

25

35|1

24

45|1

23

T0 = (1, 2, 3, 4, 5) 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0T1 = ((1, 2), (3, (4, 5)) 1 1 1 1 1 1 0 0 0 0 0 0 0 0 1T2 = ((1, 2), (4, (3, 5)) 1 1 1 1 1 1 0 0 0 0 0 0 0 1 0

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 49 / 54

Page 157: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

BHV Distance

Billera, Holmes, Vogtmann ‘01

Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.

View each split in a tree as acoordinate in the space.

Identify edges of orthants to formspace

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54

Page 158: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

BHV Distance

Billera, Holmes, Vogtmann ‘01

Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.

View each split in a tree as acoordinate in the space.

Identify edges of orthants to formspace

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54

Page 159: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

BHV Distance

Billera, Holmes, Vogtmann ‘01

Billera, Holmes, and Vogtmann ‘01have a continuous metric space oftrees.

View each split in a tree as acoordinate in the space.

Identify edges of orthants to formspace

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 50 / 54

Page 160: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Identify Edges of Orthants

(All images from Billera, Holmes, Vogtmann ‘01)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54

Page 161: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Identify Edges of Orthants

(All images from Billera, Holmes, Vogtmann ‘01)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54

Page 162: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Identify Edges of Orthants

(All images from Billera, Holmes, Vogtmann ‘01)

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 51 / 54

Page 163: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

BHV & NNI Distances

1 5

423

1 5

432

3 5

412

123|45

12|345

23|145

13|245

1

2 3 4

5

(1,1,0,0)

(1,0,1,0)

1

32

4

5

3

2

1

4

5

(3,2,0,0)

1

2 3 4

5

(3,0,0,2)

Simplest move between orthants: shrink coordinate/edge and expand.

Corresponds to a Nearest Neighbor Interchange (NNI) move on the topology.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 52 / 54

Page 164: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

BHV & NNI Distances

1 5

423

1 5

432

3 5

412

123|45

12|345

23|145

13|245

1

2 3 4

5

(1,1,0,0)

(1,0,1,0)

1

32

4

5

3

2

1

4

5

(3,2,0,0)

1

2 3 4

5

(3,0,0,2)

Simplest move between orthants: shrink coordinate/edge and expand.

Corresponds to a Nearest Neighbor Interchange (NNI) move on the topology.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 52 / 54

Page 165: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

In Pairs: Continuous Treespace

1

2 3 4

5

T1

1

2 4 3

5

T2T0

12|345

124|35

123|45

123|45

12|345

23|145

13|245

1

2 3 4

5

(1,1,0,0)

(1,0,1,0)

1

32

4

5

3

2

1

4

5

(3,2,0,0)

1

2 3 4

5

(3,0,0,2)

1 What is the distance between T1 and T2?

2 What is the average (tree at the midpoint) ofT1 and T2?

3 Give three trees that have distance 1 to theorigin, T0 (star tree).

4 Lower figure: what is the distance between(3,0,0,2) and (1,1,0,0)?

5 What is a good average/consensus for thefour trees?

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 53 / 54

Page 166: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap

Efficiency (in time and space) matterswhen data gets large.

More on tree searching next lecture.

Email lab reports to [email protected].

Challenges available at rosalind.info.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54

Page 167: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap

Efficiency (in time and space) matterswhen data gets large.

More on tree searching next lecture.

Email lab reports to [email protected].

Challenges available at rosalind.info.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54

Page 168: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap

Efficiency (in time and space) matterswhen data gets large.

More on tree searching next lecture.

Email lab reports to [email protected].

Challenges available at rosalind.info.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54

Page 169: Algorithmic Approaches for Biological Data, Lecture #23 · 2020. 1. 16. · Algorithmic Approaches for Biological Data, Lecture #23 Katherine St. John City University of New York

Recap

Efficiency (in time and space) matterswhen data gets large.

More on tree searching next lecture.

Email lab reports to [email protected].

Challenges available at rosalind.info.

K. St. John (CUNY & AMNH) Algorithms #23 2 May 2016 54 / 54