Upload: holly-singleton

Post on 03-Jan-2016


Page 1: CS 312: Algorithm Design & Analysis Lecture #24: Optimality, Gene Sequence Alignment This work is licensed under a Creative Commons Attribution-Share Alike

CS 312: Algorithm Design & Analysis

Lecture #24: Optimality,

Gene Sequence Alignment

This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License.

Slides by: Eric Ringger, with contributions from Mike Jones, Eric Mercer, Sean Warnick


Announcements

Homework #15 due now

Project #5: Gene Sequence Alignment
- Kick-off: today
- Read directions now
- Whiteboard experience: due Monday
- Early: Monday after mid-term exam
- Due: Wednesday after mid-term exam

Mid-term Exam
- Start preparing your one page of notes.
- Must be prepared by you. No cutting and pasting.


Objectives

- Revisit the main ideas behind Dynamic Programming
- Define the optimality property for DP
- Develop the algorithm for gene sequence alignment (or at least begin)
- Prepare for Project #5


Dynamic Programming

The six steps:
1. Ask: am I solving an optimization problem?
2. Devise a minimal description (address) for any problem instance and sub-problem.
3. Divide problems into sub-problems: define the recurrence to specify the relationship of problems to sub-problems.
4. Check that the optimality property holds: an optimal solution to a problem is built from optimal solutions to sub-problems.
5. Store results, typically in a table, and re-use the solutions to sub-problems in the table as you build up to the overall solution.
6. Back-trace / analyze the table to extract the composition of the final solution.


Optimality Property

An optimal solution to a problem is built from optimal solutions to sub-problems.

The optimality property is a necessary condition for solving an optimization problem by DP! It allows us to store and re-use optimal results for sub-problems.


Optimality

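[Figure: a tree of sub-problems with root A and descendant nodes B through K]

$$f(\text{optimal solution}(\text{parent})) = \min \text{ (or } \max\text{)} \Big( f(\text{optimal solution}(\text{child}_1)),\; f(\text{optimal solution}(\text{child}_2)),\; \ldots,\; f(\text{optimal solution}(\text{child}_n)) \Big)$$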


Shortest Path

[Figure: road map connecting American Fork, Orem, Provo, Sundance, and Geneva; edge weights 20, 10, 12, 3, 15, 18, 10, 12]

Goal: the shortest path from AF to Provo.

Does this problem exhibit the optimality property? Pair up. Discuss.


Questions

Q. In general, do you know which sub-problem solutions to use in advance?
A. No. So a very greedy algorithm is not an option. (But Dijkstra’s is.)

Q. How does having a table of intermediate shortest-path results help find the shortest path from AF to Provo?
A. Reuse those results for intermediate destinations as you try different routes.

Q. Do you have to reconsider alternative sub-optimal solutions for the intermediate destinations?
A. No.

Thus, the optimality property holds. Therefore, the shortest path problem can be solved by DP.
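The "table of intermediate shortest-path results" can be sketched as a Bellman-Ford-style DP. This is a standard shortest-path DP swapped in for illustration, not necessarily the formulation used in class. Because the slide's map did not survive extraction, the road network below is made up, road lengths are assumed positive, and the names `shortest_path_dp` and `ROADS` are hypothetical. After round k, the table holds, for every node, the shortest distance using at most k roads, and each new row is built only from the previous row's optimal values.

```python
import math


def shortest_path_dp(edges, source, target):
    """Bellman-Ford-style DP over undirected roads with positive lengths.

    After round k, dist[v] is the length of the shortest source-to-v path
    that uses at most k roads.
    """
    nodes = {u for u, _, _ in edges} | {v for _, v, _ in edges}
    arcs = edges + [(v, u, w) for u, v, w in edges]  # roads run both ways
    dist = {v: math.inf for v in nodes}
    dist[source] = 0
    parent = {v: None for v in nodes}
    for _ in range(len(nodes) - 1):  # a shortest path needs at most |V| - 1 roads
        new_dist = dict(dist)  # next row of the table
        for u, v, w in arcs:
            # Reuse the stored optimal sub-result dist[u] instead of re-solving it.
            if dist[u] + w < new_dist[v]:
                new_dist[v] = dist[u] + w
                parent[v] = u
        dist = new_dist
    # Back-trace through the parent pointers to recover the route.
    path, node = [], target
    while node is not None:
        path.append(node)
        node = parent[node]
    return dist[target], path[::-1]


# Hypothetical road network (not the map from the slide): (from, to, length).
ROADS = [("S", "A", 4), ("S", "B", 1), ("B", "A", 2), ("A", "T", 1), ("B", "T", 6)]
print(shortest_path_dp(ROADS, "S", "T"))  # (4, ['S', 'B', 'A', 'T'])
```

The optimal route S, B, A, T contains the optimal route to A (S, B, A), which is exactly what lets each row reuse the previous one.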


Optimality in Driving

The shortest route from American Fork to Provo passes through Orem.

Assume we have found this route.

Then what can we say about the shortest route from AF to Orem?

It is the AF-to-Orem portion of the optimal route from AF to Provo.

Could it be otherwise?


A related problem

Now suppose you drive from AF to Orem as fast as you can on your way to Provo, but you are limited by the gas in your tank.


Does the Optimality Property Hold?

[Figure: AF, Orem, and Provo in a row. The labels 5/9, 10/5, and 20/1 appear on each leg, read as minutes/gallons (20/1: “takes 20 minutes using 1 gallon of gas”).]

Goal: get to Provo in as little time as possible. No refueling. Does this problem (formulation) satisfy the optimality property or not? Why?

Start with 10 gallons.
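A small brute-force check of the question above. It assumes the figure shows two legs (AF to Orem, then Orem to Provo), each offering three roads labeled minutes/gallons as 5/9, 10/5, and 20/1; that reading of the extracted labels is an assumption, and the names `fastest_trip`, `ROADS_AF_TO_OREM`, `ROADS_OREM_TO_PROVO`, and `TANK` are hypothetical. Under that reading, the fastest way to reach Orem burns 9 of the 10 gallons and forces the 20-minute road afterward, so the best overall trip is not built from the best way to reach Orem.

```python
from itertools import product

# Hypothetical encoding of the slide's figure: (minutes, gallons) per road.
ROADS_AF_TO_OREM = [(5, 9), (10, 5), (20, 1)]
ROADS_OREM_TO_PROVO = [(5, 9), (10, 5), (20, 1)]
TANK = 10  # gallons at the start; no refueling


def fastest_trip(leg1, leg2, tank):
    """Try every pair of roads and keep the fastest one that fits in the tank."""
    best = None
    for r1, r2 in product(leg1, leg2):
        minutes = r1[0] + r2[0]
        gallons = r1[1] + r2[1]
        if gallons <= tank and (best is None or minutes < best[0]):
            best = (minutes, r1, r2)
    return best


best_minutes, first_leg, second_leg = fastest_trip(ROADS_AF_TO_OREM, ROADS_OREM_TO_PROVO, TANK)
fastest_first_leg = min(ROADS_AF_TO_OREM)  # (5, 9): the fastest way to reach Orem
print(best_minutes, first_leg, second_leg)  # 20 (10, 5) (10, 5)
print(fastest_first_leg)  # (5, 9), which is NOT the first leg of the best trip
```

The sub-problem "fastest way to Orem" ignores how much gas is left, so its optimal answer cannot simply be reused; the formulation would need the remaining fuel as part of the sub-problem's address.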


Problem Solving Advice

Start by asking: which sub-problems should be solved?
- If you know how to choose in advance using local information only, then greedy might work.
- Else if sub-problems don’t overlap, then divide and conquer would be a good choice.
- Else if the optimality property holds, then DP is a good choice.
- Else the optimality property does NOT hold, so apply another strategy.

(Stay tuned for more guidance.)

Important!


Gene Sequence Alignment

x=ACGCTGA y=ACTGT


Virtually Identical Problems

Edit Distance, aka Levenshtein Distance

Sequence Alignment, e.g., Gene Sequence Alignment

Fundamentally the same thing! We’re focusing on gene sequence alignment.


Edit Distance / Sequence Alignment Problem

Given: 2 strings, x and y

Return: the smallest cost to transform string x into string y (or vice versa). Another perspective: the smallest cost of aligning x to y (or vice versa).

Contrast the 2 perspectives.


Alignment example:

x: ACGCT-C
y: A--CTGT

The ‘-’ is a “gap”.


Divide the alignment into pairs (one per column).


Each pair has a type and a cost.

Type: Match; Cost = c_match


Type: Insertion into x (= deletion from y), aka “indel”; Cost = c_indel


Type: Insertion into y (= deletion from x); Cost = c_indel


Type: Substitution of x into y (or from y into x); Cost = c_sub



Cost summary:
- Match: c_match
- Insertion into x (= deletion from y): c_indel
- Insertion into y (= deletion from x): c_indel
- Substitution of x into y (or from y into x): c_sub
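A sketch of how this cost model scores one alignment column by column. It assumes Levenshtein-style unit costs (c_match = 0, c_indel = 1, c_sub = 1) as defaults; Project #5 may prescribe different values, and the function name `alignment_cost` is hypothetical. The example alignment x: ACGCT-C over y: A--CTGT has three matches, three indels, and one substitution.

```python
GAP = "-"


def alignment_cost(top, bottom, c_match=0, c_indel=1, c_sub=1):
    """Sum the cost of each aligned pair (column) in a gapped alignment."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have equal length")
    total = 0
    for a, b in zip(top, bottom):
        if a == GAP and b == GAP:
            raise ValueError("a column cannot pair a gap with a gap")
        if a == GAP or b == GAP:
            total += c_indel  # insertion into one string = deletion from the other
        elif a == b:
            total += c_match
        else:
            total += c_sub
    return total


print(alignment_cost("ACGCT-C", "A--CTGT"))  # 4: three indels + one substitution
```

Scoring is linear in the alignment length; the hard part of the problem is choosing which alignment to score.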


How would you solve this problem?


Solution Ideas

- Enumerate all alignments and score each one
  - Pro: easy to code
  - Pro: optimal
  - Con: exponential
- Greedy: work from left to right, gobbling up matches and inserting gaps or allowing substitutions as necessary
  - Pro: easy
  - Pro: linear = fast / efficient
  - Con: not optimal
- DP
  - Pre-req: optimality property
  - Pre-req: define addressable sub-problems
  - Pre-req: determine relationship between problem and sub-problems
  - Pro: optimal
  - Con: ?
- Divide and Conquer?
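The "enumerate all and score" idea can be sketched as a plain recursion: every alignment begins with one of three columns (x[0] paired with y[0], x[0] against a gap, or a gap against y[0]), so trying all three and recursing visits every alignment. This sketch assumes the same unit costs as a default (c_match = 0, c_indel = 1, c_sub = 1), and the name `brute_force_cost` is hypothetical. It is exponential because the same pairs of suffixes are re-solved many times.

```python
def brute_force_cost(x, y, c_match=0, c_indel=1, c_sub=1):
    """Minimum alignment cost found by trying every alignment (exponential time)."""
    if not x:
        return len(y) * c_indel  # remaining bases of y all face gaps
    if not y:
        return len(x) * c_indel  # remaining bases of x all face gaps
    pair_cost = c_match if x[0] == y[0] else c_sub
    return min(
        pair_cost + brute_force_cost(x[1:], y[1:], c_match, c_indel, c_sub),  # pair x[0] with y[0]
        c_indel + brute_force_cost(x[1:], y, c_match, c_indel, c_sub),  # x[0] against a gap
        c_indel + brute_force_cost(x, y[1:], c_match, c_indel, c_sub),  # gap against y[0]
    )


print(brute_force_cost("ACGCTGA", "ACTGT"))  # 3
```

Since each call is fully determined by the pair of suffix lengths, storing results in a table indexed by those lengths is what turns this enumeration into DP.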


Designing the DP Algorithm for Gene Sequence Alignment


DP?

Define each sub-problem to be the best score for aligning the first i bases of sequence x with the first j bases of sequence y.

Does that suffice as a minimal description?

In those terms, what is our objective function? Minimize the score of aligning all of x with all of y.

Can we divide this problem into sub-problems? How many? Hint: how many sub-problems are one step away from aligning the first i bases of x with the first j bases of y?
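One way to answer the hint, assuming the standard edit-distance formulation: write E(i, j) for the lowest cost of aligning the first i bases of x with the first j bases of y, where x_i is the i-th base of x. Exactly three sub-problems are one step away, one for each possible last column (x_i paired with y_j, x_i against a gap, or a gap against y_j). The objective is then E(|x|, |y|).

```latex
E(i, j) = \min \begin{cases}
  E(i-1, j-1) + \mathrm{diff}(i, j) \\
  E(i-1, j) + c_{\mathrm{indel}} \\
  E(i, j-1) + c_{\mathrm{indel}}
\end{cases}
\qquad
\mathrm{diff}(i, j) = \begin{cases}
  c_{\mathrm{match}} & \text{if } x_i = y_j \\
  c_{\mathrm{sub}} & \text{otherwise}
\end{cases}

E(i, 0) = i \, c_{\mathrm{indel}}, \qquad E(0, j) = j \, c_{\mathrm{indel}}
```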


Example: Sub-problems

x=ACGCTGA y=ACTGT
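A minimal sketch that fills the sub-problem table for these example strings and back-traces one optimal alignment. Here E[i][j] holds the lowest cost of aligning the first i bases of x with the first j bases of y. It assumes unit costs (c_match = 0, c_indel = 1, c_sub = 1); the function name `align` is hypothetical, and the project may prescribe other costs or tie-breaking. Each cell reuses three already-computed neighbors, so filling the table takes O(|x|·|y|) time and space.

```python
def align(x, y, c_match=0, c_indel=1, c_sub=1):
    """Fill E[i][j] = min cost of aligning x[:i] with y[:j], then back-trace."""
    m, n = len(x), len(y)
    E = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        E[i][0] = i * c_indel  # x[:i] against nothing: all indels
    for j in range(1, n + 1):
        E[0][j] = j * c_indel  # y[:j] against nothing: all indels

    def diff(i, j):
        return c_match if x[i - 1] == y[j - 1] else c_sub

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = min(E[i - 1][j - 1] + diff(i, j),  # pair x_i with y_j
                          E[i - 1][j] + c_indel,  # x_i against a gap
                          E[i][j - 1] + c_indel)  # gap against y_j

    # Back-trace from E[m][n], re-deriving which neighbor produced each cell.
    top, bottom, i, j = [], [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and E[i][j] == E[i - 1][j - 1] + diff(i, j):
            top.append(x[i - 1]); bottom.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and E[i][j] == E[i - 1][j] + c_indel:
            top.append(x[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(y[j - 1]); j -= 1
    return E[m][n], "".join(reversed(top)), "".join(reversed(bottom))


print(align("ACGCTGA", "ACTGT"))  # (3, 'ACGCTGA', 'A--CTGT')
```

Ties in the back-trace are broken in favor of the diagonal, then a gap in y; a different tie-breaking rule can return a different alignment with the same cost.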



To be continued in Lecture #25


Assignment

HW #16

Read Section 6.3, if you haven’t done so already.

Thursday: Screencast & Quiz