Approximation Algorithms for Path-Planning and Clustering Problems on Graphs (Thesis Proposal)

Shuchi Chawla, Carnegie Mellon University

Two classes of Graph Optimization problems

Optimization problems on graphs arise in many fields

Typically NP-hard

We consider two classes of problems motivated by machine learning and AI:

Path-planning – Construct a “good” path, given a map

Clustering – Divide objects into groups based on similarity

Path-planning Problems


A Robot Navigation Problem

Task: Deliver packages to certain locations
  Faster delivery => greater happiness (“reward”)

Want a path with short length and large reward

Classic formulation – Traveling Salesman
  Find the shortest tour covering all locations

Some complicating constraints
  Limited battery power: the robot may die before finishing the task
  Packages have different deadlines for delivery
  Preference for the larger-reward packages

An alternate formulation – Orienteering
  Construct a path of length ≤ D
  Visit as many locations (collect as much reward) as possible
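To make the Orienteering formulation concrete, here is a minimal brute-force sketch. It is not the approximation algorithm discussed in this talk; it simply enumerates every ordered subset of nodes and is only usable on tiny inputs. The function names, the example points on a line, and the choice to count the start node's reward are all illustrative assumptions.

```python
from itertools import permutations


def path_length(dist, path):
    """Total length of a path given as a list of node indices."""
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def brute_force_orienteering(dist, reward, start, budget):
    """Best rooted path of length <= budget, found by exhaustive search.

    Exponential time: a reference solver for tiny instances only.
    """
    others = [v for v in range(len(dist)) if v != start]
    best_path, best_reward = [start], reward[start]
    for k in range(1, len(others) + 1):
        for order in permutations(others, k):
            path = [start, *order]
            if path_length(dist, path) <= budget:
                total = sum(reward[v] for v in path)
                if total > best_reward:
                    best_path, best_reward = path, total
    return best_path, best_reward


# Hypothetical instance: four points on a line; the robot starts at position 0.
positions = [0, 1, 2, 10]
dist = [[abs(p - q) for q in positions] for p in positions]
reward = [0, 1, 1, 1]

short_path, short_reward = brute_force_orienteering(dist, reward, start=0, budget=3)
long_path, long_reward = brute_force_orienteering(dist, reward, start=0, budget=10)
```

With a budget of 3 the far package at position 10 is out of reach, so only the two nearby packages are collected; a budget of 10 is just enough to collect all three.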


Path-planning in the real-world: Motivation

Given graph (metric) G, construct a path satisfying some constraints and optimizing some function.

Some applications: robotics, assembly analysis, manufacturing, production planning

A trade-off between time and reward:
  maximize reward with bounded length
  minimize length with reward quota
  some combination of both


A time-reward trade-off

Impose a reward quota and minimize length
  Metric TSP: collect all points
  k-Path: collect at least k reward

Budget the path length and maximize reward
  Orienteering: hard bound on path length
  Time Window: visit node v within [Rv, Dv]

Optimize a combination of reward and length
  Prize Collecting TSP: min (length + reward left)
  Discounted Reward TSP: max reward; reward decreases with time
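The combined objectives can be evaluated for a fixed path as in the sketch below. The slide only says that reward "decreases with time"; the exponential discount `gamma ** t` used here is an assumption made for illustration, as are the function names and the example instance.

```python
def arrival_times(dist, path):
    """Time at which each node of the path is reached, starting at time 0."""
    times = [0]
    for a, b in zip(path, path[1:]):
        times.append(times[-1] + dist[a][b])
    return times


def prize_collecting_cost(dist, reward, path):
    """Prize Collecting TSP objective: path length plus the reward left behind."""
    visited = set(path)
    length = arrival_times(dist, path)[-1]
    missed = sum(r for v, r in enumerate(reward) if v not in visited)
    return length + missed


def discounted_reward(dist, reward, path, gamma):
    """Reward collected when a node reached at time t is worth reward * gamma**t.

    The exponential form of the discount is an assumption, not stated on the slide.
    Assumes each node appears at most once on the path.
    """
    return sum(reward[v] * gamma ** t
               for v, t in zip(path, arrival_times(dist, path)))


# Hypothetical instance: four points on a line, path visits the three nearest.
positions = [0, 1, 2, 10]
dist = [[abs(p - q) for q in positions] for p in positions]
reward = [0, 1, 1, 1]
path = [0, 1, 2]

pc_cost = prize_collecting_cost(dist, reward, path)       # length 2 + missed 1
disc = discounted_reward(dist, reward, path, gamma=0.5)   # 0.5 + 0.25
```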


A time-reward trade-off

Impose a reward quota and minimize length
  Metric TSP: 1.5 [Christofides 76]
  k-Path: 2+ε [Chaudhuri Godfrey Rao+ 03]

Budget the path length and maximize reward
  Orienteering: 3 [Blum C Karger Meyerson Minkoff Lane 03]
  Time Window: 3 log² n [Bansal Blum C Meyerson 04]

Optimize a combination of reward and length
  Prize Collecting TSP: 2 [Goemans Williamson 95]
  Discounted Reward TSP: 6.75+ε [Blum C Karger+ 03]


Orienteering and k-Path

Orienteering: length ≤ D; maximize reward
k-Path: reward ≥ k; minimize length

Complementary problems

Series of results on k-TSP (related to k-Path)

[BRV99] [Garg99] [AK00] [CGRT03] …

best approx: (2+ε)

None for Orienteering until recently!


Why is Orienteering difficult?

First attempt – Use distance-based approximations to approximate reward

Let OPT(d) = max achievable reward with length d

A 2-approx for distance implies that ALG(d) ≥ OPT(d/2)

However, we may have OPT(d/2) << OPT(d)
  Bad trade-off between distance and reward!

[Figure: paths OPT(d) and APPROX starting from s]
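A tiny hypothetical instance shows how bad this trade-off can be. If all nodes lie on a ray starting at s, a path of length d collects exactly the reward within distance d, so OPT(d) is easy to compute. The helper name and the instance are assumptions chosen for illustration.

```python
def opt_reward_on_ray(positions, rewards, d):
    """OPT(d) when every node lies on a ray starting at s (position 0).

    Walking straight out for distance d collects everything within reach,
    and nothing farther than d can be reached at all.
    """
    return sum(r for p, r in zip(positions, rewards) if p <= d)


# Five packages, all at distance 10 from the start.
positions = [10, 10, 10, 10, 10]
rewards = [1, 1, 1, 1, 1]

full = opt_reward_on_ray(positions, rewards, 10)  # OPT(d)
half = opt_reward_on_ray(positions, rewards, 5)   # OPT(d/2)
```

Here OPT(d) collects everything while OPT(d/2) collects nothing, so a guarantee of the form ALG(d) ≥ OPT(d/2) says nothing useful about reward.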


Why is Orienteering difficult?

Second attempt – approximate subparts of the optimal path and shortcut other parts

If we stray away from the optimal path by a lot, we may not be able to cover reward that’s far away

Approximate the “extra” length taken by a path over the shortest path length: the Min-Excess Path Problem

[Figure: paths OPT and APPROX from s to t]


The Min-Excess Problem

Given graph G, start and end nodes s and t, a reward on each node v, and a target reward k, find a path P from s to t that collects reward at least k and minimizes the excess

excess(P) = ℓ(P) − d(s,t)

At optimality, this is exactly the same as the k-Path objective of minimizing ℓ(P)

However, approximation is different: Min-Excess is strictly harder than k-Path
  When d(s,t) accounts for most of ℓ(P), a path within a constant factor of the optimal length can still have excess far larger than the optimal excess

We give a (2+ε)-approximation for Min-Excess [Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03]

Our algorithm returns a path of length at most d(s,t) + (2+ε) · excess(P), where P is the optimal path
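The difference between approximating length and approximating excess can be seen on a small hypothetical instance. The sketch below computes the excess of a path and compares it with the excess that a path of twice the optimal length would have; the instance and names are assumptions for illustration.

```python
def path_length(dist, path):
    """Total length of a path given as a list of node indices."""
    return sum(dist[a][b] for a, b in zip(path, path[1:]))


def excess(dist, path):
    """Length of the path minus the shortest distance between its endpoints."""
    return path_length(dist, path) - dist[path[0]][path[-1]]


# Points on a line: s at 0, a reward node v at -1, and t at 100.
positions = [0, -1, 100]
dist = [[abs(p - q) for q in positions] for p in positions]
s, v, t = 0, 1, 2

optimal_path = [s, v, t]                      # must detour to v to collect its reward
optimal_length = path_length(dist, optimal_path)
optimal_excess = excess(dist, optimal_path)

# A 2-approximation for *length* may return a path twice as long.
# Its excess is then far more than twice the optimal excess.
worst_excess = 2 * optimal_length - dist[s][t]
excess_ratio = worst_excess / optimal_excess
```

The optimal path has length 102 and excess 2, while a path of length 204 has excess 104, which is 52 times the optimum.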


A 3-approximation to Orienteering

Suppose there is a path OPT from s to t that collects the optimal reward and has length ≤ D

Excess of a path P between u and v: excess(P) = dP(u,v) − d(u,v), where dP(u,v) is the length of P from u to v

Given a 3-approximation to min-excess:
1. Divide OPT into 3 “equal-reward” parts at nodes v1 and v2 (hypothetically), with excesses e1, e2, e3
2. Approximate the part with the smallest excess

Excess of one part ≤ (e1 + e2 + e3)/3
Can afford an excess of up to (e1 + e2 + e3)
=> 3-approximation to orienteering

[Figure: the path OPT from s to t, split at v1 and v2 into parts 1, 2, 3, and the path APPROX]

Using an r-approx for Min-excess (r ∈ Z+), we get an r-approximation for s-t Orienteering

Open: Given an r-approx for min-excess (r ∈ R+), can we get an r-approx to Orienteering?

[Blum C Karger+ 03]
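The budget argument can be written out as a short derivation. This is a sketch under assumed notation: u_0 = s, u_1 = v1, u_2 = v2, u_3 = t are the split points of OPT, e_j is the excess of part j, and i is the part with the smallest excess. APPROX is taken to go straight from s to u_{i-1}, follow a 3-approximate min-excess path to u_i that collects a third of the optimal reward, and then go straight to t.

```latex
\begin{align*}
\ell(\mathrm{OPT}) &= \sum_{j=1}^{3} \bigl( d(u_{j-1},u_j) + e_j \bigr) \le D, \\
\ell(\mathrm{APPROX}) &\le d(s,u_{i-1}) + \bigl( d(u_{i-1},u_i) + 3e_i \bigr) + d(u_i,t) \\
  &\le \sum_{j=1}^{3} d(u_{j-1},u_j) + 3\min_{j} e_j
     && \text{(triangle inequality)} \\
  &\le \sum_{j=1}^{3} d(u_{j-1},u_j) + (e_1 + e_2 + e_3)
     && \text{(minimum is at most the average)} \\
  &= \ell(\mathrm{OPT}) \le D .
\end{align*}
```

So the path stays within the length budget while collecting at least a third of the optimal reward, which is where the factor 3 comes from.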


The next step: Deadline-TSP

Every vertex v has a deadline D(v); find a path that maximizes the number of nodes v visited before D(v)

Arises in scheduling, production planning

If the last node on the path has the min deadline, use Orienteering to approximate the reward
  Don’t need to bother about deadlines of other nodes

Does OPT always have a large subpath with the above property? NO!

But there are many subpaths of OPT with the above property that together contain all the reward

[Bansal Blum C Meyerson 04]
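The Deadline-TSP objective for a fixed path can be evaluated as in the following sketch. It assumes the path starts at time 0, that travel time equals distance, and that "visited before D(v)" means arriving no later than D(v); the function name and the instance are illustrative.

```python
def deadline_reward(dist, deadline, path):
    """Number of distinct nodes on the path reached no later than their deadline."""
    time = 0
    on_time = 0
    seen = set()
    previous = path[0]
    for v in path:
        time += dist[previous][v]
        # Only the first visit matters: later visits can only be later in time.
        if v not in seen and time <= deadline[v]:
            on_time += 1
        seen.add(v)
        previous = v
    return on_time


# Hypothetical instance: four points on a line with per-node deadlines.
positions = [0, 1, 2, 10]
dist = [[abs(p - q) for q in positions] for p in positions]
deadline = [0, 5, 2, 20]

in_order = deadline_reward(dist, deadline, [0, 1, 2, 3])   # arrivals 0, 1, 2, 10
far_first = deadline_reward(dist, deadline, [0, 3, 2, 1])  # arrivals 0, 10, 18, 19
```

Visiting nodes in order of position meets every deadline, while going to the far node first misses the two tight deadlines.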


A segmentation of OPT

[Figure: a segmentation of OPT, plotted on axes Time and Deadline]


Deadline-TSP

Segment graph into many parts, approximate each using Orienteering and patch them together

How do we find such a segmentation without knowing the optimal path?

In order to avoid double-counting of reward, segments should be node-disjoint

Our result – There exists a segmentation based only on deadlines, such that the resulting solution is a (3 log n)-approximation

Open: Is there a segmentation based on other properties (e.g. distance from the root) that gives a constant approximation?


An overview of our results

Problem                            Approximation                      Reference
Min-Excess                         2+ε                                [FOCS 03]
Orienteering                       3                                  [FOCS 03]
Discounted-Reward TSP              6.75+ε                             [FOCS 03]
Deadline TSP                       3 log n                            [STOC 04]
Time-Window Problem                3 log² n                           [STOC 04]
Time-Window Problem - bicriteria   reward: log 1/ε; deadlines: 1+ε    [STOC 04]


Future Directions

Better approximations
  Can we get a constant factor for Time-Windows?
  Special metrics such as trees or planar graphs
  Hardness of approximation?

Asymmetric Path-planning
  The graph is directed; still obeys triangle inequality
  Polylog-approximations and lower bounds for distance
  Need entirely different ideas for asymmetric Orienteering; is it log-hard?

Group Path-planning
  Reward is associated with “groups” of nodes
  Visit at least one node in a group to obtain reward


Future Directions

Stochastic Path-planning
  Closer to home for Robot Navigation
  The graph is a Markov Decision Process
  Each edge is an “action” associated with a probability distribution

The goal: Give a “strategy” to accomplish a given task as fast as possible
  Best action could be history dependent
  Can we write down the best strategy in polynomial time?
  Approximate it in poly-time or even in NP?

[Figure: a Markov Decision Process whose actions have outcome probabilities such as 0.2, 0.7, 0.1 and 0.3, 0.2, 0.5]

Coming up next: Correlation Clustering


Natural Language Processing

In order to understand the article automatically, need to figure out which entities are one and the same

Is “his” in the second line the same person as “The secretary” in the first line?


Real-World Clustering Problems

A wide variety of clustering problems
  Co-reference Analysis
  Web document clustering
  Co-authorship (Citeseer/DBLP)
  Computer Vision

Typical characteristics:
  No well-defined “similarity metric”
  Number of clusters is unknown
  No predefined topics: desirable to figure them out as part of the algorithm


Cohen, McCallum & Richman’s idea

[Figure: a graph on the mentions “Mr. Rumsfeld”, “his”, “he”, “Saddam Hussein” and “The secretary”, with edges marked as strong similarity or strong dissimilarity]

“Learn” a similarity measure based on context


A good clustering

Consistent clustering: + edges inside clusters, – edges between clusters

[Figure: the mentions “Mr. Rumsfeld”, “his”, “he”, “Saddam Hussein” and “The secretary” grouped into clusters; edges that violate the clustering are marked as inconsistencies or “mistakes”]

Sometimes there is no consistent clustering: some mistakes are unavoidable

Goal: Find the most consistent clustering


Correlation Clustering

Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering

NP-hard [Bansal, Blum, C, FOCS’02]

Two natural objectives:
  Maximize agreements: (# of +ve inside clusters) + (# of –ve between clusters)
  Minimize disagreements: (# of +ve between clusters) + (# of –ve inside clusters)

Equivalent at optimality, but different in terms of approximation
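Both objectives can be computed with one pass over the signed edges, as in this minimal sketch. Since every edge is either an agreement or a disagreement, the two counts always sum to the number of edges, which is why the objectives coincide at optimality. The data layout (a dict from node pairs to +1 or -1) and the names are assumptions for illustration.

```python
def agreements_and_disagreements(sign, cluster_of):
    """Count agreements and disagreements of a clustering.

    sign maps an edge (u, v) to +1 (similar) or -1 (dissimilar);
    cluster_of maps each node to its cluster label.
    """
    agree = disagree = 0
    for (u, v), s in sign.items():
        same_cluster = cluster_of[u] == cluster_of[v]
        if (s > 0) == same_cluster:
            agree += 1      # + edge inside a cluster, or - edge between clusters
        else:
            disagree += 1   # + edge between clusters, or - edge inside a cluster
    return agree, disagree


# Hypothetical signed triangle: a~b and b~c are similar, a and c are dissimilar.
sign = {("a", "b"): +1, ("b", "c"): +1, ("a", "c"): -1}

one_cluster = agreements_and_disagreements(sign, {"a": 0, "b": 0, "c": 0})
split_c = agreements_and_disagreements(sign, {"a": 0, "b": 0, "c": 1})
singletons = agreements_and_disagreements(sign, {"a": 0, "b": 1, "c": 2})
```

On this triangle no clustering reaches zero disagreements, so the best achievable is two agreements and one disagreement.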


Overview of results

Unweighted (complete) graphs
  Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03]
  Max Agree: PTAS [Bansal Blum C 02]

Weighted graphs
  Min Disagree: O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]; hardness 29/28 [CGW 03]
  Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hardness 116/115 [CGW 03]


Minimizing Disagreements [Bansal, Blum, C, FOCS’02]

Goal: approximately minimize number of “mistakes”
Assumption: The graph is unweighted and complete

A lower bound on OPT: Erroneous Triangles

Consider a triangle with two + edges and one – edge (an “Erroneous Triangle”)
  Any clustering disagrees with at least one of these edges

If there are several edge-disjoint erroneous ∆s, then any clustering makes a mistake on each one

Dopt ≥ Maximum fractional packing of erroneous triangles
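The lower bound can be illustrated with a small sketch that lists erroneous triangles and greedily picks an edge-disjoint subset of them. Greedy selection is a simplification assumed here: it yields a valid packing, and therefore a valid lower bound on the number of disagreements, but not necessarily the maximum (fractional) packing the slide refers to. The representation of the complete signed graph as a set of + edges is also an assumption.

```python
from itertools import combinations


def erroneous_triangles(nodes, positive):
    """Triangles with exactly two + edges and one - edge.

    positive is a set of frozenset({u, v}) pairs; every other pair is a - edge.
    """
    found = []
    for tri in combinations(nodes, 3):
        plus = sum(frozenset(e) in positive for e in combinations(tri, 2))
        if plus == 2:
            found.append(tri)
    return found


def greedy_edge_disjoint_packing(triangles):
    """Greedily keep triangles that share no edge with those already kept."""
    used_edges, packed = set(), []
    for tri in triangles:
        edges = {frozenset(e) for e in combinations(tri, 2)}
        if not edges & used_edges:
            used_edges |= edges
            packed.append(tri)
    return packed


# Hypothetical instance: a path 0-1-2-3 of + edges; all other pairs are - edges.
nodes = [0, 1, 2, 3]
positive = {frozenset(e) for e in [(0, 1), (1, 2), (2, 3)]}

triangles = erroneous_triangles(nodes, positive)
packing = greedy_edge_disjoint_packing(triangles)
lower_bound = len(packing)
```

The two erroneous triangles here share the edge between nodes 1 and 2, so only one fits in an edge-disjoint packing; the clustering {0, 1}, {2, 3} makes exactly one mistake and so matches the bound.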


Using the lower bound: δ-clean clusters

Relating erroneous triangles to mistakes
  In special cases, we can “charge off” disagreements to erroneous triangles

“Clean” clusters
  Each vertex has few disagreements incident on it
  “Few” is relative to the size of the cluster
  A “good” vertex has few disagreements; a “bad” vertex has many
  Clean cluster: all vertices are good

δ-clean clusters
  Each vertex in cluster C has fewer than δ|C| positive and δ|C| negative mistakes
  # of disagreements ≲ # of erroneous triangles (up to a constant factor)
  A high density of positive edges: we can easily spot them in the graph

Possible solution: Find a δ-clean clustering, and charge disagreements to erroneous triangles

Caveat: It may not exist
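A check for δ-cleanness might look like the sketch below. It reads a "positive mistake" at v as a + edge from v to a node outside the cluster and a "negative mistake" as a – edge from v to a node inside the cluster; this reading, the function name, and the example graph are assumptions rather than the definition used in the paper.

```python
from itertools import combinations


def is_delta_clean(cluster, nodes, positive, delta):
    """True if every vertex of the cluster has fewer than delta * |C| positive
    mistakes (+ edges leaving C) and fewer than delta * |C| negative mistakes
    (- edges inside C).

    positive is a set of frozenset({u, v}) pairs; every other pair is a - edge.
    """
    cluster = set(cluster)
    limit = delta * len(cluster)
    for v in cluster:
        positive_mistakes = sum(
            1 for u in nodes
            if u not in cluster and frozenset((u, v)) in positive
        )
        negative_mistakes = sum(
            1 for u in cluster
            if u != v and frozenset((u, v)) not in positive
        )
        if positive_mistakes >= limit or negative_mistakes >= limit:
            return False
    return True


# Hypothetical graph: nodes 0..3 are all pairwise similar, and node 3 is also
# similar to the outsider 4.
nodes = [0, 1, 2, 3, 4]
positive = {frozenset(e) for e in combinations([0, 1, 2, 3], 2)}
positive.add(frozenset((3, 4)))

loose = is_delta_clean([0, 1, 2, 3], nodes, positive, delta=0.5)
strict = is_delta_clean([0, 1, 2, 3], nodes, positive, delta=0.25)
```

With δ = 1/2 the single + edge leaving the cluster is tolerated; with δ = 1/4 the limit drops to one mistake per vertex, so node 3 makes the cluster fail the test.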



Caveat: A δ-clean clustering may not exist

An almost-δ-clean clustering: All clusters are either δ-clean or contain a single node
  An almost-δ-clean clustering always exists, trivially (for example, all singletons)

We show: there is an almost-δ-clean clustering, OPT(δ), that is almost as good as OPT
  Nice structure helps us find it easily


OPT(δ) – clean or singleton

OPT(δ): All clusters are δ-clean or singleton

Few new mistakes

[Figure: an imaginary procedure turns the Optimal Clustering into OPT(δ); the “bad” vertices are highlighted]


Finding clean clusters

[Figure: the clusterings OPT(δ) and ALG, with their clean clusters highlighted]

Charging off mistakes
1. Mistakes among clean clusters: charge to erroneous ∆s
2. Mistakes among singletons: no more than the corresponding mistakes in OPT(δ)


A summary of results

Unweighted (complete) graphs
  Min Disagree: 17433 [Bansal Blum C 02]; 4 [Charikar Guruswami Wirth 03]; APX-hard [CGW 03]
  Max Agree: PTAS [Bansal Blum C 02]

Weighted graphs
  Min Disagree: O(log n) [CGW 03] [Immorlica Demaine 03] [Emanuel Fiat 03]; hardness 29/28 [CGW 03]
  Max Agree: 1.3048 [CGW 03]; 1.3044 [Swamy 04]; hardness 116/115 [CGW 03]


Future Directions

Better combinatorial approximation
  The current best algorithms have a large running time: they employ an LP with O(n²) variables

Improving the lower bound
  Erroneous cycles: one negative edge and the remaining edges positive
  The gap of this lower bound is between 2 and 4
  Can we obtain a 2-approximation?

A good “iterative” approximation
  On few changes to the graph, quickly recompute a good clustering


Future Directions

Clustering with small clusters
  Given that all clusters in OPT have size at most k, find a good approximation
  Is this NP-hard?
  Different from finding the best clustering with small clusters, without a guarantee on OPT

Clustering with few clusters
  Given that OPT has at most k clusters, find an approximation

Maximizing Correlation
  Maximize (number of agreements – number of disagreements)
  Can we get a constant factor approximation?


Timeline

Plan to finish in a year

Summer 04: Stochastic/Time-dependent path-planning; Clustering with constraints

Fall 04: Asymmetric/group path-planning; Combinatorial/streaming algo for clustering

Spring 05: Wrap-up; writing; job search!

Questions?
