Approximation Algorithms for Path-Planning and Clustering Problems on Graphs
(Thesis Proposal)
Shuchi Chawla, Carnegie Mellon University
Two classes of Graph Optimization problems
Optimization problems on graphs arise in many fields
Typically NP-hard
We consider two classes of problems motivated by machine learning and AI:
Path-planning – Construct a “good” path, given a map
Clustering – Divide objects into groups based on similarity
Path-planning Problems
A Robot Navigation Problem
Task: Deliver packages to certain locations
  Faster delivery => greater happiness (“reward”)
Want a path with short length and large reward
Classic formulation – Traveling Salesman
  Find the shortest tour covering all locations
Some complicating constraints
  Limited battery power – robot may die before finishing the task
  Packages have different deadlines for delivery
  Preference to the larger-reward packages
An alternate formulation – Orienteering
  Construct a path of length ≤ D
  Visit as many locations (reward) as possible
Path-planning in the real world: Motivation
Given graph (metric) G, construct a path satisfying some constraints and optimizing some function.
Some applications: robotics, assembly analysis, manufacturing, production planning
A trade-off between time and reward
  maximize reward with bounded length
  minimize length with reward quota
  some combination of both
A time-reward trade-off
Impose a reward quota and minimize length
  Metric TSP: collect all points
  k-Path: collect at least k reward
Budget the path length and maximize reward
  Orienteering: hard bound on path length
  Time Window: visit node v within [R_v, D_v]
Optimize a combination of reward and length
  Prize-Collecting TSP: min (length + reward left)
  Discounted-Reward TSP: max reward; reward decreases with time
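To make the two combined objectives concrete, here is a minimal Python sketch that evaluates them for a given path. The instance, the function names, and the exponential discount factor `gamma` are assumptions for illustration; the slide only says that reward decreases with time, and the sketch scores a path rather than a closed tour.

```python
import math

def arrival_times(points, order):
    """Cumulative travel time at each node of the path (unit speed)."""
    times = [0.0]
    for a, b in zip(order, order[1:]):
        times.append(times[-1] + math.dist(points[a], points[b]))
    return times

def prize_collecting_cost(points, reward, order):
    """Path length plus the reward of every node left unvisited."""
    visited = set(order)
    length = arrival_times(points, order)[-1]
    missed = sum(r for v, r in enumerate(reward) if v not in visited)
    return length + missed

def discounted_reward(points, reward, order, gamma):
    """Reward collected at time t is scaled by gamma**t (assumed 0 < gamma < 1)."""
    times = arrival_times(points, order)
    return sum(reward[v] * gamma ** t for v, t in zip(order, times))

# Hypothetical instance: start at node 0, three unit-reward nodes on a line.
points = [(0, 0), (1, 0), (2, 0), (-3, 0)]
reward = [0, 1, 1, 1]
order = [0, 1, 2]
print(prize_collecting_cost(points, reward, order))   # length 2 plus missed reward 1
print(discounted_reward(points, reward, order, 0.5))  # 0.5**1 + 0.5**2
```

Skipping the far node costs one unit of reward in the prize-collecting objective, while in the discounted objective a later visit to the same node is simply worth less.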
A time-reward trade-off
Impose a reward quota and minimize length
  Metric TSP: 1.5 [Christofides 76]
  k-Path: 2+ε [Chaudhuri Godfrey Rao+ 03]
Budget the path length and maximize reward
  Orienteering: 3 [Blum C Karger Meyerson Minkoff Lane 03]
  Time Window: 3 log² n [Bansal Blum C Meyerson 04]
Optimize a combination of reward and length
  Prize-Collecting TSP: 2 [Goemans Williamson 95]
  Discounted-Reward TSP: 6.75+ε [Blum C Karger+ 03]
Orienteering and k-Path
Orienteering: length ≤ D; maximize reward
k-Path: reward ≥ k; minimize length
Complementary problems
Series of results on k-TSP (related to k-Path)
[BRV99] [Garg99] [AK00] [CGRT03] …
best approx: (2+ε)
None for Orienteering until recently!
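The two complementary problems can be stated as brute-force searches over all simple paths from the start. The sketch below is only a reference definition on a tiny, hypothetical instance (exponential time, Euclidean points standing in for the metric); it is not any of the cited approximation algorithms.

```python
from itertools import permutations
import math

def path_length(points, order):
    return sum(math.dist(points[a], points[b]) for a, b in zip(order, order[1:]))

def paths_from(n, start):
    """Every simple path that begins at `start` (exponentially many)."""
    others = [v for v in range(n) if v != start]
    for size in range(len(others) + 1):
        for perm in permutations(others, size):
            yield (start,) + perm

def collected(reward, order):
    return sum(reward[v] for v in order)

def orienteering(points, reward, start, D):
    """A maximum-reward path among paths of length at most D."""
    feasible = (p for p in paths_from(len(points), start)
                if path_length(points, p) <= D + 1e-9)
    return max(feasible, key=lambda p: collected(reward, p))

def k_path(points, reward, start, k):
    """A minimum-length path collecting reward at least k (None if impossible)."""
    feasible = [p for p in paths_from(len(points), start)
                if collected(reward, p) >= k]
    return min(feasible, key=lambda p: path_length(points, p), default=None)

# Hypothetical instance: start at node 0, three unit-reward nodes on a line.
points = [(0, 0), (1, 0), (2, 0), (-3, 0)]
reward = [0, 1, 1, 1]
best = orienteering(points, reward, start=0, D=2)
short = k_path(points, reward, start=0, k=3)
print(collected(reward, best), path_length(points, short))
```

Both functions search the same set of paths; only the roles of constraint and objective are swapped, which is the sense in which the problems are complementary.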
Why is Orienteering difficult?
First attempt – Use distance-based approximations to approximate reward
Let OPT(d) = max achievable reward with length d
A 2-approx for distance implies that ALG(d) ≥ OPT(d/2)
However, we may have OPT(d/2) << OPT(d)
Bad trade-off between distance and reward!
[Figure: from start s, the OPT(d) path and the APPROX path]
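A tiny instance shows how large the gap between OPT(d/2) and OPT(d) can be. The sketch assumes a hypothetical layout in which the start is at position 0 on a line and every reward lies to its right, so the best path with budget d simply walks right for distance d.

```python
def opt_reward_on_ray(positions, rewards, d):
    """Best reward with length budget d when the start is at 0 and every
    reward sits at a non-negative position on a line: walk right for d."""
    return sum(r for x, r in zip(positions, rewards) if x <= d)

# Hypothetical instance: all reward is clustered far from the start.
positions = [9.0, 9.5, 10.0]
rewards = [1, 1, 1]
d = 10.0
print(opt_reward_on_ray(positions, rewards, d))      # OPT(d)
print(opt_reward_on_ray(positions, rewards, d / 2))  # OPT(d/2)
```

With half the budget the path never reaches the cluster, so a guarantee of the form ALG(d) ≥ OPT(d/2) says nothing about the reward collected here.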
Why is Orienteering difficult?
Second attempt – approximate subparts of the optimal path and shortcut other parts
If we stray away from the optimal path by a lot, we may not be able to cover reward that’s far away
Approximate the “extra” length taken by a path over the shortest path length
[Figure: the OPT and APPROX paths from s to t]
Min-Excess Path Problem
The Min-Excess Problem
Given graph G, start and end nodes s, t, a reward on each node v, and a target reward k, find a path P that collects reward at least k and minimizes the excess: excess(P) = ℓ(P) – d(s,t)
At optimality, this is exactly the same as the k-path objective of minimizing ℓ(P)
However, approximation is different: Min-Excess is strictly harder than k-Path
We give a (2+ε)-approximation for Min-Excess [Blum, C, Karger, Meyerson, Minkoff, Lane, FOCS’03]
Our algorithm returns a path with length at most d(s,t) + (2+ε)·excess(P), where P is the optimal path
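The difference between approximating length and approximating excess shows up in simple arithmetic. The numbers below are hypothetical: the shortest s-t distance is 100 and the optimal path is only slightly longer.

```python
def excess(path_len, shortest_st):
    """Extra length of an s-t path over the shortest s-t distance."""
    return path_len - shortest_st

def min_excess_length_bound(shortest_st, opt_excess, eps):
    """Length guarantee d(s,t) + (2 + eps) * excess of the optimal path."""
    return shortest_st + (2 + eps) * opt_excess

d_st = 100.0       # hypothetical shortest s-t distance
opt_len = 101.0    # hypothetical optimal path, excess 1
cand_len = 150.0   # a candidate path that is good in length only

length_ratio = cand_len / opt_len
excess_ratio = excess(cand_len, d_st) / excess(opt_len, d_st)
print(length_ratio, excess_ratio)
print(min_excess_length_bound(d_st, excess(opt_len, d_st), eps=0.1))
```

The candidate is within a factor of 1.5 of the optimal length yet has 50 times the optimal excess, whereas the min-excess guarantee keeps the path length close to 102 on these numbers.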
A 3-approximation to Orienteering
There exists a path from s to t that collects reward at least Π, the optimal reward, and has length ≤ D
Given a 3-approximation to min-excess:
1. Divide into 3 “equal-reward” parts (hypothetically)
2. Approximate the part with the smallest excess
=> 3-approximation to orienteering
Excess of path P between u and v: excess_P(u,v) = d_P(u,v) – d(u,v)
Excess of one of the parts ≤ (excess_1 + excess_2 + excess_3)/3
Can afford an excess up to (excess_1 + excess_2 + excess_3)
Using an r-approx for Min-Excess (r ∈ Z+), we get an r-approximation for s-t Orienteering
[Figure: the s-t path OPT split at v1 and v2 into parts 1, 2, 3; APPROX follows one part and shortcuts the others]
Open: Given an r-approx for min-excess (r ∈ R+), can we get an r-approx to Orienteering?
[Blum C Karger + 03]
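The averaging step behind this argument can be checked numerically. In the sketch below, `shortest[i]` and `excesses[i]` are hypothetical values for the shortest distance between the endpoints of part i of OPT and for OPT's excess on that part; the function names are illustrative only.

```python
def opt_length(shortest, excesses):
    """Length of OPT: shortest distances between cut points plus all excess."""
    return sum(shortest) + sum(excesses)

def shortcut_length_bound(shortest, excesses, r):
    """Upper bound on the length of the path that takes shortest paths on
    every part except the one with the smallest excess, where an
    r-approximate min-excess path is used instead."""
    return sum(shortest) + r * min(excesses)

# Hypothetical split of OPT into three equal-reward parts.
shortest = [4.0, 5.0, 3.0]
excesses = [2.0, 1.0, 3.0]
D = opt_length(shortest, excesses)
bound = shortcut_length_bound(shortest, excesses, r=3)
print(D, bound)
```

Because the smallest of r excesses is at most their average, r times it never exceeds the total excess, so the shortcut path stays within the budget D while keeping the reward of one full part, which is at least a 1/r fraction of OPT's reward.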
The next step: Deadline-TSP
Every vertex v has a deadline D(v); find a path that maximizes the number of nodes v visited before D(v)
Arises in scheduling, production planning
If the last node on the path has the min deadline, use Orienteering to approximate the reward
Don’t need to bother about deadlines of other nodes
Does OPT always have a large subpath with the above property? NO!
There are many subpaths of OPT with the above property that together contain all the reward
[Bansal Blum C Meyerson 04]
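For reference, the Deadline-TSP objective can be evaluated directly and solved by brute force on a tiny instance. This is only a definition-level sketch on hypothetical data, not the segmentation-based algorithm; it assumes unit speed and counts the start node if its deadline allows.

```python
from itertools import permutations
import math

def on_time_count(points, deadline, order):
    """Number of nodes on the path reached no later than their deadline."""
    t, count, prev = 0.0, 0, order[0]
    for v in order:
        t += math.dist(points[prev], points[v])
        if t <= deadline[v]:
            count += 1
        prev = v
    return count

def best_deadline_path(points, deadline, start):
    """Exhaustive search. Full orderings suffice because appending skipped
    nodes to the end of a path never lowers its on-time count."""
    others = [v for v in range(len(points)) if v != start]
    candidates = ((start,) + p for p in permutations(others))
    return max(candidates, key=lambda p: on_time_count(points, deadline, p))

# Hypothetical instance on a line; node 3 and node 2 have conflicting deadlines.
points = [(0, 0), (1, 0), (2, 0), (-1, 0)]
deadline = [0, 5, 2, 1]
best = best_deadline_path(points, deadline, start=0)
print(best, on_time_count(points, deadline, best))
```

No path reaches both node 2 and node 3 on time here, so the best path meets three of the four deadlines.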
A segmentation of OPT
[Figure: a segmentation of OPT; axes: time and deadline]
Deadline-TSP
Segment graph into many parts, approximate each using Orienteering and patch them together
How do we find such a segmentation without knowing the optimal path?
In order to avoid double-counting of reward, segments should be node-disjoint
Our result – There exists a segmentation based only on deadlines, such that the resulting solution is a (3 log n)-approximation
Open: Is there a segmentation based on other properties (e.g., distance from the root) that gives a constant approximation?
An overview of our results
Problem                            Approximation                        Reference
Orienteering                       3                                    [STOC 04]
Min-Excess                         2+ε                                  [FOCS 03]
Discounted-Reward TSP              6.75+ε                               [FOCS 03]
Deadline TSP                       3 log n                              [STOC 04]
Time-Window Problem                3 log² n                             [STOC 04]
Time-Window Problem (bicriteria)   reward: log 1/ε; deadlines: 1+ε      [STOC 04]
Future Directions
Better approximations
  can we get a constant factor for Time-Windows?
  special metrics such as trees or planar graphs
  hardness of approximation?
Asymmetric Path-planning
  the graph is directed; still obeys triangle inequality
  polylog-approximations and lower bounds for distance
  need entirely different ideas for asymmetric Orienteering
  is it log-hard?
Group Path-planning
  Reward is associated with “groups” of nodes
  visit at least one node in a group to obtain reward
Future Directions
Stochastic Path-planning
  Closer to home for Robot Navigation
  The graph is a Markov Decision Process
  Each edge is an “action” associated with a probability distribution
The goal: Give a “strategy” to accomplish a given task as fast as possible
  Best action could be history dependent
  Can we write down the best strategy in polynomial time?
  Approximate it in poly-time or even in NP?
[Figure: example graph whose actions carry transition probabilities 0.2, 0.7, 0.1 and 0.3, 0.2, 0.5]

Coming up next: Correlation Clustering
Natural Language Processing
In order to understand the article automatically, need to figure out which entities are one and the same
Is “his” in the second line the same person as “The secretary” in the first line?
Real-World Clustering Problems
A wide variety of clustering problems
  Co-reference analysis
  Web document clustering
  Co-authorship (Citeseer/DBLP)
  Computer vision
Typical characteristics
  No well-defined “similarity metric”
  Number of clusters is unknown
  No predefined topics – desirable to figure them out as part of the algorithm
Cohen, McCallum & Richman’s idea
[Figure: graph over the mentions “Mr. Rumsfeld”, “his”, “he”, “Saddam Hussein”, “The secretary”; edges mark strong similarity or strong dissimilarity]
“Learn” a similarity measure based on context
A good clustering
Consistent clustering: + edges inside clusters, – edges between clusters
[Figure: the mention graph (“Mr. Rumsfeld”, “his”, “he”, “Saddam Hussein”, “The secretary”) grouped into clusters; edges mark strong similarity or strong dissimilarity]

A good clustering
Consistent clustering: + edges inside clusters, – edges between clusters
[Figure: the same clustered mention graph, with the inconsistent edges highlighted]
Inconsistencies or “mistakes”

A good clustering
Goal: Find the most consistent clustering
[Figure: a mention graph in which every clustering makes some mistakes]
Mistakes
No consistent clustering!
Correlation Clustering
Given a graph with positive (similar) and negative (dissimilar) edges, find the most consistent clustering
NP-hard [Bansal, Blum, C, FOCS’02]
Two natural objectives
  Maximize agreements: (# of +ve inside clusters) + (# of –ve between clusters)
  Minimize disagreements: (# of +ve between clusters) + (# of –ve inside clusters)
Equivalent at optimality, but different in terms of approximation
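The two objectives can be computed side by side for any clustering. The signed graph below is a hypothetical stand-in for the co-reference example (the edge signs are invented for illustration), and `score` is an illustrative helper name.

```python
def score(edges, cluster_of):
    """edges: {(u, v): +1 or -1}; cluster_of: {node: cluster label}.
    Returns (agreements, disagreements) of the clustering."""
    agree = disagree = 0
    for (u, v), sign in edges.items():
        same = cluster_of[u] == cluster_of[v]
        if (sign > 0) == same:
            agree += 1      # + edge inside a cluster, or - edge between clusters
        else:
            disagree += 1   # + edge between clusters, or - edge inside a cluster
    return agree, disagree

# Hypothetical edge signs over the five mentions.
edges = {
    ("Mr. Rumsfeld", "The secretary"): +1,
    ("Mr. Rumsfeld", "his"): +1,
    ("The secretary", "his"): +1,
    ("he", "Saddam Hussein"): +1,
    ("his", "he"): +1,
    ("Mr. Rumsfeld", "Saddam Hussein"): -1,
    ("The secretary", "Saddam Hussein"): -1,
}
cluster_of = {
    "Mr. Rumsfeld": 0, "The secretary": 0, "his": 0,
    "he": 1, "Saddam Hussein": 1,
}
agree, disagree = score(edges, cluster_of)
print(agree, disagree, agree + disagree == len(edges))
```

Agreements and disagreements always sum to the number of edges, which is why the two objectives share an optimal clustering. The approximation ratios differ because a solution can be near-optimal in agreements while making many times the optimal number of disagreements.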
Overview of results
Problem                                  Result                  Reference
Min Disagree, unweighted (complete)      17433                   [Bansal Blum C 02]
                                         4                       [Charikar Guruswami Wirth 03]
                                         hardness: APX-hard      [CGW 03]
Min Disagree, weighted                   O(log n)                [Immorlica Demaine 03] [Charikar Guruswami Wirth 03] [Emanuel Fiat 03]
                                         hardness: 29/28         [CGW 03]
Max Agree, unweighted (complete)         PTAS                    [Bansal Blum C 02]
Max Agree, weighted                      1.3048                  [CGW 03]
                                         1.3044                  [Swamy 04]
                                         hardness: 116/115       [CGW 03]
Minimizing Disagreements [Bansal, Blum, C, FOCS’02]
Goal: approximately minimize number of “mistakes”
Assumption: The graph is unweighted and complete
A lower bound on OPT: Erroneous Triangles
Consider a triangle with two + edges and one – edge (an “Erroneous Triangle”)
Any clustering disagrees with at least one of these edges
If there are several edge-disjoint erroneous triangles, then any clustering makes a mistake on each one
D_opt ≥ maximum fractional packing of erroneous triangles
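The lower bound can be computed on a small complete signed graph. The sketch below enumerates erroneous triangles and packs them greedily; the greedy packing is an assumption made for brevity (it is edge-disjoint, hence a valid lower bound, but not necessarily a maximum or fractional packing), and the instance is hypothetical.

```python
from itertools import combinations

def erroneous_triangles(nodes, edges):
    """Triangles with exactly two + edges and one - edge.
    edges maps (u, v) with u < v to +1 or -1 on a complete graph."""
    found = []
    for a, b, c in combinations(sorted(nodes), 3):
        tri = ((a, b), (b, c), (a, c))
        if [edges[e] for e in tri].count(-1) == 1:
            found.append(tri)
    return found

def greedy_edge_disjoint(triangles):
    """Greedy (not necessarily maximum) packing of edge-disjoint triangles."""
    used, packed = set(), []
    for tri in triangles:
        if not used.intersection(tri):
            packed.append(tri)
            used.update(tri)
    return packed

# Hypothetical complete graph on 4 nodes with a single negative edge (0, 2).
nodes = [0, 1, 2, 3]
edges = {(0, 1): +1, (0, 2): -1, (0, 3): +1,
         (1, 2): +1, (1, 3): +1, (2, 3): +1}
tris = erroneous_triangles(nodes, edges)
packed = greedy_edge_disjoint(tris)
print(len(tris), len(packed))
```

Both erroneous triangles share the negative edge, so only one can be packed, giving a lower bound of 1. Putting all four nodes in one cluster makes exactly one mistake, so the bound is tight on this instance.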
Using the lower bound: δ-clean clusters
Relating erroneous triangles to mistakes
  In special cases, we can “charge off” disagreements to erroneous triangles
“Clean” clusters
  each vertex has few disagreements incident on it
  few is relative to the size of the cluster
  # of disagreements ≲ # of erroneous triangles
[Figure: a clean cluster, in which all vertices are good; “good” and “bad” vertices are labeled]
Using the lower bound: δ-clean clusters
Relating erroneous triangles to mistakes
  In special cases, we can “charge off” disagreements to erroneous triangles
δ-clean clusters
  each vertex in cluster C has fewer than δ|C| positive and δ|C| negative mistakes
  # of disagreements ≲ # of erroneous triangles
  A high density of positive edges: we can easily spot them in the graph
Possible solution: Find a δ-clean clustering, and charge disagreements to erroneous triangles
Caveat: It may not exist
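The cleanliness condition as worded on the slide can be checked directly. This sketch assumes a complete graph, reads a positive mistake as a + edge leaving the cluster and a negative mistake as a – edge inside it, and uses a hypothetical instance; `is_delta_clean` is an illustrative name.

```python
def is_delta_clean(cluster, nodes, positive, delta):
    """True if every vertex of `cluster` has fewer than delta*|C| positive
    mistakes (+ edges leaving C) and fewer than delta*|C| negative mistakes
    (- edges inside C). `positive(u, v)` gives the sign on a complete graph."""
    inside = set(cluster)
    limit = delta * len(inside)
    for v in inside:
        neg_inside = sum(1 for u in inside if u != v and not positive(u, v))
        pos_outside = sum(1 for u in nodes if u not in inside and positive(u, v))
        if neg_inside >= limit or pos_outside >= limit:
            return False
    return True

# Hypothetical instance: 6 nodes; every pair not listed below is a - edge.
plus = {frozenset(e) for e in [(0, 2), (0, 3), (1, 2), (1, 3), (2, 3), (4, 5)]}

def positive(u, v):
    return frozenset((u, v)) in plus

nodes = range(6)
cluster = [0, 1, 2, 3]   # contains one - edge, between nodes 0 and 1
print(is_delta_clean(cluster, nodes, positive, delta=0.5))
print(is_delta_clean(cluster, nodes, positive, delta=0.25))
```

The same cluster passes for a generous δ and fails for a strict one, which matches the caveat: for a fixed δ, a δ-clean clustering need not exist.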
Using the lower bound: δ-clean clusters
Caveat: A δ-clean clustering may not exist
An almost-δ-clean clustering: all clusters are either δ-clean or contain a single node
An almost-δ-clean clustering always exists, trivially
We show: there exists an almost-δ-clean clustering, OPT(δ), that is almost as good as OPT
Nice structure helps us find it easily.
OPT(δ) – clean or singleton
[Figure: an imaginary procedure turns the optimal clustering into OPT(δ) by separating “bad” vertices, introducing few new mistakes]
OPT(δ): All clusters are δ-clean or singleton
Finding clean clusters
[Figure: the clean clusters found by ALG compared with those of OPT(δ)]
Charging off mistakes
1. Mistakes among clean clusters: charge to erroneous triangles
2. Mistakes among singletons: no more than corresponding mistakes in OPT(δ)
A summary of results
Problem                                  Result                  Reference
Min Disagree, unweighted (complete)      17433                   [Bansal Blum C 02]
                                         4                       [Charikar Guruswami Wirth 03]
                                         hardness: APX-hard      [CGW 03]
Min Disagree, weighted                   O(log n)                [Immorlica Demaine 03] [Charikar Guruswami Wirth 03] [Emanuel Fiat 03]
                                         hardness: 29/28         [CGW 03]
Max Agree, unweighted (complete)         PTAS                    [Bansal Blum C 02]
Max Agree, weighted                      1.3048                  [CGW 03]
                                         1.3044                  [Swamy 04]
                                         hardness: 116/115       [CGW 03]
Future Directions
Better combinatorial approximation
  The current best algorithms have a large running time: they employ an LP with O(n²) variables
Improving the lower bound
  Erroneous cycles – one negative edge and remaining positive
  The gap of this lower bound is between 2 and 4 [Charikar Guruswami Wirth 03]
  Can we obtain a 2-approximation?
A good “iterative” approximation
  on few changes to the graph, quickly recompute a good clustering
Future Directions
Clustering with small clusters
  Given that all clusters in OPT have size at most k, find a good approximation
  Is this NP-hard?
  Different from finding best clustering with small clusters, without guarantee on OPT
Clustering with few clusters
  Given that OPT has at most k clusters, find an approximation
Maximizing Correlation
  number of agreements – number of disagreements
  Can we get a constant factor approximation?
Timeline
Plan to finish in a year
Summer 04: Stochastic/time-dependent path-planning; clustering with constraints
Fall 04: Asymmetric/group path-planning; combinatorial/streaming algo for clustering
Spring 05: Wrap-up; writing; job search!
Questions?