Zhen Zhang, Seung-won Hwang, Kevin C. Chang, Min Wang, Christian A. Lang, Yuan-chi Chang

Presented at the ACM SIGMOD Conference (SIGMOD 2006), Chicago, June 2006

Presented by: Pavan Kumar M.K. (1000618890), Aditya Mangipudi (1000649172)


• Introduction
• Motivation
• A* Search Algorithm
• A*-Driven State Space Construction
• Optimization Driven Configuration
• OPT* Search Algorithm
• Experiments
• Conclusion

The wide spread of databases for managing structured data, compounded by the expanded reach of the Internet, has brought interesting new data retrieval and analysis scenarios to RDBMSs.

Only the top-k results are of interest to the user.

QUERY: Select the top-5 2nd-year students in CSE with the highest GPA

Ranking query (quantifying function) O: GPA, i.e., top 5 ranked by GPA

+

Boolean query (qualifying constraint) B: dept = CSE and year = 2

Find top answers
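As a baseline for what such a query asks for, the following Python sketch answers it the naive way: filter by the Boolean condition B, then rank by O and keep the first k. The `Student` record, the sample rows, and the function name `top_k_students` are hypothetical and serve only as illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Student:
    name: str
    dept: str
    year: int
    gpa: float


def top_k_students(students, k):
    """Filter by B (dept = CSE and year = 2), then rank by O (GPA) and keep k."""
    qualifying = [s for s in students if s.dept == "CSE" and s.year == 2]
    return sorted(qualifying, key=lambda s: s.gpa, reverse=True)[:k]


# Made-up rows for illustration only.
STUDENTS = [
    Student("Ann", "CSE", 2, 3.9),
    Student("Bob", "CSE", 2, 3.4),
    Student("Cho", "EE", 2, 4.0),
    Student("Dee", "CSE", 3, 3.8),
    Student("Eli", "CSE", 2, 3.7),
]

print([s.name for s in top_k_students(STUDENTS, 2)])
```

This evaluates the two conditions one after the other, which is simple but touches every row; it is only a reference point for the combined evaluation discussed in the rest of the deck.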

Query Q = (G, k)

• G: goal function, G = B · O
• k: retrieval size

Ranking query + Boolean query: how to answer?

• If evaluated as separate operators: D → Boolean query B → … → Ranking query R. Current techniques optimize only condition-by-condition.

• If searched by an overall goal function G as a ranking function: D → goal function G (combining R and B).

[Figure: value space over attributes Att 1 and Att 2.]

The Threshold Algorithm (TA) relies on the rigid assumption that the goal function G is monotonic.

Monotonicity requires that G not increase when all of its parameters decrease.

Consider the example query below, which finds houses in a certain price range with a good price/sqrft ratio.

The function G here is non-monotonic.

Select h.address from House h

Where h.price ≤ 200k or h.price ≥ 400k

Order by h.size/|h.price-300k|
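The non-monotonicity can be checked numerically. The following Python sketch evaluates only the ranking expression size/|price-300k| from the query, with prices given in thousands; the function name `rank` and the sample values are illustrative assumptions.

```python
def rank(price_k: float, size: float) -> float:
    """Ranking expression of the example query: size / |price - 300k|.

    price_k is the price in thousands; price_k == 300 is excluded here
    because the query's Boolean condition never admits it.
    """
    return size / abs(price_k - 300)


# Hold the size fixed and raise the price step by step.
prices = [100, 200, 400, 500]
scores = [rank(p, 2000) for p in prices]

rising = scores[0] < scores[1]   # score grows as the price approaches 300k
falling = scores[2] > scores[3]  # and shrinks as the price moves away again

print(scores, rising, falling)
```

Because the score both rises and falls as the price increases, no single sorted order on price can serve as the access order that a monotonic method would need.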

[Figure: value space over attributes Att 1 and Att 2.]

Existing algorithms build upon problem-specific assumptions about the goal functions or index traversals.

For example, the Threshold Algorithm assumes the monotonicity of G and the use of sorted accesses (interleaf navigation), on which its search is implicitly hardwired.

In a Boolean query such as B = price > 100K, such a search is straightforward because the constraint expression B explicitly suggests how to carry out a focused search, e.g., visiting only the nodes whose locality potentially satisfies B.

In contrast, for a general k-constrained optimization query, potentially involving arbitrary ranking combined with Boolean conditions and joins over multiple relations (e.g., a query Q maximizing the size/price ratio), it is no longer clear how to focus the search.

By encoding the query into a generic search that makes no assumptions about G, the search is generalized to support arbitrary G over potentially multiple indices, combining both hierarchical and interleaf traversals.

A* is a well-known search algorithm that finds the shortest path from an initial state to a designated goal state.

• Widely used in the field of Artificial Intelligence.
• Uses best-first search traversal.
• Uses heuristic information to carry out the search in a guided manner.
• Given an admissible heuristic, A* is guaranteed to find the correct answer (correctness) while visiting the least number of states (optimality).

Examples: GPS, Google Maps, and many puzzles and games.
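For readers who have not seen A* in code, here is a minimal Python sketch on a small grid. The grid, the 4-connected moves, and the Manhattan-distance heuristic are assumptions chosen for illustration; they are not taken from the slides.

```python
import heapq


def a_star(grid, start, goal):
    """Return the length of a shortest 4-connected path, or None if unreachable.

    grid[r][c] == 0 means the cell is free, 1 means blocked.
    """
    def h(cell):
        # Manhattan distance never overestimates on a 4-connected grid,
        # so it is an admissible heuristic.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    frontier = [(h(start), 0, start)]  # entries are (f = g + h, g, cell)
    best_g = {start: 0}
    while frontier:
        _, g, cell = heapq.heappop(frontier)
        if cell == goal:
            return g
        if g > best_g[cell]:
            continue  # stale queue entry, a cheaper route was found already
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < len(grid) and 0 <= nc < len(grid[0])
            if inside and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    heapq.heappush(frontier, (ng + h((nr, nc)), ng, (nr, nc)))
    return None


GRID = [
    [0, 0, 0, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 1, 0],
]

print(a_star(GRID, (0, 0), (3, 0)))
```

The priority queue always expands the state with the smallest estimated total cost, which is the best-first behavior the slide refers to.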

For a tuple t with m attribute values, the goal function G(t) maps the tuple to a positive numeric score.

G(t) = B(t) * R(t), that is:

G(t) = R(t) if B(t) is true

G(t) = 0 if B(t) is false (i.e., the lowest score)

Select h.address from House h

Where h.price ≤ 200k or h.price ≥ 400k

Order by h.size/|h.price-300k|

No. | Addr | Price | Size | Score
1 | Oak park, Chicago | 600K | 4500 | 15
2 | Mattis, Champaign | 350K | 2000 | 0
3 | … | 150K | 1000 | 6.67
4 | … | 250K | 2000 | 0
5 | … | 300K | 3500 | 0
6 | … | 80K | 500 | 2.27
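The scores in the table can be reproduced with a few lines of Python. This is a sketch that assumes prices are expressed in thousands; the names `HOUSES`, `boolean_part`, and `goal` are made up for the example.

```python
# (house number, price in thousands, size), copied from the table
HOUSES = [
    (1, 600, 4500), (2, 350, 2000), (3, 150, 1000),
    (4, 250, 2000), (5, 300, 3500), (6, 80, 500),
]


def boolean_part(price_k):
    """B(t): price <= 200k or price >= 400k."""
    return price_k <= 200 or price_k >= 400


def goal(price_k, size):
    """G(t) = R(t) if B(t) holds, else 0."""
    # B is tested first, so price = 300k never reaches the division.
    if not boolean_part(price_k):
        return 0.0
    return size / abs(price_k - 300)


scores = {hid: round(goal(p, s), 2) for hid, p, s in HOUSES}
top2 = sorted(scores, key=scores.get, reverse=True)[:2]

print(scores)
print(top2)
```

Houses 2, 4, and 5 fail the Boolean condition and therefore receive the lowest score, 0.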


To realize k-constrained optimization over databases, this paper develops the OPT* framework.

Objective: to optimize G, using indices as access methods over the tuples in D.

Discrete state search: from the view of using indices, we search for the maximizing tuples, treating index nodes as “discrete states”.

Continuous function optimization: from the view of maximizing goal functions, we optimize G.

[Figure: OPT* optimizes G over D by combining function optimization of G with discrete state search over D.]

[Figure: indices and the value space.]

States: States in a search graph represent “localities” of values at different granularities, from coarse to fine, eventually reaching tuples in the database.

• Region state
• Tuple state

Transitions: While the states give “locations” in the map, transitions capture the possible paths to follow to reach our destination, the query answers.

Example: for two states u and v, there is a transition (u, v) if v ∈ Next(u).

[Figure: the six house tuples plotted by price (k) and size, together with two hierarchical indices over these attributes (nodes a1, a2, a3, a6, a7 and b1, b2, b3, b6, b7), each node covering a value range such as 0-250 or 3000-4500.]

[Figure: the same tuples and indices, with joint states Mij = (ai, bj): M11; M22, M32, M23, M33; M55, M56, M66, M67, M75, M76, M77; and tuples 1, 5, 4, 2 beneath them.]

[Figure: the same joint states Mij = (ai, bj), which conceptually form a combined space.]

Challenge 1: What is the search mechanism?

• A* gives the shortest path to a testable goal.
• Our goal is to find the optimal tuple states, those with maximal G-score.

K-constrained optimization: find a tuple with maximal score

A* shortest path: find a path with minimal distance

How to encode a tuple as a path?
◦ Add a virtual target t* that is reachable only through tuples

How to encode the maximal tuple as the minimal path?
◦ The quality of a path depends solely on the tuple it passes by
◦ For a tuple state t: D(t, t*) = -G(t)
◦ For two states r, u: D(r, u) = 0

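[Figure: state graph from M11 through the Mij states to tuples 1, 5, 4, 2 and the virtual target t*; transitions between states are labeled 0, and the transitions into t* are labeled with distances such as -G(1) and -G(4).]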
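The encoding can be tried out on a toy graph. In the Python sketch below, the node names (`root`, `regionA`, `regionB`, `t1`, `t3`, `t6`) and the graph shape are hypothetical; only the distance rule (0 between states, -G(t) from a tuple to t*) follows the slide. Since the toy graph is acyclic, a plain enumeration of paths is enough to find the shortest one.

```python
# G-scores of three tuples (values taken from the house example).
G_SCORE = {"t1": 15.0, "t3": 6.67, "t6": 2.27}

# Hypothetical state graph: every transition between states costs 0.
GRAPH = {
    "root": [("regionA", 0.0), ("regionB", 0.0)],
    "regionA": [("t3", 0.0), ("t6", 0.0)],
    "regionB": [("t1", 0.0)],
}
# The virtual target t* is reachable only through tuples, at distance -G(t).
for tuple_state, score in G_SCORE.items():
    GRAPH[tuple_state] = [("t*", -score)]


def shortest_path(graph, source, target):
    """Enumerate all paths of an acyclic graph and keep the shortest one."""
    best_dist, best_path = float("inf"), None
    stack = [(source, 0.0, [source])]
    while stack:
        node, dist, path = stack.pop()
        if node == target:
            if dist < best_dist:
                best_dist, best_path = dist, path
            continue
        for nxt, weight in graph.get(node, []):
            stack.append((nxt, dist + weight, path + [nxt]))
    return best_dist, best_path


dist, path = shortest_path(GRAPH, "root", "t*")
print(dist, path)
```

The shortest path to t* passes through the tuple with the largest G-score, and its length is the negated score, so minimizing distance and maximizing G coincide.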

Challenge 2: How to guide the search?

Function optimization measures the quality of states. Function optimization aspects:

• Defines proper heuristics
• Identifies a set of initial states to start the search

Input: G(x_1, …, x_m) and the domain of values dom = {x_i ∈ [x_i^1, x_i^2]}

Output: <O, U> = OPT(G, dom), where O = {local optima} and U = {upper-bound score}

• OPTPOINT gives the O component of OPT
• OPTMAX gives the U component of OPT

Approaches:

• Analytical method
• Search-based (e.g., hill climbing)
• Template-based

The figure illustrates that different states hold different promise.

The search should favor M77 over M67 because it is more promising.

[Figure legend: High, Medium, Low]

To guarantee completeness
◦ A* requires admissible heuristics, i.e., estimates that are optimistic

To ensure admissible heuristics
◦ Function optimization gives the tightest upper bound
◦ Analytical approaches
◦ Numeric analysis package

H(region) = OPTMAX(G, region), i.e., the maximal value of G in the region
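For the house query, such a bound can be derived by hand. The Python sketch below is an analytical bound written for this one goal function (size/|price-300k| with price ≤ 200k or price ≥ 400k), assuming non-negative sizes and prices in thousands; the function name `opt_max` and the region representation as two (low, high) ranges are assumptions, not the authors' interface.

```python
def opt_max(price_range, size_range):
    """Upper bound of G over a rectangular region, for the house query only.

    G is largest when the size is as large as possible and the price is the
    feasible price closest to 300k. A region with no feasible price gets 0.
    """
    p_lo, p_hi = price_range
    _, s_hi = size_range
    gaps = []  # smallest |price - 300| on each feasible side of the region
    if p_lo <= 200:   # part of the region satisfies price <= 200k
        gaps.append(300 - min(p_hi, 200))
    if p_hi >= 400:   # part of the region satisfies price >= 400k
        gaps.append(max(p_lo, 400) - 300)
    if not gaps:
        return 0.0
    return s_hi / min(gaps)


# A region that lies entirely inside the excluded price band.
print(opt_max((250, 350), (4000, 4500)))
# A region that reaches into price >= 400k and contains house 1 (600K, 4500).
print(opt_max((350, 600), (4000, 4500)))
```

The bound is optimistic: the second region scores 45 although the best tuple inside it, house 1, only reaches 15. Overestimating in this way is what admissibility requires.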

h(M67) gives U = 0. However, if we follow the link from M67 to M77, we can reach tuple 1 with score 15.

[Figure: tuples 1 to 6 in the value space, with regions M67 and M77 marked.]

To guarantee optimality
◦ A* requires descending heuristics

To ensure descending heuristics
◦ Remove uphill links

[Figure: state graph from M11 through the Mij states to tuples 1, 5, 4, 2.]

To guarantee correctness
◦ Every tuple state must be reachable from the start states
◦ Taking only downhill links requires starting from high points

To ensure reachability
◦ Initial states should contain all local optima

[Figure: state graph from M11 through the Mij states to tuples 1, 5, 4, 2.]

Search is implemented as a priority-queue-driven traversal.

[Figure: top-down traversal of the state graph from M11 through the Mij states (including M57, …) to tuples 1, 5, 4, 2.]
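A priority-queue-driven, top-down traversal of this kind can be sketched in Python as follows. This is a simplified best-first search in the spirit of the slide, not the authors' OPT* implementation: regions are bounding boxes produced by median splits of the six house tuples rather than index nodes, and `upper_bound` is a hand-derived bound for the house query. All names are illustrative.

```python
import heapq

HOUSES = {1: (600, 4500), 2: (350, 2000), 3: (150, 1000),
          4: (250, 2000), 5: (300, 3500), 6: (80, 500)}  # id -> (price k, size)


def goal(price, size):
    """G(t): size / |price - 300| if price <= 200 or price >= 400, else 0."""
    if not (price <= 200 or price >= 400):
        return 0.0
    return size / abs(price - 300)


def bbox(ids):
    """Region state: bounding box of a group of tuples."""
    prices = [HOUSES[i][0] for i in ids]
    sizes = [HOUSES[i][1] for i in ids]
    return (min(prices), max(prices)), (min(sizes), max(sizes))


def upper_bound(region):
    """Optimistic (admissible) estimate of G inside a region."""
    (p_lo, p_hi), (_, s_hi) = region
    gaps = []
    if p_lo <= 200:
        gaps.append(300 - min(p_hi, 200))
    if p_hi >= 400:
        gaps.append(max(p_lo, 400) - 300)
    return s_hi / min(gaps) if gaps else 0.0


def best_first_top_k(k):
    """Pop the most promising state first; a popped tuple state is an answer."""
    root = sorted(HOUSES)
    heap = [(-upper_bound(bbox(root)), 0, root)]  # (negated score, depth, ids)
    results = []
    while heap and len(results) < k:
        neg_score, depth, ids = heapq.heappop(heap)
        if len(ids) == 1:  # tuple state: its score is exact
            results.append((ids[0], -neg_score))
            continue
        # Region state: split at the median, alternating price and size.
        ids = sorted(ids, key=lambda i: HOUSES[i][depth % 2])
        mid = len(ids) // 2
        for child in (ids[:mid], ids[mid:]):
            if len(child) == 1:
                score = goal(*HOUSES[child[0]])
            else:
                score = upper_bound(bbox(child))
            heapq.heappush(heap, (-score, depth + 1, child))
    return results


print(best_first_top_k(3))
```

Because every region's estimate is at least as large as the score of any tuple inside it, a tuple that reaches the head of the queue cannot be beaten by anything still waiting, so the search can stop as soon as k tuples have been popped.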

Example: Given a set of states constructed from the set of index graphs I, the search should, in principle, follow the transitions to look for the tuple states that maximize the goal function. The search may follow either of these paths:

• Top-down search: M11 → M33 → M77 → 1
• Bottom-up search: M57 → M77 → 1

[Figure: state graph from M11 through the Mij states to tuples 1, 4, 2, 5.]

OPT* may incur different costs when started from different initial states.

Top-down: more hops | Bottom-up: fewer hops

Bottom-up is generally preferred, but consider the goal function G = 1/((X-Y)^2 + 1): any value satisfying X = Y maximizes the function.

Comparison vs.
◦ Boolean then ranking
◦ Ranking then Boolean

Metrics: nodes accessed = Nl + Nt

Settings:
◦ Benchmark queries over a real dataset
◦ Controlled queries over synthetic datasets

Datasets:
◦ 19,706 real estate listings crawled online

Queries:
◦ Q1: size * bedrms / |price-450k| : [40k <= price <= 50k]
◦ Q2: size * ebedrms / |price-350k| : [price < 400k ∧ size > 4000]
◦ Q3: size/price : [bedrms = 3 ∨ bedrms = 4]

[Figure: comparison of BR_unclustered, BR_clustered, and OPT* on Q1, Q2, and Q3.]

Datasets:
◦ Three randomly generated datasets of 100k points: uniform, gaussian, logvariatenormal

Queries:
◦ Linear average queries (e.g., 0.4*a + 0.6*b)
◦ Nearest neighbor queries (e.g., (x-3)^2 + (y-4)^2)
◦ Join queries (0.4*R.a + 0.6*S.b : R.c = R.d)

Problem
◦ Study K-constrained optimization queries as Boolean + ranking

Abstraction
◦ Encode K-constrained optimization into a shortest path problem

Framework
◦ Develop OPT* to process K-constrained optimization


References

• Boolean + Ranking: Querying a Database by K-Constrained Optimization. Z. Zhang, S. Hwang, K. C.-C. Chang, M. Wang, C. Lang, and Y. Chang. In Proceedings of the 2006 ACM SIGMOD Conference (SIGMOD 2006), pages 359-370, Chicago, June 2006

• www.wikipedia.org


Questions?
