Seaform Slides in VLDB 2010 PhD Workshop

Page 1: Seaform Slides in VLDB 2010 PhD Workshop

Search-As-You-Type in Forms: Leveraging the Usability and the Functionality of Search Paradigm in Relational Databases

Hao Wu
Supervised by Prof. Lizhu Zhou

Database Research Group, Tsinghua University

VLDB PhD Workshop – Sept. 13, Singapore

Page 2: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Problem Statement

Challenges

Initial Achievements

Conclusions

Page 4: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

04/13/2023 4Hao Wu, DB Group, Tsinghua University

• Relational databases are widely used.
• There are many search paradigms:
  ▪ Structured Query Language (SQL)
  ▪ Keyword Search (KS)
  ▪ Query-By-Example (QBE)
• Different search paradigms are needed by different users.

Page 5: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#1: SQL is complex.

SELECT *
FROM Author A, Author_Paper AP, Paper P
WHERE title LIKE '%keyword%'
  AND title LIKE '%search%'
  AND authors LIKE 'g%'
  AND A.id = AP.aid
  AND P.id = AP.pid

Page 6: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#2: Traditional keyword search is imprecise.

Query: keyword search g

Title? Conf. name? Author name?

Page 7: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#3: Forms are awkward.

UCI Directory: http://directory.uci.edu/index.php?form_type=advanced_search

Page 8: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

#4: The "Search" button is not convenient.

Page 9: Seaform Slides in VLDB 2010 PhD Workshop

Motivation

Seaform = Keyword Search + Form-Style Interface + Search-as-you-type

Page 12: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Data:
  ▪ Single relational table.
  ▪ Several searchable attributes.

ID   Title          Conf.    Author
1    xml database   VLDB     albert
2    xml database   SIGMOD   bob
3    xml search     VLDB     albert
4    xml security   VLDB     alice
5    rdbms          SIGMOD   charlie

Page 13: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Query:
  ▪ A set of keywords (prefixes) split by fields.
  ▪ A focus indicator.

Title: xml
Author: al

Focus = Author

Page 14: Seaform Slides in VLDB 2010 PhD Workshop

Problem Statement

• Results:
  ▪ Global results: corresponding tuples.
  ▪ Local results: corresponding attribute values.
  ▪ Aggregations.

Title: xml
Author: al

Local results for Author, with aggregated counts: albert 2, alice 1

Global results: xml database (albert), xml search (albert), xml security (alice)
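The query and result definitions above can be made concrete with a small sketch over the sample table. This is only an illustration, not the Seaform implementation: it scans the table linearly, and it assumes that a keyword matches a field when it is a prefix of some word in that field's value. The names `PAPERS`, `matches`, and `search` are hypothetical.

```python
from collections import Counter

# Sample table from the Problem Statement slide.
PAPERS = [
    {"id": 1, "title": "xml database", "conf": "VLDB", "author": "albert"},
    {"id": 2, "title": "xml database", "conf": "SIGMOD", "author": "bob"},
    {"id": 3, "title": "xml search", "conf": "VLDB", "author": "albert"},
    {"id": 4, "title": "xml security", "conf": "VLDB", "author": "alice"},
    {"id": 5, "title": "rdbms", "conf": "SIGMOD", "author": "charlie"},
]


def matches(row, query):
    """Return True if every keyword of every field is a prefix of a word in that field.

    Assumption: prefix matching is done per word and is case-insensitive.
    """
    for field, text in query.items():
        words = row[field].lower().split()
        for keyword in text.lower().split():
            if not any(word.startswith(keyword) for word in words):
                return False
    return True


def search(rows, query, focus):
    """Compute global results (tuples) and local results (focus values with counts)."""
    global_results = [row for row in rows if matches(row, query)]
    # Aggregation: how many matching tuples carry each value of the focus field.
    local_results = Counter(row[focus] for row in global_results)
    return global_results, local_results


tuples, counts = search(PAPERS, {"title": "xml", "author": "al"}, focus="author")
for value, count in counts.most_common():
    print(value, count)
for row in tuples:
    print(f'{row["title"]} ({row["author"]})')
```

A linear scan like this recomputes everything on each keystroke, which is the cost the index structures discussed under Challenges are meant to avoid.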

Page 17: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Search-As-You-Type

• Prefix matching:
  ▪ E.g. al → albert, alice, …
  ▪ Approach: trie structure with cache.
• Fast response:
  ▪ Synchronizing local results and global results incurs a heavy computational cost.
  ▪ Approach: on-demand synchronization and a dual-list trie.

[Figure: trie rooted at Φ, with branches for characters such as a, l, b, o, i]
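A minimal sketch of prefix matching on a trie follows. The slides do not describe how the cache works, so the cache here is an assumption: it remembers the node reached for each prefix already typed, so that the next keystroke descends from there instead of from the root. The class and method names are hypothetical, and the dual-list trie is not shown.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # character -> TrieNode
        self.is_word = False  # True if a complete value ends at this node


class Trie:
    def __init__(self, words=()):
        self.root = TrieNode()
        self._cache = {}  # assumed cache: prefix -> node reached (or None)
        for word in words:
            self.insert(word)

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
        self._cache.clear()  # cached misses may no longer be valid

    def _locate(self, prefix):
        """Return the node for `prefix`, or None if no value has this prefix."""
        if prefix in self._cache:
            return self._cache[prefix]
        # Resume from the longest previously typed prefix, if one is cached.
        start, node = 0, self.root
        for cut in range(len(prefix) - 1, 0, -1):
            if prefix[:cut] in self._cache:
                start, node = cut, self._cache[prefix[:cut]]
                break
        for ch in prefix[start:]:
            if node is None:
                break
            node = node.children.get(ch)
        self._cache[prefix] = node
        return node

    def complete(self, prefix):
        """Return all stored values that start with `prefix`, sorted."""
        node = self._locate(prefix)
        if node is None:
            return []
        found, stack = [], [(node, prefix)]
        while stack:
            current, text = stack.pop()
            if current.is_word:
                found.append(text)
            for ch, child in current.children.items():
                stack.append((child, text + ch))
        return sorted(found)


authors = Trie(["albert", "alice", "bob", "charlie"])
print(authors.complete("a"))
print(authors.complete("al"))
print(authors.complete("ali"))
```

Typing "a", then "al", then "ali" touches one new trie edge per keystroke, because each call starts from the node cached by the previous one.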

Page 18: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Error Tolerance

• Misplacing of keywords:
  ▪ E.g. input "albert" into the Title input box.
  ▪ Approach: automatic query refinement (given a query, how can we modify it to obtain more results?).
  ▪ Difficulty: large search space; relies on precise estimation and a probabilistic model.
• Fuzzy matching:
  ▪ E.g. input "albrt" instead of "albert".
  ▪ Approach: edit-distance computation on the trie structure.
  ▪ Difficulty: ranking of local results. Should local results be sorted by edit distance or by aggregation values?
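To illustrate fuzzy matching, the sketch below computes the smallest edit distance between the typed keyword and any prefix of a candidate value. It is a brute-force scan over a word list, not the trie-based computation the slide refers to, and it sidesteps the ranking question by sorting on edit distance only. The function names and the `max_errors` threshold are assumptions.

```python
def prefix_edit_distance(query, word):
    """Smallest edit distance between `query` and any prefix of `word`."""
    # prev[j] holds the edit distance between query[:i] and word[:j].
    prev = list(range(len(word) + 1))
    for i, qc in enumerate(query, start=1):
        cur = [i]
        for j, wc in enumerate(word, start=1):
            cost = 0 if qc == wc else 1
            cur.append(min(prev[j] + 1,          # delete qc
                           cur[j - 1] + 1,       # insert wc
                           prev[j - 1] + cost))  # match or substitute
        prev = cur
    # The best prefix of `word` is the cheapest cell in the last row.
    return min(prev)


def fuzzy_complete(query, words, max_errors=1):
    """Return (word, distance) pairs within `max_errors`, closest first."""
    scored = [(word, prefix_edit_distance(query, word)) for word in words]
    return sorted(
        [(w, d) for w, d in scored if d <= max_errors],
        key=lambda pair: (pair[1], pair[0]),
    )


names = ["albert", "alice", "bob", "charlie"]
print(fuzzy_complete("albrt", names))
print(fuzzy_complete("al", names))
```

Sorting by `(distance, word)` is one possible answer to the ranking question; sorting by aggregation value instead would only require a different `key`.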

Page 19: Seaform Slides in VLDB 2010 PhD Workshop

Challenges: Scalability

• Handle large-scale databases:
  ▪ There are a large number of tuples.
  ▪ Approach 1: top-k algorithm. Difficulty: precise aggregation is impossible in this case.
  ▪ Approach 2: use the RDBMS itself. Difficulty: the index structure should be redesigned for the DBMS; performance issues.
• Handle multiple tables:
  ▪ Data are normalized into several tables.
  ▪ Approach: generalize the single-table local-global computation and reduce on-the-fly joins using pre-joined tables.
  ▪ Difficulty: it is hard to determine which tables are the most necessary to pre-join; extra storage cost.
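The pre-join idea and the "use the RDBMS itself" option can be sketched together with Python's built-in sqlite3 module. This is an illustration under assumptions, not the Seaform design: the schema mirrors the Author, Author_Paper, and Paper tables from the SQL example in the Motivation section, while the materialized table `PaperFlat`, the column `name`, and the function `local_results` are hypothetical. For brevity, `LIKE 'xml%'` only matches values that begin with the prefix, and wildcard characters in user input are not escaped.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Author (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE Paper (id INTEGER PRIMARY KEY, title TEXT, conf TEXT);
CREATE TABLE Author_Paper (aid INTEGER, pid INTEGER);

INSERT INTO Author VALUES (1, 'albert'), (2, 'bob'), (3, 'alice'), (4, 'charlie');
INSERT INTO Paper VALUES
    (1, 'xml database', 'VLDB'), (2, 'xml database', 'SIGMOD'),
    (3, 'xml search', 'VLDB'), (4, 'xml security', 'VLDB'),
    (5, 'rdbms', 'SIGMOD');
INSERT INTO Author_Paper VALUES (1, 1), (2, 2), (1, 3), (3, 4), (4, 5);

-- Pre-join once, so that queries no longer join on the fly.
-- The price is the extra storage taken by the materialized copy.
CREATE TABLE PaperFlat AS
    SELECT P.id AS id, P.title AS title, P.conf AS conf, A.name AS author
    FROM Author A
    JOIN Author_Paper AP ON A.id = AP.aid
    JOIN Paper P ON P.id = AP.pid;
""")


def local_results(conn, title_prefix, author_prefix):
    """Author values with counts, computed on the pre-joined table."""
    cursor = conn.execute(
        "SELECT author, COUNT(*) FROM PaperFlat "
        "WHERE title LIKE ? AND author LIKE ? "
        "GROUP BY author ORDER BY COUNT(*) DESC, author",
        (title_prefix + "%", author_prefix + "%"),
    )
    return cursor.fetchall()


print(local_results(conn, "xml", "al"))
```

Whether a plain `LIKE` predicate can use an index depends on the DBMS and its configuration, which is one reason the slide lists performance as an open issue for this approach.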

Page 22: Seaform Slides in VLDB 2010 PhD Workshop

Initial Achievements

Seaform-DBLP

Features:
• Single table.
• Prefix matching.
• Average response time is less than 30 ms.

Limitations:
• Does not tolerate errors.
• Non-top-k, i.e. it returns all matching results.
• Memory-resident.

Page 23: Seaform Slides in VLDB 2010 PhD Workshop

Demonstrations:

• 14:00 to 15:30 (2), Sept. 14, Tuesday
• 14:00 to 15:30 (5), Sept. 15, Wednesday

Page 26: Seaform Slides in VLDB 2010 PhD Workshop

Conclusions

• Search-as-you-type in forms is a good way to balance usability and functionality.

• There are still many problems to solve:
  ▪ More effective indexes than trie + inverted lists.
  ▪ Support for error tolerance.
  ▪ Native DBMS support.
  ▪ Top-k algorithms.
  ▪ Pre-joined (materialized) tables.
  ▪ ...

Page 27: Seaform Slides in VLDB 2010 PhD Workshop

Thanks

http://tastier.cs.thu.edu.cn/seaform/

My homepage: http://dbgroup.cs.thu.edu.cn/wuhao/