Page 1: Modeling and Querying Web Data A Survey

Modeling and Querying Web Data: A Survey

By Li Lu

Page 2: Modeling and Querying Web Data A Survey

Overview

• Introduction

• Data Representation for Querying the Web

• Modeling and Querying the Web

• Summary and Future

Page 3: Modeling and Querying Web Data A Survey

Introduction

Background

• The most common techniques for searching the Web for information are based on sending information retrieval requests to index servers.

• Web query techniques are used to locate, filter, and present Web information.

Challenges

• It is difficult to build a common model for the Web.

• It is hard to extract information from Web data.

Page 4: Modeling and Querying Web Data A Survey

Data Representation for Querying the Web

Graph Data Models

• Based on a labeled graph in which the nodes represent Web pages, the edges represent links between Web pages, and the labels on the edges can be attribute names.

• Capable of expressing navigational queries over the graph structure.
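
The graph data model can be sketched in a few lines of Python. This is only an illustration: the URLs, the edge labels, and the helper name `follow` are hypothetical, and the navigational query shown simply follows a fixed sequence of edge labels from a starting page.

```python
from collections import defaultdict

# Labeled graph: nodes are page URLs, edges are links, labels are attribute names.
# All URLs and labels below are made-up illustration data.
edges = [
    ("http://a.example/index", "http://a.example/papers", "publications"),
    ("http://a.example/index", "http://a.example/people", "staff"),
    ("http://a.example/papers", "http://b.example/p1.pdf", "full_text"),
]

out_links = defaultdict(list)
for source, target, label in edges:
    out_links[source].append((label, target))


def follow(start, labels):
    """Return the pages reached from `start` by following `labels` in order."""
    frontier = {start}
    for wanted in labels:
        frontier = {
            target
            for page in frontier
            for label, target in out_links[page]
            if label == wanted
        }
    return frontier


result = follow("http://a.example/index", ["publications", "full_text"])
print(sorted(result))
```

A path with no matching edges yields an empty set, which is how this sketch reports "no such page".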

Semistructured Data Models

• Based on labeled directed graphs. There is no restriction on the number of edges that can leave a given node, or on the type of an attribute value.

• Able to query the schema or the labels on the edges of the graph.
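
A minimal sketch of a semistructured model in Python, assuming a dictionary-based encoding that is not taken from any of the surveyed systems. Nodes have a varying number of outgoing edges, leaf values have mixed types, and the query asks about the labels themselves rather than the values. The data and the function name `labels_reachable` are hypothetical.

```python
# Semistructured data as a labeled directed graph: each node maps to a list of
# (label, child) pairs. Nodes may have any number of outgoing edges, and leaf
# values may be of any type. The data below is invented for illustration.
graph = {
    "root": [("paper", "p1"), ("paper", "p2")],
    "p1": [("title", "Querying the Web"), ("author", "A"), ("author", "B")],
    "p2": [("title", "Web Models"), ("year", 1998)],
}


def labels_reachable(node, seen=None):
    """Collect every edge label reachable from `node` (a query on the schema)."""
    seen = set() if seen is None else seen
    found = set()
    if node in seen or node not in graph:
        return found
    seen.add(node)
    for label, child in graph[node]:
        found.add(label)
        # In this sketch node identifiers are strings; other values are leaves.
        if isinstance(child, str):
            found |= labels_reachable(child, seen)
    return found


print(sorted(labels_reachable("root")))
```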

Page 5: Modeling and Querying Web Data A Survey

Data Representation for Querying the Web (cont.)

A Hypertree Containing a Publications Database (WebOQL) [AM98]

Page 6: Modeling and Querying Web Data A Survey

Data Representation for Querying the Web (cont.)

Semantic Web Data Models

• The Semantic Web is a Web whose content can be annotated with metadata and processed automatically by machines.

• Semantic assertions on the Semantic Web are formulated in the Resource Description Framework (RDF) model [LS99], which can be viewed as a partially labeled directed graph.

• Semantic Web data models can exploit the semantics of Web content and can therefore provide better query results than models based only on the content and structure of Web data.
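
The view of RDF as a labeled directed graph can be illustrated with plain triples in Python. This is a sketch only: the URIs, the `dc:` property names used as plain strings, and the helper `match` are assumptions for illustration, and no RDF library is involved.

```python
# RDF statements as (subject, predicate, object) triples. Each triple is one
# labeled edge of a directed graph. The URIs and values are illustrative only.
triples = [
    ("http://example.org/doc1", "dc:creator", "Alice"),
    ("http://example.org/doc1", "dc:title", "A Web Survey"),
    ("http://example.org/doc2", "dc:creator", "Alice"),
]


def match(pattern):
    """Return triples matching `pattern`; None in a position is a wildcard."""
    return [
        triple
        for triple in triples
        if all(want is None or want == got for want, got in zip(pattern, triple))
    ]


# Semantic query: which documents were created by "Alice"?
docs_by_author = [s for s, _, _ in match((None, "dc:creator", "Alice"))]
print(docs_by_author)
```

The query selects by the meaning of the edge (`dc:creator`) rather than by words in the page text or by link structure.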

Page 7: Modeling and Querying Web Data A Survey

Data Representation for Querying the Web (cont.)

An Example RDF Graph [WWW1]

Page 8: Modeling and Querying Web Data A Survey

Modeling and Querying the Web

Query Languages for Graph Representation of Websites

• These query languages combine content-based queries and structure-based queries. They can therefore formulate regular path expression queries and express navigational queries over the graph structure.

• WebSQL [MMM97], W3QL [KS95], WebLog [LSS96]

• Example: WebSQL [MMM97]

Page 9: Modeling and Querying Web Data A Survey

Modeling and Querying the Web (cont.)

WebSQL [MMM97]

• Models the Web as a relational database with two virtual relations, Document and Anchor:

Document[url, title, text, type, length, modif]

Anchor[base, href, label]

• To map onto the graph structure of the WWW, each document in the Document relation is mapped to a node object in the graph, and each hypertext link between two documents in the Anchor relation is represented by a link object.
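
The two virtual relations and their mapping onto a graph can be sketched in Python as follows. The attribute names come from the relation schemas above; the field types, the sample tuples, and the names `nodes` and `links` are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    url: str
    title: str
    text: str
    type: str
    length: int   # type assumed; the source gives only the attribute name
    modif: str    # last-modification date, kept as a string in this sketch


@dataclass(frozen=True)
class Anchor:
    base: str     # URL of the document containing the link
    href: str     # URL the link points to
    label: str    # link text


# Invented sample tuples.
documents = [
    Document("http://x.example/a", "A", "...", "text/html", 3, "1998-01-01"),
    Document("http://x.example/b", "B", "...", "text/html", 3, "1998-01-02"),
]
anchors = [Anchor("http://x.example/a", "http://x.example/b", "next")]

# Each Document tuple becomes a node; each Anchor tuple becomes a labeled link.
nodes = {doc.url: doc for doc in documents}
links = [(a.base, a.href, a.label) for a in anchors if a.base in nodes]

print(links)
```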

Page 10: Modeling and Querying Web Data A Survey

Modeling and Querying the Web (cont.)

• Sample query [FLM98]: find a list of tuples of the form (d1, d2, label), where d1 is a document stored at the local site, d2 is a document stored somewhere else, and d1 points to d2 by a link labeled label. Suppose all the local documents are reachable from www.mysite.start.

    SELECT d.url, e.url, a.label
    FROM Document d SUCH THAT "www.mysite.start" ->* d,
         Document e SUCH THAT d => e,
         Anchor a SUCH THAT a.base = d.url
    WHERE a.href = e.url

• In the path expressions, -> denotes a local link (to a document on the same server), => denotes a global link (to a document on another server), and ->* denotes a path of zero or more local links.
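
One way to read the sample query is as two steps: collect the documents reachable from the start page through local links, then keep the links that lead from those documents to other servers. The Python sketch below follows that reading. It is not WebSQL's actual evaluation strategy; the anchor data is invented, and treating "local" as "same host name" is an assumption of this sketch.

```python
from collections import deque
from urllib.parse import urlparse

# Anchor tuples (base, href, label); all URLs are invented for illustration.
anchors = [
    ("http://www.mysite.start/", "http://www.mysite.start/a.html", "A"),
    ("http://www.mysite.start/a.html", "http://other.example/x.html", "X"),
    ("http://www.mysite.start/", "http://other.example/y.html", "Y"),
    ("http://other.example/x.html", "http://third.example/", "Z"),
]


def host(url):
    return urlparse(url).netloc


def is_local(base, href):
    """A link is treated as local when both ends are on the same host."""
    return host(base) == host(href)


def local_closure(start):
    """Documents reachable from `start` through zero or more local links."""
    seen, queue = {start}, deque([start])
    while queue:
        page = queue.popleft()
        for base, href, _ in anchors:
            if base == page and is_local(base, href) and href not in seen:
                seen.add(href)
                queue.append(href)
    return seen


local_docs = local_closure("http://www.mysite.start/")

# Keep links that start at a local document and end on another server.
rows = sorted(
    (base, href, label)
    for base, href, label in anchors
    if base in local_docs and not is_local(base, href)
)
for row in rows:
    print(row)
```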

Page 11: Modeling and Querying Web Data A Survey

Modeling and Querying the Web (cont.)

Query Languages for Semi-Structured Representation of Websites

• These languages discover the implicit structure within semistructured Web data and then recast the data to fit the discovered structure.

• WG-Log [CDPT98], ULIXES and PENELOPE [AMM97a], WebOQL [AM98]

• Example: WebOQL [AM98]

Page 12: Modeling and Querying Web Data A Survey

Modeling and Querying the Web (cont.)

WebOQL [AM98]

• Introduces the hypertree data structure. A hypertree is an ordered arc-labeled tree with two kinds of arcs: internal arcs and external arcs. Internal arcs represent structured objects, and external arcs represent hyperlinks among objects. Arcs are labeled with records.

A Hypertree Containing a Publications Database (WebOQL) [AM98]
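
A hypertree can be approximated in Python with two small classes. This is a sketch, not WebOQL's implementation: the class and field names, the sample records, and the choice that only internal arcs carry a subtree are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Arc:
    record: dict                           # arcs are labeled with records
    kind: str                              # "internal" or "external"
    subtree: Optional["Hypertree"] = None  # assumed: only internal arcs nest


@dataclass
class Hypertree:
    arcs: list = field(default_factory=list)  # order of arcs is significant


def external_urls(tree):
    """Collect, in arc order, the Url field of every external arc."""
    urls = []
    for arc in tree.arcs:
        if arc.kind == "external":
            urls.append(arc.record.get("Url"))
        elif arc.subtree is not None:
            urls.extend(external_urls(arc.subtree))
    return urls


# One paper (internal arc) with two hyperlinks (external arcs); invented data.
paper = Hypertree([
    Arc({"Title": "Paper One"}, "internal", Hypertree([
        Arc({"Label": "Abstract", "Url": "http://x.example/p1.html"}, "external"),
        Arc({"Label": "Full version", "Url": "http://x.example/p1.ps"}, "external"),
    ])),
])
print(external_urls(paper))
```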

Page 13: Modeling and Querying Web Data A Survey

Modeling and Querying the Web (cont.)

• Represents Web pages by hypertrees and a mapping function. The mapping function maps URLs to the corresponding hypertrees. The hypertrees and the mapping function are also called the schema and the browsing function of the Web, respectively.

• Sample query [FLM98]: extract the title and the URL of the full version of papers authored by "Smith" from the csPapers database.

    SELECT [y.Title, y'.Url]
    FROM x in csPapers, y in x'
    WHERE y.Authors ~ "Smith"
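
A rough Python analogue of the sample query is given below. It assumes that the prime operator (') returns the first subtree of its argument, that the dot operator reads a field from the record on the first arc, and that x and y range over the simple trees (one arc plus its subtree) of their source tree. The list-of-pairs encoding, the sample data, and the helper names are hypothetical, and the `~` pattern match is simplified to a substring test.

```python
# A hypertree is modeled here as an ordered list of (record, subtree) arcs.
# The data is invented; it mimics a publications database grouped by area.
cs_papers = [
    ({"Group": "Databases"}, [
        ({"Title": "Paper One", "Authors": "Smith, Jones"},
         [({"Label": "Full version", "Url": "http://x.example/p1.ps"}, [])]),
        ({"Title": "Paper Two", "Authors": "Lee"},
         [({"Label": "Full version", "Url": "http://x.example/p2.ps"}, [])]),
    ]),
]


def simple_trees(tree):
    """Split `tree` into its simple trees: one arc plus its subtree each."""
    return [[arc] for arc in tree]


def prime(tree):
    """Return the first subtree of `tree` (empty tree if there are no arcs)."""
    return tree[0][1] if tree else []


def peek(tree, name):
    """Read field `name` from the record on the first arc of `tree`."""
    return tree[0][0].get(name) if tree else None


# SELECT [y.Title, y'.Url] FROM x in csPapers, y in x' WHERE y.Authors ~ "Smith"
result = [
    (peek(y, "Title"), peek(prime(y), "Url"))
    for x in simple_trees(cs_papers)
    for y in simple_trees(prime(x))
    if "Smith" in (peek(y, "Authors") or "")
]
print(result)
```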

Page 14: Modeling and Querying Web Data A Survey

Modeling and Querying the Web

Query Languages for the Semantic Web

• The Semantic Web is a Web whose content can be annotated with metadata and processed automatically by machines.

• Semantic queries can exploit the semantics of Web content.

• RQL [KACPS02], SquishQL [MSR02], TRIPLE [SBAHKW02].

Page 15: Modeling and Querying Web Data A Survey

Summary and Future

Summary

• Web data models fall into three main categories: graph data models, semistructured data models, and Semantic Web data models.

• Web query languages built on these data models are likewise classified into three primary groups.

Future

• Techniques for manipulating dynamic pages would benefit Web query applications and are a promising direction for future research.

• Combining query results from different resources on the Web, especially results from both structured and unstructured data sources, also poses challenges for future research.

Page 16: Modeling and Querying Web Data A Survey

References

[AM98] G. Arocena, A. Mendelzon, “WebOQL: Restructuring Documents, Databases, and Webs”, Proc. ICDE'98, Orlando, Florida, Feb. 1998.

[CDPT98] S. Comai, E. Damiani, R. Posenato, L. Tanca, “A Schema-based Approach to Modeling and Querying WWW Data”, Proc. of FQAS'98, Roskilde, May 1998, LNAI 1495.

[AMM97a] P. Atzeni, G. Mecca, P. Merialdo, “To Weave the Web”, International Conference on Very Large Data Bases (VLDB'97), Athens, Greece, August 26-29, 1997, pages 206-215.

[FLM98] D. Florescu, A. Levy, A. Mendelzon, “Database Techniques for the World-Wide Web: A Survey”, SIGMOD Record 27, 3 (1998), 59-74.

[KACPS02] G. Karvounarakis, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, “RQL: A Declarative Query Language for RDF”, WWW2002, May 2002, Honolulu, Hawaii.

[KS95] D. Konopnicki, O. Shmueli, “W3QS: A query system for the World Wide Web”, Proc. of the Int. Conf. on Very Large Data Bases (VLDB), pages 54-65, Zurich, Switzerland, 1995.

[LSS96] L. V. S. Lakshmanan, F. Sadri, L. N. Subramanian, “A declarative language for querying and restructuring the Web”, Proc. of the Sixth International Workshop on Research Issues in Data Engineering, RIDE’96, New Orleans, February 1996.

[MM97] A. O. Mendelzon, T. Milo, “Formal Models of Web Queries”, Proceedings of the Sixteenth ACM Symposium on Principles of Database Systems, 134-143, 1997.

[MMM97] A. Mendelzon, G. Mihaila, T. Milo, “Querying the world wide web”, International Journal on Digital Libraries, 1(1):54-67, 1997.

Page 17: Modeling and Querying Web Data A Survey

[MSR02] L. Miller, A. Seaborne, A. Reggiori, “Three Implementations of SquishQL, a Simple RDF Query Language”, Proceedings of the 1st International Semantic Web Conference, ISWC2002, Sardinia, Italy, June 9-12, 2002.

[SBAHKW02] A. Sheth, C. Bertram, D. Avant, B. Hammond, K. Kochut, Y. Warke, “Semantic Content Management for Enterprises and the Web”, IEEE Internet Computing, July/August 2002, pp. 80-87.

[WWW1] http://www.amk.ca/talks/semweb-intro, “Introduction to the Semantic Web and RDF”