
Page 1

Ontology Based Extraction of RDF Data from the World Wide Web

Tim Chartrand
A Thesis Proposal

Research Supported By NSF

Page 2

Introduction

World Wide Web
  Has a huge amount of existing information
  Designed primarily for human consumption

Semantic Web
  Is an extension of the WWW
  Gives information a well-defined meaning
  Allows automation of tasks

DEG contribution
  Extract data from the WWW

Proposed solution
  Extract Semantic Web data from the WWW
  Superimpose extracted data

Page 3

Overview of Proposed Research

[Figure: system diagram. Initial view: Extraction Ontology, Extraction Engine, HTML Page, Relational Data. Full view adds: DAML Ontology, User, RDF Data, RDF Browser.]

Page 4

RDF – What is it?

Resource Description Framework

Language of the Semantic Web
Set of <subject> <predicate> <object> triples
  <mailto:[email protected]> <genealogy#age> "25"
  <mailto:[email protected]> <genealogy#fatherOf> <mailto:[email protected]>
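The triple model can be made concrete with a small sketch. The Python below is illustrative only: the `Triple` type, the `objects` helper, and the two `example.org` addresses are hypothetical stand-ins (the addresses on the slide are not recoverable), while the predicate names are taken from the slide.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Hypothetical resources; the slide's real addresses are not recoverable.
PARENT = "mailto:parent@example.org"
CHILD = "mailto:child@example.org"

triples = [
    Triple(PARENT, "genealogy#age", "25"),         # object is a literal
    Triple(PARENT, "genealogy#fatherOf", CHILD),   # object is another resource
]


def objects(graph, subject, predicate):
    """Return every object stated for the given subject and predicate."""
    return [t.obj for t in graph
            if t.subject == subject and t.predicate == predicate]


print(objects(triples, PARENT, "genealogy#fatherOf"))
```

Because an object can itself be the subject of other triples, a set of such statements forms a directed, labeled graph.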

[Figure: RDF graph of the two triples. Node labels: mailto:[email protected] (twice) and the literal 25. Arc labels: genealogy:age, genealogy:fatherOf.]

Page 5

RDFS & DAML

Core Concepts

Classes
  daml:Class – defines a class
  rdfs:subClassOf – specifies the generalization of a class

Properties
  daml:Property – defines a binary relation, has a value
  rdfs:domain – specifies the class to which a property applies
  rdfs:range – specifies the possible values of a property
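One way to see what rdfs:domain and rdfs:range mean in practice is the standard RDFS reading, under which they license type inferences: using a property implies that its subject belongs to the domain class and its object to the range class. The sketch below assumes that reading. The `infer_types` helper, the dictionary-based schema, and the blank-node label `_:size1` are hypothetical; the `software:` names and the resource URI come from later slides. Literal ranges are left out for brevity.

```python
RDF_TYPE = "rdf:type"


def infer_types(triples, domains, ranges):
    """Return the rdf:type triples implied by rdfs:domain and rdfs:range."""
    inferred = set()
    for subject, predicate, obj in triples:
        if predicate in domains:
            # The subject of the property is an instance of its domain class.
            inferred.add((subject, RDF_TYPE, domains[predicate]))
        if predicate in ranges:
            # The object of the property is an instance of its range class.
            inferred.add((obj, RDF_TYPE, ranges[predicate]))
    return inferred


# Schema fragment modeled on the ProgSize property of the example ontology.
domains = {"software:ProgSize": "software:Program"}
ranges = {"software:ProgSize": "software:Size"}

program = "http://www.downloads.com/Program1001"
data = [(program, "software:ProgSize", "_:size1")]  # _:size1 is hypothetical

for triple in sorted(infer_types(data, domains, ranges)):
    print(triple)
```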


Page 6

Example Ontology

<daml:Class rdf:ID="Program">
  <rdfs:label>Program</rdfs:label>
</daml:Class>
<daml:Class rdf:ID="Size">
  <rdfs:label>Size</rdfs:label>
</daml:Class>
. . .
<daml:Property rdf:ID="Name">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="&rdfs;Literal"/>
  <rdf:type rdf:resource="&daml;UniqueProperty"/>
  <rdf:type rdf:resource="&daml;UnambiguousProperty"/>
</daml:Property>
<daml:ObjectProperty rdf:ID="ProgSize">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="#Size"/>
</daml:ObjectProperty>
. . .


Page 7

DAML to OSM

Classes
  Non-lexical object sets

Properties
  Binary relationship sets between object sets

Literal properties
  Binary relationship sets between non-lexical and lexical object sets

Cardinality restrictions
  Participation constraints
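The correspondences listed on this slide can be sketched as a small conversion routine. This is not the proposal's algorithm, only a minimal illustration under stated assumptions: the `DamlProperty` and `OsmModel` types and the `daml_to_osm` function are hypothetical, a lexical object set is assumed to take the name of its literal property, and daml:UniqueProperty is assumed to map to a `0:1` participation constraint (with `0:*` otherwise).

```python
from dataclasses import dataclass, field

LITERAL = "rdfs:Literal"


@dataclass
class DamlProperty:
    name: str
    domain: str
    range: str
    unique: bool = False  # True if typed as daml:UniqueProperty


@dataclass
class OsmModel:
    nonlexical: set = field(default_factory=set)
    lexical: set = field(default_factory=set)
    # Each relationship set is (from object set, to object set, constraint).
    relationships: list = field(default_factory=list)


def daml_to_osm(classes, properties):
    """Map DAML classes and properties onto OSM object and relationship sets."""
    model = OsmModel(nonlexical=set(classes))
    for prop in properties:
        if prop.range == LITERAL:
            # Literal property: relate the class to a lexical object set
            # (assumed here to be named after the property).
            model.lexical.add(prop.name)
            target = prop.name
        else:
            # Ordinary property: relate two non-lexical object sets.
            target = prop.range
        constraint = "0:1" if prop.unique else "0:*"  # assumed notation
        model.relationships.append((prop.domain, target, constraint))
    return model


# The Program/Size fragment of the example ontology.
model = daml_to_osm(
    ["Program", "Size"],
    [
        DamlProperty("Name", "Program", LITERAL, unique=True),
        DamlProperty("ProgSize", "Program", "Size"),
    ],
)
print(model)
```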


Page 8

DAML to OSM

<daml:Class rdf:ID="Program">
  <rdfs:label>Program</rdfs:label>
</daml:Class>
<daml:Class rdf:ID="Size">
  <rdfs:label>Size</rdfs:label>
</daml:Class>
. . .
<daml:Property rdf:ID="Name">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="&rdfs;Literal"/>
  <rdf:type rdf:resource="&daml;UniqueProperty"/>
  <rdf:type rdf:resource="&daml;UnambiguousProperty"/>
</daml:Property>
<daml:ObjectProperty rdf:ID="ProgSize">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="#Size"/>
</daml:ObjectProperty>
. . .


Page 9

Data Frames

Lexical object sets need data frames
Use a data-frame library
Match lexical object sets with data frames
  Compare names
    Stemming
    Levenshtein edit distance
    Soundex
    Longest common subsequence
  Choose the most similar data frame
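The name-matching step can be sketched with two of the listed measures, Levenshtein edit distance and longest common subsequence. How the proposal weights or combines the measures is not stated, so the equal-weight average in `similarity` is an assumption, as are the function names and the sample library of data-frame names. Stemming and Soundex are omitted to keep the sketch short.

```python
def levenshtein(a, b):
    """Minimum number of single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lcs_length(a, b):
    """Length of the longest common subsequence of a and b."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        curr = [0]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(a, b):
    """Average of two normalized scores in [0, 1]; the weighting is assumed."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    edit_score = 1 - levenshtein(a, b) / longest
    lcs_score = lcs_length(a, b) / longest
    return (edit_score + lcs_score) / 2


def best_data_frame(name, library):
    """Pick the data-frame name most similar to a lexical object set name."""
    return max(library, key=lambda frame: similarity(name, frame))


# Hypothetical data-frame library and object set name.
print(best_data_frame("FileSize", ["Name", "Size", "Date"]))
```

A real matcher would probably also apply a minimum similarity threshold, so that an object set with no good candidate is left for the user to map by hand.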


Page 10

User Modification

Cardinality Constraints
  Provide a graphical ontology editor
  Allow the user to edit participation constraints
  Prevent the user from modifying the ontology structure

Data Frames
  Allow the user to edit the mapping
  Provide a data frame editor
  Allow the user to edit or add data frames


Page 11

Extracting the Data

Page 12

Convert to RDF

[Figure: RDF graph for the resource http://www.downloads.com/Program1001. Class labels: software:Program, software:Size. Literal values: Stick Death, 1.0, Windows 3.x/95/98/Me/NT/2000/X, 2.66, MB. Arc labels: rdf:type, software:name, software:version, software:OperatingSystem, software:ProgSize, software:SizeVal, software:SizeUnit.]
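As a rough illustration of the conversion step, the sketch below turns one extracted record into N-Triples statements using the property names from this slide. Several details are assumptions rather than facts from the proposal: the `software:` namespace URI is a placeholder, the size is modeled as a blank node, the pairing of each value with a property is a reading of the figure labels, and the `record_to_ntriples` function and record keys are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOFTWARE = "http://example.org/software#"  # hypothetical namespace URI


def uri(value):
    """Format a URI reference in N-Triples syntax."""
    return f"<{value}>"


def literal(value):
    """Format a plain literal, escaping backslashes and double quotes."""
    escaped = str(value).replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def record_to_ntriples(subject, record, size_node="_:size1"):
    """Return N-Triples lines describing one extracted program record."""
    s = uri(subject)
    statements = [
        (s, uri(RDF_TYPE), uri(SOFTWARE + "Program")),
        (s, uri(SOFTWARE + "name"), literal(record["name"])),
        (s, uri(SOFTWARE + "version"), literal(record["version"])),
        (s, uri(SOFTWARE + "OperatingSystem"),
         literal(record["operating_system"])),
        # The size is a structured value, modeled here as a blank node.
        (s, uri(SOFTWARE + "ProgSize"), size_node),
        (size_node, uri(RDF_TYPE), uri(SOFTWARE + "Size")),
        (size_node, uri(SOFTWARE + "SizeVal"), literal(record["size_val"])),
        (size_node, uri(SOFTWARE + "SizeUnit"), literal(record["size_unit"])),
    ]
    return [f"{subj} {pred} {obj} ." for subj, pred, obj in statements]


# Values as they appear in the figure labels (the OS string is kept as shown).
record = {
    "name": "Stick Death",
    "version": "1.0",
    "operating_system": "Windows 3.x/95/98/Me/NT/2000/X",
    "size_val": "2.66",
    "size_unit": "MB",
}

lines = record_to_ntriples("http://www.downloads.com/Program1001", record)
print("\n".join(lines))
```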

Page 13

Superimposed Data

Page 14

Contributions

Advancement of Semantic Web

Application of Information Extraction to building Semantic Web

Semantic Web data as superimposed information

Algorithm for ontology conversion