1
Ontology Based Extraction of RDF Data from the World Wide Web
Tim Chartrand
A Thesis Proposal
Research Supported By NSF
2
Introduction
World Wide Web
  Has a huge amount of existing information
  Designed primarily for human consumption
Semantic Web
  An extension of the WWW
  Gives information a well-defined meaning
  Allows automation of tasks
DEG contribution: extract data from the WWW
Proposed solution
  Extract Semantic Web data from the WWW
  Superimpose the extracted data
3
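Overview of Proposed Research
[Figure: current approach: an Extraction Ontology and an HTML Page feed the Extraction Engine, which produces Relational Data. Proposed approach: a DAML Ontology is converted into an Extraction Ontology that the User can modify; the Extraction Engine applies it to an HTML Page to produce Relational Data, which is converted to RDF Data and viewed in an RDF Browser.]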
4
RDF – What is it?
Resource Description Framework
Language of the Semantic Web
Set of <subject> <predicate> <object> triples, for example:
  <mailto:[email protected]> <genealogy#age> "25"
  <mailto:[email protected]> <genealogy#fatherOf> <mailto:[email protected]>
[Figure: the triples drawn as a graph, with a mailto: resource node linked by genealogy:age to the literal 25 and by genealogy:fatherOf arcs to other mailto: resource nodes.]
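As a concrete illustration of the triple model, the Python sketch below represents triples as plain tuples and writes them in N-Triples syntax. It is not part of the proposed system. The mailto: addresses and the http://example.org/genealogy# namespace are hypothetical placeholders for the addresses and the relative genealogy# prefix on the slide.

```python
from typing import List, NamedTuple, Union


class URI(NamedTuple):
    value: str


class Literal(NamedTuple):
    value: str


Term = Union[URI, Literal]


class Triple(NamedTuple):
    subject: URI
    predicate: URI
    obj: Term


def term_to_nt(term: Term) -> str:
    """Render one term in N-Triples syntax: <uri> or "literal"."""
    if isinstance(term, URI):
        return f"<{term.value}>"
    # Escape backslashes first, then double quotes and newlines.
    escaped = (term.value.replace("\\", "\\\\")
               .replace('"', '\\"')
               .replace("\n", "\\n"))
    return f'"{escaped}"'


def to_ntriples(triples: List[Triple]) -> str:
    """One line per triple, each terminated by ' .'."""
    return "\n".join(
        f"{term_to_nt(t.subject)} {term_to_nt(t.predicate)} {term_to_nt(t.obj)} ."
        for t in triples
    )


# Hypothetical identifiers standing in for the slide's example.
GEN = "http://example.org/genealogy#"
parent = URI("mailto:parent@example.org")
child = URI("mailto:child@example.org")
triples = [
    Triple(parent, URI(GEN + "age"), Literal("25")),
    Triple(parent, URI(GEN + "fatherOf"), child),
]
print(to_ntriples(triples))
```

A literal such as "25" can only appear in the object position, which is why the object field accepts either kind of term while subject and predicate are always URIs.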
5
RDFS & DAML
Core concepts
Classes
  daml:Class: defines a class
  rdfs:subClassOf: specifies a generalization (superclass) of a class
Properties
  daml:Property: defines a binary relation between a resource and a value
  rdfs:domain: specifies the class to which a property applies
  rdfs:range: specifies the possible values of a property
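The effect of rdfs:subClassOf, rdfs:domain, and rdfs:range can be sketched in a few lines of Python. This is a simplified illustration of the RDFS inference rules for these three constructs only; it ignores rdfs:subPropertyOf and everything DAML adds. Program, Size, Name, and ProgSize echo the example ontology, while FreewareProgram, Product, prog1, and size1 are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def superclasses(subclass_of: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Transitive closure of rdfs:subClassOf: class -> every class it specializes."""
    direct: Dict[str, Set[str]] = defaultdict(set)
    for sub, sup in subclass_of:
        direct[sub].add(sup)
    closure: Dict[str, Set[str]] = {}
    for cls in list(direct):
        seen: Set[str] = set()
        stack = list(direct[cls])
        while stack:  # depth-first walk up the hierarchy
            c = stack.pop()
            if c not in seen:
                seen.add(c)
                stack.extend(direct.get(c, ()))
        closure[cls] = seen
    return closure


def infer_types(data: Iterable[Triple], domain: Dict[str, str],
                range_: Dict[str, str],
                closure: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """rdfs:domain types the subject, rdfs:range types the object,
    and every inferred type is then lifted to its superclasses."""
    types: Dict[str, Set[str]] = defaultdict(set)
    for s, p, o in data:
        if p in domain:
            types[s].add(domain[p])
        if p in range_:
            types[o].add(range_[p])
    for resource in list(types):
        for cls in list(types[resource]):
            types[resource] |= closure.get(cls, set())
    return dict(types)


closure = superclasses([("FreewareProgram", "Program"), ("Program", "Product")])
domain = {"Name": "Program", "ProgSize": "Program"}
range_ = {"ProgSize": "Size"}  # Name has a literal range, so it types nothing
data = [("prog1", "Name", "Stick Death"), ("prog1", "ProgSize", "size1")]
print(infer_types(data, domain, range_, closure))
```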
6
Example Ontology
<daml:Class rdf:ID="Program">
  <rdfs:label>Program</rdfs:label>
</daml:Class>
<daml:Class rdf:ID="Size">
  <rdfs:label>Size</rdfs:label>
</daml:Class>
. . .
<daml:Property rdf:ID="Name">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="&rdfs;Literal"/>
  <rdf:type rdf:resource="&daml;UniqueProperty"/>
  <rdf:type rdf:resource="&daml;UnambiguousProperty"/>
</daml:Property>
<daml:ObjectProperty rdf:ID="ProgSize">
  <rdfs:domain rdf:resource="#Program"/>
  <rdfs:range rdf:resource="#Size"/>
</daml:ObjectProperty>
. . .
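To show how a program might read such an ontology, here is a Python sketch that uses the standard-library xml.etree.ElementTree module. It assumes that the &rdfs; and &daml; entities expand to the usual RDF Schema and DAML+OIL (March 2001) namespace URIs. It also wraps the slide's fragments, minus the elided parts, in an rdf:RDF root element. The summarize helper is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

# The slide's ontology with &rdfs; and &daml; written out as full URIs.
ONTOLOGY = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}" xmlns:daml="{DAML}">
  <daml:Class rdf:ID="Program"><rdfs:label>Program</rdfs:label></daml:Class>
  <daml:Class rdf:ID="Size"><rdfs:label>Size</rdfs:label></daml:Class>
  <daml:Property rdf:ID="Name">
    <rdfs:domain rdf:resource="#Program"/>
    <rdfs:range rdf:resource="{RDFS}Literal"/>
    <rdf:type rdf:resource="{DAML}UniqueProperty"/>
    <rdf:type rdf:resource="{DAML}UnambiguousProperty"/>
  </daml:Property>
  <daml:ObjectProperty rdf:ID="ProgSize">
    <rdfs:domain rdf:resource="#Program"/>
    <rdfs:range rdf:resource="#Size"/>
  </daml:ObjectProperty>
</rdf:RDF>"""


def summarize(xml_text: str):
    """Return (class IDs, {property ID: domain, range, DAML property types})."""
    root = ET.fromstring(xml_text)
    rid, res = f"{{{RDF}}}ID", f"{{{RDF}}}resource"  # namespaced attribute names
    classes = [c.get(rid) for c in root.findall(f"{{{DAML}}}Class")]
    props = {}
    for tag in ("Property", "ObjectProperty"):
        for p in root.findall(f"{{{DAML}}}{tag}"):
            dom = p.find(f"{{{RDFS}}}domain")
            rng = p.find(f"{{{RDFS}}}range")
            types = {t.get(res) for t in p.findall(f"{{{RDF}}}type")}
            props[p.get(rid)] = {
                "domain": dom.get(res) if dom is not None else None,
                "range": rng.get(res) if rng is not None else None,
                "unique": DAML + "UniqueProperty" in types,
                "unambiguous": DAML + "UnambiguousProperty" in types,
            }
    return classes, props


classes, props = summarize(ONTOLOGY)
print(classes)
for name, info in props.items():
    print(name, info)
```

A literal range (rdfs:Literal for Name) versus a class range (#Size for ProgSize) is the distinction that separates literal properties from other properties in the DAML OSM mapping.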
7
DAML OSM
Classes → non-lexical object sets
Properties → binary relationship sets between object sets
Literal properties → binary relationship sets between non-lexical and lexical object sets
Cardinality restrictions → participation constraints
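The correspondence above can be sketched as a small conversion function. This is not the thesis's conversion algorithm. It is a minimal Python illustration that rests on two assumptions. First, a literal property gets a new lexical object set named after the property. Second, daml:UniqueProperty and daml:UnambiguousProperty are read as a maximum participation of 1 on the domain side and range side respectively, with 0:* as the default. Explicit daml:Restriction cardinalities are not handled. All type and function names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List

LITERAL = "http://www.w3.org/2000/01/rdf-schema#Literal"


@dataclass
class DamlProperty:
    name: str
    domain: str
    rng: str                   # a class name, or LITERAL for literal properties
    unique: bool = False       # rdf:type daml:UniqueProperty
    unambiguous: bool = False  # rdf:type daml:UnambiguousProperty


@dataclass
class RelationshipSet:
    left: str                  # object set for the property's domain
    right: str                 # object set for the property's range
    left_participation: str    # "min:max", as in OSM participation constraints
    right_participation: str


@dataclass
class OSM:
    nonlexical: List[str] = field(default_factory=list)
    lexical: List[str] = field(default_factory=list)
    relationships: List[RelationshipSet] = field(default_factory=list)


def daml_to_osm(classes: List[str], properties: List[DamlProperty]) -> OSM:
    # Classes become non-lexical object sets.
    osm = OSM(nonlexical=list(classes))
    for p in properties:
        if p.rng == LITERAL:
            # Literal property: relate the domain to a new lexical object set.
            right = p.name
            osm.lexical.append(right)
        else:
            # Ordinary property: relate two non-lexical object sets.
            right = p.rng
        # Property types become participation constraints (assumed reading).
        osm.relationships.append(RelationshipSet(
            left=p.domain,
            right=right,
            left_participation="0:1" if p.unique else "0:*",
            right_participation="0:1" if p.unambiguous else "0:*",
        ))
    return osm


osm = daml_to_osm(
    ["Program", "Size"],
    [DamlProperty("Name", "Program", LITERAL, unique=True, unambiguous=True),
     DamlProperty("ProgSize", "Program", "Size")],
)
print(osm)
```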
8
DAML OSM
9
Data Frames
Lexical object sets need data frames
Use a data-frame library
Match lexical object sets with data frames
  Compare names using stemming, Levenshtein edit distance, Soundex, and longest common subsequence
  Choose the most similar data frame
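The following is a rough Python sketch of the matching step. It implements each of the four listed measures. A deliberately crude suffix-stripping function stands in for a real stemmer such as Porter's. The sketch averages three normalized scores with equal weights and picks the highest-scoring data frame. The equal weighting, the normalization, and the data-frame library names are assumptions for illustration, not the proposal's actual scoring.

```python
from typing import List


def stem(word: str) -> str:
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter's."""
    w = word.lower()
    for suffix in ("ing", "ed", "s"):
        if w.endswith(suffix) and len(w) - len(suffix) >= 3:
            return w[: -len(suffix)]
    return w


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) with a rolling DP row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def lcs_length(a: str, b: str) -> int:
    """Length of the longest common subsequence."""
    prev = [0] * (len(b) + 1)
    for ca in a:
        cur = [0]
        for j, cb in enumerate(b, 1):
            cur.append(prev[j - 1] + 1 if ca == cb else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


SOUNDEX_CODES = {c: d for d, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                                         "4": "l", "5": "mn", "6": "r"}.items()
                 for c in letters}


def soundex(word: str) -> str:
    """American Soundex: first letter plus three digits."""
    w = "".join(c for c in word.lower() if c.isalpha())
    if not w:
        return ""
    out, last = w[0].upper(), SOUNDEX_CODES.get(w[0], "")
    for c in w[1:]:
        d = SOUNDEX_CODES.get(c, "")
        if d and d != last:
            out += d
        if c not in "hw":  # h and w do not separate equal codes; vowels do
            last = d
    return (out + "000")[:4]


def name_similarity(a: str, b: str) -> float:
    """Average of three normalized scores on stemmed names (equal weights assumed)."""
    a, b = stem(a), stem(b)
    longest = max(len(a), len(b)) or 1
    edit_sim = 1 - levenshtein(a, b) / longest
    lcs_sim = lcs_length(a, b) / longest
    sound_sim = 1.0 if soundex(a) == soundex(b) else 0.0
    return (edit_sim + lcs_sim + sound_sim) / 3


def best_data_frame(object_set: str, library: List[str]) -> str:
    """Choose the library data frame whose name is most similar."""
    return max(library, key=lambda name: name_similarity(object_set, name))


library = ["Price", "Date", "Size", "PhoneNumber", "Email"]  # hypothetical library
print(best_data_frame("Sizes", library))
```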
10
User Modification
Cardinality constraints
  Provide a graphical ontology editor
  Allow the user to edit participation constraints
  Prevent the user from modifying the ontology structure
Data frames
  Allow the user to edit the mapping
  Provide a data frame editor
  Allow the user to edit or add data frames
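The split between what the user may and may not change can be pictured as an edit policy. The Python sketch below is hypothetical: the class, exception, and data-frame names are invented for illustration. It accepts changes to existing participation constraints and to the data-frame mapping of existing lexical object sets. It rejects any edit that would add a relationship set or an object set.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple


class StructureLockedError(Exception):
    """Raised when an edit would change the ontology's structure."""


def _valid_constraint(constraint: str) -> bool:
    """Accept 'min:max' where min is a natural number and max is '*' or >= min."""
    parts = constraint.split(":")
    if len(parts) != 2 or not parts[0].isdigit():
        return False
    lo, hi = parts
    return hi == "*" or (hi.isdigit() and int(hi) >= int(lo))


@dataclass
class ExtractionOntologyEditor:
    # (relationship set, object set) -> participation constraint such as "0:1"
    participation: Dict[Tuple[str, str], str]
    # lexical object set -> name of the data frame used to recognize its values
    data_frame_mapping: Dict[str, str] = field(default_factory=dict)

    def set_participation(self, rel: str, obj: str, constraint: str) -> None:
        if (rel, obj) not in self.participation:
            raise StructureLockedError(f"{rel}/{obj} is not part of the ontology")
        if not _valid_constraint(constraint):
            raise ValueError(f"malformed participation constraint {constraint!r}")
        self.participation[(rel, obj)] = constraint

    def remap_data_frame(self, lexical_set: str, data_frame: str) -> None:
        if lexical_set not in self.data_frame_mapping:
            raise StructureLockedError(f"{lexical_set} is not a lexical object set")
        self.data_frame_mapping[lexical_set] = data_frame


editor = ExtractionOntologyEditor(
    participation={("ProgSize", "Program"): "0:*", ("ProgSize", "Size"): "0:*"},
    data_frame_mapping={"Name": "ProgramName"},
)
editor.set_participation("ProgSize", "Program", "1:1")  # allowed: constraint edit
try:
    editor.set_participation("Vendor", "Program", "0:1")  # rejected: new structure
except StructureLockedError as err:
    print("rejected:", err)
```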
11
Extracting the Data
12
Convert to RDF
[Figure: the extracted record as an RDF graph. The resource http://www.downloads.com/Program1001 has rdf:type software:Program and the values "Stick Death" (software:name), "1.0" (software:version), and "Windows 3.x/95/98/Me/NT/2000/X" (software:OperatingSystem); software:ProgSize links it to a node of rdf:type software:Size with software:SizeVal 2.66 and software:SizeUnit MB.]
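Here is a Python sketch of the conversion step for this example. One extracted Program record, represented as a dict whose column names are hypothetical, becomes N-Triples. The Size value is modeled as a separate blank node of type software:Size. The assignment of values to properties is read off the slide's graph. The slide does not expand the software: prefix, so http://example.org/software# below is a placeholder. Literals are left untyped.

```python
from typing import Dict, List

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SOFTWARE = "http://example.org/software#"  # placeholder for the software: prefix


def uri(u: str) -> str:
    return f"<{u}>"


def lit(v: str) -> str:
    """Plain N-Triples literal with backslashes and quotes escaped."""
    return '"' + v.replace("\\", "\\\\").replace('"', '\\"') + '"'


def rows_to_ntriples(rows: List[Dict[str, str]]) -> List[str]:
    lines: List[str] = []
    for i, row in enumerate(rows, 1):
        subj = uri(row["id"])
        size = f"_:size{i}"  # blank node for this program's Size object
        triples = [
            (subj, uri(RDF_TYPE), uri(SOFTWARE + "Program")),
            (subj, uri(SOFTWARE + "name"), lit(row["name"])),
            (subj, uri(SOFTWARE + "version"), lit(row["version"])),
            (subj, uri(SOFTWARE + "OperatingSystem"), lit(row["os"])),
            (subj, uri(SOFTWARE + "ProgSize"), size),
            (size, uri(RDF_TYPE), uri(SOFTWARE + "Size")),
            (size, uri(SOFTWARE + "SizeVal"), lit(row["size_val"])),
            (size, uri(SOFTWARE + "SizeUnit"), lit(row["size_unit"])),
        ]
        lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return lines


row = {
    "id": "http://www.downloads.com/Program1001",
    "name": "Stick Death",
    "version": "1.0",
    "os": "Windows 3.x/95/98/Me/NT/2000/X",
    "size_val": "2.66",
    "size_unit": "MB",
}
print("\n".join(rows_to_ntriples([row])))
```

Numbering blank nodes by row position keeps the output deterministic. A real converter would also need a policy for minting or reusing URIs across pages.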
13
Superimposed Data
14
Contributions
Advancement of the Semantic Web
Application of information extraction to building the Semantic Web
Semantic Web data as superimposed information
Algorithm for ontology conversion