Thesis Defense Mini-Ontology GeneratOr (MOGO) Mini-Ontology Generation from Canonicalized Tables Stephen Lynn Data Extraction Research Group Department of Computer Science Brigham Young University Supported by the

Post on 20-Dec-2015

Page 1

Thesis Defense
Mini-Ontology GeneratOr (MOGO)

Mini-Ontology Generation from Canonicalized Tables

Stephen Lynn
Data Extraction Research Group
Department of Computer Science
Brigham Young University

Supported by the

Page 2

TANGO Overview

TANGO: Table ANalysis for Generating Ontologies

Project consists of the following three components:

1. Transform tables into a canonicalized form
2. Generate mini-ontologies
3. Merge into a growing ontology

Page 3

Sample Input

Region and State Information

Location        Population (2000)   Latitude   Longitude
Northeast       2,122,869
  Delaware      817,376             45         -90
  Maine         1,305,493           44         -93
Northwest       9,690,665
  Oregon        3,559,547           45         -120
  Washington    6,131,118           43         -120
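One way to picture the sample table in memory is as a nested label tree for the row dimension plus a dictionary of data cells. The sketch below is only illustrative: the names `LOCATION_TREE`, `CELLS`, and `leaf_paths` are hypothetical and do not come from TANGO or MOGO, and the nesting of states under regions is an assumption read off the row layout of the table.

```python
# Hypothetical in-memory form of the sample table (not MOGO's actual format).
# Assumption: each state row belongs to the region row listed above it.
LOCATION_TREE = {
    "Location": {
        "Northeast": {"Delaware": {}, "Maine": {}},
        "Northwest": {"Oregon": {}, "Washington": {}},
    }
}

# Data cells keyed by (row label, column label); empty cells are absent.
CELLS = {
    ("Northeast", "Population (2000)"): 2_122_869,
    ("Delaware", "Population (2000)"): 817_376,
    ("Maine", "Population (2000)"): 1_305_493,
    ("Northwest", "Population (2000)"): 9_690_665,
    ("Oregon", "Population (2000)"): 3_559_547,
    ("Washington", "Population (2000)"): 6_131_118,
    ("Delaware", "Latitude"): 45,
    ("Maine", "Latitude"): 44,
    ("Oregon", "Latitude"): 45,
    ("Washington", "Latitude"): 43,
    ("Delaware", "Longitude"): -90,
    ("Maine", "Longitude"): -93,
    ("Oregon", "Longitude"): -120,
    ("Washington", "Longitude"): -120,
}


def leaf_paths(tree, prefix=()):
    """Yield every root-to-leaf label path of a nested-dict label tree."""
    for label, children in tree.items():
        path = prefix + (label,)
        if children:
            yield from leaf_paths(children, path)
        else:
            yield path


paths = list(leaf_paths(LOCATION_TREE))
# paths[0] is ("Location", "Northeast", "Delaware")
```

Keeping the label hierarchy separate from the cell values makes the region/state nesting explicit, which is the kind of structure the later recognition and discovery steps rely on.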

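[Figure: dimension-tree view of the table. Title: Region and State Information. Tree "Location" with labels Northeast, Northwest, Delaware, Maine, Oregon, Washington. Tree "[Dimension2]" with labels Population, Latitude, Longitude, 2000. Data cells shown include 2,122,869, 817,376, and -120.]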

Sample Output

Page 4

Mini-Ontology GeneratOr (MOGO)

- Concept/Value Recognition
- Relationship Discovery
- Constraint Discovery

Page 5

Concept/Value Recognition

- Lexical Clues
  - Labels as data values
  - Data value assignment
- Data Frame Clues
  - Labels as data values
  - Data value assignment
- Default
  - Classifies any unclassified elements according to simple heuristic.

[Figure: dimension-tree view of the table "Region and State Information", with trees "Location" and "[Dimension2]".]

Concepts and Value Assignments

Concepts: Location, Population, Latitude, Longitude, Region, State, Year

Region:      Northeast, Northwest
State:       Delaware, Maine, Oregon, Washington
Population:  2,122,869; 817,376; 1,305,493; 9,690,665; 3,559,547; 6,131,118
Latitude:    45, 44, 45, 43
Longitude:   -90, -93, -120, -120
Year:        2002, 2003
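The "labels as data values" idea can be sketched with a few toy recognizers. Everything in this sketch is an assumption made for illustration: the `DATA_FRAMES` recognizers, the `classify_labels` function, and the default rule (an unrecognized label is treated as a concept name) are hypothetical and are not MOGO's actual data frame library or heuristic.

```python
import re

# Hypothetical recognizers standing in for data frames (illustrative only).
DATA_FRAMES = {
    "Region": lambda s: s in {"Northeast", "Northwest"},
    "State": lambda s: s in {"Delaware", "Maine", "Oregon", "Washington"},
    "Year": lambda s: re.fullmatch(r"(19|20)\d{2}", s) is not None,
}


def classify_labels(labels):
    """Split table labels into concept names and labels that are data values."""
    concepts = []
    values = {}
    for label in labels:
        for concept, recognizes in DATA_FRAMES.items():
            if recognizes(label):
                # The label is really a value of a recognized concept.
                values.setdefault(concept, []).append(label)
                break
        else:
            # Assumed default: an unrecognized label names a concept.
            concepts.append(label)
    return concepts, values


labels = [
    "Location", "Northeast", "Delaware", "Maine", "Northwest", "Oregon",
    "Washington", "Population", "2000", "Latitude", "Longitude",
]
concepts, values = classify_labels(labels)
# concepts -> ["Location", "Population", "Latitude", "Longitude"]
# values["Region"] -> ["Northeast", "Northwest"]
```

In this toy version the recognizers are fixed lexicons and one regular expression; a real data frame library would be much richer.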

Page 6

Relationship Discovery

- Dimension Tree Mappings
- Lexical Clues
  - Generalization/Specialization
  - Aggregation
- Data Frames
- Ontology Fragment Merge

[Figure, shown twice on this slide: dimension-tree view of the table "Region and State Information", with trees "Location" and "[Dimension2]".]

Page 7

Constraint Discovery

- Generalization/Specialization
- Computed Values
- Functional Relationships
- Optional Participation

Region and State Information

Location        Population (2000)   Latitude   Longitude
Northeast       2,122,869
  Delaware      817,376             45         -90
  Maine         1,305,493           44         -93
Northwest       9,690,665
  Oregon        3,559,547           45         -120
  Washington    6,131,118           43         -120
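Three of these constraint kinds can be checked directly against the sample table. The sketch below is a minimal illustration, not MOGO's algorithm: the `ROWS` and `CHILDREN` layout and the three helper functions are hypothetical, and the assignment of states to regions is assumed from the row order.

```python
# Sample table rows; None marks an empty cell (illustrative layout only).
ROWS = {
    "Northeast": {"Population": 2_122_869, "Latitude": None, "Longitude": None},
    "Delaware": {"Population": 817_376, "Latitude": 45, "Longitude": -90},
    "Maine": {"Population": 1_305_493, "Latitude": 44, "Longitude": -93},
    "Northwest": {"Population": 9_690_665, "Latitude": None, "Longitude": None},
    "Oregon": {"Population": 3_559_547, "Latitude": 45, "Longitude": -120},
    "Washington": {"Population": 6_131_118, "Latitude": 43, "Longitude": -120},
}
# Assumption: each region aggregates the state rows listed beneath it.
CHILDREN = {
    "Northeast": ["Delaware", "Maine"],
    "Northwest": ["Oregon", "Washington"],
}


def is_computed_total(attribute):
    """True if every parent's value equals the sum of its children's values."""
    return all(
        ROWS[parent][attribute] == sum(ROWS[child][attribute] for child in kids)
        for parent, kids in CHILDREN.items()
    )


def is_optional(attribute):
    """True if at least one row has no value for the attribute."""
    return any(row[attribute] is None for row in ROWS.values())


def is_functional(pairs):
    """True if no source value is paired with two different targets."""
    seen = {}
    for source, target in pairs:
        if seen.setdefault(source, target) != target:
            return False
    return True


location_to_population = [(loc, row["Population"]) for loc, row in ROWS.items()]
latitude_to_location = [
    (row["Latitude"], loc) for loc, row in ROWS.items() if row["Latitude"] is not None
]

print(is_computed_total("Population"))        # True: region = sum of its states
print(is_optional("Latitude"))                # True: regions have no latitude
print(is_optional("Population"))              # False: every row has a population
print(is_functional(location_to_population))  # True
print(is_functional(latitude_to_location))    # False: 45 maps to two states
```

For this table the region populations are exact sums of their states (817,376 + 1,305,493 = 2,122,869 and 3,559,547 + 6,131,118 = 9,690,665), which is the pattern a computed-value check looks for.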

Page 8

Validation

- Concept/Value Recognition
  - Correctly identified concepts
  - Missed concepts
  - False positives
  - Data values assignment
- Relationship Discovery
  - Valid relationship sets
  - Invalid relationship sets
  - Missed relationship sets
- Constraint Discovery
  - Valid constraints
  - Invalid constraints
  - Missed constraints

                          Precision   Recall   F-measure
Concept Recognition       87%         94%      90%
Relationship Discovery    73%         81%      77%
Constraint Discovery      89%         91%      90%

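$$\text{precision} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct} + \text{Total\_Incorrect\_Found}}$$

$$\text{recall} = \frac{\text{Total\_Correct\_Found}}{\text{Actual\_Correct}}$$

$$\text{F-measure} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$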
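The F-measure is the harmonic mean of precision and recall, so the F-measure column of the results table can be recomputed from the other two columns. The short check below does that; the function and variable names are illustrative only.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) from the results table.
REPORTED = {
    "Concept Recognition": (0.87, 0.94, 0.90),
    "Relationship Discovery": (0.73, 0.81, 0.77),
    "Constraint Discovery": (0.89, 0.91, 0.90),
}

recomputed = {
    task: round(f_measure(p, r), 2) for task, (p, r, _) in REPORTED.items()
}

for task, (_, _, reported) in REPORTED.items():
    print(f"{task}: reported {reported:.2f}, recomputed {recomputed[task]:.2f}")
```

Rounded to two decimal places, all three recomputed values agree with the reported F-measures.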

Page 9

Concept Recognition

What we counted:

- Correct/Incorrect/Missing Concepts
- Correct/Incorrect/Missing Labels
- Data value assignments

Page 10

Relationship Discovery

What we counted:

- Correct/incorrect/missing relationship sets
- Correct/incorrect/missing aggregations and generalization/specializations

Page 11

Constraint Discovery

What we counted:

- Correct/Incorrect/Missing:
  - Generalization/Specialization constraints
  - Computed value constraints
  - Functional constraints
  - Optional constraints

Page 12

Concept Recognition

- Successes
  - 98% of concepts identified
  - Missing label identification
  - 97% of values assigned to correct concept
- Common problems
  - Finding an appropriate label
  - Duplicate concepts

Page 13

Relationship Discovery

- Recall of 92% for relationship sets
- Missing aggregations and generalizations/specializations
  - Only found in label nesting

Page 14

Constraint Discovery

- F-measure of 98% for functional relationship sets
- Poor computed value discovery
  - Rows/Columns with totals

Page 15

Conclusions

- Tool to generate mini-ontologies
- Assessment of accuracy of automatic generation

                          Precision   Recall   F-measure
Concept Recognition       87%         94%      90%
Relationship Discovery    73%         81%      77%
Constraint Discovery      89%         91%      90%

Page 16

Future Work

- Tool Enhancements
  - Linguistic processing
  - Data frame library
  - Domain specific heuristics
- Alternate Uses
  - Annotation for the Semantic Web