Graphical Models and Probabilistic Reasoning for Generating Linked Data from Tables
Varish Mulwad (@varish)
University of Maryland, Baltimore County
Doctoral Consortium at ISWC 2011
October 24, 2011
Guru: Dr. Tim Finin
Contribution
Name            Team           Position        Height
Michael Jordan  Chicago        Shooting guard  1.98
Allen Iverson   Philadelphia   Point guard     1.83
Yao Ming        Houston        Center          2.29
Tim Duncan      San Antonio    Power forward   2.11
Class for the Team column: http://dbpedia.org/class/yago/NationalBasketballAssociationTeams
Entity for the cell "Allen Iverson": http://dbpedia.org/resource/Allen_Iverson
Relation between columns: dbprop:team
Map literals as values of properties
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix yago: <http://dbpedia.org/class/yago/> .
"Name"@en is rdfs:label of dbpedia-owl:BasketballPlayer .
"Team"@en is rdfs:label of yago:NationalBasketballAssociationTeams .
"Michael Jordan"@en is rdfs:label of dbpedia:Michael_Jordan .
dbpedia:Michael_Jordan a dbpedia-owl:BasketballPlayer .
"Chicago Bulls"@en is rdfs:label of dbpedia:Chicago_Bulls .
dbpedia:Chicago_Bulls a yago:NationalBasketballAssociationTeams .
All this in a completely automated way!
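As a rough illustration of the output step only, the sketch below turns column and cell annotations into N3-style statements like the ones on this slide. It assumes the annotations are already known: the `columns` list is hand-written here, and the helper names `column_triples` and `to_n3` are hypothetical, not part of the presented system. Entity local names use underscores, as in the DBpedia URI for Allen Iverson shown earlier.

```python
# Prefixes taken from the slide; the rdfs namespace is the standard W3C one.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "yago": "http://dbpedia.org/class/yago/",
}


def column_triples(header, class_name, cells):
    """Emit N3 statements for one annotated column.

    `cells` maps each cell label to the prefixed name of its linked entity.
    """
    lines = [f'"{header}"@en is rdfs:label of {class_name} .']
    for label, entity in cells.items():
        lines.append(f'"{label}"@en is rdfs:label of {entity} .')
        lines.append(f"{entity} a {class_name} .")
    return lines


def to_n3(columns):
    """Join prefix declarations and per-column statements into one document."""
    out = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    for header, class_name, cells in columns:
        out.extend(column_triples(header, class_name, cells))
    return "\n".join(out)


# Hand-written annotations (an assumption; the real system infers these).
columns = [
    ("Name", "dbpedia-owl:BasketballPlayer",
     {"Michael Jordan": "dbpedia:Michael_Jordan"}),
    ("Team", "yago:NationalBasketballAssociationTeams",
     {"Chicago Bulls": "dbpedia:Chicago_Bulls"}),
]

n3_text = to_n3(columns)
print(n3_text)
```

The hard part, inferring the annotations automatically, is exactly what the sketch leaves out; it only shows how little is needed once classes and entities have been chosen.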
Tables are everywhere! … yet …
389,697 raw and geospatial datasets
0.071% in RDF
The web – 154 million high-quality relational tables [1]
Current Systems
Problems with systems on the Semantic Web
– Require users to have knowledge of the Semantic Web
– Do not automatically link to existing classes and entities on the Semantic Web / Linked Data cloud
– RDF data in some cases is as useless as raw data
– Majority of the work focused on relational data where schema is available
A graphical model for tables
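[Figure: a graph with one variable node per column header (C1, C2, C3) and one per row value (R11 to R33). In the example column, the header "Team" is assigned a class, and the row values Chicago, Philadelphia, Houston and San Antonio are assigned instances.]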
Parameterized graphical model
[Figure: factor graph over the column headers C1, C2, C3 and the row values R11 to R33]
Variable nodes: column headers and row values
Factor nodes:
– ψ3: function that captures the affinity between the column headers and row values
– ψ4: captures interaction between row values
– ψ5: captures interaction between column headers
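To make the roles of the three factors concrete, here is a minimal toy sketch. Everything numeric in it is an assumption: the candidate lists, the lookup sets `PLAYS_FOR` and `RELATED_CLASSES`, and the scores 1.0, 0.5 and 0.1 are invented for illustration, and exhaustive enumeration stands in for whatever inference procedure the actual system uses.

```python
from itertools import product

# Toy table built from two rows of the running example.
headers = ["Name", "Team"]
rows = [["Michael Jordan", "Chicago"], ["Allen Iverson", "Philadelphia"]]

# Hypothetical candidates; a real system would query a knowledge base.
class_candidates = {
    "Name": ["BasketballPlayer", "City"],
    "Team": ["NBATeam", "City"],
}
entity_candidates = {  # cell text -> list of (entity, entity type)
    "Michael Jordan": [("Michael_Jordan", "BasketballPlayer")],
    "Allen Iverson": [("Allen_Iverson", "BasketballPlayer")],
    "Chicago": [("Chicago_Bulls", "NBATeam"), ("Chicago", "City")],
    "Philadelphia": [("Philadelphia_76ers", "NBATeam"), ("Philadelphia", "City")],
}
PLAYS_FOR = {("Michael_Jordan", "Chicago_Bulls"),
             ("Allen_Iverson", "Philadelphia_76ers")}
RELATED_CLASSES = {("BasketballPlayer", "NBATeam")}


def psi3(col_class, entities):
    """Affinity between a column header's class and the row values below it."""
    score = 1.0
    for _, entity_type in entities:
        score *= 1.0 if entity_type == col_class else 0.1
    return score


def psi4(row_entities):
    """Interaction between the row values of one row."""
    names = [name for name, _ in row_entities]
    related = any((a, b) in PLAYS_FOR for a in names for b in names)
    return 1.0 if related else 0.5


def psi5(col_classes):
    """Interaction between the column headers."""
    return 1.0 if tuple(col_classes) in RELATED_CLASSES else 0.5


def best_assignment():
    """Score every joint assignment and keep the highest-scoring one."""
    cells = [cell for row in rows for cell in row]  # row-major order
    width = len(headers)
    best, best_score = None, -1.0
    for classes in product(*(class_candidates[h] for h in headers)):
        for flat in product(*(entity_candidates[c] for c in cells)):
            grid = [flat[i * width:(i + 1) * width] for i in range(len(rows))]
            score = psi5(classes)
            for j, col_class in enumerate(classes):
                score *= psi3(col_class, [grid_row[j] for grid_row in grid])
            for grid_row in grid:
                score *= psi4(grid_row)
            if score > best_score:
                best, best_score = (classes, grid), score
    return best, best_score


(best_classes, best_grid), best_score = best_assignment()
print(best_classes, [[name for name, _ in r] for r in best_grid], best_score)
```

Because the factors are multiplied, the ambiguous cell "Chicago" is resolved to the team rather than the city only when the column class, the neighbouring cell and the other column header all agree. Enumeration is exponential in the number of cells, so it is usable only on toy inputs like this one.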
Evaluation
• Dataset of > 6000 tables [2]
• Compare our accuracy against our baseline system and the results in [2]
• Use Mean Average Precision [3] to compare a ‘ranked list of possible classes/entities’ against a ranked list obtained from human evaluators
• Experiment with datasets from www.data.gov
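For reference, Mean Average Precision can be sketched as below. This is the standard textbook computation under a simplifying assumption: the human judgments are reduced to a set of relevant items per query rather than a ranked list. The function names and the example rankings are illustrative only and are not results from this work.

```python
def average_precision(ranked, relevant):
    """Average of precision@k over the ranks k where a relevant item appears."""
    hits = 0
    precision_sum = 0.0
    for k, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """Mean of average precision over (ranked list, relevant set) pairs."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Made-up rankings of candidate classes for two table columns.
runs = [
    (["NBATeam", "City", "Company"], {"NBATeam", "Company"}),
    (["City", "BasketballPlayer"], {"BasketballPlayer"}),
]
map_score = mean_average_precision(runs)
print(round(map_score, 4))
```

In the first query the relevant items sit at ranks 1 and 3, giving (1/1 + 2/3) / 2; in the second the only relevant item is at rank 2, giving 1/2. The mean of the two is 2/3.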
References
1. Cafarella, M.J., Halevy, A., Wang, D.Z., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. Proc. VLDB Endow. 1(1), 538-549 (2008)
2. Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. 36th Int. Conf. on Very Large Databases (2010)
3. Salton, G., McGill, M.J.: Introduction to Modern Information Retrieval. McGraw-Hill, Inc., New York (1986)