using linked data to interpret tables varish mulwad september 14, 2010 1
TRANSCRIPT
Using linked data to interpret tables
Varish Mulwad
September 14, 2010
1
Interpreting a table
http://dbpedia.org/resource/Baltimore (link cell value to an entity)
http://dbpedia.org/ontology/PopulatedPlace
LargestCity (find relationships between columns)
2
1000 reasons why it's important …
Annotate web tables
Confirm existing facts in LOD
Discover knowledge and new facts
Search query over web tables
Data integration
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
… …
Interpreting a table
4
Overview
• Introduction
• Related Work & Motivation
• Approach
• Results
• Upcoming Work
• Conclusion
5
6
Introduction
The World Wide Web …
[Figure: web pages with entries such as "Talk abc, By xyz, Venue some location"]
7
The World Wide Web …
Good for you and me …
… not so good for machines
Images from http://www.bbc.co.uk/blogs/radiolabs/s5/linked-data/s5.html
8
Web of Data – The Semantic Web
Image – www.linkeddata.org
9
Linked Data
The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web
Every resource has a URI
Baltimore: http://dbpedia.org/resource/Baltimore
10
Related Work and Motivation
11
Chicken Egg … No Chicken
• More than a trillion documents on the Web
• ~14.1 billion tables, 154 million with high-quality relational data (Cafarella et al. 2008)
• Where is structured data?
13
Automate the process
• We need systems that can generate data from existing sources
• Not practical for humans to encode all this into RDF manually
14
On the Semantic Web …
• Mapping relational databases to RDF [W3C working group – RDB2RDF]
• Mapping spreadsheets to RDF [RDF123, XLWrap]
• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data
… elsewhere
• Learning to index tables to improve search experience (Cafarella et al. 2008)
• Expanding attributes (columns) of web tables (Lin et al. 2010)
• Interpreting web tables to answer complex search queries over the web tables (Limaye et al. 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Predicting Class Labels for column
[Figure: column "City" with cells Baltimore, Boston and New York; each cell (instance) has a type, and the types give the class type for the column]
Querying the Knowledge–Base
City
Baltimore: 1. Baltimore, 2. Baltimore County, 3. John Baltimore
Boston: 1. Boston, 2. Boston_(band), 3. Boston_University
New York: 1. New_York_City, 2. New_York, 3. New_York_(album)
21
Types
dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …
Scoring the classes
Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …
[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …
E.g. processing class "dbpedia-owl:City"
String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]
Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
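The arithmetic in this example can be checked with a short sketch. The function name `class_score` is hypothetical, and the normaliser of 7 is only inferred from the 6/7 term in the example; neither is taken from the T2LD implementation.

```python
def class_score(rank, page_rank, w=0.25, max_page_rank=7):
    """Score one (cell string, class) pair from a single KB result.

    rank          -- position R of the result in the KB ranking (1 = best)
    page_rank     -- PageRank value PR of the result
    w             -- weight on the rank term (0.25 in the slide's example)
    max_page_rank -- assumed normaliser, inferred from the slide's 6/7
    """
    normalized_pr = page_rank / max_page_rank
    return w * (1.0 / rank) + (1.0 - w) * normalized_pr


# [Baltimore, dbpedia-owl:City]: the matching result has R = 1 and PR = 6
baltimore_city = class_score(rank=1, page_rank=6)
print(f"{baltimore_city:.4f}")  # 0.8929, shown truncated as 0.892 on the slide
```

A lower-ranked or less popular result for the same class would score lower under the same weights, which is what lets the rank and PageRank evidence be traded off through w.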
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data + Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set of possible results
Select the highest ranked entity
Classifier decides whether to link or not
Link to "NIL" / Link to the top ranked instance
24
Learning to Rank
• We trained an SVMrank classifier which learned to rank entities within a given set
Feature Vector
Similarity Measures
• Levenshtein distance
• Dice Score
Popularity Measures
• Wikitology Score
• PageRank
• Page Length
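The two similarity measures can be sketched in a few lines. The slide does not say how the Dice score is tokenised, so character bigrams are an assumption here, and the function names are illustrative rather than part of T2LD.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(s):
    """Set of lower-cased character bigrams of s."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_score(a, b):
    """Dice coefficient over character-bigram sets (assumed tokenisation)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


# Compare the cell string with the three KB results from the earlier example.
cell = "Baltimore"
for candidate in ["Baltimore", "Baltimore County", "John Baltimore"]:
    print(candidate, levenshtein(cell, candidate),
          round(dice_score(cell, candidate), 3))
```

Levenshtein distance is smaller for closer strings while the Dice score is larger, so a ranker that uses both has to learn opposite signs for the two features.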
25
“To Link or not to Link …”
• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer
• We trained an SVM classifier which would determine whether to link to the top one or not
26
“To Link or not to Link …”
• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
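A minimal sketch of how such a feature vector could be assembled follows. The input structure, the function name and all numbers are made up for illustration; only the two extra features come from the slide.

```python
def link_decision_features(ranked):
    """Build the feature vector for the link / no-link classifier.

    ranked -- list of (feature_vector, svmrank_score) pairs sorted by
              descending score; a hypothetical structure for this sketch.
    """
    top_features, top_score = ranked[0]
    # With a single candidate there is no runner-up; using a gap of 0.0
    # in that case is an assumption of this sketch.
    gap = top_score - ranked[1][1] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, gap]


# Illustrative values only: [Levenshtein, Dice, Wikitology, PageRank, length]
ranked_candidates = [
    ([0, 1.0, 0.9, 6, 12000], 2.5),
    ([7, 0.696, 0.4, 4, 8000], 1.0),
]
features = link_decision_features(ranked_candidates)
print(features)  # [0, 1.0, 0.9, 6, 12000, 2.5, 1.5]
```

A small gap between the top two scores suggests the ranker was unsure, which is the kind of signal a second classifier can use to fall back to "NIL".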
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York
29
Relation between columns
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
30
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital Score: 0
dbonto:Capital Score: 1
dbonto:LargestCity Score: 3
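The scores shown here are consistent with counting, for each candidate relation, the number of rows in which the knowledge base holds that relation between the two cells. The sketch below assumes that one-vote-per-row reading; the actual T2LD scoring may differ.

```python
from collections import Counter

# Relations found in the KB for each (State, City) row pair, as on the slide.
candidates_per_row = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}

scores = Counter()
for relations in candidates_per_row.values():
    # Each row votes at most once for a given relation.
    scores.update(set(relations))

best_relation, best_score = scores.most_common(1)[0]
print(scores["dbonto:Capital"], scores["dbonto:LargestCity"])  # 1 3
print(best_relation)  # dbonto:LargestCity
```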
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (is an instance of) dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City
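As a rough illustration, statements of this shape could be generated from an interpreted table with plain string formatting. The helper names below are hypothetical and no RDF library is used; a real system would more likely build the graph with one.

```python
def column_statements(header, column_class):
    """Label statement tying a column header to its predicted class."""
    return [f'"{header}"@en is rdfs:label of {column_class} .']


def cell_statements(cell_text, entity, column_class):
    """Label and type statements for one linked table cell."""
    return [
        f'"{cell_text}"@en is rdfs:label of {entity} .',
        f"{entity} a {column_class} .",
    ]


def relation_statements(prop, subject_class, object_class):
    """Domain and range statements for a relation between two columns."""
    return [
        f"{prop} rdfs:domain {subject_class} .",
        f"{prop} rdfs:range {object_class} .",
    ]


lines = []
lines += column_statements("City", "dbpedia-owl:City")
lines += column_statements("State", "dbpedia-owl:AdministrativeRegion")
lines += cell_statements("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City")
lines += cell_statements("MD", "dbpedia:Maryland",
                         "dbpedia-owl:AdministrativeRegion")
lines += relation_statements("dbpprop:LargestCity",
                             "dbpedia-owl:AdministrativeRegion",
                             "dbpedia-owl:City")
print("\n".join(lines))
```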
34
Results
35
Dataset summary
Number of Tables: 15
Total Number of rows: 199
Total Number of columns: 56 (52)
Total Number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
80.76 %
System Ranked: 1. Person, 2. Politician, 3. President
Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder
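For reference, average precision for one column can be computed as below, treating the evaluator's labels as the relevant set. Dividing by the number of relevant labels is the common IR convention and an assumption here; the slides do not state which variant was used, and the function names are illustrative.

```python
def average_precision(system_ranked, relevant):
    """Average precision of a ranked list against a set of relevant labels."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this hit
    return precision_sum / len(relevant)


def mean_average_precision(columns):
    """columns -- list of (system_ranked, relevant) pairs, one per column."""
    if not columns:
        return 0.0
    return sum(average_precision(s, r) for s, r in columns) / len(columns)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 3))  # 0.389: hits at ranks 2 and 3 give (1/2 + 2/3) / 3
```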
Evaluation 2 (Recall)
Recall > 0.6 (75 %)
42
System Ranked: 1. Person, 2. Politician, 3. President
Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder
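Recall for a single column can be sketched as the fraction of evaluator labels that also appear in the system's list. This set-based definition and the function name are assumptions for illustration.

```python
def recall(system_ranked, evaluator_ranked):
    """Fraction of the evaluator's labels that the system also returned."""
    relevant = set(evaluator_ranked)
    if not relevant:
        return 0.0
    return len(relevant & set(system_ranked)) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
column_recall = recall(system, evaluator)
print(round(column_recall, 3))  # 0.667, above the 0.6 threshold
```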
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were “fair and correct”
• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall Accuracy: 76.92 %
44
Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy: 66.12 %
46
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25 %) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the web of data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)
Annotate web tables
Confirm existing facts
in LOD
Discover knowledge
and new facts
Search query over web tables
Data integration
1000 reasons why itrsquos important hellip
prefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity hellip hellip
Interpreting a table
4
Overview
bull Introductionbull Related Work amp Motivationbull Approachbull Resultsbull Upcoming Workbull Conclusion
5
6
Introduction
The World Wide Web hellip
helliphelliphellip
helliphelliphellip
helliphelliphellip
helliphelliphellip
hellip helliphellip
hellip helliphellip
Talk abcBy xyzVenue some location
Talk abcBy xyzVenue some location
hellip helliphellip
hellip helliphellip
7
The World Wide Web hellip
Good for you and me hellip
hellip not so good for machinesImages from httpwwwbbccoukblogsradiolabss5linked-datas5html
8
Web of Data ndash The Semantic Web
Image ndash wwwlinkeddataorg9
Linked Data
The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web
Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore
10
Related Work and Motivation
11
Chicken Egg hellip No Chicken
bull More than a trillion documents on the Web
bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured
data 13
Automate the process
bull We need systems that can generate data from existing sources
bull Not practical for humans to encode all this into RDF manually
14
On the Semantic Web hellip
bull Mapping Relational databases to RDF [W3C working group ndash RDB2RDF]
bull Mapping spreadsheets to RDF [RDF123 XLWrap]
bull Practical and helpful systems but hellip ndash Require significant manual workndash Do not generate linked data
hellip elsewhere
bull Learning to index tables to improve search experience (Cafarella et al 2008)
bull Expanding attributes (columns) of web tables (Lin et al 2010)
bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
Evaluation 1 (MAP) (2)
80.76%
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
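To make the metric concrete, here is a minimal sketch of average precision for one column, using the two ranked lists above. It assumes one common convention: the evaluator's labels form the relevant set, and the precision sum is divided by the number of relevant labels. The slides do not state which variant was used, so treat the function names and the normalisation as assumptions.

```python
def average_precision(system_ranked, relevant):
    """Average precision of a ranked list against a set of relevant labels.

    Precision is taken at each rank where a relevant label appears, and the
    sum is divided by the number of relevant labels (an assumed convention).
    """
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of average precision over (system_ranked, relevant) pairs."""
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 4))
```

For this column the system finds two of the evaluator's three labels, at ranks 2 and 3, so the value works out to roughly 0.39 under this convention. MAP is the mean of this value over all columns.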
Evaluation 2 (Recall)
Recall > 0.6 (75%)
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
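Recall for a single column can be sketched the same way: the fraction of the evaluator's labels that also appear in the system's list. The function name and the use of unordered sets are assumptions made for illustration.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of evaluator labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
column_recall = recall(system, evaluator)
print(round(column_recall, 3), column_recall > 0.6)
```

Here two of the three evaluator labels are recovered, so this example column would fall in the "recall > 0.6" group.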
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate label, but it is still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
Evaluation 3 (Correctness) (2)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
Column: Nationality, Prediction: MilitaryConflict
Column: Birth Place, Prediction: PopulatedPlace
Evaluation for linking table cells to entities
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
Relation between columns (3)
• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25%) were correct
Future Work
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
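One way to picture this step is a lookup that compares a table value with what the knowledge base already holds for the same subject and property. The sketch below is hypothetical throughout: the `kb` dictionary, the `check_fact` helper and the status strings are illustrative, and the names are read off the slide's example.

```python
def check_fact(kb_values, table_value):
    """Classify a table value against the values the KB already holds."""
    if table_value in kb_values:
        return "confirmed"
    return "candidate new fact"


# Toy knowledge base: (subject, property) -> set of known values.
kb = {("dbpedia:Baltimore", "dbpprop:LeaderName"): {"S. Dixon"}}
key = ("dbpedia:Baltimore", "dbpprop:LeaderName")

status = check_fact(kb[key], "S. Rawlings")
if status == "candidate new fact":
    kb[key].add("S. Rawlings")  # the table contributes a value the KB lacked

print(status, sorted(kb[key]))
```

A value already in the knowledge base would instead come back as "confirmed", which corresponds to the slide's notion of confirming existing linked data facts.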
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There is a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC 2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web …
- The World Wide Web … (2)
- Web of Data – The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg … No Chicken
- Automate the process
- On the Semantic Web …
- … elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the Knowledge-Base
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- "To Link or not to Link …"
- "To Link or not to Link …" (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- Automatic/Semi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
6
Introduction
The World Wide Web hellip
helliphelliphellip
helliphelliphellip
helliphelliphellip
helliphelliphellip
hellip helliphellip
hellip helliphellip
Talk abcBy xyzVenue some location
Talk abcBy xyzVenue some location
hellip helliphellip
hellip helliphellip
7
The World Wide Web hellip
Good for you and me hellip
hellip not so good for machinesImages from httpwwwbbccoukblogsradiolabss5linked-datas5html
8
Web of Data ndash The Semantic Web
Image ndash wwwlinkeddataorg9
Linked Data
The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web
Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore
10
Related Work and Motivation
11
Chicken Egg hellip No Chicken
bull More than a trillion documents on the Web
bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured
data 13
Automate the process
bull We need systems that can generate data from existing sources
bull Not practical for humans to encode all this into RDF manually
14
On the Semantic Web hellip
bull Mapping Relational databases to RDF [W3C working group ndash RDB2RDF]
bull Mapping spreadsheets to RDF [RDF123 XLWrap]
bull Practical and helpful systems but hellip ndash Require significant manual workndash Do not generate linked data
hellip elsewhere
bull Learning to index tables to improve search experience (Cafarella et al 2008)
bull Expanding attributes (columns) of web tables (Lin et al 2010)
bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital Score: 1
dbonto:LargestCity Score: 3
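The scores on this slide are consistent with a simple vote count: each row pair contributes one vote to every relation the knowledge base holds between its two entities. A minimal sketch under that assumption (the function name is hypothetical):

```python
from collections import Counter


def score_relations(relations_per_row):
    """Count, for each candidate relation, how many row pairs support it."""
    scores = Counter()
    for relations in relations_per_row:
        scores.update(set(relations))  # one vote per relation per row
    return scores


rows = [
    ["dbonto:LargestCity"],                    # Maryland - Baltimore
    ["dbonto:LargestCity", "dbonto:Capital"],  # Massachusetts - Boston
    ["dbonto:LargestCity"],                    # New York - New York
]
scores = score_relations(rows)
print(scores.most_common())  # [('dbonto:LargestCity', 3), ('dbonto:Capital', 1)]
```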
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
"City"@en is rdfs:label of dbpedia-owl:City
  – "City" is the common human name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  – dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  – The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  – The objects of the triples using the property have to be instances of dbpedia-owl:City
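To illustrate how such annotations might be serialized, here is a small Python sketch that emits them with plain string formatting. It writes standard Turtle subject-predicate-object order rather than the "is … of" form shown on the slide; the function name and input layout are hypothetical, and literal values are not escaped.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def table_to_turtle(column_classes, cell_links):
    """Serialize table annotations as Turtle text.

    column_classes -- {column header: class}, e.g. {"City": "dbpedia-owl:City"}
    cell_links     -- list of (cell text, entity, class) triples
    """
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for text, entity, cls in cell_links:
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {cls} .")
    return "\n".join(lines)


print(table_to_turtle(
    {"City": "dbpedia-owl:City"},
    [("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City")],
))
```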
Results
Dataset summary
Number of Tables: 15
Total Number of rows: 199
Total Number of columns: 56 (52)
Total Number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers
Evaluation for class label predictions
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human ranked list of labels
• Metric - Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
Evaluation 1 (MAP)
80.76%
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
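For readers unfamiliar with the metric, average precision for the example lists above can be computed as in the following sketch. It treats the evaluator's labels as an unordered set of relevant items and divides by the number of relevant labels; this is one common definition, and the talk does not state which variant was used.

```python
def average_precision(system_ranked, relevant_labels):
    """Average of precision@k over the positions k where a relevant label appears."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of average precision over (system_ranked, relevant_labels) pairs."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(average_precision(system, evaluator), 3))  # (1/2 + 2/3) / 3 = 0.389
```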
Evaluation 2 (Recall)
Recall > 0.6 (75%)
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
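Recall for the same pair of lists can be sketched as the fraction of the evaluator's labels that also appear in the system's list. Treating both lists as sets is an assumption; the slide does not spell out the computation.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(recall(system, evaluator), 3))  # 2 of 3 labels found: 0.667
```

With two of the three evaluator labels recovered, this column would fall in the "Recall > 0.6" group.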
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• Class label may not be the most accurate one but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall Accuracy: 76.92%
Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace
Evaluation for linking table cells to entities
Category-wise accuracy for linking table cells
Overall Accuracy: 66.12%
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct
Future Work
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore    MD    S. Rawlings    …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)
Web of Data ndash The Semantic Web
Image ndash wwwlinkeddataorg9
Linked Data
The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web
Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore
10
Related Work and Motivation
11
Chicken Egg hellip No Chicken
bull More than a trillion documents on the Web
bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured
data 13
Automate the process
bull We need systems that can generate data from existing sources
bull Not practical for humans to encode all this into RDF manually
14
On the Semantic Web hellip
bull Mapping Relational databases to RDF [W3C working group ndash RDB2RDF]
bull Mapping spreadsheets to RDF [RDF123 XLWrap]
bull Practical and helpful systems but hellip ndash Require significant manual workndash Do not generate linked data
hellip elsewhere
bull Learning to index tables to improve search experience (Cafarella et al 2008)
bull Expanding attributes (columns) of web tables (Lin et al 2010)
bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
“To Link or not to Link …”
• The feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
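Building that input vector might look like the sketch below. The function name is hypothetical, and the handling of a single-candidate list (treating the runner-up score as 0) is an assumption, since the slide does not cover that case.

```python
from typing import List, Sequence


def link_decision_features(top_entity_features: Sequence[float],
                           ranked_scores: Sequence[float]) -> List[float]:
    """Top entity's features + its ranking score + its margin over the runner-up."""
    if not ranked_scores:
        raise ValueError("need at least one ranked candidate")
    top = ranked_scores[0]
    # Assumption: with only one candidate, the runner-up score is taken as 0.
    runner_up = ranked_scores[1] if len(ranked_scores) > 1 else 0.0
    return list(top_entity_features) + [top, top - runner_up]


# Illustrative values only; scores are listed best first.
features = link_decision_features([0.0, 1.0, 0.75], [2.0, 0.5, 0.25])
print(features)  # [0.0, 1.0, 0.75, 2.0, 1.5]
```

A small margin between the top two scores signals an ambiguous ranking, which is presumably why it is useful evidence for the link/no-link decision.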
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York
29
Relation between columns
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
30
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital – Score: 0
dbonto:Capital – Score: 1
dbonto:LargestCity – Score: 3
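One way to read this slide is as a vote: each row pair contributes one point to every relation the knowledge base holds between its two entities. The sketch below follows that reading; the per-row relation lists are my interpretation of the flattened slide, and `score_relations` is a hypothetical name.

```python
from collections import Counter

# (state, city, relations found between them in the KB), as read from the slide.
row_relations = [
    ("Maryland", "Baltimore", ["dbonto:LargestCity"]),
    ("Massachusetts", "Boston", ["dbonto:LargestCity", "dbonto:Capital"]),
    ("New York", "New York", ["dbonto:LargestCity"]),
]


def score_relations(rows):
    """Count, for each candidate relation, the number of rows that support it."""
    votes = Counter()
    for _subject, _object, relations in rows:
        votes.update(set(relations))  # one vote per relation per row
    return votes


scores = score_relations(row_relations)
print(scores.most_common())  # [('dbonto:LargestCity', 3), ('dbonto:Capital', 1)]
```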
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
"City"@en is rdfs:label of dbpedia-owl:City
  – "City" is the common human name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  – dbpedia:Baltimore is of type (is an instance of) dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  – The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  – The objects of the triples using the property have to be instances of dbpedia-owl:City
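To make the output format concrete, here is a small sketch that emits statements of this kind from a table's annotations. It is not the T2LD code: `annotate_table` and its argument layout are hypothetical, and it writes labels in the forward Turtle form (`subject rdfs:label "text"@en .`), which states the same triple as the slide's "is rdfs:label of" phrasing.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def annotate_table(column_classes, cell_entities, relation):
    """Serialize column classes, linked cells and one column relation as Turtle text."""
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for text, (entity, cls) in cell_entities.items():
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {cls} .")
    prop, domain, range_ = relation
    lines.append(f"{prop} rdfs:domain {domain} .")
    lines.append(f"{prop} rdfs:range {range_} .")
    return "\n".join(lines)


doc = annotate_table(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {"Baltimore": ("dbpedia:Baltimore", "dbpedia-owl:City"),
     "MD": ("dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")},
    ("dbpprop:LargestCity", "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
print(doc)
```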
34
Results
35
Dataset summary
Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
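For readers unfamiliar with the metric, a textbook formulation of MAP is sketched below. It treats the human list as an unordered set of relevant labels and divides by the number of relevant labels; whether the evaluation here used exactly this variant is not stated in the slides, so take it as an assumption.

```python
from typing import Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of precision@k over the ranks k at which a relevant label appears."""
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant)


def mean_average_precision(pairs: Iterable[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average the per-column average precision over all columns."""
    pairs = list(pairs)
    return sum(average_precision(r, rel) for r, rel in pairs) / len(pairs)


# One column: system ranking versus the labels chosen by an evaluator.
system = ["Person", "Politician", "President"]
evaluator = {"President", "Politician", "OfficeHolder"}
ap = average_precision(system, evaluator)
print(round(ap, 3))  # 0.389
```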
40
Evaluation 1 (MAP)
41
80.76%
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
Evaluation 2 (Recall)
Recall > 0.6 (75%)
42
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
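For the example lists on this slide, recall can be worked out as below. This is a sketch assuming recall is the fraction of the evaluator's labels that appear among the system's top k labels; `recall_at_k` is a hypothetical name.

```python
from typing import Sequence


def recall_at_k(system_ranked: Sequence[str], evaluator_labels: Sequence[str],
                k: int = 3) -> float:
    """Fraction of the evaluator's labels found in the system's top k labels."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    found = relevant.intersection(system_ranked[:k])
    return len(found) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
r = recall_at_k(system, evaluator)
print(round(r, 3))  # 0.667
```

Two of the three evaluator labels (President, Politician) are recovered, so this column clears the 0.6 threshold.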
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate label for a column of cities, but it is still a correct one
• Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall Accuracy: 76.92%
44
Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy: 66.12%
46
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore   MD   S.Rawlings   …
For Baltimore, DBpedia says
dbpprop:LeaderName – S.Dixon
dbpprop:LeaderName – S.Dixon, S.Rawlings
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap – querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC 2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538–549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web …
- The World Wide Web … (2)
- Web of Data – The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg … No Chicken
- Automate the process
- On the Semantic Web …
- … elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the Knowledge–Base
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- “To Link or not to Link …”
- “To Link or not to Link …” (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- Automatic/Semi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
Automate the process
• We need systems that can generate data from existing sources
• Not practical for humans to encode all this into RDF manually
14
On the Semantic Web …
• Mapping relational databases to RDF [W3C working group – RDB2RDF]
• Mapping spreadsheets to RDF [RDF123, XLWrap]
• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data
… elsewhere
• Learning to index tables to improve search experience (Cafarella et al., 2008)
• Expanding attributes (columns) of web tables (Lin et al., 2010)
• Interpreting web tables to answer complex search queries over the web tables (Limaye et al., 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
[Figure: column "City" with cells Baltimore, Boston, New York; diagram labels: Instance, Type, Class Type for the column]
Querying the Knowledge–Base
Column: City
Baltimore: 1. Baltimore 2. Baltimore County 3. John Baltimore
Boston: 1. Boston 2. Boston_(band) 3. Boston_University
New York: 1. New_York_City 2. New_York 3. New_York_(album)
21
Types
dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …
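The step on this slide, gathering the types of every top-ranked result into a pool of candidate classes for the column, could be sketched as follows. The function name is hypothetical, and the per-entity type assignments in the toy data are illustrative rather than taken from a live DBpedia query.

```python
from typing import Dict, List, Set


def candidate_classes(top_results: Dict[str, List[str]],
                      entity_types: Dict[str, Set[str]]) -> Set[str]:
    """Union of the types of all top-N results returned for the cells of a column."""
    classes: Set[str] = set()
    for results in top_results.values():
        for entity in results:
            classes |= entity_types.get(entity, set())
    return classes


# Toy data for one cell; the type assignments are illustrative.
top_results = {"Baltimore": ["Baltimore", "Baltimore County", "John Baltimore"]}
entity_types = {
    "Baltimore": {"dbpedia-owl:City", "dbpedia-owl:Place"},
    "Baltimore County": {"dbpedia-owl:AdministrativeRegion"},
    "John Baltimore": {"yago:AmericanConductors", "yago:LivingPeople"},
}
classes = candidate_classes(top_results, entity_types)
print(sorted(classes))
```

Each candidate class is then paired with each cell and scored, so a class that fits only one stray result (such as a band or a person) tends to lose to one shared across the column.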
bull Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
8076
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 2 (Recall)
Recall gt 06 (75 )
42
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 3 (Correctness)
bull Evaluated whether our predicted class labels were ldquofair and correctrdquo
bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a
correct label for column of cities
bull Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
bull A category-wise breakdown for class label correctnessOverall
Accuracy 7692
44
Column ndash NationalityPrediction ndash MilitaryConflict
Column ndash Birth PlacePrediction ndash PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy 6612
46
Relation between columns
bull Idea ndash Ask human evaluators to identify relations between columns in a given table
bull Pilot Experiment ndash Asked three evaluators to annotate five random tables from our dataset
bull Evaluators identified 20 relations
bull Our accuracy ndash 5 out of 20 (25 ) were correct
47
Future Work
48
Current
AutomaticSemi-automatic template learning
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Automate the process
• We need systems that can generate data from existing sources
• Not practical for humans to encode all this into RDF manually
On the Semantic Web …
• Mapping relational databases to RDF [W3C working group – RDB2RDF]
• Mapping spreadsheets to RDF [RDF123, XLWrap]
• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data
… elsewhere
• Learning to index tables to improve search experience (Cafarella et al., 2008)
• Expanding attributes (columns) of web tables (Lin et al., 2010)
• Interpreting web tables to answer complex search queries over the web tables (Limaye et al., 2010)
Interpreting a Table
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Predicting Class Labels for column
[Figure: a column headed City with the cells Baltimore, Boston and New York; diagram labels: Instance, Type, Class Type for the column]
Querying the Knowledge-Base
Column: City
Baltimore: 1. Baltimore 2. Baltimore County 3. John Baltimore
Boston: 1. Boston 2. Boston_(band) 3. Boston_University
New York: 1. New_York_City 2. New_York 3. New_York_(album)
Types (Baltimore results): dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
Types (Boston results): dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …
…
Scoring the classes
Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …
[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …
E.g. processing class "dbpedia-owl:City"
String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]
Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
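The scoring formula on this slide can be sketched in a few lines of Python. This is only an illustration of the arithmetic shown: the function name `pair_score` and the parameter `max_page_rank` are hypothetical, and the value 7 is taken from the 6/7 in the worked example (the slide does not say what the 7 denotes; it is assumed here to be the normalizing maximum).

```python
def pair_score(rank: int, page_rank: float, w: float = 0.25,
               max_page_rank: float = 7.0) -> float:
    """Score one (cell string, class) pair from the rank R and the
    PageRank PR of the knowledge-base result that carries the class."""
    if rank < 1:
        raise ValueError("rank starts at 1")
    # Assumption: PageRank is normalized by dividing by max_page_rank.
    normalized_pr = page_rank / max_page_rank
    return w * (1.0 / rank) + (1.0 - w) * normalized_pr


# Worked example from the slide: Baltimore is the rank-1 result with PR = 6.
baltimore_city = pair_score(rank=1, page_rank=6)
print(round(baltimore_city, 4))  # 0.8929 (the slide shows 0.892)
```

How the per-cell scores are combined into one score per class for the whole column is not stated on the slide, so the sketch stops at a single pair.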
Approach
1. Table cell + column header + row data + column type
2. Requery KB with predicted class labels as additional evidence
3. Generate a feature vector for the top N results of the query
4. Classifier ranks the entities within the set of possible results
5. Select the highest ranked entity
6. Classifier decides whether to link or not
7. Link to "NIL" or link to the top ranked instance
Learning to Rank
• We trained an SVMrank classifier, which learnt to rank entities within a given set
Feature Vector
Similarity Measures
• Levenshtein distance
• Dice Score
Popularity Measures
• Wikitology Score
• PageRank
• Page Length
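The two similarity measures named on this slide can be illustrated as follows. This is a minimal sketch, not the talk's implementation: the slide does not say how the Dice score was computed, so character bigrams are assumed here, and the helper names (`levenshtein`, `bigrams`, `dice_score`, `similarity_features`) are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(s: str) -> set:
    """Set of lower-cased character bigrams (assumed tokenization)."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over bigram sets: 2 * |A & B| / (|A| + |B|)."""
    x, y = bigrams(a), bigrams(b)
    if not x and not y:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def similarity_features(cell: str, entity_label: str) -> list:
    """Similarity part of a feature vector for one (cell, candidate) pair."""
    return [levenshtein(cell.lower(), entity_label.lower()),
            dice_score(cell, entity_label)]
```

For example, `similarity_features("Baltimore", "Baltimore County")` gives an edit distance of 7 and a Dice score below 1, whereas an exact match gives 0 and 1.0. The popularity measures (Wikitology score, PageRank, page length) come from the knowledge base and are not reproduced here.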
"To Link or not to Link …"
• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer
• We trained an SVM classifier to determine whether to link to the top ranked entity or not
• The feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
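The input to the link-or-not classifier described here can be assembled as in the sketch below. The function name `link_decision_features` and the data layout are hypothetical; the behaviour when only one candidate exists (a score difference of 0.0) is an assumption, since the slides do not cover that case.

```python
from typing import List, Sequence, Tuple


def link_decision_features(
    ranked: Sequence[Tuple[List[float], float]]
) -> List[float]:
    """Build the feature vector for the link / NIL decision.

    ranked: (feature_vector, svmrank_score) pairs, best candidate first.
    """
    if not ranked:
        raise ValueError("need at least one candidate")
    top_features, top_score = ranked[0]
    # Assumption: with no runner-up, the score difference is set to 0.0.
    margin = top_score - ranked[1][1] if len(ranked) > 1 else 0.0
    # Top entity's own features + its SVMrank score + gap to the runner-up.
    return list(top_features) + [top_score, margin]


candidates = [([1.0, 0.8], 2.5), ([3.0, 0.4], 1.0)]
print(link_decision_features(candidates))  # [1.0, 0.8, 2.5, 1.5]
```

A large gap between the top two scores suggests a confident ranking, which is presumably why it is useful evidence for the binary classifier.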
Relation between columns
City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital: Score 1
dbonto:LargestCity: Score 3
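The scores on this slide are consistent with a simple vote: each row pair contributes one point to every relation that holds between its two entities in the knowledge base. A minimal sketch of that counting, with the hypothetical helper `score_relations` and the relations per row copied from the slide rather than queried from DBpedia:

```python
from collections import Counter


def score_relations(row_relations):
    """Count, for each candidate relation, the number of rows it holds for."""
    votes = Counter()
    for relations in row_relations:
        votes.update(set(relations))  # one vote per relation per row
    return votes


# Relations found between the State and City entity of each row (from the slide).
rows = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}
scores = score_relations(rows.values())
best_relation, best_score = scores.most_common(1)[0]
print(best_relation, best_score)  # dbonto:LargestCity 3
```

The highest-scoring candidate is then taken as the relation between the two columns.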
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City
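To make the output format concrete, here is a small sketch that serializes linked cells and a discovered relation as RDF text. It is an assumption-laden illustration: `table_to_turtle` is a hypothetical helper, it writes plain Turtle (subject first) rather than the "is … of" form used on the slide, it does no escaping of quotes inside labels, and the Maryland/Baltimore relation triple is taken from the earlier column-relation example.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def table_to_turtle(cells, relations):
    """cells: (label, entity, class) tuples; relations: (subject, property, object)."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in PREFIXES.items()]
    lines.append("")
    for label, entity, cls in cells:
        # Each linked cell yields a label triple and a type triple.
        lines.append(f'{entity} rdfs:label "{label}"@en ;')
        lines.append(f"    a {cls} .")
    for subject, prop, obj in relations:
        lines.append(f"{subject} {prop} {obj} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    cells=[("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City"),
           ("MD", "dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")],
    relations=[("dbpedia:Maryland", "dbpprop:LargestCity", "dbpedia:Baltimore")],
)
print(turtle)
```

In practice an RDF library would be a safer choice than string formatting, since it handles escaping and validation.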
Results
Dataset summary
Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers
Evaluation for class label predictions
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric – Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
80.76 %
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
Evaluation 2 (Recall)
Recall > 0.6 (75 %)
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92 %
Column – Nationality; Prediction – MilitaryConflict
Column – Birth Place; Prediction – PopulatedPlace
Evaluation for linking table cells to entities
Category-wise accuracy for linking table cells
Overall accuracy: 66.12 %
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25 %) were correct
Future Work
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)
hellip elsewhere
bull Learning to index tables to improve search experience (Cafarella et al 2008)
bull Expanding attributes (columns) of web tables (Lin et al 2010)
bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
80.76%
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
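As a rough illustration of the metric, the sketch below computes average precision for one column and the mean over several columns. It treats the evaluator's labels as an unordered set of relevant labels, which is an assumption: the slides do not state which variant of MAP was used, and the function names are ours.

```python
def average_precision(system_ranked, relevant_labels):
    """Average precision of one ranked list against a set of relevant labels."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / position  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(columns):
    """Mean of the per-column average precision; `columns` holds (system, evaluator) pairs."""
    if not columns:
        return 0.0
    return sum(average_precision(s, e) for s, e in columns) / len(columns)


# Example lists from the slide.
system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 4))
```

Here "Politician" is found at position 2 and "President" at position 3, so the sum of precisions is 1/2 + 2/3, divided by the three relevant labels.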
Evaluation 2 (Recall)
Recall > 0.6 (75%)
42
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
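For the example lists, recall can be sketched as the fraction of the evaluator's labels that also appear in the system's list. This is a minimal Python sketch; the `recall` helper is an illustrative name, and ignoring rank order is an assumption on our part.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also returned."""
    evaluator = set(evaluator_labels)
    if not evaluator:
        return 0.0
    return len(evaluator & set(system_labels)) / len(evaluator)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
column_recall = recall(system, evaluator)
print(round(column_recall, 3))
```

Two of the three evaluator labels are recovered, so this column would clear the 0.6 threshold.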
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate but is still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
44
Column – Nationality; Prediction – MilitaryConflict
Column – Birth Place; Prediction – PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
46
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web …
- The World Wide Web … (2)
- Web of Data – The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg … No Chicken
- Automate the process
- On the Semantic Web …
- … elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the Knowledge–Base
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- "To Link or not to Link …"
- "To Link or not to Link …" (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- Automatic/Semi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
Interpreting a Table
17
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the Knowledge–Base
City
Baltimore: 1. Baltimore, 2. Baltimore County, 3. John Baltimore
Boston: 1. Boston, 2. Boston_(band), 3. Boston_University
New York: 1. New_York_City, 2. New_York, 3. New_York_(album)
21
Types
dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …
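One way to picture this step: every cell string is sent to the knowledge base, and the types of the top-ranked results are pooled into the set of candidate classes for the column. The sketch below is only an illustration. The `kb_results` structure and the `candidate_classes` helper are hypothetical, and the assignment of types to the Boston results is assumed for the example.

```python
# Hypothetical top results per cell string, each with its list of types.
# The Baltimore types follow the slides; the Boston types are assumed.
kb_results = {
    "Baltimore": [
        ("Baltimore", ["dbpedia-owl:City", "dbpedia-owl:Place"]),
        ("Baltimore County", ["dbpedia-owl:AdministrativeRegion"]),
        ("John Baltimore", ["yago:AmericanConductors", "yago:LivingPeople"]),
    ],
    "Boston": [
        ("Boston", ["dbpedia-owl:Place", "dbpedia-owl:PopulatedPlace"]),
        ("Boston_(band)", ["dbpedia-owl:Band", "dbpedia-owl:Organisation"]),
    ],
}


def candidate_classes(results):
    """Pool the types of all returned entities into one sorted candidate list."""
    classes = set()
    for entities in results.values():
        for _entity, types in entities:
            classes.update(types)
    return sorted(classes)


classes = candidate_classes(kb_results)
print(classes)
```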
Scoring the classes
Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …
[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …
E.g. processing class "dbpedia-owl:City"
String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]
Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1 / 1) + (0.75 × 6 / 7) = 0.892
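The worked example can be checked with a few lines of Python. This is a sketch of the formula as written on the slide; the `pair_score` name is ours, and normalizing PageRank by dividing by a maximum value of 7 is an assumption read off the "6 / 7" term.

```python
def pair_score(rank, page_rank, max_page_rank=7, w=0.25):
    """Score = w * (1 / R) + (1 - w) * (normalized PageRank)."""
    normalized_page_rank = page_rank / max_page_rank  # assumed normalization
    return w * (1.0 / rank) + (1.0 - w) * normalized_page_rank


# [Baltimore, dbpedia-owl:City]: the matching entity is at rank R = 1 with PR = 6.
score = pair_score(rank=1, page_rank=6)
print(score)  # about 0.8929; the slide shows this truncated to 0.892
```

With w = 0.25 the popularity term dominates, so a well-known entity at a lower rank can still contribute a high score to its class.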
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data + Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set of possible results
Select the highest ranked entity
Classifier decides whether to link or not
Link to "NIL" / Link to the top ranked instance
24
Learning to Rank
• We trained an SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
• Levenshtein distance
• Dice Score
Popularity Measures
• Wikitology Score
• PageRank
• Page Length
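The two similarity measures can be sketched in plain Python as follows. The Levenshtein function is the standard dynamic-programming edit distance. The Dice score is computed over sets of character bigrams, which is an assumption, since the slides do not say which units were compared.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]


def dice_score(a, b):
    """Dice coefficient over character-bigram sets (assumed tokenization)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        return 1.0 if a == b else 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))


print(levenshtein("Baltimore", "Baltimore County"))
print(dice_score("Boston", "Boston_University"))
```

Both values would be computed between the table cell string and each candidate entity's label, then placed in that candidate's feature vector alongside the popularity measures.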
25
"To Link or not to Link …"
• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer
• We trained an SVM classifier which would determine whether to link to the top one or not
26
"To Link or not to Link …"
• Feature vector included the feature vector of the top ranked entity and two additional features
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
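Assembling the input for the second classifier could look like the sketch below. The `link_decision_features` helper and the (features, score) tuple layout are hypothetical, and using a runner-up score of 0.0 when only one candidate exists is an assumption that the slides do not address.

```python
def link_decision_features(ranked_candidates):
    """Build the link / no-link feature vector.

    `ranked_candidates` is a list of (feature_vector, svmrank_score) pairs,
    sorted by score in descending order.
    """
    top_features, top_score = ranked_candidates[0]
    # Assumed fallback when there is no second candidate.
    runner_up_score = ranked_candidates[1][1] if len(ranked_candidates) > 1 else 0.0
    return list(top_features) + [top_score, top_score - runner_up_score]


candidates = [([0.0, 1.0], 2.5), ([3.0, 0.4], 1.0)]
features = link_decision_features(candidates)
print(features)
```

A large gap between the top two scores suggests an unambiguous match, while a small gap or a low top score is evidence for linking to "NIL".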
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City | State
Baltimore | Maryland
Boston | Massachusetts
New York | New York
29
Relation between columns
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
bull Compared the systemrsquos ranked list of labels against a human ranked list of labels
bull Metric - Mean Average Precision (MAP)
bull Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
8076
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 2 (Recall)
Recall gt 06 (75 )
42
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 3 (Correctness)
bull Evaluated whether our predicted class labels were ldquofair and correctrdquo
bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a
correct label for column of cities
bull Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
bull A category-wise breakdown for class label correctnessOverall
Accuracy 7692
44
Column ndash NationalityPrediction ndash MilitaryConflict
Column ndash Birth PlacePrediction ndash PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy 6612
46
Relation between columns
bull Idea ndash Ask human evaluators to identify relations between columns in a given table
bull Pilot Experiment ndash Asked three evaluators to annotate five random tables from our dataset
bull Evaluators identified 20 relations
bull Our accuracy ndash 5 out of 20 (25 ) were correct
47
Future Work
48
Current
AutomaticSemi-automatic template learning
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
T2LD Framework
18
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
bull Compared the systemrsquos ranked list of labels against a human ranked list of labels
bull Metric - Mean Average Precision (MAP)
bull Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
8076
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 2 (Recall)
Recall gt 06 (75 )
42
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 3 (Correctness)
bull Evaluated whether our predicted class labels were ldquofair and correctrdquo
bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a
correct label for column of cities
bull Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
bull A category-wise breakdown for class label correctnessOverall
Accuracy 7692
44
Column ndash NationalityPrediction ndash MilitaryConflict
Column ndash Birth PlacePrediction ndash PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy 6612
46
Relation between columns
bull Idea ndash Ask human evaluators to identify relations between columns in a given table
bull Pilot Experiment ndash Asked three evaluators to annotate five random tables from our dataset
bull Evaluators identified 20 relations
bull Our accuracy ndash 5 out of 20 (25 ) were correct
47
Future Work
48
Current
AutomaticSemi-automatic template learning
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
19
Predicting Class Labels for column
City
Baltimore
Boston
New York
Type
Instance
Type
Type
Type
Class Type for the column
Querying the KnowledgendashBase
City
Baltimore
Boston
New York
1Baltimore2 Baltimore County3 John Baltimore
1Boston2 Boston_(band)3 Boston_University
1 New_York_City2 New_York3 New_York_(album)
21
dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople
Types
dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip
helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip
Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip
[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip
Eg Processing class ndash ldquodbpedia-owlCityrdquo
String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]
Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
bull Compared the systemrsquos ranked list of labels against a human ranked list of labels
bull Metric - Mean Average Precision (MAP)
bull Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
8076
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 2 (Recall)
Recall gt 06 (75 )
42
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 3 (Correctness)
bull Evaluated whether our predicted class labels were ldquofair and correctrdquo
bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a
correct label for column of cities
bull Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
bull A category-wise breakdown for class label correctnessOverall
Accuracy 7692
44
Column ndash NationalityPrediction ndash MilitaryConflict
Column ndash Birth PlacePrediction ndash PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy 6612
46
Relation between columns
bull Idea ndash Ask human evaluators to identify relations between columns in a given table
bull Pilot Experiment ndash Asked three evaluators to annotate five random tables from our dataset
bull Evaluators identified 20 relations
bull Our accuracy ndash 5 out of 20 (25 ) were correct
47
Future Work
48
Current
AutomaticSemi-automatic template learning
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- Automatic/Semi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
Predicting Class Labels for column
Column header: City
Cells: Baltimore, Boston, New York
[Diagram labels: Type, Instance, Class Type for the column]
Querying the Knowledge–Base
Column: City
Baltimore: 1. Baltimore 2. Baltimore County 3. John Baltimore
Boston: 1. Boston 2. Boston_(band) 3. Boston_University
New York: 1. New_York_City 2. New_York 3. New_York_(album)
Types: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
Types: dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …
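The querying step above can be sketched in a few lines of Python. This is only an illustration, not the T2LD code: the function names are hypothetical, the Baltimore types follow the scoring slide, and the way the Boston types are assigned to individual results is an assumption made for the example.

```python
# Top-3 knowledge-base results per cell, each paired with its types.
# Baltimore follows the slides; the Boston type assignment is assumed.
QUERY_RESULTS = {
    "Baltimore": [
        ("Baltimore", ["dbpedia-owl:City", "dbpedia-owl:Place"]),
        ("Baltimore County", ["dbpedia-owl:AdministrativeRegion"]),
        ("John Baltimore", ["yago:AmericanConductors", "yago:LivingPeople"]),
    ],
    "Boston": [
        ("Boston", ["dbpedia-owl:Place", "dbpedia-owl:PopulatedPlace"]),
        ("Boston_(band)", ["dbpedia-owl:Band", "dbpedia-owl:Organisation"]),
        ("Boston_University", []),
    ],
}


def candidate_classes(results_per_cell):
    """Collect every type seen in any result, in first-seen order."""
    seen = []
    for entities in results_per_cell.values():
        for _entity, types in entities:
            for type_name in types:
                if type_name not in seen:
                    seen.append(type_name)
    return seen


def cell_class_pairs(results_per_cell):
    """Pair every cell string with every candidate class, grouped by class."""
    classes = candidate_classes(results_per_cell)
    return [(cell, cls) for cls in classes for cell in results_per_cell]


classes = candidate_classes(QUERY_RESULTS)
pairs = cell_class_pairs(QUERY_RESULTS)
```

Each pair produced here is what the next slide scores.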
Scoring the classes
Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …
Pairs to score: [Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …
E.g. processing class “dbpedia-owl:City” for the string “Baltimore”:
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]
Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
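The worked example above can be checked with a short Python sketch. The function and parameter names are hypothetical. The divisor 7 is taken from the slide's arithmetic; what it stands for (for instance a maximum PageRank value) is not stated and is treated here as an assumption.

```python
def pair_score(rank, page_rank, w=0.25, page_rank_divisor=7):
    """Score = w * (1 / R) + (1 - w) * (normalized PageRank).

    rank is the position R of the entity in the query results;
    page_rank_divisor is the assumed normalization constant.
    """
    return w * (1.0 / rank) + (1 - w) * (page_rank / page_rank_divisor)


# [Baltimore, dbpedia-owl:City]: the entity at R = 1 has PR = 6.
score = pair_score(rank=1, page_rank=6)
```

The value comes out at about 0.8929, which the slide shows as 0.892.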
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Approach
1. Input: Table Cell + Column Header + Row Data + Column Type
2. Requery KB with predicted class labels as additional evidence
3. Generate a feature vector for the top N results of the query
4. Classifier ranks the entities within the set of possible results
5. Select the highest ranked entity
6. Classifier decides whether to link or not
7. Link to “NIL”, or link to the top ranked instance
Learning to Rank
• We trained an SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures:
• Levenshtein distance
• Dice Score
Popularity Measures:
• Wikitology Score
• PageRank
• Page Length
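The two similarity measures named in the feature vector can be sketched as follows. This is a plain-Python illustration, not the code used in T2LD. The slides do not say which units the Dice score is computed over, so lower-cased character bigrams are an assumption.

```python
def levenshtein(a, b):
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(text):
    """Set of lower-cased character bigrams (assumed unit for the Dice score)."""
    text = text.lower()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def dice_score(a, b):
    """Dice coefficient 2 * |A & B| / (|A| + |B|) over bigram sets."""
    set_a, set_b = bigrams(a), bigrams(b)
    if not set_a and not set_b:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2.0 * len(set_a & set_b) / (len(set_a) + len(set_b))


distance = levenshtein("Baltimore", "Baltimore County")
similarity = dice_score("Boston", "Boston")
```

Both values would go into the feature vector of a candidate entity next to the popularity measures.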
“To Link or not to Link …”
• The highest ranked entity may not be the correct one to link to …
– Because the string we are querying may not be in the KB
– Top N results may not include the correct answer
• We trained an SVM classifier which would determine whether to link to the top one or not
“To Link or not to Link …” (2)
• Feature vector included the feature vector of the top ranked entity and two additional features:
– The SVMrank score of the top ranked entity
– The difference in scores between the top two ranked entities
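A minimal sketch of how the input to the link-or-not classifier could be assembled from the ranker's output is given below. The function name, the sample feature values and the scores are all hypothetical. The handling of a single-candidate list (the missing second score is treated as 0) is an assumption, and the SVM itself is not shown.

```python
def link_decision_features(ranked):
    """Build the feature vector for the link / NIL decision.

    ranked: list of (entity, feature_vector, svmrank_score), best first.
    """
    _top_entity, top_features, top_score = ranked[0]
    # Assumption: with a single candidate, the runner-up score counts as 0.
    second_score = ranked[1][2] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, top_score - second_score]


# Hypothetical ranker output for the cell string "Baltimore".
ranked_candidates = [
    ("dbpedia:Baltimore", [0, 1.0, 0.8], 2.5),
    ("dbpedia:Baltimore_County", [7, 0.7, 0.4], 1.0),
]
features = link_decision_features(ranked_candidates)
```

A large gap between the top two scores suggests an unambiguous match, which is presumably why the difference is included as a feature.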
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Relation between columns
City | State
Baltimore | Maryland
Boston | Massachusetts
New York | New York
Relation between columns (2)
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital – Score: 1
dbonto:LargestCity – Score: 3
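The scores on this slide are consistent with counting, for each candidate relation, the rows in which it holds. The sketch below assumes that reading. Choosing the top-scoring candidate as the column relation is likewise an assumption, and the names are hypothetical.

```python
from collections import Counter

# Relations found in the knowledge base for each (State, City) row pair,
# as listed on the slide.
ROW_RELATIONS = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}


def score_relations(row_relations):
    """Give each candidate relation one point per row in which it holds."""
    scores = Counter()
    for relations in row_relations.values():
        scores.update(set(relations))  # one vote per relation per row
    return scores


relation_scores = score_relations(ROW_RELATIONS)
best_relation, best_score = relation_scores.most_common(1)[0]
```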
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
Reading the statements:
• "City"@en is rdfs:label of dbpedia-owl:City – “City” is the common human name for the class dbpedia-owl:City
• dbpedia:Baltimore a dbpedia-owl:City – dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City
• dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion – the subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
• dbpprop:LargestCity rdfs:range dbpedia-owl:City – the objects of the triples using the property have to be instances of dbpedia-owl:City
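As a rough illustration of this output step, the sketch below serializes one interpreted row as text. It is not the T2LD serializer: the function and variable names are hypothetical, and it writes ordinary forward triples (`subject rdfs:label "…"@en .`) instead of the `is … of` form on the slide, which state the same facts. It does no escaping of quotes inside labels.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def table_to_turtle(columns, cells, relation):
    """Serialize column classes, linked cells and one column relation.

    columns:  list of (header text, class)
    cells:    list of (cell text, linked entity, class)
    relation: (property, domain class, range class)
    """
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    for header, cls in columns:
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for text, entity, cls in cells:
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {cls} .")
    prop, domain, range_cls = relation
    lines.append(f"{prop} rdfs:domain {domain} .")
    lines.append(f"{prop} rdfs:range {range_cls} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    columns=[("City", "dbpedia-owl:City"),
             ("State", "dbpedia-owl:AdministrativeRegion")],
    cells=[("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City"),
           ("MD", "dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")],
    relation=("dbpprop:LargestCity",
              "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
```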
Results
Dataset summary
Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers.
Dataset summary (2)
Dataset summary (3)
Evaluation for class label predictions
Evaluation 1 (MAP)
• Compared the system’s ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
Evaluation 1 (MAP) (2)
80.76%
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
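For readers unfamiliar with the metric, here is a minimal sketch of average precision applied to the example lists above. It uses the standard Information Retrieval definition and treats the evaluator's labels as an unordered set of relevant labels. The slides do not spell out the exact variant used, so this should not be read as reproducing the reported figure.

```python
def average_precision(system_ranked, relevant_labels):
    """Mean of precision@k over the ranks k where a relevant label appears,
    divided by the number of relevant labels."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant)


def mean_average_precision(list_pairs):
    """Average of average_precision over (system, evaluator) list pairs."""
    return sum(average_precision(s, e) for s, e in list_pairs) / len(list_pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
```

Here the relevant labels appear at ranks 2 and 3, so the sum is 1/2 + 2/3, divided by the three evaluator labels.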
Evaluation 2 (Recall)
Recall > 0.6 (75%)
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
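Recall for the same example lists can be sketched as the fraction of the evaluator's labels that also appear in the system's list. This is the usual set-based definition and is assumed here; the function name is hypothetical.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of evaluator labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


example_recall = recall(
    ["Person", "Politician", "President"],
    ["President", "Politician", "OfficeHolder"],
)
```

Two of the three evaluator labels are found, so this example clears the 0.6 threshold on the slide.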
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were “fair and correct”
• A class label may not be the most accurate one but may still be correct
– E.g. dbpedia:PopulatedPlace is not the most accurate but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
Evaluation 3 (Correctness) (2)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
Column: Nationality, Prediction: MilitaryConflict
Column: Birth Place, Prediction: PopulatedPlace
Evaluation for linking table cells to entities
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
Relation between columns (3)
• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25%) were correct
Future Work
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Table row: Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
Dbpprop:LeaderName – S. Dixon
Dbpprop:LeaderName – S. Dixon, S. Rawlings
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
23
Approach
Table Cell + Column Header + Row Data
+ Column Type
Requery KB with predicted class labels as additional evidence
Generate a feature vector for the top N results of the query
Classifier ranks the entities within the set
of possible results
Select the highest ranked entity
Classifier decides whether to link or
not
Link to ldquoNILrdquoLink to the top
ranked instance
24
Learning to Rank
bull We trained a SVMrank classifier which learnt to rank entities within a given set
Feature Vector
Similarity Measures
Popularity Measures
bull Levenshtein distancebull Dice Score
bull Wikitology Scorebull PageRankbull Page Length
25
ldquoTo Link or not to Link hellip rsquorsquo
bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in
the KBndash Top N results may not include the correct answer
bull We trained an SVM classifier which would determine whether to link to the top one or not
26
ldquoTo Link or not to Link hellip rsquorsquo
bull Feature vector included the feature vector of the top ranked entity and additional two features ndash
ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two
ranked entities
27
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City
Baltimore
Boston
New York
State
Maryland
Massachusetts
New York
29
Relation between columns
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
dbontoCapital dbontoLargestCity
Candidate relations
30
Scoring the relations
Maryland - Baltimore
Massachusetts - Boston
New York - New York
dbontoLargestCity
dbontoLargestCitydbontoCapital
dbontoLargestCity
Candidates dbontoCapital
dbontoLargestCity
dbontoCapital Score0
dbontoCapital Score1
dbontoLargestCity Score3
31
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion
ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity
ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity
dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity
dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-
owlAdminstrativeRegion
dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity
34
Results
35
Dataset summary
Number of Tables 15
Total Number of rows 199
Total Number of columns 56 (52)
Total Number of entities 639 (611)
The number in the brackets indicates excluding columns that contained numbers
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
bull Compared the systemrsquos ranked list of labels against a human ranked list of labels
bull Metric - Mean Average Precision (MAP)
bull Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
8076
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 2 (Recall)
Recall gt 06 (75 )
42
System Ranked1 Person2 Politician3 President
Evaluator Ranked1 President2 Politician3 OfficeHolder
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  - E.g. dbpedia:PopulatedPlace is not the most accurate label, but it is still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
44
Column: Nationality; Prediction: MilitaryConflict
Column: Birth Place; Prediction: PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
46
Relation between columns
• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25%) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
Dbpprop:LeaderName - S. Dixon
Dbpprop:LeaderName - S. Dixon S. Rawlings
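One way to picture this step is a lookup that separates facts the knowledge base already holds from facts a table asserts but the knowledge base lacks. The sketch below is purely illustrative: the dictionary kb, the helper check_fact, and the two status strings are hypothetical, and the values are copied from the slide rather than queried from DBpedia.

```python
def check_fact(kb, subject, prop, value):
    """Return 'confirmed' if the KB already holds the value, otherwise 'candidate'."""
    known = kb.get((subject, prop), set())
    return "confirmed" if value in known else "candidate"


# Hypothetical snapshot of what the KB says about Baltimore's leader.
kb = {("dbpedia:Baltimore", "dbpprop:LeaderName"): {"S. Dixon"}}

# The table row asserts S. Rawlings, which the KB does not hold yet.
before = check_fact(kb, "dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")

# After the new value is added to the KB, the same table fact is confirmed.
kb[("dbpedia:Baltimore", "dbpprop:LeaderName")].add("S. Rawlings")
after = check_fact(kb, "dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")
```

A real system would also need to decide whether a candidate fact is trustworthy enough to add; that decision is outside this sketch.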
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
28
Relation between columns
City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York
29
Relation between columns
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidate relations: dbonto:Capital, dbonto:LargestCity
30
Scoring the relations
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity
Candidates: dbonto:Capital, dbonto:LargestCity
dbonto:Capital Score: 0, then 1
dbonto:LargestCity Score: 3
31
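The scoring on this slide can be read as a simple vote: each (State, City) row pair contributes one point to every candidate relation found for it. The following is a minimal Python sketch under that assumption. The per-row candidates are hard-coded from the slide instead of being fetched from DBpedia, and the name `score_relations` is hypothetical, not part of T2LD.

```python
from collections import Counter

# Relations found in the knowledge base for each (state, city) row pair.
# Hard-coded from the slide; a real system would obtain these by querying DBpedia.
candidates_per_row = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}

def score_relations(candidates_per_row):
    """Count how many row pairs support each candidate relation."""
    scores = Counter()
    for relations in candidates_per_row.values():
        # set() guards against counting a relation twice for the same row
        scores.update(set(relations))
    return scores

scores = score_relations(candidates_per_row)
best_relation, best_score = scores.most_common(1)[0]
print(dict(scores))
print(best_relation, best_score)
```

With the three rows from the slide, the relation supported by the most rows is chosen as the relation between the two columns.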
T2LD Framework
Predict Class for Columns
Linking the table cells
Identify and Discover relations
32
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human-readable name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is an instance of the class dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City
34
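As a rough illustration of how an interpreted table could be written out as triples like the ones above, here is a small Python sketch. It assumes the column classes, the cell-to-entity links and the relation between the columns have already been determined. The function `table_to_triples` and its parameter names are hypothetical, and the output uses plain Turtle statements (`subject rdfs:label "..."@en`) rather than the `is ... of` form shown on the slide. The final triple per row, linking the two entities through the relation, is not shown on the slide and is included here as an assumption about what the annotation implies.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}

def table_to_triples(headers, column_classes, rows, links, relation):
    """Serialize an interpreted table as a list of Turtle lines.

    headers        -- column header strings, e.g. ["City", "State"]
    column_classes -- predicted class for each column (same order as headers)
    rows           -- list of rows, each a list of cell strings
    links          -- maps a cell string to the entity it was linked to
    relation       -- (subject column index, property, object column index)
    """
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    # Column headers become labels of the predicted classes.
    for header, cls in zip(headers, column_classes):
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    subj_col, prop, obj_col = relation
    # Domain and range of the relation follow from the column classes.
    lines.append(f"{prop} rdfs:domain {column_classes[subj_col]} .")
    lines.append(f"{prop} rdfs:range {column_classes[obj_col]} .")
    for row in rows:
        for cell, cls in zip(row, column_classes):
            entity = links[cell]
            lines.append(f'{entity} rdfs:label "{cell}"@en .')
            lines.append(f"{entity} a {cls} .")
        # Assumed extra triple: the row's two entities joined by the relation.
        lines.append(f"{links[row[subj_col]]} {prop} {links[row[obj_col]]} .")
    return lines

turtle = table_to_triples(
    headers=["City", "State"],
    column_classes=["dbpedia-owl:City", "dbpedia-owl:AdministrativeRegion"],
    rows=[["Baltimore", "MD"]],
    links={"Baltimore": "dbpedia:Baltimore", "MD": "dbpedia:Maryland"},
    relation=(1, "dbpprop:LargestCity", 0),
)
print("\n".join(turtle))
```

Keying `links` by cell string keeps the sketch short, but it would conflate cells such as the city and the state "New York"; keying by (row, column) position would avoid that.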
Results
35
Dataset summary
Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)
Numbers in parentheses exclude columns that contained numbers.
36
Dataset summary
37
Dataset summary
38
Evaluation for class label predictions
39
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
80.76%
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
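To make the metric concrete, the sketch below computes average precision for one column, treating the evaluator's labels as the relevant set and walking down the system's ranked list. This is the textbook information-retrieval definition and only an assumption about how the comparison was carried out for these slides; the function names are illustrative. MAP is the mean of these per-column values.

```python
def average_precision(system_ranked, relevant):
    """Average precision of a ranked list against a set of relevant labels."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            # precision at this rank = relevant labels seen so far / rank
            precision_sum += hits / rank
    return precision_sum / len(relevant)

def mean_average_precision(columns):
    """columns: list of (system_ranked, relevant) pairs, one per table column."""
    if not columns:
        return 0.0
    return sum(average_precision(s, r) for s, r in columns) / len(columns)

system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 4))  # (1/2 + 2/3) / 3
```

For the example lists, the system's second and third labels are relevant, so the column gets a non-zero average precision even though its top label does not match the evaluator's.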
Evaluation 2 (Recall)
Recall > 0.6 (75%)
42
System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
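Recall for a single column can be sketched in the same way: the fraction of the evaluator's labels that also appear in the system's list. The set-based definition below is the standard one and is assumed rather than taken from the slides; the function name is illustrative.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)

system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
r = recall(system, evaluator)
print(r > 0.6)
```

For the example lists, two of the evaluator's three labels are recovered, giving a recall of 2/3, which is above the 0.6 threshold.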
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  - e.g., dbpedia:PopulatedPlace is not the most accurate but is still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
43
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
44
Column: Nationality; Prediction: MilitaryConflict
Column: Birth Place; Prediction: PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
46
Relation between columns
• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25%) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Table row: Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName: S. Dixon
After adding the fact from the table:
dbpprop:LeaderName: S. Dixon, S. Rawlings
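One way to picture this step is as a comparison between facts read off an interpreted table and facts already in the knowledge base. The sketch below is only an illustration of that idea, not the planned implementation: facts are modelled as (subject, property, value) tuples, the knowledge base is a small in-memory set, and `compare_with_kb` is a hypothetical name.

```python
def compare_with_kb(table_facts, kb_facts):
    """Split table-derived facts into those the KB confirms and those it lacks."""
    confirmed = [fact for fact in table_facts if fact in kb_facts]
    discovered = [fact for fact in table_facts if fact not in kb_facts]
    return confirmed, discovered

# What the knowledge base currently states (taken from the slide's example).
kb = {("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Dixon")}
# What the interpreted table row states.
table = [("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")]

confirmed, discovered = compare_with_kb(table, kb)
# Adding the newly discovered fact yields both leader names for Baltimore.
updated_kb = kb | set(discovered)
print(sorted(value for _, _, value in updated_kb))
```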
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Annotating web tables for the Semantic Web
Table as linked RDF
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .
"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .
"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .
Reading the triples:
"City"@en is rdfs:label of dbpedia-owl:City
  “City” is the common human-readable name for the class dbpedia-owl:City
dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City
dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of triples using the property have to be instances of dbpedia-owl:City
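As a rough illustration of how one interpreted table row could be written out in this form, here is a minimal Python sketch. The function name `row_to_turtle` and its arguments are hypothetical and not part of the talk. The triple linking the state to its largest city is an assumption based on the domain and range shown on the slide. The output uses plain Turtle subject-first statements rather than the N3 "is ... of" form.

```python
# Hypothetical helper: prefixes and property names follow the slide,
# but the function itself is only an illustrative sketch.
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def row_to_turtle(city, city_uri, state, state_uri):
    """Return Turtle text describing one (City, State) table row."""
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines += [
        # Cell value linked to an entity, typed with the column's class label.
        f'dbpedia:{city_uri} rdfs:label "{city}"@en ;',
        "    a dbpedia-owl:City .",
        f'dbpedia:{state_uri} rdfs:label "{state}"@en ;',
        "    a dbpedia-owl:AdministrativeRegion ;",
        # Assumed relation between the two columns (State -> City).
        f"    dbpprop:LargestCity dbpedia:{city_uri} .",
    ]
    return "\n".join(lines)


print(row_to_turtle("Baltimore", "Baltimore", "MD", "Maryland"))
```

The sketch only formats strings; a real pipeline would more likely build the graph with an RDF library and serialize it.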
Results
Dataset summary
Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)
The numbers in brackets exclude columns that contained numbers.
Dataset summary (2)
Dataset summary (3)
Evaluation for class label predictions
Evaluation 1 (MAP)
• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked lists
Evaluation 1 (MAP) (2)
80.76%
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
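To make the metric concrete, the following Python sketch computes average precision for the example lists on this slide and averages it over columns. It assumes the textbook definition (precision summed at each rank where a relevant label appears, divided by the number of relevant labels) and treats the evaluator's labels as the relevant set. The exact variant used in the evaluation is not stated in the slides, so this is an illustration only.

```python
def average_precision(system_ranked, relevant_labels):
    """Average precision of one ranked list against a set of relevant labels."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of average precision over (system_ranked, relevant_labels) pairs."""
    scores = [average_precision(s, r) for s, r in pairs]
    return sum(scores) / len(scores) if scores else 0.0


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]

# Politician hits at rank 2 (1/2), President at rank 3 (2/3):
# (1/2 + 2/3) / 3 = 7/18, about 0.389
print(round(average_precision(system, evaluator), 3))
```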
Evaluation 2 (Recall)
Recall > 0.6 (75%)
System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
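A small Python sketch of the recall check, assuming recall is the share of the evaluator's labels that also appear among the system's labels for a column. The helper names and the sample list of per-column recalls are hypothetical; only the example labels and the 0.6 threshold come from the slide.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also returned."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


def fraction_above(recalls, threshold=0.6):
    """Share of columns whose recall is strictly greater than the threshold."""
    if not recalls:
        return 0.0
    return sum(1 for r in recalls if r > threshold) / len(recalls)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]

# Two of the evaluator's three labels are found: 2/3, which is above 0.6.
print(round(recall(system, evaluator), 3))

# Made-up per-column recalls, only to show how a "recall > 0.6" share is counted.
print(fraction_above([2 / 3, 1 / 3, 1.0, 1.0]))
```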
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were “fair and correct”
• A class label may not be the most accurate one but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct, label for a column of cities
• Three human judges evaluated our predicted class labels
Evaluation 3 (Correctness) (2)
• A category-wise breakdown for class label correctness
Overall accuracy: 76.92%
Column – Nationality; Prediction – MilitaryConflict
Column – Birth Place; Prediction – PopulatedPlace
Evaluation for linking table cells to entities
Category-wise accuracy for linking table cells
Overall accuracy: 66.12%
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct
Future Work
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings
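One way to read this slide is that a value found in a table is compared with what the knowledge base already holds for the same subject and property. The Python sketch below illustrates that reading with a toy in-memory dictionary. The functions `check_fact` and `add_fact` and the dictionary layout are hypothetical, and the slides do not describe how the planned system would actually do this.

```python
def check_fact(kb, subject, prop, table_value):
    """Return 'confirmed' if the KB already holds the value, else 'candidate'."""
    known = kb.get((subject, prop), set())
    return "confirmed" if table_value in known else "candidate"


def add_fact(kb, subject, prop, value):
    """Record a new value for (subject, property) in the toy KB."""
    kb.setdefault((subject, prop), set()).add(value)


# Toy stand-in for DBpedia: it only knows the earlier value for Baltimore.
kb = {("dbpedia:Baltimore", "dbpprop:LeaderName"): {"S. Dixon"}}

# The table row (Baltimore, MD, S. Rawlings) proposes a value the KB lacks.
before = check_fact(kb, "dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")
add_fact(kb, "dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")
after = check_fact(kb, "dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")

print(before, after)
```

A value already in the knowledge base counts as a confirmation of an existing fact, while a missing one is only a candidate new fact until it is accepted.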
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
Conclusion
• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the web of data vision
References
• Han L., Finin T., Parr C., Sachs J., Joshi A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger A., Wöß W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella M.J., Halevy A.Y., Wang Z.D., Wu E., Zhang Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye G., Sarawagi S., Chakrabarti S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)
• Lin C.X., Zhao B., Weninger T., Han J., Liu B.: Entity relation discovery from web tables and links. In: Rappa M., Jones P., Freire J., Chakrabarti S. (eds.) WWW, 1145–1146. ACM (2010)
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web …
- The World Wide Web … (2)
- Web of Data – The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg … No Chicken
- Automate the process
- On the Semantic Web …
- … elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the Knowledge–Base
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- “To Link or not to Link …”
- “To Link or not to Link …” (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- Automatic/Semi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
Evaluation 1 (MAP)
• Compared the system’s ranked list of labels against a human-ranked list of labels
• Metric - Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
40
Evaluation 1 (MAP)
41
80.76%
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
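As a rough illustration of the metric, the sketch below computes average precision for the two ranked lists shown on this slide. It assumes the evaluator's labels are treated as an unordered set of relevant labels and that the sum is divided by the number of relevant labels; the talk does not spell out the exact variant used, and the function names are hypothetical.

```python
def average_precision(system_ranked, relevant_labels):
    """Average precision of a ranked list against a set of relevant labels.

    Assumption: normalise by the total number of relevant labels.
    """
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of the average precisions over (system, evaluator) list pairs."""
    scores = [average_precision(system, evaluator) for system, evaluator in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Lists taken from the slide.
system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]

ap = average_precision(system, evaluator)
print(round(ap, 4))  # 0.3889, i.e. (1/2 + 2/3) / 3
```

MAP for a whole dataset would then be the mean of this value over all columns, which is what mean_average_precision does for a list of such pairs.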
Evaluation 2 (Recall)
Recall > 0.6 (75%)
42
System Ranked: 1. Person 2. Politician 3. President
Evaluator Ranked: 1. President 2. Politician 3. OfficeHolder
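For the example lists on this slide, recall can be sketched as the fraction of the evaluator's labels that also appear in the system's list. This is a minimal sketch under that assumption; the function name is hypothetical and the 0.6 threshold is simply the one quoted on the slide.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also returned."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    found = relevant & set(system_labels)
    return len(found) / len(relevant)


# Lists taken from the slide.
system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]

r = recall(system, evaluator)
print(round(r, 3))  # 0.667: Politician and President found, OfficeHolder missed
print(r > 0.6)      # True
```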
Evaluation 3 (Correctness)
• Evaluated whether our predicted class labels were “fair and correct”
• A class label may not be the most accurate one but may still be correct – e.g., dbpedia:PopulatedPlace is not the most accurate, but it is still a correct label for a column of cities
• Three human judges evaluated our predicted class labels
43
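The difference between a "most accurate" and a merely "correct" label can be illustrated with a small class hierarchy: a label counts as correct if it is the most specific class or one of its superclasses. In the talk this judgement was made by human judges, so the sketch below is only an illustration; the hierarchy fragment and the function names are assumptions, not a verified copy of the DBpedia ontology.

```python
# Illustrative child -> parent links (assumed fragment, not a verified
# snapshot of the DBpedia ontology).
PARENT = {
    "City": "Settlement",
    "Settlement": "PopulatedPlace",
    "PopulatedPlace": "Place",
    "MilitaryConflict": "Event",
}


def ancestors(cls):
    """Return the superclasses of cls, nearest first."""
    chain = []
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def judge(predicted, most_accurate):
    """Classify a predicted label relative to the most specific label."""
    if predicted == most_accurate:
        return "most accurate"
    if predicted in ancestors(most_accurate):
        return "correct"
    return "incorrect"


print(judge("City", "City"))              # most accurate
print(judge("PopulatedPlace", "City"))    # correct
print(judge("MilitaryConflict", "City"))  # incorrect
```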
Evaluation 3 (Correctness)
• A category-wise breakdown for class label correctness
Overall Accuracy: 76.92%
44
Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace
Evaluation for linking table cells to entities
45
Category-wise accuracy for linking table cells
Overall Accuracy: 66.12%
46
Relation between columns
• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct
47
Future Work
48
Current
Automatic/Semi-automatic template learning
Confirming LD Facts
Baltimore, MD, S. Rawlings, …
For Baltimore, DBpedia says:
Dbpprop:LeaderName – S. Dixon
Dbpprop:LeaderName – S. Dixon, S. Rawlings
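The Baltimore example can be sketched as a lookup against the facts already known for an entity: a value read from the table is either confirmed or recorded as a new fact. This is a minimal sketch with an in-memory dictionary standing in for the knowledge base; the function name and data layout are hypothetical and do not describe the T2LD implementation.

```python
from collections import defaultdict


def confirm_or_add(kb, subject, prop, value):
    """Return 'confirmed' if the fact is already known, else add it."""
    values = kb[(subject, prop)]
    if value in values:
        return "confirmed"
    values.append(value)
    return "added"


# Toy stand-in for what DBpedia says about Baltimore on the slide.
kb = defaultdict(list)
kb[("Baltimore", "Dbpprop:LeaderName")].append("S. Dixon")

# The table row contributes S. Rawlings; S. Dixon is already present.
status_new = confirm_or_add(kb, "Baltimore", "Dbpprop:LeaderName", "S. Rawlings")
status_old = confirm_or_add(kb, "Baltimore", "Dbpprop:LeaderName", "S. Dixon")

print(status_new)  # added
print(status_old)  # confirmed
print(kb[("Baltimore", "Dbpprop:LeaderName")])  # ['S. Dixon', 'S. Rawlings']
```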
Discover knowledge relations
• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia-owl:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
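Facts of this kind could be written out as RDF triples. The sketch below expands prefixed names into full URIs and prints N-Triples lines. The dbpedia and dbpedia-owl prefixes follow the ones declared earlier in the talk; the subject local names (for example Howard_County) and the helper names are assumptions for illustration, and the class and property names are copied from the slide without being checked against the ontology.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(qname):
    """Expand a prefixed name such as 'rdf:type' into a bracketed URI."""
    prefix, local = qname.split(":", 1)
    return "<" + PREFIXES[prefix] + local + ">"


def to_ntriple(subject, predicate, obj):
    """Serialise one triple of prefixed names as an N-Triples line."""
    return f"{expand(subject)} {expand(predicate)} {expand(obj)} ."


# Subject local names are assumed for illustration.
triples = [
    ("dbpedia:Inception", "rdf:type", "dbpedia-owl:Movie"),
    ("dbpedia:Howard_County", "rdf:type", "dbpedia-owl:AdministrativeRegion"),
    ("dbpedia:David_Beckham", "dbpedia-owl:Team", "dbpedia:Los_Angeles_Galaxy"),
]

lines = [to_ntriple(*t) for t in triples]
for line in lines:
    print(line)
```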
Conclusion
• There’s a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision
References
• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)
AutomaticSemi-automatic template learning
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Confirming LD Facts
Baltimore MD SRawlings hellipFor Baltimore Dbpedia says
DbppropLeaderName ndash SDixon
DbppropLeaderName ndash SDixon SRawlings
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Discover knowledge relations
bull Inception rdftype dbpedia-owlMovie
bull Howard County rdftype dbpediaAdminstrativeRegion
bull David Beckham dbpedia-owlTeam dbpedia Los_Angeles_Galaxy
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Conclusion
bull Therersquos lot of data that is stored in html tables spreadsheets databases and documents
bull We presented an automatic framework to interpret such data
bull We believe our work will contribute in materializing the web of data vision
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
Referencesbull Han L Finin T Parr C Sachs J Joshi A RDF123 from Spreadsheets to RDF In Seventh
International Semantic Web Conference Springer (2008)
bull Langegger A Wob W Xlwrap - querying and integrating arbitrary spreadsheets with sparql In 8th International Semantic Web Conference (ISWC2009) (2009)
bull Cafarella MJ Halevy AYWang ZDWu E Zhang Y Webtables exploring the power of tables on the web PVLDB 1 (2008) 538 - 549
bull Limaye G Sarawagi S Chakrabarti S Annotating and searching web tables using entities types and relationships In Proc of the 36th Intl Conference on Very Large Databases (VLDB) (2010)
bull Lin C X Zhao BWeninger T Han J and Liu B 2010 Entity relation discovery from web tables and links In Rappa M Jones P Freire J and Chakrabarti S eds WWW 1145ndash1146 ACM
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-
- Using linked data to interpret tables
- Interpreting a table
- Slide 3
- Slide 4
- Overview
- Introduction
- The World Wide Web hellip
- The World Wide Web hellip (2)
- Web of Data ndash The Semantic Web
- Linked Data
- Related Work and Motivation
- Slide 12
- Chicken Egg hellip No Chicken
- Automate the process
- On the Semantic Web hellip
- hellip elsewhere
- Interpreting a Table
- T2LD Framework
- Slide 19
- Predicting Class Labels for column
- Querying the KnowledgendashBase
- Scoring the classes
- Slide 23
- Approach
- Learning to Rank
- ldquoTo Link or not to Link hellip rsquorsquo
- ldquoTo Link or not to Link hellip rsquorsquo (2)
- Slide 28
- Relation between columns
- Relation between columns (2)
- Scoring the relations
- Slide 32
- Annotating web tables for the Semantic Web
- Table as linked RDF
- Results
- Dataset summary
- Dataset summary (2)
- Dataset summary (3)
- Evaluation for class label predictions
- Evaluation 1 (MAP)
- Evaluation 1 (MAP) (2)
- Evaluation 2 (Recall)
- Evaluation 3 (Correctness)
- Evaluation 3 (Correctness) (2)
- Evaluation for linking table cells to entities
- Category-wise accuracy for linking table cells
- Relation between columns (3)
- Future Work
- AutomaticSemi-automatic template learning
- Confirming LD Facts
- Discover knowledge relations
- Conclusion
- References
- Slide 54
-