Using linked data to interpret tables Varish Mulwad September 14, 2010 1


Using linked data to interpret tables

Varish Mulwad, September 14, 2010

Interpreting a table

Link cell value to an entity: http://dbpedia.org/resource/Baltimore

Find relationships between columns: LargestCity

Ontology class shown in the figure: http://dbpedia.org/ontology/PopulatedPlace

1000 reasons why it's important …

Annotate web tables

Confirm existing facts in LOD

Discover knowledge and new facts

Search query over web tables

Data integration

Interpreting a table

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
… …

Overview

• Introduction
• Related Work & Motivation
• Approach
• Results
• Upcoming Work
• Conclusion

Introduction

The World Wide Web …

[Figure: web pages of free text (… … …) containing an announcement: "Talk: abc, By: xyz, Venue: some location"]

The World Wide Web …

Good for you and me …

… not so good for machines

Images from http://www.bbc.co.uk/blogs/radiolabs/s5/linked-data/s5.html

Web of Data – The Semantic Web

Image – www.linkeddata.org

Linked Data

The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web

Every resource has a URI, e.g. Baltimore: http://dbpedia.org/resource/Baltimore

Related Work and Motivation


Chicken Egg … No Chicken

• More than a trillion documents on the Web

• ~ 14.1 billion tables, 154 million with high quality relational data (Cafarella et al. 2008)

• Where is structured data?

Automate the process

• We need systems that can generate data from existing sources

• Not practical for humans to encode all this into RDF manually

On the Semantic Web …

• Mapping relational databases to RDF [W3C working group – RDB2RDF]

• Mapping spreadsheets to RDF [RDF123, XLWrap]

• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data

… elsewhere

• Learning to index tables to improve search experience (Cafarella et al. 2008)

• Expanding attributes (columns) of web tables (Lin et al. 2010)

• Interpreting web tables to answer complex search queries over the web tables (Limaye et al. 2010)

Interpreting a Table


T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations


Predicting Class Labels for column

[Figure: a column headed City with the cells Baltimore, Boston and New York. Figure labels: Instance, Type, Class Type for the column.]

Querying the Knowledge-Base

Column: City

Baltimore: 1. Baltimore  2. Baltimore County  3. John Baltimore

Boston: 1. Boston  2. Boston_(band)  3. Boston_University

New York: 1. New_York_City  2. New_York  3. New_York_(album)

Types

dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople

dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, … … …

…
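The step from query results to candidate classes can be sketched in Python. This is only an illustration under stated assumptions: the results are hand-copied from the slides into a plain dictionary, only the Baltimore entry is filled in (the per-entity types for the other cells are not given), and the names `kb_results`, `candidate_classes` and `cell_class_pairs` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical, hand-copied results: cell string -> ranked entities with their types.
kb_results: Dict[str, List[Tuple[str, List[str]]]] = {
    "Baltimore": [
        ("Baltimore", ["dbpedia-owl:City", "dbpedia-owl:Place"]),
        ("Baltimore County", ["dbpedia-owl:AdministrativeRegion"]),
        ("John Baltimore", ["yago:AmericanConductors", "yago:LivingPeople"]),
    ],
}


def candidate_classes(results: Dict[str, List[Tuple[str, List[str]]]]) -> Set[str]:
    """Union of all types seen on any returned entity for any cell."""
    classes: Set[str] = set()
    for entities in results.values():
        for _entity, types in entities:
            classes.update(types)
    return classes


def cell_class_pairs(results: Dict[str, List[Tuple[str, List[str]]]]) -> List[Tuple[str, str]]:
    """Every [cell, class] pair that will later receive a score."""
    classes = sorted(candidate_classes(results))
    return [(cell, cls) for cell in results for cls in classes]


pairs = cell_class_pairs(kb_results)
```

Each pair in `pairs` corresponds to one bracketed [cell, class] item on the scoring slide.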

Scoring the classes

Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, ………

[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], ……, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], ………

E.g. processing class – "dbpedia-owl:City"

String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]

Score = w × (1 / R) + (1 - w) × (Normalized PageRank)

[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
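The scoring formula can be checked with a few lines of Python. This is a minimal sketch: the function name `class_score` is hypothetical, and the normalized PageRank (6/7 in the slide's example) is taken as a given input because the slide does not spell out how it is normalized.

```python
def class_score(rank: int, normalized_pagerank: float, w: float = 0.25) -> float:
    """Score = w * (1 / R) + (1 - w) * normalized PageRank."""
    return w * (1.0 / rank) + (1.0 - w) * normalized_pagerank


# The slide's example: Baltimore is returned at rank R = 1 with normalized PageRank 6/7.
score = class_score(rank=1, normalized_pagerank=6 / 7, w=0.25)
print(f"{score:.4f}")
```

The value is approximately 0.8929, which the slide shows truncated as 0.892.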

T2LD Framework: Linking the table cells

Approach

Table Cell + Column Header + Row Data + Column Type

Requery KB with predicted class labels as additional evidence

Generate a feature vector for the top N results of the query

Classifier ranks the entities within the set of possible results

Select the highest ranked entity

Classifier decides whether to link or not

Link to "NIL" / Link to the top ranked instance
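The flow above can be sketched as a small Python function. This is an illustration only, not the T2LD implementation: the two classifiers are passed in as plain callables, and the names `link_cell`, `rank_score` and `should_link` are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def link_cell(
    candidates: Sequence[str],
    rank_score: Callable[[str], float],
    should_link: Callable[[str, List[float]], bool],
) -> Optional[str]:
    """Return the entity to link to, or None to stand for the "NIL" link."""
    if not candidates:
        return None
    # Step 1: rank the candidate entities, best first.
    ranked = sorted(candidates, key=rank_score, reverse=True)
    scores = [rank_score(c) for c in ranked]
    top = ranked[0]
    # Step 2: a second decision accepts or rejects the top ranked entity.
    return top if should_link(top, scores) else None


# Toy stand-ins for the trained models (made-up scores, for illustration only).
toy_scores = {"Baltimore": 0.9, "Baltimore County": 0.4, "John Baltimore": 0.1}
linked = link_cell(list(toy_scores), toy_scores.get, lambda top, s: s[0] >= 0.5)
rejected = link_cell(list(toy_scores), toy_scores.get, lambda top, s: s[0] >= 0.95)
```

Keeping the ranking and the link decision as separate callables mirrors the two classifier boxes in the flow.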

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures
• Levenshtein distance
• Dice Score

Popularity Measures
• Wikitology Score
• PageRank
• Page Length
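The two similarity measures can be sketched in plain Python. The slides do not say how the Dice score is computed, so the character-bigram variant below is an assumption, and the function names are hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions and substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def bigrams(s: str) -> set:
    """Set of lowercase character bigrams of s."""
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (assumed variant)."""
    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        # Strings too short for bigrams: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))
```

For example, the cell string "Baltimore" and the entity label "Baltimore County" differ by seven inserted characters, so a ranker would see a larger distance than for the exact match "Baltimore".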

“To Link or not to Link …”

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer

• We trained an SVM classifier which would determine whether to link to the top one or not

“To Link or not to Link …”

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
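Building this extended feature vector is mechanical and can be sketched as follows. The function name and the input shape (a best-first list of feature vector and SVMrank score pairs) are assumptions made for illustration; how a single-candidate list is handled is also an assumption, since the slides do not cover it.

```python
from typing import List, Sequence, Tuple


def link_decision_features(ranked: Sequence[Tuple[List[float], float]]) -> List[float]:
    """ranked: (feature_vector, svmrank_score) pairs, best first."""
    top_features, top_score = ranked[0]
    # Assumption: with no runner-up, treat the second score as 0.0.
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    # Top entity's features + its score + the gap to the second ranked entity.
    return list(top_features) + [top_score, top_score - second_score]


# Made-up numbers: two ranked candidates with two features each.
features = link_decision_features([([0.0, 1.0], 2.5), ([3.0, 0.4], 1.0)])
```

A large gap between the top two scores suggests a confident ranking, which is presumably why it is useful to the link-or-NIL classifier.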

T2LD Framework: Identify and Discover relations

Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

Maryland - Baltimore: dbonto:LargestCity

Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

Maryland - Baltimore: dbonto:LargestCity

Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

dbonto:Capital – Score: 0

dbonto:Capital – Score: 1

dbonto:LargestCity – Score: 3
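One reading of the slide is that every row votes once for each relation that holds between its two cells. A minimal Python sketch of that counting, with hypothetical names and the candidate relations copied from the slide:

```python
from collections import Counter
from typing import Dict, List, Tuple

# (State, City) pair -> relations found between the two linked entities.
candidates_per_row: Dict[Tuple[str, str], List[str]] = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}


def score_relations(rows: Dict[Tuple[str, str], List[str]]) -> Counter:
    """Count, for each candidate relation, the number of rows that support it."""
    scores: Counter = Counter()
    for relations in rows.values():
        scores.update(set(relations))  # each row votes at most once per relation
    return scores


scores = score_relations(candidates_per_row)
best_relation, best_score = scores.most_common(1)[0]
```

With these inputs dbonto:LargestCity collects 3 votes and dbonto:Capital collects 1, matching the final scores on the slide.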


Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City
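Producing such statements from the three annotation results can be sketched with plain string formatting. This is not the T2LD output code: the function name and input shapes are hypothetical, prefix declarations are omitted, and the label statements are written in the forward form (class rdfs:label "City"@en), which states the same fact as the "is rdfs:label of" form on the slide.

```python
from typing import Dict, List, Tuple


def annotation_triples(
    column_classes: Dict[str, str],
    cell_links: Dict[str, Tuple[str, str]],
    relation: Tuple[str, str, str],
) -> List[str]:
    """Render column classes, cell links and one column relation as triple lines."""
    lines: List[str] = []
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for cell, (entity, cls) in cell_links.items():
        lines.append(f'{entity} rdfs:label "{cell}"@en .')
        lines.append(f"{entity} a {cls} .")
    prop, domain, range_ = relation
    lines.append(f"{prop} rdfs:domain {domain} .")
    lines.append(f"{prop} rdfs:range {range_} .")
    return lines


triples = annotation_triples(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {
        "Baltimore": ("dbpedia:Baltimore", "dbpedia-owl:City"),
        "MD": ("dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion"),
    },
    ("dbpprop:LargestCity", "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
```

A real system would more likely use an RDF library to handle escaping and serialization; string formatting is used here only to keep the sketch dependency-free.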

Results


Dataset summary

Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)

The numbers in brackets exclude columns that contained numbers.


Evaluation for class label predictions


Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels

• Metric: Mean Average Precision (MAP)

• Commonly used in the Information Retrieval domain to compare two ranked sets

80.76%

System ranked: 1. Person 2. Politician 3. President

Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
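For readers unfamiliar with the metric, here is a minimal Python sketch of average precision on the example lists above. It uses one common textbook definition, treats the evaluator's list as an unordered set of relevant labels and divides by the number of relevant labels; the slides do not state which variant was used, so this should not be read as the exact evaluation procedure.

```python
from typing import Iterable, List, Sequence, Tuple


def average_precision(system_ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Mean of precision@k over the positions k where a relevant label appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant_set:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant_set)


def mean_average_precision(pairs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Average of average_precision over all (system, evaluator) list pairs."""
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
```

Under this definition the example gives (1/2 + 2/3) / 3, about 0.39: the system finds two of the three evaluator labels, but only at ranks 2 and 3.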

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System ranked: 1. Person 2. Politician 3. President

Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
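Recall for the example lists can be computed in a couple of lines. This sketch assumes recall is the fraction of the evaluator's labels that also appear anywhere in the system's list; the function name is hypothetical.

```python
from typing import Sequence


def recall(system_ranked: Sequence[str], evaluator_ranked: Sequence[str]) -> float:
    """Fraction of evaluator labels that the system also returned."""
    relevant = set(evaluator_ranked)
    if not relevant:
        return 0.0
    return len(relevant & set(system_ranked)) / len(relevant)


example_recall = recall(
    ["Person", "Politician", "President"],
    ["President", "Politician", "OfficeHolder"],
)
```

Here two of the three evaluator labels are recovered, so the example column has a recall of 2/3 and would count toward the "Recall > 0.6" group.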

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were “fair and correct”

• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia-owl:PopulatedPlace is not the most accurate, but still a correct label for a column of cities

• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown for class label correctness

Overall Accuracy: 76.92%

Column – Nationality, Prediction – MilitaryConflict

Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities


Category-wise accuracy for linking table cells

Overall Accuracy: 66.12%

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25%) were correct

Future Work


Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Table row: Baltimore, MD, S. Rawlings, …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia-owl:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There's a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap – querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538–549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)

• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)



Annotate web tables

Confirm existing facts

in LOD

Discover knowledge

and new facts

Search query over web tables

Data integration

1000 reasons why itrsquos important hellip

prefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion

ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity hellip hellip

Interpreting a table

4

Overview

bull Introductionbull Related Work amp Motivationbull Approachbull Resultsbull Upcoming Workbull Conclusion

5

6

Introduction

The World Wide Web hellip

helliphelliphellip

helliphelliphellip

helliphelliphellip

helliphelliphellip

hellip helliphellip

hellip helliphellip

Talk abcBy xyzVenue some location

Talk abcBy xyzVenue some location

hellip helliphellip

hellip helliphellip

7

The World Wide Web hellip

Good for you and me hellip

hellip not so good for machinesImages from httpwwwbbccoukblogsradiolabss5linked-datas5html

8

Web of Data ndash The Semantic Web

Image ndash wwwlinkeddataorg9

Linked Data

The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web

Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore

10

Related Work and Motivation

11

Chicken Egg hellip No Chicken

bull More than a trillion documents on the Web

bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured

data 13

Automate the process

bull We need systems that can generate data from existing sources

bull Not practical for humans to encode all this into RDF manually

14

On the Semantic Web hellip

bull Mapping Relational databases to RDF [W3C working group ndash RDB2RDF]

bull Mapping spreadsheets to RDF [RDF123 XLWrap]

bull Practical and helpful systems but hellip ndash Require significant manual workndash Do not generate linked data

hellip elsewhere

bull Learning to index tables to improve search experience (Cafarella et al 2008)

bull Expanding attributes (columns) of web tables (Lin et al 2010)

bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)

Interpreting a Table

17

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

T2LD Framework

18

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

19

Predicting Class Labels for column

City

Baltimore

Boston

New York

Type

Instance

Type

Type

Type

Class Type for the column

Querying the KnowledgendashBase

City

Baltimore

Boston

New York

1Baltimore2 Baltimore County3 John Baltimore

1Boston2 Boston_(band)3 Boston_University

1 New_York_City2 New_York3 New_York_(album)

21

dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople

Types

dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip

[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip

Eg Processing class ndash ldquodbpedia-owlCityrdquo

String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]

Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

23

Approach

Table Cell + Column Header + Row Data

+ Column Type

Requery KB with predicted class labels as additional evidence

Generate a feature vector for the top N results of the query

Classifier ranks the entities within the set

of possible results

Select the highest ranked entity

Classifier decides whether to link or

not

Link to ldquoNILrdquoLink to the top

ranked instance

24

Learning to Rank

bull We trained a SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures

Popularity Measures

bull Levenshtein distancebull Dice Score

bull Wikitology Scorebull PageRankbull Page Length

25

ldquoTo Link or not to Link hellip rsquorsquo

bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in

the KBndash Top N results may not include the correct answer

bull We trained an SVM classifier which would determine whether to link to the top one or not

26

ldquoTo Link or not to Link hellip rsquorsquo

bull Feature vector included the feature vector of the top ranked entity and additional two features ndash

ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two

ranked entities

27

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

28

Relation between columns

City

Baltimore

Boston

New York

State

Maryland

Massachusetts

New York

29

Relation between columns

Maryland - Baltimore

Massachusetts - Boston

New York - New York

dbontoLargestCity

dbontoLargestCitydbontoCapital

dbontoLargestCity

dbontoCapital dbontoLargestCity

Candidate relations

30

Scoring the relations

Maryland - Baltimore

Massachusetts - Boston

New York - New York

dbontoLargestCity

dbontoLargestCitydbontoCapital

dbontoLargestCity

Candidates dbontoCapital

dbontoLargestCity

dbontoCapital Score0

dbontoCapital Score1

dbontoLargestCity Score3

31

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

32

Annotating web tables for the Semantic Web

Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion

ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion

dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity

"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human-readable name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City
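A minimal sketch of emitting such an annotation as text, assuming plain Turtle (subject first) rather than the "is ... of" form used on the slide. The function name and argument shapes are hypothetical, and string escaping of labels is deliberately left out.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def to_turtle(column_classes, cell_links, relation):
    """Serialize a table interpretation as Turtle text.

    column_classes: {header: class}, e.g. {"City": "dbpedia-owl:City"}
    cell_links:     [(cell string, entity, class)]
    relation:       (property, domain class, range class)
    """
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    lines.append("")
    # Column headers become labels of the predicted classes.
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    # Each linked cell gets a label and a type.
    for cell, entity, cls in cell_links:
        lines.append(f'{entity} rdfs:label "{cell}"@en ;')
        lines.append(f"    a {cls} .")
    # The relation between the columns, with its domain and range.
    prop, domain, range_ = relation
    lines.append(f"{prop} rdfs:domain {domain} ;")
    lines.append(f"    rdfs:range {range_} .")
    return "\n".join(lines)


turtle = to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    [
        ("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City"),
        ("MD", "dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion"),
    ],
    ("dbpprop:LargestCity", "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
print(turtle)
```

Building the text by hand keeps the sketch dependency-free; a real pipeline would more likely use an RDF library to handle escaping and validation.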

Results


Dataset summary

Number of tables:           15
Total number of rows:       199
Total number of columns:    56 (52)
Total number of entities:   639 (611)

The numbers in brackets exclude columns that contained numbers.


Evaluation for class label predictions


Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels

• Metric: Mean Average Precision (MAP)

• Commonly used in the Information Retrieval domain to compare two ranked sets

Evaluation 1 (MAP)

80.76%

System ranked:    1. Person, 2. Politician, 3. President

Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
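A worked sketch of average precision on the two rankings above. It assumes the evaluator's labels form the set of relevant answers (their order is ignored) and that the sum is normalized by the number of relevant labels; the slides do not say which variant was actually used.

```python
from typing import Iterable, Sequence, Tuple


def average_precision(ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Average of precision@k over the ranks k that hold a relevant label."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(ranked, start=1):
        if label in relevant_set:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant_set)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Iterable[str]]]) -> float:
    """Mean of the per-column average precision values."""
    values = [average_precision(ranked, relevant) for ranked, relevant in runs]
    return sum(values) / len(values) if values else 0.0


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
```

Here Politician is a hit at rank 2 (precision 1/2) and President at rank 3 (precision 2/3), so under these assumptions AP = (1/2 + 2/3) / 3 = 7/18.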

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System ranked:    1. Person, 2. Politician, 3. President

Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
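A short sketch of recall for the example rankings, assuming recall means the fraction of the evaluator's labels that also appear in the system's list. The slide does not spell out the definition, so treat this as one plausible reading.

```python
from typing import Iterable


def recall(system_labels: Iterable[str], evaluator_labels: Iterable[str]) -> float:
    """Fraction of the evaluator's labels that the system also returned."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
r = recall(system, evaluator)
```

Two of the evaluator's three labels (President, Politician) are found, so r = 2/3, which clears the 0.6 threshold on the slide.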

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were "fair and correct"

• A class label may not be the most accurate one but may still be correct
  - E.g. dbpedia:PopulatedPlace is not the most accurate, but is still a correct, label for a column of cities

• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown for class label correctness

Overall accuracy: 76.92%

Column: Nationality, Prediction: MilitaryConflict

Column: Birth Place, Prediction: PopulatedPlace

Evaluation for linking table cells to entities


Category-wise accuracy for linking table cells

Overall accuracy: 66.12%

Relation between columns

• Idea: ask human evaluators to identify relations between columns in a given table

• Pilot experiment: asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy: 5 out of 20 (25%) were correct

Future Work


Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S.Rawlings, ...

For Baltimore, DBpedia says:

dbpprop:LeaderName - S.Dixon

dbpprop:LeaderName - S.Dixon, S.Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There is a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the Web of Data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)

• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)

  • Using linked data to interpret tables
  • Interpreting a table
  • Slide 3
  • Slide 4
  • Overview
  • Introduction
  • The World Wide Web ...
  • The World Wide Web ... (2)
  • Web of Data - The Semantic Web
  • Linked Data
  • Related Work and Motivation
  • Slide 12
  • Chicken Egg ... No Chicken
  • Automate the process
  • On the Semantic Web ...
  • ... elsewhere
  • Interpreting a Table
  • T2LD Framework
  • Slide 19
  • Predicting Class Labels for column
  • Querying the Knowledge-Base
  • Scoring the classes
  • Slide 23
  • Approach
  • Learning to Rank
  • "To Link or not to Link ..."
  • "To Link or not to Link ..." (2)
  • Slide 28
  • Relation between columns
  • Relation between columns (2)
  • Scoring the relations
  • Slide 32
  • Annotating web tables for the Semantic Web
  • Table as linked RDF
  • Results
  • Dataset summary
  • Dataset summary (2)
  • Dataset summary (3)
  • Evaluation for class label predictions
  • Evaluation 1 (MAP)
  • Evaluation 1 (MAP) (2)
  • Evaluation 2 (Recall)
  • Evaluation 3 (Correctness)
  • Evaluation 3 (Correctness) (2)
  • Evaluation for linking table cells to entities
  • Category-wise accuracy for linking table cells
  • Relation between columns (3)
  • Future Work
  • Automatic/Semi-automatic template learning
  • Confirming LD Facts
  • Discover knowledge relations
  • Conclusion
  • References
  • Slide 54



6

Introduction

The World Wide Web hellip

helliphelliphellip

helliphelliphellip

helliphelliphellip

helliphelliphellip

hellip helliphellip

hellip helliphellip

Talk abcBy xyzVenue some location

Talk abcBy xyzVenue some location

hellip helliphellip

hellip helliphellip

7

The World Wide Web hellip

Good for you and me hellip

hellip not so good for machinesImages from httpwwwbbccoukblogsradiolabss5linked-datas5html

8

Web of Data ndash The Semantic Web

Image ndash wwwlinkeddataorg9

Linked Data

The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web

Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore

10

Related Work and Motivation

11

Chicken Egg hellip No Chicken

bull More than a trillion documents on the Web

bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured

data 13

Automate the process

bull We need systems that can generate data from existing sources

bull Not practical for humans to encode all this into RDF manually

14

On the Semantic Web hellip

bull Mapping Relational databases to RDF [W3C working group ndash RDB2RDF]

bull Mapping spreadsheets to RDF [RDF123 XLWrap]

bull Practical and helpful systems but hellip ndash Require significant manual workndash Do not generate linked data

hellip elsewhere

bull Learning to index tables to improve search experience (Cafarella et al 2008)

bull Expanding attributes (columns) of web tables (Lin et al 2010)

bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)

Interpreting a Table

17

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

T2LD Framework

18

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

19

Predicting Class Labels for column

City

Baltimore

Boston

New York

Type

Instance

Type

Type

Type

Class Type for the column

Querying the KnowledgendashBase

City

Baltimore

Boston

New York

1Baltimore2 Baltimore County3 John Baltimore

1Boston2 Boston_(band)3 Boston_University

1 New_York_City2 New_York3 New_York_(album)

21

dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople

Types

dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Scoring the classes

Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …

[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …

E.g. processing class "dbpedia-owl:City"

String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]

Score = w × (1 / R) + (1 − w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
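The scoring rule above can be sketched in a few lines of Python. This is a minimal sketch, not the authors' implementation: the function name `pair_score`, its parameter names, and the decision to pass the normalizing constant explicitly as `pr_norm` are assumptions. The slide only shows w = 0.25 and a PageRank of 6 divided by 7, without saying where the 7 comes from.

```python
def pair_score(rank: int, page_rank: float, pr_norm: float, w: float = 0.25) -> float:
    """Score one (cell string, class) pair: w * (1 / R) + (1 - w) * normalized PageRank."""
    if rank < 1:
        raise ValueError("rank is 1-based")
    if pr_norm <= 0:
        raise ValueError("pr_norm must be positive")
    return w * (1.0 / rank) + (1.0 - w) * (page_rank / pr_norm)


# Slide example: "Baltimore" is the top result (R = 1) with PR = 6, normalized by 7.
baltimore_city = pair_score(rank=1, page_rank=6, pr_norm=7)
print(f"{baltimore_city:.4f}")
```

The printed value is 0.8929, which the slide shows truncated as 0.892.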

Approach

• Table Cell + Column Header + Row Data + Column Type
• Requery KB with predicted class labels as additional evidence
• Generate a feature vector for the top N results of the query
• Classifier ranks the entities within the set of possible results
• Select the highest ranked entity
• Classifier decides whether to link or not
  – Link to "NIL"
  – Link to the top ranked instance
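The control flow of this approach can be sketched as follows. It is only an illustration under stated assumptions: `link_cell`, `rank_score` and `should_link` are hypothetical names, the two callables stand in for the trained ranking and link/no-link models, and the toy scores are made up.

```python
from typing import Callable, Optional, Sequence


def link_cell(
    candidates: Sequence[str],
    rank_score: Callable[[str], float],
    should_link: Callable[[str, Sequence[float]], bool],
) -> Optional[str]:
    """Return the entity to link to, or None to stand for "NIL"."""
    if not candidates:
        return None  # nothing came back from the knowledge base
    # Rank the top-N candidates, best first.
    ranked = sorted(candidates, key=rank_score, reverse=True)
    scores = [rank_score(c) for c in ranked]
    top = ranked[0]
    # A second model decides whether the top candidate is good enough.
    return top if should_link(top, scores) else None


# Toy stand-ins for the two trained models (hypothetical values).
toy_scores = {"Baltimore": 0.9, "Baltimore_County": 0.4, "John_Baltimore": 0.1}
result = link_cell(
    list(toy_scores),
    rank_score=toy_scores.__getitem__,
    should_link=lambda top, scores: scores[0] >= 0.5,
)
print(result)
```

Returning `None` for "NIL" keeps the "no suitable entity" case distinct from any real entity name.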

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures
• Levenshtein distance
• Dice Score

Popularity Measures
• Wikitology Score
• PageRank
• Page Length
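The two similarity measures can be computed with plain Python, as in the sketch below. The slides do not say how the Dice score was tokenized, so the use of character bigrams here is an assumption, as are the function names.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insert, delete and substitute."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def dice(a: str, b: str) -> float:
    """Dice coefficient over character bigrams (the bigram choice is an assumption)."""
    grams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    grams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not grams_a and not grams_b:
        return 1.0 if a == b else 0.0
    return 2 * len(grams_a & grams_b) / (len(grams_a) + len(grams_b))


cell = "Baltimore"
for label in ["Baltimore", "Baltimore County", "John Baltimore"]:
    print(label, levenshtein(cell, label), round(dice(cell, label), 3))
```

Each candidate label is compared with the cell string, giving two of the features that would feed the ranking model.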

"To Link or not to Link …"

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer

• We trained an SVM classifier which would determine whether to link to the top one or not

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
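A minimal sketch of how such a link/no-link feature vector could be assembled is shown below. The `Candidate` layout, the function name `link_features` and the feature values are hypothetical, and treating a missing runner-up as a score of 0.0 is an assumption the slides do not address.

```python
from typing import List, Sequence, Tuple

# Each candidate: (entity name, feature vector used for ranking, ranking score).
Candidate = Tuple[str, Sequence[float], float]


def link_features(ranked: Sequence[Candidate]) -> List[float]:
    """Features of the top entity + its ranking score + the gap to the runner-up."""
    if not ranked:
        raise ValueError("need at least one ranked candidate")
    _, top_features, top_score = ranked[0]
    # Assumption: with a single candidate, the runner-up score is taken as 0.0.
    runner_up_score = ranked[1][2] if len(ranked) > 1 else 0.0
    return [*top_features, top_score, top_score - runner_up_score]


ranked = [
    ("Baltimore", [0.0, 1.0, 6.0], 2.5),         # hypothetical feature values
    ("Baltimore_County", [7.0, 0.7, 4.0], 1.0),
]
features = link_features(ranked)
print(features)
```

A small gap between the top two scores signals an ambiguous ranking, which is presumably why the difference is included as a feature.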

Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

• Maryland - Baltimore: dbonto:LargestCity
• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
• New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

• Maryland - Baltimore: dbonto:LargestCity
• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
• New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

• dbonto:Capital: Score 0 → 1
• dbonto:LargestCity: Score 3
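The scores on this slide are consistent with giving each candidate relation one vote per row in which it holds. The sketch below implements that reading, which is an assumption, since the slides do not spell out the scoring rule. The dictionary simply restates the slide's example.

```python
from collections import Counter

# Relations found in the knowledge base for each (state, city) row, as on the slide.
row_relations = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}

# One vote per row in which the candidate relation holds.
scores = Counter()
for relations in row_relations.values():
    scores.update(set(relations))

best_relation, best_score = scores.most_common(1)[0]
print(dict(scores), best_relation)
```

With these rows, dbonto:LargestCity collects three votes and dbonto:Capital one, matching the scores shown above.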

Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
  – "City" is the common human-readable name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
  – dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  – The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
  – The objects of the triples using the property have to be instances of dbpedia-owl:City
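As an illustration of how annotations like these could be written out, the sketch below serializes column classes, cell links and one column relation as Turtle using only string formatting. It is a sketch under assumptions, not the framework's actual output code: the function name `table_to_turtle` and its argument layout are hypothetical, and it emits subject-first triples because Turtle does not support the N3 `is … of` shorthand used on the slide.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def table_to_turtle(header_classes, cell_links, relation):
    """Serialize column classes, cell links and one column relation as Turtle.

    Literals are not escaped here, so labels must not contain quotes or backslashes.
    """
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    for header, cls in header_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for text, (entity, cls) in cell_links.items():
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {cls} .")
    prop, domain, range_ = relation
    lines.append(f"{prop} rdfs:domain {domain} .")
    lines.append(f"{prop} rdfs:range {range_} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {"Baltimore": ("dbpedia:Baltimore", "dbpedia-owl:City"),
     "MD": ("dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")},
    ("dbpprop:LargestCity", "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
print(turtle)
```

For anything beyond a toy example, an RDF library with proper literal escaping would be the safer choice.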

Results

Dataset summary

Number of tables            15
Total number of rows        199
Total number of columns     56 (52)
Total number of entities    639 (611)

The numbers in brackets exclude columns that contained numbers.

Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets

MAP: 80.76 %

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
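For reference, the standard Information Retrieval definition of average precision can be sketched as below, applied to the example lists above. How the reported MAP figure was actually computed is not stated in the slides. Treating the evaluator's list as an unordered set of relevant labels is an assumption, as are the function names.

```python
from typing import Iterable, List, Sequence, Tuple


def average_precision(system_ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Sum of precision@k at each relevant hit, divided by the number of relevant labels."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant_set:
            hits += 1
            precision_sum += hits / k
    return precision_sum / len(relevant_set)


def mean_average_precision(runs: List[Tuple[Sequence[str], Sequence[str]]]) -> float:
    """Mean of the per-column average precision values."""
    return sum(average_precision(s, r) for s, r in runs) / len(runs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(average_precision(system, evaluator), 4))
```

Under this definition the example column has hits at ranks 2 and 3, giving (1/2 + 2/3) / 3 = 7/18.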

Evaluation 2 (Recall)

Recall > 0.6 (75 %)

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one, but may still be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels

• A category-wise breakdown for class label correctness

Overall accuracy: 76.92 %

Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall accuracy: 66.12 %

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25 %) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S. Rawlings, …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There is a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)

• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)

Web of Data ndash The Semantic Web

Image ndash wwwlinkeddataorg9

Linked Data

The principles of Linked Data outline the best practices to share and expose structured data on the World Wide Web

Every resource has a URI Baltimore httpdbpediaorgresourceBaltimore

10

Related Work and Motivation

11

Chicken Egg hellip No Chicken

bull More than a trillion documents on the Web

bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured

data 13

Automate the process

bull We need systems that can generate data from existing sources

bull Not practical for humans to encode all this into RDF manually

14

On the Semantic Web …

• Mapping relational databases to RDF [W3C working group – RDB2RDF]
• Mapping spreadsheets to RDF [RDF123, XLWrap]
• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data

… elsewhere

• Learning to index tables to improve search experience (Cafarella et al., 2008)
• Expanding attributes (columns) of web tables (Lin et al., 2010)
• Interpreting web tables to answer complex search queries over the web tables (Limaye et al., 2010)

Interpreting a Table

T2LD Framework

• Predict Class for Columns
• Linking the table cells
• Identify and Discover relations

Predicting Class Labels for column

[Figure: a "City" column with the cells Baltimore, Boston and New York. Each cell is an instance with one or more types; these types are combined into the class type for the column.]

Querying the Knowledge Base

Top results for each cell of the "City" column:

• Baltimore: 1. Baltimore, 2. Baltimore County, 3. John Baltimore
• Boston: 1. Boston, 2. Boston_(band), 3. Boston_University
• New York: 1. New_York_City, 2. New_York, 3. New_York_(album)

Types

• dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
• dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …

Scoring the classes

Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …

[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], … [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …

E.g. processing class "dbpedia-owl:City"

String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]

Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
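
The scoring formula on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `class_score` and its parameters are hypothetical, and treating the 7 in "6/7" as the largest possible PageRank value is an assumption drawn from the worked example, not something the slides state.

```python
def class_score(rank: int, page_rank: float, max_page_rank: float, w: float = 0.25) -> float:
    """Score = w * (1 / R) + (1 - w) * normalized PageRank.

    rank          -- position R of the entity in the knowledge-base results (1 = top)
    page_rank     -- PageRank value PR reported for that entity
    max_page_rank -- value used for normalization (assumed to be 7, as in the slide's 6/7)
    w             -- weight of the rank term (0.25 in the slide's example)
    """
    if rank < 1:
        raise ValueError("rank must be >= 1")
    normalized_pr = page_rank / max_page_rank
    return w * (1.0 / rank) + (1.0 - w) * normalized_pr


# The slide's example: the pair [Baltimore, dbpedia-owl:City] with R = 1 and PR = 6.
score = class_score(rank=1, page_rank=6, max_page_rank=7)
print(f"{score:.4f}")
```

The computed value is about 0.8929, which the slide shows truncated as 0.892.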

Approach

• Input: Table Cell + Column Header + Row Data + Column Type
• Requery KB with predicted class labels as additional evidence
• Generate a feature vector for the top N results of the query
• Classifier ranks the entities within the set of possible results
• Select the highest ranked entity
• Classifier decides whether to link or not: link to "NIL", or link to the top ranked instance

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures
• Levenshtein distance
• Dice Score

Popularity Measures
• Wikitology Score
• PageRank
• Page Length
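
To make the two similarity measures concrete, here is a minimal sketch of both. The slides do not say which units the Dice score is computed over; character bigrams are assumed here, and the function names are illustrative only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (an assumed choice of unit)."""
    def bigrams(s: str) -> set:
        s = s.lower()
        return {s[i:i + 2] for i in range(len(s) - 1)}

    x, y = bigrams(a), bigrams(b)
    if not x or not y:
        # Strings too short to have bigrams: fall back to exact comparison.
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


# Comparing a table cell with a candidate entity label.
print(levenshtein("Baltimore", "Baltimore County"))
print(dice_score("Baltimore", "Baltimore County"))
```

Values like these would form the similarity part of the feature vector, alongside the popularity measures.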

"To Link or not to Link …"

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer
• We trained an SVM classifier which would determine whether to link to the top one or not

"To Link or not to Link …" (2)

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
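
As a rough illustration of how the input to the link / no-link classifier could be assembled, the sketch below appends the two extra features to the top entity's feature vector. The function name and the handling of a single-candidate list (a gap of 0.0) are assumptions, not details given in the slides.

```python
from typing import List, Sequence


def link_decision_features(top_entity_features: Sequence[float],
                           ranked_scores: Sequence[float]) -> List[float]:
    """Build the feature vector for the link / no-link classifier.

    top_entity_features -- similarity and popularity features of the top ranked entity
    ranked_scores       -- SVMrank scores of the candidates, sorted in descending order
    """
    if not ranked_scores:
        raise ValueError("at least one ranked candidate is required")
    top_score = ranked_scores[0]
    # Assumption: with only one candidate there is no runner-up, so the gap is 0.0.
    gap = top_score - ranked_scores[1] if len(ranked_scores) > 1 else 0.0
    return list(top_entity_features) + [top_score, gap]


features = link_decision_features([0.0, 1.0, 0.8], [2.5, 1.0, 0.3])
print(features)  # [0.0, 1.0, 0.8, 2.5, 1.5]
```

A small gap between the top two scores suggests the ranker was unsure, which is the kind of signal the second classifier can use to choose "NIL".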

Relation between columns

City: Baltimore, Boston, New York
State: Maryland, Massachusetts, New York

Relation between columns (2)

• Maryland - Baltimore: dbonto:LargestCity
• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
• New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

• Maryland - Baltimore: dbonto:LargestCity
• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
• New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

Scores (one point per row pair in which the relation appears):
• dbonto:Capital: 1
• dbonto:LargestCity: 3
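
The vote counting shown on this slide can be sketched as follows. The function name and the dictionary layout are illustrative; how the relations for each pair are retrieved from the knowledge base is outside this sketch.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def score_relations(relations_per_pair: Dict[Tuple[str, str], Iterable[str]]) -> Counter:
    """Give each candidate relation one point per row pair in which it appears."""
    votes = Counter()
    for relations in relations_per_pair.values():
        votes.update(set(relations))  # a set, so a relation counts once per pair
    return votes


# The (State, City) pairs and relations from the slide.
pairs = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}
scores = score_relations(pairs)
print(scores.most_common())  # [('dbonto:LargestCity', 3), ('dbonto:Capital', 1)]
```

The highest scoring candidate, dbonto:LargestCity here, would be taken as the relation between the two columns.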

Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

How to read these statements:

• "City"@en is rdfs:label of dbpedia-owl:City: "City" is the common human name for the class dbpedia-owl:City.
• dbpedia:Baltimore a dbpedia-owl:City: dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City.
• dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion: the subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion.
• dbpprop:LargestCity rdfs:range dbpedia-owl:City: the objects of the triples using the property have to be instances of dbpedia-owl:City.
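
A minimal sketch of how column and cell annotations might be serialized is given below. It writes plain Turtle (label as object) rather than the "is … of" form used on the slide; the two forms state the same triples. The function name and input shapes are hypothetical, and string escaping is deliberately left out.

```python
from typing import Dict, List, Tuple

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def annotations_to_turtle(column_classes: Dict[str, str],
                          cell_links: List[Tuple[str, str, str]]) -> str:
    """Serialize table annotations as Turtle text.

    column_classes -- column header -> predicted class (prefixed name)
    cell_links     -- (cell text, linked entity, entity class) triples
    Note: cell text is not escaped, so this only suits simple labels.
    """
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for text, entity, cls in cell_links:
        lines.append(f'{entity} rdfs:label "{text}"@en .')
        lines.append(f"{entity} a {cls} .")
    return "\n".join(lines)


turtle = annotations_to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    [("Baltimore", "dbpedia:Baltimore", "dbpedia-owl:City"),
     ("MD", "dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")],
)
print(turtle)
```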

Results

Dataset summary

Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)

The numbers in brackets exclude columns that contained numbers.

Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets
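
For readers unfamiliar with the metric, here is a small sketch of average precision and its mean over several columns. It uses one common definition (the sum of precision values at each relevant rank, divided by the number of relevant labels); the slides do not say which variant was used, so the value below is illustrative and is not a reported result. The example lists are the system and evaluator rankings shown in the deck.

```python
from typing import Iterable, Sequence, Tuple


def average_precision(system_ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Average precision of a ranked list against a set of relevant labels."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / k  # precision at rank k
    return precision_sum / len(relevant)


def mean_average_precision(runs: Iterable[Tuple[Sequence[str], Iterable[str]]]) -> float:
    """Mean of the average precision over (system list, relevant labels) pairs."""
    runs = list(runs)
    return sum(average_precision(s, r) for s, r in runs) / len(runs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(f"{ap:.4f}")  # (1/2 + 2/3) / 3
```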

Evaluation 1 (MAP) (2)

80.76 %

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder

Evaluation 2 (Recall)

Recall > 0.6 (75 %)

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were "fair and correct"
• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness) (2)

• A category-wise breakdown for class label correctness

Overall accuracy: 76.92 %

Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall accuracy: 66.12 %

Relation between columns (3)

• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25 %) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S. Rawlings, …

For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon
dbpprop:LeaderName – S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There's a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference, Springer (2008)
• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)
• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538–549
• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)
• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S., eds.: WWW, ACM (2010) 1145–1146

Chicken Egg hellip No Chicken

bull More than a trillion documents on the Web

bull ~ 141 billion tables 154 million with high quality relational data (Cafarella et al 2008)bull Where is structured

data 13

Automate the process

• We need systems that can generate data from existing sources

• Not practical for humans to encode all this into RDF manually

On the Semantic Web …

• Mapping relational databases to RDF [W3C working group – RDB2RDF]

• Mapping spreadsheets to RDF [RDF123, XLWrap]

• Practical and helpful systems, but …
  – Require significant manual work
  – Do not generate linked data

… elsewhere

• Learning to index tables to improve search experience (Cafarella et al., 2008)

• Expanding attributes (columns) of web tables (Lin et al., 2010)

• Interpreting web tables to answer complex search queries over the web tables (Limaye et al., 2010)

Interpreting a Table

T2LD Framework

• Predict Class for Columns

• Linking the table cells

• Identify and Discover relations

Predicting Class Labels for a column

[Figure: a table column with header "City" and cells Baltimore, Boston, New York; labels: Instance, Type, Class Type for the column]

Querying the Knowledge-Base

Column: City

• Baltimore: 1. Baltimore, 2. Baltimore County, 3. John Baltimore

• Boston: 1. Boston, 2. Boston_(band), 3. Boston_University

• New York: 1. New_York_City, 2. New_York, 3. New_York_(album)

Types

• dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople

• dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …

Scoring the classes

Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …

[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …

E.g. processing class "dbpedia-owl:City"

String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]

Score = w × (1 / R) + (1 – w) × (Normalized PageRank)
[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
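The scoring formula above can be sketched in a few lines of Python. This is only an illustration: the function name and parameters are hypothetical, and dividing the PageRank by a maximum value of 7 is an assumption read off the slide's "6/7" (the slide does not say how the normalization is defined).

```python
def class_score(rank: int, page_rank: float, max_page_rank: float, w: float = 0.25) -> float:
    """Score one (cell string, class) pair as w * (1 / R) + (1 - w) * normalized PageRank."""
    if rank < 1:
        raise ValueError("rank must be >= 1")
    # Assumption: PageRank is normalized by dividing by a maximum PageRank value.
    normalized_pr = page_rank / max_page_rank
    return w * (1.0 / rank) + (1.0 - w) * normalized_pr


# Slide example: "Baltimore" is the rank-1 result (R = 1) with PR = 6, and w = 0.25.
baltimore_city = class_score(rank=1, page_rank=6, max_page_rank=7)
print(round(baltimore_city, 4))
```

The printed value agrees with the 0.892 on the slide up to rounding.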


Approach

• Input: Table Cell + Column Header + Row Data + Column Type

• Requery KB with predicted class labels as additional evidence

• Generate a feature vector for the top N results of the query

• Classifier ranks the entities within the set of possible results

• Select the highest ranked entity

• Classifier decides whether to link or not
  – Link to "NIL"
  – Link to the top ranked instance
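The control flow of this approach (rank the candidates, take the top one, then decide between linking and "NIL") can be sketched as follows. The function `link_cell` and the two callables are hypothetical stand-ins for the trained ranker and the link/no-link classifier; the toy scores are invented for illustration only.

```python
from typing import Callable, Optional, Sequence


def link_cell(
    candidates: Sequence[str],
    rank_score: Callable[[str], float],
    should_link: Callable[[str, float], bool],
) -> Optional[str]:
    """Return the entity to link the cell to, or None to represent "NIL"."""
    if not candidates:
        return None  # nothing came back from the knowledge base
    # Rank the candidate entities, best first.
    ranked = sorted(candidates, key=rank_score, reverse=True)
    top = ranked[0]
    # A second decision step accepts or rejects the top-ranked entity.
    return top if should_link(top, rank_score(top)) else None


# Toy stand-ins (hypothetical scores, not results from the talk).
scores = {"Baltimore": 0.9, "Baltimore_County": 0.4, "John_Baltimore": 0.1}
linked = link_cell(list(scores), lambda e: scores[e], lambda e, s: s >= 0.5)
print(linked)
```

Keeping the ranker and the link decision as separate callables mirrors the two-classifier split described on the slides.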

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures

• Levenshtein distance

• Dice Score

Popularity Measures

• Wikitology Score

• PageRank

• Page Length
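The two similarity measures can be sketched in plain Python. The slides do not say which variant of the Dice score was used; computing it over character bigrams, case-insensitively, is an assumption made here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed variant)."""
    def bigrams(s: str) -> set:
        return {s[i:i + 2] for i in range(len(s) - 1)}

    x, y = bigrams(a.lower()), bigrams(b.lower())
    if not x and not y:
        return 1.0  # two strings without bigrams are treated as identical
    return 2 * len(x & y) / (len(x) + len(y))


print(levenshtein("Baltimore", "Baltimore County"))
print(dice_score("Baltimore", "Baltimore County"))
```

Both values would be computed between the table cell string and each candidate entity's label, then placed in the feature vector next to the popularity measures.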

"To Link or not to Link …"

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer

• We trained an SVM classifier which would determine whether to link to the top one or not

"To Link or not to Link …"

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
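Assembling the input for the link/no-link classifier can be sketched as below. The function name and the data layout are hypothetical, and using 0.0 as the runner-up score when only one candidate exists is an assumption; the slides do not cover that case.

```python
from typing import List, Sequence, Tuple


def link_decision_features(ranked: Sequence[Tuple[Sequence[float], float]]) -> List[float]:
    """Build the link/no-link feature vector.

    `ranked` holds (feature_vector, svmrank_score) pairs, best candidate first.
    """
    if not ranked:
        raise ValueError("need at least one ranked candidate")
    top_features, top_score = ranked[0]
    # Assumption: with a single candidate, the runner-up score is taken as 0.0.
    second_score = ranked[1][1] if len(ranked) > 1 else 0.0
    # Top entity's own features, plus its score, plus the margin over the runner-up.
    return list(top_features) + [top_score, top_score - second_score]


# Toy numbers for illustration only.
features = link_decision_features([
    ([0.0, 1.0, 0.86], 2.5),  # top ranked entity
    ([7.0, 0.7, 0.57], 1.0),  # second ranked entity
])
print(features)
```

A small margin between the top two scores is a hint that the ranker was unsure, which is the signal the second classifier can use to prefer "NIL".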


Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

• Maryland - Baltimore: dbonto:LargestCity

• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

• New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

• Maryland - Baltimore: dbonto:LargestCity

• Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

• New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

• dbonto:Capital, Score: 1

• dbonto:LargestCity, Score: 3
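The scores on this slide are consistent with a simple vote count: each row's entity pair votes once for every relation the knowledge base holds between the two entities. A minimal sketch, assuming that counting scheme (the function name is hypothetical):

```python
from collections import Counter
from typing import Iterable, List


def score_relations(relations_per_row: Iterable[List[str]]) -> Counter:
    """Count, for each candidate relation, how many rows support it."""
    counts: Counter = Counter()
    for relations in relations_per_row:
        counts.update(set(relations))  # each row votes at most once per relation
    return counts


# Relations found between the State and City entities of each row (from the slide).
rows = [
    ["dbonto:LargestCity"],                    # Maryland - Baltimore
    ["dbonto:LargestCity", "dbonto:Capital"],  # Massachusetts - Boston
    ["dbonto:LargestCity"],                    # New York - New York
]
scores = score_relations(rows)
best_relation, best_score = scores.most_common(1)[0]
print(best_relation, best_score)
```

With these three rows dbonto:LargestCity collects 3 votes and dbonto:Capital 1, matching the scores shown.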


Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

• "City"@en is rdfs:label of dbpedia-owl:City
  – "City" is the common human name for the class dbpedia-owl:City

• dbpedia:Baltimore a dbpedia-owl:City
  – dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

• dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  – The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

• dbpprop:LargestCity rdfs:range dbpedia-owl:City
  – The objects of the triples using the property have to be instances of dbpedia-owl:City
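As a rough illustration of how one annotated row could be serialized, the sketch below builds Turtle-style lines with plain string formatting. The function `row_to_turtle` is hypothetical, the statements are written in the forward form (`entity rdfs:label "label"@en`) rather than the "is … of" form on the slide, and the final row-level dbpprop:LargestCity statement is an assumption: the slide only shows the property's domain and range.

```python
from typing import List, Tuple

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def prefix_lines() -> List[str]:
    """Prefix declarations matching the ones listed on the slide."""
    return [f"@prefix {name}: <{iri}> ." for name, iri in PREFIXES.items()]


def row_to_turtle(city: Tuple[str, str], state: Tuple[str, str]) -> List[str]:
    """Serialize one (City, State) row; each argument is (cell label, linked entity)."""
    (city_label, city_entity), (state_label, state_entity) = city, state
    return [
        f'{city_entity} rdfs:label "{city_label}"@en .',
        f"{city_entity} a dbpedia-owl:City .",
        f'{state_entity} rdfs:label "{state_label}"@en .',
        f"{state_entity} a dbpedia-owl:AdministrativeRegion .",
        # Assumption: the relation discovered between the columns is asserted per row.
        f"{state_entity} dbpprop:LargestCity {city_entity} .",
    ]


lines = prefix_lines() + row_to_turtle(("Baltimore", "dbpedia:Baltimore"),
                                       ("MD", "dbpedia:Maryland"))
print("\n".join(lines))
```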

Results

Dataset summary

Number of Tables: 15
Total Number of rows: 199
Total Number of columns: 56 (52)
Total Number of entities: 639 (611)

The number in brackets excludes columns that contained numbers

Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels

• Metric: Mean Average Precision (MAP)

• Commonly used in the Information Retrieval domain to compare two ranked sets

Evaluation 1 (MAP)

[Chart; value shown: 80.76%]

• System Ranked: 1. Person, 2. Politician, 3. President

• Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder
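For reference, average precision for a single column can be computed as in the sketch below. It treats the evaluator's labels as an unordered set of relevant labels and normalizes by the size of that set, which is the common Information Retrieval definition; whether the talk used exactly this variant is an assumption.

```python
from typing import Iterable, List, Sequence, Tuple


def average_precision(system_ranked: Sequence[str], relevant: Iterable[str]) -> float:
    """Average of precision@k over the positions k where a relevant label appears."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for position, label in enumerate(system_ranked, start=1):
        if label in relevant_set:
            hits += 1
            precision_sum += hits / position
    return precision_sum / len(relevant_set)


def mean_average_precision(columns: List[Tuple[Sequence[str], Iterable[str]]]) -> float:
    """Mean of the per-column average precision values."""
    return sum(average_precision(s, r) for s, r in columns) / len(columns)


# Slide example: system ranking vs. the evaluator's labels for one column.
system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
print(round(ap, 4))
```

Here the relevant labels are found at positions 2 and 3, so the value is (1/2 + 2/3) / 3.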

Evaluation 2 (Recall)

Recall > 0.6 (75%)

• System Ranked: 1. Person, 2. Politician, 3. President

• Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder
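Recall for the example lists can be checked with a short sketch. The slide does not state how recall was computed; the standard definition (the fraction of evaluator labels that also appear in the system's list) is assumed here.

```python
from typing import Iterable


def recall(system_labels: Iterable[str], evaluator_labels: Iterable[str]) -> float:
    """Fraction of the evaluator's labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


example_recall = recall(
    ["Person", "Politician", "President"],
    ["President", "Politician", "OfficeHolder"],
)
print(round(example_recall, 3))
```

Two of the three evaluator labels are recovered, so this example column would clear the 0.6 threshold.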

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were "fair and correct"

• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities

• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown for class label correctness

Overall Accuracy: 76.92%

• Column – Nationality, Prediction – MilitaryConflict

• Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall Accuracy: 66.12%

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S.Rawlings, …

For Baltimore, DBpedia says:

• dbpprop:LeaderName – S.Dixon

• dbpprop:LeaderName – S.Dixon, S.Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There's a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)

• Lin, C. X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)






hellip elsewhere

bull Learning to index tables to improve search experience (Cafarella et al 2008)

bull Expanding attributes (columns) of web tables (Lin et al 2010)

bull Interpreting web tables to answer complex search queries over the web tables (Limaye et al 2010)

Interpreting a Table

17

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

T2LD Framework

18

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

19

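Predicting Class Labels for column

[Figure: a column headed "City" with the cells Baltimore, Boston and New York. Each cell is an instance with one or more types; together these types give the class type for the column.]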

Querying the Knowledge Base

Column header: City

Baltimore
  Top results: 1. Baltimore, 2. Baltimore County, 3. John Baltimore
  Types: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople

Boston
  Top results: 1. Boston, 2. Boston_(band), 3. Boston_University
  Types: dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, …

New York
  Top results: 1. New_York_City, 2. New_York, 3. New_York_(album)
  Types: …

Scoring the classes

Possible classes for the column: dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, yago:AmericanConductors, yago:LivingPeople, dbpedia-owl:Band, dbpedia-owl:Organisation, …

[Baltimore, dbpedia-owl:City], [Boston, dbpedia-owl:City], [New York, dbpedia-owl:City], …, [Baltimore, dbpedia-owl:Band], [Boston, dbpedia-owl:Band], …

E.g. processing class "dbpedia-owl:City"

String: Baltimore
(R = 1) Baltimore: dbpedia-owl:City, dbpedia-owl:Place [PR = 6]
(R = 2) Baltimore County: dbpedia-owl:AdministrativeRegion [PR = 4]
(R = 3) John Baltimore: yago:AmericanConductors, yago:LivingPeople [PR = 5]

Score = w × (1 / R) + (1 – w) × (Normalized PageRank)

[Baltimore, dbpedia-owl:City] = (0.25 × 1/1) + (0.75 × 6/7) = 0.892
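The scoring formula above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the PageRank is normalized by dividing by 7 (as the 6/7 in the worked example suggests), and a class that none of the results carries gets a score of 0.

```python
def pair_score(rank: int, page_rank: float,
               max_page_rank: float = 7.0, w: float = 0.25) -> float:
    """Score = w * (1 / R) + (1 - w) * (normalized PageRank)."""
    normalized_pr = page_rank / max_page_rank  # assumption: PR is scaled by its maximum value, 7
    return w * (1.0 / rank) + (1.0 - w) * normalized_pr


# KB results for the string "Baltimore", copied from the slide:
# (rank R, entity, types, PageRank PR)
results = [
    (1, "Baltimore", {"dbpedia-owl:City", "dbpedia-owl:Place"}, 6),
    (2, "Baltimore County", {"dbpedia-owl:AdministrativeRegion"}, 4),
    (3, "John Baltimore", {"yago:AmericanConductors", "yago:LivingPeople"}, 5),
]


def score_for_class(results, target_class: str) -> float:
    """Score a (string, class) pair using the best-ranked result that has the class."""
    for rank, _entity, types, page_rank in results:
        if target_class in types:
            return pair_score(rank, page_rank)
    return 0.0  # assumption: no result carries the class


city_score = score_for_class(results, "dbpedia-owl:City")
print(round(city_score, 4))
```

For [Baltimore, dbpedia-owl:City] this gives 0.25 + 0.75 × 6/7 ≈ 0.8929, which the slide reports truncated as 0.892.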


Approach

Table Cell + Column Header + Row Data + Column Type

Requery KB with predicted class labels as additional evidence

Generate a feature vector for the top N results of the query

Classifier ranks the entities within the set of possible results

Select the highest ranked entity

Classifier decides whether to link or not

Link to "NIL" / Link to the top ranked instance

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures
• Levenshtein distance
• Dice Score

Popularity Measures
• Wikitology Score
• PageRank
• Page Length
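As a rough sketch of how such a feature vector might be assembled for one candidate entity: the two similarity measures are computed from the strings, while the popularity measures are assumed to come from the knowledge base and are simply passed in. The function names, the use of character bigrams for the Dice score, and the feature order are assumptions for illustration, not details taken from the slides.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) by dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def bigrams(s: str) -> set:
    return {s[i:i + 2] for i in range(len(s) - 1)}


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (assumed tokenization)."""
    x, y = bigrams(a.lower()), bigrams(b.lower())
    if not x and not y:
        return 1.0 if a.lower() == b.lower() else 0.0
    return 2 * len(x & y) / (len(x) + len(y))


def feature_vector(cell: str, candidate_label: str,
                   wikitology_score: float, page_rank: float,
                   page_length: int) -> list:
    """Similarity features first, then the popularity features supplied by the KB."""
    return [
        levenshtein(cell, candidate_label),
        dice_score(cell, candidate_label),
        wikitology_score,
        page_rank,
        page_length,
    ]


# Placeholder popularity values; real ones would come from the KB query.
print(feature_vector("Baltimore", "Baltimore County", 1.0, 4, 1000))
```

One vector of this kind per candidate in the top N results would then be handed to the ranker.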

“To Link or not to Link …”

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer

• We trained an SVM classifier which would determine whether to link to the top one or not

“To Link or not to Link …”

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
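The input to the link / no-link classifier described above could be built as in the sketch below. The function name is hypothetical, and using a score gap of 0.0 when only one candidate exists is an assumption; the slides do not say how that case is handled.

```python
def link_decision_features(ranked: list) -> list:
    """Build the link / no-link feature vector.

    `ranked` is a list of (svmrank_score, feature_vector) pairs, best first.
    """
    if not ranked:
        raise ValueError("at least one ranked candidate is required")
    top_score, top_features = ranked[0]
    # Assumption: with a single candidate there is no runner-up, so the gap is 0.0.
    gap = top_score - ranked[1][0] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, gap]


ranked_candidates = [
    (2.5, [0, 1.0]),   # top ranked entity: score and its feature vector
    (1.0, [3, 0.4]),   # runner-up
]
print(link_decision_features(ranked_candidates))
```

A small gap between the top two scores suggests that the ranker was unsure, which is one plausible signal for linking to "NIL" instead.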


Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

Maryland - Baltimore: dbonto:LargestCity

Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

Maryland - Baltimore: dbonto:LargestCity

Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital

New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

dbonto:Capital – Score: 1

dbonto:LargestCity – Score: 3
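The scores on this slide are consistent with a simple vote: each row pair adds one point to every relation that holds between its two linked entities. The sketch below reproduces the example under that assumption; the dictionary stands in for the relations that a knowledge base query would return.

```python
from collections import Counter

# Relations found in the KB for each (State, City) row pair, as shown on the slide.
row_relations = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}


def score_relations(row_relations: dict) -> Counter:
    """Count, for each candidate relation, the number of rows that support it."""
    scores = Counter()
    for relations in row_relations.values():
        scores.update(set(relations))  # one vote per row, even if a relation is listed twice
    return scores


scores = score_relations(row_relations)
best_relation, best_score = scores.most_common(1)[0]
print(dict(scores), best_relation)
```

With the three rows above, dbonto:LargestCity collects 3 votes and dbonto:Capital 1, so dbonto:LargestCity is selected as the relation between the two columns.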


Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
  "City" is the common human name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
  dbpedia:Baltimore is of type (is an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
  The objects of the triples using the property have to be instances of dbpedia-owl:City

Results

Dataset summary

Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)

The numbers in brackets exclude columns that contained numbers.

Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system’s ranked list of labels against a human-ranked list of labels

• Metric: Mean Average Precision (MAP)

• Commonly used in the Information Retrieval domain to compare two ranked sets

80.76%

System Ranked: 1. Person, 2. Politician, 3. President

Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System Ranked: 1. Person, 2. Politician, 3. President

Evaluator Ranked: 1. President, 2. Politician, 3. OfficeHolder
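To make the two metrics concrete, the sketch below computes average precision and recall for the example lists shown on these slides. The slides do not say which variant of average precision was used; this version treats the evaluator's labels as the relevant set and divides by its size, which is a common textbook definition and an assumption here.

```python
def average_precision(system_ranked: list, relevant_labels: list) -> float:
    """Average of precision@k over the ranks k at which a relevant label appears."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            total += hits / k
    return total / len(relevant)  # assumption: normalize by the number of relevant labels


def recall(system_ranked: list, relevant_labels: list) -> float:
    """Fraction of the evaluator's labels that the system list contains."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_ranked)) / len(relevant)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]

ap = average_precision(system, evaluator)
rec = recall(system, evaluator)
print(round(ap, 4), round(rec, 4))
```

For this column the system finds two of the evaluator's three labels, so recall is 2/3, which is above the 0.6 threshold mentioned on the slide. Mean Average Precision is the mean of the per-column average precision values.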

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were “fair and correct”

• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities

• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown for class label correctness

Overall Accuracy: 76.92%

Column – Nationality, Prediction – MilitaryConflict

Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall Accuracy: 66.12%

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore | MD | S. Rawlings | …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There’s a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Intl. Conference on Very Large Databases (VLDB) (2010)

• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145–1146. ACM (2010)



  • Using linked data to interpret tables
  • Interpreting a table
  • Slide 3
  • Slide 4
  • Overview
  • Introduction
  • The World Wide Web hellip
  • The World Wide Web hellip (2)
  • Web of Data ndash The Semantic Web
  • Linked Data
  • Related Work and Motivation
  • Slide 12
  • Chicken Egg hellip No Chicken
  • Automate the process
  • On the Semantic Web hellip
  • hellip elsewhere
  • Interpreting a Table
  • T2LD Framework
  • Slide 19
  • Predicting Class Labels for column
  • Querying the KnowledgendashBase
  • Scoring the classes
  • Slide 23
  • Approach
  • Learning to Rank
  • ldquoTo Link or not to Link hellip rsquorsquo
  • ldquoTo Link or not to Link hellip rsquorsquo (2)
  • Slide 28
  • Relation between columns
  • Relation between columns (2)
  • Scoring the relations
  • Slide 32
  • Annotating web tables for the Semantic Web
  • Table as linked RDF
  • Results
  • Dataset summary
  • Dataset summary (2)
  • Dataset summary (3)
  • Evaluation for class label predictions
  • Evaluation 1 (MAP)
  • Evaluation 1 (MAP) (2)
  • Evaluation 2 (Recall)
  • Evaluation 3 (Correctness)
  • Evaluation 3 (Correctness) (2)
  • Evaluation for linking table cells to entities
  • Category-wise accuracy for linking table cells
  • Relation between columns (3)
  • Future Work
  • AutomaticSemi-automatic template learning
  • Confirming LD Facts
  • Discover knowledge relations
  • Conclusion
  • References
  • Slide 54

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

19

Predicting Class Labels for column

City

Baltimore

Boston

New York

Type

Instance

Type

Type

Type

Class Type for the column

Querying the KnowledgendashBase

City

Baltimore

Boston

New York

1Baltimore2 Baltimore County3 John Baltimore

1Boston2 Boston_(band)3 Boston_University

1 New_York_City2 New_York3 New_York_(album)

21

dbpedia-owlPlace dbpedia-owlAdminstrativeRegion dbpedia-owlCity dbpedia-owlAreayagoAmericanConductorsyagoLivingPeople

Types

dbpedia-owlPlace dbpedia-owlPopulatedPlace dbpedia-owlBand dbpedia-owlOrganisation hellip hellip hellip

helliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphelliphellip

Scoring the classesPossible Classes for the column - dbpedia-owlPlacedbpedia-owlAdminstrativeRegiondbpediaowlCityyagoAmericanConductorsyagoLivingPeople dbpedia-owlBanddbpedia-owlOrganisationhelliphelliphellip

[Baltimore dbpedia-owlCity][Boston dbpedia-owlCity][New York dbpedia-owlCity] helliphellip[Baltimoredbpedia-owlBand][Bostondbpedia-owlBand]helliphelliphellip

Eg Processing class ndash ldquodbpedia-owlCityrdquo

String Baltimore (R = 1) Baltimore dbpedia-owlCity dbpedia-owlPlace [PR = 6](R = 2) Baltimore County dbpedia-owlAdministrativeRegion [PR = 4](R = 3) John Baltimore yagoAmericanConductorsyagoLivingPeople [PR = 5]

Score = w x ( 1 R ) + (1 ndash w) x (Normalized Page Rank)[Baltimore dbpediaCity] = (025 x 1 1 ) + (075 x 6 7) = 0892

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

23

Approach

Table Cell + Column Header + Row Data

+ Column Type

Requery KB with predicted class labels as additional evidence

Generate a feature vector for the top N results of the query

Classifier ranks the entities within the set

of possible results

Select the highest ranked entity

Classifier decides whether to link or

not

Link to ldquoNILrdquoLink to the top

ranked instance

24

Learning to Rank

bull We trained a SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures

Popularity Measures

bull Levenshtein distancebull Dice Score

bull Wikitology Scorebull PageRankbull Page Length

"To Link or not to Link ..."

• The highest-ranked entity may not be the correct one to link to:
  – The string we are querying may not be in the KB
  – The top N results may not include the correct answer
• We trained an SVM classifier to determine whether or not to link to the top-ranked entity
• Its feature vector comprised the feature vector of the top-ranked entity plus two additional features:
  – The SVMrank score of the top-ranked entity
  – The difference in scores between the top two ranked entities
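Assembling the input to the link/no-link classifier can be sketched as below. The function name and the data layout (a best-first list of `(svmrank_score, feature_vector)` pairs) are hypothetical, and using a gap of 0.0 when there is only one candidate is an assumption; the slides do not cover that case.

```python
from typing import List, Sequence, Tuple


def link_decision_features(
    ranked: Sequence[Tuple[float, Sequence[float]]]
) -> List[float]:
    """Build the feature vector for the link/no-link decision.

    ranked -- candidates sorted best-first, each as
              (SVMrank score, feature vector used for ranking)
    """
    top_score, top_features = ranked[0]
    # Assumption: with a single candidate there is no runner-up, so use 0.0.
    gap = top_score - ranked[1][0] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, gap]


# Illustrative values only: three ranking features per candidate.
candidates = [
    (2.0, [0.0, 1.0, 0.86]),   # top-ranked entity
    (0.5, [7.0, 0.7, 0.57]),   # runner-up
]
print(link_decision_features(candidates))  # [0.0, 1.0, 0.86, 2.0, 1.5]
```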


Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relations found for each State - City pair:
Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

dbonto:Capital: Score 0 → Score 1
dbonto:LargestCity: Score 3
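The scores on this slide are consistent with counting, for each candidate relation, the number of rows whose pair of cells is connected by that relation. The sketch below assumes that reading; the function name is hypothetical and the data is the three-row example from the slides.

```python
from collections import Counter

# Relations found between the State and City cells of each row (from the slide).
row_candidates = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}


def score_relations(candidates_per_row):
    """Give each candidate relation one vote per row that supports it."""
    scores = Counter()
    for relations in candidates_per_row.values():
        scores.update(set(relations))  # a row votes at most once per relation
    return scores


scores = score_relations(row_candidates)
best_relation, best_score = scores.most_common(1)[0]
print(scores["dbonto:Capital"], scores["dbonto:LargestCity"])  # 1 3
print(best_relation, best_score)  # dbonto:LargestCity 3
```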


Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

Reading the statements:

• "City"@en is rdfs:label of dbpedia-owl:City: "City" is the common human-readable name for the class dbpedia-owl:City
• dbpedia:Baltimore a dbpedia-owl:City: dbpedia:Baltimore is an instance of (has type) dbpedia-owl:City
• dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion: the subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion
• dbpprop:LargestCity rdfs:range dbpedia-owl:City: the objects of the triples using the property have to be instances of dbpedia-owl:City
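A small sketch of how such annotations could be written out programmatically is given below. It is not the T2LD serializer: the function `table_to_turtle` and its argument layout are hypothetical. The slide uses the N3 form `"..." is rdfs:label of ...`, which Turtle does not provide, so the sketch emits the equivalent forward triples (`subject rdfs:label "..."@en`). String escaping is not handled.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "dbpprop": "http://dbpedia.org/property/",
}


def table_to_turtle(column_classes, cell_entities, relation):
    """Serialize table annotations as Turtle text.

    column_classes -- {column header: class}
    cell_entities  -- {cell text: (entity, class)}
    relation       -- (property, domain class, range class)
    """
    lines = [f"@prefix {p}: <{iri}> ." for p, iri in PREFIXES.items()]
    lines.append("")
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for cell, (entity, cls) in cell_entities.items():
        lines.append(f'{entity} rdfs:label "{cell}"@en .')
        lines.append(f"{entity} a {cls} .")
    prop, domain, range_ = relation
    lines.append(f"{prop} rdfs:domain {domain} .")
    lines.append(f"{prop} rdfs:range {range_} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {"Baltimore": ("dbpedia:Baltimore", "dbpedia-owl:City"),
     "MD": ("dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion")},
    ("dbpprop:LargestCity", "dbpedia-owl:AdministrativeRegion", "dbpedia-owl:City"),
)
print(turtle)
```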

Results

Dataset summary

Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)

The numbers in brackets exclude columns that contained numbers.


Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system's ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets

80.76%

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
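For the pair of ranked lists above, average precision can be computed as in the following sketch. The slides do not state which variant was used; this assumes the textbook definition, with the evaluator's labels treated as an unordered set of relevant labels and the sum of precisions divided by the number of relevant labels. Function names are hypothetical.

```python
def average_precision(system_ranked, relevant_labels):
    """Average precision of a ranked list against a set of relevant labels."""
    relevant = set(relevant_labels)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for k, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / k  # precision at this cut-off
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of the average precision over (system list, evaluator list) pairs."""
    pairs = list(pairs)
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
print(round(average_precision(system, evaluator), 3))  # 0.389
```

Here "Politician" is a hit at rank 2 (precision 1/2) and "President" at rank 3 (precision 2/3), so the average precision is (1/2 + 2/3) / 3 ≈ 0.389.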

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System ranked: 1. Person, 2. Politician, 3. President
Evaluator ranked: 1. President, 2. Politician, 3. OfficeHolder
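Recall for one column can be sketched as the fraction of the evaluator's labels that also appear in the system's list. This definition, the function names, and the helper that reports the share of columns above a threshold are assumptions made for illustration.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also produced."""
    relevant = set(evaluator_labels)
    if not relevant:
        return 0.0
    return len(relevant & set(system_labels)) / len(relevant)


def share_above(values, threshold):
    """Fraction of values strictly greater than the threshold."""
    values = list(values)
    return sum(v > threshold for v in values) / len(values)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
column_recall = recall(system, evaluator)
print(round(column_recall, 3))  # 0.667
```

With a recall of 2/3, this example column would count among those with recall above 0.6.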

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were "fair and correct"
• A class label may not be the most accurate one but may still be correct
  – E.g., dbpedia:PopulatedPlace is not the most accurate label for a column of cities, but it is still a correct one
• Three human judges evaluated our predicted class labels
• A category-wise breakdown for class label correctness

Overall accuracy: 76.92%

Column: Nationality, Prediction: MilitaryConflict
Column: Birth Place, Prediction: PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall accuracy: 66.12%

Relation between columns

• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy: 5 out of 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S. Rawlings, ...
For Baltimore, DBpedia says:
dbpprop:LeaderName: S. Dixon
dbpprop:LeaderName: S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There's a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the web of data vision

References

• Han, L., Finin, T., Parr, C., Sachs, J., Joshi, A.: RDF123: from Spreadsheets to RDF. In: Seventh International Semantic Web Conference. Springer (2008)

• Langegger, A., Wöß, W.: XLWrap - querying and integrating arbitrary spreadsheets with SPARQL. In: 8th International Semantic Web Conference (ISWC 2009) (2009)

• Cafarella, M.J., Halevy, A.Y., Wang, Z.D., Wu, E., Zhang, Y.: WebTables: exploring the power of tables on the web. PVLDB 1 (2008) 538-549

• Limaye, G., Sarawagi, S., Chakrabarti, S.: Annotating and searching web tables using entities, types and relationships. In: Proc. of the 36th Int'l Conference on Very Large Databases (VLDB) (2010)

• Lin, C.X., Zhao, B., Weninger, T., Han, J., Liu, B.: Entity relation discovery from web tables and links. In: Rappa, M., Jones, P., Freire, J., Chakrabarti, S. (eds.) WWW, 1145-1146. ACM (2010)


Predicting Class Labels for column

Diagram: the cells of the City column (Baltimore, Boston, New York) are instances, each with a type; these types give the class type for the column.

Querying the Knowledge-Base

Column: City

Baltimore: 1. Baltimore, 2. Baltimore County, 3. John Baltimore
Boston: 1. Boston, 2. Boston_(band), 3. Boston_University
New York: 1. New_York_City, 2. New_York, 3. New_York_(album)

Types:
dbpedia-owl:Place, dbpedia-owl:AdministrativeRegion, dbpedia-owl:City, dbpedia-owl:Area, yago:AmericanConductors, yago:LivingPeople
dbpedia-owl:Place, dbpedia-owl:PopulatedPlace, dbpedia-owl:Band, dbpedia-owl:Organisation, ...
...
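Collecting the candidate classes for a column from the query results can be sketched as below. The function names and data layout are hypothetical. The per-entity types for "Baltimore" follow the "Scoring the classes" slide; the slides do not give per-entity types for the "Boston" results, so those entries are illustrative assumptions.

```python
# Top results per cell, each with its types.
query_results = {
    "Baltimore": [
        ("Baltimore", ["dbpedia-owl:City", "dbpedia-owl:Place"]),
        ("Baltimore County", ["dbpedia-owl:AdministrativeRegion"]),
        ("John Baltimore", ["yago:AmericanConductors", "yago:LivingPeople"]),
    ],
    "Boston": [
        ("Boston", ["dbpedia-owl:City", "dbpedia-owl:Place"]),                # assumed
        ("Boston_(band)", ["dbpedia-owl:Band", "dbpedia-owl:Organisation"]),  # assumed
        ("Boston_University", []),                                            # types not given
    ],
}


def candidate_classes(results):
    """Union of the types of every entity returned for any cell."""
    classes = set()
    for entities in results.values():
        for _entity, types in entities:
            classes.update(types)
    return classes


def cell_class_pairs(results):
    """All (cell string, class) pairs that are then scored."""
    classes = sorted(candidate_classes(results))
    return [(cell, cls) for cell in results for cls in classes]


classes = candidate_classes(query_results)
pairs = cell_class_pairs(query_results)
print(len(classes), len(pairs))  # 7 14
```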



  • Scoring the relations
  • Slide 32
  • Annotating web tables for the Semantic Web
  • Table as linked RDF
  • Results
  • Dataset summary
  • Dataset summary (2)
  • Dataset summary (3)
  • Evaluation for class label predictions
  • Evaluation 1 (MAP)
  • Evaluation 1 (MAP) (2)
  • Evaluation 2 (Recall)
  • Evaluation 3 (Correctness)
  • Evaluation 3 (Correctness) (2)
  • Evaluation for linking table cells to entities
  • Category-wise accuracy for linking table cells
  • Relation between columns (3)
  • Future Work
  • AutomaticSemi-automatic template learning
  • Confirming LD Facts
  • Discover knowledge relations
  • Conclusion
  • References
  • Slide 54

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations


Approach

Input: Table Cell + Column Header + Row Data + Column Type

• Requery KB with predicted class labels as additional evidence
• Generate a feature vector for the top N results of the query
• Classifier ranks the entities within the set of possible results
• Select the highest ranked entity
• Classifier decides whether to link or not
  – Link to “NIL”
  – Link to the top ranked instance

Learning to Rank

• We trained an SVMrank classifier which learnt to rank entities within a given set

Feature Vector

Similarity Measures
• Levenshtein distance
• Dice Score

Popularity Measures
• Wikitology Score
• PageRank
• Page Length
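The two similarity measures listed above can be sketched in Python as follows. This is only an illustration, not the authors' implementation: the function names are hypothetical, and the Dice score is assumed here to be computed over character bigrams, which the slide does not specify. The popularity measures (Wikitology score, PageRank, page length) come from external sources and are not reproduced.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of insertions, deletions and substitutions."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def dice_score(a: str, b: str) -> float:
    """Dice coefficient over sets of character bigrams (an assumption)."""
    bigrams_a = {a[i:i + 2] for i in range(len(a) - 1)}
    bigrams_b = {b[i:i + 2] for i in range(len(b) - 1)}
    if not bigrams_a and not bigrams_b:
        # Strings shorter than two characters have no bigrams.
        return 1.0 if a == b else 0.0
    return 2 * len(bigrams_a & bigrams_b) / (len(bigrams_a) + len(bigrams_b))


def similarity_features(cell: str, label: str) -> list:
    """Similarity part of the feature vector for one (table cell, KB entity label) pair."""
    cell, label = cell.lower(), label.lower()
    return [float(levenshtein(cell, label)), dice_score(cell, label)]
```

For instance, similarity_features("Baltimore", "baltimore") yields an edit distance of 0 and a Dice score of 1, since the comparison is made case-insensitive here.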

“To Link or not to Link …”

• The highest ranked entity may not be the correct one to link to …
  – Because the string we are querying may not be in the KB
  – Top N results may not include the correct answer

• We trained an SVM classifier which would determine whether to link to the top one or not

“To Link or not to Link …”

• Feature vector included the feature vector of the top ranked entity and two additional features:
  – The SVMrank score of the top ranked entity
  – The difference in scores between the top two ranked entities
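A minimal sketch of how the input to the link / no-link classifier described above could be assembled. The names, the example scores and the candidate tuple layout are hypothetical, and the handling of a single candidate (score gap of 0.0) is an assumption; the SVM itself is not shown.

```python
from typing import List, Sequence, Tuple

# (entity URI, ranking feature vector, SVMrank score); illustrative layout only
Candidate = Tuple[str, Sequence[float], float]


def link_decision_features(ranked: List[Candidate]) -> List[float]:
    """Build the feature vector for the link / NIL decision.

    `ranked` must be non-empty and sorted by SVMrank score, best first.
    """
    if not ranked:
        raise ValueError("no candidates: nothing to link to")
    _, top_features, top_score = ranked[0]
    # Assumption: with no runner-up, the score gap is set to 0.0.
    gap = top_score - ranked[1][2] if len(ranked) > 1 else 0.0
    return list(top_features) + [top_score, gap]


candidates = [
    ("dbpedia:Baltimore", [0.0, 1.0], 2.5),
    ("dbpedia:Baltimore_County", [7.0, 0.6], 1.0),
]
features = link_decision_features(candidates)
```

The resulting vector would then be passed to the trained classifier, which answers either "link to the top ranked instance" or "link to NIL".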

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations


Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

dbonto:Capital – Score: 0
dbonto:Capital – Score: 1
dbonto:LargestCity – Score: 3
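The scores on this slide are consistent with counting, for each candidate relation, the number of rows whose entity pair it holds for. The sketch below assumes exactly that simple counting scheme; the slide does not state the scoring function, and the function and variable names are hypothetical.

```python
from collections import Counter


def score_relations(row_relations):
    """Count how many rows support each candidate relation."""
    scores = Counter()
    for relations in row_relations.values():
        # set() so a relation is counted at most once per row
        scores.update(set(relations))
    return scores


# Relations found in the knowledge base for each (state, city) pair, as on the slide
rows = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}

scores = score_relations(rows)
best_relation, best_score = scores.most_common(1)[0]
```

With the three rows above, dbonto:LargestCity is supported by every row and is selected as the relation between the two columns.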

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations


Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
  – “City” is the common human name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
  – dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
  – The subjects of the triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
  – The objects of the triples using the property have to be instances of dbpedia-owl:City
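As a rough sketch of how such annotations could be serialized, the Python below builds the label and type statements for the example table as text. The function name and input layout are hypothetical. It writes statements subject-first (Turtle style) rather than with the "is ... of" form used on the slide, and it does not escape quotes inside labels, which a real serializer would need to do.

```python
PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def table_to_turtle(column_classes, cell_links):
    """Serialize column-class and cell-entity annotations as Turtle text.

    column_classes: {column header: class}
    cell_links:     {cell text: (entity, class)}
    """
    lines = [f"@prefix {prefix}: <{uri}> ." for prefix, uri in PREFIXES.items()]
    lines.append("")
    for header, cls in column_classes.items():
        lines.append(f'{cls} rdfs:label "{header}"@en .')
    for cell, (entity, cls) in cell_links.items():
        lines.append(f'{entity} rdfs:label "{cell}"@en .')
        lines.append(f"{entity} a {cls} .")
    return "\n".join(lines)


turtle = table_to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {
        "Baltimore": ("dbpedia:Baltimore", "dbpedia-owl:City"),
        "MD": ("dbpedia:Maryland", "dbpedia-owl:AdministrativeRegion"),
    },
)
print(turtle)
```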

Results


Dataset summary

Number of tables: 15
Total number of rows: 199
Total number of columns: 56 (52)
Total number of entities: 639 (611)

The numbers in brackets exclude columns that contained numbers.


Evaluation for class label predictions


Evaluation 1 (MAP)

• Compared the system’s ranked list of labels against a human-ranked list of labels
• Metric – Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets

Evaluation 1 (MAP)

80.76%

System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
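To make the metric concrete, here is a small sketch of average precision applied to the example lists above. How the original evaluation used the human ranking is not stated on the slide; this sketch assumes the evaluator's labels form the relevant set (their order is ignored) and normalizes by the size of that set. The function names are hypothetical.

```python
def average_precision(system_ranked, relevant):
    """Average precision of a ranked list against a set of relevant labels."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system_ranked, start=1):
        if label in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant)


def mean_average_precision(pairs):
    """Mean of average precision over (system_ranked, relevant) pairs, one per column."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(average_precision(s, r) for s, r in pairs) / len(pairs)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system, evaluator)
```

Under these assumptions the example column scores (1/2 + 2/3) / 3 = 7/18, roughly 0.39: "Person" is a miss at rank 1, while "Politician" and "President" are hits at ranks 2 and 3.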

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System ranked: 1. Person 2. Politician 3. President
Evaluator ranked: 1. President 2. Politician 3. OfficeHolder
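For the example lists above, recall can be computed as the fraction of the evaluator's labels that also appear in the system's list. This is the standard set-based definition and is assumed, not confirmed, to be the one used in the evaluation; the function name is hypothetical.

```python
def recall(system_labels, evaluator_labels):
    """Fraction of the evaluator's labels that the system also produced."""
    expected = set(evaluator_labels)
    if not expected:
        return 0.0
    return len(expected & set(system_labels)) / len(expected)


system = ["Person", "Politician", "President"]
evaluator = ["President", "Politician", "OfficeHolder"]
r = recall(system, evaluator)
```

Two of the evaluator's three labels (President, Politician) are found by the system, so this example column has a recall of 2/3 and clears the 0.6 threshold.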

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were “fair and correct”
• Class label may not be the most accurate one, but may be correct
  – E.g. dbpedia:PopulatedPlace is not the most accurate, but still a correct label for a column of cities
• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown for class label correctness

Overall Accuracy: 76.92%

Column – Nationality, Prediction – MilitaryConflict
Column – Birth Place, Prediction – PopulatedPlace

Evaluation for linking table cells to entities


Category-wise accuracy for linking table cells

Overall Accuracy: 66.12%

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table
• Pilot Experiment – Asked three evaluators to annotate five random tables from our dataset
• Evaluators identified 20 relations
• Our accuracy – 5 out of 20 (25%) were correct

Future Work


Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Table row: Baltimore, MD, S. Rawlings, …

For Baltimore, DBpedia says:
dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There’s a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision



ldquoTo Link or not to Link hellip rsquorsquo

bull The highest ranked entity may not the correct one to link to hellip ndash Because the string we are querying may not be in

the KBndash Top N results may not include the correct answer

bull We trained an SVM classifier which would determine whether to link to the top one or not

26

ldquoTo Link or not to Link hellip rsquorsquo

bull Feature vector included the feature vector of the top ranked entity and additional two features ndash

ndash The SVMrank score of the top ranked entityndash The difference in scores between the top two

ranked entities

27

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

28

Relation between columns

City

Baltimore

Boston

New York

State

Maryland

Massachusetts

New York

29

Relation between columns

Maryland - Baltimore

Massachusetts - Boston

New York - New York

dbontoLargestCity

dbontoLargestCitydbontoCapital

dbontoLargestCity

dbontoCapital dbontoLargestCity

Candidate relations

30

Scoring the relations

Maryland - Baltimore

Massachusetts - Boston

New York - New York

dbontoLargestCity

dbontoLargestCitydbontoCapital

dbontoLargestCity

Candidates dbontoCapital

dbontoLargestCity

dbontoCapital Score0

dbontoCapital Score1

dbontoLargestCity Score3

31

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

32

Annotating web tables for the Semantic Web

Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion

ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion

dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity

dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity

dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-

owlAdminstrativeRegion

dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity

34

Results

35

Dataset summary

Number of Tables 15

Total Number of rows 199

Total Number of columns 56 (52)

Total Number of entities 639 (611)

The number in the brackets indicates excluding columns that contained numbers

36

Dataset summary

37

Dataset summary

38

Evaluation for class label predictions

39

Evaluation 1 (MAP)

bull Compared the systemrsquos ranked list of labels against a human ranked list of labels

bull Metric - Mean Average Precision (MAP)

bull Commonly used in the Information Retrieval domain to compare two ranked sets

40

Evaluation 1 (MAP)

41

8076

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 2 (Recall)

Recall gt 06 (75 )

42

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 3 (Correctness)

bull Evaluated whether our predicted class labels were ldquofair and correctrdquo



“To Link or not to Link …”

• The feature vector included the feature vector of the top-ranked entity and two additional features:
  – The SVMrank score of the top-ranked entity
  – The difference in scores between the top two ranked entities
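As a rough illustration of the two additional features, the sketch below appends the top-ranked entity's score and its gap to the runner-up onto that entity's own feature vector. The function name `link_decision_features`, the `(entity, score, features)` tuple layout and the sample numbers are assumptions made for this example; the slides do not specify them.

```python
from typing import List, Sequence, Tuple

# (entity name, ranking score, feature vector): a hypothetical layout.
Candidate = Tuple[str, float, Sequence[float]]


def link_decision_features(candidates: List[Candidate]) -> List[float]:
    """Build the feature vector used to decide whether to link a cell.

    Takes the top-ranked candidate's own features and appends
    (1) its ranking score and (2) the score gap to the second candidate.
    """
    if not candidates:
        raise ValueError("need at least one candidate entity")

    # Sort by ranking score, best first.
    ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
    _, top_score, top_features = ranked[0]

    # With a single candidate there is no runner-up; treating the gap as
    # the full top score is an assumption of this sketch.
    runner_up_score = ranked[1][1] if len(ranked) > 1 else 0.0
    gap = top_score - runner_up_score

    return list(top_features) + [top_score, gap]


# Made-up scores and features for the cell "Baltimore".
candidates = [
    ("Baltimore", 2.5, [0.9, 1.0]),
    ("Baltimore_County", 1.0, [0.6, 0.0]),
]
features = link_decision_features(candidates)
print(features)  # [0.9, 1.0, 2.5, 1.5]
```

A large gap suggests the ranker clearly prefers one entity, while a small gap signals ambiguity, which is presumably why the difference is useful for the link / don't-link decision.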

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

Relation between columns

City        State
Baltimore   Maryland
Boston      Massachusetts
New York    New York

Relation between columns

Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity

Candidate relations: dbonto:Capital, dbonto:LargestCity

Scoring the relations

Maryland - Baltimore: dbonto:LargestCity
Massachusetts - Boston: dbonto:LargestCity, dbonto:Capital
New York - New York: dbonto:LargestCity

Candidates: dbonto:Capital, dbonto:LargestCity

dbonto:Capital, Score: 0
dbonto:Capital, Score: 1
dbonto:LargestCity, Score: 3
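The final scores on this slide are consistent with a simple voting scheme in which each candidate relation earns one point for every row where it holds. The sketch below reproduces that count under this assumption; `row_relations` and `score_relations` are hypothetical names, and the per-row relations are copied from the slide rather than queried from a knowledge base.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Relations found for each (state, city) row, as listed on the slide.
row_relations: Dict[Tuple[str, str], List[str]] = {
    ("Maryland", "Baltimore"): ["dbonto:LargestCity"],
    ("Massachusetts", "Boston"): ["dbonto:LargestCity", "dbonto:Capital"],
    ("New York", "New York"): ["dbonto:LargestCity"],
}


def score_relations(rows: Dict[Tuple[str, str], List[str]]) -> Counter:
    """Give each candidate relation one point per row in which it holds."""
    scores: Counter = Counter()
    for relations in rows.values():
        # set() guards against counting a relation twice for the same row.
        scores.update(set(relations))
    return scores


scores = score_relations(row_relations)
best_relation, best_score = scores.most_common(1)[0]
print(scores["dbonto:Capital"], scores["dbonto:LargestCity"])  # 1 3
print(best_relation)  # dbonto:LargestCity
```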

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

Annotating web tables for the Semantic Web

Table as linked RDF

@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix dbpedia: <http://dbpedia.org/resource/> .
@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
@prefix dbpprop: <http://dbpedia.org/property/> .

"City"@en is rdfs:label of dbpedia-owl:City .
"State"@en is rdfs:label of dbpedia-owl:AdministrativeRegion .

"Baltimore"@en is rdfs:label of dbpedia:Baltimore .
dbpedia:Baltimore a dbpedia-owl:City .
"MD"@en is rdfs:label of dbpedia:Maryland .
dbpedia:Maryland a dbpedia-owl:AdministrativeRegion .

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion .
dbpprop:LargestCity rdfs:range dbpedia-owl:City .

"City"@en is rdfs:label of dbpedia-owl:City
    "City" is the common human-readable name for the class dbpedia-owl:City

dbpedia:Baltimore a dbpedia-owl:City
    dbpedia:Baltimore is of type (an instance of) dbpedia-owl:City

dbpprop:LargestCity rdfs:domain dbpedia-owl:AdministrativeRegion
    The subjects of triples using the property have to be instances of dbpedia-owl:AdministrativeRegion

dbpprop:LargestCity rdfs:range dbpedia-owl:City
    The objects of triples using the property have to be instances of dbpedia-owl:City
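To make the mapping from an interpreted table to triples concrete, here is a minimal sketch that prints Turtle lines for column headers and linked cells. It is only an illustration: `table_to_turtle` and its two dictionary arguments are hypothetical, the output uses the plain `subject predicate object` form instead of the `is ... of` form shown on the slide, and literals are not escaped.

```python
from typing import Dict, List

PREFIXES = {
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def table_to_turtle(
    column_classes: Dict[str, str],
    cell_links: Dict[str, Dict[str, str]],
) -> List[str]:
    """Return Turtle lines for column headers and linked cells.

    column_classes maps a header to its predicted class;
    cell_links maps a header to {cell text: linked entity}.
    """
    lines = [f"@prefix {p}: <{uri}> ." for p, uri in PREFIXES.items()]
    for header, cls in column_classes.items():
        # The header string becomes a label of the predicted class.
        lines.append(f'{cls} rdfs:label "{header}"@en .')
        for text, entity in cell_links.get(header, {}).items():
            # Each cell string labels its entity, typed with the column class.
            lines.append(f'{entity} rdfs:label "{text}"@en .')
            lines.append(f"{entity} a {cls} .")
    return lines


turtle = table_to_turtle(
    {"City": "dbpedia-owl:City", "State": "dbpedia-owl:AdministrativeRegion"},
    {"City": {"Baltimore": "dbpedia:Baltimore"}, "State": {"MD": "dbpedia:Maryland"}},
)
print("\n".join(turtle))
```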

Results

Dataset summary

Number of tables            15
Total number of rows        199
Total number of columns     56 (52)
Total number of entities    639 (611)

The numbers in brackets exclude columns that contained numbers.

[Figures: Dataset summary (2) and Dataset summary (3)]

Evaluation for class label predictions

Evaluation 1 (MAP)

• Compared the system’s ranked list of labels against a human-ranked list of labels
• Metric: Mean Average Precision (MAP)
• Commonly used in the Information Retrieval domain to compare two ranked sets

Evaluation 1 (MAP)

80.76%

System ranked:     1. Person      2. Politician   3. President
Evaluator ranked:  1. President   2. Politician   3. OfficeHolder
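For readers unfamiliar with the metric, the sketch below computes average precision for the example lists above and averages it over columns to get MAP. It uses the standard Information Retrieval definition and treats the evaluator's labels as an unordered set of relevant labels; whether the authors computed it in exactly this way is an assumption, and the function names are hypothetical.

```python
from typing import List


def average_precision(system: List[str], relevant: List[str]) -> float:
    """Average precision of a ranked list against a set of relevant labels."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, label in enumerate(system, start=1):
        if label in relevant_set:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant_set)


def mean_average_precision(runs: List[tuple]) -> float:
    """Mean of the per-column average precision values."""
    return sum(average_precision(s, r) for s, r in runs) / len(runs)


system_ranked = ["Person", "Politician", "President"]
evaluator_ranked = ["President", "Politician", "OfficeHolder"]
ap = average_precision(system_ranked, evaluator_ranked)
print(round(ap, 4))  # 0.3889
```

Here the system's first label is not in the evaluator's list, so the two matches at ranks 2 and 3 contribute precisions of 1/2 and 2/3, and dividing by the three relevant labels gives about 0.39.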

Evaluation 2 (Recall)

Recall > 0.6 (75%)

System ranked:     1. Person      2. Politician   3. President
Evaluator ranked:  1. President   2. Politician   3. OfficeHolder
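As a small worked example, recall for the two lists above can be computed as the fraction of the evaluator's labels that also appear in the system's list. The helper name `recall` and the choice to ignore rank order are assumptions of this sketch.

```python
from typing import List


def recall(system: List[str], relevant: List[str]) -> float:
    """Fraction of the evaluator's labels that the system also returned."""
    relevant_set = set(relevant)
    if not relevant_set:
        return 0.0
    return len(relevant_set & set(system)) / len(relevant_set)


system_ranked = ["Person", "Politician", "President"]
evaluator_ranked = ["President", "Politician", "OfficeHolder"]
r = recall(system_ranked, evaluator_ranked)
print(round(r, 3), r > 0.6)  # 0.667 True
```

Two of the evaluator's three labels (President and Politician) are found, so this column would clear the 0.6 threshold.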

Evaluation 3 (Correctness)

• Evaluated whether our predicted class labels were “fair and correct”
• A class label may not be the most accurate one but may still be correct
  – E.g., dbpedia:PopulatedPlace is not the most accurate label for a column of cities, but it is still a correct one
• Three human judges evaluated our predicted class labels

Evaluation 3 (Correctness)

• A category-wise breakdown of class label correctness

Overall accuracy: 76.92%

Column: Nationality, Prediction: MilitaryConflict
Column: Birth Place, Prediction: PopulatedPlace

Evaluation for linking table cells to entities

Category-wise accuracy for linking table cells

Overall accuracy: 66.12%

Relation between columns

• Idea: ask human evaluators to identify relations between columns in a given table
• Pilot experiment: asked three evaluators to annotate five random tables from our dataset
• The evaluators identified 20 relations
• Our accuracy: 5 out of the 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore | MD | S. Rawlings | …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon, S. Rawlings
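One way to picture this step is as a comparison between the triples derived from a table and the triples already in the knowledge base. The sketch below is a simplified illustration using exact matching on in-memory sets; `split_by_support` is a hypothetical helper, and no real DBpedia query is made.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def split_by_support(
    table_triples: Set[Triple], known_triples: Set[Triple]
) -> Tuple[Set[Triple], Set[Triple]]:
    """Separate table-derived triples into confirmed and new ones."""
    confirmed = table_triples & known_triples  # already in the knowledge base
    new = table_triples - known_triples        # candidates for new facts
    return confirmed, new


# What the knowledge base holds (a stand-in for a DBpedia lookup).
known = {("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Dixon")}

# What the table row "Baltimore | MD | S. Rawlings" yields after interpretation.
from_table = {("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings")}

confirmed, new = split_by_support(from_table, known)
print(confirmed)  # set()
print(new)
```

In this example nothing is confirmed, and the S. Rawlings triple is reported as a candidate new fact for the same property.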

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie
• Howard County rdf:type dbpedia:AdministrativeRegion
• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy

Conclusion

• There’s a lot of data stored in HTML tables, spreadsheets, databases and documents
• We presented an automatic framework to interpret such data
• We believe our work will contribute to materializing the Web of Data vision





Scoring the relations

Maryland - Baltimore

Massachusetts - Boston

New York - New York

dbontoLargestCity

dbontoLargestCitydbontoCapital

dbontoLargestCity

Candidates dbontoCapital

dbontoLargestCity

dbontoCapital Score0

dbontoCapital Score1

dbontoLargestCity Score3

31

T2LD Framework

Predict Class for Columns

Linking the table cells

Identify and Discover relations

32

Annotating web tables for the Semantic Web

Table as linked RDFprefix rdfs lthttpwwww3org200001rdf-schemagt prefix dbpedia lthttpdbpediaorgresourcegt prefix dbpedia-owl lthttpdbpediaorgontologygt prefix dbpprop lthttpdbpediaorgpropertygt

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoStaterdquoen is rdfslabel of dbpedia-owlAdminstrativeRegion

ldquoBaltimorerdquoen is rdfslabel of dbpediaBaltimore dbpediaBaltimore a dbpedia-owlCity ldquoMDrdquoen is rdfslabel of dbpediaMaryland dbpediaMaryland a dbpedia-owlAdministrativeRegion

dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion dbppropLargestCity rdfsrange dbpedia-owlCity

ldquoCityrdquoen is rdfslabel of dbpedia-owlCity ldquoCityrdquo is the common human name for the class dbpedia-owlCity

dbpediaBaltimore a dbpedia-owlCity dbpediaBaltimore is a type (instance) dbpedia-owlCity

dbppropLargestCity rdfsdomain dbpedia-owlAdminstrativeRegion The subjects of the triples using the property have to be instances of dbpedia-

owlAdminstrativeRegion

dbppropLargestCity rdfsrange dbpedia-owlCity The objects of the triples using the property have to be instances of dbpedia-owlCity

34

Results

35

Dataset summary

Number of Tables 15

Total Number of rows 199

Total Number of columns 56 (52)

Total Number of entities 639 (611)

The number in the brackets indicates excluding columns that contained numbers

36

Dataset summary

37

Dataset summary

38

Evaluation for class label predictions

39

Evaluation 1 (MAP)

bull Compared the systemrsquos ranked list of labels against a human ranked list of labels

bull Metric - Mean Average Precision (MAP)

bull Commonly used in the Information Retrieval domain to compare two ranked sets

40

Evaluation 1 (MAP)

41

8076

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 2 (Recall)

Recall gt 06 (75 )

42

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 3 (Correctness)

bull Evaluated whether our predicted class labels were ldquofair and correctrdquo

bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a

correct label for column of cities

bull Three human judges evaluated our predicted class labels

43

Evaluation 3 (Correctness)

bull A category-wise breakdown for class label correctnessOverall

Accuracy 7692

44

Column ndash NationalityPrediction ndash MilitaryConflict

Column ndash Birth PlacePrediction ndash PopulatedPlace

Evaluation for linking table cells to entities

45

Category-wise accuracy for linking table cells

Overall Accuracy 6612

46

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S. Rawlings, …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S. Dixon

dbpprop:LeaderName – S. Dixon, S. Rawlings

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
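The two future-work ideas above, confirming facts already in the linked data and discovering new ones, can be sketched as a comparison between triples read off a table and triples in the knowledge base. This is only an illustration under stated assumptions: `split_facts` is a hypothetical helper, the triples are toy data taken from the slides' examples, and matching is by exact equality, whereas a real system would need entity linking and fuzzy matching.

```python
def split_facts(table_triples, kb_triples):
    """Split table-derived triples into those the knowledge base confirms and new candidates."""
    kb = set(kb_triples)
    confirmed = [t for t in table_triples if t in kb]
    candidates = [t for t in table_triples if t not in kb]
    return confirmed, candidates


# Toy knowledge base (assumed contents, based on the slides' Baltimore example).
kb_triples = {
    ("dbpedia:Baltimore", "rdf:type", "dbpedia-owl:City"),
    ("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Dixon"),
}

# Triples read off the table row "Baltimore, MD, S. Rawlings".
table_triples = [
    ("dbpedia:Baltimore", "rdf:type", "dbpedia-owl:City"),
    ("dbpedia:Baltimore", "dbpprop:LeaderName", "S. Rawlings"),
]

confirmed, candidates = split_facts(table_triples, kb_triples)
print(confirmed)   # the type fact is already in the knowledge base
print(candidates)  # the S. Rawlings fact is a candidate new fact
```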

Conclusion

• There’s a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the Web of Data vision

Dataset summary

37

Dataset summary

38

Evaluation for class label predictions

39

Evaluation 1 (MAP)

bull Compared the systemrsquos ranked list of labels against a human ranked list of labels

bull Metric - Mean Average Precision (MAP)

bull Commonly used in the Information Retrieval domain to compare two ranked sets

40

Evaluation 1 (MAP)

41

8076

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 2 (Recall)

Recall gt 06 (75 )

42

System Ranked1 Person2 Politician3 President

Evaluator Ranked1 President2 Politician3 OfficeHolder

Evaluation 3 (Correctness)

bull Evaluated whether our predicted class labels were ldquofair and correctrdquo

bull Class label may not be the most accurate one but may be correct ndash Eg dbpediaPopulatedPlace is not the most accurate but still a

correct label for column of cities

bull Three human judges evaluated our predicted class labels

43

Evaluation 3 (Correctness)

bull A category-wise breakdown for class label correctnessOverall

Accuracy 7692

44

Column ndash NationalityPrediction ndash MilitaryConflict

Column ndash Birth PlacePrediction ndash PopulatedPlace

Evaluation for linking table cells to entities

45

Category-wise accuracy for linking table cells

Overall Accuracy 6612

46

Relation between columns

• Idea: Ask human evaluators to identify relations between columns in a given table

• Pilot experiment: Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy: 5 out of 20 (25%) were correct

Future Work

Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Baltimore, MD, S.Rawlings, …

For Baltimore, DBpedia says:

dbpprop:LeaderName – S.Dixon

dbpprop:LeaderName – S.Dixon, S.Rawlings
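
The slide lists this only as future work, so the following is one possible reading rather than the framework's actual behaviour: a value found in a table is compared with what the knowledge base already records for the same entity and property, and an unseen value is kept as a candidate fact. The dictionary `kb` and the function `compare_with_kb` are hypothetical stand-ins for a real DBpedia lookup.

```python
def compare_with_kb(kb_values, table_value):
    """Return 'confirmed' if the knowledge base already lists the value,
    otherwise 'candidate' (a possible new fact)."""
    if table_value in kb_values:
        return "confirmed"
    return "candidate"


# Hypothetical in-memory stand-in for DBpedia, using the slide's example.
kb = {("Baltimore", "LeaderName"): {"S.Dixon"}}
key = ("Baltimore", "LeaderName")

# The table row lists S.Rawlings as the leader of Baltimore.
status = compare_with_kb(kb[key], "S.Rawlings")
if status == "candidate":
    kb[key].add("S.Rawlings")  # record the newly discovered value

print(status)            # → candidate
print(sorted(kb[key]))   # → ['S.Dixon', 'S.Rawlings']
```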

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
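
The facts above could be written out as RDF in Turtle syntax; the sketch below does this with plain string formatting. The `dbpedia` and `dbpedia-owl` prefixes follow the ones declared earlier in the talk, while the subject names `dbpedia:Inception`, `dbpedia:Howard_County` and `dbpedia:David_Beckham` are assumed for illustration and may not match the real DBpedia resource URIs.

```python
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def to_turtle(triples, prefixes=PREFIXES):
    """Serialise (subject, predicate, object) prefixed names as Turtle text."""
    lines = [f"@prefix {name}: <{uri}> ." for name, uri in prefixes.items()]
    lines.append("")  # blank line between prefixes and statements
    lines.extend(f"{s} {p} {o} ." for s, p, o in triples)
    return "\n".join(lines)


# Subject names are hypothetical; predicates and objects follow the slide.
triples = [
    ("dbpedia:Inception", "rdf:type", "dbpedia-owl:Movie"),
    ("dbpedia:Howard_County", "rdf:type", "dbpedia:AdministrativeRegion"),
    ("dbpedia:David_Beckham", "dbpedia-owl:Team", "dbpedia:Los_Angeles_Galaxy"),
]

turtle = to_turtle(triples)
print(turtle)
```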

Conclusion

• There's a lot of data stored in HTML tables, spreadsheets, databases and documents

• We presented an automatic framework to interpret such data

• We believe our work will contribute to materializing the Web of Data vision


Evaluation for linking table cells to entities


Category-wise accuracy for linking table cells

Overall accuracy: 66.12%

Relation between columns

• Idea – Ask human evaluators to identify relations between columns in a given table

• Pilot experiment – Asked three evaluators to annotate five random tables from our dataset

• Evaluators identified 20 relations

• Our accuracy – 5 out of 20 (25%) were correct

Future Work


Current

Automatic/Semi-automatic template learning

Confirming LD Facts

Table row: Baltimore, MD, S.Rawlings, …

For Baltimore, DBpedia says:

Dbpprop:LeaderName – S.Dixon

After confirming against the table:

Dbpprop:LeaderName – S.Dixon, S.Rawlings
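
The check on this slide can be sketched in a few lines of Python. This is only an illustration, not the T2LD implementation: the knowledge base is a hypothetical in-memory dictionary standing in for DBpedia, and the function names `check_fact` and `add_fact` are made up for the example.

```python
# Hypothetical stand-in for DBpedia: (subject, property) -> set of known values.
# The single entry mirrors the slide; a real system would query the live KB.
KB_FACTS = {
    ("Baltimore", "Dbpprop:LeaderName"): {"S.Dixon"},
}


def check_fact(kb, subject, prop, value):
    """Classify a fact read from a table row against the knowledge base.

    "confirmed": the KB already holds this value for the property.
    "new":       the KB knows the property but not this value.
    "unknown":   the KB has nothing for this (subject, property) pair.
    """
    known = kb.get((subject, prop))
    if not known:
        return "unknown"
    return "confirmed" if value in known else "new"


def add_fact(kb, subject, prop, value):
    """Record a value for (subject, property) and return the updated value set."""
    kb.setdefault((subject, prop), set()).add(value)
    return kb[(subject, prop)]
```

With the slide's data, `check_fact(KB_FACTS, "Baltimore", "Dbpprop:LeaderName", "S.Rawlings")` returns `"new"`, which is the case where the table suggests a value the knowledge base does not yet hold.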

Discover knowledge relations

• Inception rdf:type dbpedia-owl:Movie

• Howard County rdf:type dbpedia:AdministrativeRegion

• David Beckham dbpedia-owl:Team dbpedia:Los_Angeles_Galaxy
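
As a rough sketch of how such discovered facts could be written out as RDF, the Python below expands prefixed names into full URIs and prints one N-Triples line per fact. The `dbpedia` and `dbpedia-owl` prefixes are the ones declared earlier in the talk; the helper names `expand` and `to_ntriple`, and the assumption that the subjects are the DBpedia resources `dbpedia:Inception` and `dbpedia:David_Beckham`, are illustrative and not taken from the slides.

```python
# Namespace prefixes; rdf is the standard RDF namespace.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dbpedia": "http://dbpedia.org/resource/",
    "dbpedia-owl": "http://dbpedia.org/ontology/",
}


def expand(curie):
    """Turn a prefixed name such as 'dbpedia:Inception' into '<full URI>'."""
    prefix, _, local = curie.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


def to_ntriple(subject, predicate, obj):
    """Serialize one (subject, predicate, object) fact as an N-Triples line."""
    return f"{expand(subject)} {expand(predicate)} {expand(obj)} ."


# Facts from the slide, with subjects assumed to be DBpedia resources.
DISCOVERED = [
    ("dbpedia:Inception", "rdf:type", "dbpedia-owl:Movie"),
    ("dbpedia:David_Beckham", "dbpedia-owl:Team", "dbpedia:Los_Angeles_Galaxy"),
]

if __name__ == "__main__":
    for fact in DISCOVERED:
        print(to_ntriple(*fact))
```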








