Semiotics in Spreadsheets: Enhancing Semantic Interoperability
Ivelize Rocha Bernardo, André Santanchè

Post on 14-Jul-2015

Page 2: Semiotics in spreadsheets

Outline

• Motivation
• Research Problems
• Related Work
• What I did in my Master's degree
• Limitations of the Master's degree proposal
• Plans for the PhD

Page 4: Semiotics in spreadsheets

Motivation

A large amount of information is stored in spreadsheets [Syed et al., 2010]

Why?

• They are intuitive
• They are highly flexible and so serve diverse needs

Page 5: Semiotics in spreadsheets

Motivation

However, they were designed for:
• Isolated use
• Human reading

Page 6: Semiotics in spreadsheets

Research Goal

The main goal of our research is to promote richer semantic interoperability among spreadsheets

Page 8: Semiotics in spreadsheets

Interoperability

(Ouksel & Sheth 1999)
• system interoperability
• syntactic interoperability
• structural interoperability
• semantic interoperability

(Tolk 2006)
• no interoperability
• technical interoperability
• syntactic interoperability
• semantic interoperability
• pragmatic interoperability
• dynamic interoperability
• conceptual interoperability

Page 10: Semiotics in spreadsheets

Interoperability

semantic interoperability

• semantic interoperability
• pragmatic interoperability
• dynamic interoperability
• conceptual interoperability

Data Interpretation

Page 13: Semiotics in spreadsheets

Which elements must be considered in this interpretation process?

Unity Interpretation

Page 14: Semiotics in spreadsheets

Related Work

isolated label

(Han et al., 2008) - RDF123: from spreadsheets to RDF, The Semantic Web. Lecture Notes in Computer Science, vol. 5318. Springer

(Langegger & Wolfram, 2009) - XLWrap: Querying and Integrating Arbitrary Spreadsheets with SPARQL, The Semantic Web. Lecture Notes in Computer Science, vol. 5823. Springer

Page 15: Semiotics in spreadsheets

Related Work

template

(Abraham & Erwig, 2006) - Inferring Templates from Spreadsheets, Proceedings of the International Conference on Software Engineering

Page 16: Semiotics in spreadsheets

Related Work

instances

(Zhao et al., 2010) - A spreadsheet system based on data semantic object, IEEE International Conference on Information Management and Engineering

Page 17: Semiotics in spreadsheets

Related Work

isolated label associated to linked data

(Syed et al., 2010) - Exploiting a Web of Semantic Data for Interpreting Tables, Proceedings of the Web Science Conference

Page 18: Semiotics in spreadsheets

Related Work

correlation of labels associated to linked data

(Venetis et al., 2011) - Recovering Semantics of Tables on the Web, Proceedings of the VLDB Endowment

(Mulwad et al., 2010) - Using linked data to interpret tables, Proceedings of the International Workshop on Consuming Linked Data

Page 19: Semiotics in spreadsheets

Related Work

correlation between several spreadsheet elements associated to linked data

(Limaye, 2010) - Annotating and Searching Web Tables Using Entities, Proceedings of the VLDB Endowment

Page 20: Semiotics in spreadsheets

How far can the system interpret, considering labels and their correlations?

Page 21: Semiotics in spreadsheets

How different are they in fact?

Page 25: Semiotics in spreadsheets

What I did in my Master's degree

Page 26: Semiotics in spreadsheets

Research Strategy

1. To identify construction patterns followed by biologists during the creation of these spreadsheets
2. To verify whether these construction patterns could lead us to recognize the spreadsheet's purpose
3. To achieve semantic interoperability among these spreadsheets

Page 27: Semiotics in spreadsheets

How to identify Construction Patterns

[Figure, built up over slides 27-32: annotations "what" (twice), "where", and "when"]

Page 33: Semiotics in spreadsheets

Construction Patterns

[Figure, built up over slides 33-37: labels "catalogue" and "collection"]

Page 38: Semiotics in spreadsheets

SciSpread System

Page 43: Semiotics in spreadsheets

Architecture Evaluation

Automatic analysis of 11,150 spreadsheets:
• the system recognized 1,151 spreadsheets
  – 806 spreadsheets were classified as catalogue
  – 345 spreadsheets were classified as collection

Total: 748,459 records analyzed
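The counts on this slide can be cross-checked with a few lines of arithmetic. The sketch below only restates the reported numbers; the variable names are ours, and the derived shares (roughly 10% recognized, split about 70/30 between catalogue and collection) are computed here rather than quoted from the slides.

```python
# Counts reported on the slide.
total_spreadsheets = 11_150
recognized = 1_151
catalogue = 806
collection = 345

# The two classes should account for every recognized spreadsheet.
assert catalogue + collection == recognized

# Derived shares (not stated on the slide, computed from the counts above).
recognized_share = recognized / total_spreadsheets
catalogue_share = catalogue / recognized
collection_share = collection / recognized

print(f"recognized: {recognized_share:.1%}")
print(f"catalogue:  {catalogue_share:.1%}")
print(f"collection: {collection_share:.1%}")
```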

Page 44: Semiotics in spreadsheets

Architecture Evaluation - Results

• A random subset of 1,203 spreadsheets was selected to evaluate precision/recall
  – Precision: 0.84
  – Recall: 0.76
  – Specificity: 0.95
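For readers less familiar with these three measures, here is a minimal sketch of how they are usually computed from a confusion matrix. The slides do not give the underlying counts, so the class name `ConfusionCounts` and the example numbers are hypothetical and chosen only to make the arithmetic easy to follow; they do not reproduce the reported 0.84 / 0.76 / 0.95.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing system output with a manual classification."""
    tp: int  # recognized, and correctly so
    fp: int  # recognized, but should not have been
    fn: int  # not recognized, but should have been
    tn: int  # correctly left unrecognized


def precision(c: ConfusionCounts) -> float:
    """Share of recognized spreadsheets that were recognized correctly."""
    return c.tp / (c.tp + c.fp)


def recall(c: ConfusionCounts) -> float:
    """Share of relevant spreadsheets that the system recognized."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Share of irrelevant spreadsheets that the system correctly rejected."""
    return c.tn / (c.tn + c.fp)


# Hypothetical counts for illustration only; they are NOT the study's data.
example = ConfusionCounts(tp=80, fp=20, fn=20, tn=180)
print(precision(example), recall(example), specificity(example))
```

A high specificity next to a lower recall, as reported on the slide, is the pattern one would expect from a conservative recognizer: it rarely accepts a spreadsheet it should reject, at the cost of missing some it should accept.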

Page 45: Semiotics in spreadsheets

Limitations of the Master's Degree Proposal

Page 46: Semiotics in spreadsheets

Main Limitations

● Single Domain
  ○ Specific spreadsheets (catalogue and collection)
● Lack of a Model to represent construction patterns
  ○ afterwards, models for construction patterns isolated from each other
● Linking labels to ontologies
  ○ not able to aggregate different labels belonging to the same concept
  ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data

Page 47: Semiotics in spreadsheets

● Single Domain
  ○ Specific spreadsheets (catalogue and collection)
● Lack of a Model to represent construction patterns
  ○ afterwards, models for construction patterns isolated from each other
● Linking labels to ontologies
  ○ not able to aggregate different labels belonging to the same concept
  ○ the ontology was selected by us; it is not necessarily the best representation for the spreadsheets' data

● Multiple Domains
● Model as an association network
  ○ relates elements and concepts of several spreadsheets
● Linking spreadsheet structure to ontologies
  ○ the link is made between concepts
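One possible reading of "model as an association network" is a graph whose nodes are spreadsheet labels and whose weighted edges record how often one label directly follows another across several spreadsheets. The sketch below illustrates that reading only; the function name `build_association_network`, the `Start`/`End` sentinels, and the sample header rows are assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from typing import Iterable

START, END = "Start", "End"


def build_association_network(
    label_sequences: Iterable[list[str]],
) -> Counter:
    """Count how often one label directly follows another across spreadsheets.

    Each spreadsheet contributes one path: Start -> label_1 -> ... -> End.
    The result maps (source, target) edges to the number of spreadsheets
    paths that traverse them.
    """
    edges: Counter = Counter()
    for labels in label_sequences:
        path = [START, *labels, END]
        for source, target in zip(path, path[1:]):
            edges[(source, target)] += 1
    return edges


# Hypothetical header rows, loosely based on labels visible in the slides.
sheets = [
    ["title", "org.", "genotype"],
    ["title", "org.", "stra."],
]
network = build_association_network(sheets)

# Edges shared by both sheets get weight 2; divergent edges get weight 1.
for edge, weight in sorted(network.items()):
    print(edge, weight)
```

Merging every spreadsheet into one weighted graph is what lets elements of several spreadsheets be related to each other, in contrast to keeping one isolated pattern model per spreadsheet type.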

Page 48: Semiotics in spreadsheets

Plans for my PhD

Page 52: Semiotics in spreadsheets

[Figure, built up over slides 52-62: a network running from "Start" to "End" through the nodes SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, and tre2.tem., where each of the two "tre" nodes carries tre.val., SD, and Unit. Later builds add the values MOSES, M_MZ_sample1, ura, Saccharomyces_cerevisiae, 4932, CEN.PK-113-7D, ura3, and "6,5 0,1 37 0,5 oC".]

Page 63: Semiotics in spreadsheets

Semantic Interoperability among Spreadsheets

Page 64: Semiotics in spreadsheets

[Figure, built up over slides 64-81: a network running from "Start" to "End" through the nodes SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, and tre2.tem., with tre.val., SD, and Unit attached twice. Successive builds add the nodes ID, time, rel.glu., genotype, and trea.; the final builds add the annotations "Spreadsheet Purpose" and "Spreadsheet Domain".]

Page 82: Semiotics in spreadsheets

Data Model

Spreadsheets Semiotic Sign

• signifier: structural form
• signified: spreadsheet purpose + semantic spreadsheet data
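The signifier/signified split can be made concrete as a small data structure. The sketch below is an illustration under our own assumptions: the class names, the reduction of "structural form" to a tuple of header labels, and the example purpose and concept mapping are all hypothetical and do not come from the slides.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Signifier:
    """The structural form of a spreadsheet, here reduced to its header labels."""
    labels: tuple


@dataclass(frozen=True)
class Signified:
    """What the structure stands for: a purpose plus semantically typed data."""
    purpose: str
    concepts: dict = field(default_factory=dict)  # label -> concept name


@dataclass(frozen=True)
class SpreadsheetSign:
    """A spreadsheet viewed as a semiotic sign: form paired with meaning."""
    signifier: Signifier
    signified: Signified


sign = SpreadsheetSign(
    signifier=Signifier(labels=("org.", "NCBIID")),
    signified=Signified(
        purpose="catalogue",            # hypothetical assignment
        concepts={"org.": "organism"},  # hypothetical label-to-concept link
    ),
)
print(sign.signified.purpose, sign.signifier.labels)
```

Keeping the two halves as separate types mirrors the slide: the same structural form could in principle be paired with different purposes, and the pairing itself is the thing to be modeled.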

Page 85: Semiotics in spreadsheets

Architecture

Page 86: Semiotics in spreadsheets

Research Challenge

How to devise different domains when the networks are interconnected?

[Figure: the network from "Start" to "End" (SEEK, proj., title, nam., org., NCBIID, stra., genenam., Mod.type, phe., com., tre1.ph, tre2.tem., tre.val., SD, Unit, ID, time, rel.glu., genotype, trea.) shown together with a second network "Start", X, Y, Z; each is annotated with "Spreadsheet Purpose" and "Spreadsheet Domain".]

Page 87: Semiotics in spreadsheets

Research Questions

• When can spreadsheets be considered to have the same purpose?
• Is there a canonical representation among spreadsheets of the same purpose?
• Is it possible to define a canonical representation for a spreadsheet group?
• Can this representation be used to predict spreadsheets of a given purpose?

Page 88: Semiotics in spreadsheets

Acknowledgements

● Laboratory of Information Systems (LIS)
● UNICAMP
● FAPESP
● Microsoft Research FAPESP Virtual Institute (NavScales project)
● CNPq (MuZOO Project and PRONEX-FAPESP)
● INCT in Web Science (CNPq 557.128/2009-9)
● CAPES

Page 89: Semiotics in spreadsheets

Thank you for your attention!