Transcript
Page 1: Alessio Bosca: Linked Data for Content Analytics in CELI

Linked Data for content analytics in Celi
Semantics 2014 - Leipzig
Alessio Bosca

Page 2: Alessio Bosca: Linked Data for Content Analytics in CELI

Agenda
•  Presentation of Celi
•  Technologies (and what we do with them)
•  Focus on LOD for content analytics in Celi
•  … what we’d like to do

Page 3: Alessio Bosca: Linked Data for Content Analytics in CELI

1999 CELI srl was born
2002 Speech Technology
2006 BlogMeter
2010 Milan, Rome, Trento
2011 Cross Library
2013 Korean Market

Page 4: Alessio Bosca: Linked Data for Content Analytics in CELI

4 Offices: Torino, Milano, Trento, Roma
6 Markets: Italy, Belgium, France, Spain, Korea, Poland
50 Employees + Collaborators
>100 Active clients
4 Business branches: NLP components, Speech technology, Social Media Intelligence, Digital Humanities
15 Years of experience

Page 5: Alessio Bosca: Linked Data for Content Analytics in CELI

Relationships with the scientific community
>50 Published papers
15 Research projects
6 Agreements with research centers: Scuola Normale Superiore, Università di Torino, Università di Pisa, Università di Trento, Fondazione Bruno Kessler, Politecnico di Milano

Page 6: Alessio Bosca: Linked Data for Content Analytics in CELI

Core technology
•  opinion mining, mood and sentiment analysis
•  language identification
•  normalization
•  tokenization
•  NSW processing
•  morphological analysis
•  disambiguation
•  chunking and phrasing
•  phonetic transcription with word stress
•  semantic clustering
•  automatic classification
•  named entities

Page 7: Alessio Bosca: Linked Data for Content Analytics in CELI

Techs

Guava

Kestrel

Virtuoso OpenSource

Page 8: Alessio Bosca: Linked Data for Content Analytics in CELI

Clients

Speech Technology Semantic Solutions Social Media Monitoring

Page 9: Alessio Bosca: Linked Data for Content Analytics in CELI

Linked (and/or Open) Data

[Diagram: Linked Data, Open Data, and their relation to LOD (marked “?”)]

Page 10: Alessio Bosca: Linked Data for Content Analytics in CELI

Private Sector: how Celi exploits L(O)D

•  as user: LOD as linguistic resources for NER, content enrichment, machine linking, discovery search…
•  as provider for the Public Administration (PA): publishing, data integration
•  internal use (e.g. assets management)
•  crafting of RDF artifacts for custom projects and applications

Page 11: Alessio Bosca: Linked Data for Content Analytics in CELI

LOD for NER

•  GENDER GUESSER
•  LOCATION GUESSER
•  ENTITY LINKER
•  ETC.

[Architecture diagram; components: DUMP, INDEXER, CELI TRIPLE STORES, INDEXES, SPARQL QUERIES, SEARCHER, Linguistic Analysis, CUSTOM RDF, WEBAPPS]
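
The dump → indexer → index → searcher flow named on this slide can be illustrated with a minimal sketch. It is not Celi's implementation: the triples, the `ex:` identifiers, the type names, and the functions `build_index` and `link_entities` are all hypothetical, and a real system would match multi-word mentions and rank candidates.

```python
from collections import defaultdict

# Hypothetical triples standing in for an already parsed LOD dump.
TRIPLES = [
    ("ex:Turin", "rdfs:label", "Turin"),
    ("ex:Turin", "rdfs:label", "Torino"),
    ("ex:Turin", "rdf:type", "ex:Place"),
    ("ex:Leipzig", "rdfs:label", "Leipzig"),
    ("ex:Leipzig", "rdf:type", "ex:Place"),
    ("ex:Alessio", "rdfs:label", "Alessio"),
    ("ex:Alessio", "rdf:type", "ex:MaleGivenName"),
]


def build_index(triples):
    """Indexer step: map lowercased labels to entity URIs, and URIs to types."""
    labels = defaultdict(set)
    types = defaultdict(set)
    for subject, predicate, obj in triples:
        if predicate == "rdfs:label":
            labels[obj.lower()].add(subject)
        elif predicate == "rdf:type":
            types[subject].add(obj)
    return labels, types


def link_entities(text, labels, types):
    """Searcher step: look up each whitespace-separated token in the label index."""
    results = []
    for token in text.split():
        mention = token.strip(".,;:!?")
        for uri in sorted(labels.get(mention.lower(), ())):
            results.append((mention, uri, sorted(types.get(uri, ()))))
    return results


labels, types = build_index(TRIPLES)
for mention, uri, entity_types in link_entities("Alessio spoke in Leipzig.", labels, types):
    print(mention, uri, entity_types)
```

In this toy setup the entity type plays the role of the guessers: a type such as the assumed `ex:Place` would feed a location guesser, and `ex:MaleGivenName` a gender guesser.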

Page 12: Alessio Bosca: Linked Data for Content Analytics in CELI

Faceted Semantic Search

Browse through documents and contents

Relations between Facets

Page 13: Alessio Bosca: Linked Data for Content Analytics in CELI

LOD for CLIR

The AGROVOC thesaurus has been used in the Organic.Lingua project for ontology-based CLIR (cross-language information retrieval)
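
As a rough illustration of ontology-based CLIR, the sketch below maps query terms to language-independent concepts and then emits those concepts' labels in the target language. It does not reproduce the Organic.Lingua pipeline: the mini-thesaurus, its concept ids, and the functions `find_concepts` and `translate_query` are assumptions made for the example, not actual AGROVOC data.

```python
import re

# Hypothetical mini-thesaurus: concept id -> {language code: preferred label}.
THESAURUS = {
    "c_0001": {"en": "organic agriculture", "it": "agricoltura biologica"},
    "c_0002": {"en": "soil", "it": "suolo", "fr": "sol"},
}


def find_concepts(query, lang, thesaurus):
    """Return ids of concepts whose label in `lang` occurs as whole words in the query."""
    lowered = query.lower()
    matches = []
    for concept_id, concept_labels in thesaurus.items():
        label = concept_labels.get(lang)
        if label and re.search(r"\b" + re.escape(label) + r"\b", lowered):
            matches.append(concept_id)
    return matches


def translate_query(query, source_lang, target_lang, thesaurus):
    """Map query terms to concepts, then to the concepts' labels in the target language."""
    return [
        thesaurus[concept_id][target_lang]
        for concept_id in find_concepts(query, source_lang, thesaurus)
        if target_lang in thesaurus[concept_id]
    ]


print(translate_query("agricoltura biologica e suolo", "it", "en", THESAURUS))
```

Because matching goes through concepts rather than word-by-word translation, a concept with no label in the target language is simply skipped instead of being mistranslated.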

Page 14: Alessio Bosca: Linked Data for Content Analytics in CELI

Sem-web techs for internal models

Information in the CRUNCHED BOOK is represented using combinations of RDF and GRAPH DBs

Page 15: Alessio Bosca: Linked Data for Content Analytics in CELI

Public Sector: clear process …

•  acquire data
•  set open license
•  open formats
•  publish

Celi for the public sector (CSI Piemonte): the Homer project

Page 16: Alessio Bosca: Linked Data for Content Analytics in CELI

(Public sector contd.) … but …

LACK OF MONEY

LACK OF WILLINGNESS

USE OF “STANDARDS”

… hard problems

OPAQUE DATASETS

POOR RDF/SPARQL SUPPORT

Page 17: Alessio Bosca: Linked Data for Content Analytics in CELI

Why companies’ RDF is not published

•  It reflects customers’ needs
•  It reflects internal data models

HENCE → OVERFITTING

Provocation: it would be neither interesting nor usable

WAYS OUT: having more standard models for particular micro-domains could permit their direct (re)use by private companies (and hence the publication of enhanced versions)

Page 18: Alessio Bosca: Linked Data for Content Analytics in CELI

Recipes

•  Public Sector: use “true” LOD technologies (RDF dumps and SPARQL endpoints)
•  Private companies: use standard data models, internally and for their artifacts
•  OpenData Community: please stress the “linked” in LOD!

The success of LOD is bound to the use of Linked Data (as a technology).
The use of LD in the Private Sector will feed back positively on the spread of the necessary expertise and awareness in the Public Sector too.
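
To make "RDF dump" concrete, here is a minimal sketch that serializes a few triples as N-Triples lines, one common dump format. The dataset URI and its values are invented for illustration, and `nt_literal` and `to_ntriples` are hypothetical helpers that cover only the basic string escapes; a production dump would normally come from an RDF library or the triple store itself.

```python
def nt_literal(value, lang=None):
    """Format a plain string as an N-Triples literal, escaping \\, ", newline and CR."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def to_ntriples(triples):
    """Serialize (subject IRI, predicate IRI, object) tuples, one statement per line.

    An object is either ("iri", value) or ("literal", value, language tag or None).
    """
    lines = []
    for subject, predicate, obj in triples:
        if obj[0] == "iri":
            rendered = f"<{obj[1]}>"
        else:
            rendered = nt_literal(obj[1], obj[2])
        lines.append(f"<{subject}> <{predicate}> {rendered} .")
    return "\n".join(lines)


# Illustrative data only: an invented dataset with a label and a license link.
EXAMPLE = [
    (
        "http://example.org/dataset/1",
        "http://www.w3.org/2000/01/rdf-schema#label",
        ("literal", 'Bilancio "2014"', "it"),
    ),
    (
        "http://example.org/dataset/1",
        "http://purl.org/dc/terms/license",
        ("iri", "http://creativecommons.org/licenses/by/4.0/"),
    ),
]

print(to_ntriples(EXAMPLE))
```

Stating the license as a triple that points to a shared IRI, rather than as free text, is a small instance of the "linked" part the slide asks for.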

Page 19: Alessio Bosca: Linked Data for Content Analytics in CELI

Thank You!

