LINKING STANFORD TYPED DEPENDENCIES TO SUPPORT TEXT ANALYTICS Fouad Zablith, Ibrahim H. Osman American University of Beirut

Upload: fzablith

Post on 21-Jul-2015


Page 1: Linking Stanford Typed Dependencies to Support Text Analytics

LINKING STANFORD TYPED DEPENDENCIES TO SUPPORT TEXT ANALYTICS

Fouad Zablith, Ibrahim H. Osman

American University of Beirut

Page 2: Linking Stanford Typed Dependencies to Support Text Analytics

Problem

• Text documents naturally include dependency relations among textual elements

• Such dependencies enable readers to cognitively infer the flow of thoughts and how the various elements are semantically affected

• Automatically identifying textual dependencies has been the focus of various approaches. However, we observe that aggregating, accessing and reusing dependencies for further processing is still a challenge

Page 3: Linking Stanford Typed Dependencies to Support Text Analytics

Question

• Our aim is to answer the following question: how can we make text dependencies more accessible for consumption and reuse in text analysis?

• For that we focus on the following requirements:

  • To have unique references to textual elements

  • To preserve dependency links across text sources

  • To store and serve the data for further consumption
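
The first requirement (unique references to textual elements) can be illustrated with a small sketch. It follows the URI layout visible in the example slide (sentence/<hash> and term/<POS>/<word>/<hash>_<position>), but the base namespace `http://example.org/` and the choice of MD5 over the sentence text are assumptions made here for illustration; the slides do not say how the identifier is computed.

```python
import hashlib
from urllib.parse import quote

# Hypothetical base namespace; the slides only show the prefix "NS:".
NS = "http://example.org/"


def sentence_id(sentence: str) -> str:
    """Return a hex digest of the sentence text (MD5 is an assumption)."""
    return hashlib.md5(sentence.encode("utf-8")).hexdigest()


def sentence_uri(sentence: str) -> str:
    """Mint a sentence URI; the same text always yields the same URI."""
    return f"{NS}sentence/{sentence_id(sentence)}"


def term_uri(sentence: str, pos: str, word: str, position: int) -> str:
    """Mint a term URI from POS tag, surface form and 1-based position."""
    safe_word = quote(word, safe="")  # escape characters not allowed in a path
    return f"{NS}term/{pos}/{safe_word}/{sentence_id(sentence)}_{position}"


text = "It is an efficient service"
print(sentence_uri(text))
print(term_uri(text, "JJ", "efficient", 4))
```

Including the position in the term URI keeps two occurrences of the same word in one sentence distinct, while the shared digest ties every term back to its sentence.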

Page 4: Linking Stanford Typed Dependencies to Support Text Analytics

Approach Overview

[Figure: approach pipeline. Components: POS Tagger, Lexical Parser, RDF Generator, Triple Store, Text Analytics Apps. Stage labels: Input, Text Processing, Linking, Publishing / Reusing.]

Page 6: Linking Stanford Typed Dependencies to Support Text Analytics

RDF Model

[Figure: RDF model. Labels recoverable from the diagram:]

• Nodes: Text/context, Sentence, Sentence text, Term, Label

• Properties: DCT:hasPart, RDFS:hasDescription (Sentence to Sentence text), RDFS:label (Term to Label), STD:… (dependency relations between terms)

• Term classes (RDFS:subClassOf): STD:JJ, STD:VB, STD:NN, STD:CD, STD:…

• Dependency properties (RDFS:subPropertyOf), under STD:Dependent:

  • STD:auxiliary: STD:passiveAuxiliary, STD:copula

  • STD:modifier: STD:adjectivalModifier, STD:quantifierModifier

  • STD:… (all other dependency relations)
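
One practical consequence of modelling relations with RDFS:subPropertyOf is that a query can ask for a general relation and also match its specializations. The sketch below emulates that lookup in plain Python. The parent links are read off the diagram (placing STD:auxiliary and STD:modifier directly under STD:Dependent is an assumption), and the sample triples are invented for illustration.

```python
# Child -> parent links, as read from the RDF model diagram.
SUB_PROPERTY_OF = {
    "STD:auxiliary": "STD:Dependent",
    "STD:passiveAuxiliary": "STD:auxiliary",
    "STD:copula": "STD:auxiliary",
    "STD:modifier": "STD:Dependent",
    "STD:adjectivalModifier": "STD:modifier",
    "STD:quantifierModifier": "STD:modifier",
}


def is_sub_property(prop, ancestor):
    """True if prop equals ancestor or reaches it by following parent links."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def match(triples, ancestor):
    """Keep the triples whose predicate specializes the given relation."""
    return [t for t in triples if is_sub_property(t[1], ancestor)]


# Invented sample triples (subject, predicate, object).
sample = [
    ("term5", "STD:adjectivalModifier", "term4"),
    ("term5", "STD:copula", "term2"),
    ("sent1", "DCT:hasPart", "term5"),
]
print(match(sample, "STD:modifier"))
print(match(sample, "STD:Dependent"))
```

A triple store with RDFS inference enabled would typically perform this expansion itself; the loop only makes the idea explicit.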

Page 7: Linking Stanford Typed Dependencies to Support Text Analytics

Example

[Figure: RDF graph for one sentence. Labels recoverable from the diagram:]

• Sentence: NS:sentence/4c7aa81ba8fbcd3ad42996eb6bac18dc, RDFS:hasDescription "It is an efficient service"

• Terms, linked from the sentence through DCT:hasPart:

  • NS:term/PRP/It/4c7aa81ba8fbcd3ad42996eb6bac18dc_1, RDFS:label "It", ISA STD:PRP

  • NS:term/VBZ/is/4c7aa81ba8fbcd3ad42996eb6bac18dc_2, RDFS:label "Is", ISA STD:VBZ

  • NS:term/DT/an/4c7aa81ba8fbcd3ad42996eb6bac18dc_3, RDFS:label "an", ISA STD:DT

  • NS:term/JJ/efficient/4c7aa81ba8fbcd3ad42996eb6bac18dc_4, RDFS:label "efficient", ISA STD:JJ

  • NS:term/NN/service/4c7aa81ba8fbcd3ad42996eb6bac18dc_5, RDFS:label "service", ISA STD:NN

• Dependency relations shown between terms: STD:nsubj, STD:det
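
The graph for this sentence can be approximated with a short generator. This is a sketch under assumptions, not the authors' RDF Generator: triples are plain Python tuples with prefixed names, the type link that the slide labels ISA is written as RDF:type, the endpoints of the two dependencies (nsubj and det both governed by "service") follow the usual Stanford analysis of this sentence, and the governor-to-dependent edge direction is a guess.

```python
# Sentence identifier copied from the slide.
SENT_ID = "4c7aa81ba8fbcd3ad42996eb6bac18dc"


def build_triples(sent_id, text, tagged, deps):
    """Turn tagged tokens and dependencies into (subject, predicate, object) tuples."""
    sent = f"NS:sentence/{sent_id}"
    triples = [(sent, "RDFS:hasDescription", text)]
    terms = {}
    for position, (word, pos) in enumerate(tagged, start=1):
        term = f"NS:term/{pos}/{word}/{sent_id}_{position}"
        terms[position] = term
        triples.append((sent, "DCT:hasPart", term))
        triples.append((term, "RDFS:label", word))
        triples.append((term, "RDF:type", f"STD:{pos}"))  # "ISA" on the slide
    for relation, governor, dependent in deps:
        # Assumed direction: governor term -> dependent term.
        triples.append((terms[governor], f"STD:{relation}", terms[dependent]))
    return triples


tagged = [("It", "PRP"), ("is", "VBZ"), ("an", "DT"),
          ("efficient", "JJ"), ("service", "NN")]
deps = [("nsubj", 5, 1), ("det", 5, 3)]  # 1-based token positions

triples = build_triples(SENT_ID, "It is an efficient service", tagged, deps)
for triple in triples:
    print(triple)
```

In a real pipeline the `tagged` and `deps` inputs would come from the POS tagger and parser rather than being typed in by hand.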

Page 8: Linking Stanford Typed Dependencies to Support Text Analytics

Scenario

Input (3,140 user comments on eGovernment services) → Processing → Output (174,862 triples)

Page 9: Linking Stanford Typed Dependencies to Support Text Analytics

Processing Dependencies through SPARQL – Example 1

• What were the adjectives used by users to describe their experience, from the most frequent to the least frequent?
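
The query shown on this slide is not part of the transcript. As a rough stand-in, the following sketch emulates the same question in plain Python over toy triples: select the terms typed STD:JJ, read their labels, and count them in descending order, which is what a SPARQL query with GROUP BY and ORDER BY DESC(COUNT(...)) would express. The triples, the term identifiers and the RDF:type predicate are assumptions for illustration.

```python
from collections import Counter

# Toy triples invented for illustration, not data from the study.
triples = [
    ("t1", "RDF:type", "STD:JJ"), ("t1", "RDFS:label", "easy"),
    ("t2", "RDF:type", "STD:JJ"), ("t2", "RDFS:label", "efficient"),
    ("t3", "RDF:type", "STD:JJ"), ("t3", "RDFS:label", "Easy"),
    ("t4", "RDF:type", "STD:NN"), ("t4", "RDFS:label", "service"),
]


def adjective_counts(triples):
    """Return (adjective, count) pairs, most frequent first."""
    adjectives = {s for s, p, o in triples if p == "RDF:type" and o == "STD:JJ"}
    labels = (o.lower() for s, p, o in triples
              if p == "RDFS:label" and s in adjectives)
    return Counter(labels).most_common()


print(adjective_counts(triples))
```

Lower-casing the labels merges "Easy" and "easy"; whether that is desirable depends on the analysis.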

Page 10: Linking Stanford Typed Dependencies to Support Text Analytics

Processing Dependencies through SPARQL – Example 2

• What were the “things” that users found “easy”?
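
The slide's query is not in the transcript, so the sketch below only illustrates the kind of graph pattern involved. It assumes that "easy" reaches the thing it describes either as the governor of an nsubj relation ("registration is easy") or as the dependent of an amod relation ("easy process"); both the relation choice and the toy triples are assumptions.

```python
# Toy triples invented for illustration; edges run governor -> dependent.
triples = [
    ("t1", "RDFS:label", "easy"), ("t2", "RDFS:label", "registration"),
    ("t1", "STD:nsubj", "t2"),   # "registration is easy"
    ("t3", "RDFS:label", "easy"), ("t4", "RDFS:label", "process"),
    ("t4", "STD:amod", "t3"),    # "easy process"
    ("t5", "RDFS:label", "slow"), ("t6", "RDFS:label", "website"),
    ("t5", "STD:nsubj", "t6"),   # "website is slow"
]


def things_described_as(triples, adjective):
    """Return labels of terms tied to the adjective by nsubj or amod."""
    label = {s: o for s, p, o in triples if p == "RDFS:label"}
    found = set()
    for s, p, o in triples:
        if p == "STD:nsubj" and label.get(s, "").lower() == adjective:
            thing = label.get(o)
        elif p == "STD:amod" and label.get(o, "").lower() == adjective:
            thing = label.get(s)
        else:
            continue
        if thing is not None:
            found.add(thing)
    return sorted(found)


print(things_described_as(triples, "easy"))
```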

Page 11: Linking Stanford Typed Dependencies to Support Text Analytics

Processing Dependencies through SPARQL – Example 3

• How is the term “Service” described by users in the comments?
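
A question like this one can be approached by collecting every dependency relation that touches a term labelled "service" and grouping the neighbouring words by relation. The sketch below does that in plain Python; the triples are invented, and treating any predicate with the STD: prefix as a dependency relation is an assumption about the model.

```python
from collections import defaultdict

# Toy triples invented for illustration.
triples = [
    ("t1", "RDFS:label", "service"), ("t2", "RDFS:label", "efficient"),
    ("t1", "STD:amod", "t2"),    # "efficient service"
    ("t5", "RDFS:label", "the"),
    ("t1", "STD:det", "t5"),     # "the service"
    ("t3", "RDFS:label", "slow"), ("t4", "RDFS:label", "Service"),
    ("t3", "STD:nsubj", "t4"),   # "Service is slow"
    ("t1", "RDF:type", "STD:NN"),
]


def describe(triples, noun):
    """Group the words linked to the noun by the relation linking them."""
    label = {s: o for s, p, o in triples if p == "RDFS:label"}
    related = defaultdict(set)
    for s, p, o in triples:
        if not p.startswith("STD:"):
            continue  # skip labels, types and structural links
        if label.get(s, "").lower() == noun and o in label:
            related[p].add(label[o])
        elif label.get(o, "").lower() == noun and s in label:
            related[p].add(label[s])
    return {relation: sorted(words) for relation, words in related.items()}


print(describe(triples, "service"))
```

Grouping by relation lets an analyst separate descriptive links such as amod and nsubj from purely grammatical ones such as det.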

Page 12: Linking Stanford Typed Dependencies to Support Text Analytics

So What?

• This graph-based manipulation of dependencies would add potential benefits such as:

  • Aggregating and transforming distributed pieces of text into a coherent, query-enabled dependency layer

  • The possibility of “hardwiring” text dependency patterns at the query level, and hooking them to further analytical tools and techniques (e.g. visualization)

  • The ability to easily extend the text-based graph to capture further data entities, such as polarity dictionaries, and perform further analytics
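
The last benefit, extending the graph with a polarity dictionary, might look like the following sketch. The predicate name `EX:polarity`, the three-word lexicon and the triples are all hypothetical; the slides do not specify how polarity data would be attached.

```python
# Hypothetical mini-lexicon: word -> polarity score.
POLARITY = {"easy": 1, "efficient": 1, "slow": -1}

# Toy triples invented for illustration.
triples = [
    ("t1", "RDFS:label", "easy"),
    ("t2", "RDFS:label", "slow"),
    ("t3", "RDFS:label", "service"),
    ("t4", "RDFS:label", "Efficient"),
]


def add_polarity(triples, lexicon):
    """Return the triples plus one EX:polarity triple per term found in the lexicon."""
    extra = [(s, "EX:polarity", lexicon[o.lower()])
             for s, p, o in triples
             if p == "RDFS:label" and o.lower() in lexicon]
    return triples + extra


def net_polarity(triples):
    """Sum the polarity scores attached to terms."""
    return sum(o for s, p, o in triples if p == "EX:polarity")


extended = add_polarity(triples, POLARITY)
print(net_polarity(extended))
```

Because the polarity scores are just more triples, they can be combined with dependency patterns in the same query, for example to score only the adjectives that modify a given noun.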

Page 13: Linking Stanford Typed Dependencies to Support Text Analytics

Future Directions

• At the level of the dependency RDF generator, the extractor can be improved by providing filtering mechanisms that the analyst can control

• We are building an online tool that would enable users to upload a corpus, and generate the corresponding dependency RDF to be downloaded or pushed to a triplestore

• We are planning to focus next on exploiting this graph representation to perform business analytics around decision models (e.g. user satisfaction and performance models)

Page 14: Linking Stanford Typed Dependencies to Support Text Analytics

Conclusions

• We presented our work on generating a linked dependency layer on top of text documents

• We highlighted the preliminary value of this layer by applying the linking process to 3,140 disparate user comments

• We believe that this layer will open the path for improving the consumption and reuse of text dependencies in the context of text and business analytics