Linking Stanford Typed Dependencies to Support Text Analytics
Fouad Zablith, Ibrahim H. Osman
American University of Beirut
Problem
• Text documents naturally include dependency relations
among textual elements
• Such dependencies enable readers to cognitively infer the
flow of thoughts and how the various elements are
semantically affected
• Automatically identifying textual dependencies has been
the focus of various approaches. However, we observe
that aggregating, accessing, and reusing dependencies for
further processing remains a challenge
Question
• Our aim is to answer the following question: how can we
make text dependencies more accessible for consumption
and reuse in text analysis?
• To that end, we focus on the following requirements:
• To have unique references to textual elements
• To preserve dependency links across text sources
• To store and serve the data for further consumption
Approach Overview
[Figure: processing pipeline. Input → POS Tagger → Lexical Parser → RDF Generator → Triple Store → Text Analytics Apps, grouped into three stages: Text Processing, Linking, and Publishing / Reusing]
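RDF Model
[Figure: RDF model of the dependency layer]
• A Text/context node links to each Sentence through DCT:hasPart
• A Sentence has an RDFS:hasDescription (the sentence text) and links to each Term through DCT:hasPart
• A Term has an RDFS:label (Label) and is connected to other terms through STD:… dependency relations
• Part-of-speech classes, organized with RDFS:subClassOf: STD:JJ, STD:VB, STD:NN, STD:CD, STD:…
• Dependency properties, organized with RDFS:subPropertyOf: STD:Dependent; STD:auxiliary (STD:passiveAuxiliary, STD:copula); STD:modifier (STD:adjectivalModifier, STD:quantifierModifier); STD:… (all other dependency relations)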
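The property hierarchy in the model means that a query for a general relation can also return its more specific relations. The following is a minimal Python sketch of that idea, not the authors' implementation. The sub-property edges are the ones readable from the figure; the term identifiers and the (governor, relation, dependent) triple order are assumptions made for illustration.

```python
# Sub-property edges as read from the model figure (child -> parent).
SUB_PROPERTY_OF = {
    "STD:passiveAuxiliary": "STD:auxiliary",
    "STD:copula": "STD:auxiliary",
    "STD:adjectivalModifier": "STD:modifier",
    "STD:quantifierModifier": "STD:modifier",
}


def is_subproperty(prop, ancestor):
    """True if prop equals ancestor or reaches it by following sub-property edges."""
    while prop is not None:
        if prop == ancestor:
            return True
        prop = SUB_PROPERTY_OF.get(prop)
    return False


def match(triples, ancestor):
    """Return the triples whose predicate is ancestor or one of its sub-properties."""
    return [t for t in triples if is_subproperty(t[1], ancestor)]


# Hypothetical term identifiers; triple order (governor, relation, dependent) is assumed.
triples = [
    ("term:service", "STD:adjectivalModifier", "term:efficient"),
    ("term:service", "STD:copula", "term:is"),
]

# Asking for the general STD:modifier relation also matches STD:adjectivalModifier.
modifiers = match(triples, "STD:modifier")
```

A triple store with RDFS reasoning enabled would typically perform this expansion itself; the sketch only makes the mechanism visible.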
Example
[Figure: RDF graph for the sentence "It is an efficient service"]
• NS:sentence/4c7aa81ba8fbcd3ad42996eb6bac18dc has RDFS:hasDescription "It is an efficient service" and links to each term below through DCT:hasPart
• NS:term/PRP/It/4c7aa81ba8fbcd3ad42996eb6bac18dc_1: RDFS:label "It", ISA STD:PRP
• NS:term/VBZ/is/4c7aa81ba8fbcd3ad42996eb6bac18dc_2: RDFS:label "is", ISA STD:VBZ
• NS:term/DT/an/4c7aa81ba8fbcd3ad42996eb6bac18dc_3: RDFS:label "an", ISA STD:DT
• NS:term/JJ/efficient/4c7aa81ba8fbcd3ad42996eb6bac18dc_4: RDFS:label "efficient", ISA STD:JJ
• NS:term/NN/service/4c7aa81ba8fbcd3ad42996eb6bac18dc_5: RDFS:label "service", ISA STD:NN
• Dependency links shown between terms: STD:nsubj, STD:det
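The identifier pattern in this example (a sentence hash, plus tag, word, and position for each term) can be sketched in Python as follows. This is an illustration under assumptions rather than the authors' generator: the source does not say how the sentence hash is computed, so MD5 of the sentence text is assumed here, and `sentence_triples` is a hypothetical helper name. The `NS:` prefix and the `ISA` link are kept as written on the slide.

```python
import hashlib

NS = "NS:"  # namespace placeholder, kept as written on the slide


def sentence_triples(sentence, tagged_terms):
    """Build (subject, predicate, object) triples for one POS-tagged sentence."""
    # Assumption: the sentence identifier is an MD5 hex digest of the text.
    sid = hashlib.md5(sentence.encode("utf-8")).hexdigest()
    sentence_uri = f"{NS}sentence/{sid}"
    triples = [(sentence_uri, "RDFS:hasDescription", sentence)]
    for position, (word, tag) in enumerate(tagged_terms, start=1):
        # Term identifier pattern from the example: tag / word / hash_position.
        term_uri = f"{NS}term/{tag}/{word}/{sid}_{position}"
        triples.append((sentence_uri, "DCT:hasPart", term_uri))
        triples.append((term_uri, "RDFS:label", word))
        triples.append((term_uri, "ISA", f"STD:{tag}"))
    return triples


sentence = "It is an efficient service"
tagged = [("It", "PRP"), ("is", "VBZ"), ("an", "DT"),
          ("efficient", "JJ"), ("service", "NN")]
triples = sentence_triples(sentence, tagged)
```

Including the position in the term identifier keeps repeated words within a sentence distinct, and the sentence hash keeps the same word in different sentences distinct, which is one way to meet the requirement of unique references to textual elements.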
Scenario
• Input: 3,140 user comments on eGovernment services
• Processing
• Output: 174,862 triples
Processing Dependencies through SPARQL – Example 1
• Which adjectives did users use to describe their experience, ordered from most to least frequent?
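The query itself is not preserved in this transcript. As a rough stand-in, the Python sketch below performs the same kind of aggregation over an in-memory list of triples: select terms typed as STD:JJ, group by label, and sort by count. The sample triples, the term identifiers, and the lowercasing of labels are assumptions for illustration. In SPARQL this would correspond to a GROUP BY on the label with a descending ORDER BY on the count.

```python
from collections import Counter


def adjective_frequencies(triples):
    """Rank adjective labels from most to least frequent."""
    # Terms typed as adjectives (STD:JJ), using the ISA link from the example slide.
    adjective_terms = {s for s, p, o in triples if p == "ISA" and o == "STD:JJ"}
    # Assumption: labels are lowercased so "Easy" and "easy" count together.
    labels = [o.lower() for s, p, o in triples
              if p == "RDFS:label" and s in adjective_terms]
    return Counter(labels).most_common()


# Hypothetical sample data, not taken from the 3,140 comments.
triples = [
    ("t1", "ISA", "STD:JJ"), ("t1", "RDFS:label", "easy"),
    ("t2", "ISA", "STD:JJ"), ("t2", "RDFS:label", "efficient"),
    ("t3", "ISA", "STD:JJ"), ("t3", "RDFS:label", "Easy"),
    ("t4", "ISA", "STD:NN"), ("t4", "RDFS:label", "service"),
]
ranking = adjective_frequencies(triples)
```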
Processing Dependencies through SPARQL – Example 2
• What were the “things” that users found “easy”?
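A question like this can be answered by following dependency links from the adjective to the terms it describes. The Python sketch below is one possible reading, not the query shown in the talk. It assumes triples are ordered (governor, relation, dependent), that "X is easy" is parsed as an STD:nsubj link from "easy" to X, and that "easy X" is parsed as a link from X to "easy". The name STD:amod is assumed by analogy with STD:nsubj and STD:det; the identifiers and sample data are hypothetical.

```python
def things_described_as(adjective, dependencies, labels):
    """Collect labels of terms that a given adjective is applied to."""
    found = []
    for governor, relation, dependent in dependencies:
        if relation == "STD:nsubj" and labels[governor].lower() == adjective:
            found.append(labels[dependent])  # pattern "X is easy"
        elif relation == "STD:amod" and labels[dependent].lower() == adjective:
            found.append(labels[governor])   # pattern "easy X"
    return found


# Hypothetical terms from three comments.
labels = {"a1": "form", "a2": "easy", "b1": "easy", "b2": "access",
          "c1": "slow", "c2": "site"}
dependencies = [
    ("a2", "STD:nsubj", "a1"),  # "the form is easy"
    ("b2", "STD:amod", "b1"),   # "easy access"
    ("c2", "STD:amod", "c1"),   # "slow site" (should not match)
]
easy_things = things_described_as("easy", dependencies, labels)
```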
Processing Dependencies through SPARQL – Example 3
• How is the term “Service” described by users in the comments?
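This question starts from a noun and collects the adjectives attached to it. A minimal Python sketch follows, again under assumptions: triples are ordered (governor, relation, dependent), STD:amod is an assumed relation name, and a subject link only counts when the governing term is typed STD:JJ, so that verbs such as "crashed" are left out. All identifiers and sample data are hypothetical.

```python
from collections import Counter


def descriptors_of(noun, dependencies, labels, tags):
    """Count the adjectives that describe a given noun."""
    counts = Counter()
    for governor, relation, dependent in dependencies:
        gov = labels[governor].lower()
        dep = labels[dependent].lower()
        if relation == "STD:amod" and gov == noun:
            counts[dep] += 1  # pattern "efficient service"
        elif (relation == "STD:nsubj" and dep == noun
              and tags[governor] == "STD:JJ"):
            counts[gov] += 1  # pattern "the service is slow"
    return counts


labels = {"s1": "service", "e1": "efficient", "s2": "Service",
          "w2": "slow", "s3": "service", "c3": "crashed"}
tags = {"s1": "STD:NN", "e1": "STD:JJ", "s2": "STD:NN",
        "w2": "STD:JJ", "s3": "STD:NN", "c3": "STD:VBD"}
dependencies = [
    ("s1", "STD:amod", "e1"),   # "efficient service"
    ("w2", "STD:nsubj", "s2"),  # "Service is slow"
    ("c3", "STD:nsubj", "s3"),  # "service crashed" (verb, ignored)
]
described = descriptors_of("service", dependencies, labels, tags)
```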
So What?
• This graph-based manipulation of dependencies offers
potential benefits such as:
• Aggregating and transforming distributed pieces of text into a
coherent, query-enabled dependency layer
• The possibility of “hardwiring” text dependency patterns at the query
level, and hooking them to further analytical tools and techniques (e.g.
visualization)
• The ability to easily extend the text-based graph to capture further
data entities, such as polarity dictionaries, and perform further
analytics
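As an illustration of the last point, the Python sketch below attaches a small polarity dictionary to term labels and sums the result. Everything specific here is assumed: the lexicon entries and scores, the `EX:hasPolarity` predicate, and the term identifiers are hypothetical and do not come from the presented work.

```python
# Hypothetical polarity lexicon; words and scores are illustrative only.
POLARITY = {"easy": 1, "efficient": 1, "slow": -1}


def add_polarity(triples, lexicon):
    """Return the triples extended with a polarity triple for each known label."""
    extra = [(s, "EX:hasPolarity", lexicon[o.lower()])
             for s, p, o in triples
             if p == "RDFS:label" and o.lower() in lexicon]
    return triples + extra


def net_polarity(triples):
    """Sum all polarity scores present in the graph."""
    return sum(o for s, p, o in triples if p == "EX:hasPolarity")


# Hypothetical labelled terms.
triples = [
    ("t1", "RDFS:label", "easy"),
    ("t2", "RDFS:label", "Slow"),
    ("t3", "RDFS:label", "efficient"),
    ("t4", "RDFS:label", "service"),
]
enriched = add_polarity(triples, POLARITY)
score = net_polarity(enriched)
```

Because the polarity values are stored as additional triples, the existing term and dependency data stay untouched and the same dependency patterns can be reused with the new attribute.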
Future Directions
• At the level of the dependency RDF generator, the extractor can be improved by providing filtering mechanisms that the analyst can control
• We are building an online tool that would enable users to upload a corpus, and generate the corresponding dependency RDF to be downloaded or pushed to a triplestore
• We are planning to focus next on exploiting this graph representation to perform business analytics around decision models (e.g. user satisfaction and performance models)
Conclusions
• We presented our work on generating a linked
dependency layer on top of text documents
• We highlighted the preliminary value of this layer by
applying the linking process to 3,140 disparate user
comments
• We believe that this layer will open the path for improving
the consumption and reuse of text dependencies in the
context of text and business analytics