Download - Extracting Keyphrases to Represent Relations in Social Networks from Web Junichiro Mori and Mitsuru Ishizuka Universiry of Tokyo Yutaka Matsuo National

Extracting Keyphrases to Represent Relations in Social Networks from Web

Junichiro Mori and Mitsuru Ishizuka, University of Tokyo

Yutaka Matsuo, National Institute of Advanced Industrial Science and Technology

IJCAI-07

Abstract

• The goal is to extract the underlying relations between entities that are embedded in social networks.

• The algorithm automatically extracts labels that describe relations among entities.

• The algorithm
– clusters similar entity pairs
– obtains the underlying relations between entities from the clustering results.

Introduction

• Social networks for AI and the Semantic Web
– trust estimation
– ontology construction
– end-user ontology

• Building social networks
– automatic extraction of social networks from various sources of information

• Flink: Web pages, e-mail messages, and publications

• Polyphonet [WWW06]

Introduction

• Explore underlying relations

• Most automatic extraction methods are superficial approaches

• Co-occurrence analysis

• Shallow assessment

– Flink: provides a clue to the strength of relations
– Polyphonet: defines four kinds of relations

• C5

• Co-Author, Co-Lab, Co-Proj, Co-Conf

Related Work

• Supervised methods
– need large annotated corpora
– need domain-specific knowledge to be gathered
– require the relations to be extracted to be defined a priori

• Ontology population (semantic annotation)
– pattern-based approaches
– context-based approaches

• The Web is highly heterogeneous and unstructured
– In this paper

• a context-based approach
• a bag-of-words representation of context [Turney, 2005]

Method - Concept (1/4)

• The social network was extracted according to co-occurrence of entities on the Web.


• Given entity pairs in the social network
– discover relevant keyphrases

• Analyze the surrounding local context in which each pair co-occurs on the Web

• Keyword extraction


• The keywords are ordered according to TF-IDF-based scoring


• Hypothesis:
– if the local contexts of entity pairs on the Web are similar, the entity pairs share a similar relation.
– [Harris, 1968; Schütze, 1998]: words are similar to the extent that their contextual representations are similar.

• Following this hypothesis
– the method clusters entity pairs according to the similarity of their collective contexts.
– each cluster represents a different relation, and each entity pair in a cluster is an instance of that relation.

Method - Procedure

Method - Context Model and Similarity Calculation

• $C_{i,j}(n,m) = \{t_1, \ldots, t_N\}$

– A context model $C_{i,j}$ of an entity pair $(e_i, e_j)$

– $N$ terms $t_1, \ldots, t_N$ that are extracted from the context of an entity pair

– $m$ is the number of intervening terms between $e_i$ and $e_j$

– $n$ is the number of words to the left and right of either entity.

– the feature weight of $t_i$ is its TF-IDF score

• TF: term frequency of $t_i$ in the contexts

• IDF: $\log(|C| / \mathrm{df}(t_i)) + 1$, where $|C|$ is the number of contexts and $\mathrm{df}(t_i)$ is the number of contexts containing $t_i$
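A minimal sketch of building $C_{i,j}(n,m)$ and its TF-IDF weights, under stated assumptions: each entity is a single token, a co-occurrence counts only when at most $m$ terms separate the entities, and the context is the intervening terms plus $n$ terms on each outer side. The slides do not say which logarithm base is used; natural log is assumed here. The function names `context_terms`, `tfidf_vectors`, and `top_keywords` are hypothetical.

```python
import math
from collections import Counter

def context_terms(tokens, e_i, e_j, n, m):
    """Collect the context terms of entity pair (e_i, e_j) in one token list.

    Assumptions (not from the slides): entities are single tokens; a
    co-occurrence counts only if at most m terms lie between the entities;
    the context is those intervening terms plus n terms left of the first
    entity and n terms right of the second.
    """
    terms = []
    pos_i = [k for k, t in enumerate(tokens) if t == e_i]
    pos_j = [k for k, t in enumerate(tokens) if t == e_j]
    for a in pos_i:
        for b in pos_j:
            left, right = min(a, b), max(a, b)
            if left == right or right - left - 1 > m:
                continue  # same position, or too many intervening terms
            window = (tokens[max(0, left - n):left]
                      + tokens[left + 1:right]
                      + tokens[right + 1:right + 1 + n])
            terms.extend(t for t in window if t not in (e_i, e_j))
    return terms

def tfidf_vectors(contexts):
    """Map each pair's pooled context terms to TF-IDF weights.

    contexts: dict from entity pair to its list of context terms.
    IDF(t) = log(|C| / df(t)) + 1, with |C| = number of contexts and
    df(t) = number of contexts containing t (natural log assumed).
    """
    df = Counter()
    for terms in contexts.values():
        df.update(set(terms))
    total = len(contexts)
    return {
        pair: {t: tf * (math.log(total / df[t]) + 1)
               for t, tf in Counter(terms).items()}
        for pair, terms in contexts.items()
    }

def top_keywords(vector, k):
    """Order a pair's terms by TF-IDF score, highest first (ties alphabetical)."""
    ranked = sorted(vector.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]
```

Pooling the terms from all of a pair's Web pages into one list before weighting mirrors the "collective context" idea; `top_keywords` then gives the TF-IDF-ordered keyword list for that pair.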

Method - Clustering and Label Selection

• TF-IDF-based cosine similarity
• Hierarchical agglomerative clustering

– complete linkage
– the similarity between clusters $CL_1$ and $CL_2$ is evaluated by considering their two most dissimilar elements
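A naive sketch of cosine similarity on sparse TF-IDF vectors followed by complete-linkage agglomerative clustering. The stopping rule is an assumption: merging stops at a fixed number of clusters (the experiments report five and twelve clusters, but the slides do not state how merging was halted). `cosine` and `complete_linkage` are hypothetical names, and the quadratic-per-step loop favors clarity over speed.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def complete_linkage(vectors, num_clusters):
    """Merge clusters until num_clusters remain (assumed stopping rule).

    Complete linkage: the similarity of two clusters is the similarity of
    their least similar (most dissimilar) pair of members.
    """
    keys = list(vectors)
    sim = {(a, b): cosine(vectors[a], vectors[b]) for a in keys for b in keys}
    clusters = [[k] for k in keys]
    while len(clusters) > num_clusters:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                s = min(sim[(a, b)] for a in clusters[x] for b in clusters[y])
                if best is None or s > best[0]:
                    best = (s, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]  # y > x, so index x is unaffected
    return clusters
```

Taking the minimum pairwise similarity is what makes this complete rather than single linkage: a cluster only absorbs another if every member is reasonably close, which tends to keep relation clusters tight.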

• Given a cluster $CL$ whose labels $l_1, \ldots, l_n$ are scored according to term relevancy, an entity pair $(e_i, e_j)$ that belongs to $CL$ can be regarded as holding the relations described by $l_1, \ldots, l_n$.
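A sketch of label selection for one cluster. The slides only say labels are "scored according to the term relevancy"; here relevancy is assumed to be the sum of a term's TF-IDF weights over the cluster's entity pairs, which is one plausible choice and not necessarily the authors'. `cluster_labels` is a hypothetical name.

```python
from collections import defaultdict

def cluster_labels(cluster, vectors, k=3):
    """Return the top-k label terms for a cluster of entity pairs.

    Assumed relevancy score: a term's TF-IDF weights summed over every
    entity pair in the cluster. Ties are broken alphabetically.
    """
    score = defaultdict(float)
    for pair in cluster:
        for term, weight in vectors[pair].items():
            score[term] += weight
    ranked = sorted(score.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

Every pair in the cluster then inherits the returned labels as its relation description.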

Experiment – 1/3

• Test data
– 143 distinct entity pairs from a political social network
• each a pair of a politician and a geo-political entity

– 421 entity pairs from a researcher network
• each a pair of Japanese AI researchers

• Context model of each entity pair
– built from 100 Web pages
– noun phrases (NP) and nouns selected by part-of-speech (POS) tags
– stop words excluded
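A rough sketch of the term-filtering step, assuming the text has already been POS-tagged with Penn Treebank tags by some tagger (not shown). Noun phrases are approximated crudely as runs of two or more consecutive nouns, which is far simpler than a real NP chunker. `candidate_terms` and the tiny `STOP_WORDS` set are hypothetical; a real stop list would be much larger.

```python
# Hypothetical stop list of Web-page boilerplate words, for illustration only.
STOP_WORDS = {"page", "home", "site", "copyright"}

def candidate_terms(tagged):
    """Keep nouns and multi-noun runs (crude NP approximation), minus stop words.

    tagged: list of (word, tag) pairs with Penn Treebank tags (assumed).
    """
    terms, run = [], []
    for word, tag in tagged + [("", "END")]:  # sentinel flushes the last run
        if tag.startswith("NN") and word.lower() not in STOP_WORDS:
            terms.append(word)   # single noun
            run.append(word)
        else:
            if len(run) > 1:
                terms.append(" ".join(run))  # noun run as a phrase
            run = []
    return terms
```

For example, the tagged fragment "troop/NN camps/NNS" yields `troop`, `camps`, and the phrase `troop camps`.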

Experiment – 2/3

• Clustering
– complete-linkage agglomerative clustering

• five distinct clusters for the political social network

• twelve distinct clusters for the researcher network

• Two human subjects
– assigned three or fewer possible relation labels to each pair
– a cluster label:

• the most frequent term among the manually assigned relation labels of the entity pairs in the cluster.
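The cluster-labeling rule above reduces to a frequency count over the annotators' labels. This sketch assumes the two subjects' labels for each pair are already merged into one list; the tie-breaking rule (alphabetical) is an assumption the slides do not address, and `majority_label` is a hypothetical name.

```python
from collections import Counter

def majority_label(cluster, manual_labels):
    """Most frequent manually assigned label among a cluster's entity pairs.

    manual_labels: dict from entity pair to the list of relation labels the
    annotators gave it. Ties are broken alphabetically (assumed).
    """
    counts = Counter()
    for pair in cluster:
        counts.update(manual_labels.get(pair, []))
    if not counts:
        return None
    top = max(counts.values())
    return min(label for label, c in counts.items() if c == top)
```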

Experiment – 3/3

Evaluation

• For each cluster $cl$
– $EP_{cl,correct}$: the number of entity pairs in $cl$ whose manually assigned relation labels include the label of cluster $cl$
– $EP_{cl,total}$: the number of entity pairs in cluster $cl$

• For each relation $l$
– $EP_{l,correct}$: the number of entity pairs with relation label $l$ that belong to a cluster whose label is $l$
– $EP_{l,total}$: the number of entity pairs that have relation label $l$
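The counts above presumably feed a per-cluster precision $EP_{cl,correct} / EP_{cl,total}$ and a per-relation recall $EP_{l,correct} / EP_{l,total}$; the slides define the counts but not these ratios explicitly, so treat that pairing as an assumption. A sketch with hypothetical function names:

```python
def cluster_precision(cluster, cluster_label, manual_labels):
    """EP_cl,correct / EP_cl,total: share of the cluster's pairs whose
    manual labels include the cluster's label."""
    correct = sum(1 for p in cluster if cluster_label in manual_labels.get(p, []))
    return correct / len(cluster) if cluster else 0.0

def relation_recall(relation, clusters, cluster_labels, manual_labels):
    """EP_l,correct / EP_l,total: of the pairs manually labeled `relation`,
    the share placed in a cluster whose label is `relation`."""
    with_label = [p for p, labels in manual_labels.items() if relation in labels]
    if not with_label:
        return 0.0
    in_matching_cluster = set()
    for cluster, label in zip(clusters, cluster_labels):
        if label == relation:
            in_matching_cluster.update(cluster)
    correct = sum(1 for p in with_label if p in in_matching_cluster)
    return correct / len(with_label)
```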

Evaluation

Conclusions

• Automatically extracting labels
– that describe relations between entities in social networks
– unsupervised and domain-independent

• Utilizing the Web to obtain the collective contexts
– Semantic Web
– Web mining

• Future work
– other types of social networks
– enriching social networks
