Extracting Keyphrases to Represent Relations in Social Networks from Web. Junichiro Mori and Mitsuru Ishizuka (University of Tokyo), Yutaka Matsuo (National Institute of Advanced Industrial Science and Technology). IJCAI-07


Extracting Keyphrases to Represent Relations in Social Networks from Web

Junichiro Mori and Mitsuru Ishizuka
University of Tokyo

Yutaka Matsuo
National Institute of Advanced Industrial Science and Technology

IJCAI-07

Abstract

• The goal is to extract the underlying relations between entities that are embedded in social networks.
• The algorithm automatically extracts labels that describe relations among entities.
• The algorithm
  – clusters similar entity pairs
  – obtains the underlying relations between entities from the results of clustering.

Introduction

• Social networks for AI and the Semantic Web
  – trust estimation
  – ontology construction
  – end-user ontology
• Building social networks
  – automatic extraction of social networks from various sources of information
    • Flink: Web pages, e-mail messages, and publications
    • Polyphonet [www06]

Introduction

• Explore underlying relations
  – Most automatic extraction methods are superficial approaches
    • Co-occurrence analysis
    • Non-profound assessment
  – Flink: provides a clue to the strength of relations
  – Polyphonet: defines four kinds of relations
    • C5
    • Co-Author, Co-Lab, Co-Proj, Co-Conf

Related Work

• Supervised methods
  – need large annotated corpora
  – need domain-specific knowledge to be gathered
  – need the relations to be extracted to be defined a priori
• Ontology population (semantic annotation)
  – pattern-based approaches
  – context-based approaches
• The Web is highly heterogeneous and unstructured
  – In this paper
    • context-based
    • a bag-of-words of context [Turney, 2005]

Method - Concept (1/4)

• The social network was extracted according to co-occurrence of entities on the Web.

Method - Concept (2/4)

• Given entity pairs in the social network
  – discover relevant keyphrases
    • analyze the surrounding local context (where the entities co-occur on the Web)
    • keyword extraction
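
As a rough illustration of "surrounding local context", the sketch below keeps the tokens that lie within a fixed window of either entity in a text where both entities occur. The function name `local_context`, the single-token entity names, and the `window` parameter are assumptions made for this example; the slides do not specify how the context is collected.

```python
def local_context(tokens, entity_a, entity_b, window=3):
    """Return the tokens within `window` positions of either entity.

    Assumption: entities are single tokens, and a text contributes
    context only if both entities occur in it.
    """
    if entity_a not in tokens or entity_b not in tokens:
        return []
    entities = {entity_a, entity_b}
    keep = set()
    for pos, tok in enumerate(tokens):
        if tok in entities:
            lo = max(0, pos - window)
            hi = min(len(tokens), pos + window + 1)
            keep.update(range(lo, hi))
    # Drop the entity mentions themselves; keep document order.
    return [tokens[i] for i in sorted(keep) if tokens[i] not in entities]


# Hypothetical sentence, for illustration only.
tokens = "yesterday Smith was elected mayor of Springfield after a long campaign".split()
context = local_context(tokens, "Smith", "Springfield", window=2)
```

A real pipeline would run this over the Web pages returned for each entity pair and then filter the collected tokens down to candidate keywords.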

Method - Concept (3/4)

• The keywords are ordered according to TF-IDF-based scoring

Method - Concept (4/4)

• Hypothesis:
  – if the local contexts of entity pairs on the Web are similar, the entity pairs share a similar relation.
  – [Harris, 1968; Schutze, 1998]: words are similar to the extent that their contextual representations are similar.
• According to that hypothesis
  – the method clusters entity pairs according to the similarity of their collective contexts.
  – each cluster represents a different relation, and each entity pair in a cluster is an instance of a similar relation.

Method - Procedure

Method - Context Model and Similarity Calculation

• $C_{i,j}(n, m) = t_1, \ldots, t_N$
  – $C_{i,j}$ is the context model of an entity pair $(e_i, e_j)$
  – $t_1, \ldots, t_N$ are the $N$ terms extracted from the context of the entity pair
  – $m$ is the number of intervening terms between $e_i$ and $e_j$
  – $n$ is the number of words to the left and right of either entity
  – the feature weight of $t_i$ is its TF-IDF score
    • TF: term frequency of term $t_i$ in the contexts
    • IDF: $\log(|C| / df(t_i)) + 1$
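
The weighting above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes that |C| is the number of context models (one per entity pair) and that df(t) counts the context models containing term t, which the slide does not state explicitly. The function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_context_models(contexts):
    """Weight each term of each context model by TF * IDF.

    contexts: dict mapping an entity pair to the list of terms
    collected from its contexts.
    IDF follows the slide: log(|C| / df(t)) + 1.
    """
    num_models = len(contexts)  # |C| (assumed: number of context models)
    df = Counter()
    for terms in contexts.values():
        df.update(set(terms))  # count each term once per context model
    models = {}
    for pair, terms in contexts.items():
        tf = Counter(terms)
        models[pair] = {
            t: tf[t] * (math.log(num_models / df[t]) + 1) for t in tf
        }
    return models


# Toy data, for illustration only.
contexts = {
    ("A", "B"): ["mayor", "city", "mayor"],
    ("C", "D"): ["coauthor", "paper", "city"],
}
models = tfidf_context_models(contexts)
```

Because of the "+ 1" in the IDF, a term that appears in every context model (here "city") keeps a weight equal to its raw term frequency instead of dropping to zero.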

Method - Clustering and Label Selection

• TF-IDF-based cosine similarity
• Hierarchical agglomerative clustering
  – complete linkage
  – the similarity between clusters $CL_1$ and $CL_2$ is evaluated by considering their two most dissimilar elements
• With a cluster $CL$'s labels $l_1, \ldots, l_n$ scored according to term relevancy, an entity pair $(e_i, e_j)$ that belongs to $CL$ can be regarded as holding the relations described by $l_1, \ldots, l_n$.
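
The clustering step can be sketched with the standard library alone. The code below is an illustrative reading of the slide, under two assumptions that the slides do not confirm: merging stops when the best complete-link similarity falls below a `threshold`, and cluster labels are the terms with the highest summed weight in the cluster. All function names and the toy vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def complete_linkage(vectors, threshold):
    """Agglomerative clustering with complete linkage.

    Cluster similarity is the similarity of the two MOST DISSIMILAR
    members. The stopping threshold is an assumption of this sketch.
    """
    clusters = [[key] for key in vectors]

    def sim(c1, c2):
        return min(cosine(vectors[a], vectors[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        best, i, j = max(
            (sim(clusters[a], clusters[b]), a, b)
            for a in range(len(clusters))
            for b in range(a + 1, len(clusters))
        )
        if best < threshold:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


def top_labels(cluster, vectors, k=3):
    """Rank terms by summed weight over the cluster (assumed relevancy score)."""
    totals = {}
    for item in cluster:
        for term, weight in vectors[item].items():
            totals[term] = totals.get(term, 0.0) + weight
    return sorted(totals, key=totals.get, reverse=True)[:k]


# Toy context models, for illustration only.
vectors = {
    "p1": {"mayor": 2.0, "city": 1.0},
    "p2": {"mayor": 1.0, "city": 1.0},
    "p3": {"coauthor": 2.0, "paper": 1.0},
}
clusters = complete_linkage(vectors, threshold=0.5)
labels = top_labels(clusters[0], vectors, k=1)
```

Complete linkage tends to produce compact clusters, because a merge is accepted only if even the least similar pair of members is similar enough.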

Experiment – 1/3

• Test data
  – 143 distinct entity pairs from a political social network
    • each a pair of a politician and a geo-political entity
  – 421 entity pairs from a researcher network
    • each a pair of Japanese AI researchers
• Context model of each entity pair
  – 100 Web pages
  – noun phrases (NP) and nouns, selected by part-of-speech (POS) tag
  – stop words excluded

Experiment – 2/3

• Clustering
  – complete-linkage agglomerative
    • five distinct clusters for the political social network
    • twelve distinct clusters for the researcher network
• Two human subjects
  – assigned three or fewer possible labels to each pair
  – a cluster label
    • the most frequent term among the manually assigned relation labels of the entity pairs in the cluster.
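
The cluster-label rule described above amounts to a majority vote over the manual labels. A minimal sketch, with a hypothetical function name and toy labels; how ties and duplicate labels from the two subjects are handled is not stated in the slides and is left to `Counter` here.

```python
from collections import Counter


def cluster_label(cluster, manual_labels):
    """Pick the most frequent manually assigned label in a cluster.

    cluster: list of entity pairs.
    manual_labels: dict mapping an entity pair to its list of labels.
    """
    counts = Counter(
        label for pair in cluster for label in manual_labels[pair]
    )
    return counts.most_common(1)[0][0]


# Toy annotations, for illustration only.
manual_labels = {
    "p1": ["mayor", "resident"],
    "p2": ["mayor"],
    "p3": ["governor"],
}
label = cluster_label(["p1", "p2", "p3"], manual_labels)
```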

Experiment – 3/3

Evaluation

• For each cluster $cl$
  – $EP_{cl,correct}$: the number of entity pairs in $cl$ whose manually assigned relation labels include the label of cluster $cl$
  – $EP_{cl,total}$: the number of entity pairs in the cluster $cl$
• For each relation $l$
  – $EP_{l,correct}$: the number of entity pairs with the relation label $l$ whose cluster label is $l$
  – $EP_{l,total}$: the number of entity pairs that have the relation label $l$
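
The slides define the four counts but do not show how they are combined. One natural reading, assumed here, is a precision-style ratio of summed cluster counts (correct / total) and a recall-style ratio of summed relation counts (correct / total). The sketch below computes both under that assumption; the function name and toy data are hypothetical.

```python
def precision_recall(clusters, cluster_labels, manual_labels):
    """Compute correct/total ratios from the four counts on the slide.

    Assumption: precision = sum(EP_cl,correct) / sum(EP_cl,total) and
    recall = sum(EP_l,correct) / sum(EP_l,total).
    """
    # Cluster side: pairs whose manual labels include the cluster label.
    correct_cl = total_cl = 0
    for cluster, label in zip(clusters, cluster_labels):
        total_cl += len(cluster)
        correct_cl += sum(1 for pair in cluster if label in manual_labels[pair])

    # Relation side: pairs with relation l that sit in a cluster labeled l.
    assigned = {
        pair: label
        for cluster, label in zip(clusters, cluster_labels)
        for pair in cluster
    }
    relations = {l for labels in manual_labels.values() for l in labels}
    correct_l = total_l = 0
    for relation in relations:
        pairs = [p for p, labels in manual_labels.items() if relation in labels]
        total_l += len(pairs)
        correct_l += sum(1 for p in pairs if assigned.get(p) == relation)

    return correct_cl / total_cl, correct_l / total_l


# Toy data, for illustration only.
clusters = [["p1", "p2"], ["p3"]]
cluster_labels = ["mayor", "coauthor"]
manual_labels = {
    "p1": ["mayor", "resident"],
    "p2": ["mayor"],
    "p3": ["colleague"],
}
precision, recall = precision_recall(clusters, cluster_labels, manual_labels)
```

In the toy data, two of the three clustered pairs carry their cluster's label, and two of the four (pair, relation) assignments are recovered by the cluster labels.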

Evaluation

Conclusions

• Automatically extracting labels
  – relations between entities in social networks
  – unsupervised and domain-independent
• Utilizing the Web to obtain the collective contexts
  – Semantic Web
  – Web mining
• Future work
  – other types of social networks
  – enriching social networks