Post on 03-Jan-2016

Mining Bulletin Board Systems Using Community Generation

Ming Li, Zhongfei (Mark) Zhang, and Zhi-Hua Zhou
PAKDD’08

Reporter: Che-Wei, Liang
Date: 2008.07.10

Outline

• Introduction
• General Model
• Interest-Sharing Group Identification
• Predicting User Behavior Using Generated Community
• Experiment

Introduction

• Bulletin Board System (BBS)
  – Information exchanging and sharing platform
  – Consists of a number of boards
  – Users can read/post messages on different topics

• Users with similar interests may have similar actions

• Effective discovery of relationships between users of a BBS is essential

General Model

• Consider the posted messages
  – The title is used to fully determine the topics of a message
  – Key words are extracted from the titles
  – Key words are mapped to the collected topics

• A BBS user tends to join discussions on topics that he or she is interested in
  – Messages that users post may reflect their interests
  – Users’ interests are time-dependent
  – The frequency of posted messages should also be assessed
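
The title-to-topic step above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the slides do not say how key words are extracted or which topics were collected, so the `KEYWORD_TO_TOPIC` table, the tokenizer, and both function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical keyword-to-topic table; the slides do not list the real topics.
KEYWORD_TO_TOPIC = {
    "python": "programming",
    "compiler": "programming",
    "tank": "modern weapons",
    "missile": "modern weapons",
    "rpg": "computer games",
}


def topics_of_title(title):
    """Return the sorted topics whose key words appear in a message title."""
    words = re.findall(r"[a-z0-9]+", title.lower())
    return sorted({KEYWORD_TO_TOPIC[w] for w in words if w in KEYWORD_TO_TOPIC})


def topic_frequencies(titles):
    """Count how many of a user's message titles touch each topic."""
    counts = Counter()
    for title in titles:
        counts.update(topics_of_title(title))
    return counts
```

Counting per-topic frequencies over a window of titles is one way to reflect the point that posting frequency, and not only topic membership, should be assessed.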

General Model

• Access pattern of BBS users
  – View of Topics
    • A set of topics and user access frequencies of the messages posted to different boards by different users along the timeline
  – View of Boards
    • A set of boards and frequencies of messages posted to the boards along the timeline

General Model

• BBS model
  – A collection of users, each being represented by two timelines of actions on Boards view and Topics view
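
One possible in-memory layout for this model is sketched below. The slides only say that each user is represented by two timelines, so the time-slot granularity, the nested-dictionary layout, and the names `UserModel`, `record_post`, and `record` are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field


def _timeline():
    # Maps a time slot to {item: frequency}; the slot granularity is an assumption.
    return defaultdict(lambda: defaultdict(int))


@dataclass
class UserModel:
    """A user as two timelines of actions: Boards view and Topics view."""
    user_id: str
    boards_view: dict = field(default_factory=_timeline)
    topics_view: dict = field(default_factory=_timeline)

    def record_post(self, slot, board, topics):
        """Register one posted message in both views."""
        self.boards_view[slot][board] += 1
        for topic in topics:
            self.topics_view[slot][topic] += 1


def record(bbs, user_id, slot, board, topics):
    """Add a post to the BBS model, a dict of user_id -> UserModel."""
    user = bbs.setdefault(user_id, UserModel(user_id))
    user.record_post(slot, board, topics)
```

For example, calling `record(bbs, "u1", 0, "Games", ["rpg"])` twice on an empty `bbs` leaves `bbs["u1"].boards_view[0]["Games"]` at 2.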

Interest-Sharing Group Identification

Interest-Sharing Group Identification

• Given two timelines of actions X and Y of two users idx and idy

• A straightforward way
  – Similarity between Xi and Yj =

Interest-Sharing Group Identification

• Average frequency differences of actions

• Local similarity between Xi and Yj

Interest-Sharing Group Identification

• Hybrid similarity between Xi and Y

• Global similarity between X and Y

Predict User Behavior Using Generated Community

• Given a user idi,
  – Predict what action idi may take in the near future

• Actions that have been taken by idi may be closely related to idi’s future actions
  – Possible solution
    • Compute posterior probability
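
The slides do not preserve which posterior is computed, so the following is only a sketch of the general idea: estimating, from a single user's past action sequence, the empirical probability of the next action given the current one. The first-order (one-step) conditioning and the function name `transition_probabilities` are assumptions, not the method of the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(actions):
    """Estimate P(next action | current action) from one user's action sequence."""
    counts = defaultdict(Counter)
    # Count every consecutive (current, next) pair in the history.
    for current, nxt in zip(actions, actions[1:]):
        counts[current][nxt] += 1
    # Normalize each row of counts into a probability distribution.
    return {
        current: {a: n / sum(c.values()) for a, n in c.items()}
        for current, c in counts.items()
    }
```

With a short history, such estimates rest on very few observations, which is the kind of sparsity that pooling evidence across similar users is meant to relieve.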

Predict User Behavior Using Generated Community

• Resolved with interest-sharing groups
  – Similar users may take similar actions at some time instants
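
To illustrate how an interest-sharing group could inform a prediction, here is a simple weighted vote over the recent actions of a user's neighbors. This is not the BPUC algorithm, whose details are not preserved in this transcript; the voting scheme, the optional similarity weights, and the name `predict_from_group` are assumptions.

```python
from collections import Counter


def predict_from_group(neighbor_actions, weights=None):
    """Pick the action most supported by a user's interest-sharing group.

    neighbor_actions: {neighbor_id: [recent actions]}
    weights: optional {neighbor_id: similarity}; every neighbor counts 1.0 if omitted.
    """
    votes = Counter()
    for neighbor_id, actions in neighbor_actions.items():
        weight = 1.0 if weights is None else weights.get(neighbor_id, 0.0)
        for action in actions:
            votes[action] += weight
    if not votes:
        return None
    # Iterate in sorted order so ties resolve to the alphabetically first action.
    return max(sorted(votes), key=votes.get)
```

Weighting votes by similarity lets closer neighbors dominate the prediction, while the unweighted form treats all group members equally.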

BPUC algorithm

Experiment

• Data Set
  – BBS of Nanjing University
  – Messages collected from January 1st, 2003 to December 1st, 2005 on the 17 most popular boards
  – 4512 topics of 17 boards, 1109 users

• Evaluation set
  – 42 volunteers: 18 users interested in modern weapons, 12 users fond of programming skills, and the rest interested in computer games

Experiments on Community Generation

• Neighborhood accuracy
  – Describes how well the neighbors of a user in a generated community share that user's interests

• Component accuracy
  – Measures how well the generated groups represent certain interests that are common to the individuals of the groups

Experiments on Community Generation

• Example
  – A generated community, 7 links between similar users, 10 links between dissimilar users
  – Neighborhood accuracy = (7+10)/21 = 0.810
  – Component accuracy = (7+0)/21 = 0.333
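
The arithmetic of this example can be checked with a few lines of code. The slide gives only the numbers, so reading both measures as "correctly handled user pairs divided by all 21 pairs" is an assumption, as are the function and parameter names.

```python
def pair_accuracy(correct_similar, correct_dissimilar, total_pairs):
    """Fraction of user pairs counted as correct (interpretation assumed)."""
    return (correct_similar + correct_dissimilar) / total_pairs


# Numbers taken from the example above.
neighborhood = pair_accuracy(7, 10, 21)
component = pair_accuracy(7, 0, 21)

print(f"{neighborhood:.3f}")  # 0.810
print(f"{component:.3f}")     # 0.333
```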

Experiments on Community Generation

• Compare with CORAL

Experiments on Community Generation

Experiments on Community Generation

• Running time comparison

Experiments on User Behavior Prediction

• 1056 days for training the probability model
• Last 10 days for testing
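
A chronological split of this kind can be sketched as below. Counting both endpoints, the collection period stated in the data set description (January 1st, 2003 to December 1st, 2005) spans 1066 days, which matches 1056 + 10. That the split follows these exact calendar boundaries is an assumption, and `split_by_date` is a hypothetical helper.

```python
from datetime import date, timedelta

START = date(2003, 1, 1)   # first day of the collection period
END = date(2005, 12, 1)    # last day of the collection period
TEST_DAYS = 10


def split_by_date(records, end=END, test_days=TEST_DAYS):
    """Split (day, payload) records into training and testing by date.

    The last `test_days` days up to and including `end` form the test set.
    """
    test_start = end - timedelta(days=test_days - 1)
    train = [r for r in records if r[0] < test_start]
    test = [r for r in records if test_start <= r[0] <= end]
    return train, test


# One placeholder record per day of the collection period.
all_days = [(START + timedelta(days=i), None) for i in range((END - START).days + 1)]
train, test = split_by_date(all_days)
print(len(train), len(test))  # 1056 10
```

Splitting by time rather than at random keeps every test action strictly later than the training data, which suits a task defined as predicting future behavior.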
