social network analysis & friend network in blogosphere


Upload: dysis

Post on 11-Jan-2016



Page 1: Social network analysis & friend network in blogosphere

Social network analysis & friend network in blogosphere

吳邦一 (Bang Ye Wu), Department of Computer Science and Information Engineering, Shu-Te University

Page 2: Social network analysis & friend network in blogosphere

Social network

Node: actor (people, group, organization)
Arc (edge): social relation tie, such as friend, collaboration, message transmission, …
Directed or undirected (unidirectional or bidirectional)

Friend network:
  Node: people
  Arc: friend relationship
  In blogosphere: a node is a blog

Page 3: Social network analysis & friend network in blogosphere

Social Network

Page 4: Social network analysis & friend network in blogosphere

Friend relation in blogosphere

By data mining:
  Similar hyper-linking
  Similar interests
  Comments
  Cross posting

From the explicit friend lists maintained by bloggers themselves

Page 5: Social network analysis & friend network in blogosphere
Page 6: Social network analysis & friend network in blogosphere

Friend network in Blogosphere

A node is a blog

x1 is on x3's friend list, but x3 is not on x1's friend list (unidirectional)

x1 is on x2's friend list, and x2 is also on x1's friend list (bidirectional)
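The example on this slide can be written down as a small directed graph. The sketch below is only an illustration: the dictionary layout (`friends[a]` is the set of blogs that `a` has added) and the function name `classify_arcs` are assumptions, not part of the original system.

```python
# Hypothetical friend lists: friends[a] is the set of blogs that a has added.
friends = {
    "x1": {"x2"},   # x1 added x2
    "x2": {"x1"},   # x2 added x1, so x1 and x2 are bidirectional friends
    "x3": {"x1"},   # x3 added x1, but x1 did not add x3
}

def classify_arcs(friends):
    """Split the arcs into bidirectional pairs and unidirectional arcs."""
    bidirectional, unidirectional = set(), set()
    for a, out_list in friends.items():
        for b in out_list:
            if a in friends.get(b, set()):
                # Store the pair once, regardless of direction.
                bidirectional.add(frozenset((a, b)))
            else:
                unidirectional.add((a, b))
    return bidirectional, unidirectional

bi, uni = classify_arcs(friends)
```

Here `bi` holds the single pair {x1, x2} and `uni` holds the arc from x3 to x1.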

Page 7: Social network analysis & friend network in blogosphere

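Why unidirectional friends?

In most blog systems, adding a friend does not require the other party's approval
Hub effect: entertainers, celebrities, good-looking girls and guys
Don't forget that Wretch (無名) started out as a photo album service
Including: wretch, yam, xuite, pchome
In Taiwan, probably only MSN Live Spaces and Pixnet require friends to be confirmed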

Page 8: Social network analysis & friend network in blogosphere

The hubs in Wretch

Page 9: Social network analysis & friend network in blogosphere

Balance theory

People tend to maintain balanced relationships:
  Reciprocity: bidirectional tie (symmetric, undirected)
  Transitivity: a friend's friend tends to be a friend

Bloggers would like to know, but find it hard to learn:
  Who has added me as a friend
  Who my friends at distance more than 2 are

Page 10: Social network analysis & friend network in blogosphere

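Who has added you as a friend

On the blog page of some person A you can see A's friend list (if A has one and it is public), but you have no way of knowing who has added A as a friend, unless you go through every other blog in an exhaustive, carpet-style search

Like a one-way function

Easy to find the outgoing arcs but hard to find the incoming arcs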
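Once the outgoing friend lists have been crawled, the incoming arcs follow from a single pass that inverts them. The sketch below assumes the crawl result is a dictionary of friend lists; the names `build_incoming` and `crawled` are hypothetical and the data is a toy example.

```python
from collections import defaultdict

def build_incoming(friends):
    """Invert outgoing friend lists: incoming[b] is the set of blogs that added b."""
    incoming = defaultdict(set)
    for blogger, out_list in friends.items():
        for friend in out_list:
            incoming[friend].add(blogger)
    return incoming

# Toy crawl result (assumed data): each key lists the blogs it has added.
crawled = {"A": {"B"}, "B": {"A", "C"}, "D": {"A"}}
incoming = build_incoming(crawled)
who_added_A = sorted(incoming["A"])
```

The inversion costs time proportional to the number of arcs, which is why the hard part is collecting all the friend lists, not answering the query afterwards.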

Page 11: Social network analysis & friend network in blogosphere

Follower list (人緣列表)

Only in a few blog systems (in Taiwan)
  MSN Live Spaces, Pixnet: need confirmation
  Yam (天空部落) provides a follower list

Other blog systems in Taiwan: Wretch, PCHome, Xuite, Blogger, Yahoo, Sina, …

Wretch has only recently started to provide this service.

Page 12: Social network analysis & friend network in blogosphere

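Why crawl the friend network

Academic research
  Social network analysis: traditionally limited to small communities, because of data acquisition
  Online data: a chance to analyze large friend networks
    Newman (01): Scientific collaboration networks
    Ahn (07): CyWorld, more than ten million users, the largest blog system in Korea

Providing a query service for bloggers
  A search engine for interpersonal relationships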

Page 13: Social network analysis & friend network in blogosphere

WARM – blog friend relationship search service

http://warm.stu.edu.tw

Page 14: Social network analysis & friend network in blogosphere
Page 15: Social network analysis & friend network in blogosphere

Finding out who has added you as a friend

[Screenshot with numbered callouts 1 to 4]

Page 16: Social network analysis & friend network in blogosphere

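Finding out who has added you as a friend (cont.)

The output screen includes:
  Bidirectional friends (shown with a double-headed arrow)
  Unidirectional friends (shown with a single-headed arrow)

Clicking the closeness value shows the distance between you and the other person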

Page 17: Social network analysis & friend network in blogosphere

Relationship search

[Screenshot with numbered callouts 1 to 5]

Page 18: Social network analysis & friend network in blogosphere

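Using relationship search

If ksbcboy wants to get to know lindy7684, ksbcboy can use [Relationship Query], enter his own account and the other person's account, and obtain the result below.

This means that to get to know lindy7684 he may first need to know suzuka; to get to know suzuka he may first need to know cristin; cristin can be reached through yulu; and yulu is already ksbcboy's friend.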
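A relationship query of this kind is a shortest-path search on the directed friend graph. The sketch below uses breadth-first search; the account names come from the slide, but the arc directions, the data layout, and the function name `shortest_path` are assumptions made for illustration.

```python
from collections import deque

def shortest_path(friends, source, target):
    """Return one shortest directed path from source to target, or None."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in friends.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

# Assumed arcs: each account has added the next one in the chain.
friends = {
    "ksbcboy": ["yulu"],
    "yulu": ["cristin"],
    "cristin": ["suzuka"],
    "suzuka": ["lindy7684"],
}
path = shortest_path(friends, "ksbcboy", "lindy7684")
```

Because the arcs are directed, the reverse query (from lindy7684 to ksbcboy) finds no path in this toy graph.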

Page 19: Social network analysis & friend network in blogosphere

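Using relationship search (cont.)

The example above found only one path. More commonly many paths are found. For example, if nocold runs the same query for lindy, the result below is obtained, which means there are many routes he can take.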

Page 20: Social network analysis & friend network in blogosphere

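Popularity ranking

This is a sample output; the ranking, of course, keeps changing.

It also shows the increase or decrease in the number of followers and the rise or fall in rank. A simple comment feature is provided as well.

WARM updates its data the same way other search engines do: it cannot update in real time, so changes appear only after the next data update. The date of the last update is shown on the home page.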

Page 21: Social network analysis & friend network in blogosphere

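Similar popularity

If we query the similar popularity of Jolin, we get a result like the figure on the right. It shows that, of the people who added Jolin, 10,684 (29%) also added SHE as a friend.

Besides showing whose fans overlap most, it may also reveal that someone with popularity similar to yours is actually one of your friends.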
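The overlap figure quoted on this slide is the number of shared fans divided by the number of fans of the queried blog. The sketch below shows that calculation on toy data; the function name `fan_overlap`, the `incoming` layout (blog to set of fans), and the user IDs are assumptions, and the numbers are not the real Wretch figures.

```python
def fan_overlap(incoming, star):
    """Rank other blogs by the number of fans they share with `star`."""
    fans = incoming.get(star, set())
    result = []
    for other, other_fans in incoming.items():
        if other == star:
            continue
        shared = len(fans & other_fans)
        if shared:
            # Share of star's fans who also added `other`.
            result.append((other, shared, shared / len(fans)))
    result.sort(key=lambda row: (-row[1], row[0]))
    return result

# Toy data (assumed): incoming[b] is the set of users who added b.
incoming = {
    "jolin": {"u1", "u2", "u3", "u4"},
    "she": {"u1", "u5"},
    "other": {"u2", "u3", "u6"},
}
ranking = fan_overlap(incoming, "jolin")
```

In this toy graph, "other" shares 2 of jolin's 4 fans (50%) and "she" shares 1 (25%).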

Page 22: Social network analysis & friend network in blogosphere

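Similar friends

Similar friends is much like similar popularity. The difference is that it asks: "whose friends are the friends that you have added?"

Similar friends can also help you find like-minded people: if the blogs someone has added are much like yours, that person may share your interests.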

Page 23: Social network analysis & friend network in blogosphere

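Friend groups

A friend group is a set of friends who are tightly connected to one another: the links inside the group are dense, while the links to people outside it are relatively sparse

A person's friends can usually be divided into several groups

[Figure labels: junior high school classmates; senior high school classmates; girlfriend]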

Page 24: Social network analysis & friend network in blogosphere

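Using the friend group feature

If we query lindy's friend groups on Wretch, we get a result like the figure on the right

It is typically used to see which of your friends are more closely connected to each other.

For example, he may discover which of his friends is linked with a girl he met at a social mixer.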

Page 25: Social network analysis & friend network in blogosphere

System scale

Blog      Users        Links
Wretch    2,948,702    43,939,230
Yam         177,929     1,438,857
Pixnet       49,849        21,867
Xuite        62,257       159,891

Page 26: Social network analysis & friend network in blogosphere

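What is it good for? To computer science people, it is merely an application of simple techniques such as BFS, databases, shortest paths, and web programming (except for the friend groups). Is it useful?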

Page 27: Social network analysis & friend network in blogosphere

Website traffic

Page 28: Social network analysis & friend network in blogosphere

Operating status

Page 29: Social network analysis & friend network in blogosphere

What users need is what is useful

We merely confirmed what users need, which is what an ISP most needs to know

Social science studies human behavior

Page 30: Social network analysis & friend network in blogosphere

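Does the website merely satisfy people's voyeuristic desires?

Social behavior on the Internet has become a trend. Its rights and wrongs cannot be settled; we can only make it better.

Why do Internet users turn into homebodies (宅男腐女)?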

Page 31: Social network analysis & friend network in blogosphere

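Putting passengers bound for Taipei on the train, only to find that the track has been laid only as far as Tainan

What is the ultimate goal of an online community platform?
  A social platform that expands interpersonal relationships
  The diameter of a social network shrinks over time

In real society we get to know friends of friends through social activities and so expand our relationships. On the Internet, can blogs provide enough friend-making functions?
  Only publish, comments, cross-posting
  Users become more and more homebound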

Page 32: Social network analysis & friend network in blogosphere

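Providing more social activity services is both the responsibility of platform providers and the trend

This is only the beginning; there will certainly be more and more services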

Page 33: Social network analysis & friend network in blogosphere

Motivation (cont.)

Page 34: Social network analysis & friend network in blogosphere

Media coverage: magazines

Page 35: Social network analysis & friend network in blogosphere

Coverage in United Daily News (聯合報)

Page 36: Social network analysis & friend network in blogosphere

Coverage in Apple Daily (蘋果)

Page 37: Social network analysis & friend network in blogosphere

Coverage: TVBS

Page 38: Social network analysis & friend network in blogosphere
Page 39: Social network analysis & friend network in blogosphere

The two-timing incident

Page 40: Social network analysis & friend network in blogosphere

Traffic after the coverage

Page 41: Social network analysis & friend network in blogosphere

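The privacy myth

Catching someone two-timing was an accident for us. Is such an accident our fault? Seen from another angle, we often hear of Internet users being deceived. If this website lets people look into the other party's circle of friends when they start dating, does it not also help prevent crime?

Making this service public lets Internet users understand what others can do to them. If you do not want that, set your information to private!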

Page 42: Social network analysis & friend network in blogosphere

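Privacy issues

The media coverage was too sensational and gave the false impression that privacy was being exposed
  We have only public data
  The two-timing story and the myth of unidirectional friends

If someone else gives you away, is that my fault?
If someone shouts "I love Lin Chi-ling (林志玲)", is that Lin Chi-ling's private matter?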

Page 43: Social network analysis & friend network in blogosphere

Technical and academic aspects

Page 44: Social network analysis & friend network in blogosphere

The performance

[Architecture diagram labels: database; data acquisition server; web server; data computation server; static data; dynamic data]

Page 45: Social network analysis & friend network in blogosphere

The difficulty of blog friend network analysis

Blog friend relation differs from the real one
  Data incompleteness: suffered by all social network analyses
  Hub effect: only for unidirectional relationships

How to verify
  Traditional method
  Network reconstruction

Good metrics need to be defined

Page 46: Social network analysis & friend network in blogosphere

Relationship search: all shortest paths
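One common way to list every shortest path is a breadth-first search that records all predecessors at the previous level and then backtracks from the target. The sketch below follows that approach; it is not necessarily how WARM implements it, and the graph and the name `all_shortest_paths` are hypothetical.

```python
from collections import deque

def all_shortest_paths(friends, source, target):
    """Return every shortest directed path from source to target."""
    dist = {source: 0}
    preds = {source: []}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            break  # every node closer than the target has been expanded
        for nxt in friends.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                preds[nxt] = [node]
                queue.append(nxt)
            elif dist[nxt] == dist[node] + 1:
                preds[nxt].append(node)  # another equally short way in
    if target not in dist:
        return []
    paths = []

    def walk(node, suffix):
        # Backtrack along predecessor links until the source is reached.
        if node == source:
            paths.append([source] + suffix)
            return
        for p in preds[node]:
            walk(p, [node] + suffix)

    walk(target, [])
    return paths

# Assumed toy graph: two equally short routes from nocold to lindy.
friends = {"nocold": ["a", "b"], "a": ["lindy"], "b": ["lindy"]}
routes = all_shortest_paths(friends, "nocold", "lindy")
```

The number of shortest paths can grow exponentially with the distance, so a real service would probably cap how many it enumerates.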

Page 47: Social network analysis & friend network in blogosphere

Average distance

[Figure: two plots with x-axis from 0 to 25; y-axis from 0 to 0.45 in one and from 0 to 1.2 in the other]

Page 48: Social network analysis & friend network in blogosphere

How to compute

BFS from every node takes O(mn) time, which is too time-consuming
  Random sampling (100 nodes is enough)

Is diameter a good metric?
  The network is usually not strongly connected
  Effective diameter (90th percentile)
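The sampling idea can be sketched as follows: run BFS from a random sample of source nodes, pool the distances to every reachable node, and report the mean and the 90th percentile. The function names, the default seed, and the choice to skip unreachable pairs are assumptions of this sketch.

```python
import math
import random
from collections import deque

def bfs_distances(friends, source):
    """Directed BFS distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nxt in friends.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def sampled_distance_stats(friends, samples=100, percentile=0.9, seed=0):
    """Estimate the average distance and effective diameter from sampled sources."""
    nodes = sorted(friends)  # only nodes that have a friend list can be sources
    rng = random.Random(seed)
    sources = rng.sample(nodes, min(samples, len(nodes)))
    lengths = []
    for s in sources:
        # Unreachable pairs are skipped, since the graph need not be strongly connected.
        lengths.extend(d for d in bfs_distances(friends, s).values() if d > 0)
    if not lengths:
        return None, None
    lengths.sort()
    average = sum(lengths) / len(lengths)
    # Smallest distance that covers at least `percentile` of the sampled pairs.
    effective = lengths[math.ceil(percentile * len(lengths)) - 1]
    return average, effective

# Toy graph (assumed): a directed cycle on four nodes.
cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
average, effective = sampled_distance_stats(cycle)
```

Each BFS costs O(m), so 100 sampled sources cost O(100 m) instead of O(mn).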

Page 49: Social network analysis & friend network in blogosphere

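Six degrees of separation

Six degrees of separation: the distance between most pairs of people is at most 6

What counts as a "relationship" is vague. We all have close friends, and many friends we do not know that well.

On blogs, not every friend is added to the friend list. In this respect blog friend links are sparser than in real life.

On the other hand, someone added as a friend is not necessarily a friend (for example, a celebrity or someone admired). In this respect blog friends outnumber real ones.

Overall, setting aside the celebrity effect (and celebrities usually do not add many friends), blog friend links are fewer than real ones.

If a WARM query leads people to discover friends they had not yet added, and so brings them closer together, that is what WARM set out to do.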

Page 50: Social network analysis & friend network in blogosphere

Degree distribution (log-log scale)

[Figure: two log-log plots; x-axis from 1 to 100000 in one and from 1 to 10000 in the other; y-axis from 0.0000001 to 1]

Power law with two slopes
Big tail

Three membership levels
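A degree distribution like the one plotted here can be tabulated directly from the crawled friend lists. The sketch below computes the in-degree distribution (the same counting applies to out-degree); which degree each plot shows is not stated in the slide, and the function name and toy data are assumptions.

```python
from collections import Counter

def in_degree_distribution(friends):
    """Return {k: fraction of nodes whose in-degree is k}."""
    in_degree = Counter()
    nodes = set(friends)
    for out_list in friends.values():
        for friend in out_list:
            in_degree[friend] += 1
            nodes.add(friend)
    # Counter returns 0 for nodes that nobody has added.
    histogram = Counter(in_degree[node] for node in nodes)
    n = len(nodes)
    return {k: count / n for k, count in sorted(histogram.items())}

# Toy graph (assumed): c is added by a and b, a is added by c, nobody adds b.
toy = {"a": ["c"], "b": ["c"], "c": ["a"]}
distribution = in_degree_distribution(toy)
```

Plotting the non-zero degrees on log-log axes shows whether the points fall roughly on straight lines, which is the usual visual check for a power law.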

Page 51: Social network analysis & friend network in blogosphere

Clustering coefficient

[Figure: log-log plot; x-axis from 1 to 100000; y-axis from 0.00001 to 1]

The probability that two friends of a node with degree k are linked to each other

(big tail)
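The quantity on this slide can be estimated by averaging, over all nodes with the same degree k, the fraction of friend pairs that are linked. The slide does not say how direction is handled, so the sketch below makes two assumptions: a node's friends are the blogs on its own friend list, and two friends count as linked if either has added the other.

```python
from collections import defaultdict
from itertools import combinations

def clustering_by_degree(friends):
    """Return {k: mean fraction of linked friend pairs over nodes with k friends}."""
    def linked(a, b):
        # Assumption: an arc in either direction counts as a link.
        return b in friends.get(a, ()) or a in friends.get(b, ())

    sums, counts = defaultdict(float), defaultdict(int)
    for node, out_list in friends.items():
        neighbours = set(out_list) - {node}
        k = len(neighbours)
        if k < 2:
            continue  # no pairs to test
        pairs = list(combinations(sorted(neighbours), 2))
        closed = sum(1 for a, b in pairs if linked(a, b))
        sums[k] += closed / len(pairs)
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# Toy graph (assumed data).
toy = {"a": {"b", "c", "d"}, "b": {"c"}, "c": set(), "d": set(), "e": {"a", "b"}}
coefficients = clustering_by_degree(toy)
```

Checking all pairs is quadratic in the degree, so for hubs with tens of thousands of friends one would likely sample pairs instead.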

Page 52: Social network analysis & friend network in blogosphere

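Friend groups

Clique on n nodes: these n people all know one another.

Data incompleteness and the celebrity effect

Finding a clique is computationally very hard, so we can relax the conditions and use approximate methods

k-plex on n nodes: a group of n people in which everyone knows at least n-k of the members

k-clique on n nodes: a group of n people in which the distance between any two is at most k. Distance here means the number of steps through which two people can get to know each other; knowing someone directly is distance 1, so a 1-clique is just a clique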
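The two relaxations can be checked mechanically. The sketch below uses the usual textbook definitions on an undirected graph: in a k-plex every member is adjacent to at least n-k of the other members, and in a k-clique every two members are within distance k, measured in the whole graph. The adjacency layout and function names are assumptions.

```python
from collections import deque

def is_k_plex(adj, group, k):
    """Every member is adjacent to at least len(group) - k other members."""
    group = set(group)
    return all(len(adj[v] & (group - {v})) >= len(group) - k for v in group)

def is_k_clique(adj, group, k):
    """Every two members are within distance k (paths may leave the group)."""
    group = set(group)
    for source in group:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            if dist[node] == k:
                continue  # do not search beyond distance k
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        if not group.issubset(dist):
            return False
    return True

# Toy undirected star (assumed): centre c with leaves x, y, z.
adj = {"c": {"x", "y", "z"}, "x": {"c"}, "y": {"c"}, "z": {"c"}}
star = {"c", "x", "y", "z"}
star_is_2_clique = is_k_clique(adj, star, 2)
star_is_1_plex = is_k_plex(adj, star, 1)
```

The star is a 2-clique but not a 1-plex, which illustrates how loose the distance-based relaxation can be.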

Page 53: Social network analysis & friend network in blogosphere
Page 54: Social network analysis & friend network in blogosphere

The friend group

Define friend group as a clique in the transitive extension

Find the max-clique in the extension

Density analysis

Page 55: Social network analysis & friend network in blogosphere

2-clique

d(u,v) <= 2 for all u and v

Even a 2-clique is too sparse
  Its density may be as small as 2/n (for example, an undirected star)

Page 56: Social network analysis & friend network in blogosphere

3/2-clique

We define the 3/2-clique: d(u,v) + d(v,u) <= 3

Each pair is on a 3-cycle or is a pair of bidirectional friends

The density is at least 1/2.
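The 3/2-clique condition can be tested pair by pair with a distance-bounded BFS: a sum of at most 3 forces one direction to have distance 1 and the other at most 2, so searching to depth 2 is enough. The sketch below measures distances in the whole graph, which is an assumption, as are the function names.

```python
from collections import deque
from itertools import combinations

def bounded_distances(friends, source, limit):
    """Directed BFS distances from source, explored only up to `limit` steps."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if dist[node] == limit:
            continue
        for nxt in friends.get(node, ()):
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist

def is_three_halves_clique(friends, group):
    """Check d(u,v) + d(v,u) <= 3 for every pair in the group."""
    members = sorted(group)
    dist = {u: bounded_distances(friends, u, 2) for u in members}
    inf = float("inf")  # anything beyond distance 2 already violates the bound
    return all(
        dist[u].get(v, inf) + dist[v].get(u, inf) <= 3
        for u, v in combinations(members, 2)
    )

# Toy graphs (assumed): a directed 3-cycle and a directed 4-cycle.
cycle3 = {"a": ["b"], "b": ["c"], "c": ["a"]}
cycle4 = {"a": ["b"], "b": ["c"], "c": ["d"], "d": ["a"]}
ok3 = is_three_halves_clique(cycle3, cycle3)
ok4 = is_three_halves_clique(cycle4, cycle4)
```

The density bound follows because every pair needs at least one direct arc, so a group of n nodes has at least n(n-1)/2 of the n(n-1) possible arcs.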

Page 57: Social network analysis & friend network in blogosphere
Page 58: Social network analysis & friend network in blogosphere
Page 59: Social network analysis & friend network in blogosphere
Page 60: Social network analysis & friend network in blogosphere
Page 61: Social network analysis & friend network in blogosphere
Page 62: Social network analysis & friend network in blogosphere

The 3/2-cliques found are much denser than the theoretical lower bound

A well-structured network, not random at all

A good method to find friend groups in a blogosphere with unidirectional friend relationships

Page 63: Social network analysis & friend network in blogosphere

Degree of balance

Reciprocity
  The probability that an edge is bidirectional = the ratio of bidirectional edges
  0.51 for Wretch

Transitivity degree
  The probability that a friend's friend is also a direct friend
  0.0337 for Wretch (almost independent of degree)
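Both measures are simple counting ratios over the friend lists. The sketch below counts reciprocity per arc and transitivity per directed two-step path; whether the Wretch figures were counted exactly this way is not stated, so the counting conventions, function names, and toy data are assumptions.

```python
def reciprocity(friends):
    """Fraction of arcs a->b for which the reverse arc b->a also exists."""
    arcs = [(a, b) for a, out in friends.items() for b in out if a != b]
    if not arcs:
        return 0.0
    mutual = sum(1 for a, b in arcs if a in friends.get(b, ()))
    return mutual / len(arcs)

def transitivity(friends):
    """Fraction of two-step paths a->b->c for which the arc a->c also exists."""
    two_paths = closed = 0
    for a, out in friends.items():
        for b in out:
            if b == a:
                continue
            for c in friends.get(b, ()):
                if c == a or c == b:
                    continue
                two_paths += 1
                if c in out:
                    closed += 1
    return closed / two_paths if two_paths else 0.0

# Toy graph (assumed data).
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"d"}, "d": set()}
r = reciprocity(toy)
t = transitivity(toy)
```

In the toy graph 2 of the 5 arcs are reciprocated and 2 of the 4 two-step paths are closed.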

Page 64: Social network analysis & friend network in blogosphere

Betweenness

The number of shortest paths passing through a node (or an edge)
  Large for inter-cluster nodes
  Small for intra-cluster nodes

Used to find communities
  Girvan-Newman's algorithm

Page 65: Social network analysis & friend network in blogosphere

Betweenness

Not good for large networks
  Friends at distance > 2 have less influence
  Hard to compute: the GN algorithm takes O(m^2 n) time

Maybe we should try to define betweenness with limited distances
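One possible reading of a distance-limited betweenness is to count only shortest paths of length at most some bound. The sketch below adapts Brandes' accumulation scheme for unweighted directed graphs by stopping each BFS at that bound; this definition is an assumption made for illustration, not one given in the slides, and the function name is hypothetical.

```python
from collections import deque

def bounded_betweenness(friends, max_distance=None):
    """Node betweenness counting only shortest paths of length <= max_distance."""
    nodes = set(friends)
    for out_list in friends.values():
        nodes.update(out_list)
    score = dict.fromkeys(nodes, 0.0)
    for source in nodes:
        dist = {source: 0}
        sigma = {source: 1}   # number of shortest paths from source
        preds = {source: []}
        order = []            # nodes in non-decreasing distance
        queue = deque([source])
        while queue:
            node = queue.popleft()
            order.append(node)
            if max_distance is not None and dist[node] == max_distance:
                continue      # do not extend paths beyond the bound
            for nxt in friends.get(node, ()):
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    sigma[nxt] = 0
                    preds[nxt] = []
                    queue.append(nxt)
                if dist[nxt] == dist[node] + 1:
                    sigma[nxt] += sigma[node]
                    preds[nxt].append(node)
        # Accumulate dependencies from the farthest nodes back to the source.
        delta = dict.fromkeys(order, 0.0)
        for node in reversed(order):
            for p in preds[node]:
                delta[p] += sigma[p] / sigma[node] * (1 + delta[node])
            if node != source:
                score[node] += delta[node]
    return score

# Toy graph (assumed): a directed path a -> b -> c.
path = {"a": ["b"], "b": ["c"]}
full = bounded_betweenness(path)
local = bounded_betweenness(path, max_distance=1)
```

With a small bound each BFS touches only a local neighbourhood, so the cost per source depends on the neighbourhood size and not on the whole graph.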

Page 66: Social network analysis & friend network in blogosphere

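Remarks

Social computing: still on the rise

Social network analysis for the blogosphere or the WWW
  Computational problems remain to be solved
  Evaluation models remain to be defined
  The truth remains to be discovered

Enormous opportunity and demand; unlimited business potential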

Page 67: Social network analysis & friend network in blogosphere

The End

Thank you