Chapter 1. Social Media and Social Computing

October 2012, Youn-Hee Han

http://link.koreatech.ac.kr


1.1 Social Media

A rapid development and change of the Web and the Internet
– Participatory web applications and social networking sites
  • Empowering users with new forms of collaboration and communication
– Wikipedia
  • Large numbers of online volunteers collaboratively write encyclopedia articles.
– Amazon (online market) and social commerce
  • Sites recommend products by tapping into the wisdom of crowds via users' shopping and reviewing interactions.
– Twitter
  • Political movements benefit from new forms of engagement and collective action.
– Facebook
  • Connecting people

Facebook: a big change in our lives
– 901 million monthly active users at the end of March 2012
– More than 125 billion friend connections on Facebook at the end of March 2012

Classical web and traditional media
– 1 : N (one producer, many consumers)

Present social media
– N : M (many producers, many consumers)

A user of social media can be both a consumer and a producer.

This new type of mass publication enables the production of timely news and grassroots information (i.e., information produced by ordinary people), and it leads to mountains of user-generated content, forming the wisdom of crowds (collective intelligence).

Distinctive characteristics of social media
– Participation
– Sharing
– Rich user interaction

1.2 Concepts and Definitions

Social networks
– A social network is a social structure made of nodes (individuals or organizations) and edges that connect nodes in various relationships (or interdependencies) such as friendship, kinship, etc.

Why study social networks in the research community?
– All entities (e.g., people, devices, or systems) in this world are related to each other in one way or another.
– Social networks can be used in the context of information and communication technologies to provide efficient data exchange, sharing, and delivery services.
– By using a social network, we can exploit knowledge about these relationships to improve the efficiency and effectiveness of network services.

Networks and representations
– Graphical representation, matrix representation
– In a weighted network, edges are associated with numerical values.
– In a signed network, some edges represent positive relationships, while others may represent negative ones.
– Directed networks have directions associated with edges.
  • In our example in Figure 1.1, the network is undirected.

Figure 1.1
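A minimal sketch of the matrix representation, assuming nodes are numbered 0 to n−1 and each edge is given as a (u, v, weight) triple. The small networks below are hypothetical (not Figure 1.1): unweighted edges use weight 1, a signed network uses +1/−1, and `directed=True` skips the mirroring step.

```python
def adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix from (u, v, weight) triples (nodes numbered 0..n-1)."""
    matrix = [[0] * n for _ in range(n)]
    for u, v, weight in edges:
        matrix[u][v] = weight
        if not directed:
            matrix[v][u] = weight  # undirected: every edge appears in both directions
    return matrix


# Hypothetical undirected, unweighted network (weight 1 on every edge)
undirected = adjacency_matrix(4, [(0, 1, 1), (0, 2, 1), (1, 2, 1), (2, 3, 1)])

# Hypothetical signed network: +1 for a positive tie, -1 for a negative one
signed = adjacency_matrix(3, [(0, 1, 1), (1, 2, -1)])

# An undirected network always yields a symmetric matrix
is_symmetric = all(undirected[i][j] == undirected[j][i] for i in range(4) for j in range(4))

# For an unweighted undirected network, row sums are the node degrees
degrees = [sum(row) for row in undirected]
print(degrees)
```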

– Example of a directed social network: Twitter
  • One user x follows another user y, but user y does not necessarily follow user x.
  • In this case, the follower-followee network is directed and asymmetrical.
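The Twitter example can be sketched with a dictionary mapping each user to the set of users they follow. The accounts are hypothetical (named after x and y above); the point is that a directed edge x → y says nothing about y → x, so in-degree (followers) and out-degree (followees) differ.

```python
# Hypothetical follower network: follows[u] is the set of users that u follows
follows = {
    "x": {"y", "z"},
    "y": {"z"},
    "z": set(),
}


def has_edge(src, dst):
    """A directed edge src -> dst exists if src follows dst."""
    return dst in follows.get(src, set())


# Out-degree = number of followees; in-degree = number of followers
out_degree = {u: len(vs) for u, vs in follows.items()}
in_degree = {u: 0 for u in follows}
for u, vs in follows.items():
    for v in vs:
        in_degree[v] += 1

# The relation is asymmetric: x follows y, but y does not follow x
print(has_edge("x", "y"), has_edge("y", "x"))
```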

Nomenclature (terminology)

Figure 1.1

– The number of nodes adjacent to a node $v_i$ is called its degree, $d_i$.
  • $d_1 = 3$, $d_4 = 4$.
– A geodesic is a shortest path between two nodes; its length is the geodesic distance $g(i, j)$.
  • $g(2, 8) = 4$, as there is a geodesic (2, 3, 4, 6, 8).
– The eccentricity of a node $v$ is the maximum geodesic distance from $v$ to all other nodes in the network ($e(v) = \max_{u \in V} g(v, u)$).

• e(1) = 4, e(2) = 5, e(3) = 4, e(4) = 3, e(5) = 3, e(6) = 3, e(7) = 4, e(8) = 4, e(9) = 5
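Figure 1.1 itself is not reproduced here, so the sketch below uses a hypothetical edge list, chosen only to be consistent with every value quoted on these slides ($d_1 = 3$, $d_4 = 4$, $g(2, 8) = 4$, and the eccentricities above); it is an assumption, not the figure. In an unweighted network, geodesic distances come from breadth-first search (BFS), and the eccentricity of a node is the largest of them.

```python
from collections import deque

# Hypothetical undirected network, consistent with the values quoted for Figure 1.1
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (6, 7), (6, 8), (7, 8), (7, 9), (8, 9)]

adj = {v: set() for v in range(1, 10)}
for u, v in EDGES:
    adj[u].add(v)
    adj[v].add(u)


def geodesic_distances(source):
    """BFS from source: returns {node: geodesic distance} for every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


degree = {v: len(adj[v]) for v in adj}
# Eccentricity e(v): maximum geodesic distance from v to any other node (graph is connected)
eccentricity = {v: max(geodesic_distances(v).values()) for v in adj}

print(degree[1], degree[4], geodesic_distances(2)[8])
print(eccentricity)
```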

– The radius of a network is the minimum eccentricity among its vertices ($\mathrm{radius}(G) = \min_{v \in V} e(v)$).
  • radius(G) = 3
– The diameter of a network is the maximum eccentricity among its vertices, i.e., the length of the longest geodesic ($\mathrm{diameter}(G) = \max_{v \in V} e(v)$).
  • diameter(G) = 5
– The center of a network is the set of vertices whose eccentricity equals the radius ($\mathrm{Center}(G) = \{v \in V \mid e(v) = \mathrm{radius}(G)\}$).
  • Center(G) = {4, 5, 6}
– The periphery of a network is the set of vertices whose eccentricity equals the diameter ($\mathrm{Periphery}(G) = \{v \in V \mid e(v) = \mathrm{diameter}(G)\}$).
  • Periphery(G) = {2, 9}

Figure 1.1
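Given the eccentricities listed for Figure 1.1, the four definitions reduce to a minimum, a maximum, and two filters. This small sketch simply recomputes the slide values from those eccentricities.

```python
# Eccentricities quoted on the slides for Figure 1.1
ecc = {1: 4, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3, 7: 4, 8: 4, 9: 5}

radius = min(ecc.values())                                # minimum eccentricity
diameter = max(ecc.values())                              # maximum eccentricity = longest geodesic
center = {v for v, e in ecc.items() if e == radius}       # eccentricity equals the radius
periphery = {v for v, e in ecc.items() if e == diameter}  # eccentricity equals the diameter

print(radius, diameter, sorted(center), sorted(periphery))
# → 3 5 [4, 5, 6] [2, 9]
```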

Properties of large-scale networks
– Networks in social media are often huge, with millions of actors and connections.
– These large-scale networks share some common patterns:
  • scale-free distributions
  • the small-world effect
  • strong community structure
– Simple networks
  • e.g., lattice graphs or random graphs
– Complex networks
  • Networks with non-trivial topological features are called complex networks to differentiate them from simple networks.

Power law distribution
– Node degrees in a large-scale network often follow a power law distribution.
  • Most nodes have a low degree, while a few have an extremely high degree (say, degree > $10^4$).

(Figure labels: low degree; long tail)

Scale-free distribution
– Such a pattern is also called a scale-free distribution.
  • The shape of the distribution does not change with scale.
  • If we zoom into the tail (say, examine those nodes with degree > 100), we still see a power law distribution.
– This self-similarity is independent of scale.

– Networks with a power law distribution for node degrees are called scale-free networks
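A quick numerical check of the scale-free property, assuming a truncated discrete power law $p(k) \propto k^{-\alpha}$ with an illustrative exponent α = 2.5 (the slides give no exponent). Doubling the degree divides the probability by the same factor $2^{\alpha}$ in the head and in the tail, and the distribution restricted to degrees > 100 has the same log-log slope, −α.

```python
import math


def power_law_pmf(k_min, k_max, alpha):
    """Truncated discrete power law: p(k) proportional to k**(-alpha) for k_min <= k <= k_max."""
    weights = {k: k ** (-alpha) for k in range(k_min, k_max + 1)}
    total = sum(weights.values())
    return {k: w / total for k, w in weights.items()}


def loglog_slope(pmf, k1, k2):
    """Slope of the line through (log k1, log p(k1)) and (log k2, log p(k2))."""
    return (math.log(pmf[k2]) - math.log(pmf[k1])) / (math.log(k2) - math.log(k1))


ALPHA = 2.5  # assumed exponent, for illustration only
pmf = power_law_pmf(1, 20_000, ALPHA)

# Doubling k multiplies p(k) by 2**(-ALPHA), in the head (k = 10) and deep in the tail (k = 5000)
head_ratio = pmf[20] / pmf[10]
tail_ratio = pmf[10_000] / pmf[5_000]

# "Zoom into the tail": keep only degrees > 100 and renormalize
tail_mass = sum(p for k, p in pmf.items() if k > 100)  # only a tiny fraction of nodes
tail_pmf = {k: p / tail_mass for k, p in pmf.items() if k > 100}

print(head_ratio, tail_ratio, 2 ** -ALPHA)
print(loglog_slope(pmf, 2, 50), loglog_slope(tail_pmf, 200, 8_000))  # both -ALPHA up to rounding
```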

Small-world effect
– Travers and Milgram (1969)
  • conducted an experiment to examine the average path length for social networks of people in the United States
  • “six degrees of separation”
– Leskovec and Horvitz (Microsoft, 2008)
  • This result was recently confirmed in a planetary-scale instant messaging network of more than 180 million people, in which the average path length between any two people is 6.6.
  • Washington Post article: http://www.washingtonpost.com/wp-dyn/content/article/2008/08/01/AR2008080103718.html?nav=hcmodule
– Most real-world large-scale networks exhibit a small diameter.
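Average path length is the mean geodesic distance over all pairs of nodes. The sketch below (a hypothetical example, not data from the studies cited) computes it with BFS for a 100-node ring and for the same ring plus five long-range "shortcut" edges, showing how a few such links shorten paths.

```python
from collections import deque


def average_path_length(adj):
    """Mean geodesic distance over all ordered pairs of distinct nodes (connected graph assumed)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs


def ring(n):
    """Each node is linked to its two neighbors on a circle."""
    return {v: {(v - 1) % n, (v + 1) % n} for v in range(n)}


n = 100
ring_apl = average_path_length(ring(n))

# Hypothetical shortcuts: link node i to the node opposite it on the circle
shortcut_adj = ring(n)
for i in (0, 10, 20, 30, 40):
    shortcut_adj[i].add(i + 50)
    shortcut_adj[i + 50].add(i)
shortcut_apl = average_path_length(shortcut_adj)

print(ring_apl, shortcut_apl)
```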

Strong community structure
– People in a group tend to interact with each other more than with those outside the group.
– Friends of a friend are likely to be friends.

Clustering coefficient of a node $v_i$
– The number of connections between $v_i$'s friends over the total number of possible connections among them: $C_i = \frac{2 L_i}{d_i (d_i - 1)}$, where $L_i$ is the number of edges among the $d_i$ neighbors of $v_i$.
– $C_i = 1$ when $v_i$'s neighbors form a clique.
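The clustering coefficient translates directly into code: count the links among a node's neighbors and divide by the $d_i(d_i - 1)/2$ possible ones. The network is hypothetical, and returning 0.0 for nodes with fewer than two neighbors is one common convention (the coefficient is otherwise undefined there).

```python
from itertools import combinations


def clustering_coefficient(adj, v):
    """C_i = (edges among v's neighbors) / (d_i * (d_i - 1) / 2); 0.0 if d_i < 2 (convention)."""
    neighbors = adj[v]
    d = len(neighbors)
    if d < 2:
        return 0.0
    links = sum(1 for a, b in combinations(neighbors, 2) if b in adj[a])
    return links / (d * (d - 1) / 2)


# Hypothetical network: node "a" has friends b, c, d, and only b and c are friends
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
print(clustering_coefficient(adj, "a"))  # one of three possible links -> 1/3
print(clustering_coefficient(adj, "b"))  # neighbors a and c are linked (a clique) -> 1.0
```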

1.3 Challenges

The flood of data allows for unprecedented large-scale social network (complex network) analysis, with millions of actors or even more in one network.
  • e.g., email communication networks, instant messaging networks, mobile call networks, friendship networks, co-authorship or citation networks, biological networks, metabolic pathways, genetic regulatory networks, and food webs.

These large-scale networks present novel challenges for mining social media. Some examples are given below:

Scalability
– Networks reach astronomical sizes.

Heterogeneity
– Two persons can be friends and colleagues at the same time.

Evolution
– Social media emphasizes timeliness.

Collective intelligence
– Wisdom of crowds.

Evaluation
– Evaluation is a research barrier in mining social media.

1.4 Social Computing Tasks

Network modeling
– Since the seminal work by Watts and Strogatz (1998) and Barabási and Albert (1999), network modeling has gained significant momentum.
– Researchers have observed that large-scale networks across different domains follow similar patterns, such as scale-free distributions, the small-world effect, and strong community structures, as we discussed in Section 1.2.2.

(Figure panels: YouTube, Flickr)
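Barabási and Albert (1999) explained scale-free degree distributions through growth with preferential attachment. The sketch below is a simplified version under stated assumptions: the network starts from a complete graph on m + 1 nodes, and each new node links to m distinct existing nodes chosen with probability proportional to their degree (repeated draws are discarded). The values of n, m, and the seed are illustrative.

```python
import random
from collections import Counter


def barabasi_albert(n, m, seed=None):
    """Grow a network by preferential attachment; returns an edge list over nodes 0..n-1."""
    rng = random.Random(seed)
    # Assumed starting graph: a complete graph on m + 1 nodes
    edges = [(u, v) for u in range(m + 1) for v in range(u + 1, m + 1)]
    # Each node appears here once per incident edge, so a uniform draw from
    # this list picks a node with probability proportional to its degree.
    endpoints = [x for edge in edges for x in edge]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges


edges = barabasi_albert(1000, 2, seed=3)
degree = Counter(x for edge in edges for x in edge)
print(len(edges), min(degree.values()), max(degree.values()))
```

Most nodes end with a degree close to m, while a few early nodes collect many links: the long-tail pattern of a power law distribution.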

– When networks scale to millions of nodes and more, it becomes a challenge to compute some network statistics, such as the diameter and the average clustering coefficient.
  • One way to approach the problem is sampling.
  • Others explore I/O-efficient computation.
  • Recently, techniques harnessing the power of distributed computing have been attracting increasing attention.
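One way to apply sampling is to run BFS from only a random subset of source nodes: the mean distance from those sources estimates the average path length, and the largest eccentricity among them is a lower bound on the diameter. This is a sketch under stated assumptions (connected, unweighted, undirected graph; sources drawn uniformly at random); the slides do not specify a sampling scheme, and the 30 × 30 lattice is a hypothetical test network.

```python
import random
from collections import deque


def bfs_distances(adj, source):
    """Geodesic distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def sampled_statistics(adj, num_sources, seed=0):
    """Return (estimated average path length, lower bound on the diameter) from sampled BFS runs."""
    rng = random.Random(seed)
    sources = rng.sample(sorted(adj), num_sources)
    total, pairs, max_ecc = 0, 0, 0
    for s in sources:
        dist = bfs_distances(adj, s)
        total += sum(dist.values())
        pairs += len(dist) - 1
        max_ecc = max(max_ecc, max(dist.values()))  # eccentricity of s <= diameter
    return total / pairs, max_ecc


def grid_graph(m):
    """Hypothetical test network: an m x m two-dimensional lattice."""
    adj = {(i, j): set() for i in range(m) for j in range(m)}
    for i in range(m):
        for j in range(m):
            for di, dj in ((1, 0), (0, 1)):
                if i + di < m and j + dj < m:
                    adj[(i, j)].add((i + di, j + dj))
                    adj[(i + di, j + dj)].add((i, j))
    return adj


grid = grid_graph(30)                           # 900 nodes
exact = sampled_statistics(grid, len(grid))     # every node as a source: exact values
approx = sampled_statistics(grid, 50, seed=42)  # only 50 BFS runs
print(exact, approx)
```

For an m × m lattice the exact values are 2m/3 and 2(m − 1), i.e., 20 and 58 for m = 30, so the sampled estimate can be compared directly.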

Centrality analysis
– It identifies the most “important” nodes in a network (Wasserman and Faust, 1994).
  • degree centrality
  • betweenness centrality
  • closeness centrality
  • eigenvector centrality, of which PageRank scores (Page et al., 1999) are a closely related variant
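A sketch of two of the listed measures on a hypothetical four-node network: degree centrality (normalized by n − 1) and eigenvector centrality computed by power iteration. PageRank builds on the same eigenvector idea but adds a damping factor and divides each node's score among its out-links; that variant is not shown here.

```python
import math


def degree_centrality(adj):
    """Degree normalized by the maximum possible degree, n - 1."""
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}


def eigenvector_centrality(adj, iterations=200):
    """Power iteration: a node is central if its neighbors are central.

    Multiplies by (A + I) rather than A; the shift keeps A's leading eigenvector
    but prevents oscillation on bipartite graphs.
    """
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        y = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in y.values()))
        x = {v: val / norm for v, val in y.items()}
    return x


# Hypothetical network: a triangle 0-1-2 with node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
dc = degree_centrality(adj)
ec = eigenvector_centrality(adj)
print(max(ec, key=ec.get))  # node 2 is the most central
```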

Influence modeling
– It aims to understand the process of influence or information diffusion.
– Researchers study how information is propagated (Kempe et al., 2003) and how to find a subset of nodes that maximizes influence in a population.
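Kempe et al. (2003) study influence maximization under diffusion models such as the independent cascade model. The sketch below simulates that model and picks seed nodes greedily using Monte Carlo estimates of the spread; the uniform activation probability p, the number of runs, and the star network are illustrative assumptions, and the optimizations used in practice are omitted.

```python
import random


def independent_cascade(adj, seeds, p, rng):
    """One run: each newly active node gets a single chance to activate each inactive neighbor, with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return active


def expected_spread(adj, seeds, p, runs=500, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    return sum(len(independent_cascade(adj, seeds, p, rng)) for _ in range(runs)) / runs


def greedy_seed_set(adj, k, p, runs=500):
    """Greedy hill climbing: repeatedly add the node that most increases the estimated spread."""
    chosen = []
    for _ in range(k):
        candidates = [v for v in adj if v not in chosen]
        best = max(candidates, key=lambda v: expected_spread(adj, chosen + [v], p, runs))
        chosen.append(best)
    return chosen


# Hypothetical star network: hub 0 linked to leaves 1..5
star = {0: {1, 2, 3, 4, 5}, **{leaf: {0} for leaf in range(1, 6)}}
print(greedy_seed_set(star, k=1, p=0.5))
```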

Community detection
– Community
  • Also called groups, clusters, cohesive subgroups, or modules in different contexts.
– It is one of the fundamental tasks in social network analysis.
– The founders of sociology claimed that the causes of social phenomena were to be found by studying groups rather than individuals (Hechter (1988), Chapter 2, Page 15).

– Recent community detection research
  • scales up community detection methods to handle networks of colossal sizes;
  • deals with networks of heterogeneous entities and interactions, e.g., YouTube:
    » Entities (nodes): users, videos, tags
    » Edges: connecting to a friend, leaving a comment, sending a message
  • considers the temporal development of social media networks. Facebook has grown from 14 million users in 2005 to 500 million in 2010. As a network evolves, we can study how communities keep abreast of its growth and evolution, what temporal interaction patterns exist, and how these patterns can help identify communities.

Classification and recommendation
– A successful social media site often requires a sufficiently large population.
– Personalized recommendations can help enhance the user experience.
  • Classification can help recommendation, e.g., on Facebook.

– For instance, given a social network and some user information (interests, preferences, or behaviors), we can infer the corresponding information of other users within the same network.

The classification task here is to determine whether an actor is a smoker or a non-smoker (indicated by + and −, respectively).
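The slides do not say which classifier is used. As one simple illustration (an assumption, not the method behind the example), the sketch below labels unknown actors by a majority vote of their already-labeled neighbors, one round at a time, until no new labels appear. The network and labels are hypothetical; "+" marks a smoker and "-" a non-smoker.

```python
from collections import Counter


def neighbor_majority_vote(adj, known, max_rounds=10):
    """Iteratively assign each unlabeled node the most common label among its labeled neighbors."""
    labels = dict(known)
    for _ in range(max_rounds):
        updates = {}
        for v in adj:
            if v in labels:
                continue
            votes = Counter(labels[u] for u in adj[v] if u in labels)
            if votes:
                updates[v] = votes.most_common(1)[0][0]
        if not updates:
            break
        labels.update(updates)  # apply a whole round at once
    return labels


# Hypothetical network with two known smokers (A, B) and two known non-smokers (E, F)
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}
known = {"A": "+", "B": "+", "E": "-", "F": "-"}
print(neighbor_majority_vote(adj, known))
```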

Privacy, spam and security
– Privacy
  • Many social media sites (e.g., Facebook, Google Buzz) often find themselves the subjects of heated debates about user privacy.
– Spam and attacks
  • Another issue that causes grave concern in social media.
  • In the blogosphere, spam blogs (a.k.a. splogs) (Kolari et al., 2006a,b) and spam comments have cropped up. These spams typically contain links to other sites that are often disputable or otherwise irrelevant to the indicated content or context.
  • Some spammers use fake identities to obtain other users' private information on social networking sites.
– Research is needed for a “secure social computing platform”.
  • It is critical in turning social media sites into a successful marketplace.

1.5 Summary

Social media mining is a young and vibrant field that holds much promise.

Social media has kept surprising us with its novel forms and variety.
– Social media is increasingly blended into the physical world with recent mobile technologies and smartphones.

Appendix

Networks

Regular networks (source: http://geza.kzoo.edu/~csardi/module/html/regular.html)

1. Rings
A ring is a connected graph in which each vertex is connected to exactly two other vertices.

2. Lattices
A lattice is a graph in which the vertices are placed on a grid and neighboring vertices are connected by an edge. A one-dimensional lattice is like a ring, except that it is not circular: the circle is not closed. A two-dimensional lattice is shown in the accompanying figure.

(Figure panels: Ring, Lattice)

(Figure panels: Tree, Full graph)

3. Trees
A tree is a connected graph that contains no circles (cycles). A tree graph is usually plotted “tree-like”, with its root at the top and its branches going downward (hence its name). The top vertex is called the “root”, and the vertices at the next lower level are called the children of the root. In general, the neighbors of a vertex at the next lower level are called the children of that vertex.

4. Stars
A star graph is a special tree in which every non-root vertex is connected directly to the root.

5. Full graph
In a full graph every possible edge is realized, i.e., there is an edge between every pair of vertices. The number of edges is $v(v-1)/2$, where $v$ is the number of vertices.

A regular graph, in this sense, is a graph in which the pattern (structure, topology) of the edges connecting its vertices repeats throughout the entire graph.
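Small generators for the structures listed above, written as edge lists over vertices 0 to n−1 (a representation chosen for illustration). The last lines check the full-graph edge count v(v − 1)/2 and that every vertex of a ring has exactly two neighbors.

```python
from itertools import combinations


def ring(n):
    """Each vertex is connected to exactly two others (requires n >= 3)."""
    return [(i, (i + 1) % n) for i in range(n)]


def lattice_1d(n):
    """Like a ring, but the circle is not closed."""
    return [(i, i + 1) for i in range(n - 1)]


def star(n):
    """Vertex 0 is the root; every other vertex is connected only to it."""
    return [(0, i) for i in range(1, n)]


def full_graph(n):
    """An edge between every pair of vertices."""
    return list(combinations(range(n), 2))


def degrees(n, edges):
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg


v = 10
print(len(full_graph(v)), v * (v - 1) // 2)  # both 45
print(set(degrees(v, ring(v))))              # every vertex has degree 2
```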

Erdős-Rényi random graphs

G(n, p) graphs are generated this way: the graph contains n vertices; then, for every pair of vertices, an edge connecting them is drawn with probability p. (Figure: a G(n, p) graph with n = 100 and p = 2/100.)
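The G(n, p) procedure above is a single loop over vertex pairs. This sketch uses Python's `random` module with an explicit seed for reproducibility; the function name is our own, not a library API. With n = 100 and p = 2/100, the expected number of edges is p·n(n − 1)/2 = 99 and the expected degree is p(n − 1) = 1.98.

```python
import random
from itertools import combinations


def gnp_random_graph(n, p, seed=None):
    """Erdős-Rényi G(n, p): each of the n(n-1)/2 vertex pairs gets an edge independently with probability p."""
    rng = random.Random(seed)
    return [(u, v) for u, v in combinations(range(n), 2) if rng.random() < p]


n, p = 100, 2 / 100
edges = gnp_random_graph(n, p, seed=1)
expected_edges = p * n * (n - 1) / 2  # expected number of edges: 99
average_degree = 2 * len(edges) / n   # its expected value is p * (n - 1) = 1.98
print(len(edges), expected_edges, average_degree)
```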

Small-world phenomenon: Milgram’s experiment

[Instructions] Given a target individual (a stockbroker in Boston), pass the message to a person you correspond with who is “closest” to the target.

(Map labels: NE = Nebraska, MA = Massachusetts)

[Outcome] 20% of initiated chains reached the target; average chain length = 6.5

“Six degrees of separation”

(Figure labels: Regular; Random)

Collective dynamics of “small-world” networks, Duncan J. Watts & Steven H. Strogatz (http://www.tam.cornell.edu/tam/cms/manage/upload/SS_nature_smallworld.pdf)
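A minimal sketch of the Watts-Strogatz construction behind the regular-to-random spectrum: p = 0 keeps the regular ring lattice, p = 1 rewires every edge, and small p gives the small-world regime in between. Rewiring proceeds in passes (nearest neighbors first), forbidding self-loops and duplicate edges. The parameter names, n = 20, k = 4, and skipping an edge when no valid target exists are illustrative assumptions.

```python
import random


def watts_strogatz(n, k, p, seed=None):
    """Ring lattice of n nodes, each linked to its k nearest neighbors (k/2 per side, k even),
    then each lattice edge is rewired with probability p to a uniformly chosen new endpoint."""
    rng = random.Random(seed)
    adj = {v: set() for v in range(n)}
    for u in range(n):
        for j in range(1, k // 2 + 1):
            v = (u + j) % n
            adj[u].add(v)
            adj[v].add(u)
    # Rewire in passes: first the edges to nearest neighbors, then second-nearest, ...
    for j in range(1, k // 2 + 1):
        for u in range(n):
            v = (u + j) % n
            if v in adj[u] and rng.random() < p:
                candidates = [w for w in range(n) if w != u and w not in adj[u]]
                if candidates:
                    w = rng.choice(candidates)
                    adj[u].remove(v)
                    adj[v].remove(u)
                    adj[u].add(w)
                    adj[w].add(u)
    return adj


def edge_count(adj):
    return sum(len(nbrs) for nbrs in adj.values()) // 2


regular = watts_strogatz(20, 4, 0.0)              # p = 0: the regular ring lattice
small_world = watts_strogatz(20, 4, 0.1, seed=7)  # a few shortcuts
random_like = watts_strogatz(20, 4, 1.0, seed=7)  # every edge rewired
print(edge_count(regular), edge_count(small_world), edge_count(random_like))  # all n*k/2 = 40
```

Rewiring moves edges rather than adding them, so the number of edges, and hence the average degree k, stays fixed while path lengths shrink.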

Structural metrics: average path length
Structural metrics: degree distribution (connectivity)
Structural metrics: clustering coefficient
Regular networks – fully connected
Regular networks – lattice
Regular networks – lattice: ring world
Random networks (k = 3)
Small-world networks
Scale-free networks
Case studies – Internet
Case studies – World Wide Web
Case studies – Actors