attention economics in social web systems

ATTENTION ECONOMICS IN SOCIAL WEB SYSTEMS Digital Futures Seminar – 25th October 2012 DR MATTHEW ROWE @MROWEBOT [email protected] WWW.MATTHEW-ROWE.COM WWW.LANCS.AC.UK/STAFF/ROWEM/

Upload: matthew-rowe

Post on 22-Apr-2015


DESCRIPTION

Slides from a Highwire Digital Futures Seminar that I gave at Lancaster University on 25th October 2012 covering Attention Economics in Social Web Systems

TRANSCRIPT

Page 1: Attention Economics in Social Web Systems

ATTENTION ECONOMICS IN SOCIAL WEB SYSTEMS Digital Futures Seminar – 25th October 2012

DR MATTHEW ROWE @MROWEBOT [email protected] WWW.MATTHEW-ROWE.COM WWW.LANCS.AC.UK/STAFF/ROWEM/

Page 2: Attention Economics in Social Web Systems

Outline

- Background (About Me)
- Preamble:
  - Social Networks
  - The Evolution of the Web
  - Attention Economics
- The Nitty-Gritty: Research
  - Content Attention Patterns
  - Follower Prediction
  - Churn Prediction
- Summary

Page 3: Attention Economics in Social Web Systems

From… a small town (not so) far away


Page 4: Attention Economics in Social Web Systems

Studied…

- 2002 – 2006: M.Eng. in Software Engineering at the University of Sheffield
  - Developed an interest in:
    - Information Extraction
    - Machine Learning
    - Semantic Web
- 2006 – 2010: Ph.D. in Computer Science at the University of Sheffield
  - ‘Disambiguating Identity Web References with Social Data’
  - Researched:
    - Social networks
    - Digital Identity
    - Disambiguation techniques
    - Semantic Web technologies

Page 5: Attention Economics in Social Web Systems

Worked…

- April 2010 – August 2010: Research Associate at the University of Sheffield
  - Information Extraction
  - Linked Data for the Semantic Web
  - Unsupervised clustering methods for person disambiguation
- September 2010 – August 2012: Research Associate at the Knowledge Media Institute, OU
  - Social networks and churn
  - Behaviour modelling
  - Community evolution
  - Forecasting and prediction methods

Page 6: Attention Economics in Social Web Systems

Interested in…

- 1. Data
  - Semantics – how data is connected together
  - Social networks – how people are connected together
  - Digital Identity – how people present themselves
- 2. Prediction
  - Forecasting and classification
  - Disambiguation
- 3. Machines
  - Automation of processes
  - Modelling social systems for machines
  - Artificial Intelligence

Page 7: Attention Economics in Social Web Systems

Preamble: Social Networks, the Web, and Attention Economics


Page 8: Attention Economics in Social Web Systems

What is a social network?

- A social network consists of:
  - Nodes: users in the social network
  - Edges: connections between users
- Networks can be built using various mechanisms:
  - Explicitly:
    - Undirected edge: User A friends user B
    - Directed edge: User A follows user B
  - Implicitly:
    - User A replies to user B in a community forum
    - User A ‘likes’ user B’s content
- Properties of social networks can be measured using:
  - Network measures:
    - Clustering coefficient: how connected users in the social network are
  - Node measures:
    - Degree: the number of users connected to a given user
    - Centrality: how central a user is to the network, important for information flow
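The node and network measures listed on this slide can be illustrated on a toy graph. The sketch below is only an illustration: the four-user network and the function names are made up for the example, and the network-level clustering coefficient is taken to be the mean of the local coefficients, which is one common definition among several.

```python
from itertools import combinations

# Hypothetical undirected friendship network: user -> set of friends.
friends = {
    "A": {"B", "C", "D"},
    "B": {"A", "C"},
    "C": {"A", "B"},
    "D": {"A"},
}

def degree(graph, user):
    """Node measure: number of users directly connected to `user`."""
    return len(graph[user])

def local_clustering(graph, user):
    """Fraction of pairs of `user`'s friends who are themselves friends."""
    neighbours = graph[user]
    if len(neighbours) < 2:
        return 0.0
    pairs = list(combinations(sorted(neighbours), 2))
    linked = sum(1 for u, v in pairs if v in graph[u])
    return linked / len(pairs)

def average_clustering(graph):
    """Network measure: mean of the local clustering coefficients."""
    return sum(local_clustering(graph, u) for u in graph) / len(graph)

print(degree(friends, "A"))             # A has three friends
print(local_clustering(friends, "A"))   # only B-C of A's three friend pairs are linked
print(average_clustering(friends))
```

Centrality is left out here because it comes in several variants (degree, betweenness, closeness) and the slide does not say which one is meant.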


Page 9: Attention Economics in Social Web Systems

Social Network Theories

- Homophily: “birds of a feather flock together”
  - Nodes in a network tend to group with similar nodes
    - Structural: users who share many common friends are likely to be friends
    - Behavioural: users who exhibit similar behaviour are likely to be friends
    - Congruent with ‘Social Identity’: a user selects friends as a definition of their intentional identity
- Small-world:
  - Social networks form ‘small worlds’ where two users can be indirectly connected by a small number of steps
- Self-affirmation and self-efficacy:
  - Users construct their social network to affirm themselves
    - E.g. Action: discuss a problem. Reaction: support is offered from peers
    - E.g. Action: announce successful outcome. Reaction: congratulations from peers
- Social Contagion:
  - Users in a network are influenced by their peers. Influence grows with tie strength
    - E.g. A buys a product. B sees that A has bought the product. B buys the product.

Page 10: Attention Economics in Social Web Systems

Social Network Analysis

- Rooted in sociology
  - Understanding how people are connected together, their grouping and clustering
- Stanley Milgram: small-world experiment
  - Forwarded postcards on to direct acquaintances. Found 5.7 degrees.
    - Led to ‘Six degrees of separation’
  - Backstrom et al. ‘Four Degrees of Separation’. Web Science 2012.
    - Nodes=721m, edges=69b. Found 3.74 degrees.
- Paul Erdős: one of the most widely published mathematicians
  - Erdős number: the degree of separation between Paul Erdős and you
  - The Kevin Bacon game: try to connect any actor to Kevin Bacon in 6 steps
- Robin Dunbar: formulated that the maximum number of ties = 150
  - Repeatedly found to be the same for average social network sizes
- The explicit ‘socialisation’ of the Web has made social network analysis possible at large scale…
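The "degrees of separation" figures quoted on this slide are averages over shortest paths between pairs of users. As a rough sketch of how such an average can be computed on a small network, the code below runs a breadth-first search from every user. The four-person chain and the function names are hypothetical, and the studies cited on the slide had to rely on approximation techniques at their scale rather than this exhaustive approach.

```python
from collections import deque

def hops_from(graph, source):
    """Breadth-first search: shortest hop count from `source` to every reachable user."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        user = queue.popleft()
        for friend in graph[user]:
            if friend not in dist:
                dist[friend] = dist[user] + 1
                queue.append(friend)
    return dist

def average_distance(graph):
    """Mean shortest-path length over all ordered pairs of distinct, connected users."""
    total = 0
    pairs = 0
    for source in graph:
        for target, d in hops_from(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs

# Hypothetical chain of acquaintances: A - B - C - D
chain = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}

print(hops_from(chain, "A")["D"])   # A reaches D in three hops
print(average_distance(chain))
```

Note that hop counts and intermediaries differ by one: two users three hops apart are separated by two intermediaries, so it matters which of the two a reported "degree" refers to.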

Page 11: Attention Economics in Social Web Systems

The Evolution of the Web

- Web 1.0 – the document web
  - Communication medium: Bulletin-boards, Email, IM (ICQ)
  - Documents are connected to one another via hyperlinks
  - Web presence: restricted to the technologically savvy
- Web 2.0 – the social web
  - Platforms provide APIs and open up their data
  - Users become central to the web (User-generated content: Wikipedia)
  - Social networking sites: mediation through social objects
  - Web presence: blogs (cult of the amateur)
- Web 3.0 – the semantic web
  - Big and open data: Machines are now crunching large-scale datasets
  - Rise in lightweight semantics: Google Rich Snippets, Facebook Open Graph
  - Links have meaning! :Matthew foaf:knows :Jon

Page 12: Attention Economics in Social Web Systems

Defining Social Web Systems

- Social web sites are in essence applications:
  - Offer a range of functionalities and features, aside from just information
- Idea: Model social web sites as systems, define system properties:
  - Actors = users
  - Processes = social behaviour
  - Structure = social network
  - Input/Output = data
- Social web systems can evolve and change over time:
  - User behaviour may impact the behaviour of others
  - Shared content may spread through the system
  - Systems are susceptible to:
    - Viruses (trolling, nefarious content)
    - Stimuli (external events, key actors, content injection)
- Attention economics is also a salient factor in social web systems…

Page 13: Attention Economics in Social Web Systems

Attention

First, some background on attention…
- Attention: the middle ground between ‘awareness’ and ‘action’
  - It’s what motivates us to respond, read, like, comment, share
- Attention is the new currency:
  - Rise in ‘Attention Culture’:
    - Reflected in media programming: TOWIE, Made in Chelsea, X-Factor
    - Fame is now pursued and celebrity is a marker of ‘success’
  - Follows an economic structure:
    - Demand: attention from others (i.e. to my presence, content)
    - Supply: attention to others (i.e. reply to content, share content)
    - Attention is a limited commodity, only so much can be given to others

[Diagram: Awareness → Attention → Action]

Page 14: Attention Economics in Social Web Systems

Attention Economics in Social Web Systems

- “What counts now is what is most scarce now, namely attention.” Michael H. Goldhaber, 1997
- Rise of the Information Economy has made attention economics pertinent:
  - Web 3.0 has led to masses of data being released = Information Overload
  - “…what is the most precious resource in our new information economy? Certainly not information, for we are drowning in it. No, what we are short of is the attention to make sense of that information.” Richard A. Lanham, 2006
- Social web systems are the setting for the Battle for Attention:
  - Content publishers and creators: want to maximise content exposure
  - Government policy makers: want feedback on the policy/issue discussions they initiate
  - Digital marketing firms: maximise client’s audience, draw attention to client’s product
- The battle for attention has created various careers and issues:
  - The ‘Social Media Professional’
  - Digital Marketing – social media campaigns
  - Like Farms – generating artificial attention

Page 15: Attention Economics in Social Web Systems

Attention Economics in Social Web Systems: Research Challenges

- How do social network theories relate to attention economics?
- What causes users’ behaviour to change?
- Who influences whom?
  - Can we effectively model social contagion?
- How can I maximise attention to my content?
- How do social networks grow over time?
- Why do people subscribe to me? And then unsubscribe?!


Page 17: Attention Economics in Social Web Systems

Content Attention Patterns

The Nitty-Gritty: Research (I)

Page 18: Attention Economics in Social Web Systems

Content Attention Patterns

- Content publishers want people to:
  - Share their content
  - Discuss their content
- Government policy makers want to:
  - Enable public engagement
  - Get policy feedback
- How can I maximise attention to my content?
- Need to:
  - Model features associated with shared content
  - Predict which pieces of content will achieve high attention levels
  - Identify the feature patterns of high attention content
  - Learn how these patterns differ between social web systems

Page 19: Attention Economics in Social Web Systems

Content Attention Patterns: Model Derivation

Wish to capture features associated with published content…
- User features:
  - Number of followers, number of followees: social-network based
  - Number of posts, age in the system, post rate: activity-based
- Content features:
  - Post length, referral count, time in day: surface features of the post
  - Complexity: cumulative entropy of terms in the post
  - Readability: Gunning Fog index of the post
  - Informativeness: TF-IDF measure of terms within the post
  - Polarity: average sentiment of terms in the post
- Topic features:
  - Topic entropy: the concentration of the author across community forums
    - Higher entropy indicates a wider spread of forum activity
  - Topic likelihood: the likelihood that a user posts in a specific forum given their post history
    - Measures the affinity that a user has with a given forum
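Three of the features above have compact textbook definitions, so a small sketch may help make them concrete. This is an illustration under assumptions, not the feature extraction used in the study: complexity is taken as the Shannon entropy of the post's term distribution, the Gunning Fog index uses a crude vowel-run syllable heuristic, and all function names are invented for the example. TF-IDF and polarity are left out because they need a corpus and a sentiment lexicon respectively.

```python
import math
import re
from collections import Counter

def term_entropy(text):
    """Complexity: Shannon entropy (in bits) of the term distribution within one post."""
    terms = re.findall(r"[a-z']+", text.lower())
    n = len(terms)
    if n == 0:
        return 0.0
    counts = Counter(terms)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def count_syllables(word):
    """Crude heuristic (an assumption): count runs of vowels, with a minimum of one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def gunning_fog(text):
    """Readability: 0.4 * (words per sentence + 100 * share of words with 3+ syllables)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    complex_words = [w for w in words if count_syllables(w) >= 3]
    return 0.4 * (len(words) / len(sentences) + 100 * len(complex_words) / len(words))

def topic_entropy(posts_per_forum):
    """Topic entropy of an author's posts across forums (higher = wider spread)."""
    total = sum(posts_per_forum.values())
    probs = [c / total for c in posts_per_forum.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

print(term_entropy("a a b b"))                    # two equally frequent terms: 1 bit
print(gunning_fog("The cat sat. The dog ran."))   # short sentences, no complex words
print(topic_entropy({"golf": 5, "jobs": 5}))      # evenly split across two forums: 1 bit
```

An author who posts in a single forum gets a topic entropy of zero, which matches the slide's reading of low entropy as high concentration.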

Page 20: Attention Economics in Social Web Systems

Content Attention Patterns: Predicting Attention

- Two-stage process:
  - 1. Seed post identification
    - Separate the posts which elicit a response (seeds) from those that don’t (non-seeds)
    - Identify the features of seed posts: How do they differ from non-seeds?
    - Task: binary classification using supervised classifiers
    - Train on one sample (80%), test on another sample (20%)
    - Class labels: positive (seed) and negative (non-seed)
  - 2. Attention-level prediction
    - Predict which posts will get the most attention (i.e. number of replies)
    - Identify the features of high-attention posts
    - Task: regression using linear regression
    - Train on one sample (80%), test on another sample (20%)
    - Predict the number of replies
- How do the patterns from (1) and (2) differ between social web systems?
- Are there differences in the patterns within the same social web system?
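The evaluation scaffolding around the two stages can be sketched without committing to any particular classifier. The code below is a minimal illustration with invented function names: an 80/20 split, precision/recall/F1 for the seed class (stage 1), and a single-feature least-squares fit standing in for the linear regression of stage 2. The single feature is a simplification for the example; the slides describe models built from many features.

```python
import random

def split_80_20(items, seed=0):
    """Shuffle and split into an 80% training portion and a 20% test portion."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(0.8 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

def precision_recall_f1(actual, predicted, positive=1):
    """Stage 1 metrics for the positive (seed) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def fit_line(xs, ys):
    """Stage 2 stand-in: ordinary least squares for one feature (slope, intercept)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

train, test = split_80_20(range(10))
print(len(train), len(test))
print(precision_recall_f1([1, 1, 1, 0, 0], [1, 1, 0, 1, 0]))
print(fit_line([1, 2, 3], [2, 4, 6]))   # hypothetical feature values vs. reply counts
```

The sign of a fitted slope is what the later feature assessment reads off: a negative coefficient means more of that feature goes with fewer replies.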

Page 21: Attention Economics in Social Web Systems

Content Attention Patterns: Datasets

- Boards.ie
  - Largest community message board in Ireland
  - Covers a range of topics and subjects in dedicated forums
  - Analysed all posts and forums in 2006
  - Attention measure: number of posts in a thread
  - 1.9m posts, 90k seeds, 21k non-seeds, 30k users
- Twitter
  - Subscription-network social web system
    - Users subscribe to (follow) other users, then read their content
  - Collected a random subset over a 24-hour period
  - Attention measure: length of @reply chain
  - 1.4m posts, 144k seeds, 930k non-seeds, 766k users
- High class imbalance in each dataset!
  - i.e. seeds and non-seeds are far from evenly split: seeds dominate on Boards.ie, non-seeds on Twitter
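The extent of the imbalance follows directly from the counts on this slide. The short calculation below uses those reported counts; the function name is made up, and reading the majority share as a trivial baseline (the accuracy of always guessing the larger class) is an interpretation added here, not something the slides state.

```python
# Seed / non-seed counts as reported on the slide.
datasets = {
    "Boards.ie": {"seeds": 90_000, "non_seeds": 21_000},
    "Twitter": {"seeds": 144_000, "non_seeds": 930_000},
}

def majority_share(counts):
    """Share of posts in the larger class, i.e. the accuracy of always guessing it."""
    return max(counts.values()) / sum(counts.values())

for name, counts in datasets.items():
    majority = max(counts, key=counts.get)
    print(f"{name}: majority class = {majority}, share = {majority_share(counts):.3f}")
# Boards.ie: majority class = seeds, share = 0.811
# Twitter: majority class = non_seeds, share = 0.866
```

The two datasets are imbalanced in opposite directions, which is worth keeping in mind when comparing classifier scores across them.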

Page 22: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 1 – General Patterns

Began by examining the general patterns in the dataset…
- 1. Identification of Seed Posts
  - Which model performs best?

Table II. Results from the classification of seed posts on Twitter

Feature set  Classifier   P      R      F1     ROC
User         Naive Bayes  0.780  0.859  0.805  0.558
User         Max Ent      0.749  0.866  0.803  0.566
User         J48          0.855  0.866  0.806  0.537
Content      Naive Bayes  0.772  0.866  0.803  0.664
Content      Max Ent      0.801  0.863  0.808  0.777
Content      J48          0.826  0.866  0.810  0.671
All          Naive Bayes  0.802  0.746  0.770  0.677
All          Max Ent      0.807  0.864  0.810  0.781
All          J48          0.837  0.870  0.831  0.775

Table III. Results from the classification of seed posts on Boards.ie, where content features outperform user and focus features, content + focus features achieve the best performance for combinations of two feature sets and all features achieves the optimum performance

Feature set      Classifier   P      R      F1     ROC
User             Naive Bayes  0.691  0.767  0.719  0.540
User             Max Ent      0.776  0.806  0.722  0.556
User             J48          0.778  0.809  0.734  0.582
Content          Naive Bayes  0.730  0.794  0.740  0.616
Content          Max Ent      0.758  0.806  0.730  0.678
Content          J48          0.795  0.822  0.783  0.617
Focus            Naive Bayes  0.710  0.737  0.722  0.588
Focus            Max Ent      0.649  0.805  0.719  0.586
Focus            J48          0.649  0.805  0.719  0.500
User + Content   Naive Bayes  0.712  0.772  0.732  0.593
User + Content   Max Ent      0.767  0.807  0.734  0.671
User + Content   J48          0.795  0.821  0.779  0.675
User + Focus     Naive Bayes  0.699  0.778  0.724  0.585
User + Focus     Max Ent      0.771  0.806  0.722  0.607
User + Focus     J48          0.777  0.810  0.742  0.617
Content + Focus  Naive Bayes  0.732  0.787  0.746  0.658
Content + Focus  Max Ent      0.762  0.807  0.731  0.692
Content + Focus  J48          0.798  0.823  0.787  0.662
All              Naive Bayes  0.724  0.780  0.740  0.637
All              Max Ent      0.768  0.808  0.733  0.688
All              J48          0.798  0.824  0.792  0.692

5.2.3. Twitter vs Boards.ie. Comparing Twitter with Boards.ie we notice similarities between the performance achieved using the solitary feature sets of user features and content features and the improved performance when these features are combined together. In both cases content plays a greater role in seeding a discussion than the user’s standing on each platform. In the case of Twitter, however, the increase in performance is not as significant as for Boards.ie, suggesting that the user’s standing on the platform is more important on Twitter than on Boards.ie.

5.3. Results: Feature Assessment

Thus far we have assessed the predictive accuracy of different feature sets and identified the best performing model (classifier and feature sets) for each dataset, finding that the J48 classifier with all features yielded the best performing model for each system. Our next experiment sought to identify key features in differentiating seeds from non-seeds on Social Web systems by training the best performing model on the training split and applying the model to the testing split, thus yielding a baseline. Individual features were then dropped from the model before re-training it, applying it to the testing split once again and recording the reduction in the produced F1 level. From this we produced a ranking of features based on the extent to which dropping them harmed performance, chose the top-5 features from each dataset and assessed their correlation with seed and non-seed posts in the training split.

ACM Journal Name, Vol. V, No. N, Article A, Publication date: January YYYY.

Twitter Boards.ie

Table II. Results from the classification of seed posts on Twitter, where content features outperform user features and all features achieves the optimum performance

Feature set  Classifier   P      R      F1     ROC
User         Naive Bayes  0.803  0.862  0.809  0.603
User         Max Ent      0.823  0.865  0.805  0.612
User         J48          0.833  0.866  0.811  0.636
Content      Naive Bayes  0.811  0.850  0.823  0.651
Content      Max Ent      0.874  0.870  0.814  0.697
Content      J48          0.888  0.882  0.841  0.666
All          Naive Bayes  0.833  0.868  0.820  0.680
All          Max Ent      0.853  0.870  0.820  0.733
All          J48          0.869  0.883  0.851  0.726

among other users. Combining the features together yields our best performing model with the J48 classifier.

5.2.2. Boards.ie. For solitary feature sets on Boards.ie, Table III demonstrates that content features provide the best features, outperforming user features and focus features. Focus features perform poorly on their own, suggesting that such information is insufficient for identifying seeds. When we combine the feature sets together we notice improved performance; for example, combining content features with focus features achieves the best performing model in terms of 2 feature sets combined, demonstrating the utility of focus features when used in conjunction with content quality metrics. In each case of combining feature sets we observe improvements, and by combining all feature sets together we achieve the optimum model, suggesting that the use of all three feature sets holds important information for differentiating seeds from non-seeds on discussion forums.



Table IV presents the reduction in F1 levels in each dataset and the significance of each reduction, measured using the non-parametric Sign Test for statistical significance. Fig. 1 presents the reduction in all accuracy levels in both datasets and Fig. 2 shows the correlations between the top-5 features, in terms of F1 reduction, and the seeds and non-seeds in the training splits of each dataset. We now describe the findings from each platform and their differences and similarities.


Page 23: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 1 – General Patterns

- 1. Identification of Seed Posts
  - How do features correlate with seed posts?


Table IV. Reduction in F1 levels as individual features are dropped from the J48 classifier

Feature Dropped   Twitter    Boards.ie
(none)            0.862      0.815
Post Count        0.864      0.815
In-Degree         0.861.     0.811*
Out-Degree        0.858***   0.811*
User Age          0.863      0.807***
Post Rate         0.863      0.815
Topic Entropy     -          0.815
Topic Likelihood  -          0.798***
Post Length       0.861      0.810**
Complexity        0.862      0.811**
Readability       0.857***   0.802***
Referral Count    0.862      0.793***
Time in Day       0.842***   0.810**
Informativeness   0.861.     0.801***
Polarity          0.860**    0.808***

Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 .
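The ranking step described for Table IV (subtract each post-drop F1 from the baseline, then sort) can be reproduced from the table's own numbers. The sketch below copies the F1 values from Table IV and ignores the significance markers; the variable and function names are invented for the example.

```python
# F1 after dropping each feature, copied from Table IV (None = feature not used there).
BASELINE_F1 = {"Twitter": 0.862, "Boards.ie": 0.815}
F1_AFTER_DROP = {
    "Post Count":       {"Twitter": 0.864, "Boards.ie": 0.815},
    "In-Degree":        {"Twitter": 0.861, "Boards.ie": 0.811},
    "Out-Degree":       {"Twitter": 0.858, "Boards.ie": 0.811},
    "User Age":         {"Twitter": 0.863, "Boards.ie": 0.807},
    "Post Rate":        {"Twitter": 0.863, "Boards.ie": 0.815},
    "Topic Entropy":    {"Twitter": None,  "Boards.ie": 0.815},
    "Topic Likelihood": {"Twitter": None,  "Boards.ie": 0.798},
    "Post Length":      {"Twitter": 0.861, "Boards.ie": 0.810},
    "Complexity":       {"Twitter": 0.862, "Boards.ie": 0.811},
    "Readability":      {"Twitter": 0.857, "Boards.ie": 0.802},
    "Referral Count":   {"Twitter": 0.862, "Boards.ie": 0.793},
    "Time in Day":      {"Twitter": 0.842, "Boards.ie": 0.810},
    "Informativeness":  {"Twitter": 0.861, "Boards.ie": 0.801},
    "Polarity":         {"Twitter": 0.860, "Boards.ie": 0.808},
}

def rank_by_f1_reduction(platform, top):
    """Rank features by how much F1 falls when each one is dropped (largest fall first)."""
    reductions = {
        feature: BASELINE_F1[platform] - scores[platform]
        for feature, scores in F1_AFTER_DROP.items()
        if scores[platform] is not None
    }
    ranked = sorted(reductions.items(), key=lambda item: item[1], reverse=True)
    return [(feature, round(drop, 3)) for feature, drop in ranked[:top]]

print(rank_by_f1_reduction("Twitter", 3))
print(rank_by_f1_reduction("Boards.ie", 3))
```

On these numbers the three largest reductions are time in day, readability and out-degree for Twitter, and referral count, topic likelihood and informativeness for Boards.ie.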

5.3.1. Twitter. Time-in-day yielded the greatest reduction in the F1 level when it was removed from the best performing model (J48 using all features). Assessing the box plots in Fig. 2 we note that there is a clear correlation between earlier in the day and seed posts, while non-seed posts appear later. The variance in the distribution is also not as great for the non-seed posts when compared to seeds; this suggests that there is a specific window during which the likelihood of yielding a reply on Twitter is distinctly reduced. The next feature to produce the largest F1 reduction was readability. Looking at the box plots once again we see no clear differences between seeds and non-seeds, although the mean is higher for non-seeds.

The third feature to produce the greatest reduction in accuracy was the out-degree of the post author. The box plots indicate that seed post authors followed more users, on average, than non-seed post authors. This indicates that participation on the platform


Page 24: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 1 – General Patterns

- 1. Identification of Seed Posts
  - How do features correlate with seed posts?

[Figure panels: Twitter; Boards.ie]

Page 25: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 1 – General Patterns

- 2. Prediction of Attention Levels
  - Which model performs best?

We now describe our results from the model selection stage before breaking down the best performing model for each system and analysing its coefficients.

Table V. Averaged nDCG@k levels for different datasets and feature sets

Feature set      Twitter  Boards.ie  SCN    Digg
User             0.376    0.646      0.592  0.594
Content          0.790    0.433      0.522  0.647
Focus            -        0.587      0.564  0.824
User + Content   0.554    0.547      0.676  0.812
User + Focus     -        0.660      0.583  0.559
Content + Focus  -        0.756      0.573  0.848
All              -        0.687      0.569  0.831
Average          0.573    0.617      0.583  0.731

6.2. Results: Model Selection

6.2.1. Twitter. Table V presents the results from each of the tested systems. We first focus on the results obtained for the microblogging platform Twitter, where we tested user and content features and their combination together. The results show that content features performed best, far outperforming the use of user features on their own and the combination of user and content features. This indicates that, as with the seed post prediction, content plays an important role on Twitter over merely the use of user features.

Fig. 3 shows the nDCG@k values obtained using the different feature sets and combinations for varying levels of k. For Twitter, performance with user features improves as k grows but is poor when predicting the top ranks, confirming that user features are poor indicators of heightened discussion activity on Twitter. When using just content features we see a gradual decrease in prediction accuracy as we increase the rank position, suggesting that content quality measures, on their own, are good indicators of what will generate the greatest activity but lead to reduced performance as the prediction space enlarges.
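Since the results above are reported as nDCG@k, a short sketch of the measure may help. The version below assumes a linear gain (the true reply count of each post) and a log2 rank discount; the exact formulation used in the study is not given in this extract, and the function names and reply counts are invented for the example.

```python
import math

def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (linear gain, log2 discount)."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))

def ndcg_at_k(replies_in_predicted_order, k):
    """nDCG@k: DCG of the predicted ranking divided by the DCG of the ideal ranking.

    `replies_in_predicted_order` holds the true reply count of each post, listed
    in the order in which the model ranked the posts (highest predicted first).
    """
    ideal = sorted(replies_in_predicted_order, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(replies_in_predicted_order, k) / ideal_dcg if ideal_dcg else 0.0

print(ndcg_at_k([10, 3, 1, 0], 4))   # perfect ranking
print(ndcg_at_k([3, 10, 0, 1], 1))   # top prediction has 3 replies, the best post has 10
```

A value of 1.0 means the model placed the posts in exactly the order of their true attention; lower values penalise mistakes near the top of the ranking more heavily than mistakes further down.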

6.2.2. Boards.ie. Looking now at Boards.ie, the Irish community discussion board, the results in Table V show that for solitary feature sets (i.e. user features, content features and focus features) user features performed best, followed by focus features. This indicates that in terms of heightened activity the standing which a user has within the community is more important than solely the characteristics of the content they are sharing. Combining the features together we yield the best performing model when using content and focus features. This indicates that information about the topical concentration of the user, when used in conjunction with content information, provides a useful means of predicting activity levels for popular posts.

The normalised discounted cumulative gain plots in Fig. 5 demonstrate that using just user features we are able to predict with fairly good accuracy (0.89) the top-ranked post within the validation split. Content and focus features, as we state above, perform best and show a gradual decrease in nDCG as k is increased. We find the smallest standard deviation for this feature set combination too, indicating its consistency across nDCG@k levels for growing values of k.

6.2.3. SCN. For the SAP Community Network we find that user features perform best out of the solitary feature sets, indicating that the user’s standing provides more information of heightened attention than the use of content or focus features on their own. User and content features provide the best combination of feature sets, outperforming the other two permutations and the use of all features. In this instance the use


Page 26: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 1 – General Patterns

- 2. Prediction of Attention Levels
  - How do features correlate with heightened attention?

For the nDCG@k values in Fig. 3, focus features on their own perform well for predicting top ranks: 0.866 for nDCG@1, 0.887 for nDCG@5, 0.849 for nDCG@10 and 0.91 for nDCG@20. Content and focus features perform extremely well when predicting the post with the greatest activity, yielding 0.98 for nDCG@1, producing an effective means of predicting the posts that will garner the greatest attention.

6.2.5. Comparing Systems. Comparing the four Social Web Systems in terms of the models and their performance we find similar patterns when using solitary feature sets, where user features perform best on Boards.ie and SCN, emphasising the reputation dynamics that influence attention to content. Combining the feature sets together we find that content and focus features provide similar predictive performance for Boards.ie and Digg, indicating the necessity for capturing the topical concentration of the post author to enable effective predictions.

Several differences also emerge between the systems. For instance, we found content features to be more important than user features on Twitter, thus differing from the other platforms where content features on their own perform poorly. This could be due to the dynamics of the platform requiring content to be informative within a restricted length; information must therefore be conveyed in a meaningful and concise manner. In our case we are exploring attention measured through the length of reply chains and comments attributed to a given piece of content; for other measures of attention, such as retweets on Twitter, user features could play a greater role.

Looking at the average results obtained from our model selection task, we find that our method achieves its best performance over the Digg dataset, possibly because the dataset is skewed towards more popular content. The next highest performance is achieved on Boards.ie, where content and focus features provide the best model for prediction. We achieve poor performance for Twitter when using all features from the tested models, where the use of user features harms the predictive performance of content features given that accuracy worsens. In this case the content of the tweet contains vital indicators of the attention that we can expect to yield.

Table VI. Summary of coefficients from Linear Regression Models induced from best performing features and their significance levels

Feature           Twitter       Boards.ie      SCN             Digg
Post Count        -             -              -5.689E-04      -
Out-degree        -             -              -2.520E-02 ***  -
In-degree         -             -              5.013E-02 ***   -
User Age          -             -              6.665E-08       -
Post Rate         -             -              1.227E-01       -
Topic Entropy     -             -0.2441 ***    -               -16.369 **
Topic Likelihood  -             60.0807 ***    -               -33.286 .
Post Length       -0.0092       0.0369 ***     2.414E-02 ***   7.131 *
Complexity        -1.9664 ***   2.4775 ****    3.610E-01 **    -30.592 ***
Readability       0.0043 **     0.0024 ***     -1.846E-03      -0.018
Referral Count    -0.5842 ***   -0.1236 **     2.147E-02 .     -
Time in Day       -0.0028 ***   7.98-5         -2.340E-05      0.012 **
Informativeness   0.0035        -0.0093 ****   -4.773E-03 ***  -1.146 *
Polarity          0.0309        -4.0863 ***    -1.094E-01      -3.464

Signif. codes: p-value < 0.001 *** 0.01 ** 0.05 * 0.1 . 1

6.3. Results: Feature Assessment

Thus far we have identified the best performing model from each of the Social Web Systems’ datasets. Our next task focuses on how the feature patterns in the linear re-


Twitter High Attention =
- Shorter posts
- Denser vocabulary
- Fewer hyperlinks
- Earlier in the day

Boards.ie High Attention =
- Concentrated topics
- Longer posts
- Wider vocabulary
- Fewer referrals
- Negative sentiment

Page 27: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 2 – Specific Patterns

Examining community-specific patterns in Boards.ie. Added additional features to capture community dependencies.
- 1. Identification of Seed Posts (over 9 randomly sampled forums):
  - 267 (Astronomy and Space) = content features alone perform best
  - 221 (Spanish) = title features and user features perform best
  - In support communities: new users to the topic = more likely to get replies
  - Specificity of community’s subject has an effect:
    - Work and Jobs forum is very general: post does not have to fit the forum
    - Golf forum is very specific: distance between post and community must be minimised

Table II. F1 score and Matthews correlation coefficient (MCC) for different forums when performing seed post identification. The best performing model for each forum is marked in bold.

forumid  User            Focus           Content         Community       Title           All
         MCC     F1      MCC     F1      MCC     F1      MCC     F1      MCC     F1      MCC     F1
10       0.0     0.75    0.0     0.75    0.071   0.76    0.0     0.75    0.0     0.75    0.1     0.766
607      0.332   0.839   0.0     0.802   0.0     0.802   0.0     0.802   0.0     0.802   0.359   0.857
343      0.0     0.769   0.0     0.769   0.093   0.782   0.0     0.769   0.0     0.769   0.148   0.789
267      0.078   0.609   -0.132  0.531   0.242   0.673   0.078   0.609   0.0     0.549   0.181   0.643
865      0.0     0.533   0.0     0.533   0.0     0.533   0.0     0.533   0.0     0.533   0.632   0.815
544      0.0     0.818   0.0     0.818   -0.052  0.809   0.0     0.818   0.0     0.818   0.109   0.828
55       0.0     0.913   0.0     0.913   0.0     0.913   0.0     0.913   0.0     0.913   0.144   0.918
221      0.447   0.625   -0.447  0.25    0.0     0.486   0.0     0.333   0.707   0.829   0.0     0.333
630      0.0     0.678   0.0     0.678   -0.044  0.675   0.0     0.678   0.0     0.678   0.109   0.686
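The table pairs F1 with the Matthews correlation coefficient, and the two can tell quite different stories on imbalanced data. The sketch below uses the standard confusion-matrix formulas; the counts in the usage lines are hypothetical and chosen only to show how an MCC of 0 can coexist with a fairly high F1, without claiming that this is what happened in any particular forum.

```python
import math

def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient from confusion-matrix counts.

    By convention the score is set to 0 when the denominator is zero, which
    happens when a classifier assigns every post to the same class.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def f1_positive(tp, fp, fn):
    """F1 score of the positive (seed) class."""
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0

# Hypothetical forum with 60 seeds and 40 non-seeds, and a classifier that
# labels every post as a seed.
print(mcc(tp=60, fp=40, tn=0, fn=0))     # no better than chance
print(f1_positive(tp=60, fp=40, fn=0))   # yet F1 looks respectable

# A perfect classifier on a balanced hypothetical sample.
print(mcc(tp=50, fp=0, tn=50, fn=0))
```

This is one reason to read the MCC column alongside F1: a feature set with a non-zero MCC is doing something beyond echoing the majority class.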

and p < 0.01) in order to attract attention.

Another support and advice oriented community is the community around forum 343 (Golf). The topic of this community is more specific than the topic of the previous community. In this community the content of a post needs to be rather complex (coef = 2.261 and p < 0.01) and should also not contain links (coef = −0.586 and p < 0.05) in order to attract attention. Further, posts which are topically distinct from what the Golf community usually talks about (community distance coef = −4.528 and p < 0.05) are less likely to get replies. This indicates that within the community specialist terminology is used and the divergence away from such vocabularies reduces the likelihood of generating attention to a new post.

The community around forum 865 (HE Video Players & Recorders) has an advice seeking and experience sharing purpose but only for one specific group of products. For this community forum none of the features’ coefficients are significant. However, a classification model trained with all features outperformed a random baseline classification model with an MCC value of 0.632. By looking at the feature list ranked by the IGR, we note that only one feature contributed to this performance boost, namely the inequity score (IGR = 0.7). The coefficient of the inequity score in the regression model is negative (coef = −5.025), which indicates that a post is less likely to get replies if it is authored by a user who replied to many posts in this forum in the past but has not received many replies in this forum. One possible explanation is that in support oriented communities users who reply to many posts are more likely to be experts. It is not surprising that posts of such expert users are less likely to get replies, since fewer users have enough expertise to answer or comment on the post of an expert.

The main purpose of the community around forum 544 (Banking & Insurance & Pensions) is also for seeking advice and sharing experiences and information. In this community shorter posts (content length coef = −0.017 and p < 0.05) authored by users who are new to the topic - or have not published anything about the topic before (topic distance coef = 2.890 and p < 0.01) - are more likely to get replies. When inspecting the IGR based feature ranking of the content group, we find that only the complexity of content is a useful feature for informing a classifier which has to differentiate between seeds and non-seeds (IGR = 0.354). This indicates that short, but complex posts which have been authored by newbies are most likely to catch the attention of this community.

The main purpose of the community around forum 267 (Astronomy & Space) is to share information and content and to engage in discussions. Long posts (coef = 0.083 and p < 0.05) which do not contain many novel terms (informativeness coef = −0.029 and p < 0.05) but are positive in their sentiment (polarity’s coef = 4.556 and p < 0.05) are very likely to attract the attention of this community. The content feature with the highest IGR is the number of links per post (IGR = 0.1). Since the coefficient of the number of links is positive in our regression model we can conclude that a higher number of links indicates that the post is more likely to get replies (coef = 0.157) in this forum. This suggests that in this forum posts which are long, informative and re-use the vocabulary of the community are more likely to attract attention.

Also for the topical community around forum 55 (Satellite) the main purpose is to share information and content and to engage in discussions. In this community posts authored by users who have a high forum likelihood are less likely to get replies (coef = −5.891 and p < 0.01). This suggests that users who stimulate discussions in this community have to focus their activity away from this forum. Further, posts which are topically distant from the topics the community usually talks about are again less likely to get replies (coef = −2.944 and p < 0.01). This pattern indicates that users who focus their activity away from this community and then post a new thread that is about topics which seem to be in the topical interest area of the community are more likely to get replies.

The community around forum 221 (Spanish) is a community of practice, which means that the community members have a common interest in a particular domain or area, and learn from each other. This community is mainly impacted by user and title factors; however, none of the features’ coefficients are significant. Ranking the features by their IGR shows that the most important feature for discriminating between posts getting replies and posts not getting replies is the title length (IGR = 0.558). Interestingly in this forum, posts with short titles are more likely to get replies. The longer the title, the less likely a post gets replies (title length’s coef = −0.326). The second most important feature is the user account age (IGR = 0.381). Users who have owned an account for

Page 28: Attention Economics in Social Web Systems

Content Attention Patterns: Experiment 2 – Specific Patterns


¨  2. Prediction of Attention Levels
    ¤  Golf forum (343):
        n  Seed post identification = content and community features
        n  Prediction of attention levels = focus features
    ¤  Satellite forum (55):
        n  Seed post identification = all features
        n  Prediction of attention levels = title features alone work best

TABLE III. AVERAGED NORMALISED DISCOUNTED CUMULATIVE GAIN nDCG@k VALUES USING A LINEAR REGRESSION MODEL WITH DIFFERENT FEATURE SETS. AN nDCG@k OF 1 INDICATES THAT THE PREDICTED RANKING OF POSTS PERFECTLY MATCHES THEIR REAL RANKING. POSTS ARE RANKED BY THE NUMBER OF REPLIES THEY GOT.

Forum  User      Focus     Content   Community Title     All
10     0.599     0.561     0.452     0.516     0.418     0.616
221    0.887     0.954     0.863     0.954     0.88      0.985
267    0.63      0.703     0.773     0.6       0.75      0.685
343    0.558     0.727     0.612     0.634     0.572     0.636
544    0.5       0.514     0.607     0.684     0.461     0.574
55     0.574     0.42      0.655     0.671     0.73      0.692
607    0.77      0.632     0.814     0.48      0.686     0.842
630    0.707     0.459     0.635     0.547     0.485     0.762
865    0.673     0.612     0.85      0.643     0.771     0.796
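To make the nDCG@k measure in Table III concrete, here is a minimal sketch that ranks posts by a predicted score and compares that ranking against the ranking by actual reply counts. It assumes the common formulation in which the gain is the raw reply count and the discount is log2(rank + 1); the paper may use a different variant. The function names are hypothetical.

```python
import math


def dcg_at_k(relevances, k):
    """Discounted cumulative gain of the first k items (rank 1 has discount log2(2) = 1)."""
    return sum(rel / math.log2(i + 2) for i, rel in enumerate(relevances[:k]))


def ndcg_at_k(predicted_scores, reply_counts, k):
    """nDCG@k of the ranking induced by predicted_scores.

    predicted_scores and reply_counts are parallel lists, one entry per post.
    """
    order = sorted(range(len(reply_counts)),
                   key=lambda i: predicted_scores[i], reverse=True)
    ranked = [reply_counts[i] for i in order]   # true reply counts in predicted order
    ideal = sorted(reply_counts, reverse=True)  # best possible order
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg else 0.0


# Hypothetical posts: a model that orders them correctly, and one that reverses them.
perfect = ndcg_at_k([3.0, 2.0, 1.0], [10, 5, 0], k=3)
reversed_order = ndcg_at_k([1.0, 2.0, 3.0], [10, 5, 0], k=3)
```

A value of 1 is reached only when the predicted order matches the order by reply count within the top k, which is the reading given in the table caption.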

the same features which have a positive impact on the start of discussions in one community can have a negative impact in another community. For example, our results from the seed post identification experiment suggest that a high number of links in a post has a negative impact on the post getting replies, especially in communities having a supportive purpose (such as communities 343 and 10). However, in the community around forum 267, which mainly has an information and content sharing purpose, the contrary is the case. Posts which tend to have many links are more likely to get replies in this community forum. This example nicely shows that the purpose of a community may influence how individual factors impact the start of discussions in a community forum.

It is also interesting to note that for support oriented forums (such as forums 865 and 544) users who seem to be rather new to a topic (i.e. have not published posts before which are topically similar to the content produced by this community) are more likely to get replies. Further, we notice that the importance of whether a post fits the topical focus of a community or not is largely dependent on the subject specificity of the community. In other words, communities around very specific topics (such as the community around the sport Golf) require posts to match the topical focus of the community in order to attract attention, while communities around more general topics (such as the community around the topic Work and Jobs) do not have this requirement.

In our previous work [4] we learnt a general pattern for generating attention on Boards.ie by performing seed post identification using all data from 2006, not just a selection of forums. The best performing model contained all features (user, content and focus), and indicated that the inclusion of hyperlinks was correlated with non-seed posts, while seed posts were those that had a high forum likelihood - i.e. the user had posted in the forum before and was therefore familiar with the forum. The results from our current work have identified the key differences between this general attention pattern and the patterns that each community exhibits. For instance, for the 9 analysed forums, 7 perform best when using all features - similar to our previous work - while for the 2 remaining forums, one forum performs best when using content features and another when using title features. Additionally we find differences in the patterns: for forum 55 we find that the lower the forum likelihood the greater the likelihood that the user will generate attention, this being the converse of the general pattern learnt previously [4]. For forums 10 and 343 we find that an increased number of hyperlinks reduces the likelihood of the post generating attention, agreeing with the general attention pattern, while for forum 267 a greater number of hyperlinks increases the likelihood of generating attention.

Our results from the activity level prediction experiment show that the factors that impact whether a discussion starts around a post tend to differ from the factors that impact the length of this discussion. For example, in the community around forum 10 (Work & Jobs) a post which has question marks in the title is more likely to get a reply, but in order to stimulate lengthy discussions it is more important that the title of a post has a certain length rather than that it contains question marks.

It is also interesting to note that the title length is the only feature which has a significant positive impact across several communities on the number of replies a post gets. This suggests that in some communities posts with longer titles are more likely to stimulate lengthy discussions. We assume that this happens because long titles may on the one hand attract more users to read the posts and on the other hand long titles may be correlated with high quality or substantivity of a post’s content. It is also likely to be an effect caused by the platform’s interface, as users are presented with a list of threads in a given community, each of which is listed by its title. The first piece of information, along with the username of the author, that community members see is the title of the post.

We also found a shared attention pattern between the Golf and Real-World Tournaments and Events communities, since in these communities posts which are topically distant from what these communities usually talk about are less likely to stimulate lengthy discussions. Therefore we can conclude that although most attention patterns which we identified in our work are local and community-specific, cross-community patterns also exist and can be identified with our approach.

Comparing these findings to our previous work [4] once again reveals interesting differences between the general pattern learnt across the entirety of Boards.ie for activity level prediction and the per-forum patterns that we have found in this paper. For instance, in [4] the general pattern indicated that lower forum entropy and informativeness together with increased forum likelihood lead to lengthier discussions, while for forum 343 we found an increase in forum entropy to be associated with an increase in activity. None of the other features were found to be significant.

VII. CONCLUSIONS, LIMITATIONS AND FUTURE WORK

In this paper, we have presented work that identifies attention patterns in community forums and shows how such patterns differ between communities. Our exploration was facilitated through a two-stage approach that provided novel features able to capture the community and focus information pertaining to the creators of community content.

Page 29: Attention Economics in Social Web Systems

Content Attention Patterns: Summary


¨  Key differences in Content Attention Patterns:
    ¤  Between social web systems
        n  i.e. language complexity
    ¤  Within communities in the same social web system
        n  Purpose and specificity of the community impact attention

¨  Currently exploring:
    ¤  Content attention patterns across different systems
    ¤  The relation between content attention patterns and:
        n  Topical-specificity of community/network-cluster or group
        n  Community purpose

Ignorance isn't Bliss: An Empirical Analysis of Attention Patterns in Online Communities. C Wagner, M Rowe, M Strohmaier and H Alani. To appear in the proceedings of the Fourth IEEE International Conference on Social Computing. Amsterdam, The Netherlands. (2012)
What catches your attention? An empirical study of attention patterns in community forums. C Wagner, M Rowe, M Strohmaier and H Alani. In the proceedings of the International Conference on Weblogs and Social Media. Dublin, Ireland. (2012)
Anticipating Discussion Activity on Community Forums. M Rowe, S Angeletou and H Alani. The Third IEEE International Conference on Social Computing. Boston, USA. (2011)

Page 30: Attention Economics in Social Web Systems

Follower Prediction

The Nitty-Gritty: Research (II)

Page 31: Attention Economics in Social Web Systems

Follower Prediction


¨  Digital marketing firms want to:
    ¤  Maximise a client’s audience
    ¤  Draw attention to client’s product
    ¤  Maintain the reputation of their clients

¨  Content publishers want to:
    ¤  Ensure as many people view their content as possible

¨  How do social networks grow over time?
¨  Why do people subscribe to me?

¨  Need to:
    ¤  Profile users, how they behave and the content they share
    ¤  Predict who will follow whom in a social network
    ¤  Identify how people differ in their decision to follow others
    ¤  Understand how follow patterns differ between social web systems

Page 32: Attention Economics in Social Web Systems

Follower Prediction: Task Formulation


¨  Formulating the problem:
    ¤  User A is given a set of recommendations of who to follow: R(A)
    ¤  Given R(A), which users will A actually follow?
    ¤  Goal: learn a function f which, when given A and R(A), can accurately predict follower decisions
¨  Model this problem as a binary classification task:
    ¤  Predict whether A will follow B (positive), or not (negative)
    ¤  Constrains the task to modelling pairwise similarities between A and B across different follow-factors:
        n  Social = similarities in the social network of A and B
        n  Topical = topical similarity between the content of A and B
        n  Visibility = visibility of B’s presence to A
¨  Once pairwise similarities have been measured we can:
    ¤  1. Learn a general model to predict who will follow whom
    ¤  2. Learn behaviour-specific models to identify divergent follow-patterns
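The task formulation above can be sketched as a small binary classifier over pairwise feature vectors. The sketch below is an assumption-laden illustration, not the implementation used in the talk: it trains a plain logistic regression with batch gradient descent on made-up [social, topical, visibility] scores, and all names (`train_logistic`, `predict_follow`) and numbers are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=500, learning_rate=0.1):
    """Fit bias and weights by batch gradient descent on the mean log-loss."""
    n_features = len(rows[0])
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, y in zip(rows, labels):
            error = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_b += error
            for j, v in enumerate(x):
                grad_w[j] += error * v
        bias -= learning_rate * grad_b / len(rows)
        weights = [w - learning_rate * g / len(rows)
                   for w, g in zip(weights, grad_w)]
    return bias, weights


def predict_follow(bias, weights, x):
    """True if the model predicts that A will follow B for feature vector x."""
    return sigmoid(bias + sum(w * v for w, v in zip(weights, x))) >= 0.5


# Hypothetical pairwise features for (A, B) pairs drawn from R(A):
# [social similarity, topical similarity, visibility], each scaled to [0, 1].
rows = [[0.9, 0.8, 0.7], [0.8, 0.9, 0.4], [0.1, 0.2, 0.0], [0.2, 0.1, 0.3]]
labels = [1, 1, 0, 0]  # 1 = A followed B, 0 = A did not follow B
bias, weights = train_logistic(rows, labels)
```

Training one such model on all users corresponds to the general model; training separate models on subsets of users corresponds to the behaviour-specific models.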

Page 33: Attention Economics in Social Web Systems

Follower Prediction: Social Factors


¨  The decision of A to follow B might be based on common relationships between A and B
    ¤  Based on the principle of ‘homophily’

¨  Implement existing network-topology measures from the literature:
    ¤  Mutual Followers Count:
        n  Overlap of the sets of followers of A and B
    ¤  Mutual Followees Count:
        n  Overlap of the sets of followees of A and B
    ¤  Mutual Friends Count:
        n  Overlap of the sets of friends of A and B
        n  Friend of A is both a follower and a followee of A (directed)
    ¤  Mutual Neighbours Count:
        n  Overlap of sets of followees or followers
        n  Ignores direction
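The four network-topology measures listed above reduce to set intersections. The following is a minimal sketch under the assumption that the network is available as two dictionaries mapping each user to a set of followers and a set of followees; the function name and the toy data are hypothetical.

```python
def social_features(followers, followees, a, b):
    """Overlap counts between users a and b.

    followers[u] is the set of users who follow u;
    followees[u] is the set of users whom u follows.
    """
    followers_a, followers_b = followers.get(a, set()), followers.get(b, set())
    followees_a, followees_b = followees.get(a, set()), followees.get(b, set())
    # A friend is both a follower and a followee (a reciprocated link).
    friends_a = followers_a & followees_a
    friends_b = followers_b & followees_b
    # A neighbour is a follower or a followee, ignoring direction.
    neighbours_a = followers_a | followees_a
    neighbours_b = followers_b | followees_b
    return {
        "mutual_followers": len(followers_a & followers_b),
        "mutual_followees": len(followees_a & followees_b),
        "mutual_friends": len(friends_a & friends_b),
        "mutual_neighbours": len(neighbours_a & neighbours_b),
    }


# Toy network with made-up user ids.
followers = {"A": {"x", "y", "z"}, "B": {"y", "z", "w"}}
followees = {"A": {"y", "q"}, "B": {"y", "z", "q"}}
features = social_features(followers, followees, "A", "B")
```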

Page 34: Attention Economics in Social Web Systems

Follower Prediction: Topical Factors


¨  The decision of user A to follow user B might be based on the content that B has shared

¨  Implement topical affinity measures based on different models:
    ¤  Tag Vectors
        n  Cosine similarity: between the content tag vectors of A and B
    ¤  Concept Bags
        n  Generated using concept extraction over the content of A and B
        n  Disambiguated reference (e.g. “football” = ex:association_football)
        n  Cosine similarity: between the concept bags of A and B
        n  Jensen-Shannon divergence: between the probability distributions of the concept bags of A and B
    ¤  Concept Graphs
        n  Concepts are connected together in a semantic web (Google ‘DBpedia’)
        n  e.g. db:Lancaster_University dbprop:city db:Lancaster
        n  Measure average d(c1,c2) between the concepts of content from A and B
        n  Shortest Path: between c1 and c2 in the concept graph
        n  Hitting Time: steps taken by random walker from c1 to c2
        n  Commute Time: steps taken by random walker to go from c1 to c2 and back
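Three of the topical measures above can be sketched in a few lines: cosine similarity and Jensen-Shannon divergence over concept bags, and shortest-path length in a concept graph. This is a hedged illustration with hypothetical function names; bags are assumed to be non-empty count dictionaries, the divergence uses base-2 logarithms, and the graph is an unweighted adjacency dictionary rather than a real DBpedia extract (the `db:Lancashire` node is invented for the example).

```python
import math
from collections import Counter, deque


def cosine_similarity(bag_a, bag_b):
    """Cosine of the angle between two concept (or tag) count vectors."""
    dot = sum(bag_a[c] * bag_b[c] for c in bag_a.keys() & bag_b.keys())
    norm_a = math.sqrt(sum(v * v for v in bag_a.values()))
    norm_b = math.sqrt(sum(v * v for v in bag_b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def js_divergence(bag_a, bag_b):
    """Jensen-Shannon divergence (base 2): 0 for identical, 1 for disjoint bags."""
    total_a, total_b = sum(bag_a.values()), sum(bag_b.values())
    divergence = 0.0
    for concept in bag_a.keys() | bag_b.keys():
        p = bag_a.get(concept, 0) / total_a
        q = bag_b.get(concept, 0) / total_b
        m = (p + q) / 2
        if p:
            divergence += 0.5 * p * math.log2(p / m)
        if q:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence


def shortest_path_length(graph, source, target):
    """Breadth-first search; returns the hop count, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


# Hypothetical concept bags and a tiny undirected concept graph.
bag_a = Counter({"ex:association_football": 2, "ex:tennis": 1})
bag_b = Counter({"ex:semantic_web": 3})
graph = {
    "db:Lancaster_University": {"db:Lancaster"},
    "db:Lancaster": {"db:Lancaster_University", "db:Lancashire"},
    "db:Lancashire": {"db:Lancaster"},
}
```

Hitting time and commute time need a random-walk model over the same graph and are left out of this sketch.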

Page 35: Attention Economics in Social Web Systems

Follower Prediction: Visibility Factors


¨  The decision of user A to follow user B might be based on user A noticing user B’s presence

¨  Implement visibility measures that capture presence potential:
    ¤  Retweet Count:
        n  Number of times a followee of A has retweeted content from B
    ¤  Mention Count:
        n  Number of times a followee of A has mentioned B (e.g. @B)
    ¤  Comment Count:
        n  Number of times a followee of A has commented on content from B
    ¤  Influence-weighted Counts:
        n  Weight each of the above by the influence of followee of A on A
        n  Measured by the number of times the followee has been replied to by A
        n  Related to our earlier theory of ‘Social Contagion’
        n  Derive weighted versions of the three measures
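A rough sketch of the visibility measures follows. It counts retweets, mentions and comments about B made by A's followees, and derives weighted versions in which each event is scaled by the followee's share of A's replies. Normalising by A's total replies is an assumption of this sketch, as are the function name and the event tuple layout; the slides only say that influence is measured by how often A has replied to the followee.

```python
def visibility_counts(followees_of_a, interactions, replies_by_a, b):
    """Plain and influence-weighted visibility of user b to user A.

    followees_of_a: set of users that A follows
    interactions:   iterable of (actor, kind, target) events, where kind is
                    "retweet", "mention" or "comment"
    replies_by_a:   dict mapping a followee to the number of times A replied to them
    """
    plain = {"retweet": 0, "mention": 0, "comment": 0}
    weighted = {"retweet": 0.0, "mention": 0.0, "comment": 0.0}
    total_replies = sum(replies_by_a.get(f, 0) for f in followees_of_a)
    for actor, kind, target in interactions:
        if actor in followees_of_a and target == b and kind in plain:
            plain[kind] += 1
            if total_replies:
                # Influence of this followee on A, as a share of A's replies.
                weighted[kind] += replies_by_a.get(actor, 0) / total_replies
    return plain, weighted


# Made-up events: f1 and f2 are followees of A, x is not.
events = [
    ("f1", "retweet", "B"),
    ("f2", "retweet", "B"),
    ("f1", "mention", "B"),
    ("x", "retweet", "B"),
    ("f1", "comment", "C"),
]
plain, weighted = visibility_counts({"f1", "f2"}, events, {"f1": 3, "f2": 1}, "B")
```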

Page 36: Attention Economics in Social Web Systems

Follower Prediction: Dataset + Experimental Setup


¨  Knowledge Discovery and Data Mining (KDD) Cup 2012 Dataset
    ¤  Task: follower prediction! Ideal ;)

¨  1. General follower prediction
    ¤  Learn a general followee-decision model (10% of users)
    ¤  For 10%: built features based on recommendations
    ¤  Divided dataset up into: training (80%) and testing (20%)

¨  2. Binned follower prediction: Topical-focus
    ¤  Learn models of users who differ in their topical focus
    ¤  For each user: measured concept-entropy, derived equal-frequency bins, selected users in the lowest and highest bins
    ¤  For selected users: built features based on recommendations
    ¤  Divided 2 datasets (low & high) into: training (80%) and testing (20%)

¨  3. Binned follower prediction: Degree
    ¤  Learn models of users who differ in their popularity (i.e. follower count)
    ¤  For each user: measured the degree, derived equal-frequency bins, selected users in the lowest and highest bins
    ¤  For selected users: built features based on recommendations
    ¤  Divided 2 datasets (low & high) into: training (80%) and testing (20%)
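The binning procedure in the setup above can be sketched as follows: compute a per-user concept entropy, sort users into equal-frequency bins, keep the lowest and highest bins, and split the data 80/20. The number of bins, the random seed and all function names are assumptions for illustration; the slides do not state them.

```python
import math
import random


def concept_entropy(concept_counts):
    """Shannon entropy (bits) of a user's concept distribution; low = topically focussed."""
    total = sum(concept_counts.values())
    return -sum((c / total) * math.log2(c / total)
                for c in concept_counts.values() if c)


def equal_frequency_bins(value_by_user, n_bins):
    """Sort users by value and cut them into n_bins groups of (near) equal size."""
    ranked = sorted(value_by_user, key=value_by_user.get)
    size = len(ranked) / n_bins
    return [ranked[round(i * size):round((i + 1) * size)] for i in range(n_bins)]


def train_test_split(items, train_fraction=0.8, seed=0):
    """Shuffle deterministically, then split into training and testing portions."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical entropy values for four users, cut into two bins.
entropy_by_user = {"u1": 0.1, "u2": 0.5, "u3": 0.9, "u4": 1.3}
bins = equal_frequency_bins(entropy_by_user, n_bins=2)
low_entropy_users, high_entropy_users = bins[0], bins[-1]
train, test = train_test_split(range(10))
```

The same binning function applies to the degree experiment by passing follower counts instead of entropy values.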

[Figure: distribution of the number of recommendations per user. x-axis: recommendations (n), log scale; y-axis: Frequency (c(n)), log scale.]

Page 37: Attention Economics in Social Web Systems

Follower Prediction: Accuracy


¨  General Model:
    ¤  Topical information provides the best solitary-factor set performance
        n  Outperforms existing topological approaches from the state of the art!
    ¤  All features perform best

¨  Binned Models:
    ¤  Topical-focus: Low entropy users = topical features, High entropy users = social features
    ¤  Degree: Low degree and high degree users = topical features
        n  Expected high-degree users to be driven by social factors

[Figure: bar chart (y-axis 0.0 to 1.0) comparing the Social, Topical, Visibility and All feature sets for the Full, Entropy-Low, Entropy-High, Degree-Low and Degree-High models.]

Page 38: Attention Economics in Social Web Systems

Follower Prediction: Follower-Decision Patterns


¨  Used a logistic regression classifier for experiments:
    ¤  Provides log-odds ratio diagnostics: explaining how a change in feature value affects follow-likelihood

¨  Connections are formed when…
    ¤  In the general model:
        n  Users are closer topically (greater tag vector cosine, lower shortest path, hitting time and commute time)
    ¤  In the topical-focus model:
        n  For low entropy users: same as the general model (greater topical affinity)
        n  For high entropy users: users share more mutual followers, reduced tag vector similarity but reduced hitting time
    ¤  In the degree model:
        n  For low degree users: topical affinity is greater (same as general model)
        n  For high degree users: more mutual followees present (i.e. they follow more of the same people), similar topical effects as the general model
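To show how log-odds diagnostics are read, the sketch below converts logistic regression coefficients into odds ratios: a coefficient β means that a one-unit increase in the feature multiplies the odds of a follow by exp(β). The coefficient values and feature names are invented purely to illustrate the direction of the effects described above; they are not results from the experiments.

```python
import math


def odds_ratios(coefficients, delta=1.0):
    """Odds multiplier for a +delta change in each feature, given log-odds coefficients."""
    return {name: math.exp(beta * delta) for name, beta in coefficients.items()}


# Hypothetical coefficients: positive for similarity, negative for graph distance.
coefficients = {
    "tag_vector_cosine": 0.8,
    "shortest_path": -0.5,
    "hitting_time": -0.2,
}
ratios = odds_ratios(coefficients)
```

A ratio above 1 means the feature raises follow-likelihood as it grows, and a ratio below 1 means it lowers it, which matches the reading "greater cosine, lower shortest path" in the general model.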

Page 39: Attention Economics in Social Web Systems

Follower Prediction: Summary


¨  Homophily shown to play a crucial role in users following one another
    ¤  Existing work used network-topology methods
    ¤  Presented work utilises the semantic web to gauge topical affinity

¨  Staying on topic will gain you followers within those topics
    ¤  Highlighted by the low-entropy users

¨  Follow-decisions are based upon user behaviour:
    ¤  Differences in the follow-decisions based on the focus and popularity of users

¨  Current work:
    ¤  Examining followee-decision patterns on Twitter
        n  Overhead of data gathering (as I will explain next)
    ¤  Can we use the same approach to predict churners?...

Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks. M Rowe, M Stankovic and H Alani. To appear in the proceedings of the International Semantic Web Conference 2012. Boston, US. (2012)

Page 40: Attention Economics in Social Web Systems

Churn Prediction

The Nitty-Gritty: Research (III)

Page 41: Attention Economics in Social Web Systems

Churn Prediction


The complement of Follower Prediction…
¨  Same motivation as link prediction, but with an emphasis on maintenance of subscribers
¨  Digital marketing firms want to:
    ¤  Draw attention to client’s product
    ¤  Maintain the audience of their clients

¨  How do social networks grow over time?
¨  Why do people subscribe to me? And then unsubscribe?!

¨  Need to:
    ¤  Profile users, how they behave and the content they share
    ¤  Predict who is going to ‘unfollow’ whom
        n  i.e. churn from their social network
    ¤  Identify how people differ in their behaviour and decisions
    ¤  Understand how churn patterns differ between social web systems

Page 42: Attention Economics in Social Web Systems

Churn Prediction: Hypotheses


¨  Churn on Twitter:
    ¤  (Kwak et al., 2012) – More common and followers tags = less likely to churn
    ¤  (Kwak et al., 2011) – Uninteresting topics, mundane details = more likely to churn
    ¤  (Kivran-Swaine et al., 2011) – If followee is more important/powerful than follower = churn

¨  Churn on Facebook:
    ¤  (Sibona and Walczak, 2011) – Unimportant, inappropriate and polarising posts = churn
    ¤  (Quercia et al., 2012) – Follower is neurotic and introverted = churn

¨  H1: Churn is topically-driven
    ¤  Intuition: people follow me for work topics (#semanticweb, #socialnetworks), if I talk about football then I experience churn!

¨  H2: Topically-focussed users experience churn when they diverge

¨  H3: General-discussion users experience less churn than topically-focussed users

Page 43: Attention Economics in Social Web Systems

Churn Prediction: Data Acquisition Problem


¨  Predicting churners and followers on Twitter requires comparing social networks at consecutive time steps
¨  Topical-Homophily is important for: a) link prediction, b) hypotheses
¨  Therefore we need to capture, at regular time steps for a given collection of seed users S:
    ¤  A) Follower network of each user (s) and each follower in the follower network of s
    ¤  B) Content published by each user (s) and each follower in the follower network of s!

¨  We are also restricted by API limits. L = max number of requests per day:
    ¤  Twitter: w/o whitelisting; L=1,440. W/ whitelisting; 480k!

¨  Goal: derive S such that we:
    ¤  Maximise the size of S
    ¤  Account for growth in the follower network of each member of S
    ¤  Account for the growth of the follower network of each follower of each member of S
    ¤  A member of S has no more than 5k followers (upper limit of the API response)
    ¤  Remain within the API limits! (i.e. requests < L/2 per day)
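The constraints above amount to a budgeting problem, which the following sketch illustrates. It is not the derivation used in the talk. It assumes, purely for illustration, that each user costs two requests per day (one for the follower list, one for published content), that follower networks grow at a fixed daily rate, and that seeds are added greedily from the smallest follower count upwards until half of the daily limit L is used. All names and numbers other than L = 1,440 and the 5k follower cap are hypothetical.

```python
import math


def daily_requests(follower_count, growth_rate=0.0, days=0):
    """Requests per day to track one seed and each of its (projected) followers."""
    projected = follower_count * (1 + growth_rate) ** days
    # Two requests (follower list + content) for the seed and for every follower.
    return 2 * (1 + math.ceil(projected))


def derive_seed_set(candidates, daily_limit, max_followers=5000,
                    growth_rate=0.0, days=0):
    """Greedy smallest-first selection that keeps total requests within L/2."""
    budget = daily_limit / 2
    seeds, used = [], 0
    for user, count in sorted(candidates.items(), key=lambda item: item[1]):
        if count > max_followers:
            continue  # beyond the upper limit of a single API response
        cost = daily_requests(count, growth_rate, days)
        if used + cost > budget:
            break
        seeds.append(user)
        used += cost
    return seeds, used


# Hypothetical candidates (user -> follower count) under the non-whitelisted limit.
candidates = {"a": 10, "b": 100, "c": 6000, "d": 300}
seeds, used = derive_seed_set(candidates, daily_limit=1440)
```

Taking the cheapest candidates first maximises the number of seeds that fit a fixed budget, at the cost of biasing S towards users with small follower networks.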

Page 44: Attention Economics in Social Web Systems

Churn Prediction: Seed Set Derivation


¨  Performed an initial exploration period for seed sampling:
    ¤  Logged all geotagged tweets in the North of England for 25 days
    ¤  Recorded the user statistics of the authors (in-degree, out-degree, etc)
    ¤  Need to understand: follower-network growth potential, users to choose (remove outliers)

¨  Analysed changes in the follower-networks:
    ¤  Between t (start) and t’ (end) follower distributions are significantly different (t-test: p-value < 0.001)
    ¤  Follower-networks grow (blue) but some churn (red)
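The comparison of follower counts at t and t’ can be sketched as below: compute the absolute change per user and separate the users whose follower network shrank (churn) from those whose network grew. The function name and the counts are hypothetical, and the significance test mentioned on the slide is not reproduced here.

```python
def follower_changes(counts_start, counts_end):
    """Absolute change in follower count per user between two snapshots.

    Returns (changes, churned, grew), where churned lists users with a net
    loss of followers and grew lists users with a net gain.
    """
    changes = {user: counts_end[user] - counts_start[user]
               for user in counts_start if user in counts_end}
    churned = [user for user, delta in changes.items() if delta < 0]
    grew = [user for user, delta in changes.items() if delta > 0]
    return changes, churned, grew


# Made-up in-degree snapshots at t and t'.
at_start = {"u1": 120, "u2": 45, "u3": 300, "u4": 10}
at_end = {"u1": 131, "u2": 40, "u3": 300, "u4": 12}
changes, churned, grew = follower_changes(at_start, at_end)
```

A net loss only shows that unfollows outnumbered new follows; identifying individual churners would require comparing the follower sets themselves at each time step.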

[Figure 2: (a) Followers Count Distribution at t; (b) Followers Count Distribution at t′. x-axis: Followers @ t (resp. t′), 1 to 10000; y-axis: c(Followers @ t) (resp. t′), 0 to 200.]

Fig. 2. Distribution of followers-per-user at t and at t′. The distributions look similar; however, they are actually significantly divergent according to the Kolmogorov-Smirnov test.

[Figure 3: (a) Absolute change in the followers counts between t → t′ (x-axis: Δ, −40 to 40; y-axis: c(Δ), 0 to 6000); (b) Normalised change in the followers counts between t → t′ (x-axis: ΔN, −10 to 10; y-axis: c(ΔN), 0 to 2000).]

Fig. 3. Changes in the follower’s count between the first and last recorded in-degree for the users in the Exploration Period. The left portion shaded red represents users who experience churn, while the right experienced overall increase.

Given our definition of the normalised change in the sizes of the follower networks we can derive η as the growth rate of the subscriber network every day:

[Figure: two panels plotting the count c(Followers) against followers-per-user on a logarithmic axis from 1 to 10000. (a) Followers Count Distribution at t. (b) Followers Count Distribution at t'.]

Fig. 2. Distribution of followers-per-user at t and at t'. The distributions look similar; however, they are significantly divergent according to the Kolmogorov-Smirnov test.
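The Kolmogorov-Smirnov comparison mentioned in the caption can be illustrated with a minimal sketch. It assumes a plain two-sample test using the standard large-sample critical value; the function name `ks_two_sample` and the follower counts are hypothetical and are not taken from the study.

```python
import math
from bisect import bisect_right


def ks_two_sample(sample_a, sample_b, alpha=0.05):
    """Return (D, critical_value, reject) for a two-sample KS test.

    D is the largest gap between the two empirical CDFs. The critical value
    uses the large-sample approximation c(alpha) * sqrt((n + m) / (n * m)),
    with c(alpha) = sqrt(-ln(alpha / 2) / 2).
    """
    a, b = sorted(sample_a), sorted(sample_b)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so check each one.
    for x in set(a) | set(b):
        cdf_a = bisect_right(a, x) / n
        cdf_b = bisect_right(b, x) / m
        d = max(d, abs(cdf_a - cdf_b))
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    return d, critical, d > critical


# Hypothetical follower counts at t and t' (illustrative only).
followers_t = [3, 8, 15, 40, 120, 560, 2300]
followers_t_prime = [5, 9, 18, 47, 150, 610, 2900]
d, critical, reject = ks_two_sample(followers_t, followers_t_prime)
print(f"D={d:.3f}, critical={critical:.3f}, reject H0: {reject}")
```

With samples as large as those plotted, even a small gap between the two curves can exceed the critical value, which is one way visually similar distributions can still test as significantly different.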

[Figure: two panels. (a) Absolute change in the followers counts between t → t', plotting c(Δ) against Δ from −40 to 40. (b) Normalised change in the followers counts between t → t', plotting c(ΔN) against ΔN from −10 to 10.]

Fig. 3. Changes in the followers count between the first and last recorded in-degree for the users in the Exploration Period. The left portion, shaded red, represents users who experienced churn, while the right portion represents users who experienced an overall increase.

Given our definition of the normalised change in the sizes of the follower networks, we can derive η as the growth rate of the subscriber network every day:

Page 45: Attention Economics in Social Web Systems

Churn Prediction: Summary

¨  Only preliminary work has been shown

¨  Sampling of users can be performed as a constraint-satisfaction problem:

¤  Choose the maximum number of users to log data for while accounting for growth potential in follower networks

¨  Exploration period showed:

¤  Link creation is more prevalent than churn

¤  Churn will be harder to spot, so predicting it is more challenging

¨  Current/future work:

¤  Begin logging sampled users

¤  Proposed approach: topic-deviation models

n  Users who stray from their regular topics of discussion are expected to experience churn
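The sampling constraint listed above can be sketched in a few lines. This is only one simple reading of it, under loud assumptions: a fixed daily request budget, a fixed number of follower ids returned per request, and a single compounding daily growth rate. The names `projected_cost` and `select_users` and all parameter values are hypothetical and do not come from the talk.

```python
import math


def projected_cost(followers, daily_growth, horizon_days, ids_per_request):
    """Requests needed to fetch one follower list at the end of the horizon."""
    # Assumption: follower counts compound at the same daily rate for everyone.
    projected = followers * (1.0 + daily_growth) ** horizon_days
    return max(1, math.ceil(projected / ids_per_request))


def select_users(candidates, daily_budget, daily_growth=0.01,
                 horizon_days=30, ids_per_request=5000):
    """Pick as many users as fit within the daily request budget.

    candidates maps a user id to its current follower count. Taking the
    cheapest users first maximises how many fit under a single capacity
    constraint.
    """
    costs = sorted(
        (projected_cost(count, daily_growth, horizon_days, ids_per_request), user)
        for user, count in candidates.items()
    )
    chosen, used = [], 0
    for cost, user in costs:
        if used + cost > daily_budget:
            break  # every remaining user costs at least as much
        chosen.append(user)
        used += cost
    return chosen, used


# Hypothetical candidates: user id -> current follower count.
candidates = {"a": 100, "b": 6000, "c": 20000, "d": 50}
chosen, used = select_users(candidates, daily_budget=4)
print(chosen, used)
```

Budgeting against projected rather than current follower counts is what leaves headroom for growth, so the logging would not exceed the budget part-way through the collection period.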

Page 46: Attention Economics in Social Web Systems

Summary

Page 47: Attention Economics in Social Web Systems

In Summary

¨  Social web systems are the setting for the battle for user attention

¤  Content publishers: want users to read their content

¤  Government policy makers: want citizen feedback on policies and ideas

¤  Digital marketers: want to maximise the audience of their clients and alert them to products/services

¨  We are drowning in information and need to decide what to pay attention to

¨  Data mining, social network analysis and semantic web techniques can be used to uncover patterns of attention economics:

¤  Content Attention Patterns

n  Content publishers can adapt their content depending on the target

¤  Link Prediction

n  Marketers can understand how to build up an audience for a client

¤  Churn Prediction

n  Can understand how to maintain audience levels
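As a hedged illustration of the churn-prediction strand, the topic-deviation idea named in the talk could be prototyped by comparing a user's recent topic mix with their historical one. The talk does not specify a measure, so the use of Jensen-Shannon divergence here is an assumption, as are the function names and the topic labels.

```python
import math
from collections import Counter


def topic_distribution(topic_labels, vocabulary):
    """Relative frequency of each vocabulary topic among the given labels."""
    counts = Counter(topic_labels)
    total = sum(counts[t] for t in vocabulary)
    if total == 0:
        raise ValueError("no posts labelled with a known topic")
    return [counts[t] / total for t in vocabulary]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint."""
    def kl(x, y):
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    mixture = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


# Hypothetical topic labels for one user's past and recent posts.
vocabulary = ["music", "tech", "politics"]
history = ["music"] * 8 + ["tech"] * 2
recent = ["politics"] * 5 + ["music"]
deviation = js_divergence(
    topic_distribution(history, vocabulary),
    topic_distribution(recent, vocabulary),
)
print(f"topic deviation: {deviation:.3f}")
```

A score like this would only be a candidate feature: whether high deviation actually precedes follower loss is the open question the proposed work sets out to test.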

Page 48: Attention Economics in Social Web Systems

Questions to be explored/pondered…

¨  How do social network theories relate to attention economics?

¨  What causes users’ behaviour to change?

¨  Who influences whom?

¤  Can we effectively model social contagion?

¨  Can we devise a ‘standard model’ for attention economics in social web systems?

Page 49: Attention Economics in Social Web Systems

QUESTIONS?

DR MATTHEW ROWE @MROWEBOT [email protected] WWW.MATTHEW-ROWE.COM WWW.LANCS.AC.UK/STAFF/ROWEM/