2. A first glance at different
kinds of social media data
2006
2011
Social Media Data
Texts
Images
Videos
Mixed formats
Connections I (friends, followers)
Connections II (links/URLs)
Connections/Actions (likes, favs, comments, downloads)
Images
http://www.guardian.co.uk/uk/2011/dec/07/twitter-riots-how-news-spread
Vis, F., Faulkner, S., Parry, K., Manyukhina, Y., & Evans, L. (2013). Twitpic-ing the riots: analysing images shared on Twitter during the 2011 UK riots. In K. Weller, A. Bruns, J. Burgess, M. Mahrt, & C. Puschmann (Eds.), Twitter and Society (pp. 385-398). Peter Lang.
Bruns, A., & Burgess, J. (2012). Notes towards the scientific study of Twitter. In Tokar, A., Beurskens, M., Keuneke, S., Mahrt, M., Peters, I., Puschmann, C., van Treeck, T., & Weller, K. (Eds.). (2012). Science and the Internet (pp. 159-169). Düsseldorf: Düsseldorf University Press http://nfgwin.uni-duesseldorf.de/sites/default/files/Bruns.pdf
Hashtags
Mentions
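Hashtags and mentions can be pulled out of tweet texts with simple pattern matching. The following is a minimal sketch, not the procedure used in the cited study: the two regular expressions and the sample tweet are assumptions made for illustration, and real tweets contain edge cases these patterns do not cover.

```python
import re

# Simplified patterns (assumptions): '#' or '@' followed by word characters.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")

def extract_entities(text):
    """Return the hashtags and mentioned user names found in one tweet text."""
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(text)],
        "mentions": [m.lower() for m in MENTION_RE.findall(text)],
    }

# Hypothetical example tweet, invented for illustration.
sample = "RT @alice: Great keynote at #WWW2010 with @bob #twitter"
entities = extract_entities(sample)
print(entities)
```

Lower-casing the matches means that #WWW2010 and #www2010 are counted as the same hashtag.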
Timeline
Gummer, T., Roßmann, J., & Wolf, C. (2014). Candidates’ Twitter Use in the German Election 2013. Presentation at the General Online Research 2014, Cologne, Germany.
Rhythm of a City
http://engineering.twitter.com/2012/06/studying-rapidly-evolving-user.html
[Figure: Number of followers per month, June 2011 to June 2012 (y-axis: 0 to 80,000), for the Twitter accounts of German football clubs: 1. FC Augsburg (@FCAugsburg), 1. FC Kaiserslautern (@Rote_Teufel)*, 1. FC Köln (@fckoeln), 1. FC Nürnberg (@1_fc_nuernberg), 1. FSV Mainz 05 (1FSVMainz05), 1899 Hoffenheim (achtzehn99), Bayer 04 Leverkusen (@bayer04fussball), Borussia Mönchengladbach (@VfLBorussia), BVB Dortmund 09 I (@BVBDortmund09), BVB Dortmund 09 II (@BVB), FC Bayern München (@BayMuenchen), FC Schalke 04 I (@FCSchalke04, unofficial), FC Schalke 04 II (@s04, official), Hamburger SV (@HSV), Hannover 96 I (@ichbin96), Hannover 96 II (@hannover96), Hertha BSC Berlin (@HerthaBSC)*, SC Freiburg (@sc_freiburg), SV Werder Bremen I (@Werder_Bremen), SV Werder Bremen II (@werderbremen), VfB Stuttgart (@VfB)]
Bruns, A., Weller, K., & Harrington, S. (2014). Twitter and Sports: Football Fandom in Emerging and Established Markets. In: K.Weller, A. Bruns, J. Burgess, M. Mahrt and C. Puschmann (Eds.), Twitter and Society (pp. 263-280). New York et al.: Peter Lang.
Followers
Interactions
Paßmann, J., Boeschoten, T., & Schäfer, M.T. (2014). The Gift of the Gab: Retweet Cartels and Gift Economies on Twitter. In K. Weller, A. Bruns, J. Burgess, M. Mahrt & C. Puschmann (Eds.), Twitter and Society. New York et al.: Peter Lang.
Networks
following
Lietz, H., Wagner, C., Bleier, A., & Strohmaier, M. (2014). When politicians talk: Assessing online conversational practices of political parties on twitter. In International AAAI Conference on Weblogs and Social Media (ICWSM2014), Ann Arbor, MI, USA, June 2-4, 2014.
mentioning retweeting
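Mention and retweet networks of this kind can be derived from tweet texts as directed edge lists, whereas a following network needs separate follower data. The sketch below is one possible way to do this under simple assumptions (a tweet starting with "RT @user" is a retweet, every other "@user" is a mention); it is not the method of the cited paper, and the sample tweets are invented.

```python
import re
from collections import Counter

RETWEET_RE = re.compile(r"^RT @(\w+)", re.IGNORECASE)
MENTION_RE = re.compile(r"@(\w+)")

def build_edges(tweets):
    """Build weighted directed edges (source, target) for retweets and mentions.

    `tweets` is an iterable of (author, text) pairs. The user named in a
    leading 'RT @user' is counted as retweeted, not as mentioned.
    """
    retweet_edges = Counter()
    mention_edges = Counter()
    for author, text in tweets:
        author = author.lower()
        match = RETWEET_RE.match(text)
        retweeted = match.group(1).lower() if match else None
        if retweeted:
            retweet_edges[(author, retweeted)] += 1
        for name in MENTION_RE.findall(text):
            name = name.lower()
            if name != retweeted:
                mention_edges[(author, name)] += 1
    return retweet_edges, mention_edges

# Hypothetical tweets, invented for illustration.
tweets = [
    ("carol", "RT @alice: slides are online"),
    ("carol", "@bob did you see the talk by @alice?"),
    ("bob", "@carol yes, very good"),
]
retweet_edges, mention_edges = build_edges(tweets)
print(dict(retweet_edges))
print(dict(mention_edges))
```

The edge weights (how often one user retweeted or mentioned another) can then be passed on to a network analysis tool.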
Networks
Facebook (Paul Butler)
Data from: Facebook
https://www.facebook.com/note.php?note_id=469716398919
Data from Twitter
https://blog.twitter.com/2013/geography-tweets-3
Geo data
Livehoods Project
Data from: Foursquare (via Twitter)
http://livehoods.org/maps/montreal
http://www.nytimes.com/interactive/2009/11/26/us/20091126-search-graphic.html?_r=0 Data from: Allrecipes.com
Geo data
The Guardian
Data from: Twitter
http://www.guardian.co.uk/news/datablog/2012/nov/28/data-shadows-twitter-uk-floods-mapped#zoomed-picture
http://www.jeuneafrique.com/Article/ARTJAWEB20130215165826/internet-libreville-accra-addis-abebareseaux-sociaux-les-capitales-africaines-de-twitter-quartier-par-quartier.html#Tunis
Northeastern University and Harvard University
Data from: Twitter. http://www.ccs.neu.edu/home/amislove/twittermood/
http://www.cci.edu.au/node/1362
The Australian Twitter-Sphere (by A. Bruns)
Some more about geo information
Overview on new geo-data:
Elwood S., Goodchild M.F., Sui D.Z. (2012). Researching Volunteered Geographic Information: Spatial Data, Geographic Research, and New Social Practice. Annals of the Association of American Geographers, 102(3), 571-590.
Leetaru K., Wang S., Cao G., Padmanabhan A., Shook E. (2013). Mapping the global Twitter heartbeat: The geography of Twitter. First Monday, 18(5).
Research Methods
"Seriously? Do they not realize that 99% of tweets are worthless babble that read something like 'Just woke up. Going to Starbucks now. Getting latte.'"
Reader's comment found in the comment section for Gross, D. (2010, April 14). Library of Congress to archive your tweets. CNN. Retrieved from http://edition.cnn.com/2010/TECH/04/14/library.congress.twitter/, retrieved November 19.
Photos: https://www.flickr.com/search/?text=coffee&license=4%2C5%2C6%2C9%2C10
New type of data
Researchers value social media as a new type of data
Previously "ephemeral data" become visible
Immediate – quick reaction to events
Structured
"Natural" data
“What I find really interesting is that structure becomes manifest in internet communication. So it’s the first time in history actually that we can, that social structures between people become manifest within a technology. (...) They become visible, they become crawlable, they become analyzable.”
Kinder-Kurlanda, Katharina E., and Katrin Weller. 2014. "'I always feel it must be great to be a hacker!': The role of interdisciplinary work in social media research." In Proceedings of the 2014 ACM conference on Web Science, 91-98. New York: ACM.
Approaches
Surveys
Experiments
Interviews
Web ethnography
Content analysis
Network analysis
Linguistic analyses (e.g. sentiment analysis)
Rather rarely used in combination
Many case studies, few methodological standards
Multi-disciplinary environment
Freedom to explore new approaches
Multi-method
Exchange with other disciplines
How to study social media?
"information disclosure and privacy on Facebook"
"Election prediction with Twitter data"
Challenge vs. Chance
lots of room for exploration and innovation
but
few or no standards
Outlook: Data collection options
APIs
Official resellers
"Manual" forms of collection
Re-using published datasets
Third party tools (Crowdsourcing)
Big Data?
Examples from Twitter research
309,740 Twitter users (with followers and tweets)
17,803 tweets from 8,616 users + 1st degree network (3,048,360 directed edges, 631,416 unique followers, and 715,198 unique friends)
1.3 million Twitter conversations, with each conversation containing between 2 and 243 posts
20,000 tweets
21,623,947 geo-tagged tweets
99,832 tweets
But also:
One person’s Twitter network (652 followers, 114 followings)
Experiment with 125 students
1,827 annotated tweets
Experiment with 1677 participants
Survey with 505 young American adults
None
Different methods – in social science based Twitter research
Weller, K. (2014). What do we get from Twitter – and what not? A close look at Twitter research in the social sciences. Knowledge Organization. 41(3), 238-248
Big data? Twitter and elections
No. of tweets              No. of publications (2013)
0-500                      3
501-1,000                  4
1,001-5,000                1
5,001-10,000               1
10,001-50,000              7
50,001-100,000             4
100,001-500,000            5
500,001-1,000,000          3
1,000,001-5,000,000        3
More than 5,000,000        3
More than 100,000,000      1
More than 1,000,000,000    1
No/insufficient details    13
Weller, K. (2014). Twitter und Wahlen: Zwischen 140 Zeichen und Milliarden von Tweets. In: R. Reichert (Ed.), Big Data: Analysen zum digitalen Wandel von Wissen, Macht und Ökonomie (pp. 239-257). Bielefeld: transcript.
Example: Twitter
Example: Twitter Data
A small example with Twitter data
Testdata
Go to http://tiny.cc/testdata (link will be deactivated after the course)
Save the file to the desktop.
Open the file.
Who is discussing?
Identify all users who have written at least one tweet.
What is the distribution of tweets per user?
How many users have written exactly one tweet?
Who are the five most active users? – What can you find out about them?
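The questions above can also be answered with a few lines of code instead of a spreadsheet. The following is a minimal sketch under stated assumptions: the test file is taken to be a CSV export with one row per tweet and a `from_user` column (a hypothetical column name), and a tiny made-up sample stands in for the downloaded file.

```python
import csv
import io
from collections import Counter

# Made-up sample standing in for the downloaded file (assumption:
# one row per tweet, author name in a 'from_user' column).
SAMPLE_CSV = """from_user,text
alice,First tweet
bob,Hello #www2010
alice,Second tweet
carol,RT @alice: First tweet
alice,Third tweet
"""

def tweets_per_user(csv_file, user_column="from_user"):
    """Count how many tweets each user has written."""
    reader = csv.DictReader(csv_file)
    return Counter(row[user_column].strip().lower() for row in reader)

counts = tweets_per_user(io.StringIO(SAMPLE_CSV))

users = sorted(counts)                    # all users with at least one tweet
distribution = Counter(counts.values())   # tweets per user -> number of users
single_tweet_users = [u for u, n in counts.items() if n == 1]
top_five = counts.most_common(5)          # (user, number of tweets) pairs

print(len(users), "users")
print("distribution:", dict(distribution))
print("users with exactly one tweet:", len(single_tweet_users))
print("most active:", top_five)
```

To run this on the real file, replace `io.StringIO(SAMPLE_CSV)` with `open(path, newline="", encoding="utf-8")` and adjust `user_column` to the actual header.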
Other information
How many tweets are geocoded?
How many tweets are retweets (RTs)?
YourTwapperkeeper
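Counting geocoded tweets and retweets can be sketched in the same way. The column names `geo_coordinates_0` and `geo_coordinates_1` are assumptions modelled on a YourTwapperkeeper-style export and may differ in the actual file; the sample rows are invented, and the retweet check only recognizes texts that start with "RT @".

```python
import csv
import io

# Made-up sample rows; column names are assumptions about the export format.
SAMPLE_CSV = """from_user,text,geo_coordinates_0,geo_coordinates_1
alice,Good morning from the venue,50.94,6.96
bob,RT @alice: Good morning from the venue,0,0
carol,Looking forward to the keynote,,
dave,rt @bob nice slides,48.14,11.58
"""

def is_geocoded(row):
    """Treat a tweet as geocoded unless its coordinates are missing or both zero."""
    try:
        lat = float(row.get("geo_coordinates_0") or 0)
        lon = float(row.get("geo_coordinates_1") or 0)
    except ValueError:
        return False
    return (lat, lon) != (0.0, 0.0)

def is_retweet(row):
    """Simple automatic check: the text starts with 'RT @' (case-insensitive)."""
    return row["text"].strip().lower().startswith("rt @")

rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
geocoded = sum(is_geocoded(r) for r in rows)
retweets = sum(is_retweet(r) for r in rows)
print(f"{geocoded} of {len(rows)} tweets are geocoded, {retweets} are retweets")
```

Treating "0, 0" as missing is a design choice for exports that fill empty coordinates with zeros; drop that rule if the file leaves them blank.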
What is going on?
Read approx. 30 tweets.
How would you approach studying what the tweets are about?
Look up approx. 10 links from tweets.
How would you approach studying what the tweets are about?
Frequency of URLs: #www2010
[Figure: Distribution of URLs from #www2010. x-axis: URL on rank n (ranked by frequency); y-axis: frequency of URL on rank n (0 to 45).]
Weller, K., Dröge, E., & Puschmann, C. (2011). Citation Analysis in Twitter: Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences. In M. Rowe, M. Stankovic, A.-S. Dadzie, & M. Hardey (Eds.), Making Sense of Microposts (#MSM2011), Workshop at Extended Semantic Web Conference (ESWC 2011), Crete, Greece (pp. 1–12). CEUR Workshop Proceedings Vol. 718.
Frequency of URLs: #mla09
[Figure: Distribution of URLs from #mla09. x-axis: URL on rank n (ranked by frequency); y-axis: frequency of URL on rank n (0 to 30).]
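A rank-frequency distribution like the ones plotted for #www2010 and #mla09 can be computed by extracting URLs from the tweet texts and counting them. This is a minimal sketch rather than the procedure of the cited study: the URL pattern is a simplification, shortened links are counted as they appear (no attempt is made to resolve them), and the sample tweets are invented.

```python
import re
from collections import Counter

# Simplified URL pattern (assumption): 'http://' or 'https://' up to the next whitespace.
URL_RE = re.compile(r"https?://\S+")

def url_rank_frequency(texts):
    """Return (rank, url, frequency) tuples, most frequent URL first."""
    counts = Counter()
    for text in texts:
        for url in URL_RE.findall(text):
            # Strip punctuation that belongs to the sentence, not the URL.
            counts[url.rstrip(".,;:!?)")] += 1
    return [(rank, url, freq)
            for rank, (url, freq) in enumerate(counts.most_common(), start=1)]

# Hypothetical tweets, invented for illustration.
texts = [
    "Slides: http://example.org/slides",
    "RT @alice: Slides: http://example.org/slides",
    "Programme at http://example.org/program.",
    "Nice talk! http://example.org/slides #www2010",
]
ranking = url_rank_frequency(texts)
for rank, url, freq in ranking:
    print(rank, freq, url)
```

Plotting frequency against rank then gives the kind of long-tailed curve shown in the two figures.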
URL Categorization
Blog, Conference, Error, Media, Press, Project, Publication, Slides, Twitter, Other
Frequent URLs and their categories: #www2010
Frequency  Category     URL
41         Blog         http://blog.marcua.net/post/566480920/twitter-papers-at-the-www-2010-conference
35         Publication  http://www.danah.org/papers/talks/2010/WWW2010.html
29         Twitter      http://kmi.tugraz.at/staff/markus/www2010/www2010_roomstream.html
22         Error        http://xquery.pbworks.com/rtp-meetup
22         Conference   http://www.elon.edu/e-web/predictions/futureweb2010/carl_malamud_www_keynote.xhtml
18         Conference   http://www.elon.edu/e-web/predictions/futureweb2010/default.xhtml
16         Conference   http://futureweb2010.wordpress.com/schedule/
13         Slides       http://www.slideshare.net/haewoon/what-is-twitter-a-social-network-or-a-news-media-3922095
12         Conference   http://events.linkeddata.org/ldow2010/
12         Project      http://opengraphprotocol.org/
12         Conference   http://www.websci10.org/program.html
Frequent URLs and their categories: #mla09
Frequency  Category  URL
27         Blog      http://amandafrench.net/2009/12/30/make-10-louder/
23         Blog      http://www.briancroxall.net/2009/12/28/the-absent-presence-todays-faculty/
22         Blog      http://nowviskie.org/2009/monopolies-of-invention/
20         Error     http://chronicle.com/article/missing-in-action-at/63276/
18         Press     http://www.profhacker.com/?p=4448
18         Blog      http://www.samplereality.com/2009/11/15/digital-humanities-sessions-at-the-2009-mla/
16         Press     http://chronicle.com/blogpost/the-mlathe-digital/19468/
15         Press     http://www.profhacker.com/2010/01/09/academics-and-social-media-mla09-and-twitter/
15         Blog      http://academhack.outsidethetext.com/home/2010/the-mla-briancroxall-and-the-non-rise-of-the-digital-humanities/
15         Blog      http://www.samplereality.com/2010/01/02/the-mla-in-tweets/
URL Categories: #mla09 and #www2010
Number of URLs per category, counting all URLs and counting unique URLs only.
Category     #mla09 all  #mla09 unique  #www2010 all  #www2010 unique
             (n=551)     (n=199)        (n=1460)      (n=574)
Blog         229         54             222           68
Conference   23          16             206           37
Error        69          28             201           92
Media        34          25             137           71
Press        123         34             92            33
Project      11          5              116           51
Publication  4           3              135           52
Slides       0           0              106           45
Twitter      22          14             76            31
Other        36          20             169           94
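Because the two datasets differ in size, the category counts are easier to compare as shares. The short sketch below uses the #mla09 counts for all URLs (n=551) as given in the slides; the function name is hypothetical and the rounding is only for display.

```python
# Counts taken from the #mla09 chart that counts all URLs (n=551).
MLA09_ALL = {
    "Blog": 229, "Conference": 23, "Error": 69, "Media": 34, "Press": 123,
    "Project": 11, "Publication": 4, "Slides": 0, "Twitter": 22, "Other": 36,
}

def category_shares(counts):
    """Return each category's share of all URLs, in percent."""
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}

shares = category_shares(MLA09_ALL)
for category, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{category:<12}{share:5.1f}%")
```

The same function can be applied to the other three sets of counts to compare, for example, the share of blog links in #mla09 and #www2010.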
Internal Citations: Retweets
Automatically detected RTs (number and percentage of RTs in entire conference dataset): #www2010: 1,121 (33.38% of 3,358); #mla09: 414 (21.46% of 1,929)
∅ RTs per twitterer (automatically detected RTs, entire conference dataset): #www2010: 1.24; #mla09: 1.12
Manually detected RTs (number and percentage of RTs in entire conference dataset): #www2010: 1,318 (39.25% of 3,358); #mla09: 514 (26.65% of 1,929)
Different ways to count retweets
Weller, K., Dröge, E., & Puschmann, C. (2011). Citation Analysis in Twitter: Approaches for Defining and Measuring Information Flows within Tweets during Scientific Conferences. In M. Rowe, M. Stankovic, A.-S. Dadzie, & M. Hardey (Eds.), Making Sense of Microposts (#MSM2011), Workshop at Extended Semantic Web Conference (ESWC 2011), Crete, Greece (pp. 1–12). CEUR Workshop Proceedings Vol. 718.
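The gap between automatically and manually detected RTs suggests that pattern-based detection misses some retweets. The sketch below shows what such an automatic check might look like; the pattern (matching "RT @user" and "via @user") is an assumption for illustration, not the detection rule used by Weller, Dröge and Puschmann, and the sample tweets are invented.

```python
import re

# Assumed pattern for two common manual retweet conventions:
# 'RT @user' and 'via @user'. Other conventions are not covered.
AUTO_RT_RE = re.compile(r"\b(?:RT|via)\s*@\w+", re.IGNORECASE)

def retweet_share(texts):
    """Return (number of detected retweets, percentage of all tweets)."""
    detected = sum(1 for text in texts if AUTO_RT_RE.search(text))
    return detected, 100 * detected / len(texts)

# Hypothetical tweets, invented for illustration.
texts = [
    "RT @alice: keynote starts now",
    "Great keynote (via @bob)",
    "Keynote starts now",
    "@alice thanks for the slides",
]
detected, percent = retweet_share(texts)
print(f"{detected} of {len(texts)} tweets detected as retweets ({percent:.2f}%)")

# The same arithmetic reproduces a percentage reported for #www2010.
www2010_auto_percent = round(100 * 1121 / 3358, 2)
print(www2010_auto_percent)
```

Retweets that are paraphrased or use no marker at all would only be found by manual coding, which is one plausible reason for the higher manual counts.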
Testdata 2 and 3
Go to http://tiny.cc/testdata2 and/or http://tiny.cc/testdata3 (links will be deactivated after the course)
Save file to the desktop.
No. 2: Import csv to Excel.
No. 3: Open in Excel.
Explore!
YourTwapperkeeper
http://www.tagsleuth.com/
"Homework"
Voluntary Homework
Think about a case study you would be interested in (in the context of social media research). Prepare a research question you would like to answer.
Alternatively, you can use an example topic tomorrow.
Activate TagSleuth Account
Activate a free 3 day trial account for TagSleuth.
Set up a collection that matches your selected topic.
Conclusions 2
Lessons learned
In the context of social science research, it is not all about "big" data, but about new data that can enable new types of insights.
New types of data also come with several challenges, e.g. concerning new methods.
Get to know "your" platforms and their data as early as possible in the research process. Familiarizing yourself with a platform may take some time.
Identify what is feasible or not in your domain of interest.
If your ideal dataset is not accessible, think about proxies.
If you have time to read 3 papers…
Almuhimedi, H., Wilson, S., Liu, B., Sadeh, N., & Acquisti, A. (2013). Tweets are forever: a large-scale quantitative analysis of deleted tweets (p. 897). ACM Press. http://doi.org/10.1145/2441776.2441878
Giglietto, F., Rossi, L., & Bennato, D. (2012). The Open Laboratory: Limits and Possibilities of Using Facebook, Twitter, and YouTube as a Research Data Source. Journal of Technology in Human Services, 30(3-4), 145-159.
Mahrt, M., & Scharkow, M. (2013). The value of big data in digital media research. In: Journal of Broadcasting & Electronic Media, 57(1), 20-33.