what do we get from twitter - and what not?
DESCRIPTION
Presentation held at World Social Science Forum 2013, Montreal.TRANSCRIPT
What Do We Get From Twitter – And What Not?
An Introduction to (Popular) Twitter Research in the Social Sciences
World Social Science Forum 2013 Montréal, Canada
Dr. Katrin Weller
[email protected], @kwelle, http://kwelle.wordpress.org
Work in progress
• First results
• Broader objective
Why study Twitter?
Exemplary Twitter research areas
4
Brand Communication & Marketing
Crises & Natural Disasters Elections & Politics
Popular Culture Education & Scholarly
Communication Health Care & Diseases
Visions
• New source of research data
• Availability, Popularity, Metadata
• Insights into what people do and think
• Big data
• Predictions
But: there are limitations and challenges.
Growing interest
6
0
50
100
150
200
250
300
350
400
450
2007 2008 2009 2010 2011 2012 2013
Number of Publications(Scopus total)
number of Publications(social sciences only)
SCOPUS search: TITLE(twitter) AND PUBYEAR > 2005, 22.08.2013.
How to study Twitter?
0 100 200 300 400 500 600 700
Computer Science
Social Sciences
Engineering
Medicine
Mathematics
Business, Management and Accounting
Arts and Humanities
Decision Sciences
Psychology
Agricultural and Biological Sciences
Economics, Econometrics and Finance
Nursing
Biochemistry, Genetics and Molecular Biology
Health Professions
Earth and Planetary Sciences
Environmental Science
Physics and Astronomy
Pharmacology, Toxicology and Pharmaceutics
Chemistry
Materials Science
Neuroscience
Energy
Multidisciplinary
Chemical Engineering
Veterinary
Dentistry
Immunology and Microbiology
Disciplines
298 publications from social sciences
Methods? Objectives? Data?
• Current approach: close look at the most popular (= highly cited) publications.
• Next steps:
– Bibliometric analyses including more sources
– Qualitative interviews with Twitter researchers
Top 17 (more than 50% of all citations), plus top 3 from 2012.
No. Publication Citations
[1] Huberman, B. A., Romero, D. M., & Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday, 14(1).
Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/2317/2063
155
[2] Marwick, A. E., & boyd, d. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New
Media & Society, 13(1), 114–133. doi:10.1177/1461444810365313
77
[3] Junco, R., Heiberger, G., & Loken, E. (2011). The effect of Twitter on college student engagement and grades. Journal of Computer Assisted
Learning, 27(2), 119–132. doi:10.1111/j.1365-2729.2010.00387.x
55
[4] Yardi, S., Romero, D., Schoenebeck, G., & boyd, d. (2010). Detecting spam in a Twitter network. First Monday, 15(1). Retrieved from
http://firstmonday.org/ojs/index.php/fm/article/view/2793/2431
28
[5] Ritter, A., Cherry, C., & Dolan, B. (2010). Unsupervised modeling of Twitter conversations. In HTL'10 Human Language Technologies. The
2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 172–180). Stroudsburg, Pa:
Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858019
27
[6] Petrovic, S., Osborne, M., & Lavrenko, V. (2010). Streaming first story detection with application to Twitter. In HTL'10 Human Language
Technologies. The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 181–189).
Stroudsburg, Pa: Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858020
26
[7] Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent Twitter sentiment classification. In HLT '11 Proceedings of the 49th
Annual Meeting of the Association for Computational Linguistics: Human Language Technologies:. Short papers - Volume 2 (pp. 151–160).
Retrieved from http://dl.acm.org/citation.cfm?id=2002492
22
[8] Han, B., & Baldwin, T. (2011). Lexical normalisation of short text messages: makn sens a #twitter. In HLT '11 Proceedings of the 49th Annual
Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers - Volume 2 (pp. 368–378). Retrieved
from http://dl.acm.org/citation.cfm?id=2002520
22
[9] Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilmann, M., … (2011). Part-of-speech tagging for Twitter:
Annotation, features, and experiments. In HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:
Human Language Technologies. Short papers - Volume 2 (pp. 42–47). Retrieved from http://dl.acm.org/citation.cfm?id=2002747
21
[10] Schultz, F., Utz, S., & Göritz, A. (2011). Is the medium the message? Perceptions of and reactions to crisis communication via twitter, blogs
and traditional media. Public Relations Review, 37(1), 20–27. doi:10.1016/j.pubrev.2010.12.001
19
[11] Barbosa, L., & Feng, J. (2010). Robust sentiment detection on twitter from biased and noisy data. In COLING '10 Proceedings of the 23rd
International Conference on Computational Linguistics (pp. 36–44).
19
[12] Davidov, D., Tsur, O., & Rappoport, A. (2010). Enhanced sentiment lerarning using Twitter hashtags and smileys. In COLING '10 Proceedings
of the 23rd International Conference on Computational Linguistics (pp. 241–249). Retrieved from
http://dl.acm.org/citation.cfm?id=1944566.1944594
19
[13] Hargittai, E., & Litt, E. (2011). The tweet smell of celebrity success: Explaining variation in Twitter adoption among a diverse group of young
adults. New Media & Society, 13(5), 824–842. doi:10.1177/1461444811405805
18
[14] Zhou, X., Lee, W.-C., Peng, W.-C., Xie, X., Lee, R., & Sumiya, K. Measuring geographical regularities of crowd behaviors for Twitter-based geo-
social event detection, 1. doi:10.1145/1867699.1867701
18
[15] Gruzd, A., Wellman, B., & Takhteyev, Y. (2011). Imagining Twitter as an Imagined Community. American Behavioral Scientist, 55(10), 1294–
1318. doi:10.1177/0002764211409378
17
[16] Johnson, K. A. (2011). The effect of Twitter posts on students’ perceptions of instructor credibility. Learning, Media and Technology, 36(1),
21–38. doi:10.1080/17439884.2010.534798
16
[17] Alina Mungiu-Pippidi, & Igor Munteanu. (2009). Moldova's "Twitter Revolution". Journal of Democracy, 20(3), 136–142.
doi:10.1353/jod.0.0102
16
[18] Larsson, A. O., & Moe, H. (2012). Studying political microblogging: Twitter users in the 2010 Swedish election campaign. New Media &
Society, 14(5), 729–747. doi:10.1177/1461444811422894
15
[19] Lasorsa, D. L., Lewis, S. C., & Holton, A. E. (2012). Normalizing Twitter: Journalism practice in an emerging communication space. Journalism
Studies, 13(1), 19–36. doi:10.1080/1461670X.2011.571825
15
[29] Takhteyev, Y., Gruzd, A., & Wellman, B. (2012). Geography of Twitter networks. Social Networks, 34(1), 73–81.
doi:10.1016/j.socnet.2011.05.006
14
Results
Results I
Applied methods include: • interviews with Twitter users, • experimental settings of using Twitter in certain
environments, • quantitative analysis of tweets and their
characteristics, • network analysis of (following) users, • linguistic analyses, e.g. word clustering, event
detection, sentiment analysis, • analysis of tweets.
Almost no combination of methods
Results II
Data
• collections of tweets (retrieved randomly or based on specific search criteria)
• user profiles / user networks
• data from experiments, surveys, interviews.
Different datasets
Different sizes
Results III Datasets - Examples • 309740 Twitter users (with followers and tweets)
• Experiment with 125 students.
• 17,803 tweets from 8,616 users + 1st degree network (3,048,360 directed edges, 631,416 unique followers, and 715,198 unique friends)
• 1.3 million Twitter conversations, with each conversation containing between 2 and 243 posts
• 20,000 tweets
• 1,827 annotated tweets
• Experiment with 1677 participants
• Survey with 505 young American adults
• 21,623,947 geo-tagged tweets
• One person’s Twitter network (652 followers, 114 followings).
• none
• 99,832 tweets
Results IV
Challanges and Limitations
- Technical challenges (API limitations, data quality)
- Ethical issues (?)
- Co-existing approaches for the same problem (e.g. sentiment analysis)
- Sampling (size of dataset, random sample, subsample, biases, representativeness)
- User information, geo information often unavailable.
Outlook
Thank you for your attention!
Coming soon: Weller, K., Bruns, A., Burgess, J., Mahrt , M. & Puschmann, C. (Ed.) (2013). Twitter and Society. New York: Peter Lang.