what do we get from twitter - and what not?

What Do We Get From Twitter – And What Not?

An Introduction to (Popular) Twitter Research in the Social Sciences

World Social Science Forum 2013 Montréal, Canada

Dr. Katrin Weller

[email protected], @kwelle, http://kwelle.wordpress.org

http://kwelle.wordpress.org/

Work in progress

• First results

• Broader objective

Why study Twitter?

Exemplary Twitter research areas

4

Brand Communication & Marketing

Crises & Natural Disasters Elections & Politics

Popular Culture Education & Scholarly

Communication Health Care & Diseases

Visions

• New source of research data

• Availability, Popularity, Metadata

• Insights into what people do and think

• Big data

• Predictions

But: there are limitations and challenges.

Growing interest

6

0

50

100

150

200

250

300

350

400

450

2007 2008 2009 2010 2011 2012 2013

Number of Publications(Scopus total)

number of Publications(social sciences only)

SCOPUS search: TITLE(twitter) AND PUBYEAR > 2005, 22.08.2013.

How to study Twitter?

0 100 200 300 400 500 600 700

Computer Science

Social Sciences

Engineering

Medicine

Mathematics

Business, Management and Accounting

Arts and Humanities

Decision Sciences

Psychology

Agricultural and Biological Sciences

Economics, Econometrics and Finance

Nursing

Biochemistry, Genetics and Molecular Biology

Health Professions

Earth and Planetary Sciences

Environmental Science

Physics and Astronomy

Pharmacology, Toxicology and Pharmaceutics

Chemistry

Materials Science

Neuroscience

Energy

Multidisciplinary

Chemical Engineering

Veterinary

Dentistry

Immunology and Microbiology

Disciplines

298 publications from social sciences

Methods? Objectives? Data?

• Current approach: close look at the most popular (= highly cited) publications.

• Next steps:

– Bibliometric analyses including more sources

– Qualitative interviews with Twitter researchers

Top 17 (more than 50% of all citations), plus top 3 from 2012.

No. Publication Citations

[1] Huberman, B. A., Romero, D. M., & Wu, F. (2009). Social networks that matter: Twitter under the microscope. First Monday, 14(1).

Retrieved from http://firstmonday.org/ojs/index.php/fm/article/view/2317/2063

155

[2] Marwick, A. E., & boyd, d. (2011). I tweet honestly, I tweet passionately: Twitter users, context collapse, and the imagined audience. New

Media & Society, 13(1), 114–133. doi:10.1177/1461444810365313

77

[3] Junco, R., Heiberger, G., & Loken, E. (2011). The effect of Twitter on college student engagement and grades. Journal of Computer Assisted

Learning, 27(2), 119–132. doi:10.1111/j.1365-2729.2010.00387.x

55

[4] Yardi, S., Romero, D., Schoenebeck, G., & boyd, d. (2010). Detecting spam in a Twitter network. First Monday, 15(1). Retrieved from

http://firstmonday.org/ojs/index.php/fm/article/view/2793/2431

28

[5] Ritter, A., Cherry, C., & Dolan, B. (2010). Unsupervised modeling of Twitter conversations. In HTL'10 Human Language Technologies. The

2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 172–180). Stroudsburg, Pa:

Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858019

27

[6] Petrovic, S., Osborne, M., & Lavrenko, V. (2010). Streaming first story detection with application to Twitter. In HTL'10 Human Language

Technologies. The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (pp. 181–189).

Stroudsburg, Pa: Association for Computational Linguistics (ACL). Retrieved from http://dl.acm.org/citation.cfm?id=1858020

26

[7] Jiang, L., Yu, M., Zhou, M., Liu, X., & Zhao, T. (2011). Target-dependent Twitter sentiment classification. In HLT '11 Proceedings of the 49th

Annual Meeting of the Association for Computational Linguistics: Human Language Technologies:. Short papers - Volume 2 (pp. 151–160).

Retrieved from http://dl.acm.org/citation.cfm?id=2002492

22

[8] Han, B., & Baldwin, T. (2011). Lexical normalisation of short text messages: makn sens a #twitter. In HLT '11 Proceedings of the 49th Annual

Meeting of the Association for Computational Linguistics: Human Language Technologies. Short papers - Volume 2 (pp. 368–378). Retrieved

from http://dl.acm.org/citation.cfm?id=2002520

22

[9] Gimpel, K., Schneider, N., O'Connor, B., Das, D., Mills, D., Eisenstein, J., Heilmann, M., … (2011). Part-of-speech tagging for Twitter:

Annotation, features, and experiments. In HLT '11 Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:

Human Language Technologies. Short papers - Volume 2 (pp. 42–47). Retrieved from http://dl.acm.org/citation.cfm?id=2002747

21

[10] Schultz, F., Utz, S., & Göritz, A. (2011). Is the medium the message? Perceptions of and reactions to crisis communication via twitter, blogs

and traditional media. Public Relations Review, 37(1), 20–27. doi:10.1016/j.pubrev.2010.12.001

19

[11] Barbosa, L., & Feng, J. (2010). Robust sentiment detection on twitter from biased and noisy data. In COLING '10 Proceedings of the 23rd

International Conference on Computational Linguistics (pp. 36–44).

19

[12] Davidov, D., Tsur, O., & Rappoport, A. (2010). Enhanced sentiment lerarning using Twitter hashtags and smileys. In COLING '10 Proceedings

of the 23rd International Conference on Computational Linguistics (pp. 241–249). Retrieved from

http://dl.acm.org/citation.cfm?id=1944566.1944594

19

[13] Hargittai, E., & Litt, E. (2011). The tweet smell of celebrity success: Explaining variation in Twitter adoption among a diverse group of young

adults. New Media & Society, 13(5), 824–842. doi:10.1177/1461444811405805

18

[14] Zhou, X., Lee, W.-C., Peng, W.-C., Xie, X., Lee, R., & Sumiya, K. Measuring geographical regularities of crowd behaviors for Twitter-based geo-

social event detection, 1. doi:10.1145/1867699.1867701

18

[15] Gruzd, A., Wellman, B., & Takhteyev, Y. (2011). Imagining Twitter as an Imagined Community. American Behavioral Scientist, 55(10), 1294–

1318. doi:10.1177/0002764211409378

17

[16] Johnson, K. A. (2011). The effect of Twitter posts on students’ perceptions of instructor credibility. Learning, Media and Technology, 36(1),

21–38. doi:10.1080/17439884.2010.534798

16

[17] Alina Mungiu-Pippidi, & Igor Munteanu. (2009). Moldova's "Twitter Revolution". Journal of Democracy, 20(3), 136–142.

doi:10.1353/jod.0.0102

16

[18] Larsson, A. O., & Moe, H. (2012). Studying political microblogging: Twitter users in the 2010 Swedish election campaign. New Media &

Society, 14(5), 729–747. doi:10.1177/1461444811422894

15

[19] Lasorsa, D. L., Lewis, S. C., & Holton, A. E. (2012). Normalizing Twitter: Journalism practice in an emerging communication space. Journalism

Studies, 13(1), 19–36. doi:10.1080/1461670X.2011.571825

15

[29] Takhteyev, Y., Gruzd, A., & Wellman, B. (2012). Geography of Twitter networks. Social Networks, 34(1), 73–81.

doi:10.1016/j.socnet.2011.05.006

14

Results

Results I

Applied methods include: • interviews with Twitter users, • experimental settings of using Twitter in certain

environments, • quantitative analysis of tweets and their

characteristics, • network analysis of (following) users, • linguistic analyses, e.g. word clustering, event

detection, sentiment analysis, • analysis of tweets.

Almost no combination of methods

Results II

Data

• collections of tweets (retrieved randomly or based on specific search criteria)

• user profiles / user networks

• data from experiments, surveys, interviews.

Different datasets

Different sizes

Results III Datasets - Examples • 309740 Twitter users (with followers and tweets)

• Experiment with 125 students.

• 17,803 tweets from 8,616 users + 1st degree network (3,048,360 directed edges, 631,416 unique followers, and 715,198 unique friends)

• 1.3 million Twitter conversations, with each conversation containing between 2 and 243 posts

• 20,000 tweets

• 1,827 annotated tweets

• Experiment with 1677 participants

• Survey with 505 young American adults

• 21,623,947 geo-tagged tweets

• One person’s Twitter network (652 followers, 114 followings).

• none

• 99,832 tweets

Results IV

Challanges and Limitations

- Technical challenges (API limitations, data quality)

- Ethical issues (?)

- Co-existing approaches for the same problem (e.g. sentiment analysis)

- Sampling (size of dataset, random sample, subsample, biases, representativeness)

- User information, geo information often unavailable.

Outlook

Thank you for your attention!

Coming soon: Weller, K., Bruns, A., Burgess, J., Mahrt , M. & Puschmann, C. (Ed.) (2013). Twitter and Society. New York: Peter Lang.

what do we get from twitter - and what not?

Technology

study twitter

twitter hashtags

effect of twitter

twitter researchers

twitter users

twitter network

popular twitter research

computational linguistics