Posted on 10-May-2015
New Methodologies for Capturing and Working with Publicly Available Twitter Data
Associate Professor Axel Bruns, Queensland University of Technology
@snurb_dot_info
http://mappingonlinepublics.net/
WHY TWITTER?
• Researching Twitter:
  – Significant world-wide social network
  – ~500 million accounts (but how many active?)
  – Varied range of uses: from phatic communication to emergency coordination
  – Healthy third-party ecosystem (for now)
  – Strong history of user innovation: @replies, #hashtags
  – Flat and open network structure: non-reciprocal following, public profiles by default
  – Good API for gathering (big) data for research
NEW MEDIA AND PUBLIC COMMUNICATION: MAPPING AUSTRALIAN USER-CREATED CONTENT IN ONLINE SOCIAL NETWORKS
• Australian Research Council (ARC) Discovery Project (2010-13)
  – $410,000
  – QUT (Brisbane), Sociomantic Labs (Berlin)
  – First comprehensive study of Australian social media use
  – Computer-assisted cultural analysis: tracking, mapping, analysing blogs, Twitter, Flickr, YouTube as ‘networked publics’
  – Addressing the problem of scale (‘Big Data’) and disciplinary change in media, cultural and communication studies: natively digital methods
  – Studying society with the Internet (Richard Rogers)
http://mappingonlinepublics.net/
A TWITTER RESEARCH TOOLKIT
• Data Gathering
  – yourTwapperkeeper + in-house crawler
• Data Processing
  – Gawk – open source, multiplatform, programmable command-line tool for processing CSV documents
• Textual Analysis
  – Leximancer – commercial, multiplatform: extracts key concepts from large corpora of text, examines and visualises concept co-occurrence
  – WordStat – commercial, PC-only text analysis tool; generates concept co-occurrence data that can be exported for visualisation
• Visualisation
  – Gephi – open source, multiplatform network visualisation tool
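The processing step that feeds a Gephi network visualisation can be sketched in Python. The slides name Gawk for this work and do not show the scripts, so the following is only an assumed equivalent, not the project's own code: it counts @mentions from each sender to each mentioned user and writes a Source,Target,Weight table, a column layout that Gephi's spreadsheet import should recognise as an edge list. The function names and the dict-per-tweet input (with 'from_user' and 'text' keys, matching field names listed in the deck) are hypothetical.

```python
import csv
import io
import re
from collections import Counter

# Hypothetical pattern: @username mentions (letters, digits, underscores,
# up to 15 characters). The lookbehind skips e-mail addresses like a@b.com.
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")


def mention_edges(rows):
    """Count directed sender -> mentioned-user edges.

    Each row is assumed to be a dict with 'from_user' and 'text' keys.
    Every @mention is counted, including mentions inside retweets.
    """
    edges = Counter()
    for row in rows:
        source = row["from_user"].lower()
        for target in MENTION.findall(row["text"]):
            edges[(source, target.lower())] += 1
    return edges


def write_gephi_edges(edges, out):
    """Write a Source,Target,Weight edge table for Gephi's spreadsheet import."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (source, target), weight in sorted(edges.items()):
        writer.writerow([source, target, weight])


# Usage with invented example tweets.
rows = [
    {"from_user": "alice", "text": "@bob thanks! cc @Carol"},
    {"from_user": "alice", "text": "@bob see you there"},
    {"from_user": "carol", "text": "RT @alice: @bob thanks!"},
]
edges = mention_edges(rows)
buf = io.StringIO()
write_gephi_edges(edges, buf)
print(buf.getvalue())
```

Because every mention is counted, retweets contribute edges too; filtering them out first would restrict the network to the genuine @reply conversations discussed later in the deck.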
SO NOW WHAT?
APPROACHING TWITTER
• Possible research questions:
  – Hashtags as vehicles for ad hoc events and publics:
    • How do online publics form and dissolve? How do they interact, and what structures do they form?
    • Where do they draw information from? What do they share?
    • Do they simply consist of the usual suspects? How insular and disconnected are online publics?
  – Hashtags in context:
    • How do different hashtag events compare? Are there common types of hashtags/publics?
    • How ‘big’ are they? What topics attract attention on Twitter?
    • What community (?) structures emerge?
DEVELOPING TWITTER METRICS
• Key data points available through the Twitter API:
  – text: contents of the tweet itself, in 140 characters or less
  – to_user_id: numerical ID of the tweet recipient (for @replies)
  – from_user: screen name of the tweet sender
  – id: numerical ID of the tweet itself
  – from_user_id: numerical ID of the tweet sender
  – iso_language_code: code (e.g. en, de, fr, ...) of the sender’s default language
  – source: client software used to tweet (e.g. Web, TweetDeck, ...)
  – profile_image_url: URL of the tweet sender’s profile picture
  – geo_type: format of the sender’s geographical coordinates
  – geo_coordinates_0: first element of the geographical coordinates
  – geo_coordinates_1: second element of the geographical coordinates
  – created_at: tweet timestamp in human-readable format
  – time: tweet timestamp as a numerical Unix timestamp
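A minimal Python sketch of loading such an archive, assuming it has been exported to CSV with a header row that uses the field names above and that 'time' holds a Unix timestamp in seconds (UTC). The read_archive function and the sample rows are invented for illustration; real exports may use other column names, orders or quoting.

```python
import csv
import io
from datetime import datetime, timezone


def read_archive(fh):
    """Yield one dict per tweet from a CSV file whose header uses the field names above.

    Assumption: 'time' is a Unix timestamp in seconds (UTC). Columns absent
    from the header simply do not appear in the resulting dicts.
    """
    for row in csv.DictReader(fh):
        row["time"] = int(row["time"])
        row["timestamp"] = datetime.fromtimestamp(row["time"], tz=timezone.utc)
        # An empty to_user_id is treated here as "no recipient".
        row["to_user_id"] = row.get("to_user_id") or None
        yield row


# Usage with a small invented sample containing a subset of the fields.
sample = io.StringIO(
    "id,from_user,from_user_id,to_user_id,text,time\n"
    "101,alice,1,,Water rising fast #qldfloods,1294444800\n"
    "102,bob,2,1,@alice stay safe #qldfloods,1294448400\n"
)
tweets = list(read_archive(sample))
for t in tweets:
    print(t["from_user"], t["to_user_id"], t["timestamp"].isoformat())
```

Converting 'time' to a timezone-aware datetime once, at load time, keeps later per-day or per-hour grouping independent of the machine's local time zone.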
DEVELOPING TWITTER METRICS
• Additional data points from tweets:
  – original tweets: tweets which are neither @replies nor retweets
  – retweets: tweets which contain RT @user… (or similar)
    • unedited retweets: retweets which start with RT @user…
    • edited retweets: retweets which do not start with RT @user…
  – genuine @replies: tweets which contain @user, but are not retweets
  – URL sharing: tweets which contain URLs
• Potential uses:
  – metrics per hashtag
  – metrics per timeframe (day, hour, minute, second, …)
  – metrics per user (or group of users)
  – …
(Bruns & Stieglitz, forthcoming)
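The tweet categories above can be approximated with regular expressions and then counted per timeframe. This is a sketch under stated assumptions, not the classification used in the cited work: "or similar" retweet markers are reduced to RT and MT, a genuine @reply is any non-retweet that mentions a user, and URLs are detected by an http:// or https:// prefix. The function names and example tweets are hypothetical.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime, timezone

# Assumed patterns: "or similar" retweet markers approximated by RT and MT.
RETWEET = re.compile(r"\b(?:RT|MT)\s?@\w+", re.IGNORECASE)
UNEDITED_RT = re.compile(r"RT\s?@\w+", re.IGNORECASE)  # used with .match(): must start the tweet
MENTION = re.compile(r"(?<!\w)@\w+")  # lookbehind skips e-mail addresses
URL = re.compile(r"https?://\S+")


def classify(text):
    """Return the set of categories a tweet text falls into."""
    categories = set()
    if RETWEET.search(text):
        categories.add("unedited_retweet" if UNEDITED_RT.match(text) else "edited_retweet")
    elif MENTION.search(text):
        categories.add("reply")  # genuine @reply: mentions a user, not a retweet
    else:
        categories.add("original")  # neither @reply nor retweet
    if URL.search(text):
        categories.add("url")  # URL sharing can overlap with any category above
    return categories


def hourly_metrics(tweets):
    """Count tweets per UTC hour, broken down by category.

    tweets: iterable of (unix_time, text) pairs.
    """
    metrics = defaultdict(Counter)
    for unix_time, text in tweets:
        hour = datetime.fromtimestamp(unix_time, tz=timezone.utc).strftime("%Y-%m-%d %H:00")
        metrics[hour]["total"] += 1
        metrics[hour].update(classify(text))
    return metrics


# Usage with invented example tweets.
sample = [
    (1294444800, "Water rising fast #qldfloods"),
    (1294445000, "RT @bob: Avoid the city centre #qldfloods http://example.com/x"),
    (1294446000, "@alice stay safe! #qldfloods"),
    (1294448400, "Good advice MT @bob: avoid the city centre #qldfloods"),
]
for hour, counts in sorted(hourly_metrics(sample).items()):
    print(hour, dict(counts))
```

Swapping the hour string for a hashtag or for the sender's screen name as the grouping key gives the per-hashtag and per-user variants in the same way.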
[Figures, section label "HASHTAG METRICS": #qldfloods @replies (labels: mainstream media, authorities); #royalwedding; #auspol (Feb.-Dec. 2011)]
BEYOND HASHTAGS
• Publics on Twitter:
  – Micro: @reply and retweet conversations
  – Meso: follower/followee networks
  – Macro: hashtag ‘communities’ (Bruns & Moe, forthcoming)
Multiple overlapping publics / networks
• What drives their formation and dissipation?
• How do they interact and interweave?
• How are they interleaved with the wider media ecology?
• Twitter doesn’t contain publics: publics transcend Twitter
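As one possible starting point for asking how hashtag publics interact and interweave, the overlap between their participant sets can be measured. The sketch below uses Jaccard similarity (shared users divided by all users of either hashtag), an assumed and deliberately simple measure that the slides do not prescribe; the function names and example tweets are hypothetical.

```python
import re
from collections import defaultdict
from itertools import combinations

HASHTAG = re.compile(r"(?<!\w)#(\w+)")


def hashtag_publics(tweets):
    """Map each hashtag (lower-cased) to the set of users who used it.

    tweets: iterable of (from_user, text) pairs.
    """
    publics = defaultdict(set)
    for user, text in tweets:
        for tag in HASHTAG.findall(text):
            publics[tag.lower()].add(user.lower())
    return publics


def jaccard_overlap(publics):
    """Jaccard similarity |A & B| / |A | B| of participant sets for every pair of hashtags."""
    overlap = {}
    for a, b in combinations(sorted(publics), 2):
        union = publics[a] | publics[b]
        overlap[(a, b)] = len(publics[a] & publics[b]) / len(union) if union else 0.0
    return overlap


# Usage with invented example tweets.
tweets = [
    ("alice", "Question Time starts now #auspol"),
    ("bob", "Watching #auspol and #qanda tonight"),
    ("carol", "Great panel #QandA"),
    ("bob", "Another sitting day #auspol"),
]
print(jaccard_overlap(hashtag_publics(tweets)))
```

Values near 1 indicate that two hashtags draw on largely the same users; values near 0 suggest largely separate publics. Follower/followee (meso-level) ties would need separate data collection and are not captured by this measure.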
‘BIG DATA’ AND THE DIGITAL HUMANITIES
• Emerging needs in Twitter research:
  – Unified, compatible methods and metrics for Twitter analysis
    Tools and approaches shared at http://mappingonlinepublics.net/
  – Powerful infrastructure for long-term, high-volume tracking of public communication on Twitter
    Data access requires a substantial funding stream
  – Facilities for long-term data storage and preservation
    Key roles for National Libraries, National Archives
  – Integration with related datasets (e.g. mainstream media (MSM) content)
    Need to address data interoperability questions
  – Robust frameworks for Internet research ethics
    Clear guidelines which take into account complex new public/private structures
• Twitter as a test case for digital humanities research
  – Widespread, open, public platform for everyday communication
  – Tool for observing society at scale through Internet research
http://mappingonlinepublics.net/
@snurb_dot_info
@jeanburgess
@_StephenH
@DrTNitins
@timhighfield
@cdtavijit