Insights into the Twitterverse: Benchmarking and analysis of Twitter content

Exploring the Streams of Online Community Conversation: Insights into the Twitterverse
@stephendann, Australian National University

Posted by stephen-dann on 12 May 2015


DESCRIPTION

The 2014 remix of the Twitter Content Classification framework, now featuring statistics, radar plots, Linguistic Inquiry and Word Count (LIWC), Leximancer, network plots, and more opportunities to run maths, stats and graphs than ever before.

TRANSCRIPT



Usual House Rules

• @stephendann for questions
• #anzmac13 for commentary


A little context

The Past
• Dann (2010)
  – Six top-level Twitter categories
  – 23 sub-domains
• Dann (2011)
  – Six top-level categories
  – 28 sub-domains

The Present
• Dann (today)
  – Six top-level categories
    • No sub-domain analysis
  – Secondary processing
    • Leximancer
    • Linguistic Inquiry and Word Count (LIWC)


Twitter Analysis 2.0.14

The Procedure


Acquire a research question

• Does Event X change the tweeting patterns of account @Y?
• Do responses to the #hashtag event change over time?
  – #EventTags in Time Period A will have more Status than in Time Period D
  – Time Period D will have more Pass Along than Status
• What were they thinking?
  – Dominant categories of tweets over time within a selected account
• Do comments change by platform for account @X?
  – Mobile versus web versus desktop
• Does @BrandX engage with the community?
  – Conversational tweets outnumber all other types over the capture time period
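The example hypotheses above reduce to comparing category counts across coded subsets. A minimal sketch, assuming the coding sheet has been reduced to (time period, category) pairs; the rows are made-up toy data rather than results from this deck, `counts_by_period` is a hypothetical helper, and the comparison is descriptive only (a formal significance test would be a separate choice):

```python
from collections import Counter

# Toy coded rows: (time_period, top-level category). Made-up data for
# illustration; in practice these come from the coding sheet.
coded = [
    ("A", "Status"), ("A", "Status"), ("A", "Pass Along"),
    ("D", "Pass Along"), ("D", "Pass Along"), ("D", "Status"),
]


def counts_by_period(rows):
    """Return {period: Counter({category: count})} for the coded tweets."""
    table = {}
    for period, category in rows:
        table.setdefault(period, Counter())[category] += 1
    return table


table = counts_by_period(coded)

# Hypothesis 1: Period A has more Status tweets than Period D.
h1 = table["A"]["Status"] > table["D"]["Status"]
# Hypothesis 2: Period D has more Pass Along than Status tweets.
h2 = table["D"]["Pass Along"] > table["D"]["Status"]
print(f"H1 holds: {h1}; H2 holds: {h2}")
```

A `Counter` returns 0 for categories that never appear, so a period with no Spam, say, compares cleanly without extra checks.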


Acquire your data

• Personal timelines
  – Download from Twitter
• #Hashtag captures
  – Hootsuite
• Timeline captures
  – Choose your own adventure
  – Getting worse and harder as Twitter’s API becomes less available
• Try to avoid big data


Big Data

• If you are Axel Bruns, fine, continue
  – http://mappingonlinepublics.net/
• For everyone else, what are you looking for?
  – What sample suits your research question?


Process your data

• Stand by for ugliness and manual coding*
  – Extract data into Excel
• Excel allows for additional data inputs as you progress the analysis
  – Keep the tweet visible
    • Only keep a column visible if it fits your research question, e.g. date, time, @user, platform
  – Add columns for Tweet ID, category and cat_n
    • Add sub_category and sub_cat_n for the detailed version

*Automated coding? People are working on it. It’s a terrible idea that’ll happen anyway.
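A minimal sketch of the coding sheet described above, written as CSV so it opens in Excel. The exact column names (`tweet_id`, `cat_n`, `sub_cat_n` and so on) and the `new_coding_sheet` helper are hypothetical; keep only the metadata columns your research question needs.

```python
import csv
import io

# Illustrative column layout: raw metadata, the tweet itself, then empty
# coding columns (category, cat_n, plus sub-category columns for the
# detailed version).
COLUMNS = ["tweet_id", "date", "time", "user", "platform", "tweet",
           "category", "cat_n", "sub_category", "sub_cat_n"]


def new_coding_sheet(tweets):
    """Return CSV text with blank coding columns, ready for manual coding.

    `tweets` is an iterable of dicts holding raw fields named as in COLUMNS.
    Missing fields are left blank; a sequential tweet_id is used unless the
    tweet dict supplies its own.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, restval="")
    writer.writeheader()
    for i, tweet in enumerate(tweets, start=1):
        writer.writerow({"tweet_id": i, **tweet})
    return buffer.getvalue()


sheet = new_coding_sheet([
    {"date": "2014-01-01", "user": "@example", "platform": "web",
     "tweet": "Example tweet text"},
])
print(sheet)
```

`csv.DictWriter` raises an error for any key not listed in `COLUMNS`, which catches stray fields before they reach the sheet.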


Manual Coding

• Use the Dann (2010) or Dann (2011) top-level domains
  – Dann (201X) is under development
    • I broke something important earlier this year
• Manual coding is superior
  – Nuance and interpretation count


Pick a box

1 Conversational: Uses an @statement to address another user

2 News: Identifiable news content

3 Pass Along: Tweets of endorsement of content

4 Phatic: Content-independent connected presence

5 Status: Tweets which address the questions "What are you doing?" and "What's happening?" in terms of an account holder's experiences

6 Spam: Unsolicited content


Keep it on manual

1 Conversational: Uses an @statement to address another user

1.1 Action: Activities involving other Twitter users, or tweets which describe the presence of other Twitter users

1.2 Query: Any statement-style tweet that ends with a question mark, as it represents an active attempt to engage responses from the community

1.3 Referral: An @response which contains URLs or recommendations of other Twitter users (excludes RT @user)

1.4 Response: Tweets which commence with another user’s name and which do not meet the requirements of the Referral category

1.5 Rhetoric: Question asked and answered within the same tweet (distinct from Conversational – Query), which may not require (but may elicit) an audience response


Upgrades

3 Pass Along: Tweets of endorsement of content

3.1 Automated Endorsement: Status announcements triggered by third-party applications which publish URLs

3.2 Endorsement: Links to web content not created by the sender

3.3 Retweet: Any statement reproducing another Twitter status using the via @ or RT protocol

3.4 Secondary Social Media: Links to Facebook (fb.me) or similar social media platforms

3.5 User Generated Content: Links to own content created by the user

3.6 Quote: Comment marked with quotation marks (“ ”) to represent a direct quote, or a paraphrase of a statement without a source URL, including reference to an offline speaker or overheard (OH)

3.7 Cite: Any tweet which contains a reference in a recognised Harvard, Oxford or similar citation format

3.8 Modified Retweet: Acknowledgement of the use of the MT protocol to allow for an edited RT


Speed Hacking Excel

• Speed hacks exist
  – Alphabetical tweet sort
    • @, RT and MT tweets cluster together
• “Find all” selecting
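The alphabetical-sort hack can also be sketched outside Excel. This assumes a plain list of tweet texts; the prefix table and `sort_and_hint` are hypothetical helpers, and the hints are only candidates for the human coder to confirm, in keeping with manual coding (an "RT" buried mid-tweet, for example, will not be caught).

```python
# Prefix hints drawn from the coding scheme (assumed mapping, triage only):
# "RT " -> 3.3 Retweet, "MT " -> 3.8 Modified Retweet (both Pass Along),
# a leading "@" -> Conversational. A human coder still makes the final call.
PREFIX_HINTS = [
    ("RT ", "3.3 Retweet"),
    ("MT ", "3.8 Modified Retweet"),
    ("@", "1 Conversational"),
]


def sort_and_hint(tweets):
    """Sort tweets case-insensitively so @, MT and RT tweets cluster,
    then attach a candidate code based on the opening characters."""
    rows = []
    for text in sorted(tweets, key=str.casefold):
        hint = next((code for prefix, code in PREFIX_HINTS
                     if text.startswith(prefix)), "")
        rows.append((text, hint))
    return rows


for text, hint in sort_and_hint([
    "RT @example: road closures update",
    "Felt that one here",
    "@friend are you okay?",
    "MT @example: edited update",
]):
    print(f"{hint or '(code manually)':22} {text}")
```

Because "@" sorts ahead of letters, @-addressed tweets land at the top, with the MT and RT blocks each grouped further down.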


Coding Time!

• Cross-check the coding
  – Some variance is okay
  – Resolve it through the usual traditions
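The deck leaves cross-checking to "the usual traditions". One common option, assumed here rather than taken from the deck, is to report simple percent agreement alongside Cohen's kappa, which corrects for chance agreement. A minimal sketch with toy codes from two hypothetical coders:

```python
from collections import Counter


def percent_agreement(coder_a, coder_b):
    """Share of tweets given the same top-level category by both coders."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must code the same non-empty set of tweets")
    return sum(a == b for a, b in zip(coder_a, coder_b)) / len(coder_a)


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    observed = percent_agreement(coder_a, coder_b)
    n = len(coder_a)
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: probability both coders pick the same category,
    # given each coder's own category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1:
        return 1.0  # both coders used one identical category for every tweet
    return (observed - expected) / (1 - expected)


# Toy codes from two hypothetical coders for the same six tweets.
coder_a = ["Conversational", "News", "Pass Along", "Pass Along", "Status", "Phatic"]
coder_b = ["Conversational", "News", "Pass Along", "Status", "Status", "Phatic"]
print(f"agreement {percent_agreement(coder_a, coder_b):.2f}, "
      f"kappa {cohens_kappa(coder_a, coder_b):.2f}")
```

Disagreements (the fourth tweet here) are then the ones to discuss and resolve between coders.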


Sample Data #qldquake


Coded


Analysis Table Block

Columns: Category | Tweets (TCat) | Tweet Ratio | Max Density | Actual Characters | Character Density | Density Ratio
Rows: Conversational, News, Pass Along, Phatic, Spam, Status, n (total)


Tweet Math Dude

• Tweet count
  – N per category
• Calculate the Tweet Ratio
  – The Tweet Ratio is a normalised rank order of tweet volume, where the most common category scores 1
• Calculating the Tweet Ratio
  – Highest number of tweets in a single category = TTMax
  – Tweets per category = TCat
  – Ratio = TCat / TTMax

I’m only mildly mocking statistical analysis here
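The Tweet Ratio calculation as a minimal sketch, using the TCat values from the Reporting the Data table in this deck; `tweet_ratio` is a hypothetical helper name:

```python
def tweet_ratio(counts):
    """Tweet Ratio per category: TCat / TTMax, so the largest category scores 1."""
    tt_max = max(counts.values())  # TTMax: highest tweet count in any one category
    return {category: t_cat / tt_max for category, t_cat in counts.items()}


# TCat values from the Reporting the Data table in this deck.
counts = {"Conversational": 39, "News": 41, "Pass Along": 481,
          "Phatic": 21, "Spam": 1, "Status": 18}
for category, ratio in tweet_ratio(counts).items():
    print(f"{category}: {ratio:.3f}")
```

On these counts News is 41 / 481 ≈ 0.085, which the reported table shows as 0.08, so its two-decimal figures appear to be truncated rather than rounded.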


Maximum Character Density

• Max Density = 140 × TCat (the number of tweets in each category)
• The theoretical range for a tweet is between 1 and 140 characters
  – The maximum tweet is 140 characters
  – The more characters used, the greater the information density
• Calculate Character Density = Actual Characters / Max Density
• Divide each Character Density score by the highest Character Density
  – This normalises Character Density scores to a rank order (the Density Ratio)
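Max Density, Character Density and Density Ratio follow the same pattern. A minimal sketch, assuming per-category totals of characters actually used (Actual Characters) have already been summed from the coding sheet; the figures are the TCat and Actual Characters values reported in this deck, and `density_table` is a hypothetical helper:

```python
MAX_TWEET_CHARS = 140  # tweet length limit assumed throughout the deck


def density_table(counts, actual_chars):
    """Max Density, Character Density and Density Ratio per category.

    counts:       {category: TCat, the number of tweets}
    actual_chars: {category: total characters actually used by those tweets}
    """
    rows = {}
    for category, t_cat in counts.items():
        max_density = MAX_TWEET_CHARS * t_cat  # Max Density = 140 x TCat
        rows[category] = {
            "max_density": max_density,
            "char_density": actual_chars[category] / max_density,
        }
    # Density Ratio: divide by the highest Character Density so the top scores 1.
    top = max(row["char_density"] for row in rows.values())
    for row in rows.values():
        row["density_ratio"] = row["char_density"] / top
    return rows


# TCat and Actual Characters from the Reporting the Data table in this deck.
counts = {"Conversational": 39, "News": 41, "Pass Along": 481,
          "Phatic": 21, "Spam": 1, "Status": 18}
actual_chars = {"Conversational": 3533, "News": 3778, "Pass Along": 53491,
                "Phatic": 2179, "Spam": 81, "Status": 1543}
rows = density_table(counts, actual_chars)
for category, row in rows.items():
    print(f"{category}: max {row['max_density']}, "
          f"density {row['char_density']:.0%}, ratio {row['density_ratio']:.2f}")
```

Conversational, for example, comes out at 3533 / 5460 ≈ 65% Character Density and a Density Ratio of about 0.81, in line with the reported table.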


Reporting the Data

Category | Tweets (TCat) | Tweet Ratio | Max Density | Actual Characters | Character Density | Density Ratio
Conversational | 39 | 0.08 | 5460 | 3533 | 65% | 0.81
News | 41 | 0.08 | 5740 | 3778 | 66% | 0.83
Pass Along | 481 | 1 | 67340 | 53491 | 79% | 1.00
Phatic | 21 | 0.04 | 2940 | 2179 | 74% | 0.93
Spam | 1 | 0.00 | 140 | 81 | 58% | 0.73
Status | 18 | 0.03 | 2520 | 1543 | 61% | 0.77
n | 601 | | 84140 | 64605 | 77% |


Reporting the Data

[Radar chart: Ratio and Density series by category (Conversational, News, Pass Along, Phatic, Spam, Status)]


Text Analysis Wave 1

Linguistic Inquiry and Word Count (LIWC)

So. Very. Fast.


LIWC

• http://www.liwc.net/
  – Text analysis software
  – Calculates the degree to which people use different categories of words in texts: positive or negative emotions, self-references, causal words and 70 other language dimensions


A giant bucket of data

• 70 variables
  – So have a hypothesis and a purpose for the analysis
• Differences in tweet construction
  – Word counts
  – Unique words
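LIWC computes word counts itself, but a rough cross-check of Average Word Count per category and its normalised ratio can be sketched as below. This assumes a naive whitespace split, which will not reproduce LIWC's own tokenisation, and uses toy tweets; the Unique Word Count measure is left to LIWC. `average_word_count` and `normalise` are hypothetical helpers.

```python
from collections import defaultdict


def average_word_count(coded_tweets):
    """Mean words per tweet for each category, using a naive whitespace split.

    coded_tweets: iterable of (category, tweet_text) pairs.
    """
    words, tweets = defaultdict(int), defaultdict(int)
    for category, text in coded_tweets:
        words[category] += len(text.split())
        tweets[category] += 1
    return {category: words[category] / tweets[category] for category in words}


def normalise(scores):
    """Divide by the highest score so the top category scores 1 (as AWC_Ratio)."""
    top = max(scores.values())
    return {category: score / top for category, score in scores.items()}


# Toy tweets for illustration only.
sample = [
    ("Status", "Stuck at work"),
    ("Status", "Heading home now after a long day"),
    ("Pass Along", "RT @example: Road closures listed here http://example.com"),
]
awc = average_word_count(sample)
print(awc, normalise(awc))
```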


Results

Average Word Count (AWC) and Unique Word Count (UWC)

Category | AWC | AWC_Ratio
Conversational | 12.82 | 0.78
News | 13.56 | 0.82
Pass Along | 16.35 | 1
Phatic | 15.42 | 0.94
Status | 12.94 | 0.79

Category | UWC | UWC_Ratio
Conversational | 93 | 0.97
News | 93 | 0.97
Pass Along | 92 | 0.96
Phatic | 93 | 0.97
Status | 96 | 1


Results

[Charts: Word Count ratio and Unique Word ratio by category (Conversational, News, Pass Along, Phatic, Status)]


Text Analysis Wave 2

Leximancer


Leximancer

• Import into Leximancer as an individual analysis (individual project)
  – Edit pre-processing options: sentences per block = 1
  – Run to generate outputs
  – Generate the concept map
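If each category is imported as its own file (as the per-category maps suggest), the export can be sketched as below. This assumes one plain-text file per category with one tweet per line; the file naming and both helpers are hypothetical, and the Leximancer settings themselves are still made in its interface as listed above.

```python
from collections import defaultdict
from pathlib import Path


def group_for_leximancer(coded_tweets):
    """Map one file name per category to its text, one tweet per line.

    coded_tweets: iterable of (category, tweet_text) pairs.
    """
    by_category = defaultdict(list)
    for category, text in coded_tweets:
        # Collapse internal line breaks and runs of spaces so each tweet is one line.
        by_category[category].append(" ".join(text.split()))
    return {f"{category.lower().replace(' ', '_')}.txt": "\n".join(lines) + "\n"
            for category, lines in by_category.items()}


def write_files(files, out_dir):
    """Write the grouped texts to out_dir, creating it if needed."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for name, content in files.items():
        (out / name).write_text(content, encoding="utf-8")


files = group_for_leximancer([
    ("Pass Along", "RT @example: one\nline"),
    ("Phatic", "Morning all"),
])
print(sorted(files))
```

Calling `write_files(files, "leximancer_input")` (a hypothetical folder name) saves the texts, ready to be imported one project per file.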


Map time!


Four sample maps

Entirely because quadrants fit on screens better than hexes. No other reason.

[Leximancer concept maps: Conversational, News, Pass Along, Phatic]


Tweet Network Density

• Calculate network density
  – Count nodes (n)
  – Count actual connections (e), i.e. edges (paths between nodes)
  – Calculate network density as 2e / (n(n - 1))
• Network density notes
  – Calculate potential connections: n(n - 1) / 2 for an undirected network, so density is e divided by potential connections
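The density formula as a minimal sketch. `network_density` and `density_from_edges` are hypothetical helpers; the second assumes the network is supplied as a list of undirected (user, user) pairs, so isolated nodes with no edges are not counted. Run on the node and edge counts from the Network Density Results table in this deck, it reproduces the reported densities to two decimals.

```python
def network_density(nodes, edges):
    """Undirected network density: 2e / (n(n - 1)).

    Equivalent to actual edges divided by potential connections, n(n - 1) / 2.
    """
    if nodes < 2:
        raise ValueError("density needs at least two nodes")
    return 2 * edges / (nodes * (nodes - 1))


def density_from_edges(edge_list):
    """Count distinct nodes and undirected edges, then compute density.

    Duplicate pairs (a, b) / (b, a) count once; self-loops are ignored.
    """
    edges = {frozenset(pair) for pair in edge_list if pair[0] != pair[1]}
    nodes = {node for edge in edges for node in edge}
    return len(nodes), len(edges), network_density(len(nodes), len(edges))


# Node and edge counts from the Network Density Results table in this deck.
results = {"Conversational": (13, 12), "News": (18, 17), "Pass Along": (15, 15),
           "Phatic": (3, 2), "Status": (4, 3), "n": (19, 17)}
for category, (n, e) in results.items():
    print(f"{category}: {network_density(n, e):.2f}")
```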


Pass Along Network

Nodes | Edges | Network Density
15 | 15 | 0.14


Network Density Results

Category | Nodes | Edges | Network Density
Conversational | 13 | 12 | 0.15
News | 18 | 17 | 0.11
Pass Along | 15 | 15 | 0.14
Phatic | 3 | 2 | 0.67
Status | 4 | 3 | 0.50
n | 19 | 17 | 0.10

[Chart: Network Density by category (Conversational, News, Pass Along, Phatic, Status)]


One Bucket of Data

• This is why a research question is important
  – You can map a range of information
  – None of it is useful without the RQ / hypothesis
  – It’s pretty, but not valuable

Category | Tweet | Density | Network | Ave. WC | Unique Words
Conversational | 0.081081 | 0.819075 | 0.814598 | 0.830959 | 0.96875
News | 0.085239 | 0.83315 | 0.828595 | 0.878952 | 0.96875
Pass Along | 1 | 1.005496 | 1 | 1.059722 | 0.958333
Phatic | 0.043659 | 0.938173 | 0.933044 | 1 | 0.96875
Status | 0.037422 | 0.775065 | 0.770829 | 0.838992 | 1

[Chart: Tweet, Density, Network, Ave. WC and Unique Words by category (Conversational, News, Pass Along, Phatic, Status)]