Analyzing social media with Python and other tools (1/4)

Good morning! Enjoy your coffee and install PuTTY and Notepad++ via "Software Maintenance/Application Catalogue". Also install the Pattern package (see my e-mail). Thanks.

Uploaded by: damian-trilling

Posted on 26 January 2015



Big Data? What are we talking about? The process: collect, store, analyze Python Questions?

Hands-on Workshop: Big (Twitter) Data

Damian Trilling

[email protected] · @damian0604

www.damiantrilling.net

Department of Communication Science, Universiteit van Amsterdam

30 January 2014, 9.30

#bigdata Damian Trilling


The next one and a half days

You’ll hear about:

• Collecting social media data via APIs, RSS and scraping (and the tools for it)
• Technical infrastructure (via SurfSARA)
• Python
• Sentiment analysis
• Automated coding
• Frequencies and other statistics
• Social network analysis with Gephi
• . . .


In this session (1/4):

1 Big Data? What are we talking about?
   • Exploring the field
   • Some examples

2 The process: collect, store, analyze
   • A scheme
   • Our implementation

3 Python
   • What it is
   • When to use it
   • When not to use it

4 Questions?


Exploring the field

What’s big data? What are we talking about?


What are we talking about?

Today, it’s a hands-on workshop, so let’s keep this important (!) discussion for later.

So, no definition, but some brief thoughts:

• Existing data (≠ experiments or surveys)
• Too big to code manually
• Too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research


What are we talking about?

Today, . . .

• we are not going to talk about REALLY BIG data,
• but we will have some exercises on datasets a normal computer can handle

Tomorrow, . . .

• we will also learn about scaling up these techniques
• SurfSARA provides infrastructure for this


What are we talking about?

Some sources:

• Social network sites
• RSS feeds
• Databases
• Scraping text from the web
• . . .


It’s out there! You only have to collect it.


But why should we care?

We can answer new questions:

• Find needles in haystacks
• Network analysis, co-word analysis, linguistic analysis, . . .
• Verify our theories in larger datasets

It makes sense:

• There are things that computers are simply better at than humans, e.g. counting things
• Having human coders look for words in texts is like calculating a regression analysis by hand


Some examples


A recent master’s thesis

The needle in the haystack

Imagine you want to analyze some very rare content. Normal sampling won’t work, that’s for sure.


So you’d better collect everything first

Getting all news coverage from Dutch news sites

1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles.

2 Filter articles containing specific keywords.

3 The 292 articles found this way were then coded manually.

Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
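Step 2 is where a script replaces manual reading. A minimal Python sketch, assuming the collected articles are stored as plain-text files in one folder; the function name, folder and keywords are hypothetical, since the keyword list used in the thesis is not given here:

```python
import re
from pathlib import Path


def filter_articles(folder, keywords):
    """Return the names of the .txt files in `folder` that mention at least one keyword."""
    # One case-insensitive pattern; \b only at the start lets "tweet" match "tweets"
    # but not the middle of "retweeted"
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + ")", re.IGNORECASE
    )
    selected = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if pattern.search(text):
            selected.append(path.name)
    return selected
```

For example, `filter_articles("articles", ["twitter", "tweet", "facebook"])` would return the file names to hand over to the human coders. Anchoring only the start of each keyword is a design choice: it catches inflected forms, but a very short keyword may also match the beginning of unrelated longer words.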


It’s just one line of code!

urls.txt:

    http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
    http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermann-bittet-um-verzeihung
    http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierung-will-zuruecktreten
    http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klage-gegen-republik
    http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafe-wegen-oelpest
    http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-kein-babybauch-nur-fast-food
    . . .

wget command:

    wget -i urls.txt
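If wget is not available (e.g. on a Windows machine), roughly the same can be done with Python’s standard library. This is a minimal sketch, not a full wget replacement: it assumes one URL per line in urls.txt, names each file after the last part of the URL path, and, unlike wget, overwrites files with the same name and does not retry. The function name `download_all` is hypothetical.

```python
import os
import urllib.request
from urllib.parse import urlparse


def download_all(url_file, target_dir="."):
    """Download every URL listed in url_file (one per line) into target_dir."""
    os.makedirs(target_dir, exist_ok=True)
    with open(url_file, encoding="utf-8") as f:
        urls = [line.strip() for line in f if line.strip()]
    saved = []
    for url in urls:
        # Name the file after the last part of the URL path (roughly what wget does)
        name = os.path.basename(urlparse(url).path) or "index.html"
        target = os.path.join(target_dir, name)
        try:
            with urllib.request.urlopen(url, timeout=30) as response, open(target, "wb") as out:
                out.write(response.read())
            saved.append(target)
        except OSError as err:  # urllib's URLError is a subclass of OSError
            # Report and skip pages that cannot be fetched instead of aborting the run
            print("failed:", url, err)
    return saved
```

Calling `download_all("urls.txt", "downloads")` would store the pages in a folder called downloads (a hypothetical name) and return the paths of the files that were saved.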


A recent bachelor’s thesis

Tone in tweets

Imagine you want to know something about someone’s behavior on Twitter, or how a specific topic is discussed on Twitter. Do you really want to go through thousands of tweets by hand?


So you’d better think about automating your coding

Finding out how negative or positive politicians are towards their opponents

The student took lists of positive and negative words and made additional lists of each politician’s opponents. She used a Python script to check which type of words was used to refer to opponents. For further analysis, the results were imported into SPSS.

Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter [United States vs. United Kingdom: An automated content analysis of explanatory factors for the use of positive and negative campaigning by leading politicians and political parties on Twitter]. Bachelor Thesis, Universiteit van Amsterdam.
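The thesis script itself is not shown here, but the word-list approach can be sketched as follows. The word lists and opponent names are tiny hypothetical placeholders, and treating "refers to an opponent" as "positive or negative words in the same tweet as an opponent’s name" is a simplifying assumption:

```python
import csv
import re

# Hypothetical mini word lists; a real study would use validated, much longer lists
POSITIVE = {"great", "good", "honest", "strong"}
NEGATIVE = {"bad", "weak", "liar", "failed"}
# Hypothetical opponents of the politician whose tweets are analysed
OPPONENTS = {"romney", "ryan"}


def tone_towards_opponents(tweets):
    """For every tweet that mentions an opponent, count positive and negative words in it."""
    rows = []
    for tweet in tweets:
        words = re.findall(r"\w+", tweet.lower())
        mentioned = sorted(OPPONENTS.intersection(words))
        if not mentioned:
            continue  # the tweet does not refer to an opponent
        rows.append({
            "tweet": tweet,
            "opponents": ";".join(mentioned),
            "positive": sum(word in POSITIVE for word in words),
            "negative": sum(word in NEGATIVE for word in words),
        })
    return rows


def save_for_spss(rows, filename):
    """Write the rows to a CSV file, which can be imported into SPSS."""
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["tweet", "opponents", "positive", "negative"])
        writer.writeheader()
        writer.writerows(rows)
```

Each output row is one tweet, so in SPSS you could then, for instance, compare the mean number of negative words per politician or per country.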


Frame adoption on Twitter

Which phrases used by Merkel and Steinbrück on TV make it into the #tvduell discussion on Twitter? Identify frequently used words in the transcript of the debate and in the tweets. Find co-occurrences.
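Both steps can be sketched in a few lines of Python. This is an illustration under simple assumptions (lower-cased words as tokens, "co-occurrence" meaning two words appear in the same tweet, a stopword list supplied by you), not the procedure used in the actual study:

```python
import re
from collections import Counter
from itertools import combinations


def tokenize(text):
    """Lower-cased words (Unicode-aware, so 'steinbrück' stays one word)."""
    return re.findall(r"\w+", text.lower())


def frequent_words(texts, n, stopwords=frozenset()):
    """The n most frequent words across texts, ignoring stopwords."""
    counts = Counter(w for text in texts for w in tokenize(text) if w not in stopwords)
    return {word for word, _ in counts.most_common(n)}


def cooccurrences(tweets, vocabulary):
    """Count how often two vocabulary words occur together in the same tweet."""
    pairs = Counter()
    for tweet in tweets:
        present = sorted(set(tokenize(tweet)) & set(vocabulary))
        pairs.update(combinations(present, 2))
    return pairs
```

With hypothetical variables `transcript`, `tweets` and `stopwords`, intersecting `frequent_words([transcript], 100, stopwords)` with `frequent_words(tweets, 100, stopwords)` gives words that are prominent in both the debate and the Twitter discussion; `cooccurrences(tweets, shared).most_common(10)` then lists the pairs that appear together most often.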


A scheme

The process: collect, store, analyze (a scheme)


Our implementation

datacollection.followthenews-uva.cloudlet.sara.nl

yourTwapperkeeper: Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database.

rsshond: Calls the RSS feeds of news sites once per hour, saves the title, time, header, and teaser of all new articles into a CSV table, follows the link to the full text and downloads it.

snapshot: Visits some URLs four times a day and downloads them.
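rsshond’s own code is not shown here. As a rough sketch of its core step, the following parses an RSS 2.0 feed with the standard library and appends articles not seen before to a CSV table. It assumes the feed’s `<description>` holds the teaser, leaves out downloading the full text, and leaves fetching and scheduling to the caller (e.g. `urllib.request.urlopen(feed_url).read()` run once an hour from cron). All names are hypothetical.

```python
import csv
import xml.etree.ElementTree as ET

FIELDS = ["title", "time", "link", "teaser"]


def parse_rss(xml_text):
    """Extract title, publication time, link and teaser of every <item> in an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "time": item.findtext("pubDate", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            # Assumption: the teaser is the item's <description>
            "teaser": item.findtext("description", default="").strip(),
        })
    return items


def append_new_items(items, csv_file, seen_links):
    """Append items whose link is not in seen_links to csv_file; return the new items."""
    new = [item for item in items if item["link"] not in seen_links]
    # Append mode, no header row: the same table grows with every run
    with open(csv_file, "a", newline="", encoding="utf-8") as f:
        csv.DictWriter(f, fieldnames=FIELDS).writerows(new)
    seen_links.update(item["link"] for item in new)
    return new
```

Keeping `seen_links` as a set means an article that stays in the feed for several hours is stored only once; a long-running collector would also need to persist that set between runs.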


How to access the collected data?

Apache web server: Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl.

SSH (scp): Transfer data directly to your computer or to another server (like speeltuin.followthenews-uva.cloudlet.sara.nl).

Beehub: Connect the server to Beehub, which can be mounted like the P: drive ("p-schijf") or accessed online.


What it is

Python


One tool to rule them all?

Of course, there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need. And it’s fun!


What is Python?

It is a programming language

• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process, apart from those of your hardware
• You can run it on all major operating systems
• And yet it is easy to learn!

It is widely used for content analysis:

• Many online resources and toolkits
• Books about NLP and web scraping with Python


You do not have to become a programmer.

If you know how to write SPSS or Stata syntax, you will understand Python. (But if you have ever had contact with any programming language, it helps.) It’s enough if you can read and modify the code.


Think of the following task

RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage?

1 The data structure: You have a folder with articles

2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned

3 A typical task for a short Python script!


You need something like this:

    for every file in folder:
        read the file
        count actors
        add new row to table with filename and actor counts
    save table

(Such a notation is called pseudocode.)
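Translated almost line by line into Python, the pseudocode could look like the following sketch, which handles several actors at once. The actor names and their patterns are hypothetical placeholders, and it assumes the articles are UTF-8 plain-text files:

```python
import csv
import re
from pathlib import Path

# Hypothetical actor patterns: one regular expression per column of the output table
ACTORS = {
    "netanyahu": re.compile(r"\bNetanyahu\b"),
    "abbas": re.compile(r"\bAbbas\b"),
    "hamas": re.compile(r"\bHamas\b"),
}


def count_actors(folder, actors=ACTORS):
    """Return one row per .txt file: the file name plus a count per actor."""
    table = []
    for path in sorted(Path(folder).glob("*.txt")):                # for every file in folder
        text = path.read_text(encoding="utf-8", errors="replace")  # read the file
        row = {"filename": path.name}
        for actor, pattern in actors.items():                      # count actors
            row[actor] = len(pattern.findall(text))
        table.append(row)                                          # add new row to table
    return table


def save_table(table, filename, actors=ACTORS):
    """Save the table as CSV with a header row."""                 # save table
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["filename", *actors])
        writer.writeheader()
        writer.writerows(table)
```

Adding an actor then only means adding one line to `ACTORS`; the loop itself stays the same.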


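    import csv
    import re
    from os import listdir
    from os.path import isfile, join

    # Folder with one text file per article (raw string, so the backslashes stay literal)
    mypath = r"C:\Users\Ricarda\Documents\Artikelen"
    # "Israel", followed later on the same line by minister, politician or [Aa]uthorit(y/ies)
    regex54 = re.compile(r"Israel.*?(?:minister|politician|[Aa]uthorit)")
    filename_list = []
    matchcount54_list = []

    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for f in onlyfiles:
        matchcount54 = 0
        with open(join(mypath, f), "r") as artikel:
            for line in artikel:
                # add the number of matches in this line to the article's total
                matchcount54 += len(regex54.findall(line))
        filename_list.append(f)
        matchcount54_list.append(matchcount54)

    # One row per article: file name and number of matches
    output = zip(filename_list, matchcount54_list)
    with open("overzichtstabel.csv", "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerows(output)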


This is not too different from a script Jelle uses for his dissertation. The main difference: he doesn’t search for regular expressions, but calculates document similarity (see slides-jelle.pdf).
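The slides do not say which similarity measure Jelle uses. One common choice is the cosine similarity of word-count vectors; a minimal sketch of that measure, offered only as an illustration:

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Count how often each lower-cased word occurs in the text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """1.0 = identical word proportions, 0.0 = no words in common."""
    a, b = bag_of_words(text_a), bag_of_words(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0
```

Applied to all pairs of articles, this yields a similarity matrix. Real applications usually remove stopwords and weight words (e.g. with tf-idf) before comparing.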


When to use it

When to use Python


1st group of tasks

Highly repetitive tasks: Simple tasks (counting things, comparing texts, . . .) that can be described in a formalized way. Automating them saves time even with few cases, and there is virtually no size limit.

Example: Retweets start with "RT", optionally followed by a space, and then the user name (e.g. "RT @damian0604"). This makes them very easy to identify automatically.
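A regular expression expresses this rule in one line. This sketch assumes the classic manual retweet format "RT @username: ..." and requires the @, so that tweets merely starting with words like "RTL" are not counted:

```python
import re

# Assumption: retweets look like "RT @username: original text" (space after RT optional)
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """True if the tweet starts with "RT", an optional space, and an @-mention."""
    return bool(RETWEET.match(tweet))
```

`[t for t in tweets if not is_retweet(t)]` would then keep only original tweets.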


2nd group of tasks

Tasks for which specific Python modules exist: There are many modules suitable for text analysis, so you basically only have to write code for data input and output.

Example: Sentiment analysis
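To illustrate that only input and output need to be written, here is a sketch that reads a CSV file of texts, applies any scoring function, and writes the result to a new CSV. The file layout (a column named text) and the function name are assumptions. With the Pattern package from the installation instructions, the scoring function could be something like `lambda t: sentiment(t)[0]` after `from pattern.nl import sentiment`, since Pattern’s `sentiment()` returns a (polarity, subjectivity) tuple; check the Pattern documentation for the version you installed.

```python
import csv


def score_file(infile, outfile, score, column="text"):
    """Copy infile to outfile, adding a 'sentiment' column computed by score(text)."""
    with open(infile, newline="", encoding="utf-8") as f_in:
        reader = csv.DictReader(f_in)
        fieldnames = reader.fieldnames + ["sentiment"]
        with open(outfile, "w", newline="", encoding="utf-8") as f_out:
            writer = csv.DictWriter(f_out, fieldnames=fieldnames)
            writer.writeheader()
            for row in reader:
                # The module does the analysis; this script only handles input and output
                row["sentiment"] = score(row[column])
                writer.writerow(row)
```

Because the scoring function is passed in, you can swap sentiment modules without touching the input and output code.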


3rd group of tasks

APIs, RSS, web scraping, etc.: You can use Python if you want to collect and store information.

Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
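A minimal scraping sketch with Python’s built-in `html.parser`, which collects the text of all `<p>` elements on a page. Which elements hold the relevant text differs per site, so treating `<p>` as the article text is an assumption; fetching the page (e.g. with `urllib.request.urlopen(url).read().decode("utf-8")`) and respecting a site’s terms of use are left to you. The class and function names are hypothetical.

```python
from html.parser import HTMLParser


class ParagraphScraper(HTMLParser):
    """Collect the text of all <p> elements on a web page."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self.paragraphs.append("")

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        # Text inside nested tags such as <a> is included as well
        if self._in_p:
            self.paragraphs[-1] += data


def scrape_paragraphs(html):
    """Return the non-empty paragraph texts of an HTML document."""
    parser = ParagraphScraper()
    parser.feed(html)
    parser.close()
    return [p.strip() for p in parser.paragraphs if p.strip()]
```

For messier pages, a dedicated parsing library is usually more robust, but the standard library is enough to get started.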


When not to use it

When not to use Python


Maybe you do not need to write a Python script . . .

. . . when there are already suitable tools available. Sometimes, the perfect ready-made tool already exists.

But still, sometimes it is more efficient to write something that does exactly what you want.

Example: Axel Bruns’ awk scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I’d do it in Python, but hey, he already did it with awk and it works.


And, let’s face it, . . .

. . . we are not programmers. So some tasks may be too complex for us to program ourselves.

But there is a huge online community that can help you.


Recap

1 Big Data? What are we talking about?
   • Exploring the field
   • Some examples

2 The process: collect, store, analyze
   • A scheme
   • Our implementation

3 Python
   • What it is
   • When to use it
   • When not to use it

4 Questions?


After the break

Hands-on! Exploring a basic Python script


Questions or comments?

Damian Trilling

[email protected] · @damian0604

www.damiantrilling.net
