Analyzing social media with Python and other tools (1/4)
Good morning! Enjoy your coffee and install PuTTY and Notepad++ via "Software Maintenance/Application Catalogue". Also install the Pattern package (see my e-mail). Thanks.
Big Data? What are we talking about? The process: collect, store, analyze Python Questions?
Hands-on Workshop: Big (Twitter) Data
Damian Trilling
[email protected]
@damian0604
www.damiantrilling.net
Department of Communication Science, Universiteit van Amsterdam
30 January 2014, 9.30
#bigdata Damian Trilling
The next one and a half days
You’ll hear about
• Collecting social media data via APIs, RSS and scraping (and the tools for it)
• Technical infrastructure (via SURFsara)
• Python
• Sentiment analysis
• Automated coding
• Frequencies and other statistics
• Social network analysis with Gephi
• . . .
In this session (1/4):
1 Big Data? What are we talking about?
    Exploring the field
    Some examples
2 The process: collect, store, analyze
    A scheme
    Our implementation
3 Python
    What it is
    When to use it
    When not to use it
4 Questions?
Exploring the field
What’s big data? What are we talking about?
Today is a hands-on workshop, so let’s keep this important (!) discussion for later.
So, no definition, but some brief thoughts:
• Existing data (≠ experiments or surveys)
• Too big to code manually
• Too big to handle with normal tools
• New research questions
• Call to revisit the relationship between theory and empirical research
Today, . . .
• we are not going to talk about REALLY BIG data,
• but we will have some exercises on datasets a normal computer can handle
Tomorrow, . . .
• we will also learn about scaling up these techniques
• SURFsara provides infrastructure for this
Some sources
• Social network sites
• RSS feeds
• Databases
• Scraping text from the web
• . . .
It’s out there!
You only have to collect it.
But why should we care?
We can answer new questions
• Find needles in haystacks
• Identify networks, co-word analysis, linguistic analysis, . . .
• Verify our theories in larger datasets
It makes sense
• There are things that computers are simply better at than humans, e.g. counting things
• Having human coders look for words in texts is like calculating a regression analysis by hand
Some examples
A recent master thesis
The needle in the haystack
Imagine you want to analyze some very rare content.
Normal sampling won’t work, that’s for sure.
So you’d better collect everything first
Getting all news coverage from Dutch news sites
1 Collect all articles from nine news sites during a period of two months, resulting in a database with 74,000 articles.
2 Filter articles containing specific keywords.
3 The remaining 292 articles were then coded manually.
Pöll, B. (2013). Social media: new sources, new profession? A content analysis of the use of social media as a source for journalists in online news articles. Master Thesis, Universiteit van Amsterdam.
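The filtering step (step 2) could look roughly like the sketch below. It is not the script used in the thesis; the article structure, the function name filter_articles and the keywords are assumptions made for illustration only.

```python
import re


def filter_articles(articles, keywords):
    """Return the articles whose text mentions at least one keyword."""
    # One case-insensitive pattern with word boundaries,
    # e.g. \b(?:twitter|facebook)\b
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return [a for a in articles if pattern.search(a["text"])]


# Hypothetical mini-corpus standing in for the collected articles
articles = [
    {"id": 1, "text": "The minister announced the plan on Twitter."},
    {"id": 2, "text": "Heavy rain is expected this weekend."},
    {"id": 3, "text": "A Facebook post by the mayor caused a stir."},
]

selected = filter_articles(articles, ["twitter", "facebook"])
print([a["id"] for a in selected])  # [1, 3]
```

Only the articles that survive this filter need to be coded by hand, which is what turns 74,000 articles into a manageable set.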
It’s just one line of code!
urls.txt
http://www.gmx.at/themen/wissen/mensch/108g5xi-baeuerlich-schiefe-zaehne
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/408g740-fuermann-bittet-um-verzeihung
http://www.gmx.at/themen/nachrichten/aufruhr-arabien/268g70u-regierung-will-zuruecktreten
http://www.gmx.at/themen/nachrichten/panorama/828g54y-neues-zur-klage-gegen-republik
http://www.gmx.at/themen/nachrichten/panorama/968g72s-millionstrafe-wegen-oelpest
http://www.gmx.at/themen/unterhaltung/klatsch-tratsch/368g6yc-kein-babybauch-nur-fast-food
. . .
wget command
wget -i urls.txt
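For comparison, here is a rough Python counterpart to downloading every URL listed in a file. It is only a sketch using the standard library: the function names and the fetch parameter are made up for this example, and wget remains the shorter solution.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def read_url_list(path):
    """Return the non-empty lines of a URL list file."""
    lines = Path(path).read_text().splitlines()
    return [line.strip() for line in lines if line.strip()]


def local_name(url):
    """Use the last path segment as file name, or index.html if there is none."""
    name = urlparse(url).path.split("/")[-1]
    return name or "index.html"


def download_all(list_file, target_dir, fetch=None):
    """Download every URL in list_file into target_dir; return the saved paths."""
    if fetch is None:
        # Default: a plain HTTP GET via the standard library
        def fetch(url):
            with urlopen(url) as response:
                return response.read()
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    saved = []
    for url in read_url_list(list_file):
        destination = target / local_name(url)
        destination.write_bytes(fetch(url))
        saved.append(destination)
    return saved
```

Calling download_all("urls.txt", "articles") would store one file per URL in the folder articles. The optional fetch argument only exists so that the function can be tried out without a network connection.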
A recent bachelor thesis
Tone in tweets
Imagine you want to know something about someone’s behavior on Twitter. Or how a specific topic is discussed on Twitter.
Do you really want to go through thousands of tweets by hand?
So you’d better think about automating your coding
Finding out how negative or positive politicians are towards their opponents
The student took lists of positive and negative words and made additional lists naming each politician’s opponents.
She used a Python script to check which type of words was used to refer to opponents.
For further analysis, the results were imported into SPSS.
Schut, L. (2013). Verenigde Staten vs. Verenigd Koningrijk: Een automatische inhoudsanalyse naar verklarende factoren voor het gebruik van positive campaigning en negative campaigning door vooraanstaande politici en politieke partijen op Twitter. Bachelor Thesis, Universiteit van Amsterdam.
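A dictionary-based check of this kind might be sketched as follows. This is not the student’s script: the three word lists are tiny hypothetical stand-ins, and the function name is invented for this example.

```python
import re

# Hypothetical word lists; a real study would use much longer, validated lists
POSITIVE = {"good", "great", "strong"}
NEGATIVE = {"bad", "weak", "failed"}
OPPONENTS = {"obama", "romney"}


def tone_towards_opponents(tweet):
    """Count positive and negative words in a tweet that mentions an opponent."""
    words = re.findall(r"[a-z']+", tweet.lower())
    if not OPPONENTS.intersection(words):
        return None  # the tweet does not refer to an opponent
    return {
        "positive": sum(w in POSITIVE for w in words),
        "negative": sum(w in NEGATIVE for w in words),
    }


print(tone_towards_opponents("Romney failed on jobs, a weak plan"))
# {'positive': 0, 'negative': 2}
print(tone_towards_opponents("Great rally today!"))
# None
```

Writing one such result per tweet to a CSV file gives a table that can be opened in SPSS for further analysis.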
Frame adoption on Twitter
Which phrases used by Merkel and Steinbrück on TV make it to the #tvduell discussion on Twitter?
Identify frequently used words in the transcript of the debate and in tweets.
Find co-occurrences.
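The idea can be sketched in a few lines: take the most frequent words of the transcript and count in how many tweets each of them shows up. The texts below are invented English placeholders, and the function names and the minimum word length are assumptions, not the procedure used in the actual study.

```python
import re
from collections import Counter


def tokenize(text):
    """Split a text into lowercase words."""
    return re.findall(r"\w+", text.lower())


def frequent_words(text, n, min_length=4):
    """Return the n most frequent words with at least min_length characters."""
    words = [w for w in tokenize(text) if len(w) >= min_length]
    return [word for word, _ in Counter(words).most_common(n)]


def adoption(transcript, tweets, n=10):
    """For each frequent transcript word, count the tweets that contain it."""
    tokenized = [set(tokenize(t)) for t in tweets]
    return {
        word: sum(word in tokens for tokens in tokenized)
        for word in frequent_words(transcript, n)
    }


# Invented placeholder data
transcript = "We need growth. Growth creates jobs, and jobs need growth."
tweets = [
    "Growth again? #tvduell",
    "She only talks about jobs and growth #tvduell",
    "Boring debate #tvduell",
]
print(adoption(transcript, tweets, n=3))
```

Here the word growth from the transcript appears in two of the three tweets, jobs in one, and need in none.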
The process: collect, store, analyze
A scheme
Our implementation
datacollection.followthenews-uva.cloudlet.sara.nl
yourTwapperkeeper: Continuously calls the Twitter API and saves all tweets containing specific hashtags to a MySQL database.
rsshond: Calls the RSS feeds of news sites once per hour, saves title, time, header, and teaser of all new articles into a CSV table, follows the link to the full text, and downloads it.
snapshot: Visits some URLs four times a day and downloads them.
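To give an impression of what an RSS collector has to do, here is a minimal sketch that parses a feed and appends only the new items to a CSV table. It is not the code of rsshond: the function names, the column names and the sample feed are assumptions, and fetching the feed every hour as well as downloading the full texts are left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

FIELDS = ["title", "time", "link", "teaser"]


def parse_feed(rss_xml):
    """Extract title, publication time, link and teaser from an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        rows.append({
            "title": item.findtext("title", default=""),
            "time": item.findtext("pubDate", default=""),
            "link": item.findtext("link", default=""),
            "teaser": item.findtext("description", default=""),
        })
    return rows


def append_new_items(rows, csv_file, seen_links):
    """Write only the items whose link has not been stored before."""
    writer = csv.DictWriter(csv_file, fieldnames=FIELDS)
    new_rows = [r for r in rows if r["link"] not in seen_links]
    writer.writerows(new_rows)
    seen_links.update(r["link"] for r in new_rows)
    return len(new_rows)


# Invented sample feed
SAMPLE = """<rss version="2.0"><channel><title>Example news</title>
<item><title>First story</title><link>http://example.org/1</link>
<pubDate>Thu, 30 Jan 2014 09:30:00 +0100</pubDate>
<description>Teaser one</description></item>
<item><title>Second story</title><link>http://example.org/2</link>
<pubDate>Thu, 30 Jan 2014 10:00:00 +0100</pubDate>
<description>Teaser two</description></item>
</channel></rss>"""

table = io.StringIO()  # stands in for an opened CSV file
seen = set()
print(append_new_items(parse_feed(SAMPLE), table, seen))  # 2
print(append_new_items(parse_feed(SAMPLE), table, seen))  # 0, nothing new
```

Keeping track of the links already stored is what makes it safe to call the same feed again an hour later.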
How to access the collected data?
Apache webserver: Download the data from http://datacollection.followthenews-uva.cloudlet.sara.nl.
SSH (scp): Transfer data directly to your computer or another server (like speeltuin.followthenews-uva.cloudlet.sara.nl).
Beehub: Connect the server to Beehub, which can be mounted like the P drive ("p-schijf") or accessed online.
Python
What it is
One tool to rule them all?
Of course there are ready-made tools for some of the questions we want to answer. But for many, there aren’t. Python offers us the possibility to build exactly the tool we need.
And it’s fun!
What is Python?
It is a programming language
• It is flexible. You can use it for (in principle) any kind of data
• There are virtually no limits regarding the amount of data to process
• You can run it on every platform
• And yet it is easy to learn!
It is widely used for content analysis
• Many online resources and toolkits
• Books about NLP and web scraping with Python
You do not have to become a programmer.
If you know how to write SPSS or STATA syntax, you will understand Python.
(But if you have ever had contact with any programming language, it helps.)
It’s enough if you can read and modify the code.
Think of the following task
RQ: What are the differences in terms of actors mentioned between Israeli and Palestinian news coverage?
1 The data structure: You have a folder with articles
2 The desired output: You want a table with the file names and a column per actor, counting how often they are mentioned
3 A typical task for a short Python script!
You need something like this:
    for every file in folder:
        read the file
        count actors
        add new row to table with filename and actor counts
    save table
(such a notation is called pseudo-code)
    import csv
    import re
    from os import listdir
    from os.path import isfile, join

    # Folder with one article per file (raw string, so backslashes stay literal)
    mypath = r"C:\Users\Ricarda\Documents\Artikelen"
    # Pattern for one actor: "Israel" followed on the same line by one of the terms
    regex54 = re.compile(r"Israel.*(?:minister|politician|[Aa]uthorit)")
    filename_list = []
    matchcount54_list = []
    onlyfiles = [f for f in listdir(mypath) if isfile(join(mypath, f))]
    for f in onlyfiles:
        matchcount54 = 0
        # Count the matches line by line
        with open(join(mypath, f), "r") as artikel:
            for line in artikel:
                matchcount54 += len(regex54.findall(line))
        filename_list.append(f)
        matchcount54_list.append(matchcount54)

    # One row per file: file name, number of matches
    output = zip(filename_list, matchcount54_list)
    with open("overzichtstabel.csv", "w", newline="") as csvfile:
        csv.writer(csvfile).writerows(output)
This is not too different from a script Jelle uses for his dissertation. The main difference: he doesn’t match regular expressions, but calculates document similarity.
slides-jelle.pdf
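To illustrate what "document similarity" can mean, here is a sketch of one common measure, the cosine similarity of word counts. Whether Jelle’s script uses this particular measure is not stated on the slides; the measure and the function names are chosen here purely as an example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Count how often each lowercase word occurs in a text."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine_similarity(text_a, text_b):
    """Cosine of the angle between two word-count vectors (0 = no shared words)."""
    a, b = word_counts(text_a), word_counts(text_b)
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity("the talks failed", "the talks resumed"))  # about 0.67
print(cosine_similarity("peace talks", "football match"))          # 0.0
```

The loop over all files in a folder stays the same as in the script above; only the counting step is replaced by a comparison of two texts.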
When to use it
When to use Python
1st group of tasks
Highly repetitive tasks
Simple tasks (counting things, comparing texts, . . . ) that can be described in a formalized way. Saves time even with few cases, but there is virtually no size limit.
Example: Retweets start with RT, optionally followed by a space, and some letters. So it is very easy to identify them automatically.
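The retweet rule translates almost directly into a regular expression. The sketch below additionally requires an @ before the user name; that is an assumption added here so that words such as "RTL" are not counted as retweets.

```python
import re

# "RT", an optional space, then @ and a user name (the @ is an added assumption)
RETWEET = re.compile(r"^RT ?@\w+")


def is_retweet(tweet):
    """Return True if the tweet starts like a retweet."""
    return RETWEET.match(tweet) is not None


tweets = [
    "RT @damian0604: Workshop starts now #bigdata",
    "RT@someone nice",
    "RTL Nieuws is live",
    "Great workshop #bigdata",
]
print([is_retweet(t) for t in tweets])  # [True, True, False, False]
```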
2nd group of tasks
Tasks for which specific Python modules exist
There are thousands of modules suitable for text analysis. You basically only have to write code for data input and output.
Example: Sentiment analysis
3rd group of tasks
APIs, RSS, web scraping . . .
You can use Python if you want to collect and store information.
Example: Collecting bios of Twitter users, scraping the web (data journalism!), downloading Facebook data
When not to use it
When not to use Python
Maybe you do not need to write a Python script . . .
. . . when there are already suitable tools available.
Sometimes, the perfect ready-made tool already exists.
But still, sometimes it is more efficient to write something that does exactly what you want.
Example: Axel Bruns’ awk scripts for Twitter analysis (www.mappingonlinepublics.net). If I had to write such a tool, I’d do it in Python, but hey, he already did it with awk and it works.
And, let’s face it, . . .
. . . we are not programmers.
So maybe some tasks are too complex for us to program ourselves.
But there is a huge online community that can help you.
Recap
1 Big Data? What are we talking about?
    Exploring the field
    Some examples
2 The process: collect, store, analyze
    A scheme
    Our implementation
3 Python
    What it is
    When to use it
    When not to use it
4 Questions?
After the break
Hands on! Exploring a basic Python script
Questions or comments?
Damian Trilling
[email protected]
@damian0604
www.damiantrilling.net