Five Steps to Get Tweets Sent by a List of Users


Upload: weiai-wayne-xu

Post on 17-Jul-2015


Collecting Tweets Sent by a List of Users

• This Python tutorial is brought to you by CuriosityBits.com, with the generous support from Dr. Gregory D. Saxton (http://social-metrics.org/)

Five Steps…

1. Install Python and the necessary Python libraries.

2. Set up Twitter API keys.

3. Prepare a list of Twitter handles (screen names) in .csv format.

4. Create a SQLite database using SQLite Browser, and import the Twitter handle list.

5. Modify the Python script and run it to get results!

Download the Python script: https://drive.google.com/file/d/0Bwwg6GLCW_IPVmNBMUV4bVhUU0U/edit?usp=sharing

The results you will get…

You will get an ample amount of metadata for each tweet collected.

Here is a breakdown of some important output variables:

Name                        Definition
tweet_id                    The unique identifier of the tweet
inserted_date               When the tweet was downloaded into your database
language                    The language of the tweet
retweeted_status            Whether the tweet is a retweet
content                     The content of the tweet
from_user_screen_name       The Twitter handle of the sender
created_at                  When the tweet was sent
from_user_followers_count   The number of followers the sender has
from_user_friends_count     The number of users the sender is following
from_user_listed_count      How many times the sender is listed by other users
from_user_statuses_count    The number of tweets sent by the sender
from_user_description       The profile bio of the sender
from_user_location          The location of the sender
from_user_created_at        When the sender’s Twitter account was created
retweet_count               How many times the tweet has been retweeted
entities_urls               The URLs included in the tweet
entities_urls_count         The number of URLs included in the tweet
entities_hashtags           The hashtags included in the tweet
entities_hashtags_count     The number of hashtags in the tweet
entities_mentions           The Twitter handles mentioned in the tweet
in_reply_to_screen_name     The handle of the user the sender is replying to
in_reply_to_status_id       The unique identifier of the tweet the sender is replying to
entities_expanded_urls      Complete URLs extracted from short URLs
json_output                 The entire metadata in JSON format, including metadata not parsed into columns
entities_media_count        NA
media_expanded_url          NA
media_url                   NA
media_type                  NA
video_link                  NA
photo_link                  NA
twitpic                     NA
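To make the table more concrete, here is a minimal sketch of how a few of these variables could be pulled out of one tweet once it has been decoded from JSON into a Python dictionary. The function name summarize_tweet is hypothetical, and joining lists with commas is an assumption; the tutorial's script may parse and store these values differently. The key names (id_str, text, entities, user, and so on) are the ones found in Twitter's tweet JSON.

```python
def summarize_tweet(tweet):
    """Pull a few of the output variables out of one tweet dictionary.

    Hypothetical helper for illustration; not the tutorial script's own code.
    """
    entities = tweet.get("entities", {})
    user = tweet.get("user", {})
    hashtags = [h["text"] for h in entities.get("hashtags", [])]
    urls = [u["expanded_url"] for u in entities.get("urls", [])]
    mentions = [m["screen_name"] for m in entities.get("user_mentions", [])]
    return {
        "tweet_id": tweet.get("id_str"),
        "content": tweet.get("text"),
        "created_at": tweet.get("created_at"),
        "retweet_count": tweet.get("retweet_count"),
        "from_user_screen_name": user.get("screen_name"),
        "from_user_followers_count": user.get("followers_count"),
        # Assumption: lists are stored as comma-separated strings.
        "entities_hashtags": ", ".join(hashtags),
        "entities_hashtags_count": len(hashtags),
        "entities_expanded_urls": ", ".join(urls),
        "entities_urls_count": len(urls),
        "entities_mentions": ", ".join(mentions),
        "in_reply_to_screen_name": tweet.get("in_reply_to_screen_name"),
    }
```

Missing keys simply come back as None or as empty values, so the sketch does not fail on tweets without hashtags, URLs, or mentions.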


Step 1. Install Python and necessary libraries

Download Anaconda Python 2.7 to run Python scripts. Anaconda is free to download. Once you’ve installed Anaconda, you can modify scripts in Spyder.

• Do you know how to install the necessary Python libraries? If not, please review p. 8 of http://curiositybits.com/python-for-mining-the-social-web/python-tutorial-mining-twitter-user-profile/

Install the following libraries:

• Simplejson (https://pypi.python.org/pypi/simplejson)

• Sqlite3 (http://sqlite.org/)

• Sqlalchemy (http://www.sqlalchemy.org/)

• Twython (https://twython.readthedocs.org/en/latest/index.html)
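Once the libraries are installed, a quick check like the sketch below can confirm that Python can find them. The function name check_libraries is hypothetical, and the module names are the lowercase import names of the four libraries listed above. Note that sqlite3 ships with Python itself, so it normally needs no separate installation.

```python
import importlib

# Import names of the four libraries listed above.
REQUIRED = ["simplejson", "sqlite3", "sqlalchemy", "twython"]


def check_libraries(names=REQUIRED):
    """Return a dict mapping each module name to True if it can be imported."""
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            status[name] = False
    return status


if __name__ == "__main__":
    for name, ok in sorted(check_libraries().items()):
        print("%-12s %s" % (name, "OK" if ok else "missing"))
```

Any library reported as missing still needs to be installed before running the tutorial script.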

Step 2: Set up Twitter API Keys.

First, go to https://dev.twitter.com/ and sign in to your Twitter account. Go to the My applications page to create an application.

Fill in the application form:

• Enter any name that makes sense to you.

• Enter any descriptive text that makes sense to you.

• For both URL fields, you can enter any legitimate URL; here, I put in the URL of my institution.

Then, go to the API Keys page, scroll down to the bottom, and click Create my access token. Wait a few minutes and refresh the page, and you will get all your keys!

You need the API Key, API Secret, Access token, and Access token secret.

Step 3: Prepare a Twitter handle list

Create a list of the Twitter handles whose tweets we are interested in collecting. You can create the list in Excel and save it in .csv format. The list should contain three columns (in accordance with the configuration in the Python script).

The first column lists sequential numbers beginning with 1.

The second column lists Twitter handles.

For the third column, I entered 1 throughout, but you can leave it blank.
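If you would rather build the list in Python than in Excel, the three-column layout described above can be sketched as follows. The function name write_handle_list and the example handles are hypothetical, and the sketch is written for Python 3 (the csv module is opened slightly differently under Python 2.7). No header row is written, on the assumption that the column names are set later in the SQLite browser.

```python
import csv


def write_handle_list(path, handles, user_type="1"):
    """Write one row per handle: sequence number, handle, user type."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        # The first column counts up from 1, as in the tutorial.
        for number, handle in enumerate(handles, start=1):
            writer.writerow([number, handle, user_type])


# Example call with hypothetical handles:
# write_handle_list("handles.csv", ["nasa", "twitter"])
```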

Step 4: Create a SQLite database

Go to http://sqlitebrowser.sourceforge.net/ and download SQLite Database Browser. It allows you to view and edit SQLite databases.

• Use File-New Database to create a new database.

• Remember the database filename you enter.

• The default file extension is .sqlite. To prevent future complications, add the extension .sqlite when typing the filename.

Use File-Import Table From CSV File to import the .csv file you’ve saved. Name the imported table accounts. This table name corresponds to the one we will use in the Python script. After you click Create, the csv list will be loaded into the database, and you can browse it in Browse Data. Lastly, remember to save the database.

Stay on the database you’ve just created.

Modify the imported table: go to Edit-Modify Tables and use Edit field to change the column names. To correspond to the Python script, name the first column rowid with Field Type Integer, the second column screen_name with Field Type String, and the third column user_type, also with Field Type String. In the end, the table has three columns: rowid (Integer), screen_name (String), and user_type (String).
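For readers who prefer to script this step, the same table can be sketched with Python's built-in sqlite3 module instead of the SQLite browser. This is an alternative under stated assumptions, not the tutorial's own method: the function name load_accounts is hypothetical, the CSV is assumed to have three columns and no header row, and TEXT is used where the browser offers the String field type.

```python
import csv
import sqlite3


def load_accounts(db_path, csv_path):
    """Create the accounts table and fill it from a three-column CSV file."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts "
        "(rowid INTEGER, screen_name TEXT, user_type TEXT)"
    )
    with open(csv_path, newline="") as f:
        # Skip empty lines; the third column may be an empty string.
        rows = [(int(r[0]), r[1], r[2]) for r in csv.reader(f) if r]
    conn.executemany("INSERT INTO accounts VALUES (?, ?, ?)", rows)
    conn.commit()
    return conn
```

Calling load_accounts("tweets.sqlite", "handles.csv") (both filenames hypothetical) would produce a database file that can still be opened and inspected in the SQLite browser.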


Step 5: Modify the script and Run

Find this block of code, and enter your API keys: API Key, API secret, Access token, and Access token secret.
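The tutorial's script expects the four keys to be pasted into the code. As a rough illustration of what such a block does, the sketch below collects the four values and hands them to Twython. The names KEY_NAMES, load_keys, and make_client are hypothetical, and reading the keys from environment variables is a swapped-in choice made here so that no real key appears in the example; pasting them in as strings works the same way.

```python
import os

# Hypothetical environment variable names for the four credentials.
KEY_NAMES = ("APP_KEY", "APP_SECRET", "OAUTH_TOKEN", "OAUTH_TOKEN_SECRET")


def load_keys(env=os.environ):
    """Return the four credentials as a tuple, or raise if any is missing."""
    missing = [name for name in KEY_NAMES if not env.get(name)]
    if missing:
        raise ValueError("Missing credentials: " + ", ".join(missing))
    return tuple(env[name] for name in KEY_NAMES)


def make_client(keys):
    """Build a Twython client from (API key, API secret, token, token secret)."""
    # Imported here so load_keys can be used even before Twython is installed.
    from twython import Twython
    app_key, app_secret, oauth_token, oauth_token_secret = keys
    return Twython(app_key, app_secret, oauth_token, oauth_token_secret)
```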


Find this block of code, and enter the filename and file path of the SQLite database you have created.

You need to match the file path and file name to the SQLite database you’ve created (RECOMMENDED).

If the Python script file and the created SQLite database are in the same folder, just paste your database name here.
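The two options above (full path versus bare filename) can be sketched as follows, assuming the script opens the database through SQLAlchemy, which is on the library list; the source does not show the actual line, so treat this as an illustration only. The function name database_url and the example filename are hypothetical. In SQLAlchemy's SQLite URLs, the part after sqlite:/// is the file path.

```python
import os


def database_url(filename, folder=None):
    """Return a SQLAlchemy-style SQLite URL for a database file."""
    if folder is None:
        # Bare filename: resolved relative to the current working folder.
        return "sqlite:///" + filename
    # Explicit folder: the URL carries the full path to the file.
    return "sqlite:///" + os.path.join(folder, filename)


# Example with a hypothetical filename:
# database_url("tweets.sqlite")  ->  "sqlite:///tweets.sqlite"
```

A bare filename is only found if the script is run from the folder that holds the database, which is one reason a full path can be the safer choice.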

Step 5: Specify search criteria

You can refine the search criteria, e.g.:

Count: Specifies the number of tweets to try and retrieve for each Twitter handle. The maximum value is 200.

More on https://dev.twitter.com/docs/api/1/get/statuses/user_timeline
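As a minimal sketch of how the count setting reaches the API, the helper below builds the request parameters and caps count at the maximum of 200 mentioned above. The names timeline_params and fetch_timeline are hypothetical; get_user_timeline is the Twython method for this endpoint, and client is assumed to be an authenticated Twython object.

```python
MAX_COUNT = 200  # maximum value of the count parameter, per the text above


def timeline_params(screen_name, count=MAX_COUNT):
    """Build keyword arguments for a user-timeline request, capping count."""
    capped = max(1, min(int(count), MAX_COUNT))
    return {"screen_name": screen_name, "count": capped}


def fetch_timeline(client, screen_name, count=MAX_COUNT):
    """Request one user's recent tweets through a Twython-like client."""
    return client.get_user_timeline(**timeline_params(screen_name, count))
```

Other criteria from the API documentation linked above could be added to the dictionary in the same way.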

Step 5: Modify the script and Run

In Spyder, go to Run and choose Execute in a new dedicated Python interpreter. The first option, Execute in current Python or IPython interpreter, does not work on my end, but it may work on your computer.

Some issues you may encounter

“Too many values to unpack” errors!

Don’t panic! It is almost certain that you will hit roadblocks when learning Python, so be prepared to debug.

This error probably occurs because you’ve saved the Python script file somewhere other than the default Python folders.

But what is a default Python folder?

Find your default Python folders

A simple way to find your default Python folders on a Windows machine: in the Start menu, right-click Computer and choose Properties.

Folders listed here are your default Python folders.
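The folders Python searches can also be listed from Python itself. The sketch below assumes that "default Python folders" refers to Python's module search path, sys.path, which is an interpretation rather than something the tutorial states; the function name python_folders is hypothetical.

```python
import sys


def python_folders():
    """Return the folders Python searches when importing modules."""
    # Empty entries stand for the current folder and are skipped here.
    return [folder for folder in sys.path if folder]


if __name__ == "__main__":
    for folder in python_folders():
        print(folder)
```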

On my machine, C:\Anaconda\Lib\site-packages is one of the default Python folders. If the Python script is running successfully, it should give you these.

Some issues you may encounter

Oops! Error again!

The Twitter API has a rate limit. It restricts how many tweets you can get within a time frame. Based on the current script, you can cover roughly 300 users in a 15-minute window. Once you hit the limit, you will see the error message pop up.

There are two ways to get around the restriction:

1. Wait 15 minutes before another run.

2. Create multiple Twitter apps and get multiple API keys. Once you use up the quota in one run, paste in a new key to start a new run!
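The first option, waiting out the 15-minute window, can also be automated. The sketch below is not part of the tutorial's script: call_with_wait is a hypothetical helper that retries a function after a pause whenever a given exception is raised. With Twython, the exception to pass in would be TwythonRateLimitError.

```python
import time


def call_with_wait(func, rate_limit_error, wait_seconds=15 * 60,
                   max_attempts=3, sleep=time.sleep):
    """Call func(); on a rate-limit error, wait and try again.

    The error is re-raised if the last attempt also fails.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except rate_limit_error:
            if attempt == max_attempts - 1:
                raise
            sleep(wait_seconds)
```

For example, assuming a client object and a handle already exist, call_with_wait(lambda: client.get_user_timeline(screen_name=handle), TwythonRateLimitError) would pause for 15 minutes after a rate-limit error and then repeat the request.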


But pay attention to the block of code shown above. The number 0 means that the script starts with the user listed in the first row.

Because we will hit the rate limit, you will need to run the code multiple times to finish crawling all users’ tweets. So, make sure to change the starting row number!

For example, if in the first run you’ve covered user (0) to user (150) and run into the rate limit, you should put 151 as the starting number in the second run.
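The idea of a starting row number can be sketched as a query that skips the users already covered. The function name handles_from is hypothetical, and the tutorial's script may select its starting point differently; the sketch only assumes the accounts table from Step 4 and counts from 0, as in the example above.

```python
def handles_from(conn, start=0):
    """Return screen names in row order, skipping the first `start` users."""
    cursor = conn.execute(
        # LIMIT -1 means "no limit" in SQLite; OFFSET skips earlier rows.
        "SELECT screen_name FROM accounts ORDER BY rowid LIMIT -1 OFFSET ?",
        (start,),
    )
    return [row[0] for row in cursor]
```

Here conn is an open sqlite3 connection to the database. After a first run that covered users 0 to 150, calling handles_from(conn, 151) would return only the users that still need to be crawled.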

Load the SQLite data into Excel

You can export the data in the SQLite database to Excel.

Use File – Export – Table as CSV to export the data into .csv format. Make sure to add the .csv file extension to the filename.
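The export can also be done from Python. The sketch below is an alternative to the menu route, written for Python 3; the function name export_table is hypothetical, and the name of the table that holds the collected tweets is not given in the tutorial, so it is passed in as an argument.

```python
import csv
import sqlite3


def export_table(db_path, table, csv_path):
    """Write every row of one table to a CSV file; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        # Quote the table name so unusual names do not break the query.
        quoted = '"%s"' % table.replace('"', '""')
        cursor = conn.execute("SELECT * FROM " + quoted)
        header = [column[0] for column in cursor.description]
        rows = cursor.fetchall()
    finally:
        conn.close()
    with open(csv_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(header)  # column names as the first line
        writer.writerows(rows)
    return len(rows)
```

The resulting file opens directly in Excel, with the column names in the first row.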