bd-aca week1a


Upload: department-of-communication-science-university-of-amsterdam

Post on 17-Jul-2015


Introducing. . . The Linux command line Python Next meetings

Big Data and Automated Content Analysis
Week 1 – Monday
»Introduction«

Damian Trilling

[email protected]
@damian0604

www.damiantrilling.net

Afdeling Communicatiewetenschap
Universiteit van Amsterdam

30 March 2015

Big Data and Automated Content Analysis Damian Trilling


Today

1 Introducing ...
    ... the people
    ... the topic
    ... the methods
    ... the tools

2 The Linux command line

3 Python

4 Next meetings


Introducing ... the people


Introducing ... Damian

dr. Damian Trilling
Lecturer Political Communication & Journalism

• studied Communication Science in Münster and at the VU 2003–2009
• PhD candidate @ ASCoR 2009–2012
• now: Universitair Docent (UD) / Assistant Professor
• interested in political communication and journalism in a changing media environment and in innovative (digital, large-scale, computational) research methods

@damian0604
[email protected]
8th floor
www.damiantrilling.net


Introducing ... Björn

Björn Burscher, MSc.
PhD Candidate

• studied Political Communication & Information Science
• currently PhD candidate @ ASCoR
• interested in automatic content analysis and elections

[email protected]


Introducing ... You

Your name?
Your background?
Your reason to follow this course?


Introducing ... the topic

⇒ on Wednesday


Introducing ... the methods

⇒ on Wednesday


Introducing ... the tools

⇒ now!


When point-and-click doesn’t help you further:
The Linux command line


Let’s switch to Linux!


Tools: The Linux command line
a.k.a. the terminal, shell or, more specifically, bash


Why?

• Direct access to your computer’s functions
• In contrast to point-and-click programs, command line programs can easily be linked to each other, scripted, ...
• Suitable for handling even huge files
    • You simply cannot open them in many GUI programs
    • ... or it takes ages
    • The command line allows you to do such things without problems
• It is reproducible (ever tried to explain to your parents on the phone where they have to click?)
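The point about huge files carries over to scripts as well. As a rough sketch in Python (the language introduced later in this session), the following reads a file one line at a time instead of opening it as a whole, much like the command-line tool grep -c does. The function name count_matching_lines, the keyword and the small temporary file are assumptions made up for this illustration; they are not part of the course material.

```python
import os
import tempfile


def count_matching_lines(path, keyword):
    """Count the lines that contain keyword, reading one line at a time."""
    matches = 0
    with open(path, encoding="utf-8") as handle:
        for line in handle:  # the file is streamed, never loaded as a whole
            if keyword in line:
                matches += 1
    return matches


# Hypothetical demo data: a tiny temporary file stands in for a huge one.
with tempfile.NamedTemporaryFile(
    "w", suffix=".txt", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("big data\nsmall data\nno match here\n")
    demo_path = tmp.name

demo_count = count_matching_lines(demo_path, "data")
print(demo_count)
os.remove(demo_path)  # clean up the temporary file
```

Because only one line is held in memory at a time, the same loop should work on files far larger than a GUI program could comfortably open.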


There are endless tutorials, cheat sheets, videos ... online. Google it!


Exercise

Take the book.
Follow the instructions in Chapter 2.


A language, not a program:
Python


What?

• A language, not a specific program
• Huge advantage: flexibility, portability
• One of the languages for data analysis. (The other one is R.)

But Python is more flexible: the original version of Dropbox was written in Python. I’d say: R for numbers, Python for text and messy stuff. But that’s just my personal view.

Which version?
We use Python 3. http://www.google.com or http://www.stackexchange.com still offer a lot of Python 2 code, but that can easily be adapted. Most notable difference: in Python 2, you write print "Hi"; this has changed to print("Hi").
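To make the print difference concrete, here is a small sketch that can be run under Python 3. It prints with the new function syntax and then checks, using the built-in compile(), that the old Python 2 statement is rejected. The variable names old_style and is_valid_in_python3 are made up for this illustration.

```python
# Python 3: print is a function, so the argument goes in parentheses.
print("Hi")

# The Python 2 statement form, kept as a string so this file itself stays valid.
old_style = 'print "Hi"'

try:
    compile(old_style, "<python2-snippet>", "exec")
    is_valid_in_python3 = True
except SyntaxError:
    # Python 3 refuses the statement form without parentheses.
    is_valid_in_python3 = False

print(is_valid_in_python3)
```

Adapting such Python 2 code found online is then often just a matter of adding the parentheses.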


If it’s not a program, how do you work with it?

Interactive mode

• Just type python3 on the command line, and you can start entering Python commands (you can leave again by entering quit())
• Great for quick try-outs, but you cannot even save your code

An editor of your choice

• Write your program in any text editor, save it as myprog.py
• and run it from the command line with ./myprog.py or python3 myprog.py
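As a minimal sketch of what myprog.py could contain: running it as python3 myprog.py works with any Python file, whereas ./myprog.py usually also needs a shebang line at the top of the file and execute permission (for example via chmod +x myprog.py). The function name greet is an arbitrary choice for this illustration.

```python
#!/usr/bin/env python3
# myprog.py: the shebang line above tells the shell which interpreter to use
# when the file is started directly as ./myprog.py


def greet(name):
    """Return a greeting for the given name."""
    return "Hello " + name


if __name__ == "__main__":
    # This part runs only when the file is executed, not when it is imported.
    print(greet("world"))
```

Unlike the interactive mode, the code now lives in a file, so it can be saved, rerun and shared.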


If it’s not a program, how do you start it?

An IDE (Integrated Development Environment)

• Provides an interface
• Both quick interactive try-outs and writing larger programs
• We use Spyder, which looks a bit like RStudio (and to some extent like Stata)


Exercises

1. Run a program that greets you.
The code for this is

    print("Hello world")

After that, do some calculations. You can do that in a similar way:

    a = 2
    print(a*3)

Just play around.

2. Take the book.
Follow the instructions in Chapter 3.
We will talk about the concepts that are introduced during the next lectures, but it helps if you first try to get started yourself; that’s less abstract.
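For the first exercise, playing around could look something like the sketch below. The variable names and values are arbitrary choices for illustration and are not taken from the book.

```python
a = 2
b = 7

product = a * 3      # multiplication
quotient = b / a     # true division gives a float in Python 3
whole = b // a       # floor division drops the remainder
remainder = b % a    # modulo returns the remainder
greeting = "Hello " + "world"  # + also joins strings

print(product, quotient, whole, remainder)
print(greeting)
```

Changing the values of a and b and rerunning is a quick way to see how each operator behaves.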


Next meetings


Wednesday, 1 April: Lecture
Introduction to the theoretical and methodological underpinnings.
Don’t forget to read the articles in advance.

Wednesday, 8 April: Lab Session
Some serious programming in Python
