Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases

Advanced R Programming - Lecture 5

Krzysztof Bartoszek (slides by Leif Jonsson and Måns Magnusson)

Linköping University

[email protected]

18 September 2017

K. Bartoszek (STIMA LiU) STIMA LiU

Lecture 5


Today

Input and output

Basic I/O

Cloud storage

web APIs: Lab

web scraping

Shiny

Relational Databases


Questions since last time?


Input and output

Format, localization and encoding... hell!

http://www.joelonsoftware.com/articles/Unicode.html

"The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)"

Unicode defines codes for all (?) characters
Multiple encodings (for a given language only a small fraction of the characters is used)
Content-Type tag for HTML
BUT: e-mail, .txt, .csv (no such tag)
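
A minimal sketch in base R of why the encoding matters for plain text files, which do not record how they were encoded. The example string and the temporary file are assumptions made for illustration.

```r
# Example text with non-ASCII characters (an illustrative assumption).
# \u escapes keep the script itself independent of the file encoding.
x <- "Link\u00f6ping, M\u00e5ns"

# The same characters need a different number of bytes per encoding.
utf8_bytes   <- charToRaw(enc2utf8(x))
latin1_bytes <- charToRaw(iconv(x, from = "UTF-8", to = "latin1"))
length(utf8_bytes)    # two bytes for each of the two non-ASCII characters
length(latin1_bytes)  # one byte per character

# Write the Latin-1 bytes to a plain text file; nothing in the file
# records which encoding was used.
tmp <- tempfile(fileext = ".txt")
writeBin(latin1_bytes, tmp)

# The reader has to state the encoding explicitly when reading back.
txt <- readLines(tmp, encoding = "latin1", warn = FALSE)
same_text <- enc2utf8(txt) == x
```

The same idea applies to `read.table()` and `read.csv()`, which take `fileEncoding` and `encoding` arguments.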


"Formats"


Localization

Local            Remote
own computer     cloud storage
local network    web pages
local database   web scraping
                 web APIs
                 remote database

Table: Local - Remote


Files on your computer

# Input simple data
read.table()
read.csv()
read.csv2()
load()

# Output simple data
write.table()
write.csv()
write.csv2()
save()
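
The functions listed above can be tried out with a small round trip. This is a sketch in base R; the data frame and the temporary file names are made up for illustration.

```r
# A small data frame to round-trip through the functions listed above.
df <- data.frame(id = 1:3, value = c(1.5, 2.25, 3), stringsAsFactors = FALSE)

csv_file  <- tempfile(fileext = ".csv")
csv2_file <- tempfile(fileext = ".csv")
rda_file  <- tempfile(fileext = ".RData")

# write.csv() uses "," as separator and "." as decimal mark.
write.csv(df, csv_file, row.names = FALSE)
df_csv <- read.csv(csv_file)

# write.csv2() uses ";" as separator and "," as decimal mark.
write.csv2(df, csv2_file, row.names = FALSE)
df_csv2 <- read.csv2(csv2_file)

# save() stores R objects by name; load() restores them under that name.
save(df, file = rda_file)
rm(df)
load(rda_file)
```

The `csv2` variants match the conventions of locales that use a decimal comma, so picking the wrong pair silently yields character columns instead of numbers.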


More complex formats

Software/data            Package
Excel                    XLConnect
SAS, SPSS, STATA, ...    foreign
XML                      XML
JSON (GeoJSON)           RJSONIO, rjson
Documents                tm
Maps                     sp
Images                   raster

Table: Format - R package


Cloud storage


Why?

Robust
Backups
Cloud computing

Can be tricky in the beginning
But how about safety?
But what about control over what is going on?
But it requires an internet connection


Localization

Arbitrary data Structured data


API Packages

Remote        Package
General       downloader
GitHub        repmis, downloader
Dropbox       rdrop
Amazon        RAmazonS3
Google Docs   googlesheets
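
The general pattern behind these packages is "download to a local file, then read it as usual". The sketch below uses base R's `download.file()` rather than any of the packages in the table, and a `file://` URL stands in for a remote address so that it runs without network access. Both choices are assumptions made for illustration. The URL is built for Unix-like paths.

```r
# Stand-in for a remote resource: a local file addressed by a file:// URL,
# so the sketch runs without network access (an assumption for illustration;
# in practice this would be an http(s):// address).
src <- tempfile(fileext = ".csv")
write.csv(data.frame(x = 1:3, y = c("a", "b", "c")), src, row.names = FALSE)
url <- paste0("file://", normalizePath(src, winslash = "/"))

# Download to a local file first, then read it like any other local file.
dest <- tempfile(fileext = ".csv")
status <- download.file(url, destfile = dest, quiet = TRUE)  # 0 means success
remote_data <- read.csv(dest)
```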


web APIs

application programming interface (API) using HTTP

"contract to 'get data' online"

more and more common

examples:

GitHub

Riksdagen (the Swedish parliament)

Statistics Sweden


RESTful

Basic principles:

Data is returned (as JSON or XML)

Each specific piece of data (resource) has its own URI

Communication is based on HTTP verbs
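
To illustrate the second principle, here is a sketch of how the URI of one specific resource could be assembled in base R. The helper `build_uri()`, the host `api.example.org` and the path are hypothetical and do not belong to any real API.

```r
# Build the URI of one specific resource from a base address, a path and
# optional query parameters. The host and path below are hypothetical.
build_uri <- function(base, path, query = list()) {
  encoded_path <- vapply(path, URLencode, character(1), reserved = TRUE)
  uri <- paste0(sub("/+$", "", base), "/", paste(encoded_path, collapse = "/"))
  if (length(query) > 0) {
    values <- vapply(query,
                     function(v) URLencode(as.character(v), reserved = TRUE),
                     character(1))
    pairs <- paste0(names(query), "=", values)
    uri <- paste0(uri, "?", paste(pairs, collapse = "&"))
  }
  uri
}

uri <- build_uri("https://api.example.org/",
                 c("stations", "12", "measurements"),
                 query = list(from = "2017-09-01", format = "json"))
```

A client would then send an HTTP request with a verb such as GET to this URI and parse the JSON or XML that comes back.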


Hypertext Transfer Protocol (http)


Verbs

Verb     Description
GET      Get "data" from server
POST     Post "data" to server (to get something)
PUT      Update "data" on server
DELETE   Delete resource on server


Status codes

Code   Description
1XX    Information from server
2XX    Yay! Gimme' data! (success)
3XX    Redirections
4XX    You failed (client error)
5XX    Server failed (server error)
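
In code, the first digit of the status code is usually all that is needed to decide how to react. A small base R sketch follows; the function name `status_class()` and the class labels are chosen here for illustration.

```r
# Map an HTTP status code to its class, following the table above.
status_class <- function(code) {
  classes <- c("informational", "success", "redirection",
               "client error", "server error")
  idx <- code %/% 100
  ifelse(idx >= 1 & idx <= 5, classes[pmin(pmax(idx, 1), 5)], "unknown")
}

status_class(c(200, 301, 404, 503))
```

A typical use is to stop with an informative error for the 4XX and 5XX classes before trying to parse the response body.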


Example REST APIs

Linköping Luftkvalitet (air quality) API
http://www.linkoping.se/open/data/Luftkvalitet/

Google Maps Geocoding API
https://developers.google.com/maps/documentation/geocoding/intro


Common API formats

JavaScript Object Notation (JSON)
Think of named lists in R
R packages: RJSONIO, jsonlite

Extensible Markup Language (XML)
Older format (using nodes)
XPath
R packages: XML


JSON

{
  "firstName": "John",
  "lastName": "Smith",
  "age": 25,
  "address": {
    "streetAddress": "21 2nd Street",
    "city": "New York",
    "state": "NY",
    "postalCode": "10021"
  },
  "phoneNumber": [
    { "type": "home", "number": "212 555" },
    { "type": "fax", "number": "646 555" }
  ],
  "newSubscription": false,
  "companyName": null
}
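
To make the "named lists in R" analogy concrete, the record above can be written by hand as a nested list in base R. This is only a sketch of the correspondence. A JSON parser such as those in the packages named earlier would return a comparable structure, with details such as how arrays are simplified depending on the package and its options.

```r
# The JSON record above written as a nested named list in base R.
# JSON objects become named lists, JSON arrays become unnamed lists,
# false becomes FALSE and null becomes NULL.
person <- list(
  firstName = "John",
  lastName  = "Smith",
  age       = 25,
  address   = list(
    streetAddress = "21 2nd Street",
    city          = "New York",
    state         = "NY",
    postalCode    = "10021"
  ),
  phoneNumber = list(
    list(type = "home", number = "212 555"),
    list(type = "fax",  number = "646 555")
  ),
  newSubscription = FALSE,
  companyName     = NULL
)

# Elements are reached by name, nested elements by chaining.
city <- person$address$city

# Pull one field out of every element of the array.
numbers <- vapply(person$phoneNumber, function(p) p$number, character(1))
```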


XML

<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
        <edition language="French">fr.wikipedia.org</edition>
        <edition language="Polish">pl.wikipedia.org</edition>
        <edition language="Spanish">es.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
        <edition language="French">fr.wiktionary.org</edition>
        <edition language="Vietnamese">vi.wiktionary.org</edition>
        <edition language="Turkish">tr.wiktionary.org</edition>
        <edition language="Spanish">es.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>
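
A sketch of how such a document could be queried with XPath from R, using the XML package named on the earlier slide. The document is shortened here for brevity, and the code only runs if the XML package is installed.

```r
# A shortened copy of the document above (assumption: trimmed for brevity).
xml_source <- '<?xml version="1.0" encoding="utf-8"?>
<wikimedia>
  <projects>
    <project name="Wikipedia" launch="2001-01-05">
      <editions>
        <edition language="English">en.wikipedia.org</edition>
        <edition language="German">de.wikipedia.org</edition>
      </editions>
    </project>
    <project name="Wiktionary" launch="2002-12-12">
      <editions>
        <edition language="English">en.wiktionary.org</edition>
      </editions>
    </project>
  </projects>
</wikimedia>'

# Only run if the XML package is installed.
if (requireNamespace("XML", quietly = TRUE)) {
  doc <- XML::xmlParse(xml_source, asText = TRUE)

  # XPath: all edition nodes below the project named "Wikipedia".
  path <- "//project[@name='Wikipedia']//edition"
  hosts     <- XML::xpathSApply(doc, path, XML::xmlValue)
  languages <- XML::xpathSApply(doc, path, XML::xmlGetAttr, "language")

  # Attributes of the project nodes themselves.
  launches <- XML::xpathSApply(doc, "//project", XML::xmlGetAttr, "launch")
}
```

Node content is read with `xmlValue` and attributes with `xmlGetAttr`, which mirrors the difference between the text inside `<edition>` and its `language` attribute.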


web scraping

Unstructured http(s) data

Often HTML format

Spiders / scraping / web crawlers

Basics behind search engines


HTML

<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>


(ha)rvest

The rvest package: simplify spider activity

Download data
Parse data
Follow links
Fill out forms
Store crawling history
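
A sketch of the "download and parse" steps with rvest, applied to the small HTML page from the earlier slide. The page is passed as a string so that no network access is needed, and the code only runs if rvest is installed. Following links, filling out forms and keeping a history are not covered here.

```r
# The HTML page from the earlier slide, given as a string so that the
# sketch needs no network access.
page_source <- '<!DOCTYPE html>
<html>
  <head>
    <title>This is a title</title>
  </head>
  <body>
    <p>Hello world!</p>
  </body>
</html>'

if (requireNamespace("rvest", quietly = TRUE)) {
  # Parse the HTML; read_html() also accepts a URL to download from.
  page <- rvest::read_html(page_source)

  # Select nodes with CSS selectors and extract their text.
  page_title <- rvest::html_text(rvest::html_nodes(page, "title"))
  paragraphs <- rvest::html_text(rvest::html_nodes(page, "p"))
}
```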


Difficulties and bad spiders

Scraping is fragile!
www.domain.se/robots.txt
Politeness
Robot traps
JavaScript
Delays
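
A polite spider checks robots.txt before requesting a page and waits between requests. The base R sketch below shows the idea under simplifying assumptions: it only reads `Disallow` rules for the `*` user agent, treats them as plain path prefixes, and works on a hypothetical robots.txt given as text. The helper names are made up for illustration.

```r
# Collect the Disallow rules that apply to all user agents ("*") from the
# text of a robots.txt file. This is a simplified reading of the format.
disallowed_paths <- function(robots_lines) {
  lines <- trimws(sub("#.*$", "", robots_lines))   # drop comments
  lines <- lines[nzchar(lines)]
  applies <- FALSE
  rules <- character(0)
  for (line in lines) {
    field <- tolower(trimws(sub(":.*$", "", line)))
    value <- trimws(sub("^[^:]*:", "", line))
    if (field == "user-agent") {
      applies <- identical(value, "*")
    } else if (field == "disallow" && applies && nzchar(value)) {
      rules <- c(rules, value)
    }
  }
  rules
}

# A path is allowed if it does not start with any disallowed prefix.
is_allowed <- function(path, rules) {
  !any(startsWith(path, rules))
}

# Hypothetical robots.txt content.
robots <- c("User-agent: *", "Disallow: /private/", "Disallow: /tmp/",
            "", "User-agent: badbot", "Disallow: /")
rules <- disallowed_paths(robots)

# Be polite: check each path first and pause between requests.
paths <- c("/index.html", "/private/data.html")
visited <- character(0)
for (p in paths) {
  if (is_allowed(p, rules)) {
    visited <- c(visited, p)  # a real spider would download the page here
    Sys.sleep(0.1)            # short delay for the sketch; real crawls wait longer
  }
}
```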


Shiny?

Interactive dashboards made easy

online or local

R as "backend"


https://www.rstudio.com/products/shiny/shiny-user-showcase/

Shiny Examples


How it works

Application

Reactive

modify using HTML

MyAppName/server.R

MyAppName/ui.R

the directory containing server.R defines the working directory
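
A minimal sketch of the two parts of a Shiny application. For brevity both parts are defined in one script rather than in separate `ui.R` and `server.R` files. The input name `n`, the output name `hist` and the histogram itself are assumptions made for illustration. The code only runs if shiny is installed and does not start the app.

```r
if (requireNamespace("shiny", quietly = TRUE)) {
  # What would go in MyAppName/ui.R: the layout, with inputs and outputs.
  ui <- shiny::fluidPage(
    shiny::titlePanel("Histogram of random numbers"),
    shiny::sliderInput("n", "Number of observations:",
                       min = 10, max = 500, value = 100),
    shiny::plotOutput("hist")
  )

  # What would go in MyAppName/server.R: reactive code that is re-run
  # whenever an input it reads (here input$n) changes.
  server <- function(input, output) {
    output$hist <- shiny::renderPlot({
      hist(rnorm(input$n), main = "", xlab = "value")
    })
  }

  # Combine both parts into one app object; shiny::runApp(app) starts it.
  app <- shiny::shinyApp(ui = ui, server = server)
}
```

Moving the slider changes `input$n`, which invalidates the `renderPlot()` expression and redraws the histogram. That dependency tracking is what "reactive" refers to.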


Shiny Example

library(shiny)

# Examples with code
runExample("01_hello")
runExample("03_reactivity")


Publish Shiny

locally
zip-file in cloud
GitHub (see runGitHub())
your own server
shinyapps.io


Relational Databases

Structured database in tables

local or online

query language for I/O

effective for big data

difficult to design


A good database

Can be difficult to design?

No duplicates
No redundancies
Easy to update
"Normal forms"

Easy to query
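
The points about redundancy and updates can be illustrated with plain data frames standing in for database tables. The table and column names below are invented for the example, and the split shown is only an informal illustration of the idea behind normal forms.

```r
# One flat table: the customer's city is repeated on every order row,
# so changing it means updating several rows (a redundancy).
orders_flat <- data.frame(
  order_id = 1:4,
  customer = c("Anna", "Anna", "Bo", "Anna"),
  city     = c("Linkoping", "Linkoping", "Norrkoping", "Linkoping"),
  amount   = c(10, 25, 7, 3),
  stringsAsFactors = FALSE
)

# Split into two tables: each fact is stored once.
customers <- unique(orders_flat[, c("customer", "city")])
orders    <- orders_flat[, c("order_id", "customer", "amount")]

# An update now touches a single row ...
customers$city[customers$customer == "Anna"] <- "Stockholm"

# ... and a join on the key column rebuilds the combined view.
combined <- merge(orders, customers, by = "customer")
combined <- combined[order(combined$order_id), ]
```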


Using databases from R

Database system            R package
ODBC (Microsoft Access)    RODBC
PostgreSQL                 RPostgreSQL
Oracle                     ROracle
MySQL                      RMySQL
MongoDB                    rmongodb

Table: Database - R package
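
A sketch of the usual connect, write, query, disconnect cycle through the DBI interface. It uses an in-memory SQLite database via RSQLite, which is not one of the systems in the table above and is an assumption chosen because it needs no server. The code only runs if DBI and RSQLite are installed, and the table name `cars` is made up.

```r
if (requireNamespace("DBI", quietly = TRUE) &&
    requireNamespace("RSQLite", quietly = TRUE)) {
  # In-memory SQLite database (an assumption: SQLite is not in the table
  # above, but it needs no server and is driven through DBI functions).
  con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

  # Output: copy a data frame into a database table.
  DBI::dbWriteTable(con, "cars", mtcars[, c("mpg", "cyl", "hp")])

  # Input: let the database do the filtering and aggregation with SQL.
  result <- DBI::dbGetQuery(
    con,
    "SELECT cyl, COUNT(*) AS n, AVG(mpg) AS mean_mpg
       FROM cars
      GROUP BY cyl
      ORDER BY cyl"
  )

  DBI::dbDisconnect(con)
}
```

Only the aggregated rows travel back to R, which is the main reason a query language pays off for large tables.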


The End... for today.

Questions?

See you next time!
