advanced r programming - lecture 5 · lecture 5. 6/ 39 input and outputbasic i/ocloud storageweb...
TRANSCRIPT
1/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Advanced R Programming - Lecture 5
Krzysztof Bartoszek(slides by Leif Jonsson and Mans Magnusson)
Linkoping University
18 September 2017
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
2/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Today
Input and output
Basic I/O
Cloud storage
web APIs: Lab
web scraping
Shiny
Relational Databases
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
3/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Questions since last time?
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
4/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Input and output
Format, localization
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
5/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Input and output
Format, localization and encoding...... hell!
http://www.joelonsoftware.com/articles/Unicode.html
The Absolute Minimum Every Software Developer Absolutely, Positively
Must Know About Unicode and Character Sets (No Excuses!)
Unicode defines codes for all (?) characters—multiple encodings(for a given language only small fraction of characters used)Content-Type tag for HTMLBUT e–mail, .txt, .csv
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
6/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
”Formats”
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
7/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Localization
own Computer Cloud Storagelocal network web pageslocal database web scraping
web APIsremote database
Table: Local - Remote
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
8/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Files on your computer
# Input simple data
read.table ()
read.csv()
read.csv2()
load()
# Output simple data
write.table ()
write.csv()
write.csv2()
save()
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
9/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
More complex formats
software/data packageExcel XLConnectSAS, SPSS, STATA, ... foreignXML xmlJSON (GeoJSON) rjsonio, RJSONDocuments tmMaps spImages raster
Table: Format - R package
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
10/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Cloud storage
Table: Local - Remote
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
11/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Why?
Robust
Backups
Cloud computing
can be tricky in the beginning
but
how about safety?
But control on what is going on?
BUT requires internet connection
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
11/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Why?
Robust
Backups
Cloud computing
can be tricky in the beginning
but how about safety?
But control on what is going on?
BUT
requires internet connection
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
11/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Why?
Robust
Backups
Cloud computing
can be tricky in the beginning
but how about safety?
But control on what is going on?
BUT requires internet connection
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
12/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Localization
Arbitrary data Structured data
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
13/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
API Packages
Remote packageGeneral downloaderGitHub repmis, downloaderDropbox rdropAmazon RAmazonS3Google Docs googlesheets
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
14/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
web APIs
application program interface using http
”contract to ’get data’ online”
more and more common
examples:
github
Riksdagen
Statistics Sweden
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
15/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
RESTful
Basic principles:
Data is returned (JSON / XML)
Each specific data has its own URI
Communication is based on HTTP verbs
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
16/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Hypertext Transfer Protocol (http)
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
17/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Hypertext Transfer Protocol (http)
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
18/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Verbs
Verb DescriptionGET Get ”data” from server.POST Post ”data” to server (to get something)PUT Update ”data” on serverDELETE Delete resource on server
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
19/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Status codes
Code Description1XX Information from server2XX Yay! Gimme’ data!3XX Redirections4XX You failed5XX Server failed
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
20/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Example REST API’s
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
http://www.linkoping.se/open/data/Luftkvalitet/
Linkoping Luftkvalitet API
https://developers.google.com/maps/documentation/geocoding/intro
Google Map Geocode API
21/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Common API formats
JavaScript Object Notation (JSON)Think of named lists in R
R Packages: RJSONIO, rjsonlite
Extensible Markup Language (XML)Older format (using nodes)
xpathR Packages: XML
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
22/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
JSON{
"firstName": "John",
"lastName": "Smith",
"age": 25,
"address": {
"streetAddress": "21 2nd Street",
"city": "New York",
"state": "NY",
"postalCode": "10021"
},
"phoneNumber": [
{ "type": "home", "number": "212 555" },
{ "type": "fax", "number": "646 555" }
],
"newSubscription": false ,
"companyName": null
}K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
23/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
XML
<?xml version="1.0" encoding="utf -8"?>
<wikimedia >
<projects >
<project name="Wikipedia" launch="2001 -01 -05">
<editions >
<edition language="English">en.wikipedia.org</edition >
<edition language="German">de.wikipedia.org</edition >
<edition language="French">fr.wikipedia.org</edition >
<edition language="Polish">pl.wikipedia.org</edition >
<edition language="Spanish">es.wikipedia.org</edition >
</editions >
</project >
<project name="Wiktionary" launch="2002 -12 -12">
<editions >
<edition language="English">en.wiktionary.org</edition >
<edition language="French">fr.wiktionary.org</edition >
<edition language="Vietnamese">vi.wiktionary.org</edition >
<edition language="Turkish">tr.wiktionary.org</edition >
<edition language="Spanish">es.wiktionary.org</edition >
</editions >
</project >
</projects >
</wikimedia >
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
24/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
web scraping
Unstructured http(s) data
Often HTML format
Spiders / scraping / web crawlers
Basics behind search engines
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
25/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
HTML
<!DOCTYPE html>
<html>
<head>
<title >This is a title</title>
</head>
<body>
<p>Hello world!</p>
</body>
</html>
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
26/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
(har)rvest
JavaScript Object Notation (JSON)Simplify spider activity
Download dataParse dataFollow linksFill out formsStore crawling history
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
27/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Difficulties and bad spiders
Scraping is fragile!Difficulties and bad spiderswww.domain.se/robot.txtPoliteness
robot trapsjavascriptdelays
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
28/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Shiny?
Interactive dashboards made easy
online or local
R as ”backend”
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
29/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Shiny?
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
https://www.rstudio.com/products/shiny/shiny-user-showcase/
Shiny Examples
30/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
How it works
Application
Reactive
modify using HTML
MyAppName/server.R
MyAppName/ui.R
server.R define working directory
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
31/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Shiny Example
library(shiny)
# Examples with code
runExample("01 _hello")
runExample("03 _reactivity")
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
32/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Publish Shiny
locallyzip-file in cloud
github (see runGithub() )
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
33/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Publish Shiny
locallyzip-file in cloud
github (see runGithub() )
your own servershinyapps.io
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
34/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Relational Databases
Structured database in tables
local or online
query language for I/O
effective for big data
difficult to design
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
35/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
36/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
37/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
A good database
Can be difficult to design ?
No duplicatesNo redundanciesEasy to update”Normal forms”
Easy to query
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
37/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
A good database
Can be difficult to design ?No duplicates
No redundanciesEasy to update”Normal forms”
Easy to query
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
37/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
A good database
Can be difficult to design ?No duplicates
No redundanciesEasy to update”Normal forms”
Easy to query
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
38/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
Using databases from R
Database system R packageODBC (Microsoft Access) RODBCPostgreSQL RPostgresqlOracle ROracleMySQL RMySqlMongoDB rmongodb
Table: Database - R package
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5
39/ 39
Input and output Basic I/O Cloud storage web APIs web scraping Shiny Relational Databases
The End... for today.Questions?
See you next time!
K. Bartoszek (STIMA LiU) STIMA LiU
Lecture 5