Data Science: Data Visualization Boot Camp
What is R?
Chuck Cartledge, PhD
24 January 2020
1/32
-
2/32
Intro. What is R? RStudio R Basics Hands-on Q & A Conclusion References Files
Table of contents (1 of 1)
1 Intro.
2 What is R?
  The language
  Availability
3 RStudio
  Basic how-tos (left side)
  Basic how-tos (right side)
4 R Basics
  Types of numbers
  Variables
  Operations and functions
5 Hands-on
6 Q & A
7 Conclusion
8 References
9 Files
-
3/32
What are we going to cover?
We’re going to talk about:
What is the language R?
What GUI do I use to write and execute R programs?
What are some basic variable types in R?
-
4/32
The language
The official definition.
“R is a language and environment for statistical computing and graphics. It is a GNU project which is similar to the S language and environment which was developed at Bell Laboratories (formerly AT&T, now Lucent Technologies) by John Chambers and colleagues. R can be considered as a different implementation of S. There are some important differences, but much code written for S runs unaltered under R. R provides a wide variety of statistical (linear and nonlinear modeling, classical statistical tests, time-series analysis, classification, clustering, . . . ) and graphical techniques, and is highly extensible. The S language is often the vehicle of choice for research in statistical methodology, and R provides an Open Source route to participation in that activity.”
CRAN Staff [2]
-
5/32
Availability
R is available for almost all major operating systems.
Linux (and its variants)
(Mac) OS X
Windows
Get the R environment and a command line interface. Download from: https://cloud.r-project.org/
Source code is available for custom OSs: https://github.com/wch/r-source
-
6/32
Basic how-tos (left side)
A complete IDE
A complete, integrated R development environment.
1 Text editor
2 R console
3 Variable list and contents
4 Tabbed display for different uses
See the software overview and design document for version and download information.
-
7/32
Basic how-tos (left side)
Same image.
-
8/32
Basic how-tos (left side)
Editor
“Smart” editor
CTRL + O to open a file
CTRL + S to save a file
CTRL + A to select (highlight) all contents
CTRL + Enter to run the current line or selection in the Console
Multiple files can be opened at once
-
9/32
Basic how-tos (left side)
Same image.
-
10/32
Basic how-tos (left side)
Console
Interprets R commands
Commands come from the editor, other panels, or are entered manually
Execution errors appear here
Output of the print function appears here
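A small sketch of what lands in the Console: an auto-printed value, explicit print() output, and an error message. The variable names are illustrative only, and tryCatch() is used here so the error message can be captured instead of halting a script.

```r
x <- c(2, 4, 6)
x                    # a bare expression typed at the console is printed automatically
print(mean(x))       # explicit print(): its output also appears in the Console

# An invalid call raises an error; tryCatch() captures the message instead of stopping
msg <- tryCatch(
  sqrt("a"),
  error = function(e) conditionMessage(e)
)
msg                  # "non-numeric argument to mathematical function"
```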
-
11/32
Basic how-tos (left side)
Same image.
-
12/32
Basic how-tos (right side)
Variables
Displays contents of the selected environment (including variables)
Displays the history of console commands
Can save and load data from data files
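The Environment pane's save and load buttons have console counterparts. This is a minimal sketch using base R's save() and load(); the file name bootcamp-data.RData and the object names are hypothetical, and tempdir() is used only so the example leaves nothing behind.

```r
scores <- c(90, 85, 77)
ids <- c("a", "b", "c")

dataFile <- file.path(tempdir(), "bootcamp-data.RData")  # hypothetical file name
save(scores, ids, file = dataFile)   # write the selected objects to disk
rm(scores, ids)                      # they disappear from the Environment pane

loaded <- load(dataFile)             # restore them; load() returns their names
loaded                               # "scores" "ids"
```

For a single object, saveRDS() and readRDS() are a common alternative.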
-
13/32
Basic how-tos (right side)
Same image.
-
14/32
Basic how-tos (right side)
Tabbed display
Displays files in the current directory
Displays plots from the console
Allows packages to be added or removed from the console
Provides help/man pages for R functions and packages
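Each tab has a rough console equivalent. The sketch below runs only the read-only calls; the package-changing, help, and plotting lines are left as comments because they need internet access or an interactive session. "ggplot2" is just an example package name.

```r
getwd()                                       # current working directory (Files tab)
files <- list.files()                         # files shown in the Files tab
installed <- rownames(installed.packages())   # packages shown in the Packages tab
"stats" %in% installed                        # TRUE: stats ships with R

# install.packages("ggplot2")   # add a package (needs internet access)
# remove.packages("ggplot2")    # remove it again
# ?plot                         # opens the help page in the Help tab
# plot(1:10)                    # the figure appears in the Plots tab
```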
-
15/32
Basic how-tos (right side)
Same image.
-
16/32
Basic how-tos (right side)
Starting an R script in the background
The image shows a Windows environment. A *nix environment command is: Rscript backend.R &
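The contents of backend.R are not shown on the slide. As a hypothetical stand-in, the sketch below is a script that suits background execution: it reads an optional step count from the command line with commandArgs() and appends progress lines to a log file (backend.log is an assumed name), since a background job has no console to print to.

```r
# backend.R (hypothetical): a long-running job suitable for `Rscript backend.R &`
args <- commandArgs(trailingOnly = TRUE)              # arguments after the script name
n <- if (length(args) >= 1) as.integer(args[1]) else 3L

log_file <- "backend.log"   # in the working directory, so it survives after R exits
for (i in seq_len(n)) {
  cat(format(Sys.time()), "step", i, "of", n, "\n", file = log_file, append = TRUE)
  Sys.sleep(0.1)                                      # stand-in for real work
}
```

On *nix, `nohup Rscript backend.R 10 > backend.out 2>&1 &` would also keep the job running after logout and capture any console output in backend.out.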
-
17/32
Basic how-tos (right side)
Same image.
-
18/32
Basic how-tos (right side)
Basic help with functions [1]
1 Based on subject: help.search("data input")
2 Based on pattern matching: apropos("lm")
3 Looking for a specific item: find("lm")
4 About a specific item: ?lm (or ??lm to search all help pages for the term)
5 Example of a function: example(lm)
6 Source code for a function: lm (the name typed without parentheses)
7 Demonstration of a function: demo(persp)
8 Long-form documentation (vignette) for a package: vignette("moveline", package="grid")
9 Contents of a library: library(help=spatial)
10 Install a new library: install.packages("Kfn")
11 Which data are included in a package: data(package="ggplot2")
12 Which data are included in all packages: data(package = .packages(all.available = TRUE))
13 Find an overview of R packages: https://cran.r-project.org/web/views/
-
19/32
Types of numbers
Lots of different number types
We’ll dive into each type shortly. Other things:
Many build on others (e.g., a factor is an integer vector with extra attributes).
Each may have attributes.
Each has a type.
Each has a class.
And, you can create your own.
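A short sketch of the points above using base R's typeof(), class(), and attributes(). The "temperature" class and its print method are hypothetical, chosen only to show how a new (S3) class can be created by attaching a class attribute.

```r
# A named double vector: its type, class, and attributes
x <- c(a = 1.5, b = 2.5)
typeof(x)        # "double"
class(x)         # "numeric"
attributes(x)    # a list holding the "names" attribute

# A factor builds on an integer vector plus "levels" and "class" attributes
f <- factor(c("low", "high", "low"))
typeof(f)        # "integer"
class(f)         # "factor"
levels(f)        # "high" "low" (sorted alphabetically by default)

# Creating your own (S3) class: attach a class attribute, then define a method
temperature <- structure(list(value = 21.5, unit = "C"), class = "temperature")
print.temperature <- function(x, ...) {
  cat(x$value, "degrees", x$unit, "\n")
  invisible(x)
}
print(temperature)   # dispatches to print.temperature(): 21.5 degrees C
```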
-
20/32
Types of numbers
Same image.
And, you can create your own.
-
21/32
Types of numbers
Definition of types (1 of 3)
Character: surrounded by " ("hi") or ' ('bye'). Special characters are escaped with \
Complex: a combination of a real and an imaginary number in the form a + bi
Data frame: a table or two-dimensional array-like structure in which each column contains values of one variable and each row contains one set of values from each column
Date: number of days since January 1, 1970 (Unix dates)
Diff time: represents the amount of time between pairs of dates or date-times
Double: numbers can be specified in decimal (0.1234), scientific (1.23e4), or hexadecimal (0xcafe) notation
Factor: Conceptually, factors take on a limited number of different values; such variables are often referred to as categorical variables
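One small literal for each type in the list above, as a sketch; the values and variable names are illustrative only. The date 2020-01-24 is simply this lecture's date.

```r
greeting <- "hi\tthere"                 # character; \t is an escaped tab
z <- 3 + 2i                             # complex: real part 3, imaginary part 2
scores <- data.frame(id = 1:2, score = c(90, 85))   # data frame: one column per variable
d <- as.Date("2020-01-24")              # Date
as.numeric(d)                           # 18285 days since 1970-01-01
elapsed <- d - as.Date("2020-01-01")    # difftime: Time difference of 23 days
hex <- 0xcafe                           # a double written in hexadecimal: 51966
typeof(hex)                             # "double"
sizes <- factor(c("small", "large", "small"))   # factor with two levels
nlevels(sizes)                          # 2
```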
-
22/32
Types of numbers
Definition of types (2 of 3)
Integer: written like doubles but followed by an uppercase L (1234L, 1e4L, or 0xcafeL)
List: objects that contain elements of different types, such as numbers, strings, vectors, and even other lists
Logical: TRUE or FALSE (abbreviated T and F), plus NA for a missing value
NULL: represents the null object in R (the absence of a value); it has length zero and is often returned by expressions whose value is undefined
Numeric: the default computational data type
POSIXct: POSIX (Portable Operating System Interface) is a family of cross-platform standards; “ct” stands for calendar time, stored as seconds since January 1, 1970 (UTC)
POSIXlt: POSIX (Portable Operating System Interface) is a family of cross-platform standards; “lt” stands for local time, stored as a list of components (seconds, minutes, hours, day, month, year, ...)
Raw: data is stored as raw bytes
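A sketch of the second group of types, again with illustrative values only. The time zone is fixed to UTC so that the POSIXct value (seconds since the epoch) is the same on every machine.

```r
i <- 1234L                  # integer (note the L suffix)
typeof(c(1e4L, 0xcafeL))    # "integer"
typeof(1234)                # "double": without L, numbers default to double ("numeric")
items <- list(1, "two", c(3, 4), list(5))   # a list can mix types, even other lists
flags <- c(TRUE, FALSE, NA) # logical; NA marks a missing value
length(NULL)                # 0

ct <- as.POSIXct("2020-01-24 09:00:00", tz = "UTC")
as.numeric(ct)              # 1579856400 seconds since 1970-01-01 00:00 UTC
lt <- as.POSIXlt(ct)        # the same instant as a list of components
lt$hour                     # 9
lt$year + 1900              # 2020 (POSIXlt counts years from 1900)

bytes <- charToRaw("hi")    # raw bytes 68 69 (shown in hexadecimal)
```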
-
23/32
Types of numbers
Definition of types (3 of 3)
Scalar: an individual value (actually a vector of length 1)
Tibble: a modern take on data frames. Tibbles keep the features that have stood the test of time and drop the features that used to be convenient but are now frustrating
Vector: a basic data structure in R. It contains elements of the same type.
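A sketch showing that a "scalar" is really a length-one vector and that a vector holds a single type (mixed inputs are coerced). Tibbles come from the add-on tibble package, so that part runs only if the package happens to be installed.

```r
s <- 42
length(s)          # 1: a "scalar" is a vector of length one
is.vector(s)       # TRUE

v <- c(1, "a", TRUE)
v                  # "1" "a" "TRUE": mixed inputs are coerced to one common type

# Tibbles require the tibble package; skip quietly if it is not installed
if (requireNamespace("tibble", quietly = TRUE)) {
  tb <- tibble::tibble(x = 1:3, y = c("a", "b", "c"))
  print(tb)
}
```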
-
24/32
Variables
Variable types (part 1 of 2)[3]
1 Variable names:
Names are case sensitive
Names cannot begin with a number or an underscore (a leading dot is allowed, but not a dot followed by a number)
Names cannot have internal spaces
2 Scalars (simple values): variable
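A sketch of the naming rules and of assigning simple (scalar) values; the variable names are illustrative. The invalid names are left as comments because they would stop the script with a syntax error, and make.names() shows how R itself repairs such names.

```r
myVar <- 3
myvar <- 10          # a different variable: names are case sensitive
my.var_2 <- "ok"     # letters, digits, dots, and underscores are allowed
# 2var <- 1          # syntax error: a name cannot start with a number
# my var <- 1        # syntax error: a name cannot contain a space

make.names(c("2var", "my var", "ok"))   # "X2var" "my.var" "ok"
```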
-
25/32
Variables
Variable types (part 2 of 2)[3]
1 Data frames (each column must have the same number of values): L3
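A minimal data frame sketch with made-up values, plus what happens when the columns' lengths disagree. The error is captured with tryCatch() so the example runs to completion.

```r
scores <- data.frame(
  id    = 1:3,
  score = c(90, 85, 77)
)
str(scores)             # 'data.frame': 3 obs. of 2 variables
mean(scores$score)      # 84

# Three values in one column but two in the other: data.frame() refuses
err <- tryCatch(
  data.frame(a = 1:3, b = 1:2),
  error = function(e) e
)
conditionMessage(err)   # reports that the arguments imply differing numbers of rows
```

One exception worth knowing: a shorter column whose length divides the row count evenly is recycled, so data.frame(a = 1:4, b = 1:2) succeeds with b repeated as 1, 2, 1, 2.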
-
26/32
Operations and functions
Operation
The basic data type is a vector.
It is easy to create a vector; one way is as a sequence
x
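The slide's own example is cut off after "x". As a separate sketch (not a reconstruction of the original), here are two common ways to build a sequence and a few vectorized operations on it.

```r
x <- 1:10                   # integer sequence 1, 2, ..., 10
y <- seq(0, 1, by = 0.25)   # 0.00 0.25 0.50 0.75 1.00
x * 2                       # arithmetic applies element by element: 2 4 ... 20
sum(x)                      # 55
x[x > 5]                    # logical indexing: 6 7 8 9 10
rev(x)[1:3]                 # 10 9 8
```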
-
27/32
Operations and functions
Functions are supported
1 Have the same naming conventions as variables
2 Have three parts:
  1 Optional passed parameters (named, evaluated, unnamed)
  2 Text (body) of the function
  3 The environment where and while the function executes
3 The last value evaluated is returned.
4 Statements are grouped by “curly braces” or separated by semicolons.
functionName
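The slide's function example is truncated after "functionName". As a separate sketch with hypothetical function names, area() shows positional and named arguments, a default that is evaluated from another parameter, the implicit return of the last value, and the three parts listed above; bump() shows semicolon-separated statements inside braces.

```r
area <- function(width, height = width) {   # height defaults to width
  width * height                            # the last value evaluated is returned
}
area(3)                      # 9  (height defaults to width)
area(3, 4)                   # 12 (matched by position)
area(height = 2, width = 5)  # 10 (matched by name)

formals(area)                # part 1: the parameters
body(area)                   # part 2: the text (body) of the function
environment(area)            # part 3: the environment it was created in

bump <- function(x) { y <- x + 1; y * 2 }   # semicolon separates statements
bump(1)                      # 4
```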
-
28/32
Some simple exercises to get familiar with R and RStudio
1 Create a variable and assign it the value 3
2 Print your variable
3 Create a function that takes one parameter and returns the square of that value
4 Use your function to compute the square of 45
5 Print the value of the passed parameter inside the function
6 Open the file library.R and explain what the function dumpObject does
-
29/32
Q & A time.
Q: Do you know what the death rate around here is?
A: One per person.
-
30/32
What have we covered?
Covered a little bit of R’s background
Looked at RStudio, a cross-platform GUI for working with R
Looked at some R basics (variable types and functions)
Next: what is data visualization anyway?
-
31/32
References (1 of 1)
[1] Michael J. Crawley, The R Book, John Wiley & Sons, 2012.
[2] CRAN Staff, What is R?, https://www.r-project.org/about.html, 2017.
[3] Simon Walkowiak, Big Data Analytics with R, Packt Publishing Ltd., 2016.
-
32/32
Files of interest
1 Software installation
-
Software in Support of the Old Dominion University College of Continuing Education and Professional Development Big Data: Data Visualization Boot Camp
Chuck Cartledge
November 24, 2019
Contents
1 Introduction
2 Discussion
3 Conclusion
A Software on each workstation
B Software installation checkout
C Files
1 Introduction
A work in progress listing the software needed and used in support of the Old Dominion University (ODU) College of Continuing Education and Professional Development (CEPD) Big Data: Data Visualization boot camp.
2 Discussion
Software will be needed on each virtual machine for the boot camp. This draft report contains a list of needed software, R scripts to install necessary libraries, and simple R scripts to test the installation (see Section B).
-
3 Conclusion
After installing all the software identified in this report on their personal computers, the student will be able to replicate all boot camp activities.
A Software on each workstation
This section contains the assumptions about the operating system environment and the software load-out for each workstation.
1. Operating system: Windows 7
2. Software
(a) R
• Version: 3.3.2
• Available from: https://cran.r-project.org/bin/windows/base/
(b) R Packages: An install script is available to programmatically download the needed libraries (see Section B). The list of libraries/packages includes:
• bitops
• cluster.datasets
• clusterSim
• colorspace
• colourlovers
• dplyr
• ellipse
• gcookbook
• geosphere
• getopt
• ggmap
• ggplot2
• ggpubr
• gnm
• grDevices
• grid
• gridBase
• gridExtra
• httr
• jpeg
• kernlab
• KernSmooth
• knitr
• magrittr
• mapdata
• maps
• methods
• modeest
• mvtnorm
• NISTunits
• oec
• OpenStreetMap
• pdftools
• plotrix
• plyr
• png
• purrr
• RColorBrewer
• RCurl
• readr
• readxl
• reshape
• rgdal
• rgl
• rglwidget
• rJava
• rjson
• scales
• sf
• sp
• sphereplot
• tidyr
• tm
• USAboundaries
• UScensus2000tract
• utils
• vcd
• vcdExtra
• xlsx
• xlsxjars
• XML
(c) RStudio
• Version: 0.99.903
• Available from: https://www.rstudio.com/products/rstudio/download/
(d) wget
• Version: 1.*
• Available from: https://eternallybored.org/misc/wget/
The PATH environment variable should be updated to include the location of the R interpreter.
-
B Software installation checkout
There is an extensive list of software to be installed to support the boot camp. After the software is installed, it must be configured and tested to confirm that it is installed correctly. A number of detailed procedural files and R scripts are included in this document (see Section C) to facilitate the installation checkout. The R script files can be run in RStudio or in any other R environment that supports setting the current working directory.
The checkout is:
1. Associate the file extension “.R” with the RStudio program.
2. Set the current RStudio working directory to the location of installLibraries.R and run the installLibraries.R script. There should be no errors.
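The contents of installLibraries.R are not reproduced in this report. As a hedged illustration that assumes nothing about that script, the sketch below shows one common pattern: install only the packages that are missing, then confirm that each one loads. The helper names missingPackages and installIfMissing are hypothetical; the repository default is the CRAN cloud mirror mentioned in the slides.

```r
# Hypothetical helper: returns the packages in `pkgs` whose namespace cannot be loaded
missingPackages <- function(pkgs) {
  ok <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!ok]
}

# Hypothetical helper: installs only what is missing, then re-checks and warns
installIfMissing <- function(pkgs, repos = "https://cloud.r-project.org") {
  todo <- missingPackages(pkgs)
  if (length(todo) > 0) install.packages(todo, repos = repos)
  still <- missingPackages(pkgs)
  if (length(still) > 0) warning("Could not load: ", paste(still, collapse = ", "))
  invisible(still)
}
```

For example, installIfMissing(c("ggplot2", "dplyr", "tidyr")) would install whichever of those packages is absent (this needs internet access) and warn about any that still fail to load. An empty result from missingPackages() over the full list in Section A is a quick checkout.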
-
C Files
A collection of miscellaneous files mentioned in the report.
• installLibraries.R – an R script to install all necessary libraries/packages from “the cloud”
A complete collection of files (presentations, data, scripts, etc.) can be downloaded from the boot camp web site using this *nix command:
wget -np -r https://www.cs.odu.edu/~ccartled/Teaching/2019-Spring/DataVisualization/
or this Windows command:
wget -r -np -nH --cut-dirs=3 -R index.* https://www.cs.odu.edu/~ccartled/Teaching/2019-Spring/DataVisualization/
The Windows version of wget sometimes leaves “trashy” files behind, like “index.html@C=D;O=A” and so on. These files are not part of the boot camp web page and can be removed or ignored. None of the boot camp scripts use or process these files. The *nix version of wget does not leave trashy files.
These commands are also located in: https://www.cs.odu.edu/~ccartled/Teaching/2020-Spring/DataVisualization/Errata/wget.txt
rm(list=ls())
getNeededPackageList