The History and Use of R

Post on 01-Nov-2014

Category:

Technology

DESCRIPTION

A presentation on the history, design, and use of R. The talk focuses on companies that use and support R, use cases, where R is going, competitors, advantages and disadvantages, and resources for learning more about R.

Speaker Bio

Joseph Kambourakis has been the Lead Data Science Instructor at EMC for over two years. He has taught in eight countries and been interviewed by Japanese and Saudi Arabian media about his expertise in Data Science. He holds a bachelor's degree in Electrical and Computer Engineering from Worcester Polytechnic Institute and an MBA from Bentley University with a concentration in Business Analytics.

TRANSCRIPT

The History and Use of R

Joseph Kambourakis

Ground Rules

• Interrupt me

• These are all my opinions and not those of EMC or the Big Data Analytics, Discovery & Visualization Meetup

• Slides will be available

Joseph Kambourakis @mouthorjoe

Taught Around the World

WPI

Bentley University

Sam Woolford & Dominique Haughton

First Got Exposed to R

What is R?

R is a free software environment for statistical computing and graphics

A language plus a run-time environment with graphics, a debugger, access to certain system functions, and the ability to run programs stored in script files

What is R’s Hashtag?

Robert Gentleman & Ross Ihaka

• R: A Language for Data Analysis and Graphics

Starts with S

1976 1988 1991

Scheme

• Lexical scoping

Lexical scoping

• Searches through environments

– First global

• Global is your workspace

– Second namespace of packages

• More on packages later
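The search order above can be sketched in a few lines of R. This is a minimal illustration, not material from the talk: the names `make_adder`, `show_x`, and `caller` are hypothetical, and it assumes a fresh session where the functions are defined at top level, so the lookup goes from the function's own environment to the global workspace and then on to attached packages.

```r
# Lexical scoping: a free variable is resolved where the function was DEFINED,
# not where it is called.
x <- 10                        # lives in the global environment (the workspace)

make_adder <- function(n) {
  # The returned function remembers `n` from the environment it was created in
  function(value) value + n
}
add_two <- make_adder(2)

show_x <- function() x         # `x` is free here, so R finds the global `x`

caller <- function() {
  x <- 99                      # a local `x` in the caller does not affect show_x()
  show_x()
}

result_adder  <- add_two(5)    # uses the captured n = 2
result_global <- caller()      # uses the global x, not the caller's x

# `mean` is not defined in the workspace, so the search continues into packages
found_in <- environmentName(environment(mean))

print(c(result_adder, result_global))
print(found_in)
```

Defining a variable called `mean` in the workspace would shadow the package version for variable lookups, which is one practical consequence of "first global, then packages".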

Under the Hood

Open Source

• GNU General Public License

• Freedom 0: The freedom to run the program for any purpose.

• Freedom 1: The freedom to study how the program works, and change it to make it do what you wish.

• Freedom 2: The freedom to redistribute copies so you can help your neighbor.

• Freedom 3: The freedom to improve the program, and release your improvements (and modified versions in general) to the public, so that the whole community benefits.

• source: GNU.org

R Project

• The R Foundation is a not for profit organization working in the public interest. It has been founded by the members of the R Development Core Team in order to

– Provide support for the R project and other innovations in statistical computing. We believe that R has become a mature and valuable tool and we would like to ensure its continued development and the development of future innovations in software for statistical and computational research.

– Provide a reference point for individuals, institutions or commercial enterprises that want to support or interact with the R development community.

– Hold and administer the copyright of R software and documentation.

• source: R Project

Contributors

How it Works: Design

• Functional

– mean()

– plot()
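As a rough sketch of what "functional" means in practice: functions such as `mean()` are ordinary values that can be stored in a list, passed to other functions, and returned. The data and the helper `apply_twice` below are made up for illustration; `plot()` is called the same way as `mean()` but is left out so the sketch needs no graphics device.

```r
scores <- c(2, 4, 6, 8)

# Calling a function directly
avg <- mean(scores)

# Functions are values: keep several in a list and apply each to the same data
summaries <- sapply(list(mean = mean, max = max), function(f) f(scores))

# A higher-order function: takes another function as its argument
apply_twice <- function(f, x) f(f(x))
fourth_root <- apply_twice(sqrt, 16)

print(avg)
print(summaries)
print(fourth_root)
```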

• Interpreted language

How it Works: Install

• Hosted on Comprehensive R Archive Network (CRAN)

• 54 megabytes

http://cran.rstudio.com/

• Download and Install R

• Precompiled binary distributions of the base system and contributed packages, Windows and Mac users most likely want one of these versions of R:

• Download R for Linux

• Download R for (Mac) OS X

• Download R for Windows

• R is part of many Linux distributions, you should check with your Linux package management system in addition to the link above.

How it Works: Command Line

How it Works: Packages

• Base

– mean()

• Utils

– read.csv()

• Stats

– lm()

– sd()
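The functions listed above ship with every R installation, so a short sketch can use all of them without installing anything. The height and weight numbers are invented for illustration, and the `package::` prefixes are only there to show which package each function comes from; in normal use they can be omitted.

```r
# Small made-up data set, read from inline text instead of a file
csv_text <- "height,weight\n150,52\n160,58\n170,66\n180,75"

# utils: read.csv() returns a data frame
people <- utils::read.csv(text = csv_text)

# base: mean()
avg_height <- base::mean(people$height)

# stats: sd() and lm()
sd_height <- stats::sd(people$height)
fit <- stats::lm(weight ~ height, data = people)
slope <- unname(stats::coef(fit)["height"])

print(people)
print(avg_height)
print(slope)
```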

Packages

• Mostly hosted on CRAN

• Many others hosted elsewhere

– Github

– RStudio

– Bioconductor

– RevolutionR

Packages

• source: http://r4stats.com/articles/popularity/

Top 10 Most Popular Packages

• source: Revolution Analytics Blog

Data Frame

Capabilities

• ANALYTICS

– Basic Mathematics

– Basic Statistics

– Probability Distributions

– Machine Learning

– Optimization and Mathematical Programming

– Signal Processing

– Simulation and Random Number Generation

– Statistical Modeling

– Statistical Tests

• GRAPHICS AND VISUALIZATION

– Static Graphics

– Dynamic Graphics

– Devices and Formats

Model & Plot

GUI:RStudio

How Does it Compare?

Products compared: R, SAS, SPSS Professional, MATLAB

• Cost: R: Free! | SAS: Very VERY High | SPSS Professional: High - $9,975 | MATLAB: High

• Documentation: R: Yes | SAS: Very comprehensive | SPSS Professional: OK | MATLAB: Some examples

• Training Course: R: NA | SAS: Yes | SPSS Professional: Yes | MATLAB: Yes

• User interface: R: Low | SAS: Medium | SPSS Professional: Best | MATLAB: Medium

• Output: R: Separate commands | SAS: Automatically produce diagnosis graph and forecast | SPSS Professional: Totally automated | MATLAB: Some automated via GUI, some specific command

• Models*: Does not STL moving average | Does not have ARCH/GARCH + and other moving average models | Does not have MA & decomposition models

• Certification Program: Yes | Yes | Yes

Commercial Support

Where Is It Now?

• Version 3.1.1, released 7/10/2014

Where it’s Going

Source: Revolution Analytics Blog

Where it’s Going: Extensions and Interactions

• Rcpp

– Transfer from R to C++, and from C++ to R

• RLLVM

– Generates native code from R through LLVM

• H2O

– Big data package

The best thing about R is that it was developed by statisticians. The worst thing about R is that...it was developed by statisticians.

Bo Cowgill

Good: Open Source

• So many contributors

• Free!

• Community

Bad: Open Source

• No customer support

• Features

Good: Frequent Updates

• Always new packages

• New updates and bug fixes

Bad: Frequent Updates

• Package updates

• R updates

Bad: Documentation

Bad: Speed

• 40-year-old code

• Interpreted

• Single threaded

Bad: Memory

• All stored in memory

Soccer Example

@11tegen11

Congressional Approval Rating

@adamramey

Use Cases 4

How to Learn:

• RStudio

• Data Camp

• Springer Series

• Art of R Programming

• Boot Camp Boston Predictive Analytics Meetup

• Online Videos

Web Resources:

UseR Groups & Conferences

Closing Thoughts

Thank You

Questions?
