source code tons of code

Upload: aliza

Post on 26-Feb-2016

DESCRIPTION

Package: More Code, Statistical Functions, Datasets. Workspace: Fewer Lines of Code, Capability. Source Code: Tons of Code. http://www.statmethods.net/management/functions.html. Currently, how many R Packages? At the command line enter: dim(available.packages()) and available.packages().

TRANSCRIPT

Page 1: Source Code Tons of Code
Page 2: Source Code Tons of Code
Page 3: Source Code Tons of Code

Source Code
- Tons of Code

Package
- More Code
- Statistical Functions
- Datasets

Workspace
- Fewer Lines of Code
- Capability

Page 5: Source Code Tons of Code

Currently, how many R Packages?

At the command line enter:

dim(available.packages())
available.packages()
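A minimal sketch of the package count, assuming a working R session. The helper names `summarise_packages` and `count_available` are hypothetical, and the mirror URL is an assumption (any CRAN mirror should do). The first row count of the matrix is the number of packages; `installed.packages()` is used as an offline stand-in so the example runs without network access.

```r
# Summarise a package matrix such as the one returned by
# available.packages() or installed.packages():
# one row per package, one column per metadata field.
summarise_packages <- function(pkgs) {
  stopifnot(is.matrix(pkgs))
  c(packages = nrow(pkgs), fields = ncol(pkgs))
}

# Offline stand-in: packages installed in the local library.
local_summary <- summarise_packages(installed.packages())
print(local_summary)

# Hypothetical helper: queries a repository index, so it needs
# network access. The mirror URL is an assumption.
count_available <- function(repos = "https://cloud.r-project.org") {
  summarise_packages(available.packages(repos = repos))
}

# count_available()  # uncomment when online
```

The count changes over time as packages are added to and archived from the repository, so no expected value is shown.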

Page 6: Source Code Tons of Code

Is there an R App Store?

Page 7: Source Code Tons of Code
Page 8: Source Code Tons of Code

Two heavyweights in the statistical software market are SAS and SPSS/IBM

Page 9: Source Code Tons of Code

R packages have been created that provide functionality equivalent to that of SAS and SPSS

Page 10: Source Code Tons of Code

Packages for reading and writing various file formats

- XLConnect
- XML
- rhbase
- sas7bdat
- Rcpp
- RJSONIO
- Hmisc
- RODBC / ROracle
- foreign
- RMySQL
- RWeka
- Comma-Separated Values (CSV)
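As an illustration of the simplest case on this slide, the sketch below round-trips a small data frame through a CSV file using only base R (`write.csv` and `read.csv`). The data frame and its column names are made up for the example; the packages listed on the slide cover other formats and are not needed here.

```r
# Write a small, made-up data frame to a temporary CSV file
# and read it back with base R.
csv_path <- tempfile(fileext = ".csv")

scores <- data.frame(
  id    = 1:3,
  score = c(90.5, 72, 88)
)

write.csv(scores, csv_path, row.names = FALSE)  # omit the row-name column
scores_back <- read.csv(csv_path)

print(scores_back)
unlink(csv_path)  # clean up the temporary file
```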

Page 11: Source Code Tons of Code

R Being Integrated Into Other Data-Related Products

Oracle R Enterprise (ORE)

https://blogs.oracle.com/R/

http://help.sap.com/hana/hana_dev_r_emb_en.pdf

http://www-142.ibm.com/software/products/us/en/spss-stats-developer/

“Both R and SAS are here to stay, and finding ways to make them work better with each other is in the best interests of our customers.”

http://support.sas.com/rnd/app/studio/Rinterface2.html

Page 12: Source Code Tons of Code

R “Machine Learning” Libraries

| Analytic Technique | R Package/Library | Author | Organization |
|---|---|---|---|
| Support Vector Mach. | libsvm (ksvm) | Chih-Chung Chang, Chih-Jen Lin | National Taiwan Univ. + EBay Research Labs |
| Neural Networks | neuralnet | Frauke Gunther, Stefan Fritsch | Epidemiology and Prevention Research |
| Neural Networks | nnet | Brian Ripley | University of Oxford |
| Neural Networks | monmlp | Alex J. Cannon | Atmospheric Science |
| Randomized Forests | randomForest | Fortran original by Leo Breiman & Adele Cutler, R port by Andy Liaw and Matthew Wiener | Merck |
| Decision Trees | rpart | Terry M Therneau and Beth Atkinson. R port by Brian Ripley. | Mayo Clinic; University of Oxford |
| Boosting Model | ada | Mark Culp | West Virginia University |
| Maximum Entropy | maxent | Yoshimasa Tsuruoka, Timothy Jurka | University of Tokyo, UC-Davis |
| Bagging, bootstrap | adabag | Esteban Alfaro-Cortes | La Universidad de Castilla-La Mancha |
| Latent Dirichlet Allocation | slda | Jonathan Chang | Facebook |
| Naïve Bayes | e1071 | David Meyer, Evgenia Dimitriadou | Vienna University |
| Bayesian Network | bnlearn | Marco Scutari | UCL Genetics Institute |
| Hidden Markov | HiddenMarkov | David Harte | Statistics Research |
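To give a feel for how one of these libraries is used, here is a hedged sketch that fits a decision tree with rpart on the built-in iris data. It assumes rpart is installed (it usually ships with R as a recommended package) and skips the fit otherwise. The dataset and formula are chosen for illustration and do not come from the slides, and accuracy is measured on the training data only.

```r
# Fit a classification tree with rpart, if the package is available.
if (requireNamespace("rpart", quietly = TRUE)) {
  # Predict the species from all other columns of iris.
  fit <- rpart::rpart(Species ~ ., data = iris, method = "class")

  # Training-set predictions; this is not a held-out evaluation.
  predicted <- predict(fit, newdata = iris, type = "class")
  accuracy  <- mean(predicted == iris$Species)
  print(accuracy)
} else {
  fit <- NULL
  message("rpart is not installed; skipping the example.")
}
```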

Page 13: Source Code Tons of Code
Page 14: Source Code Tons of Code

Industries / Organizations Creating and Using R

| Industry | Pct. |
|---|---|
| Research | 24% |
| Higher Education | 7% |
| Information Technology | 9% |
| Computer Software | 7% |
| Financial Services | 6% |
| Banking | 2% |
| Pharmaceuticals | 4% |
| Biotechnology | 4% |
| Market Research | 3% |
| Management Consulting | 3% |
| Total | 69% |

Hadley Wickham

Asst. Professor of Statistics at Rice University

ggplot2, plyr, reshape, rggobi, profr

Page 15: Source Code Tons of Code

Top 100 R packages for 2013 (Jan-May)

| Rank | Package | Title | Downloads |
|---|---|---|---|
| 1 | plyr | Tools for splitting, applying and combining data | 84049 |
| 2 | digest | Create cryptographic hash digests of R objects | 83192 |
| 3 | ggplot2 | An implementation of the Grammar of Graphics | 82768 |
| 4 | colorspace | Color Space Manipulation | 81901 |
| 5 | stringr | Make it easier to work with strings | 77658 |
| 6 | RColorBrewer | ColorBrewer palettes | 66783 |
| 7 | reshape2 | Flexibly reshape data: a reboot of the reshape package | 64911 |
| 8 | zoo | S3 Infrastructure for Regular and Irregular Time Series | 60844 |
| 9 | proto | Prototype object-based programming | 59043 |
| 10 | scales | Scale functions for graphics | 58369 |
| 11 | car | Companion to Applied Regression | 57453 |
| 12 | dichromat | Color Schemes for Dichromats | 56624 |
| 13 | gtable | Arrange grobs in tables | 54431 |
| 14 | munsell | Munsell colour system | 53183 |
| 15 | labeling | Axis Labeling | 51877 |
| 16 | Hmisc | Harrell Miscellaneous | 47836 |
| 17 | rJava | Low-level R to Java interface | 47731 |
| 18 | mvtnorm | Multivariate Normal and t Distributions | 46884 |
| 19 | bitops | Bitwise Operations | 45689 |
| 20 | rgl | 3D visualization device system (OpenGL) | 41001 |

http://www.r-statistics.com/2013/06/top-100-r-packages-for-2013-jan-may/

Page 16: Source Code Tons of Code

Beginner

- stats, graphics (both built-in)

Some Coverage

- Data Management: plyr, reshape
- Graphics: ggplot2

Specialized “Domain”

- Bayesian
- DifferentialEquations
- Econometrics
- Environmetrics
- ExperimentalDesign
- Finance
- Genetics
- HighPerformanceComputing
- MachineLearning
- MedicalImaging
- NaturalLanguageProcessing
- Pharmacokinetics
- Phylogenetics
- Psychometrics
- SocialSciences
- Spatial
- TimeSeries

Page 17: Source Code Tons of Code

Visualization and Reporting

Easy to Use

Interactive Standard Visualizations

Steep Learning Curve

Page 18: Source Code Tons of Code
Page 19: Source Code Tons of Code

The R Graphics Package

Graphing Parameters

- Titles
- X-Axis Title
- Y-Axis Title
- Legend
- Scales
- Color
- Gridlines

library(help="graphics")

Basic Chart Types
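The graphing parameters listed on this slide map onto arguments of base `plot()` and a few helper functions. The sketch below is one possible illustration using the built-in mtcars data; the dataset, colours, and axis limits are assumptions made for the example. The plot is written to a temporary PDF so the code also runs without a display; drop the `pdf()` and `dev.off()` calls to draw on screen.

```r
# Draw a scatterplot that sets each parameter named on the slide.
plot_file <- tempfile(fileext = ".pdf")
pdf(plot_file)

is_manual <- mtcars$am == 1  # in mtcars, am = 1 means manual transmission

plot(mtcars$wt, mtcars$mpg,
     main = "Fuel economy vs. weight",   # title
     xlab = "Weight (1000 lbs)",         # x-axis title
     ylab = "Miles per gallon",          # y-axis title
     xlim = c(1, 6), ylim = c(10, 35),   # scales
     col  = ifelse(is_manual, "steelblue", "firebrick"),  # color
     pch  = 19)
grid()                                   # gridlines
legend("topright",                       # legend
       legend = c("Manual", "Automatic"),
       col = c("steelblue", "firebrick"), pch = 19)

invisible(dev.off())
```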

Page 20: Source Code Tons of Code

ggplot2

Grammar of Graphics

In ggplot2 a plot is made up of layers.

Plot

Layer

- Data
- Mapping
- Geom
- Stat
- Position

Scale

Coord

Facet
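The plot structure above can be spelled out explicitly with ggplot2's `layer()` function, which takes the five layer components as arguments. This is a sketch only: it assumes ggplot2 is installed (and is skipped otherwise), and the mtcars data and the chosen scale, coord, and facet are illustrative rather than taken from the slides.

```r
# Build a plot from one explicit layer plus scale, coord, and facet.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)

  p <- ggplot() +
    layer(
      data     = mtcars,                   # Data
      mapping  = aes(x = wt, y = mpg),     # Mapping
      geom     = "point",                  # Geom
      stat     = "identity",               # Stat
      position = "identity"                # Position
    ) +
    scale_x_continuous(name = "Weight (1000 lbs)") +  # Scale
    coord_cartesian() +                                # Coord
    facet_wrap(~ cyl)                                  # Facet

  # print(p)  # uncomment to render the plot
} else {
  p <- NULL
  message("ggplot2 is not installed; skipping the example.")
}
```

In everyday use the shortcut `geom_point()` creates the same kind of layer with these stat and position defaults filled in.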

Page 21: Source Code Tons of Code

Correlations Matrix

library(car)
scatterplotMatrix(h)

Page 22: Source Code Tons of Code

The scatterplotMatrix() function in the car package is built on top of the pairs() function from base R graphics
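A hedged sketch of a scatterplot matrix follows. The slides do not define `h`, so a few numeric columns of the built-in mtcars data stand in for it here. Base `pairs()` always runs; `car::scatterplotMatrix()` is called only if car is installed. Output goes to a temporary PDF so the example runs without a display.

```r
# Stand-in for the undefined data frame `h` from the slides.
h <- mtcars[, c("mpg", "disp", "hp", "wt")]

matrix_file <- tempfile(fileext = ".pdf")
pdf(matrix_file)

# Base R scatterplot matrix.
pairs(h, main = "Scatterplot matrix (base pairs)")

# Enhanced version from car, if the package is available.
if (requireNamespace("car", quietly = TRUE)) {
  car::scatterplotMatrix(h)
}

invisible(dev.off())

# The matching numeric correlation matrix.
correlations <- round(cor(h), 2)
print(correlations)
```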

Page 23: Source Code Tons of Code
Page 24: Source Code Tons of Code

The next data visual was produced with about 150 lines of R code

Page 25: Source Code Tons of Code

http://rcharts.io/gallery/

Page 26: Source Code Tons of Code

https://plot.ly/r/

Page 28: Source Code Tons of Code

Additional Resources

• http://statmethods.net/ (good documentation and sample code)

• http://stackoverflow.com/ (helpful for troubleshooting code)

• http://www.r-bloggers.com/ (helpful for hearing about new things)