![Page 1: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/1.jpg)
An introduction to Microsoft R Services
Microsoft R Open and Microsoft R Server
498 – Show and Tell Gregg Barrett
![Page 2: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/2.jpg)
Introduction
This presentation will briefly cover the following:
- Why consider MRO and R Server
- R Server
- MRO
- Microsoft R Services/R Server Platform
- DistributedR
- RevoScaleR/ScaleR
- ConnectR
- DevelopR
- DeployR
- Resources
- References
![Page 3: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/3.jpg)
Why consider MRO and R Server
- You keep the option of working with standard R while gaining the added benefits of Microsoft R Open (MRO) and R Server
- Performance
- MRO is FREE
- R Server is FREE, at least for students, through DreamSpark
![Page 4: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/4.jpg)
Why consider MRO and R Server
(Gartner, 2015)
![Page 5: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/5.jpg)
Why consider MRO and R Server
Definition: Originally released in 1993, R is a mature, domain-specific and open-sourced language for statistical analysis workloads.
Trend Analysis: Gartner client inquiry levels for R remain light and range from exploratory to best-practice adopter themes; however, like MATLAB, the number of inquiries has increased substantially in recent years. External data sources reflect a growth in R usage across the industry as well. We expect inquiry levels to increase consistently through 2017.
Time to Next Market Phase: 2 to 5 years
Business Impact: The significant impact of "big data" analytics and real-time data analysis is driving demand for languages such as R and MATLAB beyond previous entrenched market niches and into increasingly mainstream programming workloads. In particular, adopters are turning to R as a free alternative to platforms such as SAS and SPSS.
User Advice: Consider R as a free and open-source solution for workloads that require advanced statistical computing or data mining capabilities with minimal coding and optimal maintenance costs over more general-purpose languages.
Sample Vendors: Microsoft, Oracle, TIBCO Software, IBM, Wolfram Research (Gartner, 2015)
![Page 6: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/6.jpg)
Why consider MRO and R Server
(Microsoft, 2016)
![Page 7: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/7.jpg)
R Server
- Revolution R Enterprise (RRE) was developed by Revolution Analytics
- RRE is intended to offer a fast, cost-effective, enterprise-class big data analytics platform
- Revolution Analytics was acquired by Microsoft
- RRE is now Microsoft R Server
- R Server is free for students and can be obtained through DreamSpark
- Log on or create a profile at DreamSpark using your university credentials: https://www.dreamspark.com/Product/Product.aspx?productid=105
![Page 8: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/8.jpg)
MRO
- RRE uses an R engine called Revolution R Open
- The Revolution R Open engine is now called Microsoft R Open (MRO)
- MRO is intended to be an enhanced distribution of open source R from Microsoft Corporation. Specifically, Microsoft R Open leverages high-performance, multi-threaded math libraries to deliver performance boosts. This means that functions in R that use, for example, matrix multiplication will run faster out of the box.
- Just like R, Microsoft R Open is open source and free
- You can download MRO here: https://mran.revolutionanalytics.com/download/
- MRO is intended to support a variety of big data statistics, predictive modelling, and machine learning capabilities
- At the time of this writing, the latest version of MRO is version 3.2.5
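A rough way to see the effect of the multi-threaded math libraries is to time a dense matrix multiplication in both standard R and MRO and compare the results. The sketch below uses only base R; the matrix size `n` and the variable names are illustrative choices rather than anything from the presentation, and actual timings will depend on the hardware and the installed math library.

```r
# Time a dense matrix multiplication using base R only.
set.seed(42)                          # reproducible random input
n <- 500                              # illustrative matrix dimension
A <- matrix(rnorm(n * n), nrow = n)
B <- matrix(rnorm(n * n), nrow = n)

# %*% is handed to the underlying BLAS library, which is what MRO swaps out
timing <- system.time(C <- A %*% B)
elapsed <- timing[["elapsed"]]
cat(sprintf("%d x %d multiply took %.3f seconds\n", n, n, elapsed))
```

Running the same script under each R installation gives a simple like-for-like comparison.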
![Page 9: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/9.jpg)
MRO
- It is important to note that R Server uses a different version of MRO
- At the time of this writing, the latest version of MRO for R Server is version 3.2.2
- MRO for R Server can be found here: https://mran.revolutionanalytics.com/download/mro-for-mrs/
- MRO for R Server is a prerequisite for R Server
- After downloading and installing MRO (either the standard version or the version for R Server), download and install MKL
- MKL is the Intel Math Kernel Library
- Important: Install Microsoft R Open before installing MKL
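Once both are installed, a quick check from the R console can confirm that MKL is being picked up. This is a minimal sketch that assumes the `RevoUtilsMath` package, which accompanies MRO with MKL, is what exposes the thread count; the `has_mkl` variable is an illustrative name. In standard R the package is absent, so the check simply reports that.

```r
# RevoUtilsMath accompanies MRO with MKL; it is not part of standard R,
# so test for it before calling anything from it.
has_mkl <- requireNamespace("RevoUtilsMath", quietly = TRUE)

if (has_mkl) {
  # Number of threads MKL will use for math operations
  cat("MKL threads in use:", RevoUtilsMath::getMKLthreads(), "\n")
} else {
  cat("RevoUtilsMath not found; MKL does not appear to be installed for this R.\n")
}
```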
![Page 10: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/10.jpg)
Microsoft R Services/R Server Platform
Note: Following the Microsoft acquisition, several product names have changed and the “Revo” designation is being dropped, which can make the naming a little harder to follow.
![Page 11: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/11.jpg)
Microsoft R Services/R Server Platform
Microsoft R Services is positioned as R for the Enterprise.
The feature set provided by the Microsoft R Services software can be categorized as follows:
- Microsoft R Open: High-performance math libraries installed on top of a stable version of open source R
- DistributedR: Parallel and distributed computing framework for Big Data analytics
- RevoScaleR/ScaleR: High-performance, scalable, parallelized, and distributable Big Data analytics in R
- ConnectR: Data connections for Big Data analytics
- DevelopR: An integrated development environment (IDE) for R on Windows
- DeployR: A web services software development kit for integrating R with third-party products (including business intelligence, data visualization, rules engines, etc.)
![Page 12: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/12.jpg)
DistributedR
DistributedR allows you to run the same R script on multiple platforms: you can create a model in one environment, such as a workstation, and then deploy it in a different environment, such as an on-site Microsoft SQL Server, a Teradata platform, or a Hadoop cluster in the cloud. You only need to specify where the computations should be performed and what data should be analyzed.
For information on supported computing environments, look for the “compute contexts” in the RevoScaleR package.
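As a minimal sketch of what specifying "where the computations should be performed" looks like, the snippet below sets and inspects the compute context. It assumes an R Server installation where the RevoScaleR package is available and is guarded so that it does nothing in standard R; the `has_revoscaler` variable is an illustrative name.

```r
# RevoScaleR ships with R Server, not standard R, so check for it first.
has_revoscaler <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revoscaler) {
  library(RevoScaleR)

  # Run subsequent "rx" analysis functions on the local machine
  rxSetComputeContext("local")
  print(rxGetComputeContext())

  # Other contexts (for example RxHadoopMR or RxInSqlServer objects) need
  # connection details specific to your environment, so none are shown here.
}
```

The analysis code itself stays the same when the context changes; only the context object passed to `rxSetComputeContext` differs.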
![Page 13: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/13.jpg)
RevoScaleR
The RevoScaleR/ScaleR package provides efficient, scalable computational power and allows for the development of ready-to-deploy suites of data processing and analytics with R.
To learn more, look for the RevoScaleR “rx” analysis and data manipulation functions and “rxExec” for HPC functionality. If you are computing decision trees, also check out the included RevoTreeView package that allows you to interactively visualize your decision trees.
Or run the following at the R command line: ?RevoScaleR
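For a first taste of the "rx" analysis functions, the sketch below fits a linear model with `rxLinMod`. It assumes an R Server installation with RevoScaleR and uses the `AirlineDemoSmall.xdf` sample file that accompanies the package; the choice of model formula is purely illustrative. The guard means the block does nothing in standard R.

```r
# RevoScaleR ships with R Server, not standard R, so check for it first.
has_revoscaler <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revoscaler) {
  library(RevoScaleR)

  # Sample data directory installed alongside RevoScaleR
  airline <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.xdf")

  # Linear model of arrival delay by day of week, computed chunk by chunk
  fit <- rxLinMod(ArrDelay ~ DayOfWeek, data = airline)
  print(summary(fit))
}
```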
![Page 14: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/14.jpg)
ConnectR
The RevoScaleR package provides a way for you to connect with the data you may have stored in a variety of formats: SAS, SPSS, Teradata, ODBC, delimited and fixed format text, and Hadoop Distributed File System (HDFS) text files. You have a choice of:
1. keeping the data as is and analyzing it directly with RevoScaleR analysis functions,
2. extracting the data you want to analyze and storing it in the efficient and higher performance .xdf file format provided with the RevoScaleR package, or
3. bringing some or all of your data into memory as an R data frame to use with any R analysis function.
To learn more, look for data sources in the RevoScaleR package.
Note: The RevoScaleR package is included with every distribution of RRE/R Server, and is automatically loaded into memory when you start the program. So all of the “rx” functions mentioned are at your fingertips.
You can get information on them by using the ? at the command line, for example: ?rxLinMod
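The three data-handling choices can be sketched as follows, using a delimited text file as the source. This assumes an R Server installation with RevoScaleR and the `AirlineDemoSmall.csv` sample file that accompanies the package; the output path and variable names are illustrative. The guard means the block does nothing in standard R.

```r
# RevoScaleR ships with R Server, not standard R, so check for it first.
has_revoscaler <- requireNamespace("RevoScaleR", quietly = TRUE)

if (has_revoscaler) {
  library(RevoScaleR)
  csvPath <- file.path(rxGetOption("sampleDataDir"), "AirlineDemoSmall.csv")

  # 1. Keep the data as is: wrap the file in a data source and analyze it
  #    directly ("M" marks missing values in this sample file)
  csvSource <- RxTextData(csvPath, missingValueString = "M")
  print(rxSummary(~ ArrDelay, data = csvSource))

  # 2. Extract the data into the .xdf format for faster repeated analysis
  xdfPath <- file.path(tempdir(), "airline_example.xdf")
  xdfData <- rxImport(inData = csvSource, outFile = xdfPath, overwrite = TRUE)
  print(rxSummary(~ ArrDelay, data = xdfData))

  # 3. Bring the data into memory as a data frame for any R function
  airlineDf <- rxImport(inData = csvSource)
  print(summary(lm(ArrDelay ~ CRSDepTime, data = airlineDf)))
}
```

Option 3 is only practical when the data fits in memory; options 1 and 2 process the data in chunks.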
![Page 15: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/15.jpg)
DevelopR
Microsoft R Services provides a tool for the R developer to efficiently create sets of R scripts—the R Productivity Environment (RPE).
Working on a Windows workstation with the RPE, the R developer has a full-featured Visual Studio-like integrated development environment for R, including an indispensable visual debugger for R. The RPE has a customizable workspace, including an enhanced Script Editor, an Object Browser, a Solution Explorer, and an R Command Console.
![Page 16: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/16.jpg)
DeployR
The optional DeployR package provides the tools for integrating R analytics with other applications; it is a full-featured web services software development kit for R that allows programmers to use Java, JavaScript, or .NET to integrate R analysis output with a third-party package.
There are now Accelerators for DeployR, which are starter kits for integrating with tools including:
- Microsoft Excel
- Tableau
- Jaspersoft
- QlikView
![Page 17: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/17.jpg)
R Server User Interface
![Page 18: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/18.jpg)
Resources
R Services 2016 Getting Started Guide:
https://packages.revolutionanalytics.com/doc/8.0.0/win/MicrosoftRServices_Getting_Started.pdf
Webinar “Using Microsoft R Server to Address Scalability Issues in R”: https://channel9.msdn.com/blogs/Cloud-and-Enterprise-Premium/Using-Microsoft-R-Server-to-Address-Scalability-Issues-in-R
Task Views are guides on CRAN that group sets of R packages and functions by type of analysis, fields, or methodologies. You can browse and find packages organized by task view:
https://mran.microsoft.com/taskview/
![Page 19: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/19.jpg)
Resources
Software available to NU students:
http://www.it.northwestern.edu/software/
https://northwestern.onthehub.com/WebStore/Welcome.aspx
https://www.dreamspark.com/Student/Software-Catalog.aspx
![Page 20: Introduction to Microsoft R Services](https://reader036.vdocuments.us/reader036/viewer/2022062310/5871296f1a28abe4448b6cc3/html5/thumbnails/20.jpg)
References
Gartner. (2015). IT Market Clock for Programming Languages, 2015. [Diagram]. Retrieved from Gartner. (2015). IT Market Clock for Programming Languages, 2015. [pdf]. https://www.gartner.com/doc/3145117/it-market-clock-programming-languages
Gartner. (2015). IT Market Clock for Programming Languages, 2015. [pdf]. Retrieved from https://www.gartner.com/doc/3145117/it-market-clock-programming-languages
Microsoft. (2016). The Benefits of Multithreaded Performance with Microsoft R Open. [webpage]. Retrieved from https://mran.microsoft.com/documents/rro/multithread/
Microsoft. (2016). R Services 2016 Getting Started Guide. [pdf]. Retrieved from https://packages.revolutionanalytics.com/doc/8.0.0/win/MicrosoftRServices_Getting_Started.pdf