Quick and Dirty: Scaling Out Predictive Models Using Revolution Analytics on Teradata
DESCRIPTION
[Presentation by Skylar Lyon at DataWeek 2014, September 17, 2014.] I recently faced the task of scaling out an existing analytics process. The schedule was compressed - it always is in my world. The data was big - 400+ million rows waiting in a database. What did I do? I offered my favorite type of solution - quick and dirty. At the outset, I wasn't sure how easy it would be. Nor was I certain of the performance gains we would realize. But the concept seemed sound and the exercise fun. Let's move the compute to the data via Revolution R Enterprise for Teradata. This presentation outlines my approach to leveraging a colleague's R models as I experimented with running R in-database. Would my path lead to significant improvement? Could it be used to productionalize the workflow?
TRANSCRIPT
Rapid Productionalization of Predictive Models
In-database Modeling with Revolution Analytics on Teradata
Skylar Lyon
Accenture Analytics
Copyright © 2014 Accenture. All rights reserved. 2
• 7 years of experience with a focus on big data and predictive analytics - using discrete choice modeling, random forest classification, ensemble modeling, and clustering
• Technology experience includes: Hadoop, Accumulo, PostgreSQL, qGIS, JBoss, Tomcat, R, GeoMesa, and more
• Worked from Army installations across the nation and also had the opportunity to travel twice to Baghdad to deploy solutions downrange.
Skylar Lyon
Accenture Analytics
Introduction
• New Customer Analytics team for Silicon Valley Internet eCommerce giant
• Data scientists developing predictive models
• Deferred focus on productionalization
• Joined as Big Data Infrastructure and Analytics Lead
Project background and my involvement
How we got here
• 50+ independent variables, including categoricals expanded to indicator variables
• Train from small sample (many thousands) – not a problem in and of itself
• Scoring across entire corpus (many hundred millions) – slightly more challenging
Binomial logistic regression
Colleague's CRAN R model
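The starting point described above - a binomial logistic regression trained on a sample of a few thousand rows, then scored against a much larger set - can be sketched in plain CRAN R. The formula, column names, and data frames below are hypothetical stand-ins for the colleague's actual specification:

```r
# Sketch of the original CRAN R workflow (hypothetical data and spec).
set.seed(42)
n <- 5000
training.data <- data.frame(
  y  = rbinom(n, 1, 0.3),
  x1 = rnorm(n),
  x2 = factor(sample(c("a", "b", "c"), n, replace = TRUE))  # categorical -> indicator variables
)

spec <- "y ~ x1 + x2"

# Train on the small sample
trainit <- glm(as.formula(spec), data = training.data,
               family = "binomial", maxit = 25)

# Score: predicted probabilities for new rows
test.data <- training.data[sample(n, 100), ]
fits <- predict(trainit, newdata = test.data, type = "response")
```

Training on a few thousand rows is easy; the pain point is that `predict` has to run over the entire corpus, which in this project meant pulling 400+ million rows out of the database and through a single R session.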
We moved compute to data
We optimized the current productionalization process
Before After
Reduced 5+ hour process to 40 seconds
5+ hours to 40 seconds: the recommendation is that this now become the de facto productionalization process
Benchmarking our optimized process
[Chart: scoring time in minutes vs. number of rows, before and after optimization]
Before:
trainit <- glm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxit = iters)
fits <- predict(trainit, newdata = test.data, type = 'response')

After:
trainit <- rxGlm(as.formula(specs[[i]]), data = training.data, family = 'binomial', maxIterations = iters)
fits <- rxPredict(trainit, newdata = test.data, type = 'response')
Recode CRAN R to Rx R
Optimization process
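Beyond the one-for-one function swap (glm to rxGlm, predict to rxPredict), moving the compute to the data means pointing RevoScaleR at the Teradata appliance via a compute context and in-database data sources. The sketch below shows the general shape; the connection string, share directories, and table names are hypothetical placeholders, and exact RxInTeradata arguments vary by Revolution R Enterprise version:

```r
library(RevoScaleR)

# Hypothetical connection details - replace with your appliance's values
tdConnString <- "DRIVER=Teradata;DBCNAME=tdhost;UID=user;PWD=pw"

# In-database compute context: the model runs on the Teradata nodes
tdCompute <- RxInTeradata(
  connectionString = tdConnString,
  shareDir         = "/tmp/revoShare",   # local scratch directory
  remoteShareDir   = "/tmp/revoShare",   # scratch directory on the appliance
  wait             = TRUE
)
rxSetComputeContext(tdCompute)

# Data stays in the warehouse; only the model specification travels
corpus  <- RxTeradata(table = "all_customers", connectionString = tdConnString)
trainit <- rxGlm(y ~ x1 + x2, data = corpus, family = 'binomial')

# Write scores back to a table rather than pulling rows into R
rxPredict(trainit, data = corpus,
          outData = RxTeradata(table = "customer_scores",
                               connectionString = tdConnString))
```

The key design point is that scoring 400+ million rows never leaves the appliance, which is where the 5+ hours to 40 seconds reduction comes from.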
• Train in-database on much larger set – reduces need to sample
• Nearly “native” R language – decreases deploy time
• Hadoop support – score in multiple data warehouses
Technology is increasing data science team’s options and opportunities
Additional benefits to new process
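The Hadoop support mentioned above follows the same pattern: because RevoScaleR separates the script from the compute context, the same rxGlm/rxPredict code can be retargeted at a Hadoop cluster by swapping contexts. A minimal sketch, with hypothetical host names and share paths:

```r
library(RevoScaleR)

# Hypothetical Hadoop settings - the modeling code itself is unchanged
hadoopCompute <- RxHadoopMR(
  sshUsername  = "analyst",
  sshHostname  = "hadoop-edge",              # edge node of the cluster
  shareDir     = "/var/RevoShare/analyst",   # local share directory
  hdfsShareDir = "/user/RevoShare/analyst"   # HDFS share directory
)
rxSetComputeContext(hadoopCompute)
# ... same rxGlm / rxPredict calls as in the Teradata version ...
```

This is what lets one data science team score in multiple data warehouses without maintaining parallel codebases.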
• Technical Considerations
Table of Contents
Appendix
• Teradata environment – 4-node, 1700-series appliance
• Revolution R Enterprise – version 7.1, running R 3.0.2
Environment setup
Technical considerations