optimizing your analytics life cycle with sas & teradata
TRANSCRIPT
Optimizing Your Analytics Life Cycle with SAS & Teradata
Paul Segal - TeradataJune 2017
2
• The Analytic Life Cycle
• Common Problems
• SAS & Teradata solutions
• Agenda
3
Analytical Life Cycle
3
PreparationPrepare Data for
Analytics
ExplorationExplore All Your Data
DeploymentDeliver
Results to Business
DevelopmentBuild Analytic
Models
4
Analytical Life Cycle
TEXT
PREPARATION DEVELOPM
ENT
DEPLOYMENTEXPL
ORA
TION
• what the data looks like• what variables are in the data set• whether there are any missing
observations• how are the data related• what are some of the data patterns
• combining data from numerous sources• handling inconsistent or non-standardized
data• cleaning dirty data• integrating data that was manually entered• dealing with semi-structured and structured
data
• customer retention• customer attrition/churn• marketing response• consumer loyalty and offers• fraud detection• credit scoring• risk management
• the probability of responding to a particular promotional offer
• the risk of an applicant defaulting on a loan• the propensity to pay off a debt• the likelihood a customer leave/churn• the probability to buy a product
Stage 2
Stage 1 Stage 3
Stage 4
5
Common Problems
5
6
Typical Customer Challenges
• “ My analytical process runs too slowly* ”
• “ We spend too much time moving data around between systems ”
• “ I want to make more/better use of my EDW/data platform”
• “ It takes forever to extract and score the data”
• “ There is too much data for us to analyse ”
• “ We can’t buy new hardware”
• “The quality of our analytical models is lower because we sample – I worry we are missing valuable segments”
* scoring / analysis / data quality / data transformation
7
LINE OF BUSINESS
ADW
EDW
PROGRAM MANAGER
FINANCE
SUPPLYCHAIN
I.T.
HR
The Analytic Data Warehouse
8
Data Management: The crux of the issue facing your analysts
BUSINESS PROBLEM
BUSINESS DECISION
20%80%Preparing to
solve the problemSolving
the problem
9
Data Management: SAS & Teradata working to change the Equation
BUSINESS PROBLEM
BUSINESS DECISION
20% 80%Preparing to solve
the problem
Solving the problem
10
Barriers to the Adoption of Analytics
Scarcity of analytical skillsThe need to grow analytical talent from within
Disjointed, inefficient workflowHow can you fail fast & learn to refine quickly
Tools that aren’t right for the jobLearning curve to create, share and collaborate
11
SAS & Teradata Solutions
11
12
Analytical Life CycleDeploymentPreparation
a
Exploration Development
• ACCESS to Teradata
• Code Accelerator
• Data Quality Accelerator
• Data Set Builder for SAS
CUSTOMERCUSTOMER NUMBERCUSTOMER NAMECUSTOMER CITYCUSTOMER POSTCUSTOMER STCUSTOMER ADDRCUSTOMER PHONECUSTOMER FAX
ORDERORDER NUMBERORDER
DATE
STATUSORDER ITEM BACKORDERED
QUANTITY
ITEMQUANTITYDESCRIPTION
ORDER ITEM SHIPPEDQUANTITYSHIP DATE
• Analytics Accelerator for Teradata
• SAS High-Performance Analytics Products
• TD Appliance for SAS
• Visual Analytics &Visual Statistics
• Teradata Appliance for SAS
• SAS Scoring Accelerator
• SAS Model Manager
• Teradata Appliance for SAS
13
DEPLOYMENT FLEXIBILITY: DESKTOP - SERVER - IN-DATABASE - IN-MEMORY - CLOUD
ARCHITECTURE FLEXIBILITY: SMP - MPP - HADOOP - GRID - ESP
Information Management
Data Mining / Anomaly Detection
Text Analytics Predictive Analytics
High Performance
AnalyticsFraud & Risk Detection
Reporting & Visualization
Integrated End-to-End Foundation
SAS Analytics in Teradata
14
In-Database Functionality
• PROC APPEND• PROC CONTENTS• PROC COPY• PROC DATASETS• PROC DELETE• PROC FORMAT• PROC FREQ• PROC MEANS• PROC PRINT• PROC RANK• PROC REPORT• PROC SORT• PROC SQL• PROC SUMMARY• PROC TABULATE
Statistical Analysis Procedures:
SAS Enterprise Miner
• PROC CANCORR• PROC CORR• PROC FACTOR• PROC PRINCOMP• PROC REG• PROC SCORE• PROC TIMESERIES• PROC VARCLUS
• PROC DMDB• PROC DMINE• PROC DMREG (Logistic Regression)• Also nodes for Input, Sample, Partition, Filter, Merge, Expand
SAS Analytics Accelerator for Teradata
• PROC SCORE works with coefficients from:
• PROC ACECLUS• PROC CALIS• PROC CANDISC• PROC DISCRIM• PROC FACTOR• PROC PRINCOMP• PROC TCALIS• PROC VARCLUS• PROC ORTHOREG• PROC QUANTREG• PROC REG• PROC ROBUSTREG
•Match code•Parsing/Casing•Gender/Pattern/Identification analysis•Standardization
SAS/Access to Teradata:• PROC DS2
SAS Code Accelerator for Teradata SAS Scoring Accelerator for Teradata• EM/STAT* Models
DQ Accelerator for Teradata
15
In-Database Example: Data Quality
• SAS Data Quality functions ported to operate in-database
• Invoked as SQL Stored Procedures• Processing leverages parallelism of
underlying RDBMS• Significant performance/throughput
benefits
• Supported functions: • Matchcode generation• Parsing• Standardization• Casing• Pattern analysis• Identification analysis • Gender analysis
16
Exploration and modeling – what does it bring?Influence Relationships Contributions
Data driven explorationEliminating guesswork with predictive analytics
Increased understanding
17
SAS Visual Analytics & Visual Statistics
SAS Visual Analytics • Data exploration and discovery • Distribution and summary statistics• Post-model analysis and reporting
SAS Visual Statistics • Build prediction and classification
models• Refine candidate models• Compare models and generate score
code
18
• Analyze 100% of data• More/New variables• More model iterations• Manage complex models• More models (per domain area)• More questions/ideas/scenarios to
evaluate• Multiple deployment options: batch,
real-time• Continuously monitor model
effectiveness and retrain
SAS High-Performance Analytics
19
SAS High-Performance Analytics Procedures
SAS High-Performance Statistics
SAS High-Performance Econometrics
SAS High-Performance Optimization
SAS High-Performance Data Mining
SAS High-Performance Text Mining
SAS High-Performance Forecasting
HPCANDISCHPFMMHPPLSHPPRINCOMPHPQUANTSELECTHPLOGISTICHPREGHPLMIXEDHPNLINHPSPLITHPGENSELECT
HPCDMHPCOPULAHPPSNELHPCOUNTREGHPSEVERITYHPQLIM
OPTLSOSelect features inOPTGRAPHOPTMILPOPTLPOPTMODEL
HPBNETHPCLUSHPSVMHPTSDRHPREDUCEHPNEURALHPFORESTHP4SCOREHPDECIDE
HPTMINEHPTMSCORE
HPFORECAST
The above offers contain these standard PROCS: HPDS2, HPDMDB, HPSAMPLE, HPSUMMARY, HPIMPUTE, HPBIN, HPCORR
20
HPA Example
SAS Analyst’s Desktop
Traditional SAS Server
SAS HPA Environment
21
Platforms for Optimal SAS Performance
Supports SAS in-memory analytics tools used to visualize data and build data models, giving them direct access to data within the Active Data Warehouse
Enables SAS in-database analytics to clean and prepare data as well as deploy and score models without having to move the data.
Provides a cost effective, preconfigured data storage and staging area that makes data available for SAS analytics
Teradata Active Data Warehouses
Teradata Appliance for SAS
Teradata Appliancefor Hadoop
22
A Fully Contained SAS Ecosystem in One Box
Data Persistent
Teradata Appliance for SAS
Teradata 2800 Appliance with SAS Analytics
A fully contain SAS Analytics Ecosystem in a single box from Teradata that integrates the SAS server directly into the cabinet with your data
SASManaged
Server from Teradata
TERADATA
23
Customer Success Story
• Projects involving high-end analytics were placing more and more demand on IT and Business Analytics teams
• Needed to streamline the analytics workflow
• Needed a solution that provided scalability and data consistency
• Process could not support additional analytics requests without adding headcount
• Replaced desktop tools with SAS server based HPA tools such as Enterprise Miner, Scoring Accelerator, Model Manager, VA & Data Mining
• Added Teradata Appliance for SAS for production and development of data models
• Expanded EDW • Added Data Labs
Environment to
• Improved the speed and performance of analytics
• Streamlined data preparation, evaluation and testing
• Added additional analytics capacity without requiring the addition of new headcount
• Freed up IT time used for data collection, loading and prep
Leveraging SAS and Teradata for advanced analytics to analyze customer data and loyalty programs with in-database and in-memory technologiesIssues Solutions Impact
Large US Retailer
24
• Minimize the need to move the data• Faster modeling times (months/weeks to
hours/minutes)• Improve data quality, availability and
consistency• Work with entire data sets, including enabling an
end-to-end view of data from across the enterprise
• Free up staff to focus more time on value-adding activities
Summary
25
• Data modeling development with SAS Enterprise Miner and Teradata
• High Performance Analytics with Teradata
• Scoring data in Teradata with SAS Scoring Accelerator
• Demo – Modeling and Deployment