Modernizing your analytics environment
Toronto Data Science Forum, November 13, 2019
Toronto Data Science Forum | Modernizing Your Analytics Environment
© 2019 Deloitte Touche Tohmatsu Limited
Slide 2 of 33
Agenda
01 Business problem
02 Our approach – SAS Viya
03 Demo
04 Conclusion
05 Q&A
Slide 3 of 33
AI & Analytics trends
Data generated doubles every 2 years (timeline shown: 2000 to 2020)
The total amount of worldwide data will be 40 zettabytes by 2020
What this means for business:
97.2% of organizations are investing in big data and AI.
By 2020, more than 40% of data science tasks will be automated.
Most companies only analyze 12% of the data they have.
76.5% of AI initiatives are empowered by the greater availability of data.
Slide 4 of 33
Business problem
Explore the potential of data by using machine learning models to enable a faster and more accurate loan underwriting process while reducing default risk.
Business Objectives: Increase Revenue, Manage Risk, Improve Experience
• Onboarding new customers, retaining better customers, reducing fraud
• Tapping into non-traditional data sources for improved segmentation
• Improving employee productivity
• Allowing modelers to model in their language of choice
• Creating repeatable processes that can scale
• Providing the stability of enterprise-level software solutions
Modernize your analytics environment
– with the credit risk use case
Slide 6 of 33
Our approach
KAGGLE DATASET
For the purpose of this demonstration, we are leveraging the public “Home Credit Default Risk” dataset from Kaggle.
SAS SOLUTIONS
We used SAS solutions exclusively, including SAS Viya, SAS Visual Data Mining and Machine Learning, SAS Studio, and the old-school Base SAS and Enterprise Guide.
DELOITTE EXPERTISE
Though we are demonstrating the Kaggle case, we will talk about Deloitte’s past experiences on the same subject matter.
Gathering Data → Exploring Data → Preparing Data → Choosing a model → Training a model → Evaluation → Hyperparameter Tuning → Deployment
Slide 7 of 33
Traditional Scorecarding or simple regression
Gathering data
Credit Decision
Alternative Data
#Age
#Income
#Debt Ratio
#Length of time employed
#Credit score
#Loan size
#Loan terms and conditions
#Certain public records
…
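A traditional scorecard of this kind often comes down to a logistic regression over a handful of applicant attributes. The sketch below is a minimal illustration only: the coefficients, feature names, scaling, and cutoff are all hypothetical assumptions and do not come from the presentation.

```python
import math

# Hypothetical coefficients for illustration only; a real scorecard
# would estimate these from historical loan performance data.
INTERCEPT = -2.0
COEFFICIENTS = {
    "debt_ratio": 3.0,            # higher debt ratio -> higher default risk
    "years_employed": -0.10,      # longer employment -> lower risk
    "credit_score_scaled": -1.5,  # assumed scaling: (score - 650) / 100
}

def default_probability(applicant):
    """Logistic regression: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    z = INTERCEPT + sum(COEFFICIENTS[name] * applicant[name] for name in COEFFICIENTS)
    return 1.0 / (1.0 + math.exp(-z))

def credit_decision(applicant, cutoff=0.20):
    """Approve when the estimated default probability is below the cutoff."""
    return "approve" if default_probability(applicant) < cutoff else "decline"

# Two hypothetical applicants.
low_risk = {"debt_ratio": 0.15, "years_employed": 8, "credit_score_scaled": 1.0}
high_risk = {"debt_ratio": 0.60, "years_employed": 1, "credit_score_scaled": -1.0}

print(credit_decision(low_risk), credit_decision(high_risk))
```

Each coefficient can be read directly as the direction and strength of one attribute's effect, which is why this style of model is easy to explain.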
Slide 8 of 33
Alternative data in credit risk use case
Gathering data
PAYMENTS OF BILLS AND OTHER OBLIGATIONS
Examples of alternative data include a consumer’s payment history on
items not included in a traditional credit report, such as rent, utilities,
cell phone bills from certain providers, or other bills.
LOAN DATA FROM SPECIALTY BUREAUS
Examples of alternative data include the duration and payment
frequency of payday loans, rent-to-own agreements, and short-term
installment loans.
BANK ACCOUNT AND TRANSACTION DATA
A wealth of insights can be gleaned from a consumer’s transaction
data within a bank account, including the size and frequency of
income and the magnitude and types of outflows.
OTHER DATA
In addition, lenders may consider data that is not as closely tied to
financial behavior, such as educational background, occupation,
social media, and customer reviews for business borrowers.
Slide 9 of 33
Kaggle Home Credit dataset
Gathering data
Slide 10 of 33
Exploratory Data Analysis
Exploring the data
Common graphical tools include histograms, scatter plots, bar charts, and stem-and-leaf plots.
There are also more modern graphical tools, such as heat maps and word clouds, which scale
well to large data sets.
Numerical summary methods are also used to explore data. These include summary statistics
for measures of central tendency, such as the mean, median, or mode. Numeric measures of
variability, such as variance, standard deviation, range, or interquartile range, are also used to
explore data.
“Exploratory data analysis refers to the critical process of performing
initial investigations on data so as to discover patterns, to spot
anomalies, to test hypotheses and to check assumptions with the
help of summary statistics and graphical representations.”
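The numerical summaries listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on a hypothetical income sample (the values are invented for illustration and are not from the Home Credit dataset).

```python
import statistics

# Hypothetical sample of applicant incomes (in thousands); illustrative only.
incomes = [32, 45, 45, 51, 58, 62, 75, 90, 120, 250]

# Measures of central tendency.
mean = statistics.mean(incomes)
median = statistics.median(incomes)
mode = statistics.mode(incomes)

# Measures of variability.
variance = statistics.variance(incomes)   # sample variance
std_dev = statistics.stdev(incomes)       # sample standard deviation
value_range = max(incomes) - min(incomes)
q1, q2, q3 = statistics.quantiles(incomes, n=4)  # quartile cut points
iqr = q3 - q1

print(mean, median, mode, value_range, iqr)
```

In this sample the mean (82.8) sits well above the median (60), which hints at a right-skewed distribution driven by the single large value.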
Slide 11 of 33
Exploring the data
Summary statistics – Data Profile
Slide 12 of 33
Exploring the data
Graphical representations – Data Exploration Node
Categorical variables
Numerical variables
Slide 13 of 33
Feature engineering
Preparing the data
Below are a few feature engineering examples that are commonly used by practitioners:
• Creating a new variable.
• Numeric encoding for high-cardinality nominal variables such as zip code.
• Normalizing, binning, log transformation for interval variables.
• Transformations based on missingness patterns.
• Dimension reduction techniques such as autoencoders, principal component analysis (PCA),
t-Distributed Stochastic Neighbor Embedding (t-SNE), singular value decomposition (SVD).
“In predictive modeling tasks, data scientists consistently report that
they spend most of their time on feature engineering.”
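Three of the feature engineering examples listed on this slide (log transformation, binning, and numeric encoding of a high-cardinality nominal variable) can be sketched in a few lines. The records, bin edges, and the choice of target-mean encoding are assumptions made for illustration; the presentation does not specify which encoding was used.

```python
import math
from collections import defaultdict

# Hypothetical records: (zip_code, income, defaulted). Illustrative only.
records = [
    ("M5V", 42000, 0), ("M5V", 58000, 1), ("M4C", 31000, 1),
    ("M4C", 29000, 1), ("H3A", 95000, 0), ("H3A", 120000, 0),
]

# Log transformation for a skewed interval variable.
log_income = [math.log1p(income) for _, income, _ in records]

def income_bin(income, edges=(40000, 80000)):
    """Binning: return 0, 1 or 2 for low, medium or high income (assumed edges)."""
    return sum(income >= edge for edge in edges)

# Numeric (target-mean) encoding for a nominal variable: replace each
# zip code with the observed default rate among records in that zip code.
totals = defaultdict(lambda: [0, 0])  # zip_code -> [defaults, count]
for zip_code, _, defaulted in records:
    totals[zip_code][0] += defaulted
    totals[zip_code][1] += 1
zip_encoding = {z: d / n for z, (d, n) in totals.items()}

# Assemble one engineered feature row per record.
features = [
    (zip_encoding[z], log_inc, income_bin(inc))
    for (z, inc, _), log_inc in zip(records, log_income)
]
print(features[0])
```

When an encoding uses the target, as here, it should be computed on the training partition only, so that validation results are not inflated by leakage.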
Slide 14 of 33
Preparing the data
1. Constructing new features
Time to be creative – insight on transactional data
High risk customers
High risk customers appear to have “less stability” in their lives, as evidenced by transactions in:
• Irregular credit card, phone, and utility bill payments
• Legal fees, betting or casinos, towing companies, hospital visits, etc.
• They may also have cash advance usage
Low risk customers
Low risk customers show significant “leisure activity” and “disposable income”:
• Tourist attractions, boat rentals, golf courses
• Dentists, orthodontists, contractors
Other behaviors
• Percentage of transactions over the last 6 months that take place during work hours
• Ratio of the total transaction amount in the last 3 months to the prior average
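The two "other behaviors" features can be derived from a plain list of timestamped transactions. The sketch below makes several assumptions that the slide does not state: work hours are Monday to Friday, 09:00 to 16:59; 6 months is taken as 182 days and 3 months as 91 days; and the "prior average" is the average 91-day total over the three preceding 91-day periods. The transactions are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical transactions for one customer: (timestamp, amount).
as_of = datetime(2019, 11, 1)
transactions = [
    (datetime(2019, 10, 15, 10, 30), 120.0),  # Tuesday, work hours
    (datetime(2019, 9, 20, 20, 0), 80.0),     # Friday evening
    (datetime(2019, 8, 5, 14, 0), 100.0),     # Monday, work hours
    (datetime(2019, 6, 1, 11, 0), 60.0),      # Saturday
    (datetime(2019, 3, 12, 9, 30), 40.0),     # older than 6 months
    (datetime(2019, 2, 4, 13, 0), 50.0),      # older than 6 months
]

def in_work_hours(ts):
    """Assumed definition: Monday to Friday, 09:00 to 16:59."""
    return ts.weekday() < 5 and 9 <= ts.hour < 17

def share_in_work_hours(txns, as_of, days=182):
    """Fraction of transactions in the last `days` days made during work hours."""
    start = as_of - timedelta(days=days)
    recent = [ts for ts, _ in txns if start <= ts < as_of]
    if not recent:
        return None
    return sum(in_work_hours(ts) for ts in recent) / len(recent)

def recent_to_prior_ratio(txns, as_of, period_days=91, prior_periods=3):
    """Recent-period total divided by the average total of the prior periods."""
    recent_start = as_of - timedelta(days=period_days)
    prior_start = recent_start - timedelta(days=period_days * prior_periods)
    recent_total = sum(a for ts, a in txns if recent_start <= ts < as_of)
    prior_total = sum(a for ts, a in txns if prior_start <= ts < recent_start)
    if prior_total == 0:
        return None  # no prior spending to compare against
    return recent_total / (prior_total / prior_periods)

work_hours_share = share_in_work_hours(transactions, as_of)
spend_ratio = recent_to_prior_ratio(transactions, as_of)
print(work_hours_share, spend_ratio)
```

For this sample, half of the transactions in the last 6 months fall in work hours (0.5), and recent spending is 6 times the prior average (6.0).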
Slide 15 of 33
Preparing the data
Constructing new features
• In SAS Data Studio
• On the go in Model Studio
• Other tools: SAS EG, SAS Studio, Python, R
Slide 16 of 33
Preparing the data
2. Data preprocessing
Unusual numbers
• Set upper or lower bounds
• Impute missing values
• Transform using a log or other transformation, etc.
Selecting key features
• Unsupervised selection
• Supervised selection
• Tree-based selection, etc.
Extracting new features
• PCA, Robust PCA, SVD, autoencoder
Clustering features into groups
• Choose to keep one feature from each cluster, or
• Compute the first principal component of each cluster using PCA
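The "unusual numbers" treatments (bounds and imputation) are simple to express in code. This is a minimal sketch under stated assumptions: the column values, the choice of median imputation, and the bounds of 0 and 50 are all hypothetical.

```python
import statistics

# Hypothetical column with a missing value (None) and an extreme value.
values = [12.0, 15.0, None, 14.0, 13.0, 400.0, 16.0]

def cap(value, lower, upper):
    """Clamp a value to the interval [lower, upper]."""
    return max(lower, min(upper, value))

def preprocess(column, lower, upper):
    """Impute missing values with the observed median, then apply bounds."""
    observed = [v for v in column if v is not None]
    fill = statistics.median(observed)
    return [cap(fill if v is None else v, lower, upper) for v in column]

cleaned = preprocess(values, lower=0.0, upper=50.0)
print(cleaned)
```

The median is used for the fill value because, unlike the mean, it is barely affected by the extreme value of 400.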
Slide 17 of 33
Preparing the data
Feature engineering
Slide 18 of 33
Preparing the data
3. Examine the Target column (address unbalanced data)
Oversampling
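One simple way to address an unbalanced target is random oversampling, which duplicates minority-class rows until the classes are the same size. The sketch below is an illustration of that idea only; it is not the SAS Viya procedure used in the demo, and the data and function name are hypothetical.

```python
import random

def oversample_minority(rows, label_index=-1, seed=0):
    """Duplicate randomly chosen rows of smaller classes until all classes match the largest."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(row[label_index], []).append(row)
    target_size = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Sample with replacement to fill the gap (k is 0 for the largest class).
        balanced.extend(rng.choices(group, k=target_size - len(group)))
    rng.shuffle(balanced)
    return balanced

# Hypothetical data: 9 non-defaults (label 0) and 1 default (label 1).
rows = [(i, 0) for i in range(9)] + [(99, 1)]
balanced = oversample_minority(rows)
print(sum(1 for r in balanced if r[-1] == 1), "defaults out of", len(balanced))
```

Oversampling is normally applied to the training partition only, so that validation and test data keep the true class proportions.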
Slide 19 of 33
Choosing a model
No one algorithm works best for every problem! No one-size-fits-all!
Choosing the best model is a process of educated trial and error with acute data intuition.
Question 1: Target Type
What is the type of target we are trying to estimate?
• Interval: housing price, stock market price, number of car accidents
• Binary: junk email detection, malignant tumor detection
• Nominal: handwriting recognition (10 digits, 26 letters…)
[Table: model families (Regression, Neural Network, Tree-Based, SVM, Bayesian) by target type (Interval, Binary, Nominal); cell values not recoverable]
Slide 20 of 33
Choosing a model
Question 2: Interpretability
Is interpretability or explainable documentation an important part of your model governance?
• If interpretability is important, use decision trees or a regression technique.
• If an uninterpretable prediction is acceptable, you can use sophisticated algorithms such as a neural network, a support vector machine, or an ensemble model to achieve a highly accurate model.
[Table: interpretability rating (High, Moderate, or Low) for each model family: Regression, Neural Network, Tree-Based, SVM, Bayesian]
Question 3: Time Constraints
Some powerful models take more computational resources and more time to train, but they can capture the complex relationships between features and the target.
If time and computational resources are not a constraint, hyperparameter auto-tuning can be leveraged to search for the optimal settings for the problem.
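The idea of searching for good hyperparameter settings can be illustrated with a plain random search. This is a swapped-in, generic technique and not a description of how SAS Viya's auto-tuning works. The `validation_score` function is a hypothetical stand-in for "train a model with these settings and score it on validation data", and the parameter names and ranges are assumptions.

```python
import random

def validation_score(learning_rate, n_trees):
    """Hypothetical stand-in for training a model and scoring it on validation data.
    An invented smooth surface that peaks at learning_rate=0.1, n_trees=300."""
    return 0.76 - 2.0 * (learning_rate - 0.1) ** 2 - 1e-7 * (n_trees - 300) ** 2

def random_search(n_trials=50, seed=42):
    """Sample settings at random and keep the best-scoring combination."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            "learning_rate": rng.uniform(0.01, 0.5),
            "n_trees": rng.randint(50, 1000),
        }
        score = validation_score(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best_params, best_score = random_search()
print(best_params, best_score)
```

More trials cost more compute but raise the chance of landing near the best settings, which is the time trade-off described in Question 3.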
Slide 21 of 33
Choosing a model
Or…try all the models
Slide 22 of 33
Training a model
Gradient Boosting model
Slide 23 of 33
Evaluation
Gradient Boosting model
[Bar chart: AUROC by input data]
Application Form Data: 73.33%
Application + Bureau Data: 73.76%
All Data: 75.84%
[Chart: Cumulative Lift]
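AUROC and cumulative lift, the two measures named on this slide, can both be computed directly from labels and model scores. The sketch below uses the pairwise-ranking definition of AUROC and a small set of hypothetical labels and scores; it is not the code used to produce the figures above.

```python
def auroc(labels, scores):
    """Probability that a random positive is scored above a random negative
    (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in positives for n in negatives
    )
    return wins / (len(positives) * len(negatives))

def cumulative_lift(labels, scores, fraction):
    """Positive rate in the top-scored fraction divided by the overall positive rate."""
    ranked = [y for _, y in sorted(zip(scores, labels), key=lambda pair: -pair[0])]
    top_n = max(1, round(len(ranked) * fraction))
    return (sum(ranked[:top_n]) / top_n) / (sum(ranked) / len(ranked))

# Hypothetical labels (1 = default) and model scores; illustrative only.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.35, 0.6, 0.3, 0.2, 0.15, 0.1]

model_auroc = auroc(labels, scores)
lift_top_30 = cumulative_lift(labels, scores, 0.3)
print(model_auroc, lift_top_30)
```

Here 19 of the 21 positive-negative pairs are ranked correctly, giving an AUROC of about 0.905, and the top 30% of scores contains defaults at roughly 2.2 times the overall rate.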
Slide 24 of 33
Hyperparameter tuning
Gradient Boosting model
Slide 25 of 33
Hyperparameter tuning
Evaluation
[Bar chart: AUROC]
Application Form Data: 73.33%
Application + Auto-tuning: 73.68%
All Data: 75.84%
All Data + Auto-tuning: 76.47%
Slide 26 of 33
Deployment
SAS Model Manager
• Model governance is available through SAS Model Manager, which covers the complete life cycle of models and retrains them through a centralized repository.
• Visual Lineage is very effective for understanding data flow, model life cycle, and version control.
• SAS provides the ability to write code once and execute it in multiple targets (In-SAS, In-Database / Hadoop, In-Stream, APIs), which reduces costly recoding and testing for production, with no language conversion.
Slide 27 of 33
Credit Risk Use Case
Business Impact
Impact areas: Increase Revenue, Manage Risks, Improve Experience
• With a 4% increase in accuracy ratio, a mid-sized bank ($50B) can expect $4B in new loan origination and around $4M in additional profit.
• Predicting the probability of default allows for automatic approval of high-quality loans; extending credit to better customers reduces risk.
• With a 4% increase in accuracy ratio, a bank's default rate would be about 18% lower than if it had used the weaker model.
• Tapping into alternative data sources allows for improved segmentation, reaching customers with both prime and non-prime credit histories.
• With high-quality loans approved automatically, adjudicators can spend more time on high-risk applications for maximum value generation.
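The accuracy ratio quoted on this slide is commonly defined in credit risk as the Gini coefficient, which relates to AUROC as AR = 2 × AUROC − 1. The sketch below applies that standard identity to AUROC values reported on the evaluation slides of this deck; whether the 4% figure on this slide was derived from these particular models is not stated in the presentation.

```python
def accuracy_ratio(auroc):
    """Accuracy ratio (Gini coefficient) from AUROC: AR = 2 * AUROC - 1."""
    return 2.0 * auroc - 1.0

# AUROC values as reported on the evaluation slides of this deck.
reported_auroc = {
    "Application Form Data": 0.7333,
    "All Data": 0.7584,
    "All Data + Auto-tuning": 0.7647,
}

reported_ar = {name: accuracy_ratio(a) for name, a in reported_auroc.items()}
for name, ar in reported_ar.items():
    print(f"{name}: accuracy ratio = {ar:.4f}")
```

Because of the factor of 2, each percentage point gained in AUROC corresponds to two percentage points of accuracy ratio.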
Modernize your analytics environment
– demonstration in SAS Viya
Slide 29 of 33
SAS Viya – Fraud Detection
Business Problem
Can we detect fraudulent transactions with optimal accuracy?
Impact
• Increase Revenue: Minimize revenue loss by detecting most fraudulent transactions
• Manage Risk: Ability to detect fraudulent transactions at high accuracy (0.95 ROC)
• Improve Experience: Maintain high-quality customer service by appropriately allowing legitimate transactions
Slide 30 of 33
SAS Viya – Customer Intelligence
Business Problem
Can we predict churn and prevent loss of revenue to attrition?
Impact
• Increase Revenue: Excellent model classification ability that identifies at-risk customers accurately: a 5% reduction in churn equals $1.6M in total savings
• Reduce Costs: Minimize marketing cost by targeting only the at-risk customers while maintaining a high customer retention rate
• Improve Experience: Provide the opportunity to take proactive steps towards a personalized customer experience to resolve 80% of retention problems
Slide 31 of 33
SAS Viya features
User-friendly features
• Point and Click
• Data Exploration
• Variable Selection
• Model Building
• Model Assessment
• Model Deployment and Management
Advanced features
• Hyperparameter auto-tuning
• Massively Parallel Processing (MPP)
• Cloud Computing
Slide 32 of 33
Q&A
Thank you!
Slide 33 of 33
Contact Information
SAS Center of Excellence
Nat D’Ercole, Partner, Omnia AI, Global Alliance [email protected] | (416) 643-8063
Mahdi Amri, Partner, Omnia AI, Clients and [email protected] | (514) 702-6578
Raymond Outar, Director, Omnia AI, Global CoE [email protected] | (416) 775-7220
Loreto Chiovitti, Manager, Omnia AI, Canada CoE Leadership Team [email protected] | (416) 360-1087
Axel Siliadin, PhD, Senior Manager, Omnia AI, Canada CoE Leadership [email protected] | (514) 393-7061
Presenters
Kumaran Sivagnanam, Consultant, Omnia AI, Data [email protected] | (416) 354-0912
Jeffrey Du, Senior Consultant, Omnia AI, Financial Risk [email protected] | (416) 202-2717
SAS Viya Solution Leads
Ghislene Zerguini, Manager, Omnia AI [email protected] | (514) 390-4571
Jinender Gulati, Senior Consultant, Omnia [email protected] | (416) 775-8857
www.deloitte.ca
Deloitte provides audit & assurance, consulting, financial advisory, risk advisory, tax and related services to public and private clients spanning multiple industries. Deloitte serves four out of five Fortune Global 500® companies through a globally connected network of member firms in more than 150 countries and territories bringing world-class capabilities, insights and service to address clients’ most complex business challenges. To learn more about how Deloitte’s approximately 264,000 professionals, 9,400 of whom are based in Canada, make an impact that matters, please connect with us on LinkedIn, Twitter or Facebook.
Deloitte LLP, an Ontario limited liability partnership, is the Canadian member firm of Deloitte Touche Tohmatsu Limited. Deloitte refers to one or more of Deloitte Touche Tohmatsu Limited, a UK private company limited by guarantee, and its network of member firms, each of which is a legally separate and independent entity. Please see www.deloitte.com/about for a detailed description of the legal structure of Deloitte Touche Tohmatsu Limited and its member firms.
© Deloitte LLP and affiliated entities.