best practices for big data analytics with machine learning by datameer
DESCRIPTION
Don't forget! You can watch the full Datameer recording here: http://info.datameer.com/Online-Slideshare-Big-Data-Analytics-Machine-Learning-OnDemand.html Learn, through industry use cases, how to empower users to identify patterns and relationships for recommendations using big data analytics.
TRANSCRIPT
© 2013 Datameer, Inc. All rights reserved.
Best Practices for Big Data Analytics with Machine Learning
Dr. Alex Guazzelli has co-authored the first book on PMML, the Predictive Model Markup Language. At Zementis, Dr. Guazzelli is responsible for developing core technology and analytical solutions for Big Data and real-time scoring. Most recently, Dr. Guazzelli started teaching a class on standards for predictive analytics at UC San Diego Extension.
About our Speakers
Dr. Alex Guazzelli Zementis Vice President, Analytics (@DrAlexGuazzelli)
• Came from Informatica
• Worked with start-ups
• Informatica purchased to bring data solutions to market
• Data quality
• Master data management
• B2B
• Data security solutions
About our Speakers
• Over 15 years of enterprise software experience
• Co-authored 4 patents
• Worked in a variety of engineering, marketing and sales roles
• Bachelor of Science degree in Management Science and Engineering from Stanford University
Karen Hsu Datameer Senior Director, Product Marketing (@Karenhsumar)
Agenda
• Considerations
• Best Practices
• Demonstration
• Q&A
Considerations
Target Users
Questions
Business IT
Descriptive! Predictive! Prescriptive!
Data Scientist
▪ Visual
Business Professional
Clustering
Decision Trees
Dependencies
+ More!
Target Users
IT
▪ Flexible, powerful
Target Users
▪ Algorithms ▪ SAS, SPSS, R
Data Scientist
Target Users
Questions: Descriptive, Predictive, Prescriptive
▪ Descriptive machine learning tells you what has happened.
▪ Predictive machine learning answers the question of what will happen.
▪ Prescriptive machine learning covers what will happen, when it will happen, and why it will happen. It predicts what will happen and prescribes how to take advantage of this future.
Best Practices
Lean Analytics
Identify Use Case
1. Integrate
2. Prepare
3. Analyze
4. Visualize
Deploy
Data Preparation
Profile, Cleanse, Enrich
▪ Transform
▪ Bin
▪ Normalize
▪ Join
▪ Union
▪ Outliers
▪ Missing values
▪ Invalid values
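The preparation steps named on this slide (missing values, invalid values and outliers, normalizing, binning) can be sketched in a few lines of plain Python. This is an illustrative sketch only, not how Datameer implements them: the function names, the age limits (18 to 100), and the bin edges are assumptions made up for the example.

```python
from statistics import median


def fill_missing(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def clip_outliers(values, low, high):
    """Pull invalid or extreme values back into the range [low, high]."""
    return [min(max(v, low), high) for v in values]


def min_max_normalize(values):
    """Rescale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def bin_value(value, edges, labels):
    """Return the label of the first bin whose upper edge exceeds the value."""
    for edge, label in zip(edges, labels):
        if value < edge:
            return label
    return labels[-1]


# Hypothetical age column: one missing value and one invalid value (230).
ages = [34, 54, None, 41, 230]
cleaned = clip_outliers(fill_missing(ages), low=18, high=100)
normalized = min_max_normalize(cleaned)
age_bands = [bin_value(a, edges=[30, 50], labels=["<30", "30-49", "50+"])
             for a in cleaned]
```

The order matters in this sketch: the median used to fill the gap is computed before clipping, so the invalid value still influences it. Clipping first would be an equally reasonable choice.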
Descriptive Analytics
Drag & Drop Smart Analytics
Predictive Analytics
Descriptive vs. Predictive Analytics
▪ Descriptive Analytics answers “What happened?”
▪ Predictive Analytics answers “What will happen next?”
Predictive Analytics helps you discover patterns in the past, which can signal what is ahead.
Predictive analytics is able to discover hidden patterns in historical data that the human expert may not see. It is in fact the result of mathematics applied to data. As such, it benefits from clever mathematical techniques as well as good data.
??
Example: Predicting Churn
Matt - Churned 2 days ago
Scott - “Liked” our company last week
John - ??
Churn-related features
Matt
▪ 3 complaints in last 6 months
▪ Opened 2 support tickets in last 4 weeks
▪ Spent a total of $1,234 buying merchandise
▪ Spent a total of $123 in services
▪ Purchased 2 items in last 4 weeks
▪ Is 34 years old
▪ Is male
▪ Lives in Los Angeles
▪ ...
Scott
▪ No complaints in last 6 months
▪ Opened 1 support ticket in last 4 weeks
▪ Spent a total of $9,876 buying merchandise
▪ Spent a total of $987 in services
▪ Purchased 12 items in last 4 weeks
▪ Is 54 years old
▪ Is male
▪ Lives in Chicago
▪ ...
Big Data
An ever-expanding ocean of data containing people and sensor data (lots and lots of it):
90% of the data that exists today was created in the last 2 years
Breadth and Depth
▪ Transaction records
▪ Social media
▪ Climate information
▪ Mobile GPS signals
▪ Healthcare
▪ Smart Grid
▪ Digital Breadcrumbs
Churn-related “Big Data” features
Matt
▪ 12 friends listed as customers
▪ 2 complaints from friends in last 6 months
▪ Average age of friends is 41 years old
▪ 2 friends churned in last 30 days
▪ No purchases for same items as friends
▪ 1 website visit in last 7 days
▪ 2 website pages opened during last visit
▪ Opened 3 newsletters in last 6 months
▪ ...
Scott
▪ 34 friends listed as customers
▪ 1 complaint from friends in last 6 months
▪ Average age of friends is 62 years old
▪ No friends churned in last 30 days
▪ Purchased same 2 items as friends in last 2 months
▪ 3 website visits in last 7 days
▪ 5 website pages opened during last visit
▪ Opened 12 newsletters in last 6 months
▪ ...
Predictive Model
Building a predictive model ... Model Training
Churn-related features
Churned Not-churned
Data Prediction
Hidden Layer
Input Layer
Output Layer
▪ Neural Networks
▪ Linear/Logistic Regression
▪ Support Vector Machines
▪ Scorecards
▪ Decision Trees
▪ Clustering
▪ Association Rules
▪ K-Nearest Neighbors
▪ Naive Bayes Classifiers
▪ ...
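To make the training step concrete, the sketch below fits one of the listed techniques, logistic regression, to the two labeled customers from the churn example (Matt churned, Scott did not) and then scores a third. It is a toy illustration under stated assumptions: only three of the slide's features are used, two rows are far too few for a real model, John's feature values are hypothetical (the slides do not give them), and the function names and training settings are made up for the example.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(rows, labels, epochs=2000, lr=0.05):
    """Fit logistic regression weights with plain stochastic gradient descent."""
    weights = [0.0] * len(rows[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(rows, labels):
            p = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))
            err = p - y  # gradient of the log loss with respect to the logit
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
            bias -= lr * err
    return weights, bias


def churn_risk(weights, bias, x):
    """Return the modeled probability of churn for one feature vector."""
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))


# Features: complaints (6 months), support tickets (4 weeks), items purchased (4 weeks)
matt = [3, 2, 2]     # churned, values from the slide
scott = [0, 1, 12]   # did not churn, values from the slide
weights, bias = train_logistic([matt, scott], [1, 0])

john = [2, 2, 1]     # hypothetical values, not taken from the slides
risk = churn_risk(weights, bias, john)
```

With these inputs the model scores John above 0.5, because his made-up profile resembles Matt's more than Scott's. Any of the other listed techniques could be substituted for the training function.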
Why not several models?
Model Ensemble
Data Pre-Processing
Raw Inputs
Prediction
Scores from all models are computed
Majority Voting, Weighted Voting,
Weighted Average, etc.
Model 1
Model 2
Model n
Voting . . .
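The three combination schemes named on this slide can be sketched as follows. The model outputs and weights are invented for illustration; in practice the weights might reflect each model's accuracy on held-out data.

```python
from collections import Counter


def majority_vote(labels):
    """Return the label predicted by the most models."""
    return Counter(labels).most_common(1)[0][0]


def weighted_vote(labels, weights):
    """Return the label with the largest total weight behind it."""
    totals = {}
    for label, weight in zip(labels, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get)


def weighted_average(scores, weights):
    """Combine numeric scores into one score, weighting each model."""
    return sum(s * w for s, w in zip(scores, weights)) / sum(weights)


# Hypothetical outputs of Model 1, Model 2, and Model n for one customer.
labels = ["churn", "no-churn", "churn"]
scores = [0.8, 0.4, 0.6]
weights = [1.0, 3.0, 1.0]   # assumed: Model 2 is trusted most

by_majority = majority_vote(labels)            # two of three models say "churn"
by_weight = weighted_vote(labels, weights)     # Model 2's weight outvotes the others
combined = weighted_average(scores, weights)   # (0.8 + 1.2 + 0.6) / 5
```

The example is chosen so that majority voting and weighted voting disagree, which shows why the choice of scheme matters.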
End Goal: Predicting churn ...
Model Deployment and Execution in
Churn Risk
Score Churn-related
Features
Big Data
Predictive Churn Model
Production Environment
Scientist’s Desktop
SAS, R, IBM SPSS, Perl,
Python
Java, .NET C, SQL
Lost in Translation
From Model Building to Model Deployment (Traditionally ...)
SAS, R, IBM SPSS …
Great for model building but not for scoring, even
more so when it comes to Hadoop
From Model Building to Model Deployment (with PMML)
Model Building Model Deployment and Execution
▪ Angoss
▪ BigML
▪ FICO Model Builder
▪ IBM SPSS
▪ KNIME
▪ KXEN
▪ MicroStrategy
▪ Open Data
▪ Pervasive DataRush
▪ RapidMiner
▪ R / Rattle
▪ SAS
▪ SAP Business Objects
▪ Salford Systems
▪ StatSoft STATISTICA
▪ SQL Server
▪ TIBCO Spotfire
▪ Custom Code, etc.
Universal PMML Plug-in (UPPI)
PMML (models)
Datameer Server
Deploy in minutes ...
▪ PMML is an XML-based language used to define statistical and data mining models and to share these between compliant applications.
▪ It is a mature standard developed by the DMG (Data Mining Group) to avoid proprietary issues and incompatibilities and to deploy models.
▪ PMML eliminates the need for custom model deployment and ensures reliability.
▪ PMML defines a standard not only to represent data-mining models, but also data handling and data transformations (pre- and post-processing).
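To give a feel for what "an XML-based language" for models looks like, here is a small hand-written PMML-style document and a toy reader for it. This is a sketch under assumptions, not output from any of the tools above and not how UPPI works: the field names and coefficients are invented, only a single regression table with numeric predictors is handled, and a compliant scoring engine would also validate the document and apply any data transformations it defines.

```python
import math
import xml.etree.ElementTree as ET

# Hand-written toy model; the coefficients are invented for illustration.
PMML_DOC = """<?xml version="1.0"?>
<PMML version="4.1" xmlns="http://www.dmg.org/PMML-4_1">
  <Header description="Toy churn model (illustrative only)"/>
  <DataDictionary numberOfFields="3">
    <DataField name="complaints" optype="continuous" dataType="double"/>
    <DataField name="items_purchased" optype="continuous" dataType="double"/>
    <DataField name="churn_risk" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression" normalizationMethod="logit">
    <MiningSchema>
      <MiningField name="complaints"/>
      <MiningField name="items_purchased"/>
      <MiningField name="churn_risk" usageType="predicted"/>
    </MiningSchema>
    <RegressionTable intercept="-0.5">
      <NumericPredictor name="complaints" coefficient="0.9"/>
      <NumericPredictor name="items_purchased" coefficient="-0.3"/>
    </RegressionTable>
  </RegressionModel>
</PMML>"""

NS = {"pmml": "http://www.dmg.org/PMML-4_1"}


def score(pmml_text, record):
    """Score one record against the first RegressionModel in the document."""
    root = ET.fromstring(pmml_text)
    model = root.find("pmml:RegressionModel", NS)
    table = model.find("pmml:RegressionTable", NS)
    y = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        y += float(predictor.get("coefficient")) * record[predictor.get("name")]
    # With the logit method, the linear result is squashed into (0, 1).
    if model.get("normalizationMethod") == "logit":
        return 1.0 / (1.0 + math.exp(-y))
    return y


matt_risk = score(PMML_DOC, {"complaints": 3, "items_purchased": 2})
scott_risk = score(PMML_DOC, {"complaints": 0, "items_purchased": 12})
```

Because the model is plain XML, the same file could be produced on the scientist's desktop and read in the production environment without rewriting the model in another language, which is the point the slides make about PMML.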
Predictive Model Markup Language
Models
Data Transformations
UPPI: Supported Techniques
▪ Neural Networks (neural gas, radial-basis and backpropagation)
▪ Support Vector Machines (for classification and regression)
▪ Naive Bayes Classifier (for continuous and categorical inputs)
▪ Rule Set Models
▪ Clustering Models (2-step clustering, distribution and center-based)
▪ Decision Trees (for classification and regression)
▪ General Regression Models (Cox, General and Generalized Linear Models)
▪ Regression Models (Linear, Logistic and Polynomial Regression Models)
▪ Scorecards (with support for Reason Codes)
▪ Restricted Boltzmann Machines
▪ Association Rules
▪ Multiple Models (with the possibility of having models spread over multiple PMML files)
▪ Model Ensemble (including Random Forest Models and Boosted Trees)
▪ Model Segmentation
▪ Model Chaining
▪ Model Composition
▪ Model Cascade
© Zementis, Inc. - Confidential
Demonstration Flow
▪ Descriptive (Karen)
▪ Predictive Modeling (Alex)
▪ Prescriptive (Karen)
▪ Predictive Production (Karen)
Descriptive Analytics
▪ Answers: What caused people to churn?
▪ Clustering
▪ Column Dependencies
▪ Decision Tree
Predictive Analytics
▪ Who will churn?
Prescriptive Analytics
▪ Who will churn? Why will they churn?
▪ What can we do to support that outcome?
Q&A
Next Steps:
More about Datameer and Big Data www.datameer.com
More about Zementis www.zementis.com
Contact us: Alex Guazzelli [email protected] Karen Hsu [email protected]