8/10/2019 Emerging Business Applications of High Performance Analytics Pivotal
http://slidepdf.com/reader/full/emerging-business-applications-of-high-performance-analytics-pivotal 1/39
SAS Event: Kuala Lumpur - Hadoop, Big Data & Analytics
© Copyright 2013 Pivotal. All rights reserved.
Emerging Business Applications of High Performance Analytics
August 2014
Tan Yaw, Sr. Data Scientist
Table of Contents
• Introduction
• Data Lake
• Analytics
• Labs
Pivotal At-a-Glance
• New Independent Venture: Spun out and jointly owned by EMC & VMware
• Top Talent: ~1700 employees
• Proven Leadership: Paul Maritz, …
• Global Customer Validation: 1000+ Tier-1 Enterprise Customers
• Strategic Backing: $105M investment …
• Bold Vision: New platform for a ne… focused on the intersection of Big Da… and Agile Software Development
EMC Federation
Pivotal Data Labs
$$F(X) = \frac{1}{M}\sum_{m=1}^{M} T_m(X) = \frac{1}{M}\sum_{m=1}^{M}\sum_{i=1}^{n} W_{im}(X)\,Y_i = \sum_{i=1}^{n}\left(\frac{1}{M}\sum_{m=1}^{M} W_{im}(X)\right) Y_i$$
Pivotal – What we do
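The averaging identity shown on this slide can be checked numerically. The sketch below is only an illustration under stated assumptions: the responses `Y` and the per-tree weights `W` are made-up toy values, and each tree is assumed to predict a weighted average of the training responses. It computes the ensemble prediction both ways, as the mean of the tree predictions and as a single weighted sum with averaged weights.

```python
# Hypothetical toy data: n = 3 training responses and M = 2 trees.
Y = [1.0, 2.0, 4.0]

# W[m][i] is the weight that tree m gives to response Y[i] at a fixed query point X.
# Each row sums to 1, as if the tree averaged the responses in the leaf containing X.
W = [
    [0.5, 0.5, 0.0],  # tree 1: X shares a leaf with observations 1 and 2
    [0.0, 0.5, 0.5],  # tree 2: X shares a leaf with observations 2 and 3
]

M, n = len(W), len(Y)


def tree_prediction(m):
    """T_m(X): weighted sum of the responses for tree m."""
    return sum(W[m][i] * Y[i] for i in range(n))


# Left-hand side: average of the M tree predictions.
forest_as_tree_average = sum(tree_prediction(m) for m in range(M)) / M

# Right-hand side: average the weights over trees first, then take one weighted sum.
avg_weights = [sum(W[m][i] for m in range(M)) / M for i in range(n)]
forest_as_weighted_sum = sum(avg_weights[i] * Y[i] for i in range(n))

print(forest_as_tree_average, forest_as_weighted_sum)
```

Both expressions give the same value because the two sums are finite and can be swapped; the second form shows the ensemble acting as one weighted average of the training responses.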
Customer Reference
Data Lake
Big Data
Pivotal Business Data Lake Architecture
[Architecture diagram; recoverable labels]
• Centralized Management: system monitoring, system management
• Unified Data Management Tier: data mgmt. services, MDM, RDM, audit and policy mgmt.
• Processing Tier: workflow management
• Distillation Tier
• HDFS storage (unstructured and structured data), in-memory, MPP database
• Unified Sources: real-time ingestion, micro-batch ingestion, batch ingestion
• Flexible Actions: real-time insights, interactive insights, batch insights
Emerging Analytics Architecture
[Architecture diagram; recoverable labels]
• Streams/Feeds
• Data Ingestion
• Data Staging Platform
• Enterprise Data Warehouse (RDBMS)
• Ana… Data M… (MPP Dat…)
• Descriptive Analytics: Business Analysis
• Predictive Analytics: Data Science
How is Business Data Lake Different
Criteria | Business Data Lake | EDW
Common Data Model | Single Standard Data view = Base class; Enhanced Local Data view = Derived Classes | Single Class = Si… across the enterpr…
Data Quality | … | …
Data Integration | … | …
Multiple Interfaces | SQL, SAS, R, MapReduce, NoSQL | SQL access & Integration with S…
Quality of Service | Mixed workload with varying QoS | Limited QoS separa… required
Full Spectrum: Low Latency, Interactive, Batch
Hadoop at the Center
Enabling the Data Driven Enterprise
Fastest SQL Query Engine
Hadoop as a Service
Big Data On-Demand
Ge…
In-Memory Re…
Sp…
Building B…
China Citic Bank implements Data Lake to integrate multiple databases and in-database modelling for rapid model deployment.
Opportunity: Integrate the bank’s FICO TRIAD Customer Management Solution, Database Marketing platform, IBM Cognos Business Intelligence software, and subcenter customer relationship management (CRM)
Business Benefits:
• More Productive Telephone Sales Center
• Optimized Marketing Campaigns (1286 with 86% reduction in configuration time)
• Faster model deployment via ‘In-database’ analytics
GPDB AN IDEAL …
Win
NYSE Euronext manage exponential data grow… support analytic applications
“Pivotal rings the NYSE bell on Oct 29”
Opportunity: Work with NYSE technologies division on new historical archive p…
Solution: Data Infrastructure to allow NYSE to handle trading data in real time.
Co-developed in partnership with NYSE Technologies, Pivotal Data Dispatch is aimed squarely at the big data information worker. The idea of this product is to provide data analysts with an easy way to provision various big data sets from any source, including Hadoop, MPP, flat files or legacy databases.
Improving security analytics and implementing Data Lake architecture based on Pivotal HD, HAWQ and GPDB
Business Challenge: Improving security analytics for credit card transactions … developing “Data Lake” architecture for future projects.
Volume: 4 TB and growing
Solution: Data Lake architecture including Pivotal HD, HAWQ and …
Analytics
Moneyball
• Lost your best players
• No resources
• Competing against richer, be… opposition
Q: How do you compete?
A: Data Driven Analytics
Man vs. Machine: Simple Charts
• Traditionally, ‘Man’ takes data and turns it into charts in order to visualize relationships. Charts are simple and easy to interpret.
• ‘Man’ has ‘Analytical Limits’. We inherently view the world in 2 dimensions and in simple linear relationships.
Man vs. Machine: Complex relationships
• But the real world is complex!
• It is not just X and Y relationships: X → Y → Z → A → B → C
• It is not just linear.
• Charts that try to visualize complex relationships are themselves m… complex.
Man vs. Machine: Finding Patterns
• How do we classify and identify different groups within a dataset?
Man vs Machine: Machine Learning
[Figure: plot with axes X and Y]
• Machines are able to analyze complex patterns within the data that the human mind has difficulty visualizing
Evidence-Based Decisions
When somebody on staff asks what we should do to address a problem, the first questions I now ask are ‘What does the research say? What is the evidence base?’
The core idea is that decisions supported by hard facts and sound analysis are likely to be better than decisions made on the basis of instinct, folklore or informal anecdotal evidence.
Decision-Based Evidence
Many managers think they’ve committed their organizations to evidence-based decision making, but have instead, without realizing it, committed to decision-based evidence creation.
When asking staff to conduct a major analysis, a project team told us, “The executives have already made up their minds…. We are being told that this is the way that we are going, we need to get on board and make the decision work out to be [the new choice].”
Route Optimization
Customer
A major courier delivery services company
Business Problem
Optimizing routing decisions while meeting the demand and satisfying the many business constraints to guarantee feasibility and compliance.
Challenges
• Routing problems are known to be NP-Hard
• Size of the operation. Delivery of 3 million packages a day with the largest fleet in the US
• Existing solution takes weeks to roll out monthly routing plans
Solution
• Avoided expensive data movements … demand forecasting and route optim… database
• Built a fully parallelized approxima… that featured a variation of Floyd-Warshall shortest paths and neighborhood sea…
• Achieved significant reduction in fu… over a greedy initial feasible solution
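For readers unfamiliar with the shortest-path step named in the solution, here is a minimal sketch of the textbook Floyd-Warshall algorithm. It is not the customer's parallelized in-database variant, which the slide does not describe; the four-stop network and its costs are hypothetical.

```python
import math


def floyd_warshall(n, edges):
    """Return the all-pairs shortest-path cost matrix for nodes 0..n-1.

    edges is an iterable of (u, v, cost) directed edges with non-negative costs.
    """
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, cost in edges:
        dist[u][v] = min(dist[u][v], cost)  # keep the cheapest parallel edge

    # Allow each node k in turn to act as an intermediate stop.
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = dist[i][k] + dist[k][j]
                if via_k < dist[i][j]:
                    dist[i][j] = via_k
    return dist


# Hypothetical network: depot 0 and three delivery stops, costs in arbitrary units.
edges = [(0, 1, 4.0), (0, 2, 1.0), (2, 1, 2.0), (1, 3, 5.0)]
dist = floyd_warshall(4, edges)
print(dist[0][1])  # cheapest cost from node 0 to node 1 (the route 0 -> 2 -> 1)
```

The triple loop costs O(n^3) time, which is one reason a production system at the scale quoted on the slide would need an approximate or parallel scheme rather than this direct form.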
Predicting Commodity Futures through Tw…
Customer
A major agri-business cooperative
Business Problem
Predict price of commodity futures through Twitter
Challenges
• Language on Twitter does not adhere to rules of grammar and has poor structure
• No domain-specific labeled corpus of tweet sentiment; the problem is semi-supervised
Solution
• Built Sentiment Analysis and T… Regression algorithms to predict … futures from Tweets
• Established the foundation for … structured data (market fundamentals …) … unstructured data (tweets)
Credit Risk Assessment and Stress Testing
Customer
A global financial services provider
Business Problem
Speed up the process of compliance reporting and stress testing for Basel III.
Challenges
Running the calculation procedures on the customer’s legacy database was time-consuming and therefore had to be done in overnight batch mode.
Solution
• Implement risk asset calculation … testing on the Greenplum database …
• Three years of data was processed … under 2 minutes, significantly f… customer’s current procedures.
• Connect an “in-database” visualization tool to the Greenplum database via ODBC for on-demand reporting and visualization.
Text Analytics for Churn Prediction
Customer
A major telecom company
Business Problem
Reducing churn through more accurate models
Challenges
• Existing models only used structured features
• Call center memos had poor structure and had lots of typos
Solution
• Built sentiment analysis model… churn and topic models to unde… of conversation in call center m…
• Achieved 16% improvement in ROC … Churn Prediction
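The slide reports its result in terms of ROC. Assuming this refers to the area under the ROC curve (the slide text is cut off, so this is an assumption), the sketch below shows one way that number can be computed: as the fraction of (churner, non-churner) pairs in which the churner gets the higher score. The labels and scores are hypothetical.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels are 1 (churned) or 0 (stayed); scores are model outputs where
    higher means more likely to churn. Tied scores count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for five customers, two of whom churned.
labels = [1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

A value of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so an improvement in this metric means churners are more consistently ranked above non-churners.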
Cross-Channel Customer Engagement
Customer
A major health insurance company
Business Problem
As each call to the call center represents a significant cost to the company, find out when customers are using the call center instead of the website
Challenges
• Unstructured text data requires considerable preprocessing
Solution
• Used logistic regression to predict w… customer would be unable to find the information on the web and need to …
• Created a topic model based on the c… to learn what these customers were c… about, since these would be the topic… were having trouble finding on the w…
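To make the logistic regression step concrete, here is a minimal sketch fitted by plain gradient descent. Everything specific in it is an assumption for illustration: the two features (pages viewed and searches without a click), the six training rows, and the learning-rate settings are hypothetical and do not come from the engagement described on the slide.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, b, x):
    """Estimated probability that a web session ends in a call."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = predict_proba(w, b, xi) - yi  # gradient of log loss w.r.t. the logit
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical sessions: [pages viewed, searches without a click]; 1 = customer then called.
X = [[1, 3], [2, 4], [1, 5], [5, 0], [6, 1], [4, 0]]
y = [1, 1, 1, 0, 0, 0]
w, b = fit_logistic(X, y)
print(predict_proba(w, b, [1, 4]), predict_proba(w, b, [5, 0]))
```

In practice a library implementation with regularization would be the usual choice; the hand-written loop is shown only so the update rule is visible.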
Labs
Industrial-era business practices
• Many enterprise-gra… business practices ar… suited for an industr…
• But may face challen… dealing with the Inte… where ‘Speed’ and ‘…’ are being key compe… levers
Industrial-era business practices
• Waterfall Project Mgmt
• Develop, Test, Production, DR environments
• Detailed Requirements
• Structured Data Schema
Knowledge-era business practices
INNOVATION
• Silicon Valley has always been a hotbed of innovation.
• When working with new technology, demanding high-availability… speed, uncertain customer preferences, DIFFERENT BUSINESS PROCESSES are needed
Labs Experiments
• Data Lab experim… as a key approach … generating value
Data Labs
Data Science + Data Engineering
MAD Approach to Analytics
MAD Approach to Analytics
Magnetic: attracting data to your EDW by r… “barriers to entry”
Agile: enabling rapid analyses through th… of powerful tools as close as possible to the data
Deep: going beyond basic data operations to e… analysts to reach new, rich depths in their …
Conclusion
Analytics Vision
Store Everything
• Obsessively collect data
• Keep it forever
• Put the data in one place
Analyze Anything
• Cleanse, organize, and manage your data lake
• Make the right tools available
• Use the resources wisely to compute, analyze, and understand data
Build … Right T…
• Use in… to iterativ… your p…
Thank You