Stephanie McReynolds
Senior Director of Product and Technical Marketing
February 2012
Harnessing Big Data for Analytics, Insights, Telematics
Confidential and proprietary. Copyright © 2011 Teradata Corporation. 2
Teradata
Data Warehousing
• Integrated Data Warehouse
• Platform Family
• Interoperability & Consulting
Big Data Analytics
• Aster MapReduce Platform
• Hadoop Partnerships
Business Applications
• Aprimo Applications
• Strategic Partnerships
What is Big Data?
Big Data Challenges are More Than Data Size
“CIOs face significant challenges in addressing the issues surrounding big data… New technologies and applications are emerging (examples include Hadoop and MapReduce) and should be investigated to understand their potential value.”
Source: CEO Advisory: ‘Big Data’ Equals Big Opportunity, Gartner, 31 March 2011.
The Four Axes of Big Data
• Big Data = Large-scale (data volume) analytics: MPP SQL databases have delivered large-scale analytics for over a decade. Teradata has been the leader in large-scale SQL analytics, with over 16 customers holding a petabyte or more of data.
• Big Data = Emerging new data types: new multi-structured data types with unknown relationships that require processing, regardless of data size, to discover insights. Examples include web logs, sensor networks, social networks, and text.
• Big Data = New (non-SQL) analytics: new analytic frameworks that provide parallel processing on semi-structured data, leveraging the power of MapReduce (programming languages: Java, Python, Perl, C, C++).
What is MapReduce?
• A parallel programming framework
  - Made popular by Google
    • Generate search indexes
    • Web scoring algorithms
  - C++, Java, Python, etc.
  - Harness 1000s of CPUs
• MapReduce provides
  - Automatic parallelization
  - Fault tolerance
  - Monitoring & status updates
“MapReduce allows programmers without any experience with parallel and distributed systems to easily utilize the resources of a large distributed system.”
– Jeffrey Dean and Sanjay Ghemawat, Google, Inc., 2004
[Figure: MapReduce flow (Scheduler; Map Function; map → shuffle → reduce; Results)]
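The shuffle step in this slide's diagram routes every intermediate key to exactly one reducer, so that all values for a key end up in the same place. Below is a minimal single-process sketch, assuming hash partitioning with `zlib.crc32`. That is a stable hash chosen only for this illustration; real frameworks use their own partitioners. The reducer count of 3 is also hypothetical.

```python
import zlib
from collections import defaultdict


def partition(key: str, num_reducers: int) -> int:
    """Pick a reducer for a key using a stable hash (crc32), so the same key
    always lands on the same reducer regardless of which mapper emitted it."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(mapped_pairs, num_reducers):
    """Route (key, value) pairs emitted by mappers into per-reducer buckets,
    collecting all values for a key into one list."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for key, value in mapped_pairs:
        buckets[partition(key, num_reducers)][key].append(value)
    return buckets


if __name__ == "__main__":
    # Intermediate output from two hypothetical mappers.
    mapped = [("web", 1), ("log", 1), ("web", 1), ("sensor", 1)]
    for i, bucket in enumerate(shuffle(mapped, num_reducers=3)):
        print(i, dict(bucket))
```

A reducer then only needs its own bucket. It never has to look at other reducers' data, which is what makes the reduce phase parallel.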
MapReduce Analytics
Example: Pattern Matching Analysis
SQL-MapReduce
• Single pass over the data
• Linked-list sequential analysis
Traditional SQL
• Self-joins for sequencing
• Limited operators for ordered data
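The contrast on this slide (one ordered pass versus self-joins) can be illustrated outside any database. The Python sketch below assumes a hypothetical list of (user, timestamp, event) rows with distinct timestamps per user. Both functions find users who performed event `a` immediately followed by event `b`:

- The self-join style compares every pair of rows for a user and then checks that no row falls in between, as SQL needs an extra join or `NOT EXISTS` to do.
- The single-pass style sorts once and walks adjacent rows.

```python
from itertools import groupby
from operator import itemgetter
from typing import List, Set, Tuple

Row = Tuple[str, int, str]  # (user, timestamp, event)


def followed_by_self_join(rows: List[Row], a: str, b: str) -> Set[str]:
    """SQL-style: pair each row with every later row for the same user, then
    rule out any row in between. Work grows rapidly with rows per user."""
    hits = set()
    for u1, t1, e1 in rows:
        for u2, t2, e2 in rows:
            if u1 == u2 and e1 == a and e2 == b and t1 < t2:
                between = any(u3 == u1 and t1 < t3 < t2 for u3, t3, _ in rows)
                if not between:
                    hits.add(u1)
    return hits


def followed_by_single_pass(rows: List[Row], a: str, b: str) -> Set[str]:
    """Ordered-scan style: sort by (user, timestamp) once, then check adjacent rows."""
    hits = set()
    ordered = sorted(rows, key=itemgetter(0, 1))
    for user, group in groupby(ordered, key=itemgetter(0)):
        events = [event for _, _, event in group]
        if any(x == a and y == b for x, y in zip(events, events[1:])):
            hits.add(user)
    return hits


if __name__ == "__main__":
    rows = [("u1", 1, "login"), ("u1", 2, "fee"), ("u1", 3, "close"),
            ("u2", 1, "fee"), ("u2", 2, "login"), ("u2", 3, "close")]
    print(followed_by_self_join(rows, "fee", "close"))
    print(followed_by_single_pass(rows, "fee", "close"))
```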
Big Data in Banking
Challenge
• Know the “last mile” of a decision
• Data mining tools predict probability but do not identify the “last mile”
With Teradata Aster
• SQL-MapReduce listens and predicts the “last mile”
- Identifies all interaction patterns prior to acquisition or attrition
Impact
• 10-300x less effort to pinpoint a customer in the “last mile”
Banking: “Last Mile” Marketing
Cross-Channel Customer Interactions (17,000 customers, 1 month)
• 92,000 online sessions
• 25,000 ATM sessions
• 34,000 branch visits
• 5,000 call center sessions
• 43,000 e-mails
Financial Data Sets Analyzed
Events Preceding Account Closure
Interactive Analytics: Finding the Signal in Noise
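SELECT *
FROM nPath (
  ON (…)
  PARTITION BY sba_id      -- evaluate each sba_id's events separately
  ORDER BY datestamp       -- in time order
  MODE (NONOVERLAPPING)    -- a row belongs to at most one match
  PATTERN ('(OTHER_EVENT|FEE_EVENT)+')
  SYMBOLS (                -- classify each row as one symbol
    event LIKE '%REVERSE FEE%' AS FEE_EVENT,
    event NOT LIKE '%REVERSE FEE%' AS OTHER_EVENT)
  RESULT (…)               -- output columns computed per match
) n;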
Events Preceding Account Closure
Fee reversal seems to be a “signal”.
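The nPath query above works by labeling each ordered row with a symbol and then matching a regular-expression-like pattern over the symbol sequence. The Python sketch below emulates those semantics. It is not Aster's API:

- Each row becomes one letter, mirroring the `SYMBOLS` clause.
- Rows are grouped per account (`PARTITION BY`) and sorted by date (`ORDER BY`).
- `re.finditer` returns non-overlapping matches, analogous to `MODE (NONOVERLAPPING)`.

The account-closed symbol `C` and the sample rows are hypothetical additions for illustration.

```python
import re
from collections import defaultdict


def classify(event: str) -> str:
    """Map one row to a one-letter symbol, like the SYMBOLS clause.
    'C' (account closed) is a hypothetical extra symbol not in the original query."""
    if "ACCOUNT CLOSED" in event:
        return "C"
    if "REVERSE FEE" in event:  # event LIKE '%REVERSE FEE%' AS FEE_EVENT
        return "F"
    return "O"                  # everything else: OTHER_EVENT


def npath_like(rows, pattern):
    """rows: (account_id, datestamp, event) tuples with ISO-format dates.
    Returns, per account, the event sequences whose symbols match `pattern`."""
    by_account = defaultdict(list)
    for account, datestamp, event in rows:  # PARTITION BY account
        by_account[account].append((datestamp, event))
    results = {}
    for account, events in by_account.items():
        events.sort()                       # ORDER BY datestamp
        symbols = "".join(classify(e) for _, e in events)
        # re.finditer yields non-overlapping matches, like MODE (NONOVERLAPPING)
        results[account] = [
            [e for _, e in events[m.start():m.end()]]
            for m in re.finditer(pattern, symbols)
        ]
    return results


if __name__ == "__main__":
    rows = [
        ("A1", "2011-11-02", "DEPOSIT"),
        ("A1", "2011-11-09", "REVERSE FEE"),
        ("A1", "2011-11-20", "ACCOUNT CLOSED"),
        ("A2", "2011-11-03", "DEPOSIT"),
        ("A2", "2011-11-15", "WITHDRAWAL"),
    ]
    # Paths ending in a closure: any run of fee/other events, then C.
    print(npath_like(rows, r"[FO]*C"))
```

Counting how often `F` appears inside such closure paths, compared with how often it appears in accounts that stay open, is one way to check whether fee reversal really behaves as a signal.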
Big Data in Retail
Retail: Digital Marketing Beyond the “Last Touch”
Jan 5: Organic Search → Jan 7: Website Visit → Jan 10: E-mail Response → Jan 15: Response to Tweet → Jan 20: In-store Purchase
• How would I reallocate marketing budget if I knew it took all 5-6 touches to close the customer, not just the one e-mail campaign? What could I do?
• Manage campaigns as integrated programs
• Shift budget from some marketing assets to others
• Stop making “last-touch” decisions
“54 percent of marketers identified the ability to understand attribution as a project that would be most beneficial to their business.”
– Forrester, 2011
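To make the “last-touch” problem concrete, the Python sketch below compares two attribution rules on the customer path from this slide (the four touches before the Jan 20 purchase):

- Last-touch gives all credit to the final touch.
- Linear (equal-credit) attribution splits credit evenly across every touch.

Linear attribution is only one common multi-touch model, chosen here for illustration. The slides do not say which model the Aster analysis uses.

```python
from collections import Counter


def last_touch(path):
    """All conversion credit goes to the final touch before purchase."""
    return {path[-1]: 1.0}


def linear(path):
    """Equal credit to every touch in the path; repeated channels accumulate."""
    credit = Counter()
    for touch in path:
        credit[touch] += 1.0 / len(path)
    return dict(credit)


if __name__ == "__main__":
    # Touches before the Jan 20 in-store purchase, in time order (from the slide).
    path = ["Organic Search", "Website Visit", "E-mail Response", "Response to Tweet"]
    print(last_touch(path))
    print(linear(path))
```

Summing these per-touch credits across all customers gives a channel-level view. That view is what a budget reallocation would be based on.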
Challenge
• Consolidate all digital touches to evaluate path to purchase
• Understand impact across both marketing and organic touches
With Teradata Aster
• SQL-MapReduce identifies behavioral patterns/paths
Impact
• Move beyond “single-touch” attribution to optimize marketing spend by 3-10%
Retail Data Sets Analyzed
Identify customer behavioral patterns & marketing attribution
Multi-Channel Customer Interactions

Social Media
userID     tweet                time
15682817   I love shoes…        12:00 PM
16816193   30% off is great…    1:45 PM
19825996   Poor service…        3:00 PM
15528047   Store closed…        12:20 PM

Website Visits
IPAddress       page         time
192.168.20.14   http://...   12:00 PM
172.16.254.1    http://...   1:45 PM
216.27.61.137   http://...   3:00 PM
194.66.82.11    http://...   4:20 PM

Search Terms
IPAddress       Referrer   time
192.168.20.14   Google     1:00 PM
172.16.254.1    Bing       1:45 PM
216.27.61.137   None       3:00 PM
194.66.82.11    Google     4:20 PM

Campaigns
custID   open   click   date
10001    Y      Y       1/3/12
50001    Y      N       1/3/12
40001    N      N       1/3/12
50001    Y      N       1/3/12

In-Store Point of Sale
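Each channel above keys its rows differently (userID, IP address, custID). Before any path analysis, the rows have to be resolved to one customer and merged into a single time-ordered path. The Python sketch below shows that consolidation step under stated assumptions:

- The `IDENTITY` lookup that links these identifiers is hypothetical. The slide does not say how they are matched.
- The full timestamps are invented for the example, because the sample tables show only times or only dates.

```python
from datetime import datetime

# Hypothetical identity map: how Twitter userIDs, IP addresses, and custIDs are
# linked to one customer is not described on the slide; this is an assumption.
IDENTITY = {"15682817": "C1", "192.168.20.14": "C1", "10001": "C1"}


def to_events(rows, channel, id_field):
    """Normalize one channel's rows to (customer, timestamp, channel) tuples.
    Rows whose identifier is not in IDENTITY are dropped."""
    for row in rows:
        customer = IDENTITY.get(row[id_field])
        if customer is not None:
            yield customer, datetime.fromisoformat(row["ts"]), channel


def customer_paths(*streams):
    """Merge events from every channel; return each customer's time-ordered path."""
    merged = {}
    for stream in streams:
        for customer, ts, channel in stream:
            merged.setdefault(customer, []).append((ts, channel))
    return {c: [ch for _, ch in sorted(evts)] for c, evts in merged.items()}


if __name__ == "__main__":
    # IDs come from the slide's sample tables; the full timestamps are hypothetical.
    campaigns = [{"custID": "10001", "ts": "2012-01-03T09:00"}]
    visits = [{"IPAddress": "192.168.20.14", "ts": "2012-01-03T12:00"},
              {"IPAddress": "172.16.254.1", "ts": "2012-01-03T13:45"}]  # unresolved
    tweets = [{"userID": "15682817", "ts": "2012-01-04T12:00"}]
    print(customer_paths(
        to_events(campaigns, "E-mail Campaign", "custID"),
        to_events(visits, "Website Visit", "IPAddress"),
        to_events(tweets, "Social Media", "userID"),
    ))
```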
Single Channel Pathing Analysis
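A typical output of single-channel pathing is a ranking of the most common sequences customers follow. The Python sketch below counts identical page sequences across sessions. The session data is hypothetical, and truncating paths with `max_len` is an illustrative option rather than something described on the slide.

```python
from collections import Counter


def top_paths(sessions, k=3, max_len=None):
    """Count identical page sequences across sessions; return the k most common.
    `sessions` maps a session id to its time-ordered list of pages. If `max_len`
    is given, only the first `max_len` pages of each session are compared."""
    counts = Counter(
        tuple(pages[:max_len] if max_len else pages) for pages in sessions.values()
    )
    return counts.most_common(k)


if __name__ == "__main__":
    # Hypothetical web sessions from a single channel.
    sessions = {
        "s1": ["home", "shoes", "cart"],
        "s2": ["home", "shoes", "cart"],
        "s3": ["home", "sale"],
    }
    print(top_paths(sessions, k=2))
```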
Pattern and Path Analysis in MapReduce Aster nPath Module
Analyzing Multi-channel Identifies MPI Signal
Big Data in Telematics
Example: Telematics Identifying Driving Patterns with Time Series Data
At least five of the top 10 personal auto insurers and four of the top 10 commercial auto insurers have implemented programs for insureds in at least one state. – Towers Watson, 2011
Progressive leads the rollout, with programs active in 39 states. – 2011
Telematics is projected to grow at an annual rate of 22.2% through 2017. – iSuppli, 2011
…usage-based insurance offerings have quietly caught on and now insurers and service providers are betting on growth. – Insurance & Technology, 2011
Example: Telematics Identifying Driving Patterns with Time Series Data
JH4NA1157MTOO1832||08:01:00 120711||6373||33.1||-0.008 -0.002…
1FALP62W4WH128703||8:01:00 120711||14378||13.0||-0.003 +0.130…
1G1FP22PXS21-00001||08:01:00 120711||6531||45.8||0.02-0.003||…
JH4NA1157MTOO1832||8:01:10 1208011||98323||81.5||+0.21 +0.033…
1FALP62W4WH128703||8:01:10 1208011||176323||61.0||+0.17 -0.002…
1G1FP22PXS2100001||8:01:10 120811||15643||22.4||-0.09 -0.001…
WVWAF93D058000675||8:01:10 120811||3738||45.3||+0.34 -0.111…
WVGBC77L34D064567||8:01:10 120811||2345||22.4||-0.10 -0.01…
TRUWT28N411036790||8:01:10 120811||6764||85.0||+0.40 +0.12…
JH4KB2F56BC000000||8:01:10 120811||12345||43.1||-0.23 – 0.10…
1G4GA5EC7BF000000||8:01:10 120811||65432||22.4||+0.23 +0.13…
1G6DA5EY3B000000||8:01:10 120811||100322||10.1||+0.10 -0.32…
…
JH4NA1157MTOO1832||08:01:01120711||6378||41.1||+0.21 +0.033…
1FALP62W4WH128703||8:01:01 120711||14379||23.0||+0.17; -0.002…
1G1FP22PXS21-00001||08:01:01 120711||6532||39.8||-0.09; -0.001…
JH4NA1157MTOO1832||8:01:01 1208011||98327||90.5||+0.30 +0.023…
1FALP62W4WH128703||8:01:01 1208011||176325||62.0||+0.18 -0.001…
1G1FP22PXS2100001||8:01:01 120811||15644||11.4||-0.10 -0.002…
WVWAF93D058000675||8:01:01 120811||3740||25.3||-0.14 -0.01…
WVGBC77L34D064567||8:01:01 120811||2346||24.4||+0.01 -0.02…
TRUWT28N411036790||8:01:01 120811||6769||75.0||-0.01 +0.02…
JH4KB2F56BC000000||8:01:11 120811||12346||41.1||-0.19 – 0.11…
1G4GA5EC7BF000000||8:01:11 120811||65433||21.4||+0.03 +0.03…
1G6DA5EY3B000000||8:01:11 120811||100322||11.1||+0.11 -0.01…
…
Business Challenge
• Identify aggressive driving behaviors
• Create expanded risk segmentation to match driving patterns with pricing
• Provide customers with risk messaging to improve driving behavior
Big Data Challenge
• Telematics data is semi-structured and voluminous
• Patterns vary by individual and span multiple time periods
• Data capture can vary across programs
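The raw records above are semi-structured. Each has `||`-delimited fields: a VIN, a time plus a date in MMDDYY form (120711 corresponds to 12/7/11 on the next slide), two unlabeled numeric fields, and an accelerometer pair. The Python sketch below parses one well-formed record into typed fields, under these assumptions:

- The names `field3` and `field4` are hypothetical, because the slide does not label those columns.
- Semicolons between accelerometer values are accepted, since capture formats vary across programs.
- Records truncated with "…" on the slide would need the full text to parse.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Tuple


@dataclass
class Reading:
    vin: str
    ts: datetime
    field3: int                 # unlabeled on the slide; hypothetical name
    field4: float               # unlabeled on the slide; hypothetical name
    accel: Tuple[float, float]  # the two accelerometer values, in record order


def parse_record(line: str) -> Reading:
    """Parse one 'VIN||H:MM:SS MMDDYY||int||float||a b' record.
    Raises ValueError for records that do not fit this layout."""
    parts = line.split("||")
    if len(parts) < 5:
        raise ValueError(f"expected 5 fields, got {len(parts)}: {line!r}")
    vin, stamp, f3, f4, accel = (p.strip() for p in parts[:5])
    ts = datetime.strptime(stamp, "%H:%M:%S %m%d%y")
    # Normalize ';' separators, then take the first two numeric values.
    a, b = (float(v) for v in accel.replace(";", " ").split()[:2])
    return Reading(vin, ts, int(f3), float(f4), (a, b))


if __name__ == "__main__":
    print(parse_record("JH4NA1157MTOO1832||08:01:00 120711||6373||33.1||-0.008 -0.002"))
```

In a parallel setting, this function would be applied independently to each record, for example inside a map step, before readings are grouped by VIN and ordered by time.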
Example: Telematics Identifying Driving Patterns with Time Series Data
VIN JH4NA1157MTOO1832, BMW 328i
Reading   Accelerometer     Time
1st       -0.008; -0.002    8:01:00 12/7/11
2nd       +0.21; +0.033     8:01:10 12/7/11
3rd       +0.044; +0.010    8:01:20 12/7/11
4th       -0.10; -0.042     8:01:30 12/7/11
5th       -0.041; +0.010    8:01:40 12/7/11
6th       -0.10; -0.013     8:01:50 12/7/11
Annotated patterns: Sudden Fast Deceleration; Fast Acceleration
Convert to nPath via SQL-MapReduce functions

VIN                  Model          Accelerometer     Time
JH4NA1157MTOO1832    BMW 328i       -0.008; -0.002    8:01:00 12/7/11   …
1FALP62W4WH128703    Toyota Camry   +0.015; -0.003    8:01:00 12/7/11   …
1G1FP22PXS21-00001   VW Passat      -0.02; -0.003     8:01:00 12/7/11   …

VIN                  Model          Accelerometer     Time
JH4NA1157MTOO1832    BMW 328i       +0.21; +0.033     8:01:10 12/8/11   …
1FALP62W4WH128703    Toyota Camry   +0.17; -0.002     8:01:10 12/8/11   …
1G1FP22PXS21-00001   VW Passat      -0.09; -0.001     8:01:10 12/8/11   …

X axis: forward/backward acceleration/deceleration
Y axis: accelerations/decelerations to the left or right, e.g., turning
With Teradata Aster
• Pattern matching to identify premium costs and risk messaging based on driving attributes
• Comparisons by individual VIN, across class of vehicles, by garaging location, etc.
Impact
• Create the right pricing for the right customer based on driver score/variables
• Underwriting predictability
• Provide deeper analytics to create a carrier’s secret sauce
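The slide's annotations (“Sudden Fast Deceleration”, “Fast Acceleration”) suggest turning raw readings into symbols and matching patterns over them, as nPath does. The Python sketch below illustrates that idea under labeled assumptions:

- It uses only the X-axis (forward/backward) value, and assumes the first accelerometer value in each pair is X. The slides do not say which value comes first.
- The cut-off of 0.2 is hypothetical, in the slides' unstated units.
- The readings in the example are made up.

```python
import re

HARD = 0.2  # hypothetical cut-off; the slides give no threshold or units


def symbolize(x_values, hard=HARD):
    """One symbol per X-axis reading: 'A' fast acceleration,
    'D' hard deceleration, 'N' normal."""
    out = []
    for x in x_values:
        if x >= hard:
            out.append("A")
        elif x <= -hard:
            out.append("D")
        else:
            out.append("N")
    return "".join(out)


def count_events(x_values, pattern=r"D+N*A+"):
    """Count non-overlapping matches of a driving pattern; the default is a hard
    deceleration followed (possibly after normal readings) by fast acceleration."""
    return len(re.findall(pattern, symbolize(x_values)))


if __name__ == "__main__":
    xs = [0.0, -0.35, -0.25, 0.05, 0.3, 0.0, -0.4, 0.25]  # hypothetical readings
    print(symbolize(xs), count_events(xs))
```

Counts like these, computed per VIN, are the kind of driving attribute that could feed the comparisons by vehicle class or garaging location listed above.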
Example: Telematics
Visualization of Excessive Driving Events by OEM and Model
Scores are on a scale of 1.0-5.0; a threshold of 3.2 signifies risky driving patterns.
BMW drivers show the riskiest driving, as do some VW and Toyota models.
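The threshold rule on this slide can be written out directly. The Python sketch below averages driver scores per (OEM, model) and keeps the groups at or above 3.2. A few points are assumptions rather than slide content:

- Averaging per model, and treating “at or above” as risky, are choices made for this sketch.
- The scores in the example are hypothetical.

```python
from collections import defaultdict
from statistics import mean

RISKY = 3.2  # threshold from the slide; ">= counts as risky" is an assumption


def risky_models(driver_scores, threshold=RISKY):
    """driver_scores: iterable of (oem, model, score on the 1.0-5.0 scale).
    Returns {(oem, model): mean score} for groups at or above the threshold."""
    groups = defaultdict(list)
    for oem, model, score in driver_scores:
        if not 1.0 <= score <= 5.0:
            raise ValueError(f"score {score} is outside the 1.0-5.0 scale")
        groups[(oem, model)].append(score)
    averages = {key: mean(scores) for key, scores in groups.items()}
    return {key: avg for key, avg in averages.items() if avg >= threshold}


if __name__ == "__main__":
    scores = [("BMW", "328i", 3.5), ("BMW", "328i", 3.7),
              ("Toyota", "Camry", 2.1),
              ("VW", "Passat", 3.3), ("VW", "Passat", 2.9)]
    print(risky_models(scores))
```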
Big Data in Auto/Industrial
Challenge
• Predict the physical condition of operational assets (equipment, machinery, vehicles, aircraft, etc.)
With Teradata Aster
• Analyze “global” current-state reliability conditions and predict potential asset failures
Impact
• Reduce/eliminate downtime cost, increase profitability through run-time longevity and/or yield, and improve quality/safety.
Auto/Industrial: Condition-Based Maintenance
Trillions of pieces of event-driven/diagnostic data, aggregated from component to asset to groups of assets by location/enterprise.
Big Data Analytics
“Big Data? The average number of components within an auto/industrial asset ranges anywhere from 10K to 1M plus.”
“Big Data? Boeing’s new 787 Dreamliner is expected to produce/transmit over one terabyte of diagnostic data per aircraft/flight.”
[Figure: data hierarchy from Components → Systems → Modules → Assets → Location Assets → Enterprise Assets, fed by sensors, PLCs, meters, and telematics into a Big Data Analytic Platform]
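Aggregating diagnostic data “from component to asset to groups of assets by location/enterprise” amounts to a roll-up over a hierarchy. The Python sketch below totals event counts at every level of a hierarchy path, so each level sums everything beneath it. The four-level paths (enterprise, location, asset, module) and the tail and module names are hypothetical. They are a shortened version of the hierarchy in the figure.

```python
from collections import Counter


def roll_up(events):
    """events: iterable of (hierarchy_path, count), where hierarchy_path is a tuple
    ordered from the top level down, e.g. (enterprise, location, asset, module).
    Returns a Counter keyed by every path prefix, holding the total beneath it."""
    totals = Counter()
    for path, count in events:
        for depth in range(1, len(path) + 1):
            totals[path[:depth]] += count
    return totals


if __name__ == "__main__":
    events = [  # hypothetical diagnostic event counts
        (("Enterprise", "MSP", "Tail-101", "Engine-2/Module-4"), 7),
        (("Enterprise", "MSP", "Tail-101", "Engine-1/Module-1"), 5),
        (("Enterprise", "PIT", "Tail-202", "Engine-3/Module-2"), 2),
    ]
    totals = roll_up(events)
    print(totals[("Enterprise",)], totals[("Enterprise", "MSP")])
```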
Example: Aerospace CBM Analytics (1 of 3)
Predictive Failure Analysis Leveraging Telematics Data
Scenario
• Commercial airlines condition-based maintenance leveraging telematics and predictive analytics.
Maintenance Controller
• Identifies two CBM alerts via the flight/tail monitoring dashboard (the MSP flight is grounded; the PIT flight requires investigation for an “Engine” alert).
Example: Aerospace CBM Analytics (2 of 3)
Predictive Failure Analysis Leveraging Telematics Data
Which Engine?
• Analytic drill-down identifies the left outside engine, and specifically a problem with module 4 of that engine.
Additional drill-down provides the complete maintenance history of the engine and its modules.
Engine/Module Predictive Failure Analysis
• Oil temperature (increasing), oil pressure (decreasing), vibration (increasing). However, all metrics are still within their upper/lower control limits.
Example: Aerospace CBM Analytics (3 of 3)
Predictive Failure Analysis Leveraging Telematics Data
Engine module 4 has not reached 80% of its planned maintenance interval. However, the predictive analysis shows that vibration has crossed a threshold before reaching 80% of the planned maintenance interval, indicating a potential failure before the planned maintenance cycle.
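The slides do not say which predictive model flagged module 4. One simple way to get this kind of early warning is to fit a linear trend to vibration readings, project when the trend crosses a limit, and compare that point with 80% of the planned maintenance interval. The Python sketch below does this with ordinary least squares, which is an assumed technique. The operating hours, vibration values, threshold, and interval in the example are all hypothetical.

```python
def linear_fit(xs, ys):
    """Ordinary least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def hours_to_threshold(hours, vibration, threshold):
    """Projected operating hour at which the fitted trend reaches `threshold`;
    None if the trend is flat or decreasing."""
    slope, intercept = linear_fit(hours, vibration)
    if slope <= 0:
        return None
    return (threshold - intercept) / slope


def flag_before_interval(hours, vibration, threshold, planned_interval, fraction=0.8):
    """True if the projected crossing comes before `fraction` of the planned interval."""
    crossing = hours_to_threshold(hours, vibration, threshold)
    return crossing is not None and crossing < fraction * planned_interval


if __name__ == "__main__":
    hours = [100, 200, 300, 400]        # hypothetical operating hours
    vibration = [1.0, 1.2, 1.4, 1.6]    # hypothetical readings, all below the limit
    print(hours_to_threshold(hours, vibration, threshold=3.0))  # roughly 1100
    print(flag_before_interval(hours, vibration, 3.0, planned_interval=2000))
```

Every reading here is still below the limit, yet the projected crossing (about hour 1100) falls before 80% of a 2000-hour interval (hour 1600). That is the situation the slide describes: metrics within control limits, but a failure projected before planned maintenance.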
Summary: Big Data requires new Analytics
1. Extract Value from New and Existing Data with massively parallel big data management and analytics
   Analyze both relational & non-relational data
2. New, High-Value Analytics Beyond SQL: patented SQL-MapReduce framework, pre-built analytics
   Fast & easy analytics at scale
3. Increase Agility & Analyst Productivity: easy to scale, easy to build advanced analytics, easy for business users
   Usable by any SQL-savvy analyst or BI toolset
Simple Word count with MapReduce
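Word count is the canonical MapReduce example: map emits `(word, 1)` pairs, shuffle groups the pairs by word, and reduce sums each group. The Python sketch below runs all three phases in a single process for clarity. A real framework would run map tasks in parallel on separate input splits across nodes. Splitting on whitespace and lower-casing are simplifications; punctuation is not stripped.

```python
from collections import defaultdict
from itertools import chain


def map_phase(split: str):
    """Map: emit (word, 1) for every word in one input split."""
    for word in split.lower().split():
        yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group all values emitted for the same key."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(ones) for word, ones in grouped.items()}


def word_count(splits):
    mapped = chain.from_iterable(map_phase(s) for s in splits)
    return reduce_phase(shuffle_phase(mapped))


if __name__ == "__main__":
    print(word_count(["the quick fox", "the lazy dog", "The end"]))
```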
What is MapReduce?
• A parallel programming framework
  - Made popular by Google
    • Generate search indexes
    • Web scoring algorithms
  - C++, Java, Python, etc.
  - Harness 1000s of CPUs
• MapReduce provides
  - Automatic parallelization
  - Fault tolerance
  - Monitoring & status updates
• Hadoop
  1. MapReduce (Analytics)
  2. Hadoop Distributed File System (HDFS)
[Figure: MapReduce flow (Scheduler; Map Function; map → shuffle → reduce; Results)]
3/1/2012
Big Data Architecture Positioning
Hadoop (Batch): Ingest, Transform, Archive; ~5 concurrent users
• Fast data loading • ELT/ETL • Image processing • Online archival
Aster (Interactive): Discover and Explore; ~25 concurrent users
• Path/Pattern Analysis • Graph Analysis • Multi-structured data • SQL MapReduce
• Fast data loading • ELT/ETL • Online archival
Teradata (Active): Analyze and Execute; ~100++ concurrent users
• Ad-Hoc/OLAP • Predictive Analytics • Spatial/Temporal • Active Execution
Users: Engineers, Data Scientists, Quants, Business Analysts
Enterprise Discovery Architecture
[Figure components: Data Sources (OLTP DBMSs; structured, multi-structured, and non-relational data); ETL; Teradata IDW; Aster Discovery Platform; Discovery Apps (Fraud Discovery, Customer Discovery, Business Insight Discovery); Users (Business Analyst with BI Tools, SAS Analyst with SAS In-DB Modeling, R Analyst with R In-DB, Data Scientist)]