big data
TRANSCRIPT
The Three (or Four) Questions
1. What is Big Data and what does it have to do with my IT?
2. How can Big Data deliver value?
3. How do I implement it?
2 Devashish Khatwani
Huge amount of Data is not Big Data; Big Data is defined by four key attributes
VOLUME: Size of the data being processed. It can run to petabytes for companies such as Google and Facebook.
VELOCITY: Speed at which the data is generated. To put this in perspective, more than 90% of the data generated in human history was generated in the last two years. Banks can relate this to the number of credit card transactions happening per minute.
VARIETY: Types of data that make up the dataset you need to process. It could be structured data coming from databases, unstructured data such as tweets about your company, or semi-structured data such as that coming from an online feedback form.
VERACITY: Uncertainty of the data. You can think of it as a partially filled feedback form or a tweet with hashtags such as #YOLO.
The leaders in Big Data implementation are moving away from the traditional technology stack
Traditional Technology Stack
1. Data Storage and Management: Monolithic Commodity Hardware; Centralized Relational Database
2. Data Processing: Queries (SQL)
Big Data Technology Stack
1. Data Storage and Management: Distributed Commodity Hardware (Hadoop, Parallel Relational Database, NoSQL Database); Monolithic Commodity Hardware (Relational Database)
2. Data Processing: Interactive Query, Real Time Query, Map Reduce
3. Data Analysis and Presentation: Data Visualization Tools
Migrating to a Big Data Technology Stack has to be gradual so that regular reporting is not hindered
Traditional stack: Data Generation Source; ETL Process; Centralized Relational Database on Monolithic Commodity Hardware; Queries (SQL); Regular Reports
Big Data stack: Data Generation Source; ETL Process; Hadoop, Parallel Relational Database and NoSQL Database on Distributed Commodity Hardware; Relational Database on Monolithic Commodity Hardware; Interactive Query, Real Time Query, Map Reduce; Data Visualization Tools; Export to the existing Database; Regular Reports
Technology is just an enabler; in order to make money you need to analyse your Data
1. Models to present history: This is similar to the reporting that most companies have today, with an added layer of drill-down queries and advanced scorecard reporting.
2. Models to explain history: Analyses such as segmentation analysis and sensitivity analysis, which can be used to inform future decisions or to assess the efficacy of past decisions.
3. Models to predict the future: This maturity level entails advanced statistical methods such as predictive analytics, simulation, optimization and machine learning.
• Use Big Data to build models which may predict the future
• Use historical data and/or experiments to validate the models
• Make business decisions based on these models to get your ROI
Increasing Complexity and Maturity
Big Data Use Cases: The insurance industry can use Big Data to increase customer loyalty, manage actuarial risk and increase the efficiency of the claims function
1. Increasing Customer Loyalty: By running real-time sentiment analysis on social media platforms, emails, chats, websites etc., insurance providers can develop custom response approaches to known behaviour patterns. For instance, by analyzing users' website activity and correlating it with the subsequent calls made to the call centre, the insurance provider can predict the nature of an incoming call and develop a custom response to the situation.
2. Actuarial Risk Management: Current auto insurance premiums are based on credit scores of individuals, demographic variables and vehicle classifications. Insurance companies can extract driving patterns through the use of telematics and use this data to offer better premiums to their customers. This will not only help the insurance provider to finely segment the customer base but also alleviate the problem of moral hazard.
3. Increasing the Efficiency of the Claims Function: With text analysis of previous claims, a claims officer can cross-reference similar claims more quickly and speed up the claims process. Text analysis of claims can also reveal patterns, and combining these with demographic and behavioral data can generate cohorts which the insurance provider can use to detect and avoid fraud.
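As a rough illustration of how text analysis could help a claims officer cross-reference similar claims, the sketch below ranks past claims by bag-of-words cosine similarity to a new claim. This is one simple technique chosen for illustration, not a method named in the slides; the claim IDs, claim texts and the function name most_similar_claims are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def most_similar_claims(new_claim, past_claims, top_n=2):
    """Return the top_n past claims most similar to new_claim as (id, score) pairs."""
    query = Counter(tokenize(new_claim))
    scored = [
        (cosine_similarity(query, Counter(tokenize(text))), claim_id)
        for claim_id, text in past_claims.items()
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [(claim_id, round(score, 3)) for score, claim_id in scored[:top_n]]


# Hypothetical past claims, keyed by a made-up claim ID.
past_claims = {
    "C-001": "rear bumper damaged in parking lot collision",
    "C-002": "water damage to kitchen after pipe burst",
    "C-003": "front bumper damaged in low speed collision",
}

matches = most_similar_claims("bumper damaged in collision at parking lot", past_claims)
print(matches)
```

In practice a production system would likely use richer text features and combine the scores with demographic and behavioral data, as the slide suggests; the sketch only shows the cross-referencing step.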
Big Data Use Cases: The telecom industry can use Big Data to improve segmentation, optimize capacity planning and optimize promotional spend
1. Better Segmentation: Usually segmentation is based on demographic, geographic, behavioral and psychographic attributes, but with Big Data a telecom provider can start micro-segmenting by adding a layer of:
• Activity-based data (website tracking, purchase history, call centre data, mobile usage data, response to incentives)
• Social network profile
• Social influence and sentiment data
2. Optimize Capacity Planning: Network capacity, workforce capacity etc. can be optimized based on analysis of historical data. For instance, instead of using aggregated time series forecasting for the number of calls received by the call centres, the telecom provider can forecast at customer segment level and then aggregate the segment forecasts to generate a more accurate prediction.
3. Optimizing Promotional Spend: The ROI of each email campaign, social media campaign etc. can be calculated and an optimized mix of promotion methods can be generated. Furthermore, factors such as the date, time and text of a campaign can be tested through experiments to find the best combination.
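The segment-level forecasting idea under Optimize Capacity Planning can be sketched as follows. The example forecasts call volume per customer segment and sums the results, then computes a forecast on the aggregated series for comparison. The forecasting rule (last value times average growth ratio) is a deliberately naive stand-in chosen for illustration, and the segment names and weekly call counts are hypothetical. The sketch shows only the mechanics and makes no claim about which forecast is more accurate.

```python
from statistics import mean


def forecast_next(series):
    """Naive one-step forecast: last value times the average growth ratio.

    Assumes at least two observations and strictly positive values.
    """
    ratios = [later / earlier for earlier, later in zip(series, series[1:])]
    return series[-1] * mean(ratios)


def bottom_up_forecast(calls_by_segment):
    """Forecast each segment separately, then sum the segment forecasts."""
    per_segment = {seg: forecast_next(s) for seg, s in calls_by_segment.items()}
    return per_segment, sum(per_segment.values())


def aggregate_forecast(calls_by_segment):
    """Sum the segments period by period first, then forecast the total series."""
    totals = [sum(values) for values in zip(*calls_by_segment.values())]
    return forecast_next(totals)


# Hypothetical weekly call-centre volumes for two customer segments.
calls_by_segment = {
    "prepaid": [100, 110, 121],   # growing about 10% per week
    "postpaid": [200, 200, 200],  # flat
}

per_segment, bottom_up_total = bottom_up_forecast(calls_by_segment)
top_down_total = aggregate_forecast(calls_by_segment)

print(per_segment)
print(round(bottom_up_total, 1), round(top_down_total, 1))
```

The two totals differ because the growing segment is forecast on its own trend instead of being diluted by the flat one; validating which approach predicts better would require holding out real historical data.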
Establishing a Centre of Excellence for Big Data implementation is the fastest way to Big Data Success
[Diagram: each option shows Corporate, Business Unit, COE and Big Data Project; one also shows an Analytics team]
Option 1: Internal Consulting
This setup treats the COE as an internal team of experts which can be called upon by the Business Unit for projects. The onus of initiating a Big Data project lies with the Business Unit. The COE can be treated either as a cost centre or a profit centre under this structure.
Option 2: Centralized
Under this setup the COE identifies and executes the Big Data initiative with support from the Business Unit. The onus of identifying a viable Big Data initiative rests with the COE. The resources in the COE must be well versed in the Business Unit's business for this structure to be effective.
Option 3: Centre of Excellence
Under this structure the COE is a small organization with very specialized Big Data skills, and the Business Unit itself is well versed in basic analytics. This structure works well for organizations that have historically been analytically savvy and have taken data-driven decisions in the past.
Big Data Initiatives need to be championed by CXOs in order for them to have maximum impact
Source: LEAP Study 2014 by A.T. Kearney and Carnegie Mellon University
You need seven types of people for your Big Data Initiative
1. Executive Leader
2. Project Manager
3. Internal Trainer
4. External Liaison
5. Data Technologist
6. Data Scientist
7. Data Analyst
CORE: Data Technologist, Data Scientist, Data Analyst
The three people who form the core of your Big Data Initiative: Data Technologist, Data Scientist and Data Analyst
Data Technologist: The Data Technologist is responsible for identifying the organization's data sources and should be able to work on different aspects of data management, such as:
1. Data Governance
2. Data Architecture
3. Data Quality
4. Data Security
5. Data Warehousing
6. Data Availability
Data Scientist: The Data Scientist has the coding and statistical know-how to perform analyses such as clustering, predictive analytics, sentiment analysis and machine learning. This role is the most important link in converting data to actionable insights. Some of the skills that a data scientist should possess are:
1. Programming experience in Python, Java, R and SQL
2. Knowledge of data mining, machine learning and statistical methods
3. Experience working with relational databases
Data Analyst: The Data Analyst is responsible for brainstorming the different models which need to be studied and statistically validated by the Data Scientist, and for calculating the dollar impact of actions taken based on Big Data insights. A data analyst needs basic statistics and a sound understanding of the product or function for which the Big Data initiative is being run. For instance, if you are running sentiment analysis on social media, the analyst should be an expert in social media marketing.
Prototyping and then developing a repeatable solution is the best way to extract value from Big Data
Source: LEAP Study 2014 by A.T. Kearney and Carnegie Mellon University
Getting Started on your Big Data journey
Source: Deloitte, Big Data An Insurance business imperative
Thank You
Devashish Khatwani
B Tech – Electrical Engineering, IIT Roorkee
MBA – Rotman School of Management
[email protected]