acm sf chapter presents 20091110l
TRANSCRIPT
-
7/29/2019 ACM SF Chapter Presents 20091110l
1/44
THE DATA MINING AUTOMATION COMPANY TM
Implicit Social Networks and their Use in Predictive Modeling
Khosrow Hassibi, Ph.D.
November 10, 2009
Presented @ SFBAY ACM Chapter Meeting
-
Agenda*
Evolution of Predictive Modeling Applications
Typical Attributes
Implicit Social Networks
Social Attributes (Focus on Telco)
Business Uses
Challenges
Generalization
* This presentation is based on the work & contribution of many people at KXEN.
-
EVOLUTION OF PREDICTIVE MODELING APPLICATIONS
-
Evolution of Predictive Modeling Applications
Started to get attention in the late 1980s and early 1990s
- Only pioneers
Predicting individual customer behavior: B2C applications
- With a focus on financial applications
Score-based solutions based on
- Machine Learning (Neural Networks, ...)
- AND traditional statistical modeling techniques
Risk, Fraud, Attrition (Churn), Targeted Marketing, ...
-
Evolution of Predictive Modeling Applications 2
By the late 1990s, these applications became more popular and gained more industry traction
- Also in Telco, Retail, and eCommerce
Predictive Analytics market & tools significantly improved
Nature of applications
- Offline/Batch:
Examples: Targeted marketing, Churn/Attrition
- Real-time Transaction-based:
Credit card Fraud, Web
-
TYPICAL ATTRIBUTES
-
Typical Data Streams
Data Volume, Variety, Velocity, and Validity are increasing
Extremely granular data available
- Many data sources
For a typical customer
- Static: Customer demographics, psychographics, geographical, ...
- Time series transactional data (monetary/non-monetary):
  Financial: ATM, checking, savings, credit cards, bill pay, email, brokerage, campaign, call center, web, payments, ...
  Telco mobile: Calls, SMS, MMS, data, payments, campaigns, call center, downloads, browsing, ...
-
Example Raw Credit Card Transaction Data
NAME             DESCRIPTION
TRANS_ACCOUNT    Credit card account
TRANS_DATE       Transaction date
TRANS_TIME       Transaction time
TRANS_AMOUNT     Transaction amount
TRANS_TYPE       Type of transaction: M for Merchandise, C for Cash, ...
TRANS_CRED_LINE  Credit line at the time of transaction
TRANS_ITEM_CODE  Standard Industry Code used for item category description by Visa and MasterCard
TRANS_ZIP5_CODE  Five-digit US zip code
TRANS_COUNTRY    ISO country code
TRANS_KEY_SWIPE  S for Swiped, K for Keyed, ...
Many more fields ...
-
Customer Behavioral Attributes
Time series transactional data is flattened for modeling use
- Resulting in a larger number of attributes to be derived/stored/analyzed
- Grows exponentially considering time series, segments, interactions, ...
Focused on the entity to be modeled
- A customer, a household, an account, a connection, ...
Behavioral variables most often capture past behavior of the entity in isolation:
- Short, medium, and longer term activity
-
Customer Behavioral Attributes 2
More complex/customized approaches:
- Capture the entity's behavior within its peer group/segment, computed based on one or a variety of other attributes, including direct/indirect interaction with other entities
- Segmented modeling
Example:
- Static attributes: Age, location, hobbies, financial segment, ...
- Behavioral attributes:
  Number of non-cash credit card transactions in the last 5 minutes
  Total purchases in the last two months on discretionary items
  Deviation of average spend compared to the average luxury traveler
  Number of SMS sent/received in the last 30 days
The number of typical customer behavior attributes is in the 100s
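A minimal sketch of how transaction records might be flattened into behavioral attributes like those listed above. The field comments mirror the example credit card table; the `Txn` type, both function names, and the sample data are hypothetical, and the item-category filter for "discretionary items" is left out for brevity.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Txn:
    account: str     # TRANS_ACCOUNT
    when: datetime   # TRANS_DATE + TRANS_TIME combined
    amount: float    # TRANS_AMOUNT
    txn_type: str    # TRANS_TYPE: "M" merchandise, "C" cash


def count_non_cash_last_minutes(txns, account, now, minutes=5):
    """Number of non-cash transactions for one account in the trailing window."""
    start = now - timedelta(minutes=minutes)
    return sum(
        1 for t in txns
        if t.account == account and t.txn_type != "C" and start < t.when <= now
    )


def total_spend_last_days(txns, account, now, days=60):
    """Total transaction amount for one account in the trailing window."""
    start = now - timedelta(days=days)
    return sum(
        t.amount for t in txns
        if t.account == account and start < t.when <= now
    )


# Made-up sample data for illustration only.
now = datetime(2009, 11, 10, 12, 0)
txns = [
    Txn("A1", datetime(2009, 11, 10, 11, 58), 40.0, "M"),
    Txn("A1", datetime(2009, 11, 10, 11, 59), 25.0, "M"),
    Txn("A1", datetime(2009, 11, 10, 11, 57), 100.0, "C"),
    Txn("A1", datetime(2009, 10, 1, 9, 0), 60.0, "M"),
    Txn("A2", datetime(2009, 11, 10, 11, 59), 5.0, "M"),
]

# One flattened row per entity: the time series collapses into fixed columns.
record = {
    "non_cash_txn_5min": count_non_cash_last_minutes(txns, "A1", now),
    "spend_60d": total_spend_last_days(txns, "A1", now),
}
```

Each additional window length, transaction type, or segment adds another column, which is why the attribute count grows so quickly.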
-
Customer Model Record
The Customer Model Record is the set of all attributes that are used for modeling (could be database views).
Components: ID fields, behavioral attributes, static attributes, model scores, contact history
[Diagram: Customer Data Warehouse tables and the Customer Model Record]
Model Description: Model ID, Model Name, Model Date
Model Scores: Model ID (FK), Individual ID (FK), Score
Individuals in Household: Individual ID, Household ID (FK), Individual attribute fields, Primary Householder Flag
Campaign Type Ref: Campaign ID, Description, Start Date
Contact History Detail: Campaign ID (FK), Individual ID (FK), Promotion Code, Treatment Code, Contact Date
Contact History Summary: Individual ID (FK), Number of Mail Contacts, Number of eMail Contacts, Number of Telephone Contacts
Household: Household ID, Street Address, City, State, Zip, Phone
Demographics / Lifestage: Individual ID (FK), Demographic Data Fields
Accounts: Account No, Account Information
Transactions: Transaction ID, Individual ID (FK), Account No (FK), Transaction Details, Transaction Timestamp
Products: Product ID, Product Details
Purchases: Transaction ID, Individual ID (FK), Product ID (FK), Purchase Details
-
IMPLICIT SOCIAL NETWORKS
-
Social Networks
Social network analysis (related to network theory*) has emerged as a key technique in modern sociology.
- It has received more attention since the rise in popularity of online social networks
A social network is a social structure made of individuals (or organizations) called "nodes," which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige.
*Network theory concerns itself with the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects.
Source: Wikipedia
-
Implicit Social Networks: Telco Call Detail Data
Main fields:
- A number (Origination)
- B number (Destination)
- Start Datetime
- Duration
- Call category
  Voice, SMS, MMS, ...
Much other information:
- Rate information
- The result of the call (whether it was answered, busy, etc.)
- Revenue
- The number charged for the call
- Additional digits on the B number used to correctly route or charge the call
- The route by which the call entered the exchange
- The route by which the call left the exchange
- Any fault condition encountered
- Any facilities used during the call, such as call waiting or call diversion, ...
[Diagram: A number (origination) linked to B number (destination)]
-
Implicit Social Network 2
Many implicit social networks are hidden in the call data
They can be extracted by defining the relationship based on:
- Call categories
  Voice Network (A calls B on voice)
  SMS Network (A sends an SMS to B)
  MMS Network (A sends an MMS to B)
  All services Network (A calls or sends an SMS or MMS to B)
- Intensity
  At least 3 communications
  Duration at least 1 minute
- Time period
  Call took place during this month, last 60 days, ...
- Direction of communication
  Directed or undirected
- Other categories
  Rate, roaming, revenue, ...
-
Example Relationships
4 relationships defined
4 graphs (networks) can be extracted
A. Any voice call between two nodes, with direction
B. At least 3 voice calls between the nodes, with direction
C. At least 2 SMS between the nodes (direction does not matter)
D. Any MMS between the nodes, with direction
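The four relationship definitions above can be sketched as filters over call detail records. This is an illustrative sketch only: each record is reduced to a hypothetical `(a_number, b_number, category)` tuple, the `extract_graph` helper is an assumed name, and the sample data is made up.

```python
from collections import Counter

# Simplified call detail records: (A number, B number, call category).
cdrs = [
    ("alice", "bob", "voice"),
    ("alice", "bob", "voice"),
    ("alice", "bob", "voice"),
    ("bob", "carol", "voice"),
    ("carol", "alice", "sms"),
    ("alice", "carol", "sms"),
    ("bob", "alice", "mms"),
]


def extract_graph(cdrs, category, min_count=1, directed=True):
    """Return the edges whose communication count meets the intensity threshold."""
    counts = Counter()
    for a, b, cat in cdrs:
        if cat != category:
            continue
        # For an undirected relationship, A->B and B->A count toward the same edge.
        key = (a, b) if directed else tuple(sorted((a, b)))
        counts[key] += 1
    return {edge for edge, n in counts.items() if n >= min_count}


graph_a = extract_graph(cdrs, "voice")                             # A: any voice call, directed
graph_b = extract_graph(cdrs, "voice", min_count=3)                # B: at least 3 voice calls, directed
graph_c = extract_graph(cdrs, "sms", min_count=2, directed=False)  # C: at least 2 SMS, undirected
graph_d = extract_graph(cdrs, "mms")                               # D: any MMS, directed
```

A time-period or duration condition would be one more filter inside the loop, given the corresponding fields on each record.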
-
Example Extracted Graphs
[Figure: the four extracted graphs, labeled A, B, C, and D]
-
Historical Uses in Telco & Financial Industries
Mainly link analysis
- Security (Telco, Financial)
- Fraud (Telco, Financial)
Finding fraud rings
Establishing evidence
Point of compromise (counterfeit fraud)
- Money laundering (Financial)
Exploration
Investigation
-
New Marketing Applications
Interactions of customers impact their decision making
Example
- An influential customer leaves, purchases a product, ...
- After some time, some in his/her micro-community may follow
A customer's behavior/decision could exploit the network structure to propagate (diffusion)
Understanding the implicit community structure in Telco data allows us:
- To better understand customer behaviors
- To better target customers in their micro-communities
- To identify the opportunity/risk of propagation of a node's behavior within its micro-community
-
SOCIAL ATTRIBUTES
-
Types of Attributes
- Circle analysis
- Connection analysis
- Centrality analysis
- Community analysis
- Influence analysis
-
Example Social Attributes
Circle analysis (Level I Analysis)
- Neighbors (links in or out)
- Off-net/on-net ratios per node
- Role (Sink, Source, Repeater)
Connection analysis (profiling based on neighbors)
- Profile on continuous extra attributes
  Example: average age of neighbors
- Profile on discrete extra attributes
  Example: ratio of churners among the first-circle nodes
  Example: ratio of those with iPhones among the first-circle nodes
Centrality analysis
- Number of triangles
- Betweenness
- ...
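A few of these social attributes can be sketched on a toy directed graph. The role definitions used here are an assumption (source = outgoing links only, sink = incoming links only, repeater = both), since the slide does not define them; the function names, edge list, and churn flags are likewise hypothetical.

```python
from collections import defaultdict

# Directed edges (caller, callee) and a made-up set of churned nodes.
edges = {("a", "b"), ("b", "c"), ("c", "a"), ("a", "d"), ("e", "a")}
churned = {"b"}

out_nb, in_nb = defaultdict(set), defaultdict(set)
for src, dst in edges:
    out_nb[src].add(dst)
    in_nb[dst].add(src)


def role(node):
    """Circle analysis: classify a node by the direction of its links (assumed definitions)."""
    has_in, has_out = bool(in_nb[node]), bool(out_nb[node])
    if has_in and has_out:
        return "repeater"
    if has_out:
        return "source"
    if has_in:
        return "sink"
    return "isolated"


def first_circle(node):
    """All direct neighbors, regardless of link direction."""
    return in_nb[node] | out_nb[node]


def churner_ratio(node):
    """Connection analysis: share of first-circle neighbors that churned."""
    circle = first_circle(node)
    return len(circle & churned) / len(circle) if circle else 0.0


def triangle_count(node):
    """Centrality analysis: pairs of neighbors that are themselves linked."""
    circle = sorted(first_circle(node))
    return sum(
        1
        for i, u in enumerate(circle)
        for v in circle[i + 1:]
        if v in first_circle(u)
    )
```

For node "a", the first circle is {b, c, d, e}, so one churner out of four neighbors gives a ratio of 0.25, and the only linked neighbor pair is (b, c).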
-
Social Attributes 2
Community Analysis
- Identify communities
- Roles in the community
  Social, Passive, Social, Internal
  Marginal, Outlier, Leader, Follower, Bridge
Influence Analysis
- Identify influencers
- Analyze the propagation of a node's influence
-
Example of Community Assignment
-
SN Attributes Computation Process
KSN
-
Example List
-
BUSINESS USES
-
Marketing Use of Implicit Social Networks
Traditional Targeted Marketing (Customer Behavior)
Viral Marketing (Diffusion by Influence)?
-
Marketing Use of Implicit Social Networks (2)
Traditional targeted marketing
+ Uses customer data: demographic, psychographic, time series transactional data (behavioral)
- Ignores social data
+ Empirical
+ Adds lift
Targeted marketing + SNA attributes
+ Uses customer data: demographic, psychographic, time series transactional data (behavioral)
+ Uses social data
+ Empirical
+ Adds incremental lift over customer data alone
Viral marketing
- Mostly ignores customer data
+ Uses social network data
+ Targets only influencers, ...
- Identifying influencers is challenging & domain specific
Effectiveness?
-
1. Viral Marketing
The goal is to identify individuals with high Social Networking Potential (SNP) and create viral messages that appeal to this segment of the population and have a high probability of being passed along.
- To identify the opportunity (or risk) of propagation of a customer's behavior/decision within his or her community.
In its crude form
- Not empirical
- Influencer models?
Using detailed network data
- Empirical (churn influence score, churn pressure score, ...)
- Experiments can be conducted
- Ignores detailed customer behavior
-
2. Targeted Marketing Enhanced by SNA Data
Use traditional targeted marketing based on predictive modeling
- Customer behavior data
- Demographics, ...
Enhance with implicit social network data
-
2. Targeted Marketing Enhanced by SNA Data 2
Improved targeting
Improved lift
Better understanding of customer
Better messaging
[Figure: lift chart with Random, Wizard, and Model curves; x-axis from 0% to 100%, y-axis from 0 to 1; labels A, B, C, D]
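A lift chart of this kind compares a model's ranking against random targeting and a perfect ("wizard") ordering. The sketch below shows one common way to compute such cumulative gains curves; the `cumulative_gains` helper and the scores and labels are made up for illustration and are not the presenter's data.

```python
def cumulative_gains(scores, labels, fractions):
    """Share of all positives captured when targeting the top fraction by score."""
    ranked = [
        y for _, y in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ]
    total_pos = sum(ranked)
    gains = []
    for f in fractions:
        k = round(f * len(ranked))  # number of customers targeted
        gains.append(sum(ranked[:k]) / total_pos)
    return gains


# Made-up example: 1 = responder, 0 = non-responder.
labels = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.2, 0.1, 0.6, 0.3, 0.4, 0.05, 0.5]
fractions = [0.1, 0.2, 0.3, 0.5, 1.0]

model = cumulative_gains(scores, labels, fractions)
wizard = cumulative_gains(labels, labels, fractions)  # perfect ordering: positives first
random_baseline = fractions                           # expected gain of random targeting
```

Adding social network attributes to a model would show up here as a `model` curve that sits closer to `wizard` at the same targeting fraction.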
-
3. Graph (Node) Pairing
Identify a node based on the characteristics of its connections in the network
Business use:
- Rotational churn
  Wrongly inflates the marketing campaign success (Profit)
  Wrongly inflates churn rate (Loss)
- Identifying fraudsters
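One simple way to illustrate node pairing is to compare the neighbor set of a newly activated node with the neighbor sets of recently churned nodes. The sketch below uses Jaccard similarity, which is a stand-in technique chosen for illustration and not necessarily the method used in the presentation; the names, threshold, and data are all hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_match(new_neighbors, churned_profiles, threshold=0.5):
    """Return (churned_id, similarity) for the closest churned node, or None."""
    best_id, best_sim = None, 0.0
    for node_id, neighbors in churned_profiles.items():
        sim = jaccard(new_neighbors, neighbors)
        if sim > best_sim:
            best_id, best_sim = node_id, sim
    return (best_id, best_sim) if best_sim >= threshold else None


# Neighbor sets of recently churned subscribers (made-up data).
churned_profiles = {
    "old_1": {"n1", "n2", "n3", "n4"},
    "old_2": {"n7", "n8"},
}

# A new subscriber who talks to mostly the same people as "old_1" is a
# candidate for rotational churn; an unrelated one is not matched.
match = best_match({"n1", "n2", "n3", "n9"}, churned_profiles)
no_match = best_match({"n5", "n6"}, churned_profiles)
```

In practice the threshold and the choice of similarity measure would need empirical tuning against known cases.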
-
Case Study 1 Predictive Modeling (KXEN)
COMPANY: Wireless/cable/media service provider.
SITUATION: Relationship data hidden in call detail records is not leveraged.
CHALLENGE: With 7.5 million wireless subscribers and tens of millions of call detail records daily, deriving and analyzing social network attributes is a monumental task.
EXPERIMENT: Augment the hundreds of typical behavioral attributes with hundreds of derived social network attributes and measure the impact on lift for selected test campaigns in acquisition and cross sell/up sell.
RESULTS: Preliminary results positive. For a subset of test campaigns, lift could be improved anywhere from 9% to 13%. Target overlap was anywhere from 65% to 90%. Further experimentation is due.
-
Case Study 2 BI Use (Other)
COMPANY: Wireless service provider.
SITUATION: SN data hidden in call detail records is not leveraged.
CHALLENGE: 4.2 million SIM-card customers
EXPERIMENT: Extract communities from implicit social network in the call data and analyze
RESULTS: 700,000 communities derived. Average of 6 people in a community. They identified key people in ethnic groups and offered a phone to text in that language. Phones were immediately sold.
-
Studies in Progress
1. Wireless operator (16.5 M nodes)
- Uplift on churn models
- Finding influencers
- Detect rotational churn
2. Fixed Line operator (7 M nodes)
- Uplift on churn models
- Detect communities and roles
- Detect propagation of churn
3. Wireless operator (90 M nodes)
- Scalability test (handling 30M subscribers with 90M nodes)
- Uplift on churn
-
CHALLENGES
-
Challenges
Call detail data is massive
Random sampling on nodes cannot be used
Scalability is crucial
- Algorithms must be scalable
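One likely reason node sampling fails, offered here as an illustration and not as the presenter's stated argument: an edge survives only if both of its endpoints are sampled, so keeping a fraction p of the nodes keeps roughly p squared of the edges and leaves most neighborhoods empty. The sketch below uses a synthetic random graph; the function name and all parameters are assumptions.

```python
import random


def retained_edge_fraction(edges, nodes, sample_rate, seed=0):
    """Fraction of edges that survive when nodes are sampled independently."""
    rng = random.Random(seed)
    kept = {n for n in nodes if rng.random() < sample_rate}
    surviving = [e for e in edges if e[0] in kept and e[1] in kept]
    return len(surviving) / len(edges)


# Synthetic graph: 1,000 nodes and 5,000 random edges (made-up data).
nodes = list(range(1000))
graph_rng = random.Random(42)
edges = [(graph_rng.randrange(1000), graph_rng.randrange(1000)) for _ in range(5000)]

# Sampling 10% of nodes retains far fewer than 10% of edges.
fraction = retained_edge_fraction(edges, nodes, sample_rate=0.1)
```

Because social attributes are computed from a node's links, a sample that drops most edges distorts almost every attribute, which pushes the work toward algorithms that scale to the full graph.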
-
Challenges 2
What is influence, and how can it be measured?
How much additional lift can SNA add to traditional churn models?
How can roles be defined or customized?
How much additional lift can SNA add to traditional marketing propensity models?
Community detection in an arbitrary network is difficult
More studies on business value (ROI) are needed
-
GENERALIZATION
-
Indirect Relations
Social networks could still be defined when there is an indirect
link between customers
Example:
- Retail: purchases: Customer-Item relation
- Financial: credit/debit card data: Customer-Merchant relation
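An indirect relation of this kind can be turned into a customer-to-customer network by linking customers who share an item (or merchant). The sketch below is a minimal illustration of that projection; the `project_customers` helper, the `min_shared` threshold, and the purchase data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Customer-Item relation: (customer, item) pairs (made-up data).
purchases = [
    ("c1", "item_x"), ("c2", "item_x"),
    ("c2", "item_y"), ("c3", "item_y"), ("c1", "item_y"),
    ("c4", "item_z"),
]


def project_customers(pairs, min_shared=1):
    """Link two customers when they share at least `min_shared` items."""
    by_item = defaultdict(set)
    for customer, item in pairs:
        by_item[item].add(customer)

    shared = defaultdict(int)
    for customers in by_item.values():
        # Every pair of customers who bought the same item gets one shared count.
        for u, v in combinations(sorted(customers), 2):
            shared[(u, v)] += 1

    return {edge: n for edge, n in shared.items() if n >= min_shared}


links = project_customers(purchases)
strong_links = project_customers(purchases, min_shared=2)
```

Very popular items link almost everyone, so a real projection would likely need a higher threshold or a weighting that discounts common items.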
-
Other Applications
Financial
- Credit/debit card data: Customer-Merchant buying patterns
- Small/medium business payment data
- Online payment data such as PayPal
- Joint accounts
- Anti-money laundering, investigating money transfers
- Anti-credit card fraud, using Merchant-Buyer patterns
Retail
- Improved recommendations, using Consumer-Product buying patterns
- Online Buyer-Seller communities
Social network sites
- Improved content, ads, using Friend-Friend interactions
Emails, forums, blogs
- Improving customer understanding using interactions
-
Q/A
-
One Prediction
Data providers and bureaus currently provide much information on US households:
- Demographic
- Psychographic
- Interest
- Hobby
- Financial
- Risk (credit worthiness)
- ...
In the future, they could also provide
- Social (relationship) metrics?