acm sf chapter presents 20091110l

  • 7/29/2019 ACM SF Chapter Presents 20091110l

    THE DATA MINING AUTOMATION COMPANY TM

    Implicit Social Networks and their Use in Predictive Modeling

    Khosrow Hassibi, Ph.D.

    November 10, 2009

    Presented @ SFBAY ACM Chapter Meeting

    Agenda*

    Evolution of Predictive Modeling Applications

    Typical Attributes

    Implicit Social Networks

    Social Attributes (Focus on Telco)

    Business Uses

    Challenges

    Generalization

    *This presentation is based on the work and contributions of many people at KXEN.

    EVOLUTION OF PREDICTIVE MODELING APPLICATIONS

    Evolution of Predictive Modeling Applications

    Started to get attention in the late 1980s and early 1990s

    - Only pioneers

    Predicting individual customer behavior (B2C applications)

    With a focus on financial applications

    Score-based solutions based on

    - Machine Learning (Neural Networks, ...)

    AND

    - Traditional statistical modeling techniques

    Risk, Fraud, Attrition (Churn), Targeted Marketing, ...

    Evolution of Predictive Modeling Applications (2)

    By the late 1990s, these applications became more popular and gained more industry traction

    - Also in Telco, Retail, and eCommerce

    Predictive Analytics market & tools significantly improved

    Nature of applications

    - Offline/Batch:

    Examples: Targeted marketing, Churn/Attrition

    - Real-time Transaction-based:

    Credit card fraud, Web, ...

    TYPICAL ATTRIBUTES

    Typical Data Streams

    Data Volume, Variety, Velocity, and Validity are increasing

    Extremely granular data available

    - Many data sources

    For a typical customer:

    Static: Customer demographics, psychographics, geographical, ...

    Time series transactional data (monetary/non-monetary):

    Financial: ATM, checking, savings, credit cards, bill pay, email, brokerage, campaign, call center, web, payments, ...

    Telco mobile: Calls, SMS, MMS, data, payments, campaigns, call center, downloads, browsing, ...

    Example Raw Credit Card Transaction Data

    NAME               DESCRIPTION
    TRANS_ACCOUNT      Credit card account
    TRANS_DATE         Transaction date
    TRANS_TIME         Transaction time
    TRANS_AMOUNT       Transaction amount
    TRANS_TYPE         Type of transaction: M for Merchandise, C for Cash, ...
    TRANS_CRED_LINE    Credit line at the time of transaction
    TRANS_ITEM_CODE    Standard Industry Code used for item category description by Visa and MasterCard
    TRANS_ZIP5_CODE    Five-digit US zip code
    TRANS_COUNTRY      ISO country code
    TRANS_KEY_SWIPE    S for Swiped, K for Keyed, ...

    Many more fields

    Customer Behavioral Attributes

    Time series transactional data is flattened for modeling use

    - Resulting in a larger number of attributes to be derived/stored/analyzed

    - Grows exponentially considering time series, segments, interactions, ...

    Focused on the entity to be modeled

    - A customer, a household, an account, a connection, ...

    Behavioral variables most often capture past behavior of the entity in isolation:

    - Short, medium, and longer term activity

    Customer Behavioral Attributes (2)

    More complex/customized approaches:

    - Capture the entity's behavior within its peer group/segment, computed based on one or a variety of other attributes, including direct/indirect interaction with other entities

    - Segmented modeling

    Example:

    Static attributes: Age, location, hobbies, financial segment, ...

    Behavioral attributes:

    - Number of non-cash credit card transactions in the last 5 minutes

    - Total purchases in the last two months on discretionary items

    - Deviation of average spend compared to the average luxury traveler

    - Number of SMS sent/received in the last 30 days

    The typical number of customer behavior attributes is in the 100s
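
The flattening of time series transactions into behavioral attributes can be sketched as follows. This is a minimal illustration, not the method used in the talk: the `Transaction` class, the window lengths, and the attribute names are all assumptions, loosely modeled on the raw credit card fields listed earlier.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Transaction:
    account: str          # corresponds to TRANS_ACCOUNT
    timestamp: datetime   # TRANS_DATE and TRANS_TIME combined (assumption)
    amount: float         # corresponds to TRANS_AMOUNT
    kind: str             # corresponds to TRANS_TYPE: "M" merchandise, "C" cash


def behavioral_attributes(transactions, account, as_of, windows_days=(1, 30, 90)):
    """Flatten one account's history into one row of window-based attributes."""
    row = {"account": account}
    for days in windows_days:
        start = as_of - timedelta(days=days)
        in_window = [
            t for t in transactions
            if t.account == account and start < t.timestamp <= as_of
        ]
        # Short, medium, and longer term activity for the entity in isolation.
        row[f"n_trans_{days}d"] = len(in_window)
        row[f"total_amount_{days}d"] = sum(t.amount for t in in_window)
        row[f"n_non_cash_{days}d"] = sum(1 for t in in_window if t.kind == "M")
    return row
```

Each extra window, segment, or interaction multiplies the number of columns, which is one way to see why the attribute count grows so quickly.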

    Customer Model Record

    The Customer Model Record is the set of all attributes used for modeling (could be database views).

    Components: ID fields, behavioral attributes, static attributes, model scores, contact history

    [Diagram labels: Customer Model Record; Customer Data Warehouse. Tables shown in the diagram:]

    Model Description: Model ID, Model Name, Model Date

    Model Scores: Model ID (FK), Individual ID (FK), Score

    Individuals in Household: Individual ID, Household ID (FK), Individual attribute fields, Primary Householder Flag

    Campaign Type Ref: Campaign ID, Description, Start Date

    Contact History Detail: Campaign ID (FK), Individual ID (FK), Promotion Code, Treatment Code, Contact Date

    Contact History Summary: Individual ID (FK), Number of Mail Contacts, Number of eMail Contacts, Number of Telephone Contacts

    Household: Household ID, Street Address, City, State, Zip, Phone

    Demographics / Lifestage: Individual ID (FK), Demographic Data Fields

    Accounts: Account No, Account Information

    Transactions: Transaction ID, Individual ID (FK), Account No (FK), Transaction Details, Transaction Timestamp

    Products: Product ID, Product Details

    Purchases: Transaction ID, Individual ID (FK), Product ID (FK), Purchase Details

    IMPLICIT SOCIAL NETWORKS

    Social Networks

    Social network analysis (related to network theory*) has emerged as a key technique in modern sociology.

    - It has received more attention since the rise in popularity of online social networks

    A social network is a social structure made of individuals (or organizations) called "nodes," which are tied (connected) by one or more specific types of interdependency, such as friendship, kinship, financial exchange, dislike, sexual relationships, or relationships of beliefs, knowledge or prestige.

    *Network theory concerns itself with the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects.

    Source: Wikipedia

    Implicit Social Networks: Telco Call Detail Data

    Main fields:

    - A number (Origination)

    - B number (Destination)

    - Start Datetime

    - Duration

    - Call category: Voice, SMS, MMS, ...

    Much other information:

    - rate information

    - the result of the call (whether it was answered, busy, etc.)

    - Revenue

    - the number charged for the call

    - additional digits on the B number used to correctly route or charge the call

    - the route by which the call entered the exchange

    - the route by which the call left the exchange

    - any fault condition encountered

    - any facilities used during the call, such as call waiting or call diversion, ...

    [Diagram: nodes A and B]

    Implicit Social Network (2)

    Many implicit social networks are hidden in the call data

    They can be extracted by defining the relationship based on:

    - Call categories

    Voice Network (A calls B on voice)

    SMS Network (A sends an SMS to B)

    MMS Network (A sends an MMS to B)

    All services Network (A calls or sends SMS or MMS to B)

    - Intensity

    At least 3 communications

    Duration at least 1 minute

    - Time Period

    Call took place during this month, last 60 days, ...

    - Direction of communication

    Directed or undirected

    - Other categories

    Rate, roaming, revenue, ...

    Example Relationships

    4 relationships defined

    4 graphs (networks) can be extracted

    A. Any voice call between two nodes, with direction

    B. At least 3 voice calls between the nodes, with direction

    C. At least 2 SMS between the nodes (direction does not matter)

    D. Any MMS between the nodes, with direction
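
The four example relationships can be expressed as filters over call detail records. The sketch below is only illustrative: `CallRecord`, `extract_graph`, and the category strings are hypothetical names, and the time period and duration criteria are left out to keep it short.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CallRecord:
    a_number: str   # origination
    b_number: str   # destination
    category: str   # "voice", "sms" or "mms" (hypothetical encoding)


def extract_graph(records, category, min_count=1, directed=True):
    """Return the edges whose communication count reaches min_count."""
    counts = Counter()
    for r in records:
        if r.category != category or r.a_number == r.b_number:
            continue
        if directed:
            edge = (r.a_number, r.b_number)
        else:
            # Undirected: store each pair once, in sorted order.
            edge = tuple(sorted((r.a_number, r.b_number)))
        counts[edge] += 1
    return {edge for edge, n in counts.items() if n >= min_count}


def example_graphs(records):
    """Build the four example relationships A to D from one record set."""
    return {
        "A": extract_graph(records, "voice", min_count=1, directed=True),
        "B": extract_graph(records, "voice", min_count=3, directed=True),
        "C": extract_graph(records, "sms", min_count=2, directed=False),
        "D": extract_graph(records, "mms", min_count=1, directed=True),
    }
```

The same records yield a different graph for each definition, which is why the choice of relationship matters as much as the data itself.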

    Example Extracted Graphs

    [Figure: four extracted graphs, panels A, B, C, and D]

    Historical Uses in Telco & Financial Industries

    Mainly link analysis

    - Security (Telco, Financial)

    - Fraud (Telco, Financial)

    Finding fraud rings

    Establishing evidence

    Point of compromise (counterfeit fraud)

    - Money laundering (Financial)

    Exploration

    Investigation

    New Marketing Applications

    Interactions of customers impact their decision making

    Example

    - An influential customer leaves, purchases a product, ...

    - After some time, some in his/her micro-community may follow

    A customer behavior/decision could exploit the network structure to propagate (Diffusion)

    Understanding the implicit community structure in Telco data allows:

    - To better understand customer behaviors

    - To better target customers in their micro-communities

    - To identify the opportunity/risk of propagation of a node's behavior within its micro-community

    SOCIAL ATTRIBUTES

    Types of Attributes

    - Circle analysis

    - Connection analysis

    - Centrality analysis

    - Community analysis

    - Influence analysis

    Example Social Attributes

    Circle analysis (Level I Analysis)

    - Neighbors (links in or out)

    - Off-net/on-net ratios per node

    - Role (Sink, Source, Repeater)

    Connection analysis (profiling based on neighbors)

    - Profile on continuous extra attributes

    Example: average age of neighbors

    - Profile on discrete extra attributes

    Example: ratio of churners in the first circle nodes

    Example: ratio of those with iPhones in the first circle nodes

    Centrality analysis

    - Number of triangles

    - Betweenness

    - ...
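
A few of these attributes can be computed directly from an edge list. The sketch below is an assumption-laden illustration: the function name is hypothetical, and the role definitions (a sink only receives, a source only emits, a repeater does both) are one plausible reading of the slide rather than a confirmed definition.

```python
from collections import defaultdict


def social_attributes(node, edges, churners):
    """Compute a few first-circle attributes for one node of a directed graph."""
    out_nbrs = defaultdict(set)    # node -> nodes it contacts
    in_nbrs = defaultdict(set)     # node -> nodes that contact it
    undirected = defaultdict(set)  # node -> neighbors, ignoring direction
    for a, b in edges:
        if a == b:
            continue
        out_nbrs[a].add(b)
        in_nbrs[b].add(a)
        undirected[a].add(b)
        undirected[b].add(a)

    outs, ins = out_nbrs[node], in_nbrs[node]
    circle = undirected[node]  # first circle: links in or out

    # Circle analysis: role of the node (assumed definitions).
    if outs and ins:
        role = "repeater"
    elif outs:
        role = "source"
    elif ins:
        role = "sink"
    else:
        role = "isolated"

    # Centrality analysis: triangles are pairs of neighbors that are linked.
    triangles = sum(
        1 for u in circle for v in circle if u < v and v in undirected[u]
    )

    # Connection analysis: profile a discrete attribute over the first circle.
    churner_ratio = len(circle & set(churners)) / len(circle) if circle else 0.0

    return {
        "n_out": len(outs),
        "n_in": len(ins),
        "role": role,
        "triangles": triangles,
        "churner_ratio": churner_ratio,
    }
```

Recomputing the adjacency maps per call is wasteful; a production version would build them once for the whole graph, which matters at the subscriber counts discussed later.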

    Social Attributes (2)

    Community Analysis

    - Identify communities

    - Roles in the community

    Social, Passive, Social, Internal

    Marginal, Outlier, Leader, Follower, Bridge

    Influence Analysis

    - Identify influencers

    - Analyze the propagation of a node's influence

    Example of Community Assignment

    SN Attributes Computation Process

    KSN

    Example List

    BUSINESS USES

    Marketing Use of Implicit Social Networks

    Traditional Targeted Marketing (Customer Behavior)

    Viral Marketing (Diffusion by Influence)

    ?

    Marketing Use of Implicit Social Networks (2)

    Traditional targeted marketing

    + uses customer data: demographic, psychographic, time series transactional data (behavioral)

    - ignores social data

    + empirical

    + adds lift

    Targeted marketing + SNA attributes

    + uses customer data: demographic, psychographic, time series transactional data (behavioral)

    + uses social data

    + empirical

    + adds incremental lift over customer data alone

    Viral marketing

    - mostly ignores customer data

    + uses social network data

    + targets only influencers, ...

    Identifying influencers is challenging & domain specific

    Effectiveness?

    1. Viral Marketing

    The goal is to identify individuals with high Social Networking Potential (SNP) and create viral messages that appeal to this segment of the population and have a high probability of being passed along.

    - To identify the opportunity (or risk) of propagation of a customer's behavior/decision within his or her community.

    In its crude form

    - Not empirical

    - Influencer models?

    Using detailed network data

    - Empirical (churn influence score, churn pressure score, ...)

    - Experiments can be conducted

    - Ignores detailed customer behavior

    2. Targeted Marketing Enhanced by SNA Data

    Use traditional targeted marketing based on predictive modeling

    - Customer behavior data

    - Demographics, ...

    Enhance with implicit social network data

    2. Targeted Marketing Enhanced by SNA Data (2)

    Improved targeting

    - improved lift

    Better understanding of customer

    Better messaging

    [Chart: x-axis 0% to 100%, y-axis 0 to 1; curves: Random, Wizard, Model; labels A, B, C, D]
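
The lift referred to on this slide is commonly read off a cumulative gains curve, where a random ranking follows the diagonal and a perfect ("wizard") ranking captures all positives first. The sketch below uses one conventional definition; the function names are hypothetical and this is not necessarily how the charted curves were produced.

```python
def _top_k(n, fraction):
    """Number of customers targeted when contacting a fraction of n."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return max(1, round(fraction * n))


def cumulative_gain(scores, labels, fraction):
    """Share of all positives found in the top-scored fraction of customers."""
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("labels contain no positives")
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = _top_k(len(ranked), fraction)
    captured = sum(label for _, label in ranked[:k])
    return captured / total_positives


def lift(scores, labels, fraction):
    """Gain relative to random targeting of the same number of customers."""
    k = _top_k(len(scores), fraction)
    return cumulative_gain(scores, labels, fraction) / (k / len(scores))
```

Comparing `lift` for a model trained on behavioral attributes alone against one that also uses social attributes, at the same targeting fraction, gives the kind of incremental lift figure quoted in the case studies.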

    3. Graph (Node) Pairing

    Identify a node based on the characteristics of its connections in the network

    Business use:

    - Rotational churn

    Wrongly inflates the marketing campaign success (Profit)

    Wrongly inflates churn rate (Loss)

    - Identifying fraudsters
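
One simple way to pair nodes is to compare their neighbor sets: a "new" subscriber whose contacts closely match those of a recently churned subscriber may be the same person returning under a new number. The sketch below uses Jaccard similarity as a stand-in; the talk does not say which similarity measure is used, and the function names and the 0.5 threshold are assumptions.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def pair_nodes(new_nodes, churned_nodes, neighbors, threshold=0.5):
    """Match each new node to its most similar churned node, if similar enough.

    neighbors maps a node id to the set of node ids it communicates with.
    """
    pairs = []
    for new in sorted(new_nodes):
        best, best_sim = None, 0.0
        for old in sorted(churned_nodes):
            sim = jaccard(neighbors.get(new, set()), neighbors.get(old, set()))
            if sim > best_sim:
                best, best_sim = old, sim
        if best is not None and best_sim >= threshold:
            pairs.append((new, best, best_sim))
    return pairs
```

Pairs found this way could be excluded from both acquisition and churn counts, which addresses the two inflation effects listed above. The all-pairs loop is quadratic and would need candidate pruning at operator scale.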

    Case Study 1: Predictive Modeling (KXEN)

    COMPANY: Wireless/cable/media service provider.

    SITUATION: Relationship data hidden in call detail records is not leveraged.

    CHALLENGE: With 7.5 million wireless subscribers and tens of millions of call detail records daily, deriving and analyzing social network attributes is a monumental task.

    EXPERIMENT: Augment the hundreds of typical behavioral attributes with hundreds of derived social network attributes and measure the impact on lift for selected test campaigns in acquisition and cross-sell/up-sell.

    RESULTS: Preliminary results positive. For a subset of test campaigns, lift could be improved anywhere from 9% to 13%. Target overlap was anywhere from 65% to 90%. Further experimentation is due.
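
The "target overlap" reported in this case study is not defined on the slide. One plausible reading, sketched below, is the share of customers selected by the baseline model who are also selected by the model augmented with social network attributes. The function names and this interpretation are assumptions.

```python
def top_targets(scores, fraction):
    """Return the ids of the top-scored fraction of customers.

    scores maps a customer id to a model score; ties are broken by id.
    """
    k = max(1, round(fraction * len(scores)))
    ranked = sorted(scores, key=lambda cid: (-scores[cid], cid))
    return set(ranked[:k])


def target_overlap(baseline_scores, augmented_scores, fraction):
    """Share of baseline targets that the augmented model also targets."""
    base = top_targets(baseline_scores, fraction)
    augmented = top_targets(augmented_scores, fraction)
    return len(base & augmented) / len(base)
```

Under this reading, an overlap well below 100% means the social attributes change who gets targeted, not just the scores of customers who would have been targeted anyway.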

    Case Study 2: BI Use (Other)

    COMPANY: Wireless service provider.

    SITUATION: SN data hidden in call detail records is not leveraged.

    CHALLENGE: 4.2 million SIM-card customers

    EXPERIMENT: Extract communities from the implicit social network in the call data and analyze

    RESULTS: 700,000 communities derived. Average of 6 people in a community. They identified key people in ethnic groups and offered a phone to text in that language. Phones were immediately sold.

    Studies in Progress

    1. Wireless operator (16.5 M nodes)

    - Uplift on churn models

    - Finding influencers

    - Detect rotational churn

    2. Fixed Line operator (7 M nodes)

    - Uplift on churn models

    - Detect communities and roles

    - Detect propagation of churn

    3. Wireless operator (90 M nodes)

    - Scalability test (handling 30M subscribers with 90M nodes)

    - Uplift on churn

    CHALLENGES

    Challenges

    Call detail data is massive

    Random sampling on nodes cannot be used

    Scalability is crucial

    - Algorithms must be scalable
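
One likely reason node sampling fails, stated here as a supporting argument rather than something taken from the slides: if each node is kept independently with probability p, an edge survives only when both of its endpoints are kept. Writing S for the sampled node set and E_S for the edges among sampled nodes (notation introduced here):

```latex
\Pr\big[(u,v) \in E_S\big] = p^2,
\qquad
\mathbb{E}\,|E_S| = p^2\,|E|,
\qquad
\mathbb{E}\big[\deg_S(u) \mid u \in S\big] = p\,\deg(u)
```

With a 10% node sample, only about 1% of the edges remain and each sampled subscriber keeps about a tenth of its neighbors, so circles, triangles, and communities computed on the sample no longer resemble those of the full network.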

    Challenges (2)

    What is influence, and how can it be measured?

    How much additional lift can SNA add to traditional churn models?

    How can roles be defined or customized?

    How much additional lift can SNA add to traditional marketing propensity models?

    Community detection in an arbitrary network is difficult

    More studies on business value (ROI) are needed

    GENERALIZATION

    Indirect Relations

    Social networks could still be defined when there is an indirect link between customers

    Example:

    - Retail: purchases: Customer-Item relation

    - Financial: credit/debit card data: Customer-Merchant relation
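
An indirect relation of this kind forms a bipartite graph (customers on one side, items or merchants on the other), and a customer-to-customer network can be obtained by projecting it: two customers are linked when they share at least some number of items. The sketch below is a generic bipartite projection with hypothetical names, not a description of any specific product feature.

```python
from collections import defaultdict
from itertools import combinations


def project_customers(purchases, min_shared=1):
    """Link customers who bought at least min_shared items in common.

    purchases is an iterable of (customer, item) pairs.
    Returns a dict mapping a sorted (customer, customer) pair to the
    number of distinct items the two customers share.
    """
    customers_by_item = defaultdict(set)
    for customer, item in purchases:
        customers_by_item[item].add(customer)

    shared = defaultdict(int)
    for customers in customers_by_item.values():
        # Every pair of customers of the same item gains one shared item.
        for a, b in combinations(sorted(customers), 2):
            shared[(a, b)] += 1

    return {pair: n for pair, n in shared.items() if n >= min_shared}
```

Very popular items link almost everyone to everyone, so in practice a `min_shared` threshold or a cap on item popularity is usually needed before the projected graph becomes informative.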

    Other Applications

    Financial

    - Credit/debit card data: Customer-Merchant buying patterns

    - Small/Medium business payment data

    - Online payment data such as PayPal

    - Joint accounts

    - Anti-money laundering, investigating money transfers

    - Anti-credit card fraud, using Merchant-Buyer patterns

    Retail

    - Improved recommendations, using Consumer-Product buying patterns

    - Online Buyer-Seller communities

    Social network sites

    - Improved content, ads, using Friend-Friend interactions

    Emails, forums, blogs

    - Improving customer understanding using interactions

    Q/A

    One Prediction

    Data providers and bureaus currently provide much information on US households:

    - Demographic

    - Psychographic

    - Interest

    - Hobby

    - Financial

    - Risk (creditworthiness)

    - ...

    In the future, they could also provide

    - Social (relationship) metrics?