Page 1: Social Networks in Data Mining, SAS Talks

Copyright © 2012, SAS Institute Inc. All rights reserved.

Social Networks in Data Mining: Challenges and Applications SAS Talks May 10, 2012

Page 3: Social Networks in Data Mining, SAS Talks

Speakers

Stacy Hobson

Director, Customer Loyalty and Retention, SAS Institute

Bart Baesens

Associate Professor, K.U. Leuven (Belgium)

Lecturer, University of Southampton (United Kingdom)

Page 4: Social Networks in Data Mining, SAS Talks

Social Networks in Data Mining: Challenges and Applications

Prof. dr. Bart Baesens (1)

Dr. Wouter Verbeke (2)

(1, 2) Department of Decision Sciences and Information Management, K.U.Leuven (Belgium)

(1) Vlerick Leuven Ghent Management School (Belgium)

(1) School of Management, University of Southampton (United Kingdom)

{Bart.Baesens;Wouter.Verbeke}@econ.kuleuven.be

Twitter: DataMiningApps

Facebook: Data Mining with Bart

Page 5: Social Networks in Data Mining, SAS Talks

My Research Team

• process mining

• business process management

• data mining

• (social) network analysis

• incorporating domain knowledge in classification models

• customer churn prediction

• data quality in a credit risk management context

• data quality and decision making

• data quality metrics

• customer churn prediction

• social network analysis

• profit based data mining

• credit risk modeling and scoring

• rating transitions

• microfinance

• survival analysis

• machine learning in software engineering: software fault & effort prediction

• comprehens. decision supportive data modeling systems

Page 6: Social Networks in Data Mining, SAS Talks

Overview

• Revisiting Traditional analytics

• Improving Traditional analytics

• Social networks and applications

• A three-layered social network learner

• Case study: social networks in Telco

– Markov assumption

– Local versus Network variables

– Featurization

– Empirical Findings

• Conclusions

Page 7: Social Networks in Data Mining, SAS Talks

Revisiting Traditional Analytics

Page 8: Social Networks in Data Mining, SAS Talks

Traditional Analytics: Performance benchmarks

Page 9: Social Networks in Data Mining, SAS Talks

Improving Traditional Analytics: 2 strategies

• Strategy 1: Use complex modeling techniques

– E.g. neural networks, support vector machines, random forests, …

– Pro: powerful models (e.g. universal approximation)

– Con: loss of interpretability, marginal performance gains

• Strategy 2: Enrich your data

– External data (FICO score, bureau data, …)

– Social Network data!

– Pro: model still interpretable

– Con: additional resources needed (economic, computational)

Page 10: Social Networks in Data Mining, SAS Talks

Traditional Approach to Analytics

Page 11: Social Networks in Data Mining, SAS Talks

Social Networks: Nodes versus Edges

• Nodes

– Customer (private/professional), household/family, patient, doctor, paper, author, terrorist, Web page, …

• Edges

– Different kinds of relationships, e.g., colleagues, friends, patients, disease, contact, reference, …

– Weighted based on, e.g., interaction frequency, importance of information exchange, intimacy, emotional intensity, …

Page 12: Social Networks in Data Mining, SAS Talks

Example Social Network Applications

• Churn detection in a Telco setting

– Nodes are customers

– Edges are calling patterns between customers (based on CDR data)

• System risk in a Credit Risk setting

– Nodes are banks

– Edges are liquidity dependencies

• Anti-Money Laundering

– Nodes are bank accounts

– Edges are money transfers

• Viral marketing

– Nodes are customers

– Edges are messages

Page 13: Social Networks in Data Mining, SAS Talks

Social Network Analytics: Challenges

• Finding the right balance between local (customer-specific) and network information

– It’s not all in the network!

• Need procedures to infer the behavior of all nodes simultaneously

– Collective inference procedures (e.g. Gibbs sampling)

• No easy separation in training and test set

– Cannot just cut the network in two!

– Out-of-time validation needed
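
The out-of-time idea can be sketched in Python as follows. This is a minimal illustration, not the speakers' procedure: the record layout `(day, caller, callee, seconds)`, the `churn_day` log, and the `cutoff` and `horizon` parameters are all assumptions made for the example. The network is built only from calls before the cutoff, and the labels to predict come from the period after it.

```python
from collections import defaultdict

# Hypothetical call records: (day, caller, callee, seconds).
calls = [
    (1, "A", "B", 120),
    (2, "B", "C", 30),
    (5, "A", "C", 60),
    (9, "C", "D", 45),  # after the cutoff, so not used to build the network
]
# Hypothetical churn log: customer -> day on which the customer churned.
churn_day = {"B": 4, "C": 10, "D": 12}


def out_of_time_split(calls, churn_day, cutoff, horizon):
    """Build the network from calls before `cutoff`; label churn in [cutoff, cutoff + horizon)."""
    network = defaultdict(float)
    customers = set()
    for day, caller, callee, seconds in calls:
        if day < cutoff:
            # Undirected edge keyed by the sorted pair, weighted by total seconds.
            network[tuple(sorted((caller, callee)))] += seconds
            customers.update((caller, callee))
    # Customers who churned before the cutoff are known churners (network
    # information), not prediction targets.
    known_churners = {c for c in customers if churn_day.get(c, cutoff) < cutoff}
    targets = {
        c: int(cutoff <= churn_day.get(c, float("inf")) < cutoff + horizon)
        for c in customers - known_churners
    }
    return dict(network), known_churners, targets


network, known_churners, targets = out_of_time_split(calls, churn_day, cutoff=8, horizon=7)
```

The same network is never cut in two: every edge observed before the cutoff is kept, and only the time window decides what counts as input and what counts as outcome.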

Page 14: Social Networks in Data Mining, SAS Talks

Out-of-Sample versus Out-of-Time Validation

[Figure: networks with unlabeled nodes (?) shown along a Time axis]

Page 15: Social Networks in Data Mining, SAS Talks

A Three-Layered Social Network Learner

• Local model

– Only uses local (e.g., customer specific) information

– E.g. socio-demographic, RFM, customer interaction, …

– Can be estimated using e.g. logistic regression, decision trees, …

• Network model

– Takes into account the network information

• Collective inference

– Determines how the nodes mutually influence each other
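
A minimal Python sketch of how the three layers could fit together. Everything here is an assumption made for illustration: the toy graph, the local scores, the mixing weight `alpha`, and the choice of a simple relaxation-style iteration (used here for brevity instead of a sampling procedure such as Gibbs sampling).

```python
# Hypothetical weighted, undirected network: node -> {neighbor: weight}.
graph = {
    "A": {"B": 3.0, "C": 1.0},
    "B": {"A": 3.0, "C": 2.0},
    "C": {"A": 1.0, "B": 2.0, "D": 4.0},
    "D": {"C": 4.0},
}
# Layer 1: churn probabilities from a local model (hypothetical values,
# e.g. the output of a logistic regression on customer attributes).
local_score = {"A": 0.10, "B": 0.20, "C": 0.15, "D": 0.05}
# Nodes whose class is already known (1.0 = churned).
known = {"D": 1.0}


def network_model(node, scores):
    """Layer 2: weighted average of the neighbors' current scores."""
    nbrs = graph[node]
    total = sum(nbrs.values())
    return sum(w * scores[n] for n, w in nbrs.items()) / total


def collective_inference(alpha=0.5, iterations=20):
    """Layer 3: update all unknown nodes simultaneously for a fixed number of rounds."""
    scores = {**local_score, **known}
    for _ in range(iterations):
        updated = {}
        for node in graph:
            if node in known:
                updated[node] = known[node]
            else:
                # Blend local evidence with network evidence
                # (alpha is an assumed mixing weight).
                updated[node] = (alpha * local_score[node]
                                 + (1 - alpha) * network_model(node, scores))
        scores = updated
    return scores


scores = collective_inference()
```

In this toy network, node C ends up with a higher score than its local score alone would give, because its strongest tie is to the known churner D.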

Page 16: Social Networks in Data Mining, SAS Talks

[Figure: network with unlabeled nodes (?)]

Page 17: Social Networks in Data Mining, SAS Talks

Case Study: Social Networks in Telco

• Traditional customer churn prediction models treat customers as isolated entities

• Customers are, however, believed to be strongly influenced by their social environment

– Recommendations from peers, word-of-mouth publicity

– Social leader influence

– Promotions to acquire groups of friends

– Reduced tariffs for intra-operator traffic

Page 18: Social Networks in Data Mining, SAS Talks

Local Models for Churn Prediction

Page 19: Social Networks in Data Mining, SAS Talks

Constructing a social network using CDR Data

• Call Detail Records (CDR) data

– Detailed logs about each interaction involving a customer

– Gigabytes to terabytes of data each day

– Extract the call graph using computationally efficient algorithms

– Represent call graph as sparse matrix

– Edge definition (SMS/Voice/MMS/Email/…)

Sample CDR records:

181806208300809 32462208699 206105300897975 357014032645640 I 32461002530 9 MOBISTAR MOBILE 99 21JAN2010:23:45:44 0 0 0 0 2 1 1 …

195455641 32475611232 206102200262341 351913035725230 I 32476000005 10 Base SMSC Platform 99 21JAN2010:23:46:02 0 0 0 0 2 1 1 …

187097451101277 32465245451 206101100499483 356712034636630 I 32473161616 8 Proximus SMSC Platform 99 21JAN2010:23:45:44 0 0 0 0 2 1 1 …

…

Page 20: Social Networks in Data Mining, SAS Talks

From CDR data to Sparse Matrix

• Need facilities for sparse matrix handling and parallel computing

[Figure: raw CDRs (the three sample records from the previous slide) are converted into a weighted network with nodes A to J; the edge weights shown range from 2 to 9]
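
The step from call records to a sparse weighted network can be sketched in Python as below. The simplified record layout `(caller, callee, seconds)` is an assumption made for the example; real CDRs carry many more fields, and their layout is not specified in the slides. A dict of dicts stands in for a sparse matrix library.

```python
from collections import defaultdict

# Hypothetical, simplified call records: (caller, callee, seconds).
records = [
    ("A", "B", 120),
    ("B", "A", 60),
    ("A", "C", 30),
    ("C", "D", 45),
    ("A", "B", 20),
]


def build_sparse_call_graph(records):
    """Aggregate records into a dict-of-dicts sparse matrix (only nonzero entries are stored)."""
    matrix = defaultdict(lambda: defaultdict(float))
    for caller, callee, seconds in records:
        # Treat the relationship as undirected: calls in either direction
        # add to the same edge weight.
        matrix[caller][callee] += seconds
        matrix[callee][caller] += seconds
    return {node: dict(nbrs) for node, nbrs in matrix.items()}


call_graph = build_sparse_call_graph(records)
stored = sum(len(nbrs) for nbrs in call_graph.values())  # nonzero entries kept
dense = len(call_graph) ** 2                             # cells in a dense matrix
```

Even in this toy case the sparse form stores 6 entries where a dense matrix would hold 16 cells. The gap grows quickly with network size, which is why sparse handling matters for call graphs with millions of customers.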

Page 21: Social Networks in Data Mining, SAS Talks

Case Study: European Telco operator

• Prepaid segment; about 2,000,000 customers

• 5 months of call detail records + local attributes

• Churn rate 0.5% per month (skewed class distribution!)

• Weighted edges: number of seconds called during 3 months

• About 8,000,000 edges

• Total data set about 300 gigabytes in size

Page 22: Social Networks in Data Mining, SAS Talks

The Markov assumption

• The class/behavior of a node in the network only depends upon the class/behavior of its direct neighbors

• Also known as homophily or guilt by association

– "Birds of a feather flock together" (attributed to Robert Burton, 1577-1640)

– "(People) love those who are like themselves" (Aristotle, Rhetoric and Nicomachean Ethics)

• Needed to facilitate computations (cf. Markov chains)
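
Written as a formula, the assumption could look as follows. The notation is not from the slides and is introduced here only for illustration: $c_i$ is the class of node $i$, $G$ is the whole network, and $\mathcal{N}_i$ is the set of direct neighbors of node $i$.

```latex
P(c_i \mid G) = P\bigl(c_i \mid \{\, c_j : j \in \mathcal{N}_i \,\}\bigr)
```

Everything outside the first-order neighborhood is treated as irrelevant once the neighbors' classes are given, which keeps the computation local.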

Page 23: Social Networks in Data Mining, SAS Talks

Local versus Network Variables

• A network variable aggregates information that is contained within a network structure and makes a differentiation in the destination of outgoing links or the origin of incoming links

• Examples:

– the number of contacts (local variable)

– the number of contacts with churners (network variable)

– the number of international calls (network variable)
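
The difference between the first two examples can be made concrete with a short Python sketch. The contact lists and the set of churners are hypothetical data invented for the illustration.

```python
# Hypothetical contact lists and churn labels.
contacts = {
    "A": ["B", "C", "D"],
    "B": ["A"],
    "C": ["A", "D"],
    "D": ["A", "C"],
}
churners = {"C", "D"}


def number_of_contacts(customer):
    """Local variable: counts links without looking at who is on the other end."""
    return len(contacts[customer])


def number_of_churner_contacts(customer):
    """Network variable: differentiates links by a property of the destination node."""
    return sum(1 for other in contacts[customer] if other in churners)
```

For customer A, `number_of_contacts("A")` counts all three links, while `number_of_churner_contacts("A")` counts only the two that lead to churners.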

Page 24: Social Networks in Data Mining, SAS Talks

Local versus Network variables

Page 25: Social Networks in Data Mining, SAS Talks

A Basic Network Model: Featurization

• Featurization or propositionalization: translate network into traditional attributes

• Network attributes can be included in traditional model (e.g. logistic regression)

• Create as many network attributes as possible, then select among them with stepwise regression

• A simple, interpretable social network classifier!
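
A minimal Python sketch of featurization, assuming a hypothetical weighted network (seconds called) and a known set of churners. The four attribute names are invented for the example; the resulting rows could then be fed to any traditional model, such as a logistic regression.

```python
# Hypothetical weighted network: node -> {neighbor: seconds called}.
graph = {
    "A": {"B": 300.0, "C": 100.0},
    "B": {"A": 300.0},
    "C": {"A": 100.0, "D": 400.0},
    "D": {"C": 400.0},
}
churners = {"D"}


def featurize(graph, churners):
    """Translate the network into one row of traditional attributes per node."""
    rows = {}
    for node, nbrs in graph.items():
        total = sum(nbrs.values())
        churn_weight = sum(w for n, w in nbrs.items() if n in churners)
        rows[node] = {
            "degree": len(nbrs),                       # local variable
            "total_seconds": total,                    # local variable
            "churner_contacts": sum(1 for n in nbrs if n in churners),  # network variable
            "churner_seconds_share": churn_weight / total if total else 0.0,  # network variable
        }
    return rows


features = featurize(graph, churners)
```

Because each attribute is an ordinary column, the coefficients of the downstream model stay interpretable.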

Page 26: Social Networks in Data Mining, SAS Talks

Example Network Model: Featurization

Page 27: Social Networks in Data Mining, SAS Talks

Example Network Model: WVRN
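
The slide content itself did not survive extraction. Assuming WVRN refers to the weighted-vote relational neighbor classifier, its usual textbook form estimates the class probability of a node as the weighted average of its neighbors' class probabilities. The symbols below ($v_i$ for a node, $\mathcal{N}_i$ for its neighbors, $w_{ij}$ for the edge weight) are notation chosen for this illustration.

```latex
P(c \mid v_i) = \frac{1}{Z_i} \sum_{v_j \in \mathcal{N}_i} w_{ij} \, P(c \mid v_j),
\qquad
Z_i = \sum_{v_j \in \mathcal{N}_i} w_{ij}
```

With edge weights defined as seconds called, a customer's estimated churn probability is pulled most strongly toward the contacts they talk to the longest.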

Page 28: Social Networks in Data Mining, SAS Talks

Results: Finding 1

• Network models boost performance and profit compared to a local model

[Figure: incremental profit increase compared to no network effects]

Page 29: Social Networks in Data Mining, SAS Talks

Results: Finding 2

• Non-Markovian network effects: incorporating the impact of higher-order neighbors leads to improved predictive power and profit!

[Figure: incremental profit increase compared to first-order network effects]

Note: higher-order effects were previously discovered in the spreading of happiness and obesity (N. Christakis, ‘Social networks and happiness’)
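
One simple way to bring higher-order neighbors into a featurized model is to add attributes computed over nodes two hops away. The Python sketch below is an illustration under assumed data (a hypothetical chain-shaped contact network and churner set), not the method used in the case study.

```python
# Hypothetical undirected contact network and churn labels.
contacts = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B", "D"},
    "D": {"C"},
}
churners = {"C", "D"}


def second_order_neighbors(node):
    """Nodes exactly two hops away: neighbors of neighbors, minus direct neighbors and the node itself."""
    first = contacts[node]
    second = set()
    for nbr in first:
        second |= contacts[nbr]
    return second - first - {node}


def churners_at_order_two(node):
    """Network variable that looks beyond the first-order neighborhood."""
    return len(second_order_neighbors(node) & churners)
```

Customer A has no churners among its direct contacts, yet `churners_at_order_two("A")` picks up churner C one step further out, which a strictly first-order model would miss.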

Page 30: Social Networks in Data Mining, SAS Talks

Results: Finding 3

• Network models detect other types of churners than traditional models do. Synergy opportunities!

[Figure: fraction of the churners detected by the network models that are NOT detected by the local model, as a function of the selected fraction of customers (ranked according to their predicted probability to churn). Different curves represent different network models, induced by different techniques.]

Page 31: Social Networks in Data Mining, SAS Talks

Ensemble approach: Combining Local and Network models

• Use two models in parallel by selecting customers indicated by the local model and the network model

• Decide upon optimal fraction (current research)

[Figure: example scores from the two models feeding the ensemble model output. Network model: 0.24, 0.68, 0.18, 0.92, 0.22. Local model: 0.13, 0.54, 0.34, 0.84, 0.29.]
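
Running the two models in parallel can be sketched in Python as below. The score values are the ones shown on the slide, but pairing them row by row with the hypothetical customer ids `c1` to `c5` is an assumption, and so are the selection fractions, since the optimal fraction is described as ongoing research.

```python
# Scores as listed on the slide; the row-by-row pairing with the
# hypothetical customer ids c1..c5 is an assumption.
network_score = {"c1": 0.24, "c2": 0.68, "c3": 0.18, "c4": 0.92, "c5": 0.22}
local_score = {"c1": 0.13, "c2": 0.54, "c3": 0.34, "c4": 0.84, "c5": 0.29}


def top_fraction(scores, fraction):
    """Return the customers in the top `fraction` when ranked by score."""
    k = round(fraction * len(scores))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def ensemble_selection(local_fraction, network_fraction):
    """Select every customer indicated by either the local or the network model."""
    return (top_fraction(local_score, local_fraction)
            | top_fraction(network_score, network_fraction))


selected = ensemble_selection(local_fraction=0.6, network_fraction=0.6)
```

Both models agree on `c4` and `c2`, but the local model adds `c3` and the network model adds `c1`, so the union reaches customers that either model alone would skip.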

Page 32: Social Networks in Data Mining, SAS Talks

Ensemble approach: 2D Lift Curve

Page 33: Social Networks in Data Mining, SAS Talks

Current Research Topics

• Extensions towards regression context (e.g. CLV)

• Applications in other contexts (e.g. credit risk, anti-money laundering, customer acquisition, …)

• Integrating local information in a network learner

• Quasi-Social Networks

• Community mining

• Backtesting

Page 34: Social Networks in Data Mining, SAS Talks

Key lessons learnt

• Introduced a three-layer social network learning environment (local information, network information, collective inferencing)

• Defined local versus network variables

• Introduced featurization as a basic social network learner

• Discussed how non-Markovian behavior can be modelled in a straightforward way

• Illustrated the theoretical concepts using a real-life case study about churn prediction in the Telco sector

Page 36: Social Networks in Data Mining, SAS Talks

FYI

• Advanced Analytics for Customer Intelligence Using SAS

• Lecturer: prof. dr. Bart Baesens

• 3-day course offered

• Many companies have gathered huge amounts of customer data about marketing success, use of financial services, online usage, and even fraud behavior. Given recent trends and needs such as mass customization, personalization, Web 2.0, one-to-one marketing, risk management, and fraud detection, it becomes increasingly important to extract, understand, and exploit analytical patterns of customer behavior and strategic intelligence. This course helps clarify how to successfully adopt recently proposed state-of-the-art analytical and data-mining techniques for advanced customer intelligence applications. This highly interactive course provides a sound mix of theoretical and technical insights as well as practical implementation details and is illustrated by several real-life cases. Background material such as selected papers, tutorials, and guidelines is provided.

Page 37: Social Networks in Data Mining, SAS Talks

Acknowledgments

• Jerry Oglesby, Director Global Academic Program & Global Certification Education Division

• Larry Stewart, SAS Education Vice President

• Sean O’Brien, Director, Business and Curriculum Development

• Bob Lucas, Statistical Training and Technical Services Director

• Karen Washburn, Business Knowledge Series Manager

• Patsy Poole, Project Manager

• Hillary Kokes, former Business Knowledge Series Manager

• Lieve Goedhuys, former Academic Program Manager, SAS Institute Belgium-Luxembourg

• All the other great SAS folks for the excellent collaboration during the past years!

Page 38: Social Networks in Data Mining, SAS Talks

Q & A

Page 39: Social Networks in Data Mining, SAS Talks

Additional Resources

Live Classes

Advanced Analytics for Customer Intelligence Using SAS

Analytics: Putting It All to Work

Upcoming Live Webinars

May 18: Getting Started with SAS® Enterprise Miner™

June 14: SAS® Information Management: Leverage and Extend Hadoop

SAS Talks on support.sas.com

Upcoming Live Events

Analytics 2012

Follow along on Twitter using #sastalks

Page 40: Social Networks in Data Mining, SAS Talks

support.sas.com