Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem (Knowledge discovery in big data and cloud computing environments) - Carlos Andre

Network Analysis For Business Applications

Upload: rio-info

Post on 13-Dec-2014


DESCRIPTION

Talk on Big data: Knowledge discovery in big data and cloud computing environments, presented by Carlos André at Rio Info 2014

TRANSCRIPT

Page 1: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Network Analysis for Business Applications

Page 2: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Method to analyze the influence over time

Social network building and metrics computing
Present time: customers who triggered some business event
Onward analysis of particular business events: what happened with the correlated customers?

Page 3: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Social metrics to describe the influence of a node n

Influence 1: Considers the link and node weights of the nodes adjacent to n.
Influence 2: Considers the link and node weights of the nodes adjacent to n, in addition to the link weights of the nodes adjacent to those adjacent nodes.
Closeness: The average shortest path to all nodes connected to node n.
Degree: The number of connections incident (in and out) to node n.
Betweenness: The number of shortest paths in which node n partakes.
Hub: How many important nodes n points to.
Authority: How many important nodes point to n.
Page Rank: The percentage of possible time spent by other nodes on node n.
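As a rough illustration of three of these metrics (Degree, Closeness read as average shortest path, and Page Rank), the sketch below uses plain Python on a made-up directed graph. The graph, the function names, the damping factor of 0.85, and the iteration count are assumptions for illustration only; the talk does not say how the metrics were computed.

```python
from collections import deque

# Hypothetical directed call graph: caller -> set of callees (illustrative data only).
GRAPH = {
    "a": {"b", "c"},
    "b": {"c"},
    "c": {"a"},
    "d": {"c"},
}


def degree(graph, n):
    """Number of connections incident (in and out) to node n."""
    out_degree = len(graph.get(n, ()))
    in_degree = sum(1 for targets in graph.values() if n in targets)
    return in_degree + out_degree


def average_shortest_path(graph, n):
    """Average shortest-path length from n to every node reachable from n (BFS)."""
    dist = {n: 0}
    queue = deque([n])
    while queue:
        u = queue.popleft()
        for v in graph.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    others = [d for node, d in dist.items() if node != n]
    return sum(others) / len(others) if others else 0.0


def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank; nodes without out-links spread their rank evenly."""
    nodes = set(graph) | {v for targets in graph.values() for v in targets}
    rank = {u: 1 / len(nodes) for u in nodes}
    for _ in range(iterations):
        new_rank = {u: (1 - damping) / len(nodes) for u in nodes}
        for u in nodes:
            targets = graph.get(u, ())
            if targets:
                for v in targets:
                    new_rank[v] += damping * rank[u] / len(targets)
            else:
                for v in nodes:
                    new_rank[v] += damping * rank[u] / len(nodes)
        rank = new_rank
    return rank


ranks = pagerank(GRAPH)
print(degree(GRAPH, "c"), average_shortest_path(GRAPH, "d"), max(ranks, key=ranks.get))
```

Betweenness, Hub, and Authority follow the same pattern but need all-pairs shortest paths or an iterative hub/authority update, which a graph library would normally provide.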

Page 4: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Social Network Analysis Applications in Telecommunications

Product Adoption Diffusion

Page 5: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Diffusion process for 3G bundle acquisition

3G bundle acquisition event among active residential mobile customers: 2,724
136 random customers
1,420 related customers
After 4 months: 95 related customers who bought 3G afterwards

Page 6: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Diffusion process for 3G bundle acquisition

3G bundle acquisition event among active residential mobile customers: 2,724
136 influential customers
3,585 related customers
After 4 months: 550 related customers who bought 3G afterwards

Page 7: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Diffusion process for 3G bundle acquisition

Random customers: 1 customer, 10 related customers, 0.7 influenced customers (6.6%)
Influential customers: 1 customer, 26 related customers, 4 influenced customers (15.3%)

132% more effective
479% wider
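The percentages in this comparison can be retraced from the counts given for the 136 random and the 136 influential customers. The sketch below is only an arithmetic check. Reading "more effective" as the ratio of the printed conversion rates and "wider" as the growth in the number of influenced customers is an assumption that happens to reproduce the printed figures, not something the slides state.

```python
# Counts quoted on the slides for the four-month window.
SEEDS = 136
RANDOM = {"related": 1420, "bought": 95}
INFLUENTIAL = {"related": 3585, "bought": 550}


def conversion(group):
    """Share of related customers who bought 3G afterwards."""
    return group["bought"] / group["related"]


def per_seed(group, key):
    """Average count per seed customer (136 seeds in both groups)."""
    return group[key] / SEEDS


random_rate = conversion(RANDOM)
influential_rate = conversion(INFLUENTIAL)

# Assumption: "more effective" compares the rates as printed (6.6% and 15.3%).
more_effective = 15.3 / 6.6 - 1
# Assumption: "wider" compares the number of influenced customers (550 vs 95).
wider = INFLUENTIAL["bought"] / RANDOM["bought"] - 1

print(f"random: {random_rate:.2%}, influential: {influential_rate:.2%}")
print(f"more effective: {more_effective:.0%}, wider: {wider:.0%}")
print(f"related per seed: {per_seed(RANDOM, 'related'):.1f} vs {per_seed(INFLUENTIAL, 'related'):.1f}")
```

The random-group rate computes to slightly under 6.7%, which the slide shows as 6.6%.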

Page 8: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

3G Bundle diffusion comparison

Comparison over six months

From 136 random customers, how do their related connections behave when purchasing?

From the top 136 influential customers, how do their related connections behave when purchasing?

Page 9: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

3G Bundle diffusion for 136 random influencers

Page 10: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

3G Bundle diffusion for the top 136 influencers

Page 11: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Adjusting the customer influence factor

Past behaviour of purchasing -> Future events

Influence Factor = ƒ(network metrics)

ƒ is a function to adjust the network metrics in relation to the past events.

Page 12: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Computing the customer influence factor

Terms shown for the influence factor:

network metrics
canonical correlation between network metrics and the event of influence
relation between network metrics for influencers and buyers
coefficient of variation
standard error of the mean
range

On the slide, the influence factor is written as a product (x) of these terms.

This formula predicts the influential customers for 3G bundle diffusion in 74% of the cases.

Page 13: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Prediction model for 3G bundle acquisition

Customer base, split into customers who did not buy 3G bundles and customers who did buy 3G bundles
Each group is split into Predicted to BUY and Predicted to NOT BUY
Percentages shown on the chart: 83%, 17%, 76%, 24%, 8%, 92%, 84%

Page 14: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Possible approach for marketing and sales campaigns

Cross analysis by using the likelihood of purchasing and the probability of customers' influence in diffusing 3G bundles

Customer base
PREDICTIVE MODEL FOR 3G ADOPTION: customers most likely to purchase 3G
PREDICTIVE MODEL FOR 3G DIFFUSION: most influential customers for 3G diffusion
Targeted customers for 3G campaigns: CUSTOMERS MOST LIKELY TO PURCHASE AND TO DIFFUSE 3G BUNDLES WITHIN THEIR SOCIAL NETWORKS
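The cross analysis can be pictured as intersecting the top of two score lists. The sketch below is a minimal illustration. The customer ids, the scores, the 50% cut-off, and the helper name `top_fraction` are all hypothetical, and the talk does not say how the two model outputs were actually combined.

```python
def top_fraction(scores, fraction):
    """Return the ids holding the highest scores, keeping the given fraction of the base."""
    keep = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:keep])


# Hypothetical model outputs per customer id (illustrative values only).
purchase_likelihood = {"c1": 0.9, "c2": 0.2, "c3": 0.8, "c4": 0.4}
influence_probability = {"c1": 0.7, "c2": 0.9, "c3": 0.1, "c4": 0.3}

likely_buyers = top_fraction(purchase_likelihood, 0.5)
likely_influencers = top_fraction(influence_probability, 0.5)

# Campaign targets: customers who appear in both groups.
targets = likely_buyers & likely_influencers
print(sorted(targets))
```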

Page 15: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Social Network Analysis Applications in Telecommunications

Viral Effect in Portability

Page 16: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

The process to analyze the viral effect in portability

Timeline: M, M+1 and M+2, M+3
LEADERS: customers who have influenced others to port out.
The leaders' behavior is described by the social network metrics.

Page 17: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Viral effect in portability

6,101 customers ported out in October
3,478 Pre (57%) | 2,623 Post (43%)
2,848 Op1 (47%) | 1,375 Op2 (22%) | 1,878 Op3 (31%)
677 Leaders (11%)

101,389 correlated customers (1-17)
47,476 Offnet (47%) | 6,953 Post (7%) | 30,575 Pre (30%) | 16,385 Fix (16%)

Ported out in the following months:
November: 419 ported out, 64% Mobile | 36% Op1 | 22% Op2 | 42% Op3
December: 228 ported out, 75% Mobile | 29% Op1 | 16% Op2 | 55% Op3
January: 212 ported out, 77% Mobile | 17% Op1 | 29% Op2 | 55% Op3
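The percentages on this slide can be checked against the counts, reading the dotted figures as thousands separators (6.101 as 6,101, 101.389 as 101,389, and so on). That reading, and the grouping of each count under the October port outs or the correlated customers, are assumptions; they are supported by the fact that each group sums exactly to its total.

```python
PORTED_OUT = 6101      # customers who ported out in October
CORRELATED = 101389    # customers correlated with them


def share(part, whole):
    """Percentage of whole, rounded to the nearest integer."""
    return round(100 * part / whole)


october_plan = {"Pre": 3478, "Post": 2623}
october_destination = {"Op1": 2848, "Op2": 1375, "Op3": 1878}
correlated_type = {"Offnet": 47476, "Post": 6953, "Pre": 30575, "Fix": 16385}
leaders = 677

for title, groups, whole in (
    ("October port outs by plan", october_plan, PORTED_OUT),
    ("October port outs by destination", october_destination, PORTED_OUT),
    ("Correlated customers by type", correlated_type, CORRELATED),
):
    print(title, {name: share(count, whole) for name, count in groups.items()})

print("Leaders among port outs:", share(leaders, PORTED_OUT))
```

Op2 computes to about 22.5%, which the slide shows as 22%; the other shares round to the printed values.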

Page 18: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

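Viral effect in portability

Leader customer: 1 leader, 33 correlated customers, 16 on the carrier's mobile (48%), 1.28 customers influenced (8%)
Regular customer: 1 customer, 17 correlated customers, 9 on the carrier's mobile (53%), 0.14 customers influenced (1.6%)

The leader might be up to 5 times more effective.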

Page 19: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Predictive score for leaders in portability

Predictive model based on decision tree
Hit rate: 76%
Misclassification
Score distribution quantiles: 100% Max | 99% | 95% | 90% | 75% Q3 | 50% Median | 25% Q1 | 10% | 5% | 1% | 0% Min

Page 20: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Viral effect in portability

Customers who had ported out in the past have influenced their peers to port out afterwards.
Chart values: 154%, 92%, 43% (axis ticks: 1, 2, 3)

Page 21: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Viral effect of the portability within communities

The propensity to port out increases as more port outs take place within the communities over time.
Chart values: 3%, 10%, 24%, 49% (axis ticks: 0, 1, 2, 3)

Page 22: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Predictive model for portability events

Timeframe: October, November, December, January (historical behavior, then target)
Churn Predictive Model
Hit rate: 86%
Performance of the artificial neural networks model

Page 23: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Predictive model for portability events

Performance of the model during the training process
Chart values: 96%, 59%, 91%, 74%, 66%, 63%
Axis ticks: 5, 15, 20, 25, 30, 35, 40, 60, 70, 80, 90, 100
The higher the score, the better the prediction.

Page 24: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Outcomes from Social Network Analysis in portability

11% of the customers behave as leaders in subsequent portability, affecting up to 8% of their peers. Up to 21% of the customers who port out are influenced by previous portability.

47% of all port outs go to Op1. 51% of influenced port outs go to Op3.

The viral effect of the leaders might be up to 5 times more effective than that of the regular customers. They talk up to 2 times more in terms of frequency and up to 3 times more in terms of volume.

The use of social network metrics, including community topology, has increased the performance of the predictive model by up to 30%.

Previous portability within communities increases the propensity for subsequent portability by up to 4 times.

Page 25: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Social Network Analysis Applications in Telecommunications

Fraud Detection in Communities

Page 26: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Viral Effect in Social Networks for Churn and Purchase

Churners: 11% leaders, 8% influence
Communities with previous churn: 3x more churn

Buyers: 14% leaders, 17% influence
Communities with previous purchase: 4x more purchase

Page 27: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

The Viral Effect in Social Networks for Fraud

Fraudsters: 0.05% leaders, 0.1% influence
Communities with previous fraud: 0x more frauds

Page 28: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Outlier Analysis upon Communities

Page 29: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Outlier Analysis upon Communities

Calls
Nodes & Links
Communities Detection
Network Metrics Computing
Difference between each node and its community
Outlier Analysis upon the Differences

If a particular node X, within a community with Y members, has a Degree-In greater than Z, then trigger an alert...

Page 30: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Outlier Analysis upon Communities

Node metric | Community mean | Difference
Degree-In | Mean of Degree-In | DifDegree-In
Degree-Out | Mean of Degree-Out | DifDegree-Out
Hub | Mean of Hub | DifHub
Authority | Mean of Authority | DifAuthority
Links-In | Mean of Links-In | DifLinks-In
Links-Out | Mean of Links-Out | DifLinks-Out

Page 31: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Outlier Analysis upon Communities

For each community, 1 to n, the same set is computed: the node metrics (Degree-In, Degree-Out, Hub, Authority, Links-In, Links-Out), their community means, and the differences (DifDegree-In, DifDegree-Out, DifHub, DifAuthority, DifLinks-In, DifLinks-Out).

Page 32: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Searching Outliers in Social Network Metrics

Communities and Nodes
Node metrics: Degree-In | Degree-Out | Hub | Authority | Links-In | Links-Out
Differences: DifDegree-In | DifDegree-Out | DifHub | DifAuthority | DifLinks-In | DifLinks-Out
Outlier Analysis: nodes with unexpected behavior
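The idea of comparing each node with its own community can be sketched as follows for a single metric (Degree-In). The node records, the use of a standard-deviation ratio as the outlier test, and the threshold of 1.5 are assumptions made for illustration; the talk does not specify the outlier method or its thresholds.

```python
from statistics import mean, pstdev

# Hypothetical records: (node id, community id, Degree-In). Illustrative values only.
NODES = [
    ("n1", "c1", 4), ("n2", "c1", 5), ("n3", "c1", 6), ("n4", "c1", 5), ("n5", "c1", 40),
    ("n6", "c2", 2), ("n7", "c2", 3), ("n8", "c2", 2),
]


def community_values(nodes):
    """Group the metric values by community."""
    grouped = {}
    for _, community, value in nodes:
        grouped.setdefault(community, []).append(value)
    return grouped


def differences(nodes):
    """DifDegree-In: node value minus the mean of its community."""
    means = {c: mean(values) for c, values in community_values(nodes).items()}
    return {node: value - means[c] for node, c, value in nodes}


def outliers(nodes, threshold=1.5):
    """Flag nodes whose difference exceeds threshold times the community's spread."""
    spread = {c: pstdev(values) for c, values in community_values(nodes).items()}
    diffs = differences(nodes)
    flagged = []
    for node, c, _ in nodes:
        if spread[c] > 0 and abs(diffs[node]) / spread[c] > threshold:
            flagged.append(node)
    return flagged


print(outliers(NODES))
```

The same difference would be computed for each metric (Degree-Out, Hub, Authority, Links-In, Links-Out), so that a node is judged against its own community rather than against the whole network.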

Page 33: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Predictive Model to Detect Fraud

Observation Window: Month 1, Month 2, Month 3, Month 4 (training behavior, then target)
Churn Predictive Model
Accuracy: 78%

Page 34: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Predictive Model to Detect Fraud

Average accuracy of 78%
Target population
Chart values: 86%, 83%, 81%, 5%, 15%, 25%, 30%, 97%, 10%, 20%, 77%, 91%, 54%
Axis ticks: 5, 10, 15, 20, 25, 30, 40, 50, 60, 70, 80, 90, 100
Social Network Metrics have improved the predictive model by more than 30%.

Page 35: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Social Network Analysis Applications in Telecommunications

Community Layering

Page 36: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

1,034,675,130 links within the network.

863,229,202 UNDIRECTED links within the network.

84,493,587 nodes within the network.

34% on-net. 73% pre-paid.

2,234,496 communities detected within the network.

38 members on average.

Overall Figures about the network

Page 37: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Classifying the Links

Several distinct types of clustering methods were run, including distance-based, hierarchical, disjoint, and dimension-reduction methods.

The method that best explained the distribution/variation of the observations in the database was the clustering variable.

This method creates a clustering coefficient for each observation, allowing the scoring process and the computation of a cluster propensity for each observation.

36 variables were used in the clustering process, mostly describing the differences within a link and the differences between links.

Basically, two sets of variables were created, one called internal derivative and the other called cross weighted.

Page 38: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Classifying the Links

Internal Derivative describes the distribution within the link:

Voice | SMS | MMS
Early Morning | Work Morning | Work Afternoon | Travel | Leisure | Night
Very short | Short | Normal | Long | Very Long | Extreme

voice / ( voice + sms + mms ) * 100

Cross Weighted describes the weight of a particular link upon all links for the pair of nodes:

Voice | SMS | MMS
Early Morning | Work Morning | Work Afternoon | Travel | Leisure | Night
Very short | Short | Normal | Long | Very Long | Extreme

voiceAB / ( ∑voiceA + ∑voiceB ) * 100
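The two formulas on this slide translate directly into code. The sketch below is a minimal version for the voice share only; the function names, the argument names, and the sample counts are illustrative assumptions, and the same pattern would apply to the SMS, MMS, time-of-day, and duration variables.

```python
def internal_derivative(voice, sms, mms):
    """Share of voice within a single link: voice / (voice + sms + mms) * 100."""
    total = voice + sms + mms
    return 100 * voice / total if total else 0.0


def cross_weighted(voice_ab, total_voice_a, total_voice_b):
    """Weight of the A-B link against all voice traffic of both nodes:
    voiceAB / (sum of voiceA + sum of voiceB) * 100."""
    denominator = total_voice_a + total_voice_b
    return 100 * voice_ab / denominator if denominator else 0.0


# Hypothetical link: 30 voice calls, 60 SMS, 10 MMS between A and B.
print(internal_derivative(30, 60, 10))
# Hypothetical totals: A made 50 voice calls overall, B made 150, and 20 were between them.
print(cross_weighted(20, 50, 150))
```

The first value describes how the link itself is used, while the second describes how important the link is to the two customers involved.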

Page 39: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Classifying the Links

Additional variables were used to describe the links:

Same Cell | Same Community | Credit Transfer | Both Operator's customers
Voice Duration (Weighted) | Voice Relation Weighted (Duration/Amount)

Variables suggested to be quite relevant are rare:

Same Cell: 6%
MMS: 1.15%
Credit Transfer: 0.14%

Even so, they were weighted to standardize the observations:

Same Cell: x 17
MMS: x 87
Credit Transfer: x 741
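The weights quoted for the rare variables look consistent with inverse-frequency weighting (a weight of roughly 1 divided by the share of links where the variable occurs). That interpretation is an assumption, not something the slide states. The sketch below shows the arithmetic with the rounded shares as printed.

```python
# Share of links where each variable is present, as printed on the slide.
FREQUENCY = {"Same Cell": 0.06, "MMS": 0.0115, "Credit Transfer": 0.0014}


def inverse_frequency_weight(share):
    """Assumed weighting rule: the rarer the variable, the larger the weight."""
    return 1 / share


weights = {name: round(inverse_frequency_weight(share)) for name, share in FREQUENCY.items()}
print(weights)
```

Same Cell and MMS reproduce the printed x 17 and x 87. Credit Transfer gives a somewhat lower value than the printed x 741 when computed from the rounded 0.14%, which would be expected if the slide's weight came from an unrounded share.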

Page 40: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

All relationships (undirected links) were clustered, turning into 3 clusters:

FRIENDS 28% | FAMILY 35% | BUSINESS 37%

Clustering Communities upon Links Classification

Page 41: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Clustering Communities upon Links Classification

Main characteristics

FRIENDS 28%
2/3 of SMS and 1/3 of Voice
39% during Work time
46% during Travel and Leisure time
15% during Morning and Night time
2/3 Very Short calls
1/3 of Short calls
Clear indication of SMS, Leisure, and Very Short
Propensity to SMS, MMS, Night, and Very Short

FAMILY 35%
100% of Voice
35% during Work time
56% during Travel time
9% during Morning and Night time
44% of Normal calls
32% of Long calls
11% of Very Long calls
2% of Extreme calls
Clear indication of Voice, Leisure, and Very Long
Propensity to Voice Duration, Leisure, Long, Very Long, and Extreme

BUSINESS 37%
100% of Voice
3/4 during Work time
16% during Travel and Leisure time
9% during Morning and Night time
1/2 of Normal calls
30% of Short calls
15% of Long calls
Clear indication of Voice, Work Afternoon, and Short
Propensity to Voice, Early Morning, Work Morning, Work Afternoon, Travel, Short, and Normal

Page 42: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Sub-Clustering Communities upon Links Classification

Clusters were also sub-clustered, turning into 12 sub-clusters:

FRIENDS 28%: CLASSMATES 35% | MATES 36% | WORKMATES 31%
FAMILY 35%: COUPLE 24% | WEDDED 21% | DOMESTIC 23% | APART 32%
BUSINESS 37%: MEDIUM 30% | SMALL 7% | CORPORATE 18% | WORKERS 26% | PROFESSIONALS 18%

Page 43: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Sub-Clustering Communities upon Links Classification

FRIENDS 28%

CLASSMATE 35%
95% of SMS
2/3 during Work time
25% during Work Morning time
23% during Night time
94% Very Short calls
Indication of SMS, Night, and Very Short
Propensity to Voice Duration, SMS, MMS, Early Morning, Work Morning, Work Afternoon, Travel, Night, Very Short, Long, Very Long, and Extreme

MATE 36%
63% of Voice
37% of SMS
69% during Leisure time
18% during Travel time
44% of Very Short calls
43% of Short calls
11% of Normal calls
Indication of Voice, Leisure, and Short
Propensity to Voice and Short

WORKMATE 31%
63% of SMS
37% of Voice
52% during Work Afternoon time
27% during Work Morning time
67% of Very Short calls
22% of Short calls
8% of Normal calls
Indication of SMS, Work Morning and Very Short
Propensity to Relation between Voice Duration and Amount of Voice, and Normal

Page 44: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Sub-Clustering Communities upon Links Classification

FAMILY 35%

COUPLE 24%
92% of Voice
8% of SMS
32% Work Afternoon
25% Travel
21% Leisure
12% Night
43% Long
28% Very Long
7% Extreme
Indication of Voice, Travel, Very Long and Extreme
Propensity to Voice Duration, SMS, MMS, Very Long and Extreme

WEDDED 21%
100% of Voice
37% Travel
26% Leisure
13% Work Afternoon
10% Night
74% Normal
Indication of Voice, Travel and Normal
Propensity to Voice, Early Morning, Work Afternoon, Travel, Leisure, Night, Very Short, Short and Normal

DOMESTIC 23%
100% of Voice
71% Work Morning
47% Normal
37% Long
Indication of Voice, Work Morning and Long
Propensity to Work Morning

APART 32%
100% of Voice
86% Leisure
48% of Normal
36% of Long
Indication of Voice, Leisure and Long

Page 45: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Sub-Clustering Communities upon Links Classification

BUSINESS 37%

MEDIUM 30%
100% Voice
72% Work Afternoon
11% Travel
93% Normal
Indication of Voice, Work Afternoon and Normal

SMALL 7%
91% Voice
8% SMS
96% Work Afternoon
36% Normal
27% Very Short
24% Short
Indication of Voice, Work Afternoon and Very Short
Propensity to MMS

CORPORATE 18%
98% Voice
2% SMS
77% Work Afternoon
8% Travel
7% Leisure
60% Normal
23% Long
Indication of Voice, Work Afternoon and Long
Propensity to Voice Duration, Long, Very Long and Extreme

WORKERS 26%
99% Voice
1% SMS
57% Work Afternoon
11% Travel
10% Leisure
9% Night
76% Short
20% Normal
Indication of Voice, Work Afternoon and Short
Propensity to Travel, Leisure, Night and Short

PROFESSIONALS 18%
98% Voice
2% SMS
47% Work Morning
34% Work Afternoon
7% Travel
57% of Normal
28% of Short
Indication of Voice, Work Morning and Normal
Propensity to Voice, SMS, Early Morning, Work Morning, Work Afternoon, Very Short and Normal

Page 46: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Layering Communities upon Relationships

Communities were classified based on the distribution of their links.

Example: a community with 4 FRIENDS links, 6 FAMILY links, and 11 BUSINESS links is classified as BUSINESS.
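One simple way to read this classification is to give each community the label of its most frequent link type. The sketch below assumes that plurality rule, which matches the 4 / 6 / 11 example on the slide; the actual rule used in the study is not stated, and the function names are illustrative.

```python
def classify_community(link_counts):
    """Label a community with its most frequent link type (assumed plurality rule)."""
    return max(link_counts, key=link_counts.get)


def link_distribution(link_counts):
    """Percentage of each link type within the community, rounded to integers."""
    total = sum(link_counts.values())
    return {kind: round(100 * count / total) for kind, count in link_counts.items()}


# Link counts from the example on the slide.
community = {"FRIENDS": 4, "FAMILY": 6, "BUSINESS": 11}
print(classify_community(community), link_distribution(community))
```

Keeping the full distribution next to the label is useful because a community labelled BUSINESS can still contain many FAMILY or FRIENDS links.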

Page 47: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Communities layers distribution

FRIENDS

38%

FAMILY

21%

BUSINESS

41%

Community Layers

Page 48: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Community Layers

However, communities are shaped by different types of links.

FRIENDS communities (38%): well concentrated on the types of friends links
Links: FRIENDS 52% | FAMILY 23% | BUSINESS 25%

FAMILY communities (21%): many business links, similar behavior in usage
Links: FRIENDS 18% | FAMILY 44% | BUSINESS 37%

BUSINESS communities (41%): many family links, similar behavior in usage
Links: FRIENDS 19% | FAMILY 35% | BUSINESS 46%

Page 49: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

FRIENDS communities

CLASSMATE

77%

MATE

12%

WORKMATE

11%

Community Layers

Page 50: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Community Layers

Different types of links in layer FRIENDS

CLASSMATE communities (77%): well concentrated on the types of classmates links
Links: CLASSMATE 59% | MATE 18% | WORKMATE 23%

MATE communities (12%): many classmates links, distributed behavior in usage
Links: CLASSMATE 30% | MATE 47% | WORKMATE 23%

WORKMATE communities (11%): concentrated on the types of family links
Links: CLASSMATE 30% | MATE 20% | WORKMATE 50%

Page 51: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

FAMILY’s communities

COUPLE

16%

WEDDED

26%

DOMESTIC

27%

APART

31%

Community Layers

Page 52: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Community Layers

Different types of links in layer FAMILY

COUPLE communities (16%): well biased by apart
Links: COUPLE 37% | WEDDED 20% | DOMESTIC 21% | APART 22%

WEDDED communities (26%): biased by apart
Links: COUPLE 17% | WEDDED 40% | DOMESTIC 21% | APART 22%

DOMESTIC communities (27%): biased by apart
Links: COUPLE 18% | WEDDED 21% | DOMESTIC 38% | APART 23%

APART communities (31%): biased by wedded
Links: COUPLE 19% | WEDDED 22% | DOMESTIC 21% | APART 38%

Page 53: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

BUSINESS’ communities

MEDIUM

9%

SMALL

1%

CORPORATE

9%

WORKERS

56%

PROFESSIONALS

26%

Community Layers

Page 54: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Community Layers

Different types of links in layer BUSINESS

MEDIUM communities (9%): workers and professionals bias
Links: MEDIUM 32% | SMALL 6% | CORPORATE 19% | WORKERS 23% | PROFESSIONALS 21%

SMALL communities (1%): concentrated on small
Links: MEDIUM 12% | SMALL 46% | CORPORATE 15% | WORKERS 15% | PROFESSIONALS 13%

CORPORATE communities (9%): workers and professionals bias
Links: MEDIUM 18% | SMALL 7% | CORPORATE 35% | WORKERS 21% | PROFESSIONALS 20%

WORKERS communities (56%): concentrated, small professionals bias
Links: MEDIUM 17% | SMALL 5% | CORPORATE 14% | WORKERS 41% | PROFESSIONALS 23%

PROFESSIONALS communities (26%): small workers bias
Links: MEDIUM 17% | SMALL 5% | CORPORATE 17% | WORKERS 24% | PROFESSIONALS 37%

Page 55: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Additional information

Network Analysis for Business Applications

SAS Business Knowledge Series

Page 56: Big data: Descoberta de conhecimento em ambientes de big data e computação na nuvem - Carlos Andre

Dr. Carlos Andre Reis Pinheiro
Visiting Professor, KU Leuven, [email protected]
Lecturer, FGV, Brazil
Lecturer at Neoma Business School, [email protected]