Upload: byron-lawrence

Post on 17-Dec-2015


Predictive Semantic Social Media Analysis

David A. Ostrowski

System Analytics and Environmental Sciences

Research and Advanced Engineering

Ford Motor Company

Social media

• Influential

• Sample of the web
  – News driven

• CRM
  – Real-time
  – Less biased

• Unique opportunities for analytics

Opportunities

• Old Model
  – Reactionary
    • Damage control
    • Inquiries
    • Confirm positive reaction

• New Model
  – Preemptive
    • Focused engagement
      – Promotions
      – Events
      – Media
    • Anticipatory

Social Dimensions

• Describes affiliations across a network

• Values / Community

• Reinforced by relationships

• Utilize to predict purchase behavior

Relational Learning

• ‘Birds of a Feather’

• Leverage each local network for semantic understanding

• Relational learning => social dimensions

Framework Overview

• Relational learning
  – Strengthen representation
  – Support knowledge

• Unsupervised classification
  – Generation of dimensions

• Supervised classification
  – Dimensions => behavior

[Figure: Facebook identifiers linked to profile attributes: movies, television shows, associations, schools, political affiliations, issue positions, values, buying habits, religious views]

Framework Overview

[Figure: framework diagram. Components: local network, taxonomy labels, RN classification, features, k-means cluster, social dimension, higher-level features, supervised classification, behaviors]

Case Study One

• 4,000 Facebook identifiers

• Associations to two vehicle lines

• Question:
  – What can we extract to distinguish between these two purchase behaviors?

Relational Learning Step

• Extracted data from FB

• Consolidated interests

• Applied the RN algorithm

• Guided by taxonomy
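
The relational learning step can be illustrated with a small sketch. It assumes "RN" refers to the weighted-vote relational neighbor classifier, in which an unlabeled node takes the weighted average of its neighbors' class scores. The function name, the graph layout, and the labels in the example are hypothetical and are not taken from the slides.

```python
from collections import defaultdict

def rn_classify(edges, known_labels, iterations=10):
    """Weighted-vote relational neighbor sketch.

    edges: dict mapping (node_a, node_b) -> positive weight (undirected).
    known_labels: dict mapping node -> class label for the labeled nodes.
    Returns a dict node -> {label: score} for every node in the graph.
    """
    neighbors = defaultdict(dict)
    for (a, b), w in edges.items():
        neighbors[a][b] = w
        neighbors[b][a] = w

    classes = sorted(set(known_labels.values()))
    uniform = {c: 1.0 / len(classes) for c in classes}

    # Labeled nodes are clamped to their label; unlabeled nodes start uniform.
    scores = {}
    for node in neighbors:
        if node in known_labels:
            scores[node] = {c: float(c == known_labels[node]) for c in classes}
        else:
            scores[node] = dict(uniform)

    for _ in range(iterations):
        updated = {}
        for node, nbrs in neighbors.items():
            if node in known_labels:
                updated[node] = scores[node]
                continue
            total = sum(nbrs.values())
            # Each class score is the weight-normalized vote of the neighbors.
            updated[node] = {
                c: sum(w * scores[n][c] for n, w in nbrs.items()) / total
                for c in classes
            }
        scores = updated
    return scores

# Hypothetical local network: account "u" shares three interests with "a"
# and one with "b"; only "a" and "b" carry taxonomy labels.
example = rn_classify(
    {("u", "a"): 3.0, ("u", "b"): 1.0},
    {"a": "outdoors", "b": "music"},
)
```

In this toy network the unlabeled account leans toward the label of the neighbor it shares more interests with, which is the 'birds of a feather' assumption stated earlier.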

[Figure: accuracy versus missing labels (normalized) for Facebook accounts, comparing RN, Bayes, and k-means]

Preliminary cluster statistics

            1     2     3     4     5     6
veh1 k=3   46    39    13
veh2 k=3   21    42    36
veh1 k=4   44    16    12    26
veh2 k=4   14    27    24    32
veh1 k=5   21     8     1   0.3    45
veh2 k=5   35    22    12    15    14
veh1 k=6    7    43     6    13     9    19
veh2 k=6   20    14    16     8     9    35

normalized differences between vehicle lines
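
One possible reading of the table above is that each row gives the percentage of a vehicle line's accounts falling into each cluster. That reading is an assumption, as are the function names and the toy data below. The sketch runs a plain k-means and then reports per-vehicle-line cluster shares.

```python
import random
from collections import Counter

def kmeans(points, k, iterations=50, seed=0):
    """Plain k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    assignment = [0] * len(points)
    for _ in range(iterations):
        # Assign each point to the nearest center (squared Euclidean distance).
        assignment = [
            min(range(k),
                key=lambda j: sum((x - c) ** 2 for x, c in zip(p, centers[j])))
            for p in points
        ]
        # Move each center to the mean of its members; keep it if it is empty.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment

def cluster_shares(assignment, groups, k):
    """Percentage of each group's members that fall in each cluster."""
    counts = {g: Counter() for g in set(groups)}
    for a, g in zip(assignment, groups):
        counts[g][a] += 1
    return {
        g: [100.0 * c[j] / sum(c.values()) for j in range(k)]
        for g, c in counts.items()
    }

if __name__ == "__main__":
    # Hypothetical two-feature accounts tagged with a vehicle line.
    pts = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5), (10, 0), (10, 1)]
    veh = ["veh1", "veh1", "veh2", "veh1", "veh2", "veh2", "veh1", "veh2"]
    for k in range(3, 7):  # the slides report k = 3 to 6
        print(k, cluster_shares(kmeans(pts, k), veh, k))
```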

Extracted social dimensions

• Applied feature sets to k-means (k = 3 to 6)

• Each clustering attempts to characterize the relationship between vehicle line and a social dimension (value / interest ..)

• All classifications are considered as input to behavioral training

• Also considered community detection
  – Via maximization of modularity using the leading eigenvector of the modularity matrix
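
The community detection bullet can be sketched as follows, assuming it refers to Newman's leading-eigenvector method: build the modularity matrix B with B_ij = A_ij - k_i k_j / (2m) and split nodes by the sign of its leading eigenvector. R's igraph, listed in the appendix, offers this as `cluster_leading_eigen`; the pure-Python version below is only an illustration, and the function name and example graph are hypothetical.

```python
def leading_eigenvector_split(adjacency, iterations=1000):
    """Split an undirected graph into two communities.

    adjacency: symmetric list of lists of edge weights (0 where no edge),
    with at least one edge. Returns a list of +1 / -1 labels, one per node.
    """
    n = len(adjacency)
    degree = [sum(row) for row in adjacency]
    two_m = sum(degree)
    # Modularity matrix: B_ij = A_ij - k_i * k_j / (2m).
    B = [[adjacency[i][j] - degree[i] * degree[j] / two_m for j in range(n)]
         for i in range(n)]
    # Shift by a bound on the eigenvalue magnitudes so that power iteration
    # converges to the most positive eigenvalue, not the largest in magnitude.
    shift = max(sum(abs(x) for x in row) for row in B)
    vec = [float(i + 1) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iterations):
        nxt = [sum(B[i][j] * vec[j] for j in range(n)) + shift * vec[i]
               for i in range(n)]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # The sign of each eigenvector entry gives the community of that node.
    return [1 if x >= 0 else -1 for x in vec]

# Hypothetical graph: two triangles (0,1,2) and (3,4,5) joined by edge 2-3.
graph = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 1, 0, 0],
    [0, 0, 1, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]
groups = leading_eigenvector_split(graph)
```

A full implementation would apply the split recursively and stop when no positive modularity gain remains; this sketch performs a single bisection.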

Applied Supervised Classification for Behavior Prediction

• Applied feature sets to three machine learning algorithms

• Simple Bayes: precision .7, recall .69

• Weightily Averaged One-Dependence Estimators (WAODE): precision .69, recall .70

• J48: precision .69, recall .70
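
The slides do not state how precision and recall were averaged across vehicle lines. The sketch below assumes a support-weighted average over classes, which is what Weka reports in its "Weighted Avg." summary row. The function name and the sample labels are hypothetical.

```python
from collections import Counter

def weighted_precision_recall(actual, predicted):
    """Support-weighted average precision and recall over all classes."""
    support = Counter(actual)
    precision = recall = 0.0
    for c in sorted(support):
        true_pos = sum(1 for a, p in zip(actual, predicted) if a == c and p == c)
        predicted_c = sum(1 for p in predicted if p == c)
        # A class that is never predicted contributes zero precision.
        class_precision = true_pos / predicted_c if predicted_c else 0.0
        class_recall = true_pos / support[c]
        weight = support[c] / len(actual)
        precision += weight * class_precision
        recall += weight * class_recall
    return precision, recall

# Hypothetical labels for four accounts across two vehicle lines.
p, r = weighted_precision_recall(
    ["veh1", "veh1", "veh2", "veh2"],
    ["veh1", "veh2", "veh2", "veh2"],
)
```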

Case Study 2

• 20,000 Facebook IDs across four vehicle lines

• Relational modeling
  – Similar performance to the first case study

• Social dimensions generated for k = 3 to 7
  – Not as much separation after k=6 clustering

• Precision, recall
  – Simple Bayes: .469, .483
  – WAODE: .591, .588
  – J48: .534, .536

Next Steps

• Institutionalization
  – Extract / define exactly what our dimensions are explaining in our data sets

• Relate to specific associations
  – Values
  – Community

Q/A

See me for friends and neighbors discount…. [email protected]

Appendix (software)

• ‘R’ igraph

• ‘R’ km module

• Weka

• Ruby - Watir