Social Network Analysis with Spark
Social Network Analysis (SNA)
Ghulam Imaduddin
Definition
From the point of view of data mining, a social network is a heterogeneous and
multirelational data set represented by a graph. The graph is typically very
large, with nodes (or vertices) corresponding to objects and edges
corresponding to links representing relationships or interactions between
objects. Both nodes and links have attributes
(Han & Kamber, 2006).
(Figure: subscriber nodes connected by links such as call, SMS, IM, balance transfer, …; or mention, follow, like, ….)
Benefits of SNA
Identify the role of each subscriber in the community:
• Community leader
• Bridge
• Passive
• Follower
Identify high-value/prospect communities by looking at:
• Community size
• Closeness
• Members’ profiles (device, usage, ARPU, location)
• On-net/off-net share in the community
Suspected same subscriber
Compare two social networks to identify subscriber numbers that likely belong to a single person.
Further utilization
• New product campaigns targeting community leaders, bridges, and high-value communities
• Retention program prioritization for community leaders, bridges, and high-value communities
• Product adoption campaigns for followers in communities that have already adopted the product
• Identifying rotational churners, to exclude them from retention campaigns or to evaluate dealers
• Social network variables can enhance other predictive models. For example, they can increase the lift of a churn model for high-value customers (Imaduddin, 2014).
Social Network Graph Mining
By mining the social network graph, we can extract valuable information such as:
• Degree (in-degree, out-degree, max degree). Degree is the number of edges attached to a vertex. A vertex with a high in-degree receives information from many others; a vertex with a high out-degree sends information to many others.
• PageRank. PageRank measures the importance of each vertex in a graph. If a Twitter user is followed by many others, that user will be ranked highly. For a CDR-based social network, reverse the graph direction before applying the PageRank function to identify the important vertices.
• Local clustering coefficient (LCC). LCC represents how tightly knit a customer’s network is: the higher the LCC, the closer the network. LCC is derived from the triangle count of each vertex:
$$LCC = \frac{\#\text{triangle}}{\binom{n}{2}}, \qquad n = \#\text{neighbour}$$
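A worked instance of the LCC formula, with hypothetical numbers: a subscriber with n = 4 neighbours has C(4,2) possible neighbour pairs; if 3 of those pairs are themselves connected, the subscriber sits on 3 triangles.
$$\binom{4}{2} = \frac{4 \cdot 3}{2} = 6, \qquad LCC = \frac{3}{6} = 0.5$$
A fully connected neighbourhood (all 6 pairs connected) would give LCC = 1, and a neighbourhood with no connections between neighbours would give LCC = 0.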
How To Build
Tools
Language
Platform
Let’s get our hands dirty!
Graph Example
(Figure panels: Graph Representation; Data Representation)
Script Example – Degree Information
Degree Information Result
(Figure panels: Graph Representation; Result as (id, total-degree, in-degree, out-degree))
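The degree script on the slide survives only as an image, so here is a rough stand-in: a standard-library Scala sketch that computes the same (id, total-degree, in-degree, out-degree) tuples, plus the max-degree vertex, from a plain directed edge list. The `DegreeSketch` object, its `Edge` class and the sample edges are hypothetical. In GraphX the per-vertex figures are available as `graph.degrees`, `graph.inDegrees` and `graph.outDegrees`.

```scala
object DegreeSketch {
  // Hypothetical directed edge, e.g. caller -> callee in a CDR graph
  final case class Edge(src: Long, dst: Long)

  // Returns id -> (totalDegree, inDegree, outDegree) for every vertex seen in an edge
  def degrees(edges: Seq[Edge]): Map[Long, (Int, Int, Int)] = {
    val outDeg: Map[Long, Int] = edges.groupBy(_.src).map { case (id, es) => (id, es.size) }
    val inDeg: Map[Long, Int] = edges.groupBy(_.dst).map { case (id, es) => (id, es.size) }
    (outDeg.keySet ++ inDeg.keySet).map { id =>
      val inD = inDeg.getOrElse(id, 0)
      val outD = outDeg.getOrElse(id, 0)
      (id, (inD + outD, inD, outD))
    }.toMap
  }

  // Max degree: the vertex with the highest total degree (None for an empty graph)
  def maxDegree(edges: Seq[Edge]): Option[(Long, Int)] = {
    val totals = degrees(edges).map { case (id, (total, _, _)) => (id, total) }
    if (totals.isEmpty) None else Some(totals.maxBy(_._2))
  }
}
```

On production CDR volumes the same grouping would run on Spark RDDs or GraphX rather than on local collections; the local version only makes the arithmetic easy to check.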
Script Example – PageRank
PageRank Result
(Figure panels: Graph Representation; Result as (id, PageRank) and (id, reverse PageRank))
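Again the slide's script is an image only. The sketch below is a plain power-iteration PageRank in standard-library Scala, with a `reverse` helper for the CDR case described earlier. The damping factor 0.85, the fixed 50 iterations, the even redistribution of rank from dangling vertices and the normalisation of ranks to sum to 1 are all assumptions of this sketch, not settings taken from the slide. GraphX offers `graph.pageRank(tol)` and `graph.reverse`; its implementation may scale ranks differently, so compare orderings rather than absolute values.

```scala
object PageRankSketch {
  // Hypothetical directed edge, e.g. caller -> callee
  final case class Edge(src: Long, dst: Long)

  // Flip every edge, e.g. turning caller -> callee into callee -> caller
  def reverse(edges: Seq[Edge]): Seq[Edge] = edges.map(e => Edge(e.dst, e.src))

  // Power-iteration PageRank with damping factor d; ranks always sum to 1.
  // Dangling vertices (no out-edges) spread their rank evenly over all vertices.
  def pageRank(edges: Seq[Edge], d: Double = 0.85, iterations: Int = 50): Map[Long, Double] = {
    val vertices: Seq[Long] = (edges.map(_.src) ++ edges.map(_.dst)).distinct
    val n = vertices.size
    val outDeg: Map[Long, Int] = edges.groupBy(_.src).map { case (id, es) => (id, es.size) }
    var ranks: Map[Long, Double] = vertices.map(v => (v, 1.0 / n)).toMap
    for (_ <- 1 to iterations) {
      val danglingMass = vertices.filterNot(outDeg.contains).map(ranks).sum
      // Each edge passes rank(src) / outDegree(src) to its destination
      val incoming: Map[Long, Double] = edges
        .groupBy(_.dst)
        .map { case (dst, es) => (dst, es.map(e => ranks(e.src) / outDeg(e.src)).sum) }
      ranks = vertices.map { v =>
        (v, (1 - d) / n + d * (incoming.getOrElse(v, 0.0) + danglingMass / n))
      }.toMap
    }
    ranks
  }
}
```

For a star where subscribers 2, 3 and 4 all call subscriber 1, the plain PageRank puts subscriber 1 on top, while the reversed graph ranks subscriber 1 below the others; which direction is meaningful depends on how the edges are defined.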
Script Example – Triangle
Triangle Counting Result
(Figure panels: Graph Representation; Result as (id, #triangle))
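Because LCC is derived from the triangle count, a standard-library Scala sketch can tie the two together: it treats the graph as undirected (direction ignored, self-loops and duplicate edges dropped), counts the triangles through each vertex, and applies LCC = #triangle / C(n, 2). The `TriangleSketch` object and the convention LCC = 0 when a vertex has fewer than two neighbours are assumptions of this sketch. GraphX computes the per-vertex count with `graph.triangleCount()`; older Spark releases required edges in canonical orientation (srcId < dstId) and a partitioned graph, so check the GraphX guide for the version in use.

```scala
object TriangleSketch {
  // Undirected adjacency: edge direction ignored, self-loops and duplicates dropped
  def adjacency(edges: Seq[(Long, Long)]): Map[Long, Set[Long]] =
    edges
      .filter { case (a, b) => a != b }
      .flatMap { case (a, b) => Seq((a, b), (b, a)) }
      .groupBy(_._1)
      .map { case (v, pairs) => (v, pairs.map(_._2).toSet) }

  // Triangles through a vertex = number of connected pairs among its neighbours
  def triangleCount(adj: Map[Long, Set[Long]]): Map[Long, Int] =
    adj.map { case (v, nbrs) =>
      val ns = nbrs.toSeq
      val closed = for {
        i <- ns.indices
        j <- (i + 1) until ns.size
        if adj.getOrElse(ns(i), Set.empty[Long]).contains(ns(j))
      } yield 1
      (v, closed.size)
    }

  // LCC = #triangle / C(n, 2), n = #neighbour (0.0 when n < 2)
  def localClusteringCoefficient(adj: Map[Long, Set[Long]]): Map[Long, Double] = {
    val tri = triangleCount(adj)
    adj.map { case (v, nbrs) =>
      val n = nbrs.size
      val pairs = n * (n - 1) / 2
      (v, if (pairs == 0) 0.0 else tri(v).toDouble / pairs)
    }
  }
}
```

With edges 1-2, 2-3, 3-1 and 3-4, vertices 1, 2 and 3 each sit on one triangle; vertex 1 gets LCC 1.0, while vertex 3, whose three neighbours form only one connected pair, gets 1/3.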
Solving Real-World Problems
• Define the vertices. Are they subscribers, web pages, Twitter accounts?
• Define the edges: how the vertices are connected. E.g., total call minutes in a month > 5 minutes,
SMS count > 10, etc.
• Identify the mega hubs. A mega hub is a vertex connected to a massive number of vertices
(for example, a call center or a spammer). Mega hubs can be removed or processed separately,
depending on the problem.
• Identify the measures needed (PageRank, degree, LCC, triangle count, etc.).
• Build the data source (keep the vertex property data and the connection data separate, and join them
later), and store it distributed on Hadoop.
• Build the code, run it, and feed the results back to the data warehouse or Hadoop for further
use.
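The edge-definition and mega-hub steps above can be sketched in standard-library Scala. The `Usage` record is a hypothetical monthly aggregate, not a real CDR schema; the thresholds reuse the slide's examples (more than 5 call minutes or more than 10 SMS), and combining them with OR is an assumption. The `maxDegree` cut-off for mega hubs is a hypothetical parameter that would have to be tuned per problem.

```scala
object EdgeBuildSketch {
  // Hypothetical monthly usage record between two subscribers (not a real CDR schema)
  final case class Usage(caller: Long, callee: Long, callMinutes: Double, smsCount: Int)

  // Aggregate usage per (caller, callee) pair; keep the pair as an edge only if
  // total call minutes > minMinutes OR total SMS > minSms (OR is an assumption)
  def buildEdges(usage: Seq[Usage], minMinutes: Double = 5.0, minSms: Int = 10): Seq[(Long, Long)] =
    usage
      .groupBy(u => (u.caller, u.callee))
      .toSeq
      .collect {
        case (pair, us) if us.map(_.callMinutes).sum > minMinutes || us.map(_.smsCount).sum > minSms => pair
      }

  // Treat any vertex whose total (in + out) degree exceeds maxDegree as a mega hub
  // (e.g. a call center or spammer) and drop every edge that touches it
  def removeMegaHubs(edges: Seq[(Long, Long)], maxDegree: Int): Seq[(Long, Long)] = {
    val degree: Map[Long, Int] =
      edges.flatMap { case (a, b) => Seq(a, b) }.groupBy(identity).map { case (v, vs) => (v, vs.size) }
    val hubs = degree.collect { case (v, d) if d > maxDegree => v }.toSet
    edges.filterNot { case (a, b) => hubs(a) || hubs(b) }
  }
}
```

Dropping hubs is only one option; the same `hubs` set could instead be written out and processed separately, as the slide suggests.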
References & Resources
References
• Han, J., & Kamber, M. (2006). Data Mining: Concepts and Techniques. San Francisco: Morgan Kaufmann.
• Imaduddin, G. (2014). Evaluation and Improvement of Churn Model Using Customer Value and Social Network. Jakarta: Universitas Indonesia.
Resources
• Apache Spark Overview. https://spark.apache.org/docs/latest/
• Databricks Training Resources. https://databricks.com/spark-training-resources
• GraphX Programming Guide. https://spark.apache.org/docs/latest/graphx-programming-guide.html
• Social Network Analysis. http://en.wikipedia.org/wiki/Social_network_analysis
• Spark Scala API Doc. https://spark.apache.org/docs/1.1.0/api/scala/index.html#org.apache.spark.package
• The Scala Programming Language. http://www.scala-lang.org/
Appendix
List of Graph Operations in GraphX