Steffen [email protected]
WeST
Web Science & Technologies, University of Koblenz ▪ Landau, Germany
Modelling the Web: Examples of Modelling Text, Knowledge Networks and Physical-Social Systems
Steffen Staab
What do people want from the Web?
Web as storage: library, memory
Web as tool: search, transaction
Web as social medium: communication, cooperation
Web as mirror of self: identification, outreach
My Agenda in the Large
Web Content: Discovering patterns, Building tools, Understanding
Web Interaction: Monitoring, Exploiting, Guiding, Understanding
Web Evolution: Monitoring, Predicting, Guiding, Understanding
My Agenda for Today
Web Content, Web Interaction, Web Evolution
1. Modelling Text
2. Modelling Network Evolution
3. Modelling Physical-Social Data
Language Models
What follows "UK is"?
Conditional probability of the next word given the preceding word sequence
Issue: Long word sequences can rarely be observed
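The formula on this slide did not survive extraction. As an illustration only, a common way to estimate such a conditional probability is the maximum-likelihood ratio of n-gram counts. The sketch below assumes that estimator and a made-up toy corpus; the function names are hypothetical. It also shows the issue named above: an unseen continuation gets probability zero.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_conditional(tokens, context, word):
    """Estimate P(word | context) as count(context + word) / count(context + any word)."""
    context = tuple(context)
    full = ngram_counts(tokens, len(context) + 1)
    numerator = full[context + (word,)]
    denominator = sum(c for gram, c in full.items() if gram[:-1] == context)
    return numerator / denominator if denominator else 0.0

# Toy corpus (an assumption for illustration, not data from the talk).
corpus = "the UK is a country and the UK is an island state".split()
p_a = mle_conditional(corpus, ["UK", "is"], "a")         # seen continuation
p_unseen = mle_conditional(corpus, ["UK", "is"], "big")  # never observed
```

With longer contexts the counts become sparse very quickly, which is what motivates the smoothing methods discussed next.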
Modified Kneser-Ney Smoothing of n-grams
If a sequence is hard to observe, then approximate recursively by observing marginal frequencies of ...
First recursion step: ...
Problem: If the last word in the sequence is rare, the overall sequence will be rare, and the approximation will be of low quality.
Generalized Language Models [ACL14]
If a sequence is too hard to observe, then approximate recursively based on marginal probabilities of ...
Core idea of formal solution: Recursively applicable, commutative skip operators
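A minimal sketch of what a recursively applicable, commutative skip operator could look like. The wildcard symbol, the function names, and the choice to keep the last word fixed are assumptions made for illustration; the formal definition is in [ACL14].

```python
from itertools import combinations

SKIP = "_"  # wildcard symbol (an assumption of this sketch)

def skip(sequence, position):
    """Replace the word at `position` with the wildcard."""
    words = list(sequence)
    words[position] = SKIP
    return tuple(words)

def skipped_variants(sequence):
    """All variants obtained by skipping any subset of the words before the last one."""
    context_positions = range(len(sequence) - 1)
    variants = set()
    for k in range(len(sequence)):
        for positions in combinations(context_positions, k):
            variant = tuple(sequence)
            for p in positions:
                variant = skip(variant, p)
            variants.add(variant)
    return variants

seq = ("UK", "is", "a", "country")
# Commutativity: the order in which positions are skipped does not matter.
commutes = skip(skip(seq, 0), 2) == skip(skip(seq, 2), 0)
variants = skipped_variants(seq)
```

Each variant is a more general pattern that is easier to observe in a corpus than the full sequence.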
Improvement of GLMs [ACL14]
Evaluation measure: Perplexity
Data set: English Wikipedia, different sample sizes
Relative improvement: 2.6% (most training data, smallest model) to 13.9% (least training data, largest model)
Figure: perplexity (normalized)
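For readers unfamiliar with the evaluation measure, the sketch below computes perplexity in its standard form (two to the power of the negative average log2 probability) and a relative improvement between two perplexity values. The function names and sample numbers are illustrative assumptions, not results from the talk.

```python
import math

def perplexity(probabilities):
    """Perplexity of a test sequence, given the model probability of each word."""
    if any(p <= 0.0 for p in probabilities):
        return math.inf  # a zero-probability word makes perplexity unbounded
    avg_log = sum(math.log2(p) for p in probabilities) / len(probabilities)
    return 2.0 ** (-avg_log)

def relative_improvement(baseline, improved):
    """Fraction by which `improved` lowers the baseline perplexity."""
    return (baseline - improved) / baseline

# A model that is uniformly unsure among four words has perplexity 4.
uniform = perplexity([0.25, 0.25, 0.25, 0.25])
```

Lower perplexity is better, so a positive relative improvement means the second model predicts the test data better.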
Outlook for Generalized Language Models
Correcting mistakes that are made in all tools
Lack of appropriate models
Other operators (example: "the wild black cat")
• Delete: "the black cat"
• Part-of-speech: "the adj adj cat"
Application: e.g. next-word prediction
Other data structures
• Tree-like data
• Graph data
proposal for Google
current focus
Semantic Web
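The two operators proposed on this slide can be illustrated on the slide's own example phrase. This is a sketch only: the tiny lexicon is hypothetical and stands in for a real part-of-speech tagger, and the function names are invented here.

```python
# Hypothetical toy lexicon; a real system would use a part-of-speech tagger.
POS_LEXICON = {"the": "det", "wild": "adj", "black": "adj", "cat": "noun"}

def delete_word(words, position):
    """Delete operator: drop the word at `position` entirely."""
    return words[:position] + words[position + 1:]

def pos_generalize(words, tags_to_replace=("adj",)):
    """Part-of-speech operator: replace words with their tag if the tag is selected."""
    result = []
    for w in words:
        tag = POS_LEXICON.get(w)
        result.append(tag if tag in tags_to_replace else w)
    return result

phrase = "the wild black cat".split()
deleted = " ".join(delete_word(phrase, 1))
generalized = " ".join(pos_generalize(phrase))
```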
My Agenda for Today: 2. Modelling Network Evolution
Evolution of Networks [ICWSM 2013]
Additions, Removals, Training
Link Prediction Problem
Unlink Prediction Problem
Markov assumption: history irrelevant
Related Work in Brief
Prediction feature f assigns a score to node pair (i, j): $f(i, j) > f(i, k)$ implies $(i, j)$ to be ranked above $(i, k)$
• Link Prediction: edge likelier to be added
• Unlink Prediction: edge likelier to be removed
Static features
• degree
• common-neighbours
• path3
• local-clustering-coefficient/embeddedness
• ...
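As a sketch of how a static feature induces a ranking, the code below scores node pairs by their number of common neighbours and ranks candidate links accordingly. The toy graph and function names are assumptions for illustration; for unlink prediction one would rank existing edges instead of non-edges.

```python
from itertools import combinations

def common_neighbours(adjacency, i, j):
    """Static feature: number of neighbours shared by nodes i and j."""
    return len(adjacency[i] & adjacency[j])

def rank_non_edges(adjacency, feature):
    """Rank all unconnected node pairs by descending feature score."""
    nodes = sorted(adjacency)
    candidates = [(i, j) for i, j in combinations(nodes, 2) if j not in adjacency[i]]
    return sorted(candidates, key=lambda pair: feature(adjacency, *pair), reverse=True)

# Toy undirected graph (an assumption for illustration).
graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
    "e": set(),
}
ranking = rank_non_edges(graph, common_neighbours)
```

The pair at the top of the ranking is the one the feature considers the likeliest new link.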
Unlink prediction is much more difficult than link prediction
The Snapshot View
Link and unlink prediction
(ICWSM 2013)
Related Work in Brief
Additions, Removals, Training
Link Prediction Problem
Unlink Prediction Problem
Markov assumption: history irrelevant
Advantage: General Model
Disadvantage: General Model
Idea: Keep generality, improve prediction
Our Approach - 1
Additions, Removals, Training
Link Prediction Problem
Unlink Prediction Problem
Markov assumption: history irrelevant
Hypothesis: Temporal information generally improves prediction
Idea
1 Nodes concerned
2 Neighbourhood
Our Approach - 2
Dynamic features:
+ recency
+ longevity
Extrapolation for temporal preferential attachment:
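The slides do not define the dynamic features precisely. The sketch below assumes one plausible reading: longevity is the time since an edge first appeared, and recency is the time since the most recent event involving it. The class, field names, and time units are hypothetical; the definitions used in the actual experiments are in [ICWSM13].

```python
from dataclasses import dataclass

@dataclass
class EdgeHistory:
    created: float     # time the edge first appeared (assumed unit: days)
    last_event: float  # time of the most recent event involving the edge

def longevity(history, now):
    """Assumed definition: how long the edge has existed at time `now`."""
    return now - history.created

def recency(history, now):
    """Assumed definition: how long ago the last event on the edge happened."""
    return now - history.last_event

edge = EdgeHistory(created=10.0, last_event=40.0)
features = (recency(edge, now=50.0), longevity(edge, now=50.0))
```

Such values can be used as prediction features alongside the static ones, which is what drops the Markov assumption that history is irrelevant.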
Evaluation & Discussion (excerpt)
Temporal link prediction significantly better, but only slightly
Temporal unlink prediction always significantly improved
Temporal preferential attachment best
Figure: AUC for baseline, qualitative, quantitative and extrapolation features
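AUC, the measure reported here, can be read as the probability that a randomly chosen positive example (an edge that really was added or removed) is scored above a randomly chosen negative one. A minimal sketch of that pairwise definition, with made-up scores:

```python
def auc(positive_scores, negative_scores):
    """Probability that a random positive outranks a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Illustrative scores only, not data from the evaluation.
score = auc([0.9, 0.8, 0.4], [0.7, 0.3])
```

A value of 0.5 corresponds to a random ranking and 1.0 to a perfect one. The quadratic loop is fine for a sketch; large evaluations would use a rank-based computation.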
Outlook for Evolution of Networks
Temporal dynamics still underexplored
• lack of datasets!
• next experiments: Twitter followers, Xing.de
Unlinks lead to link recommendation
• new Wikipedia link (reorganization of Wikipedia pages!)
• new job
• new friend
My Agenda for Today: 3. Modelling Physical-Social Data
Tagged photos with geo-coordinates from Flickr
Example tag sets on the map: fish, rice | seafood, fish | seafood, shrimp | lobster, wine | seafood, fish, salmon | fish, salmon, wine | rice, fish | lobster, seafood, shrimp | coffee | coffee, wine | coffee | wine | wine | pizza, wine | pizza, wine | pasta, wine | pasta, shrimp | lobster, shrimp | seafood, shrimp
Tasks: Discovering topics, finding clusters
Tag sets on the map: fish, rice | seafood, fish | seafood, shrimp | lobster, wine | seafood, fish, salmon | fish, salmon, wine | seafood, shrimp | lobster, seafood, shrimp | coffee | coffee, wine | coffee | italian, wine | wine | pizza, wine | italian, pizza, wine | pasta, wine | pasta, shrimp | seafood, shrimp | lobster, shrimp
Topic words: seafood, fish, lobster, shrimp, crab, wine, salmon
Topic words: wine, pizza, coffee, italian, pasta
Challenge
Cultural areas, country borders, geographical features and other geographical observations exhibit complex spatial distributions
Image source: wikipedia.org
A. Ahmed, L. Hong and A. Smola, 2013 (following (Yin et al 2011; Sizov 2010))
Existing approaches: Gaussian regions
MGTM 1: Global Topic Clustering
MGTM 2: Determining Neighbourhoods
MGTM 3: Derived Topic Model
Cluster adjacency
Dependencies of document-specific topic distributions
Exchange of topic information between clusters
Exchange of topic information between clusters
MGTM 4: Exchange of Topic Information
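The MGTM steps (cluster, determine neighbourhoods, exchange topic information) can be illustrated with a deliberately simplified sketch. It is not the model from [WSDM14]: the adjacency rule (k nearest centroids by Euclidean distance), the linear mixing with a fixed `weight`, and all names and numbers are assumptions chosen to show the idea that neighbouring clusters share topic information.

```python
import math

def nearest_neighbours(centroids, k):
    """For each cluster, the indices of its k nearest centroids (assumed adjacency rule, k >= 1)."""
    neighbours = {}
    for i, ci in enumerate(centroids):
        others = sorted((j for j in range(len(centroids)) if j != i),
                        key=lambda j: math.dist(ci, centroids[j]))
        neighbours[i] = others[:k]
    return neighbours

def exchange(topic_dists, neighbours, weight=0.5):
    """Mix each cluster's topic distribution with the mean of its neighbours' distributions."""
    mixed = []
    for i, own in enumerate(topic_dists):
        adj = neighbours[i]
        mean = [sum(topic_dists[j][t] for j in adj) / len(adj) for t in range(len(own))]
        mixed.append([(1 - weight) * o + weight * m for o, m in zip(own, mean)])
    return mixed

# Three toy clusters on a line and their distributions over two topics.
centroids = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0)]
topics = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0]]
adjacency = nearest_neighbours(centroids, k=1)
smoothed = exchange(topics, adjacency)
```

Because information flows only along cluster adjacency, a topic can follow an irregular region such as a coastline instead of being forced into a single Gaussian blob.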
Evaluation: Anecdotal, Perplexity, Gaming
Gaming study: intrusion detection
Precision (8 topics), avg / median
LGTA: 0.60 / 0.58
Basic model: 0.64 / 0.58
MGTM: 0.78 / 0.75
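A sketch of how average and median precision could be computed in an intrusion study, assuming the common setup in which participants must spot a word that does not belong to a topic. The answers below are invented for illustration and are unrelated to the figures reported above.

```python
from statistics import mean, median

def intrusion_precision(answers, intruder):
    """Fraction of participant answers that identified the true intruder word."""
    return sum(1 for a in answers if a == intruder) / len(answers)

# Hypothetical participant answers per topic: (answers, true intruder).
study = {
    "topic_1": (["pizza", "pizza", "wine", "pizza"], "pizza"),
    "topic_2": (["crab", "pasta", "crab", "pasta"], "pasta"),
}
per_topic = [intrusion_precision(answers, intruder) for answers, intruder in study.values()]
summary = (mean(per_topic), median(per_topic))
```

Higher precision means participants found the intruder more easily, which is taken as a sign of more coherent topics.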
Outlook for LDA with structure
Texts + social network structures
• scientometry
• xing.de
Web pages + user visits
• chefkoch.de
Future: Knowledge about social aspects needed
Future: CS style models for social sciences
References
[ACL14] R. Pickhardt, T. Gottron, M. Körner, P. G. Wagner, T. Speicher, S. Staab. A Generalized Language Model as the Combination of Skipped n-grams and Modified Kneser Ney Smoothing. In: Proc. of ACL-2014 - The 52nd Annual Meeting of the Association for Computational Linguistics. Baltimore, June 22-27, 2014.
[WSDM14] C. Kling, J. Kunegis, S. Sizov, S. Staab. Detecting Non-Gaussian Geographical Topics in Tagged Photo Collections. In: Proc. of the 7th ACM Conference on Web Search and Data Mining (WSDM2014), New York, US, February 24-28, 2014.
[ICWSM13] J. Preusse, J. Kunegis, M. Thimm, T. Gottron, S. Staab. Structural Changes in Collaborative Knowledge Networks. In: Proceedings of the Seventh International AAAI Conference on Weblogs and Social Media (ICWSM 2013), Boston, July 8-10, 2013.
Semantic Web
Social Web & Web Retrieval
Interactive Web & Human Computing
Web & Economy
Software & Services
Web Science & Technologies Team & Research
Computational Social Science
Thank You!