Information Surprise, or How to Find Interesting Data
TRANSCRIPT
What is a ‘surprise’?
Define Surprise!
surprise
[countable] an event, a piece of news, etc. that is unexpected or that happens suddenly. SYNONYMS: shock, …, eye-opener
[uncountable, countable] a feeling caused by something happening suddenly or unexpectedly. SYNONYMS: astonishment, ...
(Oxford Advanced Learner's Dictionary)
Cat explores
meh
Cat meets unexpected
wow
Quantify Surprise!
measured in… wows?
Quantify Complexity: it can measure any content type. Note: complex is not random!
Measures of complexity:
1. Subjective rating
2. # Distinct elements
3. # Dimensions
4. # Control parameters
5. Minimal description
6. Information content
7. Minimal generator
8. Minimum energy
Abdallah, S., & Plumbley, M. (2009). Information dynamics: patterns of expectation and surprise in the perception of music. Connection Science, 21(2-3), 89-117.
Neuro/Cognitive Science: How do we perceive information?
vs
Machine Learning: How to measure differences?
Surprise Quants in academia
“... machine that constantly tells you what you already know is just irritating. So software alerts users only to surprises ...”
Horvitz, E., Apacible, J., Sarin, R., & Liao, L. Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service.
Friston, K. (2010). The free-energy principle: a unified brain theory?. Nature Reviews Neuroscience, 11(2), 127-138.
Surprise Quants in academia
Itti, L., & Baldi, P. F. (2005). Bayesian surprise attracts human attention. In Advances in neural information processing systems (pp. 547-554).
[Figure: saliency examples from Itti & Baldi, labeled meh / wow / meh]
Typical ML applications: Unsupervised Learning
1. Decision trees (information gain)
2. MaxEnt principle
3. ...
Specifically after ‘surprise’:
4. One-class classification
5. Anomaly detection
6. Novelty measures
Pimentel, M. A., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215-249.
Model of a cat
Data (stream) → Element (attention window) → Surprising? (interesting, new)
wow → act, and Update the Data Model (expectations)
meh → ignore
Model of a cat’s surprise
Surprising? (interesting, new)
Quantify surprisal /self-information/
The surprise /information/ in observing the occurrence of an event having probability p.
Axioms:
1. 0 ≤ p ≤ 1 and I(p) ≥ 0: information is non-negative
2. I(1) = 0: a certain event carries no surprise
3. I(p1 ∗ p2) = I(p1) + I(p2): independent events add up
Derive: the only continuous solution is I(p) = −log(p), up to the base of the logarithm.
Surprisal /self-information/: I(p) = −log2(p)
Flipping a fair coin provides 1 bit of new information:
I(1/2) = −log2(1/2) = 1 bit (or one wow)
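As a quick illustration, a minimal Python sketch of the formula above (the function name is mine, not from the talk):

```python
import math

def surprisal(p: float) -> float:
    """Self-information I(p) = -log2(p), in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(surprisal(0.5))   # 1.0 bit: a fair coin flip
print(surprisal(1.0))   # 0.0 bits: a certain event carries no information
print(surprisal(1/6))   # ~2.585 bits: one roll of a fair die
```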
Surprisal applications: selecting an information source.
Oleksandr Pryymak. Achieving Accurate Opinion Consensus in Large Multi-Agent Systems. Doctoral Thesis, University of Southampton, 170 pp., 2013.
Model of a cat’s knowledge
Data Model (expectations)
Quantify ‘knowledge’ /entropy/
The Shannon entropy is the expected value of the self-information:
H(X) = −Σ p(x) log2(p(x))
Notes:
1. The maximum-entropy distribution is the least informative.
2. Statistical-mechanics entropy and information entropy are in principle the same quantity.
Maximum: log2(n), attained by the uniform distribution over n outcomes.
Entropy of a Bernoulli trial: X ∈ {0, 1}
H(X) = −p log2(p) − (1 − p) log2(1 − p)
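A minimal sketch of the entropy calculation (function names are mine):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy H = -sum(p * log2(p)) in bits; 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bernoulli_entropy(p: float) -> float:
    """Entropy of a Bernoulli trial X in {0, 1} with P(X=1) = p."""
    return entropy([p, 1.0 - p])

print(bernoulli_entropy(0.5))   # 1.0 bit: the maximum, log2(2)
print(bernoulli_entropy(0.99))  # ~0.081 bits: a nearly certain coin tells us little
```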
Entropy applications
1. Analysis of a GeoIP ISP database binary. Analyzing unknown binary files using information entropy: http://yurichev.com/blog/entropy/
2. Visualizing the OS X ksh binary (see binvis.io). Visualizing entropy in binary files: http://corte.si/posts/visualisation/entropy/index.html
1, 2: cryptic signature
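The linked posts scan files with a sliding entropy window; a sketch of that idea, where the window size and the example path are illustrative choices, not taken from the posts:

```python
import math
from collections import Counter

def window_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0..8)."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_profile(path: str, window: int = 4096) -> list:
    """Entropy of consecutive windows of a file, suitable for plotting."""
    with open(path, "rb") as f:
        blob = f.read()
    return [window_entropy(blob[i:i + window])
            for i in range(0, max(len(blob) - window + 1, 1), window)]

# Values near 8 bits/byte suggest compressed or encrypted regions;
# low values suggest text, code, or padding.
```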
Model of a cat’s discovery
Element (attention window) vs Data Model (expectations): Surprising? (interesting, new) → wow (act) / meh (ignore)
What has changed?
Quantify ‘discovery’ /information gain/
The Kullback–Leibler divergence /relative entropy, information gain/ is a measure of the information lost when Q is used to approximate P: the expected number of extra bits required to recode samples from P with a code built for Q.
D_KL(P || Q) = Σ P(x) log2(P(x) / Q(x))
"KL-Gauss-Example" by T. Nathan Mundhenk
Not a true metric: asymmetric, D_KL(P || Q) ≠ D_KL(Q || P).
Quantify ‘discovery surprise’
Symmetric KL distances: all result in the same performance.
Pinto, D., Benedí, J. M., & Rosso, P. (2007). Clustering narrow-domain short texts by using the Kullback-Leibler distance. In Computational Linguistics and Intelligent Text Processing.
Calculating KLD
Data sparseness problem: the divergence is often ∞ (whenever Q(x) = 0 while P(x) > 0).
Solutions:
- drop the offending components from the calculation
- smoothing (see the sketch below)
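A sketch of the smoothed calculation, assuming simple additive smoothing over the joint vocabulary (the talk does not specify which smoothing scheme was used):

```python
import math
from collections import Counter

def kld(p_counts: Counter, q_counts: Counter, alpha: float = 0.5) -> float:
    """D_KL(P || Q) in bits over the shared vocabulary; additive smoothing
    keeps every Q component non-zero, so the divergence stays finite."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    div = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        div += p * math.log2(p / q)
    return div

# Asymmetric: kld(a, b) != kld(b, a) in general.
# One simple symmetrization: kld(a, b) + kld(b, a).
```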
Surprise in Tweets: KLD application
Explore data: search engines. Elasticsearch + Kibana = faceted data exploration.
Whole dataset
(I still have hopes to find where I left this partition)
Annexation of Crimea: Feb 20 to March 20, 2014
Presidential elections: May 25, 2014
MH17: July 17, 2014
Experiment dataset: Feb 1 - 28, 2014; English; tweets: 5.64 M
Pipeline
Stream (tweets) → Timeslot (attention window) → KLD against the last 8 timeslots (data model) → interesting/new?
new event → act
meh → ignore
Update the data model (see the sketch below).
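A sketch of this loop, reusing the kld() sketch above; the stream format and the event threshold are illustrative assumptions, only the 8-slot history mirrors the slide:

```python
from collections import Counter, deque

def run_pipeline(slot_stream, history: int = 8, threshold: float = 1.0):
    """slot_stream yields (hour, token_counts) pairs in time order."""
    model = deque(maxlen=history)            # last 8 timeslots (data model)
    for hour, counts in slot_stream:         # each timeslot = attention window
        if model:
            background = sum(model, Counter())
            score = kld(counts, background)  # surprising vs expectations?
            yield hour, score, score > threshold  # wow (act) or meh (ignore)
        model.append(counts)                 # update the model either way
```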
Simplistic topic modeling
- tweets are super short
+ important events are widely discussed
+ events change vocabulary
- timeslot aggregation favors the predominant event
Document is a time slot. Model:
- bag of words
- frequency threshold: > 200 tweets
- term frequency (naive)
- tokenizer: https://github.com/jaredks/tweetokenize + a few touches
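A sketch of the per-slot model, assuming tweets arrive as (hour, text) pairs; the tokenizer is the linked tweetokenize package as published, without the talk’s extra touches:

```python
from collections import Counter, defaultdict
from tweetokenize import Tokenizer  # https://github.com/jaredks/tweetokenize

tokenizer = Tokenizer()

def timeslot_models(tweets, min_tweets: int = 200):
    """Bag-of-words term counts per time slot (one 'document' per slot);
    a token is kept only if it appears in more than min_tweets tweets."""
    slots = defaultdict(Counter)
    for hour, text in tweets:
        slots[hour].update(set(tokenizer.tokenize(text)))  # doc frequency
    return {hour: Counter({w: c for w, c in counts.items() if c > min_tweets})
            for hour, counts in slots.items()}
```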
Vocabulary diversity: follows daily cycles.
(gap in the data: ran out of disk space)
Test a domain-specific hack
Vocabulary: ‘catastrophe’
…
Vocabulary slots: KLD. How surprising the vocabulary of each hour is against the whole dataset.
Beware: on this scale individual hours are small, but events are plentiful
Higher KLD on sparse data
Lower KLD on dense data
Vocabulary slots: KLD smoothed. Smoothing did not change the peaks; a new minimum appears.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 24h.
Less variation on dense data.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 8h.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 4h.
Event Detection Problem
Outlier detection:
- rate of change of the ‘surprise’
Compare against:
Rolling KLD outliers. Events: detected by rate change.
Rolling KLD outlier tokens: annotate events with the most surprising tokens (see the sketch below).
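A sketch of both steps under stated assumptions: the detector flags hours where the KLD jump exceeds k standard deviations of recent jumps, and events are annotated with the tokens that contribute most to the divergence. The values of k and window, and the contribution-based ranking, are my choices, not the talk’s:

```python
import math
import statistics
from collections import Counter

def detect_events(klds, k: float = 2.0, window: int = 8):
    """klds: list of (hour, kld_value) pairs in time order."""
    hours = [h for h, _ in klds]
    values = [v for _, v in klds]
    deltas = [b - a for a, b in zip(values, values[1:])]  # rate of change
    events = []
    for i in range(window, len(deltas)):
        sigma = statistics.pstdev(deltas[i - window:i]) or 1e-9
        if deltas[i] > k * sigma:
            events.append(hours[i + 1])  # the hour where the jump lands
    return events

def top_surprising_tokens(p_counts: Counter, q_counts: Counter,
                          alpha: float = 0.5, n: int = 10):
    """Tokens with the largest pointwise contribution p*log2(p/q) to
    D_KL(P || Q), with the same additive smoothing as kld() above."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)

    def contrib(w):
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        return p * math.log2(p / q)

    return sorted(vocab, key=contrib, reverse=True)[:n]
```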
Further dataset limitation: prime events
Rolling KLD outliers: Feb 19-28
Find representative tweets
Timeslot (attention window) vs the last 8 timeslots (data model):
KLD → surprising tweets; −KLD → least surprising. Update the surprising tweets.
1. Detect distinct features.
2. Find elements representing the distinct features.
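One plausible reading of this step as code: score each tweet by the summed pointwise KL contributions of its tokens, so the top of the ranking represents the slot’s distinct features and the bottom is the least surprising. The exact scoring scheme is an assumption, not a verified reimplementation:

```python
import math
from collections import Counter

def rank_tweets(tokenized_tweets, p_counts: Counter, q_counts: Counter,
                alpha: float = 0.5):
    """tokenized_tweets: (text, tokens) pairs for one time slot.
    P is the slot distribution, Q the data-model distribution."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)

    def contrib(w):
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        return p * math.log2(p / q)

    scored = [(sum(contrib(w) for w in set(tokens) if w in vocab), text)
              for text, tokens in tokenized_tweets]
    return sorted(scored, key=lambda st: st[0], reverse=True)
```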
Surprising tweets (link ➥)
Only from users with 500+ followers.
The only spam/bot tweet selected is from the first time slot, when the prior is uniform. Notice: the dataset is not filtered!
majdannezalezhnosti.blogspot.com
To improve in the Tweets app:
1. Benchmark: ‘hot’ events from the media
2. Fight bots:
   a. spam (repetitions, bots)
   b. ‘forced’ opinions
   c. filter low quality
3. Topic model:
   a. not just term frequency
   b. split topics (!)
Questions?
art by www.facebook.com/Marysya.Rudska