Information Surprise, or How to Find Interesting Data
TRANSCRIPT
What is a ‘surprise’?
Define Surprise!
surprise
[countable] an event, a piece of news, etc. that is unexpected or that happens suddenly. SYNONYMS: shock, …, eye-opener
[uncountable, countable] a feeling caused by something happening suddenly or unexpectedly. SYNONYMS: astonishment, ...
(Oxford Advanced Learner's Dictionary)
Cat explores
meh
Cat meets unexpected
wow
Quantify Surprise!
measured in… wows?
Quantify Complexity: it can measure any content type. Note: complex is not random!
Measures of complexity:
1. Subjective rating
2. # Distinct elements
3. # Dimensions
4. # Control parameters
5. Minimal description
6. Information content
7. Minimal generator
8. Minimum energy
Abdallah, S., & Plumbley, M. (2009). Information dynamics: patterns of expectation and surprise in the perception of music. Connection Science, 21(2-3), 89-117.
Neuro/Cognitive Science: How do we perceive information?
vs
Machine Learning: How to measure differences?
Surprise Quants in academia
“... machine that constantly tells you what you already know is just irritating. So software alerts users only to surprises ...”
Horvitz, E., Apacible, J., Sarin, R., & Liao, L. Prediction, Expectation, and Surprise: Methods, Designs, and Study of a Deployed Traffic Forecasting Service.
Friston, K. (2010). The free-energy principle: a unified brain theory?. Nature Reviews Neuroscience, 11(2), 127-138.
Surprise Quants in academia
Itti, L., & Baldi, P. F. (2005). Bayesian surprise attracts human attention. In Advances in neural information processing systems (pp. 547-554).
[Figure: saliency examples from Itti & Baldi, labeled meh / wow / meh]
Typical ML applications: Unsupervised Learning
1. Decision trees (information gain)
2. MaxEnt principle
3. ...
Specifically after ‘surprise’:
4. One-class classification
5. Anomaly detection
6. Novelty measures
Pimentel, M. A., Clifton, D. A., Clifton, L., & Tarassenko, L. (2014). A review of novelty detection. Signal Processing, 99, 215-249.
Model of a cat
Data (stream) → Element (attention window) → Surprising? (interesting, new)
wow → act, and Update the Data Model (expectations)
meh → ignore
Model of a cat’s surprise
Surprising? (interesting, new)
Quantify surprisal /self-information/
The surprise /information/ in observing the occurrence of an event having probability p.
Axioms:
1. 0 ≤ p ≤ 1 and I(p) ≥ 0: information is non-negative
2. I(1) = 0: a certain event carries no surprise
3. I(p1 ∗ p2) = I(p1) + I(p2): independent events add up
Derive: the only continuous solution is I(p) = −log(p), up to the base of the logarithm.
Surprisal /self-information/: I(p) = −log2(p)
Flipping a fair coin provides 1 bit of new information:
I(1/2) = −log2(1/2) = 1 bit (or one wow)
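As a quick illustration, a minimal Python sketch of the formula above (the function name is mine, not from the talk):

```python
import math

def surprisal(p: float) -> float:
    """Self-information I(p) = -log2(p), in bits."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -math.log2(p)

print(surprisal(0.5))   # 1.0 bit: a fair coin flip
print(surprisal(1.0))   # 0.0 bits: a certain event carries no information
print(surprisal(1/6))   # ~2.585 bits: one roll of a fair die
```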
Surprisal applications: selecting an information source.
Oleksandr Pryymak. Achieving Accurate Opinion Consensus in Large Multi-Agent Systems. Doctoral Thesis, University of Southampton, 170 pp., 2013.
Model of a cat’s knowledge
Data Model (expectations)
Quantify ‘knowledge’ /entropy/
The Shannon entropy is the expected value of the self-information:
H(X) = −Σ p(x) log2(p(x))
Notes:
1. The maximum-entropy distribution is the least informative.
2. Statistical-mechanics entropy and information entropy are in principle the same quantity.
Maximum: log2(n), attained by the uniform distribution over n outcomes.
Entropy of a Bernoulli trial: X ∈ {0, 1}
H(X) = −p log2(p) − (1 − p) log2(1 − p)
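A minimal sketch of the entropy calculation (function names are mine):

```python
import math

def entropy(probs) -> float:
    """Shannon entropy H = -sum(p * log2(p)) in bits; 0*log(0) taken as 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def bernoulli_entropy(p: float) -> float:
    """Entropy of a Bernoulli trial X in {0, 1} with P(X=1) = p."""
    return entropy([p, 1.0 - p])

print(bernoulli_entropy(0.5))   # 1.0 bit: the maximum, log2(2)
print(bernoulli_entropy(0.99))  # ~0.081 bits: a nearly certain coin tells us little
```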
Entropy applications
1. Analysis of a GeoIP ISP database binary. Analyzing unknown binary files using information entropy: http://yurichev.com/blog/entropy/
2. Visualizing the OS X ksh binary (see binvis.io). Visualizing entropy in binary files: http://corte.si/posts/visualisation/entropy/index.html
1, 2: cryptic signature
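The linked posts scan files with a sliding entropy window; a sketch of that idea, where the window size and the example path are illustrative choices, not taken from the posts:

```python
import math
from collections import Counter

def window_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0..8)."""
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def entropy_profile(path: str, window: int = 4096) -> list:
    """Entropy of consecutive windows of a file, suitable for plotting."""
    with open(path, "rb") as f:
        blob = f.read()
    return [window_entropy(blob[i:i + window])
            for i in range(0, max(len(blob) - window + 1, 1), window)]

# Values near 8 bits/byte suggest compressed or encrypted regions;
# low values suggest text, code, or padding.
```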
Model of a cat’s discovery
Element (attention window) vs Data Model (expectations): Surprising? (interesting, new) → wow (act) / meh (ignore)
What has changed?
Quantify ‘discovery’ /information gain/
The Kullback–Leibler divergence /relative entropy, information gain/ is a measure of the information lost when Q is used to approximate P: the expected number of extra bits required to recode samples from P with a code built for Q.
D_KL(P || Q) = Σ P(x) log2(P(x) / Q(x))
"KL-Gauss-Example" by T. Nathan Mundhenk
Not a true metric: asymmetric, D_KL(P || Q) ≠ D_KL(Q || P).
Quantify ‘discovery surprise’
Symmetric KL distances: all result in the same performance.
Pinto, D., Benedí, J. M., & Rosso, P. (2007). Clustering narrow-domain short texts by using the Kullback-Leibler distance. In Computational Linguistics and Intelligent Text Processing.
Calculating KLD
Data sparseness problem: the divergence is often ∞ (whenever Q(x) = 0 while P(x) > 0).
Solutions:
- drop the offending components from the calculation
- smoothing (see the sketch below)
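A sketch of the smoothed calculation, assuming simple additive smoothing over the joint vocabulary (the talk does not specify which smoothing scheme was used):

```python
import math
from collections import Counter

def kld(p_counts: Counter, q_counts: Counter, alpha: float = 0.5) -> float:
    """D_KL(P || Q) in bits over the shared vocabulary; additive smoothing
    keeps every Q component non-zero, so the divergence stays finite."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)
    div = 0.0
    for w in vocab:
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        div += p * math.log2(p / q)
    return div

# Asymmetric: kld(a, b) != kld(b, a) in general.
# One simple symmetrization: kld(a, b) + kld(b, a).
```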
Surprise in Tweets: KLD application
Explore data: search engines. Elasticsearch + Kibana = faceted data exploration.
Whole dataset
(I still have hopes to find where I left this partition)
Annexation of Crimea: Feb 20 to March 20, 2014
Presidential elections: May 25, 2014
MH17: July 17, 2014
Experiment dataset: Feb 1 - 28, 2014; English; tweets: 5.64 M
Pipeline
Stream (tweets) → Timeslot (attention window) → KLD against the last 8 timeslots (data model) → interesting/new?
new event → act
meh → ignore
Update the data model (see the sketch below).
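A sketch of this loop, reusing the kld() sketch above; the stream format and the event threshold are illustrative assumptions, only the 8-slot history mirrors the slide:

```python
from collections import Counter, deque

def run_pipeline(slot_stream, history: int = 8, threshold: float = 1.0):
    """slot_stream yields (hour, token_counts) pairs in time order."""
    model = deque(maxlen=history)            # last 8 timeslots (data model)
    for hour, counts in slot_stream:         # each timeslot = attention window
        if model:
            background = sum(model, Counter())
            score = kld(counts, background)  # surprising vs expectations?
            yield hour, score, score > threshold  # wow (act) or meh (ignore)
        model.append(counts)                 # update the model either way
```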
Simplistic topic modeling
- tweets are super short
+ important events are widely discussed
+ events change vocabulary
- timeslot aggregation favors the predominant event
Document is a time slot. Model:
- bag of words
- frequency threshold: > 200 tweets
- term frequency (naive)
- tokenizer: https://github.com/jaredks/tweetokenize + a few touches
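A sketch of the per-slot model, assuming tweets arrive as (hour, text) pairs; the tokenizer is the linked tweetokenize package as published, without the talk’s extra touches:

```python
from collections import Counter, defaultdict
from tweetokenize import Tokenizer  # https://github.com/jaredks/tweetokenize

tokenizer = Tokenizer()

def timeslot_models(tweets, min_tweets: int = 200):
    """Bag-of-words term counts per time slot (one 'document' per slot);
    a token is kept only if it appears in more than min_tweets tweets."""
    slots = defaultdict(Counter)
    for hour, text in tweets:
        slots[hour].update(set(tokenizer.tokenize(text)))  # doc frequency
    return {hour: Counter({w: c for w, c in counts.items() if c > min_tweets})
            for hour, counts in slots.items()}
```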
Vocabulary diversity: follows daily cycles.
(gap in the data: ran out of disk space)
Test a domain-specific hack
Vocabulary: ‘catastrophe’
…
Vocabulary slots: KLD. How surprising the vocabulary of each hour is against the whole dataset.
Beware: on this scale individual hours are small, but events are plentiful
Higher KLD on sparse data
Lower KLD on dense data
Vocabulary slots: KLD smoothed. Smoothing did not change the peaks; a new minimum appears.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 24h.
Less variation on dense data.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 8h.
Vocabulary slots: rolling KLD. How surprising the vocabulary of each hour is against the last 4h.
Event Detection Problem
Outlier detection:
- rate of change of the ‘surprise’
Compare against:
Rolling KLD outliers. Events: detected by rate change.
Rolling KLD outlier tokens: annotate events with the most surprising tokens (see the sketch below).
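A sketch of both steps under stated assumptions: the detector flags hours where the KLD jump exceeds k standard deviations of recent jumps, and events are annotated with the tokens that contribute most to the divergence. The values of k and window, and the contribution-based ranking, are my choices, not the talk’s:

```python
import math
import statistics
from collections import Counter

def detect_events(klds, k: float = 2.0, window: int = 8):
    """klds: list of (hour, kld_value) pairs in time order."""
    hours = [h for h, _ in klds]
    values = [v for _, v in klds]
    deltas = [b - a for a, b in zip(values, values[1:])]  # rate of change
    events = []
    for i in range(window, len(deltas)):
        sigma = statistics.pstdev(deltas[i - window:i]) or 1e-9
        if deltas[i] > k * sigma:
            events.append(hours[i + 1])  # the hour where the jump lands
    return events

def top_surprising_tokens(p_counts: Counter, q_counts: Counter,
                          alpha: float = 0.5, n: int = 10):
    """Tokens with the largest pointwise contribution p*log2(p/q) to
    D_KL(P || Q), with the same additive smoothing as kld() above."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)

    def contrib(w):
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        return p * math.log2(p / q)

    return sorted(vocab, key=contrib, reverse=True)[:n]
```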
Further dataset limitation: prime events
Rolling KLD outliers: Feb 19-28
Find representative tweets
Timeslot (attention window) vs the last 8 timeslots (data model):
KLD → surprising tweets; −KLD → least surprising. Update the surprising tweets.
1. Detect distinct features.
2. Find elements representing the distinct features.
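One plausible reading of this step as code: score each tweet by the summed pointwise KL contributions of its tokens, so the top of the ranking represents the slot’s distinct features and the bottom is the least surprising. The exact scoring scheme is an assumption, not a verified reimplementation:

```python
import math
from collections import Counter

def rank_tweets(tokenized_tweets, p_counts: Counter, q_counts: Counter,
                alpha: float = 0.5):
    """tokenized_tweets: (text, tokens) pairs for one time slot.
    P is the slot distribution, Q the data-model distribution."""
    vocab = set(p_counts) | set(q_counts)
    p_total = sum(p_counts.values()) + alpha * len(vocab)
    q_total = sum(q_counts.values()) + alpha * len(vocab)

    def contrib(w):
        p = (p_counts[w] + alpha) / p_total
        q = (q_counts[w] + alpha) / q_total
        return p * math.log2(p / q)

    scored = [(sum(contrib(w) for w in set(tokens) if w in vocab), text)
              for text, tokens in tokenized_tweets]
    return sorted(scored, key=lambda st: st[0], reverse=True)
```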
Surprising tweets (link ➥)
Only from users with 500+ followers.
The only spam/bot tweet selected is from the first time slot, when the prior is uniform. Notice: the dataset is not filtered!
majdannezalezhnosti.blogspot.com
To improve in the Tweets app:
1. Benchmark: ‘hot’ events from the media
2. Fight bots:
   a. spam (repetitions, bots)
   b. ‘forced’ opinions
   c. filter low quality
3. Topic model:
   a. not just term frequency
   b. split topics (!)
Questions?
art by www.facebook.com/Marysya.Rudska