human-centric data exploration · 2019-09-13 · human-centric data exploration part 1/5:...

HUMAN-CENTRIC DATA EXPLORATION PART 1/5: MOTIVATION, BACKGROUND & OUTLINE Tijl De Bie slides in collaboration with Jefrey Lijffijt Ghent University Based on joint work with many others (see references and final slide of this lecture) DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS) RESEARCH GROUP IDLAB www.forsied.net 1


Post on 01-Aug-2020


Page 1: HUMAN-CENTRIC DATA EXPLORATION · 2019-09-13 · HUMAN-CENTRIC DATA EXPLORATION PART 1/5: MOTIVATION, BACKGROUND & OUTLINE Tijl De Bie –slides in collaboration with Jefrey Lijffijt

HUMAN-CENTRIC

DATA EXPLORATION

PART 1/5: MOTIVATION, BACKGROUND & OUTLINE

Tijl De Bie – slides in collaboration with Jefrey Lijffijt

Ghent University

Based on joint work with many others (see references and final slide of this lecture)

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net 1


SUBJECTIVITY

AND VISUALIZATION

PART 1/5: MOTIVATION, BACKGROUND & OUTLINE

Tijl De Bie – slides in collaboration with Jefrey Lijffijt

Ghent University

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net 2


MOTIVATION

3


SUBJECTIVITY = KEY

Three motivating examples:

1. Frequent itemset mining

‒ Individually frequent items = probably frequent together

2. Visualizing high-dimensional data

‒ Outliers = high variance, so why maximize it in PCA?

‒ Interaction = key

3. Graph embedding

‒ High degree nodes = probably embedded centrally

4

ASSOCIATION ANALYSIS / ITEMSET MINING

KDD abstracts dataset: [Figure: binary data matrix with axes labelled "documents" and "words"]

Subjective interestingness ranking                         Support x size (area)
(prior info on: row & column sums)
Itemset                                           #docs    Itemset               #docs
svm, support, machin, vector                         25    data, paper             389
state, art                                           39    algorithm, propose      246
unlabelled, labelled, supervised, learn              10    data, mine              312
associ, rule, mine                                   36    base, method            202
gene, express                                        25    result, show            196
frequent, itemset                                    28    problem                 373
large, social, network, graph                        15    data, set               279
column, row                                          13    approach                330
algorithm, order, magnitud, faster                   12    model                   301
paper, propos, algorithm, real, synthetic, data      27    present                 296


ASSOCIATION ANALYSIS / ITEMSET MINING

KDD abstracts dataset: [Figure: binary data matrix with axes labelled "documents" and "words"]

Subjective interestingness ranking under two sets of prior beliefs:

Prior info on: row & column sums                           Additionally prior info on: keyword tiles
Itemset                                           #docs    Itemset                     #docs
svm, support, machin, vector                         25    art, state                     39
state, art                                           39    row, column, algorithm         12
unlabelled, labelled, supervised, learn              10    unlabelled, labelled, data     14
associ, rule, mine                                   36    answer, question               18
gene, express                                        25    Precis, recal                  14


CONDITIONAL NETWORK EMBEDDINGS

8

EXPLORING DATA

The search for interesting patterns in data

• Association analysis: frequency, lift, confidence, leverage, coverage, ...
• Dimensionality reduction: PCA, ICA, projection pursuit, Laplacian Eigenmaps, tSNE, LLE, …
• Graph embedding: Node2Vec, Path2Vec, MetaPath2Vec, ...
• Clustering: K-means clustering, hierarchical clustering, Mixture of Gaussians, spectral clustering, …
• Community detection: stochastic block modelling, modularity, k-cores, quasi-cliques, dense subgraphs, …
• Privacy-preserving data publishing: discernibility, generalization height, average group size, ...
• …

Zillions of ‘interestingness measures’ (objective functions, quality functions, utility functions, cost functions)

THE CHALLENGE

Zillions of interestingness measures = good & bad
‒ Good: more options!
‒ Bad: the trees & the forest…

Challenge:
‒ Formalise true interestingness!
‒ With minimal user interaction
‒ Without requiring user expertise


MOTIVATING EXAMPLE

Community detection:

What makes for an interesting community?

‒ Densely connected?

‒ Large?

‒ Few neighbours outside community?

‒ Unrelated to certain known ‘affiliations’?

‒ …

11

THE FORSIED APPROACH: SUBJECTIVITY!

[Diagram labels: Data; Data mining researcher; Data analyst. The researcher's view is Interestingness(pattern); including the analyst gives Interestingness(pattern, analyst).]

Interestingness = subjective

MOTIVATING EXAMPLE

Community detection:

User states expectations / beliefs
‒ Formalized as a ‘background distribution’

Any ‘pattern’ that contrasts with this and is easy to describe = subjectively interesting


OUTLINE

15

OUTLINE

Part 1: Introduction and motivation (10 mins)
Part 2: The FORSIED framework (40 mins)
Part 3: Binary matrices, graphs, and relational data (40 mins)
BREAK
Part 4: Numeric and mixed data (including high-dimensional data visualization) (50 mins)
Part 5: Advanced topics, outlook & conclusions (25 mins)
Q&A (15 mins)

Feel free to interrupt for questions anytime


HUMAN-CENTRIC

DATA EXPLORATION

PART 2/5: THE FORSIED FRAMEWORK

Tijl De Bie – slides in collaboration with Jefrey Lijffijt

Ghent University

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net 1



GENERIC FRAMEWORK

3

De Bie, KDD 2011

De Bie, DAMI 2011

FORSIED(*) framework

(*) Formalizing Subjective Interestingness in Exploratory Data mining

[Diagram labels: Data; Patterns; Model of user beliefs (“background distribution”) 𝑃 over the data space Ω; a pattern states that the data 𝑥 ∈ Ω′; 𝑃 evolves (𝑃, 𝑃′, 𝑃′′ for Ω, Ω′, Ω′′); Pattern; User; Subjective!]

• Data: the adjacency matrix of a graph under study
• Patterns: the claim that a specified set of nodes are densely connected
• Prior beliefs: the degrees of the nodes, a known block structure, ...
• Interestingness: subjective information density
• Overlapping communities!

$$\text{Interestingness}(\Omega', P) = \frac{\text{InformationContent}(\Omega', P)}{\text{DescriptionLength}(\Omega')}$$

$$\text{InformationContent}(\Omega', P) = -\log P(\Omega')$$

SI = IC/DL

THE FINE PRINT

Initial background distribution 𝑃?
‒ Maximum entropy distribution s.t. prior belief constraints:

$$\max_{P} \; E_{X \sim P}\left[-\log P(X)\right] \quad \text{s.t.} \quad E_{X \sim P}\left[f(X)\right] = c_f \;\; (\forall f)$$

Updated background distribution 𝑃′ given pattern 𝑥 ∈ Ω′?
‒ 𝑃 conditioned onto event 𝑥 ∈ Ω′:

$$P'(\Omega'') = \frac{P(\Omega'' \cap \Omega')}{P(\Omega')} \;\Rightarrow\; -\log P'(x) = -\log P(x) + \log P(\Omega')$$

‒ Here $-\log P'(x)$ is the information content in the data after the pattern, $-\log P(x)$ is the information content in the data before the pattern, and $\log P(\Omega')$ is minus the information content of the pattern under 𝑃.

Description length?
‒ Smaller if the pattern is a better explanation
‒ Essentially problem-dependent
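The conditioning identity above can be checked numerically. The following is a minimal sketch on a hypothetical four-element data space; the probabilities and the pattern are made up for illustration and do not come from the slides.

```python
import math

# Hypothetical background distribution P over a toy data space Omega.
P = {"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125}

# Hypothetical pattern: the data x is revealed to lie in Omega' = {b, c}.
omega_prime = {"b", "c"}

# Probability of the pattern under P, and its information content.
p_pattern = sum(P[x] for x in omega_prime)
ic_pattern = -math.log(p_pattern)

# Updated background distribution P': P conditioned on the event x in Omega'.
P_new = {x: (P[x] / p_pattern if x in omega_prime else 0.0) for x in P}

# Information content of a data point inside Omega', before and after the pattern.
x = "b"
ic_before = -math.log(P[x])
ic_after = -math.log(P_new[x])

# The two printed values agree: IC after = IC before - IC of the pattern.
print(ic_after, ic_before - ic_pattern)
```

Showing the pattern removes exactly its own information content from what remains to be learned about the data.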

WHY MAXIMUM ENTROPY / CONDITIONING?

Most unbiased estimate
‒ Informal... no bias other than the constraints

Assume cautious / pessimistic user
‒ A user who expects to be very surprised

Leads to most robust estimate of true subjective information content
‒ Information content estimated with maxent 𝑃 will never differ much from information content w.r.t. true prior belief of user


A FIRST INSTANTIATION:COMMUNITY DETECTION

7

van Leeuwen, De Bie, Spyropoulou, Mesnage, MLj, 2016

COMMUNITY DETECTION IN NETWORKS

Data: Graph (adjacency matrix 𝑨 with edge indicator variables 𝑎𝑖𝑗)

Prior beliefs:
1. Overall density
2. or: Vertex degrees

MaxEnt distribution:

$$P(A) = \prod_{i>j} P_{i,j}(a_{ij}), \qquad P_{i,j}(a_{ij}) = \frac{\exp\left(a_{ij} \cdot (\lambda_i + \lambda_j)\right)}{1 + \exp(\lambda_i + \lambda_j)}$$

[Figure annotations: regions where $P_{i,j}(a_{ij})$ is small; regions where $P_{i,j}(a_{ij})$ is large]

COMMUNITY DETECTION IN NETWORKS

Data: Graph

Prior beliefs:
1. Overall density
2. or: Vertex degrees

Pattern: Dense subgraphs

$$\sum_{i,j \in \text{subgraph}} a_{ij} \ge k$$

[Figure annotations: regions where $P_{i,j}(a_{ij})$ is small; regions where $P_{i,j}(a_{ij})$ is large]

COMMUNITY DETECTION IN NETWORKS

Data: Graph

Prior beliefs:
1. Overall density
2. or: Vertex degrees

Pattern: Dense subgraphs

Interestingness:

$$\frac{-\log P(\text{pattern})}{\text{DescriptionLength}(\text{pattern})}$$

[Figure annotations: regions where $P_{i,j}(a_{ij})$ is small; regions where $P_{i,j}(a_{ij})$ is large]

COMMUNITY DETECTION IN NETWORKS

Data: Graph

Prior beliefs:
1. Overall density
2. or: Vertex degrees

Pattern: Dense subgraphs

Interestingness: Density vs. size
2. preferably low degree nodes

Hill-climbing for search

Update 𝑷 after each pattern

[Figures: most interesting subgraph given 1.; most interesting subgraph given 2.]

COMMUNITY DETECTION IN NETWORKS

Data: Graph

Prior beliefs:
1. Overall density
2. or: Vertex degrees

Pattern: Dense subgraphs

Interestingness: Density vs. size
2. preferably low degree nodes

Hill-climbing for search

Update 𝑷 after each pattern

[Figure: communities labelled Rock, Trance, Indie, Bhangra, Gospel, Country, Hip hop / grime, Afro pop, UK garage, Hip hop]

TAKE-AWAYS

1. What is the data?
2. Determine suitable pattern syntax
3. What are the prior beliefs? (= what is irrelevant to user?)
   Compute background distribution 𝑷 using maximum entropy
4. Formulate subjective interestingness:

$$\text{Interestingness}(\Omega', P) = \frac{\text{InformationContent}(\Omega', P)}{\text{DescriptionLength}(\Omega')}, \qquad \text{InformationContent}(\Omega', P) = -\log P(\Omega')$$

5. Design an algorithm to optimize it
6. Find out how to condition background distribution on a pattern


THE BACKGROUND DISTRIBUTION: MAXENT

15

MAXENT MODEL S.T. DEGREE BELIEFS

𝑨 = adjacency matrix with 𝑎𝑖𝑗 in row 𝑖 and column 𝑗

$$\max_{P} \; \sum_{A} -P(A) \log P(A) \qquad \text{(entropy)}$$

$$\text{s.t.} \quad \sum_{A} P(A) \sum_{j=1:n} a_{ij} = d_i, \;\; \forall i = 1:n \qquad \text{(expected degree constraint)}$$

$$\sum_{A} P(A) = 1 \qquad \text{(normalization)}$$

Convex!

Lagrangian, with Lagrange multipliers $\lambda$ and $\mu$:

$$L(P, \lambda, \mu) = \sum_{A} -P(A) \log P(A) + \sum_{i=1:n} \lambda_i \left( \sum_{A} P(A) \sum_{j=1:n} a_{ij} - d_i \right) + \mu \left( \sum_{A} P(A) - 1 \right)$$

MAXENT MODEL S.T. DEGREE BELIEFS

$$L(P, \lambda, \mu) = \sum_{A} -P(A) \log P(A) + \sum_{i=1:n} \lambda_i \left( \sum_{A} P(A) \sum_{j=1:n} a_{ij} - d_i \right) + \mu \left( \sum_{A} P(A) - 1 \right)$$

Optimality condition: $\frac{\partial}{\partial P(A)} L(P, \lambda, \mu) = 0$

$$\frac{\partial}{\partial P(A)} L(P, \lambda, \mu) = -\log P(A) - 1 + \sum_{i,j=1:n} \lambda_i a_{ij} + \mu = 0$$

MAXENT MODEL S.T. DEGREE BELIEFS

$$\frac{\partial}{\partial P(A)} L(P, \lambda, \mu) = -\log P(A) - 1 + \sum_{i,j=1:n} \lambda_i a_{ij} + \mu = 0$$

So:

$$P(A) = \exp(\mu - 1) \cdot \exp\left( \sum_{i,j=1:n} \lambda_i a_{ij} \right) = \frac{1}{Z(\lambda)} \exp\left( \sum_{i>j} (\lambda_i + \lambda_j) a_{ij} \right) = \frac{1}{Z(\lambda)} \prod_{i>j} \exp\left( (\lambda_i + \lambda_j) \cdot a_{ij} \right)$$

$$= \prod_{i>j} \frac{\exp\left( (\lambda_i + \lambda_j) \cdot a_{ij} \right)}{1 + \exp(\lambda_i + \lambda_j)} = \prod_{i>j} P_{i,j}(a_{ij})$$

Product of independent Bernoulli distributions!

Thanks to fact that prior belief constraint is on a (weighted) sum of the 𝑎𝑖𝑗

MAXENT MODEL S.T. DEGREE BELIEFS

To find optimal values of Lagrange multipliers, solve the dual:

$$\min_{\lambda} L(P, \lambda)$$

where 𝑃 is given as:

$$P(A) = \prod_{i>j} \frac{\exp\left( (\lambda_i + \lambda_j) \cdot a_{ij} \right)}{1 + \exp(\lambda_i + \lambda_j)}$$

After some calculations:

$$\min_{\lambda} \; \sum_{i>j} \log\left( 1 + \exp(\lambda_i + \lambda_j) \right) - \sum_{i=1:n} \lambda_i d_i$$

MAXENT MODEL S.T. DEGREE BELIEFS

$$\min_{\lambda} \; \sum_{i>j} \log\left( 1 + \exp(\lambda_i + \lambda_j) \right) - \sum_{i=1:n} \lambda_i d_i$$

Can be solved using gradient descent:

$$\frac{\partial}{\partial \lambda_k} \left[ \sum_{i>j} \log\left( 1 + \exp(\lambda_i + \lambda_j) \right) - \sum_{i=1:n} \lambda_i d_i \right] = \sum_{i=1:n} \frac{\exp(\lambda_i + \lambda_k)}{1 + \exp(\lambda_i + \lambda_k)} - d_k$$

The first term is the expected degree of node 𝒌; the second term is the required expected degree of node 𝒌.

Lots of computational speed-ups possible...
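The gradient descent step can be sketched in a few lines. The following is a minimal illustration, not the implementation used by the authors: the function names, step size, iteration count and the example degree sequence are all assumptions, and none of the speed-ups mentioned on the slide are attempted. Self-loops are excluded, so the sum runs over 𝑖 ≠ 𝑘.

```python
import math

def edge_probabilities(lam):
    """P_ij(1) = exp(lam_i + lam_j) / (1 + exp(lam_i + lam_j)), with no self-loops."""
    n = len(lam)
    return [[0.0 if i == j else 1.0 / (1.0 + math.exp(-(lam[i] + lam[j])))
             for j in range(n)] for i in range(n)]

def fit_degree_maxent(degrees, lr=0.1, n_iter=5000):
    """Plain gradient descent on the dual; lr and n_iter are illustrative choices."""
    lam = [0.0] * len(degrees)
    for _ in range(n_iter):
        p = edge_probabilities(lam)
        # Gradient for node k: expected degree minus required expected degree.
        lam = [l - lr * (sum(row) - d) for l, row, d in zip(lam, p, degrees)]
    return lam, edge_probabilities(lam)

# Hypothetical degree beliefs for a 5-node graph.
degrees = [3, 2, 2, 2, 1]
lam, p = fit_degree_maxent(degrees)
expected_degrees = [sum(row) for row in p]
print([round(e, 4) for e in expected_degrees])
```

At convergence the expected degrees under the fitted product of Bernoulli distributions match the stated degree beliefs, and edges between high-degree nodes get higher probability than edges between low-degree nodes.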

TAKE-AWAYS

Constraints on the expected value of weighted averages:

$$E_{A \sim P}\left[ \sum_{(i,j) \in I} f_{ij} a_{ij} \right] = c$$

where $f_{ij}$ and $c$ are constants and $I$ is a set of indices, lead to convenient product distributions

Other examples for graphs:
‒ Overall density (trivial)
‒ Densities of particular blocks (e.g. block of nodes with same affiliation)
‒ Assortativity (approximately)
‒ ...


THE INTERESTINGNESS

25

INFORMATION CONTENT

Information content:

$$\text{InformationContent}(\text{pattern}, P) = -\log P(\text{pattern})$$

Pattern: “number of edges between given set of nodes 𝑊 ⊆ 𝑉 is larger than or equal to a specified $k_W$”
‒ A bit tricky...

Cliques as a special case: “set of nodes 𝑊 ⊆ 𝑉 forms a clique”. Then:

$$P(\text{pattern}) = \prod_{i>j \in W} P_{i,j}(1)$$

So:

$$\text{InformationContent}(\text{pattern}, P) = -\log P(\text{pattern}) = -\sum_{i>j \in W} \log P_{i,j}(1)$$

‒ Larger if $|W|$ is larger and if $P_{i,j}(1)$ for $i, j \in W$ is smaller

INFORMATION CONTENT

Pattern (general case): “number of edges between given set of nodes 𝑊 ⊆ 𝑉 is larger than or equal to a specified $k_W$”

Probability of at least $k_W$ successes in $n_W = \binom{|W|}{2}$ Bernoulli trials?

Approximated by:

$$P(\text{pattern}) \approx \exp\left( -n_W \, \mathrm{KL}\left( \frac{k_W}{n_W} \,\Big\|\, p_W \right) \right)$$

where $p_W$ is the average probability $P_{i,j}(1)$ for the edges between $i, j \in W$

And thus:

$$\text{InformationContent}(\text{pattern}, P) = -\log P(\text{pattern}) \approx n_W \, \mathrm{KL}\left( \frac{k_W}{n_W} \,\Big\|\, p_W \right)$$

‒ Larger if $|W|$ (and thus $n_W$) is larger, $p_W$ is smaller, and $k_W$ is larger
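As a small numerical illustration of the two formulas, the sketch below computes the exact information content of a clique and the KL-based approximation for a dense subgraph. The function names and example values are assumptions made for illustration. The KL is taken between two Bernoulli distributions, and the approximation is only meaningful when $k_W / n_W$ is at least $p_W$.

```python
import math

def kl_bernoulli(q, p):
    """KL(q || p) between Bernoulli(q) and Bernoulli(p), for 0 < p < 1."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def ic_clique(edge_probs):
    """Exact IC of a clique: minus the sum of log P_ij(1) over node pairs in W."""
    return -sum(math.log(p) for p in edge_probs)

def ic_dense_subgraph(n_nodes, k_w, p_w):
    """Approximate IC of 'at least k_w edges among n_nodes nodes'."""
    n_w = math.comb(n_nodes, 2)
    return n_w * kl_bernoulli(k_w / n_w, p_w)

# Hypothetical example: 4 nodes (6 node pairs), every edge has prior probability 0.2.
print(ic_clique([0.2] * 6))           # clique, exact
print(ic_dense_subgraph(4, 6, 0.2))   # same pattern through the KL formula
print(ic_dense_subgraph(4, 5, 0.2))   # one edge fewer: less informative
```

When all edge probabilities are equal and $k_W = n_W$, the KL expression reduces to $-n_W \log p_W$, so the two formulas agree for cliques in that case.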

DESCRIPTION LENGTH

For cliques:
‒ Describe set 𝑊
‒ A constant (to describe $|W|$) plus a linear term in $|W|$ (to describe its elements):

$$\text{DescriptionLength}(\text{pattern}) = \alpha |W| + \beta$$

For dense subgraphs:
‒ Constant $\beta$ also describes $k_W$

INTERESTINGNESS

Putting things together:

$$\text{Interestingness}(\text{pattern}, P) = \frac{-\sum_{i>j \in W} \log P_{i,j}(1)}{\alpha |W| + \beta}$$

A bit more complex for general dense subgraphs:

$$\text{Interestingness}(\text{pattern}, P) \approx \frac{n_W \, \mathrm{KL}\left( \frac{k_W}{n_W} \,\Big\|\, p_W \right)}{\alpha |W| + \beta}$$

Hard to optimize!
‒ Exact search for small graphs
‒ Effective hill climber for large graphs
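To make the search problem concrete, here is a toy hill climber over node sets. It is only a sketch under stated assumptions, not the hill climber used in the cited work: the background distribution is a single overall density `p`, `alpha` and `beta` are set to 1, the graph is made up, and each move toggles one node in or out of the set.

```python
import math
from itertools import combinations

def kl_bernoulli(q, p):
    """KL(q || p) between Bernoulli(q) and Bernoulli(p), for 0 < p < 1."""
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def interestingness(nodes, adj, p, alpha=1.0, beta=1.0):
    """Approximate SI = n_W * KL(k_W / n_W || p) / (alpha * |W| + beta)."""
    n_w = len(nodes) * (len(nodes) - 1) // 2
    if n_w == 0:
        return 0.0
    k_w = sum(1 for u, v in combinations(sorted(nodes), 2) if v in adj[u])
    q = k_w / n_w
    if q <= p:  # not denser than expected: treat as uninteresting
        return 0.0
    return n_w * kl_bernoulli(q, p) / (alpha * len(nodes) + beta)

def hill_climb(adj, p, seed):
    """Toggle single nodes in or out of the set while the score improves."""
    current, best = set(seed), interestingness(seed, adj, p)
    improved = True
    while improved:
        improved = False
        for v in adj:
            candidate = current ^ {v}
            score = interestingness(candidate, adj, p)
            if score > best + 1e-12:
                current, best, improved = candidate, score, True
    return current, best

# Hypothetical graph: a 4-clique {0, 1, 2, 3} with a path 3 - 4 - 5 attached.
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4}, 4: {3, 5}, 5: {4}}
p = 8 / 15  # overall density: 8 edges out of 15 node pairs
best_nodes, best_score = hill_climb(adj, p, seed={0, 1})
print(best_nodes, round(best_score, 4))
```

Starting from one edge, the climber grows the set to the 4-clique and stops there, because adding node 4 or 5 lowers the density enough to reduce the score. A local search like this can get stuck in local optima on larger graphs.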


TAKE-AWAYS

No compromises w.r.t. interestingness

Often leads to hard search problems

Question: is this intrinsic to genuine subjective

interestingness?

30


UPDATING THE BACKGROUND DISTRIBUTION

31

UPDATING THE BACKGROUND DISTRIBUTION

Given a pattern, update the background distribution by conditioning on the pattern

Easy to do for cliques 𝑊:
‒ Set $P'_{i,j}(a_{ij}) = 1$ for $a_{ij} = 1$ if $i, j \in W$

Fast to approximate for (non-clique) dense subgraphs 𝑊:
‒ Set $P'_{i,j}(a_{ij}) \propto P_{i,j}(a_{ij}) \cdot \exp(\lambda_W a_{ij})$ if $i, j \in W$, such that the expected density of 𝑊 is $k_W$

Remains a product of Bernoullis
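The approximate update for dense subgraphs can be sketched as a one-dimensional search for $\lambda_W$. The code below is a minimal illustration under assumptions: it reads "expected density of 𝑊 is $k_W$" as "the expected number of edges inside 𝑊 equals $k_W$", it uses bisection (the slides do not say how $\lambda_W$ is found), and the edge probabilities are made up. It requires 0 < `k_w` < number of node pairs in 𝑊.

```python
import math

def tilt(p, lam_w):
    """Updated P'_ij(1), proportional to P_ij(1) * exp(lam_w)."""
    w = p * math.exp(lam_w)
    return w / (1.0 - p + w)

def update_dense_subgraph(edge_probs, k_w, tol=1e-10):
    """Find lam_w by bisection so that the expected edge count in W equals k_w."""
    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if sum(tilt(p, mid) for p in edge_probs) < k_w:
            lo = mid  # expected count too low: increase lam_w
        else:
            hi = mid
    lam_w = (lo + hi) / 2.0
    return lam_w, [tilt(p, lam_w) for p in edge_probs]

# Hypothetical prior probabilities P_ij(1) for the 6 node pairs of a 4-node set W.
edge_probs = [0.2, 0.3, 0.1, 0.4, 0.25, 0.15]
lam_w, new_probs = update_dense_subgraph(edge_probs, k_w=4)
print(round(lam_w, 4), round(sum(new_probs), 6))
```

Each pair in 𝑊 keeps its own Bernoulli distribution, so the updated background distribution is still a product of Bernoullis. For a clique no search is needed: every $P'_{i,j}(1)$ inside 𝑊 is simply set to 1.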


TAKE-AWAYS

Updating can be trivial

Otherwise, often easy to do approximately

33


HUMAN-CENTRIC

DATA EXPLORATION

PART 3/5: BINARY MATRICES, GRAPHS, RELATIONAL DATA

Tijl De Bie – slides in collaboration with Jefrey Lijffijt

Ghent University

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net 1



OUTLINE OF THIS PART

(Community detection)

Itemsets

Relational patterns

Connecting trees

Network embedding

3


ITEMSETS

4


ITEMSETS

Data: binary matrix $X \in \{0,1\}^{m \times n}$

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

De Bie, DMKD 2011


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

6

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

De Bie, DMKD 2011


ITEMSETS

7

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Data: binary matrix:

Prior beliefs: row and column sums

Background distribution 𝑃
- Nonnegative and properly normalized
- Has correct marginals

Many solutions!?

‘Unbiased’ distribution: Maximum Entropy distribution

De Bie, DMKD 2011

ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Background distribution 𝑃

8

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

$$P(X) = \prod_{i,j} P_{i,j}(x_{ij}), \qquad P_{i,j}(x_{ij}) = \frac{\exp\left( x_{ij} \cdot (\mu_i + \lambda_j) \right)}{1 + \exp(\mu_i + \lambda_j)}$$

De Bie, DMKD 2011

Convex

optimization

problem

‘Unbiased’ distribution: Maximum Entropy distribution
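For the shopping-basket table, the row and column sums on the slide are enough to fit this model. The sketch below uses plain gradient descent on the convex dual, by analogy with the graph case in Part 2; the function names, step size and iteration count are assumptions, and this is not necessarily the solver used in De Bie, DMKD 2011.

```python
import math

def cell_probabilities(mu, lam):
    """P_ij(1) = exp(mu_i + lam_j) / (1 + exp(mu_i + lam_j))."""
    return [[1.0 / (1.0 + math.exp(-(m + l))) for l in lam] for m in mu]

def fit_margin_maxent(row_sums, col_sums, lr=0.1, n_iter=5000):
    """Gradient descent on the dual; lr and n_iter are illustrative choices."""
    mu = [0.0] * len(row_sums)
    lam = [0.0] * len(col_sums)
    for _ in range(n_iter):
        p = cell_probabilities(mu, lam)
        row_exp = [sum(row) for row in p]
        col_exp = [sum(col) for col in zip(*p)]
        # Gradient: expected margin minus the margin stated as prior belief.
        mu = [m - lr * (e - r) for m, e, r in zip(mu, row_exp, row_sums)]
        lam = [l - lr * (e - c) for l, e, c in zip(lam, col_exp, col_sums)]
    return mu, lam, cell_probabilities(mu, lam)

# Margins from the slide: rows Alice..Frankie, columns Beer, Diapers, Lipstick, Carrier.
row_sums = [3, 3, 2, 2, 2, 2]
col_sums = [4, 3, 2, 5]
mu, lam, p = fit_margin_maxent(row_sums, col_sums)
print([round(sum(row), 4) for row in p])
print([round(sum(col), 4) for col in zip(*p)])
```

Under the fitted distribution a one in the Carrier column is much more probable than a one in the Lipstick column, so a tile covering Lipstick carries more information per cell.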


ITEMSETS

Data: binary matrix:

Prior beliefs: uniform at observed density

9

$X \in \{0,1\}^{m \times n}$

De Bie, DMKD 2011

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

$$P_{i,j} = \frac{14}{24} \;\; \forall i, j$$

$$P_{i,j}(x_{ij}) = \frac{\exp(x_{ij} \cdot \lambda)}{1 + \exp(\lambda)}, \qquad \lambda = 0.3365$$
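The value of 𝜆 follows directly from the observed density: 14 ones in a 6 × 4 matrix. A short check, with variable names chosen for illustration:

```python
import math

n_ones, n_cells = 14, 6 * 4   # total of the SUM row, and matrix size
p = n_ones / n_cells          # probability of a one in any cell
lam = math.log(p / (1 - p))   # solves exp(lam) / (1 + exp(lam)) = p

print(round(lam, 4))
```

This is the log-odds of the density, $\log(14/10)$.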


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles (set of rows and columns)

10

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Large support may not be

interesting

?

De Bie, DMKD 2011


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

11

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Large surface may not be

interesting

?

De Bie, DMKD 2011


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

12

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Less expected smaller row

and column margins

Bingo!

De Bie, DMKD 2011


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

As stated SI = IC / DL

13

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Less expected smaller row

and column margins

Bingo!

De Bie, DMKD 2011

$$\mathrm{IC}(Z) = -\log \Pr(Z) = \sum_{(i,j) \in Z} -\log p_{i,j}$$

$$\mathrm{DL}(Z) = a \, (\#\text{rows} + \#\text{columns}) + b$$
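The tile score can be computed directly from these two formulas. In the sketch below the function names are illustrative, the constants `a` and `b` are set to 1 as an assumption, and the cell probabilities are hypothetical rather than fitted values.

```python
import math

def tile_ic(cell_probs):
    """IC(Z): sum of -log p_ij over the cells of the tile (all claimed to be ones)."""
    return -sum(math.log(p) for p in cell_probs)

def tile_dl(n_rows, n_cols, a=1.0, b=1.0):
    """DL(Z) = a * (#rows + #columns) + b; a and b are assumed constants."""
    return a * (n_rows + n_cols) + b

def tile_si(cell_probs, n_rows, n_cols, a=1.0, b=1.0):
    """Subjective interestingness SI = IC / DL."""
    return tile_ic(cell_probs) / tile_dl(n_rows, n_cols, a, b)

# Two hypothetical 2 x 2 tiles: one over unlikely cells, one over likely cells.
print(tile_si([0.3] * 4, 2, 2))
print(tile_si([0.8] * 4, 2, 2))
```

Both tiles have the same description length, so the one covering cells with smaller row and column margins, and hence smaller $p_{i,j}$, scores higher.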


Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

14

𝑿 ∈ {0,1}𝑚×𝑛 Iterate

De Bie, DMKD 2011

We specified there are ones here


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

15

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Iterate

De Bie, DMKD 2011


ITEMSETS

Data: binary matrix:

Prior beliefs: row and column sums

Patterns: tiles

16

𝑿 ∈ {0,1}𝑚×𝑛

Beer Diapers Lipstick Carrier SUM

Alice 1 1 1 3

Bob 1 1 1 3

Charlie 1 1 2

Denise 1 1 2

Eve 1 1 2

Frankie 1 1 2

SUM 4 3 2 5

Iterate

De Bie, DMKD 2011

ITEMSETS

KDD abstracts dataset: [Figure: binary data matrix with axes labelled "documents" and "words"]

Subjective interestingness ranking                         Support x size (area)
(prior info on: row & column sums)
Itemset                                           #docs    Itemset               #docs
svm, support, machin, vector                         25    data, paper             389
state, art                                           39    algorithm, propose      246
unlabelled, labelled, supervised, learn              10    data, mine              312
associ, rule, mine                                   36    base, method            202
gene, express                                        25    result, show            196
frequent, itemset                                    28    problem                 373
large, social, network, graph                        15    data, set               279
column, row                                          13    approach                330
algorithm, order, magnitud, faster                   12    model                   301
paper, propos, algorithm, real, synthetic, data      27    present                 296

De Bie, DMKD 2011


ITEMSETS

Subjective interestingness ranking

Prior info on:Row & column sums

#docs

Subjective interestingnessrankingAdditionally prior info on:Keyword tiles

#docs

svm, support, machin, vector 25 art, state 39

state, art 39 row, column, algorithm 12

unlabelled, labelled, supervised, learn 10 unlabelled, labelled, data 14

associ, rule, mine 36 answer, question 18

gene, express 25 Precis, recal 14

words

docum

ents

KDD abstracts dataset:

De Bie, DMKD 2011

ITEMSETS

Data: binary matrix $X \in \{0,1\}^{m \times n}$

Prior beliefs: row and column sums

Background distribution $P$

Patterns: tiles of ones and zeros

Extension: noisy tiles

‒ IC is straightforward

‒ DL depends on the skew (entropy) of the distribution of ones and zeros within the tile

Kontonasios & De Bie, SDM 2010

ITEMSETS

Algorithmic approach: ?

Not studied extensively, but a special case of relational patterns (see next instantiation)

Interesting result: if we can mine the best pattern at every iteration, then the total IC of the selected tiles approximates the total IC of the best set of tiles with the same (cumulative) DL to within a factor of $1 - 1/e$ (≈ 0.63)

Kontonasios & De Bie, SDM 2010
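The iterative scheme behind this result can be sketched as a greedy loop: at each iteration pick the tile with the highest ratio of newly gained IC to DL, then treat its cells as known. The sketch below assumes per-cell IC values and per-tile DL values are given as inputs; the data layout, names, and numbers are hypothetical and only illustrate the selection rule, not how the cited papers compute IC or DL.

```python
def greedy_tiles(candidates, cell_ic, n_iter):
    """Greedily pick tiles by marginal IC per unit of DL.

    candidates: list of (name, set_of_cells, dl)
    cell_ic:    dict mapping each cell to its information content
    """
    covered = set()      # cells the user already knows about
    chosen = []
    spent_dl = 0.0
    remaining = list(candidates)

    def gain(tile):
        # Only cells not yet covered by earlier tiles add information.
        return sum(cell_ic[c] for c in tile[1] - covered)

    for _ in range(n_iter):
        if not remaining:
            break
        best = max(remaining, key=lambda t: gain(t) / t[2])
        if gain(best) == 0:
            break
        chosen.append(best[0])
        covered |= best[1]
        spent_dl += best[2]
        remaining.remove(best)

    total_ic = sum(cell_ic[c] for c in covered)
    return chosen, total_ic, spent_dl

# Hypothetical 3 x 3 example: every cell has IC 1, except one surprising cell.
cell_ic = {(r, c): 1.0 for r in range(3) for c in range(3)}
cell_ic[(2, 2)] = 3.0
tiles = [
    ("T1", {(r, c) for r in (0, 1) for c in (0, 1)}, 4.0),
    ("T2", {(r, c) for r in (1, 2) for c in (1, 2)}, 4.0),
    ("T3", {(0, c) for c in (0, 1, 2)}, 4.0),
]
print(greedy_tiles(tiles, cell_ic, n_iter=2))
```

Because overlapping cells only count once, the gain of a tile shrinks as more tiles are selected, which is the diminishing-returns structure that makes a greedy guarantee of this kind possible.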

RELATIONAL PATTERNS

RELATIONAL PATTERN MINING

Data: relational database

Pattern: connected complete subgraphs

Prior beliefs: degree of each node in each relationship

Prior factorizes over relationships: equivalent to itemsets

[Figure: example with entity types Customers, Items, Attributes]

Spyropoulou, De Bie, Boley (DMKD 2014, DS 2013)

Lijffijt, Spyropoulou, Kang, De Bie (DSAA 2015, IJDSA 2016)

Guns, Aknin, Lijffijt, De Bie (ICDM 2016)

RELATIONAL PATTERN MINING

[Figure: RMiner example with entity types Users, Films, Genres, Actors]

RMINER

Algorithmic approach: enumerate + rank

Based on fixpoint-enumeration (Boley et al. 2010)

[Figure: example graph with entities A1, A2, A3, B1, B2, B3, C1, C2; the slides step through the enumeration on this graph]

1) Branch on any entity

2) Compute closure

- Add entities that are in all supersets

- If invalid added, backtrack

- Repeat until no valid candidates

- If invalid candidates

  - Stop

- Else

  - Output pattern

- Backtrack and declare entity invalid

Etc. etc.

Spyropoulou, De Bie, Boley (DMKD 2014, DS 2013)
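The branch / closure / backtrack loop above can be illustrated on the single-relation special case (itemsets), where closed patterns are closed itemsets. The sketch below is a simplified stand-in, not RMiner itself: it uses a standard prefix-preserving closure check, and all function names and the toy data are hypothetical.

```python
def support(rows, items):
    """Indices of the rows that contain every item in `items`."""
    return [r for r, row in enumerate(rows) if items <= row]

def closure(rows, row_ids):
    """Items shared by all rows in `row_ids`."""
    common = set(rows[row_ids[0]])
    for r in row_ids[1:]:
        common &= rows[r]
    return frozenset(common)

def enumerate_closed(rows, n_items, current=frozenset(), start=0, out=None):
    """Depth-first enumeration of closed itemsets, each reported once."""
    if out is None:
        out = []
    for item in range(start, n_items):
        if item in current:
            continue
        row_ids = support(rows, current | {item})   # 1) branch on an entity
        if not row_ids:
            continue
        closed = closure(rows, row_ids)             # 2) compute closure
        # If the closure pulls in an entity we already branched past,
        # this pattern is found elsewhere in the search tree: backtrack.
        if any(j < item and j not in current for j in closed):
            continue
        out.append((closed, tuple(row_ids)))        # output pattern
        enumerate_closed(rows, n_items, closed, item + 1, out)
    return out

# Toy data: four rows over items 0, 1, 2.
rows = [{0, 1}, {0, 1, 2}, {1, 2}, {2}]
for itemset, supporting_rows in enumerate_closed(rows, n_items=3):
    print(sorted(itemset), supporting_rows)
```

In an enumerate + rank approach, each enumerated pattern would then be scored (for example by IC divided by DL) and the list sorted by that score.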

RELATIONAL PATTERN MINING

[Figure: extensions N-RMiner and P-N-RMiner; labels in the figure: n-ary relation, Circadian, Numeric, Taxonomy element]

P-N-RMINER: FISHER DATA

Lijffijt, Spyropoulou, Kang, De Bie (DSAA 2015, IJDSA 2016)

CP-RMINER

Algorithmic approach: branch & bound in CP (top 1)

Guns, Aknin, Lijffijt, De Bie (ICDM 2016)

CONNECTING TREES

CONNECTING SUBTREES

Data: graph $G = (V, E)$, with $V$ known and $E \subseteq V \times V$ unknown

Prior beliefs: degree of vertices, (batch) time order

Patterns: subtree connecting query vertices Q

Which is more interesting?

[Figure: example graph on vertices A to I, with three candidate connecting subtrees annotated "Unsurprising", "No information compression", and "Most interesting, since E and C have small in-degree"]

Adriaens, Lijffijt, De Bie (ECMLPKDD 2017, DAMI 2019)

TIME DIFFERENCE PRIOR

Consider citation network

Papers arrive in batches (per year)

Earlier papers cannot / rarely cite newer papers

Limited increase in computational cost to fit background distribution

Adriaens, Lijffijt, De Bie (ECMLPKDD 2017, DAMI 2019)

EXAMPLE ON CITATION DATA

3 recent best papers from ACM SIGKDD

[Figure panels: result with uniform prior; result with time and degree prior]

Adriaens, Lijffijt, De Bie (ECMLPKDD 2017, DAMI 2019)

CONNECTING SUBTREES

Algorithmic approach: greedily construct a tree with maximum depth k

Not so straightforward!

We investigated various heuristics

Empirically: best strategy depends on size of query set Q

[Figure panels: NIPS/PODS authors, with uniform prior and with degree prior; experiment repeated from Akoglu et al. (SDM 2013)]

NETWORK EMBEDDING

CONDITIONAL NETWORK EMBEDDINGS

Data: a graph $G$ with adjacency matrix $A$

Prior beliefs: $P_{ij}(a_{ij})$

‒ overall density

‒ degrees

‒ block structure

‒ assortativity

‒ ...

Pattern: a metric embedding $X$

‒ Probabilistic info about the graph

‒ $P(\|x_i - x_j\| \mid a_{ij})$ = Half-Normal

Find ML embedding: $\max_X P(G \mid X)$

Kang, Lijffijt, De Bie (ICLR 2019)

EXAMPLE ON STUDENTDB

CONDITIONAL NETWORK EMBEDDINGS

Algorithmic approach: gradient descent with

estimated gradient (positive and negative sampling)
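A minimal sketch of the link model this objective could be built on: combine a prior link probability with half-normal densities over embedding distances via Bayes' rule, and sum log-probabilities over vertex pairs. The two spread parameters (a smaller one for linked pairs, a larger one for unlinked pairs), the uniform prior, and the toy embeddings are assumptions made for illustration; no sampling-based gradient estimation is shown.

```python
import math

def half_normal_pdf(d, sigma):
    """Density of a half-normal distribution with scale sigma at d >= 0."""
    return math.sqrt(2.0 / math.pi) / sigma * math.exp(-d * d / (2.0 * sigma * sigma))

def link_posterior(d, prior, sigma_linked=1.0, sigma_unlinked=2.0):
    """P(a_ij = 1 | distance d), from the prior P_ij and the two distance densities."""
    num = prior * half_normal_pdf(d, sigma_linked)
    den = num + (1.0 - prior) * half_normal_pdf(d, sigma_unlinked)
    return num / den

def log_likelihood(X, adj, prior):
    """log P(G | X), summed over unordered vertex pairs (uniform prior assumed)."""
    total = 0.0
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            p = link_posterior(math.dist(X[i], X[j]), prior)
            total += math.log(p if adj[i][j] else 1.0 - p)
    return total

# Toy graph with a single edge between vertices 0 and 1.
adj = [[0, 1, 0], [1, 0, 0], [0, 0, 0]]
X_good = [(0.0, 0.0), (0.1, 0.0), (3.0, 0.0)]   # linked pair close together
X_bad = [(0.0, 0.0), (3.0, 0.0), (0.1, 0.0)]    # linked pair far apart
print(log_likelihood(X_good, adj, prior=0.3), log_likelihood(X_bad, adj, prior=0.3))
```

An optimizer would adjust the coordinates in X to increase this log-likelihood; links that the prior already explains well (for example between high-degree vertices) put less pressure on the embedding.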

SUMMARY PART 3

SUMMARY

An informative prior can be useful in many settings

For binary data (all previous examples), fitting and updating the background model is computationally easy

Mining SI patterns is challenging; (for now) tailor-made algorithms are necessary

HUMAN-CENTRIC DATA EXPLORATION

PART 4/5: NUMERIC AND MIXED DATA

Tijl De Bie – slides in collaboration with Jefrey Lijffijt

Ghent University

OUTLINE

Part 1: Introduction and motivation (10 mins)

Part 2: The FORSIED framework (40 mins)

Part 3: Binary matrices, graphs, and relational data (40 mins)

BREAK

Part 4: Numeric and mixed data (including high-dimensional data visualization) (50 mins)

Part 5: Advanced topics, outlook & conclusions (25 mins)

Q&A (15 mins)

OUTLINE OF THIS PART

Attributed subgraphs

Subgroup discovery in real-valued (target) data

Dimensionality reduction

Time series

ATTRIBUTED SUBGRAPHS

COHESIVE SUBGRAPHS

Data: attributed graph

Prior beliefs: global attribute statistics

[Figure: example graph on vertices A to I]

Vertex  Shops  Events  Pubs
A       1      5       3
B       5      0       2
C       2      3       10
D       1      6       4
E       4      0       2
F       3      1       5
G       6      2       9
H       2      1       3
I       0      1       3

Bendimerad, Mel, Lijffijt, Plantevit, Robardet, De Bie (forthcoming)

[Presented and can be found online at MLG @ KDD 2018 workshop]

COHESIVE SUBGRAPHS

Data: attributed graph

Prior beliefs: global attribute statistics

Patterns: cohesive subgraphs with exceptional attributes (CSEA)

Pattern: these locations (A, C, D) have many events

Pattern: these locations (F, G, H) have many pubs & shops

Pattern is easy to interpret if it is local. How to quantify this?

‒ Vertices around G have many pubs & shops: easy to describe

‒ Vertices around C have many events: easy to describe

Bendimerad, Mel, Lijffijt, Plantevit, Robardet, De Bie (forthcoming)

[Presented and can be found online at MLG @ KDD 2018 workshop]

COHESIVE SUBGRAPHS

Data: attributed graph

Prior beliefs: global attribute statistics

Patterns: cohesive subgraphs with exceptional attributes (CSEA)

‒ Like subgroup discovery

‒ Describe vertex set with rule

‒ Intersection of neighbourhoods

‒ Minus exceptions

‒ Attributes below / above threshold

‒ As compared to expectation

Bendimerad, Mel, Lijffijt, Plantevit, Robardet, De Bie (forthcoming)

[Presented and can be found online at MLG @ KDD 2018 workshop]

COHESIVE SUBGRAPHS

Data: attributed graph

Prior beliefs: global attribute statistics

Background distribution

- Geometric per cell

- Using row/column margins

Attribute values:

Vertex  Shops  Events  Pubs
A       1      5       3
B       5      0       2
C       2      3       10
D       1      6       4
E       4      0       2
F       3      1       5
G       6      2       9
H       2      1       3
I       0      1       3

Interestingness (values are for illustration only):

Vertex  Shops  Events  Pubs
A       0.5    …       …
B       0.04   …       …
C       0.2    …       …
D       0.8    …       …
E       0.06   …       …
F       0.13   …       …
G       0.07   …       …
H       0.1    …       …
I       1.0    …       …

Bendimerad, Mel, Lijffijt, Plantevit, Robardet, De Bie (forthcoming)

[Presented and can be found online at MLG @ KDD 2018 workshop]
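A rough sketch of how a geometric background model could turn attribute counts into per-cell scores. It is deliberately simplified: each column gets one geometric distribution on {0, 1, 2, ...} whose mean equals the observed column mean, whereas the slide states that both row and column margins are used. The score shown is the tail probability P(X ≥ value); whether this matches the quantity in the illustration table is an assumption, and the function and variable names are hypothetical.

```python
def geometric_tail(value, mean):
    """P(X >= value) for a geometric distribution on {0, 1, 2, ...} with this mean."""
    q = mean / (1.0 + mean)   # P(X = k) = (1 - q) * q**k, so the mean is q / (1 - q)
    return q ** value

# Attribute table from the slide: vertex -> (shops, events, pubs).
table = {
    "A": (1, 5, 3), "B": (5, 0, 2), "C": (2, 3, 10),
    "D": (1, 6, 4), "E": (4, 0, 2), "F": (3, 1, 5),
    "G": (6, 2, 9), "H": (2, 1, 3), "I": (0, 1, 3),
}
columns = ("Shops", "Events", "Pubs")
means = [sum(row[k] for row in table.values()) / len(table) for k in range(3)]

for vertex, row in table.items():
    tails = [geometric_tail(row[k], means[k]) for k in range(3)]
    print(vertex, " ".join(f"{name}={t:.2f}" for name, t in zip(columns, tails)))
```

Small tail probabilities flag counts that are high relative to the background, so a cohesive set of vertices with jointly small values on the same attribute is a candidate pattern.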

COHESIVE SUBGRAPHS

Data: attributed graph

Prior beliefs: global attribute statistics

Patterns: cohesive subgraphs with exceptional attributes (CSEA)

[Figure: three example patterns]

$P_1$: +food

$P_2$: +professional, +nightlife, +outdoors, +college

$P_3$: +nightlife, +food, -college

Bendimerad, Mel, Lijffijt, Plantevit, Robardet, De Bie (forthcoming)

[Presented and can be found online at MLG @ KDD 2018 workshop]

COHESIVE SUBGRAPHS

Algorithmic approach: we tried

Option 1: enumerate with P-N-RMiner, then rank

Option 2: dedicated branch-and-bound algorithm

SUBGROUP DISCOVERY (EXCEPTIONAL MODEL MINING)

LOCATION & SPREAD PATTERNS

Data:

‒ Meta-data: any type

‒ Target data: real-valued matrix $X \in \mathbb{R}^{m \times n}$

Prior beliefs: mean and variance statistics

‒ Typically overall, but can be for subsets

Patterns: description with

‒ mean vector, or

‒ projection and magnitude of variance

[Figure (a): scatter plot of Attribute 1 against Attribute 2, both axes from -4 to 4, marking objects with property x, objects with property y, and objects with properties z1 and z2]

Lijffijt, Kang, Duivesteijn, Puolamäki,

Oikarinen, De Bie (ICDE 2018)
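A small sketch of how the information content of a location pattern could be computed for a single target attribute. It assumes the background distribution treats target values as independent Gaussians with the believed mean and variance, so the mean of a subgroup of k objects is Gaussian with variance sigma²/k; the IC is then the negative log density of the observed subgroup mean. The function name and numbers are hypothetical, and the multivariate and spread (projection) cases are not covered.

```python
import math

def location_ic(subgroup_values, mu, sigma):
    """Negative log density of the subgroup mean under an i.i.d. N(mu, sigma^2) background."""
    k = len(subgroup_values)
    mean = sum(subgroup_values) / k
    var_of_mean = sigma ** 2 / k   # variance of the mean of k independent values
    return 0.5 * math.log(2.0 * math.pi * var_of_mean) + (mean - mu) ** 2 / (2.0 * var_of_mean)

# Background belief: target values centred at 0 with unit variance.
typical = [0.1, -0.1, 0.0, 0.0]      # subgroup mean close to expectation
shifted = [2.0, 2.0, 2.0, 2.0]       # subgroup mean far from expectation
print(location_ic(typical, mu=0.0, sigma=1.0))
print(location_ic(shifted, mu=0.0, sigma=1.0))
```

Because this is a density rather than a probability, the value can be negative; only comparisons between patterns are meaningful. Larger subgroups with the same shift score higher, since their mean is less likely to deviate by chance.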

SINGLE TARGET DIMENSION: CRIME IN US

UCI Crime data: violent crime rate (per 1k pop)

Description: areas with high incidence of unmarried mothers (coded by CBS as percentage illegitimate)

Target: high average crime rate

ECOLOGY: PRESENT SPECIES AS TARGETS

Description of pattern (a): "mean temperature in March ≤ −1.68 °C"

Description of pattern (b): "average monthly rainfall in August ≤ 47.62 mm"

Description of pattern (c): "average monthly rainfall in October ≤ 45.25 mm and mean temperature of wettest quarter ≥ 16.32 °C"

Mean of target attributes for (a): -Wood mouse; +Mountain hare, moose, red-backed vole, wood lemming

GERMAN POLITICS VS. DEMOGRAPHICS

Description of pattern (a): "few children". Target: LEFT is popular.

Description of pattern (b): "large mid-aged pop". Target: GREEN relatively popular.

Description of pattern (c): "many children". Target: LEFT is unpopular.

Moreover, target of pattern (a): "SPD and CDU negatively correlated". Due to the popularity of LEFT, SPD and CDU are in tougher competition here.

SUBGROUP DISCOVERY

For reference only

Prior:

IC/DL:

SUBGROUP DISCOVERY

Algorithmic approach:

Find descriptions using beam search

‒ Fairly standard in SD; Implementation from

Cortana (Meeng & Knobbe, BeneLearn 2011)

‒ Using the SI objective directly

For projections

‒ Manifold learning problem (ManOpt toolbox)

Interesting results to speed-up update to bg. distrib.

DIMENSIONALITY REDUCTION

PROJECTION PATTERNS

Data: real-valued matrix $X \in \mathbb{R}^{m \times n}$

Prior beliefs: global mean and (co-)variance structure

Patterns: projections

DATA PROJECTIONS

Finding informative projections:

‒ De Bie, Lijffijt, Santos-Rodriguez, Kang (ESANN 2016)

‒ Kang, Lijffijt, Santos-Rodriguez, De Bie (KDD 2016, DMKD 2018)

Accounting for user feedback:

‒ Puolamäki, Kang, Lijffijt, De Bie (ECMLPKDD 2016)

‒ Kang, Puolamäki, Lijffijt, De Bie (ECMLPKDD 2016)

‒ Puolamäki, Oikarinen, Kang, Lijffijt, De Bie (ICDE 2018)

SI COMPONENT ANALYSIS (SICA)

Problem is parametrized by a resolution parameter

Go from density to probability for projections

De Bie, Lijffijt, Santos-Rodriguez, Kang (ESANN 2016)

Kang, Lijffijt, Santos-Rodriguez, De Bie (KDD 2016, DMKD 2018)

SICA

Effect of the prior beliefs:

‒ Expectation on mean/variance → PCA objective

‒ Expectation on magnitude of variance → more robust variant of PCA (which we call t-PCA)

‒ Graph of point similarities → next slides

SICA GRAPH PRIOR

German voting percentages per district, account for east-west divide

SI DATA EXPLORER (SIDE)

https://users.ugent.be/~bkang/software/side_dev/entry.html

SICA/SIDE

Algorithmic approach:

SICA (uniform/graph) are eigenvalue problems

SICA t-PCA and SIDE use manifold learning toolbox
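As a rough illustration of what "eigenvalue problem" means here, the sketch below solves the plain PCA case (the objective that an earlier SICA slide associates with prior expectations on mean/variance) by power iteration. The toy data, the helper names `covariance_matrix` and `top_eigenpair`, and the choice of power iteration are assumptions made for this example; the graph-prior variant of SICA would use a different matrix, which is not reproduced here.

```python
import math


def covariance_matrix(rows):
    """Return the (biased) sample covariance matrix of a list of points."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    return [[sum(c[i] * c[j] for c in centered) / n for j in range(d)]
            for i in range(d)]


def top_eigenpair(matrix, iterations=200):
    """Power iteration: largest eigenvalue and eigenvector of a symmetric PSD matrix."""
    d = len(matrix)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v^T M v (v has unit length).
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return eigenvalue, v


# Hypothetical toy data: five 2-D points stretched mostly along the first axis.
DATA = [(-2.0, -1.0), (-1.0, -1.0), (0.0, 0.0), (1.0, 1.0), (2.0, 1.0)]
COV = covariance_matrix(DATA)
TOP_VARIANCE, TOP_DIRECTION = top_eigenpair(COV)
```

`TOP_DIRECTION` is the one-dimensional projection with the largest variance, and `TOP_VARIANCE` is the variance along it.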


C-T-SNE

• Non-linear dimensionality reduction
• Based on t-SNE
• Prior beliefs: known clusters
• Result: c-t-SNE visualization may make new clusters salient

Bo Kang, Dario Garcia Garcia, Jefrey Lijffijt, Raul Santos Rodriguez, Tijl De Bie (arXiv, 2019)


TIME SERIES


TIME SERIES MOTIFS

Data: time series $X \in \mathbb{R}^{1 \times n}$

Prior beliefs: mean, variance, covariance (first-order difference)

Patterns: motif template

Deng, Lijffijt, Kang, De Bie (Entropy, 2019)


TIME SERIES MOTIFS

IC quantifies the reduction in uncertainty

Likelihood of data increases by inserting template in background distribution for matched locations

Due to co-variance, expectations change globally

Updating computationally (somewhat) costly
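A minimal sketch of the likelihood argument above, under a strong simplifying assumption: the background distribution is a product of independent Gaussians, one per time point, so the covariance between neighbouring points mentioned on the slide is ignored. The series, the template, the matched start positions, and all function names are made up for this example.

```python
import math


def gaussian_nll(x, mean, var):
    """Negative log-likelihood (in nats) of x under N(mean, var)."""
    return 0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def series_nll(series, means, variances):
    """Negative log-likelihood of the series under independent Gaussians."""
    return sum(gaussian_nll(x, m, v)
               for x, m, v in zip(series, means, variances))


def insert_template(means, variances, template_mean, template_var, starts):
    """Overwrite the background mean/variance at every matched location."""
    new_means, new_vars = list(means), list(variances)
    for start in starts:
        for offset, (m, v) in enumerate(zip(template_mean, template_var)):
            new_means[start + offset] = m
            new_vars[start + offset] = v
    return new_means, new_vars


# Hypothetical series containing the shape (1, 2, 1) twice.
SERIES = [0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 1.0, 2.0, 1.0, 0.0]
PRIOR_MEANS = [0.8] * len(SERIES)   # prior belief: overall mean
PRIOR_VARS = [1.0] * len(SERIES)    # prior belief: overall variance

TEMPLATE_MEAN = [1.0, 2.0, 1.0]
TEMPLATE_VAR = [0.1, 0.1, 0.1]
STARTS = [1, 6]                      # matched locations of the motif

NLL_BEFORE = series_nll(SERIES, PRIOR_MEANS, PRIOR_VARS)
NEW_MEANS, NEW_VARS = insert_template(
    PRIOR_MEANS, PRIOR_VARS, TEMPLATE_MEAN, TEMPLATE_VAR, STARTS)
NLL_AFTER = series_nll(SERIES, NEW_MEANS, NEW_VARS)

# Gain in log-likelihood of the data once the motif is known (in nats).
LOG_LIKELIHOOD_GAIN = NLL_BEFORE - NLL_AFTER
```

Because the points are independent in this sketch, only the matched locations change. With the covariance constraints from the slide, the update would also shift expectations elsewhere, which is why it is described as more costly.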


TIME SERIES MOTIFS

Algorithmic approach:

• Construct template from 3 or 4 subsequences using constraint programming and a relaxed objective
• Greedily add subsequences using the exact objective
• Prune dissimilar subsequences from the search after branching and selection of the initial set


AND MORE


Past:
• Data clustering
• Biclustering
• Exceptional model mining / subgroup discovery
• Time series segments

Ongoing / future:
• Backbone of a network
• Insightful summaries of an attributed network
• Network embeddings
• ...

With all past and current members of the FORSIED team and Jilles Vreeken, Antonis Matakos, Dario Garcia-Garcia, Siegfried Nijssen, ...


HUMAN-CENTRIC DATA EXPLORATION

PART 5/5: ADVANCED TOPICS, OUTLOOK & CONCLUSIONS

Tijl De Bie (slides in collaboration with Jefrey Lijffijt)
Ghent University

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net


OUTLINE

Part 1: Introduction and motivation (10 mins)
Part 2: The FORSIED framework (40 mins)
Part 3: Binary matrices, graphs, and relational data (40 mins)
BREAK
Part 4: Numeric and mixed data, including high-dimensional data visualization (50 mins)
Part 5: Advanced topics, outlook & conclusions (25 mins)
Q&A (15 mins)


RELATED WORK


MDL/compression

MAP estimation

Hypothesis testing / randomization techniques

‘Subjective interestingness’ research by Tuzhilin, Silberschatz, Padmanabhan (1990s)


LIMITATIONS

• Not all new information is interesting!
  ‒ The FORSIED framework does not directly address this
  ‒ Importance of feedback in combination with FORSIED (see work on dimensionality reduction)
• Description length
  ‒ How to determine?
  ‒ Is shorter really always better?


FORSIED’S ORIGINS AND OUTLOOK


FORSIED’S ORIGINS


Remodiscovery, 2006

DISTILLER, 2009


MINI, ECML-PKDD 2007

TKDD, 2007

KDD, 2009

DAMI, 2014


Since 2017 also Jefrey’s Pegasus2 fellowship, and a large and a small FWO grant


OUTLOOK

• Improve theoretical understanding
  ‒ Estimating the background distribution (information geometry)
  ‒ Cognitive aspects (cognitive science)
  ‒ User interface (human-computer interaction)
  ‒ Visualization (visual analytics)
  ‒ Algorithmic aspects (optimisation theory)
  ‒ Safeguarding sensitive information & fairness
• More instantiations
  ‒ Data types (linked data / knowledge graphs!) / pattern types / prior belief types
• Applications
  ‒ Bioinformatics
  ‒ Web and social media mining


DATA MINING WITHOUT SPILLING THE BEANS


PRIVACY-PRESERVING DATA PUBLISHING

Anonymization is insufficient to protect sensitive attributes (linkage attack)

Anonymized patient database (ZIP, D.O.B. and Sex are quasi-identifiers):

ZIP    D.O.B.      Sex  Diagnosis
94701  01/02/1968  F    Healthy
94701  06/03/1990  F    Obesity
94702  11/08/1991  M    Healthy
94703  03/09/1979  M    Prostate cancer
94703  07/10/1951  F    Healthy
94704  10/02/1973  M    Obesity
94705  20/12/2001  F    Obesity

Voting records database (same quasi-identifiers):

ZIP    D.O.B.      Sex  Full name
94701  01/02/1968  F    Mary Smith
94701  06/03/1990  F    Patricia Johnson
94702  11/08/1991  M    James Jones
94703  03/09/1979  M    John Brown
94703  07/10/1951  F    Linda Davis
94704  10/02/1973  M    Robert Miller
94705  20/12/2001  F    Barbara Wilson

Generalization!

Generalized patient database:

ZIP      D.O.B.   Sex  Diagnosis
94701    ‘51-’01  F    Healthy
94701    ‘51-’01  F    Obesity
94702-5  ‘51-’01  M    Healthy
94702-5  ‘51-’01  M    Prostate cancer
94702-5  ‘51-’01  F    Healthy
94702-5  ‘51-’01  M    Obesity
94702-5  ‘51-’01  F    Obesity


PRIVACY-PRESERVING DATA PUBLISHING

• $k$-anonymity: the smallest equivalence class has size ≥ $k$
  ‒ Homogeneity attack
  ‒ Background knowledge attack
• $l$-diversity: ≥ $l$ sensitive attribute values well represented in each equivalence class
  ‒ Hard to achieve for imbalanced data
  ‒ Skewness attack
  ‒ Similarity attack
• $t$-closeness: the sensitive attribute value distribution in each equivalence class is close to (within distance $t$ of) its distribution in the overall data

Source: https://www.linkedin.com/pulse/dont-throw-baby-out-bathwater-didi-gurfinkel/
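The three criteria can be checked mechanically on the generalized patient table from the previous slide. The sketch below is only an illustration: it uses the simplest "distinct values" reading of $l$-diversity, and it measures $t$-closeness with total variation distance, which is an assumption made here for a categorical attribute. All function names are hypothetical.

```python
from collections import Counter, defaultdict
from fractions import Fraction

# Generalized patient table: (ZIP, D.O.B., Sex, Diagnosis).
# "Obesity" appears as "Obesitas" on the original slide.
RECORDS = [
    ("94701", "'51-'01", "F", "Healthy"),
    ("94701", "'51-'01", "F", "Obesity"),
    ("94702-5", "'51-'01", "M", "Healthy"),
    ("94702-5", "'51-'01", "M", "Prostate cancer"),
    ("94702-5", "'51-'01", "F", "Healthy"),
    ("94702-5", "'51-'01", "M", "Obesity"),
    ("94702-5", "'51-'01", "F", "Obesity"),
]


def equivalence_classes(records, n_quasi=3):
    """Group sensitive values by the tuple of quasi-identifier values."""
    classes = defaultdict(list)
    for record in records:
        classes[record[:n_quasi]].append(record[n_quasi])
    return dict(classes)


def k_anonymity(classes):
    """Largest k for which the table is k-anonymous."""
    return min(len(values) for values in classes.values())


def distinct_l_diversity(classes):
    """Largest l such that every class has at least l distinct sensitive values."""
    return min(len(set(values)) for values in classes.values())


def distribution(values):
    counts = Counter(values)
    return {v: Fraction(c, len(values)) for v, c in counts.items()}


def t_closeness(classes):
    """Smallest t: worst total variation distance between a class and the whole table."""
    overall = distribution([v for values in classes.values() for v in values])
    worst = Fraction(0)
    for values in classes.values():
        local = distribution(values)
        distance = sum(abs(local.get(v, Fraction(0)) - p)
                       for v, p in overall.items()) / 2
        worst = max(worst, distance)
    return worst


CLASSES = equivalence_classes(RECORDS)
K = k_anonymity(CLASSES)            # 2
L = distinct_l_diversity(CLASSES)   # 2
T = t_closeness(CLASSES)            # Fraction(4, 21)
```

The table is 2-anonymous and 2-diverse, yet the class of men in ZIP 94702-5 still deviates from the overall diagnosis distribution by 4/21 in total variation.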


OTHER KINDS OF SENSITIVE INFORMATION

• Existence of a tight community in a network
• Existence of a cluster in data
• Frequency of particular items / size of particular transactions in a database of purchases

Preserve this while mining patterns from the data:
• publishing a generalized version of the database,
• identifying dense subgraphs,
• finding clusters,
• mining frequent itemsets, etc.


GENERAL STRATEGY

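Data: $\mathbf{x}$
• Data mining goal: reveal as much as possible about $\mathbf{x}$

Sensitive aspects: $f(\mathbf{x}) \in \Phi$, for example
• the sensitive attributes’ values
• density of a specified subgraph
• existence of a tight cluster
• frequencies of all items

Goal: reveal as little as possible about $f(\mathbf{x})$
• Updating $P \to P'$ results in updating $P_f \to P_f'$
• More complex than conditioning!
• $P_f(f(\mathbf{x}))$ can be larger or smaller than $P_f'(f(\mathbf{x}))$

Considering the user’s prior beliefs! → ‘Subjective’ measures

[Figure: the map $f$ takes data $\mathbf{x} \in \Omega$ (distribution $P$) to sensitive aspects $f(\mathbf{x}) \in \Phi$ (distribution $P_f$); the region $\Omega'$ is mapped to $\Phi' = f(\Omega')$.]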
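A toy sketch of how an update of $P$ induces an update of $P_f$, and of why the belief in the true sensitive value can move in either direction. It assumes, for illustration only, that the update $P \to P'$ is plain conditioning on a set $\Omega'$ containing the true data; the die-roll domain and all names are invented for this example.

```python
from fractions import Fraction


def pushforward(P, f):
    """Distribution of f(x) when x is distributed according to P."""
    P_f = {}
    for x, p in P.items():
        P_f[f(x)] = P_f.get(f(x), Fraction(0)) + p
    return P_f


def condition(P, omega_prime):
    """Restrict P to omega_prime and renormalize."""
    mass = sum(p for x, p in P.items() if x in omega_prime)
    return {x: p / mass for x, p in P.items() if x in omega_prime}


def is_even(x):
    """Hypothetical sensitive aspect f: whether the data value is even."""
    return x % 2 == 0


# Data domain: outcomes of a die; the user's prior P is uniform.
P = {x: Fraction(1, 6) for x in range(1, 7)}
X_TRUE = 2

P_F = pushforward(P, is_even)

# Two different (true) patterns about x, each shrinking the domain.
P_F_AFTER_LOW = pushforward(condition(P, {1, 2, 3}), is_even)
P_F_AFTER_MID = pushforward(condition(P, {2, 3, 4}), is_even)

BELIEF_BEFORE = P_F[is_even(X_TRUE)]                  # 1/2
BELIEF_AFTER_LOW = P_F_AFTER_LOW[is_even(X_TRUE)]     # 1/3, smaller
BELIEF_AFTER_MID = P_F_AFTER_MID[is_even(X_TRUE)]     # 2/3, larger
```

Both patterns are true statements about the data, yet one lowers and the other raises the probability assigned to the actual sensitive value.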


TRADING-OFF TWO THINGS

1. Subjective information content of a pattern
2. A criterion on the background distribution about sensitive aspects:
   • Information content left in sensitive aspects (surprise in actual value of the sensitive attributes): $-\log P_f'(f(\mathbf{x}))$
   • Entropy of $P_f'$ (uncertainty about sensitive attributes): $-E_{\mathbf{x} \sim P'}\left[\log P_f'(f(\mathbf{x}))\right]$
   • Knowledge gained about actual value of the sensitive aspects: $-\log \frac{P_f(f(\mathbf{x}))}{P_f'(f(\mathbf{x}))}$
   • Degree of belief that the sensitive aspects are within a specified set $\Phi^* \subseteq \Phi$: $P_f'(\Phi^*)$
   • ...
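The criteria listed above are straightforward to evaluate once the distributions over the sensitive aspects before and after the update are known. The sketch below does so in bits; the three-valued sensitive attribute and both distributions are hypothetical, and the function names are chosen for this example only.

```python
import math


def information_content_left(P_f_new, phi):
    """Surprise (bits) still left in the actual sensitive value phi."""
    return -math.log2(P_f_new[phi])


def entropy(P_f_new):
    """Remaining uncertainty (bits) about the sensitive aspects."""
    return -sum(p * math.log2(p) for p in P_f_new.values() if p > 0)


def knowledge_gained(P_f_old, P_f_new, phi):
    """Bits gained about the actual sensitive value phi by the update."""
    return -math.log2(P_f_old[phi] / P_f_new[phi])


def belief_in_set(P_f_new, phi_star):
    """Probability that the sensitive aspects lie in the set phi_star."""
    return sum(p for phi, p in P_f_new.items() if phi in phi_star)


# Hypothetical distributions over a sensitive attribute with 3 values.
P_F_OLD = {"a": 0.5, "b": 0.25, "c": 0.25}   # before showing the pattern
P_F_NEW = {"a": 0.25, "b": 0.5, "c": 0.25}   # after showing the pattern
PHI_TRUE = "b"

IC_LEFT = information_content_left(P_F_NEW, PHI_TRUE)    # 1 bit
ENTROPY = entropy(P_F_NEW)                               # 1.5 bits
GAINED = knowledge_gained(P_F_OLD, P_F_NEW, PHI_TRUE)    # 1 bit
BELIEF = belief_in_set(P_F_NEW, {"b", "c"})              # 0.75
```

A pattern would then be selected by weighing its subjective information content against whichever of these quantities is chosen as the disclosure criterion.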


EXAMPLES


PRIVACY-PRESERVING DATA PUBLISHING

• Random synthetic dataset:
  • 5 real-valued quasi-identifiers (QI), generalization through intervals (e.g. zip code, DOB)
  • 1 sensitive attribute (SA), 3 possible values (e.g. sexual orientation, ethnicity)
  • 1 other attribute (OA), 3 possible values (e.g. sense of well-being, productivity)
  • 100 data records
• Trade-off:
  • information about data (other & sensitive attributes)
  • knowledge gained about sensitive attribute
• Generalize quasi-identifiers → 5 equivalence classes
• Ensure the maximum information content about any sensitive attribute value is small

[Figure panels: conditional distributions within the 5 equivalence classes over the 3 sensitive attribute values; conditional distributions within the 5 equivalence classes over the 3 other attribute values; joint conditional distribution of the sensitive (rows) and other (columns) attributes, within the 5 equivalence classes.]


DENSE SUBGRAPHS WITHOUT SPILLING BEANS

• Random network:
  • 2 non-overlapping communities
  • A 3rd community overlapping both
• The 3rd is sensitive
  • Analyst should remain surprised by its presence
• Task:
  • Identify (non-)dense subgraphs
  • Without spilling the beans on the 3rd community
• Approaches (result from general strategy):
  • Deceive
  • Conceal

[Figure: four 40 × 40 matrix panels titled "Initially", "After both community patterns", "After both community patterns and a deception pattern", and "After both community patterns partially concealed".]


TAKE-AWAYS

• FORSIED ideas can be used for quantifying sensitive information disclosure
• Key point: sensitive information disclosure is subjective
• More work needed to understand how to make this practical


CONCLUSIONS


OVERALL CONCLUSIONS

• A generic approach for designing methods for exploring data
  • Several successes
  • Sometimes more challenging, mostly due to algorithmic issues
• Key take-away
  • Model what’s not interesting (= prior beliefs), show what’s complementary (= subjectively interesting)
  • Using information theory
• New horizons?
  • Privacy and sensitive information


Acknowledgements go to

“Data Mining without Spilling the Beans: Preserving more than Privacy alone”
Research project funded by the FWO
Tijl De Bie, Jefrey Lijffijt

“Exploring Data: Theoretical Foundations and Applications to Web, Multimedia, and Omics Data”
Odysseus project funded by the FWO
Tijl De Bie

“Formalizing Subjective Interestingness in Data mining”
ERC project FORSIED
Tijl De Bie

“Personalised, interactive, and visual exploratory mining of patterns in complex data”
FWO [Pegasus]² Marie Skłodowska-Curie Fellowship
Jefrey Lijffijt

www.forsied.net


Jefrey Lijffijt
Bo Kang
Wouter Duivesteijn
Achille Aknin
Holly Silk
Raul Santos-Rodriguez
Eirini Spyropoulou
Akis Kontonasios
Paolo Simeone
Robin Vandaele
Florian Adriaens
Tijl De Bie
Xi Chen
Junning (Lemon) Deng
Ahmad Mel
Alexandru Mara
Maryam Fanaeepour
Maarten Buyl

+ lots of collaborators...

We are recruiting!

www.forsied.net / aida.ugent.be


THANKS!

TIME FOR Q&A


MINING SUBJECTIVELY INTERESTING PATTERNS IN DATA

SUPPLEMENTARY MATERIAL

Tijl De Bie (slides in collaboration with Jefrey Lijffijt)
Ghent University

DEPARTMENT ELECTRONICS AND INFORMATION SYSTEMS (ELIS)

RESEARCH GROUP IDLAB

www.forsied.net


REFERENCES


Adriaens, Lijffijt, De Bie: Subjectively Interesting Connecting Trees. ECML/PKDD (2) 2017: 53-69

De Bie, Lijffijt, Santos-Rodriguez, Kang: Informative Data Projections: A Framework and Two Examples. ESANN 2016: 435-640

De Bie: Subjective Interestingness in Exploratory Data Mining. IDA 2013: 19-31

De Bie: Maximum entropy models and subjective interestingness: an application to tiles in binary databases. Data Min. Knowl. Discov. 23(3): 407-446 (2011)

De Bie: An information theoretic framework for data mining. KDD 2011: 564-572

De Bie: Subjectively Interesting Alternative Clusters. MultiClust@ECML/PKDD 2011: 43-54

Deng, Lijffijt, Kang, De Bie: Subjectively Interesting Motifs in Time Series. AALTD@ECML/PKDD 2018

Guns, Aknin, Lijffijt, De Bie: Direct Mining of Subjectively Interesting Relational Patterns. ICDM 2016: 913-918

Kang, Lijffijt, De Bie: Conditional Network Embeddings. Proceedings of the 7th International Conference on Learning Representations (ICLR), 2019

Kang, Lijffijt, Santos-Rodríguez, De Bie: SICA: subjectively interesting component analysis. Data Min. Knowl. Discov. 32(4): 949-987 (2018)

Kang, Lijffijt, Santos-Rodriguez, De Bie: Subjectively Interesting Component Analysis: Data Projections that Contrast with Prior Expectations. KDD 2016: 1615-1624

Kang, Puolamäki, Lijffijt, De Bie: A Tool for Subjective and Interactive Visual Data Exploration. ECML/PKDD (3) 2016: 3-7

Kontonasios, De Bie: Subjectively interesting alternative clusterings. Machine Learning 98(1-2): 31-56 (2015)

Kontonasios, De Bie: An Information-Theoretic Approach to Finding Informative Noisy Tiles in Binary Databases. SDM 2010: 153-164

Kontonasios, Vreeken, De Bie: Maximum Entropy Models for Iteratively Identifying Subjectively Interesting Structure in Real-Valued Data. ECML/PKDD (2) 2013: 256-271

Kontonasios, Vreeken, De Bie: Maximum Entropy Modelling for Assessing Results on Real-Valued Data. ICDM 2011: 350-359

Van Leeuwen, De Bie, Spyropoulou, Mesnage: Subjective interestingness of subgraph patterns. Machine Learning 105(1): 41-75 (2016)

Lijffijt, Kang, Duivesteijn, Puolamäki, Oikarinen, De Bie: Subjectively Interesting Subgroup Discovery on Real-valued Targets. IEEE ICDE 2018: to appear

Lijffijt, Spyropoulou, Kang, De Bie: P-N-RMiner: a generic framework for mining interesting structured relational patterns. I. J. Data Science and Analytics 1(1): 61-76 (2016)

Lijffijt, Spyropoulou, Kang, De Bie: P-N-RMiner: A generic framework for mining interesting structured relational patterns. DSAA 2015: 1-10

Puolamäki, Kang, Lijffijt, De Bie: Interactive Visual Data Exploration with Subjective Feedback. ECML/PKDD (2) 2016: 214-229

Puolamäki, Oikarinen, Kang, Lijffijt, De Bie: Interactive Visual Data Exploration with Subjective Feedback: An Information-Theoretic Approach. IEEE ICDE 2018: to appear

Spyropoulou, De Bie, Boley: Interesting pattern mining in multi-relational data. Data Min. Knowl. Discov. 28(3): 808-849 (2014)

Spyropoulou, De Bie, Boley: Mining Interesting Patterns in Multi-relational Data with N-ary Relationships. Discovery Science 2013: 217-232


LINKS TO SOFTWARE


R-MINER(S)

Original (fastest for full enumeration):

https://bitbucket.org/BristolDataScience/rminer/

N-RMiner (supports n-ary):

https://bitbucket.org/BristolDataScience/n-rminer/

P-N-RMiner (supports structured attributes):

https://bitbucket.org/BristolDataScience/p-n-rminer/

CP-RMiner (top 1 RMiner pattern, iteratively, fast):

https://bitbucket.org/ghentdatascience/cp/


CONNECTING TREES

https://bitbucket.org/ghentdatascience/interestingtreespublic/


DENSE SUBGRAPHS (COMMUNITIES)

http://patternsthatmatter.org/software.php#ssgminer


NETWORK EMBEDDING

https://bitbucket.org/ghentdatascience/cne-public/


ATTRIBUTED SUBGRAPHS

SIAS-Miner: http://goo.gl/ZxsvbX


SUBGROUP DISCOVERY

https://bitbucket.org/ghentdatascience/sisd-public/


DIMENSIONALITY REDUCTION

SICA:

http://users.ugent.be/~bkang/software/sica/sica.zip

SIDE (online tool):

http://users.ugent.be/~bkang/software/side_dev/index.html

SIDE (MaxEnt R version): http://kaip.iki.fi/sider.html

CLIPPR: https://bitbucket.org/ghentdatascience/clippr/


ANYTHING MISSING?

Not all (source) code has been published; please ask if you are interested in something that is missing!