phd defence pairwise learning
EXACT AND EFFICIENT ALGORITHMS FOR PAIRWISE LEARNING
Michiel Stock @michielstock
KERMIT
Pairwise learning in bioscience engineering and everyday life.
OUTLINE OF THIS TALK
1. Explaining the problem setting of my PhD through an accessible example.
2. Showing that this setting can be generally applied in life sciences.
3. Example application: predicting species interactions.
4. Contributions of my PhD.
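PAIRWISE DATA
Each observation is a triplet (person, book, label), with label 0 or 1, for example (person, book, 1).
(person, book, 1), (person, book, 0), (person, book, 0), (person, book, 1), (person, book, 1), …
Any resemblance to actual persons, living or dead, or actual events is purely coincidental.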
BOOK READINGS DATASET
Ayla 1 1 0 0 0 0 1 0
Bram 1 0 0 1 0 0 1 1
Chaïm 1 1 1 0 0 0 0 0
Dimitri 0 0 0 0 0 0 0 1
Elina 1 0 1 0 1 0 0 0
Francis 1 1 1 0 0 1 1 1
Giacomo 0 0 0 1 1 0 0 0
Hannes 0 0 0 0 0 1 0 0
Ine 1 1 1 0 0 0 0 0
Joni 1 1 1 0 1 0 0 1
Koen 1 1 1 1 0 1 0 0
Y is an n × m matrix (n persons, m books); $Y_{ij}$ is the entry for person i (row) and book j (column).
A PAIRWISE MODEL
‘Learn’ a function f(person, book) based on observed data, such that a high score indicates that someone would be interested in a book.
A SIMPLE AVERAGE OF AVERAGES
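[Figure: matrix Y with row sums 3, 4, 3, 1, 3, 6, 2, 1, 3, 5, 5, column sums 8, 6, 6, 3, 3, 3, 3, 4 and total 44; the entry $Y_{ij} = 0$ is highlighted, together with $\sum_k Y_{kj}$, $\sum_l Y_{il}$ and $\sum_{k,l} Y_{kl}$.]
$$F_{ij} = \frac{0 + 0.27 + 0.75 + 0.5}{4} = 0.38$$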
Re-estimate using the average of:
1. the value itself
2. column average
3. row average
4. total average
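$$F_{ij} = \frac{1}{4}\left(Y_{ij} + \frac{1}{n}\sum_{k} Y_{kj} + \frac{1}{m}\sum_{l} Y_{il} + \frac{1}{nm}\sum_{k,l} Y_{kl}\right)$$
[Figure: matrix Y with the four terms marked; legend: read / not read.]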
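The four-term average above can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's implementation: the function name `average_of_averages` is hypothetical, and the matrix is copied from the book readings table as transcribed here, so its grand total need not match the figures shown on the slide.

```python
import numpy as np

# Book readings matrix as transcribed in the table (an assumption about the
# data behind the slide). Rows: Ayla ... Koen, columns: the eight books.
Y = np.array([
    [1, 1, 0, 0, 0, 0, 1, 0],  # Ayla
    [1, 0, 0, 1, 0, 0, 1, 1],  # Bram
    [1, 1, 1, 0, 0, 0, 0, 0],  # Chaïm
    [0, 0, 0, 0, 0, 0, 0, 1],  # Dimitri
    [1, 0, 1, 0, 1, 0, 0, 0],  # Elina
    [1, 1, 1, 0, 0, 1, 1, 1],  # Francis
    [0, 0, 0, 1, 1, 0, 0, 0],  # Giacomo
    [0, 0, 0, 0, 0, 1, 0, 0],  # Hannes
    [1, 1, 1, 0, 0, 0, 0, 0],  # Ine
    [1, 1, 1, 0, 1, 0, 0, 1],  # Joni
    [1, 1, 1, 1, 0, 1, 0, 0],  # Koen
], dtype=float)


def average_of_averages(Y):
    """Average of the value itself, column average, row average and total average."""
    col_avg = Y.mean(axis=0, keepdims=True)  # (1/n) sum_k Y_kj, shape (1, m)
    row_avg = Y.mean(axis=1, keepdims=True)  # (1/m) sum_l Y_il, shape (n, 1)
    total_avg = Y.mean()                     # (1/nm) sum_kl Y_kl, a scalar
    # Broadcasting adds the column, row and total averages to every entry.
    return (Y + col_avg + row_avg + total_avg) / 4


F = average_of_averages(Y)
```

Broadcasting computes all n × m scores at once, so no explicit loop over pairs is needed.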
FROM OBSERVATIONS TO PREDICTIONS
Applying the simple model turns Y (what people have read) into F (what the model thinks people would like to read).
[Figure: matrices Y and F, persons A to K (rows) by books (columns); legend: read / not read.]
LEAVE-ONE-OUT CROSS VALIDATION
Applying the simple model to Y (what people have read) while not using the observation itself gives $F^{\text{loo}}$ (what the model thinks people would like to read).
[Figure: matrices Y and $F^{\text{loo}}$, persons A to K (rows) by books (columns); legend: read / not read.]
Exact and efficient computation of this matrix.
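One way such a leave-one-out matrix can be obtained in closed form is sketched below. It rests on an assumption that the slides do not spell out: the held-out entry is replaced by its own prediction, a common convention for linear smoothers. Under that assumption, and because each $Y_{ij}$ enters its own average-of-averages score with a fixed weight h, the leave-one-out value follows from a single pass over the full prediction matrix. All function names are hypothetical.

```python
import numpy as np


def average_of_averages(Y):
    """Average of the value itself, column average, row average and total average."""
    col_avg = Y.mean(axis=0, keepdims=True)
    row_avg = Y.mean(axis=1, keepdims=True)
    return (Y + col_avg + row_avg + Y.mean()) / 4


def loo_average_of_averages(Y):
    """Closed-form leave-one-out scores (assumed definition: the held-out
    entry is imputed by its own prediction)."""
    n, m = Y.shape
    # Weight that Y_ij carries in its own score F_ij:
    # 1 (itself) + 1/n (column avg) + 1/m (row avg) + 1/(nm) (total avg), over 4.
    h = (1 + 1 / n + 1 / m + 1 / (n * m)) / 4
    F = average_of_averages(Y)
    # Solve x = F_ij + h * (x - Y_ij) for x, for all entries at once.
    return (F - h * Y) / (1 - h)


def loo_brute_force(Y, i, j, n_iter=200):
    """Reference: iterate the imputation for one pair until it stabilises."""
    Y_tmp = Y.astype(float).copy()
    for _ in range(n_iter):
        Y_tmp[i, j] = average_of_averages(Y_tmp)[i, j]
    return Y_tmp[i, j]


Y_demo = np.array([[1, 0, 1], [0, 0, 1], [1, 1, 0], [0, 1, 0]], dtype=float)
F_loo = loo_average_of_averages(Y_demo)
```

The closed form costs about as much as one prediction pass, whereas the brute-force reference repeats the whole computation for every pair.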
PERFORMANCE EVALUATION OF THE MODEL
AUC = 81%
[Figure: histogram of the scores (x-axis: score, y-axis: counts) for read and not read pairs. Annotated pairs: Giacomo, Harry Potter; Ayla, The Da Vinci Code; Bram, Lord of the Rings; Koen, Pattern Recognition.]
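For reference, the AUC can be read as the probability that a randomly chosen "read" pair receives a higher score than a randomly chosen "not read" pair. The sketch below computes it directly from that definition; the function name `auc` and the toy numbers are illustrative assumptions, not the data behind the 81% reported on the slide.

```python
import numpy as np


def auc(labels, scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    labels = np.asarray(labels).ravel().astype(bool)
    scores = np.asarray(scores, dtype=float).ravel()
    pos, neg = scores[labels], scores[~labels]
    # Compare every positive score with every negative score.
    diff = pos[:, None] - neg[None, :]
    return ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / (pos.size * neg.size)


# Toy example: four pairs, two of them read.
example = auc([1, 1, 0, 0], [0.9, 0.3, 0.5, 0.1])
```

This pairwise comparison is quadratic in the number of pairs, which is fine for small matrices; rank-based formulas give the same value faster on large data.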
PRIOR KNOWLEDGE ON THE PERSONS
[Figure: network of persons A to K; edges: knows; node legend: female / male, likes fiction / likes non-fiction.]
PRIOR KNOWLEDGE ON THE BOOKS
[Figure: books annotated by genre: Fantasy, Magic realism, Thriller, Popular science, CS; Fiction / Nonfiction.]
PAIRWISE PREDICTION BASED ON FEATURES
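Prediction by weighting observations based on similarity of persons and books:
$$F_{ij} = \sum_{k,l} a_{ik}\, b_{jl}\, Y_{kl}$$
where $a_{ik}$ is the similarity between person i and person k, and $b_{jl}$ is the similarity between book j and book l.
More general:
$$f(\text{person}, \text{book}) = h(\phi(\text{person}), \psi(\text{book}))$$
where $\phi$ and $\psi$ are descriptions of the person and the book.
Exact and efficient algorithms for learning the parameters.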
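The double sum over k and l is a product of three matrices, $F = A Y B^\top$, which avoids looping over all pairs of pairs. A minimal sketch, assuming the similarities are stored in matrices A (persons × persons) and B (books × books); the function names and the random test data are hypothetical.

```python
import numpy as np


def pairwise_predict(A, B, Y):
    """F_ij = sum_{k,l} a_ik * b_jl * Y_kl, computed as A @ Y @ B.T."""
    return A @ Y @ B.T


def pairwise_predict_naive(A, B, Y):
    """Same scores with explicit loops, for checking only."""
    F = np.zeros((A.shape[0], B.shape[0]))
    for i in range(A.shape[0]):
        for j in range(B.shape[0]):
            for k in range(Y.shape[0]):
                for l in range(Y.shape[1]):
                    F[i, j] += A[i, k] * B[j, l] * Y[k, l]
    return F


rng = np.random.default_rng(1)
A = rng.random((3, 3))                         # similarities between 3 persons
B = rng.random((4, 4))                         # similarities between 4 books
Y = (rng.random((3, 4)) < 0.5).astype(float)   # observed labels
F = pairwise_predict(A, B, Y)
```

The same expression also scores new persons or new books: a row of A then holds the similarities of a new person to the persons in the data.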
FOUR SETTINGS FOR PAIRWISE PREDICTION
Setting A: same persons, same books
Setting B: new persons, same books
Setting C: same persons, new books
Setting D: new persons, new books
[Figure: persons × books matrix extended with new persons (rows) and new books (columns), divided into blocks A, B, C and D.]
CROSS VALIDATION IN THE FOUR SETTINGS
[Figure: matrix with the pair (i, j) highlighted; legend: withheld for testing / discarded.]
Setting A: leave out each pair
Setting B: leave out each row
Setting C: leave out each column
Setting D: leave out each pair, discard other pairs in row and column
Exact and efficient formulas for computing the leave-one-out values!
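The four hold-out schemes can be written as boolean masks over the label matrix. The sketch below only illustrates which entries are tested, discarded or kept for training in each setting; it is the naive bookkeeping, not the exact and efficient formulas the slide refers to, and the function name `holdout_masks` is hypothetical.

```python
import numpy as np


def holdout_masks(shape, setting, i=None, j=None):
    """Return (test, discarded, train) boolean masks for one hold-out step."""
    test = np.zeros(shape, dtype=bool)
    discarded = np.zeros(shape, dtype=bool)
    if setting == "A":    # leave out the pair (i, j)
        test[i, j] = True
    elif setting == "B":  # leave out row i (a new person)
        test[i, :] = True
    elif setting == "C":  # leave out column j (a new book)
        test[:, j] = True
    elif setting == "D":  # leave out (i, j), discard the rest of row i and column j
        test[i, j] = True
        discarded[i, :] = True
        discarded[:, j] = True
        discarded[i, j] = False
    else:
        raise ValueError("setting must be one of 'A', 'B', 'C', 'D'")
    train = ~(test | discarded)
    return test, discarded, train


# Setting D on a 3 x 4 matrix: one test pair, five discarded, six for training.
test, discarded, train = holdout_masks((3, 4), "D", i=1, j=2)
```

In setting D the discarded entries share a person or a book with the test pair, so keeping them would leak information about that person or book into training.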
RANKING OF BOOKS
For a given person, find the most relevant books in a database S of books.
Solve the following optimization problem:
$$\text{book}^* = \arg\max_{\text{book} \in S} f(\text{person}, \text{book})$$
THE THRESHOLD ALGORITHM
Search for the best books in an exact and efficient way.
Example: I like books on biology, science fiction and robots.
Stock et al. (2016)
[Figure: three lists of books, one each for biology, science fiction and robots, sorted from more relevant to less relevant.]
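The idea of walking down several sorted lists in parallel can be sketched as follows. This is a generic threshold-style top-K search under stated assumptions, not the exact algorithm of Stock et al. (2016): each book u has a feature vector P[u], the query is a vector q of non-negative weights, and the score is the dot product of the two. All names and the random data are hypothetical.

```python
import heapq
import numpy as np


def threshold_top_k(P, q, K=1):
    """Exact top-K items by score P @ q, assuming all query weights q_r >= 0."""
    P = np.asarray(P, dtype=float)
    q = np.asarray(q, dtype=float)
    n_items, R = P.shape
    # One list per feature: item indices sorted from most to least relevant.
    lists = np.argsort(-P, axis=0)
    seen = set()
    top = []  # min-heap of (score, item); the weakest of the current top-K is top[0]
    for d in range(n_items):
        for r in range(R):
            u = int(lists[d, r])
            if u not in seen:
                seen.add(u)
                score = float(P[u] @ q)
                if len(top) < K:
                    heapq.heappush(top, (score, u))
                elif score > top[0][0]:
                    heapq.heapreplace(top, (score, u))
        # Items not yet seen sit below depth d in every list, so none of them
        # can score higher than this threshold.
        threshold = float(sum(q[r] * P[lists[d, r], r] for r in range(R)))
        if len(top) == K and top[0][0] >= threshold:
            break
    return sorted(top, reverse=True)


rng = np.random.default_rng(2)
P = rng.random((50, 3))        # 50 books; features: biology, science fiction, robots
q = np.array([1.0, 0.5, 2.0])  # how much the reader cares about each topic
best = threshold_top_k(P, q, K=5)
```

The search is exact because it stops only once the threshold proves that no unseen book can enter the top K; it is efficient because this usually happens long before every book has been scored.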
THE MANY APPLICATIONS OF PAIRWISE LEARNING IN BIOSCIENCE ENGINEERING
De Clercq et al. (2015)
Costello et al. (2014) Stock et al. (2014)
Gonnelli et al. (2015) Van Peer et al. (2016)
Stock et al. (2013) Stock et al. (2017)
COMMON THEMES
➤ Incorporating prior knowledge
➤ Use of ranking
➤ Proper model evaluation
➤ Efficiency of algorithms
[Excerpt from the thesis, page 174, Chapter 5: Efficient exact top-K inference in pairwise prediction]
Input: q(v), p(u), upperBound, lowerBound, L_1, …, L_R, d
Output: score or fail
score ← upperBound
for r ← 1 to R do
    score ← score − q_r(v) p_r(u_{L_r(d)})
    score ← score + q_r(v) p_r(u)
    if score ≤ lowerBound then
        break and return a fail /* item u will not improve S^K_v */
    end if
end for
Alg. 5.3: Calculating the scores for the Partial Threshold Algorithm
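A direct Python transcription of Alg. 5.3 might look like the sketch below. The argument names are hypothetical, the values $p_r(u_{L_r(d)})$ at the current depth d are assumed to be passed in as a precomputed sequence `p_bound`, and a fail is represented by returning `None`.

```python
def partial_score(q_v, p_u, p_bound, upper_bound, lower_bound):
    """Score of item u, or None (fail) once it can no longer beat lower_bound.

    q_v         : query weights q_r(v), r = 1..R
    p_u         : item features p_r(u), r = 1..R
    p_bound     : features p_r of the item at depth d in list L_r, r = 1..R
    upper_bound : sum_r q_r(v) * p_bound[r], the best score still possible
    lower_bound : score of the weakest item in the current top-K
    """
    score = upper_bound
    for q_r, p_r_u, p_r_bound in zip(q_v, p_u, p_bound):
        score -= q_r * p_r_bound  # remove the optimistic term for list r
        score += q_r * p_r_u      # add the true contribution of item u
        if score <= lower_bound:
            return None           # item u will not improve the top-K set
    return score


q_v = [1.0, 1.0, 1.0]
p_bound = [0.9, 0.8, 0.7]
upper = sum(q * p for q, p in zip(q_v, p_bound))
kept = partial_score(q_v, [0.9, 0.1, 0.7], p_bound, upper, lower_bound=1.0)
dropped = partial_score(q_v, [0.9, 0.1, 0.7], p_bound, upper, lower_bound=2.0)
```

Starting from the upper bound and replacing one term at a time lets the loop stop as soon as the running score drops to the lower bound, without evaluating the remaining terms.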
5.3 Recommending recipes: an illustrative example
Everyone likes to eat. It is deciding what to eat that is often the hard problem. In this section I will try to further clarify the Threshold Algorithm by developing a toy recipe recommendation engine. The idea is simple: we have a set of ingredients in our fridge that we would like to combine, and we want to find a suitable recipe for this. Suppose our query is v = {egg, chocolate, bacon}. We also have a database of recipes from which to select a suitable recipe. Each recipe is described only by the presence or absence of particular ingredients. The database of nine recipes is given below.
SPECIES INTERACTION NETWORKS
Interactions in nature: parasitism, predation, pollination
Sampling of species interactions: [figure]
FINDING INTERACTING SPECIES = FINDING INTERESTING BOOKS
For example: plant-pollinator interactions
pollinators => persons
plants => books
pollination => reading
FILTER REVEALS FALSE NEGATIVES
$$\text{precision} = \frac{\#\,\text{interactions}}{\text{size top}}$$
[Figure: improvement compared to random selection.]
Stock et al. (2017)
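The precision of a top list can be computed as sketched below. The comparison with random selection is expressed here as the ratio between that precision and the overall fraction of interactions, which is an assumption about how the improvement is measured; the function names and toy numbers are hypothetical.

```python
import numpy as np


def precision_at_top(scores, is_interaction, top_size):
    """Fraction of true interactions among the top_size highest-scoring pairs."""
    scores = np.asarray(scores, dtype=float).ravel()
    is_interaction = np.asarray(is_interaction).ravel().astype(bool)
    top = np.argsort(-scores)[:top_size]
    return is_interaction[top].sum() / top_size


def improvement_over_random(scores, is_interaction, top_size):
    """Precision of the top list divided by the precision of a random pick."""
    baseline = np.asarray(is_interaction).astype(bool).mean()
    return precision_at_top(scores, is_interaction, top_size) / baseline


# Toy example: four candidate pairs, two of which are true interactions.
scores = [0.9, 0.8, 0.1, 0.4]
truth = [1, 0, 0, 1]
p_top2 = precision_at_top(scores, truth, top_size=2)
gain_top1 = improvement_over_random(scores, truth, top_size=1)
```

A ratio above 1 means that checking the highest-scoring pairs finds more interactions than checking the same number of randomly chosen pairs.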
CONTRIBUTIONS OF THIS PHD
1. Developing an exact and efficient toolkit for training and evaluating pairwise models.
2. Linking pairwise learning with other machine learning paradigms (transfer learning, matrix factorization, collaborative filtering, etc.).
3. Exploring new applications of pairwise learning in the life sciences.
THANKS FOR CONTRIBUTING DATA
Altynay, Ayla, Bart, Bram, Cedric, Chaïm, Cons, Daan, Dimitri, Elina, Esther, Francis, Giacomo, Griet, Gustavo, Hannes, Ilse, Ine, Ira, Isabel, Joke, Joni, Joris, Kate, Kevin, Klaas, Kobe, Koen, Laura, Lot, Louis, Marc, Marjolein, Marlies, Michaël, Niels, Niels, Olivier J, Olivier T, Peter B, Peter R, Phaedra, Raul, Simon, Sofie L, Sofie VG, Stijn, Thijs, Tilde, Tim, Tinne, Wim, Wouter & Zeno