Parikshit Ram – Senior Machine Learning Scientist, Skytree, at MLconf ATL
DESCRIPTION
Max-kernel search: How to search for just about anything? Nearest neighbor search is a well-studied and widely used task in computer science, and it is pervasive in everyday applications. While search is not synonymous with learning, search is a crucial tool for the most nonparametric form of learning. Nearest neighbor search can be used directly for all kinds of learning tasks: classification, regression, density estimation, and outlier detection. Search is also the computational bottleneck in various other learning tasks such as clustering and dimensionality reduction. Key to nearest neighbor search is the notion of “near”-ness or similarity. Mercer kernels form a class of general nonlinear similarity functions and are widely used in machine learning. They can define a notion of similarity between pairs of objects of any arbitrary type and have been successfully applied to a wide variety of object types: fixed-length data, images, text, time series, and graphs. I will present a technique to do nearest neighbor search with this class of similarity functions provably efficiently, hence facilitating faster learning for larger data.
TRANSCRIPT
Max-kernel search: How to search for just about anything?
Parikshit Ram
Similarity search
● Set of objects R
● Query q
● Similarity function
Finding similar images
Drug discovery
http://fineartamerica.com
Movie recommendations
Similarity search is ubiquitous
● Machine learning
● Computer vision
● Theory
● Databases
● Information retrieval
● Web application
● Collaborative filtering
● Scientific computing
Search-based classification
k-nearest-neighbor classification/regression
Example labels: “RomCom fan”, “Kids movie fanatic”
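As a rough illustration of search-based classification, the sketch below labels a new viewer by majority vote among the k most similar labeled viewers. The viewer data, the function names, and the choice of Jaccard set overlap as the similarity function are all assumptions made for this example; they are not taken from the talk.

```python
from collections import Counter


def jaccard(a, b):
    """Set-overlap similarity (an illustrative choice of similarity function)."""
    return len(a & b) / len(a | b)


def knn_classify(query, labeled_objects, similarity, k=3):
    """Label `query` by majority vote among its k most similar labeled objects."""
    # Rank every (object, label) pair by similarity to the query, highest first.
    ranked = sorted(labeled_objects,
                    key=lambda pair: similarity(query, pair[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical viewers, each described by the set of movies they liked.
viewers = [
    ({"Notting Hill", "Love Actually", "Amelie"}, "RomCom fan"),
    ({"Love Actually", "Pretty Woman"}, "RomCom fan"),
    ({"Notting Hill", "Pretty Woman", "Amelie"}, "RomCom fan"),
    ({"Toy Story", "Frozen", "Shrek"}, "Kids movie fanatic"),
    ({"Frozen", "Shrek", "Up"}, "Kids movie fanatic"),
]

new_viewer = {"Toy Story", "Up", "Shrek"}
predicted = knn_classify(new_viewer, viewers, jaccard, k=3)
print(predicted)
```

Nothing is trained here: the only work at prediction time is the search for the most similar objects, which is why search efficiency matters so much for this family of methods.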
Search-based outlier detection
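One common way to turn search into outlier detection is to score each object by the distance to its k-th nearest neighbor, so that isolated objects get large scores. The sketch below assumes Euclidean distance and a tiny made-up point set; the talk does not specify which scoring rule it has in mind.

```python
import math


def knn_outlier_scores(points, k=2):
    """Score each point by the distance to its k-th nearest neighbor."""
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, other)
                       for j, other in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Four tightly grouped points and one isolated point (illustrative data).
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1), (5.0, 5.0)]
scores = knn_outlier_scores(points, k=2)
outlier_index = max(range(len(points)), key=scores.__getitem__)
print(outlier_index, scores)
```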
Search-based ML
Advantage
● nonparametric - lets the data speak
● no need to train complex models
Key ingredient
● notion of similarity (domain/data-specific)
Main challenge: efficiency
● Sheer size of the data
● Varied data types
Properties of similarity functions
● symmetry
○ The dissimilarity is the size of the set-theoretic difference
● self-similarity
○ We do not really care about this.
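To make the symmetry and self-similarity properties concrete, the sketch below uses the size of the set-theoretic difference as a dissimilarity. The two sets and the function name are hypothetical; the point is only that this dissimilarity depends on the order of its arguments, while an object still has zero dissimilarity to itself.

```python
def set_difference_dissimilarity(a, b):
    """Number of elements of `a` that are missing from `b`."""
    return len(a - b)


# Hypothetical sets that share a single element.
a = {"x1", "x2", "x3", "shared"}
b = {"y1", "shared"}

d_ab = set_difference_dissimilarity(a, b)  # elements of a not in b
d_ba = set_difference_dissimilarity(b, a)  # elements of b not in a
d_aa = set_difference_dissimilarity(a, a)  # an object compared with itself

print(d_ab, d_ba, d_aa)
```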
● Metrics: used everywhere
● Bregman divergences: widely used for distributions
● Mercer kernels: widely used in ML for a variety of objects and problems
● ???: not quite explored in search or ML
Breadth of Kernel Functions
Objects | Kernel functions
Images | linear, polynomial, Gaussian, pyramid match
Documents | cosine
Sequences | p-spectrum kernel, alignment score
Trees | subtree, syntactic, partial tree
Graphs | random walk
Time series | cross-correlation, dynamic time warping
Natural language | convolution, decomposition, lexical semantic
What is a Kernel Function?
In words: a pairwise symmetric function
● Correlation in a richer but hidden feature space
● Cannot access the hidden space
[Figure: a hidden mapping from the object space to the hidden space]
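A small worked example may help here. For the homogeneous degree-2 polynomial kernel on 2-D vectors, the hidden mapping happens to be easy to write down, so one can check that the kernel value equals an ordinary inner product in the hidden space. This kernel and the name `hidden_map` are chosen purely for illustration; for many kernels (the Gaussian kernel, for instance) the hidden space is infinite-dimensional and only the kernel value itself can be computed.

```python
import math


def poly2_kernel(x, y):
    """Homogeneous degree-2 polynomial kernel: K(x, y) = (x . y)^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2


def hidden_map(x):
    """Explicit feature map for poly2_kernel on 2-D inputs."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)


def dot(u, v):
    """Ordinary inner product of two equal-length tuples."""
    return sum(a * b for a, b in zip(u, v))


x, y = (1.0, 2.0), (3.0, -1.0)
kernel_value = poly2_kernel(x, y)                  # computed in the object space
hidden_value = dot(hidden_map(x), hidden_map(y))   # computed in the hidden space
print(kernel_value, hidden_value)
```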
Max-kernel Search
Find the object in R most similar to q with respect to a kernel
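Stated as code, the problem is an arg-max of the kernel over the object set. The sketch below is the brute-force baseline that evaluates the kernel against every object; the Gaussian kernel, the sample points, and the function names are assumptions for illustration only.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel, used here only as an example kernel."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-squared_distance / (2.0 * bandwidth ** 2))


def max_kernel_search(query, objects, kernel):
    """Return (best_object, best_value) by evaluating the kernel on every object."""
    best = max(objects, key=lambda p: kernel(query, p))
    return best, kernel(query, best)


# Hypothetical object set R and query q.
R = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.5), (-1.0, 3.0)]
q = (1.8, 0.4)

best, value = max_kernel_search(q, R, gaussian_kernel)
print(best, value)
```

The cost of this baseline grows linearly with the size of R for every query, which is the cost an index-based method aims to avoid.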
Existing methods
● Brute-force (parallel/distributed)
○ Domain-specific optimizations
● Coerce data to use metrics
○ Only approximate
No standard search tools!
Understanding kernels
If two objects are similar to each other,
then they are about equally similar to the query q
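One standard way to make this intuition precise follows from the Cauchy-Schwarz inequality in the hidden space: |K(q, p) - K(q, p')| <= sqrt(K(q, q)) * d_K(p, p'), where d_K(p, p') = sqrt(K(p, p) + K(p', p') - 2 K(p, p')) is the kernel-induced distance. The talk does not spell out which bound it uses, so treat this as an assumed formalization. The sketch below checks the inequality numerically on random points with an illustrative polynomial kernel.

```python
import math
import random


def poly_kernel(x, y, degree=2):
    """Inhomogeneous polynomial kernel (an illustrative Mercer kernel)."""
    return (1.0 + sum(a * b for a, b in zip(x, y))) ** degree


def kernel_distance(p, p2, kernel):
    """Distance between the hidden-space images of p and p2."""
    squared = kernel(p, p) + kernel(p2, p2) - 2.0 * kernel(p, p2)
    return math.sqrt(max(squared, 0.0))  # guard against tiny negative rounding


def similarity_gap_bound(q, p, p2, kernel):
    """Cauchy-Schwarz upper bound on |K(q, p) - K(q, p2)|."""
    return math.sqrt(kernel(q, q)) * kernel_distance(p, p2, kernel)


def random_point(rng, dim=3):
    return tuple(rng.uniform(-1.0, 1.0) for _ in range(dim))


rng = random.Random(0)
violations = 0
for _ in range(1000):
    q, p, p2 = random_point(rng), random_point(rng), random_point(rng)
    gap = abs(poly_kernel(q, p) - poly_kernel(q, p2))
    if gap > similarity_gap_bound(q, p, p2, poly_kernel) + 1e-9:
        violations += 1

print(violations)
```

When p and p' are close in the kernel-induced distance, the bound forces their similarities to q to be close as well, which is what allows groups of nearby objects to be judged together.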
Indexing our collection
Multi-resolution index in O(n log n) time
Cover Tree (BKL 2006)
How to Search with this Index?
[Figure: query q and objects p, p', p'' in the index]
Safely ignore a large chunk (potentially millions)
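The pruning idea can be sketched with a much simpler structure than a cover tree. The code below is an assumed stand-in, not the talk's algorithm: it groups objects into a flat set of balls around given centers, and at query time skips any ball whose best possible kernel value, bounded by K(q, c) + sqrt(K(q, q)) * radius via Cauchy-Schwarz, cannot beat the best value found so far. The Gaussian kernel, the grid-shaped data, and all names are illustrative.

```python
import math


def kernel(x, y):
    """Gaussian kernel with unit bandwidth (illustrative choice)."""
    return math.exp(-0.5 * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_distance(x, y):
    """Distance between the hidden-space images of x and y."""
    squared = kernel(x, x) + kernel(y, y) - 2.0 * kernel(x, y)
    return math.sqrt(max(squared, 0.0))


def build_index(objects, centers):
    """Assign each object to its nearest center and record each ball's radius."""
    balls = [{"center": c, "members": [], "radius": 0.0} for c in centers]
    for p in objects:
        ball = min(balls, key=lambda b: kernel_distance(p, b["center"]))
        ball["members"].append(p)
        ball["radius"] = max(ball["radius"], kernel_distance(p, ball["center"]))
    return balls


def pruned_search(query, balls):
    """Exact max-kernel search that skips balls that cannot hold the answer."""
    query_norm = math.sqrt(kernel(query, query))
    best, best_value, skipped = None, -math.inf, 0
    # Visit the most promising balls first so that pruning starts early.
    order = sorted(balls, key=lambda b: kernel(query, b["center"]), reverse=True)
    for ball in order:
        upper_bound = kernel(query, ball["center"]) + query_norm * ball["radius"]
        if upper_bound < best_value:
            skipped += len(ball["members"])  # no member can beat the current best
            continue
        for p in ball["members"]:
            value = kernel(query, p)
            if value > best_value:
                best, best_value = p, value
    return best, best_value, skipped


# 16 well-separated clusters of 25 points each (made-up data).
grid = [0.0, 4.0, 8.0, 12.0]
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
centers = [(cx, cy) for cx in grid for cy in grid]
objects = [(cx + dx, cy + dy) for cx, cy in centers for dx in offsets for dy in offsets]

index = build_index(objects, centers)
q = (4.13, 8.04)
best, best_value, skipped = pruned_search(q, index)
brute_force = max(objects, key=lambda p: kernel(q, p))
print(best, round(best_value, 4), skipped, best == brute_force)
```

On this clustered data only the ball nearest the query is scanned and the remaining objects are skipped, while the answer still matches the brute-force result. A multi-resolution index such as the cover tree applies the same kind of bound recursively at several scales.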
Results: Efficiency
● Widely applicable algorithm
● Performance is data/kernel-dependent
[Figure: improvement, with axis marks from 10x to 10000x]
Results: Sublinear Query Time
Bigger data implies bigger efficiency gains
[Figure: improvement vs. object set size]
Can We Prove it? What Makes Search Hard?
Thm. For a set R of n objects, the query time is
● expansion constant
○ the distribution of the data
● directional concentration constant
○ the distribution of a kernel-induced transformation of the data
Code/tutorial for Fast Exact Max-Kernel Search
version 1.0.5
http://www.mlpack.org
Ryan R. Curtin
Endnote
● Search is an essential tool for ML
● Exploring different types of similarity functions increases the applicability and quality of search
● Kernels are widely applicable similarity functions
○ now we have provably fast max-kernel search
Email: [email protected]