1
Budgeted Nonparametric Learning from Data
Streams
Ryan Gomes and Andreas Krause
California Institute of Technology
Application Examples
Clustering Millions of Internet Images
Torralba et al. 80 Million tiny images. IEEE PAMI Nov. 2008
2
Application Examples
Nonlinear Regression in Embedded Systems
(Figure axes: Control Input vs. Actuator State)
3
Data Streams
• Can’t access the data set all at once
• Can’t control the order of data access (random access may be available)
Charikar et al. Better streaming algorithms for clustering problems. STOC 2003
4
Data Streams
• V_t: elements available at iteration t
• β: maximum wait until an element is revisited
5
Nonparametric Methods
• Highly flexible: use training examples to make predictions
• In a streaming environment: select a budget of K examples to do prediction
6
Problem Statement
• Active set at iteration t: S_t ⊆ V_t, with |S_t| ≤ K
• Monotone utility function F: F(A) ≤ F(B) when A ⊆ B
• Given the sequence of available elements V_1, V_2, …, maintain active sets S_1, S_2, …, where the final active set S_T satisfies: F(S_T) is (approximately) maximal among sets of size K
7
Exemplar Based Clustering
8
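The exemplar-clustering utility F_C can be sketched as the reduction in quantization loss relative to a fixed baseline exemplar. This is a minimal sketch: the function names, the squared-Euclidean distance, and the baseline construction are illustrative assumptions, not the paper's exact definition.

```python
import numpy as np

def quantization_loss(X, exemplar_idx):
    # average squared distance from each point to its nearest exemplar
    d2 = ((X[:, None, :] - X[exemplar_idx][None, :, :]) ** 2).sum(axis=2)
    return d2.min(axis=1).mean()

def clustering_utility(X, active_set, baseline_idx):
    # F_C(A): loss reduction relative to a single fixed baseline exemplar;
    # defined this way, F_C is monotone (adding exemplars never increases loss)
    return quantization_loss(X, [baseline_idx]) - quantization_loss(X, active_set)
```

Subtracting from a fixed baseline makes the utility nonnegative and monotone, which is the form the guarantees later in the talk require.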
Gaussian Process Regression
• Utility F_H: information gain
M. Seeger et al. Fast forward selection to speed up sparse Gaussian process regression. AISTATS 2003
9
Gaussian Process Regression
• Utility F_V: expected variance reduction
10
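A rough sketch of an expected-variance-reduction utility for a GP with an RBF kernel: total prior variance at a set of reference points minus the posterior variance after conditioning on the active set. The kernel choice, noise level, and all function names are illustrative assumptions.

```python
import numpy as np

def rbf(A, B, ell=1.0):
    # squared-exponential kernel with k(x, x) = 1
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * ell ** 2))

def variance_reduction(X_ref, X_active, noise=0.1):
    # F_V(A): sum over reference points of (prior variance - posterior
    # variance) after observing the active set A with noisy measurements
    K_aa = rbf(X_active, X_active) + noise * np.eye(len(X_active))
    K_ra = rbf(X_ref, X_active)
    post_var = 1.0 - np.einsum('ij,jk,ik->i', K_ra, np.linalg.inv(K_aa), K_ra)
    return (1.0 - post_var).sum()
```

Because GP posterior variance never increases as observations are added, this utility is monotone in the active set.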
Submodularity
If A ⊆ B ⊆ V and x ∉ B, then
F(A ∪ {x}) − F(A) ≥ F(B ∪ {x}) − F(B)
(“diminishing returns”: adding x to the smaller set A gives the greater change; adding it to the larger set B gives the smaller change)
F_C, F_V, and F_H are all submodular!
11
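The diminishing-returns property is easy to check numerically on a toy coverage function, a standard submodular example; the sets below are made up for illustration.

```python
def coverage(sets, A):
    # F(A) = number of ground elements covered by the chosen sets
    return len(set().union(*(sets[i] for i in A))) if A else 0

sets = [{1, 2, 3}, {3, 4}, {4, 5, 6}]
A, B, x = [0], [0, 1], 2                              # A ⊆ B, x ∉ B
gain_A = coverage(sets, A + [x]) - coverage(sets, A)  # greater change
gain_B = coverage(sets, B + [x]) - coverage(sets, B)  # smaller change
assert gain_A >= gain_B
```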
StreamGreedy
Repeat:
1. Observe the available elements V_t (while |S_t| < K, add elements greedily).
2. Find the best exchange: the pair (s, v), s ∈ S_t, v ∈ V_t, maximizing F((S_t \ {s}) ∪ {v}).
3. Perform the exchange if it improves F.
Until F has not improved for a fixed number of consecutive iterations.
12
Optimality of StreamGreedy
• Clustering-consistency
• F_C, F_V, and F_H are clustering-consistent when the data consists of very well-separated clusters
• Preferable to select an exemplar from a new cluster rather than two from the same cluster
13
Optimality of StreamGreedy
Theorem: If F is monotone, submodular, and clustering-consistent, then StreamGreedy finds the optimal solution after a bounded number of iterations.
14
Approximation Guarantee
• Typically, data does not consist of well-separated clusters
• Maximizing F is NP-hard in general
Theorem: Assume F is monotone submodular and bounded by a constant B. Then StreamGreedy finds a solution within a constant factor of optimal after a bounded number of iterations.
15
Limited Stream Access
• Approximate the utility functions (e.g., F_C and F_V) with a uniform subsample approximation: a “validation set”
• Evaluates F to within ε accuracy
16
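The uniform-subsample idea can be sketched in one function: evaluate the quantization loss on a fixed random "validation set" rather than on all data seen so far. Shown here on 1-D data with absolute distance; the function name, the sample size `m`, and the distance are illustrative assumptions.

```python
import random

def approx_quantization_loss(X, exemplars, m=200, seed=0):
    # estimate the quantization loss on a uniform random "validation set"
    # of at most m points instead of the full stream history
    rng = random.Random(seed)
    W = rng.sample(range(len(X)), min(m, len(X)))
    return sum(min(abs(X[i] - X[c]) for c in exemplars) for i in W) / len(W)
```

Fixing the seed keeps the validation set stable across utility evaluations, so swap comparisons inside the greedy loop remain consistent.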
Approximation Guarantee
• In practice, we may only be able to approximately evaluate F
Theorem: Assume F is monotone submodular and may be evaluated to ε-precision. Further, assume F is bounded by a constant B. Then StreamGreedy finds a near-optimal solution (with additive error depending on ε) after a bounded number of iterations.
17
MNIST Convergence
• Convergence rate comparable to online k-means
• Quantization performance difference due to the exemplar constraint
18
Validation Set Size
(Figure: example-based centers vs. unconstrained centers)
• Good performance with small validation sets
• Larger validation set needed for a larger number of clusters K
19
Tiny Images
• > 1.5 million 28 × 28 pixel RGB images
(Figure: StreamGreedy vs. Online K-means)
• Online K-means finds many singleton or empty clusters
20
Tiny Images
(Figure: StreamGreedy exemplars)
21
Tiny Images
StreamGreedy Cluster Examples
(Figure: nearest to exemplar / randomly chosen, vs. online k-means centers)
22
Run Time vs. Accuracy
• Vary algorithm parameters
• StreamGreedy performance saturates with run time
• Outperforms Online K-means in less time
23
Gaussian Process Regression: Kin-40k dataset
• StreamGreedy outperforms competing methods, but requires a sufficiently large validation set
24
Conclusions
StreamGreedy:
• Flexible framework
• Theoretical performance guarantees
• Exemplar-based clustering with non-metric similarities in a streaming environment
• Leads to efficient algorithms
• Excellent empirical performance
25