
Making the Impossible Possible: Randomized Machine Learning Algorithms for Big Data

Rong Jin

Alibaba Group

Big Data Challenge

• Data in the digital universe
  • 2012: 2.7 zettabytes (10^21 bytes)
  • 2020: 40 zettabytes (projected)

• A huge amount of data is generated on the Internet every minute
  • YouTube users upload 300 hours of video
  • Facebook users share 4 million pieces of content

http://www.fiercebigdata.com/story/how-much-data-created-internet-every-minute/2015-08-14

Too much data to process

Big Data Challenge

High-dimensional data

• E.g., millions of features are used for image classification and online advertising

Why Does Data Size Matter?

Matrix completion

• Classification, clustering, recommender systems

• Performance is measured by recovery error

Why Does Data Size Matter?

• O(rn log²(n)) observed entries: PERFECT recovery

• O(rn log(n)) observed entries: POOR recovery

(Figure: recovery error vs. # observed entries for an unknown rank-r n x n matrix, comparing the O(rn log(n)) and O(rn log²(n)) sampling levels)
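A rough back-of-the-envelope comparison of the two sampling levels in the figure (natural log; all hidden constants taken to be 1):

```python
import math

# Illustrative only: the real bounds hide constant factors.
n, r = 1_000_000, 10                     # n x n matrix of rank r

total = n * n                            # all entries: 1e12
perfect = r * n * math.log(n) ** 2       # O(rn log^2 n): perfect recovery
poor = r * n * math.log(n)               # O(rn log n): poor recovery

print(f"total entries              : {total:.2e}")
print(f"O(rn log^2 n) observations : {perfect:.2e} "
      f"({perfect / total:.3%} of all entries)")
print(f"O(rn log n) observations   : {poor:.2e}")
```

Even the "perfect recovery" level samples well under 1% of the entries, which is why the sample size regime matters so much.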

Why Is Learning from Big Data Hard?

Even computing the data average is non-trivial

• Each matrix M_i is sparse, of size 1B × 1M

• The average matrix Z is much denser and too expensive to store

• Can we compute an approximate average Z’ without computing Z explicitly?

Why Is Learning from Big Data Hard?

Turn matrix averaging into an optimization problem

• Solved efficiently by stochastic gradient descent (a sketch follows)

• Intermediate solutions stay sparse, with strong guarantees
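A minimal sketch of this idea, with my own step-size choice: SGD on f(Z) = (1/2N) Σ_i ||Z − M_i||_F² converges to the average, while every iterate is a convex combination of only the sparse matrices sampled so far:

```python
import numpy as np
import scipy.sparse as sp

def approx_sparse_average(matrices, num_steps=200, rng=None):
    """SGD on f(Z) = (1/2N) sum_i ||Z - M_i||_F^2, whose minimizer is
    the exact average; Z never has to be materialized densely."""
    rng = rng or np.random.default_rng(0)
    N = len(matrices)
    Z = sp.csr_matrix(matrices[0].shape)       # start from the zero matrix
    for t in range(1, num_steps + 1):
        i = rng.integers(N)                    # sample one sparse matrix
        eta = 1.0 / t                          # this step size makes Z_t the
        Z = (1 - eta) * Z + eta * matrices[i]  # running average of samples
    return Z                                   # support = union of sampled supports

# toy usage: average 50 random sparse matrices without densifying
mats = [sp.random(1000, 1000, density=1e-3, format="csr") for _ in range(50)]
Z_approx = approx_sparse_average(mats)
```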

Why Is Learning from Big Data Hard?

• (x_i, y_i), i = 1, …, n: training examples

• ℓ(·): a convex loss

• Ω: a convex domain
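Putting these pieces together, the problem being solved is the standard constrained empirical risk minimization; a reconstruction in the notation above (the slide's own formula is not recoverable):

```latex
\min_{w \in \Omega}\; \frac{1}{n} \sum_{i=1}^{n} \ell\!\left( y_i\, w^{\top} x_i \right)
```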

Why Is Learning from Big Data Hard?

Requires solving a large-scale optimization problem
• Too many data points (10^9)
• Very high dimensionality (10^8)

Randomized Algorithms for Big Data

Randomized algorithms are efficient

• for large data sets: only one pass over the entire data set is needed

• for high-dimensional data: dimensionality is reduced by random projection


Randomized algorithms are effective

• Minimize the generalization error

Randomized Algorithms for Big Data

Limitations of randomized algorithms

• Random decisions are suboptimal and can be very poor

We will focus our discussion on Random Projection

Random Projection

Random Projection

• Project the data into a random low-dimensional space using a Gaussian random matrix S

Random Projection

• Recover the solution in the high-dimensional space using S^T, the transpose of the Gaussian random matrix

Random Projection

• Good news: if the data is linearly separable with margin γ, on the order of log(n)/γ² random projections are sufficient

Random Projection

(Diagram: data flows from the high-dimensional space to the low-dimensional space, is learned on there, and a solution is recovered back in the high-dimensional space)
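A minimal sketch of the project/learn/recover pipeline in the diagram (variable names are mine; scikit-learn's logistic regression stands in for the low-dimensional learner):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# toy high-dimensional, linearly separable data
n, d, m = 500, 10_000, 200
w_true = rng.standard_normal(d)
X = rng.standard_normal((n, d))
y = np.sign(X @ w_true)

# 1) project into a random low-dimensional space: Gaussian matrix S
S = rng.standard_normal((d, m)) / np.sqrt(m)   # d x m Gaussian random matrix
X_low = X @ S                                  # n x m projected data

# 2) learn in the low-dimensional space
z = LogisticRegression().fit(X_low, y).coef_.ravel()

# 3) recover a high-dimensional solution via S (naive recovery)
w_hat = S @ z
cos = w_hat @ w_true / (np.linalg.norm(w_hat) * np.linalg.norm(w_true))
print("cosine(w_hat, w_true) =", cos)
```

Note that the classifier in step 2 can predict well even when w_hat itself is a poor approximation of the true solution; that gap is exactly what the impossibility result below formalizes.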

Random Projection

• The recovered solution is a poor approximation of the optimal solution

Random Projection

• Impossibility theorem: for most random projections S, the solution recovered through S remains far from the optimal solution

Is it possible to overcome the limitation of random projection while enjoying its simplicity?

Randomized Algorithms for Big Data

Limitations of randomized algorithms
• Random decisions are suboptimal and can be very poor

How can we overcome the fundamental limitations of randomized algorithms in ML?

Dual Random Projection

Three steps (sketched in code after this list):
1. Random projection
2. Compute dual variables
3. Dual recovery
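A minimal sketch of the three steps, assuming a common instantiation: ℓ2-regularized ERM with a logistic loss (the slides do not fix the loss; variable names are mine):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)

# toy problem: l2-regularized logistic regression
n, d, m, lam = 400, 5_000, 100, 1e-2
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))

def objective(w, X, y, lam):
    # lam/2 ||w||^2 + (1/n) sum_i log(1 + exp(-y_i <x_i, w>))
    return lam / 2 * w @ w + np.mean(np.logaddexp(0.0, -y * (X @ w)))

# Step 1: random projection -- solve the problem in an m-dim sketch
S = rng.standard_normal((d, m)) / np.sqrt(m)
X_low = X @ S
z = minimize(objective, np.zeros(m), args=(X_low, y, lam)).x

# Step 2: dual variables from the LOW-dimensional solution:
#         alpha_i = loss'(y_i <S^T x_i, z>) for the logistic loss
alpha = -1.0 / (1.0 + np.exp(y * (X_low @ z)))

# Step 3: dual recovery in the ORIGINAL d-dim space, via the
#         optimality condition  w = -(1/(lam*n)) sum_i alpha_i y_i x_i
w_hat = -(X.T @ (alpha * y)) / (lam * n)
```

The point of step 3 is that w_hat lives in the original d-dimensional space: the duals computed from the sketch re-weight the original examples.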

Dual Random Projection

Recovery property

• If the data matrix X can be well approximated by a rank-r matrix, then with high probability the recovered solution is close to the optimal solution

Why Does Dual Random Projection Work?

• Although the primal solution can’t be recovered accurately via random projection, the dual variables can be

• It is closely related to gradient descent: by the optimality condition, the primal solution is a weighted sum of the training examples, with the weights given by the dual variables

Iterative Dual Random Projection

• Repeating the project-solve-recover cycle shrinks the recovery error geometrically in the number of iterations, with high probability
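A hedged statement of the guarantee, in the geometric form it takes in the dual random projection literature (my notation; constants omitted):

```latex
\left\lVert \hat{w}_t - w_* \right\rVert_2
\;\le\; \left( \frac{\epsilon}{1-\epsilon} \right)^{t}
\left\lVert w_* \right\rVert_2
\qquad \text{with high probability,}
```

where t is the number of project-solve-recover rounds and ε < 1/2 depends on the rank r and the number of random projections.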

Experiment with Synthetic Dataset

• N = 50,000, d = 20,000, r = 10

Experiment with RCV1 Dataset

• 800K documents, 40,000 features

Fine-Grained Visual Classification

• Fine-Grained Challenge 2013 (https://sites.google.com/site/fgcomp2013)

• Categories: aircraft, birds, dogs, shoes, cars

• Number of training images: 100K

Fine-Grained Visual Classification

• # Visual features: 134,016

• Our approach is based on metric learning

• Apply dual random projection to improve computational efficiency

Team                               Performance
Inria-Xerox                        77.1
CafeNet                            75.8
VisionMetric (our method)          71.7
Symbiotic (University of Oxford)   71.6
CognitiveVision (MSR)              70.0
DPD_Berkeley (Berkeley)            69.2
MPG (University of Tokyo)          52.9
Infor_FG (CMU)                     16.0
InterfAIce (UIUC)                  4.5

Online Display Ads

Advertiser
• Markets its products

User
• Finds products/services

Platform
• Attracts enough traffic

Online Display Ads

Advertiser
• Chooses the target audience by selecting appropriate tags

Platform
• Matches users with ads through tags

Users
• Profiled by tag assignments
• Assigned the tags with the largest scores (greedy approach)

(Diagram: Tag 1, Tag 2, …, Tag n)
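A tiny sketch of the greedy rule just described, on hypothetical scores: each user independently keeps its top-k tags, with no coordination across users, which is what creates the mismatch discussed next:

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical user-by-tag relevance scores: 6 users, 4 tags
scores = rng.random((6, 4))
k = 2                                      # tags kept per user

# greedy profile: each user takes its k highest-scoring tags
greedy_tags = np.argsort(-scores, axis=1)[:, :k]
print(greedy_tags)                         # popular tags get oversubscribed
```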

Supply & Demand Mismatch

Advertisers
• Limited budget → limited supply of tags

Platform
• Matches users with ads through tags

Users
• Profiled by tag assignments
• Assigned the tags with the largest scores (greedy approach)

(Diagram: one tag with supply 5000 vs. demand 1000, another with supply 1000 vs. demand 5000)

Supply and Demand Mismatch (I)

• Assume consumers a and b arrive in random order
• On average, 50% of the time b can’t find a matched ad

Advertiser   Budget   Score for a   Score for b
A            1        1.1           1
B            1        1             0

(Diagram: the two arrival orders, b-then-a and a-then-b, matched against ads A and B)

Supply and Demand Mismatch (II)

• Alternative solution: remove a from the target audience of ad A
• Both a and b then find their matched ad regardless of arrival order (simulated below)

Advertiser   Budget   Score for a   Score for b
A            1        –             1
B            1        1             0

(Diagram: with a excluded from A, either arrival order matches both consumers)
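A small simulation of the worked example above (scores and budgets from the tables; greedy assigns each arriving consumer to the highest-scoring ad that still has budget):

```python
import random

def unmatched_rate(scores, trials=10_000):
    unmatched = 0
    for _ in range(trials):
        budget = {"A": 1, "B": 1}
        for c in random.sample(["a", "b"], 2):   # random arrival order
            ads = [ad for ad in budget
                   if budget[ad] > 0 and scores[ad][c] > 0]
            if not ads:
                unmatched += 1                   # consumer finds no ad
            else:
                budget[max(ads, key=lambda ad: scores[ad][c])] -= 1
    return unmatched / trials

naive = {"A": {"a": 1.1, "b": 1}, "B": {"a": 1, "b": 0}}
fixed = {"A": {"a": 0.0, "b": 1}, "B": {"a": 1, "b": 0}}  # a removed from A

print("greedy targeting :", unmatched_rate(naive))  # ~0.5: b fails half the time
print("a removed from A :", unmatched_rate(fixed))  # 0.0: everyone matched
```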

Supply and Demand Mismatch in Alibaba

• Many targets with strong demand (i.e. consumers) but weak supply (i.e. advertisement budgets)

• Many targets with weak demand (i.e. consumers) but strong supply (i.e. advertisement budgets)

Minimize Mismatch: Global Optimization

• Find the best assignment of tags that
  1. maximizes the revenue, and
  2. minimizes the supply and demand mismatch

• A gigantic optimization problem
  • Billions of users and thousands of tags
  • Solutions must be found within 2 hours
  (one way to write this down is sketched after the diagram below)

(Diagram: assignment matrix A between users u1, u2, …, un (10^9 users) and tags a1, a2, …, am (10^5 tags))
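One plausible way to write down the problem just described; every symbol here is mine (s_ij: relevance of tag j to user i, b_j: budgeted supply for tag j, k: tags kept per user, μ: the revenue-vs-mismatch trade-off), since the slides do not give the production formulation:

```latex
\max_{A \in \{0,1\}^{n \times m}}\;
\underbrace{\sum_{i=1}^{n}\sum_{j=1}^{m} s_{ij}\,A_{ij}}_{\text{revenue proxy}}
\;-\; \mu \sum_{j=1}^{m}\Bigl(\sum_{i=1}^{n} A_{ij} - b_j\Bigr)^{2}
\qquad \text{s.t.}\quad \sum_{j=1}^{m} A_{ij} \le k \quad \forall i.
```

With 10^9 users this is exactly the "gigantic" problem the slide mentions; the dual-random-projection step on the next slide is what makes it tractable.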

Minimize Mismatch: Global Optimization

• Apply dual random projection to efficiently find the solution for A

(Diagram: the same user-tag assignment matrix A, with users (10^9) and ads (10^4); random projection is applied, and the optimal solution and dual variables are obtained)

Implementation

• Implemented with MapReduce (pattern sketched below)
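A minimal sketch of the MapReduce pattern for the dual-recovery sum, with plain Python's map/reduce standing in for the framework (the shard layout and the lam, n values are my assumptions):

```python
import numpy as np
from functools import reduce

def mapper(shard):
    # each mapper emits the partial sum  sum_i alpha_i * x_i  over its shard
    return sum(alpha * x for alpha, x in shard)

def reducer(a, b):
    # the reducer adds partial sums from different mappers
    return a + b

# toy usage: 3 shards of (dual variable, feature vector) pairs
rng = np.random.default_rng(3)
shards = [[(rng.random(), rng.standard_normal(8)) for _ in range(100)]
          for _ in range(3)]

partials = map(mapper, shards)       # map phase (parallel in practice)
total = reduce(reducer, partials)    # reduce phase
lam, n = 1e-2, 300                   # made-up regularization and data size
w_hat = -total / (lam * n)           # primal recovery from the duals
```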

Results in Online Display Ads

• Reduced the supply and demand mismatch

(Figure: supply-demand mismatch before vs. after optimization)

What Is Next?

• Impossibility theorems exist for many randomized algorithms in ML:
  • Passive learning
  • Active learning
  • Data clustering
  • Matrix completion
  • Differential privacy
  • Compressive sensing
  • Low-rank matrix approximation
  • ……
