
Web-Scale Computer Vision Using MapReduce for Multimedia Data Mining

Brandyn White, Tom Yeh, Jimmy Lin, Larry Davis

University of Maryland, College Park
[email protected]

July 25, 2010

Motivation

Massive amount of data available:

- Flickr: ∼4B images

- YouTube: ∼800M videos

- Google Maps: panoramas and maps of most US streets

- Facebook: social graph of 500M users + ∼15B photos

Web-scale computer vision:

- New implementation techniques

- Data ingest and conditioning

- Past conclusions may not hold (Banko and Brill [1])

- Predictable requirements

- Focus on throughput over latency

Motivation

Massive amount of data available:

I Flickr: ∼4B images

I Youtube: ∼800M videos

I Google Maps: Panorama and map of most US streets

I Facebook: Social graph 500M users + ∼15B photos

Web-scale computer vision:

I New implementation techniques

I Data ingest and conditioning

I Past conclusions may not hold (Banko and Brill)

I Predictable requirements

I Focus on throughput over latency

Problem

- Common vision research:
  - ∼10K images
  - MATLAB/C++
  - Single processor
  - Current trade-off: data size vs. run time

- More data than we can process

- Strong results showing that data size is critical (Torralba et al. [20])

- MapReduce: a successful abstraction for large-scale problems

MapReduce

- Programming abstraction: user writes Map and Reduce

- Optional optimization: Combiner (less network transfer)

- Framework: distributed file system, job execution

- Fault tolerant (the abstraction allows for re-execution)

- 'Move code to data'

- Batch processing:
  - Lots of data

- Used on > 1 PB of data at Yahoo, Google, and Facebook
  - 10K+ core cluster at Yahoo

[Figure: MapReduce dataflow — mappers emit key-value pairs, optional combiners pre-aggregate locally, partitioners assign keys to reducers, shuffle-and-sort aggregates values by key, and reducers produce the final output.]
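The map → shuffle-and-sort → reduce dataflow can be simulated in a few lines of Python. This word-count sketch is illustrative only (`run_mapreduce` and the other names are not Hadoop APIs); it shows the contract each stage must satisfy:

```python
from collections import defaultdict

def mapper(doc_id, text):
    # Emit (word, 1) for every word in the document.
    for word in text.split():
        yield word, 1

def reducer(word, counts):
    # Aggregate all counts for one key.
    yield word, sum(counts)

def run_mapreduce(records, mapper, reducer):
    # Simulate shuffle-and-sort: group mapper output by key.
    groups = defaultdict(list)
    for key, value in records:
        for k, v in mapper(key, value):
            groups[k].append(v)
    out = {}
    for k in sorted(groups):  # sort step: keys arrive at reducers in order
        for key, value in reducer(k, groups[k]):
            out[key] = value
    return out

docs = [(0, "a b a"), (1, "b c")]
print(run_mapreduce(docs, mapper, reducer))  # {'a': 2, 'b': 2, 'c': 1}
```

A combiner, when used, has the same signature as the reducer but runs on each mapper's local output before the shuffle.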

Paper Overview

Goal: MapReduce intro for web-scale vision processing

- Example driven

- Provides the thought process and design considerations

- Description, pseudo-code, and Python

Topics:

- Data representation

- Design patterns

- Algorithm design

- Experimentation

Algorithms Covered

Selection Criteria:

- Relevant to large-scale applications

- Diverse

- Non-trivial implementation

Algorithms:

- Clustering

- Bag-of-Features

- Background Subtraction

- Sliding Windows + NMS

- Classifier Training

- Image Registration

Clustering

- Goal: group unlabeled points using a distance metric
  - Aids in data analysis
  - Improves computational efficiency

- Fundamental role in large-scale data processing

- Our method evaluated at scale (> 200 GB)

- Implementation choices matter

K-Means

Input: (id, point) — note: id is unused
  (x, p0) ... (x, p1) ... (x, pn)

Map → (cluster num, point)
  (5, p0) ... (2, p1) ... (5, p2) ... (1, pn) ...

Shuffle:
  (1, [pn, ...]) ... (2, [p1, ...]) ... (5, [p0, p2, ...])

Reduce → (cluster num, centroid)
  (1, c1) ... (2, c2) ... (5, c5) ...
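One iteration of this dataflow can be sketched in Python. The centroids are held as a module-level value here to stand in for a broadcast of the previous iteration's centers; all names and toy values are illustrative, not the paper's code:

```python
import math
from collections import defaultdict

centroids = [(0.0, 0.0), (10.0, 10.0)]  # previous iteration's centers (toy values)

def kmeans_mapper(point_id, point):
    # Assign the point to its nearest centroid; the input id is unused.
    dists = [math.dist(point, c) for c in centroids]
    yield dists.index(min(dists)), point

def kmeans_reducer(cluster_num, points):
    # Recompute the centroid as the mean of all assigned points.
    n = len(points)
    yield cluster_num, tuple(sum(dim) / n for dim in zip(*points))

# Simulate one iteration: map, shuffle by cluster id, reduce.
points = [(None, (1.0, 1.0)), (None, (2.0, 0.0)), (None, (9.0, 11.0))]
groups = defaultdict(list)
for pid, p in points:
    for k, v in kmeans_mapper(pid, p):
        groups[k].append(v)
new_centroids = dict(kv for k, ps in groups.items()
                     for kv in kmeans_reducer(k, ps))
print(new_centroids)  # {0: (1.5, 0.5), 1: (9.0, 11.0)}
```

In a real job this map/shuffle/reduce cycle is rerun until the centroids converge.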

K-Means w/ Combiner

Input: (id, point) — note: id is unused
  (x, p0) ... (x, p1) ... (x, pn)

Map → (cluster num, point)
  (5, p0) ... (2, p1) ... (5, p2) ... (1, pn) ...

Combine → (cluster num, partial aggregate)
  (5, p0,2) ... (2, p1) ... (1, pn) ...

Shuffle:
  (1, [pn, ...]) ... (2, [p1, ...]) ... (5, [p0,2, ...])

Reduce → (cluster num, centroid)
  (1, c1) ... (2, c2) ... (5, c5) ...
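The combiner's partial aggregate (p0,2 above) must carry enough information to finish the mean later, so it ships a (sum, count) pair per cluster rather than raw points. A sketch under that assumption (function names are illustrative):

```python
def kmeans_combiner(cluster_num, points):
    # Locally pre-aggregate: emit one (sum, count) pair per cluster
    # instead of every point, cutting network transfer.
    sums = tuple(sum(dim) for dim in zip(*points))
    yield cluster_num, (sums, len(points))

def kmeans_reducer(cluster_num, partials):
    # Merge partial (sum, count) pairs from all combiners into a centroid.
    total, n = None, 0
    for sums, count in partials:
        total = sums if total is None else tuple(a + b for a, b in zip(total, sums))
        n += count
    yield cluster_num, tuple(s / n for s in total)

# Two mappers' local points for cluster 5, pre-aggregated separately:
partial_a = next(kmeans_combiner(5, [(1.0, 1.0), (3.0, 1.0)]))[1]  # ((4.0, 2.0), 2)
partial_b = next(kmeans_combiner(5, [(2.0, 4.0)]))[1]              # ((2.0, 4.0), 1)
print(next(kmeans_reducer(5, [partial_a, partial_b])))  # (5, (2.0, 2.0))
```

Note that emitting raw means from the combiner would be wrong: means are not associative, while (sum, count) pairs are, which is why the combined representation changes.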

K-Means Experimental Setup

- Google/IBM Academic Cluster
  - 410 nodes

- Evaluated on 205 GB of points
  - 100 clusters
  - 1000 dimensions

- Algorithm modifications:
  - Uncombined
  - Combiner
  - In-map combiner

[Plot: K-Means clustering, overall time (sec., 10^1–10^4) vs. number of input points (10^5–10^8), comparing Uncombined R=10, Uncombined R=100, Combiner R=10, IMC R=10, and a single-machine baseline.]

Bag-of-Features

- Quantize image feature points and treat them as 'words'

- Represents an image globally with local features
  - Lacks spatial information, though it can be encoded if desired
  - Some invariance to object pose

- Sensitive to feature selection:
  - Random
  - Uniform
  - SIFT, SURF, Harris/Hessian-Affine, MSER

- At scale, speed depends on:
  - Number of clusters, features, and dimensions
  - Distance metric

Bag-of-Features

[Figure: image feature points assigned to cluster centers ('words'), yielding a bag-of-words vector such as 〈2, 1, ...〉.]

Bag-of-Features

Input: (image num, feature)
  (0, f0) ... (0, f1) ... (M, fn)

Map → (image num, cluster num)
  (0, 0) ... (0, 3) ... (0, 0) ... (1, 0) ...

Shuffle:
  (0, [0, 3, 0, ...]) ... (1, [0, ...]) ...

Reduce → (image num, bof)
  (0, 〈2, 0, 0, 1, ...〉) ... (1, 〈1, ...〉) ...
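A Python sketch of this dataflow, assuming a precomputed codebook of cluster centers (the 4-word codebook and the feature values below are toy assumptions, not from the paper):

```python
import math

codebook = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # toy 4-word codebook

def bof_mapper(image_num, feature):
    # Quantize one feature vector to the index of its nearest visual word.
    dists = [math.dist(feature, w) for w in codebook]
    yield image_num, dists.index(min(dists))

def bof_reducer(image_num, cluster_nums):
    # Histogram the word indices into a bag-of-features vector.
    bof = [0] * len(codebook)
    for c in cluster_nums:
        bof[c] += 1
    yield image_num, bof

# Three features from image 0; the shuffle groups their word ids by image.
features = [(0, (0.1, 0.1)), (0, (0.9, 0.9)), (0, (0.0, 0.2))]
word_ids = [next(bof_mapper(i, f))[1] for i, f in features]
print(next(bof_reducer(0, word_ids)))  # (0, [2, 0, 0, 1])
```

At scale the mapper's nearest-word search dominates, which is why the number of clusters, dimensions, and the choice of distance metric matter.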

Background Subtraction

- Goal: distinguish between foreground and background in video sequences
  - Detection method in surveillance settings
  - High-level action analysis

- Past work: real-time processing (latency)

- Web-scale: batch processing (throughput)

- Initial experiment:
  - Single Gaussian model
  - Remove frame dependence
  - Learn local neighboring blocks
  - Uses the order-inversion design pattern

- Processed the PETS'06 dataset in 156 sec
  - 82K 720x576 frames @ 528 FPS
  - 8 GB of video data
  - Google/IBM Academic Cluster

Background Subtraction

Input: (frame num, image)
  (0, [image]) ... (100, [image]) ... (200, [image])

Map → ((block num, flag), (frame num, image))
  ((0, 0), (0, [image]))
  ((0, 1), (0, [image]))

Shuffle:
  ((0, 0), [(0, [image]), (1, [image]), (2, [image]), ...])
  ((0, 1), [(0, [image]), (1, [image]), (2, [image]), ...])

Reduce → (frame num, bgsub)
  (0, [mask]) ... (100, [mask]) ... (200, [mask])
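The single-Gaussian model computed in the reduce stage can be sketched as follows. This is a toy version assuming each frame is a flat list of grayscale pixel values for one block, with all frames in memory; `bgsub_reduce` and the threshold `k` are illustrative, and the paper's implementation instead streams the model parameters ahead of the frames using the order-inversion pattern:

```python
import statistics

def bgsub_reduce(block_num, frames, k=2.5):
    # For one pixel block: fit a single Gaussian (mean, std) per pixel
    # over all frames, then mark pixels deviating by more than k standard
    # deviations as foreground (1) vs. background (0).
    n_pixels = len(frames[0])
    means = [statistics.fmean(f[i] for f in frames) for i in range(n_pixels)]
    stds = [statistics.pstdev([f[i] for f in frames]) or 1.0  # avoid div-by-0
            for i in range(n_pixels)]
    return [[int(abs(p - m) > k * s) for p, m, s in zip(f, means, stds)]
            for f in frames]

# Ten static frames plus one with a bright object at pixel 2.
frames = [[10, 10, 10, 10]] * 10 + [[10, 10, 200, 10]]
masks = bgsub_reduce(0, frames)
print(masks[-1])  # [0, 0, 1, 0]
```

Because the model is fit over the whole sequence rather than frame-by-frame, per-frame dependence is removed and the work parallelizes cleanly across blocks.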

Applications

- Visual search indexing (CBIR)

- Offline surveillance processing

- Scene recognition (e.g., indoor, bank, beach)

- Object recognition (e.g., cars, faces)

- Auto-tagging

- Image registration and stabilization

- 3D reconstruction

Conclusion

- MapReduce is applicable to large-scale vision tasks

- A diverse set of algorithms has been shown to fit this model

- Algorithm design guidelines and recommendations

- Web-scale: emphasis on high throughput

- Design patterns by Lin et al. [13] simplify algorithm design

References

[1] M. Banko and E. Brill. Scaling to very very large corpora for natural language disambiguation. In ACL, pages 26-33, 2001.

[3] G. Csurka, C. Dance, L. Fan, J. Willamowski, and C. Bray. Visual categorization with bags of keypoints. In ECCV Workshop on Statistical Learning in Computer Vision, 2004.

[5] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. In OSDI, pages 137-150, 2004.

[13] J. Lin and C. Dyer. Data-Intensive Text Processing with MapReduce. Morgan & Claypool Publishers, 2010.

[20] A. Torralba, R. Fergus, and W. T. Freeman. 80 million tiny images: A large data set for nonparametric object and scene recognition. PAMI, 30(11):1958-1970, 2008.

Questions?

Brandyn White

[email protected]

http://www.cs.umd.edu/~bwhite/

http://github.com/bwhite/hadoopy

http://github.com/bwhite/hadoop_vision