HIPI: Computer Vision at Large Scale


Chris Sweeney

Liu Liu

Intro to MapReduce: SIMD at Scale

Mapper / Reducer

MapReduce, Main Takeaway: Data Centric, Data Centric, Data Centric!

Hadoop, a Java Impl: an implementation of MapReduce that originated at Yahoo!
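To make the mapper/reducer model concrete, here is a minimal word-count sketch against the Hadoop Java API. It is a generic illustration, not code from HIPI:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: called once per input record; emits (word, 1) for every token.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// Reducer: receives all values emitted for one key; sums the counts.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        context.write(word, new IntWritable(sum));
    }
}

The data-centric point above is that the framework ships this small piece of code to wherever the data lives, rather than moving the data to the code.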

The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

Computer Vision at Scale: the “computational vision” problem

The sheer size of the datasets:

PCA of Natural Images (1992): 15 images, 4096 patches

High-perf Face Detection (2007): 75,000 samples

IM2GPS (2008): 6,472,304 images

HIPI Workflow

HIPI Image Bundle Setup

Moral of the story: many small files kill performance in a distributed file system.
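One standard mitigation, and one of the baselines in the performance charts below, is to pack many small files into a single Hadoop SequenceFile. A minimal sketch, assuming (filename, raw image bytes) records; the paths are illustrative, and this is not HIPI's own bundle format:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs many small local image files into one SequenceFile of
// (filename, raw image bytes) records, so the distributed file system
// sees a single large file instead of thousands of tiny ones.
public class PackImages {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[0]); // e.g. images.seq on HDFS (illustrative)
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (int i = 1; i < args.length; i++) {
                byte[] bytes = Files.readAllBytes(Paths.get(args[i]));
                writer.append(new Text(args[i]), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}

Each tiny file otherwise costs a NameNode entry and, often, its own map task; packing restores large sequential reads.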

Redo PCA on Natural Images at Scale: the first 15 principal components computed from 15 images (Hancock, 1992).

Redo PCA on Natural Images at Scale, Comparison:

[Figure: the first 15 principal components from Hancock (1992) vs. HIPI with 100, 1,000, 10,000, and 100,000 images]

Optimize HIPI Performance with Culling: because decompression is costly, decompress only on demand.

A boolean cull(ImageHeader header) method enables conditional decompression.

Culling, to inspect specific camera effects: keep only images from a Canon PowerShot S500 at 2592x1944.
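A sketch of such a filter, built on the cull(ImageHeader) hook named above; the import path and the EXIF/dimension accessors are assumptions for illustration, not a confirmed HIPI API:

import hipi.image.ImageHeader; // package path assumed

// Decides from the header alone, before any decompression, whether an
// image should be skipped. Returning true is assumed to cull the image.
public class CanonS500Culler {
    public boolean cull(ImageHeader header) {
        String model = header.getEXIFInformation("Model"); // hypothetical accessor
        return model == null
                || !model.equals("Canon PowerShot S500")
                || header.getWidth() != 2592    // hypothetical accessor
                || header.getHeight() != 1944;  // hypothetical accessor
    }
}

The key design point is that the decision reads only header metadata, so culled images never pay the decompression cost measured in the charts below.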

HIPI, Glance at Performance Figures

An empty job (only decompressing and looping over images), 5 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-450 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]

HIPI, Glance at Performance Figures

Im2gray job (converting images to grayscale), 5 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-500 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]
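A sketch of what the Im2gray mapper might look like; the ImageHeader/FloatImage types follow the slide's vocabulary, but the constructors and pixel accessors below are assumptions, not the exact HIPI API:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Converts each decoded RGB image to a single-band grayscale image
// using standard luminance weights.
public class Im2GrayMapper extends Mapper<ImageHeader, FloatImage, Text, FloatImage> {
    @Override
    protected void map(ImageHeader header, FloatImage image, Context context)
            throws IOException, InterruptedException {
        int w = image.getWidth();                  // assumed accessor
        int h = image.getHeight();                 // assumed accessor
        FloatImage gray = new FloatImage(w, h, 1); // assumed 1-band constructor
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                float r = image.getPixel(x, y, 0); // assumed accessor
                float g = image.getPixel(x, y, 1);
                float b = image.getPixel(x, y, 2);
                gray.setPixel(x, y, 0, 0.30f * r + 0.59f * g + 0.11f * b);
            }
        }
        context.write(new Text("gray"), gray);
    }
}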

HIPI, Glance at Performance Figures

Covariance job (compute the covariance matrix of patches, 100 patches per image), 1~3 runs*, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-8000 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]
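The covariance job parallelizes naturally because the statistics are additive: each mapper accumulates partial sums of the patch vectors and their outer products over its share of images, a reducer merges the partials, and the covariance is E[xx^T] minus the outer product of the mean; its eigenvectors are the principal components shown earlier. A minimal sketch of just that math, in plain Java with no Hadoop types:

// Per-mapper partial statistics for d-dimensional patch vectors.
public class CovariancePartial {
    final int d;
    long n = 0;
    final double[] sumX;     // running sum of x
    final double[][] sumXX;  // running sum of x x^T

    CovariancePartial(int d) {
        this.d = d;
        sumX = new double[d];
        sumXX = new double[d][d];
    }

    // Mapper side: fold one patch into the running sums.
    void add(float[] x) {
        n++;
        for (int i = 0; i < d; i++) {
            sumX[i] += x[i];
            for (int j = 0; j < d; j++) sumXX[i][j] += (double) x[i] * x[j];
        }
    }

    // Reducer side: merge another mapper's partial statistics.
    void merge(CovariancePartial other) {
        n += other.n;
        for (int i = 0; i < d; i++) {
            sumX[i] += other.sumX[i];
            for (int j = 0; j < d; j++) sumXX[i][j] += other.sumXX[i][j];
        }
    }

    // Final covariance: E[x x^T] - mean mean^T.
    double[][] covariance() {
        double[][] cov = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cov[i][j] = sumXX[i][j] / n - (sumX[i] / n) * (sumX[j] / n);
            }
        }
        return cov;
    }
}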

HIPI, Glance at Performance Figures

Culling job (decompressing all images vs. decompressing only the images we care about), 1~3 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-700 s; series: Without Culling, With Culling]

Conclusion

Everything gets better at large scale.

HIPI provides an image-centric interface that performs on par with, or better than, the leading alternatives.

The cull method provides a significant gain in both performance and convenience.

HIPI offers noticeable improvements!

Future Work

Release HIPI as an open-source project.

Work on deeper integration with Hadoop.

Make the HIPI workload more configurable.

Make the workload more balanced.
