HIPI: Computer Vision at Large Scale


Chris Sweeney

Liu Liu

Intro to MapReduce: SIMD at Scale

Mapper / Reducer

MapReduce, Main Takeaway: Data Centric, Data Centric, Data Centric!

Hadoop, a Java Impl: an implementation of MapReduce that originated at Yahoo!
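To make the mapper/reducer model concrete, here is a minimal word-count sketch against the Hadoop Java API. It is a generic illustration, not code from HIPI:

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Mapper: called once per input record; emits (word, 1) for every token.
public class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        for (String token : line.toString().split("\\s+")) {
            if (token.isEmpty()) continue;
            word.set(token);
            context.write(word, ONE);
        }
    }
}

// Reducer: receives all values emitted for one key; sums the counts.
class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text word, Iterable<IntWritable> counts, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable c : counts) sum += c.get();
        context.write(word, new IntWritable(sum));
    }
}

The data-centric point above is that the framework ships this small piece of code to wherever the data lives, rather than moving the data to the code.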

The cluster we worked on has 625.5 nodes, with a map task capacity of 2502 and a reduce task capacity of 834.

Computer Vision at Scale: the “computational vision” problem

The sheer size of the datasets:

PCA of Natural Images (1992): 15 images, 4096 patches

High-perf Face Detection (2007): 75,000 samples

IM2GPS (2008): 6,472,304 images

HIPI Workflow

HIPI Image Bundle Setup

Moral of the story: many small files kill performance in a distributed file system.
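One standard mitigation, and one of the baselines in the performance charts below, is to pack many small files into a single Hadoop SequenceFile. A minimal sketch, assuming (filename, raw image bytes) records; the paths are illustrative, and this is not HIPI's own bundle format:

import java.nio.file.Files;
import java.nio.file.Paths;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;

// Packs many small local image files into one SequenceFile of
// (filename, raw image bytes) records, so the distributed file system
// sees a single large file instead of thousands of tiny ones.
public class PackImages {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path(args[0]); // e.g. images.seq on HDFS (illustrative)
        SequenceFile.Writer writer = SequenceFile.createWriter(
                fs, conf, out, Text.class, BytesWritable.class);
        try {
            for (int i = 1; i < args.length; i++) {
                byte[] bytes = Files.readAllBytes(Paths.get(args[i]));
                writer.append(new Text(args[i]), new BytesWritable(bytes));
            }
        } finally {
            writer.close();
        }
    }
}

Each tiny file otherwise costs a NameNode entry and, often, its own map task; packing restores large sequential reads.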

Redo PCA on Natural Images at Scale: the first 15 principal components computed from 15 images (Hancock, 1992).

Redo PCA on Natural Images at Scale, Comparison:

[Figure: the first 15 principal components from Hancock (1992) vs. HIPI with 100, 1,000, 10,000, and 100,000 images]

Optimize HIPI Performance with Culling: because decompression is costly, decompress only on demand.

A boolean cull(ImageHeader header) method enables conditional decompression.

Culling, to inspect specific camera effects: keep only images from a Canon PowerShot S500 at 2592x1944.
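A sketch of such a filter, built on the cull(ImageHeader) hook named above; the import path and the EXIF/dimension accessors are assumptions for illustration, not a confirmed HIPI API:

import hipi.image.ImageHeader; // package path assumed

// Decides from the header alone, before any decompression, whether an
// image should be skipped. Returning true is assumed to cull the image.
public class CanonS500Culler {
    public boolean cull(ImageHeader header) {
        String model = header.getEXIFInformation("Model"); // hypothetical accessor
        return model == null
                || !model.equals("Canon PowerShot S500")
                || header.getWidth() != 2592    // hypothetical accessor
                || header.getHeight() != 1944;  // hypothetical accessor
    }
}

The key design point is that the decision reads only header metadata, so culled images never pay the decompression cost measured in the charts below.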

HIPI, Glance at Performance Figures

An empty job (only decompressing and looping over images), 5 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-450 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]

HIPI, Glance at Performance Figures

Im2gray job (converting images to grayscale), 5 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-500 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]
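A sketch of what the Im2gray mapper might look like; the ImageHeader/FloatImage types follow the slide's vocabulary, but the constructors and pixel accessors below are assumptions, not the exact HIPI API:

import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Converts each decoded RGB image to a single-band grayscale image
// using standard luminance weights.
public class Im2GrayMapper extends Mapper<ImageHeader, FloatImage, Text, FloatImage> {
    @Override
    protected void map(ImageHeader header, FloatImage image, Context context)
            throws IOException, InterruptedException {
        int w = image.getWidth();                  // assumed accessor
        int h = image.getHeight();                 // assumed accessor
        FloatImage gray = new FloatImage(w, h, 1); // assumed 1-band constructor
        for (int y = 0; y < h; y++) {
            for (int x = 0; x < w; x++) {
                float r = image.getPixel(x, y, 0); // assumed accessor
                float g = image.getPixel(x, y, 1);
                float b = image.getPixel(x, y, 2);
                gray.setPixel(x, y, 0, 0.30f * r + 0.59f * g + 0.11f * b);
            }
        }
        context.write(new Text("gray"), gray);
    }
}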

HIPI, Glance at Performance Figures

Covariance job (compute the covariance matrix of patches, 100 patches per image), 1~3 runs*, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-8000 s; series: Many Small Files, Hadoop Sequence File, HIPI Image Bundle]
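The covariance job parallelizes naturally because the statistics are additive: each mapper accumulates partial sums of the patch vectors and their outer products over its share of images, a reducer merges the partials, and the covariance is E[xx^T] minus the outer product of the mean; its eigenvectors are the principal components shown earlier. A minimal sketch of just that math, in plain Java with no Hadoop types:

// Per-mapper partial statistics for d-dimensional patch vectors.
public class CovariancePartial {
    final int d;
    long n = 0;
    final double[] sumX;     // running sum of x
    final double[][] sumXX;  // running sum of x x^T

    CovariancePartial(int d) {
        this.d = d;
        sumX = new double[d];
        sumXX = new double[d][d];
    }

    // Mapper side: fold one patch into the running sums.
    void add(float[] x) {
        n++;
        for (int i = 0; i < d; i++) {
            sumX[i] += x[i];
            for (int j = 0; j < d; j++) sumXX[i][j] += (double) x[i] * x[j];
        }
    }

    // Reducer side: merge another mapper's partial statistics.
    void merge(CovariancePartial other) {
        n += other.n;
        for (int i = 0; i < d; i++) {
            sumX[i] += other.sumX[i];
            for (int j = 0; j < d; j++) sumXX[i][j] += other.sumXX[i][j];
        }
    }

    // Final covariance: E[x x^T] - mean mean^T.
    double[][] covariance() {
        double[][] cov = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j < d; j++) {
                cov[i][j] = sumXX[i][j] / n - (sumX[i] / n) * (sumX[j] / n);
            }
        }
        return cov;
    }
}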

HIPI, Glance at Performance Figures

Culling job (decompressing all images vs. decompressing only the images we care about), 1~3 runs, minimum of the runs reported, in seconds; lower is better:

[Chart: runtime vs. number of images (10, 100, 1,000, 10,000, 100,000); y-axis 0-700 s; series: Without Culling, With Culling]

Conclusion

Everything gets better at large scale.

HIPI provides an image-centric interface that performs on par with, or better than, the leading alternatives.

The cull method provides a significant gain in both performance and convenience.

HIPI offers noticeable improvements!

Future Work

Release HIPI as an open-source project.

Work on deeper integration with Hadoop.

Make the HIPI workload more configurable.

Make the workload more balanced.
