Fast CCL (Connected Component Labeling) with GPU

A Parallel Approach to Object Identification in Large-scale Images
Young-Min Kang, Tongmyong University
Sung-Soo Kim, ETRI
Gyung-Tae Nam, GCSC Inc.
ICESS 2016, Takamatsu, Japan, 14 ~ 16 Nov. 2016

Upload: young-min-kang

Post on 14-Apr-2017



Bigger images

• Era of big data
  – Increased size of image data
• Image processing
  – Heavy computation
• One of the most fundamental operations
  – Object identification/recognition
    • Image segmentation
    • Connected components labeling

Connected component labeling

• Objective
  – All pixels in a connected component receive an identical label

Parallel image processing

• Most image processing algorithms
  – Pixel-wise operations
    • Can be implemented with pixel-wise threads
    • Can be efficiently performed in a data-parallel fashion
• GPU
  – Data-parallel device
  – Can be easily applied to various image processing methods

GPU: Many-core architecture

Pixel connectivity

• Graph representation

[Figure: an image and its pixel connectivity]

CCL and parallelism

• CCL with graph traversal
  – Cannot be easily parallelized
    • Traversal = sequential
• GPU-based approaches
  – Have not been very successful
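As a point of reference, the traversal-based labeling that this slide calls sequential can be sketched as a breadth-first flood fill. This is a generic textbook formulation, not code from the talk; the function name `label_by_traversal`, the sample image, and the choice of 4-connectivity are assumptions made for illustration.

```python
from collections import deque

# Sample binary image (1 = object pixel); chosen for illustration only.
IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]


def label_by_traversal(image):
    """Label 4-connected components with a sequential breadth-first traversal."""
    h, w = len(image), len(image[0])
    labels = [[-1] * w for _ in range(h)]  # -1 marks background or unvisited
    count = 0
    for r in range(h):
        for c in range(w):
            if not image[r][c] or labels[r][c] != -1:
                continue
            # Found an unlabeled object pixel: start a new component here.
            count += 1
            labels[r][c] = count
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx = y + dy, x + dx
                    inside = 0 <= ny < h and 0 <= nx < w
                    if inside and image[ny][nx] and labels[ny][nx] == -1:
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count
```

Each component grows one pixel at a time from its seed, so the work inside a component is inherently ordered, which is the difficulty the slide points to.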

Our method

• GPU-based efficient algorithm for CCL
  – Data initialization
  – Computing column-wise label runs
  – Efficient label merge

Data initialization

• Each pixel is assigned a unique label if it is turned on

Unique labels for every position of a 5x5 image:

 1  2  3  4  5
 6  7  8  9 10
11 12 13 14 15
16 17 18 19 20
21 22 23 24 25

After initialization, pixels that are off hold -1:

 1  2 -1 -1 -1
 6  7 -1  9 10
11 12 -1 14 -1
-1 -1 -1 19 20
-1 -1 -1 -1 -1
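The initialization step can be sketched in a few lines. This is a minimal CPU sketch, assuming 1-based row-major labels and -1 for background as in the grids on the slide; the name `init_labels` and the binary `IMAGE` are introduced here for illustration. On a GPU, each pixel would presumably be handled by its own thread, since no pixel depends on any other.

```python
# Binary image matching the 5x5 example on the slide (1 = pixel turned on).
IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]


def init_labels(image):
    """Give every object pixel a unique 1-based row-major label, else -1."""
    h, w = len(image), len(image[0])
    return [
        [r * w + c + 1 if image[r][c] else -1 for c in range(w)]
        for r in range(h)
    ]
```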

Column-wise label runs

• Run
  – Block of contiguous object pixels in a column
• Computing column-wise label runs
  – Can be done with w threads

[Figure: image of height h and width w]

Column-wise label runs

• Label change within a column (1 thread)

Column-wise label runs

• Graph-based interpretation

Column-wise label runs

• Implementation
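The slide's implementation figure is not available here, so the following is only a sketch of one way to compute column-wise runs. It assumes that every pixel in a run takes the label of the topmost pixel of that run; the function name `column_runs` and the input grid `LABELS` (the initialized labels from the 5x5 example) are illustrative. The outer loop over columns stands in for the w threads mentioned earlier.

```python
# Initialized labels of the 5x5 example (-1 = background).
LABELS = [
    [1, 2, -1, -1, -1],
    [6, 7, -1, 9, 10],
    [11, 12, -1, 14, -1],
    [-1, -1, -1, 19, 20],
    [-1, -1, -1, -1, -1],
]


def column_runs(labels):
    """Relabel each vertical run of object pixels with the label of its top pixel."""
    h, w = len(labels), len(labels[0])
    out = [row[:] for row in labels]
    for c in range(w):  # one thread per column in the GPU setting
        head = -1  # label of the current run's top pixel, -1 if not in a run
        for r in range(h):
            if out[r][c] == -1:
                head = -1  # background ends the current run
            else:
                if head == -1:
                    head = out[r][c]  # this pixel starts a new run
                out[r][c] = head
    return out
```

For the example grid, column 4 contains two separate runs (labels 10 and 20) because a background pixel lies between them.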

Label merge

• After computing “column-wise label runs”
  – We have separate trees to be merged in accordance with their connectivity
• What is needed
  – Checking vertical adjacency

Label merge

• Connectivity check

Label merge

• Updated hierarchy
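The connectivity check and hierarchy update can be illustrated with a small union-by-root sketch. It assumes the label hierarchy is stored as parent pointers and that the root with the larger label is attached to the root with the smaller one; both choices, and the names `find_root` and `merge_labels`, are assumptions for illustration rather than details confirmed by the slides.

```python
def find_root(parent, label):
    """Follow parent pointers until reaching a label that is its own parent."""
    while parent[label] != label:
        label = parent[label]
    return label


def merge_labels(parent, a, b):
    """Merge the trees containing labels a and b by re-pointing one root only."""
    ra, rb = find_root(parent, a), find_root(parent, b)
    if ra != rb:
        parent[max(ra, rb)] = min(ra, rb)  # larger root follows the smaller


# Two column runs from the 5x5 example: {1, 6, 11} and {2, 7, 12}.
parent = {1: 1, 6: 1, 11: 1, 2: 2, 7: 2, 12: 2}

# Pixels 6 and 7 are horizontally adjacent, so their trees are merged.
merge_labels(parent, 6, 7)
```

After the call, only the entry for root 2 has changed (it now points to 1); pixels 7 and 12 still point to 2 and reach the new root through it.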

Why only roots are changed

[Figure: speech bubbles “Let’s merge” and “OK! I will follow you”, followed by the merged tree]

Previous methods

1. Check the connectivity
2. Update the hierarchy
3. Iterate this process until no update is made

A kind of graph traversal
Heavy computation when the pixels make a long connected chain
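To make the cost of a long chain concrete, here is a simplified stand-in for the iterate-until-stable scheme: plain minimum-label propagation over 4-neighbors. It is not the specific prior methods the slides refer to, and the name `propagate_labels` is hypothetical. Each pass reads the labels of the previous pass, as a pixel-parallel kernel would.

```python
def propagate_labels(image):
    """Repeat min-label propagation until stable; return labels and pass count."""
    h, w = len(image), len(image[0])
    labels = [
        [r * w + c + 1 if image[r][c] else -1 for c in range(w)]
        for r in range(h)
    ]
    passes = 0
    changed = True
    while changed:
        changed = False
        passes += 1
        new = [row[:] for row in labels]
        for r in range(h):
            for c in range(w):
                if labels[r][c] == -1:
                    continue
                best = labels[r][c]
                for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < h and 0 <= cc < w and labels[rr][cc] != -1:
                        best = min(best, labels[rr][cc])
                if best < new[r][c]:
                    new[r][c] = best
                    changed = True
        labels = new
    return labels, passes


# A single horizontal chain of 8 object pixels.
CHAIN = [[1] * 8]
chain_labels, chain_passes = propagate_labels(CHAIN)
```

In this sketch the smallest label advances only one pixel per pass, so the number of passes grows with the length of the chain rather than staying fixed.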

Our method

• Label merge is performed with a fixed number of iterations
  – Number of iterations: log2(w)
  – Computation cost at every iteration: reduced to half of the previous one
• Efficient label merge
• Moreover
  – Can be easily parallelized
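The fixed-iteration merge can be sketched end to end on the CPU. The sketch assumes that pass k joins neighboring blocks of 2^(k-1) columns across their shared boundary, so that pass 1 checks w/2 boundaries, pass 2 checks w/4, and so on. The names `find_root` and `label_components`, the parent-array layout, and the rounding up of log2(w) for widths that are not powers of two are all assumptions. A real GPU kernel would also have to handle concurrent updates of the same root, which this sequential sketch does not address.

```python
def find_root(parent, label):
    """Follow parent pointers until reaching a label that is its own parent."""
    while parent[label] != label:
        label = parent[label]
    return label


def label_components(image):
    """Label 4-connected components using column runs and hierarchical merges."""
    h, w = len(image), len(image[0])

    def idx(r, c):
        return r * w + c + 1  # 1-based row-major label of pixel (r, c)

    parent = [-1] * (h * w + 1)  # -1 marks background; index 0 is unused

    # Initialization and column-wise runs: each pixel points to its run's top.
    for c in range(w):
        head = -1
        for r in range(h):
            if image[r][c]:
                if head == -1:
                    head = idx(r, c)
                parent[idx(r, c)] = head
            else:
                head = -1

    # Merge passes: ceil(log2(w)) of them, each with half as many boundaries.
    passes = (w - 1).bit_length()
    for k in range(1, passes + 1):
        block, half = 2 ** k, 2 ** (k - 1)
        for left in range(half - 1, w - 1, block):  # boundary: left | left + 1
            for r in range(h):  # h comparisons per boundary
                a, b = idx(r, left), idx(r, left + 1)
                if parent[a] == -1 or parent[b] == -1:
                    continue
                ra, rb = find_root(parent, a), find_root(parent, b)
                if ra != rb:
                    parent[max(ra, rb)] = min(ra, rb)  # only a root changes

    return [
        [find_root(parent, idx(r, c)) if image[r][c] else -1 for c in range(w)]
        for r in range(h)
    ]


IMAGE = [
    [1, 1, 0, 0, 0],
    [1, 1, 0, 1, 1],
    [1, 1, 0, 1, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0],
]
RESULT = label_components(IMAGE)
```

For the 5x5 example this takes three passes and leaves two components, labeled 1 and 9 after the smallest label in each.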

Label merge boundary

• 1st merge

  – w/2 boundaries
  – h comparisons in each boundary
  – wh/2 threads

Label merge boundary

• 2nd merge

  – w/2^2 boundaries
  – h comparisons in each boundary
  – wh/2^2 threads

Label merge boundary

• 3rd merge

  – w/2^3 boundaries
  – h comparisons in each boundary
  – wh/2^3 threads

Label merge boundary

• Final merge

  – log2(w)-th merge

Computation cost at the 1st merge: C(1)

Total Cost
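Assuming, as the merge slides indicate, that the k-th merge uses wh/2^k threads and therefore costs half as much as the previous one, the total cost over log2(w) merges can be worked out as a geometric series. The notation C(k) for the cost of the k-th merge is introduced here for this sketch; only C(1) appears in the slides.

```latex
C(k) = \frac{C(1)}{2^{k-1}}, \qquad
C_{\text{total}} = \sum_{k=1}^{\log_2 w} \frac{C(1)}{2^{k-1}}
= 2\,C(1)\left(1 - \frac{1}{w}\right) < 2\,C(1)
```

Under this assumption the whole merge phase costs less than twice the first merge, regardless of the image width.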

Performance

• Computational cost for each task
  – Cost for initialization = 1
  – 4096x4096 images with different numbers of connected components

                   50 labels   1869 labels
initialization        1.0          1.0
column-wise run       1.6          1.6
label merge           3.4          3.6

Performance

[Bar chart: relative computational cost (initialization = 1) of initialization, column-wise run, and label merge for 50 labels and 1869 labels; vertical axis from 0.0 to 4.0]

Experimental results

• Reference
  – Grana’s method implemented with OpenCV
• Two tests
  – Random noise with varying densities
  – Object identification with shapes

Varying densities

• Image size: 2048x2048

Varying densities

• Image size: 4096x4096

Object identification with shapes

• Two spiral curves

Object identification with shapes

• Stars

Applications

• Object tracking with radar signal

Conclusion

• An efficient GPGPU implementation for CCL

• Data-parallelism of GPU exploited
• Experimental results show its efficiency
• Can be successfully applied to various applications with large-scale images
  – e.g., Object identification from radar signals

Thank you

Q & A