Accelerated Connected Component Labeling Using CUDA Framework
Fanny Nina-Paravecino, David Kaeli
ICCVG 2014

Outline

• Introduction
• Connected Component Labeling
• NVIDIA’s Compute Unified Device Architecture
• Accelerated Connected Component Labeling
• Performance Results
• Conclusions

Fanny Nina-Paravecino and David Kaeli, ICCVG, 15-17 Sep. 2014, Warsaw, Poland

Introduction
• Image analysis plays an important role in many applications
• In the field of physical security, there are challenging tasks, such as luggage scanning at airports, that require:
  • Near-real-time response
  • Very high accuracy
• A connected component labeling algorithm identifies neighboring segments with similar intensities:
  • Potential for efficient segmentation
  • Provides high-quality results
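
For reference, a minimal sequential CCL can be sketched in Python as a breadth-first flood fill. This assumes a binary image stored as a list of lists and 4-connectivity; the slides state neither, and this is a generic textbook method, not the authors' "CCL Serial" baseline. The name `label_components` is hypothetical.

```python
from collections import deque


def label_components(image):
    """Label 4-connected foreground (value 1) regions of a binary image.

    Returns (labels, count): labels has the same shape as image, with 0 for
    background and 1..count for components, numbered in raster-scan order.
    """
    rows = len(image)
    cols = len(image[0]) if rows else 0
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1 and labels[r][c] == 0:
                count += 1                      # unlabeled pixel: new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and image[ny][nx] == 1 and labels[ny][nx] == 0):
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count
```

For example, `label_components([[1, 0, 1], [0, 1, 1], [1, 0, 0]])` returns three components: `[[1, 0, 2], [0, 2, 2], [3, 0, 0]]`. Every pixel is visited a constant number of times, but the fill is inherently sequential, which is the cost a GPU formulation tries to avoid.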

Introduction
• One frame: each image is a 512 x 512 matrix
• Multiple frames: ~700 images, i.e., ~700 matrices

Introduction
• Flow chart of object detection:
  DICOM image (input) → Preprocessing → Image segmentation → Feature extraction → Object detection
  Our current focus: image segmentation

Connected Component Labeling (CCL)
• There have been a number of attempts to improve the performance of CCL:
  • Bailey and Johnston, "Single Pass Connected Components Analysis", Image and Vision Computing (2007)
  • Zhao et al., "Stripe-based Connected Components Labeling" (2010)
  • Klaiber et al., "A memory-efficient parallel single pass architecture for connected component labeling of streamed images" (2012)
• GPU implementations:
  • Stava and Benes, "Connected component labeling in CUDA", GPU Computing Gems (2010)

NVIDIA’s Compute Unified Device Architecture (CUDA)
• Compute capability by architecture:
  • Tesla: compute capability 1.0, 1.1, 1.2, 1.3
  • Fermi: compute capability 2.0, 2.1
  • Kepler: compute capability 3.0, 3.5

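NVIDIA’s Compute Unified Device Architecture (CUDA)
• Dynamic Parallelism: a kernel running on the GPU can launch child kernels itself, without returning control to the host (requires compute capability 3.5 or higher)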

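NVIDIA’s Compute Unified Device Architecture (CUDA)
• Concurrent Kernel Execution: Hyper-Q
  • On Fermi, all streams feed a single hardware work queue, which can serialize independent kernels; Kepler GK110 provides 32 hardware work queues, so kernels from different streams can execute concurrently
[Figure: issue order of kernels in Stream 0 and Stream 1 and their execution over time on Fermi versus Kepler GK110]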

Accelerated Connected Component Labeling
• Two phases:
  • Phase 0: Find Spans
  • Phase 1: Merge Spans
[Figure: in Phase 0, threads scan the input binary image (N x M) and produce a spans matrix (N x K, each pair of entries is one span) and a label index matrix (N x K/2); in Phase 1, UpdateLabel child kernels (child threads) rewrite the label index. Example: spans rows (0 0 2 2), (1 2 - -), (0 0 - -); label index (1 2), (3 -), (5 -) before merging and (1 2), (2 -), (5 -) after.]

Accelerated Connected Component Labeling
• Phase 0: Find Spans
  • Each span has two elements: (ystart, yend)
  • A unique label is assigned to each span immediately
[Figure: binary image (N x M), its spans matrix with rows (0 0 2 2), (1 2 - -), (0 0 - -), and the label matrix with rows (1 2), (3 -), (5 -)]

A span in row x of the image I is defined as:

$$\mathrm{span}_x = \{(y_{start}, y_{end}) \mid I(x, y_{start}) = I(x, y_{start}+1) = \cdots = I(x, y_{end})\}$$
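
Phase 0 can be sketched sequentially in Python. The layout choices are assumptions inferred from the example above, not taken from the ACCL source: spans cover foreground (1) pixels only, unused slots hold -1 (drawn as "-"), K = 2 * ceil(M/2) (the most spans a row of width M can hold), and span j of row x gets label x * (K/2) + j + 1, which reproduces the labels 1, 2, 3, 5 in the figure. The outer loop over rows is presumably the part that maps to one GPU thread per row. The name `find_spans` is hypothetical.

```python
def find_spans(image):
    """Phase 0 sketch: find horizontal runs of 1s in each row of a binary image.

    Returns (spans, labels):
      spans  -- N x K matrix; each consecutive pair (ystart, yend) is one span,
                unused slots hold -1.
      labels -- N x K/2 matrix with one label per span, -1 when unused.
    """
    n = len(image)
    m = len(image[0]) if n else 0
    max_spans = (m + 1) // 2          # at most ceil(M/2) runs fit in a row
    k = 2 * max_spans
    spans = [[-1] * k for _ in range(n)]
    labels = [[-1] * max_spans for _ in range(n)]
    for x in range(n):                # rows are independent of each other
        j = 0                         # index of the next span in this row
        y = 0
        while y < m:
            if image[x][y] == 1:
                ystart = y
                while y + 1 < m and image[x][y + 1] == 1:
                    y += 1            # extend the run to its last 1
                spans[x][2 * j] = ystart
                spans[x][2 * j + 1] = y
                # unique label computed from the span's position alone
                labels[x][j] = x * max_spans + j + 1
                j += 1
            y += 1
    return spans, labels
```

On `[[1, 0, 1], [0, 1, 1], [1, 0, 0]]`, an image whose spans match the example, this returns the spans `[[0, 0, 2, 2], [1, 2, -1, -1], [0, 0, -1, -1]]` and labels `[[1, 2], [3, -1], [5, -1]]`. Deriving labels from position means no thread has to coordinate with another to obtain a unique label.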

Accelerated Connected Component Labeling
• Phase 1: Merge Spans
  • The Merge Span parent kernel checks each span: if it must be merged (Merge? Yes), an Update Label child kernel is launched; otherwise (No), it moves on to the next span
  • An Update Label child kernel performs either one single update or multiple updates at the same time
  • Concurrent kernels process multiple images at a time
[Figure: spans and label matrices for two examples. First (one single update): spans (0 0 2 2), (1 2 - -), (0 0 - -); labels (1 2), (3 -), (5 -) become (1 2), (2 -), (5 -). Second (multiple updates at the same time): spans (0 0 2 2), (0 0 2 2), (0 1 - -); label matrix (1 2), (1 2), (1 -).]
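
The merge step can be sketched sequentially in Python, using the spans/labels layout of the examples (-1 for "-"). Two assumptions, consistent with both examples but not confirmed by the slides: spans in adjacent rows merge when their column ranges overlap (4-connectivity), and a merge rewrites every occurrence of the larger label to the smaller one, which is the job the slides assign to the Update Label child kernel. The name `merge_spans` is hypothetical, and the loops stand in for what ACCL runs as parent and child kernels.

```python
def merge_spans(spans, labels):
    """Phase 1 sketch: merge 4-connected spans in adjacent rows.

    Spans in rows x-1 and x are connected when their column ranges overlap.
    On a merge, every occurrence of the larger label becomes the smaller one.
    Returns a new label matrix; the input lists are not modified.
    """
    labels = [row[:] for row in labels]
    for x in range(1, len(spans)):
        for j in range(len(labels[x])):
            if labels[x][j] == -1:
                continue                          # unused slot
            ys, ye = spans[x][2 * j], spans[x][2 * j + 1]
            for i in range(len(labels[x - 1])):
                if labels[x - 1][i] == -1:
                    continue
                ps, pe = spans[x - 1][2 * i], spans[x - 1][2 * i + 1]
                a, b = labels[x][j], labels[x - 1][i]  # re-read: may have changed
                if max(ys, ps) <= min(ye, pe) and a != b:   # "Merge? Yes"
                    old, new = max(a, b), min(a, b)
                    # "Update Label": relabel the whole matrix; on a GPU a
                    # child kernel could assign one thread per entry.
                    for row in labels:
                        for c, lab in enumerate(row):
                            if lab == old:
                                row[c] = new
    return labels
```

On the first example it returns `[[1, 2], [2, -1], [5, -1]]` (one update, 3 → 2), and on the second, starting from position labels `[[1, 2], [3, 4], [5, -1]]`, it returns `[[1, 2], [1, 2], [1, -1]]` after three updates. Because each merge relabels a whole class at once, every component ends up carrying the smallest initial label among its spans.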

Performance Results
• Input images:
  • DICOM format
  • Integer values in [0, 255]
  • More than 700 images (512 x 512 pixels)

Performance Results
• Pre-processing steps:
  • Background noise removal
  • Binary conversion
[Figure: original image and the resulting binary image]
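
The binary conversion step can be sketched as a global threshold over the [0, 255] intensities. The slides give neither the thresholding rule nor the noise-removal method, so the default threshold of 128 is an assumption, `to_binary` is a hypothetical name, and noise removal is not shown.

```python
def to_binary(image, threshold=128):
    """Convert a grayscale image with integer values in [0, 255] to 0/1.

    `threshold` is a hypothetical parameter: pixels >= threshold become
    foreground (1), all others background (0).
    """
    if not 0 <= threshold <= 255:
        raise ValueError("threshold must lie in [0, 255]")
    return [[1 if px >= threshold else 0 for px in row] for row in image]
```

For example, `to_binary([[0, 200], [130, 127]])` gives `[[0, 1], [1, 0]]`. In practice the threshold would be tuned to the scanner's intensity range.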

Performance Results
• Experimental environment:
  • CPU
    • Intel Core i7-3770K processor
    • RAM: 8 GB
  • GPU
    • GK110 (NVIDIA GTX Titan)
    • Compute capability 3.5
    • CUDA 5.5
  • gcc compiler 3.7
  • OpenMP 3.0

Performance Results
• One image

Method        Running Time (s)   Speedup
CCL Serial    0.25               1.00x
CCL OpenMP    0.18               1.39x
ACCL          0.05               5.00x

Performance Results
• Multiple images: Hyper-Q

# Streams   CCL Serial (s)   ACCL (s)   Speedup
1           0.25             0.05       5.00x
2           1.08             0.10       10.80x
3           2.16             0.14       15.36x
4           4.18             0.19       21.44x
5           6.09             0.23       25.91x

Performance Results
• Comparison with Stava and Benes, "Connected component labeling in CUDA"

Method                              Mpixels/s   Speedup
O. Stava, B. Benes, CCL in CUDA     1542        1.0x
ACCL                                5242        3.3x

Conclusions
• Described Accelerated Connected Component Labeling (ACCL) using the CUDA framework
• Presented an evaluation of new features of the NVIDIA Kepler GPU, such as dynamic parallelism and Hyper-Q
• Compared serial CCL and OpenMP CCL with ACCL
• Our algorithm scales well as the number of streams increases
• Dynamic parallelism turns out to be a disadvantage when a large number of child kernels is launched

Thanks
Questions?