Deep Watershed Transform for Instance Segmentation
Min Bai & Raquel Urtasun
To appear at IEEE CVPR 2017 in Hawaii
Presented at NVIDIA GTC 2017
Semantic Segmentation
● Input: RGB Image
● Output at each pixel:
  ○ Semantic label
Instance Segmentation
● Input: RGB Image
● Output at each pixel:
  ○ Semantic label
  ○ Instance label
    ■ Same for each pixel in an object
    ■ Different among objects
  ○ Difficulty: How to phrase the problem?
Applications
● Object tracking
Image credit: Davi Frossard
Applications
● Interacting with the environment
Image credit: http://www.rethinkrobotics.com/build-a-bot/
Applications
● Useful information for other algorithms, such as optical flow
Image credit: Shenlong Wang
Semantic Segmentation
● Semantic segmentation is a well-studied problem
  ○ Our instance segmentation method leverages an existing technique
  ○ H. Zhao et al., Pyramid Scene Parsing Network, https://arxiv.org/abs/1612.01105
Image credit: H. Zhao et al.
Watershed Transform
● Classical image segmentation technique
Image (left) credit: Adrian Fisher
Scalar Field and Gradient
Image source: Wikipedia, by Vivekj78 - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=15346899
● Scalar field: single number at each pixel
● Gradient: vector at each pixel, pointing in the direction of steepest ascent
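As a small illustration of these two definitions, the sketch below estimates the gradient of a scalar field with finite differences and normalizes it to a unit direction. The toy field `x + 2*y` and the helper names `gradient_at` and `unit_direction` are made up for this example and do not come from the talk.

```python
import math

def gradient_at(field, y, x):
    """Finite-difference gradient (d/dx, d/dy) of a 2D scalar field at (y, x).

    Assumes the grid has at least two rows and two columns.
    """
    h, w = len(field), len(field[0])
    # Central differences inside the grid, one-sided differences at the border.
    x0, x1 = max(x - 1, 0), min(x + 1, w - 1)
    y0, y1 = max(y - 1, 0), min(y + 1, h - 1)
    gx = (field[y][x1] - field[y][x0]) / (x1 - x0)
    gy = (field[y1][x] - field[y0][x]) / (y1 - y0)
    return gx, gy

def unit_direction(gx, gy):
    """Keep only the direction of the gradient, discarding its magnitude."""
    norm = math.hypot(gx, gy)
    if norm == 0:
        return 0.0, 0.0  # flat region: no preferred direction
    return gx / norm, gy / norm

# Toy scalar field: one number per pixel, rising 1 per step in x and 2 per step in y.
field = [[x + 2 * y for x in range(4)] for y in range(4)]
gx, gy = gradient_at(field, 1, 1)
ux, uy = unit_direction(gx, gy)
print("gradient:", (gx, gy), "unit direction:", (ux, uy))
```

The unit direction is the quantity of interest here: it says which way is uphill at a pixel without saying how steep the slope is.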
Overview of Approach
[Figure: Input Image and Semantic Segmentation → Gradient of Energy Landscape → Energy Landscape → Predicted Instances]
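The last step of this pipeline, going from an energy landscape to predicted instances, can be sketched as cutting the energy at a fixed level and labeling the connected regions that remain. This is a minimal sketch under stated assumptions: the toy energy grid, the threshold values, the 4-connectivity, and the name `label_instances` are all illustrative, and the slides do not spell out the exact cut that is used.

```python
from collections import deque

def label_instances(energy, threshold):
    """Label 4-connected regions whose energy is at least `threshold`.

    Returns (labels, count); label 0 means "not part of any instance".
    """
    h, w = len(energy), len(energy[0])
    labels = [[0] * w for _ in range(h)]
    count = 0
    for sy in range(h):
        for sx in range(w):
            if energy[sy][sx] < threshold or labels[sy][sx]:
                continue
            # New, unvisited seed pixel: flood-fill its region breadth-first.
            count += 1
            labels[sy][sx] = count
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not labels[ny][nx]
                            and energy[ny][nx] >= threshold):
                        labels[ny][nx] = count
                        queue.append((ny, nx))
    return labels, count

# Two touching objects: energy is low along their shared border, high inside.
energy = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 1, 2, 2, 1, 2, 2, 1, 0],
    [0, 1, 2, 2, 1, 2, 2, 1, 0],
    [0, 1, 1, 1, 1, 1, 1, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
_, merged = label_instances(energy, threshold=1)
labels, separated = label_instances(energy, threshold=2)
print("regions at level 1:", merged, "regions at level 2:", separated)
```

Cutting at the lower level merges the two touching objects into one region, while the higher cut separates them. The regions found this way are smaller than the objects themselves, so a real pipeline would need some step to grow them back to full size.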
Why Predict Direction First?
[Figure panels: Input Image, Energy Landscape, Direction of Gradient]
Much sharper difference in the direction label at the boundary!
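A one-dimensional toy example makes this contrast concrete. The sketch assumes, purely for illustration, that the energy of a pixel is its distance to the nearest instance boundary and that the direction is the sign of the slope of that energy; the label row and the helper name `energy_and_direction` are hypothetical.

```python
def energy_and_direction(labels):
    """For a 1D row of instance labels, return (energy, direction) per pixel.

    energy    = distance to the nearest pixel outside the instance
    direction = +1 if energy rises to the right, -1 if it rises to the left,
                0 when both boundaries are equally far away
    """
    n = len(labels)
    energy, direction = [], []
    for i, lab in enumerate(labels):
        left = 1
        while i - left >= 0 and labels[i - left] == lab:
            left += 1
        right = 1
        while i + right < n and labels[i + right] == lab:
            right += 1
        energy.append(min(left, right))
        # Energy increases when moving away from the nearer boundary.
        direction.append(1 if left < right else -1 if right < left else 0)
    return energy, direction

# Two adjacent objects, six pixels each, sharing a boundary between index 5 and 6.
labels = [1] * 6 + [2] * 6
energy, direction = energy_and_direction(labels)

energy_jump = abs(energy[6] - energy[5])
direction_jump = abs(direction[6] - direction[5])
print("energy:   ", energy)
print("direction:", direction)
print("jump at shared boundary: energy", energy_jump, "direction", direction_jump)
```

Across the shared boundary the energy takes the same value on both sides, so nothing in the energy marks where one object ends and the next begins. The direction flips from pointing left to pointing right at exactly that spot, which gives a much clearer signal to learn from.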
Overall Network
[Figure: Input Image and Semantic Segmentation → Direction Prediction Network (Ground Truth Directions, Predicted Directions) → Energy Prediction Network (Ground Truth Energy, Predicted Energy) → Predicted Instances, shown alongside Ground Truth Instances]
Training and Inference
● Pre-train both networks
● End-to-end fine-tuning
● Network trained on NVIDIA DGX-1
  ○ Approximately 25 hours total for training on one GP100 core
  ○ ~0.1s per image for forward pass
  ○ Thank you NVIDIA for the generous gift!
Image source: www.nvidia.com
Cityscapes Dataset
● 2975 training / 500 validation / 1525 testing images
● Instances: car, truck, bus, train, person, rider, motorcycle, bicycle
Cityscapes Instance Segmentation Leaderboard
* Average Precision (AP): higher is better
Method                 AP*     AP* @ 50%   AP* @ 50m   AP* @ 100m
van den Brand et al.   2.3%    3.7%        3.9%        4.9%
Cordts et al.          4.6%    12.9%       7.7%        10.3%
Uhrig et al.           8.9%    21.1%       15.3%       16.7%
Ours                   19.4%   35.3%       31.4%       36.8%
Recently, new approaches have achieved even higher performance.
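In the Cityscapes instance-level benchmark, the "@ 50%" column refers to a 50% overlap threshold between a predicted mask and a ground-truth mask, and the "@ 50m" and "@ 100m" columns restrict evaluation to objects within that distance. The sketch below shows only the overlap test, with masks represented as sets of pixel coordinates. The names `iou` and `is_match` and the toy masks are assumptions for illustration; this is not the official evaluation script.

```python
def iou(mask_a, mask_b):
    """Intersection over union of two masks given as sets of (row, col) pixels."""
    union = len(mask_a | mask_b)
    if union == 0:
        return 0.0  # two empty masks: treat as no overlap
    return len(mask_a & mask_b) / union

def is_match(predicted, ground_truth, threshold=0.5):
    """A prediction counts as correct when its overlap reaches the threshold."""
    return iou(predicted, ground_truth) >= threshold

ground_truth = {(0, 0), (0, 1), (1, 0), (1, 1)}   # 2x2 square
shifted = {(0, 1), (1, 1), (0, 2), (1, 2)}        # same square, shifted one column
clipped = {(0, 0), (0, 1), (1, 0)}                # square with one pixel missing

print("shifted:", iou(shifted, ground_truth), is_match(shifted, ground_truth))
print("clipped:", iou(clipped, ground_truth), is_match(clipped, ground_truth))
```

The shifted mask shares 2 of 6 pixels with the ground truth and fails the 50% test, while the clipped mask shares 3 of 4 and passes.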
Sample Output
[Figure panels: Input RGB, Semantic Segmentation, Direction Prediction, Energy Prediction, Predicted Instances, Ground Truth Instances]
Preliminary TorontoCity Aerial Instance Segmentation
[Figure panels: Input RGB, Semantic Segmentation (ResNet), Predicted Building Instances]
Preliminary TorontoCity Aerial Instance Segmentation
Method      Weighted Coverage*   AP*      Recall* @ 50%   Precision* @ 50%
FCN-8       41.92%               11.37%   21.50%          36.00%
ResNet-56   40.65%               12.13%   18.90%          45.36%
Ours        56.22%               21.22%   67.16%          63.67%
* higher is better
In Summary...
● Simple technique for instance segmentation
● Encodes object instances as energy map
● Predicts gradient direction as intermediate task for better supervision