1st VALSE Workshop on Pixel Level Image Understanding


Page 1: 1st VALSE Workshop on Pixel level image understanding

1st VALSE Workshop on Pixel Level Image Understanding, 8:00-12:00, 20 April, VALSE 2018, 1/50, Ming-Ming Cheng

1st VALSE Workshop on Pixel level image understanding

http://mmcheng.net/pixelund/

VALSE 2018 · Dalian

20th April

Ming-Ming Cheng

Page 2

Workshop Organizers

Liang Lin (Sun Yat-sen University), Ming-Ming Cheng (Nankai University)

Page 3

Invited Speakers

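Si Liu (Institute of Information Engineering, CAS), Yunchao Wei (UIUC), Chao Dong (SenseTime)

Xinggang Wang (Huazhong University of Science and Technology), Ming-Ming Cheng (Nankai University)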

Page 4

Learning Pixel-Accurate Image Semantics from the Web

Speaker: Ming-Ming Cheng

Nankai University

http://mmcheng.net/

Ming-Ming Cheng

Page 5

Dataset Annotation

Page 6

Dataset Annotation

CVML 2012, Antonio Torralba

• PASCAL 11:
  • 10? workers
  • 27,374 bounding boxes

• ImageNet:
  • 25,000 workers
  • 11,231,732 images labeled with one word

• ADE20K:
  • Prof. Torralba’s mother labeled 213,841 segmented objects
  • Job offer: I am looking for more parents

Page 7

How do we learn ourselves?

Page 8

Question

• Could we get rid of the user annotation process?
  • Even keyword-level supervision requires significant effort to learn new categories.

• Could a machine vision system learn from the web?
  • Autonomous learning from the web
  • Without relying on any explicit user annotations

Page 9

Salient object detection & weak supervision

Global Contrast based Salient Region Detection, IEEE TPAMI 2015 (CVPR 2011). (2000+ citations)

Page 10

More category-agnostic cues?

WebSeg: Learning Semantic Segmentation from Web Searches, arXiv, 2018.

Richer Convolutional Features for Edge Detection, IEEE CVPR 2017.

Deeply supervised salient object detection with short connections, IEEE TPAMI 2018 (CVPR’17).

Page 11

Salient object detection (SOD)

Deeply supervised salient object detection with short connections, IEEE TPAMI 2018 (CVPR’17).

Page 12

Utilizing multi-scale features

Page 13

Bridging multiple levels

Page 14

Bridging multiple levels

Page 15

Sample results

Page 16

Sample results

Page 17

Sample results

Page 18

Messages from numbers

Page 19

Performance (using different datasets)

• Training on the corresponding training set works best
  • Especially obvious for DUT-OMRON

• More training images ≠ better performance

Page 20

Performance (using different datasets)

• Construct a unified, composite, and versatile dataset
  • Online benchmark: https://mmcheng.net/dss/

All results are obtained without any post-processing.

Page 21

Failure cases

Page 22

Sample Applications

Page 23

Edge detection

Richer Convolutional Features for Edge Detection, IEEE CVPR 2017.

Page 24

Richer Convolutional Features

Page 25

Explicit multi-scale still helps
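The slide does not spell out what "explicit multi-scale" means. A common form is image-pyramid testing: resize the input to several scales, run a single-scale detector on each copy, resize every edge map back to the original size, and average them. The sketch below illustrates that scheme under stated assumptions: the scales (0.5, 1.0, 1.5) are an assumption, plain averaging is assumed as the fusion rule, and `toy_edge_detector` is a hypothetical stand-in (normalized gradient magnitude), not the RCF network.

```python
import numpy as np


def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Bilinearly resize a 2-D array to (out_h, out_w), align-corners style."""
    in_h, in_w = img.shape
    ys = np.linspace(0, in_h - 1, out_h)  # sample rows in input coordinates
    xs = np.linspace(0, in_w - 1, out_w)  # sample columns in input coordinates
    y0, x0 = np.floor(ys).astype(int), np.floor(xs).astype(int)
    y1, x1 = np.minimum(y0 + 1, in_h - 1), np.minimum(x0 + 1, in_w - 1)
    wy, wx = (ys - y0)[:, None], (xs - x0)[None, :]
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bottom = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bottom * wy


def toy_edge_detector(img: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for a single-scale CNN: normalized gradient magnitude."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    peak = mag.max()
    return mag / peak if peak > 0 else mag


def multiscale_edges(img, detector, scales=(0.5, 1.0, 1.5)):
    """Run `detector` on an image pyramid and average the maps at the original size."""
    h, w = img.shape
    fused = np.zeros((h, w))
    for s in scales:
        # Keep at least 2 pixels per axis so np.gradient stays valid.
        sh, sw = max(2, round(h * s)), max(2, round(w * s))
        edge_map = detector(resize_bilinear(img, sh, sw))
        fused += resize_bilinear(edge_map, h, w)  # back to the input resolution
    return fused / len(scales)


# Usage: a synthetic 32x32 image with a vertical step edge between columns 15 and 16.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
edges = multiscale_edges(image, toy_edge_detector)
```

Averaging keeps the fused map in the same [0, 1] range as each single-scale map; coarse scales contribute thicker, more context-driven responses while the finer scales keep localization sharp.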

Page 26

Samples

Columns: image, ground truth, results

Page 27

50+ years of boundary detection

Since Roberts (1965)

Page 28

Category-agnostic cues…

WebSeg: Learning Semantic Segmentation from Web Searches, arXiv, 2018.

Richer Convolutional Features for Edge Detection, IEEE CVPR 2017.

Deeply supervised salient object detection with short connections, IEEE TPAMI 2018 (CVPR’17).

Page 29

Over-segmentation

• Challenges
  • Image label ≉ semantic category
  • How many labels to learn?

HFS: Hierarchical Feature Selection for Efficient Image Segmentation, ECCV 2016.

DEL: Deep Embedding Learning for Efficient Image Segmentation, IJCAI 2018.

Page 30

Deep Embedding Learning

Page 31

Proxy GT from web searches

Page 32

Our framework

Page 33

Noise Filtering Module (NFM)

• Given an image I, its image-level label y, and a heuristic map H, we learn to predict a binary label for each region R
  • Extract an equal number of features for each region
  • Learn to discard potentially noisy labels
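The slide describes the NFM only at a high level. The sketch below illustrates one way to obtain "an equal number of features for each region" and a per-region keep/discard decision; it is not the WebSeg implementation. Assumptions: regions R come from an over-segmentation given as an integer map; each region's feature vector is the mean of a (C, H, W) feature map over its pixels; a logistic-regression scorer stands in for the learned filter; the proxy label is y wherever H ≥ 0.5; and rejected regions are marked with an ignore index of 255. All names (`region_features`, `train_filter`, `filter_proxy_gt`, `IGNORE`) are hypothetical.

```python
import numpy as np

IGNORE = 255  # assumed ignore index for rejected pixels (a common PASCAL VOC convention)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def region_features(features: np.ndarray, regions: np.ndarray):
    """Mean-pool a (C, H, W) feature map over every region id in an (H, W) map.

    Every region, whatever its size, yields one C-dimensional vector.
    """
    c = features.shape[0]
    flat_feat = features.reshape(c, -1)
    flat_reg = regions.reshape(-1)
    ids = np.unique(flat_reg)
    pooled = np.stack([flat_feat[:, flat_reg == r].mean(axis=1) for r in ids])
    return ids, pooled


def train_filter(x: np.ndarray, keep: np.ndarray, lr=0.5, epochs=500):
    """Fit a logistic-regression keep/discard scorer by batch gradient descent."""
    w, b = np.zeros(x.shape[1]), 0.0
    for _ in range(epochs):
        err = sigmoid(x @ w + b) - keep  # d(binary cross-entropy)/d(logit)
        w -= lr * x.T @ err / len(keep)
        b -= lr * err.mean()
    return w, b


def filter_proxy_gt(heuristic, label, features, regions, w, b, thresh=0.5):
    """Proxy ground truth: `label` where the heuristic map H is foreground,
    0 (background) elsewhere, and IGNORE on regions the scorer rejects."""
    gt = np.where(heuristic >= 0.5, label, 0).astype(np.int64)
    ids, feats = region_features(features, regions)
    for r, score in zip(ids, sigmoid(feats @ w + b)):
        if score < thresh:
            gt[regions == r] = IGNORE
    return gt


# Toy usage: a 4x4 image split into two regions with one-hot features.
regions = np.zeros((4, 4), dtype=int)
regions[:, 2:] = 1
features = np.stack([(regions == 0).astype(float), (regions == 1).astype(float)])
ids, feats = region_features(features, regions)
w, b = train_filter(feats, keep=np.array([1.0, 0.0]))  # keep region 0, drop region 1
proxy = filter_proxy_gt(np.ones((4, 4)), 3, features, regions, w, b)
```

Marking rejected regions as "ignore" (rather than relabeling them as background) means a segmentation loss that skips the ignore index simply receives no gradient from those pixels; the toy keep/discard targets here are set by hand purely for illustration.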

Page 34

Learning to Filter Noisy Labels

Page 35

Testing phase

• The NFM is only used during training (it needs the image-level label y, which is unavailable at test time)

Page 36

Effectiveness of using different cues

• PASCAL VOC 2012 validation set, no post-processing

Page 37

The role of NFM

Page 38

Using different training data

• D(S): Simple web images, manually cleaned, one label per image

• D(C): Complex images with multiple image-level labels

• D(W): Web images, one uncleaned label per image

Page 39

Using different training data

• D(S): Simple web images, manually cleaned, one label per image

• D(C): Complex images with multiple image-level labels

• D(W): Web images, one uncleaned label per image

Page 40

Using CRF

Page 41

Visual comparisons

Page 42

Results on validation & test set

Page 43

Comparisons

Page 44

Conclusion

• Propose an interesting and challenging vision problem
  • WebSeg: learning semantic segmentation directly from the web

• An online noise filtering mechanism
  • Lets CNNs learn to discard undesired noisy regions

Page 45

Future work

• Never-ending learning

• Effectively select good web images to learn from

• Customized salient object detection

• Improve the quality of heuristic cues

• Noise filtering mechanisms

• Other tasks using purely web supervision

We have only scratched the surface of purely web-based supervision!

Page 46

Source code

Freely available

Page 47

Some closely related projects

FLIC: Fast Linear Iterative Clustering with Active Search, AAAI 2018.

Page 48

Some closely related projects

Hi-Fi: Hierarchical Feature Integration for Skeleton Detection, IJCAI 2018.

Page 49

Some closely related projects

S4Net: Single Stage Salient-Instance Segmentation, arXiv 2017.

Page 50

Some closely related projects

Three Birds One Stone: A Unified Framework for Salient Object Segmentation, Edge Detection and Skeleton Extraction, arXiv 2018.

Page 51

Some closely related projects

Salient Objects in Clutter: Bringing Salient Object Detection to the Foreground, arXiv 2018.

Page 52

Thanks!

Q&A