PR057: Mask R-CNN
TRANSCRIPT
![Page 1: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/1.jpg)
Yonsei University MVP Lab.
![Page 2: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/2.jpg)
![Page 3: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/3.jpg)
Bbox Regression
Classification
RoI from Selective Search
RoI Pooling: Fixed Size Representation
![Page 4: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/4.jpg)
Bbox Regression
Classification
RoI Pooling: Fixed Size Representation
Bbox Regression
Objectness
RPN (Region Proposal Network)
![Page 5: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/5.jpg)
32x32x3
Conv1
Pool1
16x16x64
Conv2
Pool2
8x8x128
Conv3
Pool3
4x4x256
Conv4
Pool4
2x2x512
Conv5
Pool5
1x1x512
1x1x512 Conv
1x1 Heatmap
x32 Upsample
Softmax
Remove Pooling
1x1 Conv for Heatmap Output
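The layer sizes listed on this slide can be checked with simple shape arithmetic. The following is a minimal sketch, assuming each Conv keeps the spatial size and each Pool halves it; the function name `trace_fcn_shapes` is hypothetical and only the sizes and channel counts come from the slide.

```python
def trace_fcn_shapes(size=32, in_channels=3, channels=(64, 128, 256, 512, 512)):
    """Return (height, width, channels) for the input and after each Conv+Pool stage."""
    shapes = [(size, size, in_channels)]
    for out_channels in channels:
        # Assumption: the conv preserves spatial size, the 2x2 pool halves it.
        size //= 2
        shapes.append((size, size, out_channels))
    return shapes


shapes = trace_fcn_shapes()
# Ratio between input size and final feature size: the upsampling factor
# needed to bring the 1x1 heatmap back to the input resolution.
total_stride = shapes[0][0] // shapes[-1][0]

for h, w, c in shapes:
    print(f"{h}x{w}x{c}")
print("upsample factor:", total_stride)
```

Five halvings give a total stride of 32, which is why the heatmap is upsampled by x32 before the softmax.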
![Page 6: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/6.jpg)
![Page 7: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/7.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 8: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/8.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 9: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/9.jpg)
Sheep Dog
Human
Sheep
Sheep Sheep Sheep
![Page 10: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/10.jpg)
Sheep Dog
Human
![Page 11: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/11.jpg)
Dog
Human
Sheep
Sheep
Sheep Sheep Sheep
![Page 12: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/12.jpg)
BBox Classification
Segmentation Classification
![Page 13: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/13.jpg)
BBox Classification: Can Separate, Cannot Segment
Segmentation Classification
![Page 14: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/14.jpg)
BBox Classification: Can Separate, Cannot Segment
Segmentation Classification: Cannot Separate, Can Segment
![Page 15: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/15.jpg)
BBox Classification + Segmentation Classification = Segmentation in BBox Classification
BBox Classification: Can Separate, Cannot Segment
Segmentation Classification: Cannot Separate, Can Segment
![Page 16: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/16.jpg)
BBox Classification (Faster R-CNN) + Segmentation Classification (FCN) = Segmentation in BBox Classification
BBox Classification: Can Separate, Cannot Segment
Segmentation Classification: Cannot Separate, Can Segment
![Page 17: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/17.jpg)
BBox Classification + Segmentation Classification = Segmentation in BBox Classification
Faster R-CNN + FCN = FCN on BBox!
BBox Classification: Can Separate, Cannot Segment
Segmentation Classification: Cannot Separate, Can Segment
![Page 18: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/18.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 19: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/19.jpg)
![Page 20: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/20.jpg)
![Page 21: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/21.jpg)
![Page 22: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/22.jpg)
![Page 23: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/23.jpg)
![Page 24: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/24.jpg)
![Page 25: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/25.jpg)
![Page 26: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/26.jpg)
![Page 27: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/27.jpg)
![Page 28: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/28.jpg)
FCN
• Pixel-level Classification
• Per Pixel Softmax (Multinomial)
• Multi Instance
![Page 29: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/29.jpg)
FCN
• Pixel-level Classification
• Per Pixel Softmax (Multinomial)
• Multi Instance
Faster R-CNN
• Classification
• Instance Level RoI
![Page 30: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/30.jpg)
FCN
• Pixel-level Classification
• Per Pixel Softmax (Multinomial)
• Multi Instance
Faster R-CNN
• Classification
• Instance Level RoI
![Page 31: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/31.jpg)
FCN
• Pixel-level Classification
• Per Pixel Sigmoid (Binary), replacing Softmax
• Multi Instance
Faster R-CNN
• Classification
• Instance Level RoI
![Page 32: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/32.jpg)
FCN
• Pixel-level Classification
• Per Pixel Sigmoid (Binary), replacing Softmax
• Multi Instance
Faster R-CNN
• Classification
• Instance Level RoI
![Page 33: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/33.jpg)
BBox + Class + Mask
$L = L_{cls} + L_{box} + L_{mask}$
$L_{cls}$: Softmax Cross Entropy
$L_{box}$: Regression
$L_{mask}$: Binary Cross Entropy
![Page 34: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/34.jpg)
Training Phase
$L_{mask} = L_{c_1} + L_{c_2} + \cdots + L_{c_k}$
$L_{mask} = L_{c_3}$ if the GT class is 3
![Page 35: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/35.jpg)
Training Phase
$L_{mask} = L_{c_1} + L_{c_2} + \cdots + L_{c_k}$
$L_{mask} = L_{c_3}$ if the GT class is 3
Mask Branch Only Learns How to Mask, Independent of Class
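The class-selected mask loss on this slide can be sketched as follows. This is an illustrative pure-Python version, not the authors' implementation: the helper names (`sigmoid`, `binary_cross_entropy`, `mask_loss`) are hypothetical, classes are indexed from 0 (so the slide's "class 3" is index 2), and the tiny 2x2 masks are made-up data.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def binary_cross_entropy(logits, targets):
    """Mean per-pixel binary cross entropy between a 2-D logit map and a 0/1 mask."""
    total, count = 0.0, 0
    for logit_row, target_row in zip(logits, targets):
        for logit, target in zip(logit_row, target_row):
            p = sigmoid(logit)
            total += -(target * math.log(p) + (1 - target) * math.log(1 - p))
            count += 1
    return total / count


def mask_loss(per_class_logits, gt_mask, gt_class):
    """Only the mask channel of the ground-truth class contributes to the loss."""
    return binary_cross_entropy(per_class_logits[gt_class], gt_mask)


# Made-up example: k = 3 classes, 2x2 mask logits per class.
logits = [
    [[5.0, -5.0], [-5.0, 5.0]],  # class index 0 (ignored)
    [[0.0, 0.0], [0.0, 0.0]],    # class index 1 (ignored)
    [[2.0, -2.0], [-2.0, 2.0]],  # class index 2 (the GT class here)
]
gt_mask = [[1, 0], [0, 1]]
loss = mask_loss(logits, gt_mask, gt_class=2)
print(loss)
```

Because the other channels never enter the loss, the mask branch is not asked to decide which class is present; that decision is left to the classification branch.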
![Page 36: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/36.jpg)
Test Phase
Predicts Human Mask
Predicts Car Mask
Predicts Horse Mask
Predicts ...
![Page 37: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/37.jpg)
Test Phase
Predicts Human Mask
Predicts Car Mask
Predicts Horse Mask
Predicts ...
Winner Takes All
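A minimal sketch of the winner-takes-all step at test time: every class has its own predicted mask, and only the mask of the top-scoring class is kept. The function name `select_mask`, the 0.5 binarization threshold, and the example numbers are assumptions for illustration.

```python
import math


def select_mask(class_scores, per_class_mask_logits, threshold=0.5):
    """Keep only the mask predicted for the highest-scoring class."""
    winner = max(range(len(class_scores)), key=lambda k: class_scores[k])
    binary = [
        # Per-pixel sigmoid, then binarize (threshold is an assumed value).
        [1 if 1.0 / (1.0 + math.exp(-v)) >= threshold else 0 for v in row]
        for row in per_class_mask_logits[winner]
    ]
    return winner, binary


# Made-up example with three classes (e.g. human, car, horse) and 2x2 masks.
class_scores = [0.1, 0.7, 0.2]
mask_logits = [
    [[4.0, 4.0], [4.0, 4.0]],
    [[3.0, -1.0], [0.5, -4.0]],
    [[-2.0, -2.0], [-2.0, -2.0]],
]
winner, mask = select_mask(class_scores, mask_logits)
print(winner, mask)
```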
![Page 38: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/38.jpg)
![Page 39: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/39.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 40: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/40.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 41: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/41.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
Faster R-CNN, S. Ren, NIPS 2015
![Page 42: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/42.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
Deconv 2x2, stride 2
Deconv 2x2, stride 2
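As a quick check of what a 2x2, stride-2 deconvolution does to spatial size, the standard transposed-convolution size formula can be evaluated directly. This is a sketch assuming no padding, no dilation, and no output padding; the function name and the input sizes 7 and 14 are illustrative, not taken from the slide.

```python
def deconv_output_size(in_size, kernel=2, stride=2, padding=0):
    """Output size of a transposed convolution (no dilation, no output padding)."""
    return (in_size - 1) * stride + kernel - 2 * padding


for in_size in (7, 14):
    print(in_size, "->", deconv_output_size(in_size))
```

With kernel 2 and stride 2 the layer doubles the resolution of its input.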
![Page 43: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/43.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
3x3 Conv, 4 Layers
![Page 44: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/44.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
1x1 Conv
1x1 Conv
![Page 45: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/45.jpg)
![Page 46: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/46.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 47: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/47.jpg)
Bbox Regression
Classification
RoI Pooling: Fixed Size Representation
Pooled Feature: 7x7
![Page 48: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/48.jpg)
RoI Pooling (Fast R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature
RoI Align (Mask R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature
![Page 49: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/49.jpg)
RoI Pooling (Fast R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature
RoI Align (Mask R-CNN)
• Input: Each RoI
• Output: 7x7 Pooled Feature
![Page 50: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/50.jpg)
Feature Map
RoI
Note: Region Proposal Network RoI Prediction = Floating Point Representation
![Page 51: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/51.jpg)
Feature Map
RoI
![Page 52: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/52.jpg)
Feature Map
RoI
![Page 53: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/53.jpg)
Feature Map
RoI
Max Pooling
![Page 54: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/54.jpg)
Feature Map
RoI
Max Pooling
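The RoI Pooling steps illustrated on these slides (snap the floating-point RoI to the feature grid, split it into bins, max-pool each bin) can be sketched as below. This is a simplified pure-Python illustration: the function name `roi_pool`, the 2x2 output size, the rounding rule, and the 5x5 toy feature map are assumptions, and RoI coordinates are assumed to be already expressed in feature-map cells.

```python
import math


def roi_pool(feature, roi, out_size=2):
    """Max-pool a floating-point RoI (x1, y1, x2, y2) into an out_size x out_size grid."""
    height, width = len(feature), len(feature[0])
    # Quantization 1: snap the RoI corners to whole feature-map cells.
    x1, y1, x2, y2 = (int(math.floor(v + 0.5)) for v in roi)
    bin_w = max(x2 - x1 + 1, 1) / out_size
    bin_h = max(y2 - y1 + 1, 1) / out_size
    pooled = []
    for py in range(out_size):
        # Quantization 2: bin edges are floored / ceiled to whole cells.
        ys = min(max(y1 + math.floor(py * bin_h), 0), height)
        ye = min(max(y1 + math.ceil((py + 1) * bin_h), 0), height)
        row = []
        for px in range(out_size):
            xs = min(max(x1 + math.floor(px * bin_w), 0), width)
            xe = min(max(x1 + math.ceil((px + 1) * bin_w), 0), width)
            cells = [feature[y][x] for y in range(ys, ye) for x in range(xs, xe)]
            row.append(max(cells) if cells else 0)
        pooled.append(row)
    return pooled


# Toy 5x5 feature map whose value encodes its position: feature[y][x] = 5*y + x.
feature = [[5 * y + x for x in range(5)] for y in range(5)]
pooled_a = roi_pool(feature, (0.6, 1.2, 3.4, 4.4))
pooled_b = roi_pool(feature, (0.9, 1.4, 3.1, 4.2))
print(pooled_a, pooled_b)
```

The two RoIs differ by a fraction of a cell yet produce identical outputs, which is the misalignment introduced by rounding.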
![Page 55: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/55.jpg)
Feature Map
RoI
![Page 56: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/56.jpg)
Feature Map
RoI
![Page 57: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/57.jpg)
Feature Map
RoI
2x2 Subcells for Precision
![Page 58: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/58.jpg)
= 0.15 + 0.25 + 0.25 + 0.35
RoI
![Page 59: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/59.jpg)
Feature Map
RoI
2x2 Subcell Max Pooling
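A minimal sketch of RoI Align as described on these slides: the RoI stays in floating point, each output bin is divided into 2x2 subcells, the feature map is sampled at each subcell center by bilinear interpolation, and the samples are max-pooled. The function names, the 2x2 output size, the convention that feature values sit at integer coordinates, and the toy feature map are assumptions for illustration.

```python
import math


def bilinear(feature, y, x):
    """Bilinearly interpolate a 2-D feature map at a floating-point location."""
    height, width = len(feature), len(feature[0])
    y = min(max(y, 0.0), height - 1.0)
    x = min(max(x, 0.0), width - 1.0)
    y0, x0 = int(math.floor(y)), int(math.floor(x))
    y1, x1 = min(y0 + 1, height - 1), min(x0 + 1, width - 1)
    dy, dx = y - y0, x - x0
    return (feature[y0][x0] * (1 - dy) * (1 - dx)
            + feature[y0][x1] * (1 - dy) * dx
            + feature[y1][x0] * dy * (1 - dx)
            + feature[y1][x1] * dy * dx)


def roi_align(feature, roi, out_size=2, samples=2):
    """Pool a floating-point RoI (x1, y1, x2, y2) without any rounding."""
    x1, y1, x2, y2 = roi
    bin_w = (x2 - x1) / out_size
    bin_h = (y2 - y1) / out_size
    pooled = []
    for py in range(out_size):
        row = []
        for px in range(out_size):
            values = []
            for sy in range(samples):
                for sx in range(samples):
                    # Center of subcell (sy, sx) inside bin (py, px).
                    y = y1 + (py + (sy + 0.5) / samples) * bin_h
                    x = x1 + (px + (sx + 0.5) / samples) * bin_w
                    values.append(bilinear(feature, y, x))
            row.append(max(values))  # max over the 2x2 subcell samples
        pooled.append(row)
    return pooled


# Toy 5x5 feature map: feature[y][x] = 5*y + x.
feature = [[5 * y + x for x in range(5)] for y in range(5)]
aligned_a = roi_align(feature, (0.6, 1.2, 3.4, 4.4))
aligned_b = roi_align(feature, (0.9, 1.4, 3.1, 4.2))
print(aligned_a, aligned_b)
```

Unlike the rounded variant, a sub-cell shift of the RoI changes the pooled values, so the output tracks the RoI position.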
![Page 60: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/60.jpg)
Bbox Regression
Classification
RoI Align
Bbox Regression
Objectness
RPN
Binary Mask
![Page 61: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/61.jpg)
Bbox Regression
Classification
RoI Align
Bbox Regression
Objectness
RPN
Binary Mask
Paste Back
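The "Paste Back" step can be sketched as resizing the fixed-size binary mask to its box and writing it into an image-sized canvas. This is an assumption-laden illustration: the function name `paste_mask` is hypothetical, the box is taken as integer pixel coordinates with exclusive right and bottom edges, and nearest-neighbor resizing is used for brevity.

```python
def paste_mask(mask, box, image_h, image_w):
    """Resize a small binary mask to box (x1, y1, x2, y2) and paste it into a blank image."""
    x1, y1, x2, y2 = box
    mask_h, mask_w = len(mask), len(mask[0])
    box_h, box_w = y2 - y1, x2 - x1
    canvas = [[0] * image_w for _ in range(image_h)]
    # Only visit pixels that lie inside both the box and the image.
    for y in range(max(y1, 0), min(y2, image_h)):
        for x in range(max(x1, 0), min(x2, image_w)):
            # Nearest-neighbor lookup of the mask cell covering this pixel.
            my = (y - y1) * mask_h // box_h
            mx = (x - x1) * mask_w // box_w
            canvas[y][x] = mask[my][mx]
    return canvas


# Made-up example: a 2x2 mask pasted into a 4x4 box inside a 6x6 image.
canvas = paste_mask([[1, 0], [0, 1]], (1, 1, 5, 5), image_h=6, image_w=6)
for row in canvas:
    print(row)
```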
![Page 62: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/62.jpg)
Slide from Mask R-CNN Tutorial, K. He. ICCV 2017
![Page 63: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/63.jpg)
![Page 64: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/64.jpg)
• Faster R-CNN + ResNet
Deep Residual Learning for Image Recognition, K. He, CVPR 2016
• Faster R-CNN + FPN
Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017
![Page 65: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/65.jpg)
• Faster R-CNN + ResNet
Deep Residual Learning for Image Recognition, K. He, CVPR 2016
![Page 66: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/66.jpg)
• Faster R-CNN + FPN
Feature Pyramid Networks for Object Detection, T.-Y. Lin, CVPR 2017
![Page 67: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/67.jpg)
![Page 68: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/68.jpg)
Faster R-CNN + Binary Mask Prediction + FCN + RoIAlign
![Page 69: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/69.jpg)
Faster R-CNN + Binary Mask Prediction + FCN + RoIAlign
![Page 70: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/70.jpg)
Detection Performance Improvement
![Page 71: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/71.jpg)
![Page 72: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/72.jpg)
![Page 73: Pr057 mask rcnn](https://reader033.vdocuments.us/reader033/viewer/2022051318/5a650d2c7f8b9af3398b5277/html5/thumbnails/73.jpg)
Q&A?