
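Adaptive Visual Perception: Technology and Applications. Ming-Ming Cheng, http://mmcheng.net

Adaptive Visual Perception Technology

Ming-Ming Cheng

College of Computer Science, Nankai University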


More than 50% of human neurons are devoted to visual information processing

Image Credit: https://badremuneer.in/the-colours-of-light-green-is-best-for-brain-eyes-health/

The human brain contains roughly 100 billion neurons and is more powerful than the strongest supercomputer



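Challenges Facing Visual Perception Technology

Objects vary in size, shapes are complex, environments change, and categories are numerous. How can limited computing resources be used to understand an infinitely complex real world?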


Related Papers

Res2Net: A New Multi-scale Backbone Architecture

• IEEE TPAMI 2020

Nonlinear Regression via Deep Negative Correlation Learning

• IEEE TPAMI 2020

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing

• IEEE CVPR 2020


The Development of Computer Vision: A Multi-scale Perspective

AlexNet (NIPS’12)

SIFT (ICCV’99)

62557 citations

56729 citations


The Development of Deep Neural Networks: A Multi-scale Perspective

VggNet (ICLR’15)

37951 citations


The Development of Deep Neural Networks: A Multi-scale Perspective

ResNet (CVPR’16 Best Paper)

DenseNet (CVPR’17 Best Paper)

46432 citations


CNN: Convolution, Activation, Pooling

Convolution?


A General-purpose Deep Neural Network Architecture for Rich-scale Space

Bottleneck block vs. Res2Net module

Res2Net: A New Multi-scale Backbone Architecture, IEEE TPAMI, 2020.
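
The hierarchical connections inside a Res2Net module can be sketched as below. This is a minimal pure-Python illustration based on the cited Res2Net paper rather than on the slide itself: the input channels are split into `scale` groups, the first group passes through unchanged, and each later group is transformed after the previous group's output has been added to it. The function name `res2net_split` and the toy `transform` (a stand-in for the per-group 3x3 convolution, shared across groups here for brevity) are assumptions made for illustration.

```python
from typing import Callable, List

Vector = List[float]


def res2net_split(x: Vector, scale: int,
                  transform: Callable[[Vector], Vector]) -> Vector:
    """Hierarchical residual-like connections over `scale` channel groups.

    y_1 = x_1
    y_2 = K(x_2)
    y_i = K(x_i + y_{i-1})   for i > 2
    `transform` is a hypothetical stand-in for the convolution K.
    """
    if scale < 1 or len(x) % scale != 0:
        raise ValueError("len(x) must be divisible by scale")
    width = len(x) // scale
    groups = [x[i * width:(i + 1) * width] for i in range(scale)]

    outputs: List[Vector] = [groups[0]]      # y_1: identity path
    prev: Vector = []
    for i in range(1, scale):
        if i == 1:
            inp = groups[i]                  # y_2 sees only x_2
        else:
            # Later groups also receive the previous group's output.
            inp = [a + b for a, b in zip(groups[i], prev)]
        prev = transform(inp)
        outputs.append(prev)
    # Concatenate the per-group outputs back into one channel vector.
    return [v for group in outputs for v in group]


def double(v: Vector) -> Vector:
    """Toy transform used only for this demonstration."""
    return [2.0 * a for a in v]


y = res2net_split([1.0, 2.0, 3.0, 4.0], scale=4, transform=double)
print(y)
```

Each later group has passed through more transforms than the one before it, which is how a single block mixes several receptive-field sizes.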



• Application 1: Image classification (Res2Net-v1b)

Backbone            Params    GFLOPs   top-1 err.   top-5 err.
ResNet-101          44.6M     7.8      22.63        6.44
ResNeXt-101-64x4d   83.5M     15.5     20.40        -
HRNetV2p-W48        77.5M     16.1     20.70        5.50
Res2Net-v1b-50      25.23M    4.5      19.73        4.96
Res2Net-v1b-101     45.2M     8.3      18.77        4.64

Comparison with mainstream models in the open-source object detection library from SenseTime and CUHK



• Application 2: Object detection

• Faster R-CNN, MS-COCO

• https://github.com/Res2Net/mmdetection

Backbone          Params.   GFLOPs   box AP
R-101-FPN         60.52M    283.14   39.4
X-101-64x4d-FPN   99.25M    440.36   41.3
HRNetV2p-W48      83.36M    459.66   41.5
Res2Net-v1b-101   61.18M    293.68   42.3



• Application 3: Class Activation Mapping



• Application 4: Salient object detection (segmentation)

Figure columns: images, ground truth (GT), ResNet-50, Res2Net-50



• Application 4: Salient object detection with PoolNet (CVPR 2019)

• https://github.com/Res2Net/Res2Net-PoolNet

[Figure: bar chart with a vertical axis from 0.85 to 0.95, comparing VGG, ResNet50, and Res2Net50 backbones on ECSSD, PASCAL-S, HKU-IS, SOD, and DUTS-TE]



• Application 5: Semantic segmentation (Deeplab v3+)



• Application 5: Semantic segmentation (PASCAL VOC12 val set)



• Application 6: Instance segmentation

• Mask R-CNN, MS-COCO

• https://github.com/Res2Net/mmdetection

Backbone          Params.   GFLOPs   box AP   mask AP
R-101-FPN         63.17M    351.65   40.3     36.5
X-101-64x4d-FPN   101.9M    508.87   42.0     37.7
HRNetV2p-W48      86.01M    528.17   42.9     38.3
Res2Net-101       63.83M    362.18   43.3     38.6



• Application 7: Keypoint estimation (COCO 2017)

• https://github.com/Res2Net/Res2Net-Pose-Estimation

• Key-point method: SimpleBaseline [Xiao et al., ECCV'18].

Backbone         AP      AP (M)   AP (L)
ResNet_50        0.724   0.697    0.765
Res2Net_50       0.737   0.708    0.782
Res2Net_v1b_50   0.743   0.713    0.792



• Application 8: Interactive segmentation (Lin et al., CVPR’20)



• Application 9: Panoptic segmentation (Detectron2, MS-COCO)

Name             Train mem (GB)   Box AP   Mask AP   PQ
R50-FPN          4.8              40.0     36.5      41.5
R101-FPN         6.0              42.4     38.5      43.0
Res2Net101-FPN   6.0              44.0     39.6      44.5

Detectron2 is Facebook AI Research's next-generation software system that implements state-of-the-art object detection algorithms.

https://github.com/Res2Net/Res2Net-detectron2/blob/master/configs/COCO-PanopticSegmentation/panoptic_fpn_R_50_3x.yaml

https://github.com/Res2Net/Res2Net-detectron2/blob/master/configs/COCO-PanopticSegmentation/panoptic_fpn_R_101_3x.yaml

https://github.com/Res2Net/Res2Net-detectron2/blob/master/configs/COCO-PanopticSegmentation/panoptic_fpn_R2_101_3x.yaml



• Other applications: https://mmcheng.net/res2net/

Vectorized road detection

Person re-identification

Depth estimation

Tumor segmentation in CT images


Manual Design vs. NAS

• NAS is confined to a restricted search space and is hard to adapt to hardware

Tested on GTX 1080Ti


CNN: Convolution, Activation, Pooling

Pooling?


Non-local/anisotropy context

Both fine details and global information need to be captured


Non-local context information

Non-local neural networks, CVPR 2018.

Attention to scale: Scale-aware semantic image segmentation, CVPR 2016.

Non-local modules, self-attention

Computing a large affinity matrix consumes huge resources!
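
A rough arithmetic sketch of why the affinity matrix is expensive: for a feature map with H x W positions, a dense pairwise affinity matrix has (H*W) x (H*W) entries. The 128 x 128 feature-map size and the 4-byte float32 storage below are illustrative assumptions, not figures from the talk.

```python
def affinity_matrix_bytes(height: int, width: int,
                          bytes_per_value: int = 4) -> int:
    """Memory for one dense (H*W) x (H*W) affinity matrix."""
    positions = height * width
    return positions * positions * bytes_per_value


# Assumed example: a 128 x 128 feature map stored as float32.
size_bytes = affinity_matrix_bytes(128, 128)
print(f"{size_bytes / 2**30:.1f} GiB per image, per attention map")
```

Doubling the side length of the feature map multiplies this cost by 16, since the number of positions enters squared.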


Non-local context information

Non-local neural networks, CVPR 2018.

DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs, PAMI 2018.

Dilated convolution, pyramid/global pooling

Incapable of capturing anisotropic context!


Strip Pooling

Strip Pooling: Rethinking Spatial Pooling for Scene Parsing, IEEE CVPR, 2020.


Strip Pooling (SP) Module

Long range connection along one direction, local context along the other direction
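
The idea of pooling along long, narrow strips can be sketched in pure Python. This is a deliberately simplified illustration: it averages each full row and each full column, adds the two, and uses the sigmoid of the sum to gate the input. The learned 1D and 1x1 convolutions of the published module are omitted, and the function name `strip_pool_gate` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def strip_pool_gate(x: Matrix) -> Matrix:
    """Gate each position by context pooled along its row and its column."""
    h, w = len(x), len(x[0])
    # Horizontal strips: one average per row (an H x 1 map).
    row_mean = [sum(row) / w for row in x]
    # Vertical strips: one average per column (a 1 x W map).
    col_mean = [sum(x[i][j] for i in range(h)) / h for j in range(w)]
    # Expand both maps back to H x W by addition, then gate the input.
    return [[x[i][j] * sigmoid(row_mean[i] + col_mean[j]) for j in range(w)]
            for i in range(h)]


out = strip_pool_gate([[1.0, -1.0], [-1.0, 1.0]])
print(out)
```

Every output position depends on its entire row and entire column, but not on the rest of the map, which is what makes the context long-range in one direction and narrow in the other.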


Visualization

LRD/SRD: long/short range dependency aggregation. MPM: mixed pooling module.


Results

Image GT Results


Results on ADE20K


ILSVRC 2016

Ensemble vs. Single Model


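Robust Regression Based on Deep Negative Correlation Learning

• Regression: input → related output

• Dense crowd counting, age estimation, image super-resolution, ...

• Current mainstream approaches

• Designing robust loss functions

• A single hypothesis has insufficient capacity

• Ensemble learning (EL)

• Multi-model ensembles → large parameter counts → rarely used in practice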

Nonlinear Regression via Deep Negative Correlation Learning, IEEE TPAMI, 2020.



• Negative Correlation Learning (NCL)

• Ensemble learning (EL)

• Systematically controls the bias-variance-covariance of the sub-models

• DNCL (Deep Negative Correlation Learning)

• Adds no extra parameters



• For a mapping G: X → Y, the loss function is

• Assume the ensemble model is obtained by averaging multiple sub-models
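
In the standard negative correlation learning setting, the two statements above are usually written as follows. The notation (K sub-models G_k, ensemble output G-tilde, squared-error loss, noise-free target) is an assumption and may differ from the formulas shown in the talk.

```latex
% Assumed notation: K sub-models G_k, ensemble \tilde{G}, target Y.
L(G) = \mathbb{E}\big[(G(X) - Y)^2\big],
\qquad
\tilde{G}(X) = \frac{1}{K}\sum_{k=1}^{K} G_k(X)

% Bias-variance-covariance decomposition of the ensemble error,
% with bias, variance, and covariance averaged over the sub-models.
\mathbb{E}\big[(\tilde{G} - Y)^2\big]
  = \overline{\mathrm{bias}}^{\,2}
  + \frac{1}{K}\,\overline{\mathrm{var}}
  + \Big(1 - \frac{1}{K}\Big)\,\overline{\mathrm{covar}}
```

The last term is the one an ensemble can drive down by making its sub-models negatively correlated.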



• DNCL (Deep Negative Correlation Learning)

• Systematically controls the bias-variance-covariance



• Each sub-model is accurate and “diversified”

• No additional computation

• Implemented by using group convolution to split the top-layer features into groups
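
A minimal sketch of the points above, assuming a squared-error loss with a negative-correlation penalty of the usual form 0.5*(G_k - y)^2 - lam*(G_k - mean)^2. The top-layer features are split into K groups, and each group feeds one sub-model. The per-group regressor here is simply the group mean, a hypothetical stand-in for the learned group convolution, and all function names are illustrative.

```python
from typing import List


def split_into_groups(features: List[float], k: int) -> List[List[float]]:
    """Split the top-layer feature vector into k equal groups."""
    if len(features) % k != 0:
        raise ValueError("feature length must be divisible by k")
    width = len(features) // k
    return [features[i * width:(i + 1) * width] for i in range(k)]


def ncl_losses(preds: List[float], target: float, lam: float) -> List[float]:
    """Per-sub-model loss: accuracy term minus a diversity reward."""
    ensemble = sum(preds) / len(preds)
    return [0.5 * (p - target) ** 2 - lam * (p - ensemble) ** 2
            for p in preds]


features = [1.0, 3.0, 5.0, 7.0]
groups = split_into_groups(features, k=2)
# Hypothetical sub-model: predict the mean of its own feature group.
preds = [sum(g) / len(g) for g in groups]
losses = ncl_losses(preds, target=4.0, lam=0.25)
print(preds, losses)
```

Because the sub-models share the backbone and differ only in which feature group they read, the ensemble adds no parameters beyond a single-model head of the same width.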



• Theoretical results

• Ensemble error ≤ average error of the sub-models

• Lower Rademacher complexity → easier to optimize
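
For squared error and an averaging ensemble, the first inequality can be checked numerically through the well-known ambiguity decomposition: the ensemble error equals the average sub-model error minus the average spread of the sub-models around the ensemble. The sketch below illustrates that identity on made-up predictions; it is not the proof given in the paper.

```python
from typing import List, Tuple


def ensemble_vs_members(preds: List[float],
                        target: float) -> Tuple[float, float, float]:
    """Return (ensemble error, mean member error, mean diversity)."""
    k = len(preds)
    ensemble = sum(preds) / k
    ensemble_err = (ensemble - target) ** 2
    mean_member_err = sum((p - target) ** 2 for p in preds) / k
    diversity = sum((p - ensemble) ** 2 for p in preds) / k
    return ensemble_err, mean_member_err, diversity


# Made-up sub-model predictions for a target of 2.0.
ens_err, avg_err, div = ensemble_vs_members([1.0, 2.0, 6.0], target=2.0)
print(ens_err, avg_err, div)
```

Since the diversity term is never negative, the ensemble error cannot exceed the average sub-model error, and the gap grows as the sub-models disagree more.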



• Application 1: Crowd counting



• Application 2: Personality analysis

Comparison of the properties of the proposed method vs. the top teams in the ChaLearn First Impressions Challenge.



• Application 3: Age estimation



• Application 4: Super-resolution


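Summary of the Talk

• Hierarchical progressive residual network → rich-scale feature extraction

• Image classification

• Object detection

• Activation map prediction

• Salient object detection

• Semantic segmentation

• Instance segmentation

• Keypoint estimation

• Interactive segmentation

• Deep negative correlation learning → single-model compute cost, multi-model performance

• Crowd counting

• Age estimation

• Personality analysis

• Super-resolution

• Adaptive pooling

• Anisotropic global information

• State-of-the-art semantic segmentation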


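AI-assisted Diagnosis of COVID-19 from CT Images

• Used in 50+ hospitals in China and abroad: ↓ time, ↑ accuracy

• As of March 26, it had served 153,000 suspected patients

• The system has been deployed in the United States, Italy, Japan, Russia, and other countries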


Case Studies


Tuberculosis Diagnosis

Rethinking Computer-aided Tuberculosis Diagnosis, CVPR 2020.


Tuberculosis Diagnosis

Human study by radiologists: accuracy is 68.7%, or 84.8% when active and latent TB are not distinguished


Thank you!
