![Page 1: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/1.jpg)
Challenges for Deep Scene Understanding
Bolei Zhou
MIT
Hang Zhao, Sanja Fidler (UToronto), Adela Barriuso, Antonio Torralba, Aditya Khosla, Aude Oliva, Xavier Puig, Bolei Zhou
![Page 2: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/2.jpg)
Objects in the Scene Context
![Page 3: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/3.jpg)
Challenge 1: Scene Classification
Top1: street
Top2: residential neighborhood
Top3: crosswalk
Top4: apartment building
Top5: office building
Challenge 2: Scene Parsing
objects, stuff
Deep scene understanding
![Page 4: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/4.jpg)
• 8 million training images from 365 categories of the Places Database
• Test set: 900 images per category
Webpage: http://places2.csail.mit.edu
![Page 5: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/5.jpg)
Constructing Places Database
1. Collect scene names from dictionary: ~1000 scene names
2. Query and download images: 696 adjectives + scene names, ~90 million raw images downloaded
3. Annotate through Amazon Mechanical Turk: three rounds of annotations
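The query step pairs adjectives with scene names. A minimal Python sketch of that pairing follows. It assumes a plain "adjective scene" query string and also keeps the bare scene names; the slides state neither detail. The word lists and the `build_queries` helper are hypothetical stand-ins for the 696 adjectives and ~1000 scene names.

```python
from itertools import product


def build_queries(adjectives, scene_names):
    """Return the bare scene names plus one query per (adjective, scene name) pair."""
    queries = list(scene_names)
    queries += [f"{adj} {scene}" for adj, scene in product(adjectives, scene_names)]
    return queries


# Hypothetical word lists, for illustration only.
adjectives = ["messy", "sunny", "spare"]
scene_names = ["bedroom", "street"]

queries = build_queries(adjectives, scene_names)
print(len(queries))   # 2 bare names + 3 * 2 combinations
print(queries[:3])
```

With the full lists, the number of queries grows as the product of the two list sizes, which is in line with a raw download in the tens of millions of images.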
![Page 6: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/6.jpg)
Urban: amusement park, arch, corral, windmill, train station platform, tower, swimming pool, street, soccer field
Indoor: elevator door, bar, cafeteria, veterinarians office, bedroom, conference center, staircase, shoe shop
Nature: field road, fishpond, watering hole, rainforest
![Page 7: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/7.jpg)
Results
92 valid submissions from 27 teams (each team is allowed to submit at most 5 submissions).
[Figure: Top-5 errors of all 92 submissions (sorted); x-axis: submission rank, 0 to 100; y-axis: top-5 error, 6% to 20%]
Baseline, single ResNet152: 14.9%
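For reference, top-5 error counts an image as wrong when its true category is not among the five highest-scoring predictions. A minimal NumPy sketch, assuming a score matrix of shape (images, classes) with at least five classes; the `top5_error` name and the toy scores are illustrative, not the challenge's evaluation code:

```python
import numpy as np


def top5_error(scores, labels):
    """Fraction of images whose true label is not among the 5 highest-scoring classes."""
    scores = np.asarray(scores)
    labels = np.asarray(labels)
    # Indices of the 5 largest scores per row (order within the 5 does not matter).
    top5 = np.argpartition(-scores, 4, axis=1)[:, :5]
    hit = (top5 == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()


# Toy example with 6 classes: the first image misses, the second hits.
scores = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.0],
    [0.9, 0.8, 0.7, 0.6, 0.5, 0.1],
]
labels = [5, 0]
print(top5_error(scores, labels))
```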
![Page 8: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/8.jpg)
Results
| Team Name | Top-5 Error |
| --- | --- |
| Hikvision | 9.01% |
| MW | 10.19% |
| Trimps-Soushen | 10.30% |
| SIAT_MMLAB | 10.43% |
| NTU-SC | 10.85% |

Hikvision: Qiaoyong Zhong, Chao Li, Yingying Zhang, Haiming Sun, Shicai Yang, Di Xie, Shiliang Pu. Hikvision Research Institute
MW: Gang Sun and Jie Hu. Chinese Academy of Sciences and Peking University
Trimps-Soushen: Jie Shao, Xiaoteng Zhang, Zhengyan Ding, Yixin Zhao, Yanjun Chen, Jianying Zhou, Wenfei Wang, Lin Mei, Chuanping Hu. The Third Research Institute of the Ministry of Public Security, China

92 valid submissions from 27 teams.

Single model baselines:

| Model | Top-5 Error |
| --- | --- |
| ResNet152 | 14.93% |
| VGG16 | 14.99% |
| AlexNet | 17.25% |
![Page 9: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/9.jpg)
Ambiguous predictions
1) Unusual activity in a scene
2) Multiple scene parts
top-1: restaurant
top-2: ice cream parlor
top-3: coffee shop
top-4: pizzeria
top-5: cafeteria
aquarium
top-1: campsite
top-2: sandbox
top-3: beer garden
top-4: market outdoor
top-5: flea market indoor
junkyard
top-1: balcony interior
top-2: beach house
top-3: boardwalk
top-4: roof garden
top-5: restaurant patio
lagoon
top-1: martial arts gym
top-2: stable
top-3: boxing ring
top-4: locker room
top-5: basketball court
construction site
![Page 10: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/10.jpg)
Scene parsing
• New challenge this year
• Each pixel of the image is classified into some class
[Figure labels: class label, semantic mask]
![Page 11: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/11.jpg)
• 22,000 images for training and validation, 3,000 images for testing
• 150 classes of objects (car, person, table, etc.) and stuff (sky, road, ceiling, etc.)
[Figure: Pixel frequency in the training set; axis from 0 to 0.18]
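A pixel-frequency plot of this kind can be derived by counting labels over all training masks. The sketch below assumes each mask is an integer array with class indices in 0..num_classes-1; the dataset's actual file encoding is not given on the slide, and `pixel_frequency` is a hypothetical helper name.

```python
import numpy as np


def pixel_frequency(masks, num_classes):
    """Share of all labeled pixels that belongs to each class."""
    counts = np.zeros(num_classes, dtype=np.int64)
    for mask in masks:
        flat = np.asarray(mask).ravel()
        # bincount may return more bins than num_classes; keep only the known classes.
        counts += np.bincount(flat, minlength=num_classes)[:num_classes]
    return counts / counts.sum()


# Two toy 2x2 masks with 3 classes.
masks = [np.array([[0, 0], [1, 2]]), np.array([[0, 1], [1, 1]])]
freq = pixel_frequency(masks, num_classes=3)
print(freq)
```

Sorting `freq` in descending order gives the long-tailed shape typical of such plots, where a few stuff classes cover most pixels.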
![Page 13: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/13.jpg)
Constructing ADE Dataset
• Annotating each object instance in a scene
• Single expert annotator for a few years of work
http://groups.csail.mit.edu/vision/datasets/ADE20K/
LabelMe Annotation Tool
Ms. Adela Barriuso
![Page 22: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/22.jpg)
Results
75 valid submissions from 22 teams
[Figure: Final score = (mean IoU + pixel accuracy) / 2 for all 75 submissions; x-axis: submission, 1 to 75; y-axis: final score, 0.3 to 0.6]
Baseline (VGG-based): 0.4567
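The final score averages mean IoU and pixel accuracy. A minimal NumPy sketch of that formula, computed from a confusion matrix. Two details are assumptions rather than the official evaluation protocol: every pixel is taken to carry a valid label, and classes absent from both ground truth and prediction are skipped when averaging IoU. The function names are illustrative.

```python
import numpy as np


def confusion_matrix(gt, pred, num_classes):
    """Rows index ground-truth classes, columns index predicted classes."""
    gt = np.asarray(gt).ravel()
    pred = np.asarray(pred).ravel()
    idx = gt * num_classes + pred
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)


def final_score(gt, pred, num_classes):
    """(mean IoU + pixel accuracy) / 2."""
    cm = confusion_matrix(gt, pred, num_classes)
    tp = np.diag(cm)
    pixel_acc = tp.sum() / cm.sum()
    # Union = ground-truth pixels + predicted pixels - intersection, per class.
    union = cm.sum(axis=0) + cm.sum(axis=1) - tp
    present = union > 0
    mean_iou = (tp[present] / union[present]).mean()
    return (mean_iou + pixel_acc) / 2


gt = [[0, 0], [1, 1]]
pred = [[0, 1], [1, 1]]
score = final_score(gt, pred, num_classes=2)
print(score)
```

In the toy example the per-class IoUs are 1/2 and 2/3 and the pixel accuracy is 3/4, so the score is (7/12 + 3/4) / 2 = 2/3.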
![Page 23: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/23.jpg)
Results
| Team Name | Final Score |
| --- | --- |
| SenseCUSceneParsing | 0.5721 |
| Adelaide | 0.5674 |
| 360+MCG-ICT-CAS_SP | 0.5556 |
| SegModel | 0.5465 |
| CASIA_IVA | 0.5433 |

Final Score = (mean IoU + pixel accuracy) / 2

SenseCUSceneParsing: Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Tong Xiao, Jiaya Jia. Sensetime and CUHK, Hong Kong
Adelaide: Zifeng Wu, Chunhua Shen, Anton van den Hengel. University of Adelaide, Australia
360+MCG-ICT-CAS_SP: Yu Li, Sheng Tang, Min Lin, Rui Zhang, YunPeng Chen, YongDong Zhang, JinTao Li, YuGang Han, ShuiCheng Yan. Qihoo 360, Chinese Academy of Sciences, National University of Singapore; China and Singapore

75 valid submissions from 22 teams

Single model baselines:

| Model | Final Score |
| --- | --- |
| DilatedNet | 0.4567 |
| FCN-8s | 0.4480 |
| SegNet | 0.4079 |
![Page 24: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/24.jpg)
Data Consistency and Human Performance
• 61 images from the val set are re-annotated after 6 months. 82.4% of pixels got the same label.
[Figure: bar chart of pixel accuracy; bar values 64.03%, 64.77%, 65.41%, 73.67%, 74.49%, 74.73%, and 82.40%; y-axis: 50% to 85%]
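The re-annotation consistency is the share of pixels that received the same label in both passes. A minimal sketch, assuming the two passes are lists of same-sized integer masks and that agreement is pooled over all pixels rather than averaged per image; the slide does not say which, and `label_agreement` is a hypothetical name.

```python
import numpy as np


def label_agreement(first_pass, second_pass):
    """Fraction of pixels that have the same label in both annotation passes."""
    same = 0
    total = 0
    for a, b in zip(first_pass, second_pass):
        a, b = np.asarray(a), np.asarray(b)
        same += int((a == b).sum())
        total += a.size
    return same / total


# Toy example: two images, 4 of 6 pixels agree.
first_pass = [[[0, 1], [2, 2]], [[1, 1]]]
second_pass = [[[0, 1], [2, 3]], [[1, 0]]]
print(label_agreement(first_pass, second_pass))
```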
![Page 25: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/25.jpg)
Data Consistency and Human Performance
[Figure: the same pixel accuracy bar chart as on the previous slide, with two added panels labeled "average image" and "annotation mode"; class labels shown: floor, sky, wall, building; 20.30% pixel accuracy]
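One plausible reading of "annotation mode", stated here as an assumption, is a constant prediction that assigns each pixel position its most frequent training label. The sketch below builds such a mode mask and scores it with pixel accuracy. It assumes all masks have been resized to a common resolution; the helper names are hypothetical.

```python
import numpy as np


def annotation_mode(masks, num_classes):
    """Most frequent label at each pixel position across same-sized masks."""
    stack = np.stack([np.asarray(m) for m in masks])  # shape (N, H, W)
    # Per-class vote counts at every pixel position, shape (C, H, W).
    counts = np.stack([(stack == c).sum(axis=0) for c in range(num_classes)])
    return counts.argmax(axis=0)


def pixel_accuracy(gt, pred):
    """Fraction of pixels where the prediction matches the ground truth."""
    gt, pred = np.asarray(gt), np.asarray(pred)
    return float((gt == pred).mean())


train_masks = [
    [[0, 0], [1, 1]],
    [[0, 1], [1, 1]],
    [[0, 0], [2, 1]],
]
mode_mask = annotation_mode(train_masks, num_classes=3)
print(mode_mask)
print(pixel_accuracy([[0, 1], [1, 1]], mode_mask))
```

A baseline like this ignores the image entirely, which is why its accuracy sits far below that of the learned models.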
![Page 26: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/26.jpg)
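[Figure: five example images, each shown with its ground truth and the predictions of the top five teams; the percentage shown with each prediction is tabulated below]

| Example | SenseCU… | Adelaide | 360+MCG… | SegModel | CASIA_IVA |
| --- | --- | --- | --- | --- | --- |
| 1 | 96.55% | 96.36% | 96.27% | 91.96% | 91.04% |
| 2 | 94.14% | 95.63% | 96.55% | 94.19% | 89.59% |
| 3 | 93.71% | 93.51% | 94.84% | 94.89% | 94.54% |
| 4 | 92.73% | 91.88% | 93.02% | 92.62% | 91.47% |
| 5 | 92.25% | 85.66% | 90.47% | 79.81% | 84.85% |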
![Page 27: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/27.jpg)
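[Figure: four example images, each shown with its ground truth and the predictions of the top five teams; the percentage shown with each prediction is tabulated below. Class labels marked on the slide: grandstand, boat, car, booth]

| Example | SenseCU… | Adelaide | 360+MCG… | SegModel | CASIA_IVA |
| --- | --- | --- | --- | --- | --- |
| 1 | 62.89% | 51.23% | 50.25% | 45.66% | 67.91% |
| 2 | 56.19% | 42.86% | 55.82% | 81.53% | 48.69% |
| 3 | 54.08% | 52.81% | 53.70% | 50.49% | 38.97% |
| 4 | 41.49% | 40.72% | 49.07% | 42.28% | 45.04% |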
![Page 28: Challenges for Deep Scene Understanding - ImageNetimage-net.org/challenges/talks/2016/scene_challenges2016.pdf · Challenges for Deep Scene Understanding BoleiZhou MIT Hang Zhao Sanja](https://reader031.vdocuments.us/reader031/viewer/2022030406/5a820bb77f8b9a24668d7434/html5/thumbnails/28.jpg)
Thanks to all the participants and the audience!
Hang Zhao, Sanja Fidler (UToronto), Adela Barriuso, Antonio Torralba, Aditya Khosla, Aude Oliva, Xavier Puig, Bolei Zhou
http://places2.csail.mit.edu
http://sceneparsing.csail.mit.edu