miguel dominguez, felipe petroski such, shagan sah ... · miguel dominguez, felipe petroski such,...
TRANSCRIPT
Miguel Dominguez, Felipe Petroski Such, Shagan Sah, Raymond PtuchaRochester Institute of Technology, Rochester, NY, USA
• Point clouds cannot be classified by
traditional CNNs (not an array)
• Voxels have an 𝑂 𝑑3 storage cost
• Treating point clouds as a graph more efficient
• Graphs can be posed as 𝑮 = 𝑽,𝑨• 𝑽 ∈ ℝ𝑁×𝑓 are vertices, 𝑨 ∈ ℝ𝑁×𝑁 is
adjacency matrix
• Can produce filters as polynomials of 𝑨• 𝑯 = ℎ0𝑰 + ℎ1𝑨 + ℎ2𝑨
2…ℎ𝑘𝑨𝑘
• In CNNs, can approximate as a cascade of
small filters:
• 𝑯𝒃𝒂𝒔𝒆 ≈ ℎ0𝑰 + ℎ1𝑨• Filtering 𝑽: 𝑽𝒐𝒖𝒕 = 𝑯𝑽𝒊𝒏• Stacks of 𝐿 adjacency matrices can encode
multiple features and learn more parameters
• Voxels can be used for accurate
classification if heavily downsampled
upfront
• Even high-dimension voxels introduce
artifacts
• Low-dimensional meshes preserve detail
• Asymptotically more efficient:
• Naïve implementation: 𝑂 𝑁2 +𝑁• Can be O(N + E) with sparse
implementation
Task 1: Facial expression
recognition on point clouds
Task 2: Modelnet10 3D meshes
1
43
2𝑨 =
0.9 1.21.2 0
0.8 00.4 0
0.8 0.40 0
0 1.51.5 0
𝑽 =
1234
0.8
1.2
0.4
1.5
0.9
• Row 𝑖 of 𝑨 is an indicator function
• Nonzero values indicate connections to vertex 𝑖• ℎ10.9(1) + ℎ11.2(2) + ℎ10.8(3) is the
weighted sum of vertex 𝑖 “one-hop”
neighborhood
• 𝑨2 is “two-hop” neighborhood, encodes
connections two hops away from vertex 𝑖
Architecture Accuracy # Parameters
3x GraphConv16 54.8% 8016
4x GraphConv16 70.0% 10336
5x GraphConv16 67.9% 12656
CNN + Images 82.2% 8032
Architecture Accuracy
VRN Ensemble 97.14%
ORION 93.8%
4x GraphConv24(ours) 74.3%
Ravanbakhsh,et al. Graph Method* 58.0%
*Only has results on ModelNet40, not Modelnet10
• 5-layer Image CNN may have
advantage due to richer input
features
• Limited receptive field, pooling
and stride to be added in the future
to increase the field
• Contact Miguel Dominguez at