0%

Reading Note: Deformable Part-based Fully Convolutional Network for Object Detection

TITLE: Deformable Part-based Fully Convolutional Network for Object Detection

AUTHOR: Taylor Mordan, Nicolas Thome, Matthieu Cord, Gilles Henaff

FROM: arXiv:1707.06175

CONTRIBUTIONS

  1. Deformable Part-based Fully Convolutional Network (DPFCN), an end-to-end model integrating ideas from DPM into region-based deep ConvNets for object detection, is proposed.
  2. A new deformable part-based RoI pooling layer is introduced, which explicitly selects discriminative elements of objects around region proposals by simultaneously optimizing latent displacements of all parts.
  3. Another improvement is the design of a deformation-aware localization module, a specific module exploiting configuration information to refine localization.

METHOD

R-FCN is the work closest to DP-FCN. Both are developed on the basis of Faster-RCNN, in which an RPN is used to generate object proposals and a designed pooling layer is used to extract features for classification and localization. The architecture of DP-FCN is illustrated in the following figure. A Deformable part-based RoI Pooling layer follows a FCN network. Then two branches predict category and location respectively. The output of the backbone FCN is similar to that in R-FCN. It has $ k^2(C+1) $ channels corresponding to $ k \times k $ parts and $ C $ categories and background.

DP-FCN

Deformable part-based RoI pooling

For each input channel, just like what has been done in DPM, a transformation is carried out to spread high responses to nearby locations, taking into account the deformation costs.

Deformable part-based RoI pooling

In my understanding, the output of RPN works like the root filter in DPM. Then the region proposal is evenly divided into $ k \times k $ sub-regions. Then these sub-regions will displace taking deformation into account. Displacement computed during the forward pass are stored and used to backpropagate gradients at the same locations.

Classification and localization predictions with deformable parts

Predictions are performed with two sibling branches for classification and relocalization of region proposals as is common practice. The classification branch is simply composed of an average pooling followed by a SoftMax layer.

Deformation-aware localization refinement

As for location prediction, every part has 4 elements to be predicted. In addition to that, the displacement is sent to two fully connected layers and is then element-wise multiplied with the first values to yield the final localization output for this class.