
Reading Note: ResNeSt: Split-Attention Networks

TITLE: ResNeSt: Split-Attention Networks

AUTHOR: Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

ASSOCIATION: Amazon, University of California, Davis

FROM: arXiv:2004.08955

CONTRIBUTION

  1. A simple modification of the ResNet architecture is explored: feature-map split attention is incorporated within the individual network blocks.
  2. Models using a ResNeSt backbone achieve state-of-the-art performance on several tasks, namely image classification, object detection, instance segmentation, and semantic segmentation.

METHOD

Split-Attention Block

In this work, a Split-Attention block is explored. For implementation convenience, the radix-major view is the easier one to follow. The following figure illustrates the Split-Attention block.

[Figure: Split-Attention Block]

As shown in the radix-major implementation of the ResNeSt block, feature-map groups with the same radix index but different cardinality indices are physically adjacent. This implementation can be easily accelerated: the $1 \times 1$ convolutional layers can be unified into a single layer, and the $3 \times 3$ convolutional layers can be implemented as one grouped convolution with the number of groups equal to $RK$. A minimal sketch of this layout is given below.
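The following PyTorch sketch illustrates the radix-major layout under the assumptions above: a unified $1 \times 1$ convolution, a single grouped $3 \times 3$ convolution with $RK$ groups, and a softmax over the radix dimension. The `reduction` factor and the attention-MLP width are illustrative choices, not the exact settings of the reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SplitAttention(nn.Module):
    """Radix-major Split-Attention block (minimal sketch)."""
    def __init__(self, channels, radix=2, cardinality=1, reduction=4):
        super().__init__()
        self.channels, self.radix = channels, radix
        inter = max(channels * radix // reduction, 32)  # illustrative width
        self.conv = nn.Sequential(
            # Unified 1x1 convolution for all R*K groups.
            nn.Conv2d(channels, channels * radix, 1, bias=False),
            nn.BatchNorm2d(channels * radix),
            nn.ReLU(inplace=True),
            # Single grouped 3x3 convolution with groups = R*K.
            nn.Conv2d(channels * radix, channels * radix, 3, padding=1,
                      groups=radix * cardinality, bias=False),
            nn.BatchNorm2d(channels * radix),
            nn.ReLU(inplace=True),
        )
        # Attention MLP operating on pooled 1x1 features.
        self.fc1 = nn.Conv2d(channels, inter, 1)
        self.fc2 = nn.Conv2d(inter, channels * radix, 1)

    def forward(self, x):
        b = x.size(0)
        x = self.conv(x)                             # (B, C*R, H, W), radix-major
        splits = x.chunk(self.radix, dim=1)          # R tensors of (B, C, H, W)
        gap = F.adaptive_avg_pool2d(sum(splits), 1)  # fused global context
        att = self.fc2(F.relu(self.fc1(gap)))        # (B, C*R, 1, 1)
        if self.radix > 1:
            # Softmax over the radix dimension, per channel.
            att = F.softmax(att.view(b, self.radix, self.channels), dim=1)
            att = att.reshape(b, -1, 1, 1)
        else:
            # radix = 1 degenerates to SE-style sigmoid gating.
            att = torch.sigmoid(att)
        return sum(a * s for a, s in zip(att.chunk(self.radix, dim=1), splits))

# Usage: spatial size and channel count are preserved.
x = torch.randn(2, 64, 56, 56)
y = SplitAttention(64, radix=2, cardinality=2)(x)
print(y.shape)  # torch.Size([2, 64, 56, 56])
```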

Comparison with Prior Work

The following figure shows the comparison; the Split-Attention block is shown in the cardinality-major view.

[Figure: Comparison]

SE-Net introduces squeeze-and-attention (called excitation in the original paper), which uses global context to predict channel-wise attention factors. With radix $= 1$, the Split-Attention block applies a squeeze-and-attention operation to each cardinal group, whereas SE-Net operates on the entire block regardless of the groups. Earlier models such as SK-Net introduced feature attention between two network branches, but their operations are not optimized for training efficiency or for scaling to large neural networks. This work generalizes prior work on feature-map attention to the cardinal-group setting, and its implementation remains computationally efficient. For reference, a minimal SE-style gate is sketched below.
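The following sketch shows the squeeze-and-attention gate that SE-Net applies over an entire block; with radix $= 1$, Split-Attention applies this kind of gate within each cardinal group instead. The reduction ratio of 16 follows the common SE-Net convention and is an assumption here, not taken from this paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SEGate(nn.Module):
    """SE-style squeeze-and-attention gate (minimal sketch)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)

    def forward(self, x):
        # Squeeze: global average pooling to one descriptor per channel.
        s = F.adaptive_avg_pool2d(x, 1).flatten(1)        # (B, C)
        # Excitation: MLP predicts channel-wise attention factors in (0, 1).
        g = torch.sigmoid(self.fc2(F.relu(self.fc1(s))))  # (B, C)
        return x * g.view(x.size(0), -1, 1, 1)
```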

PERFORMANCE

Classification

[Figure: classification results]

[Figure: classification, comparison with state-of-the-art models]

Detection

[Figure: detection results]

Segmentation

[Figure: instance segmentation results]

[Figure: semantic segmentation results]