Reading Note: Destruction and Construction Learning for Fine-grained Image Recognition

TITLE: Destruction and Construction Learning for Fine-grained Image Recognition

AUTHOR: Yue Chen, Yalong Bai, Wei Zhang, Tao Mei

ASSOCIATION: JD AI Research

FROM: arXiv:2003.14142

CONTRIBUTION

A novel “Destruction and Construction Learning (DCL)” framework is proposed for fine-grained recognition.For destruction, the region confusion mechanism (RCM) forces the classification network to learn from discriminative regions, and the adversarial loss prevents over-fitting the RCM-induced noisy patterns. For construction, the region alignment network re- stores the original region layout by modeling the semantic correlation among regions.
State-of-the-art performances are reported on three standard benchmark datasets, where DCL consistently outperforms existing methods.
Compared to existing methods, proposed DCL does not need extra part/object annotation and introduces no computational overhead at inference time.

METHOD

The proposed method consists of four parts as the following figure shows.

At training stage, three losses are used, including classification loss, adversarial loss and region alignment loss. The loss can be defined as

$L = \alpha L_{cls} + \beta L_{adv} + \gamma L_{loc}$

The three losses play different roles in this work.

Classification Network

At inference stage, only this part of the network is used. And this part introduces the classification loss $L_{cls}$.

Region Confuston Mechanism

Given an input image, the image is first uniformly partioned into $N \times N$ sub-regions. Then the sub-regions are rearranged whithin neighbourhood. This shuffling method destructs the global structure and ensures that the local region jitters inside its neighbourhood with a tunable size. Since the global structure has been destructed, to recognize these randomly shuffled images, the classification network has to find the discriminative regions and learn the delicate differences among categories.

Adversarial Learning Network

Destructing images with RCM does not always bring beneficial information for fine-grained classification. Features learned from these noise visual patterns are harmful to the classification task. Thus adversarial loss $L_{adv}$ is introduced to prevent such overfitting. This loss helps the filters to response differently on original images and region-shuffled images. Thus the network can work reliabally.

Region Alignment Network

The direct aim of the Region Alignment Network is to restore the original image from the scattered image. By end-to-end training, the region alignment loss $L_{loc}$ can help the classification backbone network to build deep understanding about objects and model the structure information, such as the shape of objects and semantic correlation among parts of object.

PERFORMANCE

The following table shows the comparison between this work and prior work.