
# D2: Development Environment Setup

## 2. Installing Docker

### Changing the Image Storage Location (Optional)

1. Quit Docker Desktop

2. Confirm that all WSL distributions have exited; every one of them should be in the Stopped state

3. Migrate docker-desktop

4. Migrate docker-desktop-data
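
The four steps above can be carried out with the `wsl` command-line tool. A minimal sketch, assuming the images are moved to `D:\docker` (the target paths are examples, not a prescription):

```shell
# 1-2. Quit Docker Desktop, then stop all WSL distributions
wsl --shutdown
wsl -l -v                # every distribution should now show "Stopped"

# 3. Migrate docker-desktop: export, unregister, re-import on another drive
wsl --export docker-desktop D:\docker\docker-desktop.tar
wsl --unregister docker-desktop
wsl --import docker-desktop D:\docker\distro D:\docker\docker-desktop.tar --version 2

# 4. Migrate docker-desktop-data the same way
wsl --export docker-desktop-data D:\docker\docker-desktop-data.tar
wsl --unregister docker-desktop-data
wsl --import docker-desktop-data D:\docker\data D:\docker\docker-desktop-data.tar --version 2
```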

## 3. Creating a Development Container from the Ubuntu 18.04 Image

### Creating the Base Container

1. Pull the image directly from Docker Hub

2. Start the container
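
The two steps can be sketched as follows; the container name `dev` and the mounted host directory are examples:

```shell
# 1. Pull the official Ubuntu 18.04 image from Docker Hub
docker pull ubuntu:18.04

# 2. Start an interactive development container with a host directory mounted
docker run -it --name dev -v D:\work:/workspace ubuntu:18.04 /bin/bash
```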

### Installing the Base Environment

1. Update the package sources

2. Install basic development libraries

3. Use a cross-compilation toolchain

Download the toolchain (I use the Linaro toolchain) and build programs the usual way; for example, the MNN sample is
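
A sketch of the base installation and toolchain usage inside the container; the package list and the install path `/opt/gcc-linaro-aarch64-linux-gnu` are examples, not exact versions:

```shell
# 1. Update the package sources
apt update

# 2. Install basic development packages
apt install -y build-essential cmake git wget

# 3. After unpacking a Linaro aarch64 toolchain (downloaded from
#    releases.linaro.org) under /opt, put it on PATH and cross-compile
#    the usual way
export PATH=/opt/gcc-linaro-aarch64-linux-gnu/bin:$PATH
aarch64-linux-gnu-gcc --version
aarch64-linux-gnu-g++ -O2 -o demo demo.cpp   # ordinary build, cross compiler
```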

# D1: Installing Ubuntu on the RK3399Pro

## 2. Flashing the System

May the 4th, May the Force be With You!

“I wish it need not have happened in my time,” said Frodo.

“So do I,” said Gandalf, “and so do all who live to see such times. But that is not for them to decide. All we have to decide is what to do with the time that is given us.”

It’s great to see the movie in the cinema again.

# Spring, the sweet spring

BY THOMAS NASHE

Spring, the sweet spring, is the year’s pleasant king,
Then blooms each thing, then maids dance in a ring,
Cold doth not sting, the pretty birds do sing:
Cuckoo, jug-jug, pu-we, to-witta-woo!

The palm and may make country houses gay,
Lambs frisk and play, the shepherds pipe all day,
And we hear aye birds tune this merry lay:
Cuckoo, jug-jug, pu-we, to-witta-woo!

The fields breathe sweet, the daisies kiss our feet,
Young lovers meet, old wives a-sunning sit,
In every street these tunes our ears do greet:
Cuckoo, jug-jug, pu-we, to-witta-woo!

Spring, the sweet spring!

TITLE: Destruction and Construction Learning for Fine-grained Image Recognition

AUTHOR: Yue Chen, Yalong Bai, Wei Zhang, Tao Mei

ASSOCIATION: JD AI Research

FROM: arXiv:2003.14142

## CONTRIBUTION

1. A novel “Destruction and Construction Learning (DCL)” framework is proposed for fine-grained recognition. For destruction, the region confusion mechanism (RCM) forces the classification network to learn from discriminative regions, and the adversarial loss prevents over-fitting the RCM-induced noisy patterns. For construction, the region alignment network restores the original region layout by modeling the semantic correlation among regions.
2. State-of-the-art performances are reported on three standard benchmark datasets, where DCL consistently outperforms existing methods.
3. Compared to existing methods, the proposed DCL needs no extra part/object annotation and introduces no computational overhead at inference time.

## METHOD

The proposed method consists of four parts as the following figure shows.

At the training stage three losses are used: the classification loss, the adversarial loss, and the region alignment loss. The overall loss can be defined as

$$L = \alpha L_{cls} + \beta L_{adv} + \gamma L_{loc}$$

where $\alpha$, $\beta$, and $\gamma$ are weights balancing the three terms.

The three losses play different roles in this work.

### Classification Network

Only this part of the network is used at the inference stage. During training it contributes the classification loss $L_{cls}$.

### Region Confusion Mechanism

Given an input image, the image is first uniformly partitioned into $N \times N$ sub-regions. The sub-regions are then rearranged within their neighbourhoods. This shuffling destructs the global structure while ensuring that each local region jitters only inside a neighbourhood of tunable size. Since the global structure has been destructed, to recognize these randomly shuffled images the classification network has to find the discriminative regions and learn the delicate differences among categories.
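
A minimal NumPy sketch of this shuffling; the function names and the jittered-sort formulation follow the description above but are illustrative, not the paper's code:

```python
import numpy as np

def rcm_permutation(N, k, rng):
    # "Jittered sort": add uniform noise in [-k, k] to the indices and
    # argsort; each region then moves only a few slots from its origin.
    return np.argsort(np.arange(N) + rng.uniform(-k, k, size=N))

def region_confusion(img, N=4, k=2, seed=0):
    """Shuffle the N x N sub-regions of a 2-D image within neighbourhoods."""
    H, W = img.shape[:2]
    assert H % N == 0 and W % N == 0, "image must split evenly into N x N"
    h, w = H // N, W // N
    rng = np.random.default_rng(seed)
    # Split the image into an N x N grid of (h, w) regions.
    grid = img.reshape(N, h, N, w).swapaxes(1, 2).copy()
    for j in range(N):                     # shuffle regions within each row
        grid[j] = grid[j][rcm_permutation(N, k, rng)]
    for i in range(N):                     # then within each column
        grid[:, i] = grid[rcm_permutation(N, k, rng), i]
    # Stitch the grid back into an image.
    return grid.swapaxes(1, 2).reshape(H, W)
```

Because the noise is bounded, the resulting permutation keeps every sub-region close to its original position, which is what preserves the locally recognizable detail.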

Destructing images with the RCM does not always bring beneficial information for fine-grained classification. Features learned from these noisy visual patterns are harmful to the classification task, so the adversarial loss $L_{adv}$ is introduced to prevent such overfitting. This loss encourages the filters to respond differently to original images and region-shuffled images, so that the network works reliably.

### Region Alignment Network

The direct aim of the Region Alignment Network is to restore the original image from the scattered one. Through end-to-end training, the region alignment loss $L_{loc}$ helps the classification backbone build a deep understanding of objects and model structural information, such as object shape and the semantic correlation among object parts.
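
As a toy illustration, $L_{loc}$ can be written as a mean L1 distance between the locations predicted for the shuffled regions and their true locations in the original layout; the shapes and the function name here are assumptions:

```python
import numpy as np

def region_alignment_loss(pred, true):
    """L_loc sketch: mean L1 distance between the (row, col) coordinates
    predicted for each of the N*N shuffled regions and their ground-truth
    positions in the original image. Both arrays have shape (N*N, 2)."""
    return float(np.abs(np.asarray(pred, dtype=float)
                        - np.asarray(true, dtype=float)).mean())
```

A perfect restoration gives a loss of zero, and the loss grows linearly with how far each region's predicted position drifts from its true one.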

## PERFORMANCE

The following table shows the comparison between this work and prior work.

TITLE: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification

AUTHOR: Yifeng Ding, Shaoguo Wen, Jiyang Xie, Dongliang Chang, Zhanyu Ma, Zhongwei Si, Haibin Ling

ASSOCIATION: Beijing University of Posts and Telecommunications, Stony Brook University

FROM: arXiv:2002.03353

## CONTRIBUTION

1. A novel attention pyramid convolutional neural network (AP-CNN) is proposed by building an enhanced pyramidal hierarchy, which combines a top-down pathway of features and a bottom-up pathway of attentions, and thus learns both high-level semantic and low-level detailed feature representations.
2. ROI guided refinement is proposed, consisting of ROI guided dropblock and ROI guided zoom-in, to further refine the features. The dropblock operation helps to locate more discriminative local regions, and the zoom-in operation aligns features while eliminating background noise.

## METHOD

AP-CNN is a two-stage network with a raw stage and a refined stage, which take coarse full images and refined features as input, respectively. An overview of the proposed AP-CNN is illustrated in the following figure.

First, the feature and attention pyramid structure takes coarse images as input, which generates the pyramidal features and the pyramidal attentions by establishing hierarchy on the basic CNN following a top-down feature pathway and a bottom-up attention pathway.

Second, once the spatial attention pyramid has been obtained from the raw input, the region proposal network (RPN) proceeds to generate the pyramidal regions of interest (ROIs) in a weakly supervised way. Then the ROI guided refinement is conducted on low-level features with a) the ROI guided dropblock which erases the most discriminative regions selected from small-scaled ROIs, and b) the ROI guided zoom-in which locates the major regions merged from all ROIs.

Third, the refined features are sent into the refined-stage to distill more discriminative information. Both stages set individual classifiers for each pyramid level, and the final classification result is averaged over the raw-stage predictions and the refined-stage predictions.

The Attention Pyramid consists of two types of attentions, Spatial Attention and Channel Attention. The following figure shows the data-flow.

The Spatial Attention Pyramid is a set of feature maps of different resolutions generated from the feature pyramid. An ROI pyramid is then generated from the spatial activations using the RPN. At the training stage, an ROI is selected to be dropped, erasing the informative part and encouraging the network to find more discriminative regions. At the testing stage, this operation is skipped. The following figure shows the ROI guided refinement.
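
A toy NumPy sketch of the train-only erasing behaviour described above; the tensor layout and the function name are assumptions:

```python
import numpy as np

def roi_guided_dropblock(feat, roi, training=True):
    """ROI guided dropblock sketch: during training, zero the feature-map
    area covered by the selected (most discriminative) ROI; at test time
    the operation is skipped. `feat` is (C, H, W); `roi` = (x1, y1, x2, y2)
    in feature-map coordinates."""
    if not training:          # testing stage: operation is skipped
        return feat
    x1, y1, x2, y2 = roi
    out = feat.copy()
    out[:, y1:y2, x1:x2] = 0.0   # erase the informative part
    return out
```

Erasing the strongest region forces the gradient to flow through the remaining activations, which is what pushes the network toward additional discriminative parts.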

## PERFORMANCE

The following table shows the comparison between this work and prior work.