
Spring, the sweet spring

BY THOMAS NASHE

Spring, the sweet spring, is the year’s pleasant king,
Then blooms each thing, then maids dance in a ring,
Cold doth not sting, the pretty birds do sing:
Cuckoo, jug-jug, pu-we, to-witta-woo!

The palm and may make country houses gay,
Lambs frisk and play, the shepherds pipe all day,
And we hear aye birds tune this merry lay:
Cuckoo, jug-jug, pu-we, to-witta-woo!

The fields breathe sweet, the daisies kiss our feet,
Young lovers meet, old wives a-sunning sit,
In every street these tunes our ears do greet:
Cuckoo, jug-jug, pu-we, to witta-woo!

Spring, the sweet spring!

2021.04.04

TITLE: Destruction and Construction Learning for Fine-grained Image Recognition

AUTHOR: Yue Chen, Yalong Bai, Wei Zhang, Tao Mei

ASSOCIATION: JD AI Research

FROM: arXiv:2003.14142

CONTRIBUTION

  1. A novel “Destruction and Construction Learning (DCL)” framework is proposed for fine-grained recognition. For destruction, the region confusion mechanism (RCM) forces the classification network to learn from discriminative regions, and the adversarial loss prevents over-fitting to the RCM-induced noisy patterns. For construction, the region alignment network restores the original region layout by modeling the semantic correlation among regions.
  2. State-of-the-art performances are reported on three standard benchmark datasets, where DCL consistently outperforms existing methods.
  3. Compared to existing methods, the proposed DCL requires no extra part/object annotations and introduces no computational overhead at inference time.

METHOD

The proposed method consists of four parts as the following figure shows.

Framework

At the training stage, three losses are used: the classification loss, the adversarial loss, and the region alignment loss. The total loss is defined as

$$L = \alpha L_{cls} + \beta L_{adv} + \gamma L_{loc} $$

The three losses play different roles in this work.

Classification Network

Only this part of the network is used at the inference stage. It introduces the classification loss $L_{cls}$.

Region Confusion Mechanism

Given an input image, the image is first uniformly partitioned into $N \times N$ sub-regions. The sub-regions are then rearranged within their neighbourhoods. This shuffling destroys the global structure while ensuring that each local region jitters inside a neighbourhood of tunable size. Since the global structure has been destroyed, the classification network has to find the discriminative local regions and learn the delicate differences among categories in order to recognize these randomly shuffled images.
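A minimal pure-Python sketch of this shuffling (the neighbourhood parameter $k$ follows the paper's description, but the code below is illustrative, not the authors' implementation): permutations are obtained by sorting indices jittered with uniform noise in $(-k, k)$, which bounds each region's displacement to less than $2k$ per axis.

```python
import random

def rcm_shuffle(grid, k=2, seed=0):
    """Destruct the global structure of an N x N grid of sub-regions.

    Each region's final position deviates from its original row/column
    index by less than 2k, so local detail survives while the global
    layout is destroyed (a sketch of RCM, not the paper's code).
    """
    rng = random.Random(seed)
    N = len(grid)
    # Permutation from sorting indices jittered by uniform(-k, k) noise.
    jitter = lambda: sorted(range(N), key=lambda i: i + rng.uniform(-k, k))
    # Shuffle columns within each row ...
    grid = [[row[j] for j in jitter()] for row in grid]
    # ... then rows within each column.
    perms = [jitter() for _ in range(N)]
    return [[grid[perms[c][r]][c] for c in range(N)] for r in range(N)]
```

Tagging each cell with its original coordinates makes it easy to check that the result is a permutation of the input with bounded jitter.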

Adversarial Learning Network

Destructing images with RCM does not always produce information beneficial to fine-grained classification, and features learned from the resulting noisy visual patterns can harm the classification task. The adversarial loss $L_{adv}$ is therefore introduced to prevent such overfitting: it encourages the filters to respond differently to original images and region-shuffled images, so that the network works reliably.

Region Alignment Network

The direct aim of the Region Alignment Network is to restore the original image from the scattered image. By end-to-end training, the region alignment loss $L_{loc}$ can help the classification backbone network to build deep understanding about objects and model the structure information, such as the shape of objects and semantic correlation among parts of object.
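Assuming an L1 penalty between the locations predicted for each shuffled sub-region and that region's original grid coordinates (consistent with the paper's description of $L_{loc}$; the prediction head itself is omitted), the loss can be sketched as:

```python
def region_alignment_loss(pred_locs, true_locs):
    # pred_locs / true_locs: per-region (row, col) coordinates; the loss
    # averages the L1 distance over all regions, pushing the network to
    # recover where each shuffled region originally came from.
    n = len(pred_locs)
    return sum(abs(pr - tr) + abs(pc - tc)
               for (pr, pc), (tr, tc) in zip(pred_locs, true_locs)) / n
```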

PERFORMANCE

The following table shows the comparison between this work and prior work.

performance

ablation

TITLE: Weakly Supervised Attention Pyramid Convolutional Neural Network for Fine-Grained Visual Classification

AUTHOR: Yifeng Ding, Shaoguo Wen, Jiyang Xie, Dongliang Chang, Zhanyu Ma, Zhongwei Si, Haibin Ling

ASSOCIATION: Beijing University of Posts and Telecommunications, Stony Brook University

FROM: arXiv:2002.03353

CONTRIBUTION

  1. A novel attention pyramid convolutional neural network (AP-CNN) is proposed by building an enhanced pyramidal hierarchy, which combines a top-down pathway of features and a bottom-up pathway of attentions, and thus learns both high-level semantic and low-level detailed feature representations.
  2. ROI guided refinement is proposed which consists of ROI guided dropblock and ROI guided zoom-in to further refine the features. The dropblock operation helps to locate more discriminative local regions, and the zoom-in operation aligns features with background noises eliminated.

METHOD

AP-CNN is a two-stage network, with a raw stage and a refined stage that take coarse full images and refined features as input, respectively. An overview of the proposed AP-CNN is illustrated in the following figure.

Overview

First, the feature and attention pyramid structure takes coarse images as input and generates the pyramidal features and the pyramidal attentions by establishing a hierarchy on the basic CNN, following a top-down feature pathway and a bottom-up attention pathway.

Second, once the spatial attention pyramid has been obtained from the raw input, the region proposal network (RPN) proceeds to generate the pyramidal regions of interest (ROIs) in a weakly supervised way. Then the ROI guided refinement is conducted on low-level features with a) the ROI guided dropblock which erases the most discriminative regions selected from small-scaled ROIs, and b) the ROI guided zoom-in which locates the major regions merged from all ROIs.

Third, the refined features are sent into the refined-stage to distill more discriminative information. Both stages set individual classifiers for each pyramid level, and the final classification result is averaged over the raw-stage predictions and the refined-stage predictions.

The Attention Pyramid consists of two types of attentions, Spatial Attention and Channel Attention. The following figure shows the data-flow.

attention

The Spatial Attention Pyramid is a set of feature maps of different resolutions generated from the feature pyramid. An ROI pyramid is then generated from the spatial activations using the RPN. At the training stage, an ROI is selected to be dropped, erasing the informative part and encouraging the network to find more discriminative regions. At the testing stage, this operation is skipped. The following figure shows the ROI guided refinement.

roi_dropblock
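The dropblock branch of the refinement can be sketched as follows (toy pure-Python version; the function name, the ROI format, and the random selection rule are illustrative assumptions):

```python
import random

def roi_guided_dropblock(feature, rois, training=True, seed=0):
    # feature: H x W grid (list of lists); rois: (x0, y0, x1, y1, score)
    # tuples. During training, erase one of the candidate ROIs to force
    # the network to look beyond its most discriminative region; at test
    # time the operation is skipped, as in AP-CNN.
    if not training or not rois:
        return feature
    rng = random.Random(seed)
    x0, y0, x1, y1, _ = rng.choice(rois)
    out = [row[:] for row in feature]  # copy so the input is untouched
    for y in range(y0, y1):
        for x in range(x0, x1):
            out[y][x] = 0.0
    return out
```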

PERFORMANCE

The following table shows the comparison between this work and prior work.

performance

TITLE: ResNeSt: Split-Attention Networks

AUTHOR: Hang Zhang, Chongruo Wu, Zhongyue Zhang, Yi Zhu, Zhi Zhang, Haibin Lin, Yue Sun, Tong He, Jonas Mueller, R. Manmatha, Mu Li, Alexander Smola

ASSOCIATION: Amazon, University of California, Davis

FROM: arXiv:2004.08955

CONTRIBUTION

  1. A simple architectural modification of the ResNet is explored, incorporating feature-map split attention within the individual network blocks.
  2. Models utilizing a ResNeSt backbone are able to achieve state of the art performance on several tasks, namely: image classification, object detection, instance segmentation and semantic segmentation.

METHOD

Split-Attention Block

In this work, a Split-Attention Block is explored. For implementation convenience, the radix-major version is the easier one to understand. The following figure gives an illustration of the Split-Attention Block.

Split-Attention Block

As shown in the radix-major implementation of the ResNeSt block, feature-map groups with the same radix index but different cardinality indices are physically adjacent. This layout can be easily accelerated, because the $1 \times 1$ convolutional layers can be unified into a single layer and the $3 \times 3$ convolutional layers can be implemented as a group convolution with the number of groups equal to $RK$.
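For a single cardinal group, the split-attention computation can be sketched in plain Python. This toy version assumes global pooling has already been applied and omits the two dense layers of the real block, so the softmax over the radix axis acts directly on the pooled statistics:

```python
import math

def split_attention(splits):
    # splits: R pooled feature vectors (one per radix split) of length C.
    R, C = len(splits), len(splits[0])
    out = []
    for c in range(C):
        logits = [s[c] for s in splits]          # per-split channel statistic
        m = max(logits)
        e = [math.exp(l - m) for l in logits]
        attn = [x / sum(e) for x in e]           # softmax over the radix axis
        out.append(sum(a * s[c] for a, s in zip(attn, splits)))
    return out
```

With $R = 1$ the softmax degenerates to a weight of 1, recovering the SE-style per-group special case mentioned below.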

Comparison with Prior Work

The following figure shows the comparison. The Split-Attention Block is shown in the cardinality-major view.

Comparison

SE-Net introduces squeeze-and-attention (called excitation in the original paper) to employ a global context to predict channel-wise attention factors. With radix = 1, the Split-Attention block applies a squeeze-and-attention operation to each cardinal group, while SE-Net operates on top of the entire block regardless of the groups. Previous models like SK-Net introduced feature attention between two network branches, but their operation is not optimized for training efficiency or for scaling to large neural networks. This work generalizes prior work on feature-map attention within the cardinal-group setting, and its implementation remains computationally efficient.

PERFORMANCE

Classification

classification

classification-stoa

Detection

detection

Segmentation

instance-segmentation

semantic-segmentation

TITLE: NBDT: Neural-Backed Decision Trees

AUTHOR: Alvin Wan, Lisa Dunlap, Daniel Ho, Jihan Yin, Scott Lee, Henry Jin, Suzanne Petryk, Sarah Adel Bargal, Joseph E. Gonzalez

ASSOCIATION: UC Berkeley, Boston University

FROM: arXiv:2004.00221

CONTRIBUTION

  1. A method is proposed for running any classification neural network as a decision tree by defining a set of embedded decision rules that can be constructed from the fully-connected layer. Induced hierarchies are designed that are easier for neural networks to learn.
  2. Tree supervision loss is proposed, which boosts neural network accuracy by 0.5% and produces high-accuracy NBDTs. NBDTs achieve accuracies comparable to neural networks on small, medium, and large-scale image classification datasets.
  3. Qualitative and quantitative evidence of semantic interpretations are illustrated.

METHOD

Steps for Converting CNN into a Decision Tree

  1. Build an induced hierarchy;
  2. Fine-tune the model with a tree supervision loss;
  3. For inference, featurize samples with the neural network backbone;
  4. Run the decision rules embedded in the fully-connected layer.

The following figure illustrates the main steps for converting a classification neural network into a decision tree:

Main Steps
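Steps 3 and 4 amount to a greedy traversal of the induced hierarchy; a sketch with a hypothetical 4-class tree (one-hot leaf vectors standing in for FC-layer rows, parent vectors averaged from children):

```python
# Hypothetical 4-class tree; leaf vectors play the role of FC-layer rows
# and parent vectors are the averages of their children's vectors.
TREE = {"root": ["A", "B"], "A": [0, 1], "B": [2, 3]}
VEC = {
    0: [1, 0, 0, 0], 1: [0, 1, 0, 0],
    2: [0, 0, 1, 0], 3: [0, 0, 0, 1],
    "A": [0.5, 0.5, 0, 0], "B": [0, 0, 0.5, 0.5],
}

def nbdt_infer(x, tree=TREE, vec=VEC):
    # Greedy ("hard") embedded decision rules: descend from the root,
    # always following the child whose representative vector has the
    # largest inner product with the featurized sample x.
    node = "root"
    while node in tree:
        node = max(tree[node],
                   key=lambda ch: sum(a * b for a, b in zip(x, vec[ch])))
    return node
```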

Building Induced Hierarchies

The following figure illustrates how to build induced hierarchies from the network’s final fully-connected layer. For the leaf nodes, the representative vectors are extracted from the weights of the FC layer. The parents’ representative vectors are computed by averaging their children’s.

Building Induced Hierarchies

In this work, the authors find a minimal subset of the WordNet hierarchy that includes all classes as leaves, pruning redundant leaves and single-child intermediate nodes. To leverage this source of labels, a hypothesis is generated for each intermediate node by finding the earliest ancestor of each subtree’s leaves.

Training with Tree Supervision Loss

A tree supervision loss is added to the final loss function to encourage the network to separate the representative vectors of each internal node. Two losses are proposed: the hard tree supervision loss and the soft tree supervision loss. The final loss is

$$
Loss=L_{original}+L_{hard/soft}
$$

where $L_{original}$ is the typical cross-entropy loss for classification, and $L_{hard/soft}$ stands for the hard or soft tree supervision loss.

The hard tree supervision loss is defined as

$$
L_{hard}=\frac{1}{N} \sum_{i=1}^{N} CrossEntropy( D(i)_{pred}, D(i)_{label} )
$$

where $N$ is the number of nodes in the tree, excluding leaves, $D(i)_{pred}$ is the predicted probability distribution at node $i$, and $D(i)_{label}$ is the label at node $i$. Note that nodes not included in the path from the label to the root have no defined losses.
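A toy illustration of the hard loss on a hypothetical 4-leaf hierarchy (node distributions are derived by summing leaf probabilities under each child, which is one plausible reading of the paper):

```python
import math

# Hypothetical 4-leaf hierarchy: root -> {A, B}, A -> {0, 1}, B -> {2, 3}.
TREE = {"root": ["A", "B"], "A": [0, 1], "B": [2, 3]}
LEAVES = {"root": {0, 1, 2, 3}, "A": {0, 1}, "B": {2, 3}}

def hard_tree_loss(leaf_probs, label):
    # Average the cross entropy of choosing the correct child over the
    # internal nodes on the root-to-label path; nodes off that path
    # contribute no loss, matching the definition above.
    loss, count, node = 0.0, 0, "root"
    while node in TREE:
        masses = [sum(leaf_probs[l] for l in LEAVES.get(ch, {ch}))
                  for ch in TREE[node]]
        probs = [m / sum(masses) for m in masses]
        target = next(i for i, ch in enumerate(TREE[node])
                      if label in LEAVES.get(ch, {ch}))
        loss -= math.log(probs[target])
        count += 1
        node = TREE[node][target]
    return loss / count
```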

The soft tree supervision loss is defined as

$$
L_{soft}=CrossEntropy(D_{pred}, D_{label})
$$

where $D_{pred}$ is the predicted distribution over leaves and $D_{label}$ is the target distribution.

The following figure gives an example of the hard and soft tree supervision losses.

Tree Supervision Loss

PERFORMANCE

On all CIFAR10, CIFAR100, TinyImageNet, and ImageNet datasets, NBDT outperforms competing decision-tree-based methods, even uninterpretable variants such as a decision forest, by up to 18%. On CIFAR10, CIFAR100, and TinyImageNet, NBDTs largely stay within 1% of neural network performance.

Performance

SOME THOUGHTS

  1. The performance seems promising. However, the ablation studies are confusing because they use different experiment settings with more than one variable.
  2. The method for constructing a reasonable hierarchy is not illustrated exhaustively. My best guess is that the authors force the tree to be a binary tree.
  3. Is it possible that the leaves have duplicated labels?

ncnn is a high-performance neural network inference framework optimized for the mobile platform.

I’ve been using ncnn for quite a while, and recently, after compiling the latest version, I was surprised that the network could not give the correct output. Besides, the program crashed randomly.

Digging into the source code, cropping seemed to be the reason. The cropping operation crops not only the 2D feature map but also the channel dimension when the input blob is a 3-dim tensor. So I modified _outc = ref_dims == 3 ? ref_channels : channels; to _outc = channels;. I’m not sure whether there is another way to avoid this operation; the modification temporarily works around the problem.

void Crop::resolve_crop_roi(const Mat &bottom_blob, const Mat &reference_blob, int &_woffset, int &_hoffset, int &_coffset, int &_outw, int &_outh, int &_outc) const
{
    int w = bottom_blob.w;
    int h = bottom_blob.h;
    int channels = bottom_blob.c;
    int dims = bottom_blob.dims;

    int ref_w = reference_blob.w;
    int ref_h = reference_blob.h;
    int ref_channels = reference_blob.c;
    int ref_dims = reference_blob.dims;

    if (dims == 1)
    {
        _woffset = woffset;
        _outw = ref_w;
    }
    if (dims == 2)
    {
        _woffset = woffset;
        _hoffset = hoffset;
        _outw = ref_w;
        _outh = ref_h;
    }
    if (dims == 3)
    {
        _woffset = woffset;
        _hoffset = hoffset;
        _coffset = coffset;
        _outw = ref_w;
        _outh = ref_h;
        // _outc = ref_dims == 3 ? ref_channels : channels;
        _outc = channels; // keep the full channel dim instead of cropping it
    }
}

The following image shows the result of a foreground segmentation network before and after the modification.

Output Comparison

In order to deploy an MXNet-based vision engine in projects developed in C++, we need to compile the MXNet CPP API. Though how to compile it is well illustrated in Build from Source and Build the C++ package, I still confronted some difficulties. This blog records some tips for compiling the MXNet CPP API.

  1. Modify Source Code

    By following the instructions, I could easily compile and get libmxnet. However, when compiling cpp-package, the op.h file could not be generated correctly. In issue #14116, Vigilans provided a solution.

    Here: https://github.com/apache/incubator-mxnet/blob/master/include/mxnet/tuple.h#L744

    namespace dmlc {
    /*! \brief description for optional TShape */
    DMLC_DECLARE_TYPE_NAME(optional<mxnet::TShape>, "Shape or None");
    DMLC_DECLARE_TYPE_NAME(optional<mxnet::Tuple<int>>, "Shape or None");
    // avoid low version of MSVC
    #if !defined(_MSC_VER) // <----------- Here !
    template<typename T>
    struct type_name_helper<mxnet::Tuple<T> > {
      static inline std::string value() {
        return "tuple of <" + type_name<T>() + ">";
      }
    };
    #endif
    } // namespace dmlc

    So the specialization of mxnet::Tuple was disabled for Visual Studio in the first place!
    I removed the #if block and recompiled, and then everything worked fine.

  2. Set the Environment Variables

    In my own case, I only needed to set OpenBLAS_HOME and OpenCV_DIR. Both of them can be set with the set command or with -D in the cmake configuration.

  3. Use CMake to generate VS solution

    cmake -G "Visual Studio 14 2015 Win64" -DUSE_CUDA=0 -DUSE_CUDNN=0 -DUSE_NVRTC=0 -DUSE_OPENCV=1 -DUSE_OPENMP=1 -DUSE_BLAS=open -DUSE_LAPACK=0 -DUSE_DIST_KVSTORE=0 -DUSE_CPP_PACKAGE=1 -DCMAKE_INSTALL_PREFIX=install ..

    The above command can be used to generate a solution without GPU support. By modifying the -DUSE_CUDA and -DUSE_CUDNN options, we can generate a solution with GPU support.

  4. Generate op.h

    After generating libmxnet, we should run python OpWrapperGenerator.py libmxnet.dll to generate op.h. Note that libmxnet.dll, libopenblas.dll and libopencv_world.dll should be placed together with OpWrapperGenerator.py.

  5. No mxnet_static.lib

    The cpp example project failed to link to mxnet_static.lib, which was actually named libmxnet.lib. I renamed the static library; I believe the project settings could instead be fixed to cope with this problem.