Reading Note: Understanding Convolution for Semantic Segmentation

TITLE: Understanding Convolution for Semantic Segmentation

AUTHOR: Panqu Wang, Pengfei Chen, Ye Yuan, Ding Liu, Zehua Huang, Xiaodi Hou, Garrison Cottrell

ASSOCIATION: UC San Diego, CMU, UIUC, TuSimple

CONTRIBUTIONS

A method called dense upsampling convolution (DUC) is proposed. Instead of trying to recover the full-resolution label map in one step, it learns an array of upscaling filters that upscale the downsized feature maps into a dense output map of the desired size.
A simple hybrid dilated convolution (HDC) framework is proposed. Instead of using the same dilation rate for all layers at the same spatial resolution, it uses a range of dilation rates, with the layers concatenated serially in the same way as the “blocks” in ResNet-101.

METHOD

DUC is illustrated in the following figure.

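The key idea of DUC is to divide the whole label map into d² equal subparts, each with the same height and width as the incoming feature map. Concretely, if the input image is H × W and the backbone downsamples it by a factor d, the feature map is h × w × c with h = H/d and w = W/d. DUC applies a convolution that outputs an h × w × (d² × L) map, where L is the number of classes, and reshapes it into an H × W × L label map. Each feature map in the dark blue part of the figure therefore predicts one subpart of the whole output.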
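The reshape step can be sketched in plain Python as below. This is a minimal illustration under stated assumptions: the learned convolution that produces the d² × L channels is omitted, and the channel layout (channel `cls * d * d + a * d + b` holds class `cls` at row offset `a` and column offset `b` inside each d × d cell, i.e. pixel-shuffle style interleaving) is a hypothetical choice for this sketch. `duc_reshape` and `predict_labels` are illustrative names, not the authors' code.

```python
def duc_reshape(score, num_classes, d):
    """Rearrange DUC scores of shape (num_classes * d * d, h, w), given as
    nested lists, into a full-resolution map of shape (num_classes, h * d, w * d).

    Hypothetical channel layout: channel cls * d * d + a * d + b holds the
    score of class cls for the output pixel at row offset a and column
    offset b inside every d x d cell.
    """
    if len(score) != num_classes * d * d:
        raise ValueError("expected d * d channels per class")
    h, w = len(score[0]), len(score[0][0])
    out = [[[0.0] * (w * d) for _ in range(h * d)] for _ in range(num_classes)]
    for cls in range(num_classes):
        for y in range(h * d):
            for x in range(w * d):
                # (y // d, x // d): low-resolution cell; (y % d, x % d): offset inside it.
                channel = cls * d * d + (y % d) * d + (x % d)
                out[cls][y][x] = score[channel][y // d][x // d]
    return out


def predict_labels(score, num_classes, d):
    """Per-pixel argmax over classes of the reshaped DUC output."""
    full = duc_reshape(score, num_classes, d)
    height, width = len(full[0]), len(full[0][0])
    return [
        [max(range(num_classes), key=lambda cls: full[cls][y][x]) for x in range(width)]
        for y in range(height)
    ]


# One class, d = 2, a 1 x 1 feature map: the four channels fill one 2 x 2 cell.
print(duc_reshape([[[1]], [[2]], [[3]], [[4]]], num_classes=1, d=2))
# prints [[[1, 2], [3, 4]]]
```

Under this layout, each low-resolution feature vector is responsible for the d × d block of output pixels directly above it.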

HDC is illustrated in the following figure.

Instead of using the same dilation rate for all layers after downsampling, HDC assigns a different dilation rate to each layer. This avoids the “gridding” problem of a constant rate, where the stacked layers only see a sparse, checkerboard-like subset of the input pixels and lose local information. In the figure, the pixels marked in blue contribute to the calculation of the center pixel (marked in red) through three 3 × 3 convolution layers with dilation rates r = 1, 2, 3, respectively.
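The effect of mixing rates can be checked by enumerating which input pixels can reach the center output pixel. The sketch below is a minimal plain-Python illustration, assuming stride-1 3 × 3 kernels whose weights are all non-zero; the function names are hypothetical, and the constant-rate stack (2, 2, 2) is included only as a comparison point.

```python
from itertools import product


def receptive_offsets_1d(rates, kernel_size=3):
    """1-D input offsets (relative to one output pixel) that can influence it
    after stacking stride-1 dilated convolutions with the given rates,
    assuming every kernel weight is non-zero."""
    half = kernel_size // 2
    offsets = {0}
    for r in rates:
        taps = [k * r for k in range(-half, half + 1)]  # dilated kernel taps
        offsets = {o + t for o in offsets for t in taps}
    return offsets


def receptive_pixels_2d(rates, kernel_size=3):
    """2-D contributing pixels: each square dilated kernel's taps form a
    Cartesian product, so the stacked result is the product of 1-D offsets."""
    offs = receptive_offsets_1d(rates, kernel_size)
    return set(product(offs, offs))


for rates in [(2, 2, 2), (1, 2, 3)]:
    pixels = receptive_pixels_2d(rates)
    span = max(x for x, _ in pixels) - min(x for x, _ in pixels) + 1
    print(rates, f"{len(pixels)} of {span * span} pixels in a {span}x{span} window")
# prints:
# (2, 2, 2) 49 of 169 pixels in a 13x13 window
# (1, 2, 3) 169 of 169 pixels in a 13x13 window
```

With r = 1, 2, 3 the three layers cover every pixel of their 13 × 13 receptive field, whereas a constant rate of 2 reaches only every other row and column of the same window.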