READING NOTE: Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

TITLE: Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

AUTHOR: Kendall, Alex and Badrinarayanan, Vijay and Cipolla, Roberto

FROM: arXiv:1511.02680

CONTRIBUTIONS

  1. Extending deep convolutional encoder-decoder architectures to Bayesian convolutional neural networks that can produce probabilistic outputs.
  2. Bayesian SegNet outputs a measure of model uncertainty, which can be used to assess segmentation confidence.

METHOD

The first half of the network is a conventional convolutional encoder (the convolutional layers of VGG-16 in this work). The second half mirrors the first, applying upsampling layers to restore the output to the spatial resolution of the input. The network is trained end-to-end. The probabilistic output is obtained from Monte Carlo samples of the model, drawn by keeping dropout active at test time.
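
Below is a minimal PyTorch sketch of this encoder-decoder-with-dropout idea. It is illustrative only, not the authors' implementation: Bayesian SegNet uses the 13 convolutional layers of VGG-16, unpooling with saved max-pooling indices, and dropout (p = 0.5) on the central encoder and decoder blocks, all of which are simplified here.

    import torch
    import torch.nn as nn

    class TinyBayesianSegNet(nn.Module):
        """Toy stand-in for Bayesian SegNet: conv encoder, mirrored decoder,
        dropout on the deepest blocks so the model can be sampled at test time."""
        def __init__(self, num_classes):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Dropout2d(p=0.5),   # dropout on the deepest encoder block
            )
            self.decoder = nn.Sequential(
                nn.Dropout2d(p=0.5),   # ...and on the matching decoder block
                nn.Upsample(scale_factor=2),
                nn.Conv2d(128, 64, 3, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
                nn.Upsample(scale_factor=2),
                nn.Conv2d(64, num_classes, 3, padding=1),  # per-pixel class scores
            )

        def forward(self, x):
            return self.decoder(self.encoder(x))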

SOME DETAILS

  1. For each pixel, a softmax classifier predicts the class label.
  2. At test time, multiple stochastic forward passes are run to draw Monte Carlo samples. The mean of the softmax outputs gives the predicted class label, and their variance gives the model uncertainty (see the sketch after this list).
  3. Model uncertainty is high in three situations: 1) at class boundaries, 2) for objects that are difficult to identify because of occlusion or distance, and 3) for visually ambiguous classes, such as dogs versus cats or chairs versus tables.
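
A minimal sketch of the test-time sampling, assuming the TinyBayesianSegNet defined above (any network containing dropout layers works the same way). The uncertainty statistic here, the per-pixel variance of the softmax samples averaged over classes, is one reasonable choice rather than the paper's exact formulation.

    import torch
    import torch.nn.functional as F

    def mc_dropout_predict(model, image, num_samples=10):
        """Monte Carlo dropout inference: average several stochastic passes."""
        model.eval()                       # freeze batch-norm statistics...
        for m in model.modules():          # ...but keep the dropout layers sampling
            if isinstance(m, torch.nn.Dropout2d):
                m.train()
        with torch.no_grad():
            probs = torch.stack([F.softmax(model(image), dim=1)
                                 for _ in range(num_samples)])  # (S, N, C, H, W)
        mean = probs.mean(dim=0)                    # per-pixel class probabilities
        label = mean.argmax(dim=1)                  # predicted segmentation (N, H, W)
        uncertainty = probs.var(dim=0).mean(dim=1)  # per-pixel uncertainty (N, H, W)
        return label, uncertainty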

ADVANTAGES

  1. Monte Carlo dropout sampling outperforms the standard weight-averaging approximation after approximately 6 samples.
  2. The absence of fully connected layers makes the network easier to train.
  3. Because the Monte Carlo samples can be computed in parallel, the network can still run in real time (see the sketch after this list).
  4. Inference is dense, with no need to convolve in a sliding-window fashion, which contributes to its speed.
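
One way to parallelize the samples, sketched under the same assumptions as above: dropout draws an independent mask for every batch element, so replicating the image along the batch dimension turns the S sequential passes into a single batched forward pass (with batch norm in eval mode, the replication does not change its statistics).

    import torch
    import torch.nn.functional as F

    def mc_dropout_predict_parallel(model, image, num_samples=10):
        """Draw all Monte Carlo samples in one batched forward pass.
        `image` has shape (1, C, H, W)."""
        model.eval()
        for m in model.modules():
            if isinstance(m, torch.nn.Dropout2d):
                m.train()                            # independent mask per batch element
        batch = image.repeat(num_samples, 1, 1, 1)   # (S, C, H, W)
        with torch.no_grad():
            probs = F.softmax(model(batch), dim=1)   # one pass, S samples
        label = probs.mean(dim=0).argmax(dim=0)      # (H, W)
        uncertainty = probs.var(dim=0).mean(dim=0)   # (H, W)
        return label, uncertainty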

OTHERS

  1. Applying Bayesian weights (dropout) to the lower layers does not improve performance, because low-level features are consistent across the distribution of models.
  2. Higher-level features, such as shape and contextual relationships, are more effectively modeled with Bayesian weights.
  3. At training time, dropout samples from a large number of thinned networks with reduced width. At test time, standard dropout approximates the effect of averaging the predictions of all these thinned networks by using the appropriately scaled weights of the unthinned network (see the sketch after this list).
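
A self-contained toy contrast of the two test-time schemes, using a small classifier rather than SegNet. Note that PyTorch implements inverted dropout, scaling activations by 1/(1-p) during training, so calling eval() and simply disabling the masks is equivalent to the classic weight-averaging scheme.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    # A toy network with dropout, just to contrast the two schemes.
    net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(),
                        nn.Dropout(p=0.5), nn.Linear(32, 3))
    x = torch.randn(1, 10)

    # Standard dropout: deterministic "weight averaging" prediction.
    net.eval()
    with torch.no_grad():
        deterministic = F.softmax(net(x), dim=1)   # same output every call

    # Monte Carlo dropout: keep sampling thinned networks at test time.
    net.train()   # re-enables dropout (safe here: no batch norm in this net)
    with torch.no_grad():
        samples = torch.stack([F.softmax(net(x), dim=1) for _ in range(10)])
    mean, variance = samples.mean(dim=0), samples.var(dim=0)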

The authors provide an online demo and code.