TITLE: SuperCNN: A Superpixelwise Convolutional Neural Network for Salient Object Detection

AUTHER: Shengfeng He, Rynson W.H. Lau, Wenxi Liu, Zhe Huang, Qingxiong Yang Reid, Ian



  1. A novel superpixel-wise convolutional neural network approach is proposed.
  2. Two kinds of sequence code are designed as input feature to CNN.


  1. Superpixels are extracted via some methods such as oversegmentation.
  2. Extract Color Uniqueness Sequences (CU) for each superpixels to describe the color contrast between regions.
  3. Extract Color Distribution Sequences (CD) for each superpixels to measure the color compactness of colors.
  4. The two sequences are fed into a CNN to generate two saliency maps.
  5. A regressor is used to merge the two predicted saliency maps


Color Uniqueness Sequence is used to describe the color contrast of a Region. Given an image $I$ and the superpixels or regions \(R=\lbrace r_{1},…,r_{x},…,r_{N} \rbrace\), each region \(r_{x}\) contains a color uniqueness sequence $Q_{x}^{C} = \lbrace q_{1}^{c},…,q_{j}^{c},…,q_{N}^{c} \rbrace$. Each element, \(q_{x}^{c}\) is defined as

where \(t(r_{j})\) counts the total number of pixels in region \(r_{j}\). \(\vert C(r_{x})-C(r_{j}) \vert\) is a 3D vector storing the absolute differences of each color channel. \(P(r_{x}\) is the mean position of region \(r_{j}\) and \(w(P(r_{x}),P(r_{j}))\) is defined as

The sequence \(Q_{x}^{C}\) is sorted by the spatial distance to region \(r_{x}\).

Color Distribution Sequence is a sequence \(Q_{x}^{D} = \lbrace q_{1}^{d},…,q_{j}^{d},…,q_{N}^{d} \rbrace\) with the element \(q_{j}^{d}\) defined as:


the sequence is also sorted by the spatial distance.

Network Structure is briefly illustrated as below:

Saliency Inference is first to get the \(N\) predicted saliency scores of the \(N\) regions. Because of the two kinds of sequences, two sets of scores \(S_{1} and S_{2}\) are predicted. The final saliency map can be obtained by:


  1. It is fast when infering.
  2. Large context are encoded in the sequences.


  1. The CNN is of a very light-weight structure. Deeper network may provide better performance.
  2. As the sequences are used to describe contrast information, which may lead to failure with the foreground and background having similar colors.