0%

Reading Note: CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

TITLE: CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes

AUTHOR: Yuhong Li, Xiaofan Zhang, Deming Chen

ASSOCIATION: University of Illinois at Urbana-Champaign, Beijing University of Posts and Telecommunications

FROM: arXiv:1802.10062

CONTRIBUTION

  1. a pure fully convolutional network (CSRNet) is proposed for understanding the highly congested scenes.
  2. CSRNet achieves the state-of-the-art performance on four datasets (ShanghaiTech dataset, the UCF CC 50 dataset, the WorldEXPO’10 dataset, and the UCSD dataset).

METHOD

CSRNet Architecture

The proposed CSRNet is composed of two major components:

  1. a convolutional neural network (CNN) is utilized as the front-end for 2D feature extraction.
  2. a dilated CNN follows the front-end for the back-end, which uses dilated kernels to deliver larger reception fields and to replace pooling operations.

In this work, the front-end CNN is same as the first ten layers of VGG-16 with three pooling layers, considering the tradeoff between acuracy and the resource overhead. The back-end CNN is a series of dilated convolutional layers and the last layer is a $ 1 \times 1 $ convolutional layer producing density map. The network architecture is shown in the following figure

Architecture

Training

The ground truth is generated using geometry-adaptive kernels to tackle the highly congested scenes. By blurring each head annotation using a Gaussian kernel (which is normalized to 1), the ground truth is generated considering the spatial distribution of all images from each dataset.

9 patches are cropped from each image at different locations with 1/4 size of the original image. The first four patches contain four quarters of the image without overlapping while the other five patches are randomly cropped from the input image. After that, double the training set by mirroring the patches.

PERFORMANCE

Architecture Comparison

Performance Shanghai

Performance UCF

Performance WorldExpo

Performance UCSD

SOME THOUGHTs

  1. How about using more advanced CNNs?