Reading Note: DSSD: Deconvolutional Single Shot Detector

TITLE: DSSD: Deconvolutional Single Shot Detector

AUTHORS: Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg

CONTRIBUTIONS

A state-of-the-art classifier (Residual-101, i.e. ResNet-101) is combined with a fast detection framework (SSD).
Deconvolution layers are applied to introduce additional large-scale context in object detection and improve accuracy, especially for small objects.

METHOD

This work is a follow-up to SSD. Compared with the original SSD, DSSD (Deconvolutional Single Shot Detector) adds deconvolutional layers and a more sophisticated structure for category classification and bounding-box coordinate regression. As shown in the following figure, the network up to the blue feature maps is the same as the original SSD; after that point, the Deconvolution Module and Prediction Module are applied.
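
The shrink-then-grow shape described above can be sketched with the standard output-size formulas for convolution and transposed convolution. This is only a rough illustration: the starting size of 40, the number of levels, and the kernel/stride/padding values are assumptions chosen for the example, not the configuration used in the paper, and hourglass_sizes is a hypothetical helper.

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a standard convolution."""
    return (size + 2 * padding - kernel) // stride + 1


def deconv_out(size, kernel, stride, padding=0, output_padding=0):
    """Spatial output size of a transposed convolution (deconvolution)."""
    return (size - 1) * stride - 2 * padding + kernel + output_padding


def hourglass_sizes(start, levels):
    """Feature-map sizes on the way down (SSD-style) and back up (deconvolution).

    Assumed settings: 3x3 stride-2 convolutions with padding 1 going down,
    2x2 stride-2 deconvolutions going up.
    """
    down = [start]
    for _ in range(levels):
        down.append(conv_out(down[-1], kernel=3, stride=2, padding=1))
    up = []
    size = down[-1]
    for _ in range(levels):
        size = deconv_out(size, kernel=2, stride=2)
        up.append(size)
    return down, up


down, up = hourglass_sizes(40, 3)
print(down)  # [40, 20, 10, 5]
print(up)    # [10, 20, 40]
```

With these assumed settings the upsampled sizes line up with the earlier maps only when each size is even. An odd size such as 5 shrinks to 3 and comes back as 6, so a real network has to choose kernel and padding values so that the shapes match before the maps are combined.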

Recent works such as "Beyond Skip Connections: Top-Down Modulation for Object Detection" and "Feature Pyramid Networks for Object Detection" propose incorporating fine details into the detection framework using deconvolutional layers and skip connections. DSSD adopts the same idea in its Deconvolution Module, shown in the following figure.
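
The idea of upsampling a deep feature map and merging it with a shallower one can be sketched in plain Python for a single channel. This is a simplified illustration under stated assumptions, not the module from the paper: the function names are hypothetical, the 2x2 stride-2 kernel is an assumed choice, the convolution and normalization layers a real module would place on each branch are left out, and the element-wise product followed by ReLU is one possible way to combine the two maps (an element-wise sum is another).

```python
def deconv2x(feature, kernel):
    """Single-channel transposed convolution with a 2x2 kernel and stride 2.

    Each input value is scaled by the kernel and written into its own
    non-overlapping 2x2 block, so an H x W map becomes 2H x 2W.
    """
    h, w = len(feature), len(feature[0])
    out = [[0.0] * (2 * w) for _ in range(2 * h)]
    for i in range(h):
        for j in range(w):
            for di in range(2):
                for dj in range(2):
                    out[2 * i + di][2 * j + dj] += feature[i][j] * kernel[di][dj]
    return out


def fuse(deep, shallow, kernel):
    """Upsample the deep map and merge it with the shallow (skip) map.

    Assumption: the merge is an element-wise product followed by ReLU.
    """
    up = deconv2x(deep, kernel)
    if len(up) != len(shallow) or len(up[0]) != len(shallow[0]):
        raise ValueError("upsampled map and skip map must have the same shape")
    return [[max(0.0, u * s) for u, s in zip(up_row, skip_row)]
            for up_row, skip_row in zip(up, shallow)]


deep = [[1.0, 2.0]]                     # coarse 1x2 map from a deeper layer
shallow = [[1.0, 0.0, -1.0, 2.0],       # finer 2x4 map from an earlier layer
           [3.0, 1.0, 1.0, 0.5]]
kernel = [[1.0, 1.0], [1.0, 1.0]]       # stands in for learned weights
print(fuse(deep, shallow, kernel))
# [[1.0, 0.0, 0.0, 4.0], [3.0, 1.0, 2.0, 1.0]]
```

The fused map keeps the resolution of the shallow layer while every position is modulated by the coarser, more semantic value above it, which is the intuition behind using this kind of structure for small objects.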

Several different structures for the Prediction Module are proposed. These structures borrow the residual-block idea from ResNet, as illustrated in the following figure.
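
A residual-style prediction step can be sketched at a single spatial location, where a 1x1 convolution reduces to a matrix-vector product. This is a minimal sketch, not one of the exact variants from the paper: the function names, the single-layer residual branch, and the toy weights are all assumptions made for illustration.

```python
def matvec(weights, x):
    """Apply a 1x1 convolution at one location (a matrix-vector product)."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def relu(x):
    return [max(0.0, v) for v in x]


def predict_location(x, w_branch, w_cls, w_box):
    """Residual-style prediction at one feature-map location.

    x        : feature vector at this location (C channels)
    w_branch : C x C weights of the residual branch (assumed one layer)
    w_cls    : weights mapping C channels to class scores
    w_box    : weights mapping C channels to 4 box offsets
    """
    branch = relu(matvec(w_branch, x))
    refined = [a + b for a, b in zip(x, branch)]   # identity skip + branch
    return matvec(w_cls, refined), matvec(w_box, refined)


x = [1.0, -2.0]
w_branch = [[1.0, 0.0], [0.0, 1.0]]
w_cls = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                 # 3 toy classes
w_box = [[0.5, 0.0], [0.0, 0.5], [1.0, 0.0], [0.0, 1.0]]     # 4 box offsets
scores, offsets = predict_location(x, w_branch, w_cls, w_box)
print(scores)   # [2.0, -2.0, 0.0]
print(offsets)  # [1.0, -1.0, 2.0, -2.0]
```

Compared with predicting directly from x, the extra residual branch gives the classification and regression heads a refined feature to work from, at the cost of the additional matrix product per location.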

SOME IDEAS

Using ResNet-101 and a more sophisticated prediction structure helps improve performance, but the computational cost is high.
The idea of using deconvolutional layers to enlarge the feature maps and skip connections to bring in fine-detail features is becoming popular.