
I’ve been a great fan of ancient history and warfare since my childhood. Perhaps the first spark came from the Age of Empires computer game series. I didn’t know of Joan of Arc until I played her campaign in Age of Empires II when I was 11 or 12 years old. The story in the game was so captivating that I searched the Internet for who Joan of Arc was and what she did. I was touched by her patriotism and sacrifice. I even became interested in French history and wanted to study French at university, though I finally chose EE as my major and became an engineer in AI.

I played Age of Empires II HD for a while this weekend because I found it on sale on Steam. It brought me back to my childhood and recalled memories of playing this game with my friends. We had fun playing it, reading the stories of its heroes, and quarrelling about who was the greatest one in history. It is a classic computer game.

I seem to be even more tired after Spring Festival.

A lot of unsolved problems have piled up at work: either the code has bugs, or the samples have issues, or the model has problems. In short, every day is a rush, patching one thing here and another there. I don’t even have time to read papers properly, and once again I feel like I’m falling behind the trends.

Besides the day job, another thing that wears me out is house hunting. The lottery for the owner-occupied housing program is coming up soon; I really hope I win a spot. If I do, I won’t hesitate: I’ll buy one directly and settle down in Shunyi. The other thing is looking for a place to rent. Luckily I was quick and snapped up a decent one-bedroom apartment, though cleaning, packing, and moving will be a chore. Starting next week I’ll move like an ant: carry some things from my current place to the office every morning, haul them to the new apartment after work, and drag the empty boxes back to where I live now. It sounds exhausting, but there’s no way around it. I hope that after moving in I’ll still have the energy to decorate the new place; having spent so much money on rent, I should at least make myself comfortable.

Let’s keep pushing in the new year!

TITLE: DSSD: Deconvolutional Single Shot Detector

AUTHOR: Cheng-Yang Fu, Wei Liu, Ananth Ranga, Ambrish Tyagi, Alexander C. Berg

FROM: arXiv:1701.06659

CONTRIBUTIONS

  1. A combination of a state-of-the-art classifier (Residual-101) with a fast detection framework (SSD) is proposed.
  2. Deconvolution layers are applied to introduce additional large-scale context in object detection and improve accuracy, especially for small objects.

METHOD

This is a follow-up work to SSD. Compared with the original SSD, DSSD (Deconvolutional Single Shot Detector) adds deconvolutional layers and a more sophisticated structure for category classification and bounding-box coordinate regression. As shown in the following figure, the part up to the blue feature maps is the same as the original SSD. Then the Deconvolution Module and the Prediction Module are applied.

Recent works such as Beyond Skip Connections: Top-Down Modulation for Object Detection and Feature Pyramid Networks for Object Detection propose to incorporate fine details into the detection framework using deconvolutional layers and skip connections. DSSD adopts this idea as well through its Deconvolution Module, shown in the following figure.
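The core of the Deconvolution Module is upsampling a deep, coarse feature map and combining it element-wise with a finer feature map from an earlier layer. A minimal numpy sketch of that combination (the module's internal conv/BN branches are omitted, nearest-neighbour upsampling stands in for a learned deconvolution, and the shapes are made up for illustration):

```python
import numpy as np

def upsample2x(feat):
    """Nearest-neighbour 2x upsampling, standing in for a learned
    deconvolution (transposed convolution) layer."""
    return feat.repeat(2, axis=0).repeat(2, axis=1)

def deconv_module(deep, shallow):
    """Combine an upsampled deep feature map with a same-sized shallow
    one by element-wise product, sketching the skip-connection fusion
    in DSSD's Deconvolution Module."""
    up = upsample2x(deep)
    assert up.shape == shallow.shape
    return up * shallow

deep = np.random.rand(4, 4, 256)     # coarse feature map from a deep layer
shallow = np.random.rand(8, 8, 256)  # finer feature map from an earlier layer
out = deconv_module(deep, shallow)
print(out.shape)  # (8, 8, 256)
```

The fused map keeps the finer spatial resolution, which is what helps small-object detection.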

Several different structures for the Prediction Module are proposed. These structures borrow the residual idea from ResNet, as illustrated in the following figure.
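The residual idea amounts to running the feature map through a small transform branch and adding the result back to the input before predicting scores and offsets. A minimal sketch, with 1x1 convolutions written as per-pixel matrix multiplies and random stand-in weights (the actual branch depths and widths follow the paper's figure, not this code):

```python
import numpy as np

def conv1x1(x, w):
    """A 1x1 convolution is just a per-pixel matmul over channels."""
    return x @ w

def prediction_module(feat, w1, w2):
    """Residual-style prediction head: a conv+ReLU+conv branch whose
    output is added element-wise back to the input feature map."""
    branch = np.maximum(conv1x1(feat, w1), 0.0)  # conv + ReLU
    branch = conv1x1(branch, w2)                 # conv
    return feat + branch                         # skip connection (sum)

feat = np.random.rand(8, 8, 64)
w1 = np.random.randn(64, 64) * 0.1
w2 = np.random.randn(64, 64) * 0.1
out = prediction_module(feat, w1, w2)
print(out.shape)  # (8, 8, 64)
```

Class scores and box regressions would then be predicted from `out` rather than from the raw feature map, as in the original SSD.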

SOME IDEAS

  1. Using ResNet-101 and a more sophisticated prediction structure helps improve performance, but the computational cost is high.
  2. The idea of using deconvolutional layers to enlarge feature maps and skip connections to bring in detailed features is becoming popular.

After rewatching the classic movies Master and Commander: The Far Side of the World and A Beautiful Mind, I was surprised to find that Russell Crowe and Paul Bettany appeared in both of them. They had very interesting chemistry in both movies and acted well. I must say I like these two actors very much because of their marvelous acting.

TITLE: A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction

AUTHOR: Yoonsik Kim, Insung Hwang, Nam Ik Cho

ASSOCIATION: Seoul National University

FROM: arXiv:1701.06190

CONTRIBUTIONS

  1. A new Inception-like convolutional network-in-network structure is proposed, which consists of convolution and rectified linear unit (ReLU) layers only. Pooling and subsampling layers, which reduce feature-map size, are excluded, because decimated features are not helpful at the reconstruction stage. Hence, one-to-one (pixel-wise) matching is possible in the inner network, along with intuitive analysis of feature-map correlations.
  2. The proposed architecture is applied to several pixel-wise labeling and restoration problems and is shown to provide comparable or better performance than state-of-the-art methods.

METHOD

The network structure is inspired by Inception. The comparison of the structure is illustrated in the following figure.

Pooling is removed from the proposed Inception module, and a larger kernel is added instead to widen the receptive field, which would otherwise shrink with the removal of pooling. The main motivation for this modification is to maintain a large receptive field while keeping the output resolution the same as the input.
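The modification can be sketched as an Inception-style set of parallel branches where the pooling path is replaced by a larger "same"-padded convolution, so every branch preserves the input resolution. A minimal single-channel numpy sketch with random stand-in kernels (branch counts and kernel sizes are illustrative, not the paper's exact configuration):

```python
import numpy as np

def conv_same(x, k):
    """'Same'-padded 2D convolution on a single-channel map, so the
    output keeps the input resolution (no pooling, no subsampling)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(kh):
        for j in range(kw):
            out += k[i, j] * xp[i:i + x.shape[0], j:j + x.shape[1]]
    return out

def nin_module(x):
    """Inception-like branches with the pooling path removed and a
    larger 5x5 kernel added to recover the receptive field."""
    b1 = conv_same(x, np.random.randn(1, 1))
    b3 = conv_same(x, np.random.randn(3, 3))
    b5 = conv_same(x, np.random.randn(5, 5))  # widened receptive field
    return np.stack([b1, b3, b5], axis=-1)    # concatenate along channels

x = np.random.rand(16, 16)
y = nin_module(x)
print(y.shape)  # (16, 16, 3)
```

Because every branch is "same"-padded, the module can be stacked arbitrarily deep without losing the pixel-wise correspondence needed for reconstruction.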

SOME IDEAS

Since the network removes the operations that reduce the resolution of the feature maps, both forward and backward propagation can be very slow when the input is large.

TITLE: Pixel Objectness

AUTHOR: Suyog Dutt Jain, Bo Xiong, Kristen Grauman

ASSOCIATION: The University of Texas at Austin

FROM: arXiv:1701.05349

CONTRIBUTIONS

An end-to-end learning framework for foreground object segmentation is proposed. Given a single novel image, a pixel-level mask is produced for all “object-like” regions even for object categories never seen during training.

METHOD

Problem Formulation

Given an RGB image of size $m \times n \times c$ as input, the problem is formulated as densely labeling each pixel in the image as either “object” or “background”. The output is a binary map of size $m \times n$.
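Concretely, a network trained for this formulation would emit a two-channel score map (background vs. object) per pixel, which is reduced to the binary map by a per-pixel argmax. A small sketch with random scores standing in for network output:

```python
import numpy as np

def binary_mask(scores):
    """Turn per-pixel two-class scores (m x n x 2: background, object)
    into the m x n binary object/background map of the formulation."""
    return np.argmax(scores, axis=-1).astype(np.uint8)

m, n = 4, 5
scores = np.random.rand(m, n, 2)  # stand-in for the network's output
mask = binary_mask(scores)
print(mask.shape)  # (4, 5); entries are 0 (background) or 1 (object)
```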

Dataset

Two different datasets are used: 1) one with explicit boundary-level annotations and 2) one with implicit image-level object category annotations.

Training

The network is first trained on a large-scale object classification task, such as ImageNet 1000-category classification. This stage can be regarded as training on an implicitly labeled dataset. The resulting image representation has a strong notion of objectness built in, even though it never observes any segmentation annotations.

Then the network is trained on the PASCAL 2012 segmentation dataset, an explicitly labeled dataset. The 20 object labels are discarded and mapped instead to a single generic “object-like” (foreground) label for training.
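The label mapping can be sketched directly: collapse every PASCAL class index into one foreground label while leaving background and void pixels alone (the 255 void label is the VOC convention; whether the paper keeps it as an ignore region is an assumption here):

```python
import numpy as np

def to_objectness_labels(seg, void_label=255):
    """Collapse PASCAL VOC segmentation labels (0 = background,
    1..20 = object classes, 255 = void/ignore) into the generic
    two-class objectness target used for training."""
    out = np.where((seg >= 1) & (seg <= 20), 1, 0).astype(np.uint8)
    out[seg == void_label] = void_label  # keep ignore regions untouched
    return out

seg = np.array([[0, 7, 255],
                [15, 0, 20]], dtype=np.uint8)  # toy label map
print(to_objectness_labels(seg))
# [[  0   1 255]
#  [  1   0   1]]
```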