TITLE: Aggregated Residual Transformations for Deep Neural Networks

AUTHOR: Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

ASSOCIATION: UC San Diego, Facebook AI Research

FROM: arXiv:1611.05431

CONTRIBUTIONS

A simple, highly modularized network architecture for image classification, called ResNeXt, is proposed. The network is constructed by repeating a building block that aggregates a set of transformations sharing the same topology.

METHOD

The network is designed with two simple rules inspired by VGG/ResNets:

  1. if producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes).
  2. each time the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2. The second rule ensures that the computational complexity, in terms of FLOPs, is roughly the same for all blocks.
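
The second rule can be sanity-checked with quick arithmetic: the FLOPs of a k×k convolution scale as H·W·C_in·C_out·k², so halving the spatial size while doubling the width leaves the product unchanged. A minimal sketch in plain Python, using hypothetical stage sizes:

```python
# FLOPs of a k x k convolution layer: H * W * C_in * C_out * k * k
def conv_flops(h, w, c_in, c_out, k):
    return h * w * c_in * c_out * k * k

# Hypothetical stages: each time the spatial map is downsampled by 2,
# the block width is doubled (rule 2).
stages = [(56, 56, 64), (28, 28, 128), (14, 14, 256), (7, 7, 512)]

flops = [conv_flops(h, w, c, c, 3) for (h, w, c) in stages]
# Every stage costs the same number of FLOPs.
assert len(set(flops)) == 1
```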

The building block of ResNeXt is shown in the following figure:

Such a design lets the network aggregate more parallel transformations without adding many FLOPs; the number of transformations in the set is what the authors call cardinality.
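
In the block, a set of C same-topology transformations is applied to the input, their outputs are summed, and a residual shortcut is added: y = x + Σᵢ Tᵢ(x). A toy, fully-connected sketch in plain Python (the real block uses 1×1–3×3–1×1 convolution paths with nonlinearities; the purely linear paths here are illustrative assumptions):

```python
import random
random.seed(0)

DIM, CARDINALITY = 8, 4

# One transformation path: a toy linear map standing in for the paper's
# 1x1 -> 3x3 -> 1x1 bottleneck.
def make_path():
    return [[random.gauss(0.0, 0.1) for _ in range(DIM)] for _ in range(DIM)]

def apply_path(w, x):
    return [sum(wij * xj for wij, xj in zip(wi, x)) for wi in w]

paths = [make_path() for _ in range(CARDINALITY)]

def resnext_block(x):
    # Aggregate C transformations of identical topology, then add the
    # residual shortcut: y = x + sum_i T_i(x).
    agg = [0.0] * DIM
    for w in paths:
        agg = [a + o for a, o in zip(agg, apply_path(w, x))]
    return [xi + ai for xi, ai in zip(x, agg)]

x = [1.0] * DIM
y = resnext_block(x)

# For purely linear paths, the aggregation collapses into one transformation
# whose weights are the sum of the paths' weights.
merged = [[sum(w[i][j] for w in paths) for j in range(DIM)] for i in range(DIM)]
y2 = [xi + oi for xi, oi in zip(x, apply_path(merged, x))]
assert all(abs(a - b) < 1e-9 for a, b in zip(y, y2))
```

The last two lines show why the nonlinearities inside each path matter: without them the C paths would collapse into a single wide transformation.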

SOME IDEAS

The explanation of why such a design leads to better performance seems less persuasive.

Today I cleaned up my photo album, and reminisced a lot while looking at these photos. The city of Krakow, located in the south of Poland and a former capital of the country, aroused my desire to travel. Krakow is one of the most important cities in Europe because of its fantastic history and culture. I did not know this city until I submitted a paper to an international conference in 2013.

What impressed me most is St. Mary’s Church, sitting at the southeast of the Main Market Square in the center of the city. The church has two towers; the taller one has a spire and the other a dome. The asymmetry of the church illustrates another kind of beauty that I was not familiar with, because most historic buildings in China are symmetric. On every hour, a trumpet signal is played from the taller tower, and the plaintive tune breaks off in mid-stream, to commemorate a brave young man who was shot in the throat while sounding an alarm before a Mongol attack on the city in the 13th century. I think it is really touching and important for local people to remember who they are and where they are from.

Besides this church, the city is filled with more than a hundred churches and chapels, with a castle outside. I wish I could have another chance to visit Krakow to appreciate its beauty.

TITLE: Training Region-based Object Detectors with Online Hard Example Mining

AUTHOR: Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

ASSOCIATION: Carnegie Mellon University, Facebook AI Research

FROM: arXiv:1604.03540

CONTRIBUTIONS

A novel bootstrapping technique called online hard example mining (OHEM) for training state-of-the-art detection models based on deep ConvNets is proposed. The algorithm is a simple modification to SGD in which training examples are sampled according to a non-uniform, non-stationary distribution that depends on the current loss of each example under consideration.

METHOD

Online Hard Example Mining

The main idea of the OHEM is illustrated in the following figure:

This algorithm is designed for region-proposal-based object detectors. The top part of the figure, including the Convolutional Network and the green-arrow part of the RoI Network, is the same as Fast R-CNN. The key of the algorithm lies in the red-arrow part of the RoI Network.

For an input image at an SGD iteration, a conv feature map is first computed using the Convolutional Network. Then the RoI Network uses this feature map and all the input RoIs, instead of a sampled mini-batch, to do a forward pass. The loss computed for each RoI represents how well the current network performs on it. Hard examples are selected by sorting the input RoIs by loss and taking the B/N examples per image for which the current network performs worst.
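
The selection step can be sketched in a few lines of plain Python. This is a toy stand-in for the sampling module: the paper additionally applies NMS over the RoIs, ordered by loss, so that highly overlapping RoIs do not all count as hard, which is omitted here:

```python
def select_hard_examples(roi_losses, batch_size_b, num_images_n):
    """Return the indices of the B/N RoIs on which the network does worst."""
    num_hard = batch_size_b // num_images_n
    # Sort RoI indices by loss, highest (hardest) first.
    order = sorted(range(len(roi_losses)),
                   key=lambda i: roi_losses[i], reverse=True)
    return order[:num_hard]

losses = [0.1, 2.3, 0.05, 1.7, 0.9]   # toy per-RoI losses for one image
hard = select_hard_examples(losses, batch_size_b=2, num_images_n=1)
assert hard == [1, 3]                  # the two highest-loss RoIs
```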

Some Details

The authors’ implementation maintains two copies of the RoI network, one of which is read-only. This implies that the read-only RoI network allocates memory only for the forward pass over all RoIs (R), as opposed to the standard RoI network, which allocates memory for both forward and backward passes. For an SGD iteration, given the conv feature map, the read-only RoI network performs a forward pass and computes the loss for all input RoIs (green arrows). Then the hard RoI sampling module selects hard examples, which are input to the regular RoI network (red arrows). This network computes forward and backward passes only for the hard RoIs, accumulates the gradients, and passes them to the conv network. In practice, the authors use all RoIs from all N images as R, so the effective batch size for the read-only RoI network is $\vert R \vert$, while for the regular RoI network it is the standard B.

ADVANTAGE

  1. It removes the need for several heuristics and hyperparameters commonly used in region-based ConvNets.
  2. It yields consistent and significant boosts in mean average precision.
  3. Its effectiveness increases as the training set becomes larger and more difficult, as demonstrated by results on the MS COCO dataset.

SOME IDEAS

  1. How can data balancing be applied in single-shot methods such as SSD?

I’ve been a big fan of sailing boats since I was a little child. I do not know what it was in my childhood that made sailing boats so attractive to me. Maybe it is my fantasies about the romance of sailors, the fabulous view of the wild ocean, or the beauty of the boats themselves.

One of my favourite computer games is Uncharted Waters, also called Daikoukai Jidai, which sets its story between the 15th and the 17th centuries, when European explorers sailed the seas to open new routes to Asia after the Ottoman Empire blocked the land routes. This is the first video game related to sailing boats that I played. Then I had great fun playing Corsairs: Conquest at Sea and Sid Meier’s Pirates!. Even when I was playing Age of Empires and Sid Meier’s Civilization, I liked to build naval units to conquer my enemies if possible.

Movies with the element of sailing entertain me much as well. Master and Commander: The Far Side of the World is the best one. It tells the story of how Captain Jack Aubrey of the British HMS Surprise led his crew to defeat a fierce enemy, the French privateer Acheron, during the Napoleonic Wars. Besides the optimism during the war that inspires me much, the music in the movie also opened a door for me to classical music. Another film series that must be mentioned is Pirates of the Caribbean. This series combines adventure and comedy, and it made me laugh out loud.

Though I love sailing boats so much, I have not gained much knowledge about them or their era. Recently I found a very fruitful Chinese blog on this topic. It’s time for me to start reading!

TITLE: Understanding intermediate layers using linear classifier probes

AUTHOR: Guillaume Alain, Yoshua Bengio

ASSOCIATION: Université de Montréal

FROM: arXiv:1610.01644

CONTRIBUTIONS

The concept of the linear classifier probe is introduced to understand the roles of the intermediate layers of a neural network, and to measure how much information is gained at every layer (answer: technically, none). This concept can be very useful for understanding the dynamics of a deep neural network both during and after training.

Linear Classifier Probes

Probes

The probes are implemented in a very simple manner: a fully-connected layer followed by a softmax serves as a linear classifier. The classifier’s error takes NO part in back-propagation; it is only used to measure how well the features extracted at different depths of the network can solve the classification problem.

Probes on untrained model

Given an untrained network, probes are attached to see whether each layer gives useful features for a classification task. The data is generated from a Gaussian distribution, making the task very easy.

The probe at layer 0, which corresponds to the raw data, is able to classify perfectly, and the performance degrades under the random transformations applied by the intermediate layers. This phenomenon indicates that at the beginning of training, the usefulness of the layers decays with depth, reaching the point where the deepest layers are utterly useless. The authors give a very strong claim: garbage forwardprop, garbage backprop.
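
This setup can be mimicked with a toy experiment in plain Python. As a stand-in for the paper's trained softmax probe, a nearest-class-mean rule is used here (also a linear decision rule); the network is a stack of untrained random ReLU layers, and probe accuracy is measured at layer 0 and after the full stack. All sizes are made up:

```python
import random
random.seed(0)

DIM, N_PER_CLASS, DEPTH = 5, 200, 30

# Toy task: two Gaussian clusters, class 0 around -1, class 1 around +1.
def sample(mean):
    return [[random.gauss(mean, 1.0) for _ in range(DIM)]
            for _ in range(N_PER_CLASS)]

xs = sample(-1.0) + sample(+1.0)
ys = [0] * N_PER_CLASS + [1] * N_PER_CLASS

# An untrained "network": DEPTH random linear layers with ReLU
# (He-style scaling so activations neither explode nor vanish).
weights = [[[random.gauss(0.0, (2.0 / DIM) ** 0.5) for _ in range(DIM)]
            for _ in range(DIM)]
           for _ in range(DEPTH)]

def forward(x, depth):
    for w in weights[:depth]:
        x = [max(0.0, sum(wij * xj for wij, xj in zip(wi, x))) for wi in w]
    return x

# Linear probe stand-in: nearest class mean. The features themselves are
# never updated, matching the paper's no-backprop-through-probes rule.
def probe_accuracy(depth):
    feats = [forward(x, depth) for x in xs]
    means = []
    for c in (0, 1):
        fc = [f for f, y in zip(feats, ys) if y == c]
        means.append([sum(col) / len(fc) for col in zip(*fc)])
    correct = 0
    for f, y in zip(feats, ys):
        d = [sum((fi - mi) ** 2 for fi, mi in zip(f, m)) for m in means]
        correct += int(d[y] == min(d))
    return correct / len(xs)

acc_raw = probe_accuracy(0)       # probe on raw data: close to perfect
acc_deep = probe_accuracy(DEPTH)  # after 30 random layers: typically much worse
assert acc_raw > 0.9 and acc_deep <= acc_raw
```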

Auxiliary loss branches and skip connections

The experiment in the paper suggests that auxiliary loss branches help make an otherwise untrainable model trainable.

In another experiment, when a bridge (a skip connection) is added between layer 0 and layer 64, the model completely ignores layers 1-63. The following figure illustrates the phenomenon.

Some Ideas

  1. The probes can be used to visualize the role of each layer.
  2. Is ResNet really necessary? Why does it work, if a skip connection makes the model ignore the layers it covers?
  3. Training stage by stage could be very useful when a very deep network is used.

Sometimes there is no point in struggling, especially when headquarters asks you for something.

My sense of achievement comes from finishing a piece of work, so I look forward to new tasks. But if a new task arrives before I have had time to savor that sense of achievement, I will most likely hate it at first, then fill up with fighting spirit just to earn the sense of achievement again.

Before the heating came on, it rained again and it was cold. I curled up in bed reading Keigo Higashino’s I Killed Him, with the program You Have Me by Your Side playing on Beijing Traffic Radio.

Nothing new lately; I should push myself to read papers.

Never be too sure of yourself about anything, or you will get slapped in the face. Colleagues at headquarters had long suspected that the SVM prediction was eating a lot of CPU, while we believed a 2000×100 matrix-vector multiplication should be a lightweight operation. After investigation, it turned out that this computation was indeed the problem.

Always run make clean before make. After an hour of debugging all over the code, the problem was finally solved by make clean, and all the code went back to how it was. Wasted effort!

Remember, Maya: what resonates with us at twenty will not necessarily resonate at forty, and vice versa. This is true of books, and it is true of life. — The Storied Life of A.J. Fikry

I had always thought of myself as a slow starter, but only recently did I realize just how slow: so slow that even my maturity lags several years behind my actual age, and every mental-age test I take comes out three or four years younger than I am. Thinking about it, this may really be true. Take Harry Potter, a series that was hugely popular when I was in junior high; at the time I preferred The Lord of the Rings instead. The reason I gave myself was that The Lord of the Rings had a grander background than Harry Potter, but looking back now, I could not possibly have understood the grandeur of Middle-earth or the ins and outs of that world then. The real reason must have been “good guys beating bad guys.” Harry Potter is also about good guys beating bad guys, but it is mixed with the characters’ growth, complicated personalities, and conflicts between people, which never read as satisfying as a rousing, clear-cut battle against the enemy. Only in college did I suddenly become fascinated by the Harry Potter story; probably only then did I resonate with the characters and could put myself in their shoes. Correspondingly, my love for The Lord of the Rings turned into an interest in exploring the origins of Middle-earth. It is the same beyond reading: even in romance I still seem to be at the stage of a child, and I even envy some of the scenes in variety shows.

Slow starter or not, I have begun to feel my age. From time to time a kind of restlessness comes over me; thinking it over, it is probably the fear of growing old. I want to stay young and energetic forever, with unlimited time. There are too many things I want to do: cooking, painting, practicing calligraphy, reading papers, reading books, building fun little projects, making models, traveling… Sometimes I simply bite off more than I can chew; it is impossible to do all of these. Sometimes I keep postponing things, say waiting until I have my own house to start this or that, because moving between rented places is such a hassle. Sometimes it is simply because I am poor, and plane tickets are expensive. I really envy the elves in The Lord of the Rings: immortal, so that when they tire of one thing they can pick up another, and presumably free of the pain of “delighting in what one meets, content for the moment, never realizing that old age is drawing near.” Or like the sunny girls in Sunny: even though “what once delighted them has, in the blink of an eye, become a relic of the past,” they could still have one last fling; the pity is that I want to stay wild forever. In my sophomore and junior years I sometimes thought, “How nice it would be if I could stay twenty-one or twenty-two forever,” probably out of contentment with a carefree life; now I truly, solidly miss it.

If I am still interested in teenage things now, I probably ought to remind myself that it may be too childish and that I should learn to be mature. In Stefanie Sun’s song Young Without Limits there is a line, “At twenty-five, I learn the moves an adult is supposed to have.” I used to think twenty-five was still young; now I am well past twenty-five and still have none of those adult moves. What a contradiction: wanting a mature mind and a young heart at the same time.

Recently I have been spending a lot of time with Ubuntu because I am trying some interesting computer vision projects developed in a Linux environment. In addition, Steam supports playing Sid Meier’s Civilization V via its SteamOS platform, which means that I do not have to log into Windows to play my favourite computer game. To use Ubuntu well, I have learned much from the websites listed below.

  1. The Linux Command Line

    Designed for the new command line user, this 540-page volume covers the same material as LinuxCommand.org but in much greater detail. In addition to the basics of command line use and shell scripting, The Linux Command Line includes chapters on many common programs used on the command line, as well as more advanced topics.

    A very useful Chinese version can be found at 快乐的 Linux 命令行.

  2. LinuxTOY

    This is a Chinese website that provides visitors with Linux-related information. The website was started back in 2006. There are many useful and interesting posts, including news, software, games, and tutorials.

  3. Ubuntu Official Website

    Ubuntu is an open source project that develops and maintains a cross-platform, open-source operating system based on Debian. It includes Unity, a consistent user interface for the smartphone, the tablet and the PC. Upgrades are released every six months and support is guaranteed by Canonical for up to five years. Canonical also provides commercial support for Ubuntu deployments across the desktop, the server and the cloud.

  4. Pro Git

    Git is a free and open source distributed version control system designed to handle everything from small to very large projects with speed and efficiency. Though Git is used not only on Linux but also on Windows and Mac, it is really much more convenient and better supported on Linux.

  5. A Great Vim Cheat Sheet

    I’ve compiled a list of essential vim commands that I use every day. I then give a few instructions on how to make vim as great as it should be, because it’s painful without configuration.

  6. climate

    The ultimate command line tool for Linux! climate provides a huge number of command line options for developers to automate their Linux system.

  7. Mastering Bash and Terminal

    If there is one tool that every developer uses regardless of language, platform, or framework it’s the terminal. If we are not compiling code, executing git commands, or scp-ing ssl certificates to some remote server, we are finding a new version of cowsay to entertain ourselves while we wait on one of the former. As much as we use the terminal it is important that we are efficient with it. Here are some ways I make my time in the terminal efficient and effective.