
TITLE: LCNN: Lookup-based Convolutional Neural Network

AUTHOR: Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

ASSOCIATION: University of Washington, Allen Institute for AI

FROM: arXiv:1611.06473

CONTRIBUTIONS

LCNN, a lookup-based convolutional neural network, is introduced. It encodes convolutions as a few lookups into a dictionary that is trained to cover the space of weights in CNNs.

METHOD

The main idea of the work is decoding the weights of a convolutional layer using a dictionary $D$ and two tensors, $I$ and $C$, as the following figure illustrates.

where $k$ is the size of the dictionary $D$ and $m$ is the number of input channels. The weight tensor can be constructed as a linear combination of $S$ words in the dictionary $D$ as follows:

$$ W_{[:,r,c]}=\sum_{t=1}^{S}C_{[t,r,c]}\cdot D_{[I_{[t,r,c]},:]} \quad \forall r,c $$

where $S$ is the number of components in the linear combination. The convolution can then be computed quickly using the shared dictionary: we convolve the input with all of the dictionary vectors, and then compute the output according to $I$ and $C$. Since the dictionary $D$ is shared among all weight filters in a layer, we can precompute the convolution between the input tensor $\textbf{X}$ and all the dictionary vectors. Given $\textbf{S}$, which is defined as:

$$ \textbf{S}_{[i,:,:]}=\textbf{X}*\textbf{D}_{[i,:]} \quad \forall \, 1\leq i \leq k $$

the convolution operation can be computed as

$$ \textbf{X} * \textbf{W} = \textbf{S} * \textbf{P} $$

where $\textbf{P}$ can be expressed by $I$ and $C$:

$$ P_{j,r,c} = \begin{cases}
C_{t,r,c} & \exists t:I_{t,r,c}=j \\
0 & \text{otherwise}
\end{cases} $$
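Putting the pieces together, the identity $\textbf{X} * \textbf{W} = \textbf{S} * \textbf{P}$ can be checked with a minimal NumPy sketch. All shapes and names below are illustrative assumptions, not the authors' code; repeated indices in $I$ are accumulated so the linear-combination identity stays exact.

```python
import numpy as np

def corr2d(x, w):
    """Valid multi-channel cross-correlation: x (cin,H,W), w (cin,s,s) -> (H-s+1, W-s+1)."""
    cin, H, Wd = x.shape
    _, s, _ = w.shape
    out = np.zeros((H - s + 1, Wd - s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[:, i:i + s, j:j + s] * w)
    return out

rng = np.random.default_rng(1)
k, m, s, S, H = 16, 8, 3, 3, 6           # dictionary size, channels, filter size, components, input size
D = rng.standard_normal((k, m))          # shared dictionary: k words of length m
I = rng.integers(0, k, size=(S, s, s))   # lookup indices into D
C = rng.standard_normal((S, s, s))       # mixing coefficients
X = rng.standard_normal((m, H, H))       # input tensor

# Dense weights, decoded as W[:, r, c] = sum_t C[t, r, c] * D[I[t, r, c], :].
W = np.einsum('trc,trcm->mrc', C, D[I])

# Sparse P: P[j, r, c] = C[t, r, c] where I[t, r, c] == j, zero elsewhere.
P = np.zeros((k, s, s))
for t in range(S):
    for r in range(s):
        for c in range(s):
            P[I[t, r, c], r, c] += C[t, r, c]  # '+=' also handles repeated indices

# Shared precomputation: S_maps[i] = X correlated with dictionary word i (a 1x1 conv).
S_maps = np.einsum('im,mhw->ihw', D, X)

direct = corr2d(X, W)        # X * W, the standard convolution
fast = corr2d(S_maps, P)     # S * P, the lookup-based path
assert np.allclose(direct, fast)
```

The dictionary convolutions (`S_maps`) are computed once per layer and reused by every filter, which is where the speedup comes from.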

The idea can be illustrated in the following figure:

Thus the dictionary and the lookup parameters can be trained jointly.

ADVANTAGES

  1. It speeds up inference.
  2. It enables few-shot learning: the shared dictionary in LCNN allows a neural network to learn from very few training examples on novel categories.
  3. LCNN needs fewer iterations to train.

DISADVANTAGES

  1. Accuracy suffers because the weights are only approximated by the dictionary-based reconstruction.

I watched a particularly depressing Japanese film, Harmful Insect (害虫), directed by Akihiko Shiota and starring Aoi Miyazaki. Aoi, who was sixteen at the time, won the Best Actress award at the 2001 Three Continents Festival in Nantes for her performance in this film.

The film follows Sachiko, a girl on the margins. At thirteen she falls into a relationship with Ogata, her former elementary-school homeroom teacher, while her mother pays her no attention. To escape the pointed gossip of her classmates, Sachiko skips school day after day, killing time wandering the streets, where she befriends a homeless man. Later she returns to school, and although life seems to improve, she is nearly raped by her mother's lover. All of this drives her away from school again, to the power plant where Ogata works, in search of her former love. Sachiko waits a long time at the small tea shop where the two agreed to meet, but Ogata never appears. Disheartened, she leaves with a young man who strikes up a conversation with her; in the parking lot she sees Ogata hurrying toward her, yet still gets into the stranger's car.

It is hard to imagine Aoi Miyazaki showing such astonishing acting ability at sixteen. One would like to believe that Aoi, a former child star, had a very happy childhood, even while appearing in such a suffocating film. The movie has very little dialogue, at times it almost feels like a silent film, yet through her precise performance Aoi perfectly conveys the inner world of a girl on the margins: despair, repression, rebellion, a longing to be loved...

Some of the contrasts in the film offer a sliver of hope while making the pain cut all the deeper.

Sachiko kicks a can around on the open road with the homeless man, and her laughter is full of the purity and carefreeness a girl her age should have. Yet all of this happens on an empty street: the other students are at school, while Sachiko is with a vagrant. The cheerful laughter and the silence of the deserted road set each other off, joy against desolation.

After returning to school, Sachiko plays the piano accompaniment for her class chorus at the culture festival and even wins the affection of a boy. When they share a gentle kiss, we believe Sachiko is starting to lower her guard and return to the life an ordinary junior-high girl should have; her eyes begin to soften. But when the rumor of the attempted rape spreads through the school and the boy confronts her, Sachiko is unexpectedly calm. She merely takes a classroom chair and knocks a row of desks into disarray, her eyes filled with endless indifference.

Sachiko's most radiant smile comes when she hands a Molotov cocktail to the mentally disturbed homeless man. It is hard to say whether the smile is a sign of a twisted mind, but it is enough to melt anyone's heart. In an instant it is swallowed by terror, and she keeps backing away until she retreats out of the frame.

The film is truly suffocating; I felt terrible for a whole day after watching it. I kept wondering what became of Sachiko, what Japanese society was like at that time, and why a junior-high girl could be driven into such a hopeless situation. I even wanted to find a book on Japanese history to look into these questions. Or perhaps to find another Aoi Miyazaki film, one that shows more of her smile, to cheer myself up.

TITLE: Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection

AUTHOR: Xianzhi Du, Mostafa El-Khamy, Jungwon Lee, Larry S. Davis

ASSOCIATION: University of Maryland, Samsung Electronics

FROM: arXiv:1610.03466

CONTRIBUTIONS

A deep neural network fusion architecture is proposed to address the pedestrian detection problem, called Fused Deep Neural Network (F-DNN).

METHOD

The proposed network architecture consists of a pedestrian candidate generator, a classification network, and a pixel-wise semantic segmentation network. The pipeline of the proposed network fusion architecture is shown in the following figure:

Pedestrian Candidate Generator is implemented with SSD. It provides a large pool of pedestrian candidates varying in scale and aspect ratio. The generated candidates should cover almost all ground-truth pedestrians, even though many false positives are introduced at the same time.

Classification Network consists of multiple binary classification deep neural networks which are trained on the pedestrian candidates from Pedestrian Candidate Generator.

Soft-rejection based DNN Fusion works as follows: Consider one pedestrian candidate and one classifier. If the classifier has high confidence about the candidate, we boost its original score from the candidate generator by multiplying with a confidence scaling factor greater than one. Otherwise, we decrease its score by a scaling factor less than one. To fuse all $M$ classifiers, the candidate’s original confidence score is multiplied with the product of the confidence scaling factors from all classifiers in the classification network.

$$ S_{FDNN} = S_{SSD} \times \prod_{m=1}^{M} a_{m} $$

where

$$ a_{m} = \max\left(p_{m} \times \frac{1}{a_{c}},\; b_{c}\right) $$

and $a_{c}$ and $b_{c}$ are chosen as 0.7 and 0.1 by cross validation.
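A minimal sketch of the fusion rule, assuming classifier probabilities $p_{m}$ in $[0, 1]$ and the constants above; the function and variable names are mine, not the paper's.

```python
import math

def fuse_score(ssd_score, classifier_probs, a_c=0.7, b_c=0.1):
    """Soft-rejection fusion: scale the SSD score by a factor per classifier.

    Each factor is max(p_m / a_c, b_c): confident classifiers (p_m > a_c)
    boost the score, unconfident ones suppress it, but never below b_c,
    so no single classifier can hard-reject a candidate.
    """
    factors = [max(p / a_c, b_c) for p in classifier_probs]
    return ssd_score * math.prod(factors)

# A confident ensemble boosts the candidate; an unconfident one suppresses it.
boosted = fuse_score(0.9, [0.9, 0.8])     # both factors > 1, score rises
suppressed = fuse_score(0.9, [0.05])      # factor floored at b_c = 0.1
```

The floor `b_c` is what makes the rejection "soft": a candidate that one classifier dislikes can still be rescued by the others.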

Pixel-wise Semantic Segmentation Network is trained to produce a binary pedestrian mask. The degree to which each candidate's bounding box overlaps with the pedestrian category in the segmentation activation mask gives a measure of the confidence of the segmentation network in the candidate generator's results. If pedestrian pixels occupy at least 20% of the candidate bounding-box area, its score is kept unaltered; otherwise, SNF is applied to scale down the original confidence score.
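The 20% overlap rule can be sketched as follows. This is heavily hedged: the exact scaling constants the paper uses for the segmentation branch are not reproduced here; `a_c`, `b_c`, and the soft-rejection form are assumptions mirroring the classifier fusion above, and all names are mine.

```python
import numpy as np

def ss_fused_score(score, seg_mask, box, a_c=0.7, b_c=0.1, thresh=0.2):
    """Scale a candidate's score by its overlap with the segmentation mask.

    seg_mask is a binary (H, W) pedestrian mask; box is (x1, y1, x2, y2).
    If pedestrian pixels cover at least `thresh` of the box, the score is
    kept; otherwise it is softly scaled down, floored at b_c.
    """
    x1, y1, x2, y2 = box
    overlap = seg_mask[y1:y2, x1:x2].mean()   # fraction of pedestrian pixels in the box
    if overlap >= thresh:
        return score                          # segmentation agrees: keep the score
    return score * max(overlap / a_c, b_c)    # otherwise apply soft rejection

mask = np.zeros((10, 10))
mask[2:8, 2:8] = 1.0                          # a 6x6 pedestrian blob
kept = ss_fused_score(0.9, mask, (2, 2, 8, 8))    # full overlap: unchanged
scaled = ss_fused_score(0.9, mask, (0, 0, 2, 2))  # zero overlap: floored at b_c
```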

SOME IDEAS

The idea of the work is simple; it reads as a rather tricky engineering of pedestrian detection. Though the authors claim that it is efficient, it is hard to say how efficient it can be when such complex CNN classifiers are involved.

The theme of this weekend was Kasumi Arimura and Aoi Miyazaki. I watched Arimura in Flying Colors, I Am a Hero, and Strobe Edge, and Miyazaki in If Cats Disappeared from the World, The Great Passage, Yellow Elephant, My SO Has Got Depression, and Shonen Merikensack, and also rewatched NANA.

Kasumi Arimura counts as one of Japan's new generation of actresses, born in 1993. I first heard of her on a film-review program whose host praised her highly. I didn't go out of my way to follow her, so it wasn't until this weekend that I watched Flying Colors. In it Arimura enters as a gyaru and exits as the girl next door. At first she didn't leave a particular impression, but I came to realize she is an actress with a very sweet face, very much my type aesthetically. I then looked up her filmography; perhaps because she is so young, she mostly plays schoolgirls. I hope she gets a wider range of roles in the future.

Aoi Miyazaki I know much better. I think I first learned of her in high school. Judging by the timeline I must have seen NANA first, yet I always feel I first knew her through Virgin Snow, which she starred in with Lee Joon-gi. Miyazaki is probably not the kind of actress who stuns at first glance; she is the type who looks better the longer you watch, with an infectious smile and a strong mori-girl air. earth music&ecology's choice of her as its spokesperson was quite fitting. Later I watched the taiga drama Atsuhime. I don't know whether it's because Miyazaki always plays similar characters, but she is an actress with a built-in aura of cuteness. She seems to live in perpetual girlhood, which makes it hard to believe that Virgin Snow came out in 2007 and that she married Sosuke Takaoka that same year. Until their divorce in 2011, the roles she played all seemed to be girlish types. Perhaps the mature Miyazaki is like her character in My SO Has Got Depression, released the same year as the divorce: grown up, yet still carrying that girlish charm. But all of this is hard to say. Miyazaki has no accounts on any social platform, so almost nothing of her private life is exposed; what she is really like can perhaps only be glimpsed from the comments of those who have worked with her. Japanese Actresses of the Heisei Cinema: Aoi Miyazaki is pure and delicately beautiful, innocent and lovely, able to play complex inner drama; she is an exceptionally malleable and gifted actress. Ethan Laundry: she is also a beautiful, graceful woman. Yu Aoi: she always looks happy; in daily life she loves to laugh, and her laughter infects everyone around her. Mika Nakashima: as an actor Miyazaki is far my senior; I think she is formidable, I learned many acting techniques from her and saw the strength of a performer utterly unlike myself. Lee Joon-gi: in fact, Miyazaki immerses herself so deeply that I wondered whether she really liked me; when I heard the news of her marriage I was stunned, and honestly a little gloomy. Shinobu Otake: Aoi is a girl who is both lovely and strong-minded; as an actress she works extremely hard, and I have seen all of it. Masato Sakai: every time Aoi and I shoot a scene, each of us brings our own "answer." As an actor, Aoi is truly my best partner. She is the kind of actress who puts you completely at ease. This time I played a depression sufferer and had to make myself anxious; I don't quite know how to put it into words, but with Aoi I could feel utterly at ease while "becoming anxious." Just as when we shot Atsuhime three years earlier, I again had Aoi carry me and hold me. Surely everyone most wants to see the moment when Haruko, Aoi's character, overcomes her obstacles. The story itself is not unusual; people who see the film will feel that the actress Aoi Miyazaki is an ordinary person just like us, and that is the most interesting thing about it. Not many actresses can achieve that. Toshiyuki Nishida: (during the shoot) it was like looking forward to going to school because there is a girl you like. Atsushi Ito: like Isaku, who keeps longing for the butterfly played by Miyazaki, Atsushi kept thinking of Aoi too.

Japanese films have a special, rather curious quality. My impression is: the pacing is slow, it seems a bit boring, yet you can't stop watching, and before you know it the two-hour film is over. Afterward there is a faint feeling of "I just spent two hours on such a small story," but the sense of healing is strong. Of the films I watched, the ones that left the deepest impression were I Am a Hero, The Great Passage, and My SO Has Got Depression, all of which more or less share the theme of one person holding on to who they are. Also, watching Aoi Miyazaki and Ryuhei Matsuda in NANA and then in The Great Passage gives a subtle feeling: from NANA to The Great Passage in 2013, it is as if reckless youths had grown into pillars of society. A career, I suppose, should be like that of the editors compiling The Great Passage (大渡海): a whole life devoted to this one thing.

I’ve just tried a very interesting and fun website called Quick, Draw!, developed by the Google Creative Lab and Data Arts Team. It is a game built with machine learning: the user draws, and a neural network tries to guess what the user is drawing. It is really fun.

TITLE: Aggregated Residual Transformations for Deep Neural Networks

AUTHOR: Saining Xie, Ross Girshick, Piotr Dollár, Zhuowen Tu, Kaiming He

ASSOCIATION: UC San Diego, Facebook AI Research

FROM: arXiv:1611.05431

CONTRIBUTIONS

A simple, highly modularized network (ResNeXt) architecture for image classification is proposed. The network is constructed by repeating a building block that aggregates a set of transformations with the same topology.

METHOD

The network is designed with two simple rules inspired by VGG/ResNets:

  1. If producing spatial maps of the same size, the blocks share the same hyper-parameters (width and filter sizes).
  2. Each time the spatial map is downsampled by a factor of 2, the width of the blocks is multiplied by a factor of 2. The second rule ensures that the computational complexity, in terms of FLOPs, is roughly the same for all blocks.
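The second rule can be verified with a quick back-of-the-envelope check; the FLOPs formula and layer sizes below are illustrative, not taken from the paper.

```python
def conv_flops(h, w, c_in, c_out, k=3):
    """Rough multiply-accumulate count for a k x k conv layer (illustrative)."""
    return h * w * c_in * c_out * k * k

# Halving the spatial map while doubling the width leaves FLOPs unchanged:
# the spatial term shrinks by 4, the channel term grows by 4.
stage1 = conv_flops(56, 56, 64, 64)
stage2 = conv_flops(28, 28, 128, 128)   # spatial / 2, width x 2
assert stage1 == stage2
```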

The building block of ResNeXt is shown in the following figure:

Such a design gives the network more channels (sets of transformations) without increasing the FLOPs much, which the authors describe as increasing the cardinality.
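The split-transform-merge view behind the block can be sketched with a toy linear example: aggregating `cardinality` branches, each transforming its own slice of the channels, matches a single block-diagonal (grouped) transform. All sizes and names are illustrative assumptions; real ResNeXt blocks use 1x1-3x3-1x1 bottleneck convolutions, where the grouped layer is the 3x3 one.

```python
import numpy as np

rng = np.random.default_rng(2)
cardinality, d = 4, 8                    # 4 branches, each 8-dim in and out
x = rng.standard_normal(cardinality * d)
branch_w = rng.standard_normal((cardinality, d, d))  # one weight matrix per branch

# Path A: split the channels, transform each slice in its branch, then merge.
split = x.reshape(cardinality, d)
merged = np.concatenate([branch_w[i] @ split[i] for i in range(cardinality)])

# Path B: the same computation as one grouped (block-diagonal) transform.
block_diag = np.zeros((cardinality * d, cardinality * d))
for i in range(cardinality):
    block_diag[i * d:(i + 1) * d, i * d:(i + 1) * d] = branch_w[i]
grouped = block_diag @ x

assert np.allclose(merged, grouped)
```

This equivalence is why the block can be implemented efficiently with grouped convolutions instead of `cardinality` explicit parallel paths.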

SOME IDEAS

The explanation of why such a design leads to better performance seems less persuasive.

Today I cleaned up my photo album and reminisced a lot while looking at the photos. The city of Krakow, located in the south of Poland and the former capital of the country, aroused my desire to travel. Krakow is one of the most important cities in Europe because of its fantastic history and culture. I did not know this city until I submitted a paper to an international conference there in 2013.

What impressed me most is St. Mary’s Church, sitting at the southeast of the Main Market Square in the center of the city. The church has two towers; the taller one has a spire and the other a dome. The asymmetry of the church shows a kind of beauty I was not familiar with, because most historic architecture in China is symmetric. On every hour, a trumpet signal is played from the taller tower, and the plaintive tune breaks off in mid-stream, to commemorate a brave young man who was shot in the throat while sounding the alarm before a Mongol attack on the city in the 13th century. I think it is really touching and important for local people to remember who they are and where they come from.

Besides this church, the city is filled with more than a hundred churches and chapels, with a castle just outside. I wish I could have another chance to visit Krakow and appreciate its beauty.

TITLE: Training Region-based Object Detectors with Online Hard Example Mining

AUTHOR: Abhinav Shrivastava, Abhinav Gupta, Ross Girshick

ASSOCIATION: Carnegie Mellon University, Facebook AI Research

FROM: arXiv:1604.03540

CONTRIBUTIONS

A novel bootstrapping technique called online hard example mining (OHEM) for training state-of-the-art detection models based on deep ConvNets is proposed. The algorithm is a simple modification to SGD in which training examples are sampled according to a non-uniform, non-stationary distribution that depends on the current loss of each example under consideration.

METHOD

Online Hard Example Mining

The main idea of the OHEM is illustrated in the following figure:

This algorithm is designed for region-proposal-based object detectors. The top side of the figure, including the Convolutional Network and the green-arrow part of the RoI Network, is the same as Fast R-CNN. The key of the algorithm lies in the red-arrow part of the RoI Network.

For an input image at an SGD iteration, a conv feature map is first computed using the Convolutional Network. Then the RoI Network uses this feature map and all the input RoIs, instead of a sampled mini-batch, to do a forward pass. The loss computed for each RoI represents how well the current network performs on it. Hard examples are selected by sorting the input RoIs by loss and taking the $B/N$ examples for which the current network performs worst.
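The selection step above can be sketched as follows; the function name and batch size are illustrative assumptions, not the authors' code.

```python
import numpy as np

def select_hard_examples(roi_losses, batch_per_image):
    """Return indices of the hardest RoIs, sorted by descending loss."""
    order = np.argsort(roi_losses)[::-1]    # worst-performing RoIs first
    return order[:batch_per_image]

rng = np.random.default_rng(3)
losses = rng.random(2000)                   # loss from the forward pass over all input RoIs
hard = select_hard_examples(losses, batch_per_image=128)
# `hard` holds the 128 RoI indices fed to the backward pass.
```

The paper additionally suppresses highly overlapping RoIs (which share nearly identical losses) with non-maximum suppression before taking the top examples; that deduplication step is omitted here.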

Some Details

The authors’ implementation maintains two copies of the RoI network, one of which is read-only. The read-only RoI network allocates memory only for the forward pass over all RoIs ($R$), as opposed to the standard RoI network, which allocates memory for both forward and backward passes. For an SGD iteration, given the conv feature map, the read-only RoI network performs a forward pass and computes the loss for all input RoIs (green arrows). Then the hard RoI sampling module selects hard examples, which are input to the regular RoI network (red arrows). This network computes forward and backward passes only for the hard RoIs, accumulates the gradients, and passes them to the conv network. In practice, the authors use all RoIs from all $N$ images as $R$, so the effective batch size for the read-only RoI network is $\vert R \vert$, while for the regular RoI network it is the standard $B$.

ADVANTAGE

  1. It removes the need for several heuristics and hyperparameters commonly used in region-based ConvNets.
  2. It yields consistent and significant boosts in mean average precision.
  3. Its effectiveness increases as the training set becomes larger and more difficult, as demonstrated by results on the MS COCO dataset.

SOME IDEAS

  1. How can data balancing be applied in single-shot methods, such as SSD?

I’ve been a big fan of sailing boats since I was a little child. I do not know what it was in my childhood that made sailing boats so attractive to me. Maybe it is my fantasies about the romance of sailors, the fabulous views of the wild ocean, or the beauty of the boats themselves.

One of my favourite computer games is Uncharted Waters, also called Daikoukai Jidai, which sets its story between the 15th and 17th centuries, when European explorers sailed the seas to open new routes to Asia after the Ottoman Empire blocked the land routes. This was the first video game I played related to sailing boats. Then I had great fun playing Corsairs: Conquest at Sea and Sid Meier’s Pirates!. Even when I was playing Age of Empires and Sid Meier’s Civilization, I liked to build naval units to conquer my enemies whenever possible.

Movies with the element of sailing entertain me as well. Master and Commander: The Far Side of the World is the best one. It tells the story of how Captain Jack Aubrey of the British HMS Surprise led his crew to defeat a fierce enemy, the French privateer Acheron, during the Napoleonic Wars. Besides the crew’s wartime optimism, which inspires me greatly, the music in the movie also opened a door for me to classical music. Another film series that must be mentioned is Pirates of the Caribbean, which combines adventure and comedy and made me laugh out loud.

Though I love sailing boats so much, I have not gained much knowledge about them or their era. Recently I found a very fruitful Chinese blog on this topic. It’s time for me to start reading!