



From Cai Chongda, 《皮囊》




TITLE: Fully Convolutional Instance-aware Semantic Segmentation

AUTHOR: Yi Li, Haozhi Qi, Jifeng Dai, Xiangyang Ji, Yichen Wei

ASSOCIATION: Microsoft Research Asia, Tsinghua University

FROM: arXiv:1611.07709


An end-to-end fully convolutional approach for instance-aware semantic segmentation is proposed. The underlying convolutional representation and the score maps are fully shared for the mask prediction and classification sub-tasks, via a novel joint formulation with no extra parameters. The network structure is highly integrated and efficient. The per-ROI computation is simple, fast, and does not involve any warping or resizing operations.


The proposed method is closely related to earlier work such as R-FCN. The following figure gives an illustration:

Unlike the previous work mentioned above, this method predicts two score maps for each ROI: an ROI inside map and an ROI outside map. Together, the two score maps account for both the mask prediction and the classification sub-tasks. For mask prediction, a softmax over the two maps produces the per-pixel foreground probability. For classification, a max over the two maps produces the per-pixel likelihood of “belonging to the object category”.
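As a minimal sketch of the joint formulation described above, the following Python applies a softmax and a max to the inside and outside scores of each pixel. The function names, the nested-list layout, and the averaging of per-pixel likelihoods into a single ROI score are assumptions made for illustration, not the authors' code.

```python
import math


def pixel_scores(inside, outside):
    """Return (foreground_prob, category_likelihood) for one pixel."""
    # Softmax over the two scores: probability that the pixel is foreground.
    m = max(inside, outside)
    e_in = math.exp(inside - m)
    e_out = math.exp(outside - m)
    fg_prob = e_in / (e_in + e_out)
    # Max over the two scores: likelihood of "belonging to the object category".
    return fg_prob, m


def roi_outputs(inside_map, outside_map):
    """inside_map / outside_map: 2-D lists of scores for one ROI and one category."""
    mask, likelihoods = [], []
    for row_in, row_out in zip(inside_map, outside_map):
        pairs = [pixel_scores(a, b) for a, b in zip(row_in, row_out)]
        mask.append([p for p, _ in pairs])
        likelihoods.extend(l for _, l in pairs)
    # Assumption: the ROI-level category score is the mean of the
    # per-pixel likelihoods (a softmax across categories would follow).
    roi_score = sum(likelihoods) / len(likelihoods)
    return mask, roi_score


mask, score = roi_outputs([[2.0, 0.0]], [[0.0, 0.0]])
print(mask, score)
```

Both outputs are read from the same two score maps, which is why the formulation adds no extra parameters for either sub-task.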

For an input image, the 300 highest-scoring ROIs are generated by the RPN. They pass through the bbox regression branch, which yields another 300 ROIs. For each ROI, classification scores and a foreground mask (as probabilities) are predicted for all categories. NMS with an IoU threshold filters out highly overlapping ROIs, and each remaining ROI is assigned the category with the highest classification score. Its foreground mask is obtained by mask voting: among the 600 ROIs, those whose IoU with the ROI under consideration is higher than 0.5 are collected, and their foreground masks for the category are averaged on a per-pixel basis, weighted by their classification scores. The averaged mask is binarized to give the output.
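The mask voting step can be sketched roughly as follows. This is only an illustration: it assumes every mask has already been pasted onto a common pixel grid, and the binarization threshold of 0.5 and all function names are hypothetical choices that the notes above do not specify.

```python
def iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def mask_voting(kept_box, rois, iou_thresh=0.5, binarize_at=0.5):
    """rois: list of (box, class_score, mask); mask is a 2-D list of
    foreground probabilities on a common grid (an assumption of this sketch)."""
    voters = [(s, m) for box, s, m in rois if iou(kept_box, box) > iou_thresh]
    total = sum(s for s, _ in voters)
    if total == 0:
        return None
    h, w = len(voters[0][1]), len(voters[0][1][0])
    # Per-pixel average of the masks, weighted by classification score.
    avg = [[sum(s * m[y][x] for s, m in voters) / total for x in range(w)]
           for y in range(h)]
    # binarize_at is a hypothetical threshold.
    return [[1 if v >= binarize_at else 0 for v in row] for row in avg]


rois = [
    ((0, 0, 10, 10), 0.9, [[1.0, 0.2]]),
    ((1, 0, 10, 10), 0.1, [[0.0, 1.0]]),
    ((20, 20, 30, 30), 0.99, [[1.0, 1.0]]),  # no overlap, does not vote
]
print(mask_voting((0, 0, 10, 10), rois))
```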


  1. End-to-end training and testing keep the system simple.
  2. It builds on the idea of R-FCN, whose efficiency has already been demonstrated.




From Yang Jiang, 《我们仨》






TITLE: Densely Connected Convolutional Networks

AUTHOR: Gao Huang, Zhuang Liu, Kilian Q. Weinberger

ASSOCIATION: Cornell University, Tsinghua University

FROM: arXiv:1608.06993


Dense Convolutional Network (DenseNet) is proposed, which embraces the observation that networks can be substantially deeper, more accurate and efficient to train if they contain shorter connections between layers close to the input and those close to the output.


DenseNet is a network architecture in which each layer is directly connected to every other layer in a feed-forward fashion (within each dense block). For each layer, the feature maps of all preceding layers are treated as separate inputs, whereas its own feature maps are passed on as inputs to all subsequent layers. The accompanying figure illustrates the idea.
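The connectivity pattern inside one dense block can be sketched as below. This is a toy illustration only: each "channel" is a single number rather than a feature map, the layer is a stand-in for a real convolution, and the names `dense_block`, `make_toy_layer`, and `growth_rate` are chosen here for illustration.

```python
def dense_block(x, layers):
    """x: list of input channels; layers: callables that map the list of
    all channels seen so far to a list of new channels."""
    features = list(x)
    for layer in layers:
        new = layer(features)      # every layer sees all earlier feature maps
        features = features + new  # concatenation, not summation
    return features


def make_toy_layer(growth_rate):
    """Toy stand-in for a convolutional layer that emits growth_rate channels."""
    def layer(channels):
        mean = sum(channels) / len(channels)
        return [mean + i for i in range(growth_rate)]
    return layer


out = dense_block([1.0, 3.0], [make_toy_layer(2) for _ in range(3)])
print(len(out), out[:4])
```

With 2 input channels and 3 layers that each add 2 channels, the block ends with 2 + 3 × 2 = 8 channels, which shows how the input width of later layers grows with depth.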


In Understanding intermediate layers using linear classifier probes by Guillaume Alain and Yoshua Bengio, the authors claim that the raw input is helpful at the beginning of the training of the network. The dense connections may play a similar role in this work.

Implementing DenseNet in Caffe requires a large amount of memory because of the large number of split layers.

TITLE: LCNN: Lookup-based Convolutional Neural Network

AUTHOR: Hessam Bagherinezhad, Mohammad Rastegari, Ali Farhadi

ASSOCIATION: University of Washington, Allen Institute for AI

FROM: arXiv:1611.06473


LCNN, a lookup-based convolutional neural network, is introduced. It encodes convolutions as a few lookups into a dictionary that is trained to cover the space of weights in CNNs.


The main idea of the work is to encode the weights of a convolutional layer using a dictionary $D$ and two tensors, $I$ and $C$, as illustrated in the following figure.

$$D \in \mathbb{R}^{k \times m}$$

where $k$ is the size of the dictionary $D$ and $m$ is the number of input channels. The weight tensor can be constructed as a linear combination of $S$ words in the dictionary $D$ as follows:

$$\textbf{W}_{[:,r,c]} = \sum_{t=1}^{S} C_{[t,r,c]} \cdot D_{[I_{[t,r,c]},:]}$$

where $S$ is the number of components in the linear combination, $I$ holds the indices of the selected words, and $C$ holds their coefficients. The convolution can then be computed quickly using the shared dictionary: we convolve the input with all of the dictionary vectors and then compute the output according to $I$ and $C$. Since the dictionary $D$ is shared among all weight filters in a layer, we can precompute the convolution between the input tensor $\textbf{X}$ and all the dictionary vectors. Given $\textbf{S}$ (the tensor of precomputed responses, not the component count $S$), which is defined as:

$$\textbf{S}_{[i,:,:]} = \textbf{X} * D_{[i,:]}, \quad 1 \le i \le k$$

the convolution operation can be computed as

$$\textbf{X} * \textbf{W} = \textbf{S} * \textbf{P}$$

where $\textbf{P}$ can be expressed by $I$ and $C$:

$$\textbf{P}_{[j,r,c]} = \sum_{t:\, I_{[t,r,c]} = j} C_{[t,r,c]}$$

The idea can be illustrated in the following figure:

Thus the dictionary and the lookup parameters can be trained jointly.
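To make the lookup idea concrete, here is a small sketch that checks, on a single input patch, that projecting the input onto the dictionary first and then combining a few looked-up responses gives the same result as building the full weight and convolving directly. The data layout (`patch[r][c]` is a length-$m$ vector, `I[t][r][c]` and `C[t][r][c]` are the index and coefficient of the $t$-th component at filter position $(r, c)$) and all function names are assumptions made for this illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def build_weight(D, I, C):
    """W[r][c] = sum over t of C[t][r][c] * D[I[t][r][c]] (a length-m vector)."""
    s, kh, kw, m = len(I), len(I[0]), len(I[0][0]), len(D[0])
    W = [[[0.0] * m for _ in range(kw)] for _ in range(kh)]
    for t in range(s):
        for r in range(kh):
            for c in range(kw):
                word = D[I[t][r][c]]
                for ch in range(m):
                    W[r][c][ch] += C[t][r][c] * word[ch]
    return W


def direct_response(patch, W):
    """Ordinary convolution response of one filter at one location."""
    return sum(dot(patch[r][c], W[r][c])
               for r in range(len(W)) for c in range(len(W[0])))


def lookup_response(patch, D, I, C):
    kh, kw = len(patch), len(patch[0])
    # Step 1: responses of the input to every dictionary word.
    # This part is shared by all filters in the layer.
    S = [[[dot(patch[r][c], word) for c in range(kw)] for r in range(kh)]
         for word in D]
    # Step 2: per filter, only a few lookups and multiply-adds remain.
    total = 0.0
    for t in range(len(I)):
        for r in range(kh):
            for c in range(kw):
                total += C[t][r][c] * S[I[t][r][c]][r][c]
    return total


D = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # k = 3 words, m = 2 channels
I = [[[0, 2]], [[1, 1]]]                   # 2 components, 1x2 filter
C = [[[0.5, 2.0]], [[1.0, -1.0]]]
patch = [[[1.0, 2.0], [3.0, 4.0]]]
print(direct_response(patch, build_weight(D, I, C)),
      lookup_response(patch, D, I, C))
```

The saving comes from step 1 being computed once per layer, while step 2 costs only as many multiply-adds per filter position as there are components.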


  1. It speeds up inference.
  2. Few-shot learning. The shared dictionary in LCNN allows a neural network to learn from very few training examples on novel categories.
  3. LCNN needs fewer iterations to train.


  1. Accuracy is hurt because the weights are only approximated.









TITLE: Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection

AUTHOR: Xianzhi Du, Mostafa El-Khamy, Jungwon Lee, Larry S. Davis

ASSOCIATION: University of Maryland, Samsung Electronics

FROM: arXiv:1610.03466


A deep neural network fusion architecture, called Fused Deep Neural Network (F-DNN), is proposed to address the pedestrian detection problem.


The proposed network architecture consists of a pedestrian candidate generator, a classification network, and a pixel-wise semantic segmentation network. The pipeline of the proposed network fusion architecture is shown in the following figure:

Pedestrian Candidate Generator is implemented by SSD. It provides a large pool of pedestrian candidates varying in scales and aspect ratios. Pedestrian candidates generated should cover almost all the ground truth pedestrians, even though many false positives are introduced at the same time.

Classification Network consists of multiple binary classification deep neural networks which are trained on the pedestrian candidates from Pedestrian Candidate Generator.

Soft-rejection based DNN Fusion works as follows: Consider one pedestrian candidate and one classifier. If the classifier has high confidence about the candidate, we boost its original score from the candidate generator by multiplying with a confidence scaling factor greater than one. Otherwise, we decrease its score by a scaling factor less than one. To fuse all $M$ classifiers, the candidate’s original confidence score is multiplied with the product of the confidence scaling factors from all classifiers in the classification network.


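$$S_{FDNN} = S_{SSD} \times \prod_{m=1}^{M} \max\left(\frac{p_m}{a_c}, b_c\right)$$

where $S_{SSD}$ is the candidate's original score, $p_m$ is the classification probability from the $m$-th classifier, and $a_c$ and $b_c$ are chosen as 0.7 and 0.1 by cross validation.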
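A minimal sketch of soft-rejection fusion follows. It assumes that each classifier's scaling factor is its probability divided by $a_c$, floored at $b_c$, so that probabilities above 0.7 boost the score and no single classifier can push it to zero. That form, and the function name, are assumptions of this sketch.

```python
def soft_rejection_fusion(candidate_score, classifier_probs, a_c=0.7, b_c=0.1):
    """Scale the candidate generator's score by one factor per classifier."""
    fused = candidate_score
    for p in classifier_probs:
        # Assumed form of the scaling factor: p / a_c, never below b_c.
        fused *= max(p / a_c, b_c)
    return fused


print(soft_rejection_fusion(0.5, [0.84]))        # confident classifier: boosted
print(soft_rejection_fusion(0.5, [0.35]))        # doubtful classifier: reduced
print(soft_rejection_fusion(0.5, [0.84, 0.0]))   # one rejection is only "soft"
```

Because the factor has a floor, a candidate rejected by one classifier keeps a small non-zero score and can still be recovered by the others.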

Pixel-wise Semantic Segmentation Network is trained to produce a binary map. The degree to which each candidate’s BB overlaps with the pedestrian category in the SS activation mask gives a measure of the confidence of the SS network in the candidate generator’s results. If the pedestrian pixels occupy at least 20% of the candidate BB area, its score is kept unaltered; otherwise, SNF (soft-rejection network fusion) is applied to scale the original confidence score.
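The 20% overlap test can be sketched as follows. The mask layout (a 2-D list of 0/1 values), the box convention (pixel coordinates with exclusive right and bottom edges), and the function names are assumptions of this illustration. How the score is scaled when the test fails is not specified here, so the sketch stops at the decision.

```python
def pedestrian_fraction(mask, box):
    """Fraction of the box area covered by pedestrian pixels.
    mask: 2-D list of 0/1; box: (x1, y1, x2, y2), x2 and y2 exclusive."""
    x1, y1, x2, y2 = box
    area = (x2 - x1) * (y2 - y1)
    if area <= 0:
        return 0.0
    hits = sum(mask[y][x] for y in range(y1, y2) for x in range(x1, x2))
    return hits / area


def keep_score_unaltered(mask, box, min_fraction=0.2):
    """True if the candidate passes the segmentation check as is."""
    return pedestrian_fraction(mask, box) >= min_fraction


mask = [
    [0, 0, 0, 0],
    [0, 1, 1, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 0],
]
print(keep_score_unaltered(mask, (1, 1, 3, 3)))  # tight box
print(keep_score_unaltered(mask, (0, 0, 4, 4)))  # loose box
```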


The idea of the work is simple, but it reads as a rather tricky implementation of pedestrian detection. Although the authors claim that it is efficient, it is hard to say how efficient it really is, given the very complex CNN classifiers it uses.



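I am much more familiar with Aoi Miyazaki (宫崎葵). I think I first heard of her back in high school. Going by the dates, I must have watched NANA first, yet I always feel that I first got to know her through Virgin Snow (《初雪》), in which she starred with Lee Joon-gi. She is probably not an actress who stuns at first sight. She is the type who looks better the more you watch her, with a very infectious smile and a strong "mori girl" air, so earth music&ecology picking her as its spokesperson was a very good fit. Later I watched the taiga drama Atsuhime (《笃姬》). Maybe it is because she always plays characters with similar personalities, but in any case she is an actress with a built-in aura of cuteness. She always seems to be permanently in her girlhood, so it is hard to imagine that in 2007, the year Virgin Snow was released, she also married Sosuke Takaoka. Up until the two divorced in 2011, the roles she played all seemed to be girlish ones as well. Perhaps the mature Aoi Miyazaki is the one in My SO Has Got Depression (《丈夫得了抑郁症》), released in the same year as the divorce: mature, but still with that girlish cuteness. All of this is hard to say, though. She has no account on any social platform, so almost nothing about her private life is exposed, and what she is really like can perhaps only be glimpsed from the comments of actors who have worked with her.

《平成电影的日本女优》 (Japanese Actresses of Heisei Cinema): Aoi Miyazaki looks pure and beautiful, innocent and lovely, can handle complex inner drama, and is a gifted actress of great range.

Ethan Laundry: She is also a beautiful, graceful woman.

Yu Aoi: She always looks very happy. She laughs a lot in real life too, and when she smiles it is infectious.

Mika Nakashima: When it comes to acting, Aoi Miyazaki is very much my senior. I think she is amazing. I learned a lot of acting technique from her and saw the ability of an artist quite different from myself.

Lee Joon-gi: In fact, Aoi Miyazaki is the kind of actress who gets so deep into a role that I wondered whether she really liked me. When I heard the news of her marriage I was stunned, and honestly a little gloomy.

Shinobu Otake: Aoi is a lovely girl who also knows her own mind, and as an actress she works extremely hard. I have seen all of it with my own eyes.

Masato Sakai: Every time Aoi and I shoot together, we each hand in our own "answer". As an actor, I find Aoi really is my best partner. She is the kind of actress who puts people very much at ease. This time in particular I was playing someone with depression and had to make myself uneasy. I do not know how to put it in words, but with Aoi I could very comfortably "become uneasy". Just like when we shot Atsuhime three years ago, this time I again had Aoi carry me on her back and in her arms. Everyone surely most wants to see the moment when Haruko, played by Aoi, overcomes her obstacles. The story is actually not special. People who have seen the film will feel that the actress Aoi Miyazaki is an ordinary person just like us, and that is the most interesting thing about the film. Not many actresses can do that.

Toshiyuki Nishida: (During filming) it was the feeling of looking forward to going to school because a girl you like is there.

Atsushi Ito: Just like 伊作, who is always longing for 蝴蝶 (played by Aoi Miyazaki), Atsushi is always thinking of Aoi too.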


I’ve just tried a very interesting and fun website called Quick, Draw!, which is developed by Google Creative Lab and Data Arts Team. This is a game built with machine learning. The user draws, and a neural network tries to guess what the user is drawing. It is really fun.