
TITLE: Learning Deconvolution Network for Semantic Segmentation

AUTHOR: Hyeonwoo Noh, Seunghoon Hong, Bohyung Han

ASSOCIATION: Department of Computer Science and Engineering, POSTECH, Korea

FROM: arXiv:1505.04366

CONTRIBUTIONS

  1. A multi-layer deconvolution network is designed and learned, which is composed of deconvolution, unpooling, and rectified linear unit (ReLU) layers.
  2. Instance-wise segmentations are merged for the final semantic segmentation, which makes the method free from scale issues.

METHOD

The main steps of the method are as follows:

  1. Object proposals are generated by algorithms such as EdgeBox.
  2. ROIs extracted from the object proposals are sent to the deconvolution network. The outputs are instance-wise segmentations.
  3. The instance-wise segmentations are combined to get the final segmentation.
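The combination in step 3 can be sketched roughly as follows. This is only a minimal illustration: the notes above do not say how the maps are merged, so the pixel-wise maximum used here, the function name `aggregate_proposals`, and the box format are all assumptions.

```python
import numpy as np


def aggregate_proposals(image_shape, boxes, score_maps):
    """Paste per-proposal class score maps into image space, keep the pixel-wise max.

    image_shape: (num_classes, H, W)
    boxes:       list of (x1, y1, x2, y2) integer pixel coordinates, x2/y2 exclusive
    score_maps:  list of non-negative arrays shaped (num_classes, y2 - y1, x2 - x1)
    Class 0 is assumed to be background (hypothetical convention).
    """
    canvas = np.zeros(image_shape, dtype=np.float32)
    for (x1, y1, x2, y2), scores in zip(boxes, score_maps):
        region = canvas[:, y1:y2, x1:x2]        # a view into the canvas
        np.maximum(region, scores, out=region)  # in-place pixel-wise max
    # Pixels covered by no proposal stay all-zero and fall back to class 0.
    return canvas.argmax(axis=0)
```

Because every proposal is pasted back at its own location and size, objects of different scales end up in one label map without rescaling the whole image.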

Some Details

The architecture of the network is shown in the following figure. In the network, the unpooling operation captures example-specific structures by tracing strong activations back to their original locations in image space. The deconvolution operation, on the other hand, learns filters that capture class-specific shapes.
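To make the unpooling idea concrete, here is a small NumPy sketch. It assumes non-overlapping k x k max pooling on a single 2-D map; the function names are made up for illustration and this is not the authors' code.

```python
import numpy as np


def max_pool_with_switches(x, k=2):
    """Non-overlapping k x k max pooling that also records where each max came from."""
    h, w = x.shape
    assert h % k == 0 and w % k == 0
    # Rearrange so that the last axis holds the k*k values of one pooling window.
    windows = (x.reshape(h // k, k, w // k, k)
                .transpose(0, 2, 1, 3)
                .reshape(h // k, w // k, k * k))
    switches = windows.argmax(axis=2)   # position of the max inside each window
    pooled = windows.max(axis=2)
    return pooled, switches


def unpool(pooled, switches, k=2):
    """Put every pooled value back at its recorded location; all other cells are zero."""
    ph, pw = pooled.shape
    out = np.zeros((ph, pw, k * k), dtype=pooled.dtype)
    rows, cols = np.indices((ph, pw))
    out[rows, cols, switches] = pooled
    # Undo the window rearrangement done in max_pool_with_switches.
    return (out.reshape(ph, pw, k, k)
               .transpose(0, 2, 1, 3)
               .reshape(ph * k, pw * k))
```

The unpooled map is sparse, which is why a learned deconvolution layer follows it to densify the activations.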

Training has two stages. In the first stage, simpler data are used to train the network. These examples are generated from object annotations and contain objects with constrained appearance. In the second stage, more complex data are generated in a similar way, but from object proposals.

At inference, a CRF can further boost the performance.

ADVANTAGES

  1. It handles objects in various scales effectively and identifies fine details of objects.
  2. Deconvolution can generate finer segmentations.

DISADVANTAGES

  1. A large number of proposals is needed to get better results, which means higher computational cost.

Watched a movie that I can't decide is bad or good: Batman v Superman: Dawn of Justice (《蝙蝠侠大战超人:正义黎明》).

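First, the bad side. To begin with, the title smacks of clickbait: there is pitifully little actual Batman-versus-Superman, their fight is basically a formality, and one second they are sworn enemies while the next they are fighting side by side. Second, the logic is too dumb. The conflict between the two is purely the result of a villain's instigation, and the scheme to drive them apart is outrageously clumsy, worse even than the plots in our own Romance of the Three Kingdoms (《三国演义》), yet this clumsy trick is enough to set two godlike figures at each other's throats. Finally, in a movie titled Batman v Superman, it is Wonder Woman who has to drive the plot; if this weren't the director's cut, I don't know how the story could be told at all. Another thing I want to complain about: although Batman punishes evil outside the law and his character is dark to begin with, he is too paranoid in this one. And one more: Superman's powers are restricted far too harshly; he is downright feeble!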

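Now the side worth thinking about. First is a question the film raises: if a person (let's call him a person for now) has power that this world cannot control at all, then he is a god, and a god's actions are constrained only by the god's own noble morals; would it be better for such a god to be destroyed? Second is the description of humanity near the end of the film: Men are still good. We fight. We kill. We betray one another. But we can rebuild. We can do better. We will. We have to. This passage reminded me of Aragorn's final cry in The Lord of the Rings (《指环王》): A day may come when the courage of men fails, when we forsake our friends and break all bonds of fellowship. But it is not this day. An hour of wolves and shattered shields when the age of men comes crashing down. But it is not this day. This day we fight! Perhaps humanity really can come to understand something in the face of a great catastrophe, but why must it take a catastrophe?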

Just a running diary entry.

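On the high-speed train to Suzhou last weekend I read another Keigo Higashino novel, Malice (《恶意》). As always, a superb trick and airtight deduction. Higashino is very good at digging out one facet of human nature and having a character magnify it without limit. In this work it is the protagonist Osamu Nonoguchi's envy of those around him. All of us have such people around us: the neighbor's kid, other people's boyfriends and girlfriends, the colleague the boss favors... They are suns that light up our dark side. At most we say sour things behind their backs, or simply deceive ourselves by acting dismissive of them, while deep down we care desperately. Nonoguchi, however, decides to kill. Setting aside his meticulous scheme, what gives the reader pause is his motive, envy, which is perhaps the most ordinary of emotions. After finishing the novel I had to remind myself never to let this mindset devour me: when envy can sway the decisions you make, your own ruin is not far off.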

It rained the whole time in Suzhou; touring the gardens of Jiangnan calls for exactly this mood of drizzling plum rain.

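On the high-speed train back from Suzhou I watched The Departed (《无间道风云》), starring Leonardo DiCaprio. I wanted to see it when it was first released and tried several times but never managed to get into it; if there had been anything else to do on the train, I probably still wouldn't have. Maybe my taste is just limited, but I still have to say: go watch the Hong Kong version!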

I found a cross-platform Python tool, Watchdog, that monitors the file system and performs event-driven actions. Below is a simple example showing how to use watchdog.

import sys
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class EventHandler(FileSystemEventHandler):
    """Print a message whenever a file or directory is created or modified."""

    def on_created(self, event):
        super(EventHandler, self).on_created(event)
        self.file_name = event.src_path
        if not event.is_directory:
            print("create file: %s" % event.src_path)
        else:
            print("create directory: %s" % event.src_path)

    def on_modified(self, event):
        super(EventHandler, self).on_modified(event)
        self.file_name = event.src_path
        if not event.is_directory:
            print("modify file: %s" % event.src_path)
        else:
            print("modify directory: %s" % event.src_path)


if __name__ == "__main__":
    # Directory to watch: first command-line argument, or ./test/ by default.
    path = sys.argv[1] if len(sys.argv) > 1 else './test/'
    event_handler = EventHandler()
    observer = Observer()
    # recursive=True also watches all subdirectories of path.
    observer.schedule(event_handler, path, recursive=True)
    observer.start()
    try:
        # The observer runs in its own thread; keep the main thread alive.
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()

Here EventHandler inherits from FileSystemEventHandler, in which different handling logic can be written for different types of file operations.

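Today I watched a French film, a steampunk-style animated feature. The film is wildly imaginative: the story takes place in an alternate timeline, yet it keeps a thread of connection to real history. It begins in 1870. Facing war between France and Prussia, the French emperor Napoleon III urgently needs more advanced weapons, so he orders the young scientist Gustave Franklin to develop an ultimate serum that could be used to build an undying army. But Gustave's research fails: his serum only makes animals able to read and talk. A furious Napoleon III draws his gun and hits dangerous materials in the laboratory, causing an explosion that kills him on the spot, while two monsters created in the experiments escape. Napoleon IV then succeeds to the throne and signs a peace agreement with Prussia. Here the story diverges from real history: the Franco-Prussian War ends before it has even begun, and France gradually becomes the ruler of Europe. Then strange things happen. The world's top scientists disappear in large numbers, and police everywhere are helpless. The truth is that these scientists have all been gathered by the French authorities to work on securing France's dominance over the world. From then on, as far as the outside world can see, human science stagnates. Coal and wood remain humanity's main energy sources, all the trees in Europe are cut down, the atmosphere is thick with coal dust, the air quality is terrible, and people have to wear thick masks when they go out.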

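Avril's parents are Gustave Franklin's son and daughter-in-law. The whole family are outstanding scientists, and to avoid being taken away by the police they go to great lengths to evade the authorities. Meanwhile they never stop researching the ultimate serum. Just as they are about to succeed, Avril's parents are captured after all; only Avril and her grandfather Gustave escape, separately. Before the capture, Avril's mother desperately injects the serum into Avril's beloved crystal-ball toy, though Avril knows nothing of it. Separated from her family, Avril depends on a cat named Darwin, who can read and talk and stays by her side. To fulfill her parents' unfinished work, Avril goes through countless hardships and, ten years later, finally develops the ultimate serum on her own. History repeats itself: with their superior technology the authorities learn that Avril has the serum and track her down once more. Although Avril escapes for a while with her grandfather's help, they are eventually taken to the facility where the scientists are held. Only then is the truth revealed: the government has long been a figurehead, and the ones pulling the strings are two giant lizards, the very monsters that escaped from Gustave's laboratory, which acquired human intelligence and abilities and gradually began to control humankind. After a desperate fight, the lizard masterminds behind the scenes are finally destroyed, and the ultimate serum brings the plants on Earth back to life and even makes planets in outer space breathable for humans.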

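The film is fun and also has real depth; viewers can think it over from many angles, including the relationship between humans and nature, science and technology as a double-edged sword, the power of family, counterfactual history... Beyond these thought-provoking aspects, some small details are interesting too. Take the cat named Darwin: he carries a lot of weight in the film and even saves humanity. French animated films seem to have a soft spot for cats; I previously watched one called 《屋顶上的猫》, also a story about a cat. Another example is the way it pokes fun at history: in one shot Hitler is painting on a Paris street, so the joke about the Führer studying painting apparently always has to get used. Another fun touch is the twin Eiffel Towers, one of which is destroyed in the final battle, which brings things in line with reality.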

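I haven't updated the blog in a long time; every time I want to write something I feel I can't settle down. Thinking it over, my input has been fairly low lately, so naturally the output is low too.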

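I haven't read any new papers during this period, so there isn't much to record on the professional side. I have, however, recently studied SSD (Single Shot MultiBox Detector) carefully: from reading the authors' code, to running the examples, to training a detector with a model I designed myself. I now have a fairly deep understanding of this detection pipeline and have gotten some results. My biggest takeaway from these experiments is that details decide success or failure. The algorithm looks simple at first glance, with a clear idea and workflow, but its details hide many interesting tricks, including how the parameters interact and how they should be set for a specific problem, all of which deserve careful study. When reading papers nowadays I am often content with a superficial understanding, but to really solve a problem or truly master an algorithm, the spirit of getting to the bottom of things is indispensable. When reading papers, ask why more often instead of simply accepting the conclusions.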

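I haven't painted in a long time. I had planned to pick up the brush every weekend, but the last two weeks were either overtime or miscellaneous chores, and with a bit of laziness I didn't make a single stroke. I'm not even good at it. One reason I want to paint is to give my brain a change: I spend every day at the computer writing code and reading documents, which is tiring, and since I also study related professional material after work, the other parts of my brain are probably about to atrophy, so I need something from the humanities to adjust my mood. The second reason is that it calms me down. People say ours is a restless society, and in such an environment I doubt I can remain untouched; claiming not to be restless would be a lie. But while painting, everything outside can be forgotten for a while, I can focus on the flow of the lines and the matching of colors, and my mind settles naturally. In this rented room with no air conditioning, I at least get to enjoy the coolness that comes with a calm mind.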

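Recently I watched two very interesting films, Key of Life (《盗钥匙的方法》) and Manny & Lo (《曼妮姐妹》). I first noticed both because of their lead actresses. Only after watching did I find that it is not just the actresses that are appealing; the films themselves are excellent.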

Key of Life (《盗钥匙的方法》) is a 2012 Japanese film. The synopsis found online is:

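Takeshi Sakurai (Masato Sakai), a small-theater actor, has struggled for years without a break. Broke, unlucky in love, and in utter despair, he fails even at suicide. With nothing else to do, he goes to a public bathhouse, where by a twist of fate a bar of soap causes a stranger to slip. The man falls unconscious, and Sakurai, on a wild impulse, steals his locker key and from then on goes by the name Kondo, living the life of wealth he never dared to dream of. What he could never have imagined is that Kondo (Teruyuki Kagawa) is a cold-blooded killer famed throughout the underworld. After that, Sakurai has to take on commissions from the underworld and grit his teeth through the business of killing. Meanwhile Kondo wakes up in the hospital with amnesia, mistakenly believing he is the desperate Sakurai. With the help of Sanae Mizushima (Ryoko Hirosue), a beautiful woman he meets by chance, he gets to know himself as an actor bit by bit and painstakingly searches for a way forward, and in the process his and Sanae's feelings quietly change. The moment his memory returns, the fates of the three become entangled...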

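Seeing that the female lead was Ryoko Hirosue, I figured the film would be heartwarming and went in expecting a feel-good movie, so the murder at the opening caught me off guard. As the plot developed I found it really interesting, with plenty of twists. Although it has a hitman storyline mixed in, it is a heartwarming comedy through and through, and in the end it met my expectations: a very warm film. It has the meticulousness of Japanese cinema with a lot of absurd humor mixed in, a combination that feels fresh. There are many small gags too; the impression is of characters talking nonsense and goofing around with completely straight faces, and that contrast is what makes you laugh.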

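The other film, Manny & Lo (《曼妮姐妹》), is comparatively more subdued, but that in no way keeps the viewer from feeling the humanity and the care between people. The two films share a similar theme: both show the emotional shift between strangers from suspicion to trust, until in the end everyone comes to depend on each other and can no longer part. The synopsis found online is: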

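Eleven-year-old Amanda and sixteen-year-old Lo, two sisters, run away from their respective foster homes and set off on a life on the road. Along the way Lo discovers she is pregnant. Panicked, the two end up kidnapping Elaine, a clerk at a baby supply store, convinced that she can help them get through this safely. Over the long time they spend together, Elaine gradually finds that she too has come to depend on and need the company of the two girls.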

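I clicked on this film because I saw that Scarlett Johansson made it when she was 13, and I was curious what she looked like as a child. In this film she is a little angel, full of love for both her sister and Elaine. Some small gestures, like raising an eyebrow, are exactly the same as in adulthood, but coming from a baby-faced child they give off a different feeling. The film is shot delicately and somehow kept reminding me of Japanese cinema; maybe that's just because I've watched a lot of Japanese films lately.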

TITLE: Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks

AUTHOR: Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick

ASSOCIATION: Cornell University, Microsoft Research

FROM: arXiv:1512.04143

CONTRIBUTIONS

  1. The ION architecture is introduced, which leverages context and multi-scale skip pooling for object detection. It uses information both inside and outside the ROI to determine the detection result.

METHOD

The main steps of the method are shown in the following figure.

  1. The image is first fed into a CNN, e.g. VGG16.
  2. ROI proposals are generated in the same way as in Fast R-CNN.
  3. The information inside the ROI is extracted by ROI pooling on feature maps from several convolutional layers at different scales.
  4. The information outside the ROI is extracted by two successive 4-direction IRNNs, and ROI pooling is again used to extract the features.
  5. The pooled features are L2-normalized and concatenated. Then a 1x1 conv layer is used to reduce the dimension.
  6. Two branches are learned to predict category and location.
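Step 5 can be illustrated with a few lines of NumPy. This is a rough sketch under assumptions: the normalization axis (channels at each spatial location) is a guess, since the notes do not say over which entries the norm is taken, and `l2_normalize`, `fuse`, and the weight shape are hypothetical names and choices.

```python
import numpy as np


def l2_normalize(feat, eps=1e-12):
    """Scale the channel vector at every spatial location to unit L2 norm.

    feat: (C, H, W). Normalizing over the channel axis is an assumption here.
    """
    norm = np.sqrt((feat ** 2).sum(axis=0, keepdims=True))
    return feat / (norm + eps)


def fuse(pooled_feats, weight):
    """Normalize, concatenate along channels, then apply a 1x1 convolution.

    pooled_feats: list of ROI-pooled maps, each (C_i, H, W) with the same H, W
    weight:       (C_out, sum of C_i); a 1x1 conv is a per-location matrix product
    """
    stacked = np.concatenate([l2_normalize(f) for f in pooled_feats], axis=0)
    return np.einsum('oc,chw->ohw', weight, stacked)
```

Normalizing before concatenation keeps features from different layers, whose activations can differ a lot in magnitude, on a comparable scale.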

some details

A 4-direction IRNN contains 4 independent IRNNs, each moving in a different direction (left, right, up and down). The internal IRNN computations are split into separate logical layers. The input-to-hidden transition is implemented by a 1x1 convolution, and its computation can be shared across the different directions.
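A 4-direction IRNN of this kind can be sketched in NumPy as below. It assumes the input `x` is already the output of the shared 1x1 input-to-hidden convolution and that the four directional outputs are concatenated along channels; the function names are illustrative only, and an IRNN is a ReLU RNN whose recurrent weights start as the identity.

```python
import numpy as np


def irnn_sweep_right(x, w_hh):
    """Left-to-right ReLU RNN run independently along every row.

    x:    (C, H, W) input-to-hidden activations
    w_hh: (C, C) recurrent weights (identity at initialization for an IRNN)
    """
    c, h, w = x.shape
    out = np.zeros((c, h, w), dtype=x.dtype)
    prev = np.zeros((c, h), dtype=x.dtype)
    for col in range(w):
        # h_t = ReLU(W_hh h_{t-1} + x_t), with x_t shared across directions
        prev = np.maximum(w_hh @ prev + x[:, :, col], 0.0)
        out[:, :, col] = prev
    return out


def four_direction_irnn(x, w_right, w_left, w_down, w_up):
    """Run four independent sweeps and concatenate them along the channel axis."""
    right = irnn_sweep_right(x, w_right)
    left = irnn_sweep_right(x[:, :, ::-1], w_left)[:, :, ::-1]
    xt = x.transpose(0, 2, 1)  # swap H and W so the same sweep runs vertically
    down = irnn_sweep_right(xt, w_down).transpose(0, 2, 1)
    up = irnn_sweep_right(xt[:, :, ::-1], w_up)[:, :, ::-1].transpose(0, 2, 1)
    return np.concatenate([right, left, down, up], axis=0)
```

After one such layer every cell has seen its whole row and column; stacking a second layer lets every cell see the whole image.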

ADVANTAGES

  1. The proposed detector works better on smaller objects compared with other works.
  2. Both local and global information are taken into account.
  3. Skip pooling uses the information of different scales.
  4. Two successive 4-direction IRNNs cover the information from the whole image.

TITLE: Semantic Object Parsing with Graph LSTM

AUTHOR: Xiaodan Liang, Xiaohui Shen, Jiashi Feng, Liang Lin, Shuicheng Yan

ASSOCIATION: National University of Singapore, Sun Yat-sen University, Adobe Research

FROM: arXiv:1603.07063

CONTRIBUTIONS

  1. A novel Graph LSTM structure is proposed to handle general graph-structured data, which effectively exploits global context through superpixels extracted by over-segmentation.
  2. A confidence-driven scheme is proposed to select the starting node and the order of updating sequences.
  3. In each Graph LSTM unit, different forget gates for the neighboring nodes are learned to dynamically incorporate the local contextual interactions in accordance with their semantic relations.

METHOD

The main steps of the method are shown in the following figure.

  1. The input image first passes through a stack of convolutional layers to generate the convolutional feature maps.
  2. The convolutional feature maps are further used to generate an initial semantic confidence map for each pixel.
  3. The input image is over-segmented into multiple superpixels. For each superpixel, a feature vector is extracted from the upsampled convolutional feature maps.
  4. The first Graph LSTM takes the feature vector of every superpixel as input to compute a better state.
  5. The second Graph LSTM takes the feature vector of every superpixel and the output of the first Graph LSTM as input.
  6. The superpixels are updated in an order determined by their initial confidence.
  7. Several 1×1 convolution filters are employed to produce the final parsing results.

some details

A graph structure is built based on the superpixels. The nodes are the superpixels, and two nodes are linked when they are adjacent. The history information used by the G-LSTM for one superpixel comes from the adjacent superpixels.
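The graph construction and the confidence-driven ordering can be sketched as follows. The assumptions here are 4-connectivity for adjacency and a plain sort by descending confidence, which is the simplest reading of the notes above; `superpixel_adjacency` and `update_order` are hypothetical helper names.

```python
import numpy as np


def superpixel_adjacency(labels):
    """Build {superpixel id: set of neighboring ids} from an (H, W) label map.

    Two superpixels are neighbors when they share an edge between
    4-connected pixels (an assumption made for this sketch).
    """
    neighbors = {int(i): set() for i in np.unique(labels)}
    # Compare each pixel with its right neighbor and with its lower neighbor.
    pairs = [(labels[:, :-1], labels[:, 1:]), (labels[:-1, :], labels[1:, :])]
    for a, b in pairs:
        mask = a != b
        for u, v in zip(a[mask].tolist(), b[mask].tolist()):
            neighbors[u].add(v)
            neighbors[v].add(u)
    return neighbors


def update_order(confidence):
    """Return superpixel ids sorted from most to least confident.

    confidence: dict mapping superpixel id to its initial confidence.
    """
    return sorted(confidence, key=confidence.get, reverse=True)
```

With this order, a node updated later can draw on the already-updated states of its more confident neighbors.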

ADVANTAGES

  1. Constructed on superpixels generated by oversegmentation, the Graph LSTM is more naturally aligned with the visual patterns in the image.
  2. Adaptively learning the forget gates with respect to different neighboring nodes when updating the hidden states of a certain node is beneficial to model various neighbor connections.

TITLE: Object Detection from Video Tubelets with Convolutional Neural Networks

AUTHOR: Kai Kang, Wanli Ouyang, Hongsheng Li, Xiaogang Wang

ASSOCIATION: The Chinese University of Hong Kong

FROM: arXiv:1604.04053

CONTRIBUTIONS

  1. A complete multi-stage framework is proposed for object detection in videos.
  2. A special temporal convolutional neural network is proposed to incorporate temporal information into object detection from video.

METHOD

The main steps of the method are shown in the following figure.

  1. Image object proposal. Regions are generated in each frame by Selective Search and classified by an AlexNet over 200 categories, similar to R-CNN. Regions with scores lower than a threshold are removed, and the rest are kept as proposals.
  2. Object proposal scoring. The proposals are scored by a 30-category classifier derived from GoogLeNet, and the proposals with higher scores are kept.
  3. High-confidence proposal tracking. The proposals with higher scores are tracked, and overlapping proposals are suppressed using IoU. The resulting tracks are the tubelet proposals.
  4. Tubelet box perturbation and max-pooling. As the tracking result may drift, multiple regions are generated around each tubelet box. All the regions are sent to the CNN from step 2 and sorted by score, and the region with the highest score replaces the original box in the tubelet.
  5. Temporal convolution and re-scoring. A Temporal Convolutional Network (TCN) is proposed that takes 1-D sequential features, including detection scores, tracking scores and anchor offsets, and generates temporally dense predictions for every tubelet box. The tubelets with high detection scores are regarded as the detection result. However, the TCN is not well explained in this work.
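The IoU-based suppression in step 3 can be sketched like this. It is a generic greedy suppression, not necessarily the exact procedure of the paper; the 0.5 threshold, the box format and the function names are assumptions.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def suppress(boxes, scores, threshold=0.5):
    """Greedy suppression: keep a box only if it does not overlap a kept, higher-scoring box.

    Returns the indices of the kept boxes, highest score first.
    The threshold value is an assumed default, not taken from the paper.
    """
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= threshold for j in keep):
            keep.append(i)
    return keep
```

For example, two heavily overlapping boxes on the same object collapse to the higher-scoring one, so the same object is not tracked twice.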

ADVANTAGES

  1. The TCN helps reduce the negative effect caused by the large variations of detection scores along the same track.

DISADVANTAGES

  1. Too many stages.
  2. Too many CNN operations.