
TITLE: Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

AUTHOR: Kendall, Alex and Badrinarayanan, Vijay and Cipolla, Roberto

FROM: arXiv:1511.02680

CONTRIBUTIONS

  1. Extending deep convolutional encoder-decoder architectures to Bayesian convolutional neural networks that produce a probabilistic output.
  2. Bayesian SegNet outputs a measure of model uncertainty, which could be used to provide segmentation confidence.

METHOD

The first half of the network is a standard convolutional neural network (VGG-16 in this work). The second half mirrors the first, applying upsampling layers to restore the output to the input resolution. The network is trained end-to-end. The probabilistic output is obtained from Monte Carlo samples of the model, with dropout kept active at test time.

SOME DETAILS

  1. For each pixel, a softmax classifier predicts the class label.
  2. At test time, multiple forward passes simulate Monte Carlo sampling: the mean of the softmax outputs gives the class label, and their variance gives the uncertainty.
  3. Model uncertainty is high at: 1) boundaries between classes, 2) objects that are hard to identify because of occlusion or distance, and 3) visually ambiguous classes, such as dogs vs. cats or chairs vs. tables.
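
The Monte Carlo dropout procedure above can be sketched as follows. This is a minimal NumPy toy, not the authors' SegNet code: the linear "network", the weight shapes, and the dropout rate are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def forward_with_dropout(x, W, p=0.5):
    # Dropout stays ACTIVE at test time: each pass samples a thinned network.
    mask = rng.random(W.shape) > p
    return softmax(x @ (W * mask) / (1 - p))

x = np.array([0.8, -0.3, 1.2])                 # toy features for one pixel
W = rng.normal(size=(3, 4))                    # toy weights, 4 classes
probs = np.stack([forward_with_dropout(x, W) for _ in range(50)])

mean_prob = probs.mean(axis=0)                 # mean softmax -> predicted label
label = int(mean_prob.argmax())
uncertainty = float(probs.var(axis=0).mean())  # variance across samples -> uncertainty
```

The key point is that dropout is not disabled at inference, so each forward pass draws a different thinned model; averaging the sampled softmax outputs gives the prediction and their spread gives the uncertainty.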

ADVANTAGES

  1. Monte Carlo sampling with dropout outperforms weight averaging after approximately 6 samples.
  2. Having no fully connected layers makes the network easier to train.
  3. The network can run in real time when the Monte Carlo samples are computed in parallel.
  4. No sliding-window convolution is needed, which contributes to its speed.

OTHERS

  1. Applying Bayesian weights to the lower layers does not improve performance, because low-level features are consistent across the distribution of models.
  2. Higher-level features, such as shape and contextual relationships, are modeled more effectively with Bayesian weights.
  3. At training time, dropout samples from a number of thinned networks with reduced width; at test time, standard dropout approximates the effect of averaging the predictions of all these thinned networks by using the weights of the unthinned network.

The online demo and code can be found here and here.

TITLE: Two-Stream Convolutional Networks for Action Recognition in Videos

AUTHOR: Simonyan, Karen and Zisserman, Andrew

FROM: NIPS2014

CONTRIBUTIONS

  1. A two-stream ConvNet combines spatial and temporal networks.
  2. A ConvNet trained on multi-frame dense optical flow achieves good performance despite the small training dataset.
  3. Multi-task training procedure benefits performance on different datasets.

METHOD

Two-stream architecture convolutional network:

  1. Spatial stream ConvNet: takes a still frame as input and performs action recognition on this single frame.
  2. Temporal stream ConvNet: takes as input the 2L-channel optical flow/trajectory stack corresponding to the still frame and performs action recognition on this multi-channel input.
  3. The outputs of the two streams are concatenated into a feature vector on which an SVM classifier is trained to fuse them.

SOME DETAILS

  1. Mean flow subtraction is used to eliminate displacements caused by camera movement.
  2. At test time, 25 frames (time points) are sampled and their corresponding 2L-channel stacks are fed to the network; in the spatial domain, 5 patches and their flips are extracted.
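
The 2L-channel stacking and mean flow subtraction can be sketched as below. The channel layout (interleaved dx/dy per frame), the random "flow", and the shapes are assumptions for illustration, not the paper's exact pipeline.

```python
import numpy as np

L_frames, H, W = 10, 224, 224
rng = np.random.default_rng(0)
flow = rng.normal(size=(L_frames, H, W, 2))   # (t, y, x, [dx, dy])

# Stack dx and dy of L consecutive frames along the channel axis -> (H, W, 2L).
stack = flow.transpose(1, 2, 0, 3).reshape(H, W, 2 * L_frames)

# Mean flow subtraction: remove each channel's mean to compensate for the
# global displacement caused by camera motion.
stack -= stack.mean(axis=(0, 1), keepdims=True)
```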

ADVANTAGES

  1. Mimics the two-stream structure of the human visual cortex.
  2. Competitive with state-of-the-art representations despite the small training dataset.
  3. A CNN's convolution filters can generalize hand-crafted features.

DISADVANTAGES

  1. Cannot localize actions in either the spatial or the temporal domain.

TITLE: Joint Tracking and Segmentation of Multiple Targets

AUTHOR: Milan, Anton and Leal-Taixe, Laura and Schindler, Konrad and Reid, Ian

FROM: CVPR2015

CONTRIBUTIONS

  1. A new CRF model taking advantage of both high-level detector responses and low-level superpixel information
  2. Fully automated segmentation and tracking of an unknown number of targets.
  3. A complete state representation at every time step, which can handle occlusions.

METHOD

  1. Generate an overcomplete set of trajectory hypotheses.
  2. Solve data association problem by optimizing an objective function, which is a multi-label conditional random field (CRF).

SOME DETAILS

The goal is to find the most probable labeling \(f^*\) for all nodes given the observations, which is equivalent to minimizing the CRF energy

\[f^* = \operatorname*{arg\,min}_{f} E(f),\]

in which

\[E(f) = \sum_{v \in \mathcal{V}^D} \phi^{\mathcal{V}^D}(v) + \sum_{v \in \mathcal{V}^S} \phi^{\mathcal{V}^S}(v) + \sum_{(v,w) \in \mathcal{E}} \psi(v,w) + \sum_{\lambda} \psi^{\lambda},\]

where \(\phi^{\mathcal{V}^S}\) and \(\phi^{\mathcal{V}^D}\) are unary potential functions for superpixel and detection nodes, respectively, measuring the cost of a detection node in \(\mathcal{V}^D\) or a superpixel node in \(\mathcal{V}^S\) belonging to a certain target; \(\psi(v,w)\) are pairwise potentials over edges between superpixels and detections, encoding spatial and temporal relations among superpixels as well as relations between superpixels and detections in the same frame; and \(\psi^{\lambda}\) is the trajectory cost, comprising constraints on height, shape, dynamics, persistence, image likelihood, and parsimony.
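
A toy evaluation of such an energy can illustrate the structure. All names, cost tables, and values below are illustrative assumptions, not the paper's implementation; the point is only that the total cost is a sum of unary, pairwise, and trajectory terms.

```python
import numpy as np

def crf_energy(labels, unary, pairwise_edges, traj_cost):
    """labels: per-node label array; unary: (num_nodes, num_labels) cost table;
    pairwise_edges: list of (v, w, cost_matrix); traj_cost: cost per used trajectory."""
    e = unary[np.arange(len(labels)), labels].sum()          # unary terms
    for v, w, cost in pairwise_edges:                        # pairwise terms
        e += cost[labels[v], labels[w]]
    e += sum(traj_cost[l] for l in set(labels.tolist()))     # trajectory terms
    return float(e)

unary = np.array([[0.1, 2.0], [1.5, 0.2], [0.3, 1.0]])
potts = np.array([[0.0, 1.0], [1.0, 0.0]])   # simple Potts-style pairwise cost
edges = [(0, 1, potts), (1, 2, potts)]
traj = {0: 0.5, 1: 0.5}
energy = crf_energy(np.array([0, 1, 0]), unary, edges, traj)
```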

ADVANTAGES

  1. Using pixel-level (superpixel) information in addition to detection results can handle partial occlusions, which leads to higher recall.
  2. Segments provide considerable information even when no reliable detection exists.
  3. Casting multi-target tracking as a graph model allows existing optimization algorithms to be exploited.

DISADVANTAGES

  1. Solving the CRF is slow, requiring 12 seconds per frame.
  2. Cannot handle ID switches across two adjacent temporal sliding windows.

OTHER

  1. Tracking-by-detection has proven to be the most successful strategy for the multi-target tracking problem.
  2. Noise, imprecise measurements, long-term occlusions, complicated dynamics, and target interactions all contribute to the problem's complexity.

TITLE: Learning to Segment Moving Objects in Videos

AUTHOR: Fragkiadaki, Katerina and Arbelaez, Pablo and Felsen, Panna and Malik, Jitendra

FROM: CVPR2015

CONTRIBUTIONS

  1. Moving object proposals from multiple segmentations on optical flow boundaries
  2. A moving objectness detector for ranking per frame segments and tube proposals
  3. A method of extending per frame segments into spatial-temporal tubes

METHOD

  1. Extract motion boundaries by optical flow
  2. Generate segment proposals from the motion boundaries, called MOPs (Moving Object Proposals)
  3. Rank the MOPs using a CNN based regressor
  4. Combine per frame MOPs to space-time tubes based on pixelwise trajectory clusters
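
Step 1 can be approximated with a simple gradient-magnitude detector on the flow field. This is a simplification for illustration (the paper's boundary extraction is more sophisticated); the threshold and the synthetic flow field are assumptions.

```python
import numpy as np

def motion_boundaries(flow, thresh=1.0):
    """flow: (H, W, 2) per-pixel displacement field.
    Marks pixels where the flow changes sharply, i.e. motion boundaries."""
    gy_x, gx_x = np.gradient(flow[..., 0])   # gradients of horizontal flow
    gy_y, gx_y = np.gradient(flow[..., 1])   # gradients of vertical flow
    mag = np.sqrt(gx_x**2 + gy_x**2 + gx_y**2 + gy_y**2)
    return mag > thresh

flow = np.zeros((8, 8, 2))
flow[:, 4:, 0] = 5.0          # right half translates; a vertical motion edge
bnd = motion_boundaries(flow)
```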

ADVANTAGES

  1. Optical flow reduces the noise caused by an object's internal texture and is well suited to detecting rigid objects.
  2. Trajectory tracking can handle objects that are temporarily static.
  3. Segments are effective for tackling frequent occlusions/dis-occlusions.

DISADVANTAGES

  1. Too slow: every stage takes seconds to process, which is unsuitable for practical applications.
  2. Several independent methods are used to detect objects, so little computation is shared.
  3. The power of CNNs has not been fully exploited.

OTHER

  1. R-CNN has excellent performance on object detection in static images.
  2. Sliding-window methods need to evaluate too many patches.
  3. MRF-based methods neglect relations among nearby pixels and cannot separate adjacent instances.
  4. Methods for object detection in video fall into two categories: i) top-down tracking and ii) bottom-up segmentation.

I watched Mission: Impossible – Rogue Nation quite a while ago and had long meant to write something about it; today I finally got around to it. Although many people call it a safe, by-the-numbers film, and some even sneer that it offers nothing new, I think it deserves praise: both the script and the performances are solid, and it is an honest piece of work.

In terms of plot, everyone knows the "impossible mission" will be completed in the end; any viewer can be certain of that before the film even begins. But in rigor, setup, pacing, and character relationships, this installment is far better than the second and third. The second was little more than John Woo's violent aesthetics selling exploding blood packs and posing; the third leaned too heavily on Ethan Hunt's romance and drifted slightly away from the impossible mission itself. The fourth was essentially a reboot of the IMF, returning to impossible missions, and the fifth is a complete return to form. At the start of the first film the IMF faces near annihilation; likewise, in the fifth the IMF faces disbandment, and my reaction was: "Ah, finally, back to work!" Another interesting point is the Chinese title 《碟中谍》: the English title translates literally as "impossible mission", but because the mission in the first film was delivered on a disc, the mainland translation became 《碟中谍》 ("spy within a disc") and has been used ever since. In this installment the mission is delivered on a USB drive, which is in its own way a return of the "disc". Visually the fifth cannot match the fourth, especially the breathtaking wide shot of the Burj Khalifa, but its story is much tighter and more coherent, and it manages the audience's emotions well: the tension of the missions and Simon Pegg's interspersed comic relief are balanced just right.

Two things in particular stayed with me: an aging Tom Cruise, and the film's score.

An aging Tom Cruise

Mission: Impossible may not quite be a one-man show for Tom Cruise, but without him this series simply could not be called Mission: Impossible. If an actor has even one film in his life that stands as his signature like this, his acting career counts as an extraordinary success.

In this fifth film, Cruise, in his fifties, can still go anywhere and do anything. The opening scene of Ethan clinging to the plane was reportedly shot for real, without any CGI, and for the later diving scene Cruise actually trained to hold his breath underwater; his devotion to this film speaks for itself. Even so, the years have left their mark on him, and what is most admirable is that the film does not dodge this but accepts the fact that he has aged. His personal heroism partly collapses here: the film revisits many daring set pieces from earlier installments, but unlike before, Ethan is no longer omnipotent. In the underwater scene he in fact fails, truly rendering a mission impossible; in the end it is Rebecca Ferguson's character who completes the task and rescues him. And unlike the all-capable Ethan of the past, when a teammate fails him in this film he is helpless: in the opening aerial scene, when Simon Pegg cannot open the cargo door, Ethan can only hang onto the plane, whereas in the fourth film, climbing the Burj Khalifa with the same unreliable teammate, he still finished the mission more or less unscathed. Even small details hint at his age: Ethan can no longer vault over a car stylishly, but simply crashes onto it. Compare the fresh-faced young man of yesterday with today's middle-aged man: on the left, the first Mission: Impossible; on the right, Rogue Nation.

"Nessun Dorma", from beginning to end

One sequence of the film takes place during a performance of the opera Turandot, and one cannot mention Turandot without "Nessun Dorma", the aria that all but defines the opera. In the film this music is not merely the score for that one scene but a theme running through the whole picture, marking the emotional thread between the two leads. In Turandot's plot, the princess remains cold and capricious almost until the end, and only at last is moved by the prince's devotion and marries him. "Nessun Dorma" carries Turandot's story into the film and mirrors the relationship between the hero and heroine: when Rebecca Ferguson's spy confesses to Ethan Hunt in the café, and I thought back to her initial coldness, her using him, and the later change in her feelings, I sighed right there in the cinema: this is Turandot! With "Nessun Dorma" swelling at just the right moment, it was hard not to be moved. A good film should contain things worth pondering like this; when asking why a film uses a particular piece of music yields something more interesting, that is what keeps a film from being superficial and forgotten. Finally, there is the soundtrack cue "Finale and Curtain Call", whose first minute and a half in particular weaves "Nessun Dorma" together with the classic Mission: Impossible theme.

Just before the military-parade holiday I suddenly felt an inexplicable urge to replay The Legend of Sword and Fairy (《仙剑奇侠传》), so I spent two days of the holiday revisiting this classic. I later read online that this year marks the 20th anniversary of its release, so perhaps it was a kind of calling. This game truly is something worth waxing nostalgic about.

Sword and Fairy was the first computer game I ever played. In fifth grade my family bought its first computer: a Pentium III with only 256 MB of RAM, running Windows 98, with dial-up internet over the phone line. Online games were not yet popular then. For one thing, dial-up was slow: even the Sohu portal that was popular at the time took forever to load. For another, we could not afford the connection fees, which were billed by the hour; I no longer remember the rates, but a few hours online were enough to earn me days of nagging from my parents. We had roughly thirty hours of internet a month, about an hour a day, so staying online all day was out of the question. With a computer at home, apart from practicing typing or drawing (I must mention that as a kid I drew the Titanic in the paint program) and occasionally going online, games were a must, or the machine would truly have been wasted. With no online games, we kids played single-player titles: Red Alert, Age of Empires, Dune 2000, and so on. It really was a golden age of single-player games. Oddly enough, none of those games I later enjoyed so much were the first to be installed on my computer. I remember a summer weekend: my parents were at work, I was bored at home, and I went out for a walk. In a row of storefronts not far from home I found a software shop. I have long forgotten its name, and the shop has long since changed hands, but I still remember exactly where it stood. Through the floor-to-ceiling glass I could see the colorful game boxes inside, and a child is easily drawn in. I walked in and browsed around. I do not know what it was about Sword and Fairy that attracted me and made me spend my money on that one game out of a whole wall of titles, but I have always remembered the pink box with the hero and heroine and the drifting peach-blossom petals painted on it.

Li Xiaoyao

A dashing, carefree youth so fair, seeking medicine on the isle, he found an immortal bond.

A thousand miles of hardship he never declined; sword in hand he roams the world for the one he loves.

The story of Sword and Fairy may have been many players' first lesson in love. Although Li Xiaoyao has tangled feelings with all three women, the story never calls to mind today's fashionable harem dramas; it is genuine love, with responsibility, companionship, and care. But happy days are always short, and even beautiful love cannot withstand fate's tricks: whether Xiaoyao and Ling'er setting out for the Miao lands, or the vow with Yue'ru to taste every delicacy and see every sight under heaven, everything becomes memory amid one upheaval after another, and the more beautiful each journey, the more it stings once it is all in the past. Perhaps, when Xiaoyao finally loses Ling'er, bids farewell to Anu, and in the ice and snow sees Yue'ru, not revived yet deathless, holding Yi'ru, his heart is already beyond words with grief. Perhaps from that moment on, it would take him a long time to gather himself, step out of the shadow, and set out once more on the road to becoming a hero.

Zhao Ling'er

On the fairy isle, a world apart, a lone lotus in the pool sleeps beside the moon.

One morning, wind and rain cast her onto the water; may the one who finds her cherish and pity her.

Lin Yue'ru

A willful young lady of noble birth, her heart stirred by a contest of arms for her hand.

She longs to stay forever by her beloved's side, roaming the world together, a pair of wandering heroes.

Anu

A flower-like Miao girl, a clever little sprite, meeting her gentleman she tastes love for the first time.

The falling blossom longs to tie the knot; keeping the moon company, she would become a star.

Of the three women, Ling'er is gentle and kind, Anu is a mischievous sprite, but my favorite is Yue'ru. Yue'ru is independent and strong, decisive and responsible, not the kind of pampered young lady who cannot take care of herself; at critical moments she can help Xiaoyao, and she even decides her own marriage. At times she is a little willful and headstrong: breaking up her servants' romances, and pulling no punches in the martial contest for her own hand. Moreover, the days spent with Yue'ru are Xiaoyao's happiest stretch, with every day perhaps full of novelty. That freshness is what attracts me most about her: had those sad events never happened, the two of them could not have sat still for a moment, always stirring up some trouble. If they crossed over into the present day, who knows how many spur-of-the-moment trips they would take: buying a ticket on the next flight out, heading to an unfamiliar city, trying a banquet of insects, going bungee jumping...

Sword and Fairy ushered in an era of wuxia games; I remember a whole batch that followed, such as 《剑侠情缘》 and 《绝代双骄》. By today's standards its graphics are primitive, a world stacked out of pixel blocks, and its music is minimal chiptune, but whenever "Die Lian" (Butterfly Love) plays, watching that "new story" and those "old memories", I know no game can surpass the place Sword and Fairy holds in my heart. Even the line "Victory and defeat are routine for a warrior; please try again, hero" has worked its way into everyday life. A while ago people on Zhihu were discussing why Xiaoyao and Ling'er, traveling to Shu, go from Yuhang through Suzhou and even pass the capital Chang'an; I actually studied the top-voted answer carefully against a map. Every place name brought back scenes from the story, a feeling beyond words. Many sequels were released later, and I have played some of them, but I could never recapture the feeling of playing the first one. Perhaps that is what nostalgia is.


Prerequisite

  1. install NVIDIA GTX970M driver
  2. install CUDA 7.0 Toolkit

Please refer to my previous blog post, Installation of NVIDIA GPU Driver and CUDA Toolkit.

Install OpenBLAS

  1. download the source code from the OpenBLAS official website and extract the archive
  2. (optional) install gfortran: sudo apt-get install gfortran
  3. change to the extracted directory and compile: make FC=gfortran
  4. install: make PREFIX=/your/path install
  5. add the paths to the environment, PATH=/your/path/to/openblas/include:$PATH and LD_LIBRARY_PATH=/your/path/to/openblas/lib:$LD_LIBRARY_PATH, and export them.
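
The steps above can be sketched as a script. The archive name and the install prefix /your/path are placeholders; adjust them to your setup. This is an environment-configuration fragment, not a tested installer.

```shell
# Build and install OpenBLAS from source (names and prefix are placeholders).
sudo apt-get install gfortran            # optional Fortran compiler
tar xzf OpenBLAS-*.tar.gz && cd OpenBLAS-*/
make FC=gfortran                         # compile with gfortran
make PREFIX=/your/path install           # install to the chosen prefix

# Expose the headers and libraries to later builds.
export PATH=/your/path/include:$PATH
export LD_LIBRARY_PATH=/your/path/lib:$LD_LIBRARY_PATH
```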

Install Anaconda

  1. download the script from http://continuum.io/downloads
  2. make the script executable: sudo chmod +x Anaconda*.sh
  3. run the installer: bash Anaconda*.sh
  4. add to ~/.bashrc:
LD_LIBRARY_PATH=your_anaconda_path/lib:$LD_LIBRARY_PATH
export LD_LIBRARY_PATH

NEVER put this in /etc !!! Otherwise, one may be unable to get into the GUI.

  5. fix the HDF5 library version by creating symlinks:
cd /usr/lib/x86_64-linux-gnu
sudo ln -s libhdf5.so.7 libhdf5.so.10
sudo ln -s libhdf5_hl.so.7 libhdf5_hl.so.10
sudo ldconfig

Install OpenCV

One can conveniently install OpenCV by running a shell script from a GitHub repository.

  1. download the script. For me, I use OpenCV 2.4.10.
  2. make the script executable: sudo chmod +x opencv2_4_10.sh
  3. run the script: sudo ./opencv2_4_10.sh. Note that one may need to modify the cmake settings, such as disabling QT.

Install a Set of Dependencies

Following the guidelines in Caffe's documentation, the dependencies can be installed with the command sudo apt-get install libprotobuf-dev libleveldb-dev libsnappy-dev libopencv-dev libboost-all-dev libhdf5-serial-dev libgflags-dev libgoogle-glog-dev liblmdb-dev protobuf-compiler

Compile Caffe

  1. get Caffe from GitHub: git clone https://github.com/BVLC/caffe.git
  2. edit Makefile.config to set the correct paths. First create it with cp Makefile.config.example Makefile.config, then modify several paths. For me, I set BLAS to openblas with the paths /opt/OpenBLAS/include and /opt/OpenBLAS/lib, where I installed OpenBLAS, and pointed the Python settings at Anaconda.
  3. compile Caffe: make -j and make pycaffe
  4. At this point Caffe should compile without any problem. However, when running examples such as MNIST, some libraries might be missing. My solution is to add their directories to the system library cache: for example, create a file called cuda.conf in /etc/ld.so.conf.d/ and add the path "/usr/local/cuda-7.0/lib64" to it.
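
The library-cache fix in step 4 can be written out as the following configuration fragment (the CUDA path is the one used in this post; adjust it for other CUDA versions):

```shell
# Register the CUDA library directory with the dynamic linker so that
# Caffe examples can find libcudart and friends at run time.
echo "/usr/local/cuda-7.0/lib64" | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
```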