Joshua's Blog

A Spring Trip to Nanjing

Posted on 2016-02-28 Edited on 2026-03-14 In Life Discovery

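I am reminded of an advertising slogan from Shunyi, Beijing: "Not far from the city, closer to nature." I think the line suits Nanjing even better. Inside the city walls is the bustling world of markets and streets; outside them is a secluded paradise of hills and streams.

You can see Nanjing's grace in the very look of the city. Beijing, by comparison, carries a strictly hierarchical imperial air: everything is in order, the imperial city sits right in the center, the streets are laid out like a chessboard, and the walls run dead straight. Nanjing has a more reserved and gentle temperament. The whole city unfolds along the river and the hills, white walls and dark tiles are scattered gracefully along the Qinhuai River, the imperial city lies askew off to one side, and the city wall winds and curves, embracing the whole city in a graceful line. Perhaps the look of a city quietly shapes people's state of mind, the tone of their speech, and the way they do things.

The bustle inside Nanjing shows in its thriving commerce. The taverns and restaurants along the Qinhuai River are always crowded and busy. You may not notice it much in the daytime, but as soon as dusk falls and the lamps along the river are lit, together with their reflections in the water, they make the clinking of cups feel all the livelier.

Outside the city it is a different flavor: the broad expanse of Xuanwu Lake and Purple Mountain (紫金山), pure natural scenery. Who would have thought that the mist of great rivers and lakes and the grace of wooded hills lie just beyond the city wall? After a bowl of duck blood and vermicelli soup by the Qinhuai River, why not take a leisurely stroll at the foot of Purple Mountain. Not far outside the city you can enjoy the fireflies of Linggu, the plum blossoms of Xiaoling, the sunset over Zixia Lake...

Stay in Nanjing long enough and you will not want to leave. You can enjoy both the city and nature; the whole city is so leisurely, living at an unhurried pace, and anywhere you want to go is only twenty or thirty minutes away. Such an easy life. Having said all this, perhaps I like Nanjing so much because once I return to Beijing I have to face reality. Perhaps Nanjing is only a coincidence, and the city could just as well have been Jinan, Yangzhou, or Wuxi.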

Some Random Thoughts on the High-Speed Train to Nanjing

Posted on 2016-02-26 Edited on 2026-03-14 In Life Discovery

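Just writing something to pass the time on the train.

The train left at 10:05 in the morning, and it is now 10:37. In this short half hour, watching the city gradually disappear outside the window, my mind has wandered through quite a few things.

The first is the weather. When Beijingers talk about the weather these days, they probably mean air quality specifically. After all, there is air conditioning in summer and heating in winter, so no one gets too hot or too cold; it is only the frightening smog that leaves you nowhere to hide. Today's air quality is not exactly top grade, but it is pretty good: at least the sun is bright and the sky is a pale blue. I am sitting by the window, so at least I am in the mood to type a few words.

The second is writing. To be honest, I am not someone who likes to write. In school, writing essays was probably my biggest headache. My essay scores in both the high school and college entrance exams were very low, and whenever I think of it I say how things would have been if only I had scored three or five more points on the essay. Now that I work, what annoys me most is writing patents. Writing a patent is not like writing an essay to express feelings, but the thought of writing dozens of pages makes me scratch my head. I usually put it off as long as I can, until the burst of energy set off by the deadline finally gets it done.

Lately, though, I have been thinking that I should write more, and the motivation comes from several places. Off the top of my head: I have a personal website; I recently came across an interesting site called Jianshu (简书); and not long ago I took an open course on English writing. Let me go through them one by one. The personal website is really just something I built for fun. It gets hardly any traffic; only a few close friends may take a look. It holds some technical blog posts and records of daily life, and it has not been updated for a long time, so I ought to write something to keep up appearances. Jianshu is actually quite an artsy site, full of reflections on life posted by people who love writing. Reading other people's words, I always feel I should pick up the pen more and record my life; otherwise, won't the feelings of the moment gradually fade with time? Would such a life leave nothing behind, not even a thread to follow when I want to look back? I took the English writing course because I was preparing for the IELTS exam at the time. The first assignment was a short essay on the theme "Myself as a writer". I had never looked at myself from this angle, and through the assignment I was surprised to find that I had been a writer all along. Every winter and summer vacation in junior high we were required to write a diary entry every day, and when I had an experience or an insight, what I wrote always received positive comments from the teacher. In high school I kept a diary on my own, and looking at those three big hardcover notebooks now, there are plenty of growing pains in them. In college I wrote work plans and summaries for the student union, and during my master's I published papers. I have actually written quite a lot, so now I think perhaps it is not that I cannot write.

Writing is a kind of output, so the third thing is input: reading. I have always believed that without input there is no output; this counts as one of my life lessons. When dealing with strangers, I am the kind of person who struggles for topics. Only when someone else starts a topic, after they have told their story, do I think of my own. It is the same at work: when I have taken in information from others, I am more likely to offer constructive suggestions. So I know I am someone who has no output without input, and even with input, how much output comes is hard to say. As Dr. Zhang puts it: "You Capricorns, three kicks won't get a fart out of you." So wanting to write more is also a roundabout way of pushing myself to read more.

I used to write some technical blog posts. The plan was to write a reading note for every paper I read. It was a nice plan, but I did not stick to it well. One reason is that I did not read some papers deeply enough and often could not write anything about them. Another is that sometimes work was too busy and I had little time for reading papers, or simply did not read any; after a while it was hard to get back into a reading state, and with one delay after another the habit lapsed. Besides work-related reading, I should also read more in other areas. Recently, reading some things on Jianshu, I realized how weak my grounding in the humanities is: I have read few of the classics, and anything in the humanities or social sciences makes me sleepy. It is time I made myself think more about philosophical questions.

I mentioned taking the IELTS just now. At the time I had just started working, a transition from student to professional. There was no especially great pressure, but I always felt I should not relax too much and had to stay competitive. So I decided to go abroad for a PhD. It was a hard decision: the cost in time and money is not small, and how things would develop after graduating and coming back is unknown. To keep myself from backing out, I signed up for an IELTS training class while my enthusiasm was still hot. My IELTS result was quite good, an overall 7.5. It has been more than a year since I decided to apply, and all the work has been set in motion, but the results are rather poor. I sent inquiry emails to professors and few replied; I applied to several schools and the rejection letters have been arriving one after another lately. The outcome is not very encouraging, but I am content with the process: giving up every weekend to attend class, putting effort into reading papers and writing a research proposal. Whatever happens, at least my English has improved. Besides, even without studying abroad I can live a life of my own, as long as I do not waste my time.

That is enough rambling for today. From 10:37 to 11:37, exactly one hour.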

How to read and understand a scientific paper: a guide for non-scientists

Posted on 2016-02-05 Edited on 2026-03-14 In Life Discovery

This article is partially quoted from a blog post by Jennifer Raff, Assistant Professor of Physical Anthropology.

What constitutes enough proof? Obviously everyone has a different answer to that question. But to form a truly educated opinion on a scientific subject, you need to become familiar with current research in that field. And to do that, you have to read the “primary research literature” (often just called “the literature”). You might have tried to read scientific papers before and been frustrated by the dense, stilted writing and the unfamiliar jargon. I remember feeling this way! Reading and understanding research papers is a skill which every single doctor and scientist has had to learn during graduate school. You can learn it too, but like any skill it takes patience and practice.

I want to help people become more scientifically literate, so I wrote this guide for how a layperson can approach reading and understanding a scientific research paper. It’s appropriate for someone who has no background whatsoever in science or medicine, and based on the assumption that he or she is doing this for the purpose of getting a basic understanding of a paper and deciding whether or not it’s a reputable study.

The type of scientific paper I’m discussing here is referred to as a primary research article. It’s a peer-reviewed report of new research on a specific question (or questions). Another useful type of publication is a review article. Review articles are also peer-reviewed, and don’t present new information, but summarize multiple primary research articles, to give a sense of the consensus, debates, and unanswered questions within a field. (I’m not going to say much more about them here, but be cautious about which review articles you read. Remember that they are only a snapshot of the research at the time they are published. A review article on, say, genome-wide association studies from 2001 is not going to be very informative in 2013. So much research has been done in the intervening years that the field has changed considerably).

Before you begin: some general advice
Reading a scientific paper is a completely different process than reading an article about science in a blog or newspaper. Not only do you read the sections in a different order than they’re presented, but you also have to take notes, read it multiple times, and probably go look up other papers for some of the details. Reading a single paper may take you a very long time at first. Be patient with yourself. The process will go much faster as you gain experience.

Most primary research papers will be divided into the following sections: Abstract, Introduction, Methods, Results, and Conclusions/Interpretations/Discussion. The order will depend on which journal it’s published in. Some journals have additional files (called Supplementary Online Information) which contain important details of the research, but are published online instead of in the article itself (make sure you don’t skip these files).

Before you begin reading, take note of the authors and their institutional affiliations. Some institutions (e.g. University of Texas) are well-respected; others (e.g. the Discovery Institute) may appear to be legitimate research institutions but are actually agenda-driven. Tip: google “Discovery Institute” to see why you don’t want to use it as a scientific authority on evolutionary theory.

Also take note of the journal in which it’s published. Reputable (biomedical) journals will be indexed by Pubmed. [EDIT: Several people have reminded me that non-biomedical journals won’t be on Pubmed, and they’re absolutely correct! (thanks for catching that, I apologize for being sloppy here). Check out Web of Science for a more complete index of science journals. And please feel free to share other resources in the comments!] Beware of questionable journals.

As you read, write down every single word that you don’t understand. You’re going to have to look them all up (yes, every one. I know it’s a total pain. But you won’t understand the paper if you don’t understand the vocabulary. Scientific words have extremely precise meanings).

Step-by-step instructions for reading a primary research article
1. Begin by reading the introduction, not the abstract.
The abstract is that dense first paragraph at the very beginning of a paper. In fact, that’s often the only part of a paper that many non-scientists read when they’re trying to build a scientific argument. (This is a terrible practice; don’t do it.) When I’m choosing papers to read, I decide what’s relevant to my interests based on a combination of the title and abstract. But when I’ve got a collection of papers assembled for deep reading, I always read the abstract last. I do this because abstracts contain a succinct summary of the entire paper, and I’m concerned about inadvertently becoming biased by the authors’ interpretation of the results.

2. Identify the BIG QUESTION.
Not “What is this paper about”, but “What problem is this entire field trying to solve?”

This helps you focus on why this research is being done. Look closely for evidence of agenda-motivated research.

3. Summarize the background in five sentences or less.
Here are some questions to guide you:

What work has been done before in this field to answer the BIG QUESTION? What are the limitations of that work? What, according to the authors, needs to be done next?

The five sentences part is a little arbitrary, but it forces you to be concise and really think about the context of this research. You need to be able to explain why this research has been done in order to understand it.

4. Identify the SPECIFIC QUESTION(S)
What exactly are the authors trying to answer with their research? There may be multiple questions, or just one. Write them down. If it’s the kind of research that tests one or more null hypotheses, identify it/them.

5. Identify the approach
What are the authors going to do to answer the SPECIFIC QUESTION(S)?

6. Now read the methods section. Draw a diagram for each experiment, showing exactly what the authors did.
I mean literally draw it. Include as much detail as you need to fully understand the work.

You don’t need to understand the methods in enough detail to replicate the experiment—that’s something reviewers have to do—but you’re not ready to move on to the results until you can explain the basics of the methods to someone else.

7. Read the results section. Write one or more paragraphs to summarize the results for each experiment, each figure, and each table. Don’t yet try to decide what the results mean, just write down what they are.
You’ll find that, particularly in good papers, the majority of the results are summarized in the figures and tables. Pay careful attention to them! You may also need to go to the Supplementary Online Information file to find some of the results.

It is at this point where difficulties can arise if statistical tests are employed in the paper and you don’t have enough of a background to understand them. I STRONGLY advise you to become familiar with them.

THINGS TO PAY ATTENTION TO IN THE RESULTS SECTION:

Any time the words “significant” or “non-significant” are used. These have precise statistical meanings. Read more about this here.

If there are graphs, do they have error bars on them? For certain types of studies, a lack of confidence intervals is a major red flag.

The sample size. Has the study been conducted on 10, or 10,000 people? (For some research purposes, a sample size of 10 is sufficient, but for most studies larger is better).
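To make the sample-size point concrete, here is a small Python sketch (an illustration added to these notes, not part of the quoted guide). It draws samples of 10 and 10,000 values from a made-up population and prints a rough 95% confidence interval for the mean. The population parameters and the normal approximation (`z = 1.96`, which is crude for very small samples) are assumptions chosen for simplicity.

```python
import math
import random
import statistics

def mean_with_ci(sample, z=1.96):
    """Return (mean, half_width) of an approximate 95% confidence interval."""
    n = len(sample)
    mean = statistics.fmean(sample)
    # Standard error of the mean: sample standard deviation / sqrt(n).
    sem = statistics.stdev(sample) / math.sqrt(n)
    return mean, z * sem

random.seed(0)
# Hypothetical population: 100,000 measurements with mean 100 and sd 15.
population = [random.gauss(100.0, 15.0) for _ in range(100_000)]

for n in (10, 10_000):
    mean, half = mean_with_ci(random.sample(population, n))
    print(f"n={n:>6}: mean = {mean:.1f} +/- {half:.1f}")
```

The interval for the larger sample should come out far narrower, which is the intuition behind "for most studies larger is better".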

8. Do the results answer the SPECIFIC QUESTION(S)? What do you think they mean?
Don’t move on until you have thought about this. It’s okay to change your mind in light of the authors’ interpretation—in fact you probably will if you’re still a beginner at this kind of analysis—but it’s a really good habit to start forming your own interpretations before you read those of others.

9. Read the conclusion/discussion/interpretation section.
What do the authors think the results mean? Do you agree with them? Can you come up with any alternative way of interpreting them? Do the authors identify any weaknesses in their own study? Do you see any that the authors missed? (Don’t assume they’re infallible!) What do they propose to do as a next step? Do you agree with that?

10. Now, go back to the beginning and read the abstract.
Does it match what the authors said in the paper? Does it fit with your interpretation of the paper?

11. FINAL STEP: (Don’t neglect doing this) What do other researchers say about this paper?
Who are the (acknowledged or self-proclaimed) experts in this particular field? Do they have criticisms of the study that you haven’t thought of, or do they generally support it?

Here’s a place where I do recommend you use google! But do it last, so you are better prepared to think critically about what other people say.

(12. This step may be optional for you, depending on why you’re reading a particular paper. But for me, it’s critical! I go through the “Literature cited” section to see what other papers the authors cited. This allows me to better identify the important papers in a particular field, see if the authors cited my own papers (KIDDING!….mostly), and find sources of useful ideas or techniques.)

READING NOTE: Object Detection by Labelling Superpixels

Posted on 2015-12-03 Edited on 2026-03-14 In Computer Vision

TITLE: Object Detection by Labelling Superpixels

AUTHOR: Yan, Junjie and Yu, Yinan and Zhu, Xiangyu and Lei, Zhen and Li, Stan Z.

FROM: CVPR2015

CONTRIBUTIONS

Converts the object detection problem into a super-pixel labelling problem, which avoids false negatives caused by proposals and takes advantage of global context.
Constructs an energy function that considers appearance, spatial context and the number of labels.

METHOD

The image is partitioned into a set of super-pixels, denoted as $\mathcal{P}=\lbrace p_{1},p_{2},\ldots,p_{N}\rbrace$.
An energy function $E(\mathcal{L})$ scores a label configuration $\mathcal{L}=\lbrace l_{1},l_{2},\ldots,l_{N}\rbrace$, which assigns one label to each super-pixel.
The problem then becomes selecting the $\mathcal{L}$ that minimises $E(\mathcal{L})$.

SOME DETAILS

The energy function is defined as

$$ E(\mathcal{L})=\sum_{p_{i}\in\mathcal{P}}D(l_{i},p_{i})+\sum_{(p_{i},p_{j})\in\mathcal{N}}V(l_{i},l_{j},p_{i},p_{j})+C(\mathcal{L}) $$

where $D(l_{i},p_{i})$ is the data cost, which captures the appearance of $p_{i}$ and measures the cost of assigning it label $l_{i}$; $V(l_{i},l_{j},p_{i},p_{j})$ is the pairwise smooth cost in the local area $\mathcal{N}$; and $C(\mathcal{L})$ is the label cost, which encourages compact detections and penalises the number of labels.

Data Cost

Super-pixels usually do not carry enough semantic information, so the corresponding regions are classified and their costs are propagated to the super-pixels. In this work, RCNN is used to generate and classify semantic regions. The region set of $T$ elements is denoted as $\mathcal{R}=\lbrace r_{1},\ldots,r_{T}\rbrace$ and the classifier score of region $r_{t}$ is $s_{t}$; the scores are mapped into $(0,1)$ by

$$ D(l_{t},r_{t})= \begin{cases} \frac{1}{1+\mathit{exp}(-\alpha\cdot s_{t})}& \text{if }l_{t}>0 \\ \frac{\mathit{exp}(-\alpha\cdot s_{t})}{1+\mathit{exp}(-\alpha\cdot s_{t})}& \text{if }l_{t}=0 \\ \end{cases} $$

where $\alpha$ is set to 1.5 empirically. For each super-pixel the data cost is the weighted sum of T smallest costs,

$$ D(l_{i},p_{i})= \sum_{t=1}^{T}\omega_{d_{t}}\cdot D(l_{t}, R(p_{i})_{t}) $$

where $R(p_{i})_{t}$ is the region containing $p_{i}$ with the $t$-th smallest cost.
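A minimal NumPy sketch of the data cost, transcribing the two formulas above literally. The function names, the example weights $\omega_{d_{t}}$ and the choice to keep as many regions as there are weights are assumptions made for illustration; only $\alpha=1.5$ comes from the note.

```python
import numpy as np

ALPHA = 1.5  # value reported in the note

def region_cost(score, label, alpha=ALPHA):
    """Map classifier score(s) s_t into (0, 1) using the piecewise formula above."""
    e = np.exp(-alpha * np.asarray(score, dtype=float))
    # l_t > 0: 1 / (1 + exp(-alpha * s));  l_t = 0: exp(-alpha * s) / (1 + exp(-alpha * s))
    return 1.0 / (1.0 + e) if label > 0 else e / (1.0 + e)

def superpixel_cost(region_scores, label, weights, alpha=ALPHA):
    """Weighted sum of the smallest region costs for one super-pixel.

    region_scores: scores of the regions that contain the super-pixel.
    weights: hypothetical weights, one per region that is kept.
    """
    costs = np.sort(region_cost(region_scores, label, alpha))  # ascending
    k = min(len(weights), len(costs))
    return float(np.dot(np.asarray(weights[:k], dtype=float), costs[:k]))

# Example: a super-pixel covered by three regions, keeping the two smallest costs.
print(superpixel_cost([2.0, -0.5, 0.3], label=1, weights=[0.7, 0.3]))
```

Note that the two branches of `region_cost` always sum to 1 for the same score, so the object and background costs of a region are complementary.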

Smooth Cost

The smooth cost is introduced because 1) adjacent super-pixels often have the same label and 2) super-pixels with the same label should have similar appearance. It is measured by

$$ V(l_{i},l_{j},p_{i},p_{j})=\omega_{s_{l}}V_{l}(l_{i}, l_{j})+V_{a}(l_{i},l_{j},p_{i},p_{j}) $$

where $V_{l}$ is a boolean variable and is set to $1$ when $l_{i}=l_{j}$ and $(p_{i},p_{j})\in \mathcal{N}$. $V_{a}$ is defined as

$$ V_{a}(l_{i},l_{j},p_{i},p_{j})=\omega_{s_{c}}\Big(1-\sum_{q}\min(c_{i}^{q}, c_{j}^{q})\Big)+\omega_{s_{t}}\Big(1-\sum_{q}\min(t_{i}^{q}, t_{j}^{q})\Big) $$

where $c_{i}^{q}$ and $t_{i}^{q}$ are the values in the $q$-th bin of color and texture histogram of super-pixel $p_{i}$. In this work color histogram and SIFT histogram are calculated to describe color and texture information.
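The appearance term is one minus a histogram intersection, once for color and once for texture. A short sketch, assuming the histograms are L1-normalised (so the intersection lies in $[0,1]$) and using placeholder weights of 1.0; neither assumption is stated in the note.

```python
import numpy as np

def hist_intersection(h_i, h_j):
    """Sum over bins q of min(h_i[q], h_j[q])."""
    return float(np.minimum(h_i, h_j).sum())

def appearance_cost(c_i, c_j, t_i, t_j, w_color=1.0, w_texture=1.0):
    """V_a: grows as the color (c) and texture (t) histograms of two super-pixels differ."""
    return (w_color * (1.0 - hist_intersection(c_i, c_j))
            + w_texture * (1.0 - hist_intersection(t_i, t_j)))

same = np.array([0.5, 0.5, 0.0])
other = np.array([0.0, 0.0, 1.0])
print(appearance_cost(same, same, same, same))    # identical appearance
print(appearance_cost(same, other, same, other))  # completely different appearance
```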

Label Cost

The label cost encourages a smaller number of labels and is defined as

$$ C(\mathcal{L})=\sum_{i=1}^{K}\omega_{l_{i}}\cdot \delta(i, \mathcal{L}) $$

where $\delta(\cdot)$ is defined as

$$ \delta(i, \mathcal{L})=\begin{cases} 1& \text{if }i\in \mathcal{L} \\ 0& \text{otherwise} \\ \end{cases} $$
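Putting the three terms together, the sketch below evaluates $E(\mathcal{L})$ for a toy problem and finds the minimiser by brute force. This is only to illustrate what "select an $\mathcal{L}$ to minimise $E(\mathcal{L})$" means; the note does not say how the paper performs the minimisation, and exhaustive search is certainly not practical for real images. All numbers, the placeholder pairwise cost and the function names are made up.

```python
from itertools import product

def label_cost(labels, label_weights):
    """C(L): add the weight of every label that occurs at least once in the labelling."""
    # Labels without an entry (here the background label 0) are assumed to cost nothing.
    return sum(label_weights.get(label, 0.0) for label in set(labels))

def total_energy(labels, data_cost, neighbors, smooth_cost, label_weights):
    """E(L) = sum of data costs + sum of pairwise smooth costs + label cost."""
    unary = sum(data_cost[i][label] for i, label in enumerate(labels))
    pairwise = sum(smooth_cost(labels[i], labels[j], i, j) for i, j in neighbors)
    return unary + pairwise + label_cost(labels, label_weights)

def brute_force_minimise(num_labels, data_cost, neighbors, smooth_cost, label_weights):
    """Try every labelling; only feasible for a handful of super-pixels."""
    candidates = product(range(num_labels), repeat=len(data_cost))
    return min(candidates, key=lambda labels: total_energy(
        labels, data_cost, neighbors, smooth_cost, label_weights))

# Toy problem: 3 super-pixels in a chain, labels {0: background, 1: object}.
data_cost = [[0.1, 0.9], [0.6, 0.4], [0.9, 0.1]]  # data_cost[i][l] stands in for D(l, p_i)
neighbors = [(0, 1), (1, 2)]                      # index pairs standing in for N

def toy_smooth(l_i, l_j, i, j):
    """Placeholder pairwise cost, not the V defined above."""
    return 0.3 if l_i != l_j else 0.0

label_weights = {1: 0.2}

best = brute_force_minimise(2, data_cost, neighbors, toy_smooth, label_weights)
print(best, total_energy(best, data_cost, neighbors, toy_smooth, label_weights))
```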

ADVANTAGES

Super-pixels are compact and perceptually meaningful atomic regions for images.
Avoids false negatives caused by inappropriate proposals generated by algorithms such as Selective Search and BING.
The super-pixel based method is a trade-off between pixel-based and proposal-based algorithms, leading to accurate and fast results.

DISADVANTAGES

The CNN used in RCNN and the parameters in the energy function are learned separately.
The regions generated might not cover all the super-pixels.
Time consumption is high. It runs at 1 fps per 128 proposals on an NVIDIA Tesla K40 GPU. However, 128 proposals might not be enough.

READING NOTE: Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

Posted on 2015-11-15 Edited on 2026-03-14 In Computer Vision

TITLE: Pooling the Convolutional Layers in Deep ConvNets for Action Recognition

AUTHOR: Zhao, Shichao and Liu, Yanbin and Han, Yahong and Hong, Richang

FROM: arXiv:1511.02126

CONTRIBUTIONS

Proposes an efficient video representation framework based on VGGNet and Two-Stream ConvNets.
Trajectory pooling and line pooling are used together to extract features from convolutional layers.
A frame-diff layer is used to get local descriptors.

METHOD

Two successive frames are sent to a siamese VGGNet, and a frame-diff layer is used to extract spatial features.
Temporal features for a frame are computed with the optical-flow net of the Two-Stream ConvNet.
Features are extracted from the ConvNet feature maps along point trajectories or along lines, in a dense sampling manner.
The BoF method is used to generate the video representation.
The video is classified with an SVM classifier.
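The BoF step can be sketched as follows: each pooled local descriptor is assigned to its nearest codeword and the video is represented by the normalised histogram of assignments. The codebook here is a made-up array (in practice it would be learned, for example by k-means, which is an assumption since the note does not say), and the function name is hypothetical.

```python
import numpy as np

def bof_histogram(descriptors, codebook):
    """Encode local descriptors (n, d) as an L1-normalised histogram over codewords (k, d)."""
    # Squared Euclidean distance between every descriptor and every codeword: shape (n, k).
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)  # index of the closest codeword per descriptor
    hist = np.bincount(nearest, minlength=len(codebook)).astype(float)
    return hist / max(hist.sum(), 1.0)

codebook = np.array([[0.0, 0.0], [10.0, 10.0]])            # toy codebook with 2 words
descriptors = np.array([[0.0, 1.0], [1.0, 0.0], [9.0, 9.0]])
print(bof_histogram(descriptors, codebook))
```

The resulting fixed-length vector is what would be passed to the SVM, regardless of how many descriptors the video produced.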

ADVANTAGES

A deeper network is used to extract features, which are more discriminative.
Different from Two-Stream ConvNet, in this work spatial features are extracted on every frame, which would provide more information.

DISADVANTAGES

The two branches are trained independently. Joint training in a multi-task manner might help.

OTHERS

The difficulty of human action recognition is caused by some inherent characteristics of action videos such as intra-class variation, occlusions, view point changes, background noises, motion speed and actor differences.
Despite their good performance, Dense Trajectory based action recognition algorithms suffer from huge computation costs and large disk storage requirements.

READING NOTE: Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

Posted on 2015-11-10 Edited on 2026-03-14 In Computer Vision

TITLE: Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding

AUTHOR: Kendall, Alex and Badrinarayanan, Vijay and Cipolla, Roberto

FROM: arXiv:1511.02680

CONTRIBUTIONS

Extending deep convolutional encoder-decoder neural network architectures to Bayesian convolutional neural networks which can produce a probabilistic output.
Bayesian SegNet outputs a measure of model uncertainty, which could be used to provide segmentation confidence.

METHOD

The first half of the network is a traditional convolutional neural network (VGG-16 in this work). The second half is sort of a mirror of the first half, applying upsampling layers to recover the input resolution at the output. The network is trained end to end. The probabilistic output is obtained from Monte Carlo samples of the model with dropout at test time.

SOME DETAILS

For each pixel, a softmax classifier is utilized to predict class label.
At test time, multiple stochastic forward passes are run to simulate Monte Carlo sampling. The mean of the softmax outputs is used to predict the class label, and the variance is taken as the uncertainty.
Situations of high model uncertainty: 1) boundaries between different classes, 2) objects that are difficult to identify because of occlusion or distance and 3) easily confused classes such as dogs and cats, chairs and tables.
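The test-time procedure can be sketched with a toy model. Below, a single linear layer plus softmax stands in for the network (an assumption for brevity, not the SegNet architecture), dropout stays active at test time, and the per-pixel uncertainty is taken as the softmax variance averaged over classes (also an assumption; the note only says "variance").

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def mc_dropout_predict(features, weights, num_samples=20, drop_prob=0.5, rng=None):
    """features: (pixels, d); weights: (d, classes). Returns (labels, uncertainty) per pixel."""
    rng = np.random.default_rng() if rng is None else rng
    probs = []
    for _ in range(num_samples):
        # Keep each unit with probability 1 - drop_prob (dropout left on at test time).
        mask = rng.random(features.shape) >= drop_prob
        dropped = features * mask / (1.0 - drop_prob)  # inverted-dropout scaling
        probs.append(softmax(dropped @ weights))
    probs = np.stack(probs)                 # (samples, pixels, classes)
    mean = probs.mean(axis=0)               # Monte Carlo estimate of the class probabilities
    uncertainty = probs.var(axis=0).mean(axis=1)
    return mean.argmax(axis=1), uncertainty

rng = np.random.default_rng(0)
features = rng.normal(size=(4, 8))          # 4 hypothetical pixels with 8 features each
weights = rng.normal(size=(8, 3))           # 3 hypothetical classes
labels, uncertainty = mc_dropout_predict(features, weights, rng=rng)
print(labels, uncertainty)
```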

ADVANTAGES

Monte Carlo sampling with dropout performs better than weight averaging after approximately 6 samples.
Having no fully connected layers makes the network easier to train.
The network could run in real time when computing in parallel.
No sliding-window evaluation is needed, which contributes to its speed.

OTHERS

Applying Bayesian weights to lower layers does not result in a better performance, because low level features are consistent across the distribution of models.
Higher level features, such as shape and contextual relationships, are more effectively modeled with Bayesian weights.
At training stage, dropout samples from a number of thinned networks with reduced width. At test time, standard dropout approximates the effect of averaging the predictions of all these thinned networks by using the weights of the unthinned network.

The online demo and codes can be found here and here

READING NOTE: Two-Stream Convolutional Networks for Action Recognition in Videos

Posted on 2015-11-06 Edited on 2026-03-14 In Computer Vision

TITLE: Two-Stream Convolutional Networks for Action Recognition in Videos

AUTHOR: Simonyan, Karen and Zisserman, Andrew

FROM: NIPS2014

CONTRIBUTIONS

A two-stream ConvNet combines spatial and temporal networks.
A ConvNet trained on multi-frame dense optical flow is able to achieve good performance in spite of a small training dataset.
Multi-task training procedure benefits performance on different datasets.

METHOD

Two-stream architecture convolutional network:

Spatial stream ConvNet: take a still frame as input and perform action recognition in this single frame.
Temporal stream ConvNet: take a 2L-channel optical flow/trajectory stacking corresponding to the still frame as input and perform action recognition in this multi-channel input.
The outputs of the two streams are concatenated into a feature vector, and an SVM classifier is trained on it to fuse them.
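The 2L-channel input of the temporal stream can be pictured as below: the horizontal and vertical flow components of L consecutive frames are stacked along the channel axis. The array layout `(num_frames, H, W, 2)` and the channel ordering are assumptions made for this sketch; it covers plain optical-flow stacking only, not trajectory stacking.

```python
import numpy as np

def stack_flow(flows, start, L):
    """Stack L consecutive flow fields into one (H, W, 2L) input.

    flows: array of shape (num_frames, H, W, 2) holding (dx, dy) per pixel.
    """
    clip = flows[start:start + L]
    if len(clip) < L:
        raise ValueError("not enough frames after `start` to stack L flow fields")
    # Channel order: dx_1, dy_1, dx_2, dy_2, ..., dx_L, dy_L
    return np.concatenate(list(clip), axis=-1)

flows = np.zeros((30, 224, 224, 2), dtype=np.float32)  # hypothetical pre-computed flow
x = stack_flow(flows, start=5, L=10)
print(x.shape)  # (224, 224, 20)
```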

SOME DETAILS

Mean flow subtraction is utilized to eliminate displacements caused by camera movement.
At test stage, 25 frames (time points) are extracted and their corresponding 2L-channel stackings are sent to the network. In addition, 5 patches and their flips are extracted in space domain.
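A rough sketch of these two details. Mean flow subtraction removes the average displacement vector from a flow field, and the test-time sampling produces 5 patches plus their horizontal flips per frame. Taking the 5 patches as the four corners and the centre is an assumption here (the note only says "5 patches"), and the flip is written for image frames.

```python
import numpy as np

def subtract_mean_flow(flow):
    """flow: (H, W, 2). Subtract the mean (dx, dy), a simple proxy for global camera motion."""
    return flow - flow.mean(axis=(0, 1), keepdims=True)

def five_crops_with_flips(frame, size):
    """Return 10 patches: four corners and the centre, each with its horizontal flip."""
    h, w = frame.shape[:2]
    tops = [0, 0, h - size, h - size, (h - size) // 2]
    lefts = [0, w - size, 0, w - size, (w - size) // 2]
    crops = [frame[t:t + size, l:l + size] for t, l in zip(tops, lefts)]
    return crops + [c[:, ::-1] for c in crops]

frame = np.zeros((256, 340, 3), dtype=np.uint8)  # hypothetical frame size
patches = five_crops_with_flips(frame, 224)
print(len(patches), patches[0].shape)
```

With 25 sampled frames and 10 patches each, a video yields 250 network inputs at test time.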

ADVANTAGES

Mimics the structure of the human visual cortex.
Performance is competitive with state-of-the-art representations in spite of the small training dataset.
CNN with convolution filters could generalize hand-crafted features.

DISADVANTAGES

Cannot localize actions in either the spatial or the temporal domain.

READING NOTE: Joint Tracking and Segmentation of Multiple Targets

Posted on 2015-11-05 Edited on 2026-03-14 In Computer Vision

TITLE: Joint Tracking and Segmentation of Multiple Targets

AUTHOR: Milan, Anton and Leal-Taixe, Laura and Schindler, Konrad and Reid, Ian

FROM: CVPR2015

CONTRIBUTIONS

A new CRF model that takes advantage of both high-level detector responses and low-level superpixel information.
Fully automated segmentation and tracking of an unknown number of targets.
A complete state representation at every time step, which can handle occlusions.

METHOD

Generate an overcomplete set of trajectory hypotheses.
Solve data association problem by optimizing an objective function, which is a multi-label conditional random field (CRF).

SOME DETAILS

The goal is to find the most probable labeling for all nodes given the observations, which is equivalent to

$$ v^{*} = \mathop{\mathrm{argmin}}_{v}\,E(\nu) $$

in which

$$ E(\nu) = \sum_{s\in\nu_{S}}\phi^{\nu_{S}}(s) + \sum_{d\in\nu_{D}}\phi^{\nu_{D}}(d) + \sum_{(v,w)\in\varepsilon}\psi(v,w)+\psi^{\lambda} $$

where $\phi^{\nu_{S}}$ and $\phi^{\nu_{D}}$ are the unary potentials for superpixel and detection nodes, respectively, measuring the cost of a superpixel node in $\nu_{S}$ or a detection node in $\nu_{D}$ belonging to a certain target; $\psi(v,w)$ is the pairwise potential on the edges among superpixels and detections, covering spatial and temporal relations among superpixels and relations between superpixels and detections in the same frame; $\psi^{\lambda}$ is the trajectory cost, containing several constraints on height, shape, dynamics, persistence, image likelihood and parsimony.

ADVANTAGES

Taking pixel (superpixel) level information into consideration in addition to detection results helps handle partial occlusions, which leads to higher recall.
Segments can provide considerable information even when no reliable detection result exists.
Modeling the multi-target tracking problem as a graphical model makes it possible to use existing optimization algorithms.

DISADVANTAGES

Solving the CRF problem is slow, taking 12 seconds per frame.
Cannot handle ID switches between two adjacent temporal sliding windows.

OTHER

Tracking-by-detection has proven to be the most successful strategy for the multi-target tracking problem.
Noise and imprecise measurements, long-term occlusions, complicated dynamics and target interactions all contribute to the problem’s complexity.

READING NOTE: Learning to Segment Moving Objects in Videos

Posted on 2015-11-04 Edited on 2026-03-14 In Computer Vision

TITLE: Learning to Segment Moving Objects in Videos

AUTHOR: Fragkiadaki, Katerina and Arbelaez, Pablo and Felsen, Panna and Malik, Jitendra

FROM: CVPR2015

CONTRIBUTIONS

Moving object proposals from multiple segmentations on optical flow boundaries
A moving objectness detector for ranking per frame segments and tube proposals
A method of extending per frame segments into spatial-temporal tubes

METHOD

Extract motion boundaries by optical flow
Generate segment proposals according to motion boundaries, called MOPs (Moving Object Proposal)
Rank the MOPs using a CNN based regressor
Combine per frame MOPs to space-time tubes based on pixelwise trajectory clusters

ADVANTAGES

Using optical flow reduces the noise caused by the inner texture of an object. Optical flow is more suitable for detecting rigid objects.
Trajectory tracking can deal with objects that are temporarily static.
Segments are effective for tackling frequent occlusions/dis-occlusions.

DISADVANTAGES

Too slow. Every stage takes seconds to process, which is not suitable for practical applications.
Several independent methods are used to detect objects, so little computation is shared.
The power of CNN has not been fully applied.

OTHER

RCNN has excellent performance on object detection in static images
For sliding-window methods, too many patches need to be evaluated.
MRF methods neglect the relations between nearby pixels and cannot separate adjacent instances.
Methods of object detection in video can be categorized into two types: i) top-down tracking and ii) bottom-up segmentation.

Miscellaneous [20151025]

Posted on 2015-10-25 Edited on 2026-03-14 In Life Discovery

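I watched Mission: Impossible - Rogue Nation (《碟中谍5：神秘国度》) a long time ago and always meant to write something about it; today I finally sorted out my thoughts. Although many people call it a safe, by-the-numbers film, or even dismiss it as having nothing new, I think it deserves praise. Both the script and the performances are worth commending, and it is a conscientious piece of work.

As for the plot, everyone knows the "impossible mission" will be accomplished in the end; any viewer can be sure of that before the film starts. But in tightness, setup, pacing, and character relationships, the film is much better than the second and third installments. The second was just John Woo's aesthetic of violence showing off with blood packs and explosions, and the third leaned too heavily on Ethan Hunt's love story and drifted a little from the impossible mission. The fourth was a kind of reboot of the IMF that started the impossible missions again, and the fifth is a complete return. At the start of the first film the IMF faced a crisis that nearly wiped out the team; likewise, in the fifth the IMF faces dissolution. My feeling was: "Ah! They're finally getting to work!" Another interesting point is the Chinese title 《碟中谍》 (roughly "the spy in the disc"). A literal translation of the English title would be "不可能完成的任务", but because the mission in the first film was delivered on a disc, the mainland title became 《碟中谍》 and it has been used ever since. In this film the mission is carried on a USB drive, which counts as a return of the "disc". Visually the fifth film does not match the fourth, especially the panoramic reveal of the Burj Khalifa, which was truly breathtaking. But the story of the fifth is much tighter and more coherent than that of the fourth, and it handles the audience's emotions well: the tension of the missions is nicely balanced by Simon Pegg's goofy humor.

Two other things I found unforgettable are the aging Tom Cruise and the film's score.

The aging Tom Cruise

Mission: Impossible is not exactly a one-man show for Tom Cruise, but without him the series could no longer be called Mission: Impossible. An actor who has one film in his life that stands as his emblem has had an extremely successful career.

In the fifth film, Cruise, now in his fifties, can still do it all. The opening scene of Ethan clinging to the plane is said to have been shot for real by Cruise himself, with no computer effects. For the later underwater scene he also actually trained in holding his breath underwater. His devotion to this film goes without saying. Even so, we have to admit that the years have left their mark on him. What is most valuable about the film is that it does not avoid this but accepts that Cruise is getting older. His individual heroism crumbles somewhat in this installment. The film repeats many of the thrilling set pieces of earlier installments, but unlike before, Cruise is no longer all-powerful. In the underwater scene in particular, he does not actually complete the mission; the mission really does become impossible, and in the end it is Rebecca who completes it and rescues him. Another point is that, compared with the omnipotent Cruise of the past, here he is helpless when his teammates do not come through. In the opening aerial scene, when Simon Pegg cannot open the cabin door, Cruise can only cling to the plane. In the fourth film he also had Pegg as a hapless teammate while climbing the Burj Khalifa, but he still got the job done with a scare and no harm. Even small details hint at his age: he can no longer vault stylishly over a car and instead tumbles onto it. Compare the fresh-faced youth of the past with the middle-aged man of today: on the left, the first Mission: Impossible; on the right, Cruise in Rogue Nation.

"Nessun dorma" (《今夜无人入睡》) from beginning to end

One sequence in the film includes a performance of the opera Turandot, and one cannot mention Turandot without mentioning "Nessun dorma". This aria can be seen as the signature of the opera. In the film it serves not only as the music for that scene but as a theme for the whole score, running from start to finish and hinting at the emotional thread between the male and female leads. In the opera, Turandot is a cold and capricious woman almost until the end, when she is finally moved by the prince's sincerity and marries him. "Nessun dorma" brings the plot of Turandot into the film and mirrors the relationship between the two leads. When I saw the female spy played by Rebecca confess to Ethan Hunt in the café, and thought of her initial coldness, her use of Ethan Hunt, and the later change in her feelings, I sighed in the theater: this is a Turandot! Add the timely "Nessun dorma" and it is hard not to be overcome with emotion. A good film should contain things like this that reward reflection. When we ask why the film uses this music and something more interesting emerges, that is what keeps a film from staying on the surface and being forgotten. Finally there is the soundtrack piece "Finale and Curtain Call", especially the first minute and a half, which blends "Nessun dorma" with the classic Mission: Impossible theme.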