
Recently I have read a number of novels and watched quite a few films.

The films were ones I watched mostly to follow the crowd: Train to Busan (《釜山行》) and Tunnel (《隧道》). The novels were again Keigo Higashino's: Graduation (《毕业》) and Tokio (《时生》).

Both Korean films have their share of flaws, such as the brain-dead carriage arrangement in Train to Busan and the miraculously long-lasting phone battery in Tunnel, but when it comes to gripping the audience, they certainly know the formula. As stories they are reasonably complete: cause and effect, pacing, all handled competently. They qualify as solid products of an industrial genre-film pipeline, and compared with many domestic films, that level is already quite good. Many netizens say such-and-such film leaves domestic cinema several streets behind, and these two are among those so praised; whether or not it is really several streets, in terms of overall level even a bad Korean film might count as above average at home. Strangely, some setups feel perfectly natural in Korean films or TV dramas but become jarring in domestic films. Take the character setup in Big Fish & Begonia (《大鱼海棠》): I love you, you love him, I sacrifice everything for you. On paper it looks unobjectionable, yet somehow in Big Fish & Begonia it feels forced. I do not know whether I hold a stereotype about Korean films or am simply too harsh on domestic ones, but most domestic films that use this kind of character setup seem to turn out poorly.

There is not much to say about the two Higashino novels. Higashino is a prolific writer and I have read a lot of his work lately; it is not outstanding, and many plot points feel contrived or rely on too many coincidences. His novels may not be the best mysteries in the world, but his probing of human nature is genuinely deep: whether it is the evil side or the good side, he amplifies it so that readers can feel it without any effort, and are then led to reflect on whether we carry the same evil or good in our own lives.

After all this reading and watching, however, what suddenly moved me was a cross-stitch hanging on the dining-room wall at home, whose content is the Lanting Xu (《兰亭序》). Back in school I memorized it without feeling much, but rereading it over dinner the other day, I immediately connected it to the films and novels I had just seen: all these people are sighing the same sigh, that it is good to be alive! The line “固知一死生为虚诞,齐彭殇为妄作” (to equate life and death is absurd; to equate longevity and early death is nonsense) is Wang Xizhi, having savored natural beauty and the joys of life, wishing he could enjoy more of such a life. At some point I seem to have developed the same feeling: there are suddenly too many beautiful things, too many lives I want to experience, and time passes too fast to savor any of it before it has flown by.

TITLE: Face Detection with End-to-End Integration of a ConvNet and a 3D Model

AUTHOR: Yunzhu Li, Benyuan Sun, Tianfu Wu, Yizhou Wang

ASSOCIATION: Peking University, North Carolina State University

FROM: arXiv:1606.00850

CONTRIBUTIONS

  1. It presents a simple yet effective method to integrate a ConvNet and a 3D model into an end-to-end learning framework with a multi-task loss, used for face detection in the wild.
  2. It addresses two limitations in adapting the state-of-the-art Faster R-CNN for face detection: it eliminates the heuristic design of anchor boxes by leveraging a 3D model, and it replaces the generic, predefined RoI pooling with a configuration pooling that exploits the underlying structural configurations of the object.
  3. It obtains very competitive state-of-the-art performance on the FDDB and AFW benchmarks.

METHOD

The main inference scheme is shown in the following figure.

The input image is sent into a ConvNet, e.g. VGG, with an upsampling layer. The network then generates face proposals scored by summing the log probabilities of the keypoints, which are predicted with the help of the predefined 3D face model.

Some Details

  1. The loss of keypoint labels is defined as

    $$ L_{cls}(\omega)= -{1 \over 2m} \sum_{i=1}^{2m} \log(p_{l_i}^{\mathbf{x}_i}) $$

    where $\omega$ stands for the learnable weights of the ConvNet, $m$ is the number of keypoints, and $p_{l_i}^{\mathbf{x}_i}$ is the predicted probability that the point at location $\mathbf{x}_i$ (obtained from the annotations) belongs to label $l_i$.

  2. The loss of keypoint locations is defined as

    $$ L_{loc}^{pt}(\omega)={1 \over m^2} \sum_{i=1}^m \sum_{j=1}^m \sum_{t \in \{x,y\}} \text{Smooth}(t_i-\hat{t}_{i,j}) $$

    where $\text{Smooth}(\cdot)$ is the smooth $l_1$ loss. For each ground-truth keypoint, a set of predicted keypoints can be generated from the 3D face model and the 3D transformation parameters. If each face has $m$ keypoints, then $m$ sets of predicted keypoints are generated, i.e. $m$ locations are predicted for each keypoint.

  3. The configuration pooling layer is similar to the RoI pooling layer in Faster R-CNN, except that features are extracted according to the locations and relations of the keypoints rather than a predefined receptive field.
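As a rough NumPy sketch (not the authors' code), the two losses above could be computed as follows; the array shapes and the per-keypoint prediction layout are my own assumptions for illustration:

```python
import numpy as np

def keypoint_cls_loss(probs, labels):
    """Mean negative log-probability over the labelled points.

    probs:  (n, K) array, probs[i, k] = predicted probability that the
            point at location x_i has label k (n = 2m in the paper)
    labels: (n,) array of ground-truth labels l_i
    """
    n = probs.shape[0]
    return -np.mean(np.log(probs[np.arange(n), labels]))

def smooth_l1(d):
    """Smooth L1 loss: 0.5 d^2 if |d| < 1, else |d| - 0.5."""
    d = np.abs(d)
    return np.where(d < 1, 0.5 * d ** 2, d - 0.5)

def keypoint_loc_loss(gt, pred):
    """Location loss over all (keypoint, prediction) pairs.

    gt:   (m, 2) ground-truth keypoint coordinates (x, y)
    pred: (m, m, 2) array, pred[i, j] = j-th predicted location
          for keypoint i (m predictions per keypoint)
    """
    m = gt.shape[0]
    diff = gt[:, None, :] - pred      # (m, m, 2): t_i - t_hat_{i,j}
    return smooth_l1(diff).sum() / (m * m)
```

The division by $m^2$ and the sum over both coordinates mirror the formula above; a real implementation would of course compute these inside the network's loss layers.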

Recently I have been searching for an easy-to-use media framework library that can be embedded into my own code. First I tried the ffmpeg and SDL2 libraries, which I found too difficult since I have little experience developing multimedia applications. Then I found VLC media player after searching the internet with the keyword “media player library”. I thought it might be a good choice and searched for related projects on GitHub. There is a very simple demo project called Controlling VLC media player using OpenCV. I was attracted by the word “OpenCV” because I am a computer vision engineer. After I clicked the link, BINGO! THAT’S ALL I NEED!

Install libvlc

  1. Install libvlc with a single command: sudo apt-get install libvlc-dev.
  2. You may also need to install the VLC media player itself to get the plugins working: sudo apt-get install vlc.

Use libvlc

The following code is a simple demo of how to use libvlc to play a video

#include <vlc/vlc.h>
#include <unistd.h>

// LibVLC demo: plays the video specified as a command line argument
int main(int argc, char *argv[])
{
    if (argc < 2)
        return 1;

    libvlc_instance_t *instance = libvlc_new(0, NULL);
    libvlc_media_t *media = libvlc_media_new_path(instance, argv[1]);
    libvlc_media_player_t *mplayer = libvlc_media_player_new_from_media(media);

    libvlc_media_player_play(mplayer);
    sleep(1); // give playback a moment to start

    while (libvlc_media_player_is_playing(mplayer))
    {
        // something to do while playing the video
        // ...
    }

    libvlc_media_player_release(mplayer);
    libvlc_media_release(media);
    libvlc_release(instance);
    return 0;
}

libvlc handles the multimedia threads itself; all we need to do is control the player from the main thread. More documentation can be found here.

A while ago I casually bought a copy of Sapiens at the railway station, intending it only to kill time on the train; unexpectedly it turned out to be an excellent book.

When I first saw the title I assumed it was a book about the great events of human history; since I was already interested in history, I bought it without even reading the blurb or preface. Only after seeing the table of contents did I realize it was a completely different kind of history book. The original English title is Sapiens: A Brief History of Humankind; “Sapiens” is the biological name of our own species. The book tells how Homo sapiens gradually evolved from an animal of no significance into “an animal that became a god”. The author both acknowledges the success of sapiens and warns of its destruction.

At the start of the first chapter, the author writes:

About 13.5 billion years ago, matter, energy, time and space came into being in what is known as the “Big Bang”. The story of these fundamental features of our universe is called “physics”.

About 300,000 years after that, matter and energy began to coalesce into complex structures called “atoms”, which then combined into “molecules”. The story of atoms and molecules and how they interact is called “chemistry”.

About 3.8 billion years ago, on a planet called Earth, certain molecules combined to form particularly large and intricate structures called “organisms”. The story of organisms is called “biology”.

About 70,000 years ago, organisms belonging to the species “Homo sapiens” began to create even more elaborate structures called “cultures”. The subsequent development of these human cultures is called “history”.

This short passage was a revelation; I had never heard anyone describe the course of human development from this angle. It feels like a perfect summary of human history: where humans came from, and what humans have done.

The author holds that human history was shaped by three great revolutions: the Cognitive Revolution of about 70,000 years ago set history in motion; the Agricultural Revolution of about 12,000 years ago sped it up; and the Scientific Revolution of about 500 years ago may well end history and start something entirely different. The whole book unfolds from these three revolutions, gradually showing the reader how they changed humankind and the world.

The book examines these three revolutions from multiple angles, including biological evolution, cultural evolution and scientific development. It mixes serious academic findings with humorous, entertaining examples, and even some passages that strike me as close to philosophy.

The first argument that struck me comes from the Cognitive Revolution period, and it even made me feel that humans are a species born with “original sin”. We like to think of ourselves as unique creatures made by God, but in the course of biological evolution our genus contained more than just Homo sapiens. “Human” really means “an animal of the genus Homo”, which includes Neanderthals, Homo erectus, Homo soloensis, Homo floresiensis, Denisovans and others. Just as pet dogs come in many breeds, so did humans; yet today only we sapiens remain, and sapiens wiped out the other human species by bloody means. Setting aside our later responsibility for the environmental and species crises of the Agricultural and Scientific Revolutions, the slaughter of our fellow humans alone is enough to send a chill down the spine.

The second memorable argument makes what I just said sound rather idealist, and indeed makes all of human culture sound idealist. The author's heading for this part is “the imagined order exists in the connections between the thoughts of people”, which is quite a mouthful. He illustrates the point with the Peugeot company: by what standard can we say that Peugeot actually exists? The very real cars Peugeot produces cannot represent the company, because if we destroyed every one of them, we would still believe Peugeot could build more cars that belong to Peugeot. Nor can its employees represent it, because if they all perished the company could simply hire new ones. No physical thing can stand for Peugeot; Peugeot is only a collective imagination of ours. To destroy Peugeot we would have to eliminate it legally, say by declaring it an illegal organization and banning it. But what is law? Another of our collective imaginations. If I say I no longer believe in the law, I will surely be punished by it, because the vast majority still believe in it. But what if the vast majority stopped believing?

These two arguments raised many questions for me. When we run into troubles, can we pause and ask whether we are trapped inside some collective imagination? Can we step just a little outside it? If all collective imaginations vanished, could we still be called “human”? And that leads to yet another question: if we admit that “human” is just an ordinary species, why do we bother asking whether we can be called “human” at all?

Joshua-s-Blog

I started this repo to manage the code of my personal website JOSHUA’s BLOG, which is hosted on WebFaction and implemented with Django and several other packages. The repository is here.

In addition to managing my own code, viewers can also use this project to learn how to build a website using Django. I will write down the steps needed to use this code.

Finally, welcome to visit JOSHUA’s BLOG.

Environment

  1. Ubuntu 14.04. Though Ubuntu 16.04 has been released for a while, I am still using an older LTS release. I am not sure whether the following instructions work on Ubuntu 16.04.
  2. Apache 2.4. I chose Apache 2.4 because my deploy server is powered by Apache 2.4. The Apache HTTP Server Project is an effort to develop and maintain an open-source HTTP server for modern operating systems including UNIX and Windows.
  3. WSGI 1.5. WSGI is the Web Server Gateway Interface. It is a specification that describes how a web server communicates with web applications, and how web applications can be chained together to process one request.
  4. Django 1.7.1. Hmm… I am still using an ancient version of Django, again because I want a development environment exactly matching the deployment one. Django is a high-level Python Web framework that encourages rapid development and clean, pragmatic design. Built by experienced developers, it takes care of much of the hassle of Web development, so you can focus on writing your app without needing to reinvent the wheel. It’s free and open source.

Instructions

Install Apache 2.4

Apache can be installed easily using apt-get: sudo apt-get install apache2. Then we can use apachectl -v to check the Apache version and verify that the installation succeeded.

Install mod_wsgi 1.5

mod_wsgi can be installed using pip, which itself can be installed with sudo apt-get install python-pip.

  1. install mod_wsgi using pip install mod_wsgi. I met an error about missing Apache httpd server packages, described here. I bypassed this error with sudo apt-get install apache2-mpm-worker apache2-dev.
  2. download mod_wsgi-3.5.tar.gz from here.
  3. extract files from the package using tar xvfz mod_wsgi-3.5.tar.gz.
  4. enter the directory: cd mod_wsgi-3.5
  5. configure by ./configure
  6. make
  7. sudo make install

Install Django 1.7.1

Install Django using the command pip install Django==1.7.1. To verify the installation, try importing django in a Python console. If no error is raised, the installation succeeded.

Set up joshua_blog project with Apache

Note: after downloading the project, you should rename the folder to joshua_blog.

  1. add the following code to the file of /etc/apache2/apache2.conf

     LoadModule wsgi_module /usr/lib/apache2/modules/mod_wsgi.so
     WSGIScriptAlias / /home/joshua/CODE/PYTHON/joshua_blog/joshua_blog/wsgi.py
     Alias /media/ /home/joshua/CODE/PYTHON/joshua_blog/media/
     Alias /static/ /home/joshua/CODE/PYTHON/joshua_blog/static/
    
     <Directory /home/joshua/CODE/PYTHON/joshua_blog/static>
         Require all granted
     </Directory>
    
     <Directory /home/joshua/CODE/PYTHON/joshua_blog/media>
         Require all granted
     </Directory>
    
     <Directory /home/joshua/CODE/PYTHON/joshua_blog/joshua_blog>
         <Files wsgi.py>
             Require all granted
         </Files>
     </Directory>
    
     WSGIPythonPath /home/joshua/anaconda/lib/python2.7/site-packages
     ServerName localhost:80
    
  2. add the following code to the file of joshua_blog/joshua_blog/wsgi.py

     import sys
     sys.path.append("/path/joshua_blog/")
    

    in my case, /path/ is /home/joshua/CODE/PYTHON/.

  3. restart apache server by sudo service apache2 restart

Install dependency modules

At this point we have set up the environment. But when we visit 127.0.0.1, the web server returns a 500 error, because several modules needed by joshua_blog still have to be installed manually. We can see which modules are missing in /var/log/apache2/error.log. Next we will see how to install these modules.

  1. django-bootstrap. The author suggests installing this app with pip install django-bootstrap, but the latest version of the app requires Django>=1.8. Thus we need to download django-bootstrap from the 6.x.x branch, which can be found here. Extract the package, enter its directory, and install with python setup.py install.
  2. django-filemanager. The installation instruction can be found here.
  3. django-disqus. Install by pip install django-disqus. A brief introduction in Chinese can be found here.
  4. unidecode. Install by pip install unidecode.
  5. markdown2. Install by pip install markdown2. A brief introduction in Chinese can be found here.

At last we can visit our website at 127.0.0.1. We can log in as the superuser with the username joshua and the password joshua. We may meet other issues:

  1. Here we may get an error message reading “attempt to write a readonly database”. Change the database file's mode with chmod 666 db.sqlite3.
  2. Then another error comes: “unable to open database file”. Change the owner of the whole project with sudo chown www-data joshua_blog.

Today I read a very interesting essay about solitude. Its ideas are quite appealing to me because I myself prefer being alone to hanging out with people. Sometimes I even prefer to travel alone without any companions, though that is not a wise choice. The essay can be found here.

Today’s society encourages us to spend as much time as possible with other people; even when we are alone, we are texting, emailing, phoning, and Skyping each other. We are always expected to be doing something or going somewhere, but there are actually lots of benefits to spending time alone.

Solitude isn’t very popular in our constantly connected world, and people who spend time alone are often assumed to be lonely or sad. However, this is rarely the case; lots of people enjoy spending time alone because it benefits them psychologically. Spending time alone is actually good for us, and it gives us the chance to relax and recharge.

  1. Spending Time Alone Will Make You More Confident

    Unconfident people often rely on others to help with decisions, but spending time alone encourages you to make decisions for yourself. When you are alone, you can ignore other people’s opinions and ideas so that you can really focus on your own thoughts. You get the opportunity to weigh up all of the pros and cons so that you make the best possible decision, which helps to inspire confidence within yourself.

  2. Spending Time Alone Will Boost Your Productivity

    When we are around other people, we often become distracted from our goals and priorities. When we are alone, we get the chance to really think about what matters to us, from work to family to money. This helps us to decide our goals and it also motivates us to work towards achieving our goals.

  3. Spending Time Alone Helps You to be Creative

    Everyone is creative in different ways, but when we are around other people, we are more likely to do what the rest of the group is doing. When we are alone, outside influences are removed and we can do exactly as we please. Some people enjoy drawing or painting, and others might enjoy cooking, reading, writing, or making music. There are lots of different ways to be creative, and when we are alone we get the chance to explore our individual interests and abilities.

  4. Spending Time Alone Will Clear Your Mind

    Our society is filled with information and we often overload on the endless stream of information coming from work, social media, and our friends and family. Sometimes it is essential to take a break from the stream of information so that we can think about our lives and assess everything that is happening. Spending time alone allows us to clear our minds, which makes us happier and more relaxed.

  5. Spending Time Alone Will Help You to Solve Problems

    The best solutions often come to us when we are alone and reflecting on our problems. Spending time alone helps us to work through a problem as we get the chance to really think about it. We can think about what caused the problem, as well as all the different things we can do to solve it. We also get the opportunity to think about what we really want, which helps us to think of effective solutions.

  6. Spending Time Alone Will Help You to Get Things Done

    Most people have a list of things that they need to do, but it is difficult to tick anything off the list when you are always with people. When you are alone, you get uninterrupted time to work on your to-do list, and we often achieve a lot more when we are alone as there are no distractions. Even if it isn’t fun to work through the list, you will feel happy and productive when you finish working.

  7. Spending Time Alone Relieves Stress and Anxiety

    Lots of people today suffer from stress and anxiety, and spending time alone helps us to de-stress and relax. When we are alone we don’t have to listen to other people’s problems or issues; we can just relax and do as we please.

  8. Spending Time Alone Encourages You to be More Independent

    One of the main benefits of spending time alone is that it inspires us to be more independent. When we are alone we can focus on personal progress, problem solving, and enjoying ourselves, all of which encourage independence and self-love.

Previously I wrote some notes on how to set up the environment for Caffe on Ubuntu 14.04. My strategy was DIY. Though the methods in those notes worked well at the time, they may be out of date now. Since I am quite a rookie in Linux, I need this additional note to record how I bypassed the problems I met recently.

Concerning setting link directories

Create a new configuration file in the directory /etc/ld.so.conf.d/; the file name must end with .conf. The file lists the directories to be added to the link path. Finally, run sudo ldconfig to make the change take effect. Here is an example of adding the OpenBLAS link path to the environment: a new file named openblas.conf is created in /etc/ld.so.conf.d/, containing the path /home/joshua/LIBS/openblas/lib. Then type sudo ldconfig in a terminal window.

Concerning installing NVIDIA GPU drivers

A very easy way to install NVIDIA GPU drivers is the tool called “Software & Updates” in Ubuntu's “System Settings”. It has an “Additional Drivers” tab, where a suitable GPU driver can be selected. The new settings take effect after a reboot.

Concerning making Caffe

After sudo apt-get update and sudo apt-get upgrade, I found that Caffe could no longer be built; the error came from TIFF. I solved the problem with conda remove libtiff.

In Keigo Higashino's The Miracles of the Namiya General Store (《解忧杂货店》) there is a passage like this:

Whether it is harassment or a prank, the people who write these letters to the Namiya General Store are, at bottom, the same as ordinary advice-seekers. A hole has opened in their hearts, and something important is leaking out through it. The proof is that such people always come back for the reply; they check the milk crate, because they badly want to know how old Namiya will answer them. Think about it: even made-up worries are not easy to invent thirty at a stretch. Having gone to that much trouble, how could they not want to know the answer? So I must not only reply, I must think it over carefully before I reply. The voice of a person's heart must never be ignored.

I keep being moved by that last line: “The voice of a person's heart must never be ignored.” We absolutely should listen to our own hearts and ask ourselves what we really want. That may be extremely hard. In the stage play Li Bai, when life goes well and the poet's career is smooth, he wears the brocade robe over the Taoist robe and writes “When life goes well, drink deep your joy; never let the golden cup face the moon empty. Heaven gave me talent and it must be of use; a thousand gold pieces scattered will come again.” When the world rejects him and he is demoted, the Taoist robe goes on the outside and the brocade robe underneath, and he writes “Have you not seen Zhang Han of Wu, that free spirit, who at the autumn wind suddenly longed for the lands east of the river? Better to enjoy a cup of wine while alive than a thousand years of fame after death.” Both are about drinking, but one is the triumphant wine of ambition fulfilled, the other the resigned wine of defeat. If even the poet-immortal Li Bai was such an ordinary mortal, what of the rest of us? Finding one's own heart is truly hard: how often does a small success make us feel the whole world is ours, while a small setback turns us melodramatic and self-pitying? I do not know whether the hole old Namiya spoke of can ever truly be filled. How I wish there were an old Namiya to answer our questions.

Much of the time, the person asking for advice already has an answer in mind; they consult only to confirm that their decision is right. That is why some people write again after reading the reply: presumably the answer differed from what they had in mind.

Exactly right. When we struggle to a decision, we hope others think the same way and will support us. And reality obliges: once we have made up our minds and then consult others, most people will speak along with our resolve, encouraging or consoling, always saying something agreeable. But our decision may not be right, and the voices offering correct advice get selectively filtered out. Whose advice should we take? It is so hard: we want to follow our own hearts, yet fear doing the wrong thing. Do others really care about our troubles? Are their opinions necessarily right? We do not know. So perhaps, when making a decision, we really should just flip a coin, and do whatever the side we were hoping for tells us.

TITLE: What makes ImageNet good for transfer learning?

AUTHOR: Minyoung Huh, Pulkit Agrawal, Alexei A. Efros

ASSOCIATION: Berkeley Artificial Intelligence Research (BAIR) Laboratory, UC Berkeley

FROM: http://arxiv.org/abs/1608.08614

CONTRIBUTIONS

Several questions about how the dataset affects CNN training are discussed, including

  • Is more pre-training data always better? How does feature quality depend on the number of training examples per class?
  • Does adding more object classes improve performance?
  • For the same data budget, how should the data be split into classes?
  • Is fine-grained recognition necessary for learning good features?
  • Given the same number of training classes, is it better to have coarse classes or fine-grained classes?
  • Which is better: more classes or more examples per class?

Summary

The following is a summary of the main findings:

  1. How many pre-training ImageNet examples are sufficient for transfer learning? Pre-training with only half the ImageNet data (500 images per class instead of 1000) results in only a small drop in transfer learning performance (1.5 mAP drop on PASCAL-DET). This drop is much smaller than the drop on the ImageNet classification task itself.
  2. How many pre-training ImageNet classes are sufficient for transfer learning? Pre-training with an order of magnitude fewer classes (127 classes instead of 1000) results in only a small drop in transfer learning performance (drop of 2.8 mAP on PASCAL-DET). Quite interestingly, we also found that for some transfer tasks, pre-training with fewer number of classes leads to better performance.
  3. How important is fine-grained recognition for learning good features for transfer learning? The above experiment also suggests that transferable features are learnt even when a CNN is pre-trained with a set of classes that do not require fine-grained discrimination.
  4. Given the same budget of pre-training images, should we have more classes or more images per class? Training with fewer classes but more images per class performs slightly better than training with more classes but fewer images per class.
  5. Is more data always helpful? We found that training using 771 ImageNet classes that exclude all PASCAL VOC classes achieves nearly the same performance on PASCAL-DET as training on the complete ImageNet. Further experiments confirm that blindly adding more training data does not always lead to better performance and can sometimes hurt performance.

TITLE: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

AUTHOR: Kye-Hyeon Kim, Yeongjae Cheon, Sanghoon Hong, Byungseok Roh, Minje Park

ASSOCIATION: Intel Imaging and Camera Technology

FROM: arXiv:1608.08021

CONTRIBUTIONS

An efficient object detector based on CNN is proposed, which has the following advantages:

  • Computational cost: 7.9GMAC for feature extraction with 1065x640 input (cf. ResNet-101: 80.5GMAC)
  • Runtime performance: 750ms/image (1.3FPS) on Intel i7-6700K CPU with a single core; 46ms/image (21.7FPS) on NVIDIA Titan X GPU
  • Accuracy: 81.8% mAP on VOC-2007; 82.5% mAP on VOC-2012 (2nd place)

Method

The author utilizes the pipeline of Faster R-CNN, which is “CNN feature extraction + region proposal + RoI classification”. The author claims that the feature extraction part needs to be redesigned, since the region proposal part is not computationally expensive and the classification part can be efficiently compressed with common techniques like truncated SVD. The guiding principle is “fewer channels with more layers”, along with the adoption of building blocks including concatenated ReLU, Inception, and HyperNet. The structure of the network is as follows:
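Truncated SVD compression of a fully-connected layer, mentioned above as a common technique, can be sketched in NumPy as follows; the layer size and rank are made-up numbers for illustration, not values from the paper:

```python
import numpy as np

def truncated_svd_fc(W, k):
    """Factor an FC weight matrix W (out x in) into two smaller
    matrices using the top-k singular values, so one big layer
    becomes two thin ones: W ~= A @ B."""
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * S[:k]   # (out x k) layer
    B = Vt[:k, :]          # (k x in) layer
    return A, B

# toy example: a 512x512 FC layer compressed to rank 64;
# parameter count drops from 512*512 to 512*64 + 64*512
rng = np.random.default_rng(0)
W = rng.standard_normal((512, 512))
A, B = truncated_svd_fc(W, 64)
```

The rank-k factorization is the best rank-k approximation of W in the Frobenius norm, which is why this trick preserves accuracy well at moderate compression.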

Some Details

  1. Concatenated rectified linear unit (C.ReLU) is applied to the early stage of the CNN (i.e., the first several layers from the network input) to halve the number of computations without losing accuracy. In my understanding, C.ReLU encourages the network to learn Gabor-like filters and helps accelerate forward propagation: if the C.ReLU output has 64 channels, its convolution layer only needs 32 output channels. It may harm performance if used in the later stages of the CNN, because it keeps the negative responses as activated signals, which means a mad brain is trained.
  2. Inception is applied to the remaining part of the feature-generation sub-network. An Inception module produces output activations with different receptive field sizes, which increases the variety of receptive field sizes available to the following layer. All the design policies can be found in the related work.
  3. The author adopts the idea of multi-scale representation, like HyperNet, combining several intermediate outputs so that multiple levels of detail and non-linearity can be considered simultaneously. Direct concatenation of all abstraction layers would produce redundant information at much higher computational cost, and layers that are too early would be of little help for object proposal and classification. The author combines 1) the last layer and 2) two intermediate layers whose scales are 2x and 4x that of the last layer, respectively.
  4. Residual structure is also used in this network, which helps to train very deep CNNs.
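The C.ReLU idea in point 1 (concatenating the activations of x and -x, so a convolution only needs half the output channels) can be sketched in a few lines of NumPy; the feature-map shape here is an arbitrary example:

```python
import numpy as np

def crelu(x, axis=1):
    """Concatenated ReLU: concat(ReLU(x), ReLU(-x)) along the
    channel axis, doubling the number of channels."""
    return np.concatenate([np.maximum(x, 0), np.maximum(-x, 0)], axis=axis)

# a 32-channel feature map becomes 64 channels after C.ReLU,
# so the preceding convolution only needs 32 output channels
x = np.random.randn(1, 32, 8, 8)
y = crelu(x)
```

Note that ReLU(x) - ReLU(-x) = x, so no information from the pre-activation is discarded, which is exactly why both the positive and negative responses survive as activated signals.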