
Today I read a very interesting essay about solitude. Its ideas are quite appealing to me because I myself prefer being alone to hanging out with people. Sometimes I even prefer to travel alone without any companions, though that is not a wise choice. The essay can be found here.

Today’s society encourages us to spend as much time as possible with other people; even when we are alone, we are texting, emailing, phoning, and Skyping each other. We are always expected to be doing something or going somewhere, but there are actually lots of benefits to spending time alone.

Solitude isn’t very popular in our constantly connected world, and often people who spend time alone are assumed to be lonely or sad. However, this is rarely the case; lots of people enjoy spending time alone because it benefits them psychologically. Spending time alone is actually good for us, and it gives us the chance to relax and recharge.

  1. Spending Time Alone Will Make You More Confident

    Unconfident people often rely on others to help with decisions, but spending time alone encourages you to make decisions for yourself. When you are alone, you can ignore other people’s opinions and ideas so that you can really focus on your own thoughts. You get the opportunity to weigh up all of the pros and cons so that you make the best possible decision, which helps to build your confidence.

  2. Spending Time Alone Will Boost Your Productivity

    When we are around other people, we often become distracted from our goals and priorities. When we are alone, we get the chance to really think about what matters to us, from work to family to money. This helps us to decide our goals and it also motivates us to work towards achieving our goals.

  3. Spending Time Alone Helps You to be Creative

    Everyone is creative in different ways, but when we are around other people, we are more likely to do what the rest of the group is doing. When we are alone, outside influences are removed and we can do exactly as we please. Some people enjoy drawing or painting, and others might enjoy cooking, reading, writing, or making music. There are lots of different ways to be creative, and when we are alone we get the chance to explore our individual interests and abilities.

  4. Spending Time Alone Will Clear Your Mind

    Our society is filled with information and we often overload on the endless stream of information coming from work, social media, and our friends and family. Sometimes it is essential to take a break from the stream of information so that we can think about our lives and assess everything that is happening. Spending time alone allows us to clear our minds, which makes us happier and more relaxed.

  5. Spending Time Alone Will Help You to Solve Problems

    The best solutions often come to us when we are alone and reflecting on our problems. Spending time alone helps us to work through a problem as we get the chance to really think about it. We can think about what caused the problem, as well as all the different ways we could solve it. We also get the opportunity to think about what we really want, which helps us to think of effective solutions.

  6. Spending Time Alone Will Help You to Get Things Done

    Most people have a list of things that they need to do, but it is difficult to tick anything off the list when you are always with people. When you are alone, you get uninterrupted time to work on your to-do list, and we often achieve a lot more when we are alone as there are no distractions. Even if it isn’t fun to work through the list, you will feel happy and productive when you finish working.

  7. Spending Time Alone Relieves Stress and Anxiety

    Lots of people today suffer from stress and anxiety, and spending time alone helps us to de-stress and relax. When we are alone we don’t have to listen to other people’s problems or issues; we can just relax and do as we please.

  8. Spending Time Alone Encourages You to be More Independent

    One of the main benefits to spending time alone is that it inspires us to be more independent. When we are alone we can focus on personal progress, problem solving, and enjoying ourselves, which all encourage independence and self-love.

Previously I wrote some notes on how to set up the environment for Caffe on Ubuntu 14.04. My strategy is DIY. Though the methods in those notes worked well at the time, they may be out of date now. Since I am still quite a rookie with Linux, I need this additional note to record how I worked around the problems I met recently.

Concerning setting link directories

Create a new configuration file in /etc/ld.so.conf.d/; the file name must end with .conf. The file lists the directories that need to be added to the link path, one per line. Then run sudo ldconfig to make the change take effect. For example, to add the OpenBLAS link path, create a file named openblas.conf in /etc/ld.so.conf.d/ containing the path /home/joshua/LIBS/openblas/lib, then type sudo ldconfig in a terminal window.
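The file-writing step can also be scripted. Below is a minimal Python sketch, not part of my original setup: the helper name write_ld_conf is made up for illustration, and the demo writes into a scratch directory so that it runs without root.

```python
from pathlib import Path


def write_ld_conf(conf_dir, name, lib_dirs):
    """Write one library directory per line to conf_dir/name and return the path.

    Hypothetical helper for illustration only.
    """
    if not name.endswith(".conf"):
        # As noted above, the file name must end with .conf.
        raise ValueError("file name must end with .conf")
    path = Path(conf_dir) / name
    path.write_text("".join(d + "\n" for d in lib_dirs))
    return path


if __name__ == "__main__":
    import tempfile

    # Demonstrated in a scratch directory; the real target would be
    # /etc/ld.so.conf.d/, which needs root permission.
    with tempfile.TemporaryDirectory() as scratch:
        conf = write_ld_conf(scratch, "openblas.conf",
                             ["/home/joshua/LIBS/openblas/lib"])
        print(conf.read_text(), end="")
```

To use it for real, the script would have to be run as root with /etc/ld.so.conf.d as conf_dir, and sudo ldconfig still has to be run afterwards.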

Concerning installing NVIDIA GPU drivers

A very easy way to install NVIDIA GPU drivers is to use the “Software & Updates” tool in Ubuntu’s “System Settings”. In “Software & Updates” there is a tab called “Additional Drivers”, which lists the available GPU drivers to choose from. The new settings take effect after a reboot.

Concerning making Caffe

After running sudo apt-get update and sudo apt-get upgrade, I found that Caffe would no longer build. The error came from TIFF. I solved the problem by running conda remove libtiff.

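There is a passage in Keigo Higashino’s “Miracles of the Namiya General Store”:

Whether it is harassment or a prank, the people who write these letters to the Namiya General Store are essentially the same as ordinary advice seekers. They all have a hole in their hearts, and something important is gradually leaking out through it. The proof is that such people will certainly come to pick up the reply; they will come and check the milk crate, because they very much want to know how Grandpa Namiya will answer their letters. Think about it: even for made-up troubles, coming up with thirty in one go is not easy. Having put in that much effort, how could they not want to know the answer? So I will not only write replies, I will think them over carefully before writing. A person’s inner voice must never be ignored.

I have always been moved by that last sentence, “A person’s inner voice must never be ignored.” We really should listen carefully to our own inner voice and ask ourselves what we actually want. This can be especially hard. Take the play “Li Bai”: when life goes well and his official career is smooth, the poet wears a brocade robe on the outside and a Taoist robe underneath, and the lines he writes are “When life goes your way, enjoy it to the full; never let the golden cup sit empty before the moon. Heaven gave me talents that must be of use; a thousand gold pieces spent will come back again.” Conversely, when his worldly ambitions are frustrated and he is helplessly demoted, the poet wears the Taoist robe on the outside and the brocade robe underneath, and the lines he writes are “Have you not seen Zhang Han of Wu, called a man at ease with life, who in the autumn wind suddenly thought of home east of the river and left? Better to enjoy a cup of wine while alive; what need is there for a thousand years of fame after death?” Both are about drinking, but one is the wine of triumph, full of ambition, and the other is the wine of disappointment, of resentful resignation. If even Li Bai, the Immortal Poet, was such a “worldly man”, how much more so the rest of us ordinary people. Finding our own heart is truly hard. How often, after some small achievement, do we sigh as if “the whole world is mine”, and then, on meeting a little difficulty, put on a sorrowful, world-weary face. I do not know whether the hole Grandpa Namiya spoke of can ever really be filled. I really wish there were a Grandpa Namiya to answer our questions and resolve our doubts.

Very often, the person seeking advice already has an answer in mind and only asks in order to confirm that the decision is right. So some people write again after reading the reply, probably because the answer differed from what they were thinking.

That is so true. When we struggle our way to a decision, we always hope others think the same way and will support us. And that is how it tends to go: once we have made a decision and then ask others, most of them go along with our resolve; whether to encourage or to comfort, they say what is pleasant to hear. But our decision may not be right, and the voices that give us sound advice get selectively filtered out. Whose advice should we listen to? That is really hard. We want to follow our own heart, yet we fear doing the wrong thing. Do others really care that much about our troubles? Is their advice necessarily right? We do not know. So perhaps, when making a decision, we really should toss a coin and then do whichever side we find ourselves hoping for.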

TITLE: What makes ImageNet good for transfer learning?

AUTHOR: Minyoung Huh, Pulkit Agrawal, Alexei A. Efros

ASSOCIATION: Berkeley Artificial Intelligence Research (BAIR) Laboratory, UC Berkeley

FROM: http://arxiv.org/abs/1608.08614

CONTRIBUTIONS

Several questions about how the dataset affects the training of CNNs are discussed, including

  • Is more pre-training data always better? How does feature quality depend on the number of training examples per class?
  • Does adding more object classes improve performance?
  • For the same data budget, how should the data be split into classes?
  • Is fine-grained recognition necessary for learning good features?
  • Given the same number of training classes, is it better to have coarse classes or fine-grained classes?
  • Which is better: more classes or more examples per class?

Summary

The following is a summary of the main findings:

  1. How many pre-training ImageNet examples are sufficient for transfer learning? Pre-training with only half the ImageNet data (500 images per class instead of 1000) results in only a small drop in transfer learning performance (1.5 mAP drop on PASCAL-DET). This drop is much smaller than the drop on the ImageNet classification task itself.
  2. How many pre-training ImageNet classes are sufficient for transfer learning? Pre-training with an order of magnitude fewer classes (127 classes instead of 1000) results in only a small drop in transfer learning performance (drop of 2.8 mAP on PASCAL-DET). Quite interestingly, we also found that for some transfer tasks, pre-training with fewer number of classes leads to better performance.
  3. How important is fine-grained recognition for learning good features for transfer learning? The above experiment also suggests that transferable features are learnt even when a CNN is pre-trained with a set of classes that do not require fine-grained discrimination.
  4. Given the same budget of pre-training images, should we have more classes or more images per class? Training with fewer classes but more images per class performs slightly better than training with more classes but fewer images per class.
  5. Is more data always helpful? We found that training using 771 ImageNet classes that excludes all PASCAL VOC classes achieves nearly the same performance on PASCAL-DET as training on complete ImageNet. Further experiments confirm that blindly adding more training data does not always lead to better performance and can sometimes hurt performance.

TITLE: PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection

AUTHOR: Kye-Hyeon Kim, Yeongjae Cheon, Sanghoon Hong, Byungseok Roh, Minje Park

ASSOCIATION: Intel Imaging and Camera Technology

FROM: arXiv:1608.08021

CONTRIBUTIONS

An efficient object detector based on CNN is proposed, which has the following advantages:

  • Computational cost: 7.9GMAC for feature extraction with 1065x640 input (cf. ResNet-101: 80.5GMAC)
  • Runtime performance: 750ms/image (1.3FPS) on Intel i7-6700K CPU with a single core; 46ms/image (21.7FPS) on NVIDIA Titan X GPU
  • Accuracy: 81.8% mAP on VOC-2007; 82.5% mAP on VOC-2012 (2nd place)

Method

The authors use the Faster R-CNN pipeline, which is “CNN feature extraction + region proposal + RoI classification”. They claim that the feature extraction part needs to be redesigned, since the region proposal part is not computationally expensive and the classification part can be efficiently compressed with common techniques like truncated SVD. The design principle is “less channels with more layers”, together with the adoption of building blocks including concatenated ReLU, Inception, and HyperNet. The structure of the network is as follows:

Some Details

  1. Concatenated rectified linear unit (C.ReLU) is applied to the early stage of the CNN (i.e., the first several layers from the network input) to halve the number of computations without losing accuracy. In my understanding, C.ReLU encourages the network to learn Gabor-like filters and helps to accelerate forward propagation. If the output of C.ReLU has 64 channels, its convolution layer only needs to produce 32 channels. It may harm performance if it is used in the later stages of the CNN, because it keeps the negative responses as activated signals, which means that a “mad brain” is trained.
  2. Inception is applied to the rest of the feature generation sub-network. An Inception module produces output activations with receptive fields of different sizes, and therefore increases the variety of receptive field sizes in the previous layer. All the design policies can be found in this related work.
  3. The authors adopt the idea of multi-scale representation, like HyperNet, which combines several intermediate outputs so that multiple levels of detail and non-linearity can be considered simultaneously. Direct concatenation of all abstraction layers may produce redundant information with a much higher compute requirement, and layers that are too early would be of little help for object proposal and classification. The authors combine 1) the last layer and 2) two intermediate layers whose scales are 2x and 4x of the last layer, respectively.
  4. Residual structure is also used in this network, which helps to train very deep CNNs.
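To make the channel arithmetic of C.ReLU in item 1 concrete, here is a minimal NumPy sketch based on my understanding, not the authors' code. It assumes an NCHW tensor layout and leaves out any learned parameters; the function name crelu is my own.

```python
import numpy as np


def crelu(x, axis=1):
    """Concatenate x and -x along the channel axis, then apply ReLU.

    Sketch only: assumes NCHW layout (channels on axis 1), no learned parameters.
    """
    return np.maximum(np.concatenate([x, -x], axis=axis), 0.0)


# A stand-in for a 32-channel convolution output.
x = np.random.default_rng(0).standard_normal((1, 32, 8, 8))
y = crelu(x)
print(y.shape)  # (1, 64, 8, 8)
```

The first 32 output channels hold the positive responses and the last 32 hold the negated negative responses, so a 32-channel convolution is enough to feed a 64-channel activation.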

I am going to maintain this page to record a few things about computer vision that I have read, am doing, or will have a look at. Previously I liked to write short notes on the papers that I had read. It is a good way to remember and understand the authors’ ideas. But gradually I found that I was forgetting a large portion of what I had learnt, because in addition to papers I also pick up knowledge from others’ blogs, online courses and reports, and I was not recording those at all. Besides, I need a place to keep a list of things I should look at but cannot at the time I discover them. This page will be much like a catalog.

PAPERS AND PROJECTS

OBJECT/SALIENCY DETECTION

  • EfficientDet: Scalable and Efficient Object Detection (PDF, Project/Code)
  • YOLOv4: Optimal Speed and Accuracy of Object Detection (PDF, Project/Code)
  • Learning Data Augmentation Strategies for Object Detection (PDF, Project/Code)
  • Light-Weight RetinaNet for Object Detection (PDF)
  • Objects as Points (PDF, Code/Projects)
  • Augmentation for small object detection (PDF)
  • ThunderNet: Towards Real-time Generic Object Detection (PDF)
  • Pyramid Mask Text Detector (PDF)
  • Gaussian YOLOv3: An Accurate and Fast Object Detector Using Localization Uncertainty for Autonomous Driving (PDF)
  • CornerNet: Detecting Objects as Paired Keypoints (PDF, Code/Project, Reading Note)
  • Scale-Aware Trident Networks for Object Detection (PDF)
  • Acquisition of Localization Confidence for Accurate Object Detection (PDF, Project/Code)
  • A Single Shot Text Detector with Scale-adaptive Anchors (PDF)
  • Small-scale Pedestrian Detection Based on Somatic Topology Localization and Temporal Feature Aggregation (PDF)
  • Object detection at 200 Frames Per Second (PDF)
  • DetNet: A Backbone network for Object Detection (PDF, Reading Note)
  • Zero-Shot Object Detection (PDF)
  • Unsupervised Discovery of Object Landmarks as Structural Representations (PDF, Project/Code)
  • Cascade R-CNN: Delving into High Quality Object Detection (PDF, Project/Code)
  • Path Aggregation Network for Instance Segmentation (PDF)
  • ClickBAIT-v2: Training an Object Detector in Real-Time (PDF)
  • Single-Shot Bidirectional Pyramid Networks for High-Quality Object Detection (PDF)
  • Complex-YOLO: Real-time 3D Object Detection on Point Clouds (PDF)
  • Zero-Shot Object Detection: Learning to Simultaneously Recognize and Localize Novel Concepts (PDF)
  • Domain Adaptive Faster R-CNN for Object Detection in the Wild (PDF)
  • Chinese Text in the Wild (PDF, Project/Code)
  • TSSD: Temporal Single-Shot Detector Based on Attention and LSTM for Robotic Intelligent Perception (PDF)
  • Tiny SSD: A Tiny Single-shot Detection Deep Convolutional Neural Network for Real-time Embedded Object Detection (PDF, Reading Note)
  • Object Detection in Videos by Short and Long Range Object Linking (PDF)
  • Learning a Rotation Invariant Detector with Rotatable Bounding Box (PDF, Project/Code)
  • Detecting Curve Text in the Wild: New Dataset and New Solution (PDF, Project/Code)
  • Single Shot Text Detector with Regional Attention (PDF, Project/Code)
  • Single-Shot Refinement Neural Network for Object Detection (PDF, Project/Code, Reading Note)
  • $S^3$FD: Single Shot Scale-invariant Face Detector (PDF, Code/Project, Reading Note)
  • MegDet: A Large Mini-Batch Object Detector (PDF)
  • Light-Head R-CNN: In Defense of Two-Stage Object Detector (PDF)
  • Interpretable R-CNN (PDF)
  • Cascade Region Proposal and Global Context for Deep Object Detection (PDF)
  • PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection (PDF, Project/Code, Reading Note)
  • Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks (PDF, Reading Note)
  • Object Detection from Video Tubelets with Convolutional Neural Networks (PDF, Reading Note)
  • R-FCN: Object Detection via Region-based Fully Convolutional Networks (PDF, Project/Code, Reading Note)
  • SSD: Single Shot MultiBox Detector (PDF, Project/Code, Reading Note)
  • Pushing the Limits of Deep CNNs for Pedestrian Detection (PDF, Reading Note)
  • Object Detection by Labeling Superpixels (PDF, Reading Note)
  • Crafting GBD-Net for Object Detection (PDF, Project/Code)
    code for CUImage and CUVideo, the object detection champion of ImageNet 2016.
  • Fused DNN: A deep neural network fusion approach to fast and robust pedestrian detection (PDF, Reading Note)
  • Training Region-based Object Detectors with Online Hard Example Mining (PDF, Reading Note)
  • Detecting People in Artwork with CNNs (PDF, Project/Code)
  • Deeply supervised salient object detection with short connections (PDF)
  • Learning to detect and localize many objects from few examples (PDF)
  • Multi-Scale Saliency Detection using Dictionary Learning (PDF)
  • Straight to Shapes: Real-time Detection of Encoded Shapes (PDF)
  • Weakly Supervised Cascaded Convolutional Networks (PDF, Reading Note)
  • Speed/accuracy trade-offs for modern convolutional object detectors (PDF, Reading Note)
  • Object Detection via End-to-End Integration of Aspect Ratio and Context Aware Part-based Models and Fully Convolutional Networks (PDF)
  • Feature Pyramid Networks for Object Detection (PDF, Reading Note)
  • COCO-Stuff: Thing and Stuff Classes in Context (PDF)
  • Finding Tiny Faces (PDF)
  • Beyond Skip Connections: Top-Down Modulation for Object Detection (PDF, Reading Note)
  • YOLO9000: Better, Faster, Stronger (PDF, Project/Code, Reading Note)
  • Quantitative Analysis of Automatic Image Cropping Algorithms: A Dataset and Comparative Study (PDF)
  • To Boost or Not to Boost? On the Limits of Boosted Trees for Object Detection (PDF)
  • Pixel Objectness (PDF, Project/Code, Reading Note)
  • DSSD: Deconvolutional Single Shot Detector (PDF, Reading Note)
  • A Fast and Compact Salient Score Regression Network Based on Fully Convolutional Network (PDF)
  • Wide-Residual-Inception Networks for Real-time Object Detection (PDF)
  • Zoom Out-and-In Network with Recursive Training for Object Proposal (PDF, Project/Code)
  • Improving Object Detection with Region Similarity Learning (PDF)
  • Tree-Structured Reinforcement Learning for Sequential Object Localization (PDF)
  • Weakly Supervised Object Localization Using Things and Stuff Transfer (PDF)
  • Unsupervised learning from video to detect foreground objects in single images (PDF)
  • A-Fast-RCNN: Hard Positive Generation via Adversary for Object Detection (PDF, Project/Code)
  • Learning non-maximum suppression (PDF)
  • Real Time Image Saliency for Black Box Classifiers (PDF)
  • An Efficient Approach for Object Detection and Tracking of Objects in a Video with Variable Background (PDF)
  • RON: Reverse Connection with Objectness Prior Networks for Object Detection (PDF, Project/Code)
  • Deformable Part-based Fully Convolutional Network for Object Detection (PDF, Reading Note)
  • Recurrent Scale Approximation for Object Detection in CNN (PDF)
  • DSOD: Learning Deeply Supervised Object Detectors from Scratch (PDF, Project/Code, Reading Note)
  • PPR-FCN: Weakly Supervised Visual Relation Detection via Parallel Pairwise R-FCN (PDF)
  • Focal Loss for Dense Object Detection (PDF)
  • Learning Uncertain Convolutional Features for Accurate Saliency Detection (PDF)
  • Optimizing Region Selection for Weakly Supervised Object Detection (PDF)
  • Kill Two Birds With One Stone: Boosting Both Object Detection Accuracy and Speed With adaptive Patch-of-Interest Composition (PDF)
  • Flow-Guided Feature Aggregation for Video Object Detection (PDF)
  • BlitzNet: A Real-Time Deep Network for Scene Understanding (PDF, Project/Code)
  • Soft Proposal Networks for Weakly Supervised Object Localization (PDF, Project/Code)
  • Feature-Fused SSD: Fast Detection for Small Objects (PDF)
  • Light Cascaded Convolutional Neural Networks for Accurate Player Detection (PDF)
  • Personalized Saliency and its Prediction (PDF)
  • WeText: Scene Text Detection under Weak Supervision (PDF)
  • VPGNet: Vanishing Point Guided Network for Lane and Road Marking Detection and Recognition (PDF, Project/Code)

SEGMENTATION/PARSING

  • CenterMask: single shot instance segmentation with point representation (PDF)
  • Background Matting: The World is Your Green Screen (PDF, Project/Code, Github)
  • Towards Real-Time Automatic Portrait Matting on Mobile Devices (PDF, Project/Code)
  • Panoptic Feature Pyramid Networks (PDF)
  • Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells (PDF)
  • Deep Learning for Semantic Segmentation on Minimal Hardware (PDF)
  • TernausNetV2: Fully Convolutional Network for Instance Segmentation (PDF, Project/Code)
  • Stacked U-Nets: A No-Frills Approach to Natural Image Segmentation (PDF, Project/Code)
  • Deep Object Co-Segmentation (PDF)
  • Fusing Hierarchical Convolutional Features for Human Body Segmentation and Clothing Fashion Classification (PDF)
  • ShuffleSeg: Real-time Semantic Segmentation Network (PDF)
  • Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation (PDF, Project/Code)
  • Learning random-walk label propagation for weakly-supervised semantic segmentation (PDF)
  • Panoptic Segmentation (PDF, Reading Note)
  • Learning to Segment Every Thing (PDF, Project/Code)
  • Deep Extreme Cut: From Extreme Points to Object Segmentation (PDF)
  • Instance-aware Semantic Segmentation via Multi-task Network Cascades (PDF, Project/Code)
  • ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation (PDF, Reading Note)
  • Learning Deconvolution Network for Semantic Segmentation (PDF, Reading Note)
  • Semantic Object Parsing with Graph LSTM (PDF, Reading Note)
  • Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding (PDF, Reading Note)
  • Learning to Segment Moving Objects in Videos (PDF, Reading Note)
  • Deep Structured Features for Semantic Segmentation (PDF)

    We propose a highly structured neural network architecture for semantic segmentation of images that combines i) a Haar wavelet-based tree-like convolutional neural network (CNN), ii) a random layer realizing a radial basis function kernel approximation, and iii) a linear classifier. While stages i) and ii) are completely pre-specified, only the linear classifier is learned from data. Thanks to its high degree of structure, our architecture has a very small memory footprint and thus fits onto low-power embedded and mobile platforms. We apply the proposed architecture to outdoor scene and aerial image semantic segmentation and show that the accuracy of our architecture is competitive with conventional pixel classification CNNs. Furthermore, we demonstrate that the proposed architecture is data efficient in the sense of matching the accuracy of pixel classification CNNs when trained on a much smaller data set.

  • CNN-aware Binary Map for General Semantic Segmentation (PDF)

  • Learning to Refine Object Segments (PDF)
  • Clockwork Convnets for Video Semantic Segmentation (PDF, Project/Code)
  • Convolutional Gated Recurrent Networks for Video Segmentation (PDF)
  • Efficient Convolutional Neural Network with Binary Quantization Layer (PDF)
  • Fully Convolutional Instance-aware Semantic Segmentation (PDF, Project/Code, Reading Note)
  • Semantic Segmentation using Adversarial Networks (PDF)
  • Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes (PDF)
  • Deep Watershed Transform for Instance Segmentation (PDF)
  • InstanceCut: from Edges to Instances with MultiCut (PDF)
  • The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation (PDF)
  • Improving Fully Convolution Network for Semantic Segmentation (PDF)
  • Video Scene Parsing with Predictive Feature Learning (PDF)
  • Training Bit Fully Convolutional Network for Fast Semantic Segmentation (PDF)
  • Mining Pixels: Weakly Supervised Semantic Segmentation Using Image Labels (PDF)
  • FastMask: Segment Object Multi-scale Candidates in One Shot (PDF, Project/Code, Reading Note)
  • A New Convolutional Network-in-Network Structure and Its Applications in Skin Detection, Semantic Segmentation, and Artifact Reduction (PDF, Reading Note)
  • FusionSeg: Learning to combine motion and appearance for fully automatic segmention of generic objects in videos (PDF)
  • Visual Saliency Prediction Using a Mixture of Deep Neural Networks (PDF)
  • PixelNet: Representation of the pixels, by the pixels, and for the pixels (PDF, Project/Code)
  • Super-Trajectory for Video Segmentation (PDF)
  • Understanding Convolution for Semantic Segmentation (PDF, Reading Note)
  • Adversarial Examples for Semantic Image Segmentation (PDF)
  • Large Kernel Matters — Improve Semantic Segmentation by Global Convolutional Network (PDF)
  • Deep Image Matting (PDF, Reading Note)
  • Mask R-CNN (PDF, Caffe Implementation, TuSimple Implementation on MXNet, TensorFlow Implementation, Reading Note)
  • Predicting Deeper into the Future of Semantic Segmentation (PDF)
  • Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks (PDF, Project/Code)
  • One-Shot Video Object Segmentation (PDF, Project/Code)
  • Semantic Instance Segmentation via Deep Metric Learning (PDF)
  • Not All Pixels Are Equal: Difficulty-aware Semantic Segmentation via Deep Layer Cascade (PDF)
  • Semantically-Guided Video Object Segmentation (PDF)
  • Recurrent Multimodal Interaction for Referring Image Segmentation (PDF)
  • Loss Max-Pooling for Semantic Image Segmentation (PDF)
  • Reformulating Level Sets as Deep Recurrent Neural Network Approach to Semantic Segmentation (PDF)
  • Learning Video Object Segmentation with Visual Memory (PDF)
  • A Review on Deep Learning Techniques Applied to Semantic Segmentation (PDF)
  • BiSeg: Simultaneous Instance Segmentation and Semantic Segmentation with Fully Convolutional Networks (PDF)
  • Rethinking Atrous Convolution for Semantic Image Segmentation (PDF)
  • Discriminative Localization in CNNs for Weakly-Supervised Segmentation of Pulmonary Nodules (PDF)
  • Superpixel-based semantic segmentation trained by statistical process control (PDF)
  • The Devil is in the Decoder (PDF)
  • Semantic Segmentation with Reverse Attention (PDF)
  • Learning Deconvolution Network for Semantic Segmentation (PDF, Project/Code)
  • Depth Adaptive Deep Neural Network for Semantic Segmentation (PDF)
  • Semantic Instance Segmentation with a Discriminative Loss Function (PDF)
  • A Cost-Sensitive Visual Question-Answer Framework for Mining a Deep And-OR Object Semantics from Web Images (PDF)
  • ICNet for Real-Time Semantic Segmentation on High-Resolution Images (PDF, Project/Code)
  • Pyramid Scene Parsing Network (PDF, Project/Code, Reading Note)
  • Learning to Segment Instances in Videos with Spatial Propagation Network (PDF, Project/Code)
  • Learning Affinity via Spatial Propagation Networks (PDF, Project/Code)

TRACKING

  • Tracking Objects as Points (PDF, Project/Code)
  • Deeper and Wider Siamese Networks for Real-Time Visual Tracking (PDF)
  • Multiple People Tracking Using Hierarchical Deep Tracklet Re-identification (PDF)
  • Fully-Convolutional Siamese Networks for Object Tracking (PDF)
  • Joint Flow: Temporal Flow Fields for Multi Person Tracking (PDF)
  • Trajectory Factory: Tracklet Cleaving and Re-connection by Deep Siamese Bi-GRU for Multiple Object Tracking (PDF)
  • Machine Learning Methods for Solving Assignment Problems in Multi-Target Tracking (PDF)
  • Multi-Target, Multi-Camera Tracking by Hierarchical Clustering: Recent Progress on DukeMTMC Project (PDF)
  • Detect-and-Track: Efficient Pose Estimation in Videos (PDF)
  • Track, then Decide: Category-Agnostic Vision-based Multi-Object Tracking (PDF)
  • Spatially Supervised Recurrent Convolutional Neural Networks for Visual Object Tracking (PDF, Reading Note)
  • Joint Tracking and Segmentation of Multiple Targets (PDF, Reading Note)
  • Deep Tracking on the Move: Learning to Track the World from a Moving Vehicle using Recurrent Neural Networks (PDF)
  • Convolutional Regression for Visual Tracking (PDF)
  • Kernelized Correlation Filters (Project, Code1, Code2)
  • Online Visual Multi-Object Tracking via Labeled Random Finite Set Filtering (PDF)
  • SANet: Structure-Aware Network for Visual Tracking (PDF)
  • Semantic tracking: Single-target tracking with inter-supervised convolutional networks (PDF)
  • On The Stability of Video Detection and Tracking (PDF)
  • Dual Deep Network for Visual Tracking (PDF)
  • Deep Motion Features for Visual Tracking (PDF)
  • Robust and Real-time Deep Tracking Via Multi-Scale Domain Adaptation (PDF, Project/Code)
  • Instance Flow Based Online Multiple Object Tracking (PDF)
  • PathTrack: Fast Trajectory Annotation with Path Supervision (PDF)
  • Good Features to Correlate for Visual Tracking (PDF)
  • Re3 : Real-Time Recurrent Regression Networks for Object Tracking (PDF)
  • Action-Decision Networks for Visual Tracking with Deep Reinforcement Learning (PDF, Project/Code)
  • Simple Online and Realtime Tracking with a Deep Association Metric (PDF)
  • Learning Policies for Adaptive Tracking with Deep Feature Cascades (PDF)
  • Recurrent Filter Learning for Visual Tracking (PDF)
  • Tracking Persons-of-Interest via Unsupervised Representation Adaptation (PDF)
  • Detect to Track and Track to Detect (PDF, Project/Code, Reading Note)

POSE ESTIMATION

  • Human Pose Estimation with Spatial Contextual Information (PDF)
  • Rethinking on Multi-Stage Networks for Human Pose Estimation (PDF)
  • Learning to Estimate 3D Human Pose and Shape from a Single Color Image (PDF, Project/Code)
  • Ordinal Depth Supervision for 3D Human Pose Estimation (PDF, Project/Code)
  • Simple Baselines for Human Pose Estimation and Tracking (PDF)
  • End-to-end Recovery of Human Shape and Pose (PDF, PROJECT/CODE, Code)
  • PersonLab: Person Pose Estimation and Instance Segmentation with a Bottom-Up, Part-Based, Geometric Embedding Model (PDF)
  • DensePose: Dense Human Pose Estimation In The Wild (PDF, Project/Code)
  • Cascaded Pyramid Network for Multi-Person Pose Estimation (PDF)
  • Chained Predictions Using Convolutional Neural Networks (PDF, Reading Note)
  • CRF-CNN: Modeling Structured Information in Human Pose Estimation (PDF)
  • Convolutional Pose Machines (PDF, Project/Code, Reading Note)
  • Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields (PDF, Project/Code, Reading Note)
  • Towards Accurate Multi-person Pose Estimation in the Wild (PDF, Reading Note)
  • Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation (PDF)
  • Coarse-to-Fine Volumetric Prediction for Single-Image 3D Human Pose (PDF, Project/Code)
  • Learning Feature Pyramids for Human Pose Estimation (PDF, Project/Code)
  • Joint Multi-Person Pose Estimation and Semantic Part Segmentation (PDF)
  • DeepPrior++: Improving Fast and Accurate 3D Hand Pose Estimation (PDF)
  • Lifting from the Deep: Convolutional 3D Pose Estimation from a Single Image (PDF)
  • Human Pose Regression by Combining Indirect Part Detection and Contextual Information (PDF)
  • Dual Path Networks for Multi-Person Human Pose Estimation (PDF)

ACTION RECOGNITION/EVENT DETECTION/VIDEO

  • Two-Stream Adaptive Graph Convolutional Networks for Skeleton-Based Action Recognition (PDF, Project/Code)
  • CSRNet: Dilated Convolutional Neural Networks for Understanding the Highly Congested Scenes (PDF, Project/Code, MxNet Version, Reading Note)
  • SlowFast Networks for Video Recognition (PDF)
  • PHD-GIFs: Personalized Highlight Detection for Automatic GIF Creation (PDF, Project/Code)
  • Superframes, A Temporal Video Segmentation (PDF)
  • Co-occurrence Feature Learning from Skeleton Data for Action Recognition and Detection with Hierarchical Aggregation (PDF)
  • 2D/3D Pose Estimation and Action Recognition using Multitask Deep Learning (PDF)
  • Real-Time End-to-End Action Detection with Two-Stream Networks (PDF)
  • Learning Video-Story Composition via Recurrent Neural Network (PDF)
  • Real-world Anomaly Detection in Surveillance Videos (PDF)
  • Fully-Coupled Two-Stream Spatiotemporal Networks for Extremely Low Resolution Action Recognition (PDF)
  • Deep Reinforcement Learning for Unsupervised Video Summarization with Diversity-Representativeness Reward (PDF, Project/Code)
  • Making a long story short: A Multi-Importance Semantic for Fast-Forwarding Egocentric Videos (PDF)
  • Attentional Pooling for Action Recognition (PDF, Project/Code)
  • Pooling the Convolutional Layers in Deep ConvNets for Action Recognition (PDF, Reading Note)
  • Two-Stream Convolutional Networks for Action Recognition in Videos (PDF, Reading Note)
  • YouTube-8M: A Large-Scale Video Classification Benchmark (PDF, Project/Code)
  • Spatiotemporal Residual Networks for Video Action Recognition (PDF)
  • An End-to-End Spatio-Temporal Attention Model for Human Action Recognition from Skeleton Data (PDF)
  • Fast Video Classification via Adaptive Cascading of Deep Models (PDF)
  • Video Pixel Networks (PDF)
  • Plug-and-Play CNN for Crowd Motion Analysis: An Application in Abnormal Event Detection (PDF)
  • EM-Based Mixture Models Applied to Video Event Detection (PDF)
  • Video Captioning and Retrieval Models with Semantic Attention (PDF)
  • Title Generation for User Generated Videos (PDF)
  • Review of Action Recognition and Detection Methods (PDF)
  • Recurrent Mixture Density Network for Spatiotemporal Visual Attention (PDF)
  • Self-Supervised Video Representation Learning With Odd-One-Out Networks (PDF)
  • Recurrent Memory Addressing for describing videos (PDF)
  • Online Real time Multiple Spatiotemporal Action Localisation and Prediction on a Single Platform (PDF)
  • Real-Time Video Highlights for Yahoo Esports (PDF)
  • Surveillance Video Parsing with Single Frame Supervision (PDF)
  • Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks (PDF)
  • Action Recognition with Dynamic Image Networks (PDF)
  • ActionFlowNet: Learning Motion Representation for Action Recognition (PDF)
  • Video Propagation Networks (PDF)
  • Detecting events and key actors in multi-person videos (PDF)
  • A Pursuit of Temporal Accuracy in General Activity Detection (PDF, Reading Note)
  • Tube Convolutional Neural Network (T-CNN) for Action Detection in Videos (PDF)
  • Deceiving Google’s Cloud Video Intelligence API Built for Summarizing Videos (PDF)
  • Incremental Tube Construction for Human Action Detection (PDF)
  • Unsupervised Action Proposal Ranking through Proposal Recombination (PDF)
  • CERN: Confidence-Energy Recurrent Network for Group Activity Recognition (PDF)
  • Forecasting Human Dynamics from Static Images (PDF)
  • Interpretable 3D Human Action Analysis with Temporal Convolutional Networks (PDF)
  • Training object class detectors with click supervision (PDF)
  • Skeleton-based Action Recognition with Convolutional Neural Networks (PDF)
  • Online growing neural gas for anomaly detection in changing surveillance scenes (PDF)
  • Learning Person Trajectory Representations for Team Activity Analysis (PDF)
  • Concurrence-Aware Long Short-Term Sub-Memories for Person-Person Action Recognition (PDF)
  • Video Imagination from a Single Image with Transformation Generation (PDF, Project/Code)
  • Optimizing Deep CNN-Based Queries over Video Streams at Scale (PDF, Project/Code, Reading Note)
  • Extreme Low Resolution Activity Recognition with Multi-Siamese Embedding Learning (PDF)
  • Predicting Human Activities Using Stochastic Grammar (PDF)
  • Discriminative convolutional Fisher vector network for action recognition (PDF)
  • Exploiting Semantic Contextualization for Interpretation of Human Activity in Videos (PDF)
  • Lattice Long Short-Term Memory for Human Action Recognition (PDF)
  • Kinship Verification from Videos using Spatio-Temporal Texture Features and Deep Learning (PDF)
  • Fast-Forward Video Based on Semantic Extraction (PDF)
  • Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks (PDF)
  • ConvNet Architecture Search for Spatiotemporal Feature Learning (PDF, Project/Code, Github)
  • Fully Context-Aware Video Prediction (PDF)

FACE

  • BlazeFace: Sub-millisecond Neural Face Detection on Mobile GPUs (PDF)
  • A Dataset and Benchmark for Large-scale Multi-modal Face Anti-spoofing (PDF, Project/Code)
  • Learning towards Minimum Hyperspherical Energy (PDF, Project/Code)
  • Consensus-Driven Propagation in Massive Unlabeled Data for Face Recognition (PDF, Code/Project)
  • Arbitrary Facial Attribute Editing: Only Change What You Want (PDF, Project/Code)
  • Anchor Cascade for Efficient Face Detection (PDF)
  • Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks (PDF, Reading Note)
  • MobileFaceNets: Efficient CNNs for Accurate Real-time Face Verification on Mobile Devices (PDF)
  • Survey of Face Detection on Low-quality Images (PDF)
  • PyramidBox: A Context-assisted Single Shot Face Detector (PDF)
  • SFace: An Efficient Network for Face Detection in Large Scale Variations (PDF)
  • Deep Facial Expression Recognition: A Survey (PDF)
  • Deep Face Recognition: A Survey (PDF)
  • Deep Semantic Face Deblurring (PDF, Project/Code)
  • Evaluation of Dense 3D Reconstruction from 2D Face Images in the Wild (PDF)
  • SSH: Single Stage Headless Face Detector (PDF, Project/Code)
  • Detecting and counting tiny faces (PDF, Project/Code)
  • Training Deep Face Recognition Systems with Synthetic Data (PDF)
  • Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification (PDF, Project/Code)
  • Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks (PDF, Project/Code, Code Caffe)
  • Deep Architectures for Face Attributes (PDF)
  • Face Detection with End-to-End Integration of a ConvNet and a 3D Model (PDF, Reading Note, Project/Code)
  • A CNN Cascade for Landmark Guided Semantic Part Segmentation (PDF, Project/Code)
  • Kernel Selection using Multiple Kernel Learning and Domain Adaptation in Reproducing Kernel Hilbert Space, for Face Recognition under Surveillance Scenario (PDF)
  • An All-In-One Convolutional Neural Network for Face Analysis (PDF)
  • Fast Face-swap Using Convolutional Neural Networks (PDF)
  • Cross-Age Reference Coding for Age-Invariant Face Recognition and Retrieval (Project/Code)
  • CMS-RCNN: Contextual Multi-Scale Region-based CNN for Unconstrained Face Detection (Project/Code)
  • Face Synthesis from Facial Identity Features (PDF)
  • DeepFace: Face Generation using Deep Learning (PDF)
  • Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns (PDF, Project/Code)
  • EmotioNet Challenge: Recognition of facial expressions of emotion in the wild (PDF)
  • Unrestricted Facial Geometry Reconstruction Using Image-to-Image Translation (PDF)
  • Semi and Weakly Supervised Semantic Segmentation Using Generative Adversarial Network (PDF)
  • Deep Alignment Network: A convolutional neural network for robust face alignment (PDF, Project/Code)
  • Scale-Aware Face Detection (PDF)
  • AffectNet: A Database for Facial Expression, Valence, and Arousal Computing in the Wild (PDF)
  • SphereFace: Deep Hypersphere Embedding for Face Recognition (PDF, Project/Code)
  • Age Group and Gender Estimation in the Wild with Deep RoR Architecture (PDF)
  • Island Loss for Learning Discriminative Features in Facial Expression Recognition (PDF)
  • Temporal Non-Volume Preserving Approach to Facial Age-Progression and Age-Invariant Face Recognition (PDF)

OPTICAL FLOW

  • LiteFlowNet: A Lightweight Convolutional Neural Network for Optical Flow Estimation (PDF, Project/Code)
  • DeepFlow: Large displacement optical flow with deep matching (PDF, Project/Code)
  • Guided Optical Flow Learning (PDF)

IMAGE PROCESSING

  • R2D2: Repeatable and Reliable Detector and Descriptor (PDF)
  • CartoonGAN: Generative Adversarial Networks for Photo Cartoonization (PDF)
  • Image Inpainting for Irregular Holes Using Partial Convolutions (PDF)
  • Neural Aesthetic Image Reviewer (PDF, Reading Note)
  • Automatic Image Cropping for Visual Aesthetic Enhancement Using Deep Neural Networks and Cascaded Regression (PDF)
  • Learning Intelligent Dialogs for Bounding Box Annotation (PDF)
  • Real-time video stabilization and mosaicking for monitoring and surveillance (PDF, Project/Code)
  • Learning Recursive Filter for Low-Level Vision via a Hybrid Neural Network (PDF, Project/Code)
  • Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding (PDF, Project/Code)
  • A Learned Representation For Artistic Style (PDF)
  • Let there be Color!: Joint End-to-end Learning of Global and Local Image Priors for Automatic Image Colorization with Simultaneous Classification (PDF, Project/Code)
  • Pixel Recurrent Neural Networks (PDF)
  • Conditional Image Generation with PixelCNN Decoders (PDF, Project/Code)
  • RAISR: Rapid and Accurate Image Super Resolution (PDF)
  • Photo-Quality Evaluation based on Computational Aesthetics: Review of Feature Extraction Techniques (PDF)
  • Fast color transfer from multiple images (PDF)
  • Bringing Impressionism to Life with Neural Style Transfer in Come Swim (PDF)
  • PixelCNN++: Improving the PixelCNN with Discretized Logistic Mixture Likelihood and Other Modifications (PDF, Project/Code: https://github.com/openai/pixel-cnn)
  • Deep Photo Style Transfer (PDF)
  • A Neural Representation of Sketch Drawings (PDF)
  • Visual Attribute Transfer through Deep Image Analogy (PDF)
  • Deep Semantics-Aware Photo Adjustment (PDF)
  • Diversified Texture Synthesis with Feed-forward Networks (PDF, Project/Code)
  • Real-Time Neural Style Transfer for Videos (PDF)
  • Creatism: A deep-learning photographer capable of creating professional work (PDF)
  • Deep Image Harmonization (PDF, Project/Code)
  • Neural Color Transfer between Images (PDF)
  • Deeper, Broader and Artier Domain Generalization (PDF)

3D/DEPTH/POINT CLOUD

  • The Perfect Match: 3D Point Cloud Matching with Smoothed Densities (PDF, Project/Code)
  • Pix3D: Dataset and Methods for Single-Image 3D Shape Modeling (PDF, Project/Code)
  • Depth from Videos in the Wild: Unsupervised Monocular Depth Learning from Unknown Cameras (PDF)

CNN AND DEEP LEARNING

  • ResNeSt: Split-Attention Networks (PDF, Project/Code, Reading Note)
  • Meta-Learning in Neural Networks: A Survey (PDF)
  • Generalizing from a Few Examples: A Survey on Few-Shot Learning (PDF)
  • NBDT: Neural-Backed Decision Trees (PDF, Project/Code, Github, Reading Note)
  • Interpretable CNNs (PDF)
  • Bag of Tricks for Image Classification with Convolutional Neural Networks (PDF)
  • How Does Batch Normalization Help Optimization? (PDF, VIDEO)
  • https://arxiv.org/abs/1805.07883 (PDF)
  • Rethinking ImageNet Pre-training (PDF)
  • Learning From Positive and Unlabeled Data: A Survey (PDF)
  • Gather-Excite: Exploiting Feature Context in Convolutional Neural Networks (PDF, Project/Code)
  • DropBlock: A regularization method for convolutional networks (PDF)
  • Differentiable Abstract Interpretation for Provably Robust Neural Networks (PDF, Project/Code)
  • Adding One Neuron Can Eliminate All Bad Local Minima (PDF)
  • Step Size Matters in Deep Learning (PDF)
  • Do Better ImageNet Models Transfer Better? (PDF)
  • Robust Classification with Convolutional Prototype Learning (PDF, Project/Code)
  • Fast Feature Extraction with CNNs with Pooling Layers (PDF)
  • Network Transplanting (PDF)
  • An Information-Theoretic View for Deep Learning (PDF)
  • Understanding Individual Neuron Importance Using Information Theory (PDF)
  • Understanding Convolutional Neural Network Training with Information Theory (PDF)
  • The unreasonable effectiveness of the forget gate (PDF)
  • Discovering Hidden Factors of Variation in Deep Networks (PDF)
  • Regularizing Deep Networks by Modeling and Predicting Label Structure (PDF)
  • Hierarchical Novelty Detection for Visual Object Recognition (PDF)
  • Guide Me: Interacting with Deep Networks (PDF)
  • Studying Invariances of Trained Convolutional Neural Networks (PDF)
  • Deep Residual Networks and Weight Initialization (PDF)
  • WNGrad: Learn the Learning Rate in Gradient Descent (PDF)
  • Understanding the Loss Surface of Neural Networks for Binary Classification (PDF)
  • Tell Me Where to Look: Guided Attention Inference Network (PDF)
  • Convolutional Neural Networks with Alternately Updated Clique (PDF, Project/Code)
  • Visual Interpretability for Deep Learning: a Survey (PDF)
  • Threat of Adversarial Attacks on Deep Learning in Computer Vision: A Survey (PDF)
  • CNNs are Globally Optimal Given Multi-Layer Support (PDF)
  • Take it in your stride: Do we need striding in CNNs? (PDF)
  • Gradients explode - Deep Networks are shallow - ResNet explained (PDF)
  • Super-Convergence: Very Fast Training of Residual Networks Using Large Learning Rates (PDF, Project/Code)
  • Data Distillation: Towards Omni-Supervised Learning (PDF)
  • Peephole: Predicting Network Performance Before Training (PDF)
  • AdaBatch: Adaptive Batch Sizes for Training Deep Neural Networks (PDF)
  • Gradual Tuning: a better way of Fine Tuning the parameters of a Deep Neural Network (PDF)
  • CondenseNet: An Efficient DenseNet using Learned Group Convolutions (PDF, Project/Code)
  • Population Based Training of Neural Networks (PDF)
  • Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN (PDF)
  • Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions (PDF)
  • Unleashing the Potential of CNNs for Interpretable Few-Shot Learning (PDF)
  • Non-local Neural Networks (PDF, Caffe2)
  • Log-DenseNet: How to Sparsify a DenseNet (PDF)
  • Don’t Decay the Learning Rate, Increase the Batch Size (PDF)
  • Guarding Against Adversarial Domain Shifts with Counterfactual Regularization (PDF)
  • UberNet: Training a ‘Universal’ Convolutional Neural Network for Low-, Mid-, and High-Level Vision using Diverse Datasets and Limited Memory (PDF, Project/Code)
  • What makes ImageNet good for transfer learning? (PDF, Project/Code, Reading Note)

    The tremendous success of features learnt using the ImageNet classification task on a wide range of transfer tasks begs the question: what are the intrinsic properties of the ImageNet dataset that are critical for learning good, general-purpose features? This work provides an empirical investigation of various facets of this question: Is more pre-training data always better? How does feature quality depend on the number of training examples per class? Does adding more object classes improve performance? For the same data budget, how should the data be split into classes? Is fine-grained recognition necessary for learning good features? Given the same number of training classes, is it better to have coarse classes or fine-grained classes? Which is better: more classes or more examples per class?

  • Understanding and Improving Convolutional Neural Networks via Concatenated Rectified Linear Units (PDF)

  • Densely Connected Convolutional Networks (PDF, Project/Code, Reading Note)
  • Decoupled Neural Interfaces using Synthetic Gradients (PDF)

    Training directed neural networks typically requires forward-propagating data through a computation graph, followed by backpropagating error signal, to produce weight updates. All layers, or more generally, modules, of the network are therefore locked, in the sense that they must wait for the remainder of the network to execute forwards and propagate error backwards before they can be updated. In this work we break this constraint by decoupling modules by introducing a model of the future computation of the network graph. These models predict what the result of the modeled sub-graph will produce using only local information. In particular we focus on modeling error gradients: by using the modeled synthetic gradient in place of true backpropagated error gradients we decouple subgraphs, and can update them independently and asynchronously.

  • Rethinking the Inception Architecture for Computer Vision (PDF, Reading Note)

    This paper discusses several network design choices, including factorizing convolutions into smaller kernels and asymmetric kernels, the utility of auxiliary classifiers, and reducing grid size using convolution stride rather than pooling.

  • Factorized Convolutional Neural Networks (PDF, Reading Note)

  • Do semantic parts emerge in Convolutional Neural Networks? (PDF, Reading Note)
  • A Critical Review of Recurrent Neural Networks for Sequence Learning (PDF)
  • Image Compression with Neural Networks (Project/Code)
  • Graph Convolutional Networks (Project/Code)
  • Understanding intermediate layers using linear classifier probes (PDF, Reading Note)
  • Learning What and Where to Draw (PDF, Project/Code)
  • On the interplay of network structure and gradient convergence in deep learning (PDF)
  • Deep Learning with Separable Convolutions (PDF)
  • Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization (PDF, Project/Code)
  • Optimization of Convolutional Neural Network using Microcanonical Annealing Algorithm (PDF)
  • Deep Pyramidal Residual Networks (PDF)
  • Impatient DNNs - Deep Neural Networks with Dynamic Time Budgets (PDF)
  • Uncertainty in Deep Learning (PDF, Project/Code)
    This is the PhD Thesis of Yarin Gal.
  • Tensorial Mixture Models (PDF, Project/Code)
  • Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks (PDF)
  • Why Deep Neural Networks? (PDF)
  • Local Similarity-Aware Deep Feature Embedding (PDF)
  • A Review of 40 Years of Cognitive Architecture Research: Focus on Perception, Attention, Learning and Applications (PDF)
  • Professor Forcing: A New Algorithm for Training Recurrent Networks (PDF)
  • On the expressive power of deep neural networks (PDF)
  • What Is the Best Practice for CNNs Applied to Visual Instance Retrieval? (PDF)
  • Deep Convolutional Neural Network Design Patterns (PDF, Project/Code)
  • Tricks from Deep Learning (PDF)
  • A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models (PDF)
  • Multi-Shot Mining Semantic Part Concepts in CNNs (PDF)
  • Aggregated Residual Transformations for Deep Neural Networks (PDF, Reading Note)
  • PolyNet: A Pursuit of Structural Diversity in Very Deep Networks (PDF)
  • On the Exploration of Convolutional Fusion Networks for Visual Recognition (PDF)
  • ResFeats: Residual Network Based Features for Image Classification (PDF)
  • Object Recognition with and without Objects (PDF)
  • LCNN: Lookup-based Convolutional Neural Network (PDF, Reading Note)
  • Inductive Bias of Deep Convolutional Networks through Pooling Geometry (PDF, Project/Code)
  • Wider or Deeper: Revisiting the ResNet Model for Visual Recognition (PDF, Reading Note)
  • Multi-Scale Context Aggregation by Dilated Convolutions (PDF, Project/Code)
  • Large-Margin Softmax Loss for Convolutional Neural Networks (PDF, mxnet Code, Caffe Code)
  • Adversarial Examples Detection in Deep Networks with Convolutional Filter Statistics (PDF)
  • Feedback Networks (PDF)
  • Visualizing Residual Networks (PDF)
  • Convolutional Oriented Boundaries: From Image Segmentation to High-Level Tasks (PDF, Project/Code)
  • Understanding trained CNNs by indexing neuron selectivity (PDF)
  • Benchmarking State-of-the-Art Deep Learning Software Tools (PDF, Project/Code)
  • Batch Renormalization: Towards Reducing Minibatch Dependence in Batch-Normalized Models (PDF)
  • Visualizing Deep Neural Network Decisions: Prediction Difference Analysis (PDF, Project/Code)
  • ShaResNet: reducing residual network parameter number by sharing weights (PDF)
  • Deep Forest: Towards An Alternative to Deep Neural Networks (PDF, Project/Code)
  • All You Need is Beyond a Good Init: Exploring Better Solution for Training Extremely Deep Convolutional Neural Networks with Orthonormality and Modulation (PDF)
  • Genetic CNN (PDF)
  • Deformable Convolutional Networks (PDF)
  • Quality Resilient Deep Neural Networks (PDF)
  • How ConvNets model Non-linear Transformations (PDF)
  • Active Convolution: Learning the Shape of Convolution for Image Classification (PDF)
  • Multi-Scale Dense Convolutional Networks for Efficient Prediction (PDF, Project/Code)
  • Coordinating Filters for Faster Deep Neural Networks (PDF, Project/Code)
  • A Genetic Programming Approach to Designing Convolutional Neural Network Architectures (PDF)
  • On Generalization and Regularization in Deep Learning (PDF)
  • Interpretable Explanations of Black Boxes by Meaningful Perturbation (PDF)
  • Energy Propagation in Deep Convolutional Neural Networks (PDF)
  • Introspection: Accelerating Neural Network Training By Learning Weight Evolution (PDF)
  • Deeply-Supervised Nets (PDF)
  • Speeding up Convolutional Neural Networks By Exploiting the Sparsity of Rectifier Units (PDF)
  • Inception Recurrent Convolutional Neural Network for Object Recognition (PDF)
  • The Landscape of Deep Learning Algorithms (PDF)
  • Pixel Deconvolutional Networks (PDF)
  • Dilated Residual Networks (PDF)
  • A Kernel Redundancy Removing Policy for Convolutional Neural Network (PDF)
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour (PDF)
  • Learning Spatial Regularization with Image-level Supervisions for Multi-label Image Classification (PDF, Project/Code, Reading Note)
  • VisualBackProp: efficient visualization of CNNs (PDF)
  • Pruning Convolutional Neural Networks for Resource Efficient Inference (PDF, Project/Code)
  • Zero-Shot Learning - A Comprehensive Evaluation of the Good, the Bad and the Ugly (PDF, Project/Code)
  • ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices (PDF, Caffe Implementation)
  • Submanifold Sparse Convolutional Networks (PDF, Project/Code)
  • Dual Path Networks (PDF)
  • ThiNet: A Filter Level Pruning Method for Deep Neural Network Compression (PDF, Project/Code, Reading Note)
  • Memory-Efficient Implementation of DenseNets (PDF)
  • Residual Attention Network for Image Classification (PDF, Project/Code)
  • An Effective Training Method For Deep Convolutional Neural Network (PDF)
  • Learning to Transfer (PDF)
  • Learning Efficient Convolutional Networks through Network Slimming (PDF, Project/Code)
  • Hierarchical loss for classification (PDF)
  • Convolutional Gaussian Processes (PDF, Code/Project)
  • Interpretable Convolutional Neural Networks (PDF)
  • What Uncertainties Do We Need in Bayesian Deep Learning for Computer Vision? (PDF)
  • Porcupine Neural Networks: (Almost) All Local Optima are Global (PDF)
  • Generalization in Deep Learning (PDF)
  • A systematic study of the class imbalance problem in convolutional neural networks (PDF)
  • Interpretable Transformations with Encoder-Decoder Networks (PDF, Project/Code)
  • One pixel attack for fooling deep neural networks (PDF)
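
The note on "Rethinking the Inception Architecture for Computer Vision" earlier in this section mentions factorizing convolutions into smaller and asymmetric kernels. As a rough illustration of why that helps, the sketch below counts weights for a 5x5 layer versus two stacked 3x3 layers, and for a 3x3 layer versus a 1x3 layer followed by a 3x1 layer. The helper names and the channel count of 64 are assumptions made for this example only, and biases are ignored.

```python
def conv_weights(kernel_h, kernel_w, channels_in, channels_out):
    """Weights in one kernel_h x kernel_w convolution layer (biases ignored)."""
    return kernel_h * kernel_w * channels_in * channels_out


def saving(original, factorized):
    """Fraction of weights removed by replacing `original` with `factorized`."""
    return 1 - factorized / original


# Hypothetical setting: 64 input and 64 output channels in every layer.
C = 64

# One 5x5 layer versus two stacked 3x3 layers (same 5x5 receptive field).
five_by_five = conv_weights(5, 5, C, C)
two_three_by_three = 2 * conv_weights(3, 3, C, C)

# One 3x3 layer versus an asymmetric 1x3 layer followed by a 3x1 layer.
three_by_three = conv_weights(3, 3, C, C)
asymmetric_pair = conv_weights(1, 3, C, C) + conv_weights(3, 1, C, C)

print(f"5x5 -> 3x3 + 3x3: {saving(five_by_five, two_three_by_three):.0%} fewer weights")
print(f"3x3 -> 1x3 + 3x1: {saving(three_by_three, asymmetric_pair):.0%} fewer weights")
```

Under these assumptions the savings are about 28% and 33%, and the ratios do not depend on the channel count as long as it is the same for every layer. The count covers parameters only; it says nothing about accuracy, which the paper evaluates separately.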

SINGLE-SHOT/UNSUPERVISED LEARNING

  • Zero-Shot Object Detection by Hybrid Region Embedding (PDF, Project/Code)
  • Deep Triplet Ranking Networks for One-Shot Recognition (PDF)
  • Avatar-Net: Multi-scale Zero-shot Style Transfer by Feature Decoration (PDF)

GAN

  • A Survey on GANs for Anomaly Detection (PDF)
  • Outfit Generation and Style Extraction via Bidirectional LSTM and Autoencoder (PDF)
  • Pioneer Networks: Progressively Growing Generative Autoencoder (PDF)
  • Transferring GANs: generating images from limited data (PDF, Project/Code)
  • Painting Generation Using Conditional Generative Adversarial Net (PDF, Project/Code)
  • MGGAN: Solving Mode Collapse using Manifold Guided Training (PDF)
  • Multimodal Unsupervised Image-to-Image Translation (PDF, Project/Code)
  • Pedestrian-Synthesis-GAN: Generating Pedestrian Data in Real Scene and Beyond (PDF)
  • Face Aging with Contextual Generative Adversarial Nets (PDF, Project/Code)
  • Deformable GANs for Pose-based Human Image Generation (PDF, Project/Code)
  • ComboGAN: Unrestrained Scalability for Image Domain Translation (PDF, Project/Code)
  • Eye In-Painting with Exemplar Generative Adversarial Networks (PDF)
  • Disentangled Person Image Generation (PDF)
  • Fader Networks: Manipulating Images by Sliding Attributes (PDF, Code/Project)
  • Are GANs Created Equal? A Large-Scale Study (PDF)
  • StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation (PDF, Project/Code)
  • Two Birds with One Stone: Iteratively Learn Facial Attributes with GANs (PDF, Project/Code)
  • Spectral Normalization for Generative Adversarial Networks (PDF)
  • XGAN: Unsupervised Image-to-Image Translation for many-to-many Mappings (PDF)
  • How Generative Adversarial Nets and its variants Work: An Overview of GAN (PDF)
  • DNA-GAN: Learning Disentangled Representations from Multi-Attribute Images (PDF, Project/Code)
  • Sobolev GAN (PDF)
  • Data Augmentation Generative Adversarial Networks (PDF)
  • Conditional Autoencoders with Adversarial Information Factorization (PDF, Project/Code)
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation (PDF, Project/Code, Torch, PyTorch, Reading Note)
  • Bayesian GAN (PDF, Project/Code)
  • Metric Learning-based Generative Adversarial Network (PDF)
  • Flexible Prior Distributions for Deep Generative Models (PDF)
  • Data Augmentation in Classification using GAN (PDF)
  • Semantically Decomposing the Latent Spaces of Generative Adversarial Networks (PDF)
  • Multi-View Data Generation Without View Supervision (PDF)
  • StackGAN++: Realistic Image Synthesis with Stacked Generative Adversarial Networks (PDF)
  • Generative Adversarial Networks (PDF)
  • Stacked Generative Adversarial Networks (PDF)
  • Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks (PDF)
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks (PDF)
  • Deep Generative Image Models using a Laplacian Pyramid of Adversarial Networks (PDF)
  • NIPS 2016 Tutorial: Generative Adversarial Networks (PDF)
  • Wasserstein GAN (PDF)
  • Adversarial Discriminative Domain Adaptation (PDF, Reading Note)
  • Generative Adversarial Nets with Labeled Data by Activation Maximization (PDF)
  • Triple Generative Adversarial Nets (PDF)
  • On the Quantitative Evaluation of Deep Generative Models (PDF)
  • Adversarial Transformation Networks: Learning to Generate Adversarial Examples (PDF)
  • Improved Training of Wasserstein GANs (PDF, Project/Code)
  • Generate To Adapt: Aligning Domains using Generative Adversarial Networks (PDF)
  • Adversarial Generator-Encoder Networks (PDF, Project/Code)
  • Training Triplet Networks with GAN (PDF)
  • Multi-Agent Diverse Generative Adversarial Networks (PDF)
  • GP-GAN: Towards Realistic High-Resolution Image Blending (PDF, Project/Code)
  • BEGAN: Boundary Equilibrium Generative Adversarial Networks (PDF)
  • MAGAN: Margin Adaptation for Generative Adversarial Networks (PDF)
  • Pose Guided Person Image Generation (PDF)
  • On the Effects of Batch and Weight Normalization in Generative Adversarial Networks (PDF, Project/Code)
  • Aesthetic-Driven Image Enhancement by Adversarial Learning (PDF)
  • VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning (PDF, Project/Code)
  • MoCoGAN: Decomposing Motion and Content for Video Generation (PDF, Project/Code)
  • Generative Adversarial Networks: An Overview (PDF: https://arxiv.org/abs/1710.07035)
  • SalGAN: Visual Saliency Prediction with Generative Adversarial Networks (PDF, Project/Code)

MACHINE LEARNING

LIGHT-WEIGHT MODEL/EMBEDDED/MOBILE/MODEL COMPRESSION

  • MetaPruning: Meta Learning for Automatic Neural Network Channel Pruning (PDF, Project/Code)
  • PyTorch Network Slimming (PDF, Project/Code)
  • Importance Estimation for Neural Network Pruning (PDF, Project/Code)
  • Efficient Methods and Hardware for Deep Learning (PDF)
  • ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware (PDF, Project/Code)
  • FD-MobileNet: Improved MobileNet with a Fast Downsampling Strategy (PDF)
  • Quantization Mimic: Towards Very Tiny CNN for Object Detection (PDF)
  • Pelee: A Real-Time Object Detection System on Mobile Devices (PDF, Project/Code, TensorRT Implemented, Reading Note)
  • MobileNetV2: Inverted Residuals and Linear Bottlenecks (PDF, Reading Note)
  • SBNet: Sparse Blocks Network for Fast Inference (PDF, Project/Code)
  • IGCV2: Interleaved Structured Sparse Convolutional Neural Networks (PDF)
  • FitNets: Hints for Thin Deep Nets (PDF)
  • Building Efficient ConvNets using Redundant Feature Pruning (PDF, Project/Code)
  • Multi-Scale Dense Networks for Resource Efficient Image Classification (PDF)
  • Net-Trim: Convex Pruning of Deep Neural Networks with Performance Guarantee (PDF)
  • NISP: Pruning Networks using Neuron Importance Score Propagation (PDF)
  • Caffeinated FPGAs: FPGA Framework For Convolutional Neural Networks (PDF)
  • Comprehensive Evaluation of OpenCL-based Convolutional Neural Network Accelerators in Xilinx and Altera FPGAs (PDF)
  • FINN: A Framework for Fast, Scalable Binarized Neural Network Inference (PDF)
  • Two-Bit Networks for Deep Learning on Resource-Constrained Embedded Devices (PDF)
  • SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size (PDF, Project/Code)
  • MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications (PDF, Caffe Implementation, Reading Note)
  • Binarized Convolutional Neural Networks with Separable Filters for Efficient Hardware Acceleration (PDF)
  • Channel Pruning for Accelerating Very Deep Neural Networks (PDF, Project/Code)
  • Quantized Convolutional Neural Networks for Mobile Devices (PDF, Project/Code)
  • Squeeze-and-Excitation Networks (PDF)
  • Domain-adaptive deep network compression (PDF)
  • Embedded Binarized Neural Networks (PDF)
  • Keynote: Small Neural Nets Are Beautiful: Enabling Embedded Systems with Small Deep-Neural-Network Architectures (PDF)
  • A Survey of Model Compression and Acceleration for Deep Neural Networks (PDF: https://arxiv.org/abs/1710.09282)

ReID

  • Video-based Person Re-identification via 3D Convolutional Networks and Non-local Attention (PDF)
  • Attention-Aware Compositional Network for Person Re-identification (PDF)
  • Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification (PDF, Project/Code)
  • Features for Multi-Target Multi-Camera Tracking and Re-Identification (PDF)
  • Video Person Re-identification by Temporal Residual Learning (PDF)
  • Harmonious Attention Network for Person Re-Identification (PDF)
  • In Defense of the Triplet Loss for Person Re-Identification (PDF)
  • Deep Spatial Feature Reconstruction for Partial Person Re-identification: Alignment-Free Approach (PDF)
  • AlignedReID: Surpassing Human-Level Performance in Person Re-Identification (PDF)
  • A Discriminatively Learned CNN Embedding for Person Re-identification (PDF, Project/Code)
  • Learning Deep Neural Networks for Vehicle Re-ID with Visual-spatio-temporal Path Proposals (PDF)
  • Beyond triplet loss: a deep quadruplet network for person re-identification (PDF)
  • Person Re-identification by Local Maximal Occurrence Representation and Metric Learning (PDF, Project/Code)
  • Person Re-identification: Past, Present and Future (PDF)
  • Unsupervised Person Re-identification: Clustering and Fine-tuning (PDF, Project/Code)
  • Jointly Attentive Spatial-Temporal Pooling Networks for Video-based Person Re-Identification (PDF)
  • Divide and Fuse: A Re-ranking Approach for Person Re-identification (PDF)
  • Learning Deep Context-aware Features over Body and Latent Parts for Person Re-identification (PDF)
  • HydraPlus-Net: Attentive Deep Features for Pedestrian Analysis (PDF, Project/Code)

FASHION

  • Billion-scale Commodity Embedding for E-commerce Recommendation in Alibaba (PDF)
  • Visually-Aware Fashion Recommendation and Design with Generative Image Models (PDF)
  • Be Your Own Prada: Fashion Synthesis with Structural Coherence (PDF, Project/Code, Reading Note)
  • Style2Vec: Representation Learning for Fashion Items from Style Sets (PDF)
  • Dress like a Star: Retrieving Fashion Products from Videos (PDF)
  • The Conditional Analogy GAN: Swapping Fashion Articles on People Images (PDF)

OTHER

  • GaitSet: Regarding Gait as a Set for Cross-View Gait Recognition (PDF, Project/Code)
  • Deep Clustering for Unsupervised Learning of Visual Features (PDF)
  • Detecting Visual Relationships Using Box Attention (PDF)
  • Zoom-Net: Mining Deep Feature Interactions for Visual Relationship Recognition (PDF, Project/Code)
  • Learning to See in the Dark (PDF)
  • A Variational U-Net for Conditional Appearance and Shape Generation (PDF, Project/Code)
  • Synthesizing Images of Humans in Unseen Poses (PDF)
  • End-to-end weakly-supervised semantic alignment (PDF, Project/Code)
  • Dense Optical Flow based Change Detection Network Robust to Difference of Camera Viewpoints (PDF)
  • Dual-Path Convolutional Image-Text Embedding (PDF, Project/Code)
  • The Promise and Peril of Human Evaluation for Model Interpretability (PDF)
  • Semantic Image Retrieval via Active Grounding of Visual Situations (PDF)
  • LIFT: Learned Invariant Feature Transform (PDF)
  • Learning Aligned Cross-Modal Representations from Weakly Aligned Data (PDF, Project/Code)
  • Multi-Task Curriculum Transfer Deep Learning of Clothing Attributes (PDF)
  • End-to-end Learning of Deep Visual Representations for Image Retrieval (PDF)
  • SoundNet: Learning Sound Representations from Unlabeled Video (PDF)
  • Bags of Local Convolutional Features for Scalable Instance Search (PDF, Project/Code)
  • Universal Correspondence Network (PDF, Project/Code)
  • Judging a Book By its Cover (PDF)
  • Generalisation and Sharing in Triplet Convnets for Sketch based Visual Search (PDF)
  • Analysis and Optimization of Loss Functions for Multiclass, Top-k, and Multilabel Classification (PDF)
  • Automatic generation of large-scale handwriting fonts via style learning (PDF)
  • Image Retrieval with Deep Local Features and Attention-based Keypoints (PDF)
  • Visual Discovery at Pinterest (PDF)
  • Learning to Detect Human-Object Interactions (PDF, Project/Code, Reading Note)
  • Learning Deep Features via Congenerous Cosine Loss for Person Recognition (PDF)
  • Large-Scale Evolution of Image Classifiers (PDF)
  • Deep Variation-structured Reinforcement Learning for Visual Relationship and Attribute Detection (PDF)
  • Twitter100k: A Real-world Dataset for Weakly Supervised Cross-Media Retrieval (PDF, Project/Code)
  • Mixture of Counting CNNs: Adaptive Integration of CNNs Specialized to Specific Appearance for Crowd Counting (PDF)
  • Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art (PDF, Project/Code)
  • Learning Features by Watching Objects Move (PDF, Project/Code)
  • GMS: Grid-based Motion Statistics for Fast, Ultra-robust Feature Correspondence (PDF, Project/Code)
  • ResnetCrowd: A Residual Deep Learning Architecture for Crowd Counting, Violent Behaviour Detection and Crowd Density Level Classification (PDF)
  • Learning Cross-modal Embeddings for Cooking Recipes and Food Images (PDF, Project/Code)
  • Convolutional neural network architecture for geometric matching (PDF, Project/Code)
  • Semantic Compositional Networks for Visual Captioning (PDF, Project/Code)
  • CNN-based Cascaded Multi-task Learning of High-level Prior and Density Estimation for Crowd Counting (PDF)
  • Understanding Black-box Predictions via Influence Functions (PDF)
  • Learning a Repression Network for Precise Vehicle Search (PDF)
  • Visual Graph Mining (PDF)
  • A Deep Multimodal Approach for Cold-start Music Recommendation (PDF)
  • A Multilayer-Based Framework for Online Background Subtraction with Freely Moving Cameras (PDF)
  • A self-organizing neural network architecture for learning human-object interactions (PDF)

INTERESTING FINDS

RESOURCES/PERSPECTIVES

PROJECTS

NEWS/BLOGS

BENCHMARK/LEADERBOARD/DATASET

TOOLKITS

  • XGBoostLSS An extension of XGBoost to probabilistic forecasting
  • Netron is a viewer for neural network, deep learning and machine learning models.
  • Bring Deep Learning to small devices An open source deep learning platform for low bit computation
  • Albumentations fast image augmentation library and easy to use wrapper around other libraries.
  • FeatherCNN
    FeatherCNN is a high performance inference engine for convolutional neural networks.
  • Caffe
    Caffe is a deep learning framework made with expression, speed, and modularity in mind. It is developed by the Berkeley Vision and Learning Center (BVLC) and by community contributors. Yangqing Jia created the project during his PhD at UC Berkeley. Caffe is released under the BSD 2-Clause license.
  • Caffe2
    Caffe2 is a deep learning framework made with expression, speed, and modularity in mind. It is an experimental refactoring of Caffe, and allows a more flexible way to organize computation.
  • Caffe on Intel
    This fork of BVLC/Caffe is dedicated to improving performance of this deep learning framework when running on CPU, in particular Intel® Xeon processors (HSW+) and Intel® Xeon Phi processors
  • TensorFlow
    TensorFlow is an open source software library for numerical computation using data flow graphs. Nodes in the graph represent mathematical operations, while the graph edges represent the multidimensional data arrays (tensors) that flow between them. This flexible architecture lets you deploy computation to one or more CPUs or GPUs in a desktop, server, or mobile device without rewriting code. TensorFlow also includes TensorBoard, a data visualization toolkit.
  • MXNet
    MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to mix the flavours of symbolic programming and imperative programming to maximize efficiency and productivity. In its core, a dynamic dependency scheduler that automatically parallelizes both symbolic and imperative operations on the fly. A graph optimization layer on top of that makes symbolic execution fast and memory efficient. The library is portable and lightweight, and it scales to multiple GPUs and multiple machines.
  • neon
    neon is Nervana’s Python based Deep Learning framework and achieves the fastest performance on modern deep neural networks such as AlexNet, VGG and GoogLeNet. Designed for ease-of-use and extensibility.
  • Piotr’s Computer Vision Matlab Toolbox
    This toolbox is meant to facilitate the manipulation of images and video in Matlab. Its purpose is to complement, not replace, Matlab’s Image Processing Toolbox, and in fact it requires that the Matlab Image Toolbox be installed. Emphasis has been placed on code efficiency and code reuse. Thanks to everyone who has given me feedback - you’ve helped make this toolbox more useful and easier to use.
  • NVIDIA Developer
  • nvCaffe
    A special branch of caffe is used on TX1 which includes support for FP16.
  • dlib
    Dlib is a modern C++ toolkit containing machine learning algorithms and tools for creating complex software in C++ to solve real world problems. It is used in both industry and academia in a wide range of domains including robotics, embedded devices, mobile phones, and large high performance computing environments. Dlib’s open source licensing allows you to use it in any application, free of charge.
  • OpenCV
    OpenCV is released under a BSD license and hence it’s free for both academic and commercial use. It has C++, C, Python and Java interfaces and supports Windows, Linux, Mac OS, iOS and Android. OpenCV was designed for computational efficiency and with a strong focus on real-time applications.
  • CNNdroid
    CNNdroid is an open source library for execution of trained convolutional neural networks on Android devices.
  • tiny dnn
    tiny-dnn is a C++11 implementation of deep learning. It is suitable for deep learning on limited computational resource, embedded systems and IoT devices.

    An introduction to this toolkit: "Deep learning with C++ - an introduction to tiny-dnn" by Taiga Nomi

  • CaffeMex
    A multi-GPU & memory-reduced MAT-Caffe on LINUX and WINDOWS

  • ARCore ARCore is a platform for building augmented reality apps on Android. ARCore uses three key technologies to integrate virtual content with the real world as seen through your phone’s camera
  • CNTK Microsoft Cognitive Toolkit (CNTK), an open source deep-learning toolkit.
  • ONNX ONNX is an open format to represent deep learning models. With ONNX, AI developers can more easily move models between state-of-the-art tools and choose the combination that is best for them. ONNX is developed and supported by a community of partners.
  • PyToune is a Keras-like framework for PyTorch and handles much of the boilerplate code needed to train neural networks.
  • Deep Learning Studio - Desktop DeepCognition.ai is a single user solution that runs locally on your hardware. Desktop version allows you to train models on your GPU(s) without uploading data to the cloud. The platform supports transparent multi-GPU training for up to 4 GPUs. Additional GPUs are supported in Deep Learning Studio – Enterprise.

LEARNING/TRICKS/TIPS

SKILLS

ABOUT CAFFE

SETTING UP

TITLE: ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation

AUTHOR: Adam Paszke, Abhishek Chaurasia, Sangpil Kim, Eugenio Culurciello

ASSOCIATION: University of Warsaw, Purdue University

FROM: arXiv:1606.02147

CONTRIBUTIONS

  1. A novel deep neural network architecture named ENet (efficient neural network) is proposed for real-time semantic segmentation.
  2. A series of design strategies is discussed.

Design Choices

Network Architecture

Readers can refer to the paper for the full network architecture. The network is inspired by the ResNet structure, but the authors redesign it based on the specific task of semantic segmentation and their own intuitions. The initial block and the basic building block (the bottleneck module) are shown in the following figure. After the initial block, a comparatively large encoder is constructed from bottleneck modules, followed by a smaller decoder.

Design Strategy

  1. Feature map resolution: Reducing the feature map resolution has two drawbacks: 1) finer information such as edge shape is lost, and 2) the output is smaller than the original image, whereas segmentation needs an output at the input resolution. The advantage is that filters operating on low-resolution feature maps have a larger receptive field and gather more context. The first problem is addressed by adding more feature maps or by using an upsampling technique.
  2. Early downsampling: Downsampling early is very helpful for boosting the efficiency of the network while preserving performance. The idea is that visual information is highly redundant and that the initial network layers should not directly contribute to classification but act as good feature extractors.
  3. Decoder size: In most previous work, the encoder and decoder have the same size, i.e. the architecture is fully symmetric. In this work, the authors use a larger encoder and a smaller decoder. The encoder is responsible for operating on lower-resolution data and for information processing and filtering. The role of the decoder, in contrast, is to upsample the output of the encoder, only fine-tuning the details.
  4. Nonlinear operations: The paper reports some interesting observations here. The authors investigate the effect of the nonlinearities by training the network with PReLU. All layers in the main branch behave almost exactly like regular ReLUs, while the PReLU weights inside the bottleneck modules are negative. The authors take this to mean that the typical identity shortcut of ResNet does not work well here because of the limited depth of the network.
  5. Information-preserving dimensionality changes: A pooling operation is performed in parallel with a stride-2 convolution and the resulting feature maps are concatenated, which guarantees both efficiency and performance, as the initial block shows.
  6. Factorizing filters: Factorizing a filter gives a kernel of larger size at a lower computational cost. In addition, the deeper network and the additional nonlinear operations help it represent richer functions.
  7. Dilated convolutions: Dilated convolutions are a good way of maintaining feature resolution while boosting efficiency.
  8. Regularization: Spatial Dropout is used to prevent overfitting.
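The arithmetic behind strategies 6 and 7 can be sketched in a few lines of Python. This is only an illustration, not code from the paper: the helper names and the channel count of 128 are assumptions made for the example. It counts the weights of a full 5x5 convolution against a 5x1 convolution followed by a 1x5 convolution, and computes the spatial extent covered by a dilated 3-tap kernel.

```python
def conv_weights(kh, kw, c_in, c_out):
    """Number of weights in a kh x kw convolution (bias ignored)."""
    return kh * kw * c_in * c_out


def factorized_weights(k, c_in, c_out):
    """Weights of a k x 1 convolution followed by a 1 x k convolution.

    Assumes the intermediate feature map keeps c_out channels.
    """
    return conv_weights(k, 1, c_in, c_out) + conv_weights(1, k, c_out, c_out)


def effective_kernel(k, dilation):
    """Spatial extent covered by a k-tap kernel with the given dilation."""
    return k + (k - 1) * (dilation - 1)


channels = 128  # hypothetical channel count, chosen only for illustration

full = conv_weights(5, 5, channels, channels)
factorized = factorized_weights(5, channels, channels)
print("full 5x5 weights:      ", full)
print("5x1 + 1x5 weights:     ", factorized)
print("reduction factor:      ", full / factorized)

# A dilated 3x3 kernel keeps 9 weights but covers a wider window.
for dilation in (1, 2, 4, 8):
    print("dilation", dilation, "-> extent", effective_kernel(3, dilation))
```

With equal input and output channels, the factorized pair needs 2k weights per channel pair instead of k^2, so the saving grows with the kernel size, while dilation widens the window without adding weights or reducing resolution.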

ADVANTAGES

  1. The network runs fast.

DISADVANTAGES

  1. The accuracy is comparatively lower.

TITLE: Factorized Convolutional Neural Networks

AUTHOR: Min Wang, Baoyuan Liu, Hassan Foroosh

ASSOCIATION: Department of EECS, University of Central Florida, Orlando

FROM: arXiv:1608.04337

CONTRIBUTIONS

  1. A new implementation of the convolutional layer is proposed that involves only single in-channel convolutions and a linear channel projection.
  2. Networks using such layers can achieve similar accuracy with significantly less computation.

METHOD

Convolutional Layer with Bases

When $b = k^2$, this layer is equivalent to the standard convolutional layer. The number of multiplications required for this layer is $hwbm(k^2 + n)$, which means that by reducing $b$ and increasing $k$, we can create a layer that has a large convolutional kernel while keeping the complexity low.
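To get a feel for the formula above, here is a minimal Python sketch that evaluates it next to the usual count for a standard convolution. The symbol meanings are my reading rather than something stated in this note: $h$ and $w$ are the feature map height and width, $m$ and $n$ the numbers of input and output channels, $k$ the kernel size and $b$ the number of bases per input channel. The standard count $hwmnk^2$ and the layer sizes below are likewise assumptions for illustration.

```python
def standard_conv_mults(h, w, m, n, k):
    """Multiplications of a standard k x k convolution: h*w*m*n*k^2."""
    return h * w * m * n * k * k


def bases_conv_mults(h, w, m, n, k, b):
    """Multiplications of the layer with bases: h*w*b*m*(k^2 + n)."""
    return h * w * b * m * (k * k + n)


# Hypothetical layer: 56 x 56 feature map, 64 input and 64 output channels.
h, w, m, n = 56, 56, 64, 64

standard_3x3 = standard_conv_mults(h, w, m, n, k=3)
single_basis_3x3 = bases_conv_mults(h, w, m, n, k=3, b=1)
single_basis_5x5 = bases_conv_mults(h, w, m, n, k=5, b=1)

print("standard 3x3:     ", standard_3x3)
print("one basis, 3x3:   ", single_basis_3x3)
print("one basis, 5x5:   ", single_basis_5x5)
print("standard / 5x5-b1:", standard_3x3 / single_basis_5x5)
```

With a single basis, even the 5x5 variant needs far fewer multiplications than the standard 3x3 layer in this configuration, which is the trade-off the paragraph describes: a small $b$ pays for a large $k$.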

Convolutional Layer as Stacked Single Basis Layer

One assumption is that the number of output channels equals the number of input channels, $m = n$, as is the case in ResNet. The modified layer can be considered as a stack of multiple convolutional layers with a single basis each. Residual learning is also introduced in this modified layer, which solves the problem of losing useful information caused by using a single basis.

Topological Connections

An $n$-dimensional topological connection between the input and output channels of a convolutional layer is proposed. Each output channel is connected only to its local neighbors rather than to all input channels.
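The local connectivity idea can be illustrated with the simplest, one-dimensional case. The sketch below is an assumption-heavy toy, not the paper's construction: channels are placed on a ring, each output channel is wired to the input channels within a hypothetical `radius`, and the wrap-around at the ends is my own choice.

```python
def ring_neighbors(channel, num_channels, radius):
    """Input channels wired to one output channel on a 1-D ring.

    Assumes 2 * radius + 1 <= num_channels so that no neighbor repeats.
    """
    return [(channel + offset) % num_channels
            for offset in range(-radius, radius + 1)]


def local_connection_count(num_channels, radius):
    """Total input-output channel connections with local wiring."""
    return num_channels * (2 * radius + 1)


def full_connection_count(num_channels):
    """Total connections when every output sees every input channel."""
    return num_channels * num_channels


num_channels, radius = 64, 2  # hypothetical values for illustration

print("neighbors of channel 0:", ring_neighbors(0, num_channels, radius))
print("local connections:     ", local_connection_count(num_channels, radius))
print("full connections:      ", full_connection_count(num_channels))
```

The number of channel connections grows linearly with the channel count under local wiring, instead of quadratically as in a fully connected channel projection.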

ADVANTAGES

  1. It is an interesting method for speeding up CNNs: the authors claim that the network achieves the accuracy of GoogLeNet while using 3.4 times less computation.

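I haven't done much reading or studying lately, but I did watch a Japanese TV anime that aired in 2007, Moribito: Guardian of the Spirit. It is adapted from Guardian of the Spirit, the first volume of Nahoko Uehashi's children's novel series Moribito, and runs for 26 episodes. The world-building is quite interesting: this fictional world is divided into two realms, the human world that can be seen with the naked eye and the spirit world that cannot. The two exist much like parallel worlds and can affect each other under specific conditions.

In this setting, the water spirit of the spirit world lays an egg once every hundred years. The new egg is incubated in the human world, and after hatching it becomes a water spirit and returns to the spirit world. The water spirit ensures that the human world receives ample rainfall, so that plants and animals flourish and human society can prosper in peace. While it is incubated in the human world, the egg lives inside a host creature, which is called the "Guardian of the Spirit". As hatching approaches, the egg guides the guardian to a place where the human world and the spirit world meet. At that point an earth spirit, the natural enemy of the water spirit, comes to hunt the guardian, and only by escaping this hunt can the water spirit return to the spirit world.

In the story of Guardian of the Spirit, the Second Prince Chagum is chosen as the guardian, but because of an error in the historical records he is believed to be possessed by something ominous and is secretly hunted by the imperial court. By chance the female bodyguard Balsa becomes his protector, and she takes Chagum to live among the common people while evading the court's pursuers. As the star readers who serve the court uncover the cause of the omens of a great drought, they gradually realize that the historical records are wrong, and the court begins to cooperate with Balsa's group to protect both Chagum's life and the prosperity of the human world. As an anime adapted from children's literature, it naturally has a happy ending.

Viewers on Bilibili jokingly call this a story in which no villain ever appears, and that is indeed the case. The characters come into conflict in all sorts of ways, whether from serving different masters or from historical prejudice, yet every one of them holds a thoroughly upright view of the world. Each character displays an admirable quality: the courage and kindness of the Second Prince Chagum, the bodyguard Balsa's faith in her vow and in life, the star reader Shuga's pursuit of truth, the maternal love of Chagum's mother, the Second Empress, the loyalty of Balsa's foster father Jiguro to his closest friend... After watching it, one is left feeling full of positive energy. Beyond that, the anime is packed with foreshadowing: every detail plays a role later in the story, and connecting these callbacks is deeply satisfying each time.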

TITLE: Do semantic parts emerge in Convolutional Neural Networks?

AUTHOR: Abel Gonzalez-Garcia, Davide Modolo, Vittorio Ferrari

ASSOCIATION: CALVIN, University of Edinburgh, UK

FROM: arXiv:1607.03738

CONTRIBUTIONS

  1. An extensive quantitative analysis of the association between the responses of CNN filters and semantic parts.

METHOD

  1. CNNs are trained for object detection or object classification.
  2. Filters that respond significantly to certain semantic parts are selected.
  3. Filters are combined to construct a part detector if necessary.
  4. A regressor is trained for part bounding-boxes.
  5. Discriminative filters are selected in the object classification task.

Observation

There are several interesting observations from the authors.

Differences between layers. Overall, the higher the network layer, the higher the performance. This means that abstract semantic content is represented in the higher layers of the network.

Differences between part classes. Performance varies greatly across part classes. It seems that very discriminative semantic parts are well detected.

Filter combinations. Part detection using a combination of filters always performs better than using the single best filter. This means that a semantic part may be represented jointly by several filters.

Filter sharing across part classes. Filters are shared across different part classes. Some filters are clearly representative of a generic part and work well on all object classes containing it.

The number of emerged semantic parts. Only a modest number of filters respond to semantic parts. The authors conclude that the network does contain filter combinations that can cover some part classes well, but they do not fire exclusively on the part, making them weak part detectors. Moreover, the part classes covered by the semantic filters tend to either cover a large image area, or be very discriminative for their object class.

Discriminative filters in object classification. The filters are measured by how much they contribute to the classification score. On average, 9/256 filters are discriminative for a particular class. The total number of discriminative filters over all 16 object classes amounts to 104. It shows that the discriminative filters are largely distributed across different object classes, with very little sharing.
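A quick back-of-the-envelope check of these numbers, written as a small Python sketch. It uses only the figures quoted above, and since the 9 filters per class is an average, the derived values are rough estimates rather than results from the paper.

```python
total_filters = 256             # filters considered, as quoted above
avg_discriminative_per_class = 9
num_classes = 16
distinct_discriminative = 104   # distinct discriminative filters overall

# If no filter were shared between classes, the distinct count would
# equal the number of (class, filter) assignments.
assignments = avg_discriminative_per_class * num_classes
avg_classes_per_filter = assignments / distinct_discriminative
fraction_discriminative = distinct_discriminative / total_filters

print("class-filter assignments:   ", assignments)
print("avg classes per filter:     ", round(avg_classes_per_filter, 2))
print("fraction of filters involved:", round(fraction_discriminative, 3))
```

Under this rough estimate, a discriminative filter serves fewer than two classes on average, which is consistent with the observation that there is very little sharing.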

Discriminative and semantic filters. On average, 5.5 of the 9 discriminative filters for an object class are semantic filters. This means that only a portion of the filters learned by the CNN are semantic, and many just respond to discriminative patches.