Every coder will feel something seeing this :)
Reading Note: DetNet: A Backbone network for Object Detection
TITLE: DetNet: A Backbone network for Object Detection
AUTHOR: Zeming Li, Chao Peng, Gang Yu, Xiangyu Zhang, Yangdong Deng, Jian Sun
ASSOCIATION: Tsinghua University, Face++
FROM: arXiv:1804.06215
CONTRIBUTION
- The inherent drawbacks of using traditional ImageNet pre-trained models to fine-tune recent object detectors are analyzed.
- A novel backbone, called DetNet, is proposed, which is specifically designed for object detection task by maintaining the spatial resolution and enlarging the receptive field.
METHOD
Motivation
There are two problems with using a classification backbone for object detection. (i) Recent detectors, e.g., FPN, involve extra stages compared with the backbone network for ImageNet classification in order to detect objects of various sizes. (ii) A traditional backbone obtains a large receptive field through a large downsampling factor, which benefits visual classification. However, the spatial resolution is compromised, which makes it hard to accurately localize large objects and to recognize small objects.
To summarize, there are three main problems with using current pre-trained models:
- The number of network stages is different. The extra stages used for object detection, which do not exist in the classification network, have not been pre-trained.
- Weak visibility of large objects. The feature maps with strong semantic information have large strides with respect to the input image, which is harmful for object localization.
- Invisibility of small objects. The information from small objects is easily weakened as the spatial resolution of the feature maps decreases and large context information is integrated.
To address these problems, DetNet has the following characteristics. (i) The number of stages is designed directly for object detection. (ii) Even though more stages are involved, the high spatial resolution of the feature maps is maintained, while a large receptive field is kept by using dilated convolution.
DetNet Design
The main architecture of DetNet is based on ResNet-50. The first 4 stages are kept the same as in ResNet-50. The main differences are as follows:
- The extra stages are merged into the backbone and are later used for object detection as in FPN. Meanwhile, the spatial resolution is fixed at 16x downsampling even after stage 4.
- Since the spatial size is fixed after stage 4, a dilated bottleneck with a $1 \times 1$ convolution projection is used at the beginning of each new stage. The dilated convolution efficiently enlarges the receptive field.
- Since dilated convolution is still time-consuming, stage 5 and stage 6 keep the same number of channels as stage 4 (256 input channels for the bottleneck block). This differs from traditional backbone design, which doubles the channels in each later stage.
The following figure shows the dilated bottleneck with $1 \times 1$ conv projection and the architecture of DetNet.
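To make the resolution argument concrete, here is a minimal sketch comparing feature-map sizes. It assumes a hypothetical 512x512 input, the usual total stride of 32 at stage 5 of ResNet-50, and the fixed stride of 16 described above for DetNet; the helper names are illustrative only. It also uses the standard identity that a k x k kernel with dilation d covers k + (k - 1)(d - 1) pixels per side; the dilation of 2 is only an example value.

```python
def feature_map_size(input_size: int, stride: int) -> int:
    """Spatial size of a feature map for a given total downsampling stride."""
    return input_size // stride


def effective_kernel(kernel: int, dilation: int) -> int:
    """Side length covered by a dilated kernel: k + (k - 1) * (d - 1)."""
    return kernel + (kernel - 1) * (dilation - 1)


# Assumed total strides per stage. ResNet-50 halves the resolution again at
# stage 5; DetNet (as described above) keeps 16x downsampling for stages 4-6.
RESNET50_STRIDES = {"stage4": 16, "stage5": 32}
DETNET_STRIDES = {"stage4": 16, "stage5": 16, "stage6": 16}

if __name__ == "__main__":
    size = 512  # hypothetical input resolution
    for name, strides in (("ResNet-50", RESNET50_STRIDES), ("DetNet", DETNET_STRIDES)):
        sizes = {stage: feature_map_size(size, s) for stage, s in strides.items()}
        print(name, sizes)
    print("3x3 kernel with dilation 2 covers", effective_kernel(3, 2), "pixels per side")
```

Keeping the stride at 16 keeps the later feature maps at the stage-4 resolution, while the dilated kernel still widens the area each output position sees.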
PERFORMANCE
Reading Note: Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
TITLE: Real-Time Rotation-Invariant Face Detection with Progressive Calibration Networks
AUTHOR: Xuepeng Shi, Shiguang Shan, Meina Kan, Shuzhe Wu, Xilin Chen
ASSOCIATION: Chinese Academy of Sciences
FROM: arXiv:1804.06039
CONTRIBUTION
- A real-time and accurate rotation-invariant face detector with progressive calibration networks (PCN) is proposed.
- PCN divides the calibration process into several progressive steps, each of which is an easy task, resulting in accurate calibration with low time cost. The range of full rotation-in-plane (RIP) angles gradually decreases, which helps distinguish faces from non-faces.
- In the first two stages of PCN, only coarse calibrations are conducted, such as calibrations from facing down to facing up, and from facing left to facing right. On the one hand, a robust and accurate RIP angle prediction for this coarse calibration is easier to attain without extra time cost, by jointly learning the calibration task with the classification task and the bounding box regression task in a multi-task learning manner. On the other hand, the calibration is easy to implement by flipping the original image, at very low time cost.
METHOD
Framework
Given an image, all face candidates are obtained according to the sliding window and image pyramid principle, and each candidate window goes through the detector stage by stage. In each stage of PCN, the detector simultaneously rejects most candidates with low face confidences, regresses the bounding boxes of remaining face candidates, and calibrates the RIP orientations of the face candidates. After each stage, non-maximum suppression (NMS) is used to merge those highly overlapped candidates.
PCN progressively calibrates the RIP orientation of each face candidate to upright for better distinguishing faces from non-faces.
- PCN-1 first identifies face candidates and calibrates those facing down to facing up, halving the range of RIP angles from [$-180^{\circ}$,$180^{\circ}$] to [$-90^{\circ}$, $90^{\circ}$].
- Then the rotated face candidates are further distinguished and calibrated to an upright range of [$-45^{\circ}$, $45^{\circ}$] in PCN-2, shrinking the RIP ranges by half again.
- Finally, PCN-3 makes the accurate final decision for each face candidate to determine whether it is a face and predict the precise RIP angle.
The following figure illustrates the framework.
First Stage PCN-1
For each input window $x$, PCN-1 has three objectives: face or non-face classification, bounding box regression, and calibration, formulated as follows:
$$[f, t, g] = F_{1}(x)$$
where $F_{1}$ is the detector in the first stage, structured as a small CNN. Here $f$ is the face confidence score, $t$ is a vector representing the prediction of bounding box regression, and $g$ is the orientation score. Overall, the objective for PCN-1 in the first stage is defined as:
$$\min_{F_{1}} L = L_{cls} + \lambda_{reg} \cdot L_{reg} + \lambda_{cal} \cdot L_{cal}$$
where $\lambda_{reg}$ and $\lambda_{cal}$ are parameters that balance the different losses. The first objective ($L_{cls}$), which is also the primary objective, aims to distinguish faces from non-faces. The second objective ($L_{reg}$) attempts to regress the fine bounding box. The third objective ($L_{cal}$) aims to predict the coarse orientation of the face candidate as a binary classification, telling whether the candidate is facing up or facing down.
The PCN-1 can be used to filter all windows to get a small number of face candidates. For the remaining face candidates, firstly they are updated to the new regressed bounding boxes. Then the updated face candidates are rotated according to the predicted coarse RIP angles.
Second Stage PCN-2
Similar to PCN-1 in the first stage, PCN-2 in the second stage further distinguishes faces from non-faces more accurately, regresses the bounding boxes, and calibrates face candidates. The difference is that the coarse orientation prediction in this stage is a ternary classification of the RIP angle range, telling whether the candidate is facing left, right, or front.
Third Stage PCN-3
After the second stage, all the face candidates are calibrated to an upright quarter of RIP angle range, i.e. [$-45^{\circ}$,$45^{\circ}$]. Therefore, the PCN-3 in the third stage can easily and accurately determine whether it is a face and regress the bounding box. Since the RIP angle has been reduced to a small range in previous stages, PCN-3 attempts to directly regress the precise RIP angles of face candidates instead of coarse orientations.
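The three stages can be read as accumulating one angle per stage. The sketch below assumes that the coarse decisions map to fixed rotations (0 or 180 degrees in PCN-1; -90, 0 or 90 degrees in PCN-2) and that PCN-3 regresses a residual angle. The 0.5 threshold, the order of the three classes, and all function names are assumptions for illustration, not taken from the authors' code.

```python
from typing import Sequence


def pcn1_angle(up_score: float, threshold: float = 0.5) -> float:
    """Binary decision: facing up needs no rotation, facing down needs 180 degrees."""
    return 0.0 if up_score >= threshold else 180.0


def pcn2_angle(scores: Sequence[float]) -> float:
    """Ternary decision over three coarse orientation classes.

    Assumed class order: index 0 -> -90, index 1 -> 0, index 2 -> +90 degrees.
    """
    coarse = (-90.0, 0.0, 90.0)
    best = max(range(3), key=lambda i: scores[i])
    return coarse[best]


def total_rip_angle(up_score: float, scores: Sequence[float], residual: float) -> float:
    """Sum the coarse angles from stages 1 and 2 and the regressed angle from stage 3."""
    return pcn1_angle(up_score) + pcn2_angle(scores) + residual


if __name__ == "__main__":
    # A candidate judged as facing down, then as the +90 class, with a 12 degree residual.
    print(total_rip_angle(0.2, [0.1, 0.2, 0.7], 12.0))
```

Each stage only has to solve a small sub-problem, and the final angle is recovered by adding the per-stage results.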
Accurate and Fast Calibration
The early stages only predict coarse RIP orientations, which is robust to the large diversity and further benefits the predictions of successive stages.
The calibration based on the coarse RIP prediction can be achieved efficiently by flipping the original image three times, which brings almost no additional time cost. The original image is rotated by $-90^{\circ}$, $90^{\circ}$ and $180^{\circ}$ to get image-left, image-right, and image-down. The windows with $0^{\circ}$, $-90^{\circ}$, $90^{\circ}$ and $180^{\circ}$ can then be cropped from the original image, image-left, image-right, and image-down respectively, as the following figure shows.
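Rotations by multiples of 90 degrees need no interpolation, which is why they are so cheap: they reduce to reversing rows and transposing. A minimal sketch on a nested-list "image" follows; the function names are hypothetical, and which direction corresponds to image-left or image-right depends on the sign convention, which is not fixed here.

```python
from typing import List

Image = List[List[int]]


def rotate_180(img: Image) -> Image:
    """Rotate by 180 degrees: reverse the row order and each row."""
    return [row[::-1] for row in img[::-1]]


def rotate_90_clockwise(img: Image) -> Image:
    """Rotate 90 degrees clockwise: reverse the row order, then transpose."""
    return [list(col) for col in zip(*img[::-1])]


def rotate_90_counterclockwise(img: Image) -> Image:
    """Rotate 90 degrees counterclockwise: transpose, then reverse the row order."""
    return [list(col) for col in zip(*img)][::-1]


if __name__ == "__main__":
    original = [[1, 2],
                [3, 4]]
    print(rotate_90_clockwise(original))
    print(rotate_90_counterclockwise(original))
    print(rotate_180(original))
```

With the three rotated copies computed once per image, every candidate window can be cropped from whichever copy matches its predicted coarse orientation.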
CNN Architecture
PERFORMANCE
Reading Note: Pelee: A Real-Time Object Detection System on Mobile Devices
TITLE: Pelee: A Real-Time Object Detection System on Mobile Devices
AUTHOR: Robert J. Wang, Xiang Li, Shuang Ao, Charles X. Ling
ASSOCIATION: University of Western Ontario
FROM: arXiv:1804.06882
CONTRIBUTION
- A variant of DenseNet architecture called PeleeNet for mobile devices is proposed.
- The network architecture of Single Shot MultiBox Detector (SSD) is optimized for speed and then combined with PeleeNet.
METHOD
BUILDING BLOCKS
Two-Way Dense Layer. A 2-way dense layer is used to get different scales of receptive fields. One branch uses a small kernel size (3x3) to capture small-size objects. The other branch stacks two 3x3 convolution layers for larger objects. The structure is shown in the following figure.
Stem Block. This block is placed before the first dense layer for the sake of cost efficiency. It effectively improves the feature expression ability without adding too much computational cost. The structure is shown as follows.
Dynamic Number of Channels in Bottleneck Layer. The number of channels in the bottleneck layer varies according to the input shape to make sure the number of output channels does not exceed the number of its input channels.
Transition Layer without Compression. Experiments show that the compression factor proposed by DenseNet hurts the feature expression, so the number of output channels is kept the same as the number of input channels in transition layers.
Composite Function. Post-activation (Convolution - Batch Normalization - ReLU) is used for speed. With post-activation, all batch normalization layers can be merged with the preceding convolution layer at the inference stage. To compensate for the negative impact on accuracy caused by this change, a shallow and wide network structure is designed. In addition, a 1x1 convolution layer is added after the last dense block to get a stronger representational ability.
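Merging batch normalization into the preceding convolution works because both are affine at inference time. A minimal sketch for a single output channel follows, using the standard folding identities (scale = gamma / sqrt(var + eps), new weights = weights * scale, new bias = (bias - mean) * scale + beta). The function names and the numbers in the example are made up for illustration and are not taken from the Pelee implementation.

```python
import math
from typing import List, Tuple


def conv_output(weights: List[float], bias: float, patch: List[float]) -> float:
    """One output value of a convolution: dot product of kernel and patch, plus bias."""
    return sum(w * x for w, x in zip(weights, patch)) + bias


def batch_norm(y: float, gamma: float, beta: float, mean: float, var: float,
               eps: float = 1e-5) -> float:
    """Inference-time batch normalization with fixed running statistics."""
    return gamma * (y - mean) / math.sqrt(var + eps) + beta


def fold_bn(weights: List[float], bias: float, gamma: float, beta: float,
            mean: float, var: float, eps: float = 1e-5) -> Tuple[List[float], float]:
    """Return conv weights and bias that reproduce conv followed by batch norm."""
    scale = gamma / math.sqrt(var + eps)
    return [w * scale for w in weights], (bias - mean) * scale + beta


if __name__ == "__main__":
    weights, bias = [0.5, -1.0, 2.0], 0.1      # hypothetical kernel for one channel
    gamma, beta, mean, var = 1.5, 0.2, 0.3, 4.0  # hypothetical BN parameters
    patch = [1.0, 2.0, 3.0]

    two_step = batch_norm(conv_output(weights, bias, patch), gamma, beta, mean, var)
    folded_w, folded_b = fold_bn(weights, bias, gamma, beta, mean, var)
    one_step = conv_output(folded_w, folded_b, patch)
    print(abs(two_step - one_step) < 1e-9)
```

After folding, the batch normalization layer disappears from the inference graph, so each Convolution - Batch Normalization pair costs the same as a single convolution.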
ARCHITECTURE
The framework of the work is illustrated in the following table.
OPTIMIZATION FOR SSD
Feature Map Selection. 5 scale feature maps (19x19, 10x10, 5x5, 3x3, and 1x1) are selected. Larger resolution features are discarded for speed acceleration.
Residual Prediction Block. For each feature map used for detection, a residual block (ResBlock) is constructed before conducting prediction, shown in the following figure.
PERFORMANCE
The classification performance on ILSVRC2012 is shown in the following table.
The detection performance on VOC2007 is shown in the following table.
The detection performance on COCO2015 is shown in the following table.
SOME IDEAS
From my own experience, DW (depthwise) convolution is not pruning-friendly, so recent pruning methods such as ThiNet and Net-Trim work poorly on it. This work uses conventional convolutional layers, so those pruning methods may be able to play a role.
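Pitfall Notes
Recently I was packaging a dynamic library that needs to support the ancient Windows XP. My development system is Windows 10, with Visual Studio 2013 as the IDE.
After a round of searching on Google and Baidu, I adopted the most widely mentioned approach. Specifically, it consists of two steps:
- In the project settings, under Configuration Properties -> General -> Platform Toolset, select Visual Studio 2013 - Windows XP (v120_xp).
- In the project settings, under Configuration Properties -> C/C++ -> Code Generation -> Runtime Library, select MT/MTd, for release and debug mode respectively.
At first I did not notice any problem with this. Later I wrote an interface function. Nothing went wrong in release mode, but in debug mode the function calling that interface kept crashing when its stack frame was popped, with the following error
Since I am really unfamiliar with this area, as a last-ditch attempt I changed every MT/MTd to MD/MDd and rebuilt all the dependencies along with my own library. I installed it on the target test machine and, surprisingly, it worked.
Some explanations I found online: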
Debug Assertion Failed! Expression: __acrt_first_block == header
As this is a DLL, the problem might lie in different heaps used for allocation and deallocation (try to build the library statically and check if that will work).
The problem is, that DLLs and templates do not agree together very well. In general, depending on the linkage of the MSVC runtime, it might be problem if the memory is allocated in the executable and deallocated in the DLL and vice versa (because they might have different heaps). And that can happen with templates very easily, for example: you push_back() to the vector inside the removeWhiteSpaces() in the DLL, so the vector memory is allocated inside the DLL. Then you use the output vector in the executable and once it gets out of scope, it is deallocated, but inside the executable whose heap doesn’t know anything about the heap it has been allocated from. Bang, you’re dead.
This can be worked-around if both DLL and the executable use the same heap. To ensure this, both the DLL and the executable must use the dynamic MSVC runtime - so make sure, that both link to the runtime dynamically, not statically. In particular, the exe should be compiled and linked with /MD[d] and the library with /LD[d] or /MD[d] as well, neither one with /MT[d]. Note that afterwards the computer which will be running the app will need the MSVC runtime library to run (for example, by installing “Visual C++ Redistributable” for the particular MSVC version).
You could get that work even with /MT, but that is more difficult - you would need to provide some interface which will allow the objects allocated in the DLL to be deallocated there as well. For example something like:
    #include <string>
    #include <vector>
    // Exported by the DLL so that memory allocated inside the DLL is also freed there.
    __declspec(dllexport) void deallocVector(std::vector<std::string> &x);
    void deallocVector(std::vector<std::string> &x) {
        std::vector<std::string> tmp;
        x.swap(tmp);  // tmp takes over the buffer and frees it when it goes out of scope
    }
(However, this does not work very well in all cases, as it needs to be called explicitly, so it will not be called e.g. in case of an exception. To solve this properly, you would need to provide some interface from the DLL that wraps the vector under the hood and takes care of proper RAII.)
EDIT: the final solution was actually to have all of the projects (the exe, the dll and the entire GoogleTest project) built in Multi-threaded Debug DLL (/MDd) (the GoogleTest projects are built in Multi-threaded Debug (/MTd) by default)
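To be honest, my understanding of how computers work is seriously lacking. When I run into a slightly more specialized problem, I can only try the methods I find online. If one works, I do not dig any deeper; if it does not, I do not know why it failed and can only try another method. :-(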
Using Evermonkey
I’ve been using VSCode for a while and have gotten used to Markdown, which EverNote does not support yet. So I wondered whether any extension could help. Luckily, evermonkey showed up.
Installation
There are 3 steps to use this extension.
- Get a developer token. Currently, EverNote does not accept applications for tokens on their official website, but a token can be obtained by emailing their customer service. I got mine in only one or two days.
- Install the evermonkey extension in VSCode.
- Set evermonkey.token and evermonkey.noteStoreUrl in settings.
Usage
Open the command palette with F1 or Ctrl+Shift+P, then type
- ever new to start a new blank note.
- ever open to open a note in a tree-like structure.
- ever search to search notes using EverNote grammar.
- ever publish to publish the note currently being edited to the EverNote server.
- ever sync to synchronize the EverNote account.
Shortcomings
Currently, third-party extensions only support synchronizing files. The files cannot be modified in the apps. For example, I can only edit a note in VSCode, not in the EverNote application.
Useful Git Commands
List the files that differ between two branches
    git diff branch1 branch2 --stat
List the differences between two branches in detail
    git diff branch1 branch2
Replace one file in branch2 with the version from branch1
    git checkout branch2
    git checkout --patch branch1 filename
Start a new branch
    git checkout -b NewBranch
List all branches, including remote ones
    git branch -a
Reading Note: MobileNetV2: Inverted Residuals and Linear Bottlenecks
TITLE: MobileNetV2: Inverted Residuals and Linear Bottlenecks
AUTHOR: Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen
ASSOCIATION: Google
FROM: arXiv:1801.04381
CONTRIBUTION
- The main contribution is a novel layer module: the inverted residual with linear bottleneck.
METHOD
BUILDING BLOCKS
Depthwise Separable Convolutions. The basic idea is to replace a full convolutional operator with a factorized version that splits convolution into two separate layers. The first layer, called a depthwise convolution, performs lightweight filtering by applying a single convolutional filter per input channel. The second layer is a $1 \times 1$ convolution, called a pointwise convolution, which is responsible for building new features through computing linear combinations of the input channels.
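The saving from this factorization can be checked with simple multiply-add counts. The sketch below assumes stride 1 and an output of the same spatial size as the input; a standard convolution then costs h * w * d_in * d_out * k * k multiply-adds, while the depthwise plus pointwise pair costs h * w * d_in * (k * k + d_out). The function names and the layer sizes in the example are hypothetical.

```python
def standard_conv_cost(h: int, w: int, d_in: int, d_out: int, k: int) -> int:
    """Multiply-adds of a full k x k convolution on an h x w x d_in input."""
    return h * w * d_in * d_out * k * k


def separable_conv_cost(h: int, w: int, d_in: int, d_out: int, k: int) -> int:
    """Multiply-adds of a k x k depthwise convolution followed by a 1x1 pointwise one."""
    depthwise = h * w * d_in * k * k
    pointwise = h * w * d_in * d_out
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer: 56x56 feature map, 64 -> 128 channels, 3x3 kernel.
    full = standard_conv_cost(56, 56, 64, 128, 3)
    separable = separable_conv_cost(56, 56, 64, 128, 3)
    print(full, separable, round(full / separable, 2))
```

For a 3x3 kernel the ratio approaches k * k = 9 as the number of output channels grows, which is where the usual "8 to 9 times cheaper" figure comes from.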
Linear Bottlenecks. It has long been assumed that manifolds of interest in neural networks could be embedded in low-dimensional subspaces. Two properties are indicative of the requirement that the manifold of interest should lie in a low-dimensional subspace of the higher-dimensional activation space:
- If the manifold of interest remains non-zero volume after ReLU transformation, it corresponds to a linear transformation.
- ReLU is capable of preserving complete information about the input manifold, but only if the input manifold lies in a low-dimensional subspace of the input space.
Assuming the manifold of interest is low-dimensional we can capture this by inserting linear bottleneck layers into the convolutional blocks.
Inverted Residuals. Inspired by the intuition that the bottlenecks actually contain all the necessary information, while an expansion layer acts merely as an implementation detail that accompanies a non-linear transformation of the tensor, shortcuts are used directly between the bottlenecks. In residual networks the bottleneck layers are treated as low-dimensional supplements to high-dimensional “information” tensors.
The following figure gives the inverted residual block. The diagonally hatched texture indicates layers that do not contain non-linearities. It provides a natural separation between the input/output domains of the building blocks (bottleneck layers), and the layer transformation – that is a non-linear function that converts input to the output. The former can be seen as the capacity of the network at each layer, whereas the latter as the expressiveness.
And the following table gives the basic implementation structure.
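As a rough illustration of how tensor shapes move through such a block, the sketch below traces an expand / depthwise / linear-project sequence. The layer order, the use of ReLU6, the "same" padding, and the rule for when a shortcut applies are assumptions based on the usual description of the inverted residual block; the function names are hypothetical.

```python
from typing import List, Tuple

Shape = Tuple[int, int, int]  # (height, width, channels)


def inverted_residual_shapes(h: int, w: int, c_in: int, c_out: int,
                             expansion: int, stride: int) -> List[Tuple[str, Shape]]:
    """Trace tensor shapes through expand -> depthwise -> linear projection."""
    hidden = c_in * expansion
    # Ceil division, assuming "same" padding in the strided depthwise layer.
    h_out, w_out = -(-h // stride), -(-w // stride)
    return [
        ("input (bottleneck)", (h, w, c_in)),
        ("1x1 conv + ReLU6 (expand)", (h, w, hidden)),
        ("3x3 depthwise + ReLU6", (h_out, w_out, hidden)),
        ("1x1 conv, no non-linearity (project)", (h_out, w_out, c_out)),
    ]


def uses_shortcut(c_in: int, c_out: int, stride: int) -> bool:
    """A shortcut between bottlenecks only fits when input and output shapes match."""
    return stride == 1 and c_in == c_out


if __name__ == "__main__":
    for name, shape in inverted_residual_shapes(56, 56, 24, 24, expansion=6, stride=1):
        print(f"{name:40s} {shape}")
    print("shortcut:", uses_shortcut(24, 24, 1))
```

The thin tensors at both ends are the bottlenecks that the shortcut connects, and the wide tensor in the middle only exists inside the block.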
ARCHITECTURE
PERFORMANCE
Reading Note: Neural Aesthetic Image Reviewer
TITLE: Neural Aesthetic Image Reviewer
AUTHOR: Wenshan Wang, Su Yang, Weishan Zhang, Jiulong Zhang
ASSOCIATION: Fudan University, China University of Petroleum, Xi’an University of Technology
FROM: arXiv:1802.10240
CONTRIBUTION
- The problem is whether computer vision systems can perceive image aesthetics and generate reviews or explanations as humans do. This is the first work to investigate this problem.
- By incorporating shared aesthetically semantic layers at a high level, an end-to-end trainable NAIR architecture is proposed, which can approach the goal of performing aesthetic prediction as well as generating natural-language comments related to aesthetics.
- To enable this research, the AVA-Reviews dataset is collected, which contains 52,118 images and 312,708 comments.
METHOD
The framework of the work is illustrated in the following figure. The main idea of this work is to learn image aesthetic classification and vision-to-language generation using a multi-task framework.
The authors tried two designs, Model-I and Model-II. The difference between the two architectures is whether there are task-specific embedding layers for each task in addition to the shared layers. The potential limitation of Model-I is that some task-specific features can not be captured by the shared aesthetically semantic layer. Thus a task-specific embedding layer is introduced.
The image aesthetic classification part is a typical binary classification task. For the comment generation part, an LSTM is applied, whose input is the high-level visual feature vector of an image.
PERFORMANCE
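Reading Notes on 《极简主义》 (Minimalism)
Ever since I left to start a company, I have been in a rather chaotic state: how to manage a team, how to keep up my technical skills, how to release pressure and balance life and work... I may not find satisfying answers in the near future, but at least from now on I will work hard to look for them.
There is a saying: "One kind of confusion comes from thinking too much and doing too little." It is true. When I keep examining myself and do not know what to do, that is exactly the time to act. For example, I have been reading fewer papers recently and keep worrying about losing touch with the technology and falling behind. It would be better to catch up right away and read more recently published papers. Likewise, to ease my anxiety in other areas, I am going to start reading more books to recharge. This book, 《极简主义》, is the start.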
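Principle 1: Things Are Actually Simple
Many things are complicated, and most of the time we look for complicated solutions, but the simpler a solution is, the more feasible it is and the more likely it is to be right. Looking back, ever since my master's studies I seem to have imagined that I should be doing complicated things: complicated algorithms, complicated code, complicated systems... In fact, the solutions that really solve problems are not that complicated. The book's suggestions are:
- Look for simple solutions. Always ask yourself, "What is the simplest way to get this done?"
- Describe a thing clearly in no more than 25 words.
- If you find yourself adopting a complicated solution or line of thinking, you may already be on the wrong path. So how do we define complicated?
- Ask only simple questions: Who? What? Why? Where? When? How did it happen? What was the result?
- Seek only simple answers.
- Remember to keep things simple and easy to understand: treat your audience as a 6-year-old child, then explain.
- Break out of habitual thinking and use lateral thinking.
The basic requirement of "things are actually simple" is that we ask and answer in simple ways, and at the same time seek simple answers and questions from others.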
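Principle 2: Figure Out What You Want to Do
If you do not know which harbor you are sailing to, then it makes no difference to you whether the wind blows from the southeast or the northwest.
I feel this principle and "things are actually simple" complement each other: once we can simplify the questions and answers, we are not far from "figuring out what we want to do". The simplest way to figure it out is to make a plan: visualize the goal we want to reach and set a criterion for when a task ends, that is, know how far a task must go before it can count as finished.
Besides knowing what we want to do, two more important questions are:
- Truly understanding what we want to do
- Finding out whether this is also what other people want done
For the first question, besides knowing what we are going to do, we also need to know the reason behind it. We need to ask ourselves: "What goal is this work ultimately meant to achieve?"
For the second question, what we need to think about is how to let all stakeholders benefit, that is, a win-win. Keeping stakeholders happy is the key to success. We need to tell stakeholders what they will get, and what they get must be what they need.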
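Principle 3: Everything Has Continuity
When I first read this title, I thought it meant that everything has "inertia" and affects other things. Later I found that it is actually about carrying out work in a planned way: completing a task consists of a sequence of small, consecutive events.
To make things continuous, we need to make some efforts, including
- Make a plan at the very beginning
- Make the plan detailed and thorough
- State your intentions clearly
- Make good use of knowledge and assumptions
- Know how to use cause and effect
- Record what has already happened
These efforts build on one another, and the core is "making a good plan". What is a good plan? A detailed and thorough one. How do we make it detailed and thorough? First, we need to state the goal clearly, which is exactly what Principle 2 asks of us. Second, when making a plan, many things will not be encountered right away, so we need to anticipate them, based on existing knowledge and assumptions and on the causal relationships between events. Finally, we record past experience, which turns into the knowledge used in items 4 and 5.
One scene in the book really resonated with me: "firefighting". The book describes it like this:
In the morning you arrive at the office and look through your to-do list. Just as you start on the first item, someone tells you to attend a meeting at 9:30. During the meeting someone knocks on the door looking for you and says, "Can I take a few minutes of your time?" While you are talking with him, your mobile phone rings, so you have to take the call. Before the call is over, your computer dings to tell you that an email has arrived. Right after that your desk phone rings...
It really is like this. In the six-plus months since starting the company, I have often been called over by one colleague, then stopped by an intern, then pulled into chores... From now on I need to make a plan and then stick to it.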
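Principle 4: If You Do Not Do It, It Will Never Get Done
First, important things get said three times: start doing! Start doing! Start doing!
The prerequisite for starting is applying Principles 2 and 3 well. Then comes the question of how to do it. The tools the book offers include
- Assign each piece of work to a person
- The dance card
- Maximize the strength of the team
The first tool says that every piece of work should have an owner; there should be no situation where either you or I could do it. A specific piece of work may belong to a team, but exactly one person must be responsible for it. And a piece of work can always be split up among the individual members of the team.
The second tool is really a tool for estimating workload. When we find that the workload far exceeds what we can bear, we need to prioritize the work and give up some of it as appropriate. It reminds me of a common saying: you have to give something up to gain something. How to set priorities goes back to Principles 1, 2, and 3. My feeling from reading this book is that things move in an upward spiral: planning a piece of work, or learning what this book teaches, also requires applying these tools repeatedly to reach the goal.
The third tool is a way to maximize what team members deliver. People can be divided into five categories:
- Stars. These people like the specific work, have all the necessary skills, and will almost certainly get it done. Let them do the work their own way and interfere as little as possible.
- Dependable people. These people are willing to work and know how to do it. They may not be that enthusiastic about this particular work, but they will most likely finish it. Do not get in their way too much, but do not place one hundred percent confidence in them either.
- Uncertain people. For various reasons, these people may well not complete the work properly. We need to sort them out as soon as possible: assign them some work and judge their ability by the results. If they complete it reasonably well, they belong to the second category; if not, to the fifth.
- Trainees. In any case they are newcomers. Until it is confirmed that they are able to do the work, they need hands-on guidance, formal training, and detailed management. Make sure they can at least become people of the second category.
- Hopeless people. They will not complete the task, so we need to find another way to get the work done. These people need to be dealt with appropriately, which includes dismissing or reforming them.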
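Principle 5: Things Often Turn Out Differently From What You Expect
However well the plan is made, and even once execution has started, some things will always be out of our control, and some "surprises" will always ambush us. How should we handle them?
- Contingency measures
- Risk management
The first tool means that we anticipate some problems in advance; what we need to do is plan what we will do if a problem really occurs.
The second tool requires us to assess possible problems, including their probability and the damage they would cause. This helps us avoid the most serious problems as far as possible, in the spirit of "choosing the lesser of two evils".
My understanding is that we need a complete plan, that is, when applying Principle 2 we should already consider the problems we may run into. For project management, foreseeing these risks may take accumulated experience and requires continual practice and reflection.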
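Principle 6: Clearly Define the Outcome
Once we have put Principles 1 through 5 into practice, the basic framework for doing something is in place, but some details still need attention. This principle tells us that while carrying out a plan, each piece of work can only be in one of two states: done or not done. To judge whether it is done, we need to "clearly define the outcome".
In fact, if we have started practicing Principle 3, we have already started on Principle 6, because we have broken a goal down into small pieces of work; until these small pieces are finished, the goal is not finished. With these small pieces, however, we can effectively monitor how much of the big goal has been completed. As for how to assess whether a small piece of work is done, we can turn to Principle 2 again: what exactly are we trying to do? Each fine-grained piece of work should have a clear objective and form of deliverable, something we can actually see and control. By checking the deliverable, we can judge whether a piece of work is done.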
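Principle 7: Learn to See Things From Other People's Point of View
Here the author gives two tools:
- Try putting yourself in other people's shoes
- Satisfy the stakeholders' win conditions as far as possible
First, when you run into setbacks dealing with people at any level, putting yourself in their position as far as you can makes it easier to understand their views and actions. This is the often-mentioned "being considerate of your subordinates" and "seeing things from your manager's point of view". Both are easy to say and very hard to do. For the former, when we carry responsibility and pressure, it is hard to handle problems calmly. As for the latter, when our own horizon is not yet broad enough, it is almost impossible to have that kind of vision, or sometimes it is a matter of "if you do not hold the post, you do not concern yourself with its affairs".
Second, only when everyone gains something, and gains what they themselves want rather than what we want to give, can everyone work together smoothly.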