Narcissus's blog Stay Follish, Stay Hungry

论文阅读:Object Detection Using Scene-Level Context and Instance-Level Relationships

2018-07-10
Narcissus

SIN网络结构

cls  Softmax  O  Concatenate  Figure 3. SIN: The mewo Our Method. ly we get a fixed number of ROIs from an input image. Each ROI is pooled into a  fixed-size feature map and then to a feature vector b a full connec  same way. then we concatenate the descri ors of every two ROIS int edge .  inference method is triggered, and fln s a o eac  the corresponding ROI. The who  the base detection framework-).  k is trained nd-t  with the ori inal  e. e extract the whole ima feature as scene  eratively update the node state. an e a rate y  to predict the category and refine the location Of  study exploits Faster R-CNN as

网络结构解释

首先对输入图像利用faster-rcnn进行物体检测,得到一些ROIs,对ROI进行ROI pooling得到固定size的feature map,再经过全连接层映射成一些节点(nodes,即feature vector),并两两组合得到edges,即得到object relationship context。对整个图像提取特征得到scene context。将两种context送入到GRU(选择GRU的原因是其具有记忆功能,能融合来自多方面的context,并传递到下一次更新中,即有消息传递功能),这个GRU的原理如下:

ht (C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image002.png)  1 (nodes')  Figure 4. An illustration of GRU.  update gate z selects  whether the hi dden s tate ht is to updated with a new hi  den state 11. The reset gate r decides whether the previous hidden  ignored.

这里r开关和z开关的详细定义及作用可看论文

全局context与局部context

对于每一个物体的定位,它利用了来自scene GRU的信息和Edge GRU的信息,如下:

Figure 5. Structure Inference. For Object Vi, the input Of scene  GRU is scene context message m: , and the initial hidden state  is the feature For message from node to  node is controlled by edge e, These messages from all other  objects are integrated as m: to input the edge GRU. The initial  hidden State Of edge GRU is also fie. Then these two sets Of GRU  output ensemble together as eventual updated node state.

形式化的表示为:

5, the integrated message to node is calculated by  Tn3 *  je-v  where  Wp and We, are learnable weight matrixes. Using m  pooliqg cau most important message while us-

其中的空间位置特征表示为:

(Xi — (Yi , log(C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image005.png  ) , tog(

最后得到隐藏层t+1的特征表示(通过mean pooling 得到):

For node v', it receives messages both from the other  nodes and scene context. Eventually we get the compre-  hensive representation , which denotes the node state.  In our current study, we empirical find that (C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image006.png)  pooling and concatenation, so  (8)  where is the output of scene GRU, and h; +1 denotes  the output of edge GRU.

结果分析

  • 量化结果是一个总体结果,肯定是好的,这个需要跑代码看,数据是否对。

  • qualitative results

Scene module结果中比较值得一说的是failure case:将海里的airoplain识别成了boat,说明对这种罕见的场景来说scene context是不好的

(C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image007.png) ru'hins dct"ted an with a  (d) are detected (e) chair ted  (n only a  Figure 8. Qualitative results Baseline vs. Scene on VOC. In  pair Of detection results (top vs. the top is based on  baseline, and the is detection result of Scene,

  • Edge module的结果:定位错误的结果减少;能够减少重复检测的结果:

(C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image008.png) aeroplane  (b) Edge  (d) Edge  Figure 9. Analysis of 'Iöp-Ranked False Positives. Pie charts:  fraction of detections that are correct (Cot) or false positive due  to confusion With similar Objects (Simi,  confusion with other VOC objects ((hh). or confusion with back-  ground or unlatkled objects (BG)_ Len: results of the baseline  Faster R -CNN, Right; results Of Edge, Loc errors are fewer than  baseline on aemplane and bus.

(C:\Users\nian\OneDrive\NoteBook_posts\assets\clip_image009.png) are  (b) of Edge  (d r&lts of Edge  Figure results Edge on VOC. In  every pair of detection results. the left is based on bmseline. and  the right is detection result Of Edge.

  • 结合两种context的结果:

Table 6. performance on VOC 2007 test Using Different  semble Ways and lime Ste'". All methcxis are trained VOC  Emsemble Wa  concatenation  max- lin  mean-pooling  mean-poo li ng  mean- li  Time S  70.2  70.4  69.8  fective fusion Of the two separated updated hidden State h s  and he Of mules Obtained by the modules Of  Scene and Edge.

题注

Comments

Content