Paper Read: Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

Date: 2019-11-18



Robust Deep Multi-modal Learning Based on Gated Information Fusion Network

2018-07-27 14:25:26


Paper: https://arxiv.org/pdf/1807.06233.pdf

Related Papers:

1. Infrared and visible image fusion methods and applications: A survey. [Paper]

2. Chenglong Li, Xiao Wang, Lei Zhang, Jin Tang, Hejun Wu, and Liang Lin. WELD: Weighted Low-rank Decomposition for Robust Grayscale-Thermal Foreground Detection. IEEE Transactions on Circuits and Systems for Video Technology (T-CSVT), 27(4): 725-738, 2017. [Project page with Dataset and Code]

3. Chenglong Li, Xinyan Liang, Yijuan Lu, Nan Zhao, and Jin Tang. RGB-T Object Tracking: Benchmark and Baseline. [arXiv] [Dataset: Google drive, Baidu cloud] [Project page]

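This paper addresses multi-modal fusion and proposes a gate-based fusion strategy that adaptively combines information from multiple modalities. The authors apply the method to object detection; the overall pipeline is shown in the figure below: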

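As shown in the figure above, the authors use two separate networks to extract features from the two modalities. The network consists of a standard VGG-16 plus 8 extra convolutional layers. In addition, the authors propose a new GIF (Gated Information Fusion) network to fuse information across modalities and obtain better results. The motivation is that the modalities carry complementary information, but some of it helps more while some is of poor quality and contributes little, so fusing them adaptively should give a better result.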

Gated Information Fusion Network (GIF):

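As shown in the figure above:

The input to the GIF network is the already extracted CNN feature maps, here $F_1$ and $F_2$. These two features are concatenated to obtain $F_G$. The network has two parts:

1. the information fusion network (Figure 2, the part outside the dashed box);

2. the weight generation network (WG Network, i.e. Figure 2, inside the dashed box);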

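The Weight Generation Network applies two 3×3×1 convolution kernels to the concatenated feature map $F_G$ and feeds the results into a sigmoid function (the gate layer), which outputs the corresponding weights $w_1$ and $w_2$.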

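The information fusion network multiplies each original feature map element-wise by its weight to obtain the weighted feature maps, concatenates the two, and applies a 1×1×2k convolution kernel to obtain the final feature map.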
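To make the data flow concrete, here is a minimal NumPy sketch of the two parts described above. It is an illustration under assumptions, not the authors' implementation: the function names (`conv3x3_single`, `conv1x1`, `gated_fusion`) and the `params` keys are hypothetical, each gate is assumed to be a single-channel map broadcast over all k channels, the 3×3 convolutions are assumed to be zero-padded, and the final 1×1 convolution is assumed to map 2k channels back to k.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def conv3x3_single(x, kernel, bias=0.0):
    """Zero-padded 3x3 convolution (cross-correlation) with one output channel.

    x: (C, H, W), kernel: (C, 3, 3) -> (H, W)
    """
    _, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.full((h, w), bias, dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            # Contract over the channel axis for this kernel offset.
            window = padded[:, dy:dy + h, dx:dx + w]
            out += np.tensordot(kernel[:, dy, dx], window, axes=1)
    return out


def conv1x1(x, weight, bias):
    """1x1 convolution. x: (C_in, H, W), weight: (C_out, C_in) -> (C_out, H, W)"""
    return np.tensordot(weight, x, axes=1) + bias[:, None, None]


def gated_fusion(f1, f2, params):
    """Fuse two (k, H, W) feature maps; returns (fused, w1, w2)."""
    # Concatenate along channels: F_G has 2k channels.
    f_g = np.concatenate([f1, f2], axis=0)

    # Weight generation network: one 3x3 conv + sigmoid gate per modality.
    w1 = sigmoid(conv3x3_single(f_g, params["c1"], params["b1"]))
    w2 = sigmoid(conv3x3_single(f_g, params["c2"], params["b2"]))

    # Information fusion network: weight, concatenate, then 1x1 conv.
    f_f = np.concatenate([f1 * w1, f2 * w2], axis=0)
    fused = conv1x1(f_f, params["cj"], params["bj"])
    return fused, w1, w2


# Example usage with random features and randomly initialised (untrained) parameters.
rng = np.random.default_rng(0)
k, height, width = 4, 5, 6
f1 = rng.normal(size=(k, height, width))
f2 = rng.normal(size=(k, height, width))
params = {
    "c1": rng.normal(scale=0.1, size=(2 * k, 3, 3)), "b1": 0.0,
    "c2": rng.normal(scale=0.1, size=(2 * k, 3, 3)), "b2": 0.0,
    "cj": rng.normal(scale=0.1, size=(k, 2 * k)), "bj": np.zeros(k),
}
fused, w1, w2 = gated_fusion(f1, f2, params)
```

Because the gates pass through a sigmoid, every entry of `w1` and `w2` lies strictly between 0 and 1, so a low-quality modality can be attenuated per spatial location but is never amplified.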

To sum up, the whole process can be written as:
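Reconstructed from the description above, with assumed notation: $\oplus$ for channel-wise concatenation, $\odot$ for element-wise multiplication, $*$ for convolution, $\sigma$ for the sigmoid, $C_1, C_2$ for the two 3×3×1 kernels, $C_J$ for the 1×1×2k kernel, $b_1, b_2, b_J$ for bias terms, and $F_F$, $F_J$ for the weighted concatenation and the final feature map. Any activation applied after the final convolution is not specified in the text above and is omitted here.

$$
\begin{aligned}
F_G &= F_1 \oplus F_2 \\
w_1 &= \sigma(C_1 * F_G + b_1), \qquad w_2 = \sigma(C_2 * F_G + b_2) \\
F_F &= (F_1 \odot w_1) \oplus (F_2 \odot w_2) \\
F_J &= C_J * F_F + b_J
\end{aligned}
$$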

== Done !