http://www.ee.columbia.edu/ln/dvmm/publications/17/zhang2017visual.pdf
Visual Translation Embedding Network for Visual Relation Detection. Hanwang Zhang†, Zawlin Kyaw‡, Shih-Fu Chang†, Tat-Seng Chua‡. †Columbia University, ‡National University of Singapore
Highlights
Existing work
Main ideas
Translation Embedding. The core difficulty of visual relation detection is combinatorial: with N object classes and R predicates there are N^2·R possible relations, so treating each relation triplet as its own class explodes. A commonly used way to attack this problem:
Inspired by Translation Embedding (TransE), the paper models a visual relation as a translation from subject to object in feature space: in a low-dimensional embedding, a relation triple becomes a vector translation, e.g. person + ride ≈ bike.
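The translation constraint can be sketched numerically. A minimal example (the toy vectors and the nearest-translation decision rule below are illustrative assumptions, not the paper's exact softmax formulation):

```python
import numpy as np

def predict_predicate(x_s, x_o, t_pred):
    """Score each predicate p by how well x_s + t_p ~= x_o holds,
    i.e. by the distance between the learned translation t_p and
    the observed offset (x_o - x_s). Lower distance = better match."""
    offset = x_o - x_s                           # observed subject -> object shift
    dist = np.linalg.norm(t_pred - offset, axis=1)
    return int(np.argmin(dist))

# Toy 3-d embedding with two predicates: 0 = "ride", 1 = "wear".
t_pred = np.array([[1.0, 0.0, 0.0],    # t_ride
                   [0.0, 1.0, 0.0]])   # t_wear
person = np.array([0.2, 0.1, 0.0])
bike   = person + np.array([0.95, 0.05, 0.0])   # person + ride ~= bike

print(predict_predicate(person, bike, t_pred))  # -> 0 ("ride")
```

In the paper the translations are trained jointly with the feature projections, but the key property is visible here: the predicate lives in the *difference* between object and subject embeddings, so only N object models and R translation vectors are needed instead of N^2·R relation classes.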
Knowledge Transfer in Relation. Object recognition and predicate recognition reinforce each other. By combining three cues (class name, location, visual feature) and training the network end to end, the implicit dependencies between objects and predicates can be learned by the network.
Algorithm
Visual Translation Embedding
Loss function
Feature Extraction Layer
classname + location + visual feature. Different cues contribute differently to different kinds of predicates (verbs, prepositions, spatial relations, comparatives).
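The three cues can be assembled into one object feature by concatenation. A minimal sketch (function name, feature dimensions, and the box-normalization scheme are my own illustrative assumptions):

```python
import numpy as np

def object_feature(class_probs, box, img_w, img_h, visual):
    """Concatenate the three per-object cues: classeme (class
    probabilities), box geometry normalized by image size, and
    the visual (appearance) feature."""
    x, y, w, h = box
    location = np.array([x / img_w, y / img_h, w / img_w, h / img_h])
    return np.concatenate([class_probs, location, visual])

classeme = np.array([0.1, 0.8, 0.1])    # toy: 3 object classes
visual   = np.zeros(16)                 # toy appearance feature
feat = object_feature(classeme, (50, 40, 100, 80), 640, 480, visual)
print(feat.shape)  # -> (23,)  i.e. 3 + 4 + 16
```

Keeping the cues in separate, fixed slots of the vector is what lets one inspect afterwards which cue (classeme, location, or appearance) drives each predicate type.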
Bilinear Interpolation
In order to achieve object-relation knowledge transfer, the relation error should be back-propagated to the object detection network so that it also refines the object detections. We replace the RoI pooling layer with bilinear interpolation [18], which is a smooth function of its inputs:
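The point of the replacement is differentiability: hard RoI pooling snaps coordinates to the grid, while bilinear interpolation blends the four surrounding values, so gradients can flow back through the sampling location to the box coordinates. A minimal single-point sketch (a full RoI layer would sample a grid of such points):

```python
import numpy as np

def bilinear_sample(fmap, x, y):
    """Sample feature map `fmap` (H x W) at fractional location (x, y)
    by blending the four surrounding grid values. Smooth in (x, y),
    unlike hard RoI pooling's nearest-cell lookup."""
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, fmap.shape[1] - 1)
    y1 = min(y0 + 1, fmap.shape[0] - 1)
    dx, dy = x - x0, y - y0
    top    = (1 - dx) * fmap[y0, x0] + dx * fmap[y0, x1]
    bottom = (1 - dx) * fmap[y1, x0] + dx * fmap[y1, x1]
    return (1 - dy) * top + dy * bottom

fmap = np.array([[0.0, 1.0],
                 [2.0, 3.0]])
print(bilinear_sample(fmap, 0.5, 0.5))  # -> 1.5, the mean of the four corners
```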
Results
Translation embedding: +18%
Object detection: +0.3% ~ +0.6%
State of the art:
Open questions