Paper: https://arxiv.org/pdf/1502.03044.pdf
Code: https://github.com/kelvinxu/arctic-captions & https://github.com/yunjey/show-attend-and-tell & https://github.com/jazzsaxmafia/show_attend_and_tell.tensorflow
Main Contributions
In this paper, the authors bring the attention mechanism, previously used in neural machine translation, into neural image captioning, and propose two different attention mechanisms: a 'Soft' Deterministic Attention Mechanism and a 'Hard' Stochastic Attention Mechanism. The figure below shows the overall framework of the "Show, Attend and Tell" model.
The key question in the attention mechanism is how to compute the context vector z_t from the image feature vectors a_i. For each location i, an attention model f_att produces a score e_ti = f_att(a_i, h_{t-1}) from the annotation vector and the decoder's previous hidden state, and a softmax over the e_ti yields the weights α_ti. In the hard attention mechanism, α_ti plays the role of the probability that region vector a_i is selected as the decoder's input at time t; exactly one region is selected, so an indicator variable s_{t,i} is introduced that equals 1 when region i is selected and 0 otherwise. In the soft attention mechanism, α_ti is the proportion that region vector a_i contributes to the decoder's input at time t, i.e. z_t = Σ_i α_ti · a_i. (See the notes "Attention机制论文阅读——Soft和Hard Attention" and "Multimodal —— 看图说话(Image Caption)任务的论文笔记(二)引入attention机制".)
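As a concrete illustration, here is a minimal PyTorch-style sketch of the soft attention step described above. The module name, the single-layer additive form of f_att, and the dimension choices are my own assumptions for readability, not the authors' exact implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SoftAttention(nn.Module):
    """Minimal soft attention sketch: z_t = sum_i alpha_ti * a_i."""
    def __init__(self, feat_dim=512, hidden_dim=512, attn_dim=256):
        super().__init__()
        self.feat_proj = nn.Linear(feat_dim, attn_dim)      # projects each a_i
        self.hidden_proj = nn.Linear(hidden_dim, attn_dim)  # projects h_{t-1}
        self.score = nn.Linear(attn_dim, 1)                 # e_ti = f_att(a_i, h_{t-1})

    def forward(self, a, h_prev):
        # a: (batch, L=196, D=512) annotation vectors; h_prev: (batch, hidden_dim)
        e = self.score(torch.tanh(
            self.feat_proj(a) + self.hidden_proj(h_prev).unsqueeze(1)))  # (batch, L, 1)
        alpha = F.softmax(e.squeeze(-1), dim=1)   # (batch, L), sums to 1 over regions
        z = (alpha.unsqueeze(-1) * a).sum(dim=1)  # (batch, D) expected context vector
        return z, alpha
```

For hard attention, one would instead sample a single region index from a categorical distribution over α_t and take z_t to be that region's a_i; since sampling is not differentiable, the paper trains this variant by maximizing a variational lower bound with REINFORCE-style gradients.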
Experimental Details
To create the annotations ai used by our decoder, we used the Oxford VGGnet pretrained on ImageNet without finetuning.
In our experiments we use the 14×14×512 feature map of the fourth convolutional layer before max pooling. This means our decoder operates on the flattened 196×512 (i.e., L × D) encoding.
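Below is a sketch of how one might extract an equivalent annotation grid with torchvision's VGG19 (the original code uses the Oxford VGGnet in Theano; the use of torchvision and the exact layer cutoff are my assumptions — I simply slice off the final max pool to obtain the 14×14×512 map the quote describes):

```python
import torch
from torchvision import models

# VGG19's convolutional stack minus the final max pool yields a
# (512, 14, 14) map for a 224x224 input (assumes a recent torchvision).
vgg = models.vgg19(weights="IMAGENET1K_V1").features[:-1].eval()

with torch.no_grad():
    img = torch.randn(1, 3, 224, 224)    # stand-in for a preprocessed image
    fmap = vgg(img)                      # (1, 512, 14, 14)
    a = fmap.flatten(2).transpose(1, 2)  # (1, 196, 512): L=196 vectors of dim D=512
```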
The initial memory state and hidden state of the LSTM are predicted by an average of the annotation vectors fed through two separate MLPs (f_init,c and f_init,h).
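A minimal sketch of that initialization, with a single Linear + tanh layer standing in for each MLP (the layer sizes and the tanh nonlinearity are my assumptions; the paper only specifies two separate MLPs applied to the mean annotation vector):

```python
import torch
import torch.nn as nn

feat_dim, hidden_dim = 512, 512             # D and LSTM size; sizes are assumptions
f_init_c = nn.Linear(feat_dim, hidden_dim)  # f_init,c: predicts initial memory state
f_init_h = nn.Linear(feat_dim, hidden_dim)  # f_init,h: predicts initial hidden state

a = torch.randn(1, 196, feat_dim)           # annotation vectors from the encoder
a_mean = a.mean(dim=1)                      # average over the L=196 locations
c0 = torch.tanh(f_init_c(a_mean))           # initial LSTM memory state
h0 = torch.tanh(f_init_h(a_mean))           # initial LSTM hidden state
```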