Paper: https://arxiv.org/pdf/1411.4555.pdf
Code: https://github.com/karpathy/neuraltalk & https://github.com/karpathy/neuraltalk2 & https://github.com/zsdonghao/Image-Captioning
Main Contributions
In this paper, the authors borrow the encoder-decoder model from neural machine translation and bring it into neural image captioning, proposing an end-to-end model for the image captioning problem. Two figures from the paper are shown below: the first gives an overview of the NIC model, and the second shows the details of the network. NIC uses a convolutional neural network (CNN) as the encoder and a long short-term memory (LSTM) network as the decoder.
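The overall structure can be summarized with a minimal sketch, assuming PyTorch as the framework; the class name NICModel and the dimension arguments are illustrative placeholders, not code from the paper or the linked repositories.

```python
# Minimal sketch of an NIC-style encoder-decoder (illustrative, not the official code).
import torch
import torch.nn as nn

class NICModel(nn.Module):
    def __init__(self, feat_dim, embed_dim, hidden_dim, vocab_size):
        super().__init__()
        # Project CNN image features into the word-embedding space so the image
        # can be fed to the LSTM like a "first word".
        self.img_proj = nn.Linear(feat_dim, embed_dim)
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, img_feats, captions):
        # img_feats: (B, feat_dim) from a pre-trained CNN encoder
        # captions:  (B, T) token ids of the target sentence
        img_token = self.img_proj(img_feats).unsqueeze(1)    # (B, 1, E)
        word_tokens = self.embed(captions)                   # (B, T, E)
        inputs = torch.cat([img_token, word_tokens], dim=1)  # image first, then words
        hidden, _ = self.lstm(inputs)                        # (B, T+1, H)
        return self.out(hidden)                              # logits over the vocabulary
```

Note that, as described in the paper, the image embedding is fed to the LSTM only once, at the first time step, rather than at every step.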
Experimental Details
Hence, it is natural to use a CNN as an image “encoder”, by first pre-training it for an image classification task and using the last hidden layer as an input to the RNN decoder that generates sentences.
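A rough illustration of this step, assuming torchvision is available; ResNet-50 stands in here for the GoogLeNet-style network used in the paper, and the variable names are hypothetical.

```python
# Sketch: turn a classification-pretrained CNN into an image "encoder" by
# keeping everything up to (and including) the last hidden layer.
import torch
import torchvision.models as models

cnn = models.resnet50(pretrained=True)                     # pre-trained for ImageNet classification
encoder = torch.nn.Sequential(*list(cnn.children())[:-1])  # drop the final classification layer
encoder.eval()

with torch.no_grad():
    images = torch.randn(4, 3, 224, 224)                   # dummy batch of preprocessed images
    img_feats = encoder(images).flatten(1)                 # (4, 2048) feature vectors for the decoder
```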
An “encoder” RNN reads the source sentence and transforms it into a rich fixed-length vector representation, which in turn is used as the initial hidden state of a “decoder” RNN that generates the target sentence.
It is a neural net which is fully trainable using stochastic gradient descent.
The model is trained to maximize the likelihood of the target description sentence given the training image.
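Putting the last two quotes together: training maximizes sum_t log p(S_t | I, S_0, ..., S_{t-1}) with stochastic gradient descent, which is equivalent to minimizing token-level cross-entropy. Below is a minimal sketch of one training step, reusing the illustrative NICModel and feature shapes from the earlier sketch; all hyperparameters here are placeholders.

```python
# Sketch of one SGD step: maximize the caption likelihood by minimizing
# token-level cross-entropy (NICModel, shapes, and hyperparameters are the
# illustrative assumptions introduced above, not code from the official repos).
import torch
import torch.nn as nn

vocab_size = 10000
model = NICModel(feat_dim=2048, embed_dim=512, hidden_dim=512, vocab_size=vocab_size)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

img_feats = torch.randn(4, 2048)                  # CNN features for a batch of 4 images
captions = torch.randint(0, vocab_size, (4, 12))  # token ids: <start> ... <end>

logits = model(img_feats, captions[:, :-1])       # teacher forcing: feed image, then words
# Shifted alignment: the image step predicts the first token, each word predicts the next one.
loss = criterion(logits.reshape(-1, vocab_size), captions.reshape(-1))
loss.backward()
optimizer.step()
optimizer.zero_grad()
```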
Copyright notice: this is the blogger's original article. Reposting is welcome, but please credit the author and link to the original.