Watch, Listen, and Describe: Globally and Locally Aligned Cross-Modal Attentions for Video Captioning

Date: 2021-01-02

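This is a NAACL 2018 paper on video captioning, a task that combines CV and NLP. Paper link: https://arxiv.org/abs/1804.05448. The first author is a PhD student at the University of California, Santa Barbara (UCSB); author homepage: http://www.cs.ucsb.edu/~xwang/. The code has not been released yet (the author does not usually release code). Personal aside: looking at this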