Harvard NLP Group Paper Walkthrough: Latent-Variable Attention Models | Open-Source Code Included

Date: 2020-12-20


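Abstract: Attention models are widely used in neural networks. In prior work, the attention mechanism is generally deterministic rather than a random variable. We propose modeling attention as a latent variable and training the model with VAE and policy gradient methods. Trained without tricks such as KL annealing, the model establishes a new state-of-the-ar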