DRL for Dialogue Generation: Scattered Paper Study Notes

Date: 2021-01-02

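Deep Reinforcement Learning for Dialogue Generation is a paper that introduces policy gradient into Seq2Seq for multi-turn dialogue. The policy gradient reward is built from three aspects: informativity, coherence, and ease of answering. The authors mention that applying Seq2Seq models to dialogue generation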