A Long-Form Overview of the Latest Transformer Advances at ICLR 2020

Table of Contents
1. Variants of Self-Attention
   Long-Short Range Attention
   Tree-Structured Attention with Subtree Masking
   Hashed Attention
   eXtra Hop Attention
2. Training Objectives
   Discriminative Replacement Task
   Word and Sentence Struc…