Synthesizer: Rethinking Self-Attention in Transformer Models

The paper Synthesizer: Rethinking Self-Attention in Transformer Models replaces the $Q K^{T}$ attention matrix and finds that the query-key-value dot-product attention in Self-Attention is not actually indispensable. The authors propose two variants that synthesize the attention matrix without any token-token dot products: the Dense Synthesizer and the Random Synthesizer.
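
As a rough illustration of the idea (a minimal sketch, not the paper's reference implementation), the NumPy snippet below implements a single Dense Synthesizer head: each token predicts its own row of attention weights from its embedding alone, via a two-layer feed-forward map $F(X_i) = W_2\,\sigma_R(W_1 X_i + b_1) + b_2$, with no $Q K^{T}$ term. All names and shapes here (`dense_synthesizer_head`, `W1`, `W2`, `Wv`, the chosen dimensions) are illustrative assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def dense_synthesizer_head(X, W1, b1, W2, b2, Wv):
    """One Dense Synthesizer head: attention weights are synthesized
    from each token's embedding alone; there is no query-key dot product."""
    # B: (seq_len, seq_len) -- F(X) = W2 . ReLU(W1 . X + b1) + b2
    B = np.maximum(X @ W1 + b1, 0.0) @ W2 + b2
    A = softmax(B, axis=-1)   # normalize each row over positions
    V = X @ Wv                # values are computed as in vanilla attention
    return A @ V

# Illustrative usage: a length-8 sequence of 16-d embeddings.
rng = np.random.default_rng(0)
l, d = 8, 16
X = rng.standard_normal((l, d))
W1, b1 = rng.standard_normal((d, d)), np.zeros(d)
W2, b2 = rng.standard_normal((d, l)), np.zeros(l)  # W2 maps to seq_len
Wv = rng.standard_normal((d, d))
out = dense_synthesizer_head(X, W1, b1, W2, b2, Wv)  # shape (8, 16)
```

Note that `W2` projects to the sequence length, so a Dense Synthesizer layer is tied to a fixed maximum length. The Random Synthesizer goes one step further and learns the $l \times l$ attention matrix directly as a parameter, independent of the input entirely.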