Attention Is All You Need 2020-05-15

Abstract
Transformer: no recurrence and no convolutions; the architecture is based entirely on attention.

Introduction
Recurrent models are seq2seq models that compute h_t = f(h_{t-1}, input at position t). This sequential dependence precludes parallel computation across positions within a sequence, and RNNs tend to forget long-range context. The Transformer replaces recurrence with attention, whose output is an average over attention-weighted positions.
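The contrast between the two points above is easiest to see in code: the recurrent update is an inherently sequential loop, while self-attention computes a weighted average of value vectors for all positions at once. Below is a minimal NumPy sketch; the weight matrices W_h and W_x, the toy shapes, and the function name are illustrative assumptions, while the formula softmax(QK^T / sqrt(d_k)) V is the paper's scaled dot-product attention.

import numpy as np

rng = np.random.default_rng(0)
T, d = 4, 8                       # sequence length, model dimension (toy values)
x = rng.normal(size=(T, d))       # toy input embeddings

# Recurrent model: h_t = f(h_{t-1}, x_t). Each step waits on the previous one,
# so the time dimension cannot be parallelized.
W_h, W_x = rng.normal(size=(d, d)), rng.normal(size=(d, d))  # assumed toy weights
h = np.zeros(d)
for t in range(T):                # inherently sequential loop
    h = np.tanh(h @ W_h + x[t] @ W_x)

# Transformer-style self-attention: every position attends to all positions
# at once; the output is an average of value vectors weighted by attention.
def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # (T, T) similarity scores
    w = np.exp(scores - scores.max(-1, keepdims=True))
    w /= w.sum(-1, keepdims=True)                    # softmax over keys
    return w @ V                                     # averaging attention

out = scaled_dot_product_attention(x, x, x)          # self-attention: Q = K = V = x
print(h.shape, out.shape)                            # (8,) (4, 8)

Note that the attention computation is a few matrix products with no loop over t, which is exactly why it parallelizes where the recurrence does not.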