Transformer-Based Pretrained Models: A Comparison of Strengths and Weaknesses, with Collected Resources (Transformer/BERT/ALBERT/RoBERTa/ERNIE/XLNet/ELECTRA)

Date: 2021-01-04

Contents
1. Transformer
  Basic resources
  Basic structure
  Single attention and multi-head attention
  Attention
  Multi-head attention
  Self-attention
  Encoder and decoder
  Add & Norm
  Position-wise Feed-Forward Networks (ReLU)
  Weight Tying
  Normal