Paper Notes: Distilling transformers into simple neural networks with unlabeled transfer data

Paper: https://arxiv.org/pdf/1910.01769.pdf

Motivation

In general, the student model produced by distillation still falls short of the teacher model in accuracy. This paper leverages a large amount of in-domain unlabeled transfer data to narrow this gap.
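As background for the motivation above, knowledge distillation trains the student to match the teacher's softened output distribution, which requires no labels and so can exploit unlabeled transfer data. A minimal sketch of the soft-label distillation loss in plain NumPy (the logits and temperature below are hypothetical, not values from the paper):

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T produces a softer distribution.
    z = logits / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL divergence between teacher and student soft distributions.
    # On unlabeled transfer data this is the only training signal:
    # no ground-truth labels are needed.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s)), axis=-1).mean())

# Hypothetical logits for one unlabeled example (3 classes).
teacher = np.array([[4.0, 1.0, 0.5]])
student = np.array([[2.0, 1.5, 0.5]])
print(distillation_loss(student, teacher, T=2.0))
```

In practice this loss is minimized over the unlabeled transfer set so the student mimics the teacher's behavior beyond the original labeled data.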