Efficient Estimation of Word Representations in Vector Space (2013): Key Points

Date: 2019-11-09

Reference:

A Neural Probabilistic Language Model (2003): Key Points http://www.javashuo.com/article/p-entuqhuq-gt.html

- Linear regularities: "king - man = queen - woman"

- Syntactic and semantic regularities
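The linear regularity above is usually tested by rearranging it as king - man + woman and looking for the nearest word vector. A minimal sketch follows, assuming hand-picked 3-dimensional toy vectors (hypothetical values chosen so that the arithmetic works out; real embeddings are learned and far higher-dimensional) and cosine similarity as the distance measure.

```python
import math

# Hypothetical toy vectors, picked by hand for illustration only.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.2, 0.8],
    "apple": [0.0, 0.1, 0.1],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))

result = analogy("king", "man", "woman", vectors)
print(result)
```

Excluding the three input words from the candidates matters in practice, because the query vector often stays closest to one of them.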

In 1986, Hinton et al. proposed distributed representations.

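Typical training setup:

3-50 epochs, on the order of a billion training samples, sliding window width N=10, vector dimension D=50-200, hidden layer width H=500-1000, vocabulary size |V|=10^6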

Complexity is dominated by the hidden-to-output layer, i.e. H*|V|
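To see why the H*|V| term dominates, it helps to plug in numbers. The sketch below assumes the per-example cost Q = N*D + N*D*H + H*|V| given in the original paper for the feedforward NNLM (these notes only state the last term), with D=100 and H=500 picked from the ranges above.

```python
def nnlm_cost_per_example(N, D, H, V):
    """Per-example cost terms of a feedforward NNLM with a full softmax output."""
    projection = N * D   # look up N context word vectors
    hidden = N * D * H   # projection layer -> hidden layer
    output = H * V       # hidden layer -> softmax over the whole vocabulary
    return projection, hidden, output

projection, hidden, output = nnlm_cost_per_example(N=10, D=100, H=500, V=10**6)
total = projection + hidden + output
share = output / total   # fraction of the cost spent in the output layer
print(f"projection={projection}, hidden={hidden}, output={output}")
print(f"output share of total cost: {share:.4f}")
```

With these values the output layer accounts for well over 99% of the work, which is why the output layer is the first thing to optimize.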

Hierarchical softmax: the output vocabulary is encoded as a Huffman binary tree, which reduces the output cost from |V| to about log|V|
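A Huffman tree gives frequent words short codes, so predicting a word takes one binary decision per code bit instead of one score per vocabulary entry. The following is a minimal sketch of the code-length computation only (not of the softmax itself), using Python's `heapq` and hypothetical word counts.

```python
import heapq
from itertools import count

def huffman_code_lengths(freqs):
    """Return {word: code length in bits} for a Huffman tree built over freqs."""
    tiebreak = count()  # unique counter so heap entries never compare the lists
    heap = [(f, next(tiebreak), [w]) for w, f in freqs.items()]
    heapq.heapify(heap)
    lengths = {w: 0 for w in freqs}
    while len(heap) > 1:
        f1, _, words1 = heapq.heappop(heap)
        f2, _, words2 = heapq.heappop(heap)
        # Merging two subtrees pushes every word in them one level deeper.
        for w in words1 + words2:
            lengths[w] += 1
        heapq.heappush(heap, (f1 + f2, next(tiebreak), words1 + words2))
    return lengths

# Hypothetical word counts, for illustration only.
freqs = {"the": 50, "of": 30, "king": 10, "queen": 6, "woman": 4}
lengths = huffman_code_lengths(freqs)
print(lengths)
```

The code length of a word equals the number of inner nodes on its path from the root, which is the number of binary classifiers evaluated for that word under hierarchical softmax.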

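Consider removing the hidden layer: once hierarchical softmax shrinks the output cost, the N*D*H term of the hidden layer becomes the bottleneck.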

Two approaches: CBOW and Skip-gram
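The two approaches differ in the direction of prediction: CBOW predicts the center word from its surrounding context, while Skip-gram predicts each context word from the center word. The sketch below only generates the training examples for each, assuming a fixed symmetric window and a hypothetical five-token sentence; it does not train a model.

```python
def cbow_examples(tokens, window):
    """Yield (context words, center word): CBOW predicts the center from the context."""
    for i, center in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]
        if context:
            yield context, center

def skipgram_examples(tokens, window):
    """Yield (center word, context word): Skip-gram predicts each context word."""
    for context, center in cbow_examples(tokens, window):
        for word in context:
            yield center, word

# Hypothetical sentence, for illustration only.
tokens = ["the", "king", "rules", "the", "land"]
cbow = list(cbow_examples(tokens, window=1))
skipgram = list(skipgram_examples(tokens, window=1))

for example in cbow:
    print("CBOW:", example)
for example in skipgram:
    print("Skip-gram:", example)
```

Skip-gram produces one example per (center, context) pair, so it yields more training examples per position than CBOW for the same window.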

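More data, higher-dimensional vectors:

Google News: 6 billion tokens, vocabulary of the 1 million most frequent words, 30,000 most frequent words

3 training epochs, initial learning rate 0.025, decayed over time.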
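A learning-rate schedule of this kind can be sketched as below. The linear shape of the decay and the number of updates per epoch are assumptions for illustration; the notes only state the starting value of 0.025 and that the rate decays over time.

```python
def learning_rate(step, total_steps, initial=0.025):
    """Linearly decay the learning rate from `initial` toward zero (assumed shape)."""
    progress = min(step / total_steps, 1.0)
    return initial * (1.0 - progress)

total_steps = 3 * 1000  # 3 epochs of a hypothetical 1000 updates each
rates = [learning_rate(s, total_steps) for s in (0, 1500, 3000)]
print(rates)
```

Tying the decay to total progress rather than to the epoch index means the rate changes smoothly within each epoch instead of dropping in steps.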