[NLP] Foundational Models: Word Vectors

Date: 2019-11-08

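I increasingly feel that fundamentals matter a great deal. To become a competent algorithm engineer rather than someone who only calls library functions, you need to understand the HOW & WHY of each foundational model. After all, these models were the SOTA of their day, and their ideas strongly influenced later NLP models. I recently found a fairly good nlp-tutorial and plan to set aside time to work through the basic models. I will only skim each model's general idea and math; the focus is on implementation details.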

1. NNLM

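1.1 Idea

Word vectors are learned by training a neural language model.

[Figure: NNLM network structure]

The model addresses the following problems of statistical language models (n-gram models):

Curse of dimensionality: in high-dimensional spaces, data sparsity drives many estimated probabilities to 0; the paper proposes distributed word representations instead.
Long-range dependencies: in practice, n-gram models rarely go beyond n = 3, so only a couple of preceding words serve as context.
Word similarity: in the paper, words are represented as vectors, and after language-model training similar words end up with similar vectors.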
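
The sparsity problem in the first point can be seen in a small count-based trigram model. The sketch below is only an illustration, not code from the tutorial; the toy corpus and the helper `trigram_prob` are hypothetical.

```python
from collections import Counter

# Toy corpus; the sentences are made up for illustration.
corpus = [
    "i like dog",
    "i like cat",
    "i hate milk",
]

trigram_counts = Counter()
bigram_counts = Counter()
for sentence in corpus:
    tokens = sentence.split()
    for i in range(len(tokens) - 2):
        context = (tokens[i], tokens[i + 1])
        bigram_counts[context] += 1
        trigram_counts[context + (tokens[i + 2],)] += 1


def trigram_prob(w1, w2, w3):
    """Maximum-likelihood estimate of P(w3 | w1, w2), without smoothing."""
    context_count = bigram_counts[(w1, w2)]
    if context_count == 0:
        return 0.0
    return trigram_counts[(w1, w2, w3)] / context_count


print(trigram_prob("i", "like", "dog"))   # trigram seen in the corpus
print(trigram_prob("i", "like", "milk"))  # unseen trigram: the estimate is 0
```

"i like milk" is a perfectly plausible sentence, yet the count-based estimate assigns it probability 0 because that exact trigram never occurred. A model that places "milk" near other nouns in vector space can generalize to it.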

1.2 Source code

$y = b + Wx + U\tanh(d + Hx)$
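
As a reading aid, the shapes behind this formula can be written out. The notation below ($|V|$ for vocabulary size, $h$ for hidden size, $n$ for the n-gram order) follows my recollection of the NNLM paper rather than this post, so treat it as an assumption:

$$x = \big(C(w_{t-1}), C(w_{t-2}), \dots, C(w_{t-n+1})\big) \in \mathbb{R}^{(n-1)m}$$

$$H \in \mathbb{R}^{h \times (n-1)m}, \quad d \in \mathbb{R}^{h}, \quad U \in \mathbb{R}^{|V| \times h}, \quad W \in \mathbb{R}^{|V| \times (n-1)m}, \quad b \in \mathbb{R}^{|V|}$$

$$P(w_t = i \mid w_{t-1}, \dots, w_{t-n+1}) = \frac{e^{y_i}}{\sum_{j} e^{y_j}}$$

In the PyTorch code, `n_class` plays the role of $|V|$, `n_hidden` of $h$, and `n_step` of $n-1$. The code stores `H`, `W`, and `U` transposed relative to these shapes because it multiplies a batch of row vectors from the left, as in `torch.mm(X, self.H)`.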

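import torch
import torch.nn as nn

# n_class (vocabulary size), m (embedding dimension), n_step (number of
# context words), n_hidden (hidden size) and dtype are assumed to be
# defined at module level before the class is instantiated.

class NNLM(nn.Module):
    def __init__(self):
        super(NNLM, self).__init__()
        self.C = nn.Embedding(n_class, m)  # word embedding table
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden).type(dtype))  # input -> hidden weights
        self.W = nn.Parameter(torch.randn(n_step * m, n_class).type(dtype))   # direct input -> output weights
        self.d = nn.Parameter(torch.randn(n_hidden).type(dtype))              # hidden bias
        self.U = nn.Parameter(torch.randn(n_hidden, n_class).type(dtype))     # hidden -> output weights
        self.b = nn.Parameter(torch.randn(n_class).type(dtype))               # output bias

    def forward(self, X):
        X = self.C(X)  # [batch_size, n_step, m]
        X = X.view(-1, n_step * m)  # concatenate context embeddings: [batch_size, n_step * m]
        tanh = torch.tanh(self.d + torch.mm(X, self.H))  # [batch_size, n_hidden]
        output = self.b + torch.mm(X, self.W) + torch.mm(tanh, self.U)  # [batch_size, n_class]
        return output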
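
To see the model run end to end, here is a minimal training sketch. It is my own illustration rather than the tutorial's script: the toy sentences, the hyperparameter values, the choice of Adam, and the number of epochs are all assumptions picked for a quick demo. The class is repeated (without `dtype`) so that the block is self-contained.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Toy data and hyperparameters: illustrative values, not taken from the post.
sentences = ["i like dog", "i love coffee", "i hate milk"]
vocab = sorted(set(" ".join(sentences).split()))
word2idx = {w: i for i, w in enumerate(vocab)}
n_class = len(vocab)   # vocabulary size
n_step = 2             # number of context words (n - 1)
m = 2                  # embedding dimension
n_hidden = 2           # hidden layer size


class NNLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.C = nn.Embedding(n_class, m)
        self.H = nn.Parameter(torch.randn(n_step * m, n_hidden))
        self.W = nn.Parameter(torch.randn(n_step * m, n_class))
        self.d = nn.Parameter(torch.randn(n_hidden))
        self.U = nn.Parameter(torch.randn(n_hidden, n_class))
        self.b = nn.Parameter(torch.randn(n_class))

    def forward(self, X):
        X = self.C(X).view(-1, n_step * m)
        hidden = torch.tanh(self.d + torch.mm(X, self.H))
        return self.b + torch.mm(X, self.W) + torch.mm(hidden, self.U)


# Each sentence yields one (context, target) pair: first two words -> third.
inputs = torch.tensor([[word2idx[w] for w in s.split()[:-1]] for s in sentences])
targets = torch.tensor([word2idx[s.split()[-1]] for s in sentences])

model = NNLM()
criterion = nn.CrossEntropyLoss()  # applies log-softmax to the raw scores y
optimizer = optim.Adam(model.parameters(), lr=0.01)

initial_loss = criterion(model(inputs), targets).item()
for epoch in range(1000):
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
final_loss = loss.item()

logits = model(inputs)  # [batch_size, n_class]
predicted = [vocab[i] for i in logits.argmax(dim=1).tolist()]
print(initial_loss, final_loss, predicted)
```

Note that the model returns raw scores; the softmax lives inside `nn.CrossEntropyLoss`, which is why `forward` has no explicit normalization step.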

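Points to note:

The model input x is the concatenation of all the context word vectors, not their average.
The model has two hidden layers: a linear layer C (the embedding lookup) and a nonlinear tanh layer. The parameters of the matrix W may be 0, which corresponds to dropping the direct connection from the input to the output.
The output layer does not share its parameter matrix with the embedding: C and U are separate parameters.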
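
The first point is easy to check directly. The snippet below is a small sketch under assumed toy sizes (a 5-word vocabulary and 3-dimensional embeddings, both hypothetical); it compares concatenation, which is what `view` does in the model, with averaging.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical toy setup: a 5-word vocabulary and 3-dimensional embeddings.
embedding = nn.Embedding(5, 3)

context_a = torch.tensor([[1, 2]])  # word 1 followed by word 2
context_b = torch.tensor([[2, 1]])  # the same two words in reverse order

# Concatenation (what the NNLM code does via view): word order is preserved.
concat_a = embedding(context_a).view(1, -1)   # shape [1, 2 * 3]
concat_b = embedding(context_b).view(1, -1)

# Averaging: the order of the context words is lost.
mean_a = embedding(context_a).mean(dim=1)     # shape [1, 3]
mean_b = embedding(context_b).mean(dim=1)

print(concat_a.shape, mean_a.shape)
print(torch.equal(concat_a, concat_b))   # False: the two orders give different inputs
print(torch.allclose(mean_a, mean_b))    # True: averaging cannot tell them apart
```

Concatenation makes the input grow with the context length (`n_step * m`), which is why `H` and `W` are sized by `n_step * m` in the model, but in return the network can distinguish word order.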
