NLP (Part 2)

Cross-Entropy Loss

\(H(p,q)=-\sum^{C}_{c=1}{p(c)\log q(c)}\)

  • p(c) is the true class probability
  • q(c) is the SoftMax probability

The full objective with regularization is

\(J(\theta)=\frac{1}{N}\sum^{N}_{i=1}-\log\left(\frac{e^{f_{y_i}}}{\sum^{C}_{c=1}e^{f_c}}\right)+\lambda\sum_{k}\theta_k^2\)

where \(\lambda\sum_{k}\theta_k^2\) is the regularization term, which helps avoid overfitting or exploding weights.

    Q: When should we update the word vectors?
    A: Keep the word vectors fixed, because on a small corpus they may overfit.
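As a concrete check of the formula above, here is a minimal NumPy sketch that computes the cross-entropy between a one-hot target distribution p and the SoftMax output q; the class scores are made-up toy values, not anything from the notes.

```python
import numpy as np

def softmax(f):
    """Numerically stable softmax over class scores f."""
    f = f - np.max(f)
    e = np.exp(f)
    return e / e.sum()

def cross_entropy(p, q):
    """H(p, q) = -sum_c p(c) * log q(c)."""
    return -np.sum(p * np.log(q))

# Toy example with 4 classes; the true class is index 2, so p is one-hot.
scores = np.array([1.0, 0.5, 3.0, -1.0])  # hypothetical class scores f = Wx
q = softmax(scores)                        # q(c): SoftMax probabilities
p = np.zeros(4)
p[2] = 1.0                                 # p(c): true class probability (one-hot)
print(cross_entropy(p, q))                 # equals -log q(2)
```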

Word Window Classification

Suppose we want to identify person names, locations, organizations, and other (four classes).
... museums in Paris are amazing ...

\(x_{window}=[\,x_{museums}\;\; x_{in}\;\; x_{Paris}\;\; x_{are}\;\; x_{amazing}\,]^T \in R^{5d}\)

With \(x=x_{window}\), the predicted probability is

\(\hat{y}_y=p(y|x)=\frac{\exp(W_{y\cdot}x)}{\sum^{C}_{c=1}\exp(W_{c\cdot}x)}\)
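A minimal NumPy sketch of this window classifier: five stand-in word vectors are concatenated into \(x_{window}\in R^{5d}\) and pushed through a softmax layer. The dimension d, class count C, and random vectors are purely hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 4, 4                      # hypothetical word-vector dimension and class count
words = ["museums", "in", "Paris", "are", "amazing"]
vecs = {w: rng.normal(size=d) for w in words}   # stand-in word vectors

# Concatenate the five word vectors into x_window in R^{5d}.
x_window = np.concatenate([vecs[w] for w in words])

W = rng.normal(size=(C, 5 * d))  # softmax classifier weights
f = W @ x_window                 # class scores, f_c = W_c. x
p = np.exp(f - f.max())
p /= p.sum()                     # p(y|x) = softmax(W x_window)
print(p)
```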
Define:

y: softmax probability output vector (see previous slide)
t: target probability distribution (one-hot)
\(f=f(x)=Wx\in R^{C}\), and \(f_c\) = c'th element of the f vector

Then the derivative of the cost function with respect to x is

\(\delta_{x}J=\frac{\partial}{\partial x}\left(-\log p(y|x)\right)=\sum^{C}_{c=1}\delta_{c}W^T_{c\cdot}=W^T\delta\)
Let \(\delta_{x}=W^T\delta=\delta_{x_{window}}\)

With \(x_{window}=[\,x_{museums}\;\; x_{in}\;\; x_{Paris}\;\; x_{are}\;\; x_{amazing}\,]^T\), we have

\(\delta_{window}=[\,\delta_{x_{museums}}\;\; \delta_{x_{in}}\;\; \delta_{x_{Paris}}\;\; \delta_{x_{are}}\;\; \delta_{x_{amazing}}\,]^T\)
Q: How do we update these concatenated word vectors? (A sketch of the gradient flow follows below.)
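The following sketch (hypothetical sizes and random values) traces the gradient computation above: the error signal \(\delta=y-t\) is mapped back through \(W^T\) to get \(\delta_{x_{window}}\), which is then split into one gradient per word vector in the window.

```python
import numpy as np

rng = np.random.default_rng(0)
d, C = 4, 4                          # hypothetical sizes
W = rng.normal(size=(C, 5 * d))
x_window = rng.normal(size=5 * d)    # concatenated window vector

f = W @ x_window
y_hat = np.exp(f - f.max())
y_hat /= y_hat.sum()                 # softmax output y
t = np.zeros(C)
t[2] = 1.0                           # one-hot target distribution t

delta = y_hat - t                    # error signal delta = y - t
grad_x = W.T @ delta                 # delta_x = W^T delta (gradient w.r.t. x_window)

# Split the window gradient back into the five per-word gradients,
# which would be used to update x_museums, x_in, x_Paris, x_are, x_amazing.
word_grads = np.split(grad_x, 5)
print([g.shape for g in word_grads])
```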

A single layer neural network


The essential difference from SoftMax: the output can be fed into the loss function of another neuron, rather than directly being the final probability.
It is a combination of a linear layer and a non-linear layer:

z = Wx + b
a = f(z)

The neural activation a can then be used to compute some output.
For instance, a probability via SoftMax:
p(y|x) = SoftMax(\(Wa\))
Or an unnormalized score (even simpler):
score(x) = \(U^T a\in R\)
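A short NumPy sketch of this single layer, using tanh as the non-linearity and random weights purely for illustration (the layer sizes are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden = 20, 8               # hypothetical layer sizes

x = rng.normal(size=n_in)            # e.g. a concatenated word window
W = rng.normal(size=(n_hidden, n_in))
b = rng.normal(size=n_hidden)
U = rng.normal(size=n_hidden)

z = W @ x + b                        # linear layer
a = np.tanh(z)                       # non-linearity f(z), here tanh
score = U @ a                        # unnormalized score U^T a, a scalar in R
print(score)
```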