Course videos: https://www.youtube.com/watch?v=ogZi5oIo4fI
Youdao note: http://note.youdao.com/noteshare?id=d86bd8fc60cb4fe87005a2d2e2d5b70d&sub=6911732F9FA44C68AD53A09072155ED3
Part 1: build your model with a class; you need to write a forward function.
```python
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module.
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred


# our model
model = Model()
```
Part 2: construct the loss function and the optimizer that will update the parameters.
```python
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.

# criterion: used to compute the loss
criterion = torch.nn.MSELoss(size_average=False)
# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```
Part 3: train the model, forward -> backward -> update parameters.
```python
# Training loop
for epoch in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()   # initialize (clear) the gradients
    loss.backward()         # backward pass
    optimizer.step()        # update the weights held in model.parameters()
```
Part 4: test the trained model.
```python
# After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print("predict (after training)", 4, model(hour_var).data[0][0])
```
A summary of the basic training framework:
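As a reference, here is a minimal sketch that puts the three parts above together (same linear model and data as in Part 1; it only restates what the snippets above already do):

```python
import torch
from torch.autograd import Variable

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


# 1. Design the model as a class with a forward() method
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)


model = Model()

# 2. Construct the loss function and the optimizer
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3. Training loop: forward -> backward -> update
for epoch in range(1000):
    y_pred = model(x_data)            # forward
    loss = criterion(y_pred, y_data)  # compute loss
    optimizer.zero_grad()             # clear old gradients
    loss.backward()                   # backward
    optimizer.step()                  # update parameters
```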
Homework: test other optimizers (for example, the drop-in replacements sketched below):
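For the homework, torch.optim offers several alternatives to SGD; assuming the model defined above, only the optimizer construction line needs to change (the learning rates here are just starting points, not tuned values):

```python
import torch

# Any one of these can replace the SGD line; the rest of the training loop is unchanged.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer = torch.optim.Adamax(model.parameters(), lr=0.01)
```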
Original model:
```mermaid
graph LR
    x --> Linear
    Linear --> y
```
$$\hat{y} = x \cdot w + b$$

$$loss = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$$
Activation function:
Using the sigmoid function:
```mermaid
graph LR
    x --> Linear
    Linear --> Sigmoid
    Sigmoid --> y
```
The output is squashed into [0, 1], so it can be read as a probability for binary classification, and the values stay in a convenient bounded range.
$$\sigma(z) = \frac{1}{1+e^{-z}}$$

$$\hat{y} = \sigma(x \cdot w + b)$$

$$loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log \hat{y}_n + (1-y_n)\log(1-\hat{y}_n)\right]$$
Code:
```python
import torch
from torch.autograd import Variable
import torch.nn.functional as F

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0], [5.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.], [1.]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module.
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data.
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(400):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training
hour_var = Variable(torch.Tensor([[0.0]]))
print("predict 1 hour ", 0.0, model(hour_var).data[0][0] > 0.5)
hour_var = Variable(torch.Tensor([[7.0]]))
print("predict 7 hours", 7.0, model(hour_var).data[0][0] > 0.5)
```
The newly added activation function:
```python
y_pred = F.sigmoid(self.linear(x))
```
Change the loss to:
```python
criterion = torch.nn.BCELoss(size_average=True)
```
Homework: try other activation functions (a swap-in sketch follows the descriptions below):
ReLU is short for the Rectified Linear Unit. It has been used heavily in deep learning in recent years and helps against the vanishing-gradient problem, because its derivative is either 1 or 0. Compared with the sigmoid and tanh activations, the gradient of ReLU is trivial to compute and the forward pass is cheap, so it can greatly speed up the convergence of stochastic gradient descent (ReLU is piecewise linear, whereas sigmoid and tanh are saturating nonlinearities). Its drawback is fragility: as training proceeds, neurons may die. For example, after a very large gradient flows through a ReLU unit, the weight update may leave the unit in a state where no data point can ever activate it again. When that happens, the gradient flowing through that neuron is 0 from then on; in other words, the ReLU neuron has died irreversibly during training.
For positive inputs ELU is simply x, which alleviates the vanishing-gradient problem (the derivative is 1 everywhere for x > 0), just like ReLU and Leaky ReLU. For negative inputs, ELU saturates softly as the input becomes more negative, which improves robustness to noise.
Leaky ReLU is mainly designed to avoid vanishing gradients: when the neuron is in the inactive region it still allows a small non-zero gradient, so the gradient does not vanish and convergence stays fast. Its other pros and cons are similar to ReLU.
tanh squashes the input to the range -1 to 1. Like sigmoid, it suffers from vanishing or saturating gradients.
Sigmoid used to be the most frequently used activation function in neural networks. It squashes a real number into the range 0 to 1: a very large input gives a result close to 1, and a very negative input gives a result close to 0. It was popular in early neural networks because it maps nicely onto whether a neuron is activated and passes a signal on (0: almost not activated, 1: fully activated). In recent deep learning work it appears much less, because it easily causes vanishing or saturating gradients. When a network has many layers and every layer uses sigmoid, backpropagation keeps multiplying by the sigmoid derivative, so the gradient keeps shrinking. For very large or very negative inputs (for example, an input of 100 gives a sigmoid output close to 1 and a gradient close to 0) the unit saturates, leaving the neuron in a near-dead state.
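A minimal sketch for the homework. It assumes a small hidden layer (the size 4 is arbitrary) so that the activation being compared sits inside the network, while the output keeps a sigmoid because BCELoss expects a value in [0, 1]:

```python
import torch
import torch.nn.functional as F


class Model(torch.nn.Module):
    def __init__(self, activation=F.relu):
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(1, 4)   # small hidden layer, size chosen arbitrarily
        self.l2 = torch.nn.Linear(4, 1)
        self.activation = activation      # hidden-layer activation to experiment with

    def forward(self, x):
        out = self.activation(self.l1(x))  # try F.relu, F.elu, F.leaky_relu, F.tanh, ...
        return F.sigmoid(self.l2(out))     # keep sigmoid on the output for BCELoss


# e.g. train each of these with the same loop and compare convergence
model_relu = Model(F.relu)
model_tanh = Model(F.tanh)
model_elu = Model(F.elu)
```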
```mermaid
graph LR
    a --> Linear
    b --> Linear
    Linear --> Sigmoid
    Sigmoid --> y
```
Multi-dimensional input and a deeper network; the changes are mainly in the "Design your model using class" step.
```python
import torch
from torch.autograd import Variable
import numpy as np

xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = Variable(torch.from_numpy(xy[:, 0:-1]))
y_data = Variable(torch.from_numpy(xy[:, [-1]]))

print(x_data.data.shape)
print(y_data.data.shape)


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules.
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Variables.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear modules which are members of the model.
# criterion = torch.nn.BCELoss(size_average=True)
criterion = torch.nn.BCELoss(reduction='elementwise_mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for epoch in range(1200000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Homework:
Constructing a Dataset involves three main steps:
Subclass Dataset and implement __getitem__ and __len__ (as in the code below).
Instantiate the dataset, then use it in a DataLoader:
```python
train_loader = DataLoader(dataset=dataset, batch_size=1, shuffle=True, num_workers=1)
```
Code:
```python
# References
# https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py
# http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
import torch
import numpy as np
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader


class DiabetesDataset(Dataset):
    """ Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('./data/diabetes.csv.gz',
                        delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len


dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=1,
                          shuffle=True,
                          num_workers=1)

for epoch in range(2):
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # Run your training process
        print(epoch, i, "inputs", inputs.data, "labels", labels.data)
```
Homework:
Use another dataset (MNIST); the code follows the official example:
To summarize the training workflow: build the Dataset and DataLoader, design the model, choose the loss and optimizer, then loop over batches doing forward, loss, backward, and step.
MNIST softmax
before:
```mermaid
graph LR
    x{x} --> Linear
    Linear --> Activation
    Activation --> ...
    ... --> Linear2
    Linear2 --> Activation2
    Activation2 --> h{y}
```
now:
```mermaid
graph LR
    x{x} --> Linear
    Linear --> Activation
    Activation --> ...
    ... --> Linear2
    Linear2 --> Activation2
    Activation2 --> P_y=0
    Activation2 --> P_y=1
    Activation2 --> ....
    Activation2 --> P_y=10
```
what is softmax?
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j = 1, 2, \ldots, K$$
using softmax to get probabilities.
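For example, taking the logits [2.0, 1.0, 0.1] that appear in the code below:

$$\sigma(z) = \left[\frac{e^{2.0}}{e^{2.0}+e^{1.0}+e^{0.1}},\ \frac{e^{1.0}}{e^{2.0}+e^{1.0}+e^{0.1}},\ \frac{e^{0.1}}{e^{2.0}+e^{1.0}+e^{0.1}}\right] \approx [0.66,\ 0.24,\ 0.10]$$

The values sum to 1 and can be read as class probabilities.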
what is cross entropy?
$$loss = \frac{1}{N}\sum_i D(\mathrm{Softmax}(w x_i + b), Y_i)$$

$$D(\hat{Y}, Y) = -Y \log \hat{Y}$$
The whole pipeline:
```mermaid
graph LR
    x --LinearModel--> Z
    Z --Softmax--> y'
    y' --Cross_Entropy--> Y
```
The implementation in PyTorch:
```python
loss = torch.nn.CrossEntropyLoss()
```
This loss already includes both the Softmax and the Cross Entropy:
```mermaid
graph LR
    X --Softmax--> y'
    y' --Cross_Entropy--> Y
```
Code:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Cross entropy example
import numpy as np

# One hot
# 0: 1 0 0
# 1: 0 1 0
# 2: 0 0 1
Y = np.array([1, 0, 0])

Y_pred1 = np.array([0.7, 0.2, 0.1])
Y_pred2 = np.array([0.1, 0.3, 0.6])
print("loss1 = ", np.sum(-Y * np.log(Y_pred1)))
print("loss2 = ", np.sum(-Y * np.log(Y_pred2)))

################################################################################
# Softmax + CrossEntropy (logSoftmax + NLLLoss)
loss = nn.CrossEntropyLoss()

# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([0]), requires_grad=False)

# input is of size nBatch x nClasses = 1 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.5, 2.0, 0.3]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("PyTorch Loss1 = ", l1.data, "\nPyTorch Loss2=", l2.data)
print("Y_pred1=", torch.max(Y_pred1.data, 1)[1])
print("Y_pred2=", torch.max(Y_pred2.data, 1)[1])

################################################################################
"""Batch loss"""
# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)

# input is of size nBatch x nClasses = 3 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],
                                 [1.1, 0.1, 0.2],
                                 [0.2, 2.1, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],
                                 [0.2, 0.3, 0.5],
                                 [0.2, 0.2, 0.5]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)
```
Homework: CrossEntropyLoss vs. NLLLoss?
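A minimal sketch for the comparison, reusing the logits and target from the example above: CrossEntropyLoss applied to raw logits should give the same value as NLLLoss applied to log_softmax of those logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))   # raw scores, no softmax applied
target = Variable(torch.LongTensor([0]))

# CrossEntropyLoss = LogSoftmax + NLLLoss, applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, target)

# NLLLoss expects log-probabilities, so apply log_softmax first
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

print(ce.data, nll.data)  # the two values should match
```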
```mermaid
graph LR
    inputLayer -.-> HiddenLayer
    HiddenLayer -.-> OutputLayer
```
Code:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 16

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28) -> (n, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)


model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += criterion(output, target).data[0]
        # get the index of the max
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Homework:
Use DataLoader
Simple convolution layer
For example:
```mermaid
graph LR
    3*3*1_image --> 2*2*1_filter_W
    3*3*1_image --> 1*1_Stride
    3*3*1_image --> NoPadding
    NoPadding --> 2*2_featureMap
    2*2*1_filter_W --> 2*2_featureMap
    1*1_Stride --> 2*2_featureMap
```
How do we compute with multi-channel images? Each filter computes
$$w^T x + b$$
over each local patch, giving a 28 * 28 * 1 feature map per filter, i.e. N feature maps if N filters are used.
The output-size formula:
$$OutputSize = \frac{InputSize + 2 \times PaddingSize - FilterSize}{Stride} + 1$$
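For example, for the MNIST model later in this note: a 28 x 28 input, a 5 x 5 filter, no padding, and stride 1 give

$$\frac{28 + 2 \times 0 - 5}{1} + 1 = 24,$$

so conv1 turns a 28 x 28 image into a 24 x 24 feature map, and the following 2 x 2 max pool halves it to 12 x 12.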
A few parameters that need explanation:
CONV
The convolution layer; it is normally used together with an activation function.
The number of filters, the padding, and the filter size determine the output size via the formula above.
```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size)

self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
```
Activation functions
Max Pooling
Take the maximum value inside an n*m window as the pooling result.
There is also the similar average pooling.
```python
nn.MaxPool2d(kernel_size)

self.mp = nn.MaxPool2d(2)
```
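A tiny illustration (tensor values chosen arbitrarily, PyTorch >= 0.4 assumed): a 2 x 2 max pool halves each spatial dimension and keeps the largest value in each window.

```python
import torch
import torch.nn as nn

mp = nn.MaxPool2d(2)
x = torch.Tensor([[[[1,  2,  5,  6],
                    [3,  4,  7,  8],
                    [9, 10, 13, 14],
                    [11, 12, 15, 16]]]])  # shape (1, 1, 4, 4)
print(mp(x))                              # shape (1, 1, 2, 2): [[4, 8], [12, 16]]
```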
Fully connected layer
```python
self.fc = nn.Linear(320, 10)
```
In a CNN, a neuron is not connected to every pixel.
In a fully connected network, every neuron is connected to every input pixel.
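To make that concrete, here is a rough parameter count using layer sizes that appear in this note (the comparison is only illustrative): the convolution layer shares one small filter across all positions, while the linear layer needs a weight for every input pixel and every output unit.

```python
import torch.nn as nn


def num_params(m):
    return sum(p.numel() for p in m.parameters())


conv = nn.Conv2d(1, 10, kernel_size=5)  # 10 * (1 * 5 * 5) weights + 10 biases = 260
fc = nn.Linear(784, 10)                 # 784 * 10 weights + 10 biases = 7850

print(num_params(conv), num_params(fc))  # 260 7850
```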
```mermaid
graph TB
    ConvolutionalLayer1 --> PoolingLayer1
    PoolingLayer1 --> ConvolutionalLayer2
    ConvolutionalLayer2 --> PoolingLayer2
    PoolingLayer2 --> Fully-ConnectedLayer
```
Model:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(???, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)
```
How do you fill in the ??? ? You can put an arbitrary number there first, run the program, and read the correct value off the size-mismatch error (for this model it works out to 320, matching the nn.Linear(320, 10) shown earlier).
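An alternative to reading the error message is to temporarily print the shape right after flattening; this is a debugging version of the forward() method above (the print line is the only change):

```python
def forward(self, x):
    in_size = x.size(0)
    x = F.relu(self.mp(self.conv1(x)))
    x = F.relu(self.mp(self.conv2(x)))
    x = x.view(in_size, -1)
    print(x.size())          # e.g. (batch_size, 320) -> use 320 in nn.Linear(???, 10)
    x = self.fc(x)
    return F.log_softmax(x)
```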
Homework: try a deeper network with more fully connected layers.
Why 1*1 convolution ?
Using 32 1*1 filters turns a 64-channel feature map into a 32-channel one.
Using 1*1 filters first can significantly reduce the amount of computation.
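A rough multiply count, assuming (hypothetically) 28 x 28 feature maps and a 5 x 5 convolution from the 64 channels above down to 32:

Direct 5 x 5 convolution: 28 * 28 * 32 * (5 * 5 * 64), roughly 40.1M multiplies.

With a 1 x 1 bottleneck down to 16 channels first, then 5 x 5 up to 32: 28 * 28 * 16 * 64 + 28 * 28 * 32 * (5 * 5 * 16), roughly 0.8M + 10.0M = 10.8M multiplies.

Same input and output shapes, about a quarter of the work.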
```mermaid
graph LR
    Filter_concat_in --> 1*1Conv0_16
    Filter_concat_in --> 1*1Conv1_16
    Filter_concat_in --> 1*1Conv2_16
    Filter_concat_in --> AvgPooling
    AvgPooling --> 1*1Conv3_16
    1*1Conv0_16 --> 3*3Conv0_24
    3*3Conv0_24 --> 3*3Conv1_24
    3*3Conv1_24 --> Filter_Concat_out
    1*1Conv1_16 --> 5*5Conv_24
    5*5Conv_24 --> Filter_Concat_out
    1*1Conv3_16 --> Filter_Concat_out
    1*1Conv2_16 --> Filter_Concat_out
```
Implementation:
```python
self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

branch1x1 = self.branch1x1(x)
```
```python
self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
branch_pool = self.branch_pool(branch_pool)
```
```python
self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

branch5x5 = self.branch5x5_1(x)
branch5x5 = self.branch5x5_2(branch5x5)
```
```python
self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

branch3x3 = self.branch3x3_1(x)
branch3x3 = self.branch3x3_2(branch3x3)
branch3x3 = self.branch3x3_3(branch3x3)
```
```python
outputs = [branch1x1, branch_pool, branch5x5, branch3x3]
return torch.cat(outputs, 1)  # concatenate the branches along the channel dimension
```
ALL CODE:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class InceptionA(nn.Module):

    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)

        self.incept1 = InceptionA(in_channels=10)
        self.incept2 = InceptionA(in_channels=20)

        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incept1(x)
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incept2(x)
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)


model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += F.nll_loss(output, target, size_average=False).data[0]
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Recurrent NN
```mermaid
graph LR
    X1 --> A1
    A1 --> h1
    X2 --> A2
    A2 --> h2
    X3 --> A3
    A3 --> h3
    X4 --> A4
    A4 --> h4
    A1 --> A2
    A2 --> A3
    A3 --> A4
```
PyTorch provides RNN modules that can be used directly.
The different RNN implementations:
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
cell = nn.GRU(input_size=4, hidden_size=2, batch_first=True)
cell = nn.LSTM(input_size=4, hidden_size=2, batch_first=True)
```
How to use RNN?
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
inputs = ...     # (batch_size, seq_len, input_size)
hidden = (...)   # (num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)
```
There are two return values: one is the output at every time step, the other is the final hidden state.
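A minimal concrete sketch of that call (the shapes and random values are arbitrary, just to show the API):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

inputs = Variable(torch.randn(1, 3, 4))   # (batch_size=1, seq_len=3, input_size=4)
hidden = Variable(torch.zeros(1, 1, 2))   # (num_layers=1, batch_size=1, hidden_size=2)

out, hidden = cell(inputs, hidden)
print(out.size())     # (1, 3, 2): one output per time step
print(hidden.size())  # (1, 1, 2): the final hidden state
```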
```python
# Lab 12 RNN
import sys
import torch
import torch.nn as nn
from torch.autograd import Variable

torch.manual_seed(777)  # reproducibility

#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]         # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]         # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))

num_classes = 5
input_size = 5       # one-hot size
hidden_size = 5      # output from the RNN. 5 to directly predict one-hot
batch_size = 1       # one sentence
sequence_length = 1  # One by one
num_layers = 1       # one-layer rnn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize hidden and cell states
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))


# Instantiate RNN model
model = Model()
print(model)

# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")
    for input, label in zip(inputs, labels):
        # print(input.size(), label.size())
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.data[0]])
        loss += criterion(output, label)

    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.data[0]))

    loss.backward()
    optimizer.step()

print("Learning finished!")
```