Course videos: https://www.youtube.com/watch?v=ogZi5oIo4fI
Youdao note: http://note.youdao.com/noteshare?id=d86bd8fc60cb4fe87005a2d2e2d5b70d&sub=6911732F9FA44C68AD53A09072155ED3
Part 1: build your model with a class; you need to write a forward function.
```python
import torch
from torch.autograd import Variable
import matplotlib.pyplot as plt

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module.
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Variables.
        """
        y_pred = self.linear(x)
        return y_pred


# our model
model = Model()
```
Part 2: construct the loss function and the optimizer that will update the parameters.
```python
# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.

# criterion: used to compute the loss
criterion = torch.nn.MSELoss(size_average=False)
# optimizer
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
```
Part 3: train the model, forward -> backward -> update parameters.
```python
# Training loop
for epoch in range(1000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()   # initialize (clear) the gradients
    loss.backward()         # backward pass
    optimizer.step()        # update the weights held in model.parameters()
```
Part 4: test the trained model.
```python
# After training
hour_var = Variable(torch.Tensor([[4.0]]))
y_pred = model(hour_var)
print("predict (after training)", 4, model(hour_var).data[0][0])
```
A summary of the basic training framework:
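As a reference, here is a minimal sketch that puts the three parts above together (same linear model and data as in Part 1; it only restates what the snippets above already do):

```python
import torch
from torch.autograd import Variable

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0]]))
y_data = Variable(torch.Tensor([[2.0], [4.0], [6.0]]))


# 1. Design the model as a class with a forward() method
class Model(torch.nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        return self.linear(x)


model = Model()

# 2. Construct the loss function and the optimizer
criterion = torch.nn.MSELoss(size_average=False)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# 3. Training loop: forward -> backward -> update
for epoch in range(1000):
    y_pred = model(x_data)            # forward
    loss = criterion(y_pred, y_data)  # compute loss
    optimizer.zero_grad()             # clear old gradients
    loss.backward()                   # backward
    optimizer.step()                  # update parameters
```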
Homework: test other optimizers (for example, the drop-in replacements sketched below):
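For the homework, torch.optim offers several alternatives to SGD; assuming the model defined above, only the optimizer construction line needs to change (the learning rates here are just starting points, not tuned values):

```python
import torch

# Any one of these can replace the SGD line; the rest of the training loop is unchanged.
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
optimizer = torch.optim.Adagrad(model.parameters(), lr=0.01)
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
optimizer = torch.optim.Adamax(model.parameters(), lr=0.01)
```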
Original model:
```mermaid
graph LR
    x --> Linear
    Linear --> y
```
$$\hat{y} = x \cdot w + b$$

$$loss = \frac{1}{N}\sum_{n=1}^{N}(\hat{y}_n - y_n)^2$$
Activation function:
Using the sigmoid function:
```mermaid
graph LR
    x --> Linear
    Linear --> Sigmoid
    Sigmoid --> y
```
The output is squashed into [0, 1], so it can be read as a probability for binary classification, and the values stay in a convenient bounded range.
$$\sigma(z) = \frac{1}{1+e^{-z}}$$

$$\hat{y} = \sigma(x \cdot w + b)$$

$$loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log \hat{y}_n + (1-y_n)\log(1-\hat{y}_n)\right]$$
Code:
```python
import torch
from torch.autograd import Variable
import torch.nn.functional as F

x_data = Variable(torch.Tensor([[1.0], [2.0], [3.0], [4.0], [5.0]]))
y_data = Variable(torch.Tensor([[0.], [0.], [1.], [1.], [1.]]))


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate one nn.Linear module.
        """
        super(Model, self).__init__()
        self.linear = torch.nn.Linear(1, 1)  # One in and one out

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data.
        """
        y_pred = F.sigmoid(self.linear(x))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear module which is a member of the model.
criterion = torch.nn.BCELoss(size_average=True)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Training loop
for epoch in range(400):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.data[0])

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# After training
hour_var = Variable(torch.Tensor([[0.0]]))
print("predict 1 hour ", 0.0, model(hour_var).data[0][0] > 0.5)
hour_var = Variable(torch.Tensor([[7.0]]))
print("predict 7 hours", 7.0, model(hour_var).data[0][0] > 0.5)
```
The newly added activation function:
```python
y_pred = F.sigmoid(self.linear(x))
```
Change the loss to:
```python
criterion = torch.nn.BCELoss(size_average=True)
```
Homework: try other activation functions (a swap-in sketch follows the descriptions below):
ReLU is short for the Rectified Linear Unit. It has been used heavily in deep learning in recent years and helps against the vanishing-gradient problem, because its derivative is either 1 or 0. Compared with the sigmoid and tanh activations, the gradient of ReLU is trivial to compute and the forward pass is cheap, so it can greatly speed up the convergence of stochastic gradient descent (ReLU is piecewise linear, whereas sigmoid and tanh are saturating nonlinearities). Its drawback is fragility: as training proceeds, neurons may die. For example, after a very large gradient flows through a ReLU unit, the weight update may leave the unit in a state where no data point can ever activate it again. When that happens, the gradient flowing through that neuron is 0 from then on; in other words, the ReLU neuron has died irreversibly during training.
For positive inputs ELU is simply x, which alleviates the vanishing-gradient problem (the derivative is 1 everywhere for x > 0), just like ReLU and Leaky ReLU. For negative inputs, ELU saturates softly as the input becomes more negative, which improves robustness to noise.
Leaky ReLU is mainly designed to avoid vanishing gradients: when the neuron is in the inactive region it still allows a small non-zero gradient, so the gradient does not vanish and convergence stays fast. Its other pros and cons are similar to ReLU.
tanh squashes the input to the range -1 to 1. Like sigmoid, it suffers from vanishing or saturating gradients.
Sigmoid used to be the most frequently used activation function in neural networks. It squashes a real number into the range 0 to 1: a very large input gives a result close to 1, and a very negative input gives a result close to 0. It was popular in early neural networks because it maps nicely onto whether a neuron is activated and passes a signal on (0: almost not activated, 1: fully activated). In recent deep learning work it appears much less, because it easily causes vanishing or saturating gradients. When a network has many layers and every layer uses sigmoid, backpropagation keeps multiplying by the sigmoid derivative, so the gradient keeps shrinking. For very large or very negative inputs (for example, an input of 100 gives a sigmoid output close to 1 and a gradient close to 0) the unit saturates, leaving the neuron in a near-dead state.
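A minimal sketch for the homework. It assumes a small hidden layer (the size 4 is arbitrary) so that the activation being compared sits inside the network, while the output keeps a sigmoid because BCELoss expects a value in [0, 1]:

```python
import torch
import torch.nn.functional as F


class Model(torch.nn.Module):
    def __init__(self, activation=F.relu):
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(1, 4)   # small hidden layer, size chosen arbitrarily
        self.l2 = torch.nn.Linear(4, 1)
        self.activation = activation      # hidden-layer activation to experiment with

    def forward(self, x):
        out = self.activation(self.l1(x))  # try F.relu, F.elu, F.leaky_relu, F.tanh, ...
        return F.sigmoid(self.l2(out))     # keep sigmoid on the output for BCELoss


# e.g. train each of these with the same loop and compare convergence
model_relu = Model(F.relu)
model_tanh = Model(F.tanh)
model_elu = Model(F.elu)
```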
```mermaid
graph LR
    a --> Linear
    b --> Linear
    Linear --> Sigmoid
    Sigmoid --> y
```
Multi-dimensional input and a deeper network; the changes are mainly in the "Design your model using class" step.
```python
import torch
from torch.autograd import Variable
import numpy as np

xy = np.loadtxt('./data/diabetes.csv.gz', delimiter=',', dtype=np.float32)
x_data = Variable(torch.from_numpy(xy[:, 0:-1]))
y_data = Variable(torch.from_numpy(xy[:, [-1]]))

print(x_data.data.shape)
print(y_data.data.shape)


class Model(torch.nn.Module):
    def __init__(self):
        """
        In the constructor we instantiate three nn.Linear modules.
        """
        super(Model, self).__init__()
        self.l1 = torch.nn.Linear(8, 6)
        self.l2 = torch.nn.Linear(6, 4)
        self.l3 = torch.nn.Linear(4, 1)
        self.sigmoid = torch.nn.Sigmoid()

    def forward(self, x):
        """
        In the forward function we accept a Variable of input data and we must
        return a Variable of output data. We can use Modules defined in the
        constructor as well as arbitrary operators on Variables.
        """
        out1 = self.sigmoid(self.l1(x))
        out2 = self.sigmoid(self.l2(out1))
        y_pred = self.sigmoid(self.l3(out2))
        return y_pred


# our model
model = Model()

# Construct our loss function and an Optimizer. The call to model.parameters()
# in the SGD constructor will contain the learnable parameters of the
# nn.Linear modules which are members of the model.
# criterion = torch.nn.BCELoss(size_average=True)
criterion = torch.nn.BCELoss(reduction='elementwise_mean')
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

# Training loop
for epoch in range(1200000):
    # Forward pass: Compute predicted y by passing x to the model
    y_pred = model(x_data)

    # Compute and print loss
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    # Zero gradients, perform a backward pass, and update the weights.
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```
Homework:
Constructing a Dataset involves three main steps:
Subclass Dataset and implement __getitem__ and __len__ (as in the code below).
Instantiate the dataset, then use it in a DataLoader:
```python
train_loader = DataLoader(dataset=dataset, batch_size=1, shuffle=True, num_workers=1)
```
Code:
```python
# References
# https://github.com/yunjey/pytorch-tutorial/blob/master/tutorials/01-basics/pytorch_basics/main.py
# http://pytorch.org/tutorials/beginner/data_loading_tutorial.html#dataset-class
import torch
import numpy as np
from torch.autograd import Variable
from torch.utils.data import Dataset, DataLoader


class DiabetesDataset(Dataset):
    """ Diabetes dataset."""

    # Initialize your data, download, etc.
    def __init__(self):
        xy = np.loadtxt('./data/diabetes.csv.gz',
                        delimiter=',', dtype=np.float32)
        self.len = xy.shape[0]
        self.x_data = torch.from_numpy(xy[:, 0:-1])
        self.y_data = torch.from_numpy(xy[:, [-1]])

    def __getitem__(self, index):
        return self.x_data[index], self.y_data[index]

    def __len__(self):
        return self.len


dataset = DiabetesDataset()
train_loader = DataLoader(dataset=dataset,
                          batch_size=1,
                          shuffle=True,
                          num_workers=1)

for epoch in range(2):
    for i, data in enumerate(train_loader, 0):
        # get the inputs
        inputs, labels = data

        # wrap them in Variable
        inputs, labels = Variable(inputs), Variable(labels)

        # Run your training process
        print(epoch, i, "inputs", inputs.data, "labels", labels.data)
```
Homework:
Use another dataset (MNIST); the code follows the official example:
To summarize the training workflow: build the Dataset and DataLoader, design the model, choose the loss and optimizer, then loop over batches doing forward, loss, backward, and step.
MNIST softmax
before:
```mermaid
graph LR
    x{x} --> Linear
    Linear --> Activation
    Activation --> ...
    ... --> Linear2
    Linear2 --> Activation2
    Activation2 --> h{y}
```
now:
```mermaid
graph LR
    x{x} --> Linear
    Linear --> Activation
    Activation --> ...
    ... --> Linear2
    Linear2 --> Activation2
    Activation2 --> P_y=0
    Activation2 --> P_y=1
    Activation2 --> ....
    Activation2 --> P_y=10
```
what is softmax?
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K}e^{z_k}} \quad \text{for } j = 1, 2, \ldots, K$$
using softmax to get probabilities.
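For example, taking the logits [2.0, 1.0, 0.1] that appear in the code below:

$$\sigma(z) = \left[\frac{e^{2.0}}{e^{2.0}+e^{1.0}+e^{0.1}},\ \frac{e^{1.0}}{e^{2.0}+e^{1.0}+e^{0.1}},\ \frac{e^{0.1}}{e^{2.0}+e^{1.0}+e^{0.1}}\right] \approx [0.66,\ 0.24,\ 0.10]$$

The values sum to 1 and can be read as class probabilities.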
what is cross entropy?
$$loss = \frac{1}{N}\sum_i D(\mathrm{Softmax}(w x_i + b), Y_i)$$

$$D(\hat{Y}, Y) = -Y \log \hat{Y}$$
The whole pipeline:
```mermaid
graph LR
    x --LinearModel--> Z
    Z --Softmax--> y'
    y' --Cross_Entropy--> Y
```
The implementation in PyTorch:
```python
loss = torch.nn.CrossEntropyLoss()
```
This loss already includes both the Softmax and the Cross Entropy:
```mermaid
graph LR
    X --Softmax--> y'
    y' --Cross_Entropy--> Y
```
Code:
```python
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Cross entropy example
import numpy as np

# One hot
# 0: 1 0 0
# 1: 0 1 0
# 2: 0 0 1
Y = np.array([1, 0, 0])

Y_pred1 = np.array([0.7, 0.2, 0.1])
Y_pred2 = np.array([0.1, 0.3, 0.6])
print("loss1 = ", np.sum(-Y * np.log(Y_pred1)))
print("loss2 = ", np.sum(-Y * np.log(Y_pred2)))

################################################################################
# Softmax + CrossEntropy (logSoftmax + NLLLoss)
loss = nn.CrossEntropyLoss()

# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([0]), requires_grad=False)

# input is of size nBatch x nClasses = 1 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.5, 2.0, 0.3]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("PyTorch Loss1 = ", l1.data, "\nPyTorch Loss2=", l2.data)
print("Y_pred1=", torch.max(Y_pred1.data, 1)[1])
print("Y_pred2=", torch.max(Y_pred2.data, 1)[1])

################################################################################
"""Batch loss"""
# target is of size nBatch
# each element in target has to have 0 <= value < nClasses (0-2)
# Input is class, not one-hot
Y = Variable(torch.LongTensor([2, 0, 1]), requires_grad=False)

# input is of size nBatch x nClasses = 3 x 3
# Y_pred are logits (not softmax)
Y_pred1 = Variable(torch.Tensor([[0.1, 0.2, 0.9],
                                 [1.1, 0.1, 0.2],
                                 [0.2, 2.1, 0.1]]))
Y_pred2 = Variable(torch.Tensor([[0.8, 0.2, 0.3],
                                 [0.2, 0.3, 0.5],
                                 [0.2, 0.2, 0.5]]))

l1 = loss(Y_pred1, Y)
l2 = loss(Y_pred2, Y)

print("Batch Loss1 = ", l1.data, "\nBatch Loss2=", l2.data)
```
Homework: CrossEntropyLoss vs. NLLLoss?
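A minimal sketch for the comparison, reusing the logits and target from the example above: CrossEntropyLoss applied to raw logits should give the same value as NLLLoss applied to log_softmax of those logits.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.autograd import Variable

logits = Variable(torch.Tensor([[2.0, 1.0, 0.1]]))   # raw scores, no softmax applied
target = Variable(torch.LongTensor([0]))

# CrossEntropyLoss = LogSoftmax + NLLLoss, applied directly to the logits
ce = nn.CrossEntropyLoss()(logits, target)

# NLLLoss expects log-probabilities, so apply log_softmax first
nll = nn.NLLLoss()(F.log_softmax(logits, dim=1), target)

print(ce.data, nll.data)  # the two values should match
```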
```mermaid
graph LR
    inputLayer -.-> HiddenLayer
    HiddenLayer -.-> OutputLayer
```
Code:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 16

# MNIST Dataset
train_dataset = datasets.MNIST(root='./mnist_data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./mnist_data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.l1 = nn.Linear(784, 520)
        self.l2 = nn.Linear(520, 320)
        self.l3 = nn.Linear(320, 240)
        self.l4 = nn.Linear(240, 120)
        self.l5 = nn.Linear(120, 10)

    def forward(self, x):
        x = x.view(-1, 784)  # Flatten the data (n, 1, 28, 28) -> (n, 784)
        x = F.relu(self.l1(x))
        x = F.relu(self.l2(x))
        x = F.relu(self.l3(x))
        x = F.relu(self.l4(x))
        return self.l5(x)


model = Net()

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = criterion(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += criterion(output, target).data[0]
        # get the index of the max
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Homework:
Use DataLoader
Simple convolution layer
For example:
```mermaid
graph LR
    3*3*1_image --> 2*2*1_filter_W
    3*3*1_image --> 1*1_Stride
    3*3*1_image --> NoPadding
    NoPadding --> 2*2_featureMap
    2*2*1_filter_W --> 2*2_featureMap
    1*1_Stride --> 2*2_featureMap
```
How do we compute with multi-channel images? Each filter computes
$$w^T x + b$$
over each local patch, giving a 28 * 28 * 1 feature map per filter, i.e. N feature maps if N filters are used.
The output-size formula:
$$OutputSize = \frac{InputSize + 2 \times PaddingSize - FilterSize}{Stride} + 1$$
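For example, for the MNIST model later in this note: a 28 x 28 input, a 5 x 5 filter, no padding, and stride 1 give

$$\frac{28 + 2 \times 0 - 5}{1} + 1 = 24,$$

so conv1 turns a 28 x 28 image into a 24 x 24 feature map, and the following 2 x 2 max pool halves it to 12 x 12.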
A few parameters that need explanation:
CONV
The convolution layer; it is normally used together with an activation function.
The number of filters, the padding, and the filter size determine the output size via the formula above.
```python
torch.nn.Conv2d(in_channels, out_channels, kernel_size)

self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
```
Activation functions
Max Pooling
Take the maximum value inside an n*m window as the pooling result.
There is also the similar average pooling.
```python
nn.MaxPool2d(kernel_size)

self.mp = nn.MaxPool2d(2)
```
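A tiny illustration (tensor values chosen arbitrarily, PyTorch >= 0.4 assumed): a 2 x 2 max pool halves each spatial dimension and keeps the largest value in each window.

```python
import torch
import torch.nn as nn

mp = nn.MaxPool2d(2)
x = torch.Tensor([[[[1,  2,  5,  6],
                    [3,  4,  7,  8],
                    [9, 10, 13, 14],
                    [11, 12, 15, 16]]]])  # shape (1, 1, 4, 4)
print(mp(x))                              # shape (1, 1, 2, 2): [[4, 8], [12, 16]]
```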
Fully connected layer
```python
self.fc = nn.Linear(320, 10)
```
In a CNN, a neuron is not connected to every pixel.
In a fully connected network, every neuron is connected to every input pixel.
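To make that concrete, here is a rough parameter count using layer sizes that appear in this note (the comparison is only illustrative): the convolution layer shares one small filter across all positions, while the linear layer needs a weight for every input pixel and every output unit.

```python
import torch.nn as nn


def num_params(m):
    return sum(p.numel() for p in m.parameters())


conv = nn.Conv2d(1, 10, kernel_size=5)  # 10 * (1 * 5 * 5) weights + 10 biases = 260
fc = nn.Linear(784, 10)                 # 784 * 10 weights + 10 biases = 7850

print(num_params(conv), num_params(fc))  # 260 7850
```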
```mermaid
graph TB
    ConvolutionalLayer1 --> PoolingLayer1
    PoolingLayer1 --> ConvolutionalLayer2
    ConvolutionalLayer2 --> PoolingLayer2
    PoolingLayer2 --> Fully-ConnectedLayer
```
Model:
```python
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(10, 20, kernel_size=5)
        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(???, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = F.relu(self.mp(self.conv2(x)))
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)
```
How do you fill in the ??? ? You can put an arbitrary number there first, run the program, and read the correct value off the size-mismatch error (for this model it works out to 320, matching the nn.Linear(320, 10) shown earlier).
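An alternative to reading the error message is to temporarily print the shape right after flattening; this is a debugging version of the forward() method above (the print line is the only change):

```python
def forward(self, x):
    in_size = x.size(0)
    x = F.relu(self.mp(self.conv1(x)))
    x = F.relu(self.mp(self.conv2(x)))
    x = x.view(in_size, -1)
    print(x.size())          # e.g. (batch_size, 320) -> use 320 in nn.Linear(???, 10)
    x = self.fc(x)
    return F.log_softmax(x)
```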
Homework: try a deeper network with more fully connected layers.
Why 1*1 convolution ?
Using 32 1*1 filters turns a 64-channel feature map into a 32-channel one.
Using 1*1 filters first can significantly reduce the amount of computation.
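A rough multiply count, assuming (hypothetically) 28 x 28 feature maps and a 5 x 5 convolution from the 64 channels above down to 32:

Direct 5 x 5 convolution: 28 * 28 * 32 * (5 * 5 * 64), roughly 40.1M multiplies.

With a 1 x 1 bottleneck down to 16 channels first, then 5 x 5 up to 32: 28 * 28 * 16 * 64 + 28 * 28 * 32 * (5 * 5 * 16), roughly 0.8M + 10.0M = 10.8M multiplies.

Same input and output shapes, about a quarter of the work.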
```mermaid
graph LR
    Filter_concat_in --> 1*1Conv0_16
    Filter_concat_in --> 1*1Conv1_16
    Filter_concat_in --> 1*1Conv2_16
    Filter_concat_in --> AvgPooling
    AvgPooling --> 1*1Conv3_16
    1*1Conv0_16 --> 3*3Conv0_24
    3*3Conv0_24 --> 3*3Conv1_24
    3*3Conv1_24 --> Filter_Concat_out
    1*1Conv1_16 --> 5*5Conv_24
    5*5Conv_24 --> Filter_Concat_out
    1*1Conv3_16 --> Filter_Concat_out
    1*1Conv2_16 --> Filter_Concat_out
```
Implementation:
```python
self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

branch1x1 = self.branch1x1(x)
```
```python
self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
branch_pool = self.branch_pool(branch_pool)
```
```python
self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

branch5x5 = self.branch5x5_1(x)
branch5x5 = self.branch5x5_2(branch5x5)
```
```python
self.branch3x3_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
self.branch3x3_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
self.branch3x3_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

branch3x3 = self.branch3x3_1(x)
branch3x3 = self.branch3x3_2(branch3x3)
branch3x3 = self.branch3x3_3(branch3x3)
```
```python
outputs = [branch1x1, branch_pool, branch5x5, branch3x3]
return torch.cat(outputs, 1)  # concatenate the branches along the channel dimension
```
ALL CODE:
```python
# https://github.com/pytorch/examples/blob/master/mnist/main.py
from __future__ import print_function
import argparse
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torchvision import datasets, transforms
from torch.autograd import Variable

# Training settings
batch_size = 64

# MNIST Dataset
train_dataset = datasets.MNIST(root='./data/',
                               train=True,
                               transform=transforms.ToTensor(),
                               download=True)

test_dataset = datasets.MNIST(root='./data/',
                              train=False,
                              transform=transforms.ToTensor())

# Data Loader (Input Pipeline)
train_loader = torch.utils.data.DataLoader(dataset=train_dataset,
                                           batch_size=batch_size,
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset,
                                          batch_size=batch_size,
                                          shuffle=False)


class InceptionA(nn.Module):

    def __init__(self, in_channels):
        super(InceptionA, self).__init__()
        self.branch1x1 = nn.Conv2d(in_channels, 16, kernel_size=1)

        self.branch5x5_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch5x5_2 = nn.Conv2d(16, 24, kernel_size=5, padding=2)

        self.branch3x3dbl_1 = nn.Conv2d(in_channels, 16, kernel_size=1)
        self.branch3x3dbl_2 = nn.Conv2d(16, 24, kernel_size=3, padding=1)
        self.branch3x3dbl_3 = nn.Conv2d(24, 24, kernel_size=3, padding=1)

        self.branch_pool = nn.Conv2d(in_channels, 24, kernel_size=1)

    def forward(self, x):
        branch1x1 = self.branch1x1(x)

        branch5x5 = self.branch5x5_1(x)
        branch5x5 = self.branch5x5_2(branch5x5)

        branch3x3dbl = self.branch3x3dbl_1(x)
        branch3x3dbl = self.branch3x3dbl_2(branch3x3dbl)
        branch3x3dbl = self.branch3x3dbl_3(branch3x3dbl)

        branch_pool = F.avg_pool2d(x, kernel_size=3, stride=1, padding=1)
        branch_pool = self.branch_pool(branch_pool)

        outputs = [branch1x1, branch5x5, branch3x3dbl, branch_pool]
        return torch.cat(outputs, 1)


class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.conv1 = nn.Conv2d(1, 10, kernel_size=5)
        self.conv2 = nn.Conv2d(88, 20, kernel_size=5)

        self.incept1 = InceptionA(in_channels=10)
        self.incept2 = InceptionA(in_channels=20)

        self.mp = nn.MaxPool2d(2)
        self.fc = nn.Linear(1408, 10)

    def forward(self, x):
        in_size = x.size(0)
        x = F.relu(self.mp(self.conv1(x)))
        x = self.incept1(x)
        x = F.relu(self.mp(self.conv2(x)))
        x = self.incept2(x)
        x = x.view(in_size, -1)  # flatten the tensor
        x = self.fc(x)
        return F.log_softmax(x)


model = Net()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.5)


def train(epoch):
    model.train()
    for batch_idx, (data, target) in enumerate(train_loader):
        data, target = Variable(data), Variable(target)
        optimizer.zero_grad()
        output = model(data)
        loss = F.nll_loss(output, target)
        loss.backward()
        optimizer.step()
        if batch_idx % 10 == 0:
            print('Train Epoch: {} [{}/{} ({:.0f}%)]\tLoss: {:.6f}'.format(
                epoch, batch_idx * len(data), len(train_loader.dataset),
                100. * batch_idx / len(train_loader), loss.data[0]))


def test():
    model.eval()
    test_loss = 0
    correct = 0
    for data, target in test_loader:
        data, target = Variable(data, volatile=True), Variable(target)
        output = model(data)
        # sum up batch loss
        test_loss += F.nll_loss(output, target, size_average=False).data[0]
        # get the index of the max log-probability
        pred = output.data.max(1, keepdim=True)[1]
        correct += pred.eq(target.data.view_as(pred)).cpu().sum()

    test_loss /= len(test_loader.dataset)
    print('\nTest set: Average loss: {:.4f}, Accuracy: {}/{} ({:.0f}%)\n'.format(
        test_loss, correct, len(test_loader.dataset),
        100. * correct / len(test_loader.dataset)))


for epoch in range(1, 10):
    train(epoch)
    test()
```
Recurrent NN
```mermaid
graph LR
    X1 --> A1
    A1 --> h1
    X2 --> A2
    A2 --> h2
    X3 --> A3
    A3 --> h3
    X4 --> A4
    A4 --> h4
    A1 --> A2
    A2 --> A3
    A3 --> A4
```
PyTorch provides RNN modules that can be used directly.
The different RNN implementations:
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
cell = nn.GRU(input_size=4, hidden_size=2, batch_first=True)
cell = nn.LSTM(input_size=4, hidden_size=2, batch_first=True)
```
How to use RNN?
```python
cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)
inputs = ...     # (batch_size, seq_len, input_size)
hidden = (...)   # (num_layers, batch_size, hidden_size)
out, hidden = cell(inputs, hidden)
```
There are two return values: one is the output at every time step, the other is the final hidden state.
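A minimal concrete sketch of that call (the shapes and random values are arbitrary, just to show the API):

```python
import torch
import torch.nn as nn
from torch.autograd import Variable

cell = nn.RNN(input_size=4, hidden_size=2, batch_first=True)

inputs = Variable(torch.randn(1, 3, 4))   # (batch_size=1, seq_len=3, input_size=4)
hidden = Variable(torch.zeros(1, 1, 2))   # (num_layers=1, batch_size=1, hidden_size=2)

out, hidden = cell(inputs, hidden)
print(out.size())     # (1, 3, 2): one output per time step
print(hidden.size())  # (1, 1, 2): the final hidden state
```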
```python
# Lab 12 RNN
import sys
import torch
import torch.nn as nn
from torch.autograd import Variable

torch.manual_seed(777)  # reproducibility

#            0    1    2    3    4
idx2char = ['h', 'i', 'e', 'l', 'o']

# Teach hihell -> ihello
x_data = [0, 1, 0, 2, 3, 3]         # hihell
one_hot_lookup = [[1, 0, 0, 0, 0],  # 0
                  [0, 1, 0, 0, 0],  # 1
                  [0, 0, 1, 0, 0],  # 2
                  [0, 0, 0, 1, 0],  # 3
                  [0, 0, 0, 0, 1]]  # 4

y_data = [1, 0, 2, 3, 3, 4]         # ihello
x_one_hot = [one_hot_lookup[x] for x in x_data]

# As we have one batch of samples, we will change them to variables only once
inputs = Variable(torch.Tensor(x_one_hot))
labels = Variable(torch.LongTensor(y_data))

num_classes = 5
input_size = 5       # one-hot size
hidden_size = 5      # output from the RNN. 5 to directly predict one-hot
batch_size = 1       # one sentence
sequence_length = 1  # One by one
num_layers = 1       # one-layer rnn


class Model(nn.Module):
    def __init__(self):
        super(Model, self).__init__()
        self.rnn = nn.RNN(input_size=input_size,
                          hidden_size=hidden_size, batch_first=True)

    def forward(self, hidden, x):
        # Reshape input (batch first)
        x = x.view(batch_size, sequence_length, input_size)

        # Propagate input through RNN
        # Input: (batch, seq_len, input_size)
        # hidden: (num_layers * num_directions, batch, hidden_size)
        out, hidden = self.rnn(x, hidden)
        return hidden, out.view(-1, num_classes)

    def init_hidden(self):
        # Initialize hidden and cell states
        # (num_layers * num_directions, batch, hidden_size)
        return Variable(torch.zeros(num_layers, batch_size, hidden_size))


# Instantiate RNN model
model = Model()
print(model)

# Set loss and optimizer function
# CrossEntropyLoss = LogSoftmax + NLLLoss
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.1)

# Train the model
for epoch in range(100):
    optimizer.zero_grad()
    loss = 0
    hidden = model.init_hidden()

    sys.stdout.write("predicted string: ")
    for input, label in zip(inputs, labels):
        # print(input.size(), label.size())
        hidden, output = model(hidden, input)
        val, idx = output.max(1)
        sys.stdout.write(idx2char[idx.data[0]])
        loss += criterion(output, label)

    print(", epoch: %d, loss: %1.3f" % (epoch + 1, loss.data[0]))

    loss.backward()
    optimizer.step()

print("Learning finished!")
```