【pytorch】pytorch学习笔记（一）

时间 2019-11-20

标签 pytorch 学习笔记繁體版

原文原文链接

原文地址：https://pytorch.org/tutorials/beginner/deep_learning_60min_blitz.htmlhtml

什么是pytorch？python

　　pytorch是一个基于python语言的的科学计算包，主要分为两种受众：数组

可以使用GPU运算取代NumPy
提供最大灵活度和速度的深度学习研究平台

开始网络

Tensors框架

　　Tensors与numpy的ndarray类似，且Tensors能使用GPU进行加速计算。dom

　　建立5 * 3的未初始化矩阵：ide

　　建立并随机初始化矩阵：函数

　　建立一个类型为long且值全为0的矩阵：oop

　　直接赋值建立tensor：学习

　　使用已有的tensor建立一个tensor，这个方法能使用已有tensor的属性如类型：

　　new_* 方法的参数是张量形状大小

　　重写类型，结果具备相同的形状大小

　　获取张量的尺寸大小：

　　注：torch.Size其实是一个元组，因此它支持全部的元组操做。

Operations（运算）

　　pytorch有多种运算的语法。在下列例子中，咱们看一下加法运算：

　　加法1

　　加法2:

　　加法：提供一个输出tensor做为参数

　　加法：in-place （至关于+= ，将结果直接赋值到左侧的变量）

　　注：任何方法名中使用“_”的都是in-place操做，好比x.copy_(y)，x.t_()，都会使x的值发生改变。

　　同时，你也可使用标准NumPy风格的索引操做：

　　Resizing：若是你想要resize或reshape一个张量，可使用方法 torch.view ：

　　若是张量只有一个元素，可使用.item()来获取一个python数值

NumPy Bridge

　　Torch的张量和NumPy数组的相互转换，他们共享底层的内存位置，更改一个另外一个也会改变。

　　张量转换为NumPy数组：

　　当张量的值改变时，numpy数组的值也同时改变：

　　Numpy数组转换为张量：

　　除了字符张量，全部在CPU上的张量都能与NumPy数组相互转换。

CUDA Tensors

　　张量可以被转移到任何设备经过使用 .to 方法。

AUTOGRAD: AUTOMATIC DIFFERENTIATION

　　autograd包在pytorch中是全部神经网络的核心，咱们先看一下而后将会训练本身的神经网络。

　　autograd包给全部张量运算提供了自动微分。他是一个运行定义框架（define-by-run framework），意为着被你的代码如何运行所定义，每次的单个迭代都是不一样的。

　　让咱们用简单的术语和一些例子来看下：

Tensor

　　torch.Tensor是核心类。若是你设置了它的属性 .requires_grad为True，它就开始跟踪全部的运算。当完成全部的计算能够经过 .backward() 而后就能够自动进行全部的梯度计算。该张量的全部梯度将会累加到.grad() 属性。

　　经过调用 .detach() 方法能够中止一个张量对计算记录的追踪。

　　另外一种阻止张量计算追踪的方法是，将代码块写入 with torch.no_grad(): ，这在训练中评价测试一个模型的时候尤为有用，由于咱们设置了requires_grad =True，而此时并不须要梯度。

　　自动微分的另外一个重要实现是类Function。

　　Tensor和Function是相互联系的并创建了一个非循环图，记录了计算的完整历史。每一个张量有.grad_fn属性，这个属性与建立张量的Function相关联（除了用户本身建立张量，该属性grad_fn is None）

　　若是你想要计算导数，你能够调用张量的.backward()方法，若是张量是一个标量，如只有一个元素的数据，则backward()不须要指定任何参数；若是其有更多参数，则须要指定一个具备匹配张量大小的gradient参数。

　　建立一个张量并设置 requires_grad = True来追踪计算：

　　张量运算：

　　y是做为运算的结果所建立的，因此它有属性 grad_fn。

　　y的更多运算：

　　.require_grad_()方法能够改变张量requires_grad的值，若是没指定的话，默认值为False。

Gradients

　　开始反向传播。由于 out 包含一个单一的标量，out.backward() 至关于 out.backward((torch.tensor(1.))) 。　

　　打印梯度d(out)/dx：

x = [ [1, 1], [1, 1] ]

y = x + 2

 z =  y * y * 3

out = z.mean

d(out)/dx = 3/2(x + 2)

　　当.requires_grad=True 时，你也能够经过写入代码块with torch.no_grad()中止追踪历史和自动微分：

NEURAL NETWORKS

　　torch.nn能够建立神经网络，nn依赖于 autograd 来定义模型和微分。nn.Moudle包括各类层和返回output的方法forward(input)。

　　例如，如下为对数字图片进行分类的网络：

　　一个简单的前馈网络，得到输入并通过一个个网络层，最终获得输出。

　　如下是一个神经网络的训练步骤：

定义一个具备可学习参数或权重的神经网络；
对一个数据集的输入进行迭代；
经过网络处理输入；
计算损失
将梯度反向传播给网络参数
更新网络参数，一般的更新方法为 weight = weight - learning_rate* gradien

Define the network

#encoding:utf-8
import torch
import torch.nn as nn
import torch.nn.functional as F
 
class Net(nn.Module):
 
    def __init__(self):
        super(Net,self).__init__()
        #定义2个卷积层
        self.conv1=nn.Conv2d(in_channels=1,out_channels=6,kernel_size=5)
        self.conv2=nn.Conv2d(in_channels=6,out_channels=16,kernel_size=5)
        #定义了3个线性层，即y=wx+b
        self.fc1=nn.Linear(in_features=16*5*5,out_features=120)
        self.fc2=nn.Linear(120,84)
        self.fc3=nn.Linear(84,10)
 
    def forward(self, x):
        #在第一层卷积网络和第二层之间增长了relu激活函数和池化层
        x=F.max_pool2d(input=F.relu(input=self.conv1(x)),kernel_size=(2,2))
        #若是池化层是方块形的能够用一个number指定
        x=F.max_pool2d(input=F.relu(input=self.conv2(x)),kernel_size=2)
        x=x.view(-1,self.num_flat_features(x))
        x=F.relu(input=self.fc1(x))
        x=F.relu(self.fc2(x))
        x=self.fc3(x)
        return x
 
    def num_flat_features(self,x):
        size=x.size()[1:]#切片0里面放的是当前训练batch的number，不是特征信息
        num_features=1
        for s in size:
            num_features*=s
        return num_features
net=Net()
print(net)

Net(
  (conv1): Conv2d(1, 6, kernel_size=(5, 5), stride=(1, 1))
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

　　你只要定义前向函数，当你使用 autograd 时，后向传播函数将会自动定义。在前向函数中你可使用任何张量运算。

　　模型的可学习参数由 net.parameters() 返回。

params = list(net.parameters())
print(len(params))
print(params[0].size())  # conv1's .weight

　　这个网络适合32*32的数据集（1*32*32通过一个核心为5的卷积层，大小变为6*28*28；通过一个核心为2的池化层，大小变为6*14*14；再通过一个核心为5的卷积层，大小变为16*10*10；池化层，16*5*5，正好是下一个线性层的输入特征数），咱们随机一个输入数据集，运行一下咱们的模型吧：

input = torch.randn(1, 1, 32, 32)
out = net(input)
print(out)

tensor([[-8.2934e-03, -1.9684e-02, -7.6836e-02,  5.3187e-02,  1.1083e-01,
         -2.5156e-02, -6.1870e-05, -8.3843e-03,  4.1401e-02, -5.8330e-02]],
       grad_fn=<AddmmBackward>)

　　将全部参数的梯度设为0，使用随机梯度进行反向传播：

net.zero_grad()
out.backward(torch.randn(1, 10))

　　注：torch.nn只支持批量数据的处理，而不是单个样本。例如，nn.Conv2d 将会使用 4D张量，nSamples x nChannels x Height x Width，若是只有单个样本，可使用input.unsqueeze(0)来增长一个假的batch维度。

　　回顾下刚学到的几个类：

torch.Tensor- 多维数组并支持自动微分运算如backward()
nn.Moudle-神经网络模型，帮助咱们封装参数，移植到GPU，导出和加载等
nn.Parameter-张量，当你给Module定义属性时，参数会自动生成
autograd.Function-实现前向和反向定义的自动微分运算。每一个张量运算至少建立一个Function结点，来链接建立张量和记录历史的函数

Loss Function

　　损失函数计算输出和目标之间的值，来衡量二者的差距。在nn中有多种不一样的损失函数，一个简单的损失函数是：nn.MSELoss，它返回的是均方差，即每个份量相减的平方累计最后除份量的个数。

output = net(input)
target = torch.randn(10)  # a dummy target, for example
target = target.view(1, -1)  # make it the same shape as output
criterion = nn.MSELoss()

loss = criterion(output, target)
print(loss)

tensor(0.9531, grad_fn=<MseLossBackward>)

　　若是你沿着loss反向传播的方向，用.grad_fn属性能够看到计算图：

input -> conv2d -> relu -> maxpool2d -> conv2d -> relu -> maxpool2d
      -> view -> linear -> relu -> linear -> relu -> linear
      -> MSELoss
      -> loss

　　因此，当你调用loss.backward()时，在整个计算图都会进行微分，全部requires_grad=True的张量的.grad属性都会累加。

　　为了说明，打印以下少数几步的反向传播：

print(loss.grad_fn)  # MSELoss
print(loss.grad_fn.next_functions[0][0])  # Linear
print(loss.grad_fn.next_functions[0][0].next_functions[0][0])  # ReLU

<MseLossBackward object at 0x7f6e48d5ee48>
<AddmmBackward object at 0x7f6e48d5eac8>
<AccumulateGrad object at 0x7f6e48d5eac8>

Backprop

　　为了反向传播偏差咱们须要作的只要使用 loss.backward() ，同时也须要清除已有的梯度，不然梯度会累加到已经存在的梯度。

　　调用 loss.backward() ，观察conv1 层bias的梯度反向传播先后的变化　　

net.zero_grad()     # zeroes the gradient buffers of all parameters

print('conv1.bias.grad before backward')
print(net.conv1.bias.grad)

loss.backward()

print('conv1.bias.grad after backward')
print(net.conv1.bias.grad)

conv1.bias.grad before backward
tensor([0., 0., 0., 0., 0., 0.])
conv1.bias.grad after backward
tensor([-0.0022,  0.0022, -0.0139,  0.0072,  0.0029,  0.0035])

　　如今咱们已经知道了如何使用损失函数。

Update the weights

　　在实际中使用的最简单的更新规则是随即梯度降低SGD：

weight = weight - learning_rate * gradient

　　咱们能够简单地使用python代码进行实现：

learning_rate = 0.01
for f in net.parameters():
    f.data.sub_(f.grad.data * learning_rate)

　　若是须要使用其余的更新规则，好比SGD，Nesterov-SGD, Adam, RMSProp等等，pytorch也提供了相关的包：torch.optim ，实现了这些方法。

　　使用实例：

import torch.optim as optim

# create your optimizer
optimizer = optim.SGD(net.parameters(), lr=0.01)

# in your training loop:
optimizer.zero_grad()   # zero the gradient buffers
output = net(input)
loss = criterion(output, target)
loss.backward()
optimizer.step()    # Does the update

　　注：代码中咱们手动将梯度设置为0，optimizer.zero_grad()，这是由于梯度不置0的话将会累加之前的值，在反向传播那一节咱们也提到过。

TRAINING A CLASSIFIER

　　咱们已经知道了如何定义网络，计算损失和更新网络权重，那咱们将会思考，那数据呢？

　　一般来说，当你处理图片，文本，语音和视频数据，可使用标准的python包来加载数据为numpy数组，而后转换为张量torch.*Tensor。

图片，一般可使用Pillow, OpenCV
语音，使用scipy 和 librosa
文本，纯python或Cython，也可使用NLTK和SpaCy

　　对于视觉，咱们专门建立了一个名为 torchvision 的包，其拥有加载普通数据集（Imagenet, CIFAR10, MNIST）的加载器和图片数据转换器，即torchvision.datasets 和 torch.utils.data.DataLoader。

　　在本教程中，咱们将使用CIFAR10数据集。它有10个类别：‘airplane’, ‘automobile’, ‘bird’, ‘cat’, ‘deer’, ‘dog’, ‘frog’, ‘horse’, ‘ship’, ‘truck’. CIFAR10中的图像的大小为3*32*32，即大小为32*32的有3个通道的彩色图像。　　

Training an image classifier

　　接下来咱们要作一步步完成如下内容：

使用torchvision加载和归一化CIFAR10的训练和测试数据集
定义一个卷积神经网络
定义一个损失函数
在训练数据上训练数据
在测试集上测试网络

1. Loading and normalizing CIFAR10

import torch
import torchvision
import torchvision.transforms as transforms

　　torchvision数据集的输出是在[0,1]范围的]PILImage，咱们将其转化为归一化范围[-1, 1]的张量。

transform = transforms.Compose(
    [transforms.ToTensor(),
     transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))])

trainset = torchvision.datasets.CIFAR10(root='./data', train=True,
                                        download=True, transform=transform)
trainloader = torch.utils.data.DataLoader(trainset, batch_size=4,
                                          shuffle=True, num_workers=2)

testset = torchvision.datasets.CIFAR10(root='./data', train=False,
                                       download=True, transform=transform)
testloader = torch.utils.data.DataLoader(testset, batch_size=4,
                                         shuffle=False, num_workers=2)

classes = ('plane', 'car', 'bird', 'cat',
           'deer', 'dog', 'frog', 'horse', 'ship', 'truck')

Downloading https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz to ./data/cifar-10-python.tar.gz
Files already downloaded and verified

　　使用matplotlib库查看训练数据：

import matplotlib.pyplot as plt
import numpy as np

# functions to show an image


def imshow(img):
    img = img / 2 + 0.5     # unnormalize
    npimg = img.numpy()
    plt.imshow(np.transpose(npimg, (1, 2, 0)))
    plt.show()


# get some random training images
dataiter = iter(trainloader)
images, labels = dataiter.next()

# show images
imshow(torchvision.utils.make_grid(images))
# print labels
print(' '.join('%5s' % classes[labels[j]] for j in range(4)))

frog  bird truck  ship

2. Define a Convolutional Neural Network

　　复制神经网络那一节的代码，而后将输入图像的通道改成3个，

import torch.nn as nn
import torch.nn.functional as F
 
class Net(nn.Module):
    def __init__(self):
        super(Net,self).__init__()
        self.conv1=nn.Conv2d(3,6,5)
        self.pool=nn.MaxPool2d(kernel_size=2,stride=2)
        self.conv2=nn.Conv2d(6,16,5)
        self.fc1=nn.Linear(16*5*5,120)
        self.fc2=nn.Linear(in_features=120,out_features=84)
        self.fc3 = nn.Linear(in_features=84, out_features=10)
    def forward(self, x):
        x=self.pool(F.relu(self.conv1(x)))
        x=self.pool(F.relu(self.conv2(x)))
        x=x.view(-1,16*5*5)
        x=F.relu(self.fc1(x))
        x=F.relu(self.fc2(x))
        x=self.fc3(x)
        return  x
net=Net()
print(net)

Net(
  (conv1): Conv2d(3, 6, kernel_size=(5, 5), stride=(1, 1))
  (pool): MaxPool2d(kernel_size=2, stride=2, padding=0, dilation=1, ceil_mode=False)
  (conv2): Conv2d(6, 16, kernel_size=(5, 5), stride=(1, 1))
  (fc1): Linear(in_features=400, out_features=120, bias=True)
  (fc2): Linear(in_features=120, out_features=84, bias=True)
  (fc3): Linear(in_features=84, out_features=10, bias=True)
)

3. Define a Loss function and optimizer

　　使用分类的交叉熵损失函数和带动量的随机梯度降低。

import torch.optim as optim

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(net.parameters(), lr=0.001, momentum=0.9)

4. Train the network　　

for epoch in range(2):#循环整个数据集多少遍
 
    running_loss=0.0
    for i,data in enumerate(trainloader,start=0):
        #get the inputs
        inputs, labels = data
 
        # 梯度清0
        optimizer.zero_grad()
 
        #forward + backword + optimize
        outputs = net(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()
 
        #打印统计信息
        running_loss += loss.item()
        if i % 2000 == 1999: #每2000个mini-batches打印一次
            print("[%d, %d] loss: %.3f" %
                  (epoch+1, i+1, running_loss/2000))
            running_loss=0.0
print('Finished Training')