Introduction

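This section introduces VGG, whose name comes from the Visual Geometry Group, the lab where the paper was written $^{\color{red}[1]}$ . VGG proposed the idea of building deep models by repeatedly using simple basic blocks $^{\color{red}[2]}$ .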

Notes:
[1] Simonyan, K., & Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[2] Mu Li, Aston Zhang, et al., Dive into Deep Learning (动手学深度学习).

1 Imports

import time
import torch
from torch import nn, optim
from util.SimpleTool import load_data_fashion_mnist

2 VGG block

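A VGG block follows a fixed pattern: several identical convolutional layers with padding $1$ and a $3 \times 3$ window, followed by one max pooling layer with stride $2$ and a $2 \times 2$ window $^{\color{red}[1]}$ . The convolutional layers keep the input height and width unchanged, and the pooling layer halves them.
The following code implements the basic VGG block. It lets you specify the number of convolutional layers and the numbers of input and output channels: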

def vgg_block(num_convs, in_channels, out_channels):
    """ The VGG block. """
    temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()]
    for i in range(1, num_convs):
        temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        temp_block.append(nn.ReLU())
    temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*temp_block)
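The claim that the convolutions preserve height and width while the pooling layer halves them can be checked with the standard output-size formula, $\lfloor (n + 2p - k) / s \rfloor + 1$. The sketch below is plain Python and does not touch PyTorch; the helper names `out_size` and `vgg_block_out_size` are illustrative and not part of the original code.

```python
def out_size(n, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling layer (floor mode)."""
    return (n + 2 * padding - kernel) // stride + 1


def vgg_block_out_size(n, num_convs):
    """Height/width after num_convs 3x3 convs (padding 1) and one 2x2 max pool (stride 2)."""
    for _ in range(num_convs):
        n = out_size(n, kernel=3, stride=1, padding=1)  # size is unchanged
    return out_size(n, kernel=2, stride=2)  # size is halved


print(vgg_block_out_size(224, num_convs=2))  # 112
```

The number of convolutions in the block does not affect the spatial size; only the single pooling layer at the end does.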

3 VGG network

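A VGG network consists of a convolutional module followed by a fully connected module. The convolutional module chains several vgg_block instances, and its hyperparameters are defined by the variable conv_arch, which specifies the number of convolutional layers and the numbers of input and output channels for each VGG block. The fully connected module is the same as in AlexNet.
We now construct a VGG network with the following characteristics:
1) $5$ convolutional blocks: the first two use a single convolutional layer, and the last $3$ use two convolutional layers each;
2) the first block has $1$ input channel and $64$ output channels, and each subsequent block doubles the output channels until they reach $512$ .
Because this network uses $8$ convolutional layers and $3$ fully connected layers, it is called VGG-11: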

def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096):
    ret_net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        ret_net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
    ret_net.add_module("fc", nn.Sequential(FlattenLayer(),
                                       nn.Linear(fc_num_features, fc_num_hiddens),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_num_hiddens, fc_num_hiddens),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_num_hiddens, 10)
                                       ))
    return ret_net


class FlattenLayer(torch.nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):
        return x.view(x.shape[0], -1)
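As a quick sanity check on the numbers used here, the depth of the network and the size of the flattened feature vector can be derived from conv_arch alone. This is a small sketch in plain Python; it assumes a $224 \times 224$ input and that each block halves the height and width exactly once. The variable names other than conv_arch are illustrative.

```python
conv_arch = ((1, 1, 64),
             (1, 64, 128),
             (2, 128, 256),
             (2, 256, 512),
             (2, 512, 512))

# Depth: convolutional layers across all blocks plus the three linear layers.
num_conv_layers = sum(num_convs for num_convs, _, _ in conv_arch)
num_fc_layers = 3
depth = num_conv_layers + num_fc_layers

# Each block halves height and width once, so five blocks divide 224 by 2**5.
input_size = 224
final_size = input_size // 2 ** len(conv_arch)
fc_num_features = conv_arch[-1][2] * final_size * final_size

print(depth, final_size, fc_num_features)  # 11 7 25088
```

The value 25088 equals 512 * 7 * 7, which is the fc_num_features passed to the first linear layer.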

Print the output shape of each stage:

def test1():
    temp_conv_arch = ((1, 1, 64),
                      (1, 64, 128),
                      (2, 128, 256),
                      (2, 256, 512),
                      (2, 512, 512))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens)
    temp_x = torch.rand(1, 1, 224, 224)
    for name, block in temp_net.named_children():
        temp_x = block(temp_x)
        print(name, "output shape:", temp_x.shape)


if __name__ == '__main__':
    test1()

The output is as follows:

vgg_block_1 output shape: torch.Size([1, 64, 112, 112])
vgg_block_2 output shape: torch.Size([1, 128, 56, 56])
vgg_block_3 output shape: torch.Size([1, 256, 28, 28])
vgg_block_4 output shape: torch.Size([1, 512, 14, 14])
vgg_block_5 output shape: torch.Size([1, 512, 7, 7])
fc output shape: torch.Size([1, 10])

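As the output shows, the height and width are halved at every block until they reach $7 \times 7$ , at which point the result is passed to the fully connected layers. Meanwhile, the number of output channels doubles each time until it reaches $512$ .
Because every convolutional layer uses the same window size, the parameter count of a layer is proportional to the product of its input and output channels, and its computational complexity is proportional to that product multiplied by the input height and width.
VGG's design of halving the height and width while doubling the channels therefore gives most convolutional layers a similar computational complexity, even though the parameter count of the layers grows with the channel counts.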
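The cost argument above can be checked with a rough count per convolutional layer. The sketch below is plain Python under simplifying assumptions: biases are ignored, cost is measured in multiply-accumulate operations (MACs), and the input is $224 \times 224$. The function name `conv_layer_stats` is illustrative.

```python
conv_arch = ((1, 1, 64),
             (1, 64, 128),
             (2, 128, 256),
             (2, 256, 512),
             (2, 512, 512))


def conv_layer_stats(conv_arch, size=224, kernel=3):
    """Return (in_c, out_c, size, weights, macs) for every conv layer."""
    stats = []
    for num_convs, in_c, out_c in conv_arch:
        for _ in range(num_convs):
            weights = kernel * kernel * in_c * out_c  # biases ignored
            macs = weights * size * size  # one dot product per output element
            stats.append((in_c, out_c, size, weights, macs))
            in_c = out_c  # later convs in a block map out_c -> out_c
        size //= 2  # the max pooling layer halves height and width
    return stats


for in_c, out_c, size, weights, macs in conv_layer_stats(conv_arch):
    print(f"{in_c:>3} -> {out_c:<3} at {size:>3}x{size:<3} "
          f"weights={weights:>9,} MACs={macs:>13,}")
```

Under these assumptions, the first convolution of blocks 2, 3 and 4 has exactly the same MAC count, because doubling both channel counts multiplies the cost by four while halving height and width divides it by four. The weight count, by contrast, grows fourfold between those layers.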

4 Loading data and training the model

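Because VGG-11 is relatively expensive to train, we construct a network with fewer channels and train it on the Fashion-MNIST dataset (the train function and load_data_fashion_mnist are the same as in the AlexNet section; train is not defined in this listing):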

def test2():
    temp_ratio = 8
    temp_conv_arch = ((1, 1, 64 // temp_ratio),
                      (1, 64 // temp_ratio, 128 // temp_ratio),
                      (2, 128 // temp_ratio, 256 // temp_ratio),
                      (2, 256 // temp_ratio, 512 // temp_ratio),
                      (2, 512 // temp_ratio, 512 // temp_ratio))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio)
    temp_batch_size = 64
    temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224)
    temp_lr = 0.001
    temp_num_epochs = 5
    temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr)
    train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)


if __name__ == '__main__':
    test2()

The output is as follows:

Training on cpu
Epoch 1, loss 0.5778, training acc 0.786, test ass 0.881, time 1180.2 s
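To see how much the channel reduction shrinks the model, the learnable parameters can be counted by hand. This is a sketch in plain Python that mirrors the layer sizes built by vgg (including biases and the 10-class output); the function name `vgg_param_count` is illustrative, and ratio=1 corresponds to the full VGG-11 from test1.

```python
def vgg_param_count(ratio=1, num_classes=10, final_size=7):
    """Count weights and biases of the VGG-11 variant with channels divided by ratio."""
    conv_arch = ((1, 1, 64 // ratio),
                 (1, 64 // ratio, 128 // ratio),
                 (2, 128 // ratio, 256 // ratio),
                 (2, 256 // ratio, 512 // ratio),
                 (2, 512 // ratio, 512 // ratio))
    fc_in = 512 * final_size * final_size // ratio
    hidden = 4096 // ratio

    total = 0
    for num_convs, in_c, out_c in conv_arch:
        for _ in range(num_convs):
            total += 3 * 3 * in_c * out_c + out_c  # 3x3 kernel weights + biases
            in_c = out_c
    for n_in, n_out in ((fc_in, hidden), (hidden, hidden), (hidden, num_classes)):
        total += n_in * n_out + n_out  # linear weights + biases
    return total


full = vgg_param_count(ratio=1)
reduced = vgg_param_count(ratio=8)
print(full, reduced, round(full / reduced, 1))
```

With these layer sizes the count drops from roughly 129 million to roughly 2 million parameters. Most of the full model's parameters sit in the first fully connected layer.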

Complete code

"""
@author: Inki
@contact: inki.yinji@gmail.com
@version: Created in 2020 1220, last modified in 2020 1220.
"""

import time
import torch
from torch import nn, optim
from util.SimpleTool import load_data_fashion_mnist


def vgg_block(num_convs, in_channels, out_channels):
    """ The VGG block. """
    temp_block = [nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.ReLU()]
    for i in range(1, num_convs):
        temp_block.append(nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1))
        temp_block.append(nn.ReLU())
    temp_block.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*temp_block)


def vgg(conv_arch, fc_num_features, fc_num_hiddens=4096):
    ret_net = nn.Sequential()
    for i, (num_convs, in_channels, out_channels) in enumerate(conv_arch):
        ret_net.add_module("vgg_block_" + str(i + 1), vgg_block(num_convs, in_channels, out_channels))
    ret_net.add_module("fc", nn.Sequential(FlattenLayer(),
                                       nn.Linear(fc_num_features, fc_num_hiddens),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_num_hiddens, fc_num_hiddens),
                                       nn.ReLU(),
                                       nn.Dropout(0.5),
                                       nn.Linear(fc_num_hiddens, 10)
                                       ))
    return ret_net


class FlattenLayer(torch.nn.Module):
    def __init__(self):
        super(FlattenLayer, self).__init__()

    def forward(self, x):
        return x.view(x.shape[0], -1)


def test1():
    temp_conv_arch = ((1, 1, 64),
                      (1, 64, 128),
                      (2, 128, 256),
                      (2, 256, 512),
                      (2, 512, 512))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features, temp_fc_num_hiddens)
    temp_x = torch.rand(1, 1, 224, 224)
    for name, block in temp_net.named_children():
        temp_x = block(temp_x)
        print(name, "output shape:", temp_x.shape)


def test2():
    temp_ratio = 8
    temp_conv_arch = ((1, 1, 64 // temp_ratio),
                      (1, 64 // temp_ratio, 128 // temp_ratio),
                      (2, 128 // temp_ratio, 256 // temp_ratio),
                      (2, 256 // temp_ratio, 512 // temp_ratio),
                      (2, 512 // temp_ratio, 512 // temp_ratio))
    temp_fc_num_features = 512 * 7 * 7
    temp_fc_num_hiddens = 4096
    temp_net = vgg(temp_conv_arch, temp_fc_num_features // temp_ratio, temp_fc_num_hiddens // temp_ratio)
    temp_batch_size = 64
    temp_tr_iter, temp_te_iter = load_data_fashion_mnist(temp_batch_size, resize=224)
    temp_lr = 0.001
    temp_num_epochs = 5
    temp_optimizer = optim.Adam(temp_net.parameters(), lr=temp_lr)
    train(temp_net, temp_tr_iter, temp_te_iter, temp_batch_size, temp_optimizer, num_epochs=temp_num_epochs)

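Notes:
[1] For a given receptive field, stacking small convolution kernels is better than using one large kernel, because the added depth lets the network learn a more complex model at a lower cost. In VGG, for example, $3$ stacked $3 \times 3$ kernels replace one $7 \times 7$ kernel, and $2$ stacked $3 \times 3$ kernels replace one $5 \times 5$ kernel. This increases the depth of the network and improves its performance while also reducing the number of parameters.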
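The trade-off in note [1] can be made concrete with two small formulas. For $n$ stacked stride-1 convolutions with a $k \times k$ kernel, the receptive field is $n(k-1)+1$; with $C$ input and $C$ output channels per layer, the stack has $n k^2 C^2$ weights. The sketch below is plain Python; biases are ignored and $C = 64$ is an arbitrary choice for illustration.

```python
def stacked_receptive_field(num_layers, kernel):
    """Receptive field of num_layers stacked stride-1 convs with the same kernel size."""
    return num_layers * (kernel - 1) + 1


def stacked_weights(num_layers, kernel, channels):
    """Weights (biases ignored) when every layer maps channels -> channels."""
    return num_layers * kernel * kernel * channels * channels


C = 64  # illustrative channel count
for n, big in ((2, 5), (3, 7)):
    # The stack of 3x3 kernels covers the same receptive field as one big kernel.
    assert stacked_receptive_field(n, 3) == big
    small = stacked_weights(n, 3, C)
    large = stacked_weights(1, big, C)
    print(f"{n} x (3x3): {small} weights; 1 x ({big}x{big}): {large} weights")
```

In general the comparison is $18C^2$ against $25C^2$ for the $5 \times 5$ case and $27C^2$ against $49C^2$ for the $7 \times 7$ case, so the stacked version is cheaper for any channel count.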

This article is also published on the blog "因吉" (CSDN).

Deep Learning (22): Convolutional Neural Networks, the VGG Model