3.4 Convolutional Neural Networks Advanced: VGGNet and ResNet in Practice

4.2.4 VGG-ResNet in Practice

  • VGGNet in Practice

    The core idea of VGGNet is to deepen the network and to replace 5*5 convolution kernels with stacked 3*3 kernels: two stacked 3*3 convolutions cover the same 5*5 receptive field while using fewer parameters (2*3*3 = 18 weights per input/output channel pair versus 5*5 = 25).

    We will not use 1*1 convolution kernels here.

    We can reuse the data processing and testing code from the previous convolutional network,

    and only modify the convolutional layers:

    # conv1: feature maps (the output images of the convolution)
    conv1_1 = tf.layers.conv2d(x_image,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv1_1'
                             )
    conv1_2 = tf.layers.conv2d(conv1_1,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv1_2'
                             )
    # 32*32 -> 16*16
    pooling1 = tf.layers.max_pooling2d(conv1_2,
                                       (2, 2), # kernel size
                                       (2, 2), # stride
                                       name = 'pool1' # naming the layer makes the printed graph easier to read
                                      )
    
    conv2_1 = tf.layers.conv2d(pooling1,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv2_1'
                             )
    
    conv2_2 = tf.layers.conv2d(conv2_1,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv2_2'
                             )
    # 16*16 -> 8*8
    pooling2 = tf.layers.max_pooling2d(conv2_2,
                                       (2, 2), # kernel size
                                       (2, 2), # stride
                                       name = 'pool2' # naming the layer makes the printed graph easier to read
                                      )
    
    conv3_1 = tf.layers.conv2d(pooling2,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv3_1'
                             )
    
    conv3_2 = tf.layers.conv2d(conv3_1,
                             32, # output channel number
                             (3,3), # kernel size
                             padding = 'same', # 'same' keeps the output size unchanged; 'valid' means no padding
                             activation = tf.nn.relu,
                             name = 'conv3_2'
                             )
    # 8*8 -> 4*4, 32 channels
    pooling3 = tf.layers.max_pooling2d(conv3_2,
                                       (2, 2), # kernel size
                                       (2, 2), # stride
                                       name = 'pool3' # naming the layer makes the printed graph easier to read
                                      )
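
    The fully connected classifier reused from the earlier chapter is not shown above. A minimal sketch of how it might attach to pooling3 (the names flatten and y_ are illustrative, not taken from the original code):

    # Hypothetical classifier head on top of pooling3, standing in for the reused code.
    # flatten: [None, 4, 4, 32] -> [None, 4*4*32]
    flatten = tf.layers.flatten(pooling3)
    # fully connected layer mapping to the 10 CIFAR-10 classes
    y_ = tf.layers.dense(flatten, 10)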

    Training for 10,000 steps reaches roughly 70% accuracy (72.7% on the test set at step 10,000):

    [Train] Step: 500, loss: 1.92473, acc: 0.45000
    [Train] Step: 1000, loss: 1.49288, acc: 0.35000
    [Train] Step: 1500, loss: 1.30839, acc: 0.55000
    [Train] Step: 2000, loss: 1.41633, acc: 0.40000
    [Train] Step: 2500, loss: 1.10951, acc: 0.60000
    [Train] Step: 3000, loss: 1.15743, acc: 0.65000
    [Train] Step: 3500, loss: 0.93834, acc: 0.70000
    [Train] Step: 4000, loss: 0.76699, acc: 0.80000
    [Train] Step: 4500, loss: 0.71109, acc: 0.70000
    [Train] Step: 5000, loss: 0.75763, acc: 0.75000
    (10000, 3072)
    (10000,)
    [Test ] Step: 5000, acc: 0.67500
    [Train] Step: 5500, loss: 0.98661, acc: 0.65000
    [Train] Step: 6000, loss: 1.43098, acc: 0.50000
    [Train] Step: 6500, loss: 0.86575, acc: 0.70000
    [Train] Step: 7000, loss: 0.80474, acc: 0.65000
    [Train] Step: 7500, loss: 0.60132, acc: 0.85000
    [Train] Step: 8000, loss: 0.66683, acc: 0.80000
    [Train] Step: 8500, loss: 0.56874, acc: 0.85000
    [Train] Step: 9000, loss: 0.68185, acc: 0.70000
    [Train] Step: 9500, loss: 0.83302, acc: 0.70000
    [Train] Step: 10000, loss: 0.87228, acc: 0.70000
    (10000, 3072)
    (10000,)
    [Test ] Step: 10000, acc: 0.72700
  • ResNet in Practice

    First, let's review the ResNet architecture.

    (Figure: ResNet network architecture)

    ResNet first passes the input through a convolutional layer and then a pooling layer, followed by a series of residual blocks.

    After each group of residual blocks there may be a downsampling step.

    Downsampling here simply means a max pooling layer or a convolutional layer with a stride of 2.

    The ResNet shown above downsamples four times, but because the images used in this experiment are only 32*32 and therefore quite small, we will not downsample as many times, and we will not start with a max pooling layer.

    One issue can arise during downsampling: a residual block consists of two branches, a convolutional branch and an identity branch. If the convolutional branch downsamples, the two branches no longer have the same shape and the element-wise addition fails. So whenever the convolutional branch downsamples, the identity branch must be downsampled as well, which is done here with pooling (average pooling in the code below). For example, a (None, 32, 32, 32) input becomes (None, 16, 16, 64) on the convolutional branch, so the identity branch is pooled to (None, 16, 16, 32) and then zero-padded along the channel axis to (None, 16, 16, 64).

    (Figure: residual block with downsampling)

    First, define the residual block:

    """ x是输入数据,output_channel 是输出通道数 为了不降采样带来的数据损失,咱们会在降采样的时候讲output_channel翻倍 因此这里若是output_channel是input_channel的二倍,则说明须要降采样 """
    
    def residual_block(x, output_channel):
        """residual connection implementation"""
        input_channel = x.get_shape().as_list()[-1]
        if input_channel * 2 == output_channel:
            increase_dim = True
            strides = (2, 2)
        elif input_channel == output_channel:
            increase_dim = False
            strides = (1, 1)
        else:
            raise Exception("input channel can't match output channel")

        conv1 = tf.layers.conv2d(x,
                                 output_channel,
                                 (3,3),
                                 strides = strides,
                                 padding = 'same',
                                 activation = tf.nn.relu,
                                 name = 'conv1')
        conv2 = tf.layers.conv2d(conv1,
                                 output_channel,
                                 (3,3),
                                 strides = (1,1),
                                 padding = 'same',
                                 activation = tf.nn.relu,
                                 name = 'conv2')
        # handle the other branch (the identity mapping)
        if increase_dim:
            # downsampling is needed
            # [None, image_width, image_height, channel] -> [None, image_width/2, image_height/2, channel*2]
            pooled_x = tf.layers.average_pooling2d(x,
                                                  (2,2), # pooling kernel
                                                  (2,2), # strides = kernel size, so pooling windows do not overlap
                                                  padding = 'valid' # the 32*32 image divides evenly, so padding does not matter here
                                                  )
            
            # average_pooling2d changes the spatial size, but the channel count still does not match;
            # pad the channel dimension with zeros to double it
            padded_x = tf.pad(pooled_x,
                             [[0,0],
                              [0,0],
                              [0,0],
                              [input_channel // 2, input_channel // 2]])
        else:
            padded_x = x
        output_x = conv2 + padded_x
        return output_x
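
    Note that residual_block creates its layers with fixed names ('conv1', 'conv2'), so every call has to live inside its own variable scope, which is what the network function below does. A small illustrative sketch (the tensor names are stand-ins, assuming a [None, 32, 32, 32] input):

    # Illustrative only: 'inputs' stands for any [None, 32, 32, 32] tensor.
    with tf.variable_scope('block0'):
        block0 = residual_block(inputs, 32)  # same channels, no downsampling: [None, 32, 32, 32]
    with tf.variable_scope('block1'):
        block1 = residual_block(block0, 64)  # doubled channels, downsampled:  [None, 16, 16, 64]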

    Then define the residual network.

    It starts with one convolutional layer, builds the residual blocks in a loop, applies a global pooling, and finally connects a fully connected layer to the output.

    Global pooling works like ordinary pooling, except that its kernel size equals the image's width and height, so each feature map is reduced to a single number.
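
    As a small illustration (the feature_map name here is just a stand-in), global average pooling is a mean over the two spatial axes:

    # feature_map: [None, 8, 8, 128] -> global_pool: [None, 128]
    global_pool = tf.reduce_mean(feature_map, [1, 2])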

    def res_net(x,
                num_residual_blocks,
                num_filter_base,
                class_num):
        """residual network implementation
        Args:
        - x: input tensor
        - num_residual_blocks: number of residual blocks per stage, eg: [3,4,6,3]
        - num_filter_base: number of channels in the first stage
        - class_num: number of classes
        """
        # how many downsampling stages are needed
        num_subsampling = len(num_residual_blocks)
        layers = []
        # x: [None, image_width, image_height, channel] -> input_size: [image_width, image_height, channel]
        # (image_width, image_height is also the kernel size of the final global pooling)
        input_size = x.get_shape().as_list()[1:]
        with tf.variable_scope('conv0'):
            conv0 = tf.layers.conv2d(x,
                                     num_filter_base,
                                     (3,3),
                                     strides = (1,1),
                                     activation = tf.nn.relu,
                                     padding = 'same',
                                     name = 'conv0')
            layers.append(conv0)
            
        # eg: num_subsampling = 4, sample_id in [0,1,2,3]
        for sample_id in range(num_subsampling):
            for i in range(num_residual_blocks[sample_id]):
                with tf.variable_scope("conv%d_%d" % (sample_id, i)):
                    conv = residual_block(
                        layers[-1],
                        num_filter_base * (2 ** sample_id)) # channels double at each stage
                    layers.append(conv)
        multiplier = 2 ** (num_subsampling - 1)
        assert layers[-1].get_shape().as_list()[1:] \
            == [input_size[0] / multiplier,
                input_size[1] / multiplier,
                num_filter_base * multiplier]
        with tf.variable_scope('fc'):
            # layers[-1].shape : [None, width, height, channel]
            global_pool = tf.reduce_mean(layers[-1], [1, 2]) # global average pooling
            logits = tf.layers.dense(global_pool, class_num) # fully connected output layer
            layers.append(logits)
        return layers[-1]

    Then use the residual network:

    x = tf.placeholder(tf.float32, [None, 3072])
    y = tf.placeholder(tf.int64, [None])
    
    # reshape the flat vector into a 3-channel image
    x_image = tf.reshape(x, [-1,3,32,32])
    # 32*32, move the channel axis to the end
    x_image = tf.transpose(x_image, perm = [0, 2, 3, 1])
    
    y_ = res_net(x_image, [2,3,2], 32, 10)
    
    
    # cross-entropy loss
    loss = tf.losses.sparse_softmax_cross_entropy(labels=y, logits=y_)
    # y_ -> softmax
    # y  -> one_hot
    # loss = ylogy_
    
    # predicted class index
    predict = tf.argmax(y_, 1)
    # correct_prediction is a vector of bools, eg: [1,0,1,1,1,0,0,0]
    correct_prediction = tf.equal(predict, y)
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float64))
    
    with tf.name_scope('train_op'):
        train_op = tf.train.AdamOptimizer(1e-3).minimize(loss)
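
    The session setup and training loop are reused from the earlier chapters. A minimal sketch, assuming a train_data helper with a next_batch method from that code (those names and the batch size are assumptions, not shown here):

    # Minimal training-loop sketch; train_data and next_batch come from the
    # data-handling code of the earlier chapters and are assumed here.
    init = tf.global_variables_initializer()
    batch_size = 20      # assumed batch size
    train_steps = 10000
    
    with tf.Session() as sess:
        sess.run(init)
        for i in range(train_steps):
            batch_data, batch_labels = train_data.next_batch(batch_size)
            loss_val, acc_val, _ = sess.run([loss, accuracy, train_op],
                                            feed_dict={x: batch_data, y: batch_labels})
            if (i + 1) % 500 == 0:
                print('[Train] Step: %d, loss: %4.5f, acc: %4.5f' % (i + 1, loss_val, acc_val))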

    Training for about 7,000 steps here gives roughly 67% accuracy. It is lower than the VGG result because many optimizations are not applied; a well-tuned residual network can reach about 94% accuracy on CIFAR-10.
