[Semantic Segmentation] DeepLab V3 (repost)

Date: 2019-12-13

Tags: semantic segmentation, deeplab v3

Original paper: DeepLabv3

Code: TensorFlow

Abstract

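DeepLabv3 takes a further look at atrous convolution, a powerful tool for semantic segmentation that adjusts a filter's field of view and controls the resolution of the feature responses computed by a convolutional neural network. To segment objects at multiple scales, we design modules that apply atrous convolution in cascade or in parallel with different sampling rates. We also highlight the ASPP (Atrous Spatial Pyramid Pooling) module, which captures convolutional features at multiple scales and further improves performance. Finally, we share implementation details and training experience. The proposed DeepLabv3 improves significantly over the previous versions and reaches state-of-the-art-level performance on PASCAL VOC 2012.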

Introduction

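Applying deep convolutional neural networks (DCNNs) to semantic segmentation raises two challenges:

Challenge 1: consecutive pooling and downsampling give high-level features a built-in invariance to local image transformations, which lets the DCNN learn increasingly abstract feature representations. The accompanying drop in feature resolution, however, hinders dense prediction tasks, which need detailed spatial information. The DeepLab series addresses this with atrous convolution (the first two versions also refine the segmentation result with a CRF), which raises the resolution of the computed feature responses without adding parameters or computation, and so captures more context.
Challenge 2: objects exist at multiple scales. Many methods handle multi-scale objects; we mainly consider four, shown in the figure below:
- a. Image Pyramid: the input image is rescaled to several sizes, a DCNN is applied to each, and the predictions are merged into the final output
- b. Encoder-Decoder: multi-scale features from the encoder stage are used in the decoder stage to recover spatial resolution (representative works include FCN, SegNet and PSPNet)
- c. Deeper w. Atrous Convolution: extra modules, e.g. DenseCRF, are added on top of the original model to capture long-range information between pixels
- d. Spatial Pyramid Pooling: spatial pyramid pooling uses kernels with different sampling rates and fields of view, so it captures objects at multiple scales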

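The main contributions of DeepLabv3 are:

It revisits atrous convolution, which, within both the cascaded-module and the spatial-pyramid-pooling frameworks, gives a larger receptive field and thereby multi-scale information.
It improves the ASPP module, which now consists of atrous convolutions with different sampling rates plus BN layers; we try laying the modules out in cascade or in parallel.
It discusses an important issue: a $3 \times 3$ atrous convolution with a very large sampling rate cannot capture long-range information because of image boundary effects and degenerates to a $1 \times 1$ convolution. We propose fusing image-level features into the ASPP module.
It describes the training details and shares training experience. The proposed DeepLabv3 improves on the previous work and achieves very good results.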

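Related Work

Many works have shown that global features or contextual interactions help semantic segmentation. We discuss four types of fully convolutional networks that exploit context information for semantic segmentation.

Image pyramid: the same model, usually with shared weights, is applied to multi-scale inputs. Responses from small-scale inputs carry the semantics, while responses from large-scale inputs carry the details. One approach transforms the input through a Laplacian pyramid, feeds each scale to a DCNN, and merges the outputs. The drawback is that, because of limited GPU memory, this does not scale well to larger/deeper models, so it is usually applied at inference time.
Encoder-decoder: the encoder's high-level features readily capture longer-range information, and the decoder uses information from the encoder stage to help recover object details and spatial dimensions. For example, SegNet reuses the pooling indices from downsampling to guide upsampling; U-Net adds skip connections from the encoder features to the decoder; RefineNet and others have demonstrated the effectiveness of the encoder-decoder structure.
Context module: extra modules are laid out in cascade to encode long-range context. One effective approach is to incorporate DenseCRF into the DCNN and train the DCNN and CRF jointly.
Spatial pyramid pooling: spatial pyramid pooling captures context at several levels. ParseNet obtains context information from image-level features; DeepLabv2 proposed ASPP, where parallel atrous convolutions with different sampling rates capture multi-scale information. More recently, PSPNet performs spatial pooling at several grid scales and achieves excellent results on several datasets. There are also LSTM-based methods for aggregating global information.

Our work mainly explores atrous convolution as a context module and as a tool for spatial pyramid pooling, which is applicable to any network. Specifically, we duplicate the last block of ResNet several times and arrange the copies in cascade, and we also revisit the ASPP module. We find experimentally that BN layers help training, and to further capture global context we propose augmenting ASPP with image-level features.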

Method

Atrous convolution for dense feature extraction

This was already covered for DeepLabv1 and DeepLabv2, so we won't go into detail here~
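Since the details are skipped here, a tiny reminder of the idea may still help. Below is a minimal pure-Python sketch of 1-D atrous convolution, $y[i] = \sum_k x[i + r \cdot k] \, w[k]$, where $r$ is the sampling rate. The function names `atrous_conv1d` and `effective_kernel_size` are made up for this illustration and do not come from any DeepLab code.

```python
def atrous_conv1d(signal, kernel, rate=1):
    """'Valid' 1-D atrous (dilated) cross-correlation with the given rate."""
    span = (len(kernel) - 1) * rate  # distance between the first and last tap
    out = []
    for i in range(len(signal) - span):
        out.append(sum(w * signal[i + k * rate] for k, w in enumerate(kernel)))
    return out


def effective_kernel_size(kernel_size, rate):
    """Field of view covered by a kernel whose taps are `rate` samples apart."""
    return kernel_size + (kernel_size - 1) * (rate - 1)


signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]
dense = atrous_conv1d(signal, kernel, rate=1)    # ordinary convolution
dilated = atrous_conv1d(signal, kernel, rate=2)  # same weights, wider view
```

With rate=1 this is an ordinary convolution; with rate=2 the same three weights cover five input samples, so the field of view grows without adding parameters.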

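Going deeper with atrous convolution

We first explore atrous convolution in cascaded modules. Specifically, we take the last block of ResNet (block4 in the figure below) and append cascaded modules after it.

As figure (a) above shows, the information of the whole image gets summarized into very small feature maps at the end, but experiments show this is harmful to semantic segmentation. See the figure below:

The larger the output stride of the feature maps, the worse the result. The best result comes with out_stride = 8, which needs more memory. Consecutive downsampling lowers the resolution of the feature maps and decimates detail information, which hurts semantic segmentation.
As figure (b) above shows, atrous convolutions with different sampling rates can keep the output stride at out_stride = 16. This keeps the output stride small without adding parameters or computation.

Atrous Spatial Pyramid Pooling

The ASPP module proposed in DeepLabv2 applies four parallel atrous convolutions with different sampling rates on top of the feature map. This shows that sampling at different scales is effective, and in DeepLabv3 we add BN layers to ASPP. Atrous convolutions with different sampling rates capture multi-scale information effectively. However, we found that as the sampling rate grows, the number of valid filter weights (weights applied to the valid feature region rather than to the zero padding) shrinks, as the figure below shows:

When $3 \times 3$ filters with different sampling rates are applied to a $65 \times 65$ feature map and the sampling rate approaches the feature map size, the $3 \times 3$ filter no longer captures whole-image context. It degenerates to a simple $1 \times 1$ filter, since only the center weight is effective.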
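The degeneration is easy to check by brute force. The sketch below counts, for every position of a size x size feature map, how many of the nine taps of a $3 \times 3$ atrous filter land inside the map instead of on the zero padding. The helper name `valid_tap_histogram` is hypothetical, and the counting scheme is my own simplification of the figure, not code from the paper.

```python
def valid_tap_histogram(size, rate):
    """Fraction of positions on a size x size map, keyed by the number of
    3x3 filter taps (with the given atrous rate) that fall inside the map."""
    def taps_1d(i):
        # Taps sit at i - rate, i and i + rate along one axis.
        return sum(1 for offset in (-rate, 0, rate) if 0 <= i + offset < size)

    counts = {}
    for y in range(size):
        for x in range(size):
            n = taps_1d(y) * taps_1d(x)
            counts[n] = counts.get(n, 0) + 1
    total = size * size
    return {n: c / total for n, c in sorted(counts.items())}


small_rate = valid_tap_histogram(65, 1)
huge_rate = valid_tap_histogram(65, 65)
```

For a 65 x 65 map, a rate of 1 leaves all nine taps valid at almost every position, while a rate of 65 leaves only the center tap valid everywhere, which is exactly the $1 \times 1$ behaviour described above.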

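To overcome this, we use image-level features. Specifically, we apply global average pooling to the last feature map of the model, pass the result through a $1 \times 1$ convolution, and bilinearly upsample it to the required spatial dimensions. In the end, our improved ASPP consists of:

One $1 \times 1$ convolution and three $3 \times 3$ atrous convolutions with $rates = \{6, 12, 18\}$, each with 256 filters and a BN layer. This applies to output_stride=16. See part (a), Atrous Spatial Pyramid Pooling, of the figure below.
Image-level features: the features are global-average-pooled, passed through a convolution, and then fused. See part (b), Image Pooling, of the figure below.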
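To make the image-level branch concrete, here is a rough pure-Python sketch of its pooling and upsampling steps on a nested-list feature map. It deliberately leaves out the $1 \times 1$ convolution and BN, and it relies on the fact that bilinearly upsampling a 1 x 1 map just repeats the same values at every position. The name `image_level_feature` is an assumption made for this example.

```python
def image_level_feature(feature_map):
    """feature_map is a nested list indexed as [height][width][channels].

    Returns a map of the same shape in which every position holds the
    per-channel global average (global average pooling, then upsampling
    the resulting 1 x 1 map back to height x width).
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    means = [
        sum(feature_map[y][x][c] for y in range(height) for x in range(width))
        / (height * width)
        for c in range(channels)
    ]
    return [[list(means) for _ in range(width)] for _ in range(height)]


features = [[[1.0, 0.0], [3.0, 2.0]],
            [[5.0, 4.0], [7.0, 6.0]]]
pooled = image_level_feature(features)
```

Every position of `pooled` sees the whole image, which is what the large-rate atrous branches fail to provide.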

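The improved ASPP module is shown in the figure below:

Note that the sampling rates are doubled when output_stride=8. The features from all branches are concatenated and passed through a $1 \times 1$ convolution to produce the final scores.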
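The branch layout described above can be written down as a small table. The sketch below is only a description of the branches, not a network: `aspp_branches` is a hypothetical helper, and the concatenated channel count assumes every branch (including image pooling) outputs 256 channels, as stated in the text.

```python
def aspp_branches(output_stride=16, filters=256):
    """Describe the improved ASPP branches as (name, kernel_size, rate) tuples."""
    if output_stride not in (8, 16):
        raise ValueError("this sketch only covers output_stride 8 or 16")
    scale = 16 // output_stride  # 1 at stride 16, 2 at stride 8 (rates doubled)
    branches = [("conv1x1", 1, 1)]
    branches += [("atrous_rate%d" % (r * scale), 3, r * scale) for r in (6, 12, 18)]
    branches.append(("image_pooling", 1, 1))
    concat_channels = filters * len(branches)  # channels entering the fusing 1x1 conv
    return branches, concat_channels


branches_16, channels_16 = aspp_branches(output_stride=16)
branches_8, channels_8 = aspp_branches(output_stride=8)
```

Under these assumptions the fusing $1 \times 1$ convolution sees 5 x 256 = 1280 input channels at either output stride; only the atrous rates change.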

Experiment

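We use a pretrained ResNet as the backbone and apply atrous convolution to control the output stride, where output_stride is defined as the ratio of the input image resolution to the final output resolution. With an output stride of 8, the last two blocks of the original ResNet use atrous convolutions with sampling rates $r = 2$ and $r = 4$.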
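Where the rates 2 and 4 come from can be seen with a little bookkeeping: once the accumulated stride reaches the target output stride, every later stride is dropped and turned into a multiplication of the atrous rate. The sketch below assumes a simplified stage list `[2, 2, 2, 2, 2, 1]` (conv1, pool1, block1 to block4 of a stride-32 ResNet, with each block's stride applied at its end); both that list and the name `plan_strides` are illustrative assumptions.

```python
def plan_strides(stage_strides, output_stride):
    """Replace strides by atrous rates once the target output stride is reached.

    Returns one (stride, rate) pair per stage.
    """
    current_stride = 1
    rate = 1
    plan = []
    for stride in stage_strides:
        if current_stride == output_stride:
            # Target reached: keep the resolution and dilate from here on.
            plan.append((1, rate))
            rate *= stride
        else:
            plan.append((stride, 1))
            current_stride *= stride
    return plan


resnet_stages = [2, 2, 2, 2, 2, 1]  # conv1, pool1, block1, block2, block3, block4
plan_16 = plan_strides(resnet_stages, output_stride=16)
plan_8 = plan_strides(resnet_stages, output_stride=8)
```

With output_stride=8 the last two entries of the plan are (1, 2) and (1, 4), matching the rates quoted above; with output_stride=16 only block4 is dilated, with rate 2.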

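Training settings:

Item	Setting
Dataset	PASCAL VOC 2012
Framework	TensorFlow
Crop size	Crops of size 513 are sampled
Learning rate policy	"poly" policy: the initial learning rate is multiplied by $(1 - \frac{iter}{max\_iter})^{power}$, with $power = 0.9$
BN layers	With output_stride=16 we use a batch size of 16 and a BN decay of 0.9997. After training on the augmented dataset for 30K iterations with an initial learning rate of 0.007, the BN parameters are frozen, output_stride is switched to 8, and training continues for another 30K iterations with an initial learning rate of 0.001. Training at output_stride=16 is much faster than at output_stride=8 because the intermediate feature maps are four times smaller spatially, but the coarser feature maps cost some accuracy.
Upsampling	In earlier work the ground truth was downsampled by 8 and compared with the final output. We now find that keeping the ground truth intact matters, so the final output is upsampled by 8 and compared with the full-resolution ground truth.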
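The poly policy from the table is a one-liner. A minimal sketch, using the 0.007 initial learning rate and 30K iterations from the table as example values (the function name `poly_learning_rate` is my own):

```python
def poly_learning_rate(base_lr, step, max_steps, power=0.9):
    """Poly schedule: base_lr * (1 - step / max_steps) ** power."""
    return base_lr * (1.0 - step / max_steps) ** power


start_lr = poly_learning_rate(0.007, 0, 30000)
halfway_lr = poly_learning_rate(0.007, 15000, 30000)
final_lr = poly_learning_rate(0.007, 30000, 30000)
```

The rate starts at the initial value, decays slightly faster than linearly near the end because power is below 1, and reaches zero at the last step.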

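Going Deeper with Atrous Convolution experiments

We first try cascading several blocks that use atrous convolution.

ResNet50: as the figure below shows, we study the effect of the output stride. At an output stride of 256, performance drops sharply because of severe signal decimation.

With atrous convolutions at different sampling rates the results improve substantially, which shows how necessary atrous convolution is for semantic segmentation.
ResNet50 vs. ResNet101: we use a deeper model and vary the number of cascaded blocks. As the figure below shows, performance grows as blocks are added.
Multi-grid: we use a variant residual unit with the multi-grid strategy, i.e. the three convolutions on the main branch all use atrous convolution, with sampling rates set by the multi-grid strategy. As the figure below shows:
- Applying a multi-grid strategy is generally better than the plain $(r_{1}, r_{2}, r_{3}) = (1, 1, 1)$
- Simply doubling the unit rates, $(r_{1}, r_{2}, r_{3}) = (2, 2, 2)$, does not help
- Performance improves as the network goes deeper; the best result is with block7 and $(r_{1}, r_{2}, r_{3}) = (1, 2, 1)$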
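The multi-grid arithmetic itself is simple: the final atrous rate of each convolution is the block's rate multiplied by the corresponding unit rate. A small sketch, assuming the block rates 2, 4, 8 and 16 for block4 to block7 at output_stride=16 (the same values that appear in the implementation analysed later in this post); `multi_grid_rates` and `BLOCK_RATES` are names chosen for this example.

```python
# Block-level atrous rates at output_stride=16 (assumed for this sketch).
BLOCK_RATES = {"block4": 2, "block5": 4, "block6": 8, "block7": 16}


def multi_grid_rates(block_rate, multi_grid):
    """Final rate of each of the three convolutions: block rate x unit rate."""
    return tuple(block_rate * unit_rate for unit_rate in multi_grid)


block4_rates = multi_grid_rates(BLOCK_RATES["block4"], (1, 2, 4))
block7_rates = multi_grid_rates(BLOCK_RATES["block7"], (1, 2, 1))
```

So with Multi-grid = (1, 2, 4) the three convolutions of block4 use rates (2, 4, 8), and with (1, 2, 1) the convolutions of block7 use rates (16, 32, 16).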
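Inference strategy on val set:
Using output_stride = 8 during inference gives richer detail:

Atrous Spatial Pyramid Pooling experiments

Compared with before, the ASPP module adds BN layers. The multi-grid strategy and image-level features improve the results as follows:

Inference strategy on val set:
Using output_stride = 8 during inference gives richer detail, and multi-scale inputs plus flipping improve performance further:

Results on PASCAL VOC 2012:

Results on Cityscapes

Results with various combinations of techniques:

Comparison with other models:

Effect of other hyperparameters

Effect of the upsampling strategy, crop size and BN layers:
Effect of different batch sizes:
Effect of different evaluation output strides:

Conclusion

DeepLabv3 focuses on the use of atrous convolution and improves the ASPP module so that multi-scale context is captured better.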

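Code analysis

Since I couldn't find official code, I picked a DeepLabV3-TensorFlow implementation from GitHub.

Training script

Start with the training file train_voc12.py.

Find the key main function:

Build the training model & compute the loss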

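def main():
    """Create the model and prepare for training."""
    h = args.input_size
    w = args.input_size
    input_size = (h, w)

    # Set the random seed
    tf.set_random_seed(args.random_seed)

    # Create the coordinator for the queue-runner threads that feed the data
    coord = tf.train.Coordinator()

    # Read the data
    image_batch, label_batch = read_data(is_training=True)

    # Build the training model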
    net, end_points = deeplabv3(image_batch,
                                num_classes=args.num_classes,
                                depth=args.num_layers,
                                is_training=True,
                                )
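    # For small batch sizes it is better to keep the BN statistics fixed
    # (i.e. freeze the BN parameters of the pretrained model).
    # If is_training=True, the statistics are updated during training.
    # Note: even with is_training=False, the BN parameters gamma (scale)
    # and beta (offset) are still updated.

    # Fetch the model predictions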
    raw_output = end_points['resnet{}/logits'.format(args.num_layers)]

    # Which variables to load. Running means and variances are not trainable,
    # thus all_variables() should be restored.
    restore_var = [v for v in tf.global_variables() if 'fc' not in v.name 
        or not args.not_restore_last]
    if args.freeze_bn:
        all_trainable = [v for v in tf.trainable_variables() if 'beta' not in 
            v.name and 'gamma' not in v.name]
    else:
        all_trainable = [v for v in tf.trainable_variables()]
    conv_trainable = [v for v in all_trainable if 'fc' not in v.name] 

    # Upsample the logits instead of downsampling the ground truth
    raw_output_up = tf.image.resize_bilinear(raw_output, [h, w])  # bilinear resize to the input size

    # Predictions: ignore labels greater than or equal to n_classes
    label_proc = tf.squeeze(label_batch)  # drop the size-1 dimensions of the label tensor
    mask = label_proc <= args.num_classes - 1  # True where the label is a valid class id
    seg_logits = tf.boolean_mask(raw_output_up, mask)  # logits at the valid pixels
    seg_gt = tf.boolean_mask(label_proc, mask)  # labels at the valid pixels
    seg_gt = tf.cast(seg_gt, tf.int32)  # cast to the integer type the loss expects

    # Pixel-wise softmax loss.
    loss = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=seg_logits,
        labels=seg_gt)
    seg_loss = tf.reduce_mean(loss)
    seg_loss_sum = tf.summary.scalar('loss/seg', seg_loss)  # log to TensorBoard

    # Add the regularization loss
    reg_losses = tf.get_collection(tf.GraphKeys.REGULARIZATION_LOSSES)
    reg_loss = tf.add_n(reg_losses)
    reg_loss_sum = tf.summary.scalar('loss/reg', reg_loss)

    tot_loss = seg_loss + reg_loss
    tot_loss_sum = tf.summary.scalar('loss/tot', tot_loss)

    seg_pred = tf.argmax(seg_logits, axis=1)

    # Compute the mean IoU
    train_mean_iou, train_update_mean_iou = streaming_mean_iou(seg_pred, 
        seg_gt, args.num_classes, name="train_iou")  

    train_iou_sum = tf.summary.scalar('accuracy/train_mean_iou', 
        train_mean_iou)
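The masking step in the code above is worth spelling out, since it decides which pixels contribute to the loss. Here is a framework-free sketch of the same idea on flat lists: pixels whose label is not a valid class id are skipped, and the softmax cross-entropy is averaged over the rest. The function name `masked_softmax_cross_entropy` is hypothetical, and the label 255 used in the example is just an assumed "ignore" value.

```python
import math


def masked_softmax_cross_entropy(logits, labels, num_classes):
    """Mean softmax cross-entropy over pixels whose label is a valid class id.

    logits: list of per-pixel lists with num_classes scores each.
    labels: list of per-pixel integer labels.
    """
    losses = []
    for pixel_logits, label in zip(logits, labels):
        if not 0 <= label < num_classes:
            continue  # skip ignored pixels (labels >= num_classes)
        peak = max(pixel_logits)  # subtract the max for numerical stability
        log_sum_exp = peak + math.log(sum(math.exp(v - peak) for v in pixel_logits))
        losses.append(log_sum_exp - pixel_logits[label])
    return sum(losses) / len(losses) if losses else 0.0


example_loss = masked_softmax_cross_entropy(
    logits=[[0.0, 0.0], [5.0, 1.0]],
    labels=[1, 255],
    num_classes=2,
)
```

In the example only the first pixel counts; its two logits are equal, so the loss is the entropy of a uniform two-class prediction.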

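The code for streaming_mean_iou is in metric_ops.py. It computes the mean Intersection-over-Union (mIOU): the IOU of each class is computed first and then averaged over the classes.

IOU is defined as:

$IOU = \frac{true\_positive}{true\_positive + false\_positive + false\_negative}$

The method returns an update_op that accumulates the metric over a stream of data: it updates the internal variables and returns mean_iou.

The code above instantiates the DeepLabv3 model, takes its output, and computes the loss and the mIOU.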
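For reference, the IOU definition can be turned into a few lines of plain Python via a confusion matrix. This is a sketch of the metric itself, not of streaming_mean_iou: the name `mean_iou` and the choice to skip classes that appear in neither the predictions nor the labels are assumptions of this example.

```python
def mean_iou(predictions, labels, num_classes):
    """Mean IoU over classes from flat lists of predicted and true class ids."""
    # confusion[t][p] counts pixels of true class t predicted as class p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(predictions, labels):
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        true_positive = confusion[c][c]
        false_negative = sum(confusion[c]) - true_positive
        false_positive = sum(row[c] for row in confusion) - true_positive
        denominator = true_positive + false_positive + false_negative
        if denominator > 0:  # skip classes absent from predictions and labels
            ious.append(true_positive / denominator)
    return sum(ious) / len(ious) if ious else 0.0


score = mean_iou(predictions=[0, 0, 1, 1], labels=[0, 1, 1, 1], num_classes=2)
```

In the example, class 0 has an IOU of 1/2 and class 1 has an IOU of 2/3, so the mean is 7/12.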

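Training parameters

The poly learning-rate policy is not used here; the GitHub repo says a learning rate of 0.00001 works a bit better~

    # Initialize the local variables of the training metric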
    train_initializer = tf.variables_initializer(var_list=tf.get_collection(
        tf.GraphKeys.LOCAL_VARIABLES, scope="train_iou"))

    # Define the loss and optimization parameters.
    # The poly learning-rate policy is not used here
    base_lr = tf.constant(args.learning_rate)
    step_ph = tf.placeholder(dtype=tf.float32, shape=())
    # learning_rate = tf.scalar_mul(base_lr, 
    # tf.pow((1 - step_ph / args.num_steps), args.power))
    learning_rate = base_lr
    lr_sum = tf.summary.scalar('params/learning_rate', learning_rate)

    train_sum_op = tf.summary.merge([seg_loss_sum, reg_loss_sum, 
        tot_loss_sum, train_iou_sum, lr_sum])

Build the validation model and set up its outputs

    # Validation model
    image_batch_val, label_batch_val = read_data(is_training=False)
    _, end_points_val = deeplabv3(image_batch_val,
                                  num_classes=args.num_classes,
                                  depth=args.num_layers,
                                  reuse=True,
                                  is_training=False,
                                  )
    raw_output_val = end_points_val['resnet{}/logits'.format(args.num_layers)]  # validation output
    nh, nw = tf.shape(image_batch_val)[1], tf.shape(image_batch_val)[2]

    seg_logits_val = tf.image.resize_bilinear(raw_output_val, [nh, nw])
    seg_pred_val = tf.argmax(seg_logits_val, axis=3)
    seg_pred_val = tf.expand_dims(seg_pred_val, 3)
    seg_pred_val = tf.reshape(seg_pred_val, [-1,])

    seg_gt_val = tf.cast(label_batch_val, tf.int32)
    seg_gt_val = tf.reshape(seg_gt_val, [-1,])
    mask_val = seg_gt_val <= args.num_classes - 1

    seg_pred_val = tf.boolean_mask(seg_pred_val, mask_val)
    seg_gt_val = tf.boolean_mask(seg_gt_val, mask_val)

    val_mean_iou, val_update_mean_iou = streaming_mean_iou(seg_pred_val, 
        seg_gt_val, num_classes=args.num_classes, name="val_iou")        
    val_iou_sum = tf.summary.scalar('accuracy/val_mean_iou', val_mean_iou)

Train the model

    val_initializer = tf.variables_initializer(var_list=tf.get_collection(
        tf.GraphKeys.LOCAL_VARIABLES, scope="val_iou"))
    test_sum_op = tf.summary.merge([val_iou_sum])
    global_step = tf.train.get_or_create_global_step()

    opt = tf.train.MomentumOptimizer(learning_rate, args.momentum)
    grads_conv = tf.gradients(tot_loss, conv_trainable)
    # train_op = opt.apply_gradients(zip(grads_conv, conv_trainable))
    train_op = slim.learning.create_train_op(
        tot_loss, opt,
        global_step=global_step,
        variables_to_train=conv_trainable,
        summarize_gradients=True)

    # Set up tf session and initialize variables. 
    config = tf.ConfigProto()
    config.gpu_options.allow_growth = True
    sess = tf.Session(config=config)

    sess.run(tf.global_variables_initializer())
    sess.run(tf.local_variables_initializer())

    # Saver for storing checkpoints of the model.
    saver = tf.train.Saver(var_list=tf.global_variables(), max_to_keep=20)

    # Restore from a checkpoint if one is available
    if args.ckpt > 0 or args.restore_from is not None:
        loader = tf.train.Saver(var_list=restore_var)
        load(loader, sess, args.snapshot_dir)

    # Start the queue-runner threads
    threads = tf.train.start_queue_runners(coord=coord, sess=sess)

    # tf.get_default_graph().finalize()
    summary_writer = tf.summary.FileWriter(args.snapshot_dir,
                                           sess.graph)

    # Training loop
    for step in range(args.ckpt, args.num_steps):
        start_time = time.time()
        feed_dict = { step_ph : step }
        tot_loss_float, seg_loss_float, reg_loss_float, _, lr_float, _,train_summary = sess.run([tot_loss, seg_loss, reg_loss, train_op,
            learning_rate, train_update_mean_iou, train_sum_op], 
            feed_dict=feed_dict)
        train_mean_iou_float = sess.run(train_mean_iou)
        duration = time.time() - start_time
        sys.stdout.write('step {:d}, tot_loss = {:.6f}, seg_loss = {:.6f}, ' \
            'reg_loss = {:.6f}, mean_iou = {:.6f}, lr: {:.6f}({:.3f}' \
            'sec/step)\n'.format(step, tot_loss_float, seg_loss_float,
             reg_loss_float, train_mean_iou_float, lr_float, duration)
            )
        sys.stdout.flush()

        if step % args.save_pred_every == 0 and step > args.ckpt:
            summary_writer.add_summary(train_summary, step)
            sess.run(val_initializer)
            for val_step in range(NUM_VAL-1):
                _, test_summary = sess.run([val_update_mean_iou, test_sum_op],
                feed_dict=feed_dict)

            summary_writer.add_summary(test_summary, step)
            val_mean_iou_float= sess.run(val_mean_iou)

            save(saver, sess, args.snapshot_dir, step)
            sys.stdout.write('step {:d}, train_mean_iou: {:.6f}, ' \
                'val_mean_iou: {:.6f}\n'.format(step, train_mean_iou_float, 
                val_mean_iou_float))
            sys.stdout.flush()
            sess.run(train_initializer)

        if coord.should_stop():
            coord.request_stop()
            coord.join(threads)

Model analysis

With the training script covered, let's look at the DeepLabv3 model definition script, libs.nets.deeplabv3.py.

The ResNet variant in deeplabv3

def deeplabv3(inputs, num_classes, depth=50, aspp=True, reuse=None, is_training=True):
  """DeepLabV3

  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: Depth of the ResNet, usually 101 or 50.
    aspp: Whether to use the ASPP module. If True, use 4 blocks with
      multi_grid=(1,2,4); if False, use 7 blocks with multi_grid=(1,2,1).
    reuse: Whether to reuse the model variables (the validation model reuses
      the variables of the training model).

  Returns:
    net: A rank-4 tensor of size [batch, height_out, width_out, channels_out].
    end_points: A dict of the collected intermediate outputs.
  """

  if aspp:
    multi_grid = (1,2,4)
  else:
    multi_grid = (1,2,1)
  scope ='resnet{}'.format(depth)
  with tf.variable_scope(scope, [inputs], reuse=reuse) as sc:
    end_points_collection = sc.name + '_end_points'
    with slim.arg_scope(resnet_arg_scope(weight_decay=args.weight_decay, 
      batch_norm_decay=args.bn_weight_decay)):
      with slim.arg_scope([slim.conv2d, bottleneck, bottleneck_hdc],
                          outputs_collections=end_points_collection):
        with slim.arg_scope([slim.batch_norm], is_training=is_training):
          net = inputs
          net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1')
          net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1')

          with tf.variable_scope('block1', [net]) as sc:
            base_depth = 64
            for i in range(2):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            with tf.variable_scope('unit_3', values=[net]):
              net = bottleneck(net, depth=base_depth * 4, 
                depth_bottleneck=base_depth, stride=2)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block2', [net]) as sc:
            base_depth = 128
            for i in range(3):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            with tf.variable_scope('unit_4', values=[net]):
              net = bottleneck(net, depth=base_depth * 4, 
                depth_bottleneck=base_depth, stride=2)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block3', [net]) as sc:
            base_depth = 256

            num_units = 6
            if depth == 101:
              num_units = 23
            elif depth == 152:
              num_units = 36

            for i in range(num_units):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

          with tf.variable_scope('block4', [net]) as sc:
            base_depth = 512

            for i in range(3):
              with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                net = bottleneck_hdc(net, depth=base_depth * 4, 
                  depth_bottleneck=base_depth, stride=1, rate=2, 
                  multi_grid=multi_grid)
            net = slim.utils.collect_named_outputs(end_points_collection, 
              sc.name, net)

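This part implements the ResNet variant. The residual unit with multi-grid is provided by the bottleneck_hdc function in libs.nets.deeplabv3.py.

The code of the bottleneck_hdc residual unit with the multi-grid strategy is as follows: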

@slim.add_arg_scope
def bottleneck_hdc(inputs, depth, depth_bottleneck, stride, rate=1, multi_grid=(1,2,4), outputs_collections=None, scope=None, use_bounded_activations=False):
  """Hybrid Dilated Convolution Bottleneck.

  Multi_Grid = (1,2,4)
  See Understanding Convolution for Semantic Segmentation.

  When putting together two consecutive ResNet blocks that use this unit, one
  should use stride = 2 in the last unit of the first block.

  Args:
    inputs: A tensor of size [batch, height, width, channels].
    depth: The depth of the ResNet unit output.
    depth_bottleneck: The depth of the bottleneck layers.
    stride: The ResNet unit's stride. Determines the amount of downsampling of
      the units output compared to its input.
    rate: An integer, rate for atrous convolution.
    multi_grid: multi_grid structure.
    outputs_collections: Collection to add the ResNet unit output.
    scope: Optional variable_scope.
    use_bounded_activations: Whether or not to use bounded activations. Bounded
      activations better lend themselves to quantized inference.

  Returns:
    The ResNet unit's output.
  """
  with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc:
    depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4)
    # Shortcut branch: subsample the input if the depth matches, else project it
    if depth == depth_in:
      shortcut = resnet_utils.subsample(inputs, stride, 'shortcut')
    else:
      shortcut = slim.conv2d(
          inputs,
          depth, [1, 1],
          stride=stride,
          activation_fn=tf.nn.relu6 if use_bounded_activations else None,
          scope='shortcut')

    # Main branch of the residual unit
    residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1, 
      rate=rate*multi_grid[0], scope='conv1')
    residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride,
      rate=rate*multi_grid[1], scope='conv2')
    residual = slim.conv2d(residual, depth, [1, 1], stride=1, 
      rate=rate*multi_grid[2], activation_fn=None, scope='conv3')

    # Activation applied after adding the shortcut
    if use_bounded_activations:
      # Use clip_by_value to simulate bandpass activation.
      residual = tf.clip_by_value(residual, -6.0, 6.0)
      output = tf.nn.relu6(shortcut + residual)
    else:
      output = tf.nn.relu(shortcut + residual)

    return slim.utils.collect_named_outputs(outputs_collections,
                                            sc.name,
                                            output)

Below is the ASPP module and the atrous convolution strategy used in the later blocks

          if aspp:
            with tf.variable_scope('aspp', [net]) as sc:
              aspp_list = []
              branch_1 = slim.conv2d(net, 256, [1,1], stride=1, 
                scope='1x1conv')
              branch_1 = slim.utils.collect_named_outputs(
                end_points_collection, sc.name, branch_1)
              aspp_list.append(branch_1)

              for i in range(3):
                branch_2 = slim.conv2d(net, 256, [3,3], stride=1, rate=6*(i+1), scope='rate{}'.format(6*(i+1)))
                branch_2 = slim.utils.collect_named_outputs(end_points_collection, sc.name, branch_2)
                aspp_list.append(branch_2)

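              # Note: this implementation sums the four branches with tf.add_n,
              # whereas the text above describes concatenating them.
              aspp = tf.add_n(aspp_list)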
              aspp = slim.utils.collect_named_outputs(end_points_collection, sc.name, aspp)

            # Add image-level features, i.e. global average pooling
            with tf.variable_scope('img_pool', [net]) as sc:
              """Image Pooling See ParseNet: Looking Wider to See Better """
              pooled = tf.reduce_mean(net, [1, 2], name='avg_pool', 
                keep_dims=True)
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

              pooled = slim.conv2d(pooled, 256, [1,1], stride=1, scope='1x1conv')
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

              pooled = tf.image.resize_bilinear(pooled, tf.shape(net)[1:3])
              pooled = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, pooled)

            # Fuse the image-level features into ASPP
            with tf.variable_scope('fusion', [aspp, pooled]) as sc:
              net = tf.concat([aspp, pooled], 3)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

              net = slim.conv2d(net, 256, [1,1], stride=1, scope='1x1conv')
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

          # Without ASPP, use the residual units with multi-grid instead
          else:
            with tf.variable_scope('block5', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=4,
                    multi_grid=multi_grid)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

            with tf.variable_scope('block6', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=8,
                    multi_grid=multi_grid)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

            with tf.variable_scope('block7', [net]) as sc:
              base_depth = 512

              for i in range(3):
                with tf.variable_scope('unit_%d' % (i + 1), values=[net]):
                  net = bottleneck_hdc(net, depth=base_depth * 4, 
                    depth_bottleneck=base_depth, stride=1, rate=16,
                    multi_grid=multi_grid)
              net = slim.utils.collect_named_outputs(end_points_collection, 
                sc.name, net)

          # Output
          with tf.variable_scope('logits',[net]) as sc:
            net = slim.conv2d(net, num_classes, [1,1], stride=1, 
              activation_fn=None, normalizer_fn=None)
            net = slim.utils.collect_named_outputs(end_points_collection, 
            sc.name, net)

          end_points = slim.utils.convert_collection_to_dict(
              end_points_collection)

          return net, end_points

if __name__ == "__main__":
  x = tf.placeholder(tf.float32, [None, 512, 512, 3])

  net, end_points = deeplabv3(x, 21)
  for i in end_points:
    print(i, end_points[i])

The code itself is quite easy to follow~

That wraps up DeepLabv3~
