Paper Notes: Fast Neural Architecture Search of Compact Semantic Segmentation Models via Auxiliary Cells

Date: 2019-12-06

2019-04-24 14:49:10

Paper: https://arxiv.org/pdf/1810.10804.pdf

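For many years, designing network architectures was regarded as a job for humans. Recent progress in NAS has overturned that view: automatically designing a suitable architecture for a given dataset has become an unstoppable trend. This paper searches for efficient encoder-decoder frameworks for semantic segmentation and validates the result on other, similar tasks.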

1. Method:

1.1 Problem Definition:

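We consider a dense prediction task T whose input is a 3-channel RGB image and whose output is a C-channel one-hot segmentation mask, where C equals the number of classes. We denote the mapping from input to output by f, a fully convolutional network. We assume f can be further decomposed into two parts: e, the encoder, and d, the decoder. The encoder e is initialized from a model pre-trained on classification. The decoder d has access to multiple outputs of the encoder and chooses which operations to apply to them.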

1.2 Search Space:

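The authors focus on the decoder, which can access several layers of the pre-trained encoder and therefore receives outputs at several resolutions. To keep the sampled architectures compact, each encoder output first passes through a 1x1 convolution so that all of them have the same number of channels. An RNN sequentially produces pairs of indices of which layers to use, together with the operations to apply to them. The sequence of operations forms a cell (see Figure 1). The same cell design, but with different weights, is applied to each of the two sampled layers, and the two cell outputs are summed. The resultant layer is then added to the sampling pool. The number of times layers are sampled is controlled by a hyperparameter, set to 3 in the paper, which allows the controller (the RNN) to recover encoder-decoder architectures such as FCN and RefineNet. All summation outputs that were never sampled are combined and fed into a 1x1 convolution to reduce the number of channels.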
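The wiring procedure described above can be sketched as follows. This is a minimal illustration, not the authors' controller: layer names are plain strings, index pairs are drawn uniformly at random instead of by an RNN, and `sample_decoder`, `num_encoder_outputs`, and `num_pairs` are hypothetical names. Using four encoder outputs and allowing a pair to pick the same layer twice are both assumptions of this sketch.

```python
import random


def sample_decoder(num_encoder_outputs=4, num_pairs=3, seed=0):
    """Sample a decoder wiring: which two layers each step combines."""
    rng = random.Random(seed)
    # Sampling pool: encoder outputs, assumed already mapped to a common
    # channel count by 1x1 convolutions.
    pool = [f"enc{i}" for i in range(num_encoder_outputs)]
    used = set()
    steps = []
    for k in range(num_pairs):
        # A uniform draw stands in for the RNN controller (assumption).
        i = rng.randrange(len(pool))
        j = rng.randrange(len(pool))
        # The same cell design with separate weights is applied to each
        # input, and the two cell outputs are summed into a new layer.
        new_layer = f"sum{k}"
        steps.append((pool[i], pool[j], new_layer))
        used.update((pool[i], pool[j]))
        # The resultant layer joins the sampling pool for later steps.
        pool.append(new_layer)
    # Summation outputs that were never sampled as inputs are combined and
    # would then go through a 1x1 convolution.
    unused = [name for (_, _, name) in steps if name not in used]
    return steps, unused


if __name__ == "__main__":
    steps, unused = sample_decoder(seed=1)
    for a, b, out in steps:
        print(f"{out} = cell({a}) + cell({b})")
    print("combined before the final 1x1 conv:", unused)
```

The last summation output can never be sampled by a later step, so it always appears among the combined outputs.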

The authors use 11 operations as the search space (listed in the paper).

1.3 Search Strategy：

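The authors split the training set into two disjoint sets, meta-train and meta-val. Meta-train is used to train the sampled architecture on the given task. Meta-val is used to evaluate the trained architecture and to provide the controller with a scalar, usually called the reward in DRL. Given the sampled sequence, its logarithmic probabilities, and the reward signal, the controller is optimized with PPO. The method therefore has two training processes: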

　　inner --- optimization of the sampled architecture on the given task,

　　outer --- optimization of the controller.
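The two nested processes can be sketched as a plain loop. This is only a skeleton under stated assumptions: `split_meta`, `search`, and the three callables passed to `search` are hypothetical names, the 10% meta-val fraction is an arbitrary choice, and the PPO update itself is left to the caller rather than implemented here.

```python
import random


def split_meta(train_ids, val_fraction=0.1, seed=0):
    """Split training ids into disjoint meta-train and meta-val lists."""
    rng = random.Random(seed)
    ids = list(train_ids)
    rng.shuffle(ids)
    n_val = max(1, int(len(ids) * val_fraction))
    return ids[n_val:], ids[:n_val]


def search(controller_sample, controller_update, train_and_eval,
           train_ids, num_iters=5):
    """Outer loop over architectures; the inner loop is train_and_eval."""
    meta_train, meta_val = split_meta(train_ids)
    history = []
    for _ in range(num_iters):
        # Outer process: the controller proposes an architecture.
        arch, log_prob = controller_sample()
        # Inner process: train on meta-train, score on meta-val.
        reward = train_and_eval(arch, meta_train, meta_val)
        # Outer process: update the controller (PPO in the paper).
        controller_update(arch, log_prob, reward)
        history.append((arch, reward))
    return history


if __name__ == "__main__":
    # Stub callables, for illustration only.
    history = search(
        controller_sample=lambda: ("toy-arch", -1.0),
        controller_update=lambda arch, log_prob, reward: None,
        train_and_eval=lambda arch, tr, va: len(va) / (len(tr) + len(va)),
        train_ids=range(100),
    )
    print(history[0])
```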

The authors then describe the inner process in detail.

1.4 Progressive Stages:

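The authors split the inner training process into two stages. In the first stage the encoder weights are frozen, so the encoder outputs can be precomputed, and only the decoder is trained. This lets the decoder weights be updated quickly and gives a reasonable estimate of the sampled architecture's performance. A simple heuristic then decides whether to continue training the sampled architecture in the second stage. Specifically, the current reward is compared with the running mean of the rewards seen so far. If it is above the mean, training continues. Otherwise, training is terminated with probability 1-p. The probability p is annealed over the course of the search, starting from 0.9.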
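The stage-two decision can be sketched as below. The notes only say that p starts at 0.9 and is annealed, so the linear decay toward `p_end`, the cumulative (rather than exponential) running mean, and letting the very first architecture through are all assumptions of this sketch; `StageGate` is a hypothetical name.

```python
import random


class StageGate:
    """Decides whether a sampled architecture proceeds to stage two."""

    def __init__(self, p_start=0.9, p_end=0.0, total_steps=100, seed=0):
        self.p_start, self.p_end = p_start, p_end
        self.total_steps = total_steps
        self.rng = random.Random(seed)
        self.mean = 0.0   # running mean of rewards seen so far
        self.count = 0    # number of rewards seen so far

    def p(self):
        # Linear annealing from p_start to p_end (schedule is an assumption).
        frac = min(self.count / max(1, self.total_steps), 1.0)
        return self.p_start + (self.p_end - self.p_start) * frac

    def should_continue(self, reward):
        above_mean = self.count == 0 or reward > self.mean
        # Below the mean: terminate with probability 1 - p,
        # which is the same as continuing with probability p.
        go = above_mean or self.rng.random() < self.p()
        # Update the running mean with the stage-one reward.
        self.count += 1
        self.mean += (reward - self.mean) / self.count
        return go


if __name__ == "__main__":
    gate = StageGate(total_steps=20)
    for r in [0.30, 0.35, 0.20, 0.40, 0.10]:
        print(r, gate.should_continue(r))
```

With p close to 0.9 early in the search, most below-average architectures still reach stage two, which matches the stated goal of encouraging exploration at the start.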

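The motivation is that the first stage, although somewhat noisy, still gives a reasonable estimate of the potential of the sampled architecture. At the very least, it provides a reliable signal that the sampled architecture is non-promising, while spending only a few seconds on the task. This simple approach encourages exploration early in the search.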

1.5 Fast training via Knowledge Distillation and Weights' Averaging:

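Semantic segmentation models need many iterations to converge. Initializing the encoder from a pre-trained classification model alleviates this to a large extent, but no such pre-trained model exists for the decoder. The authors explore several other strategies to speed up convergence: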

1). we keep track of the running average of the parameters during each stage and apply them before the final validation.

2). we append an additional l2-loss term between the logits of the current architecture and a pre-trained teacher network.

Combining these two methods yields a very reliable estimate of the segmentation model's performance.
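The two tricks can be illustrated on plain lists of floats. This is a sketch under assumptions: the cumulative form of the running average, the mean reduction of the l2 term, and the `weight` factor are choices made here for illustration, and all function names are hypothetical.

```python
def update_running_average(avg_params, params, step):
    """Cumulative average of parameters after `step` updates (step >= 1)."""
    return [a + (p - a) / step for a, p in zip(avg_params, params)]


def distillation_l2(student_logits, teacher_logits):
    """Mean squared difference between student and teacher logits."""
    diffs = [(s - t) ** 2 for s, t in zip(student_logits, teacher_logits)]
    return sum(diffs) / len(diffs)


def total_loss(seg_loss, student_logits, teacher_logits, weight=1.0):
    """Segmentation loss plus the additional l2 term against the teacher."""
    return seg_loss + weight * distillation_l2(student_logits, teacher_logits)


if __name__ == "__main__":
    avg = [0.0, 0.0]
    # Parameter values after two hypothetical optimizer steps.
    for step, params in enumerate([[1.0, 2.0], [3.0, 4.0]], start=1):
        avg = update_running_average(avg, params, step)
    print("averaged parameters:", avg)
    print("loss:", total_loss(0.5, [1.0, 2.0], [1.0, 4.0], weight=0.5))
```

The averaged parameters, not the latest ones, would be loaded into the model before the final validation pass that produces the reward.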

1.6 Intermediate Supervision via Auxiliary Cells:

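The authors add an auxiliary cell that is identical to the main cell. It does not affect the output of the main classifier at either training or test time, and only provides better gradients for the rest of the network. The reward of each sampled architecture is still determined by the output of the main classifier. For simplicity, only the segmentation loss is applied to the auxiliary output. The idea of intermediate supervision is not new, but previous work relied on auxiliary classifiers only, whereas this paper is the first to tie the design of the decoder to the design of the auxiliary cells.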
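How the auxiliary output enters training but not the reward can be sketched for a single pixel. This is an illustration under assumptions: `aux_weight` is a hypothetical weighting factor, and the 0/1 correctness in `reward_from` is a toy stand-in, not the reward used in the paper.

```python
import math


def cross_entropy(logits, target):
    """Cross-entropy for one pixel: logits over C classes, integer target."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(v - m) for v in logits))
    return log_z - logits[target]


def training_loss(main_logits, aux_logits_list, target, aux_weight=1.0):
    """Main segmentation loss plus a segmentation loss per auxiliary output."""
    loss = cross_entropy(main_logits, target)
    for aux_logits in aux_logits_list:
        loss += aux_weight * cross_entropy(aux_logits, target)
    return loss


def reward_from(main_logits, target):
    """Toy reward: depends on the main classifier only, never on aux outputs."""
    predicted = max(range(len(main_logits)), key=main_logits.__getitem__)
    return 1.0 if predicted == target else 0.0


if __name__ == "__main__":
    main, aux = [2.0, 0.5, 0.1], [1.0, 1.2, 0.3]
    print("loss:", training_loss(main, [aux], target=0))
    print("reward:", reward_from(main, target=0))
```

Changing the auxiliary logits changes the training loss, and therefore the gradients, but leaves the reward untouched.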

2. Experiments:

The decoder architecture found by the authors' search achieves good segmentation results.