Image Classification | The Evolution of the Inception Family: GoogLeNet, Inception, Xception

Date: 2019-12-05


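Introduction

Google's Inception series is a representative body of work in image classification. Unlike VGG, which simply stacks convolutional layers, Inception focuses on the topology of the network. This article follows the evolution of the Inception family and includes Xception for comparison.

PS1: Bharath Raj gives a concise and clear introduction to this line of work in this blog post: https://towardsdatascience.com/a-simple-guide-to-the-versions-of-the-inception-network-7fc52b863202. Highly recommended.

PS2: I read quite a few blog posts, and none of them clearly explained the difference between V2 and V3. The main reasons are that V2 draws on two papers and that V2 and V3 are presented in the same paper. In practice, the two differ very little.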

InceptionV1

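Going Deeper with Convolutions

Core idea

The salient part of an image can vary greatly in size, which makes it hard to choose the right kernel size for a convolution: more global information calls for a large kernel, more local information for a small one. Why not run filters of several sizes at the same level, making the network "wider" rather than "deeper"?

This leads to the Inception module (left), which applies three filter sizes (1x1, 3x3, 5x5) and max pooling in parallel. To reduce computation, GoogLeNet borrows from Network-in-Network and places 1x1 convolutions before the larger filters to shrink the channel dimension and hence the parameter count (right). This lets the depth and width of the network grow while keeping the computational cost in check.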
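To make the saving from the 1x1 reduction concrete, here is a rough cost count for one 5x5 branch with and without the reduction. The feature-map size and channel counts are hypothetical values chosen for illustration, and the helper `conv_mults` is not from the paper.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a k x k convolution, stride 1, output size h x w."""
    return h * w * c_in * c_out * k * k

# Hypothetical sizes, for illustration only.
H, W, C_IN, C_OUT, C_REDUCED = 28, 28, 192, 32, 16

# 5x5 convolution applied directly to all input channels.
naive = conv_mults(H, W, C_IN, C_OUT, 5)

# 1x1 convolution down to C_REDUCED channels, then the 5x5 convolution.
reduced = conv_mults(H, W, C_IN, C_REDUCED, 1) + conv_mults(H, W, C_REDUCED, C_OUT, 5)

print(naive, reduced, round(reduced / naive, 3))
```

With these assumed sizes the reduced branch needs roughly a tenth of the multiplications of the direct 5x5 convolution.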

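Network architecture

GoogLeNet has 9 Inception modules and is 22 layers deep (27 counting pooling layers). Global average pooling is applied after the last Inception module.
Because the network is so deep, it is prone to the vanishing gradient problem.
To keep the gradient from dying out in the middle of the network, the authors add two auxiliary classifiers (purple). The total loss is a weighted sum of the real loss and the auxiliary losses.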

# The total loss used by the inception net during training.
total_loss = real_loss + 0.3 * aux_loss_1 + 0.3 * aux_loss_2
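The weighted sum above can be sketched as a small runnable function. This is a minimal illustration, not the authors' training code: the function names are made up here, and each classifier is assumed to output a list of softmax probabilities.

```python
import math

def cross_entropy(probs, target):
    """Negative log-likelihood of the target class."""
    return -math.log(probs[target])

def inception_total_loss(main_probs, aux1_probs, aux2_probs, target, aux_weight=0.3):
    """Main loss plus the two auxiliary losses, each weighted by aux_weight."""
    real_loss = cross_entropy(main_probs, target)
    aux_loss_1 = cross_entropy(aux1_probs, target)
    aux_loss_2 = cross_entropy(aux2_probs, target)
    return real_loss + aux_weight * aux_loss_1 + aux_weight * aux_loss_2

# Toy two-class example in which every head is maximally unsure.
loss = inception_total_loss([0.5, 0.5], [0.5, 0.5], [0.5, 0.5], target=0)
print(loss)
```

The auxiliary heads only matter during training; at inference time they are dropped and only the main classifier is used.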

Experimental results

InceptionV2

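Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

Rethinking the Inception Architecture for Computer Vision

Core idea

Using Batch Normalization

Layer outputs are normalized to zero mean and unit variance, i.e. roughly N(0,1). On the one hand this allows a larger learning rate and speeds up convergence; on the other hand BN has a regularizing effect.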
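As a minimal sketch of the normalization step, the function below normalizes a single feature over a mini-batch using training-mode statistics. The learnable scale `gamma` and shift `beta` are part of BN; the running averages used at inference are left out, and the function name is illustrative only.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one feature across a mini-batch, then scale and shift."""
    n = len(batch)
    mean = sum(batch) / n
    var = sum((x - mean) ** 2 for x in batch) / n  # biased batch variance
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]

out = batch_norm([1.0, 2.0, 3.0, 4.0])
out_mean = sum(out) / len(out)
out_var = sum((x - out_mean) ** 2 for x in out) / len(out)
print(out_mean, out_var)
```

After normalization the batch has mean close to 0 and variance close to 1, regardless of the scale of the inputs.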

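Factorizing Convolutions

Networks perform better when a convolution does not drastically change the dimensions of its input. Reducing dimensions too aggressively loses information, which is known as a "representational bottleneck". Careful use of factorization makes convolutions more computationally efficient.

Factorizing into smaller convolutions: a \(5\times5\) convolution can be replaced by two \(3\times3\) convolutions, which costs \(\frac{3\times3+3\times3}{5\times5}=\frac{18}{25}\) of the original, a saving of 28%.
Factorizing into asymmetric convolutions: an \(n\times n\) convolution can be replaced by a \(1\times n\) convolution followed by an \(n \times 1\) convolution.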
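The two cost ratios can be checked with a few lines of arithmetic. This sketch counts kernel weights per input-output channel pair and assumes the channel count stays the same through the factorized layers; the function names are illustrative.

```python
from fractions import Fraction

def stacked_ratio(k_big=5, k_small=3, n_layers=2):
    """Cost of n_layers stacked k_small x k_small convs relative to one k_big x k_big conv."""
    return Fraction(n_layers * k_small * k_small, k_big * k_big)

def asymmetric_ratio(n):
    """Cost of a 1 x n conv followed by an n x 1 conv relative to one n x n conv."""
    return Fraction(2 * n, n * n)

def receptive_field(k, n_layers):
    """Receptive field of n_layers stacked k x k convs with stride 1."""
    return 1 + n_layers * (k - 1)

print(stacked_ratio())        # two 3x3 vs one 5x5
print(asymmetric_ratio(3))    # 1x3 + 3x1 vs 3x3
print(asymmetric_ratio(7))    # 1x7 + 7x1 vs 7x7
print(receptive_field(3, 2))  # two 3x3 layers cover the same window as one 5x5
```

The saving from asymmetric factorization grows with n, since the ratio is 2/n.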

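Evolution of the Inception module

a is the InceptionV1 module. Replacing the 5x5 convolution with two 3x3 convolutions gives b. Factorizing the 3x3 convolutions into 3x1 and 1x3 gives c. For high-level features, the convolution group is widened as in d to produce more diverse features.

Downsampling module

InceptionV3 no longer downsamples with max pooling alone, because that loses too much information. The authors considered expanding the channels with a convolution first and pooling afterwards, but that is computationally expensive. They therefore designed a parallel two-branch structure, Grid Size Reduction, to replace max pooling.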
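The shape bookkeeping of such a two-branch reduction can be sketched as follows. This is a simplified illustration with one stride-2 convolution branch and one stride-2 pooling branch whose outputs are concatenated along the channel axis; the input size and channel counts are hypothetical, and the function names are not from the paper.

```python
def out_size(size, kernel, stride, padding=0):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

def grid_size_reduction(size, channels, conv_channels):
    """Return (spatial size, channels) after the parallel reduction."""
    conv_size = out_size(size, kernel=3, stride=2)  # stride-2 3x3 convolution branch
    pool_size = out_size(size, kernel=3, stride=2)  # stride-2 3x3 max pooling branch
    assert conv_size == pool_size, "branches must agree spatially to be concatenated"
    # Pooling keeps the input channels; concatenation adds the conv channels.
    return conv_size, conv_channels + channels

# Hypothetical input: 35x35 grid with 320 channels.
print(grid_size_reduction(35, 320, 320))
```

The grid is halved while the channel count grows, so the representation is not squeezed through a narrow layer before being downsampled.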

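Network structure

Figures 5, 6 and 7 correspond to b, c and d in the figure above. A Grid Size Reduction module is inserted between each type of block.

Experimental results

Inception-v2 reaches a top-1 error of 23.4%. Inception-v3 is Inception-v2 with RMSProp, Label Smoothing, a factorized 7x7 convolution and BN in the auxiliary classifier all applied together.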

InceptionV3

Rethinking the Inception Architecture for Computer Vision

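Core idea

The authors point out that the auxiliary classifiers contribute little until near the end of training, when accuracy is close to saturation. They can therefore be regarded as regularizers.

V3 makes the following improvements over V2 (see the V2 experimental results):
1. RMSProp optimizer
2. Factorized 7x7 convolution
3. BatchNorm in the auxiliary classifier
4. Label Smoothing, to prevent overfitting.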
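Label Smoothing replaces the one-hot target with a mixture of the one-hot vector and a uniform distribution over the K classes. A minimal sketch, with an illustrative function name and a toy number of classes:

```python
def smooth_labels(target, num_classes, epsilon=0.1):
    """Return (1 - epsilon) * one_hot(target) + epsilon / num_classes for each class."""
    uniform = epsilon / num_classes
    return [
        (1.0 - epsilon) * (1.0 if k == target else 0.0) + uniform
        for k in range(num_classes)
    ]

# Toy example with 4 classes; the true class is index 2.
smoothed = smooth_labels(target=2, num_classes=4, epsilon=0.1)
print(smoothed)
```

Because the target for the true class is now slightly below 1, the network is no longer pushed toward arbitrarily large logits, which is where the regularizing effect comes from.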

Experimental results

InceptionV4

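Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

This paper combines ResNet with Inception and proposes three new network architectures:

Inception-ResNet-v1: a hybrid Inception with roughly the same computational cost as InceptionV3.
Inception-ResNet-v2: a costlier hybrid with significantly better performance.
InceptionV4: a pure Inception variant without residual connections, on par with Inception-ResNet-v2.

Core idea

InceptionV4 tidies up the earlier versions. The original models were trained in a partitioned fashion; after the move to the TensorFlow framework, the Inception modules could be standardized and simplified.

Network architecture

Stem: Inception-ResNet-v1 uses the top one; InceptionV4 and Inception-ResNet-v2 use the bottom one.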

Inception modules A,B,C

Reduction Blocks A,B

Network

Inception-ResNet

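Core idea

Inspired by ResNet, the authors propose a hybrid Inception. Inception-ResNet comes in two versions, v1 and v2.
1. Inception-ResNet-v1 has a computational cost similar to InceptionV3; Inception-ResNet-v2 has a cost similar to InceptionV4.
2. They use different stems.
3. Their A, B and C modules have the same structure and differ only in hyperparameter settings.

When the number of filters exceeds 1000, the deeper units cause the network to "die" during training. To improve stability, the authors scale the residual activations by a factor of 0.1 to 0.3 before adding them.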
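The scaling step amounts to multiplying the residual branch by a small constant before it is added to the shortcut. A minimal sketch on flat lists of activations, with an illustrative function name:

```python
def scaled_residual_add(shortcut, residual, scale=0.1):
    """Add a down-scaled residual branch to the shortcut activations."""
    return [s + scale * r for s, r in zip(shortcut, residual)]

# Toy activations: the residual branch is much larger than the shortcut.
shortcut = [1.0, 2.0]
residual = [10.0, -10.0]
merged = scaled_residual_add(shortcut, residual, scale=0.1)
print(merged)
```

Damping the residual keeps the summed activations from growing too quickly as blocks are stacked.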

Network architecture

Stem: see InceptionV4
Inception-ResNet Module A,B,C

Reduction Blocks A,B

Network

Experimental results

Xception

Core idea

Xception: Deep Learning with Depthwise Separable Convolutions

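Xception uses depthwise separable convolutions to improve InceptionV3.

Inception rests on the hypothesis that it is better to decouple cross-channel correlations from spatial correlations. Its 1x1 convolutions act across channels only, but its 3x3 convolutions act across channels and space at the same time, so the separation is incomplete.

Xception (Extreme Inception) lets each 3x3 convolution act on the feature map of a single channel, which achieves complete separation.

From InceptionV3 to Xception

How the extreme Inception module differs from a depthwise separable conv:

Order: a depthwise separable conv applies the per-channel convolution first and the 1x1 convolution second, whereas the extreme Inception module applies the 1x1 convolution first and the per-channel convolution second.
Activation: a depthwise separable conv has no activation function between its two convolutions, whereas the extreme Inception module applies ReLU in between.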
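To see why separating channel and spatial convolutions is cheap, the sketch below counts weights for a standard convolution and for a separable one in either order. Biases are ignored, the channel counts are hypothetical, and the function names are illustrative.

```python
def standard_conv_params(c_in, c_out, k=3):
    """Weights of a standard k x k convolution."""
    return k * k * c_in * c_out

def separable_conv_params(c_in, c_out, k=3, pointwise_first=False):
    """Weights of a per-channel k x k conv combined with a 1x1 conv.

    pointwise_first=False: per-channel conv on c_in channels, then 1x1.
    pointwise_first=True:  1x1 first, then per-channel conv on c_out channels.
    """
    spatial_channels = c_out if pointwise_first else c_in
    return k * k * spatial_channels + c_in * c_out

# Hypothetical layer with 128 input and 128 output channels.
print(standard_conv_params(128, 128))
print(separable_conv_params(128, 128))
print(separable_conv_params(64, 128), separable_conv_params(64, 128, pointwise_first=True))
```

When the input and output channel counts are equal, the two orders have the same number of weights, so the ordering matters for behavior rather than for size.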

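Network architecture

Experimental results

Summary

GoogLeNet, i.e. InceptionV1, introduces the Inception module, which combines 1x1, 3x3 and 5x5 convolutions with pooling. This widens the network and makes it more adaptable to multiple scales.
InceptionV2 adopts Batch Normalization, which normalizes outputs to roughly N(0,1) and speeds up convergence. It also introduces convolution factorization: large convolutions are split into smaller or asymmetric ones to reduce computation.
InceptionV3 builds on InceptionV2 with a few improvements: the RMSProp optimizer, a factorized 7x7 convolution, Label Smoothing, and BN in the auxiliary classifier.
InceptionV4 reworks the InceptionV3 structure and removes unnecessary computation. It is a pure Inception network without residual connections, with accuracy on par with Inception-ResNet-v2.
Inception-ResNet combines Inception with residual connections and improves performance. It has two versions: v1 has a computational cost similar to InceptionV3, and v2 has a cost similar to InceptionV4.
Xception uses depthwise separable convolutions to improve InceptionV3. It fully separates spatial and channel convolutions, which improves performance with a slightly smaller parameter count.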

References

paper

[1]Szegedy C, Liu W, Jia Y, et al. Going deeper with convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2015: 1-9.

[2]Ioffe S, Szegedy C. Batch normalization: Accelerating deep network training by reducing internal covariate shift[J]. arXiv preprint arXiv:1502.03167, 2015.

[3]Szegedy C, Vanhoucke V, Ioffe S, et al. Rethinking the inception architecture for computer vision[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2818-2826.

[4]Szegedy C, Ioffe S, Vanhoucke V, et al. Inception-v4, inception-resnet and the impact of residual connections on learning[C]//Thirty-First AAAI Conference on Artificial Intelligence. 2017.

[5]Chollet F. Xception: Deep learning with depthwise separable convolutions[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2017: 1251-1258.

blog

A Simple Guide to the Versions of the Inception Network

The Evolution of Inception Models: From GoogLeNet to Inception-ResNet

What You Need to Know About Xception