The evolution of CNN models from the perspective of convolution splitting and grouping

Date: 2020-05-15

Blog: 博客园 | CSDN | blog

Preface

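As the title says, this post looks at how the modules of classic CNN backbone networks have evolved, seen through the lens of how the convolution is split. To keep the viewpoint consistent, only the convolutions along a single path are analyzed.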

Formalization

For convenience, define the following for a standard convolution:

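\(I\): input size, with height \(H\) and width \(W\); take the two to be equal, so \(I = H = W\)
\(M\): number of input channels, which can be viewed as the depth of the input tensor
\(K\): kernel size \(K \times K\); each kernel has the same number of channels as the input, \(M\)
\(N\): number of kernels
\(F\): size \(F \times F\) of the feature map produced by the convolution; its channel count equals the number of kernels, \(N\)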

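So the input is an \(M \times I \times I\) tensor, the kernels form an \(N \times M \times K \times K\) tensor, and the feature map is an \(N \times F \times F\) tensor. Counting one multiply-accumulate as one operation, the computational cost of a standard convolution is therefore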

\[FLOPS = K \times K \times M \times N \times F \times F \]

In particular, if we only consider SAME padding with \(stride = 1\), then \(F = I\) and the cost becomes

\[FLOPS = K \times K \times M \times N \times I \times I \]

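This can be read as \((K \times K \times M) \times (N \times I \times I)\): the first factor is the cost of a single inner product in the convolution, and the second is the number of inner products required.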

The number of parameters is

\[\#Params = N \times M \times K \times K \]
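
The two formulas above can be checked with a few lines of Python. This is a minimal sketch: the function name `conv_cost` and the example sizes (K = 3, M = 64, N = 128, I = 56) are illustrative choices that do not come from the original post, and bias terms are ignored, as in the formulas.

```python
def conv_cost(K, M, N, I, F=None):
    """Return (flops, params) for a standard convolution.

    Symbols follow the definitions in the text. F defaults to I,
    which corresponds to SAME padding with stride 1.
    """
    if F is None:
        F = I
    per_inner_product = K * K * M      # multiply-accumulates in one inner product
    num_inner_products = N * F * F     # one inner product per output element
    flops = per_inner_product * num_inner_products
    params = N * M * K * K             # no bias terms, matching the formula
    return flops, params


# Illustrative sizes only (an assumption, not taken from the post).
flops, params = conv_cost(K=3, M=64, N=128, I=56)
print(f"FLOPS = {flops}, #Params = {params}")
```

Note that the cost is simply the parameter count multiplied by the number of output positions, \(F \times F\), which is why the rest of the analysis can focus on the factor \(K \times K \times M \times N\).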

Network evolution

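Looking across lightweight networks such as SqueezeNet, MobileNet V1/V2 and ShuffleNet, each can be seen as splitting or grouping the \(M \times K \times K\) kernel in some way (while also introducing activation functions). These splits and groupings usually reduce the parameter count and the computation, which leaves room to increase the number of kernels \(N\). The structural change also acts as a form of regularization. Together, these changes are how a balance between performance and computational cost is reached.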

Taken as a whole, these changes amount to various transformations of the original \(FLOPS = K \times K \times M \times N \times I \times I\).

The rest of this section walks through them from that viewpoint. For brevity, only the factors that change are listed.

Group Convolution (AlexNet): the input is split into groups. The number of kernels is unchanged, but each kernel has fewer channels, which corresponds to

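\[M \rightarrow \frac{M}{G} \]

Replacing a large kernel with a stack of small kernels (VGG): for example, one \(5\times 5\) is replaced by two \(3\times 3\), and one \(7\times 7\) by three \(3\times 3\). The receptive field is preserved while parameters and computation drop, which corresponds to turning a product of large numbers into a sum of products of small numbers,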

\[(K \times K) \rightarrow (k \times k + \dots + k \times k) \]

Factorized Convolution (Inception V2): a 2D convolution becomes separate row and column convolutions, rows first and then columns,

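\[(K \times K) \rightarrow (K \times 1 + 1 \times K) \]

Fire module (SqueezeNet): pointwise + ReLU + (pointwise + 3x3 conv) + ReLU. The first pointwise layer reduces the channel dimension, and a fraction of the \(3\times 3\) kernels is replaced by \(1 \times 1\) kernels. With \(\frac{N}{t}\) channels after the reduction and a fraction \(p\) of the \(N\) following kernels kept at \(3\times 3\),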

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + \frac{N}{t} \times (1-p)N + K \times K \times \frac{N}{t} \times pN) \\ K = 3 \]

Bottleneck (ResNet): pointwise + BN ReLU + 3x3 conv + BN ReLU + pointwise, similar to applying an SVD along the channel dimension,

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + K \times K \times \frac{N}{t} \times \frac{N}{t} + \frac{N}{t} \times N) \\ t = 4 \]

ResNeXt Block (ResNeXt): equivalent to a bottleneck whose \(3\times 3\) convolution is a group convolution,

\[(K \times K \times M \times N) \rightarrow (M \times \frac{N}{t} + K \times K \times \frac{N}{tG} \times \frac{N}{t} + \frac{N}{t} \times N) \\t = 2, \ G = 32 \]

Depthwise Separable Convolution (MobileNet V1): depthwise + BN ReLU + pointwise + BN ReLU, which amounts to factoring the channel dimension out on its own,

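\[(K \times K \times N) \rightarrow (K \times K + N) \]

Separable Convolution (Xception): pointwise + depthwise + BN ReLU. This also factors out the channel dimension, but in the opposite order (since the blocks are stacked back to back, it is effectively equivalent to the basic Depthwise Separable Convolution), and the ReLU between the two layers is removed,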

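\[(K \times K \times M) \rightarrow (M + K \times K) \]
In the actual implementation, however, it is still depthwise + pointwise + ReLU.

Pointwise group convolution and channel shuffle (ShuffleNet): group pointwise + BN ReLU + Channel Shuffle + depthwise + BN + group pointwise + BN. This is a bottleneck whose two pointwise layers use the same number of groups and whose \(3\times 3\) conv becomes depthwise, so all three convolution layers are grouped. That would block the exchange of information between channels in different groups, so a channel shuffle is inserted after the first group pointwise layer, i.e.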

\[(K \times K \times M \times N) \rightarrow (\frac{M}{G} \times \frac{N}{t} + channel \ shuffle +K \times K \times \frac{N}{t} + \frac{N}{tG} \times N) \]

Inverted Linear Bottleneck (MobileNet V2): a bottleneck first reduces the dimension with a pointwise layer, then convolves, then expands; an inverted bottleneck first expands, then convolves, then reduces. The layout is pointwise + BN ReLU6 + depthwise + BN ReLU6 + pointwise + BN,

\[(K \times K \times M \times N) \rightarrow (M \times tM + K \times K \times tM + tM \times N) \\t = 6 \]
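
To compare the forms listed above side by side, the changed factors can be evaluated numerically. The following is a rough sketch, not a benchmark: the function names, the sizes K = 3 and M = N = 256, the group count G = 2 for group convolution, and the values of t, p and G used for the Fire module and the ShuffleNet unit are all assumptions chosen for illustration (the text only fixes t = 4 for the bottleneck, t = 2 and G = 32 for ResNeXt, and t = 6 for the inverted bottleneck).

```python
def module_weights(K=3, M=256, N=256):
    """Weight count per output position for each form listed above.

    Multiply any entry by I * I to get the FLOPs under SAME padding, stride 1.
    Channel counts are assumed to divide evenly by every t and G used here.
    """
    w = {}
    w["standard"] = K * K * M * N

    G = 2  # illustrative group count (assumption)
    w["group conv"] = K * K * (M // G) * N

    w["factorized Kx1 + 1xK"] = (K * 1 + 1 * K) * M * N

    t, p = 8, 0.5          # assumed values; the text does not fix t or p
    s = N // t             # channels after the first pointwise layer
    e3 = int(p * N)        # kernels kept at 3x3
    e1 = N - e3            # kernels replaced by 1x1
    w["fire"] = M * s + s * e1 + K * K * s * e3

    t = 4
    w["bottleneck"] = M * (N // t) + K * K * (N // t) * (N // t) + (N // t) * N

    t, G = 2, 32
    w["resnext"] = M * (N // t) + K * K * (N // (t * G)) * (N // t) + (N // t) * N

    w["depthwise separable"] = K * K * M + M * N

    t, G = 4, 8            # assumed values; the text leaves t and G open here
    w["shufflenet unit"] = (M // G) * (N // t) + K * K * (N // t) + (N // (t * G)) * N

    t = 6
    w["inverted bottleneck"] = M * t * M + K * K * t * M + t * M * N
    return w


def stacked_small_kernels(K_big, k=3):
    """Spatial factor before and after replacing one K_big x K_big kernel
    with stacked k x k kernels that cover the same receptive field."""
    layers = (K_big - 1) // (k - 1)   # stride-1 layers needed
    return K_big * K_big, layers * k * k


if __name__ == "__main__":
    weights = module_weights()
    base = weights["standard"]
    for name, count in weights.items():
        print(f"{name:24s} {count:8d}  ratio to standard: {count / base:.3f}")
    print("5x5 vs two 3x3:", stacked_small_kernels(5))
    print("7x7 vs three 3x3:", stacked_small_kernels(7))
```

With \(M = N\) and \(t = 6\), the inverted bottleneck count comes out larger than the standard \(M \rightarrow N\) layer, since its two pointwise terms alone contribute \(2tM^2\). The saving in that formula is relative to running a standard \(K \times K\) convolution at the expanded width \(tM\).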

Summary

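To sum up: early CNNs were built by stacking standard convolution layers one after another. Later they became modular, assembled from modules, and the evolution of these modules can be seen as repeatedly reworking the cost of a standard convolution, \(FLOPS = K \times K \times M \times N \times I \times I\).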

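Splitting: a kernel is a 3D tensor and can be split along different dimensions. Rows and columns can be separated, the depth (channel) dimension can be separated, and the kernel can also be split into several stages in series (similar to an SVD).
Grouping: stacking multiple kernels gives a 4D tensor, and the added kernel-count dimension can be divided into groups.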
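
Grouping keeps the channels of each group isolated from the others, which is the problem the channel shuffle in ShuffleNet addresses. As a minimal sketch of that permutation, assuming channels are stored as a flat Python list ordered group by group (the function name and the toy labels are hypothetical, chosen only for illustration):

```python
def channel_shuffle(channels, G):
    """Interleave channels so that every group contributes to every later group.

    Equivalent to reshaping the channel axis to (G, n), transposing it
    to (n, G), and flattening again.
    """
    if len(channels) % G != 0:
        raise ValueError("channel count must be divisible by G")
    n = len(channels) // G
    # Slice the flat list into G consecutive groups of n channels each.
    groups = [channels[g * n:(g + 1) * n] for g in range(G)]
    # Read the groups column by column.
    return [groups[g][i] for i in range(n) for g in range(G)]


# Toy example: three groups (a, b, c) with two channels each.
print(channel_shuffle(["a0", "a1", "b0", "b1", "c0", "c1"], G=3))
```

After the shuffle, each consecutive block of G channels holds one channel from every original group, so a following grouped layer sees inputs from all groups.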

Different ways of splitting and grouping, combined in different arrangements, produce the wide variety of modules.