Accuracy-boosting convolution paper | Asymmetric Convolution ACNet | ICCV | 2019

Date: 2020-12-23


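This article originally appeared on the author's WeChat public account, 【机器学习炼丹术】. The discussion group has a great atmosphere, and the idea behind it is simple: when someone runs into a problem, there should be a place where it can be discussed and answered quickly. Group 1 is already full, and group 2 is open. The only requirement for joining is an interest in AI. Add me on WeChat (cyx645016617) and I will invite you.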

Paper title: "ACNet: Strengthening the Kernel Skeletons for Powerful CNN via Asymmetric Convolution Blocks"
Paper link: https://arxiv.org/abs/1908.03930
Model abbreviation: ACNet

0 个人理解

这个ACNet是一个不错的对于卷积核结构的一个创新。总的来讲是一个值得在CNN模型中尝试的trick，至于有没有效果还得看缘分。不过这个trick的听同行来讲，算是一个好的trick，因此值得尝试。微信

这个trick的代价是增长了训练阶段的时间和参数，可是并不会增长推理阶段的时长，也不会增长最终模型的参数。网络
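To make that cost concrete, here is a rough count of the weights in a single layer. The channel sizes (32 in, 64 out) are illustrative assumptions rather than numbers from the paper, and biases are ignored.

```python
# Weight count of a single layer, ignoring biases.
# c_in and c_out are illustrative values, not taken from the paper.
c_in, c_out, k = 32, 64, 3

square = c_in * c_out * k * k        # ordinary 3x3 kernel
horizontal = c_in * c_out * 1 * k    # extra 1x3 branch
vertical = c_in * c_out * k * 1      # extra 3x1 branch

train_params = square + horizontal + vertical   # three parallel branches
deploy_params = square                          # branches merged into one 3x3 kernel

print(train_params, deploy_params)  # 30720 18432
```

Under these assumptions the layer carries about 1.67x the weights during training and returns to the ordinary 3x3 size once the branches are merged.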

1 Paper walkthrough

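The method is quite simple and can be shown in a single figure:
[Figure: ACNet in the training stage (left) and the deployed model (right)]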

Let me (炼丹兄) walk you through the figure:

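The figure has two halves: the left is ACNet in the training stage, and the right is the deployed model, which you can think of as the test/inference stage.
An ordinary 3x3 convolution is just the first row of the left half. ACNet's innovation is to run two rectangular kernels, 1x3 and 3x1, in parallel alongside the 3x3 convolution. In other words, any 3x3 convolution layer in a network becomes a structure of three parallel convolution layers once the ACNet method is applied.
The outputs of the three convolution layers are added together to give the output feature map of the AC convolution layer.
Why does the parameter count not grow at test time? There are two extra convolution layers, so how can the parameters not increase? As the right half shows, the three kernels can be merged into a single kernel, so at inference ACNet is exactly equivalent to an ordinary convolutional model.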
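The merging step can be checked numerically. The sketch below uses random tensors with arbitrary shapes (all names and sizes are assumptions for illustration) and relies on the fact that convolution is linear in the kernel: adding the 1x3 kernel onto the middle row and the 3x1 kernel onto the middle column of the 3x3 kernel gives the same output as summing the three branches.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(2, 4, 8, 8, dtype=torch.float64)       # batch, channels, H, W
w_sq = torch.randn(6, 4, 3, 3, dtype=torch.float64)    # 3x3 branch
w_hor = torch.randn(6, 4, 1, 3, dtype=torch.float64)   # 1x3 branch
w_ver = torch.randn(6, 4, 3, 1, dtype=torch.float64)   # 3x1 branch

# Training-time form: three parallel convolutions whose outputs are summed.
y_branches = (F.conv2d(x, w_sq, padding=1)
              + F.conv2d(x, w_hor, padding=(0, 1))
              + F.conv2d(x, w_ver, padding=(1, 0)))

# Deployment form: add the thin kernels onto the middle row and middle column
# of the square kernel, then run a single 3x3 convolution.
w_fused = w_sq.clone()
w_fused[:, :, 1:2, :] += w_hor
w_fused[:, :, :, 1:2] += w_ver
y_fused = F.conv2d(x, w_fused, padding=1)

max_diff = (y_branches - y_fused).abs().max().item()
print(max_diff)  # tiny floating-point round-off only
```

Note that the equivalence depends on the padding: the thin branches pad only along their long side, so all three outputs line up with the 3x3 output.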

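My own understanding is that an ordinary model could in principle learn what ACNet learns, because after merging the two have exactly the same parameters. ACNet probably does better because it strengthens horizontal and vertical features. It also acts like an extra constraint on the kernel: its parameters are no longer equally important, and the center matters more. This added constraint may also help against overfitting. These are conjectures of mine, drawn from experiments.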

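Below is another article's explanation. If you can follow it, you can use it to check whether your own understanding is correct:
[Image: explanation excerpted from another article]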

2 Training code

First, a very simple classification network that uses ordinary convolutions:

import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
          
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
          
        self.classifier = nn.Sequential(
            nn.Dropout(p = 0.5),
            # 64 channels x 7 x 7: assumes a 28x28 input (e.g. MNIST) halved twice by the poolings
            nn.Linear(64 * 7 * 7, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 10),
        )                

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        
        return x

Next, let's convert this network to the ACNet structure. First, build an AC block to stand in for the convolution:

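class ACConv2d(nn.Module):
    """k x k convolution with parallel 1 x k and k x 1 branches.

    Assumes padding == kernel_size // 2 so that the three outputs have the same shape.
    """
    def __init__(self,in_channels,out_channels,kernel_size=3,stride=1,padding=1,bias=True):
        super(ACConv2d,self).__init__()
        # Square branch
        self.conv = nn.Conv2d(in_channels,out_channels,kernel_size=kernel_size,
                             stride=stride,padding=padding,bias=bias)
        # Horizontal branch: pad only along the width
        self.ac1 = nn.Conv2d(in_channels,out_channels,kernel_size=(1,kernel_size),
                            stride=stride,padding=(0,padding),bias=bias)
        # Vertical branch: pad only along the height
        self.ac2 = nn.Conv2d(in_channels,out_channels,kernel_size=(kernel_size,1),
                            stride=stride,padding=(padding,0),bias=bias)
        
    def forward(self,x):
        ac1 = self.ac1(x)
        ac2 = self.ac2(x)
        x = self.conv(x)
        # Average of the three branches, i.e. their sum scaled by a constant 1/3
        return (ac1+ac2+x)/3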

Then simply replace nn.Conv2d with ACConv2d in the network:

class ACNet(nn.Module):    
    def __init__(self):
        super(ACNet, self).__init__()
          
        self.features = nn.Sequential(
            ACConv2d(1, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            ACConv2d(32, 32, kernel_size=3, stride=1, padding=1),
            nn.BatchNorm2d(32),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2),
            ACConv2d(32, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            ACConv2d(64, 64, kernel_size=3, padding=1),
            nn.BatchNorm2d(64),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(kernel_size=2, stride=2)
        )
          
        self.classifier = nn.Sequential(
            nn.Dropout(p = 0.5),
            nn.Linear(64 * 7 * 7, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 512),
            nn.BatchNorm1d(512),
            nn.ReLU(inplace=True),
            nn.Dropout(p = 0.5),
            nn.Linear(512, 10),
        )  

    def forward(self, x):
        x = self.features(x)
        x = x.view(x.size(0), -1)
        x = self.classifier(x)
        return x
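For larger networks, editing every layer by hand gets tedious. As a sketch of one alternative, the hypothetical helper `swap_to_acconv` below (not part of the original post or of PyTorch) walks a model and replaces plain 3x3 convolutions in place. A compact copy of the AC block is included so the snippet runs on its own. The new layers are freshly initialized, so this is meant to be applied before training.

```python
import torch
import torch.nn as nn


class ACConv2d(nn.Module):
    """Compact copy of the AC block so that this snippet is self-contained."""

    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1, padding=1, bias=True):
        super().__init__()
        self.conv = nn.Conv2d(in_channels, out_channels, kernel_size, stride, padding, bias=bias)
        self.ac1 = nn.Conv2d(in_channels, out_channels, (1, kernel_size), stride, (0, padding), bias=bias)
        self.ac2 = nn.Conv2d(in_channels, out_channels, (kernel_size, 1), stride, (padding, 0), bias=bias)

    def forward(self, x):
        return (self.conv(x) + self.ac1(x) + self.ac2(x)) / 3


def swap_to_acconv(module):
    """Hypothetical helper: recursively replace plain 3x3 nn.Conv2d layers in place."""
    for name, child in list(module.named_children()):
        if isinstance(child, ACConv2d):
            continue  # already converted; do not descend into its branches
        is_plain_3x3 = (isinstance(child, nn.Conv2d)
                        and child.kernel_size == (3, 3)
                        and child.padding == (1, 1)
                        and child.dilation == (1, 1)
                        and child.groups == 1)
        if is_plain_3x3:
            setattr(module, name, ACConv2d(child.in_channels, child.out_channels,
                                           kernel_size=3, stride=child.stride,
                                           padding=1, bias=child.bias is not None))
        else:
            swap_to_acconv(child)
    return module


net = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1),
    nn.BatchNorm2d(32),
    nn.ReLU(inplace=True),
    nn.Conv2d(32, 10, kernel_size=1),  # 1x1 layer is left untouched
)
net = swap_to_acconv(net)
out = net(torch.randn(2, 1, 28, 28))
print(type(net[0]).__name__, type(net[3]).__name__, tuple(out.shape))
```

The helper deliberately skips grouped, dilated, or differently padded convolutions, since the simple block here only covers the plain 3x3 case.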

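3 Results and reasons

In terms of results, the model does show some improvement on ImageNet. Why? The paper offers one explanation: the 1x3 and 3x1 kernels are robust to vertical and horizontal flips respectively, as a figure in the paper illustrates.

After the feature map is flipped vertically, the features extracted by the 1x3 kernel are unaffected (the same values appear at the mirrored positions), whereas those extracted by the 3x3 kernel change. Likewise, the 3x1 kernel is robust to horizontal flips.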
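This property is easy to try out. The sketch below uses random single-channel tensors (shapes and names are assumptions for illustration) and compares "convolve the flipped input" with "flip the convolved output" for a 1x3 kernel and for a generic 3x3 kernel.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
x = torch.randn(1, 1, 6, 6, dtype=torch.float64)
k_hor = torch.randn(1, 1, 1, 3, dtype=torch.float64)   # 1x3 kernel
k_sq = torch.randn(1, 1, 3, 3, dtype=torch.float64)    # generic 3x3 kernel


def flip_ud(t):
    """Reverse the height axis, i.e. an up-down (vertical) flip."""
    return torch.flip(t, dims=[2])


# 1x3: each output row depends on one input row only, so flipping commutes.
hor_same = torch.allclose(F.conv2d(flip_ud(x), k_hor, padding=(0, 1)),
                          flip_ud(F.conv2d(x, k_hor, padding=(0, 1))))

# 3x3: the comparison fails unless the kernel happens to be vertically symmetric.
sq_same = torch.allclose(F.conv2d(flip_ud(x), k_sq, padding=1),
                         flip_ud(F.conv2d(x, k_sq, padding=1)))

print(hor_same, sq_same)
```

The first value should be `True` and, for a random 3x3 kernel, the second `False`. The same experiment with a 3x1 kernel and `dims=[3]` covers the horizontal flip.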

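Flip robustness is one explanation. Here is another:

My understanding is that this comes from differentiated gradients. Originally there is only a single 3x3 convolution layer, so every kernel position receives one share of gradient. After the 1x3 and 3x1 layers are added, some positions receive two or three shares, which makes the update more fine-grained. In theory, one could fuse arbitrarily many convolution layers to keep approaching the performance limit of the existing network. The fusion need not be addition (as long as training and inference stay consistent), and the fused layers need not be 1x3 or 3x1.

I tried this method on my MNIST recognition task, but it made no noticeable difference, haha. I hope it will help in my future projects. It is a trick worth trying, and likes and bookmarks are welcome.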
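The "two shares and three shares" can be read off by counting how many branch parameters land on each position of the merged 3x3 kernel. This is only a bookkeeping sketch, not a gradient computation:

```python
# Count how many branch parameters land on each position of the merged 3x3 kernel.
k = 3
c = k // 2                              # index of the middle row / column
counts = [[1] * k for _ in range(k)]    # every position is covered by the 3x3 branch
for j in range(k):
    counts[c][j] += 1                   # 1x3 branch covers the middle row
for i in range(k):
    counts[i][c] += 1                   # 3x1 branch covers the middle column

for row in counts:
    print(row)
# [1, 2, 1]
# [2, 3, 2]
# [1, 2, 1]
```

The corners are backed by one parameter, the arms of the cross by two, and the center by three, which is the "kernel skeleton" in the paper's title.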

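4 Improvement

Finally, if you have read this far and thought about the above, you will have noticed that the AC convolution I wrote does not actually fuse the kernels at inference time. I later improved the code so that after model.eval() is called, the AC convolution is fused into a single convolution layer instead of three parallel ones: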

import torch
import torch.nn as nn


class ACConv2d(nn.Module):
    """AC block that runs three branches in training and one fused conv in eval mode.

    Assumes an odd kernel_size and padding == kernel_size // 2.
    """
    def __init__(self,in_channels,out_channels,kernel_size=3,stride=1,padding=1,bias=False):
        super(ACConv2d,self).__init__()
        self.bias = bias
        self.conv = nn.Conv2d(in_channels,out_channels,kernel_size=kernel_size,
                             stride=stride,padding=padding,bias=bias)
        self.ac1 = nn.Conv2d(in_channels,out_channels,kernel_size=(1,kernel_size),
                            stride=stride,padding=(0,padding),bias=bias)
        self.ac2 = nn.Conv2d(in_channels,out_channels,kernel_size=(kernel_size,1),
                            stride=stride,padding=(padding,0),bias=bias)
        # Used only at inference; its weights are overwritten in train(False)
        self.fusedconv = nn.Conv2d(in_channels,out_channels,kernel_size=kernel_size,
                                 stride=stride,padding=padding,bias=bias)

    def forward(self,x):
        if self.training:
            ac1 = self.ac1(x)
            ac2 = self.ac2(x)
            x = self.conv(x)
            return (ac1+ac2+x)/3
        else:
            x = self.fusedconv(x)
            return x

    def train(self,mode=True):
        super().train(mode=mode)
        if mode is False:
            c = self.conv.kernel_size[0] // 2   # index of the middle row / column
            with torch.no_grad():
                # clone() so that the trained square kernel itself is left unchanged
                weight = self.conv.weight.clone()
                weight[:,:,c:c+1,:] += self.ac1.weight   # 1 x k kernel onto the middle row
                weight[:,:,:,c:c+1] += self.ac2.weight   # k x 1 kernel onto the middle column
                self.fusedconv.weight.copy_(weight/3)
                if self.bias:
                    bias = self.conv.bias + self.ac1.bias + self.ac2.bias
                    self.fusedconv.bias.copy_(bias/3)
        return self

Thanks for reading! If you enjoyed this, please give it a "like" and a "looking"!