FCN is the progenitor of deep learning for image segmentation; many later network architectures evolved from it.
Image segmentation is classification at the pixel level.
The basic framework for semantic segmentation:

front end: FCN (and SegNet, DeconvNet, DeepLab, etc., which build on it) + back end: CRF/MRF
FCN is the ancestor of segmentation networks; many later networks were proposed on top of it.
Paper: Fully Convolutional Networks for Semantic Segmentation
Compared with a traditional classification network, FCN replaces the fully connected layers with convolutional layers and upsamples with transposed convolutions, producing a feature map the same size as the input image. The backbone used in this post is VGG.
Two points deserve attention:
For a detailed derivation of deconvolution (also called transposed convolution), see: <https://blog.csdn.net/LoseInVain/article/details/81098502>
Simply put, it is the reverse operation of convolution. Take a 4x4 matrix A: convolving it with a 3x3 kernel C (stride=1) yields a 2x2 matrix B. Transposed convolution goes the other way: given B, we want to get back to A, so we use a kernel C to map B back up to a 4x4 output (note that it restores A's shape, not A's original values).
Transposed convolution is thus one method of upsampling.
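As a quick sanity check of that 4x4 → 2x2 → 4x4 example in PyTorch (a minimal sketch with random values):

```python
import torch
import torch.nn as nn

# Forward convolution: a 4x4 input with a 3x3 kernel (stride=1, no padding)
# yields a 2x2 output.
a = torch.randn(1, 1, 4, 4)
conv = nn.Conv2d(1, 1, kernel_size=3, stride=1, bias=False)
b = conv(a)
print(b.shape)  # torch.Size([1, 1, 2, 2])

# The transposed convolution maps the 2x2 back up to a 4x4 shape.
# It recovers only the shape of A, not its entries.
deconv = nn.ConvTranspose2d(1, 1, kernel_size=3, stride=1, bias=False)
a_up = deconv(b)
print(a_up.shape)  # torch.Size([1, 1, 4, 4])
```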
If we only took the feature map from the end of the feature-extraction part (i.e., VGG up to, but not including, the fully connected layers) and upsampled it back to the input size, the result would not be precise enough. So feature maps from different layers are upsampled and combined, as the code below shows.
Source code: https://github.com/pochih/FCN-pytorch
The core code is as follows:
```python
class FCNs(nn.Module):

    def __init__(self, pretrained_net, n_class):
        super().__init__()
        self.n_class = n_class
        self.pretrained_net = pretrained_net
        self.relu = nn.ReLU(inplace=True)
        self.deconv1 = nn.ConvTranspose2d(512, 512, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn1 = nn.BatchNorm2d(512)
        self.deconv2 = nn.ConvTranspose2d(512, 256, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn2 = nn.BatchNorm2d(256)
        self.deconv3 = nn.ConvTranspose2d(256, 128, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn3 = nn.BatchNorm2d(128)
        self.deconv4 = nn.ConvTranspose2d(128, 64, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn4 = nn.BatchNorm2d(64)
        self.deconv5 = nn.ConvTranspose2d(64, 32, kernel_size=3, stride=2, padding=1, dilation=1, output_padding=1)
        self.bn5 = nn.BatchNorm2d(32)
        self.classifier = nn.Conv2d(32, n_class, kernel_size=1)

    def forward(self, x):
        output = self.pretrained_net(x)
        x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
        x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
        x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)
        x2 = output['x2']  # size=(N, 128, x.H/4,  x.W/4)
        x1 = output['x1']  # size=(N, 64,  x.H/2,  x.W/2)

        score = self.bn1(self.relu(self.deconv1(x5)))     # size=(N, 512, x.H/16, x.W/16)
        score = score + x4                                # element-wise add
        score = self.bn2(self.relu(self.deconv2(score)))  # size=(N, 256, x.H/8, x.W/8)
        score = score + x3                                # element-wise add
        score = self.bn3(self.relu(self.deconv3(score)))  # size=(N, 128, x.H/4, x.W/4)
        score = score + x2                                # element-wise add
        score = self.bn4(self.relu(self.deconv4(score)))  # size=(N, 64, x.H/2, x.W/2)
        score = score + x1                                # element-wise add
        score = self.bn5(self.relu(self.deconv5(score)))  # size=(N, 32, x.H, x.W)
        score = self.classifier(score)                    # size=(N, n_class, x.H, x.W)

        return score  # size=(N, n_class, x.H, x.W)
```
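A note on the deconv layers above: with kernel_size=3, stride=2, padding=1, output_padding=1, each one exactly doubles the spatial size. From the output-size formula for nn.ConvTranspose2d, H_out = (H_in - 1)*stride - 2*padding + dilation*(kernel_size - 1) + output_padding + 1 = (H_in - 1)*2 - 2 + 2 + 1 + 1 = 2*H_in. A quick check:

```python
import torch
import torch.nn as nn

deconv = nn.ConvTranspose2d(512, 512, kernel_size=3, stride=2,
                            padding=1, dilation=1, output_padding=1)
x5 = torch.randn(1, 512, 7, 7)  # e.g. pool5 output for a 224x224 input
print(deconv(x5).shape)         # torch.Size([1, 512, 14, 14])
```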
In train.py:
```python
vgg_model = VGGNet(requires_grad=True, remove_fc=True)
fcn_model = FCNs(pretrained_net=vgg_model, n_class=n_class)
```
Here we focus on the forward function of FCNs:
```python
def forward(self, x):
    output = self.pretrained_net(x)
    x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
    x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
    x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)
    x2 = output['x2']  # size=(N, 128, x.H/4,  x.W/4)
    x1 = output['x1']  # size=(N, 64,  x.H/2,  x.W/2)

    score = self.bn1(self.relu(self.deconv1(x5)))     # size=(N, 512, x.H/16, x.W/16)
    score = score + x4                                # element-wise add
    score = self.bn2(self.relu(self.deconv2(score)))  # size=(N, 256, x.H/8, x.W/8)
    score = score + x3                                # element-wise add
    score = self.bn3(self.relu(self.deconv3(score)))  # size=(N, 128, x.H/4, x.W/4)
    score = score + x2                                # element-wise add
    score = self.bn4(self.relu(self.deconv4(score)))  # size=(N, 64, x.H/2, x.W/2)
    score = score + x1                                # element-wise add
    score = self.bn5(self.relu(self.deconv5(score)))  # size=(N, 32, x.H, x.W)
    score = self.classifier(score)                    # size=(N, n_class, x.H, x.W)

    return score  # size=(N, n_class, x.H, x.W)
```
As you can see, FCN takes an input of shape (batch_size, c, h, w) and produces an output of shape (batch_size, n_class, h, w).
First, the input passes through VGG's feature-extraction layers. The feature maps after the five max-pool layers have the following sizes:
```python
x5 = output['x5']  # size=(N, 512, x.H/32, x.W/32)
x4 = output['x4']  # size=(N, 512, x.H/16, x.W/16)
x3 = output['x3']  # size=(N, 256, x.H/8,  x.W/8)
x2 = output['x2']  # size=(N, 128, x.H/4,  x.W/4)
x1 = output['x1']  # size=(N, 64,  x.H/2,  x.W/2)
```
Each pooling layer's feature map is then upsampled 2x by a transposed convolution and element-wise added to the output of the previous pooling layer (ResNet's skip connections do something similar). This makes the upsampled feature maps carry fuller, more precise information and makes the model more robust.
For example, with a 224x224 input image, pool4 outputs (N, 512, 14, 14) and pool5 outputs (N, 512, 7, 7). Deconvolving pool5 gives (N, 512, 14, 14), which is element-wise added to pool4's output, still yielding (N, 512, 14, 14). Upsampling that result gives (N, 256, 28, 28), which is added to pool3's output; and so on, until we reach (N, 64, 112, 112).
After that, one more transposed-convolution upsampling gives (N, 32, 224, 224), and a final 1x1 convolution produces (N, n_class, 224, 224), i.e., n_class score maps of size 224x224.
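To sanity-check this walk-through end to end, here is a minimal sketch that drives the FCNs class above. DummyBackbone is a hypothetical stand-in for the repo's VGGNet, which likewise returns the five pooling outputs in a dict keyed 'x1' through 'x5':

```python
import torch
import torch.nn as nn

class DummyBackbone(nn.Module):
    """Hypothetical stand-in for VGGNet: returns a dict of the feature
    maps after each of the five pooling stages (H/2 down to H/32)."""
    def __init__(self):
        super().__init__()
        chans = [3, 64, 128, 256, 512, 512]
        self.stages = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chans[i], chans[i + 1], kernel_size=3, padding=1),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=2, stride=2),
            )
            for i in range(5)
        )

    def forward(self, x):
        output = {}
        for i, stage in enumerate(self.stages, start=1):
            x = stage(x)
            output[f'x{i}'] = x  # x1: H/2 ... x5: H/32
        return output

# Assumes the FCNs class shown earlier is in scope.
fcn = FCNs(pretrained_net=DummyBackbone(), n_class=21)
out = fcn(torch.randn(1, 3, 224, 224))
print(out.shape)  # torch.Size([1, 21, 224, 224])
```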
As the upsampling proceeds, the resulting feature maps recover more and more detail.
```python
criterion = nn.BCEWithLogitsLoss()
```
The loss function is binary cross-entropy. PyTorch provides two functions for computing it: nn.BCELoss and nn.BCEWithLogitsLoss.
The latter simply applies a sigmoid to the input first, mapping it into (0, 1); that is, BCEWithLogitsLoss = Sigmoid + BCELoss.
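A quick numerical check of that equivalence (note that the targets must be float tensors with the same shape as the network output, e.g. one-hot maps of shape (N, n_class, H, W) for FCN):

```python
import torch
import torch.nn as nn

logits = torch.randn(2, 3, 4, 4)              # raw network outputs
target = torch.empty(2, 3, 4, 4).random_(2)   # 0/1 float targets

loss_a = nn.BCEWithLogitsLoss()(logits, target)
loss_b = nn.BCELoss()(torch.sigmoid(logits), target)
print(torch.allclose(loss_a, loss_b))  # True (up to floating-point error)
```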
For a concrete example, see: https://blog.csdn.net/qq_22210253/article/details/85222093