Dice loss, the loss you cannot avoid in image segmentation: theory and code

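Loss functions based on the Dice coefficient show up frequently in medical image segmentation competitions, papers, and projects, so this post collects the essentials. If you work on image segmentation you cannot avoid Dice loss, just as you cannot avoid IoU in object detection.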

1 Overview

Dice loss and the Dice coefficient are two views of the same quantity. They are related by:

$$DiceLoss = 1 - DiceCoefficient$$

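1.2 Definition of Dice

The Dice coefficient, named after Lee Raymond Dice, is a set similarity measure, typically used to compute the similarity of two samples. Its value lies in [0, 1].

$$DiceCoefficient = \frac{2|X \cap Y|}{|X| + |Y|}$$

Here $|X \cap Y|$ is the number of elements in the intersection of the sets X and Y, and $|X|$ and $|Y|$ are the numbers of elements in each set. For a segmentation task, X and Y are the ground truth mask and the predicted mask.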

From this we also get the formula for Dice loss:

$$DiceLoss = 1 - \frac{2|X \cap Y|}{|X| + |Y|}$$
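To make the set definition concrete, here is a minimal sketch that treats each mask as a set of foreground pixel indices. The function names and the toy sets are invented for illustration, and returning 1 for two empty sets is a convention assumed here, not something the definition dictates.

```python
def dice_coefficient(x: set, y: set) -> float:
    """Dice coefficient of two finite sets: 2|X ∩ Y| / (|X| + |Y|)."""
    if not x and not y:
        # Assumed convention: two empty sets count as a perfect match.
        return 1.0
    return 2 * len(x & y) / (len(x) + len(y))


def dice_loss(x: set, y: set) -> float:
    """Dice loss is one minus the Dice coefficient."""
    return 1.0 - dice_coefficient(x, y)


# Foreground pixel indices of a toy ground truth and a toy prediction.
gt = {1, 2, 3, 4}
pred = {3, 4, 5}

print(dice_coefficient(gt, pred))  # 2 * 2 / (4 + 3) = 4/7
print(dice_loss(gt, pred))         # 1 - 4/7 = 3/7
```

The two sets share two elements out of 4 + 3, so the coefficient is 4/7 and the loss is 3/7.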

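2 A worked example by hand

There is a very good hand-worked example of Dice loss for binary segmentation circulating online, and it is easy to follow. The computation has two parts:

First compute $|X \cap Y|$
Then compute $|X|$ and $|Y|$

To compute the loss we necessarily already have two inputs: the model's output, that is, the predicted mask, and the ground truth (GT) from the dataset, that is, the true mask.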

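The intersection $|X \cap Y|$ is approximated by the element-wise product of the predicted mask and the GT mask. That is not the end of it: the entries of the product still have to be summed.

For a binary problem the GT segmentation map contains only the values 0 and 1, so the product effectively zeroes every pixel of the Pred map that is not activated in the GT map. For the activated pixels, it mainly penalizes low-confidence predictions: higher predicted values yield a better Dice coefficient.

For $|X|$ and $|Y|$, the usual approach is a plain sum over all elements, although some implementations square each element before summing. That is all there is to it, which makes Dice simple and convenient to use. Everything above covers binary segmentation; the multi-class case is essentially the same.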
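The two steps can be traced on a tiny example. The sketch below uses made-up values for a flattened 2x3 mask and plain Python lists, and computes both the plain-sum and the squared-sum variants of the denominator; none of the numbers come from the original worked example.

```python
# Toy 2x3 masks, flattened row by row (values invented for illustration).
gt = [1, 1, 0, 0, 1, 0]                # ground truth, hard 0/1 labels
pred = [0.9, 0.6, 0.2, 0.1, 0.4, 0.3]  # predicted foreground probabilities

# Step 1: |X ∩ Y| as the sum of the element-wise product.
# Pixels where gt == 0 contribute nothing, whatever the prediction says.
product = [p * g for p, g in zip(pred, gt)]
intersection = sum(product)  # 0.9 + 0.6 + 0.4 = 1.9

# Step 2a: |X| and |Y| as plain sums.
dice_plain = 2 * intersection / (sum(gt) + sum(pred))  # 3.8 / 5.5

# Step 2b: |X| and |Y| as sums of squares.
sq_gt = sum(g * g for g in gt)      # 3
sq_pred = sum(p * p for p in pred)  # 1.47
dice_squared = 2 * intersection / (sq_gt + sq_pred)  # 3.8 / 4.47

print(round(dice_plain, 4))    # ≈ 0.6909
print(round(dice_squared, 4))  # ≈ 0.8501
```

Raising the prediction on any of the three GT-positive pixels increases the intersection and therefore the coefficient, which is the "penalize low confidence" behaviour described above.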

3 Binary segmentation: code

Implementations usually add a smooth term to keep the denominator from being 0, so the formula becomes:

$$DiceLoss = 1 - \frac{2|X \cap Y| + smooth}{|X| + |Y| + smooth}$$

smooth is typically set to 1.
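A small sketch of what the smooth term changes in the edge cases. The helper `soft_dice` and the four-pixel masks are illustrative assumptions, written with plain Python lists rather than any framework.

```python
def soft_dice(pred, target, smooth=1.0):
    """Smoothed Dice coefficient over two flat sequences of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)


empty = [0.0, 0.0, 0.0, 0.0]
full = [1.0, 1.0, 1.0, 1.0]

print(soft_dice(empty, empty))              # (0 + 1) / (0 + 0 + 1) = 1.0
print(soft_dice(full, full))                # (8 + 1) / (4 + 4 + 1) = 1.0
print(soft_dice(full, empty))               # (0 + 1) / (4 + 0 + 1) = 0.2
print(soft_dice(full, empty, smooth=1e-5))  # very close to 0

# Without smoothing, the empty-vs-empty case is 0 / 0.
try:
    soft_dice(empty, empty, smooth=0.0)
except ZeroDivisionError:
    print("undefined without smoothing")
```

With smooth = 1, a prediction that is completely wrong against an empty target still scores 0.2 on this four-pixel mask, so the choice of smooth trades numerical safety against a small bias that shrinks as the mask gets larger.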

3.1 PyTorch implementation

First the Dice coefficient. pred and target have shape [batch_size, channels, ...], so the same function works for both 2D and 3D data.

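def dice_coeff(pred, target):
    # pred, target: torch.Tensor of shape [batch_size, channels, ...]
    smooth = 1.  # keeps the denominator away from 0
    num = pred.size(0)
    # reshape (unlike view) also accepts non-contiguous tensors
    m1 = pred.reshape(num, -1)    # flatten each sample
    m2 = target.reshape(num, -1)  # flatten each sample
    # Soft intersection, summed over the whole batch (one Dice value per batch)
    intersection = (m1 * m2).sum()

    return (2. * intersection + smooth) / (m1.sum() + m2.sum() + smooth)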

Dice loss is then just 1 - Dice coefficient, so it can be written as:

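def dice_loss(pred, target):
    # pred, target: torch.Tensor of shape [batch_size, channels, ...]
    smooth = 1.  # keeps the denominator away from 0
    num = pred.size(0)
    m1 = pred.reshape(num, -1)    # flatten each sample
    m2 = target.reshape(num, -1)  # flatten each sample
    intersection = (m1 * m2).sum()

    # 1 - Dice coefficient
    return 1 - (2. * intersection + smooth) / (m1.sum() + m2.sum() + smooth)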

3.2 Keras implementation

from keras import backend as K  # assumes the Keras 2-style backend API

smooth = 1.  # keeps the denominator from being 0

def dice_coef(y_true, y_pred):
    y_true_f = K.flatten(y_true)  # flatten y_true to 1-D
    y_pred_f = K.flatten(y_pred)
    intersection = K.sum(y_true_f * y_pred_f)
    # The denominator uses the squared-sum variant of |X| and |Y|
    return (2. * intersection + smooth) / (K.sum(y_true_f * y_true_f) + K.sum(y_pred_f * y_pred_f) + smooth)

def dice_coef_loss(y_true, y_pred):
    return 1. - dice_coef(y_true, y_pred)

3.3 TensorFlow implementation

import tensorflow as tf

def dice_coe(output, target, loss_type='jaccard', axis=(1, 2, 3), smooth=1e-5):
    """Soft dice (Sørensen or Jaccard) coefficient for comparing the similarity
    of two batches of data, usually used for binary image segmentation,
    i.e. labels are binary. The coefficient is between 0 and 1; 1 means a total match.

    Parameters
    -----------
    output : Tensor
        A distribution with shape: [batch_size, ....], (any dimensions).
    target : Tensor
        The target distribution, format the same with `output`.
    loss_type : str
        ``jaccard`` or ``sorensen``, default is ``jaccard``.
    axis : tuple of int
        All dimensions are reduced, default ``[1,2,3]``.
    smooth : float
        This small value will be added to the numerator and denominator.
            - If both output and target are empty, it makes sure dice is 1.
            - If either output or target are empty (all pixels are background),
              dice = ``smooth/(small_value + smooth)``, then if smooth is very small,
              dice close to 0 (even the image values lower than the threshold),
              so in this case, higher smooth can have a higher dice.

    Examples
    ---------
    >>> outputs = tl.act.pixel_wise_softmax(network.outputs)
    >>> dice_loss = 1 - tl.cost.dice_coe(outputs, y_)

    References
    -----------
    - `Wiki-Dice <https://en.wikipedia.org/wiki/Sørensen–Dice_coefficient>`__

    """
    inse = tf.reduce_sum(output * target, axis=axis)
    if loss_type == 'jaccard':
        # squared-sum variant of the denominator
        l = tf.reduce_sum(output * output, axis=axis)
        r = tf.reduce_sum(target * target, axis=axis)
    elif loss_type == 'sorensen':
        # plain-sum variant of the denominator
        l = tf.reduce_sum(output, axis=axis)
        r = tf.reduce_sum(target, axis=axis)
    else:
        raise Exception("Unknown loss_type")
    dice = (2. * inse + smooth) / (l + r + smooth)  # one value per sample
    dice = tf.reduce_mean(dice)  # average over the batch
    return dice

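4 Multi-class

Suppose the task has 10 classes. The model's prediction then has shape [batch_size, 10, width, height], and the ground truth has to be converted to one-hot form so that it also has shape [batch_size, 10, width, height]. The rest is essentially the same as the binary code: multiply the ground truth and the prediction element-wise, then sum the product. The only extra step is to average the result over every class and every sample.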
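The multi-class recipe can be sketched for a single sample as follows. This is a framework-free illustration under stated assumptions: `one_hot`, `soft_dice`, and `multiclass_dice_loss` are hypothetical helper names, the sample has 3 classes and 4 pixels instead of 10 classes, and a batch version would additionally average the result over samples.

```python
def one_hot(labels, num_classes):
    """Turn a flat list of class indices into num_classes binary masks."""
    return [[1.0 if lab == c else 0.0 for lab in labels]
            for c in range(num_classes)]


def soft_dice(pred, target, smooth=1.0):
    """Smoothed Dice coefficient over two flat sequences of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    return (2.0 * intersection + smooth) / (sum(pred) + sum(target) + smooth)


def multiclass_dice_loss(probs, labels, smooth=1.0):
    """probs: one list of per-pixel probabilities per class.
    labels: the true class index of each pixel."""
    targets = one_hot(labels, num_classes=len(probs))
    per_class = [soft_dice(p, t, smooth) for p, t in zip(probs, targets)]
    # Average the per-class Dice coefficients, then turn them into a loss.
    return 1.0 - sum(per_class) / len(per_class)


# One sample, 3 classes, 4 pixels (values invented for illustration).
labels = [0, 1, 2, 1]
probs = [
    [0.8, 0.1, 0.1, 0.2],  # class 0
    [0.1, 0.7, 0.2, 0.6],  # class 1
    [0.1, 0.2, 0.7, 0.2],  # class 2
]
print(multiclass_dice_loss(probs, labels))

# A perfect one-hot prediction gives zero loss.
print(multiclass_dice_loss(one_hot(labels, 3), labels))  # 0.0
```

Averaging per class gives every class the same weight regardless of how many pixels it covers, which is one common design choice; other weightings are possible and are not covered here.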

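5 A closer look at Dice and IoU

Consider the familiar picture of IoU (intersection over union) as a ratio of two overlapping regions: one set is the ground truth and the other is the prediction produced by the neural network. Do not let the squares usually drawn in such pictures limit your imagination: in a segmentation task the regions are generally irregular, pixel-level shapes.

Where the prediction is correct, that is, in the blue overlapping region of the numerator, we have the True Positives, and the number of True Positive pixels is the value of the numerator. The denominator is the number of foreground pixels in the ground truth plus the number of foreground pixels in the prediction, minus the number of pixels in the overlap.

Anyone who has studied recall, precision, the confusion matrix, or the F1 score will be familiar with FN, TP, TN, and FP:

Yellow region: predicted negative but positive in the GT, the False Negative region;
Red region: predicted positive but negative in the GT, the False Positive region;

The intuition for judging a prediction by IoU is simple: the larger the overlap, the closer IoU is to 1 and the better the prediction.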

To move from IoU to Dice, first write out the IoU formula:

$$IoU = \frac{TP}{TP + FP + FN}$$

For Dice, combining this with the earlier sections: $|X \cap Y|$ is TP; if X is the GT, then $|X|$ is FN + TP; and if Y is the predicted mask, then $|Y|$ is TP + FP. Therefore:

$$DiceCoefficient = \frac{2 \times TP}{TP + FN + TP + FP}$$

This gives the relationship between Dice and IoU (from here on, Dice means the Dice coefficient):

$$IoU = \frac{Dice}{2 - Dice}$$
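The step from the two formulas to this relationship is short. A sketch of the algebra, using only the TP, FP, and FN expressions for Dice and IoU given in this section:

```latex
\begin{aligned}
2 - Dice &= \frac{2(2TP + FP + FN) - 2TP}{2TP + FP + FN}
          = \frac{2(TP + FP + FN)}{2TP + FP + FN} \\[4pt]
\frac{Dice}{2 - Dice} &= \frac{2TP}{2TP + FP + FN} \cdot \frac{2TP + FP + FN}{2(TP + FP + FN)}
          = \frac{TP}{TP + FP + FN} = IoU
\end{aligned}
```

Solving the same relation for Dice gives the inverse form $Dice = \frac{2 \cdot IoU}{1 + IoU}$.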

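Plotting this function and looking only at the interval from 0 to 1, we can see that:

IoU and Dice are 0 together and 1 together, which is easy to understand: these are the cases where the prediction is entirely wrong and entirely correct;
For the same prediction, Dice gives a higher score than IoU whenever the score is strictly between 0 and 1, haha. So results reported as Dice look a little nicer.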
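Both observations can be checked numerically from pixel counts. The sketch below uses the TP, FP, and FN formulas from this section; the function names and the counts are made up for illustration.

```python
def iou(tp, fp, fn):
    """IoU from pixel counts: TP / (TP + FP + FN)."""
    return tp / (tp + fp + fn)


def dice(tp, fp, fn):
    """Dice coefficient from pixel counts: 2TP / (2TP + FP + FN)."""
    return 2 * tp / (2 * tp + fp + fn)


tp, fp, fn = 60, 20, 20
d, j = dice(tp, fp, fn), iou(tp, fp, fn)
print(d, j)         # 0.75 0.6
print(d / (2 - d))  # matches IoU up to floating-point rounding

# Dice is never below IoU, and the two agree only at 0 and 1.
for tp, fp, fn in [(0, 5, 5), (10, 0, 0), (1, 9, 9), (50, 10, 30)]:
    d, j = dice(tp, fp, fn), iou(tp, fp, fn)
    assert d >= j
    assert abs(j - d / (2 - d)) < 1e-12
```

Here a prediction with IoU 0.6 scores 0.75 as Dice, which is worth keeping in mind when comparing numbers reported under different metrics.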

Author: 机器学习炼丹术