SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。

时间 2019-11-06

标签 sse 图像算法优化系列二十三基于 value criterion structure 滤波器 kuwahara mlv mcv 繁體版

原文原文链接

基于value-and-criterion structure方式的实现的滤波器在原理上其实比较简单，感受下面论文中得一段话已经描述的比较清晰了，直接贴英文吧，感受翻译过来反而失去了原始的韵味了。html

The value-and-criterion filter structure is based on the geometrical structure of mathematical morphology, but allows the use of a much wider variety of operations (both linear and nonlinear) than standard morphology. Morphological filters usually only use extreme order statistic operations (maximum and minimum). The new filter structure is much more flexible, but still retains the geometric characteristics of mathematical morphology.算法

The value-and-criterion filter structure is derived from the “compound” structure of the morphological operations opening and closing. Opening and closing are both sequential operations; opening consists of the successive application of the basic morphological operations of erosion and dilation, and closing is dilation followed by erosion. Like opening or closing, a value- and-criterion function has two distinct stages. However, a value-and-criterion filter can have two separate operations acting in parallel in the first stage, as opposed to a single operation (such as erosion in the case of opening). The second stage of a value-and-criterion function uses the output of one of these first stage operations to determine which output from the other first stage operation is selected as the final output。数组

　　简单的理解，value-and-criterion structure这种结构的滤波器有两个元素，一个是值(Value)函数，一个是评判标准(criterion)函数，普通的过程咱们咱们只须要计算出值函数，就结束了，而value-and-criterion structure这种结构须要根据评判标准来选择最后的值函数的值。app

　　经典的此类函数有Kuwahara滤波器，Mean of Least Variance(MLV)滤波器，Minimum Coefficient of Variation(MCV)滤波器等，咱们稍微介绍一下。ide

　　1、Kuwahara滤波器函数

很明显，这个是由Kuwahara等人提出来的。其基本原理是：计算图像模板中邻域内的均值和方差，选择图像灰度值较为均匀的区域的均值替代模板中心像素灰度值。而图像模板中较为均匀的区域所对应的方差是最小的。为了获取图像模板中较为均匀区域的均值，滤波区域R被划分为K个重叠的子区域R1，R2,…, Rk。位于图像中位置为（u，v）处的每个像素，其所对应的每个模板子区域的均值和方差都将被计算。那么标准的Kuwahara滤波器只计算四块邻域的方差，取方差最小的邻域计算其平均值，获得的结果做为目标像素的新值。固然后来还有Tomita-Tsuji，Nagao-Matsuyama’s等人提出了不一样的领域，详见https://blog.csdn.net/lz0499/article/details/54646952一文。
post

　　在这个滤波器里，value函数对应的是均值，criterion 函数对应的是均方差，从算法实现上核心的是均值和方差的快速实现，这在个人博文SSE图像算法优化系列十四：局部均方差及局部平方差算法的优化里已经有描述，此处不赘述。测试

那么一个简单的Kuwahara滤波器的主要代码可能以下：flex

int IM_KuwaharaFilter(unsigned char *Src, unsigned char *Dest, int Width, int Height, int Stride, int Radius) { int Channel = Stride / Width; if ((Src == NULL) || (Dest == NULL))                                return IM_STATUS_NULLREFRENCE; if ((Width <= 0) || (Height <= 0))                                    return IM_STATUS_INVALIDPARAMETER; if ((Channel != 1) && (Channel != 3))                                return IM_STATUS_INVALIDPARAMETER; int Status = IM_STATUS_OK; unsigned char *Average = (unsigned char *)malloc(Height * Width * sizeof(unsigned char)); int *Deviation = (int *)malloc(Height * Width * sizeof(int)); int *RowOffset = (int *)malloc((Width + Radius + Radius) * sizeof(int)); int *ColOffset = (int *)malloc((Height + Radius + Radius) * sizeof(int)); unsigned char *Blur = NULL; if ((Average == NULL) || (Deviation == NULL) || (RowOffset == NULL) || (ColOffset == NULL)) { Status = IM_STATUS_OUTOFMEMORY; goto FreeMemory; } if (Channel == 1) { Status = IM_GetLocalMeanAndDeviationFilter(Src, Average, Deviation, Width, Height, Stride, Radius, false); if (Status != IM_STATUS_OK)    goto FreeMemory; } else { Blur = (unsigned char *)malloc(Height * Stride * sizeof(unsigned char));        // 彩色须要使用明度通道来计算均方差，否则会出现彩色异常块
        if (Blur == NULL) { Status = IM_STATUS_OUTOFMEMORY; goto FreeMemory; } 
        Status = IM_GetLuminance(Src, Average, Width, Height, Stride, false);            // 获得明度通道
        if (Status != IM_STATUS_OK)    goto FreeMemory; Status = IM_GetLocalMeanAndDeviationFilter(Average, NULL, Deviation, Width, Height, Width, Radius, false);    // 无需保存对应的均值，由于没用
        if (Status != IM_STATUS_OK)    goto FreeMemory; Status = IM_BoxBlur(Src, Blur, Width, Height, Stride, Radius);                    // 彩色模糊
        if (Status != IM_STATUS_OK)    goto FreeMemory; } for (int Y = 0; Y < Height + Radius + Radius; Y++) ColOffset[Y] = IM_GetMirrorPos(Height, Y - Radius); for (int X = 0; X < Width + Radius + Radius; X++) RowOffset[X] = IM_GetMirrorPos(Width, X - Radius); for (int Y = 0; Y < Height; Y++) { int *LinePST = Deviation + ColOffset[Y] * Width;            // 均方差
        int *LinePSB = Deviation + ColOffset[Y + Radius + Radius] * Width; unsigned char *LinePD = Dest + Y * Stride; if (Channel == 1) { unsigned char *LinePT = Average + ColOffset[Y] * Width;        // 均值
            unsigned char *LinePB = Average + ColOffset[Y + Radius + Radius] * Width; for (int X = 0; X < Width; X++) { int OffsetL = RowOffset[X];                                            // 求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius]; int Min, Mean; if (LinePST[OffsetL] < LinePST[OffsetR]) { Min = LinePST[OffsetL]; Mean = LinePT[OffsetL]; } else { Min = LinePST[OffsetR]; Mean = LinePT[OffsetR]; } if (Min > LinePSB[OffsetL]) { Min = LinePSB[OffsetL]; Mean = LinePB[OffsetL]; } if (Min > LinePSB[OffsetR]) { Min = LinePSB[OffsetR]; Mean = LinePB[OffsetR]; } LinePD[X] = Mean; } } else { unsigned char *LinePT = Blur + ColOffset[Y] * Stride;        // 均值
            unsigned char *LinePB = Blur + ColOffset[Y + Radius + Radius] * Stride; for (int X = 0; X < Width; X++) { int OffsetL = RowOffset[X];                                            // 求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius]; int Min, MeanB, MeanG, MeanR; if (LinePST[OffsetL] < LinePST[OffsetR]) { Min = LinePST[OffsetL]; MeanB = LinePT[OffsetL * 3 + 0]; MeanG = LinePT[OffsetL * 3 + 1]; MeanR = LinePT[OffsetL * 3 + 2]; } else { Min = LinePST[OffsetR]; MeanB = LinePT[OffsetR * 3 + 0]; MeanG = LinePT[OffsetR * 3 + 1]; MeanR = LinePT[OffsetR * 3 + 2]; } if (Min > LinePSB[OffsetL]) { Min = LinePSB[OffsetL]; MeanB = LinePB[OffsetL * 3 + 0]; MeanG = LinePB[OffsetL * 3 + 1]; MeanR = LinePB[OffsetL * 3 + 2]; } if (Min > LinePSB[OffsetR]) { Min = LinePSB[OffsetR]; MeanB = LinePB[OffsetR * 3 + 0]; MeanG = LinePB[OffsetR * 3 + 1]; MeanR = LinePB[OffsetR * 3 + 2]; } LinePD[0] = MeanB; LinePD[1] = MeanG; LinePD[2] = MeanR; LinePD += 3; } } } FreeMemory: if (Average != NULL)        free(Average); if (Deviation != NULL)        free(Deviation); if (RowOffset != NULL)        free(RowOffset); if (ColOffset != NULL)        free(ColOffset); if (Blur != NULL)            free(Blur); return Status; }

　　咱们跳过内存分配和均方差等的计算过程，咱们来看看灰度版本在宽度方向上是如何根据标准函数来获取值函数的：优化

            for (int X = 0; X < Width; X++) { int OffsetL = RowOffset[X];                                            // 求4个点的均方差的最小值
                int OffsetR = RowOffset[X + Radius + Radius]; int Min, Mean; if (LinePST[OffsetL] < LinePST[OffsetR]) { Min = LinePST[OffsetL]; Mean = LinePT[OffsetL]; } else { Min = LinePST[OffsetR]; Mean = LinePT[OffsetR]; } if (Min > LinePSB[OffsetL]) { Min = LinePSB[OffsetL]; Mean = LinePB[OffsetL]; } if (Min > LinePSB[OffsetR]) { Min = LinePSB[OffsetR]; Mean = LinePB[OffsetR]; } LinePD[X] = Mean; }

　　很简单，就是对四个取样点，获取方差的最小值，而后同步获取对应的均值的值，做为结果值。

本质上讲，这个也是个边缘保留滤波器，虽然其保边效果不怎么样，其实Kuwahara主要是做为一个艺术化的滤镜存在，在开源的GPUImage里也有Kuwahara滤波器的实现，参见GPUImageKuwaharaFilter.h文件。

　　我也用别人用的一个美女图，贴个效果吧。

　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　半径1　　　　　　　　　　　　　　　　　　　　　　　　半径5

　注意，对于彩色图像，咱们不易将他们分通道来计算，由于分通道时，同一个像素各通道对应的方差最小值不必定在同一位置，此时就会在最终的结果图像上出现色斑，这不是咱们所但愿的，在做者的文章中也有相似的说法，以下所示，此时咱们可取明度值做为各通道的统一判断依据。

Obviously the Normal filter can't be used for color images by applying the filter to each RGB channel separately and then using the three resulting channels to compose the image. The main problem with this is that the sub regions will have different variancesfor each of the channels. For example, a region with the lowest variance in the red channel might have the highest variance in the green channel. This once again causes abiguity which would result in the color of the central pixel to be determined by several regions, which might also result in blurrier edges.

2、MLV滤波器

　　MLV滤波器也是一个典型的value-and-criterion structure结构滤波器，其参考论文见： A morphology-based filter structure for edge-enhanceing smoothing，和Kuwahara不一样的是，MLV滤波器并不仅取周边四领域的均方差做为标准，而是周边全部包含了中心点店像素的领域的均方差做为判断依据，取均方差最小那个位置的均值来替代模板中心的值。那这个时候计算的耗时因素就必须归入考虑范围，由于随着半径的增大，这个计算量急剧上升。

　　此时咱们想到了前面个人博文SSE图像算法优化系列七：基于SSE实现的极速的矩形核腐蚀和膨胀（最大值和最小值）算法中提到了和半径无关的最值算法的优化，那里巧妙的运用了一些特性，使得算法在半径大以后有可能计算时间还有所降低。

　　可是咱们注意到这里的MLV滤波器并非简单的最值计算，其还带了一个尾巴，在计算最值得时候还须要保留另一个参数同步，这就有点像C的sort函数功能，好比我要对一个结构体数组进行排序，而排序的规则是根据结构体中某一个成员的值来决定的，sort函数能够轻松搞定这个功能，若是咱们用普通的C语言实现上面博文的算法，也是能够容易搞定这个的，可是我这里须要进行SSE处理，那咱们先来看看源博文这一块的核心实现语句。

    __m128i Max1 = _mm_set1_epi8(255), Max2 = _mm_set1_epi8(255);                // 这样写能减小一次内存加载
    for (int Y = StartY; Y < EndY; Y++) { Max1 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 0)), Max1);    // 一条AVX指令
        Max2 = _mm_min_epu8(_mm_loadu_si128((__m128i *)(LinePS + 16)), Max2); _mm_store_si128((__m128i *)(LinePD + 0), Max1); _mm_store_si128((__m128i *)(LinePD + 16), Max2); LinePS += Stride; LinePD += 32; }

　　很简单，就是_mm_min_epu8的一个简单调用，这里没有任何的比较函数。

若是咱们在计算最值得时候还须要利用最值得特性附带上另一个参量，很明显，若是直接使用上述过程，咱们没法作到这一点。必须想办法带上比较特征。

　　实际上，min系列的比较函数，咱们也可使用cmp系列的SSE函数来实现，好比要实现两个32位数的SSE函数的比较，除了使用_mm_min_epi32外，还可使用下面的代码：

    __m128i S1 = _mm_set_epi32(10, 20, 30, 50); __m128i S2 = _mm_set_epi32(15, 8, 34, 234); __m128i Min1 = _mm_min_epi32(S1, S2); __m128i Cmp = _mm_cmplt_epi32(S1, S2); __m128i Min2 = _mm_blendv_epi8(S2, S1, Cmp); for (int Y = 0; Y < 4; Y++) { printf("%d \t", Min1.m128i_i32[Y]); } printf("\n"); for (int Y = 0; Y < 4; Y++) { printf("%d \t", Min2.m128i_i32[Y]); }

　　结果以下，彻底同样：

　　注意此时咱们在过程当中得到了一些标志信息，好比上述代码中得Cmp变量，这个很是有用，有了它，咱们就至关于有了一枚旗帜，在队伍A里有人举起了一面旗帜，咱们只要在队伍B里对应的位置找到对应的人就能够了。这个功能一样是能够借助于_mm_blendv_epi8这个来实现的。

　　具体到本例，通常方差（不是均方差，此例为未除以取样总数的值，即(v0-m)*(v0-m)+(v1-m)*(v1-m)+...+(vn-m)*(vn-m)的值，其中m表示v0到vn的平均值，中间用浮点保存），能够用整形表示（在计算最后舍入），而平均值直接使用unsigned char表示，如上IM_KuwaharaFilter函数所示。此时，按照最值那篇文章的例子，咱们能够一次性比较四个SSE变量，这样产生了4个比较标志位，而此时咱们须要附带的信息是4*4个字节类型的数据，此时还需将4个比较标志位的SSE变量合并成一个SSE变量，以方便_mm_blendv_epi8使用，这个时候咱们就能够借用packs系列的函数了，具体以下所示:

    __m128i Min0 = _mm_set1_epi32(IM_INT_MAX), Min1 = _mm_set1_epi32(IM_INT_MAX);        // 同时处理16个方差数据
    __m128i Min2 = _mm_set1_epi32(IM_INT_MAX), Min3 = _mm_set1_epi32(IM_INT_MAX); __m128i Avg = _mm_setzero_si128(), AvgB = _mm_setzero_si128(); __m128i AvgG = _mm_setzero_si128(), AvgR = _mm_setzero_si128(); for (int Y = StartY; Y < EndY; Y++) { __m128i Src0 = _mm_loadu_si128((__m128i *)(LinePS + 0)); __m128i Src1 = _mm_loadu_si128((__m128i *)(LinePS + 4)); __m128i Src2 = _mm_loadu_si128((__m128i *)(LinePS + 8)); __m128i Src3 = _mm_loadu_si128((__m128i *)(LinePS + 12)); __m128i Cmp0 = _mm_cmplt_epi32(Src0, Min0); __m128i Cmp1 = _mm_cmplt_epi32(Src1, Min1); __m128i Cmp2 = _mm_cmplt_epi32(Src2, Min2); __m128i Cmp3 = _mm_cmplt_epi32(Src3, Min3); Min0 = _mm_blendv_epi8(Min0, Src0, Cmp0);            // 不使用_mm_min_epi32，主要是为了获得标识符
        Min1 = _mm_blendv_epi8(Min1, Src1, Cmp1); Min2 = _mm_blendv_epi8(Min2, Src2, Cmp2); Min3 = _mm_blendv_epi8(Min3, Src3, Cmp3); _mm_storeu_si128((__m128i *)(LinePD + 0), Min0); _mm_storeu_si128((__m128i *)(LinePD + 4), Min1); _mm_storeu_si128((__m128i *)(LinePD + 8), Min2); _mm_storeu_si128((__m128i *)(LinePD + 12), Min3); if (Channel == 1)                                // 注意要使用_mm_packs_epi16而不是_mm_packus_epi16
 { Avg = _mm_blendv_epi8(Avg, _mm_loadu_si128((__m128i *)LinePA), _mm_packs_epi16(_mm_packs_epi32(Cmp0, Cmp1), _mm_packs_epi32(Cmp2, Cmp3))); _mm_storeu_si128((__m128i *)LinePM, Avg);            // 最小方差对应的均值
 } else { __m128i Mask0 = _mm_or_si128(_mm_shuffle_epi8(Cmp0, _mm_setr_epi8(0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp1, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4))); __m128i Mask1 = _mm_or_si128(_mm_shuffle_epi8(Cmp1, _mm_setr_epi8(4, 4, 8, 8, 8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp2, _mm_setr_epi8(-1, -1, -1, -1, -1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8))); __m128i Mask2 = _mm_or_si128(_mm_shuffle_epi8(Cmp2, _mm_setr_epi8(8, 12, 12, 12, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1, -1)), _mm_shuffle_epi8(Cmp3, _mm_setr_epi8(-1, -1, -1, -1, 0, 0, 0, 4, 4, 4, 8, 8, 8, 12, 12, 12))); AvgB = _mm_blendv_epi8(AvgB, _mm_loadu_si128((__m128i *)LinePA), Mask0); AvgG = _mm_blendv_epi8(AvgG, _mm_loadu_si128((__m128i *)(LinePA + 16)), Mask1); AvgR = _mm_blendv_epi8(AvgR, _mm_loadu_si128((__m128i *)(LinePA + 32)), Mask2); _mm_storeu_si128((__m128i *)LinePM, AvgB); _mm_storeu_si128((__m128i *)(LinePM + 16), AvgG); _mm_storeu_si128((__m128i *)(LinePM + 32), AvgR); } LinePS += Width; LinePA += Stride; LinePD += 16; LinePM += 16 * Channel; }

　　注意在上述代码中咱们使用packs而非packus，请读者自行理解这是为何。

　　对于彩色图像，根据上面的对Kuwahara滤波器的描述，也是须要计算明度的，此时，一个标志位对应的就是就是3个份量，此时就不能使用packs之类的函数，而能够直接使用上述的_mm_shuffle_epi8类的函数能够轻松搞定。

　　固然具体的过程还涉及到不少其余东西，好比int类型矩阵的转置，内存的有效利用等等，这里不予以多述。

　　3、MCV滤波器

　　同MLV滤波器稍微有点不一样的时，MCV滤波器的评判标准有稍做改动，其参考文件见：An edge-enhancing nonlinear filter for reducing multiplicative noise，可是核心的实现没有太大的区别。

　　按照论文的说法，MCV滤波器能够有效地去除乘性噪音，而MLV能够去除加性噪音，以下所示：

The MCV filter therefore uses the sample mean for the value function, the coefficient of variation as the criterion function, and the minimum as the selection function. It is a value-and-criterion filter specifically designed to reduce multiplicative noise.

　　以及

The MCV filter is closely related to a value-and-criterion filter designed to remove additive noise known as the Mean of Least Variance (MLV) filter [7–9]. The MLV filter uses the sample variance as its criterion function rather than the coefficient of variation. In images corrupted by additive noise, the variance is theoretically minimum in structuring elements where the signal is constant.

　　MLV和MCV能产生比Kuwahara更为平滑的图像，在SAR图像以及医疗图像上应该有必定用武之地。并且我的感受在小半径时其产生的结果对图像的边界分割、边缘查找等也有必定的帮助。

　　　　　　　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　Kuwahara

　　　　　　　　　　　　　　　　　　　　MLV　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　　MCV

　　上述操做半径都为5。

　　处理后，彷佛边缘的分界线更为明显。

　　固然我贴的图形其实都不是论文自己强调的应用场景，只是作个例证而已。

　　关于这种相似的滤波器，还可使用圆形半径，或者是椭圆半径，也有一些人对这方面的需求作了研究，好比https://blog.csdn.net/qq_16013649/article/details/78744910。

　　针对1024*768的彩色图像，此类算法优化后大约耗时20ms，灰度图约10ms，速度也还算比较快了，测试见附件的SSE_Optimization_Demo的stylize菜单。

Demo下载地址：https://files.cnblogs.com/files/Imageshop/SSE_Optimization_Demo.rar

SSE图像算法优化系列二十三: 基于value-and-criterion structure 系列滤波器（如Kuwahara，MLV，MCV滤波器）的优化。

原图 半径1 半径5

　　　　　　　　　　　　原图　　　　　　　　　　　　　　　　　　　　　　　　　　　　半径1　　　　　　　　　　　　　　　　　　　　　　　　半径5