Metal Tutorial Series (2) - Implementing a LUT Filter with Metal

Date: 2019-11-30


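A Simple Filter

In everyday image processing, the most common operation is changing some color across the whole image.
As an example, suppose we want to scale the R value of every RGB pixel to 0.5 times its original value. As described in the previous article, an image is drawn by running the vertex stage first and then the fragment stage, and the fragment function decides the color of each pixel.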

// Pass-through fragment function from the previous article's demo.
fragment float4 myFragmentShader(VertexOut vertexIn [[stage_in]],
                                 texture2d<float, access::sample> inputImage [[texture(0)]],
                                 sampler textureSampler [[sampler(0)]])
{
    // Sample the input texture at this fragment's texture coordinates.
    float4 color = inputImage.sample(textureSampler, vertexIn.texCoords);
    return color;
}

So multiplying the r value of the returned color by 0.5 in this shader gives the effect we want.

return float4(color.r * 0.5, color.gba);

Re-run the earlier demo and the triangle now looks slightly green, which shows that the effect works.

ColorLUT

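That was the ideal case, though. Real image processing is usually much more complex.
Suppose the image is 1280 × 720 pixels: that is 921,600 floating-point multiplications, one per pixel, just to scale r by 0.5.
For a small image this puts no real pressure on the GPU, but as images get larger and more numerous it starts to affect GPU throughput.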

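look up table
As the name suggests, a LUT is a lookup table, and a ColorLUT is a color lookup table.

So a lookup table is introduced: the transformed value of each color is stored ahead of time, and at run time a single lookup is all that is needed. This is much faster than redoing the computation for every pixel, especially when the color math is complex.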

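But storing the transformation of every color is expensive. Take RGB24: each pixel is 8 × 3 = 24 bits and each of R, G and B ranges over 0-255, so there are 16,777,216 color mappings in total. Storing them all takes 256 × 256 × 256 × 24 / 8 / 1024 / 1024 = 48 MB. If every filter took 48 MB, an image editing app with many filters would grow without bound.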
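The arithmetic above can be checked with a few lines of code. This is a minimal C++ sketch that is not part of the original demo; the function names `lutEntries`, `fullLutBytes` and `fullLutMegabytes` are made up for illustration.

```cpp
#include <cstdint>

// Number of entries in a table with `levels` steps per channel (R, G and B).
constexpr std::uint64_t lutEntries(std::uint64_t levels) {
    return levels * levels * levels;
}

// Bytes needed to store one RGB24 output color (24 bits = 3 bytes)
// for every possible 8-bit-per-channel input color.
constexpr std::uint64_t fullLutBytes() {
    return lutEntries(256) * 24 / 8;
}

// The same size in MB, using 1 MB = 1024 * 1024 bytes as the text does.
constexpr std::uint64_t fullLutMegabytes() {
    return fullLutBytes() / 1024 / 1024;
}
```

`fullLutMegabytes()` evaluates to 48. With only 64 levels per channel, `lutEntries(64)` is 262,144, which is exactly the number of pixels in a 512 × 512 image.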

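To solve this problem there is a standard filter image, the ColorLUT. The default one is the 512 × 512 image below. It represents the transformation of all colors; a color that is not present in the image is obtained by interpolating between the nearest entries:

This is the standard (identity) color image, in which the rgb of every pixel is the original color. Apply the color adjustment you want to this image to get a new LUT image. Given a picture and the modified LUT image, the filter can look up how each color should be replaced and so produce the new picture.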

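Let us explain the image above and how to use it.
First, look at the image:

It is made up of 8 × 8 tiles.
Overall, the top-left corner of each tile goes from black to blue as you move from the top-left tile to the bottom-right tile.
Within each tile, the top-right corner is mostly red.
Within each tile, the bottom-left corner is mostly green.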

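Does that give you a hint?
Let us simplify a little further.
A color is three values, r, g and b, each expressed as a normalized value (1 stands for 255).

Across the tiles, b goes from 0 to 1 from the top-left tile to the bottom-right tile, in Z order (left to right, then top to bottom).
Within each tile, r goes from 0 to 1 from left to right, along x.
Within each tile, g goes from 0 to 1 from top to bottom, along y.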

So pure blue (0, 0, 1) is located at (7 × 64, 7 × 64), in the bottom-right tile.
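The layout just described can be written down as a function from a pixel position to the color stored there. The following C++ sketch assumes the 8 × 8 tiles of 64 × 64 pixels from the text; `identityLutColor` and the constant names are hypothetical and do not appear in the demo.

```cpp
#include <array>

// Layout constants taken from the text: 8 x 8 tiles, each 64 x 64 pixels.
constexpr int kTilesPerRow = 8;
constexpr int kTileSize = 64;

// Normalized (r, g, b) stored at pixel (x, y) of the unmodified 512 x 512 LUT.
std::array<float, 3> identityLutColor(int x, int y) {
    const int tileX = x / kTileSize;                      // tile column, 0..7
    const int tileY = y / kTileSize;                      // tile row, 0..7
    const int blueIndex = tileY * kTilesPerRow + tileX;   // 0..63 in Z order
    const float maxIndex = kTileSize - 1;                 // 63
    return { (x % kTileSize) / maxIndex,                  // r grows left to right
             (y % kTileSize) / maxIndex,                  // g grows top to bottom
             blueIndex / maxIndex };                      // b grows tile by tile
}
```

For pixel (448, 448), which is (7 × 64, 7 × 64), this returns (0, 0, 1), the pure blue mentioned above. The top-right pixel of the first tile, (63, 0), returns pure red.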

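Now let us walk through the lookup with an example.

Suppose the color we need to look up is (0.4, 0.6, 0.2), in normalized values.

First decide which tile to use: b = 0.2 × 63 = 12.6, so take tile 12, which is tile (4, 1).
r = 0.4 × 63 = 25.2 and g = 0.6 × 63 = 37.8, which in whole-image coordinates is (4 × 64 + 25.2, 1 × 64 + 37.8).
The steps above all yield floating-point values, but the pixels of the filter image sit at fixed integer positions; there are no fractional pixels.
For r and g, convert the position to normalized coordinates, ((4 × 64 + 25.2) / 512, (1 × 64 + 37.8) / 512), and let the sampler interpolate to get the exact color value.
For b, sample the next tile, (5, 1), in the same way, then mix the two colors to get the final color.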
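The walkthrough above can be expressed as a small helper that returns the two tiles, the offset inside a tile, and the blend weight. This is a CPU-side C++ sketch for illustration only; `LutLookup` and `locateInLut` are hypothetical names, and the half-pixel offset used by the shader later in the article is left out here.

```cpp
#include <cmath>

struct LutLookup {
    int tile1X, tile1Y;      // tile for floor(b * 63)
    int tile2X, tile2Y;      // tile for ceil(b * 63)
    float offsetX, offsetY;  // position inside a tile, in pixels
    float blend;             // weight of the sample taken from the second tile
};

// Locate a normalized color (r, g, b) in the 8 x 8 tile layout.
LutLookup locateInLut(float r, float g, float b) {
    const float blue = b * 63.0f;
    const int lower = static_cast<int>(std::floor(blue));
    const int upper = static_cast<int>(std::ceil(blue));
    return LutLookup{
        lower % 8, lower / 8,        // column and row of the first tile
        upper % 8, upper / 8,        // column and row of the second tile
        r * 63.0f, g * 63.0f,        // r runs along x, g runs along y
        blue - std::floor(blue)      // fractional part of the blue index
    };
}
```

For (0.4, 0.6, 0.2) this yields tiles (4, 1) and (5, 1), an in-tile offset of about (25.2, 37.8), and a blend weight of about 0.6, so the final color leans slightly toward the sample from tile (5, 1).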

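Metal Image Processing

In the previous article we mentioned that a CommandBuffer has three kinds of Encoder.

MTLRenderCommandEncoder: the render encoder, for 3D rendering
MTLComputeCommandEncoder: the compute encoder
MTLBlitCommandEncoder: the blit encoder, which copies buffers and textures and can also generate mipmaps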

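The earlier demo simply drew an image, using an MTLRenderCommandEncoder.
This time we add a filter to the image with an MTLComputeCommandEncoder, using the GPU's compute capability to look up the LUT and blend the colors.

In short: the earlier render pass only needed the image texture as input to draw it. For a filter we need a processing step. We give the GPU the original image texture and the LUT image texture, the GPU gives us back a new texture with the filter applied, and we hand that texture to the render encoder from before, which draws the filtered image in the triangle.

We continue from the earlier demo, so the Device, CommandQueue and CommandBuffer already exist. We add a compute encoder before the existing render encoder.

Every Encoder needs a PipelineState, which links it to a shader function.
Here we create a ComputePipelineState; the shader function itself is described later.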

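id<MTLLibrary> library = [device newDefaultLibrary];
// The name must match the kernel function declared in the .metal file.
id<MTLFunction> function = [library newFunctionWithName:@"image_filiter"];

NSError *error = nil;
self.computeState = [device newComputePipelineStateWithFunction:function error:&error];
if (self.computeState == nil) {
    NSLog(@"Failed to create compute pipeline state: %@", error);
}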

Configure the resources: the original image and the LUT image.

Below is one way to convert a UIImage into a texture, by drawing it through a CGContext.

- (void)setLutImage:(UIImage *)lutImage {
    _lutImage = lutImage;

    CGImageRef imageRef = [_lutImage CGImage];

    // Create a suitable bitmap context for extracting the bits of the image
    NSUInteger width = CGImageGetWidth(imageRef);
    NSUInteger height = CGImageGetHeight(imageRef);
    CGColorSpaceRef colorSpace = CGColorSpaceCreateDeviceRGB();
    uint8_t *rawData = (uint8_t *)calloc(height * width * 4, sizeof(uint8_t));
    NSUInteger bytesPerPixel = 4;
    NSUInteger bytesPerRow = bytesPerPixel * width;
    NSUInteger bitsPerComponent = 8;
    CGContextRef bitmapContext = CGBitmapContextCreate(rawData, width, height,
                                                       bitsPerComponent, bytesPerRow, colorSpace,
                                                       kCGImageAlphaPremultipliedLast | kCGBitmapByteOrder32Big);
    CGColorSpaceRelease(colorSpace);

    // Draw the image into the RGBA buffer that backs the context.
    CGContextDrawImage(bitmapContext, CGRectMake(0, 0, width, height), imageRef);
    CGContextRelease(bitmapContext);

    // Copy the pixels into the LUT texture, which is assumed to exist already
    // with an RGBA8 pixel format and at least width x height pixels.
    MTLRegion region = MTLRegionMake2D(0, 0, width, height);
    [self.lutTexture replaceRegion:region mipmapLevel:0 withBytes:rawData bytesPerRow:bytesPerRow];

    free(rawData);
}

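Configure the adjustable parameters, such as the filter's blend strength, the region it applies to, and so on.
Here I define a struct that holds the region and the strength of the filter. The configuration can be passed to the shader as bytes.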

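typedef struct
{
    UInt32 clipOriginX;   // origin of the filtered region, in pixels
    UInt32 clipOriginY;
    UInt32 clipSizeX;     // size of the filtered region, in pixels
    UInt32 clipSizeY;
    Float32 saturation;   // blend strength between original and filtered color
    bool changeColor;     // swap the r and b channels on output
    bool changeCoord;     // flip the y coordinate on output
} ImageSaturationParameters;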
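Because the struct is copied to the GPU as raw bytes, its memory layout has to agree with whatever struct the shader declares. The kernel later in this article reads `params->clipOrigin` and `params->clipSize`, but the article does not show the shader-side declaration, so the `ShaderParams` struct in this C++ sketch is an assumption, as is `UInt2`, which stands in for Metal's `uint2` (8 bytes, 8-byte aligned). The sketch only checks that the two layouts line up field by field.

```cpp
#include <cstddef>
#include <cstdint>

// Host-side layout, mirroring the struct above with fixed-width stand-ins.
struct HostParams {
    std::uint32_t clipOriginX, clipOriginY;
    std::uint32_t clipSizeX, clipSizeY;
    float saturation;
    bool changeColor;
    bool changeCoord;
};

// Stand-in for Metal's uint2: two 32-bit values, 8-byte aligned.
struct alignas(8) UInt2 { std::uint32_t x, y; };

// Hypothetical shader-side declaration (an assumption, not shown in the article).
struct ShaderParams {
    UInt2 clipOrigin;
    UInt2 clipSize;
    float saturation;
    bool changeColor;
    bool changeCoord;
};

// Both sides must agree on total size and on the offset of every field.
static_assert(sizeof(HostParams) == sizeof(ShaderParams), "size mismatch");
static_assert(offsetof(HostParams, clipSizeX) == offsetof(ShaderParams, clipSize), "clipSize offset");
static_assert(offsetof(HostParams, saturation) == offsetof(ShaderParams, saturation), "saturation offset");
static_assert(offsetof(HostParams, changeColor) == offsetof(ShaderParams, changeColor), "changeColor offset");
static_assert(offsetof(HostParams, changeCoord) == offsetof(ShaderParams, changeCoord), "changeCoord offset");
```

If a field were added on one side only, or the field order changed, one of the static assertions would fail at compile time instead of the shader silently reading the wrong bytes.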

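Configure the Encoder

Now assemble the pieces above. sourceTexture is the input image texture, destinationTexture is the texture that will be written, and
self.lutTexture is the input filter image texture; they are bound to texture indices 0, 1 and 2 respectively.
The parameter struct is passed to the shader as bytes.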

ImageSaturationParameters params;
params.clipOriginX = floor(self.filiterRect.origin.x);
params.clipOriginY = floor(self.filiterRect.origin.y);
params.clipSizeX = floor(self.filiterRect.size.width);
params.clipSizeY = floor(self.filiterRect.size.height);

params.saturation = self.saturation;
params.changeColor = self.needColorTrans;
params.changeCoord = self.needCoordTrans;

id<MTLComputeCommandEncoder> encoder = [commandBuffer computeCommandEncoder];
[encoder pushDebugGroup:@"filter"];
[encoder setLabel:@"filiter encoder"];

[encoder setComputePipelineState:self.computeState];
[encoder setTexture:sourceTexture atIndex:0];       // input image
[encoder setTexture:destinationTexture atIndex:1];  // output image

// Fall back to the source texture when no LUT has been set.
if (self.lutTexture == nil) {
    NSLog(@"lut == nil");
    [encoder setTexture:sourceTexture atIndex:2];
} else {
    [encoder setTexture:self.lutTexture atIndex:2];
}

[encoder setSamplerState:self.samplerState atIndex:0];

// Small parameter blocks can be copied inline instead of using a buffer.
[encoder setBytes:&params length:sizeof(params) atIndex:0];

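threadgroups
In a compute encoder, to make the computation efficient, each image is split into small units that are sent to the GPU for parallel processing. How many groups there are and how large each group is are both configured on the encoder.

To get the most out of the GPU, they can be configured as follows: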

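// Threadgroup width is the pipeline's execution width; the height uses up the remaining thread budget.
NSUInteger wid = self.computeState.threadExecutionWidth;
NSUInteger hei = self.computeState.maxTotalThreadsPerThreadgroup / wid;

// Number of threadgroups, rounded up so that the whole texture is covered.
MTLSize threadgroupsPerGrid = {(sourceTexture.width + wid - 1) / wid, (sourceTexture.height + hei - 1) / hei, 1};
MTLSize threadsPerGroup = {wid, hei, 1};

[encoder dispatchThreadgroups:threadgroupsPerGrid
        threadsPerThreadgroup:threadsPerGroup];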
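The rounding-up division in the dispatch code is easy to get wrong, so here is the same calculation as a standalone C++ sketch. `Size3` and `threadgroupCount` are hypothetical stand-ins for `MTLSize` and the inline expression, and the 32 × 16 group size used in the note below is only an example value, not something queried from a real device.

```cpp
#include <cstddef>

// Minimal stand-in for MTLSize.
struct Size3 { std::size_t width, height, depth; };

// Number of threadgroups needed so that groups of group.width x group.height
// threads cover a textureWidth x textureHeight image, rounding up on each axis.
Size3 threadgroupCount(std::size_t textureWidth, std::size_t textureHeight, Size3 group) {
    return { (textureWidth + group.width - 1) / group.width,
             (textureHeight + group.height - 1) / group.height,
             1 };
}
```

With a 32 × 16 group, a 1280 × 720 texture needs 40 × 45 groups, which fits exactly. A 1000 × 700 texture needs 32 × 44 groups, covering 1024 × 704 threads, so some threads fall outside the image and the kernel has to tolerate positions beyond the texture bounds.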

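Shader
This is the core computation. Unlike the earlier rendering code, the function is neither a vertex nor a fragment function but one marked with the new kernel qualifier. It is essentially the code version of the LUT explanation above; if you understood how LUT coordinates are located, the code below should pose no problem.
The code also adds a check for whether a pixel lies within the region where the filter should be applied. Note that a sampler can be reused: different textures can share the same sampler.
The image_filiter function takes 6 inputs. From top to bottom they are the parameters, the source texture, the target texture to write, the filter (LUT) texture, the sampler, and the thread's position. That last value is derived from the threadgroup configuration above; it is a position within the whole image in pixels, not a normalized value, so sampling with it directly returns the color at that position, provided the sampler uses pixel (non-normalized) coordinates, as the code assumes.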

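// Returns true if `point` lies inside the rectangle at `origin` with size `rect`
// (the upper bound is exclusive).
bool checkPointInRect(uint2 point, uint2 origin, uint2 rect) {
    return point.x >= origin.x &&
           point.y >= origin.y &&
           point.x < (origin.x + rect.x) &&
           point.y < (origin.y + rect.y);
}

kernel void image_filiter(constant ImageSaturationParams *params [[buffer(0)]],
                          texture2d<half, access::sample> sourceTexture [[texture(0)]],
                          texture2d<half, access::write> targetTexture [[texture(1)]],
                          texture2d<half, access::sample> lutTexture [[texture(2)]],
                          sampler samp [[sampler(0)]],
                          uint2 gridPos [[thread_position_in_grid]])
{
    // gridPos is in pixels; the sampler is assumed to use pixel (non-normalized) coordinates.
    float2 sourceCoord = float2(gridPos);
    half4 color = sourceTexture.sample(samp, sourceCoord);

    // Blue selects the tile: 64 tiles, indexed 0..63.
    float blueColor = color.b * 63.0;

    // Tile for floor(blue): row = index / 8, column = index % 8.
    int2 quad1;
    quad1.y = floor(floor(blueColor) / 8.0);
    quad1.x = floor(blueColor) - (quad1.y * 8.0);

    // Tile for ceil(blue), used to interpolate between neighbouring blue levels.
    int2 quad2;
    quad2.y = floor(ceil(blueColor) / 8.0);
    quad2.x = ceil(blueColor) - (quad2.y * 8.0);

    // Normalized position inside each tile. float2 is used instead of half2
    // because half cannot address a 512-pixel texture precisely.
    float2 texPos1;
    texPos1.x = (quad1.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.r);
    texPos1.y = (quad1.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.g);

    float2 texPos2;
    texPos2.x = (quad2.x * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.r);
    texPos2.y = (quad2.y * 0.125) + 0.5/512.0 + ((0.125 - 1.0/512.0) * color.g);

    // Sample both tiles (the first sample must use texPos1.y, not texPos2.y).
    half4 newColor1 = lutTexture.sample(samp, float2(texPos1.x * 512, texPos1.y * 512));
    half4 newColor2 = lutTexture.sample(samp, float2(texPos2.x * 512, texPos2.y * 512));

    // Blend the two samples by the fractional part of the blue index.
    half4 newColor = mix(newColor1, newColor2, half(fract(blueColor)));

    // Blend between the original and the filtered color by the configured strength.
    half4 finalColor = mix(color, half4(newColor.rgb, color.w), half(params->saturation));

    uint2 destCoords = gridPos + params->clipOrigin;

    // The grid is rounded up to whole threadgroups, so skip threads outside the image
    // (source and target are assumed to have the same size).
    if (destCoords.x >= sourceTexture.get_width() || destCoords.y >= sourceTexture.get_height()) {
        return;
    }

    uint2 transformCoords = destCoords;

    // Optionally flip the y coordinate.
    if (params->changeCoord) {
        transformCoords = uint2(destCoords.x, sourceTexture.get_height() - 1 - destCoords.y);
    }

    // Optionally swap the r and b channels.
    half4 realColor = finalColor;
    if (params->changeColor) {
        realColor = half4(finalColor.bgra);
    }

    // Write the filtered color inside the clip rectangle and the original color outside it.
    if (checkPointInRect(transformCoords, params->clipOrigin, params->clipSize)) {
        targetTexture.write(realColor, transformCoords);
    } else {
        targetTexture.write(color, transformCoords);
    }
}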

Compute
Once all the steps above are configured, the encoding can be finished.

[encoder popDebugGroup];
[encoder endEncoding];

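After these steps we have a destinationTexture with the filter applied. Pass that texture to the earlier render pass and we get a triangle with the filter effect!

Compare it with the original image.

In Metal System Trace, the labels make it easy to see that a compute encoder now runs before our render encoder.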

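# Summary

The above uses a compute encoder for image processing. A compute encoder can also move other heavy mathematical work onto the GPU, such as the large matrix operations needed for machine learning.
The overall flow is the same as the earlier render pass; the only real difference is the extra threadgroup configuration.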

## References:

wiki - Colour_look-up_table
Metal Programming Guide
Quickly building filters with CIColorCube (使用CIColorCube快速製做濾鏡)
