Deep Learning: CNN-Based Texture Synthesis in Practice (with Python Implementation)

Date: 2020-08-10


Q0: Preliminary knowledge of Texture Synthesis

The baseline is available here; all code changes below build on that baseline.

1. Texture synthesis in brief

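Texture synthesis is used mainly in computer graphics and related fields to simulate the surface detail of geometric models and make rendered models look more realistic. Unlike traditional texture mapping, texture synthesis infers a generalized generative process from a sample texture and uses it to produce arbitrary new images with that texture, which effectively avoids problems such as texture seams and distortion.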

Based on the underlying principle, texture synthesis methods are usually divided into procedural texture synthesis (PTS) and texture synthesis from samples (TSFS). They differ as follows.

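PTS: generates textures directly on a surface by simulating the physical process that produces them, for example hair, clouds and fog, or wood grain. It can produce realistic patterns, but only if the physical generation process is modeled accurately, which is very hard; for more complex textures PTS is impractical.
TSFS: generates large areas of texture by analyzing the texture features of a given sample image. TSFS preserves the similarity and continuity of the texture while avoiding the tedious physical modeling that PTS requires. Traditional algorithms include feature matching, synthesis based on Markov random field models, and patch-based synthesis; the approach that has developed fastest in recent years is deep-learning-based texture synthesis. The paper covered in this assignment, "Texture Synthesis Using Convolutional Neural Networks", belongs to this last category.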

2. Reading the paper's idea

2-1 Basic architecture

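Texture analysis: the source texture is fed into a convolutional neural network (VGG-16 in this assignment), and the Gram matrices of its feature maps are computed;
Texture generation: a white-noise image is initialized and fed into the network; a loss is computed at every layer included in the texture model, and gradient descent on the total loss is applied to the pixel values, until the generated image's Gram matrices match those of the source texture.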
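
The two steps above can be illustrated with a toy NumPy sketch. It is not the assignment's VGG-16 pipeline: the "network" here is a single fixed random 1x1 filter bank (`filters`), the image is a flat list of pixels, and the names `gram`, `loss_and_grad` and `synthesize` are made up for this illustration. What it does share with the real method is the core idea: start from white noise and run gradient descent on the pixels until the Gram matrix of the features approaches that of the source.

```python
import numpy as np

def gram(features):
    """Gram matrix of an (N, C) feature matrix, averaged over the N positions."""
    return features.T @ features / features.shape[0]

def loss_and_grad(pixels, filters, target_gram):
    """Squared Gram mismatch and its gradient with respect to the pixels."""
    features = pixels @ filters                  # toy 1x1 "convolution"
    diff = gram(features) - target_gram          # symmetric (C, C) matrix
    loss = np.sum(diff ** 2)
    # d(loss)/d(features) = 4 * features @ diff / N, then chain through filters
    grad_features = 4.0 * features @ diff / features.shape[0]
    return loss, grad_features @ filters.T

def synthesize(source, filters, steps=200, lr=0.1, seed=0):
    """Gradient descent on pixel values, starting from white noise."""
    rng = np.random.default_rng(seed)
    target = gram(source @ filters)              # texture analysis
    pixels = rng.normal(size=source.shape)       # white-noise initialization
    loss, grad = loss_and_grad(pixels, filters, target)
    history = [loss]
    for _ in range(steps):
        candidate = pixels - lr * grad
        new_loss, new_grad = loss_and_grad(candidate, filters, target)
        if new_loss < loss:                      # accept only improving steps
            pixels, loss, grad = candidate, new_loss, new_grad
        else:                                    # otherwise shrink the step size
            lr *= 0.5
        history.append(loss)
    return pixels, history
```

Calling `synthesize(source, filters)` with an `(N, 3)` array of source pixels and a `(3, C)` filter matrix returns the optimized pixels and the loss history, which never increases because only improving steps are accepted.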

2-2 Gram matrix

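The Gram matrix can be viewed as the uncentered covariance matrix of the feature maps, that is, a covariance matrix without the mean subtracted. One way to understand it: "In a feature map, each number comes from the convolution of a particular filter at a particular position, so each number represents the strength of one feature. The Gram matrix actually computes the pairwise correlation between features: which two features appear together, which two trade off against each other, and so on. The diagonal entries of the Gram matrix additionally show how much of each feature appears in the image." (Zhihu user 90后后生) In the figure below, the left formula is the definition of the Gram matrix, which is simply the transpose of the feature matrix multiplied by the matrix itself; the right formula is [...]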
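
As a small illustration of the "uncentered covariance" reading, the NumPy sketch below computes a Gram matrix for a random feature map and checks it against the covariance. The function name `gram_matrix` and the `(H, W, C)` layout are assumptions made for this example, not taken from the baseline.

```python
import numpy as np

def gram_matrix(feature_map):
    """Gram matrix of an (H, W, C) feature map: channel-by-channel inner products."""
    h, w, c = feature_map.shape
    flat = feature_map.reshape(h * w, c)   # one row per spatial position
    return flat.T @ flat                   # shape (C, C)

rng = np.random.default_rng(0)
fmap = rng.normal(size=(4, 5, 3))          # toy feature map with 3 channels
g = gram_matrix(fmap)

flat = fmap.reshape(-1, 3)
mean = flat.mean(axis=0)
# Dividing by the number of positions gives the covariance plus the outer
# product of the channel means, i.e. the covariance without mean subtraction.
uncentered = np.cov(flat, rowvar=False, bias=True) + np.outer(mean, mean)
print(np.allclose(g / flat.shape[0], uncentered))
```

The diagonal of `g` is the sum of squared activations of each channel, which matches the remark in the quote that the diagonal reflects how much of each feature appears in the image.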

Q1: Implementing Gram matrix and loss function.

Use the features extracted from all the 13 convolution layers, complete the baseline project with loss function based on gram matrix and run the training process.

q1-1. Code

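import tensorflow as tf

# Compute the Gram matrix of one layer's feature map
def get_gram_matrix(feature_map):
    # feature_map has shape (batch, height, width, channels)
    shape = feature_map.get_shape().as_list()
    # Flatten to one row per spatial position and one column per channel
    re_shape = tf.reshape(feature_map, (-1, shape[3]))
    # F^T F, normalized by height * width * channels
    gram = tf.matmul(re_shape, re_shape, transpose_a=True) / (shape[1]*shape[2]*shape[3])
    return gram

# L2 loss between the Gram matrices of the source and generated images
def get_l2_gram_loss_for_layer(noise, source, layer):
    source_feature = getattr(source, layer)
    noise_feature = getattr(noise, layer)
    Gram_s = get_gram_matrix(source_feature)
    Gram_n = get_gram_matrix(noise_feature)
    # tf.nn.l2_loss(t) is sum(t ** 2) / 2
    loss = tf.nn.l2_loss((Gram_s-Gram_n))/2
    return loss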
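
To make the scaling of this loss concrete, here is a NumPy mirror of the two functions above. It is a sketch for checking numbers by hand, not part of the baseline; `gram_normalized` and `layer_loss` are names chosen for this example. Because `tf.nn.l2_loss(t)` is `sum(t ** 2) / 2`, dividing by 2 once more yields a quarter of the summed squared difference.

```python
import numpy as np

def gram_normalized(feature_map):
    """(C, C) Gram matrix of a (1, H, W, C) feature map, divided by H*W*C."""
    _, h, w, c = feature_map.shape
    flat = feature_map.reshape(-1, c)
    return flat.T @ flat / (h * w * c)

def layer_loss(noise_features, source_features):
    """Quarter of the summed squared difference between the two Gram matrices."""
    diff = gram_normalized(source_features) - gram_normalized(noise_features)
    # tf.nn.l2_loss(diff) / 2 == sum(diff ** 2) / 4
    return float(np.sum(diff ** 2) / 4.0)

ones = np.ones((1, 2, 2, 1))     # single-channel 2x2 map of ones
zeros = np.zeros((1, 2, 2, 1))
print(layer_loss(zeros, ones))   # Gram matrices are [[1.0]] and [[0.0]]
```

For the all-ones versus all-zeros pair the Gram matrices are `[[1.0]]` and `[[0.0]]`, so the loss is `1 / 4`.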

q1-2. Results

Click here for an animated view of the image generation process.

Origin	Generate

Q2: Training with non-texture images.

To better understand how the texture model represents image information, choose another non-texture image (such as robot.jpg in the ./images folder) and rerun the training process.

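q2-1. Code

To get better training results, in Q2 I gave the layers increasing weights so that the network's output on different texture images can be compared more clearly. The code is as follows.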

def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        # Geometric alternative: 3.5**0, 3.5**1, ...
        # weight = np.logspace(0, len(GRAM_LAYERS)-1, len(GRAM_LAYERS), base=3.5)
        # Linearly increasing weights 1, 2, ..., len(GRAM_LAYERS)
        weight = np.linspace(1, len(GRAM_LAYERS), len(GRAM_LAYERS), endpoint=True)
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
    # Mean over layers of weight * layer loss
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))
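
The weighting itself is easy to inspect outside TensorFlow. The sketch below, with the hypothetical helper `combine_layer_losses`, reproduces the "mean of weight times layer loss" reduction in NumPy and prints the two weight schedules for 13 layers. The base of 2 for the geometric schedule is an assumption chosen to match the 1, 2, 4, 8, ... column in the results table; the commented-out line above uses 3.5.

```python
import numpy as np

def combine_layer_losses(layer_losses, weights):
    """Mean over layers of weight * loss, the same reduction as get_gram_loss."""
    layer_losses = np.asarray(layer_losses, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if layer_losses.shape != weights.shape:
        raise ValueError("need exactly one weight per layer")
    return float(np.mean(weights * layer_losses))

n_layers = 13  # VGG-16 has 13 convolution layers
linear_weights = np.linspace(1, n_layers, n_layers)                   # 1, 2, ..., 13
geometric_weights = np.logspace(0, n_layers - 1, n_layers, base=2.0)  # 1, 2, 4, ...

print(linear_weights)
print(geometric_weights)
print(combine_layer_losses([1.0, 1.0, 1.0], [1, 2, 3]))  # mean of 1, 2, 3
```

With equal per-layer losses, the geometric schedule lets the deepest layer dominate the total far more strongly than the linear one does.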

q2-2. Results

	origin	epoch=1000, weight=1,2,3,4,...	epoch=5000, weight=1,2,4,8,...
red-peppers
robot
shibuya
stone

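q2-3. Analysis

Judging from the results, the network works fairly well for texture patterns with some regularity in their distribution, such as red-peppers and stone. For non-texture images, however, the results are not good: it is hard to recognize elements of the original image in the generated one.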

Q3: Training with fewer layers of features.

To reduce the parameter size, please use fewer layers for extracting features (based on which we compute the Gram matrix and loss) and explore a combination of layers with which we can still synthesize texture images with high degrees of naturalness.

q3-1. Code

Set the weights of different layers to 0 in turn, which removes those layers from the loss computation. The code is as follows.

def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        # One weight per conv layer, grouped as conv1 | conv2 | conv3 | conv4 | conv5.
        # Keep exactly one of the following lines active.
        weight = [1,1, 1,1, 1,1,1, 1,1,1, 1,1,1]
        # weight = [0,0, 1,1, 1,1,1, 1,1,1, 1,1,1]
        # weight = [1,1, 0,0, 1,1,1, 1,1,1, 1,1,1]
        # weight = [1,1, 1,1, 0,0,0, 1,1,1, 1,1,1]
        # weight = [1,1, 1,1, 1,1,1, 0,0,0, 1,1,1]
        # weight = [1,1, 1,1, 1,1,1, 1,1,1, 0,0,0]
        # weight = [10,10, 20,20, 30,30,30, 40,40,40, 50,50,50]
        # weight = [50,50, 40,40, 30,30,30, 20,20,20, 10,10,10]
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))
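
Instead of editing the 13-element list by hand, the per-layer weights can be generated from per-block values. The helper below is a hypothetical convenience, not part of the baseline; it assumes the usual VGG-16 grouping of 2, 2, 3, 3 and 3 convolution layers per block, which is also the grouping used in the lists above.

```python
# Number of convolution layers in each VGG-16 block
VGG16_BLOCK_SIZES = {"conv1": 2, "conv2": 2, "conv3": 3, "conv4": 3, "conv5": 3}

def block_weights(block_values, block_sizes=VGG16_BLOCK_SIZES):
    """Expand per-block weights into one weight per layer.

    Blocks missing from block_values keep the default weight 1.
    """
    weights = []
    for name, size in block_sizes.items():
        weights.extend([block_values.get(name, 1)] * size)
    return weights

print(block_weights({"conv3": 0}))   # drop conv3 from the loss
print(block_weights({"conv1": 10, "conv2": 20, "conv3": 30, "conv4": 40, "conv5": 50}))
```

The returned list can be assigned to `weight` in `get_gram_loss`, which makes each ablation a one-argument change.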

q3-2. Results

all	~~conv1~~	~~conv2~~	~~conv3~~	~~conv4~~	~~conv5~~

Keep all	Remove conv1	Remove conv2	Remove conv3	Remove conv4	Remove conv5

weight ↗	weight ↘

[10,10, 20,20, 30,30,30, 40,40,40, 50,50,50]	[50,50, 40,40, 30,30,30, 20,20,20, 10,10,10]

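q3-3. Analysis

Comparing the results of removing different layers shows that the first block (conv1) is especially critical for extracting image features, whereas removing any one of conv2 to conv5 on its own has little effect on the result. I also tried weights that increase or decrease toward the deeper layers; the comparison shows that increasing weights produce better textures, which suggests that raising the influence of the deeper conv layers effectively improves output quality. All things considered, one can drop the conv5 feature maps and raise the weights of the deeper layers to get good results.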

Q4: Finding alternatives of Gram matrix.

We may use the Earth mover's distance between the features of source texture image and the generated image.

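q4-1. Code

EMD (Earth Mover's Distance) is a metric for the distance between two distributions, used in content-based image retrieval. Intuitively, EMD is the optimal solution of the transportation problem in linear programming: the minimum cost that must be paid to transform one distribution into the other. It was first proposed by Peleg et al. for certain vision problems. Based on EMD, we can build the following loss function.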

\[ \mathrm{Loss} = \sum_l w_l \sum_i \left( \operatorname{sorted}(F_i) - \operatorname{sorted}(\hat{F}_i) \right)^2 \]

The code is shown below.

def get_l2_emd_loss_for_layer(noise, source, layer):
    noise_feature = getattr(noise, layer)
    source_feature = getattr(source, layer)
    shape = noise_feature.get_shape().as_list()
    # Flatten to (height * width, channels)
    noise_re_shape = tf.reshape(noise_feature, (shape[1]*shape[2], shape[3]))
    source_re_shape = tf.reshape(source_feature, (shape[1]*shape[2], shape[3]))
    # tf.sort defaults to axis=-1, so this sorts across channels at each position;
    # pass axis=0 to sort each channel's activations across positions instead
    noise_sort = tf.sort(noise_re_shape, direction='ASCENDING')
    source_sort = tf.sort(source_re_shape, direction='ASCENDING')
    # Sum of squared differences between the sorted activations
    return tf.reduce_sum(tf.math.square(noise_sort-source_sort))

def get_emd_loss(noise, source):
    with tf.name_scope('get_emd_loss'):
        emd_loss = [get_l2_emd_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
    # Unweighted mean over layers
    return tf.reduce_mean(tf.convert_to_tensor(emd_loss))
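
The role of sorting in this loss can be seen in one dimension. For two samples of equal size, the cheapest way to move one onto the other under a squared cost pairs the k-th smallest value of one with the k-th smallest of the other, so sorting both and summing squared differences gives the transport cost. The NumPy sketch below illustrates that; `sorted_sq_distance` and `per_channel_loss` are names invented for this example, and `per_channel_loss` sorts along axis 0 (over positions, per channel), which is one reading of the formula and differs from the default axis used by `tf.sort` in the code above.

```python
import numpy as np

def sorted_sq_distance(a, b):
    """Sum of squared differences between two equal-size samples after sorting."""
    a = np.sort(np.ravel(a))
    b = np.sort(np.ravel(b))
    if a.size != b.size:
        raise ValueError("samples must have the same number of elements")
    return float(np.sum((a - b) ** 2))

def per_channel_loss(noise, source):
    """Apply the sorted comparison to every channel of two (N, C) feature matrices."""
    return float(np.sum((np.sort(noise, axis=0) - np.sort(source, axis=0)) ** 2))

x = np.array([3.0, 1.0, 2.0])
print(sorted_sq_distance(x, x[[2, 0, 1]]))   # same values, different order
print(sorted_sq_distance(np.arange(4.0), np.arange(4.0) + 0.5))
```

A permutation of the same values has zero distance, which shows that this loss only sees the distribution of activations and ignores where they occur.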

q4-2. Results

The loss had not fully converged at this point; this is the output at [e:3700 loss: 2575.86865]. ~~My poor little computer did its best...~~

Origin	Generate

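q4-3. Analysis

The results show that the network learned the correlations among the feature vectors of the source texture, and the generated image follows a similar texture direction to the original. Unfortunately, after switching the loss to the EMD loss, the network lost most of the color characteristics of the source texture (possibly related to the sort operation in the EMD computation), and its color rendition is very poor.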

Q5: Training with different weighting factor.

Use the configuration in Q3 as baseline. Change the weighting factor of each layer and rerun the training process.

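q5-1. Code

According to Q3, increasing weight coefficients give better training results, so in Q5 I set up two kinds of increasing weight sequences: 1) an arithmetic sequence; 2) a geometric sequence. The code is as follows.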

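def get_gram_loss(noise, source):
    with tf.name_scope('get_gram_loss'):
        # Geometric weights for the first 10 layers: 2**0, 2**1, ..., 2**9
        # weight = np.logspace(0, len(GRAM_LAYERS)-4, len(GRAM_LAYERS)-3, base=2)
        # Linearly spaced weights for the first 10 layers
        weight = np.linspace(1, 128*(len(GRAM_LAYERS)-3), len(GRAM_LAYERS)-3, endpoint=True)
        # Append zero weights for the last three layers (conv5). `weight + [0, 0, 0]`
        # would try an elementwise add on the NumPy array instead of appending.
        weight = np.concatenate([weight, [0, 0, 0]])
        gram_loss = [get_l2_gram_loss_for_layer(noise, source, layer) for layer in GRAM_LAYERS]
    return tf.reduce_mean(tf.convert_to_tensor(list(map(lambda x,y:x*y, weight, gram_loss))))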
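
If an exact common difference d or common ratio q is wanted, the sequences can be built directly from their definitions. The helpers below are hypothetical names for this sketch, not baseline code. Note that `np.linspace(1, 128*n, n)` has a step of `(128*n - 1) / (n - 1)`, roughly 142 for n = 10, so it does not give a difference of exactly 128.

```python
import numpy as np

def arithmetic_weights(n, d, first=1.0):
    """first, first + d, first + 2d, ... (n terms)."""
    return first + d * np.arange(n)

def geometric_weights(n, q, first=1.0):
    """first, first * q, first * q**2, ... (n terms)."""
    return first * q ** np.arange(n)

def pad_with_zeros(weights, n_dropped):
    """Append zero weights for layers that should be left out of the loss."""
    return np.concatenate([weights, np.zeros(n_dropped)])

n_kept, n_dropped = 10, 3   # keep conv1 to conv4, drop the three conv5 layers
print(pad_with_zeros(arithmetic_weights(n_kept, d=128), n_dropped))
print(pad_with_zeros(geometric_weights(n_kept, q=2), n_dropped))
```

Either result has 13 entries and can be used as `weight` in `get_gram_loss`.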

q5-2. Results

Geometric sequence, increasing; \(q\) is the ratio between adjacent terms

q = 2	q = 2.5	q = 3	q = 3.5

Arithmetic sequence, increasing; \(d\) is the difference between adjacent terms

d = 1	d = 2	d = 4	d = 8

d = 16	d = 32	d = 64	d = 128

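q5-3. Analysis

Geometrically increasing weights perform better than arithmetically increasing ones. Also, as q or d grows, the fidelity of the generated image keeps improving. Taken together, these observations suggest a preliminary conclusion: widening the gap between layer weights (reducing the weights of shallow layers and increasing those of deep layers) effectively improves the fidelity of the texture image. The larger the difference between layer weights, the better the generated texture, and vice versa.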

Q6. Some remaining problems.

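1) The EMD loss in Q4 does not work well; the loss function needs to be adjusted to preserve more texture features;

2) In Q5, with geometrically increasing weights, partial color distortion appears on both sides of the generated image as q grows; the cause is not yet clear.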
