[PaddlePaddle Developer Says] Hou Jixu, undergraduate student in Automation at Hainan Normal University and PPDE (PaddlePaddle Developer Expert); research interests include object detection and generative adversarial networks.
Model Summary
In this paper, the authors propose Globally and Locally Consistent Image Completion, a method that automatically fills in missing regions of an image while keeping the completed region consistent both locally and with the image as a whole. Using a fully convolutional network, the method can complete holes of arbitrary shape. To keep the completed image consistent with the original, the authors train with two discriminators: a global discriminator, which looks at the entire image to judge whether it is coherent as a whole, and a local discriminator, which looks only at a small region centered on the completed area to ensure local consistency of the generated patch.
The image completion network is then trained to fool both context discriminator networks, which requires it to generate images that are indistinguishable from real ones both overall and in detail. The authors show that the method can complete a wide variety of scenes. Moreover, unlike patch-based methods such as PatchMatch, it can generate fragments that do not appear elsewhere in the image, which allows it to naturally complete images of objects with familiar and highly specific structure, such as faces.
The method is built entirely on convolutional networks and follows the GAN paradigm. It consists of two parts (three networks): one part generates the image, i.e., the completion network; the other judges whether the generated image is consistent with the original, i.e., the global discriminator and the local discriminator. The network architecture is shown below:
Implementing the GLCIC Algorithm with PaddlePaddle
Below we implement the GLCIC algorithm with the PaddlePaddle open-source deep learning framework and walk through the network code, which uses convolution, transposed convolution, dilated convolution, normalization, and activation functions to build the completion network and the discriminator network.
For the completion network, the authors use a 12-layer convolutional encoder that maps the input image to a feature grid one-sixteenth the area of the original (one quarter per side), followed by a 4-layer convolutional decoder. To keep the generated region as sharp as possible, resolution is reduced with strided convolutions, and only twice, bringing the spatial size down to one quarter per side. Dilated convolutions are used in the intermediate layers to enlarge the receptive field, gathering information from a wider area without discarding additional information, and the restored image is produced from these features. The table below lists the per-layer parameters of the completion network.
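As a quick back-of-the-envelope check of the receptive-field claim (a sketch, not from the original post): a stride-1 3×3 convolution with dilation d widens the receptive field by 2d pixels, so the four dilated layers alone add 60 pixels of context in the downsampled grid:

rf = 1
for d in [2, 4, 8, 16]:  # dilations of the four dilated layers below
    rf += 2 * d          # each 3x3 conv with dilation d adds 2*d pixels
print(rf)                # 61 pixels in the 1/4-resolution grid, i.e. roughly 240 pixels in input coordinates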
The input is the RGB image combined with a binary mask (regions to be filled are marked with 1); the output is an RGB image.
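The post does not show how the training program combines the image and the mask before calling the generator defined below. A minimal sketch of one common way to do it (an assumption, mirroring the visualization code in the training loop later; the mask is assumed tiled to the same three channels as x):

# Erase the hole region and fill it with a constant, let the network repaint,
# then copy the known pixels back from the input.
masked_x = x * (1 - mask) + mask               # hole pixels set to 1.0
raw = generator(masked_x)
completion = raw * mask + x * (1 - mask)       # keep original pixels outside the hole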
# Build the completion network (generator)
def generator(x):
    # conv1
    conv1 = fluid.layers.conv2d(input=x, num_filters=64, filter_size=5, dilation=1, stride=1, padding='SAME', name='generator_conv1', data_format='NHWC')
    conv1 = fluid.layers.batch_norm(conv1, momentum=0.99, epsilon=0.001)
    conv1 = fluid.layers.relu(conv1)
    # conv2: first stride-2 downsampling
    conv2 = fluid.layers.conv2d(input=conv1, num_filters=128, filter_size=3, dilation=1, stride=2, padding='SAME', name='generator_conv2', data_format='NHWC')
    conv2 = fluid.layers.batch_norm(conv2, momentum=0.99, epsilon=0.001)
    conv2 = fluid.layers.relu(conv2)
    # conv3
    conv3 = fluid.layers.conv2d(input=conv2, num_filters=128, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv3', data_format='NHWC')
    conv3 = fluid.layers.batch_norm(conv3, momentum=0.99, epsilon=0.001)
    conv3 = fluid.layers.relu(conv3)
    # conv4: second stride-2 downsampling (spatial size is now 1/4 per side)
    conv4 = fluid.layers.conv2d(input=conv3, num_filters=256, filter_size=3, dilation=1, stride=2, padding='SAME', name='generator_conv4', data_format='NHWC')
    conv4 = fluid.layers.batch_norm(conv4, momentum=0.99, epsilon=0.001)
    conv4 = fluid.layers.relu(conv4)
    # conv5
    conv5 = fluid.layers.conv2d(input=conv4, num_filters=256, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv5', data_format='NHWC')
    conv5 = fluid.layers.batch_norm(conv5, momentum=0.99, epsilon=0.001)
    conv5 = fluid.layers.relu(conv5)
    # conv6
    conv6 = fluid.layers.conv2d(input=conv5, num_filters=256, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv6', data_format='NHWC')
    conv6 = fluid.layers.batch_norm(conv6, momentum=0.99, epsilon=0.001)
    conv6 = fluid.layers.relu(conv6)
    # dilated convolutions: enlarge the receptive field at constant resolution
    dilated1 = fluid.layers.conv2d(input=conv6, num_filters=256, filter_size=3, dilation=2, padding='SAME', name='generator_dilated1', data_format='NHWC')
    dilated1 = fluid.layers.batch_norm(dilated1, momentum=0.99, epsilon=0.001)
    dilated1 = fluid.layers.relu(dilated1)
    dilated2 = fluid.layers.conv2d(input=dilated1, num_filters=256, filter_size=3, dilation=4, padding='SAME', name='generator_dilated2', data_format='NHWC')
    dilated2 = fluid.layers.batch_norm(dilated2, momentum=0.99, epsilon=0.001)
    dilated2 = fluid.layers.relu(dilated2)
    dilated3 = fluid.layers.conv2d(input=dilated2, num_filters=256, filter_size=3, dilation=8, padding='SAME', name='generator_dilated3', data_format='NHWC')
    dilated3 = fluid.layers.batch_norm(dilated3, momentum=0.99, epsilon=0.001)
    dilated3 = fluid.layers.relu(dilated3)
    dilated4 = fluid.layers.conv2d(input=dilated3, num_filters=256, filter_size=3, dilation=16, padding='SAME', name='generator_dilated4', data_format='NHWC')
    dilated4 = fluid.layers.batch_norm(dilated4, momentum=0.99, epsilon=0.001)
    dilated4 = fluid.layers.relu(dilated4)
    # conv7: padding='SAME' keeps the feature map size through this layer
    conv7 = fluid.layers.conv2d(input=dilated4, num_filters=256, filter_size=3, dilation=1, padding='SAME', name='generator_conv7', data_format='NHWC')
    conv7 = fluid.layers.batch_norm(conv7, momentum=0.99, epsilon=0.001)
    conv7 = fluid.layers.relu(conv7)
    # conv8
    conv8 = fluid.layers.conv2d(input=conv7, num_filters=256, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv8', data_format='NHWC')
    conv8 = fluid.layers.batch_norm(conv8, momentum=0.99, epsilon=0.001)
    conv8 = fluid.layers.relu(conv8)
    # deconv1: upsample back to 64x64
    deconv1 = fluid.layers.conv2d_transpose(input=conv8, num_filters=128, output_size=[64, 64], stride=2, name='generator_deconv1', data_format='NHWC')
    deconv1 = fluid.layers.batch_norm(deconv1, momentum=0.99, epsilon=0.001)
    deconv1 = fluid.layers.relu(deconv1)
    # conv9
    conv9 = fluid.layers.conv2d(input=deconv1, num_filters=128, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv9', data_format='NHWC')
    conv9 = fluid.layers.batch_norm(conv9, momentum=0.99, epsilon=0.001)
    conv9 = fluid.layers.relu(conv9)
    # deconv2: upsample back to 128x128
    deconv2 = fluid.layers.conv2d_transpose(input=conv9, num_filters=64, output_size=[128, 128], stride=2, name='generator_deconv2', data_format='NHWC')
    deconv2 = fluid.layers.batch_norm(deconv2, momentum=0.99, epsilon=0.001)
    deconv2 = fluid.layers.relu(deconv2)
    # conv10
    conv10 = fluid.layers.conv2d(input=deconv2, num_filters=32, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv10', data_format='NHWC')
    conv10 = fluid.layers.batch_norm(conv10, momentum=0.99, epsilon=0.001)
    conv10 = fluid.layers.relu(conv10)
    # conv11: project to 3 output channels; tanh keeps values in [-1, 1]
    x = fluid.layers.conv2d(input=conv10, num_filters=3, filter_size=3, dilation=1, stride=1, padding='SAME', name='generator_conv11', data_format='NHWC')
    x = fluid.layers.tanh(x)
    return x
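A quick way to sanity-check the wiring (a sketch using the 128×128 input size of this project; not part of the original post):

import paddle.fluid as fluid

x = fluid.layers.data(name='x', shape=[128, 128, 3], dtype='float32')  # NHWC, batch dim implicit
out = generator(x)
print(out.shape)  # expect (-1, 128, 128, 3): two stride-2 convs down, two deconvs back up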
The context discriminator has two parts: a global discriminator and a local discriminator. The global discriminator takes the complete image as input and judges its global consistency; the local discriminator looks only at a region one quarter the size of the original image, centered on the filled area, and judges the consistency of that part. With these two discriminators, the final network can judge global consistency while also refining the details of the generated image through local discrimination, producing better completion results.
In the original paper, the global discriminator takes 256×256×3 images and the local discriminator takes 128×128×3 images. Both networks reduce the image resolution with 5×5 convolution layers at a stride of 2×2, each followed by a fully connected layer that produces a 1024-dimensional vector. The outputs of the global and local discriminators are then concatenated into a single 2048-dimensional vector, passed through one more fully connected layer, and a sigmoid scores the consistency of the whole image. To reduce training difficulty, this experiment instead uses 128×128×3 inputs for the global discriminator and 64×64×3 inputs for the local discriminator.
# Build the context discriminator
def discriminator(global_x, local_x):
    def global_discriminator(x):
        conv1 = fluid.layers.conv2d(input=x, num_filters=64, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv1', data_format='NHWC')
        conv1 = fluid.layers.batch_norm(conv1, momentum=0.99, epsilon=0.001)
        conv1 = fluid.layers.relu(conv1)
        conv2 = fluid.layers.conv2d(input=conv1, num_filters=128, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv2', data_format='NHWC')
        conv2 = fluid.layers.batch_norm(conv2, momentum=0.99, epsilon=0.001)
        conv2 = fluid.layers.relu(conv2)
        conv3 = fluid.layers.conv2d(input=conv2, num_filters=256, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv3', data_format='NHWC')
        conv3 = fluid.layers.batch_norm(conv3, momentum=0.99, epsilon=0.001)
        conv3 = fluid.layers.relu(conv3)
        conv4 = fluid.layers.conv2d(input=conv3, num_filters=512, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv4', data_format='NHWC')
        conv4 = fluid.layers.batch_norm(conv4, momentum=0.99, epsilon=0.001)
        conv4 = fluid.layers.relu(conv4)
        conv5 = fluid.layers.conv2d(input=conv4, num_filters=512, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv5', data_format='NHWC')
        conv5 = fluid.layers.batch_norm(conv5, momentum=0.99, epsilon=0.001)
        conv5 = fluid.layers.relu(conv5)
        conv6 = fluid.layers.conv2d(input=conv5, num_filters=512, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_global_conv6', data_format='NHWC')
        conv6 = fluid.layers.batch_norm(conv6, momentum=0.99, epsilon=0.001)
        conv6 = fluid.layers.relu(conv6)
        # fc: flatten to a 1024-d feature vector
        x = fluid.layers.fc(input=conv6, size=1024, name='discriminator_global_fc1')
        return x

    def local_discriminator(x):
        conv1 = fluid.layers.conv2d(input=x, num_filters=64, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_local_conv1', data_format='NHWC')
        conv1 = fluid.layers.batch_norm(conv1, momentum=0.99, epsilon=0.001)
        conv1 = fluid.layers.relu(conv1)
        conv2 = fluid.layers.conv2d(input=conv1, num_filters=128, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_local_conv2', data_format='NHWC')
        conv2 = fluid.layers.batch_norm(conv2, momentum=0.99, epsilon=0.001)
        conv2 = fluid.layers.relu(conv2)
        conv3 = fluid.layers.conv2d(input=conv2, num_filters=256, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_local_conv3', data_format='NHWC')
        conv3 = fluid.layers.batch_norm(conv3, momentum=0.99, epsilon=0.001)
        conv3 = fluid.layers.relu(conv3)
        conv4 = fluid.layers.conv2d(input=conv3, num_filters=512, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_local_conv4', data_format='NHWC')
        conv4 = fluid.layers.batch_norm(conv4, momentum=0.99, epsilon=0.001)
        conv4 = fluid.layers.relu(conv4)
        conv5 = fluid.layers.conv2d(input=conv4, num_filters=512, filter_size=5, dilation=1, stride=2, padding='SAME', name='discriminator_local_conv5', data_format='NHWC')
        conv5 = fluid.layers.batch_norm(conv5, momentum=0.99, epsilon=0.001)
        conv5 = fluid.layers.relu(conv5)
        # fc: flatten to a 1024-d feature vector
        x = fluid.layers.fc(input=conv5, size=1024, name='discriminator_local_fc1')
        return x

    global_output = global_discriminator(global_x)
    local_output = local_discriminator(local_x)
    print('global_output', global_output.shape)
    print('local_output', local_output.shape)
    # concatenate the two 1024-d vectors and score them with a single logit
    output = fluid.layers.concat([global_output, local_output], axis=1)
    output = fluid.layers.fc(output, size=1, name='discriminator_concatenation_fc1')
    return output
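The same kind of shape check applies to the discriminator (a sketch; the sizes follow the modified 128/64 inputs described above):

global_x = fluid.layers.data(name='global_x', shape=[128, 128, 3], dtype='float32')
local_x = fluid.layers.data(name='local_x', shape=[64, 64, 3], dtype='float32')
score = discriminator(global_x, local_x)
print(score.shape)  # expect (-1, 1): one consistency logit per image pair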
The completion network is trained with a weighted Mean Squared Error (MSE) loss that measures the pixel-wise difference between the original image and the generated image over the completion region:

L(x, M_c) = \lVert M_c \odot (C(x, M_c) - x) \rVert^2

where C(x, M_c) is the completion network output, x the original image, M_c the binary completion mask, and \odot pixel-wise multiplication.
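In fluid this weighted MSE can be written in a couple of lines (a sketch; `completion`, `x`, and `mask` are assumed to be NHWC variables as in the code above, with the mask tiled to three channels):

diff = fluid.layers.elementwise_mul(mask, completion - x)    # only hole pixels contribute
g_loss = fluid.layers.reduce_mean(fluid.layers.square(diff))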
The discriminator networks use the standard GAN loss, whose objective is to maximize the probability of correctly distinguishing completed images from real ones:

\min_C \max_D \; \mathbb{E}\big[\log D(x, M_d) + \log\big(1 - D(C(x, M_c), M_c)\big)\big]

where D is the combined global-plus-local discriminator and M_d is a random mask used to pick the local crop of the real image.
Combining the two losses yields the joint objective:

\min_C \max_D \; \mathbb{E}\big[L(x, M_c) + \alpha \log D(x, M_d) + \alpha \log\big(1 - D(C(x, M_c), M_c)\big)\big]

where \alpha is a hyperparameter balancing the reconstruction and adversarial terms.
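A sketch of how the adversarial and combined terms might be assembled in fluid (assumptions: the discriminator outputs raw logits and shares parameters between the real and fake passes; `g_loss` is the weighted MSE from the sketch above; α = 0.0004 follows the paper's weighting):

real_logit = discriminator(x, local_x)
fake_logit = discriminator(global_completion, local_completion)
d_loss = fluid.layers.reduce_mean(
    fluid.layers.sigmoid_cross_entropy_with_logits(
        real_logit, fluid.layers.ones_like(real_logit))) \
    + fluid.layers.reduce_mean(
    fluid.layers.sigmoid_cross_entropy_with_logits(
        fake_logit, fluid.layers.zeros_like(fake_logit)))
# generator side: reconstruction loss plus the weighted adversarial term
alpha = 0.0004
dg_loss = g_loss + alpha * fluid.layers.reduce_mean(
    fluid.layers.sigmoid_cross_entropy_with_logits(
        fake_logit, fluid.layers.ones_like(fake_logit)))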
Network Training
To shorten training time, this project keeps only the paper's core ideas, network structure, and optimization objectives, and simplifies the training procedure and some details. The input image size is 128×128, and training proceeds in two stages: first the generator is trained alone, then the generator and discriminator are trained together.
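The loop below relies on a helper get_points() that is not shown in this excerpt; it returns, for each image in the batch, the corner coordinates of the local-discriminator window and a binary hole mask. A minimal sketch of such a helper (hypothetical; the constants are assumptions consistent with the 128/64 sizes above) could look like this:

import numpy as np

IMAGE_SIZE = 128             # global discriminator input size
LOCAL_SIZE = 64              # local discriminator crop size
HOLE_MIN, HOLE_MAX = 24, 48  # assumed range of hole edge lengths

def get_points():
    points, masks = [], []
    for _ in range(BATCH_SIZE):
        # random window for the local discriminator
        x1 = np.random.randint(0, IMAGE_SIZE - LOCAL_SIZE + 1)
        y1 = np.random.randint(0, IMAGE_SIZE - LOCAL_SIZE + 1)
        x2, y2 = x1 + LOCAL_SIZE, y1 + LOCAL_SIZE
        points.append([x1, y1, x2, y2])
        # random hole inside that window
        w, h = np.random.randint(HOLE_MIN, HOLE_MAX + 1, 2)
        p1 = x1 + np.random.randint(0, LOCAL_SIZE - w + 1)
        q1 = y1 + np.random.randint(0, LOCAL_SIZE - h + 1)
        mask = np.zeros((IMAGE_SIZE, IMAGE_SIZE, 1), dtype=np.float32)
        mask[q1:q1 + h, p1:p1 + w] = 1.0
        masks.append(mask)
    return np.array(points), np.array(masks)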
# Number of epochs the generator is trained alone
NUM_TRAIN_TIMES_OF_DG = 100
# Total number of epochs
epoch = 200

step_num = int(len(x_train) / BATCH_SIZE)
np.random.shuffle(x_train)

for pass_id in range(epoch):
    # Stage 1: train the generator only
    if pass_id <= NUM_TRAIN_TIMES_OF_DG:
        g_loss_value = 0
        for i in tqdm.tqdm(range(step_num)):
            x_batch = x_train[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
            points_batch, mask_batch = get_points()
            dg_loss_n = exe.run(dg_program,
                                feed={'x': x_batch, 'mask': mask_batch},
                                fetch_list=[dg_loss])[0]
            g_loss_value += dg_loss_n
        print('Pass_id:{}, Completion loss: {}'.format(pass_id, g_loss_value))

        np.random.shuffle(x_test)
        x_batch = x_test[:BATCH_SIZE]
        completion_n = exe.run(dg_program,
                               feed={'x': x_batch, 'mask': mask_batch},
                               fetch_list=[completion])[0][0]
        # completed image
        sample = np.array((completion_n + 1) * 127.5, dtype=np.uint8)
        # original image
        x_im = np.array((x_batch[0] + 1) * 127.5, dtype=np.uint8)
        # input image with the hole cut out
        input_im_data = x_im * (1 - mask_batch[0])
        input_im = np.array(input_im_data + np.ones_like(x_im) * mask_batch[0] * 255, dtype=np.uint8)
        output_im = np.concatenate((x_im, input_im, sample), axis=1)
        cv2.imwrite('./output/pass_id:{}.jpg'.format(pass_id), cv2.cvtColor(output_im, cv2.COLOR_RGB2BGR))
        # save the model
        save_pretrain_model_path = 'models/'
        # create the model directory if needed
        # os.makedirs(save_pretrain_model_path)
        fluid.io.save_params(executor=exe, dirname=save_pretrain_model_path, main_program=dg_program)

    # Stage 2: train the generator and discriminator together
    else:
        g_loss_value = 0
        d_loss_value = 0
        for i in tqdm.tqdm(range(step_num)):
            x_batch = x_train[i * BATCH_SIZE:(i + 1) * BATCH_SIZE]
            points_batch, mask_batch = get_points()
            dg_loss_n = exe.run(dg_program,
                                feed={'x': x_batch, 'mask': mask_batch},
                                fetch_list=[dg_loss])[0]
            g_loss_value += dg_loss_n

            completion_n = exe.run(dg_program,
                                   feed={'x': x_batch, 'mask': mask_batch},
                                   fetch_list=[completion])[0]
            local_x_batch = []
            local_completion_batch = []
            for j in range(BATCH_SIZE):  # j avoids shadowing the step index i
                x1, y1, x2, y2 = points_batch[j]
                local_x_batch.append(x_batch[j][y1:y2, x1:x2, :])
                local_completion_batch.append(completion_n[j][y1:y2, x1:x2, :])
            local_x_batch = np.array(local_x_batch)
            local_completion_batch = np.array(local_completion_batch)
            d_loss_n = exe.run(d_program,
                               feed={'x': x_batch, 'mask': mask_batch, 'local_x': local_x_batch,
                                     'global_completion': completion_n, 'local_completion': local_completion_batch},
                               fetch_list=[d_loss])[0]
            d_loss_value += d_loss_n
        print('Pass_id:{}, Completion loss: {}'.format(pass_id, g_loss_value))
        print('Pass_id:{}, Discriminator loss: {}'.format(pass_id, d_loss_value))

        np.random.shuffle(x_test)
        x_batch = x_test[:BATCH_SIZE]
        completion_n = exe.run(dg_program,
                               feed={'x': x_batch, 'mask': mask_batch},
                               fetch_list=[completion])[0][0]
        # completed image
        sample = np.array((completion_n + 1) * 127.5, dtype=np.uint8)
        # original image
        x_im = np.array((x_batch[0] + 1) * 127.5, dtype=np.uint8)
        # input image with the hole cut out
        input_im_data = x_im * (1 - mask_batch[0])
        input_im = np.array(input_im_data + np.ones_like(x_im) * mask_batch[0] * 255, dtype=np.uint8)
        output_im = np.concatenate((x_im, input_im, sample), axis=1)
        cv2.imwrite('./output/pass_id:{}.jpg'.format(pass_id), cv2.cvtColor(output_im, cv2.COLOR_RGB2BGR))
        # save the model
        save_pretrain_model_path = 'models/'
        # create the model directory if needed
        # os.makedirs(save_pretrain_model_path)
        fluid.io.save_params(executor=exe, dirname=save_pretrain_model_path, main_program=dg_program)
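To reuse the saved parameters later, for example to run completion without retraining, the matching load call is (a sketch; it assumes the same dg_program graph has been rebuilt first):

fluid.io.load_params(executor=exe, dirname='models/', main_program=dg_program)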
结果展现
Project Summary
In the Image Completion Result figure, Input is the image fed to the completion network after holes were cut out; in Output, the holes in the Input image have been filled in, which shows that the current training results can already complete missing regions of an image to some extent. Because this project was limited in hardware and time, the method from the paper was simplified, and the training procedure and data preprocessing were adjusted relative to the original, so it cannot match the paper's results; still, compared with the original authors' two months of training time, this training scheme is a worthwhile trade-off.
If you want to match the accuracy of the original paper, you can modify the training strategy on top of this project. The training procedure diagram from the original paper is attached here.
This project was built with the PaddlePaddle open-source deep learning framework, and the entire workflow of data processing, model training, and prediction was completed on AI Studio. Many thanks to AI Studio for providing a free online GPU training environment; for students who lack the hardware for deep learning, it is an enormous help.
If you are interested in this little experiment, you can try it yourself: the whole project, including the dataset and code, is public on AI Studio, and you are welcome to fork it.
https://aistudio.baidu.com/ai...
If you run into problems while using it, you can join the official PaddlePaddle QQ group for discussion: 1108045677.
If you would like to learn more about PaddlePaddle, please refer to the following documentation.