CycleGAN and pix2pix tips.md (Training/test Tips)

Training/test options

Please see options/train_options.py and options/base_options.py for the training flags; see options/test_options.py and options/base_options.py for the test flags. There are some model-specific flags as well, which are added in the model files, such as the --lambda_A option in models/cycle_gan_model.py. The default values of these options are also adjusted in the model files.
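
For example, a CycleGAN training command typically mixes flags from all three sources; the dataset path and experiment name below are placeholders:

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --lambda_A 10.0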

CPU/GPU (default --gpu_ids 0)

Please set --gpu_ids -1 to use CPU mode; set --gpu_ids 0,1,2 for multi-GPU mode. You need a large batch size (e.g., --batch_size 32) to benefit from multiple GPUs.
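
For example (dataset path and experiment name are placeholders), CPU-only training:

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --gpu_ids -1

and multi-GPU training with a larger batch:

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --gpu_ids 0,1,2 --batch_size 32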

Visualization

During training, the current results can be viewed using two methods.

First, if you set --display_id > 0, the results and loss plot will appear on a local graphics web server launched by visdom. To do this, you should have visdom installed and a server running via the command python -m visdom.server. The default server URL is http://localhost:8097. display_id corresponds to the window ID shown on the visdom server. The visdom display functionality is turned on by default; to avoid the extra overhead of communicating with visdom, set --display_id -1.

Second, the intermediate results are saved to [opt.checkpoints_dir]/[opt.name]/web/ as an HTML file. To avoid this, set --no_html.
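
A typical workflow (dataset path and experiment name are placeholders): start the visdom server in one terminal, then launch training in another so the plots appear at http://localhost:8097:

python -m visdom.server

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan

To train without any visualization overhead, add --display_id -1 --no_html to the training command instead.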

Preprocessing

Images can be resized and cropped in different ways using the --preprocess option:

The default option 'resize_and_crop' resizes the image to (opt.load_size, opt.load_size) and then takes a random crop of size (opt.crop_size, opt.crop_size).

'crop' skips the resizing step and only performs random cropping.

'scale_width' resizes the image to width opt.crop_size while keeping the aspect ratio (translator's note: this may be intended to be opt.load_size).

'scale_width_and_crop' first resizes the image to width opt.load_size and then takes a random crop of size (opt.crop_size, opt.crop_size).

'none' tries to skip all of these preprocessing steps. However, if the image size is not a multiple of some number that depends on the number of downsampling operations in the generator, you will get an error because the size of the output image may differ from the size of the input image. Therefore, the 'none' option still adjusts the image size to be a multiple of 4. You might need a bigger adjustment if you change the generator architecture.

Please see data/base_dataset.py to see how all of this is implemented.
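
For illustration (dataset path and experiment name are placeholders), the following two runs differ only in preprocessing: the first uses the resize-and-crop pipeline with the commonly used 286/256 sizes, the second scales the width to 512 before taking 256x256 random crops:

python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --preprocess resize_and_crop --load_size 286 --crop_size 256

python train.py --dataroot ./datasets/facades --name facades_pix2pix --model pix2pix --preprocess scale_width_and_crop --load_size 512 --crop_size 256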

Fine-tuning / resume training

To fine-tune a pre-trained model, or resume the previous training, use the --continue_train flag. The program will then load the model based on epoch. By default, the program will initialize the epoch count as 1. Set --epoch_count <int> to specify a different starting epoch count.
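
For example, to resume a run whose checkpoints were saved under the (placeholder) experiment name maps_cyclegan, continuing the epoch numbering at 30:

python train.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --continue_train --epoch_count 30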

Prepare your own datasets for CycleGAN

You need to create two directories to host images from domain A (/path/to/data/trainA) and from domain B (/path/to/data/trainB). Then you can train the model with the dataset flag --dataroot /path/to/data. Optionally, you can create hold-out test sets at /path/to/data/testA and /path/to/data/testB to test your model on unseen images.
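
For example (all paths and the experiment name are placeholders):

mkdir -p /path/to/data/trainA /path/to/data/trainB /path/to/data/testA /path/to/data/testB

python train.py --dataroot /path/to/data --name my_cyclegan --model cycle_gan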

Prepare your own datasets for pix2pix

Pix2pix's training requires paired data. We provide a python script to generate training data in the form of pairs of images {A,B}, where A and B are two different depictions of the same underlying scene. For example, these might be pairs {label map, photo} or {bw image, color image}. Then we can learn to translate A to B or B to A:

Create folder /path/to/data with subdirectories A and B. A and B should each have their own subdirectories train, val, test, etc. In /path/to/data/A/train, put training images in style A. In /path/to/data/B/train, put the corresponding images in style B. Repeat the same for the other data splits (val, test, etc.).
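
For example, the expected directory layout can be created up front (paths are placeholders):

mkdir -p /path/to/data/A/train /path/to/data/A/val /path/to/data/A/test

mkdir -p /path/to/data/B/train /path/to/data/B/val /path/to/data/B/test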

Corresponding images in a pair {A,B} must be the same size and have the same filename, e.g., /path/to/data/A/train/1.jpg is considered to correspond to /path/to/data/B/train/1.jpg.

Once the data is formatted this way, call:

python datasets/combine_A_and_B.py --fold_A /path/to/data/A --fold_B /path/to/data/B --fold_AB /path/to/data

This will combine each pair of images (A,B) into a single image file, ready for training.

About image size

Since the generator architecture in CycleGAN involves a series of downsampling / upsampling operations, the size of the input and output image may not match if the input image size is not a multiple of 4. As a result, you may get a runtime error because the L1 identity loss cannot be enforced with images of different sizes. Therefore, we slightly resize the image to a multiple of 4 even with the --preprocess none option. For the same reason, --crop_size needs to be a multiple of 4.
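
As a quick sanity check: a --crop_size of 360 is fine (360 = 4 x 90), while 362 is not. A test run on unscaled, uncropped images (dataset path and experiment name are placeholders) could look like:

python test.py --dataroot ./datasets/maps --name maps_cyclegan --model cycle_gan --preprocess none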

Training/Testing with high res images

CycleGAN is quite memory-intensive, as four networks (two generators and two discriminators) need to be loaded on one GPU, so a large image cannot be entirely loaded. In this case, we recommend training with cropped images. For example, to generate 1024px results, you can train with --preprocess scale_width_and_crop --load_size 1024 --crop_size 360, and test with --preprocess scale_width --load_size 1024. This ensures that training and testing run at the same scale. At test time, you can afford a higher resolution because you don't need to load all the networks.
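
Putting these flags together (dataset path and experiment name are placeholders; note that 360 is a multiple of 4):

python train.py --dataroot ./datasets/mydata --name highres_cyclegan --model cycle_gan --preprocess scale_width_and_crop --load_size 1024 --crop_size 360

python test.py --dataroot ./datasets/mydata --name highres_cyclegan --model cycle_gan --preprocess scale_width --load_size 1024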

Training/Testing with rectangular images

Both pix2pix and CycleGAN can work with rectangular images. To make them work, you need to use different preprocessing flags. Let's say that you are working with 360x256 images. During training, you can specify --preprocess crop and --crop_size 256. This will allow your model to be trained on randomly cropped 256x256 patches during training time. During test time, you can apply the model to 360x256 images with the flag --preprocess none.
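
For the 360x256 example above (dataset path and experiment name are placeholders):

python train.py --dataroot ./datasets/mydata --name rect_pix2pix --model pix2pix --preprocess crop --crop_size 256

python test.py --dataroot ./datasets/mydata --name rect_pix2pix --model pix2pix --preprocess none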

There are practical restrictions regarding image sizes for each generator architecture. For unet256, it only supports images whose width and height are divisible by 256. For unet128, the width and height need to be divisible by 128. For resnet_6blocks and resnet_9blocks, the width and height need to be divisible by 4.
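
For instance, 360x256 inputs used with --preprocess none satisfy the divisibility-by-4 requirement of the resnet generators but not the 128/256 requirement of the unet generators, so a resnet generator would be chosen explicitly (dataset path and experiment name are placeholders; resnet_9blocks is one of the values accepted by --netG):

python train.py --dataroot ./datasets/mydata --name rect_cyclegan --model cycle_gan --netG resnet_9blocks --preprocess none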

About loss curve

Unfortunately, the loss curve does not reveal much information when training GANs, and CycleGAN is no exception. To check whether the training has converged or not, we recommend periodically generating a few samples and looking at them.

About batch size

For all experiments in the paper, we set the batch size to 1. If there is room in memory, you can use a larger batch size with batch norm or instance norm. (Note that the default batchnorm does not work well with multi-GPU training; you may consider using synchronized batchnorm instead.) But please be aware that this can impact the training. In particular, even with instance normalization, different batch sizes can lead to different results. Moreover, increasing --crop_size may be a good alternative to increasing the batch size.
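
For example (dataset path and experiment name are placeholders), a run with a batch size of 4 and instance normalization could look like:

python train.py --dataroot ./datasets/maps --name maps_cyclegan_bs4 --model cycle_gan --batch_size 4 --norm instance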

Notes on Colorization

There is no need to run combine_A_and_B.py for colorization. Instead, you need to prepare natural images and set --dataset_mode colorization and --model colorization in the script. The program will automatically convert each RGB image into Lab color space and create an L -> ab image pair during training. Also set --input_nc 1 and --output_nc 2. The training and test directories should be organized as /your/data/train and /your/data/test. See the example scripts scripts/train_colorization.sh and scripts/test_colorization.sh for more details.
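
A possible training command following these notes (the data root and experiment name are placeholders):

python train.py --dataroot /your/data --name color_pix2pix --model colorization --dataset_mode colorization --input_nc 1 --output_nc 2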

Notes on Extracting Edges

We provide python and Matlab scripts to extract coarse edges from photos. Run scripts/edges/batch_hed.py to compute HED edges. Run scripts/edges/PostprocessHED.m to simplify the edges with additional post-processing steps. Check the code documentation for more details.

Evaluating Labels2Photos on Cityscapes

We provide scripts for running the evaluation of the Labels2Photos task on the Cityscapes validation set. We assume that you have installed caffe (and pycaffe) on your system. If not, see the official website for installation instructions. Once caffe is successfully installed, download the pre-trained FCN-8s semantic segmentation model (512MB) by running:

bash ./scripts/eval_cityscapes/download_fcn8s.sh

Then make sure ./scripts/eval_cityscapes/ is in your system's python path. If not, run the following command to add it:

export PYTHONPATH=${PYTHONPATH}:./scripts/eval_cityscapes/

Now you can run the following command to evaluate your predictions:

python ./scripts/eval_cityscapes/evaluate.py --cityscapes_dir /path/to/original/cityscapes/dataset/ --result_dir /path/to/your/predictions/ --output_dir /path/to/output/directory/

Images stored under --result_dir should contain your model's predictions on the Cityscapes validation split and follow the original Cityscapes naming convention (e.g., frankfurt_000001_038418_leftImg8bit.png). The script will output a text file under --output_dir containing the metric.

Further notes: Our pre-trained FCN model is not supposed to work on Cityscapes at the original resolution (1024x2048), as it was trained on 256x256 images that are then upsampled to 1024x2048 during training. The purpose of the resizing during training was to 1) keep the label maps in the original high resolution untouched and 2) avoid the need to change the standard FCN training code and architecture for Cityscapes. During test time, you need to synthesize 256x256 results. Our test code will automatically upsample your results to 1024x2048 before feeding them to the pre-trained FCN model. The output is at 1024x2048 resolution and will be compared to 1024x2048 ground truth labels. You do not need to resize the ground truth labels. The best way to verify that everything is correct is to first reproduce the numbers reported in the paper for real images. To achieve this, you need to resize the original/real Cityscapes images (not labels) to 256x256 and feed them to the evaluation code.