Paper: InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets
Source code: InfoGAN in TensorFlow
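The GAN (Generative Adversarial Network) is currently one of the most popular and most promising research directions. The original GAN, however, is unconstrained and hard to control, and its noise input z is difficult to interpret. In recent years many variants have been built on top of it, such as the conditional CGAN and the convolutional DCGAN, both of which were covered in broad strokes in earlier posts on this blog. InfoGAN, the subject of this post, is another such refinement, and OpenAI even named it one of last year's five major breakthroughs. Let's see what InfoGAN actually changes and what that achieves.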
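In 2014, Ian J. Goodfellow proposed Generative Adversarial Networks. Through adversarial training of a generator and a discriminator, the generator eventually produces fake data whose distribution matches that of the real data. However, the generator's input z is a continuous noise signal with no constraints on it, so the GAN has no reason to tie individual dimensions of z to semantic features of the data, and z is not an interpretable representation. This is exactly InfoGAN's starting point. To find an interpretable representation, it splits the input into two parts: the incompressible noise z, and an interpretable latent variable c, called the latent code. By constraining the relationship between c and the generated data, we hope that c will carry interpretable information about the data. For MNIST, for example, c can consist of a categorical latent code that represents the digit class (0~9) and continuous latent codes that represent attributes such as slant and stroke thickness.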
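To introduce c, the authors constrain it with mutual information. If c is to explain the generated data G(z,c), then c and G(z,c) should be highly dependent, i.e. their mutual information should be large; without any constraint there is no particular relationship between them, and the mutual information can be close to 0. We therefore want the mutual information I(c;G(z,c)) to be as large as possible, and the objective of the model becomes:

$$\min_G \max_D V_I(D, G) = V(D, G) - \lambda\, I(c;\, G(z, c))$$

where V(D,G) is the original GAN objective and λ is a weighting hyperparameter.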
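Computing I(c;G(z,c)), however, requires the true posterior P(c|x), which is unknown. For the actual optimization the authors therefore borrow the idea of variational inference: a variational distribution Q(c|x) is introduced to approximate P(c|x), which gives a lower bound L_I(G,Q) on the mutual information, and the bound is optimized in alternation with the adversarial updates. The InfoGAN objective becomes:

$$\min_{G,Q} \max_D V_{\text{InfoGAN}}(D, G, Q) = V(D, G) - \lambda\, L_I(G, Q)$$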
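The lower bound mentioned above can be written out step by step. The block below is a sketch of the standard derivation, following the InfoGAN paper; here x denotes a sample G(z,c), H is entropy, and L_I(G,Q) is the name of the resulting bound.

```latex
\begin{aligned}
I(c;\, G(z,c)) &= H(c) - H(c \mid G(z,c)) \\
&= \mathbb{E}_{x \sim G(z,c)}\Big[\mathbb{E}_{c' \sim P(c \mid x)}\big[\log P(c' \mid x)\big]\Big] + H(c) \\
&= \mathbb{E}_{x \sim G(z,c)}\Big[D_{\mathrm{KL}}\big(P(\cdot \mid x)\,\|\,Q(\cdot \mid x)\big)
   + \mathbb{E}_{c' \sim P(c \mid x)}\big[\log Q(c' \mid x)\big]\Big] + H(c) \\
&\ge \mathbb{E}_{x \sim G(z,c)}\Big[\mathbb{E}_{c' \sim P(c \mid x)}\big[\log Q(c' \mid x)\big]\Big] + H(c) \\
&= \mathbb{E}_{c \sim P(c),\, x \sim G(z,c)}\big[\log Q(c \mid x)\big] + H(c) \;=\; L_I(G, Q)
\end{aligned}
```

The inequality holds because the KL divergence is non-negative, and it becomes tight as Q(c|x) approaches P(c|x). When the prior over c is fixed, H(c) is a constant, so maximizing the bound amounts to maximizing the expected log-likelihood of the sampled code under Q.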
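In the implementation, Q and D share all convolutional layers, and only one fully connected layer is added at the end to output Q(c|x), so InfoGAN adds very little computation on top of the original GAN.
For c, a categorical latent code can use a softmax output to represent Q(c|x), while a continuous latent code can be represented by a Gaussian distribution.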
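In the experiments, the authors change only one dimension of c at a time and observe how the generated data changes. The results show that the latent code does learn interpretable information, such as the digit class, the slant, and the stroke thickness on MNIST.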
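The "change one dimension of c" experiment can be sketched as building a small batch of generator inputs that differ in a single code. This is only an illustration in plain Python: the helper names `one_hot` and `traverse_continuous` are hypothetical, and the layout [one-hot digit | continuous codes | style noise] is assumed to match the code discussed later in this post.

```python
import random


def one_hot(index, size):
    """Return a list of length `size` with 1.0 at position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def traverse_continuous(digit, continuous, style, dim, values, cardinality=10):
    """Build one input vector per value, varying only continuous[dim].

    Assumed layout: [one-hot digit | continuous codes | style noise].
    """
    rows = []
    for value in values:
        varied = list(continuous)
        varied[dim] = value
        rows.append(one_hot(digit, cardinality) + varied + list(style))
    return rows


rng = random.Random(0)
style = [rng.gauss(0.0, 1.0) for _ in range(62)]  # incompressible noise z, held fixed
values = [-1.0, -0.5, 0.0, 0.5, 1.0]              # sweep for the chosen code
batch = traverse_continuous(digit=3, continuous=[0.0, 0.0], style=style,
                            dim=0, values=values)
```

Feeding `batch` to a trained generator would give five images of the same digit with the same style noise, so any visible change can be attributed to the one code that was swept.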
Now let's look at the code. Line 212 of infogan/__init__.py:

    if use_infogan:
        # total input width: 62 + 10 + 2 = 74
        z_size = style_size + sum(categorical_cardinality) + num_continuous
        sample_noise = create_infogan_noise_sample(
            categorical_cardinality,
            num_continuous,
            style_size
        )
        # sample_noise(batch_size) returns an array of shape [64, 74]

Here style_size is 62, categorical_cardinality is [10], and num_continuous is 2. create_infogan_noise_sample is what produces the noise input:
    def create_infogan_noise_sample(categorical_cardinality, num_continuous, style_size):
        def sample(batch_size):
            return encode_infogan_noise(
                categorical_cardinality,
                create_categorical_noise(categorical_cardinality, size=batch_size),
                create_continuous_noise(num_continuous, style_size, size=batch_size)
            )
        return sample

Here batch_size is 64. Next, create_categorical_noise, which generates the categorical latent code:
    def create_categorical_noise(categorical_cardinality, size):
        noise = []
        for cardinality in categorical_cardinality:
            noise.append(
                np.random.randint(0, cardinality, size=size)
            )
        return noise

np.random.randint(0, cardinality, size=size) draws random integers from the half-open interval [0, cardinality), which here means the integers 0 to 9, standing for the digit class.
Next, create_continuous_noise, which generates the continuous latent code together with the incompressible noise z:

    def create_continuous_noise(num_continuous, style_size, size):
        continuous = np.random.uniform(-1.0, 1.0, size=(size, num_continuous))
        style = np.random.standard_normal(size=(size, style_size))
        return np.hstack([continuous, style])

The continuous latent code is drawn uniformly between -1 and 1, and style, i.e. the noise z, follows a standard normal distribution. The continuous latent code and style are then concatenated.
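Finally, encode_infogan_noise puts the pieces together:

    def encode_infogan_noise(categorical_cardinality, categorical_samples, continuous_samples):
        noise = []
        for cardinality, sample in zip(categorical_cardinality, categorical_samples):
            noise.append(make_one_hot(sample, size=cardinality))
        noise.append(continuous_samples)
        return np.hstack(noise)

The categorical latent code is one-hot encoded, i.e. turned into a 0/1 vector of length 10. The three parts (one-hot categorical code, continuous code, and style noise) are then concatenated, which gives the noise sample.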
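To make the 74-wide layout concrete, here is a dependency-free sketch of the same sampling pipeline. It is not the repository's code: it uses Python lists instead of NumPy arrays, and `make_one_hot` below is a hypothetical stand-in for the helper of the same name, which the post does not show.

```python
import random


def make_one_hot(indices, size):
    """One-hot encode a list of integer labels into rows of length `size`."""
    return [[1.0 if j == i else 0.0 for j in range(size)] for i in indices]


def sample_noise(batch_size, categorical_cardinality=(10,), num_continuous=2,
                 style_size=62, rng=random):
    """Return `batch_size` rows laid out as [one-hot codes | continuous | style]."""
    blocks = []
    for cardinality in categorical_cardinality:
        labels = [rng.randrange(cardinality) for _ in range(batch_size)]
        blocks.append(make_one_hot(labels, cardinality))
    continuous = [[rng.uniform(-1.0, 1.0) for _ in range(num_continuous)]
                  for _ in range(batch_size)]
    style = [[rng.gauss(0.0, 1.0) for _ in range(style_size)]
             for _ in range(batch_size)]
    blocks.extend([continuous, style])
    # Concatenate the blocks column-wise, one row per sample.
    return [sum((block[i] for block in blocks), []) for i in range(batch_size)]


noise = sample_noise(64, rng=random.Random(0))
```

Each row has 10 + 2 + 62 = 74 entries, which is the z_size computed in the code above.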
Now look at line 33 of __init__.py:

    def generator_forward(z, network_description, is_training, reuse=None,
                          name="generator", use_batch_norm=True, debug=False):
        with tf.variable_scope(name, reuse=reuse):
            return run_network(z,
                               network_description,
                               is_training=is_training,
                               use_batch_norm=use_batch_norm,
                               debug=debug,
                               strip_batchnorm_from_last_layer=True)

This defines the generator. Here network_description is "fc:1024,fc:7x7x128,reshape:7:7:128,deconv:4:2:64,deconv:4:2:1:sigmoid", and the output is a 28×28 generated sample, i.e. fake_image.
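The way the description string leads to a 28×28 output can be checked with a little shape arithmetic. The sketch below is a guess at how the tokens are read, not the repository's parser: it assumes `fc:<units>`, `reshape:<h>:<w>:<c>`, and `deconv:<kernel>:<stride>:<channels>[:<activation>]` with "SAME" padding, and `trace_generator_shapes` is a hypothetical helper.

```python
def trace_generator_shapes(description, input_size):
    """Follow the per-sample tensor shape through a layer description string.

    Assumed token formats (not confirmed by the source):
      fc:<units>                 units may be a product such as 7x7x128
      reshape:<h>:<w>:<c>
      deconv:<kernel>:<stride>:<channels>[:<activation>], "SAME" padding
    """
    shape = (input_size,)
    shapes = []
    for layer in description.split(","):
        kind, *args = layer.split(":")
        if kind == "fc":
            units = 1
            for factor in args[0].split("x"):
                units *= int(factor)
            shape = (units,)
        elif kind == "reshape":
            shape = tuple(int(a) for a in args[:3])
        elif kind == "deconv":
            stride, channels = int(args[1]), int(args[2])
            height, width, _ = shape
            # With "SAME" padding, a transposed convolution scales H and W by the stride.
            shape = (height * stride, width * stride, channels)
        else:
            raise ValueError("unknown layer type: " + kind)
        shapes.append(shape)
    return shapes


shapes = trace_generator_shapes(
    "fc:1024,fc:7x7x128,reshape:7:7:128,deconv:4:2:64,deconv:4:2:1:sigmoid",
    input_size=74,
)
```

Under these assumptions the two stride-2 transposed convolutions take the 7×7 feature map to 14×14 and then 28×28, which agrees with the output size stated above.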
Line 48:

    def discriminator_forward(img, network_description, is_training, reuse=None,
                              name="discriminator", use_batch_norm=True, debug=False):
        with tf.variable_scope(name, reuse=reuse):
            out = run_network(img,
                              network_description,
                              is_training=is_training,
                              use_batch_norm=use_batch_norm,
                              debug=debug)
            out = layers.flatten(out)
            prob = layers.fully_connected(
                out,
                num_outputs=1,
                activation_fn=tf.nn.sigmoid,
                scope="prob_projection"
            )
        return {"prob": prob, "hidden": out}

Here network_description is "conv:4:2:64:lrelu,conv:4:2:128:lrelu,fc:1024:lrelu". out has shape [64, 1024], and prob has shape [64, 1] and is the predicted probability that each input sample is a real_image.
Line 291:

    # discriminator should maximize:
    ll_believing_fake_images_are_fake = tf.log(1.0 - prob_fake + TINY)
    ll_true_images = tf.log(prob_true + TINY)
    discriminator_obj = (
        tf.reduce_mean(ll_believing_fake_images_are_fake) +
        tf.reduce_mean(ll_true_images)
    )

This defines the discriminator objective, which is the same as in the original GAN. TINY is a very small constant that keeps the argument of the log from being 0.
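Line 299:

    # generator should maximize:
    ll_believing_fake_images_are_real = tf.reduce_mean(tf.log(prob_fake + TINY))
    generator_obj = ll_believing_fake_images_are_real

This defines the generator objective. Note that the generator maximizes log D(G(z)) rather than minimizing log(1 - D(G(z))): this is the non-saturating form that the original GAN paper recommends in practice, not the minimax form of its value function.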
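The two objectives can be tried out on plain numbers without TensorFlow. This is a minimal sketch that mirrors the formulas above with Python's `math` module; the value of `TINY` is an assumption, since the post only says it is a very small number.

```python
import math

TINY = 1e-8  # assumed value; the post only says TINY is very small


def mean(values):
    return sum(values) / len(values)


def discriminator_objective(prob_true, prob_fake):
    """Quantity the discriminator maximizes: E[log D(x)] + E[log(1 - D(G(z)))]."""
    return (mean([math.log(1.0 - p + TINY) for p in prob_fake])
            + mean([math.log(p + TINY) for p in prob_true]))


def generator_objective(prob_fake):
    """Quantity the generator maximizes: E[log D(G(z))] (non-saturating form)."""
    return mean([math.log(p + TINY) for p in prob_fake])


# A discriminator that is right most of the time...
good_d = discriminator_objective(prob_true=[0.9, 0.8], prob_fake=[0.1, 0.2])
# ...scores higher than one that cannot tell real from fake.
unsure_d = discriminator_objective(prob_true=[0.5, 0.5], prob_fake=[0.5, 0.5])
# TINY keeps the logarithm finite even when a probability is exactly 0.
worst_g = generator_objective(prob_fake=[0.0, 0.0])
```

Both objectives are averages of log-probabilities, so they are at most about 0, and larger is better for the network that owns them.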
Line 320:

    q_output = reconstruct_mutual_info(
        categorical_c_vectors,
        continuous_c_vector,
        categorical_lambda=args.categorical_lambda,
        continuous_lambda=args.continuous_lambda,
        fix_std=fix_std,
        hidden=discriminator_fake["hidden"],
        is_training=is_training_discriminator,
        name="mutual_info"
    )

Here categorical_c_vectors corresponds to the categorical latent code from before, continuous_c_vector corresponds to the continuous latent code, hidden is the discriminator's hidden output for fake_image, and fix_std means "Fix continuous var standard deviation to 1."
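Now look at reconstruct_mutual_info. On lines 82 and 93, the discriminator's hidden output for fake_image is fed through two more fully connected layers, and the final output has shape [64, 12] (10 categorical logits plus 2 continuous means when fix_std is set). Line 101:

    # `offset` is set earlier in the function (not shown in this excerpt)
    ll_categorical = None
    for true_categorical in true_categoricals:
        cardinality = true_categorical.get_shape()[1].value
        prob_categorical = tf.nn.softmax(out[:, offset:offset + cardinality])
        ll_categorical_new = tf.reduce_sum(
            tf.log(prob_categorical + TINY) * true_categorical,
            reduction_indices=1
        )
        if ll_categorical is None:
            ll_categorical = ll_categorical_new
        else:
            ll_categorical = ll_categorical + ll_categorical_new

For the categorical latent code, the objective is the log-likelihood of the true code under the softmax of the categorical part of the Q output computed on G(z,c). This is the negative of the cross-entropy between the softmax output and the categorical latent code.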
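The categorical term is easy to reproduce for a single sample. The sketch below is plain Python rather than TensorFlow; `softmax` and `categorical_log_likelihood` are illustrative helpers, and the value of `TINY` is an assumption.

```python
import math

TINY = 1e-8  # assumed value


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_log_likelihood(logits, one_hot_code):
    """log Q(c|x) for a one-hot code c, i.e. minus the cross-entropy."""
    probs = softmax(logits)
    return sum(math.log(p + TINY) * t for p, t in zip(probs, one_hot_code))


code = [0.0] * 10
code[3] = 1.0                      # the code that was fed to the generator: digit 3

confident = [0.0] * 10
confident[3] = 5.0                 # Q strongly predicts class 3
uniform = [0.0] * 10               # Q has no idea

ll_confident = categorical_log_likelihood(confident, code)
ll_uniform = categorical_log_likelihood(uniform, code)
```

Because the code is one-hot, the sum picks out the log-probability that Q assigns to the class that was actually sampled, so the term grows as Q recovers the code from the image.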
Line 114:

    mean_contig = out[:, num_categorical:num_categorical + num_continuous]
    if fix_std:
        std_contig = tf.ones_like(mean_contig)
    else:
        # these outputs are treated as log-variances: std = sqrt(exp(log_var))
        std_contig = tf.sqrt(tf.exp(out[:, num_categorical + num_continuous:num_categorical + num_continuous * 2]))
    epsilon = (true_continuous - mean_contig) / (std_contig + TINY)
    ll_continuous = tf.reduce_sum(
        - 0.5 * np.log(2 * np.pi) - tf.log(std_contig + TINY) - 0.5 * tf.square(epsilon),
        reduction_indices=1,
    )

For the continuous latent code, the true code is standardized using the mean given by the continuous part of the Q output on G(z,c) (with standard deviation 1 when fix_std is set), and the log-density of the resulting normal distribution is used as the objective.
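The continuous term is the log-density of a diagonal Gaussian, which can be checked by hand. This is a small sketch in plain Python under the same formula; `gaussian_log_likelihood` is an illustrative name and the value of `TINY` is an assumption.

```python
import math

TINY = 1e-8  # assumed value


def gaussian_log_likelihood(true_code, mean, std=None):
    """Sum over dimensions of log N(true_code; mean, std**2).

    std=None plays the role of fix_std: the standard deviation is fixed to 1.
    """
    if std is None:
        std = [1.0] * len(mean)
    total = 0.0
    for c, m, s in zip(true_code, mean, std):
        epsilon = (c - m) / (s + TINY)
        total += (-0.5 * math.log(2 * math.pi)
                  - math.log(s + TINY)
                  - 0.5 * epsilon ** 2)
    return total


true_code = [0.3, -0.7]
exact = gaussian_log_likelihood(true_code, mean=[0.3, -0.7])   # Q recovers the code
off = gaussian_log_likelihood(true_code, mean=[1.0, 0.5])      # Q misses it
```

With the standard deviation fixed to 1, the only part that depends on the network is the squared error, so maximizing this term is the same as regressing the predicted mean onto the code that was fed to the generator.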
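    mutual_info_lb = continuous_lambda * ll_continuous + categorical_lambda * ll_categorical

This weighted sum of the two log-likelihoods is the objective for the mutual information between c and G(z,c): it is the variational lower bound, up to the constant entropy of c.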
In the training loop, the discriminator is updated first:

    # train discriminator
    noise = sample_noise(batch_size)
    _, summary_result1, disc_obj, infogan_obj = sess.run(
        [train_discriminator, discriminator_obj_summary, discriminator_obj, neg_mutual_info_objective],
        feed_dict={
            true_images: batch,
            zc_vectors: noise,
            is_training_discriminator: True,
            is_training_generator: True
        }
    )

And line 438:
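    # train generator
    noise = sample_noise(batch_size)
    _, _, summary_result2, gen_obj, infogan_obj = sess.run(
        [train_generator, train_mutual_info, generator_obj_summary, generator_obj, neg_mutual_info_objective],
        feed_dict={
            zc_vectors: noise,
            is_training_discriminator: True,
            is_training_generator: True
        }
    )

Note that train_mutual_info is run in the same sess.run call as train_generator, whereas the discriminator step only evaluates neg_mutual_info_objective.

Now for the experimental results:
Changing the categorical variable produces images of different digits; changing the continuous variables changes the slant and the stroke width of the generated digits.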