6.1 Image Recognition Problems and Classic Datasets
The CIFAR dataset is a highly influential image classification dataset. It comes in two variants, CIFAR-10 and CIFAR-100, both subsets of the roughly 8 million images of the Visual Dictionary project. CIFAR images are 32x32 color images.
CIFAR-10 collects 60,000 images from 10 different classes. Every image has a fixed size and contains exactly one kind of entity. Compared with MNIST, the biggest differences are that the images are color rather than grayscale and that classification is harder.
Compared with real-world image recognition, both MNIST and CIFAR have two major limitations. First, real-life images have far higher resolution than 32x32, and their resolution is not fixed. Second, real life has a great many object categories; neither 10 nor 100 classes is anywhere near enough, and a real photo rarely contains only one kind of object.
ImageNet largely addresses these two problems and is much closer to real-world image recognition.
ImageNet is a large image database organized around WordNet. Nearly 15 million images are linked to roughly 20,000 WordNet noun synsets. Each WordNet synset used by ImageNet stands for a real-world entity and can be treated as one class in a classification problem. A single image may contain the entities of several synsets.
The ILSVRC2012 image classification dataset is a subset of ImageNet containing 1.2 million images from 1,000 categories, where each image belongs to exactly one category. Because the images were crawled straight from the web, their file sizes range from a few kilobytes to several megabytes.
Top-N accuracy is the probability that the correct answer appears among the first N answers given by the recognition algorithm. For image classification, many papers compare models by the accuracy of their first N answers, with N usually taken as 3 or 5.
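As a concrete illustration, here is a minimal NumPy sketch of computing top-N accuracy (a helper written for this note, not taken from any library):

import numpy as np

def top_n_accuracy(scores, labels, n=5):
    # scores: (num_examples, num_classes) prediction scores; labels: (num_examples,) class ids
    top_n = np.argsort(scores, axis=1)[:, -n:]  # indices of the n highest-scoring classes
    hits = [label in row for row, label in zip(top_n, labels)]
    return np.mean(hits)

# toy check: 3 examples over 4 classes
scores = np.array([[0.1, 0.6, 0.2, 0.1],
                   [0.3, 0.1, 0.5, 0.1],
                   [0.25, 0.25, 0.4, 0.1]])
labels = np.array([1, 0, 3])
print(top_n_accuracy(scores, labels, n=3))  # 2 of the 3 labels fall in the top 3 -> ~0.667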
6.2 Introduction to Convolutional Neural Networks
In a fully connected network, every node in one layer is connected by an edge to every node in the adjacent layers, so the nodes of each fully connected layer are usually drawn as a column, which makes the connection structure easy to display. In a convolutional network, only some nodes of adjacent layers are connected, so the nodes of each convolutional layer are usually organized as a three-dimensional matrix. Despite the visual difference, the overall architectures are very similar, and the inputs, outputs, and training procedure are essentially the same; the only difference lies in how adjacent layers are connected.
The biggest problem with using a fully connected network on images is that the fully connected layers have far too many parameters; besides slowing down computation, so many parameters easily lead to overfitting. The point of a convolutional network is to reduce the number of parameters.
In the first few layers of a convolutional network, each node is connected to only some of the nodes in the previous layer.
The five kinds of structures in a convolutional neural network:
1. Input layer: the pixel matrix of an image, height x width x depth (color channels).
2. Convolutional layer: each node's input is only a small patch of the previous layer, commonly 3x3 or 5x5. The convolutional layer analyzes each small patch in more depth to obtain features of a higher level of abstraction. In general, passing through a convolutional layer increases the depth of the node matrix.
3. Pooling layer: does not change the depth of the three-dimensional matrix, but shrinks its height and width. Pooling can be thought of as converting a high-resolution image into a lower-resolution one. It further reduces the number of nodes entering the final fully connected layers, and hence the number of parameters in the whole network.
4. Fully connected layer: after several rounds of convolution and pooling, the information in the image can be considered abstracted into features with higher information content. Once the convolutional and pooling layers have finished automatic feature extraction, fully connected layers are still needed to perform the classification.
5. Softmax layer: used for classification; it yields the probability distribution of the current example over the different classes.
6.3 Common Structures in Convolutional Neural Networks
The most important part of a convolutional layer is the filter, also called the kernel. A filter converts a sub node matrix of the current layer into a unit node matrix of the next layer: a matrix with height and width 1 and unrestricted depth.
The height and width of the node matrix a filter processes are specified manually, and this size is called the filter size; common sizes are 3x3 and 5x5. Because the depth the filter processes always equals the depth of the current layer's node matrix, the filter size needs to be specified in only two dimensions even though node matrices are three-dimensional.
The other setting that must be specified manually is the depth of the unit node matrix the filter produces, called the filter depth.
(Locally) The filter's forward pass computes the nodes of the unit node matrix on the right from the nodes of the small matrix on the left; as in a fully connected layer, it uses weights and a bias term. See figure 6-8.
(Globally) The convolutional layer's forward pass moves one filter from the top-left corner of the current layer to the bottom-right corner, computing the corresponding unit matrix at every position. See figure 6-10.
Each time the filter moves it produces one value (k values when the filter depth is k); stitching these values together into a new matrix completes the convolutional layer's forward pass.
When the filter is larger than 1x1, the matrix produced by the forward pass is smaller than the current layer's matrix.
To avoid this change of size, the border of the current layer's matrix can be padded with zeros. See figure 6-11.
Besides all-zero padding, the size of the result matrix can also be adjusted by setting the stride with which the filter moves. Figure 6-12 shows the forward pass with a stride of 2 and all-zero padding.
Determining the output size:

With all-zero padding (SAME):
out_length = ceil(in_length / stride_length)
out_width  = ceil(in_width / stride_width)

Without all-zero padding (VALID):
out_length = ceil((in_length - filter_length + 1) / stride_length)
out_width  = ceil((in_width - filter_width + 1) / stride_width)
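These formulas are easy to sanity-check in code (a throwaway helper written for this note):

import math

def conv_output_size(in_size, filter_size, stride, same_padding):
    # SAME: ceil(in / stride); VALID: ceil((in - filter + 1) / stride)
    if same_padding:
        return math.ceil(in_size / stride)
    return math.ceil((in_size - filter_size + 1) / stride)

print(conv_output_size(32, 5, 1, True))   # 32: SAME padding keeps the size at stride 1
print(conv_output_size(32, 5, 1, False))  # 28: VALID shrinks it by filter_size - 1
print(conv_output_size(28, 2, 2, True))   # 14: e.g. 2x2 pooling with stride 2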
A very important property of convolutional neural networks is that every position in a convolutional layer uses the same filter parameters, which makes the recognition of an image's content insensitive to location. Taking MNIST handwritten digit recognition as an example, whether the digit "1" appears in the top-left or the bottom-right corner, the class of the picture is the same. Moreover, sharing each convolutional layer's filter parameters drastically reduces the number of parameters in the network. Take CIFAR-10: the input-layer matrix has dimensions 32x32x3; if the convolutional layer uses 5x5 filters with depth 16, it has 5*5*3*16+16 = 1216 parameters (you can picture it as a fully connected layer with a 5x5x3 input and a 16x1 output). The parameter count depends only on the filter size, the filter depth, and the depth of the current layer's node matrix, not on the image size, which is what lets convolutional networks scale to larger image data.
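The arithmetic is worth checking once (a throwaway calculation, not book code):

filter_h, filter_w, in_depth, out_depth = 5, 5, 3, 16
params = filter_h * filter_w * in_depth * out_depth + out_depth  # weights plus one bias per output channel
print(params)  # 1216, independent of the 32x32 image size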
Implementing the forward pass of a convolutional layer in TensorFlow:
import tensorflow as tf

x = tf.placeholder(tf.float32, shape=[None, 32, 32, 3], name='x-input')
# weight shape: filter height, filter width, current-layer depth, filter depth
filter_weight = tf.get_variable(
    'weights', shape=[5, 5, 3, 16],
    initializer=tf.truncated_normal_initializer(stddev=0.1)
)
biases = tf.get_variable(
    'biases', shape=[16], initializer=tf.constant_initializer(0.1)  # one bias per output channel (the filter depth)
)
# 1st argument: the current layer's node matrix. It is 4-D: the first dimension indexes
#     the input batch, and the last three are one node matrix (height x width x depth).
# 2nd argument: the convolutional layer's weights, i.e. the filter.
# 3rd argument: the strides along each dimension, a length-4 array whose first and last
#     entries must be 1, because the stride only applies to height and width.
# 4th argument: the padding, either 'SAME' or 'VALID'.
conv = tf.nn.conv2d(
    x, filter_weight, strides=[1, 1, 1, 1], padding='SAME'
)
# print(conv.shape)  # (?, 32, 32, 16): the depth becomes 16; by the formulas above the
# spatial size stays 32 with all-zero padding and would be 28 without it.

# Plain addition cannot be used here, because the same bias value must be added at every
# position of the matrix. As in figure 6-13: even though the next layer is 2x2, there is
# only one bias number (the depth is 1), and every entry of the 2x2 matrix adds it.
bias = tf.nn.bias_add(conv, biases)
actived_conv = tf.nn.relu(bias)

# Keep the four dimensions of the input, of the weights, and of the strides distinct.
6.3.2 Pooling Layers
A pooling layer mainly shrinks the matrix size and thereby reduces the parameters in the final fully connected layers. Using pooling both speeds up computation and helps prevent overfitting.
The forward pass of a pooling layer also moves a filter-like structure, but the computation inside the pooling filter is not a weighted sum of nodes; it is the simpler maximum or average operation. A pooling layer using the maximum is called a max pooling layer; one using the average is called an average pooling layer.
As with convolution filters, the pooling filter's size, whether to use all-zero padding, and the stride all have to be set manually, and the filter moves over the matrix in a similar way. The key difference is that a convolution filter spans the entire depth, whereas a pooling filter affects the nodes of only one depth slice, so a pooling filter moves along the depth dimension in addition to height and width.
In a convolutional layer over an input of depth 3, the filter spans all 3 depth slices and the three contributions are summed into a single value; in a pooling layer over an input of depth 2, the 2 depth slices are pooled separately.
Implementing the forward pass of a pooling layer in TensorFlow:
# 1st argument: the current layer's node matrix (a 4-D tensor).
# 2nd argument: the filter size, a length-4 array whose first and last entries must be 1,
#     so the pooling filter cannot span different examples or different depth slices
#     (unlike a convolution filter); the most common values are [1, 2, 2, 1] and [1, 3, 3, 1].
# 3rd argument: the strides, a length-4 array whose first and last entries must be 1,
#     so pooling cannot reduce the node-matrix depth or the number of examples.
# 4th argument: the padding.
pool = tf.nn.max_pool(actived_conv, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1], padding='SAME')
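A tiny numeric check of the per-slice behaviour (a sketch written for this note):

import numpy as np
import tensorflow as tf

# one 2x2 example with 2 depth slices; max pooling treats each slice separately
data = np.array([[[[1., 10.], [2., 20.]],
                  [[3., 30.], [4., 40.]]]], dtype=np.float32)  # shape (1, 2, 2, 2)
pooled = tf.nn.max_pool(tf.constant(data), ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='VALID')
with tf.Session() as sess:
    print(sess.run(pooled))  # [[[[ 4. 40.]]]]: the per-slice maxima, never mixed across depth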
The biggest difference between convolutional and pooling layers is in the filter: a convolution filter shape such as [5, 5, 3, 16] covers the full input depth and has its own output depth, while a pooling filter shape such as [1, 3, 3, 1] works within a single example and a single depth slice.
6.4 Classic Convolutional Network Models
6.4.1 The LeNet-5 Model
LeNet-5 was the first convolutional neural network successfully applied to digit recognition.
The LeNet-5 model takes a three-dimensional matrix (height x width x depth) as its input layer.
The number of parameters is far smaller than the number of connections. As for the connection count of a convolutional layer: each output node connects to filter_length x filter_width input nodes plus the bias, so the per-node count is, for example, 5x5+1; the extra +1 is the bias term.
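A concrete check on the first convolutional layer of the LeNet-5 implementation that follows (my own arithmetic, using its 28x28x1 input and 5x5 filters of depth 32):

# parameters are shared across positions, so they do not depend on the 28x28 image size
params = 5 * 5 * 1 * 32 + 32              # 832
# every one of the 28*28*32 output nodes has 5*5 input connections plus 1 bias connection
connections = 28 * 28 * 32 * (5 * 5 + 1)  # 652288
print(params, connections)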
Only the weights of the fully connected layers need regularization.
ReLU and dropout are not used on the last layer.
# mnist_inference.py

import tensorflow as tf

IMAGE_SIZE = 28
NUM_CHANNELS = 1  # grayscale
NUM_LABELS = 10

# size and depth of the first convolutional layer
CONV1_SIZE = 5
CONV1_DEEP = 32
# size and depth of the second convolutional layer
CONV2_SIZE = 5
CONV2_DEEP = 64
# number of nodes in the fully connected layer
FC_SIZE = 512

def get_weight_variable(shape, regularizer):
    weights = tf.get_variable('weight', shape, initializer=tf.truncated_normal_initializer(stddev=0.1))
    if regularizer:
        tf.add_to_collection('losses', regularizer(weights))
    return weights

def inference(input_tensor, train, regularizer):
    with tf.variable_scope('layer1-conv1'):
        # input 28x28x1; filter 5x5 with depth 32, stride 1; output 28x28x32
        conv1_weights = get_weight_variable([CONV1_SIZE, CONV1_SIZE, NUM_CHANNELS, CONV1_DEEP], None)
        conv1_biases = tf.get_variable('bias', [CONV1_DEEP], initializer=tf.constant_initializer(0.0))
        conv1 = tf.nn.conv2d(input_tensor, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv1_biases))

    with tf.name_scope('layer2-pool1'):
        # output 14x14x32
        pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    with tf.variable_scope('layer3-conv2'):
        # filter 5x5 with depth 64; output 14x14x64
        conv2_weights = get_weight_variable([CONV2_SIZE, CONV2_SIZE, CONV1_DEEP, CONV2_DEEP], None)
        conv2_biases = tf.get_variable('bias', [CONV2_DEEP], initializer=tf.constant_initializer(0.0))
        conv2 = tf.nn.conv2d(pool1, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
        relu2 = tf.nn.relu(tf.nn.bias_add(conv2, conv2_biases))

    with tf.name_scope('layer4-pool2'):
        # output 7x7x64
        pool2 = tf.nn.max_pool(relu2, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')

    # The fully connected layer expects feature vectors, so the 3-D matrix is flattened into a 1-D vector.
    pool_shape = pool2.get_shape().as_list()  # includes the batch dimension
    nodes = pool_shape[1] * pool_shape[2] * pool_shape[3]  # 3136
    reshaped = tf.reshape(pool2, [pool_shape[0], nodes])

    # Dropout randomly sets part of the nodes' outputs to 0 during training; it further improves
    # the model's robustness and prevents overfitting, and it is used only during training.
    with tf.variable_scope('layer5-fc1'):
        # only the fully connected weights are regularized
        fc1_weights = get_weight_variable([nodes, FC_SIZE], regularizer)
        fc1_biases = tf.get_variable('bias', shape=[FC_SIZE], initializer=tf.constant_initializer(0.1))
        fc1 = tf.nn.relu(tf.matmul(reshaped, fc1_weights) + fc1_biases)
        if train:
            fc1 = tf.nn.dropout(fc1, 0.5)

    with tf.variable_scope('layer6-fc2'):
        fc2_weights = get_weight_variable([FC_SIZE, NUM_LABELS], regularizer)
        fc2_biases = tf.get_variable('bias', shape=[NUM_LABELS], initializer=tf.constant_initializer(0.1))
        logit = tf.matmul(fc1, fc2_weights) + fc2_biases

    # No ReLU or dropout on the last layer; sparse_softmax_cross_entropy_with_logits is applied later.
    return logit


# mnist_train.py

# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.contrib.layers import l2_regularizer
import mnist_inference
import os
import numpy as np

BATCH_SIZE = 100

LEARNING_RATE_BASE = 0.8
LEARNING_RATE_DECAY = 0.99
REGULARIZATION_RATE = 0.0001  # lambda
TRAINING_STEPS = 30000
MOVING_AVERAGE_DACAY = 0.99

MODEL_SAVE_PATH = '/home/yangxl/files/save_model'
MODEL_NAME = 'conv2d.ckpt'


def train(mnist):
    # shape[0] cannot be None, because the reshape between the pooling layer and
    # the fully connected layer needs a concrete batch size.
    x = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [BATCH_SIZE, mnist_inference.NUM_LABELS], 'y-input')

    # regularization
    regularizer = l2_regularizer(REGULARIZATION_RATE)

    y = mnist_inference.inference(x, True, regularizer)

    global_step = tf.Variable(0, trainable=False)

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DACAY, global_step)
    variables_averages_op = variables_averages.apply(tf.trainable_variables())
    # Mutually exclusive classes. The labels are length-10 one-hot vectors, but this
    # function expects the class id of the correct answer, hence tf.argmax.
    cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=y, labels=tf.argmax(y_, 1))
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    loss = cross_entropy_mean + tf.add_n(tf.get_collection('losses'))

    learning_rate = tf.train.exponential_decay(LEARNING_RATE_BASE, global_step, mnist.train.num_examples / BATCH_SIZE,
                                               LEARNING_RATE_DECAY, staircase=True)
    train_step = tf.train.GradientDescentOptimizer(learning_rate).minimize(loss, global_step)
    with tf.control_dependencies([train_step, variables_averages_op]):
        train_op = tf.no_op(name='train')

    saver = tf.train.Saver()

    with tf.Session() as sess:
        tf.global_variables_initializer().run()

        for i in range(TRAINING_STEPS):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)  # xs.shape=(100, 784)
            reshaped_xs = np.reshape(xs, [BATCH_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS])
            _, loss_value, step = sess.run([train_op, loss, global_step], feed_dict={x: reshaped_xs, y_: ys})

            if i % 1000 == 0:
                print('after %d training steps, loss on training batch is %g ' % (i, loss_value))
                saver.save(sess, os.path.join(MODEL_SAVE_PATH, MODEL_NAME), global_step=global_step)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    train(mnist)


if __name__ == '__main__':
    tf.app.run()


# mnist_eval.py

# -*- coding: utf-8 -*-
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
import mnist_inference
import mnist_train
import time
import numpy as np

# reload the latest model every EVAL_INTERVAL_SECS seconds and evaluate its accuracy on the test data
EVAL_INTERVAL_SECS = 60

def evaluate(mnist):
    x = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.IMAGE_SIZE, mnist_inference.IMAGE_SIZE, mnist_inference.NUM_CHANNELS], 'x-input')
    y_ = tf.placeholder(tf.float32, [mnist.test.num_examples, mnist_inference.NUM_LABELS], 'y-input')

    y = mnist_inference.inference(x, False, None)

    correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # moving average
    variables_averages = tf.train.ExponentialMovingAverage(mnist_train.MOVING_AVERAGE_DACAY)
    variables_to_restore = variables_averages.variables_to_restore()

    saver = tf.train.Saver(variables_to_restore)  # the shadow variables must have been saved during training for this to load them

    while True:
        with tf.Session() as sess:
            reshape_x = np.reshape(mnist.test.images, [-1, 28, 28, 1])
            validate_feed = {x: reshape_x, y_: mnist.test.labels}

            # find the latest model file in the directory via the checkpoint file
            ckpt = tf.train.get_checkpoint_state(mnist_train.MODEL_SAVE_PATH)
            if ckpt and ckpt.model_checkpoint_path:
                saver.restore(sess, ckpt.model_checkpoint_path)

                global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
                accuracy_score = sess.run(accuracy, feed_dict=validate_feed)
                print('after %s training steps, validation accuracy = %g ' % (global_step, accuracy_score))
            else:
                print('No checkpoint file found')
                return
        time.sleep(EVAL_INTERVAL_SECS)


def main(argv=None):
    mnist = input_data.read_data_sets('/home/yangxl/files/mnist', one_hot=True)
    evaluate(mnist)


if __name__ == '__main__':
    tf.app.run()
The code looks correct, yet the loss does not settle and the accuracy stays around 0.117 when it should be roughly 99.4%. One plausible cause (my own guess, not from the book): the base learning rate of 0.8 is far too aggressive for this convolutional network and can make training diverge; reducing LEARNING_RATE_BASE to around 0.01 is usually enough to let it converge.
No single convolutional architecture solves every problem; LeNet-5, for example, cannot handle a dataset as large as ImageNet well.
The following regular-expression-style formula summarizes some classic convolutional architectures for image classification: input layer --> (convolutional layer+ --> pooling layer?)+ --> fully connected layer+
Most convolutional networks use at most three convolutional layers in a row.
After several rounds of convolution and pooling, a convolutional network usually passes through one or two fully connected layers before the output.
For filter depth, most convolutional networks increase it layer by layer.
6.4.2 The Inception-v3 Model
In LeNet-5, different convolutional layers are connected in series; in Inception-v3, the Inception module combines different convolutional layers in parallel.
Section 6.4.1 noted that a convolutional layer may use filters with side 1, 3, or 5, so how do you choose among these sizes? The Inception module's answer is to use all the different filter sizes at once and then concatenate the resulting matrices.
Although the filter sizes differ, if all filters use all-zero padding and stride 1, the forward pass yields result matrices whose height and width match the input matrix, so the matrices produced by the different filters can be concatenated into a single deeper matrix, as in the sketch below.
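A stripped-down sketch of one such module (my own illustration in raw TensorFlow, not the book's code; the branch depths 64, 96, and 48 are arbitrary choices for the example):

import tensorflow as tf

inputs = tf.placeholder(tf.float32, [None, 35, 35, 192])

def conv_relu(x, depth, size, name):
    # all-zero padding and stride 1 keep the height and width unchanged
    with tf.variable_scope(name):
        w = tf.get_variable('w', [size, size, x.get_shape().as_list()[-1], depth],
                            initializer=tf.truncated_normal_initializer(stddev=0.1))
        b = tf.get_variable('b', [depth], initializer=tf.constant_initializer(0.0))
        return tf.nn.relu(tf.nn.bias_add(tf.nn.conv2d(x, w, [1, 1, 1, 1], 'SAME'), b))

branch_1x1 = conv_relu(inputs, 64, 1, 'branch_1x1')
branch_3x3 = conv_relu(inputs, 96, 3, 'branch_3x3')
branch_5x5 = conv_relu(inputs, 48, 5, 'branch_5x5')
# heights and widths all match, so the branches concatenate along depth: 64 + 96 + 48 = 208
net = tf.concat([branch_1x1, branch_3x3, branch_5x5], axis=3)
print(net.shape)  # (?, 35, 35, 208)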
The Inception-v3 model has 46 layers in total (the boxed layer count in the figure) and is composed of 11 Inception modules (the boxes in the figure); it contains 96 convolutional layers.
The Inception-v3 model implemented with the slim library:
import tensorflow as tf
import tensorflow.contrib.slim as slim

trunc_normal = lambda stddev: tf.truncated_normal_initializer(0.0, stddev)

def inception_v3_base(inputs,
                      final_endpoint='Mixed_7c',
                      min_depth=16,
                      depth_multiplier=1.0,
                      scope=None):
    end_points = {}

    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs]):
        # arg_scope sets the default argument values
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='VALID'):
            # 299 x 299 x 3
            end_point = 'Conv2d_1a_3x3'  # scope names use letters, digits and underscores; 'x' stands in for the multiplication sign
            # no all-zero padding
            net = slim.conv2d(inputs, depth(32), [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 149 x 149 x 32
            end_point = 'Conv2d_2a_3x3'
            # no all-zero padding, stride 1
            net = slim.conv2d(net, depth(32), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 32
            end_point = 'Conv2d_2b_3x3'
            net = slim.conv2d(net, depth(64), [3, 3], padding='SAME', scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 147 x 147 x 64
            end_point = 'MaxPool_3a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 64
            end_point = 'Conv2d_3b_1x1'
            net = slim.conv2d(net, depth(80), [1, 1], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 73 x 73 x 80
            end_point = 'Conv2d_4a_3x3'
            net = slim.conv2d(net, depth(192), [3, 3], scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 71 x 71 x 192
            end_point = 'MaxPool_5a_3x3'
            net = slim.max_pool2d(net, [3, 3], stride=2, scope=end_point)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            # 35 x 35 x 192

        # Inception blocks
        with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                            stride=1,
                            padding='SAME'):
            # mixed: 35 x 35 x 256
            end_point = 'Mixed_5b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(32), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_1: 35 x 35 x 288
            end_point = 'Mixed_5c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0b_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv_1_0c_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_2: 35 x 35 x 288
            end_point = 'Mixed_5d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(48), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(64), [5, 5], scope='Conv2d_0b_5x5')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = slim.conv2d(branch_2, depth(96), [3, 3], scope='Conv2d_0c_3x3')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(64), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_3: 17 x 17 x 768
            end_point = 'Mixed_6a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(384), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(64), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], scope='Conv2d_0b_3x3')
                    branch_1 = slim.conv2d(branch_1, depth(96), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_1x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_4: 17 x 17 x 768
            end_point = 'Mixed_6b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    # with SAME padding and stride 1 the output size is unchanged, even though the filter's height and width differ
                    branch_1 = slim.conv2d(branch_1, depth(128), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(128), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(128), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(128), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_5: 17 x 17 x 768
            end_point = 'Mixed_6c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_6: 17 x 17 x 768
            end_point = 'Mixed_6d'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(160), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(160), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(160), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(160), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_7: 17 x 17 x 768
            end_point = 'Mixed_6e'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0b_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0c_1x7')
                    branch_2 = slim.conv2d(branch_2, depth(192), [7, 1], scope='Conv2d_0d_7x1')
                    branch_2 = slim.conv2d(branch_2, depth(192), [1, 7], scope='Conv2d_0e_1x7')
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_8: 8 x 8 x 1280
            end_point = 'Mixed_7a'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_0 = slim.conv2d(branch_0, depth(320), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(192), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [1, 7], scope='Conv2d_0b_1x7')
                    branch_1 = slim.conv2d(branch_1, depth(192), [7, 1], scope='Conv2d_0c_7x1')
                    branch_1 = slim.conv2d(branch_1, depth(192), [3, 3], stride=2, padding='VALID', scope='Conv2d_1a_3x3')
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.max_pool2d(net, [3, 3], stride=2, padding='VALID', scope='MaxPool_1a_3x3')
                net = tf.concat([branch_0, branch_1, branch_2], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_9: 8 x 8 x 2048
            end_point = 'Mixed_7b'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0b_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points

            # mixed_10: 8 x 8 x 2048
            end_point = 'Mixed_7c'
            with tf.variable_scope(end_point):
                with tf.variable_scope('Branch_0'):
                    branch_0 = slim.conv2d(net, depth(320), [1, 1], scope='Conv2d_0a_1x1')
                with tf.variable_scope('Branch_1'):
                    branch_1 = slim.conv2d(net, depth(384), [1, 1], scope='Conv2d_0a_1x1')
                    branch_1 = tf.concat(
                        [
                            slim.conv2d(branch_1, depth(384), [1, 3], scope='Conv2d_0b_1x3'),
                            slim.conv2d(branch_1, depth(384), [3, 1], scope='Conv2d_0c_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_2'):
                    branch_2 = slim.conv2d(net, depth(448), [1, 1], scope='Conv2d_0a_1x1')
                    branch_2 = slim.conv2d(branch_2, depth(384), [3, 3], scope='Conv2d_0b_3x3')
                    branch_2 = tf.concat(
                        [
                            slim.conv2d(branch_2, depth(384), [1, 3], scope='Conv2d_0c_1x3'),
                            slim.conv2d(branch_2, depth(384), [3, 1], scope='Conv2d_0d_3x1')
                        ],
                        3)
                with tf.variable_scope('Branch_3'):
                    branch_3 = slim.avg_pool2d(net, [3, 3], scope='AvgPool_0a_3x3')
                    branch_3 = slim.conv2d(branch_3, depth(192), [1, 1], scope='Conv2d_0b_1x1')
                net = tf.concat([branch_0, branch_1, branch_2, branch_3], 3)
            end_points[end_point] = net
            if end_point == final_endpoint:
                return net, end_points
            raise ValueError('Unknown final endpoint %s' % final_endpoint)


# In the original source this helper appears after its callers; that is fine in Python
# because names are resolved when a call executes, not when the calling function is defined.
def _reduced_kernel_size_for_small_input(input_tensor, kernel_size):
    '''
    Define kernel size which is automatically reduced for small input.

    If the shape of the input images is unknown at graph construction time, this
    function assumes that the input images are large enough.
    '''
    shape = input_tensor.get_shape().as_list()  # e.g. [?, 5, 5, 128]
    if shape[1] is None or shape[2] is None:
        kernel_size_out = kernel_size
    else:
        kernel_size_out = [min(shape[1], kernel_size[0]), min(shape[2], kernel_size[1])]
    return kernel_size_out


def inception_v3(inputs,
                 num_classes=1000,
                 is_training=True,
                 dropout_keep_prob=0.8,
                 min_depth=16,
                 depth_multiplier=1.0,
                 prediction_fn=slim.softmax,
                 spatial_squeeze=True,
                 reuse=None,
                 scope='InceptionV3'):
    if depth_multiplier <= 0:
        raise ValueError('depth_multiplier is not greater than zero.')
    depth = lambda d: max(int(d * depth_multiplier), min_depth)

    with tf.variable_scope(scope, 'InceptionV3', [inputs, num_classes], reuse=reuse) as scope:
        with slim.arg_scope([slim.batch_norm, slim.dropout], is_training=is_training):
            net, end_points = inception_v3_base(inputs, scope=scope, min_depth=min_depth, depth_multiplier=depth_multiplier)

            # Auxiliary head: the auxiliary classifier from the Inception papers. It attaches
            # an extra classification head to the Mixed_6e output; its loss is used only during
            # training, to inject additional gradient into the middle of the network.
            with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                                stride=1,
                                padding='SAME'):
                aux_logits = end_points['Mixed_6e']  # 17 x 17 x 768
                with tf.variable_scope('AuxLogits'):
                    aux_logits = slim.avg_pool2d(aux_logits, [5, 5], stride=3, padding='VALID', scope='AvgPool_1a_5x5')  # 5 x 5 x 768
                    aux_logits = slim.conv2d(aux_logits, depth(128), [1, 1], scope='Conv2d_1b_1x1')  # 5 x 5 x 128

                    # shape of the feature map before the final layer
                    kernel_size = _reduced_kernel_size_for_small_input(aux_logits, [5, 5])
                    # 1 x 1 x 768: the input size equals the filter size, consistent with the output-size formulas
                    aux_logits = slim.conv2d(aux_logits, depth(768), kernel_size, padding='VALID', weights_initializer=trunc_normal(0.01), scope='Conv2d_2a_{}x{}'.format(*kernel_size))
                    # 1 x 1 x num_classes
                    aux_logits = slim.conv2d(aux_logits, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, weights_initializer=trunc_normal(0.001), scope='Conv2d_2b_1x1')
                    if spatial_squeeze:
                        # (?, num_classes)
                        aux_logits = tf.squeeze(aux_logits, [1, 2], name='SpatialSqueeze')
                    end_points['AuxLogits'] = aux_logits

            # final pooling and prediction
            with tf.variable_scope('Logits'):
                kernel_size = _reduced_kernel_size_for_small_input(net, [8, 8])
                # 1 x 1 x 2048
                net = slim.avg_pool2d(net, kernel_size, padding='VALID', scope='AvgPool_1a_{}x{}'.format(*kernel_size))
                # dropout is applied here, just before the final 1x1 convolution
                net = slim.dropout(net, keep_prob=dropout_keep_prob, scope='Dropout_1b')
                end_points['PreLogits'] = net
                logits = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, normalizer_fn=None, scope='Conv2d_1c_1x1')
                if spatial_squeeze:
                    # (?, num_classes)
                    logits = tf.squeeze(logits, [1, 2], name='SpatialSqueeze')
            end_points['Logits'] = logits
            end_points['Predictions'] = prediction_fn(logits, scope='Predictions')

    return logits, end_points


# used when the model is defined during transfer learning
def inception_v3_arg_scope(weight_decay=0.00004,
                           batch_norm_var_collection='moving_vars',
                           batch_norm_decay=0.9997,
                           batch_norm_epsilon=0.001,
                           updates_collections=tf.GraphKeys.UPDATE_OPS,
                           use_fused_batchnorm=True):
    """Defines the default InceptionV3 arg scope.
    Returns:
      An `arg_scope` to use for the inception v3 model.
    """
    batch_norm_params = {
        # Decay for the moving averages.
        'decay': batch_norm_decay,
        # epsilon to prevent 0s in variance.
        'epsilon': batch_norm_epsilon,
        # collection containing update_ops.
        'updates_collections': updates_collections,
        # Use fused batch norm if possible.
        'fused': use_fused_batchnorm,
        # collection containing the moving mean and moving variance.
        'variables_collections': {
            'beta': None,
            'gamma': None,
            'moving_mean': [batch_norm_var_collection],
            'moving_variance': [batch_norm_var_collection],
        }
    }

    # Set weight_decay for weights in Conv and FC layers.
    with slim.arg_scope([slim.conv2d, slim.fully_connected], weights_regularizer=slim.l2_regularizer(weight_decay)):
        with slim.arg_scope(
                [slim.conv2d],
                weights_initializer=slim.variance_scaling_initializer(),
                activation_fn=tf.nn.relu,
                normalizer_fn=slim.batch_norm,
                normalizer_params=batch_norm_params) as sc:
            return sc


inputs = tf.placeholder(tf.float32, shape=[None, 299, 299, 3], name='X')
# inception_v3_base(inputs)
inception_v3(inputs)
The input layer is a 299x299x3 three-dimensional matrix.
6.5 Transfer Learning with Convolutional Neural Networks
Transfer learning takes a model trained on one problem and adapts it, with only minor adjustments, to a new problem.
For example, an Inception-v3 model trained on ImageNet can be used to solve a new image classification problem: keep the parameters of all the trained convolutional layers and replace only the final fully connected layer. The layer just before this last fully connected layer is called the bottleneck layer; the term refers to that single layer.
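In its simplest form, transfer learning then reduces to training one new layer on top of the bottleneck outputs. A minimal sketch (my own illustration with made-up names, assuming the 2048-dimensional bottleneck values have already been computed for each image):

import tensorflow as tf

BOTTLENECK_SIZE = 2048  # width of Inception-v3's bottleneck layer
N_CLASSES = 5           # number of classes in the new problem

bottlenecks = tf.placeholder(tf.float32, [None, BOTTLENECK_SIZE], name='bottlenecks')
labels = tf.placeholder(tf.int64, [None], name='labels')

# the only new parameters: one fully connected layer from the bottleneck to the new classes
weights = tf.Variable(tf.truncated_normal([BOTTLENECK_SIZE, N_CLASSES], stddev=0.001))
biases = tf.Variable(tf.zeros([N_CLASSES]))
logits = tf.matmul(bottlenecks, weights) + biases

loss = tf.reduce_mean(tf.nn.sparse_softmax_cross_entropy_with_logits(labels=labels, logits=logits))
# only the new layer's variables exist in this graph, so only they are trained
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)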
Generally, when enough data is available, transfer learning does not match the results of training the whole network from scratch.
The transfer learning workflow
Preprocessing the raw image files (this step needs roughly a 2-core / 8 GB machine to run):
import os
import glob
import tensorflow as tf
import numpy as np

INPUT_DATA = '/home/yangxl/flower_photos'  # input directory
OUTPUT_DATA = '/home/yangxl/flower_processed_data.npy'  # output file

VALIDATION_PERCENTAGE = 10
TEST_PERCENTAGE = 10

def create_image_lists(sess, testing_percentage, validation_percentage):
    sub_dirs = [x[0] for x in os.walk(INPUT_DATA)]  # the current directory and its subdirectories
    is_root_dir = True

    # initialize the datasets
    training_images = []
    training_labels = []
    testing_images = []
    testing_labels = []
    validation_images = []
    validation_labels = []
    current_labels = 0

    # read all subdirectories
    for sub_dir in sub_dirs:
        if is_root_dir:  # skip the first entry, which is the root directory itself
            is_root_dir = False
            continue

        # collect all image files in one subdirectory
        extensions = ['jpg', 'jpeg', 'JPG', 'JPEG']
        file_list = []
        dir_name = os.path.basename(sub_dir)  # the part after the last '/'
        print(dir_name)
        for extension in extensions:
            file_glob = os.path.join(INPUT_DATA, dir_name, '*.' + extension)
            file_list.extend(glob.glob(file_glob))  # glob.glob returns the list of paths matching the pattern
        if not file_list:
            continue

        # process the image data
        for file_name in file_list:
            image_raw_data = tf.gfile.GFile(file_name, 'rb').read()  # raw bytes
            image = tf.image.decode_jpeg(image_raw_data)  # uint8 tensor, e.g. 333x500x3, channel values 0~255
            if image.dtype != tf.float32:
                image = tf.image.convert_image_dtype(image, dtype=tf.float32)  # channel values 0~1
            image = tf.image.resize_images(image, [299, 299])
            image_value = sess.run(image)  # numpy.ndarray

            # split the data randomly
            chance = np.random.randint(100)
            if chance < validation_percentage:
                validation_images.append(image_value)
                validation_labels.append(current_labels)
            elif chance < validation_percentage + testing_percentage:
                testing_images.append(image_value)
                testing_labels.append(current_labels)
            else:
                training_images.append(image_value)
                training_labels.append(current_labels)
        current_labels += 1

    # Shuffle the training data for better training results, keeping
    # training_images and training_labels aligned.
    state = np.random.get_state()
    np.random.shuffle(training_images)
    np.random.set_state(state)
    np.random.shuffle(training_labels)

    return np.asarray([training_images, training_labels,
                       validation_images, validation_labels,
                       testing_images, testing_labels])

def main():
    with tf.Session() as sess:
        processed_data = create_image_lists(sess, TEST_PERCENTAGE, VALIDATION_PERCENTAGE)
        # save the processed data in numpy format
        np.save(OUTPUT_DATA, processed_data)

if __name__ == '__main__':
    main()
A simpler way to obtain the cross entropy (as used in the example below):
tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)
train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())
A transfer learning example:
# -*- coding: utf-8 -*-

import tensorflow as tf
import numpy as np
import tensorflow.contrib.slim as slim

# load the inception-v3 model definition
import tensorflow.contrib.slim.python.slim.nets.inception_v3 as inception_v3

INPUT_DATA = '/home/yangxl/files/flower_processed_data.npy'

TRAIN_FILE = '/home/yangxl/files/save_model'
CKPT_FILE = '/home/yangxl/files/inception_v3.ckpt'

LEARNING_RATE = 0.0001
STEPS = 300
BATCH = 32
N_CLASSES = 5  # 5 kinds of flowers

CHECKPOINT_EXCLUDE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'
TRAINABLE_SCOPES = 'InceptionV3/Logits,InceptionV3/AuxLogits'

# collect all parameters that should be loaded from the pre-trained model
def get_tuned_variables():
    exclusions = [scope.strip() for scope in CHECKPOINT_EXCLUDE_SCOPES.split(',')]
    variables_to_restore = []

    # filter the parameters
    for var in slim.get_model_variables():  # the inception-v3 model must be defined first, otherwise there are no variables
        excluded = False
        for exclusion in exclusions:
            if var.op.name.startswith(exclusion):
                excluded = True
                break
        if not excluded:
            variables_to_restore.append(var)
    return variables_to_restore

# collect all variables that need to be trained
def get_trainable_variables():
    scopes = [scope.strip() for scope in TRAINABLE_SCOPES.split(',')]
    variables_to_train = []
    for scope in scopes:
        variables = tf.get_collection(tf.GraphKeys.TRAINABLE_VARIABLES, scope)  # the scope is matched as a regular expression
        variables_to_train.extend(variables)
    return variables_to_train

def main(arg=None):
    processed_data = np.load(INPUT_DATA)
    training_images = processed_data[0]
    n_training_example = len(training_images)
    training_labels = processed_data[1]
    validation_images = processed_data[2]
    validation_labels = processed_data[3]
    testing_images = processed_data[4]
    testing_labels = processed_data[5]
    print('%d training examples, %s validation examples and %d testing examples.'
          % (n_training_example, len(validation_labels), len(testing_labels)))

    images = tf.placeholder(tf.float32, [None, 299, 299, 3], name='input_images')
    labels = tf.placeholder(tf.int64, [None], name='labels')  # 5 kinds of flowers

    # Define the inception-v3 model. Google ships only the parameter values,
    # so the model structure has to be defined here in code.
    with slim.arg_scope(inception_v3.inception_v3_arg_scope()):
        # inception_v3_arg_scope() yields the default-argument dictionaries; nested
        # arg_scope dictionaries are merged, and the functions called inside
        # inception_v3.inception_v3 pick up these defaults.
        logits, _ = inception_v3.inception_v3(images, num_classes=N_CLASSES)

    # the variables that need training
    trainable_variables = get_trainable_variables()
    # print(len(trainable_variables), trainable_variables)
    '''
    [<tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/weights:0' shape=(1, 1, 2048, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/Logits/Conv2d_1c_1x1/biases:0' shape=(5,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/weights:0' shape=(1, 1, 768, 128) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_1b_1x1/BatchNorm/beta:0' shape=(128,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/weights:0' shape=(5, 5, 128, 768) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2a_5x5/BatchNorm/beta:0' shape=(768,) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/weights:0' shape=(1, 1, 768, 5) dtype=float32_ref>,
     <tf.Variable 'InceptionV3/AuxLogits/Conv2d_2b_1x1/biases:0' shape=(5,) dtype=float32_ref>]
    '''
    # Define the loss. The regularization losses were already added to the loss
    # collection when the model was defined.
    tf.losses.softmax_cross_entropy(tf.one_hot(labels, N_CLASSES), logits, weights=1.0)

    # define the training step
    train_step = tf.train.RMSPropOptimizer(LEARNING_RATE).minimize(tf.losses.get_total_loss())

    # compute the accuracy
    with tf.name_scope('evaluation'):
        correct_prediction = tf.equal(tf.argmax(logits, 1), labels)
        evaluation_step = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

    # Define the model-loading function. It returns a callback; calling callback(sess)
    # loads the variables listed by get_tuned_variables() into the current graph.
    load_fn = slim.assign_from_checkpoint_fn(CKPT_FILE, get_tuned_variables(), ignore_missing_vars=True)

    # saver for the newly trained model
    saver = tf.train.Saver()

    with tf.Session() as sess:
        # Initialize the variables that are not loaded. This must happen before loading
        # the model, otherwise initialization would overwrite the loaded values.
        tf.global_variables_initializer().run()
        # load the pre-trained model
        print('Loading tuned variables from %s' % CKPT_FILE)
        load_fn(sess)

        start = 0
        end = BATCH
        for i in range(STEPS):
            sess.run(train_step, feed_dict={
                images: training_images[start: end],
                labels: training_labels[start: end]
            })

            # logging
            if i % 30 == 0 or i + 1 == STEPS:
                saver.save(sess, TRAIN_FILE, global_step=i)
                validation_accuracy = sess.run(evaluation_step, feed_dict={
                    images: validation_images, labels: validation_labels
                })
                print('Step %d: Validation accuracy = %.1f%%' % (i, validation_accuracy * 100.0))

            start = end
            if start == n_training_example:
                start = 0
            end = start + BATCH
            if end > n_training_example:
                end = n_training_example
        test_accuracy = sess.run(evaluation_step, feed_dict={
            images: testing_images, labels: testing_labels
        })
        print('Final test accuracy = %.1f%%' % (test_accuracy * 100.0))

if __name__ == '__main__':
    tf.app.run()
Notes on running it:
The script ran for about 12 hours, yet the TIME+ column in top showed only 300-odd minutes. TIME+ accumulates CPU time rather than wall-clock time, so a process that spends most of its life waiting on I/O, or swapped out, shows far less TIME+ than elapsed time.
While it ran, the load average was quite high even though the process's CPU and memory usage was low. That pattern most likely indicates heavy paging between memory and swap: on Linux, tasks blocked in uninterruptible disk I/O count toward the load average even though they consume almost no CPU.