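Since TensorFlow 1.0, Google has shipped a library called slim. TF-Slim is a lightweight high-level API for TensorFlow. The module was first introduced in 2016, and its main purpose is to "slim down" code. Like the tf.contrib.layers module covered earlier in this series, it wraps many common TensorFlow functions a second time so that code becomes more concise. It is especially suited to building deep neural networks with complex structure, and it can be used to define, train, and evaluate complex models.
Why cover this topic here? Mainly because the TensorFlow models repository provides a large amount of network model code written with slim, together with checkpoint files of models trained with that code, which we can use as pre-trained models. So we need to know how to use the slim library.
To use the code in models, first verify that your TensorFlow installation includes the slim module, then download the models code from GitHub.
Before using slim, test whether the local tf.contrib.slim module works by entering the following command on the command line: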
python -c "import tensorflow.contrib.slim as slim; eval = slim.evaluation.evaluate_once"
If no error is raised, TF-Slim is working.
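The same check can be scripted so that it reports a yes/no answer instead of a traceback. This is only a minimal sketch: the helper name slim_available is my own and is not part of TensorFlow. Keep in mind that tf.contrib exists only in TensorFlow 1.x, so the check is expected to return False on TensorFlow 2.x.

```python
import importlib


def slim_available(module_name="tensorflow.contrib.slim"):
    """Return True if module_name can be imported, False otherwise.

    slim_available is a hypothetical helper name, not a TensorFlow API.
    """
    try:
        importlib.import_module(module_name)
    except Exception:  # any import-time failure means the module is unusable here
        return False
    return True
```

Calling slim_available() with no arguments checks tensorflow.contrib.slim; passing another module name checks that module instead.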
To use TF-Slim for image classification, you also have to install the TF-Slim image models library, which is not part of the core TF library. To do this, check out the tensorflow/models repository as follows:
cd $HOME/workspace
git clone https://github.com/tensorflow/models/
This will put the TF-Slim image models library in $HOME/workspace/models/research/slim. (It will also create a directory called models/inception, which contains an older version of slim; you can safely ignore this.)
To verify that this has worked, execute the following commands; it should run without raising any errors.
cd $HOME/workspace/models/research/slim
python -c "from nets import cifarnet; mynet = cifarnet.cifarnet"
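I am on Windows, so I downloaded the repository directly from https://github.com/tensorflow/models/.
slim is located under \models-master\research\slim and contains five folders: datasets, deployment, nets, preprocessing, and scripts.
This section focuses on three of them: datasets, nets, and preprocessing.
datasets holds the code for commonly used image training datasets. The supported datasets are cifar10, flowers, mnist, and imagenet.
Each code file is named after its dataset, and you can use this code to download or load the data. Taking imagenet as an example, the following function fetches the imagenet labels from the web: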
imagenet_map = imagenet.create_readable_names_for_imagenet_labels()
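The call above returns the readable class names of the 1000 imagenet classes, keyed by integer label in the same order as the samples.
The nets folder contains a variety of network models:
Each model file is named after its network, and the code inside follows roughly the same structure. Take inception_resnet_v2 as an example: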
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""Contains the definition of the Inception Resnet V2 architecture.

As described in http://arxiv.org/abs/1602.07261.

  Inception-v4, Inception-ResNet and the Impact of Residual Connections
    on Learning
  Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alex Alemi
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function


import tensorflow as tf

slim = tf.contrib.slim


def block35(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 35x35 resnet block."""
  with tf.variable_scope(scope, 'Block35', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 32, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 32, 3, scope='Conv2d_0b_3x3')
    with tf.variable_scope('Branch_2'):
      tower_conv2_0 = slim.conv2d(net, 32, 1, scope='Conv2d_0a_1x1')
      tower_conv2_1 = slim.conv2d(tower_conv2_0, 48, 3, scope='Conv2d_0b_3x3')
      tower_conv2_2 = slim.conv2d(tower_conv2_1, 64, 3, scope='Conv2d_0c_3x3')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_1, tower_conv2_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')
    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net


def block17(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 17x17 resnet block."""
  with tf.variable_scope(scope, 'Block17', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 128, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 160, [1, 7],
                                  scope='Conv2d_0b_1x7')
      tower_conv1_2 = slim.conv2d(tower_conv1_1, 192, [7, 1],
                                  scope='Conv2d_0c_7x1')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')

    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net


def block8(net, scale=1.0, activation_fn=tf.nn.relu, scope=None, reuse=None):
  """Builds the 8x8 resnet block."""
  with tf.variable_scope(scope, 'Block8', [net], reuse=reuse):
    with tf.variable_scope('Branch_0'):
      tower_conv = slim.conv2d(net, 192, 1, scope='Conv2d_1x1')
    with tf.variable_scope('Branch_1'):
      tower_conv1_0 = slim.conv2d(net, 192, 1, scope='Conv2d_0a_1x1')
      tower_conv1_1 = slim.conv2d(tower_conv1_0, 224, [1, 3],
                                  scope='Conv2d_0b_1x3')
      tower_conv1_2 = slim.conv2d(tower_conv1_1, 256, [3, 1],
                                  scope='Conv2d_0c_3x1')
    mixed = tf.concat(axis=3, values=[tower_conv, tower_conv1_2])
    up = slim.conv2d(mixed, net.get_shape()[3], 1, normalizer_fn=None,
                     activation_fn=None, scope='Conv2d_1x1')

    scaled_up = up * scale
    if activation_fn == tf.nn.relu6:
      # Use clip_by_value to simulate bandpass activation.
      scaled_up = tf.clip_by_value(scaled_up, -6.0, 6.0)

    net += scaled_up
    if activation_fn:
      net = activation_fn(net)
  return net


def inception_resnet_v2_base(inputs,
                             final_endpoint='Conv2d_7b_1x1',
                             output_stride=16,
                             align_feature_maps=False,
                             scope=None,
                             activation_fn=tf.nn.relu):
  """Inception model from http://arxiv.org/abs/1602.07261.

  Constructs an Inception Resnet v2 network from inputs to the given final
  endpoint. This method can construct the network up to the final inception
  block Conv2d_7b_1x1.

  Args:
    inputs: a tensor of size [batch_size, height, width, channels].
    final_endpoint: specifies the endpoint to construct the network up to. It
      can be one of ['Conv2d_1a_3x3', 'Conv2d_2a_3x3', 'Conv2d_2b_3x3',
      'MaxPool_3a_3x3', 'Conv2d_3b_1x1', 'Conv2d_4a_3x3', 'MaxPool_5a_3x3',
      'Mixed_5b', 'Mixed_6a', 'PreAuxLogits', 'Mixed_7a', 'Conv2d_7b_1x1']
    output_stride: A scalar that specifies the requested ratio of input to
      output spatial resolution. Only supports 8 and 16.
    align_feature_maps: When true, changes all the VALID paddings in the network
      to SAME padding so that the feature maps are aligned.
    scope: Optional variable_scope.
    activation_fn: Activation function for block scopes.

  Returns:
    tensor_out: output tensor corresponding to the final_endpoint.
    end_points: a set of activations for external use, for example summaries or
                losses.

  Raises:
    ValueError: if final_endpoint is not set to one of the predefined values,
      or if the output_stride is not 8 or 16, or if the output_stride is 8 and
      we request an end point after 'PreAuxLogits'.
  """
  if output_stride != 8 and output_stride != 16:
    raise ValueError('output_stride must be 8 or 16.')

  padding = 'SAME' if align_feature_maps else 'VALID'

  end_points = {}

  def add_and_check_final(name, net):
    end_points[name] = net
    return name == final_endpoint

  with tf.variable_scope(scope, 'InceptionResnetV2', [inputs]):
    with slim.arg_scope([slim.conv2d, slim.max_pool2d, slim.avg_pool2d],
                        stride=1, padding='SAME'):
      # 149 x 149 x 32
      net = slim.conv2d(inputs, 32, 3, stride=2, padding=padding,
                        scope='Conv2d_1a_3x3')
      if add_and_check_final('Conv2d_1a_3x3', net): return net, end_points

      # 147 x 147 x 32
      net = slim.conv2d(net, 32, 3, padding=padding,
                        scope='Conv2d_2a_3x3')
      if add_and_check_final('Conv2d_2a_3x3', net): return net, end_points
      # 147 x 147 x 64
      net = slim.conv2d(net, 64, 3, scope='Conv2d_2b_3x3')
      if add_and_check_final('Conv2d_2b_3x3', net): return net, end_points
      # 73 x 73 x 64
      net = slim.max_pool2d(net, 3, stride=2, padding=padding,
                            scope='MaxPool_3a_3x3')
      if add_and_check_final('MaxPool_3a_3x3', net): return net, end_points
      # 73 x 73 x 80
      net = slim.conv2d(net, 80, 1, padding=padding,
                        scope='Conv2d_3b_1x1')
      if add_and_check_final('Conv2d_3b_1x1', net): return net, end_points
      # 71 x 71 x 192
      net = slim.conv2d(net, 192, 3, padding=padding,
                        scope='Conv2d_4a_3x3')
      if add_and_check_final('Conv2d_4a_3x3', net): return net, end_points
      # 35 x 35 x 192
      net = slim.max_pool2d(net, 3, stride=2, padding=padding,
                            scope='MaxPool_5a_3x3')
      if add_and_check_final('MaxPool_5a_3x3', net): return net, end_points

      # 35 x 35 x 320
      with tf.variable_scope('Mixed_5b'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 96, 1, scope='Conv2d_1x1')
        with tf.variable_scope('Branch_1'):
          tower_conv1_0 = slim.conv2d(net, 48, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1_0, 64, 5,
                                      scope='Conv2d_0b_5x5')
        with tf.variable_scope('Branch_2'):
          tower_conv2_0 = slim.conv2d(net, 64, 1, scope='Conv2d_0a_1x1')
          tower_conv2_1 = slim.conv2d(tower_conv2_0, 96, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv2_2 = slim.conv2d(tower_conv2_1, 96, 3,
                                      scope='Conv2d_0c_3x3')
        with tf.variable_scope('Branch_3'):
          tower_pool = slim.avg_pool2d(net, 3, stride=1, padding='SAME',
                                       scope='AvgPool_0a_3x3')
          tower_pool_1 = slim.conv2d(tower_pool, 64, 1,
                                     scope='Conv2d_0b_1x1')
        net = tf.concat(
            [tower_conv, tower_conv1_1, tower_conv2_2, tower_pool_1], 3)

      if add_and_check_final('Mixed_5b', net): return net, end_points
      # TODO(alemi): Register intermediate endpoints
      net = slim.repeat(net, 10, block35, scale=0.17,
                        activation_fn=activation_fn)

      # 17 x 17 x 1088 if output_stride == 8,
      # 33 x 33 x 1088 if output_stride == 16
      use_atrous = output_stride == 8

      with tf.variable_scope('Mixed_6a'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 384, 3, stride=1 if use_atrous else 2,
                                   padding=padding,
                                   scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_1'):
          tower_conv1_0 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1_0, 256, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv1_2 = slim.conv2d(tower_conv1_1, 384, 3,
                                      stride=1 if use_atrous else 2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'):
          tower_pool = slim.max_pool2d(net, 3, stride=1 if use_atrous else 2,
                                       padding=padding,
                                       scope='MaxPool_1a_3x3')
        net = tf.concat([tower_conv, tower_conv1_2, tower_pool], 3)

      if add_and_check_final('Mixed_6a', net): return net, end_points

      # TODO(alemi): register intermediate endpoints
      with slim.arg_scope([slim.conv2d], rate=2 if use_atrous else 1):
        net = slim.repeat(net, 20, block17, scale=0.10,
                          activation_fn=activation_fn)
      if add_and_check_final('PreAuxLogits', net): return net, end_points

      if output_stride == 8:
        # TODO(gpapan): Properly support output_stride for the rest of the net.
        raise ValueError('output_stride==8 is only supported up to the '
                         'PreAuxlogits end_point for now.')

      # 8 x 8 x 2080
      with tf.variable_scope('Mixed_7a'):
        with tf.variable_scope('Branch_0'):
          tower_conv = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv_1 = slim.conv2d(tower_conv, 384, 3, stride=2,
                                     padding=padding,
                                     scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_1'):
          tower_conv1 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv1_1 = slim.conv2d(tower_conv1, 288, 3, stride=2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_2'):
          tower_conv2 = slim.conv2d(net, 256, 1, scope='Conv2d_0a_1x1')
          tower_conv2_1 = slim.conv2d(tower_conv2, 288, 3,
                                      scope='Conv2d_0b_3x3')
          tower_conv2_2 = slim.conv2d(tower_conv2_1, 320, 3, stride=2,
                                      padding=padding,
                                      scope='Conv2d_1a_3x3')
        with tf.variable_scope('Branch_3'):
          tower_pool = slim.max_pool2d(net, 3, stride=2,
                                       padding=padding,
                                       scope='MaxPool_1a_3x3')
        net = tf.concat(
            [tower_conv_1, tower_conv1_1, tower_conv2_2, tower_pool], 3)

      if add_and_check_final('Mixed_7a', net): return net, end_points

      # TODO(alemi): register intermediate endpoints
      net = slim.repeat(net, 9, block8, scale=0.20, activation_fn=activation_fn)
      net = block8(net, activation_fn=None)

      # 8 x 8 x 1536
      net = slim.conv2d(net, 1536, 1, scope='Conv2d_7b_1x1')
      if add_and_check_final('Conv2d_7b_1x1', net): return net, end_points

    raise ValueError('final_endpoint (%s) not recognized', final_endpoint)


def inception_resnet_v2(inputs, num_classes=1001, is_training=True,
                        dropout_keep_prob=0.8,
                        reuse=None,
                        scope='InceptionResnetV2',
                        create_aux_logits=True,
                        activation_fn=tf.nn.relu):
  """Creates the Inception Resnet V2 model.

  Args:
    inputs: a 4-D tensor of size [batch_size, height, width, 3].
      Dimension batch_size may be undefined. If create_aux_logits is false,
      also height and width may be undefined.
    num_classes: number of predicted classes. If 0 or None, the logits layer
      is omitted and the input features to the logits layer (before dropout)
      are returned instead.
    is_training: whether is training or not.
    dropout_keep_prob: float, the fraction to keep before final layer.
    reuse: whether or not the network and its variables should be reused. To be
      able to reuse 'scope' must be given.
    scope: Optional variable_scope.
    create_aux_logits: Whether to include the auxilliary logits.
    activation_fn: Activation function for conv2d.

  Returns:
    net: the output of the logits layer (if num_classes is a non-zero integer),
      or the non-dropped-out input to the logits layer (if num_classes is 0 or
      None).
    end_points: the set of end_points from the inception model.
  """
  end_points = {}

  with tf.variable_scope(scope, 'InceptionResnetV2', [inputs],
                         reuse=reuse) as scope:
    with slim.arg_scope([slim.batch_norm, slim.dropout],
                        is_training=is_training):

      net, end_points = inception_resnet_v2_base(inputs, scope=scope,
                                                 activation_fn=activation_fn)

      if create_aux_logits and num_classes:
        with tf.variable_scope('AuxLogits'):
          aux = end_points['PreAuxLogits']
          aux = slim.avg_pool2d(aux, 5, stride=3, padding='VALID',
                                scope='Conv2d_1a_3x3')
          aux = slim.conv2d(aux, 128, 1, scope='Conv2d_1b_1x1')
          aux = slim.conv2d(aux, 768, aux.get_shape()[1:3],
                            padding='VALID', scope='Conv2d_2a_5x5')
          aux = slim.flatten(aux)
          aux = slim.fully_connected(aux, num_classes, activation_fn=None,
                                     scope='Logits')
          end_points['AuxLogits'] = aux

      with tf.variable_scope('Logits'):
        # TODO(sguada,arnoegw): Consider adding a parameter global_pool which
        # can be set to False to disable pooling here (as in resnet_*()).
        kernel_size = net.get_shape()[1:3]
        if kernel_size.is_fully_defined():
          net = slim.avg_pool2d(net, kernel_size, padding='VALID',
                                scope='AvgPool_1a_8x8')
        else:
          net = tf.reduce_mean(net, [1, 2], keep_dims=True, name='global_pool')
        end_points['global_pool'] = net
        if not num_classes:
          return net, end_points
        net = slim.flatten(net)
        net = slim.dropout(net, dropout_keep_prob, is_training=is_training,
                           scope='Dropout')
        end_points['PreLogitsFlatten'] = net
        logits = slim.fully_connected(net, num_classes, activation_fn=None,
                                      scope='Logits')
        end_points['Logits'] = logits
        end_points['Predictions'] = tf.nn.softmax(logits, name='Predictions')

    return logits, end_points
inception_resnet_v2.default_image_size = 299


def inception_resnet_v2_arg_scope(weight_decay=0.00004,
                                  batch_norm_decay=0.9997,
                                  batch_norm_epsilon=0.001,
                                  activation_fn=tf.nn.relu):
  """Returns the scope with the default parameters for inception_resnet_v2.

  Args:
    weight_decay: the weight decay for weights variables.
    batch_norm_decay: decay for the moving average of batch_norm momentums.
    batch_norm_epsilon: small float added to variance to avoid dividing by zero.
    activation_fn: Activation function for conv2d.

  Returns:
    a arg_scope with the parameters needed for inception_resnet_v2.
  """
  # Set weight_decay for weights in conv2d and fully_connected layers.
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      weights_regularizer=slim.l2_regularizer(weight_decay),
                      biases_regularizer=slim.l2_regularizer(weight_decay)):

    batch_norm_params = {
        'decay': batch_norm_decay,
        'epsilon': batch_norm_epsilon,
        'fused': None,  # Use fused batch norm if possible.
    }
    # Set activation_fn and parameters for batch_norm.
    with slim.arg_scope([slim.conv2d], activation_fn=activation_fn,
                        normalizer_fn=slim.batch_norm,
                        normalizer_params=batch_norm_params) as scope:
      return scope
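The interface exposed by this network file is: inception_resnet_v2_base, which builds the network up to a given final_endpoint; inception_resnet_v2, which builds the full model and returns the logits together with the end_points dictionary; inception_resnet_v2.default_image_size, which is 299; and inception_resnet_v2_arg_scope, which returns the default arg_scope for the model.
The preprocessing folder contains several image preprocessing files, which are also named after models. slim puts the preprocessing functions commonly used by one family of models into a single file named after that family, and the functions in each file follow a similar structure. For example, the preprocessing function in inception_preprocessing is called as: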
inception_preprocessing.preprocess_image
This function resizes the input image to the size the model expects and normalizes it.
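To make "normalizes" concrete: to my understanding, the Inception preprocessing in slim ends by mapping pixel values to the range [-1, 1] (convert to [0, 1], subtract 0.5, multiply by 2). The sketch below shows only that value scaling in plain Python; cropping and resizing are left out, and the function names scale_pixel and scale_image are hypothetical, not part of slim.

```python
def scale_pixel(value):
    """Map an 8-bit pixel value in [0, 255] to the range [-1.0, 1.0]."""
    unit = value / 255.0         # [0, 255] -> [0.0, 1.0]
    return (unit - 0.5) * 2.0    # [0.0, 1.0] -> [-1.0, 1.0]


def scale_image(rows):
    """Apply scale_pixel to a nested list shaped [height][width][channels]."""
    return [[[scale_pixel(c) for c in pixel] for pixel in row] for row in rows]
```

Other model families use different conventions, which is why each family gets its own preprocessing file.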
As part of this library, we've included scripts to download several popular image datasets (listed below) and convert them to slim format.
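TFRecord is the dataset format recommended by TensorFlow and is tightly integrated with the framework. TensorFlow provides a set of interfaces for accessing TFRecord files. The format exists mainly to meet the need, when working with very large sample sets, to read data from disk while training is running. The raw files are converted to TFRecord and then read by multiple threads at run time, which takes load off the main training thread and makes training more efficient. For details on the TFRecord format, see the article "Section 12: Several ways to read data in TensorFlow and how to use queues".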
For each dataset, we'll need to download the raw data and convert it to TensorFlow's native TFRecord format. Each TFRecord contains a TF-Example protocol buffer. Below we demonstrate how to do this for the Flowers dataset.
$ DATA_DIR=/tmp/data/flowers
$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir="${DATA_DIR}"
There are two key points here: the dataset (flowers in this example) and the download path (here the data is stored under /tmp/data/flowers).
When the script finishes you will find several TFRecord files created:
These represent the training and validation data, sharded over 5 files each. You will also find the $DATA_DIR/labels.txt file which contains the mapping from integer labels to class names.
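The label file can be read back into a dictionary with a few lines of Python. The sketch below assumes each line has the form "integer label, colon, class name" (for example 0:daisy), which is my understanding of what the conversion script writes; the function name parse_label_lines is hypothetical.

```python
def parse_label_lines(lines):
    """Parse lines of the form '<integer label>:<class name>' into a dict."""
    labels_to_names = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. after a trailing newline
        index, name = line.split(":", 1)  # split on the first colon only
        labels_to_names[int(index)] = name
    return labels_to_names
```

Typical use would be to open labels.txt and pass the file object to parse_label_lines, since iterating over a file yields its lines.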
You can use the same script to create the mnist and cifar10 datasets. However, for ImageNet, you have to follow the instructions here. Note that you first have to sign up for an account at image-net.org. Also, the download can take several hours, and could use up to 500GB.
Let me walk through the code that is executed. Open download_and_convert_data.py; its contents are:
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
r"""Downloads and converts a particular dataset.

Usage:
```shell

$ python download_and_convert_data.py \
    --dataset_name=mnist \
    --dataset_dir=/tmp/mnist

$ python download_and_convert_data.py \
    --dataset_name=cifar10 \
    --dataset_dir=/tmp/cifar10

$ python download_and_convert_data.py \
    --dataset_name=flowers \
    --dataset_dir=/tmp/flowers
```
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import tensorflow as tf

from datasets import download_and_convert_cifar10
from datasets import download_and_convert_flowers
from datasets import download_and_convert_mnist

FLAGS = tf.app.flags.FLAGS

tf.app.flags.DEFINE_string(
    'dataset_name',
    None,
    'The name of the dataset to convert, one of "cifar10", "flowers", "mnist".')

tf.app.flags.DEFINE_string(
    'dataset_dir',
    None,
    'The directory where the output TFRecords and temporary files are saved.')


def main(_):
  if not FLAGS.dataset_name:
    raise ValueError('You must supply the dataset name with --dataset_name')
  if not FLAGS.dataset_dir:
    raise ValueError('You must supply the dataset directory with --dataset_dir')

  if FLAGS.dataset_name == 'cifar10':
    download_and_convert_cifar10.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'flowers':
    download_and_convert_flowers.run(FLAGS.dataset_dir)
  elif FLAGS.dataset_name == 'mnist':
    download_and_convert_mnist.run(FLAGS.dataset_dir)
  else:
    raise ValueError(
        'dataset_name [%s] was not recognized.' % FLAGS.dataset_name)

if __name__ == '__main__':
  tf.app.run()
The download_and_convert_flowers.run function is defined in download_and_convert_flowers.py. The code of run() is:
def run(dataset_dir):
  """Runs the download and conversion operation.

  Args:
    dataset_dir: The dataset directory where the dataset is stored.
  """
  if not tf.gfile.Exists(dataset_dir):
    tf.gfile.MakeDirs(dataset_dir)

  if _dataset_exists(dataset_dir):
    print('Dataset files already exist. Exiting without re-creating them.')
    return

  dataset_utils.download_and_uncompress_tarball(_DATA_URL, dataset_dir)
  photo_filenames, class_names = _get_filenames_and_classes(dataset_dir)
  class_names_to_ids = dict(zip(class_names, range(len(class_names))))

  # Divide into train and test:
  random.seed(_RANDOM_SEED)
  random.shuffle(photo_filenames)
  training_filenames = photo_filenames[_NUM_VALIDATION:]
  validation_filenames = photo_filenames[:_NUM_VALIDATION]

  # First, convert the training and validation sets.
  _convert_dataset('train', training_filenames, class_names_to_ids,
                   dataset_dir)
  _convert_dataset('validation', validation_filenames, class_names_to_ids,
                   dataset_dir)

  # Finally, write the labels file:
  labels_to_class_names = dict(zip(range(len(class_names)), class_names))
  dataset_utils.write_label_file(labels_to_class_names, dataset_dir)

  _clean_up_temporary_files(dataset_dir)
  print('\nFinished converting the Flowers dataset!')
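Here is only a rough outline of the execution flow. run() creates dataset_dir if needed and exits early when the dataset files already exist. Otherwise it downloads and uncompresses the tarball, collects the photo file names and class names, shuffles the file names with a fixed seed, and splits them into a training set and a validation set. It then converts both splits to TFRecord files, writes the label file, and removes the temporary files. Each sample written to a TFRecord file is built by the following function: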
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
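The shuffle, split, and shard steps of run() can be sketched in plain Python without TensorFlow. This is an illustration under assumptions, not the script's code: split_and_shard is a hypothetical name, it uses a private random.Random(seed) instead of seeding the global generator, and it assumes each split is cut into num_shards chunks of ceil(n / num_shards) file names.

```python
import math
import random


def split_and_shard(filenames, num_validation, num_shards, seed=0):
    """Shuffle reproducibly, split into validation/train, then shard each split."""
    filenames = list(filenames)              # copy so the caller's list is untouched
    random.Random(seed).shuffle(filenames)   # fixed seed -> same split on every run
    splits = {
        "validation": filenames[:num_validation],
        "train": filenames[num_validation:],
    }
    sharded = {}
    for split_name, names in splits.items():
        per_shard = int(math.ceil(len(names) / float(num_shards)))
        sharded[split_name] = [
            names[i * per_shard:(i + 1) * per_shard] for i in range(num_shards)
        ]
    return sharded
```

With 3670 flower photos, 350 validation images, and 5 shards, this yields 5 validation shards of 70 file names and 5 training shards of 664 file names.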
Now that the TFRecord files have been created, we can read the data from them.
# -*- coding: utf-8 -*-
"""
Created on Fri Jun  8 08:52:30 2018

@author: zy
"""

'''
Import the flowers dataset
'''

from datasets import download_and_convert_flowers
from preprocessing import vgg_preprocessing
from datasets import flowers
import matplotlib.pyplot as plt  # used later to display the image that is read
import tensorflow as tf

slim = tf.contrib.slim


def read_flower_image_and_label(dataset_dir, is_training=False):
    '''
    Download the flower_photos.tgz dataset, split it into a training set and
    a validation set, and convert the data to TFRecord format: 5 training
    files (3320 images), 5 validation files (350 images), and one label file
    that stores the class name for each integer label.

    args:
        dataset_dir: directory that holds the dataset
        is_training: True loads the training set, otherwise the validation set
    return:
        image, label: one randomly read image and its label
    '''
    download_and_convert_flowers.run(dataset_dir)

    '''
    Read the data in the TFRecord files with slim
    '''
    # Select the dataset split
    if is_training:
        dataset = flowers.get_split(split_name='train', dataset_dir=dataset_dir)
    else:
        dataset = flowers.get_split(split_name='validation', dataset_dir=dataset_dir)

    # Create a data provider
    provider = slim.dataset_data_provider.DatasetDataProvider(dataset)

    # provider.get fetches one random sample and returns two tensors
    [image, label] = provider.get(['image', 'label'])

    return image, label
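The code above first imports the required modules, then creates a provider and calls get to obtain the image and label tensors. At this point no data has actually been read; this step only builds the graph. The data becomes available only after a session has started the queue threads.
Next we start a session and read the data.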
if __name__ == '__main__':
    # test()
    # Read one image and its label
    image, label = read_flower_image_and_label('./datasets/data/flowers')

    '''
    Start a session and read the data
    '''
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        # Create a coordinator to manage the threads
        coord = tf.train.Coordinator()

        # Start the QueueRunners; only now do file names begin to be enqueued.
        threads = tf.train.start_queue_runners(sess=sess, coord=coord)

        img, lab = sess.run([image, label])
        plt.imshow(img)
        plt.title('Original image')
        plt.show()

        # Stop the threads
        coord.request_stop()
        coord.join(threads)
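If the coordinator and queue-runner pattern is unfamiliar, the following standard-library analogy may help. It is not TensorFlow code: a background thread fills a bounded queue (the role of the queue runners), the main thread takes items from it (the role of sess.run), and an Event plays the role of the coordinator's stop request. The name start_producer is hypothetical.

```python
import queue
import threading


def start_producer(items, out_queue, stop_event):
    """Start a background thread that feeds items into out_queue until stopped."""

    def worker():
        for item in items:
            # Retry the put until it succeeds or a stop has been requested.
            while not stop_event.is_set():
                try:
                    out_queue.put(item, timeout=0.1)
                    break
                except queue.Full:
                    continue
            if stop_event.is_set():
                return

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    return thread
```

In this analogy, stop_event.set() corresponds to coord.request_stop() and thread.join() corresponds to coord.join(threads). As in the TensorFlow version, nothing is produced until the thread has been started.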
What if we want to read several images at a time?
Each sample in the TFRecord files is defined as:
def image_to_tfexample(image_data, image_format, height, width, class_id):
  return tf.train.Example(features=tf.train.Features(feature={
      'image/encoded': bytes_feature(image_data),
      'image/format': bytes_feature(image_format),
      'image/class/label': int64_feature(class_id),
      'image/height': int64_feature(height),
      'image/width': int64_feature(width),
  }))
Suppose that during training we want to read data from the five generated TFRecord files and combine it into batches.
# Describe how each stored feature is parsed from a record
keys_to_features = {
    'image/encoded': tf.FixedLenFeature((), tf.string, default_value=''),
    'image/format': tf.FixedLenFeature((), tf.string, default_value='png'),
    'image/class/label': tf.FixedLenFeature(
        [], tf.int64, default_value=tf.zeros([], dtype=tf.int64)),
}

# Map the parsed features to the items the provider will return
items_to_handlers = {
    'image': slim.tfexample_decoder.Image('image/encoded', 'image/format'),
    'label': slim.tfexample_decoder.Tensor('image/class/label'),
}

decoder = slim.tfexample_decoder.TFExampleDecoder(
    keys_to_features, items_to_handlers)

dataset = slim.dataset.Dataset(
    data_sources=file_pattern,
    reader=tf.TFRecordReader,
    decoder=decoder,
    num_samples=SPLITS_TO_SIZES[split_name],  # total number of samples in the split
    items_to_descriptions=_ITEMS_TO_DESCRIPTIONS,
    num_classes=_NUM_CLASSES,
    labels_to_names=labels_to_names  # dict of the form id: class name
)

provider = slim.dataset_data_provider.DatasetDataProvider(
    dataset,
    num_readers=FLAGS.num_readers,
    common_queue_capacity=20 * FLAGS.batch_size,
    common_queue_min=10 * FLAGS.batch_size)

[image, label] = provider.get(['image', 'label'])

# Image preprocessing
image = preprocessing_image(image, train_image_size, train_image_size)

images, labels = tf.train.batch(
    [image, label],
    batch_size=FLAGS.batch_size,
    num_threads=FLAGS.num_preprocessing_threads,
    capacity=5 * FLAGS.batch_size)
labels = slim.one_hot_encoding(
    labels, dataset.num_classes - FLAGS.labels_offset)
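Because each sample read by DatasetDataProvider is already random, there is no need to use tf.train.shuffle_batch when assembling batches afterwards. The code for reading batch_size samples at a time is: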
def get_batch_images_and_label(dataset_dir, batch_size, num_classes, is_training=False,
                               output_height=224, output_width=224, num_threads=10):
    '''
    Fetch batch_size samples at a time.

    Note: preprocessing here calls one of slim's image preprocessing functions.
    If you use a vgg network, call the vgg preprocessing function. If you use
    your own network, you can write a preprocessing function that suits your
    images (for example normalization), or reuse one written for another network.

    args:
        dataset_dir: directory that holds the dataset
        batch_size: number of samples fetched at a time
        num_classes: number of output classes, used for one-hot encoding the labels
        is_training: True loads the training set, otherwise the validation set
        output_height: height of the output images
        output_width: width of the output images

    return:
        images, labels: batch_size randomly read images and their one-hot labels
    '''
    # Get a single image and label
    image, label = read_flower_image_and_label(dataset_dir, is_training)

    # Image preprocessing; the image data must be of type tf.float32 here
    image = vgg_preprocessing.preprocess_image(image, output_height, output_width,
                                               is_training=is_training)

    # Scaling
    # image = tf.image.convert_image_dtype(image, dtype=tf.float32)
    # image = tf.image.resize_image_with_crop_or_pad(image, output_height, output_width)

    # shuffle_batch shuffles the order of the data
    # batch does not shuffle the order of the data
    images, labels = tf.train.batch(
        [image, label],
        batch_size=batch_size,
        capacity=5 * batch_size,
        num_threads=num_threads)

    # One-hot encoding
    labels = slim.one_hot_encoding(labels, num_classes)

    return images, labels
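For readers who want to see what the last two steps produce, here is a plain-Python sketch of grouping samples into batches and one-hot encoding integer labels. It only mirrors the idea and is not how tf.train.batch or slim.one_hot_encoding are implemented. The function names are hypothetical, the sketch rejects out-of-range labels as a design choice, and it simply drops a final incomplete batch.

```python
def one_hot(labels, num_classes):
    """Turn integer labels into one-hot rows of length num_classes."""
    rows = []
    for label in labels:
        if label not in range(num_classes):
            raise ValueError("label %r is outside [0, %d)" % (label, num_classes))
        row = [0.0] * num_classes
        row[label] = 1.0
        rows.append(row)
    return rows


def batches(samples, batch_size):
    """Yield consecutive full batches; leftover samples are dropped."""
    for start in range(0, len(samples) - batch_size + 1, batch_size):
        yield samples[start:start + batch_size]
```

For the flowers dataset num_classes would be 5, so every label becomes a row of five values with a single 1.0.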
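At this point images can be used as the input to a neural network, and labels can be used to compute the loss.
slim ships shared training code for its models, so users no longer need to deal with the model code. Training, fine-tuning, and evaluation can all be done from the command line.
For Linux users, the scripts folder in slim also provides complete shell scripts that cover model download, training, pre-training, fine-tuning, and evaluation end to end. On Windows you can copy the commands and run them one at a time on the command line.
The training code lives in train_image_classifier.py under slim. From the directory that contains this file, we use the flowers dataset to train the Inception_v3 model by running: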
python train_image_classifier.py --train_dir=./log/train_logs --dataset_name=flowers --dataset_split_name=train --dataset_dir=./datasets/data/flowers --model_name=inception_v3
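Long command lines like this are easy to mistype, especially on Windows where the shell scripts cannot be used directly. One option is to build the argument list in Python. The sketch below only assembles the list; the helper name build_train_command is hypothetical, and the flags are the ones from the command above.

```python
import sys


def build_train_command(flags, script="train_image_classifier.py"):
    """Turn a dict of flag names and values into an argument list."""
    command = [sys.executable, script]
    for name, value in flags.items():
        command.append("--%s=%s" % (name, value))
    return command


train_flags = {
    "train_dir": "./log/train_logs",
    "dataset_name": "flowers",
    "dataset_split_name": "train",
    "dataset_dir": "./datasets/data/flowers",
    "model_name": "inception_v3",
}
```

From the slim directory, the list returned by build_train_command(train_flags) could then be passed to subprocess.run(..., check=True) to start training.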
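Pre-training means continuing training from a model that someone else has already trained in order to get the model you want, which can save a great deal of time. High-quality models are usually trained on very large numbers of samples. GitHub provides many models trained on the ImageNet dataset, which can be downloaded from https://github.com/tensorflow/models/tree/master/research/slim/#Pretrained.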
Neural nets work best when they have many parameters, making them powerful function approximators. However, this means they must be trained on very large datasets. Because training models from scratch can be a very computationally intensive process requiring days or even weeks, we provide various pre-trained models, as listed below. These CNNs have been trained on the ILSVRC-2012-CLS image classification dataset.
In the table below, we list each model, the corresponding TensorFlow model file, the link to the model checkpoint, and the top 1 and top 5 accuracy (on the imagenet test set). Note that the VGG and ResNet V1 parameters have been converted from their original caffe formats (here and here), whereas the Inception and ResNet V2 parameters have been trained internally at Google. Also be aware that these accuracies were computed by evaluating using a single image crop. Some academic papers report higher accuracy by using multiple crops at multiple scales.
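Since the table reports top-1 and top-5 accuracy, it may help to spell out what those numbers mean. The following plain-Python sketch computes top-k accuracy from per-class scores: a sample counts as correct when its true label is among the k highest-scoring classes. The function name top_k_accuracy is my own, and this is not the evaluation code used to produce the published numbers.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices sorted from highest score to lowest.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda i: sample_scores[i], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return hits / len(labels)
```

Top-1 accuracy is the usual "the best guess is right" metric; top-5 is more forgiving, which is why it is always at least as high.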
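After downloading a pre-trained model, just add the checkpoint_path flag to the command from the previous section:
--checkpoint_path=MODEL_PATH
Here MODEL_PATH stands for the path to the downloaded model. The model at checkpoint_path is only used to initialize the parameters and is not modified during training. Newly produced models are saved under --train_dir.
Note: the samples used for pre-training must match the original input size and the original number of output classes. The downloaded models all classify into 1000 classes. If you need a different number of classes, use the fine-tuning method below.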
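The pre-trained models above were all trained on imagenet and output 1000 classes. To train on our own dataset starting from a pre-trained model, we have to fine-tune.
During fine-tuning, the last layer of the original model is removed and replaced with a classification layer for our own dataset. For example, to train on the flowers dataset, the 1000 outputs are replaced with 5 outputs, one per flower class.
The procedure is shown in the following example.
Example: fine-tune the inception_v3 model so that it can be trained on the flowers dataset. Unpack the downloaded inception_v3.ckpt into a folder named inception_v3 in the current directory, open cmd, change to the slim folder, and run: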
python train_image_classifier.py --train_dir=./log/in3 --dataset_dir=./datasets/data/flowers --dataset_name=flowers --dataset_split_name=train --model_name=inception_v3 --checkpoint_path=./inception_v3/inception_v3.ckpt --checkpoint_exclude_scopes=InceptionV3/Logits,InceptionV3/AuxLogits --trainable_scopes=InceptionV3/Logits,InceptionV3/AuxLogits
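In this example, the model at --checkpoint_path is loaded and the weights are initialized from it, while --checkpoint_exclude_scopes keeps the last layers from being initialized from the checkpoint. --trainable_scopes specifies that only the newly added last layers are trained. The other parameters stay frozen at the well-trained values from the original model, while the new layers keep optimizing their own parameters over the iterations.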
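The effect of --checkpoint_exclude_scopes can be pictured as a prefix filter on variable names: a variable is restored from the checkpoint unless its name starts with one of the excluded scopes. The plain-Python sketch below shows that idea on name strings only. The function names are hypothetical and the variable names in the usage note are illustrative, so this is a sketch of the logic as I understand it rather than the training script's code.

```python
def split_scopes(flag_value):
    """Split a comma-separated flag value into a list of scope prefixes."""
    return [scope.strip() for scope in flag_value.split(",") if scope.strip()]


def variables_to_restore(variable_names, exclude_scopes):
    """Keep only the names that do not start with any excluded scope."""
    return [
        name for name in variable_names
        if not any(name.startswith(scope) for scope in exclude_scopes)
    ]
```

For instance, with the exclusions InceptionV3/Logits and InceptionV3/AuxLogits, a name such as InceptionV3/Conv2d_1a_3x3/weights would be restored, while InceptionV3/Logits/Conv2d_1c_1x1/weights would be left at its fresh initialization.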
During fine-tuning you can also add the following flag to the command above:
--max_number_of_steps=500
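to set the number of training steps. If the number of steps is not given, training keeps running indefinitely by default. For more flags, see the source of train_image_classifier.py. The scripts folder also contains examples of using a model to recognize images.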
To evaluate the performance of a model (whether pretrained or your own), you can use the eval_image_classifier.py script, as shown below.
Below we give an example of downloading the pretrained inception model and evaluating it on the imagenet dataset.
python eval_image_classifier.py --alsologtostderr --checkpoint_path=./log/in3/model.ckpt --dataset_dir=./datasets/data/flowers --dataset_name=flowers --dataset_split_name=validation --model_name=inception_v3
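The file ./log/in3/model.ckpt given here is the model produced by the fine-tuning run above.
A trained model can be packaged for use on different platforms, whether iOS, Android, or Linux. This is done with bazel, an open-source build tool. For details see https://github.com/tensorflow/models/tree/master/research/slim/#Export
References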