TensorFlow-Slim Usage Guide

Translated from: https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/slim

 

TensorFlow-Slim

 

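TF-Slim is a lightweight library in TensorFlow for defining, training, and evaluating complex models. Components of TF-Slim can be freely mixed with native TensorFlow functions, as well as with other frameworks such as tf.contrib.learn.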

 

Usage

import tensorflow.contrib.slim as slim

Why TF-Slim?

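TF-Slim makes building, training, and evaluating neural networks simpler:

- It lets users define models more compactly by eliminating boilerplate code. This is accomplished through argument scoping and a large number of high-level layers and variables.

- It makes developing models simpler by providing commonly used regularizers.

- Several widely used computer vision models (e.g., VGG, AlexNet) are already defined in slim and are readily available. They can be used as black boxes or extended, for example by adding "multiple heads" to different internal layers.

- Slim makes it easy to extend complex models and to warm-start training from the checkpoints of existing models.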

 

What are the various components of TF-Slim?

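TF-Slim is composed of several components that are designed to exist independently. The main ones are:

arg_scope: provides a new scope, named arg_scope, in which users can define default arguments for specific operations.

data: contains TF-Slim's dataset definition, data providers, parallel_reader, and decoding utilities.

evaluation: contains routines for evaluating models.

layers: contains high-level layers for building models.

learning: contains routines for training models.

losses: contains commonly used loss functions.

metrics: contains popular evaluation metrics.

nets: contains popular network definitions such as the VGG and AlexNet models.

queues: provides a context manager for easily and safely starting and closing QueueRunners.

regularizers: contains weight regularizers.

variables: provides convenience wrappers for variable creation and manipulation.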

 

Defining Models

With TF-Slim, models can be defined succinctly by combining variables, layers, and scopes. Each of these elements is described below.

 

Variables

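Creating a variable in native TensorFlow requires either a predefined value or an initialization mechanism (for example, random sampling from a Gaussian). Furthermore, if a variable needs to be created on a specific device, such as a GPU, the placement must be stated explicitly. To reduce the code required for variable creation, TF-Slim provides a set of thin wrapper functions (defined in variables.py) that make defining variables easy.

For example, to create a weight variable, initialize it with a truncated normal distribution, regularize it with an L2 loss, and place it on the CPU, one only needs to declare the following: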

 

weights = slim.variable('weights',
                             shape=[10, 10, 3 , 3],
                             initializer=tf.truncated_normal_initializer(stddev=0.1),
                             regularizer=slim.l2_regularizer(0.05),
                             device='/CPU:0')

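Note that native TensorFlow has two types of variables: regular variables and local (transient) variables. The vast majority of variables are regular variables; once created, they can be saved to disk with a saver. Local variables exist only for the duration of a session and are not saved to disk.

TF-Slim further differentiates variables by defining model variables, which are the variables that represent the parameters of a model. Model variables are trained or fine-tuned during learning and can be loaded from a checkpoint during evaluation or inference. Examples include the variables created by slim.fully_connected or slim.conv2d. Non-model variables are the variables that are used during learning or evaluation but are not required for actual inference. For example, global_step is used during learning and evaluation, but it is not part of the model. Similarly, moving average variables are non-model variables.

Both model variables and regular variables can be easily created and retrieved with TF-Slim: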

# Model Variables
weights = slim.model_variable('weights',
                              shape=[10, 10, 3 , 3],
                              initializer=tf.truncated_normal_initializer(stddev=0.1),
                              regularizer=slim.l2_regularizer(0.05),
                              device='/CPU:0')
model_variables = slim.get_model_variables()

# Regular variables
my_var = slim.variable('my_var',
                       shape=[20, 1],
                       initializer=tf.zeros_initializer())
regular_variables_and_model_variables = slim.get_variables()

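How does this work? When you create a model variable through TF-Slim's layers or directly through the slim.model_variable function, TF-Slim adds the variable to the tf.GraphKeys.MODEL_VARIABLES collection. If you have your own custom layers or variable-creation routine but still want TF-Slim to manage your model variables, TF-Slim provides a convenience function for adding a model variable to that collection: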

my_model_variable = CreateViaCustomCode()

# Letting TF-Slim know about the additional variable.
slim.add_model_variable(my_model_variable)

Layers

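Defining layers (such as convolutional, fully connected, or BatchNorm layers) in native TensorFlow is fairly laborious. For example, a convolutional layer in a neural network is composed of the following steps:

  1. Create the weight and bias variables
  2. Convolve the input with the weights
  3. Add the biases to the result of the convolution in step 2
  4. Apply an activation function

Using only plain TensorFlow code, these steps are implemented as follows: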

input = ...
with tf.name_scope('conv1_1') as scope:
  kernel = tf.Variable(tf.truncated_normal([3, 3, 64, 128], dtype=tf.float32,
                                           stddev=1e-1), name='weights')
  conv = tf.nn.conv2d(input, kernel, [1, 1, 1, 1], padding='SAME')
  biases = tf.Variable(tf.constant(0.0, shape=[128], dtype=tf.float32),
                       trainable=True, name='biases')
  bias = tf.nn.bias_add(conv, biases)
  conv1 = tf.nn.relu(bias, name=scope)
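
To make the four steps concrete without depending on TensorFlow, here is a minimal pure-Python sketch on a single-channel 2-D input with stride 1 and no padding. The helper names (conv2d_valid, relu) and the toy values are assumptions made for illustration; they are not part of TensorFlow or TF-Slim. As in most deep learning libraries, the "convolution" is computed without flipping the kernel.

```python
def conv2d_valid(image, kernel):
    """Slide `kernel` over `image` (lists of lists), stride 1, no padding."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def relu(feature_map):
    """Element-wise max(0, v)."""
    return [[max(0.0, v) for v in row] for row in feature_map]


# Step 1: create the weights and the bias (toy values, hypothetical).
kernel = [[1.0, 0.0],
          [0.0, 1.0]]
bias = -7.0

image = [[1.0, 2.0, 3.0],
         [4.0, 5.0, 6.0],
         [7.0, 8.0, 9.0]]

# Step 2: convolve the input with the weights.
conv = conv2d_valid(image, kernel)
# Step 3: add the bias to the convolution result.
biased = [[v + bias for v in row] for row in conv]
# Step 4: apply the activation function.
conv1 = relu(biased)
```

Here conv is [[6.0, 8.0], [12.0, 14.0]] and conv1 is [[0.0, 1.0], [5.0, 7.0]]: the single negative pre-activation is clipped to zero.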

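To reduce this kind of repetitive code, TF-Slim provides a number of convenient operations defined at the more abstract level of neural network layers. For example, the convolutional layer above becomes: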

input = ...
net = slim.conv2d(input, 128, [3, 3], scope='conv1_1')

TF-Slim provides standard implementations of many components for building neural networks. They include the following functions:

Layer                       TF-Slim
BiasAdd                     slim.bias_add
BatchNorm                   slim.batch_norm
Conv2d                      slim.conv2d
Conv2dInPlane               slim.conv2d_in_plane
Conv2dTranspose (Deconv)    slim.conv2d_transpose
FullyConnected              slim.fully_connected
AvgPool2D                   slim.avg_pool2d
Dropout                     slim.dropout
Flatten                     slim.flatten
MaxPool2D                   slim.max_pool2d
OneHotEncoding              slim.one_hot_encoding
SeparableConv2D             slim.separable_conv2d
UnitNorm                    slim.unit_norm

 

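TF-Slim also provides two meta-operations, repeat and stack, that let users apply the same operation repeatedly. For example, the following snippet from the VGG network has several convolutional layers followed by a pooling layer: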

net = ...
net = slim.conv2d(net, 256, [3, 3], scope='conv3_1')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_2')
net = slim.conv2d(net, 256, [3, 3], scope='conv3_3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

One way to reduce this code duplication is a for loop:

net = ...
for i in range(3):
  net = slim.conv2d(net, 256, [3, 3], scope='conv3_%d' % (i+1))
net = slim.max_pool2d(net, [2, 2], scope='pool2')

Another way is to use TF-Slim's repeat operation:

net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
net = slim.max_pool2d(net, [2, 2], scope='pool2')

In the example above, slim.repeat automatically names the scopes of the convolutional layers 'conv3/conv3_1', 'conv3/conv3_2' and 'conv3/conv3_3'.

      

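In addition, TF-Slim's slim.stack operation lets the caller apply the same operation repeatedly with different arguments. slim.stack also creates a new tf.variable_scope for each operation it creates. For example, here is a simple way to create a Multi-Layer Perceptron (MLP):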

# Verbose way:
x = slim.fully_connected(x, 32, scope='fc/fc_1')
x = slim.fully_connected(x, 64, scope='fc/fc_2')
x = slim.fully_connected(x, 128, scope='fc/fc_3')

# Equivalent, TF-Slim way using slim.stack:
x = slim.stack(x, slim.fully_connected, [32, 64, 128], scope='fc')

In the example above, slim.stack calls slim.fully_connected three times. Similarly, stack can be used to simplify a tower of several convolutional layers:

# Verbose way:
x = slim.conv2d(x, 32, [3, 3], scope='core/core_1')
x = slim.conv2d(x, 32, [1, 1], scope='core/core_2')
x = slim.conv2d(x, 64, [3, 3], scope='core/core_3')
x = slim.conv2d(x, 64, [1, 1], scope='core/core_4')

# Using stack:
x = slim.stack(x, slim.conv2d, [(32, [3, 3]), (32, [1, 1]), (64, [3, 3]), (64, [1, 1])], scope='core')
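
One way to think about repeat and stack is as plain loops that generate a scope name per call. The sketch below is a simplified pure-Python model of that idea, not TF-Slim's implementation: fake_layer is a hypothetical stand-in for a layer such as slim.conv2d that only records how it was called, and the scope naming simply mimics the names described in this section.

```python
calls = []  # records (scope, args) for every layer invocation


def fake_layer(net, *args, scope=None):
    """Hypothetical stand-in for a layer function; it only logs the call."""
    calls.append((scope, args))
    return net


def repeat(net, repetitions, layer, *args, scope):
    # Same arguments on every call; only the scope suffix changes.
    for i in range(repetitions):
        net = layer(net, *args, scope='%s/%s_%d' % (scope, scope, i + 1))
    return net


def stack(net, layer, stack_args, scope):
    # Different arguments per call; tuples are unpacked into positional args.
    for i, layer_args in enumerate(stack_args):
        if not isinstance(layer_args, (list, tuple)):
            layer_args = [layer_args]
        net = layer(net, *layer_args, scope='%s/%s_%d' % (scope, scope, i + 1))
    return net


net = repeat('input', 3, fake_layer, 256, [3, 3], scope='conv3')
net = stack(net, fake_layer, [(32, [3, 3]), (32, [1, 1])], scope='core')
scopes = [scope for scope, _ in calls]
```

After running this, scopes holds 'conv3/conv3_1' through 'conv3/conv3_3' followed by 'core/core_1' and 'core/core_2', and each stack call received its own (num_outputs, kernel_size) pair.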

Scopes

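In addition to the scope mechanisms in TensorFlow (name_scope, variable_scope), TF-Slim adds a new scoping mechanism called arg_scope. This new scope lets the user specify one or more operations together with a set of arguments, which are then passed to every one of those operations called inside the arg_scope. An example makes this clearer. Consider the following code snippet: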

net = slim.conv2d(inputs, 64, [11, 11], 4, padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv1')
net = slim.conv2d(net, 128, [11, 11], padding='VALID',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv2')
net = slim.conv2d(net, 256, [11, 11], padding='SAME',
                  weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                  weights_regularizer=slim.l2_regularizer(0.0005), scope='conv3')

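It is clear from the code above that the three convolutional layers share many of the same hyperparameters. Two of them use the same padding, and all three use the same weights_initializer and weights_regularizer. The code contains many repeated values; one solution is to hold the default values in variables: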

padding = 'SAME'
initializer = tf.truncated_normal_initializer(stddev=0.01)
regularizer = slim.l2_regularizer(0.0005)
net = slim.conv2d(inputs, 64, [11, 11], 4,
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv1')
net = slim.conv2d(net, 128, [11, 11],
                  padding='VALID',
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv2')
net = slim.conv2d(net, 256, [11, 11],
                  padding=padding,
                  weights_initializer=initializer,
                  weights_regularizer=regularizer,
                  scope='conv3')

This solution does not actually reduce the code clutter. By using an arg_scope, we can both ensure that every layer uses the same values and simplify the code:

with slim.arg_scope([slim.conv2d], padding='SAME',
                    weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                    weights_regularizer=slim.l2_regularizer(0.0005)):
  net = slim.conv2d(inputs, 64, [11, 11], scope='conv1')
  net = slim.conv2d(net, 128, [11, 11], padding='VALID', scope='conv2')
  net = slim.conv2d(net, 256, [11, 11], scope='conv3')

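As the example shows, arg_scope makes the code cleaner, simpler, and easier to maintain. Note that argument values specified in the arg_scope can be overridden locally. In particular, the padding argument is set to 'SAME' above, but the second convolution overrides it with 'VALID'.

arg_scopes can also be nested, and multiple operations can be used in the same scope. For example: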

with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
  with slim.arg_scope([slim.conv2d], stride=1, padding='SAME'):
    net = slim.conv2d(inputs, 64, [11, 11], 4, padding='VALID', scope='conv1')
    net = slim.conv2d(net, 256, [5, 5],
                      weights_initializer=tf.truncated_normal_initializer(stddev=0.03),
                      scope='conv2')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc')

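In this example, the first arg_scope applies the same weights_initializer and weights_regularizer arguments to the convolutional and fully connected layers in its scope. The second arg_scope specifies additional default arguments that apply to conv2d only.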

 

Working Example: Specifying the VGG16 Layers

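By combining TF-Slim variables, operations, and scopes, we can write a fairly complex network in very few lines of code. For example, the entire VGG architecture can be defined as follows: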

def vgg16(inputs):
  with slim.arg_scope([slim.conv2d, slim.fully_connected],
                      activation_fn=tf.nn.relu,
                      weights_initializer=tf.truncated_normal_initializer(0.0, 0.01),
                      weights_regularizer=slim.l2_regularizer(0.0005)):
    net = slim.repeat(inputs, 2, slim.conv2d, 64, [3, 3], scope='conv1')
    net = slim.max_pool2d(net, [2, 2], scope='pool1')
    net = slim.repeat(net, 2, slim.conv2d, 128, [3, 3], scope='conv2')
    net = slim.max_pool2d(net, [2, 2], scope='pool2')
    net = slim.repeat(net, 3, slim.conv2d, 256, [3, 3], scope='conv3')
    net = slim.max_pool2d(net, [2, 2], scope='pool3')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv4')
    net = slim.max_pool2d(net, [2, 2], scope='pool4')
    net = slim.repeat(net, 3, slim.conv2d, 512, [3, 3], scope='conv5')
    net = slim.max_pool2d(net, [2, 2], scope='pool5')
    net = slim.fully_connected(net, 4096, scope='fc6')
    net = slim.dropout(net, 0.5, scope='dropout6')
    net = slim.fully_connected(net, 4096, scope='fc7')
    net = slim.dropout(net, 0.5, scope='dropout7')
    net = slim.fully_connected(net, 1000, activation_fn=None, scope='fc8')
  return net

Training Models

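Training a TensorFlow model requires a model, a loss function, the gradient computation, and a training routine that iteratively computes the gradients of the model weights with respect to the loss and updates the weights accordingly. TF-Slim provides both common loss functions and a set of helper functions that run the training and evaluation routines.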

 

Losses

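The loss function defines a quantity that we want to minimize. For classification problems, this is typically the cross entropy between the true distribution and the predicted probability distribution over classes. For regression problems, it is often the sum of squared differences between the predicted and true values.

Certain models, such as multi-task learning models, require several loss functions at once. In other words, the loss function that is ultimately minimized is the sum of various other loss functions. For example, for a model that predicts both the type of scene in an image and the depth of each pixel, the loss function is the sum of the classification loss and the depth prediction loss.

TF-Slim provides an easy-to-use mechanism for defining and keeping track of loss functions through the losses module. Consider the simple case where we want to train the VGG network: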

import tensorflow as tf
import tensorflow.contrib.slim as slim
import tensorflow.contrib.slim.nets as nets
vgg = nets.vgg

# Load the images and labels.
images, labels = ...

# Create the model.
predictions, _ = vgg.vgg_16(images)

# Define the loss functions and get the total loss.
loss = slim.losses.softmax_cross_entropy(predictions, labels)

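In this example, we first create the model (using TF-Slim's VGG implementation) and then add the standard classification loss. Now let's turn to the case of a multi-task model that produces multiple outputs: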

# Load the images and labels.
images, scene_labels, depth_labels = ...

# Create the model.
scene_predictions, depth_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)

# The following two lines have the same effect:
total_loss = classification_loss + sum_of_squares_loss
total_loss = slim.losses.get_total_loss(add_regularization_losses=False)

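In this example, we have two losses, obtained by calling slim.losses.softmax_cross_entropy and slim.losses.sum_of_squares. We can get the total loss either by adding them together (total_loss) or by calling slim.losses.get_total_loss(). How does this work? When you create a loss function through TF-Slim, TF-Slim adds the loss to a special TensorFlow collection of loss functions. This lets you either manage the total loss manually or let TF-Slim manage it for you.

What if you want TF-Slim to manage the losses for you but you have a custom loss function? loss_ops.py also has a function that adds your loss to TF-Slim's collection. For example: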

# Load the images and labels.
images, scene_labels, depth_labels, pose_labels = ...

# Create the model.
scene_predictions, depth_predictions, pose_predictions = CreateMultiTaskModel(images)

# Define the loss functions and get the total loss.
classification_loss = slim.losses.softmax_cross_entropy(scene_predictions, scene_labels)
sum_of_squares_loss = slim.losses.sum_of_squares(depth_predictions, depth_labels)
pose_loss = MyCustomLossFunction(pose_predictions, pose_labels)
slim.losses.add_loss(pose_loss) # Letting TF-Slim know about the additional loss.

# The following two ways to compute the total loss are equivalent:
regularization_loss = tf.add_n(slim.losses.get_regularization_losses())
total_loss1 = classification_loss + sum_of_squares_loss + pose_loss + regularization_loss

# (Regularization Loss is included in the total loss by default).
total_loss2 = slim.losses.get_total_loss()

In this example, we can again either compute the total loss function manually or let TF-Slim know about the additional loss and have TF-Slim handle it.
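
The bookkeeping behind this can be pictured as two lists: one for task losses and one for regularization losses. The following pure-Python sketch models that idea with floats standing in for loss tensors; the module-level lists and the toy values are assumptions made for illustration, not how TF-Slim stores its collections.

```python
_LOSSES = []                 # stand-in for the collection of task losses
_REGULARIZATION_LOSSES = []  # stand-in for the regularization collection


def add_loss(loss):
    """Register a loss so that get_total_loss() can find it later."""
    _LOSSES.append(loss)
    return loss


def get_total_loss(add_regularization_losses=True):
    """Sum all registered losses, optionally including regularization."""
    total = sum(_LOSSES)
    if add_regularization_losses:
        total += sum(_REGULARIZATION_LOSSES)
    return total


# Built-in losses would register themselves; floats stand in for tensors.
classification_loss = add_loss(0.75)
sum_of_squares_loss = add_loss(0.5)
# A custom loss has to be registered explicitly.
pose_loss = add_loss(0.25)
_REGULARIZATION_LOSSES.append(0.125)

# Manual bookkeeping and the collection-based total agree.
total_loss1 = (classification_loss + sum_of_squares_loss + pose_loss
               + sum(_REGULARIZATION_LOSSES))
total_loss2 = get_total_loss()
```

Both totals equal 1.625 here, and get_total_loss(add_regularization_losses=False) returns 1.5, the sum of the three task losses alone.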

 

Training Loop

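TF-Slim provides a simple but powerful set of tools for training models (in learning.py). These include a train function that repeatedly measures the loss, computes gradients, and saves the model to disk. For example, once we have specified the model, the loss function, and the optimization scheme, we can call slim.learning.create_train_op and slim.learning.train to perform the optimization: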

g = tf.Graph()

# Create the model and specify the losses...
...

total_loss = slim.losses.get_total_loss()
optimizer = tf.train.GradientDescentOptimizer(learning_rate)

# create_train_op ensures that each time we ask for the loss, the update_ops
# are run and the gradients being computed are applied too.
train_op = slim.learning.create_train_op(total_loss, optimizer)
logdir = ... # Where checkpoints are stored.

slim.learning.train(
    train_op,
    logdir,
    number_of_steps=1000,
    save_summaries_secs=300,
    save_interval_secs=600)

 

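In this example, slim.learning.train is given (1) train_op, which is used to compute the loss and the gradients, and (2) logdir, which specifies the directory where the checkpoints and event files are stored. The number_of_steps argument limits the number of gradient steps taken; save_summaries_secs=300 means that summaries are computed every 5 minutes, and save_interval_secs=600 means that a model checkpoint is saved every 10 minutes.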

 

 

Working Example: Training the VGG16 Model

The following example trains a VGG network.

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg

...

train_log_dir = ...
if not tf.gfile.Exists(train_log_dir):
  tf.gfile.MakeDirs(train_log_dir)

with tf.Graph().as_default():
  # Set up the data loading:
  images, labels = ...

  # Define the model:
  # vgg_16 returns the logits together with a dictionary of end points.
  predictions, _ = vgg.vgg_16(images, is_training=True)

  # Specify the loss function:
  slim.losses.softmax_cross_entropy(predictions, labels)

  total_loss = slim.losses.get_total_loss()
  tf.summary.scalar('losses/total_loss', total_loss)

  # Specify the optimization scheme:
  optimizer = tf.train.GradientDescentOptimizer(learning_rate=.001)

  # create_train_op that ensures that when we evaluate it to get the loss,
  # the update_ops are done and the gradient updates are computed.
  train_tensor = slim.learning.create_train_op(total_loss, optimizer)

  # Actually runs training.
  slim.learning.train(train_tensor, train_log_dir)

 

Fine-Tuning Existing Models

Brief Recap on Restoring Variables from a Checkpoint

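After a model has been trained, its variables can be restored from a given checkpoint using tf.train.Saver(). In many cases, tf.train.Saver() provides a simple mechanism for restoring all of the variables or just some of them.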

# Create some variables.
v1 = tf.Variable(..., name="v1")
v2 = tf.Variable(..., name="v2")
...
# Add ops to restore all the variables.
restorer = tf.train.Saver()

# Add ops to restore some variables.
restorer = tf.train.Saver([v1, v2])

# Later, launch the model, use the saver to restore variables from disk, and
# do some work with the model.
with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

 

For more details, see the Restoring Variables and Choosing which Variables to Save and Restore pages.

 

Partially Restoring Models

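It is often desirable to fine-tune a pre-trained model on a new dataset or a new task. In these situations, TF-Slim's helper functions can be used to select the subset of variables to restore: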

# Create some variables.
v1 = slim.variable(name="v1", ...)
v2 = slim.variable(name="nested/v2", ...)
...

# Get list of variables to restore (which contains only 'v2'). These are all
# equivalent methods:
variables_to_restore = slim.get_variables_by_name("v2")
# or
variables_to_restore = slim.get_variables_by_suffix("2")
# or
variables_to_restore = slim.get_variables(scope="nested")
# or
variables_to_restore = slim.get_variables_to_restore(include=["nested"])
# or
variables_to_restore = slim.get_variables_to_restore(exclude=["v1"])

# Create the saver which will be used to restore the variables.
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
  print("Model restored.")
  # Do some work with the model
  ...

Restoring models with different variable names

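When restoring variables from a checkpoint, the Saver locates the variable names in the checkpoint file and maps them to variables in the current graph. Above, we created a saver by passing it a list of variables. In that case, the names to locate in the checkpoint file were obtained implicitly from each provided variable's var.op.name.

This works well when the variable names in the checkpoint file match those in the graph. Sometimes, however, we want to restore a model from a checkpoint whose variable names differ from those in the current graph. In that case, we must give the Saver a dictionary that maps each checkpoint variable name to the corresponding graph variable. In the following example, the checkpoint variable names are obtained through a simple function: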

# Assuming that 'conv1/weights' should be restored from 'vgg16/conv1/weights'
def name_in_checkpoint(var):
  return 'vgg16/' + var.op.name

# Assuming that 'conv1/weights' and 'conv1/bias' should be restored from 'conv1/params1' and 'conv1/params2'
def name_in_checkpoint(var):
  if "weights" in var.op.name:
    return var.op.name.replace("weights", "params1")
  if "bias" in var.op.name:
    return var.op.name.replace("bias", "params2")

variables_to_restore = slim.get_model_variables()
variables_to_restore = {name_in_checkpoint(var):var for var in variables_to_restore}
restorer = tf.train.Saver(variables_to_restore)

with tf.Session() as sess:
  # Restore variables from disk.
  restorer.restore(sess, "/tmp/model.ckpt")
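
The shape of the dictionary handed to the Saver is easy to check in isolation. The sketch below builds it in pure Python; Op and Var are hypothetical stand-ins that only mimic the var.op.name attribute of a real variable, and the variable names are made up for illustration.

```python
import collections

# Hypothetical stand-ins: a real variable exposes its name as var.op.name.
Op = collections.namedtuple('Op', ['name'])
Var = collections.namedtuple('Var', ['op'])


def name_in_checkpoint(var):
    """Prefix rule: graph name 'conv1/weights' -> 'vgg16/conv1/weights'."""
    return 'vgg16/' + var.op.name


graph_variables = [Var(Op('conv1/weights')), Var(Op('conv1/biases'))]

# Keys are names as stored in the checkpoint; values are graph variables.
variables_to_restore = {name_in_checkpoint(v): v for v in graph_variables}
checkpoint_names = sorted(variables_to_restore)
```

The keys are the names looked up in the checkpoint ('vgg16/conv1/biases' and 'vgg16/conv1/weights'), while each value is still the graph-side variable with its original, unprefixed name.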

Fine-Tuning a Model on a different task

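Consider the case where we have a pre-trained VGG16 model that was trained on the ImageNet dataset, which has 1000 classes. However, we want to apply it to the Pascal VOC dataset, which has only 20 classes. To do so, we can initialize our new model with the values of the pre-trained model, excluding the final layer: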

# Load the Pascal VOC data
image, label = MyPascalVocDataLoader(...)
images, labels = tf.train.batch([image, label], batch_size=32)

# Create the model
predictions, _ = vgg.vgg_16(images)

train_op = slim.learning.create_train_op(...)

# Specify where the Model, trained on ImageNet, was saved.
model_path = '/path/to/pre_trained_on_imagenet.checkpoint'

# Specify where the new model will live:
log_dir = '/path/to/my_pascal_model_dir/'

# Restore only the convolutional layers:
variables_to_restore = slim.get_variables_to_restore(exclude=['fc6', 'fc7', 'fc8'])
init_fn = slim.assign_from_checkpoint_fn(model_path, variables_to_restore)

# Start training.
slim.learning.train(train_op, log_dir, init_fn=init_fn)
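
The effect of the exclude list can be illustrated on plain strings. This sketch assumes that exclusion works by scope prefix and uses made-up variable names for a small VGG-like graph; it is only a model of the selection step, not the TF-Slim function itself.

```python
def variables_to_restore(all_names, exclude):
    """Keep every name that does not start with one of the excluded scopes."""
    return [name for name in all_names
            if not any(name.startswith(scope) for scope in exclude)]


# Hypothetical variable names of a small VGG-like graph.
all_names = ['conv1/weights', 'conv1/biases', 'conv5/weights',
             'fc6/weights', 'fc7/weights', 'fc8/weights']

restored = variables_to_restore(all_names, exclude=['fc6', 'fc7', 'fc8'])
```

Only the convolutional variables remain in restored, so the fully connected layers would start from their fresh initializers when training on the new task.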

 

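Evaluating Models

Once we have trained a model (or even while the model is still training), we would like to see how well it performs in practice. This is done by choosing a set of evaluation metrics that grade the model's performance. The evaluation code itself loads the data, runs inference, compares the predictions with the ground truth, and records the resulting scores. This step can be performed once or repeated periodically.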

 

Metrics

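We define a metric to be a performance measure that is not a loss function (losses are optimized directly during training) but that we still care about for the purpose of evaluating the model. For example, we might want to minimize log loss, while the metric of interest might be the F1 score on the test set or the Intersection Over Union score (which is not differentiable and therefore cannot be used as a loss).

TF-Slim provides a set of metric operations that make evaluating models easy. Computing the value of a metric can be divided into three steps:

  1. Initialization: initialize the variables used to compute the metric.
  2. Aggregation: perform the operations (such as sums) used to compute the metric.
  3. Finalization: (optionally) perform any final operation to compute the metric value, for example a mean, minimum, or maximum.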

 

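For example, to compute mean_absolute_error, two variables, count and total, are initialized to zero. During aggregation, we observe some set of predictions and labels, compute their absolute differences, and add them to total. Each time we observe another value, we increment count. Finally, during finalization, total is divided by count to obtain the mean.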
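
The three phases can be written out as a small pure-Python class. This is only an illustration of the pattern with toy numbers; the class name and its methods are assumptions for this sketch and do not correspond to TF-Slim objects.

```python
class StreamingMeanAbsoluteError:
    """Pure-Python illustration of the initialize/aggregate/finalize pattern."""

    def __init__(self):
        # Initialization: both accumulators start at zero.
        self.total = 0.0
        self.count = 0

    def update(self, predictions, labels):
        # Aggregation: add absolute differences and count the examples.
        for p, y in zip(predictions, labels):
            self.total += abs(p - y)
            self.count += 1
        return self.value()

    def value(self):
        # Finalization: reading the mean does not change any state.
        return self.total / self.count if self.count else 0.0


mae = StreamingMeanAbsoluteError()
mae.update([1.0, 2.0], [1.5, 4.0])  # batch 1: errors 0.5 and 2.0
mae.update([3.0, 0.0], [3.0, 1.5])  # batch 2: errors 0.0 and 1.5
result = mae.value()
```

After two batches, total is 4.0 and count is 4, so result is 1.0. Calling value() again returns the same number, whereas each update() call changes the accumulated state.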

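The following example demonstrates the API for declaring metrics. Because metrics are often evaluated on a test set, we assume here that the test set is being used: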

images, labels = LoadTestData(...)
predictions = MyModel(images)

mae_value_op, mae_update_op = slim.metrics.streaming_mean_absolute_error(predictions, labels)
mre_value_op, mre_update_op = slim.metrics.streaming_mean_relative_error(predictions, labels)
pl_value_op, pl_update_op = slim.metrics.percentage_less(mean_relative_errors, 0.3)

 

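As the example illustrates, creating a metric returns two values: a value_op and an update_op. The value_op is an idempotent operation that returns the current value of the metric. The update_op is an operation that performs the aggregation step mentioned above and also returns the value of the metric.

Keeping track of every value_op and update_op is laborious. To deal with this, TF-Slim provides two convenience functions: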

# Aggregates the value and update ops in two lists:
value_ops, update_ops = slim.metrics.aggregate_metrics(
    slim.metrics.streaming_mean_absolute_error(predictions, labels),
    slim.metrics.streaming_mean_squared_error(predictions, labels))

# Aggregates the value and update ops in two dictionaries:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

Working example: Tracking Multiple Metrics

 

Putting it all together:

import tensorflow as tf
import tensorflow.contrib.slim.nets as nets

slim = tf.contrib.slim
vgg = nets.vgg


# Load the data
images, labels = load_data(...)

# Define the network
predictions, _ = vgg.vgg_16(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    "eval/mean_absolute_error": slim.metrics.streaming_mean_absolute_error(predictions, labels),
    "eval/mean_squared_error": slim.metrics.streaming_mean_squared_error(predictions, labels),
})

# Evaluate the model using 1000 batches of data:
num_batches = 1000

with tf.Session() as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(tf.local_variables_initializer())

  for batch_id in range(num_batches):
    sess.run(list(names_to_updates.values()))

  metric_values = sess.run(list(names_to_values.values()))
  for metric, value in zip(names_to_values.keys(), metric_values):
    print('Metric %s has value: %f' % (metric, value))

Evaluation Loop

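TF-Slim provides an evaluation module (evaluation.py), which contains helper functions for writing model evaluation scripts using the metrics from the metric_ops.py module. These include functions for running evaluations periodically, evaluating metrics over batches of data, and printing and summarizing metric results.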

import math
import tensorflow as tf

slim = tf.contrib.slim

# Load the data
images, labels = load_data(...)

# Define the network
predictions = MyModel(images)

# Choose the metrics to compute:
names_to_values, names_to_updates = slim.metrics.aggregate_metric_map({
    'accuracy': slim.metrics.accuracy(predictions, labels),
    'precision': slim.metrics.precision(predictions, labels),
    'recall': slim.metrics.recall(mean_relative_errors, 0.3),
})

# Create the summary ops such that they also print out to std output:
summary_ops = []
for metric_name, metric_value in names_to_values.items():
  op = tf.summary.scalar(metric_name, metric_value)
  op = tf.Print(op, [metric_value], metric_name)
  summary_ops.append(op)

num_examples = 10000
batch_size = 32
num_batches = math.ceil(num_examples / float(batch_size))

# Setup the global step.
slim.get_or_create_global_step()

checkpoint_dir = ... # Where the model checkpoints are stored.
output_dir = ... # Where the summaries are stored.
eval_interval_secs = ... # How often to run the evaluation.
slim.evaluation.evaluation_loop(
    'local',
    checkpoint_dir,
    output_dir,
    num_evals=num_batches,
    eval_op=list(names_to_updates.values()),
    summary_op=tf.summary.merge(summary_ops),
    eval_interval_secs=eval_interval_secs)
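
The num_batches value used above is a ceiling division: when the number of examples is not a multiple of the batch size, the last, partially filled batch still needs one evaluation step. A quick stand-alone check of the arithmetic, using the same numbers as the example (full_batches and leftover are names introduced only for this sketch):

```python
import math

num_examples = 10000
batch_size = 32

# Ceiling division: a final, partially filled batch still needs one step.
num_batches = int(math.ceil(num_examples / float(batch_size)))

# Breakdown of where the extra batch comes from.
full_batches = num_examples // batch_size
leftover = num_examples - full_batches * batch_size
```

With 10000 examples and a batch size of 32 there are 312 full batches plus 16 leftover examples, so num_batches is 313.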

 

Authors

Sergio Guadarrama and Nathan Silberman
