TensorFlow object detection API应用

时间 2019-12-01

标签 tensorflow object detection api 应用繁體版

原文原文链接

前一篇讲述了TensorFlow object detection API的安装与配置，如今咱们尝试用这个API搭建本身的目标检测模型。html

1、准备数据集python

　　本篇旨在人脸识别，在百度图片上下载了120张张钧甯的图片，存放在/models/research/object_detection下新建的images文件夹内，images文件夹下新建train和test两个文件夹，而后将120分为100和20张分别存放在train和test中。git

接下来使用 LabelImg 这款小软件，安装方法参考这里，对train和test里的图片进行人工标注（时间充裕的话越多越好），以下图所示。github

标注完成后保存为同名的xml文件，并存在原文件夹中。api

对于Tensorflow，须要输入专门的 TFRecords Format 格式。浏览器

写两个小python脚本文件，第一个将文件夹内的xml文件内的信息统一记录到.csv表格中，第二个从.csv表格中建立tfrecord格式。app

附上对应代码：编辑器

# xml2csv.py

import os import glob import pandas as pd import xml.etree.ElementTree as ET os.chdir('/home/zzf/tensorflow/models/research/object_detection/images/test') path = '/home/zzf/tensorflow/models/research/object_detection/images/test'

def xml_to_csv(path): xml_list = [] for xml_file in glob.glob(path + '/*.xml'): tree = ET.parse(xml_file) root = tree.getroot() for member in root.findall('object'): value = (root.find('filename').text, int(root.find('size')[0].text), int(root.find('size')[1].text), member[0].text, int(member[4][0].text), int(member[4][1].text), int(member[4][2].text), int(member[4][3].text) ) xml_list.append(value) column_name = ['filename', 'width', 'height', 'class', 'xmin', 'ymin', 'xmax', 'ymax'] xml_df = pd.DataFrame(xml_list, columns=column_name) return xml_df def main(): image_path = path xml_df = xml_to_csv(image_path) xml_df.to_csv('zhangjn_train.csv', index=None) print('Successfully converted xml to csv.') main()

View Code

# generate_tfrecord.py

# -*- coding: utf-8 -*-


""" Usage: # From tensorflow/models/ # Create train data: python generate_tfrecord.py --csv_input=data/tv_vehicle_labels.csv --output_path=train.record # Create test data: python generate_tfrecord.py --csv_input=data/test_labels.csv --output_path=test.record """


import os import io import pandas as pd import tensorflow as tf from PIL import Image from object_detection.utils import dataset_util from collections import namedtuple, OrderedDict os.chdir('/home/zzf/tensorflow/models/research/object_detection') flags = tf.app.flags flags.DEFINE_string('csv_input', '', 'Path to the CSV input') flags.DEFINE_string('output_path', '', 'Path to output TFRecord') FLAGS = flags.FLAGS # TO-DO replace this with label map
def class_text_to_int(row_label): if row_label == 'ZhangJN':     # 需改动
        return 1
    else: None def split(df, group): data = namedtuple('data', ['filename', 'object']) gb = df.groupby(group) return [data(filename, gb.get_group(x)) for filename, x in zip(gb.groups.keys(), gb.groups)] def create_tf_example(group, path): with tf.gfile.GFile(os.path.join(path, '{}'.format(group.filename)), 'rb') as fid: encoded_jpg = fid.read() encoded_jpg_io = io.BytesIO(encoded_jpg) image = Image.open(encoded_jpg_io) width, height = image.size filename = group.filename.encode('utf8') image_format = b'jpg' xmins = [] xmaxs = [] ymins = [] ymaxs = [] classes_text = [] classes = [] for index, row in group.object.iterrows(): xmins.append(row['xmin'] / width) xmaxs.append(row['xmax'] / width) ymins.append(row['ymin'] / height) ymaxs.append(row['ymax'] / height) classes_text.append(row['class'].encode('utf8')) classes.append(class_text_to_int(row['class'])) tf_example = tf.train.Example(features=tf.train.Features(feature={ 'image/height': dataset_util.int64_feature(height), 'image/width': dataset_util.int64_feature(width), 'image/filename': dataset_util.bytes_feature(filename), 'image/source_id': dataset_util.bytes_feature(filename), 'image/encoded': dataset_util.bytes_feature(encoded_jpg), 'image/format': dataset_util.bytes_feature(image_format), 'image/object/bbox/xmin': dataset_util.float_list_feature(xmins), 'image/object/bbox/xmax': dataset_util.float_list_feature(xmaxs), 'image/object/bbox/ymin': dataset_util.float_list_feature(ymins), 'image/object/bbox/ymax': dataset_util.float_list_feature(ymaxs), 'image/object/class/text': dataset_util.bytes_list_feature(classes_text), 'image/object/class/label': dataset_util.int64_list_feature(classes), })) return tf_example def main(_): writer = tf.python_io.TFRecordWriter(FLAGS.output_path) path = os.path.join(os.getcwd(), 'images/test')         # 需改动
    examples = pd.read_csv(FLAGS.csv_input) grouped = split(examples, 'filename') for group in grouped: tf_example = create_tf_example(group, path) writer.write(tf_example.SerializeToString()) writer.close() output_path = os.path.join(os.getcwd(), FLAGS.output_path) print('Successfully created the TFRecords: {}'.format(output_path)) if __name__ == '__main__': tf.app.run()

View Code

对于xml2csv.py，注意改变8,9行，os.chdir和path路径，以及35行，最后生成的csv文件的命名。generate_tfrecord.py也同样，路径需改成本身的，注意33行后的标签识别代码中改成相应的标签，我这里就一个。ide

对于训练集与测试集分别运行上述代码便可，获得train.record与test.record文件。测试

2、配置文件和模型

为了方便，我把image下的train和test的csv和record文件都放到object_detection/data目录下，如此，在object_dection文件夹下，咱们有以下的文件结构：

Object-Detection -data/
--test_labels.csv --test.record --train_labels.csv --train.record -images/
--test/
---testingimages.jpg --train/
---testingimages.jpg --...yourimages.jpg -training/   # 新建，用于一会训练模型使用

View Code

接下来须要设置配置文件，在objec_detection/samples下，寻找须要的对于模型的config文件，

咱们还能够在官方提供的model zoo里下载训练好的模型。咱们使用ssd_mobilenet_v1_coco，先下载它。

在 object_dection文件夹下，解压 ssd_mobilenet_v1_coco_2017_11_17.tar.gz，

将ssd_mobilenet_v1_coco.config 放在training 文件夹下，用文本编辑器打开（我用的sublime 3），进行以下更改：

一、搜索其中的 PATH_TO_BE_CONFIGURED ，将对应的路径改成本身的路径，注意不要把test跟train弄反了；

　注意最后train input reader和evaluation input reader中label_map_path必须保持一致。

二、将 num_classes 按照实际状况更改，个人例子中是1；

三、batch_size 本来是24，我在运行的时候出现显存不足的问题，为了保险起见，改成1，若是1仍是出现相似问题的话，建议换电脑……

四、fine_tune_checkpoint: "ssd_mobilenet_v1_coco_11_06_2017/model.ckpt"
from_detection_checkpoint: true

　　这里是使用finetune，在它原来训练好的模型数据上进行训练，这样能够快不少。否则从头训练好慢。

此时在对应目录（/data）下，建立一个 zhangjn.pbtxt的文本文件（能够复制一个其余名字的文件，而后用文本编辑软件打开修改），写入咱们的标签，个人例子中是两个，id序号注意与前面建立CSV文件时保持一致，从1开始。

item {
  id: 1
  name: 'ZhangJN'
}

好，全部数据都已准备好。能够开始训练了。

3、训练模型

我在本地GPU训练（本机环境：Ubuntu 16.04LTS），终端进入 object_detection目录下，最新版用model_main.py，也能够用老版本的train.py训练，后面会讲到。model_main.py训练时散热器风扇已经呼呼转动了，但终端没有step the loss 信息输出，心有点慌，须要先改几个地方，

添加 tf.logging.set_verbosity(tf.logging.INFO) 到model_main.py 的 import 区域以后，会每隔一百个step输出loss，总比没有好，至少它让你知道它在跑。
若是是python3训练，添加list() 到 model_lib.py的大概390行 category_index.values()变成： list(category_index.values())，不然会有 can't pickle dict_values ERROR出现
还有一个问题是，用model_main.py 训练时，由于它把老版本的train.py和eval.py集合到了一块儿，因此制定eval num时指定很差会有warning出现，就像：

WARNING:tensorflow:Ignoring ground truth with image id 558212937 since it was previously added

　　因此在config文件设置时，eval部分的 num_examples （以下）和运行设置参数--num_eval_steps 时任何一个值只要比你数据集中训练图片数目要大就会出现警告，由于它没那么多图片来评估，因此这两个值直接设置成训练图片数量就行了。

eval_config: {
  num_examples: 20
  # Note: The below line limits the evaluation process to 10 evaluations.
  # Remove the below line to evaluate indefinitely.
  max_evals: 10
}

而后在终端输入：

python3 model_main.py \
    --pipeline_config_path=training/ssd_mobilenet_v1_coco.config \
    --model_dir=training \
    --num_train_steps=60000 \
    --num_eval_steps=20 \
    --alsologtostderr

正常的话，稍等片刻，听到风扇开始加速转动的声音时，训练正在有条不紊地进行。model_main.py最后还生成了一个export文件夹，里面竟然把save_model.pb都生成了，我没试过这个是否是咱们后面要用的。有兴趣的能够试试这个pb文件。

不想改的话能够用老版本的train.py，在legacy/train.py，一样运行：

python3 legacy/train.py --logtostderr --train_dir=training/ --pipeline_config_path=training/ssd_mobilenet_v1_coco.config

就开始训练了

另开一个终端，一样进入到object_detection目录下，输入：

tensorboard --logdir=training

此时，咱们能够在浏览器打开查看训练进度，它会不停地传递新训练的数据进来。

运行一段时间后，咱们能够看到咱们的training文件夹下已经有模型数据保存了，接下来就能够生成咱们的须要的模型文件了，终端在object_detection目录下，输入：

python3 export_inference_graph.py --input_type image_tensor --pipeline_config_path training/ssd_mobilenet_v1_coco.config --trained_checkpoint_prefix training/model.ckpt-3737 --output_directory zhangjn_detction

其中，trained checkpoint 要改成本身训练到的数字， output为想要将模型存放在何处，我这里新建了一个文件夹zhangjn_detction 。运行结束后，就能够在zhangjn_detction文件夹下看到若干文件，有saved_model、checkpoint、frozen_inference_graph.pb等。 .pb结尾的就是最重要的frozen model了，上一篇小demo里用的就是它，接下来咱们测试就是要用到它。

4、测试模型

将object_detection目录下的object_detection_tutorial.ipynb打开，或者转成object_detection_tutorial.py的python文件，更改一下就能够测试。

# coding: utf-8

# # Object Detection Demo
# Welcome to the object detection inference walkthrough!  This notebook will walk you step by step through the process of using a pre-trained model to detect objects in an image. Make sure to follow the [installation instructions](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/installation.md) before you start.


from distutils.version import StrictVersion
import numpy as np
import os
import six.moves.urllib as urllib
import sys
import tarfile
import tensorflow as tf
import zipfile

from collections import defaultdict
from io import StringIO
from matplotlib import pyplot as plt
from PIL import Image

# This is needed since the notebook is stored in the object_detection folder.
sys.path.append("..")
from object_detection.utils import ops as utils_ops

# if StrictVersion(tf.__version__) < StrictVersion('1.9.0'):
#   raise ImportError('Please upgrade your TensorFlow installation to v1.9.* or later!')


# ## Env setup

# In[2]:


# This is needed to display the images.
# get_ipython().magic(u'matplotlib inline')


# ## Object detection imports
# Here are the imports from the object detection module.



from utils import label_map_util

from utils import visualization_utils as vis_util


# # Model preparation 

# ## Variables
# 
# Any model exported using the `export_inference_graph.py` tool can be loaded here simply by changing `PATH_TO_FROZEN_GRAPH` to point to a new .pb file.  
# 
# By default we use an "SSD with Mobilenet" model here. See the [detection model zoo](https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/detection_model_zoo.md) for a list of other models that can be run out-of-the-box with varying speeds and accuracies.

# In[4]:


# What model to download.
MODEL_NAME = 'zhangjn_detction'
# MODEL_FILE = MODEL_NAME + '.tar.gz'
# DOWNLOAD_BASE = 'http://download.tensorflow.org/models/object_detection/'

# Path to frozen detection graph. This is the actual model that is used for the object detection.
PATH_TO_FROZEN_GRAPH = MODEL_NAME + '/frozen_inference_graph.pb'

# List of the strings that is used to add correct label for each box.
PATH_TO_LABELS = os.path.join('data', 'zhangjn.pbtxt')

NUM_CLASSES = 1


# ## Download Model



# opener = urllib.request.URLopener()
# opener.retrieve(DOWNLOAD_BASE + MODEL_FILE, MODEL_FILE)
'''
tar_file = tarfile.open(MODEL_FILE)
for file in tar_file.getmembers():
  file_name = os.path.basename(file.name)
  if 'frozen_inference_graph.pb' in file_name:
    tar_file.extract(file, os.getcwd())
'''

# ## Load a (frozen) Tensorflow model into memory.



detection_graph = tf.Graph()
with detection_graph.as_default():
  od_graph_def = tf.GraphDef()
  with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, 'rb') as fid:
    serialized_graph = fid.read()
    od_graph_def.ParseFromString(serialized_graph)
    tf.import_graph_def(od_graph_def, name='')


# ## Loading label map
# Label maps map indices to category names, so that when our convolution network predicts `5`, we know that this corresponds to `airplane`.  Here we use internal utility functions, but anything that returns a dictionary mapping integers to appropriate string labels would be fine



label_map = label_map_util.load_labelmap(PATH_TO_LABELS)
categories = label_map_util.convert_label_map_to_categories(label_map, max_num_classes=NUM_CLASSES, use_display_name=True)
category_index = label_map_util.create_category_index(categories)


# ## Helper code

# In[8]:


def load_image_into_numpy_array(image):
  (im_width, im_height) = image.size
  return np.array(image.getdata()).reshape(
      (im_height, im_width, 3)).astype(np.uint8)


# # Detection



# For the sake of simplicity we will use only 2 images:
# image1.jpg
# image2.jpg
# If you want to test the code with your images, just add path to the images to the TEST_IMAGE_PATHS.
PATH_TO_TEST_IMAGES_DIR = 'test_images'
TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

# Size, in inches, of the output images.
IMAGE_SIZE = (12, 8)


# In[10]:


def run_inference_for_single_image(image, graph):
  with graph.as_default():
    with tf.Session() as sess:
      # Get handles to input and output tensors
      ops = tf.get_default_graph().get_operations()
      all_tensor_names = {output.name for op in ops for output in op.outputs}
      tensor_dict = {}
      for key in [
          'num_detections', 'detection_boxes', 'detection_scores',
          'detection_classes', 'detection_masks'
      ]:
        tensor_name = key + ':0'
        if tensor_name in all_tensor_names:
          tensor_dict[key] = tf.get_default_graph().get_tensor_by_name(
              tensor_name)
      if 'detection_masks' in tensor_dict:
        # The following processing is only for single image
        detection_boxes = tf.squeeze(tensor_dict['detection_boxes'], [0])
        detection_masks = tf.squeeze(tensor_dict['detection_masks'], [0])
        # Reframe is required to translate mask from box coordinates to image coordinates and fit the image size.
        real_num_detection = tf.cast(tensor_dict['num_detections'][0], tf.int32)
        detection_boxes = tf.slice(detection_boxes, [0, 0], [real_num_detection, -1])
        detection_masks = tf.slice(detection_masks, [0, 0, 0], [real_num_detection, -1, -1])
        detection_masks_reframed = utils_ops.reframe_box_masks_to_image_masks(
            detection_masks, detection_boxes, image.shape[0], image.shape[1])
        detection_masks_reframed = tf.cast(
            tf.greater(detection_masks_reframed, 0.5), tf.uint8)
        # Follow the convention by adding back the batch dimension
        tensor_dict['detection_masks'] = tf.expand_dims(
            detection_masks_reframed, 0)
      image_tensor = tf.get_default_graph().get_tensor_by_name('image_tensor:0')

      # Run inference
      output_dict = sess.run(tensor_dict,
                             feed_dict={image_tensor: np.expand_dims(image, 0)})

      # all outputs are float32 numpy arrays, so convert types as appropriate
      output_dict['num_detections'] = int(output_dict['num_detections'][0])
      output_dict['detection_classes'] = output_dict[
          'detection_classes'][0].astype(np.uint8)
      output_dict['detection_boxes'] = output_dict['detection_boxes'][0]
      output_dict['detection_scores'] = output_dict['detection_scores'][0]
      if 'detection_masks' in output_dict:
        output_dict['detection_masks'] = output_dict['detection_masks'][0]
  return output_dict


# In[ ]:


for image_path in TEST_IMAGE_PATHS:
  image = Image.open(image_path)
  # the array based representation of the image will be used later in order to prepare the
  # result image with boxes and labels on it.
  image_np = load_image_into_numpy_array(image)
  # Expand dimensions since the model expects images to have shape: [1, None, None, 3]
  image_np_expanded = np.expand_dims(image_np, axis=0)
  # Actual detection.
  output_dict = run_inference_for_single_image(image_np, detection_graph)
  # Visualization of the results of a detection.
  vis_util.visualize_boxes_and_labels_on_image_array(
      image_np,
      output_dict['detection_boxes'],
      output_dict['detection_classes'],
      output_dict['detection_scores'],
      category_index,
      instance_masks=output_dict.get('detection_masks'),
      use_normalized_coordinates=True,
      line_thickness=8)
  plt.figure(figsize=IMAGE_SIZE)
  plt.imshow(image_np)
  plt.show()

View Code

一、由于不用下载模型，下载相关代码能够删除，model name, path to labels , num classes 更改为本身的，download model部分都删去。

二、测试图片，准备几张放入test images文件夹中，命名images+数字.jpg的格式，就不用改代码，在

TEST_IMAGE_PATHS = [ os.path.join(PATH_TO_TEST_IMAGES_DIR, 'image{}.jpg'.format(i)) for i in range(3, 8) ]

一行更改本身图片的数字序列就行了，range(3,8)，个人图片命名从3至7.

若是用python文件的话，最后图片显示要加一句

plt.show()

运行它就能够了。

python3 object_detection_tutorial.py

总之，整个训练过程就是这样。熟悉了以后也还挺简单的。运行中可能会碰到这样那样的问题，不少是版本问题致使的。TensorFlow最烦人的一点就是版本更新太快，而且改动大，先后版本有些还不兼容。因此有问题不用怕，多Google，百度一下，通常均可以找到答案，若是是版本问题，一时无法升级的话，对比一下你的版本和最新版本哪一个差别致使的，把代码中方法调用方式改为你的版本就行了，我原来用1.4版本的时候，常常遇到版本不一样的问题，好比最新版本中tf.contrib.data.parallel_interleave()方法，在1.4版本中tf.contrib.data没有parallel_interleave()这个方法;再好比1.10版本中tf.keras.Model()类也可这这样调用tf.keras.models.Model(),可是在1.4版本中只有后者一种调用方法，如果某个程序用了前者方法，在1.4版本中要运行起来就得本身去改一下了，等等。不过用了一段时间后我仍是把TensorFlow升级到1.10了，改太多了，本身都受不了。升级一下就是麻烦点，NVIDIA 驱动，cuda,cudnn 都得改。还好此次轻车熟路，三四个小时就升级成功了。