Paper Notes: A Survey of Deep Learning Object Detection from 2014 to January 2019 (Deep Learning for Generic Object Detection: A Survey)

Date: 2019-11-07




Preface

paper: https://arxiv.org/abs/1809.02165
github: https://github.com/hoya012/deep_learning_object_detection, a paper list of object detection using deep learning

This survey summarizes the progress made in deep learning based object detection from 2014 to January 2019, covering:

More than 250 key contributions are included in this survey, covering many aspects of generic object detection research: leading detection frameworks and fundamental subproblems including object feature representation, object proposal generation, context information modeling and training strategies; evaluation issues, specifically benchmark datasets, evaluation metrics, and state of the art performance.

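The main purpose of these notes is to excerpt the important figures, tables, and conclusions from the paper as an index for systematic study, without going into detail.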

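The two figures below come from the GitHub repository: the paper list and the performance table. Papers marked in red are the ones the repository author considers must-reads.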


Object Detection: Task and Challenges

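The input to the object detection task is an image, and the output is the location and category of each object in it, as shown in the figure below. A location can be described by a bounding box or by a set of pixels.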
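The two location representations are related: a pixel set can always be reduced to a box, though not the other way round. The following is a minimal sketch of that reduction, assuming a binary mask stored as a nested list and a box given as inclusive pixel indices. The helper name `mask_to_box` and the box convention are illustrative assumptions, not something defined in the paper.

```python
from typing import List, Tuple

# Assumed convention: (x_min, y_min, x_max, y_max), inclusive pixel indices.
Box = Tuple[int, int, int, int]


def mask_to_box(mask: List[List[int]]) -> Box:
    """Return the tightest axis-aligned box around the non-zero pixels of a mask."""
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        raise ValueError("mask contains no object pixels")
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return min(xs), min(ys), max(xs), max(ys)


# A toy 4x5 mask: 1 marks pixels that belong to the object.
mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 0, 0],
]
box = mask_to_box(mask)
```

The box keeps only the extent of the object and discards its shape, which is why the pixel-set description is the richer of the two.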

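Determining the location and category of objects in an image poses many challenges. A good detector must localize accurately, classify accurately, and run efficiently. It must be robust to variations in illumination, deformation, scale, viewpoint, size, pose, occlusion, blur, and noise, and it must tolerate potentially large intra-class variation while still separating small inter-class differences.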


Overview of Object Detection Methods

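Before 2012, object detection relied mainly on hand-crafted feature engineering plus a classifier; since 2012, DCNN-based methods have dominated, as shown in the figure below: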


Object detection frameworks fall into 2 categories:

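Two stage detection framework: includes a region proposal step. ROIs are obtained first, then each ROI is classified and its bounding box regressed. The RCNN family is representative.
One stage detection framework: has no region proposal step. The whole image is divided into a grid, and classification and regression are performed for each grid cell. The YOLO family is representative.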
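The grid idea in the one stage description can be sketched as follows. This is a minimal illustration, assuming the YOLO-style convention that the cell containing an object's center is the one responsible for predicting it. The function name `responsible_cell`, the 448x448 image size, and the 7x7 grid are assumptions chosen for the example.

```python
from typing import Tuple


def responsible_cell(box: Tuple[float, float, float, float],
                     image_size: Tuple[int, int],
                     grid: int) -> Tuple[int, int]:
    """Return (row, col) of the grid cell that contains the center of box.

    box is (x_min, y_min, x_max, y_max) in pixels; image_size is (width, height).
    """
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    # Clamp so a center lying exactly on the right or bottom edge stays in range.
    col = min(int(cx / width * grid), grid - 1)
    row = min(int(cy / height * grid), grid - 1)
    return row, col


# Box center is (200, 300) in a 448x448 image split into a 7x7 grid.
cell = responsible_cell((100, 200, 300, 400), image_size=(448, 448), grid=7)
```

A two stage detector would instead first produce a list of candidate boxes and then run classification and regression on each candidate.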

A comparison of the pipelines and their evolution is shown below:

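The backbone network, the design of the detection framework, and large-scale, high-quality datasets are the 3 most important factors in detection performance: they determine how good the learned features are and how well those features are used.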

Fundamental Subproblems

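This section focuses on DCNN-based feature representation, proposal generation, context information, training strategies, and related topics.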

DCNN-Based Feature Representation

Network Backbone

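ILSVRC (ImageNet Large Scale Visual Recognition Competition) greatly advanced DCNN architectures. Across computer vision tasks, these classic networks are commonly used as the backbone, with task-specific designs built on top. The DCNN architectures commonly used in object detection are as follows: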

Methods For Improving Object Representation

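The size of an object in an image is unknown in advance, and different objects in the same image may differ in size. Because deeper DCNN layers have larger receptive fields, predicting from a single layer is unlikely to be optimal. A natural idea is to predict from information extracted at different layers, known as multiscale object detection, which falls into 3 categories: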
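The claim that deeper layers see a larger region can be checked with the standard receptive field recurrence for a chain of convolution and pooling layers: each layer grows the receptive field by (kernel - 1) times the accumulated stride. The sketch below assumes a plain sequential stack with no dilation; the layer list is a hypothetical toy network, not one of the backbones discussed in the paper.

```python
def receptive_field(layers):
    """Receptive field size (in input pixels) after each layer.

    layers: list of (kernel_size, stride) pairs, applied in order.
    Dilation is ignored in this sketch.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between adjacent outputs
    sizes = []
    for kernel, stride in layers:
        rf += (kernel - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Hypothetical stack: 3x3 conv, 2x2 pool (stride 2), 3x3 conv, 2x2 pool, 3x3 conv.
layers = [(3, 1), (2, 2), (3, 1), (2, 2), (3, 1)]
sizes = receptive_field(layers)  # [3, 4, 8, 10, 18]
```

In this toy stack the last layer covers an 18-pixel window while the first covers only 3, which illustrates why shallow layers tend to suit small objects and deep layers large ones.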

Detecting with combined features of multiple CNN layers
Detecting at multiple CNN layers
Combinations of the above two methods
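The first category, detecting with combined features of multiple layers, can be sketched on toy data. The example below assumes single-channel feature maps stored as nested lists, nearest-neighbour upsampling, and combination by stacking values per location. These choices, and the helper names `upsample_nearest` and `combine`, are illustrative assumptions; published methods differ in how they resize and merge features.

```python
def upsample_nearest(fmap, factor):
    """Nearest-neighbour upsampling of a 2D list by an integer factor."""
    out = []
    for row in fmap:
        wide = [v for v in row for _ in range(factor)]
        out.extend([list(wide) for _ in range(factor)])
    return out


def combine(fine, coarse):
    """Stack a fine map with an upsampled coarse map, giving one tuple per location."""
    factor = len(fine) // len(coarse)
    up = upsample_nearest(coarse, factor)
    return [[(f, c) for f, c in zip(frow, crow)] for frow, crow in zip(fine, up)]


# A 4x4 map from a shallow layer and a 2x2 map from a deeper layer.
fine = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
coarse = [[100, 200], [300, 400]]
combined = combine(fine, coarse)
```

After combination, every location carries both a shallow, high-resolution value and a deep, large-receptive-field value, and a single predictor can run on the merged map. The second category would instead attach a separate predictor to each of `fine` and `coarse`.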

The figure makes this more intuitive:

Modeling geometric deformation is another direction for improving object representation. Methods include those that incorporate Deformable Part based Models (DPMs) and Deformable Convolutional Networks (DCN).

Context Modeling

Context information falls into 3 categories:

Semantic context: The likelihood of an object to be found in some scenes but not in others;
Spatial context: The likelihood of finding an object in some position and not others with respect to other objects in the scene;
Scale context: Objects have a limited set of sizes relative to other objects in the scene.

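By learning features at different levels of abstraction, a DCNN may already use contextual information implicitly, so current state-of-the-art detectors do not exploit contextual information explicitly. Recently, however, some DCNN methods do use it explicitly; they fall into 2 categories: Global context and Local context.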

My impression is that this can, to some extent, be viewed as ensemble learning at the data level.

Detection Proposal Methods

A two stage detection framework needs to generate ROIs.

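ROI generation methods can be divided into Bounding Box Proposal Methods and Object Segment Proposal Methods. The former regress a bounding box to describe each ROI; the latter use segmentation to obtain a set of pixels that describes it.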

Other Special Issues

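Data augmentation tricks yield more robust feature representations and can be viewed as ensemble learning at the data level. Because objects can appear at widely varying scales, scaling is the most commonly used augmentation.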
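For detection, scaling an image also requires scaling its annotations by the same factor. The sketch below shows only that bookkeeping, assuming boxes in (x_min, y_min, x_max, y_max) pixel coordinates and a uniform scale factor. The function name `scale_sample` is hypothetical, and the actual pixel resampling is left to whatever image library is in use.

```python
def scale_sample(image_size, boxes, scale):
    """Scale an image size and its bounding boxes by the same factor.

    image_size: (width, height) in pixels.
    boxes: list of (x_min, y_min, x_max, y_max) tuples in pixels.
    Only the size and the boxes are transformed; pixels are not resampled here.
    """
    width, height = image_size
    new_size = (round(width * scale), round(height * scale))
    new_boxes = [tuple(c * scale for c in box) for box in boxes]
    return new_size, new_boxes


# Halve a 640x480 training sample that contains one annotated object.
size, boxes = scale_sample((640, 480), [(100, 50, 300, 200)], 0.5)
```

Applying several scale factors to the same sample gives the detector views of each object at different sizes.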

Datasets and Performance Evaluation

That is all.