[Deep Learning in Practice] Some Thoughts and Doubts about CenterNet: Speed and Accuracy Compared with the U Version of YoloV3

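You are welcome to follow Xiao Song's WeChat official account 《极简 AI》 and learn deep learning with me.

There I share notes on deep learning theory and application development and regularly post practical material. If you run into problems while learning or applying deep learning, you can reach me there and I will answer whatever I can.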

0. Introduction

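I am very fond of CenterNet's minimalist network structure. CenterNet detects and classifies objects using only an FCN (fully convolutional network), with no need for complex operations such as anchors or NMS, so it is efficient while its accuracy remains respectable. With simple modifications, the same structure can also be applied to human pose estimation and 3D object detection.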

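Later work has applied CenterNet-style structures to other tasks with good results, for example CenterFace for face detection, and CenterTrack and FairMot for object tracking. I will cover these once I have studied them, and I will probably write a comparative summary of CenterNet-like structures; interested readers can stay tuned.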

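Now for the reason I wrote this post. While studying CenterNet I saw its comparison with YoloV3, in which CenterNet comes out ahead on both speed and accuracy. I was somewhat skeptical of that conclusion.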

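**YoloV3 is known for being fast though not especially accurate, and it is widely used in real detection projects for real-time detection and recognition.** Compared with the two-stage Faster R-CNN it has a speed advantage, and compared with the one-stage SSD (Single Shot MultiBox Detector) and RetinaNet it has advantages in both speed and accuracy.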

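So I had doubts about CenterNet's claimed speed gain over YoloV3. YoloV3 is arguably the most widely used and most practical object detection algorithm in industry today. If the conclusions of the CenterNet paper hold, then, given that CenterNet is also structurally simple and easy to use (setting DCN aside for now; full deployment support is only a matter of time), it would surely take YoloV3's place.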

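Given this, I decided to run a comparison on the same hardware and in the same software environment. To keep the experiment simple, I only compare the speed of CenterNet and YoloV3 at the same input sizes; for accuracy I rely on the published figures.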
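A speed comparison of this kind usually discards the first call, which includes one-off initialization, and waits for asynchronous GPU work to finish before reading the clock. The following is a minimal sketch of such a harness, not the timing code used by either repository. The names `benchmark` and `sync` are hypothetical; on a CUDA device one would typically pass `torch.cuda.synchronize` as `sync`.

```python
import time
from statistics import mean


def benchmark(fn, inputs, warmup=1, sync=None):
    """Return the mean wall-clock seconds per call of fn over inputs.

    The first `warmup` calls run but are not counted. `sync` is an optional
    callable that blocks until pending asynchronous work has finished.
    """
    timings = []
    for i, x in enumerate(inputs):
        if sync is not None:
            sync()  # make sure earlier work is not billed to this call
        start = time.perf_counter()
        fn(x)
        if sync is not None:
            sync()  # wait for this call's work before stopping the clock
        elapsed = time.perf_counter() - start
        if i >= warmup:
            timings.append(elapsed)
    if not timings:
        raise ValueError("need more inputs than warm-up calls")
    return mean(timings)


# Stand-in workload; a real run would call the detector on each image.
avg_seconds = benchmark(lambda n: sum(range(n)), [100_000] * 11)
print(f"{avg_seconds * 1000:.3f} ms per call")
```

Excluding the warm-up call matters here: in the CenterNet logs further down, the first image takes roughly 0.2 s in the `net` stage while later images take a few tens of milliseconds.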

1. Experimental setup

So that readers can trust and easily reproduce my results, here are the hardware and environment used:

  • OS: Ubuntu 18.04.4 LTS
  • CPU: Intel® Core™ i5-9400F CPU @ 2.90GHz × 6
  • GPU: GeForce RTX 2060 SUPER/PCIe/SSE2
  • CUDA: 10.1
  • PyTorch: 1.5.0

Open-source implementations used:

CenterNet: github.com/xingyizhou/…

YoloV3: github.com/ultralytics…

2. Experiments

1. U version of YoloV3

1. Longest side scaled to 320

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 320 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 12 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 256x320 4 persons, 2 dogs, Done. (0.011s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 192x320 9 persons, 2 cars, 3 motorcycles, 1 trucks, Done. (0.013s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 256x320 3 persons, 2 boats, Done. (0.012s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 256x320 14 cars, Done. (0.011s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 256x320 11 persons, 1 cars, Done. (0.013s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 256x320 9 persons, 7 bicycles, 1 backpacks, Done. (0.011s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 256x320 4 persons, 1 cars, 1 buss, 1 trucks, Done. (0.011s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 256x320 3 cars, 2 stop signs, Done. (0.011s)
image 9/10 data/samples/bus.jpg: 320x256 4 persons, 1 buss, 1 handbags, Done. (0.011s)
image 10/10 data/samples/zidane.jpg: 192x320 3 persons, 3 ties, Done. (0.010s)
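The per-image times in this output can be averaged with a few lines of code. The sketch below is a hypothetical helper, not part of the yolov3 repository; it assumes every per-image line starts with `image ` and ends with `Done. (<seconds>s)`, as in the log shown here.

```python
import re
from statistics import mean

# One match per per-image line; the trailing summary line does not start with "image ".
IMAGE_LINE = re.compile(r"^image .*Done\. \((\d+\.\d+)s\)\s*$", re.MULTILINE)


def mean_inference_ms(log_text):
    """Average the per-image inference times of a detect.py-style log, in ms."""
    times = [float(t) for t in IMAGE_LINE.findall(log_text)]
    if not times:
        raise ValueError("no per-image timing lines found")
    return 1000 * mean(times)


# Two-line excerpt copied from the log above.
excerpt = """\
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 256x320 4 persons, 2 dogs, Done. (0.011s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 192x320 9 persons, 2 cars, 3 motorcycles, 1 trucks, Done. (0.013s)
"""
print(f"{mean_inference_ms(excerpt):.1f} ms")  # 12.0 ms for this two-line excerpt
```

Anchoring the pattern on `image ` keeps the final `Done. (...) avg time (...)` summary line, which covers the whole run, out of the per-image average.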

2. Longest side scaled to 512

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 512 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 20 ms

Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2060 SUPER', total_memory=7979MB)
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 384x512 4 persons, 2 dogs, Done. (0.018s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 320x512 9 persons, 4 cars, 2 motorcycles, 1 trucks, 1 benchs, 1 chairs, Done. (0.019s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 384x512 3 persons, 3 boats, 1 birds, Done. (0.018s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 384x512 4 persons, 18 cars, 2 traffic lights, Done. (0.019s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 384x512 13 persons, 1 trucks, Done. (0.020s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 384x512 16 persons, 6 bicycles, 2 backpacks, 1 bottles, 1 cell phones, Done. (0.020s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 384x512 3 persons, 1 cars, 1 buss, Done. (0.020s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 384x512 3 cars, 1 stop signs, Done. (0.020s)
image 9/10 data/samples/bus.jpg: 512x384 4 persons, 1 buss, 1 stop signs, 1 ties, 1 skateboards, Done. (0.018s)
image 10/10 data/samples/zidane.jpg: 320x512 3 persons, 2 ties, Done. (0.015s)

3. Longest side scaled to 800

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 800 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 40 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 544x800 7 persons, 2 dogs, 1 handbags, Done. (0.041s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 480x800 10 persons, 8 cars, 1 motorcycles, 2 trucks, 1 chairs, Done. (0.040s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 544x800 3 persons, 4 boats, Done. (0.045s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 544x800 6 persons, 27 cars, 1 buss, 1 trucks, 6 traffic lights, Done. (0.045s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 544x800 14 persons, 2 cars, 1 trucks, 1 ties, Done. (0.047s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 544x800 21 persons, 11 bicycles, 1 backpacks, 2 handbags, 4 bottles, 1 cell phones, Done. (0.038s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 608x800 6 persons, 1 cars, 1 buss, Done. (0.038s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 544x800 4 cars, 1 trucks, 2 stop signs, Done. (0.037s)
image 9/10 data/samples/bus.jpg: 800x608 4 persons, 1 bicycles, 1 buss, 1 ties, Done. (0.039s)
image 10/10 data/samples/zidane.jpg: 480x800 3 persons, 2 ties, Done. (0.029s)

4. Longest side scaled to 1024

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 1024 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 50 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 704x1024 5 persons, 1 dogs, 1 handbags, Done. (0.054s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 576x1024 13 persons, 5 cars, 1 motorcycles, 2 trucks, 1 umbrellas, 1 chairs, Done. (0.049s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 704x1024 3 persons, 6 boats, 4 birds, Done. (0.049s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 704x1024 10 persons, 24 cars, 1 buss, 2 trucks, 7 traffic lights, Done. (0.051s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 704x1024 14 persons, 1 cars, 1 trucks, 2 handbags, 2 ties, Done. (0.048s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 704x1024 25 persons, 8 bicycles, 1 backpacks, 1 handbags, 1 kites, 2 bottles, 1 cell phones, Done. (0.051s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 768x1024 5 persons, 1 cars, 1 buss, Done. (0.052s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 704x1024 4 cars, 1 trucks, 1 stop signs, Done. (0.048s)
image 9/10 data/samples/bus.jpg: 1024x768 3 persons, 1 bicycles, 1 buss, 1 cell phones, Done. (0.054s)
image 10/10 data/samples/zidane.jpg: 576x1024 1 persons, 3 ties, Done. (0.042s)
Results saved to /home/song/yolov3/output
Done. (0.943s) avg time (0.094s)

[Figure: YoloV3 output images]
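The YoloV3 logs report the actual network input shape for each image (256x320, 384x512, 544x800, 704x1024 for the first image), and the CenterNet runs below pass the same values through `--input_h` and `--input_w`. The sketch below shows one padding rule that reproduces those shapes. It is an assumption, not code taken from the yolov3 repository: the longest side is scaled to the target, and each side is padded by the remainder of (target minus scaled side) modulo 64. The 2048x1365 source size is also hypothetical, since the article does not state the image dimensions.

```python
def letterbox_shape(width, height, target, stride=64):
    """Return (h, w) after scaling the longest side to `target` and padding.

    Assumed rule: pad each side by (target - scaled_side) modulo `stride`.
    """
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    pad_w = (target - new_w) % stride
    pad_h = (target - new_h) % stride
    return new_h + pad_h, new_w + pad_w


# Hypothetical 2048x1365 landscape photo (dimensions not given in the article).
for target in (320, 512, 800, 1024):
    h, w = letterbox_shape(2048, 1365, target)
    print(f"--img-size {target}: --input_h {h} --input_w {w}")
```

Under these assumptions the four targets give 256x320, 384x512, 544x800 and 704x1024, which are the height and width values used in the CenterNet commands.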

2. CenterNet

1. Longest side scaled to 320

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 256 --input_w 320

Result: average model inference time of 16 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 256, 320])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.202s |load 0.005s |pre 0.003s |net 0.191s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.059s |load 0.031s |pre 0.009s |net 0.016s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.027s |load 0.010s |pre 0.002s |net 0.012s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.052s |load 0.022s |pre 0.008s |net 0.018s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.053s |load 0.021s |pre 0.009s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.057s |load 0.028s |pre 0.008s |net 0.017s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.036s |load 0.008s |pre 0.005s |net 0.019s |dec 0.001s |post 0.003s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.053s |load 0.027s |pre 0.006s |net 0.016s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.044s |load 0.017s |pre 0.005s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.028s |load 0.009s |pre 0.003s |net 0.012s |dec 0.002s |post 0.002s |merge 0.000s 
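The first `tot` line in this output includes one-off start-up cost (its `net` value is 0.191 s), so an average over the log should skip it. The sketch below is a hypothetical parser, not part of the CenterNet repository; it assumes each timing line starts with `tot ` and lists stages as `name 0.000s` separated by `|`, as shown above.

```python
import re
from statistics import mean

STAGE = re.compile(r"(\w+) (\d+\.\d+)s")


def parse_stage_line(line):
    """Turn 'tot 0.059s |load 0.031s |...' into {'tot': 0.059, 'load': 0.031, ...}."""
    return {name: float(sec) for name, sec in STAGE.findall(line)}


def mean_stage_ms(lines, stage="net", warmup=1):
    """Average one stage in ms over the timing lines, skipping `warmup` images."""
    rows = [parse_stage_line(l) for l in lines if l.startswith("tot ")]
    kept = rows[warmup:]
    if not kept:
        raise ValueError("no timing lines left after warm-up")
    return 1000 * mean(row[stage] for row in kept)


# First three timing lines copied from the log above.
excerpt = [
    "torch.Size([1, 3, 256, 320])",
    "tot 0.202s |load 0.005s |pre 0.003s |net 0.191s |dec 0.001s |post 0.002s |merge 0.000s",
    "tot 0.059s |load 0.031s |pre 0.009s |net 0.016s |dec 0.001s |post 0.002s |merge 0.000s",
    "tot 0.027s |load 0.010s |pre 0.002s |net 0.012s |dec 0.001s |post 0.002s |merge 0.000s",
]
print(f"{mean_stage_ms(excerpt):.1f} ms")  # 14.0 ms for this excerpt
```

Applied to all ten timing lines of this run, the same calculation gives about 16.4 ms for `net`, in line with the 16 ms figure quoted.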

2. Longest side scaled to 512

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 384 --input_w 512

Result: average model inference time of 20 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 384, 512])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.206s |load 0.005s |pre 0.006s |net 0.193s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.041s |load 0.012s |pre 0.007s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.031s |load 0.005s |pre 0.005s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.060s |load 0.018s |pre 0.016s |net 0.022s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.046s |load 0.009s |pre 0.010s |net 0.023s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.055s |load 0.018s |pre 0.014s |net 0.018s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.059s |load 0.021s |pre 0.015s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.037s |load 0.008s |pre 0.007s |net 0.018s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.070s |load 0.038s |pre 0.009s |net 0.020s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.062s |load 0.027s |pre 0.014s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s

3. Longest side scaled to 800

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 544 --input_w 800

Result: average model inference time of 41 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 544, 800])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.234s |load 0.005s |pre 0.015s |net 0.211s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.105s |load 0.031s |pre 0.027s |net 0.044s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.092s |load 0.023s |pre 0.023s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.073s |load 0.009s |pre 0.021s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.091s |load 0.021s |pre 0.026s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.085s |load 0.019s |pre 0.022s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.091s |load 0.021s |pre 0.026s |net 0.040s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.085s |load 0.017s |pre 0.022s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.098s |load 0.033s |pre 0.020s |net 0.040s |dec 0.003s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.074s |load 0.011s |pre 0.020s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 

4. Longest side scaled to 1024

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 704 --input_w 1024

Result: average model inference time of 53 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 704, 1024])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.260s |load 0.005s |pre 0.025s |net 0.227s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.100s |load 0.007s |pre 0.027s |net 0.063s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.112s |load 0.014s |pre 0.034s |net 0.060s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.117s |load 0.022s |pre 0.029s |net 0.062s |dec 0.002s |post 0.002s |merge 0.000s
torch.Size([1, 3, 704, 1024])
tot 0.119s |load 0.021s |pre 0.033s |net 0.061s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.090s |load 0.007s |pre 0.018s |net 0.061s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.108s |load 0.020s |pre 0.033s |net 0.051s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.078s |load 0.007s |pre 0.018s |net 0.051s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.116s |load 0.036s |pre 0.025s |net 0.050s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.082s |load 0.006s |pre 0.020s |net 0.051s |dec 0.003s |post 0.002s |merge 0.000s 

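3. Supplementary experiments

After studying both codebases further, I found a flaw in the experiment: it compares only model inference speed. That does show how efficient each model is, but in real applications pre- and post-processing also take time. I therefore added a comparison of end-to-end time at sizes 640 and 1280 to show the speed difference in practical use.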
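To separate model time from pre- and post-processing in one's own pipeline, each stage can be timed independently and then summed. The sketch below uses a hypothetical `StageTimer` helper built on the standard library. The stage names mirror the columns printed by the CenterNet demo, and the stage bodies are placeholders rather than real detector code.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Accumulate wall-clock seconds per named stage."""

    def __init__(self):
        self.seconds = defaultdict(float)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the time even if the stage raises.
            self.seconds[name] += time.perf_counter() - start

    def total(self):
        return sum(self.seconds.values())


timer = StageTimer()
with timer.stage("pre"):
    data = [x / 255 for x in range(10_000)]   # placeholder for resize/normalize
with timer.stage("net"):
    score = sum(v * v for v in data)          # placeholder for the forward pass
with timer.stage("post"):
    result = round(score, 3)                  # placeholder for decoding boxes

for name, sec in timer.seconds.items():
    print(f"{name} {sec:.6f}s")
print(f"tot {timer.total():.6f}s")
```

With GPU inference, the `net` stage would also need a device synchronization before its timer stops; otherwise part of the model's work is billed to the following stage.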

1. U version of YoloV3

1. Longest side scaled to 640

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 640 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 26 ms, average end-to-end time of 64 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 448x640 5 persons, 2 dogs, Done. (0.028s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 384x640 12 persons, 6 cars, 3 motorcycles, 1 trucks, 1 chairs, Done. (0.026s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 448x640 3 persons, 4 boats, Done. (0.025s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 448x640 8 persons, 22 cars, 1 buss, 1 trucks, 3 traffic lights, 1 clocks, Done. (0.028s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 448x640 14 persons, 1 cars, 1 trucks, Done. (0.026s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 448x640 19 persons, 6 bicycles, 1 backpacks, 2 bottles, 1 cell phones, Done. (0.028s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 512x640 5 persons, 1 cars, 1 buss, Done. (0.025s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 448x640 5 cars, 1 trucks, 2 stop signs, Done. (0.024s)
image 9/10 data/samples/bus.jpg: 640x512 4 persons, 1 buss, Done. (0.025s)
image 10/10 data/samples/zidane.jpg: 384x640 3 persons, 2 ties, Done. (0.021s)
Results saved to /home/song/yolov3/output
Done. (0.638s) avg time (0.064s)

2. Longest side scaled to 1280

Run: ~/yolov3$

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 1280 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time of 78 ms, average end-to-end time of 124 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 896x1280 4 persons, 3 dogs, 1 backpacks, Done. (0.085s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 704x1280 16 persons, 10 cars, 1 motorcycles, 1 buss, 1 trucks, 1 umbrellas, 1 chairs, Done. (0.061s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 896x1280 5 persons, 3 boats, 3 birds, Done. (0.075s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 896x1280 11 persons, 27 cars, 1 buss, 5 trucks, 7 traffic lights, 1 remotes, Done. (0.075s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 896x1280 14 persons, 2 cars, 1 trucks, 5 handbags, 7 ties, Done. (0.074s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 832x1280 27 persons, 10 bicycles, 2 backpacks, 3 handbags, 1 kites, 2 bottles, 1 cell phones, Done. (0.070s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 960x1280 5 persons, 1 cars, 1 buss, Done. (0.079s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 896x1280 5 cars, 1 trucks, 1 stop signs, 1 benchs, Done. (0.075s)
image 9/10 data/samples/bus.jpg: 1280x960 4 persons, 1 bicycles, 1 ties, 1 cups, Done. (0.078s)
image 10/10 data/samples/zidane.jpg: 768x1280 2 ties, Done. (0.062s)
Results saved to /home/song/yolov3/output
Done. (1.236s) avg time (0.124s)

2. CenterNet

1. Longest side scaled to 640

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 448 --input_w 640

Result: average model inference time of 27 ms, average end-to-end time of 50 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 448, 640])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.223s |load 0.005s |pre 0.010s |net 0.206s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 448, 640])
tot 0.079s |load 0.026s |pre 0.021s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.074s |load 0.023s |pre 0.018s |net 0.029s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.044s |load 0.005s |pre 0.007s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.043s |load 0.005s |pre 0.006s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.044s |load 0.006s |pre 0.006s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.042s |load 0.004s |pre 0.007s |net 0.028s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.047s |load 0.006s |pre 0.007s |net 0.032s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.043s |load 0.009s |pre 0.008s |net 0.024s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.040s |load 0.006s |pre 0.007s |net 0.024s |dec 0.001s |post 0.002s |merge 0.000s |

2. Longest side scaled to 1280

Run: ~/CenterNet/src$

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 896 --input_w 1280

Result: average model inference time of 78 ms, average end-to-end time of 115 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 896, 1280])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.302s |load 0.005s |pre 0.039s |net 0.254s |dec 0.003s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.137s |load 0.007s |pre 0.041s |net 0.085s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.005s |pre 0.027s |net 0.077s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.160s |load 0.023s |pre 0.057s |net 0.076s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.004s |pre 0.025s |net 0.080s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.113s |load 0.006s |pre 0.026s |net 0.076s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.111s |load 0.004s |pre 0.025s |net 0.077s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.126s |load 0.014s |pre 0.029s |net 0.079s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.125s |load 0.010s |pre 0.031s |net 0.080s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.006s |pre 0.026s |net 0.078s |dec 0.002s |post 0.002s |merge 0.000s |

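4. Summary of experiments

CenterNet vs YoloV3 speed: model inference time

| Model \ Size | 320 | 512 | 800 | 1024 |
| --- | --- | --- | --- | --- |
| YoloV3-spp-ultralytics | 12 ms | 20 ms | 40 ms | 50 ms |
| CenterNet-DLA-34 | 16 ms | 20 ms | 41 ms | 53 ms |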

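CenterNet vs YoloV3 speed: model inference time / end-to-end time

| Model \ Size | 640 model | 640 end-to-end | 1280 model | 1280 end-to-end |
| --- | --- | --- | --- | --- |
| YoloV3-spp-ultralytics | 26 ms | 64 ms | 77 ms | 124 ms |
| CenterNet-DLA-34 | 27 ms | 50 ms | 78 ms | 115 ms |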
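The relative differences between the two models follow directly from the two speed tables above. This is a small illustrative calculation, with YoloV3-spp-ultralytics taken as the baseline; `relative_change` is a hypothetical helper name, and the numbers are copied from the tables.

```python
def relative_change(baseline_ms, candidate_ms):
    """Percent change of candidate relative to baseline (negative means faster)."""
    return 100.0 * (candidate_ms - baseline_ms) / baseline_ms


# (YoloV3-spp-ultralytics ms, CenterNet-DLA-34 ms), copied from the tables above.
timings = {
    "320 model": (12, 16),
    "512 model": (20, 20),
    "800 model": (40, 41),
    "1024 model": (50, 53),
    "640 model": (26, 27),
    "640 end-to-end": (64, 50),
    "1280 model": (77, 78),
    "1280 end-to-end": (124, 115),
}

for label, (yolo_ms, centernet_ms) in timings.items():
    print(f"{label}: {relative_change(yolo_ms, centernet_ms):+.1f}%")
```

In these figures the only negative rows are the two end-to-end ones, about -21.9% at 640 and -7.3% at 1280, while the largest model-only gap is at 320 (about +33.3%).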

CenterNet vs YoloV3: model size / memory usage

| Model \ Resource | Model file size | Memory usage at size 1280 |
| --- | --- | --- |
| YoloV3-spp-ultralytics | 252.3 MB (252,297,867 bytes) | 1.7G |
| CenterNet-DLA-34 | 80.9 MB (80,911,783 bytes) | 1.2G |
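As a sanity check on the file sizes, the YoloV3-spp weight size can be estimated from the parameter count printed in the `Model Summary` lines of the logs. The sketch assumes 4 bytes per parameter (float32 weights) and 1 MB = 10^6 bytes, which matches how the byte counts in the table map to megabytes; `weight_megabytes` is a hypothetical helper name.

```python
def weight_megabytes(num_params, bytes_per_param=4):
    """Estimated size of the raw weights in MB (1 MB = 10**6 bytes)."""
    return num_params * bytes_per_param / 1e6


yolo_params = 6.29987e7            # from the "Model Summary" line in the logs
yolo_file_bytes = 252_297_867      # from the table above
centernet_file_bytes = 80_911_783  # from the table above

estimate = weight_megabytes(yolo_params)
ratio = centernet_file_bytes / yolo_file_bytes

print(f"estimated YoloV3-spp weights: {estimate:.1f} MB")   # 252.0 MB
print(f"CenterNet / YoloV3-spp file size: {ratio:.0%}")     # 32%
```

The estimate of about 252.0 MB is close to the 252.3 MB file, which suggests the checkpoint is dominated by float32 weights; the CenterNet-DLA-34 file is roughly a third of that size.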

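The speed and resource figures come from my own measurements; readers are welcome to reproduce and challenge them.

CenterNet vs YoloV3 COCO accuracy

| Model \ Size | 512 |
| --- | --- |
| YoloV3 | 32.7 mAP |
| YoloV3-spp | 35.6 mAP |
| YoloV3-spp-ultralytics | 42.6 mAP |
| CenterNet-DLA-34 | 37.4 mAP |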

Accuracy sources:

1. github.com/ultralytics…

2. github.com/xingyizhou/…

Conclusions:

As for my doubt, stated earlier as "I had doubts about CenterNet's claimed speed gain over YoloV3", the experimental results partly confirm it.

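In pure model inference speed, CenterNet-DLA-34 is slower than the YoloV3-spp version at every size tested except 512, where the two tie: roughly 1% to 6% slower from 640 to 1280, and 16 ms versus 12 ms at 320. This differs somewhat from the paper. Once processing time outside the model is included, however, CenterNet-DLA-34 is clearly faster at both sizes measured: about 22% at 640 (50 ms vs 64 ms) and about 7% at 1280 (115 ms vs 124 ms).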

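In model size and memory usage, CenterNet-DLA-34 improves noticeably on the YoloV3-spp version: the model file is about 32% of the size (80.9 MB vs 252.3 MB), and GPU memory usage during inference drops to about 70% (1.2G vs 1.7G). I suspect this is a benefit of the anchor-free approach.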

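The table "CenterNet vs YoloV3 COCO accuracy" shows that, at the same size, CenterNet improves on the original YoloV3 by about 5 points of mAP and on YoloV3-spp by about 2 points, but still trails YoloV3-spp-ultralytics (the U version of YoloV3-spp) by about 5 points. This of course assumes the data are accurate and reliable; I am inclined to believe them but cannot vouch for them. In short: CenterNet is a clear improvement over the original YoloV3, a modest improvement over the improved YoloV3-spp, and below the U version of YoloV3-spp.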

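To sum up, CenterNet is pioneering work: it unifies the keypoint and object detection pipelines, is structurally simple, and is convenient to use. I like this network very much. Applied to real scenarios, it improves on YoloV3 and even YoloV3-spp in both speed and accuracy; the one drawback is that deployment is somewhat harder (mainly because inference frameworks do not yet support DCN well, though there are workarounds).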

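With its **simple structure, ease of use, good speed and accuracy, and low memory footprint,** CenterNet can replace YoloV3 and holds some advantages over it. YoloV4 has also been released, but in my view its accuracy gains come with more overall complexity and longer model inference time, so fully replacing YoloV3 with YoloV4 is not realistic (if you are interested in how YoloV4 compares with YoloV3, say so in the comments; if enough readers are interested, I can write a follow-up post).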

Finally, the conclusion of this post: CenterNet has clear advantages over YoloV3, and I recommend trying it as a replacement.