Lessons from Writing a Survey on Deep Learning Hardware Acceleration

In the first semester of my junior year, I set out to write a survey on FPGA-based hardware acceleration. I read a large amount of related literature and, over roughly three months, completed my first English survey paper.

For advice on how to read and collect papers, along with tool recommendations, see these two articles:

  1. Andrew Ng's advice on a machine learning career and on reading papers (with PDFs of 10 must-read AI papers)
  2. With these treasured practical tools and learning sites, self-study is more fun! (the article explains how to download papers for free)

Below I briefly describe how the paper came together. If you are interested, feel free to leave a comment; I may later publish a detailed article on how to write a survey.

On the second day of the January break, I attended the Jiangsu Province "万人计划" academic winter camp for university students, and was fortunate to be admitted to Nanjing University's winter camp on frontier technologies in electronic information. I learned a great deal there; our main activities were academic lectures, hands-on AI training, and academic salons. After the camp I stayed home for about a week, during which I kept thinking about a direction I found deeply interesting: the design of AI hardware accelerators. Given my earlier course projects, FPGA competitions, and study of digital logic design and machine learning algorithms, I resolved to search the literature, get a clear picture of the state of research on hardware acceleration, and at the same time practice the Web of Science retrieval method introduced by Professor Tao Tao of Nanjing University.


And so, from home, I accessed the Zhejiang Library and found many important domestic and international papers on the topic. Faced with so many dense, hard-to-read articles, I was at a loss at first. On closer inspection, I found that many well-known professors at home and abroad had written survey articles. I read about five of these carefully and picked up some techniques for constructing a writing framework. I then drafted an outline on the spot.

Next, I collected many papers on concrete accelerator implementations. These were harder to follow: some implementations were based on CPUs, GPUs, or DSPs, and others on memristors. I focused on FPGA-based implementation methods and on which applications had been accelerated on FPGAs, such as handwritten-digit recognition and image compression.

At that point, I had to sort the implementation methods, advances, and shortcomings of each platform into categories, summarize the differences between the platforms, and bring out, by comparison, where FPGA-based hardware acceleration stands.

In my paper, I devoted considerable space to the development of accelerator designs for deep learning and CNNs, made a comprehensive comparison, and finally offered constructive suggestions and conclusions.
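As a concrete illustration (this is not code from my survey), the workload that all of these accelerator designs target boils down to the convolution's nested multiply-accumulate loops, which FPGA, GPU, and ASIC designs parallelize in different ways. A minimal NumPy sketch of that kernel:

```python
import numpy as np

def conv2d(feature_map, kernel):
    """Naive 2D convolution (valid padding, stride 1): the nested
    multiply-accumulate loops that hardware accelerators parallelize."""
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            # multiply-accumulate over one receptive field
            out[i, j] = np.sum(feature_map[i:i + kh, j:j + kw] * kernel)
    return out

x = np.arange(16, dtype=float).reshape(4, 4)
k = np.ones((2, 2))
print(conv2d(x, k).shape)  # (3, 3)
```

Every iteration of the two outer loops is independent, which is exactly why spatial architectures on FPGAs can unroll and pipeline them.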

[Figure: structure of the article]

Two suggestions for writing a survey:

  1. The narrower the focus, the more targeted the survey — and the better it can be written;

  2. In particular, my strongest recommendation is to make good use of tables, charts, and other data graphics to summarize the various methods and to compare their development over time. This approach works remarkably well; highly recommended!
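As a sketch of what such a summary table might look like (the entries below are illustrative generalizations, not figures taken from my survey):

```
| Platform | Flexibility | Energy efficiency | Typical strength                      |
|----------|-------------|-------------------|---------------------------------------|
| CPU      | high        | low               | general-purpose control flow          |
| GPU      | medium      | medium            | high-throughput training              |
| FPGA     | medium      | high              | low-latency, reconfigurable inference |
| ASIC     | low         | highest           | fixed-function deployment             |
```

A table like this lets a reader grasp the platform trade-offs at a glance, which is far harder to do with prose alone.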


Below is the cover page of my draft:

[Figure: draft cover page]

The ability to search the literature is extremely important. Below are the references I found while writing, offered here for your reference.

References

[1] Boukaye Boubacar Traore, Bernard Kamsu-Foguem, Fana Tangara, Deep convolution neural network for image recognition, in: Ecological Informatics, Volume 48, 2018, Pages 257-268, ISSN 1574-9541.

[2] A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional neural networks, in: Advances in Neural Information Processing Systems, 2012, pp. 1097–1105.

[3] K. Simonyan and A. Zisserman, Very deep convolutional networks for large-scale image recognition, arXiv:1409.1556, 2014.

[4] S. Ren, K. He, R. Girshick, J. Sun, Faster r-cnn: towards real-time object detection with region proposal networks, in: Advances in Neural Information Processing Systems, 2015, pp. 91–99.

[5] J. Redmon, S. Divvala, R. Girshick, A. Farhadi, You only look once: unified, real-time object detection, in: Proceedings of The IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 779–788.

[6] Dawid Połap, Marcin Wozniak. Voice Recognition by Neuro-Heuristic Method[J]. Tsinghua Science and Technology, 2019, 24(01): 9-17.

[7] A. Ucar, Y. Demir, C. Guzelis, Object recognition and detection with deep learning for autonomous driving applications, (in English), Simul.-Trans. Soc. Model. Simul. Int. 93 (9) (Sep 2017) 759–769, doi:10.1177/0037549717709932.

[8] P. Pelliccione, E. Knauss, R. Heldal, et al., Automotive architecture framework: the experience of volvo cars, J. Syst. Archit. 77 (2017) 83–100. 06/01/ 2017 https://doi.org/10.1016/j.sysarc.2017.02.005.

[9] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521,no. 7553, pp. 436–444, 2015.

[10] D. Aysegul, J. Jonghoon, G. Vinayak, K. Bharadwaj,C. Alfredo, M. Berin, and C. Eugenio. Accelerating deep neural networks on mobile processor with embedded programmable logic. In NIPS 2013. IEEE, 2013.

[11] S. Cadambi, A. Majumdar, M. Becchi, S. Chakradhar, and H. P. Graf. A programmable parallel accelerator for learning and classification. In Proceedings of the 19th international conference on Parallel architectures and compilation techniques, pages 273–284. ACM, 2010.

[12] C. Farabet, C. Poulet, J. Y. Han, and Y. LeCun. Cnp: An fpga-based processor for convolutional networks. In Field Programmable Logic and Applications, 2009. FPL 2009. International Conference on, pages 32–37. IEEE, 2009.

[13] M. Peemen, A. A. Setio, B. Mesman, and H. Corporaal. Memory-centric accelerator design for convolutional neural networks. In Computer Design (ICCD), 2013 IEEE 31st International Conference on, pages 13–19. IEEE, 2013.

[14] 侯宇青阳,全吉成,王宏伟.深度学习发展综述[J].舰船电子工程,2017,37(04):5-9+111.

[15] 张荣,李伟平,莫同.深度学习研究综述[J].信息与控制,2018,47(04):385-397+410.

[16] Jiuxiang Gu, Zhenhua Wang, Jason Kuen, Lianyang Ma, Amir Shahroudy, Bing Shuai, Ting Liu, Xingxing Wang, Gang Wang, Jianfei Cai, Tsuhan Chen, Recent advances in convolutional neural networks, Pattern Recognition, Volume 77, 2018, Pages 354-377, ISSN 0031-3203.

[17] McCulloch W S, Pitts W. A logical calculus of the ideas immanent in nervous activity [J]. Bulletin of Mathematical Biophysics,1943,5(4): 115-133

[18] Rosenblatt F. The perceptron: A probabilistic model for information storage and organization in the brain [J].Psychological Review, 1958,65(6):386-408

[19] Hubel D H, Wiesel T N. Receptive fields, binocular interaction and functional architecture in the cat’s visual cortex[J]. Journal of Physiology, 1962,160(1):106-154

[20] M. Sankaradas, V. Jakkula, S. Cadambi, S. Chakradhar, I. Durdanovic, E. Cosatto, and H. P. Graf. A massively parallel coprocessor for convolutional neural networks. In Application-specific Systems, Architectures and Processors, 2009. ASAP 2009. 20th IEEE International Conference on, pages 53–60. IEEE, 2009.

[21] C. Zhang, P. Li, G. Sun, Y. Guan, B. Xiao, J. Cong, “Optimizing fpga-based accelerator design for deep convolutional neural networks”, FPGA, 2015.

[22] Liu Shaoli, Du Zidong, Tao Jinhua, et al. Cambricon: An instruction set architecture for neural networks[C]//Proc of the 43rd Int Symp on Computer Architecture. Piscataway, NJ:IEEE,2016:393-405

[23] Qianru Zhang, Meng Zhang, Tinghuan Chen, Zhifei Sun, Yuzhe Ma, Bei Yu, Recent advances in convolutional neural network acceleration, Neurocomputing, Volume 323, 2019, Pages 37-51, ISSN 0925-2312.

[24] 吴艳霞,梁楷,刘颖,崔慧敏.深度学习FPGA加速器的进展与趋势[J/OL].计算机学报,2019:1-20[2019-03-19].
http://kns.cnki.net/kcms/detail/11.1826.TP.20190114.1037.002.html.

[25] Cavigelli L, Gschwend D, Mayer C, et al. Origami: A convolutional network accelerator//Proceedings of the Great Lakes Symposium on VLSI. Pittsburgh, USA, 2015: 199-204

[26] Chen Y-H, Krishna T, Emer J, et al. 14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks //Proceedings of the 2016 IEEE International Solid-State Circuits Conference (ISSCC). San Francisco, USA, 2016: 262-263

[27] Shafiee A, Nag A, Muralimanohar N, et al. ISAAC: A convolutional neural network accelerator with In-situ analog arithmetic in crossbars//Proceedings of the ISCA. Seoul, ROK, 2016: 14-26

[28] Andri R, Cavigelli L, Rossi D, et al. YodaNN: An ultra-low power convolutional neural network accelerator based on binary weights//Proceedings of the IEEE Computer Society Annual Symposium on VLSI. Pittsburgh, USA, 2016: 236-241

[29] Gokmen T, Vlasov Y. Acceleration of deep neural network training with resistive cross-point devices: design considerations. Front neurosci, 2016, 10(51): 333

[30] 陈桂林,马胜,郭阳.硬件加速神经网络综述[J/OL].计算机研究与发展,2019(02)[2019-03-20].
http://kns.cnki.net/kcms/detail/11.1777.TP.20190129.0940.004.html.

[31] 沈阳靖,沈君成,叶俊,马琪.基于FPGA的脉冲神经网络加速器设计[J].电子科技,2017,30(10):89-92+96.

[32] 王思阳. 基于FPGA的卷积神经网络加速器设计[D].电子科技大学,2017.

[33] Nurvitadhi E, Venkatesh G, Sim J, et al. Can FPGAs beat GPUs in accelerating next-generation deep neural networks?//Proceedings of the ACM/SIGDA International Symposium on Field-Programmable Gate Arrays. Monterey, USA, 2017: 5-14

[34] Wang, T., Wang, C., Zhou, X., & Chen, H. (2018). A Survey of FPGA Based Deep Learning Accelerators: Challenges and Opportunities. arXiv preprint arXiv:1901.04988.

[35] C. Farabet, Y. LeCun, K. Kavukcuoglu, et al. Large-scale FPGA-based convolutional networks[J]. In Scaling up Machine Learning: Parallel and Distributed Approaches eds Bekkerman, 2011, 399–419.

[36] C. Farabet, B. Martini, B. Corda, et al. Neuflow: A runtime reconfigurable dataflow processor for vision[C]. In Computer Vision and Pattern Recognition Workshops, 2011, 109–116.

[37] M. Peemen, A. Setio, B. Mesman, et al. Memory-centric accelerator design for convolutional neural networks[C]. IEEE International Conference on Computer Design, 2013, 13–19.

[38] M. Sankaradas, V. Jakkula, S. Cadambi, et al. A massively parallel coprocessor for convolutional neural networks[C]. In Application Specific Systems, Architectures and Processors, 2009, 53–60.

[39] Wei Ding, Zeyu Huang, Zunkai Huang, Li Tian, Hui Wang, Songlin Feng, Designing efficient accelerator of depthwise separable convolutional neural network on FPGA, Journal of Systems Architecture, 2018, ISSN 1383-7621.

[40] 刘勤让,刘崇阳.利用参数稀疏性的卷积神经网络计算优化及其FPGA加速器设计[J].电子与信息学报,2018,40(06):1368-1374.

[41] Yufei Ma, Naveen Suda, Yu Cao, Sarma Vrudhula, and Jae Sun Seo. 2018. ALAMO: FPGA acceleration of deep learning algorithms with a modularized RTL compiler. Integration, the VLSI Journal. [doi>10.1016/j.vlsi.2017.12.009]

[42] 余子健,马德,严晓浪,沈君成.基于FPGA的卷积神经网络加速器[J].计算机工程,2017,43(01):109-114+119.

[43] T. Chen et al., “DianNao: A small-footprint high-throughput accelerator for ubiquitous machine-learning,” in Proc. ASPLOS, Salt Lake City, UT, USA, 2014, pp. 269–284.

[44] D. L. Ly and P. Chow, “A high-performance FPGA architecture for restricted Boltzmann machines,” in Proc. FPGA, Monterey, CA, USA,2009, pp. 73–82.

[45] S. K. Kim, L. C. McAfee, P. L. McMahon, and K. Olukotun, “A highly scalable restricted Boltzmann machine FPGA implementation,” in Proc. FPL, Prague, Czech Republic, 2009, pp. 367–372.

[46] J. Qiu et al., “Going deeper with embedded FPGA platform for convolutional neural network,” in Proc. FPGA, Monterey, CA, USA, 2016,pp. 26–35.

[47] Q. Yu, C. Wang, X. Ma, X. Li, and X. Zhou, “A deep learning prediction process accelerator based FPGA,” in Proc. CCGRID, Shenzhen, China, 2015, pp. 1159–1162.

[48] C. Wang, L. Gong, Q. Yu, X. Li, Y. Xie, X. Zhou, “DLAU: A scalable deep learning accelerator unit on FPGA”, IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst., vol. 36, no. 3, pp. 513-517, Mar. 2017.

[49] 陈煌,祝永新,田犁,汪辉,封松林.基于FPGA的卷积神经网络卷积层并行加速结构设计[J].微电子学与计算机,2018,35(10):85-88.



Recommended reading (click a title to jump to the article)

[1] Machine Learning in Action | Logistic regression applied to the Kaggle house-price prediction competition

[2] Machine Learning in Action | Logistic regression applied to the Kaggle Titanic competition

[3] An undergraduate's path to Kaggle Grandmaster: advanced competition tips

[4] Expression recognition (FER) | A deep-learning-based facial expression recognition system (Keras)

[5] PyTorch in practice | Classifying CIFAR10 images with a convolutional neural network (source code included)

[6] With these treasured practical tools and learning sites, self-study is more fun!



Follow the WeChat official account 迈微电子研发社 — articles are published there first.

Knowledge Planet (知识星球): the community shares preparation guides for AI algorithm roles in autumn/spring recruiting (including coding practice), interview write-ups and referral opportunities, learning roadmaps, a knowledge question bank, and more.