CV Code|计算机视觉开源周报20200502期

五月第二周，盘点本周新开源或即将开源的CV代码，涵盖方向普遍，不只涉及到技术创新，还涉及多种CV应用，但愿对你们有帮助。

图像分割ios

[1].A Hand Motion-guided Articulation and Segmentation Estimationgit

手部运动引导的关节模型估计与分割github

做者 | Richard Sahala Hartanto, Ryoichi Ishikawa, Menandro Roxas, Takeshi Oishi

单位 | 东京大学

论文 | https://arxiv.org/abs/2005.03691

代码 | https://github.com/cln515/Articulation-Estimation

[2].A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View算法

Sim2Real深度学习方法，用于将图像从多个车载摄像头转换为鸟瞰图中的语义分割图像微信

做者 | Lennart Reiher, Bastian Lampe, Lutz Eckstein网络

单位 | 德国联邦教育与研究部；亚琛工业大学架构

论文 | https://arxiv.org/abs/2005.04078app

代码 | https://github.com/ika-rwth-aachen/Cam2BEV框架

[3].BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation机器学习

BiSeNetV2 实时语义分割算法的非官方TF实现，在cityscapes验证集上达到71.563 miou，在GTX1070 GPU上达到 83fps。

做者 | Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

单位 | 华中科技大学；阿德莱德大学；香港中文大学；腾讯

论文 | https://arxiv.org/abs/2004.02147

代码 | https://github.com/MaybeShewill-CV/bisenetv2-tensorflow

[4]Class-Incremental Learning for Semantic Segmentation Re-Using Neither Old Data Nor Old Labels

既不使用旧数据，也不使用旧标签的语义分割的类增量学习

做者 | Marvin Klingner, Andreas Bär, Philipp Donn, Tim Fingscheidt

单位 | Technische Universitat Braunschweig

论文 | https://arxiv.org/abs/2005.06050

代码 | https://github.com/ifnspaml/CIL_Segmentation（将开源）

[5].Detection and Retrieval of Out-of-Distribution Objects in Semantic Segmentation

在语义分割中检测和检索不在训练集分布内的目标，在Cityscapes数据集上训练，在A2D2数据集测试。

重磅！200G超大自动驾驶数据集A2D2下载

做者 | Philipp Oberdiek, Matthias Rottmann, Gernot A. Fink

单位 | 多特蒙德工业大学；伍珀塔尔大学

论文 | https://arxiv.org/abs/2005.06831

代码 | https://github.com/RonMcKay/OODRetrieval

目标检测

#半监督目标检测#

[6].A Simple Semi-Supervised Learning Framework for Object Detection

谷歌提出新算法STAC，使用在无标签的图像上检测到的目标的伪标签训练更新模型，在VOC07数据集上改进了AP0.5从76.3到79.8，在COCO数据集上仅使用5%标签数据实现 24.38mAP（相对比，监督方法使用10%标签数据达到23.86 mAP）。

做者 | Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister

单位 | 谷歌

论文 | https://arxiv.org/abs/2005.04757v1

代码 | https://github.com/google-research/ssl_detection/

#拥挤场景目标检测#

[7].IterDet: Iterative Scheme for ObjectDetection in Crowded Environments

目标检测每每会生成大量的目标候选框，一般的作法是使用NMS过滤目标。但对于拥挤场景的目标检测，这每每会把靠的过近的正确的目标个体去掉了。

为此，本文发明了一种迭代的目标检测方法，目标检测一次后图像被再一次输入网络，但此前检测结果被保留，使其再也不被检测到。这种迭代检测机制大大改进了拥挤场景的目标检测，代码已开源。

做者 | Danila Rukhovich, Konstantin Sofiiuk, Danil Galeev, Olga Barinova, Anton Konushin

单位 | 三星公司

论文 | https://arxiv.org/abs/2005.05708v1

代码 | https://github.com/saic-vul/iterdet

#烟雾识别#

[8].RISE Video Dataset: Recognizing Industrial Smoke Emissions

RISE视频数据集：识别工业烟气排放，代码与数据集都是开源的

做者 | Yen-Chia Hsu, Ting-Hao (Kenneth)Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Randy Sargent, Illah Nourbakhsh

单位 | 宾夕法尼亚州立大学；CMU

论文 | https://arxiv.org/abs/2005.06111

代码 | https://github.com/CMU-CREATE-Lab/deep-smoke-machine

人脸技术

[9].High Resolution Face Age Editing

高分辨率人脸年龄编辑

人脸年龄编辑：迫不得已花落去，似曾类似春又来！

做者 | Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, Pierre Hellier

单位 | 巴黎综合理工学院；Valeo.ai

论文 | https://arxiv.org/abs/2005.04410

代码 | https://github.com/InterDigitalInc/HRFAE

[10].DeepFaceLab: A simple, flexible and extensible face swapping framework

风靡全球的换脸软件DeepFaceLab 发布论文公布了其技术原理，这是一款在Github上有近1.4W颗星的工程，也被众多youtube博主推荐和使用，据称95%的假视频背后的技术支持来自DeepFaceLab。

尽管仍具争议，但该工程开发者但愿借助公布技术细节，促进你们对换脸技术的了解和使用。

值得一提的是，尽管做者们没有公布工做单位，但从名字看出该软件大部分核心开发者是华人。

做者 | Ivan Petrov, Daiheng Gao, Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé, Jian Jiang, Luis RP, Sheng Zhang, Pingyu Wu, Weiming Zhang

论文 | https://arxiv.org/abs/2005.05535

代码 | https://github.com/iperov/DeepFaceLab/

#微表情识别#

[11].ICE-GAN: Identity-aware and Capsule-Enhanced GAN for Micro-Expression Recognition and Synthesis

ICE-GAN：用于微表情识别和合成，个体感知和胶囊加强GAN方法

做者 | Jianhui Yu, Chaoyi Zhang, Yang Song, Weidong Cai

单位 | 悉尼大学；新南威尔士大学

论文 | https://arxiv.org/abs/2005.04370

代码 | https://github.com/crane-papercode/ICE-GAN（即将开源）

目标跟踪

[12].TSDM: Tracking by SiamRPN++ with a Depthrefiner and a Mask-generator

大连理工大学提出一种结合深度信息（RGB-D）与 SiamRPN++算法的目标跟踪器，其高精度版本跟踪精度大幅超越现有SOTA方法，帧率可达23fps，轻量级版本可达31fps，是一种实用的跟踪方法。

做者 | Pengyao Zhao, Quanli Liu, Wei Wang, Qiang Guo

单位 | 大连理工大学

论文 | https://arxiv.org/abs/2005.04063

代码 | https://github.com/lql-team/TSDM

视线估计

[13].MLGaze: Machine Learning‐Based Analysis of Gaze Error Patterns in Consumer Eye Tracking Systems

基于机器学习的消费级眼动跟踪系统中凝视错误模式分析

做者 | Anuradha Kar

论文 | https://arxiv.org/abs/2005.03795

代码 | https://github.com/anuradhakar49/

MLGaze

数据集 | https://data.mendeley.com/datasets/cfm4d9y7bh/1

无监督、自监督

[14].Learning to Segment Actions from Observation and Narration

使用视频旁白进行动做分割，专一无监督和弱监督方法但取得了和监督方法可比较的精度。

做者| Daniel Fried、 Jean-Baptiste Alayrac、 Phil Blunsom、 Chris Dyer、 Stephen Clark、 Aida Nematzadeh†、

单位 | DeepMind、加州大学伯克利分校

论文 | https://arxiv.org/pdf/2005.03684.pdf

代码 | https://github.com/dpfried/action-segmentation

#自监督强化学习#

[15].Planning to Explore via Self-Supervised World Models

做者 | Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

单位 | 宾夕法尼亚大学；加州大学伯克利分校；谷歌；多伦多大学；卡内基梅隆大学；Facebook

论文 | https://arxiv.org/abs/2005.05960

代码 | https://github.com/ramanans1/

plan2explore

视频 | https://youtu.be/GftqnPWsCWw

#CVPR 2020#

[16].On the uncertainty of self-supervised monocular depth estimation

自监督单目深度估计的不肯定性研究

单位｜博洛尼亚大学

论文 | https://arxiv.org/abs/2005.06209

代码 | https://github.com/mattpoggi/mono-uncertainty

人群计数

[17].Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting

做者发明了新的训练目标Local Counting Map和新的网络架构Adaptive Mixture Regression Network，实现更加精确的人群计数。

做者 | Xiyang Liu, Jie Yang, Tieqiang Wang, Wenrui Ding

单位 | 北航、顺丰、中科院自动化所

论文 | https://arxiv.org/abs/2005.05776v1

代码 | https://github.com/xiyang1012/Local-Crowd-Counting

[18].Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions

Ambient Sound Helps：极端条件下的视听人群计数

收集了一个名为auDiovISual Crowd cOunting（DISCO）的大规模基准测试数据集，该数据集包含1,935张图像和相应的音频剪辑以及170,270个带标注的实例。

做者 | Di Hu, Lichao Mou, Qingzhong Wang, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiao Xiang Zhu

单位 | 香港城市大学；百度；西北工业大学；慕尼黑工业大学

论文 | https://arxiv.org/abs/2005.07097

代码 | https://github.com/qingzwang/

AudioVisualCrowdCounting

视频检索

[19].Condensed Movies: Story Based Retrieval with Contextual Embeddings

VGG组最新视频检索论文，构建了基于关键场景的超大浓缩视频数据集，提出了全新的基于story的text-to-video 检索任务，并开发了baseline，展现了利用上下文信息对该任务的有效改进。

做者 | Max Bain，Arsha Nagrani[，Andrew Brown，Andrew Zisserman

单位 | 牛津大学

论文 | https://arxiv.org/pdf/2005.04208.pdf

代码 | http://www.robots.ox.ac.uk/

~vgg/research/condensed-movies

（无权访问）

视频描述

#ACL 2020#

[20].Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

做者 | Hyounghun Kim, Zineng Tang, Mohit Bansal

单位 | UNC Chapel Hill

论文 | https://arxiv.org/abs/2005.06409

代码 | https://github.com/hyounghk/

VideoQADenseCapFrameGate-ACL2020

视频识别

[21].TAM: Temporal Adaptive Module for Video Recognition

做者发明了一种时域自适应模块（TAM），可方便嵌入到2D CNNs中去，仅须要增长稍许计算代价。在Kinetics-400 数据集上战胜了其余时域方法，在Something-Something数据集上取得了大大超过以前SOTA的精度。

做者 | Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, Tong Lu

单位 | 南大；商汤

论文 | https://arxiv.org/abs/2005.06803

代码 | https://github.com/liu-zhy/TANet（将开源）

行人行为预测

[22].Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs

使用 Stacked RNNs 结合上下文特征融合的行人行为预测

做者 | Amir Rasouli, Iuliia Kotseruba, John K. Tsotsos

单位 | 约克大学

论文 | https://arxiv.org/abs/2005.06582

代码 | https://github.com/aras62/SF-GRU

数据集 | http://data.nvision2.eecs.yorku.ca/PIE_dataset/

图像修补

[23].Enhanced Residual Networks for Context-based Image Outpainting

本文提出一种加强的残差网络GAN模型用于图像向外扩展，生成天然合理的视觉修补图。

做者 | Przemek Gardias, Eric Arthur, Huaming Sun

单位 | 伍斯特理工学院

论文 | https://arxiv.org/abs/2005.06723

代码 | https://github.com/etarthur/Outpainting

深度视频插值

[24].W-Cell-Net: Multi-frame Interpolation of Cellular Microscopy Videos

W-Cell-Net：细胞显微视频的多帧插值方法

做者 | Rohit Saha, Abenezer Teklemariam, Ian Hsu, Alan M. Moses

单位 | 多伦多大学

论文 | https://arxiv.org/abs/2005.06684

代码 | https://github.com/RohitSaha/W-Cell-Net_cellular_video_interpolation

物体计数

[25].Introduction of a new Dataset and Method for Detecting and Counting the Pistachios based on Deep Learning

#如何开心地数开心果？# 开心果是重要的食物，对伊朗来讲也是重要的出口农产品。而开着口的开心果价格更高，这催生了食品生产企业想要对开心果进行检测和分类的需求。伊朗的学者制做了一个开心果的数据集，含423幅标注图像3927个标注的开心果个体，并提出在视频中用RetinaNet检测开心果，再分类的方法进行检测和计数，总计数精度94.75%。

数据集和代码都开源了，但愿对那些有计数须要的应用有启发！

做者 | Mohammad Rahimzadeh, Abolfazl Attar

单位 | 伊朗科学技术大学；伊朗沙力夫理工大学

论文 | https://arxiv.org/abs/2005.03990

代码 | https://github.com/mr7495/Pistachio-Counting

数据集 | https://github.com/mr7495/Pesteh-Set

人脸活体检测

[26].Learning Generalized Spoof Cues for Face Anti-spoofing

百度活体检测论文，再也不假设非活体的类型，将活体检测看做异常检测问题，提出一种残差学习框架学习活体和非活体的鉴别特征。战胜了以前的SOTA方法。

做者 | Haocheng Feng, Zhibin Hong, Haixiao Yue, Yang Chen, Keyao Wang, Junyu Han, Jingtuo Liu, Errui Ding

单位 | 百度，北航

论文 | https://arxiv.org/abs/2005.03922

代码 | https://github.com/vis-var/lgsc-for-fas

人体动做识别与检测

[27].3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

#CVPR 2020# 深度视频中的动做识别 3D Dynamic Voxel 方法

3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

做者 | Yancheng Wang, Yang Xiao, Fu Xiong, Wenxiang Jiang, Zhiguo Cao, Joey Tianyi Zhou, Junsong Yuan

单位 | 华科、旷视等

论文 | https://arxiv.org/abs/2005.05501v1

代码 | https://github.com/3huo/3DV-Action

数据增广

[28].AutoCLINT: The Winning Method in AutoCV Challenge 2019

AutoCV Challenge 2019 冠军方案论文及代码，设计有效的代码优化和自动数据增广。

做者 | Woonhyuk Baek, Ildoo Kim, Sungwoong Kim, Sungbin Lim

单位 | Kakao Brain；UNIST

论文 | https://arxiv.org/abs/2005.04373

代码 | https://github.com/kakaobrain/autoclint

神经架构迁移

[29].Neural Architecture Transfer

神经架构搜索（NAS）常常被用于特定任务的网络设计，好比针对移动端、GPU、CPU分别搜索不一样的网络架构，但若是要在多个设备上部署，依次搜索的方式耗费大量的资源。

本文提出一种神经架构迁移的概念，设计特定任务（好比分类）的超网络，而从它的采样获得的子集能够直接用，而不须要多余的训练。

在11个涵盖大规模多类和小规模细粒度的的图像分类的全部基准测试中，该文方法改进了全部移动端部署的SOTA方法（好比在ImageNet上获得的模型比EfficientNet-B0精度高且计算量少）。小规模细粒度的任务增益更多，所须要的时间也相比NAS方法减小了一个数量级。

该方法特别适合一次性设计多个针对不一样硬件或者目标的场景。

做者| Zhichao Lu, Gautam Sreekumar, Erik Goodman, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti

单位 | 密歇根州立大学

论文 | https://arxiv.org/abs/2005.05859v1

代码 | https://github.com/human-analysis/neural-architecture-transfer（404）

神经架构搜索

[30].Neural Architecture Search for Gliomas Segmentation on Multimodal Magnetic Resonance Imaging

使用NAS的基于多模态磁共振成像的胶质瘤神经结构分割算法研究

做者 | Feifan Wang, Bharat Biswal

单位 | 电子科技大学;新泽西理工学院

论文 | https://arxiv.org/abs/2005.06338

代码 | https://github.com/woodywff/brats_2019

对抗学习

#CVPR2020#

[31].Projection & Probability-Driven Black-Box Attack

投影和几率驱动的黑盒攻击

做者 | Jie Li, Rongrong Ji, Hong Liu, Jianzhuang Liu, Bineng Zhong, Cheng Deng, Qi Tian

单位 | 华侨大学；厦门大学；诺亚方舟华为实验室；西安电子科技大学

论文 | https://arxiv.org/abs/2005.03837

代码 | https://github.com/theFool32/PPBA

[32].Adversarial examples are useful too

做者 | Ali Borji

论文 | https://arxiv.org/abs/2005.06107

代码 | https://github.com/aliborji/Backdoor_defense.git

光谱重建

[33].Hierarchical Regression Network for Spectral Reconstruction from RGB Images

RGB图像光谱重建的层次回归网络

本文提出一个以PixelShuffle层做为层间交互的4层层次回归网络（HRNet）

在NTIRE 2020挑战赛中，是赛道2（真实世界图像）的获胜方法，在赛道1（清洁图像）中排名第三。

做者 | Yuzhi Zhao, Lai-Man Po, Qiong Yan, Wei Liu, Tingyu Lin

单位 | 香港城市大学；哈工大；商汤

论文 | https://arxiv.org/abs/2005.04703

代码 | https://github.com/zhaoyuzhi/

Hierarchical-Regression-Network-for-

Spectral-Reconstruction-from-RGB-Images

3D姿态估计

#CVPR 2020#

[34].Epipolar Transformers

卡耐基梅隆大学和Facebook的学者提出一种利用对极几何变换从2D信息构建3D感知特征的方法，使得3D姿态估计更好的利用场景3D信息，在InterHand 和 Human3.6M数据集上取得更高的精度。代码已开源。

做者 | Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

单位 | Facebook Reality Labs；卡内基梅隆大学

论文 | https://arxiv.org/abs/2005.04551

代码 | https://github.com/yihui-he/epipolar-transformers

医学影像处理

[35].iUNets: Fully invertible U-Nets with Learnable Upand Downsampling

剑桥大学学者提出一种彻底可逆的UNet架构iUNets。UNet被普遍用用于图像到图像的变换，好比分割任务或者其逆问题成像。但在一些高维数据如3D医学成像中，原始的UNet每每对内存要求很高，做者发明了可学习的且可逆的上下采样操做，提出了一种彻底可逆的UNet架构iUNet，容许内存高效的反向传播。在CT医学图像的后处理和脑瘤分割的任务中表现出更好的结果。基于PyTorch的代码已开源。

做者 | Christian Etmann, Rihuan Ke, Carola-Bibiane Schönlieb

单位 | 剑桥大学

论文 | https://arxiv.org/abs/2005.05220

开源库 | https://github.com/cetmann/iunets

运动迁移

[36].Unpaired Motion Style Transfer from Video to Animation

#SIGGRAPH 2020# 把真人动做迁移到动画角色上

做者 | Kfir Aberman, Yijia Weng, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen

单位 | 特拉维夫大学&北电；北大；AICFVE；希伯来大学

论文 | https://arxiv.org/abs/2005.05751

代码 | https://github.com/DeepMotionEditing/

deep-motion-editing

视频 | https://www.youtube.com/watch?v=m04zuBSdGrc

二进制神经网络

[37].Binarizing MobileNet via Evolution-based Searching

该方法达到了60.09％的Top-1准确性，而且赛过了最新的CI-BCNN

做者 | Hai Phan, Zechun Liu, Dang Huynh, Marios Savvides, Kwang-Ting Cheng, Zhiqiang Shen

单位 | Axon Enterprise；CMU；香港科技大学

论文 | https://arxiv.org/abs/2005.06305

代码 | https://github.com/HaiPhan1991/BinMobileNet_Evo_Search

FPGA加速CNN

[38].ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network

FPGA实现的嵌入式CNN

做者 | David Gschwend

单位 | 苏黎世联邦理工学院

论文 | https://arxiv.org/abs/2005.06892

代码 | https://github.com/dgschwend/zynqnet

局部特征提取与图像匹配

[39].The Information & Mutual Information Ratio for Counting Image Features and Their Matches

做者 | Ali Khajegili Mirabadi, Stefano Rini

单位 | 台湾交通大学

论文 | https://arxiv.org/abs/2005.06739

代码 | https://github.com/AliKhajegiliM/IR-and-MIR（将开源）

在我爱计算机视觉公众号对话框回复“ CVCode ”便可获取以上全部论文下载地址。（网盘位置：Code周报--20200502期）