Citation:
CHANG Liang, DENG Xiao-Ming, ZHOU Ming-Quan, WU Zhong-Ke, YUAN Ye, YANG Shuo, WANG Hong-An. Convolutional Neural Networks in Image Understanding. ACTA AUTOMATICA SINICA, 2016, 42(9): 1300-1312. doi: 10.16383/j.aas.2016.c150800
Author Bio:
CHANG Liang
Associate professor at the College of Information Science and Technology, Beijing Normal University. Her research interest covers computer vision and machine learning.
ZHOU Ming-Quan
Professor at the College of Information Science and Technology, Beijing Normal University. His research interest covers information visualization and virtual reality.
WU Zhong-Ke
Professor at the College of Information Science and Technology, Beijing Normal University. His research interest covers computer graphics, computer-aided geometric design, computer animation, and virtual reality.
YUAN Ye
Master student at the Institute of Software, Chinese Academy of Sciences. His main research interest is computer vision.
YANG Shuo
Master student at the Institute of Software, Chinese Academy of Sciences. His main research interest is computer vision.
WANG Hong-An
Professor at the Institute of Software, Chinese Academy of Sciences. His research interest covers real-time intelligence and natural human-computer interaction.
Corresponding author:
DENG Xiao-Ming
Associate professor at the Institute of Software, Chinese Academy of Sciences. Main research interest: computer vision. Corresponding author of this paper.
Abstract:
Convolutional neural networks (CNNs) have been widely applied to image understanding and have attracted considerable attention from researchers. In particular, with the emergence of large-scale image datasets and the rapid development of computer hardware (especially GPUs), convolutional neural networks and their improved variants have achieved breakthroughs in image understanding and set off a wave of research. This paper surveys the research progress and typical applications of convolutional neural networks in image understanding. We first review the theoretical basis, and then present recent advances and applications in specific areas of image understanding, such as image classification and object detection, face recognition, and semantic scene segmentation.
Key words:
Convolutional neural networks (CNN) / image understanding / deep learning / image classification / object detection
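| Method | Network input | Advantages | Disadvantages |
| --- | --- | --- | --- |
| R-CNN [9] | | Highly discriminative for object detection; more efficient than detectors that slide a window over every level of an image pyramid; uses bounding box regression to improve localization accuracy | Depends on a region selection algorithm; the network requires a fixed-size input image, which easily distorts the object's aspect ratio and context information; training is a multi-stage process (fine-tuning the network parameters on the target detection dataset, extracting features, training SVM (support vector machine) classifiers, bounding box regression); training is costly in both time and storage |
| SPP-net [10] | Whole image (no fixed size required) | Highly discriminative for object detection; the input image can be of arbitrary size, which preserves the image's aspect ratio; training is about 3 times faster than R-CNN, and testing is 10 to 100 times faster than R-CNN | When the network structure is complex, pooling loses some image information; the convolutional layers before the SPP layer cannot have their parameters updated [24]; training is a multi-stage process (fine-tuning the network parameters on the target detection dataset, extracting features, training SVM classifiers, bounding box regression); training is costly in both time and storage |
| Fast R-CNN [24] | Whole image (no fixed size required) | Training and testing are both markedly faster than SPP-net (all stages other than candidate region extraction run close to real time); highly discriminative for object detection; the input image can be of arbitrary size, which preserves the image's aspect ratio; classification and localization are performed jointly | Depends on candidate region selection, which remains the computational bottleneck |
| Faster R-CNN [29] | Whole image (no fixed size required) | Faster than Fast R-CNN; highly discriminative for object detection; does not depend on a region selection algorithm; the input image can be of arbitrary size, which preserves the image's aspect ratio; region selection, classification, and localization are performed jointly | The training process is relatively complex; the computation pipeline still has considerable room for optimization; struggles to recognize occluded objects |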
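The comparison above credits SPP-net with accepting input images of arbitrary size. The following is a minimal sketch of the underlying idea, spatial pyramid pooling, and not the authors' or the SPP-net implementation. It assumes a single 2-D feature map given as a list of rows; the function name `spp_max_pool`, the pyramid levels `(1, 2, 4)`, and the floor/ceil bin boundaries are illustrative choices that do not come from the source. A real network would apply the same pooling to every channel of the last convolutional layer.

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Max-pool one 2-D feature map into a fixed-length vector.

    For each pyramid level n, the map is divided into an n x n grid of bins
    and the maximum of each bin is kept. The output therefore always has
    sum(n * n for n in levels) values, whatever the map's height and width.
    """
    height, width = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:
        for i in range(n):
            # Illustrative bin boundaries: floor/ceil keeps every bin
            # non-empty and makes the bins cover the whole map.
            top = math.floor(i * height / n)
            bottom = math.ceil((i + 1) * height / n)
            for j in range(n):
                left = math.floor(j * width / n)
                right = math.ceil((j + 1) * width / n)
                pooled.append(max(
                    feature_map[r][c]
                    for r in range(top, bottom)
                    for c in range(left, right)
                ))
    return pooled


# Two feature maps of different sizes yield vectors of the same length.
small = [[r * 7 + c for c in range(7)] for r in range(5)]      # 5 x 7
large = [[r * 20 + c for c in range(20)] for r in range(13)]   # 13 x 20
print(len(spp_max_pool(small)), len(spp_max_pool(large)))
```

Because the output length depends only on the pyramid levels, a fully connected layer placed after this pooling step can be fed from images of any size, which is the property listed in the SPP-net row.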
Ouyang W L, Wang X G, Zeng X Y, Qiu S, Luo P, Tian Y L, Li H S, Yang S, Wang Z, Loy C C, Tang X O. DeepID-Net: deformable deep convolutional neural networks for object detection. In: Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition. Boston, USA: IEEE, 2015. 2403-2412
Shin H C, Roth H R, Gao M C, Lu L, Xu Z Y, Nogues I, Yao J H, Mollura D, Summers R M. Deep convolutional neural networks for computer-aided detection: CNN architectures, dataset characteristics and transfer learning. IEEE Transactions on Medical Imaging, 2016, 35(5): 1285-1298. doi: 10.1109/TMI.2016.2528162
Yu Miao, Hu Zhan-Yi. Higher-order Markov random fields and their applications in scene understanding. Acta Automatica Sinica, 2015, 41(7): 1213-1234. http://www.aas.net.cn/CN/abstract/abstract18696.shtml