3D Point Cloud Networks: PointNet

PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation

Stanford University, CVPR 2017
Proposed by Stanford University, PointNet performs deep learning directly on 3D point clouds and was the first neural network to consume point cloud data directly. The same group followed up with PointNet++, published at NIPS 2017. Because of conference page limits, read the full version of the paper with the appendix; the appendix is almost as long as the main text.
The author's code is available on GitHub.

Abstract

Point cloud is an important type of geometric data structure. Due to its irregular format, most researchers transform such data to regular 3D voxel grids or collections of images. This, however, renders data unnecessarily voluminous and causes issues. In this paper, we design a novel type of neural network that directly consumes point clouds, which well respects the permutation invariance of points in the input. Our network, named PointNet, provides a unified architecture for applications ranging from object classification, part segmentation, to scene semantic parsing. Though simple, PointNet is highly efficient and effective. Empirically, it shows strong performance on par or even better than state of the art. Theoretically, we provide analysis towards understanding of what the network has learnt and why the network is robust with respect to input perturbation and corruption.

Net Architecture

(Figure: PointNet architecture)
The PointNet architecture, shown in the figure above, covers two tasks: classification and segmentation. The two share most of the structure; the segmentation branch is somewhat more complex.
The network has three key components: the max pooling layer, a local and global information combination structure, and two joint alignment networks.

the max pooling layer

Max pooling is used to handle the unordered nature of the input point cloud.
There are currently three ways to deal with this unorderedness:

  1. Sort the input into a canonical order.
  2. Treat the input as a sequence and train an RNN on it.
  3. Use a simple symmetric function to aggregate the information from each point.

The paper takes the third approach, approximating such a symmetric function with max pooling:
$$f(\{x_1, \dots, x_n\}) \approx g(h(x_1), \dots, h(x_n))$$
Here $h$ is an MLP and $g$ is a max pool. The authors use max pooling to approximate a symmetric function and give a proof; I have not fully worked through it yet, so I will come back and write it up once I do. A minimal sketch of the idea follows.
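To see why this construction is permutation invariant, here is a minimal sketch in plain NumPy (not the repo's code; the feature map `h` is a hypothetical stand-in for the shared MLP): any reordering of the input points leaves the max-pooled feature unchanged.

import numpy as np

def h(x):
    # Hypothetical per-point feature map standing in for the shared MLP:
    # lifts a 3D point to a 5D feature.
    return np.array([x[0], x[1], x[2], x[0] * x[1], np.abs(x[2])])

def f(points):
    # g = element-wise max over points: a symmetric (order-independent) function.
    return np.max(np.stack([h(p) for p in points]), axis=0)

points = np.random.randn(8, 3)
shuffled = np.random.permutation(points)  # permute the point order
assert np.allclose(f(points), f(shuffled))  # same feature for any ordering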

local and global information combination structure

The feature $[f_1, \dots, f_n]$ obtained from the function $f$ in the previous section is a global feature of the point cloud; for classification one can train an SVM or MLP classifier on it directly. Segmentation, however, also needs local features, so the authors concatenate the global feature with each original per-point feature to obtain features that carry both global and local information.
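A minimal sketch of that combination step (hypothetical shapes, in the same TF 1.x style as the repo's code): the global feature is tiled across the N points and concatenated with the per-point features along the channel axis.

import tensorflow as tf

# point_feat:  B x N x 1 x 64   per-point local features
# global_feat: B x 1 x 1 x 1024 max-pooled global feature
point_feat = tf.placeholder(tf.float32, shape=(32, 1024, 1, 64))
global_feat = tf.placeholder(tf.float32, shape=(32, 1, 1, 1024))

# Tile the global feature so every point sees it, then concatenate:
# each point now carries a 64 + 1024 = 1088-dim local+global feature.
global_feat_expand = tf.tile(global_feat, [1, 1024, 1, 1])
concat_feat = tf.concat([point_feat, global_feat_expand], axis=3)  # B x N x 1 x 1088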

two joint alignment networks

A point cloud remains a point cloud under rigid transformations, and its meaning is unchanged, so the features the network extracts should be invariant to such transformations.
Borrowing the spatial transformer network (STN) idea from 2D images, the paper applies an STN to both the input point cloud and the intermediate features, i.e., an input STN and a feature STN; an STN is a small sub-network that can be plugged into a network anywhere.
The input STN takes the input points and predicts a 3x3 transformation matrix for them.
The feature STN takes the features and predicts a 64x64 transformation matrix. Because the feature dimension is much higher, the feature transform needs to be regularized toward an orthogonal matrix; experiments show this helps the network converge. A sketch of the regularizer follows.
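A minimal sketch of that regularizer, following the paper's $L_{reg} = \lVert I - AA^T \rVert_F^2$ (TF 1.x style; `transform` is the predicted feature alignment matrix, exposed as `end_points['transform']` in the classification code below):

import numpy as np
import tensorflow as tf

def transform_regularizer(transform):
    """ transform: B x K x K predicted feature alignment matrices.
        Returns the squared Frobenius distance of A A^T from the identity. """
    K = transform.get_shape()[1].value
    mat_diff = tf.matmul(transform, tf.transpose(transform, perm=[0, 2, 1]))
    mat_diff -= tf.constant(np.eye(K), dtype=tf.float32)
    # Added to the classification loss with a small weight (0.001 in the paper).
    return tf.nn.l2_loss(mat_diff)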

Code

The author's GitHub provides the model and training code; we focus on two files: pointnet_cls.py and transform_nets.py.

transform_nets

input_transform:
input→conv(3,64)→conv(64,128)→conv(128,1024)→maxpool(N,1)→fc(1024,512)→fc(512,256)→fc(256,9)
The last fc in the author's code looks odd at first: the goal is a 3x3 transformation matrix, i.e., a fully connected layer with 9 outputs. Its weights and biases are zero-initialized and the flattened identity matrix is added to the biases, so the predicted transform starts out as the identity.

import tensorflow as tf

import tf_util  # helper layers (conv2d, fully_connected, max_pool2d) from the PointNet repo

def input_transform_net(point_cloud, is_training, bn_decay=None, K=3):
    """ Input (XYZ) Transform Net, input is BxNx3 gray image
        Return:
            Transformation matrix of size 3xK """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value

    input_image = tf.expand_dims(point_cloud, -1)
    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_XYZ') as sc:
        assert(K==3)
        weights = tf.get_variable('weights', [256, 3*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [3*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant([1,0,0,0,1,0,0,0,1], dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    transform = tf.reshape(transform, [batch_size, 3, K])
    return transform

feature_transform:
feature→conv(64,64)→conv(64,128)→conv(128,1024)→maxpool(N,1)→fc(1024,512)→fc(512,256)→fc(256,64×64)

import numpy as np
import tensorflow as tf

import tf_util  # helper layers from the PointNet repo

def feature_transform_net(inputs, is_training, bn_decay=None, K=64):
    """ Feature Transform Net, input is BxNx1xK
        Return:
            Transformation matrix of size KxK """
    batch_size = inputs.get_shape()[0].value
    num_point = inputs.get_shape()[1].value

    net = tf_util.conv2d(inputs, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv2', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='tconv3', bn_decay=bn_decay)
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='tmaxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='tfc1', bn_decay=bn_decay)
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='tfc2', bn_decay=bn_decay)

    with tf.variable_scope('transform_feat') as sc:
        weights = tf.get_variable('weights', [256, K*K],
                                  initializer=tf.constant_initializer(0.0),
                                  dtype=tf.float32)
        biases = tf.get_variable('biases', [K*K],
                                 initializer=tf.constant_initializer(0.0),
                                 dtype=tf.float32)
        biases += tf.constant(np.eye(K).flatten(), dtype=tf.float32)
        transform = tf.matmul(net, weights)
        transform = tf.nn.bias_add(transform, biases)

    transform = tf.reshape(transform, [batch_size, K, K])
    return transform

pointnet_cls

input→input_stn(3,3)→conv(3,64)→conv(64,64)→feature_stn(64,64)→conv(64,64)→conv(64,128)→conv(128,1024)→maxpool(N,1)→fc(1024,512)→fc(512,256)→fc(256,num_classes)

import tensorflow as tf

import tf_util  # helper layers from the PointNet repo
from transform_nets import input_transform_net, feature_transform_net

def get_model(point_cloud, is_training, bn_decay=None):
    """ Classification PointNet, input is BxNx3, output Bx40 """
    batch_size = point_cloud.get_shape()[0].value
    num_point = point_cloud.get_shape()[1].value
    end_points = {}

    with tf.variable_scope('transform_net1') as sc:
        transform = input_transform_net(point_cloud, is_training, bn_decay, K=3)
    point_cloud_transformed = tf.matmul(point_cloud, transform)
    input_image = tf.expand_dims(point_cloud_transformed, -1)

    net = tf_util.conv2d(input_image, 64, [1,3],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv1', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv2', bn_decay=bn_decay)

    with tf.variable_scope('transform_net2') as sc:
        transform = feature_transform_net(net, is_training, bn_decay, K=64)
    end_points['transform'] = transform
    net_transformed = tf.matmul(tf.squeeze(net, axis=[2]), transform)
    net_transformed = tf.expand_dims(net_transformed, [2])

    net = tf_util.conv2d(net_transformed, 64, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv3', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 128, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv4', bn_decay=bn_decay)
    net = tf_util.conv2d(net, 1024, [1,1],
                         padding='VALID', stride=[1,1],
                         bn=True, is_training=is_training,
                         scope='conv5', bn_decay=bn_decay)

    # Symmetric function: max pooling
    net = tf_util.max_pool2d(net, [num_point,1],
                             padding='VALID', scope='maxpool')

    net = tf.reshape(net, [batch_size, -1])
    net = tf_util.fully_connected(net, 512, bn=True, is_training=is_training,
                                  scope='fc1', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp1')
    net = tf_util.fully_connected(net, 256, bn=True, is_training=is_training,
                                  scope='fc2', bn_decay=bn_decay)
    net = tf_util.dropout(net, keep_prob=0.7, is_training=is_training,
                          scope='dp2')
    net = tf_util.fully_connected(net, 40, activation_fn=None, scope='fc3')

    return net, end_points
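A minimal usage sketch (hypothetical batch size and point count; TF 1.x): build the classification graph and note where the feature transform lands for the regularizer sketched earlier.

pointclouds_pl = tf.placeholder(tf.float32, shape=(32, 1024, 3))  # B x N x 3 input clouds
is_training_pl = tf.placeholder(tf.bool, shape=())
pred, end_points = get_model(pointclouds_pl, is_training_pl)
# pred: 32 x 40 class logits; end_points['transform'] is the 64x64 feature
# transform that feeds the orthogonality regularizer described above.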