Learning 2D–3D Correspondences To Solve The Blind Perspective-n-Point Problem

When correspondences are known, the problem reduces to the standard PnP problem [10,17,33,19]. When correspondences are unknown, the problem is blind PnP, for which several traditional geometry-based methods have been proposed, including SoftPOSIT [7], BlindPnP [24], GOPAC [2] and GOSMA [3].

 

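All of the methods above either require a pose prior or rely on exhaustive search. This paper instead regresses the correspondences directly: "we propose to estimate the correspondence matrix directly."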

 

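The method extracts discriminative feature descriptors from the point sets, encoding both local geometric structure and global context. The intuition is that the local geometric structure around a 3D point is likely to resemble the local geometric structure around its corresponding 2D point, up to the effects of projection and occlusion. The features from each point set are then combined in a novel global feature matching module to estimate the 2D–3D correspondences. This module uses optimal transport to compute a weighting (joint probability) matrix, in which each element describes the matchability of a particular 3D point with a particular 2D point. Sorting the 2D–3D matches in decreasing order of weight produces a prioritized match list, which can be used to recover the camera pose.

To further disambiguate inlier and outlier matches in the prioritized match list, we append an inlier classification CNN similar to that of Yi et al. [22] and use the filtered output to estimate the camera pose. Our correspondence estimation CNN is trained end-to-end, and the code and data will be released to facilitate future research. The overall framework is shown in Figure 1. Our contributions are: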

  1. a new deep method to solve the blind PnP problem with unknown 2D–3D correspondences. To the best of our knowledge, there is no existing deep method that takes unordered 2D and 3D point-sets (with unknown correspondences) as inputs, and outputs a 6-DoF camera pose;

  2. a two-stream network to extract discriminative features from the point sets, which encodes both local geometric structure and global context; and

  3. an original global feature matching network based on a recurrent Sinkhorn layer to find 2D–3D correspondences, with a loss function that maximizes the matching probabilities of inlier matches.

 

When 3D points are not utilized, the PoseNet algorithms [15,14] can directly regress a camera pose. However, the accuracy of the regressed 6-DoF poses is inferior to geometry-based methods that use 3D points. (When no 3D points are available, a network such as PoseNet is the best option, but because it has no 3D information to work with, accuracy is a major problem.)

 

 

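First comes preprocessing. The 2D points are back-projected into 3D space using the camera intrinsic matrix K and expressed in homogeneous coordinates. For the 3D point set, a 3×3 transformation matrix is learned and applied to all 3D points to normalize the point cloud (in effect, transforming it into some canonical coordinate frame).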
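The back-projection step can be sketched in NumPy as follows. This is only an illustration: the function name `backproject` and the intrinsic values in `K` are made up for the example, and the learned 3×3 normalization of the 3D points is not shown.

```python
import numpy as np

def backproject(points_2d, K):
    """Map pixel coordinates (N, 2) to homogeneous rays K^{-1} [u, v, 1]^T, shape (N, 3)."""
    ones = np.ones((points_2d.shape[0], 1))
    pixels_h = np.hstack([points_2d, ones])   # (N, 3): rows are [u, v, 1]
    # Solve K x = pixel for every point instead of forming K^{-1} explicitly.
    return np.linalg.solve(K, pixels_h.T).T

# Hypothetical intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
pixels = np.array([[320.0, 240.0], [720.0, 240.0]])
rays = backproject(pixels, K)
```

The principal point maps to the optical axis `[0, 0, 1]`, and a pixel 400 px to its right maps to `[0.5, 0, 1]`.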

 

 

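Next, the neighborhood of every 2D point and every 3D point is computed using the L2 distance, and a graph is built explicitly from these neighborhoods. An operation similar to EdgeConv then convolves over these graphs, yielding a geometric feature for each 3D and 2D point; the paper uses 128 dimensions. Finally, the whole set is passed through context normalization to fuse in global geometric information. (After the preprocessing step, both the 2D and the 3D point sets are sets of 3D points.)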
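A minimal sketch of the two non-learned ingredients of this step, assuming a k-nearest-neighbour graph and the usual EdgeConv edge input `[x_i, x_j - x_i]`. The helper names, the value `k=8`, and the random inputs are illustrative only, and the learned MLP weights of the actual network are omitted.

```python
import numpy as np

def knn_indices(points, k):
    """Indices (N, k) of each point's k nearest neighbours under L2 distance."""
    diff = points[:, None, :] - points[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)          # (N, N) pairwise distances
    np.fill_diagonal(dist, np.inf)                # a point is not its own neighbour
    return np.argsort(dist, axis=1)[:, :k]

def edge_features(points, k):
    """EdgeConv-style edge inputs [x_i, x_j - x_i], shape (N, k, 2 * D)."""
    idx = knn_indices(points, k)
    neighbours = points[idx]                      # (N, k, D)
    centre = np.repeat(points[:, None, :], k, axis=1)
    return np.concatenate([centre, neighbours - centre], axis=-1)

def context_norm(features, eps=1e-6):
    """Normalize each channel over the whole point set (zero mean, unit std)."""
    mean = features.mean(axis=0, keepdims=True)
    std = features.std(axis=0, keepdims=True)
    return (features - mean) / (std + eps)

rng = np.random.default_rng(0)
pts = rng.normal(size=(50, 3))                    # stand-in point set
edges = edge_features(pts, k=8)                   # (50, 8, 6)
feats = context_norm(rng.normal(size=(50, 128)))  # stand-in 128-D features
```

Because context normalization uses statistics of the entire set, every point's feature depends on all the others, which is how global context enters.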

 

3.3 Global Feature Matching

 

Define the cost matrix H of pairwise descriptor distances, where f denotes the features learned above.
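As an illustration, a pairwise cost matrix can be computed as below. The L2 distance between descriptors is an assumption made for this sketch, and `pairwise_cost`, `f3d`, and `f2d` are hypothetical names.

```python
import numpy as np

def pairwise_cost(f3d, f2d):
    """H[i, j] = ||f3d[i] - f2d[j]||_2 for M 3D and N 2D descriptors, shape (M, N)."""
    diff = f3d[:, None, :] - f2d[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# Tiny 2-D descriptors so the distances are easy to check by hand.
f3d = np.array([[0.0, 0.0], [3.0, 4.0]])
f2d = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
H = pairwise_cost(f3d, f2d)
```

Here `H` is `[[0, 5, 10], [5, 0, 5]]`: a low cost marks a pair of points whose descriptors agree.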

 

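Each element W_ij is estimated from the cost matrix H and the unary matchability vectors r and s, rather than locally from H_ij alone. In other words, the weighting matrix W resolves ambiguities in the pairwise descriptor distances in H globally, while respecting the unary priors. The whole pipeline is shown in Figure 3.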

 

 

The weights W are not obtained from the matrix H and Sinkhorn alone; prior probabilities r and s are also taken into account. Prior Matchability: "For each point we define a prior unary matchability measuring how likely it is to have a match."

 

 

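The marginals of W are constrained to match the priors. Taking r as the row marginal and s as the column marginal (the usual optimal-transport convention), the i-th row of W sums to r_i and the j-th column sums to s_j, because r and s represent the probability that a given 3D or 2D point has a correct match.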
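The marginal constraints can be illustrated with a plain Sinkhorn iteration. This is a generic entropy-regularized optimal transport sketch, not the paper's recurrent Sinkhorn layer. The regularization strength `lam`, the iteration count, the uniform priors, and the random cost matrix are all assumptions, and the convention used is that rows sum to r and columns sum to s.

```python
import numpy as np

def sinkhorn(H, r, s, lam=5.0, n_iters=1000):
    """Return W >= 0 with row sums ~= r and column sums ~= s, favouring low-cost entries."""
    kernel = np.exp(-lam * H)                 # Gibbs kernel of the cost matrix
    u = np.ones_like(r)
    for _ in range(n_iters):
        v = s / (kernel.T @ u)                # rescale columns towards s
        u = r / (kernel @ v)                  # rescale rows towards r
    return u[:, None] * kernel * v[None, :]

rng = np.random.default_rng(0)
M, N = 4, 6
H = rng.random((M, N))                        # stand-in cost matrix in [0, 1)
r = np.full(M, 1.0 / M)                       # uniform prior over 3D points
s = np.full(N, 1.0 / N)                       # uniform prior over 2D points
W = sinkhorn(H, r, s)

# Prioritized match list: (3D index, 2D index) pairs sorted by decreasing weight.
order = np.argsort(W, axis=None)[::-1]
matches = np.column_stack(np.unravel_index(order, W.shape))
```

Since r and s each sum to one, W is a joint probability matrix, and the first row of `matches` is the most confident 2D–3D pair.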

 

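If ground-truth correspondences are not available, how can they be constructed? Reprojection is enough: project the 3D points into the image and pair them with the 2D points they land on. The projection can be dense.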
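A minimal sketch of building ground-truth correspondences by reprojection, assuming a known pose (R, t), pinhole intrinsics K, and a pixel threshold. The function name, the threshold of 1 px, and all numeric values are illustrative and not taken from the paper.

```python
import numpy as np

def gt_correspondences(pts3d, pts2d, K, R, t, thresh=1.0):
    """Boolean (M, N) matrix: True where 3D point i reprojects within `thresh` px of 2D point j."""
    cam = pts3d @ R.T + t                         # world frame -> camera frame
    proj = cam @ K.T
    uv = proj[:, :2] / proj[:, 2:3]               # perspective division
    dist = np.linalg.norm(uv[:, None, :] - pts2d[None, :, :], axis=-1)
    in_front = cam[:, 2] > 0                      # ignore points behind the camera
    return (dist < thresh) & in_front[:, None]

K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)                     # identity pose for the example
pts3d = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 2.0], [0.0, 0.0, -1.0]])
pts2d = np.array([[320.0, 240.0], [720.3, 240.0], [100.0, 100.0]])
C = gt_correspondences(pts3d, pts2d, K, R, t)
```

The first two 3D points match the first two 2D points. The third 3D point lies behind the camera and the third 2D point is an outlier, so neither gets a match.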