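Given data of shape N,C,H,W, we want to compute its local correlation: within every region of a fixed size, say 2×2, take the dot product between every pair of points. Write the code for this.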
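Local correlation is defined here as the dot product between any two points of a local region. So for each 2×2 region we need the dot products between every pair of its four C-dimensional vectors, which gives a 4×4 relation matrix. If each vector is divided by its norm before the dot product, the result is the cosine similarity of the two vectors.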
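For a single region, this relation matrix is just a Gram matrix. The sketch below uses a random 3-channel region as stand-in data. It stacks the four C-dimensional vectors of one 2×2 region as the columns of a C×4 matrix X; the product X^T X is then the 4×4 relation matrix.

```python
import torch

# Stand-in data: one 2x2 region of a C-channel feature map (C = 3 is arbitrary).
C = 3
region = torch.rand(C, 2, 2)          # C, 2, 2

# Flatten the 2x2 positions (row-major) into columns: each column is one C-dim vector.
X = region.reshape(C, 4)              # C, 4

# Entry (i, j) of X^T X is the dot product between position i and position j.
relation = X.t() @ X                  # 4, 4

print(relation.shape)                 # torch.Size([4, 4])
# The matrix is symmetric, and its diagonal holds the squared L2 norms of the vectors.
print(torch.allclose(relation, relation.t()))
print(torch.allclose(relation.diagonal(), (X ** 2).sum(dim=0)))
```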
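Cosine similarity measures how different two items are by the cosine of the angle between their vectors. Compared with distance metrics, it focuses on the difference in direction rather than in distance or length (see https://blog.csdn.net/weixin_38659482/article/details/85045537).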
\[ \cos\langle \vec{x}, \vec{y} \rangle = \frac{\vec{x} \cdot \vec{y}}{\|\vec{x}\| \, \|\vec{y}\|} = \frac{\vec{x}}{\|\vec{x}\|} \cdot \frac{\vec{y}}{\|\vec{y}\|} \]
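So the normalization step (dividing by the norm) can be moved before the dot product. Concretely, each C-dimensional vector is divided by its L2 norm computed along the C dimension.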
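Here is a small check of this equivalence, as a sketch on random stand-in data. F.normalize divides along dim=1 (the C dimension). A plain dot product of the normalized vectors is then compared against F.cosine_similarity on the raw vectors.

```python
import torch
import torch.nn.functional as F

x = torch.rand(2, 8, 3, 4)  # N, C, H, W (random stand-in data)

# Divide each C-dim vector by its L2 norm along the channel dimension (dim=1).
# F.normalize clamps the norm with a small eps, so zero vectors do not cause a division by zero.
x_unit = F.normalize(x, p=2, dim=1)

# Cosine similarity between the vectors at (0, 0) and (1, 2) of every sample:
# a plain dot product of the normalized vectors ...
cos_via_dot = (x_unit[:, :, 0, 0] * x_unit[:, :, 1, 2]).sum(dim=1)
# ... matches PyTorch's built-in cosine similarity on the raw vectors.
cos_builtin = F.cosine_similarity(x[:, :, 0, 0], x[:, :, 1, 2], dim=1)
print(torch.allclose(cos_via_dot, cos_builtin, atol=1e-6))
```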
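The most direct approach is to loop over every window, but that is far too slow to be practical. To exploit the GPU's parallelism, the computation should instead be expressed as tensor (matrix) operations.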
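For reference, the naive approach can be sketched as follows. The function name local_relation_naive is hypothetical, and the sketch assumes no padding. It is mainly useful as ground truth for checking a vectorized version, not for real use.

```python
import torch

def local_relation_naive(x, k=2, stride=1):
    """Hypothetical reference implementation: explicit loops over every k x k window.

    x: tensor of shape (N, C, H, W); assumes no padding.
    Returns (N, L, k*k, k*k), with the L window positions in row-major order.
    """
    N, C, H, W = x.shape
    out_h = (H - k) // stride + 1
    out_w = (W - k) // stride + 1
    out = torch.empty(N, out_h * out_w, k * k, k * k, dtype=x.dtype, device=x.device)
    for n in range(N):
        for i in range(out_h):
            for j in range(out_w):
                # Collect the k*k C-dim vectors of this window as columns (row-major positions).
                win = x[n, :, i * stride:i * stride + k, j * stride:j * stride + k]
                vecs = win.reshape(C, k * k)
                for p in range(k * k):
                    for q in range(k * k):
                        out[n, i * out_w + j, p, q] = (vecs[:, p] * vecs[:, q]).sum()
    return out

x = torch.rand(1, 2, 3, 4)
ref = local_relation_naive(x)
print(ref.shape)  # torch.Size([1, 6, 4, 4])
```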
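Our data has shape N,C,H,W, and we need dot products. For tensors with more than two dimensions we can use torch.matmul. It treats the last two dimensions as matrices and multiplies them, while all leading dimensions act as (broadcastable) batch dimensions.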
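The batching rule is easiest to see from shapes alone. In this sketch with arbitrary random inputs, the leading dimensions (2, 6) play the role that (N, NumOfRegions) play later.

```python
import torch

A = torch.rand(2, 6, 4, 3)   # batch dims (2, 6), each entry a 4x3 matrix
B = torch.rand(2, 6, 3, 5)   # batch dims (2, 6), each entry a 3x5 matrix

# For inputs with more than 2 dims, torch.matmul multiplies the last two dims as matrices
# and treats all leading dims as batch dims.
prod = torch.matmul(A, B)
print(prod.shape)   # torch.Size([2, 6, 4, 5])

# Each batch entry is an ordinary matrix product.
print(torch.allclose(prod[1, 3], A[1, 3] @ B[1, 3]))

# Broadcasting: a single 3x5 matrix is applied to every batch entry.
D = torch.matmul(A, B[0, 0])
print(D.shape)      # torch.Size([2, 6, 4, 5])
```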
The final result should be a tensor of size N × NumOfRegions × 4 × 4, where NumOfRegions is the total number of regions computed. In PyTorch, the only method I know of that gathers region data without extra work is torch.nn.Unfold, so it is used here.
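As a quick sketch of what nn.Unfold returns (random data with the same sizes as the example below): the output has shape (N, C*kernel_h*kernel_w, L), where L is the number of window positions.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 2, 3, 4)  # N, C, H, W (random stand-in data)

# kernel_size=2, dilation=1, padding=0, stride=1
unfold = nn.Unfold(kernel_size=2, dilation=1, padding=0, stride=1)
cols = unfold(x)

# C*2*2 = 8 rows; L = (3-2+1)*(4-2+1) = 6 window positions
print(cols.shape)  # torch.Size([1, 8, 6])
```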
For the matrix multiplications, the simplest way to reason is by matching dimensions.
```python
import torch
import torch.nn as nn

a = torch.rand(1, 2, 3, 4)
b = torch.rand(1, 2, 3, 4)
print("a=>\n", a)
print("b=>\n", b)
```

```
a=>
 tensor([[[[0.4818, 0.9888, 0.8039, 0.7089],
          [0.7667, 0.2273, 0.9956, 0.4739],
          [0.9515, 0.1896, 0.7928, 0.0173]],

         [[0.1723, 0.8767, 0.4832, 0.6515],
          [0.9487, 0.6301, 0.5711, 0.7781],
          [0.2017, 0.9220, 0.2793, 0.2675]]]])
b=>
 tensor([[[[0.1417, 0.3510, 0.1170, 0.1698],
          [0.4311, 0.1535, 0.6087, 0.6646],
          [0.1880, 0.4103, 0.0289, 0.1094]],

         [[0.3398, 0.8751, 0.8299, 0.3514],
          [0.0333, 0.2831, 0.8086, 0.0514],
          [0.3168, 0.2895, 0.5107, 0.4949]]]])
```
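Two tensors are defined here. They are unrelated to each other, and so are the computations on them; the second one only shows a bit more output.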
```python
# Positional arguments: kernel_size=2, dilation=1, padding=0, stride=1
unfold_func = nn.Unfold(2, 1, 0, 1)
unfold_a = unfold_func(a)
print("unfold_a=>\n", unfold_a)
unfold_b = unfold_func(b)
print("unfold_b=>\n", unfold_b)
```

```
unfold_a=>
 tensor([[[0.4818, 0.9888, 0.8039, 0.7667, 0.2273, 0.9956],
         [0.9888, 0.8039, 0.7089, 0.2273, 0.9956, 0.4739],
         [0.7667, 0.2273, 0.9956, 0.9515, 0.1896, 0.7928],
         [0.2273, 0.9956, 0.4739, 0.1896, 0.7928, 0.0173],
         [0.1723, 0.8767, 0.4832, 0.9487, 0.6301, 0.5711],
         [0.8767, 0.4832, 0.6515, 0.6301, 0.5711, 0.7781],
         [0.9487, 0.6301, 0.5711, 0.2017, 0.9220, 0.2793],
         [0.6301, 0.5711, 0.7781, 0.9220, 0.2793, 0.2675]]])
unfold_b=>
 tensor([[[0.1417, 0.3510, 0.1170, 0.4311, 0.1535, 0.6087],
         [0.3510, 0.1170, 0.1698, 0.1535, 0.6087, 0.6646],
         [0.4311, 0.1535, 0.6087, 0.1880, 0.4103, 0.0289],
         [0.1535, 0.6087, 0.6646, 0.4103, 0.0289, 0.1094],
         [0.3398, 0.8751, 0.8299, 0.0333, 0.2831, 0.8086],
         [0.8751, 0.8299, 0.3514, 0.2831, 0.8086, 0.0514],
         [0.0333, 0.2831, 0.8086, 0.3168, 0.2895, 0.5107],
         [0.2831, 0.8086, 0.0514, 0.2895, 0.5107, 0.4949]]])
```
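After unfold, the output has three levels of brackets instead of four. The N,C,H,W tensor has become N, C*2*2, L, where L is the number of window positions. With a 2×2 window, stride 1 and no padding, L = (H-1)*(W-1) = 2*3 = 6 here. The form H/2*W/2 would only hold for non-overlapping windows (stride 2).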
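Along the L dimension, the window positions are laid out in row-major order as the window slides. Within each column, the entries are ordered by channel first and then by the 2×2 positions, also in row-major order.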
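The ordering becomes visible if Unfold is fed a tensor whose values encode their own position. This is an illustrative sketch using the example's 1×2×3×4 shape.

```python
import torch
import torch.nn as nn

# Fill a 1 x 2 x 3 x 4 tensor with 0..23, so each value encodes its position:
# value = c*12 + h*4 + w.
x = torch.arange(24, dtype=torch.float32).view(1, 2, 3, 4)
cols = nn.Unfold(kernel_size=2)(x)   # (1, 8, 6)

# Column l is the window whose top-left corner is (l // 3, l % 3): windows run in row-major order.
# Inside a column, rows go channel by channel, then through the 2x2 offsets in row-major order.
print(cols[0, :, 0])   # window at (0, 0): values 0, 1, 4, 5, 12, 13, 16, 17
print(cols[0, :, 4])   # window at (1, 1): values 5, 6, 9, 10, 17, 18, 21, 22
```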
```python
# (3-1)*(4-1) = L, the number of 2x2 windows in a 3x4 map with stride 1
unfold_a_reshape = unfold_a.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4)  # N,H'W',C,2*2
print("unfold_a_reshape=>\n", unfold_a_reshape)
unfold_b_reshape = unfold_b.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4)
print("unfold_b_reshape=>\n", unfold_b_reshape)
```

```
unfold_a_reshape=>
 tensor([[[[0.4818, 0.9888, 0.7667, 0.2273],
          [0.1723, 0.8767, 0.9487, 0.6301]],

         [[0.9888, 0.8039, 0.2273, 0.9956],
          [0.8767, 0.4832, 0.6301, 0.5711]],

         [[0.8039, 0.7089, 0.9956, 0.4739],
          [0.4832, 0.6515, 0.5711, 0.7781]],

         [[0.7667, 0.2273, 0.9515, 0.1896],
          [0.9487, 0.6301, 0.2017, 0.9220]],

         [[0.2273, 0.9956, 0.1896, 0.7928],
          [0.6301, 0.5711, 0.9220, 0.2793]],

         [[0.9956, 0.4739, 0.7928, 0.0173],
          [0.5711, 0.7781, 0.2793, 0.2675]]]])
unfold_b_reshape=>
 tensor([[[[0.1417, 0.3510, 0.4311, 0.1535],
          [0.3398, 0.8751, 0.0333, 0.2831]],

         [[0.3510, 0.1170, 0.1535, 0.6087],
          [0.8751, 0.8299, 0.2831, 0.8086]],

         [[0.1170, 0.1698, 0.6087, 0.6646],
          [0.8299, 0.3514, 0.8086, 0.0514]],

         [[0.4311, 0.1535, 0.1880, 0.4103],
          [0.0333, 0.2831, 0.3168, 0.2895]],

         [[0.1535, 0.6087, 0.4103, 0.0289],
          [0.2831, 0.8086, 0.2895, 0.5107]],

         [[0.6087, 0.6646, 0.0289, 0.1094],
          [0.8086, 0.0514, 0.5107, 0.4949]]]])
```
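This reshape (view) follows the dimension-matching idea: it splits each column into C × 4, so that the matrix multiplication that follows builds, for each region, the matrix describing the relation between every pair of its points.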
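Transposing and then calling matmul computes X^T X for every region at once. As an alternative (not the method used in this post), the same contraction over the channel index can be written with torch.einsum. This sketch compares the two on random data.

```python
import torch
import torch.nn as nn

x = torch.rand(1, 2, 3, 4)                                  # N, C, H, W
N, C = x.shape[:2]
cols = nn.Unfold(kernel_size=2)(x)                          # N, C*4, L
regions = cols.transpose(1, 2).reshape(N, -1, C, 4)         # N, L, C, 4

# transpose + matmul: (N, L, 4, C) @ (N, L, C, 4) -> (N, L, 4, 4)
rel_matmul = torch.matmul(regions.transpose(2, 3), regions)

# Same contraction with einsum: sum over the channel index c for every position pair (i, j).
rel_einsum = torch.einsum('nlci,nlcj->nlij', regions, regions)

print(rel_einsum.shape)                                     # torch.Size([1, 6, 4, 4])
print(torch.allclose(rel_matmul, rel_einsum, atol=1e-6))
```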
```python
mm_unfold_a = torch.matmul(unfold_a_reshape.transpose(2, 3), unfold_a_reshape)  # N,H'W',2*2,2*2
print("mm_unfold_a=>\n", mm_unfold_a)
mm_unfold_b = torch.matmul(unfold_b_reshape.transpose(2, 3), unfold_b_reshape)
print("mm_unfold_b=>\n", mm_unfold_b)
```

```
mm_unfold_a=>
 tensor([[[[0.2619, 0.6275, 0.5329, 0.2181],
          [0.6275, 1.7462, 1.5898, 0.7771],
          [0.5329, 1.5898, 1.4878, 0.7720],
          [0.2181, 0.7771, 0.7720, 0.4487]],

         [[1.7462, 1.2184, 0.7771, 1.4851],
          [1.2184, 0.8796, 0.4871, 1.0763],
          [0.7771, 0.4871, 0.4487, 0.5862],
          [1.4851, 1.0763, 0.5862, 1.3174]],

         [[0.8796, 0.8847, 1.0763, 0.7569],
          [0.8847, 0.9270, 1.0779, 0.8429],
          [1.0763, 1.0779, 1.3174, 0.9163],
          [0.7569, 0.8429, 0.9163, 0.8301]],

         [[1.4878, 0.7720, 0.9209, 1.0200],
          [0.7720, 0.4487, 0.3433, 0.6240],
          [0.9209, 0.3433, 0.9459, 0.3664],
          [1.0200, 0.6240, 0.3664, 0.8860]],

         [[0.4487, 0.5862, 0.6240, 0.3562],
          [0.5862, 1.3174, 0.7153, 0.9488],
          [0.6240, 0.7153, 0.8860, 0.4078],
          [0.3562, 0.9488, 0.4078, 0.7065]],

         [[1.3174, 0.9163, 0.9488, 0.1700],
          [0.9163, 0.8301, 0.5930, 0.2164],
          [0.9488, 0.5930, 0.7065, 0.0884],
          [0.1700, 0.2164, 0.0884, 0.0719]]]])
mm_unfold_b=>
 tensor([[[[0.1355, 0.3471, 0.0724, 0.1180],
          [0.3471, 0.8891, 0.1805, 0.3017],
          [0.0724, 0.1805, 0.1869, 0.0756],
          [0.1180, 0.3017, 0.0756, 0.1037]],

         [[0.8891, 0.7674, 0.3017, 0.9213],
          [0.7674, 0.7025, 0.2530, 0.7424],
          [0.3017, 0.2530, 0.1037, 0.3224],
          [0.9213, 0.7424, 0.3224, 1.0244]],

         [[0.7025, 0.3115, 0.7424, 0.1204],
          [0.3115, 0.1523, 0.3875, 0.1309],
          [0.7424, 0.3875, 1.0244, 0.4461],
          [0.1204, 0.1309, 0.4461, 0.4443]],

         [[0.1869, 0.0756, 0.0916, 0.1865],
          [0.0756, 0.1037, 0.1186, 0.1450],
          [0.0916, 0.1186, 0.1357, 0.1689],
          [0.1865, 0.1450, 0.1689, 0.2522]],

         [[0.1037, 0.3224, 0.1450, 0.1490],
          [0.3224, 1.0244, 0.4839, 0.4306],
          [0.1450, 0.4839, 0.2522, 0.1597],
          [0.1490, 0.4306, 0.1597, 0.2616]],

         [[1.0244, 0.4461, 0.4306, 0.4668],
          [0.4461, 0.4443, 0.0455, 0.0982],
          [0.4306, 0.0455, 0.2616, 0.2559],
          [0.4668, 0.0982, 0.2559, 0.2569]]]])
```
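This multiplication produces the relation matrices, with shape N, NumOfRegions, 2*2, 2*2. (The norms are not divided out here. For cosine similarity, each vector should first be divided by its norm.)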
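One way to add the normalization, assuming cosine similarity is the goal, is to normalize along C with F.normalize before unfolding. The same unfold, reshape and matmul steps then return cosine similarities directly. The helper name local_cosine_relation is hypothetical.

```python
import torch
import torch.nn.functional as F

def local_cosine_relation(x, k=2, stride=1, padding=0):
    """Hypothetical helper: cosine-similarity matrices for every k x k window of x (N, C, H, W).

    Normalizing along C *before* unfolding makes the later matmul yield cosine similarities.
    Note: with padding > 0 the padded positions are zero vectors, so their rows/columns are 0.
    Returns (N, L, k*k, k*k).
    """
    N, C = x.shape[:2]
    x_unit = F.normalize(x, p=2, dim=1)                                      # unit-length C-dim vectors
    cols = F.unfold(x_unit, kernel_size=k, padding=padding, stride=stride)  # N, C*k*k, L
    regions = cols.transpose(1, 2).reshape(N, -1, C, k * k)                 # N, L, C, k*k
    return torch.matmul(regions.transpose(2, 3), regions)                   # N, L, k*k, k*k

x = torch.rand(1, 2, 3, 4)
rel = local_cosine_relation(x)
print(rel.shape)                                   # torch.Size([1, 6, 4, 4])
# Every vector has cosine similarity 1 with itself.
print(torch.allclose(rel.diagonal(dim1=-2, dim2=-1), torch.ones(1, 6, 4)))
```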
```python
# Check the first region (the top-left 2x2 window) directly.
a_ = a[0, :2, :2, :2]
b_ = b[0, :2, :2, :2]
print(a_.shape, b_.shape)
a_ = a_.reshape(1, 2, 2*2)  # N,C,2*2
b_ = b_.reshape(1, 2, 2*2)
print("torch.matmul(a_.t, a_)=>\n", torch.matmul(a_.transpose(1, 2), a_))
print("torch.matmul(b_.t, b_)=>\n", torch.matmul(b_.transpose(1, 2), b_))
# Exact == happened to hold here; torch.allclose is the safer test for floats in general.
print(torch.matmul(a_.transpose(1, 2), a_)[0] == mm_unfold_a[0, 0])
print(torch.matmul(b_.transpose(1, 2), b_)[0] == mm_unfold_b[0, 0])
```

```
torch.Size([2, 2, 2]) torch.Size([2, 2, 2])
torch.matmul(a_.t, a_)=>
 tensor([[[0.2619, 0.6275, 0.5329, 0.2181],
         [0.6275, 1.7462, 1.5898, 0.7771],
         [0.5329, 1.5898, 1.4878, 0.7720],
         [0.2181, 0.7771, 0.7720, 0.4487]]])
torch.matmul(b_.t, b_)=>
 tensor([[[0.1355, 0.3471, 0.0724, 0.1180],
         [0.3471, 0.8891, 0.1805, 0.3017],
         [0.0724, 0.1805, 0.1869, 0.0756],
         [0.1180, 0.3017, 0.0756, 0.1037]]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
tensor([[True, True, True, True],
        [True, True, True, True],
        [True, True, True, True],
        [True, True, True, True]])
```
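This shows that unfold, reshape (view) and matmul together compute the local relation matrices of N,C,H,W data, where "local" means the kernel_size of the sliding window. Everything runs as batched tensor operations, so this should be much faster than the naive sliding-window loop.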
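This also confirms an idea about writing such code: simply matching matrix dimensions leads directly to the local relation matrices:
N,C,H,W --unfold(Ws×Ws)--> N,C*Ws*Ws,H/Ws*W/Ws --transpose--> N,H/Ws*W/Ws,C*Ws*Ws --view--> N,H/Ws*W/Ws,C,Ws*Ws --matmul with its transpose--> N,H/Ws*W/Ws,Ws*Ws,Ws*Ws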
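The shape flow above can be sketched as a small function for the non-overlapping case it describes: stride = Ws, no padding, and Ws dividing H and W. The name block_relation is hypothetical; F.unfold is the functional form of nn.Unfold.

```python
import torch
import torch.nn.functional as F

def block_relation(x, ws):
    """Hypothetical helper: relation matrices for non-overlapping ws x ws blocks.

    Assumes stride = ws, no padding, and ws dividing H and W.
    x: (N, C, H, W) -> (N, H/ws * W/ws, ws*ws, ws*ws)
    """
    N, C, H, W = x.shape
    assert H % ws == 0 and W % ws == 0, "this sketch assumes ws divides H and W"
    cols = F.unfold(x, kernel_size=ws, stride=ws)                  # N, C*ws*ws, H/ws*W/ws
    cols = cols.transpose(1, 2)                                    # N, H/ws*W/ws, C*ws*ws
    regions = cols.reshape(N, (H // ws) * (W // ws), C, ws * ws)   # N, H/ws*W/ws, C, ws*ws
    return torch.matmul(regions.transpose(2, 3), regions)          # N, H/ws*W/ws, ws*ws, ws*ws

x = torch.rand(2, 3, 4, 6)
rel = block_relation(x, ws=2)
print(rel.shape)   # torch.Size([2, 6, 4, 4])

# Cross-check one block (second block row, third block column) against a direct computation.
blk = x[0, :, 2:4, 4:6].reshape(3, 4)
print(torch.allclose(rel[0, 1 * 3 + 2], blk.t() @ blk, atol=1e-6))
```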
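Here H/Ws*W/Ws is the number of blocks. Writing it as a plain division assumes three things: the window size evenly divides the height and width, the stride equals the window size, and there is no padding.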
As the earlier code shows, with stride 1 this count has to be adjusted. It becomes (H-Ws+1)*(W-Ws+1), i.e. (3-1)*(4-1) for the example:
```python
unfold_func = nn.Unfold(2, 1, 0, 1)
...
unfold_a_reshape = unfold_a.transpose(1, 2).view(1, (3-1)*(4-1), 2, 4)
```
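To avoid hard-coding (3-1)*(4-1), the number of windows can be computed from the general output-size formula in the nn.Unfold documentation, or simply inferred by passing -1 to reshape. The helper num_windows in this sketch is hypothetical.

```python
import torch
import torch.nn as nn

def num_windows(h, w, k, stride=1, padding=0, dilation=1):
    """Hypothetical helper: number of window positions L that nn.Unfold yields for an h x w input."""
    out_h = (h + 2 * padding - dilation * (k - 1) - 1) // stride + 1
    out_w = (w + 2 * padding - dilation * (k - 1) - 1) // stride + 1
    return out_h * out_w

# stride 1, no padding: L = (H - k + 1) * (W - k + 1), i.e. (3-1)*(4-1) = 6 in the example
print(num_windows(3, 4, k=2))             # 6
# stride = k with k dividing H and W: L = (H/k) * (W/k)
print(num_windows(4, 6, k=2, stride=2))   # 6

# Alternatively, -1 lets reshape infer L from the Unfold output.
x = torch.rand(1, 2, 3, 4)
cols = nn.Unfold(kernel_size=2, dilation=1, padding=0, stride=1)(x)
regions = cols.transpose(1, 2).reshape(x.shape[0], -1, x.shape[1], 4)
print(regions.shape)                      # torch.Size([1, 6, 2, 4])
```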