[pytorch] PyTorch Hook

 

PyTorch Hook

  • Why introduce hooks? What can hooks do?
  • What kinds of hooks are there?
  • How do you use hooks?
 

1. Why introduce hooks?

Reference: Pytorch中autograd以及hook函数详解
In PyTorch's automatic differentiation machinery (autograd mechanics), if a tensor's requires_grad is set to True, every operation involving it will have its gradient computed automatically during the backward pass.

In [0]:
import torch

x = torch.randn(5, 5) # requires_grad=False by default
y = torch.randn(5, 5) # requires_grad=False by default
z = torch.randn((5, 5), requires_grad=True)
a = x + y
b = a + z
print(a.requires_grad, b.requires_grad)
 
False True
 

But there is one caveat in the autograd machinery: only leaf nodes retain their gradients; the gradients of intermediate variables are freed automatically once the backward computation finishes, to save memory. So in the code below we only obtain the gradient of z with respect to x, while the gradients of y and z are released after the backward pass and therefore show as None.

In [0]:
x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
z = torch.mean(y)
z.backward()
print("x.grad =", x.grad)
print("y.grad =", y.grad)
print("z.grad =", z.grad)
 
x.grad = tensor([1., 1.])
y.grad = None
z.grad = None
 

So can we still obtain the gradients of y and z? This is where hooks come in.
The PyTorch tutorial introduces them as follows:
"We've inspected the weights and the gradients. But how about inspecting / modifying the output and grad_output of a layer? We introduce hooks for this purpose." That is, hooks were introduced so that we can inspect or modify a layer's output or grad_output.
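
As an aside, if you only need to read an intermediate gradient rather than modify it, Tensor.retain_grad() is a simpler alternative to a hook. A minimal sketch:

In [0]:
x = torch.tensor([1, 2], dtype=torch.float32, requires_grad=True)
y = x * 2
y.retain_grad()   # ask autograd to keep y.grad even though y is not a leaf
z = torch.mean(y)
z.backward()
print(y.grad)     # tensor([0.5000, 0.5000]) instead of None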

 

2. Types of hooks

  • TENSOR.register_hook(FUNCTION)
  • MODULE.register_forward_hook(FUNCTION)
  • MODULE.register_backward_hook(FUNCTION)

Hooks can be registered on a Module or on a Tensor.
To register a hook on a Tensor, use register_hook().
To register a hook on a Module: if you want a layer's input and output during the forward pass, use register_forward_hook(); if you want a layer's grad_in and grad_out during the backward pass, use register_backward_hook().
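
A minimal sketch of all three registrations side by side; fc and t below are hypothetical stand-ins used only for illustration:

In [0]:
import torch
import torch.nn as nn

fc = nn.Linear(2, 2)
t = torch.randn(2, requires_grad=True)

h1 = t.register_hook(lambda grad: print('tensor grad:', grad))
h2 = fc.register_forward_hook(
    lambda module, inp, out: print('forward in/out:', inp, out))
h3 = fc.register_backward_hook(
    lambda module, grad_in, grad_out: print('backward in/out:', grad_in, grad_out))

fc(t).sum().backward()   # triggers all three hooks
for h in (h1, h2, h3):   # every register_* call returns a handle;
    h.remove()           # .remove() detaches the hook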

 

3. TENSOR.register_hook(FUNCTION)

In [0]:
x = torch.tensor([1,2],dtype=torch.float32,requires_grad=True)
y = x * 2
y.register_hook(print)
z = torch.mean(y)
z.backward()
 
tensor([0.5000, 0.5000])
 

In the code above, register_hook attaches the print function to y; print simply prints out y's gradient when it arrives.
When z.backward() executes, y's hook fires as well, printing y's gradient with respect to the output z: tensor([0.5000, 0.5000]) is exactly y's gradient.
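
A hook can also modify the gradient: if the hook function returns a tensor, that tensor replaces the gradient flowing further back. A minimal sketch that doubles y's gradient before it reaches x:

In [0]:
x = torch.tensor([1, 2], dtype=torch.float32, requires_grad=True)
y = x * 2
h = y.register_hook(lambda grad: grad * 2)   # returned tensor replaces the grad
z = torch.mean(y)
z.backward()
print(x.grad)   # tensor([2., 2.]) instead of tensor([1., 1.])
h.remove()      # detach the hook once it is no longer needed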

 

4. MODULE.register_forward_hook(FUNCTION) && MODULE.register_backward_hook(FUNCTION)

 

Reference link: Toy example to understand Pytorch hooks
Before introducing the usage of these two, we first define a module; the hooks that follow will be registered on it.

In [0]:
import numpy as np
import torch
import torch.nn as nn
from IPython.display import Image
 

1. Define the network

In [0]:
''' Define the Net '''
class Net(nn.Module):

    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(2,2)
        self.s1 = nn.Sigmoid()
        self.fc2 = nn.Linear(2,2)
        self.s2 = nn.Sigmoid()
        self.fc1.weight = torch.nn.Parameter(torch.Tensor([[0.15,0.2],[0.250,0.30]]))
        self.fc1.bias = torch.nn.Parameter(torch.Tensor([0.35]))
        self.fc2.weight = torch.nn.Parameter(torch.Tensor([[0.4,0.45],[0.5,0.55]]))
        self.fc2.bias = torch.nn.Parameter(torch.Tensor([0.6]))
        
    def forward(self, x):
        x= self.fc1(x)
        x = self.s1(x)
        x= self.fc2(x)
        x = self.s2(x)
        return x

net = Net()
print(net)
 
Net(
  (fc1): Linear(in_features=2, out_features=2, bias=True)
  (s1): Sigmoid()
  (fc2): Linear(in_features=2, out_features=2, bias=True)
  (s2): Sigmoid()
)
In [0]:
''' Get the value of parameters defined in the Net '''
# parameters: weight and bias
print(list(net.parameters()))
 
[Parameter containing:
tensor([[0.1500, 0.2000],
        [0.2500, 0.3000]], requires_grad=True), Parameter containing:
tensor([0.3500], requires_grad=True), Parameter containing:
tensor([[0.4000, 0.4500],
        [0.5000, 0.5500]], requires_grad=True), Parameter containing:
tensor([0.6000], requires_grad=True)]
In [0]:
''' feed the input data to get the output and loss '''
# input data
data = torch.Tensor([0.05,0.1])
# output of last layer
out = net(data)
target = torch.Tensor([0.01,0.99])  # a dummy target, for example
criterion = nn.MSELoss()
loss = criterion(out, target)
print(loss)
 
tensor(0.2984, grad_fn=<MseLossBackward>)
 

2. The structure of hook input, output && grad_in, grad_out

MODULE.register_forward_hook(FUNCTION) involves the input and output arguments,
and MODULE.register_backward_hook(FUNCTION) involves the grad_in and grad_out arguments. The diagram below shows that input and output are simply a layer's input and output;
grad_in is the partial derivative of the whole network's final output (think of it as the final loss L) with respect to the layer's output, and grad_out is (∂L/∂output × ∂output/∂input), i.e. one more step of the chain rule.


forward:   input (y) ----------------> [ layer ] ----------------> output (z) ----> ... ----> L
backward:  grad_out = (dL/dz) * (dz/dy) <------ [ layer ] <------ grad_in = dL/dz

 

In the code below, backward = False means the forward pass: input and output are the layer's input and output.
backward = True means the backward pass: the hook's input argument then corresponds to grad_out in the diagram above (the gradient with respect to the layer's input, which PyTorch calls grad_input), and its output argument corresponds to grad_in (the gradient with respect to the layer's output, PyTorch's grad_output).

In [0]:
''' Define the Hook class '''
# A simple hook class that returns the input and output of a layer during forward/backward pass
class Hook():
    def __init__(self, module, backward=False):
        if backward==False:
            self.hook = module.register_forward_hook(self.hook_fn)
        else:
            self.hook = module.register_backward_hook(self.hook_fn)
    def hook_fn(self, module, input, output):
        self.input = input
        self.output = output
    def close(self):
        self.hook.remove()
In [0]:
# get the _modules.items()
# format: (name, module) 
print(list(net._modules.items()))

# use layer[0] to get the name and layer[1] to get the module
for layer in net._modules.items():
  print(layer[0], layer[1])
 
[('fc1', Linear(in_features=2, out_features=2, bias=True)), ('s1', Sigmoid()), ('fc2', Linear(in_features=2, out_features=2, bias=True)), ('s2', Sigmoid())]
fc1 Linear(in_features=2, out_features=2, bias=True)
s1 Sigmoid()
fc2 Linear(in_features=2, out_features=2, bias=True)
s2 Sigmoid()
 

When creating a Hook object we must pass in the module argument, obtained in the code below via layer[1]. All the forward hooks are collected in the list hookF, and all the backward hooks in the list hookB.
Note that the hooks must be registered first, and only then should data be fed through the network for the forward pass: registration has to happen before net(data), because hook functions are bound during the forward pass (a quick check of this follows below).

In [0]:
''' Register hooks on each layer '''
hookF = [Hook(layer[1]) for layer in list(net._modules.items())]
hookB = [Hook(layer[1],backward=True) for layer in list(net._modules.items())]
In [0]:
# run a data batch
out=net(data)
print(out)
 
tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
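
A quick sanity check of the registration-order rule above: a Hook created after the forward pass holds nothing until the network runs again. (late is a hypothetical extra hook, used only for illustration.)

In [0]:
late = Hook(net.fc1)
print(hasattr(late, 'input'))   # False: no forward pass since registration
_ = net(data)                   # the forward hook fires on this pass
print(late.input)               # now populated with fc1's input tuple
late.close()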
 

3. Get the hook input, output and grad_in, grad_out value

Note that loss.backward(retain_graph = True) does not work for the backward hooks here.
The error below shows 'Hook' object has no attribute 'input': loss is not a network layer with an input and an output, but merely the aggregated result of the last layer's output against the target.
The Hook class defined above expects an explicit input and output, so it does not apply to loss.backward().
Use out.backward(label_tensor, retain_graph = True) instead.

 

3.1 loss.backward(retain_graph = True)

In [0]:
loss.backward(retain_graph = True)

print('***'*3+' Forward Hooks Inputs & Outputs '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')

#! loss.backward(retain_graph=True) # doesn't work with backward hooks, 
#! since it's not a network layer but an aggregated result from the outputs of last layer vs target 
print('***'*3+' Backward Hooks Inputs & Outputs '+'***'*3)
for hook in hookB:             
    print(hook.input)          
    print(hook.output)         
    print('---'*17)
 
*********  Forward Hooks Inputs & Outputs  *********
(tensor([0.0500, 0.1000]),)
tensor([0.3775, 0.3925], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([0.3775, 0.3925], grad_fn=<AddBackward0>),)
tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>)
---------------------------------------------------
(tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>),)
tensor([1.1059, 1.2249], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([1.1059, 1.2249], grad_fn=<AddBackward0>),)
tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
---------------------------------------------------


*********  Backward Hooks Inputs & Outputs  *********
 
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-34-8c71e064b825> in <module>()
     12 print('***'*3+' Backward Hooks Inputs & Outputs '+'***'*3)
     13 for hook in hookB:
---> 14     print(hook.input)
     15     print(hook.output)
     16     print('---'*17)

AttributeError: 'Hook' object has no attribute 'input'
 

3.2 out.backward(TENSOR, retain_graph = True)

Below we use the correct form: out.backward(torch.tensor([1, 1], dtype=torch.float), retain_graph=True).
Because backward() is called on out, a tensor rather than a scalar, PyTorch cannot solve for its Jacobian directly; we must supply grad_tensors, which can be viewed as a per-element weight on the gradient of the corresponding tensor.
For example, with y.backward(v, retain_graph = True) where y = (y1, y2, y3) and v = (v1, v2, v3), backward first forms the products (y1 * v1, y2 * v2, y3 * v3), then differentiates with respect to y, and then y with respect to the parameters: the chain rule.

Equivalently, think of the network output y being combined with a label l by a loss function into a scalar loss L:

L = v1 * y1 + v2 * y2 + v3 * y3
dL/dy = (v1, v2, v3)
dL/dw = (dL/dy) · (dy/dw) = v1 * dy1/dw + v2 * dy2/dw + v3 * dy3/dw

The v in dL/dw above is exactly the v in y.backward(v, retain_graph = True): every gradient produced by y.backward() is weighted by the corresponding coefficient of v.
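
A small numeric check of this v-weighting, reusing the toy tensors from section 3: with y = 2x we have dy_i/dx_i = 2, so x.grad should come out as 2 * v.

In [0]:
x = torch.tensor([1, 2], dtype=torch.float32, requires_grad=True)
y = x * 2
v = torch.tensor([0.1, 10.0])
y.backward(v)
print(x.grad)   # tensor([ 0.2000, 20.0000]), i.e. 2 * v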

In [0]:
out.backward(torch.tensor([1, 1], dtype = torch.float), retain_graph = True)
print('***'*3+' Forward Hooks Inputs & Outputs '+'***'*3)
for hook in hookF:
    print(hook.input)
    print(hook.output)
    print('---'*17)
print('\n')
print('***'*3+' Backward Hooks Inputs & Outputs '+'***'*3)
for hook in hookB:             
    print(hook.input)          
    print(hook.output)         
    print('---'*17)
 
*********  Forward Hooks Inputs & Outputs  *********
(tensor([0.0500, 0.1000]),)
tensor([0.3775, 0.3925], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([0.3775, 0.3925], grad_fn=<AddBackward0>),)
tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>)
---------------------------------------------------
(tensor([0.5933, 0.5969], grad_fn=<SigmoidBackward>),)
tensor([1.1059, 1.2249], grad_fn=<AddBackward0>)
---------------------------------------------------
(tensor([1.1059, 1.2249], grad_fn=<AddBackward0>),)
tensor([0.7514, 0.7729], grad_fn=<SigmoidBackward>)
---------------------------------------------------


*********  Backward Hooks Inputs & Outputs  *********
(tensor([0.0392, 0.0435]), tensor([0.0827]))
(tensor([0.0392, 0.0435]),)
---------------------------------------------------
(tensor([0.0392, 0.0435]),)
(tensor([0.1625, 0.1806]),)
---------------------------------------------------
(tensor([0.1868, 0.1755]), tensor([0.3623]))
(tensor([0.1868, 0.1755]),)
---------------------------------------------------
(tensor([0.1868, 0.1755]),)
(tensor([1., 1.]),)
---------------------------------------------------
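
These numbers can be cross-checked by hand. For the last sigmoid s2, the gradient with respect to its input should be grad_output * out * (1 - out), with grad_output = [1, 1]:

In [0]:
print(hookB[-1].input[0])          # tensor([0.1868, 0.1755]) from the hook
print((out * (1 - out)).detach())  # the same values from the sigmoid derivative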
 

4. Module Hooks Problem

Problem with backward hook function #598
This issue points out a problem with PyTorch's module hooks:
“Ok, so the problem is that module hooks are actually registered on the last function that the module has created. In your case x + y + z is computed as ((x + y) + z) so the hook is registered on that (_ + z) operation, and this is why you're getting only two grad inputs.

We'll definitely have to resolve this but it will need a large change in the autograd internals. However, right now @colesbury is rewriting them to make it possible to have multiple functions dispatched in parallel, and they would heavily conflict with his work. For now use only Variable hooks (or module hooks, but not on containers). Sorry!”

In other words, module hooks are registered only on the last function a module creates. For (x + y + z) one would expect three grads, one each for x, y and z, but PyTorch first computes (x + y) and then (_ + z), so the hook ends up on that last operation and only two grads arrive: one for the (x + y) subtotal and one for z. This is a hard problem in PyTorch's autograd internals, and at the time of the issue it had not been resolved.
Given this, to avoid unnecessary bugs, the developers recommend using the tensor-level register_hook rather than module hooks. If you run into a similar problem, you now know where to look for the cause.
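
Following that advice, a minimal sketch of the tensor-hook workaround: instead of module.register_backward_hook, grab the layer's output tensor inside a forward hook and attach a tensor hook to it (grads and save_grad are hypothetical names used only for illustration):

In [0]:
grads = {}

def save_grad(name):
    def fn(module, inp, out):
        # attach a tensor hook to the freshly computed output
        out.register_hook(lambda g: grads.setdefault(name, g))
    return fn

net.fc1.register_forward_hook(save_grad('fc1'))
out = net(data)
out.backward(torch.tensor([1., 1.]))
print(grads['fc1'])   # dL/d(fc1 output), captured via a tensor hook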


The wound is the place where the Light enters you. ~Rumi
