Implementing Your Own TensorFlow (Part 1) - Computational Graph and Forward Propagation

Date: 2019-11-11

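A while ago I used TensorFlow for a research project and found this kind of framework fascinating: besides building complex neural networks, it can optimize any other computational model you need. That made me want to write a similar graph computation framework myself as a learning exercise. After a recent group meeting I decided to implement a bare-bones graph computation framework that imitates the TensorFlow interface, in order to learn how computational graph programs are written and how forward and backward propagation are implemented. So far it supports forward propagation, backpropagation and a gradient descent optimizer, and I have written an example that optimizes a linear model.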

The code is on GitHub under the name SimpleFlow. Repository link: https://github.com/PytLab/sim...

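Although the principles behind forward and backward propagation are not hard to understand, actually writing them out revealed many details that have to be learned and handled before a real model can be optimized (for example, differentiating the loss function with respect to the matrix at each compute node). SimpleFlow's code leaves a lot out, such as checks on dtype and tensor size, because the goal is only to implement the core graph computation. No optimization has been attempted, and tensor operations internally use the NumPy interface (this is a learning and practice project, after all). I have not updated the blog for a long time; in the next few posts I will summarize the details of the implementation, and I hope they serve as a reference for others who are learning this later.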

Main Text

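This post covers the computational graph and the implementation of forward propagation: building the graph, then performing a post-order traversal of the finished graph to run the forward computation and obtain the output value of a given node.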

First, a simple example of what the implementation can do:

import simpleflow as sf

# Create a graph
with sf.Graph().as_default():
    a = sf.constant(1.0, name='a')
    b = sf.constant(2.0, name='b')
    result = sf.add(a, b, name='result')

    # Create a session to compute
    with sf.Session() as sess:
        print(sess.run(result))

Computational Graph

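A computational graph is a basic tool in computer algebra. A directed graph can represent a given mathematical expression, and the structure of the graph makes it quick and convenient to differentiate the expression with respect to its variables. A neural network is in essence a multi-layer composite function, so its expression can also be represented by a graph.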

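This section summarizes the implementation of the computational graph. In this directed graph, each node represents a specific operation such as summation, multiplication, vector product, squaring and so on. For example, the sum expression $f(x, y) = x + y$ is represented by a directed graph in which the nodes $x$ and $y$ each point to a $+$ node.

The expression $f(x, y, z) = z(x+y)$ is represented by a directed graph in which $x$ and $y$ point to a $+$ node, and that $+$ node together with $z$ points to a $\times$ node.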

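Unlike TensorFlow, and to keep things simple, SimpleFlow does not define a Tensor class to represent the data flowing between nodes of the graph. Instead it defines the node types directly. Four types represent the nodes in the graph: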

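Operation: an operation node takes one or two input nodes and performs a simple operation on them, such as the addition and multiplication in the graphs above.
Variable: a node with no input nodes, whose data can change during computation.
Constant: like a Variable node, it has no input nodes, but its data does not change while the graph is being computed.
Placeholder: also has no input nodes; its data is passed in by the user after the graph has been built.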

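In fact every node in the graph can be regarded as some kind of operation. Variable, Constant and Placeholder are special operations: compared with an ordinary Operation they have no inputs, but they all have outputs (like the $x$ and $y$ nodes above, which output their own values to the $+$ node), usually to an Operation node for further computation.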

Next we look at how to implement the basic components of a computational graph: nodes and edges.

`Operation` nodes

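Nodes represent operations, and edges represent the data a node receives and outputs. An operation node needs the following attributes: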

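input_nodes: the input nodes, holding references to the nodes that feed the current node
output_nodes: the output nodes, holding the nodes that take the current node as input, that is, where the current node's output goes
output_value: the value of the current node; for an Add node, this stores the sum of the output_value of its two input nodes
name: the name of the current node
graph: the graph this node belongs to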

Below is the Operation base class that represents operation nodes in the graph (see https://github.com/PytLab/sim...):

class Operation(object):
    ''' Base class for all operations in simpleflow.

    An operation is a node in computational graph receiving zero or more nodes
    as input and produce zero or more nodes as output. Vertices could be an
    operation, variable or placeholder.
    '''
    def __init__(self, *input_nodes, name=None):
        ''' Operation constructor.

        :param input_nodes: Input nodes for the operation node.
        :type input_nodes: Objects of `Operation`, `Variable` or `Placeholder`.

        :param name: The operation name.
        :type name: str.
        '''
        # Nodes received by this operation.
        self.input_nodes = input_nodes

        # Nodes that receive this operation node as input.
        self.output_nodes = []

        # Output value of this operation in session execution.
        self.output_value = None

        # Operation name.
        self.name = name

        # Graph the operation belongs to.
        self.graph = DEFAULT_GRAPH

        # Add this operation node to destination lists in its input nodes.
        for node in input_nodes:
            node.output_nodes.append(self)

        # Add this operation to default graph.
        self.graph.operations.append(self)

    def compute_output(self):
        ''' Compute and return the output value of the operation.
        '''
        raise NotImplementedError

    def compute_gradient(self, grad=None):
        ''' Compute and return the gradient of the operation wrt inputs.
        '''
        raise NotImplementedError

Besides setting the attributes listed above, the constructor has to do two things:

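Append a reference to the current node to the output_nodes of each of its input nodes, so that the current node can be found from its inputs.
Append a reference to the current node to the graph, which makes it easier to release resources in the graph later.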

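In addition, every operation node has two required methods: compute_output and compute_gradient. The first computes the node's output value from the values of its input nodes; the second computes gradients based on the operation and the current node's value. Gradient computation will be covered in detail in a later post; this one only covers computing node output values.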

Here is the addition operation as an example of a concrete operation node:

class Add(Operation):
    ''' An addition operation.
    '''
    def __init__(self, x, y, name=None):
        ''' Addition constructor.

        :param x: The first input node.
        :type x: Object of `Operation`, `Variable` or `Placeholder`.

        :param y: The second input node.
        :type y: Object of `Operation`, `Variable` or `Placeholder`.

        :param name: The operation name.
        :type name: str.
        '''
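        # Name the class explicitly: super(self.__class__, self) recurses
        # forever as soon as Add itself is subclassed.
        super(Add, self).__init__(x, y, name=name)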

    def compute_output(self):
        ''' Compute and return the value of addition operation.
        '''
        x, y = self.input_nodes
        self.output_value = np.add(x.output_value, y.output_value)
        return self.output_value

As you can see, computing the output_value of the current node requires that the values of its input nodes have already been computed.
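Other binary operations follow the same pattern as Add. As a minimal sketch (not the code from the SimpleFlow repository), a multiplication node might look like the following. `_Node`, `_Value` and `Multiply` are hypothetical stand-ins introduced here so the block runs on its own, and plain Python numbers are used instead of NumPy arrays.

```python
class _Node(object):
    ''' Hypothetical stand-in for the Operation base class. '''
    def __init__(self, *input_nodes, name=None):
        self.input_nodes = input_nodes
        self.output_nodes = []
        self.output_value = None
        self.name = name
        # Register this node as a consumer of each of its inputs.
        for node in input_nodes:
            node.output_nodes.append(self)


class _Value(_Node):
    ''' Hypothetical leaf node that simply holds a number. '''
    def __init__(self, value, name=None):
        super(_Value, self).__init__(name=name)
        self.output_value = value


class Multiply(_Node):
    ''' Hypothetical multiplication operation, mirroring Add. '''
    def __init__(self, x, y, name=None):
        super(Multiply, self).__init__(x, y, name=name)

    def compute_output(self):
        # Both inputs must already hold their output values.
        x, y = self.input_nodes
        self.output_value = x.output_value * y.output_value
        return self.output_value


x = _Value(3.0, name='x')
y = _Value(4.0, name='y')
product = Multiply(x, y, name='product')
print(product.compute_output())  # 12.0
```

Only the constructor signature and compute_output differ between operations; the bookkeeping of input and output nodes stays in the base class.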

`Variable` nodes

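Like an Operation node, a Variable node needs attributes such as output_value and output_nodes. It has no input nodes, and therefore no input_nodes attribute; instead it needs an initial value, initial_value, set at creation time: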

class Variable(object):
    ''' Variable node in computational graph.
    '''
    def __init__(self, initial_value=None, name=None, trainable=True): 
        ''' Variable constructor.

        :param initial_value: The initial value of the variable.
        :type initial_value: number or a ndarray.

        :param name: Name of the variable.
        :type name: str.
        '''
        # Variable initial value.
        self.initial_value = initial_value

        # Output value of this operation in session execution.
        self.output_value = None

        # Nodes that receive this variable node as input.
        self.output_nodes = []

        # Variable name.
        self.name = name

        # Graph the variable belongs to.
        self.graph = DEFAULT_GRAPH

        # Add to the currently active default graph.
        self.graph.variables.append(self)
        if trainable:
            self.graph.trainable_variables.append(self)

    def compute_output(self):
        ''' Compute and return the variable value.
        '''
        if self.output_value is None:
            self.output_value = self.initial_value
        return self.output_value

`Constant` and `Placeholder` nodes

Constant and Placeholder nodes are similar to Variable nodes. For the implementation, see: https://github.com/PytLab/sim...
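For readers who do not want to follow the link, here is a rough sketch of what such nodes could look like, modeled on the Variable class above. It is an assumption about their shape, not a copy of the repository code: the `_Graph` stand-in, the module-level `DEFAULT_GRAPH` and the exact constructor arguments are all hypothetical.

```python
class _Graph(object):
    ''' Hypothetical stand-in for the Graph container. '''
    def __init__(self):
        self.constants, self.placeholders = [], []


DEFAULT_GRAPH = _Graph()  # stand-in for the library-wide default graph


class Constant(object):
    ''' Sketch of a constant node: the value is fixed at creation. '''
    def __init__(self, value=None, name=None):
        self.value = value
        self.output_value = None
        self.output_nodes = []
        self.name = name
        self.graph = DEFAULT_GRAPH
        self.graph.constants.append(self)

    def compute_output(self):
        # The output is always the value given at construction.
        if self.output_value is None:
            self.output_value = self.value
        return self.output_value


class Placeholder(object):
    ''' Sketch of a placeholder node: the value is supplied at run time. '''
    def __init__(self, name=None):
        self.output_value = None
        self.output_nodes = []
        self.name = name
        self.graph = DEFAULT_GRAPH
        self.graph.placeholders.append(self)


c = Constant(2.0, name='c')
p = Placeholder(name='p')
print(c.compute_output())  # 2.0
print(p.output_value)      # None until a session feeds it a value
```

The placeholder deliberately has no compute_output method in this sketch, since its value comes from the feed_dict passed to the session rather than from the node itself.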

The graph object

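With the nodes defined, we need to put them into a graph that keeps track of them. So we define a Graph class to hold the created nodes, which makes it easy to manage the resources of all nodes in one place.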

class Graph(object):
    ''' Graph containing all computing nodes.
    '''
    def __init__(self):
        ''' Graph constructor.
        '''
        self.operations, self.constants, self.placeholders = [], [], []
        self.variables, self.trainable_variables = [], []

To provide a default graph, a global variable referring to it is created when the simpleflow module is imported:

from .graph import Graph

# Create a default graph.
import builtins
DEFAULT_GRAPH = builtins.DEFAULT_GRAPH = Graph()

To imitate the TensorFlow interface, we add the context manager protocol methods to Graph so that it works as a context manager, along with an as_default method:

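import builtins

class Graph(object):
    #...

    def __enter__(self):
        ''' Reset default graph.
        '''
        # DEFAULT_GRAPH is stored on the builtins module, so rebind it there.
        # A `global` statement would only rebind a name inside this module,
        # and node classes in other modules would never see the change.
        self.old_graph = builtins.DEFAULT_GRAPH
        builtins.DEFAULT_GRAPH = self
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        ''' Recover default graph.
        '''
        builtins.DEFAULT_GRAPH = self.old_graph

    def as_default(self):
        ''' Set this graph as global default graph.
        '''
        return self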

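This way, on entering the with block the old default graph is saved and the current graph is assigned to the global graph object, so nodes created inside the with block are added to the current graph by default. On leaving the with block, the previous graph is restored. Nodes can then be created in a particular graph the TensorFlow way.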
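The save, swap and restore behavior can be seen in a small self-contained sketch. It assumes the default graph is stored on the `builtins` module, as in the import snippet earlier; the reduced `Graph` class and the `make_node` helper are hypothetical and exist only for this demonstration.

```python
import builtins


class Graph(object):
    ''' Reduced stand-in: only tracks operations. '''
    def __init__(self):
        self.operations = []

    def __enter__(self):
        # Save the current default graph, then install this one.
        self.old_graph = builtins.DEFAULT_GRAPH
        builtins.DEFAULT_GRAPH = self
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        # Restore whatever was the default before the with block.
        builtins.DEFAULT_GRAPH = self.old_graph

    def as_default(self):
        return self


def make_node(name):
    ''' Hypothetical helper: register a name in whichever graph is default. '''
    DEFAULT_GRAPH.operations.append(name)


builtins.DEFAULT_GRAPH = outer = Graph()

make_node('a')                      # goes to the outer default graph
with Graph().as_default() as inner:
    make_node('b')                  # goes to the temporary default graph
make_node('c')                      # the outer graph is the default again

print(outer.operations)  # ['a', 'c']
print(inner.operations)  # ['b']
```

`make_node` never names a graph explicitly. The bare name `DEFAULT_GRAPH` is resolved at call time, first in the module's globals and then in `builtins`, which is what lets the with block redirect node creation.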

OK, with the implementation above we can already create a computational graph:

import simpleflow as sf

with sf.Graph().as_default():
    a = sf.constant([1.0, 2.0], name='a')
    b = sf.constant(2.0, name='b')
    c = a * b

Forward Propagation (Feedforward)

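With the computational graph and its nodes implemented, the next step is to evaluate the graph. This section summarizes the implementation of forward propagation on a computational graph.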

Session

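First we need a Session that evaluates an already built computational graph. When the nodes defined earlier are created, they are really just empty nodes with no value to compute with, that is, their output_value is empty. To imitate the TensorFlow interface, the session is also defined as a context manager: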

class Session(object):
    ''' A session to compute a particular graph.
    '''
    def __init__(self):
        ''' Session constructor.
        '''
        # Graph the session computes for.
        self.graph = DEFAULT_GRAPH

    def __enter__(self):
        ''' Context management protocol method called before `with-block`.
        '''
        return self

    def __exit__(self, exc_type, exc_value, exc_tb):
        ''' Context management protocol method called after `with-block`.
        '''
        self.close()

    def close(self):
        ''' Free all output values in nodes.
        '''
        all_nodes = (self.graph.constants + self.graph.variables +
                     self.graph.placeholders + self.graph.operations +
                     self.graph.trainable_variables)
        for node in all_nodes:
            node.output_value = None

    def run(self, operation, feed_dict=None):
        ''' Compute the output of an operation.'''
        # ...

Computing the output value of a node

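We can now build a computational graph in which every node is connected to its neighbors by directed edges. Next we need to work out the value of a node from the relationships between nodes. How? Take the graph of $f(x, y, z) = z(x + y)$ from earlier as the example.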

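To compute the output value of the $\times$ operation node, we need the output values of its two input nodes, which in turn requires the output values of the input nodes of the $+$ node. A post-order traversal yields the output values of all nodes needed to compute a given node. For simplicity, I implemented the post-order traversal recursively: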

def _get_prerequisite(operation):
    ''' Perform a post-order traversal to get a list of nodes to be computed in order.
    '''
    postorder_nodes = []

    # Collect nodes recursively.
    def postorder_traverse(operation):
        if isinstance(operation, Operation):
            for input_node in operation.input_nodes:
                postorder_traverse(input_node)
        postorder_nodes.append(operation)

    postorder_traverse(operation)

    return postorder_nodes

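This function returns the list of all nodes required to compute a node's value. Computing the output value of each node in the list in order then gives the output value of the target node.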
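To make the ordering concrete, the sketch below runs the same kind of recursive post-order traversal on the graph of $f(x, y, z) = z(x + y)$. The `Node` class and the `postorder` function are hypothetical, reduced to a name and a tuple of inputs; they are not SimpleFlow's classes.

```python
class Node(object):
    ''' Hypothetical minimal node: a name plus its input nodes. '''
    def __init__(self, name, *input_nodes):
        self.name = name
        self.input_nodes = input_nodes


def postorder(node):
    ''' Return node names in post-order (inputs before the node itself). '''
    order = []
    for input_node in node.input_nodes:
        order.extend(postorder(input_node))
    order.append(node.name)
    return order


x, y, z = Node('x'), Node('y'), Node('z')
add = Node('+', x, y)
mul = Node('*', add, z)

print(postorder(mul))  # ['x', 'y', '+', 'z', '*']
```

Every node appears after all of its inputs, so evaluating the list from left to right never reads a value that has not been computed yet. Note that a plain recursion like this lists a node once per path that reaches it, so a node shared by two consumers would be visited, and recomputed, twice.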

class Session(object):
    # ...
    def run(self, operation, feed_dict=None):
        ''' Compute the output of an operation.

        :param operation: A specific operation to be computed.
        :type operation: object of `Operation`, `Variable` or `Placeholder`.

        :param feed_dict: A mapping between placeholder and its actual value for the session.
        :type feed_dict: dict.
        '''
        # Get all prerequisite nodes using postorder traversal.
        postorder_nodes = _get_prerequisite(operation)

        for node in postorder_nodes:
            if type(node) is Placeholder:
                node.output_value = feed_dict[node]
            else:  # Operation and variable
                node.compute_output()

        return operation.output_value
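The loop in run assumes that feed_dict holds a value for every placeholder it meets. A slightly more defensive variant could look like the sketch below. This is an illustration under assumptions, not SimpleFlow's code: the reduced `Placeholder` and `Add` classes and the standalone `run` function are hypothetical, and the node list is passed in already ordered instead of being produced by a traversal.

```python
class Placeholder(object):
    ''' Hypothetical reduced placeholder node. '''
    def __init__(self, name=None):
        self.name = name
        self.output_value = None


class Add(object):
    ''' Hypothetical reduced addition node. '''
    def __init__(self, x, y):
        self.input_nodes = (x, y)
        self.output_value = None

    def compute_output(self):
        x, y = self.input_nodes
        self.output_value = x.output_value + y.output_value
        return self.output_value


def run(nodes_in_order, feed_dict=None):
    ''' Evaluate nodes in the given (post-order) sequence. '''
    feed_dict = feed_dict or {}  # tolerate a missing feed_dict
    for node in nodes_in_order:
        if isinstance(node, Placeholder):
            if node not in feed_dict:
                raise ValueError('no value fed for placeholder %r' % node.name)
            node.output_value = feed_dict[node]
        else:
            node.compute_output()
    return nodes_in_order[-1].output_value


a, b = Placeholder('a'), Placeholder('b')
total = Add(a, b)
print(run([a, b, total], feed_dict={a: 1.0, b: 2.0}))  # 3.0
```

Placeholder objects serve directly as dictionary keys, as in TensorFlow's feed_dict, which works because plain Python objects are hashable by identity.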

Example

With the computational graph and forward propagation in place, we can build a graph and compute the value of an expression, as follows:

$$ f = \left[ \begin{matrix} 1 & 2 & 3 \\ 3 & 4 & 5 \\ \end{matrix} \right] \times \left[ \begin{matrix} 9 & 8 \\ 7 & 6 \\ 10 & 11 \\ \end{matrix} \right] + 1 = \left[ \begin{matrix} 54 & 54 \\ 106 & 104 \\ \end{matrix} \right] $$

import simpleflow as sf

# Create a graph
with sf.Graph().as_default():
    w = sf.constant([[1, 2, 3], [3, 4, 5]], name='w')
    x = sf.constant([[9, 8], [7, 6], [10, 11]], name='x')
    b = sf.constant(1.0, 'b')
    result = sf.matmul(w, x) + b

    # Create a session to compute
    with sf.Session() as sess:
        print(sess.run(result))

Output:

array([[  54.,   54.],
       [ 106.,  104.]])
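The arithmetic can be checked independently of SimpleFlow. The sketch below recomputes the same matrix product in plain Python and adds the scalar b = 1.0 used in the code above; the variable names are chosen for this check only.

```python
w = [[1, 2, 3], [3, 4, 5]]
x = [[9, 8], [7, 6], [10, 11]]
b = 1.0

# Row-by-column products, then add the scalar b to every entry.
product = [[sum(w_ik * x_kj for w_ik, x_kj in zip(row, col))
            for col in zip(*x)]
           for row in w]
result = [[entry + b for entry in row] for row in product]

print(product)  # [[53, 53], [105, 103]]
print(result)   # [[54.0, 54.0], [106.0, 104.0]]
```

For instance, the top-left entry is $1 \cdot 9 + 2 \cdot 7 + 3 \cdot 10 = 53$, and adding $b = 1$ gives 54.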

Summary

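This post implemented a computational graph and its forward propagation in Python, and created Session and Graph objects that imitate the TensorFlow interface. The next post will cover how graph nodes compute gradients, as well as the implementation of backpropagation and the gradient descent optimizer.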

Finally, here is the link to the simpleflow project again. Feedback and discussion are welcome: https://github.com/PytLab/sim...