TensorFlow人工智能入门教程之十一最强网络DLSTM 双向长短时间记忆网络（阿里小AI实现）

时间 2019-11-11

标签 tensorflow 人工智能入门教程之十最强网络 dlstm 双向长短时间记忆阿里实现栏目系统网络繁體版

原文原文链接

失眠。。。。上一章讲了最强网络之一 RSNN 深度残差网络这一章节咱们来说讲还有一个很强的网络模型，就是双向LSTM 也就是前一阵阿里吹牛逼的小AI的实现的一个重要网络部分，固然实际上比这还要复杂层数以及多个网络配合，其实就好像 alphaGo 同样，其实多个网络配合多层复用效果是最好的，这就像咱们有大脑第一中枢系统，可是咱们脊髓是第二中枢系统同样，脊髓能够控制咱们身体的某些肌肉关节运动，与大脑相互配合调节，经过神经传输相互传递信息，互相配合调节，大脑为主脊髓为辅。python

最近在学钢琴，那真难。有些东西境界到的人懂的人天然会懂。因此我博客分享一下个人理解，这都是本身自学摸索研究的东西，主要一是但愿能够给本身作个整理，无聊写写东西，其实这些东西对我来讲都是不重要的东西，可是可让你们学习了解下人工智能，人工智能就这么点么，这是基础，前面全部章节所有是基础，基础知识，你所有掌握了这些，你还只是一个门外汉，最主要的是要可以熟练的使用，不管是用来作什么，为所欲为，因地制宜，可以知道怎么运用，这才是最重要的。因此我把这些对我来讲还算很简单的知识吧，这里以及后面,至于方向，我将的东西也许有些是本身的理解，可是绝对不会影响你们的使用，本人去年一年创业就是使用tensorflow ，而后把它在spark上整合实现了，从新改写了bp反馈层 ff前向层同时改写了部分代码、实现了0.6时候的tensorflow的与spark 并行训练，因此对人工智能方面也许没有很好的数学基础，可是对代码对理解方面仍是算能够的吧。创业项目基本就是人工智能的运用以及使用。git

双向LSTM 阿里的小AI 就是使用它，我估计是使用了双向LSTM 以后接着一个RNN层并加强学习。可是小AI 里面最重要的仍是这个双向LSTM，结合RNN 结合其余的几种网络还有加强学习 . 网络

LSTM 是为了解决 RNN的一些问题，对隐藏层进行改进，让前面上下文信息可以有更远的印象，也就是记忆，架构

LSTM网络本质仍是RNN网络，基于LSTM的RNN架构上的变化有最早的BRNN（双向）dom

LSTM引入了Cell 与其说LSTM是一种RNN结构，倒不如说LSTM是RNN的一个魔改组件，把上面看到的网络中的小圆圈换成LSTM的block，就是所谓的LSTM了。那它的block长什么样子呢？ oop

Cell，就是咱们的小本子，有个叫作state的参数东西来记事儿的学习
Input Gate，Output Gate，在参数输入输出的时候起点做用，算一算东西测试
Forget Gate：遗忘门就像人体的遗忘曲线同样，正是由于遗忘的调节才能知道那些更重要，由于原始的LSTM在这个位置就是一个值1，是链接到下一时间的那个参数，之前的事情记太牢了，最近的就不住就很差了，因此要选择性遗忘一些东西。经过遗忘进行调节，这样知道那些更重要。那些值得记忆。ui

上上一章咱们讲了RNN/LSTM 的使用，因此那些操做不理解的能够到上上一章去看。人工智能

这里讲一下双向LSTM

LSTM网络本质仍是RNN网络，基于LSTM的RNN架构上的变化有最早的BRNN（双向）

在大多数应用里面 NLP 自动问答基于时间有关的上下文有关的，通常都是双向LSTM+LSTM/RNN横向扩展来实现的，效果很是好。好像国内不少吹逼的都是这样的机构实现的，虽然叫的名字不一样可是实际上是一个东西。

双向LSTM 顾名思义采用了可以双向的LSTM cell单元。是的每次可以访问下文也能访问下文

下面看看BIRNN的结构

而 LSTM 咱们上面讲了其实就是RNN 把其中的组件部位换了加上了cell 也就是记忆单元。因此双向LSTM

就是把上面双向RNN 里面h 那些园的单元所有换成LSTM单元就是双向LSTM. 阿里的小AI 就是使用它，我估计是使用了双向LSTM 以后接着一个RNN层吧。可是小AI 里面最重要的仍是这个双向LSTM，结合RNN 结合其余的几种网络还有加强学习 .

双向LSTM 在tensorflow中与上上篇文章不一样的地方就是

咱们直接使用rnn.rnn 来构建RNN 而后传入的LSTMcell（单元），这里双向是

rnn.bidirectional_rnn

其余基本与上上章基本相同，替换一下稍微修改下便可，不理解的能够跳回去看看上上章 LSTM/RNN的内容

下面贴出示例代码

import input_data
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)

import tensorflow as tf
from tensorflow.python.ops.constant_op import constant
from tensorflow.models.rnn import rnn, rnn_cell
import numpy as np

# Parameters
learning_rate = 0.001
training_iters = 100000
batch_size = 128
display_step = 10

# Network Parameters
n_input = 28 # MNIST data input (img shape: 28*28)
n_steps = 28 # timesteps
n_hidden = 128 # hidden layer num of features
n_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
x = tf.placeholder("float", [None, n_steps, n_input])
# Tensorflow LSTM cell requires 2x n_hidden length (state & cell)
istate_fw = tf.placeholder("float", [None, 2*n_hidden])
istate_bw = tf.placeholder("float", [None, 2*n_hidden])
y = tf.placeholder("float", [None, n_classes])

# Define weights
weights = {
    # Hidden layer weights => 2*n_hidden because of foward + backward cells
    'hidden': tf.Variable(tf.random_normal([n_input, 2*n_hidden])),
    'out': tf.Variable(tf.random_normal([2*n_hidden, n_classes]))
}
biases = {
    'hidden': tf.Variable(tf.random_normal([2*n_hidden])),
    'out': tf.Variable(tf.random_normal([n_classes]))
}

def BiRNN(_X, _istate_fw, _istate_bw, _weights, _biases, _batch_size, _seq_len):

    # BiRNN requires to supply sequence_length as [batch_size, int64]
    # Note: Tensorflow 0.6.0 requires BiRNN sequence_length parameter to be set
    # For a better implementation with latest version of tensorflow, check below
    _seq_len = tf.fill([_batch_size], constant(_seq_len, dtype=tf.int64))

    # input shape: (batch_size, n_steps, n_input)
    _X = tf.transpose(_X, [1, 0, 2])  # permute n_steps and batch_size
    # Reshape to prepare input to hidden activation
    _X = tf.reshape(_X, [-1, n_input]) # (n_steps*batch_size, n_input)
    # Linear activation
    _X = tf.matmul(_X, _weights['hidden']) + _biases['hidden']

    # Define lstm cells with tensorflow
    # Forward direction cell
    lstm_fw_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Backward direction cell
    lstm_bw_cell = rnn_cell.BasicLSTMCell(n_hidden, forget_bias=1.0)
    # Split data because rnn cell needs a list of inputs for the RNN inner loop
    _X = tf.split(0, n_steps, _X) # n_steps * (batch_size, n_hidden)

    # Get lstm cell output
    outputs = rnn.bidirectional_rnn(lstm_fw_cell, lstm_bw_cell, _X,
                                            initial_state_fw=_istate_fw,
                                            initial_state_bw=_istate_bw,
                                            sequence_length=_seq_len)

    # Linear activation
    # Get inner loop last output
    return tf.matmul(outputs[-1], _weights['out']) + _biases['out']

pred = BiRNN(x, istate_fw, istate_bw, weights, biases, batch_size, n_steps)


# Define loss and optimizer
cost = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(pred, y)) # Softmax loss
optimizer = tf.train.AdamOptimizer(learning_rate=learning_rate).minimize(cost) # Adam Optimizer

# Evaluate model
correct_pred = tf.equal(tf.argmax(pred,1), tf.argmax(y,1))
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initializing the variables
init = tf.initialize_all_variables()

# Launch the graph
with tf.Session() as sess:
    sess.run(init)
    step = 1
    # Keep training until reach max iterations
    while step * batch_size < training_iters:
        batch_xs, batch_ys = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        batch_xs = batch_xs.reshape((batch_size, n_steps, n_input))
        # Fit training using batch data
        sess.run(optimizer, feed_dict={x: batch_xs, y: batch_ys,
                                       istate_fw: np.zeros((batch_size, 2*n_hidden)),
                                       istate_bw: np.zeros((batch_size, 2*n_hidden))})
        if step % display_step == 0:
            # Calculate batch accuracy
            acc = sess.run(accuracy, feed_dict={x: batch_xs, y: batch_ys,
                                                istate_fw: np.zeros((batch_size, 2*n_hidden)),
                                                istate_bw: np.zeros((batch_size, 2*n_hidden))})
            # Calculate batch loss
            loss = sess.run(cost, feed_dict={x: batch_xs, y: batch_ys,
                                             istate_fw: np.zeros((batch_size, 2*n_hidden)),
                                             istate_bw: np.zeros((batch_size, 2*n_hidden))})
            print "Iter " + str(step*batch_size) + ", Minibatch Loss= " + "{:.6f}".format(loss) + \
                  ", Training Accuracy= " + "{:.5f}".format(acc)
        step += 1
    print "Optimization Finished!"
    # Calculate accuracy for 128 mnist test images
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, n_steps, n_input))
    test_label = mnist.test.labels[:test_len]
    print "Testing Accuracy:", sess.run(accuracy, feed_dict={x: test_data, y: test_label,
                                                             istate_fw: np.zeros((test_len, 2*n_hidden)),
                                                             istate_bw: np.zeros((test_len, 2*n_hidden))})

下面贴出运行测试截图。

TensorFlow人工智能入门教程之十一 最强网络DLSTM 双向长短时间记忆网络（阿里小AI实现）

TensorFlow人工智能入门教程之十一最强网络DLSTM 双向长短时间记忆网络（阿里小AI实现）