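As mentioned in the word2vec section, in a lot of raw data the items that come earlier and later are related to each other. Breaking that ordering throws away key signals, which lowers accuracy in the training and prediction steps that follow.
Besides the natural language processing (NLP) field mentioned before, typical cases include radar scans in autonomous driving at one moment and the next, the temporal structure of a melody, and a stock's data from one day to the next.
So, starting from a traditional neural network, if every node records the result of its previous computation and mixes it into the computation the next time data arrives, the output reflects the influence of the previous data on the current one.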
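As shown in the figure above, each block in the figure corresponds to a node in the neural network, and t-1, t and t+1 refer to that node's activity at successive points in the time sequence; you can think of them as the n-th batch of data.
So the 3 nodes in the figure are, in the implementation, actually one and the same node.
That is, when batch n-1 arrives, the node computes, produces its output, and also keeps a state.
When the next batch arrives, the state is combined with the newly arrived data in the computation, the node produces an output again and keeps a state again for the batch after that, and so on in a loop. This is where the name recurrent neural network comes from.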
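The loop described above can be sketched in a few lines of plain Python. This is only an illustration for a single scalar node with a tanh activation; the function names and the weights `w_x`, `w_h` and `b` are made up for the example and do not come from any library.

```python
import math

def rnn_step(x, state, w_x, w_h, b):
    """One recurrence step: mix the new input with the state kept from last time."""
    return math.tanh(w_x * x + w_h * state + b)

def run_rnn(inputs, w_x=0.5, w_h=0.8, b=0.0):
    """Feed the inputs one time step at a time, carrying the state forward."""
    state = 0.0          # nothing has been seen yet
    states = []
    for x in inputs:     # the same node (same weights) is reused at every step
        state = rnn_step(x, state, w_x, w_h, b)
        states.append(state)
    return states

# Only the first input is non-zero, yet the later states are still non-zero:
# the first input keeps influencing the node through the stored state.
states = run_rnn([1.0, 0.0, 0.0])
print(states)
```

The three entries of `states` correspond to t-1, t and t+1 in the figure: one node, called three times.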
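The RNN algorithm has one problem: the state a node saved at a given moment has less and less influence as time goes on, gradually decaying to nothing. For applications that depend on long-range context, this still falls short.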
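A rough way to see this decay is to compare two runs of a toy scalar node that differ only in their first input and then watch how far apart the states still are some steps later. The weights here (`w_x=1.0`, `w_h=0.5`) are arbitrary values chosen for the sketch, not taken from a trained network.

```python
import math

def final_state(first_input, steps, w_x=1.0, w_h=0.5):
    """State after seeing `first_input` followed by `steps` zero inputs."""
    state = math.tanh(w_x * first_input)
    for _ in range(steps):
        state = math.tanh(w_h * state)   # every later input is zero
    return state

def influence(steps):
    """How much the first input still changes the state `steps` steps later."""
    return abs(final_state(1.0, steps) - final_state(0.0, steps))

gaps = [influence(t) for t in (0, 5, 10, 20)]
for t, gap in zip((0, 5, 10, 20), gaps):
    print(t, gap)
```

With these weights the gap shrinks by at least half at every step, so after a few dozen steps the first input is effectively gone.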
This is what led to the LSTM algorithm.
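As shown in the figure: what mainly sets LSTM apart from a plain RNN is that it adds a "processor" to the algorithm that judges whether information is useful or not. The structure this processor acts in is called a cell.
Three "gates" are placed inside a cell: the input gate, the forget gate and the output gate. When a piece of information enters the LSTM network, the rules decide whether it is useful. Only information that passes the algorithm's check is kept; the rest is discarded through the forget gate.
With this simple change to the node structure, the network copes much better with data that has long-range temporal dependencies.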
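To make the three gates concrete, here is a minimal scalar LSTM step in plain Python, following the standard LSTM equations. It is a sketch only: the dictionary layout of the parameters and the names `keep_params` and `drop_params` are assumptions made for this example, and a real cell works on vectors with learned weights.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. p maps a gate name to (w_x, w_h, b)."""
    def gate(name, activation):
        w_x, w_h, b = p[name]
        return activation(w_x * x + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old cell state to keep
    i = gate("input", sigmoid)         # how much new information to let in
    o = gate("output", sigmoid)        # how much of the cell state to expose
    g = gate("candidate", math.tanh)   # the new information itself
    c = f * c_prev + i * g             # updated cell state
    h = o * math.tanh(c)               # updated hidden state / output
    return h, c

def cell_state_after(p, steps, c0=1.0):
    """Run `steps` zero inputs and return what is left of the initial cell state."""
    h, c = 0.0, c0
    for _ in range(steps):
        h, c = lstm_step(0.0, h, c, p)
    return c

# Hand-picked (not learned) parameters: forget gate wide open vs. shut.
keep_params = {"forget": (0.0, 0.0, 10.0), "input": (0.0, 0.0, -10.0),
               "output": (0.0, 0.0, 0.0), "candidate": (1.0, 0.0, 0.0)}
drop_params = dict(keep_params, forget=(0.0, 0.0, -10.0))

remembered = cell_state_after(keep_params, 50)
forgotten = cell_state_after(drop_params, 50)
print(remembered, forgotten)
```

When the forget gate stays open, the cell state survives 50 steps almost untouched, which is exactly the long-range memory a plain RNN node lacks.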
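Quite a few LSTM variants appeared afterwards, further strengthening its capabilities or improving efficiency. One well-known example is the GRU (Gated Recurrent Unit), proposed in 2014. Without hurting the quality of the results, the GRU uses one gate fewer: it has only a reset gate and an update gate, and it merges the cell state and the hidden state into one. This makes the algorithm easier to implement, gives it a clearer structure, and also improves computational efficiency somewhat.
In current applications, LSTM or GRU is what is mostly used. The plain RNN network is rarely used directly any more.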
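For comparison with the LSTM, a scalar GRU step can be sketched like this. There is a single state `h` and two gates. Note that papers and libraries differ on whether the update gate multiplies the old state or the candidate; this sketch assumes the convention where `z` close to 1 keeps the old state. The parameter layout and names are invented for the example.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gru_step(x, h_prev, p):
    """One step of a scalar GRU. p maps a name to (w_x, w_h, b)."""
    def pre(name, h):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h + b

    r = sigmoid(pre("reset", h_prev))                  # reset gate
    z = sigmoid(pre("update", h_prev))                 # update gate
    h_cand = math.tanh(pre("candidate", r * h_prev))   # candidate state
    # Assumed convention: z near 1 keeps the old state, z near 0 replaces it.
    return z * h_prev + (1.0 - z) * h_cand

# Hand-picked parameters: update gate saturated open vs. saturated shut.
hold = {"reset": (0.0, 0.0, 0.0), "update": (0.0, 0.0, 20.0),
        "candidate": (1.0, 1.0, 0.0)}
overwrite = dict(hold, update=(0.0, 0.0, -20.0))

kept = gru_step(1.0, 0.7, hold)
replaced = gru_step(1.0, 0.7, overwrite)
print(kept, replaced)
```

There is no separate cell state to carry around here, which is the merge of cell state and hidden state described above.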
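The official RNN tutorial implements an NLP application, which technically fits the typical characteristics of an RNN very well. The program logic is rather complicated, though, and the results are not very intuitive.
To get to the essence of RNN networks as quickly as possible, this example keeps the MNIST program we used before and swaps the recognition model for an RNN-LSTM network. This should help everyone get started with RNN-LSTM more quickly.
The source code in this example comes from aymericdamien's GitHub repository, lightly modified to bring it closer to our earlier sample code. Thanks to the original author.
The official tutorial is best studied after reading this section, and it is well worth digging into.
Source code: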
#!/usr/bin/env python
# -*- coding=UTF-8 -*-

""" Recurrent Neural Network.
A Recurrent Neural Network (LSTM) implementation example using TensorFlow library.
This example is using the MNIST database of handwritten digits (http://yann.lecun.com/exdb/mnist/)
Links:
    [Long Short Term Memory](http://deeplearning.cs.cmu.edu/pdfs/Hochreiter97_lstm.pdf)
    [MNIST Dataset](http://yann.lecun.com/exdb/mnist/).
Author: Aymeric Damien
Project: https://github.com/aymericdamien/TensorFlow-Examples/
"""

from __future__ import print_function

import tensorflow as tf
from tensorflow.contrib import rnn

# Import MNIST data
from tensorflow.examples.tutorials.mnist import input_data
# Point at the data downloaded earlier to save download time.
# Change the path below to wherever your own data lives.
mnist = input_data.read_data_sets("../mnist/data", one_hot=True)

'''
To classify images using a recurrent neural network, we consider every image
row as a sequence of pixels. Because MNIST image shape is 28*28px, we will then
handle 28 sequences of 28 steps for every sample.
'''

# Training Parameters
# Learning rate
learning_rate = 0.001
# Total number of training steps
training_steps = 10000
# Batch size
batch_size = 128
# Show training progress every 200 steps
display_step = 200

# Network Parameters
# The two values below describe the 28x28 image: each step feeds 28 values into the RNN,
# for 28 steps (time steps) in total, so the network looks for patterns between
# neighbouring rows in one direction.
# This is of course not as effective as a CNN, but the point here is to show an RNN in use.
num_input = 28 # MNIST data input (img shape: 28*28)
timesteps = 28 # timesteps
# LSTM parameter: number of hidden units
num_hidden = 128 # hidden layer num of features
# 10 output classes, the digits 0-9
num_classes = 10 # MNIST total classes (0-9 digits)

# tf Graph input
# Training data input, same data as in the earlier MNIST examples
X = tf.placeholder("float", [None, timesteps, num_input])
Y = tf.placeholder("float", [None, num_classes])

# Define weights
# Weights and biases of the output layer
weights = tf.Variable(tf.random_normal([num_hidden, num_classes]))
biases = tf.Variable(tf.random_normal([num_classes]))

def RNN(x, weights, biases):

    # Prepare data shape to match `rnn` function requirements
    # Current data input shape: (batch_size, timesteps, n_input)
    # Required shape: 'timesteps' tensors list of shape (batch_size, n_input)

    # Unstack to get a list of 'timesteps' tensors of shape (batch_size, n_input)
    # The incoming X has shape [128 (batch), 28 (timesteps), 28 (inputs)].
    # The call below turns it into a list of 28 tensors of shape [128, 28],
    # which is comparable to a [28, 128, 28] tensor.
    x = tf.unstack(x, timesteps, 1)

    # Define a lstm cell with tensorflow
    # An LSTM cell with 128 units; this value can be tuned
    lstm_cell = rnn.BasicLSTMCell(num_hidden, forget_bias=1.0)

    # Get lstm cell output
    # Run the cell over x to get the outputs and the final state
    outputs, states = rnn.static_rnn(lstm_cell, x, dtype=tf.float32)

    # Linear activation, using rnn inner loop last output
    # The familiar weights-and-biases formula, applied to the last time step's output
    # (it sits where the ReLU layer was in the earlier examples)
    return tf.matmul(outputs[-1], weights) + biases

# Build the model with the RNN network
logits = RNN(X, weights, biases)
# Prediction
prediction = tf.nn.softmax(logits)

# Define loss and optimizer
# Loss function, optimizer and training op, basically the same as before
loss_op = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(
    logits=logits, labels=Y))
optimizer = tf.train.GradientDescentOptimizer(learning_rate=learning_rate)
train_op = optimizer.minimize(loss_op)

# Evaluate model (with test logits, for dropout to be disabled)
# A prediction is correct when it matches the sample label
correct_pred = tf.equal(tf.argmax(prediction, 1), tf.argmax(Y, 1))
# Convert to an accuracy value
accuracy = tf.reduce_mean(tf.cast(correct_pred, tf.float32))

# Initialize the variables (i.e. assign their default value)
init = tf.global_variables_initializer()

# Start training
with tf.Session() as sess:

    # Run the initializer
    sess.run(init)

    for step in range(1, training_steps+1):
        batch_x, batch_y = mnist.train.next_batch(batch_size)
        # Reshape data to get 28 seq of 28 elements
        # First reshape the data from [128, 784] to [128, 28, 28];
        # this differs from the earlier linear regression example
        batch_x = batch_x.reshape((batch_size, timesteps, num_input))
        # Run optimization op (backprop)
        # Train batch by batch
        sess.run(train_op, feed_dict={X: batch_x, Y: batch_y})
        if step % display_step == 0 or step == 1:
            # Calculate batch loss and accuracy
            # Every 200 steps, show the current loss and accuracy
            loss, acc = sess.run([loss_op, accuracy], feed_dict={X: batch_x,
                                                                 Y: batch_y})
            print("Step " + str(step) + ", Minibatch Loss= " + \
                  "{:.4f}".format(loss) + ", Training Accuracy= " + \
                  "{:.3f}".format(acc))

    print("Optimization Finished!")

    # Calculate accuracy for 128 mnist test images
    # Training is done; predict on the test set
    test_len = 128
    test_data = mnist.test.images[:test_len].reshape((-1, timesteps, num_input))
    test_label = mnist.test.labels[:test_len]
    print("Testing Accuracy:", \
        sess.run(accuracy, feed_dict={X: test_data, Y: test_label}))
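The data reshaping in the listing is the part that tends to confuse people, so here is a small plain-Python sketch of it using nested lists instead of tensors. `to_sequence` and `unstack_time` are names invented for this illustration; the second one only imitates what `tf.unstack(x, timesteps, 1)` does to the shape, it is not how TensorFlow implements it.

```python
def to_sequence(flat_pixels, timesteps=28, num_input=28):
    """Cut a flat 784-pixel image into 28 rows; each row is one time step."""
    assert len(flat_pixels) == timesteps * num_input
    return [flat_pixels[t * num_input:(t + 1) * num_input]
            for t in range(timesteps)]

def unstack_time(batch):
    """batch[b][t][i] -> steps[t][b][i]: one list entry per time step."""
    return [list(step) for step in zip(*batch)]

# Two fake "images" whose pixel values are simply their positions.
images = [list(range(784)), list(range(784, 1568))]
batch = [to_sequence(img) for img in images]   # shape [2, 28, 28]
steps = unstack_time(batch)                    # 28 entries of shape [2, 28]

print(len(steps), len(steps[0]), len(steps[0][0]))
```

Each entry of `steps` holds row t of every image in the batch, which is what the LSTM cell consumes at time step t.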
Compared with the original MNIST code, this source has the following changes:
Output:
Step 9000, Minibatch Loss= 0.4518, Training Accuracy= 0.859
Step 9200, Minibatch Loss= 0.4717, Training Accuracy= 0.852
Step 9400, Minibatch Loss= 0.5074, Training Accuracy= 0.859
Step 9600, Minibatch Loss= 0.4006, Training Accuracy= 0.883
Step 9800, Minibatch Loss= 0.3571, Training Accuracy= 0.875
Step 10000, Minibatch Loss= 0.3069, Training Accuracy= 0.906
Optimization Finished!
Testing Accuracy: 0.8828125
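The resulting accuracy is not very high, because an RNN is not a good algorithm for image recognition; this is only a demonstration of a basic RNN-LSTM model.
The example above was an introduction to RNN/LSTM. In practice RNN/LSTM is not well suited to image recognition, and a typical LSTM use case would be NLP. So let's look at one more example, this time from that area.
In this section we use a database of Tang poems to train an RNN/LSTM network, and then use the trained network to write poems automatically.
The source code comes from the internet, written by 斗大的熊猫; many thanks to the author.
The source has been modified to fit a python2.x + TensorFlow 1.4.1 environment, and, to make it easier to read, the training part and the generation part have been merged into one file. I also recommend reading the related articles on the original author's blog, which are very rewarding; direct links are given in the references.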
Source code walkthrough:
The rest should be understandable from the comments and from what we have covered before:
#!/usr/bin/env python
# -*- coding=UTF-8 -*-

# source from:
#   http://blog.topspeedsnail.com/archives/10542
# poetry.txt from:
#   https://pan.baidu.com/s/1o7QlUhO
# revised: andrew
#   https://formoon.github.io
# add python 2.x support and tf 1.4.1 support
#------------------------------------------------------------------#
from __future__ import print_function

import collections
import numpy as np
import tensorflow as tf
import argparse
import codecs
import os,time
import sys
reload(sys)
sys.setdefaultencoding('utf-8')

#-------------------------------Data preprocessing---------------------------#

poetry_file ='poetry.txt'

# Poem collection
poetrys = []
def readPoetry():
    global poetrys
    #with open(poetry_file, "r", encoding='utf-8',) as f:
    with codecs.open(poetry_file, "r","utf-8") as f:
        for line in f:
            try:
                content = line.strip().split(':')[1]
                #title, content = line.strip().split(':')
                content = content.replace(' ','')
                # Skip poems that contain annotations or other special marks
                if '_' in content or '(' in content or '(' in content or '《' in content or '[' in content:
                    continue
                # Skip poems that are too short or too long
                if len(content) < 5 or len(content) > 79:
                    continue
                # '[' and ']' mark the start and end of a poem
                content = '[' + content + ']'
                poetrys.append(content)
            except Exception as e:
                pass

    # Sort the poems by length
    poetrys = sorted(poetrys,key=lambda line: len(line))
    #for item in poetrys:
    #   print(item)

# Count how often each character occurs
readPoetry()
all_words = []
for poetry in poetrys:
    all_words += [word for word in poetry]
counter = collections.Counter(all_words)
count_pairs = sorted(counter.items(), key=lambda x: -x[1])
words, _ = zip(*count_pairs)
#print words

# Keep the most common characters (here: all of them) and append a space for padding
words = words[:len(words)] + (' ',)
# Map every character to a numeric ID
word_num_map = dict(zip(words, range(len(words))))
#print(word_num_map)
# Convert the poems to vectors of IDs, see word2vec
to_num = lambda word: word_num_map.get(word, len(words))
poetrys_vector = [ list(map(to_num, poetry)) for poetry in poetrys]
#[[314, 3199, 367, 1556, 26, 179, 680, 0, 3199, 41, 506, 40, 151, 4, 98, 1],
#[339, 3, 133, 31, 302, 653, 512, 0, 37, 148, 294, 25, 54, 833, 3, 1, 965, 1315, 377, 1700, 562, 21, 37, 0, 2, 1253, 21, 36, 264, 877, 809, 1]
#....]

# Train on 64 poems at a time
batch_size = 64
n_chunk = len(poetrys_vector) // batch_size
x_batches = []
y_batches = []
def genTrainData(b):
    global batch_size,n_chunk,x_batches,y_batches,poetrys_vector
    batch_size=b
    for i in range(n_chunk):
        start_index = i * batch_size
        end_index = start_index + batch_size

        batches = poetrys_vector[start_index:end_index]
        length = max(map(len,batches))
        # Pad every poem in the batch to the same length with the space ID
        xdata = np.full((batch_size,length), word_num_map[' '], np.int32)
        for row in range(batch_size):
            xdata[row,:len(batches[row])] = batches[row]
        # The target is the input shifted left by one character
        ydata = np.copy(xdata)
        ydata[:,:-1] = xdata[:,1:]
        """
        xdata             ydata
        [6,2,4,6,9]       [2,4,6,9,9]
        [1,4,2,8,5]       [4,2,8,5,5]
        """
        x_batches.append(xdata)
        y_batches.append(ydata)

#---------------------------------------RNN--------------------------------------#
# Define the RNN
def neural_network(input_data, model='lstm', rnn_size=128, num_layers=2):
    if model == 'rnn':
        cell_fun = tf.nn.rnn_cell.BasicRNNCell
    elif model == 'gru':
        cell_fun = tf.nn.rnn_cell.GRUCell
    elif model == 'lstm':
        cell_fun = tf.nn.rnn_cell.BasicLSTMCell

    # Only BasicLSTMCell accepts the state_is_tuple argument
    if model == 'lstm':
        cell = cell_fun(rnn_size, state_is_tuple=True)
    else:
        cell = cell_fun(rnn_size)
    cell = tf.nn.rnn_cell.MultiRNNCell([cell] * num_layers, state_is_tuple=True)

    initial_state = cell.zero_state(batch_size, tf.float32)

    with tf.variable_scope('rnnlm'):
        softmax_w = tf.get_variable("softmax_w", [rnn_size, len(words)+1])
        softmax_b = tf.get_variable("softmax_b", [len(words)+1])
        with tf.device("/cpu:0"):
            embedding = tf.get_variable("embedding", [len(words)+1, rnn_size])
            inputs = tf.nn.embedding_lookup(embedding, input_data)

    outputs, last_state = tf.nn.dynamic_rnn(cell, inputs, initial_state=initial_state, scope='rnnlm')
    output = tf.reshape(outputs,[-1, rnn_size])

    logits = tf.matmul(output, softmax_w) + softmax_b
    probs = tf.nn.softmax(logits)
    return logits, last_state, probs, cell, initial_state

# Training
def train_neural_network():
    global datafile
    input_data = tf.placeholder(tf.int32, [64, None])
    output_targets = tf.placeholder(tf.int32, [64, None])
    logits, last_state, _, _, _ = neural_network(input_data)
    targets = tf.reshape(output_targets, [-1])
    #loss = tf.nn.seq2seq.sequence_loss_by_example([logits], [targets], [tf.ones_like(targets, dtype=tf.float32)], len(words))
    loss = tf.contrib.legacy_seq2seq.sequence_loss_by_example([logits], [targets], [tf.ones_like(targets, dtype=tf.float32)], len(words))
    cost = tf.reduce_mean(loss)
    learning_rate = tf.Variable(0.0, trainable=False)
    tvars = tf.trainable_variables()
    # Clip the gradients to keep training stable
    grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5)
    optimizer = tf.train.AdamOptimizer(learning_rate)
    train_op = optimizer.apply_gradients(zip(grads, tvars))

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        #saver = tf.train.Saver(tf.all_variables())
        saver = tf.train.Saver()

        for epoch in range(50):
            # Decay the learning rate a little every epoch
            sess.run(tf.assign(learning_rate, 0.002 * (0.97 ** epoch)))
            n = 0
            for batche in range(n_chunk):
                train_loss, _ , _ = sess.run([cost, last_state, train_op], feed_dict={input_data: x_batches[n], output_targets: y_batches[n]})
                n += 1
                print(epoch, batche, train_loss)
            if epoch % 7 == 0:
                # The saved file name carries the epoch number
                saver.save(sess, datafile, global_step=epoch)

#-------------------------------Generate a poem---------------------------------#
# Use the trained model
def gen_poetry():
    global datafile
    input_data = tf.placeholder(tf.int32, [1, None])
    output_targets = tf.placeholder(tf.int32, [1, None])

    def to_word(weights):
        # Sample a character according to the predicted probabilities
        t = np.cumsum(weights)
        s = np.sum(weights)
        sample = int(np.searchsorted(t, np.random.rand(1)*s))
        return words[sample]

    _, last_state, probs, cell, initial_state = neural_network(input_data)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())

        saver = tf.train.Saver()
        # Load the checkpoint saved at the last epoch
        saver.restore(sess, datafile+"-49")

        state_ = sess.run(cell.zero_state(1, tf.float32))

        # Start from the start-of-poem marker
        x = np.array([list(map(word_num_map.get, '['))])
        [probs_, state_] = sess.run([probs, last_state], feed_dict={input_data: x, initial_state: state_})
        word = to_word(probs_)
        #word = words[np.argmax(probs_)]
        poem = ''
        # Keep generating until the end-of-poem marker appears
        while word != ']':
            poem += word
            if word == ',' or word=='。':
                poem += '\n'
            x = np.zeros((1,1))
            x[0,0] = word_num_map[word]
            [probs_, state_] = sess.run([probs, last_state], feed_dict={input_data: x, initial_state: state_})
            word = to_word(probs_)
            #word = words[np.argmax(probs_)]
        return poem

#-------------------------------Generate an acrostic poem---------------------------------#
def gen_poetry_with_head(head,phase):
    global datafile
    input_data = tf.placeholder(tf.int32, [1, None])
    output_targets = tf.placeholder(tf.int32, [1, None])

    def to_word(weights):
        t = np.cumsum(weights)
        s = np.sum(weights)
        sample = int(np.searchsorted(t, np.random.rand(1)*s))
        return words[sample]

    _, last_state, probs, cell, initial_state = neural_network(input_data)

    with tf.Session() as sess:
        # sess.run(tf.initialize_all_variables())
        sess.run(tf.global_variables_initializer())

        saver = tf.train.Saver()
        saver.restore(sess, datafile+"-49")

        state_ = sess.run(cell.zero_state(1, tf.float32))
        poem = ''
        i = 0
        p = 0
        head=unicode(head,"utf-8");
        # Each character of head starts one line of `phase` characters
        for word in head:
            while True:
                if word != ',' and word != '。' and word != ']':
                    poem += word
                    p += 1
                    if p == phase:
                        p = 0
                        break
                else:
                    word='['
                x = np.array([list(map(word_num_map.get, word))])
                [probs_, state_] = sess.run([probs, last_state], feed_dict={input_data: x, initial_state: state_})
                word = to_word(probs_)
            # Alternate between a comma and a full stop at the end of each line
            if i % 2 == 0:
                poem += ',\n'
            else:
                poem += '。\n'
            i += 1
        return poem

FLAGS = None
datafile='./data/module-49'

def datafile_exist():
    return os.path.exists(datafile+"-49.index")

def main(_):
    # if FLAGS.train or (not datafile_exist()):
    if FLAGS.train:
        genTrainData(64)
        print("poems: ",len(poetrys))
        train_neural_network()
        exit()
    if datafile_exist():
        genTrainData(1)
        if FLAGS.generate:
            print(gen_poetry())
        else:
            print(gen_poetry_with_head(FLAGS.head,7))

if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('-a','--head', type=str, default='大寒将至',
                      help='poetry with appointed head char')
    parser.add_argument('-t','--train', action='store_true',default=False,
                      help='Force do train')
    parser.add_argument('-g','--generate', action='store_true',default=False,
                      help='Generate a poem freely')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
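Two small tricks in the listing are worth isolating: the training target is simply the input shifted left by one character, and the next character is drawn at random in proportion to the predicted probabilities rather than always taking the most likely one. Below is a plain-Python sketch of both ideas (Python 3, no NumPy). `make_targets` and `sample_index` are hypothetical names for this illustration, and the sampler is an analogue of the listing's `to_word`, not a line-for-line copy.

```python
import bisect
import itertools
import random

def make_targets(x_row):
    """Shift the row left by one; the last position keeps its own value,
    mirroring ydata[:, :-1] = xdata[:, 1:] on a copy of xdata."""
    return x_row[1:] + x_row[-1:]

def sample_index(weights, rng=random):
    """Pick an index with probability proportional to its weight."""
    cumulative = list(itertools.accumulate(weights))
    threshold = rng.random() * cumulative[-1]
    index = bisect.bisect_right(cumulative, threshold)
    return min(index, len(weights) - 1)   # guard against float rounding

print(make_targets([6, 2, 4, 6, 9]))

# With weights 0.1 and 0.9, index 1 should be drawn far more often.
rng = random.Random(0)
counts = [0, 0]
for _ in range(1000):
    counts[sample_index([0.1, 0.9], rng)] += 1
print(counts)
```

Sampling instead of taking the arg-max is why the program writes a different poem on every run; the commented-out `np.argmax` lines in the listing show the deterministic alternative.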
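Usage:
The -a option specifies the characters that start each line of the acrostic poem;
The -g option generates a poem freely;
The -t option forces training to start. (Note that training takes quite a long time.)
Here is what the generated output looks like: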
> ./poetry.py -g
沉眉默去迎风雪,
江上才风著故人。
手把柯子不看泪,
笑逢太守也怜君。
秋风不定红钿啭,
茶雪欹眠愁断人。
语苦微成求不死,
醉看花发渐盈衣。

# Acrostic poem
> ./poetry.py -a "春节快乐"
春奔桃芳水路犹,
节似鸟飞酒绿出。
快龟缕日发春时,
乐见来还日只相。
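At least it now looks like classical poetry.
(To be continued...)
TensorFlow Exercise 3: RNN, Recurrent Neural Networks
TensorFlow Exercise 7: Generating Classical Chinese Poetry with an RNN
How do you build an RNN with TensorFlow? Here is a minimal tutorial
(Translation) Understanding LSTM Networks (by colah)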