
RNNs are a popular model for time-series data and have proven very useful in NLP, time-series forecasting, and related areas. Since this post focuses on the practical side of RNNs rather than the theory, readers who want the background are encouraged to study RNN theory systematically on their own.
The example below is implemented with TensorFlow. Building an RNN or LSTM in TensorFlow is convenient: you create the RNN or LSTM cell and then build the network around it. However, unlike an ordinary feed-forward network, an RNN or LSTM processes time series, so the data fed in during batch training has to follow a different format.
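For orientation, here is a minimal sketch (TF 1.x API, the same API used in the rest of this post) of what "create a cell, then build the network" looks like; the placeholder shape [None, 4, 1] anticipates the window size used later and is only illustrative:

import tensorflow as tf

HIDDEN_SIZE = 100                                   # units inside one recurrent cell
inputs = tf.placeholder(tf.float32, [None, 4, 1])   # [batch_size, time steps, features]

# A vanilla RNN cell; tf.nn.rnn_cell.BasicLSTMCell(HIDDEN_SIZE) would be a drop-in swap.
cell = tf.nn.rnn_cell.BasicRNNCell(HIDDEN_SIZE)

# dynamic_rnn unrolls the cell over the time dimension (batch-major input).
outputs, state = tf.nn.dynamic_rnn(cell, inputs, dtype=tf.float32)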
Below is a typical RNN model:
Data Preprocessing
First, we import the required libraries, split the data into training and test sets, and normalize it:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt
from sklearn.preprocessing import MinMaxScaler

df = pd.read_csv('SP_2000_2017_Daily.csv')
data = df[:4100].Close.as_matrix().astype(float)
data = np.reshape(data, (-1, 1))

mm = MinMaxScaler(feature_range=(-1, 1))
data = mm.fit_transform(data)
print(data.shape)

x, y = set_window(data)
# -------------------- split into training and test sets --------------------
train_x = x[:2000]
train_y = y[:2000]
test_x = x[2000:4000]
test_y = y[2000:4000]
Because the RNN consumes its input window by window, the data also has to be sliced into windows:
WINDOW_SIZE = 4  # use the previous 4 values to predict the next one

def set_window(data, windowSize=WINDOW_SIZE):
    x = []
    label = []
    length = len(data)
    for i in range(length - windowSize):
        x.append(data[i:i + windowSize, 0])
        label.append(data[i + windowSize])
    return x, label
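To see what the windowing produces, here is a small check on toy data (illustrative only, not part of the original post):

toy = np.reshape(np.arange(10, dtype=float), (-1, 1))   # shape (10, 1)
toy_x, toy_y = set_window(toy)
print(np.shape(toy_x))   # (6, 4): six windows of four consecutive values
print(np.shape(toy_y))   # (6, 1): the value that follows each window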
Network Construction
With the data prepared, the next step is to build the network.
The network takes training data shaped [batchSize, 4], i.e., it uses the previous 4 values to predict the next one.
hidden_size is the number of neurons inside one LSTM/RNN cell, i.e., one of the boxes labelled A in the structure diagram; you can picture that many neurons living inside it.
class RNN(object):
    def __init__(self):
        self.stateNum = WINDOW_SIZE
        self.batchSize = 50
        self.time_step = 20      # time steps (not used in this model)
        self.hidden_size = 100   # number of units in the hidden layer
        self._build_net()
        self.sess = tf.Session()
        self.sess.run(tf.global_variables_initializer())

    def _build_net(self):
        self.x = tf.placeholder(tf.float32, [None, self.stateNum])
        self.y = tf.placeholder(tf.float32, [None, 1])

        w = tf.Variable(tf.truncated_normal([self.hidden_size, 1], stddev=0.1))
        b = tf.Variable(tf.constant(0.1, shape=[1]))

        # reshape [batch, window] -> [batch, max_time, input depth]
        input_data = tf.reshape(self.x, [-1, self.stateNum, 1])
        rnn_cell = tf.nn.rnn_cell.BasicRNNCell(self.hidden_size)

        batchSize = tf.shape(self.x)[0]

        init_state = rnn_cell.zero_state(batchSize, tf.float32)
        outputs_rnn, final_state = tf.nn.dynamic_rnn(rnn_cell, input_data,
                                                     dtype=tf.float32,
                                                     initial_state=init_state)
        output = tf.reshape(outputs_rnn, [-1, self.hidden_size])  # per-step outputs (not used below)
        # regression on the final hidden state
        self.prediction = tf.matmul(final_state, w) + b

        self.loss = tf.reduce_mean(tf.square(self.y - self.prediction))
        self.train = tf.train.AdamOptimizer(0.001).minimize(self.loss)

    def train_net(self, x, label):
        batch_num = len(x) // self.batchSize
        for epoch in range(100):
            loss_sum = 0
            for i in range(batch_num):
                dict = {self.x: x[i * self.batchSize:(i + 1) * self.batchSize],
                        self.y: label[i * self.batchSize:(i + 1) * self.batchSize]}
                loss, _, pre = self.sess.run([self.loss, self.train, self.prediction], feed_dict=dict)
                # print(pre)
                loss_sum += loss
            print(str(epoch) + ':' + str(loss_sum))

    def predict(self, x):
        dict = {self.x: x}
        prediction = self.sess.run(self.prediction, feed_dict=dict)
        return prediction
A key question about the code above: why does the input have to be reshaped into this structure?
The reshaped tensor is what gets passed to tf.nn.dynamic_rnn, so let's first look at that function's parameter documentation:
cell: An instance of RNNCell.
inputs: The RNN inputs. If `time_major == False` (default), this must be a `Tensor` of shape: `[batch_size, max_time, ...]`, or a nested tuple of such elements. If `time_major == True`, this must be a `Tensor` of shape: `[max_time, batch_size, ...]`, or a nested tuple of such elements. This may also be a (possibly nested) tuple of Tensors satisfying this property. The first two dimensions must match across all the inputs, but otherwise the ranks and other shape components may differ. In this case, input to `cell` at each time-step will replicate the structure of these tuples, except for the time dimension (from which the time is taken). The input to `cell` at each time step will be a `Tensor` or (possibly nested) tuple of Tensors each with dimensions `[batch_size, ...]`.
sequence_length: (optional) An int32/int64 vector sized `[batch_size]`. Used to copy-through state and zero-out outputs when past a batch element's sequence length. So it's more for correctness than performance.
initial_state: (optional) An initial state for the RNN. If `cell.state_size` is an integer, this must be a `Tensor` of appropriate type and shape `[batch_size, cell.state_size]`. If `cell.state_size` is a tuple, this should be a tuple of tensors having shapes `[batch_size, s] for s in cell.state_size`.
dtype: (optional) The data type for the initial state and expected output. Required if initial_state is not provided or RNN state has a heterogeneous dtype.
parallel_iterations: (Default: 32). The number of iterations to run in parallel. Those operations which do not have any temporal dependency and can be run in parallel, will be. This parameter trades off time for space. Values >> 1 use more memory but take less time, while smaller values use less memory but computations take longer.
swap_memory: Transparently swap the tensors produced in forward inference but needed for back prop from GPU to CPU. This allows training RNNs which would typically not fit on a single GPU, with very minimal (or no) performance penalty.
time_major: The shape format of the `inputs` and `outputs` Tensors. If true, these `Tensors` must be shaped `[max_time, batch_size, depth]`. If false, these `Tensors` must be shaped `[batch_size, max_time, depth]`. Using `time_major = True` is a bit more efficient because it avoids transposes at the beginning and end of the RNN calculation. However, most TensorFlow data is batch-major, so by default this function accepts input and emits output in batch-major form.
scope: VariableScope for the created subgraph; defaults to "rnn".
The documentation is long, so focus on the description of inputs: its format is [batch_size, max_time, ...]. Here max_time means how far the RNN is unrolled in time, i.e., the length t in the diagram, and the trailing ... is the dimensionality of each individual input x. With that, the reshape to [-1, 4, 1] is easy to explain: -1 tells TensorFlow to infer that dimension. If our batch_size is 50, the -1 works out to 50*4/(4*1) = 50, so the batch is fed into the RNN as 50 sequences, each 4 steps long (the 4 boxes), and each box receives a 1-dimensional value.
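To make the shape arithmetic concrete, here is a small sketch (assuming a batch of 50 windows of length 4, matching the setup above):

batch = np.zeros((50, 4))                  # 50 windows, 4 values each
reshaped = np.reshape(batch, (-1, 4, 1))   # -1 is inferred: 50*4 / (4*1) = 50
print(reshaped.shape)                      # (50, 4, 1): batch_size, max_time, input depth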
dynamic_rnn returns two results: outputs and state.
outputs has shape [batch_size, max_time, cell.output_size]. For the network designed here that is [50, 4, 100]: for the 50 sequences in the batch, it holds the outputs of all 4 unrolled cells, each with 100 neurons. Compared with an ordinary NN, whose output would look like [50, 100], the unrolled RNN produces a [50, 4, 100] result.
Once outputs is understood, state is straightforward. It is the final state, i.e., the output of the last cell's neurons after a batch of sequences has been fed through; since there are 100 neurons, its shape is [50, 100]. In the RNN diagram it corresponds to the hidden output of the last box A.
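A quick way to confirm these shapes is the standalone sketch below (run it on its own, not inside the training script, since it resets the default graph; it assumes the same setup: a batch of 50, a window of 4, and 100 hidden units):

import numpy as np
import tensorflow as tf

tf.reset_default_graph()                        # start from a clean graph
inp = tf.placeholder(tf.float32, [None, 4, 1])
cell = tf.nn.rnn_cell.BasicRNNCell(100)
outputs, final_state = tf.nn.dynamic_rnn(cell, inp, dtype=tf.float32)

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    o, s = sess.run([outputs, final_state], feed_dict={inp: np.zeros((50, 4, 1))})
    print(o.shape)   # (50, 4, 100): batch, unrolled steps, hidden units
    print(s.shape)   # (50, 100): final state of the last cell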
Model Training
The training part is simple: instantiate the RNN object, call its method to train on the training set, and then run the test set through it:
rnn = RNN()
rnn.train_net(train_x, train_y)
result = rnn.predict(test_x)
prediction = mm.inverse_transform(result)
y = mm.inverse_transform(test_y)
print(result)
print(prediction)
fig = plt.figure()
ax = fig.add_subplot(111)
ax.plot(range(len(prediction)), prediction)
ax.plot(range(len(y)), y)
ax.legend(['prediction', 'true'])
cal_accr(prediction, y)  # accuracy helper; not defined in the post (see the sketch below)
plt.show()
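cal_accr is called above but never defined in the original post. A minimal sketch of what such a helper might look like, assuming it reports the mean absolute percentage error between prediction and truth (both the metric and the signature are guesses, not the author's code):

def cal_accr(prediction, y):
    # hypothetical helper: mean absolute percentage error between two arrays
    prediction = np.asarray(prediction).ravel()
    y = np.asarray(y).ravel()
    mape = np.mean(np.abs((y - prediction) / y)) * 100
    print('MAPE: %.2f%%' % mape)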
Training Results
Since neither the number of neurons nor the window size has been tuned, the prediction is not particularly good; if you are interested, try it on your own data and adjust the parameters.