Machine Learning - LSTM Applications: Sequence Generation

Date: 2020-03-11

Tags: machine learning, LSTM, applications, sequence generation


Overview

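LSTMs are used very widely in machine learning. They show up everywhere, from stock analysis and machine translation to semantic analysis. Building on the earlier analysis of the LSTM structure, this section introduces one small application of LSTMs: sequence generation. "Sequence generation" is really an umbrella term for a family of applications. One example is letting a machine learn music and then compose new music on its own with the learned model (music producers are about to be out of a job...). Another is letting a machine learn a language and then having the learned model produce words and "speak" by itself. At its core, generation is a one-to-many LSTM network structure, and this section explains how to apply that structure.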

Network structure for sequence generation

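Before implementing anything or writing code, our first task is to work out how to build a sequence generation network. Such a network has two parts. The first part is encoding (modeling), the network we train to build the model, and it is a many-to-many structure. The second part is the decoding process, which is a one-to-many structure. What does this network look like concretely? Let's look at the figure below.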

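The figure above shows the whole process and structure of sequence generation, from encoding to decoding. In our application, the input at each encoding time step is one character (or word), and the output is the character that follows that input. All of this data comes from our training data. Once training is finished, we use the trained LSTM cell to build a decoding network. We feed in just a single word, and the network uses the learned model to predict automatically what comes next. Pretty cool, right? In the encoding stage, the number of LSTM time steps is determined by the shape of the input data. In the decoding stage, we choose the number of time steps ourselves, using a for loop. The figure also makes clear that during decoding there is only one input X, and the input at every later time step is the output of the previous time step. That is the overall structure of sequence generation. Next, let's analyze the code and see how to implement this network structure.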
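The shift-by-one relationship between inputs and targets described above can be sketched in NumPy. This is a minimal illustration under assumptions. The helper name `make_training_pair` and the toy integer ids are hypothetical. A real corpus would be cut into many windows of length Tx and stacked into a batch of shape (m, Tx, n_values).

```python
import numpy as np

def make_training_pair(token_ids, n_values):
    """Hypothetical helper: one-hot inputs X and next-value targets Y from one integer sequence.

    X[t] encodes token_ids[t] and Y[t] encodes token_ids[t + 1], so at every time step
    the target is simply the input of the following step.
    """
    ids = np.asarray(token_ids)
    eye = np.eye(n_values)
    X = eye[ids[:-1]]   # shape (Tx, n_values); the last value has no successor to predict
    Y = eye[ids[1:]]    # shape (Tx, n_values), shifted one step to the left
    return X, Y

X, Y = make_training_pair([3, 1, 4, 1, 5], n_values=6)
print(X.shape, Y.shape)                        # (4, 6) (4, 6)
print(X.argmax(axis=-1), Y.argmax(axis=-1))    # [3 1 4 1] [1 4 1 5]
```

During decoding, the same relationship is used in reverse: the prediction for step t is fed back as the input for step t + 1.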

Code walkthrough for sequence generation

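The analysis above shows that sequence generation consists of two parts, so the code is naturally split into two parts as well. Let's start with step one: implementing the encoding structure in Python. The code is shown below, and we'll walk through it slowly: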

import numpy as np
from tensorflow import keras

# Define shared variables (the same layer objects are reused at every time step)
n_a = 64
n_values = 78  # dimension of our single input (number of unique values)
reshapor = keras.layers.Reshape((1, n_values))                  # Used in Step 2.B of create_model(), below
LSTM_cell = keras.layers.LSTM(n_a, return_state = True)         # Used in Step 2.C; return_state must be set to True
densor = keras.layers.Dense(n_values, activation='softmax')     # Used in Step 2.D

# Multiple inputs (X, a0, c0), so we have to use the functional Keras API rather than the Sequential API
def create_model(Tx, n_a, n_values):
    """
    Implement the model

    Arguments:
    Tx -- length of the sequence in a corpus
    n_a -- the number of activations used in our model
    n_values -- number of unique values in the music data

    Returns:
    model -- a keras instance model with n_a activations
    """
    # Define the input layer and specify the shape.
    # The shape omits the batch_size dimension; at run time X is still 3-D: (batch_size, Tx, n_values).
    X = keras.Input(shape=(Tx, n_values))

    # Define the initial hidden state a0 and initial cell state c0
    a0 = keras.Input(shape=(n_a,), name='a0')
    c0 = keras.Input(shape=(n_a,), name='c0')
    a = a0
    c = c0

    # Step 1: Create empty list to append the outputs while you iterate
    outputs = []

    # Step 2: Loop
    for t in range(Tx):
        # Step 2.A: select the "t"th time step vector from X -> shape (batch_size, n_values).
        # Binding t as a default argument freezes its value for this particular Lambda.
        x = keras.layers.Lambda(lambda z, t=t: z[:, t, :])(X)
        # Step 2.B: Use reshapor to reshape x to be (1, n_values).
        # An LSTM layer expects 3-D input (batch_size, timesteps, features), but the slice is 2-D,
        # so we add a time axis of length 1; the LSTM layer then runs exactly one step.
        x = reshapor(x)
        # Step 2.C: Perform one step of the LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.D: Apply densor to the hidden state output of LSTM_cell
        out = densor(a)  # out's shape is (batch_size, n_values)
        # Step 2.E: add the output to "outputs"
        outputs.append(out)

    # Step 3: Create model instance
    model = keras.Model(inputs=[X, a0, c0], outputs=outputs)
    return model
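The loop in create_model calls the same LSTM_cell and densor objects at every time step. The NumPy sketch below mirrors that unrolled training-time forward pass to make the weight sharing concrete. It is an illustration under assumptions, not Keras's internal implementation. It uses the standard LSTM gate equations with the input and recurrent weights fused into one hypothetical matrix `W`, zero initial states, and hypothetical helper names. The result is a Python list with one softmax array per time step, just like `outputs` in create_model. The loss therefore pairs `outputs[t]` with `Y[:, t, :]`. For the same reason, a Keras model with Tx outputs is fitted with a list of Tx target arrays.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))   # subtract the max for numerical stability
    return e / e.sum(axis=-1, keepdims=True)

def init_params(n_a, n_values, seed=0):
    """ONE parameter set: a single LSTM cell plus a single softmax dense layer."""
    rng = np.random.default_rng(seed)
    return {
        "W": rng.normal(0.0, 0.1, size=(n_a + n_values, 4 * n_a)),  # hypothetical fused gate weights
        "b": np.zeros(4 * n_a),
        "Wy": rng.normal(0.0, 0.1, size=(n_a, n_values)),
        "by": np.zeros(n_values),
    }

def lstm_cell_step(x_t, a, c, p):
    """One LSTM step (standard gate equations): x_t is (m, n_values), a and c are (m, n_a)."""
    n_a = a.shape[1]
    z = np.concatenate([a, x_t], axis=1) @ p["W"] + p["b"]
    i = sigmoid(z[:, :n_a])                # input gate
    f = sigmoid(z[:, n_a:2 * n_a])         # forget gate
    g = np.tanh(z[:, 2 * n_a:3 * n_a])     # candidate cell state
    o = sigmoid(z[:, 3 * n_a:])            # output gate
    c = f * c + i * g
    a = o * np.tanh(c)
    return a, c

def unrolled_forward(X, p, n_a):
    """Many-to-many pass over X of shape (m, Tx, n_values), reusing the SAME p at every step."""
    m, Tx, _ = X.shape
    a = np.zeros((m, n_a))                 # a0
    c = np.zeros((m, n_a))                 # c0
    outputs = []
    for t in range(Tx):
        a, c = lstm_cell_step(X[:, t, :], a, c, p)        # plays the role of LSTM_cell
        outputs.append(softmax(a @ p["Wy"] + p["by"]))    # plays the role of densor
    return outputs

def sequence_loss(outputs, Y):
    """Categorical cross-entropy summed over time: outputs[t] is compared with Y[:, t, :]."""
    return sum(-np.mean(np.sum(Y[:, t, :] * np.log(out), axis=1))
               for t, out in enumerate(outputs))

n_a, n_values = 8, 5
p = init_params(n_a, n_values)
rng = np.random.default_rng(1)
ids = rng.integers(0, n_values, size=(2, 8))   # 2 toy sequences of token ids
X = np.eye(n_values)[ids[:, :-1]]              # inputs, shape (2, 7, 5)
Y = np.eye(n_values)[ids[:, 1:]]               # next-value targets, shape (2, 7, 5)
outs = unrolled_forward(X, p, n_a)
print(len(outs), outs[0].shape)                # 7 (2, 5)
print(sum(v.size for v in p.values()))         # 493, whatever Tx is
print(sequence_loss(outs, Y))
```

Running the same function on only the first 3 steps of X gives the same first 3 outputs. The parameter count also does not depend on Tx. There is one cell, not Tx of them.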

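The code above first defines some shared variables, such as the dimension of a and c, and LSTM_cell. These are shared by the model in both encoding and decoding. It is a misunderstanding to think that an LSTM layer contains many LSTM_cells. The figure draws many LSTM_cells only to make it easier to follow. In reality, the same LSTM_cell, with the same weights, is applied at every time step. The network needs three parameters: Tx is the number of time steps, n_a is the dimension of the vectors a and c, and n_values is the dimension of each input vector. Because the network has three inputs, X, a, and c, we first define these three inputs and set their shapes. Note that the batch_size dimension is not included when specifying the shapes.

Inside the for loop, we first extract the input value for each time step, which is what the Lambda layer in the code does. The slice holds a single time step and has shape (batch_size, n_values), but an LSTM layer expects 3-D input (batch_size, timesteps, features). We therefore reshape it to (1, n_values), so the LSTM layer runs exactly one step per call. Next, we pass the prepared input to LSTM_cell, which returns the hidden state a and the memory cell c. Finally, a dense layer computes our output, and each step's output is appended to the list outputs. Those are all the steps for building the encoding network.

With the encoding stage analyzed, we can train it to obtain the LSTM we want. Next, let's look at the decoding process, that is, how to use the trained LSTM to generate (predict) our sequence. Again, let's look at the code below and then analyze it: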

import tensorflow as tf

# one_hot() is used in Step 2.D but is not defined in the original listing. This is a minimal
# stand-in (an assumption): take the most likely value (greedy argmax), re-encode it as a
# one-hot vector, and add the time axis back so the result has shape (batch_size, 1, n_values).
def one_hot(out):
    idx = tf.argmax(out, axis=-1)
    x = tf.one_hot(idx, depth=out.shape[-1])
    return tf.expand_dims(x, axis=1)

def sequence_inference_model(LSTM_cell, densor, n_values = 78, n_a = 64, Ty = 100):
    """
    Uses the trained "LSTM_cell" and "densor" from create_model() to generate a sequence of values.
    
    Arguments:
    LSTM_cell -- the trained "LSTM_cell" from create_model(), Keras layer object
    densor -- the trained "densor" from create_model(), Keras layer object
    n_values -- integer, number of unique values
    n_a -- number of units in the LSTM_cell
    Ty -- integer, number of time steps to generate
    
    Returns:
    inference_model -- Keras model instance
    """
    
    # Define the input of your model with a shape (it is a one-to-many structure, the input shape is (1, n_values))
    x0 = keras.Input(shape=(1, n_values))

    # Define a0, c0, initial hidden state for the decoder LSTM
    a0 = keras.Input(shape=(n_a,), name='a0')
    c0 = keras.Input(shape=(n_a,), name='c0')
    a = a0
    c = c0
    x = x0

    # Step 1: Create an empty list of "outputs" to later store your predicted values
    outputs = []

    # Step 2: Loop over Ty and generate a value at every time step
    for t in range(Ty):
        # Step 2.A: Perform one step of LSTM_cell
        a, _, c = LSTM_cell(x, initial_state=[a, c])
        # Step 2.B: Apply Dense layer to the hidden state output of the LSTM_cell
        out = densor(a)
        # Step 2.C: Append the prediction "out" to "outputs". out.shape = (None, n_values)
        outputs.append(out)
        # Step 2.D: Select the next value according to "out", and set "x" to be the one-hot
        # representation of the selected value, which is passed as the input to LSTM_cell
        # on the next step.
        x = keras.layers.Lambda(one_hot)(out)

    # Step 3: Create model instance with the correct "inputs" and "outputs"
    inference_model = keras.Model(inputs=[x0, a0, c0], outputs=outputs)
    return inference_model

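inference_model = sequence_inference_model(LSTM_cell, densor, n_values = 78, n_a = 64, Ty = 50)
inference_model.summary()

# Initial inputs for a batch of one: an all-zero "start" vector x0 and zero initial states a0, c0
x_initializer = np.zeros((1, 1, n_values))
a_initializer = np.zeros((1, n_a))
c_initializer = np.zeros((1, n_a))

# pred is a list of Ty arrays, each of shape (1, n_values)
pred = inference_model.predict([x_initializer, a_initializer, c_initializer])
# Index of the most likely value at each step, shape (Ty, 1)
indices = np.argmax(np.array(pred), axis=-1)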

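This inference model makes predictions with the LSTM trained above. It reuses the weights and biases of the trained LSTM_cell and densor, and from a single input x0 it predicts the values that follow. The user-specified Ty determines how many values it outputs. We could also manage the output more finely, for example by stopping as soon as an EOS token appears. Because this model unrolls exactly Ty steps into the graph, however, such a check has to be applied to pred afterwards or done in a step-by-step Python loop. Even though we already have the trained LSTM, the structure is different, so we still need to build a new inference model, that is, construct the decoding structure anew. As the code shows, the decoding model still has three inputs: x0, a0, and c0. Decoding is simpler than encoding in one respect: we no longer need to reshape the inputs. They already have the required shapes, (batch_size, 1, n_values), (batch_size, n_a), and (batch_size, n_a), so we can pass them straight into LSTM_cell and densor without any extra shape handling. The other difference from encoding is that each time step's output must be fed back as the next time step's input. In the code above, x = keras.layers.Lambda(one_hot)(out) does this by converting the prediction into a one-hot input for the next step. Because this is an inference model, there is no need to fit it again; we can call its predict method directly.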
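Because the inference model above unrolls exactly Ty steps into the graph, stopping at an EOS token is easiest in a Python-level loop. The NumPy sketch below shows the one-to-many feedback loop in isolation. Here `step` stands in for one LSTM_cell + densor step. The toy table `T`, the choice of value 3 as EOS, and all helper names are hypothetical. Greedy argmax selection is assumed, in the spirit of the one_hot step in the Keras code.

```python
import numpy as np

def greedy_generate(step, x0, state, Ty, eos_index=None):
    """One-to-many decoding: each prediction is fed back as the next input.

    step(x, state) -> (probs, new_state) stands in for LSTM_cell + densor,
    x0 is a one-hot vector of shape (n_values,), and generation stops after
    Ty steps or as soon as eos_index is produced.
    """
    n_values = x0.shape[-1]
    x, generated = x0, []
    for _ in range(Ty):
        probs, state = step(x, state)
        idx = int(np.argmax(probs))         # greedy choice of the next value
        generated.append(idx)
        if eos_index is not None and idx == eos_index:
            break                           # early stop: possible because this is a Python loop
        x = np.eye(n_values)[idx]           # one-hot of the prediction becomes the next input
    return generated

# Hypothetical toy "model": a fixed next-value table with 4 values; value 3 plays the role of EOS.
T = np.array([[0.1, 0.7, 0.1, 0.1],
              [0.1, 0.1, 0.7, 0.1],
              [0.1, 0.1, 0.1, 0.7],
              [0.7, 0.1, 0.1, 0.1]])

def toy_step(x, state):
    return x @ T, state                     # stateless stand-in for one LSTM_cell + densor step

x0 = np.eye(4)[0]
print(greedy_generate(toy_step, x0, None, Ty=6))               # [1, 2, 3, 0, 1, 2]
print(greedy_generate(toy_step, x0, None, Ty=6, eos_index=3))  # [1, 2, 3]
```

With a real model, `step` would carry (a, c) in `state` and run one cell step. Without an EOS index, the loop behaves like the fixed-length Keras inference model.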

Summary

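For sequence generation applications, first keep this pattern in mind: the task has two parts, encoding and decoding. Encoding is used to train the model, and decoding is used to make predictions with it. For the input layers, pay close attention to the shapes of the input data and make sure you understand them; they must be consistent. Also make sure you understand the shared variables, such as LSTM_cell and densor. They are the most basic building blocks of this LSTM model and are shared across time steps; there is no separate instance for each time step. Once you understand the steps and content above, you can implement other sequence generation applications with the same pattern. The only things that need to change are the dimension values.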