Deep Learning with Python study notes (8)

Date: 2019-12-01


The Keras functional API

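With the Keras functional API you can build graph-like models, share a layer across different inputs, and use Keras models just like Python functions. Keras callbacks and TensorBoard, a browser-based visualization tool, let you monitor models during training.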

Multi-input models, multi-output models, and graph-like models cannot be built with the Sequential model class alone. For these, Keras offers a more general and flexible approach: the functional API.

With the functional API you manipulate tensors directly and use layers as functions that take tensors and return tensors (hence the name functional API).

A simple example

from keras.models import Sequential, Model
from keras import layers
from keras import Input

input_tensor = Input(shape=(64,))
x = layers.Dense(32, activation='relu')(input_tensor)
x = layers.Dense(32, activation='relu')(x)
output_tensor = layers.Dense(10, activation='softmax')(x)
model = Model(input_tensor, output_tensor)
model.summary()

The code above uses the functional API. The equivalent Sequential model is:

model = Sequential()
model.add(layers.Dense(32, activation='relu', input_shape=(64, )))
model.add(layers.Dense(32, activation='relu'))
model.add(layers.Dense(10, activation='softmax'))


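When instantiating a Model object you only need an input tensor and an output tensor. Behind the scenes, Keras retrieves every layer involved in going from input_tensor to output_tensor and assembles them into a graph-like data structure, a Model. This works because output_tensor was obtained by repeatedly transforming input_tensor. If you try to build a model from unrelated inputs and outputs, you get a RuntimeError.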

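The functional API can be used to build models with multiple inputs. Typically such a model at some point merges its input branches with a layer that can combine several tensors, for example by adding or concatenating them. This is usually done with a Keras merge operation such as keras.layers.add or keras.layers.concatenate.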
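The difference between the two merge operations is mostly a matter of shapes. As a rough illustration, the sketch below uses plain NumPy arrays as stand-ins for the branch outputs (an assumption made for simplicity; it does not call keras.layers.add or keras.layers.concatenate, and the branch sizes 32 and 16 are illustrative).

```python
import numpy as np

batch = 4
a = np.ones((batch, 32))        # stand-in for one encoded branch
b = np.ones((batch, 16))        # stand-in for a second, narrower branch
c = np.full((batch, 32), 2.0)   # a branch with the same shape as `a`

# Concatenation joins tensors along one axis; all other axes must match.
concatenated = np.concatenate([a, b], axis=-1)

# Element-wise addition needs identical shapes and keeps that shape.
added = a + c

print(concatenated.shape)  # (4, 48)
print(added.shape)         # (4, 32)
```

Concatenation widens the feature axis (32 + 16 = 48), while addition keeps the shape and therefore only works when the branches already agree in shape.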

A multi-input model example

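A typical question-answering model has two inputs: a natural-language question and a text snippet that provides the information needed to answer it. The model must then produce an answer; in the simplest setup the answer is a single word, obtained via a softmax over some predefined vocabulary.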

from keras.models import Model
from keras import layers
from keras import Input
import numpy as np
import keras.utils
import tools

num_samples = 1000
max_length = 100
text_vocabulary_size = 10000
question_vocabulary_size = 10000
answer_vocabulary_size = 500
# Model
text_input = Input(shape=(None,), dtype='int32', name='text')
embedded_text = layers.Embedding(text_vocabulary_size, 64)(text_input)
encoded_text = layers.LSTM(32)(embedded_text)
question_input = Input(shape=(None,), dtype='int32', name='question')
embedded_question = layers.Embedding(question_vocabulary_size, 32)(question_input)
encoded_question = layers.LSTM(16)(embedded_question)
concatenated = layers.concatenate([encoded_text, encoded_question], axis=-1)
answer = layers.Dense(answer_vocabulary_size, activation='softmax')(concatenated)
model = Model([text_input, question_input], answer)
model.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['acc'])
model.summary()
# Training (on randomly generated dummy data)
text = np.random.randint(1, text_vocabulary_size, size=(num_samples, max_length))
question = np.random.randint(1, question_vocabulary_size, size=(num_samples, max_length))
answers = np.random.randint(answer_vocabulary_size, size=(num_samples))
answers = keras.utils.to_categorical(answers, answer_vocabulary_size)
history = model.fit([text, question], answers, epochs=10, batch_size=128)
# model.fit({'text': text, 'question': question}, answers, epochs=10, batch_size=128)
tools.draw_acc_and_loss(history)

tools.draw_acc_and_loss is defined as follows:

import matplotlib.pyplot as plt

def draw_acc_and_loss(history):
    acc = history.history['acc']
    loss = history.history['loss']
    epochs = range(1, len(loss) + 1)
    plt.figure()
    plt.plot(epochs, acc, 'b', label='Training acc')
    plt.title('Training acc')
    plt.legend()
    plt.show()

    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.title('Training loss')
    plt.legend()
    plt.show()

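[Figure: model summary]

[Figure: training accuracy and loss. The results are not meaningful.]

Training for longer should push the results in a better direction.

[Figure: results after changing epochs to 50]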

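In the same way, the functional API can be used to build models with multiple outputs (or multiple heads). The example below takes a series of social media posts from a single anonymous person as input and tries to predict attributes of that person, such as age, gender, and income level.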

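With a multi-output model, we can assign a different loss function to each head of the network. For example, age prediction is a scalar regression task while gender prediction is a binary classification task, and the two need different training procedures. But gradient descent requires minimizing a single scalar, so to train the model we must combine these losses into one value. The simplest way is to sum them all. In Keras, you can pass a list or a dictionary of losses to compile to specify a different loss for each output; the resulting loss values are summed into a global loss, which is minimized during training.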

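When each head has its own loss function, severely imbalanced loss contributions cause the model's representations to be optimized preferentially for the task with the largest individual loss, at the expense of the other tasks. To address this, we can assign a different level of importance to each loss value's contribution to the final loss. For instance, the mean squared error (MSE) loss used for the age regression task typically takes a value around 3 to 5, whereas the cross-entropy loss used for the gender classification task can be as low as 0.1. In that case, to balance the contributions of the different losses, we can give the cross-entropy loss a weight of 10 and the MSE loss a weight of 0.25.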
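The effect of the weights is easy to check by hand. The sketch below is plain arithmetic, not Keras code: the loss magnitudes (4.0 and 0.1) are illustrative values picked from the ranges quoted above, and the weights (0.25 for MSE, 10 for cross-entropy) are the ones passed as loss_weights in the compile call of the model code.

```python
# Illustrative per-task loss magnitudes (assumed values, not measured results).
losses = {"age_mse": 4.0, "gender_crossentropy": 0.1}
# Weights as used in loss_weights in the compile call.
weights = {"age_mse": 0.25, "gender_crossentropy": 10.0}

# Without weights, the age loss dominates the sum.
unweighted_total = sum(losses.values())
age_share_unweighted = losses["age_mse"] / unweighted_total

# With weights, each task contributes about the same amount.
weighted = {name: weights[name] * value for name, value in losses.items()}
weighted_total = sum(weighted.values())
age_share_weighted = weighted["age_mse"] / weighted_total

print(round(age_share_unweighted, 3))  # share of the age loss before weighting
print(round(age_share_weighted, 3))    # share of the age loss after weighting
```

With these numbers the age loss accounts for almost all of the unweighted total, but only about half of the weighted total.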

Model overview

from keras import layers
from keras import Input
from keras.models import Model

vocabulary_size = 50000
num_income_groups = 10
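# Input
posts_input = Input(shape=(None,), dtype='int32', name='posts')
# Embedding(input_dim, output_dim): vocabulary size first, then the embedding dimension
embedded_posts = layers.Embedding(vocabulary_size, 256)(posts_input)
# 1D convolutional network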
x = layers.Conv1D(128, 5, activation='relu')(embedded_posts)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.MaxPooling1D(5)(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.Conv1D(256, 5, activation='relu')(x)
x = layers.GlobalMaxPooling1D()(x)
x = layers.Dense(128, activation='relu')(x)
# Prediction heads
age_prediction = layers.Dense(1, name='age')(x) 
income_prediction = layers.Dense(num_income_groups, activation='softmax', name='income')(x)
gender_prediction = layers.Dense(1, activation='sigmoid', name='gender')(x)
# Assemble the model
model = Model(posts_input, [age_prediction, income_prediction, gender_prediction])
# Compile with one loss per output
# and a different weight for each loss
model.compile(optimizer='rmsprop',
    loss=['mse', 'categorical_crossentropy', 'binary_crossentropy'],
    loss_weights=[0.25, 1., 10.])
# Equivalent form using dictionaries keyed by output layer name
'''
model.compile(optimizer='rmsprop', loss={'age': 'mse',
        'income': 'categorical_crossentropy',
        'gender': 'binary_crossentropy'}, 
    loss_weights={'age': 0.25,
        'income': 1., 
        'gender': 10.})
'''
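# Feed the data to the network
# (posts, age_targets, income_targets, and gender_targets are assumed to be NumPy arrays)
model.fit(posts, [age_targets, income_targets, gender_targets],
          epochs=10, batch_size=64)
# Equivalent form using a dictionary keyed by output layer name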
'''
model.fit(posts, {'age': age_targets,
    'income': income_targets,
    'gender': gender_targets},
    epochs=10, batch_size=64)
'''

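With the functional API we can build not only multi-input and multi-output models but also networks with complex internal topology. A neural network in Keras can be an arbitrary directed acyclic graph of layers. The qualifier acyclic is important: these graphs cannot contain cycles, that is, a tensor x cannot become the input of one of the layers that generated x. The only processing loops allowed (i.e. recurrent connections) are those internal to recurrent layers.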

Implementing an Inception V3 module in Keras

Assume we have a 4D input tensor x.

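from keras import layers

# x is assumed to be an existing 4D tensor.
# padding='same' on every layer keeps the spatial sizes of the four branches
# equal, so that they can be concatenated along the channel axis.

# Branch a: a single strided 1x1 convolution
branch_a = layers.Conv2D(128, 1, activation='relu', strides=2, padding='same')(x)

# Branch b: a 1x1 convolution followed by a strided 3x3 convolution
branch_b = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_b = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_b)

# Branch c: strided average pooling followed by a 3x3 convolution
branch_c = layers.AveragePooling2D(3, strides=2, padding='same')(x)
branch_c = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_c)

# Branch d: a 1x1 convolution, a 3x3 convolution, then a strided 3x3 convolution
branch_d = layers.Conv2D(128, 1, activation='relu', padding='same')(x)
branch_d = layers.Conv2D(128, 3, activation='relu', padding='same')(branch_d)
branch_d = layers.Conv2D(128, 3, activation='relu', strides=2, padding='same')(branch_d)

# Concatenate the branch outputs along the channel axis
output = layers.concatenate([branch_a, branch_b, branch_c, branch_d], axis=-1)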
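Concatenating along the channel axis only works if all four branches produce the same spatial size. A quick way to check this is with the usual output-size formulas for 'valid' and 'same' padding. The sketch below is plain Python, not Keras; the helper names and the 64-pixel input side are hypothetical choices made for illustration.

```python
import math

def out_size(size, kernel, stride=1, padding="valid"):
    """Output length along one spatial axis for a conv or pooling window."""
    if padding == "same":
        return math.ceil(size / stride)
    return (size - kernel) // stride + 1

def branch_sizes(size, padding):
    """Spatial size produced by each of the four branches for a given padding mode."""
    a = out_size(size, 1, 2, padding)
    b = out_size(out_size(size, 1, 1, padding), 3, 2, padding)
    c = out_size(out_size(size, 3, 2, padding), 3, 1, padding)
    d = out_size(out_size(out_size(size, 1, 1, padding), 3, 1, padding), 3, 2, padding)
    return a, b, c, d

print(branch_sizes(64, "valid"))  # (32, 31, 29, 30)
print(branch_sizes(64, "same"))   # (32, 32, 32, 32)
```

With 'valid' padding (the Keras default) the branches end up with different spatial sizes and cannot be concatenated; with 'same' padding on every layer they agree.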

The full Inception V3 architecture is available in Keras as keras.applications.inception_v3.InceptionV3, including weights pretrained on the ImageNet dataset.

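A residual connection feeds the output of an earlier layer into a later layer as input, effectively creating a shortcut in a sequential network. Rather than being concatenated with the later activation, the earlier output is added to it (which assumes both activations have the same shape). If the shapes differ, a linear transformation can reshape the earlier activation into the target shape.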

When the feature map sizes are the same, a residual connection can be implemented in Keras as follows, using an identity residual connection. Again, assume we have a 4D input tensor x.

from keras import layers


x = ...
# Apply a transformation to x
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x) 
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
# Add the original x back to the output features
y = layers.add([y, x])

When the feature map sizes differ, a residual connection can be implemented as follows, using a linear residual connection. Again, assume we have a 4D input tensor x.

from keras import layers


x = ...
y = layers.Conv2D(128, 3, activation='relu', padding='same')(x)
y = layers.Conv2D(128, 3, activation='relu', padding='same')(y)
y = layers.MaxPooling2D(2, strides=2)(y)
# Use a 1×1 convolution to linearly downsample the original x tensor to the same shape as y
residual = layers.Conv2D(128, 1, strides=2, padding='same')(x) 
y = layers.add([y, residual])

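Another important feature of the functional API is the ability to reuse a layer instance several times. If you call a layer instance twice, instead of instantiating a new layer for each call, every call reuses the same weights. This lets you build models with shared branches: several branches that all share the same knowledge and perform the same operations. In other words, the branches share the same representations and learn them simultaneously for different sets of inputs.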

from keras import layers
from keras import Input
from keras.models import Model
# Instantiate a single LSTM layer, once
lstm = layers.LSTM(32) 

left_input = Input(shape=(None, 128)) 
left_output = lstm(left_input)

right_input = Input(shape=(None, 128)) 
# Calling an existing layer instance reuses its weights
right_output = lstm(right_input)

merged = layers.concatenate([left_output, right_output], axis=-1) 
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions) 
model.fit([left_data, right_data], targets)
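One way to confirm that the two branches really share weights is to count the model's parameters. The sketch below rebuilds the same two-input model without training it and compares model.count_params() with a hand count that includes the LSTM only once. The parameter formula for an LSTM layer, 4 * (units * (input_dim + units) + units), is the standard one; the variable names lstm_params and dense_params are introduced here for illustration.

```python
from keras import layers, Input
from keras.models import Model

# A single LSTM layer instance, called on both inputs.
lstm = layers.LSTM(32)

left_input = Input(shape=(None, 128))
right_input = Input(shape=(None, 128))
merged = layers.concatenate([lstm(left_input), lstm(right_input)], axis=-1)
predictions = layers.Dense(1, activation='sigmoid')(merged)
model = Model([left_input, right_input], predictions)

# Hand count: four gates, each with input weights, recurrent weights, and a bias.
lstm_params = 4 * (32 * (128 + 32) + 32)
# Dense layer on the 64-dimensional concatenation: 64 weights plus 1 bias.
dense_params = 64 * 1 + 1

print(model.count_params())
print(lstm_params + dense_params)
```

If each branch had its own LSTM instance, the LSTM parameters would be counted twice; here they appear once, because both calls go through the same layer object.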

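In the functional API, models can be used just like layers. In effect, you can think of a model as a "bigger layer". This is true of both the Sequential and Model classes. It means you can call a model on an input tensor and get an output tensor: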

y = model(x)

If the model has multiple input tensors and multiple output tensors, it should be called with a list of tensors:

y1, y2 = model([x1, x2])

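Calling a model instance reuses the model's weights, exactly as calling a layer instance reuses the layer's weights. Calling an instance, whether a layer instance or a model instance, always reuses the representations that instance has already learned.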

Implementing a Siamese vision model (shared convolutional base) in Keras

from keras import layers
from keras import applications
from keras import Input


# The base image-processing model is the Xception network (convolutional base only)
xception_base = applications.Xception(weights=None, include_top=False)

# The inputs are 250×250 RGB images
left_input = Input(shape=(250, 250, 3))
left_features = xception_base(left_input)

right_input = Input(shape=(250, 250, 3))
# Call the same vision model a second time
right_features = xception_base(right_input)

# The merged features combine information from the left and right inputs
merged_features = layers.concatenate([left_features, right_features], axis=-1)

Notes:

1×1 convolution

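As we know, a convolution extracts spatial patches around every tile of the input tensor and applies the same transformation to all patches. An edge case is when the extracted patches consist of a single tile. The convolution is then equivalent to running each tile vector through a Dense layer: it computes features that mix information across the channels of the input tensor but not across space (since it looks at one tile at a time). Such 1×1 convolutions (also called pointwise convolutions) are a hallmark of Inception modules, where they help factor out channel-wise feature learning and space-wise feature learning. This is reasonable if you assume that each channel is highly autocorrelated across space but that different channels may not be highly correlated with each other.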
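The equivalence between a 1×1 convolution and a per-position Dense layer can be checked numerically. The sketch below uses NumPy rather than Keras (an assumption made to keep it small); the tensor sizes and the names kernel and bias are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 5, 3))      # height, width, input channels
kernel = rng.normal(size=(3, 4))    # 1x1 kernel: input channels -> output channels
bias = rng.normal(size=(4,))

# 1x1 convolution: the same channel-mixing matrix is applied at every position.
conv_out = np.einsum('hwc,co->hwo', x, kernel) + bias

# Dense layer applied independently to the channel vector at each position.
dense_out = np.empty((5, 5, 4))
for i in range(5):
    for j in range(5):
        dense_out[i, j] = x[i, j] @ kernel + bias

print(np.allclose(conv_out, dense_out))  # True
```

Each output value depends only on the channel vector at its own position, so information is mixed across channels but never across space.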

Representational bottlenecks in deep learning

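In a Sequential model, each successive representation layer is built on top of the previous one, which means it only has access to the information contained in the previous layer's activation. If one layer is too small (for example, its features are too low-dimensional), the model is limited by how much information can be crammed into that layer's activations. Residual connections can reinject earlier information downstream, which partially solves this problem for deep learning models.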

Vanishing gradients in deep learning

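Backpropagation, the main algorithm used to train deep neural networks, works by propagating a feedback signal from the output loss down to earlier layers. If this feedback signal has to pass through many layers, it may become very weak or even be lost entirely, leaving the network untrainable. This problem is known as the vanishing gradient problem.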

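The problem occurs in deep networks and also in recurrent networks over very long sequences. In both cases the feedback signal must travel through a long series of operations. The LSTM layer introduces a carry track that propagates information in parallel to the main processing track. Residual connections work in a similar way in feedforward deep networks, but they are even simpler: they introduce a purely linear information carry track parallel to the main layer stack, which helps propagate gradients through arbitrarily deep stacks of layers.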
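A toy calculation can make the effect concrete. The sketch below assumes a deliberately simplified setting: every layer is a scalar linear map y = w * x, so by the chain rule the gradient through a stack is a product of local derivatives. The function names and the value w = 0.1 are hypothetical; real networks have nonlinearities and matrix-valued weights, so this is an intuition aid only.

```python
def plain_gradient(local_derivative, depth):
    """d(output)/d(input) for `depth` stacked layers y = w * x."""
    return local_derivative ** depth

def residual_gradient(local_derivative, depth):
    """Same stack, but each block computes y = x + w * x (identity shortcut)."""
    return (1.0 + local_derivative) ** depth

for depth in (1, 10, 50):
    print(depth, plain_gradient(0.1, depth), residual_gradient(0.1, depth))
```

In the plain stack the gradient shrinks geometrically with depth, while the identity shortcut contributes a constant 1 to every local derivative, so the signal does not vanish. In this toy setting it can even grow, which is one reason real residual networks also rely on normalization and careful initialization.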
