tensorflow学习笔记——使用TensorFlow操做MNIST数据（1）

时间 2019-11-06

标签 tensorflow 学习笔记使用 mnist 数据繁體版

原文原文链接

　　续集请点击我：tensorflow学习笔记——使用TensorFlow操做MNIST数据（2）html

　　本节开始学习使用tensorflow教程，固然从最简单的MNIST开始。这怎么说呢，就比如编程入门有Hello World，机器学习入门有MNIST。在此节，我将训练一个机器学习模型用于预测图片里面的数字。node

　　MNIST 是一个很是有名的手写体数字识别数据集，在不少资料中，这个数据集都会被用作深度学习的入门样例。而Tensorflow的封装让MNIST数据集变得更加方便。MNIST是NIST数据集的一个子集，它包含了60000张图片做为训练数据，10000张图片做为测试数据。在MNIST数据集中的每一张图片都表明了0~9中的一个数字。图片的大小都为28*28，且数字都会出如今图片的正中间，以下图：python

　　在上图中右侧显示了一张数字1的图片，而右侧显示了这个图片所对应的像素矩阵，MNIST数据集提供了四个下载文件，在tensorflow中能够将这四个文件直接下载放到一个目录中并加载，以下代码input_data.read_data_sets所示，若是指定目录中没有数据，那么tensorflow会自动去网络上进行下载。下面代码介绍了如何使用tensorflow操做MNIST数据集。git

　　MNIST数据集的官网是Yann LeCun's website。在这里，咱们提供了一份python源代码用于自动下载和安装这个数据集。你能够下载这份代码，而后用下面的代码导入到你的项目里面，也能够直接复制粘贴到你的代码文件里面。github

import tensorflow.examples.tutorials.mnist.input_data as input_data
mnist = input_data.read_data_sets("MNIST_data/", one_hot=True)

　　这里，我直接下载了数据集，并放在了个人代码里面。测试以下：web

# _*_coding:utf-8_*_

from tensorflow.examples.tutorials.mnist import input_data

mnist = input_data.read_data_sets('MNIST_data', one_hot=True)
print("OK")
# 打印“Training data size: 55000”
print("Training data size: ", mnist.train.num_examples)
# 打印“Validating data size: 5000”
print("Validating data size: ", mnist.validation.num_examples)
# 打印“Testing data size: 10000”
print("Testing data size: ", mnist.test.num_examples)
# 打印“Example training data: [0. 0. 0. ... 0.380 0.376 ... 0.]”
print("Example training data: ", mnist.train.images[0])
# 打印“Example training data label: [0. 0. 0. 0. 0. 0. 0. 1. 0. 0.]”
print("Example training data label: ", mnist.train.labels[0])

batch_size = 100
# 从train的集合中选取batch_size个训练数据
xs, ys = mnist.train.next_batch(batch_size)
# 输出“X shape:(100,784)”
print("X shape: ", xs.shape)
# 输出"Y shape:(100,10)"
print("Y shape: ", ys.shape)

　　结果以下：算法

Extracting MNIST_data\train-images-idx3-ubyte.gz
Extracting MNIST_data\train-labels-idx1-ubyte.gz
Extracting MNIST_data\t10k-images-idx3-ubyte.gz
Extracting MNIST_data\t10k-labels-idx1-ubyte.gz
OK
Training data size:  55000
Validating data size:  5000
Testing data size:  10000
X shape:  (100, 784)
Y shape:  (100, 10)

　　从上面的代码能够看出，经过input_data.read_data_sets函数生成的类会自动将MNIST数据集划分红train，validation和test三个数据集，其中train这个集合内含有55000张图片，validation集合内含有5000张图片，这两个集合组成了MNIST自己提供的训练数据集。test集合内有10000张图片，这些图片都来自与MNIST提供的测试数据集。处理后的每一张图片是一个长度为784的一维数组，这个数组中的元素对应了图片像素矩阵中的每个数字（28*28=784）。由于神经网络的输入是一个特征向量，因此在此把一张二维图像的像素矩阵放到一个一维数组中能够方便tensorflow将图片的像素矩阵提供给神经网络的输入层。像素矩阵中元素的取值范围为[0, 1]，它表明了颜色的深浅。其中0表示白色背景，1表示黑色前景。编程

Softmax回归的再复习

　　咱们知道MNIST的每一张图片都表示一个数字，从0到9。咱们但愿获得给定图片表明每一个数字的几率。好比说，咱们的模型可能推测一张包含9的图片表明数字9 的几率为80%可是判断它是8 的几率是5%（由于8和9都有上半部分的小圆），而后给予它表明其余数字的几率更小的值。api

　　这是一个使用softmax regression模型的经典案例。softmax模型能够用来给不一样的对象分配几率。即便在以后，咱们训练更加精细的模型的时候，最后一步也是须要softmax来分配几率。数组

softmax回归第一步

　　为了获得一张给定图片属于某个特定数字类的证据（evidence），咱们对图片像素值进行加权求和。若是这个像素具备很强的证听说明这种图片不属于该类，那么相应的权值为负数，相反若是这个像素拥有很强的证听说明这张图片不属于该类，那么相应的权值为负数，相反若是这个像素拥有有力的证据支持这张图片属于这个类，那么权值是正数。

　　下面的图片显示了一个模型学习到的图片上每一个像素对于特定数字类的权值，红色表明负数权值，蓝色表明正数权值。

　　咱们也须要加入一个额外的偏置量（bias），由于输入每每会带有一些无关的干扰量。由于对于给定的输入图片 x 它表明的是数字 i 的证据能够表示为：

　　其中，Wi表明权重，bi表明数字 i 类的偏置量， j 表明给定图片 x 的像素索引用于像素求和。而后用 softmax 函数能够把这些证据转换成几率 y：

　　这里的softmax能够当作是一个激励（activation）函数或者连接（link）函数，把咱们定义的线性函数的输出转换成咱们想要的格式，也就是关于10个数字类的几率分布。所以，给定一张图片，它对于每个数字的吻合度能够被softmax函数转换成一个几率值。softmax函数能够定义为：

　　展开等式右边的子式，能够获得：

　　可是更多的时候把softmax模型函数定义为前一种形式：把输入值当作幂指函数求值，再正则化这些结果值。这个幂运算表示，更大的证据对应更大的假设模型（hypothesis）里面的乘数权重值。反之，拥有更少的证据意味着在假设模型里面拥有更小的乘数系数。假设模型里面的权值不能够是0值或者负值。Softmax而后会正则化这些权重值，使他们的总和等于1，以此构造一个有效的几率分布。

　　对于softmax回归模型能够用下面的图解释，对于输入的 xs 加权求和，再分别加上一个偏置量，最后再输入到 softmax 函数中：

　　若是把它写成一个等式，咱们能够获得：

　　咱们也能够用向量表示这个计算过程：用矩阵乘法和向量相加。这有助于提升计算效率：

　　更进一步，能够写成更加紧凑的方式：

　　接下来就是使用Tensorflow实现一个 Softmax Regression。其实在Python中，当尚未TensorFlow时，一般使用Numpy作密集的运算操做。由于Numpy是使用C和一部分 fortran语言编写的，而且调用了 openblas，mkl 等矩阵运算库，所以效率很好。其中每个运算操做的结果都要返回到Python中，但不一样语言之间传输数据可能会带来更大的延迟。TensorFlow一样将密集的复杂运算搬到Python外面执行，不过作的更完全。

　　首先载入TensorFlow库，并建立一个新的 InteractiveSession ，使用这个命令会将这个session 注册为默认的session，以后的运算也默认跑在这个session里，不一样session之间的数据和运算应该都是相互独立的。接下来建立一个 placeholder，即输入数据的地方。Placeholder 的第一个参数是数据类型，第二个参数 [None, 784] 表明 tensor 的 shape，也就是数据的尺寸，这里的None表明不限条数的输入， 784表明每条数据的是一个784维的向量。

sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

　　接下来要给Softmax Regression模型中的 weights和 biases 建立 Variable 对象，Variable是用来存储模型参数的，不一样于存储数据的 tensor 一旦使用掉就会消失，Variable 在模型训练迭代中是持久的，它能够长期存在而且在每轮迭代中被更新。咱们把 weights 和 biases 所有初始化为0，由于模型训练时会自动学习适合的值，因此对这个简单模型来讲初始值不过重要，不过对于复杂的卷积网络，循环网络或者比较深的全链接网络，初始化的方法就比较重要，甚至能够说相当重要。注意这里的W的shape是 [784, 10] 中， 784是特征的维数，然后面的10 表明有10类，由于Label在 one-hot编码后是10维的向量。

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

　　接下来要实现 Softmax Regression 算法，咱们将上面提到的 y = softmax(Wx+b)，改成 TensorFlow的语言就是下面的代码：

y = tf.nn.softmax(tf.matmul(x, W) + b)

　　Softmax是 tf.nn 下面的一个函数，而 tf.nn 则包含了大量的神经网络的组件， tf.matmul是TensorFlow中矩阵乘法函数。咱们就定义了简单的 Softmax Regression，语法和直接写数学公式很像。然而Tensorflow最厉害的地方还不是定义公式，而是将forward和backward的内容都自动实现。因此咱们接下来定义好loss，训练时将会自动求导并进行梯度降低，完成对 Softmax Regression模型参数的自动学习。

训练模型的步骤

　　为了训练模型，咱们首先须要定义一个指标来评估这个模型是好的。其实，在机器学习中，咱们一般定义指标来表示一个模型是坏的，这个指标称为成本（cost）或者损失（loss），而后近邻最小化这个指标。可是这两种方式是相同的。

　　一个很是常见的的成本函数是“交叉熵（cross-entropy）”。交叉熵产生于信息论里面的信息压缩编码技术，可是后来他演变为从博弈论到机器学习等其余领域里的重要技术手段，它的定义以下：

　　Y是咱们预测的几率分布， y' 是实际的分布（咱们输入的 one-hot vector）。比较粗糙的理解是，交叉熵是用来衡量咱们的预测用于描述真相的低效性。

　　为了计算交叉熵，咱们首先须要添加一个新的占位符用于输入正确值：

y_ = tf.placeholder("float", [None,10])

　　而后咱们能够用交叉熵的公式计算交叉熵：

cross_entropy = -tf.reduce_sum(y_*tf.log(y))

　　或者

cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),
                                              reduction_indices=[1]))

　　（这里咱们 y_ * tf.log(y)是前面公式的交叉熵公式，而 tf.reduce_sum就是求和，而tf.reduce_mean则是用来对每一个 batch 的数据结果求均值。）

　　首先，用 tf.log 计算 y 的每一个元素的对数。接下来，咱们把 y_ 的每个元素和 tf.log(y) 的对应元素相乘。最后用 tf.reduce_sum 计算张量的全部元素总和。（注意：这里的交叉熵不只仅用来衡量单一的一对预测和真实值，而是全部的图片的交叉熵的总和。对于全部的数据点的预测表现比单一数据点的表现能更好的描述咱们的模型的性能）。

　　当咱们知道咱们须要咱们的模型作什么的时候，用TensorFlow来训练它是很是容易的。由于TensorFlow拥有一张描述咱们各个计算单元的图，它能够自动的使用反向传播算法（backpropagation algorithm）来有效的肯定变量是如何影响想要最小化的那个成本值。而后TensorFlow会用选择的优化算法来不断地修改变量以下降成本。

train_step = tf.train.GradientDescentOptimizer(0.01).minimize(cross_entropy)

　　在这里，咱们要求TensorFlow用梯度降低算法（gradient descent alogrithm）以0.01的学习速率最小化交叉熵。梯度降低算法（gradient descent algorithm）是一个简单的学习过程，TensorFlow只需将每一个变量一点点地往使成本不断下降的方向移动。固然TensorFlow也提供了其余许多优化算法：只要简单地调整一行代码就可使用其余的算法。

　　TensorFlow在这里实际上所作的是，它会在后台给描述你的计算的那张图里面增长一系列新的计算操做单元用于实现反向传播算法和梯度降低算法。而后，它返回给你的只是一个单一的操做，当运行这个操做时，它用梯度降低算法训练你的模型，微调你的变量，不断减小成本。

　　如今，咱们已经设置好了咱们的模型，在运行计算以前，咱们须要添加一个操做来初始化咱们建立的变量并初始化，可是咱们能够在一个Session里面启动咱们的模型：

with tf.Session() as sess:
    tf.global_variables_initializer().run()

　　而后开始训练模型，这里咱们让模型训练训练1000次：

for i in range(1000):
  batch_xs, batch_ys = mnist.train.next_batch(100)
  sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys})

　　或者

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y:batch_ys})

　　该训练的每一个步骤中，咱们都会随机抓取训练数据中的100个批处理数据点，而后咱们用这些数据点做为参数替换以前的占位符来运行 train_step。并且咱们能够经过 feed_dict 将 x 和 y_ 张量占位符用训练数据替代。（注意：在计算图中，咱们能够用 feed_dict 来替代任何张量，并不只限于替换占位符）

　　使用一小部分的随机数据来进行训练被称为随机训练（stochastic training）- 在这里更确切的说是随机梯度降低训练。在理想状况下，咱们但愿用咱们全部的数据来进行每一步的训练，由于这能给咱们更好的训练结果，但显然这须要很大的计算开销。因此，每一次训练咱们可使用不一样的数据子集，这样作既能够减小计算开销，又能够最大化地学习到数据集的整体特性。

　　如今咱们已经完成了训练，接下来就是对模型的准确率进行验证，下面代码中的 tf.argmax是从一个 tensor中寻找最大值的序号， tf.argmax(y, 1)就是求各个预测的数字中几率最大的一个，而 tf.argmax(y_, 1)则是找到样本的真实数字类别。而 tf.equal方法则是用来判断预测数字类别是否就是正确的类别，最后返回计算分类是否正确的操做 correct_predition。

保存检查点（checkpoint）的步骤

　　为了获得能够用来后续恢复模型以进一步训练或评估的检查点文件（checkpoint file），咱们实例化一个 tf.train.Saver。

saver = tf.train.Saver()

　　在训练循环中，将按期调用 saver.save() 方法，向训练文件夹中写入包含了当前全部可训练变量值的检查点文件。

saver.save(sess, FLAGS.train_dir, global_step=step)

　　这样，咱们之后就可使用 saver.restore() 方法，重载模型的参数，继续训练。

saver.restore(sess, FLAGS.train_dir)

评估模型的步骤

　　那么咱们的模型性能如何呢？

　　首先让咱们找出那些预测正确的标签。tf.argmax 是一个很是有用的函数，它能给出某个 tensor 对象在某一维上的其数据最大值所在的索引值。因为标签向量是由0， 1 组成，因此最大值1所在的索引位置就是类别标签，好比 tf.argmax(y, 1) 返回的是模型对于任一输入 x 预测到的标签值，而 tf.argmax(y_, 1) 表明正确的标签，咱们能够用 tf.equal 来检测咱们的预测是否真实标签匹配（索引位置同样表示匹配）。

correct_prediction = tf.equal(tf.argmax(y,1), tf.argmax(y_,1))

　　这里返回一个布尔数组，为了计算咱们分类的准确率，咱们使用cast将布尔值转换为浮点数来表明对，错，而后取平均值。例如： [True, False, True, True] 变为 [1, 0, 1, 1]，计算出平均值为 0.75。

accuracy = tf.reduce_mean(tf.cast(correct_prediction, "float"))

　　最后，咱们能够计算出在测试数据上的准确率，大概是91%。

print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

　　下面咱们总结一下实现了上面简单机器学习算法 Softmax Regression，这能够算做是一个没有隐含层的最浅的神经网络，咱们来总结一下整个流程，咱们作的事情分为下面四部分：

1，定义算法公式，也就是神经网络forward时的计算
2，定义loss，选定优化器，并指定优化器优化loss
3，迭代地对数据进行训练
4，在测试集或验证集上对准确率进行评测

　　这几个步骤是咱们使用Tensorflow进行算法设计，训练的核心步骤，也将会贯穿全部神经网络。

　　代码以下：

#_*_coding:utf-8_*_
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# 首先咱们直接加载TensorFlow封装好的MNIST数据   28*28=784
mnist = input_data.read_data_sets('MNIST_data/', one_hot=True)
print(mnist.train.images.shape, mnist.train.labels.shape)
print(mnist.test.images.shape, mnist.test.labels.shape)
print(mnist.validation.images.shape, mnist.validation.labels.shape)
'''
(55000, 784) (55000, 10)
(10000, 784) (10000, 10)
(5000, 784) (5000, 10)
'''

sess = tf.InteractiveSession()
x = tf.placeholder(tf.float32, [None, 784])

W = tf.Variable(tf.zeros([784, 10]))
b = tf.Variable(tf.zeros([10]))

y = tf.nn.softmax(tf.matmul(x, W) + b)

y_ = tf.placeholder(tf.float32, [None, 10])
cross_entropy = tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(y),
                                              reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

tf.global_variables_initializer().run()

for i in range(1000):
    batch_xs, batch_ys = mnist.train.next_batch(100)
    train_step.run({x: batch_xs, y_: batch_ys})

correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))

accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))

print(accuracy.eval(feed_dict={x: mnist.test.images, y_: mnist.test.labels}))

传统神经网络训练MNIST数据

1，训练步骤

输入层：拉伸的手写文字图像，维度为 [-1, 28*28]
全链接层：500个节点，维度为[-1, 500]
输出层：分类输出，维度为 [-1, 10]

2，训练代码及其结果

　　传统的神经网络训练代码：

# _*_coding:utf-8_*_
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

BATCH_SIZE = 100  # 训练批次
INPUT_NODE = 784  # 784=28*28 输入节点
OUTPUT_NODE = 10  # 输出节点
LAYER1_NODE = 784  # 层级节点
TRAIN_STEP = 10000  # 训练步数
LEARNING_RATE = 0.01
L2NORM_RATE = 0.001


def train(mnist):
    # define input placeholder
    # None表示此张量的第一个维度能够是任何长度的
    input_x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='input_x')
    input_y = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='input_y')

    # define weights and biases
    w1 = tf.Variable(tf.truncated_normal(shape=[INPUT_NODE, LAYER1_NODE], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))

    w2 = tf.Variable(tf.truncated_normal(shape=[LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    layer1 = tf.nn.relu(tf.nn.xw_plus_b(input_x, w1, b1))
    y_hat = tf.nn.xw_plus_b(layer1, w2, b2)
    print("1 step ok!")

    # define loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=input_y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    regularization = tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2) + tf.nn.l2_loss(b1) + tf.nn.l2_loss(b2)
    loss = cross_entropy_mean + L2NORM_RATE * regularization
    print("2 step ok")

    # define accuracy
    correct_predictions = tf.equal(tf.argmax(y_hat, 1), tf.argmax(input_y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
    print("3 step ok")

    # train operation
    global_step = tf.Variable(0, trainable=False)
    train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss, global_step=global_step)
    print("4 step ok ")

    with tf.Session() as sess:
        tf.global_variables_initializer().run()  # 初始化建立的变量
        print("5 step OK")

        # 开始训练模型
        for i in range(TRAIN_STEP):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            feed_dict = {
                input_x: xs,
                input_y: ys,
            }
            _, step, train_loss, train_acc = sess.run([train_op, global_step, loss, accuracy], feed_dict=feed_dict)
            if (i % 100 == 0):
                print("After %d steps, in train data, loss is %g, accuracy is %g." % (step, train_loss, train_acc))

        test_feed = {input_x: mnist.test.images, input_y: mnist.test.labels}
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d steps, in test data, accuracy is %g." % (TRAIN_STEP, test_acc))


if __name__ == '__main__':
    mnist = input_data.read_data_sets("data", one_hot=True)
    print("0 step ok!")
    train(mnist)

　　结果以下：

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
0 step ok!
1 step ok!
2 step ok
3 step ok
4 step ok 
5 step OK
After 1 steps, in train data, loss is 5.44352, accuracy is 0.03.
After 101 steps, in train data, loss is 0.665012, accuracy is 0.95.

...  ...

After 9901 steps, in train data, loss is 0.304703, accuracy is 0.96.
After 10000 steps, in test data, accuracy is 0.9612.

3，部分代码解析：

占位符

　　咱们经过为输入图像和目标输出类别建立节点，来开始构建计算图：

x = tf.placeholder("float", [None, 784])

　　x 不是一个特定的值，而是一个占位符 placeholder，咱们在TensorFlow运行计算时输入这个值咱们但愿可以输入任意数量的MNIST图像，每张图展平为784维（28*28）的向量。咱们用二维的浮点数张量来表示这些图，这个张量的形状是 [None, 784]，这里的None表示此张量的第一个维度能够是任意长度的。

　　虽然palceholder 的shape参数是可选的，可是有了它，TensorFlow可以自动捕捉因数据维度不一致致使的错误。

变量

　　一个变量表明着TensorFlow计算图中的一个值，可以在计算过程当中使用，甚至进行修改。在机器学习的应用过程当中，模型参数通常用Variable来表示。

w1 = tf.Variable(tf.truncated_normal(shape=[INPUT_NODE, LAYER1_NODE], stddev=0.1))
b1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))

　　咱们的模型也须要权重值和偏置量，固然咱们能够将其当作是另外的输入（使用占位符），这里使用Variable。一个Variable表明一个可修改的张量，它能够用于计算输入值，也能够在计算中被修改。对于各类机器学习英语，通常都会有模型参数，能够用Variable来表示。

　　变量须要经过session初始化后，才能在session中使用，这一初始化的步骤为，为初始化指定具体值，并将其分配给每一个变量，能够一次性为全部变量完成此操做。

sess.run(tf.initialize_all_variables())

卷积神经网络-CNN训练MNIST数据

　　在MNIST上只有96%的正确率是在是糟糕，因此这里咱们用一个稍微复杂的模型：卷积神经网络来改善效果，这里大概会达到100%的准确率。

1，训练步骤

输入层：手写文字图像，维度为[-1, 28, 28, 1]
卷积层1：filter的shape为5*5*32，strides为1，padding为“SAME”。卷积后维度为 [-1, 28, 28, 32]
池化层2：max-pooling，ksize为2*2，步长为2，池化后维度为 [-1, 14, 14, 32]
卷积层3：filter的shape为5*5*64，strides为1，padding为“SAME”。卷积后维度为 [-1, 14, 14, 64]
池化层4：max-pooling，ksize为2*2，步长为2。池化后维度为 [-1, 7, 7, 64]
池化后结果展开：将池化层4 feature map展开， [-1, 7, 7, 64] ==> [-1, 3136]
全链接层5：输入维度 [-1, 3136]，输出维度 [-1, 512]
全链接层6：输入维度 [-1, 512]，输出维度[-1, 10]。通过softmax后，获得分类结果。

2，训练代码及其结果展现

　　卷积神经网络的训练代码以下：

# _*_coding:utf-8_*_
import tensorflow as tf
import numpy as np
from tensorflow.examples.tutorials.mnist import input_data

BATCH_SIZE = 100  # 训练批次
INPUT_NODE = 784  # 784=28*28 输入节点
OUTPUT_NODE = 10  # 输出节点
LAYER1_NODE = 784  # 层级节点
TRAIN_STEP = 10000  # 训练步数
LEARNING_RATE = 0.01
L2NORM_RATE = 0.001


def train(mnist):
    # define input placeholder
    # None表示此张量的第一个维度能够是任何长度的
    input_x = tf.placeholder(tf.float32, [None, INPUT_NODE], name='input_x')
    input_y = tf.placeholder(tf.float32, [None, OUTPUT_NODE], name='input_y')

    # define weights and biases
    w1 = tf.Variable(tf.truncated_normal(shape=[INPUT_NODE, LAYER1_NODE], stddev=0.1))
    b1 = tf.Variable(tf.constant(0.1, shape=[LAYER1_NODE]))

    w2 = tf.Variable(tf.truncated_normal(shape=[LAYER1_NODE, OUTPUT_NODE], stddev=0.1))
    b2 = tf.Variable(tf.constant(0.1, shape=[OUTPUT_NODE]))

    layer1 = tf.nn.relu(tf.nn.xw_plus_b(input_x, w1, b1))
    y_hat = tf.nn.xw_plus_b(layer1, w2, b2)
    print("1 step ok!")

    # define loss
    cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=input_y)
    cross_entropy_mean = tf.reduce_mean(cross_entropy)
    regularization = tf.nn.l2_loss(w1) + tf.nn.l2_loss(w2) + tf.nn.l2_loss(b1) + tf.nn.l2_loss(b2)
    loss = cross_entropy_mean + L2NORM_RATE * regularization
    print("2 step ok")

    # define accuracy
    correct_predictions = tf.equal(tf.argmax(y_hat, 1), tf.argmax(input_y, 1))
    accuracy = tf.reduce_mean(tf.cast(correct_predictions, tf.float32))
    print("3 step ok")

    # train operation
    global_step = tf.Variable(0, trainable=False)
    train_op = tf.train.AdamOptimizer(LEARNING_RATE).minimize(loss, global_step=global_step)
    print("4 step ok ")

    with tf.Session() as sess:
        tf.global_variables_initializer().run()  # 初始化建立的变量
        print("5 step OK")

        # 开始训练模型
        for i in range(TRAIN_STEP):
            xs, ys = mnist.train.next_batch(BATCH_SIZE)
            feed_dict = {
                input_x: xs,
                input_y: ys,
            }
            _, step, train_loss, train_acc = sess.run([train_op, global_step, loss, accuracy], feed_dict=feed_dict)
            if (i % 100 == 0):
                print("After %d steps, in train data, loss is %g, accuracy is %g." % (step, train_loss, train_acc))

        test_feed = {input_x: mnist.test.images, input_y: mnist.test.labels}
        test_acc = sess.run(accuracy, feed_dict=test_feed)
        print("After %d steps, in test data, accuracy is %g." % (TRAIN_STEP, test_acc))


if __name__ == '__main__':
    mnist = input_data.read_data_sets("data", one_hot=True)
    print("0 step ok!")
    train(mnist)

　　训练结果以下：

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
0 step ok!
1 step ok!
2 step OK!
3 step OK!
4 step OK!
5 step ok!
6 step OK
7 step ok!
8 step OK!
9, step OK!
step:1 ,train loss:10.3222, train_acc:0.17
step:101 ,train loss:0.727434, train_acc:0.93
step:201 ,train loss:0.829298, train_acc:0.84
step:301 ,train loss:0.630073, train_acc:0.95
step:401 ,train loss:0.53797, train_acc:0.95
step:501 ,train loss:0.624309, train_acc:0.92
step:601 ,train loss:0.412459, train_acc:1
step:701 ,train loss:0.392073, train_acc:1
... ...
step:2601 ,train loss:0.173203, train_acc:1
step:2701 ,train loss:0.226434, train_acc:0.99
step:2801 ,train loss:0.172851, train_acc:0.99
step:2901 ,train loss:0.227201, train_acc:0.96
After 3000 training steps, in test dataset, loss is 0.165731, acc is 0.989

3，部分代码解析：

初始化权重

　　为了避免在创建模型的时候反复作初始化的操做，咱们定义两个函数用于初始化。

# 定义W初始化函数
def get_weights(shape):
    form = tf.truncated_normal(shape, stddev=0.1)
    return tf.Variable(form)


# 定义b初始化函数
def get_biases(shape):
    form = tf.constant(0.1, shape=shape)
    return tf.Variable(form)

　　为了建立此模型，咱们须要建立大量的权重和偏置项。这个模型中的权重在初始化时应该加入少许的噪声来打破对称性以及避免0梯度。因为咱们使用的是ReLU神经元，所以比较好的作法是用一个较小的正数来初始化偏置项，以免神经元节点输出恒为0 的问题（dead neurons）。为了避免在创建模型的时候反复作初始化操做，咱们定义两个函数用于初始化。

卷积和池化

　　TensorFlow在卷积和池化上有很强的灵活性，那么如何处理边界？步长应该设置多大？这里咱们卷积使用的是1步长（stride size），0边距（padding size）。保证输出和输出的大小相同，咱们的池化是用简单的2*2大小的模板作 max pooling 。

第一层卷积

　　如今咱们开始实现第一层卷积，它是由一个卷积接一个 max pooling 完成。卷积在每一个5*5的patch中算出 32个特征。卷积的权重张量形状是 [5, 5, 1, 32]，前两个维度是patch的大小，接着是输出通道的数目，最后是输出的通道数目。为了使用这一层卷积，咱们将x 变成一个4d向量，其第二，第三对应图片的宽，高。最后一维表明输出的通道数目。

　　最后咱们将x 和权重向量进行卷积，加上偏置项，而后应用ReLU激活函数，代码以下：

# 第一层：卷积层conv1
'''
input: [-1, 28, 28, 1]
filter: [5, 5, 32]
output: [-1, 28, 28, 32]
'''
with tf.name_scope("conv1"):
    w = get_weights([FILTER1_SIZE, FILTER1_SIZE, 1, FILTER1_NUM])
    b = get_biases([FILTER1_NUM])
    conv1_op = tf.nn.conv2d(
        input=input_x,
        filter=w,
        strides=[1, 1, 1, 1],
        padding="SAME",
        name='conv1_op'
    )

    conv1 = tf.nn.relu(tf.nn.bias_add(conv1_op, b), name='relu')

第二层池化

　　　前面也说过，咱们的池化用简单传统的2*2大小模板作 max pooling .

# 第二层：池化层pooling2
'''
input: [-1, 28, 28, 32]
output: [-1. 14, 14, 32]
'''
with tf.name_scope('pooling2'):
    pooling2 = tf.nn.max_pool(
        value=conv1,
        ksize=[1, 2, 2, 1],
        strides=[1, 2, 2, 1],
        padding='SAME',
        name='pooling1'
    )

第三层卷积

　　为了构建一个更深的网络，咱们会把几个相似的层堆叠起来，第二层中，每一个5*5的patch 会获得64个特征，代码以下：

# 第三次：卷积层conv3
'''
input" [-1, 14, 14, 32]
filter: [5, 5, 64]
output: [-1, 14, 14, 64]
'''
with tf.name_scope('conv3'):
    w = get_weights([FILTER3_SIZE, FILTER3_SIZE, FILTER1_NUM, FILTER3_NUM])
    b = get_biases([FILTER3_NUM])
    conv3_op = tf.nn.conv2d(
        input=pooling2,
        filter=w,
        strides=[1, 1, 1, 1],
        padding="SAME",
        name='conv3_op'
    )
    conv3 = tf.nn.relu(tf.nn.bias_add(conv3_op, b), name='relu')

第四层池化

　　而后进行池化，随机减小特征：

# 第四层：池化层pooling4
    '''
    input: [-1, 14, 14, 64]
    output: [-1, 7, 7, 64]
    '''
    with tf.name_scope("pooling4"):
        pooling4 = tf.nn.max_pool(
            value=conv3,
            ksize=[1, 2, 2, 1],
            strides=[1, 2, 2, 1],
            padding="SAME",
            name="pooling4"
        )

全链接层

　　如今图片的尺寸减小到7*7，咱们加入一个3136个神经元的全链接层，用于处理整个图片。咱们先把池化层输出的张量reshape成一些向量，也就是展开它。而后输入其权重公式，偏置项公式，而后求其L2_Loss函数，最后代码以下：

# 池化结果展开
'''
input: [-1, 7,7,64]
output: [-1, 3136]
'''
pooling4_flat = tf.reshape(pooling4, [-1, FLAT_SIZE])
print("5 step ok!")

# 第五层：全链接层 FC5
'''
input: [-1, 3136]  (56*56=3136)
output: [-1, 512]
'''
with tf.name_scope('fc5'):
    w = get_weights([FLAT_SIZE, FC5_SIZE])
    b = get_biases([FC5_SIZE])
    fc5 = tf.nn.relu(tf.nn.xw_plus_b(pooling4_flat, w, b, name='fc5'), name='relu')
    fc5_drop = tf.nn.dropout(fc5, droput_keep_prob)
    l2_loss += tf.nn.l2_loss(w) + tf.nn.l2_loss(b)
print("6 step OK")

# 第六层：全链接层（输出）
'''
input: [-1, 512]
output: [-1, 10]
'''
with tf.name_scope('fc6'):
    w = get_weights([FC5_SIZE, OUTPUT_SIZE])
    b = get_biases([OUTPUT_SIZE])
    y_hat = tf.nn.xw_plus_b(fc5_drop, w, b, name='y_hat')
    l2_loss += tf.nn.l2_loss(w) + tf.nn.l2_loss(b)
print("7 step ok!")

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits=y_hat, labels=input_y)
cross_entropy_mean = tf.reduce_mean(cross_entropy)
loss = cross_entropy_mean + L2NORM_RATE * l2_loss

简易前馈神经网络训练MNIST数据

　　下面利用TensorFlow使用MNIST数据集训练并评估一个用于识别手写数字的简易前馈神经网络（feed-forward neural network）。

1，代码以下：

　　其中说明一下下面两个文件的目的：

1，full_connected_mnist.py 构建一个彻底链接（fully connected）的MNIST模型所需的代码
2，full_connected_feed.py 利用下载的数据集训练构建好的MNIST模型的主要代码

　　full_connected_mnist.py代码以下：

# _*_coding:utf-8_*_
"""Builds the MNIST network.
# 为模型构建实现推理 损失 训练的模型
Implements the inference/loss/training pattern for model building.
# 推理  根据运行网络的须要 构建模型向前作预测
1. inference() - Builds the model as far as required for running the network forward to make predictions.
#  将生成loss所需的层添加到推理模型中
2. loss() - Adds to the inference model the layers required to generate loss.
# 训练  将生成和所需的ops添加到损失模型中并应用梯度
3. training() - Adds to the loss model the Ops required to generate and apply gradients.
This file is used by the various "fully_connected_*.py" files and not meant to
be run.
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import math

import tensorflow as tf

# The MNIST dataset has 10 classes representing the digits 0 through 9
NUM_CLASSES = 10

# The MNIST images are always 28*28 pixels
IMAGES_SIZE = 28
IMAGES_PIXELS = IMAGES_SIZE * IMAGES_SIZE


def inference(images, hidden1_units, hidden2_units):
    """Build the MNIST model up to where it may be used for inference.
  Args:
    images: Images placeholder, from inputs().
    hidden1_units: Size of the first hidden layer.
    hidden2_units: Size of the second hidden layer.
  Returns:
    softmax_linear: Output tensor with the computed logits.
  """
    # hidden1
    with tf.name_scope('hidden1'):
        weights = tf.Variable(
            tf.truncated_normal([IMAGES_PIXELS, hidden1_units],
                                stddev=1.0 / math.sqrt(float(IMAGES_PIXELS))),
            name='weights'
        )
        biases = tf.nn.relu(tf.matmul([hidden1_units]),
                            name='weights')
        hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)
    # hidden2
    with tf.name_scope('hidden2'):
        weights = tf.Variable(
            tf.truncated_normal([hidden1_units, hidden2_units],
                                stddev=1.0 / math.sqrt(float(hidden1_units))),
            name='weights'
        )
        biases = tf.Variable(tf.zeros([hidden2_units]),
                             name='biases')
        hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)
    # Linear
    with tf.name_scope('softmax_linear'):
        weights = tf.Variable(
            tf.truncated_normal([hidden2_units, NUM_CLASSES],
                                stddev=1.0 / math.sqrt(float(hidden2_units))),
            name='weights'
        )
        biases = tf.nn.relu(tf.zeros([NUM_CLASSES]),
                            name='biases')
        logits = tf.matmul(hidden2, weights) + biases
    return logits


def loss(logits, labels):
    """Calculates the loss from the logits and the labels.
  Args:
    logits: Logits tensor, float - [batch_size, NUM_CLASSES].
    labels: Labels tensor, int32 - [batch_size].
  Returns:
    loss: Loss tensor of type float.
  """
    labels = tf.to_int64(labels)
    return tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=logits)


def training(loss, learning_rate):
    """Sets up the training Ops.
  Creates a summarizer to track the loss over time in TensorBoard.
  Creates an optimizer and applies the gradients to all trainable variables.
  The Op returned by this function is what must be passed to the
  `sess.run()` call to cause the model to train.
  Args:
    loss: Loss tensor, from loss().
    learning_rate: The learning rate to use for gradient descent.
  Returns:
    train_op: The Op for training.
  """
    # add a scalar summary for the snapshot loss
    tf.summary.scalar('loss', loss)
    # create the gradient descent optimizer with the given learning rate
    optimizer = tf.train.GradientDescentOptimizer(learning_rate)
    # create a variable to track the global step
    global_step = tf.Variable(0, name='global_step', trainable=False)
    # Use the optimizer to apply the gradients that minimize the loss
    # (and also increment the global step counter) as a single training step.
    train_op = optimizer.minimize(loss, global_step=global_step)
    return train_op


def evaluation(logits, labels):
    """Evaluate the quality of the logits at predicting the label.
    Args:
      logits: Logits tensor, float - [batch_size, NUM_CLASSES].
      labels: Labels tensor, int32 - [batch_size], with values in the
        range [0, NUM_CLASSES).
    Returns:
      A scalar int32 tensor with the number of examples (out of batch_size)
      that were predicted correctly.
    """
    # For a classifier model, we can use the in_top_k Op.
    # It returns a bool tensor with shape [batch_size] that is true for
    # the examples where the label is in the top k (here k=1)
    # of all logits for that example.
    correct = tf.nn.in_top_k(logits, labels, 1)
    # Return the number of true entries.
    return tf.reduce_sum(tf.cast(correct, tf.int32))

　　full_connected_feed.py代码以下：

#_*_coding:utf-8_*_
'''Trains and Evaluates the MNIST  network using a feed dictionary'''
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

# pulint:disable=missing-docstring
import argparse
import os
import sys
import time

from six.moves import xrange   # pylint: disable=redefined-builtin
import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data
from tensorflow.examples.tutorials.mnist import mnist

# Basic model parameters as external flags
FLAGS = None

def placeholder_inputs(batch__size):
    '''
        生成一个占位符变量来表述输入张量
        这些占位符被模型构建的其他部分用做输入代码，并将从下面 .run() 循环中下载数据提供
    :param batch__size:
    :return: 返回一个 图像的占位符，一个标签占位符
    '''
    # Note that the shapes of the placeholders match the shapes of the full
    # image and label tensors, except the first dimension is now batch_size
    # rather than the full size of the train or test data sets.
    images_placeholder = tf.placeholder(tf.float32, shape=(batch__size, mnist.IMAGE_PIXELS))
    labels_placeholder = tf.placeholder(tf.int32, shape=(batch__size))
    return images_placeholder, labels_placeholder

def fill_feed_dict(data_set, images_pl, labels_pl):
    """Fills the feed_dict for training the given step.
      A feed_dict takes the form of:
      feed_dict = {
          <placeholder>: <tensor of values to be passed for placeholder>,
          ....
      }
      Args:
        data_set: The set of images and labels, from input_data.read_data_sets()
        images_pl: The images placeholder, from placeholder_inputs().
        labels_pl: The labels placeholder, from placeholder_inputs().
      Returns:
        feed_dict: The feed dictionary mapping from placeholders to values.
      """
    # Create the feed_dict for the placeholders filled with the next
    # `batch size` examples.
    images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size, FLAGS.fake_data)
    feed_dict = {
        images_pl: images_feed,
        labels_pl: labels_feed,
    }
    return feed_dict

def do_evaluation(sess, eval_correct, images_placeholder, labels_placeholder, data_set):
    '''
        对阵个数据进行评估
    :param sess: The session in which the model has been trained.
    :param eval_correct: The Tensor that returns the number of correct predictions.
    :param images_placeholder: The images placeholder.
    :param labels_placeholder: The labels placeholder.
    :param data_set: The set of images and labels to evaluate, from
      input_data.read_data_sets().
    :return:
    '''
    # and run one epoch of eval
    true_count = 0  # counts the number of correct predictions
    steps_per_epoch = data_set.num_examples // FLAGS.batch_size
    num_examples = steps_per_epoch * FLAGS.batch_size
    for step in range(steps_per_epoch):
        feed_dict = fill_feed_dict(data_set, images_placeholder, labels_placeholder)
        true_count += sess.run(eval_correct, feed_dict=feed_dict)
    precision = float(true_count) / num_examples
    print('Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' %
          (num_examples, true_count, precision))

def run_training():
    '''Train MNIST for  a number of steps'''
    # get the sets of images and labels for training  validation and test on MNIST
    data_sets = input_data.read_data_sets(FLAGS.input_data_dir, FLAGS.fake_data)

    # Tell Tensorflow that the model will be built into the default graph
    with tf.Graph().as_default():
        #Generate placeholders for the images and labels
        images_placeholder, labels_placeholder = placeholder_inputs(FLAGS.batch_size)

    # build a graph that computes predictions from the inference model
    logits = mnist.inference(images_placeholder, FLAGS.hidden1, FLAGS.hidden2)

    # add to graph the ops for loss calculation
    loss = mnist.loss(logits, labels_placeholder)

    # add to the graph the ops that calculate and apply gradients
    train_op = mnist.training(loss, FLAGS.learning_rate)

    # add the Op to compare the logits to the labels during evaluation
    eval_correct = mnist.evaluation(logits, labels_placeholder)

    # bulid the summary Tensor based on the TF collection of Summaries
    summary = tf.summary.merge_all()

    # add the variable initializer Op
    init = tf.global_variables_initializer()

    # create a saver fro writing training checkpoint
    saver = tf.train.Saver()

    # create a session for running Ops on the Graph
    sess = tf.compat.v1.Session()

    # Instantiate a SummaryWriter to output summaries and the graph
    summary_writer = tf.summary.FileWriter(FLAGS.log_dir, sess.graph)

    # and then after everything is built

    # run the Op to initialize the variables
    sess.run(init)

    # start the training loop
    for step in range(FLAGS.max_steps):
        start_time = time.time()

        # Fill a feed dictionary with the actual set of images and labels
        # for this particular training step.
        feed_dict = fill_feed_dict(data_sets.train,
                                   images_placeholder,
                                   labels_placeholder)

        # Run one step of the model.  The return values are the activations
        # from the `train_op` (which is discarded) and the `loss` Op.  To
        # inspect the values of your Ops or variables, you may include them
        # in the list passed to sess.run() and the value tensors will be
        # returned in the tuple from the call.
        _, loss_value = sess.run([train_op, loss],
                                 feed_dict=feed_dict)

        duration = time.time() - start_time

        # write the summaries and print an overview fairly often
        if step % 100 == 0:
            # Print status to stdout.
            print('Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration))
            # Update the events file.
            summary_str = sess.run(summary, feed_dict=feed_dict)
            summary_writer.add_summary(summary_str, step)
            summary_writer.flush()

        # Save a checkpoint and evaluate the model periodically.
        if (step + 1) % 1000 == 0 or (step + 1) == FLAGS.max_steps:
            checkpoint_file = os.path.join(FLAGS.log_dir, 'model.ckpt')
            saver.save(sess, checkpoint_file, global_step=step)
            # Evaluate against the training set.
            print('Training Data Eval:')
            do_evaluation(sess,
                    eval_correct,
                    images_placeholder,
                    labels_placeholder,
                    data_sets.train)
            # Evaluate against the validation set.
            print('Validation Data Eval:')
            do_evaluation(sess,
                    eval_correct,
                    images_placeholder,
                    labels_placeholder,
                    data_sets.validation)
            # Evaluate against the test set.
            print('Test Data Eval:')
            do_evaluation(sess,
                    eval_correct,
                    images_placeholder,
                    labels_placeholder,
                    data_sets.test)


def main():
    if tf.gfile.Exists(FLAGS.log_dir):
        tf.gfile.DeleteRecursively(FLAGS.log_dir)
    tf.gfile.MakeDirs(FLAGS.log_dir)
    run_training()


def main(_):
  if tf.gfile.Exists(FLAGS.log_dir):
    tf.gfile.DeleteRecursively(FLAGS.log_dir)
  tf.gfile.MakeDirs(FLAGS.log_dir)
  run_training()


if __name__ == '__main__':
  parser = argparse.ArgumentParser()
  parser.add_argument(
      '--learning_rate',
      type=float,
      default=0.01,
      help='Initial learning rate.'
  )
  parser.add_argument(
      '--max_steps',
      type=int,
      default=2000,
      help='Number of steps to run trainer.'
  )
  parser.add_argument(
      '--hidden1',
      type=int,
      default=128,
      help='Number of units in hidden layer 1.'
  )
  parser.add_argument(
      '--hidden2',
      type=int,
      default=32,
      help='Number of units in hidden layer 2.'
  )
  parser.add_argument(
      '--batch_size',
      type=int,
      default=100,
      help='Batch size.  Must divide evenly into the dataset sizes.'
  )
  parser.add_argument(
      '--input_data_dir',
      type=str,
      default=os.path.join(os.getenv('TEST_TMPDIR', ''),
                           'input_data'),
      help='Directory to put the input data.'
  )
  parser.add_argument(
      '--log_dir',
      type=str,
      default=os.path.join(os.getenv('TEST_TMPDIR', ''),
                           'fully_connected_feed'),
      help='Directory to put the log data.'
  )
  parser.add_argument(
      '--fake_data',
      default=False,
      help='If true, uses fake data for unit testing.',
      action='store_true'
  )

  FLAGS, unparsed = parser.parse_known_args()
  tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

　　注意事项：

此处参数注意修改！！

　　结果展现：

Extracting data\train-images-idx3-ubyte.gz
Extracting data\train-labels-idx1-ubyte.gz
Extracting data\t10k-images-idx3-ubyte.gz
Extracting data\t10k-labels-idx1-ubyte.gz
Step 0: loss = 2.30 (0.114 sec)
Step 100: loss = 2.11 (0.002 sec)
Step 200: loss = 1.92 (0.002 sec)
Step 300: loss = 1.49 (0.002 sec)
Step 400: loss = 1.22 (0.002 sec)
Step 500: loss = 0.95 (0.003 sec)
Step 600: loss = 0.73 (0.003 sec)
Step 700: loss = 0.58 (0.004 sec)
Step 800: loss = 0.58 (0.002 sec)
Step 900: loss = 0.50 (0.002 sec)
Training Data Eval:
Num examples: 55000  Num correct: 47310  Precision @ 1: 0.8602
Validation Data Eval:
Num examples: 5000  Num correct: 4365  Precision @ 1: 0.8730
Test Data Eval:
Num examples: 10000  Num correct: 8628  Precision @ 1: 0.8628
Step 1000: loss = 0.51 (0.017 sec)
Step 1100: loss = 0.46 (0.104 sec)
Step 1200: loss = 0.50 (0.002 sec)
Step 1300: loss = 0.40 (0.003 sec)
Step 1400: loss = 0.57 (0.003 sec)
Step 1500: loss = 0.40 (0.003 sec)
Step 1600: loss = 0.42 (0.003 sec)
Step 1700: loss = 0.44 (0.003 sec)
Step 1800: loss = 0.41 (0.002 sec)
Step 1900: loss = 0.37 (0.000 sec)
Training Data Eval:
Num examples: 55000  Num correct: 49375  Precision @ 1: 0.8977
Validation Data Eval:
Num examples: 5000  Num correct: 4529  Precision @ 1: 0.9058
Test Data Eval:
Num examples: 10000  Num correct: 8985  Precision @ 1: 0.8985

　　准确率比起卷积神经网络仍是差点，可是效果仍是不错的。这里咱们主要是学习这个过程，了解深度学习训练模型的过程，因此还好。

2，部分代码解析

下载

　　在run_training() 方法的一开始， input_data.read_data_sets() 函数会确保咱们的本地训练文件夹中，是否已经下载了正确的数据，而后将这些数据解压并返回一个含有DataSet实例的字典。

data_sets = input_data.read_data_sets(FLAGS.train_dir, FLAGS.fake_data)

　　注意：fake_data标记是用于单元测试的。

　　数据展现以下：

输入与占位符（inputs and placeholders）

　　placeholder_inputs() 函数将生成两个 tf.placeholder 操做，定义传入图表中的 shape 参数， shape参数中包括 batch_size 值，后续还会将实际的训练用例传入图表。

images_placeholder = tf.placeholder(tf.float32, shape=(batch_size,
                                                       IMAGE_PIXELS))

labels_placeholder = tf.placeholder(tf.int32, shape=(batch_size))

　　在训练循环（training loop）的后续步骤中，传入的整个图像和标签数据集会被切片，以符合每个操做所设置的batch_size值，占位符操做将会填补以符合这个batch_size值。而后使用feed_dict参数，将数据传入sess.run()函数。

构建图表（Build the Graph）

　　在为数据建立占位符以后，就能够运行full_connected_mnist.py文件，通过三阶段的模式函数操做： inference() loss() training() 图表就构建完成了。

inference()：尽量的构建好图表，知足促使神经网络向前反馈并做出预测的要求。
loss()：往inference 图表中添加生成损失（loss）所须要的操做（ops）
training() ：往损失函数中添加计算并应用梯度（gradients）所须要的操做

推理（Inference）

　　inference()函数会尽量的构建图表，作到返回包含了预测结果（output prediction）的Tensor。

　　它接受图像占位符为输入，在此基础上借助ReLU（Rectified Linear Units）激活函数，构建一对彻底链接层（layers），以及一个有着十个节点（node），指明了输出 logits 模型的线性层。

　　每一层都建立于一个惟一的 tf.name_scope之下，建立于该做用域之下的全部元素都将带有其前缀。

with tf.name_scope('hidden1') as scope:

　　在定义的做用域中，每一层所使用的权重和误差都在 tf.Variable实例中生成，而且包含了各自指望的shape。

weights = tf.Variable(
    tf.truncated_normal([IMAGE_PIXELS, hidden1_units],
                        stddev=1.0 / math.sqrt(float(IMAGE_PIXELS))),
    name='weights')

biases = tf.Variable(tf.zeros([hidden1_units]),
                     name='biases')

　　例如，当这些层是在hidden1 做用域下生成时，赋予权重变量的独特名称将会是“hidden1/weights”。

　　每一个变量在构建时，都会得到初始化操做（initializer ops）。

　　在这种最多见的状况下，经过tf.truncated_normal函数初始化权重变量，给赋予的shape则是一个二维tensor，其中第一个维度表明该层中权重变量所链接（connect from）的单元数量，第二个维度表明该层中权重变量所链接到的（connect to）单元数量。对于名叫hidden1的第一层，相应的维度则是[IMAGE_PIXELS, hidden1_units]，由于权重变量将图像输入链接到了hidden1层。tf.truncated_normal初始函数将根据所获得的均值和标准差，生成一个随机分布。

　　而后，经过tf.zeros函数初始化误差变量（biases），确保全部误差的起始值都是0，而它们的shape则是其在该层中所接到的（connect to）单元数量。

　　图表的三个主要操做，分别是两个tf.nn.relu操做，它们中嵌入了隐藏层所需的tf.matmul；以及logits模型所需的另一个tf.matmul。三者依次生成，各自的tf.Variable实例则与输入占位符或下一层的输出tensor所链接。

hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases)

hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases)

logits = tf.matmul(hidden2, weights) + biases

　　最后，程序会返回包含了输出结果的logits Tensor。

损失（loss）

　　loss()函数经过添加所须要的损失操做，进一步构建图表。

　　首先，labels_placeholder中的值，将被编码为一个含有1-hot values 的Tensor。例如，若是类标识符为“3”，那么该值就会被转换为：[0, 0, 0, 1, 0, 0, 0, 0, 0, 0]

batch_size = tf.size(labels)
labels = tf.expand_dims(labels, 1)
indices = tf.expand_dims(tf.range(0, batch_size, 1), 1)
concated = tf.concat(1, [indices, labels])
onehot_labels = tf.sparse_to_dense(
    concated, tf.pack([batch_size, NUM_CLASSES]), 1.0, 0.0)

　　以后，又添加一个 tf.nn.softmax_cross_entropy_with_logits操做，用来比较 inference() 函数与 1-hot 标签所输出的 logits Tensor。

cross_entropy = tf.nn.softmax_cross_entropy_with_logits(logits,
                                                        onehot_labels,
                                                        name='xentropy')

　　而后，使用tf.reduce_mean() 函数，计算 batch 维度（第一维度）下交叉熵（cross entropy）的平均值，将该值做为总损失。

loss = tf.reduce_mean(cross_entropy, name='xentropy_mean')

　　最后程序会返回包含了损失值的Tensor。

训练training()

　　training() 函数添加了经过梯度降低（gradient descent）将损失最小化所须要的操做。

　　首先，该函数从loss() 函数中得到损失Tensor，将其交给 tf.scalar_summary，后者在于SummaryWriter配合使用时，能够向事件文件（events file）中生成汇总值（summary values）。每次写入汇总值时，它都会释放损失Tensor的当前值（snapshot value）。

tf.scalar_summary(loss.op.name, loss)

　　接下来，咱们实例化一个tf.train.GradientDescentOptimizer，负责按照所要求的学习效率（learning rate）应用梯度降低法（gradients）。

optimizer = tf.train.GradientDescentOptimizer(FLAGS.learning_rate)

　　以后，咱们生成一个变量用于保存全局训练步骤（global training step）的数值，并使用minimize()函数更新系统中的三角权重（triangle weights）、增长全局步骤的操做。根据惯例，这个操做被称为 train_op，是TensorFlow会话（session）诱发一个完整训练步骤所必须运行的操做（见下文）。

global_step = tf.Variable(0, name='global_step', trainable=False)
train_op = optimizer.minimize(loss, global_step=global_step)

　　最后，程序返回包含了训练操做（training op）输出结果的Tensor。

图表

　　在 run_training() 这个函数的一开始，是一个python语言的with命令，这个命令代表全部已经构建的操做都要与默认的 tf.Graph 全局实例关联起来。

with tf.Graph().as_default():

　　tf.Graph实例是一系列能够做为总体执行的操做。TensorFlow的大部分场景只须要依赖默认图表一个实例便可。

会话

　　完成所有的构建准备，生成所有所需的操做以后，咱们就能够建立一个tf.Session，用于运行图表。

sess = tf.Session()

　　固然，咱们也能够利用with 代码块生成Session，限制做用域：

with tf.Session() as sess:

　　Session函数中没有传入参数，代表该代码将会依附于（若是尚未建立会话，则会建立新的会话）默认的本地会话。

　　生成会话以后，全部tf.Variable实例都会当即经过调用各自初始化操做中的sess.run()函数进行初始化。

init = tf.initialize_all_variables()
sess.run(init)

　　sess.run()方法将会运行图表中与做为参数传入的操做相对应的完整子集。在初次调用时，init操做只包含了变量初始化程序tf.group。图表的其余部分不会在这里，而是在下面的训练循环运行。

训练循环

　　完成会话中变量的初始化以后，就能够开始训练了。

　　训练的每一步都是经过用户代码控制，而能实现有效训练的最简单循环就是：

for step in range(max_steps):
    sess.run(train_op)

　　可是咱们这里须要更复杂一点，由于咱们必须把输入的数据根据每一步的状况进行切分，以匹配以前生成的占位符。

向图表提供反馈

　　执行每一步时，咱们的代码会生成一个反馈字典（feed dictionary），其中包含对应步骤中训练所要使用的例子，这些例子的哈希键就是其所表明的占位符操做。

　　fill_feed_dict函数会查询给定的DataSet，索要下一批次batch_size的图像和标签，与占位符相匹配的Tensor则会包含下一批次的图像和标签。

images_feed, labels_feed = data_set.next_batch(FLAGS.batch_size)

　　而后，以占位符为哈希键，建立一个python字典对象，键值则是其表明的反馈Tensor。

feed_dict = {
    images_placeholder: images_feed,
    labels_placeholder: labels_feed,
}

　　这个字典随后做为feed_dict参数，传入sess.run()函数中，为这一步的训练提供输入样例。

检查状态

　　在运行sess.run函数时，要在代码中明确其须要获取的两个值：[train_op, loss]。

for step in range(FLAGS.max_steps):
    feed_dict = fill_feed_dict(data_sets.train,
                               images_placeholder,
                               labels_placeholder)
    _, loss_value = sess.run([train_op, loss],
                             feed_dict=feed_dict)

　　由于要获取这两个值，sess.run()会返回一个有两个元素的元组。其中每个Tensor对象，对应了返回的元组中的numpy数组，而这些数组中包含了当前这步训练中对应Tensor的值。因为train_op并不会产生输出，其在返回的元祖中的对应元素就是None，因此会被抛弃。可是，若是模型在训练中出现误差，loss Tensor的值可能会变成NaN，因此咱们要获取它的值，并记录下来。

　　假设训练一切正常，没有出现NaN，训练循环会每隔100个训练步骤，就打印一行简单的状态文本，告知用户当前的训练状态。

if step % 100 == 0:
    print 'Step %d: loss = %.2f (%.3f sec)' % (step, loss_value, duration)

状态可视化

　　为了释放TensorBoard所使用的事件文件（events file），全部的即时数据（在这里只有一个）都要在图表构建阶段合并至一个操做（op）中。

summary_op = tf.merge_all_summaries()

　　在建立好会话（session）以后，能够实例化一个tf.train.SummaryWriter，用于写入包含了图表自己和即时数据具体值的事件文件。

summary_writer = tf.train.SummaryWriter(FLAGS.train_dir,
                                        graph_def=sess.graph_def)

　　最后，每次运行summary_op时，都会往事件文件中写入最新的即时数据，函数的输出会传入事件文件读写器（writer）的add_summary()函数。

summary_str = sess.run(summary_op, feed_dict=feed_dict)
summary_writer.add_summary(summary_str, step)

执行sess会话的下面代码报错：

    sess = tf.compat.v1.Session()

AttributeError: module 'tensorflow.python.util.compat' has no attribute 'v1'

　　解决方法：

    sess = tf.Session()

参考文献：https://blog.csdn.net/GeForce_GTX1080Ti/article/details/80119194

TensorFlow官网（http://www.tensorfly.cn/tfdoc/tutorials/mnist_beginners.html）

http://wiki.jikexueyuan.com/project/tensorflow-zh/tutorials/mnist_tf.html

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/mnist/fully_connected_feed.py

此处是作以笔记，记录学习过程，如有侵权，请联系我