TensorFlow Study Notes (7): Accelerating Computation with TensorFlow

Date: 2019-12-12


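1. Using GPUs with TensorFlow

TensorFlow lets you choose the device that runs each operation through the tf.device function. The device can be a CPU or GPU on the local machine, or a device on a remote machine.

When creating a session, you can set the log_device_placement parameter of tf.ConfigProto to print the device on which each operation runs.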

import tensorflow as tf

a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
c= tf.add_n([a,b],name="c")

with tf.Session(config=tf.ConfigProto(log_device_placement = True)) as sess:
    print(sess.run(c))



########
Device mapping: no known devices.
c: (AddN): /job:localhost/replica:0/task:0/device:CPU:0
b: (Const): /job:localhost/replica:0/task:0/device:CPU:0
a: (Const): /job:localhost/replica:0/task:0/device:CPU:0

[2. 4. 6.]

In a TensorFlow installation with a working GPU environment, TF prefers the GPU whenever no device is specified explicitly.

import tensorflow as tf

a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
c= tf.add_n([a,b],name="c")

with tf.Session(config=tf.ConfigProto(log_device_placement = True)) as sess:
    print(sess.run(c))



########
Device mapping: no known devices.
c: (AddN): /job:localhost/replica:0/task:0/device:GPU:0
b: (Const): /job:localhost/replica:0/task:0/device:GPU:0
a: (Const): /job:localhost/replica:0/task:0/device:GPU:0

[2. 4. 6.]

Use tf.device to specify the device on which operations run.

import tensorflow as tf
with tf.device("/CPU:0"):
    a = tf.constant([1.0,2.0,3.0],shape=[3],name='a')
    b = tf.constant([1.0,2.0,3.0],shape=[3],name='b')
with tf.device("/GPU:0"):
    c= tf.add_n([a,b],name="c")

with tf.Session(config=tf.ConfigProto(log_device_placement = True)) as sess:
    print(sess.run(c))

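Some data types are not supported on the GPU, and forcing such an operation onto a GPU raises an error. To avoid this, set the allow_soft_placement parameter when creating the session. When allow_soft_placement is True, TF automatically moves any operation that cannot run on the GPU to the CPU.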

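import tensorflow as tf

# No device is given for this variable.
a_cpu = tf.Variable(0,name='a_cpu')
# An integer variable is explicitly requested on the GPU.
with tf.device('/gpu:0'):
    a_gpu = tf.Variable(0,name='a_gpu')

# allow_soft_placement=True lets TF fall back to the CPU instead of raising an error.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True,allow_soft_placement = True))
sess.run(tf.global_variables_initializer())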

a_gpu: (VariableV2): /job:localhost/replica:0/task:0/device:CPU:0
a_gpu/read: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
a_gpu/Assign: (Assign): /job:localhost/replica:0/task:0/device:CPU:0
init/NoOp_1: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
a_cpu: (VariableV2): /job:localhost/replica:0/task:0/device:CPU:0
a_cpu/read: (Identity): /job:localhost/replica:0/task:0/device:CPU:0
a_cpu/Assign: (Assign): /job:localhost/replica:0/task:0/device:CPU:0
init/NoOp: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
init: (NoOp): /job:localhost/replica:0/task:0/device:CPU:0
a_gpu/initial_value: (Const): /job:localhost/replica:0/task:0/device:CPU:0
a_cpu/initial_value: (Const): /job:localhost/replica:0/task:0/device:CPU:0
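
The fallback rule can be pictured with a few lines of plain Python. This is only a toy model under stated assumptions, not TensorFlow's placement algorithm: the GPU_KERNELS table and the place function are hypothetical names invented for this illustration.

```python
# Toy model of soft placement; this is NOT TensorFlow's actual placer.
# GPU_KERNELS is a hypothetical table of (op type, dtype) pairs that
# are assumed to have a GPU implementation.
GPU_KERNELS = {("VariableV2", "float32"), ("AddN", "float32")}


def place(op_type, dtype, requested, allow_soft_placement):
    """Return the device an op would run on under the toy rule."""
    if requested.startswith("/gpu") and (op_type, dtype) not in GPU_KERNELS:
        if allow_soft_placement:
            return "/cpu:0"  # fall back to the CPU instead of failing
        raise ValueError(f"no GPU kernel for {op_type} with dtype {dtype}")
    return requested


print(place("VariableV2", "int32", "/gpu:0", allow_soft_placement=True))
print(place("VariableV2", "float32", "/gpu:0", allow_soft_placement=True))
```

With soft placement disabled, the first call would raise an error instead of returning a CPU device, which mirrors the behaviour described in the text.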

Practical advice: put compute-intensive operations on the GPU. To keep the program fast, place related operations on the same device wherever possible.

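2. Deep Learning Training and Parallel Modes

There are two common ways to parallelize the training of a deep learning model: synchronous mode and asynchronous mode.

In asynchronous mode, the devices are completely independent of one another.

[Figure: flowchart of the asynchronous mode]

[Figure: flowchart of the synchronous mode]

In synchronous mode, no device updates the parameters on its own. Instead, the parameters are updated once, after all devices have finished backpropagation.

Synchronous mode avoids the parameter-update problem of asynchronous mode, where a device may apply gradients computed from parameters that other devices have since changed. Its drawback is lower efficiency than asynchronous mode, because every step waits for the slowest device.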
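
The difference between the two update schemes can be sketched in plain Python on a one-parameter toy problem. Everything here is an assumption made for illustration: the loss f(w) = w**2, the two simulated workers, the deliberately large learning rate, and the worst-case ordering in which every asynchronous worker reads the parameter before any update lands.

```python
def grad(w):
    """Gradient of the toy loss f(w) = w**2."""
    return 2.0 * w


def sync_round(w, lr, n_workers=2):
    # Every worker reads the same w; the averaged gradient is applied once.
    grads = [grad(w) for _ in range(n_workers)]
    return w - lr * sum(grads) / n_workers


def async_round(w, lr, n_workers=2):
    # Every worker reads w at the same moment, then the updates land one
    # after another, so all but the first are based on a stale value of w.
    stale = [grad(w) for _ in range(n_workers)]
    for g in stale:
        w = w - lr * g
    return w


def run(round_fn, w=1.0, lr=0.6, rounds=5):
    """Apply round_fn repeatedly and return the final parameter."""
    for _ in range(rounds):
        w = round_fn(w, lr)
    return w


print("sync :", abs(run(sync_round)))
print("async:", abs(run(async_round)))
```

In this setup the synchronous run moves toward the minimum at 0, while the asynchronous run overshoots further on every round. Real asynchronous training is rarely this extreme; the sketch only shows why stale gradients can hurt.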

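3. Multi-GPU Parallelism

The GPUs in a single machine usually have similar performance, so in this setting deep learning models are more often trained in synchronous mode.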
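
Synchronous multi-GPU training is commonly done by giving each GPU its own slice of the batch and averaging the resulting gradients. The sketch below simulates that in plain Python, with no GPUs and no TensorFlow involved. The linear model y = w * x, the data, and the helper names mse_grad and split are assumptions chosen for illustration.

```python
def mse_grad(w, xs, ys):
    """Gradient of mean((w*x - y)**2) with respect to w."""
    return sum(2.0 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)


def split(items, n):
    """Split a list into n equal contiguous shards (len must divide by n)."""
    size = len(items) // n
    return [items[i * size:(i + 1) * size] for i in range(n)]


xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w, n_gpus = 0.5, 2

# One gradient per simulated GPU, each computed on its own shard.
tower_grads = [mse_grad(w, sx, sy)
               for sx, sy in zip(split(xs, n_gpus), split(ys, n_gpus))]
avg_grad = sum(tower_grads) / n_gpus

print(tower_grads, avg_grad, mse_grad(w, xs, ys))
```

Because the shards have equal size, the averaged gradient matches the gradient of the whole batch, so one synchronous step behaves like a single-device step on the full batch.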

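4. Distributed TensorFlow

Multi-GPU parallelism can deliver good training performance, but the number of GPUs in a single machine is limited. To improve the training performance of a deep learning model further, TensorFlow has to run distributed across multiple machines.

4.1 How Distributed TensorFlow Works

Section 2 introduced the theory behind training deep learning models in parallel. This subsection shows how to use TF to train deep learning models on a distributed cluster. A TensorFlow cluster executes the operations in a TF graph through a set of tasks. Different tasks usually run on different machines, although with GPUs different tasks can also use different GPUs on the same machine. Tasks are grouped into jobs, and each job contains one or more tasks. When a TF cluster has more than one task, tf.train.ClusterSpec specifies the machine that runs each task.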
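
The job and task layout can be pictured as a plain dictionary that maps each job name to the list of addresses of its tasks, which is the same shape of argument that tf.train.ClusterSpec receives. The sketch below makes no TensorFlow calls. The job names, host names, and helper functions are made up for illustration; the /job:<name>/task:<index> form follows TensorFlow's device naming.

```python
# Plain-Python sketch of the job/task layout; no TensorFlow calls are made.
# The job names and addresses below are made up for illustration.
cluster = {
    "ps": ["host-a:2222"],
    "worker": ["host-b:2222", "host-c:2222"],
}


def task_address(cluster, job_name, task_index):
    """Look up the host:port that serves one task of one job."""
    return cluster[job_name][task_index]


def device_names(cluster):
    """List every task as a /job:<name>/task:<index> device string."""
    return [f"/job:{job}/task:{i}"
            for job, addrs in cluster.items()
            for i in range(len(addrs))]


print(task_address(cluster, "worker", 1))
print(device_names(cluster))
```

The pair (job name, task index) identifies a task, which is why a server needs both values in addition to the cluster description.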

Configuring the first task of the cluster

import tensorflow as tf

c = tf.constant('Hello ,this is the server1!')

# Define a cluster with two tasks: one on local port 2998, the other on local port 2999.
cluster = tf.train.ClusterSpec({"local":['localhost:2998','localhost:2999']})
# Create a Server from the cluster configuration; job_name and task_index identify the task started here.
server = tf.train.Server(cluster,job_name='local',task_index=0)
# Create a session from server.target to use the cluster's resources.
# log_device_placement shows which task executes each operation.
sess = tf.Session(server.target,config=tf.ConfigProto(log_device_placement = True))
print(sess.run(c))

Configuring the second task, with the same cluster configuration

import tensorflow as tf

c = tf.constant('Hello ,this is the server2!')
# Same cluster configuration as the first task.
cluster = tf.train.ClusterSpec({"local":['localhost:2998','localhost:2999']})
# task_index=1, so the second task runs on port 2999.
server = tf.train.Server(cluster,job_name='local',task_index=1)

sess = tf.Session(server.target,config=tf.ConfigProto(log_device_placement = True))
print(sess.run(c))

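If only the first task is started, the program blocks and waits for the second task, repeatedly printing failed to connect to "ipv4:127.0.0.1:2999". The first task prints its result only after the second task has started.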

