The program consists of premade_estimator.py and iris_data.py.
iris_data.py reads the training data and test data and defines the data format the estimator uses.
------------------------------------------------------------------------------------------------------------
1. Modifying iris_data.py
iris_data downloads the training set and the test set remotely from:
http://download.tensorflow.org/data/iris_training.csv http://download.tensorflow.org/data/iris_test.csv
In practice, however, these downloads did not work when I tested them.
The same two files are available here:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_training.csv https://github.com/tensorflow/tensorflow/blob/master/tensorflow/examples/tutorials/monitors/iris_test.csv
After downloading them, save them as xlsx and change the download-and-read part of iris_data.py to:
    # Assumes iris_data.py has `import pandas` at the top and still defines CSV_COLUMN_NAMES.
    def load_data(y_name='Species'):
        # Features are the x values, labels are the y values.
        train = pandas.read_excel('iris_training.xlsx', names=CSV_COLUMN_NAMES, header=0)
        train_features, train_labels = train, train.pop(y_name)
        test = pandas.read_excel('iris_test.xlsx', names=CSV_COLUMN_NAMES, header=0)
        test_features, test_labels = test, test.pop(y_name)
        return (train_features, train_labels), (test_features, test_labels)
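In other words, the original maybe_download part can be deleted, and load_data is rewritten to use read_excel. I also noticed that different versions of the file name the returned variables differently: some use train_features and labels, others use train_x and y. Renaming them all to features and labels makes the code easier to read.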
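To make the features/labels split concrete, here is a minimal sketch that uses only the standard library instead of pandas. The two data rows are illustrative sample values, and split_features_labels is a hypothetical helper, not part of iris_data.py; it only mimics what train.pop(y_name) does to the columns.

```python
import csv
import io

# Column names as defined in the tutorial's iris_data.py.
CSV_COLUMN_NAMES = ['SepalLength', 'SepalWidth', 'PetalLength', 'PetalWidth', 'Species']

# Illustrative sample: one header row (discarded, like header=0 with names=...)
# followed by two data rows.
SAMPLE = """2,4,setosa,versicolor,virginica
6.4,2.8,5.6,2.2,2
5.0,2.3,3.3,1.0,1
"""


def split_features_labels(text, y_name='Species'):
    """Hypothetical helper: return (features, labels) from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))[1:]  # skip the header row
    columns = {name: [] for name in CSV_COLUMN_NAMES}
    for row in rows:
        for name, value in zip(CSV_COLUMN_NAMES, row):
            columns[name].append(float(value))
    # Removing the label column leaves only the feature columns behind,
    # which is the effect of train.pop(y_name) on the DataFrame.
    labels = [int(v) for v in columns.pop(y_name)]
    return columns, labels


features, labels = split_features_labels(SAMPLE)
print(list(features.keys()))
print(labels)
```

The point to notice is that pop both returns the label column and removes it from the table, so the same object can then be used as the feature set.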
------------------------------------------------------------------------------------------------------------
2. premade_estimator.py
Import the tensorflow and iris_data modules:
    import tensorflow as tf
    import iris_data
Read the training and test data from iris_data:
    # Fetch the data
    (train_features, train_labels), (test_features, test_labels) = iris_data.load_data()
-------------------------------------------------------------------------------------------------------------
Describe the train_features data with tf.feature_column:
    my_feature_columns = []
    for key in train_features.keys():
        my_feature_columns.append(tf.feature_column.numeric_column(key=key))
where
    tf.feature_column                      # tools for ingesting and representing features
    tf.feature_column.numeric_column(...)  # represents real-valued or numerical features
Each key in train_features becomes one numeric column in my_feature_columns.
-------------------------------------------------------------------------------------------------------------
Instantiate an estimator:
    classifier = tf.estimator.DNNClassifier(
        feature_columns=my_feature_columns,
        hidden_units=[10, 10],
        n_classes=3)
where
    tf.estimator.DNNClassifier  # a classifier for TensorFlow DNN models
    feature_columns             # the feature columns the model takes as input
    hidden_units = [m, n]       # the length of the list sets the number of hidden layers;
                                # m and n set the number of nodes in each layer
    n_classes                   # the number of classes to classify into
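As a worked example of what hidden_units=[10, 10] implies, the sketch below counts layers and trainable weights for 4 input features and 3 classes. It assumes plain fully connected layers with one bias per node; dense_layer_shapes and count_parameters are hypothetical helpers for illustration, not TensorFlow functions.

```python
def dense_layer_shapes(n_features, hidden_units, n_classes):
    """Return (inputs, outputs) for each fully connected layer."""
    sizes = [n_features] + list(hidden_units) + [n_classes]
    return list(zip(sizes[:-1], sizes[1:]))


def count_parameters(shapes):
    """Weights plus one bias per output node, summed over all layers."""
    return sum(n_in * n_out + n_out for n_in, n_out in shapes)


# 4 iris features, two hidden layers of 10 nodes, 3 classes.
shapes = dense_layer_shapes(4, [10, 10], 3)
total = count_parameters(shapes)
print(shapes)
print(total)
```

Under these assumptions the network has three weight layers (two hidden layers plus the output layer), so a longer hidden_units list adds layers while larger entries widen them.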
-------------------------------------------------------------------------------------------------------------
Train the model:
    classifier.train(
        input_fn=lambda: iris_data.train_input_fn(train_features, train_labels, args.batch_size),
        steps=args.train_steps)
train_input_fn is a function defined in iris_data:
    def train_input_fn(features, labels, batch_size):
        dataset = tf.data.Dataset.from_tensor_slices((dict(features), labels))
        dataset = dataset.shuffle(1000).repeat().batch(batch_size)
        return dataset
Analysis:
    tf.data.Dataset
    # A Dataset can be used to represent an input pipeline as a collection of elements
    # (nested structures of tensors) and a "logical plan" of transformations that act on those elements.
    # It is a high-level TensorFlow API for reading data and converting it into the format the train method needs.
    tf.data.Dataset.from_tensor_slices
    # Creates a Dataset whose elements are slices of the given tensors.
    dataset.shuffle
    # Randomly shuffles the elements of this dataset.
    # Training generally works better when the examples arrive in random order.
    dataset.repeat
    # Repeats this dataset count times; with no count, as here, it repeats indefinitely.
    dataset.batch
    # Combines consecutive elements of this dataset into batches.
    (dict(features), labels)
    # features (a dict) and labels (a Series) combined into a tuple.
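The effect of chaining shuffle, repeat and batch can be sketched in plain Python. This is a simplified stand-in and not how TensorFlow implements it: the real shuffle draws from a fixed-size buffer (1000 here), whereas this hypothetical shuffle_repeat_batch reshuffles the whole list on every pass.

```python
import itertools
import random


def shuffle_repeat_batch(examples, batch_size, seed=0):
    """Yield an endless stream of batches from a finite list of examples."""
    rng = random.Random(seed)

    def stream():
        while True:                # repeat(): start over when the data runs out
            epoch = list(examples)
            rng.shuffle(epoch)     # shuffle(): a fresh order on every pass
            yield from epoch

    elements = stream()
    while True:                    # batch(): group consecutive elements
        yield list(itertools.islice(elements, batch_size))


examples = list(range(10))
# The stream never ends, so take only the first five batches.
batches = list(itertools.islice(shuffle_repeat_batch(examples, batch_size=4), 5))
for b in batches:
    print(b)
```

Because repeat comes before batch, a batch may straddle the boundary between two passes over the data, and the stream never ends on its own. That is why train is given a steps argument to say when to stop.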
The first parameter of DNNClassifier.train, input_fn, must be a function:
A function that provides input data for training as minibatches.
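This function must return either a tf.data.Dataset object or a (features, labels) tuple.
Note the lambda expression used when passing input_fn. A lambda expression is a one-line function; in other languages it is also called an anonymous function. When a function is needed in only one place in a program, a lambda expression is a convenient way to write it, and it behaves just like an ordinary function. Here it wraps the call to train_input_fn so that the estimator can invoke it later without supplying the features, labels or batch size itself.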
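The role of the lambda can be shown without TensorFlow. In this sketch fake_train is a hypothetical stand-in for classifier.train that simply calls input_fn with no arguments, and train_input_fn is a dummy that returns its arguments; functools.partial is shown as an equivalent alternative to the lambda.

```python
from functools import partial


def train_input_fn(features, labels, batch_size):
    # Dummy stand-in: the real function builds and returns a tf.data.Dataset.
    return {'features': features, 'labels': labels, 'batch_size': batch_size}


def fake_train(input_fn):
    # Hypothetical stand-in for classifier.train: the caller decides when to
    # invoke input_fn and passes it no data arguments.
    return input_fn()


features = {'SepalLength': [6.4, 5.0]}
labels = [2, 1]

# The lambda delays the call and fixes the arguments in advance.
via_lambda = fake_train(lambda: train_input_fn(features, labels, 100))
# partial builds the same zero-argument callable without a lambda.
via_partial = fake_train(partial(train_input_fn, features, labels, 100))
print(via_lambda == via_partial)
```

Passing train_input_fn(features, labels, 100) directly would instead call the function immediately and hand its result, rather than a function, to train.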
-------------------------------------------------------------------------------------------------------------
Evaluate the model:
To assess how well the model works, every estimator provides an evaluate method:
    eval_result = classifier.evaluate(
        input_fn=lambda: iris_data.eval_input_fn(test_features, test_labels, args.batch_size))
    print('\nTest set accuracy: {accuracy:0.3f}\n'.format(**eval_result))
Note that evaluating the model must use the test dataset, not the training data. classifier.evaluate is called in much the same way as train.
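The print statement relies on eval_result being a dict that contains an 'accuracy' key. The sketch below uses made-up predictions to show what that number means and how format(**eval_result) picks it out; the accuracy helper and the sample values are hypothetical, and the real dict returned by evaluate holds other metrics as well.

```python
def accuracy(predicted, expected):
    """Fraction of predictions that match the expected labels."""
    correct = sum(p == e for p, e in zip(predicted, expected))
    return correct / len(expected)


# Made-up class ids for six test examples; one prediction is wrong.
predicted = [1, 2, 0, 1, 2, 2]
expected = [1, 2, 0, 1, 1, 2]

eval_result = {'accuracy': accuracy(predicted, expected)}
# ** unpacks the dict so {accuracy:0.3f} can look the value up by name.
message = 'Test set accuracy: {accuracy:0.3f}'.format(**eval_result)
print(message)
```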
eval_input_fn is likewise defined in iris_data:
    def eval_input_fn(features, labels, batch_size):
        features = dict(features)
        if labels is None:
            # No labels, use only features.
            inputs = features
        else:
            inputs = (features, labels)
        # Convert the inputs to a Dataset.
        dataset = tf.data.Dataset.from_tensor_slices(inputs)
        # Batch the examples.
        assert batch_size is not None, "batch_size must not be None"
        dataset = dataset.batch(batch_size)
        # Return the dataset.
        return dataset
    # Many programming languages have an assert statement. When you write
    #     assert condition
    # you are telling the program to test that condition and trigger an error if it is false.
    # In Python, it is roughly equivalent to:
    #     if not condition:
    #         raise AssertionError()
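A small runnable sketch of the same guard, using a hypothetical batch helper on a plain list rather than a Dataset. It shows both the passing case and the error raised when batch_size is None.

```python
def batch(examples, batch_size):
    """Hypothetical list-based stand-in for dataset.batch(batch_size)."""
    assert batch_size is not None, "batch_size must not be None"
    return [examples[i:i + batch_size] for i in range(0, len(examples), batch_size)]


# Passing case: the last batch is smaller when the data does not divide evenly.
batches = batch([1, 2, 3, 4, 5], 2)
print(batches)

# Failing case: the assert raises AssertionError carrying the given message.
try:
    batch([1, 2, 3], None)
    error_message = None
except AssertionError as exc:
    error_message = str(exc)
print(error_message)
```

Keep in mind that Python skips assert statements when run with the -O flag, so they suit sanity checks like this one rather than validation that must always run.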
------------------------------------------------------------------------------------------------------------
3. Summary
How to build an estimator
How to test an estimator
How to build the data an estimator uses