In the previous post, I gave a rough introduction to what neural networks are, hopefully giving readers new to this field an intuitive feel for them. In this post, I will use simple mathematical derivations and Python implementations to explain the two most fundamental elements of a neural network, forward propagation and backpropagation, and at the end build a deep neural network. To keep the underlying arithmetic easy to follow, this post uses only numpy for the math.
The most common use of a neural network is to take some input, process it, and produce a result as output; that output might be a prediction, a classification, or something else. When a neural network performs such a task, the input (usually a set of feature values) is fed into a network of interconnected nodes. These individual nodes are called perceptrons, or neurons, and they are the basic building blocks of a neural network. Each perceptron decides how to classify the data according to the input it receives.
Take school admissions as an example. Below are the admission results of one school for past applicants:
Given a student's exam score and EQ test result, we can then predict whether the student will be admitted to this school. From the historical data, admission is determined jointly by the two factors; neither one is decisive on its own, but each carries a certain weight. Assuming we already know the weight of each factor, the neural network for this admission prediction might look like:
When feature data is fed into a perceptron, it is multiplied by the weight assigned to that particular input. For example, the perceptron above has two inputs, test and iq, so it has two associated weights that can be adjusted independently. A larger weight means the network considers that input more important than the others; a smaller weight means the data matters less. As an extreme example, if the test score had no effect at all on admission, its weight would be zero, and it would have no influence on the perceptron's output.
The process by which a perceptron applies the weights to the inputs and sums them up is called a linear combination. Expressed compactly in math:

$$h = \sum_{i=1}^{n} w_i x_i$$
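As a quick illustration, the linear combination is just a dot product. A minimal sketch with made-up numbers (the weights here are arbitrary, not learned):

```python
import numpy as np

x = np.array([0.7, 0.3])  # input features: test, iq (scaled)
w = np.array([0.6, 0.4])  # hypothetical weights, not learned
h = np.dot(w, x)          # linear combination: w1*x1 + w2*x2
print(h)                  # 0.54
```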
The result of this summation alone is not yet convenient for predicting whether the student will be admitted; the perceptron's sum needs to be converted into an output signal before the final result can be produced. In this example, the output might be: admitted (1) or rejected (0).
This is done by passing the linear combination to an activation function f. A simple activation function that is perfectly adequate here is the step function:

$$f(h) = \begin{cases} 0 & h \le 0 \\ 1 & h > 0 \end{cases}$$
To make the formulation complete, the formula also includes a bias term b, used to shift the output signal. With it we have the full perceptron formula:

$$\hat{y} = f\left(\sum_{i=1}^{n} w_i x_i + b\right)$$
Note that when the data has been preprocessed well enough (we will talk about preprocessing another time), the bias term is largely unnecessary. So do not be surprised if the bias is absent from the derivations and code that follow.
Here is a sample Python implementation of the perceptron:
```python
import numpy as np

# Step function as the activation
def activation(h):
    if h <= 0:
        return 0
    else:
        return 1

inputs = np.array([0.7, 0.3])
weights = np.random.normal(loc=0.0, scale=1, size=(1, 2))
bias = np.random.normal(loc=0.0, scale=1, size=(1))

# Weighted sum of the inputs plus the bias, passed through the activation
output = activation(np.dot(weights, inputs) + bias)
print('Output: {}'.format(output))
```
To sum up, a single perceptron has the structure shown on the left below. To solve the prediction problem above, however, the network will not be a single perceptron as in the earlier figure, but a combination of many perceptrons arranged in multiple layers (right side below): the output of one perceptron can become the input of another, and the final result emerges after several layers of computation. A single prediction involves running every perceptron in the network, a process known as forward propagation. A sketch of this follows.
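Here is a minimal sketch of forward propagation: an input flowing through one hidden layer of perceptrons and then an output perceptron (all weights are made up for illustration):

```python
import numpy as np

def activation(h):
    return (h > 0).astype(int)   # step function, vectorized

x = np.array([0.7, 0.3])                 # input features
W1 = np.array([[0.4, -0.2],
               [0.3, 0.8]])              # input -> hidden weights
W2 = np.array([0.5, -0.1])               # hidden -> output weights

hidden = activation(np.dot(x, W1))       # each hidden perceptron fires or not
output = activation(np.dot(hidden, W2))  # output perceptron consumes hidden outputs
print(output)
```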
With the perceptrons above in place, we can start predicting admission results. Unsurprisingly, though, such a network will not produce credible predictions yet, because we do not know the weight of each input feature, and unreliable weights naturally produce unreliable results. Fortunately, we have plenty of historical data: we know which students were admitted and which were not. We can feed the historical student records into the neural network and compare its outputs with the actual outcomes, then use the discrepancies to correct the weights. Repeating this, the network should become more and more accurate (hopefully).
This process is called training the neural network, and the existing real-world records are called the training set. When a network is first created, its weights are random values. As the network learns from the training set which inputs lead to which outputs, it adjusts the weights according to the classification errors made under the previous weights.
To pull off this trick, we need to sort out two things: how to quantify the gap between the network's output and the true result, and how to adjust the weights to shrink that gap.
To quantify the gap, an intuitive approach is simply to subtract the computed result ŷ from the true result y. But that is not the best method, because it produces negative values, which makes it hard to judge the size of the error. Instead, we use the square of the difference between y and ŷ to quantify the error of each prediction during training. After the network has run through all the training data, the total error is (why the 1/2 in front? purely to simplify the algebra later):

$$E = \frac{1}{2}\sum_{\mu}\left(y^{\mu} - \hat{y}^{\mu}\right)^2$$

where μ indexes the training records.
This quantity is called the SSE (Sum of Squared Errors of prediction). For the network to perform as well as possible, we want the SSE to be as small as possible: the smaller the SSE, the closer the network's outputs are to the truth.
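A minimal sketch of computing the SSE (the arrays are made-up stand-ins for true and predicted values):

```python
import numpy as np

def sse(y, y_hat):
    # Sum of squared errors, with the conventional 1/2 factor
    return 0.5 * np.sum((y - y_hat) ** 2)

y = np.array([1, 0, 1, 1])               # true labels
y_hat = np.array([0.9, 0.2, 0.6, 0.8])   # network outputs
print(sse(y, y_hat))                     # 0.125
```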
The next question is how to adjust the weights and bias. From the formula we can see that the SSE depends on the inputs x and the weights w. We cannot do anything about the inputs, so the only lever we have is the weights. To keep the explanation clear, consider the computation for a single training record and its corresponding output. Suppose the relationship between the SSE and a weight w looks like the figure below.
To minimize the SSE, the weight must be adjusted in every training iteration until it finally reaches the value at which the SSE is smallest. This process is gradient descent.
The size of each weight update is proportional to the gradient at the current value of w, taken in the opposite (downhill) direction. Here is the derivation:

$$\Delta w_i = -\eta \frac{\partial E}{\partial w_i} = \eta\,(y - \hat{y})\,\frac{\partial \hat{y}}{\partial w_i} = \eta\,(y - \hat{y})\,f'(h)\,x_i = \eta\,\delta\,x_i, \qquad \delta = (y - \hat{y})\,f'(h)$$
Here δ is called the error term; it has no special meaning on its own and exists purely for mathematical convenience. η is the learning rate, set by the developer, and it controls how fast the weights change. Setting it correctly is crucial for training: too low a learning rate means the network takes a very long time to reach a decent accuracy, while too high a rate makes the network repeatedly overshoot the optimal weights, so the accuracy fluctuates during training and the network may never reach the best state that is actually attainable.
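Before the full training loop, here is a minimal sketch of a single gradient-descent weight update for one record, assuming a sigmoid activation (the inputs, label, and starting weights are made up):

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.7, 0.3])   # one training record
y = 1                      # its true label
w = np.array([0.1, -0.2])  # current weights
learnrate = 0.5

h = np.dot(x, w)           # linear combination
y_hat = sigmoid(h)         # prediction
delta = (y - y_hat) * y_hat * (1 - y_hat)  # error term (y - y_hat) * f'(h)
w = w + learnrate * delta * x              # weight update: delta_w = eta * delta * x
```

Note that for the sigmoid, f'(h) = f(h)(1 - f(h)), which is why the error term can be written entirely in terms of the prediction.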
Here is a sample Python implementation of gradient descent:
```python
import numpy as np

# Use the sigmoid as the activation function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

np.random.seed(42)

# features / targets (and features_test / targets_test below) are assumed
# to be preloaded training and test data, e.g. pandas DataFrames / Series
n_records, n_features = features.shape
last_loss = None
weights = np.random.normal(scale=1 / n_features**.5, size=n_features)

epochs = 1000
learnrate = 0.5

for e in range(epochs):
    del_w = np.zeros(weights.shape)
    for x, y in zip(features.values, targets):
        # The power of the formulas above
        output = sigmoid(np.dot(x, weights))
        error = y - output
        error_term = error * output * (1 - output)
        del_w += error_term * x
    weights += learnrate * del_w / n_records
    if e % (epochs / 10) == 0:
        out = sigmoid(np.dot(features, weights))
        loss = np.mean((out - targets) ** 2)
        print("Train loss: ", loss)

tes_out = sigmoid(np.dot(features_test, weights))
predictions = tes_out > 0.5
accuracy = np.mean(predictions == targets_test)
print("Prediction accuracy: {:.3f}".format(accuracy))
```
In contrast to forward propagation, in a complex network the weights are updated starting from the last layer (the output) and working back through the earlier layers; this process is backpropagation. Geoffrey Hinton, the inventor of backpropagation and a godfather of deep learning, recently pointed out that the current backpropagation algorithm has many flaws and urgently needs replacing. While we wait for new results from the masters, backpropagation remains the most effective learning method we have today.
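The key step is pushing the output error back through the weights to obtain each hidden unit's share of the blame. A minimal sketch of one backpropagation step for a single hidden layer, matching the activations used in the implementation below (sigmoid hidden layer, linear output); all values are made up:

```python
import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

x = np.array([0.5, 0.1, -0.2])       # one input record
y = 0.6                              # its target
W_in = np.array([[0.5, -0.6],
                 [0.1, -0.2],
                 [0.1, 0.7]])        # input -> hidden weights
W_out = np.array([0.1, -0.3])        # hidden -> output weights

# Forward pass
hidden_in = np.dot(x, W_in)
hidden_out = sigmoid(hidden_in)
y_hat = np.dot(hidden_out, W_out)    # linear output unit

# Backward pass: distribute the output error to the hidden layer
output_error_term = y - y_hat                 # linear output, so f'(h) = 1
hidden_error = output_error_term * W_out      # each hidden unit's share
hidden_error_term = hidden_error * hidden_out * (1 - hidden_out)

learnrate = 0.5
W_out += learnrate * output_error_term * hidden_out
W_in += learnrate * hidden_error_term * x[:, None]
```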
Below is a neural network with one hidden layer, implemented using only numpy. The activation functions are the sigmoid for the hidden layer and the identity (linear) function for the output layer:
NeuralNetwork.py:
```python
import numpy as np

class NeuralNetwork:
    def __init__(self, input_nodes, hidden_nodes, output_nodes, learning_rate,
                 weights_input_to_hidden=None, weights_hidden_to_output=None):
        self.input_nodes = input_nodes
        self.hidden_nodes = hidden_nodes
        self.output_nodes = output_nodes

        # Initialize weights
        if weights_input_to_hidden is None and weights_hidden_to_output is None:
            self.weights_input_to_hidden = np.random.normal(
                0.0, self.input_nodes**-0.5,
                (self.input_nodes, self.hidden_nodes))
            self.weights_hidden_to_output = np.random.normal(
                0.0, self.hidden_nodes**-0.5,
                (self.hidden_nodes, self.output_nodes))
        else:
            self.weights_input_to_hidden = weights_input_to_hidden
            self.weights_hidden_to_output = weights_hidden_to_output

        self.lr = learning_rate

        def sigmoid(x):
            return 1 / (1 + np.exp(-x))

        def sigmoid_prime(x):
            return sigmoid(x) * (1 - sigmoid(x))

        def linear(x):
            return x

        def linear_prime(x):
            return x ** 0

        # Activation functions: sigmoid for the hidden layer,
        # identity (linear) for the output layer
        self.activation_function = sigmoid
        self.activation_function_prime = sigmoid_prime
        self.activation_function2 = linear
        self.activation_function_prime2 = linear_prime

    def train(self, features, targets):
        n_records = features.shape[0]
        delta_weights_i_h = np.zeros(self.weights_input_to_hidden.shape)
        delta_weights_h_o = np.zeros(self.weights_hidden_to_output.shape)
        for X, y in zip(features, targets):
            # Forward pass
            hidden_inputs = np.dot(X, self.weights_input_to_hidden)
            hidden_outputs = self.activation_function(hidden_inputs)
            final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
            final_outputs = self.activation_function2(final_inputs)

            # Backward pass
            error = y - final_outputs
            output_error_term = error * self.activation_function_prime2(final_outputs)
            hidden_error = np.dot(output_error_term, self.weights_hidden_to_output.T)
            hidden_error_term = hidden_error * self.activation_function_prime(hidden_inputs)

            # Weight steps
            delta_weights_i_h += hidden_error_term * X[:, None]
            delta_weights_h_o += output_error_term * hidden_outputs[:, None]

        self.weights_hidden_to_output += self.lr * delta_weights_h_o / n_records
        self.weights_input_to_hidden += self.lr * delta_weights_i_h / n_records

    def run(self, features):
        hidden_inputs = np.dot(features, self.weights_input_to_hidden)
        hidden_outputs = self.activation_function(hidden_inputs)
        final_inputs = np.dot(hidden_outputs, self.weights_hidden_to_output)
        final_outputs = self.activation_function2(final_inputs)
        return final_outputs

    def get_weights(self):
        return self.weights_input_to_hidden, self.weights_hidden_to_output
```
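A quick usage sketch of this class (the shapes and hyperparameter values are just for illustration):

```python
import numpy as np
from NeuralNetwork import NeuralNetwork

# Hypothetical shapes: 3 input features, 7 hidden nodes, 1 output
network = NeuralNetwork(3, 7, 1, learning_rate=0.1)
X = np.random.rand(10, 3)  # 10 made-up records
y = np.random.rand(10)     # made-up targets
network.train(X, y)
print(network.run(X[:2]))  # predictions for the first two records
```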
DataProcessor.py:
```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

class DataProcessor:
    def __init__(self, data_path):
        self.orig_data = pd.read_csv(data_path)
        self.data = self.orig_data
        self.scaled_features = {}
        self.train_features = None
        self.train_targets = None
        self.test_features = None
        self.test_targets = None
        self.test_data = None
        self.val_features = None
        self.val_targets = None

    def show_data(self, plot_by_dteday=False):
        print(self.data.head())
        if plot_by_dteday:
            self.data[:24*10].plot(x='dteday', y='cnt', title='Data for the first 10 days')
            plt.show()

    def virtualize(self):
        # One-hot encode the categorical columns
        dummy_fields = ['season', 'weathersit', 'mnth', 'hr', 'weekday']
        for each in dummy_fields:
            dummies = pd.get_dummies(self.data[each], prefix=each, drop_first=False)
            self.data = pd.concat([self.data, dummies], axis=1)
        # Drop the encoded originals, plus fields we will not use
        fields_to_drop = ['instant', 'dteday', 'season', 'weathersit',
                          'weekday', 'atemp', 'mnth', 'workingday', 'hr']
        self.data = self.data.drop(fields_to_drop, axis=1)

    def normalize(self):
        # Standardize each quantitative column and remember its mean/std
        quant_features = ['casual', 'registered', 'cnt', 'temp', 'hum', 'windspeed']
        for each in quant_features:
            mean, std = self.data[each].mean(), self.data[each].std()
            self.scaled_features[each] = [mean, std]
            self.data.loc[:, each] = (self.data[each] - mean) / std

    def split(self):
        # Save the data of the last 21 days for testing
        self.test_data = self.data[-21*24:]
        self.data = self.data[:-21*24]
        target_fields = ['cnt', 'casual', 'registered']
        features, targets = self.data.drop(target_fields, axis=1), self.data[target_fields]
        self.test_features = self.test_data.drop(target_fields, axis=1)
        self.test_targets = self.test_data[target_fields]
        # Hold out the last 60 days of the remaining data for validation
        self.train_features, self.train_targets = features[:-60*24], targets[:-60*24]
        self.val_features, self.val_targets = features[-60*24:], targets[-60*24:]

    def get_train_data(self):
        return self.train_features, self.train_targets

    def get_test_data(self):
        return self.test_features, self.test_targets, self.test_data

    def get_val_data(self):
        return self.val_features, self.val_targets

    def get_scaled_features(self):
        return self.scaled_features

    def get_orig_data(self):
        return self.orig_data
```
Train.py:
```python
import sys
import json
from pprint import pprint

import numpy as np
import matplotlib.pyplot as plt

import DataProcessor
import NeuralNetwork

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
train_features, train_targets = data_processor.get_train_data()
val_features, val_targets = data_processor.get_val_data()

# Initialize NeuralNetwork
N_i = train_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate)

losses = {'train': [], 'validation': []}

def MSE(y, Y):
    return np.mean((y - Y)**2)

for ii in range(iterations):
    # Pick 128 random records from the training data set
    batch = np.random.choice(train_features.index, size=128)
    X, y = train_features.loc[batch].values, train_targets.loc[batch]['cnt']
    network.train(X, y)

    train_loss = MSE(network.run(train_features).T, train_targets['cnt'].values)
    val_loss = MSE(network.run(val_features).T, val_targets['cnt'].values)
    sys.stdout.write("\rProgress: {:2.1f}".format(100 * ii/float(iterations))
                     + "% ... Training loss: " + str(train_loss)[:5]
                     + " ... Validation loss: " + str(val_loss)[:5])
    sys.stdout.flush()
    losses['train'].append(train_loss)
    losses['validation'].append(val_loss)

# Store weights
weights_input_to_hidden, weights_hidden_to_output = network.get_weights()
np.save('weights_input_to_hidden', weights_input_to_hidden)
np.save('weights_hidden_to_output', weights_hidden_to_output)

# Plot losses
plt.plot(losses['train'], label='Training loss')
plt.plot(losses['validation'], label='Validation loss')
plt.legend()
_ = plt.ylim()
plt.show()
```
Run.py:
```python
import json
from pprint import pprint

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import DataProcessor
import NeuralNetwork

# Get training parameters
with open('networkConfig.json') as config_file:
    config = json.load(config_file)
pprint(config)
iterations = config['iterations']
learning_rate = config['learning_rate']
hidden_nodes = config['hidden_nodes']
output_nodes = config['output_nodes']

# Get data
data_processor = DataProcessor.DataProcessor('Bike-Sharing-Dataset/hour.csv')
data_processor.virtualize()
data_processor.normalize()
data_processor.split()
test_features, test_targets, test_data = data_processor.get_test_data()
scaled_features = data_processor.get_scaled_features()
orig_data = data_processor.get_orig_data()
mean, std = scaled_features['cnt']

# Initialize the network with the stored weights
weights_input_to_hidden = np.load('weights_input_to_hidden.npy')
weights_hidden_to_output = np.load('weights_hidden_to_output.npy')
N_i = test_features.shape[1]
network = NeuralNetwork.NeuralNetwork(N_i, hidden_nodes, output_nodes, learning_rate,
                                      weights_input_to_hidden=weights_input_to_hidden,
                                      weights_hidden_to_output=weights_hidden_to_output)

# Run network prediction; undo the normalization of 'cnt'
predictions = network.run(test_features).T * std + mean

# Plot prediction and ground truth
fig, ax = plt.subplots(figsize=(8, 4))
ax.plot(predictions[0], label='Prediction')
ax.plot((test_targets['cnt']*std + mean).values, label='Data')
ax.set_xlim(right=len(predictions[0]))
ax.legend()

dates = pd.to_datetime(orig_data.loc[test_data.index]['dteday'])
dates = dates.apply(lambda d: d.strftime('%b %d'))
ax.set_xticks(np.arange(len(dates))[12::24])
_ = ax.set_xticklabels(dates[12::24], rotation=45)
plt.show()
```
networkConfig.json:
{ "iterations": 10000, "learning_rate": 0.1, "hidden_nodes": 7, "output_nodes": 1 }
Download and unpack the Bike Sharing dataset from the UCI Machine Learning Repository:

```bash
> curl -O https://archive.ics.uci.edu/ml/machine-learning-databases/00275/Bike-Sharing-Dataset.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  273k  100  273k    0     0  26888      0  0:00:10  0:00:10 --:--:-- 59889
> unzip Bike-Sharing-Dataset.zip
Archive:  Bike-Sharing-Dataset.zip
  inflating: Readme.txt
  inflating: day.csv
  inflating: hour.csv
```
Trained on this data, the neural network will be able to predict bike rental usage. First, a look at the data and the preprocessing steps:
```python
>>> from DataProcessor import DataProcessor as dp
>>> data_processor = dp('Bike-Sharing-Dataset/hour.csv')
>>> data_processor.show_data()
   instant      dteday  season  yr  mnth  hr  holiday  weekday  workingday  \
0        1  2011-01-01       1   0     1   0        0        6           0
1        2  2011-01-01       1   0     1   1        0        6           0
2        3  2011-01-01       1   0     1   2        0        6           0
3        4  2011-01-01       1   0     1   3        0        6           0
4        5  2011-01-01       1   0     1   4        0        6           0

   weathersit  temp   atemp   hum  windspeed  casual  registered  cnt
0           1  0.24  0.2879  0.81        0.0       3          13   16
1           1  0.22  0.2727  0.80        0.0       8          32   40
2           1  0.22  0.2727  0.80        0.0       5          27   32
3           1  0.24  0.2879  0.75        0.0       3          10   13
4           1  0.24  0.2879  0.75        0.0       0           1    1
```
```python
>>> data_processor.virtualize()
>>> data_processor.show_data()
   yr  holiday  temp   hum  windspeed  casual  registered  cnt  season_1  \
0   0        0  0.24  0.81        0.0       3          13   16         1
1   0        0  0.22  0.80        0.0       8          32   40         1
2   0        0  0.22  0.80        0.0       5          27   32         1
3   0        0  0.24  0.75        0.0       3          10   13         1
4   0        0  0.24  0.75        0.0       0           1    1         1

   season_2  ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  weekday_2  \
0         0  ...      0      0      0          0          0          0
1         0  ...      0      0      0          0          0          0
2         0  ...      0      0      0          0          0          0
3         0  ...      0      0      0          0          0          0
4         0  ...      0      0      0          0          0          0

   weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          1
1          0          0          0          1
2          0          0          0          1
3          0          0          0          1
4          0          0          0          1

[5 rows x 59 columns]
```
```python
>>> data_processor.normalize()
>>> data_processor.show_data()
   yr  holiday      temp       hum  windspeed    casual  registered       cnt  \
0   0        0 -1.334609  0.947345  -1.553844 -0.662736   -0.930162 -0.956312
1   0        0 -1.438475  0.895513  -1.553844 -0.561326   -0.804632 -0.823998
2   0        0 -1.438475  0.895513  -1.553844 -0.622172   -0.837666 -0.868103
3   0        0 -1.334609  0.636351  -1.553844 -0.662736   -0.949983 -0.972851
4   0        0 -1.334609  0.636351  -1.553844 -0.723582   -1.009445 -1.039008

   season_1  season_2  ...  hr_21  hr_22  hr_23  weekday_0  weekday_1  \
0         1         0  ...      0      0      0          0          0
1         1         0  ...      0      0      0          0          0
2         1         0  ...      0      0      0          0          0
3         1         0  ...      0      0      0          0          0
4         1         0  ...      0      0      0          0          0

   weekday_2  weekday_3  weekday_4  weekday_5  weekday_6
0          0          0          0          0          1
1          0          0          0          0          1
2          0          0          0          0          1
3          0          0          0          0          1
4          0          0          0          0          1

[5 rows x 59 columns]
```
```bash
> python Train.py
```
Before training, you may want to tweak the hyperparameters in networkConfig.json yourself:
{ "iterations": 10000, "learning_rate": 0.1, "hidden_nodes": 7, "output_nodes": 1 }
After training finishes, you will see a plot of the training and validation losses:
Once the network is trained, you can run it:
```bash
> python Run.py
```
You will then see a plot like the following, comparing the predictions with the actual data: