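[TOC]

For a newer and more complete version of the《机器学习》(Machine Learning) series, along with tutorials on Python, Go, data structures and algorithms, web scraping, and artificial intelligence, see: <a target="_blank" href="https://www.cnblogs.com/nickchen121/p/11686958.html">http://www.javashuo.com/article/p-vozphyqp-cm.html</a>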
    import pandas as pd
    import matplotlib.pyplot as plt
    from matplotlib.font_manager import FontProperties
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LinearRegression
    %matplotlib inline

    # Font used for the axis labels and the legend (the path is macOS-specific)
    font = FontProperties(fname='/Library/Fonts/Heiti.ttc')

    # Read the whitespace-separated housing data; the first row holds the column names
    df = pd.read_csv('housing-data.txt', sep=r'\s+', header=0)
    X = df.iloc[:, :-1].values
    y = df['MEDV'].values

    # Split the data into a training set (70%) and a test set (30%).
    # No random_state is set, so the split changes on every run.
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

    lr = LinearRegression()
    # Train the model
    lr.fit(X_train, y_train)
    # Predict on the training set
    y_train_predict = lr.predict(X_train)
    # Predict on the test set
    y_test_predict = lr.predict(X_test)
    # Residuals on the training data: y_train_predict - y_train
    plt.scatter(y_train_predict, y_train_predict - y_train,
                c='r', marker='s', edgecolor='white', label='Training data')
    # Residuals on the test data: y_test_predict - y_test
    plt.scatter(y_test_predict, y_test_predict - y_test,
                c='g', marker='o', edgecolor='white', label='Test data')
    plt.xlabel('Predicted value', fontproperties=font)
    plt.ylabel('Residual', fontproperties=font)
    # Draw the horizontal line y=0, where the residual is zero
    plt.hlines(y=0, xmin=-10, xmax=50, color='k')
    plt.xlim(-10, 50)
    plt.legend(prop=font)
    plt.show()
(Figure: residuals plotted against predicted values for the training data and the test data, with a horizontal line at zero.)
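The residual plot can also be summarised with a few numbers. The sketch below is a minimal pure-Python illustration using made-up values, not the housing data, and `residual_summary` is a hypothetical helper name. It computes the same quantity that appears on the vertical axis of the plot: prediction minus actual value.

```python
from statistics import mean


def residual_summary(y_pred, y_true):
    """Return the residuals (prediction - actual), their mean, and the largest magnitude."""
    if len(y_pred) != len(y_true):
        raise ValueError("y_pred and y_true must have the same length")
    residuals = [p - t for p, t in zip(y_pred, y_true)]
    return {
        "residuals": residuals,
        "mean": mean(residuals),
        "max_abs": max(abs(r) for r in residuals),
    }


# Made-up values for illustration only.
summary = residual_summary([22.0, 30.5, 18.0], [24.0, 29.5, 18.0])
print(summary)
```

A mean residual close to zero with no visible pattern around the zero line is what a well-specified linear model would be expected to show.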
    from sklearn.metrics import mean_squared_error

    # Mean squared error on the training set
    train_mse = mean_squared_error(y_train, y_train_predict)
    # Mean squared error on the test set
    test_mse = mean_squared_error(y_test, y_test_predict)
    print('Training set MSE: {}'.format(train_mse))
    print('Test set MSE: {}'.format(test_mse))

Output:

    Training set MSE: 23.049177061822277
    Test set MSE: 19.901828312902534
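For reference, the mean squared error is the average of the squared differences between actual and predicted values, MSE = (1/n) Σ (y_i - ŷ_i)². The following is a minimal sketch of that formula in plain Python with made-up numbers. The function name `mse` is an illustrative choice, and this is not how scikit-learn implements `mean_squared_error` internally.

```python
def mse(y_true, y_pred):
    """Mean squared error: the average of the squared differences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


# Made-up values: squared errors are 1, 0 and 4, so the result is 5/3.
value = mse([3.0, 5.0, 2.0], [2.0, 5.0, 4.0])
print(value)
```

Because the errors are squared, a few large misses raise the MSE much more than many small ones.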
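In the run discussed here, the training-set MSE is 19.4 and the test-set MSE is 28.4: the error on the test set is larger, which indicates that the model has overfit the training set. These figures do not match the printed output above (about 23.0 and 19.9), presumably because `train_test_split` is called without a fixed `random_state`, so every run draws a different split and produces different errors.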
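The training and test MSE depend on which rows land in each subset, so a single split can make the gap look larger or smaller than it is. The sketch below illustrates this under stated assumptions: it uses synthetic one-feature data rather than the housing data, a closed-form least-squares line rather than `LinearRegression`, and hypothetical helper names (`fit_line`, `line_mse`, `split_and_score`). With scikit-learn itself, passing `random_state` to `train_test_split` makes a split repeatable.

```python
import random


def fit_line(xs, ys):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def line_mse(xs, ys, slope, intercept):
    """Mean squared error of the fitted line on the given points."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def split_and_score(xs, ys, seed, test_size=0.3):
    """Shuffle with a fixed seed, split 70/30, and return (train_mse, test_mse)."""
    rng = random.Random(seed)
    indices = list(range(len(xs)))
    rng.shuffle(indices)
    n_test = round(len(xs) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [xs[i] for i in train_idx]
    y_train = [ys[i] for i in train_idx]
    x_test = [xs[i] for i in test_idx]
    y_test = [ys[i] for i in test_idx]
    slope, intercept = fit_line(x_train, y_train)
    return (line_mse(x_train, y_train, slope, intercept),
            line_mse(x_test, y_test, slope, intercept))


# Synthetic data for illustration: y = 2x + 1 plus Gaussian noise.
data_rng = random.Random(0)
xs = [data_rng.uniform(0, 10) for _ in range(100)]
ys = [2 * x + 1 + data_rng.gauss(0, 3) for x in xs]

# The same data scored under five different splits.
scores = [split_and_score(xs, ys, seed) for seed in range(5)]
for seed, (train_mse, test_mse) in enumerate(scores):
    print(f"seed={seed} train MSE={train_mse:.2f} test MSE={test_mse:.2f}")
```

Comparing the two errors over several splits, or using cross-validation, gives a steadier basis for judging overfitting than one split alone.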