Machine Learning Study Notes (1): Linear Regression and Logistic Regression

Date: 2019-12-05


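These notes summarize what I took away from studying machine learning. If anything here is misunderstood, please point it out and I will do my best to correct it.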

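I studied machine learning mainly through Professor Andrew Ng's "Machine Learning" course, offered as a MOOC. Professor Ng clearly put a great deal of care into teaching it, the programming exercises in particular, and it is a really nice course. Many thanks to him for his work, and to the friend who told me about the platform. While writing this post I drew heavily on the materials Professor Ng provides in the course; thanks again.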

Please credit the source when reposting: http://blog.csdn.net/u010278305

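What is machine learning? As I see it, machine learning means taking some given information (the area of a house, the pixel values of each point in an image, and so on), "learning" from it to obtain a "learned model", and then using that model to output the result we care about whenever information of the same type is fed in. For example, for handwritten digit recognition we are given some known information (a set of images together with the digit written in each one), and we can learn as follows:

1. Feed the pixel values of every image, together with the digit each image shows, into the "learning system".

2. Through the "learning process" we obtain a "learned model" that, given a new image of a handwritten digit, returns a reasonable estimate of the digit it shows.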

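What is linear regression? My understanding is that we fit the given data with a linear function, choosing the function that best meets our requirement (for example, the smallest squared error; we will shortly define a cost function that quantifies this goal). We can then use this function to make predictions for new inputs (for example, predicting a house's price from its area), as shown in the figure below: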

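Suppose the hypothesis function we want to end up with has the following form:

$$h_\theta(x) = \theta^T x = \theta_0 + \theta_1 x_1 + \dots + \theta_n x_n$$

where x is our input (with x_0 = 1 added so that \theta_0 acts as the intercept) and theta is the parameter vector we want to find. The code indexes from 1, so \theta_0 is stored as theta(1).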
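As a small illustration of this hypothesis, the sketch below builds the design matrix by prepending a column of ones (so that theta(1) acts as the intercept) and evaluates h for a few inputs. The areas and the theta values are made-up numbers chosen for easy arithmetic, not fitted results.

```octave
% Toy inputs (hypothetical): house areas in square metres.
area = [50; 80; 120];
m = length(area);

% Prepend x_0 = 1 so that theta(1) plays the role of the intercept.
X = [ones(m, 1), area];

% Made-up parameters, chosen only to make the arithmetic easy to follow.
theta = [10; 2];

% Vectorized hypothesis: one prediction per row of X, i.e. 10 + 2 * area.
h = X * theta;
disp(h');
```

The same `X * theta` product is what the cost functions later in this post use to obtain all predictions at once.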

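The cost function is:

$$J(\theta) = \frac{1}{2m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2$$

where m is the number of training examples. Our goal is to find the theta that minimizes this cost.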

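To do that we also need the derivative of the cost function with respect to the parameters theta, that is, the gradient, which has the following form:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$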
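The gradient can be computed for all components of theta at once. The sketch below, on a tiny made-up data set, compares a component-by-component loop that follows the sum literally with the equivalent matrix expression; the variable names are my own.

```octave
% Tiny made-up data set; the first column of X is x_0 = 1.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [0; 0];
m = length(y);

% Prediction error for every training example.
err = X * theta - y;

% Loop form: one partial derivative per parameter.
grad_loop = zeros(size(theta));
for j = 1:numel(theta)
  grad_loop(j) = sum(err .* X(:, j)) / m;
end

% Vectorized form: the same sums written as a matrix product.
grad = (X' * err) / m;
disp(grad');
```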

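With this in hand, we can find theta with the gradient descent algorithm. The procedure is to repeat the following update until convergence, updating all \theta_j simultaneously:

$$\theta_j := \theta_j - \alpha\,\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$

where \alpha is the learning rate.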
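A minimal sketch of batch gradient descent for linear regression follows. The data are synthetic (generated from y = 1 + 2x), and the learning rate `alpha` and iteration count `num_iters` are assumed values that happen to work for this data; other data sets will usually need different settings.

```octave
% Synthetic data generated from y = 1 + 2x, so the ideal theta is [1; 2].
x = [0; 1; 2; 3];
y = 1 + 2 * x;
m = length(y);
X = [ones(m, 1), x];

alpha = 0.1;        % learning rate (assumed value)
num_iters = 2000;   % number of iterations (assumed value)
theta = zeros(2, 1);
J_history = zeros(num_iters, 1);

for iter = 1:num_iters
  err = X * theta - y;
  % Simultaneous update of every theta_j using the gradient of the cost.
  theta = theta - alpha * (X' * err) / m;
  % Record the cost after the update to monitor convergence.
  J_history(iter) = sum((X * theta - y) .^ 2) / (2 * m);
end

disp(theta');
```

Recording `J_history` is optional, but a cost that fails to decrease from one iteration to the next is a common sign that `alpha` is too large.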

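In fact, there are more and better algorithms to choose from for finding theta. We can use one by calling MATLAB's fminunc function; all we need to supply is the cost and the gradient for it to call.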

Based on the formulas above, here is a concrete implementation of the cost function:

function J = computeCostMulti(X, y, theta)
%COMPUTECOSTMULTI Compute cost for linear regression with multiple variables
%   J = COMPUTECOSTMULTI(X, y, theta) computes the cost of using theta as the
%   parameter for linear regression to fit the data points in X and y

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;

% Instructions: Compute the cost of a particular choice of theta
%               You should set J to the cost.
hThetaX=X*theta;
J=1/(2*m)*sum((hThetaX-y).^2);

end
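computeCostMulti returns only the cost, while fminunc with the 'GradObj' option expects a function that returns both the cost and the gradient. The sketch below shows one way this might look. `linRegCostGrad` is a hypothetical helper of my own, not part of the course code, and the data are synthetic. The block is written as an Octave script (hence the leading `1;`); in MATLAB the function would go in its own file or at the end of the script, and fminunc requires the Optimization Toolbox.

```octave
1;  % makes this file an Octave script, so a function can be defined inline

% Hypothetical helper (not from the course): cost and gradient in one call.
function [J, grad] = linRegCostGrad(theta, X, y)
  m = length(y);
  err = X * theta - y;
  J = sum(err .^ 2) / (2 * m);
  grad = (X' * err) / m;
end

% Synthetic data generated from y = 1 + 2x.
x = [0; 1; 2; 3];
y = 1 + 2 * x;
X = [ones(length(y), 1), x];

% 'GradObj' tells fminunc that our function also returns the gradient.
options = optimset('GradObj', 'on', 'MaxIter', 400);
[theta, J] = fminunc(@(t) linRegCostGrad(t, X, y), zeros(2, 1), options);
disp(theta');
```

Unlike gradient descent, no learning rate has to be chosen here; fminunc picks its own step sizes.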

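What is logistic regression? Unlike linear regression, logistic regression predicts one of a few discrete values (for example, when deciding whether an email is spam, the output is only 0 or 1). To that end the hypothesis function is passed through the sigmoid function so that its output always lies between 0 and 1.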

The hypothesis function is:

$$h_\theta(x) = g(\theta^T x), \qquad g(z) = \frac{1}{1 + e^{-z}}$$
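To see how this hypothesis turns into a 0/1 output, the sketch below evaluates it on a few made-up inputs and thresholds the result. The cut-off of 0.5 is the usual convention and is an assumption here, as are the theta values.

```octave
% Sigmoid as an anonymous function so the example is self-contained.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Made-up inputs (first column is x_0 = 1) and made-up parameters.
X = [1 -2; 1 0; 1 3];
theta = [0; 1];

% Hypothesis values always lie strictly between 0 and 1.
h = sigmoid(X * theta);

% Turn them into discrete predictions with an assumed 0.5 threshold.
p = h >= 0.5;
disp([h, p]);
```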

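The cost function is:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right]$$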

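The gradient is given below. Note that it has the same form as for linear regression, although h_\theta is now the sigmoid hypothesis:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$$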

With these in place we can find the optimal theta with fminunc; we only need to provide the cost and gradient computation, coded as follows:

function [J, grad] = costFunction(theta, X, y)
%COSTFUNCTION Compute cost and gradient for logistic regression
%   [J, grad] = COSTFUNCTION(theta, X, y) computes the cost of using theta as the
%   parameter for logistic regression and the gradient of the cost
%   w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
%
% Note: grad should have the same dimensions as theta
%
hThetaX=sigmoid(X * theta);
J=1/m*sum(-y.*log(hThetaX)-(1-y).*log(1-hThetaX));
grad=(1/m*(hThetaX-y)'*X)';

end

The sigmoid function is as follows:

function g = sigmoid(z)
%SIGMOID Compute sigmoid function
%   g = SIGMOID(z) computes the sigmoid of z.

% You need to return the following variables correctly 
g = zeros(size(z));

% Instructions: Compute the sigmoid of each value of z (z can be a matrix,
%               vector or scalar).
g = 1 ./ (1 + exp(-z));   % element-wise, so z may be a scalar, vector or matrix

end
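A quick sanity check for the logistic cost and gradient: at theta = 0 every hypothesis value is 0.5, so the cost must equal log(2), about 0.693, whatever the data. The sketch below verifies this on a small made-up data set, using anonymous functions of my own in place of costFunction and sigmoid so that it runs on its own.

```octave
% Stand-alone versions of the formulas above (names are my own).
sigmoid = @(z) 1 ./ (1 + exp(-z));
cost = @(theta, X, y) mean(-y .* log(sigmoid(X * theta)) ...
                           - (1 - y) .* log(1 - sigmoid(X * theta)));
gradient = @(theta, X, y) (X' * (sigmoid(X * theta) - y)) / length(y);

% Small made-up data set; the first column of X is x_0 = 1.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];
y = [0; 0; 1; 1];

theta0 = zeros(2, 1);
J0 = cost(theta0, X, y);         % equals log(2) at theta = 0
grad0 = gradient(theta0, X, y);  % equals X' * (0.5 - y) / m
disp(J0);
disp(grad0');
```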

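Sometimes overfitting occurs: the fitted parameters match the training data very well, but the predictions clearly depart from the underlying trend, as in the figure below: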

In that case we need to regularize, penalizing the parameters so that every theta value other than theta(1) stays small.

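The regularized cost function is given below. The penalty sum starts at j = 1, so the intercept \theta_0 (stored as theta(1) in the code) is not penalized:

$$J(\theta) = \frac{1}{m}\sum_{i=1}^{m}\left[-y^{(i)}\log h_\theta(x^{(i)}) - \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$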

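The regularized gradient is:

$$\frac{\partial J(\theta)}{\partial \theta_0} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j \qquad (j \ge 1)$$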

The code below computes the regularized cost and gradient:

function [J, grad] = costFunctionReg(theta, X, y, lambda)
%COSTFUNCTIONREG Compute cost and gradient for logistic regression with regularization
%   [J, grad] = COSTFUNCTIONREG(theta, X, y, lambda) computes the cost of using
%   theta as the parameter for regularized logistic regression and the
%   gradient of the cost w.r.t. the parameters.

% Initialize some useful values
m = length(y); % number of training examples

% You need to return the following variables correctly 
J = 0;
grad = zeros(size(theta));

% Instructions: Compute the cost of a particular choice of theta.
%               You should set J to the cost.
%               Compute the partial derivatives and set grad to the partial
%               derivatives of the cost w.r.t. each parameter in theta
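hThetaX=sigmoid(X * theta);
% Zero the intercept in the local copy of theta so that theta(1) is left out
% of the penalty; hThetaX above was already computed with the original theta.
theta(1)=0;
J=1/m*sum(-y.*log(hThetaX)-(1-y).*log(1-hThetaX))+lambda/(2*m)*sum(theta.^2);
grad=(1/m*(hThetaX-y)'*X)' + lambda/m*theta;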

end
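To make the effect of the penalty concrete, the sketch below computes the plain and the regularized cost and gradient side by side for one made-up theta. It re-implements the formulas inline rather than calling costFunctionReg, and the data, theta and lambda are illustrative values only.

```octave
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Small made-up data set; the first column of X is x_0 = 1.
X = [1 0.5; 1 1.5; 1 2.5; 1 3.5];
y = [0; 0; 1; 1];
m = length(y);
theta = [5; 2];   % made-up parameters
lambda = 1;       % made-up regularization constant

h = sigmoid(X * theta);
J_plain = mean(-y .* log(h) - (1 - y) .* log(1 - h));
grad_plain = (X' * (h - y)) / m;

% Copy of theta with the intercept zeroed, so theta(1) is not penalized.
theta_pen = [0; theta(2:end)];
J_reg = J_plain + lambda / (2 * m) * sum(theta_pen .^ 2);
grad_reg = grad_plain + lambda / m * theta_pen;

% The penalty adds lambda/(2m) * theta(2)^2 = 0.5 to the cost and
% lambda/m * theta(2) = 0.5 to the second gradient component only.
disp(J_reg - J_plain);
disp((grad_reg - grad_plain)');
```

Setting `lambda = 0` makes both differences vanish, which recovers the unregularized cost and gradient.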

For linear regression, regularization works in essentially the same way.
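As a sketch of what that might look like, the block below adds the same penalty to the squared-error cost and gradient of linear regression. The data, theta and lambda are made-up values picked so the result can be checked by hand.

```octave
% Tiny made-up data set; the first column of X is x_0 = 1.
X = [1 1; 1 2; 1 3];
y = [1; 2; 3];
theta = [1; 1];   % made-up parameters
lambda = 3;       % made-up regularization constant
m = length(y);

err = X * theta - y;             % here every error equals 1
theta_pen = [0; theta(2:end)];   % intercept is not penalized

% Squared-error cost plus the penalty: 3/6 + (3/6) * 1 = 1.
J = sum(err .^ 2) / (2 * m) + lambda / (2 * m) * sum(theta_pen .^ 2);

% Gradient plus the penalty term: [1; 2] + [0; 1] = [1; 3].
grad = (X' * err) / m + lambda / m * theta_pen;
disp(J);
disp(grad');
```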

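As for choosing the regularization constant lambda, we can split the data into three parts: a training set, a cross-validation set and a test set. For each candidate lambda, we first fit theta on the training set, then compute the cost on both the training set and the cross-validation set, and pick a suitable lambda by comparing the results, as shown in the figure below: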
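The selection loop might be sketched as follows for regularized linear regression. Everything specific here is an assumption for illustration: the synthetic data, the degree-5 polynomial features, the candidate lambda values, and the use of the closed-form regularized normal equation in place of gradient descent or fminunc to keep the example short. The errors are computed without the penalty term so that they stay comparable across lambda values.

```octave
% Synthetic training set: a smooth curve plus an alternating perturbation.
x_train = (0:0.25:2)';
idx = (1:numel(x_train))';
y_train = sin(2 * x_train) + 0.2 * (-1) .^ idx;

% Synthetic cross-validation set: unperturbed samples of the same curve.
x_val = (0.125:0.25:1.875)';
y_val = sin(2 * x_val);

% Polynomial features 1, x, ..., x^p (column 1 is the intercept term).
p = 5;
features = @(x) bsxfun(@power, x, 0:p);
X_train = features(x_train);
X_val = features(x_val);

% Penalty matrix: identity with a zero for the unpenalized intercept.
L = eye(p + 1);
L(1, 1) = 0;

lambdas = [0 0.001 0.003 0.01 0.03 0.1 0.3 1 3 10];
err_train = zeros(size(lambdas));
err_val = zeros(size(lambdas));

for k = 1:numel(lambdas)
  % Fit on the training set only (regularized normal equation).
  theta = (X_train' * X_train + lambdas(k) * L) \ (X_train' * y_train);
  % Unregularized squared-error cost on each set.
  err_train(k) = sum((X_train * theta - y_train) .^ 2) / (2 * numel(y_train));
  err_val(k) = sum((X_val * theta - y_val) .^ 2) / (2 * numel(y_val));
end

% Keep the lambda with the lowest cross-validation error.
[~, best] = min(err_val);
best_lambda = lambdas(best);
disp([lambdas; err_train; err_val]');
disp(best_lambda);
```

The test set plays no part in this loop; it is held back to estimate the error of the model once lambda has been fixed.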
