Andrew Ng Machine Learning Lecture Notes, Chapter 8: Regularization
The problem of overfitting
If we have too many features, the learned hypothesis may fit the training set very well, but fail to generalize to new examples.
- Underfitting => high bias
- Overfitting => high variance
Methods for addressing overfitting
- Reduce the number of features
Manually choose which features to keep => but this also discards some potentially useful information.
Use a model selection algorithm.
- Regularization
Keep all the features, but reduce the magnitude/values of the parameters $\theta_j$.
Works well when we have lots of features, each of which contributes a bit to predicting $y$.
Cost function
Intuition
Small values for the parameters $\theta_0, \theta_1, \ldots, \theta_n$:
- “Simpler” hypothesis
- Less prone to overfitting
- We cannot know in advance which parameters to shrink, so we penalize all of them and ask for every parameter to be small.
Cost function
$$J(\theta) = \frac{1}{2m}\left[\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)^2 + \lambda\sum_{j=1}^{n}\theta_j^2\right]$$
NOTE:
- We do not penalize $\theta_0$ (the regularization sum starts at $j = 1$).
- $\lambda$ is the regularization parameter and needs to be chosen well: too large, and all parameters are driven toward zero so the model underfits; too small, and the overfitting remains.
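As a concrete illustration, here is a minimal NumPy sketch of this regularized cost. The function name and the sample numbers are hypothetical, not from the lecture:

```python
import numpy as np

def regularized_cost(theta, X, y, lam):
    """J(theta) for regularized linear regression (hypothetical helper).

    X: (m, n+1) design matrix whose first column is all ones,
    y: (m,) targets, theta: (n+1,) parameters, lam: lambda >= 0.
    """
    m = len(y)
    residuals = X @ theta - y                 # h_theta(x^(i)) - y^(i)
    fit_term = np.sum(residuals ** 2)         # squared-error part
    reg_term = lam * np.sum(theta[1:] ** 2)   # skip theta_0: it is not penalized
    return (fit_term + reg_term) / (2 * m)

# Tiny usage example with made-up numbers
X = np.array([[1.0, 0.5], [1.0, 1.5], [1.0, 2.5]])
y = np.array([1.0, 2.0, 3.0])
theta = np.array([0.1, 0.9])
print(regularized_cost(theta, X, y, lam=1.0))
```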
Regularized linear regression
Gradient descent
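Gradient descent uses the usual update rules plus the derivative of the penalty term; since $\theta_0$ is not penalized, it keeps its unregularized update:

$$\theta_0 := \theta_0 - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_0^{(i)}$$

$$\theta_j := \theta_j - \alpha\left[\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)} + \frac{\lambda}{m}\theta_j\right],\quad j = 1, \ldots, n$$

The second update can be rewritten as $\theta_j := \theta_j\left(1 - \alpha\frac{\lambda}{m}\right) - \alpha\frac{1}{m}\sum_{i=1}^{m}\left(h_\theta(x^{(i)}) - y^{(i)}\right)x_j^{(i)}$: each iteration first shrinks $\theta_j$ by a factor slightly less than 1, then performs the ordinary gradient step.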
Normal equation
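With regularization, the closed-form solution gains an extra matrix whose top-left entry is 0, so that $\theta_0$ stays unpenalized:

$$\theta = \left(X^TX + \lambda\begin{bmatrix}0 & & & \\ & 1 & & \\ & & \ddots & \\ & & & 1\end{bmatrix}\right)^{-1}X^Ty$$

A useful side effect: for $\lambda > 0$ the matrix in parentheses is always invertible, even when $X^TX$ itself is singular (e.g., when $m \le n$).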
Regularized logistic regression
Cost function
$$J(\theta) = -\frac{1}{m}\sum_{i=1}^{m}\left[y^{(i)}\log h_\theta(x^{(i)}) + \left(1 - y^{(i)}\right)\log\left(1 - h_\theta(x^{(i)})\right)\right] + \frac{\lambda}{2m}\sum_{j=1}^{n}\theta_j^2$$
Gradient descent
The update rule looks the same as for regularized linear regression, but the hypothesis is now $h_\theta(x) = \frac{1}{1 + e^{-\theta^Tx}}$, so it is a different algorithm.
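A minimal NumPy sketch of this cost and its gradient, assuming a design matrix with a leading column of ones; the function names and data are hypothetical:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_cost_and_grad(theta, X, y, lam):
    """Regularized logistic regression cost and gradient (hypothetical helper).

    X: (m, n+1) with a leading column of ones, y: (m,) labels in {0, 1},
    theta: (n+1,), lam: lambda >= 0. theta_0 is not penalized.
    """
    m = len(y)
    h = sigmoid(X @ theta)                    # h_theta(x^(i)) for every example
    cost = -(y @ np.log(h) + (1 - y) @ np.log(1 - h)) / m
    cost += lam * np.sum(theta[1:] ** 2) / (2 * m)
    grad = X.T @ (h - y) / m                  # unregularized gradient
    grad[1:] += (lam / m) * theta[1:]         # penalty term, skipping theta_0
    return cost, grad

# One hand-rolled gradient descent step on made-up data
X = np.array([[1.0, 0.2], [1.0, 1.1], [1.0, 2.3]])
y = np.array([0.0, 0.0, 1.0])
theta = np.zeros(2)
alpha = 0.1
cost, grad = logistic_cost_and_grad(theta, X, y, lam=1.0)
theta -= alpha * grad
print(cost, theta)
```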