%matplotlib inline

Regularization parameter for SVCs

The following example illustrates the effect of scaling the regularization parameter when using an SVM for classification.

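If we consider the loss function to be the individual error per sample, then the data-fit term, i.e. the sum of the errors over all samples, grows as we add more samples. The penalty term, however, does not grow.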
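A minimal sketch of this effect, assuming a fixed, arbitrary weight vector and synthetic data (both are illustrative and not part of the original example): the summed hinge loss grows with the number of samples, while the l2 penalty stays the same.

```python
import numpy as np

rng = np.random.RandomState(0)
w = rng.randn(5)  # hypothetical fixed weight vector, for illustration only


def data_fit_and_penalty(n_samples):
    """Return (summed hinge loss, l2 penalty) for n_samples synthetic points."""
    X = rng.randn(n_samples, 5)
    y = np.sign(rng.randn(n_samples))  # random labels in {-1, +1}
    hinge = np.maximum(0.0, 1.0 - y * (X @ w))
    return hinge.sum(), 0.5 * np.dot(w, w)


for n in (100, 1000, 10000):
    fit, penalty = data_fit_and_penalty(n)
    print(n, fit, penalty)
```

The first printed value scales roughly linearly with n, while the second does not depend on n at all.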

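When cross validation is used to set the regularization parameter C, the number of training samples differs between the full problem and the smaller problems inside the cross-validation folds.

Because our loss function depends on the number of samples, the number of samples affects the selected value of C. The question that arises is: "How do we optimally adjust C to account for different numbers of training samples?"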
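As a rough illustration of how the training-set size changes between splits, the sketch below uses `ShuffleSplit` on placeholder data. The array `X_demo` and the absolute sizes 30, 50 and 70 are assumptions chosen to mirror fractions 0.3, 0.5 and 0.7 of 100 samples.

```python
import numpy as np
from sklearn.model_selection import ShuffleSplit

n_samples = 100
X_demo = np.zeros((n_samples, 3))  # placeholder data; only its length matters

train_sizes = []
for n_train in (30, 50, 70):  # absolute counts, i.e. fractions 0.3, 0.5, 0.7
    cv = ShuffleSplit(n_splits=1, train_size=n_train, test_size=30,
                      random_state=0)
    train_idx, test_idx = next(cv.split(X_demo))
    train_sizes.append(len(train_idx))
    print(len(train_idx), len(test_idx))
```

Each split trains on a different number of samples, so the data-fit term is summed over a different number of points each time.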

The figures below illustrate the effect of scaling C to compensate for the change in the number of samples, for both the $l1$ penalty and the $l2$ penalty.

$l1-penalty\ case$

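In the l1 case, theory says that prediction consistency (i.e. that, under the given hypothesis, the learned estimator predicts as well as a model that knows the true distribution) is not possible because of the bias of the l1 penalty. It does say, however, that model consistency, in the sense of finding the right set of non-zero parameters and their signs, can be achieved by scaling C.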

$l2-penalty\ case$

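Theory says that, in order to achieve prediction consistency, the penalty parameter should be kept constant as the number of samples grows.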

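Simulation results:

The two figures below plot C on the x-axis and the corresponding cross-validation score on the y-axis, for several different fractions of a generated dataset.

In the $l1$ penalty case, the cross-validation error correlates best with the test error when C is scaled with the number of samples n, as can be seen in the first and second plots.

In the $l2$ penalty case, the best result comes from the case where C is not scaled.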

The most common penalized SVM formulation (with slack variables $\xi(i) \geq 0$):

$$\begin{array}{l}{\min \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi(i)} \\ {\text { s.t. } y^{(i)}\left(w^{T} \Phi\left(x^{(i)}\right)+b\right) \geq 1-\xi(i), i=1,2 \ldots n}\end{array}$$

Interpretation: the SVM optimization problem is equivalent to hinge loss + L2 regularization.

Definition of the hinge loss:

$$ \max\{0, 1-m\} $$

Connection between the hinge loss and the SVM

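Suppose we take the variable m on the horizontal axis of the hinge loss to be the margin $m = \hat{y}\cdot y$, where $\hat{y}$ is the raw decision value of the SVM, i.e. a scaled signed distance from the hyperplane.

When $\hat{y}\cdot y \geq 1$, the sample is classified correctly with a sufficient margin and the loss is 0. When $\hat{y}\cdot y < 1$, the shortfall $1 - \hat{y}\cdot y$ is taken as the loss.

Hence: $\min\{\xi\} \Longleftrightarrow \min\{\text{hinge loss}\} \Longleftrightarrow \min\{\max\{0, 1-\hat{y}\cdot y\}\}$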
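The hinge loss described above can be sketched in a few lines of NumPy. The helper name `hinge_loss` and the sample decision values are hypothetical, chosen only to show one case on each side of the margin.

```python
import numpy as np


def hinge_loss(y_true, decision):
    """Per-sample hinge loss max(0, 1 - y * f(x)) for labels in {-1, +1}."""
    margin = np.asarray(y_true) * np.asarray(decision)
    return np.maximum(0.0, 1.0 - margin)


y_demo = np.array([1, 1, -1, -1])
f_demo = np.array([2.0, 0.5, -3.0, 0.2])  # hypothetical decision values
losses = hinge_loss(y_demo, f_demo)
print(losses)
```

Samples with margin at least 1 contribute nothing; a correctly classified sample inside the margin (0.5) and a misclassified one (-0.2) both incur a positive loss.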

Hence, substituting the slack variables (and writing $\omega = w$, $\gamma = b$ with a linear feature map $\Phi(x) = x$):

$$\begin{array}{l}{\min \frac{1}{2}\|w\|^{2}+C \sum_{i=1}^{n} \xi(i)} \\ {\text { s.t. } y^{(i)}\left(w^{T} \Phi\left(x^{(i)}\right)+b\right) \geq 1-\xi(i), i=1,2 \ldots n}\end{array} \\ \Longrightarrow \min _{\omega, \gamma}\left[C \sum_{i=1}^{n} \max \left\{0,1-\left(\omega^{T} x_{i}+\gamma\right) y_{i}\right\}+\frac{1}{2}\|\omega\|_{2}^{2}\right]$$

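Next, divide the objective by C so that the regularization parameter moves to the right-hand term, and call the resulting coefficient $\lambda = \frac{1}{2C}$. Dividing by a positive constant does not change the minimizer.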

$$\Longrightarrow \min _{\omega, \gamma}\left[ \sum_{i=1}^{n} \max \left\{0,1-\left(\omega^{T} x_{i}+\gamma\right) y_{i}\right\}+\lambda \|\omega\|_{2}^{2}\right] $$

The expression above is exactly hinge loss + L2 regularization.
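A quick numerical check of this rescaling, under the assumption that $\lambda = \frac{1}{2C}$ (which follows from dividing $\frac{1}{2}\|\omega\|_2^2$ by C). The data, the value of C and the test point `(w, b)` are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(20, 3)
y = np.sign(rng.randn(20))  # random labels in {-1, +1}
C = 0.7
lam = 1.0 / (2.0 * C)  # assumed relation between lambda and C


def objective_c(w, b):
    """C-parameterized form: C * sum(hinge) + 0.5 * ||w||^2."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return C * hinge.sum() + 0.5 * np.dot(w, w)


def objective_lambda(w, b):
    """lambda-parameterized form: sum(hinge) + lambda * ||w||^2."""
    hinge = np.maximum(0.0, 1.0 - y * (X @ w + b))
    return hinge.sum() + lam * np.dot(w, w)


w, b = rng.randn(3), 0.1
print(objective_c(w, b) / C, objective_lambda(w, b))
```

The two printed values agree for any `(w, b)`, so both forms share the same minimizer.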

A very good blog post introducing loss functions: https://blog.csdn.net/u010976453/article/details/78488279

Import the data and libraries

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import LinearSVC
from sklearn.model_selection import GridSearchCV
from sklearn import datasets

# cross-validation splitter
from sklearn.model_selection import ShuffleSplit

# a newly learned utility
from sklearn.utils import check_random_state

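MAKE_CLASSIFICATION (classification dataset generator):

n_features: total number of features; it comprises n_informative + n_redundant + n_repeated features, and any remaining features are filled with random noise
n_informative: number of informative features
n_redundant: number of redundant features, generated as random linear combinations of the informative features
n_repeated: number of duplicated features, drawn at random from the informative and redundant features
n_classes: number of classes
n_clusters_per_class: number of clusters that make up each class

Reference blog post: http://www.freesion.com/article/606117357/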
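To make the parameter list concrete, here is a small sketch with illustrative values (none of them come from the original example). With 10 features in total, of which 3 are informative, 2 redundant and 1 repeated, the remaining 4 columns are noise.

```python
from sklearn.datasets import make_classification

X_demo, y_demo = make_classification(
    n_samples=50, n_features=10, n_informative=3, n_redundant=2,
    n_repeated=1, n_classes=2, n_clusters_per_class=1, random_state=0)

# Features not covered by the three categories are pure noise
n_noise = 10 - 3 - 2 - 1
print(X_demo.shape, sorted(set(y_demo.tolist())), n_noise)
```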

$check\_random\_state()$ : Turn seed into a $np.random.RandomState$ instance
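A short sketch of how `check_random_state` treats its three kinds of input (an int seed, an existing `RandomState`, or `None`). The variable names are illustrative.

```python
import numpy as np
from sklearn.utils import check_random_state

rs_from_int = check_random_state(1)        # int seed -> new RandomState
rs_same = check_random_state(rs_from_int)  # RandomState -> returned unchanged
rs_global = check_random_state(None)       # None -> NumPy's global RandomState

print(type(rs_from_int).__name__, rs_same is rs_from_int)

# Two generators built from the same seed produce the same stream
same_stream = np.allclose(check_random_state(1).rand(3),
                          np.random.RandomState(1).rand(3))
print(same_stream)
```

Passing an int therefore gives reproducible draws, which is why the example seeds `rnd` once and reuses it.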

rnd = check_random_state(1)
# set up dataset
n_samples = 100
n_features = 300
# l1 data (only 5 informative features)
X_1, y_1 = datasets.make_classification(n_samples=n_samples,
                                        n_features=n_features, n_informative=5,
                                        random_state=1)

print(y_1)

[1 1 1 0 0 0 1 0 1 1 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1
 1 0 1 1 0 1 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 1 1 0 1 0 0 1 0 1 1 0 0 0 0 1 0
 0 1 0 1 0 0 0 0 0 0 1 0 1 0 1 1 0 0 1 1 0 1 1 1 0 0]

# l2 data: non-sparse, but fewer features
y_2 = np.sign(.5 - rnd.rand(n_samples))
X_2 = rnd.randn(n_samples, n_features // 5) + y_2[:, np.newaxis]
X_2 += 5 * rnd.randn(n_samples, n_features // 5)

print(y_2)
# check that reshape(100, -1) and [:, np.newaxis] give the same column vector
print(np.array_equal(y_2.reshape(100, -1), y_2[:, np.newaxis]))
print(X_2.shape)
print(X_2[:5,:5])

[ 1. -1. -1.  1.  1. -1.  1.  1. -1.  1. -1.  1.  1. -1.  1.  1. -1.  1.
  1. -1.  1. -1. -1. -1. -1.  1. -1.  1.  1. -1.  1. -1.  1. -1.  1. -1.
 -1.  1.  1.  1.  1. -1.  1.  1. -1. -1.  1. -1.  1. -1.  1.  1.  1.  1.
 -1.  1. -1.  1.  1. -1.  1. -1. -1.  1. -1.  1.  1.  1.  1. -1.  1.  1.
  1.  1.  1.  1.  1. -1. -1.  1. -1.  1.  1.  1. -1.  1. -1.  1.  1. -1.
 -1.  1.  1. -1. -1. -1.  1.  1. -1.  1.]
True
(100, 60)
[[ 6.01232332  1.14532968 -8.5996665   8.04569999 -3.9950747 ]
 [-8.23881306 -8.82447483 -5.90777693  9.59317826 -4.68055683]
 [-2.74578891  6.7955197   5.93798556  9.08262617 -4.40408565]
 [-2.60992335 -0.78653401 -1.48344612  7.13938015  6.75872993]
 [-1.54692851  0.92916065  3.52381451  1.37473455  2.65960139]]

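Build a list to loop over later. Each entry is a tuple of four elements:

First element: the LinearSVC model
Second element: the range of C values to search during cross validation
Third and fourth elements: the training data and its labels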

clf_sets = [(LinearSVC(penalty='l1', loss='squared_hinge', dual=False,
                       tol=1e-3),
             np.logspace(-2.3, -1.3, 10), X_1, y_1),
            (LinearSVC(penalty='l2', loss='squared_hinge', dual=True,
                       tol=1e-4),
             np.logspace(-4.5, -2, 10), X_2, y_2)]

colors = ['navy', 'cyan', 'darkorange']

lw = 2

for clf, cs, X, y in clf_sets:
    # set up the plot for each classifier
    fig, axes = plt.subplots(nrows=2, sharey=True, figsize=(9, 10))

    for k, train_size in enumerate(np.linspace(0.3, 0.7, 3)[::-1]):
        param_grid = dict(C=cs)
        # To get nice curve, we need a large number of iterations to
        # reduce the variance
        grid = GridSearchCV(clf, refit=False, param_grid=param_grid,
                            cv=ShuffleSplit(train_size=train_size,
                                            test_size=.3,
                                            n_splits=250, random_state=1))
        grid.fit(X, y)
        scores = grid.cv_results_['mean_test_score']

        scales = [(1, 'No scaling'),
                  ((n_samples * train_size), '1/n_samples'),
                  ]

        for ax, (scaler, name) in zip(axes, scales):
            ax.set_xlabel('C')
            ax.set_ylabel('CV Score')
            grid_cs = cs * float(scaler)  # scale the C's
            ax.semilogx(grid_cs, scores, label="fraction %.2f" %
                        train_size, color=colors[k], lw=lw)
            ax.set_title('scaling=%s, penalty=%s, loss=%s' %
                         (name, clf.penalty, clf.loss))

    plt.legend(loc="best")
plt.show()

print(grid_cs)

[0.00094868 0.00179845 0.00340939 0.0064633  0.01225272 0.02322791
 0.04403398 0.08347678 0.15824991 0.3       ]

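I do not fully understand what this is doing. L1 seems to have an advantage on sparse data, and L2 on non-sparse data. As for the purpose of scaling C, I am not sure what it is either.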

Official SVM tutorial: Regularization parameter for SVCs
