A Guide to Tuning sklearn Logistic Regression (LR)


Official sklearn tuning documentation for logistic regression:

https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

sklearn.linear_model.LogisticRegression

class sklearn.linear_model.LogisticRegression(penalty='l2', dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='warn', max_iter=100, multi_class='warn', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

Logistic Regression (aka logit, MaxEnt) classifier.

In the multiclass case, the training algorithm uses the one-vs-rest (OvR) scheme if the ‘multi_class’ option is set to ‘ovr’, and uses the cross-entropy loss if the ‘multi_class’ option is set to ‘multinomial’. (Currently the ‘multinomial’ option is supported only by the ‘lbfgs’, ‘sag’, ‘saga’ and ‘newton-cg’ solvers.)

This class implements regularized logistic regression using the ‘liblinear’ library, ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ solvers. Note that regularization is applied by default. It can handle both dense and sparse input. Use C-ordered arrays or CSR matrices containing 64-bit floats for optimal performance; any other input format will be converted (and copied).

The ‘newton-cg’, ‘sag’, and ‘lbfgs’ solvers support only L2 regularization with primal formulation, or no regularization. The ‘liblinear’ solver supports both L1 and L2 regularization, with a dual formulation only for the L2 penalty. The Elastic-Net regularization is only supported by the ‘saga’ solver.

Read more in the User Guide.

Parameters:
penalty  str, ‘l1’, ‘l2’, ‘elasticnet’ or ‘none’, optional (default=’l2’)

Used to specify the norm used in the penalization. The ‘newton-cg’, ‘sag’ and ‘lbfgs’ solvers support only l2 penalties. ‘elasticnet’ is only supported by the ‘saga’ solver. If ‘none’ (not supported by the liblinear solver), no regularization is applied.

New in version 0.19: l1 penalty with SAGA solver (allowing ‘multinomial’ + L1)

dual  bool, optional (default=False)

Dual or primal formulation. Dual formulation is only implemented for l2 penalty with liblinear solver. Prefer dual=False when n_samples > n_features.

tol  float, optional (default=1e-4)

Tolerance for stopping criteria.

C  float, optional (default=1.0)

Inverse of regularization strength; must be a positive float. Like in support vector machines, smaller values specify stronger regularization.
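As a quick illustration of that inverse relationship, here is a minimal sketch on synthetic data (make_classification and the specific C values are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=200, n_features=10, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = LogisticRegression(C=C, solver='lbfgs').fit(X, y)
    # Smaller C -> stronger regularization -> smaller coefficient norm.
    print(C, np.linalg.norm(clf.coef_))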

fit_intercept  bool, optional (default=True)

Specifies if a constant (a.k.a. bias or intercept) should be added to the decision function.

intercept_scaling  float, optional (default=1)

Useful only when the solver ‘liblinear’ is used and self.fit_intercept is set to True. In this case, x becomes [x, self.intercept_scaling], i.e. a “synthetic” feature with constant value equal to intercept_scaling is appended to the instance vector. The intercept becomes intercept_scaling * synthetic_feature_weight.

Note! the synthetic feature weight is subject to l1/l2 regularization as all other features. To lessen the effect of regularization on synthetic feature weight (and therefore on the intercept) intercept_scaling has to be increased.

class_weight  dict or ‘balanced’, optional (default=None)

Weights associated with classes in the form {class_label: weight}. If not given, all classes are supposed to have weight one.

The “balanced” mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as n_samples / (n_classes * np.bincount(y)).

Note that these weights will be multiplied with sample_weight (passed through the fit method) if sample_weight is specified.

New in version 0.17: class_weight=’balanced’

random_state  int, RandomState instance or None, optional (default=None)

The seed of the pseudo random number generator to use when shuffling the data. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random. Used when solver == ‘sag’ or ‘liblinear’.

solver  str, {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’, ‘saga’}, optional (default=’liblinear’).

Algorithm to use in the optimization problem.

  • For small datasets, ‘liblinear’ is a good choice, whereas ‘sag’ and ‘saga’ are faster for large ones.
  • For multiclass problems, only ‘newton-cg’, ‘sag’, ‘saga’ and ‘lbfgs’ handle multinomial loss; ‘liblinear’ is limited to one-versus-rest schemes.
  • ‘newton-cg’, ‘lbfgs’, ‘sag’ and ‘saga’ handle L2 or no penalty
  • ‘liblinear’ and ‘saga’ also handle L1 penalty
  • ‘saga’ also supports ‘elasticnet’ penalty
  • ‘liblinear’ does not handle no penalty

Note that ‘sag’ and ‘saga’ fast convergence is only guaranteed on features with approximately the same scale. You can preprocess the data with a scaler from sklearn.preprocessing.
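For example, a minimal sketch of such preprocessing, assuming a pipeline with StandardScaler (one of several scalers in sklearn.preprocessing) and synthetic data:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = make_classification(n_samples=10_000, random_state=0)

# Standardized features help 'sag'/'saga' converge quickly.
model = make_pipeline(StandardScaler(), LogisticRegression(solver='saga'))
model.fit(X, y)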

New in version 0.17: Stochastic Average Gradient descent solver.

New in version 0.19: SAGA solver.

Changed in version 0.20: Default will change from ‘liblinear’ to ‘lbfgs’ in 0.22.

max_iter  int, optional (default=100)

Maximum number of iterations taken for the solvers to converge.

multi_class  str, {‘ovr’, ‘multinomial’, ‘auto’}, optional (default=’ovr’)

If the option chosen is ‘ovr’, then a binary problem is fit for each label. For ‘multinomial’ the loss minimised is the multinomial loss fit across the entire probability distribution, even when the data is binary. ‘multinomial’ is unavailable when solver=’liblinear’. ‘auto’ selects ‘ovr’ if the data is binary, or if solver=’liblinear’, and otherwise selects ‘multinomial’.

New in version 0.18: Stochastic Average Gradient descent solver for ‘multinomial’ case.

Changed in version 0.20: Default will change from ‘ovr’ to ‘auto’ in 0.22.

verbose  int, optional (default=0)

For the liblinear and lbfgs solvers set verbose to any positive number for verbosity.

warm_start  bool, optional (default=False)

When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution. Useless for liblinear solver. See the Glossary.

New in version 0.17: warm_start to support lbfgs, newton-cg, sag, saga solvers.
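A minimal sketch of what warm_start does with a supported solver (the data and split sizes are illustrative assumptions):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1000, random_state=0)

clf = LogisticRegression(solver='lbfgs', warm_start=True, max_iter=50)
clf.fit(X[:500], y[:500])  # first fit starts from scratch
clf.fit(X, y)              # this fit starts from the previous coefficients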

n_jobs  int or None, optional (default=None)

Number of CPU cores used when parallelizing over classes if multi_class=’ovr’. This parameter is ignored when the solver is set to ‘liblinear’ regardless of whether ‘multi_class’ is specified or not. None means 1 unless in a joblib.parallel_backend context. -1 means using all processors. See Glossary for more details.

l1_ratio  float or None, optional (default=None)

The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'. Setting l1_ratio=0 is equivalent to using penalty='l2', while setting l1_ratio=1 is equivalent to using penalty='l1'. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2.
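A minimal elastic-net configuration sketch (assumes a scikit-learn version that supports l1_ratio, i.e., 0.21 or later; the data and l1_ratio value are illustrative):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, random_state=0)

# Penalty = l1_ratio * L1 + (1 - l1_ratio) * L2; only saga supports it.
clf = LogisticRegression(penalty='elasticnet', solver='saga',
                         l1_ratio=0.5, max_iter=5000).fit(X, y)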

Attributes:
classes_  array, shape (n_classes, )

A list of class labels known to the classifier.

coef_  array, shape (1, n_features) or (n_classes, n_features)

Coefficient of the features in the decision function.

coef_ is of shape (1, n_features) when the given problem is binary. In particular, when multi_class='multinomial', coef_ corresponds to outcome 1 (True) and -coef_ corresponds to outcome 0 (False).

intercept_  array, shape (1,) or (n_classes,)

Intercept (a.k.a. bias) added to the decision function.

If fit_intercept is set to False, the intercept is set to zero. intercept_ is of shape (1,) when the given problem is binary. In particular, when multi_class='multinomial', intercept_ corresponds to outcome 1 (True) and -intercept_ corresponds to outcome 0 (False).

n_iter_  array, shape (n_classes,) or (1, )

Actual number of iterations for all classes. If binary or multinomial, it returns only 1 element. For liblinear solver, only the maximum number of iteration across all classes is given.

Changed in version 0.20: In SciPy <= 1.0.0 the number of lbfgs iterations may exceed max_iter. n_iter_ will now report at most max_iter.

 

See also

SGDClassifier
incrementally trained logistic regression (when given the parameter  loss="log").
LogisticRegressionCV
Logistic regression with built-in cross validation
Examples

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0, solver='lbfgs',
...                          multi_class='multinomial').fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :])
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
0.97...

 

1. Overview

In scikit-learn, three classes relate to logistic regression: LogisticRegression, LogisticRegressionCV, and logistic_regression_path. The main difference between LogisticRegression and LogisticRegressionCV is that LogisticRegressionCV uses cross-validation to select the regularization strength C, whereas LogisticRegression requires you to specify C yourself each time. Apart from cross-validation and the selection of C, the two classes are used in essentially the same way.

The logistic_regression_path class is special: after fitting data it cannot be used directly for prediction; it only selects suitable coefficients and regularization strengths for the fitted data. It is mainly useful for model selection. Since it is rarely needed in practice, it will not be discussed further.

In addition, scikit-learn contains an easily misunderstood class, RandomizedLogisticRegression. Although its name mentions logistic regression, it mainly uses L1-regularized logistic regression for feature selection; it belongs to the family of dimensionality-reduction algorithms rather than the classification algorithms discussed here.

The discussion below centers on choosing the important parameters of LogisticRegression and LogisticRegressionCV; these parameters have the same meaning in both classes.

2. Regularization selection parameter: penalty

LogisticRegression and LogisticRegressionCV apply a regularization term by default. The values available for the penalty parameter are "l1" and "l2", corresponding to L1 and L2 regularization; the default is L2.

When tuning, if the main goal is simply to curb overfitting, choosing L2 regularization is usually enough. But if the model still overfits with L2 regularization, i.e., predictive performance remains poor, you can consider L1 regularization. L1 regularization is also useful when the model has very many features and you want the coefficients of unimportant features to shrink to zero, producing a sparse model.

The choice of penalty constrains the choice of the loss-function optimization algorithm, i.e., the solver parameter. With L2 regularization, all four candidate algorithms {‘newton-cg’, ‘lbfgs’, ‘liblinear’, ‘sag’} are available. With L1 regularization, only ‘liblinear’ can be chosen. The reason is that the L1-regularized loss function is not continuously differentiable, while ‘newton-cg’, ‘lbfgs’, and ‘sag’ all require first or second continuous derivatives of the loss; ‘liblinear’ has no such requirement.
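A minimal sketch of this penalty/solver pairing on synthetic data (the dataset is an illustrative assumption):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

# L2 penalty: any of the four solvers works.
lr_l2 = LogisticRegression(penalty='l2', solver='lbfgs').fit(X, y)

# L1 penalty: among the four solvers discussed here, only liblinear.
lr_l1 = LogisticRegression(penalty='l1', solver='liblinear').fit(X, y)

# L1 drives some coefficients exactly to zero, sparsifying the model.
print((lr_l1.coef_ == 0).sum(), (lr_l2.coef_ == 0).sum())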

How these four algorithms differ and what effects the choice has are covered in the next section.

3. Optimization algorithm selection parameter: solver

The solver parameter determines how the logistic regression loss function is optimized. Four algorithms are available:

a) liblinear: implemented with the open-source liblinear library; uses coordinate descent internally to iteratively optimize the loss function.

b) lbfgs: a quasi-Newton method that uses the matrix of second derivatives of the loss function, i.e., the Hessian, to iteratively optimize the loss.

c) newton-cg: another member of the Newton family; it also uses the Hessian of the loss function for iterative optimization.

d) sag: stochastic average gradient descent, a variant of gradient descent. Unlike plain gradient descent, each iteration uses only a subset of the samples to compute the gradient, which suits datasets with many samples. SAG is a linearly convergent algorithm, far faster than SGD. For background on SAG, see the blog post on the linearly convergent stochastic optimization algorithms SAG and SVRG.

 

As the descriptions above show, newton-cg, lbfgs, and sag all require first or second continuous derivatives of the loss function, so they cannot be used with the non-differentiable L1 penalty; they are limited to L2 regularization. liblinear, by contrast, handles both L1 and L2.

Also, since sag uses only part of the samples for each gradient step, avoid it when the sample size is small; with very large samples, say more than 100,000, sag is the first choice. But sag cannot be used with L1 regularization, so if you have a huge sample that also needs L1 you face a trade-off: either subsample to reduce the sample size, or fall back to L2 regularization.
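A minimal sketch of the large-sample case (the synthetic dataset is an illustrative assumption; note the scaling step, which sag needs for fast convergence, as noted in the solver documentation above):

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# A large synthetic sample where sag pays off (size is illustrative).
X, y = make_classification(n_samples=200_000, n_features=30, random_state=0)

clf = make_pipeline(StandardScaler(),
                    LogisticRegression(solver='sag', max_iter=200))
clf.fit(X, y)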

The official sklearn documentation gives the following guidance on choosing the solver:

 

In a nutshell, one may choose the solver with the following rules:

Case                                 Solver
Small dataset or L1 penalty          “liblinear”
Multinomial loss or large dataset    “lbfgs”, “sag” or “newton-cg”
Very large dataset                   “sag”

 

From the above you might conclude that, given all the restrictions on newton-cg, lbfgs, and sag, you could simply pick liblinear whenever the sample is not large. Wrong, because liblinear has its own weakness! Logistic regression comes in binary and multinomial forms. For multinomial logistic regression the common schemes are one-vs-rest (OvR) and many-vs-many (MvM), and MvM is generally somewhat more accurate than OvR. The annoyance is that liblinear supports only OvR, not MvM, so when you need a relatively accurate multinomial logistic regression you cannot choose liblinear. That in turn means a relatively accurate multinomial logistic regression cannot use L1 regularization.

In summary: liblinear supports L1 and L2 but only OvR for multiclass problems; “lbfgs”, “sag”, and “newton-cg” support only L2 but handle both OvR and MvM for multiclass problems.

The differences between OvR and MvM are covered in the next section.

4. Multiclass strategy selection parameter: multi_class

The multi_class parameter determines the multiclass strategy; the choices are ovr and multinomial, and the default is ovr.

ovr is the one-vs-rest (OvR) scheme mentioned above, and multinomial the many-vs-many (MvM) scheme. For binary logistic regression, ovr and multinomial make no difference at all; the distinction matters only in the multiclass case.

The idea behind OvR is simple: however many classes there are, the problem can be reduced to binary logistic regression. Concretely, for the decision about class K, take all class-K samples as positives and all remaining samples as negatives, then fit a binary logistic regression to obtain the classifier for class K. The classifiers for the remaining classes are obtained analogously.

MvM is more involved; here the special case one-vs-one (OvO) serves as illustration. If the model has T classes, we repeatedly pick two classes out of the T, say T1 and T2, gather all samples labeled T1 or T2, treat T1 as positive and T2 as negative, and fit a binary logistic regression to obtain the model parameters. In total we need T(T-1)/2 binary classifiers, as the sketch below confirms.
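The T(T-1)/2 count can be checked with sklearn's generic OneVsOneClassifier wrapper (the wrapper is my illustrative choice here; the text above only describes the scheme):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsOneClassifier

X, y = load_iris(return_X_y=True)  # T = 3 classes

ovo = OneVsOneClassifier(LogisticRegression(solver='liblinear')).fit(X, y)
print(len(ovo.estimators_))  # T(T-1)/2 = 3*2/2 = 3 pairwise classifiers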

From the descriptions above, OvR is simpler but its classification performance is usually slightly worse (this holds for most sample distributions; under certain distributions OvR may do better). MvM is more accurate, but not as fast as OvR.

If you choose ovr, all four loss-function optimizers, liblinear, newton-cg, lbfgs, and sag, are available. If you choose multinomial, only newton-cg, lbfgs, and sag can be used.
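A minimal sketch of the two settings side by side on the iris data (illustrative; exact scores depend on the data and sklearn version):

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# OvR: one binary model per class; works with any solver, even liblinear.
ovr = LogisticRegression(multi_class='ovr', solver='liblinear').fit(X, y)

# Multinomial loss over all classes; needs newton-cg, lbfgs or sag.
mn = LogisticRegression(multi_class='multinomial', solver='lbfgs',
                        max_iter=200).fit(X, y)

print(ovr.score(X, y), mn.score(X, y))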

5. Class weight parameter: class_weight

The class_weight parameter specifies the weights of the various classes in the model. It can be omitted, meaning no weighting, i.e., all classes get the same weight. If supplied, you can either pass balanced and let the library compute the class weights itself, or specify each class's weight yourself. For example, for a binary 0/1 model you could set class_weight={0:0.9, 1:0.1}, giving class 0 a weight of 90% and class 1 a weight of 10%.

If class_weight is set to balanced, the library computes the weights from the training sample counts: the more samples a class has, the lower its weight; the fewer samples, the higher its weight.

According to the official sklearn documentation, with class_weight set to balanced the class weights are computed as follows:

n_samples / (n_classes * np.bincount(y)), where n_samples is the number of samples, n_classes the number of classes, and np.bincount(y) the count of samples in each class. For example, with y = [1, 0, 0, 1, 1], np.bincount(y) = [2, 3].
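That arithmetic can be checked directly (a sketch; compute_class_weight is the helper scikit-learn uses internally):

import numpy as np
from sklearn.utils.class_weight import compute_class_weight

y = np.array([1, 0, 0, 1, 1])

# By hand: n_samples / (n_classes * np.bincount(y))
# = 5 / (2 * [2, 3]) = [1.25, 0.8333...]
print(len(y) / (2 * np.bincount(y)))

# The library helper gives the same result.
print(compute_class_weight('balanced', classes=np.array([0, 1]), y=y))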

So what is class_weight for? In classification we often face two kinds of problems:

First, misclassification can be very costly. For instance, when classifying legitimate versus illegitimate users, classifying an illegitimate user as legitimate is very costly; we would rather classify some legitimate users as illegitimate, since those can be screened again manually, than let illegitimate users pass as legitimate. In this case we can appropriately raise the weight of the illegitimate-user class.

Second, the sample can be highly imbalanced. Suppose we have 10,000 binary samples of legitimate and illegitimate users, of which 9,995 are legitimate and only 5 illegitimate. If we ignore weights, we could simply predict every test sample as legitimate, giving a theoretical accuracy of 99.95% that is nonetheless useless. Here we can choose balanced and let the library automatically raise the weight of the illegitimate-user samples.

Raising a class's weight means that, compared with the unweighted case, more samples will be classified into the high-weight class, which can mitigate both problems above.

Of course, for the second problem, imbalanced samples, we can also use the sample weight parameter sample_weight, covered in the next section, instead of class_weight.

6. Sample weight parameter: sample_weight

The previous section mentioned the problem of sample imbalance. Because an imbalanced sample is not an unbiased estimate of the population, it may degrade the model's predictive power. In this situation we can try adjusting the sample weights. There are two ways: use balanced for class_weight, or adjust each sample's weight yourself via the sample_weight argument when calling fit.

If both methods are used when fitting a logistic regression in scikit-learn, the effective weight of each sample is class_weight * sample_weight.
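A minimal sketch of passing per-sample weights to fit (the synthetic imbalanced dataset and the weight of 10 are illustrative assumptions):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Roughly 95% of samples in class 0, 5% in class 1.
X, y = make_classification(n_samples=1000, weights=[0.95], random_state=0)

# Upweight every minority-class sample by hand.
w = np.where(y == 1, 10.0, 1.0)

clf = LogisticRegression(solver='lbfgs')
clf.fit(X, y, sample_weight=w)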

This concludes the summary of tuning scikit-learn's logistic regression classes. A few remaining parameters, such as the regularization strength C (Cs for the cross-validated class) and the maximum iteration count max_iter, behave no differently from those of other algorithm libraries, so they are not elaborated here.
