Handling Imbalanced Samples

Date: 2019-12-11

1. Undersampling

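When the classes are imbalanced, undersampling shrinks the majority class to the size of the minority class: rows are randomly dropped from the larger class until both classes have the same number of samples. The code looks like this: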

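# Binary classification example; assumes `data` is a pandas DataFrame with a 'label' column
import numpy as np

# Split the row indices by class (this code assumes class 1 is the majority)
one_data = data[data['label'] == 1].index
zero_data = data[data['label'] == 0].index
# Randomly sample the majority class down to the size of the minority class
# (replace=False means sampling without replacement, so no row is picked twice)
one_sample = np.random.choice(one_data, len(zero_data), replace=False)

# Combine the minority indices with the sampled majority indices
under_sample_index = np.concatenate([zero_data, np.array(one_sample)])
under_sample_data = data.loc[under_sample_index]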
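
The same idea can be wrapped in a small function that does not need to know in advance which class is the majority. This is only a minimal sketch: the function name `undersample`, the `seed` parameter, and the toy DataFrame are made up for illustration and are not part of the original post.

```python
import numpy as np
import pandas as pd


def undersample(data, label_col="label", seed=0):
    """Randomly drop rows so every class keeps as many rows as the rarest class."""
    rng = np.random.default_rng(seed)
    counts = data[label_col].value_counts()
    n_min = int(counts.min())  # size of the smallest class
    kept = []
    for cls in counts.index:
        idx = data.index[data[label_col] == cls].to_numpy()
        # replace=False: sample without replacement, so no row is picked twice
        kept.append(rng.choice(idx, size=n_min, replace=False))
    return data.loc[np.concatenate(kept)]


# Toy data (hypothetical): 8 rows of class 1 and 2 rows of class 0
toy = pd.DataFrame({"x": range(10), "label": [1] * 8 + [0] * 2})
balanced = undersample(toy)
print(balanced["label"].value_counts())
```

Undersampling throws data away, so it is usually only reasonable when the minority class still has enough rows to train on.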

2. Oversampling

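When the classes are imbalanced, oversampling grows the minority class to the size of the majority class by generating additional minority samples.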

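from imblearn.over_sampling import SMOTE
# If the package is missing, install it with: pip install imbalanced-learn
oversampler = SMOTE(random_state=0)
# fit_resample replaces the older fit_sample, which newer imbalanced-learn releases no longer provide
os_features, os_labels = oversampler.fit_resample(features_train, labels_train)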
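
To give a feel for how SMOTE "creates" samples, here is a rough NumPy-only sketch of its core step: pick a minority point, pick one of its nearest minority neighbours, and place a new point somewhere on the line between them. It is not imbalanced-learn's implementation; the function name `smote_like`, its parameters, and the toy points are assumptions chosen for illustration.

```python
import numpy as np


def smote_like(X_min, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = np.random.default_rng(seed)
    X_min = np.asarray(X_min, dtype=float)
    n = len(X_min)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # a point cannot have more neighbours than there are other points

    # Pairwise Euclidean distances inside the minority class
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a point is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]  # indices of the k nearest neighbours

    base = rng.integers(0, n, size=n_new)                   # random minority points
    nb = neighbours[base, rng.integers(0, k, size=n_new)]   # one random neighbour for each
    gap = rng.random((n_new, 1))                            # interpolation factor in [0, 1)
    return X_min[base] + gap * (X_min[nb] - X_min[base])


# Toy minority class (hypothetical): four points on the corners of the unit square
X_minority = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
synthetic = smote_like(X_minority, n_new=6, k=2)
print(synthetic.shape)  # (6, 2)
```

Because every synthetic point lies between two existing minority points, the new samples stay inside the region the minority class already occupies. Oversampling should be applied to the training split only, as the `features_train` and `labels_train` names in the snippet above suggest.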