最近在用python作数据挖掘,在聚类的时候遇到了一个很是恶心的问题。话很少说,直接上代码:python
1 from sklearn.cluster import KMeans 2 from sklearn.decomposition import PCA 3 import matplotlib.pyplot as plt 4 #kmeans算法 5 df1=df23 6 kmeans = KMeans(n_clusters=5, random_state=10).fit(df1) 7 #贴上每一个样本对应的簇类别标签 8 df1['level']=kmeans.labels_ 9 #df1.to_csv('new_df.csv') 10 11 df2=df1.groupby('level',as_index=False)['level'].agg({'num': np.size}) 12 print(df2.head()) 13 14 #将用于聚类的数据的特征的维度降至2维 15 pca = PCA(n_components=2) 16 new_pca = pd.DataFrame(pca.fit_transform(df1)) 17 print(new_pca.head()) 18 19 #可视化 20 d = new_pca[df1['level'] == 0] 21 plt.plot(d[0], d[1], 'gv') 22 d = new_pca[df1['level'] == 1] 23 plt.plot(d[0], d[1], 'ko') 24 d = new_pca[df1['level'] == 2] 25 plt.plot(d[0], d[1], 'b*') 26 d = new_pca[df1['level'] == 3] 27 plt.plot(d[0], d[1], 'y+') 28 d = new_pca[df1['level'] == 4] 29 plt.plot(d[0], d[1], 'c.') 30 31 plt.title('the result of polymerization') 32 plt.show()
错误以下:算法
网上找了很久都没找到解决方法,明明以前成功过的。因而我查看了df23数据,发现它是这样的:dom
与以前成功的dataframe的惟一差异就是索引!!!重要的事情说三遍!!!索引!!!索引!!!因而乎,我去找怎么重置索引的方法,见代码:spa
1 df24=df23[["forks_count","has_issues","has_wiki","open_issues_count","stargazers_count","watchers_count","created_pushed_time","created_updated_time"]] 2 df24=df24.reset_index() 3 df24=df24[["forks_count","has_issues","has_wiki","open_issues_count","stargazers_count","watchers_count","created_pushed_time","created_updated_time"]]
而后聚类就成功了。。。心累。。。。code