[Scikit-learn] 2.3 Clustering - Spectral clustering

时间 2019-11-18

标签 scikit learn 2.3 clustering spectral 繁體版

原文原文链接

From: 2.3.5 Clustering - Spectral clusteringhtml

From: 漫谈 Clustering (4): Spectral Clustering算法

From: 漫谈 Clustering (番外篇): Dimensionality Reductiondom

传统方式的弊端

事实上，Laplacian Eigenmap （拉普拉斯特征映射）假设数据分布在一个嵌套在高维空间中的低维流形上， Laplacian Matrix 则是流形的 Laplace Beltrami operator 的一个离散近似。关于流形以及 Laplacian Eigenmap 这个模型的理论知识就不在这里作更多地展开了，下面看一个比较直观的例子 Swiss Roll 。ide

Swiss Roll 是一个像面包圈同样的结构，能够看做一个嵌套在三维空间中的二维流形，以下图所示，左边是 Swiss Roll ，右边是从 Swiss Roll 中随机选出来的一些点，为了标明点在流形上的位置，给它们都标上了颜色。ui

【显然，这个东西无法用传统方式作聚类分析，直观的看上去，除了距离，还应该考虑结构因素】this

而下图是 Laplacian Eigenmap 和 PCA 分别对 Swiss Roll 降维的结果，能够看到 LE 成功地把这个流形展开把在流形上属于不一样位置的点映射到了不一样的地方，而 PCA 的结果则很糟糕，几种颜色都混杂在了一块儿。url

因此，Spectral Clustering便登场了！

能够看到，只有三个比较人性化：SpectralClustering, AgglomerativeClustering, DBSCANspa

SpectralClustering

根据数据构造一个 Graph ，Graph 的每个节点对应一个数据点，将类似的点链接起来，而且边的权重用于表示数据之间的类似度。把这个 Graph 用邻接矩阵的形式表示出来，记为。一个最偷懒的办法就是：直接用咱们前面在 K-medoids 中用的类似度矩阵做为。
把的每一列元素加起来获得个数，把它们放在对角线上（其余地方都是零），组成一个 $N\times N$ 的矩阵，记为。并令。
求出的前个特征值（在本文中，除非特殊说明，不然“前个”指按照特征值的大小从小到大的顺序） $\{\lambda\}_{i=1}^k$ 以及对应的特征向量 $\{\mathbf{v}\}_{i=1}^k$ 。
把这个特征（列）向量排列在一块儿组成一个 $N\times k$ 的矩阵，将其中每一行看做维空间中的一个向量，并使用 K-means 算法进行聚类。聚类的结果中每一行所属的类别就是原来 Graph 中的节点亦即最初的个数据点分别所属的类别。

其实，若是你熟悉 Dimensional Reduction （降维）的话，大概已经看出来了，Spectral Clustering 其实就是：.net

　　经过 Laplacian Eigenmap 的降维方式降维以后再作 K-means 的一个过程。3d

不过，为何要恰好降到维呢？其实整个模型还能够从另外一个角度导出来，因此，让咱们不妨先跑一下题。

在 Image Processing （我好像以前有据说我对这个领域深恶痛绝？）里有一个问题就是对图像进行 Segmentation （区域分割），也就是让类似的像素组成一个区域，好比，咱们通常但愿一张照片里面的人（前景）和背景被分割到不一样的区域中。在 Image Processing 领域里已经有许多自动或半自动的算法来解决这个问题，而且有很多方法和 Clustering 有密切联系。好比咱们在谈 Vector Quantization 的时候就曾经用 K-means 来把颜色类似的像素聚类到一块儿，不过那还不是真正的 Segmentation ，由于若是仅仅是考虑颜色类似的话，图片上位置离得很远的像素也有可能被聚到同一类中，咱们一般并不会把这样一些“游离”的像素构成的东西称为一个“区域”，但这个问题其实也很好解决：只要在聚类用的 feature 中加入位置信息（例如，原来是使用 R、G、B 三个值来表示一个像素，如今加入 x、y 两个新的值）便可。

Ref: http://blog.csdn.net/abcjennifer/article/details/8170687

将边权赋值为两点之间的similarity，作聚类的目标就是最小化类间connection的weight。

好比对于下面这幅图，分割以下

可是这样有可能会有问题，好比：

因为Graph cut criteria 只考虑了类间差小，而没考虑internal cluster density.因此会有上面分割的问题。这里引入Normalised-cut（Shi & Malik, 97'）。

Graph Cut

另外一方面，还有一个常常被研究的问题就是 Graph Cut ，简单地说就是把一个 Graph 的一些边切断，让他被打散成一些独立联通的 sub-Graph ，而这些被切断的边的权值的总和就被称为 Cut 值。

若是用一张图片中的全部像素来组成一个 Graph ，并把（好比，颜色和位置上）类似的节点链接起来，边上的权值表示类似程度，那么把图片分割为几个区域的问题实际上等价于把 Graph 分割为几个 sub-Graph 的问题，而且咱们能够要求分割所得的 Cut 值最小，亦即：那些被切断的边的权值之和最小，直观上咱们能够知道，权重比较大的边没有被切断，表示比较类似的点被保留在了同一个 sub-Graph 中，而彼此之间联系不大的点则被分割开来。咱们能够认为这样一种分割方式是比较好的。

实际上，抛开图像分割的问题不谈，在 Graph Cut 相关的一系列问题中，Minimum cut （最小割）自己就是一个被普遍研究的问题，而且有成熟的算法来求解。只是单纯的最小割在这里一般并非特别适用，不少时候只是简单地把和其余像素联系最弱的那一个像素给分割出去了，相反，咱们一般更但愿分割出来的区域（的大小）要相对均匀一些，而不是一些很大的区块和一些几乎是孤立的点。为此，又有许多替代的算法提出来，如 Ratio Cut 、Normalized Cut 等。

Laplacian Eigenmap

From: http://www.cnblogs.com/lmsj918/p/4075228.html

主要内容是Laplacian Eigenmap中的核心推导过程。

有空仍是多点向这位师兄请教，每次都会捡到很多金子。

Reference : 《Laplacian Eigenmaps for Dimensionality Reduction and Data Representation》,2003,MIT

From: Spectral clustering for image segmentation

我很是喜欢的一个示例代码

print(__doc__) # Authors: Emmanuelle Gouillart <emmanuelle.gouillart@normalesup.org> # Gael Varoquaux <gael.varoquaux@normalesup.org> # License: BSD 3 clause

import numpy as np import matplotlib.pyplot as plt from sklearn.feature_extraction import image from sklearn.cluster import spectral_clustering l = 100 x, y = np.indices((l, l)) center1 = (28, 24) center2 = (40, 50) center3 = (67, 58) center4 = (24, 70) radius1, radius2, radius3, radius4 = 16, 14, 15, 14 circle1 = (x - center1[0]) ** 2 + (y - center1[1]) ** 2 < radius1 ** 2 circle2 = (x - center2[0]) ** 2 + (y - center2[1]) ** 2 < radius2 ** 2 circle3 = (x - center3[0]) ** 2 + (y - center3[1]) ** 2 < radius3 ** 2 circle4 = (x - center4[0]) ** 2 + (y - center4[1]) ** 2 < radius4 ** 2

# ############################################################################# # 4 circles
img = circle1 + circle2 + circle3 + circle4 # We use a mask that limits to the foreground: the problem that we are # interested in here is not separating the objects from the background, # but separating them one from the other.
mask = img.astype(bool) img = img.astype(float) img += 1 + 0.2 * np.random.randn(*img.shape) # Convert the image into a graph with the value of the gradient on the # edges. 携带了梯度值，有什么用么？
graph = image.img_to_graph(img, mask=mask) # Take a decreasing function of the gradient: we take it weakly # dependent from the gradient the segmentation is close to a voronoi
graph.data = np.exp(-graph.data / graph.data.std()) # Force the solver to be arpack, since amg is numerically # unstable on this example
labels = spectral_clustering(graph, n_clusters=4, eigen_solver='arpack') label_im = -np.ones(mask.shape) label_im[mask] = labels plt.matshow(img) plt.matshow(label_im) # ############################################################################# # 2 circles
img = circle1 + circle2 mask = img.astype(bool) img = img.astype(float) img += 1 + 0.2 * np.random.randn(*img.shape) graph = image.img_to_graph(img, mask=mask) graph.data = np.exp(-graph.data / graph.data.std()) labels = spectral_clustering(graph, n_clusters=2, eigen_solver='arpack') label_im = -np.ones(mask.shape) label_im[mask] = labels plt.matshow(img) plt.matshow(label_im) plt.show()

Result:

算法理解：

# Convert the image into a graph with the value of the gradient on the # edges. 携带了梯度值，有什么用么？
graph = image.img_to_graph(img, mask=mask) # Take a decreasing function of the gradient: we take it weakly # dependent from the gradient the segmentation is close to a voronoi
graph.data = np.exp(-graph.data / graph.data.std()) # Force the solver to be arpack, since amg is numerically # unstable on this example
labels = spectral_clustering(graph, n_clusters=4, eigen_solver='arpack') label_im = -np.ones(mask.shape) label_im[mask] = labels

泰森多边形 voronoi

参考：Dimensionality Reduction: Why we take Eigenvectors of the Similarity Matrix?