In some cases a single Gaussian distribution cannot represent p(x|θ) (see the red model in figure 1), so in this chapter we want to estimate a mixture density of multivariate Gaussians.
Thus γj(xn) represents the “responsibility of component j for the mixture density given xn”. If we can estimate γj(xn), then we can obtain μj; and K-Means clustering is helpful here.
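For reference, the standard mixture-of-Gaussians definitions behind these quantities (the mixing weights πj are not written out in the text above; this is the usual notation, not a formula taken from it):

```latex
p(x \mid \theta) = \sum_{j=1}^{K} \pi_j \,\mathcal{N}(x \mid \mu_j, \Sigma_j),
\qquad
\gamma_j(x_n) = \frac{\pi_j \,\mathcal{N}(x_n \mid \mu_j, \Sigma_j)}
                     {\sum_{k=1}^{K} \pi_k \,\mathcal{N}(x_n \mid \mu_k, \Sigma_k)}
```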
3. K-Means Clustering
K-Means clustering aims to assign each data point to one of K clusters according to its distance to the mean of each cluster.
3.1 Steps
Step 1: Initialization: pick K arbitrary centroids (cluster means).
Step 2: Assign each sample to the closest centroid.
Step 3: Adjust the centroids to be the means of the samples assigned to them.
Step 4: Go to step 2 until step 3 produces no change.
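The four steps above can be sketched in NumPy (a minimal illustration; the first-K-samples initialization, the empty-cluster guard, and the function name are choices of this sketch, not prescribed by the text):

```python
import numpy as np

def kmeans(X, K, n_iters=100):
    """Minimal K-Means sketch. X: (N, D) data matrix. Returns the centroids
    and the cluster index assigned to each sample."""
    # Step 1: pick K arbitrary centroids (here simply the first K samples).
    centroids = X[:K].astype(float).copy()
    assign = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        # Step 2: assign each sample to the closest centroid.
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        assign = dists.argmin(axis=1)
        # Step 3: move each centroid to the mean of its assigned samples
        # (keep the old centroid if a cluster ends up empty).
        new = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                        else centroids[k] for k in range(K)])
        # Step 4: repeat until the centroids stop changing.
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, assign
```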
figure 2: the process of K-Means clustering (K = 2)
3.2 Objective function
K-Means optimizes the following objective function (the within-cluster sum of squared distances):

J = ∑n ∑k rnk ‖xn − μk‖²

where rnk is an indicator variable that equals 1 if μk is the nearest cluster center to point xn, and 0 otherwise.
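A compact way to evaluate J (a small sketch; because rnk selects only the nearest centroid, the inner sum over k reduces to the row-wise minimum squared distance):

```python
import numpy as np

def kmeans_objective(X, centroids):
    """J = sum_n sum_k r_nk * ||x_n - mu_k||^2 for hard assignments r_nk."""
    dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    # r_nk = 1 only for the nearest centroid, so the inner sum is the row minimum.
    return dists.min(axis=1).sum()
```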
3.3 Advantages and Disadvantages
Advantages:
simple and fast to compute
converges to a local minimum of the within-cluster squared error
Disadvantages:
sensitive to initialization
sensitive to outliers
difficult to set K properly
only detects spherical clusters
figure 3: the problem of K-Means clustering (K = 2)
4. EM Algorithm
Once we have used K-Means clustering to get the mean of each cluster, we have θj = (μj, Σj), and we can estimate the “responsibility” γj(xn) of component j for the mixture density.
4.1 K-Means Clustering Revisited
Step 1: Initialization: pick K arbitrary centroids [compute θj(0) = (μj(0), Σj(0))]
Step 2: Assign each sample to the closest centroid [compute γj(xn) ⇒ E-step]
Step 3: Adjust the centroids to be the means of the samples assigned to them [compute θj(τ) = (μj(τ), Σj(τ)) ⇒ M-step]
Step 4: Go to step 2 (until no change)
The process is almost the same as K-Means clustering, but in K-Means each point is hard-assigned to exactly one cluster; there is no notion of a soft responsibility γj(xn).
4.2 Estep & Mstep
E-step: softly assign samples to mixture components (compute the responsibilities γj(xn)).
M-step: re-estimate the parameters πj, μj, Σj from the current soft assignments.
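A minimal NumPy sketch of both steps for a mixture of Gaussians (the hand-rolled density and the function names are choices of this sketch, not the chapter's code):

```python
import numpy as np

def gaussian_pdf(X, mu, Sigma):
    """Multivariate normal density N(x | mu, Sigma) evaluated at each row of X."""
    D = len(mu)
    diff = X - mu
    inv = np.linalg.inv(Sigma)
    norm = np.sqrt((2 * np.pi) ** D * np.linalg.det(Sigma))
    # Quadratic form diff^T inv diff for every row at once.
    expo = -0.5 * np.einsum('nd,de,ne->n', diff, inv, diff)
    return np.exp(expo) / norm

def e_step(X, pis, mus, Sigmas):
    """E-step: gamma[n, j] = pi_j N(x_n|mu_j,Sigma_j) / sum_k pi_k N(x_n|mu_k,Sigma_k)."""
    weighted = np.stack([pi * gaussian_pdf(X, mu, S)
                         for pi, mu, S in zip(pis, mus, Sigmas)], axis=1)
    return weighted / weighted.sum(axis=1, keepdims=True)

def m_step(X, gamma):
    """M-step: re-estimate pi_j, mu_j, Sigma_j from the responsibilities."""
    N, K = gamma.shape
    Nk = gamma.sum(axis=0)                       # effective number of points per component
    pis = Nk / N
    mus = (gamma.T @ X) / Nk[:, None]
    Sigmas = []
    for j in range(K):
        diff = X - mus[j]
        Sigmas.append((gamma[:, j, None] * diff).T @ diff / Nk[j])
    return pis, mus, np.array(Sigmas)
```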
4.3 Advantages of MoG
Very general: can represent any (continuous) distribution.
Once trained, very fast to evaluate.
Can be updated online.
4.4 Caveats
Introduce regularization: instead of Σ⁻¹, use (Σ + σI)⁻¹ to avoid a collapsing (singular) Σ making p(xn|θj) go to infinity.
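A one-line sketch of that regularization (the value of σ and the function name are illustrative assumptions):

```python
import numpy as np

def regularized_inv(Sigma, sigma=1e-6):
    """Invert Sigma + sigma*I instead of Sigma, so a collapsed (singular)
    covariance does not blow up the Gaussian density."""
    return np.linalg.inv(Sigma + sigma * np.eye(Sigma.shape[0]))
```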
Initialize with K-Means to get better results. Typical steps:
Run K-Means M times (e.g. M = 10~100)
Pick the best result (lowest error J)
Use this result to initialize EM
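The restart recipe can be sketched as follows (a minimal illustration; the random-sample initialization, the empty-cluster guard, and the names are assumptions of this sketch):

```python
import numpy as np

def best_of_m_kmeans(X, K, M=10, n_iters=50, seed=0):
    """Run K-Means M times from random initializations and keep the run with
    the lowest objective J; the result can then be used to initialize EM."""
    rng = np.random.default_rng(seed)
    best_J, best = np.inf, None
    for _ in range(M):
        # Random initialization: K distinct samples as starting centroids.
        mu = X[rng.choice(len(X), size=K, replace=False)].astype(float)
        assign = np.zeros(len(X), dtype=int)
        for _ in range(n_iters):
            dists = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
            assign = dists.argmin(axis=1)
            mu = np.array([X[assign == k].mean(axis=0) if np.any(assign == k)
                           else mu[k] for k in range(K)])
        J = ((X - mu[assign]) ** 2).sum()   # within-cluster squared error
        if J < best_J:                      # keep the lowest-J run
            best_J, best = J, (mu, assign)
    return best
```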
EM for MoG is computationally expensive.
The number of mixture components K must be selected properly ⇒ model selection problem.