Stat Anal Data Min 2018; 11(4): 153-166
The next-generation K-means algorithm
Demidenko E
Stat Anal Data Min
2018[Aug]; 11(4): 153-166
PMID: 30073045
Model-based classification is typically understood to mean the mixture
distribution approach. In contrast, we revive the
hard-classification model-based approach developed by Banfield and Raftery (1993)
for which K-means is equivalent to the maximum likelihood (ML) estimation. The
next-generation K-means algorithm does not end after the classification is
achieved, but moves forward to answer the following fundamental questions: Are
there clusters, how many clusters are there, what are the statistical properties
of the estimated means and index sets, what is the distribution of the
coefficients in the clusterwise regression, and how to classify multilevel data?
The statistical model-based approach to the K-means algorithm is the key,
because it allows statistical simulations and the study of classification
properties along the lines of classical statistics. This paper
illustrates the application of ML classification to testing the no-clusters
hypothesis, to studying various methods for selecting the number of clusters
using simulations, to robust clustering using the Laplace distribution, to studying
the properties of the coefficients in clusterwise regression, and finally to
multilevel data by marrying the variance components model with K-means.
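
The equivalence between K-means and ML estimation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the textbook setting of spherical Gaussian clusters with one common variance, which may differ in detail from the paper's formulation. The function names (`kmeans`, `wss`, `profile_loglik`) and the toy data are hypothetical and chosen for illustration only. With the variance profiled out, the classification log-likelihood is a decreasing function of the within-cluster sum of squares, so the partition that minimizes the K-means criterion also maximizes the likelihood.

```python
import math
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's algorithm: alternate hard assignment and mean updates."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        new_labels = [
            min(range(k), key=lambda j: sq_dist(p, centers[j])) for p in points
        ]
        # Update step: each center becomes the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
    return centers, labels


def wss(points, centers, labels):
    """Within-cluster sum of squares, the K-means criterion."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))


def profile_loglik(points, centers, labels):
    """Gaussian classification log-likelihood with the common variance
    replaced by its ML estimate WSS / (n * d). Assumes WSS > 0."""
    n, d = len(points), len(points[0])
    sigma2 = wss(points, centers, labels) / (n * d)
    return -0.5 * n * d * (math.log(2 * math.pi * sigma2) + 1)


# Toy data (hypothetical): two well-separated groups of four points each.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11)]
centers, labels = kmeans(points, k=2)
print(labels, wss(points, centers, labels),
      profile_loglik(points, centers, labels))
```

Because `profile_loglik` depends on the partition only through `wss`, comparing two candidate partitions by either quantity gives the same ranking. Simulation studies such as those described in the abstract could reuse `kmeans` on data generated under a no-clusters model, though the specific test statistics are not given in this text.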