Title:
|
Interpretation and optimization of the $k$-means algorithm (English) |
Author:
|
Sabo, Kristian |
Author:
|
Scitovski, Rudolf |
Language:
|
English |
Journal:
|
Applications of Mathematics |
ISSN:
|
0862-7940 (print) |
ISSN:
|
1572-9109 (online) |
Volume:
|
59 |
Issue:
|
4 |
Year:
|
2014 |
Pages:
|
391-406 |
Summary lang:
|
English |
. |
Category:
|
math |
. |
Summary:
|
The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal {A}= \{a_i\in \mathbb {R}^n\colon i=1,\dots ,m\}$ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\leq k\leq m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further. (English) |
Keyword:
|
clustering |
Keyword:
|
data mining |
Keyword:
|
$k$-means |
Keyword:
|
Voronoi diagram |
MSC:
|
62H30 |
MSC:
|
68T10 |
MSC:
|
90C26 |
MSC:
|
91C20 |
idZBL:
|
Zbl 06362235 |
idMR:
|
MR3233551 |
DOI:
|
10.1007/s10492-014-0063-5 |
. |
Date available:
|
2014-07-14T09:01:30Z |
Last updated:
|
2020-07-02 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/143871 |
. |
Reference:
|
[1] Aurenhammer, F., Klein, R.: Voronoi Diagrams.Handbook of Computational Geometry J.-R. Sack et al. North-Holland Amsterdam (2000), 201-290. Zbl 0995.65024, MR 1746678 |
Reference:
|
[2] Bagirov, A. M.: Modified global $k$-means algorithm for minimum sum-of-squares clustering problems.Pattern Recognition 41 (2008), 3192-3199. Zbl 1147.68669, 10.1016/j.patcog.2008.04.004 |
Reference:
|
[3] Bagirov, A. M., Ugon, J., Webb, D.: Fast modified global $k$-means algorithm for incremental cluster construction.Pattern Recognition 44 (2011), 866-876. Zbl 1213.68514, 10.1016/j.patcog.2010.10.018 |
Reference:
|
[4] Dennis, J. E. jun, Schnabel, R. B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations.Classics in Applied Mathematics 16 SIAM, Philadelphia (1996). MR 1376139 |
Reference:
|
[5] Finkel, D. E.: DIRECT Optimization Algorithm User Guide, Center for Research in Scientific Computation.North Carolina State University (2003), http://www4.ncsu.edu/ definkel/research/index.html. |
Reference:
|
[6] Floudas, C. A., Gounaris, C. E.: A review of recent advances in global optimization.J. Glob. Optim. 45 (2009), 3-38. Zbl 1180.90245, MR 2533740, 10.1007/s10898-008-9332-8 |
Reference:
|
[7] Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications.ASA-SIAM Series on Statistics and Applied Probability 20 SIAM, Philadelphia (2007). Zbl 1185.68274, MR 2331172 |
Reference:
|
[8] Grbić, R., Nyarko, E. K., Scitovski, R.: A modification of the {DIRECT} method for Lipschitz global optimization for a symmetric function.J. Glob. Optim. 57 (2013), 1193-1212. Zbl 1279.65076, MR 3121789, 10.1007/s10898-012-0020-3 |
Reference:
|
[9] Iyigun, C.: Probabilistic Distance Clustering, Ph.D. thesis.Graduate School-New Brunswick Rutgers (2007). MR 2429670 |
Reference:
|
[10] Iyigun, C., Ben-Israel, A.: A generalized Weiszfeld method for the multi-facility location problem.Oper. Res. Lett. 38 (2010), 207-214. Zbl 1188.90148, MR 2608859, 10.1016/j.orl.2009.11.005 |
Reference:
|
[11] Jones, D. R., Perttunen, C. D., Stuckman, B. E.: Lipschitzian optimization without the Lipschitz constant.J. Optimization Theory Appl. 79 (1993), 157-181. Zbl 0796.49032, MR 1246501, 10.1007/BF00941892 |
Reference:
|
[12] Kogan, J.: Introduction to Clustering Large and High-Dimensional Data.Cambridge University Press Cambridge (2007). Zbl 1183.62106, MR 2297007 |
Reference:
|
[13] Kogan, J., Nicholas, C., Wiacek, M.: Hybrid clustering of large high dimensional data.Proceedings of the Workshop on Text Mining M. Castellanos et al. SIAM (2007). |
Reference:
|
[14] Kogan, J., Teboulle, M.: Scaling clustering algorithms with Bregman distances.Proceedings of the Workshop on Text Mining (M. W. Berry et al., eds.), 2006. |
Reference:
|
[15] Leisch, F.: A toolbox for $K$-centroids cluster analysis.Comput. Stat. Data Anal. 51 (2006), 526-544. Zbl 1157.62439, MR 2297469, 10.1016/j.csda.2005.10.006 |
Reference:
|
[16] Likas, A., Vlassis, N., Verbeek, J. J.: The global $k$-means clustering algorithm.Pattern Recognition 36 (2003), 451-461. 10.1016/S0031-3203(02)00060-2 |
Reference:
|
[17] Ng, M. K.: A note on constrained $k$-means algorithms.Pattern Recognition 33 (2000), 525-519. |
Reference:
|
[18] Pintér, J. D.: Global Optimization in Action. Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications.Nonconvex Optimization and Its Applications 6 Kluwer Academic Publishers, Dordrecht (1996). Zbl 0842.90110, MR 1374104, 10.1007/978-1-4757-2502-5_13 |
Reference:
|
[19] Sabo, K., Scitovski, R., Vazler, I.: One-dimensional center-based $l_1$-clustering method.Optim. Lett. 7 (2013), 5-22. Zbl 1283.90034, MR 3017090, 10.1007/s11590-011-0389-9 |
Reference:
|
[20] Sabo, K., Scitovski, R., Vazler, I., Zekić-Sušac, M.: Mathematical models of natural gas consumption.Energy Conversion and Management 52 (2011), 1721-1727. 10.1016/j.enconman.2010.10.037 |
Reference:
|
[21] Späth, H.: Cluster-Formation und Analyse. Theorie, FORTRAN-Programme und Beispiele.R. Oldenburg Verlag München (1983). Zbl 0536.62048 |
Reference:
|
[22] Su, Z., Kogan, J.: Second order conditions for $k$-means clustering: Partitions vs. centroids.Text Mining 2008 Workshop (held in conjuction with the 8th SIAM International Conference on Data Mining) Atlanta (2008). |
Reference:
|
[23] Teboulle, M.: A unified continuous optimization framework for center-based clustering methods.J. Mach. Learn. Res. 8 (2007), 65-102. Zbl 1222.68318, MR 2280215 |
Reference:
|
[24] Volkovich, V., Kogan, J., Nicholas, C.: Building initial partitions through sampling techniques.Eur. J. Oper. Res. 183 (2007), 1097-1105. Zbl 1135.90042, MR 2343742, 10.1016/j.ejor.2005.12.045 |
Reference:
|
[25] Wang, W., Zhang, Y.: On fuzzy cluster validity indices.Fuzzy Sets Syst. 158 (2007), 2095-2117. Zbl 1123.62046, MR 2405299 |
Reference:
|
[26] Yang, X. S.: Firefly algorithms for multimodal optimization.O. Watanabe et al. Stochastic Algorithms: Foundations and Applications 5th international symposium, SAGA 2009, Sapporo, Japan. Proceedings. Springer, Berlin. Lecture Notes in Computer Science {\it 5792} (2009), 169-178. Zbl 1260.90164, MR 2580252, 10.1007/978-3-642-04944-6_14 |
. |