Previous |  Up |  Next

Article

Title: Interpretation and optimization of the $k$-means algorithm (English)
Author: Sabo, Kristian
Author: Scitovski, Rudolf
Language: English
Journal: Applications of Mathematics
ISSN: 0862-7940 (print)
ISSN: 1572-9109 (online)
Volume: 59
Issue: 4
Year: 2014
Pages: 391-406
Summary lang: English
.
Category: math
.
Summary: The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal {A}= \{a_i\in \mathbb {R}^n\colon i=1,\dots ,m\}$ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\leq k\leq m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further. (English)
Keyword: clustering
Keyword: data mining
Keyword: $k$-means
Keyword: Voronoi diagram
MSC: 62H30
MSC: 68T10
MSC: 90C26
MSC: 91C20
idZBL: Zbl 06362235
idMR: MR3233551
DOI: 10.1007/s10492-014-0063-5
.
Date available: 2014-07-14T09:01:30Z
Last updated: 2020-07-02
Stable URL: http://hdl.handle.net/10338.dmlcz/143871
.
Reference: [1] Aurenhammer, F., Klein, R.: Voronoi Diagrams.Handbook of Computational Geometry J.-R. Sack et al. North-Holland Amsterdam (2000), 201-290. Zbl 0995.65024, MR 1746678
Reference: [2] Bagirov, A. M.: Modified global $k$-means algorithm for minimum sum-of-squares clustering problems.Pattern Recognition 41 (2008), 3192-3199. Zbl 1147.68669, 10.1016/j.patcog.2008.04.004
Reference: [3] Bagirov, A. M., Ugon, J., Webb, D.: Fast modified global $k$-means algorithm for incremental cluster construction.Pattern Recognition 44 (2011), 866-876. Zbl 1213.68514, 10.1016/j.patcog.2010.10.018
Reference: [4] Dennis, J. E. jun, Schnabel, R. B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations.Classics in Applied Mathematics 16 SIAM, Philadelphia (1996). MR 1376139
Reference: [5] Finkel, D. E.: DIRECT Optimization Algorithm User Guide, Center for Research in Scientific Computation.North Carolina State University (2003), http://www4.ncsu.edu/ definkel/research/index.html.
Reference: [6] Floudas, C. A., Gounaris, C. E.: A review of recent advances in global optimization.J. Glob. Optim. 45 (2009), 3-38. Zbl 1180.90245, MR 2533740, 10.1007/s10898-008-9332-8
Reference: [7] Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications.ASA-SIAM Series on Statistics and Applied Probability 20 SIAM, Philadelphia (2007). Zbl 1185.68274, MR 2331172
Reference: [8] Grbić, R., Nyarko, E. K., Scitovski, R.: A modification of the {DIRECT} method for Lipschitz global optimization for a symmetric function.J. Glob. Optim. 57 (2013), 1193-1212. Zbl 1279.65076, MR 3121789, 10.1007/s10898-012-0020-3
Reference: [9] Iyigun, C.: Probabilistic Distance Clustering, Ph.D. thesis.Graduate School-New Brunswick Rutgers (2007). MR 2429670
Reference: [10] Iyigun, C., Ben-Israel, A.: A generalized Weiszfeld method for the multi-facility location problem.Oper. Res. Lett. 38 (2010), 207-214. Zbl 1188.90148, MR 2608859, 10.1016/j.orl.2009.11.005
Reference: [11] Jones, D. R., Perttunen, C. D., Stuckman, B. E.: Lipschitzian optimization without the Lipschitz constant.J. Optimization Theory Appl. 79 (1993), 157-181. Zbl 0796.49032, MR 1246501, 10.1007/BF00941892
Reference: [12] Kogan, J.: Introduction to Clustering Large and High-Dimensional Data.Cambridge University Press Cambridge (2007). Zbl 1183.62106, MR 2297007
Reference: [13] Kogan, J., Nicholas, C., Wiacek, M.: Hybrid clustering of large high dimensional data.Proceedings of the Workshop on Text Mining M. Castellanos et al. SIAM (2007).
Reference: [14] Kogan, J., Teboulle, M.: Scaling clustering algorithms with Bregman distances.Proceedings of the Workshop on Text Mining (M. W. Berry et al., eds.), 2006.
Reference: [15] Leisch, F.: A toolbox for $K$-centroids cluster analysis.Comput. Stat. Data Anal. 51 (2006), 526-544. Zbl 1157.62439, MR 2297469, 10.1016/j.csda.2005.10.006
Reference: [16] Likas, A., Vlassis, N., Verbeek, J. J.: The global $k$-means clustering algorithm.Pattern Recognition 36 (2003), 451-461. 10.1016/S0031-3203(02)00060-2
Reference: [17] Ng, M. K.: A note on constrained $k$-means algorithms.Pattern Recognition 33 (2000), 525-519.
Reference: [18] Pintér, J. D.: Global Optimization in Action. Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications.Nonconvex Optimization and Its Applications 6 Kluwer Academic Publishers, Dordrecht (1996). Zbl 0842.90110, MR 1374104, 10.1007/978-1-4757-2502-5_13
Reference: [19] Sabo, K., Scitovski, R., Vazler, I.: One-dimensional center-based $l_1$-clustering method.Optim. Lett. 7 (2013), 5-22. Zbl 1283.90034, MR 3017090, 10.1007/s11590-011-0389-9
Reference: [20] Sabo, K., Scitovski, R., Vazler, I., Zekić-Sušac, M.: Mathematical models of natural gas consumption.Energy Conversion and Management 52 (2011), 1721-1727. 10.1016/j.enconman.2010.10.037
Reference: [21] Späth, H.: Cluster-Formation und Analyse. Theorie, FORTRAN-Programme und Beispiele.R. Oldenburg Verlag München (1983). Zbl 0536.62048
Reference: [22] Su, Z., Kogan, J.: Second order conditions for $k$-means clustering: Partitions vs. centroids.Text Mining 2008 Workshop (held in conjuction with the 8th SIAM International Conference on Data Mining) Atlanta (2008).
Reference: [23] Teboulle, M.: A unified continuous optimization framework for center-based clustering methods.J. Mach. Learn. Res. 8 (2007), 65-102. Zbl 1222.68318, MR 2280215
Reference: [24] Volkovich, V., Kogan, J., Nicholas, C.: Building initial partitions through sampling techniques.Eur. J. Oper. Res. 183 (2007), 1097-1105. Zbl 1135.90042, MR 2343742, 10.1016/j.ejor.2005.12.045
Reference: [25] Wang, W., Zhang, Y.: On fuzzy cluster validity indices.Fuzzy Sets Syst. 158 (2007), 2095-2117. Zbl 1123.62046, MR 2405299
Reference: [26] Yang, X. S.: Firefly algorithms for multimodal optimization.O. Watanabe et al. Stochastic Algorithms: Foundations and Applications 5th international symposium, SAGA 2009, Sapporo, Japan. Proceedings. Springer, Berlin. Lecture Notes in Computer Science {\it 5792} (2009), 169-178. Zbl 1260.90164, MR 2580252, 10.1007/978-3-642-04944-6_14
.

Files

Files Size Format View
AplMat_59-2014-4_3.pdf 319.7Kb application/pdf View/Open
Back to standard record
Partner of
EuDML logo