Previous |  Up |  Next

Article

Keywords:
clustering; data mining; $k$-means; Voronoi diagram
Summary:
The paper gives a new interpretation and a possible optimization of the well-known $k$-means algorithm for searching for a locally optimal partition of the set $\mathcal {A}= \{a_i\in \mathbb {R}^n\colon i=1,\dots ,m\}$ which consists of $k$ disjoint nonempty subsets $\pi _1,\dots ,\pi _k$, $1\leq k\leq m$. For this purpose, a new divided $k$-means algorithm was constructed as a limit case of the known smoothed $k$-means algorithm. It is shown that the algorithm constructed in this way coincides with the $k$-means algorithm if during the iterative procedure no data points appear in the Voronoi diagram. If in the partition obtained by applying the divided $k$-means algorithm there are data points lying in the Voronoi diagram, it is shown that the obtained result can be improved further.
References:
[1] Aurenhammer, F., Klein, R.: Voronoi Diagrams. Handbook of Computational Geometry J.-R. Sack et al. North-Holland Amsterdam (2000), 201-290. MR 1746678 | Zbl 0995.65024
[2] Bagirov, A. M.: Modified global $k$-means algorithm for minimum sum-of-squares clustering problems. Pattern Recognition 41 (2008), 3192-3199. DOI 10.1016/j.patcog.2008.04.004 | Zbl 1147.68669
[3] Bagirov, A. M., Ugon, J., Webb, D.: Fast modified global $k$-means algorithm for incremental cluster construction. Pattern Recognition 44 (2011), 866-876. DOI 10.1016/j.patcog.2010.10.018 | Zbl 1213.68514
[4] Dennis, J. E. jun, Schnabel, R. B.: Numerical Methods for Unconstrained Optimization and Nonlinear Equations. Classics in Applied Mathematics 16 SIAM, Philadelphia (1996). MR 1376139
[5] Finkel, D. E.: DIRECT Optimization Algorithm User Guide, Center for Research in Scientific Computation. North Carolina State University (2003), http://www4.ncsu.edu/ definkel/research/index.html.
[6] Floudas, C. A., Gounaris, C. E.: A review of recent advances in global optimization. J. Glob. Optim. 45 (2009), 3-38. DOI 10.1007/s10898-008-9332-8 | MR 2533740 | Zbl 1180.90245
[7] Gan, G., Ma, C., Wu, J.: Data Clustering: Theory, Algorithms, and Applications. ASA-SIAM Series on Statistics and Applied Probability 20 SIAM, Philadelphia (2007). MR 2331172 | Zbl 1185.68274
[8] Grbić, R., Nyarko, E. K., Scitovski, R.: A modification of the {DIRECT} method for Lipschitz global optimization for a symmetric function. J. Glob. Optim. 57 (2013), 1193-1212. DOI 10.1007/s10898-012-0020-3 | MR 3121789 | Zbl 1279.65076
[9] Iyigun, C.: Probabilistic Distance Clustering, Ph.D. thesis. Graduate School-New Brunswick Rutgers (2007). MR 2429670
[10] Iyigun, C., Ben-Israel, A.: A generalized Weiszfeld method for the multi-facility location problem. Oper. Res. Lett. 38 (2010), 207-214. DOI 10.1016/j.orl.2009.11.005 | MR 2608859 | Zbl 1188.90148
[11] Jones, D. R., Perttunen, C. D., Stuckman, B. E.: Lipschitzian optimization without the Lipschitz constant. J. Optimization Theory Appl. 79 (1993), 157-181. DOI 10.1007/BF00941892 | MR 1246501 | Zbl 0796.49032
[12] Kogan, J.: Introduction to Clustering Large and High-Dimensional Data. Cambridge University Press Cambridge (2007). MR 2297007 | Zbl 1183.62106
[13] Kogan, J., Nicholas, C., Wiacek, M.: Hybrid clustering of large high dimensional data. Proceedings of the Workshop on Text Mining M. Castellanos et al. SIAM (2007).
[14] Kogan, J., Teboulle, M.: Scaling clustering algorithms with Bregman distances. Proceedings of the Workshop on Text Mining (M. W. Berry et al., eds.), 2006.
[15] Leisch, F.: A toolbox for $K$-centroids cluster analysis. Comput. Stat. Data Anal. 51 (2006), 526-544. DOI 10.1016/j.csda.2005.10.006 | MR 2297469 | Zbl 1157.62439
[16] Likas, A., Vlassis, N., Verbeek, J. J.: The global $k$-means clustering algorithm. Pattern Recognition 36 (2003), 451-461. DOI 10.1016/S0031-3203(02)00060-2
[17] Ng, M. K.: A note on constrained $k$-means algorithms. Pattern Recognition 33 (2000), 525-519.
[18] Pintér, J. D.: Global Optimization in Action. Continuous and Lipschitz Optimization: Algorithms, Implementations and Applications. Nonconvex Optimization and Its Applications 6 Kluwer Academic Publishers, Dordrecht (1996). DOI 10.1007/978-1-4757-2502-5_13 | MR 1374104 | Zbl 0842.90110
[19] Sabo, K., Scitovski, R., Vazler, I.: One-dimensional center-based $l_1$-clustering method. Optim. Lett. 7 (2013), 5-22. DOI 10.1007/s11590-011-0389-9 | MR 3017090 | Zbl 1283.90034
[20] Sabo, K., Scitovski, R., Vazler, I., Zekić-Sušac, M.: Mathematical models of natural gas consumption. Energy Conversion and Management 52 (2011), 1721-1727. DOI 10.1016/j.enconman.2010.10.037
[21] Späth, H.: Cluster-Formation und Analyse. Theorie, FORTRAN-Programme und Beispiele. R. Oldenburg Verlag München (1983). Zbl 0536.62048
[22] Su, Z., Kogan, J.: Second order conditions for $k$-means clustering: Partitions vs. centroids. Text Mining 2008 Workshop (held in conjuction with the 8th SIAM International Conference on Data Mining) Atlanta (2008).
[23] Teboulle, M.: A unified continuous optimization framework for center-based clustering methods. J. Mach. Learn. Res. 8 (2007), 65-102. MR 2280215 | Zbl 1222.68318
[24] Volkovich, V., Kogan, J., Nicholas, C.: Building initial partitions through sampling techniques. Eur. J. Oper. Res. 183 (2007), 1097-1105. DOI 10.1016/j.ejor.2005.12.045 | MR 2343742 | Zbl 1135.90042
[25] Wang, W., Zhang, Y.: On fuzzy cluster validity indices. Fuzzy Sets Syst. 158 (2007), 2095-2117. MR 2405299 | Zbl 1123.62046
[26] Yang, X. S.: Firefly algorithms for multimodal optimization. O. Watanabe et al. Stochastic Algorithms: Foundations and Applications 5th international symposium, SAGA 2009, Sapporo, Japan. Proceedings. Springer, Berlin. Lecture Notes in Computer Science {\it 5792} (2009), 169-178. DOI 10.1007/978-3-642-04944-6_14 | MR 2580252 | Zbl 1260.90164
Partner of
EuDML logo