
Article

Fulltext not available (moving wall 24 months)
Keywords:
supervised learning; trained model; perturbations; effect of rounding; low-precision arithmetic
Summary:
Post-training rounding of estimated parameters, also known as quantization, is a widely adopted technique for reducing the energy consumption and latency of machine learning models. This theoretical work examines the effect of rounding estimated parameters in key regression methods of statistics and machine learning. The proposed approach models the perturbation of the parameters as an additive error with values in a specified interval. The method is illustrated on linear regression and then extended, with a consistent approach, to radial basis function networks, multilayer perceptrons, regularization networks, and logistic regression.
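As a concrete illustration of the bounded additive perturbation caused by rounding, below is a minimal sketch for the linear regression case. It is an assumption-laden example rather than the paper's derivation: the rounding step delta, the simulated data, and all variable names are hypothetical, and rounding each coefficient to the nearest multiple of delta perturbs it by an additive error confined to the interval [-delta/2, delta/2].

```python
import numpy as np

# Hypothetical illustration (not the paper's exact construction): fit ordinary
# least squares, round the estimated coefficients to a grid of step `delta`,
# and measure the effect of the resulting additive perturbation.

rng = np.random.default_rng(0)
n, p = 200, 5
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])  # design matrix with intercept
beta_true = rng.normal(size=p + 1)
y = X @ beta_true + rng.normal(scale=0.5, size=n)

# Ordinary least squares estimate
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Post-training rounding: snap each estimated parameter to the nearest multiple of delta
delta = 0.05                                  # hypothetical rounding step (quantization grid)
beta_rounded = np.round(beta_hat / delta) * delta
perturbation = beta_rounded - beta_hat        # additive error, each entry in [-delta/2, delta/2]

# Effect on fitted values: the shift in prediction i is x_i^T @ perturbation,
# hence bounded in magnitude by (delta/2) * sum_j |x_ij|.
pred_shift = X @ perturbation
bound = (delta / 2) * np.abs(X).sum(axis=1)

print("max |coefficient perturbation|:", np.abs(perturbation).max())
print("max |prediction shift|        :", np.abs(pred_shift).max())
print("max worst-case bound          :", bound.max())
assert np.all(np.abs(pred_shift) <= bound + 1e-12)
```

Under this setup the shift in the i-th fitted value equals the inner product of the i-th row of the design matrix with the perturbation vector, so it is bounded by (delta/2) times the l1-norm of that row; the final assertion checks this numerically.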
References:
[1] Agresti, A.: Foundations of Linear and Generalized Linear Models. Wiley Series in Probability and Statistics. John Wiley & Sons, Hoboken (2015). MR 3308143 | Zbl 1309.62001
[2] Blokdyk, G.: Artificial Neural Network: A Complete Guide. 5STARCooks, Toronto (2021).
[3] Carroll, R. J., Ruppert, D., Stefanski, L. A., Crainiceanu, C. M.: Measurement Error in Nonlinear Models: A Modern Perspective. Monographs on Statistics and Applied Probability 105. Chapman & Hall/CRC, Boca Raton (2006). DOI 10.1201/9781420010138 | MR 2243417 | Zbl 1119.62063
[4] Croci, M., Fasi, M., Higham, N. J., Mary, T., Mikaitis, M.: Stochastic rounding: Implementation, error analysis and applications. R. Soc. Open Sci. 9 (2022), Article ID 211631, 25 pages. DOI 10.1098/rsos.211631
[5] Egrioglu, E., Bas, E., Karahasan, O.: Winsorized dendritic neuron model artificial neural network and a robust training algorithm with Tukey's biweight loss function based on particle swarm optimization. Granul. Comput. 8 (2023), 491-501. DOI 10.1007/s41066-022-00345-y
[6] Fasi, M., Higham, N. J., Mikaitis, M., Pranesh, S.: Numerical behavior of NVIDIA tensor cores. PeerJ Computer Sci. 7 (2021), Article ID e330, 19 pages. DOI 10.7717/peerj-cs.330
[7] Gao, F., Li, B., Chen, L., Shang, Z., Wei, X., He, C.: A softmax classifier for high-precision classification of ultrasonic similar signals. Ultrasonics 112 (2021), Article ID 106344, 8 pages. DOI 10.1016/j.ultras.2020.106344
[8] Greene, W. H.: Econometric Analysis. Pearson Education, Harlow (2018).
[9] Hastie, T., Tibshirani, R., Wainwright, M.: Statistical Learning with Sparsity: The Lasso and Generalizations. Monographs on Statistics and Applied Probability 143. CRC Press, Boca Raton (2015). DOI 10.1201/b18401 | MR 3616141 | Zbl 1319.68003
[10] Hoefler, T., Alistarh, D., Ben-Nun, T., Dryden, N., Peste, A.: Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks. J. Mach. Learn. Res. 22 (2021), Article ID 241, 124 pages. MR 4329820 | Zbl 07626756
[11] Kalina, J., Tichavský, J.: On robust estimation of error variance in (highly) robust regression. Measurement Sci. Rev. 20 (2020), 6-14. DOI 10.2478/msr-2020-0002
[12] Kalina, J., Vidnerová, P., Soukup, L.: Modern approaches to statistical estimation of measurements in the location model and regression. Handbook of Metrology and Applications. Springer, Singapore (2023), 2355-2376. DOI 10.1007/978-981-99-2074-7_125
[13] Louizos, C., Reisser, M., Blankevoort, T., Gavves, E., Welling, M.: Relaxed quantization for discretized neural networks. Available at https://arxiv.org/abs/1810.01875 (2018), 14 pages. DOI 10.48550/arXiv.1810.01875
[14] Maddox, W. J., Potapczynski, A., Wilson, A. G.: Low-precision arithmetic for fast Gaussian processes. Proc. Mach. Learn. Res. 180 (2022), 1306-1316.
[15] Nagel, M., Fournarakis, M., Amjad, R. A., Bondarenko, Y., Baalen, M. van, Blankevoort, T.: A white paper on neural network quantization. Available at https://arxiv.org/abs/2106.08295 (2021), 27 pages. DOI 10.48550/arXiv.2106.08295
[16] Park, J.-H., Kim, K.-M., Lee, S.: Quantized sparse training: A unified trainable framework for joint pruning and quantization in DNNs. ACM Trans. Embedded Comput. Syst. 21 (2022), Article ID 60, 22 pages. DOI 10.1145/3524066
[17] Pillonetto, G.: System identification using kernel-based regularization: New insights on stability and consistency issues. Automatica 93 (2018), 321-332. DOI 10.1016/j.automatica.2018.03.065 | MR 3810919 | Zbl 1400.93316
[18] Riazoshams, H., Midi, H., Ghilagaber, G.: Robust Nonlinear Regression with Applications Using R. John Wiley & Sons, Hoboken (2019). DOI 10.1002/9781119010463 | MR 3839600 | Zbl 1407.62022
[19] Saleh, A. K. M. E., Picek, J., Kalina, J.: R-estimation of the parameters of a multiple regression model with measurement errors. Metrika 75 (2012), 311-328. DOI 10.1007/s00184-010-0328-2 | MR 2909549 | Zbl 1239.62081
[20] Seghouane, A.-K., Shokouhi, N.: Adaptive learning for robust radial basis function networks. IEEE Trans. Cybernetics 51 (2021), 2847-2856. DOI 10.1109/TCYB.2019.2951811
[21] Shultz, K. S., Whitney, D., Zickar, M. J.: Measurement Theory in Action: Case Studies and Exercises. Routledge, New York (2020). DOI 10.4324/9781315869834
[22] Šíma, J., Vidnerová, P., Mrázek, V.: Energy complexity model for convolutional neural networks. Artificial Neural Networks and Machine Learning -- ICANN 2023. Lecture Notes in Computer Science 14263. Springer, Cham (2023), 186-198. DOI 10.1007/978-3-031-44204-9_16 | MR 4776700
[23] Smucler, E., Yohai, V. J.: Robust and sparse estimators for linear regression models. Comput. Stat. Data Anal. 111 (2017), 116-130. DOI 10.1016/j.csda.2017.02.002 | MR 3630222 | Zbl 1464.62164
[24] Sze, V., Chen, Y.-H., Yang, T.-J., Emer, J. S.: Efficient processing of deep neural networks: A tutorial and survey. Proc. IEEE 105 (2017), 2295-2329. DOI 10.1109/JPROC.2017.2761740 | MR 3784727
[25] Víšek, J.Á.: Consistency of the least weighted squares under heteroscedasticity. Kybernetika 47 (2011), 179-206. MR 2828572 | Zbl 1220.62064
[26] Wang, N., Choi, J., Brand, D., Chen, C.-Y., Gopalakrishnan, K.: Training deep neural networks with 8-bit floating point numbers. NIPS'18: Proceedings of the 32nd International Conference on Neural Information Processing Systems. Curran Associates, New York (2018), 7686-7695. DOI 10.5555/3327757.3327866
[27] Yan, W. Q.: Computational Methods for Deep Learning: Theory, Algorithms, and Implementations. Texts in Computer Science. Springer, Singapore (2023). DOI 10.1007/978-981-99-4823-9 | MR 4660076 | Zbl 7783714
[28] Yu, J., Anitescu, M.: Multidimensional sum-up rounding for integer programming in optimal experimental design. Math. Program. 185 (2021), 37-76. DOI 10.1007/s10107-019-01421-z | MR 4201708 | Zbl 1458.62158
[29] Zhang, R., Wilson, A. G., De Sa, C.: Low-precision stochastic gradient Langevin dynamics. Proc. Mach. Learn. Res. 162 (2022), 26624-26644.