Article

Title: Seasonal time-series imputation of gap missing algorithm (STIGMA) (English)
Author: Rangel-Heras, Eduardo
Author: Zuniga, Pavel
Author: Alanis, Alma Y.
Author: Hernandez-Vargas, Esteban A.
Author: Sanchez, Oscar D.
Language: English
Journal: Kybernetika
ISSN: 0023-5954 (print)
ISSN: 1805-949X (online)
Volume: 59
Issue: 6
Year: 2023
Pages: 861-879
Summary lang: English
.
Category: math
.
Summary: This work presents a new approach to imputing missing data in weather time-series from a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm takes advantage of the seasonal pattern to impute unknown data by averaging available data. We test the algorithm using data measured every $10$ minutes over a period of $365$ days during the year 2010; the variables are global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix with $52{,}560$ rows (data points over time) and $4$ columns (weather variables). A distinguishing feature of this work is that the algorithm is well suited to imputing values when the missing data occur in contiguous runs and in seasonal patterns. The algorithm uses a date-time index to collect the available data needed to impute each missing value, and repeats the process until all missing values have been calculated. The tests are performed by removing $5\%$, $10\%$, $15\%$, $20\%$, $25\%$, and $30\%$ of the available data, and the results are compared with those of autoregressive models. The proposed algorithm has been successfully tested with a maximum of $2{,}736$ contiguous missing values, which correspond to $19$ consecutive days of a single month; this gap is one portion of the missing values in the case where the time-series lacks $30\%$ of all data. Performance is measured with the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$). The results indicate that the proposed algorithm outperforms autoregressive models while preserving the seasonal behavior of the time-series. STIGMA is also tested on non-weather time-series (beer sales and monthly air-passenger counts), which also have a cyclical pattern, and the results show that the data are imputed precisely. (English)
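The summary describes the idea only in outline, so the following is a minimal sketch of seasonal-average imputation under stated assumptions, not the authors' implementation. It assumes a fixed seasonal period counted in samples, fills each gap with the mean of the known values one period before and after it, and lets values filled in one pass serve as donors in the next. The names `seasonal_impute`, `period`, `neighbours`, and `max_passes` are hypothetical. RMSE and $R^{2}$ are written out in their usual textbook form.

```python
import math


def seasonal_impute(values, period, neighbours=1, max_passes=100):
    """Fill None entries by averaging values at the same seasonal phase.

    Hypothetical sketch: for a gap at index i, average the known entries at
    i - k*period and i + k*period for k = 1..neighbours. Values filled in
    one pass become donors for the next pass.
    """
    filled = list(values)
    for _ in range(max_passes):
        missing = [i for i, v in enumerate(filled) if v is None]
        if not missing:
            break
        updates = {}
        for i in missing:
            donors = []
            for k in range(1, neighbours + 1):
                for j in (i - k * period, i + k * period):
                    if 0 <= j < len(filled) and filled[j] is not None:
                        donors.append(filled[j])
            if donors:
                updates[i] = sum(donors) / len(donors)
        if not updates:
            break  # no seasonal neighbour is known; remaining gaps stay None
        for i, v in updates.items():
            filled[i] = v
    return filled


def rmse(truth, estimate):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(
        sum((t - e) ** 2 for t, e in zip(truth, estimate)) / len(truth)
    )


def r_squared(truth, estimate):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(truth) / len(truth)
    ss_res = sum((t - e) ** 2 for t, e in zip(truth, estimate))
    ss_tot = sum((t - mean) ** 2 for t in truth)
    return 1.0 - ss_res / ss_tot


# Toy data (made up for illustration): six cycles of a period-4 pattern,
# with a contiguous gap that spans three full cycles.
pattern = [0.0, 5.0, 9.0, 2.0]
truth = pattern * 6
gappy = list(truth)
for i in range(4, 16):
    gappy[i] = None

result = seasonal_impute(gappy, period=4)
print(rmse(truth, result), r_squared(truth, result))
```

In this toy case the first pass fills the two outer cycles of the gap from their known neighbours, and the middle cycle is filled only on the second pass, which is why the loop repeats. For the dataset in the summary, a daily period at 10-minute sampling would be $144$ samples, which is consistent with $365 \times 144 = 52{,}560$ rows and $19 \times 144 = 2{,}736$ contiguous missing values.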
Keyword: contiguous missing values
Keyword: seasonal patterns
Keyword: time-series
MSC: 62-04
MSC: 68Pxx
idZBL: Zbl 07830568
DOI: 10.14736/kyb-2023-6-0861
.
Date available: 2024-02-26T11:11:44Z
Last updated: 2024-08-02
Stable URL: http://hdl.handle.net/10338.dmlcz/152261
.
Reference: [1] Ahn, H., Sun, K., Kim, K. P.: Comparison of missing data imputation methods in time series forecasting. Computers Materials Continua 70 (2022), 767-779.
Reference: [2] Anava, O., Hazan, E., Zeevi, A.: International Conference on Machine Learning. Proc. Machine Learning Research, Lille 2015.
Reference: [3] Bashir, F., Wei, H. L.: Handling missing data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm. Neurocomputing 276 (2018), 23-30.
Reference: [4] Batista, G. E. A. P. A., Monard, M. C.: An analysis of four missing data treatment methods for supervised learning. Appl. Artific. Intell. 17 (2003), 519-533.
Reference: [5] Bras, L. P., Menezes, J. C.: Dealing with gene expression missing data. IEE Proceedings - Systems Biology 153 (2006), 105-119.
Reference: [6] Brown, S., Tauler, R., Walczak, B.: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis. (Second edition.) Elsevier, Amsterdam 2020.
Reference: [7] Choong, M. K., Charbit, M., Yan, H.: Autoregressive-model-based missing value estimation for DNA microarray time series data. IEEE Trans. Inform. Technol. Biomedicine 13 (2009), 131-137.
Reference: [8] Dan, E. L., Dinşoreanu, M., Mureşan, R. C.: 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). IEEE, London 2020.
Reference: [9] Dunsmuir, W., Robinson, P. M.: Estimation of time series models in the presence of missing data. J. Amer. Statist. Assoc. 76 (1981), 560-568.
Reference: [10] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 16 (2015), 1-12.
Reference: [11] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: PCA model building with missing data: New proposals and a comparative study. Chemometr. Intell. Labor. Systems 146 (2015), 77-88.
Reference: [12] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: Missing data imputation toolbox for MATLAB. Chemometr. Intell. Labor. Systems 154 (2016), 93-100.
Reference: [13] González-Martínez, J. M., Noord, O. E. de, Ferrer, A.: Multisynchro: a novel approach for batch synchronization in scenarios of multiple asynchronisms. J. Chemometr. 28 (2014), 462-475.
Reference: [14] Hui, D., Wan, S., Su, B., Katul, G., Monson, R., Luo, Y.: Gap-filling missing data in eddy covariance measurements using multiple imputation (MI) for annual estimations. Agricultur. Forest Meteorology 121 (2004), 93-111.
Reference: [15] Junger, W. L., Leon, A. Ponce de: Imputation of missing data in time series for air pollutants. Atmosph. Environment 102 (2015), 96-104.
Reference: [16] Liu, S., Molenaar, P. C. M.: iVAR: A program for imputing missing data in multivariate time series using vector autoregressive models. Behavior Res. Methods 46 (2014), 1138-1148.
Reference: [17] Magán-Carrión, R., Pulido-Pulido, F., Camacho, J., García-Teodoro, P.: Tampered data recovery in WSNs through dynamic PCA and variable routing strategies. J. Commun. 8 (2013), 738-750.
Reference: [18] Makridakis, S., Wheelwright, S. C., Hyndman, R. J.: Forecasting: Methods and Applications. (Third edition.) Wiley, India 2008.
Reference: [19] Montgomery, D. C.: Statistical Quality Control. (Sixth edition.) Wiley, New York 2005.
Reference: [20] Murad, H., Dankner, R., Berlin, A., Olmer, L., Freedman, L. S.: Imputing missing time-dependent covariate values for the discrete time Cox model. Statist. Methods Medical Res. 29 (2020), 2074-2086. MR 4128979
Reference: [21] Neves, D. T., Alves, J., Naik, M. G., Proenca, A. J., Prasser, F.: From missing data imputation to data generation. J. Comput. Sci. 61 (2022), 101640.
Reference: [22] Noor, N. M., Bakri-Abdullah, M. M. Al, Yahaya, A. Shukri, Ramli, N. A.: Comparison of Linear Interpolation Method and Mean Method to Replace the Missing Values in Environmental Data Set. Trans Tech Publications, Switzerland 2014.
Reference: [23] Pedreschi, R., Hertog, M. L. A. T. M., Carpentier, S. C., Lammertyn, J., Robben, J., Noben, J. P., Panis, B., Swennen, R., Nicola, B. M.: Treatment of missing values for multivariate statistical analysis of gel-based proteomics data. Proteomics 29 (2008), 1371-1383.
Reference: [24] Quevedo, J., Puig, V., Cembrano, G., Aguilar, J., Isaza, C., Saporta, D., Benito, G., Hedo, M., Molina, A.: Estimating missing and false data in flow meters of a water distribution network. IFAC Proc. Vol. 39 (2006), 1181-1186.
Reference: [25] Sun, Y., Li, J., Xu, Y., Zhang, T., Wang, X.: Deep learning versus conventional methods for missing data imputation: A review and comparative study. Expert Systems Appl. 227 (2023), 120-201. MR 4523179
Reference: [26] Zarzo, M., Martí, P.: Modeling the variability of solar radiation data among weather stations by means of principal components analysis. Appl. Energy 88 (2011), 2775-2784.
Reference: [27] Zhang, Z.: Missing data imputation: focusing on single imputation. AME Publ. 4 (2016), 1-8.
.

Files

File: Kybernetika_59-2023-6_4.pdf
Size: 16.81 Mb
Format: application/pdf