Title:
|
Seasonal time-series imputation of gap missing algorithm (STIGMA) (English) |
Author:
|
Rangel-Heras, Eduardo |
Author:
|
Zuniga, Pavel |
Author:
|
Alanis, Alma Y. |
Author:
|
Hernandez-Vargas, Esteban A. |
Author:
|
Sanchez, Oscar D. |
Language:
|
English |
Journal:
|
Kybernetika |
ISSN:
|
0023-5954 (print) |
ISSN:
|
1805-949X (online) |
Volume:
|
59 |
Issue:
|
6 |
Year:
|
2023 |
Pages:
|
861-879 |
Summary lang:
|
English |
Category:
|
math |
Summary:
|
This work presents a new approach for imputing missing data in weather time-series that follow a seasonal pattern: the seasonal time-series imputation of gap missing algorithm (STIGMA). The algorithm exploits the seasonal pattern to impute unknown data by averaging available data. We test the algorithm using data measured every $10$ minutes over a period of $365$ days during the year 2010; the variables are global irradiance, diffuse irradiance, ultraviolet irradiance, and temperature, arranged in a matrix of $52,560$ rows (data points over time) and $4$ columns (weather variables). The distinguishing feature of this work is that the algorithm is well suited to imputing values when the missing data occur contiguously and in seasonal patterns. The algorithm employs a date-time index to collect available data for the imputation of missing data, repeating the process until all missing values are calculated. The tests are performed by removing $5\%$, $10\%$, $15\%$, $20\%$, $25\%$, and $30\%$ of the available data, and the results are compared with those of autoregressive models. The proposed algorithm has been successfully tested with up to $2,736$ contiguous missing values, which correspond to $19$ consecutive days of a single month; this gap is a portion of all the missing values when the time-series lacks $30\%$ of all data. Performance is measured with the root-mean-square error (RMSE) and the coefficient of determination ($R^{2}$). The results indicate that the proposed algorithm outperforms autoregressive models while preserving the seasonal behavior of the time-series. STIGMA is also tested with non-weather time-series of beer sales and number of air passengers per month, which also have a cyclical pattern, and the results show precise imputation of the data. (English) |
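The summary describes the method only in outline, so the following is a minimal Python sketch of the general idea (averaging available values at the same seasonal position and repeating until the gap is filled), not the authors' STIGMA implementation. The function names, the use of integer positions in place of a date-time index, the choice of donors one period before and after, and the `max_passes` limit are all assumptions made for illustration. The constants only check the arithmetic stated in the summary.

```python
import math

# Arithmetic from the summary: 10-minute sampling gives 144 samples per day.
SAMPLES_PER_DAY = 24 * 60 // 10          # 144
ROWS_PER_YEAR = SAMPLES_PER_DAY * 365    # 52,560 rows
LONGEST_GAP = SAMPLES_PER_DAY * 19       # 2,736 contiguous values (19 days)


def seasonal_average_impute(values, period, max_passes=100):
    """Fill None entries with the mean of known values one period away.

    Hypothetical sketch: integer positions stand in for a date-time index,
    and donors are the samples exactly one period before and after.
    """
    filled = list(values)
    n = len(filled)
    for _ in range(max_passes):
        missing = [i for i, v in enumerate(filled) if v is None]
        if not missing:
            break
        updates = {}
        for i in missing:
            donors = [filled[j] for j in (i - period, i + period)
                      if 0 <= j < n and filled[j] is not None]
            if donors:
                updates[i] = sum(donors) / len(donors)
        if not updates:
            break  # no donor available anywhere; leave remaining gaps as None
        # Apply after the scan so each pass uses only values known beforehand.
        for i, v in updates.items():
            filled[i] = v
    return filled


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def r_squared(actual, predicted):
    """Coefficient of determination; assumes `actual` is not constant."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot
```

For a gap longer than one period, the first pass can only fill positions that have a known value one period away; later passes reuse those filled values, which is one plausible reading of "repeating the process until all missing values are calculated".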
Keyword:
|
contiguous missing values |
Keyword:
|
seasonal patterns |
Keyword:
|
time-series |
MSC:
|
62-04 |
MSC:
|
68Pxx |
idZBL:
|
Zbl 07830568 |
DOI:
|
10.14736/kyb-2023-6-0861 |
Date available:
|
2024-02-26T11:11:44Z |
Last updated:
|
2024-08-02 |
Stable URL:
|
http://hdl.handle.net/10338.dmlcz/152261 |
Reference:
|
[1] Ahn, H., Sun, K., Kim, K. P.: Comparison of missing data imputation methods in time series forecasting. Computers Materials Continua 70 (2022), 767-779. |
Reference:
|
[2] Anava, O., Hazan, E., Zeevi, A.: International Conference on Machine Learning. Proc. Machine Learning Research, Lille 2015. |
Reference:
|
[3] Bashir, F., Wei, H. L.: Handling missing data in multivariate time series using a vector autoregressive model-imputation (VAR-IM) algorithm. Neurocomputing 276 (2018), 23-30. |
Reference:
|
[4] Batista, G. E. A. P. A., Monard, M. C.: An analysis of four missing data treatment methods for supervised learning. Appl. Artific. Intell. 17 (2003), 519-533. |
Reference:
|
[5] Bras, L. P., Menezes, J. C.: Dealing with gene expression missing data. IEE Proceedings - Systems Biology 153 (2006), 105-119. |
Reference:
|
[6] Brown, S., Tauler, R., Walczak, B.: Comprehensive Chemometrics: Chemical and Biochemical Data Analysis. (Second edition.) Elsevier, Amsterdam 2020. |
Reference:
|
[7] Choong, M. K., Charbit, M., Yan, H.: Autoregressive-model-based missing value estimation for DNA microarray time series data. IEEE Trans. Inform. Technol. Biomedicine 13 (2009), 131-137. |
Reference:
|
[8] Dan, E. L., Dinşoreanu, M., Mureşan, R. C.: 2020 IEEE International Conference on Automation, Quality and Testing, Robotics (AQTR). IEEE, London 2020. |
Reference:
|
[9] Dunsmuir, W., Robinson, P. M.: Estimation of time series models in the presence of missing data. J. Amer. Statist. Assoc. 76 (1981), 560-568. |
Reference:
|
[10] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: Enabling network inference methods to handle missing data and outliers. BMC Bioinformatics 16 (2015), 1-12. |
Reference:
|
[11] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: PCA model building with missing data: New proposals and a comparative study. Chemometr. Intell. Labor. Systems 146 (2015), 77-88. |
Reference:
|
[12] Folch-Fortuny, A., Arteaga, F., Ferrer, A.: Missing data imputation toolbox for MATLAB. Chemometr. Intell. Labor. Systems 154 (2016), 93-100. |
Reference:
|
[13] González-Martínez, J. M., Noord, O. E. de, Ferrer, A.: Multisynchro: a novel approach for batch synchronization in scenarios of multiple asynchronisms. J. Chemometr. 28 (2014), 462-475. |
Reference:
|
[14] Hui, D., Wan, S., Su, B., Katul, G., Monson, R., Luo, Y.: Gap-filling missing data in eddy covariance measurements using multiple imputation (MI) for annual estimations. Agricultur. Forest Meteorology 121 (2004), 93-111. |
Reference:
|
[15] Junger, W. L., Leon, A. Ponce de: Imputation of missing data in time series for air pollutants. Atmosph. Environment 102 (2015), 96-104. |
Reference:
|
[16] Liu, S., Molenaar, P. C. M.: iVAR: A program for imputing missing data in multivariate time series using vector autoregressive models. Behavior Res. Methods 46 (2014), 1138-1148. |
Reference:
|
[17] Magán-Carrión, R., Pulido-Pulido, F., Camacho, J., García-Teodoro, P.: Tampered data recovery in WSNs through dynamic PCA and variable routing strategies. J. Commun. 8 (2013), 738-750. |
Reference:
|
[18] Makridakis, S., Wheelwright, S. C., Hyndman, R. J.: Forecasting: Methods and Applications. (Third edition.) Wiley, India 2008. |
Reference:
|
[19] Montgomery, D. C.: Statistical Quality Control. (Sixth edition.) Wiley, New York 2005. |
Reference:
|
[20] Murad, H., Dankner, R., Berlin, A., Olmer, L., Freedman, L. S.: Imputing missing time-dependent covariate values for the discrete time Cox model. Statist. Methods Medical Res. 29 (2020), 2074-2086. MR 4128979 |
Reference:
|
[21] Neves, D. T., Alves, J., Naik, M. G., Proenca, A. J., Prasser, F.: From missing data imputation to data generation. J. Comput. Sci. 61 (2022), 101640. |
Reference:
|
[22] Noor, N. M., Bakri-Abdullah, M. M. Al, Yahaya, A. Shukri, Ramli, N. A.: Comparison of Linear Interpolation Method and Mean Method to Replace the Missing Values in Environmental Data Set. Trans Tech Publications, Switzerland 2014. |
Reference:
|
[23] Pedreschi, R., Hertog, M. L. A. T. M., Carpentier, S. C., Lammertyn, J., Robben, J., Noben, J. P., Panis, B., Swennen, R., Nicolaï, B. M.: Treatment of missing values for multivariate statistical analysis of gel-based proteomics data. Proteomics 29 (2008), 1371-1383. |
Reference:
|
[24] Quevedo, J., Puig, V., Cembrano, G., Aguilar, J., Isaza, C., Saporta, D., Benito, G., Hedo, M., Molina, A.: Estimating missing and false data in flow meters of a water distribution network. IFAC Proc. Vol. 39 (2006), 1181-1186. |
Reference:
|
[25] Sun, Y., Li, J., Xu, Y., Zhang, T., Wang, X.: Deep learning versus conventional methods for missing data imputation: A review and comparative study. Expert Systems Appl. 227 (2023), 120-201. MR 4523179 |
Reference:
|
[26] Zarzo, M., Martí, P.: Modeling the variability of solar radiation data among weather stations by means of principal components analysis. Appl. Energy 88 (2011), 2775-2784. |
Reference:
|
[27] Zhang, Z.: Missing data imputation: focusing on single imputation. AME Publ. 4 (2016), 1-8. |