On replacement of outliers and missing values in time series
DOI:
https://doi.org/10.6092/issn.2281-4485/16184
Keywords:
Missing Value, Outlier, Autoregressive Moving Average, Facebook’s Prophet, Long Short-Term Memory, Mean and Median Imputation.
Abstract
Missing values and outliers in a time series hinder the analysis of the data. Several methods have been proposed for estimating values to replace them: the mean, the median, the largest order statistic, and forecasts from time series models. However, no recommendations have so far been made for choosing among these estimation methods. This paper compares the performance of six such methods. Among them, the time series models are fitted using the autoregressive moving average method, the long short-term memory method, and Facebook’s Prophet method. The models are validated using test data. A time series of the Air Quality Index is used for the comparative study.
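The three summary-statistic estimators named in the abstract can be illustrated with a minimal sketch. The abstract does not say how outliers are detected, so the interquartile-range rule used here, the function names `iqr_outlier_flags` and `replace_bad_points`, the parameter `k`, and the toy values are all assumptions made for illustration; they are not the paper's procedure or data. The "largest order statistic" is taken here to be the maximum of the retained observations, which is also an assumption.

```python
import math
import statistics


def iqr_outlier_flags(values, k=1.5):
    """Flag points outside [Q1 - k*IQR, Q3 + k*IQR] (assumed detection rule).

    Missing values (NaN) are ignored when computing quartiles and never flagged.
    """
    observed = [v for v in values if not math.isnan(v)]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    return [(not math.isnan(v)) and (v < low or v > high) for v in values]


def replace_bad_points(values, method="median", k=1.5):
    """Replace missing values and flagged outliers with one summary estimate."""
    flags = iqr_outlier_flags(values, k)
    # Estimate only from observations that are present and not flagged.
    clean = [v for v, bad in zip(values, flags) if not bad and not math.isnan(v)]
    estimators = {
        "mean": statistics.mean,
        "median": statistics.median,
        "largest": max,  # largest order statistic of the retained sample
    }
    estimate = estimators[method](clean)
    return [estimate if (bad or math.isnan(v)) else v
            for v, bad in zip(values, flags)]


# Toy series (made-up numbers, not Air Quality Index data from the paper):
# one missing value and one obvious spike.
series = [52.0, 55.0, float("nan"), 58.0, 400.0, 54.0, 57.0, 53.0]

for method in ("mean", "median", "largest"):
    print(method, replace_bad_points(series, method))
```

A model-based estimator (ARMA, LSTM, or Prophet) would slot into the same structure by replacing the single `estimate` with a forecast for each affected time point.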
License
Copyright (c) 2023 Loganathan Appaia, Sumithra Palraj
This work is licensed under a Creative Commons Attribution 4.0 International License.