On replacement of outliers and missing values in time series

Authors

  • Loganathan Appaia, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu
  • Sumithra Palraj, Department of Statistics, Manonmaniam Sundaranar University, Tirunelveli, Tamil Nadu

DOI:

https://doi.org/10.6092/issn.2281-4485/16184

Keywords:

Missing Value, Outlier, Autoregressive Moving Average, Facebook’s Prophet, Long Short-Term Memory, Mean and Median Imputation.

Abstract

The presence of missing values and outliers in a time series hinders analysis of the data. Several methods have been proposed for estimating values to replace missing observations and outliers: the mean, the median, the largest order statistic, and forecasts from fitted time series models. However, no recommendations have so far been made for choosing among these estimation methods. This paper compares the performance of six such methods. The model-based estimates come from time series models fitted using the autoregressive moving average method, the long short-term memory method and Facebook’s Prophet method. The models are validated using test data. A time series of the Air Quality Index is used for the comparative study.
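
As a rough illustration of the simpler estimators named in the abstract (mean, median and largest order statistic), the following Python sketch replaces missing values and flagged outliers with a single summary estimate. It is not the authors' procedure. The abstract does not say how outliers are detected, so the IQR (Tukey fence) rule, the choice to compute the estimates from the remaining clean observations, the function names and the sample values are all assumptions made for illustration only.

```python
import math
import statistics


def flag_outliers_iqr(values, k=1.5):
    """Mark points outside Tukey's fences (assumed detection rule)."""
    observed = [v for v in values if not math.isnan(v)]
    q1, _, q3 = statistics.quantiles(observed, n=4)
    low, high = q1 - k * (q3 - q1), q3 + k * (q3 - q1)
    # Missing values are never flagged as outliers; they are handled separately.
    return [(not math.isnan(v)) and (v < low or v > high) for v in values]


def replace_with_estimate(values, method="median"):
    """Replace NaNs and flagged outliers by one summary estimate."""
    flags = flag_outliers_iqr(values)
    clean = [v for v, f in zip(values, flags) if not f and not math.isnan(v)]
    estimators = {
        "mean": statistics.fmean,
        "median": statistics.median,
        "max": max,  # largest order statistic of the clean observations (assumption)
    }
    estimate = estimators[method](clean)
    return [estimate if (f or math.isnan(v)) else v for v, f in zip(values, flags)]


# Made-up values for illustration, not data from the paper.
series = [52.0, 55.0, float("nan"), 58.0, 400.0, 54.0, 57.0, 53.0]
filled = replace_with_estimate(series, method="median")
```

In this toy series the missing third value and the flagged fifth value both receive the same estimate. A model-based approach would instead substitute a forecast from a fitted model at each affected time point.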

Published

2023-04-03

How to Cite

Appaia, L., & Palraj, S. (2023). On replacement of outliers and missing values in time series. EQA - International Journal of Environmental Quality, 53(1), 1–10. https://doi.org/10.6092/issn.2281-4485/16184

Section

Articles