On the Issue of Incomplete and Missing Water-Quality Data in Mine Site Databases: Comparing Three Imputation Methods

Betrie, Getnet D.; Sadiq, Rehan; Tesfamariam, Solomon; Morin, Kevin A.

doi:10.1007/s10230-014-0322-4

ZUR PROBLEMATIK VON UNVOLLSTÄNDIGEN UND FEHLENDEN BESCHAFFENHEITSDATEN IN DATENBANKEN VON BERGBAUSTANDORTEN: VERGLEICH VON DREI BERECHNUNGSMETHODEN

Sobre el problema de datos de calidad de agua incompletos y perdidos en bases de datos de sitios de minas: Comparación de tres métodos de asignación

矿区不完整和缺损水质数据问题:三种插补方法对比

Technical Article
Published: 14 December 2014

Mine Water and the Environment, Volume 35, pages 3–9 (2016)

Getnet D. Betrie¹, Rehan Sadiq¹, Solomon Tesfamariam¹ & Kevin A. Morin²

Abstract

Large water-quality databases are valuable for predicting mine drainage chemistry, identifying optimal measures for mitigation and remediation, and refuting or refining models and theories. However, such databases often have missing values because of periodic gaps in sampling and analysis or because of input errors. These missing values cause problems in machine learning and statistical analysis of water-quality data from mine sites. Using water-quality data collected from 1971 to 1994 at many locations within a copper-molybdenum-gold-silver-rhenium mine site, we compared three imputation methods for estimating missing water-quality data: iterative robust model-based imputation (IRMI), multiple imputation of incomplete multivariate data (AMELIA), and sequential imputation for missing values (IMPSEQ). The methods were evaluated using mean absolute error, relative absolute error, and percent bias. The results showed that IMPSEQ and IRMI are suitable for imputing missing values in water-quality databases at mine sites, whereas AMELIA is not.
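The three evaluation measures named in the abstract can be illustrated with a short sketch. The abstract does not give the formulas, so the definitions below are common textbook forms and should be read as assumptions: relative absolute error is taken relative to always predicting the mean of the true values, and percent bias uses the sign convention in which a positive value indicates underestimation. The function names and the sample concentrations are hypothetical and are not taken from the study.

```python
def mean_absolute_error(true, imputed):
    """Average absolute difference between held-out and imputed values."""
    return sum(abs(i - t) for t, i in zip(true, imputed)) / len(true)


def relative_absolute_error(true, imputed):
    """Total absolute error relative to a predictor that always returns the mean of the true values."""
    baseline = sum(true) / len(true)
    numerator = sum(abs(i - t) for t, i in zip(true, imputed))
    denominator = sum(abs(t - baseline) for t in true)
    return numerator / denominator


def percent_bias(true, imputed):
    """100 * sum(true - imputed) / sum(true); positive means underestimation (assumed sign convention)."""
    return 100.0 * sum(t - i for t, i in zip(true, imputed)) / sum(true)


# Hypothetical values: concentrations that were deliberately hidden,
# and the estimates an imputation method produced for them.
true_values = [2.0, 4.0, 6.0, 8.0]
imputed_values = [2.5, 3.5, 6.5, 7.0]

mae = mean_absolute_error(true_values, imputed_values)
rae = relative_absolute_error(true_values, imputed_values)
pbias = percent_bias(true_values, imputed_values)
print(f"MAE={mae}  RAE={rae}  PBIAS={pbias}%")
```

Under these assumed definitions, lower MAE and RAE indicate closer agreement with the hidden values, and a percent bias near zero indicates little systematic over- or underestimation. Comparing methods this way requires hiding values whose true measurements are known, imputing them, and scoring each method on the same hidden set.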

Zusammenfassung

Große Wasserbeschaffenheitsdatenbanken sind wertvoll zur Abschätzung der Chemie von Bergbauwässern, zur Bestimmung optimaler Maßnahmen zur Minderung von Belastungen und Behandlung von Wasser sowie zur kritischen Auseinandersetzung bzw. Verfeinerung von Modellen und Theorien. Solche Datenbanken weisen jedoch oftmals Lücken auf. Gründe dafür können regelmäßig fehlende Kennwerte nach Probenplan sowie Analyse und Eingabefehler sein. Die fehlenden Werte führen bei der Automatisierung und statistischen Auswertung der Beschaffenheitsdaten zu Problemen. Wir haben für Beschaffenheitsdaten, welche zwischen 1971 und 1994 von verschiedenen Kupfer-, Molybdän-, Gold-, Silber- und Rhenium-Bergbaustandorten gewonnen wurden, drei Berechnungsmethoden verglichen, um die Wasserbeschaffenheit abzuschätzen: schrittweise robuste modellbasierte Berechnung (IRMI), multiple Berechnung von multivariaten Daten (AMELIA) und sequentielle Berechnung von fehlenden Werten (IMPSEQ). Diese Verfahren wurden unter Verwendung des mittleren absoluten Fehlers, des relativen absoluten Fehlers und der prozentualen Abweichung ausgewertet. Die Ergebnisse zeigten, dass IMPSEQ und IRMI geeignet sind, fehlende Werte in Beschaffenheitsdatenbanken zu berechnen. AMELIA erwies sich als ungeeignet.

Resumen

Las bases de datos de calidad de agua son valiosas para predecir la química del drenaje de minas, identificando medidas óptimas para mitigación y remediación y refutando/refinando modelos y teorías. Sin embargo, tales bases de datos frecuentemente tienen valores perdidos debido a muestreos sistemáticamente no realizados o a errores en el análisis y en la introducción de datos. Estos valores perdidos generan problemas en el aprendizaje de las máquinas y en el análisis estadístico de datos de calidad de agua desde sitios de minas. Usando datos de calidad de agua colectados entre 1971 y 1994 desde muchos puntos dentro de la zona de una mina de cobre-molibdeno-oro-plata y renio, comparamos tres métodos para estimar los datos perdidos de calidad de agua: modelo de asignación robusto iterativo (IRMI), asignaciones múltiples de datos multivariantes incompletos (AMELIA) y asignación secuencial para datos perdidos (IMPSEQ). Estos métodos fueron evaluados basándose en el error absoluto promedio, el error absoluto relativo y el sesgo porcentual. Los resultados mostraron que IMPSEQ e IRMI son adecuados para asignar valores perdidos en bases de datos de calidad de agua mientras que AMELIA no lo es.

摘要

大型水质数据库对于预测矿井排水的水化学特征、优化水环境治理措施、反演及改进预测模型等意义重大。然而,数据库常常由于未定期取样、化验和输入等原因造成水质数据缺损。数据缺损将导致矿井排水水质数据处理与统计结果出现问题。文章以某个铜-钼-金-银-铼矿1971-1994年多个观测点的水质数据为例,比较了三种缺损数据的插补方法:基于模型的迭代插补法(IRMI)、不完整多元数据的多重插补法(AMELIA)和缺失值顺序插补法(IMPSEQ)。三种方法的对比、评价标准是平均绝对误差、相对绝对误差和百分比偏差。研究结果表明,除AMELIA方法外,IMPSEQ 和 IRMI方法都适于插补缺失的矿区水质数据。

Author information

Authors and Affiliations

School of Engineering, The University of British Columbia, Kelowna, BC, Canada
Getnet D. Betrie, Rehan Sadiq & Solomon Tesfamariam
Mine Drainage Assessment Group, Surrey, BC, Canada
Kevin A. Morin

Corresponding author

Correspondence to Getnet D. Betrie.

About this article

Cite this article

Betrie, G.D., Sadiq, R., Tesfamariam, S. et al. On the Issue of Incomplete and Missing Water-Quality Data in Mine Site Databases: Comparing Three Imputation Methods. Mine Water Environ 35, 3–9 (2016). https://doi.org/10.1007/s10230-014-0322-4

Received: 19 April 2014
Accepted: 02 December 2014
Published: 14 December 2014
Issue Date: March 2016
DOI: https://doi.org/10.1007/s10230-014-0322-4
