On the Issue of Incomplete and Missing Water-Quality Data in Mine Site Databases: Comparing Three Imputation Methods


  • Technical Article
Mine Water and the Environment

Abstract

Large water-quality databases are valuable for predicting mine drainage chemistry, identifying optimal mitigation and remediation measures, and refuting or refining models and theories. However, such databases often contain missing values because samples were periodically not collected or analyzed, or because of data-entry errors. These missing values cause problems for machine learning and statistical analysis of water-quality data from mine sites. Using water-quality data collected from 1971 to 1994 at many locations on a copper-molybdenum-gold-silver-rhenium mine site, we compared three methods for imputing missing water-quality data: iterative robust model-based imputation (IRMI), multiple imputation of incomplete multivariate data (AMELIA), and sequential imputation for missing values (IMPSEQ). The methods were evaluated using mean absolute error, relative absolute error, and percent bias. The results showed that IMPSEQ and IRMI are suitable for imputing missing values in water-quality databases at mine sites, whereas AMELIA is not.
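
The three evaluation measures named in the abstract can be illustrated with a minimal Python sketch. The abstract does not state the formulas, so the definitions below are commonly used ones and should be read as assumptions: mean absolute error as the average absolute difference, relative absolute error as the total absolute error divided by the total absolute deviation of the observed values from their mean, and percent bias as 100 times the summed difference (observed minus imputed) divided by the summed observed values. The function names and the sample concentrations are hypothetical and are not taken from the study.

```python
from typing import Sequence


def mean_absolute_error(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """Average magnitude of the imputation errors, in the units of the variable."""
    return sum(abs(i - o) for o, i in zip(observed, imputed)) / len(observed)


def relative_absolute_error(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """Total absolute error relative to always guessing the observed mean."""
    mean_obs = sum(observed) / len(observed)
    numerator = sum(abs(i - o) for o, i in zip(observed, imputed))
    denominator = sum(abs(o - mean_obs) for o in observed)
    return numerator / denominator


def percent_bias(observed: Sequence[float], imputed: Sequence[float]) -> float:
    """100 * sum(observed - imputed) / sum(observed).

    Assumed sign convention: positive values mean the imputed values
    underestimate the observed ones overall.
    """
    return 100.0 * sum(o - i for o, i in zip(observed, imputed)) / sum(observed)


# Hypothetical values (e.g. mg/L) that were hidden before imputation,
# and the values an imputation method filled in for them.
observed = [2.0, 4.0, 6.0, 8.0]
imputed = [2.5, 3.5, 6.5, 7.0]

mae = mean_absolute_error(observed, imputed)
rae = relative_absolute_error(observed, imputed)
pbias = percent_bias(observed, imputed)
print(f"MAE={mae:.3f}  RAE={rae:.3f}  PBIAS={pbias:.2f}%")
```

Under these assumed definitions, lower mean absolute error and relative absolute error indicate imputed values closer to the hidden observations, and a percent bias near zero indicates little systematic over- or underestimation. A relative absolute error below 1 means the method beats the naive strategy of filling every gap with the observed mean.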




Author information

Corresponding author

Correspondence to Getnet D. Betrie.


About this article


Cite this article

Betrie, G.D., Sadiq, R., Tesfamariam, S. et al. On the Issue of Incomplete and Missing Water-Quality Data in Mine Site Databases: Comparing Three Imputation Methods. Mine Water Environ 35, 3–9 (2016). https://doi.org/10.1007/s10230-014-0322-4


  • DOI: https://doi.org/10.1007/s10230-014-0322-4
