A Data Mining Approach to Predict In Situ Detoxification Potential of Chlorinated Ethenes

Publication Title

Environmental Science and Technology


Despite advances in physicochemical remediation technologies, in situ bioremediation based on Dehalococcoides mccartyi (Dhc) reductive dechlorination activity remains a cornerstone approach for remediating sites impacted by chlorinated ethenes. Selecting the best remedial strategy is challenging because of the uncertainty and complexity of the biological and geochemical factors that influence Dhc activity. Guidelines based on measurable biogeochemical parameters have been proposed, but contemporary efforts fall short of meaningfully integrating the available information. Extensive groundwater monitoring data sets have been collected for decades, but they have not been systematically analyzed or used to develop tools that guide decision-making. In the present study, geochemical and microbial data sets collected from 35 wells at five contaminated sites were used to demonstrate that a data mining prediction model using the classification and regression tree (CART) algorithm can provide improved predictive understanding of a site’s reductive dechlorination potential. The CART model predicted the 3-month-ahead reductive dechlorination potential with true positive rates (i.e., sensitivity) of 75.8% for the training set and 69.5% for the test set. The machine learning algorithm ranked parameters by relative importance for assessing in situ reductive dechlorination potential. The abundance of Dhc 16S rRNA genes; CH₄, Fe²⁺, NO₃⁻, NO₂⁻, and SO₄²⁻ concentrations; total organic carbon (TOC) amounts; and oxidation-reduction potential (ORP) displayed significant correlations (p < 0.01) with dechlorination potential, with NO₃⁻, NO₂⁻, and Fe²⁺ concentrations taking precedence over the other parameters. In contrast to prior efforts, the power of data mining approaches lies in their ability to discern synergistic effects among multiple parameters that affect reductive dechlorination activity.
Overall, these findings demonstrate that data mining techniques (e.g., machine learning algorithms) effectively utilize groundwater monitoring data to derive predictive understanding of contaminant degradation, and thus have great potential for improving decision-making tools. A major need for realizing the predictive capabilities of data mining approaches is a curated, open-access, up-to-date and comprehensive collection of biogeochemical groundwater monitoring data.
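
As a rough illustration of the two ideas named in the abstract, the sketch below shows how a CART-style tree picks a single split by minimizing Gini impurity, and how sensitivity (true positive rate) is computed from predictions. It is a minimal sketch and not the authors' model: the function names (`gini`, `best_split`, `sensitivity`), the choice of nitrate as the single feature, and every number in the toy data are assumptions made for illustration only.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for a pure node)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Find the threshold on one feature that minimizes weighted Gini impurity.

    Returns (threshold, weighted_impurity); threshold is None if no split
    improves on the unsplit node.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate identical values
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


def sensitivity(y_true, y_pred):
    """True positive rate: TP / (TP + FN)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn) if (tp + fn) else float("nan")


# Hypothetical toy data, NOT measurements from the study:
# one feature (nitrate, mg/L) and a binary label (1 = dechlorination observed).
nitrate_mg_per_l = [0.1, 0.2, 0.3, 0.5, 2.0, 3.5, 4.0, 6.0]
dechlorination_observed = [1, 1, 1, 1, 0, 0, 1, 0]

threshold, impurity = best_split(nitrate_mg_per_l, dechlorination_observed)

# In this toy data the low-nitrate side is mostly positive, so predict 1 there.
predicted = [1 if x <= threshold else 0 for x in nitrate_mg_per_l]
tpr = sensitivity(dechlorination_observed, predicted)
```

A full CART model applies this split search recursively across all features, which is how it can capture interactions among several parameters rather than evaluating each one in isolation.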


Funding for this study was provided by the Strategic Environmental Research and Development Program (SERDP) under Project ER-2312 and the Environmental Security Technology Certification Program (ESTCP) under Project ER-1129.