Prediction of heavy metal concentrations in soil using machine learning models
Keywords:
Heavy metals, Machine learning, Soil properties, Ensemble models
Abstract
Machine learning (ML) models are increasingly used to predict pollutant levels, particularly heavy metals, in environmental samples. This study predicted lead (Pb), zinc (Zn), and cadmium (Cd) concentrations in soil samples collected near a plastic recycling facility in Ogun State, Nigeria. Atomic absorption spectrophotometry (AAS) showed low heavy metal concentrations (mg kg⁻¹), with Pb ≤ 0.38, Cd ≤ 0.40, and Zn ≤ 7.55, all below the regulatory limits considered in this study. Five regression models (linear regression, Random Forest, Extra Trees, XGBoost, and CatBoost) were evaluated using soil physicochemical properties as predictors. XGBoost produced the highest R² values for Pb (0.973) and Cd (0.971), whereas Extra Trees produced the highest R² value for Zn (0.957). CatBoost and Random Forest also showed strong predictive ability, with generally low root mean square error (RMSE) and mean absolute error (MAE) values. Feature-importance results indicated that nitrogen, total organic carbon, and organic matter were important predictors of heavy metal concentrations. The findings suggest strongly nonlinear relationships between soil properties and heavy metal concentrations and support ensemble ML models as useful tools for rapid preliminary monitoring. The results should, however, be interpreted cautiously because of the limited number of experimental samples and the use of rule-based synthetic data augmentation.
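For readers unfamiliar with the three evaluation metrics named in the abstract, the following is a minimal sketch of how R², RMSE, and MAE are conventionally computed from paired observed and predicted values. The function name `regression_metrics` and the sample numbers are illustrative assumptions only; they are not the study's data or the authors' code.

```python
import math


def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(y_true)
    mean_true = sum(y_true) / n
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae


# Hypothetical concentrations in mg/kg, invented for illustration only.
observed = [0.10, 0.20, 0.30, 0.40]
predicted = [0.12, 0.18, 0.33, 0.37]

r2, rmse, mae = regression_metrics(observed, predicted)
print(f"R2={r2:.3f}  RMSE={rmse:.4f}  MAE={mae:.4f}")
```

R² is unitless and measures the share of variance explained, whereas RMSE and MAE carry the units of the target (here mg kg⁻¹), so a high R² alongside low RMSE and MAE points the same way only when all three are computed on held-out data.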
Copyright (c) 2026 Temitope M. Osobamiro, Samuel O. Sipeolu, Emmanuel F. Ayo, Sakeenah O. Abdullah (Author)

This work is licensed under a Creative Commons Attribution 4.0 International License.