Prediction of heavy metal concentrations in soil using machine learning models

Authors

  • Temitope M. Osobamiro
    Department of Chemical Sciences, Olabisi Onabanjo University, P.M.B. 2002, Ago-Iwoye, Nigeria
  • Samuel O. Sipeolu
    Department of Chemical Sciences, Olabisi Onabanjo University, P.M.B. 2002, Ago-Iwoye, Nigeria
  • Emmanuel F. Ayo
    Department of Computer Sciences, Olabisi Onabanjo University, P.M.B. 2002, Ago-Iwoye, Nigeria
  • Sakeenah O. Abdullah
    Department of Chemical Sciences, Olabisi Onabanjo University, P.M.B. 2002, Ago-Iwoye, Nigeria

Keywords:

Heavy metals, Machine learning, Soil properties, Ensemble models

Abstract

Machine learning (ML) models are increasingly used to predict pollutant levels in environmental samples, particularly heavy metals. This study predicted lead (Pb), zinc (Zn), and cadmium (Cd) concentrations in soil samples collected near a plastic recycling facility in Ogun State, Nigeria. Atomic absorption spectrophotometry (AAS) showed low heavy metal concentrations (mg kg⁻¹): Pb ≤ 0.38, Cd ≤ 0.40, and Zn ≤ 7.55, all below the regulatory limits considered in this study. Five regression models (linear regression, Random Forest, Extra Trees, XGBoost, and CatBoost) were evaluated using soil physicochemical properties as predictors. XGBoost produced the highest R² values for Pb (0.973) and Cd (0.971), whereas Extra Trees produced the highest R² for Zn (0.957). CatBoost and Random Forest also showed strong predictive ability, with generally low root mean square error (RMSE) and mean absolute error (MAE). Feature-importance results indicated that nitrogen, total organic carbon, and organic matter were important predictors of heavy metal concentrations. The findings suggest strongly nonlinear relationships between soil properties and heavy metal concentrations and support ensemble ML models as useful tools for rapid preliminary monitoring. The results should, however, be interpreted cautiously because of the limited number of experimental samples and the use of rule-based synthetic data augmentation.
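The abstract ranks the five models by R², RMSE, and MAE but does not state how these metrics were computed. The sketch below assumes the standard definitions: R² = 1 − SS_res/SS_tot, RMSE as the square root of the mean squared residual, and MAE as the mean absolute residual. The function names and the five observed and predicted Pb values are hypothetical and illustrative only. They are not the study's data or code.

```python
import math
from typing import Sequence


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        # R² is undefined when every observed value is identical.
        raise ValueError("R² is undefined for a constant target")
    return 1.0 - ss_res / ss_tot


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the target (e.g. mg kg^-1)."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error, in the units of the target (e.g. mg kg^-1)."""
    n = len(y_true)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


# Hypothetical Pb values in mg kg^-1, NOT the study's measurements.
observed_pb = [0.10, 0.22, 0.31, 0.38, 0.15]
predicted_pb = [0.12, 0.20, 0.30, 0.36, 0.16]

if __name__ == "__main__":
    for name, fn in [("R2", r2_score), ("RMSE", rmse), ("MAE", mae)]:
        print(f"{name}: {fn(observed_pb, predicted_pb):.3f}")
```

RMSE and MAE are expressed in the target's units (mg kg⁻¹). They compare models fairly for the same metal, but they do not compare Pb, Cd, and Zn directly, because the three metals span different concentration ranges. R² is unitless, which is why it is the figure quoted across metals.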

Published

2026-05-24

How to Cite

Osobamiro, T. M., Sipeolu, S. O., Ayo, E. F., & Abdullah, S. O. (2026). Prediction of heavy metal concentrations in soil using machine learning models. Proceedings of the Nigerian Society of Physical Sciences, 3, 336. https://doi.org/10.61298/pnspsc.2026.3.336
