Zero-inflated and hurdle models with an application to the number of involved axillary lymph nodes in patients with breast cancer in Zimbabwe: A bootstrap resampling validation approach

Authors

  • Bester Saruchera, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
  • Oliver Bodhlyera, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
  • Henry Mwambi, School of Mathematics, Statistics and Computer Science, University of KwaZulu-Natal, Pietermaritzburg, 3209, South Africa
  • Ntokozo Ndlovu, Department of Oncology, Faculty of Medicine and Health Sciences, University of Zimbabwe, Harare, Zimbabwe

Keywords:

Breast cancer, Axillary lymph nodes, Count regression models, Bootstrapping resampling

Abstract

Breast cancer is increasingly prevalent in Zimbabwe, which underscores the need to understand axillary nodal involvement in diagnosed patients in order to assess disease severity and potential progression. This study investigated factors influencing the number of involved axillary lymph nodes in breast cancer patients by identifying the best-fitting count regression model, validated through bootstrap resampling. We conducted a retrospective analysis of hospital-based data for patients diagnosed with breast cancer at one of the two major referral hospitals in Zimbabwe. We evaluated and compared the following count regression models: Poisson, Negative Binomial (NB), Zero-Inflated Negative Binomial (ZINB), Zero-Inflated Poisson (ZIP), Hurdle Negative Binomial (HNB) and Hurdle Poisson (HP). These models, which are efficient in handling over-dispersed count data, were used to investigate the risk factors associated with involved axillary lymph nodes. Covariates included age, tumor size, tumor grade, estrogen receptor status, progesterone receptor status and HER2 status. Model fit was assessed using the Akaike Information Criterion (AIC) and the Bayesian Information Criterion (BIC). The ZINB and HNB models outperformed the other models, with the HNB model demonstrating consistency across bootstrap-resampled datasets. Bootstrap resampling validated the reliability of model estimates, addressing biases caused by small sample sizes. Age was significantly associated with the zero-inflation component of the HNB model. This study highlights the importance of selecting appropriate count regression models for analyzing medical data and demonstrates the utility of integrating bootstrap resampling to ensure robust statistical inference. The findings provide actionable insights for therapy planning and resource allocation.
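
The workflow summarized in the abstract (fit competing count models, compare them by AIC and BIC, then check whether the ranking holds across bootstrap resamples) can be illustrated with a minimal sketch. This is not the authors' analysis: the counts below are invented toy data, the models are intercept-only with no covariates, only a plain Poisson and a hurdle Poisson are compared, and all function names are hypothetical.

```python
import math
import random

def poisson_fit(counts):
    """Intercept-only Poisson; returns (log-likelihood, number of parameters)."""
    lam = sum(counts) / len(counts)  # the MLE of the rate is the sample mean
    if lam == 0:
        return 0.0, 1  # all-zero sample: the likelihood equals 1
    ll = sum(y * math.log(lam) - lam - math.lgamma(y + 1) for y in counts)
    return ll, 1

def hurdle_poisson_fit(counts):
    """Intercept-only hurdle Poisson: a binary zero/positive part plus a
    zero-truncated Poisson for the positive counts."""
    n = len(counts)
    pos = [y for y in counts if y > 0]
    n_pos, n_zero = len(pos), n - len(pos)
    ll = 0.0
    if n_zero:
        ll += n_zero * math.log(n_zero / n)  # zeros
    if n_pos:
        ll += n_pos * math.log(n_pos / n)    # crossing the hurdle
    if n_pos == 0:
        return ll, 2
    mean_pos = sum(pos) / n_pos
    # Truncated-Poisson MLE: solve lam / (1 - exp(-lam)) = mean_pos by bisection.
    lo, hi = 1e-9, mean_pos
    for _ in range(100):
        mid = (lo + hi) / 2
        if mid / (-math.expm1(-mid)) < mean_pos:
            lo = mid
        else:
            hi = mid
    lam = (lo + hi) / 2
    log_p_pos = math.log(-math.expm1(-lam))  # log P(Y > 0) under Poisson(lam)
    ll += sum(y * math.log(lam) - lam - math.lgamma(y + 1) - log_p_pos for y in pos)
    return ll, 2

def aic(ll, k):
    return 2 * k - 2 * ll

def bic(ll, k, n):
    return k * math.log(n) - 2 * ll

# Toy data (assumption): 30 patients with no involved nodes, 20 with at least one.
counts = [0] * 30 + [1, 1, 2, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 1, 2, 3, 4, 7, 9, 20]
n = len(counts)

aic_poisson = aic(*poisson_fit(counts))
aic_hurdle = aic(*hurdle_poisson_fit(counts))
bic_poisson = bic(*poisson_fit(counts), n)
bic_hurdle = bic(*hurdle_poisson_fit(counts), n)

# Nonparametric bootstrap: resample patients with replacement and record how
# often the hurdle model still has the lower AIC.
rng = random.Random(0)
n_boot = 200
wins = 0
for _ in range(n_boot):
    sample = rng.choices(counts, k=n)
    if aic(*hurdle_poisson_fit(sample)) < aic(*poisson_fit(sample)):
        wins += 1
share_hurdle = wins / n_boot

print("AIC Poisson:", round(aic_poisson, 1), "AIC hurdle:", round(aic_hurdle, 1))
print("BIC Poisson:", round(bic_poisson, 1), "BIC hurdle:", round(bic_hurdle, 1))
print("Share of resamples preferring the hurdle model:", share_hurdle)
```

A lower AIC or BIC indicates the preferred model, and a share close to 1 suggests the preference is stable under resampling. A full analysis along the lines of the study would add covariates and the negative binomial and zero-inflated variants, which this sketch omits.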

Published

2025-02-27

How to Cite

Zero-inflated and hurdle models with an application to the number of involved axillary lymph nodes in patients with breast cancer in Zimbabwe: A bootstrap resampling validation approach. (2025). Recent Advances in Natural Sciences, 3(1), 137. https://doi.org/10.61298/rans.2025.3.1.137
