Machine-learning forecasting model of tuberculosis cases among children in South Africa
DOI:
https://doi.org/10.17159/sajs.2025/16658
Keywords:
ARIMA model, Bayesian, random forest, machine learning, tuberculosis
Abstract
Globally, children and young adolescents under 15 years old account for approximately 11% of all tuberculosis (TB) cases, and there is growing concern about TB infection in children under 5 years old, especially in resource-limited settings. Nonetheless, the true extent of the TB burden among children in South Africa remains inadequately explored. The random forest–Bayesian autoregressive integrated moving average (RF-BARIMA) model, used for infectious disease prediction, has not previously been applied to TB in children. In this study, we employed the RF-BARIMA model to forecast TB incidence among children under 5 years old in South Africa's Eastern Cape Province, based on data from 2010 to 2019. Comparative analysis demonstrated that the RF-BARIMA model outperformed the other models in predictive accuracy and forecast capability. Our predictions revealed a projected mean of 0.4122 TB cases per month in 2022, with an effective sample size of 4054 TB cases in the Eastern Cape Province. These findings indicate a prospective reduction of 1670.85 TB cases among children under 5 years old in the coming years. The RF-BARIMA model offers better predictive and forecast accuracy than the single Bayesian ARIMA model. These results provide compelling evidence of significant under-reporting and potentially elevated TB incidence among children under 5 years old in South Africa's Eastern Cape Province, with important implications for public health policy and intervention strategies.
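The abstract does not state which accuracy measures underlie the model comparison. As a hedged illustration only, the sketch below assumes a common choice for count forecasts, root mean squared error (RMSE) and mean absolute error (MAE) on held-out months, and ranks candidate models by RMSE. The function names (`rmse`, `mae`, `rank_models`) and the numbers in the usage block are hypothetical and are not the study's data or code.

```python
import math


def _check(actual, predicted):
    # Both sequences must cover the same held-out months.
    if not actual or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal length")


def rmse(actual, predicted):
    """Root mean squared error; penalises large misses more heavily."""
    _check(actual, predicted)
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error, in the same units as the data (cases per month)."""
    _check(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def rank_models(actual, forecasts):
    """Return (name, (rmse, mae)) pairs sorted from lowest to highest RMSE."""
    scores = {name: (rmse(actual, f), mae(actual, f)) for name, f in forecasts.items()}
    return sorted(scores.items(), key=lambda item: item[1][0])


if __name__ == "__main__":
    # Illustrative numbers only, not data from the study.
    held_out = [3, 5, 4, 6]
    forecasts = {
        "model_a": [3.2, 4.8, 4.1, 5.7],  # hypothetical hybrid forecast
        "model_b": [4.0, 4.0, 4.0, 4.0],  # hypothetical single-model forecast
    }
    for name, (r, m) in rank_models(held_out, forecasts):
        print(f"{name}: RMSE={r:.3f} MAE={m:.3f}")
```

Ranking by a single metric is a simplification; in practice one would report several measures side by side, since RMSE and MAE can disagree when a model makes a few large errors.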
Significance:
Childhood tuberculosis (TB) in South Africa is a significant concern, with most cases occurring in children aged 0–4 years. The burden in children mirrors the high burden of the adult epidemic in the country. The RF-BARIMA model combines the ability of random forest to capture non-linear patterns with the probabilistic time-series forecasting strengths of Bayesian ARIMA, with the aim of improving prediction accuracy and quantifying uncertainty in the forecasts. The results call for urgent public health policy and intervention strategies to address under-reporting and elevated TB incidence in this vulnerable group, reinforcing the study's global significance.
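The abstract does not describe how the random forest and Bayesian ARIMA components are coupled. One common hybrid design, assumed here rather than taken from the paper, fits the linear time-series model first and then lets a tree ensemble predict what the linear model misses from its residuals. To stay within the Python standard library, this sketch makes strong simplifications: a Bayesian ARIMA(1,1,0) with a conjugate normal prior on the AR coefficient and a known noise variance, and a bagged ensemble of one-split regression trees standing in for a full random forest (with a single lagged feature there is nothing to subsample, so this is plain bagging). All function names are hypothetical.

```python
import random
import statistics


def difference(y):
    """First differences d_t = y_t - y_{t-1} (the 'I' step of ARIMA with d = 1)."""
    return [b - a for a, b in zip(y, y[1:])]


def bayes_ar1(d, sigma2, tau2=1.0):
    """Posterior mean and variance of phi in d_t = phi * d_{t-1} + e_t.

    Assumes e_t ~ N(0, sigma2) with sigma2 known and a N(0, tau2) prior on phi,
    which makes the posterior for phi normal (conjugate update).
    """
    x, z = d[:-1], d[1:]
    precision = 1.0 / tau2 + sum(v * v for v in x) / sigma2
    mean = sum(a * b for a, b in zip(x, z)) / sigma2 / precision
    return mean, 1.0 / precision


def fit_stump(xs, ys):
    """One-split regression tree: (threshold, left mean, right mean) minimising SSE."""
    best = None
    for t in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        ml, mr = statistics.fmean(left), statistics.fmean(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, t, ml, mr)
    if best is None:  # all xs equal: no split possible, predict the mean
        m = statistics.fmean(ys)
        return xs[0], m, m
    return best[1:]


def fit_bagged_stumps(xs, ys, n_trees=50, seed=0):
    """Fit each stump on a bootstrap resample (a stand-in for a random forest)."""
    rng = random.Random(seed)
    n = len(xs)
    stumps = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]
        stumps.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return stumps


def predict_stumps(stumps, x):
    """Average the stump predictions for input x."""
    return statistics.fmean([ml if x <= t else mr for t, ml, mr in stumps])


def hybrid_forecast(y, sigma2, n_draws=2000, seed=1):
    """One-step forecast: Bayesian AR(1) on differences plus a residual correction.

    Returns (mean, lower, upper); lower/upper bound a ~95% interval of the draws.
    """
    d = difference(y)
    phi_mean, phi_var = bayes_ar1(d, sigma2)
    # Residuals of the linear part; the ensemble learns resid_t from resid_{t-1}.
    resid = [b - phi_mean * a for a, b in zip(d, d[1:])]
    stumps = fit_bagged_stumps(resid[:-1], resid[1:])
    correction = predict_stumps(stumps, resid[-1])
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        phi = rng.gauss(phi_mean, phi_var ** 0.5)  # posterior draw of phi
        d_next = phi * d[-1] + rng.gauss(0.0, sigma2 ** 0.5)  # noise draw
        draws.append(y[-1] + d_next + correction)  # undo differencing, add correction
    draws.sort()
    lower = draws[int(0.025 * n_draws)]
    upper = draws[int(0.975 * n_draws) - 1]
    return statistics.fmean(draws), lower, upper


if __name__ == "__main__":
    # Synthetic monthly counts, for illustration only (not the study's data).
    series = [10 + (i % 3) for i in range(36)]
    noise_var = statistics.pvariance(difference(series))
    print(hybrid_forecast(series, noise_var))
```

Posterior draws of the AR coefficient carry parameter uncertainty into the forecast interval, which is the probabilistic element attributed to Bayesian ARIMA above; in this sketch the tree-ensemble correction is added as a fixed shift, so its own uncertainty is not propagated.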
License

All articles are published under a Creative Commons Attribution 4.0 International Licence
Copyright is retained by the authors. Readers are welcome to reproduce, share and adapt the content without permission provided the source is attributed.
Disclaimer: The publisher and editors accept no responsibility for statements made by the authors.