PREDICTING OLYMPIC MEDAL DISTRIBUTION FOR LA 2028 BASED ON K-MEANS CLUSTERING AND AN XGBOOST-BOOTSTRAP ENSEMBLE
Volume 4, Issue 1, pp. 48-52, 2026
DOI: https://doi.org/10.61784/wjit3082
Author(s)
YiMing Feng1*, HaoShuai Yu1, LinYu Zhuo2
Affiliation(s)
1Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.
2Nottingham University Business School, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.
Corresponding Author
YiMing Feng
ABSTRACT
This study aims to predict each country's total medal count at the 2028 Los Angeles Summer Olympics and to assess which nations are likely to win their first-ever medal. The data were first cleaned and normalized for consistency. K-means clustering was then applied to historical average medal counts to divide countries into strong and weak sporting nations. Six key features were selected to build the predictive models, including medal counts, athlete participation, level of development, and performance in specialty sports. An XGBoost-Bootstrap model was used to predict the medal count of the United States, and a Random Forest-Bootstrap model was used to identify potential first-time medalists. The model achieved high accuracy on the training data but lower accuracy on the test data, indicating limited ability to generalize. Nonetheless, the results offer useful insights for future Olympic forecasting and sports policy planning. The study makes three contributions: it integrates K-means clustering with ensemble learning to tailor predictions to different groups of countries; it combines XGBoost with Bootstrap resampling to quantify the uncertainty of medal forecasts; and it addresses two objectives at once, predicting top performers and identifying emerging nations, which yields a more comprehensive and policy-relevant framework for Olympic prediction.
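The two techniques the abstract pairs, K-means grouping of countries and Bootstrap resampling to quantify forecast uncertainty, can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation. It assumes a single clustering feature (each country's historical average medal count, min-max normalized), a deterministic K-means initialization with k = 2, and, to avoid third-party dependencies, it swaps XGBoost for an ordinary least-squares trend line so that only the Bootstrap step (resample the training pairs, refit, read percentile bounds off the predictions) is shown faithfully. All function names and the toy numbers are hypothetical and are not real Olympic data.

```python
import random
import statistics


def min_max_normalize(values):
    """Rescale numbers to [0, 1]; a constant input maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def kmeans_1d(values, k=2, max_iter=100):
    """Lloyd's K-means on a single feature (k >= 2).

    Returns (labels, centroids), relabelled so that label 0 is the cluster
    with the lowest centroid and label k-1 the highest.
    """
    ordered = sorted(values)
    # Deterministic start (an assumption): centroids spread over the sorted data.
    centroids = [ordered[round(i * (len(ordered) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(max_iter):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda c: abs(v - centroids[c])) for v in values]
        # Update step: move each centroid to the mean of its members.
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(statistics.fmean(members) if members else centroids[c])
        if updated == centroids:
            break
        centroids = updated
    order = sorted(range(k), key=lambda c: centroids[c])
    rank = {c: r for r, c in enumerate(order)}
    return [rank[lab] for lab in labels], [centroids[c] for c in order]


def linear_trend_predict(xs, ys, x_new):
    """Stand-in regressor (NOT XGBoost): least-squares trend line."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:  # every resampled x is identical: fall back to the mean
        return my
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return my + slope * (x_new - mx)


def bootstrap_interval(xs, ys, x_new, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap: refit on resampled (x, y) pairs, collect predictions."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        preds.append(linear_trend_predict([xs[i] for i in idx], [ys[i] for i in idx], x_new))
    preds.sort()
    lower = preds[int(alpha / 2 * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(preds), (lower, upper)


if __name__ == "__main__":
    # Toy, illustrative inputs only; these are NOT real Olympic figures.
    avg_medals = {"A": 110.0, "B": 95.0, "C": 3.0, "D": 1.5, "E": 0.0, "F": 8.0}
    labels, _ = kmeans_1d(min_max_normalize(list(avg_medals.values())))
    strong = [name for name, lab in zip(avg_medals, labels) if lab == 1]
    print("strong group:", strong)  # strong group: ['A', 'B']

    years = [2008, 2012, 2016, 2020, 2024]
    medals = [100, 104, 110, 112, 118]
    point, (lo, hi) = bootstrap_interval(years, medals, 2028)
    print(f"2028 median {point:.1f}, 95% interval [{lo:.1f}, {hi:.1f}]")
```

Replacing `linear_trend_predict` with any fitted regressor, for example a gradient-boosted model retrained on each resample, leaves the bootstrap loop unchanged; the spread of the collected predictions is what supplies the uncertainty estimate.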
KEYWORDS
Medal prediction, K-means clustering, XGBoost-Bootstrap, Classification of sports powerhouses
CITE THIS PAPER
YiMing Feng, HaoShuai Yu, LinYu Zhuo. Predicting Olympic medal distribution for LA 2028 based on K-means clustering and an XGBoost-Bootstrap ensemble. World Journal of Information Technology. 2026, 4(1): 48-52. DOI: https://doi.org/10.61784/wjit3082.