YiMing Feng1*, HaoShuai Yu1, LinYu Zhuo2

Science, Technology, Engineering and Mathematics.
Open Access

PREDICTING OLYMPIC MEDAL DISTRIBUTION FOR LA 2028 BASED ON K-MEANS CLUSTERING AND AN XGBOOST-BOOTSTRAP ENSEMBLE

Volume 4, Issue 1, pp. 48-52, 2026

DOI: https://doi.org/10.61784/wjit3082

Affiliation(s)

1Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.

2Nottingham University Business School, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.

Corresponding Author

YiMing Feng

ABSTRACT

This study aims to predict each country's total medal count at the 2028 Los Angeles Summer Olympics and to assess which nations are likely to win their first Olympic medal. The data were first cleaned and normalized for consistency. K-means clustering was then used to group countries into strong and weak sporting nations according to their historical average medal counts. Six key features, including medal counts, athlete participation, level of development, and performance in specialty sports, were selected to build the predictive models. An XGBoost-Bootstrap model was applied to predict the medal count of the United States, and a Random Forest-Bootstrap model was used to identify potential first-time medalists. The model achieved high accuracy on the training data but lower accuracy on the test data, indicating limited generalization. Nonetheless, the results offer useful insights for future Olympic forecasting and sports policy planning. The study makes three contributions: it integrates K-means clustering with ensemble learning so that predictions are tailored to different groups of countries; it combines XGBoost with Bootstrap resampling to quantify the uncertainty of medal forecasts; and it addresses two objectives at once (predicting top performers and identifying emerging nations), yielding a more comprehensive and policy-relevant framework for Olympic prediction.
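The abstract names two techniques without showing them: K-means clustering to split countries into strong and weak nations by historical average medal count, and Bootstrap resampling around a regressor to put an uncertainty interval on a forecast. The following is a minimal, stdlib-only Python sketch of both ideas under stated assumptions; it is not the authors' implementation. Clustering uses one feature (average medals) with k = 2, and XGBoost is NOT reproduced: an ordinary least-squares trend line stands in as the base learner, so the block illustrates only the resample, refit, predict loop of a Bootstrap ensemble. All function names (`kmeans_1d`, `fit_line`, `bootstrap_forecast`) and all input values are hypothetical toy data.

```python
import random
import statistics


def kmeans_1d(values, k=2, iters=100):
    """Lloyd's K-means on scalar values; returns (centroids, labels)."""
    s = sorted(values)
    # Deterministic initialisation: spread the k starting centroids evenly
    # across the sorted data (the min and max when k == 2).
    if k == 1:
        centroids = [s[0]]
    else:
        centroids = [s[round(i * (len(s) - 1) / (k - 1))] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assignment step: each value joins its nearest centroid.
        labels = [min(range(k), key=lambda j: abs(v - centroids[j])) for v in values]
        # Update step: each centroid moves to the mean of its members
        # (an empty cluster keeps its previous centroid).
        new_centroids = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            new_centroids.append(statistics.fmean(members) if members else centroids[j])
        if new_centroids == centroids:
            break  # converged: the assignments can no longer change
        centroids = new_centroids
    return centroids, labels


def fit_line(xs, ys):
    """Least-squares line y = slope * x + intercept (a stand-in for XGBoost)."""
    my = statistics.fmean(ys)
    if len(set(xs)) == 1:
        return 0.0, my  # every x identical in this resample: predict the mean
    mx = statistics.fmean(xs)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def bootstrap_forecast(xs, ys, x_new, n_boot=1000, alpha=0.05, seed=0):
    """Refit the base model on bootstrap resamples; return (median, low, high)."""
    rng = random.Random(seed)
    n = len(xs)
    preds = []
    for _ in range(n_boot):
        # Resample (x, y) pairs with replacement, refit, and predict x_new.
        idx = [rng.randrange(n) for _ in range(n)]
        slope, intercept = fit_line([xs[i] for i in idx], [ys[i] for i in idx])
        preds.append(slope * x_new + intercept)
    preds.sort()
    # Percentile interval: drop roughly alpha/2 of the predictions from each tail.
    low = preds[int(alpha / 2 * n_boot)]
    high = preds[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.median(preds), low, high


if __name__ == "__main__":
    # Hypothetical historical average medal counts (toy values, not real data).
    avg_medals = {"A": 110.0, "B": 88.0, "C": 4.0, "D": 1.5, "E": 0.5, "F": 65.0}
    centroids, labels = kmeans_1d(list(avg_medals.values()), k=2)
    strong = max(range(2), key=lambda j: centroids[j])
    print("strong:", [c for c, lab in zip(avg_medals, labels) if lab == strong])
    # prints: strong: ['A', 'B', 'F']

    # Hypothetical medal totals for one country at past Games (toy values).
    years = [2000, 2004, 2008, 2012, 2016, 2020]
    medals = [90, 98, 105, 100, 112, 110]
    median, low, high = bootstrap_forecast(years, medals, 2028)
    print(f"2028 forecast: {median:.1f} (95% interval {low:.1f} to {high:.1f})")
```

The seeded generator and deterministic initialisation make runs reproducible. Swapping `fit_line` for a gradient-boosted regressor such as `xgboost.XGBRegressor` would keep the same loop: each resample trains a fresh model, and the spread of the resulting predictions, rather than a single point estimate, is what expresses forecast uncertainty.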

KEYWORDS

Medal prediction, K-means clustering, XGBoost-Bootstrap, Classification of sports powerhouses

CITE THIS PAPER

YiMing Feng, HaoShuai Yu, LinYu Zhuo. Predicting Olympic medal distribution for LA 2028 based on K-means clustering and an XGBoost-Bootstrap ensemble. World Journal of Information Technology. 2026, 4(1): 48-52. DOI: https://doi.org/10.61784/wjit3082.

All published work is licensed under a Creative Commons Attribution 4.0 International License.
Copyright © 2017 - 2026 Science, Technology, Engineering and Mathematics. All Rights Reserved.