PREDICTING OLYMPIC MEDAL DISTRIBUTION FOR LA 2028 BASED ON K-MEANS CLUSTERING AND AN XGBOOST-BOOTSTRAP ENSEMBLE

Volume 4, Issue 1, Pp 48-52, 2026

DOI: https://doi.org/10.61784/wjit3082

Author(s)

YiMing Feng¹*, HaoShuai Yu¹, LinYu Zhuo²

Affiliation(s)

¹Faculty of Science and Engineering, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.

²Nottingham University Business School, University of Nottingham Ningbo China, Ningbo 315199, Zhejiang, China.

Corresponding Author

YiMing Feng

ABSTRACT

This study predicts the total number of medals that countries will win at the 2028 Los Angeles Summer Olympics and estimates the likelihood that nations without a previous medal will win their first. The data were cleaned and normalized for consistency, and K-means clustering was then used to divide countries into strong and weak sports nations according to their historical average medal counts. Six key features were selected to build the predictive models, including medal numbers, athlete participation, development level, and performance in specialty sports. An XGBoost-Bootstrap model was applied to predict the U.S. medal count, and a Random Forest-Bootstrap model identified potential first-time medalists. The model achieved high accuracy on the training data but lower accuracy on the test data, which indicates difficulty in generalizing. Nonetheless, the results offer useful insights for future Olympic forecasting and sports policy planning. The study makes three contributions: it integrates K-means clustering with ensemble learning to tailor predictions to different country groups; it combines XGBoost with Bootstrap resampling to quantify uncertainty in medal forecasts; and it addresses two objectives at once, predicting top performers and identifying emerging nations, which yields a more comprehensive and policy-relevant framework for Olympic prediction.
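The two building blocks named above, grouping countries by historical average medal count and attaching a bootstrap interval to a forecast, can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The country labels and medal numbers are made-up toy values, the function names `kmeans_1d_two_groups` and `bootstrap_interval` are hypothetical, and a plain mean stands in for the XGBoost regressor so that the example runs without third-party packages.

```python
import random
import statistics


def kmeans_1d_two_groups(values, iterations=100):
    """Split 1-D values into two clusters with Lloyd's algorithm (k = 2)."""
    lo, hi = min(values), max(values)  # initial centroids: the two extremes
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assign each value to the nearer centroid (1 = higher-medal cluster).
        labels = [1 if abs(v - hi) < abs(v - lo) else 0 for v in values]
        group0 = [v for v, g in zip(values, labels) if g == 0]
        group1 = [v for v, g in zip(values, labels) if g == 1]
        new_lo = statistics.fmean(group0) if group0 else lo
        new_hi = statistics.fmean(group1) if group1 else hi
        if (new_lo, new_hi) == (lo, hi):  # centroids stopped moving
            break
        lo, hi = new_lo, new_hi
    return labels


def bootstrap_interval(history, predict, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for predict() over resampled history."""
    rng = random.Random(seed)
    preds = []
    for _ in range(n_boot):
        # Resample the observations with replacement, then re-run the predictor.
        sample = [rng.choice(history) for _ in history]
        preds.append(predict(sample))
    preds.sort()
    lower = preds[int((alpha / 2) * n_boot)]
    upper = preds[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Toy data (illustrative only): historical average medals per Games.
avg_medals = {"A": 95.0, "B": 70.0, "C": 8.0, "D": 3.0, "E": 0.5}
labels = kmeans_1d_two_groups(list(avg_medals.values()))
groups = dict(zip(avg_medals, labels))  # 1 = "strong", 0 = "weak"

# Toy medal totals for one country; the mean is a stand-in predictor,
# not XGBoost, so the interval only illustrates the resampling step.
history = [100, 110, 105, 120, 115]
lower, upper = bootstrap_interval(history, statistics.fmean)
print(groups, (lower, upper))
```

In a fuller pipeline, the `predict` argument would refit a gradient-boosted model on each resample and return its forecast for the target Games, and the spread of those forecasts would give the uncertainty band.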

KEYWORDS

Medal prediction, K-means clustering, XGBoost-Bootstrap, Classification of sports powerhouses

CITE THIS PAPER

YiMing Feng, HaoShuai Yu, LinYu Zhuo. Predicting Olympic medal distribution for LA 2028 based on K-means clustering and an XGBoost-Bootstrap ensemble. World Journal of Information Technology. 2026, 4(1): 48-52. DOI: https://doi.org/10.61784/wjit3082.
