Science, Technology, Engineering and Mathematics.
Open Access

PERFORMANCE OPTIMIZATION OF DEEPSEEK MOE ARCHITECTURE IN MULTI-SCALE PREDICTION OF STOCK RETURNS


Volume 3, Issue 2, Pp 1-9, 2025

DOI: https://doi.org/10.61784/wjit3026

Author(s)

HaiLong Liao

Affiliation(s)

School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

Corresponding Author

HaiLong Liao

ABSTRACT

Stock market data exhibits pronounced multi-scale characteristics. High-frequency data (such as minute-level price fluctuations) carries rich but noisy short-term information, while low-frequency data (such as daily trends) reflects long-term market dynamics but responds with a lag. Traditional time-series models such as LSTM and Transformer have inherent limitations in processing multi-scale features: the recursive structure of LSTM struggles to handle high-frequency noise efficiently, while the self-attention mechanism of the Transformer is weak at capturing local features and carries a large parameter count. This study proposes a dynamic routing optimization framework based on the DeepSeek MoE (Mixture of Experts) architecture, which achieves effective decoupling and fusion of multi-scale features through a hierarchical processing architecture, an intelligent routing mechanism, and efficient parallel computing. Experimental results on the Shanghai-Shenzhen 300 (CSI 300) constituent stocks (2018-2024) show that the model reduces high-frequency prediction error by 32.7% relative to traditional methods and lowers the maximum drawdown under extreme market conditions by 41%. Gradient attribution analysis reveals the dominant role of liquidity factors (such as turnover rate) in the predictions, providing an interpretable intelligent decision-making framework for quantitative investment.
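To make the "dynamic routing" idea concrete, the following is a minimal sketch (not the paper's implementation) of a sparse Mixture-of-Experts layer with top-k gating, the general mechanism that MoE-based routing builds on. All names, layer sizes, and the choice of linear experts are illustrative assumptions.

```python
# Minimal sparse MoE forward pass with top-k gating (illustrative only;
# expert count, dimensions, and linear experts are assumptions, not the
# paper's architecture).
import numpy as np

rng = np.random.default_rng(0)

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

class MoELayer:
    def __init__(self, d_in, d_out, n_experts=4, top_k=2):
        self.top_k = top_k
        # One linear "expert" per scale; a real model would use MLPs.
        self.experts = [rng.standard_normal((d_in, d_out)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((d_in, n_experts)) * 0.1

    def forward(self, x):
        # x: (batch, d_in) feature vectors, e.g. multi-scale factors.
        scores = softmax(x @ self.gate)              # routing probabilities
        topk = np.argsort(scores, axis=-1)[:, -self.top_k:]
        out = np.zeros((x.shape[0], self.experts[0].shape[1]))
        for i in range(x.shape[0]):
            w = scores[i, topk[i]]
            w = w / w.sum()                          # renormalize kept gates
            for j, wj in zip(topk[i], w):
                out[i] += wj * (x[i] @ self.experts[j])
        return out, topk

layer = MoELayer(d_in=8, d_out=1, n_experts=4, top_k=2)
x = rng.standard_normal((5, 8))
y, routes = layer.forward(x)
print(y.shape, routes.shape)   # (5, 1) (5, 2)
```

Only the top-k experts are evaluated per sample, so capacity grows with the expert count while per-sample compute stays roughly constant; this sparsity is what lets different experts specialize in different frequency regimes.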

KEYWORDS

DeepSeek; Mixture of Experts (MoE); Dynamic routing mechanism; Stock return prediction; Multi-scale feature decoupling; Financial time-series analysis; VIX volatility index; Gradient attribution analysis; Shanghai-Shenzhen 300 index

CITE THIS PAPER

HaiLong Liao. Performance optimization of DeepSeek MoE architecture in multi-scale prediction of stock returns. World Journal of Information Technology. 2025, 3(2): 1-9. DOI: https://doi.org/10.61784/wjit3026.

REFERENCES

[1] HaiLong Liao. DeepSeek large-scale model: technical analysis and development prospect. Journal of Computer Science and Electrical Engineering. 2025, 7(1): 33-37. DOI: https://doi.org/10.61784/jcsee3035.

[2] HaiLong Liao. A-share intelligent stock selection strategy based on the DeepSeek large model: Technical routes, factor systems, and empirical research. Eurasia Journal of Science and Technology. 2025, 7(2): 7-13. DOI: https://doi.org/10.61784/ejst3070.

[3] Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation, 1997, 9(8): 1735-1780. DOI: https://doi.org/10.1162/neco.1997.9.8.1735.

[4] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. arXiv preprint, 2017, arXiv:1706.03762. https://arxiv.org/abs/1706.03762.

[5] DeepSeek Team. DeepSeek Technology Panorama Analysis (Part II): MoE Architecture Innovation - How to Break Through the Performance Ceiling of Large Models with "Refined Division of Labor". Weixin Articles, 2023.

[6] Shazeer N, Mirhoseini A, Maziarz K, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv preprint, 2017, arXiv:1701.06538. https://arxiv.org/abs/1701.06538.

[7] DeepSeek-AI Team. Large-Scale Mixture-of-Experts with Dynamic Routing for Multiscale Financial Forecasting. arXiv preprint, 2024, arXiv:2412.19437. https://arxiv.org/abs/2412.19437.

[8] Dai AM, et al. Mixture-of-Experts Copula Models for Multivariate Financial Risk Analysis. arXiv preprint, 2023, arXiv:2307.16432. https://arxiv.org/abs/2307.16432.

[9] Mallat S. A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). Springer, 2009. https://link.springer.com/book/10.1007/978-0-387-21656-7.

[10] Zhang J, Zhou H, Li H, et al. LSTM-Transformer for Multivariate Time Series Forecasting. arXiv preprint, 2021, arXiv:2106.00263. https://arxiv.org/abs/2106.00263.

[11] Yu F, Koltun V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv preprint, 2015, arXiv:1511.07122. https://arxiv.org/abs/1511.07122.

All published work is licensed under a Creative Commons Attribution 4.0 International License.