PERFORMANCE OPTIMIZATION OF DEEPSEEK MOE ARCHITECTURE IN MULTI-SCALE PREDICTION OF STOCK RETURNS

Authors

  • HaiLong Liao (Corresponding Author) School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing 100049, China.

Keywords:

DeepSeek, Mixture of Experts (MoE), Dynamic routing mechanism, Stock return prediction, Multi-scale feature decoupling, Financial time-series analysis, VIX volatility index, Gradient attribution analysis, Shanghai-Shenzhen 300 index

Abstract

Stock market data exhibits pronounced multi-scale characteristics. High-frequency data (such as minute-level price fluctuations) carries rich but noise-intensive short-term information, while low-frequency data (such as daily trends) reflects long-term market dynamics but responds with a delay. Traditional time-series models (such as LSTM or Transformer) have inherent limitations in processing multi-scale features: the recursive structure of LSTM struggles to filter high-frequency noise efficiently, while the self-attention mechanism of Transformer captures local features poorly and carries a large parameter count. This study proposes a dynamic routing optimization framework based on the DeepSeek MoE (Mixture of Experts) architecture, which achieves effective decoupling and fusion of multi-scale features through a hierarchical processing architecture, an intelligent routing mechanism, and efficient parallel computing. Experimental results on the Shanghai-Shenzhen 300 (CSI 300) constituent stocks (2018-2024) show that the model reduces high-frequency prediction error by 32.7% relative to traditional methods and reduces maximum drawdown under extreme market conditions by 41%. Gradient attribution analysis reveals the dominant role of liquidity factors (such as turnover rate) in the predictions, providing an interpretable intelligent decision-making framework for quantitative investment.
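The routing idea described in the abstract — a gating network that dispatches each input to a small subset of specialist experts — can be illustrated with a minimal top-k gated MoE sketch in the style of Shazeer et al. [6]. All names, dimensions, and the use of plain linear experts are illustrative assumptions, not the paper's actual implementation:

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

class TopKGatedMoE:
    """Sparsely-gated mixture of experts: a gating network scores every
    expert, only the top-k experts run per input, and their outputs are
    combined with the renormalized gate weights."""

    def __init__(self, d_in, d_out, n_experts=4, k=2, seed=0):
        rng = np.random.default_rng(seed)
        self.k = k
        # One linear map per expert; a real model would use deeper networks,
        # e.g. one expert specialized per time scale (minute / daily / weekly).
        self.experts = [rng.standard_normal((d_in, d_out)) * 0.1
                        for _ in range(n_experts)]
        self.gate = rng.standard_normal((d_in, n_experts)) * 0.1

    def forward(self, x):
        logits = x @ self.gate                          # (batch, n_experts)
        topk = np.argsort(logits, axis=1)[:, -self.k:]  # top-k expert indices
        out = np.zeros((x.shape[0], self.experts[0].shape[1]))
        for i in range(x.shape[0]):
            sel = topk[i]
            w = softmax(logits[i, sel])                 # renormalize over top-k
            for j, wj in zip(sel, w):
                out[i] += wj * (x[i] @ self.experts[j])
        return out, topk

# Route a small batch of 8-dimensional feature vectors through 4 experts.
moe = TopKGatedMoE(d_in=8, d_out=1, n_experts=4, k=2)
x = np.random.default_rng(1).standard_normal((5, 8))
y, routes = moe.forward(x)
```

Because only k of the n experts execute per input, compute grows with k rather than with the total expert count — the property that lets MoE models scale parameters without a proportional inference cost.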

References

[1] HaiLong Liao. DeepSeek large-scale model: technical analysis and development prospect. Journal of Computer Science and Electrical Engineering. 2025, 7(1): 33-37. DOI: https://doi.org/10.61784/jcsee3035

[2] HaiLong Liao. A-share intelligent stock selection strategy based on the DeepSeek large model: Technical routes, factor systems, and empirical research. Eurasia Journal of Science and Technology. 2025, 7(2): 7-13. DOI: https://doi.org/10.61784/ejst3070

[3] Hochreiter S, Schmidhuber J. Long Short-Term Memory. Neural Computation, 1997, 9(8): 1735-1780. DOI: https://doi.org/10.1162/neco.1997.9.8.1735

[4] Vaswani A, Shazeer N, Parmar N, et al. Attention Is All You Need. arXiv preprint, 2017, arXiv:1706.03762. https://arxiv.org/abs/1706.03762

[5] DeepSeek Team. DeepSeek Technology Panorama Analysis (Part II): MoE Architecture Innovation - How to Break Through the Performance Ceiling of Large Models with "Refined Division of Labor". Weixin Articles, 2023.

[6] Shazeer N, Mirhoseini A, Maziarz K, et al. Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer. arXiv preprint, 2017, arXiv:1701.06538. https://arxiv.org/abs/1701.06538

[7] DeepSeek-AI. DeepSeek-V3 Technical Report. arXiv preprint, 2024, arXiv:2412.19437. https://arxiv.org/abs/2412.19437

[8] Dai AM, et al. Mixture-of-Experts Copula Models for Multivariate Financial Risk Analysis. arXiv preprint, 2023, arXiv:2307.16432. https://arxiv.org/abs/2307.16432 .

[9] Mallat S. A Wavelet Tour of Signal Processing: The Sparse Way (3rd ed.). Springer, 2009. https://link.springer.com/book/10.1007/978-0-387-21656-7

[10] Zhang J, Zhou H, Li H, et al. LSTM-Transformer for Multivariate Time Series Forecasting. arXiv preprint, 2021, arXiv:2106.00263. https://arxiv.org/abs/2106.00263

[11] Yu F, Koltun V. Multi-Scale Context Aggregation by Dilated Convolutions. arXiv preprint, 2015, arXiv:1511.07122. https://arxiv.org/abs/1511.07122

Published

2025-03-12

Section

Research Article

How to Cite

Liao, H. (2025). Performance optimization of DeepSeek MoE architecture in multi-scale prediction of stock returns. Eurasia Journal of Science and Technology, 3(2), 1-9. https://doi.org/10.61784/wjit3026