HYBRID FEATURE-ENHANCED 4D GAUSSIAN SPLATTING FOR DYNAMIC SCENE RECONSTRUCTION

Authors

  • Sha Li, School of Management, University of Shanghai for Science and Technology, Shanghai 200082, China.
  • WanXiang Qin (Corresponding Author), College of Arts and Design, Yulin Normal University, Yulin 537000, Guangxi, China.

Keywords

4D Gaussian splatting, Hybrid feature enhancement, Dynamic scene reconstruction

Abstract

Real-time modeling of dynamic scenes is a pivotal challenge in computer vision and graphics. Methods employing canonical space deformation with 3D Gaussians have achieved compelling speed, but a fundamental limitation persists: their feature representation often fails to capture the complex interplay of spatial, temporal, and multi-scale information in dynamic settings. This paper introduces a hybrid feature enhancement framework that systematically addresses this core issue. Our key idea is to forge a powerful and adaptive feature representation through the synergistic co-design of three modules: a Spatial Relation Module that explicitly encodes geometric context, a Dynamic Feature Adapter that employs gating for temporal conditioning, and a Multi-scale Integration Module that dynamically aggregates features across scales. The primary contribution of our work is this unified architecture designed to robustly enhance the feature backbone of deformation-based dynamic Gaussian representations. Extensive experiments on major benchmarks, including D-NeRF and HyperNeRF, demonstrate that our framework consistently elevates reconstruction quality, achieving superior performance over state-of-the-art methods on key metrics like PSNR and SSIM, thereby validating its general effectiveness.
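The three modules described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; every function name, shape, and operation here is a hypothetical stand-in: spatial context via k-nearest-neighbor feature aggregation, temporal conditioning via a sigmoid gate driven by a toy time embedding, and multi-scale fusion via a softmax-weighted average of features pooled at several group sizes.

```python
import numpy as np

def spatial_relation(points, feats, k=4):
    """Blend each Gaussian's feature with its k nearest neighbors
    (a stand-in for the Spatial Relation Module's geometric context)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest, excluding self
    return 0.5 * (feats + feats[idx].mean(axis=1))   # self + neighborhood context

def dynamic_adapter(feats, t, W_t):
    """Gate features on a time embedding (Dynamic Feature Adapter sketch)."""
    t_emb = np.array([np.sin(t), np.cos(t)])         # toy temporal encoding
    gate = 1.0 / (1.0 + np.exp(-(W_t @ t_emb)))      # sigmoid gate in (0, 1)
    return feats * gate                              # broadcast over all Gaussians

def multiscale_fuse(feats, scales=(1, 2, 4), logits=None):
    """Softmax-weighted average of features pooled at several group sizes.
    Assumes the number of Gaussians is divisible by every scale."""
    pooled = []
    for s in scales:
        p = feats.reshape(-1, s, feats.shape[1]).mean(axis=1)  # pool groups of s
        pooled.append(np.repeat(p, s, axis=0))                 # back to full length
    logits = np.zeros(len(scales)) if logits is None else np.asarray(logits)
    w = np.exp(logits) / np.exp(logits).sum()                  # softmax scale weights
    return sum(wi * pi for wi, pi in zip(w, pooled))

# Toy usage: 8 Gaussians with 16-dim features at time t = 0.3.
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))
feats = rng.normal(size=(8, 16))
W_t = rng.normal(size=(16, 2))
enhanced = multiscale_fuse(dynamic_adapter(spatial_relation(points, feats), 0.3, W_t))
print(enhanced.shape)  # (8, 16): enhanced features, same shape as the input
```

In the actual framework these operations would be learned network layers feeding the deformation field; the sketch only shows how the three stages compose into one feature pipeline.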

References

[1] Wu Guanjun, Yi Taoran, Fang Jiemin, et al. 4D gaussian splatting for real-time dynamic scene rendering. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, 2024, 20310-20320. DOI: 10.1109/CVPR52733.2024.01920.

[2] Yang Ziyi, Gao Xinyu, Zhou Wen, et al. Deformable 3D gaussians for high-fidelity monocular dynamic scene reconstruction. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, 2024, 20331-20341. DOI: 10.1109/CVPR52733.2024.01922.

[3] Bae Jeongmin, Kim Seoha, Yun Youngsik, et al. Per-gaussian embedding-based deformation for deformable 3D gaussian splatting. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024, 15073. DOI: 10.1007/978-3-031-72633-0_18.

[4] Luiten Jonathon, Kopanas Georgios, Leibe Bastian, et al. Dynamic 3D gaussians: Tracking by persistent dynamic view synthesis. 2024 International Conference on 3D Vision (3DV), Davos, Switzerland, 2024, 800-809. DOI: 10.1109/3DV62453.2024.00044.

[5] Yang Zeyu, Yang Hongye, Pan Zijie, et al. Real-time photorealistic dynamic scene representation and rendering with 4D gaussian splatting. arXiv preprint. 2023. DOI: 10.48550/arXiv.2310.10642.

[6] Kumar Ashish, Rajagopalan A N. DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes. Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, 21728-21738.

[7] Coomans Arno, Dominici Edoardo A, Döring Christian, et al. Real-time neural rendering of dynamic light fields. Computer Graphics Forum, 2024, 43(6).

[8] Wang Jiaxu, Xu Bo, Cheng Hao, et al. DONE: Dynamic Neural Representation Via Hyperplane Neural ODE. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, 4355-4359. DOI: 10.1109/ICASSP48485.2024.10446247.

[9] Zhang Boyu, Zhu Zheng, Xu Wenbo. DetRF: Detachable Novel Views Synthesis of Dynamic Scenes Using Backdrop-Driven Neural Radiance Fields. Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(9): 9860-9868. DOI: 10.1609/aaai.v39i9.33069.

[10] Liu Lingjie, Gu Jiatao, Lin Kyaw Zaw, et al. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 2020, 33: 15651-15663.

[11] Kim Seoha, Bae Jeongmin, Yun Youngsik, et al. Sync-nerf: Generalizing dynamic nerfs to unsynchronized videos. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(3): 2777-2785.

[12] Park Keunhong, Sinha Utkarsh, Hedman Peter, et al. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics (TOG), 2021, 40(6): 1-12.

[13] Fan Haoqi, Xiong Bo, Mangalam Karttikeya, et al. Multiscale vision transformers. Proceedings of the IEEE/CVF international conference on computer vision, (ICCV), Montreal, QC, Canada, 2021, 6804-6815. DOI: 10.1109/ICCV48922.2021.00675.

[14] Xia Chunlong, Wang Xinliang, Lv Feng, et al. ViT-CoMer: Vision transformer with convolutional multi-scale feature interaction for dense predictions. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024.

[15] Ke Junjie, Wang Qifei, Wang Yilin, et al. MUSIQ: Multi-scale image quality transformer. Proceedings of the IEEE/CVF international conference on computer vision (ICCV), Montreal, QC, Canada, 2021, 5493-5502. DOI: 10.1109/ICCV48922.2021.00510.

[16] Zhou Xin, Liang Dingkang, Xu Wei, et al. Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2024, 14707-14717. DOI: 10.1109/CVPR52733.2024.01393.

[17] Kaur Gagandeep, Sharma Amit. A deep learning-based model using hybrid feature extraction approach for consumer sentiment analysis. Journal of Big Data, 2023, 10. DOI: 10.1186/s40537-022-00680-6.

[18] Zhu Tianyu, Hiller Markus, Ehsanpour Mahsa, et al. Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE transactions on pattern analysis and machine intelligence, 2023, 45(11): 12783-12797. DOI: 10.1109/TPAMI.2022.3213073.

[19] Luo Yingtao, Liu Qiang, Liu Zhaocheng. STAN: Spatio-temporal attention network for next location recommendation. Proceedings of the web conference 2021, 2021, 2177-2185. DOI: 10.1145/3442381.3449998.

[20] Zeng Yifei, Jiang Yanqin, Zhu Siyu, et al. STAG4D: Spatial-temporal anchored generative 4D gaussians. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024, 15094. DOI: 10.1007/978-3-031-72764-1_10.

[21] Lee Dong In, Park Hyeongcheol, Seo Jiyoung, et al. EditSplat: Multi-view fusion and attention-guided optimization for view-consistent 3D scene editing with 3D gaussian splatting. Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.

[22] Schönberger J L, Frahm J M. Structure-from-motion revisited. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 2016, 4104-4113. DOI: 10.1109/CVPR.2016.445.

[23] Kerbl B, Kopanas G, Leimkühler T, et al. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 2023, 42(4): 1-14.


Published

2026-03-24

How to Cite

Sha Li, WanXiang Qin. Hybrid Feature-Enhanced 4D Gaussian Splatting For Dynamic Scene Reconstruction. Eurasia Journal of Science and Technology. 2026, 8(1): 70-77. DOI: https://doi.org/10.61784/ejst3136.