HYBRID FEATURE-ENHANCED 4D GAUSSIAN SPLATTING FOR DYNAMIC SCENE RECONSTRUCTION

Authors

  • Sha Li, School of Management, University of Shanghai for Science and Technology, Shanghai 200082, China.
  • WanXiang Qin (Corresponding Author), College of Arts and Design, Yulin Normal University, Yulin 537000, Guangxi, China.

Keywords

4D Gaussian splatting, Hybrid feature enhancement, Dynamic scene reconstruction

Abstract

Real-time modeling of dynamic scenes is a pivotal challenge in computer vision and graphics. Methods employing canonical space deformation with 3D Gaussians have achieved compelling speed, but a fundamental limitation persists: their feature representation often fails to capture the complex interplay of spatial, temporal, and multi-scale information in dynamic settings. This paper introduces a hybrid feature enhancement framework that systematically addresses this core issue. Our key idea is to forge a powerful and adaptive feature representation through the synergistic co-design of three modules: a Spatial Relation Module that explicitly encodes geometric context, a Dynamic Feature Adapter that employs gating for temporal conditioning, and a Multi-scale Integration Module that dynamically aggregates features across scales. The primary contribution of our work is this unified architecture designed to robustly enhance the feature backbone of deformation-based dynamic Gaussian representations. Extensive experiments on major benchmarks, including D-NeRF and HyperNeRF, demonstrate that our framework consistently elevates reconstruction quality, achieving superior performance over state-of-the-art methods on key metrics like PSNR and SSIM, thereby validating its general effectiveness.
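The three modules described above can be illustrated with a minimal NumPy sketch. This is not the authors' implementation; every function name, shape, and operation here is a hypothetical stand-in: spatial context via k-nearest-neighbor feature aggregation, temporal conditioning via a sigmoid gate driven by a toy time embedding, and multi-scale fusion via a softmax-weighted average of features pooled at several group sizes.

```python
import numpy as np

def spatial_relation(points, feats, k=4):
    """Blend each Gaussian's feature with its k nearest neighbors
    (a stand-in for the Spatial Relation Module's geometric context)."""
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    idx = np.argsort(d, axis=1)[:, 1:k + 1]          # k nearest, excluding self
    return 0.5 * (feats + feats[idx].mean(axis=1))   # self + neighborhood context

def dynamic_adapter(feats, t, W_t):
    """Gate features on a time embedding (Dynamic Feature Adapter sketch)."""
    t_emb = np.array([np.sin(t), np.cos(t)])         # toy temporal encoding
    gate = 1.0 / (1.0 + np.exp(-(W_t @ t_emb)))      # sigmoid gate in (0, 1)
    return feats * gate                              # broadcast over all Gaussians

def multiscale_fuse(feats, scales=(1, 2, 4), logits=None):
    """Softmax-weighted average of features pooled at several group sizes.
    Assumes the number of Gaussians is divisible by every scale."""
    pooled = []
    for s in scales:
        p = feats.reshape(-1, s, feats.shape[1]).mean(axis=1)  # pool groups of s
        pooled.append(np.repeat(p, s, axis=0))                 # back to full length
    logits = np.zeros(len(scales)) if logits is None else np.asarray(logits)
    w = np.exp(logits) / np.exp(logits).sum()                  # softmax scale weights
    return sum(wi * pi for wi, pi in zip(w, pooled))

# Toy usage: 8 Gaussians with 16-dim features at time t = 0.3.
rng = np.random.default_rng(0)
points = rng.normal(size=(8, 3))
feats = rng.normal(size=(8, 16))
W_t = rng.normal(size=(16, 2))
enhanced = multiscale_fuse(dynamic_adapter(spatial_relation(points, feats), 0.3, W_t))
print(enhanced.shape)  # (8, 16): enhanced features, same shape as the input
```

In the actual framework these operations would be learned network layers feeding the deformation field; the sketch only shows how the three stages compose into one feature pipeline.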

References

[1] Wu Guanjun, Yi Taoran, Fang Jiemin, et al. 4D gaussian splatting for real-time dynamic scene rendering. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, 2024, 20310-20320. DOI: 10.1109/CVPR52733.2024.01920.

[2] Yang Ziyi, Gao Xinyu, Zhou Wen, et al. Deformable 3D gaussians for high-fidelity monocular dynamic scene reconstruction. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Seattle, WA, USA, 2024, 20331-20341. DOI: 10.1109/CVPR52733.2024.01922.

[3] Bae Jeongmin, Kim Seoha, Yun Youngsik, et al. Per-gaussian embedding-based deformation for deformable 3D gaussian splatting. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024, 15073. DOI: 10.1007/978-3-031-72633-0_18.

[4] Luiten Jonathon, Kopanas Georgios, Leibe Bastian, et al. Dynamic 3D gaussians: Tracking by persistent dynamic view synthesis. 2024 International Conference on 3D Vision (3DV), Davos, Switzerland, 2024, 800-809. DOI: 10.1109/3DV62453.2024.00044.

[5] Yang Zeyu, Yang Hongye, Pan Zijie, et al. Real-time photorealistic dynamic scene representation and rendering with 4D gaussian splatting. arXiv preprint. 2023. DOI: 10.48550/arXiv.2310.10642.

[6] Kumar Ashish, Rajagopalan A N. DynaMoDe-NeRF: Motion-aware Deblurring Neural Radiance Field for Dynamic Scenes. Proceedings of the Computer Vision and Pattern Recognition Conference. 2025, 21728-21738.

[7] Coomans Arno, Dominici Edoardo A, Döring Christian, et al. Real-time neural rendering of dynamic light fields. Computer Graphics Forum, 2024, 43(6).

[8] Wang Jiaxu, Xu Bo, Cheng Hao, et al. DONE: Dynamic Neural Representation Via Hyperplane Neural ODE. ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Korea, Republic of, 2024, 4355-4359. DOI: 10.1109/ICASSP48485.2024.10446247.

[9] Zhang Boyu, Zhu Zheng, Xu Wenbo. DetRF: Detachable Novel Views Synthesis of Dynamic Scenes Using Backdrop-Driven Neural Radiance Fields. Proceedings of the AAAI Conference on Artificial Intelligence. 2025, 39(9): 9860-9868. DOI: 10.1609/aaai.v39i9.33069.

[10] Liu Lingjie, Gu Jiatao, Lin Kyaw Zaw, et al. Neural sparse voxel fields. Advances in Neural Information Processing Systems, 2020, 33: 15651-15663.

[11] Kim Seoha, Bae Jeongmin, Yun Youngsik, et al. Sync-nerf: Generalizing dynamic nerfs to unsynchronized videos. Proceedings of the AAAI Conference on Artificial Intelligence, 2024, 38(3): 2777-2785.

[12] Park Keunhong, Sinha Utkarsh, Hedman Peter, et al. Hypernerf: A higher-dimensional representation for topologically varying neural radiance fields. ACM Transactions on Graphics (TOG), 2021, 40(6): 1-12.

[13] Fan Haoqi, Xiong Bo, Mangalam Karttikeya, et al. Multiscale vision transformers. Proceedings of the IEEE/CVF international conference on computer vision, (ICCV), Montreal, QC, Canada, 2021, 6804-6815. DOI: 10.1109/ICCV48922.2021.00675.

[14] Xia Chunlong, Wang Xinliang, Lv Feng, et al. ViT-CoMer: Vision transformer with convolutional multi-scale feature interaction for dense predictions. Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. 2024.

[15] Ke Junjie, Wang Qifei, Wang Yilin, et al. MUSIQ: Multi-scale image quality transformer. Proceedings of the IEEE/CVF international conference on computer vision (ICCV), Montreal, QC, Canada, 2021, 5493-5502. DOI: 10.1109/ICCV48922.2021.00510.

[16] Zhou Xin, Liang Dingkang, Xu Wei, et al. Dynamic adapter meets prompt tuning: Parameter-efficient transfer learning for point cloud analysis. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 2024, 14707-14717. DOI: 10.1109/CVPR52733.2024.01393.

[17] Kaur Gagandeep, Sharma Amit. A deep learning-based model using hybrid feature extraction approach for consumer sentiment analysis. Journal of Big Data, 2023, 10. DOI: 10.1186/s40537-022-00680-6.

[18] Zhu Tianyu, Hiller Markus, Ehsanpour Mahsa, et al. Looking beyond two frames: End-to-end multi-object tracking using spatial and temporal transformers. IEEE transactions on pattern analysis and machine intelligence, 2023, 45(11): 12783-12797. DOI: 10.1109/TPAMI.2022.3213073.

[19] Luo Yingtao, Liu Qiang, Liu Zhaocheng. STAN: Spatio-temporal attention network for next location recommendation. Proceedings of the web conference 2021, 2021, 2177-2185. DOI: 10.1145/3442381.3449998.

[20] Zeng Yifei, Jiang Yanqin, Zhu Siyu, et al. STAG4D: Spatial-temporal anchored generative 4D gaussians. European Conference on Computer Vision. Cham: Springer Nature Switzerland, 2024, 15094. DOI: 10.1007/978-3-031-72764-1_10.

[21] Lee Dong In, Park Hyeongcheol, Seo Jiyoung, et al. EditSplat: Multi-view fusion and attention-guided optimization for view-consistent 3D scene editing with 3D gaussian splatting. Proceedings of the Computer Vision and Pattern Recognition Conference, 2025.

[22] Schönberger J L, Frahm J M. Structure-from-motion revisited. Proceedings of the IEEE conference on computer vision and pattern recognition (CVPR), Las Vegas, NV, USA, 2016, 4104-4113. DOI: 10.1109/CVPR.2016.445.

[23] Kerbl B, Kopanas G, Leimkühler T, et al. 3D Gaussian Splatting for Real-Time Radiance Field Rendering. ACM Transactions on Graphics, 2023, 42(4): 1-14.


Published

2026-03-24

How to Cite

Sha Li, WanXiang Qin. Hybrid Feature-Enhanced 4D Gaussian Splatting For Dynamic Scene Reconstruction. Eurasia Journal of Science and Technology. 2026, 8(1): 70-77. DOI: https://doi.org/10.61784/ejst3136.