IMPROVING SMALL FIRE TARGET DETECTION IN UAV IMAGERY: AN ENHANCED RT-DETR WITH MULTI-SCALE FUSION AND EXPERT ROUTING
Keywords:
Fire detection, Real-time object detection, RT-DETR, Adaptive Spatial Feature Fusion (ASFF), Mixture-of-Experts (MoE)
Abstract
Early fire detection is critical for forest fire prevention, yet traditional monitoring methods (e.g., satellites and ground-based stations) suffer from poor real-time performance or limited coverage. Unmanned aerial vehicles (UAVs) equipped with computer vision offer a promising alternative, but complex backgrounds, small flame and smoke targets, and varying illumination and weather conditions make accurate recognition challenging. In this work, we enhance the real-time detection Transformer RT-DETR by designing a hybrid encoder architecture tailored to UAV fire imagery. Key improvements include: integration of an Adaptive Spatial Feature Fusion (ASFF) module to reconcile inconsistencies among multi-scale features; incorporation of Efficient Channel Attention (ECA) to strengthen channel-wise representations; replacement of the Transformer's fully connected feed-forward network with a Gated Mixture-of-Experts (MoE) structure to boost model capacity; and a multi-layer Transformer feature aggregation strategy. We evaluate the improved model on a UAV smoke and fire dataset. Results show a clear improvement in both detection accuracy and recall: at an IoU threshold of 0.5, the enhanced RT-DETR achieves over 88.8% mAP, a gain of approximately 2% over the original RT-DETR, and outperforms YOLO-series baselines. Ablation studies confirm that ASFF fusion, the multi-attention mechanisms, and the MoE architecture each contribute meaningfully to small-target fire detection. Crucially, these advances incur negligible additional inference latency, enabling real-time intelligent monitoring in wildland fire scenarios.
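
To make the evaluation criterion concrete, the following minimal sketch shows how a predicted box is typically matched to a ground-truth box at an IoU threshold of 0.5, the threshold used for the reported mAP. The corner box format (x1, y1, x2, y2), the function names, and the example coordinates are illustrative assumptions and are not taken from the paper's evaluation code.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped to zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A prediction counts as a match when its IoU reaches the threshold."""
    return iou(pred_box, gt_box) >= threshold


# Hypothetical 20x20 px ground-truth flame box and two predictions (assumed values).
gt = (10, 10, 30, 30)
close_pred = (12, 12, 32, 32)  # shifted by 2 px in each direction
far_pred = (20, 20, 40, 40)    # shifted by 10 px in each direction

print(round(iou(gt, close_pred), 3), is_true_positive(close_pred, gt))  # 0.681 True
print(round(iou(gt, far_pred), 3), is_true_positive(far_pred, gt))      # 0.143 False
```

For a box this small, a 10-pixel offset already drops the IoU well below 0.5, which illustrates why localization of small flame and smoke targets weighs heavily on mAP at this threshold.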