A MULTI-LABEL IMAGE RECOGNITION NETWORK DRIVEN BY LABEL-IMAGE SEMANTIC ALIGNMENT
Keywords:
Multi-label image recognition, Graph convolutional network, Semantic decoupling, Multi-head self-attention, Label correlation
Abstract
Multi-label image recognition aims to predict the set of semantic labels present in an image and has wide applications in many fields. Existing methods suffer from two main problems: (1) the attention regions generated by attention-based methods are only weakly correlated with label semantics; (2) methods based on label correlation modeling lack dynamic interaction with the visual content of the image, which makes it difficult to align label semantics precisely with image regions. To address these issues, this paper proposes a novel multi-label image recognition algorithm. First, we construct a label graph and use a graph convolutional network (GCN) to learn label semantic priors that model the dependencies among labels. Second, we design a semantic decoupling module that, guided by label semantics, adaptively attends to the relevant image regions and generates a semantic representation for each label. Finally, we introduce a semantic association reasoning module that uses multi-head self-attention to dynamically capture semantic correlations among labels, which makes the features more discriminative. Experiments on the PASCAL VOC 2007 dataset show that our method achieves 95.0% mAP, surpassing existing state-of-the-art methods and improving on the SSGRL baseline by 1.6 percentage points. Ablation studies further validate the effectiveness of each module.
References
[1] Xu Jiahao, Tian Hongda, Wang Zhiyong, et al. Joint input and output space learning for multi-label image classification. IEEE Transactions on Multimedia, 2020, 23: 1696-1707.
[2] Wu Yanan, Feng Songhe, Wang Yang. Semantic-aware graph matching mechanism for multi-label image recognition. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 33(11): 6788-6803.
[3] Hu Yunqing, Chen Qianglong, Zhang Yin. Semantic perception enhancement region pyramid model for multi-label image recognition. Journal of Computer-Aided Design & Graphics, 2025, 37(10): 1770-1786.
[4] Gao Bin-Bin, Zhou Hongyu. Learning to discover multi-class attentional regions for multi-label image recognition. IEEE Transactions on Image Processing, 2021, 30: 5920-5932.
[5] Chen Jiale, Feng Xu, Tao Zeng, et al. MSFA: Multi-stage feature aggregation network for multi-label image recognition. IET Image Processing, 2024, 18(7): 1862-1877.
[6] Li Liang, Wang Shuhui, Jiang Shuqiang, et al. Attentive recurrent neural network for weak-supervised multi-label image classification. In Proceedings of the 26th ACM International Conference on Multimedia, 2018: 1092-1100.
[7] Zhou Wei, Xia Zhiwu, Dou Peng, et al. Aligning image semantics and label concepts for image multi-label classification. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(2): 1-23.
[8] Ye Qingwen, Zhang Qiuju. Multi-label image recognition using channel pixel attention. Computer Science and Exploration, 2024, 18(8): 2109-2117.
[9] Zhou Wei, Zheng Zhijie, Su Tao, et al. DATran: Dual attention transformer for multi-label image classification. IEEE Transactions on Circuits and Systems for Video Technology, 2023, 34(1): 342-356.
[10] Chen Tianshui, Xu Muxin, Hui Xiaolu, et al. Learning semantic-specific graph representation for multi-label image recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision, 2019: 522-531.
[11] Wang Xuesong, Rong Xiaolong, Cheng Yuhu, et al. Multi-label image recognition based on adaptive multi-scale graph convolutional network. Control and Decision, 2022, 37(7): 1737-1744. DOI: 10.13195/j.kzyjc.2021.0179.
[12] Yuan Jin, Chen Shikai, Zhang Yao, et al. Graph attention transformer network for multi-label image classification. ACM Transactions on Multimedia Computing, Communications and Applications, 2023, 19(4): 1-16.
[13] Qu Xiwen, Che Hao, Huang Jun, et al. Multi-layered semantic representation network for multi-label image classification. International Journal of Machine Learning and Cybernetics, 2023, 14(10): 3427-3435.
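To make the three-stage pipeline summarized in the abstract more concrete, the following is a minimal NumPy sketch of the data flow only, not the authors' implementation. The abstract gives no layer equations, so every concrete choice here is an assumption: the label graph is a thresholded, symmetric co-occurrence matrix (a random stand-in here), the GCN uses the common symmetric normalization D^-1/2 (A + I) D^-1/2, the semantic decoupling module is written as scaled dot-product attention from each label prior over spatial positions (the SSGRL baseline computes its attention differently), and the semantic association reasoning module is standard multi-head self-attention with a residual connection. All function names, parameter names, dimensions, and the random weights are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed; all weights below are random placeholders


def normalize_adjacency(A):
    """Symmetric GCN normalization D^-1/2 (A + I) D^-1/2 (assumed, not from the paper)."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_label_priors(E, A, W1, W2):
    """Two GCN layers over the label graph -> one semantic prior vector per label."""
    A_norm = normalize_adjacency(A)
    H = np.maximum(A_norm @ E @ W1, 0.0)  # ReLU
    return A_norm @ H @ W2  # (C, d)


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def semantic_decoupling(F, S, Wq, Wk):
    """Hypothetical label-guided attention: each label prior attends over positions.

    F: (N, d) flattened image feature map; S: (C, d) label priors.
    Returns label-specific features (C, d) and spatial attention maps (C, N).
    """
    d = F.shape[1]
    scores = (S @ Wq) @ (F @ Wk).T / np.sqrt(d)  # (C, N)
    attn = softmax(scores, axis=1)  # each label's weights sum to 1 over positions
    return attn @ F, attn


def multi_head_self_attention(X, Wq, Wk, Wv, Wo, num_heads):
    """Standard multi-head self-attention among the C label features."""
    C, d = X.shape
    dh = d // num_heads

    def split(M):  # (C, d) -> (heads, C, dh)
        return M.reshape(C, num_heads, dh).transpose(1, 0, 2)

    Q, K, V = split(X @ Wq), split(X @ Wk), split(X @ Wv)
    attn = softmax(Q @ K.transpose(0, 2, 1) / np.sqrt(dh), axis=-1)  # (heads, C, C)
    out = (attn @ V).transpose(1, 0, 2).reshape(C, d)  # concatenate heads
    return X + out @ Wo  # residual connection


def predict(F, E, A, p, num_heads=4):
    """Hypothetical end-to-end forward pass: priors -> decoupling -> reasoning -> scores."""
    S = gcn_label_priors(E, A, p["W1"], p["W2"])
    X, spatial_attn = semantic_decoupling(F, S, p["Wq_sd"], p["Wk_sd"])
    Z = multi_head_self_attention(X, p["Wq"], p["Wk"], p["Wv"], p["Wo"], num_heads)
    logits = (Z * p["Wc"]).sum(axis=1) + p["bc"]  # one linear classifier per label
    return 1.0 / (1.0 + np.exp(-logits)), spatial_attn  # independent sigmoid per label


# Hypothetical sizes: 20 labels (as in PASCAL VOC), 300-d word embeddings,
# 64-d features, and a 7x7 feature map flattened to N = 49 positions.
C, d_w, d, N = 20, 300, 64, 49
E = rng.normal(size=(C, d_w))  # stand-in for label word embeddings
co = rng.random((C, C))  # stand-in for label co-occurrence statistics
A = ((co + co.T) / 2 > 0.7).astype(float)  # symmetric, thresholded label graph
np.fill_diagonal(A, 0.0)
F = rng.normal(size=(N, d))  # stand-in for backbone features of one image


def init(*shape):
    return rng.normal(scale=1.0 / np.sqrt(shape[0]), size=shape)


params = {
    "W1": init(d_w, d), "W2": init(d, d),
    "Wq_sd": init(d, d), "Wk_sd": init(d, d),
    "Wq": init(d, d), "Wk": init(d, d), "Wv": init(d, d), "Wo": init(d, d),
    "Wc": init(C, d), "bc": np.zeros(C),
}
probs, spatial_attn = predict(F, E, A, params)
```

In this sketch, `probs[c]` is the (untrained) score for label c and `spatial_attn[c]` is the attention map of label c over the 49 positions. Keeping the image-region attention (per label) separate from the label-to-label attention (across labels) mirrors the split between the semantic decoupling and semantic association reasoning modules described in the abstract.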