FACE RECOGNITION MODEL BASED ON VISION TRANSFORMER

Volume 7, Issue 5, Pp 51-58, 2025

DOI: https://doi.org/10.61784/jcsee3077

Author(s)

JiaChen Gao

Affiliation(s)

School of Artificial Intelligence, China University of Mining and Technology-Beijing, Beijing 100083, China.

Corresponding Author

JiaChen Gao

ABSTRACT

Face recognition for workplace attendance has attracted significant attention because it records attendance accurately and efficiently and improves enterprise management. However, existing methods have several limitations: they are vulnerable to interference in complex environments, lack robustness, incur high computational cost, and defend poorly against security attacks. To address these challenges, this study proposes a three-stage approach. First, a Multi-Task Cascaded Convolutional Neural Network (MTCNN) rapidly detects facial landmarks and aligns the face, providing standardized inputs for subsequent processing. Second, a Vision Transformer (ViT) module extracts global features through its self-attention mechanism, which gives it strong global modeling capability. Finally, a Softmax module performs classification by computing class probabilities and generating the recognition result; during training, the same module guides feature learning. Together, these components improve the accuracy, efficiency, and robustness of face recognition in attendance scenarios under complex conditions.
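The data flow described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the MTCNN stage has already produced an aligned grayscale face crop, replaces the learned ViT projections with an identity mapping (Q = K = V), and uses random stand-in weights for a hypothetical three-identity classifier. All function names, the 8 x 8 crop size, and the patch size are assumptions chosen for illustration.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def split_into_patches(image, patch):
    """Cut an H x W grayscale image (list of rows) into flattened patch vectors."""
    height, width = len(image), len(image[0])
    patches = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def self_attention(tokens):
    """Single-head scaled dot-product attention with Q = K = V = tokens.

    A real ViT learns separate projections for Q, K and V; using the raw
    tokens is a simplification that keeps the sketch short.
    """
    scale = math.sqrt(len(tokens[0]))
    attended = []
    for query in tokens:
        # Every patch attends to every other patch: this is the global modeling step.
        weights = softmax([sum(q * k for q, k in zip(query, key)) / scale
                           for key in tokens])
        attended.append([sum(w * tok[d] for w, tok in zip(weights, tokens))
                         for d in range(len(query))])
    return attended


def classify(face, class_weights, patch=4):
    """Aligned face crop -> patch tokens -> attention -> pooled feature -> probabilities."""
    tokens = self_attention(split_into_patches(face, patch))
    # Mean-pool the attended tokens into one feature vector (an assumption;
    # ViT models commonly read out a dedicated class token instead).
    feature = [sum(tok[d] for tok in tokens) / len(tokens)
               for d in range(len(tokens[0]))]
    logits = [sum(w * f for w, f in zip(row, feature)) for row in class_weights]
    return softmax(logits)


random.seed(0)
# Stand-in for an aligned 8 x 8 face crop that the MTCNN stage would supply.
face = [[random.random() for _ in range(8)] for _ in range(8)]
# Random weights for three hypothetical identities; a trained model would learn these.
class_weights = [[random.gauss(0, 0.1) for _ in range(16)] for _ in range(3)]
probs = classify(face, class_weights)
predicted_identity = probs.index(max(probs))
```

Here `probs` holds one probability per enrolled identity and sums to 1, and `predicted_identity` is the index of the most likely one. In training, the cross-entropy of these probabilities against the true identity would supply the gradient signal that, in the abstract's words, guides feature learning.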

KEYWORDS

MTCNN; Vision transformer; Softmax; Face recognition

CITE THIS PAPER

JiaChen Gao. Face recognition model based on vision transformer. Journal of Computer Science and Electrical Engineering. 2025, 7(5): 51-58. DOI: https://doi.org/10.61784/jcsee3077.
