MONOCULAR 3D BINAURAL LOCALIZATION AND DYNAMIC TRACKING FOR EAR-SIDE ACTIVE NOISE CONTROL IN VEHICLE CABINS
Keywords:
Monocular vision, Binaural localization, Ear tracking, Cabin perception, Active noise control, Intelligent cockpit
Abstract
Accurate three-dimensional ear-position tracking is a prerequisite for ear-side active noise control in vehicle cabins, because natural head motion can shift the control target region and degrade the spatial consistency between the actual ears and the modeled control points. To address this problem, this study proposes a monocular-vision method for binaural three-dimensional localization and dynamic tracking in cabin scenes. Camera calibration and geometric modeling first establish a unified mapping among the pixel, camera, and cabin coordinate systems. Facial landmark detection then infers the two-dimensional locations of the left and right ears, and monocular depth estimation constrained by cabin features recovers the spatially aligned depth of the ear region. The binaural three-dimensional coordinates are further refined through head-pose compensation, temporal filtering, and short-term prediction, so that the final output can be used directly as a control-oriented ear-state sequence. Experiments under varied illumination, occlusion, and head-pose conditions show that the proposed method maintains localization accuracy and trajectory continuity in all six tested cases: the root-mean-square error of binaural three-dimensional localization ranges from 18.40 to 25.57 mm, and the mean interaural distance error ranges from 0.37 to 3.33 mm. Even under adverse conditions such as weak illumination and partial occlusion, the processed ear-state output remains continuously available to the downstream active noise control interface. These results indicate that the proposed method provides a low-cost, practically deployable solution for monocular binaural ear tracking in vehicle cabins and can serve as an effective perception front end for ear-side active noise control.
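The pixel-to-cabin mapping and the two reported error measures can be illustrated with a minimal sketch. It assumes an ideal pinhole camera without lens distortion, a rigid camera-to-cabin transform, and a root-mean-square error taken over per-frame Euclidean distances; the study's exact formulation is not given here. All function names, the intrinsics `fx`, `fy`, `cx`, `cy`, the extrinsics `R`, `t`, and the numeric values are hypothetical and chosen only for illustration.

```python
import math

def pixel_to_camera(u, v, depth, fx, fy, cx, cy):
    """Back-project pixel (u, v) with depth Z (mm) into camera coordinates.

    Assumes an ideal pinhole model (no lens distortion).
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

def camera_to_cabin(p_cam, R, t):
    """Rigid transform p_cabin = R * p_cam + t, with R a 3x3 nested list."""
    return tuple(
        sum(R[i][j] * p_cam[j] for j in range(3)) + t[i] for i in range(3)
    )

def interaural_distance(left, right):
    """Euclidean distance between the left- and right-ear 3D positions (mm)."""
    return math.dist(left, right)

def rmse_3d(estimates, references):
    """RMSE over per-frame Euclidean errors (an assumed definition)."""
    squared = [math.dist(e, r) ** 2 for e, r in zip(estimates, references)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical intrinsics and extrinsics, for illustration only.
FX, FY, CX, CY = 800.0, 800.0, 320.0, 240.0
R_IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
T_OFFSET = (100.0, 200.0, -50.0)

# Hypothetical ear pixels, both at an estimated depth of 600 mm.
left_cam = pixel_to_camera(400, 240, 600.0, FX, FY, CX, CY)
right_cam = pixel_to_camera(240, 240, 600.0, FX, FY, CX, CY)
left_cabin = camera_to_cabin(left_cam, R_IDENTITY, T_OFFSET)
right_cabin = camera_to_cabin(right_cam, R_IDENTITY, T_OFFSET)
ear_spacing = interaural_distance(left_cabin, right_cabin)
```

Because the interaural distance of a given occupant is fixed, comparing `ear_spacing` against a measured value gives a consistency check that does not depend on the cabin reference frame, which is one plausible reason to report it alongside the localization error.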