A sound source localization method based on multi-scale cross-STFT complex-valued convolutional neural network
Abstract
Purpose
This paper addresses a limitation of current deep learning algorithms for sound source localization (SSL): they typically rely on a single feature at a single frequency scale and do not integrate multi-scale information. The proposed method improves localization accuracy by exploiting the spatial information and spectral diversity provided by microphone arrays.
Design/methodology/approach
The method is based on a multi-scale cross-short-time Fourier transform (STFT) complex-valued convolutional neural network (CCNN). It uses cross-STFT spectra at different scales to capture detailed acoustic information across various frequencies. The effectiveness of the algorithm was validated through both simulations and experimental studies.
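As a rough illustration of what multi-scale cross-STFT features might look like, the sketch below computes the cross-spectrum of two microphone channels at several STFT window lengths. It is a minimal sketch under stated assumptions, not the authors' implementation: the abstract does not define the cross-STFT, so it is assumed here to be the product of one channel's STFT with the complex conjugate of the other's. The function names (stft, multi_scale_cross_stft), the Hann window, the 50% overlap and the window lengths 128, 256 and 512 are all hypothetical choices made for illustration.

```python
import numpy as np


def stft(x, win_len, hop):
    """Hann-windowed STFT; returns an array of shape (freq_bins, frames)."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop:i * hop + win_len] * window
                       for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T


def multi_scale_cross_stft(mic_a, mic_b, win_lens=(128, 256, 512)):
    """Cross-spectrum of two channels at several window lengths.

    Assumption: cross-STFT = STFT(mic_a) * conj(STFT(mic_b)).
    Short windows give finer time resolution, long windows give finer
    frequency resolution; each scale is returned as a complex array.
    """
    spectra = {}
    for win_len in win_lens:
        hop = win_len // 2  # assumed 50% overlap
        spec_a = stft(mic_a, win_len, hop)
        spec_b = stft(mic_b, win_len, hop)
        spectra[win_len] = spec_a * np.conj(spec_b)
    return spectra


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.standard_normal(4096)
    delayed = np.roll(source, 3)  # crude stand-in for an inter-microphone delay
    for win_len, spec in multi_scale_cross_stft(source, delayed).items():
        print(win_len, spec.shape, spec.dtype)
```

Each scale stays complex-valued, so the phase of the cross-spectrum, which carries the time difference between the two microphones, is preserved. That is presumably the kind of input a complex-valued network can use directly, although the abstract does not spell out how the scales are combined or fed to the CCNN.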
Findings
Experimental results show that the proposed multi-scale cross-STFT CCNN outperforms both the single-scale cross-STFT model and other advanced methods, achieving consistently higher localization accuracy. The method is robust across a range of signal-to-noise ratio (SNR) conditions and performs well even on imbalanced datasets, indicating strong generalization capability.
Originality/value
This paper introduces a novel approach to SSL that integrates multi-scale information, addressing a key limitation of existing methods. The findings offer significant value to researchers and practitioners in the field of acoustic signal processing, particularly those focused on deep learning-based localization techniques.
Acknowledgements
This work was funded by the National Natural Science Foundation of China (Grant No. 51805154) and the Hubei Provincial Natural Science Foundation of China (2022CFB473).
Data availability: The data and code supporting the findings of this study are available from the corresponding author upon reasonable request.
Citation
Liu, M., Zhou, C., Feng, H., Gong, C., Hu, J. and Jian, Z. (2025), "A sound source localization method based on multi-scale cross-STFT complex-valued convolutional neural network", Sensor Review, Vol. ahead-of-print No. ahead-of-print. https://doi.org/10.1108/SR-10-2024-0870
Publisher
Emerald Publishing Limited
Copyright © 2025, Emerald Publishing Limited