Speaker counting based on a novel hive shaped nested microphone array by WPT and 2D adaptive SRP algorithms in near-field scenarios

Dehghan Firoozabadi, Ali; Adasme, Pablo; Zabala-Blanco, David; Palacios Játiva, Pablo; Azurdia-Meza, Cesar A.

dc.contributor.author	Dehghan Firoozabadi, Ali
dc.contributor.author	Adasme, Pablo
dc.contributor.author	Zabala-Blanco, David
dc.contributor.author	Palacios Játiva, Pablo
dc.contributor.author	Azurdia-Meza, Cesar A.
dc.date.accessioned	2023-06-05T20:24:11Z
dc.date.available	2023-06-05T20:24:11Z
dc.date.issued	2023
dc.identifier.uri	http://repositorio.ucm.cl/handle/ucm/4820
dc.description.abstract	Sound source localization (SSL), speech enhancement, and speaker tracking are among the main applications of speech processing algorithms. Most speech processing algorithms need to know the number of speakers in practical implementations. This article proposes a novel method for estimating the number of speakers in near-field scenarios. The method combines a hive shaped nested microphone array (HNMA), the wavelet packet transform (WPT), and a 2D sub-band adaptive steered response power (SB-2DASRP) algorithm with phase transform (PHAT) and maximum likelihood (ML) filters, followed by agglomerative clustering and the elbow criterion to obtain the number of speakers. The proposed HNMA eliminates aliasing and imaging and prepares suitable signals for the speaker counting method. Next, the Blackman–Tukey spectral estimation method is used to detect the relevant frequency components of the recorded signal. The WPT provides smart sub-band processing by focusing on the frequency bins of the speech signal. In addition, the SRP method is implemented in 2D and applied adaptively, with ML and PHAT filters, to the sub-band signals. The SB-2DASRP peak positions are extracted over various time frames based on a standard deviation (SD) criterion, and the final number of speakers is estimated by unsupervised agglomerative clustering and the elbow criterion. The proposed HNMA-SB-2DASRP method is compared with the frequency-domain magnitude squared coherence (FD-MSC), i-vector probabilistic linear discriminant analysis (i-vector PLDA), ambisonics features of the correlational recurrent neural network (AF-CRNN), and speaker counting by density-based classification and clustering decision (SC-DCCD) algorithms in noisy and reverberant environments; the results show the superiority of the proposed method for practical implementation.	es_CL
dc.language.iso	en	es_CL
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
dc.source	Sensors, 23(9), 4499	es_CL
dc.subject	Speech processing	es_CL
dc.subject	Speaker counting	es_CL
dc.subject	Source localization	es_CL
dc.subject	Adaptive processing	es_CL
dc.subject	Microphone arrays	es_CL
dc.subject	Classification	es_CL
dc.subject	Spectral estimation	es_CL
dc.title	Speaker counting based on a novel hive shaped nested microphone array by WPT and 2D adaptive SRP algorithms in near-field scenarios	es_CL
dc.type	Article	es_CL
dc.ucm.facultad	Facultad de Ciencias de la Ingeniería	es_CL
dc.ucm.indexacion	Scopus	es_CL
dc.ucm.indexacion	ISI	es_CL
dc.ucm.uri	mdpi.com/1424-8220/23/9/4499	es_CL
dc.ucm.doi	doi.org/10.3390/s23094499	es_CL