Multi-channel target speech enhancement using labeled random finite sets and deep learning under reverberant environments

Datta, Jayanta; Dehghan Firoozabadi, Ali; Zabala-Blanco, David; Castillo Soria, Francisco Ruben; Adams, Martin; Perez, Claudio

dc.contributor.author	Datta, Jayanta
dc.contributor.author	Dehghan Firoozabadi, Ali
dc.contributor.author	Zabala-Blanco, David
dc.contributor.author	Castillo Soria, Francisco Ruben
dc.contributor.author	Adams, Martin
dc.contributor.author	Perez, Claudio
dc.date.accessioned	2024-04-01T18:47:09Z
dc.date.available	2024-04-01T18:47:09Z
dc.date.issued	2023
dc.identifier.uri	http://repositorio.ucm.cl/handle/ucm/5277
dc.description.abstract	We proposed a multi-channel speech enhancement procedure for reverberant conditions that combines acoustic source tracking and beamforming. Deep learning algorithms were applied in two places: to improve the construction of the measurement set for labeled random finite set (RFS)-based target source tracking, and to predict a time-frequency (T-F) mask that enhances the target speech. During source localization, the steered response power phase transform (SRP-PHAT) was used to construct the measurement set for the labeled RFS-based source tracking framework. However, owing to noise and reverberation, the constructed measurement set suffered from impairments that degraded the performance of the source tracking algorithm. Accurate location estimates of the moving target source are crucial to the subsequent speech enhancement framework. Owing to its de-noising capability, a deep learning algorithm was applied to compensate for the impairments caused by noise and reverberation and thus construct an improved measurement set. This enabled the source tracking framework to estimate the location of the target source with improved accuracy. Furthermore, a deep learning framework was applied in the speech enhancement stage to predict the T-F mask corresponding to the target source. T-F masking, originally used in computational auditory scene analysis (CASA), was treated as a speech enhancement method in which weights are assigned to the bins of the T-F representation of the received mixture signal to enhance the target speech. Using the information from the source tracking sub-system, the deep learning framework predicted the T-F mask corresponding to the target source in the spectral domain. The inclusion of such a neural T-F mask prediction sub-system within the speech enhancement stage improved the target source separation of the time-varying beamformer. Computer simulation results showed that applying deep learning algorithms in both the source localization and the final speech enhancement stages made it possible to localize acoustic sources dynamically and to construct effective time-varying beamformers and T-F masks. Thus, the speech corresponding to the target source was enhanced under reverberant conditions with multiple interferers.	es_CL
dc.language.iso	en	es_CL
dc.rights	Attribution-NonCommercial-NoDerivs 3.0 Chile	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/3.0/cl/	*
dc.source	2023 IEEE 5th Eurasia Conference on IOT, Communication and Engineering (ECICE), 640-645	es_CL
dc.subject	Deep learning	es_CL
dc.subject	Location awareness	es_CL
dc.subject	Time-frequency analysis	es_CL
dc.subject	Target tracking	es_CL
dc.subject	Transforms	es_CL
dc.subject	Speech enhancement	es_CL
dc.subject	Prediction algorithms	es_CL
dc.title	Multi-channel target speech enhancement using labeled random finite sets and deep learning under reverberant environments	es_CL
dc.type	Article	es_CL
dc.ucm.facultad	Facultad de Ciencias de la Ingeniería	es_CL
dc.ucm.indexacion	Scopus	es_CL
dc.ucm.uri	ieeexplore.ieee.org/document/10382971/authors#authors	es_CL
dc.ucm.doi	doi.org/10.1109/ECICE59523.2023.10382971	es_CL
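
The abstract describes T-F masking as assigning weights to the bins of the T-F representation of the received mixture. As a minimal sketch of that idea only, the Python snippet below builds an oracle ratio mask from known target and interferer spectra and applies it to their mixture. This is an assumption made for illustration: in the paper the mask is predicted by a neural network using source tracking information, which is not reproduced here. The function names (`ideal_ratio_mask`, `apply_mask`) and the toy spectra are hypothetical and do not come from the paper.

```python
import numpy as np


def ideal_ratio_mask(target_mag, interferer_mag, eps=1e-8):
    """Per-bin weight in [0, 1]: the target's share of the bin energy.

    Oracle mask (needs the clean target), used here only for illustration.
    """
    target_energy = target_mag ** 2
    interferer_energy = interferer_mag ** 2
    return target_energy / (target_energy + interferer_energy + eps)


def apply_mask(mixture_tf, mask):
    """Weight each T-F bin of the complex mixture by the mask."""
    return mask * mixture_tf


# Toy T-F representations (frequency bins x frames), hypothetical values.
n_freq, n_frames = 5, 4
target = np.zeros((n_freq, n_frames), dtype=complex)
target[1, :] = 1.0          # target energy only in frequency bin 1
interferer = np.zeros((n_freq, n_frames), dtype=complex)
interferer[3, :] = 2.0      # interferer energy only in frequency bin 3

mixture = target + interferer
mask = ideal_ratio_mask(np.abs(target), np.abs(interferer))
enhanced = apply_mask(mixture, mask)
```

Because the toy target and interferer occupy disjoint bins, the mask keeps bin 1 and suppresses bin 3. With overlapping sources the weights fall between 0 and 1 and the separation is only approximate.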

Files in this item

Files	Size	Format	View
There are no files associated with this item.

This item appears in the following collection(s)

Scientific Articles

Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivs 3.0 Chile