2021
Mean absorption estimation from room impulse responses using virtually supervised learning
Cédric Foy, Antoine Deleforge, and Diego Di Carlo
in The Journal of the Acoustical Society of America,
Vol. 150, Num. 2,
pp. 1286--1299, 2021.
@article{foy2021mean, title={Mean absorption estimation from room impulse responses using virtually supervised learning}, author={Foy, C{\'e}dric and Deleforge, Antoine and Di Carlo, Diego}, journal={The Journal of the Acoustical Society of America}, volume={150}, number={2}, pages={1286--1299}, year={2021}, publisher={AIP Publishing} }
In the context of building acoustics and the acoustic diagnosis of an existing room, this paper introduces and investigates a new approach to estimate the mean absorption coefficients solely from a room impulse response (RIR). This inverse problem is tackled via virtually supervised learning, namely, the RIR-to-absorption mapping is implicitly learned by regression on a simulated dataset using artificial neural networks. Simple models based on well-understood architectures are the focus of this work. The critical choices of geometric, acoustic, and simulation parameters, which are used to train the models, are extensively discussed and studied while keeping in mind the conditions that are representative of the field of building acoustics. Estimation errors from the learned neural models are compared to those obtained with classical formulas that require knowledge of the room's geometry and reverberation times. Extensive comparisons made on a variety of simulated test sets highlight different conditions under which the learned models can overcome the well-known limitations of the diffuse sound field hypothesis underlying these formulas. Results obtained on real RIRs measured in an acoustically configurable room show that at 1 kHz and above, the proposed approach performs comparably to classical models when reverberation times can be reliably estimated and continues to work even when they cannot.
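For context, the diffuse-field baselines the learned models are compared against can be written in a few lines. The sketch below (illustrative only, not the paper's code; room dimensions and T60 are made up, air absorption is neglected) shows the classical Sabine and Eyring estimates of the mean absorption coefficient from volume, surface area, and reverberation time.

import numpy as np

def mean_absorption_sabine(volume_m3, surface_m2, t60_s):
    """Sabine: T60 = 0.161 V / (S * alpha)  =>  alpha = 0.161 V / (S * T60)."""
    return 0.161 * volume_m3 / (surface_m2 * t60_s)

def mean_absorption_eyring(volume_m3, surface_m2, t60_s):
    """Eyring: T60 = 0.161 V / (-S * ln(1 - alpha))."""
    return 1.0 - np.exp(-0.161 * volume_m3 / (surface_m2 * t60_s))

# Example: a 6 x 4 x 3 m shoebox room with T60 = 0.6 s.
V, S = 6 * 4 * 3, 2 * (6 * 4 + 6 * 3 + 4 * 3)
print(mean_absorption_sabine(V, S, 0.6), mean_absorption_eyring(V, S, 0.6))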
2021
dEchorate: a calibrated room impulse response dataset for echo-aware signal processing
Diego Di Carlo, Pinchas Tandeitnik, Cédric Foy, Nancy Bertin, Antoine Deleforge, Sharon Gannot
in EURASIP Journal on Audio, Speech, and Music Processing,
Vol. 2021,
pp. 1--15, 2021.
@article{carlo2021dechorate, title={dEchorate: a calibrated room impulse response dataset for echo-aware signal processing}, author={Di Carlo, Diego and Tandeitnik, Pinchas and Foy, C{\'e}dric and Bertin, Nancy and Deleforge, Antoine and Gannot, Sharon}, journal={EURASIP Journal on Audio, Speech, and Music Processing}, volume={2021}, pages={1--15}, year={2021}, publisher={Springer} }
This paper presents a new dataset of measured multichannel room impulse responses (RIRs) named dEchorate. It includes annotations of early echo timings and 3D positions of microphones, real sources, and image sources under different wall configurations in a cuboid room. These data provide a tool for benchmarking recent methods in echo-aware speech enhancement, room geometry estimation, RIR estimation, acoustic echo retrieval, microphone calibration, echo labeling, and reflector position estimation. The dataset is provided with software utilities to easily access, manipulate, and visualize the data as well as baseline methods for echo-related tasks.
2019
Audio-Based Search and Rescue With a Drone: Highlights From the IEEE Signal Processing Cup 2019 Student Competition
Deleforge, Antoine and Di Carlo, Diego and Strauss, Martin and Serizel, Romain and Marcenaro, Lucio
in IEEE Signal Processing Magazine,
Vol. 36, Num. 5,
pp. 138--144, 2019.
@article{Deleforge2019audio,
author = {Deleforge, Antoine and {Di Carlo}, Diego and Strauss, Martin and Serizel, Romain and Marcenaro, Lucio},
journal = {IEEE Signal Processing Magazine},
number = {5},
pages = {138--144},
publisher = {IEEE},
title = {Audio-Based Search and Rescue With a Drone: Highlights From the IEEE Signal Processing Cup 2019 Student Competition [SP Competitions]},
url = {https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8827999},
volume = {36},
year = {2019}
}
Interest in unmanned aerial vehicles (UAVs), commonly referred to as drones, has increased in recent years. Search and rescue scenarios, where humans in emergency situations need to be quickly found in difficult-to-access areas, constitute an important field of application for this technology. Drones have already been used by humanitarian organizations in countries such as Haiti and the Philippines to map areas after a natural disaster using high-resolution embedded cameras, as documented in a recent United Nations report [1]. Although research efforts have focused mostly on developing video-based solutions for this task [2], UAV-embedded audio-based localization has received relatively less attention [3-7]. However, UAVs equipped with a microphone array could be of critical help to localize people in emergency situations, especially when video sensors are limited by a lack of visual feedback due to bad lighting conditions (such as at night or in fog) or obstacles limiting the field of view (Figure 1).
2024
Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising
Yoto Fujita, Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, and Kazuyoshi Yoshii
in Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA),
2024.
@inproceedings{fujita2024runtimeadaptation,
abbr = {APSIPA},
bibtex_show = {true},
author = {Fujita, Yoto and Nugraha, Aditya Arie and Di Carlo, Diego and Bando, Yoshiaki and Fontaine, Mathieu and Yoshii, Kazuyoshi},
title = {Run-Time Adaptation of Neural Beamforming for Robust Speech Dereverberation and Denoising},
booktitle = {Proceedings of Asia-Pacific Signal and Information Processing Association Annual Summit and Conference (APSIPA)},
year = {2024},
month = dec,
pages = {},
address = {Macau, China},
preprint = {https://arxiv.org/abs/2410.22805}
}
This paper describes speech enhancement for real-time automatic speech recognition (ASR) in real environments. A standard approach to this task is to use neural beamforming that can work efficiently in an online manner. It estimates the masks of clean dry speech from a noisy echoic mixture spectrogram with a deep neural network (DNN) and then computes an enhancement filter used for beamforming. The performance of such a supervised approach, however, is drastically degraded under mismatched conditions. This calls for run-time adaptation of the DNN. Although the ground-truth speech spectrogram required for adaptation is not available at run time, blind dereverberation and separation methods such as weighted prediction error (WPE) and fast multichannel nonnegative matrix factorization (FastMNMF) can be used for generating pseudo-ground-truth data from a mixture. Based on this idea, a prior work proposed a dual-process system based on a cascade of WPE and minimum variance distortionless response (MVDR) beamforming asynchronously fine-tuned by block-online FastMNMF. To integrate the dereverberation capability into neural beamforming and make it fine-tunable at run time, we propose to use weighted power minimization distortionless response (WPD) beamforming, a unified version of WPE and minimum power distortionless response (MPDR), whose joint dereverberation and denoising filter is estimated using a DNN. We evaluated the impact of run-time adaptation under various conditions with different numbers of speakers, reverberation times, and signal-to-noise ratios (SNRs).
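As a reference point for the beamforming step, the sketch below is a minimal mask-based MVDR filter in NumPy. It is illustrative only (array shapes and mask handling are assumptions, not the paper's implementation); the WPD beamformer proposed in the paper additionally folds dereverberation into the same filter.

import numpy as np

def scm(X, mask):
    """Mask-weighted spatial covariance matrix. X: (F, T, M) STFT, mask: (F, T)."""
    w = mask[..., None, None]
    outer = X[..., :, None] * X[..., None, :].conj()
    return (w * outer).sum(axis=1) / (mask.sum(axis=1)[..., None, None] + 1e-8)

def mvdr(X, speech_mask, noise_mask):
    Phi_s, Phi_n = scm(X, speech_mask), scm(X, noise_mask)
    Phi_n = Phi_n + 1e-6 * np.eye(Phi_n.shape[-1])       # diagonal loading
    _, vecs = np.linalg.eigh(Phi_s)
    h = vecs[..., -1]                                    # steering vector estimate (F, M)
    num = np.linalg.solve(Phi_n, h[..., None])[..., 0]   # Phi_n^{-1} h
    den = np.einsum('fm,fm->f', h.conj(), num)[..., None]
    w = num / (den + 1e-8)                               # MVDR weights (F, M)
    return np.einsum('fm,ftm->ft', w.conj(), X)          # enhanced STFT (F, T)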
2024
RIR-in-a-Box: Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation
Liam Kelley, Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, and Kazuyoshi Yoshii
in Annual Conference of the International Speech Communication Association (Interspeech),
2024.
@inproceedings{kelley2024ririnabox,
abbr = {Interspeech},
bibtex_show = {true},
author = {Kelley, Liam and Di Carlo, Diego and Nugraha, Aditya Arie and Fontaine, Mathieu and Bando, Yoshiaki and Yoshii, Kazuyoshi},
title = {RIR-in-a-Box: Estimating Room Acoustics from 3D Mesh Data through Shoebox Approximation},
booktitle = {Proceedings of Annual Conference of the International Speech Communication
Association (Interspeech)},
year = {2024},
month = sep,
pages = {3255-3259},
address = {Kos Island, Greece},
url = {https://www.isca-archive.org/interspeech_2024/kelley24_interspeech.html},
html = {https://www.isca-archive.org/interspeech_2024/kelley24_interspeech.html},
pdf = {https://www.isca-archive.org/interspeech_2024/kelley24_interspeech.pdf},
preprint = {https://telecom-paris.hal.science/hal-04632526},
doi = {10.21437/Interspeech.2024-2053}
}
This paper describes a method for estimating the room impulse response (RIR) for a microphone and a sound source located at arbitrary positions from the 3D mesh data of the room. Simulating realistic RIRs with pure physics-driven methods often fails to balance physical consistency and computational efficiency, hindering application to real-time speech processing. Alternatively, one can use MESH2IR, a fast black-box estimator that consists of an encoder extracting latent code from mesh data with a graph convolutional network (GCN) and a decoder generating the RIR from the latent code. Combining these two approaches, we propose a fast yet physically coherent estimator with interpretable latent code based on differentiable digital signal processing (DDSP). Specifically, the encoder estimates a virtual shoebox room scene that acoustically approximates the real scene, accelerating physical simulation with the differentiable image-source model in the decoder. Our experiments showed that our method outperformed MESH2IR for real mesh data obtained with the depth scanner of Microsoft HoloLens 2 and can provide correct spatial consistency for binaural RIRs.
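To illustrate the shoebox/image-source idea the decoder builds on, the sketch below simulates an RIR for an assumed shoebox geometry with the (non-differentiable) image-source model of pyroomacoustics; it is not the paper's DDSP decoder, and the geometry, absorption, and positions are made up.

import pyroomacoustics as pra

# Assumed 6 x 4 x 3 m shoebox with a single uniform absorption coefficient.
room = pra.ShoeBox([6.0, 4.0, 3.0], fs=16000,
                   materials=pra.Material(0.2), max_order=17)
room.add_source([2.0, 1.5, 1.2])
room.add_microphone([4.0, 2.5, 1.5])
room.compute_rir()
rir = room.rir[0][0]   # impulse response from source 0 to microphone 0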
2024
Joint Audio Source Localization and Separation with Distributed Microphone Arrays Based on Spatially-Regularized Multichannel NMF
Yoshiaki Sumura, Diego Di Carlo, Aditya Arie Nugraha, Yoshiaki Bando, and Kazuyoshi Yoshii
in International Workshop on Acoustic Signal Enhancement (IWAENC),
2024.
@inproceedings{sumura2024jointlocalsep,
abbr = {IWAENC},
bibtex_show = {true},
author = {Sumura, Yoshiaki and Di Carlo, Diego and Nugraha, Aditya Arie and Bando, Yoshiaki and Yoshii, Kazuyoshi},
title = {Joint Audio Source Localization and Separation with Distributed Microphone Arrays Based on Spatially-Regularized Multichannel NMF},
booktitle = {Proceedings of International Workshop on Acoustic Signal Enhancement (IWAENC)},
year = {2024},
month = sep,
pages = {145-149},
address = {Aalborg, Denmark},
url = {https://ieeexplore.ieee.org/document/10694042},
html = {https://ieeexplore.ieee.org/document/10694042},
doi = {10.1109/IWAENC61483.2024.10694042}
}
This paper describes a statistically principled method that simultaneously localizes and separates multiple sound sources using multiple calibrated microphone arrays distributed in a room. Given the extensive research on direction of arrival (DOA) estimation with a single microphone array, for 3D source localization, one may attempt triangulation based on DOAs separately and egocentrically estimated by multiple arrays. However, in multi-source scenarios, this cascading approach faces both the inter-array DOA association problem and the error accumulation problem. To solve these problems, we propose a spatially regularized extension of a versatile blind source separation method called multichannel nonnegative matrix factorization (MNMF). Our method treats multiple microphone arrays as a single big array and puts priors on the frequency-wise spatial covariance matrices (SCMs) of each source. These priors are defined using the source DOA computed from the 3D positions of the source and arrays. The power spectral densities (PSDs), SCMs, and positions of multiple sources are jointly estimated under the unified maximum-a-posteriori (MAP) principle. We show the effectiveness of the joint statistical estimation for real data recorded by four five-channel microphone arrays of Microsoft Azure Kinect.
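The sketch below illustrates, under a far-field assumption and with made-up geometry, how a DOA-dependent rank-1 spatial covariance prior of the kind used to regularize MNMF can be built from a steering vector; it is not the paper's implementation.

import numpy as np

def steering_vector(mic_pos, doa_unit, freq, c=343.0):
    """mic_pos: (M, 3) in metres relative to the array centre; doa_unit: (3,) unit vector."""
    delays = mic_pos @ doa_unit / c                  # relative delays per microphone
    return np.exp(-2j * np.pi * freq * delays)       # (M,)

def scm_prior(mic_pos, source_pos, array_center, freqs, eps=1e-3):
    """Rank-1 prior SCM per frequency, regularized with a small identity term."""
    d = source_pos - array_center
    d = d / np.linalg.norm(d)
    priors = []
    for f in freqs:
        a = steering_vector(mic_pos - array_center, d, f)
        priors.append(np.outer(a, a.conj()) + eps * np.eye(len(a)))
    return np.stack(priors)                          # (F, M, M)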
2024
Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Direction
Diego Di Carlo, Aditya Arie Nugraha, Mathieu Fontaine, Yoshiaki Bando, and Kazuyoshi Yoshii
in IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW),
2024.
@inproceedings{dicarlo2024neuralsteerer,
abbr = {ICASSPW},
bibtex_show = {true},
author = {Di Carlo, Diego and Nugraha, Aditya Arie and Fontaine, Mathieu and Bando, Yoshiaki and Yoshii, Kazuyoshi},
title = {Neural Steerer: Novel Steering Vector Synthesis with a Causal Neural Field over Frequency and Direction},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing Workshops (ICASSPW)},
month = apr,
year = {2024},
pages = {740-744},
address = {Seoul, South Korea},
url = {https://ieeexplore.ieee.org/document/10626510},
html = {https://ieeexplore.ieee.org/document/10626510},
preprint = {https://arxiv.org/abs/2305.04447},
doi = {10.1109/ICASSPW62465.2024.10626510}
}
We address the problem of accurately interpolating measured anechoic steering vectors with a deep learning framework called the neural field. This task plays a pivotal role in reducing the resource-intensive measurements required for precise sound source separation and localization, essential as the front-end of speech recognition. Classical approaches to interpolation rely on linear weighting of nearby measurements in space on a fixed, discrete set of frequencies. Drawing inspiration from the success of neural fields for novel view synthesis in computer vision, we introduce the neural steerer, a continuous complex-valued function that takes both frequency and direction as input and produces the corresponding steering vector. Importantly, it incorporates inter-channel phase difference information and a regularization term enforcing filter causality, essential for accurate steering vector modeling. Our experiments, conducted using a dataset of real measured steering vectors, demonstrate the effectiveness of our resolution-free model in interpolating such measurements.
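A minimal sketch of the neural-field idea (layer sizes and inputs are assumptions; the paper's model additionally uses inter-channel phase differences and a causality regularizer): a small MLP mapping normalized frequency and a direction vector to a complex steering vector, queryable at arbitrary off-grid points.

import torch
import torch.nn as nn

class SteeringField(nn.Module):
    def __init__(self, n_mics=4, hidden=256):
        super().__init__()
        # Input: normalized frequency + direction as a 3D unit vector.
        self.net = nn.Sequential(
            nn.Linear(4, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2 * n_mics),   # real and imaginary parts per channel
        )

    def forward(self, freq, direction):
        x = torch.cat([freq.unsqueeze(-1), direction], dim=-1)
        re, im = self.net(x).chunk(2, dim=-1)
        return torch.complex(re, im)         # (..., n_mics)

# Query the field at an arbitrary off-grid frequency and direction.
field = SteeringField()
sv = field(torch.tensor([0.25]), torch.tensor([[0.0, 1.0, 0.0]]))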
2024
Implicit neural representation for change detection
Peter Naylor, Diego Di Carlo, Arianna Traviglia, Makoto Yamada, Marco Fiorucci
in IEEE/CVF Winter Conference on Applications of Computer Vision (WACV),
2024.
@inproceedings{naylor2024implicit,
title={Implicit neural representation for change detection},
author={Naylor, Peter and Di Carlo, Diego and Traviglia, Arianna and Yamada, Makoto and Fiorucci, Marco},
booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
pages={935--945},
year={2024},
doi={10.1109/WACV57701.2024.00098},
html={https://ieeexplore.ieee.org/abstract/document/10483630/}
}
Identifying changes in a pair of 3D aerial LiDAR point clouds, obtained during two distinct time periods over the same geographic region, presents a significant challenge due to the disparities in spatial coverage and the presence of noise in the acquisition system. The most commonly used approaches to detecting changes in point clouds are based on supervised methods which necessitate extensive labelled data, often unavailable in real-world applications. To address these issues, we propose an unsupervised approach that comprises two components: Implicit Neural Representation (INR) for continuous shape reconstruction and a Gaussian Mixture Model for categorising changes. INR offers a grid-agnostic representation for encoding bi-temporal point clouds, with unmatched spatial support that can be regularised to enhance high-frequency details and reduce noise. The reconstructions at each timestamp are compared at arbitrary spatial scales, leading to a significant increase in detection capabilities. We apply our method to a benchmark dataset comprising simulated LiDAR point clouds for urban sprawling. This dataset encompasses diverse challenging scenarios, varying in resolutions, input modalities and noise levels. This enables a comprehensive multi-scenario evaluation, comparing our method with the current state-of-the-art approach. We outperform the previous methods by a margin of 10% in the intersection over union metric. In addition, we put our techniques to practical use by applying them in a real-world scenario to identify instances of illicit excavation of archaeological sites and validate our results by comparing them with findings from field experts.
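A toy sketch of the two-stage pipeline, simplified to 2D height maps with synthetic data (this is not the paper's INR architecture, regularization, or dataset): fit one coordinate network per epoch, query both on a common grid, and cluster the differences with a Gaussian mixture.

import torch
import torch.nn as nn
from sklearn.mixture import GaussianMixture

def fit_inr(xy, z, steps=500):
    """Fit a small coordinate MLP (x, y) -> height to one epoch's scattered points."""
    net = nn.Sequential(nn.Linear(2, 128), nn.ReLU(),
                        nn.Linear(128, 128), nn.ReLU(),
                        nn.Linear(128, 1))
    opt = torch.optim.Adam(net.parameters(), lr=1e-3)
    for _ in range(steps):
        opt.zero_grad()
        ((net(xy).squeeze(-1) - z) ** 2).mean().backward()
        opt.step()
    return net

# Two epochs of noisy, off-grid observations of the same area (toy data with a change).
xy0, xy1 = torch.rand(2000, 2), torch.rand(2000, 2)
z0 = torch.sin(4 * xy0[:, 0]) + 0.05 * torch.randn(2000)
z1 = torch.sin(4 * xy1[:, 0]) + (xy1[:, 1] > 0.5).float() + 0.05 * torch.randn(2000)

# Grid-agnostic comparison on an arbitrary query grid, then change categorisation.
grid = torch.cartesian_prod(torch.linspace(0, 1, 64), torch.linspace(0, 1, 64))
diff = (fit_inr(xy1, z1)(grid) - fit_inr(xy0, z0)(grid)).detach().numpy()
labels = GaussianMixture(n_components=2).fit_predict(diff.reshape(-1, 1))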
2023
Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning
Aditya Arie Nugraha, Diego Di Carlo, Yoshiaki Bando, Mathieu Fontaine, and Kazuyoshi Yoshii
in IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA),
2023.
@inproceedings{nugraha2023gpdkl,
selected = {true},
abbr = {WASPAA},
bibtex_show = {true},
author = {Nugraha, Aditya Arie and Di Carlo, Diego and Bando, Yoshiaki and Fontaine, Mathieu and Yoshii, Kazuyoshi},
title = {Time-Domain Audio Source Separation Based on Gaussian Processes with Deep Kernel Learning},
booktitle = {Proceedings of IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA)},
year = {2023},
month = oct,
pages = {1--5},
address = {New Paltz, NY, USA},
url = {https://ieeexplore.ieee.org/document/10248168},
html = {https://ieeexplore.ieee.org/document/10248168},
preprint = {https://hal.science/hal-04172863},
doi = {10.1109/WASPAA58266.2023.10248168}
}
This paper revisits single-channel audio source separation based on a probabilistic generative model of a mixture signal defined in the continuous time domain. We assume that each source signal follows a non-stationary Gaussian process (GP), i.e., any finite set of sampled points follows a zero-mean multivariate Gaussian distribution whose covariance matrix is governed by a kernel function over time-varying latent variables. The mixture signal composed of such source signals thus follows a GP whose covariance matrix is given by the sum of the source covariance matrices. To estimate the latent variables from the mixture signal, we use a deep neural network with an encoder-separator-decoder architecture (e.g., Conv-TasNet) that separates the latent variables in a pseudo-time-frequency space. The key feature of our method is to feed the latent variables into the kernel function for estimating the source covariance matrices, instead of using the decoder for directly estimating the time-domain source signals. This enables the decomposition of a mixture signal into the source signals with a classical yet powerful Wiener filter that considers the full covariance structure over all samples. The kernel function and the network are trained jointly in the maximum likelihood framework. Comparative experiments using two-speech mixtures under clean, noisy, and noisy-reverberant conditions from the WSJ0-2mix, WHAM!, and WHAMR! benchmark datasets demonstrated that the proposed method performed well and outperformed the baseline method under noisy and noisy-reverberant conditions.
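The sketch below shows the core time-domain Wiener step on a toy example with fixed RBF kernels (in the paper, the kernel inputs are time-varying latent variables estimated by the DNN): the posterior mean of one source given the mixture is K1 (K1 + K2)^{-1} x.

import numpy as np

def rbf_kernel(t, lengthscale, variance=1.0):
    d = t[:, None] - t[None, :]
    return variance * np.exp(-0.5 * (d / lengthscale) ** 2)

T = 256
t = np.arange(T) / 16000.0
K1 = rbf_kernel(t, 1e-3)          # slowly varying source
K2 = rbf_kernel(t, 1e-4)          # faster varying source
jitter = 1e-6 * np.eye(T)
x = np.random.multivariate_normal(np.zeros(T), K1 + K2 + jitter)

# Full-covariance Wiener filter: posterior mean of source 1 given the mixture.
s1_hat = K1 @ np.linalg.solve(K1 + K2 + jitter, x)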
2022
Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization
Mathieu Fontaine, Diego Di Carlo, Kouhei Sekiguchi, Aditya Arie Nugraha, Yoshiaki Bando, and Kazuyoshi Yoshii
in European Signal Processing Conference (EUSIPCO),
2022.
@inproceedings{fontaine2022alphamusic,
abbr = {EUSIPCO},
bibtex_show = {true},
author = {Fontaine, Mathieu and Di Carlo, Diego and Sekiguchi, Kouhei and Nugraha, Aditya Arie and Bando, Yoshiaki and Yoshii, Kazuyoshi},
title = {Elliptically Contoured Alpha-Stable Representation for MUSIC-Based Sound Source Localization},
booktitle = {Proceedings of European Signal Processing Conference (EUSIPCO)},
year = {2022},
month = aug,
pages = {26--30},
address = {Belgrade, Serbia},
url = {https://ieeexplore.ieee.org/document/9909944},
html = {https://ieeexplore.ieee.org/document/9909944},
pdf = {https://eurasip.org/Proceedings/Eusipco/Eusipco2022/pdfs/0000026.pdf}
}
This paper introduces a theoretically-rigorous sound source localization (SSL) method based on a robust extension of the classical multiple signal classification (MUSIC) algorithm. The original SSL method estimates the noise eigenvectors and the MUSIC spectrum by computing the spatial covariance matrix of the observed multichannel signal and then detects the peaks from the spectrum. In this work, the covariance matrix is replaced with the positive definite shape matrix originating from the elliptically contoured α-stable model, which is more suitable under real noisy high-reverberant conditions. Evaluation on synthetic data shows that the proposed method outperforms baseline methods under such adverse conditions, while it is comparable on real data recorded in a mild acoustic condition.
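For reference, the standard (non-robust) MUSIC pseudo-spectrum the method builds on, sketched for an assumed uniform linear array; in the paper, the sample covariance below is replaced by the shape matrix of the elliptically contoured alpha-stable model.

import numpy as np

def music_spectrum(X, n_src, mic_pos, freq, angles_deg, c=343.0):
    """X: (M, T) narrowband snapshots; mic_pos: (M,) positions along a line in metres."""
    R = X @ X.conj().T / X.shape[1]                  # sample spatial covariance
    _, vecs = np.linalg.eigh(R)                      # eigenvalues in ascending order
    En = vecs[:, : X.shape[0] - n_src]               # noise subspace
    spectrum = []
    for theta in np.deg2rad(angles_deg):
        a = np.exp(-2j * np.pi * freq * mic_pos * np.sin(theta) / c)
        spectrum.append(1.0 / np.linalg.norm(En.conj().T @ a) ** 2)
    return np.array(spectrum)                        # peaks indicate source directions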
2022
Post processing sparse and instantaneous 2D velocity fields using physics-informed neural networks
Diego Di Carlo, Dominique Heitz, Thomas Corpetti
in 20th International Symposium on Application of Laser and Imaging Techniques to Fluid Mechanics (LXLASER),
2022.
@inproceedings{di2022post,
title={Post processing sparse and instantaneous 2D velocity fields using physics-informed neural networks},
author={Di Carlo, Diego and Heitz, Dominique and Corpetti, Thomas},
booktitle={Proceedings of the 20th International Symposium on Application of Laser and Imaging Techniques to Fluid Mechanics},
doi={10.55037/lxlaser.20th.183},
year={2022}
}
This work tackles the problem of resolving high-resolution velocity fields from a set of sparse off-grid observations. This task, crucial in many applications spanning from experimental fluid dynamics to computer vision and medicine, can be addressed with deep neural network models trained to enforce physics-based constraints. This work proposes an original unsupervised deep learning framework involving sub-grid models that improve the accuracy of super-resolved instantaneous and sparse velocity fields of turbulent flows. Python code, dataset and results are available at https://github.com/Chutlhu/TurboSuperResultion/
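A minimal sketch of a physics-informed loss of the kind described, assuming a 2D incompressible flow (network size, loss weight, and toy data are illustrative; the paper's sub-grid models are not included): a coordinate network fitted to sparse observations with a divergence penalty at collocation points.

import torch
import torch.nn as nn

net = nn.Sequential(nn.Linear(2, 128), nn.Tanh(),
                    nn.Linear(128, 128), nn.Tanh(),
                    nn.Linear(128, 2))               # (x, y) -> (u, v)

def divergence(xy):
    xy = xy.requires_grad_(True)
    uv = net(xy)
    du_dx = torch.autograd.grad(uv[:, 0].sum(), xy, create_graph=True)[0][:, 0]
    dv_dy = torch.autograd.grad(uv[:, 1].sum(), xy, create_graph=True)[0][:, 1]
    return du_dx + dv_dy

def loss(xy_obs, uv_obs, xy_col):
    data = ((net(xy_obs) - uv_obs) ** 2).mean()      # fit the sparse observations
    physics = (divergence(xy_col) ** 2).mean()       # continuity (divergence-free) term
    return data + 1e-2 * physics

xy_obs, uv_obs = torch.rand(200, 2), torch.zeros(200, 2)   # toy sparse samples
xy_col = torch.rand(1000, 2)                               # collocation points
print(loss(xy_obs, uv_obs, xy_col))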
2020
BLASTER: An Off-Grid Method for Blind and Regularized Acoustic Echoes Retrieval
Di Carlo, Diego and Elvira, Clément and Deleforge, Antoine and Bertin, Nancy and Gribonval, Rémi
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2020.
@inproceedings{dicarlo2020blaster,
author = {Di Carlo, Diego and Elvira, Cl{\'e}ment and Deleforge, Antoine and Bertin, Nancy and Gribonval, R{\'e}mi},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
title = {BLASTER: An Off-Grid Method for Blind and Regularized Acoustic Echoes Retrieval},
year = {2020}
}
Acoustic echoes retrieval is a research topic that is gaining importance in many speech and audio signal processing applications such as speech enhancement, source separation, dereverberation and room geometry estimation. This work proposes a novel approach to retrieve the timing of acoustic echoes off-grid and blindly, i.e., from a stereophonic recording of an unknown sound source such as speech. It builds on the recent framework of continuous dictionaries. In contrast with existing methods, the proposed approach neither relies on parameter tuning nor on peak-picking techniques, as it works directly in the parameter space of interest. The accuracy and robustness of the method are assessed on challenging simulated setups with varying noise and reverberation levels and are compared to two state-of-the-art methods.
2019
MIRAGE: 2D Source Localization Using Microphone Pair Augmentation with Echoes
Di Carlo, Diego and Deleforge, Antoine and Bertin, Nancy
in IEEE International Conference on Acoustics, Speech and Signal Processing,
2019.
@inproceedings{DiCarlo2019mirage,
arxiv = {1906.08968},
author = { Di Carlo, Diego and Deleforge, Antoine and Bertin, Nancy},
booktitle = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
doi = {10.1109/ICASSP.2019.8683534},
hal_id = {hal-01909531},
keywords = {Image Microphones,Sound Source Localization,Supervised Learning,TDOA Estimation},
pages = {775--779},
title = {Mirage: 2D Source Localization Using Microphone Pair Augmentation with Echoes},
url = {https://github.com/Chutlhu/MIRAGE},
volume = {2019-May},
year = {2019}
}
It is commonly observed that acoustic echoes hurt the performance of sound source localization (SSL) methods. We introduce the concept of microphone array augmentation with echoes (MIRAGE) and show how estimation of early-echo characteristics can in fact benefit SSL. We propose a learning-based scheme for echo estimation combined with a physics-based scheme for echo aggregation. In a simple scenario involving two microphones close to a reflective surface and one source, we show using simulated data that the proposed approach performs similarly to a correlation-based method in azimuth estimation while also retrieving elevation from two microphones only, an impossible task in anechoic settings.
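For reference, the correlation-based baseline mentioned above corresponds to a GCC-PHAT-style time-difference-of-arrival estimator; a minimal sketch (not the MIRAGE code, and the test signal is synthetic) follows.

import numpy as np

def gcc_phat(x1, x2, fs, max_tau=None):
    """Estimate the TDOA between two channels with the phase transform weighting."""
    n = len(x1) + len(x2)
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cc = np.fft.irfft(cross / (np.abs(cross) + 1e-12), n)
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs   # TDOA in seconds

fs = 16000
sig = np.random.randn(fs)
delayed = np.concatenate([np.zeros(40), sig])[:fs]    # copy delayed by 40 samples
print(gcc_phat(delayed, sig, fs))                     # ~2.5e-3 s: first channel lags the second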
2018
SEPARAKE: Source Separation with a Little Help from Echoes
Scheibler, Robin and Di Carlo, Diego and Deleforge, Antoine and Dokmanic, Ivan
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2018.
@inproceedings{Scheibler2017separake,
arxiv = {1711.06805},
author = {Scheibler, Robin and Di Carlo, Diego and Deleforge, Antoine and Dokmanic, Ivan},
doi = {10.1109/ICASSP.2018.8461345},
hal_id = {hal-01909531},
journal = {IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
keywords = {Echoes,Multi-channel,NMF,Room geometry,Source separation},
pages = {6897--6901},
title = {Separake: Source Separation with a Little Help from Echoes},
url = {https://github.com/fakufaku/separake},
year = {2018}
}
It is commonly believed that multipath hurts various audio processing algorithms. At odds with this belief, we show that multipath in fact helps sound source separation, even with very simple propagation models. Unlike most existing methods, we neither ignore the room impulse responses nor attempt to estimate them fully. We rather assume that we know the positions of a few virtual microphones generated by echoes, and we show how this gives us enough spatial diversity to get a performance boost over the anechoic case. We show improvements for two standard algorithms: one that uses only the magnitudes of the transfer functions, and one that also uses the phases. Concretely, we show that multichannel non-negative matrix factorization aided with a small number of echoes beats the vanilla variant of the same algorithm, and that with magnitude information only, echoes enable separation where it was previously impossible.
2018
Evaluation of an Open-Source Implementation of the SRP-PHAT Algorithm Within the 2018 LOCATA Challenge
Lebarbenchon, Romain and Camberlein, Ewen and Di Carlo, Diego and Deleforge, Antoine and Bertin, Nancy
in LOCATA Challenge Workshop - a satellite event of International Workshop on Acoustic Signal Enhancement (IWAENC),
2018.
2018
Interference reduction on full-length live recordings
Di Carlo, Diego and Liutkus, Antoine and Déguernel, Ken
in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP),
2018.
2017
Gaussian framework for interference reduction in live recordings
Di Carlo, Diego and Déguernel, Ken and Liutkus, Antoine
in AES International Conference on Semantic Audio,
2017.
2016
Gestural Control of Wavefield Synthesis
Grani, Francesco and Di Carlo, Diego and Portillo, Jorge Madrid and Girardi, Matteo and Paisa, Razvan and Banas, Jian Stian and Vogiatzoglou, Iakovos and Overholt, Dan and Serafin, Stefania
in Sound and Music Computing Conference (SMC),
2016.
2014
Automatic music listening for automatic music performance: a grand piano dynamics classifier
Di Carlo, Diego and Rodà, Antonio
in Proceedings of the 1st International Workshop on Computer and Robotic Systems for Automatic Music Performance (SAMP 14),
2014.