Invention Title:

KALMANNET: A LEARNABLE KALMAN FILTER FOR ACOUSTIC ECHO CANCELLATION

Publication number:

US20250356871

Publication date:

Section:

Physics

Class:

G10L21/0216

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application introduces a method and apparatus for acoustic echo cancellation (AEC) using a neural-network-based model named KALMANNET. This system processes audio signals captured by microphones, applying AEC to suppress unwanted echoes while preserving the desired audio components. The approach leverages the capabilities of deep neural networks to address challenges in traditional AEC methods, such as nonlinearity and parameter tuning.

Background

Acoustic echo cancellation is crucial in speech processing, especially in mobile communication and teleconferencing. Conventional DSP-based adaptive filters, such as normalized least mean squares (NLMS) and recursive least squares (RLS), struggle with nonlinear distortion and require careful parameter tuning. Recent deep learning approaches show promise in overcoming these limitations by treating AEC as a source separation problem, but they still have difficulty with dynamic echo paths. The Kalman filter, known for its robustness in double-talk scenarios, has not been fully exploited in AEC, which leaves room for innovation.
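For context, the conventional baseline mentioned above can be sketched as a textbook NLMS echo canceller. This is a minimal illustration, not code from the application: the function name, tap count, step size, and the synthetic three-tap echo path are all assumptions chosen for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, num_taps=8, step_size=0.5, eps=1e-8):
    """Textbook NLMS: adapt an FIR estimate of the echo path sample by sample.

    Returns the residual (echo-cancelled) signal and the final filter weights.
    """
    weights = [0.0] * num_taps
    history = [0.0] * num_taps  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        e = d - echo_estimate
        # Normalizing by the input energy is what makes this "normalized" LMS.
        norm = sum(h * h for h in history) + eps
        weights = [w + step_size * e * h / norm
                   for w, h in zip(weights, history)]
        residual.append(e)
    return residual, weights


# Synthetic, purely linear echo path (hypothetical values, no near-end speech).
random.seed(0)
true_path = [0.6, -0.3, 0.1]
far_end = [random.gauss(0.0, 1.0) for _ in range(2000)]
mic = [
    sum(true_path[k] * far_end[n - k]
        for k in range(len(true_path)) if n - k >= 0)
    for n in range(len(far_end))
]
residual, weights = nlms_echo_cancel(far_end, mic)
```

In this idealized linear case the filter converges to the true path. The step size must be hand-tuned, and a fixed linear filter cannot model loudspeaker nonlinearity, which are the two weaknesses the application says it targets.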

Technical Details

The proposed system comprises a memory storing computer code and processors executing this code to implement the AEC process. The neural-network-based model includes a recurrent neural network (RNN) that processes audio signals. The model has two branches: one estimates the far-end non-linear distortion using complex-valued ratio filters (cRF) produced by convolution layers, and the other estimates transition factors and non-linear transition functions using LSTM cells. The Kalman filter is updated based on the outputs of these two branches.
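To make the interaction between the Kalman filter and a learned transition factor more concrete, here is a minimal sketch of one scalar Kalman update for a single frequency bin. It is a simplification under stated assumptions, not the method claimed in the application: the single-tap-per-bin echo model, the noise variances, and the function name `kalman_echo_step` are hypothetical, and `transition_factor` is a fixed constant standing in for the value the second branch would predict.

```python
import random


def kalman_echo_step(weight, cov, far_end_bin, mic_bin,
                     transition_factor, process_var=1e-4, obs_var=1e-2):
    """One scalar Kalman update of the echo-path estimate in one frequency bin.

    weight            complex echo-path estimate from the previous frame
    cov               real-valued error variance of that estimate
    far_end_bin       complex far-end reference (assumed already corrected
                      for non-linear distortion by an upstream stage)
    mic_bin           complex microphone observation
    transition_factor scalar in (0, 1]; placeholder for a network output
    """
    # Predict: the echo path evolves as w_k = a * w_{k-1} + process noise.
    weight_pred = transition_factor * weight
    cov_pred = transition_factor ** 2 * cov + process_var

    # Innovation: microphone signal minus predicted echo (the AEC output).
    residual = mic_bin - far_end_bin * weight_pred

    # Kalman gain for the observation model y = x * w + near-end signal.
    power = abs(far_end_bin) ** 2
    gain = cov_pred * far_end_bin.conjugate() / (power * cov_pred + obs_var)

    # Correct the state estimate and its variance.
    weight_new = weight_pred + gain * residual
    cov_new = (1.0 - (gain * far_end_bin).real) * cov_pred
    return weight_new, cov_new, residual


# Toy run: echo-only frames through a fixed, hypothetical echo path.
random.seed(1)
true_path = 0.5 - 0.2j
weight, cov = 0j, 1.0
for _ in range(200):
    far_end_bin = complex(random.gauss(0, 1), random.gauss(0, 1))
    mic_bin = far_end_bin * true_path
    weight, cov, residual = kalman_echo_step(
        weight, cov, far_end_bin, mic_bin, transition_factor=0.999)
```

A transition factor close to 1 models a slowly varying echo path, while a smaller value lets the filter forget an outdated estimate faster. Predicting it per frame, rather than fixing it by hand, is the kind of tuning the text above delegates to the network.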

Model Architecture

The neural network model uses a 4-layer LSTM architecture in which each layer has 257 hidden units. The first branch applies complex-valued ratio filters to estimate distortions, while the second branch employs linear layers with sigmoid activation for the transition factors and LSTM cells for the non-linear transitions. Training is guided by a loss function that combines the scale-invariant signal-to-distortion ratio (SI-SDR) with the mean absolute error (MAE).
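The combined training objective can be illustrated with the standard definitions of SI-SDR and MAE. The sketch below is an assumption-laden example rather than the application's exact loss: the summary does not state how the two terms are weighted or which signal representation each term uses, so `mae_weight` and applying both terms to the same waveform are hypothetical choices.

```python
import math


def si_sdr_db(estimate, target, eps=1e-8):
    """Scale-invariant signal-to-distortion ratio in dB (higher is better)."""
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target) + eps
    # Project the estimate onto the target so overall gain does not matter.
    scale = dot / target_energy
    projection = [scale * t for t in target]
    distortion = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    distortion_energy = sum(d * d for d in distortion)
    return 10.0 * math.log10((signal_energy + eps) / (distortion_energy + eps))


def mean_absolute_error(estimate, target):
    """Average absolute sample difference (lower is better)."""
    return sum(abs(e - t) for e, t in zip(estimate, target)) / len(target)


def combined_loss(estimate, target, mae_weight=1.0):
    """Negative SI-SDR plus weighted MAE; the weighting is an assumption."""
    return (-si_sdr_db(estimate, target)
            + mae_weight * mean_absolute_error(estimate, target))


# Toy signals: a sine target with a mildly and a heavily corrupted estimate.
target = [math.sin(0.1 * n) for n in range(100)]
good_estimate = [t + 0.01 * math.cos(1.7 * n) for n, t in enumerate(target)]
poor_estimate = [t + 0.30 * math.cos(1.7 * n) for n, t in enumerate(target)]
good_loss = combined_loss(good_estimate, target)
poor_loss = combined_loss(poor_estimate, target)
```

Because SI-SDR ignores overall gain, it is commonly paired with an amplitude-sensitive term such as MAE to keep the output level anchored to the target.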

Implementation and Application

The invention can be implemented using various processing circuits or integrated circuits, with code stored on non-transitory computer-readable media. It applies to communication systems such as videoconferencing, supports both unidirectional and bidirectional video transmission, and can operate over different network types, including telecommunications networks, LANs, WANs, and the Internet.