Invention Title:

MULTI-BATCH REINFORCEMENT LEARNING VIA MULTI-IMITATION LEARNING

Publication number:

US20260012850

Publication date:
Section:

Electricity

Class:

H04W28/0862

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes a method for improving traffic load balancing in communication systems by combining batch reinforcement learning (RL) with imitation learning. The method receives traffic data from multiple base stations and creates augmented traffic data sets by merging subsets of data from different stations. Each augmented data set is used to train an individual AI model through imitation learning, and these individual models are then distilled into a single generalized AI model. The generalized model predicts the future traffic load of each base station, and these predictions support more efficient load balancing.
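The pipeline described above can be sketched in a few lines of Python. This is a minimal illustration under loudly stated assumptions, not the application's implementation: each "AI model" is a hypothetical one-variable linear predictor of next-interval load from current load, imitation learning is approximated by a supervised (behavior-cloning-style) least-squares fit to the logged data, and distillation fits one student model to the teachers' predictions. The names `fit_linear`, `predict`, `train_teachers`, and `distill` are hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical record format: (current load, next-interval load) for one station.
Dataset = List[Tuple[float, float]]
# Hypothetical model: (slope, intercept) of y = slope * x + intercept.
Model = Tuple[float, float]


def fit_linear(data: Dataset) -> Model:
    """Ordinary least squares for y = a*x + b; assumes `data` is non-empty."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    a = sxy / sxx if sxx else 0.0  # fall back to a flat predictor if x is constant
    return a, my - a * mx


def predict(model: Model, x: float) -> float:
    a, b = model
    return a * x + b


def train_teachers(augmented: Dict[str, Dataset]) -> Dict[str, Model]:
    """Stand-in for imitation learning: one supervised fit per augmented data set."""
    return {station: fit_linear(data) for station, data in augmented.items()}


def distill(teachers: Dict[str, Model], augmented: Dict[str, Dataset]) -> Model:
    """Fit a single student to the teachers' outputs (soft targets), not the raw labels."""
    soft_targets = [
        (x, predict(teachers[station], x))
        for station, data in augmented.items()
        for x, _ in data
    ]
    return fit_linear(soft_targets)


if __name__ == "__main__":
    augmented = {
        "bs_1": [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)],
        "bs_2": [(1.0, 3.0), (3.0, 7.0)],
    }
    teachers = train_teachers(augmented)
    print(teachers, distill(teachers, augmented))
```

Fitting the student to teacher outputs rather than to the raw labels is what makes this distillation; a practical system would use richer models and the loss terms described under Methodology.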

Challenges Addressed

Conventional reinforcement learning is difficult to apply in real-world systems because it relies on extensive interaction with the environment, which is often impractical in a live network. This application addresses that limitation with batch reinforcement learning, which learns from previously collected data instead. Existing batch RL algorithms, however, perform poorly when data are limited, as is typical at newly deployed network nodes. The proposed method uses multi-imitation learning to improve data efficiency and flexibility, making the approach suitable for practical deployment in communication systems.

Methodology

The method comprises several key steps: receiving traffic data from the base stations, creating augmented traffic data by merging subsets of data from different stations, and training an AI model on each augmented data set through imitation learning. A generalized AI model is then obtained by distilling knowledge from these individual models. Training the generalized model involves computing a distillation loss and a triplet loss; the triplet loss is computed from samples of identical tasks and of different tasks, which helps refine the model's predictions. The generalized model is trained until its overall loss converges or reaches a predefined threshold.
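The loss terms and stopping rule can be illustrated as follows. This overview names a distillation loss and a triplet loss but does not give their exact forms, so the sketch assumes common choices: mean squared error between the generalized (student) model's and an individual (teacher) model's predictions, and a standard margin-based triplet loss on embeddings in which the anchor and positive come from an identical task and the negative from a different task. The weighting `weight`, the `margin`, and the thresholds in the hypothetical `should_stop` helper are placeholders.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def distillation_loss(student: Vector, teacher: Vector) -> float:
    """Assumed form: mean squared error between student and teacher predictions."""
    return sum((s - t) ** 2 for s, t in zip(student, teacher)) / len(student)


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard margin triplet loss on Euclidean distances.

    `anchor` and `positive` come from the identical task, `negative` from a
    different task; the loss is zero once the negative is at least `margin`
    farther from the anchor than the positive is.
    """
    d_pos = math.dist(anchor, positive)
    d_neg = math.dist(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


def total_loss(student: Vector, teacher: Vector, anchor: Vector, positive: Vector,
               negative: Vector, weight: float = 0.5, margin: float = 1.0) -> float:
    """Overall loss: distillation term plus a weighted triplet term (weighting assumed)."""
    return distillation_loss(student, teacher) + weight * triplet_loss(
        anchor, positive, negative, margin)


def should_stop(history: List[float], threshold: float = 1e-3,
                patience: int = 3, tol: float = 1e-4) -> bool:
    """Stop if the latest loss is at or below `threshold`, or if the loss has
    varied by less than `tol` over the last `patience` + 1 recorded values."""
    if history and history[-1] <= threshold:
        return True
    if len(history) > patience:
        recent = history[-(patience + 1):]
        return max(recent) - min(recent) < tol
    return False
```

In a training loop, `total_loss` would be appended to a history list after each step and `should_stop` checked before the next one.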

Technical Implementation

The technical implementation defines state-action pairs, each associated with a state of the system, the action taken in that state, and the reward obtained for that action. These pairs are used to compute a sample selection ratio, which measures how similar the data from one station are to the data from another. The ratio determines which data points are included in an augmented data set. The individual AI models are then trained on these augmented data sets, and the generalized model is refined by minimizing the loss terms so that it accurately predicts future traffic loads.
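One plausible reading of the sample selection ratio is sketched below; the overview does not define it precisely, so the similarity measure is an assumption. Each logged record holds a state, an action, and a reward. Similarity is taken as a Gaussian kernel on the distance between a source station's state-action pair and its nearest neighbour in the target station's data, and the ratio is taken as the fraction of source samples that clear a threshold. `Sample`, `similarity`, `augment`, `sigma`, and `threshold` are hypothetical names and parameters.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    """One logged record: observed state, the action taken, and its reward."""
    state: Tuple[float, ...]
    action: float
    reward: float

    def key(self) -> Tuple[float, ...]:
        # Similarity is measured on the state-action pair; the reward is carried along.
        return self.state + (self.action,)


def similarity(sample: Sample, reference: Sequence[Sample], sigma: float = 1.0) -> float:
    """Hypothetical measure: Gaussian kernel on the squared distance to the
    nearest sample in `reference` (which must be non-empty)."""
    x = sample.key()
    d2 = min(sum((a - b) ** 2 for a, b in zip(x, r.key())) for r in reference)
    return math.exp(-d2 / (2.0 * sigma ** 2))


def augment(target: List[Sample], source: List[Sample],
            threshold: float = 0.5, sigma: float = 1.0) -> Tuple[List[Sample], float]:
    """Merge source-station samples that resemble the target station's data.

    Returns the augmented data set and the sample selection ratio, here
    assumed to be the fraction of source samples judged similar enough.
    """
    chosen = [s for s in source if similarity(s, target, sigma) >= threshold]
    ratio = len(chosen) / len(source) if source else 0.0
    return target + chosen, ratio
```

With a small `sigma`, only near-duplicates of the target station's behaviour are merged; a larger `sigma` admits more source data at the risk of importing dissimilar traffic patterns.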

Deployment and Benefits

Once trained, the generalized AI model is deployed to the base stations, where it is updated using real-time observations. This keeps the model accurate and responsive as traffic patterns change. Because the approach is data-efficient and flexible, it supports effective load balancing even when little data is available. It offers a practical way to manage traffic in real time in communication networks, potentially reducing congestion and improving service quality.
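The on-site update from real-time observations could be as simple as an online gradient step. The sketch below assumes a hypothetical one-variable linear load predictor represented as `(slope, intercept)` and a squared-error objective; the function names and the learning rate `lr` are placeholders, and the application does not specify this update rule.

```python
from typing import Iterable, Tuple

# Hypothetical model: (slope, intercept) of predicted_load = slope * x + intercept.
Model = Tuple[float, float]


def online_update(model: Model, x: float, y: float, lr: float = 0.05) -> Model:
    """One stochastic-gradient step on 0.5 * (prediction - y) ** 2 for a new observation."""
    a, b = model
    err = (a * x + b) - y  # prediction minus the observed next-interval load
    # Gradients of 0.5 * err**2: d/da = err * x, d/db = err.
    return a - lr * err * x, b - lr * err


def adapt(model: Model, stream: Iterable[Tuple[float, float]], lr: float = 0.05) -> Model:
    """Apply online updates for each (current load, observed next load) pair in order."""
    for x, y in stream:
        model = online_update(model, x, y, lr)
    return model
```

A small `lr` makes the deployed model change slowly and resist noisy observations; a larger one tracks sudden traffic shifts faster but can become unstable if `lr * x**2` grows too large.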