Invention Title:

SUPPLY CHAIN OPTIMIZATION WITH REINFORCEMENT LEARNING

Publication number:

US20250278689

Publication date:

2025-09-04

Section:

Physics

Class:

G06Q10/067

Inventors:

Patric HAMMLER 🇨🇭 Rheinfelden, Switzerland

Nicolas Oliver RIESTERER 🇩🇪 Ebringen, Germany

Assignee:

HOFFMANN-LA ROCHE INC. 🇺🇸 Little Falls, NJ, United States

Applicant:

Hoffmann-La Roche Inc. 🇺🇸 Little Falls, NJ, United States

Smart overview of the Invention

Supply Chain Optimization with Reinforcement Learning introduces a method for improving supply chain efficiency with machine learning, specifically reinforcement learning (RL). It targets multi-distribution-level supply chains, in which many nodes such as warehouses and distribution centers interact in complex ways. The goal is a dynamic, scalable model that lowers operational costs while still satisfying demand.

Supply Chain Structure

Supply chains consist of nodes and edges. Nodes are stocking points such as warehouses, while edges represent the relationships and dependencies between those nodes. The invention distinguishes between single-echelon supply chains (independent nodes) and multi-echelon supply chains (interdependent nodes). The optimization takes a holistic view that accounts for these interdependencies, so that no node makes isolated decisions that could harm the system as a whole.
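A supply chain of this kind can be held as a directed graph. The sketch below assumes a hypothetical three-level network with one factory, one distribution centre and two warehouses. The node names are invented for illustration. Edges are stored as an adjacency list, and breadth-first search from the source nodes assigns each node an echelon level. Any node at level 1 or higher relies on stock held upstream, and that dependency is what makes the network multi-echelon.

```python
from collections import deque

# Hypothetical network: each edge points from a supplier node to the node it replenishes.
EDGES: dict[str, list[str]] = {
    "factory": ["dc_central"],
    "dc_central": ["wh_north", "wh_south"],
    "wh_north": [],
    "wh_south": [],
}


def echelon_levels(edges: dict[str, list[str]]) -> dict[str, int]:
    """Assign each node its distance (in edges) from the nearest source node."""
    # Source nodes are those that no other node replenishes.
    has_supplier = {child for children in edges.values() for child in children}
    sources = [node for node in edges if node not in has_supplier]

    levels = {node: 0 for node in sources}
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in edges[node]:
            if child not in levels:  # first visit in BFS is along a shortest path
                levels[child] = levels[node] + 1
                queue.append(child)
    return levels


def upstream(edges: dict[str, list[str]], node: str) -> list[str]:
    """Return the direct suppliers of a node, i.e. its upstream dependencies."""
    return [parent for parent, children in edges.items() if node in children]


print(echelon_levels(EDGES))  # {'factory': 0, 'dc_central': 1, 'wh_north': 2, 'wh_south': 2}
print(upstream(EDGES, "wh_north"))  # ['dc_central']
```

A single-echelon view would optimise each warehouse on its own. The upstream map shows that `wh_north` and `wh_south` both draw on the same `dc_central` stock, so a decision at one warehouse can affect the other.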

Markov Decision Processes

The optimization model is framed as a Markov Decision Process (MDP), characterized by a 4-tuple (S, A, T, R):

S: State space representing the supply chain's current situation.
A: Set of actions available to the agent in each state, such as reorder decisions.
T: Transition probabilities describing dynamics like demand variability and lead times.
R: Reward signal assessing the quality of actions based on cost impacts.
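The sketch below writes (S, A, T, R) out explicitly for a deliberately tiny, hypothetical problem with a single stocking point. It is meant only to make the 4-tuple concrete. The capacity, demand distribution and cost figures are invented for illustration. Lead time is assumed to be zero, so orders arrive before the day's demand. This keeps T short, whereas the transition model described above also captures lead times. A is written as a function of the state so that an order can never exceed capacity.

```python
# Hypothetical single-node inventory MDP; all numbers are illustrative assumptions.
CAPACITY = 4                              # maximum units the stocking point can hold
DEMAND_PROBS = {0: 0.2, 1: 0.5, 2: 0.3}   # assumed daily demand distribution
HOLDING_COST = 1.0                        # per unit left in stock at end of day
STOCKOUT_COST = 5.0                       # per unit of unmet demand
ORDER_COST = 2.0                          # per unit ordered

# S: inventory level at the start of the day.
S = list(range(CAPACITY + 1))


def A(s: int) -> list[int]:
    """Feasible reorder quantities in state s (stock may not exceed capacity)."""
    return list(range(CAPACITY - s + 1))


def T(s: int, a: int) -> dict[int, float]:
    """Transition probabilities P(s' | s, a); the order arrives before demand."""
    probs: dict[int, float] = {}
    for demand, p in DEMAND_PROBS.items():
        s_next = max(s + a - demand, 0)   # unmet demand is lost, not backlogged
        probs[s_next] = probs.get(s_next, 0.0) + p
    return probs


def R(s: int, a: int) -> float:
    """Expected one-step reward: minus the ordering, holding and stockout costs."""
    on_hand = s + a
    expected_cost = ORDER_COST * a
    for demand, p in DEMAND_PROBS.items():
        expected_cost += p * (HOLDING_COST * max(on_hand - demand, 0)
                              + STOCKOUT_COST * max(demand - on_hand, 0))
    return -expected_cost


# Sanity check: every row of T is a probability distribution over S.
for s in S:
    for a in A(s):
        assert abs(sum(T(s, a).values()) - 1.0) < 1e-9
```

In a multi-echelon network, S would hold the inventory of every node and A the joint reorder decisions across nodes. Both spaces then grow quickly, which makes exact enumeration like this impractical.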

Virtual Environment and Digital Twin

A virtual environment uses real-time data to mimic real-world supply chain conditions, forming a digital twin. The twin is a virtual counterpart intended to be indistinguishable from the real system for simulation and optimization. Because the optimizer interfaces with the digital twin rather than the real supply chain, optimization can run rapidly without affecting real-world operations. The same setup supports risk assessment and lets the model learn from negative outcomes in a controlled virtual setting.
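The loop between optimizer and twin can be illustrated with a toy simulator. In the sketch below, `simulate` is a hypothetical and heavily simplified twin of one stocking point. Its uniform daily demand, two-day lead time and costs are all invented. The "optimizer" is a plain grid search over classical (s, S) reorder policies, used here for brevity in place of the DRL agent the invention describes. Every candidate policy runs only inside the simulator, including policies that cause stockouts.

```python
import random
from collections import deque

# Hypothetical parameters for the toy twin (illustrative only).
HOLDING_COST = 1.0    # per unit held at end of day
STOCKOUT_COST = 10.0  # per unit of unmet demand
ORDER_COST = 20.0     # fixed cost per order placed
LEAD_TIME = 2         # days between placing and receiving an order


def simulate(reorder_point: int, order_up_to: int, days: int = 365, seed: int = 0) -> float:
    """Run one stocking point under an (s, S) policy and return the total cost."""
    rng = random.Random(seed)          # same seed, so every policy faces the same demand
    inventory = order_up_to
    pipeline = deque([0] * LEAD_TIME)  # orders in transit, oldest first
    total_cost = 0.0
    for _ in range(days):
        inventory += pipeline.popleft()           # receive the delivery due today
        demand = rng.randint(0, 8)                # assumed uniform daily demand
        sold = min(inventory, demand)
        inventory -= sold
        total_cost += STOCKOUT_COST * (demand - sold)
        position = inventory + sum(pipeline)      # on hand plus on order
        order = order_up_to - position if position <= reorder_point else 0
        if order > 0:
            total_cost += ORDER_COST
        pipeline.append(order)                    # arrives LEAD_TIME days later
        total_cost += HOLDING_COST * inventory
    return total_cost


def grid_search(max_level: int = 40) -> tuple[tuple[int, int], float]:
    """Evaluate (s, S) pairs in the twin and return the cheapest one found."""
    best_policy, best_cost = (0, 1), float("inf")
    for s in range(0, max_level + 1, 2):
        for big_s in range(s + 1, max_level + 1, 2):
            cost = simulate(s, big_s)
            if cost < best_cost:
                best_policy, best_cost = (s, big_s), cost
    return best_policy, best_cost


policy, cost = grid_search()
print(f"best (s, S) = {policy}, simulated annual cost = {cost:.0f}")
```

Using one seed for every candidate applies the common-random-numbers technique, so differences in cost come from the policies rather than from luck in the demand draws. A real twin would instead be driven by the live data described above.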

Machine Learning Implementation

The optimizer is a Deep Reinforcement Learning (DRL) model. Implemented with frameworks such as Stable-Baselines3, it can use algorithms including A2C, DDPG, and DQN, among others, to optimize reorder policies dynamically across supply chains of varying scale and complexity. Training draws on thousands of years of simulated time and is intended to produce high-performing policies that improve supply chain decision-making.
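Stable-Baselines3 trains agents on environments that follow the Gymnasium interface. In that interface, `reset()` returns `(observation, info)` and `step(action)` returns `(observation, reward, terminated, truncated, info)`. The dependency-free sketch below implements that interface by hand for a hypothetical single-node reorder problem with invented costs. It then trains the environment with tabular Q-learning, a simpler relative of DQN chosen so that the example runs without extra packages. This is not the patent's training setup.

```python
import random
from typing import Optional


class ReorderEnv:
    """Hypothetical single-node reorder environment with a Gymnasium-style API."""

    def __init__(self, capacity: int = 10, max_order: int = 5, horizon: int = 60):
        self.capacity, self.max_order, self.horizon = capacity, max_order, horizon
        self.rng = random.Random()

    def reset(self, seed: Optional[int] = None):
        if seed is not None:
            self.rng.seed(seed)
        self.inventory, self.t = self.capacity // 2, 0
        return self.inventory, {}

    def step(self, action: int):
        # Order arrives immediately; stock above capacity is discarded but still paid for.
        self.inventory = min(self.inventory + action, self.capacity)
        demand = self.rng.randint(0, 4)                 # assumed daily demand
        unmet = max(demand - self.inventory, 0)
        self.inventory = max(self.inventory - demand, 0)
        cost = 1.0 * action + 0.5 * self.inventory + 4.0 * unmet  # order + holding + stockout
        self.t += 1
        truncated = self.t >= self.horizon              # time limit, not a terminal state
        return self.inventory, -cost, False, truncated, {}


def q_learning(env, episodes=2000, alpha=0.1, gamma=0.95, epsilon=0.1, seed=0):
    """Epsilon-greedy tabular Q-learning; returns Q[state][action]."""
    rng = random.Random(seed)
    n_actions = env.max_order + 1
    q = [[0.0] * n_actions for _ in range(env.capacity + 1)]
    for episode in range(episodes):
        state, _ = env.reset(seed=seed + episode)
        done = False
        while not done:
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)                          # explore
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
            next_state, reward, terminated, truncated, _ = env.step(action)
            # Bootstrap unless the state is truly terminal (truncation still bootstraps).
            target = reward if terminated else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state, done = next_state, terminated or truncated
    return q


env = ReorderEnv()
q = q_learning(env)
policy = [max(range(env.max_order + 1), key=lambda a: q[s][a]) for s in range(env.capacity + 1)]
print(policy)  # greedy order quantity for each inventory level 0..capacity
```

The default run covers 2,000 episodes of 60 days each, about 330 simulated years. With `gymnasium` and `stable-baselines3` installed, the same environment could subclass `gymnasium.Env`, accept `reset(seed=None, options=None)`, declare `observation_space = gymnasium.spaces.Discrete(capacity + 1)` and `action_space = gymnasium.spaces.Discrete(max_order + 1)`, and be trained with, for example, `A2C("MlpPolicy", env).learn(total_timesteps=100_000)`. DDPG would instead need a continuous `Box` action space.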