US20250298763
2025-09-25
Physics
G06F13/28
The disclosed system involves the automatic generation of hardware-level configurations for Direct Memory Access (DMA) devices using high-level descriptions of data movements. This approach simplifies programming by translating abstract data flows into specific hardware commands. It reduces human error and enhances compatibility across different hardware generations, addressing limitations in current manual methods of DMA programming.
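The translation step can be illustrated with a minimal sketch. The patent does not specify data structures or field names, so the `DataFlow` record, the command dictionary layout, and the chunking rule below are all illustrative assumptions, not the disclosed implementation.

```python
from dataclasses import dataclass

# Hypothetical high-level description of a single data movement;
# field names are illustrative assumptions.
@dataclass
class DataFlow:
    src_addr: int      # source base address
    dst_addr: int      # destination base address
    num_bytes: int     # total bytes to move
    chunk_bytes: int   # maximum bytes one DMA command may carry

def lower_to_dma_commands(flow: DataFlow) -> list[dict]:
    """Translate one abstract data flow into low-level copy commands,
    splitting the transfer into hardware-sized chunks."""
    commands = []
    offset = 0
    while offset < flow.num_bytes:
        size = min(flow.chunk_bytes, flow.num_bytes - offset)
        commands.append({
            "op": "COPY",
            "src": flow.src_addr + offset,
            "dst": flow.dst_addr + offset,
            "len": size,
        })
        offset += size
    return commands

# Example: a 10 KiB transfer with 4 KiB chunks lowers to three commands.
cmds = lower_to_dma_commands(DataFlow(0x1000, 0x8000, 10240, 4096))
```

The point of the sketch is the direction of translation: the programmer states only where data comes from and goes to, and the generator derives the per-command addressing and sizing that would otherwise be written by hand.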
High-performance computing accelerators are crucial for tasks in scientific, graphics, and machine learning fields, but they suffer from high latency to main memory. DMA engines help mitigate this by efficiently managing data movement between memory hierarchies. However, programming these engines is complex and prone to errors due to the need for low-level, hardware-specific coding, which is labor-intensive and often results in performance issues.
The system employs one or more processors to receive high-level data flow instructions and automatically generate low-level code for DMA devices. This process involves a DMA compiler that optimizes bandwidth allocation and links data flows with DMA phases for efficient execution. The same high-level description can thus be retargeted to different hardware configurations without extensive manual rewriting.
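The bandwidth-allocation step can be sketched as a simple proportional-sharing pass. This is a hypothetical stand-in for the compiler's optimization: a real DMA compiler would use a hardware cost model, whereas this sketch just scales requests to fit a fixed budget.

```python
def allocate_bandwidth(demands: dict[str, float], total: float) -> dict[str, float]:
    """Share a fixed buffer-bandwidth budget among named data flows.
    If total demand exceeds the budget, scale every flow proportionally."""
    requested = sum(demands.values())
    if requested <= total:
        return dict(demands)  # everything fits; grant as requested
    scale = total / requested
    return {name: d * scale for name, d in demands.items()}

# Example: 120 units requested against a 60-unit budget halves every grant.
alloc = allocate_bandwidth(
    {"weights": 60.0, "activations": 40.0, "outputs": 20.0},
    total=60.0,
)
```

Moving this arithmetic into a compiler pass, rather than leaving it to the programmer, is what lets the system rebalance automatically when the hardware budget changes between generations.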
Generating hardware-level configurations may include creating intermediate configurations and allocating resources like buffer bandwidth. The system can prioritize data flows and determine phase descriptors for sequential execution. This flexibility supports various applications, such as autonomous systems, virtual reality content generation, and deep learning operations.
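Prioritization and phase sequencing can be sketched together: flows are ordered by priority and chained into phase descriptors, each naming the next phase to execute. The descriptor fields here are illustrative assumptions; the `next` field merely stands in for whatever hardware link joins one phase to the following one.

```python
def build_phase_descriptors(flows: list[dict]) -> list[dict]:
    """Order data flows by priority (lower number = higher priority) and
    chain them into phase descriptors for sequential execution."""
    ordered = sorted(flows, key=lambda f: f["priority"])
    phases = []
    for i, f in enumerate(ordered):
        phases.append({
            "phase_id": i,
            "flow": f["name"],
            "next": i + 1 if i + 1 < len(ordered) else None,  # None = end of chain
        })
    return phases

# Example: three flows submitted out of order are sequenced by priority.
phases = build_phase_descriptors([
    {"name": "outputs", "priority": 2},
    {"name": "weights", "priority": 0},
    {"name": "activations", "priority": 1},
])
```

Because the chain is derived from declared priorities rather than hand-written, reordering flows requires changing only the high-level description, not the descriptor list itself.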
The technology is applicable across numerous domains, including robotics, autonomous vehicles, augmented reality, digital twin operations, and cloud computing. It provides a robust framework for enhancing performance and reducing complexity in environments where efficient data movement is critical.