US20260010969
2026-01-08
Physics
G06T1/20
The patent application describes a graphics processing unit (GPU) that accelerates neural network computations. The GPU includes a dynamic precision fixed-point unit that converts floating-point tensors into fixed-point tensors. The stated aim is to improve the efficiency and precision of neural network computation by dynamically managing the precision of integer deep learning primitives.
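As a rough illustration only (not taken from the application itself), the following CUDA sketch shows one common way a float-to-fixed-point conversion with dynamically chosen precision can work: the fractional bit count is derived from the tensor's largest magnitude so the values fit a 16-bit integer. The kernel name, the bit-allocation rule, and the int16 target width are all assumptions made for this example.

```cuda
#include <cuda_runtime.h>
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Hypothetical sketch: quantize a float tensor to int16 fixed-point using a
// single fractional-bit count chosen from the tensor's maximum magnitude.
__global__ void float_to_fixed(const float* in, int16_t* out, int n, int frac_bits) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        float scaled = in[i] * (float)(1 << frac_bits);
        // Round to nearest and clamp to the int16 range.
        scaled = fminf(fmaxf(scaled, -32768.0f), 32767.0f);
        out[i] = (int16_t)lrintf(scaled);
    }
}

int main() {
    std::vector<float> host = {0.5f, -1.25f, 3.75f, 0.015625f};
    int n = (int)host.size();

    // "Dynamic" step: derive fractional bits from the tensor's max magnitude
    // (1 sign bit + enough integer bits, rest of the 16-bit word is fraction).
    float max_abs = 0.0f;
    for (float v : host) max_abs = std::max(max_abs, std::fabs(v));
    int int_bits = (int)std::ceil(std::log2(max_abs + 1e-30f)) + 1;
    int frac_bits = std::max(0, 15 - int_bits);

    float* d_in; int16_t* d_out;          // error checks omitted for brevity
    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(int16_t));
    cudaMemcpy(d_in, host.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    float_to_fixed<<<(n + 255) / 256, 256>>>(d_in, d_out, n, frac_bits);

    std::vector<int16_t> q(n);
    cudaMemcpy(q.data(), d_out, n * sizeof(int16_t), cudaMemcpyDeviceToHost);
    for (int i = 0; i < n; ++i)
        printf("%f -> %d (frac_bits=%d)\n", host[i], q[i], frac_bits);
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```

The fixed-point values can then feed integer multiply-accumulate primitives; the shared fractional-bit count is the piece of state that would be managed dynamically per tensor.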
Modern graphics processors handle a wide range of operations beyond traditional graphics tasks, including machine learning and general-purpose computation. The shift from fixed-function units to programmable units has enabled GPUs to support these diverse workloads. Single instruction, multiple thread (SIMT) architectures are central to parallel processing efficiency, allowing groups of threads to execute the same instruction in lockstep.
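To make the SIMT notion concrete, here is a minimal CUDA sketch (an illustration, not material from the application): the 32 threads of a warp issue each instruction together, so warp-level shuffles can combine their registers without shared memory or explicit barriers.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Minimal SIMT illustration: the 32 threads of a warp execute in lockstep,
// so each shuffle step folds partial sums across lanes of the same warp.
__global__ void warp_sum(const float* in, float* out) {
    float v = in[threadIdx.x];
    for (int offset = 16; offset > 0; offset >>= 1)
        v += __shfl_down_sync(0xffffffff, v, offset);
    if (threadIdx.x == 0) *out = v;  // lane 0 holds the warp-wide sum
}

int main() {
    float host[32];
    for (int i = 0; i < 32; ++i) host[i] = 1.0f;  // expected sum: 32
    float *d_in, *d_out, result;
    cudaMalloc(&d_in, sizeof(host));
    cudaMalloc(&d_out, sizeof(float));
    cudaMemcpy(d_in, host, sizeof(host), cudaMemcpyHostToDevice);
    warp_sum<<<1, 32>>>(d_in, d_out);
    cudaMemcpy(&result, d_out, sizeof(float), cudaMemcpyDeviceToHost);
    printf("warp sum = %f\n", result);  // prints 32.0
    cudaFree(d_in); cudaFree(d_out);
    return 0;
}
```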
The computing system incorporates a processing subsystem with one or more processors connected to a system memory. A memory hub facilitates communication between the processor and memory, while an I/O subsystem manages input and output operations. The GPU, part of the parallel processor subsystem, can be attached via high-speed interconnects such as PCIe or NVLink, and processes commands and instructions for a range of applications, including graphics and machine learning workloads.
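As a small illustrative sketch (an assumption-laden example, not the application's own software interface), the host side of such a system can probe how GPUs on the interconnect relate to one another by asking whether each pair supports direct peer memory access:

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative sketch: when multiple GPUs sit on a high-speed interconnect
// such as PCIe or NVLink, the host can query whether each pair can access
// the other's memory directly, exposing the interconnect topology to software.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    printf("%d GPU(s) in the parallel processor subsystem\n", count);
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int can = 0;
            cudaDeviceCanAccessPeer(&can, a, b);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   a, b, can ? "available" : "not available");
        }
    }
    return 0;
}
```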
The system's architecture allows flexibility in integrating components. The GPU can be connected to the processor cores either through external interconnects or integrated directly on the same chip. This integration enhances the efficiency of command processing and supports various configurations, including system on chip (SoC) and system in package (SIP) designs. The components can also be part of a multi-chip module (MCM), facilitating modular computing systems.
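Again as an illustration under assumed tooling rather than anything stated in the application, software can detect which of these configurations it is running on: an integrated (SoC-style) GPU shares physical memory with the host processor, while a discrete part sits behind an external interconnect. The property names below are standard CUDA device attributes used here for the example.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Illustrative sketch: report whether each visible GPU is integrated with the
// host processor or discrete, and whether it can map host memory directly.
int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, dev);
        printf("GPU %d (%s): %s, canMapHostMemory=%d, unifiedAddressing=%d\n",
               dev, prop.name,
               prop.integrated ? "integrated with host" : "discrete",
               prop.canMapHostMemory, prop.unifiedAddressing);
    }
    return 0;
}
```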
The described computing system is adaptable, with potential variations in component connections and configurations. For instance, the memory hub and I/O hub could be integrated or separated, and multiple processors could be used. The design treats several components as optional, allowing customization for specific needs. This flexibility allows the system to accommodate different computational demands and future technological changes.