US20240257294
2024-08-01
Physics
G06T1/20
The patent outlines a compute optimization mechanism for deep neural networks, focusing on the architecture of graphics processing units (GPUs). The GPU comprises multiple multiprocessors, each equipped with multiple processing cores and a register file capable of storing operands of various types. The cores are divided into two sets, each associated with distinct memory channels, allowing for more efficient data handling and processing.
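To make this organization concrete, the CUDA runtime exposes the corresponding quantities per device: the number of multiprocessors, the capacity of each multiprocessor's register file, and the width of the memory interface. The sketch below is purely illustrative; it queries a generic CUDA GPU and is not code from, or specific to, the patented mechanism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // Each multiprocessor owns a register file shared by its cores;
        // these fields report how large that file is and how much of it
        // a single thread block may claim.
        printf("device %d: %s\n", d, p.name);
        printf("  multiprocessors:            %d\n", p.multiProcessorCount);
        printf("  32-bit registers per SM:    %d\n", p.regsPerMultiprocessor);
        printf("  32-bit registers per block: %d\n", p.regsPerBlock);
        printf("  memory bus width:           %d bits\n", p.memoryBusWidth);
    }
    return 0;
}
```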
Traditional graphics processing has relied on fixed-function units for tasks such as linear interpolation and texture mapping. Recent advancements have led to programmable graphics processors that can handle a wider array of operations. Enhancements in parallel processing techniques, particularly through single instruction, multiple thread (SIMT) architectures, have been pivotal in maximizing processing efficiency by executing program instructions synchronously across groups of parallel threads.
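The SIMT model can be illustrated with a minimal CUDA kernel: every thread runs the same program, and the hardware groups threads into warps that issue each instruction together. This is a generic sketch of the SIMT style rather than anything taken from the patent; the kernel name and sizes are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All threads execute this same program. The hardware groups threads
// into warps (typically 32 threads) that issue each instruction in
// lockstep; threads failing the bounds check are simply masked off.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 threads per block = 8 warps, each executing synchronously.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```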
The described GPU can be connected to host processor cores to accelerate a range of tasks, including graphics operations and machine learning. These connections can be made over high-speed interconnects such as PCIe or NVLink, or by integrating the GPU and host cores within a single chip. Work is allocated to the GPU as sequences of commands, which dedicated circuitry on the GPU fetches and processes efficiently.
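In the CUDA programming model, such command sequences surface as streams: the host enqueues copies and kernel launches in order, and the GPU consumes them asynchronously. The following is a minimal sketch under that assumption; the kernel and buffer sizes are invented for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float* v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

int main() {
    const int n = 4096;
    size_t bytes = n * sizeof(float);
    float* host = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    cudaMalloc(&dev, bytes);

    // A stream is an ordered command sequence: the host enqueues work
    // and returns immediately; the GPU executes the commands in order.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, 2.0f, n);
    cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // block until the whole sequence is done

    printf("host[0] = %f\n", host[0]);  // expect 2.0

    cudaStreamDestroy(stream);
    cudaFree(dev);
    free(host);
    return 0;
}
```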
A comprehensive computing system is presented, featuring a processing subsystem that connects multiple processors and system memory through a memory hub. This hub facilitates communication between the parallel processors and I/O subsystems, enabling input from devices and output to display systems. The architecture supports various configurations, including integrated circuits that combine multiple components into single packages or modules.
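Which processors in such a system can reach each other's memory directly depends on the interconnect topology the hubs provide. On a multi-GPU CUDA system this can be probed at run time, as the short sketch below shows; again, this is an illustrative use of standard runtime calls, not the patent's own interface.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    // Ask, for every ordered pair of GPUs, whether the underlying
    // topology (PCIe switch, NVLink, etc.) permits direct peer access
    // to the other device's memory.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   a, b, ok ? "supported" : "unsupported");
        }
    }
    return 0;
}
```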
The computing system's design allows significant flexibility in how components are arranged and connected. Variations include direct connections between processors and memory, as well as alternative topologies for integrating the I/O and memory hubs. The system can accommodate multiple sets of processors and supports diverse architectures, adapting to different computational needs while remaining optimized for deep neural network operations.