US20240257294
2024-08-01
Physics
G06T1/20
The patent outlines a compute optimization mechanism for deep neural networks, focusing on the architecture of graphics processing units (GPUs). The GPU comprises multiple multiprocessors, each equipped with multiple processing cores and a register file capable of storing operands of various types. The cores are divided into two sets, each associated with distinct memory channels, allowing for more efficient data handling and processing.
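To make this organization concrete, the CUDA runtime exposes the corresponding quantities per device: the number of multiprocessors, the capacity of each multiprocessor's register file, and the width of the memory interface. The sketch below is purely illustrative; it queries a generic CUDA GPU and is not code from, or specific to, the patented mechanism.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp p;
        cudaGetDeviceProperties(&p, d);
        // Each multiprocessor owns a register file shared by its cores;
        // these fields report how large that file is and how much of it
        // a single thread block may claim.
        printf("device %d: %s\n", d, p.name);
        printf("  multiprocessors:            %d\n", p.multiProcessorCount);
        printf("  32-bit registers per SM:    %d\n", p.regsPerMultiprocessor);
        printf("  32-bit registers per block: %d\n", p.regsPerBlock);
        printf("  memory bus width:           %d bits\n", p.memoryBusWidth);
    }
    return 0;
}
```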
Traditional graphics processing has relied on fixed-function units for tasks such as linear interpolation and texture mapping. Recent advancements have led to programmable graphics processors that can handle a wider array of operations. Enhancements in parallel processing techniques, particularly through single instruction, multiple thread (SIMT) architectures, have been pivotal in maximizing processing efficiency by executing program instructions synchronously across groups of parallel threads.
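The SIMT model can be illustrated with a minimal CUDA kernel: every thread runs the same program, and the hardware groups threads into warps that issue each instruction together. This is a generic sketch of the SIMT style rather than anything taken from the patent; the kernel name and sizes are arbitrary.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// All threads execute this same program. The hardware groups threads
// into warps (typically 32 threads) that issue each instruction in
// lockstep; threads failing the bounds check are simply masked off.
__global__ void saxpy(int n, float a, const float* x, float* y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // 256 threads per block = 8 warps, each executing synchronously.
    saxpy<<<(n + 255) / 256, 256>>>(n, 3.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // expect 5.0
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```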
The described GPU can be connected to host processor cores to accelerate a range of tasks, including graphics operations and machine learning. These connections can be made over high-speed interconnects such as PCIe or NVLink, or by integrating the GPU and host cores within a single chip. Work is allocated to the GPU as sequences of commands, which dedicated circuitry on the GPU fetches and processes efficiently.
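In the CUDA programming model, such command sequences surface as streams: the host enqueues copies and kernel launches in order, and the GPU consumes them asynchronously. The following is a minimal sketch under that assumption; the kernel and buffer sizes are invented for illustration.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void scale(float* v, float s, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= s;
}

int main() {
    const int n = 4096;
    size_t bytes = n * sizeof(float);
    float* host = (float*)malloc(bytes);
    for (int i = 0; i < n; ++i) host[i] = 1.0f;

    float* dev = nullptr;
    cudaMalloc(&dev, bytes);

    // A stream is an ordered command sequence: the host enqueues work
    // and returns immediately; the GPU executes the commands in order.
    cudaStream_t stream;
    cudaStreamCreate(&stream);
    cudaMemcpyAsync(dev, host, bytes, cudaMemcpyHostToDevice, stream);
    scale<<<(n + 255) / 256, 256, 0, stream>>>(dev, 2.0f, n);
    cudaMemcpyAsync(host, dev, bytes, cudaMemcpyDeviceToHost, stream);
    cudaStreamSynchronize(stream);  // block until the whole sequence is done

    printf("host[0] = %f\n", host[0]);  // expect 2.0

    cudaStreamDestroy(stream);
    cudaFree(dev);
    free(host);
    return 0;
}
```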
A comprehensive computing system is presented, featuring a processing subsystem that connects multiple processors and system memory through a memory hub. This hub facilitates communication between the parallel processors and I/O subsystems, enabling input from devices and output to display systems. The architecture supports various configurations, including integrated circuits that combine multiple components into single packages or modules.
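Which processors in such a system can reach each other's memory directly depends on the interconnect topology the hubs provide. On a multi-GPU CUDA system this can be probed at run time, as the short sketch below shows; again, this is an illustrative use of standard runtime calls, not the patent's own interface.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    // Ask, for every ordered pair of GPUs, whether the underlying
    // topology (PCIe switch, NVLink, etc.) permits direct peer access
    // to the other device's memory.
    for (int a = 0; a < count; ++a) {
        for (int b = 0; b < count; ++b) {
            if (a == b) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, a, b);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   a, b, ok ? "supported" : "unsupported");
        }
    }
    return 0;
}
```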
The computing system's design allows significant flexibility in how components are arranged and connected. Variations include direct connections between processors and memory, as well as alternative topologies for integrating the I/O and memory hubs. The system can accommodate multiple sets of processors and supports diverse architectures, adapting to different computational needs while remaining optimized for deep neural network operations.