Invention Title:

COMPUTE OPTIMIZATION MECHANISM FOR DEEP NEURAL NETWORKS

Publication number:

US20250166115

Publication date:
Section:

Physics

Class:

G06T1/20

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

Overview: The described technology optimizes compute operations in deep neural networks using advanced graphics processing units (GPUs). Each GPU comprises multiple multiprocessors, each equipped with a register file that stores operands of various types and a set of processing cores. The cores are divided into two types, each associated with a distinct memory channel, enabling efficient data movement and processing.
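The arrangement described above can be sketched as a simple model: a multiprocessor holding a shared register file and two core types, each routed through its own memory channel. All names and sizes here are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch of the multiprocessor layout from the overview:
# one register file for mixed operand types, plus two core types that
# each use a distinct memory channel. Names are illustrative only.

class Multiprocessor:
    def __init__(self, cores_per_type=4, num_registers=256):
        # Register file shared by the cores; slots may hold any operand type.
        self.register_file = [None] * num_registers
        # Two core types, each bound to its own memory channel.
        self.cores = {
            "type_a": {"channel": "channel_0", "count": cores_per_type},
            "type_b": {"channel": "channel_1", "count": cores_per_type},
        }

    def write_operand(self, slot, value):
        self.register_file[slot] = value

    def channel_for(self, core_type):
        return self.cores[core_type]["channel"]

mp = Multiprocessor()
mp.write_operand(0, 3.14)        # floating-point operand
mp.write_operand(1, 7)           # integer operand
print(mp.channel_for("type_a"))  # channel_0
print(mp.channel_for("type_b"))  # channel_1
```

The point of the sketch is only the separation of concerns: one operand store, two core populations, two independent paths to memory.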

Background: Traditional graphics processors were designed with fixed-function units for specific graphics operations such as tessellation and texture mapping. Modern GPUs, however, have become programmable, allowing them to support a broader range of operations. This shift has led to the adoption of Single Instruction, Multiple Thread (SIMT) architectures, in which groups of threads execute the same instruction stream in lockstep, increasing parallel throughput.
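The SIMT idea can be illustrated with a toy simulation: one instruction stream is applied to every thread in a group before the stream advances, while each thread keeps its own private register state. This is a deliberately simplified software sketch of behavior that real GPUs implement in hardware.

```python
# Toy SIMT simulation: a single instruction stream executed in lockstep
# across a group of threads, each with private registers. Illustrative only.

def simt_execute(program, thread_regs):
    """Apply each instruction to all threads before moving to the next."""
    for instr in program:                        # one instruction stream
        for tid, regs in enumerate(thread_regs): # many threads, lockstep
            instr(tid, regs)
    return thread_regs

# Each thread's private register state; x differs per thread.
threads = [{"x": tid, "y": 0} for tid in range(4)]

# The shared program: every thread runs the same instructions.
program = [
    lambda tid, r: r.update(y=r["x"] * 2),    # y = x * 2
    lambda tid, r: r.update(y=r["y"] + tid),  # y += thread id
]

result = simt_execute(program, threads)
print([r["y"] for r in result])  # [0, 3, 6, 9]
```

Note that divergence handling (threads taking different branches) is what distinguishes SIMT from plain SIMD; this sketch omits it for brevity.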

Detailed Mechanisms: The GPU is connected to host processor cores to accelerate operations such as machine learning and pattern analysis. The connection can be established over a high-speed interconnect such as PCIe or NVLink. The host cores assign tasks to the GPU via work descriptors, which the GPU processes using dedicated circuitry. The compute mechanism provides multiple execution units (EUs) of different types within each processing unit, enabling matrix-vector transformations that use shared local memory or the register file.
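The host-to-GPU flow above can be sketched in miniature: the host packages a kernel and its arguments into a work descriptor, and the "device" routine performs a matrix-vector product, staging each tile of operands in a small buffer that stands in for shared local memory. The descriptor layout, tile size, and function names are assumptions for illustration, not the patent's actual interface.

```python
# Hedged sketch of work-descriptor dispatch plus a tiled matrix-vector
# product. The small staging buffer stands in for shared local memory.

TILE = 4  # tile width staged into the shared-memory stand-in

def matvec_tiled(matrix, vector):
    out = []
    for row in matrix:
        acc = 0.0
        for start in range(0, len(vector), TILE):
            # Stage a tile of operands, as shared local memory would.
            shared_tile = vector[start:start + TILE]
            row_tile = row[start:start + TILE]
            acc += sum(a * b for a, b in zip(row_tile, shared_tile))
        out.append(acc)
    return out

# Host side: describe the work, then "submit" it to the device routine.
descriptor = {
    "kernel": matvec_tiled,
    "args": ([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], [1, 1, 1, 1, 1]),
}
result = descriptor["kernel"](*descriptor["args"])
print(result)  # [15.0, 15.0]
```

On real hardware the descriptor would name a compiled kernel and device buffers rather than a Python function, and the tiles would live in on-chip shared memory visible to a block of threads.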

System Configuration: A typical computing system includes a processing subsystem with processors and system memory interconnected by a memory hub. The hub communicates with an I/O subsystem that manages input devices and display outputs. Parallel processors attach over communication links such as PCI Express, forming a computationally focused system capable of handling both graphics and general-purpose tasks.

Integration and Flexibility: The system's components can be integrated into various configurations, such as system-on-chip (SoC) or multi-chip modules (MCM), enhancing modularity and scalability. The architecture allows for different connection topologies and component arrangements, providing flexibility in design and implementation. Optional components can be included or omitted based on specific requirements, making the system adaptable to diverse computing environments.