Invention Title:

DYNAMIC PRECISION FOR NEURAL NETWORK COMPUTE OPERATIONS

Publication number:

US20250299032

Publication date:
Section:

Physics

Class:

G06N3/063

Inventors:

Assignee:

Applicant:

Smart overview of the Invention

The patent application describes an apparatus designed to enhance neural network computations by dynamically selecting between high-precision and low-precision compute components. The apparatus includes a compute engine containing both components, along with logic to receive an instruction and execute it using the appropriate one. Selection is performed by applying a gate to either the high-precision or the low-precision component, so that each instruction executes on the component that matches its computational needs.
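The gating idea can be illustrated with a minimal software sketch. This is not the patented hardware design; the function and parameter names (`execute`, `allow_low_precision`) are illustrative assumptions, and NumPy dtypes stand in for the hardware's high- and low-precision components.

```python
import numpy as np

def execute(a, b, allow_low_precision):
    """Dispatch a matrix-multiply instruction to a low- or
    high-precision path based on a gate signal (illustrative only)."""
    if allow_low_precision:
        # Gate enables the low-precision component: compute in float16,
        # then widen the result for downstream use.
        return (a.astype(np.float16) @ b.astype(np.float16)).astype(np.float32)
    # Gate enables the high-precision component: compute in float32.
    return a @ b

a = np.ones((4, 4), dtype=np.float32)
b = np.ones((4, 4), dtype=np.float32)
hi = execute(a, b, allow_low_precision=False)
lo = execute(a, b, allow_low_precision=True)
```

In this sketch both paths produce the same result for small values; the benefit of the low-precision path in real hardware is throughput and energy, at the cost of reduced numeric range.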

Background

Machine learning, particularly neural networks, benefits significantly from parallel processing capabilities. General-purpose graphics processing units (GPGPUs) are integral in efficiently implementing these computations due to their parallel architecture. The single instruction, multiple thread (SIMT) architecture of GPGPUs maximizes parallel processing efficiency, allowing for the training of high-capacity networks on large datasets.

Detailed Description

The embodiments discussed can be implemented using various combinations of hardware and software in different processors, such as GPGPUs, CPUs, and GPUs. These embodiments are applicable in diverse computing systems, including mobile devices like smartphones and wearables. A GPU can accelerate various operations when communicatively coupled with processor cores through high-speed interconnects such as PCIe or NVLink.

System Configuration

The system includes a processing subsystem with one or more processors connected to a system memory and an I/O subsystem through a memory hub. Parallel processors form a computationally focused system that may include many integrated cores or a graphics processing subsystem. These components can be integrated into a single package or chip, forming configurations such as System on Chip (SoC) or System in Package (SIP).
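The topology described above can be sketched as a simple data model. All class and field names here are assumptions chosen for illustration, not terms from the application itself.

```python
from dataclasses import dataclass, field

@dataclass
class MemoryHub:
    """Hub connecting processors to system memory and the I/O subsystem."""
    system_memory_gb: int

@dataclass
class ParallelProcessor:
    """A many-core parallel processor, e.g. a graphics processing subsystem."""
    cores: int

@dataclass
class ProcessingSubsystem:
    """One or more processors linked to memory and I/O through a memory hub."""
    processors: list
    memory_hub: MemoryHub
    parallel_processors: list = field(default_factory=list)

subsystem = ProcessingSubsystem(
    processors=["cpu0", "cpu1"],                 # one or more processors
    memory_hub=MemoryHub(system_memory_gb=64),   # routes memory and I/O traffic
    parallel_processors=[ParallelProcessor(cores=128)],
)
```

Whether these components sit on separate dies, in a single package (SIP), or on one chip (SoC) is an integration choice; the logical topology stays the same.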

Flexibility and Integration

The described computing system is flexible in its configuration, allowing for variations in connection topology and integration of components. For instance, processor connections to memory and I/O hubs can vary, and multiple processor sets can be connected via multiple sockets. This flexibility supports diverse applications and optimizations for specific computational tasks.