Invention Title:

SYSTEM AND METHODS FOR PIPELINED HETEROGENEOUS DATAFLOW FOR ARTIFICIAL INTELLIGENCE ACCELERATORS

Publication number:

US20250328762

Publication date:

2025-10-23

Section:

Physics

Class:

G06N3/08

Inventors:

Kerem Akarvardar 🇹🇼 Hsinchu, Taiwan

Xiaoyu Sun 🇹🇼 Hsinchu, Taiwan

Assignee:

TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY, LTD. 🇹🇼 Hsinchu, Taiwan

Applicant:

TAIWAN SEMICONDUCTOR MANUFACTURING COMPANY LTD. 🇹🇼 Hsinchu, Taiwan

Smart overview of the Invention

The patent application describes systems and methods for a pipelined heterogeneous dataflow in artificial intelligence (AI) accelerators. The systems use a pipelined processing core made up of two processing cores: a first core with a matrix array of processing elements (PEs) arranged in rows and columns, and a second core configured to receive outputs from the first core. Each PE can perform multiply-and-accumulate (MAC) operations on inputs and weights.
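As a rough illustration of what a row-and-column array of MAC-capable PEs computes, the following Python sketch models each PE as an object that holds one weight and adds `input * weight` to an incoming partial sum. The names `ProcessingElement` and `pe_array_matvec`, and the choice to accumulate partial sums down each column, are assumptions made for this example and are not taken from the application.

```python
from dataclasses import dataclass


@dataclass
class ProcessingElement:
    """Hypothetical PE: stores one weight and performs a MAC on demand."""
    weight: float
    acc: float = 0.0

    def mac(self, x: float, partial_sum: float = 0.0) -> float:
        # Multiply-and-accumulate: incoming partial sum plus input * weight.
        self.acc = partial_sum + x * self.weight
        return self.acc


def pe_array_matvec(weights, inputs):
    """Feed inputs[r] to every PE in row r and accumulate down each column.

    weights is a rows x cols list of lists; inputs has one value per row.
    Returns one accumulated output per column.
    """
    rows, cols = len(weights), len(weights[0])
    pes = [[ProcessingElement(weights[r][c]) for c in range(cols)]
           for r in range(rows)]
    outputs = []
    for c in range(cols):
        psum = 0.0
        for r in range(rows):
            # Each PE adds its product to the partial sum from the PE above.
            psum = pes[r][c].mac(inputs[r], psum)
        outputs.append(psum)
    return outputs
```

Calling `pe_array_matvec([[1, 2], [3, 4]], [10, 20])` yields one dot product per column. A hardware array would evaluate these MACs in parallel, not in a nested loop.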

Background

AI accelerators are specialized hardware designed to efficiently process AI workloads, such as neural networks. These accelerators often use systolic arrays to perform operations like multiplication and accumulation. Traditional AI accelerators typically support a fixed dataflow, which may not be optimal for all types of AI workloads, given the variety in layer types and shapes such as convolutional (CONV) and fully connected (FC) layers.

Innovation

The disclosed technology uses separate CONV and FC cores, each tailored to its own workload, and improves performance by pipelining computations. For instance, the CONV core is configured for weight-stationary dataflow, while the FC core uses input-stationary dataflow. Optimizing the dataflow for each core type and reducing interconnect overhead lowers overall latency and raises throughput.
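The difference between the two dataflows can be sketched in software as a difference in loop order: which operand is fetched once and held while the other streams past it. The Python functions below are a minimal analogy under that assumption, not the hardware described in the application. The function names and the load counters are hypothetical and exist only to show which operand stays put.

```python
def matmul_weight_stationary(X, W):
    """Y = X @ W with each weight fetched once and held (weight stationary)."""
    B, K, N = len(X), len(W), len(W[0])
    Y = [[0.0] * N for _ in range(B)]
    weight_loads = 0
    for k in range(K):
        for n in range(N):
            w = W[k][n]            # weight is loaded once and stays in place
            weight_loads += 1
            for b in range(B):     # inputs stream past the held weight
                Y[b][n] += X[b][k] * w
    return Y, weight_loads


def matmul_input_stationary(X, W):
    """Y = X @ W with each input fetched once and held (input stationary)."""
    B, K, N = len(X), len(W), len(W[0])
    Y = [[0.0] * N for _ in range(B)]
    input_loads = 0
    for b in range(B):
        for k in range(K):
            x = X[b][k]            # input is loaded once and stays in place
            input_loads += 1
            for n in range(N):     # weights stream past the held input
                Y[b][n] += x * W[k][n]
    return Y, input_loads
```

Both functions return the same product; only the count of stationary-operand loads differs (`K * N` weights versus `B * K` inputs). That illustrates why the better choice can depend on layer shape.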

Technical Advantages

The pipelined architecture offers technical advantages over conventional fixed-dataflow designs. Eliminating unnecessary horizontal weight forwarding in the FC core reduces computational overhead and improves efficiency. Pipelining also makes deep neural network calculations more efficient, which can improve performance in real-world applications.
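The general benefit of pipelining two stages can be shown with a small timing model. The Python sketch below assumes a generic two-stage pipeline in which a CONV stage feeds an FC stage, each stage handles one item at a time, and finished CONV results can wait in a buffer. The stage times and function names are hypothetical and do not come from the application.

```python
def sequential_latency(n_items, t_conv, t_fc):
    """Total time when each item finishes both stages before the next starts."""
    return n_items * (t_conv + t_fc)


def pipelined_latency(n_items, t_conv, t_fc):
    """Total time when the FC stage works on item i while CONV works on i+1."""
    conv_done = 0  # time at which the CONV stage finishes its current item
    fc_done = 0    # time at which the FC stage finishes its current item
    for _ in range(n_items):
        conv_done += t_conv
        # FC starts once its input is ready and the FC stage itself is free.
        fc_done = max(conv_done, fc_done) + t_fc
    return fc_done
```

Under this model the pipelined total equals `t_conv + t_fc + (n_items - 1) * max(t_conv, t_fc)`, so the slower stage sets the steady-state throughput.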

Implementation Details

The pipelined core serves as a building block for systolic array-based AI accelerators, processing data in waves through MAC operations. The architecture can also be adapted to configurations other than systolic arrays, such as vector engines or other heterogeneous architectures, so the design can be used across diverse computing scenarios.