US20260017228
2026-01-15
Physics
G06F15/80
The disclosed hardware accelerator offers a flexible configuration that supports various data types and operation flows, enhancing the processing of AI workloads. It includes multiple fixed tensor operation logic units and tensor operation pipeline logic. The accelerator receives a pipeline command from the processor that defines a series of tensor operation stages, processes tensor data through those stages, and returns the resulting pipeline output to the processor.
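As a rough illustration of the pipeline-command concept (a hedged software sketch; the stage names, fields, and operations below are hypothetical and not taken from the application), a host processor might describe the ordered stages and hand the whole definition to the accelerator in one command:

```python
# Minimal software stand-in for a pipeline command and its execution.
# All identifiers here are illustrative assumptions, not the patent's terms.
from dataclasses import dataclass

import numpy as np


@dataclass
class PipelineStage:
    op: str        # fixed tensor operation to run, e.g. "subtract" or "scale"
    params: dict   # per-stage configuration (constants, axes, ...)


@dataclass
class PipelineCommand:
    stages: list   # ordered tensor operation stages defined by the processor


def run_pipeline(command, tensor):
    """Push tensor data through each stage in order and return the result."""
    result = tensor
    for stage in command.stages:
        if stage.op == "subtract":
            result = result - stage.params["value"]
        elif stage.op == "scale":
            result = result * stage.params["factor"]
        # ... further fixed operations would be selected here ...
    return result  # pipeline result sent back to the processor


cmd = PipelineCommand(stages=[
    PipelineStage(op="subtract", params={"value": 1.0}),
    PipelineStage(op="scale", params={"factor": 0.5}),
])
print(run_pipeline(cmd, np.arange(4, dtype=np.float32)))
```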
In AI applications, hardware accelerators such as tensor processing units (TPUs) are crucial for efficiently handling neural network computations, which operate on large multidimensional data arrays called tensors. These applications often use quantization to reduce the size and computational cost of neural networks by lowering the precision of the data. However, traditional methods require separate quantization processes that add overhead. Additionally, conventional hardware demands specific data formats, limiting flexibility and potentially hindering performance for particular AI applications.
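For context on the quantization overhead mentioned above, the sketch below shows a common symmetric int8 quantization of a float tensor; it is a generic illustration of the technique, not the scheme used by the disclosed accelerator.

```python
# Generic symmetric int8 quantization sketch (not the patent's method):
# lower-precision storage trades a small rounding error for reduced size.
import numpy as np


def quantize_int8(x):
    scale = float(np.max(np.abs(x))) / 127.0
    if scale == 0.0:
        scale = 1.0                      # avoid division by zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


def dequantize(q, scale):
    return q.astype(np.float32) * scale


weights = np.random.randn(4).astype(np.float32)
q, scale = quantize_int8(weights)
print(weights, dequantize(q, scale))     # close, but not bit-identical
```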
The invention introduces a hardware accelerator that can be configured to handle different data types and operational flows, addressing the limitations of current technologies. It features a configurable pipeline processing element array with fixed tensor operation logic units. The array processes tensor data based on a received pipeline definition, outputting results efficiently. This flexibility allows for better adaptation to diverse AI tasks and data formats, potentially improving computational efficiency without sacrificing accuracy.
Existing hardware accelerators often require data to be reshaped into manufacturer-specified formats, adding time and cost, while fully programmable alternatives are slower and not optimized for large AI data sets. The invention provides a more adaptable solution whose fixed-function logic units are strategically limited to maximize performance. The hardware can perform operations such as splitting, subtracting, selecting, and concatenating data, and it is agnostic to numeric formats. This design also enables efficient implementation of programmable math operations using lookup tables.
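As one way of reading the lookup-table idea (a minimal sketch under assumed table size and input range, not the disclosed circuit), a nonlinear function such as a sigmoid can be approximated by indexing a precomputed table instead of evaluating the math directly:

```python
# Lookup-table approximation of a programmable math operation (sigmoid).
# Table size and input range are arbitrary illustrative choices.
import numpy as np

TABLE_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0
_xs = np.linspace(X_MIN, X_MAX, TABLE_SIZE)
SIGMOID_LUT = 1.0 / (1.0 + np.exp(-_xs))        # precomputed once


def sigmoid_lut(x):
    """Approximate sigmoid by nearest-entry table lookup instead of exp()."""
    idx = np.clip(
        np.round((x - X_MIN) / (X_MAX - X_MIN) * (TABLE_SIZE - 1)),
        0, TABLE_SIZE - 1,
    ).astype(int)
    return SIGMOID_LUT[idx]


x = np.array([-2.0, 0.0, 2.0])
print(sigmoid_lut(x))                            # close to 1 / (1 + exp(-x))
```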
The hardware accelerator integrates seamlessly with a processor to manage repetitive, intensive tasks in machine learning. It supports a machine learning program with training and inference modules, facilitating neural network operations. The training module adjusts connection weights using backpropagation, while the inference module processes input data to generate outputs. This integration enhances the overall efficiency and effectiveness of machine learning applications, accommodating evolving data science techniques and hardware advancements.
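To make the training/inference split concrete, the following is a minimal NumPy sketch of a gradient-based weight update and a forward pass on a one-parameter linear model; it illustrates the general idea of backpropagation-style training and inference, not the accelerator's actual modules.

```python
# Tiny linear model: the "training" step adjusts the connection weight with
# a gradient update, and the "inference" step runs the forward pass.
# Purely illustrative; not the patent's implementation.
import numpy as np

w = np.array(0.0)                       # connection weight


def train_step(x, target, lr=0.1):
    global w
    pred = w * x                        # forward pass
    grad = 2.0 * (pred - target) * x    # gradient of squared error w.r.t. w
    w = w - lr * grad.mean()            # weight update


def infer(x):
    return w * x                        # inference: input data -> output


x = np.array([1.0, 2.0, 3.0])
target = 2.0 * x                        # model should learn w close to 2
for _ in range(50):
    train_step(x, target)
print(infer(np.array([4.0])))           # roughly 8.0 after training
```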