New AMD CPU Patent Reveals 3D Stacked Machine Learning Accelerator Design

On September 25, 2020, AMD filed a patent for a unique processor that places a vertically stacked machine learning (ML) accelerator on the I/O die, or IOD. AMD could be preparing a data center system-on-chip (SoC) with integrated Field Programmable Gate Arrays (FPGAs) or GPU-based machine learning accelerators. The design would eventually put an FPGA or GPU on top of the processor’s I/O die, similar to how AMD stacks additional cache on its newest processors.

AMD Begins Focus on 3D Stacked Machine Learning Accelerators in Latest Patent Innovations

The technology is notable because it would let the company add further classes of accelerators to future SoC processors. A patent does not guarantee that the newly designed processors will ever reach the market, but it does hint at what the future might hold given the right research and development. AMD has not commented publicly on the patent, so we can only speculate about what the company is planning for these designs.


AMD’s “Direct Attached Machine Learning Accelerator” patent describes how the company could use an ML accelerator stacked directly on the processor’s IOD. The design pairs a compute FPGA or GPU for ML workloads with the I/O die through a dedicated accelerator port. The stacked accelerator could use local memory in one of two ways: memory channels attached to the IOD, or a separate pool of memory that is not routed through the IOD at all.
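As a rough way to visualize the options described above, the sketch below models the package as data: CPU chiplets, an I/O die, and an optional accelerator die stacked on the IOD that uses either IOD-attached or dedicated memory. Every name, type, and figure here is hypothetical, drawn from the patent’s high-level description rather than any published AMD specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional

# Illustrative model only: all identifiers and counts are hypothetical,
# sketched from the patent's description, not from AMD documentation.

class AcceleratorType(Enum):
    FPGA = "compute FPGA"
    GPU = "compute GPU"

class MemoryAttachment(Enum):
    IOD_ATTACHED = "memory channels owned by the I/O die"
    DEDICATED = "separate local memory not routed through the IOD"

@dataclass
class StackedAccelerator:
    kind: AcceleratorType
    memory: MemoryAttachment

@dataclass
class SocPackage:
    compute_dies: int                              # CPU core chiplets (CCDs)
    iod_accelerator: Optional[StackedAccelerator]  # die stacked on the IOD

# Two of the configurations the patent text appears to allow:
fpga_variant = SocPackage(
    compute_dies=12,
    iod_accelerator=StackedAccelerator(AcceleratorType.FPGA,
                                       MemoryAttachment.IOD_ATTACHED),
)
gpu_variant = SocPackage(
    compute_dies=12,
    iod_accelerator=StackedAccelerator(AcceleratorType.GPU,
                                       MemoryAttachment.DEDICATED),
)
```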

Machine learning is usually synonymous with the data center, and this technology would let AMD’s chips take on far larger ML workloads. The approach described in the patent would speed up those workloads without the expensive, custom silicon typically built into system chips, while also promising greater power efficiency, faster data transmission, and more capacity.

The timing of the patent seems strategic, as it was filed close to the AMD/Xilinx acquisition. With the patent published in late March 2022, just over a year and a half after filing, the new designs, if they come to fruition, could appear as soon as 2023. The inventor listed on the patent is AMD Fellow Maxim V. Kazakov.


AMD is creating new EPYC processors, codenamed Genoa and Bergamo, that use a chiplet design built around an I/O die, which could be combined with an accelerator. That would make it possible for AMD to offer AI-focused processors in the Genoa and Bergamo series fitted with machine learning accelerators.

Speaking of AMD’s EPYC line, the company is targeting a higher 600W cTDP, or configurable thermal design power, for the fifth-generation EPYC Turin line of processors, roughly twice the cTDP of the current EPYC 7003 Milan series. Additionally, the company’s fourth- and fifth-generation EPYC processors on the SP5 platform can draw up to 700W in short bursts. Adding an ML accelerator to a Genoa or Bergamo processor would increase power consumption further, and future server chips stand to benefit from vertically stacked accelerators such as AMD’s newly patented ML-accelerated processor design.
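As a rough sanity check on those numbers, the quick calculation below compares the figures quoted in this article. Milan’s 280W configurable TDP ceiling is a widely reported platform spec and is used here as an assumption; it does not come from the patent.

```python
# Back-of-the-envelope comparison of the power figures discussed above.
milan_ctdp_w = 280   # assumed cTDP ceiling of EPYC 7003 "Milan" (widely reported spec)
turin_ctdp_w = 600   # reported cTDP target for 5th-gen EPYC "Turin"
sp5_burst_w = 700    # reported short-burst peak for SP5-platform EPYC parts

print(f"Turin vs. Milan cTDP: {turin_ctdp_w / milan_ctdp_w:.1f}x")             # ~2.1x
print(f"Burst headroom above the Turin cTDP: {sp5_burst_w - turin_ctdp_w} W")  # 100 W
```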

It should be understood that many variations are possible based on the disclosure here[…]

Suitable processors include, by way of example, general purpose processor, special purpose processor, conventional processor, graphics processor, machine learning processor, [a DSP, an ASIC, an FPGA] and other types of integrated circuits (ICs).

[…] Such processors can be manufactured by setting up a manufacturing process using the results of processed hardware description language (HDL) instructions and other intermediate data, including netlists (such instructions may be stored on a computer-readable medium).

— excerpt from the AMD patent “Direct-connected machine learning accelerator”

With the help of Xilinx technology, the company can now offer compute-focused GPU designs, adaptive FPGA designs, programmable processors from Pensando, and a solid x86 microarchitecture. Multi-chip designs built on AMD’s Infinity Fabric interconnect are already a reality for the enterprise. Vertically stacked data center processors would give enterprises even more options, combining multi-tile data center APUs built on TSMC’s performance-oriented N4X node with a GPU or an FPGA accelerator produced on the enhanced N3E process technology.

The crucial takeaway from the published patent is the machine learning accelerator technology itself and its place in the future of mainstream processors. Integrating the accelerator more broadly across future product lines would give AMD a more diverse portfolio and put the company at the forefront of data center and customer-specific applications.

Norma A. Roth