Advanced Micro Devices is launching the AMD Instinct MI100, a new graphics processing unit (GPU) designed specifically to accelerate scientific research computing.
The 7-nanometer GPU accelerator uses AMD’s CDNA architecture to handle high-performance computing (HPC) and AI processing so scientists can work on heavy-duty computing tasks, such as coronavirus research.
Santa Clara, California-based AMD said the chip is the world’s fastest HPC GPU and the first x86 server GPU to surpass 10 teraflops of FP64 performance.
AMD VP Brad McCredie said in a press briefing that there is ample documentation showing that CPU performance gains in datacenter applications have slowed relative to those of GPUs, and that more recently general-purpose GPUs have also begun to slow their rate of progress. That’s why the company has split its design efforts between consumer graphics GPUs and enterprise/server GPUs, as the needs of graphics and AI processing can be very different. The two-architecture strategy stands in contrast to rival Nvidia’s use of a single architecture for both markets.
“We believe with this broad set of function and capability, this GPU will service both the needs of supercomputing customers, as well as the broad needs of classic HPC clients,” McCredie said.
The device is supported by new accelerated computing systems from AMD’s customers, including Dell, Gigabyte, Hewlett Packard Enterprise, and Supermicro.
The chip competes with a new 80GB version of the Nvidia A100 GPU, which was also announced today.
“The new MI100 from AMD is a solid step forward for HPC, as it provides [close to] 18% better peak floating-point performance than the original 40GB A100 from Nvidia,” Moor Insights & Strategy analyst Karl Freund said in an email to VentureBeat. “However, real applications may derive more benefit from the 80GB of HBM2e (high bandwidth) memory that the new A100 offers. The attractive price/performance of the MI100 may tilt price-sensitive customers in AMD’s favor.”
Freund added, “For AI, I do not believe the AMD MI100 has adequate performance to take on Nvidia in the bulk of the market, which lies in cloud service providers, ecommerce, and social media. Overall, the MI100 is a step toward HPC exascale in 2023, while the 80GB A100 can double or even triple AI application performance over the six-month-old A100. In AI, Nvidia raised the bar yet again, and I do not see any competitors who can clear that hurdle.”
Combined with the second-generation AMD Epyc central processing units (CPUs) and ROCm 4.0 open software, the MI100 is designed to help scientists make scientific breakthroughs, McCredie said.
“High-performance computing is really playing a really important role in the analysis of the possibility of contracting the [coronavirus], developing vaccines, and all sorts of life sciences applications,” senior VP Dan McNamara said in a press briefing.
GPUs for graphics and the enterprise
AMD’s compute-focused GPU architecture is dubbed CDNA, and its Radeon graphics GPU architecture is called RDNA. The company recently released its high-end graphics chips, the 7-nanometer Radeon RX 6000 series GPUs.
McNamara said, “These two workloads … don’t really need to coexist. There’s no need to have an insane chip be able to go and play Halo and also perform advanced molecular simulation, seismic analysis, or astrophysics simulation.”
The MI100 offers up to 11.5 teraflops of peak FP64 performance for HPC and up to 46.1 teraflops peak FP32 Matrix performance for AI and machine learning workloads.
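For a rough sense of where those figures come from, here is a back-of-the-envelope calculation based on AMD’s published MI100 specifications rather than numbers from the briefing: the chip’s 120 compute units contain 7,680 stream processors, and at a peak engine clock of roughly 1.5 GHz that works out to about 7,680 × 2 FLOPS per cycle × 1.5 GHz ≈ 23.1 teraflops of standard FP32 throughput. FP64 runs at half that vector rate, giving the 11.5-teraflop figure, while the new Matrix Core engines double FP32 throughput for matrix math, yielding the 46.1-teraflop FP32 Matrix number.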
Open software platform
The AMD ROCm developer software provides the foundation for “exascale computing,” McCredie said. ROCm 4.0 has been optimized to deliver performance at scale for MI100-based systems.
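To illustrate what ROCm looks like from the developer side, here is a minimal HIP C++ sketch: a simple vector-add kernel of the kind the ROCm hipcc compiler builds for Instinct GPUs. It is a generic, illustrative example, not code from AMD or from the MI100 launch materials.

```cpp
// Minimal HIP vector-add sketch (illustrative; not AMD's code).
// Build with ROCm: hipcc vector_add.cpp -o vector_add
#include <hip/hip_runtime.h>
#include <cstdio>
#include <vector>

__global__ void vector_add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  // global thread index
    if (i < n) c[i] = a[i] + b[i];
}

int main() {
    const int n = 1 << 20;                               // 1M elements
    std::vector<float> ha(n, 1.0f), hb(n, 2.0f), hc(n);

    float *da, *db, *dc;
    hipMalloc((void**)&da, n * sizeof(float));
    hipMalloc((void**)&db, n * sizeof(float));
    hipMalloc((void**)&dc, n * sizeof(float));
    hipMemcpy(da, ha.data(), n * sizeof(float), hipMemcpyHostToDevice);
    hipMemcpy(db, hb.data(), n * sizeof(float), hipMemcpyHostToDevice);

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    hipLaunchKernelGGL(vector_add, dim3(blocks), dim3(threads), 0, 0, da, db, dc, n);

    hipMemcpy(hc.data(), dc, n * sizeof(float), hipMemcpyDeviceToHost);
    printf("c[0] = %.1f (expected 3.0)\n", hc[0]);

    hipFree(da); hipFree(db); hipFree(dc);
    return 0;
}
```

The same HIP source can also be compiled for Nvidia GPUs through HIP’s CUDA backend, which is part of ROCm’s portability pitch to HPC developers.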
The Oak Ridge Leadership Computing Facility at Oak Ridge National Laboratory has received early access to the accelerator chip, and the early results are promising, with 2 to 3 times the performance of other GPUs, according to Bronson Messer, the facility’s director of science.
The chips are expected to be available in systems at the end of the year.
“You’re going to get twice as much science for the same amount of money,” McCredie said.