Intel’s Habana overtook Nvidia in the latest results from MLPerf, which has become the industry-standard benchmark suite for comparing AI accelerators. Although Nvidia has already announced its next-generation GPU, the results indicate that competition in deep learning training hardware is heating up.
Intel acquired the startup Habana in late 2019 for $2 billion, and late last year its first-generation 16nm Gaudi NPU (neural processing unit) went live in Amazon’s AWS cloud, with a claimed 40% better performance per dollar than Nvidia-based instances. However, because it competed with Nvidia’s 7nm A100, Gaudi delivered that value mostly through a lower price rather than by beating Nvidia on performance.
That changed in May when Habana announced Gaudi2 on 7nm, which triples the number of tensor processing cores and offers up to 96 GB of HBM2e. Habana claimed it outperformed the A100, Nvidia’s two-year-old primary data center GPU, by a comfortable margin. The launch came just in time for Gaudi2 to be included in the latest round of MLPerf, the industry’s attempt to standardize deep learning benchmarking.
Habana said it had only 10 days from launch to submit its results, so it was unable to complete all eight tests and focused on the two best-known benchmarks: ResNet-50 (image recognition) and BERT (natural language processing). MLPerf submissions go through a month-long peer review process.
Habana also said the short timeline meant it has not yet had time to optimize its software extensively. For example, Gaudi2 adds support for a new lower-precision FP8 format, which was not used in the submission. Instead, Habana chose to submit results based on the same software that is available to all Habana customers, while Nvidia allegedly uses optimizations that are not available in the software it ships to customers.
According to Habana, this means the performance gap is larger when both vendors run unoptimized, publicly available software. In its own tests using public repositories on Azure instances, Habana measured Gaudi2 as at least 2x faster than the A100 on ResNet-50 and BERT. Habana says these results are more representative of the out-of-the-box performance customers will see.
In the MLPerf results, compared to Nvidia’s submission, Gaudi2 trained ResNet-50 in 36% less time, which translates to 56% higher performance. That said, the MLPerf results from deep learning startup MosaicML, which used PyTorch, showed a training time of 23.8 minutes that beat Nvidia’s own submission, although it was still slower than Gaudi2. On the other hand, further software optimizations could also reduce Gaudi2’s time in future submissions.
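The step from "36% less time" to "56% higher performance" is simple arithmetic: throughput is the inverse of training time. A minimal sketch, using only the percentages quoted in this article (the function name is ours, for illustration):

```python
def speedup_from_time_reduction(reduction: float) -> float:
    """Convert a fractional reduction in training time into a speedup factor.

    If a job takes (1 - reduction) of the baseline time, it runs
    1 / (1 - reduction) times as fast as the baseline.
    """
    if not 0.0 <= reduction < 1.0:
        raise ValueError("reduction must be in the range [0, 1)")
    return 1.0 / (1.0 - reduction)


# Percentages quoted in the article for Gaudi2 versus the A100.
resnet_speedup = speedup_from_time_reduction(0.36)  # ResNet-50: 36% less time
bert_speedup = speedup_from_time_reduction(0.07)    # BERT: 7% less time

print(f"ResNet-50: about {(resnet_speedup - 1) * 100:.0f}% higher performance")
print(f"BERT: about {(bert_speedup - 1) * 100:.1f}% higher performance")
```

The same conversion shows why the BERT result is a much narrower win: 7% less time works out to only about 7.5% higher performance.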
On BERT the win was smaller, with Gaudi2 taking 7% less time than the A100. Compared to the first-generation Gaudi, Gaudi2 was 3x and 4.7x faster on ResNet-50 and BERT, respectively. Results for all accelerators are based on 8-card servers. Habana also showed results for a 256-accelerator system, which delivered almost 25x the performance of a single server, against a theoretical scaling limit of 32x. This shows that performance largely holds up in the scaled-out configurations in which these chips are often deployed.
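To put the scaling figures in perspective, the sketch below computes scaling efficiency as observed speedup divided by ideal linear speedup. It assumes the 256 figure counts accelerators measured against the 8-accelerator server baseline, which is what the quoted 32x limit implies (256 / 8 = 32). The "almost 25x" is treated as exactly 25 for illustration.

```python
def scaling_efficiency(base_units: int, scaled_units: int,
                       observed_speedup: float) -> float:
    """Return observed speedup as a fraction of ideal linear speedup."""
    if base_units <= 0 or scaled_units <= 0:
        raise ValueError("unit counts must be positive")
    ideal_speedup = scaled_units / base_units  # 256 / 8 = 32 here
    return observed_speedup / ideal_speedup


# Assumption: 8 accelerators as the baseline, 256 as the scaled system,
# and "almost 25x" rounded to 25.0.
efficiency = scaling_efficiency(base_units=8, scaled_units=256,
                                observed_speedup=25.0)

print(f"Scaling efficiency: {efficiency:.0%}")
```

Under those assumptions, the large system retains roughly 78% of ideal linear scaling.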
The thesis of most AI startups was that they could beat Nvidia by throwing out all the GPU-specific hardware and focusing only on AI. Even though it had only a few days between the official launch and the submission deadline, Habana’s Gaudi2 beat Nvidia’s A100, with both chips made on 7nm process technology, using off-the-shelf hardware and commercially available software. Habana further claims that the performance difference on unoptimized code, outside of MLPerf, can be more than 2x. Since Habana is likely to offer Gaudi2 at a lower price than Nvidia’s A100, and each Gaudi chip also has 24 built-in 100G Ethernet ports, the difference in total cost of ownership may be even greater, as Habana and AWS already claim for the first-generation Gaudi.
While Habana may have taken the performance crown this round, Nvidia has already announced its next-gen H100 with availability later this year. Habana has not yet announced cloud instances for Gaudi2.