Nvidia has recently revealed fresh performance numbers for its Hopper H100 compute GPU, which is widely used for deep learning workloads, showing it outperforming its predecessor, the A100, across inference benchmarks. Nvidia has also made progress in achieving performance gains through software optimizations alone. Furthermore, the company has provided early performance comparisons between its new compact L4 compute GPU and the T4 it replaces.
In September 2022, Nvidia shared results from the MLPerf 2.1 benchmark, which showed the H100 outperforming the A100 by up to 4.3-4.4 times across various inference workloads. The latest MLPerf 3.0 results not only confirm the H100's advantage over the A100 but also show that it is much faster than recently launched competitors such as Intel's Xeon Platinum 8480+ (Sapphire Rapids) processor, NeuChips' ReccAccel N3000, and Qualcomm's Cloud AI 100 solutions across several workloads.
The MLPerf 3.0 benchmark has confirmed Nvidia’s H100 compute GPU as a dominant performer on several workloads, such as image classification (ResNet 50 v1.5), natural language processing (BERT Large), speech recognition (RNN-T), medical imaging (3D U-Net), object detection (RetinaNet), and recommendation (DLRM).
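To make concrete what an "inference workload" in these tests measures, the sketch below times repeated ResNet-50 forward passes in PyTorch and reports an images-per-second figure. This is only an illustrative measurement under assumed settings (random weights, batch size 32, FP32); it is not the official MLPerf LoadGen harness that produces the submitted results.

```python
# Illustrative throughput measurement for an image-classification inference
# workload (ResNet-50). NOT the MLPerf harness; all settings are assumptions.
import time
import torch
import torchvision

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torchvision.models.resnet50(weights=None).eval().to(device)  # random weights
batch = torch.randn(32, 3, 224, 224, device=device)  # dummy input batch

with torch.no_grad():
    for _ in range(10):                  # warm-up iterations
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()         # flush queued GPU work before timing
    start = time.perf_counter()
    iters = 100
    for _ in range(iters):
        model(batch)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start

print(f"Throughput: {iters * batch.shape[0] / elapsed:.1f} images/sec")
```

MLPerf's rules go much further, fixing the model, accuracy targets, and query patterns, so the sketch above is only a rough analogue of what the official numbers represent.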
This advantage held not only against the previous-generation A100 but also against newer rivals such as Intel's Xeon Platinum 8480+ (Sapphire Rapids), NeuChips' ReccAccel N3000, and Qualcomm's Cloud AI 100. According to Nvidia, its GPUs also enjoy broader support across the machine learning industry than competing accelerators, as some of the benchmark workloads were unable to run on rival platforms at all.
While impressive, Nvidia's results in the MLPerf 3.0 benchmark require some context to be understood fully. There are two ways for vendors to submit results to MLPerf: closed and open. In the closed category, all vendors must run the same neural networks to ensure a fair, like-for-like comparison.
In the open category, by contrast, vendors can modify the networks to optimize performance for their hardware. Nvidia's results come from the closed category, which means that performance improvements rivals like Intel have made in the open category are not reflected in this comparison. It is therefore important to keep these limitations in mind when interpreting the benchmark results.
Nvidia's announcement also highlights how much of modern AI performance comes from software. According to the company, software optimizations alone delivered substantial gains for the flagship H100 in the MLPerf benchmark, on top of the hardware advantage it already holds over the A100 and rival solutions in the closed category. The gains the H100 has achieved through software optimization suggest that continued investment in this area has considerable potential to extend the capabilities of AI hardware.
In a recent blog post, Dave Salvator, Director of AI, Benchmarking, and Cloud at Nvidia, emphasizes the critical role of AI inference performance. Salvator compares the current moment in AI to the launch of the iPhone to illustrate how rapidly the technology is expanding across industries. According to Salvator, deploying AI on factory floors and in online recommendation systems has created an insatiable demand for efficient inference performance, and the surging popularity of language models such as ChatGPT only sharpens the need for faster, more powerful hardware. Inference performance, in short, has become a crucial driver of the success of AI applications.
Nvidia has also compared its compact L4 data center GPU with its T4 predecessor using the MLPerf 3.0 benchmark. The L4, which is built on the Ada Lovelace architecture, was found to be 2.2-3.1 times faster than the Turing-based T4 (TU104), and Dave Salvator highlighted the L4's enhanced capabilities in image decoding, video processing, graphics, and real-time rendering. While Nvidia's H100 and L4 compute GPUs have delivered impressive benchmark results, it is crucial to note that these numbers come from Nvidia's own submissions rather than independent testing; although both GPUs are already available through major system makers and cloud service providers, further third-party evaluation is needed to determine their true capabilities.