Google Dethrones NVIDIA With Split Results In Latest Artificial Intelligence Benchmarking Tests

Digital transformation is driving the creation of artificial intelligence workloads at an unprecedented scale. These workloads require corporations to collect and store mountains of data. Even as business intelligence is extracted from current machine learning models, new data inflows are used to build new models and update existing ones.

Building AI models is complex and expensive, and it differs greatly from traditional software development. Artificial intelligence models need specialized hardware for accelerated compute and high-performance storage, as well as purpose-built infrastructure to handle AI's technical nuances.

In today’s world, many critical business decisions and customer-facing services rely on accurate machine learning insights. To train, run, and scale models as quickly and accurately as possible, an enterprise must have the knowledge to choose the best hardware and software for its machine learning applications.

Benchmarking

MLCommons is an open engineering consortium whose standardized benchmarking has made it easier for companies to make machine learning decisions. Its mission is to make machine learning better for everyone. The tests it conducts, and the unbiased comparisons they enable, help companies determine which vendor best suits their artificial intelligence application requirements. The foundation for MLCommons was laid when MLPerf benchmarking began in 2018.

MLCommons recently conducted a benchmarking program called MLPerf Training v2.0 to measure the performance of hardware and software used to train machine learning models. There were 250 performance results reported from 21 different submitters, including Azure, Baidu, Dell, Fujitsu, GIGABYTE, Google, Graphcore, HPE, Inspur, Intel-Habana Labs, Lenovo, Nettrix, NVIDIA, Samsung, and Supermicro.

This round of testing focused on determining how long it takes to train various neural networks. Faster model training leads to speedier model deployment, which improves the model’s total cost of ownership (TCO) and return on investment (ROI).

A new object detection benchmark was added to MLPerf Training 2.0, which trains the new RetinaNet reference model on a larger and more diverse dataset called Open Images. This new test reflects state-of-the-art ML training for applications like collision avoidance for vehicles and robotics, retail analytics, and many others.

Results

Machine learning has seen much innovation since 2021, in both hardware and software. For the first time since MLPerf began, Google’s cloud-based TPU v4 ML supercomputer outperformed the NVIDIA A100 in four of the eight training tests, which cover language (2), computer vision (4), reinforcement learning (1), and recommender systems (1).

According to the graphic comparing the performance of Google and NVIDIA, Google had the quickest training times for BERT (language), ResNet (image recognition), RetinaNet (object detection), and Mask R-CNN (image segmentation). As for DLRM (recommendation), Google came in narrowly ahead of NVIDIA, but that submission was a research project and unavailable for public use.

Overall, Google submitted scores for five of the eight benchmarks; its best training times are shown below:

In a discussion with Vikram Kasivajhula, Google’s Director of Product Management for ML Infrastructure, I asked what approach Google used to make such dramatic improvements in the TPU v4.

“We’ve been focusing on addressing the pain points of large model users who are innovating at the frontiers of machine learning,” he said. “Our cloud product is in fact, an instantiation of this focus. We have also been focusing on performance per dollar. As you can imagine, these models get incredibly large and expensive to train. One of our priorities is to make sure it is affordable.”

A one-of-a-kind submission

A unique submission was made to MLPerf Training 2.0 by a Stanford graduate student, Tri Dao. Dao submitted an 8-A100 system for BERT training.

NVIDIA also had a submission using the same configuration as Dao. I suspect it was a courtesy submission by NVIDIA to provide Dao with a documented point of comparison.

NVIDIA finished training the BERT model with its 8-A100 system in 18.442 minutes, while Dao’s submission took only 17.402 minutes. He achieved the faster training time by using a method called FlashAttention, a fast, memory-efficient way of computing attention. Attention is a technique that mimics cognitive attention: it enhances some parts of the input data while diminishing other parts, the motivation being that the network should devote more focus to the small but important parts of the data.
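For readers unfamiliar with the computation involved, the standard scaled dot-product attention that FlashAttention accelerates can be sketched in a few lines of NumPy. This is a minimal illustration of the textbook formulation, not Dao's tiled, memory-aware kernel; the toy matrix sizes are arbitrary:

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Textbook attention: softmax(Q @ K^T / sqrt(d)) @ V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # similarity of each query to each key
    scores -= scores.max(axis=-1, keepdims=True)    # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: each row sums to 1
    return weights @ V                              # weighted mix of the value vectors

# Toy example: 3 queries, keys, and values of dimension 4
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 4))
K = rng.normal(size=(3, 4))
V = rng.normal(size=(3, 4))
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (3, 4)
```

The softmax weights are what "enhance some parts of the input while diminishing others"; FlashAttention's contribution is computing this same result without materializing the full score matrix in slow GPU memory.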

Wrap-up

Over the past three years, Google has made a lot of progress with its TPU. Similarly, NVIDIA has used its A100 successfully for four years. A great deal of software improvement has been put into the A100, as evidenced by its long record of accomplishments.

We are likely to see NVIDIA submissions in 2023 using both its A100 and the new H100, a beast by any current standard. Everyone was hoping to see H100 performance this year, but NVIDIA did not submit it since it was not publicly available.

Software improvements in general were obvious in the latest results. Kasivajhula said that hardware was only half the story of Google’s improved benchmarks. The other half was software optimizations.

“Many optimizations were learned from our own cutting edge benchmark use cases across YouTube and search,” he said. “We are now making them available to users.”

Google also made several performance improvements to the virtualization stack to fully utilize the compute power of both CPU hosts and TPU chips. The results of these software improvements showed in Google’s peak performance on image and recommendation models.

Overall, Google Cloud TPUs offer significant performance and cost savings at scale. It will take time to find out whether those advantages are enough to entice more customers to switch to Google Cloud TPUs.

Longer term, Google’s top results in the major categories may foreshadow NVIDIA achieving fewer top MLPerf results in the future. It is in the ecosystem’s best interest to see heavy contention among multiple vendors for MLPerf top performance results.

One thing is certain: MLPerf Training 2.0 was much more interesting than previous rounds, in which NVIDIA claimed performance victories in almost every category.

Full results of MLPerf Training 2.0 are available here.

Paul Smith-Goodson is the Vice President and Principal Analyst for quantum computing, artificial intelligence and space at Moor Insights and Strategy. You can follow him on Twitter for current information on quantum, AI, and space.

Note: Moor Insights & Strategy writers and editors may have contributed to this article.

Moor Insights & Strategy, like all research and tech industry analyst firms, provides or has provided paid services to technology companies. These services include research, analysis, advising, consulting, benchmarking, acquisition matchmaking, and speaking sponsorships. The company has had or currently has paid business relationships with 8×8, Accenture, A10 Networks, Advanced Micro Devices, Amazon, Amazon Web Services, Ambient Scientific, Anuta Networks, Applied Brain Research, Applied Micro, Apstra, Arm, Aruba Networks (now HPE), Atom Computing, AT&T, Aura, Automation Anywhere, AWS, A-10 Strategies, Bitfusion, Blaize, Box, Broadcom, C3.AI, Calix, Campfire, Cisco Systems, Clear Software, Cloudera, Clumio, Cognitive Systems, CompuCom, Cradlepoint, CyberArk, Dell, Dell EMC, Dell Technologies, Diablo Technologies, Dialogue Group, Digital Optics, Dreamium Labs, D-Wave, Echelon, Ericsson, Extreme Networks, Five9, Flex, Foundries.io, Foxconn, Frame (now VMware), Fujitsu, Gen Z Consortium, Glue Networks, GlobalFoundries, Revolve (now Google), Google Cloud, Graphcore, Groq, Hiregenics, Hotwire Global, HP Inc., Hewlett Packard Enterprise, Honeywell, Huawei Technologies, IBM, Infinidat, Infosys, Inseego, IonQ, IonVR, Infiot, Intel, Interdigital, Jabil Circuit, Keysight, Konica Minolta, Lattice Semiconductor, Lenovo, Linux Foundation, Lightbits Labs, LogicMonitor, Luminar, MapBox, Marvell Technology, Mavenir, Marseille Inc, Mayfair Equity, Meraki (Cisco), Merck KGaA, Mesophere, Micron Technology, Microsoft, MiTEL, Mojo Networks, MongoDB, MulteFire Alliance, National Instruments, Neat, NetApp, Nightwatch, NOKIA (Alcatel-Lucent), Nortek, Novumind, NVIDIA, Nutanix, Nuvia (now Qualcomm), onsemi, ONUG, OpenStack Foundation, Oracle, Palo Alto Networks, Panasas, Peraso, Pexip, Pixelworks, Plume Design, PlusAI, Poly (formerly Plantronics), Portworx, Pure Storage, Qualcomm, Quantinuum, Rackspace, Rambus, Rayvolt E-Bikes, Red Hat, Renesas, Residio, Samsung Electronics, Samsung Semi, SAP, SAS, Scale Computing, Schneider Electric, SiFive, Silver Peak (now Aruba-HPE), SkyWorks, SONY Optical Storage, Splunk, Springpath (now Cisco), Spirent, Sprint (now T-Mobile), Stratus Technologies, Symantec, Synaptics, Syniverse, Synopsys, Tanium, Telesign, TE Connectivity, TensTorrent, Tobii Technology, Teradata, T-Mobile, Treasure Data, Twitter, Unity Technologies, UiPath, Verizon Communications, VAST Data, Ventana Micro Systems, Vidyo, VMware, Wave Computing, Wellsmith, Xilinx, Zayo, Zebra, Zededa, Zendesk, Zoho, Zoom, and Zscaler. Moor Insights & Strategy founder, CEO, and Chief Analyst Patrick Moorhead is an investor in dMY Technology Group Inc. VI, Dreamium Labs, Groq, Luminar Technologies, MemryX, and Movandi.

