At Google I/O 2021, Google today formally announced its fourth-generation tensor processing units (TPUs), which the company claims can complete AI and machine learning training workloads in close-to-record wall clock time. Google says that clusters of TPUv4s can surpass the capabilities of previous-generation TPUs on workloads including object detection, image classification, natural language processing, machine translation, and recommendation benchmarks.
TPUv4 chips offer more than double the matrix multiplication TFLOPs of a third-generation TPU (TPUv3), where a single TFLOP is equivalent to 1 trillion floating-point operations per second. (Matrices are often used to represent the data that feeds into AI models.) They also offer a “significant” boost in memory bandwidth while benefiting from unspecified advances in interconnect technology. Google says that overall, at an identical scale of 64 chips and not accounting for improvement attributable to software, the TPUv4 demonstrates an average improvement of 2.7 times over TPUv3 performance.
Google’s TPUs are application-specific integrated circuits (ASICs) developed specifically to accelerate AI. They’re liquid-cooled and designed to slot into server racks; deliver up to 100 petaflops of compute; and power Google products like Google Search, Google Photos, Google Translate, Google Assistant, Gmail, and Google Cloud AI APIs. Google announced the third generation in 2018 at its annual I/O developer conference and this morning took the wraps off the successor, which is in the research stages.
Cutting-edge performance
TPUv4 clusters, or “pods,” total 4,096 chips interconnected with 10 times the bandwidth of most other networking technologies, according to Google. This enables a TPUv4 pod to deliver more than an exaflop of compute, which is equivalent to about 10 million average laptop processors at peak performance.
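As a rough sanity check on the pod figures, the arithmetic can be sketched in a few lines of Python. The chip count and laptop comparison come from Google's claims; treating “more than an exaflop” as exactly 10^18 operations per second is an assumption made here, so the per-chip number is an implied floor rather than an official specification.

```python
# Back-of-the-envelope arithmetic for the pod figures quoted in the article.
# Assumption: "more than an exaflop" is treated as exactly 1e18 FLOP/s,
# so the per-chip number is a floor, not an official specification.

POD_CHIPS = 4_096            # chips per TPUv4 pod (from the article)
POD_FLOPS = 1e18             # one exaflop, in floating-point ops per second
LAPTOP_COUNT = 10_000_000    # "about 10 million average laptop processors"

TERA = 1e12
GIGA = 1e9

# Throughput each chip must contribute for the pod to reach one exaflop.
per_chip_tflops = POD_FLOPS / POD_CHIPS / TERA

# Peak throughput per laptop implied by the "10 million laptops" comparison.
implied_laptop_gflops = POD_FLOPS / LAPTOP_COUNT / GIGA

print(f"Implied floor per chip: {per_chip_tflops:.0f} TFLOPs")
print(f"Implied laptop peak:    {implied_laptop_gflops:.0f} GFLOPs")
```

Under these assumptions, each chip would need to contribute roughly 244 TFLOPs, and the laptop comparison implies about 100 GFLOPs per laptop processor.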
“This is a historic milestone for us. Previously, to get an exaflop, you needed to build a custom supercomputer,” Google CEO Sundar Pichai said during a keynote address. “But we already have many of these deployed today and will soon have dozens of TPUv4 pods in our datacenters, many of which will be operating at or near 90% carbon-free energy.”
This year’s MLPerf results suggest Google’s fourth-generation TPUs are nothing to scoff at. On an image classification task that involved training an algorithm (ResNet-50 v1.5) to at least 75.90% accuracy with the ImageNet data set, 256 fourth-gen TPUs finished in 1.82 minutes. That’s nearly as fast as 768 Nvidia A100 graphics cards combined with 192 AMD Epyc 7742 CPU cores (1.06 minutes) and 512 of Huawei’s AI-optimized Ascend910 chips paired with 128 Intel Xeon Platinum 8168 cores (1.56 minutes). TPUv3s had the fourth-gen beat at 0.48 minutes of training, but perhaps only because 4,096 TPUv3s were used in tandem.
The fourth-gen TPUs also scored well when tasked with training a BERT model on a large Wikipedia corpus. Training took 1.82 minutes with 256 fourth-gen TPUs, compared with 0.39 minutes on 4,096 third-gen TPUs, a system with 16 times as many chips. Meanwhile, achieving a 0.81-minute training time with Nvidia hardware required 2,048 A100 cards and 512 AMD Epyc 7742 CPU cores.
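Because these systems differ widely in size, one crude way to compare the results is to multiply accelerator count by training time. The sketch below does that for the figures quoted above. The “chip-minutes” metric and the `chip_minutes` helper are illustrative assumptions, not part of MLPerf; the calculation ignores host CPUs and the fact that training rarely scales linearly with chip count.

```python
# Crude normalization of the MLPerf results quoted above: chips x minutes.
# Assumption: "chip-minutes" is an illustrative metric, not an MLPerf one.
# It ignores host CPUs and non-linear scaling, so the ratios are rough
# context only, not a per-chip performance ranking.

# (benchmark, system, accelerator count, minutes to train) from the article
RESULTS = [
    ("ResNet-50", "TPUv4", 256, 1.82),
    ("ResNet-50", "Nvidia A100", 768, 1.06),
    ("ResNet-50", "Huawei Ascend910", 512, 1.56),
    ("ResNet-50", "TPUv3", 4096, 0.48),
    ("BERT", "TPUv4", 256, 1.82),
    ("BERT", "TPUv3", 4096, 0.39),
    ("BERT", "Nvidia A100", 2048, 0.81),
]


def chip_minutes(chips: int, minutes: float) -> float:
    """Hypothetical helper: total accelerator time consumed by one run."""
    return chips * minutes


table = {(b, s): chip_minutes(c, m) for b, s, c, m in RESULTS}

for (benchmark, system), cost in table.items():
    # Express each run relative to the TPUv4 run on the same benchmark.
    baseline = table[(benchmark, "TPUv4")]
    print(f"{benchmark:10s} {system:17s} {cost:8.1f} chip-min "
          f"({cost / baseline:.2f}x TPUv4)")
```

By this measure, the 256-chip TPUv4 runs consume fewer chip-minutes than any of the larger systems listed, although the faster wall-clock times still belong to the bigger clusters.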
Google says that TPUv4 pods will be available to cloud customers starting later this year.
VentureBeat