New rumors posted on the Chiphell forum suggest that the compute-focused Nvidia GH100 GPU, based on the yet-to-be-announced Hopper microarchitecture, could deliver a transistor count about 2.5 times that of its predecessor, the Ampere-based GA100, making it larger even than AMD's newly announced multi-chip rival, the Instinct MI250X.
Nvidia GH100 may have 2.5 times more transistors than GA100
According to rumors, the GH100 would be equipped with 140 billion transistors, an absurdly high number even compared to other computing solutions. In comparison, an RTX 3090, with a GA102 GPU, has 28.3 billion transistors, while the Radeon RX 6900 XT’s Navi 21 has 26.8 billion.
nVidia HPC chips: Transistors per SM
255M – GP100
251M – GV100
422M – GA100
GH100: 486M if 70B transistors per die (and 140B overall)… or: 972M if 140B transistors per die (and 280B overall) https://t.co/99fLpRKC24
— 3DCenter.org (@3DCenter_org) February 8, 2022
Within the data center segment, the GA100 chip, present in the Nvidia A100 accelerator board, offers 54.2 billion transistors. The Aldebaran GPU, used in the Instinct MI250X, is slightly larger, with 58.2 billion transistors. Even so, despite sizes that impressed at their respective launches, both hold roughly 2.5 times fewer transistors than the configuration Nvidia is supposedly preparing with the Hopper architecture.
The information is in line with other recent rumors indicating that the new Hopper GPU would be one of the largest ever released, with an area close to 900 mm². If that is the case, combined with the possibility of 140 billion transistors, it would also be one of the densest graphics chips ever to hit the market.
Maybe around 900mm². 😂😂😂 I feel sorry, because I didn’t know you had such a big response to the “1000”.
— kopite7kimi (@kopite7kimi) January 29, 2022
Taking the GA100 as a base again, with an area speculated at 790 mm², we get a density of roughly 68.6 million transistors per mm². If the GH100's area turns out to be 900 mm², its 140 billion transistors would represent about 155 million transistors per mm², an increase of more than double. The numbers are impressive and show what the supposed adoption of TSMC's 5 nm lithography could provide for the company's next generation of GPUs.
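The density comparison above can be sketched in a few lines of Python. Note that every figure here is rumored or speculative (the 790 mm² GA100 area, the 900 mm² and 140-billion-transistor GH100 numbers come from the leaks discussed in this article), so this is just the back-of-the-envelope arithmetic, not confirmed specifications:

```python
# Back-of-the-envelope density math behind the rumor.
# All figures are rumored/approximate, per the leaks cited above.
GA100_TRANSISTORS = 54.2e9  # official transistor count
GA100_AREA_MM2 = 790        # speculated die area in mm^2
GH100_TRANSISTORS = 140e9   # rumored transistor count
GH100_AREA_MM2 = 900        # rumored die area in mm^2

# Density in millions of transistors per mm^2
ga100_density = GA100_TRANSISTORS / GA100_AREA_MM2 / 1e6
gh100_density = GH100_TRANSISTORS / GH100_AREA_MM2 / 1e6

print(f"GA100: {ga100_density:.1f} M transistors/mm^2")  # ~68.6
print(f"GH100: {gh100_density:.1f} M transistors/mm^2")  # ~155.6
print(f"Ratio: {gh100_density / ga100_density:.2f}x")    # ~2.27x
```

The ratio of roughly 2.3x is what the article summarizes as "an increase of more than double."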
Hopper GPU could bring up to 233 GB of RAM and 1.9 GB of cache
It’s not just with its transistor count that the GH100 would impress — rumor has it the specifications of Nvidia’s new data center accelerator would shake up the market and put the company back in the performance lead in the segment, after losing ground to the huge gains of the Instinct MI250X and its CDNA 2 microarchitecture.
The huge area would translate to a total of 144 Streaming Multiprocessors (SMs) per die which, in turn, would represent 9,216 CUDA cores if the 64-cores-per-SM count seen in the compute-oriented Ampere architecture is maintained. Taking into account rumors that a supposed GH102 variant will utilize a multi-chip module (MCM) design, the count could double to an impressive 288 SMs, with 18,432 cores.
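The rumored core counts follow directly from that per-SM figure. A minimal sketch, assuming Ampere's 64 FP32 cores per SM carries over to Hopper (which is itself part of the rumor, not a confirmed spec):

```python
# Rumored GH100/GH102 core-count math.
# Assumes Ampere's 64 FP32 CUDA cores per SM is retained, per the rumor.
CORES_PER_SM = 64
SMS_PER_DIE = 144  # rumored SM count for a single GH100 die

single_die_cores = SMS_PER_DIE * CORES_PER_SM  # 144 * 64 = 9,216
mcm_sms = 2 * SMS_PER_DIE                      # dual-die GH102: 288 SMs
mcm_cores = mcm_sms * CORES_PER_SM             # 288 * 64 = 18,432

print(single_die_cores, mcm_sms, mcm_cores)  # 9216 288 18432
```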
Another interesting point would be how the GPU communicates with its cache and memory: the adoption of a redesigned architecture recently described in Nvidia’s own research, called Composable On-Package Architecture, or simply COPA, which would allow different designs to be produced for different purposes.
“The results show that when compared to a converged GPU design, COPA-GPU featuring a combination of 16x larger cache capacity and 1.6x higher DRAM bandwidth scales per-GPU training and inference performance by 31% and 35% respectively. (… )” pic.twitter.com/oXcg5e8Jfo
— Underfox (@Underfox3) April 7, 2021
At the moment, two possible applications for COPA GPUs are suggested: high-performance computing (HPC) and deep learning (DL). While the HPC format would follow the already-known industry standard, the DL variant would have a cache chip completely separate from the GPU, allowing the implementation of up to 1,920 MB of last-level cache (LLC), working with up to 233 GB of HBM2e RAM and 6.3 TB/s of bandwidth.