The Wafer Scale Engine measures 8 inches by 8 inches, significantly bigger than a typical GPU die of 1 to 1.5 inches. Whereas a GPU has about 5,000 cores, the WSE has 850,000 cores and 40 GB of on-chip SRAM, which is 10 times faster than the HBM memory used in GPUs. That translates to 20 PB/sec of memory bandwidth and 6.25 petaflops of processing power on dense matrices, or 62.5 petaflops on sparse matrices.
In another benchmark, against the Meta Llama 3.1-405B model used to train generative AI to respond to human input, Cerebras produced 969 tokens per second, far outpacing the number-two performer, SambaNova, which generated 164 tokens per second. That makes Cerebras's throughput 12 times faster than AWS's AI instance and 6 times faster than its closest competitor, SambaNova.
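The ratio behind those figures is easy to verify: 969 tokens per second against 164 works out to roughly 5.9x, which rounds to the "6 times faster" claim. A quick sketch using only the benchmark numbers quoted above:

```python
# Throughput figures from the Llama 3.1-405B inference benchmark cited above.
CEREBRAS_TOKENS_PER_SEC = 969
SAMBANOVA_TOKENS_PER_SEC = 164

ratio = CEREBRAS_TOKENS_PER_SEC / SAMBANOVA_TOKENS_PER_SEC
print(f"Cerebras vs. SambaNova: {ratio:.1f}x")  # -> 5.9x, i.e. ~6 times faster
```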
Cerebras isn’t shy about the secret to its success. According to James Wang, director of product marketing at Cerebras, it’s the massive Wafer Scale Engine with its 850,000 cores, all of which can talk to one another at high speed.
“Supercomputers today are great for weak scaling,” said Wang. “You can do more work, more volume of work, but you can’t make the same work go faster. Typically it tapers out at the max number of GPUs you have per node, which is around eight or 16, depending on configuration. Beyond that, you can do more volume, but you can’t go faster. And we don’t have this problem. Because our chip itself is so big, we essentially move the strong scaling curve up by one or two orders of magnitude.”
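The distinction Wang draws is the classic one between weak scaling (more processors let you handle a larger total volume of work) and strong scaling (more processors make a fixed-size job finish faster, until serial overhead dominates). The saturation he describes can be illustrated with Amdahl's law; the serial fraction below is an illustrative constant, not a Cerebras measurement:

```python
def strong_scaling_speedup(n_procs: int, serial_frac: float = 0.05) -> float:
    """Amdahl's law: speedup of a fixed-size job on n_procs processors,
    where serial_frac of the work cannot be parallelized."""
    return 1.0 / (serial_frac + (1.0 - serial_frac) / n_procs)

# Speedup plateaus: past a point, adding processors barely helps.
for n in (1, 8, 16, 128, 1024):
    print(f"{n:5d} procs -> {strong_scaling_speedup(n):5.2f}x")
```

With a 5% serial fraction, 8 processors give about a 5.9x speedup, 16 give about 9.1x, but 1,024 still deliver under 20x: the curve flattens, which is the "you can do more volume, but you can't go faster" ceiling Wang describes.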
Within a single server with eight GPUs, the GPUs use NVLink to share data and communicate, so they can be programmed to look roughly like a single processor, Wang adds. But beyond eight GPUs, in any supercomputer configuration, the interconnect changes from NVLink to InfiniBand or Ethernet, and at that point “they can’t be programmed like a single unit,” Wang says.
Earlier this month, Cerebras announced that Sandia National Laboratories is deploying a Cerebras CS-3 testbed for AI workloads.