Sunday, January 26, 2025

The best networking architecture used by major hyperscalers (AWS, GCP, Azure) to handle massive east-west traffic


I want to learn about the best networking architecture used by major hyperscaler datacenters. I am specifically talking about hyperscalers that are optimized for massive distributed AI training using more than 50,000 GPUs or so.

Is spine and leaf architecture the best architecture for a datacenter with massive east-west traffic?

How do they connect 50K or even 100K+ GPUs (recently in xAI's datacenter by Elon Musk and his team) on a single network?

Spine and leaf architecture requires every spine switch to connect to every leaf (Top-of-Rack) switch and vice versa.

But how do you connect leaf switches beyond the number of ports a spine switch has?
How do you connect 100k+ Nvidia GPUs to all the spine switches?
I'm not able to understand this.
Typical resources on the internet show spine and leaf architecture like the one below. They just connect a handful of leaf switches to the spines. What if there are more leaf switches than ports on a spine switch?
[Image: typical spine and leaf diagram]
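To make the port limit concrete, here is a rough sketch of the arithmetic as I understand it. It assumes every switch has the same port count (the hypothetical `radix` parameter) and that each leaf splits its ports evenly between hosts and spine uplinks (non-blocking, 1:1). The function name and the 64-port figure are my own illustration, not taken from any vendor document.

```python
def two_tier_limits(radix: int) -> dict:
    """Capacity of a non-blocking two-tier spine and leaf fabric.

    Assumption: all switches have `radix` ports, and each leaf uses half
    of its ports for hosts and half for uplinks (one uplink per spine).
    """
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number >= 2")
    max_leaves = radix            # each spine port goes to a distinct leaf
    spines = radix // 2           # each leaf uplink goes to a distinct spine
    hosts_per_leaf = radix // 2   # the remaining leaf ports face the hosts
    return {
        "max_leaves": max_leaves,
        "spines": spines,
        "max_hosts": max_leaves * hosts_per_leaf,
    }


limits = two_tier_limits(64)
print(limits)
```

Under these assumptions, 64-port switches top out at 64 leaves and 2,048 host ports, which is far short of 100k GPUs. That gap is exactly what I'm asking about.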

I did some research and came across this research paper.

Use of BGP for Routing in Large-Scale Data Centers

Is this the same architecture the hyperscaler cloud providers use?
I'm trying to design a datacenter architecture myself that could be deployed with 100k+ GPUs in a single massive facility (for learning purposes 🙂). I couldn't find any resource on how to do that.
So, I'm looking for answers to the following questions.

  • How do you connect 10k+ racks (with 100k+ GPUs inside them) in a spine and leaf architecture handling massive east-west traffic, if that is the right architecture? If it is not, please mention that.
  • How is cable management done? I've seen the NVIDIA DGX SuperPOD image.
    • They have compute nodes and management nodes.
    • How do they connect these into a cluster? Say I want to connect 10 SuperPODs: how do they connect them in a spine and leaf architecture? (Any idea how many cables run from one SuperPOD to another?)

[Image: NVIDIA DGX SuperPOD]
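For the 100k+ GPU target, my rough understanding is that a fabric grows by adding switch tiers (a folded Clos, also called a fat tree). The sketch below estimates how many tiers that would take. It assumes identical switches with `radix` ports, a fully non-blocking design, and one fabric port per GPU. These are simplifying assumptions of mine, and the function names are hypothetical; real deployments may use oversubscription or several ports per GPU.

```python
def clos_capacity(radix: int, tiers: int) -> int:
    """Host-facing ports in a non-blocking folded Clos with `tiers` switch tiers.

    Assumption: every switch has `radix` ports; all tiers except the top
    one use half of their ports downward and half upward.
    """
    if radix < 4 or radix % 2:
        raise ValueError("radix must be an even number >= 4")
    if tiers < 1:
        raise ValueError("tiers must be >= 1")
    return 2 * (radix // 2) ** tiers


def min_tiers(target_hosts: int, radix: int) -> int:
    """Smallest tier count whose capacity covers `target_hosts`."""
    tiers = 1
    while clos_capacity(radix, tiers) < target_hosts:
        tiers += 1
    return tiers


for radix in (64, 128):
    tiers = min_tiers(100_000, radix)
    print(radix, tiers, clos_capacity(radix, tiers))
```

Under these assumptions, 64-port switches would need four tiers for 100,000 endpoints, while 128-port switches would fit them in three. I'd like to know whether this matches what hyperscalers actually build.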
