xAI planning to 10x its “Colossus” supercluster
Not content with already having the largest supercomputing cluster in the world, Elon Musk’s xAI is planning on scaling up its “Colossus” supercomputer from 100,000 Nvidia GPUs to 1 million of the expensive chips, according to reporting by the Financial Times.
Current-generation Nvidia H100 GPUs are estimated to cost between $20,000 and $40,000 each, meaning the super-jumbo cluster could easily cost upward of $20 billion.
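The arithmetic behind that figure is simple enough to check. The sketch below multiplies the reported GPU count by the low and high ends of the estimated price range. It assumes, purely for illustration, that all 1 million chips would be priced like H100s and that nothing else (buildings, networking, power, cooling) is counted; the function name `cluster_cost_range` is hypothetical.

```python
def cluster_cost_range(gpu_count, low_price_usd, high_price_usd):
    """Return the (low, high) total GPU cost in US dollars."""
    return gpu_count * low_price_usd, gpu_count * high_price_usd


# Figures from the article: 1 million GPUs at an estimated $20,000 to $40,000 each.
# Assumption: every chip is priced like an H100; no other costs are included.
low, high = cluster_cost_range(1_000_000, 20_000, 40_000)
print(f"${low / 1e9:.0f} billion to ${high / 1e9:.0f} billion")
```

Under those assumptions the chips alone would run from $20 billion to $40 billion, which is where the "upward of $20 billion" floor comes from.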
The xAI cluster, which is located in South Memphis, Tennessee, was built in three months, a blistering pace that shocked the AI industry. The specialized GPUs are used both for training large language models and for running applications powered by those models, like X’s Grok chatbot, which the social-media platform calls “your humorous AI Assistant on X.”
When Colossus was first turned on, civic groups and the city council in the South Memphis community were caught by surprise, as power-utility officials were prevented from discussing the plans after signing xAI nondisclosure agreements. xAI’s use of temporary, portable methane-powered gas turbines, which don’t require permits, raises questions about how the company plans to meet the extreme electricity and water requirements to power and cool the massive computing cluster.
NPR reported that the current Colossus cluster needs a million gallons of community drinking water per day, as well as 50 megawatts of local utility power in addition to the energy from the turbines.