How GenAI is Exposing the Limits of Data Centre Infrastructure


Ashley Woodbridge, CTO, Lenovo Infrastructure Solutions Group, META

The race to develop and deploy generative AI is not slowing down, with the energy required to power it doubling every 100 days, according to a recent research paper. A recent Lenovo study shows that generative AI is being actively incorporated into company strategies, processes and offerings in the Middle East as well: 64% of companies in the region have actively invested in GenAI, while a further 34% are planning to invest. Adoption is high, with 65% of organisations building AI solutions, 27% buying them and 14% having AI embedded. However, 57% also report that it is extremely difficult to recruit for AI-related positions.
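To get a rough sense of what doubling every 100 days implies, the short sketch below projects the growth factor over time. The formula follows directly from the doubling claim in the paper cited above; the time horizons are merely illustrative.

```python
def growth_factor(days: float, doubling_period_days: float = 100) -> float:
    """Energy that doubles every `doubling_period_days` grows by 2^(days / period)."""
    return 2 ** (days / doubling_period_days)

# Example horizons: the claimed doubling compounds quickly.
print(round(growth_factor(100), 1))  # 2.0x after 100 days
print(round(growth_factor(365), 1))  # ~12.6x after one year
```

On that trajectory, the energy required grows by more than an order of magnitude in a single year.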

The development of generative AI models is incredibly energy intensive, driving not only an exponential rise in power requirements but also an increase in compute density. Data centres already consume up to 2% of electricity worldwide, according to the International Energy Agency, and that share is set to grow. With generative AI requiring more energy-hungry hardware, there is growing demand for new ways to deal with the heat generated by the Graphics Processing Units (GPUs) that power the generative AI revolution.

Each ChatGPT query needs nearly 10 times as much energy as a Google search, according to estimates by Goldman Sachs. This is increasing the energy demands of every data centre and forcing new thinking about energy use, specifically in how data centres are cooled. Traditional air cooling is no longer keeping up, and in an era where every business leader has sustainability in mind, the fact that liquid cooling can reduce power consumption by up to 40% makes it an easy way to lower carbon emissions.

Hunger for power

The energy-intensive GPUs that power AI platforms require five to 10 times more energy than Central Processing Units (CPUs) because of their larger transistor counts. This is already impacting data centres. New, cost-effective design methodologies such as 3D silicon stacking also allow GPU manufacturers to pack more components into a smaller footprint. This further increases power density, meaning data centres need more energy and generate more heat.

Another trend running in parallel is a steady fall in TCase (case temperature) in the latest chips. TCase is the maximum safe temperature for the surface of a chip such as a GPU. It is a limit set by the manufacturer to ensure the chip runs smoothly and does not overheat or require throttling, which hurts performance. On newer chips, TCase is coming down from 90 to 100 degrees Celsius to 70 or 80 degrees, or even lower. This is further driving the demand for new ways to cool GPUs.
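To illustrate why a lower TCase matters, here is a minimal, hypothetical sketch of the kind of throttling logic chip firmware applies; the limit, step size and function names are illustrative assumptions, not any vendor's actual implementation.

```python
# Illustrative sketch only: real GPU thermal management is far more sophisticated.
TCASE_LIMIT_C = 75.0   # hypothetical limit; newer chips: ~70-80 C, down from 90-100 C
CLOCK_STEP_MHZ = 50    # hypothetical amount to back off when running hot

def adjust_clock(surface_temp_c: float, clock_mhz: int, max_clock_mhz: int) -> int:
    """Throttle down as the case temperature approaches TCase, and recover
    headroom once cooling brings the surface temperature back down."""
    if surface_temp_c >= TCASE_LIMIT_C:
        # At or above the limit: cut the clock, trading performance for safety.
        return max(clock_mhz - CLOCK_STEP_MHZ, 0)
    if surface_temp_c < TCASE_LIMIT_C - 10:
        # Comfortable thermal headroom: ramp back towards full speed.
        return min(clock_mhz + CLOCK_STEP_MHZ, max_clock_mhz)
    return clock_mhz  # close to the limit: hold steady
```

A lower TCase shrinks the headroom in this loop, so with air cooling alone the chip spends more time throttled; more effective cooling keeps the surface temperature further from the limit.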

As a result of these factors, air cooling is no longer up to the job when it comes to AI. It is not just the power of the components, but their density in the data centre. Short of making servers three times bigger to spread the heat out, that heat must be removed more efficiently. That requires special handling, and liquid cooling will be essential to support the mainstream roll-out of AI.

Growing popularity

Liquid cooling is growing in popularity. Public research institutions were among the first users, because they usually demand the latest and greatest in data centre technology to drive high performance computing (HPC) and AI. They also tend to have fewer fears about adopting new technology before it is established in the market.

Enterprise customers are more risk averse. They need to make sure what they deploy will immediately provide return on investment. We are now seeing more and more financial institutions – often conservative due to regulatory requirements – adopt the technology, alongside the automotive industry.

The automotive industry relies heavily on HPC systems to develop new cars. Service providers running colocation data centres are adopting the technology too. Generative AI has huge power requirements that most enterprises cannot meet on their own premises, so they turn to colocation data centres and the service providers that can deliver those computational resources. Those providers are now transitioning to new GPU architectures and to liquid cooling, which makes their operations much more efficient.

The liquid cooling difference

At Lenovo we have more than a decade of experience with liquid cooling, and our Neptune system uses pure water with intakes at room temperature, so customers don't need to spend money on chillers. Liquid cooling delivers results both within individual servers and across the larger data centre. By moving from a fan-cooled server to a liquid-cooled one, businesses can significantly reduce energy consumption. But that is only at device level; perimeter cooling – removing heat from the data centre itself – requires further energy. The result can be that only two thirds of the energy a data centre uses goes towards computing, the task it is designed for. The rest is spent keeping it cool.

Power usage effectiveness (PUE) measures how efficient a data centre is: the power required to run the whole facility, including the cooling systems, divided by the power drawn by the IT equipment alone. Among data centres optimised with liquid cooling, some are achieving a PUE of 1.1, and some even 1.04, meaning very little energy is lost to overheads. That is before we even consider the opportunity to take the hot water coming out of the racks and reuse that heat for something useful, such as heating the building in winter, which we see some customers doing today.
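To make the arithmetic concrete, here is a minimal sketch of the PUE calculation; the power figures are illustrative assumptions, not measurements from any specific facility.

```python
def pue(total_facility_kw: float, it_equipment_kw: float) -> float:
    """Power usage effectiveness: total facility power / IT equipment power.
    An ideal data centre, where every watt reaches the IT gear, scores 1.0."""
    return total_facility_kw / it_equipment_kw

# Hypothetical example figures:
# An air-cooled facility where a third of the energy goes to cooling and other
# overheads (roughly the "two thirds towards computing" case described above).
print(pue(total_facility_kw=1500, it_equipment_kw=1000))  # 1.5

# A liquid-cooled facility with minimal overhead.
print(pue(total_facility_kw=1040, it_equipment_kw=1000))  # 1.04
```

At a PUE of 1.04, only about 4% of the facility's power is overhead, versus 50% on top of the IT load at a PUE of 1.5.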

Density is also very important. Liquid cooling allows us to pack a lot of equipment in a high rack density. With liquid cooling, we can populate those racks and use less data centre space overall, less real estate, which is going to be very important for AI.

Towards a cleaner data centre

The energy demands of generative AI are not going to get any smaller, and liquid-cooled systems offer a way to deliver the energy density that AI demands. They empower businesses to reduce energy use and allow data centres to accommodate the number of GPUs required to drive tomorrow's innovation. When it comes to the huge energy demands of generative AI, air cooling is no longer enough.

