Data centers are great, but they are very big and complex. Nvidia is working on changing that with its latest specialized hardware, such as the DGX A100, an AI system the company introduced in May.
The company says the DGX A100 can almost replace an entire data center with a single platform that has a much smaller footprint. Its hardware and power make it well suited to complex tasks such as AI training or even coronavirus treatment analysis.
The DGX A100 uses eight Nvidia A100 Tensor Core GPUs with a total of 320GB of GPU memory. The system also uses 200Gbps interconnects from Mellanox. It is capable of providing about five petaflops of AI computing power. The system is also flexible enough to work in a cluster with other DGX A100s: a total of 140 DGX A100 systems can work together, providing massive computing capabilities.
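To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It uses only the numbers quoted above, and it assumes performance scales perfectly linearly across a cluster, which real workloads will not achieve, so treat the petaflops total as an upper bound. The function and constant names are made up for illustration.

```python
# Figures quoted in the article for a single DGX A100 system.
GPUS_PER_SYSTEM = 8
MEMORY_PER_SYSTEM_GB = 320
PETAFLOPS_PER_SYSTEM = 5        # "about five petaflops"
MAX_SYSTEMS_IN_CLUSTER = 140


def cluster_totals(systems=MAX_SYSTEMS_IN_CLUSTER):
    """Aggregate figures for a cluster of DGX A100 systems.

    Assumption: perfect linear scaling, so the petaflops value is an
    upper bound rather than a measured result.
    """
    return {
        "gpus": systems * GPUS_PER_SYSTEM,
        "memory_gb": systems * MEMORY_PER_SYSTEM_GB,
        "petaflops_upper_bound": systems * PETAFLOPS_PER_SYSTEM,
    }


# Memory available on each individual GPU in one system.
memory_per_gpu_gb = MEMORY_PER_SYSTEM_GB / GPUS_PER_SYSTEM

if __name__ == "__main__":
    print("Memory per GPU (GB):", memory_per_gpu_gb)
    print("Full cluster:", cluster_totals())
```

On these assumptions, a full 140-system cluster would hold 1,120 GPUs and offer up to roughly 700 petaflops.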
“NVIDIA DGX A100 is the ultimate instrument for advancing AI,” said Jensen Huang, founder and CEO of NVIDIA. “NVIDIA DGX is the first AI system built for the end-to-end machine learning workflow – from data analytics to training to inference. And with the giant performance leap of the new DGX, machine learning engineers can stay ahead of the exponentially growing size of AI models and data.”
Each DGX A100 takes up 6 rack units, and you can fit five of them in a single data center rack. Each system costs $199,000 and draws as much as 6.5kW, which means a full rack will have a power density of between 28kW and 32.5kW.
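The rack arithmetic is easy to check. The short Python sketch below uses only the per-system figures quoted above and works out the space, peak power draw and hardware cost of a fully populated rack. It assumes every system draws its 6.5kW maximum at the same time, which gives the upper end of the quoted range. The names are illustrative only.

```python
# Per-system figures quoted in the article.
RACK_UNITS_PER_SYSTEM = 6
SYSTEMS_PER_RACK = 5
PEAK_POWER_KW = 6.5
PRICE_USD = 199_000


def rack_summary(systems=SYSTEMS_PER_RACK):
    """Space, peak power and hardware cost for one rack of DGX A100s.

    Assumption: all systems draw their peak power simultaneously, so
    the power figure is the worst case for the rack.
    """
    return {
        "rack_units": systems * RACK_UNITS_PER_SYSTEM,
        "peak_power_kw": systems * PEAK_POWER_KW,
        "hardware_cost_usd": systems * PRICE_USD,
    }


if __name__ == "__main__":
    for name, value in rack_summary().items():
        print(f"{name}: {value}")
```

Five systems occupy 30 rack units, draw up to 32.5kW and cost just under $1 million in hardware alone, before networking, cooling and power distribution.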
Would it actually change data centers?
The question is whether such a system will actually drive a change in the typical data center configuration. On paper, the DGX A100 has everything a data center operator might want – massive computing power, decent energy consumption, flexibility and a small footprint.
There’s certainly interest in such hardware, and the reasons for that are obvious. But there are some additional challenges. Data centers have to be “Nvidia DGX-Ready Data Center Program partners” to be able to use the DGX A100. Of course, partnering with a vendor isn’t anything new, although it may increase worries about vendor lock-in.
Still, that’s a decision for each data center operator to make. Nvidia is pressing ahead, and the DGX A100 is already at work at the US Argonne National Laboratory.
“We’re using America’s most powerful supercomputers in the fight against COVID-19, running AI models and simulations on the latest technology available, like the NVIDIA DGX A100,” said Rick Stevens, associate laboratory director for Computing, Environment and Life Sciences at Argonne. “The compute power of the new DGX A100 systems coming to Argonne will help researchers explore treatments and vaccines and study the spread of the virus, enabling scientists to do years’ worth of AI-accelerated work in months or days.”