Nvidia’s supercomputer GPUs are designed to accelerate demanding workloads in data centers and scientific research.
Chinese researchers have achieved nearly 10 times the performance of Nvidia-powered US supercomputers using domestic graphics processors, according to a peer-reviewed study.
This breakthrough challenges the long-standing dominance of US-made chips in advanced scientific research and highlights the unintended consequences of Washington’s tech sanctions.
Escaping the ‘chokepoint’
According to the South China Morning Post, experts warn that software optimization alone cannot bridge the hardware gap indefinitely.
This development reflects Beijing’s broader strategy to reduce dependence on Western chip technologies amid ongoing US sanctions.
High-computation fields like flood defense planning and urban waterlogging analysis require massive processing power, which has been limited by hardware restrictions.
The production of advanced GPUs such as Nvidia’s A100 and H100 is dominated by foreign manufacturers, and US export restrictions further hinder China’s access.
Additionally, Nvidia’s CUDA software ecosystem cannot be used on third-party hardware, which limits independent algorithm development on domestic chips.
The software-enabled solution
Professor Nan Tongchao from Hohai University in Nanjing led research on a “multi-node, multi-GPU” parallel computing approach to improve supercomputing efficiency.
His team optimized data exchange between nodes to reduce performance losses in parallel computing.
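The article does not describe the team’s exact exchange scheme, but a common way to cut such losses is to overlap the exchange of boundary (“halo”) data with computation on the interior of each subdomain, so network latency is hidden behind useful work. The sketch below illustrates that general pattern with mpi4py and NumPy; the grid size, neighbour layout, and stencil are hypothetical and not taken from the paper.

```python
# Illustrative halo-exchange overlap with mpi4py; NOT the paper's actual code.
# Each MPI rank owns a horizontal strip of a 2-D grid plus one halo row on each side.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
up, down = (rank - 1) % size, (rank + 1) % size      # periodic neighbours, for simplicity

local = np.random.rand(102, 100)                     # 100 interior rows + 2 halo rows
new = local.copy()

# 1. Start non-blocking halo exchange with neighbouring ranks (the inter-node traffic).
reqs = [comm.Isend(local[1],  dest=up,     tag=0),
        comm.Isend(local[-2], dest=down,   tag=1),
        comm.Irecv(local[0],  source=up,   tag=1),
        comm.Irecv(local[-1], source=down, tag=0)]

# 2. Update interior cells while the halos are still in flight (communication hidden by compute).
new[2:-2] = 0.25 * (local[1:-3] + local[3:-1]
                    + np.roll(local[2:-2], 1, axis=1) + np.roll(local[2:-2], -1, axis=1))

# 3. Wait for the halos, then update the two boundary rows that depend on them.
MPI.Request.Waitall(reqs)
new[1]  = 0.25 * (local[0]  + local[2]  + np.roll(local[1],  1) + np.roll(local[1],  -1))
new[-2] = 0.25 * (local[-3] + local[-1] + np.roll(local[-2], 1) + np.roll(local[-2], -1))
```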
In 2021, Oak Ridge National Laboratory introduced the TRITON flood forecasting model on the Summit supercomputer, but it achieved only a sixfold speed increase using 64 nodes.
To offset the limitations of domestic hardware, Nan’s approach packed multiple GPUs into each node, drastically reducing the communication that has to cross between nodes.
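The article does not spell out how subdomains and GPUs are laid out, but packing several GPUs per node means most neighbouring subdomains sit on the same machine: their halo exchanges go through fast intra-node copies, and only the faces that straddle two nodes must cross the network. A hypothetical sketch of such a mapping (the GPU-per-node count and neighbour pattern are illustrative, not from the paper):

```python
# Hypothetical rank-to-GPU mapping; the GPU count per node is illustrative, not from the paper.
from mpi4py import MPI

GPUS_PER_NODE = 4                        # assumption for illustration
comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

node_id = rank // GPUS_PER_NODE          # which node this rank (and its GPU) lives on
gpu_id = rank % GPUS_PER_NODE            # which GPU on that node it would drive

def same_node(neighbour_rank: int) -> bool:
    """Exchange with a same-node neighbour can use fast intra-node copies;
    only neighbours on another node cost inter-node network bandwidth."""
    return neighbour_rank // GPUS_PER_NODE == node_id

# Classify the two strip neighbours from the previous sketch.
for nbr in (rank - 1, rank + 1):
    if 0 <= nbr < size:
        kind = "intra-node (cheap)" if same_node(nbr) else "inter-node (network)"
        print(f"rank {rank} (node {node_id}, GPU {gpu_id}): exchange with rank {nbr} is {kind}")
```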
Implemented on a domestic x86 computing platform with Hygon 7185 processors (32 cores, 64 threads, 2.5 GHz) and domestic GPUs, backed by 128GB of memory and 200 Gb/s of interconnect bandwidth, the model matched TRITON’s sixfold speedup using just seven nodes, an 89% reduction in node usage.
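The node figures can be checked with a line of arithmetic: reaching the same roughly sixfold speedup on 7 nodes instead of 64 cuts node usage by about 89% and works out to roughly a ninefold gain in per-node efficiency, the “nearly 10 times” figure cited above.

```python
# Quick check of the node figures quoted in the article.
triton_nodes, new_nodes = 64, 7                # both runs reach roughly a sixfold speedup
node_reduction = 1 - new_nodes / triton_nodes
per_node_gain = triton_nodes / new_nodes
print(f"node reduction: {node_reduction:.0%}")               # ~89%
print(f"per-node efficiency gain: ~{per_node_gain:.1f}x")    # ~9.1x, i.e. 'nearly 10 times'
```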
Nan’s team validated the model by simulating flood evolution at the Zhuangli Reservoir in Shandong province, using 200 computational nodes and 800 GPUs. The simulation was completed in just three minutes, reaching a speedup of over 160 times.
“Simulating floods at a river basin scale in just minutes allows real-time flood evolution analysis and better disaster prevention,” Nan said.
The research code is open-source, and Nan stated that the findings could extend to hydrometeorology, sedimentation, and surface water–groundwater simulations.