Huawei OceanStor A Series Storage Retains Top Spot for Performance in MLPerf Storage Benchmarks

[Shenzhen, China, August 15, 2025] MLCommons, an AI engineering consortium built on a philosophy of open collaboration, recently released the results of its MLPerf Storage v2.0 benchmark suite. In this round of testing, Jinan Institute of Supercomputing Technology (JNIST) and Huawei collaborated to deliver results in which OceanStor A series storage ranked first worldwide in multiple performance metrics, including performance per storage system, per rack unit, and per client.

MLPerf Storage is the industry’s authoritative benchmark for measuring AI storage performance, renowned for its strict standardization and cross-vendor comparability. This year’s tests included 26 mainstream vendors.

For model training, the MLPerf Storage benchmark suite includes the 3D U-Net workload, focusing on GPU utilization and scale-out capabilities. It evaluates how well storage systems can support the computing power demands of large-scale AI clusters. A new addition in this version is the checkpointing mode, which is the industry’s first standard test for assessing checkpoint performance during large AI model training. It covers scenarios like resumable training and model archiving. These tests offer valuable guidance for selecting storage.

Huawei OceanStor A Series Storage Sets a New Global Record for Model Training Performance, Reaching 698 GiB/s

In the bandwidth-intensive 3D U-Net training test, Huawei OceanStor A series storage systems ranked first globally in performance across three categories, while maintaining GPU utilization above 90%.

  • An 8U dual-node OceanStor A800 system sustained a stable bandwidth of 698 GiB/s, meeting the training requirements of 255 H100 GPUs.

3D U-Net test case: No. 1 in performance per storage system

  • Similarly, a 2U dual-node OceanStor A600 system met the training requirements of 76 H100 GPUs, delivering 108 GiB/s per rack unit and 104 GiB/s per client.

3D U-Net test case: No. 1 in performance per rack unit and per client
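
The system-level figures above can be normalized to compare configurations of different sizes. The following Python sketch does that arithmetic using only the numbers quoted in this release. The A600 system total is not stated here, so the sketch assumes it equals the per-rack-unit figure multiplied by the 2U chassis height; the function and variable names are illustrative only.

```python
def per_unit(total_gib_s: float, units: int) -> float:
    """Divide aggregate bandwidth (GiB/s) evenly across a number of units."""
    if units <= 0:
        raise ValueError("units must be positive")
    return total_gib_s / units


# Figures quoted in this release.
A800_TOTAL_GIB_S = 698.0
A800_RACK_UNITS = 8
A800_GPUS = 255

A600_PER_RU_GIB_S = 108.0
A600_RACK_UNITS = 2
A600_GPUS = 76

# Assumption: A600 system total = per-rack-unit bandwidth x rack units.
a600_total_gib_s = A600_PER_RU_GIB_S * A600_RACK_UNITS

a800_per_ru = per_unit(A800_TOTAL_GIB_S, A800_RACK_UNITS)
a800_per_gpu = per_unit(A800_TOTAL_GIB_S, A800_GPUS)
a600_per_gpu = per_unit(a600_total_gib_s, A600_GPUS)

print(f"A800: {a800_per_ru:.2f} GiB/s per rack unit, {a800_per_gpu:.2f} GiB/s per GPU")
print(f"A600: {a600_total_gib_s:.0f} GiB/s total (derived), {a600_per_gpu:.2f} GiB/s per GPU")
```

Under these assumptions, both systems supply roughly 2.7 to 2.8 GiB/s per H100 GPU.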

OceanStor A Series Storage Delivers 6.7x Higher Checkpointing Performance than the Second-Best Performer

In the checkpointing test, Huawei OceanStor A series storage ranked first in performance in the single-client scenario with eight simulated GPUs.

  • Llama3_8b: 40.2 GiB/s read bandwidth and 20.5 GiB/s write bandwidth.
  • Llama3_70b: 68.8 GiB/s read bandwidth and 62.4 GiB/s write bandwidth, 6.7x higher than the second-best performer.

Checkpointing test case: No. 1 in performance per client
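
To put these bandwidths in practical terms, the time to save or restore a checkpoint can be estimated as checkpoint size divided by sustained bandwidth. The sketch below applies that to the Llama3_70b figures above. The 1 TiB checkpoint size is a hypothetical value chosen for illustration, not a figure from the benchmark, and the estimate ignores metadata and startup overheads.

```python
def transfer_seconds(size_gib: float, bandwidth_gib_s: float) -> float:
    """Estimate transfer time as size / sustained bandwidth."""
    if bandwidth_gib_s <= 0:
        raise ValueError("bandwidth must be positive")
    return size_gib / bandwidth_gib_s


# Hypothetical checkpoint size (1 TiB), for illustration only.
HYPOTHETICAL_CHECKPOINT_GIB = 1024.0

# Llama3_70b bandwidths quoted in this release (single client).
LLAMA3_70B_READ_GIB_S = 68.8
LLAMA3_70B_WRITE_GIB_S = 62.4

save_s = transfer_seconds(HYPOTHETICAL_CHECKPOINT_GIB, LLAMA3_70B_WRITE_GIB_S)
restore_s = transfer_seconds(HYPOTHETICAL_CHECKPOINT_GIB, LLAMA3_70B_READ_GIB_S)

print(f"Estimated save time:    {save_s:.1f} s")
print(f"Estimated restore time: {restore_s:.1f} s")
```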

Huawei OceanStor A Series Storage Accelerates the Adoption of Large AI Models with Successive Innovations

Purpose-built to meet the growing demands for computing power, Huawei OceanStor A series storage uses the latest technological innovations to ensure performance keeps up with client and node growth. It delivers stable cluster bandwidth in the hundreds of TB/s, enhanced data access for large-scale training, and end-to-end acceleration of training and inference.

OceanStor A series storage scales to EB-level capacity, meeting the storage needs of mass data. For data resilience, it achieves 99.999% reliability through architecture innovation. OceanStor A series storage also builds a new data paradigm with a PB-level Key-Value (KV) cache resource pool, reducing Time To First Token (TTFT) by up to 90% while ensuring inference accuracy, and improving inference throughput by more than 10-fold in long-sequence scenarios. Furthermore, OceanStor A series storage provides a built-in Retrieval-Augmented Generation (RAG) knowledge base and supports multi-mode retrieval of scalars, vectors, tensors, and graphs, significantly lowering the barrier to entry for using large AI models.

Looking ahead, Huawei will continue to innovate OceanStor A series storage tailored for High-Performance Computing (HPC) and large AI model training and inference, working with customers to build an intelligent future.
