

This blog focuses on the performance characteristics of a single PowerEdge R7525 server with AMD MI100-32G GPUs. We present results from the general matrix multiplication (GEMM) microbenchmarks, the LAMMPS benchmarks, and the NAMD benchmarks to showcase performance and scalability.

The following table provides the configuration details of the PowerEdge R7525 system under test (SUT):

The GEMM benchmark is a simple, multithreaded dense matrix-to-matrix multiplication benchmark that can be used to test the performance of GEMM on a single GPU. The rocblas-bench binary, compiled from source, was used to collect DGEMM (double-precision GEMM) and SGEMM (single-precision GEMM) results. The results of these tests reflect the performance of an ideal application that runs only matrix multiplication, expressed as the peak TFLOPS that the GPU can deliver. Although GEMM benchmark results might not represent real-world application performance, GEMM is still a good benchmark for demonstrating the performance capability of different GPUs.

The following figure shows the observed numbers of DGEMM and SGEMM:

Figure 2. DGEMM and SGEMM for both AMD MI100 peak and AMD-PCIe sustained

In the DGEMM benchmark, the theoretical peak performance of the AMD MI100 GPU is 11.5 TFLOPS and the measured sustained performance is 7.9 TFLOPS. As shown in Table 2, the standard double-precision (FP64) theoretical peak and the FP64 tensor DGEMM peak performance are both 11.5 TFLOPS.
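To put the DGEMM numbers in context, the following is a minimal Python sketch of how a GEMM throughput figure relates to matrix size and run time, and how the sustained result compares with the theoretical peak. It assumes the conventional count of 2·m·n·k floating-point operations for one GEMM call. The function names and the 8192 x 8192 problem size are hypothetical and chosen only for illustration; they are not the sizes used in the tests. Only the 11.5 TFLOPS and 7.9 TFLOPS values come from the results above.

```python
# Illustrative sketch: relate GEMM problem size, run time, and TFLOPS.
# Assumption: one GEMM (C = A * B, A is m x k, B is k x n) is counted as
# 2 * m * n * k floating-point operations (one multiply and one add per term).

PEAK_FP64_TFLOPS = 11.5        # MI100 FP64 theoretical peak quoted in the text
SUSTAINED_DGEMM_TFLOPS = 7.9   # measured sustained DGEMM quoted in the text


def gemm_flops(m: int, n: int, k: int) -> int:
    """Return the conventional floating-point operation count for one GEMM."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Convert a measured GEMM run time into TFLOPS."""
    return gemm_flops(m, n, k) / seconds / 1e12


def efficiency(sustained_tflops: float, peak_tflops: float) -> float:
    """Return sustained performance as a fraction of theoretical peak."""
    return sustained_tflops / peak_tflops


if __name__ == "__main__":
    eff = efficiency(SUSTAINED_DGEMM_TFLOPS, PEAK_FP64_TFLOPS)
    print(f"DGEMM sustained / peak: {eff:.1%}")

    # Hypothetical problem size, used only to show the arithmetic.
    size = 8192
    seconds = gemm_flops(size, size, size) / (SUSTAINED_DGEMM_TFLOPS * 1e12)
    print(f"A {size}^3 DGEMM at the sustained rate would take about {seconds:.3f} s")
```

With the figures quoted above, the sustained DGEMM result works out to roughly 69% of the FP64 theoretical peak.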
