One of the main benefits of AWS is being able to select from more than 400 instance types for running your high performance computing (HPC) applications. In addition to differences in core count, memory, and storage, AWS offers different architectures from Intel, AMD, and our own Arm-based Graviton processors. We also offer many different hardware accelerators, such as GPUs and FPGAs, and even workload-specific instances for deep learning, inference, or video transcoding.
With all that choice, it becomes a challenge for customers to define what benefits or trade-offs one instance may have over others for any given application. A simple strategy is to test the most capable instance that can run your application at a reasonable price point. If that experiment works, customers tend to end their search there. While this is a fine method for most workloads, it is not ideal for cost- and performance-sensitive applications in HPC.
Over at the AWS HPC Blog, we have been conducting price-performance benchmarks for several popular HPC codes, including weather modeling, finite element analysis, quantum chemistry, molecular dynamics, and seismic analysis. These studies all focused on differences between CPU architectures, or on scaling across multiple instances using MPI for communication. We also wanted to look at the effect a GPU accelerator has on HPC codes.
Recently we did just that, and documented our study of how GROMACS, a popular open-source molecular dynamics application, runs on instances equipped with different types and numbers of NVIDIA GPUs. You can read the whole three-part series here:
- Part 1 – Introduction to GROMACS and how it leverages CPU and GPU resources.
- Part 2 – A look at GROMACS price-performance on a single instance with different types and number of GPUs.
- Part 3 – A look at GROMACS price-performance running across multiple instances, with and without GPUs, with and without enabling Elastic Fabric Adapter, our high-speed, low-latency, high-throughput networking technology.
Prior to our study, we would have guessed that our biggest GPU instances, the Amazon EC2 P4d instances equipped with 8 NVIDIA A100 GPUs, would have trounced all other instances in performance, but this was not the case for GROMACS. We actually found that in both single- and multi-instance experiments, the G4dn instance family, equipped with the same processor but NVIDIA T4 GPUs, performed just as well as the P4d instances, and it did so with a roughly 4x better price-to-performance ratio. This result has more to do with how GROMACS uses a GPU than with the full set of capabilities a GPU family offers for other workloads.
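To make the comparison concrete, price-to-performance can be computed as simulation throughput per dollar of compute. The sketch below uses placeholder throughput and hourly-price figures, not measured AWS results; the roughly 4x gap is illustrative.

```python
# Illustrative price-performance comparison. The throughput and
# hourly prices below are placeholder numbers, not measured values.

def price_performance(ns_per_day: float, usd_per_hour: float) -> float:
    """Nanoseconds of simulation per dollar of compute per day."""
    usd_per_day = usd_per_hour * 24
    return ns_per_day / usd_per_day

# Hypothetical instances A and B with similar throughput but
# very different hourly cost.
a = price_performance(ns_per_day=100.0, usd_per_hour=32.0)
b = price_performance(ns_per_day=95.0, usd_per_hour=8.0)

print(f"A: {a:.3f} ns/day per USD")
print(f"B: {b:.3f} ns/day per USD")
print(f"B delivers {b / a:.1f}x better price-performance")
```

The point of the metric is that a slightly slower but much cheaper instance can win decisively once cost is in the denominator.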
Another surprise came when we saw that price-performance was worse at certain cluster sizes. This is due to how GROMACS distributes work across MPI ranks and threads: adding more hardware does not provide a gain until you reach the next interval at which the work can be redistributed effectively.
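One way to picture this is that a fixed core count can only be split into MPI ranks times threads per rank in certain whole-number combinations. The sketch below simply enumerates those splits; it does not model GROMACS's actual domain-decomposition heuristics.

```python
# Enumerate the valid (MPI ranks, threads per rank) splits of a core
# count. This is a simplified illustration only: GROMACS applies its
# own heuristics on top of constraints like these.

def rank_thread_splits(total_cores: int):
    """All (mpi_ranks, omp_threads) pairs with ranks * threads == cores."""
    return [(r, total_cores // r)
            for r in range(1, total_cores + 1)
            if total_cores % r == 0]

for cores in (64, 96):
    print(cores, rank_thread_splits(cores))
```

Between two core counts that admit a good split, intermediate sizes may offer no decomposition that uses the extra hardware well, which is one way a bigger cluster can cost more without running faster.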
Read the articles above to find out more specifics, but the take-home message from our study is that carefully testing your application's price-to-performance ratio with different-sized problems and different architectures can pay dividends later when you go to production.
If you would like to learn more about running your HPC workloads on AWS, visit our website. For more stories like the above, tune into our HPC Tech Shorts video series, and read our blog. Please join us at SC21 to learn more about running your largest, most complex workloads on AWS, and enter for a chance to win $200 in AWS credits.
"with" - Google News
November 01, 2021 at 04:33PM
https://ift.tt/3mvSTAQ
Optimizing price-performance for GROMACS with GPUs - HPCwire
"with" - Google News
https://ift.tt/3d5QSDO
https://ift.tt/2ycZSIP
Bagikan Berita Ini
0 Response to "Optimizing price-performance for GROMACS with GPUs - HPCwire"
Post a Comment