Tutorial on Scaling to Petaflops with Intel Xeon Phi

Over at Dr. Dobb's, Rob Farber has posted a tutorial on using MPI to tie together thousands of Intel Xeon Phi coprocessors. Running his MPI code example on the Stampede supercomputer at TACC, Farber achieved a remarkable 2.2 petaflops of performance on 3000 nodes.

Observed scaling to 3000 nodes of the TACC Stampede supercomputer.

This article demonstrates how to use Intel Xeon Phi coprocessors to evaluate a single objective function across a computational cluster using MPI. The example code can be used with existing numerical optimization libraries to solve real problems of interest to data scientists. Performance results show that the TACC Stampede supercomputer is indeed capable of sustaining many petaflops of average effective performance. In other words, this is “effective performance,” or “honest flops,” that takes into account all communication overhead. Small compute clusters of 256 nodes, which are affordable for schools and small research organizations, can exceed the peak theoretical performance of multimillion-dollar machines still operating at the smaller U.S. national laboratories. They can also deliver performance approaching that of large leadership-class supercomputers that are only a few years old.
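Evaluating one objective function across a cluster follows a map-reduce shape: each node computes the objective over its own shard of the data, and the partial results are summed so every node sees the same total. The Python sketch below illustrates that shape under loudly stated assumptions. The linear model, the sum-of-squared-errors objective, the count of 5 flops per example, and every function name are hypothetical, and the MPI ranks are simulated with a plain loop rather than real MPI calls. It is not Farber's code. It also shows how an "effective" or "honest" flop rate charges communication time against the useful work.

```python
from typing import Sequence, Tuple

# Hypothetical model for illustration only: y ~ a*x + b, with the sum of
# squared errors (SSE) over all examples as the objective function.
FLOPS_PER_EXAMPLE = 5  # mul, add, sub for the error; mul, add to accumulate


def partial_sse(x: Sequence[float], y: Sequence[float],
                begin: int, end: int, a: float, b: float) -> float:
    """Objective contribution of one rank's contiguous shard [begin, end)."""
    total = 0.0
    for i in range(begin, end):
        err = a * x[i] + b - y[i]
        total += err * err
    return total


def shard_bounds(n: int, rank: int, nranks: int) -> Tuple[int, int]:
    """Split n examples into nranks nearly equal contiguous shards."""
    return n * rank // nranks, n * (rank + 1) // nranks


def objective(x: Sequence[float], y: Sequence[float],
              nranks: int, a: float, b: float) -> float:
    """Sequential stand-in for the distributed evaluation (hypothetical).

    Each loop iteration plays the role of one MPI rank. In a real MPI
    program every rank would compute only its own partial_sse(), and the
    partial sums would be combined with an all-reduce, for example with
    mpi4py: comm.allreduce(local, op=MPI.SUM), so all ranks get the total.
    """
    n = len(x)
    partials = [partial_sse(x, y, *shard_bounds(n, r, nranks), a, b)
                for r in range(nranks)]
    return sum(partials)  # the "reduce" step


def effective_flops(useful_flops: float, compute_s: float,
                    comm_s: float) -> float:
    """'Honest' flop rate: useful work divided by total wall-clock time,
    including communication, not just the time spent computing."""
    return useful_flops / (compute_s + comm_s)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
    ys = [2.0 * v + 1.0 for v in xs]  # data lies exactly on y = 2x + 1
    print(objective(xs, ys, nranks=3, a=2.0, b=1.0))  # 0.0
    print(objective(xs, ys, nranks=3, a=2.0, b=0.0))  # 8.0
    # 40 useful flops over 1.0 s of compute plus 0.25 s of communication
    print(effective_flops(FLOPS_PER_EXAMPLE * len(xs), 1.0, 0.25))  # 32.0
```

Using an all-reduce rather than a plain reduce means every rank ends up holding the same objective value, which is convenient when an optimizer runs in lockstep on all ranks; whether the original tutorial uses exactly this arrangement should be checked against the full article.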

The post Tutorial on Scaling to Petaflops with Intel Xeon Phi appeared first on insideHPC.
