The Ruby cluster is composed of both standard Intel Xeon CPUs and the new Xeon Phi coprocessors. Special considerations must be taken into account both when compiling software for the Phi coprocessors and when running it on them. This guide provides general information on the Phi coprocessors and breaks down the different programming models available for them. For detailed information on compiling software for our Phis, please refer to our Phi Compiling Guide.
What are the Xeon Phi coprocessors?
The Xeon Phi coprocessors (commonly referred to as both accelerators and MICs) can be thought of as complementary add-ons to Ruby's standard Xeon Host CPUs. Much like the GPU accelerators available on Oakley and Glenn, they are used to increase performance by providing a specialized computational environment optimized for certain operations. For certain operations, the Phis will be orders of magnitude faster than the same operation run on the CPU.
How are the Phi coprocessors different than GPUs?
Both serve the same goal, but they achieve it differently. Xeon Phis run Intel assembly code similar to that of the Xeon Host CPUs, so most programs can benefit greatly from simply being recompiled for the Phi. GPUs, on the other hand, traditionally run their own proprietary code, requiring programs to be tediously tailored to a specific GPU architecture before they can be compiled and run.
Should I use the Phis?
Short answer: If you are going to run your code on Ruby, yes. Getting code to run on the Phis is often simple enough, and the potential gains significant enough, that there is little reason not to.
For more help on determining whether to use the Phi, refer to Intel's guide on the subject.
You do not have to take advantage of the Phis to run code on Ruby.
How can I use the Phis?
There are four main ways of taking advantage of the Phis' computing power:
Native Execution
- Compile a binary for the Phi ONLY
- Done with the -mmic compiler flag
- SSH to the Phi and then run the binary there
- Good for getting familiar with Phi characteristics
- Host sits idle during code execution
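As a rough sketch of the native workflow (the source file, binary name, and Phi hostname `mic0` here are illustrative, not Ruby-specific; see the Phi Compiling Guide for exact invocations):

```shell
# Cross-compile for the coprocessor only, then run it there over SSH.
icc -mmic -o hello.mic hello.c   # build a Phi-only binary
scp hello.mic mic0:~/            # copy the binary to the Phi
ssh mic0 ./hello.mic             # run natively; the Host sits idle
```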
Symmetric/Heterogeneous Execution (Using MPI)
- Compile and run code on both Host and Phi
- Both the Host and the Phi operate "symmetrically" as MPI targets
- Requires careful load balancing between MPI tasks due to differences between Host and Phi (as well as OpenMP threads if taking the hybrid approach)
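A symmetric launch might look like the following sketch. The hostnames, binary names, and task counts are illustrative only; the colon-separated MPMD syntax shown is the generic `mpiexec` form, so consult OSC's MPI documentation for the exact launcher to use on Ruby.

```shell
# Build one binary for the Host and one for the Phi from the same source,
# then launch a single MPI job that spans both.  Task counts must be
# load-balanced by hand, since Host and Phi cores differ in speed.
mpicc       -o prog.host prog.c   # Host binary
mpicc -mmic -o prog.mic  prog.c   # Phi binary
mpiexec -host ruby01 -n 8 ./prog.host : -host ruby01-mic0 -n 30 ./prog.mic
```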
Automatic Offload (AO) with Intel Math Kernel Library (MKL)
- Some MKL routines are automatically offloaded to Phi when code is run on the Host
- Does not require any changes to code -- completely transparent to the user
- MKL determines if computation will benefit from offloading
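Because AO is transparent to the code, it is typically switched on through the environment alone, for example:

```shell
# Enable MKL Automatic Offload for an unmodified binary; eligible MKL
# routines (e.g. sufficiently large matrix multiplies) are then
# offloaded to the Phi transparently when MKL judges it worthwhile.
export MKL_MIC_ENABLE=1
```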
Compiler Assisted Offloading (CAO)
- Add offload directives/pragmas to parts of code you want offloaded
- If the Phi is unavailable for any reason, code defaults back to running on the Host
- Two programming sub-models differing on data movement
Explicit - Code directs data movement to/from Phi
- Only supports arrays of scalars and bitwise-copyable structures or classes. For more complex C/C++ data types, use the implicit model
Implicit - Code establishes a virtual "shared memory" model; data is synchronized automatically between the Host and Phi at established points
- Only available for C/C++
- Appropriate for complex, pointer-based data structures (linked lists, binary trees, etc.)
- Not appropriate for very large data
What programming languages can you use for the Phis?
Only code compiled from C/C++ and Fortran can be run on the Phis.
Can I still use X for parallel programming?
Most of the parallel programming options available on the Host are also available on the Phis. Specifically, the Phis are known to support the following:
- MVAPICH2 (OSC's recommended MPI library)
- Intel Cilk Plus
- Intel Threading Building Blocks (Intel TBB)
What sections of code should I offload? (CAO only)
Highly parallel sections of code are good candidates for offloading. Offloaded serial code will run much slower on the Phi than it would on the Host.
Data transfers between the Phi and Host must also be taken into consideration when choosing sections of code to offload. Data transfers are slow and should be minimized. If two offloaded parallel sections of code have a serial section between them and they all act on the same data, it may be more efficient to offload the serial section as well. This eliminates the need to transfer the data back to the Host, run the serial section, and then transfer this data back to the Phi.
How do I run code on the Phis?
MKL, OpenMP, or MPI based programs
To run an MKL, OpenMP, or MPI based program on the Phis, some of the runtime libraries it depends on may first need to be copied over to the Phi.
How do I set up environment variables on the Phi?
By default, all environment variables set on the Host are passed to the Phi. This behavior can be overridden by setting MIC_ENV_PREFIX to a string; only environment variables prefixed by this string will then be passed to the Phi's environment.
For example, setting MIC_ENV_PREFIX to PHI would cause only environment variables prefixed with PHI to be passed (PHI_PATH, PHI_LIBRARY, etc.).
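In shell form (the PHI prefix and thread count here are just examples):

```shell
# Pass only PHI_-prefixed variables to the coprocessor.  Per Intel's
# documentation the prefix is stripped on the Phi side, so
# PHI_OMP_NUM_THREADS arrives on the Phi as OMP_NUM_THREADS.
export MIC_ENV_PREFIX=PHI
export PHI_OMP_NUM_THREADS=240
```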
See Intel's Setting Environment Variables on the CPU to Modify the Coprocessor's Execution Environment for more information on passing and setting environment variables.