Ruby Phi FAQ

The Ruby cluster is composed of both standard Intel Xeon CPUs and Intel Xeon Phi coprocessors.  Special considerations must be taken both when compiling software for the Phi coprocessors and when running it on them.  This guide provides general information on the Phi coprocessors and breaks down the different programming models available for them.  For detailed information on compiling software for our Phis, please refer to our Phi Compiling Guide.


Throughout this guide, the Intel Xeon Phi coprocessors are referred to as "Phis", and the Intel Xeon CPU as the "Host".

What are the Xeon Phi coprocessors?

The Xeon Phi coprocessors (commonly referred to as both accelerators and MICs) can be thought of as complementary add-ons to Ruby's standard Xeon Host CPUs.  Much like the GPU accelerators available on Oakley and Glenn, they are used to increase performance by providing a specialized computational environment optimized for certain operations.  For those operations, the Phis can be orders of magnitude faster than the Host CPU.

How are the Phi coprocessors different from GPUs?

Both serve the same goal, but they achieve it differently.  Xeon Phis run Intel assembly code similar to that of the Xeon Host CPUs, so by simply recompiling source code for the Phi, most programs can benefit greatly.  GPUs, on the other hand, traditionally run their own proprietary code, requiring programs to be tediously tailored to a specific GPU architecture before they can be compiled and run.


Should I use the Phis?

Short answer: if you are going to run your code on Ruby, yes.  Getting code to run on the Phis is often a simple process and can result in significant performance gains, so there is little reason not to.

For more help on determining whether to use the Phi, refer to Intel's guide on the subject.

You do not have to take advantage of the Phis to run code on Ruby. 


How can I use the Phis?

There are three main ways of taking advantage of the Phi's computing power:

  1. Native Execution

    • Compile the binary for the Phi ONLY
    • Done with the -mmic compiler flag
    • SSH to the Phi and run the binary there
    • Good for getting familiar with Phi characteristics
    • Host sits idle during code execution
    • A minimal sketch of this workflow is shown after this list
  2. Symmetric/Heterogeneous Execution (Using MPI)

    • Compile and run the code on both the Host and the Phi
    • Both the Host and the Phi operate "symmetrically" as MPI targets
    • Requires careful load balancing between MPI tasks due to the performance differences between the Host and the Phi (and between OpenMP threads as well, if taking the hybrid approach)
    • A short MPI sketch is shown after this list
  3. Offload Execution

    • Automatic Offload (AO) with the Intel Math Kernel Library (MKL)
      • Some MKL routines are automatically offloaded to the Phi when the code is run on the Host
      • Does not require any changes to code -- completely transparent to the user
      • MKL determines whether the computation will benefit from offloading
      • An AO sketch is shown after this list
    • Compiler Assisted Offloading (CAO)
      • Add offload directives/pragmas to the parts of the code you want offloaded
      • If the Phi is unavailable for any reason, the code falls back to running on the Host
      • Two programming sub-models, differing in how data movement is handled (sketches of both are shown after this list)
        1. Explicit - Code directs data movement to/from the Phi
          • Only supports scalars and arrays of scalar or bitwise-copyable structures or classes.  For more complex C/C++ data types, use the implicit model
          • Uses the #pragma offload construct
        2. Implicit - Code establishes a virtual "shared memory" model; data is synchronized automatically between the Host and the Phi at established points
          • Only available for C/C++
          • Appropriate for complex, pointer-based data structures (linked lists, binary trees, etc.)
          • Uses the _Cilk_shared and _Cilk_offload constructs
          • Not appropriate for very large data
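
The following is a minimal sketch of the native model.  The file and binary names are illustrative, and "mic0" is a typical (but site-dependent) name for the first coprocessor:

    /* hello_native.c - built for the Phi only, then run there over SSH:
     *     icc -mmic hello_native.c -o hello.mic
     *     ssh mic0 ./hello.mic
     * The Host sits idle while the binary runs on the coprocessor.
     */
    #include <stdio.h>

    int main(void)
    {
        printf("Hello from the Phi\n");
        return 0;
    }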
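
For the symmetric model, the same MPI source is compiled twice, once for the Host and once (with -mmic) for the Phi, and the two binaries are launched together as one MPI job.  The sketch below uses hypothetical file names; the exact launch syntax depends on the MPI library and site configuration:

    /* symmetric_hello.c - build two binaries from one source, e.g.:
     *     mpicc symmetric_hello.c -o hello.host
     *     mpicc -mmic symmetric_hello.c -o hello.mic
     * Ranks are then placed on both the Host and the Phi at launch time.
     */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, len;
        char name[MPI_MAX_PROCESSOR_NAME];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        MPI_Get_processor_name(name, &len);
        printf("Rank %d of %d running on %s\n", rank, size, name);
        MPI_Finalize();
        return 0;
    }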
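
Automatic Offload requires no source changes at all; a plain MKL call is enough.  In this sketch (file name illustrative), AO is requested at run time by setting MKL's MKL_MIC_ENABLE environment variable to 1 before running the program:

    /* mkl_ao.c - ordinary MKL code with no offload directives.
     * Compile and link against MKL on the Host, e.g.:
     *     icc mkl_ao.c -mkl -o mkl_ao
     * With MKL_MIC_ENABLE=1 set, MKL decides per call whether the
     * matrix multiply is large enough to be worth offloading.
     */
    #include <stdio.h>
    #include <mkl.h>

    #define N 4096                     /* AO pays off only for large problems */

    int main(void)
    {
        double *a = (double *) mkl_malloc((size_t) N * N * sizeof(double), 64);
        double *b = (double *) mkl_malloc((size_t) N * N * sizeof(double), 64);
        double *c = (double *) mkl_malloc((size_t) N * N * sizeof(double), 64);

        for (size_t i = 0; i < (size_t) N * N; i++) {
            a[i] = 1.0;
            b[i] = 2.0;
            c[i] = 0.0;
        }

        /* A plain DGEMM call; MKL may transparently offload it to the Phi. */
        cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                    N, N, N, 1.0, a, N, b, N, 0.0, c, N);

        printf("c[0] = %f\n", c[0]);
        mkl_free(a); mkl_free(b); mkl_free(c);
        return 0;
    }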
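
A minimal explicit-offload sketch using #pragma offload (file name illustrative).  The in and out clauses direct the data movement, and the region falls back to the Host if no Phi is available:

    /* offload_explicit.c - compiled with the Intel compiler on the Host;
     * no -mmic flag is needed, because the compiler builds both a Host
     * and a Phi version of the offloaded region.
     */
    #include <stdio.h>

    #define N 1024

    static float a[N], b[N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            a[i] = (float) i;

        /* Copy 'a' to the Phi, run the loop there, copy 'b' back. */
        #pragma offload target(mic) in(a) out(b)
        {
            for (int i = 0; i < N; i++)
                b[i] = 2.0f * a[i];
        }

        printf("b[N-1] = %f\n", b[N - 1]);
        return 0;
    }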
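
And an implicit-offload sketch using the virtual shared memory model (file name illustrative).  Data and functions marked _Cilk_shared live in an address range that is kept synchronized between the Host and the Phi:

    /* offload_implicit.c - _Cilk_shared data is synchronized automatically
     * at offload boundaries, which is what makes this model suitable for
     * pointer-based structures.
     */
    #include <stdio.h>

    #define N 1024

    _Cilk_shared float data[N];        /* lives in the shared address space */

    _Cilk_shared void scale(float f)   /* callable on both Host and Phi */
    {
        for (int i = 0; i < N; i++)
            data[i] *= f;
    }

    int main(void)
    {
        for (int i = 0; i < N; i++)
            data[i] = (float) i;

        _Cilk_offload scale(2.0f);     /* run scale() on the Phi */

        printf("data[N-1] = %f\n", data[N - 1]);
        return 0;
    }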

What programming languages can you use for the Phis?

Only code compiled from C/C++ and Fortran can be run on the Phis.

Can I still use X for parallel programming?

Most of the parallel programming options available on the Host are also available on the Phis.  Specifically, the Phis are known to support the following:

  • MVAPICH2 (OSC's recommended MPI library)
  • OpenMP
  • Intel Cilk Plus
  • pthreads
  • Intel Threading Building Blocks (Intel TBB)

What sections of code should I offload? (CAO only)

Highly parallel sections of code are good candidates for offloading.  Offloaded serial code will run much slower on the Phi than it would on the Host.

Data transfers between the Phi and Host must also be taken into consideration when choosing sections of code to offload. Data transfers are slow and should be minimized.  If two offloaded parallel sections of code have a serial section between them and they all act on the same data, it may be more efficient to offload the serial section as well.  This eliminates the need to transfer the data back to the Host, run the serial section, and then transfer this data back to the Phi.
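
As a hedged illustration of that reasoning (the file name, array, and operations are illustrative, not from this FAQ), the serial step below stays inside the offload region so the array never round-trips between the Phi and the Host.  Compile with the Intel compiler and its OpenMP flag (e.g. -qopenmp):

    /* avoid_transfers.c - parallel / serial / parallel, all in one region. */
    #include <stdio.h>

    #define N 1024

    static float x[N];

    int main(void)
    {
        for (int i = 0; i < N; i++)
            x[i] = (float) i;

        #pragma offload target(mic) inout(x)
        {
            #pragma omp parallel for        /* first parallel section */
            for (int i = 0; i < N; i++)
                x[i] *= 2.0f;

            x[0] = x[N - 1];                /* small serial step, kept on the Phi */

            #pragma omp parallel for        /* second parallel section */
            for (int i = 0; i < N; i++)
                x[i] += 1.0f;
        }

        printf("x[0] = %f\n", x[0]);
        return 0;
    }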


How do I run code on the Phis?

MKL, OpenMP, or MPI based programs

To run an MKL, OpenMP, or MPI based program on the Phis, some runtime libraries may need to be copied over to the coprocessor first.  See the Phi Compiling Guide for details.


How do I set up environment variables on the Phi?

By default, all environment variables set on the Host are passed to the Phi.  This behavior can be overridden by setting MIC_ENV_PREFIX to a string; only environment variables whose names begin with that prefix will then be passed to the Phi's environment.

For example, setting MIC_ENV_PREFIX to PHI would cause only environment variables prefixed with PHI (PHI_PATH, PHI_LIBRARY, etc.) to be passed.
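
One way to check which variables actually reached the Phi is to query the environment from inside an offloaded region.  The sketch below (file name illustrative) prints a single variable as the coprocessor sees it:

    /* check_env.c - compiled with the Intel compiler on the Host. */
    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        #pragma offload target(mic)
        {
            /* Runs on the Phi; output is proxied back to the Host console. */
            const char *v = getenv("OMP_NUM_THREADS");
            printf("OMP_NUM_THREADS on the Phi: %s\n", v ? v : "(unset)");
        }
        return 0;
    }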

MIC_LD_LIBRARY_PATH is not stripped and passed to the Phi, so MIC_ENV_PREFIX=MIC will not work as a way to change the Phi's LD_LIBRARY_PATH.

See Intel's Setting Environment Variables on the CPU to Modify the Coprocessor's Execution Environment for more information on passing and setting environment variables.

