ARM HPC tools

ARM HPC tools analyze how HPC software runs. The suite consists of three applications: ARM DDT, ARM MAP, and ARM Performance Reports.

  • ARM DDT: graphical debugger for HPC applications.
  • ARM MAP: HPC application profiler with easy-to-use GUI environment.
  • ARM Performance Reports: simple tool to generate a single-page HTML or plain text report that presents overall performance characteristics of HPC applications.

 

NOTE: Because ARM has acquired Allinea, all Allinea module files have been renamed accordingly. The Allinea modules are still available and have the same functionality as the new ARM modules.
NOTE [June 29, 2022]: Because ARM reported security vulnerabilities in ARM Forge versions prior to 22.0.x, we have removed the old versions and installed version 22.0.2.

Availability & Restrictions

Versions

The following versions of ARM HPC tools are available on OSC clusters:

Version   Owens   Pitzer
22.0.2    X*      X*
* Current default version

You can use module spider arm to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

ARM DDT, MAP and Performance Reports are available to all OSC users.

Publisher/Vendor/Repository and License Type

ARM, Commercial

Usage

ARM DDT

ARM DDT is a debugger for HPC software that automatically alerts users of memory bugs and divergent behavior. For more features and benefits, visit ARM HPC tools and libraries - DDT.

For usage instructions and more information, read ARM DDT.

ARM MAP

ARM MAP produces a detailed profile of HPC software. Unlike ARM Performance Reports, ARM MAP needs the source code and debugging information to show its line-by-line analysis. For more features and benefits, visit ARM HPC tools and libraries - MAP.

For usage instructions and more information, read ARM MAP.

ARM Performance Reports

ARM Performance Reports analyzes and documents the CPU, MPI, I/O, and memory performance characteristics of HPC software, including third-party code, to help you understand its overall performance. Although it is not meant to be used for every run, ARM Performance Reports is recommended to OSC users as a way to analyze how an HPC application runs. View an example report to see the format of a typical report. For more example reports, features and benefits, visit ARM HPC tools and libraries - Performance Reports.

For usage instructions and more information, read ARM Performance Reports.

Troubleshooting

Using ARM software with MVAPICH2

This note from ARM's Getting Started Guide applies to both perf-report and MAP:

Some MPIs, most notably MVAPICH, are not yet supported by ARM's Express Launch mode
(in which you can just put “perf-report” in front of an existing mpirun/mpiexec line). These can
still be measured using the Compatibility Launch mode.

Instead of this Express Launch command:

perf-report mpiexec <mpi args> <program> <program args> # BAD

Use the compatibility launch version instead:

perf-report -n <num procs> --mpiargs="<mpi args>" <program> <program args>
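As an illustration of the rewrite described above, the following sketch wraps the Compatibility Launch form in a small shell function. The function name compat_launch_cmd and the sample arguments are hypothetical; the function only prints the command line so that it can be inspected before being run.

```shell
# Print a Compatibility Launch command line for perf-report.
# Usage: compat_launch_cmd <num procs> "<mpi args>" <program> [program args...]
# compat_launch_cmd is a hypothetical helper, not part of the ARM tools.
compat_launch_cmd() {
  local nprocs="$1"
  local mpiargs="$2"
  shift 2
  if [ -n "$mpiargs" ]; then
    printf 'perf-report -n %s --mpiargs="%s" %s\n' "$nprocs" "$mpiargs" "$*"
  else
    # No extra MPI arguments: leave out --mpiargs entirely
    printf 'perf-report -n %s %s\n' "$nprocs" "$*"
  fi
}

# Example: the Express Launch line "perf-report mpiexec -n 12 ./wave_c input.dat"
compat_launch_cmd 12 "" ./wave_c input.dat
```

The printed line can be pasted into a job script once it looks right. The same pattern should apply to MAP by replacing perf-report with map --profile.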


ARM Performance Reports

ARM Performance Reports is a simple tool used to generate a single-page HTML or plain text report that presents the overall performance characteristics of HPC applications. It supports pthreads, OpenMP, or MPI code on CPU, GPU, and MIC based architectures.

Availability and Restrictions

Versions

The versions currently available at OSC are:

Version   Owens   Pitzer
22.0.2    X*      X*
* Current default version

You can use module spider arm-pr to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

ARM Performance Reports is available to all OSC users. We have 64 seats with 64 HPC tokens. Users can monitor the license status on the ARM License Server Status page.

Publisher/Vendor and License Type

ARM, Commercial

Usage

Set-up

To load the module for the ARM Performance Reports default version, use module load arm-pr. To select a particular software version, use module load arm-pr/version. For example, use module load arm-pr/22.0.2 to load ARM Performance Reports version 22.0.2, provided the version is available on the OSC cluster in use.

Using ARM Performance Reports

You can use your regular executables to generate performance reports. The program can be used to analyze third-party code as well as code you develop yourself. Performance reports are normally generated in a batch job.

To generate a performance report for an MPI program:

module load arm-pr
perf-report -np <num procs> --mpiargs="<mpi args>" <program> <program args>

where <num procs> is the number of MPI processes to use, <mpi args> represents arguments to be passed to mpiexec (other than -n or -np), <program> is the executable to be run and <program args> represents arguments passed to your program.

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

perf-report -np 12 wave_c
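Because reports are normally generated in a batch job, the commands above usually end up in a job script. The following is a minimal sketch that writes such a script, assuming the cluster uses the Slurm scheduler; the file name perf_report_job.sh, the account code PAS1234, and the resource requests are placeholders to adapt.

```shell
# Write a minimal Slurm job script that runs perf-report on an MPI program.
# The file name, account code (PAS1234), and resource requests are placeholders.
cat > perf_report_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=wave_c_report
#SBATCH --account=PAS1234
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:30:00

module load arm-pr

# Same run as "mpiexec -n 12 wave_c", wrapped by perf-report
perf-report -np 12 wave_c
EOF

echo "Wrote perf_report_job.sh"
```

The script could then be submitted with sbatch perf_report_job.sh. Once the job finishes, look for the HTML and plain text report files.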

To generate a performance report for a non-MPI program:

module load arm-pr
perf-report --no-mpi <program> <program args>

The performance report is created in both HTML and plain text formats. The file names are based on the executable name, number of processes, date, and time, for example, wave_c_12p_2016-02-05_12-46.html. To open the report in HTML format, use

firefox wave_c_12p_2016-02-05_12-46.html

For more details, download the ARM Performance Reports User Guide.

Performance Reports with GPU

ARM Performance Reports can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch ARM Performance Reports with

perf-report {executable}

For more information, see Section 6.10 of the ARM Performance Reports User Guide.


ARM MAP

ARM MAP is a full scale profiler for HPC programs. We recommend using ARM MAP after reviewing reports from ARM Performance Reports. MAP supports pthreads, OpenMP, and MPI software on CPU, GPU, and MIC based architectures.

Availability & Restrictions

Versions

The ARM MAP versions currently available at OSC are:

Version   Owens   Pitzer
22.0.2    X*      X*
* Current default version

You can use module spider arm-map to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

ARM MAP is available to all OSC users. We have 64 seats with 80 HPC tokens. Users can monitor the ARM License Server Status.

Publisher/Vendor and License Type

ARM, Commercial

Usage

Set-up

To load the default version of the ARM MAP module, use module load arm-map. To select a particular software version, use module load arm-map/version. For example, use module load arm-map/22.0.2 to load ARM MAP version 22.0.2, provided the version is available on the cluster in use.

Note: Before you run MAP from the command line for the first time, open MAP as a GUI from OnDemand to configure with appropriate settings for your environment.

Using ARM MAP

Profiling HPC software with ARM MAP typically involves three steps: 

1. Prepare the executable for profiling.

Regular executables can be profiled with ARM MAP, but source code line detail will not be available. To view source code line detail, you need an executable with debugging information: re-compile your code with the -g option added to your other compiler options. For example:

mpicc wave.c -o wave -g -O3

This executable built with the debug flag can be used for ARM Performance Reports as well.

Note: With some compilers (the Intel compilers, for example), the -g flag turns off optimization unless an optimization level is given explicitly. For profiling, use the same optimizations as your regular executable: explicitly include the -On flag, where n is your normal level of optimization (typically -O2 or -O3), as well as any other compiler optimization options.
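To confirm that an executable was actually built with debugging information before profiling it, one option is to look for DWARF debug sections. The sketch below assumes a Linux ELF executable and that readelf (from binutils) is installed; has_debug_info is a hypothetical helper name, and ./wave is the example executable built above.

```shell
# Return success if the ELF executable carries DWARF debug information,
# which is what compiling with -g normally produces.
# has_debug_info is a hypothetical helper, not part of the ARM tools.
has_debug_info() {
  readelf -S "$1" 2>/dev/null | grep -q '\.debug_info'
}

if has_debug_info ./wave; then
  echo "wave has debug info: MAP can show source line detail"
else
  echo "wave has no debug info (or was not found): rebuild with -g"
fi
```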

2. Run your code to produce the profile data file (.map file).

Profiles are normally generated in a batch job.  To generate a MAP profile for an MPI program:

module load arm-map
map --profile -np <num procs> --mpiargs="<mpi args>" <program> <program args>

where <num procs> is the number of MPI processes to use, <mpi args> represents arguments to be passed to srun (other than -n), <program> is the executable to be run and <program args> represents arguments passed to your program.

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

map --profile -np 12 wave_c

To profile a non-MPI program:

module load arm-map
map --profile --no-mpi <program> <program args>

The profile data is saved in a .map file in your current directory. The file name is based on the executable name, number of processes, date, and time, for example, wave_c_12p_2016-02-05_12-46.map.
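Since each run produces a new time-stamped file, it can be handy to pick out the most recent profile for a given executable. The following sketch does this with a glob that follows the naming scheme described above; latest_profile is a hypothetical helper name.

```shell
# Print the most recently modified profile file for a given executable name.
# latest_profile is a hypothetical helper; the glob follows the naming scheme
# <executable>_<N>p_<date>_<time>.map. Prints nothing if no file matches.
latest_profile() {
  local exe="$1"
  ls -t "${exe}"_*p_*.map 2>/dev/null | head -n 1
}

# Example: report the newest wave_c profile in the current directory, if any
profile="$(latest_profile wave_c)"
if [ -n "$profile" ]; then
  echo "Newest profile: $profile"
fi
```

The resulting name can then be passed to map to open that profile.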

For more details on using ARM MAP, refer to the ARM Forge User Guide.

3. Analyze the profile data file using either the ARM local client or the MAP GUI.

You can open the profile data file using a client running on your local desktop computer. For client installation and usage instructions, please refer to the section: Client Download and Setup. This option typically offers the best performance.

Alternatively, you can run MAP in interactive mode, which launches the graphical user interface (GUI).  For example:

map wave_c_12p_2016-02-05_12-46.map

For the GUI application, one should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.

MAP with GPU

ARM MAP can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch ARM MAP with

map {executable}

For more information, see Chapter 15 of the ARM Forge User Guide.

Client Download and Setup

1. Download the client.

To download the client, go to the ARM website and choose the appropriate ARM Forge remote client download for Windows, Mac, or Linux. For Windows and Mac, double-click the downloaded file and allow the installer to run. For Linux, extract the tar file using the command tar -xf file_name and run the installer in the extracted directory with ./installer. Please contact OSC Help if you have any issues downloading the client.
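For Linux, the two commands above can be combined into one step. This is only a sketch: install_forge_client is a hypothetical helper name, the default directory forge-client is arbitrary, and the code assumes the archive contains a single top-level directory holding the installer.

```shell
# Extract a downloaded ARM Forge client tarball and start its installer.
# install_forge_client is a hypothetical helper; pass the path of the file
# you downloaded and, optionally, a directory to extract into.
install_forge_client() {
  local tarball="$1"
  local dest="${2:-forge-client}"
  mkdir -p "$dest" || return 1
  # Assumes the archive has a single top-level directory, which is stripped
  tar -xf "$tarball" -C "$dest" --strip-components=1 || return 1
  # Run the installer from inside the extracted directory
  (cd "$dest" && ./installer)
}
```

Call it with the path of the downloaded file, for example install_forge_client file_name, where file_name is the tar file from the ARM website.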

2. Configure the client.

After installation, you can configure the client as follows:

  • Open the client program. For Windows or Mac, just click the desktop icon or navigate to the application through its file path. For Linux use the command {arm-forge-path}/bin/map.

  • Once the program is launched, select ARM MAP in the left column.
  • In the Remote Launch drop down menu, select "Configure...".
  • Click Add to create a new profile for your login.
  • In the Host Name section, type your ssh connection. For example: "username@owens.osc.edu".
  • For Remote Installation Directory, type /usr/local/arm/forge-{version}, specifying the ARM Forge version number that created the profile data file you are attempting to view. For example, /usr/local/arm/forge-22.0.2 for ARM Forge version 22.0.2.
  • You can test your login information by clicking Test Remote Launch. It will ask for your password; use the same password as for your cluster login.
  • Close the Configure window. You will see a new option under the Remote Launch drop down menu for the host name you entered. Select your profile and log in with your password.
  • If the login was successful, then you should see License Serial:XXX in the bottom left corner of the window.

This login configuration is needed only the first time you use the client. After that, you can just select your profile.

3. Open the profile data file.

After login, click on LOAD PROFILE DATA FILE. This opens a file browser of your home directory on the OSC cluster you logged onto. Go to the directory that contains the .map file and select it. This will open the file and allow you to navigate the source code line-by-line and investigate the performance characteristics. 

A license is not required to simply open the client, so you can skip step 2 (Configure the client) if you download the profile data file to your desktop. You can then open it by selecting LOAD PROFILE DATA FILE and navigating through a file browser on your local system.

Note that the client is ARM Forge, which contains both ARM MAP and ARM DDT. ARM DDT is a debugger; Totalview is also available if you need an alternative debugger.


ARM DDT

Arm DDT is a graphical debugger for HPC applications. It supports pthreads, OpenMP, or MPI code on CPU, GPU, and MIC based architectures.

Availability & Restrictions

Versions

The Arm DDT versions currently available at OSC are:

Version   Owens   Pitzer
22.0.2    X*      X*
* Current default version

You can use module spider arm-ddt to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

Arm DDT is available to all OSC users. We have 64 seats with 80 HPC tokens. Users can monitor the Arm License Server Status.

Publisher/Vendor and License Type

ARM, Commercial

Usage

Set-up

To load the module for the Arm DDT default version, use module load arm-ddt. To select a particular software version, use module load arm-ddt/version. For example, use module load arm-ddt/22.0.2 to load Arm DDT version 22.0.2, provided the version is available on the OSC cluster in use.

Note: Before you run DDT from the command line for the first time, open DDT as a GUI from OnDemand to configure with appropriate settings for your environment.

Using Arm DDT

In offline mode, DDT debugs an executable without user interaction and generates a DDT report. The program can be used to debug third-party code as well as code you develop yourself. DDT reports are normally generated in a batch job.

To generate a DDT report for an MPI program:

module load arm-ddt
ddt --offline -np <num procs> --mpiargs="<mpi args>" <program> <program args>

where <num procs> is the number of MPI processes to use, <mpi args> represents arguments to be passed to mpiexec (other than -n or -np), <program> is the executable to be run and <program args> represents arguments passed to your program.

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

ddt --offline -np 12 wave_c

To debug a non-MPI program:

module load arm-ddt
ddt --offline --no-mpi <program> <program args>

The DDT report is created in HTML format. The file name is based on the executable name, number of processes, date, and time, for example, wave_c_12p_2016-02-05_12-46.html. To open the report, use

firefox wave_c_12p_2016-02-05_12-46.html
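Because DDT reports are normally generated in a batch job, the offline commands above typically go into a job script. The following is a minimal sketch that writes one, assuming the cluster uses the Slurm scheduler; the file name ddt_offline_job.sh, the account code PAS1234, and the resource requests are placeholders to adapt.

```shell
# Write a minimal Slurm job script that runs DDT in offline mode.
# The file name, account code (PAS1234), and resource requests are placeholders.
cat > ddt_offline_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=wave_c_ddt
#SBATCH --account=PAS1234
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:30:00

module load arm-ddt

# Non-interactive debugging run of "mpiexec -n 12 wave_c"
ddt --offline -np 12 wave_c
EOF

echo "Wrote ddt_offline_job.sh"
```

The script could then be submitted with sbatch ddt_offline_job.sh, and the HTML report opened after the job completes.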

Using the Arm DDT GUI

To debug with the DDT GUI, remove the --offline option. For example, to debug the MPI program above, use

ddt -np 12 wave_c

For a non-MPI program:

ddt --no-mpi <program> <program args>

This will open the DDT GUI, enabling interactive debugging options.

For the GUI application, one should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.

For more details, see the Arm DDT developer page.

DDT with GPU

DDT can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch Arm DDT with

ddt {executable}

For more information, see Chapter 14 of the Arm Forge User Guide.
