ARM HPC tools analyze how HPC software runs. The suite consists of three applications: ARM DDT, ARM Performance Reports, and ARM MAP.
The following versions of ARM HPC tools are available on OSC clusters:
Version | Owens | Pitzer |
---|---|---|
22.0.2 | X* | X* |
You can use module spider arm
to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
ARM DDT, MAP and Performance Reports are available to all OSC users.
ARM, Commercial
ARM DDT is a debugger for HPC software that automatically alerts users of memory bugs and divergent behavior. For more features and benefits, visit ARM HPC tools and libraries - DDT.
For usage instructions and more information, read ARM DDT.
ARM MAP produces a detailed profile of HPC software. Unlike ARM Performance Reports, you must have the source code to run ARM MAP because its analysis details the software line-by-line. For more features and benefits, visit ARM HPC tools and libraries - MAP.
For usage instructions and more information, read ARM MAP.
ARM Performance Reports analyzes and documents the CPU, MPI, I/O, and memory performance characteristics of HPC software, including third-party code, to help users understand overall performance. It is not needed for every run, but OSC recommends it as a practical way to analyze how an HPC application runs. View an example report to see the format of a typical report. For more example reports, features and benefits, visit ARM HPC tools and libraries - Performance Reports.
For usage instructions and more information, read ARM Performance Reports.
This note from ARM's Getting Started Guide applies to both perf-report and MAP:
Some MPIs, most notably MVAPICH, are not yet supported by ARM's Express Launch mode
(in which you can just put “perf-report” in front of an existing mpirun/mpiexec line). These can
still be measured using the Compatibility Launch mode.
Instead of this Express Launch command:
perf-report mpiexec <mpi args> <program> <program args> # BAD
Use the Compatibility Launch version:
perf-report -n <num procs> --mpiargs="<mpi args>" <program> <program args>
ARM Performance Reports is a simple tool used to generate a single-page HTML or plain text report that presents the overall performance characteristics of HPC applications. It supports pthreads, OpenMP, or MPI code on CPU, GPU, and MIC based architectures.
The versions currently available at OSC are:
Version | Owens | Pitzer |
---|---|---|
22.0.2 | X* | X* |
You can use module spider arm-pr
to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
ARM Performance Reports is available to all OSC users. We have 64 seats with 64 HPC tokens. Users can monitor the license status here.
ARM, Commercial
To load the module for the ARM Performance Reports default version, use module load arm-pr
. To select a particular software version, use module load arm-pr/version
. For example, use module load arm-pr/6.0
to load ARM Performance Reports version 6.0, provided the version is available on the OSC cluster in use.
You can use your regular executables to generate performance reports. The program can be used to analyze third-party code as well as code you develop yourself. Performance reports are normally generated in a batch job.
To generate a performance report for an MPI program:
module load arm-pr
perf-report -np <num procs> --mpiargs="<mpi args>" <program> <program args>
where <num procs>
is the number of MPI processes to use, <mpi args>
represents arguments to be passed to mpiexec (other than -n or -np), <program>
is the executable to be run and <program args>
represents arguments passed to your program.
For example, if you normally run your program with mpiexec -n 12 wave_c
, you would use
perf-report -np 12 wave_c
To generate a performance report for a non-MPI program:
module load arm-pr
perf-report --no-mpi <program> <program args>
The performance report is created in both html and plain text formats. The file names are based on the executable name, number of processes, date and time, for example, wave_c_12p_2016-02-05_12-46.html
. To open the report in html format use
firefox wave_c_12p_2016-02-05_12-46.html
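Since performance reports are normally generated in a batch job, the commands above can be wrapped in a job script. The following is a minimal sketch, assuming a Slurm scheduler; the account name `PAS0000`, the resource requests, and the file name `perf_report_job.sh` are placeholders rather than OSC recommendations, and `wave_c` is the example program used above.

```shell
# Write a sketch of a Slurm job script that runs perf-report on an MPI program.
# The account, resource requests, and program name are placeholders.
cat > perf_report_job.sh <<'EOF'
#!/bin/bash
#SBATCH --job-name=perf-report-wave
#SBATCH --account=PAS0000
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --time=00:10:00

module load arm-pr

# Compatibility Launch form: the process count goes to perf-report, not mpiexec.
perf-report -np 12 ./wave_c
EOF
```

The generated script can then be submitted with `sbatch perf_report_job.sh`; the report files should appear in the job's working directory once it finishes.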
For more details, download the ARM Performance Reports User Guide.
ARM Performance Reports can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch ARM Performance Reports with
perf-report {executable}
For more information, please read section 6.10 of the ARM Performance Reports User Guide.
ARM MAP is a full scale profiler for HPC programs. We recommend using ARM MAP after reviewing reports from ARM Performance Reports. MAP supports pthreads, OpenMP, and MPI software on CPU, GPU, and MIC based architectures.
The ARM MAP versions currently available at OSC are:
Version | Owens | Pitzer |
---|---|---|
22.0.2 | X* | X* |
You can use module spider arm-map
to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
ARM MAP is available to all OSC users. We have 64 seats with 80 HPC tokens. Users can monitor the ARM License Server Status.
ARM, Commercial
To load the default version of the ARM MAP module, use module load arm-map
. To select a particular software version, use module load arm-map/version
. For example, use module load arm-map/6.0
to load ARM MAP version 6.0, provided the version is available on the cluster in use.
Note: Before you run MAP from the command line for the first time, open MAP as a GUI from OnDemand to configure with appropriate settings for your environment.
Profiling HPC software with ARM MAP typically involves three steps:
Regular executables can be profiled with ARM MAP, but source code line detail will not be available. You need executables with debugging information to view source code line detail: re-compile your code with a -g
option added among the other appropriate compiler options. For example:
mpicc wave.c -o wave -g -O3
This executable built with the debug flag can be used for ARM Performance Reports as well.
Note: The -g
flag can turn off optimizations by default with some compilers (the Intel compilers, for example). To get a representative profile, use the same optimizations as your regular executable, so explicitly include the -On
flag, where n is your normal level of optimization, typically -O2
or -O3
, as well as any other compiler optimization options.
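The advice above (keep `-g`, and state the optimization level explicitly) can be checked before building. This is only an illustrative sketch: `check_profile_flags` is a hypothetical helper, not part of ARM MAP or any compiler, and it recognizes only the plain `-g` flag and the levels `-O0` through `-O3` and `-Os`.

```shell
# Hypothetical pre-build check: succeed only if the flags keep debug info
# (-g) and also name an explicit optimization level (-O0..-O3 or -Os).
# Variants such as -g3 or -Ofast are not recognized by this simple sketch.
check_profile_flags() {
  local flags=" $* " ok=0
  case $flags in
    *" -g "*) ;;
    *) echo "missing -g: no source line detail in the profile" >&2; ok=1 ;;
  esac
  case $flags in
    *" -O"[0-3s]" "*) ;;
    *) echo "no explicit -O level: optimization may differ from your regular build" >&2; ok=1 ;;
  esac
  return "$ok"
}

CFLAGS="-g -O3"
check_profile_flags $CFLAGS && echo "flags look suitable for profiling"
```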
Profiles are normally generated in a batch job. To generate a MAP profile for an MPI program:
module load arm-map
map --profile -np <num procs> --mpiargs="<mpi args>" <program> <program args>
where <num procs>
is the number of MPI processes to use, <mpi args>
represents arguments to be passed to srun (other than -n), <program>
is the executable to be run and <program args>
represents arguments passed to your program.
For example, if you normally run your program with mpiexec -n 12 wave_c
, you would use
map --profile -np 12 wave_c
To profile a non-MPI program:
module load arm-map
map --profile --no-mpi <program> <program args>
The profile data is saved in a .map file in your current directory. The file name is based on the executable name, number of processes, date and time, for example, wave_c_12p_2016-02-05_12-46.map
.
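Because each run produces a new time-stamped file, repeated profiling runs can leave several .map files in one directory. The sketch below picks the most recently modified one for a given executable; `latest_report` is a hypothetical helper, and it assumes the file naming pattern described above (executable name, then the process count followed by `p`, then date and time).

```shell
# Hypothetical helper: print the most recently modified report or profile
# file for an executable, e.g. wave_c_12p_2016-02-05_12-46.map.
# usage: latest_report <executable name> <extension>
latest_report() {
  local prefix=$1 ext=$2 newest="" f
  for f in "${prefix}"_*p_*."${ext}"; do
    [ -e "$f" ] || continue            # the pattern matched nothing
    if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
      newest=$f
    fi
  done
  [ -n "$newest" ] && printf '%s\n' "$newest"
}
```

For example, `latest_report wave_c map` prints the newest `wave_c` profile in the current directory and returns a non-zero status if there is none. The same helper works for the `.html` reports by passing `html` as the extension.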
For more details on using ARM MAP, refer to the ARM Forge User Guide.
You can open the profile data file using a client running on your local desktop computer. For client installation and usage instructions, please refer to the section: Client Download and Setup. This option typically offers the best performance.
Alternatively, you can run MAP in interactive mode, which launches the graphical user interface (GUI). For example:
map wave_c_12p_2016-02-05_12-46.map
For the GUI application, one should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.
ARM MAP can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch ARM MAP with
map {executable}
For more information, please read Chapter 15 of the ARM Forge User Guide.
To download the client, go to the ARM website and choose the appropriate ARM Forge remote client download for Windows, Mac, or Linux. For Windows and Mac, just double click on the downloaded file and allow the installer to run. For Linux, extract the tar file using the command tar -xf file_name
and run the installer in the extracted file directory with ./installer
. Please contact OSC Help if you have any issues downloading the client.
After installation, you can configure the client as follows:
Open the client program. For Windows or Mac, just click the desktop icon or navigate to the application through its file path. For Linux use the command {arm-forge-path}/bin/map
.
/usr/local/arm/forge-{version}
, specifying the ARM Forge version number that created the data profile file you are attempting to view. For example, /usr/local/arm/forge-7.0
for ARM Forge version 7.0. This login configuration is needed only the first time you use the client. Afterwards, you can just select your profile.
After login, click on LOAD PROFILE DATA FILE. This opens a file browser of your home directory on the OSC cluster you logged onto. Go to the directory that contains the .map file and select it. This will open the file and allow you to navigate the source code line-by-line and investigate the performance characteristics.
A license is not required to simply open the client, so you can skip step 2 (Configure the client) if you download the profile data file to your desktop. You can then open it by selecting LOAD PROFILE DATA FILE and navigating through a file browser on your local system.
Note that the client is ARM Forge, which contains both ARM MAP and ARM DDT. ARM DDT is a debugger, and OSC has a license only for ARM MAP. If you need a debugger, you can use Totalview instead.
Arm DDT is a graphical debugger for HPC applications. It supports pthreads, OpenMP, or MPI code on CPU, GPU, and MIC based architectures.
The Arm DDT versions currently available at OSC are:
Version | Owens | Pitzer |
---|---|---|
22.0.2 | X* | X* |
You can use module spider arm-ddt
to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
Arm DDT is available to all OSC users. We have 64 seats with 80 HPC tokens. Users can monitor the Arm License Server Status.
ARM, Commercial
To load the module for the Arm DDT default version, use module load arm-ddt
. To select a particular software version, use module load arm-ddt/version
. For example, use module load arm-ddt/7.0
to load Arm DDT version 7.0, provided the version is available on the OSC cluster in use.
Note: Before you run DDT from the command line for the first time, open DDT as a GUI from OnDemand to configure with appropriate settings for your environment.
DDT debugs executables to generate DDT reports. The program can be used to debug third-party code as well as code you develop yourself. DDT reports are normally generated in a batch job.
To generate a DDT report for an MPI program:
module load arm-ddt
ddt --offline -np <num procs> --mpiargs="<mpi args>" <program> <program args>
where <num procs>
is the number of MPI processes to use, <mpi args>
represents arguments to be passed to mpiexec (other than -n or -np), <program>
is the executable to be run and <program args>
represents arguments passed to your program.
For example, if you normally run your program with mpiexec -n 12 wave_c
, you would use
ddt --offline -np 12 wave_c
To debug a non-MPI program:
module load arm-ddt
ddt --offline --no-mpi <program> <program args>
The DDT report is created in html format. The file names are based on the executable name, number of processes, date and time, for example, wave_c_12p_2016-02-05_12-46.html
. To open the report use
firefox wave_c_12p_2016-02-05_12-46.html
To debug with the DDT GUI remove the --offline
option. For example, to debug the MPI program above, use
ddt -np 12 wave_c
For a non-MPI program:
ddt --no-mpi <program> <program args>
This will open the DDT GUI, enabling interactive debugging options.
For the GUI application, one should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.
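The choice between offline and GUI debugging described above could be automated by checking whether an X display is available. This is a sketch under that assumption only: `ddt_command` is a hypothetical wrapper that prints the command rather than running it, and it treats an empty or unset `DISPLAY` as "no GUI possible".

```shell
# Hypothetical wrapper: print the ddt command to use for an MPI program,
# adding --offline when no X display is available for the GUI.
# usage: ddt_command <num procs> <program> [program args...]
ddt_command() {
  local nprocs=$1
  shift
  if [ -n "${DISPLAY:-}" ]; then
    echo "ddt -np $nprocs $*"
  else
    echo "ddt --offline -np $nprocs $*"
  fi
}

ddt_command 12 ./wave_c
```

Inside a batch job, where `DISPLAY` is normally unset, this prints the `--offline` form; in an OnDemand desktop or an X11-forwarded session it prints the GUI form.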
For more details, see the Arm DDT developer page.
DDT can be used for CUDA codes. If you have an executable compiled with the CUDA library, you can launch Arm DDT with
ddt {executable}
For more information, please read Chapter 14 of the Arm Forge User Guide.