HOWTO: Collect performance data for your program

This page outlines ways to generate and view performance data for your program using tools available at OSC.

Intel Tools

This section describes how to use performance tools from Intel. Make sure that you have an Intel module loaded to use these tools.

Intel VTune

Intel VTune is a tool to generate profile data for your application. Generating profile data with Intel VTune typically involves three steps:

1. Prepare the executable for profiling.

You need executables with debugging information to view source code line detail: re-compile your code with the -g option added to your other compiler options. For example:

mpicc wave.c -o wave -g -O3

2. Run your code to produce the profile data.

Profiles are normally generated in a batch job. To generate a VTune profile for an MPI program:

mpiexec <mpi args> amplxe-cl <vtune args> <program> <program args>

where <mpi args> represents arguments to be passed to mpiexec, <vtune args> represents arguments to be passed to the VTune executable amplxe-cl, <program> is the executable to be run, and <program args> represents arguments passed to your program.

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

mpiexec -n 12 amplxe-cl -collect hotspots -result-dir r001hs wave_c

To profile a non-MPI program:

amplxe-cl <vtune args> <program> <program args>

As a result of this step, a subdirectory containing the profile data files is created in your current directory. The subdirectory name is based on the -result-dir argument and the node id, for example, r001hs.o0674.ten.osc.edu.
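
As noted above, profiles are normally generated in a batch job. A minimal sketch of such a job script, assuming a Slurm scheduler; the module name, account, and resource requests are placeholders to adjust for your own run:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --time=1:00:00
#SBATCH --account=<project>

module load intel
mpiexec -n 12 amplxe-cl -collect hotspots -result-dir r001hs wave_c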

3. Analyze your profile data.

You can open the profile data using the VTune GUI in interactive mode. For example:

amplxe-gui r001hs.o0674.ten.osc.edu

One should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows). Note that X11 forwarding can be distractingly slow for interactive applications.
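
If the GUI is too slow over X11, VTune can also produce plain-text reports directly on the command line. As a hedged example using the classic amplxe-cl report interface (check amplxe-cl -help report for the exact options in your installed version):

amplxe-cl -report summary -result-dir r001hs.o0674.ten.osc.edu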

Intel ITAC

Intel Trace Analyzer and Collector (ITAC) is a tool to generate trace data for your application. Generating trace data with Intel ITAC typically involves three steps:

1. Prepare the executable for tracing.

You need to compile your executable with the -tcollect option added to your other compiler options; this inserts instrumentation probes that call the ITAC API. For example:

mpicc wave.c -o wave -tcollect -O3

2. Run your code to produce the trace data.

mpiexec -trace <mpi args> <program> <program args>

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

mpiexec -trace -n 12 wave_c

As a result of this step, .anc, .f, .msg, .dcl, .stf, and .proc files will be generated in your current directory.

3. Analyze the trace data files using Trace Analyzer

You will need to use traceanalyzer to view the trace data. To open Trace Analyzer:

traceanalyzer /path/to/<stf file>

where the base name of the .stf file will be the name of your executable.
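
For example, if your executable is wave_c, the run produces wave_c.stf, which you would open with:

traceanalyzer ./wave_c.stf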

One should use an OnDemand VDI (Virtual Desktop Interface) or have X11 forwarding enabled (see Setting up X Windows) to view the trace data. Note that X11 forwarding can be distractingly slow for interactive applications.

Intel APS

Intel's Application Performance Snapshot (APS) is a tool that provides a summary of your application's performance. Profiling HPC software with Intel APS typically involves four steps:

1. Prepare the executable for profiling.

Regular executables can be profiled with Intel APS, but source code line detail will not be available. You need executables with debugging information to view source code line detail: re-compile your code with the -g option added to your other compiler options. For example:

mpicc wave.c -o wave -g -O3

2. Run your code to produce the profile data directory.

Profiles are normally generated in a batch job. To generate profile data for an MPI program:

mpiexec <mpi args> aps <program> <program args>

where <mpi args> represents arguments to be passed to mpiexec, <program> is the executable to be run, and <program args> represents arguments passed to your program.

For example, if you normally run your program with mpiexec -n 12 wave_c, you would use

mpiexec -n 12 aps wave_c

To profile a non-MPI program:

aps <program> <program args>

The profile data is saved in a subdirectory in your current directory. The directory name is based on the date, for example, aps_result_YYYYMMDD/.
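
As with VTune, MPI profiling is normally done inside a batch job. A minimal sketch, again assuming a Slurm scheduler with placeholder module name, account, and resource requests:

#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=12
#SBATCH --time=1:00:00
#SBATCH --account=<project>

module load intel
mpiexec -n 12 aps wave_c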

3. Generate the profile file from the directory.

To generate the html profile file from the result subdirectory:

aps --report=./aps_result_YYYYMMDD

to create the file aps_report_YYYYMMDD_HHMMSS.html.

4. Analyze the profile data file.

You can open the profile data file using a web browser on your local desktop computer. This option typically offers the best performance.
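
For example, you could copy the report to your local machine with scp before opening it; the username, host, and path here are placeholders:

scp <username>@<osc login host>:<path to results>/aps_report_YYYYMMDD_HHMMSS.html .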

ARM Tools

This section describes how to use performance tools from ARM.

ARM MAP

Instructions for how to use MAP are available here.

ARM DDT

Instructions for how to use DDT are available here.

ARM Performance Reports

Instructions for how to use Performance Reports are available here.

Other Tools

This section describes how to use other performance tools.

HPC Toolkit

Rice University's HPC Toolkit is a collection of performance tools. Instructions for how to use it at OSC are available here.

TAU Commander

TAU Commander is a user interface for University of Oregon's TAU Performance System. Instructions for how to use it at OSC are available here.
