HOW TO: Look at wall-time accuracy using XDMoD

The XDMoD tool at xdmod.osc.edu can be used to get an overview of how closely the requested wall-time of jobs matches their actual wall-time.

The requested wall-time is the amount of time for which one wants to reserve resources. It is set in the job script with:

#PBS -l walltime=xx:xx:xx
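The walltime value is written as HH:MM:SS, while XDMoD reports wall-time in hours. A minimal Python sketch of the conversion, assuming the value always has exactly three colon-separated fields (other forms the scheduler may accept are not handled):

```python
def walltime_to_hours(walltime: str) -> float:
    """Convert a walltime string of the form "HH:MM:SS" to hours.

    Assumes exactly three colon-separated integer fields; a ValueError
    is raised for anything else.
    """
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours + minutes / 60 + seconds / 3600


if __name__ == "__main__":
    # A request of walltime=01:30:00 corresponds to 1.5 wall hours.
    print(walltime_to_hours("01:30:00"))  # prints 1.5
```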

The actual wall-time is how long the job held the reserved resources before completing. It is reported in the job output file, whose name has the form

<job-name>.o<job-id> 

For example:

$ cat owens_r_parallel.sh.o7691974
<omitted-irrelevant-information>
...

-----------------------
Resources requested:
nodes=2:ppn=28
-----------------------
Resources used:
cput=00:18:42
walltime=00:10:29        <- Actual wall-time here
mem=11.156GB
vmem=21.148GB
-----------------------
Resource units charged (estimate):
0.978 RUs
-----------------------

If the job was submitted through OnDemand, the output file can also be found under Folder Contents:

[Screenshot: job output file listed under Folder Contents in OnDemand]
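Since the actual wall-time appears on the walltime= line of the "Resources used:" section, it can also be pulled out of the output file with a short script. A minimal Python sketch, assuming the output layout matches the example above:

```python
import re
from typing import Optional

# Matches the "walltime=HH:MM:SS" line; assumes the format shown in the
# example job output.
WALLTIME_RE = re.compile(r"^walltime=(\d+):(\d{2}):(\d{2})", re.MULTILINE)


def actual_walltime_seconds(output_text: str) -> Optional[int]:
    """Return the wall-time listed under "Resources used:", in seconds.

    Only text after "Resources used:" is searched, so a walltime entry in
    the requested section is not picked up. Returns None if either the
    section or the walltime line is missing.
    """
    _, found, used_section = output_text.partition("Resources used:")
    if not found:
        return None
    match = WALLTIME_RE.search(used_section)
    if match is None:
        return None
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours * 3600 + minutes * 60 + seconds


if __name__ == "__main__":
    sample = """-----------------------
Resources used:
cput=00:18:42
walltime=00:10:29
mem=11.156GB
-----------------------
"""
    print(actual_walltime_seconds(sample))  # prints 629
```

To check a real job, read the contents of its output file (for example owens_r_parallel.sh.o7691974) and pass them to the function.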

It is important to understand that the scheduler uses the requested wall-time, not the actual wall-time, when scheduling a submitted job. If a job requests much more wall-time than it is expected to need, it may take longer to start, because its resources must be reserved for the entire requested period even if the job only uses a small portion of it.
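One way to put a number on the accuracy of a single job is to divide its actual wall-time by its requested wall-time. A minimal Python sketch; the requested value of 01:00:00 is a hypothetical example, since the job output shown above does not list the requested wall-time:

```python
def to_seconds(walltime: str) -> int:
    """Convert "HH:MM:SS" to seconds (assumes exactly that format)."""
    hours, minutes, seconds = (int(part) for part in walltime.split(":"))
    return hours * 3600 + minutes * 60 + seconds


def walltime_accuracy(requested: str, actual: str) -> float:
    """Fraction of the requested wall-time that the job actually used."""
    return to_seconds(actual) / to_seconds(requested)


if __name__ == "__main__":
    # Hypothetical 1-hour request for the 00:10:29 job in the example output.
    ratio = walltime_accuracy("01:00:00", "00:10:29")
    # prints: 17.5% of the requested wall-time was used
    print(f"{ratio:.1%} of the requested wall-time was used")
```

In this hypothetical case the scheduler would have had to find a full hour of free resources for a job that finished in about ten and a half minutes.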

The job output file shows the wall-time accuracy of an individual job; XDMoD can show the same comparison for all jobs submitted over a period of time.

First, log in to xdmod.osc.edu. See this page for more instructions:

https://www.osc.edu/supercomputing/knowledge-base/xdmod_tool

Then, navigate to the Metric Explorer tab.

Look for the Metric Catalog on the left side of the page and expand the SUPREMM options. Select Wall Hours: Requested: Per Job and group by None.

[Screenshot: Metric Catalog with SUPREMM > Wall Hours: Requested: Per Job selected]

The chart now shows the average requested wall-time per job.

The actual wall-time data can be added by navigating to Add Data -> SUPREMM -> Wall Hours: Per Job.

[Screenshot: Add Data menu]

[Screenshot: selecting SUPREMM > Wall Hours: Per Job]

This opens a window titled Data Series Definition, where some parameters can be changed before the new data is shown. To make it easy to distinguish between actual and requested wall-time, change the Display Type to Bar, then click Add to view the new data.

[Screenshot: Data Series Definition window with Display Type set to Bar]

Now there is a line that shows the average requested wall-time of jobs and bars that show the average actual wall-time of jobs. Essentially, the closer a bar is to the line without intersecting it, the more accurate the wall-time prediction. A bar that intersects the line may indicate that not enough wall-time was requested for jobs to complete, but remember that these values are averages.

[Screenshot: chart of average requested wall-time (line) and average actual wall-time (bars)]
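Because the chart shows averages, a bar can sit far below the line even when individual jobs ran out of wall-time. A short Python illustration with made-up job data (not XDMoD output), which also counts the jobs whose actual wall-time reached their request:

```python
from statistics import mean

# Made-up (requested hours, actual hours) pairs for illustration only;
# these are not values exported from XDMoD.
JOBS = [
    (10.0, 2.0),
    (10.0, 3.0),
    (2.0, 2.0),  # used all of its requested wall-time
]


def summarize(jobs):
    """Return (average requested, average actual, jobs that hit the limit)."""
    avg_requested = mean(requested for requested, _ in jobs)
    avg_actual = mean(actual for _, actual in jobs)
    # A job whose actual wall-time reached its request may have run out of time.
    at_limit = sum(1 for requested, actual in jobs if actual >= requested)
    return avg_requested, avg_actual, at_limit


if __name__ == "__main__":
    req, act, n = summarize(JOBS)
    # prints: average requested 7.33 h, average actual 2.33 h, 1 job(s) at the limit
    print(f"average requested {req:.2f} h, average actual {act:.2f} h, "
          f"{n} job(s) at the limit")
```

Here the bar would reach only about a third of the line's height, yet one job used its entire request, which is why it can be worth looking at the individual jobs behind a data point.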

One can also view more detailed information about these jobs by clicking a data point and using the Show raw data option.

[Screenshot: selecting a data point]

If the Show raw data option does not appear, one may need to use the Drilldown option first to break down the jobs in that list by user or another dimension.

[Screenshot: Show raw data option]
