HOW TO: Look at requested time accuracy using XDMoD

The XDMoD tool at xdmod.osc.edu can be used to get an overview of how accurate the requested time of jobs are with the elapsed time of jobs.

One way of specifying a time request is:

#SBATCH --time=xx:xx:xx

The elapsed time is how long the job ran for before completing. This can be obtained using the  sacct command.

$ sacct -u <username> --format=jobid,account,elapsed

It is important to understand that the requested time is used when scheduling a submitted job. If a job requests a time that is much more than the expected elapsed time, then it may take longer to start because the resources need to be allocated for the time that the job requests even if the job only uses a small portion of that requested time.

This allows one to view the requested time accuracy for an individual job, but XDMoD can be used to do this for jobs submitted in over a time range.

First, login to xdmod.osc.edu, see this page for more instructions.

https://www.osc.edu/supercomputing/knowledge-base/xdmod_tool

Then, navigate to the Metric Explorer tab.

Look for the Metric Catalog on the left side of the page and expand the SUPREMM options. Select Wall Hours: Requested: Per Job and group by None.

walltime_acc_metric_tab.png

This will now show the average time requested.

The actual time data can be added by navigating to Add Data -> SUPREMM -> Wall Hours: Per Job.

walltime_acc_add_data.png

walltime_acc_select_walltime.png

This will open a new window titled Data Series Definition, to change some parameters before showing the new data. In order to easily distinguish between elapsed and requested time, change the Display Type to Bar, then click add to view the new data.

walltime_add_data_settings.png

Now there is a line which shows the average requested time of jobs, and bars which depict the average elapsed time of jobs. Essentialy, the closer the bar is to the line, without intersecting the line, the more accurate the time predicition. If the bar intersects the line, then it may indicate the there was not enough time requested for a job to complete, but remember that these values are averages.

walltime_acc_final_zoom.png

One can also view more detailed information about these jobs by clicking a data point and using the Show raw data option.

wall_acc_select_datapoint.png

In order to have the Show raw data option, one may need to use the Drilldown option first to sort the jobs in that list by use or another metric.

wall_acc_show_raw_data.png

Supercomputer: