qsub filter rejects valid jobs

Resolution: Resolved

Job scripts submitted on Glenn, Oakley, or Ruby all go through a submit filter before reaching the resource manager, Torque.  A bug has been discovered in our submit filter that prevents jobs with the substring "-A" in their PBS directives from being parsed and interpreted correctly.  This behavior is only observed when the "-A" substring appears in the job script itself; if an argument containing "-A" is passed to qsub on the command line, the bug does not occur.

Take the following job script:

#PBS -lnodes=1:ppn=12
#PBS -Nthisisajobname-Andsoisthis
#PBS -lwalltime=1:00:00

sleep 10

This job script defines a 1-node job with 1 hour of walltime, named "thisisajobname-Andsoisthis".

Attempting to submit the above job produces the following error:

$ qsub qsub_filter_group_test.pbs
qsub filter $Revision: 490 $, $Date: 2014-09-23 17:02:52 -0400 (Tue, 23 Sep 2014) $
Group: ndsoisthis is not valid.
Please choose from: appl
qsub: Your job has been administratively rejected by the queueing system.
qsub: There may be a more detailed explanation prior to this notice.

This occurs because the submit filter interprets the text following the "-A" substring as the group to charge against, when in reality it is part of the job name.  The same misparse happens when the "-A" substring appears in any PBS directive.
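
The misparse can be reproduced with a short shell sketch. This is only an illustration of how a naive substring match would behave. The submit filter's actual code is not shown in this notice, and the function name naive_group is hypothetical.

```shell
#!/bin/bash
# Hypothetical illustration only; this is NOT the submit filter's real code.
# It treats everything after the first "-A" on a line as the group name,
# which is the kind of matching that would produce the error shown above.
naive_group() {
    local line="$1"
    case "$line" in
        # Strip the shortest prefix ending in "-A" and print what is left.
        *-A*) printf '%s\n' "${line#*-A}" ;;
    esac
}

# The job-name directive from the example script is mistaken for a group.
naive_group '#PBS -Nthisisajobname-Andsoisthis'   # prints: ndsoisthis
```

The printed value matches the group named in the "Group: ndsoisthis is not valid." message, which is consistent with the filter matching "-A" anywhere in the directive rather than only as a standalone option.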

Our temporary workaround is to not use the substring "-A" within the PBS directives unless it is being used to define the group to charge against.
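
To apply the workaround, a job script can be checked before submission. The following is a minimal sketch, not an OSC-provided tool. The function name check_pbs_directives is hypothetical, and the check assumes that a legitimate account directive appears on its own line in the form "#PBS -A group".

```shell
#!/bin/bash
# Hypothetical pre-submission check; not an OSC-provided tool.
# Prints (with line numbers) every #PBS line that contains "-A" anywhere
# other than as a standalone "-A <group>" directive.
# Exit status 0 means at least one suspicious line was found.
check_pbs_directives() {
    grep -n '^#PBS.*-A' "$1" |
        grep -Ev '^[0-9]+:#PBS[[:space:]]+-A[[:space:]]*[A-Za-z0-9_]+[[:space:]]*$'
}

# Example: check the job script from this notice before submitting it.
if [ -f qsub_filter_group_test.pbs ] && check_pbs_directives qsub_filter_group_test.pbs; then
    echo "Rename or remove the \"-A\" substring in the lines above before running qsub."
fi
```

The check is deliberately conservative: a line that combines "-A" with other options is also reported. Because arguments passed to qsub on the command line are not affected by this bug, moving such a value out of the script and onto the qsub command line may be another option.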

We are working on a better long-term solution and will update this known issue once one is developed.