Occasionally, jobs that experience problems may generate emails from staff or automated systems at the center with some information about the nature of the problem. This page explains the various emails that may be sent and the steps you can take to address the problem.
Regular job emails
These emails can be turned on or off using the appropriate Slurm directives. Other email addresses can also be specified. See the mail options section of the job scripts page.
|Job began/ended||The job began or ended. These are normal emails.|
|Job aborted||The job ended in an abnormal state.|
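As a minimal sketch of the Slurm directives mentioned above, the job script below asks for an email when the job begins, ends, or fails. The job name, time limit, and the address `user@example.com` are placeholders chosen for illustration, not values taken from this page; the mail options section of the job scripts page remains the authoritative reference.

```shell
#!/bin/bash
# Hypothetical job name and resource requests, for illustration only.
#SBATCH --job-name=mail_demo
#SBATCH --time=00:05:00
#SBATCH --ntasks=1
# Send an email when the job begins, ends, or fails.
# Use --mail-type=NONE to turn job emails off.
#SBATCH --mail-type=BEGIN,END,FAIL
# Placeholder address; replace it with the address that should receive the emails.
#SBATCH --mail-user=user@example.com

# SLURM_JOB_ID is set by Slurm inside a job; fall back to a marker otherwise.
job_id="${SLURM_JOB_ID:-not-in-slurm}"
echo "Job ${job_id} running on $(uname -n)"
```

Submitting this script with `sbatch` should produce the regular "began" and "ended" emails described in the table above.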
There is no option to turn these emails off, because they cover situations in which we need to contact the user who submitted the job. If you expect to receive them regularly, we can work with you. Please contact OSC Help in this case.
|Deleted by administrator||
OSC staff may delete running jobs if:
OSC staff may delete queued jobs if:
|Emails exceed expected volume||Job emails may be delayed if too many are queued to be sent to a single email address. This is to prevent OSC from being blacklisted by the email server.|
|Failure due to hardware/software problem||The node(s) or software that a job was using had a critical issue and the job failed.|
|Overuse of physical memory (RAM)||
The node that was in use crashed because it ran out of memory.
See the out-of-memory (OOM) or excessive memory usage page for more information.
|Job requeued||A job may be requeued explicitly by a system administrator or after a node failure.|
An issue with GPFS may have affected the job. This includes jobs that use directories located in:
|Filling up /tmp||
The job failed after exhausting the space in a node's local /tmp directory.
Please either request an entire node or use scratch.
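The two remedies for a full /tmp can be combined in one job script, roughly as sketched below. This is an illustration under stated assumptions, not OSC's prescribed method: `SCRATCH_DIR` and the helper `choose_workdir` are hypothetical names introduced here, and the actual scratch path for your project must be filled in by you. The `--exclusive` flag is a standard `sbatch` option that requests whole-node access.

```shell
#!/bin/bash
# Request one whole node so its local disk is not shared with other jobs.
#SBATCH --nodes=1
#SBATCH --exclusive
#SBATCH --time=00:10:00

# SCRATCH_DIR is a hypothetical variable: set it to your own scratch path.
# If it is unset, fall back to the current working directory.
choose_workdir() {
    printf '%s\n' "${SCRATCH_DIR:-$PWD}"
}

# Create a per-job directory for temporary files.
workdir="$(choose_workdir)/job_${SLURM_JOB_ID:-manual}"
mkdir -p "$workdir"

# Many tools place temporary files under TMPDIR instead of /tmp when it is set.
export TMPDIR="$workdir"

# Report the free space available at the chosen location.
df -h "$workdir"
```

Pointing `TMPDIR` at scratch only helps for programs that honor that variable; software with a hard-coded /tmp path needs its own configuration change.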
Contact OSC Help for assistance if there are any questions.