Jobs report 'excessive memory usage' message

Category: 
Resolution: Resolved

The output of a batch job may contain a message such as:

****** p0102.ten.osc.edu: Excessive memory usage detected; job may have failed. ******

The safest course of action is to resubmit the job with more memory. A simple way to do that is to increase the value of the ppn specifier in the batch resource request. For context and alternative methods, see https://www.osc.edu/documentation/knowledge_base/out_of_memory_oom_or_excessive_memory_usage
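As a rough illustration of raising ppn before resubmitting, the sketch below rewrites the ppn value in a resource request string such as 'nodes=1:ppn=2'. The helper name bump_ppn is hypothetical and not part of any OSC tool; in practice the same change can be made by editing the job script by hand.

```python
import re


def bump_ppn(resource_request: str, new_ppn: int) -> str:
    """Return the resource request with its ppn value replaced by new_ppn.

    Hypothetical helper for illustration only; it edits a string and does
    not submit or modify any job.
    """
    updated, count = re.subn(r"ppn=\d+", f"ppn={new_ppn}", resource_request)
    if count == 0:
        raise ValueError("no ppn specifier found in resource request")
    return updated


# Doubling ppn from 2 to 4 doubles the memory allocated to the job.
print(bump_ppn("nodes=1:ppn=2", 4))  # prints nodes=1:ppn=4
```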

This issue is not common, but it can appear in surprising ways. For example, copying large files in a batch job can trigger the message, and the copy itself may or may not have completed correctly. In addition, the amount of memory required to copy large files is not intuitive. For example, if an Owens job requests '-l nodes=1:ppn=2', it is allocated mem=4315*2 MB. If that job attempts to copy a 6 GB file from a home directory to a compute node's local temporary directory, the job will report "Excessive memory usage detected; job may have failed."
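The arithmetic in the Owens example can be sketched as follows. The figure of 4315 MB per core comes from the example above; the function names and the 12000 MB target are hypothetical and used only for illustration.

```python
import math

# Memory per core on Owens, in MB, taken from the example above
# (a job with ppn=2 is allocated 4315 * 2 MB).
OWENS_MB_PER_CORE = 4315


def allocated_mb(ppn: int, mb_per_core: int = OWENS_MB_PER_CORE) -> int:
    """Memory in MB allocated to a single-node job that requests ppn cores."""
    return ppn * mb_per_core


def min_ppn_for(required_mb: int, mb_per_core: int = OWENS_MB_PER_CORE) -> int:
    """Smallest ppn whose allocation is at least required_mb."""
    return max(1, math.ceil(required_mb / mb_per_core))


print(allocated_mb(2))     # prints 8630
print(min_ppn_for(12000))  # prints 3 (hypothetical 12000 MB target)
```

Note that the ppn=2 allocation of 8630 MB is already larger than the 6 GB file in the example, yet the job still reports the message. File size alone is therefore not a reliable estimate of the memory to request.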

We are still troubleshooting this issue.