HOWTO: Reduce Disk Space Usage

This "how to" will demonstrate how to lower ones' disk space usage.  The following procedures can be applied to all of OSC's file systems.

We recommend users regularly check their data usage and clean out old data that is no longer needed.

Users who need assistance lowering their data usage can contact OSC Help.

Preventing Excessive Data Usage Before It Starts

Users should ensure that their jobs are written in such a way that temporary data is not saved to permanent file systems, such as the project space file system or their home directory.  
 

If your job copies data from the scratch file system or its node's local disk ($TMPDIR) back to a permanent file system, such as the project space file system or a home directory, you should ensure you are only copying the files you will need later.

Identifying Old and Large Data

The following commands will help you identify old data using the find command.

find commands may produce an excessive amount of output.  To terminate the command while it is running, click  CTRL + C .

Find all files in a directory that have not been accessed in the past 100 days:

This command will recursively search the users home directory and give a detailed listing of all files not accessed in the past 100 days.  

The last access time atime is updated when a file is opened by any operation, including grep , cat , head , sort , etc.

find ~ -atime +100 -exec ls -l {} \;
  • To search a different directory replace ~ with the path you wish to search.  A period . can be used to search the current directory.
  • To view files not accessed over a different time span, replace 100 with your desired number of days.
  • To view the total size in bytess of all the files found by find , you can add  | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -atime +100 -exec ls -l {} \;| awk '{s+=$5} END {print "Total SIZE (bytes): " s}'

Find all files in a directory that have not been modified in the past 100 days:

This command will recursively search the users home directory and give a detailed listing of all files not modified in the past 100 days.

The last modified time mtime is updated when a file's contents are updated or saved.  Viewing a file will not update the last modified time.

find ~ -mtime +100 -exec ls -l {} \; 
  • To search a different directory replace  ~  with the path you wish to search.  A period  .  can be used to search the current directory.
  • To view files not modified over a different time span, replace  100  with your desired number of days.
  • To view the total size in bytes of all the files found by find , you can add  | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'  to the end of the command:
find ~ -mtime +100 -exec ls -l {} \;| awk '{s+=$5} END {print "Total SIZE (bytes): " s}'

List files larger than a specified size:

Adding the -size <size>  option and argument to the find command allows you to only view files larger than a certain size.  This option and argument can be added to any other find command.

For example, to view all files in a users home directory that are larger than 1GB:

find ~ -size 1G -exec ls -l {} \;

Deleting Identified Data

CAUTION: Be careful when deleting files.  Be sure your command will do what you want before running it.  Extra caution should be used when deleting files from a file system that is not backed up, such as the scratch file system.

If you no longer need the old data, you can delete it using the rm command.

If you need to delete a whole directory tree (a directory and all of its subcontents, including other directories), you can use the rm -R command.  

For example, the following command will delete the data directory in a users home directory:

rm -R ~/data

If you would like to be prompted for conformation before deleting every file, use the -i option.

rm -Ri ~/data 

Enter y or n when prompted.  Simply pressing the enter button will default to n .

Deleting files found by find

rm can be combined with any find command to delete the files found.  The syntax for doing so is:

find <location> <other find options> -exec rm -i {} \;

Where <other find options> can include one or more of the options  -atime <time> , -mtime <time> , and -size <size> .

The following command would find all files in the ~/data directory 1G or larger that have not been accessed in the past 100 days, and then prompt for confirmation to delete each file:

find ~/data -atime +100 -size 1G -exec rm -i {} \;

If you are absolutely sure the files identified by find are okay to delete you can remove the -i option to rm and you will not be prompted.  Extreme caution should be used when doing so!

Archiving Data

If you still need the data but do not plan on needing the data in the immediate future, contact OSC Help to discuss moving the data to a archive file system.  Requests for data to be moved to the archive file system should be larger than 1TB. 

Compressing

If you need the data but do not access the data frequently, you should compress the data using tar or gzip.

Moving Data to a Local File System

If you have the space available locally you can transfer your data there using sftp or Globus

Globus is recommended for large transfers.

The OnDemand File application should not be used for transfers larger than 1GB.

Supercomputer: 
Service: