This HOWTO demonstrates how to reduce your disk space usage. The following procedures can be applied to all of OSC's file systems.
We recommend users regularly check their data usage and clean out old data that is no longer needed.
Users who need assistance lowering their data usage can contact OSC Help.
Users should ensure that their jobs are written in such a way that temporary data is not saved to permanent file systems, such as the project space file system or their home directory.
If your job copies data from the scratch file system or its node's local disk ($TMPDIR) back to a permanent file system, such as the project space file system or a home directory (/users/PXX####/xxx####/), you should ensure you are only copying the files you will need later.
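As a minimal sketch, assuming a Bash job script, the end of a job can copy back only the files that match a pattern instead of everything in $TMPDIR. The helper name copy_needed_results, the '*.out' pattern, and the destination path are hypothetical; adjust them to your own job.

```shell
# Copy only files matching PATTERN from SRC (for example $TMPDIR) to DEST.
# copy_needed_results is a hypothetical helper name, not an OSC tool.
copy_needed_results() {
    local src="$1" dest="$2" pattern="$3"
    mkdir -p "$dest"
    # -maxdepth 1: look only at the top level of SRC
    # -type f: skip directories; -name: keep only files matching PATTERN
    find "$src" -maxdepth 1 -type f -name "$pattern" -exec cp -p {} "$dest"/ \;
}

# Example call at the end of a job script (paths are placeholders):
#   copy_needed_results "$TMPDIR" "$HOME/results" '*.out'
```

Anything that does not match the pattern stays in $TMPDIR rather than being copied to permanent storage.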
The following commands will help you identify old data using the find command.
find commands may produce an excessive amount of output. To terminate the command while it is running, press CTRL + C.
The following command recursively searches your home directory and gives a detailed listing of all files not accessed in the past 100 days.
The last access time (atime) is updated when a file is read by any program, including grep, cat, head, sort, etc.
find ~ -type f -atime +100 -exec ls -l {} \;
Replace ~ with the path you wish to search. A period (.) can be used to search the current directory.
Replace 100 with your desired number of days.
To total the size of the files found by find, you can add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -type f -atime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
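If your system has GNU find (standard on Linux), a sketch like the following computes the same total without running ls once per file, by having find print each file's size in bytes with -printf '%s'. The function name old_atime_bytes is hypothetical.

```shell
# Print the total size in bytes of regular files under DIR whose last
# access time is more than DAYS days ago. Assumes GNU find (-printf).
# old_atime_bytes is a hypothetical helper name.
old_atime_bytes() {
    local dir="$1" days="$2"
    # %s = file size in bytes, one per line; awk sums them.
    # printf "%.0f" avoids scientific notation for very large totals.
    find "$dir" -type f -atime +"$days" -printf '%s\n' \
        | awk '{s += $1} END {printf "%.0f\n", s}'
}

# Example: old_atime_bytes ~ 100
```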
The following command recursively searches your home directory and gives a detailed listing of all files not modified in the past 100 days.
The last modified time (mtime) is updated when a file's contents are changed or saved. Viewing a file does not update the last modified time.
find ~ -type f -mtime +100 -exec ls -l {} \;
Replace ~ with the path you wish to search. A period (.) can be used to search the current directory.
Replace 100 with your desired number of days.
To total the size of the files found by find, you can add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -type f -mtime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
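When deciding what to clean up first, it can help to see which stale files are largest. This is a minimal sketch assuming GNU find (-printf) and sort; the function name largest_stale_files and its default of 10 results are hypothetical choices.

```shell
# List the N largest regular files under DIR that have not been modified
# in more than DAYS days, biggest first. Output: size in bytes, a tab, path.
# largest_stale_files is a hypothetical helper name.
largest_stale_files() {
    local dir="$1" days="$2" n="${3:-10}"
    # Tab-separated so paths containing spaces stay in one field
    find "$dir" -type f -mtime +"$days" -printf '%s\t%p\n' \
        | sort -rn \
        | head -n "$n"
}

# Example: largest_stale_files ~ 100 20
```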
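Adding the -size <size> option and argument to the find command allows you to view only files larger than a certain size when the size is prefixed with +. Without the +, find rounds each file's size up to whole units and matches only that exact count, so -size 1G would match every non-empty file up to 1GB. This option and argument can be added to any other find command.
For example, to view all files in your home directory that are larger than 1GB: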
find ~ -size +1G -exec ls -l {} \;
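The same test can be wrapped in a small function so the threshold is easy to change. This is a minimal sketch assuming GNU find; files_larger_than is a hypothetical name, and the size uses find's own suffixes (k, M, G, which are powers of 1024).

```shell
# Print the paths of regular files under DIR larger than SIZE.
# SIZE uses find units, e.g. 500M or 1G. Hypothetical helper name.
files_larger_than() {
    local dir="$1" size="$2"
    # The + prefix means "strictly more than SIZE units"
    find "$dir" -type f -size +"$size" -printf '%p\n'
}

# Example: files_larger_than ~ 1G
```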
Use the following command to list the directories directly under <target-dir> along with the number of inodes (files and directories) each one contains, counting all of its subdirectories:
du --inodes -d 1 <target-dir>
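To spot the directories holding the most files quickly, the du output can be sorted numerically so the largest counts appear last. This sketch assumes GNU du with --inodes support (coreutils 8.22 or newer); the function name inode_usage_sorted is hypothetical.

```shell
# List DIR and its immediate subdirectories by inode count, smallest first.
# -d 1 limits output to one level below DIR; sort -n orders by the count.
# inode_usage_sorted is a hypothetical helper name.
inode_usage_sorted() {
    du --inodes -d 1 "$1" | sort -n
}

# Example: inode_usage_sorted ~
```

The last line is the total for the target directory itself.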
If you no longer need the old data, you can delete it using the rm command.
If you need to delete a whole directory tree (a directory and all of its contents, including other directories), you can use the rm -R command.
For example, the following command will delete the data directory in your home directory:
rm -R ~/data
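Because rm -R cannot be undone, it can be worth checking how much a directory holds before removing it. A minimal sketch; preview_tree is a hypothetical helper name, and it only reports, it deletes nothing.

```shell
# Report the total size and the number of regular files under DIR.
# preview_tree is a hypothetical helper; it does not delete anything.
preview_tree() {
    local dir="$1"
    # du -s: one summary line; -h: human-readable; cut -f1 keeps the size
    echo "Size:  $(du -sh "$dir" | cut -f1)"
    # Count regular files only; awk prints 0 when there are none
    echo "Files: $(find "$dir" -type f | awk 'END {print NR}')"
}

# Example: preview_tree ~/data   (then run rm -R ~/data if it looks right)
```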
If you would like to be prompted for confirmation before each file is deleted, use the -i option.
rm -Ri ~/data
Enter y or n when prompted. Simply pressing Enter will default to n.
The rm command can be combined with any find command to delete the files found. The syntax for doing so is:
find <location> <other find options> -exec rm -i {} \;
Where <other find options> can include one or more of the options -atime <time>, -mtime <time>, and -size <size>.
The following command would find all files in the ~/data directory larger than 1GB that have not been accessed in the past 100 days, and then prompt for confirmation before deleting each file:
find ~/data -type f -atime +100 -size +1G -exec rm -i {} \;
If you are absolutely sure the files identified by find are okay to delete, you can remove the -i option to rm and you will not be prompted. Extreme caution should be used when doing so!
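One way to reduce the risk of deleting without prompts is to run the exact same search twice: first only printing the matches, then deleting them. This is a minimal sketch assuming Bash and GNU find, whose -delete action removes matched files without calling rm; find_old_large is a hypothetical helper name.

```shell
# Review-then-delete sketch. find_old_large is a hypothetical helper.
# $1 = directory, $2 = days since last access, $3 = size threshold
# (find units such as k, M, G). Any further arguments are appended to
# the find expression, e.g. -print or -delete.
find_old_large() {
    find "$1" -type f -atime +"$2" -size +"$3" "${@:4}"
}

# Step 1: print the matching files and check the list carefully.
#   find_old_large ~/data 100 1G -print
# Step 2: rerun the identical search, deleting without prompting.
#   find_old_large ~/data 100 1G -delete
```

Keeping the search in one function ensures the deletion uses exactly the criteria you reviewed.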
If you still need the data but do not plan on needing it in the immediate future, contact OSC Help to discuss moving the data to an archive file system. Requests to move data to the archive file system should be for more than 1TB of data.
If you need the data but do not access it frequently, you should compress it using gzip, or bundle it into a compressed archive with tar.
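As a minimal sketch, a rarely used directory can be packed into a single gzip-compressed tar file, and the archive read back once to confirm it is intact before you remove the original. The helper name archive_dir is hypothetical, and it deliberately leaves the original directory in place.

```shell
# Bundle DIR into a single gzip-compressed tar archive and confirm the
# archive can be read back. archive_dir is a hypothetical helper; it does
# not delete DIR, so remove the original yourself once you are satisfied.
archive_dir() {
    local dir="$1" archive="$2"
    # -c create, -z compress with gzip, -f write to the named file;
    # -C changes to DIR's parent so the archive stores a relative path.
    tar -czf "$archive" -C "$(dirname "$dir")" "$(basename "$dir")" || return 1
    # -t lists the contents; a corrupt archive makes this fail.
    tar -tzf "$archive" > /dev/null || return 1
    echo "Created $archive"
}

# Example:
#   archive_dir ~/data ~/data.tar.gz && rm -R ~/data
```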
If you have space available locally, you can transfer your data there using sftp or Globus.
Globus is recommended for large transfers.
The OnDemand File application should not be used for transfers larger than 1GB.