This HOWTO demonstrates how to reduce your disk space usage. The following procedures can be applied to all of OSC's file systems.
We recommend users regularly check their data usage and clean out old data that is no longer needed.
Users who need assistance lowering their data usage can contact OSC Help.
Preventing Excessive Data Usage Before It Starts
Users should ensure that their jobs are written in such a way that temporary data is not saved to permanent file systems, such as the project space file system or their home directory.
If your job copies data from the scratch file system or its node's local disk ($TMPDIR) back to a permanent file system, such as the project space file system or a home directory (/users/PXX####/xxx####/), you should ensure you are only copying the files you will need later.
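As a minimal sketch, the end of a batch job can copy back only its results and leave everything else on scratch. It assumes, hypothetically, that the results are the *.csv files and uses DEST as a stand-in for a permanent directory such as your home or project directory; the echo and head lines only simulate a program's output.

```shell
#!/bin/bash
# Sketch: copy only the needed results from node-local scratch to permanent storage.
# Hypothetical assumptions: results are *.csv files; DEST stands in for a
# permanent directory (home or project space).
set -euo pipefail

# Work in a job-specific directory under $TMPDIR (falls back to /tmp outside a job).
WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/job.XXXXXX")
DEST=${DEST:-$(mktemp -d)}

cd "$WORKDIR"

# Stand-in for the real program: one result file plus temporary files.
echo "step,energy" > final_results.csv
head -c 10240 /dev/zero > checkpoint.tmp
echo "iteration log" > solver.log

# Copy back only the results; the checkpoint and log files stay on scratch.
cp -- *.csv "$DEST"/
ls "$DEST"
```

Selecting outputs by name pattern keeps the copy step short; if your program writes results with several extensions, list each pattern explicitly rather than copying the whole working directory.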
Identifying Old and Large Data
The following commands will help you identify old data using the find command.
find commands may produce an excessive amount of output. To terminate a command while it is running, press CTRL + C.
Find all files in a directory that have not been accessed in the past 100 days:
This command will recursively search the user's home directory and give a detailed listing of all files not accessed in the past 100 days.
The last access time (atime) is updated when a file is read by any command, including grep, cat, head, sort, etc.
find ~ -type f -atime +100 -exec ls -l {} \;
- To search a different directory, replace ~ with the path you wish to search. A period (.) can be used to search the current directory.
- To view files not accessed over a different time span, replace 100 with your desired number of days.
- To view the total size in bytes of all the files found by find, add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -type f -atime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
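To see the access-time search behave predictably, the sketch below backdates one file's atime in a throwaway directory and then runs the same find and awk pipeline. It assumes GNU find and GNU touch (for the '200 days ago' date syntax); the file names are hypothetical.

```shell
#!/bin/bash
# Demo of the -atime search in a throwaway directory (file names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
head -c 2048 /dev/zero > "$demo/old_input.dat"
head -c 512 /dev/zero > "$demo/recent.dat"

# Backdate only the access time of one file by 200 days (GNU touch syntax).
touch -a -d '200 days ago' "$demo/old_input.dat"

# Detailed listing of files not accessed in the past 100 days.
find "$demo" -type f -atime +100 -exec ls -l {} \;

# Same search, summing field 5 (size in bytes) of the ls -l output.
# "s+0" prints 0 instead of an empty string when nothing matches.
total=$(find "$demo" -type f -atime +100 -exec ls -l {} \; | awk '{s+=$5} END {print s+0}')
echo "Total SIZE (bytes): $total"
```

Only old_input.dat is listed, and the reported total is its 2048 bytes; recent.dat was accessed just now and is skipped.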
Find all files in a directory that have not been modified in the past 100 days:
This command will recursively search the user's home directory and give a detailed listing of all files not modified in the past 100 days.
The last modified time (mtime) is updated when a file's contents are changed and saved. Viewing a file will not update the last modified time.
find ~ -type f -mtime +100 -exec ls -l {} \;
- To search a different directory, replace ~ with the path you wish to search. A period (.) can be used to search the current directory.
- To view files not modified over a different time span, replace 100 with your desired number of days.
- To view the total size in bytes of all the files found by find, add | awk '{s+=$5} END {print "Total SIZE (bytes): " s}' to the end of the command:
find ~ -type f -mtime +100 -exec ls -l {} \; | awk '{s+=$5} END {print "Total SIZE (bytes): " s}'
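The sketch below checks the modification-time search in a throwaway directory and also shows that reading a file leaves its mtime alone. As an alternative to running ls once per file, it uses GNU find's -printf, where %s is the size in bytes and %p the path. It assumes GNU find and GNU touch; the file names are hypothetical.

```shell
#!/bin/bash
# Demo of the -mtime search in a throwaway directory (file names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
head -c 4096 /dev/zero > "$demo/old_results.dat"
head -c 100 /dev/zero > "$demo/new_notes.txt"

# Backdate the modification time of one file by 200 days (GNU touch syntax).
touch -m -d '200 days ago' "$demo/old_results.dat"

# Reading the file does not change its modification time.
cat "$demo/old_results.dat" > /dev/null

# Print size and path of each match, then sum the sizes.
find "$demo" -type f -mtime +100 -printf '%s %p\n'
total=$(find "$demo" -type f -mtime +100 -printf '%s\n' | awk '{s+=$1} END {print s+0}')
echo "Total SIZE (bytes): $total"
```

The -printf form avoids starting a separate ls process for every file, which matters when find matches many thousands of files.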
List files larger than a specified size:
Adding the -size option to the find command lets you filter files by size; with a leading + (for example, -size +1G), only files larger than that size are listed. This option can be added to any other find command.
For example, to view all files in a user's home directory that are larger than 1G (1 GiB):
find ~ -size +1G -exec ls -l {} \;
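GNU find rounds each file's size up to whole units before comparing, which is why the + sign matters. The sketch below scales the example down to megabytes (M) so the demo files stay small; it assumes GNU find, and the file names are hypothetical.

```shell
#!/bin/bash
# Demo of -size rounding in a throwaway directory (file names are hypothetical).
# Sizes are scaled down to MiB so the demo stays small.
set -euo pipefail

demo=$(mktemp -d)
head -c $((2 * 1024 * 1024)) /dev/zero > "$demo/big.dat"    # 2 MiB
head -c $((1024 * 1024)) /dev/zero > "$demo/exact.dat"      # exactly 1 MiB
head -c 500 /dev/zero > "$demo/small.dat"                   # 500 bytes

# "+1M" means strictly larger than 1 MiB after rounding up to whole MiB.
larger=$(find "$demo" -type f -size +1M -printf '%f\n' | sort)
echo "Larger than 1M: $larger"

# Without "+", sizes are rounded up, so "1M" matches anything from 1 byte to 1 MiB.
exact=$(find "$demo" -type f -size 1M -printf '%f\n' | sort | paste -sd ' ' -)
echo "Matched by -size 1M: $exact"
```

Only big.dat is larger than 1M, while -size 1M without the plus sign matches both exact.dat and the 500-byte small.dat, so always include the + when you mean "larger than".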
List number of files in directories
Use the following command to list the directories directly under <target-dir> along with the number of inodes (files and subdirectories, counted recursively) each one contains.
du --inodes -d 1 <target-dir>
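The sketch below builds a small hypothetical tree and runs the same du command, sorted so the directories holding the most files appear last. It assumes GNU coreutils du, which provides --inodes; note that each count includes the directory itself.

```shell
#!/bin/bash
# Demo of counting files per directory in a throwaway tree (names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
mkdir "$demo/many_files" "$demo/few_files"
touch "$demo"/many_files/part_{1..50}.txt
touch "$demo/few_files/summary.txt"

# One line per directory (depth 1): inode count, then path.
# Each count includes the directory itself, so many_files reports 51.
# Sorting numerically puts the directories with the most files last.
du --inodes -d 1 "$demo" | sort -n
```

Directories with very large counts are good candidates for bundling with tar, described in the Compressing section.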
Deleting Identified Data
If you no longer need the old data, you can delete it using the rm command.
If you need to delete a whole directory tree (a directory and all of its contents, including subdirectories), you can use the rm -R command.
For example, the following command will delete the data directory in a user's home directory:
rm -R ~/data
If you would like to be prompted for confirmation before deleting each file, use the -i option.
rm -Ri ~/data
Enter y or n when prompted. Pressing Enter without typing anything is treated as n.
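Before running rm -R on a large tree, it can help to check how much space it uses and what it contains. A minimal sketch, using a throwaway directory named data as a hypothetical stand-in for ~/data:

```shell
#!/bin/bash
# Sketch: inspect a directory tree before deleting it.
# "target" is a hypothetical stand-in for a directory such as ~/data.
set -eu

target=$(mktemp -d)/data
mkdir -p "$target/run1"
head -c 1024 /dev/zero > "$target/run1/output.dat"
echo "notes" > "$target/README"

# How much space does it use, and how many files and directories are in it?
du -sh "$target"
find "$target" | wc -l

# Look at a sample of what is inside before committing.
find "$target" | head -n 20

# Delete the whole tree (add -i to be prompted for each entry).
rm -R -- "$target"
```

The "--" tells rm that everything after it is a path, which protects against directory names that happen to start with a dash.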
Deleting files found by find
The rm command can be combined with any find command to delete the files found. The syntax for doing so is:
find <location> <other find options> -exec rm -i {} \;
Where <other find options> can include one or more of the options -atime <time>, -mtime <time>, and -size <size>.
The following command would find all files in the ~/data directory larger than 1G that have not been accessed in the past 100 days, and then prompt for confirmation to delete each file:
find ~/data -type f -atime +100 -size +1G -exec rm -i {} \;
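A cautious workflow is to run the find expression with -print first as a dry run and only then attach the deletion. The sketch below assumes GNU find, coreutils, and touch, uses hypothetical file names in a temporary directory, and scales 1G down to 1M so the demo files stay small.

```shell
#!/bin/bash
# Sketch: dry run first, then delete, in a throwaway directory (names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
head -c $((2 * 1024 * 1024)) /dev/zero > "$demo/old_big.dat"
head -c $((2 * 1024 * 1024)) /dev/zero > "$demo/new_big.dat"
head -c 100 /dev/zero > "$demo/old_small.dat"
touch -a -d '200 days ago' "$demo/old_big.dat" "$demo/old_small.dat"

# Step 1 (dry run): print what would be deleted; nothing is removed.
find "$demo" -type f -atime +100 -size +1M -print

# Step 2: same expression, now deleting without prompting (like rm without -i).
# "{} +" passes many paths to a single rm invocation.
find "$demo" -type f -atime +100 -size +1M -exec rm -- {} +
ls "$demo"
```

Only old_big.dat matches both conditions; new_big.dat is recent and old_small.dat is too small, so both remain. Keeping the expression identical between the two steps is what makes the dry run meaningful.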
If you are absolutely sure the files identified by find are okay to delete, you can remove the -i option to rm and you will not be prompted. Extreme caution should be used when doing so!
Archiving Data
If you still need the data but do not plan on needing it in the immediate future, contact OSC Help to discuss moving the data to an archive file system. Requests to move data to the archive file system should involve more than 1TB of data.
Compressing
If you need the data but do not access it frequently, you should bundle it with tar and/or compress it with gzip.
Reducing the number of files using tar
If you want to keep a number of files, you can combine them into a single archive file. You might do this if the data that you do not access frequently is spread across many files; these files can be of different types. The following command adds 2 files (named file1 and file2) to a single tar archive file (named files.tar). tar leaves the original files in place, so delete them once you have confirmed the archive is complete. It is good practice to keep the .tar extension to identify the file as an archive, though it is not required.
tar -cvf files.tar file1 file2
To extract the data, you can use the following command.
tar -xvf files.tar
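The sketch below runs the create, list, and extract steps end to end in a throwaway directory, extracting into a separate directory with -C so nothing is overwritten. It assumes GNU tar; the file names are hypothetical.

```shell
#!/bin/bash
# Demo of tar in a throwaway directory (file names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
echo "first" > file1
echo "second" > file2

tar -cvf files.tar file1 file2    # create the archive
tar -tvf files.tar                # list its contents without extracting

# The originals are kept by tar; remove them only after checking the archive.
rm file1 file2

mkdir restored
tar -xvf files.tar -C restored    # extract into a separate directory
cat restored/file1 restored/file2
```

Listing with -t before deleting the originals is a cheap way to confirm that every file made it into the archive.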
Reducing disk size using gzip
If you need to reduce the total space used by a file, you can compress it using gzip (GNU zip). You might do this if the data that you do not access frequently is in a large file. The following command compresses a file named file.txt. gzip replaces the original file with a compressed file of the same name (extensions included) with .gz appended, so file.txt becomes file.txt.gz.
gzip file.txt
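The sketch below compresses a generated text file and uses gzip -l to report the compressed and uncompressed sizes and the ratio. The file content is hypothetical and highly repetitive, so it compresses far better than typical research data may.

```shell
#!/bin/bash
# Demo of gzip in a throwaway directory (file name is hypothetical).
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
# Repetitive text compresses well; real data may compress much less.
for i in $(seq 1 1000); do echo "line $i of sample output"; done > file.txt

gzip file.txt        # replaces file.txt with file.txt.gz
ls                   # only file.txt.gz remains
gzip -l file.txt.gz  # compressed size, uncompressed size, and ratio
```

Checking the ratio on a sample first tells you whether compressing a large data set is worth the time.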
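You can also compress the contents of multiple files into a single gzip file using the following command, which also lets you choose the name of the compressed file. Note that this concatenates the files: decompressing files.txt.gz produces a single file (files.txt) containing both files' contents back to back, so use tar if you need to recover the individual files.
cat file1.txt file2.txt | gzip > files.txt.gz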
To extract the data, you can use the following command.
gunzip file.txt.gz
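The following sketch shows what the cat-and-gzip approach actually stores: one combined stream with no record of the original file names or boundaries. The file names and contents are hypothetical.

```shell
#!/bin/bash
# Demo: "cat | gzip" stores one combined stream, not separate files.
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
echo "contents of file1" > file1.txt
echo "contents of file2" > file2.txt

cat file1.txt file2.txt | gzip > files.txt.gz

# Decompressing gives a single files.txt holding both files' text back to back;
# the original names and boundaries are not recorded (use tar to keep them).
gunzip files.txt.gz
cat files.txt
```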
Combining tar and gzip
If you have multiple large files or a single large directory, it may be helpful to compress an entire directory. To do this, you tar the directory into a single file and then compress that file with gzip. The z option tells tar to do both in a single command. Because the f option takes the archive name as its argument, it must be the last of the combined options:
tar -czvf folder.tar.gz folder
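The sketch below creates a compressed archive of a small hypothetical directory, lists it, and extracts it into a separate directory for comparison. It assumes GNU tar with gzip available.

```shell
#!/bin/bash
# Demo of a compressed tar archive in a throwaway directory (names are hypothetical).
set -euo pipefail

demo=$(mktemp -d)
cd "$demo"
mkdir -p folder/sub
for i in $(seq 1 500); do echo "record $i"; done > folder/sub/data.txt

# -z filters the archive through gzip; -f comes last because the archive
# name follows it.
tar -czvf folder.tar.gz folder
tar -tzf folder.tar.gz                # list contents

mkdir restored
tar -xzvf folder.tar.gz -C restored   # extract into restored/folder
```

Extracting to a scratch location and comparing against the original before deleting the source directory is a low-cost safety check.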
Moving Data to a Local File System
If you have space available locally, you can transfer your data there using sftp or Globus.
Globus is recommended for large transfers.
The OnDemand File application should not be used for transfers larger than 1GB.