The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies, including 454, IonTorrent, Illumina, SOLiD, Helicos, and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.
Availability and Restrictions
The following versions of SRA Toolkit are available on OSC clusters:
Version | Owens | Pitzer |
---|---|---|
2.6.3 | X | |
2.9.0 | X | |
2.9.1 | X | |
2.9.6 | X* | X* |
2.10.7 | X | X |
You can use module spider sratoolkit to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
Access
SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.
Publisher/Vendor/Repository and License Type
National Center for Biotechnology Information, Freeware
Usage
Usage on Pitzer and Owens
Set-up
To configure your environment for SRA Toolkit, run module load sratoolkit. The default version will be loaded. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, use module load sratoolkit/2.9.6 to load SRA Toolkit 2.9.6.
Download SRA Data
The issue has been reported here.
You can download SRA data to a local directory with prefetch:
$ module load sratoolkit/2.9.6
$ prefetch SRR390728
The default download path is ~/ncbi/public in your home directory. For example, you can find the SRA file SRR390728.sra in ~/ncbi/public/sra and the resource files in ~/ncbi/public/refseq. Use srapath to check whether the SRA accession is available in the download path:
$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/public/sra/SRR390728.sra
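The check above can also be done with a plain file test before deciding whether to download. The following is a minimal sketch, not part of SRA Toolkit: the function name sra_is_downloaded is hypothetical, and it assumes the <root>/sra/<accession>.sra layout of the default download path described above.

```shell
#!/bin/bash
# Hypothetical helper (not an SRA Toolkit command): succeed if the .sra file
# for an accession already exists under a download root.
# Assumes the <root>/sra/<accession>.sra layout of the default download path.
sra_is_downloaded() {
    local accession="$1"
    local root="${2:-$HOME/ncbi/public}"   # default download root
    [ -f "$root/sra/$accession.sra" ]
}

# Usage: download only when the file is missing.
# sra_is_downloaded SRR390728 || prefetch SRR390728
```

The second argument lets you point the check at a different download root if you have changed the default.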
Now you can run other SRA tools, e.g. fastq-dump, on computing nodes. Here is a job script example:
#!/bin/bash
#SBATCH --job-name use_fastq_dump
#SBATCH --time=0:10:0
#SBATCH --nodes=1 --ntasks-per-node=1

module load sratoolkit/2.9.6
module list
fastq-dump -X 5 -Z ~/ncbi/public/sra/SRR390728.sra
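If a job needs to process several accessions, the fastq-dump call can be wrapped in a loop. This is a sketch under stated assumptions: dump_accessions, SRA_DIR, and FASTQ_DUMP are hypothetical names introduced for illustration, and the .sra files are assumed to have been downloaded already into the default path.

```shell
#!/bin/bash
# Hypothetical helper: run fastq-dump on each accession that has a local .sra file.
# SRA_DIR and FASTQ_DUMP are illustrative variables, not part of SRA Toolkit.
dump_accessions() {
    local sra_dir="${SRA_DIR:-$HOME/ncbi/public/sra}"
    local dump_cmd="${FASTQ_DUMP:-fastq-dump}"
    local accession sra_file
    for accession in "$@"; do
        sra_file="$sra_dir/$accession.sra"
        if [ -f "$sra_file" ]; then
            # Same options as the job script above:
            # -X 5 stops after spot 5, -Z writes to standard output.
            "$dump_cmd" -X 5 -Z "$sra_file"
        else
            echo "skipping $accession: $sra_file not found" >&2
        fi
    done
}

# Usage inside a job script, after module load sratoolkit/2.9.6:
# dump_accessions SRR390728
```

Accessions without a local file are reported on standard error and skipped, so one missing download does not stop the rest of the job.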
However, our Home Directory file system is not suitable for heavy computation. If the SRA file is large, you can consider the following options for better performance.
Change default download path to a faster file system, i.e. /fs/scratch
You can change the default download path for SRA data to our scratch file system with one of the following three approaches. The examples use /fs/scratch/PAS1234/johndoe/ncbi as the new path:
#
### Approach 1.
#
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi    # create ncbi directory in scratch if you don't have one
$ module load sratoolkit/2.9.x
$ vdb-config --set /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra
#
### Approach 2. CAUTION: make sure no working data in ~/ncbi
#
$ rm -rf ~/ncbi/*
$ rmdir ~/ncbi
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi    # create ncbi directory in scratch if you don't have one
$ ln -s /fs/scratch/PAS1234/johndoe/ncbi ~/ncbi
$ module load sratoolkit/2.9.x
$ prefetch SRR390728
$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/SRR390728.sra
#
### Approach 3. sratoolkit/2.10.x only
#
$ module load sratoolkit/2.10.x
$ vdb-config --prefetch-to-cwd
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi    # create ncbi directory in scratch if you don't have one
$ cd /fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/SRR390728
Your SRA data will then be stored in /fs/scratch/PAS1234/johndoe/ncbi.
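The directory creation and vdb-config step of Approach 1 can be combined into one small function. This is only a sketch: use_scratch_for_sra and SCRATCH_BASE are hypothetical names, PAS1234 and johndoe are the placeholder project and user from the examples above, and the vdb-config call is skipped when the command is not on your PATH (that is, when no sratoolkit module is loaded).

```shell
#!/bin/bash
# Hypothetical helper: create an ncbi directory under scratch and, when
# vdb-config is on PATH, point the SRA download root at it (Approach 1).
# SCRATCH_BASE is an illustrative variable, not an OSC or SRA Toolkit setting.
use_scratch_for_sra() {
    local project="$1"
    local user="$2"
    local base="${SCRATCH_BASE:-/fs/scratch}"
    local target="$base/$project/$user/ncbi"

    mkdir -p "$target" || return 1

    if command -v vdb-config >/dev/null 2>&1; then
        # Same setting as Approach 1; any tool output goes to stderr so that
        # stdout carries only the resulting path.
        vdb-config --set "/repository/user/main/public/root=$target" >&2
    fi

    echo "$target"
}

# Usage, after module load sratoolkit/2.9.x:
# use_scratch_for_sra PAS1234 johndoe
```

The function prints the resulting download path, which you can compare against the output of srapath after running prefetch.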
Known Issues
Error when downloading SRA data on computing nodes