SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies, including 454, IonTorrent, Illumina, SOLiD, Helicos, and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

*IMPORTANT NOTE: NCBI blocks any connection from compute nodes because they are behind firewalls. OSC users therefore cannot use SRA tools to download data "on-the-fly" at runtime or fetch data on compute nodes, e.g. with 'fastq-dump -X 5 SRR390728' or 'prefetch SRR390728'. OSC users must download SRA data on login nodes with the 'prefetch' command before any sequence analysis. Please read the section Download SRA Data below to learn how to download and use SRA data.

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version   Owens   Pitzer
2.6.3     X
2.9.0     X
2.9.1             X
2.9.6     X*      X*
2.10.7    X       X
* Current default version

 

You can use module spider sratoolkit to view the modules available on a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access

SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Owens

Set-up

To configure your environment for use of SRA Toolkit, run the following command: module load sratoolkit. The default version will be loaded. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, use module load sratoolkit/2.9.6 to load SRA Toolkit 2.9.6.

Usage on Pitzer

Set-up

To configure your environment for use of SRA Toolkit, run the following command: module load sratoolkit. The default version will be loaded. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, use module load sratoolkit/2.9.6 to load SRA Toolkit 2.9.6.

Download SRA Data

*IMPORTANT NOTE: NCBI has shifted to using cloud-style object stores. Aspera support is not available until the situation changes.

You can download SRA data to a local directory with prefetch:

$ module load sratoolkit/2.9.6
$ prefetch SRR390728

The default download path is ~/ncbi/public in your home directory. For example, you can find the SRA file SRR390728.sra in ~/ncbi/public/sra and the resource files in ~/ncbi/public/refseq. Use srapath to check whether the SRA accession is available in the download path:

$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/public/sra/SRR390728.sra
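
When several runs are needed, it can help to find out on a login node which accessions have not been downloaded yet. The following is a minimal sketch, assuming the sratoolkit 2.9.x layout described above (<root>/sra/<accession>.sra). The function name list_missing_accessions and the accession list file are hypothetical and not part of SRA Toolkit.

```shell
# Hypothetical helper (not part of SRA Toolkit): print every accession listed
# in FILE (one per line) that has no local .sra file yet.
# Assumption: files are stored as <root>/sra/<accession>.sra, where <root>
# defaults to ~/ncbi/public as described above.
list_missing_accessions() {
    list_file="$1"
    root="${2:-$HOME/ncbi/public}"
    while IFS= read -r accession || [ -n "$accession" ]; do
        # Skip blank lines in the list file
        if [ -z "$accession" ]; then
            continue
        fi
        # Report the accession only when its .sra file is absent
        if [ ! -f "$root/sra/$accession.sra" ]; then
            printf '%s\n' "$accession"
        fi
    done < "$list_file"
}
```

Each accession printed by the function could then be passed to prefetch on a login node, one at a time.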

Now you can run other SRA tools, e.g. fastq-dump, on compute nodes. Here is an example job script:

#PBS -N use_fastq_dump
#PBS -j oe
#PBS -l walltime=0:10:0
#PBS -l nodes=1:ppn=1

cd $PBS_O_WORKDIR
module load sratoolkit/2.9.6
module list
fastq-dump -X 5 -Z ~/ncbi/public/sra/SRR390728.sra
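
Because compute nodes cannot download from NCBI, a job that refers to a file that was never prefetched can only fail. A small guard such as the sketch below could stop the job early with a clear message. The function name require_sra_file is hypothetical and not part of SRA Toolkit.

```shell
# Hypothetical guard for a job script: return non-zero with a message when the
# given .sra file is missing, since compute nodes cannot fetch it from NCBI.
require_sra_file() {
    if [ ! -f "$1" ]; then
        echo "Missing $1: run prefetch on a login node first" >&2
        return 1
    fi
}
```

In the job script above, a line like require_sra_file ~/ncbi/public/sra/SRR390728.sra || exit 1 placed before the fastq-dump call would end the job immediately if the download step was skipped.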

However, the home directory file system is not suitable for heavy computation. If the SRA file is large, consider the following option for better performance.


Change default download path to a faster file system, i.e. /fs/scratch

You can change the default download path for SRA data to our scratch file system, for example /fs/scratch/PAS1234/johndoe/ncbi, with one of the following three approaches:

#
### Approach 1.
#
$ module load sratoolkit/2.9.x
$ vdb-config --set /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra

#
### Approach 2. CAUTION: make sure there is no working data in ~/ncbi
#
$ rm -rf ~/ncbi/*
$ rmdir ~/ncbi
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi
$ ln -s /fs/scratch/PAS1234/johndoe/ncbi ~/ncbi
$ module load sratoolkit/2.9.x
$ prefetch SRR390728
$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/sra/SRR390728.sra

#
### Approach 3. sratoolkit/2.10.x only
# 
$ module load sratoolkit/2.10.x 
$ vdb-config --prefetch-to-cwd
$ cd /fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/SRR390728

Your SRA data will be stored in /fs/scratch/PAS1234/johndoe/ncbi.
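
To confirm that a directory such as ~/ncbi really resolves to the scratch file system (for instance after creating the symbolic link in Approach 2), a check along these lines could be used. This is a sketch only, and path_is_under is a hypothetical helper name, not part of SRA Toolkit.

```shell
# Hypothetical check: succeed if DIR, after following symbolic links,
# is located under PREFIX.
path_is_under() {
    # Resolve both paths to their physical locations; fail if either is missing
    real_dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    real_prefix=$(cd "$2" 2>/dev/null && pwd -P) || return 1
    case "$real_dir/" in
        "$real_prefix"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, path_is_under ~/ncbi /fs/scratch succeeds only if ~/ncbi points to a location inside /fs/scratch.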
