SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

IMPORTANT NOTE: NCBI blocks any connection from OSC computing nodes because they are behind firewalls. OSC users therefore cannot use SRA tools to download data "on-the-fly" at runtime on computing nodes, e.g. with 'fastq-dump -X 5 SRR390728'. Instead, download SRA data on a login node with the 'prefetch' command before any sequence analysis. See the section 'Download SRA Data' below to learn how to download and use SRA data.

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version   Owens   Pitzer
2.6.3     X
2.9.0     X*
2.9.1             X*
2.9.6             X
* Current default version

Use module spider sratoolkit to view the modules available on a given machine. Contact OSC Help if you need other versions for your work.

Access

SRA Toolkit is available to all OSC users without restriction.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Owens

Set-up

To configure your environment for use of SRA Toolkit, run the following command: module load sratoolkit. The default version will be loaded. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, use module load sratoolkit/2.6.3 to load SRA Toolkit 2.6.3.

Usage on Pitzer

Set-up

To configure your environment for use of SRA Toolkit, run the following command: module load sratoolkit. The default version will be loaded. To select a particular SRA Toolkit version, use module load sratoolkit/version. For example, use module load sratoolkit/2.9.1 to load SRA Toolkit 2.9.1.

Download SRA Data

At OSC, you can use Aspera to facilitate SRA data download. For example, to prefetch the run with SRA accession SRR390728:

$ module load sratoolkit/2.9.x
$ prefetch -v SRR390728 $USE_ASPERA

The default download path is ~/ncbi in your home directory. For example, you can find the SRA file SRR390728.sra in ~/ncbi/sra and the resource files in ~/ncbi/refseq. You can then run other SRA tools, e.g. fastq-dump, on computing nodes. Here is an example job script:

#PBS -N use_fastq_dump
#PBS -j oe
#PBS -l walltime=0:10:0
#PBS -l nodes=1:ppn=1

cd $PBS_O_WORKDIR
module load sratoolkit/2.9.0
module list
fastq-dump -X 5 -Z SRR390728
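
Because computing nodes cannot download data, a job like the one above fails if the run was never prefetched. The following is a minimal sketch of a check you could run on a login node before submitting. The function name missing_runs and the variable SRA_DIR are hypothetical and not part of SRA Toolkit; the default directory is the ~/ncbi/sra path described above.

```shell
# missing_runs is a hypothetical helper, not an SRA Toolkit command.
# It prints each accession whose .sra file is absent from the download
# directory and returns non-zero if at least one is missing.
# SRA_DIR is an assumed variable name; it defaults to ~/ncbi/sra.
missing_runs() {
    sra_dir="${SRA_DIR:-$HOME/ncbi/sra}"
    status=0
    for acc in "$@"; do
        if [ ! -f "$sra_dir/$acc.sra" ]; then
            printf '%s\n' "$acc"
            status=1
        fi
    done
    return "$status"
}
```

For instance, calling missing_runs SRR390728 on a login node prints nothing and returns 0 once the prefetch shown earlier has completed. Unless SRA_DIR is set, the sketch only looks in the default location under the home directory.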

However, our home directory file system is not suitable for heavy computation. If the SRA file is large, consider one of the following two options for better performance.

Copy SRA file to $TMPDIR

Using the script above as an example, copy the SRA data to $TMPDIR before calling fastq-dump:

cp ~/ncbi/sra/SRR390728.sra $TMPDIR
cd $TMPDIR
fastq-dump -X 5 -Z ./SRR390728
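
The copy step above can be wrapped so that the job stops early when the prefetched file is absent. This is only a sketch: stage_sra is a hypothetical helper name, and its optional second and third arguments (source and destination directories) are assumptions added for illustration. The defaults follow the paths used in this section.

```shell
# stage_sra is a hypothetical helper, not part of SRA Toolkit.
# Usage: stage_sra ACCESSION [SOURCE_DIR] [DEST_DIR]
# Copies ACCESSION.sra from SOURCE_DIR (default ~/ncbi/sra) into
# DEST_DIR (default $TMPDIR, or /tmp if TMPDIR is unset) and prints
# the path of the staged copy.
stage_sra() {
    acc="$1"
    src_dir="${2:-$HOME/ncbi/sra}"
    dest_dir="${3:-${TMPDIR:-/tmp}}"
    src="$src_dir/$acc.sra"
    if [ ! -f "$src" ]; then
        echo "missing $src; run prefetch on a login node first" >&2
        return 1
    fi
    cp "$src" "$dest_dir/" || return 1
    printf '%s\n' "$dest_dir/$acc.sra"
}

# Possible use inside a job script (left commented out here because it
# needs fastq-dump and a prefetched run):
#   stage_sra SRR390728 && cd "$TMPDIR" && fastq-dump -X 5 -Z ./SRR390728
```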


Change default download path to a faster file system, i.e. /fs/scratch

You can change the default download path for SRA data to our scratch file system. For example, /fs/scratch/PAS1234/johndoe/ncbi:

$ mkdir -p ~/.ncbi
$ echo '/repository/user/main/public/root = "/fs/scratch/PAS1234/johndoe/ncbi"' > ~/.ncbi/user-settings.mkfg
$ module load sratoolkit/2.9.x
$ prefetch -v SRR390728 $USE_ASPERA

Then your SRA data are saved in /fs/scratch/PAS1234/johndoe/ncbi and SRA tools will load data from this directory. 
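
If you switch download locations often, the two configuration commands above can be folded into a small function. This is a sketch under stated assumptions: set_sra_root is a hypothetical name, and creating the target directory in advance is an extra step not in the commands above. The configuration key and file name are the ones shown above. Like the original echo command, it overwrites any existing ~/.ncbi/user-settings.mkfg.

```shell
# set_sra_root is a hypothetical helper, not part of SRA Toolkit.
# Usage: set_sra_root DOWNLOAD_ROOT
# Writes the download-root setting shown above to
# ~/.ncbi/user-settings.mkfg, replacing any existing file.
set_sra_root() {
    root="$1"
    if [ -z "$root" ]; then
        echo "usage: set_sra_root DOWNLOAD_ROOT" >&2
        return 1
    fi
    # Assumption: create the target directory up front so a typo in the
    # path is noticed now rather than at download time.
    mkdir -p "$root" "$HOME/.ncbi" || return 1
    printf '/repository/user/main/public/root = "%s"\n' "$root" \
        > "$HOME/.ncbi/user-settings.mkfg"
}
```

For instance, set_sra_root /fs/scratch/PAS1234/johndoe/ncbi writes the same line as the echo command above.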
