SRA Toolkit

The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including 454, IonTorrent, Illumina, SOLiD, Helicos and Complete Genomics. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence. Use the SRA Toolkit tools to operate directly on SRA runs.

Compute nodes sit behind firewalls, so any connection from them to NCBI is blocked. OSC users therefore cannot use SRA tools to download data "on-the-fly" at runtime or to fetch data on compute nodes, e.g. 'fastq-dump -X 5 SRR390728' or 'prefetch SRR390728'. Instead, download SRA data on a login node with 'prefetch' before running any sequence analysis. Please read the section Download SRA Data below to learn how to download and use SRA data.
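Because a download attempted inside a batch job will fail, you may want a guard that refuses to call prefetch from a compute node. The following is a minimal sketch, not an OSC- or NCBI-provided tool: the helper names in_slurm_job and safe_prefetch are hypothetical, and it assumes that Slurm sets SLURM_JOB_ID inside every job, so an unset variable is taken to mean a login node.

```shell
#!/bin/bash
# Hypothetical helpers (not part of SRA Toolkit or the OSC environment).
# Assumption: Slurm sets SLURM_JOB_ID inside every job, so an empty
# value is treated as "running on a login node".

# Succeeds (exit status 0) when called from inside a Slurm job.
in_slurm_job() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Runs prefetch only outside Slurm jobs, where NCBI is reachable.
safe_prefetch() {
    if in_slurm_job; then
        echo "safe_prefetch: run prefetch on a login node, not inside job ${SLURM_JOB_ID}" >&2
        return 1
    fi
    prefetch "$@"
}
```

On a login node, 'safe_prefetch SRR390728' simply runs 'prefetch SRR390728'; inside a job it returns status 1 without attempting any download.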

Availability and Restrictions

The following versions of SRA Toolkit are available on OSC clusters:

Version   Owens   Pitzer
2.6.3     X
2.9.0     X
2.9.1             X
2.9.6     X*      X*
2.10.7    X       X
* Current default version

You can use 'module spider sratoolkit' to view the available modules on a given cluster. Feel free to contact OSC Help if you need other versions for your work.

Access

SRA Toolkit is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

National Center for Biotechnology Information, Freeware

Usage

Usage on Pitzer and Owens

Set-up

To configure your environment for SRA Toolkit, run 'module load sratoolkit'. This loads the default version. To select a particular SRA Toolkit version, use 'module load sratoolkit/version'. For example, 'module load sratoolkit/2.9.6' loads SRA Toolkit 2.9.6.
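After loading a module, it can help to confirm that the expected commands are on your PATH before a long job starts. The helper below is a hypothetical sketch (not an OSC or SRA Toolkit command); the tool names you pass to it are the SRA Toolkit commands used on this page.

```shell
#!/bin/bash
# Hypothetical helper: check that each named command is on PATH.
# Prints every missing command to stderr; returns 1 if any is missing.
require_tools() {
    local tool missing=0
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            echo "require_tools: '$tool' not found on PATH" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example: 'module load sratoolkit/2.9.6 && require_tools prefetch srapath fastq-dump vdb-config'.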

Download SRA Data

NCBI has shifted to cloud-style object stores. Aspera support now requires cloud credentials: https://github.com/ncbi/sra-tools/wiki/04.-Cloud-Credentials.

The issue has been reported here.

You can download SRA data to a local directory with prefetch:

$ module load sratoolkit/2.9.6
$ prefetch SRR390728
If this is your first time using the toolkit, you might encounter some errors. Please refer here for toolkit configuration.
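To fetch several runs in one session on a login node, you can keep the accessions in a text file and loop over it. This is a sketch under assumptions: the file name accessions.txt, its one-accession-per-line format with optional # comments, and the helper name read_accessions are all hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the accessions listed in a file, one per line,
# dropping "#" comments, all whitespace, and blank lines.
read_accessions() {
    sed -e 's/#.*//' -e 's/[[:space:]]//g' "$1" | grep -v '^$'
}

# Example use on a login node (commented out so the file can be sourced safely):
# read_accessions accessions.txt | while read -r acc; do
#     prefetch "$acc"
# done
```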

The default download path is ~/ncbi/public in your home directory. For example, the SRA file SRR390728.sra is stored in ~/ncbi/public/sra and the resource files in ~/ncbi/public/refseq. Use srapath to check whether an SRA accession is available in the download path:

$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/public/sra/SRR390728.sra
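Before submitting jobs for many runs, you can check on the login node which accessions have not been downloaded yet. A minimal sketch, assuming the 2.9.x layout described above (<root>/sra/<accession>.sra, with <root> defaulting to ~/ncbi/public); the helper name missing_runs is hypothetical, and it inspects the file system directly rather than calling srapath.

```shell
#!/bin/bash
# Hypothetical helper: print each accession whose .sra file is absent
# under ROOT. Assumes the sratoolkit 2.9.x layout ROOT/sra/<accession>.sra.
# Usage: missing_runs ROOT ACCESSION...
missing_runs() {
    local root="$1" acc
    shift
    for acc in "$@"; do
        if [ ! -f "${root}/sra/${acc}.sra" ]; then
            echo "$acc"
        fi
    done
}
```

For example, 'missing_runs ~/ncbi/public SRR390728' prints nothing once SRR390728.sra is present in ~/ncbi/public/sra.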

Now you can run other SRA tools, e.g. fastq-dump, on compute nodes. Here is an example job script:

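#!/bin/bash
#SBATCH --job-name=use_fastq_dump
#SBATCH --time=0:10:0
#SBATCH --nodes=1 --ntasks-per-node=1

module load sratoolkit/2.9.6
module list
# Read the prefetched local file, so no network access is needed on the compute node.
# -X 5 dumps spots up to spot ID 5; -Z writes the FASTQ output to stdout.
fastq-dump -X 5 -Z ~/ncbi/public/sra/SRR390728.sra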
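To process many prefetched runs, one option is a Slurm job array in which each task picks one accession from a list. A minimal sketch, assuming a hypothetical accessions.txt with one accession per line (line N is used by array task N) and the default 2.9.x download path; the helper name accession_for_task is also hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print line N of a list file (N starts at 1),
# e.g. the accession assigned to Slurm array task N.
accession_for_task() {
    local list="$1" n="$2"
    case "$n" in
        ''|*[!0-9]*|0) echo "accession_for_task: bad task index '$n'" >&2; return 1 ;;
    esac
    sed -n "${n}p" "$list"
}

# Inside a job script submitted with 'sbatch --array=1-<number of lines>',
# each task could then run (commented out here):
# acc=$(accession_for_task accessions.txt "$SLURM_ARRAY_TASK_ID")
# fastq-dump -X 5 -Z ~/ncbi/public/sra/"${acc}".sra
```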

However, the home directory file system is not suited to I/O-heavy computation. If the SRA file is large, consider the following option for better performance.


Change the default download path to a faster file system, i.e. /fs/scratch

You can change the default download path for SRA data to our scratch file system with one of the following three approaches (Approach 3 works with sratoolkit/2.10.x only). The examples below use /fs/scratch/PAS1234/johndoe/ncbi:

#
### Approach 1.
#
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi # create ncbi directory in scratch if you don't have one
$ module load sratoolkit/2.9.x
$ vdb-config --set /repository/user/main/public/root=/fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/sra/SRR390728.sra

#
### Approach 2. CAUTION: the next two commands delete everything in ~/ncbi; back up any data you still need first
# 
$ rm -rf ~/ncbi/*
$ rmdir ~/ncbi
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi # create ncbi directory in scratch if you don't have one
$ ln -s  /fs/scratch/PAS1234/johndoe/ncbi ~/ncbi
$ module load sratoolkit/2.9.x
$ prefetch SRR390728
$ srapath SRR390728
/users/PAS1234/johndoe/ncbi/public/sra/SRR390728.sra

#
### Approach 3. sratoolkit/2.10.x only
# 
$ module load sratoolkit/2.10.x 
$ vdb-config --prefetch-to-cwd
$ mkdir -p /fs/scratch/PAS1234/johndoe/ncbi # create ncbi directory in scratch if you don't have one 
$ cd /fs/scratch/PAS1234/johndoe/ncbi
$ prefetch SRR390728
$ srapath SRR390728
/fs/scratch/PAS1234/johndoe/ncbi/SRR390728

With any of these approaches, your SRA data will be stored in /fs/scratch/PAS1234/johndoe/ncbi.
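Approach 2 deletes whatever is already in ~/ncbi. If you would rather keep previously downloaded files, one option is to move them to scratch first and then create the symbolic link. The function below is a hypothetical sketch of that idea (not an OSC-provided script): it refuses to run if the source is already a symbolic link, and it stops without overwriting anything if an entry of the same name already exists in the destination.

```shell
#!/bin/bash
# Hypothetical helper (not an OSC or NCBI tool): move the contents of SRC
# into DEST, then replace SRC with a symbolic link to DEST.
# Usage (on a login node): relocate_ncbi ~/ncbi /fs/scratch/PAS1234/johndoe/ncbi
relocate_ncbi() {
    local src="$1" dest="$2" f
    if [ -L "$src" ]; then
        echo "relocate_ncbi: $src is already a link to $(readlink "$src")" >&2
        return 1
    fi
    mkdir -p "$dest" || return 1
    if [ -d "$src" ]; then
        # Visit regular and hidden entries; unmatched patterns are skipped.
        for f in "$src"/* "$src"/.[!.]* "$src"/..?*; do
            [ -e "$f" ] || [ -L "$f" ] || continue
            # Never overwrite an existing entry in DEST.
            if [ -e "$dest/${f##*/}" ] || [ -L "$dest/${f##*/}" ]; then
                echo "relocate_ncbi: $dest/${f##*/} already exists" >&2
                return 1
            fi
            mv "$f" "$dest"/ || return 1
        done
        rmdir "$src" || return 1
    fi
    ln -s "$dest" "$src"
}
```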
