GATK

GATK is a software package for the analysis of high-throughput sequencing data. The toolkit offers a wide variety of tools, with a primary focus on variant discovery and genotyping as well as a strong emphasis on data quality assurance.

Availability and Restrictions

Versions

The following versions of GATK are available on OSC clusters:

Version     Owens   Pitzer   Notes
3.5         X
4.0.11.0            X
4.1.2.0     X*      X*
4.4.0.0     X       X
* Current default version

You can use module spider gatk to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.

Access for Academic Users

GATK4 is available to all OSC users under the BSD 3-clause License.

GATK3 is available to academic OSC users. Please review the license agreement carefully before use. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

Broad Institute, Inc., BSD 3-clause License (GATK4 only)

Usage

Usage on Owens

Set-up

To configure your environment for use of GATK, run the following command: module load gatk. The default version will be loaded. To select a particular GATK version, use module load gatk/version. For example, use module load gatk/4.1.2.0 to load GATK 4.1.2.0.
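
The set-up step above can be sketched as a small shell snippet. The helper name gatk_module_name is hypothetical and only builds the module string. The guard around module load is an assumption, added so the snippet does nothing on machines without the module system.

```shell
# Hypothetical helper (not part of the module system): build the module name
# for a requested GATK version, or the bare name for the default version.
gatk_module_name() {
    version="${1:-}"
    if [ -n "$version" ]; then
        echo "gatk/$version"
    else
        echo "gatk"
    fi
}

# Load the module only where the `module` command exists, so the snippet is a
# no-op outside the cluster environment.
if command -v module >/dev/null 2>&1; then
    module load "$(gatk_module_name 4.1.2.0)" \
        || echo "could not load $(gatk_module_name 4.1.2.0)" >&2
fi
```

Calling gatk_module_name with no argument yields the default module name, which matches plain module load gatk.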

Usage

This software is a Java executable .jar file, so it cannot be added to the PATH environment variable. Running module load gatk sets a new environment variable, GATK. Users can then run the software with the following command: gatk {other options}. For example, run gatk -h to see all options.
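
As a quick sanity check before running a tool, the following sketch reports whether the GATK environment variable mentioned above is set in the current shell. The function name gatk_env_status is hypothetical, and the value stored in GATK is not documented here, so the snippet only prints whatever it finds.

```shell
# Hypothetical helper: report whether the GATK environment variable
# (set by `module load gatk`, per the text above) is present.
gatk_env_status() {
    if [ -n "${GATK:-}" ]; then
        echo "loaded: $GATK"
    else
        echo "not loaded"
    fi
}

gatk_env_status

# With the module loaded, list all tools and options:
#   gatk -h
```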

Usage on Pitzer

Set-up

To configure your environment for use of GATK, run the following command: module load gatk. The default version will be loaded.

Usage

This software is a Java executable .jar file, so it cannot be added to the PATH environment variable. Running module load gatk sets a new environment variable, GATK. Users can then run the software with the following command: gatk {other options}. For example, run gatk -h to see all options.

Known Issues

CBLAS undefined symbol error

Update: 05/22/2019 
Version: all

If you use GATK tools that need CBLAS (e.g. CreateReadCountPanelOfNormals), you might encounter an error such as:

INFO: successfully loaded /tmp/jniloader1239007313705592313netlib-native_system-linux-x86_64.so
java: symbol lookup error: /tmp/jniloader1239007313705592313netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dspr
java: symbol lookup error: /tmp/jniloader1239007313705592313netlib-native_system-linux-x86_64.so: undefined symbol: cblas_dspr

The error arises because the system-default LAPACK does not support CBLAS. The remedy is to run GATK together with lapack/3.8.0:

$ module load lapack/3.8.0
$ module load gatk/4.1.2.0
$ LD_LIBRARY_PATH=$OSC_LAPACK_DIR/lib64 gatk AnyTool toolArgs
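
The last command above replaces LD_LIBRARY_PATH for that single invocation. If other entries on the existing path are still needed, one possible variant is to put the LAPACK directory first and keep the rest. This is an assumption, not something this page documents or that has been verified on OSC systems. The helper name lapack_ld_path is hypothetical.

```shell
# Hypothetical helper: build an LD_LIBRARY_PATH value with the LAPACK lib64
# directory first, keeping any existing entries after it.
lapack_ld_path() {
    lapack_dir="$1"
    existing="${2:-}"
    if [ -n "$existing" ]; then
        echo "$lapack_dir/lib64:$existing"
    else
        echo "$lapack_dir/lib64"
    fi
}

# Usage on a cluster, after loading lapack/3.8.0 and gatk/4.1.2.0:
#   LD_LIBRARY_PATH="$(lapack_ld_path "$OSC_LAPACK_DIR" "$LD_LIBRARY_PATH")" gatk AnyTool toolArgs
```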

Alternatively, we recommend using the GATK container. First, download the GATK container to your home or project directory:

$ qsub -I -l nodes=1:ppn=1
$ cd $TMPDIR
$ export SINGULARITY_CACHEDIR=$TMPDIR
$ export SINGULARITY_TMPDIR=$TMPDIR
$ singularity pull docker://broadinstitute/gatk:4.1.2.0
$ cp gatk_4.1.2.0.sif ~/

Then run any GATK tool via

$ singularity exec ~/gatk_4.1.2.0.sif gatk AnyTool toolArgs
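
For repeated use, the container call can be wrapped in a small function that fails early when the image has not been downloaded yet. This is a minimal sketch. The function names gatk_sif_path and run_gatk_container are hypothetical, and the image file name is assumed to follow the gatk_4.1.2.0.sif pattern shown above.

```shell
# Hypothetical helper: path of the image file for a given GATK version,
# following the gatk_<version>.sif pattern; defaults to the home directory.
gatk_sif_path() {
    version="$1"
    dir="${2:-$HOME}"
    echo "$dir/gatk_${version}.sif"
}

# Hypothetical wrapper: run a GATK tool inside the container, or stop with a
# clear message if the image file is missing.
run_gatk_container() {
    version="$1"
    shift
    image="$(gatk_sif_path "$version")"
    if [ ! -f "$image" ]; then
        echo "missing image: $image" >&2
        return 1
    fi
    singularity exec "$image" gatk "$@"
}

# Usage, once the image is in the home directory:
#   run_gatk_container 4.1.2.0 AnyTool toolArgs
```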

You can read more about containers in general here. If you have any further questions, please contact OSC Help.
