Torch

"Torch is a deep learning framework with wide support for machine learning algorithms. It's open-source, simple to use, and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C / CUDA implementation. Torch offers popular neural network and optimization libraries that are easy to use, yet provide maximum flexibility to build complex neural network topologies. It also runs up to 70% faster on the latest NVIDIA Pascal™ GPUs, so you can now train networks in hours, instead of days."

Quote from Torch documentation.

Availability and Restrictions

Versions

The following version of Torch is available on OSC clusters:

Version Owens
7 X*
* Current default version

You can use module spider torch to view available modules for a given machine. Feel free to contact OSC Help if you need other versions for your work.
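As a minimal sketch, the query above can be wrapped so that it also behaves sensibly on a machine without the module system. The function name list_torch_modules and the guard around module are illustrative additions, not part of OSC's tooling.

```shell
#!/bin/bash
# Show every Torch module the module system knows about.
# The guard is an illustrative addition: it lets the snippet exit cleanly
# on machines where the `module` command is not defined.
list_torch_modules() {
    if type module >/dev/null 2>&1; then
        module spider torch
    else
        echo "module command not found; run this on an OSC login node"
    fi
}

list_torch_modules
```

On a login node this prints the Torch modules that can be loaded; elsewhere it only prints the fallback message.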

The current version of Torch on Owens requires cuda/8.0.44 and cuDNN v5 for GPU calculations.

Access 

Torch is available to all OSC users. If you have any questions, please contact OSC Help.

Publisher/Vendor/Repository and License Type

Soumith Chintala, Ronan Collobert, Koray Kavukcuoglu, Clement Farabet / Open source

Usage

Usage on Owens

Setup on Owens

To configure the Owens cluster for the use of Torch, use the following command:

module load torch
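To confirm that the module loaded correctly, one option is to check that the th interpreter (the command used in the batch example on this page) is on your PATH. This is a rough sketch; check_cmd is a hypothetical helper written for this illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of OSC's tooling): report whether a command is on PATH.
check_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1: found at $(command -v "$1")"
    else
        echo "$1: not found"
        return 1
    fi
}

# Load Torch only where the module system exists.
if type module >/dev/null 2>&1; then
    module load torch
fi

# `th` is the Torch interpreter used by the batch script on this page.
check_cmd th || echo "Load the torch module first, then retry."
```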

Batch Usage on Owens

Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations for Owens, and to Scheduling Policies and Limits, for more information. In particular, Torch should be run on a GPU-enabled compute node.
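Because Torch should run on a GPU-enabled node, it can help to check that a GPU is actually visible before starting a long training run. The sketch below assumes the NVIDIA driver utility nvidia-smi is installed on the compute node; gpu_visible is a hypothetical helper name.

```shell
#!/bin/bash
# Return success only if nvidia-smi exists and lists at least one GPU.
# `nvidia-smi -L` prints one line per device, each starting with "GPU".
gpu_visible() {
    command -v nvidia-smi >/dev/null 2>&1 && nvidia-smi -L 2>/dev/null | grep -q "GPU"
}

if gpu_visible; then
    nvidia-smi -L
else
    echo "No GPU visible; request one in the job, for example with #SBATCH --gpus=1"
fi
```

Placing a check like this near the top of a batch script makes a missing GPU request show up in the first lines of the job output instead of as a failure inside Torch.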

An Example of Using Torch with CIFAR10 Training Data on Owens

Below is an example batch script (job.txt) for using Torch. Please see the reference https://github.com/szagoruyko/cifar.torch for more details.

#!/bin/bash
#SBATCH --job-name=Torch
#SBATCH --nodes=1 --ntasks-per-node=28 --gpus=1
#SBATCH --time=00:30:00
#SBATCH --account <project-account>

# Load the Torch module
module load torch
# Change to the job's temporary directory
cd "$TMPDIR"
# Clone sample data and scripts
git clone https://github.com/szagoruyko/cifar.torch.git .
# Run the image preprocessing (not necessary for subsequent runs, just re-use provider.t7)
OMP_NUM_THREADS=28 th -i provider.lua <<Input
provider = Provider()
provider:normalize()
torch.save('provider.t7',provider)
exit
y
Input
# Run the torch training
th train.lua --backend cudnn
# Copy results from job temp directory
cp -a * "$SLURM_SUBMIT_DIR"

In order to run it via the batch system, submit the job.txt file with the following command:

sbatch job.txt
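After submission you will usually want the job ID so you can follow the job. The sketch below uses the standard Slurm option --parsable, which makes sbatch print only the job ID (optionally followed by ;cluster), and then queries the queue. parse_jobid is a hypothetical helper, and the guard is an addition so the snippet does nothing harmful on a machine without Slurm.

```shell
#!/bin/bash
# Hypothetical helper: strip an optional ";cluster" suffix from `sbatch --parsable` output.
parse_jobid() {
    echo "${1%%;*}"
}

if command -v sbatch >/dev/null 2>&1 && [ -f job.txt ]; then
    # Submit the job and keep only the numeric job ID.
    jobid=$(parse_jobid "$(sbatch --parsable job.txt)")
    # Show the job's current state in the queue.
    squeue -j "$jobid"
    # Unless the script sets --output, Slurm writes stdout/stderr to this file
    # in the directory the job was submitted from.
    echo "Job output: slurm-${jobid}.out"
else
    echo "sbatch or job.txt not available here"
fi
```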
