PyTorch is an open-source machine learning framework, based on the Torch library, that provides GPU-accelerated tensor computation and deep neural networks built on automatic differentiation.
OSC does not provide general access to PyTorch. However, we are available to assist with the configuration of local individual/research-group installations on all our clusters. If you have any questions, please contact OSC Help.
Publisher/Vendor/Repository and License Type
https://pytorch.org, Open source.
Installing PyTorch Locally
Here is an example installation that was used in February 2022 to install a GPU-enabled version compatible with the CUDA drivers on the clusters at that time:
Load the correct python and cuda modules:
module load python/3.7-2019.10 cuda/11.1.1
module list
Create a python environment to install pytorch into:
conda create -n pytorch
Activate the conda environment:
source activate pytorch
Install the specific version of pytorch:
pip install -t ~/local/pytorch torch==1.10.2+cu111 torchvision==0.11.3+cu111 -f https://download.pytorch.org/whl/cu111/torch_stable.html
PyTorch is now installed into your $HOME/local directory using the local install directory hierarchy described here and can be tested via:
module load python/3.7-2019.10 cuda/11.1.1 ; module list ; source activate pytorch
python <<EOF
import torch
x = torch.rand(5, 3)
print("torch.rand(5, 3) =", x)
print("Is cuda available =", torch.cuda.is_available())
exit
EOF
Please refer here if you want a different version of PyTorch.
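When adapting the pip command above to another version, note that the `+cu111` suffix on each pinned package names the CUDA build (here CUDA 11.1), and the same tag appears in the `-f` URL. The sketch below checks that a set of pins agrees on one CUDA tag before building the URL. The helper names `cuda_tag` and `wheel_index` are hypothetical, not part of pip or PyTorch, and the sketch assumes other CUDA builds follow the same URL pattern as the command above.

```python
import re
from typing import Optional


def cuda_tag(requirement: str) -> Optional[str]:
    """Return the CUDA build tag (e.g. 'cu111') from a pinned requirement, or None."""
    # Matches pins of the form name==version+cuNNN, as used in the pip command above.
    match = re.fullmatch(r"[A-Za-z0-9_.-]+==[0-9.]+\+(cu[0-9]+)", requirement)
    return match.group(1) if match else None


def wheel_index(tag: str) -> str:
    """Build the find-links URL, assuming the pattern from the command above."""
    return f"https://download.pytorch.org/whl/{tag}/torch_stable.html"


pins = ["torch==1.10.2+cu111", "torchvision==0.11.3+cu111"]
tags = {cuda_tag(pin) for pin in pins}

# All pinned packages should name the same CUDA build.
if len(tags) == 1 and None not in tags:
    print("pip install", " ".join(pins), "-f", wheel_index(tags.pop()))
else:
    print("Pins disagree on the CUDA build:", tags)
```

The CUDA tag should also match the `cuda` module loaded on the cluster, as in the February 2022 example (`cuda/11.1.1` with `cu111`).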
Batch Usage
Batch jobs can request multiple nodes/cores and compute time up to the limits of the OSC systems. Refer to Queues and Reservations for Owens, and Scheduling Policies and Limits, for more information. In particular, PyTorch should be run on a GPU-enabled compute node.
AN EXAMPLE BATCH SCRIPT TEMPLATE
Below is an example batch script (job.sh) for using PyTorch (Slurm syntax).
Contents of job.sh
#!/bin/bash
#SBATCH --job-name=pytorch
#SBATCH --nodes=1 --ntasks-per-node=28 --gpus-per-node=1 --gpu_cmode=shared
#SBATCH --time=30:00
#SBATCH --account=yourprojectID

# Run from the directory the job was submitted from
cd $SLURM_SUBMIT_DIR

# Load Python and CUDA, then activate your local environment
module load python/3.6 cuda
source activate your-local-python-environment-name

python your-pytorch-script.py
In order to run it via the batch system, submit the job.sh file with the following command:
sbatch job.sh
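When a job fails to see a GPU, it can help to have the script print a few facts about its environment before any training starts. The following is a minimal sketch of such a check, suitable for the top of `your-pytorch-script.py`. The function name `describe_job_environment` is hypothetical. It assumes Slurm sets `SLURM_JOB_ID` inside the job and that `CUDA_VISIBLE_DEVICES` is typically set when GPUs are allocated, which may depend on the cluster configuration.

```python
import importlib.util
import os


def describe_job_environment(env=os.environ):
    """Collect a few facts that help debug a GPU batch job."""
    return {
        "job_id": env.get("SLURM_JOB_ID", "(not in a Slurm job)"),
        "visible_gpus": env.get("CUDA_VISIBLE_DEVICES", "(unset)"),
        # True if the active Python environment can find the torch package.
        "torch_importable": importlib.util.find_spec("torch") is not None,
    }


if __name__ == "__main__":
    info = describe_job_environment()
    for key, value in info.items():
        print(f"{key}: {value}")
    if info["torch_importable"]:
        # Only import torch when it is actually installed in this environment.
        import torch
        print("cuda available:", torch.cuda.is_available())
```

If `torch_importable` is false, the conda environment or local install path is probably not active in the job. If it is true but CUDA is unavailable, check that the job requested a GPU and that the `cuda` module is loaded.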
GPU Usage
- GPU Usage: PyTorch can be run on a GPU for significant performance improvements. See HOWTO: Use GPU with Tensorflow and PyTorch
- Horovod: If you are using PyTorch with a GPU, you may also want to consider using Horovod. Horovod takes a single-GPU training script and scales it to train across many GPUs in parallel.
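To give a rough idea of what scaling a single-GPU script across many GPUs involves, the sketch below illustrates data-parallel training in plain Python. Each worker computes a gradient on its own shard of the data, and the gradients are averaged before every update. This is a conceptual illustration only: it does not use Horovod's API, the workers run sequentially rather than on GPUs, and all names in it are hypothetical.

```python
def local_gradient(w, shard):
    """Gradient of the mean squared error for the model y = w * x on one shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)


def average_across_workers(values):
    """Stand-in for the averaging step that a distributed framework performs."""
    return sum(values) / len(values)


def train(shards, w=0.0, lr=0.05, steps=200):
    """Gradient descent where each shard plays the role of one worker."""
    for _ in range(steps):
        # In a real setup, each gradient is computed in parallel on its own GPU.
        grads = [local_gradient(w, shard) for shard in shards]
        w -= lr * average_across_workers(grads)
    return w


# Two "workers", each holding part of a dataset generated by y = 2 * x.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]
print("fitted w:", train(shards))
```

Because every worker applies the same averaged gradient, all copies of the model stay identical while each processes only part of the data.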