vLLM
vLLM is an open-source inference server for large language models (LLMs).
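For context, vLLM exposes an OpenAI-compatible HTTP API once a server is running. The minimal sketch below assumes a server has already been started (for example with `vllm serve <model>`) and is reachable at localhost:8000, vLLM's default port; the model name is a placeholder, not a recommendation.

```python
# Minimal sketch: query a running vLLM server through its
# OpenAI-compatible REST API. Assumes the server listens on
# localhost:8000 (the default); the model name is a placeholder.
import requests

resp = requests.post(
    "http://localhost:8000/v1/completions",
    json={
        "model": "meta-llama/Llama-3.1-8B-Instruct",  # placeholder
        "prompt": "Large language models are",
        "max_tokens": 64,
        "temperature": 0.7,
    },
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["text"])
```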
Docker and Apptainer/Singularity containers can now be run on all clusters at OSC, which provides a straightforward way to deploy an inference server such as vLLM. Single-node jobs, including GPU jobs, are currently supported; MPI jobs are planned for the future.
From the Docker website: "A container image is a lightweight, stand-alone, executable package of a piece of software that includes everything needed to run it: code, runtime, system tools, system libraries, settings."
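As a hedged illustration of how these pieces fit together, the sketch below launches a vLLM server inside an Apptainer container from Python. The image filename (vllm.sif) and the model name are hypothetical placeholders, not OSC-provided paths; `--nv` is Apptainer's flag for exposing the host's NVIDIA GPUs to the container.

```python
# Hypothetical sketch: start a vLLM server inside an Apptainer
# container. `vllm.sif` and the model name are placeholders.
import subprocess

subprocess.run(
    [
        "apptainer", "exec",
        "--nv",        # make the host's NVIDIA GPUs visible in the container
        "vllm.sif",    # placeholder container image
        "vllm", "serve", "meta-llama/Llama-3.1-8B-Instruct",
    ],
    check=True,  # raise CalledProcessError if the server exits abnormally
)
```

Once the server is up, it can be queried with a request like the client sketch shown earlier.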