Ollama is an open-source inference server for large language models (LLMs). This module also includes Open-WebUI, which provides an easy-to-use web interface.
Availability and Restrictions
Versions
Ollama is available on OSC Clusters. The versions currently available at OSC are:
| Version | Cardinal | Ascend |
|---|---|---|
| 0.5.13 | X | X |
| 0.11.3 | X | X |
| 0.12.5 | X | X |
| 0.13.1 | X | X |
You can use module spider ollama to view available modules for a given machine.
Access:
All OSC users may use Ollama and Open-WebUI, but individual models may have their own license restrictions.
Publisher/Vendor/Repository and License Type
https://github.com/ollama/ollama, MIT license.
https://github.com/open-webui/open-webui, BSD-3-Clause license.
Prerequisites
- GPU Usage: Ollama should be run with a GPU for best performance.
- OnDemand Desktop Session: If using the Open-WebUI web interface, you will first need to start an OnDemand Desktop session on Cardinal or Ascend with a GPU.
Because GPUs are needed, we recommend not running Ollama on login nodes or on OnDemand lightweight desktops.
Running Ollama and Open-WebUI Overview
1. Load module
2. Start Ollama
3. Start Open-WebUI
Commands
Ollama is available through the module system and must be loaded prior to running any of the commands below:
Loading the Ollama module:
module load ollama/0.13.1
Starting Ollama:
ollama_start
This will print out a port number for the Ollama service. E.g.,
Ollama port: 61234
Starting Open-WebUI:
open_webui_start
This will print out a port number for the Open-WebUI service. E.g.,
Open_WebUI port: 51234
The port numbers above are only examples; your port numbers will differ.
Ollama must be running for Open-WebUI to connect. Starting Open-WebUI will automatically open a browser.
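The Python example later on this page reads the port from the OLLAMA_PORT environment variable. Assuming ollama_start exports this variable into your session (rather than only printing the port), you can check it with:
echo $OLLAMA_PORT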
Stopping Ollama and Open-WebUI:
Ollama and Open-WebUI can be manually stopped with the following commands:
ollama_stop
open_webui_stop
Both services are also stopped when the module is unloaded, so if you want to stop them you can simply unload the ollama module:
module unload ollama/0.13.1
Model Management
By default, Ollama uses a central, read-only model repository defined by the OLLAMA_MODELS environment variable.
However, you can use custom models and manage your own set of models by setting OLLAMA_MODELS to an existing path you have write access to, such as a project directory or scratch space. This must be done prior to starting Ollama.
export OLLAMA_MODELS=/fs/project/ABC1234/ollama/models
ollama_start
Installing a model:
ollama_pull <modelname>
The list of supported models can be found at ollama.com/library. Ollama must be running prior to pulling a new model.
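As an example, the following pulls the model used in the Python example later on this page into a writable project-space directory (the project path is only illustrative):
# Point Ollama at a writable model directory before starting the service
export OLLAMA_MODELS=/fs/project/ABC1234/ollama/models
ollama_start
# Pull a model from the Ollama library
ollama_pull gemma3:12b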
Some models require licensing agreements or are otherwise restricted and require a Hugging Face account and login. With the Ollama module loaded, use the Hugging Face CLI to log in:
hf auth login
For more details, see https://huggingface.co/docs/huggingface_hub/en/guides/cli.
Deleting a model:
ollama_rm <modelname>
Ollama must be running prior to deleting a model. You can only delete models if you are using a custom OLLAMA_MODELS path that you have write access to.
Interactive vs. Batch Usage
Ollama can be used interactively by loading the module and starting the service(s) as described above.
Requesting a GPU-enabled desktop session and using Open-WebUI is one possible use case.
The Ollama module can also be used in batch mode by loading the module in your batch script. For example, you may want to run offline inference by running a script that relies on an inference endpoint.
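A minimal sketch of such a batch script is shown below, assuming the SLURM scheduler; the account, resource requests, and script name are only illustrative:
#!/bin/bash
#SBATCH --job-name=ollama_batch
#SBATCH --account=PAS1234        # illustrative project account
#SBATCH --time=01:00:00
#SBATCH --gpus-per-node=1

module load ollama/0.13.1
ollama_start                     # starts the Ollama service for this job

# Run a script that sends requests to the local inference endpoint,
# for example the Python example below
python my_inference_script.py

ollama_stop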
Ollama provides an OpenAI-compatible API endpoint that can be accessed by Open-WebUI or any other OpenAI-compatible client, meaning you can bring an existing client or write your own. As long as the client can send requests to localhost:OLLAMA_PORT, a wide variety of workflows are supported.
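For example, a minimal chat completion request can be sent with curl, assuming Ollama is running, OLLAMA_PORT is set, and the gemma3:12b model used in the Python example below is installed:
curl http://localhost:${OLLAMA_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gemma3:12b",
    "messages": [{"role": "user", "content": "Hello!"}]
  }'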
For the most up-to-date API compatibility information (and more examples), see the Ollama API docs and Open-WebUI API docs. The OpenAI API chat completion docs are useful as a reference, but Ollama does not currently support the complete OpenAI API, including features such as tools and responses.
Here is a basic Python example using the OpenAI package:
import os

from openai import OpenAI

# ollama_start makes the service port available via OLLAMA_PORT
ollama_port = os.getenv("OLLAMA_PORT")

# No API key is required for the local Ollama endpoint
client = OpenAI(base_url=f"http://localhost:{ollama_port}/v1", api_key="")

response = client.chat.completions.create(
    model="gemma3:12b",
    messages=[
        {"role": "system", "content": "talk like a pirate"},
        {"role": "user", "content": "how do I check a Python object's type?"},
    ],
)

print(response.choices[0].message.content)
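This assumes Ollama is running in the same session (so OLLAMA_PORT is set) and that the gemma3:12b model has already been pulled.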
For a more advanced API usage example with asynchronous requests, see this GitHub project: OSC/async_llm_api
Please note this software is in early user testing and might not function as desired. Please reach out to oschelp@osc.edu with any issues.
Jupyter Usage
This is under development; contact oschelp@osc.edu if you're interested in this functionality.