Ollama

Ollama is an open-source inference server for large language models (LLMs).  This module also includes Open-WebUI, which provides an easy-to-use web interface.

Ollama is in an early user testing phase - not all functionality is guaranteed to work. Contact oschelp@osc.edu with any questions.
Ollama is not currently suitable for use with protected or sensitive data - do not use it if you require the Protected Data Service. See https://www.osc.edu/resources/protected_data_service for more details.

Availability and Restrictions

Versions

Ollama is available on OSC Clusters. The versions currently available at OSC are:

Version   Cardinal   Ascend
0.5.13    X          X
0.11.3    X          X
0.12.5    X          X
0.13.1    X          X

 

You can use module spider ollama to view available modules for a given machine.

Access:

All OSC users may use Ollama and Open-WebUI, but individual models may have their own license restrictions.

Publisher/Vendor/Repository and License Type

https://github.com/ollama/ollama, MIT license.

https://github.com/open-webui/open-webui, BSD-3-Clause license.

Prerequisites

  • GPU Usage: Ollama should be run with a GPU for best performance. 
  • OnDemand Desktop Session: If using the Open-WebUI web interface, you will need to first start an OnDemand Desktop session on Cardinal or Ascend with a GPU.

Due to the need for GPUs, we recommend not running Ollama on login nodes or on OnDemand lightweight desktops.
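For command-line use without an OnDemand desktop, one way to get a GPU is a standard Slurm interactive allocation. The command below is only a sketch - the project account, walltime, and GPU count are placeholders for your own values:

salloc --account=PAS1234 --time=1:00:00 --nodes=1 --gpus-per-node=1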

Running Ollama and Open-WebUI Overview

1. Load module

2. Start Ollama

3. Start Open-WebUI

 

Commands

Ollama is available through the module system and must be loaded prior to running any of the commands below:

Loading the Ollama module:
module load ollama/0.13.1
Starting Ollama:
ollama_start

This will print out a port number for the Ollama service. E.g.,

Ollama port: 61234

Starting Open-WebUI:
open_webui_start

This will print out a port number for the Open-WebUI service. E.g.,

Open_WebUI port: 51234

Port numbers are only examples - your port numbers will differ from the ones above.

Ollama must be running for Open-WebUI to connect.  Starting Open-WebUI will automatically open a browser.

Take note of your port numbers; you will need them to reconnect if you close your browser.
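If you do close the browser, you can reopen the interface from a terminal in your desktop session by pointing a browser at the Open-WebUI port. The command below is only a sketch - the port is an example (use the one printed by open_webui_start), and it assumes Firefox is available in the session:

firefox http://localhost:51234 &
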
Stopping Ollama and Open-WebUI:

Ollama and Open-WebUI can be manually stopped with the following commands:

ollama_stop
open_webui_stop

Both services are also stopped when the module is unloaded, so you can simply unload the ollama module instead:

module unload ollama/0.13.1

Model Management

By default, Ollama uses a central, read-only model repository defined by the OLLAMA_MODELS environment variable.

However, you can use custom models and manage your own set of models by setting OLLAMA_MODELS to an existing path you have write access to, such as a project directory or scratch space.  This must be done prior to starting Ollama.

export OLLAMA_MODELS=/fs/project/ABC1234/ollama/models
ollama_start
Installing a model:
ollama_pull <modelname>

The list of supported models can be found at ollama.com/library. Ollama must be running prior to pulling a new model. 
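For example, a minimal sketch combining the steps above, using the same example project path and an illustrative model name:

# create a writable model directory (example path)
mkdir -p /fs/project/ABC1234/ollama/models
# point Ollama at it before starting the server
export OLLAMA_MODELS=/fs/project/ABC1234/ollama/models
ollama_start
# pull a model into your directory (example model name)
ollama_pull gemma3:12b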

Downloading large LLMs can exceed your disk space quota.  Check model sizes before downloading!
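If you maintain your own model directory, you can check how much space it is using before pulling more models (assuming OLLAMA_MODELS is set as above):

du -sh $OLLAMA_MODELS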


Some models are subject to licensing agreements or are otherwise restricted, and require a Hugging Face account and login. With the Ollama module loaded, use the Hugging Face CLI to log in:

hf auth login

For more details, see https://huggingface.co/docs/huggingface_hub/en/guides/cli.

 

Deleting a model:
ollama_rm <modelname>

Ollama must be running prior to deleting a model. You can only delete models if you are using a custom OLLAMA_MODELS path that you have write access to.

 

Interactive vs. Batch Usage

Ollama can be used interactively by loading the module and starting the service(s) as described above.

Requesting a GPU-enabled desktop session and using Open-WebUI is one possible use case.

The Ollama module can also be used in batch mode by loading the module in your batch script. For example, you may want to run offline inference using a script that relies on the local inference endpoint.
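A minimal job script sketch is shown below. The account, walltime, GPU request, and my_inference.py are placeholders for your own values, and it assumes ollama_start returns after launching the server in the background, as it does in interactive use:

#!/bin/bash
#SBATCH --account=PAS1234
#SBATCH --time=1:00:00
#SBATCH --nodes=1
#SBATCH --gpus-per-node=1

# load the module and start the Ollama server
module load ollama/0.13.1
ollama_start

# run your own script that sends requests to localhost:$OLLAMA_PORT
python my_inference.py

# shut the server down before the job ends
ollama_stop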

Ollama provides an OpenAI API-compatible endpoint that can be accessed by Open-WebUI or any other OpenAI API-compatible client, so you can bring an existing client or write your own. As long as the client can send requests to localhost:OLLAMA_PORT, this supports a wide variety of workflows.
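For example, a quick command-line check of the endpoint (the model name is only an illustration and must already be pulled):

curl http://localhost:${OLLAMA_PORT}/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "gemma3:12b", "messages": [{"role": "user", "content": "Hello"}]}'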

For the most up-to-date API compatibility information (and more examples), see the Ollama API docs and the Open-WebUI API docs. The OpenAI API chat completion docs are a useful reference, but Ollama does not currently support the complete OpenAI API (including tools and responses).

Here is a basic Python example using the OpenAI package:

import os
from openai import OpenAI

# OLLAMA_PORT is set in the environment when the Ollama server is started
ollama_port = os.getenv("OLLAMA_PORT")

# no API key is required by the local Ollama endpoint, but the client expects a value
client = OpenAI(base_url=f"http://localhost:{ollama_port}/v1", api_key="")

response = client.chat.completions.create(
    model="gemma3:12b",
    # use the standard system/user/assistant chat roles
    messages=[
        {"role": "system", "content": "talk like a pirate"},
        {"role": "user", "content": "how do I check a Python object's type?"},
    ],
)

print(response.choices[0].message.content)

For a more advanced API usage example with asynchronous requests, see this GitHub project: OSC/async_llm_api.

Please note this software is in early user testing and might not function as desired.  Please reach out to oschelp@osc.edu with any issues.

Jupyter Usage

This is under development - contact oschelp@osc.edu if you're interested in this functionality.

 
