Ollama

Ollama is an open-source inference server supporting a number of generative AI models.  This module also includes Open-WebUI, which provides an easy-to-use web interface.

Ollama is in an early user testing phase - not all functionality is guaranteed to work. Contact oschelp@osc.edu with any questions.

Availability and Restrictions

Versions

Ollama is available on OSC Clusters. The versions currently available at OSC are:

Version    Cardinal    Ascend
0.5.13     X           X


You can use module spider ollama to view available modules for a given machine.
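For example:

module spider ollama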

Access

All OSC users may use Ollama and Open-WebUI, but individual models may have their own license restrictions.

Publisher/Vendor/Repository and License Type

https://github.com/ollama/ollama, MIT license.

https://github.com/open-webui/open-webui, BSD-3-Clause license.

Prerequisites

  • GPU Usage: Ollama should be run with a GPU for best performance. 
  • OnDemand Desktop Session: If using the Open-WebUI web interface, you will first need to start an OnDemand Desktop session on Cardinal with a GPU.

Running Ollama and Open-WebUI Overview

1. Load module

2. Start Ollama

3. Pull a model (first time only)

4. Start Open-WebUI
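
Put together, a typical first interactive session might look like the following sketch; the model name llama3.2 is only an example, and detailed commands for each step are given in the sections below.

module load ollama/0.5.13   # prints the Ollama and Open-WebUI port numbers
ollama_start                # start the Ollama server
ollama_pull llama3.2        # first time only; example model name
open_webui_start            # opens the Open-WebUI interface in a browser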


Commands

Ollama is available through the module system and must be loaded prior to running any of the commands below:

Loading the Ollama module:
module load ollama/0.5.13

This will print two port numbers, one for the Ollama service and one for the Open-WebUI service. E.g.,

Ollama port: 61234

Open_WebUI port: 51234

These are only examples - your port numbers will differ from the ones above.

Take note of your port numbers, as you will need them if you close your browser.

Starting Ollama:
ollama_start

Starting Open-WebUI:
open_webui_start

Ollama must be running for Open-WebUI to connect.  Starting Open-WebUI will automatically open a browser.  A model must also be installed before it is available - see Model Management below.

Model Management

Installing a model:
ollama_pull <modelname>

The list of supported models can be found at ollama.com/library. Ollama must be running prior to pulling a new model. By default, models are saved to $HOME/.ollama/models, but this location can be customized with environment variables; see module show ollama/0.5.13 for more details.
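
For example, if the module uses Ollama's standard OLLAMA_MODELS variable (check module show ollama/0.5.13 to confirm the exact variable name), you could redirect model storage to project space before starting Ollama. The project path and model name below are hypothetical:

export OLLAMA_MODELS=/fs/ess/PAS1234/$USER/ollama/models   # hypothetical project path
ollama_start
ollama_pull llama3.2                                       # example model name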

Some models require licensing agreements or are otherwise restricted; these require a Hugging Face account and login. With the Ollama module loaded, use the huggingface-cli tool to log in:

huggingface-cli login

For more details, see https://huggingface.co/docs/huggingface_hub/en/guides/cli.


Deleting a model:
ollama_rm <modelname>

Ollama must be running prior to deleting a model.


Interactive vs. Batch Usage

Ollama can be used interactively by loading the module and starting the service(s) as described above.

Requesting a GPU-enabled desktop session and using Open-WebUI is one possible use case.

The Ollama module can also be used in batch mode by loading it in your batch script. For example, you may want to run offline inference with a script that sends requests to the local inference endpoint, as sketched below.
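
A minimal Slurm job script might look like the following sketch. The account, walltime, GPU request, model name, and inference script are placeholders to adapt to your project, and the script assumes ollama_start returns after launching the server in the background (as it does in interactive use); the sleep is only a rough guard to give the server time to come up.

#!/bin/bash
#SBATCH --job-name=ollama_batch
#SBATCH --account=PAS1234          # hypothetical project account
#SBATCH --time=01:00:00
#SBATCH --gpus-per-node=1

module load ollama/0.5.13
ollama_start                       # launch the Ollama server (assumed to run in the background)
sleep 30                           # rough guard: give the server time to come up
ollama_pull llama3.2               # example model name; only needed the first time
python my_inference_client.py      # hypothetical script that sends requests to the local endpoint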

Ollama provides an OpenAI-compatible API endpoint that can be accessed by Open-WebUI or by any other OpenAI-compatible client, meaning you can bring existing clients or write your own. As long as you can send requests to localhost:OLLAMA_PORT, this should work and support a wide variety of workflows.
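
For example, a request to the OpenAI-compatible chat completions route might look like the following, assuming a model has already been pulled. Here 61234 stands in for your own Ollama port and llama3.2 is only an example model name:

curl http://localhost:61234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "llama3.2", "messages": [{"role": "user", "content": "Say hello."}]}'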

Please note this software is in early user testing and might not function as desired.  Please reach out to oschelp@osc.edu with any issues.

Jupyter Usage

This is not yet tested but might work - contact oschelp@osc.edu if you're interested in this functionality.
