Ollama Integration

Located in Scripts/Runtime/Ollama/

Wraps the Ollama REST API with async/await patterns and streaming support, letting you run LLMs locally.

Key Scripts

| Script | Purpose |
|--------|---------|
| OllamaConfig.cs | MonoBehaviour configuration component. Manages server URL, auth token, model, and history limit. |
| Chat.cs | Chat completions with history management. Supports streaming responses. |
| Generate.cs | Raw text generation without chat context. |
| Embeddings.cs | Vector embedding generation for RAG applications. |
| RAG.cs | Retrieval-Augmented Generation helpers. |
| ToolCalling.cs | Function/tool calling support. |
| Payload.cs | Request/response data structures. |
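
Beyond chat, Generate.cs and Embeddings.cs expose one-shot helpers. Here is a minimal sketch of how they would be called, following the static Ollama.* pattern used below — the method names Ollama.Generate and Ollama.Embeddings and their parameters are hypothetical, inferred from the script names rather than confirmed signatures:

// Hypothetical one-shot generation call — no chat history involved
string summary = await Ollama.Generate("gemma3:4b", "Summarize RAG in one sentence.");

// Hypothetical embedding call for RAG indexing
// (nomic-embed-text is a common Ollama embedding model)
float[] vector = await Ollama.Embeddings("nomic-embed-text", "A chunk of source text to index.");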

Basic Usage

// Initialize a chat session
Ollama.InitChat(historyLimit: 8, system: "You are a helpful assistant.");

// Send a chat message (non-streaming)
string response = await Ollama.Chat("gemma3:4b", "Hello!");

// Send a chat message (streaming)
await Ollama.ChatStream(
    onTextReceived: (text) => responseText.text += text,
    model: "gemma3:4b",
    prompt: "Tell me a story"
);
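
These calls use await, so they need to run in an async context. One common way to wire them up in Unity — a sketch, not a pattern prescribed by the package — is an async event method on a MonoBehaviour; the TMP_Text field here stands in for the responseText label used above:

using TMPro;
using UnityEngine;

// Illustrative wiring only — assumes a TextMeshPro label assigned in the Inspector
public class ChatExample : MonoBehaviour
{
    [SerializeField] private TMP_Text responseText;

    private async void Start()
    {
        Ollama.InitChat(historyLimit: 8, system: "You are a helpful assistant.");

        // Non-streaming: set the label once the full reply arrives
        responseText.text = await Ollama.Chat("gemma3:4b", "Hello!");

        // Streaming: append tokens as they arrive
        await Ollama.ChatStream(
            onTextReceived: (text) => responseText.text += text,
            model: "gemma3:4b",
            prompt: "Tell me a story"
        );
    }
}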

Configuration

Configure via the OllamaConfig component in your scene:

  • Server URL — Your Ollama server address; can also be set in code via Ollama.SetBaseUrl() (see the sketch after this list)
  • Auth Token — Optional bearer token for secured Ollama servers
  • Model — Use Ollama model syntax (e.g., gemma3:4b, llama3:8b)
  • History Limit — Number of previous messages to include in chat history
  • Keep Alive — How long the model stays loaded in memory (default: 300 seconds)
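
If you prefer to configure at runtime rather than through the component, the server URL can be set directly. Ollama.SetBaseUrl() is the documented entry point; the host address below is a placeholder:

// Point the client at a non-default server at runtime instead of using the component.
// 11434 is Ollama's default port.
Ollama.SetBaseUrl("http://192.168.1.50:11434");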

Chat History

// Save chat to disk
Ollama.SaveChatHistory("mychat.dat");

// Load chat from disk
Ollama.LoadChatHistory("mychat.dat", historyLimit: 8);
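
A typical pattern is to restore a previous session at startup, continue chatting with the restored context, and persist the updated history afterwards:

// Resume an earlier session and carry its context into new messages
Ollama.LoadChatHistory("mychat.dat", historyLimit: 8);
string reply = await Ollama.Chat("gemma3:4b", "Where did we leave off?");
Ollama.SaveChatHistory("mychat.dat");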

ℹ️ Note: Ollama must be running on the target machine. Install it from ollama.com and pull your desired model with ollama pull gemma3:4b.