# Ollama Integration

Located in `Scripts/Runtime/Ollama/`, this module wraps the Ollama REST API with async/await patterns and streaming support for running LLMs locally.
## Key Scripts
| Script | Purpose |
|--------|---------|
| `OllamaConfig.cs` | MonoBehaviour configuration component. Manages the server URL, auth token, model, and history limit. |
| `Chat.cs` | Chat completions with history management. Supports streaming responses. |
| `Generate.cs` | Raw text generation without chat context. |
| `Embeddings.cs` | Generates vector embeddings for RAG applications. |
| `RAG.cs` | Retrieval-Augmented Generation helpers. |
| `ToolCalling.cs` | Function/tool calling support. |
| `Payload.cs` | Request/response data structures. |
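
As a rough illustration of how `Generate.cs` and `Embeddings.cs` might be used alongside chat, here is a sketch. The method names `Ollama.Generate` and `Ollama.Embeddings` and their signatures are assumptions inferred from the script list; check the actual scripts for the real API, and note that `nomic-embed-text` is just one example embedding model.

```csharp
using System.Threading.Tasks;

public class OllamaSketch
{
    // Hypothetical sketch: Ollama.Generate / Ollama.Embeddings names and
    // signatures are assumptions; verify against Generate.cs and Embeddings.cs.
    public async Task RunAsync()
    {
        // One-shot text generation without chat history (Generate.cs)
        string answer = await Ollama.Generate("gemma3:4b", "Summarize this in one sentence: ...");

        // Vector embedding of a document chunk for RAG retrieval (Embeddings.cs)
        float[] vector = await Ollama.Embeddings("nomic-embed-text", "Document chunk to index.");
    }
}
```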
## Basic Usage
```csharp
// Initialize a chat session
Ollama.InitChat(historyLimit: 8, system: "You are a helpful assistant.");

// Send a chat message (non-streaming)
string response = await Ollama.Chat("gemma3:4b", "Hello!");

// Send a chat message (streaming)
await Ollama.ChatStream(
    onTextReceived: (text) => responseText.text += text,
    model: "gemma3:4b",
    prompt: "Tell me a story"
);
```
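
In a Unity scene, these calls would typically live inside a component. The sketch below wires the calls above into a MonoBehaviour; the `responseText` field, the class name, and the button hookup are illustrative assumptions, not part of the package — only the `Ollama.*` calls come from the usage example.

```csharp
using TMPro;
using UnityEngine;

// Illustrative component; only the Ollama.* calls are from the package docs.
public class ChatExample : MonoBehaviour
{
    [SerializeField] private TMP_Text responseText; // assign in the Inspector

    private void Start()
    {
        Ollama.InitChat(historyLimit: 8, system: "You are a helpful assistant.");
    }

    // Hook this up to a UI button's OnClick event.
    // async void is acceptable here because it is a Unity event handler.
    public async void OnSendClicked()
    {
        responseText.text = "";
        await Ollama.ChatStream(
            onTextReceived: text => responseText.text += text,
            model: "gemma3:4b",
            prompt: "Tell me a story");
    }
}
```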
## Configuration
Configure via the `OllamaConfig` component in your scene:

- Server URL — Your Ollama server address; can also be set in code via `Ollama.SetBaseUrl()`
- Auth Token — Optional bearer token for secured Ollama servers
- Model — Use Ollama model syntax (e.g., `gemma3:4b`, `llama3:8b`)
- History Limit — Number of previous messages to include in chat history
- Keep Alive — How long the model stays loaded in memory (default: 300 seconds)
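
When targeting a remote machine, the server address can be switched at runtime with `Ollama.SetBaseUrl()` (mentioned above). The URL below is an example value; 11434 is Ollama's default port.

```csharp
// Point the client at a remote Ollama server instead of localhost.
// The IP address here is an example; 11434 is Ollama's default port.
Ollama.SetBaseUrl("http://192.168.1.50:11434");
```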
## Chat History
```csharp
// Save chat to disk
Ollama.SaveChatHistory("mychat.dat");

// Load chat from disk
Ollama.LoadChatHistory("mychat.dat", historyLimit: 8);
```
Note: Ollama must be running on the target machine. Install it from [ollama.com](https://ollama.com) and pull your desired model with `ollama pull gemma3:4b`.
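
A quick way to check the setup from a terminal (assuming a default local install; the commands require Ollama to be installed and its server running):

```shell
# Verify the local Ollama server is reachable (lists installed models)
curl http://localhost:11434/api/tags

# Pull the model referenced in the examples above
ollama pull gemma3:4b
```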