# LlamaCPP Integration

Located in `Scripts/Runtime/Llamacpp/`

Provides high-performance LLM inference by connecting to a remote llama.cpp server.
## Key Scripts
| Script | Purpose |
|--------|---------|
| LLM.cs | Server configuration component. Manages connection settings (port, context size, API key). |
| LLMCaller.cs | Base class for making LLM requests. Handles local/remote switching and request management. |
| LLMChatTemplates.cs | Chat template definitions for different model formats. |
| LLMInterface.cs | Request/response data structures for the llama.cpp API. |
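As an illustration of the shape of these request/response structures, the llama.cpp server's `/completion` endpoint exchanges JSON payloads like the following. This is a hedged sketch, not the actual contents of LLMInterface.cs: the field names come from the public llama.cpp API, and the class names are invented for the example.

```csharp
using System;

// Sketch of llama.cpp /completion payloads (class names are hypothetical;
// the real definitions live in LLMInterface.cs and may differ).
[Serializable]
public class CompletionRequest
{
    public string prompt;            // text to complete
    public int n_predict = 256;      // maximum number of tokens to generate
    public bool stream = true;       // stream results token by token
    public float temperature = 0.7f; // sampling temperature
}

[Serializable]
public class CompletionResponse
{
    public string content; // generated text (one chunk when streaming)
    public bool stop;      // true on the final chunk of a stream
}
```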
## Basic Usage
```csharp
// LLMCaller handles the connection
public LLMCaller llmCharacter;

// Send a chat message and receive a streaming response
string response = await llmCharacter.Chat(prompt, OnPartialResponse);

// Callback for streaming tokens
void OnPartialResponse(string partial)
{
    responseText.text = partial;
}
```
## Configuration
- Remote Mode — Set `remote = true` and configure `host` (e.g., `localhost:13333`)
- Context Size — Adjust `contextSize` on the `LLM` component (default: 8192)
- Chat Template — Set the appropriate template for your model in `chatTemplate`
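Putting these settings together, a minimal setup sketch might look like the following. It assumes `remote` and `host` live on the `LLMCaller` component, `contextSize` and `chatTemplate` on the `LLM` component, and that all four are public fields settable from code; the exact members and the `"chatml"` template value are assumptions, so verify them against the components in the Inspector.

```csharp
using UnityEngine;

// Hedged setup sketch: field names and types are assumptions based on the
// settings listed above; check the actual component definitions.
public class LlamaCppSetup : MonoBehaviour
{
    public LLM llm;          // server configuration component
    public LLMCaller caller; // request component

    void Awake()
    {
        caller.remote = true;            // talk to a remote llama.cpp server
        caller.host = "localhost:13333"; // server address (example value)
        llm.contextSize = 8192;          // the documented default
        llm.chatTemplate = "chatml";     // assumption: template id matching your model
    }
}
```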
**Tip:** Use the recommended `qwen2.5-3b-instruct-q8_0.gguf` model from the Getting Started guide for best results with the RPG Generator demo.
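When running in remote mode, a llama.cpp server must already be listening at the configured `host`. As an illustrative invocation (model path, context size, and port are example values that should match your configuration above), llama.cpp's `llama-server` binary can serve the recommended model with `./llama-server -m qwen2.5-3b-instruct-q8_0.gguf -c 8192 --port 13333`.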