# LLM Providers

## Provider Interface

```go
type Provider interface {
    Stream(ctx context.Context, req *CompletionRequest) (<-chan *Event, error)
    Info() ProviderInfo
}
```

All providers deliver the same stream of `Event` values through `Stream`: text deltas, thinking deltas, tool calls, and usage. The agent's `consumeStream` function normalizes these into the internal `Message` format, keeping the agent completely provider-agnostic.
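
To make the normalization concrete, here is a minimal sketch of a `consumeStream`-style folding loop. Every type and field name in it (`event`, `message`, the event kinds) is a stand-in for illustration, not the actual agent types:

```go
// Illustrative only: the real Event and Message types live in the agent
// and types packages; these stand-ins just show the folding logic.
type eventKind int

const (
	textDelta eventKind = iota
	thinkingDelta
	toolCall
	usage
)

type event struct {
	Kind eventKind
	Text string
}

type message struct {
	Content  string
	Thinking string
}

func consume(events <-chan event) message {
	var msg message
	for ev := range events {
		switch ev.Kind {
		case textDelta:
			msg.Content += ev.Text // assistant-visible text
		case thinkingDelta:
			msg.Thinking += ev.Text // reasoning, rendered separately
		}
		// toolCall and usage events are folded in similarly.
	}
	return msg
}
```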

---

## CompletionRequest

```go
type CompletionRequest struct {
    Model       string
    Messages    []types.Message
    Tools       []types.ToolInfo
    System      string
    Thinking    types.ThinkingLevel
    MaxTokens   int
    Temperature float64
    StreamOpts  StreamOptions
}
```

The `BeforeProviderRequest` extension hook receives this struct as JSON and can modify any field before it is sent to the provider — useful for overriding temperature, trimming the tool list, or adjusting `MaxTokens` per request.
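
As an illustration, a hook body might look like the following. The wiring that invokes it is omitted, and the function name and signature are assumptions, not the extension API itself:

```go
import "encoding/json"

// Hypothetical hook body: decode the request JSON, override a field or
// two, and encode it back before the provider call.
func beforeProviderRequest(raw []byte) ([]byte, error) {
	var req map[string]any
	if err := json.Unmarshal(raw, &req); err != nil {
		return nil, err
	}
	req["Temperature"] = 0.2 // pin sampling for this request
	req["MaxTokens"] = 2048  // tighten the output budget
	return json.Marshal(req)
}
```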

---

## ProviderInfo

```go
type ProviderInfo struct {
    Name          string
    Model         string
    MaxTokens     int
    ContextWindow int  // 0 = unknown
    HasToolCall   bool
    HasImages     bool
}
```

`Info()` is called once at startup. The service uses `ContextWindow` to trigger compaction when the conversation grows too large. `HasImages` controls whether the TUI offers its image-attachment UI.
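
A sketch of the kind of check the service might run; the 80% threshold and the `usedTokens` accounting are assumptions for illustration:

```go
// Illustrative compaction check; threshold and accounting are assumed.
func shouldCompact(info ProviderInfo, usedTokens int) bool {
	if info.ContextWindow == 0 {
		return false // unknown window: cannot compact on size
	}
	return usedTokens >= info.ContextWindow*8/10 // hypothetical 80% cutoff
}
```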

---

## ModelLister

```go
type ModelLister interface {
    ListModels() ([]string, error)
}
```

All five adapters implement `ModelLister`. When `--list-models` is passed, the CLI type-asserts the active provider to `ModelLister` and prints the result (see the sketch after the table). Each adapter queries the appropriate API:

| Provider | Query mechanism |
|---|---|
| `ollama` | `GET /api/tags` |
| `llamacpp` | `GET /v1/models` |
| `openai` | `GET /v1/models` |
| `anthropic` | `GET /v1/models` |
| `google` | Gemini model list API |
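
In Go this is a type assertion against the optional interface, not a cast. A sketch of the `--list-models` path, with the `log`/`fmt` output handling as illustration:

```go
import (
	"fmt"
	"log"
)

// provider is the active Provider chosen at startup.
func printModels(provider Provider) {
	lister, ok := provider.(ModelLister)
	if !ok {
		fmt.Println("provider does not support model listing")
		return
	}
	models, err := lister.ListModels()
	if err != nil {
		log.Fatal(err)
	}
	for _, m := range models {
		fmt.Println(m)
	}
}
```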

---

## Supported Providers

| Provider | Backend |
|---|---|
| `ollama` | Local Ollama server (HTTP) |
| `llamacpp` | llama.cpp server (HTTP, OpenAI-compatible) |
| `openai` | OpenAI API or any OpenAI-compatible endpoint |
| `anthropic` | Anthropic Messages API |
| `google` | Google Gemini API |

Each adapter lives in `internal/llm/` and translates the provider's wire format into the uniform `Stream` abstraction.

---

## Feature Matrix

| Provider | Tools | Images | Thinking | Context Window |
|---|:---:|:---:|:---:|---|
| `ollama` | ✓ | ✓ | model-dependent | 4096 (default) |
| `llamacpp` | ✓ | ✗ | ✗ | from server `n_ctx` |
| `openai` | ✓ | ✓ | reasoning models | model-dependent |
| `anthropic` | ✓ | ✓ | ✓ extended | model-dependent |
| `google` | ✓ | ✓ | ✗ | 1,000,000+ |

---

## Per-Provider Notes

### Ollama

The Ollama adapter uses the `/api/chat` endpoint with streaming enabled. Context window defaults to 4096 when not reported by the server. Thinking is supported on models that emit `<think>` tokens (e.g. `qwq`, `deepseek-r1`) — `sharur` surfaces these as `EventThinkingDelta` events by detecting the tag boundaries in the stream.
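
A deliberately simplified sketch of the boundary detection follows. It assumes each tag arrives as its own chunk, which a real stream does not guarantee (tags can split across chunk boundaries, so the actual adapter must buffer partial matches):

```go
// Simplified: assumes "<think>" and "</think>" each arrive as whole chunks.
type router struct {
	inThink bool
}

func (r *router) route(chunk string) (kind, text string) {
	switch {
	case chunk == "<think>":
		r.inThink = true
		return "", ""
	case chunk == "</think>":
		r.inThink = false
		return "", ""
	case r.inThink:
		return "thinking", chunk // surfaced as EventThinkingDelta
	default:
		return "text", chunk // surfaced as EventTextDelta
	}
}
```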

### llama.cpp

Uses the OpenAI-compatible `/v1/chat/completions` endpoint. The context window (`n_ctx`) is queried from the server at startup. Image attachments are not supported because llama.cpp's OpenAI endpoint does not accept multipart vision payloads in the standard format.
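
One way to do that query, assuming the server's `GET /props` endpoint (present in recent llama.cpp builds); whether the adapter uses this exact route is an assumption here:

```go
import (
	"encoding/json"
	"net/http"
)

type props struct {
	DefaultGenerationSettings struct {
		NCtx int `json:"n_ctx"`
	} `json:"default_generation_settings"`
}

// fetchNCtx reads the server's configured context size at startup.
func fetchNCtx(baseURL string) (int, error) {
	resp, err := http.Get(baseURL + "/props")
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	var p props
	if err := json.NewDecoder(resp.Body).Decode(&p); err != nil {
		return 0, err
	}
	return p.DefaultGenerationSettings.NCtx, nil
}
```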

### OpenAI

Uses the standard `/v1/chat/completions` streaming endpoint. Any server implementing this API — vLLM, LM Studio, Groq, Together AI — can be used by setting `openAIBaseURL`. Reasoning models (o3, o4-mini) emit `reasoning_content` deltas that are surfaced as `EventThinkingDelta`.
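
A sketch of how those deltas might be decoded and routed. The struct and the `emitText`/`emitThinking` callbacks are hypothetical; `reasoning_content` is the field name referenced above:

```go
// Hypothetical decoding struct; the callbacks stand in for the adapter's
// real event-channel writes.
type chatDelta struct {
	Content          string `json:"content"`
	ReasoningContent string `json:"reasoning_content"`
}

func routeDelta(d chatDelta, emitText, emitThinking func(string)) {
	if d.ReasoningContent != "" {
		emitThinking(d.ReasoningContent) // -> EventThinkingDelta
	}
	if d.Content != "" {
		emitText(d.Content) // -> EventTextDelta
	}
}
```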

### Anthropic

Uses the Messages API (`/v1/messages`) with streaming. Extended thinking is activated when `req.Thinking` is `medium` or `high`:

- **medium** — 10,000-token thinking budget
- **high** — 20,000-token thinking budget

The API requires `temperature: 1.0` when extended thinking is enabled; the adapter sets this automatically and overrides any user-supplied temperature for that request.
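
A sketch of that mapping; the `body` map representation is an assumption, but the Messages API's `thinking` field and the temperature override follow the text above:

```go
// Budgets per the levels above; the request-body shape is assumed.
func applyThinking(body map[string]any, level string) {
	budgets := map[string]int{"medium": 10_000, "high": 20_000}
	budget, ok := budgets[level]
	if !ok {
		return // any other level: leave the request untouched
	}
	body["thinking"] = map[string]any{
		"type":          "enabled",
		"budget_tokens": budget,
	}
	body["temperature"] = 1.0 // required by the API with extended thinking
}
```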

### Google

Uses the Gemini `generateContent` API via the `google.golang.org/genai` client library. Gemini 1.5 Pro and later have context windows of 1M+ tokens; compaction is rarely triggered for typical sessions.

---

## Adding a Provider

Implement the `Provider` interface in `internal/llm/yourprovider.go` and register it in `internal/config/factory.go`. Implement `ModelLister` to enable `--list-models`. The adapter receives a fully formed `CompletionRequest`; it is responsible for translating `Message.ToolCalls` and `Message.Images` into the target API's format.
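
A minimal skeleton under those constraints; everything below is scaffolding to fill in, not working transport code:

```go
package llm

import "context"

// myProvider is a placeholder name; see internal/llm/ for real adapters.
type myProvider struct {
	model string
}

func (p *myProvider) Stream(ctx context.Context, req *CompletionRequest) (<-chan *Event, error) {
	ch := make(chan *Event)
	go func() {
		defer close(ch)
		// Translate req.Messages, req.Tools, and any Message.Images into
		// the target API's wire format, then emit Events as deltas arrive.
	}()
	return ch, nil
}

func (p *myProvider) Info() ProviderInfo {
	return ProviderInfo{Name: "myprovider", Model: p.model, HasToolCall: true}
}

// ListModels is optional; implementing it enables --list-models.
func (p *myProvider) ListModels() ([]string, error) {
	return []string{p.model}, nil
}
```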
