This section describes the high-level architecture of sharur: how its components are organized, how data flows through the system, and how the key abstractions relate to each other.
The agent is driven by an event-bus (internal/events). Every meaningful state transition emits an agent.Event to all subscribers.
EventBus Performance
The EventBus is async and non-blocking. Publish() enqueues to a 4096-item buffered channel per subscriber and returns immediately; it never blocks the agent loop. Each subscriber runs in its own goroutine. Slow subscribers drop events to protect the agent loop from backpressure.
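This drop-on-overflow publish pattern can be sketched in a few lines. The type and method names below are illustrative, not sharur's actual API:

```go
package main

import "fmt"

// Event is a stand-in for sharur's agent.Event (illustrative only).
type Event struct{ Kind string }

// Bus fans events out to per-subscriber buffered channels.
type Bus struct{ subs []chan Event }

// Subscribe registers a buffered channel; the real implementation
// also starts a goroutine per subscriber to drain it.
func (b *Bus) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish never blocks: if a subscriber's buffer is full,
// the event is dropped for that subscriber only.
func (b *Bus) Publish(ev Event) {
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // slow subscriber: drop rather than stall the agent loop
		}
	}
}

func main() {
	b := &Bus{}
	ch := b.Subscribe(1)
	b.Publish(Event{Kind: "agent.start"})
	b.Publish(Event{Kind: "agent.end"}) // dropped: buffer of 1 is full
	fmt.Println(len(ch))
}
```

The `select` with an empty `default` branch is the idiomatic Go way to express "send if possible, otherwise move on".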
Event Flow
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant LLM
    participant Tools
    User->>Agent: Prompt(text)
    Agent->>Agent: EventAgentStart
    loop each LLM turn
        Agent->>Agent: EventTurnStart · EventMessageStart
        Agent->>LLM: provider.Stream()
        LLM-->>Agent: EventTextDelta (×n)
        LLM-->>Agent: EventThinkingDelta (×n, if thinking enabled)
        LLM-->>Agent: EventToolCall (×n, if tools requested)
        Agent->>Agent: EventMessageEnd
        loop each tool call
            Agent->>Tools: execTool()
            Tools-->>Agent: EventToolDelta (streaming)
            Agent->>Agent: EventToolOutput
        end
        Agent->>Agent: EventTurnEnd
    end
    Agent->>Agent: EventAgentEnd
```
State Machine
The agent transitions through explicit states to prevent concurrent modification of a running session.
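The state names themselves are internal to sharur; the following is a generic sketch of the mutex-guarded transition pattern such a state machine typically uses (state names here are illustrative, not sharur's):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type State int

const (
	StateIdle State = iota // illustrative names, not sharur's actual states
	StateRunning
	StateAborting
)

type Agent struct {
	mu    sync.Mutex
	state State
}

// transition applies a guarded state change, rejecting illegal ones
// so two goroutines cannot drive the agent concurrently.
func (a *Agent) transition(from, to State) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.state != from {
		return errors.New("illegal transition")
	}
	a.state = to
	return nil
}

func main() {
	a := &Agent{}
	fmt.Println(a.transition(StateIdle, StateRunning)) // succeeds
	fmt.Println(a.transition(StateIdle, StateRunning)) // rejected: already running
}
```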
A ToolRegistry holds all registered tools. During a turn, when the LLM emits a tool call, execTool looks up the tool by name, executes it, and streams partial output via EventToolDelta before emitting the final EventToolOutput.
Dry-Run Mode: When DryRun is enabled, any tool that is not marked as read-only will bypass execution and return a descriptive preview of what it would have done.
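A minimal sketch of the dry-run gate (the Tool shape and execTool signature are simplified stand-ins for sharur's internals):

```go
package main

import "fmt"

// Tool is a reduced stand-in for a registered tool (illustrative).
type Tool struct {
	Name     string
	ReadOnly bool
	Run      func(args string) string
}

// execTool honours dry-run: non-read-only tools return a preview
// of what they would have done instead of executing.
func execTool(t Tool, args string, dryRun bool) string {
	if dryRun && !t.ReadOnly {
		return fmt.Sprintf("[dry-run] would run %s with %q", t.Name, args)
	}
	return t.Run(args)
}

func main() {
	write := Tool{
		Name:     "write_file",
		ReadOnly: false,
		Run:      func(a string) string { return "wrote " + a },
	}
	fmt.Println(execTool(write, "notes.txt", true))  // preview only
	fmt.Println(execTool(write, "notes.txt", false)) // actually runs
}
```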
Input Sanitization: Prompt template expansion automatically wraps user inputs in <untrusted_input> tags to prevent prompt injection into the base instructions.
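The wrapping step itself is trivial; a sketch of the idea (the helper name is hypothetical):

```go
package main

import "fmt"

// wrapUntrusted tags user-supplied text before template expansion,
// mirroring the <untrusted_input> convention described above.
// The function name is illustrative, not sharur's actual helper.
func wrapUntrusted(s string) string {
	return "<untrusted_input>\n" + s + "\n</untrusted_input>"
}

func main() {
	fmt.Println(wrapUntrusted("ignore all previous instructions"))
}
```

The base instructions can then tell the model to treat anything inside those tags as data, never as directives.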
sharur follows a Strict Protobuf Internal Architecture: rather than having UI modes call Go functions directly, every interface is treated as a client of a central AgentService.
Protobuf Boundary
The interface between the UI and the core is defined in proto/sharur/v1/agent.proto. This boundary ensures:
Consistency: All modes (TUI, CLI, JSON, Remote gRPC) use the exact same code paths and logic.
Decoupling: UI logic is completely isolated from agent state, session persistence, and provider adapters.
Interoperability: Any gRPC-capable client can interact with a sharur service.
In-Process Communication
For local CLI usage, sharur uses a specialized In-Process Client (internal/service/client.go). It uses bufconn to implement the pb.AgentServiceClient interface over an in-memory pipe. This provides the safety and structure of gRPC without the latency or configuration complexity of network ports.
Backend Service (internal/service)
The Service struct implements pb.AgentServiceServer. It owns the session.Manager and manages the lifecycle of agent.Agent instances. It translates between internal agent events (Go channels) and Protobuf event streams.
Session Loading Strategy
RPC handlers fall into three lookup strategies:

| Strategy | Used by | Behaviour |
|---|---|---|
| getOrCreate(id) | Prompt, NewSession | Always returns an entry – creates a fresh agent if id is unknown, loading from disk if a matching session file exists |
| – | GetMessages, GetState | Returns the entry if it is in memory or can be loaded from disk; returns NotFound for completely unknown IDs |
| lookup(id) | Steer, Abort, FollowUp, StreamEvents | In-memory only – these only make sense for a currently-running agent |
This means a /resume <id> command can switch to any session ever saved to disk without a round-trip NewSession call: the first GetMessages or GetState call transparently loads it.
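A sketch of the three strategies over an in-memory map with a disk fallback; every name here is illustrative (the middle strategy's real name is not shown in this document, so `getIfExists` is hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// entry stands in for a live agent/session pair.
type entry struct{ id string }

type registry struct {
	live   map[string]*entry // agents currently in memory
	onDisk map[string]bool   // pretend index of saved session files
}

// lookup: in-memory only.
func (r *registry) lookup(id string) (*entry, bool) {
	e, ok := r.live[id]
	return e, ok
}

// getIfExists (hypothetical name): memory first, then disk, else NotFound.
func (r *registry) getIfExists(id string) (*entry, error) {
	if e, ok := r.live[id]; ok {
		return e, nil
	}
	if r.onDisk[id] {
		e := &entry{id: id} // real code would deserialize the session file
		r.live[id] = e
		return e, nil
	}
	return nil, errors.New("NotFound")
}

// getOrCreate: always succeeds, creating a fresh entry when needed.
func (r *registry) getOrCreate(id string) *entry {
	if e, err := r.getIfExists(id); err == nil {
		return e
	}
	e := &entry{id: id}
	r.live[id] = e
	return e
}

func main() {
	r := &registry{live: map[string]*entry{}, onDisk: map[string]bool{"old": true}}
	_, ok := r.lookup("old")
	fmt.Println(ok) // false: saved to disk but not resident
	if _, err := r.getIfExists("old"); err == nil {
		fmt.Println("loaded from disk")
	}
	_, ok = r.lookup("old")
	fmt.Println(ok) // true: now resident
}
```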
All providers return a uniform Stream of Event values: text deltas, thinking deltas, tool calls, and usage. The agent’s consumeStream function normalizes these into the internal Message format, making the agent completely provider-agnostic.
The BeforeProviderRequest extension hook receives the CompletionRequest as JSON and can modify any field before it is sent to the provider, which is useful for overriding temperature, trimming the tool list, or adjusting MaxTokens per request.
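The JSON round-trip that makes this possible can be sketched as follows; the CompletionRequest fields shown are a pared-down assumption, not the full struct:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CompletionRequest is a reduced stand-in; field names are illustrative.
type CompletionRequest struct {
	Temperature float64 `json:"temperature"`
	MaxTokens   int     `json:"maxTokens"`
}

// applyHook marshals the request, hands the raw JSON to the hook,
// and unmarshals the (possibly mutated) result back.
func applyHook(req CompletionRequest, hook func([]byte) []byte) (CompletionRequest, error) {
	raw, err := json.Marshal(req)
	if err != nil {
		return req, err
	}
	var out CompletionRequest
	if err := json.Unmarshal(hook(raw), &out); err != nil {
		return req, err
	}
	return out, nil
}

func main() {
	// A hook that overrides temperature per request.
	hook := func(raw []byte) []byte {
		var m map[string]any
		json.Unmarshal(raw, &m)
		m["temperature"] = 0.2
		out, _ := json.Marshal(m)
		return out
	}
	req, _ := applyHook(CompletionRequest{Temperature: 0.7, MaxTokens: 4096}, hook)
	fmt.Println(req.Temperature)
}
```

Because the hook sees plain JSON rather than a Go struct, extensions can be written in any language that can read stdin and write stdout.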
Info() is called once at startup. The service uses ContextWindow to trigger compaction when the conversation grows too large. HasImages controls whether the TUI offers image attachment UI.
All five adapters implement ModelLister. When --list-models is passed, the CLI casts the active provider to ModelLister and prints the result. Each adapter queries the appropriate API:
| Provider | Query mechanism |
|---|---|
| ollama | GET /api/tags |
| llamacpp | GET /v1/models |
| openai | GET /v1/models |
| anthropic | GET /v1/models |
| google | Gemini model list API |
Supported Providers
| Provider | Backend |
|---|---|
| ollama | Local Ollama server (HTTP) |
| llamacpp | llama.cpp server (HTTP, OpenAI-compatible) |
| openai | OpenAI API or any OpenAI-compatible endpoint |
| anthropic | Anthropic Messages API |
| google | Google Gemini API |
Each adapter lives in internal/llm/ and translates the provider’s wire format into the uniform Stream abstraction.
Feature Matrix
| Provider | Tools | Images | Thinking | Context Window |
|---|---|---|---|---|
| ollama | ✓ | ✓ | model-dependent | 4096 (default) |
| llamacpp | ✓ | ✗ | ✗ | from server n_ctx |
| openai | ✓ | ✓ | reasoning models | model-dependent |
| anthropic | ✓ | ✓ | ✓ extended | model-dependent |
| google | ✓ | ✓ | ✓ | 1,000,000+ |
Per-Provider Notes
Ollama
The Ollama adapter uses the /api/chat endpoint with streaming enabled. Context window defaults to 4096 when not reported by the server. Thinking is supported on models that emit <think> tokens (e.g. qwq, deepseek-r1); sharur surfaces these as EventThinkingDelta events by detecting the tag boundaries in the stream.
llama.cpp
Uses the OpenAI-compatible /v1/chat/completions endpoint. The context window (n_ctx) is queried from the server at startup. Image attachments are not supported because llama.cpp’s OpenAI endpoint does not accept multipart vision payloads in the standard format.
OpenAI
Uses the standard /v1/chat/completions streaming endpoint. Any server implementing this API (vLLM, LM Studio, Groq, Together AI) can be used by setting openAIBaseURL. Reasoning models (o3, o4-mini) emit reasoning_content deltas that are surfaced as EventThinkingDelta.
Anthropic
Uses the Messages API (/v1/messages) with streaming. Extended thinking is activated when req.Thinking is medium or high:
medium → 10,000-token thinking budget
high → 20,000-token thinking budget
The API requires temperature: 1.0 when extended thinking is enabled; the adapter sets this automatically and overrides any user-supplied temperature for that request.
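The budget mapping and forced temperature can be captured in one small function; the shape below is a sketch, not the adapter's actual code:

```go
package main

import "fmt"

// thinkingBudget maps the documented levels to token budgets.
// When thinking is enabled, temperature is forced to 1.0 as the
// Anthropic API requires, overriding any user-supplied value.
func thinkingBudget(level string) (budget int, temperature float64, enabled bool) {
	switch level {
	case "medium":
		return 10000, 1.0, true
	case "high":
		return 20000, 1.0, true
	default:
		return 0, 0, false // caller keeps the user's temperature
	}
}

func main() {
	b, t, on := thinkingBudget("medium")
	fmt.Println(b, t, on)
}
```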
Google
Uses the Gemini generateContent API via the google.golang.org/genai client library. Gemini 1.5 Pro and later have context windows of 1M+ tokens; compaction is rarely triggered for typical sessions.
Adding a Provider
Implement the Provider interface in internal/llm/yourprovider.go and register it in internal/config/factory.go. Implement ModelLister to enable --list-models. The adapter receives a fully-formed CompletionRequest; it is responsible for translating Message.ToolCalls and Message.Images into the target API’s format.
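A skeleton of what an adapter looks like; the Provider, Event, and CompletionRequest shapes below are simplified assumptions, since the real interface in internal/llm carries more detail:

```go
package main

import "fmt"

// Simplified stand-ins for the types in internal/llm.
type Event struct{ Text string }

type CompletionRequest struct{ Prompt string }

// Provider is the contract every adapter satisfies: turn a request
// into a stream of normalized events.
type Provider interface {
	Stream(req CompletionRequest) (<-chan Event, error)
}

// myProvider is a toy adapter that echoes the prompt back.
type myProvider struct{}

func (myProvider) Stream(req CompletionRequest) (<-chan Event, error) {
	ch := make(chan Event, 1)
	ch <- Event{Text: "hello from " + req.Prompt}
	close(ch)
	return ch, nil
}

func main() {
	var p Provider = myProvider{}
	ch, _ := p.Stream(CompletionRequest{Prompt: "yourprovider"})
	for ev := range ch {
		fmt.Println(ev.Text)
	}
}
```

A real adapter would open an HTTP streaming connection in Stream and translate each wire-format chunk into the appropriate Event before sending it on the channel.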
Each .jsonl file contains one JSON object per line:
Line 0 (header): kind=header – session ID, parentId, model, timestamps, system prompt, compaction settings, dryRun flag
Subsequent lines: kind=message – individual conversation messages with full payloads (role, content, thinking, tool calls, tool call ID)
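A hypothetical session file illustrating this layout; the field names follow the description above, but the exact schema and values are illustrative:

```jsonl
{"kind":"header","id":"3f2a9c1d","parentId":"","model":"qwen2.5","createdAt":"2025-01-01T12:00:00Z","dryRun":false}
{"kind":"message","role":"user","content":"Summarize README.md"}
{"kind":"message","role":"assistant","content":"Done.","thinking":"","toolCalls":[]}
```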
Session Tree
Sessions form a linked tree via parentId. The session.Manager.BuildTree() method assembles all sessions from the project directory into a []*TreeNode tree. FlattenTree produces a depth-first flat list with structured layout metadata (gutters, connectors, indentation), which the TUI layer uses to render a clean Unicode box-drawing tree diagram.
```mermaid
flowchart TD
    A["Session A (root)"] --> B["Session B (/branch from A)"]
    A --> C["Session C (/fork of A)"]
    B --> D["Session D (/branch from B at msg 5)"]
    B --> E["Session E (/rebase of B)"]
    B --> F["Session F (/merge into B)"]
    style C stroke-dasharray: 5 5
```
/fork creates an independent copy (dashed border above) with no parentId link – it does not appear as a child in the tree visualization.
Branching, Rebasing & Merging
```mermaid
flowchart TD
    Q{"What do you need?"}
    Q -->|"Explore an alternate path from this point"| Branch["/branch [idx]: child session, same history up to idx"]
    Q -->|"Independent copy, no tree relationship"| Fork["/fork: detached snapshot"]
    Q -->|"Clean up the conversation, keep only specific messages"| Rebase["/rebase: interactively select messages for a new session"]
    Q -->|"Combine two sessions into one context"| Merge["/merge <id>: LLM-synthesized merge turn appended to current session"]
```
| Command | Creates parent link | Copies history | Interactive |
|---|---|---|---|
| /branch [idx] | ✓ | up to idx | ✗ |
| /fork | ✗ | full | ✗ |
| /rebase | ✓ | selected messages | ✓ |
| /merge <id> | ✗ | appends other session | LLM turn |
The /tree modal (keyboard shortcuts B, F, and R on a selected session) exposes all of these without leaving the TUI.
Compaction & Context Management
To stay within LLM context windows, sharur implements an auto-compaction strategy:
Trigger: When tokens > ContextWindow - reserveTokens, compaction fires.
Summarization: The agent uses the LLM to generate a structured summary (<!-- sharur-summary -->) of the pruned messages.
File Tracking: The summary carries forward lists of files read and modified, so the assistant retains awareness of what it has already seen.
Split Turn Handling: If compaction cuts mid-turn, a “Turn Prefix Summary” is generated to preserve context for the remaining tool calls.
Session Tree Integration: Compaction events are stored as TypeCompaction records in the JSONL file, visible in /stats and preserved across restarts.
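The trigger arithmetic above reduces to a one-line predicate (the function name is illustrative):

```go
package main

import "fmt"

// shouldCompact implements the documented trigger: compaction fires
// when used tokens exceed the context window minus the reserve.
func shouldCompact(usedTokens, contextWindow, reserveTokens int) bool {
	return usedTokens > contextWindow-reserveTokens
}

func main() {
	// With a 32,768-token window and the default 2,048-token reserve,
	// compaction fires once usage passes 30,720 tokens.
	fmt.Println(shouldCompact(31000, 32768, 2048)) // true
	fmt.Println(shouldCompact(20000, 32768, 2048)) // false
}
```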
Compaction Configuration
```json
// ~/.sharur/config.json or .sharur/config.json
{
  "compaction": {
    "enabled": true,
    "reserveTokens": 2048,
    "keepRecentTokens": 8192
  }
}
```
| Field | Default | Description |
|---|---|---|
| enabled | true | Whether auto-compaction fires when the token budget is exceeded |
| reserveTokens | 2048 | Tokens to keep free at the top of the context window; compaction triggers when used > window - reserveTokens |
| keepRecentTokens | 8192 | Minimum recent-turn tokens to always retain after compaction, ensuring the current conversation thread survives |
Trigger compaction manually at any time with /compact in the TUI or by calling the Compact RPC directly.
Export & Import
Sessions can be exported to and imported from JSONL files:
```shell
# Export from TUI
/export /path/to/session.jsonl

# Import into TUI (creates a new session from the file)
/import /path/to/session.jsonl

# Export from CLI without entering the TUI
shr --export /path/to/session.html   # HTML snapshot
```
Exported JSONL files are self-contained: they include the session header and all messages. Imported sessions are assigned a new UUID and added to the current project’s session directory.
historyEntry, contentItem, toolCallEntry – render data model
utils.go
Helper functions (Capitalize)
Prompt Submission
Prompt submission uses promptGRPC(), which opens a client.Prompt() server-streaming RPC and drains *pb.AgentEvent messages into m.eventCh in a goroutine. The listenForEvent Bubble Tea command feeds that channel back into the update loop one event at a time.
Prompt History
The TUI maintains a per-session prompt history in m.promptHistory, synced from the service via GetMessages at startup and after session switches. Users navigate previous prompts using Up/Down arrow keys while the editor is focused; the current draft is preserved as m.draftInput.
Render Data Model
The TUI stores conversation history as []historyEntry. Each entry holds an ordered []contentItem slice that preserves the exact stream order. This mirrors the content[] array model, ensuring correct temporal ordering of thinking, text, and tool calls.
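Illustrative shapes for this render model; sharur's actual fields differ, but the ordering principle is the same:

```go
package main

import "fmt"

// contentItem is one chunk of assistant output, in stream order.
type contentItem struct {
	Kind string // "text", "thinking", or "tool_call" (illustrative)
	Text string
}

// historyEntry is one conversation turn as the TUI renders it.
type historyEntry struct {
	Role  string
	Items []contentItem // preserves exact stream order
}

func main() {
	e := historyEntry{
		Role: "assistant",
		Items: []contentItem{
			{Kind: "thinking", Text: "plan the edit"},
			{Kind: "text", Text: "I'll update the file."},
			{Kind: "tool_call", Text: "write_file(notes.txt)"},
		},
	}
	// Rendering walks the slice in order, so thinking always appears
	// before the text it produced, and tool calls where they happened.
	for _, it := range e.Items {
		fmt.Println(it.Kind)
	}
}
```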
Modal System
Stats – Token counts, session metadata, file/path info
Config – Active model, provider, compaction settings
Session Tree – Interactive paginated tree with structured branch visualization; supports Resume (Enter) and Branch (B)
Rebase Picker – Selection interface for history manipulation
Merge Picker – Fuzzy finder for selecting sessions to merge into the current conversation
Build & Release
sharur uses a combination of Mage and GitHub Actions for CI/CD.
Versioning
The project version is maintained in a VERSION file in the repository root. During build, Magefile.go reads this file and injects it into the binary using linker flags (-ldflags "-X main.version=...").
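The essence of that injection step, sketched with the standard library (output name and package path are illustrative, not the Magefile's exact invocation):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// buildCmd assembles the go build invocation that injects the VERSION
// file contents into main.version via linker flags.
func buildCmd(version string) *exec.Cmd {
	ldflags := fmt.Sprintf("-X main.version=%s", strings.TrimSpace(version))
	return exec.Command("go", "build", "-ldflags", ldflags, "-o", "shr", ".")
}

func main() {
	raw, err := os.ReadFile("VERSION")
	if err != nil {
		raw = []byte("dev") // fall back when run outside the repo
	}
	fmt.Println(buildCmd(string(raw)).Args)
}
```

The binary can then expose the value via a `var version string` in package main and a `--version` flag.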
Mage Targets
| Target | Description |
|---|---|
| Build | Compile shr for the current platform with version injection |
| Test | Run all unit tests with coverage |
| Vet | Static analysis with go vet |
| Lint | Run golangci-lint |
| Vuln | Vulnerability scan with govulncheck |
| All | Run generate, build, test, vet, lint, and vuln in sequence |
| Release | Cross-compile for Linux, macOS, and Windows (AMD64/ARM64), package into dist/ |
| Generate | Run buf to regenerate protobuf stubs |
| Docs | Generate API reference (gomarkdoc) and build the Hugo site |
| DocsServe | Run Hugo dev server at localhost:1313 with live reload |
| PkgSite | Run pkgsite for local full API browsing including internals |
CI/CD Pipelines
Continuous Integration (ci.yml)
Triggered on every push to main and all pull requests. Runs mage all within a Nix environment on both ubuntu-latest and macos-latest, then uploads per-platform binaries as build artifacts. Coverage is collected and summarised via go tool cover.
Automated Release (release.yml)
Triggered by pushing a version tag (e.g., v1.2.3). Runs mage release to build cross-platform assets and uses softprops/action-gh-release to publish them to a new GitHub Release.
Docs Deploy (docs.yml)
Triggered on push to main and on published releases. Runs mage docs (gomarkdoc + Hugo build) and deploys docs/public/ to the gh-pages branch via peaceiris/actions-gh-pages.