Developer Guide
The developer guide covers two audiences: contributors working on sharur's internals, and developers embedding an agent through the Go SDK.
This section describes the high-level architecture of sharur: how its components are organized, how data flows through the system, and how the key abstractions relate to each other.
```mermaid
flowchart TD
CLI["CLI flags & Config"] --> Svc
subgraph core ["internal/agent"]
Agent["Agent
Messages · SteerQueue · FollowUpQueue
StateMachine"]
RunTurn["runTurn
provider.Stream · consumeStream · execTools"]
EB["EventBus
async · non-blocking · 4096-item buffer"]
Agent --> RunTurn
RunTurn -->|publishes| EB
end
Svc["internal/service
AgentService"] --> core
RunTurn --> LLM
subgraph llm ["internal/llm"]
LLM["Provider interface
Stream · Info"]
Adapters["Ollama · OpenAI · Anthropic
llama.cpp · Google"]
LLM --> Adapters
end
EB --> TUI["TUI"]
EB --> JSON["JSON stdout"]
EB --> GRPC["gRPC stream"]
EB --> Session["session saver"]
```

```mermaid
flowchart TD
Input["User Input"] --> Mode["TUI · JSON · Remote Client"]
Mode --> PBClient["pb.AgentServiceClient
bufconn or TCP"]
PBClient --> Service["internal/service
getOrCreate / loadIfExists"]
Service --> AP["agent.Prompt(ctx, text)"]
AP --> MI["ext.ModifyInput()"]
MI --> SS["ext.SessionStart() · ext.AgentStart()
EventAgentStart"]
SS --> Loop
subgraph Loop ["runTurn loop"]
direction TB
BP["ext.BeforePrompt() · ModifySystemPrompt()
ModifyContext() · BeforeProviderRequest()"]
LLMStream["llm.Provider.Stream()
EventTextDelta · EventThinkingDelta · EventToolCall"]
APR["ext.AfterProviderResponse()
EventTurnStart · ext.TurnStart()"]
ToolExec["ext.BeforeToolCall() · execTool() · ext.AfterToolCall()
EventToolDelta · EventToolOutput"]
TE["ext.TurnEnd()"]
More{"more tool calls?"}
BP --> LLMStream --> APR --> ToolExec --> TE --> More
More -->|yes| BP
end
More -->|no| AgEnd["EventAgentEnd · ext.AgentEnd()"]
AgEnd --> Save["service saves session to disk"]
Save --> Stream["Stream Protobuf Events to client"]
Stream --> Render["Render: TUI · JSONL stdout · gRPC stream"]
```

The agent is driven by an event bus (internal/events). Every meaningful state transition emits an agent.Event to all subscribers.
The EventBus is async and non-blocking. Publish() enqueues to a 4096-item buffered channel per subscriber and returns immediately — it never blocks the agent loop. Each subscriber runs in its own goroutine. Slow subscribers drop events to protect the agent loop from backpressure.
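This publish path can be sketched with plain channels. The type names below (`Bus`, `Event`) are stand-ins for illustration, not sharur's actual internal/events API:

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a minimal stand-in for agent.Event; the real type also
// carries payloads such as Content and ToolCall.
type Event struct{ Type string }

// Bus sketches the documented behaviour: Publish enqueues to a buffered
// channel per subscriber and returns immediately.
type Bus struct {
	mu   sync.Mutex
	subs []chan Event
}

// Subscribe registers a handler; each subscriber drains its own channel
// in its own goroutine. bufSize would be 4096 in sharur.
func (b *Bus) Subscribe(bufSize int, fn func(Event)) {
	ch := make(chan Event, bufSize)
	b.mu.Lock()
	b.subs = append(b.subs, ch)
	b.mu.Unlock()
	go func() {
		for ev := range ch {
			fn(ev)
		}
	}()
}

// Publish never blocks: a full subscriber buffer means the event is
// dropped rather than stalling the agent loop.
func (b *Bus) Publish(ev Event) {
	b.mu.Lock()
	defer b.mu.Unlock()
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // slow subscriber: drop the event
		}
	}
}

func main() {
	var b Bus
	done := make(chan Event, 1)
	b.Subscribe(4096, func(ev Event) { done <- ev })
	b.Publish(Event{Type: "EventAgentStart"})
	fmt.Println((<-done).Type)
}
```

The select-with-default is what makes Publish safe to call from the hot path of the agent loop.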
```mermaid
sequenceDiagram
participant User
participant Agent
participant LLM
participant Tools
User->>Agent: Prompt(text)
Agent->>Agent: EventAgentStart
loop each LLM turn
Agent->>Agent: EventTurnStart · EventMessageStart
Agent->>LLM: provider.Stream()
LLM-->>Agent: EventTextDelta (×n)
LLM-->>Agent: EventThinkingDelta (×n, if thinking enabled)
LLM-->>Agent: EventToolCall (×n, if tools requested)
Agent->>Agent: EventMessageEnd
loop each tool call
Agent->>Tools: execTool()
Tools-->>Agent: EventToolDelta (streaming)
Agent->>Agent: EventToolOutput
end
Agent->>Agent: EventTurnEnd
end
Agent->>Agent: EventAgentEnd
```

The agent transitions through explicit states to prevent concurrent modification:
```mermaid
stateDiagram-v2
[*] --> Idle
Idle --> Thinking : Prompt()
Thinking --> Executing : tool calls present
Thinking --> Idle : no tool calls
Thinking --> Compacting : token limit reached
Thinking --> Aborting : Abort() called
Executing --> Thinking : more turns needed
Executing --> Idle : done
Compacting --> Thinking : resume
Aborting --> Idle
Thinking --> Error
Error --> [*]
```

Two queues support non-blocking interaction while the agent is running:

- SteerQueue — messages injected into the running turn via Steer()
- FollowUpQueue — messages queued via FollowUp() and processed after the current turn completes
Tools implement a simple interface:
A ToolRegistry holds all registered tools. During a turn, when the LLM emits a tool call, execTool looks up the tool by name, executes it, and streams partial output via EventToolDelta before emitting the final EventToolOutput.
Built-in tools: read, write, edit, bash, grep, ls, find
When DryRun is enabled, any tool that is not marked as read-only bypasses execution and returns a descriptive preview of what it would have done. Tool output is wrapped in `<untrusted_input>` tags to prevent prompt injection into the base instructions.

sharur follows a Strict Protobuf Internal Architecture. Instead of UI modes calling Go functions directly, all interfaces are treated as clients of a central AgentService.
The interface between the UI and the core is defined in proto/sharur/v1/agent.proto. This boundary ensures:
For local CLI usage, sharur uses a specialized In-Process Client (internal/service/client.go). It uses bufconn to implement the pb.AgentServiceClient interface over an in-memory pipe. This provides the safety and structure of gRPC without the latency or configuration complexity of network ports.
The Service struct (internal/service) implements pb.AgentServiceServer. It owns the session.Manager and manages the lifecycle of agent.Agent instances, translating between internal agent events (Go channels) and Protobuf event streams.
RPCs split into three lookup strategies:
| Strategy | Used by | Behaviour |
|---|---|---|
getOrCreate(id) | Prompt, NewSession | Always returns an entry — creates a fresh agent if id is unknown, loading from disk if a matching session file exists |
loadIfExists(id) | GetState, GetMessages, ConfigureSession, ForkSession, CloneSession | Returns the entry if it is in memory or can be loaded from disk; returns NotFound for completely unknown IDs |
lookup(id) | Steer, Abort, FollowUp, StreamEvents | In-memory only — these only make sense for a currently-running agent |
This means a /resume <id> command can switch to any session ever saved to disk without a round-trip NewSession call: the first GetMessages or GetState call transparently loads it.
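The three strategies boil down to which fallbacks are attempted. A stand-in sketch (a string standing in for *agent.Agent, and a stubbed disk loader in place of the session.Manager) might look like:

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

var errNotFound = errors.New("session not found")

// registry sketches the three documented lookup strategies.
type registry struct {
	mu           sync.Mutex
	agents       map[string]string // id -> agent (stand-in type)
	loadFromDisk func(id string) (string, bool)
}

// getOrCreate always returns an entry: in-memory, else loaded from
// disk, else freshly created (used by Prompt and NewSession).
func (r *registry) getOrCreate(id string) string {
	r.mu.Lock()
	defer r.mu.Unlock()
	if a, ok := r.agents[id]; ok {
		return a
	}
	if a, ok := r.loadFromDisk(id); ok {
		r.agents[id] = a
		return a
	}
	a := "fresh:" + id
	r.agents[id] = a
	return a
}

// loadIfExists returns NotFound for completely unknown IDs
// (used by GetState, GetMessages, ForkSession, ...).
func (r *registry) loadIfExists(id string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if a, ok := r.agents[id]; ok {
		return a, nil
	}
	if a, ok := r.loadFromDisk(id); ok {
		r.agents[id] = a
		return a, nil
	}
	return "", errNotFound
}

// lookup is in-memory only (used by Steer, Abort, FollowUp, StreamEvents).
func (r *registry) lookup(id string) (string, error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if a, ok := r.agents[id]; ok {
		return a, nil
	}
	return "", errNotFound
}

func main() {
	r := &registry{
		agents:       map[string]string{},
		loadFromDisk: func(id string) (string, bool) { return "disk:" + id, id == "saved" },
	}
	fmt.Println(r.getOrCreate("saved")) // disk:saved
	fmt.Println(r.getOrCreate("new"))   // fresh:new
}
```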
All providers return a uniform Stream of Event values — text deltas, thinking deltas, tool calls, and usage. The agent’s consumeStream function normalizes these into the internal Message format, making the agent completely provider-agnostic.
The BeforeProviderRequest extension hook receives this struct as JSON and can modify any field before it is sent to the provider — useful for overriding temperature, trimming the tool list, or adjusting MaxTokens per request.
Info() is called once at startup. The service uses ContextWindow to trigger compaction when the conversation grows too large. HasImages controls whether the TUI offers image attachment UI.
All five adapters implement ModelLister. When --list-models is passed, the CLI casts the active provider to ModelLister and prints the result. Each adapter queries the appropriate API:
| Provider | Query mechanism |
|---|---|
ollama | GET /api/tags |
llamacpp | GET /v1/models |
openai | GET /v1/models |
anthropic | GET /v1/models |
google | Gemini model list API |
| Provider | Backend |
|---|---|
ollama | Local Ollama server (HTTP) |
llamacpp | llama.cpp server (HTTP, OpenAI-compatible) |
openai | OpenAI API or any OpenAI-compatible endpoint |
anthropic | Anthropic Messages API |
google | Google Gemini API |
Each adapter lives in internal/llm/ and translates the provider’s wire format into the uniform Stream abstraction.
| Provider | Tools | Images | Thinking | Context Window |
|---|---|---|---|---|
ollama | ✓ | ✓ | model-dependent | 4096 (default) |
llamacpp | ✓ | ✗ | ✗ | from server n_ctx |
openai | ✓ | ✓ | reasoning models | model-dependent |
anthropic | ✓ | ✓ | ✓ extended | model-dependent |
google | ✓ | ✓ | ✗ | 1,000,000+ |
The Ollama adapter uses the /api/chat endpoint with streaming enabled. Context window defaults to 4096 when not reported by the server. Thinking is supported on models that emit <think> tokens (e.g. qwq, deepseek-r1) — sharur surfaces these as EventThinkingDelta events by detecting the tag boundaries in the stream.
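A simplified, non-streaming sketch of that tag-boundary detection (the real adapter must handle tags split across deltas; this version works on the accumulated string):

```go
package main

import (
	"fmt"
	"strings"
)

// splitThinking separates <think> ... </think> spans in a model's raw
// output into thinking text and response text, mirroring how the
// adapter routes content to EventThinkingDelta vs EventTextDelta.
func splitThinking(raw string) (thinking, text string) {
	for {
		start := strings.Index(raw, "<think>")
		if start < 0 {
			text += raw
			return
		}
		text += raw[:start]
		raw = raw[start+len("<think>"):]
		end := strings.Index(raw, "</think>")
		if end < 0 { // unterminated tag: remainder is thinking
			thinking += raw
			return
		}
		thinking += raw[:end]
		raw = raw[end+len("</think>"):]
	}
}

func main() {
	th, tx := splitThinking("<think>plan the answer</think>Hello!")
	fmt.Printf("thinking=%q text=%q\n", th, tx)
}
```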
Uses the OpenAI-compatible /v1/chat/completions endpoint. The context window (n_ctx) is queried from the server at startup. Image attachments are not supported because llama.cpp’s OpenAI endpoint does not accept multipart vision payloads in the standard format.
Uses the standard /v1/chat/completions streaming endpoint. Any server implementing this API — vLLM, LM Studio, Groq, Together AI — can be used by setting openAIBaseURL. Reasoning models (o3, o4-mini) emit reasoning_content deltas that are surfaced as EventThinkingDelta.
Uses the Messages API (/v1/messages) with streaming. Extended thinking is activated when req.Thinking is medium or high:
The API requires temperature: 1.0 when extended thinking is enabled; the adapter sets this automatically and overrides any user-supplied temperature for that request.
Uses the Gemini generateContent API via the google.golang.org/genai client library. Gemini 1.5 Pro and later have context windows of 1M+ tokens; compaction is rarely triggered for typical sessions.
Implement the Provider interface in internal/llm/yourprovider.go and register it in internal/config/factory.go. Implement ModelLister to enable --list-models. The adapter receives a fully-formed CompletionRequest; it is responsible for translating Message.ToolCalls and Message.Images into the target API’s format.
Sessions are persisted as JSONL files in a project-aware directory:
Each .jsonl file contains one JSON object per line:
- kind=header — session ID, parentId, model, timestamps, system prompt, compaction settings, dryRun flag
- kind=message — individual conversation messages with full payloads (role, content, thinking, tool calls, tool call ID)

Sessions form a linked tree via parentId. The session.Manager.BuildTree() method assembles all sessions from the project directory into a []*TreeNode tree. FlattenTree produces a depth-first flat list with structured layout metadata (gutters, connectors, indentation), which the TUI layer uses to render a clean Unicode box-drawing tree diagram.
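A sketch of parsing such a session file, with assumed JSON field names rather than the exact on-disk schema:

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"strings"
)

// record sketches the per-line JSONL shape described above;
// field names here are assumptions for illustration.
type record struct {
	Kind     string `json:"kind"`               // "header" or "message"
	ID       string `json:"id,omitempty"`       // header: session ID
	ParentID string `json:"parentId,omitempty"` // header: tree link
	Role     string `json:"role,omitempty"`     // message: user/assistant/tool
	Content  string `json:"content,omitempty"`
}

// readSession parses one JSON object per line, as a session loader would.
func readSession(data string) ([]record, error) {
	var recs []record
	sc := bufio.NewScanner(strings.NewReader(data))
	for sc.Scan() {
		var r record
		if err := json.Unmarshal(sc.Bytes(), &r); err != nil {
			return nil, err
		}
		recs = append(recs, r)
	}
	return recs, sc.Err()
}

func main() {
	lines := "{\"kind\":\"header\",\"id\":\"s1\",\"parentId\":\"s0\"}\n" +
		"{\"kind\":\"message\",\"role\":\"user\",\"content\":\"hello\"}"
	recs, _ := readSession(lines)
	fmt.Println(len(recs), recs[0].Kind, recs[1].Role) // 2 header user
}
```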
```mermaid
flowchart TD
A["Session A
(root)"] --> B["Session B
(/branch from A)"]
A --> C["Session C
(/fork of A)"]
B --> D["Session D
(/branch from B at msg 5)"]
B --> E["Session E
(/rebase of B)"]
B --> F["Session F
(/merge into B)"]
style C stroke-dasharray: 5 5
```

/fork creates an independent copy (dashed border above) with no parentId link — it does not appear as a child in the tree visualization.
```mermaid
flowchart TD
Q{"What do you need?"}
Q -->|"Explore an alternate
path from this point"| Branch["/branch [idx]
Child session, same history up to idx"]
Q -->|"Independent copy
no tree relationship"| Fork["/fork
Detached snapshot"]
Q -->|"Clean up the conversation
keep only specific messages"| Rebase["/rebase
Interactively select messages
for a new session"]
Q -->|"Combine two sessions
into one context"| Merge["/merge <id>
LLM-synthesized merge turn
appended to current session"]
```

| Command | Creates parent link | Copies history | Interactive |
|---|---|---|---|
/branch [idx] | ✓ | up to idx | ✗ |
/fork | ✗ | full | ✗ |
/rebase | ✓ | selected messages | ✓ |
/merge <id> | ✗ | appends other session | LLM turn |
The /tree modal (keyboard shortcuts B, F, and R on a selected session) exposes all of these without leaving the TUI.
To stay within LLM context windows, sharur implements an auto-compaction strategy:
When tokens > ContextWindow - reserveTokens, compaction fires: older messages are pruned and replaced with an LLM-generated summary message (marked with `<!-- sharur-summary -->`) of the pruned messages. Compactions are persisted as TypeCompaction records in the JSONL file, visible in /stats and preserved across restarts.

| Field | Default | Description |
|---|---|---|
enabled | true | Whether auto-compaction fires when the token budget is exceeded |
reserveTokens | 2048 | Tokens to keep free at the top of the context window; compaction triggers when used > window - reserveTokens |
keepRecentTokens | 8192 | Minimum recent-turn tokens to always retain after compaction, ensuring the current conversation thread survives |
Trigger compaction manually at any time with /compact in the TUI or by calling the Compact RPC directly.
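The trigger and retention arithmetic can be sketched directly from the defaults in the table above; this is an illustrative stand-in, not sharur's actual compaction code:

```go
package main

import "fmt"

// shouldCompact sketches the documented trigger: compaction fires when
// used tokens exceed ContextWindow - reserveTokens.
func shouldCompact(usedTokens, contextWindow, reserveTokens int) bool {
	return usedTokens > contextWindow-reserveTokens
}

// keepRecent returns the suffix of per-message token counts whose sum
// fits within keepRecentTokens; these most-recent messages survive
// compaction so the current conversation thread is preserved.
func keepRecent(msgTokens []int, keepRecentTokens int) []int {
	total, i := 0, len(msgTokens)
	for i > 0 && total+msgTokens[i-1] <= keepRecentTokens {
		total += msgTokens[i-1]
		i--
	}
	return msgTokens[i:]
}

func main() {
	// With the defaults (reserveTokens=2048) and Ollama's 4096 window,
	// compaction fires once usage passes 2048 tokens.
	fmt.Println(shouldCompact(3000, 4096, 2048))        // true
	fmt.Println(keepRecent([]int{900, 700, 500}, 1000)) // [500]
}
```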
Sessions can be exported to and imported from JSONL files:
Exported JSONL files are self-contained: they include the session header and all messages. Imported sessions are assigned a new UUID and added to the current project’s session directory.
The TUI is built with Bubble Tea (v2) and organized into focused files:
| File | Responsibility |
|---|---|
interactive.go | Run() entry point, gRPC client wiring |
model.go | model struct definition, newModel() |
update.go | Update() — key handling, slash commands, picker logic, promptGRPC() |
events.go | handleAgentEvent() — maps *pb.AgentEvent payloads to TUI history updates |
view.go | View() — renders chat history, status bar, input |
modal.go | Stats, Config, and Session Tree modal overlays |
slash.go | Slash command parsing and handlers (all via gRPC client) |
picker.go | Fuzzy picker component (sessions, skills, files, prompts) |
keys.go | Keybinding helpers (Matches, K.Ctrl(...)) |
types.go | historyEntry, contentItem, toolCallEntry — render data model |
utils.go | Helper functions (Capitalize) |
Prompt submission uses promptGRPC(), which opens a client.Prompt() server-streaming RPC and drains *pb.AgentEvent messages into m.eventCh in a goroutine. The listenForEvent Bubble Tea command feeds that channel back into the update loop one event at a time.
The TUI maintains a per-session prompt history in m.promptHistory, synced from the service via GetMessages at startup and after session switches. Users navigate previous prompts using Up/Down arrow keys while the editor is focused; the current draft is preserved as m.draftInput.
The TUI stores conversation history as []historyEntry. Each entry has an ordered []contentItem slice that preserves the exact stream order:
This mirrors the content[] array model, ensuring correct temporal ordering of thinking, text, and tool calls.
The session tree modal supports Resume (Enter) and Branch (B).

sharur uses a combination of Mage and GitHub Actions for CI/CD.
The project version is maintained in a VERSION file in the repository root. During build, Magefile.go reads this file and injects it into the binary using linker flags (-ldflags "-X main.version=...").
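A minimal sketch of the receiving side of this injection (the exact variable name in sharur's main package is an assumption):

```go
package main

import "fmt"

// version is overwritten at link time. The Magefile effectively runs:
//
//	go build -ldflags "-X main.version=$(cat VERSION)" ./...
//
// When built without the flag (e.g. plain `go run`), the default applies.
var version = "dev"

func main() {
	fmt.Println("shr version", version)
}
```

Because -X can only set package-level string variables, the version must be declared as an uninitialized-by-constant `var`, not a `const`.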
| Target | Description |
|---|---|
Build | Compile shr for the current platform with version injection |
Test | Run all unit tests with coverage |
Vet | Static analysis with go vet |
Lint | Run golangci-lint |
Vuln | Vulnerability scan with govulncheck |
All | Run generate, build, test, vet, lint, and vuln in sequence |
Release | Cross-compile for Linux, macOS, and Windows (AMD64/ARM64), package into dist/ |
Generate | Run buf to regenerate protobuf stubs |
Docs | Generate API reference (gomarkdoc) and build the Hugo site |
DocsServe | Run Hugo dev server at localhost:1313 with live reload |
PkgSite | Run pkgsite for local full API browsing including internals |
The CI workflow (ci.yml) is triggered on every push to main and all pull requests. It runs mage all within a Nix environment on both ubuntu-latest and macos-latest, then uploads per-platform binaries as build artifacts. Coverage is collected and summarised via go tool cover.
The release workflow (release.yml) is triggered by pushing a version tag (e.g., v1.2.3). It runs mage release to build cross-platform assets and uses softprops/action-gh-release to publish them to a new GitHub Release.
The docs workflow (docs.yml) is triggered on push to main and on published releases. It runs mage docs (gomarkdoc + Hugo build) and deploys docs/public/ to the gh-pages branch via peaceiris/actions-gh-pages.
The github.com/goppydae/sharur/sdk package lets you embed a sharur agent in any Go program.
See the sub-pages for a quickstart, custom tool implementations, the EventBus API, and in-process extensions.
Import github.com/goppydae/sharur/sdk to embed an agent in any Go program.
| Call | Description |
|---|---|
sdk.NewAgent(cfg) | Create and initialize an agent |
ag.Subscribe(fn) | Register an event handler; called for every emitted event |
ag.Prompt(ctx, text) | Send a user message and start the agent loop |
ag.Idle() | Returns a channel that closes when the agent reaches Idle state |
ag.Steer(ctx, text) | Inject a steering message into the running turn |
ag.FollowUp(ctx, text) | Queue a message to process after the current turn |
ag.Abort(ctx) | Cancel the current running turn |
ag.SetExtensions(exts) | Replace the extension list (takes effect on next prompt) |
Subscribe to events by checking e.Type:
| Event type | Payload field | Description |
|---|---|---|
EventAgentStart | — | Agent loop started |
EventAgentEnd | — | Agent loop completed |
EventTurnStart | — | LLM turn started |
EventTurnEnd | — | LLM turn completed |
EventTextDelta | e.Content | Incremental response text |
EventThinkingDelta | e.Content | Incremental thinking text |
EventToolCall | e.ToolCall | Tool invocation started |
EventToolDelta | e.Content | Streaming tool output |
EventToolOutput | e.ToolOutput | Final tool result |
Pass sdk.DefaultTools() in sdk.Config.Tools to get the full set of built-in tools:
| Tool | Description |
|---|---|
read | Read file contents with offset/limit support |
write | Create or overwrite files |
edit | Search-and-replace edits within files |
bash | Execute shell commands |
grep | Search file contents via regex |
ls | List directory contents |
find | Locate files using glob patterns |
`bash`, `write`, and `edit` are destructive. In `--dry-run` mode they preview what they would do without executing.
Implement sdk.Tool to create a custom tool:
ToolUpdate is a callback for streaming partial output while the tool runs:
Register alongside the built-in tools:
Pass only the tools you want rather than the full default set:
Or build the list manually to include only read-only tools for a sandboxed agent.
The agent communicates state transitions via an event bus. Every meaningful action emits an sdk.Event to all registered subscribers.
Multiple subscribers are allowed. Each runs in its own goroutine. The EventBus is non-blocking — Publish enqueues to a 4096-item buffered channel per subscriber and returns immediately, so slow subscribers drop events rather than stalling the agent loop.
| Type constant | Payload | Fired when |
|---|---|---|
EventAgentStart | — | Prompt() called, agent loop begins |
EventAgentEnd | — | Agent loop completes (all turns done) |
EventTurnStart | — | An LLM request turn begins |
EventTurnEnd | — | A turn’s tool calls finish |
EventMessageStart | — | LLM starts streaming a response |
EventMessageEnd | — | LLM response stream complete |
EventTextDelta | e.Content string | Incremental response text chunk |
EventThinkingDelta | e.Content string | Incremental extended-thinking chunk |
EventToolCall | e.ToolCall | Tool invocation requested by LLM |
EventToolDelta | e.Content string | Streaming partial output from a running tool |
EventToolOutput | e.ToolOutput | Final tool result (success or error) |
The agent transitions through explicit states visible via EventAgentStart/EventAgentEnd and the ag.Idle() channel:
ag.Idle() returns a channel that closes when the agent returns to Idle. Use it to block until a prompt completes:
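The close-on-idle semantics can be sketched with a plain channel; `idleWaiter` below is an illustrative stand-in, not the sdk type:

```go
package main

import (
	"fmt"
	"time"
)

// idleWaiter sketches the ag.Idle() contract: it returns a channel that
// is closed when the agent reaches Idle, so every waiter unblocks.
type idleWaiter struct{ done chan struct{} }

func newIdleWaiter() *idleWaiter {
	return &idleWaiter{done: make(chan struct{})}
}

// Idle returns the channel callers block on.
func (w *idleWaiter) Idle() <-chan struct{} { return w.done }

// finish is called when the agent loop completes; closing the channel
// releases all current and future receivers.
func (w *idleWaiter) finish() { close(w.done) }

func main() {
	w := newIdleWaiter()
	go func() {
		time.Sleep(10 * time.Millisecond) // stand-in for the agent turn
		w.finish()
	}()
	<-w.Idle() // blocks until the prompt completes
	fmt.Println("agent is idle")
}
```

Closing (rather than sending on) the channel is what lets any number of goroutines wait on the same Idle() call.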
If your extension is written in Go and you control the build, you can implement sdk.Extension (an alias of agent.Extension) directly — no gRPC, no subprocess, no socket. This is the lowest-overhead extension path.
sdk.NoopExtension provides no-op defaults for every method. Embed it and override only what you need.
All types are re-exported from sdk so callers only need to import github.com/goppydae/sharur/sdk.
ModifyInput — runs before the user text is added to the transcript. Return an InputResult with:
- sdk.InputContinue — pass through unchanged
- sdk.InputTransform — replace with result.Text
- sdk.InputHandled — consume entirely; no agent turn is started and nothing is appended to the transcript

ModifyContext — receives and returns the message slice that will be sent to the LLM. Changes do not affect the stored session transcript — they are ephemeral per-turn.
BeforeToolCall — return (result, true) to intercept and block the tool; return (nil, false) to allow normal execution.
BeforeCompact — return nil to let the default LLM summarization run, or a *CompactionResult to supply your own summary and skip the LLM call.
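A stand-in sketch of how the agent might act on the three ModifyInput results; the constant names come from the text above, but the concrete values and dispatch function are assumptions:

```go
package main

import (
	"fmt"
	"strings"
)

// InputAction mirrors the three documented results.
type InputAction int

const (
	InputContinue InputAction = iota
	InputTransform
	InputHandled
)

type InputResult struct {
	Action InputAction
	Text   string
}

// applyModifyInput sketches how the agent would act on an extension's
// ModifyInput result before starting a turn.
func applyModifyInput(userText string, hook func(string) InputResult) (text string, startTurn bool) {
	res := hook(userText)
	switch res.Action {
	case InputTransform:
		return res.Text, true // replace with result.Text
	case InputHandled:
		return "", false // consumed: no agent turn, nothing appended
	default:
		return userText, true // pass through unchanged
	}
}

func main() {
	// A toy hook: expand a "!shout" prefix, swallow "/noop".
	hook := func(s string) InputResult {
		switch {
		case strings.HasPrefix(s, "!shout "):
			return InputResult{InputTransform, strings.ToUpper(strings.TrimPrefix(s, "!shout "))}
		case s == "/noop":
			return InputResult{Action: InputHandled}
		}
		return InputResult{Action: InputContinue}
	}
	t, run := applyModifyInput("!shout hi", hook)
	fmt.Println(t, run) // HI true
}
```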