This section describes the high-level architecture of sharur: how its components are organized, how data flows through the system, and how the key abstractions relate to each other.
The agent is driven by an event-bus (internal/events). Every meaningful state transition emits an agent.Event to all subscribers.
EventBus Performance
The EventBus is async and non-blocking. Publish() enqueues to a 4096-item buffered channel per subscriber and returns immediately; it never blocks the agent loop. Each subscriber runs in its own goroutine. Slow subscribers drop events to protect the agent loop from backpressure.
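This drop-on-overflow publish pattern can be sketched in a few lines. The type and method names below are illustrative, not sharur's actual API:

```go
package main

import "fmt"

// Event is a stand-in for sharur's agent.Event (illustrative only).
type Event struct{ Kind string }

// Bus fans events out to per-subscriber buffered channels.
type Bus struct{ subs []chan Event }

// Subscribe registers a buffered channel; the real implementation
// also starts a goroutine per subscriber to drain it.
func (b *Bus) Subscribe(buffer int) <-chan Event {
	ch := make(chan Event, buffer)
	b.subs = append(b.subs, ch)
	return ch
}

// Publish never blocks: if a subscriber's buffer is full,
// the event is dropped for that subscriber only.
func (b *Bus) Publish(ev Event) {
	for _, ch := range b.subs {
		select {
		case ch <- ev:
		default: // slow subscriber: drop rather than stall the agent loop
		}
	}
}

func main() {
	b := &Bus{}
	ch := b.Subscribe(1)
	b.Publish(Event{Kind: "agent.start"})
	b.Publish(Event{Kind: "agent.end"}) // dropped: buffer of 1 is full
	fmt.Println(len(ch))
}
```

The `select` with an empty `default` branch is the idiomatic Go way to express "send if possible, otherwise move on".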
Event Flow
```mermaid
sequenceDiagram
    participant User
    participant Agent
    participant LLM
    participant Tools
    User->>Agent: Prompt(text)
    Agent->>Agent: EventAgentStart
    loop each LLM turn
        Agent->>Agent: EventTurnStart · EventMessageStart
        Agent->>LLM: provider.Stream()
        LLM-->>Agent: EventTextDelta (×n)
        LLM-->>Agent: EventThinkingDelta (×n, if thinking enabled)
        LLM-->>Agent: EventToolCall (×n, if tools requested)
        Agent->>Agent: EventMessageEnd
        loop each tool call
            Agent->>Tools: execTool()
            Tools-->>Agent: EventToolDelta (streaming)
            Agent->>Agent: EventToolOutput
        end
        Agent->>Agent: EventTurnEnd
    end
    Agent->>Agent: EventAgentEnd
```
State Machine
The agent transitions through explicit states to prevent concurrent modification of a running session.
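The state names themselves are internal to sharur; the following is a generic sketch of the mutex-guarded transition pattern such a state machine typically uses (state names here are illustrative, not sharur's):

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type State int

const (
	StateIdle State = iota // illustrative names, not sharur's actual states
	StateRunning
	StateAborting
)

type Agent struct {
	mu    sync.Mutex
	state State
}

// transition applies a guarded state change, rejecting illegal ones
// so two goroutines cannot drive the agent concurrently.
func (a *Agent) transition(from, to State) error {
	a.mu.Lock()
	defer a.mu.Unlock()
	if a.state != from {
		return errors.New("illegal transition")
	}
	a.state = to
	return nil
}

func main() {
	a := &Agent{}
	fmt.Println(a.transition(StateIdle, StateRunning)) // succeeds
	fmt.Println(a.transition(StateIdle, StateRunning)) // rejected: already running
}
```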
A ToolRegistry holds all registered tools. During a turn, when the LLM emits a tool call, execTool looks up the tool by name, executes it, and streams partial output via EventToolDelta before emitting the final EventToolOutput.
Dry-Run Mode: When DryRun is enabled, any tool that is not marked as read-only will bypass execution and return a descriptive preview of what it would have done.
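A minimal sketch of the dry-run gate (the Tool shape and execTool signature are simplified stand-ins for sharur's internals):

```go
package main

import "fmt"

// Tool is a reduced stand-in for a registered tool (illustrative).
type Tool struct {
	Name     string
	ReadOnly bool
	Run      func(args string) string
}

// execTool honours dry-run: non-read-only tools return a preview
// of what they would have done instead of executing.
func execTool(t Tool, args string, dryRun bool) string {
	if dryRun && !t.ReadOnly {
		return fmt.Sprintf("[dry-run] would run %s with %q", t.Name, args)
	}
	return t.Run(args)
}

func main() {
	write := Tool{
		Name:     "write_file",
		ReadOnly: false,
		Run:      func(a string) string { return "wrote " + a },
	}
	fmt.Println(execTool(write, "notes.txt", true))  // preview only
	fmt.Println(execTool(write, "notes.txt", false)) // actually runs
}
```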
Input Sanitization: Prompt template expansion automatically wraps user inputs in <untrusted_input> tags to prevent prompt injection into the base instructions.
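The wrapping step itself is trivial; a sketch of the idea (the helper name is hypothetical):

```go
package main

import "fmt"

// wrapUntrusted tags user-supplied text before template expansion,
// mirroring the <untrusted_input> convention described above.
// The function name is illustrative, not sharur's actual helper.
func wrapUntrusted(s string) string {
	return "<untrusted_input>\n" + s + "\n</untrusted_input>"
}

func main() {
	fmt.Println(wrapUntrusted("ignore all previous instructions"))
}
```

The base instructions can then tell the model to treat anything inside those tags as data, never as directives.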
sharur follows a Strict Protobuf Internal Architecture: rather than having UI modes call Go functions directly, every interface is treated as a client of a central AgentService.
Protobuf Boundary
The interface between the UI and the core is defined in proto/sharur/v1/agent.proto. This boundary ensures:
Consistency: All modes (TUI, CLI, JSON, Remote gRPC) use the exact same code paths and logic.
Decoupling: UI logic is completely isolated from agent state, session persistence, and provider adapters.
Interoperability: Any gRPC-capable client can interact with a sharur service.
In-Process Communication
For local CLI usage, sharur uses a specialized In-Process Client (internal/service/client.go). It uses bufconn to implement the pb.AgentServiceClient interface over an in-memory pipe. This provides the safety and structure of gRPC without the latency or configuration complexity of network ports.
Backend Service (internal/service)
The Service struct implements pb.AgentServiceServer. It owns the session.Manager and manages the lifecycle of agent.Agent instances. It translates between internal agent events (Go channels) and Protobuf event streams.
Session Loading Strategy
RPC handlers fall into three lookup strategies:

| Strategy | Used by | Behaviour |
|---|---|---|
| getOrCreate(id) | Prompt, NewSession | Always returns an entry – creates a fresh agent if id is unknown, loading from disk if a matching session file exists |
| – | GetMessages, GetState | Returns the entry if it is in memory or can be loaded from disk; returns NotFound for completely unknown IDs |
| lookup(id) | Steer, Abort, FollowUp, StreamEvents | In-memory only – these only make sense for a currently-running agent |
This means a /resume <id> command can switch to any session ever saved to disk without a round-trip NewSession call: the first GetMessages or GetState call transparently loads it.
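A sketch of the three strategies over an in-memory map with a disk fallback; every name here is illustrative (the middle strategy's real name is not shown in this document, so `getIfExists` is hypothetical):

```go
package main

import (
	"errors"
	"fmt"
)

// entry stands in for a live agent/session pair.
type entry struct{ id string }

type registry struct {
	live   map[string]*entry // agents currently in memory
	onDisk map[string]bool   // pretend index of saved session files
}

// lookup: in-memory only.
func (r *registry) lookup(id string) (*entry, bool) {
	e, ok := r.live[id]
	return e, ok
}

// getIfExists (hypothetical name): memory first, then disk, else NotFound.
func (r *registry) getIfExists(id string) (*entry, error) {
	if e, ok := r.live[id]; ok {
		return e, nil
	}
	if r.onDisk[id] {
		e := &entry{id: id} // real code would deserialize the session file
		r.live[id] = e
		return e, nil
	}
	return nil, errors.New("NotFound")
}

// getOrCreate: always succeeds, creating a fresh entry when needed.
func (r *registry) getOrCreate(id string) *entry {
	if e, err := r.getIfExists(id); err == nil {
		return e
	}
	e := &entry{id: id}
	r.live[id] = e
	return e
}

func main() {
	r := &registry{live: map[string]*entry{}, onDisk: map[string]bool{"old": true}}
	_, ok := r.lookup("old")
	fmt.Println(ok) // false: saved to disk but not resident
	if _, err := r.getIfExists("old"); err == nil {
		fmt.Println("loaded from disk")
	}
	_, ok = r.lookup("old")
	fmt.Println(ok) // true: now resident
}
```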
All providers return a uniform Stream of Event values: text deltas, thinking deltas, tool calls, and usage. The agent’s consumeStream function normalizes these into the internal Message format, making the agent completely provider-agnostic.
The BeforeProviderRequest extension hook receives the CompletionRequest as JSON and can modify any field before it is sent to the provider, which is useful for overriding temperature, trimming the tool list, or adjusting MaxTokens per request.
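The JSON round-trip that makes this possible can be sketched as follows; the CompletionRequest fields shown are a pared-down assumption, not the full struct:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CompletionRequest is a reduced stand-in; field names are illustrative.
type CompletionRequest struct {
	Temperature float64 `json:"temperature"`
	MaxTokens   int     `json:"maxTokens"`
}

// applyHook marshals the request, hands the raw JSON to the hook,
// and unmarshals the (possibly mutated) result back.
func applyHook(req CompletionRequest, hook func([]byte) []byte) (CompletionRequest, error) {
	raw, err := json.Marshal(req)
	if err != nil {
		return req, err
	}
	var out CompletionRequest
	if err := json.Unmarshal(hook(raw), &out); err != nil {
		return req, err
	}
	return out, nil
}

func main() {
	// A hook that overrides temperature per request.
	hook := func(raw []byte) []byte {
		var m map[string]any
		json.Unmarshal(raw, &m)
		m["temperature"] = 0.2
		out, _ := json.Marshal(m)
		return out
	}
	req, _ := applyHook(CompletionRequest{Temperature: 0.7, MaxTokens: 4096}, hook)
	fmt.Println(req.Temperature)
}
```

Because the hook sees plain JSON rather than a Go struct, extensions can be written in any language that can read stdin and write stdout.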
Info() is called once at startup. The service uses ContextWindow to trigger compaction when the conversation grows too large. HasImages controls whether the TUI offers image attachment UI.
All five adapters implement ModelLister. When --list-models is passed, the CLI casts the active provider to ModelLister and prints the result. Each adapter queries the appropriate API:
| Provider | Query mechanism |
|---|---|
| ollama | GET /api/tags |
| llamacpp | GET /v1/models |
| openai | GET /v1/models |
| anthropic | GET /v1/models |
| google | Gemini model list API |
Supported Providers
| Provider | Backend |
|---|---|
| ollama | Local Ollama server (HTTP) |
| llamacpp | llama.cpp server (HTTP, OpenAI-compatible) |
| openai | OpenAI API or any OpenAI-compatible endpoint |
| anthropic | Anthropic Messages API |
| google | Google Gemini API |
Each adapter lives in internal/llm/ and translates the provider’s wire format into the uniform Stream abstraction.
Feature Matrix
| Provider | Tools | Images | Thinking | Context Window |
|---|---|---|---|---|
| ollama | ✓ | ✓ | model-dependent | 4096 (default) |
| llamacpp | ✓ | ✗ | ✗ | from server n_ctx |
| openai | ✓ | ✓ | reasoning models | model-dependent |
| anthropic | ✓ | ✓ | ✓ extended | model-dependent |
| google | ✓ | ✓ | ✓ | 1,000,000+ |
Per-Provider Notes
Ollama
The Ollama adapter uses the /api/chat endpoint with streaming enabled. Context window defaults to 4096 when not reported by the server. Thinking is supported on models that emit <think> tokens (e.g. qwq, deepseek-r1); sharur surfaces these as EventThinkingDelta events by detecting the tag boundaries in the stream.
llama.cpp
Uses the OpenAI-compatible /v1/chat/completions endpoint. The context window (n_ctx) is queried from the server at startup. Image attachments are not supported because llama.cpp’s OpenAI endpoint does not accept multipart vision payloads in the standard format.
OpenAI
Uses the standard /v1/chat/completions streaming endpoint. Any server implementing this API (vLLM, LM Studio, Groq, Together AI) can be used by setting openAIBaseURL. Reasoning models (o3, o4-mini) emit reasoning_content deltas that are surfaced as EventThinkingDelta.
Anthropic
Uses the Messages API (/v1/messages) with streaming. Extended thinking is activated when req.Thinking is medium or high:
medium → 10,000-token thinking budget
high → 20,000-token thinking budget
The API requires temperature: 1.0 when extended thinking is enabled; the adapter sets this automatically and overrides any user-supplied temperature for that request.
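The budget mapping and forced temperature can be captured in one small function; the shape below is a sketch, not the adapter's actual code:

```go
package main

import "fmt"

// thinkingBudget maps the documented levels to token budgets.
// When thinking is enabled, temperature is forced to 1.0 as the
// Anthropic API requires, overriding any user-supplied value.
func thinkingBudget(level string) (budget int, temperature float64, enabled bool) {
	switch level {
	case "medium":
		return 10000, 1.0, true
	case "high":
		return 20000, 1.0, true
	default:
		return 0, 0, false // caller keeps the user's temperature
	}
}

func main() {
	b, t, on := thinkingBudget("medium")
	fmt.Println(b, t, on)
}
```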
Google
Uses the Gemini generateContent API via the google.golang.org/genai client library. Gemini 1.5 Pro and later have context windows of 1M+ tokens; compaction is rarely triggered for typical sessions.
Adding a Provider
Implement the Provider interface in internal/llm/yourprovider.go and register it in internal/config/factory.go. Implement ModelLister to enable --list-models. The adapter receives a fully-formed CompletionRequest; it is responsible for translating Message.ToolCalls and Message.Images into the target API’s format.
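A skeleton of what an adapter looks like; the Provider, Event, and CompletionRequest shapes below are simplified assumptions, since the real interface in internal/llm carries more detail:

```go
package main

import "fmt"

// Simplified stand-ins for the types in internal/llm.
type Event struct{ Text string }

type CompletionRequest struct{ Prompt string }

// Provider is the contract every adapter satisfies: turn a request
// into a stream of normalized events.
type Provider interface {
	Stream(req CompletionRequest) (<-chan Event, error)
}

// myProvider is a toy adapter that echoes the prompt back.
type myProvider struct{}

func (myProvider) Stream(req CompletionRequest) (<-chan Event, error) {
	ch := make(chan Event, 1)
	ch <- Event{Text: "hello from " + req.Prompt}
	close(ch)
	return ch, nil
}

func main() {
	var p Provider = myProvider{}
	ch, _ := p.Stream(CompletionRequest{Prompt: "yourprovider"})
	for ev := range ch {
		fmt.Println(ev.Text)
	}
}
```

A real adapter would open an HTTP streaming connection in Stream and translate each wire-format chunk into the appropriate Event before sending it on the channel.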
Each .jsonl file contains one JSON object per line:
Line 0 (header): kind=header – session ID, parentId, model, timestamps, system prompt, compaction settings, dryRun flag
Subsequent lines: kind=message – individual conversation messages with full payloads (role, content, thinking, tool calls, tool call ID)
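A hypothetical session file illustrating this layout; the field names follow the description above, but the exact schema and values are illustrative:

```jsonl
{"kind":"header","id":"3f2a9c1d","parentId":"","model":"qwen2.5","createdAt":"2025-01-01T12:00:00Z","dryRun":false}
{"kind":"message","role":"user","content":"Summarize README.md"}
{"kind":"message","role":"assistant","content":"Done.","thinking":"","toolCalls":[]}
```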
Session Tree
Sessions form a linked tree via parentId. The session.Manager.BuildTree() method assembles all sessions from the project directory into a []*TreeNode tree. FlattenTree produces a depth-first flat list with structured layout metadata (gutters, connectors, indentation), which the TUI layer uses to render a clean Unicode box-drawing tree diagram.
```mermaid
flowchart TD
    A["Session A (root)"] --> B["Session B (/branch from A)"]
    A --> C["Session C (/fork of A)"]
    B --> D["Session D (/branch from B at msg 5)"]
    B --> E["Session E (/rebase of B)"]
    B --> F["Session F (/merge into B)"]
    style C stroke-dasharray: 5 5
```
/fork creates an independent copy (dashed border above) with no parentId link – it does not appear as a child in the tree visualization.
Branching, Rebasing & Merging
```mermaid
flowchart TD
    Q{"What do you need?"}
    Q -->|"Explore an alternate path from this point"| Branch["/branch [idx]: child session, same history up to idx"]
    Q -->|"Independent copy, no tree relationship"| Fork["/fork: detached snapshot"]
    Q -->|"Clean up the conversation, keep only specific messages"| Rebase["/rebase: interactively select messages for a new session"]
    Q -->|"Combine two sessions into one context"| Merge["/merge <id>: LLM-synthesized merge turn appended to current session"]
```
| Command | Creates parent link | Copies history | Interactive |
|---|---|---|---|
| /branch [idx] | ✓ | up to idx | ✗ |
| /fork | ✗ | full | ✗ |
| /rebase | ✓ | selected messages | ✓ |
| /merge <id> | ✗ | appends other session | LLM turn |
The /tree modal (keyboard shortcuts B, F, and R on a selected session) exposes all of these without leaving the TUI.
Compaction & Context Management
To stay within LLM context windows, sharur implements an auto-compaction strategy:
Trigger: When tokens > ContextWindow - reserveTokens, compaction fires.
Summarization: The agent uses the LLM to generate a structured summary (<!-- sharur-summary -->) of the pruned messages.
File Tracking: The summary carries forward lists of files read and modified, so the assistant retains awareness of what it has already seen.
Split Turn Handling: If compaction cuts mid-turn, a “Turn Prefix Summary” is generated to preserve context for the remaining tool calls.
Session Tree Integration: Compaction events are stored as TypeCompaction records in the JSONL file, visible in /stats and preserved across restarts.
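The trigger arithmetic above reduces to a one-line predicate (the function name is illustrative):

```go
package main

import "fmt"

// shouldCompact implements the documented trigger: compaction fires
// when used tokens exceed the context window minus the reserve.
func shouldCompact(usedTokens, contextWindow, reserveTokens int) bool {
	return usedTokens > contextWindow-reserveTokens
}

func main() {
	// With a 32,768-token window and the default 2,048-token reserve,
	// compaction fires once usage passes 30,720 tokens.
	fmt.Println(shouldCompact(31000, 32768, 2048)) // true
	fmt.Println(shouldCompact(20000, 32768, 2048)) // false
}
```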
Compaction Configuration
```json
// ~/.sharur/config.json or .sharur/config.json
{
  "compaction": {
    "enabled": true,
    "reserveTokens": 2048,
    "keepRecentTokens": 8192
  }
}
```
| Field | Default | Description |
|---|---|---|
| enabled | true | Whether auto-compaction fires when the token budget is exceeded |
| reserveTokens | 2048 | Tokens to keep free at the top of the context window; compaction triggers when used > window - reserveTokens |
| keepRecentTokens | 8192 | Minimum recent-turn tokens to always retain after compaction, ensuring the current conversation thread survives |
Trigger compaction manually at any time with /compact in the TUI or by calling the Compact RPC directly.
Export & Import
Sessions can be exported to and imported from JSONL files:
```shell
# Export from TUI
/export /path/to/session.jsonl

# Import into TUI (creates a new session from the file)
/import /path/to/session.jsonl

# Export from CLI without entering the TUI
shr --export /path/to/session.html   # HTML snapshot
```
Exported JSONL files are self-contained: they include the session header and all messages. Imported sessions are assigned a new UUID and added to the current project’s session directory.
historyEntry, contentItem, toolCallEntry – render data model
utils.go
Helper functions (Capitalize)
Prompt Submission
Prompt submission uses promptGRPC(), which opens a client.Prompt() server-streaming RPC and drains *pb.AgentEvent messages into m.eventCh in a goroutine. The listenForEvent Bubble Tea command feeds that channel back into the update loop one event at a time.
Prompt History
The TUI maintains a per-session prompt history in m.promptHistory, synced from the service via GetMessages at startup and after session switches. Users navigate previous prompts using Up/Down arrow keys while the editor is focused; the current draft is preserved as m.draftInput.
Render Data Model
The TUI stores conversation history as []historyEntry. Each entry holds an ordered []contentItem slice that preserves the exact stream order. This mirrors the content[] array model, ensuring correct temporal ordering of thinking, text, and tool calls.
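Illustrative shapes for this render model; sharur's actual fields differ, but the ordering principle is the same:

```go
package main

import "fmt"

// contentItem is one chunk of assistant output, in stream order.
type contentItem struct {
	Kind string // "text", "thinking", or "tool_call" (illustrative)
	Text string
}

// historyEntry is one conversation turn as the TUI renders it.
type historyEntry struct {
	Role  string
	Items []contentItem // preserves exact stream order
}

func main() {
	e := historyEntry{
		Role: "assistant",
		Items: []contentItem{
			{Kind: "thinking", Text: "plan the edit"},
			{Kind: "text", Text: "I'll update the file."},
			{Kind: "tool_call", Text: "write_file(notes.txt)"},
		},
	}
	// Rendering walks the slice in order, so thinking always appears
	// before the text it produced, and tool calls where they happened.
	for _, it := range e.Items {
		fmt.Println(it.Kind)
	}
}
```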
Modal System
Stats – Token counts, session metadata, file/path info
Config – Active model, provider, compaction settings
Session Tree – Interactive paginated tree with structured branch visualization; supports Resume (Enter) and Branch (B)
Rebase Picker – Selection interface for history manipulation
Merge Picker – Fuzzy finder for selecting sessions to merge into the current conversation
Build & Release
sharur uses a combination of Mage and GitHub Actions for CI/CD.
Versioning
The project version is maintained in a VERSION file in the repository root. During build, Magefile.go reads this file and injects it into the binary using linker flags (-ldflags "-X main.version=...").
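The essence of that injection step, sketched with the standard library (output name and package path are illustrative, not the Magefile's exact invocation):

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// buildCmd assembles the go build invocation that injects the VERSION
// file contents into main.version via linker flags.
func buildCmd(version string) *exec.Cmd {
	ldflags := fmt.Sprintf("-X main.version=%s", strings.TrimSpace(version))
	return exec.Command("go", "build", "-ldflags", ldflags, "-o", "shr", ".")
}

func main() {
	raw, err := os.ReadFile("VERSION")
	if err != nil {
		raw = []byte("dev") // fall back when run outside the repo
	}
	fmt.Println(buildCmd(string(raw)).Args)
}
```

The binary can then expose the value via a `var version string` in package main and a `--version` flag.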
Mage Targets
| Target | Description |
|---|---|
| Build | Compile shr for the current platform with version injection |
| Test | Run all unit tests with coverage |
| Vet | Static analysis with go vet |
| Lint | Run golangci-lint |
| Vuln | Vulnerability scan with govulncheck |
| All | Run generate, build, test, vet, lint, and vuln in sequence |
| Release | Cross-compile for Linux, macOS, and Windows (AMD64/ARM64), package into dist/ |
| Generate | Run buf to regenerate protobuf stubs |
| Docs | Generate API reference (gomarkdoc) and build the Hugo site |
| DocsServe | Run Hugo dev server at localhost:1313 with live reload |
| PkgSite | Run pkgsite for local full API browsing including internals |
CI/CD Pipelines
Continuous Integration (ci.yml)
Triggered on every push to main and all pull requests. Runs mage all within a Nix environment on both ubuntu-latest and macos-latest, then uploads per-platform binaries as build artifacts. Coverage is collected and summarised via go tool cover.
Automated Release (release.yml)
Triggered by pushing a version tag (e.g., v1.2.3). Runs mage release to build cross-platform assets and uses softprops/action-gh-release to publish them to a new GitHub Release.
Docs Deploy (docs.yml)
Triggered on push to main and on published releases. Runs mage docs (gomarkdoc + Hugo build) and deploys docs/public/ to the gh-pages branch via peaceiris/actions-gh-pages.