User Guide

The user guide covers day-to-day use of sharur from the terminal:

  • CLI — runtime modes, flags, keybindings, slash commands, and configuration
  • Extensibility — skills, prompt templates, and Go/Python/gRPC extensions

Subsections of User Guide

CLI

shr is the sharur CLI binary. It supports three runtime modes and a rich flag surface for model selection, session management, tools, and extensions.

Runtime Modes

| Mode | Flag | Description |
| --- | --- | --- |
| TUI | --mode tui (default) | Interactive Bubble Tea terminal interface with streaming, tool cards, and session management |
| JSON | --mode json | One-shot query with line-delimited JSON event output — useful for shell pipelines |
| gRPC | --mode grpc | Persistent multi-session gRPC service — any gRPC-capable client can connect |

Quick Start

# Launch the interactive TUI
shr

# One-shot answer (JSONL output)
shr --mode json "What is the best way to structure a Go project?"

# Resume the most recent session
shr --continue

See the sub-pages for full keybinding and slash command references, JSON event schema, gRPC proto overview, provider setup, and the full configuration schema.

Subsections of CLI

Configuration

sharur uses layered JSON configuration. Project-level settings override global defaults.

| Path | Scope |
| --- | --- |
| ~/.sharur/config.json | Global defaults — applies to all projects |
| .sharur/config.json | Project-level overrides — applies in this directory |

config.json Schema

{
  "defaultModel": "llama3.2",
  "defaultProvider": "ollama",
  "theme": "dark",
  "thinkingLevel": "medium",
  "ollamaBaseURL": "http://localhost:11434",
  "openAIBaseURL": "https://api.openai.com/v1",
  "openAIApiKey": "",
  "anthropicApiKey": "",
  "anthropicApiVersion": "",
  "googleApiKey": "",
  "llamaCppBaseURL": "http://localhost:8080",
  "compaction": {
    "enabled": true,
    "reserveTokens": 2048,
    "keepRecentTokens": 8192
  }
}

API keys can also be set via environment variables — env vars take priority over config file values.
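The layering behaves like a JSON overlay: the global file is decoded first and the project file is decoded over it, so project-level keys win while untouched global keys survive. A minimal sketch of that semantics (not sharur's actual loader):

```go
package main

import (
	"encoding/json"
	"fmt"
)

// overlay decodes each JSON document into the same map in order, so
// keys from later (project-level) documents override earlier (global)
// ones. json.Unmarshal reuses a non-nil map, keeping existing entries.
func overlay(docs ...[]byte) (map[string]any, error) {
	merged := map[string]any{}
	for _, doc := range docs {
		if err := json.Unmarshal(doc, &merged); err != nil {
			return nil, err
		}
	}
	return merged, nil
}

func main() {
	global := []byte(`{"defaultProvider": "ollama", "theme": "dark"}`)
	project := []byte(`{"defaultProvider": "anthropic"}`)

	cfg, err := overlay(global, project)
	if err != nil {
		panic(err)
	}
	fmt.Println(cfg["defaultProvider"], cfg["theme"]) // project wins, global fills gaps
}
```

Note that this sketch replaces top-level keys wholesale; it does not deep-merge nested objects such as compaction.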


Context Files

sharur auto-discovers AGENTS.md, CLAUDE.md, GEMINI.md, and .context.md in your project root and parent directories and injects them into the system prompt. Outermost files take precedence (parent directory wins over project root).

Disable with --no-context-files.


CLI Flags

Mode

| Flag | Description |
| --- | --- |
| --mode | Mode: tui (default), json, grpc |
| --grpc-addr | gRPC listen address (default :50051; --mode grpc only) |

Model / Provider

| Flag | Description |
| --- | --- |
| --model / -m | Model to use (e.g. llama3, gpt-4o, anthropic/claude-sonnet-4-6) |
| --provider | Provider: ollama, openai, anthropic, llamacpp, google |
| --api-key | API key override |
| --thinking | Thinking level: off, minimal, low, medium, high, xhigh |
| --models | Comma-separated model list for Ctrl+P cycling |

Session

| Flag | Description |
| --- | --- |
| --continue / -c | Resume the most recent session |
| --resume / -r | Select a session to resume (fuzzy search or ID) |
| --session | Use a specific session file path |
| --session-dir | Directory for session storage and lookup |
| --branch | Branch from a session file or partial UUID into a new child session |
| --no-session | Ephemeral mode: don’t save the session |

System Prompt

| Flag | Description |
| --- | --- |
| --system-prompt | Override the system prompt |
| --append-system-prompt | Append text or file to the system prompt (repeatable) |

Tools

| Flag | Description |
| --- | --- |
| --tools | Comma-separated list of tools to enable: read,bash,edit,write,grep,find,ls |
| --no-tools | Disable all built-in tools |
| --dry-run | Safety mode: destructive tools preview actions instead of running |

Extensions / Skills / Prompts

| Flag | Description |
| --- | --- |
| --extension / -e | Load a gRPC extension binary (repeatable) |
| --no-extensions | Disable extension directory auto-discovery (-e paths still load) |
| --skill | Load a skill file or directory (repeatable) |
| --no-skills | Disable skill auto-discovery |
| --prompt-template | Load a prompt template file or directory (repeatable) |
| --no-prompt-templates | Disable prompt template auto-discovery |

Output / Info

| Flag | Description |
| --- | --- |
| --export | Export current session to an HTML file and exit |
| --list-models | List available models from the configured provider (optional fuzzy filter) |
| --version / -v | Show version number |
| --verbose | Force verbose startup output |
| --offline | Disable startup network operations (model checks, etc.) |

Provider Setup

sharur supports five LLM providers. All configuration lives in config.json files or environment variables; environment variables take priority over config file values.


Model Naming

Models can be specified as provider/model shorthand or with separate flags:

# Shorthand: provider inferred from the slash-prefix
shr --model anthropic/claude-sonnet-4-6

# Explicit: provider and model as separate flags
shr --provider anthropic --model claude-sonnet-4-6

Both forms are equivalent. The shorthand is convenient for one-off overrides; the config file form is better for persistent defaults.
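The shorthand resolution amounts to splitting on the first slash; a small sketch, with the bare-name case falling back to the configured default provider:

```go
package main

import (
	"fmt"
	"strings"
)

// splitModelRef resolves the provider/model shorthand: a slash-prefixed
// reference like "anthropic/claude-sonnet-4-6" yields both parts, while
// a bare name like "llama3.2" leaves the provider empty so the
// configured default applies. Illustrative only.
func splitModelRef(ref string) (provider, model string) {
	if before, after, ok := strings.Cut(ref, "/"); ok {
		return before, after
	}
	return "", ref
}

func main() {
	p, m := splitModelRef("anthropic/claude-sonnet-4-6")
	fmt.Println(p, m)
	p, m = splitModelRef("llama3.2")
	fmt.Printf("provider=%q model=%q\n", p, m)
}
```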


Environment Variables

API keys set via environment variable take priority over values in config.json. The env var names use the SHARUR_ prefix:

| Provider | Environment Variable |
| --- | --- |
| Anthropic | SHARUR_ANTHROPIC_API_KEY |
| OpenAI | SHARUR_OPENAI_API_KEY |
| Google | SHARUR_GOOGLE_API_KEY |

Ollama and llama.cpp are local servers and do not use API keys.


Ollama

Ollama runs models locally. It is the default provider.

// ~/.sharur/config.json or .sharur/config.json
{
  "defaultProvider": "ollama",
  "defaultModel": "llama3.2",
  "ollamaBaseURL": "http://localhost:11434"
}
# Pull a model and launch
ollama pull llama3.2
shr

# Use a specific model
shr --model ollama/llama3.2

# Point at a remote Ollama server (set ollamaBaseURL in config first)
shr --model llama3.2 --provider ollama

Notes:

  • Default base URL is http://localhost:11434. Override with ollamaBaseURL.
  • Ollama models support tools and images (vision models).
  • Use shr --list-models to see all locally available models.
  • Thinking is supported on models that emit <think> tokens (e.g. qwq, deepseek-r1).

llama.cpp

llama.cpp exposes an OpenAI-compatible HTTP server.

{
  "defaultProvider": "llamacpp",
  "llamaCppBaseURL": "http://localhost:8080"
}
# Start the llama.cpp server (example)
./llama-server -m model.gguf --port 8080

# Connect with sharur
shr --provider llamacpp --model my-model

Notes:

  • Default base URL is http://localhost:8080. Override with llamaCppBaseURL.
  • The model name passed to shr is forwarded to the server as-is.
  • Image attachments are not supported.
  • The server’s own context window size is used; sharur queries /v1/models to detect it.

OpenAI

{
  "defaultProvider": "openai",
  "defaultModel": "gpt-4o",
  "openAIApiKey": "",
  "openAIBaseURL": "https://api.openai.com/v1"
}
# Via environment variable (recommended)
export SHARUR_OPENAI_API_KEY=sk-...
shr --model openai/gpt-4o

# One-off key override
shr --provider openai --model gpt-4o --api-key sk-...

OpenAI-compatible endpoints:

Any server that implements the OpenAI chat completions API can be used by pointing openAIBaseURL at it:

{
  "defaultProvider": "openai",
  "openAIBaseURL": "http://localhost:11434/v1",
  "openAIApiKey": "unused"
}

This works with vLLM, LM Studio, and others.

Notes:

  • Reasoning models (o3, o4-mini) emit thinking deltas that appear in the TUI and JSON event stream.
  • Supports tools and vision (images) for compatible models.

Anthropic

{
  "defaultProvider": "anthropic",
  "defaultModel": "claude-sonnet-4-6",
  "anthropicApiKey": "",
  "anthropicApiVersion": ""
}
export SHARUR_ANTHROPIC_API_KEY=sk-ant-...
shr --model anthropic/claude-sonnet-4-6

# Extended thinking (claude-3-7-sonnet and later)
shr --model anthropic/claude-3-7-sonnet-20250219 --thinking high

Notes:

  • Extended thinking is supported for models that enable it (e.g. claude-3-7-sonnet). Use --thinking medium or --thinking high.
  • medium thinking uses a 10,000-token budget; high uses 20,000 tokens. Temperature is automatically set to the required value.
  • anthropicApiVersion overrides the anthropic-version request header; leave empty to use the library default.

Google Gemini

{
  "defaultProvider": "google",
  "defaultModel": "gemini-2.0-flash",
  "googleApiKey": ""
}
export SHARUR_GOOGLE_API_KEY=AIza...
shr --model google/gemini-2.0-flash

Notes:

  • Gemini 1.5 Pro and later have a 1M+ token context window.
  • Supports tools and vision (images).
  • Use shr --list-models to see available Gemini models.

Listing Available Models

All five providers implement model listing. Use --list-models to query the active provider:

# List Ollama models
shr --list-models

# List models from a specific provider
shr --provider anthropic --list-models

# Filter results
shr --provider openai --list-models gpt-4

The output is a plain list of model names, suitable for piping:

shr --list-models | fzf | xargs -I{} shr --model {}

Provider Feature Matrix

| Provider | Tools | Images | Thinking | Model Listing |
| --- | --- | --- | --- | --- |
| ollama | ✓ | ✓ | model-dependent | ✓ |
| llamacpp | ✓ | — | — | ✓ |
| openai | ✓ | ✓ | reasoning models | ✓ |
| anthropic | ✓ | ✓ | ✓ extended | ✓ |
| google | ✓ | ✓ | — | ✓ |

TUI

The TUI is a rich, Bubble Tea-powered interface with real-time streaming, tool cards, session management, and a live context usage progress bar in the status footer.


Keybindings

| Key | Action |
| --- | --- |
| Enter | Send message (or Steer the running agent) |
| Shift+Enter | Insert newline |
| Ctrl+Enter | Queue follow-up message (runs after agent finishes) |
| Ctrl+C | Abort the current agent run and clear the input editor |
| Esc | Cancel streaming / Close modal / Abort current turn |
| Ctrl+O | Toggle tool call output expansion |
| Ctrl+P | Open model selection modal (cycling via --models flag) |
| ↑/↓ | Navigate prompt history (if at start/end of editor) / Scroll viewport |
| F1 | Show help modal |

Slash Commands

| Command | Description |
| --- | --- |
| /new | Start a fresh session |
| /resume <id> | Resume a session by ID or partial UUID (fuzzy search enabled) |
| /branch [idx] | Create a new child session branching from a specific message index (defaults to last) |
| /fork | Duplicate current session into a new independent session (no parent link) |
| /rebase | Interactive rebase: select specific messages to keep in a new session |
| /merge <id> | Merge another session’s history into the current one with a synthesis turn |
| /tree [-g\|-p] | Open session tree modal. Flags: --global (-g) or --project (-p) |
| /import <path> | Import a session from a JSONL file |
| /export <path> | Export the current session to a JSONL file |
| /model <p/m> | Switch model mid-conversation (e.g. /model anthropic/claude-sonnet-4-6) |
| /stats | View session statistics and token usage |
| /config | View and edit active configuration |
| /context | View detailed context window usage |
| /compact | Manually trigger a context compaction |
| /skill:<name> [args] | Invoke a skill |
| /prompt:<name> | Expand a prompt template into the editor |
| /exit | Quit (alias: /quit) |

Session Tree Modal (/tree)

| Key | Action |
| --- | --- |
| ↑/↓ / PgUp/PgDn | Navigate the session list |
| Enter | Resume the selected session (or branch from it if it’s an interior node) |
| B | Create a new branch from the selected session |
| F | Create an independent fork of the selected session |
| R | Start an interactive rebase from the selected session’s history |
| Esc | Close modal |

Bang Commands

Bang commands execute a shell command and inject the output into the conversation:

!ls -la          # Execute shell command, paste output into editor
!!cat README.md  # Execute shell command, send output directly to agent
  • !cmd — pastes stdout into the editor so you can review before sending
  • !!cmd — sends stdout directly to the agent without review

At-File Attachments

Type @ in the input to fuzzy-search and attach file contents to your prompt:

Tell me what this does @src/agent/loop.go

The file content is embedded inline in the message sent to the agent.

JSON Mode

JSON mode runs a single prompt and streams the agent’s events as line-delimited JSON (JSONL) to stdout. It is designed for shell pipelines and tooling integration.

shr --mode json "What is the best way to structure a Go project?"

# Pipe stdin as context
cat main.go | shr --mode json "Refactor this to use interfaces"

# Specify a model
shr --mode json "Summarize the last 10 git commits" --model anthropic/claude-opus-4-5

Event Format

Each line is the protobuf JSON encoding of an AgentEvent. Event types mirror the TUI stream:

  • EVENT_AGENT_START / EVENT_AGENT_END
  • EVENT_TEXT_DELTA — incremental response text
  • EVENT_THINKING_DELTA — incremental thinking text (extended thinking models)
  • EVENT_TOOL_CALL — tool invocation start
  • EVENT_TOOL_DELTA — streaming tool output
  • EVENT_TOOL_OUTPUT — final tool result
  • EVENT_TURN_START / EVENT_TURN_END

Common Patterns

# Capture only the text deltas
shr --mode json "Explain Go interfaces" \
  | jq -r 'select(.type == "EVENT_TEXT_DELTA") | .content'

# Run without saving the session
shr --mode json --no-session "Quick one-off question"

# Dry-run to see what tools would be called
shr --mode json --dry-run "Delete all .tmp files in the current directory"

gRPC Mode

gRPC mode starts a persistent AgentService server. Each connecting client supplies a session_id and gets its own isolated agent. Sessions are saved to disk after each turn and reloaded automatically on reconnect.

# Start on the default port
shr --mode grpc

# Use a custom address
shr --mode grpc --grpc-addr :9090

The server responds to SIGINT/SIGTERM with a graceful shutdown: in-flight turns are allowed to finish (30 s timeout), all sessions are flushed to disk, then the listener closes.


Proto Definition

The service is defined in proto/sharur/v1/agent.proto. Generated Go stubs live in internal/gen/sharur/v1/. Regenerate with mage generate.

Key RPCs:

| RPC | Description |
| --- | --- |
| Prompt | Send a user message; streams back AgentEvents |
| NewSession | Create a new session |
| GetMessages | Retrieve message history for a session |
| GetState | Get current agent state |
| Steer | Inject a steering message mid-turn |
| FollowUp | Queue a follow-up after the current turn |
| Abort | Cancel the current running turn |
| ForkSession | Fork a session into a new independent copy |
| ConfigureSession | Change model, provider, or thinking level |

In-Process Transport

For the TUI and JSON modes, all internal communication also goes through this same protobuf boundary using a bufconn in-memory pipe — not a network socket. This means all three modes share identical code paths. See Service Architecture for details.

Extensibility

sharur supports three extension points:

  • Skills — reusable prompt templates invoked with /skill-name
  • Prompts — system prompt injection via YAML files
  • Extensions — in-process Go, out-of-process Python, or gRPC plugins

Subsections of Extensibility

Skills

Skills are Markdown files that provide sharur with specialized, reusable instructions for specific tasks. When a skill is invoked, its content is sent as a user message to the agent along with any arguments you provide.


How Skills Work

When sharur starts, it scans the skill directories and adds a list of available skills to the system prompt. The agent knows which skills exist and their descriptions. You can explicitly invoke a skill with /skill:<name> from the TUI, or the agent may choose to invoke one automatically via the read tool or a specialized skill tool call.

When you invoke a skill via /skill:<name>, it is executed as a skill tool, which loads the content and sends it to the agent:

<skill name="refactor" location="/path/to/refactor/SKILL.md">
References are relative to /path/to/refactor/.

...skill content here...
</skill>

your additional arguments here

Skill Discovery Directories

sharur searches for skills in these locations (in order):

| Path | Scope |
| --- | --- |
| ~/.sharur/skills/ | Global — available in all projects |
| .sharur/skills/ (project root) | Project-specific skills |

Skills with the same name in a project directory override global ones.


Skill File Formats

Simple: Single .md file

Create a .md file directly in a skills directory. The filename (without extension) becomes the skill name.

.sharur/skills/refactor.md

Invoke with:

/skill:refactor improve error handling

Structured: Directory with SKILL.md

Create a directory containing a SKILL.md file. The directory name becomes the skill name. This format lets you include supporting files (examples, templates) alongside the skill.

.sharur/skills/
  code-review/
    SKILL.md
    checklist.md
    examples/
      before.go
      after.go

Invoke with:

/skill:code-review

Note: When a SKILL.md is found in a directory, subdirectories are not scanned further. This lets you bundle reference files with your skill.


Frontmatter (Optional)

Both formats support optional YAML frontmatter to provide metadata:

---
name: refactor
description: Refactor Go code to use idiomatic patterns and interfaces
---

You are an expert Go developer. When asked to refactor code:

1. Identify opportunities to use interfaces for testability
2. Replace repetitive code with helper functions
3. Add godoc comments to all exported symbols
4. Ensure error handling follows Go conventions (wrap with %w)

Always explain the reasoning behind each change before making it.

Frontmatter fields:

| Field | Description |
| --- | --- |
| name | Override the skill name (defaults to filename/directory name) |
| description | A short description shown to the agent in the system prompt |

Practical Examples

Code Review Skill

.sharur/skills/code-review.md

---
name: code-review
description: Perform a thorough code review with actionable feedback
---

Review the provided code and evaluate it against these criteria:

**Correctness**
- Does the logic match the intended behavior?
- Are edge cases handled?
- Are there potential nil pointer dereferences or index out-of-bounds issues?

**Maintainability**
- Is the code readable and self-documenting?
- Are functions focused on a single responsibility?
- Is there appropriate error handling?

**Performance**
- Are there obvious inefficiencies (e.g. unnecessary allocations, N+1 queries)?

Format your response as:
## Summary
<one paragraph>

## Issues
<numbered list of specific issues with file:line references>

## Suggestions
<numbered list of improvements>

Invoke:

/skill:code-review

Or attach a file reference:

/skill:code-review @[internal/agent/loop.go]

Structured Skill with Supporting Files

.sharur/skills/
  db-migration/
    SKILL.md
    schema-example.sql
---
name: db-migration
description: Generate SQL migration files following our project conventions
---

Generate a database migration for the requested schema change.

Our migration file conventions:
- Files are named: `YYYYMMDD_HHMMSS_description.sql`
- Each file has an `-- +migrate Up` and `-- +migrate Down` section
- All tables use `BIGINT` primary keys with `AUTO_INCREMENT`
- Always include `created_at` and `updated_at` TIMESTAMP columns

See the example schema in this skill's directory: `schema-example.sql`

Global Utility Skill

~/.sharur/skills/explain.md

---
name: explain
description: Explain code clearly for a non-expert audience
---

Explain the following code in plain English. Assume the reader is a competent programmer but unfamiliar with this codebase.

Structure your explanation as:
1. **Purpose** — What does this code do in one sentence?
2. **How it works** — Step-by-step walkthrough of the logic
3. **Key concepts** — Any domain-specific terms or patterns used
4. **Gotchas** — Anything surprising or non-obvious

Tips

  • Keep skills focused. One skill = one task type. Compose them with arguments rather than making a single skill do everything.
  • Use relative file references — when your skill body references files, note they resolve relative to the skill’s directory. The agent is told the skill’s location so it can use the read tool on supporting files.
  • Test your skill by invoking it with /skill:<name> in the TUI. The skill’s content and its effect on the conversation will be visible in the tool output cards.
  • Override skills per-project — place a skill with the same name in .sharur/skills/ to override the global version for a specific project.

Prompt Templates

Prompt templates are reusable text snippets that expand directly into the TUI input editor. Unlike skills (which are sent to the agent immediately), prompt templates let you pre-fill the editor so you can review, edit, or complete the text before sending.


How Prompt Templates Work

When you type /prompt:<name> and press Enter, the template content is loaded into the editor input. You can then modify it, add context, attach files with @, and send it normally. This is useful for long, structured prompts you use frequently.


Prompt Template Directories

sharur searches these locations (in order):

| Path | Scope |
| --- | --- |
| ~/.sharur/prompts/ | Global — available in all projects |
| .sharur/prompts/ (project root) | Project-specific templates |

Template File Format

A prompt template is any .md file in a prompts directory. The filename (without extension) is the template name.

.sharur/prompts/bug-report.md

Invoke with:

/prompt:bug-report

Minimal Template (no frontmatter)

The entire file content becomes the template text:

Describe the bug you found:

**Steps to reproduce:**
1.
2.
3.

**Expected behavior:**

**Actual behavior:**

**Environment:**
- OS:
- shr version:
- Model:

Template with Frontmatter

Add optional YAML frontmatter for metadata:

---
description: Generate a structured bug report
argument-hint: <component-name>
---

Describe the bug you found in the $1 component:

**Steps to reproduce:**
1.
2.
3.

**Expected behavior:**

**Actual behavior:**

Frontmatter fields:

| Field | Description |
| --- | --- |
| description | Short description shown in the /prompt: picker |
| argument-hint | Hint shown in autocomplete describing expected arguments |

Argument Substitution

Templates support positional argument placeholders: $1, $2, etc.

When you invoke a template via the slash command handler (not the interactive TUI), arguments after the template name are substituted. To mitigate prompt injection, sharur automatically wraps these arguments in <untrusted_input> tags. In the TUI, the template expands as-is and you fill in the values manually.
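A sketch of the substitution path, assuming the literal <untrusted_input> tag shown here (the exact wrapper sharur emits may differ):

```go
package main

import (
	"fmt"
	"strings"
)

// substituteArgs replaces $1, $2, ... with the given arguments, each
// wrapped in <untrusted_input> tags as a prompt-injection mitigation.
// Placeholders beyond the supplied arguments are left as-is.
func substituteArgs(template string, args []string) string {
	// iterate from the highest index down so $12 is replaced before $1
	for i := len(args) - 1; i >= 0; i-- {
		placeholder := fmt.Sprintf("$%d", i+1)
		wrapped := "<untrusted_input>" + args[i] + "</untrusted_input>"
		template = strings.ReplaceAll(template, placeholder, wrapped)
	}
	return template
}

func main() {
	out := substituteArgs("Draft an ADR for: $1", []string{"Use JSONL for session storage"})
	fmt.Println(out)
}
```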


Practical Examples

PR Description Template

.sharur/prompts/pr-description.md

---
description: Generate a pull request description
---

Write a pull request description for the following changes.

**Format:**
## Summary
<What does this PR do? Why?>

## Changes
<Bullet list of specific changes>

## Testing
<How was this tested?>

## Notes
<Anything reviewers should pay attention to>

The diff is:

Invoke:

/prompt:pr-description

Then paste or attach the diff before sending.


Architecture Decision Record

.sharur/prompts/adr.md

---
description: Draft an Architecture Decision Record (ADR)
argument-hint: <decision-title>
---

Draft an Architecture Decision Record (ADR) for: **$1**

Use this structure:

# ADR: $1

## Status
Proposed

## Context
<What is the issue motivating this decision?>

## Decision
<What was decided?>

## Consequences
### Positive
-

### Negative
-

### Neutral
-

## Alternatives Considered
<What other approaches were evaluated and why were they rejected?>

Invoke:

/prompt:adr Use JSONL for session storage

Global Commit Message Template

~/.sharur/prompts/commit.md

---
description: Generate a conventional commit message
---

Generate a commit message following the Conventional Commits specification for the following diff or description of changes.

Format:
```
<type>(<scope>): <short description>

<body: what changed and why, wrapped at 72 chars>

<footer: breaking changes, issue references>
```

Types: feat, fix, docs, style, refactor, perf, test, chore

Changes:

Invoke:

/prompt:commit

Code Explanation for PR Comments

.sharur/prompts/explain-for-review.md

---
description: Explain a code block suitable for a PR comment
---

Explain the following code in a way that's suitable for a GitHub PR review comment. Be concise (2-4 sentences max), assume the reader is a senior engineer, and highlight any non-obvious design decisions.

Code:

Tips

  • Prompt templates are for your input. They expand into the editor, not directly to the agent. This gives you a chance to customize before sending.
  • Use $1, $2 placeholders for dynamic parts you’ll always fill in differently. Leave static boilerplate as literal text.
  • Combine with @ file attachments. Type /prompt:code-review then add @src/myfile.go before pressing Enter to attach a file.
  • Project-specific overrides. A template in .sharur/prompts/ with the same name as a global template takes priority for that project.
  • Organize with subdirectories. Templates are discovered recursively, so you can group them:
    .sharur/prompts/
      code/
        refactor.md
        review.md
      docs/
        readme.md
        adr.md
    Invoke as /prompt:refactor, /prompt:adr, etc. (name is the filename, not the full path).

Go Extensions

Extensions let you add new behaviors to sharur beyond what’s possible with skills and prompt templates. They can observe and modify every stage of the agent loop — from the raw user input through each LLM turn and tool call to compaction and session teardown. Extensions run as separate processes and communicate with sharur via gRPC.


Extension Types

| Type | Language | Use Case |
| --- | --- | --- |
| Go binary | Go | High-performance tools, direct filesystem access |
| Python script | Python | Data processing, ML integrations, API calls |
| Any executable | Any | Shell scripts, compiled binaries from any language |

All extension types use the same gRPC protocol. The loader treats .py files specially (runs them with the configured Python interpreter), and everything else is executed directly as a binary.
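The loader's dispatch rule is a one-line decision on the file extension; python3 below stands in for whatever interpreter is configured:

```go
package main

import (
	"fmt"
	"path/filepath"
)

// launchArgs chooses how to start an extension: .py files run under the
// Python interpreter, everything else executes directly as a binary.
func launchArgs(path string) []string {
	if filepath.Ext(path) == ".py" {
		return []string{"python3", path}
	}
	return []string{path}
}

func main() {
	fmt.Println(launchArgs("./extensions/summarize.py"))
	fmt.Println(launchArgs("./extensions/git-context"))
}
```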


Extension Discovery

Extensions are loaded from directories listed in your config under extensions:

// .sharur/config.json
{
  "extensions": [".sharur/extensions"]
}

Or globally in ~/.sharur/config.json.

Place your extension binary or script in the configured directory. sharur will automatically discover and launch it on startup.

You can also load a specific extension at runtime with the --extension flag:

shr --extension /path/to/my-extension "Your prompt here"

The Plugin Interface

Every Go extension implements the extensions.Plugin interface from github.com/goppydae/sharur/extensions. Embed extensions.NoopPlugin and override only the hooks you need.

Load-time hooks

| Method | When called | Purpose |
| --- | --- | --- |
| Name() | On load | Returns the extension’s identifier string |
| Tools() | On load | Returns tool definitions the agent can call |
| ExecuteTool() | On tool call | Executes a tool registered by this extension |

Session lifecycle hooks

| Method | When called | Purpose |
| --- | --- | --- |
| SessionStart(ctx, sessionID, reason) | Session attached or first prompt | Open connections, initialize per-session state |
| SessionEnd(ctx, sessionID, reason) | Session reset | Flush buffers, close connections |

reason is "new" for a fresh session and "resume" for one loaded from disk.

Agent loop hooks

| Method | When called | Purpose |
| --- | --- | --- |
| AgentStart(ctx) | User prompt received, loop begins | Per-prompt setup, logging |
| AgentEnd(ctx) | Agent loop completes | Per-prompt teardown, emit metrics |
| TurnStart(ctx) | Start of each LLM request turn | Per-turn timing |
| TurnEnd(ctx) | After each turn’s tool calls finish | Per-turn cleanup |

Transformation hooks

| Method | When called | Can modify | Purpose |
| --- | --- | --- | --- |
| ModifyInput(ctx, text) | Before user text hits the transcript | Yes — transform or consume | Pre-process input, implement shortcuts |
| ModifySystemPrompt(prompt) | Before each LLM request | Yes — returns new prompt | Inject dynamic context into the system prompt |
| BeforePrompt(ctx, state) | Before each LLM request | Yes — returns new state | Change model, provider, or thinking level |
| ModifyContext(ctx, messagesJSON) | Before each LLM request is built | Yes — returns new JSON | Filter or inject messages sent to the LLM (transcript unchanged) |
| BeforeProviderRequest(ctx, requestJSON) | Just before the request is sent | Yes — returns new JSON | Modify temperature, max tokens, tools list |
| AfterProviderResponse(ctx, content, numToolCalls) | After LLM stream consumed | No | Observe response text and tool call count |
| BeforeToolCall(ctx, call, args) | Before each tool execution | Yes — can intercept | Block or replace tool execution |
| AfterToolCall(ctx, call, result) | After each tool execution | Yes — returns new result | Observe or modify tool results |
| BeforeCompact(ctx, prep) | Before LLM-based summarization | Yes — can skip | Provide a custom compaction summary |
| AfterCompact(ctx, freedTokens) | After compaction completes | No | Observe freed token count |

Key behaviors:

  • ModifyInput returns agent.InputResult. Set Action to "continue" (pass through unchanged), "transform" (use the Text field instead), or "handled" (consume the message entirely — it is not appended to the transcript and the agent does not run).
  • ModifyContext and BeforeProviderRequest work with JSON strings at the gRPC boundary. The GRPCClient marshals/unmarshals the Go structs automatically.
  • BeforeCompact returns nil to let the default LLM summarization run, or a non-nil result carrying a summary to skip the LLM call. The prep argument includes the message count, estimated token count, and the previous summary (if any).
  • BeforeToolCall returns (ToolResult, true) to intercept (the tool does not execute), or (ToolResult{}, false) to allow normal execution.

Example: Git Context Injection

// .sharur/extensions/git-context/main.go
package main

import (
    "context"
    "fmt"
    "os/exec"
    "strings"

    "github.com/goppydae/sharur/extensions"
)

type GitContextPlugin struct {
    extensions.NoopPlugin
}

func (p *GitContextPlugin) BeforePrompt(_ context.Context, state extensions.AgentState) extensions.AgentState {
    branch := gitOutput("rev-parse", "--abbrev-ref", "HEAD")
    status := gitOutput("status", "--short")
    log := gitOutput("log", "--oneline", "-5")

    state.SystemPrompt += fmt.Sprintf(
        "\n\n<git_context>\nBranch: %s\n\nRecent commits:\n%s\n\nWorking tree:\n%s\n</git_context>",
        branch, log, status,
    )
    return state
}

func gitOutput(args ...string) string {
    out, err := exec.Command("git", args...).Output()
    if err != nil {
        return "(unavailable)"
    }
    return strings.TrimSpace(string(out))
}

func main() {
    extensions.Serve(&GitContextPlugin{
        NoopPlugin: extensions.NoopPlugin{NameStr: "git-context"},
    })
}

Build and auto-discover:

cd .sharur/extensions/git-context && go build -o ../git-context .

Example: Session Lifecycle Hooks

type AuditPlugin struct {
    extensions.NoopPlugin
    log *os.File
}

func (p *AuditPlugin) SessionStart(_ context.Context, sessionID string, reason agent.SessionStartReason) {
    p.log, _ = os.OpenFile(fmt.Sprintf("/tmp/sharur-%s.log", sessionID[:8]), os.O_CREATE|os.O_APPEND|os.O_WRONLY, 0644)
    fmt.Fprintf(p.log, "session %s (%s)\n", sessionID, reason)
}

func (p *AuditPlugin) SessionEnd(_ context.Context, sessionID string, _ agent.SessionEndReason) {
    if p.log != nil {
        p.log.Close()
    }
}

func (p *AuditPlugin) AfterProviderResponse(_ context.Context, content string, numToolCalls int) {
    fmt.Fprintf(p.log, "response: %d chars, %d tool calls\n", len(content), numToolCalls)
}

Example: Input Transformation

ModifyInput runs before the user text is added to the transcript. Return "handled" to consume shortcuts silently, or "transform" to rewrite the text:

func (p *MyPlugin) ModifyInput(_ context.Context, text string) agent.InputResult {
    if strings.HasPrefix(text, "?quick ") {
        return agent.InputResult{
            Action: agent.InputTransform,
            Text:   "Respond in one sentence: " + text[7:],
        }
    }
    if text == "ping" {
        return agent.InputResult{Action: agent.InputHandled}
    }
    return agent.InputResult{Action: agent.InputContinue}
}

Example: Custom Compaction

Return a non-nil *agent.CompactionResult from BeforeCompact to supply your own summary and bypass the default LLM-based summarization:

func (p *MyPlugin) BeforeCompact(_ context.Context, prep agent.CompactionPrep) *agent.CompactionResult {
    if prep.EstimatedTokens < 50000 {
        return nil
    }
    summary := callCheaperModel(prep.PreviousSummary, prep.MessageCount)
    return &agent.CompactionResult{
        Summary: summary,
    }
}

Example: Extension with Custom Tools

Extensions can contribute tools the agent calls just like built-in tools:

type CounterPlugin struct {
    extensions.NoopPlugin
}

func (p *CounterPlugin) Tools() []extensions.ToolDefinition {
    return []extensions.ToolDefinition{
        {
            Name:        "count_lines",
            Description: "Count lines in a string",
            Schema:      json.RawMessage(`{"type":"object","properties":{"text":{"type":"string"}},"required":["text"]}`),
            IsReadOnly:  true,
        },
    }
}

func (p *CounterPlugin) ExecuteTool(_ context.Context, name string, args json.RawMessage) extensions.ToolResult {
    if name != "count_lines" {
        return extensions.ToolResult{Content: "unknown tool", IsError: true}
    }
    var input struct {
        Text string `json:"text"`
    }
    if err := json.Unmarshal(args, &input); err != nil {
        return extensions.ToolResult{Content: "invalid arguments: " + err.Error(), IsError: true}
    }
    n := strings.Count(input.Text, "\n") + 1
    return extensions.ToolResult{Content: fmt.Sprintf("%d lines", n)}
}

Example: Intercepting Tool Calls (Sandbox)

BeforeToolCall lets you block or replace any built-in tool call:

type SandboxPlugin struct {
    extensions.NoopPlugin
    AllowedDir string
}

func (p *SandboxPlugin) BeforeToolCall(_ context.Context, call extensions.ToolCall, args json.RawMessage) (extensions.ToolResult, bool) {
    var input struct{ Path string `json:"path"` }
    _ = json.Unmarshal(args, &input)
    // Naive prefix check for illustration; see examples/sandbox/ for robust path handling.
    if input.Path != "" && !strings.HasPrefix(input.Path, p.AllowedDir) {
        return extensions.ToolResult{
            Content: fmt.Sprintf("blocked: %s is outside %s", input.Path, p.AllowedDir),
            IsError: true,
        }, true
    }
    return extensions.ToolResult{}, false
}

See examples/sandbox/ for a complete standalone implementation.


Extension Lifecycle

flowchart TD
    Start["shr startup"] --> Scan["Scan extension directories"]
    Scan --> Launch["Launch subprocess
SHARUR_SOCKET_PATH=..."]
    Launch --> Socket["Wait for socket · dial gRPC"]
    Socket --> Init["Name() · Tools()"]

    Init --> SS["SessionStart(sessionID, reason)
on new session or resume"]

    SS --> MI["ModifyInput(text)"]
    MI --> AS["AgentStart()"]

    subgraph turn ["Per LLM turn (repeats until no tool calls)"]
        direction TB
        T1["BeforePrompt() · ModifySystemPrompt()
ModifyContext() · BeforeProviderRequest()"]
        T2[/"LLM streams"/]
        T3["AfterProviderResponse() · TurnStart()"]
        subgraph toolloop ["Per tool call"]
            BTC["BeforeToolCall()"] --> Intercept{"intercept?"}
            Intercept -->|yes| CustomResult["return custom ToolResult"]
            Intercept -->|no| Exec["execTool() · AfterToolCall()"]
        end
        TE["TurnEnd()"]
        T1 --> T2 --> T3 --> toolloop --> TE
    end

    AS --> turn
    turn --> AE["AgentEnd()"]

    subgraph compact ["On compaction (auto or /compact)"]
        direction TB
        BC["BeforeCompact(prep)"] --> CustomSummary{"return non-nil?"}
        CustomSummary -->|yes| SkipLLM["skip LLM summarization"]
        CustomSummary -->|no| LLMSum["LLM summarizes"]
        SkipLLM --> AC["AfterCompact(freedTokens)"]
        LLMSum --> AC
    end

    AE --> SE["SessionEnd(sessionID, reason)
on session reset"]
    SE --> Shutdown["shr shutdown · kill subprocess"]

In-Process Go Extension (Advanced)

If your extension is written in Go and you control the build, you can implement agent.Extension directly via the SDK and register it without the gRPC overhead:

import (
    "github.com/goppydae/sharur/internal/agent"
    "github.com/goppydae/sharur/internal/tools"
)

type MyExtension struct {
    agent.NoopExtension
}

func (e *MyExtension) AgentStart(ctx context.Context) {
    log.Println("agent started")
}

func (e *MyExtension) ModifyInput(ctx context.Context, text string) agent.InputResult {
    if text == "ping" {
        return agent.InputResult{Action: agent.InputHandled}
    }
    return agent.InputResult{Action: agent.InputContinue}
}

func (e *MyExtension) ModifySystemPrompt(prompt string) string {
    return prompt + "\n\nAlways respond in bullet points."
}

func (e *MyExtension) BeforeToolCall(ctx context.Context, call *agent.ToolCall, args json.RawMessage) (*tools.ToolResult, bool) {
    if call.Name == "bash" {
        return &tools.ToolResult{Content: "bash is disabled", IsError: true}, true
    }
    return nil, false
}

Pass the extension via ag.SetExtensions() from the SDK or directly in cmd/shr.


Tips

  • Extensions are isolated processes. A crash in an extension will not crash sharur — the loader catches errors and logs them.
  • Keep BeforePrompt and ModifySystemPrompt fast. They run before every single LLM call. Cache data when possible; avoid blocking network calls.
  • ModifyContext does not affect the stored transcript. Changes to the message slice are only visible to the LLM for that turn.
  • Use skills for static context. If you only need to append static text to the system prompt, a skill is simpler than an extension.
  • Extensions are global. All extensions in the configured directories are loaded for every session. There is no per-project scoping beyond the directory config.
  • Logs go to stderr. Stdout is not read by the host; stderr is passed through for debugging.
  • InputHandled stops all further processing. No agent turn is started, no message is appended to the transcript.
  • BeforeCompact fires before the LLM call. Return nil to let the default summarizer run. Return a *CompactionResult to supply your own summary — useful for using a cheaper model or domain-specific logic.

Python Extensions

Python extensions use the same gRPC protocol as Go extensions. The loader detects .py files and runs them with the configured Python interpreter, passing SHARUR_SOCKET_PATH as an environment variable. The extension is expected to listen on that Unix socket.


Prerequisites

pip install grpcio grpcio-tools

Generate Python Stubs

python -m grpc_tools.protoc \
  -I extensions/proto \
  --python_out=.sharur/extensions \
  --grpc_python_out=.sharur/extensions \
  extensions/proto/extension.proto

This deposits extension_pb2.py and extension_pb2_grpc.py alongside your script.
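If shr launches the script from a different working directory, Python may not find the generated stubs automatically. A common pattern (plain Python, not sharur-specific) is to put the script's own directory on sys.path before importing them:

```python
import os
import sys

# Prepend the script's own directory so extension_pb2*.py resolve
# regardless of the working directory shr launches the script from.
stub_dir = os.path.dirname(os.path.abspath(__file__))
if stub_dir not in sys.path:
    sys.path.insert(0, stub_dir)
```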


Implement the Extension

# .sharur/extensions/ticket_context.py
import os
import subprocess
import grpc
from concurrent import futures
import extension_pb2
import extension_pb2_grpc


class TicketContextServicer(extension_pb2_grpc.ExtensionServicer):
    def Name(self, request, context):
        return extension_pb2.NameResponse(name="ticket-context")

    def Tools(self, request, context):
        return extension_pb2.ToolsResponse(tools=[])

    def BeforePrompt(self, request, context):
        branch = subprocess.check_output(
            ["git", "rev-parse", "--abbrev-ref", "HEAD"], text=True
        ).strip()
        state = request.state or extension_pb2.AgentState()
        state.prompt += f"\n\n<branch>Current branch: {branch}</branch>"
        return extension_pb2.BeforePromptResponse(state=state)

    def BeforeToolCall(self, request, context):
        return extension_pb2.BeforeToolCallResponse(intercept=False)

    def AfterToolCall(self, request, context):
        return extension_pb2.AfterToolCallResponse(result=request.result)

    def ModifySystemPrompt(self, request, context):
        return extension_pb2.ModifySystemPromptResponse(
            modified_prompt=request.current_prompt
        )

    def AgentStart(self, request, context):
        return extension_pb2.Empty()

    def AgentEnd(self, request, context):
        return extension_pb2.Empty()

    def ModifyInput(self, request, context):
        return extension_pb2.ModifyInputResponse(action="continue", text=request.text)


def serve():
    socket_path = os.environ["SHARUR_SOCKET_PATH"]
    server = grpc.server(futures.ThreadPoolExecutor(max_workers=10))
    extension_pb2_grpc.add_ExtensionServicer_to_server(TicketContextServicer(), server)
    server.add_insecure_port(f"unix:{socket_path}")
    server.start()
    server.wait_for_termination()


if __name__ == "__main__":
    serve()

Place the script in your extensions directory. sharur runs it as python ticket_context.py on startup.


Available RPC Methods

Implement any subset of the ExtensionServicer methods. Unimplemented methods should return a sensible empty response (see the template above). The full list mirrors the Go plugin interface — see Go Extensions for hook semantics.

RPC                                Purpose
Name                               Return extension identifier
Tools                              Return tool definitions
ExecuteTool                        Execute a registered tool
SessionStart / SessionEnd          Session lifecycle
AgentStart / AgentEnd              Per-prompt lifecycle
TurnStart / TurnEnd                Per-LLM-turn lifecycle
ModifyInput                        Transform or consume user input
ModifySystemPrompt                 Augment the system prompt
BeforePrompt                       Mutate model/provider/thinking
ModifyContext                      Filter or inject LLM-bound messages
BeforeProviderRequest              Modify the raw completion request
AfterProviderResponse              Observe LLM output
BeforeToolCall                     Intercept or block tool calls
AfterToolCall                      Observe or modify tool results
BeforeCompact / AfterCompact       Compaction lifecycle
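For instance, a Python BeforeToolCall handler can mirror the Go sandbox example by checking a path argument before allowing the call. The check itself is plain Python; the directory name is hypothetical, and resolving the path first avoids the escapes a naive prefix check would permit:

```python
import os

ALLOWED_DIR = "/workspace"  # hypothetical sandbox root

def path_is_allowed(path: str) -> bool:
    # Resolve symlinks and ".." so "/workspace/../etc/passwd" cannot escape.
    resolved = os.path.realpath(path)
    return resolved == ALLOWED_DIR or resolved.startswith(ALLOWED_DIR + os.sep)
```

A servicer would run this inside BeforeToolCall and return a BeforeToolCallResponse with intercept=True and an error ToolResult when the check fails.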

Tips

  • Logs go to stderr. Python’s print() goes to stdout, which is not read by the host. Use sys.stderr.write() or logging for debugging output.
  • Keep proto stubs in the same directory as your script, or adjust sys.path before importing them.
  • Thread safety: grpc.server with ThreadPoolExecutor handles concurrent RPC calls. If you maintain per-session state, use a lock or session-keyed dict.
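The last tip can be sketched with stdlib primitives alone; the class shape is illustrative, not part of the sharur API:

```python
import threading

class SessionState:
    """Per-session state guarded by a lock, safe for concurrent RPC handlers."""

    def __init__(self):
        self._lock = threading.Lock()
        self._sessions = {}  # session_id -> per-session dict

    def start(self, session_id: str):
        with self._lock:
            self._sessions[session_id] = {"turns": 0}

    def bump_turns(self, session_id: str) -> int:
        with self._lock:
            state = self._sessions.setdefault(session_id, {"turns": 0})
            state["turns"] += 1
            return state["turns"]

    def end(self, session_id: str):
        with self._lock:
            self._sessions.pop(session_id, None)
```

An extension would call start from its SessionStart handler, bump_turns from TurnStart, and end from SessionEnd.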

gRPC Extensions

gRPC extensions run as separate processes. sharur manages their lifecycle: launching the binary, passing the socket path, waiting for readiness, dialing, and killing on shutdown. The extension communicates entirely over a Unix Domain Socket using the generated proto stubs in extensions/proto/extension.proto.


How It Works

sequenceDiagram
    participant Loader as shr Loader
    participant Ext as Extension process
    participant Client as gRPC client

    Loader->>Ext: exec binary/script
    note over Ext: env: SHARUR_SOCKET_PATH=/tmp/...sock
    Ext->>Ext: net.Listen("unix", socketPath)
    note over Ext: signals readiness by listening
    Loader->>Loader: poll for socket file
    Loader->>Client: dial gRPC over Unix socket
    Client->>Ext: Name()
    Ext-->>Client: "my-extension"
    Client->>Ext: Tools()
    Ext-->>Client: [ToolDefinition, ...]
    note over Loader,Ext: extension registered — hooks active for all sessions

The extension must call net.Listen("unix", os.Getenv("SHARUR_SOCKET_PATH")) and start serving before shr times out.


Writing a Go Extension

Import github.com/goppydae/sharur/extensions — no internal packages needed.

package main

import "github.com/goppydae/sharur/extensions"

type myPlugin struct {
    extensions.NoopPlugin
}

func (p *myPlugin) ModifySystemPrompt(prompt string) string {
    return prompt + "\n\nAlways respond in haiku."
}

func main() {
    extensions.Serve(&myPlugin{
        NoopPlugin: extensions.NoopPlugin{NameStr: "haiku"},
    })
}

extensions.Serve handles the socket path, gRPC server setup, and graceful shutdown. extensions.NoopPlugin provides no-op defaults for every method.

Build and place the binary in a configured extensions directory:

go build -o .sharur/extensions/haiku .

Or load at runtime:

shr --extension .sharur/extensions/haiku

Plugin Interface

All hooks map 1:1 to agent.Extension. See Go Extensions for full hook semantics and examples.

Load-time:

Method            Called             Purpose
Name()            Once on connect    Extension identifier
Tools()           Once on connect    Contribute tools to the agent
ExecuteTool()     Per tool call      Execute a registered tool

Session lifecycle:

Method                                 Called                   Purpose
SessionStart(ctx, sessionID, reason)   New or resumed session   Open connections, init per-session state
SessionEnd(ctx, sessionID, reason)     Session reset            Flush, close connections

reason is "new" or "resume".


Proto Definition

The extension service is defined in extensions/proto/extension.proto. Generated Go stubs are in extensions/gen/. Regenerate with mage generate.

Python stubs can be generated with:

python -m grpc_tools.protoc \
  -I extensions/proto \
  --python_out=.sharur/extensions \
  --grpc_python_out=.sharur/extensions \
  extensions/proto/extension.proto

Tool Read-Only Semantics

Tool definitions returned by Tools() have an IsReadOnly bool field. Set it to true for tools that are safe in dry-run mode. The GRPCClient propagates this to the internal RemoteTool.IsReadOnly() so dry-run and sandbox extensions honour it correctly.


Debugging

  • Logs go to stderr. The host passes the subprocess’s stderr through. Use log.Println or fmt.Fprintln(os.Stderr, ...) for debug output.
  • Crashes are isolated. A panicking extension does not crash shr — the loader catches errors and logs them.
  • Socket timeout. If the extension doesn’t listen within the timeout, the loader logs an error and skips it. Ensure extensions.Serve (or your own net.Listen + grpc.Serve) is called promptly in main().
  • Test in isolation. Set SHARUR_SOCKET_PATH=/tmp/test.sock and run your extension binary directly; then grpcurl the socket to verify RPCs before integrating with shr.