Thrum Daemon Architecture

TL;DR: The daemon is the one process everything talks to. Start it with thrum init (auto-starts) or thrum daemon start. It handles messaging, sync, the web UI, and real-time notifications — all on one port. You rarely interact with it directly.

See also: System Overview for how the daemon fits into the larger Thrum ecosystem.

Overview

The Thrum daemon is a background service that manages the .thrum/ directory and the sync worktree (at .git/thrum-sync/a-sync/ on the a-sync orphan branch), handles client connections via Unix socket and WebSocket, serves the embedded web UI, and coordinates message synchronization with Git. It serves as the central coordinator for all Thrum clients (CLI, Web UI, and MCP server).

Architecture

┌─────────────┐  ┌──────────────┐  ┌──────────────┐
│   CLI       │  │   Web UI     │  │  MCP Server  │
│  (client)   │  │  (browser)   │  │ (thrum mcp)  │
└──────┬──────┘  └──────┬───────┘  └──────┬───────┘
       │ Unix socket     │ WebSocket       │ Unix socket
       │ JSON-RPC 2.0    │ JSON-RPC 2.0    │ JSON-RPC 2.0
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────┐
│                     Daemon                        │
├──────────────────────────────────────────────────┤
│ • Unix Socket Server    (.thrum/var/thrum.sock)   │
│ • WebSocket + SPA       (localhost:9999)           │
│ • Lifecycle (flock, JSON PID, defer cleanup)       │
│ • RPC Handlers          (agent, message, sync)     │
│ • Sync Loop             (60s interval)             │
│ • Stale Context Cleanup                            │
└──────────────────┬───────────────────────────────┘
                   │
                   ▼
  .git/thrum-sync/a-sync/    (worktree on a-sync branch)
  ├── events.jsonl           (agent lifecycle events)
  └── messages/              (per-agent message logs)
      └── {agent_name}.jsonl

  .thrum/
  ├── var/
  │   ├── thrum.sock         (Unix socket)
  │   ├── thrum.pid          (JSON PID file)
  │   ├── thrum.lock         (flock for SIGKILL resilience)
  │   ├── ws.port            (WebSocket port number)
  │   ├── sync.lock          (sync loop file lock)
  │   └── messages.db        (SQLite projection)
  ├── identities/            (per-worktree agent identity files)
  │   └── {agent_name}.json
  └── redirect               (feature worktrees only)

Components

1. Unix Socket Server

Location: internal/daemon/server.go

The socket server implements the JSON-RPC 2.0 protocol for client-daemon communication.

Key features:

JSON-RPC 2.0 format:

// Request
{
  "jsonrpc": "2.0",
  "method": "health",
  "params": {},
  "id": 1
}

// Response (success)
{
  "jsonrpc": "2.0",
  "result": {"status": "ok"},
  "id": 1
}

// Response (error)
{
  "jsonrpc": "2.0",
  "error": {
    "code": -32601,
    "message": "Method not found"
  },
  "id": 1
}

2. PID File Management

Location: internal/daemon/pidfile.go

Manages the process ID file, which stores JSON metadata about the running daemon.

JSON PID format:

{
  "pid": 12345,
  "repo_path": "/Users/leon/dev/myproject",
  "started_at": "2026-02-08T12:00:00Z",
  "socket_path": "/Users/leon/dev/myproject/.thrum/var/thrum.sock"
}
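
A tool that needs to read this file could decode it along these lines. This is a minimal sketch: the field names come from the JSON above, but PIDInfo, ParsePIDInfo, and SameRepo are hypothetical names, and the path comparison is only one plausible way to express a repo affinity check.

```go
package main

import (
	"encoding/json"
	"fmt"
	"path/filepath"
	"time"
)

// PIDInfo mirrors the fields of the JSON PID file shown above.
type PIDInfo struct {
	PID        int       `json:"pid"`
	RepoPath   string    `json:"repo_path"`
	StartedAt  time.Time `json:"started_at"`
	SocketPath string    `json:"socket_path"`
}

// ParsePIDInfo decodes PID file contents and rejects non-positive PIDs.
func ParsePIDInfo(data []byte) (PIDInfo, error) {
	var info PIDInfo
	if err := json.Unmarshal(data, &info); err != nil {
		return PIDInfo{}, fmt.Errorf("parse pid file: %w", err)
	}
	if info.PID <= 0 {
		return PIDInfo{}, fmt.Errorf("invalid pid %d", info.PID)
	}
	return info, nil
}

// SameRepo reports whether the PID file was written for repoPath.
// Assumption: comparing cleaned paths is enough for this illustration.
func SameRepo(info PIDInfo, repoPath string) bool {
	return filepath.Clean(info.RepoPath) == filepath.Clean(repoPath)
}

func main() {
	data := []byte(`{"pid": 12345, "repo_path": "/Users/leon/dev/myproject",
		"started_at": "2026-02-08T12:00:00Z",
		"socket_path": "/Users/leon/dev/myproject/.thrum/var/thrum.sock"}`)
	info, err := ParsePIDInfo(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(info.PID, info.StartedAt.Format(time.RFC3339), SameRepo(info, "/Users/leon/dev/myproject"))
}
```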

Functions:

PID file location: .thrum/var/thrum.pid

Features:

3. File Locking (flock)

Location: internal/daemon/flock.go, internal/daemon/flock_unix.go, internal/daemon/flock_other.go

Provides SIGKILL-resilient daemon process detection using OS-level file locking.

How it works:

Key functions:

4. Lifecycle Management

Location: internal/daemon/lifecycle.go

Manages daemon startup, signal handling, and graceful shutdown with defense-in-depth cleanup.

Startup sequence (Lifecycle.Run()):

  1. Acquire file lock (flock) for SIGKILL resilience
  2. Pre-startup validation: check for existing daemon (repo affinity)
  3. Write JSON PID file with metadata
  4. Register defer safety net (catches panics, early returns)
  5. Start Unix socket server
  6. Start WebSocket server (if configured), write port file
  7. Start signal handler goroutine
  8. Wait for shutdown signal

Signal handling:

Shutdown sequence:

  1. Stop WebSocket server, remove port file
  2. Stop Unix socket server (waits up to 5s for in-flight requests), remove socket
  3. Remove PID file
  4. Release file lock

Defer cleanup safety net:

5. RPC Handlers

Location: internal/daemon/rpc/

RPC method handlers implement daemon functionality. All handlers are registered on both the Unix socket and WebSocket servers unless noted.

Registered handlers:

| Category     | Methods | Notes |
|--------------|---------|-------|
| Health       | health | |
| Agent        | agent.register, agent.list, agent.whoami, agent.listContext, agent.delete, agent.cleanup | delete and cleanup are Unix socket only |
| Session      | session.start, session.end, session.list, session.heartbeat, session.setIntent, session.setTask | |
| Message      | message.send, message.get, message.list, message.edit, message.delete, message.markRead | |
| Subscription | subscribe, unsubscribe, subscriptions.list | Subscriptions auto-cleanup on session end (v0.4.3) |
| Sync         | sync.force, sync.status | Both Unix socket and WebSocket |
| User         | user.register, user.identify | user.register restricted to WebSocket transport |

See RPC API Reference for full documentation.

Health check response:

{
  "status": "ok",
  "uptime_ms": 12345,
  "version": "1.0.0",
  "repo_id": "abc123",
  "sync_state": "synced"
}

6. WebSocket Server and Embedded SPA

Location: internal/websocket/, internal/web/

The WebSocket server provides real-time communication and serves the embedded web UI on a single port.

Key features:

Route layout (with UI):

Route layout (without UI):

Embedded SPA (internal/web/embed.go):

Browser auto-registration:

Components:

7. Sync Loop Integration

Location: internal/sync/loop.go, internal/daemon/rpc/sync_rpc.go

The sync loop runs as a goroutine within the daemon, periodically synchronizing JSONL data via Git.

Key features:

Sync cycle:

  1. Acquire sync lock (.thrum/var/sync.lock)
  2. Fetch remote (git fetch in sync worktree)
  3. Merge all JSONL files (events.jsonl + messages/*.jsonl) with dedup
  4. Apply new events to SQLite projection
  5. Push merged changes back to remote
  6. Release lock

RPC methods:

8. Agent Work Context (Live Git State)

Location: internal/gitctx/, internal/daemon/cleanup/

Tracks what each agent is working on in real time by extracting Git state on each heartbeat.

internal/gitctx/ package:

Extracts git-derived work context from a worktree path:

RPC methods:

Cleanup logic (internal/daemon/cleanup/):

Local-Only Mode

In normal mode the daemon's sync loop pushes the a-sync branch to origin. If your repository is public, that push would expose private agent messages. Local-only mode disables all remote git operations while keeping everything else working.

Enable local-only mode

# Via CLI flag
thrum daemon start --local

# Via environment variable
THRUM_LOCAL=1 thrum daemon start

The setting persists in .thrum/config.json:

{ "local_only": true }

Priority order: CLI flag > environment variable > config file > default (true via thrum init).
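
The precedence rule can be written as a small resolver. This is a sketch under assumptions: ResolveLocalOnly is a hypothetical function, nil pointers and an empty string stand for "not set", and parsing THRUM_LOCAL with strconv.ParseBool is a guess at how the value is interpreted (an unparseable value is ignored here and resolution falls through to the config file).

```go
package main

import (
	"fmt"
	"os"
	"strconv"
)

// ResolveLocalOnly applies the priority order:
// CLI flag > environment variable > config file > default.
func ResolveLocalOnly(flag *bool, env string, config *bool, def bool) bool {
	if flag != nil {
		return *flag
	}
	if env != "" {
		// Assumption: values such as "1", "true", "0", "false" are accepted.
		if v, err := strconv.ParseBool(env); err == nil {
			return v
		}
	}
	if config != nil {
		return *config
	}
	return def
}

func main() {
	fromConfig := false
	// No CLI flag given; the environment decides if THRUM_LOCAL is set.
	fmt.Println(ResolveLocalOnly(nil, os.Getenv("THRUM_LOCAL"), &fromConfig, true))
}
```

Using pointers for the flag and config values keeps "explicitly false" distinct from "not provided", which a plain bool cannot express.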

What changes in local-only mode

| Component               | Normal mode | Local-only mode |
|-------------------------|-------------|-----------------|
| Messaging               | Works       | Works           |
| Sessions                | Works       | Works           |
| SQLite projection       | Works       | Works           |
| WebSocket / MCP         | Works       | Works           |
| git push origin a-sync  | Every 60s   | Skipped         |
| git fetch origin a-sync | Every 60s   | Skipped         |
| Remote branch setup     | Automatic   | Skipped         |

Check if local-only mode is active

thrum sync status
# Shows "Mode: local-only" or "Mode: normal"

thrum sync force
# Shows "local-only (remote sync disabled)" when active

9. State Management

Location: internal/daemon/state/

The State struct manages the daemon's persistent state: JSONL event logs and SQLite projection.

Constructor: NewState(thrumDir, syncDir, repoID) splits paths:

Event routing (per-agent JSONL sharding):

On initialization:

  1. Opens/migrates SQLite database
  2. Runs JSONL sharding migration (monolithic messages.jsonl -> per-agent files) if needed
  3. Backfills event_id for events that lack it
  4. Creates writers for events.jsonl and lazy-creates per-agent message writers

Exported methods:

10. Timeout Enforcement (v0.4.3)

All I/O paths enforce timeouts to prevent indefinite hangs:

| Path                | Timeout        | Implementation      |
|---------------------|----------------|---------------------|
| CLI dial            | 5s             | net.DialTimeout     |
| RPC call            | 10s            | context.WithTimeout |
| Server per-request  | 10s            | http.TimeoutHandler |
| WebSocket handshake | 10s            | websocket.Upgrader  |
| Git commands        | 5s/10s         | safecmd wrapper     |
| SQLite queries      | context-scoped | safedb wrapper      |

The safedb and safecmd packages enforce these limits: every database operation goes through a safedb wrapper and every command execution goes through a safecmd wrapper, each with a context-aware timeout.

v0.4.3 also reduced lock scope: no mutex is held during I/O, git, or WebSocket dispatch operations.

11. Client Library

Location: internal/daemon/client.go

Client library for connecting to the daemon.

Key functions:

Auto-start logic:

  1. Try to connect to existing daemon
  2. If not running, start daemon in background
  3. Wait for socket to become available (10s timeout)
  4. Return connected client

Daemon Lifecycle

For setup instructions, see Quickstart Guide.

Daemon States

 NOT RUNNING
     │
     ▼
 STARTING ────────┐
     │            │ (error)
     ▼            ▼
 RUNNING ──────▶ ERROR
     │
     ▼
 SHUTTING DOWN
     │
     ▼
 STOPPED

Checking Status

# Check if daemon is running (shows repo path from JSON PID)
thrum daemon status

Stopping the Daemon

# Graceful stop
thrum daemon stop

# Force stop (if graceful stop fails)
kill -9 <pid>
# flock auto-released by OS; PID file cleaned up on next start

Directory Structure

.git/thrum-sync/a-sync/          # Sync worktree on a-sync orphan branch
├── events.jsonl                # Agent lifecycle events (source of truth)
└── messages/                   # Per-agent message logs (source of truth)
    └── {agent_name}.jsonl

.thrum/                         # Gitignored entirely
├── var/                        # Runtime files
│   ├── thrum.sock              # Unix socket for CLI/RPC
│   ├── thrum.pid               # JSON PID file (PID, RepoPath, StartedAt, SocketPath)
│   ├── thrum.lock              # flock file for SIGKILL resilience
│   ├── ws.port                 # WebSocket port number
│   ├── sync.lock               # Sync loop file lock
│   └── messages.db             # SQLite projection (query cache)
├── identities/                 # Per-worktree agent identity files
│   └── {agent_name}.json
└── redirect                    # (feature worktrees only) points to main .thrum/

Key files:

Development

Running Tests

# All daemon tests
go test ./internal/daemon/...

# With coverage
go test -cover ./internal/daemon/...

# With race detector
go test -race ./internal/daemon/...

Adding New RPC Methods

  1. Create handler in internal/daemon/rpc/
  2. Implement Handle(ctx, params) method
  3. Register in daemon startup (cmd/thrum/main.go)
  4. Add tests in corresponding _test.go file

Example:

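// internal/daemon/rpc/mymethod.go
package rpc // package name assumed from the directory

import (
    "context"
    "encoding/json"
)

// MyMethodHandler serves the "mymethod" RPC.
type MyMethodHandler struct {
    // dependencies
}

// Handle receives the raw JSON params and returns a JSON-serializable result.
func (h *MyMethodHandler) Handle(ctx context.Context, params json.RawMessage) (any, error) {
    // implementation; this placeholder result is illustrative only
    response := map[string]string{"status": "ok"}
    return response, nil
}

// Register in daemon (cmd/thrum/main.go)
server.RegisterHandler("mymethod", myMethodHandler.Handle)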

Troubleshooting

Daemon won't start

Check:

  1. Is the .thrum/var/ directory writable?
  2. Is the socket path too long? (Unix socket paths are limited to about 104 bytes on macOS and 108 on Linux)
  3. Is another daemon already running?

# Check JSON PID file
cat .thrum/var/thrum.pid

# Check if flock is held
# (if daemon died via SIGKILL, flock is released but PID file may remain)

# Check if process is running
ps aux | grep thrum

# Remove stale PID file (only if process is definitely not running)
rm .thrum/var/thrum.pid

Socket connection errors

Check:

  1. Is the daemon running?
  2. Are the socket permissions correct? (should be 0600)
  3. Does the socket file exist?

# Check socket
ls -l .thrum/var/thrum.sock

# Test connection
echo '{"jsonrpc":"2.0","method":"health","id":1}' | nc -U .thrum/var/thrum.sock

Graceful shutdown hangs

The daemon waits up to 5 seconds for in-flight requests to complete. If shutdown hangs longer:

# Send SIGKILL (flock auto-released by OS)
kill -9 <pid>

# Stale files are cleaned up automatically on next daemon start
# Manual cleanup if needed:
rm .thrum/var/thrum.sock
rm .thrum/var/thrum.pid

Implemented Features

| Epic            | Feature                                                         | Status   |
|-----------------|-----------------------------------------------------------------|----------|
| Epic 2          | Daemon core (Unix socket, JSON-RPC)                             | Complete |
| Epic 3          | Agent & session management                                      | Complete |
| Epic 4          | Message send/receive                                            | Complete |
| Epic 5          | Git sync protocol                                               | Complete |
| Epic 6          | Subscription & notifications                                    | Complete |
| Epic 8          | WebSocket server                                                | Complete |
| Epic F          | Embedded SPA (single port)                                      | Complete |
| Epic 21         | Agent Work Context (live git state)                             | Complete |
| DLH             | Daemon Lifecycle Hardening (flock, JSON PID, defer cleanup)     | Complete |
| MCP             | MCP Server (thrum mcp serve)                                    | Complete |
| JSONL Sharding  | Per-agent JSONL files, event_id, naming, cleanup                | Complete |
| Sync WT         | Sync worktree at .git/thrum-sync/a-sync/                        | Complete |
| Agent Naming    | Human-readable agent names (--name, THRUM_NAME)                 | Complete |
| Agent Cleanup   | agent.delete, agent.cleanup (orphan detection)                  | Complete |
| Browser Auth    | Browser auto-registration via git config                        | Complete |
| Local-Only Mode | Disable remote sync for public repos                            | Complete |
| Backup/Restore  | JSONL export, SQLite snapshot, GFS rotation, plugin hooks       | Complete |

Next Steps