
System Overview

claw-lens is a local-first observability tool for AI agents. It reads the data OpenClaw already writes to disk — session logs, cache traces, cron jobs, agent memory files, and configuration — parses it into SQLite, and serves it to a React frontend through an Express API. The entire system runs locally. No external dependencies, no deployment. A single npx claw-lens-cli starts everything. Session logs go through the Parser into SQLite. Cache traces, cron jobs, agent memory files, and config are read directly by the API layer on demand.

Core Principles

Assumption: claw-lens users are not necessarily engineers. They’re tech-savvy, understand products and business, and know enough to ship with AI agents. They may not write code today, but they learn fast — and will gradually build deeper technical understanding as they go. The decisions below follow from this.

1. Zero Configuration

User assumption: The only prerequisite is Node.js. No additional infrastructure, no external services, no configuration files. How it shows up in code:
  • Embedded SQLite via better-sqlite3, single file at ~/.openclaw/claw-lens.db.
  • On startup: auto-creates schema (first run) or applies schema changes (version upgrade, e.g. adding provider on messages, arguments on tool_calls; clearing and rebuilding tables when scoring logic changes), ingests all session data (skipping unchanged files), loads model definitions from the local OpenClaw installation, and opens the browser.
  • Simplest entry point: npx claw-lens-cli — no install required. Also supports npm install -g claw-lens-cli for global install. Two CLI flags: --port (default 4242, also respects the PORT env var) and --no-open (suppress browser auto-open). OPENCLAW_HOME defaults to ~/.openclaw. Separate dev mode and production build workflows are available for contributors.
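
The flag/env/default precedence for the port can be sketched as follows; resolvePort is a hypothetical helper and the actual CLI parsing may differ:

```typescript
// Hypothetical sketch of port resolution as described above:
// the --port flag wins, then the PORT env var, then the 4242 default.
function resolvePort(argv: string[], env: Record<string, string | undefined>): number {
  const i = argv.indexOf("--port");
  if (i !== -1 && argv[i + 1] !== undefined) return Number(argv[i + 1]);
  if (env.PORT !== undefined) return Number(env.PORT);
  return 4242;
}
```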

2. Cost First, Tokens on Demand

User assumption: The user cares about how much their agents cost. They understand $3.93 today instantly. They might not yet know what “42K input tokens” or “cache write tokens” means — but they will eventually, and when they do, the data is there. How it shows up in code:
  • The Overview KPI strip leads with Cost Today in USD, with 7-day and week-over-week comparisons, followed by Tokens Today, Sessions Today, Errors Today, and Cache Efficiency.
  • Cost is visible everywhere: per session in the session list, per model on the Overview, per agent on the Agents page.
  • fmtCost() formats all costs in USD with 4-decimal precision (e.g. $3.9253), down to per-message granularity. fmtTokens() formats token counts in human-readable scale (1.5M, 42.3K).
  • Token breakdown, cache hit rates (computed as cache_read / (cache_read + input_tokens)), per-model and per-agent cost splits, and cron vs. manual cost comparison are one click from the Overview (KPI cards link to the TokenUsage page). The detail is always there — the user gets to it when they need it.
  • Audit findings are classified by risk level (high/medium/low) and labeled with human-readable pattern names — e.g. “API Key / Access Token”, “AWS Access Key”, “Private Key (Credential)”, “Prompt Injection” — rather than raw pattern types or numeric scores.
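
The formatting helpers and the cache-efficiency formula above can be sketched as follows; these are illustrative reimplementations, not the actual UI source:

```typescript
// Illustrative sketch of fmtCost: USD with 4-decimal precision.
function fmtCost(usd: number): string {
  return `$${usd.toFixed(4)}`;
}

// Illustrative sketch of fmtTokens: human-readable scale (1.5M, 42.3K).
function fmtTokens(n: number): string {
  if (n >= 1_000_000) return `${(n / 1_000_000).toFixed(1)}M`;
  if (n >= 1_000) return `${(n / 1_000).toFixed(1)}K`;
  return String(n);
}

// Cache hit rate as defined above: cache_read / (cache_read + input_tokens).
function cacheHitRate(cacheRead: number, inputTokens: number): number {
  const denom = cacheRead + inputTokens;
  return denom === 0 ? 0 : cacheRead / denom;
}
```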

3. Local-First, Read-Only

claw-lens is a read-only observer. It reads the files OpenClaw already writes to disk and presents them — it does not instrument the agent runtime, does not modify session files, and does not send data off the machine. How it shows up in code:
  • Server binds to 127.0.0.1, CORS restricted to localhost. No outbound HTTP calls, no telemetry, no analytics. The only network connection is the WebSocket to the local OpenClaw Gateway.
  • All file operations are reads. The only file claw-lens writes is its own SQLite database (claw-lens.db). Stopping claw-lens has zero impact on running agents.
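
A minimal sketch of the loopback-only bind, using node:http in place of Express, so the names and details here are illustrative rather than the actual server code:

```typescript
import { createServer } from "node:http";

// Binding to 127.0.0.1 means the dashboard is unreachable from other
// machines; only processes on this host can connect.
const server = createServer((_req, res) => {
  // CORS restricted to a localhost origin (the Vite dev server, in this sketch).
  res.setHeader("Access-Control-Allow-Origin", "http://localhost:6060");
  res.end("ok");
});

// Port 0 picks a free port for the sketch; claw-lens itself binds 4242.
server.listen(0, "127.0.0.1");
```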

4. The Database Is a Cache, Not a Source of Truth

JSONL session files are the source of truth. The SQLite database is a derived index that can be rebuilt at any time. How it shows up in code:
  • ingestAll() scans all JSONL files on every startup, re-ingesting any file that is new or changed (unchanged files are skipped via an mtime + size check).
  • Deleting claw-lens.db and re-running npx claw-lens-cli restores everything. No data is lost because nothing in SQLite is original — it all comes from files on disk.
  • When we change how data is computed (e.g. risk scoring logic), affected tables are cleared and re-ingested from source. This is safe precisely because the database is disposable.
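
The change check can be sketched as a pure predicate over the stored watermark; field names follow the ingest_state table, while the actual check lives in db.ts:

```typescript
// A file is re-ingested only when it has no watermark yet (first run) or its
// observed mtime or size differs from the stored one.
interface FileWatermark {
  mtime_ms: number;
  size_bytes: number;
}

function shouldReingest(
  prev: FileWatermark | undefined,
  mtimeMs: number,
  sizeBytes: number,
): boolean {
  if (!prev) return true; // never seen before
  return prev.mtime_ms !== mtimeMs || prev.size_bytes !== sizeBytes;
}
```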

Module Responsibilities

CLI Entry (bin/claw-lens.ts)

Accepts the --port and --no-open flags and the OPENCLAW_HOME environment variable, then calls startServer() to boot the full service. Default port: 4242.

Parser (src/server/parser.ts)

Input: JSONL files under ~/.openclaw/agents/*/sessions/, including archived files with .deleted and .reset suffixes. Responsibilities:
  • findSessionFiles(): Scans all agent directories, discovers session files, deduplicates by session ID (prefers active files).
  • parseSessionFile(): Parses JSONL line by line, extracting messages, tool calls, and session metadata.
  • Cron detection: A session is marked as cron if the first user message contains a [cron:UUID task-name] prefix.
  • Non-billable message filtering: Excludes internal Gateway messages (e.g. delivery-mirror, gateway-injected) that don’t represent actual LLM calls, so they don’t inflate cost or token counts.
Output: ParsedSession, ParsedMessage[], ParsedToolCall[].
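
The cron prefix check can be sketched as follows; the exact regex, including the assumed UUID shape, is illustrative rather than the real parser.ts pattern:

```typescript
// Matches a leading "[cron:UUID task-name]" prefix on the first user message,
// as described above. The UUID shape (36 hex/hyphen chars) is an assumption.
const CRON_PREFIX = /^\[cron:([0-9a-f-]{36})\s+([^\]]+)\]/i;

function detectCron(firstUserMessage: string): { taskId: string; taskName: string } | null {
  const m = CRON_PREFIX.exec(firstUserMessage);
  return m ? { taskId: m[1], taskName: m[2] } : null;
}
```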

Database (src/server/db.ts)

Responsibilities:
  • openDb(): Opens SQLite with WAL mode and foreign keys enabled.
  • initSchema(): Creates all tables on first run, applies additive schema changes on version upgrades.
  • ingestAll(): Iterates all discovered session files, calls parseSessionFile() + ingestSession() + ingestAuditEvents() for each. Change detection (mtime + size) skips unmodified files. Supports force mode for full wipe and rebuild.
  • ingestSession(): Per-session upsert transaction covering sessions / messages / tool_calls tables.
  • After ingestion, calls rebuildAllBaselines() to update per-agent behavioral profiles used by anomaly detection.

API Routes (src/server/api/)

Route            Purpose
/api/sessions    Session list, filters, context health
/api/timeline    Cost & token trends bucketed by day/hour
/api/tools       Tool usage stats, duration distributions, heatmap
/api/stats       Agent-level statistics
/api/audit       Security audit event timeline
/api/profiler    Session rankings, token consumption analysis
/api/cron        Cron job management
/api/debug       Cache trace, context breakdown
/api/tokens      Token consumption summary and trends
/api/memory      Agent memory file reads
/api/refresh     Force re-ingest

WebSocket Proxy (src/server/api/live.ts)

live.ts powers two user-facing capabilities:
  1. File watcher: Watches session directories for .jsonl changes via fs.watch. On change, debounces 500ms, runs ingestAll(), then broadcasts data_updated to all connected browser clients. This keeps the dashboard current without manual refresh.
  2. Live Monitor (/live page): Proxies real-time agent activity events from the OpenClaw Gateway to the browser via /ws/live, triggering immediate data refresh on the Live Monitor page without waiting for the 30-second polling interval.
The Gateway connection reads an auth token from ~/.openclaw/openclaw.json and sends it as a Bearer header on the WebSocket handshake. If the Gateway process is not running or crashes, the connection drops; live.ts automatically retries with exponential backoff (1s, 2s, 4s, … capped at 30s) until the Gateway comes back, and in the meantime the file watcher still keeps the dashboard current. If session directories don’t exist yet (e.g. no agents have run), the UI falls back to 30-second polling.
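
The watcher’s 500ms debounce from step 1 can be sketched as follows; makeDebounced is a hypothetical helper, with onFlush standing in for the real ingestAll() plus data_updated broadcast:

```typescript
// Trailing debounce: every event restarts the window, so a burst of fs
// events within delayMs collapses into a single flush after the burst ends.
function makeDebounced(onFlush: () => void, delayMs = 500): () => void {
  let timer: ReturnType<typeof setTimeout> | undefined;
  return () => {
    if (timer !== undefined) clearTimeout(timer);
    timer = setTimeout(onFlush, delayMs); // restart the window on every event
  };
}
```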

Audit System (src/server/audit/)

Independent security audit subsystem:
  • audit-parser.ts: Reads session JSONL and processes tool calls in three passes — (1) build a tool call map from assistant messages, (2) match tool results, assess risk flags, scan for sensitive data and prompt injection, (3) compute final risk scores with full session context (e.g. was a discovered credential followed by an external call?).
  • risk-scorer.ts: Three-level risk scoring. High (3): rm -rf /, credential exfiltration, prompt injection. Medium (2): sudo, exposed secrets, new external domains. Low (1): unusual hours, volume spikes, atypical file paths. Events scoring 0 are not surfaced in the UI.
  • baseline.ts: Builds a per-agent behavioral profile from the last 30 days — top 20 tools, top 20 directories, top 12 active hours, average tool calls per session, known domains. Rebuilt after every ingestion cycle.
  • anomaly.ts: Compares each tool call against the agent’s baseline to detect deviations — activity outside typical hours, tool call volume >3x the session average, or file access outside typical directories. These deviations become Low-level risk flags.
  • sensitive-data.ts: 34 regex patterns covering API keys (Anthropic, OpenAI, AWS, GitHub, etc.), private keys, database URIs, PII. Matched secrets are masked: first 6 + •••••• + last 4 characters.
  • sensitive-paths.ts: File path pattern matching for sensitive locations — .ssh/, .env, keychain, credential files, PEM/PKCS12 keys. Includes a whitelist for OpenClaw’s own workspace paths.
  • injection-scanner.ts: 9 prompt injection patterns — instruction override, role hijack, exfil request, base64 payload, DAN/jailbreak, etc.
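
The masking rule from sensitive-data.ts can be sketched as follows; the fallback for very short secrets is an assumption:

```typescript
// Masks a matched secret as described above: first 6 characters, a fixed
// "••••••" filler, then the last 4. Secrets too short to split safely are
// fully masked (an assumed behavior, not confirmed by the source).
function maskSecret(secret: string): string {
  if (secret.length <= 10) return "••••••";
  return secret.slice(0, 6) + "••••••" + secret.slice(-4);
}
```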

React Frontend (src/ui/)

Vite + React 19 + React Router. The dev server runs on port 6060 and proxies API requests to the backend on port 4242.
Path                 Component          Purpose
/                    Overview           KPI dashboard
/sessions            Sessions           Session table + filters
/agents              Agents             Agent-level stats + memory
/live                LiveMonitor        Real-time Gateway monitor
/audit               Audit              Security audit timeline
/profiler            Profiler           Tool timing analysis
/tokens              TokenUsage         Token consumption breakdown
/timeline            SessionTimeline    Message timeline
/memory              Memory             Agent memory viewer
/cron                Cron               Scheduled task management
/deepturn            AgentLoops         Deep turn analysis
/contextbreakdown    DebugContext       Context window breakdown
/cachetrace          DebugReplay        Cache trace viewer
/settings            Settings           Configuration

Data Model

Full column definitions for each table follow; foreign-key columns show how the tables relate.

sessions — one row per agent session.
Column                  Type   Notes
id                      text   PK, session UUID
agent_name              text   agent identifier
started_at / ended_at   int    Unix ms timestamps
total_messages          int    message count
total_cost              real   aggregated USD cost
total_tokens            int    aggregated tokens
primary_model           text   dominant model used
error_count             int    failed tool calls / errored messages
is_cron                 int    1 if session started from a cron task
cron_task               text   cron task name if applicable
task_summary            text   extracted task description
ingested_at             int    Unix ms, last ingest timestamp

messages — one row per LLM turn (user / assistant / tool result).
Column                               Type   Notes
id                                   text   PK, message UUID
session_id                           text   FK → sessions.id
agent_name                           text   denormalized for fast filtering
parent_id                            text   parent message (for branching)
timestamp                            int    Unix ms
model / provider                     text   e.g. claude-sonnet-4 / anthropic
role                                 text   user / assistant / tool
input_tokens / output_tokens         int    per-turn token usage
cache_read / cache_write             int    prompt cache hits/writes
total_tokens                         int    sum of the above
cost_total                           real   per-turn cost
cost_input / cost_output             real   split by direction
cost_cache_read / cost_cache_write   real   cache-related cost
stop_reason                          text   e.g. end_turn, tool_use, max_tokens
error_message                        text   populated when call errored
has_error                            int    boolean flag
is_tool_result                       int    1 if message is a tool result (not user-authored)

tool_calls — one row per tool invocation inside an assistant message.
Column        Type   Notes
id            text   PK (composite with message_id)
message_id    text   PK, parent assistant message
session_id    text   FK → sessions.id
agent_name    text   denormalized
timestamp     int    Unix ms
tool_name     text   e.g. bash, read, edit
duration_ms   int    execution time
success       int    boolean flag
arguments     text   JSON-encoded arguments

audit_events — security-relevant events extracted from tool calls.
Column       Type   Notes
id           int    PK, autoincrement
session_id   text   FK → sessions.id
agent_id     text   agent identifier
timestamp    int    Unix ms
event_type   text   category (path_access, exec, web_fetch, …)
tool_name    text   source tool
target       text   file path, URL, or command
extra_json   text   structured context
risk_flags   text   CSV of triggered flags
risk_score   int    0-3 (none / low / medium / high)
raw_input    text   original tool input
raw_output   text   original tool output

sensitive_findings — secrets and prompt-injection patterns detected in message content.
Column                      Type   Notes
id                          int    PK
audit_event_id              int    FK → audit_events.id
session_id                  text   FK → sessions.id
agent_id                    text   agent identifier
timestamp                   int    Unix ms
pattern_type                text   credential / api_key / prompt_injection / exfil_request / …
pattern_matched             text   the regex or keyword that matched
context                     text   redacted snippet around the match
severity                    text   low / medium / high
dismissed                   int    user-acknowledged flag
followed_by_external_call   int    1 if a web/exec call followed the finding

ingest_state — per-file watermark for incremental parsing.
Column        Type   Notes
file_path     text   PK, absolute JSONL path
mtime_ms      int    last observed mtime in ms
size_bytes    int    last observed size in bytes
ingested_at   int    Unix ms

agent_baselines — per-agent behavioral profile over the last 30 days.
Column                       Type   Notes
agent_id                     text   PK
computed_at                  int    Unix ms, last rebuild time
common_tools                 text   JSON array of frequently used tools
typical_paths                text   JSON array of frequently accessed paths
typical_hours                text   JSON array of active hours
avg_tool_calls_per_session   real   average tool calls per session
known_domains                text   JSON array of seen external domains

audit_ingest_state — per-file watermark for audit parsing (separate from session ingest).
Column        Type   Notes
file_path     text   PK, absolute JSONL path
mtime_ms      int    last observed mtime in ms
size_bytes    int    last observed size in bytes
ingested_at   int    Unix ms

settings — key-value store for schema version tracking and configuration.
Column   Type   Notes
key      text   PK (e.g. audit_scoring_version, session_aggregate_version)
value    text   version string or config value
Indexing strategy: sessions has separate indexes on agent_name and started_at; messages on session_id, timestamp, model; tool_calls on session_id, tool_name, timestamp; audit_events on (agent_id, timestamp), (event_type, timestamp), session_id. These cover the primary query paths from the API layer.

Technical Choices

SQLite (better-sqlite3) over PostgreSQL

SQLite is the only database that matches the operational model of a local-first tool. claw-lens is distributed via npx: there is no server to provision, no connection string to configure. The database is a single file at ~/.openclaw/claw-lens.db that lives next to the data it indexes. better-sqlite3 provides a low-overhead synchronous binding to SQLite’s C engine. The real performance gain comes from transaction batching: on startup, ingestAll wraps thousands of JSONL record upserts in a single db.transaction() block, which reduces fsync calls from one per row to one per transaction. WAL (Write-Ahead Logging) is enabled so that reads and writes can proceed independently: when you’re browsing the dashboard and a file watcher triggers a background re-ingest at the same time, your page loads don’t stall.
Considered: PostgreSQL (operational mismatch: requiring a running database process for a local tool defeats the purpose), LevelDB/RocksDB (no SQL: aggregations and joins across sessions/messages/tool_calls would be painful), Prisma + SQLite (unnecessary abstraction for a schema we fully control).
Trade-off: SQLite allows only one writer at a time, so under heavy ingestion, concurrent write attempts would queue behind a lock. In practice this is not an issue: claw-lens runs as a single Express server on one machine, and ingestion is the only write path. No concurrent write access from multiple processes, no cross-machine querying; both are acceptable for this use case.

Express over Fastify

Express is chosen for simplicity and debuggability. For a local, single-user dashboard, operational overhead matters more than performance. The router pattern maps cleanly to our ~10 API modules. Considered: Fastify (better performance and built-in schema validation, but both are unnecessary for internal APIs), Koa and Hono (similar capabilities but with smaller ecosystems or less mature tooling). Trade-off: Express lacks built-in validation and has middleware ordering pitfalls, but these are acceptable given the controlled environment.

WebSocket Proxy over Direct Browser Connection or SSE

The claw-lens server connects to the OpenClaw Gateway over WebSocket (authenticated with a local token) and forwards events to browser tabs via its own WebSocket endpoint (/ws/live). The browser side is receive-only, so SSE would work; but the upstream is already WebSocket (the ws library), and using the same protocol for the downstream half keeps it to one library, one connection model, and less code to maintain. This is a convenience choice, not a fundamental architectural requirement. Socket.IO adds a higher-level protocol layer we do not need, and plain polling adds too much latency for live monitoring. Trade-off: Reconnection is handled manually with exponential backoff (1s → 30s cap). If the Gateway is unavailable, claw-lens still serves historical data; the system degrades gracefully.
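
The backoff schedule can be sketched as a pure function; reconnectDelayMs is a hypothetical helper matching the 1s → 30s schedule described above, with a 0-based attempt counter:

```typescript
// Exponential backoff: 1s, 2s, 4s, 8s, … capped at 30s.
function reconnectDelayMs(attempt: number, baseMs = 1000, capMs = 30_000): number {
  return Math.min(baseMs * 2 ** attempt, capMs);
}
```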

Rule-Based Risk Scoring, not ML

The audit system uses deterministic rules rather than probabilistic models. Each tool call is evaluated against a fixed rule set — sensitive path access, dangerous command patterns, secret exposure, prompt injection signatures — and assigned a risk level (High / Medium / Low). The rules are transparent: anyone can read risk-scorer.ts and understand exactly why something was flagged. Considered: LLM-based classification (hallucination risk on security judgments — not acceptable for a security feature), anomaly-only detection (misses known-bad patterns that deterministic rules catch reliably). Trade-off: Rules can’t catch novel attack patterns. Anomaly detection (volume spikes, unusual hours, new domains) provides a second layer, but truly novel threats require rule updates.
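
In this spirit, deterministic scoring reduces to matching a tool call against a fixed rule list and taking the highest matching level. The rules below are invented examples for illustration, not the real set in risk-scorer.ts:

```typescript
type RiskLevel = 0 | 1 | 2 | 3;

// Invented example rules; each is a named, human-readable pattern.
const RULES = [
  { name: "destructive-rm", pattern: /\brm\s+-rf\s+\//, level: 3 as RiskLevel },
  { name: "sudo", pattern: /\bsudo\b/, level: 2 as RiskLevel },
];

// Returns the highest level among matching rules plus which rules fired,
// so every flag is traceable to a readable rule name.
function scoreCommand(cmd: string): { level: RiskLevel; matched: string[] } {
  const hits = RULES.filter((r) => r.pattern.test(cmd));
  const level = hits.reduce<RiskLevel>((max, r) => (r.level > max ? r.level : max), 0);
  return { level, matched: hits.map((r) => r.name) };
}
```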

Disposable Database, Rebuild-Based Schema Evolution

claw-lens is distributed via npm/npx, so upgrades must be zero-maintenance. There is no deploy pipeline, DBA, or migration window — but the previous claw-lens.db file may still exist on disk with an older schema. The key design choice is that the SQLite database is a derived cache, not the system of record. JSONL session files are the source of truth. That allows us to favor rebuild-based compatibility over complex migrations.
  • Startup validation: On startup, initSchema runs all table creation and schema checks before the API begins serving queries. The schema is guaranteed to be current before any request is handled.
  • Additive changes: New columns are added with ALTER TABLE ADD COLUMN; if the column already exists, the operation is skipped.
  • Breaking changes: Versioned derived data (e.g. scoring logic, data format) is invalidated via version keys in the settings table (e.g. audit_scoring_version, session_aggregate_version) and rebuilt from source JSONL files.
  • Failure recovery: If the database becomes inconsistent, deleting claw-lens.db and restarting safely rebuilds the cache from source.
This works because claw-lens does not treat SQLite as primary storage; it treats it as an index and query layer over durable files on disk.
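
The additive-change path can be sketched as follows; addColumnIfMissing is a hypothetical helper, and the db parameter merely mirrors better-sqlite3’s exec() shape rather than the actual initSchema code:

```typescript
// Attempt the ALTER and treat SQLite's "duplicate column" error as
// "already applied" — the idempotent additive migration described above.
function addColumnIfMissing(
  db: { exec(sql: string): void },
  table: string,
  columnDef: string,
): void {
  try {
    db.exec(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
  } catch (err) {
    // SQLite reports "duplicate column name: <col>" when the column exists.
    if (!String(err).includes("duplicate column")) throw err;
  }
}
```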

What We Don’t Build

These are intentional exclusions, not a to-do list.
  • Cloud sync: All data stays on the user’s machine. No account, no login, no data leaving localhost. A builder using AI agents for sensitive work shouldn’t worry about their session logs being uploaded anywhere. Serves: privacy-conscious users, enterprise builders.
  • Multi-tenant / team features: claw-lens is a single-player tool. One machine, one user, one SQLite file. Team observability is a different product with different trust boundaries. Serves: solo builders who want simplicity over collaboration overhead.
  • Instrumentation SDK: claw-lens doesn’t inject code into your agents and doesn’t require adding import clawLens from 'claw-lens' to your agent code. It reads JSONL files that OpenClaw already writes. Zero coupling. Serves: users who don’t want to modify their agent setup.
  • Alert routing / PagerDuty: Alerts were prototyped and removed (DROP TABLE alert_history, alert_rules, alert_routing_policies). A local dashboard that nobody else sees doesn’t need PagerDuty; the user is already looking at the screen. Serves: users who don’t have an ops team.