feat: Cobot Observability Plugin #224
Product Requirements Document: Cobot Observability Plugin
Author: David
Date: 2026-03-08
Extracted from: Simulation & Observability Suite PRD
Executive Summary
The Observability Plugin is a Cobot plugin that hooks into the agent loop's extension points, reads ledger state, and publishes structured, actor-agnostic events via Server-Sent Events (SSE). It follows Cobot's standard plugin architecture: PluginMeta declaration, capability registration, hook-based extension points, and `cobot.yml` configuration.

The plugin is independently useful for any operator — not just simulation. An operator running a single Cobot instance benefits from seeing what their agent is doing: messages received and sent, assessments recorded, peers discovered, LLM calls made. The event schema is the contract. It is agent-consumable from day one — the same feed that powers a developer's dashboard today becomes an orchestrator agent's sensory input tomorrow.
Prerequisite: The Interaction Ledger (#211) must be implemented. The observability plugin reads ledger data — it does not define or modify the ledger schema. The plugin degrades gracefully if the ledger is not installed (ledger-specific events are simply not emitted).
Security Model — Simulation Only (MVP): Plugin installation is the explicit operator action that authorizes data emission. This is acceptable only for local simulation and development use. The observability plugin exposes the agent's inner life: full message text between agents, assessment rationales (the agent's private reasoning), and LLM call details. This is a launch blocker for any non-simulation deployment. Before the plugin ships for production operator use, a granular access control model (localhost-only binding, token auth, configurable event filtering) must be implemented. This is not a deferred decision — it is a hard constraint on the plugin's deployment scope.
Hardware: The plugin itself is lightweight — no LLM calls, no GPU requirements, no significant memory footprint. It is an async event emitter attached to existing hook points. Hardware and cost requirements (GPUs, LLM inference, Docker orchestration) are simulation infrastructure concerns, not plugin concerns.
What Makes This Special
Actor-agnostic observability as a first-class architectural pattern. Most agent observability is built for human consumption — dashboards, log viewers, metric charts — then retrofitted for machine consumption later. This plugin inverts that pattern: the event schema is agent-consumable from day one. The observability plugin doesn't know or care whether its consumer is a React dashboard, an orchestrator agent, a `curl` pipe to `jq`, or a test assertion framework. This follows Cobot's plugin philosophy: one plugin, many consumers, zero configuration about deployment context. The same observability feed that powers developer validation today becomes the orchestrator agent's sensory input tomorrow.

Not monitoring — sensing. Existing agent monitoring tools (LangSmith, Helicone, Weights & Biases) monitor individual LLM calls within a single agent. This plugin monitors inter-agent trust dynamics across a network: who interacted with whom, what assessments were formed, which peers were trusted or refused, and why. The event stream captures the agent's behavioral reasoning — its rationale for trust decisions — not just its token usage. This makes the observability plugin a sensory layer, not a logging layer.
Observer without influence. The plugin hooks into the agent loop as a passive reader — it never modifies message context, assessments, or agent decisions. This is architecturally enforced, not just a convention. Adding or removing the plugin produces identical agent behavior, making it safe for both production monitoring and controlled simulation experiments.
Project Classification
Success Criteria
Developer Success
Technical Success
Hooks into `loop.on_message`, `loop.after_send`, `loop.after_llm`, and `loop.after_tool`; reads ledger state via registry lookup.
Measurable Outcomes
Product Scope
MVP — Minimum Viable Product
`cobot.yml` configuration
Growth Features (Post-MVP)
`max_message_length` config, selective field omission
Event resumption (`last-event-id`)
Vision (Future)
User Journeys
Journey 1: Developer Enables Observability on a Cobot Instance
Opening Scene: A developer has a working Cobot development environment and wants to see what the agent is doing in real-time.
Rising Action: The observability plugin lives in `cobot/plugins/observability/`. Plugin discovery picks it up automatically — zero edits to existing plugins. The developer adds the observability section to `cobot.yml`. The developer starts Cobot. The plugin loads at priority 22 and announces its SSE endpoint in the startup log.
Climax: The developer connects to `http://localhost:9090/events` with `curl` or any SSE client. Events start flowing as the agent processes messages: `interaction.received`, `interaction.sent`, `assessment.recorded`. Each event is a self-describing JSON object with type, timestamp, agent_id, sequence number, and payload. The developer pipes the stream to `jq` and watches the agent's decision-making unfold in structured form.

Resolution: No dashboard required. The event stream is useful with `curl`, a test harness, a React app, or another agent. The schema is the contract.

Journey 2: Operator Enables Observability on a Running Agent
Opening Scene: An operator has a Cobot agent running in a simulation environment. They want to add observability without restarting from scratch.
Rising Action: The operator adds the `observability` section to `cobot.yml` and triggers a hot-reload (SIGUSR1). The plugin loads, binds its SSE endpoint, and begins emitting events from the next hook invocation onward.

Climax: The operator hits the snapshot API (`GET /snapshot`) to get the current state — all known peers, their latest assessments, interaction counts. This provides the baseline. From this point, the real-time event stream captures every new interaction and assessment.
GET /snapshot) to get the current state — all known peers, their latest assessments, interaction counts. This provides the baseline. From this point, the real-time event stream captures every new interaction and assessment.Resolution: The operator has full observability without losing the agent's accumulated state. The snapshot API bridges the gap between "plugin just started" and "agent has been running for hours."
Domain-Specific Requirements
Observer Effect Constraint
The observability plugin is a passive observer. It reads loop events and ledger state but never modifies messages, assessments, or agent decisions. Adding or removing the plugin must not change how the agent interacts. Hooks are passive listeners, never modifiers.
Sovereignty Model
The ledger is the agent's private journal (#211). The observability plugin exposes this data to external consumers, but data ownership remains with the agent. Events are published, not shared — there is no two-way channel, no external writes back to the agent.
No Credential Leakage
The event schema must never include Nostr private keys (nsec), API keys, LLM provider tokens, or any secret material. Only public identifiers (npub, peer_id, agent_name) and behavioral data are emitted. This is a hard constraint, not a configuration option.
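This constraint can be enforced as an emit-time filter rather than relying on each hook handler to be careful. A minimal sketch — the marker list and field names beyond `npub`/`peer_id`/`agent_name` are illustrative assumptions, not the plugin's actual schema:

```python
# Names that suggest secret material; assumed markers, extend as needed.
SECRET_MARKERS = ("nsec", "api_key", "token", "secret", "private_key")

def sanitize(payload: dict) -> dict:
    """Drop any payload field whose name suggests secret material.

    Applied to every event payload before it reaches the SSE transport,
    so a hook handler that accidentally forwards a secret still cannot
    leak it to consumers.
    """
    return {
        k: v for k, v in payload.items()
        if not any(marker in k.lower() for marker in SECRET_MARKERS)
    }
```

A stricter variant would allowlist known-public fields instead of denylisting suspicious ones; for a hard constraint, an allowlist fails safer when new fields are added.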
Security Model
MVP (simulation-only): Plugin installation is the explicit operator authorization for data emission. No additional access control on the event stream or snapshot API.
This authorization model is a launch blocker for non-simulation deployment. The plugin exposes full message text, assessment rationales (the agent's private reasoning), and LLM call details. Production use requires:
This is not a deferred decision. The plugin must not ship for production operator use without these controls.
Risk Mitigations
`curl` + `jq` is the second consumer. Abstraction is tested immediately.
Technical Architecture
Event Transport Protocol
SSE (Server-Sent Events) for MVP — unidirectional, auto-reconnect, matches the read-only observability model. WebSocket as Growth option if bidirectional communication is needed (e.g., orchestrator agent sending commands back).
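For consumers not using a ready-made SSE client, the wire format is simple enough to parse by hand. A minimal sketch of a frame parser (the event payloads are the JSON objects described in Journey 1; the `_sse_id` key is an assumption for carrying the resumption id):

```python
import json

def parse_sse(stream_text: str) -> list[dict]:
    """Parse Server-Sent Events text into JSON event dicts.

    SSE frames are separated by blank lines; each frame's `data:` lines
    carry the payload, and `id:` lines carry the value a client would
    echo back as Last-Event-ID when resuming.
    """
    events = []
    for frame in stream_text.strip().split("\n\n"):
        data_lines, event_id = [], None
        for line in frame.splitlines():
            if line.startswith("data:"):
                data_lines.append(line[5:].lstrip())
            elif line.startswith("id:"):
                event_id = line[3:].strip()
        if data_lines:
            event = json.loads("\n".join(data_lines))
            if event_id is not None:
                event["_sse_id"] = event_id  # assumed convention, not SSE spec
            events.append(event)
    return events
```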
Two Exposure Modes (Both MVP)
Event Schema (JSON over SSE)
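The schema body was not captured in this extraction. A representative event, assuming only the fields named elsewhere in this PRD and review (type, timestamp, agent_id, sequence number, payload, `correlation_id`) — field names and values are illustrative:

```json
{
  "type": "assessment.recorded",
  "timestamp": "2026-03-08T14:02:11Z",
  "agent_id": "npub1exampleagent",
  "seq": 1042,
  "correlation_id": "9b2f6c1e-4d3a-4f8e-9a7b-2c5d8e1f0a3b",
  "payload": {
    "peer_id": "npub1examplepeer",
    "assessment": "trusted",
    "rationale": "Delivered correct results on 3 consecutive requests."
  }
}
```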
Plugin Configuration (cobot.yml)
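The configuration block itself was not captured. A sketch of its likely shape, derived from the functional requirements (transport type, host, port, snapshot toggle, event filtering) — all key names here are assumptions:

```yaml
# Hypothetical shape — key names assumed from the FR list, not confirmed.
plugins:
  observability:
    transport: sse          # SSE for MVP; WebSocket is a Growth option
    host: 127.0.0.1         # localhost-only binding (simulation-only security model)
    port: 9090
    snapshot: true          # enable GET /snapshot
    events:                 # event-type filtering
      - interaction.*
      - assessment.recorded
    max_message_length: 2048  # payload truncation (Growth feature)
```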
Plugin Priority
22 — service tier. After ledger at 21, before tools aggregator.
Dependencies
`config`, `ledger` (for reading peer/assessment state), `workspace` (for host/port defaults)
Hooks Implemented
`loop.on_message` → `interaction.received`
`loop.after_send` → `interaction.sent`
`loop.after_llm` → `llm.response` (optional, configurable)
`loop.after_tool` → `tool.called` (optional, configurable)
`ledger.after_record` → `interaction.recorded` (landed in `1d71aba`)
`ledger.after_assess` → `assessment.recorded` (landed in `1d71aba`)
Ledger Integration
The plugin reads ledger state via registry lookup (`get_by_capability("ledger")`) for snapshot API responses. For real-time assessment events, it hooks into `ledger.after_assess` (preferred) or polls the ledger DB on a timer as fallback.
Implementation Notes
`asyncio` HTTP server. Lives in `cobot/plugins/observability/`.
Core Prerequisites — Required Changes Outside This PRD
The observability plugin depends on extension points in Cobot core and the ledger plugin. These had to land before the plugin could fully function; their status is tracked below.
Hook Availability (All Exist)
`loop.on_message`
`loop.after_send`
`loop.after_llm` (`loop/plugin.py:131`)
`loop.after_tool` (`loop/plugin.py:133`)
Ledger Extension Points (Implemented)
`ledger.after_record` (`1d71aba`)
`ledger.after_assess` (`1d71aba`)
All required hooks exist. No core changes needed. No workarounds required.
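The hook-to-event mapping above can be sketched as a passive emitter. This is a sketch only — the class shape, handler signatures, and payload fields are assumptions; Cobot's actual registration API (`create_plugin()`, `start()`/`stop()`) is not shown:

```python
import asyncio
import itertools
import time

class ObservabilityPlugin:
    """Passive observer: reads hook payloads, never mutates them."""

    def __init__(self, agent_id: str):
        self.agent_id = agent_id
        self._seq = itertools.count(1)   # monotonic per-agent sequence numbers
        self.emitted = []                # stand-in for the SSE broadcast queue

    def _emit(self, event_type: str, payload: dict) -> None:
        # Every event is self-describing: type, timestamp, agent, sequence.
        self.emitted.append({
            "type": event_type,
            "ts": time.time(),
            "agent_id": self.agent_id,
            "seq": next(self._seq),
            "payload": payload,
        })

    # One handler per extension point named in the PRD.
    async def on_message(self, msg: dict) -> None:
        self._emit("interaction.received", {"from": msg.get("peer_id")})

    async def after_send(self, msg: dict) -> None:
        self._emit("interaction.sent", {"to": msg.get("peer_id")})

    async def after_assess(self, assessment: dict) -> None:
        self._emit("assessment.recorded", assessment)
```

Note the observer-effect constraint is visible in the shape: handlers read their arguments and append to an internal queue; nothing is returned or mutated.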
Functional Requirements
Event Emission
Event Schema & Transport
State Queries
Plugin Architecture
Depends on hooks (`loop.after_llm`, `loop.after_tool`) which are general-purpose extension points that benefit the broader plugin ecosystem. See Core Prerequisites section.
Configured via `cobot.yml` for transport type, host, port, snapshot endpoint toggle, and event type filtering.
Non-Functional Requirements
Performance
Security & Privacy
Reliability & Data Integrity
Integration & Compatibility
Async `start()`/`stop()`, sync `configure()`, `create_plugin()` factory, co-located tests, `self.log_*()` for logging. Passes `ruff check` and `ruff format` with zero warnings, consistent with the existing codebase.

Review: Simulation & Observability Suite PRD (#224)
Reviewer: Doxios 🦊
Date: 2026-03-08
Overall Assessment
This is a well-structured PRD for what is essentially the experiment that validates or falsifies #211's hypothesis. The framing is right: the ledger is a hypothesis about agent cooperation; this suite is the scientific instrument to test it. The "particle accelerator detector" analogy is apt.
The PRD is thorough — 42 FRs, 29 NFRs, 4 user journeys, clear phasing. But I have some architectural concerns and one fundamental scope question.
🔴 Fundamental Question: Is This One PRD or Three?
This PRD defines three distinct systems:
Each could be its own PRD. The risk of bundling them: the PRD conflates the observability plugin (which is independently useful — operators want to see what their agent is doing even without simulation) with the simulation suite (which is a validation tool for #211).
Suggestion: Consider splitting the observability plugin into its own issue/PR. It's a Cobot plugin that follows established patterns and could ship independently. The simulation + visualization can remain bundled as they're tightly coupled.
🟠 Architectural Concerns
A1: LLM Cost Is Underestimated
The PRD acknowledges LLM cost but I think it's the #1 practical blocker. Even with Ollama:
Missing: A concrete cost/performance estimate. How many LLM calls per simulation hour? What's the minimum viable GPU for local inference at simulation speed? What's the PPQ/OpenRouter cost for a 1-hour validation run?
A2: Scenario Orchestrator Design Is Underspecified
The scenario YAML is clear for defining agent roles, but the orchestrator itself is hand-waved:
The PRD says: "The scenario configuration controls agent behavior patterns, but the assessment logic must be the real production code." This is the right constraint but it creates a tension: how do you make a "bad" agent behave badly without mocking the LLM?
Suggestion: The orchestrator should control agent BEHAVIOR at the message/FileDrop layer, not at the LLM layer. A farmer agent's orchestrator sends 10 small requests as itself (real LLM generates the request), waits for deliveries, then sends a scripted "results are incorrect" complaint. The target agent's LLM then reasons about this complaint using real assessment logic. This preserves the constraint.
A3: Central Aggregator Is a SPOF
With 100 agents, the web app can't maintain 100 SSE connections (correct). But the central aggregator becomes a single point of failure and a bottleneck:
Suggestion: Add event-ID based resumption to the aggregator. Or consider a lightweight event bus (Redis Streams, NATS) as a Growth option. For MVP, the aggregator is fine but document it as a known limitation.
🟡 Design Feedback
D1: The Observability Security Model Needs a Stronger Stance
The PRD explicitly defers the security model: "MVP treats plugin installation as authorization." This is fine for local simulation but the PRD should state more forcefully that this is a simulation-only decision. The observability plugin exposes:
This is an agent's entire inner life. In production, this MUST have access control. The PRD acknowledges this but frames it as a "deferred decision." I'd frame it as a launch blocker for any non-simulation use case.
D2: 3D Graph Is Cool But 2D Should Be The Default
3D force-directed graphs look amazing in demos but are harder to read than 2D for actual analysis:
The PRD mentions `react-force-graph-2d` as a "lightweight fallback." I'd flip this: 2D as default, 3D as the impressive demo mode. The operator doing actual analysis will prefer 2D. The demo reel uses 3D.

D3: Missing Hooks in Cobot
The PRD references hooks that don't exist yet:
`loop.after_llm` — I don't see this in the current codebase
`loop.after_tool` — same
`ledger.after_record`, `ledger.after_assess` — these are defined in #211 as Phase 2
The observability plugin depends on hooks that neither Cobot nor the ledger currently expose. This means the observability plugin implementation is blocked on:
New hooks in `loop.py` (core change — not a plugin-only addition)
This contradicts "zero changes to existing plugins" (FR39). The observability plugin needs new extension points in core.
Suggestion: List the required core changes explicitly. These are PRs that need to land before the observability plugin can work.
D4: Docker Compose for 100 Agents Is Resource-Heavy
100 Docker containers, each running Python + Cobot + an LLM client, on a single machine? That's:
Suggestion: Add a hardware requirements section. What's the minimum spec for 10 agents? For 100? Is this a workstation, a beefy server, or cloud-only at scale?
✅ Strengths
The framing is perfect. This is a scientific instrument, not a product feature. The validation mindset ("the ledger is a hypothesis") sets the right expectations.
Actor-agnostic event schema is forward-thinking. The same feed powers dashboards today and orchestrator agents tomorrow. This is exactly the right architectural choice.
User journeys are vivid and specific. Journey 1 (watching trust emerge) and Journey 2 (catching the farmer) are compelling narratives that clearly demonstrate the value.
Scenario-driven simulation from academic prior art. Grounding the scenarios in REV2 (#220) and Sybil analysis (#214) rather than inventing patterns is the right approach.
The Phase 3 vision is ambitious but properly deferred. FG visualization, SNAP-compatible export, orchestrator-as-participant — all flagged as future, none in MVP.
Observer effect constraint is explicit. "The plugin never modifies messages, assessments, or agent decisions" — this is the right hard boundary.
📋 Summary
Verdict: The PRD is well-written and the vision is compelling. The observability plugin is independently valuable and should be extractable. The simulation + visualization is the right way to validate #211. Key gaps: LLM cost estimation, hardware requirements, scenario orchestrator design for bad-actor behavior, and core hook dependencies.
Recommended next step: Resolve the scenario orchestrator question (how do you make agents behave badly without mocking the LLM?) — this is the architectural crux. The rest is execution.
🦊
Title changed from feat: Cobot Simulation & Observability Suite to feat: Cobot Observability Plugin

Re-Review: Observability Plugin PRD v2 (extracted)
Verdict: All concerns addressed. This is clean and ready for implementation.
The extraction was exactly the right call. This PRD is now a focused, single-component document that follows Cobot's established patterns.
Concern Resolution
Hook scope is now limited to what exists (`on_message` + `after_send` only). Core prerequisites section lists exact PRs needed.
What I Like
One Minor Suggestion
The event schema shows `correlation_id` as a UUID. Consider also including an `in_reply_to` or `trigger_event_id` field — so consumers can reconstruct causal chains without heuristics. E.g., an `assessment.recorded` event links back to the `interaction.sent` that triggered it.

Ready for implementation. 🦊
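The reviewer's causal-chain idea can be sketched as a simple link walk. The `id` and `in_reply_to` field names follow the suggestion above and are assumptions about the eventual schema:

```python
def causal_chain(events: list[dict], event_id: str) -> list[dict]:
    """Walk in_reply_to links from an event back to its root trigger.

    Assumes each event carries a unique `id` and that reply links form
    a chain with no cycles. Returns the chain root-first.
    """
    by_id = {e["id"]: e for e in events}
    chain = []
    current = by_id.get(event_id)
    while current is not None:
        chain.append(current)
        current = by_id.get(current.get("in_reply_to"))
    return list(reversed(chain))
```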