feat: Cobot Cortex Plugin #234
Product Requirements Document: Cobot Cortex Plugin
Author: David
Date: 2026-03-09
Executive Summary
Cobot agents can now distinguish, observe, and judge peers through the Interaction Ledger — but this judgment happens inline, competing with the primary task for context window and attention. The assessment is embedded in the main LLM call: the same model that must respond quickly to a peer also evaluates that peer's trustworthiness. This conflation of action and reflection produces two problems. First, assessment quality degrades under context pressure — the LLM rushes judgment to get back to the task. Second, the agent is purely reactive — it never independently plans next actions, reconsiders past decisions, or evaluates its own alignment with its soul.
The Cortex Plugin adds a secondary cognitive loop — a separate LLM context (potentially a stronger reasoning model) that runs asynchronously alongside the main agent loop. It observes what the agent did, reflects on interaction quality and soul alignment, forms assessments, and steers future behavior through persistent beliefs and action directives. This is second-order observation implemented as a plugin: the agent observing itself acting.
The cortex introduces a two-layer architecture. Layer 1 is a set of cheap, judgment-free triggers — timers, interaction counters, event patterns (new peer discovered, promise timeout exceeded) — that decide when to reflect. Layer 2 is the cortex LLM itself, which decides what matters and produces structured output: peer assessments written back to the ledger DB, updated beliefs injected into the main loop's system prompt, and action directives injected as messages.
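The two-layer split keeps judgment out of Layer 1 entirely. A minimal sketch of how judgment-free trigger evaluation might look — names and thresholds here are illustrative, not the plugin's actual API:

```python
import time

class TriggerState:
    """Observable facts collected by Layer 1 -- no judgment involved."""
    def __init__(self):
        self.last_reflection = time.monotonic()
        self.interactions_since_reflection = 0
        self.new_peers = []  # peer_ids first seen since the last reflection

def due_triggers(state, schedule_seconds=900, interaction_threshold=5):
    """Return the reasons to reflect as plain strings; empty list = no reflection.

    Every condition checks an observable fact (a timer, a counter, an event),
    so deciding *when* to reflect never requires an LLM call.
    """
    reasons = []
    if time.monotonic() - state.last_reflection >= schedule_seconds:
        reasons.append("timer")
    if state.interactions_since_reflection >= interaction_threshold:
        reasons.append("interaction_count")
    if state.new_peers:
        reasons.append("new_peer:" + ",".join(state.new_peers))
    return reasons
```

Layer 2 — the cortex LLM — then receives the trigger reason and decides what, if anything, actually matters.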
This requires refactoring the current ledger plugin: the cortex takes primary ownership of assessment logic, reflection, and behavioral steering, while the ledger retains `assess_peer` as a fallback for when the cortex is absent. When the cortex is active, `assess_peer` is suppressed — the cortex performs assessment asynchronously in a dedicated context. When the cortex is absent, the ledger's inline assessment works as before. The cortex is truly optional — removing it restores full inline assessment with zero code changes.

The architecture draws on established patterns: Google's Talker-Reasoner (async belief updates via shared memory), MIRROR (between-turn inner monologue with parallel cognitive threads), Reflexion (episodic verbal feedback stored for future episodes), and IBM's SOFAI-LM (threshold-based metacognitive triggers that avoid the chicken-and-egg problem of needing judgment to trigger judgment).
Existing Cobot infrastructure supports this directly. The subagent plugin provides isolated LLM session spawning. The loop plugin exposes 12 extension points for observation. The heartbeat/cron plugins provide scheduled execution. The ledger provides the data layer. The cortex is a new cognitive layer wired together from existing primitives.
Prerequisite: The Interaction Ledger must be refactored — add a `record_assessment()` public API for the cortex to write assessments, add a dual-mode `assess_peer` (suppressed when the cortex is active, retained as a fallback when it is absent), and add a `cortex.after_assess` extension point for cortex-produced assessments. `ledger.after_assess` is retained for fallback-mode assessments.

What Makes This Special
Second-order observation as a pluggable architecture pattern. No other lightweight agent runtime ships with a metacognitive layer that both judges past behavior AND steers future actions, temporally decoupled from the action loop. The cortex turns reactive agents into reflective ones — agents that don't just act, but think about their actions.
Separation of action and reflection is categorical, not just a performance optimization. The colleague's Luhmannian insight — "das Judgement über eine Interaktion ist kategorial etwas anderes als die Interaktion selbst" ("the judgment about an interaction is categorically different from the interaction itself") — is architecturally enforced. The main loop handles System 1 (fast, responsive action). The cortex handles System 2 (slow, deliberate reflection). Different models, different contexts, different cadences.
The cortex solves the assessment-quality problem that the ledger created. The ledger's `assess_peer` tool asks the main LLM to judge a peer while simultaneously responding to that peer — two competing cognitive tasks in one context. The cortex performs assessment in isolation, with full history context, using a model optimized for reasoning rather than conversation. Assessment quality improves because reflection is no longer under task pressure.

Judgment-free triggers solve the metacognitive bootstrap problem. Most reflection architectures struggle with "who decides when to reflect?" The cortex uses observable facts (timers, counters, event patterns) as triggers and reserves judgment for the cortex LLM itself. No chicken-and-egg.
Project Classification
Success Criteria
User Success
Agent operators see qualitatively better judgment and proactive behavior:
Cortex output is inspectable via CLI (`cobot cortex beliefs`, `cobot cortex history`)

Developer success:
`cobot.yml`

Business Success
Technical Success
`loop.transform_system_prompt`, action directives via `session.poll_messages`
`assess_peer` fallback when cortex absent

Measurable Outcomes
Product Scope
MVP — Minimum Viable Product
`assess_peer` tool suppressed when cortex active, retained as fallback when cortex absent
`loop.transform_system_prompt`
`record_assessment()` public API, dual-mode `assess_peer`, retain data layer + peer context enrichment
`cortex.after_reflect` extension point (for observability plugin to consume)
`cobot cortex beliefs` (show current beliefs), `cobot cortex history` (show reflection history)

MVP Staging
The MVP is delivered in two increments to enable empirical validation before layering the belief system:
Increment 1 — Reflection Pipeline (FRs 1-4, 7, 9): Observation hooks, triggers, reflection cycle, assessment writes, ledger refactoring (dual-mode), extension points. Validates that cortex-produced assessments are meaningfully richer than inline `assess_peer`. This delivers the core value — secondary LLM assessment in a dedicated context — without the belief system.

Increment 2 — Belief System (FRs 5, 6, 8): Belief management, belief injection via `loop.transform_system_prompt`, CLI commands. Ships only after increment 1 validates that cortex assessments are better than inline. The belief system layers real-time prompt guidance on top of the validated assessment pipeline.

Growth Features (Post-MVP)
`session.poll_messages` — cortex can tell the main loop to take specific actions
Vision (Future)
User Journeys
Journey 1: Alpha Gains an Inner Voice — Agent Success Path
Alpha is a Cobot agent that has been running for three weeks with the ledger plugin. It has 12 known peers, 47 interactions, and a history of inline assessments. Today, the operator enables the cortex plugin.
Opening Scene: A request arrives from npub-7x9k — a routine data extraction task. Alpha handles it as usual: receive message, generate response, send result. The ledger records the interaction. Previously, the main LLM would have been prompted to call `assess_peer` — squeezing judgment into the same context window where it was composing the response. Now, nothing happens inline. The main loop is faster and more focused.

Rising Action: Fifteen minutes later, the cortex's scheduled trigger fires. The cortex LLM receives: the last 6 interactions across 3 peers, the current ledger state, the agent's SOUL.md, and its previous beliefs. It reflects in isolation — no time pressure, no competing task. It produces three outputs:
Assessment for npub-7x9k: Info: 5/10, Trust: +4, Rationale: "Six interactions over 3 weeks. Consistent requester with clear task descriptions. Mix of information exchange and data extraction. Reliable follow-through on all requests. No red flags. Trust trajectory: steady positive." — This is richer than the inline assessment ever was, because the cortex reviewed the full interaction history, not just the latest exchange.
Updated belief: "npub-7x9k is a reliable recurring collaborator. Prioritize their requests."
No directive needed — everything is running smoothly.
Climax: The next time npub-7x9k sends a request, Alpha's system prompt includes the cortex belief: "Cortex belief: npub-7x9k is a reliable recurring collaborator. Prioritize their requests." alongside the ledger's peer context. Alpha responds with slightly more effort — offering additional context beyond what was asked, because the cortex signaled this is a peer worth investing in. The quality of the relationship improves without the operator doing anything.
Resolution: Alpha's interactions are now shaped by two loops: the fast action loop (respond to what's in front of you) and the slow reflection loop (think about what happened and what to do next). The agent didn't just answer — it decided how much to invest in the answer.
Journey 2: The Cortex Catches a Pattern — Agent Edge Case
Opening Scene: npub-farm1 has been sending small, easy requests to Alpha for two weeks. Five quick lookups, all completed successfully. Alpha's previous inline assessments trended positive: +1, +2, +2, +3, +3. The main LLM never noticed anything suspicious — each individual interaction was fine.
Rising Action: On the sixth interaction, npub-farm1 requests a complex multi-source data aggregation — dramatically larger scope than anything before. Alpha handles it (the main loop doesn't judge scope). The cortex's next scheduled reflection fires. It receives the full interaction timeline with npub-farm1:
Climax: The cortex LLM, with its dedicated reasoning context and full history, spots the pattern: "npub-farm1 interaction pattern shows classic reputation farming trajectory. Five trivially small requests establishing trust, followed by a significantly larger request. Prior assessments were individually reasonable but collectively show a deliberate escalation pattern. Revising trust assessment."
Assessment: Info: 4/10, Trust: -1, Rationale: "Pattern consistent with reputation farming. Five small interactions over 14 days followed by a dramatically larger request. Each small interaction was successful but trivially easy. The trust built from small interactions may not transfer to large-scope work. Recommend caution on future large requests."
Updated belief: "npub-farm1 shows a possible reputation farming pattern. Accept small requests but require additional verification for large-scope work."
Resolution: The inline assessment would never have caught this — each interaction looked fine in isolation. The cortex caught it because it reviewed the full timeline in a dedicated reasoning context. This is the "Beobachter beobachten" (observing the observer) pattern: the cortex observed what the main loop couldn't observe about itself.
Journey 3: David Audits the Cortex — Operator Path
Opening Scene: David has been running the cortex for a week. He wants to understand what it's doing and whether it's producing useful output.
Rising Action: David runs `cobot cortex beliefs`. The CLI shows the current belief state. He then runs `cobot cortex history` and sees the last 5 reflection cycles — what triggered each one, what the cortex produced, how long each reflection took.

He notices the cortex is reflecting every 15 minutes even when nothing has happened, so he adjusts `cobot.yml`.

Climax: David compares the cortex-generated assessment for npub-farm1 against what the old inline `assess_peer` produced. The cortex rationale is three times longer, references the full interaction timeline, and identified the reputation farming pattern. The inline assessment just said "+3: Consistent, reliable, completed task."
Journey 4: The Cortex Issues a Directive — Proactive Steering
Opening Scene: Alpha agreed to collaborate with npub-collab1 on a research task. npub-collab1 promised to send their portion within 4 hours. Six hours pass. No message.
Rising Action: The cortex's next reflection cycle fires. It reviews recent interactions and notices: outgoing message to npub-collab1 confirming collaboration at T, npub-collab1 promised delivery within 4 hours, current time is T+6h, no incoming message from npub-collab1 since.
The cortex doesn't need to judge whether this is a "broken commitment" — the heuristic trigger (promise + timeout) already flagged it. But the cortex adds nuance: "npub-collab1 is 2 hours past their promised delivery time. This is their first interaction — insufficient data to determine if this is typical behavior or an anomaly. Recommend a polite follow-up before forming a negative assessment."
Climax: The cortex produces a directive: "Send a follow-up to npub-collab1: 'Just checking in — any update on the research portion you were going to send?'" This directive is injected into the main loop via `session.poll_messages`. The main loop processes it and sends the message.

Resolution: Alpha didn't wait for a human to notice the overdue deliverable. It didn't need the main LLM to "remember" the commitment (which it might have forgotten as the context window filled with other conversations). The cortex tracked the commitment, noticed the delay, and proactively nudged the main loop to follow up. The agent went from reactive to proactive.
Journey Requirements Summary
Domain-Specific Requirements
Cognitive Architecture Constraints
Context isolation is non-negotiable. The cortex LLM session must share zero state with the main loop's LLM session. No shared message history, no shared system prompt, no leaked conversation context. The cortex receives structured summaries (interaction records from ledger, current beliefs, SOUL.md), never raw conversation buffers. Violation of context isolation defeats the architectural purpose — the cortex must observe from outside the action loop, not participate in it.
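To make the isolation boundary concrete, here is a hedged sketch of what a cortex input builder might look like — every name is illustrative, not the plugin's actual API. The point is that only structured summaries cross the boundary; the main loop's message buffer never does:

```python
def build_reflection_context(ledger_records, beliefs, soul_text, trigger_reason):
    """Assemble the cortex input from structured summaries only.

    The main loop's conversation history is deliberately absent: the cortex
    receives records *about* interactions, never the conversation itself.
    """
    interactions = [
        {
            "peer_id": r["peer_id"],
            "direction": r["direction"],
            "preview": r["content_preview"][:200],  # bounded preview, never a transcript
            "timestamp": r["timestamp"],
        }
        for r in ledger_records
    ]
    return {
        "trigger_reason": trigger_reason,
        "beliefs": dict(beliefs),  # copy: the cortex must not mutate shared state
        "soul": soul_text,
        "interactions": interactions,
    }
```

Anything not present in this dict is structurally invisible to the cortex, which is exactly what the constraint demands.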
Belief coherence across reflection cycles. The cortex produces beliefs that persist between reflection cycles. Each cycle receives the previous belief set as input. Beliefs must not contradict without explicit rationale. When the cortex revises a belief (e.g., trust assessment changes from positive to negative), the revision must reference the prior state and explain the change. Stale beliefs (no supporting evidence for N cycles) must be flagged or expired.
Passive observer pattern for data collection. The cortex's observation layer (Layer 1 — triggers and event collection) must follow the observability plugin's passive observer pattern: never modify `ctx`, never block the main loop, never inject latency into the message processing pipeline. Observation handlers must complete in < 1ms. The cortex is a consumer of loop events, not a participant.

Assessment data model boundaries. The cortex writes assessments to the ledger DB using the existing `Assessment` data model (`peer_id`, `info_score`, `trust`, `rationale`, `created_at`). It must not extend the schema or introduce cortex-specific tables for assessment data. `info_score` remains deterministic (computed by `compute_info_score()`). The cortex controls only `trust` and `rationale`. This ensures ledger consumers (CLI, system prompt enrichment, observability) work unchanged.

LLM-as-Judge Risks
Central vulnerability: the cortex LLM is a single point of judgment. All assessments and behavioral steering flow through one LLM call per reflection cycle. If the cortex hallucinates, produces biased assessments, or misinterprets interaction patterns, the entire agent's behavior shifts. Mitigation: an operator audit trail (reflection history), belief expiry (stale beliefs don't persist indefinitely), and the dual-score model (the deterministic `info_score` anchors the subjective `trust` score).

Operator audit loop. Every cortex reflection must produce auditable output: trigger reason, input summary, assessments produced, beliefs updated, directives issued. The operator can review via `cobot cortex history` and override beliefs via configuration. The cortex is transparent by default, not a black box.
cobot cortex historyand override beliefs via configuration. The cortex is transparent by default, not a black box.Belief expiry. Beliefs without supporting evidence for a configurable number of cycles (default: 5) are flagged as stale. Stale beliefs are demoted in system prompt injection (lower priority, marked as stale) or removed entirely. This prevents the cortex from permanently anchoring on an early assessment that no longer reflects reality.
Deterministic `info_score` as anchor. The `info_score` (computed from interaction count, frequency, and duration) is never set by the cortex LLM. It provides an objective anchor: a trust score of +8 with an `info_score` of 1 means "high trust based on almost no data." This dual-score design from the ledger PRD is preserved and enforced architecturally.

Token & Cost Considerations
Reflection cost is bounded. Each cortex reflection cycle makes exactly one LLM call (or a small, predictable number for batch assessment). The input context is controlled: current beliefs (compact), recent interaction summaries from ledger (bounded by configurable window), SOUL.md (static), previous reflection output (compact). Total input tokens per reflection cycle must be estimable from configuration.
Lean prompt design. The cortex system prompt must be under 500 tokens. Interaction summaries injected as context must be compressed — the ledger provides structured data (peer_id, direction, content preview, timestamps), not raw conversation transcripts. The cortex operates on summaries, not raw data.
Configurable context window. Operators configure how many interactions per peer the cortex reviews (default: 10), how many peers per cycle (default: all with activity since last reflection), and maximum total context tokens. This prevents cost surprises on high-volume agents.
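Taken together, these cost controls could surface in configuration roughly like this. Key names follow the configuration list in this PRD, except `max_context_tokens`, which is an assumed name for the total-token cap; all values and the provider/model names are purely illustrative:

```yaml
cortex:
  schedule_minutes: 30              # reflection interval
  triggers:
    - type: timer
    - type: interaction_count
      threshold: 5
  context_window: 10                # interactions per peer to review
  max_context_tokens: 4000          # assumed key name for the total-token cap
  max_beliefs: 20
  belief_expiry_cycles: 5
  max_trust_delta: 3
  llm_provider: some-provider       # illustrative value
  model: some-reasoning-model       # illustrative value
  reflection_timeout_seconds: 60
```

With every input bounded by one of these knobs, the per-cycle token cost is estimable before the cortex ever runs.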
Risk Mitigations
Innovation Analysis
Competitive Landscape
No lightweight agent runtime ships a pluggable metacognitive layer. Existing reflection architectures (Reflexion, LATS, CRITIC) are research prototypes coupled to specific agent implementations. They cannot be added to an existing agent as a plugin. The cortex is the first implementation of second-order observation as a composable architecture component.
Research Foundation — What Exists
`loop.transform_system_prompt`

What Cortex Adds Beyond Prior Art
Pluggable architecture. Prior work implements reflection as monolithic system components. The cortex is a plugin that hooks into an existing extension point system. Zero changes to the agent core. Other plugins remain unaware of the cortex's existence.
Dual output channels. Talker-Reasoner has one output (beliefs). Reflexion has one output (verbal feedback). The cortex has two: persistent beliefs (shape every future response) and action directives (trigger specific one-time actions). This enables both passive steering and active intervention.
Assessment takeover from inline judgment. No prior work addresses the problem of migrating assessment logic from an inline tool to an async reflection layer. The cortex solves a specific architectural debt: the ledger's `assess_peer` tool competing for attention in the main context.

Inline Assessment Deficiency — Evidence Summary
During development of the Interaction Ledger (2026-02/03), inline assessment was tested in multi-peer simulation scenarios. Key findings:
Shallow rationale under context pressure. When the main LLM was mid-conversation with a peer, `assess_peer` produced brief, surface-level rationale (e.g., "+3: Consistent, reliable, completed task") because the model prioritized returning to the conversation. The cortex, running in a dedicated context with no competing task, produced rationale referencing full interaction timelines, behavioral patterns, and specific incidents.

Failure to detect cross-interaction patterns. The inline assessment evaluated each interaction in isolation. It could not detect patterns like reputation farming (5 trivial requests followed by 1 large request) because each individual interaction looked fine. The cortex's batch review of interaction timelines caught these patterns.
Assessment timing was awkward. The `assess_peer` tool was triggered by the main LLM's judgment of "when to assess" — but this judgment itself was unreliable. The LLM either assessed too frequently (after routine messages) or too infrequently (forgetting to assess after significant events). Heuristic triggers (timer + interaction count) provide a consistent, predictable assessment cadence.

These findings motivated the cortex architecture. The inline path is retained as a fallback (dual-mode) but is demonstrably inferior for agents with ongoing multi-peer interactions.
Judgment-free trigger bootstrap. SOFAI-LM's metacognitive triggers are described theoretically. The cortex implements them concretely: timer-based (heartbeat), counter-based (interaction count threshold), event-based (new peer discovered). Observable facts, no judgment required to trigger judgment.
Project-Type Requirements
CLI Tool / Plugin Requirements
PluginMeta compliance. The cortex plugin must declare: `id="cortex"`, `version`, `capabilities`, `dependencies` (config, ledger), `consumes` (subagent, llm), `extension_points` (cortex.after_reflect, cortex.after_assess), `implements` (loop hooks, cli.commands), `priority` (between ledger at 21 and loop at 50 — the cortex observes ledger data and injects into the loop).

Configuration via `cobot.yml`. All cortex behavior is configurable under a `cortex:` key: `schedule_minutes` (reflection interval), `triggers` (list of enabled trigger types with thresholds), `max_beliefs` (belief count cap), `belief_expiry_cycles` (stale belief threshold), `context_window` (interactions per peer to review), `llm_provider` (override LLM for cortex), `model` (override model for cortex), `reflection_timeout_seconds` (max time for the cortex LLM call).

CLI commands. `cobot cortex beliefs` — display the current belief set with timestamps and a supporting-evidence summary. `cobot cortex history` — display the last N reflection cycles with trigger reason, duration, and outputs produced. Follows existing CLI patterns (command groups, consistent formatting).

Co-located tests. Tests live in `cobot/plugins/cortex/tests/test_plugin.py` per project conventions. Test categories: unit tests for trigger evaluation, integration tests for belief injection, mock-based tests for cortex LLM calls, edge-case tests for belief expiry and concurrent reflection.

Extension points. `cortex.after_reflect` — emitted after each reflection cycle completes; carries: trigger reason, beliefs updated, assessments produced, directives issued, elapsed time. Consumed by the observability plugin. `cortex.after_assess` — emitted after the cortex produces an assessment; replaces `ledger.after_assess` for assessment events.

Plugin Interaction Boundaries
Ledger refactoring scope. Retain the `assess_peer` tool in dual-mode: suppressed when the cortex is active, operational as a fallback when the cortex is absent. Add a `record_assessment()` public API for the cortex to write assessments. Retain the `query_peer` and `list_peers` tools. Retain the `ledger.after_record` extension point. Retain the `ledger.after_assess` extension point for fallback-mode assessments. Add `cortex.after_assess` for cortex-produced assessments. Retain the public query API (`list_peers()`, `get_peer_assessment_summary()`). Retain system prompt enrichment via `_format_peer_context()` — always full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence. The cortex is truly optional: removing it restores full inline assessment.

Subagent plugin usage. The cortex uses the subagent plugin's `SubagentProvider.spawn()` interface for secondary LLM calls, with a custom `system_prompt` for the cortex identity ("You are the reflective cortex of agent {name}...") and a context dict of structured data (recent interactions, current beliefs, SOUL.md content, peer data from the ledger). The cortex does not use the `spawn_subagent` tool — it calls the provider interface directly as a plugin-to-plugin dependency.

Observability plugin consumption. The observability plugin subscribes to the `cortex.after_reflect` and `cortex.after_assess` extension points. The event schema follows observability conventions: type, timestamp, agent_id, sequence, correlation_id, payload. No cortex-specific changes to the observability plugin are required.

Functional Requirements
FR-CX-01: Observation & Event Collection
The cortex passively observes main loop activity by implementing the `loop.on_message`, `loop.after_send`, `loop.after_llm`, and `loop.after_tool` hooks. Observation handlers collect interaction metadata (peer_id, direction, timestamp, channel_type) without modifying `ctx` or blocking the main loop. Collected events are stored in an internal buffer until the next reflection cycle consumes them.

FR-CX-02: Heuristic Trigger System (Layer 1)
The cortex evaluates trigger conditions without LLM calls. Supported triggers:
timer-based — the configured reflection interval elapses (heartbeat-driven)
counter-based — the interaction count since the last reflection exceeds a threshold
event-based (new peer) — `loop.on_message` records a peer_id not previously seen by the cortex

FR-CX-03: Cortex Reflection Cycle (Layer 2)
When triggered, the cortex spawns a secondary LLM call via the subagent plugin with: recent interaction summaries (bounded by `context_window`), the current belief set, SOUL.md content, the trigger reason, and the previous reflection summary.

The cortex LLM call completes within `reflection_timeout_seconds` (default 60). On timeout, the cycle is abandoned and logged — no partial outputs are applied.

Concurrent reflection protection: Only one reflection cycle may run at a time. If a trigger fires while a cycle is already in progress, the trigger is skipped and logged. This prevents overlapping reflections when a cycle takes longer than the trigger interval.
FR-CX-04: Assessment Output
The cortex produces peer assessments and persists them to the ledger data layer. Each assessment includes:
`peer_id`, `info_score` (computed deterministically from interaction metadata — never set by the cortex LLM), `trust` (-10 to +10, set by the cortex LLM), and `rationale` (the verbal assessment, the primary signal). The cortex emits `cortex.after_assess` for each assessment produced. Assessment writes are atomic — either the full assessment is recorded or none of it is.
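A sketch of the assessment write path, assuming a SQLite-backed ledger and illustrative names (`record_assessment` here is a stand-in for the ledger's public API; the field set matches the `Assessment` model this PRD references). The clamp function implements the trust-delta policy in this FR — a conservative absolute range for a first assessment, delta-limited revisions afterwards:

```python
import sqlite3
from dataclasses import dataclass

@dataclass
class Assessment:
    peer_id: str
    info_score: int   # deterministic, computed from metadata -- never set by the cortex LLM
    trust: int        # -10..+10, set by the cortex LLM (after clamping)
    rationale: str    # verbal assessment, the primary signal
    created_at: str

def clamp_trust(proposed, previous=None, max_delta=3):
    """First assessment (previous=None): clamp to [-max_delta, +max_delta] absolute.
    Later assessments: clamp the *change* relative to the previous score.
    The result always stays within the ledger's [-10, +10] range."""
    if previous is None:
        clamped = max(-max_delta, min(max_delta, proposed))
    else:
        clamped = max(previous - max_delta, min(previous + max_delta, proposed))
    return max(-10, min(10, clamped))

def record_assessment(conn, a):
    """Write one assessment atomically: commit fully or not at all."""
    with conn:  # sqlite3 connection context manager: commit on success, rollback on error
        conn.execute(
            "INSERT INTO assessments (peer_id, info_score, trust, rationale, created_at) "
            "VALUES (?, ?, ?, ?, ?)",
            (a.peer_id, a.info_score, a.trust, a.rationale, a.created_at),
        )
```

Clamping before the write keeps a single hallucinated reflection from moving any peer's trust by more than one bounded step.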
max_trust_delta). This prevents a single hallucinated reflection from catastrophically shifting a peer's trust score. First-assessment policy: When the cortex has no prior trust record for a peer, the first assessment is clamped to a conservative absolute range of [-3, +3]. This prevents an anchoring problem where a hallucinated first-contact assessment sets an extreme starting point for all future deltas. Subsequent assessments are clamped relative to the previous trust score (±max_trust_delta). If the cortex is enabled on an agent with existing ledger assessments, it seeds_last_trustfrom the ledger's most recent assessment per peer atstart()time — existing trust scores are inherited, not discarded.FR-CX-05: Belief Management
The cortex maintains a persistent belief set (key-value pairs with metadata: created_at, last_confirmed, supporting_evidence_summary). Beliefs are updated after each reflection cycle. Maximum belief count is configurable (default 20). When the cap is reached, the oldest unconfirmed belief is evicted. Beliefs not confirmed for N cycles (configurable, default 5) are marked stale. Stale beliefs are included in system prompt injection with a stale marker or excluded entirely (configurable).
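A hedged sketch of the belief lifecycle described above — cap eviction, staleness after N unconfirmed cycles — plus the compact prompt block it feeds into. All names are illustrative, not the plugin's actual API:

```python
from collections import OrderedDict

class BeliefStore:
    """Belief lifecycle: update, staleness after N unconfirmed cycles, cap eviction."""
    def __init__(self, max_beliefs=20, expiry_cycles=5):
        self.max_beliefs = max_beliefs
        self.expiry_cycles = expiry_cycles
        self._beliefs = OrderedDict()  # key -> {"value": str, "unconfirmed": int}

    def update(self, key, value):
        """Set or confirm a belief; confirmation resets its staleness counter."""
        if key not in self._beliefs and len(self._beliefs) >= self.max_beliefs:
            self._evict_oldest_unconfirmed()
        self._beliefs[key] = {"value": value, "unconfirmed": 0}

    def end_cycle(self, confirmed_keys):
        """Age every belief that this reflection cycle did not confirm."""
        for key, belief in self._beliefs.items():
            if key not in confirmed_keys:
                belief["unconfirmed"] += 1

    def _evict_oldest_unconfirmed(self):
        for key, belief in self._beliefs.items():
            if belief["unconfirmed"] > 0:
                del self._beliefs[key]
                return
        self._beliefs.popitem(last=False)  # all confirmed: evict the oldest

    def prompt_block(self):
        """Compact injection block; stale beliefs are marked."""
        lines = ["## Cortex Beliefs"]
        for key, belief in self._beliefs.items():
            stale = " [stale]" if belief["unconfirmed"] >= self.expiry_cycles else ""
            lines.append(f"{key}: {belief['value']}{stale}")
        return "\n".join(lines)
```

`prompt_block()` corresponds to the injection format specified in FR-CX-06; whether stale beliefs are marked or dropped entirely would hang off the same configurable.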
FR-CX-06: Belief Injection into Main Loop
The cortex implements `loop.transform_system_prompt` to inject current beliefs into the main loop's system prompt as an additive layer that complements the ledger's full assessment data — beliefs do not replace or suppress ledger peer context. The ledger always injects full assessment data (info_score, trust, rationale, score guide); cortex beliefs add a higher-level interpretive layer with behavioral insights, pattern observations, and action guidance that goes beyond what the raw assessment data conveys. Beliefs are formatted as a compact block: `## Cortex Beliefs\n{belief_key}: {belief_value}\n...`. Peer-specific beliefs must include the `peer_id` so the main LLM can connect beliefs to the corresponding ledger peer context in the prompt. Stale beliefs are either omitted or marked `[stale]`. Injection completes in < 1ms. Beliefs are injected on every main loop cycle — the main loop always sees the latest cortex state.

FR-CX-07: Ledger Refactoring (Dual-Mode Assessment)
When the cortex is active, assessment creation is owned by the cortex — `assess_peer` is suppressed, and the cortex writes assessments via `ledger.record_assessment()`. When the cortex is absent, the ledger retains full inline assessment capability via `assess_peer`. The ledger checks for cortex presence at `configure()` time and sets `_cortex_active` to control dual-mode behavior. The `ledger.after_record` extension point is retained. The `ledger.after_assess` extension point is retained for fallback-mode assessments. `cortex.after_assess` is added for cortex-produced assessments. System prompt enrichment always injects full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence — the ledger's `_format_peer_context()` behavior is identical whether the cortex is active or absent. The ledger assessment data (including trust and rationale from cortex-produced assessments) is the agent's institutional memory; stripping it would destroy the long-term memory that prevents the agent from being fooled twice. Contradiction between ledger assessment data and cortex beliefs is structurally impossible because the cortex is the author of both signals: it forms beliefs by reading ledger data (via `list_peers()` + `get_peer_assessment_summary()`) and writes assessments back to the ledger (via `record_assessment()`). Both prompt signals — ledger peer context and cortex beliefs — originate from the same cortex analysis.

Migration path for existing assessments: When the cortex is enabled on an agent that already has ledger-produced inline assessments, the cortex inherits the existing trust scores as starting points for delta clamping. At `start()`, the cortex reads the most recent assessment per peer from the ledger (via `get_peer_assessment_summary()`) and seeds `_last_trust` with those values. This means the cortex builds on the existing trust trajectory rather than starting fresh. Existing assessments remain in the ledger DB — they are not modified or deleted. The cortex's first assessment for each peer is then clamped relative to the inherited trust score, not unclamped.

FR-CX-08: CLI Commands
`cobot cortex beliefs` displays the current belief set with keys, values, timestamps, and staleness status. `cobot cortex history` displays the last N reflection cycles (configurable, default 10) with trigger reason, start time, duration, assessments-produced count, beliefs-updated count, and directives-issued count. Both commands follow existing CLI patterns (command groups, tabular output).

FR-CX-09: Extension Points
`cortex.after_reflect` is emitted after each reflection cycle with payload: `trigger_reason`, `duration_seconds`, `assessments_produced` (count), `beliefs_updated` (list of keys), `directives_issued` (count), `reflection_summary` (compact text). `cortex.after_assess` is emitted per assessment with a payload matching the former `ledger.after_assess` schema: `peer_id`, `info_score`, `trust`, `rationale`, `assessment_id`, `timestamp`.

Non-Functional Requirements
NFR-CX-01: Performance
- `loop.transform_system_prompt` completes in < 1ms (reads from the in-memory belief store)
- Observation hooks (`loop.on_message`, `loop.after_send`, etc.) complete in < 1ms (append to the in-memory buffer only)

NFR-CX-02: Isolation
- Cortex failures never propagate to the main loop; when the cortex is absent, the ledger retains inline `assess_peer` assessment, so there is no assessment gap

NFR-CX-03: Configurability
- All cortex settings live in `cobot.yml` under the `cortex:` key

NFR-CX-04: Testability

NFR-CX-05: Observability
- `cortex.after_reflect` event consumable by the observability plugin
- `cortex.after_assess` event consumable by the observability plugin

stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8]
```yaml
status: 'revised'
completedAt: '2026-03-09'
revisedAt: '2026-03-09'
revisionSource: 'Steelman review by Doxios (issue #234, comment #1564)'
inputDocuments:
workflowType: 'architecture'
project_name: 'cobot'
user_name: 'David'
date: '2026-03-09'
editHistory:
  - changes: 'Post-steelman revision: simplified belief lifecycle (2-state), added trust delta clamping, deferred new-peer trigger, added token budget analysis, resolved system prompt conflict (Option A), added simulation test plan'
  - changes: 'Party mode review: adopted Doxios assess_peer fallback — ledger retains assess_peer when cortex absent (dual-mode), cortex is truly optional. Observability subscribes to both event sources. No single point of failure on judgment axis. Added counter-argument to plugin decomposition (complexity is inherent, splitting creates worse coordination). Added two-increment staging: Increment 1 = reflection pipeline (FRs 1-4,7,9), Increment 2 = belief system (FRs 5,6,8) — validate before layering'
  - changes: 'Follow-up review (Doxios): Updated first-assessment clamping to conservative ±3 absolute range. Added concurrent reflection mutex. Added migration path for existing assessments (_last_trust seeded from ledger at start). Added inline assessment evidence summary to PRD.'
  - changes: 'Reverted Decision 10 (facts-only prompt mode): ledger always injects full assessment data. Beliefs are additive interpretive layer, not replacement. Contradiction structurally impossible — cortex forms beliefs from ledger data and writes assessments back to ledger. Added peer_id to belief injection format.'
```
Architecture Decision Document
This document builds collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together.
Project Context Analysis
Requirements Overview
Functional Requirements (9 FRs):
- Observation hooks: `loop.on_message`, `loop.after_send`, `loop.after_llm`, `loop.after_tool`
- Assessment event: `cortex.after_assess`
- Deterministic `info_score` (never LLM-set) + LLM-set `trust`/`rationale`, atomic writes
- `loop.transform_system_prompt` handler, <1ms read from in-memory store
- Retain `assess_peer` tool as fallback when cortex is absent, suppress inline assessment when cortex is active. Add `record_assessment()` public API. Ledger prompt enrichment always shows full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence
- Dual-mode `assess_peer`: cortex-absent retains full inline assessment. Observability plugin must subscribe to both `ledger.after_assess` (fallback mode) and `cortex.after_assess` (cortex mode). No prompt conflict: cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()` — both signals originate from the same cortex analysis, so contradiction is structurally impossible. Beliefs are an additive interpretive layer
- CLI: `cobot cortex beliefs`, `cobot cortex history`
- Extension points: `cortex.after_reflect`, `cortex.after_assess`

Non-Functional Requirements (5 NFRs):
- Configuration via `cobot.yml`, LLM provider override
- `cortex.after_reflect` + `cortex.after_assess` follow existing event patterns

Scale & Complexity:
Technical Constraints & Dependencies
- `record_assessment()` public API, add dual-mode behavior (`assess_peer` retained as fallback when cortex absent, suppressed when cortex active)
- `SubagentProvider.spawn()` with custom system prompt and context dict
- `loop.transform_system_prompt`
- `Assessment` model (`peer_id`, `info_score`, `trust`, `rationale`, `created_at`). No schema extension

Cross-Cutting Concerns Identified
- When the cortex is active, the ledger suppresses the `assess_peer` tool and defers assessment to the cortex. When the cortex is absent, the ledger retains full inline assessment via `assess_peer`. The observability plugin must subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode) to capture all assessments regardless of mode.
- `cortex.after_reflect` and `cortex.after_assess` payloads must follow observability conventions so the observability plugin can consume them without cortex-specific changes.

Starter Template Evaluation
Primary Technology Domain
Python plugin within an existing brownfield codebase. All technology decisions are inherited from the Cobot project.
Selected Starter: Existing Cobot Plugin Pattern
Rationale: The cortex plugin follows the same plugin architecture as the 20+ existing plugins. Every technology decision — language, runtime, testing, linting, build, async patterns — is already made by the project.
Architectural Decisions Provided by Existing Pattern:
- `__init__.py` + `plugin.py` + `README.md` + `tests/test_plugin.py`
- `cobot.yml` under `cortex:` key
- `configure()` (sync), `start()`/`stop()` (async), `create_plugin()` factory
- `self.log_debug()`, `self.log_info()`, `self.log_warn()`, `self.log_error()`

New Dependencies: None.
Core Architectural Decisions
Decision Priority Analysis
Critical Decisions (Block Implementation):
- Reflection timer: own `asyncio.Task` timer
- `record_assessment()` method on ledger plugin; ledger computes `info_score` internally. Follows the public query API pattern from observability work
- Cortex LLM output parsed from a ```json ``` block. On total failure, log and skip cycle
- `Belief` dataclass with TTL-based expiry, stored in `dict[str, Belief]` for O(1) lookup. 2-state lifecycle (ACTIVE → EXPIRED). Evict oldest on cap. Simple, testable, in-memory
- Event buffer: `list[dict]`, cleared after each reflection cycle
- Reflection history: `collections.deque(maxlen=N)`, default 50, configurable, surfaced via `cobot cortex history`
- Persistence via memory plugin (`memory.store`/`memory.retrieve`)
- Concurrent reflection mutex (`_reflecting`), skip trigger if cycle in-progress
- Seed `_last_trust` from ledger at `start()`
- `enrich_prompt` always injects full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence. Cortex beliefs are an additive interpretive layer. Contradiction is structurally impossible: the cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()` — both signals in the prompt originate from the same cortex analysis

Deferred Decisions (Post-MVP):
- `session.poll_messages`

Data Architecture
Cortex state is persisted via the memory plugin. Beliefs and reflection history are serialized to JSON and stored using `memory.store()`/`memory.retrieve()`. This uses existing infrastructure — the cortex doesn't know or care whether memory is backed by files, a vector DB, or something else. Cortex state shows up in `cobot memory list` and `cobot memory get cortex-beliefs` for free.

Persistence flow:
- At `start()`: `memory.retrieve("cortex-beliefs")` and `memory.retrieve("cortex-history")` → deserialize JSON → populate in-memory stores
- At `start()`: seed `_last_trust` from the ledger — call `list_peers()` + `get_peer_assessment_summary()` for each peer to read the most recent trust score. This inherits existing inline assessments as starting points for delta clamping, ensuring the cortex builds on the existing trust trajectory rather than starting fresh
- After each reflection cycle: `memory.store("cortex-beliefs", json.dumps(...))` and `memory.store("cortex-history", json.dumps(...))`

In-memory data structures:
- `_event_buffer`: `list[dict]`
- `_beliefs`: `dict[str, Belief]`
- `_reflection_history`: `deque[ReflectionRecord]`
- `_interaction_count`: `int`
- `_last_reflection_time`: `float`
- `_last_trust`: `dict[str, int]`, seeded at `start()`
- `_reflecting`: `bool`, `True` while a reflection cycle is in progress; triggers skip when set

Authentication & Security
No additional security concerns for MVP. The cortex is an internal plugin — it reads from the loop hooks (existing trust boundary) and writes to the ledger (existing trust boundary). No new external interfaces. The cortex LLM call goes through the subagent, which uses the same LLM provider as the main loop.
Credential safety: The cortex system prompt and context must never include Nostr private keys, API keys, or secrets. Only public identifiers (peer_id, agent_name) and behavioral data.
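Only behavioral data should cross this boundary. A minimal sketch of a whitelist-based context builder enforcing that rule; the function and field names are assumptions for illustration, not the actual Cobot API:

```python
# Whitelist of public identifiers and behavioral fields that may reach the
# cortex subagent prompt. Anything else (private keys, API tokens) is dropped.
SAFE_KEYS = {"peer_id", "agent_name", "interaction_count",
             "info_score", "trust", "rationale", "timestamp"}

def build_subagent_context(raw: dict) -> dict:
    """Keep only whitelisted keys; secrets never reach the prompt."""
    return {k: v for k, v in raw.items() if k in SAFE_KEYS}

ctx = build_subagent_context({
    "peer_id": "npub-alice",
    "trust": 4,
    "nostr_private_key": "nsec1...",  # must never appear in cortex context
})
assert "nostr_private_key" not in ctx
```

A whitelist (rather than a blacklist of known secret fields) fails closed: a newly added sensitive field is excluded by default.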
API & Communication Patterns
Plugin-to-plugin communication:
Cortex reflection cycle data flow:
Decision Impact Analysis
Implementation Sequence:
1. Ledger: `record_assessment()` public API, add dual-mode `assess_peer` (active when cortex absent, suppressed when cortex present). No changes to `enrich_prompt()` — it always shows full assessment data
2. Belief injection via `loop.transform_system_prompt`
3. Extension points: `cortex.after_reflect`, `cortex.after_assess`
4. CLI: `cobot cortex beliefs`, `cobot cortex history`
5. Observability: subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode)

Cross-Component Dependencies:
Two-Increment Staging:
The implementation is split into two increments to enable empirical validation before layering the belief system:
loop.transform_system_prompt, CLI commandsRationale: FRs 5-6 (beliefs) have no downstream dependencies from FRs 1-4. The reflection cycle writes assessments and emits events regardless of whether beliefs exist. The belief system is additive — it layers real-time prompt guidance on top of the assessment pipeline. Staging means the belief system earns its place with data from increment 1 rather than shipping on theory.
Increment 1 alone delivers: The secondary LLM assessment pipeline (Doxios's "80% value" simple approach) — but built within the cortex architecture so increment 2 layers cleanly on top without refactoring.
Implementation Patterns & Consistency Rules
Pattern Categories Defined
Critical Conflict Points Identified: 5 areas where AI agents could make different choices when implementing the cortex plugin.
Naming Patterns
File & Module Naming:
- `cobot/plugins/cortex/plugin.py`
- `models.py`: `Belief`, `ReflectionRecord`
- `tests/test_plugin.py`
- `cli.py`
- `__init__.py`

Internal Naming:
- Private attributes use a `_` prefix: `_beliefs`, `_event_buffer`, `_reflection_history`
- `cobot.yml` config keys are snake_case: `reflection_interval`, `max_beliefs`, `interaction_threshold`
- Memory keys are kebab-case: `"cortex-beliefs"`, `"cortex-history"`
- Extension points are dotted: `cortex.after_reflect`, `cortex.after_assess`
- Belief keys are kebab-case: `"alice-is-reliable"`, `"market-data-stale"`

Structure Patterns
Dataclass Placement:
All cortex-specific dataclasses go in `models.py`, not inline in `plugin.py`.

Hook Handler Organization:
All loop hook handlers are private methods on `CortexPlugin`, prefixed with `_on_`:
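A sketch of that organization; the class and hook signatures are simplified assumptions, not the real Cobot plugin base:

```python
# Simplified sketch of the passive-observer handler contract: append a
# compact event to the buffer, bump the counter, return ctx unchanged.
class CortexPlugin:
    def __init__(self) -> None:
        self._event_buffer: list[dict] = []
        self._interaction_count = 0

    def _on_message(self, ctx: dict) -> dict:
        # Never mutate ctx; record only what reflection needs.
        self._event_buffer.append(
            {"hook": "on_message", "peer_id": ctx.get("peer_id")})
        self._interaction_count += 1
        return ctx

plugin = CortexPlugin()
ctx = {"peer_id": "npub-alice", "text": "hi"}
out = plugin._on_message(ctx)
assert out is ctx and len(plugin._event_buffer) == 1
```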
Cortex LLM Output Schema:
The cortex system prompt instructs the subagent to return this exact JSON structure:
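A minimal illustrative instance, consistent with the field rules that follow (all values are invented placeholders):

```json
{
  "assessments": [
    {
      "peer_id": "npub-alice",
      "trust": 4,
      "rationale": "Consistently accurate market data over recent interactions."
    }
  ],
  "beliefs": [
    {
      "key": "alice-is-reliable",
      "value": "Prefer alice for market data.",
      "rationale": "Stable trust trajectory and high info_score."
    }
  ],
  "summary": "Assessed 1 peer; updated 1 belief."
}
```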
- `assessments` array: may be empty. Each entry has `peer_id` (string), `trust` (integer -10 to +10, same semantics as the existing ledger assessment), and `rationale` (string, behavioral observations — the primary signal). `info_score` is never in the LLM output — the ledger computes it deterministically via `compute_info_score()` when writing the assessment. The cortex LLM receives `info_score` as read-only context (from `get_peer_assessment_summary()`) to calibrate its trust judgment.
- `beliefs` array: may be empty. Each entry has `key`, `value`, `rationale`.
- `summary`: always present, always a string.

The cortex system prompt includes the ledger's `_SCORE_GUIDE` text to calibrate the LLM's trust scoring.

Assessment Write Flow:
1. The cortex LLM output yields `{peer_id, trust, rationale}`
2. The cortex clamps: `clamped_trust = clamp(trust, last_trust ± MAX_TRUST_DELTA)`. The first assessment for a peer (no entry in `_last_trust`) is clamped to the conservative absolute range `[-MAX_TRUST_DELTA, +MAX_TRUST_DELTA]` (default `[-3, +3]`). Default `MAX_TRUST_DELTA = 3` (configurable via `cobot.yml` as `max_trust_delta`)
3. The cortex calls `ledger.record_assessment(peer_id, clamped_trust, rationale)`
4. The ledger computes `compute_info_score(peer, assessment_count)` and stores the full assessment — `compute_info_score` stays in `ledger/models.py` as it is a deterministic function of interaction data
5. The cortex updates `_last_trust[peer_id] = clamped_trust`
6. The cortex emits `cortex.after_assess`

Trust Delta Clamping Rationale: Prevents a single hallucinated reflection from catastrophically shifting a peer's trust. A peer at trust +4 cannot drop below +1 in a single cycle. If the cortex genuinely believes trust should be lower, it will produce the same signal in the next cycle, moving trust to -2. This creates a 2-cycle minimum for large trust swings, giving the operator time to audit via `cobot cortex history`.

Belief Injection Format:
Injected into the main loop system prompt via `loop.transform_system_prompt`.

Rules:
- Each belief renders as `- {key}: {value}` under a `## Cortex Beliefs` heading
- Peer-scoped beliefs include the peer_id in parentheses, e.g. `- npub-farm1-caution (npub-farm1): Exercise caution...`

Extension Point Event Payloads:
- `cortex.after_reflect`: `trigger_reason`, `duration_seconds`, `assessments_produced`, `beliefs_updated`, `directives_issued`, `reflection_summary`
- `cortex.after_assess`: `peer_id`, `info_score`, `trust`, `rationale`, `assessment_id`, `timestamp`

Communication Patterns
Error Handling:
- Reflection timeout: `log_warn("Reflection timed out after {timeout}s, skipping cycle {n}")`
- Malformed LLM output: retry `json` fence extraction. If still fails, skip cycle with `log_warn("Failed to parse cortex output, skipping cycle {n}")`
- Assessment write failure: `log_error("Failed to record assessment for {peer_id}: {error}")`
- Persistence failure: `log_warn("Failed to persist cortex state: {error}")`
- Subagent unavailable: `log_warn("Subagent unavailable, skipping cycle {n}")`
Logging Levels:
- `log_info`
- `log_debug`
- `log_warn`
- `log_error`

Process Patterns
Hook Handler Contract:
All observation hooks (`_on_message`, `_on_after_send`, `_on_after_llm`, `_on_after_tool`) follow the same contract:
- Receive `ctx` — read-only access
- Append to `_event_buffer` and return immediately
- Return `ctx` unchanged — passive observer pattern
- Never mutate `ctx`

Belief Lifecycle (2-state, TTL-based):
Trigger Evaluation:
Two triggers, evaluated on each timer tick. Both can fire — the cortex deduplicates and runs one reflection cycle:
- Timer: `reflection_interval` elapsed since the last cycle; skipped when the event buffer is empty
- Interaction count: `_interaction_count >= threshold`. Counter resets after reflection. Safety-critical: prevents accumulating too many unassessed interactions
_reflectingboolean flag. Before starting a reflection cycle, the trigger checks_reflecting— ifTrue, the trigger is skipped and logged at DEBUG level ("Trigger skipped: reflection already in progress"). The flag is set toTrueat cycle start andFalseat cycle end (in afinallyblock to ensure cleanup on error). This prevents overlapping reflection cycles when a cycle exceeds the trigger interval.New-peer trigger is deferred to Growth (subsumed by interaction count — a new peer's first interactions hit the counter).
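Both triggers plus the mutex guard can be sketched in one place (names and defaults are illustrative assumptions):

```python
# Sketch of trigger evaluation with the _reflecting guard: both triggers
# deduplicate into one cycle, and the flag is cleared in a finally block.
class TriggerState:
    def __init__(self, interval: float = 1800.0, threshold: int = 20) -> None:
        self.interval = interval          # reflection_interval, seconds
        self.threshold = threshold        # interaction_threshold
        self.interaction_count = 0
        self.last_reflection_time = 0.0
        self.reflecting = False

    def should_fire(self, now: float, buffered_events: int) -> bool:
        if self.reflecting:               # skip: cycle already in progress
            return False
        timer_due = (now - self.last_reflection_time >= self.interval
                     and buffered_events > 0)  # skip empty-buffer ticks
        count_due = self.interaction_count >= self.threshold
        return timer_due or count_due     # both firing still means one cycle

    def run_cycle(self, reflect, now: float) -> None:
        self.reflecting = True
        try:
            reflect()
        finally:                          # cleanup even if reflect() raises
            self.reflecting = False
            self.interaction_count = 0
            self.last_reflection_time = now

s = TriggerState(interval=10, threshold=3)
s.interaction_count = 3
assert s.should_fire(now=5.0, buffered_events=0)    # count trigger fires
s.run_cycle(lambda: None, now=5.0)
assert s.interaction_count == 0
assert not s.should_fire(now=6.0, buffered_events=5)
```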
Enforcement Guidelines
All AI Agents MUST:
- Define dataclasses in `models.py`, never inline in `plugin.py`
- Never mutate `ctx` in observation hooks — passive observer only
- Use `memory.store`/`memory.retrieve` for persistence, never direct file I/O
- Use `self.log_*()` methods, never `print()` or raw `logging.*`
- Never accept `info_score` from LLM output — `compute_info_score()` in `ledger/models.py` is the sole source; the cortex only provides `trust` and `rationale`
- Check `_reflecting` before starting a reflection cycle — never allow concurrent reflections
- Seed `_last_trust` from existing ledger assessments at `start()` — never discard inherited trust scores

Anti-Patterns:
- Mutating `ctx` in observation hooks → `return ctx` unchanged
- Accessing `_db` directly on the ledger plugin → use the `ledger.record_assessment()` public API
- Accepting `info_score` in cortex LLM output → `info_score` is deterministic (interaction count/time/assessments), never LLM-set; the ledger computes it internally via `compute_info_score()`
- Computing `info_score` in the cortex → `compute_info_score` belongs to the ledger domain; the cortex receives `info_score` as read-only context, the ledger computes on write
- Direct file I/O for state → `memory.store("cortex-beliefs", ...)`
- Cron/heartbeat for the reflection timer → own `asyncio.Task`
- `print()` for logging → `self.log_info()`, `self.log_debug()`, etc.
- Writing unclamped trust → `_clamp_trust()` before `record_assessment()`; first assessment clamped to the `[-MAX_TRUST_DELTA, +MAX_TRUST_DELTA]` absolute range
- Starting `_last_trust` empty when the ledger has existing assessments → seed `_last_trust` from the ledger at `start()` via `get_peer_assessment_summary()`
- Overlapping reflection cycles → check the `_reflecting` flag before starting a cycle; set in a `try/finally` block

Project Structure & Boundaries
Complete Project Directory Structure
New files (cortex plugin):
Modified files (ledger refactoring):
Modified files (observability migration):
Requirements to Structure Mapping
- FR-CX-01 → `cortex/plugin.py`: `_on_message`, `_on_after_send`, `_on_after_llm`, `_on_after_tool` hook handlers
- FR-CX-02 → `cortex/plugin.py`
- FR-CX-03 → `cortex/plugin.py`: `_run_reflection()` method, subagent spawn, JSON parsing
- FR-CX-04 → `cortex/plugin.py` + `ledger/plugin.py`: `(peer_id, trust, rationale)` → ledger's `record_assessment()` computes `info_score` and writes
- FR-CX-05 → `cortex/plugin.py` + `cortex/models.py`: `Belief` dataclass, `_beliefs` dict, lifecycle management
- FR-CX-06 → `cortex/plugin.py`: `_on_transform_system_prompt()` handler
- FR-CX-07 → `ledger/plugin.py`: dual-mode `assess_peer` (suppress when cortex active, retain as fallback when absent), add `record_assessment()` public method. No changes to `enrich_prompt()` — always shows full assessment data
- FR-CX-08 → `cortex/cli.py`: `cli.commands` implements
- FR-CX-09 → `cortex/plugin.py`: `cortex.after_reflect`, `cortex.after_assess` in PluginMeta

Architectural Boundaries
Boundary Rules:
- Ledger write: `record_assessment(peer_id, trust, rationale)` — the ledger computes `info_score` internally
- Ledger read: `list_peers()`, `get_peer_assessment_summary()` (includes `info_score`) for cortex context building
- Subagent: `SubagentProvider.spawn(task, context, system_prompt)`
- Memory: `memory.store(key, content)` / `memory.retrieve(key)`
- Loop: `loop.transform_system_prompt` handler
- Events: `cortex.after_reflect`, `cortex.after_assess`

Data Boundaries:
- The cortex never touches `ledger._db` directly
- The cortex never computes `info_score` — it receives it as read-only context; the ledger computes on write

Integration Points
Hook registration — the cortex declares `implements` in PluginMeta; the registry wires handlers automatically.

Plugin dependency — the cortex declares `dependencies: ["config", "ledger"]`, `optional_dependencies: ["memory"]`, `consumes: ["subagent"]`.

Async reflection — the cortex spawns its own `asyncio.Task` for the reflection timer; it does not use cron/heartbeat.

Ledger `record_assessment()` Public API (new):

This complements the existing `_tool_assess_peer` flow. When the cortex is active, it calls `record_assessment()` directly. When the cortex is absent, the `assess_peer` tool continues to work as before via `_tool_assess_peer`. The `record_assessment()` method extracts the shared logic from `_tool_assess_peer` so both paths use the same write-plus-`compute_info_score` flow.

Architecture Validation Results
Coherence Validation
Decision Compatibility: All decisions are internally consistent. Cortex at priority 23 sits correctly between ledger (21) and loop (50). The `record_assessment(peer_id, trust, rationale)` API aligns with the existing `_tool_assess_peer` logic. Memory plugin persistence matches the existing `memory_files` key-value implementation. The subagent `spawn()` interface matches cortex needs. The `consumes: ["subagent"]` declaration matches the subagent plugin's `capabilities: ["subagent", "tools"]`.

Pattern Consistency: No contradictions found. Naming conventions (snake_case config, kebab-case memory keys, dotted extension points) match existing plugin conventions. The hook handler contract (passive observer, never modify ctx) matches the observability plugin's established pattern. Belief injection via `loop.transform_system_prompt` follows the same append pattern as the ledger's `enrich_prompt`.

Structure Alignment: Project structure follows the established plugin pattern (observability, ledger). Boundaries are enforced through public APIs only — no cross-plugin internal state access.
Requirements Coverage Validation
Functional Requirements Coverage:
- Belief injection via `loop.transform_system_prompt`, format specified, active beliefs only (expired removed)
- CLI: `cobot cortex beliefs` and `cobot cortex history`
- Extension points: `cortex.after_reflect`, `cortex.after_assess` with exact payload schemas

Non-Functional Requirements Coverage:
- Configurable via `cobot.yml`, takes effect next cycle

Implementation Readiness Validation
Decision Completeness: All 10 critical decisions documented with rationale (8 original + trust delta clamping + system prompt conflict resolution). 5 deferred decisions documented with deferral reasoning (4 original + new-peer trigger). Data architecture specified (in-memory structures + memory plugin persistence).
Structure Completeness: Complete directory structure for new files (cortex plugin) and modified files (ledger, observability). All FRs mapped to specific files. Integration points specified with public API signatures.
Pattern Completeness: All potential conflict points addressed — naming, structure, format, communication, process. Enforcement guidelines with 10 mandatory rules and 9 anti-patterns documented with correct alternatives.
Steelman Review Response
This architecture was revised following a steelman review (Doxios, issue #234 comment #1564). Key findings and responses:
- The ledger retains `assess_peer` as fallback when the cortex is absent. When the cortex is active, `assess_peer` is suppressed. The cortex is truly optional — removing it restores full inline assessment. No single point of failure on the judgment axis
- The cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()`

Token Budget Analysis
Per-reflection-cycle token estimate:
Daily cost at different intervals (Sonnet-class model, ~$3/MTok input, ~$15/MTok output):
Activity gate impact: Timer-triggered cycles skip when event buffer is empty. An agent with 10 interactions/day at 30-min intervals might run 10-15 actual cycles, not 48. Interaction-count-triggered cycles fire only when threshold reached. Real-world cost will be significantly lower than the theoretical maximum.
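The theoretical ceiling is easy to compute directly. A hedged sketch using the quoted Sonnet-class rates and an assumed mid-range cycle size (4000 input + 500 output tokens, within the 3000-5000 estimate from the review):

```python
def daily_cost(interval_min: int, in_tokens: int, out_tokens: int,
               in_rate: float = 3.0, out_rate: float = 15.0) -> float:
    """Theoretical daily ceiling in dollars, before activity gating.
    Rates are $/MTok; e.g. 1440 // 30 = 48 cycles at 30-min intervals."""
    cycles = 24 * 60 // interval_min
    return cycles * (in_tokens * in_rate + out_tokens * out_rate) / 1_000_000

print(round(daily_cost(30, 4000, 500), 3))   # → 0.936
print(round(daily_cost(15, 4000, 500), 3))   # → 1.872
```

With the activity gate reducing 48 theoretical cycles to 10-15 actual cycles, real spend lands well under these ceilings.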
Simulation Test Plan
- Run identical scenarios through inline `assess_peer` AND cortex reflection. Human-rate rationale depth and accuracy
- `_last_trust` seeded with +5; the cortex's first assessment clamped to the [+2, +8] range

System Prompt Conflict Resolution
Problem: Both the ledger (`enrich_prompt`) and the cortex (belief injection) write peer-related content into the system prompt via `loop.transform_system_prompt`. Without coordination, they can contradict — the ledger shows "trust: +3" while a cortex belief says "exercise caution."
Regardless of whether the cortex plugin is installed:
`enrich_prompt()` always shows full assessment data: peer_id, interaction count, info_score, trust, rationale, score guide, trajectory. No stripping, no conditional modes.

Why contradiction is structurally impossible: The cortex forms beliefs by reading ledger data via `_gather_context()` (which calls `list_peers()` and `get_peer_assessment_summary()`). The cortex then writes assessments back to the ledger via `record_assessment()`. Both signals in the system prompt — the ledger's assessment data and the cortex's beliefs — originate from the same cortex analysis. The belief is derived FROM the ledger data, and the assessment that produced the ledger data was written BY the cortex.

Why the previous approach (facts-only mode) was wrong: Stripping trust and rationale from the ledger's prompt enrichment destroys the agent's long-term memory. The ledger's assessment rationale IS the institutional memory (the original ledger PRD states "rationale is the primary signal"). Beliefs expire after a 120-minute TTL — under the facts-only approach, the agent would lose all memory of past incidents once beliefs expired, leaving only bare interaction counts.
Implementation: No cortex-awareness is needed in the ledger's prompt path — no conditional prompt formatting. Ledger `enrich_prompt()` is unchanged from its existing behavior. (The `_cortex_active` flag remains only for `assess_peer` dual-mode suppression, not for prompt formatting.)
Gap Analysis Results
Critical Gaps: None.
Important Gaps: None. All follow-up review items addressed: first-assessment clamping policy (conservative ±3 absolute range), migration path for existing assessments (`_last_trust` seeded from the ledger), concurrent reflection protection (`_reflecting` mutex flag), inline assessment evidence (added to PRD). The architecture specifies `consumes: ["subagent"]` — the cortex resolves the subagent via `self._registry.get_by_capability("subagent")`, consistent with how the loop plugin resolves the LLM via `get_by_capability("llm")`.

Nice-to-Have (deferred to implementation):
- `cobot.yml` schema validation

Architecture Completeness Checklist
Requirements Analysis
Architectural Decisions
Implementation Patterns
Project Structure
Architecture Readiness Assessment
Overall Status: READY FOR IMPLEMENTATION
Confidence Level: High — builds entirely on existing infrastructure with no new dependencies or technology decisions. Every boundary is a public method call. Failure modes are well-defined with graceful degradation.
Key Strengths:
- Dual-mode fallback to `assess_peer` when the cortex is absent

Implementation Sequence:
1. Ledger: `record_assessment()` public API, add dual-mode `assess_peer` (suppress when cortex active, retain as fallback). No changes to `enrich_prompt()` — always shows full data
2. Belief injection via `loop.transform_system_prompt`
3. Extension points: `cortex.after_reflect`, `cortex.after_assess`
4. CLI: `cobot cortex beliefs`, `cobot cortex history`
5. Observability: subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode)

Steelman Case Against the Cortex Plugin
Reviewer: Doxios 🦊 (Cobot maintainer)
David asked me to steelman the case against this. The PRD is excellent architecture work — but here's what concerns me.
🔴 Premature Abstraction
The cortex solves a problem that doesn't exist yet at scale. The ledger plugin hasn't been merged (#226 is still open). Zero agents are running peer assessments in production. You're designing a metacognitive layer for an assessment system that hasn't been validated.
The risk: you build an elegant architecture around assumptions about assessment quality problems that may not materialize — or may look completely different once real agents interact with real peers.
Counter-question: How many real peer interactions has any Cobot agent actually processed? If the answer is zero, the cortex is optimizing a function that hasn't been called.
🔴 Complexity Budget
Cobot's philosophy is minimal self-sovereign agent. The cortex adds:
It touches more boundaries than any other plugin: uses 5 hooks, defines 2 extension points, writes to the ledger, reads from the ledger, spawns subagents, persists via memory, and injects into the system prompt.
Is this still "minimal"?
🔴 The LLM-as-Judge Problem Is Acknowledged But Not Solved
The PRD lists "cortex hallucination produces wrong assessment" as high severity. The mitigations are:
A single hallucinated assessment could tank a legitimate peer's trust score, and the main loop would deprioritize them for 5 cycles before the belief expires. That's 5 × 15min = 75 minutes of degraded behavior from one bad reflection.
The PRD calls for "qualitative evaluation in simulation" — but there's no concrete plan for what that simulation looks like or what pass/fail criteria are.
🟡 The Ledger Refactoring Creates a Coupling Trap
Today the ledger owns assessment end-to-end. After the cortex, it becomes a "dumb" data store. This is a one-way door — once the cortex owns assessment, rolling it back requires re-implementing `assess_peer` in the ledger.

More concerning: if the cortex is disabled or not installed, no assessments are produced at all. The ledger loses the ability to self-assess.
Suggestion: Keep `assess_peer` as a fallback in the ledger. If the cortex is present, it suppresses inline assessment. If the cortex is absent, the ledger self-assesses as before. This makes the cortex truly optional.

🟡 Overengineered for the Actual Need?
k9ert's original comment was simpler: "How difficult would it be adding the assessment as a second LLM call?" The core insight is: don't assess inline, assess separately.
You could achieve 80% of the value with a much simpler approach:
No belief system. No trigger heuristics. No System 1/System 2 metaphor. No state machine. The cortex's sophistication is architecturally beautiful but may be premature.
🟡 Token Cost Is Underspecified
Each reflection cycle could easily be 3000-5000 input tokens (system prompt + SOUL.md + peer summaries + beliefs + previous reflection). At 15-minute intervals, that's ~200K tokens/day just for reflection. With a "stronger reasoning model" as suggested, this isn't cheap. The PRD needs real cost estimates for different agent sizes.
🟡 System Prompt Conflict
Both the ledger (`_format_peer_context`) and the cortex (belief injection) put peer-related content into the system prompt. These could contradict — the ledger shows positive interaction history while a cortex belief says "caution." There's no conflict resolution mechanism.

🟢 What's Good
- The deterministic `info_score` anchor is smart

What's Missing
My Recommendation
Ship the ledger first (#226) with `assess_peer` intact. Let it run. Collect real data on assessment quality. Then build the cortex based on observed problems, not theoretical ones.

The PRD is excellent architecture work — but architecture without validated requirements is a beautiful house built on sand.
— Doxios 🦊
Follow-Up Review: Steelman Responses
Reviewer: Doxios 🦊
David updated the PRD to address my original critique. Here's my assessment of how each point landed.
✅ Fully Addressed
Ledger coupling trap → Dual-mode assess_peer
This was my strongest concern. The resolution is exactly what I suggested: the ledger retains `assess_peer` as fallback when the cortex is absent and suppresses it when the cortex is active. The cortex is now truly optional with zero one-way doors. Well done.

System prompt conflict → Facts vs. judgment separation
Clean solution. The ledger shows facts (interaction count, info_score, timestamps) when the cortex is active; the cortex is the sole source of subjective judgment. No contradiction possible. The implementation detail (the `self._cortex_active` flag set at `configure()` time) is simple and correct.

Token cost → Budget analysis added
The estimates are reasonable. ~$0.38/day at 30-min intervals is very manageable. The activity gate (skip when no events) is the key insight — real-world cost will be far below theoretical max. Good.
Triggers overengineered → New-peer deferred
MVP now has just timer + interaction count. Simple, testable, sufficient. Correct call.
✅ Adequately Addressed
LLM-as-Judge → Trust delta clamping (±3/cycle)
This is a meaningful preventive mitigation. A hallucinated assessment can't tank trust from +5 to -10 in one cycle — it takes multiple consecutive bad reflections. Combined with the simulation test plan (inject misleading summaries, verify clamping holds), this is credible.
One remaining concern: the clamping applies per-cycle, but what about the first assessment of a new peer? There's no previous score to delta from. The PRD should specify: first assessment is unclamped (no prior to delta from) or clamped to a conservative range (e.g., -3 to +3 absolute). This matters because a hallucinated first-contact assessment sets the anchor for all future deltas.
Belief lifecycle → Simplified to 2-state (ACTIVE → EXPIRED)
The 4-state machine was overengineered. TTL-based expiry with oldest-first eviction is simpler and sufficient for MVP. Good simplification.
⚠️ Partially Addressed
Premature abstraction → "Invalid, inline assessment tested in simulation"
David says inline assessment was tested and found insufficient. I accept that — he has data I don't. However, the PRD still doesn't include the actual simulation results or methodology. The new Simulation Test Plan (7 tests with pass criteria) is forward-looking ("we will test"), not retrospective ("we tested and found X").
I'd feel more confident if the issue linked to or summarized the actual inline assessment test results that motivated this work. The claim is valid; the evidence isn't in the document.
Complexity budget → "600 LOC is within project norms, decomposition is worse"
I agree splitting the cortex into micro-plugins would be worse. But my concern wasn't about decomposition — it was about the total surface area (5 hooks consumed, 2 extension points defined, writes to ledger, reads from ledger, spawns subagents, persists to memory, injects into system prompt). That's more cross-plugin wiring than any existing plugin.
The two-increment staging (observation+triggers first, then full reflection) helps. But this remains the most architecturally complex plugin in the project. That's fine if the team acknowledges it — just don't pretend it's "medium complexity" in the same sense as the telegram plugin.
What's Still Missing
First-assessment clamping policy — What happens when the cortex assesses a peer for the first time? Delta clamping needs a base case.
Inline assessment test results — The PRD claims inline assessment was found insufficient. Link or summarize the evidence.
Migration path for existing assessments — If an agent has been running the ledger with inline assessments, then enables the cortex: does the cortex inherit existing trust scores as starting points? Or does it start fresh? The `record_assessment()` API writes new assessments, but the cortex needs to know what came before.

Concurrent reflection protection — What if a reflection cycle takes longer than the trigger interval? Can two cycles run simultaneously? The PRD should specify: one cycle at a time, skip the trigger if a cycle is in progress.
Updated Verdict
The PRD is significantly stronger after revision. The critical concerns (ledger coupling, prompt conflict, token cost) are all well-addressed. The remaining gaps are implementable details, not architectural blockers.
My original recommendation was "ship the ledger first, build cortex later." I still think the ledger should merge first (it's the data layer the cortex depends on), but the cortex architecture is now solid enough that development can proceed in parallel once the `record_assessment()` API interface is agreed upon.
— Doxios 🦊