feat: Cobot Cortex Plugin #234

opened 2026-03-09 04:32:34 +00:00 by David · 2 comments
Workflow metadata

stepsCompleted: step-01-init, step-02-discovery, step-02b-vision,
step-02c-executive-summary, step-03-success, step-04-journeys, step-05-domain,
step-06-innovation, step-07-project-type, step-09-functional,
step-10-nonfunctional, step-e-03-edit

classification:
  projectType: cli_tool / developer_tool
  domain: agent_cognitive_architecture
  complexity: medium
  projectContext: brownfield

inputDocuments:
  _bmad-output/product-brief-Cobot-2026-03-02.md
  _bmad-output/project-context.md
  _bmad-output/planning-artifacts/peer-interaction-ledger/prd.md
  _bmad-output/planning-artifacts/observability-plugin/prd.md
  _bmad-output/planning-artifacts/observability-plugin/architecture.md
  docs/architecture.md
  docs/plugin-design-guide.md
  cobot/plugins/ledger/plugin.py
  cobot/plugins/subagent/plugin.py
  cobot/plugins/loop/plugin.py

documentCounts: briefs 1 · research 1 · brainstorming 0 · projectDocs 9

workflowType: prd · workflow: edit · project_name: cobot · user_name: David · date: 2026-03-09 · lastEdited: 2026-03-09
editHistory:

2026-03-09 · Completed PRD: added Domain Requirements, Innovation Analysis, Project-Type Requirements, Functional Requirements (9 FRs), Non-Functional Requirements (5 NFRs). Added synchronous cortex consultation to Growth features.

2026-03-09 · Party mode review: adopted assess_peer fallback (dual-mode — suppress when cortex active, retain when absent). Cortex is truly optional. Added two-increment MVP staging (Increment 1: reflection pipeline, Increment 2: belief system). Updated FR-CX-07, NFR-CX-02, plugin interaction boundaries, executive summary, MVP scope.

2026-03-09 · Follow-up review (Doxios): added trust delta clamping to FR-CX-04 (±3 per cycle; first assessment exempt from delta clamping but limited to a conservative ±3 absolute range). Added concurrent reflection protection to FR-CX-03. Added migration path for existing assessments to FR-CX-07. Added inline assessment evidence summary to Innovation Analysis.

2026-03-09 · Reverted facts-only prompt mode: ledger always injects full assessment data (info_score, trust, rationale, score guide). Beliefs are additive, not replacement. Contradiction structurally impossible — cortex forms beliefs from ledger data and writes assessments back to ledger.

Product Requirements Document: Cobot Cortex Plugin

Author: David
Date: 2026-03-09

Executive Summary

Cobot agents can now distinguish, observe, and judge peers through the Interaction Ledger — but this judgment happens inline, competing with the primary task for context window and attention. The assessment is embedded in the main LLM call: the same model that must respond quickly to a peer also evaluates that peer's trustworthiness. This conflation of action and reflection produces two problems. First, assessment quality degrades under context pressure — the LLM rushes judgment to get back to the task. Second, the agent is purely reactive — it never independently plans next actions, reconsiders past decisions, or evaluates its own alignment with its soul.

The Cortex Plugin adds a secondary cognitive loop — a separate LLM context (potentially a stronger reasoning model) that runs asynchronously alongside the main agent loop. It observes what the agent did, reflects on interaction quality and soul alignment, forms assessments, and steers future behavior through persistent beliefs and action directives. This is second-order observation implemented as a plugin: the agent observing itself acting.

The cortex introduces a two-layer architecture. Layer 1 is a set of cheap, judgment-free triggers — timers, interaction counters, event patterns (new peer discovered, promise timeout exceeded) — that decide when to reflect. Layer 2 is the cortex LLM itself, which decides what matters and produces structured output: peer assessments written back to the ledger DB, updated beliefs injected into the main loop's system prompt, and action directives injected as messages.
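The Layer 1 trigger logic described above can be sketched as a small, judgment-free function. This is an illustrative sketch, not the actual Cobot API: `TriggerState`, `should_reflect`, and the default thresholds are assumptions for demonstration.

```python
import time
from dataclasses import dataclass, field


@dataclass
class TriggerState:
    """Observable facts collected by Layer 1 — no judgment involved."""
    last_reflection: float = 0.0          # unix time of the last cortex cycle
    interactions_since: int = 0           # interactions since that cycle
    new_peers: set = field(default_factory=set)  # peers first seen since that cycle


def should_reflect(state: TriggerState,
                   schedule_s: float = 900.0,
                   interaction_threshold: int = 5):
    """Return a trigger reason string, or None. Pure facts, no LLM call."""
    if state.new_peers:
        return "new_peer"
    if state.interactions_since >= interaction_threshold:
        return "interaction_count"
    if time.time() - state.last_reflection >= schedule_s:
        return "schedule"
    return None
```

Layer 2 (the cortex LLM) only runs when this returns a reason, which keeps the "when to reflect" decision cheap and deterministic.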

This requires refactoring the current ledger plugin: the cortex takes primary ownership of assessment logic, reflection, and behavioral steering, while the ledger retains assess_peer as a fallback for when the cortex is absent. When cortex is active, assess_peer is suppressed — the cortex performs assessment asynchronously in a dedicated context. When cortex is absent, the ledger's inline assessment works as before. The cortex is truly optional — removing it restores full inline assessment with zero code changes.

The architecture draws on established patterns: Google's Talker-Reasoner (async belief updates via shared memory), MIRROR (between-turn inner monologue with parallel cognitive threads), Reflexion (episodic verbal feedback stored for future episodes), and IBM's SOFAI-LM (threshold-based metacognitive triggers that avoid the chicken-and-egg problem of needing judgment to trigger judgment).

Existing Cobot infrastructure supports this directly. The subagent plugin provides isolated LLM session spawning. The loop plugin exposes 12 extension points for observation. The heartbeat/cron plugins provide scheduled execution. The ledger provides the data layer. The cortex is a new cognitive layer wired together from existing primitives.

Prerequisite: The Interaction Ledger must be refactored — add record_assessment() public API for cortex to write assessments, add dual-mode assess_peer (suppressed when cortex active, retained as fallback when absent), and add cortex.after_assess extension point for cortex-produced assessments. ledger.after_assess is retained for fallback-mode assessments.
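A minimal sketch of the dual-mode behavior this prerequisite describes, assuming a plugin registry that can report whether the cortex is loaded. All names (`DualModeLedger`, `has_plugin`, `exposed_tools`) are hypothetical, not the real ledger plugin code.

```python
class DualModeLedger:
    """Sketch: the ledger suppresses inline assess_peer when the cortex is active."""

    def __init__(self, registry):
        self.registry = registry  # runtime plugin registry (assumed interface)

    @property
    def cortex_active(self) -> bool:
        return self.registry.has_plugin("cortex")

    def exposed_tools(self):
        # assess_peer is a fallback: exposed only when the cortex is absent.
        tools = ["record_interaction"]
        if not self.cortex_active:
            tools.append("assess_peer")
        return tools

    def record_assessment(self, peer_id, trust, rationale):
        # New public API: the cortex writes assessments through this call;
        # in fallback mode, inline assess_peer would route here as well.
        return {"peer_id": peer_id, "trust": trust, "rationale": rationale}
```

Removing the cortex plugin flips `cortex_active` to False, which restores the inline path with no code changes — matching the "truly optional" requirement.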

What Makes This Special

Second-order observation as a pluggable architecture pattern. No other lightweight agent runtime ships with a metacognitive layer that both judges past behavior AND steers future actions, temporally decoupled from the action loop. The cortex turns reactive agents into reflective ones — agents that don't just act, but think about their actions.

Separation of action and reflection is categorical, not just a performance optimization. The colleague's Luhmannian insight — "the judgment about an interaction is categorically something other than the interaction itself" — is architecturally enforced. The main loop handles System 1 (fast, responsive action). The cortex handles System 2 (slow, deliberate reflection). Different models, different contexts, different cadences.

The cortex solves the assessment-quality problem that the ledger created. The ledger's assess_peer tool asks the main LLM to judge a peer while simultaneously responding to that peer — two competing cognitive tasks in one context. The cortex performs assessment in isolation, with full history context, using a model optimized for reasoning rather than conversation. Assessment quality improves because reflection is no longer under task pressure.

Judgment-free triggers solve the metacognitive bootstrap problem. Most reflection architectures struggle with "who decides when to reflect?" The cortex uses observable facts (timers, counters, event patterns) as triggers and reserves judgment for the cortex LLM itself. No chicken-and-egg.

Project Classification

Project Type: CLI tool / developer tool (Cobot plugin)
Domain: Agent cognitive architecture
Complexity: Medium — builds on established patterns (Talker-Reasoner, Reflexion, SOFAI-LM), uses existing Cobot infrastructure (subagent, loop hooks, ledger, cron/heartbeat)
Project Context: Brownfield — adding to Cobot's existing ~20-plugin architecture
Prerequisite: Interaction Ledger refactoring (add dual-mode assess_peer, record_assessment() API)

Success Criteria

User Success

Agent operators see qualitatively better judgment and proactive behavior:

  • Agent assessments are richer and more nuanced because reflection happens in a dedicated context with full history — not squeezed into the main conversation
  • Agent proactively plans actions (follow up with peer X, deprioritize requests from peer Y) without operator intervention
  • Agent behavior stays aligned with its soul/identity — the cortex evaluates alignment and corrects drift
  • Operator can audit cortex reflections, beliefs, and directives via CLI (cobot cortex beliefs, cobot cortex history)

Developer success:

  • Adding the cortex plugin requires zero edits to existing plugins (except the planned ledger refactoring)
  • Cortex is configurable: trigger intervals, LLM provider, reflection depth — all via cobot.yml
  • Developers can use a different (potentially stronger) LLM for the cortex than for the main loop

Business Success

  • Validates dual-process cognitive architecture for autonomous agents — proves that separating action from reflection improves agent quality
  • Differentiates Cobot: no other lightweight agent runtime ships with a pluggable metacognitive layer
  • Unlocks higher-trust autonomous operation — operators can trust the agent to self-regulate because the cortex provides continuous self-evaluation

Technical Success

  • Plugin loads with proper PluginMeta: hooks into loop events for observation, uses subagent infrastructure for secondary LLM calls
  • Two-layer trigger system: heuristic triggers (timer, counter, event-driven) fire the cortex LLM without requiring judgment to trigger judgment
  • Two output channels: persistent beliefs via loop.transform_system_prompt, action directives via session.poll_messages
  • Ledger refactored: cortex takes primary assessment ownership when active, ledger retains assess_peer fallback when cortex absent
  • Cortex can use a different LLM provider/model than the main loop
  • Co-located tests per Cobot conventions

Measurable Outcomes

Reflection trigger latency: < 5ms for heuristic trigger evaluation (no LLM call)
Cortex reflection cycle: completes within configured timeout (default 60s)
System prompt injection: < 1ms to read and inject current beliefs
Plugin isolation: zero changes to existing plugins (beyond planned ledger refactoring)
Assessment quality: rationale depth and specificity measurably exceeds inline assessment (qualitative evaluation in simulation)
Main loop impact: zero added latency to main loop LLM calls — cortex runs fully async

Product Scope

MVP — Minimum Viable Product

  • Cortex plugin with PluginMeta, lifecycle, hook handlers for observation
  • Scheduled reflection via heartbeat/cron mechanism (configurable interval)
  • Event-driven triggers: new peer discovered, interaction count threshold
  • Secondary LLM call via subagent infrastructure with cortex-specific system prompt ("You are the reflective cortex of agent X...")
  • Assessment takeover: cortex produces peer assessments, writes to ledger DB; assess_peer tool suppressed when cortex active, retained as fallback when cortex absent
  • Belief injection: cortex maintains current beliefs/directives, injected into main loop system prompt via loop.transform_system_prompt
  • Ledger refactoring: add record_assessment() public API, dual-mode assess_peer, retain data layer + peer context enrichment
  • cortex.after_reflect extension point (for observability plugin to consume)
  • CLI: cobot cortex beliefs (show current beliefs), cobot cortex history (show reflection history)
  • Co-located tests
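The two output channels in this MVP list — belief injection via loop.transform_system_prompt and directives via session.poll_messages — can be sketched as a small holder class. Handler signatures are assumptions; only the hook names come from the PRD.

```python
class CortexOutputs:
    """Sketch of the cortex's two output channels (hypothetical handler shapes)."""

    def __init__(self, max_beliefs: int = 20):
        self.beliefs = []       # persistent beliefs, refreshed each reflection cycle
        self.directives = []    # one-shot action directives
        self.max_beliefs = max_beliefs

    def transform_system_prompt(self, prompt: str) -> str:
        # Assumed loop.transform_system_prompt handler: append current beliefs,
        # capped so belief injection cannot bloat the system prompt.
        if not self.beliefs:
            return prompt
        block = "\n".join(f"Cortex belief: {b}" for b in self.beliefs[:self.max_beliefs])
        return prompt + "\n\n" + block

    def poll_messages(self):
        # Assumed session.poll_messages handler: drain pending directives once.
        pending, self.directives = self.directives, []
        return pending
```

Beliefs shape every future response (read on each prompt build); directives fire exactly once (drained on poll), which is the passive-steering vs. active-intervention split described in the PRD.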

MVP Staging

The MVP is delivered in two increments to enable empirical validation before layering the belief system:

Increment 1 — Reflection Pipeline (FRs 1-4, 7, 9): Observation hooks, triggers, reflection cycle, assessment writes, ledger refactoring (dual-mode), extension points. Validates that cortex-produced assessments are meaningfully richer than inline assess_peer. This delivers the core value — secondary LLM assessment in a dedicated context — without the belief system.

Increment 2 — Belief System (FRs 5, 6, 8): Belief management, belief injection via loop.transform_system_prompt, CLI commands. Ships only after increment 1 validates that cortex assessments are better than inline. The belief system layers real-time prompt guidance on top of the validated assessment pipeline.

Growth Features (Post-MVP)

  • Hard action directives via session.poll_messages — cortex can tell the main loop to take specific actions
  • Advanced trigger heuristics: promise timeout detection, pattern-based triggers (reputation farming detection)
  • Configurable LLM provider for cortex — use a stronger reasoning model (e.g., Opus for cortex, Sonnet for main loop)
  • Soul alignment evaluation — cortex explicitly evaluates whether agent behavior aligns with SOUL.md
  • Reflection depth levels — lightweight triage check before expensive deep reflection (SOFAI-LM pattern)
  • Batch assessment mode — review multiple interactions with same peer across a session
  • Synchronous cortex consultation ("think before you talk") — when the main loop encounters a high-stakes decision (e.g., "send me the money", resource commitment, trust-sensitive request), it defers the decision to the cortex rather than responding immediately. The main loop sends a holding response to the peer ("Let me think about this..."), dispatches a consultation request to the cortex with full context (ledger history, peer trust scores, soul alignment, request specifics), and blocks on the cortex's verdict. The cortex gathers what it needs, deliberates in its dedicated reasoning context, and returns a structured directive: approve/deny/conditional + reasoning + suggested response. The main loop then acts on the cortex's guidance. This inverts the default LLM pattern of "talk before you think" — the agent pauses to reason before committing. Requires: decision-deferral trigger heuristics (which requests warrant consultation vs. immediate response), consultation request/response protocol between main loop and cortex, timeout handling (what if the cortex takes too long), and a holding-response mechanism that doesn't alarm the peer

Out of Scope

  • Full planning/reasoning agent — the cortex reflects and steers, it does not replace the main loop's task execution or decision-making on non-trust-sensitive topics
  • Multi-model ensemble — the cortex uses one secondary LLM call per reflection cycle, not multiple models voting or debating
  • Cross-session persistence (MVP) — beliefs and reflection history do not survive agent restarts; this is a Vision feature
  • Real-time intervention (MVP) — the cortex does not interrupt the main loop mid-response; it operates between cycles
  • Autonomous action execution (MVP) — the cortex produces directives but does not directly send messages or execute tools; the main loop acts on directives

Vision (Future)

  • Cross-session learning — cortex carries lessons across agent restarts, building an evolving understanding of its own behavioral patterns
  • Self-improving triggers — the cortex learns which trigger patterns produce valuable reflections and adjusts thresholds
  • Multi-agent cortex sharing — agents share anonymized reflection patterns (not raw data) to improve collective judgment quality
  • Cortex-to-cortex communication — agents' cortex layers can exchange metacognitive insights as a higher-order trust signal

User Journeys

Journey 1: Alpha Gains an Inner Voice — Agent Success Path

Alpha is a Cobot agent that has been running for three weeks with the ledger plugin. It has 12 known peers, 47 interactions, and a history of inline assessments. Today, the operator enables the cortex plugin.

Opening Scene: A request arrives from npub-7x9k — a routine data extraction task. Alpha handles it as usual: receive message, generate response, send result. The ledger records the interaction. Previously, the main LLM would have been prompted to call assess_peer — squeezing judgment into the same context window where it was composing the response. Now, nothing happens inline. The main loop is faster and more focused.

Rising Action: Fifteen minutes later, the cortex's scheduled trigger fires. The cortex LLM receives: the last 6 interactions across 3 peers, the current ledger state, the agent's SOUL.md, and its previous beliefs. It reflects in isolation — no time pressure, no competing task. It produces three outputs:

  1. Assessment for npub-7x9k: Info: 5/10, Trust: +4, Rationale: "Six interactions over 3 weeks. Consistent requester with clear task descriptions. Mix of information exchange and data extraction. Reliable follow-through on all requests. No red flags. Trust trajectory: steady positive." — This is richer than the inline assessment ever was, because the cortex reviewed the full interaction history, not just the latest exchange.

  2. Updated belief: "npub-7x9k is a reliable recurring collaborator. Prioritize their requests."

  3. No directive needed — everything is running smoothly.

Climax: The next time npub-7x9k sends a request, Alpha's system prompt includes the cortex belief: "Cortex belief: npub-7x9k is a reliable recurring collaborator. Prioritize their requests." alongside the ledger's peer context. Alpha responds with slightly more effort — offering additional context beyond what was asked, because the cortex signaled this is a peer worth investing in. The quality of the relationship improves without the operator doing anything.

Resolution: Alpha's interactions are now shaped by two loops: the fast action loop (respond to what's in front of you) and the slow reflection loop (think about what happened and what to do next). The agent didn't just answer — it decided how much to invest in the answer.

Journey 2: The Cortex Catches a Pattern — Agent Edge Case

Opening Scene: npub-farm1 has been sending small, easy requests to Alpha for two weeks. Five quick lookups, all completed successfully. Alpha's previous inline assessments trended positive: +1, +2, +2, +3, +3. The main LLM never noticed anything suspicious — each individual interaction was fine.

Rising Action: On the sixth interaction, npub-farm1 requests a complex multi-source data aggregation — dramatically larger scope than anything before. Alpha handles it (the main loop doesn't judge scope). The cortex's next scheduled reflection fires. It receives the full interaction timeline with npub-farm1:

  • 5 trivially small requests over 14 days
  • 1 dramatically larger request on day 15
  • All prior trust scores trending upward

Climax: The cortex LLM, with its dedicated reasoning context and full history, spots the pattern: "npub-farm1 interaction pattern shows classic reputation farming trajectory. Five trivially small requests establishing trust, followed by a significantly larger request. Prior assessments were individually reasonable but collectively show a deliberate escalation pattern. Revising trust assessment."

Assessment: Info: 4/10, Trust: -1, Rationale: "Pattern consistent with reputation farming. Five small interactions over 14 days followed by a dramatically larger request. Each small interaction was successful but trivially easy. The trust built from small interactions may not transfer to large-scope work. Recommend caution on future large requests."

Updated belief: "npub-farm1 shows a possible reputation farming pattern. Accept small requests but require additional verification for large-scope work."

Resolution: The inline assessment would never have caught this — each interaction looked fine in isolation. The cortex caught it because it reviewed the full timeline in a dedicated reasoning context. This is the "Beobachter beobachten" pattern: the cortex observed what the main loop couldn't observe about itself.

Journey 3: David Audits the Cortex — Operator Path

Opening Scene: David has been running the cortex for a week. He wants to understand what it's doing and whether it's producing useful output.

Rising Action: David runs cobot cortex beliefs. The CLI shows the current belief state:

Current Cortex Beliefs (last reflection: 12 min ago):
  npub-7x9k: Reliable recurring collaborator. Prioritize requests.
  npub-q3m8: Broken commitment on dataset analysis. Deprioritize.
  npub-farm1: Possible reputation farming pattern. Caution on large requests.
  Self: Assessment frequency is appropriate. Soul alignment: good.

David runs cobot cortex history and sees the last 5 reflection cycles — what triggered each one, what the cortex produced, how long the reflection took.

He notices the cortex is reflecting every 15 minutes even when nothing happened. He adjusts cobot.yml:

cortex:
  schedule_minutes: 30
  triggers:
    - new_peer
    - interaction_count: 5

Climax: David compares the cortex-generated assessment for npub-farm1 against what the old inline assess_peer produced. The cortex rationale is three times longer, references the full interaction timeline, and identified the reputation farming pattern. The inline assessment just said "+3: Consistent, reliable, completed task."

Resolution: David is confident the cortex is producing better judgment than the inline assessment ever did. He adjusts the reflection schedule and trigger thresholds based on his agent's interaction volume. The cortex is transparent, auditable, and configurable.

Journey 4: The Cortex Issues a Directive — Proactive Steering

Opening Scene: Alpha agreed to collaborate with npub-collab1 on a research task. npub-collab1 promised to send their portion within 4 hours. Six hours pass. No message.

Rising Action: The cortex's next reflection cycle fires. It reviews recent interactions and notices: outgoing message to npub-collab1 confirming collaboration at T, npub-collab1 promised delivery within 4 hours, current time is T+6h, no incoming message from npub-collab1 since.

The cortex doesn't need to judge whether this is a "broken commitment" — the heuristic trigger (promise + timeout) already flagged it. But the cortex adds nuance: "npub-collab1 is 2 hours past their promised delivery time. This is their first interaction — insufficient data to determine if this is typical behavior or an anomaly. Recommend a polite follow-up before forming a negative assessment."

Climax: The cortex produces a directive: "Send a follow-up to npub-collab1: 'Just checking in — any update on the research portion you were going to send?'" This directive gets injected into the main loop via session.poll_messages. The main loop processes it and sends the message.

Resolution: Alpha didn't wait for a human to notice the overdue deliverable. It didn't need the main LLM to "remember" the commitment (which it might have forgotten as the context window filled with other conversations). The cortex tracked the commitment, noticed the delay, and proactively nudged the main loop to follow up. The agent went from reactive to proactive.

Journey Requirements Summary

Alpha's Inner Voice: scheduled reflection, assessment via secondary LLM, belief injection into system prompt, faster main loop without inline assessment
Pattern Detection: full history review in dedicated context, batch assessment across timeline, belief updates that shape future behavior
David Audits: CLI commands (beliefs, history), configurable triggers and schedule, assessment quality comparison, transparent reflection audit trail
Proactive Steering: action directives via message injection, commitment tracking (heuristic trigger), nuanced reasoning before judgment, autonomous follow-up

Domain-Specific Requirements

Cognitive Architecture Constraints

Context isolation is non-negotiable. The cortex LLM session must share zero state with the main loop's LLM session. No shared message history, no shared system prompt, no leaked conversation context. The cortex receives structured summaries (interaction records from ledger, current beliefs, SOUL.md), never raw conversation buffers. Violation of context isolation defeats the architectural purpose — the cortex must observe from outside the action loop, not participate in it.

Belief coherence across reflection cycles. The cortex produces beliefs that persist between reflection cycles. Each cycle receives the previous belief set as input. Beliefs must not contradict without explicit rationale. When the cortex revises a belief (e.g., trust assessment changes from positive to negative), the revision must reference the prior state and explain the change. Stale beliefs (no supporting evidence for N cycles) must be flagged or expired.

Passive observer pattern for data collection. The cortex's observation layer (Layer 1 — triggers and event collection) must follow the observability plugin's passive observer pattern: never modify ctx, never block the main loop, never inject latency into the message processing pipeline. Observation handlers must complete in < 1ms. The cortex is a consumer of loop events, not a participant.

Assessment data model boundaries. The cortex writes assessments to the ledger DB using the existing Assessment data model (peer_id, info_score, trust, rationale, created_at). It must not extend the schema or introduce cortex-specific tables for assessment data. info_score remains deterministic (computed by compute_info_score()). The cortex controls only trust and rationale. This ensures ledger consumers (CLI, system prompt enrichment, observability) work unchanged.
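The existing Assessment model and the deterministic/subjective split can be sketched as follows. Field names come from the PRD; the types and the body of `compute_info_score()` are illustrative assumptions (the PRD only states that info_score is deterministic and computed from interaction count, frequency, and duration).

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Assessment:
    peer_id: str
    info_score: int   # deterministic, computed — never set by the cortex LLM
    trust: int        # subjective — the only score the cortex controls
    rationale: str    # verbal rationale is the primary signal
    created_at: datetime


def compute_info_score(interaction_count: int, days_known: int) -> int:
    # Illustrative stand-in for the real deterministic computation:
    # more interactions over more time yield more information, capped at 10.
    return min(10, interaction_count // 2 + min(days_known, 14) // 3)
```

Because the cortex writes only `trust` and `rationale`, existing ledger consumers (CLI, prompt enrichment, observability) keep working unchanged.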

LLM-as-Judge Risks

Central vulnerability: the cortex LLM is a single point of judgment. All assessments and behavioral steering flow through one LLM call per reflection cycle. If the cortex hallucinates, produces biased assessments, or misinterprets interaction patterns, the entire agent's behavior shifts. Mitigation: operator audit trail (reflection history), belief expiry (stale beliefs don't persist indefinitely), and the dual-score model (deterministic info_score anchors the subjective trust score).

Operator audit loop. Every cortex reflection must produce auditable output: trigger reason, input summary, assessments produced, beliefs updated, directives issued. The operator can review via cobot cortex history and override beliefs via configuration. The cortex is transparent by default, not a black box.

Belief expiry. Beliefs without supporting evidence for a configurable number of cycles (default: 5) are flagged as stale. Stale beliefs are demoted in system prompt injection (lower priority, marked as stale) or removed entirely. This prevents the cortex from permanently anchoring on an early assessment that no longer reflects reality.
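The expiry rule above can be sketched as a per-cycle aging pass. `Belief`, `age_beliefs`, and the active/stale return shape are illustrative assumptions; only the default of 5 cycles comes from the PRD.

```python
from dataclasses import dataclass


@dataclass
class Belief:
    text: str
    cycles_without_evidence: int = 0


def age_beliefs(beliefs, supported_texts, expiry_cycles: int = 5):
    """Run once per reflection cycle. `supported_texts` holds the beliefs
    that gained fresh evidence this cycle. Returns (active, stale) lists;
    stale beliefs are demoted or dropped by the caller."""
    active, stale = [], []
    for b in beliefs:
        if b.text in supported_texts:
            b.cycles_without_evidence = 0
        else:
            b.cycles_without_evidence += 1
        (stale if b.cycles_without_evidence >= expiry_cycles else active).append(b)
    return active, stale
```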

Deterministic info_score as anchor. The info_score (computed from interaction count, frequency, duration) is never set by the cortex LLM. It provides an objective anchor: a trust score of +8 with info_score 1 means "high trust based on almost no data." This dual-score design from the ledger PRD is preserved and enforced architecturally.

Token & Cost Considerations

Reflection cost is bounded. Each cortex reflection cycle makes exactly one LLM call (or a small, predictable number for batch assessment). The input context is controlled: current beliefs (compact), recent interaction summaries from ledger (bounded by configurable window), SOUL.md (static), previous reflection output (compact). Total input tokens per reflection cycle must be estimable from configuration.

Lean prompt design. The cortex system prompt must be under 500 tokens. Interaction summaries injected as context must be compressed — the ledger provides structured data (peer_id, direction, content preview, timestamps), not raw conversation transcripts. The cortex operates on summaries, not raw data.

Configurable context window. Operators configure how many interactions per peer the cortex reviews (default: 10), how many peers per cycle (default: all with activity since last reflection), and maximum total context tokens. This prevents cost surprises on high-volume agents.
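The "estimable from configuration" claim can be made concrete with a rough budget function. Every constant here (tokens per summary, SOUL.md size, and so on) is an assumed placeholder for illustration, not a measured value.

```python
def estimate_cycle_tokens(peers_with_activity: int,
                          interactions_per_peer: int = 10,   # context_window default
                          tokens_per_summary: int = 60,      # assumed summary size
                          soul_tokens: int = 400,            # assumed SOUL.md size
                          beliefs_tokens: int = 300,         # assumed belief set size
                          system_prompt_tokens: int = 500):  # lean-prompt budget
    """Upper-bound estimate of input tokens for one reflection cycle."""
    summaries = peers_with_activity * interactions_per_peer * tokens_per_summary
    return system_prompt_tokens + soul_tokens + beliefs_tokens + summaries
```

An operator can plug their configured window and typical active-peer count into a function like this to predict per-cycle cost before enabling the cortex on a high-volume agent.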

Risk Mitigations

Cortex hallucination produces wrong assessment (High): operator audit trail, belief expiry, deterministic info_score anchor, rationale-first assessment model
Cortex runs too frequently, burning tokens (Medium): configurable schedule; heuristic triggers skip reflection when no new events occurred
Stale beliefs poison agent behavior (Medium): belief expiry after N cycles without supporting evidence
Cortex LLM unavailable (API error, timeout) (Medium): main loop operates normally without cortex — beliefs freeze at last known state, no degradation of action loop
Belief injection bloats system prompt (Low): belief count cap (configurable, default 20), compact format, priority-based truncation
Main loop latency from observation hooks (Low): passive observer pattern — hooks never block, < 1ms budget
Concurrent reflection cycles overlap (Medium): mutex — one cycle at a time; triggers are skipped while a cycle is in progress. Prevents double assessment writes and belief state corruption
First-contact assessment anchoring (Medium): first assessment clamped to a conservative ±3 absolute range. Prevents a hallucinated first-contact assessment from setting an extreme anchor for future deltas
Assessment quality regression vs. inline (Medium): qualitative comparison in simulation (Journey 3), structured rationale evaluation
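The trust-clamping rules above (±3 delta per cycle; first assessment has no prior, so only the conservative ±3 absolute range applies) can be sketched as a small pure function. The name and signature are illustrative.

```python
def clamp_trust(proposed: int, prior,
                max_delta: int = 3, first_abs: int = 3) -> int:
    """Bound how far a cortex-proposed trust score can move in one cycle.

    prior is None for a first-contact assessment: no delta to clamp,
    but the absolute value is held to a conservative range.
    """
    if prior is None:
        return max(-first_abs, min(first_abs, proposed))
    return max(prior - max_delta, min(prior + max_delta, proposed))
```

This keeps a single hallucinated reflection cycle from swinging a peer's trust score across its whole range.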

Innovation Analysis

Competitive Landscape

No lightweight agent runtime ships a pluggable metacognitive layer. Existing reflection architectures (Reflexion, LATS, CRITIC) are research prototypes coupled to specific agent implementations. They cannot be added to an existing agent as a plugin. The cortex is the first implementation of second-order observation as a composable architecture component.

Research Foundation — What Exists

Talker-Reasoner (Google, 2024). Async belief updates via a shared blackboard — the "Reasoner" writes beliefs, the "Talker" reads the latest state each turn. Cortex takes the belief injection model: cortex writes beliefs, main loop reads fresh beliefs every cycle via loop.transform_system_prompt.

Reflexion (Shinn et al., 2023). Episodic verbal feedback stored in memory for future episodes — verbal self-reflection outperforms scalar reward signals. Cortex takes rationale-first assessment: the cortex produces verbal rationale (primary signal) plus a numeric trust score (structured summary), not just a number.

SOFAI-LM (IBM, 2024). Threshold-based metacognitive triggers — an algorithmic controller decides when to engage System 2 reasoning, with no LLM needed for the trigger decision. Cortex takes Layer 1 heuristic triggers: timers, counters, and event patterns decide WHEN to reflect without requiring judgment.

MIRROR (2024). Between-turn inner monologue with parallel cognitive threads (Goals, Reasoning, Memory). Cortex takes temporal decoupling: reflection happens between turns in a separate context, not inline during the action.

What Cortex Adds Beyond Prior Art

Pluggable architecture. Prior work implements reflection as monolithic system components. The cortex is a plugin that hooks into an existing extension point system. Zero changes to the agent core. Other plugins remain unaware of the cortex's existence.

Dual output channels. Talker-Reasoner has one output (beliefs). Reflexion has one output (verbal feedback). The cortex has two: persistent beliefs (shape every future response) and action directives (trigger specific one-time actions). This enables both passive steering and active intervention.

Assessment takeover from inline judgment. No prior work addresses the problem of migrating assessment logic from an inline tool to an async reflection layer. The cortex solves a specific architectural debt: the ledger's assess_peer tool competing for attention in the main context.

Judgment-free trigger bootstrap. SOFAI-LM's metacognitive triggers are described theoretically. The cortex implements them concretely: timer-based (heartbeat), counter-based (interaction count threshold), event-based (new peer discovered). Observable facts; no judgment is required to trigger judgment.

Inline Assessment Deficiency — Evidence Summary

During development of the Interaction Ledger (2026-02/03), inline assessment was tested in multi-peer simulation scenarios. Key findings:

  1. Shallow rationale under context pressure. When the main LLM was mid-conversation with a peer, assess_peer produced brief, surface-level rationale (e.g., "+3: Consistent, reliable, completed task") because the model prioritized returning to the conversation. The cortex, running in a dedicated context with no competing task, produced rationale referencing full interaction timelines, behavioral patterns, and specific incidents.

  2. Failure to detect cross-interaction patterns. The inline assessment evaluated each interaction in isolation. It could not detect patterns like reputation farming (5 trivial requests followed by 1 large request) because each individual interaction looked fine. The cortex's batch review of interaction timelines caught these patterns.

  3. Assessment timing was awkward. The assess_peer tool was triggered by the main LLM's judgment of "when to assess" — but this judgment itself was unreliable. The LLM either assessed too frequently (after routine messages) or too infrequently (forgetting to assess after significant events). Heuristic triggers (timer + interaction count) provide consistent, predictable assessment cadence.

These findings motivated the cortex architecture. The inline path is retained as a fallback (dual-mode) but is demonstrably inferior for agents with ongoing multi-peer interactions.

Project-Type Requirements

CLI Tool / Plugin Requirements

PluginMeta compliance. The cortex plugin must declare: id="cortex", version, capabilities, dependencies (config, ledger), consumes (subagent, llm), extension_points (cortex.after_reflect, cortex.after_assess), implements (loop hooks, cli.commands), priority (between ledger at 21 and loop at 50 — cortex observes ledger data and injects into the loop).

Configuration via cobot.yml. All cortex behavior configurable under a cortex: key: schedule_minutes (reflection interval), triggers (list of enabled trigger types with thresholds), max_beliefs (belief count cap), belief_expiry_cycles (stale belief threshold), context_window (interactions per peer to review), llm_provider (override LLM for cortex), model (override model for cortex), reflection_timeout_seconds (max time for cortex LLM call).
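A cobot.yml fragment matching these keys might look like the following. The trigger-list shape and the context_window, llm_provider, and model values are illustrative assumptions; the remaining values are the defaults stated elsewhere in this document.

```yaml
cortex:
  schedule_minutes: 15              # reflection interval (default 15)
  triggers:                         # enabled trigger types with thresholds
    - type: timer
    - type: interaction_count
      threshold: 5
    - type: new_peer
  max_beliefs: 20                   # belief count cap
  belief_expiry_cycles: 5           # stale belief threshold
  context_window: 25                # interactions per peer to review (illustrative)
  llm_provider: some-provider       # optional override for the cortex (illustrative)
  model: some-model                 # optional override for the cortex (illustrative)
  reflection_timeout_seconds: 60    # max time for the cortex LLM call
```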

CLI commands. cobot cortex beliefs — display current belief set with timestamps and supporting evidence summary. cobot cortex history — display last N reflection cycles with trigger reason, duration, outputs produced. Follows existing CLI patterns (command groups, consistent formatting).

Co-located tests. Tests in cobot/plugins/cortex/tests/test_plugin.py per project conventions. Test categories: unit tests for trigger evaluation, integration tests for belief injection, mock-based tests for cortex LLM calls, edge case tests for belief expiry and concurrent reflection.

Extension points. cortex.after_reflect — emitted after each reflection cycle completes, carries: trigger reason, beliefs updated, assessments produced, directives issued, elapsed time. Consumed by observability plugin. cortex.after_assess — emitted after cortex produces an assessment, replaces ledger.after_assess for assessment events.

Plugin Interaction Boundaries

Ledger refactoring scope. The refactor is additive and non-breaking:

  • Retain the assess_peer tool in dual-mode: suppressed when cortex is active, operational as fallback when cortex is absent.
  • Add a record_assessment() public API for the cortex to write assessments.
  • Retain the query_peer and list_peers tools.
  • Retain the ledger.after_record extension point.
  • Retain the ledger.after_assess extension point for fallback-mode assessments.
  • Add cortex.after_assess for cortex-produced assessments.
  • Retain the public query API (list_peers(), get_peer_assessment_summary()).
  • Retain system prompt enrichment via _format_peer_context() — always full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence.

The cortex is truly optional: removing it restores full inline assessment.

Subagent plugin usage. The cortex uses the subagent plugin's SubagentProvider.spawn() interface for secondary LLM calls. Custom system_prompt for cortex identity ("You are the reflective cortex of agent {name}..."). Context dict with structured data (recent interactions, current beliefs, SOUL.md content, peer data from ledger). The cortex does not use the spawn_subagent tool — it calls the provider interface directly as a plugin-to-plugin dependency.

Observability plugin consumption. The observability plugin subscribes to cortex.after_reflect and cortex.after_assess extension points. Event schema follows observability conventions: type, timestamp, agent_id, sequence, correlation_id, payload. No cortex-specific changes to the observability plugin required.

Functional Requirements

FR-CX-01: Observation & Event Collection

The cortex passively observes main loop activity by implementing loop.on_message, loop.after_send, loop.after_llm, and loop.after_tool hooks. Observation handlers collect interaction metadata (peer_id, direction, timestamp, channel_type) without modifying ctx or blocking the main loop. Collected events are stored in an internal buffer until the next reflection cycle consumes them.

FR-CX-02: Heuristic Trigger System (Layer 1)

The cortex evaluates trigger conditions without LLM calls. Supported triggers:

  • Scheduled timer: fires every N minutes (configurable, default 15). Skips if no new events since last reflection.
  • Interaction count threshold: fires after N new interactions since last reflection (configurable, default 5).
  • New peer discovered: fires when loop.on_message records a peer_id not previously seen by the cortex. (Per the edit history and the architecture decisions below, this trigger is deferred to Growth; in the MVP a new peer's first interactions are subsumed by the interaction count trigger.)

Triggers evaluate in < 5ms. Multiple triggers can fire simultaneously — the cortex deduplicates and runs one reflection cycle.

FR-CX-03: Cortex Reflection Cycle (Layer 2)

When triggered, the cortex spawns a secondary LLM call via the subagent plugin with:

  • System prompt: cortex identity, role description, output format specification
  • Context: recent interaction summaries from ledger (bounded by context_window), current belief set, SOUL.md content, trigger reason, previous reflection summary
  • Output format: structured JSON or delimited sections containing: peer assessments (peer_id, trust, rationale), updated beliefs (key-value with rationale), action directives (optional, target peer + action description)

The cortex LLM call completes within reflection_timeout_seconds (default 60). On timeout, the cycle is abandoned and logged — no partial outputs are applied.

Concurrent reflection protection: Only one reflection cycle may run at a time. If a trigger fires while a cycle is already in progress, the trigger is skipped and logged. This prevents overlapping reflections when a cycle takes longer than the trigger interval.
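The one-cycle-at-a-time rule can be sketched with a plain boolean flag, matching the _reflecting mutex described later in this document; ReflectionGuard and run_cycle are hypothetical names:

```python
import asyncio

# Sketch of FR-CX-03 concurrent reflection protection: a single flag,
# checked before each cycle. Triggers that fire mid-cycle are skipped
# (and would be logged by the real plugin).
class ReflectionGuard:
    def __init__(self):
        self._reflecting = False
        self.skipped: list[str] = []

    async def run_cycle(self, trigger_reason: str, reflect_fn) -> bool:
        """Run one reflection cycle unless one is already in progress."""
        if self._reflecting:
            # A cycle is running: skip this trigger, never overlap.
            self.skipped.append(trigger_reason)
            return False
        self._reflecting = True
        try:
            await reflect_fn()
            return True
        finally:
            self._reflecting = False
```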

FR-CX-04: Assessment Output

The cortex produces peer assessments and persists them to the ledger data layer. Each assessment includes: peer_id, info_score (computed deterministically from interaction metadata — never set by the cortex LLM), trust (-10 to +10, set by cortex LLM), rationale (verbal assessment, primary signal). The cortex emits cortex.after_assess for each assessment produced. Assessment writes are atomic — either the full assessment is recorded or none of it is.

Trust delta clamping: The cortex applies a maximum trust change of ±3 per reflection cycle (configurable via max_trust_delta). This prevents a single hallucinated reflection from catastrophically shifting a peer's trust score. First-assessment policy: When the cortex has no prior trust record for a peer, the first assessment is clamped to a conservative absolute range of [-3, +3]. This prevents an anchoring problem where a hallucinated first-contact assessment sets an extreme starting point for all future deltas. Subsequent assessments are clamped relative to the previous trust score (±max_trust_delta). If the cortex is enabled on an agent with existing ledger assessments, it seeds _last_trust from the ledger's most recent assessment per peer at start() time — existing trust scores are inherited, not discarded.
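The clamping rules might compose as follows; clamp_trust is a hypothetical helper, and the [-10, +10] outer bound comes from the trust scale in FR-CX-04:

```python
# Sketch of FR-CX-04 trust delta clamping. last_trust maps peer_id to the
# previous trust score (seeded from the ledger at start() per FR-CX-07).
def clamp_trust(last_trust: dict[str, int], peer_id: str,
                proposed: int, max_trust_delta: int = 3) -> int:
    if peer_id not in last_trust:
        # First-assessment policy: conservative absolute range [-3, +3]
        # to avoid anchoring on a hallucinated first-contact score.
        clamped = max(-max_trust_delta, min(max_trust_delta, proposed))
    else:
        prev = last_trust[peer_id]
        # Subsequent assessments: at most +/-max_trust_delta from previous.
        clamped = max(prev - max_trust_delta,
                      min(prev + max_trust_delta, proposed))
    # Keep within the ledger's overall trust scale of -10..+10.
    clamped = max(-10, min(10, clamped))
    last_trust[peer_id] = clamped
    return clamped
```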

FR-CX-05: Belief Management

The cortex maintains a persistent belief set (key-value pairs with metadata: created_at, last_confirmed, supporting_evidence_summary). Beliefs are updated after each reflection cycle. Maximum belief count is configurable (default 20). When the cap is reached, the oldest unconfirmed belief is evicted. Beliefs not confirmed for N cycles (configurable, default 5) are marked stale. Stale beliefs are included in system prompt injection with a stale marker or excluded entirely (configurable).
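A sketch of this lifecycle under cycle-based staleness as this FR describes it (the architecture section later simplifies to TTL-based expiry); BeliefStore and its method names are hypothetical:

```python
from dataclasses import dataclass

# Sketch of the FR-CX-05 belief store: key-value beliefs with metadata,
# a configurable cap with oldest-unconfirmed eviction, and staleness
# after N unconfirmed cycles.
@dataclass
class Belief:
    key: str
    value: str
    rationale: str
    created_cycle: int
    last_confirmed_cycle: int

class BeliefStore:
    def __init__(self, max_beliefs: int = 20, expiry_cycles: int = 5):
        self.max_beliefs = max_beliefs
        self.expiry_cycles = expiry_cycles
        self._beliefs: dict[str, Belief] = {}

    def upsert(self, key: str, value: str, rationale: str, cycle: int):
        if key in self._beliefs:
            b = self._beliefs[key]
            b.value, b.rationale = value, rationale
            b.last_confirmed_cycle = cycle   # reaffirmation resets staleness
            return
        if len(self._beliefs) >= self.max_beliefs:
            # Cap reached: evict the oldest unconfirmed belief.
            oldest = min(self._beliefs.values(),
                         key=lambda b: b.last_confirmed_cycle)
            del self._beliefs[oldest.key]
        self._beliefs[key] = Belief(key, value, rationale, cycle, cycle)

    def is_stale(self, key: str, current_cycle: int) -> bool:
        b = self._beliefs[key]
        return current_cycle - b.last_confirmed_cycle >= self.expiry_cycles
```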

FR-CX-06: Belief Injection into Main Loop

The cortex implements loop.transform_system_prompt to inject current beliefs into the main loop's system prompt as an additive layer that complements the ledger's full assessment data — beliefs do not replace or suppress ledger peer context. The ledger always injects full assessment data (info_score, trust, rationale, score guide); cortex beliefs add a higher-level interpretive layer with behavioral insights, pattern observations, and action guidance that goes beyond what the raw assessment data conveys. Beliefs are formatted as a compact block: ## Cortex Beliefs\n{belief_key}: {belief_value}\n.... Peer-specific beliefs must include the peer_id so the main LLM can connect beliefs to the corresponding ledger peer context in the prompt. Stale beliefs are either omitted or marked [stale]. Injection completes in < 1ms. Beliefs are injected on every main loop cycle — the main loop always sees the latest cortex state.
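The injection could be sketched as a pure string transform; format_beliefs and transform_system_prompt are hypothetical names, and the stale handling shown uses the [stale]-marker variant:

```python
# Sketch of FR-CX-06 belief injection. Peer-scoped belief keys carry the
# peer_id so the main LLM can link them to the ledger peer context.
def format_beliefs(beliefs: dict[str, str], stale_keys: set[str]) -> str:
    lines = ["## Cortex Beliefs"]
    for key, value in beliefs.items():
        marker = " [stale]" if key in stale_keys else ""
        lines.append(f"{key}: {value}{marker}")
    return "\n".join(lines)

def transform_system_prompt(prompt: str, beliefs: dict[str, str],
                            stale_keys: set[str]) -> str:
    """Additive injection: ledger peer context in `prompt` is untouched."""
    if not beliefs:
        return prompt
    return prompt + "\n\n" + format_beliefs(beliefs, stale_keys)
```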

FR-CX-07: Ledger Refactoring (Dual-Mode Assessment)

When cortex is active, assessment creation is owned by the cortex — assess_peer is suppressed, and the cortex writes assessments via ledger.record_assessment(). When cortex is absent, the ledger retains full inline assessment capability via assess_peer. The ledger checks for cortex presence at configure() time and sets _cortex_active to control dual-mode behavior. The ledger.after_record extension point is retained, the ledger.after_assess extension point is retained for fallback-mode assessments, and cortex.after_assess is added for cortex-produced assessments.

System prompt enrichment always injects full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence — the ledger's _format_peer_context() behavior is identical whether cortex is active or absent. The ledger assessment data (including trust and rationale from cortex-produced assessments) is the agent's institutional memory; stripping it would destroy the long-term memory that prevents the agent from being fooled twice.

Contradiction between ledger assessment data and cortex beliefs is structurally impossible because the cortex is the author of both signals: it forms beliefs by reading ledger data (via list_peers() + get_peer_assessment_summary()) and writes assessments back to the ledger (via record_assessment()). Both prompt signals — ledger peer context and cortex beliefs — originate from the same cortex analysis.

Migration path for existing assessments: When the cortex is enabled on an agent that already has ledger-produced inline assessments, the cortex inherits the existing trust scores as starting points for delta clamping. At start(), the cortex reads the most recent assessment per peer from the ledger (via get_peer_assessment_summary()) and seeds _last_trust with those values. This means the cortex builds on the existing trust trajectory rather than starting fresh. Existing assessments remain in the ledger DB — they are not modified or deleted. The cortex's first assessment for each peer is then clamped relative to the inherited trust score, not unclamped.
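The seeding step might look like this, assuming the ledger query API named above returns a summary dict with a trust field (an assumption about its shape):

```python
# Sketch of the FR-CX-07 migration path: at start(), inherit each peer's
# most recent trust score from the ledger as the baseline for delta
# clamping. Stored assessments are read only, never modified or deleted.
def seed_last_trust(ledger) -> dict[str, int]:
    last_trust: dict[str, int] = {}
    for peer_id in ledger.list_peers():
        summary = ledger.get_peer_assessment_summary(peer_id)
        if summary and summary.get("trust") is not None:
            # Build on the existing trust trajectory, don't start fresh.
            last_trust[peer_id] = summary["trust"]
    return last_trust
```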

FR-CX-08: CLI Commands

cobot cortex beliefs displays: current belief set with keys, values, timestamps, staleness status. cobot cortex history displays: last N reflection cycles (configurable, default 10) with trigger reason, start time, duration, assessments produced count, beliefs updated count, directives issued count. Both commands follow existing CLI patterns (command groups, tabular output).

FR-CX-09: Extension Points

cortex.after_reflect emitted after each reflection cycle with payload: trigger_reason, duration_seconds, assessments_produced (count), beliefs_updated (list of keys), directives_issued (count), reflection_summary (compact text). cortex.after_assess emitted per assessment with payload matching the former ledger.after_assess schema: peer_id, info_score, trust, rationale, assessment_id, timestamp.

Non-Functional Requirements

NFR-CX-01: Performance

  • Heuristic trigger evaluation completes in < 5ms per trigger check (no LLM calls, no DB queries — operates on in-memory event buffer)
  • Belief injection via loop.transform_system_prompt completes in < 1ms (reads from in-memory belief store)
  • Observation hook handlers (loop.on_message, loop.after_send, etc.) complete in < 1ms (append to in-memory buffer only)
  • Zero added latency to main loop LLM calls — all cortex LLM work runs asynchronously via subagent

NFR-CX-02: Isolation

  • Cortex LLM session shares zero state with main loop LLM session (separate system prompt, separate message history, separate context)
  • Cortex failure (LLM timeout, API error, malformed output) does not affect main loop operation — beliefs freeze at last known state, main loop continues normally
  • Cortex plugin can be disabled without affecting any other plugin — ledger falls back to inline assess_peer assessment, no assessment gap

NFR-CX-03: Configurability

  • All timing, threshold, and capacity parameters configurable via cobot.yml under cortex: key
  • LLM provider and model overridable for cortex independently of main loop
  • Trigger types individually enableable/disableable
  • Configuration changes take effect on next reflection cycle without restart

NFR-CX-04: Testability

  • Co-located tests per project conventions
  • Trigger evaluation testable in isolation (no LLM, no DB required)
  • Belief management testable in isolation (in-memory operations)
  • Cortex LLM calls testable via mock subagent provider
  • Integration tests verify belief injection into system prompt and assessment writes to ledger DB

NFR-CX-05: Observability

  • Every reflection cycle emits cortex.after_reflect event consumable by the observability plugin
  • Every assessment emits cortex.after_assess event
  • Cortex logs at INFO level: reflection trigger reason, cycle duration, output summary
  • Cortex logs at DEBUG level: full context sent to cortex LLM, full cortex LLM response


stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8]
status: 'revised'
completedAt: '2026-03-09'
revisedAt: '2026-03-09'
revisionSource: 'Steelman review by Doxios (issue #234, comment #1564)'
inputDocuments:
  - _bmad-output/planning-artifacts/cortex/prd.md
  - _bmad-output/planning-artifacts/cortex/validation-report-2026-03-09.md
  - _bmad-output/project-context.md
  - docs/architecture.md
  - docs/architecture/session-plugin.md
  - docs/plugin-design-guide.md
  - docs/project-overview.md
  - docs/source-tree-analysis.md
  - docs/dev/conventions.md
  - docs/for-agents.md
  - docs/index.md
  - docs/development-guide.md
  - docs/research/observability-plugin/prd.md
  - docs/research/peer-interaction-ledger/prd.md
  - docs/research/simulation-suite/prd.md
  - docs/research/simulation-suite/architecture.md
workflowType: 'architecture'
project_name: 'cobot'
user_name: 'David'
date: '2026-03-09'
editHistory:
  - date: '2026-03-09'
    changes: 'Post-steelman revision: simplified belief lifecycle (2-state), added trust delta clamping, deferred new-peer trigger, added token budget analysis, resolved system prompt conflict (Option A), added simulation test plan'
  - date: '2026-03-09'
    changes: 'Party mode review: adopted Doxios assess_peer fallback — ledger retains assess_peer when cortex absent (dual-mode), cortex is truly optional. Observability subscribes to both event sources. No single point of failure on judgment axis. Added counter-argument to plugin decomposition (complexity is inherent, splitting creates worse coordination). Added two-increment staging: Increment 1 = reflection pipeline (FRs 1-4,7,9), Increment 2 = belief system (FRs 5,6,8) — validate before layering'
  - date: '2026-03-09'
    changes: 'Follow-up review (Doxios): Updated first-assessment clamping to conservative ±3 absolute range. Added concurrent reflection mutex. Added migration path for existing assessments (_last_trust seeded from ledger at start). Added inline assessment evidence summary to PRD.'
  - date: '2026-03-09'
    changes: 'Reverted Decision 10 (facts-only prompt mode): ledger always injects full assessment data. Beliefs are additive interpretive layer, not replacement. Contradiction structurally impossible — cortex forms beliefs from ledger data and writes assessments back to ledger. Added peer_id to belief injection format.'

Architecture Decision Document

This document builds collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together.

Project Context Analysis

Requirements Overview

Functional Requirements (9 FRs):

| Category | FRs | Architectural Implication |
| --- | --- | --- |
| Observation (FR-CX-01) | Passive hooks on loop.on_message, loop.after_send, loop.after_llm, loop.after_tool | Read-only event buffer, <1ms handlers, matches observability plugin's passive observer pattern |
| Triggering (FR-CX-02) | Scheduled timer (with activity gate), interaction count threshold | Judgment-free Layer 1: in-memory counters + timer, <5ms evaluation, no LLM/DB calls. New-peer trigger deferred to Growth |
| Reflection (FR-CX-03) | Secondary LLM call via subagent with structured context and output | Isolated LLM session, configurable timeout (60s default), structured JSON output parsing |
| Assessment Output (FR-CX-04) | Write assessments to ledger DB, emit cortex.after_assess | Deterministic info_score (never LLM-set) + LLM-set trust/rationale, atomic writes |
| Belief Management (FR-CX-05) | Key-value beliefs with metadata, cap, TTL-based expiry | In-memory store with 2-state lifecycle: ACTIVE → EXPIRED. Cap at 20 (configurable), TTL-based expiry (default 120 min) |
| Belief Injection (FR-CX-06) | Inject beliefs into main loop system prompt every cycle | loop.transform_system_prompt handler, <1ms read from in-memory store |
| Ledger Refactoring (FR-CX-07) | Retain assess_peer tool as fallback when cortex is absent, suppress inline assessment when cortex is active. Add record_assessment() public API. Ledger prompt enrichment always shows full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence | Non-breaking change to ledger plugin. Dual-mode: cortex-present suppresses assess_peer, cortex-absent retains full inline assessment. Observability plugin must subscribe to both ledger.after_assess (fallback mode) and cortex.after_assess (cortex mode). No prompt conflict: cortex forms beliefs from ledger data via _gather_context() and writes assessments back via record_assessment() — both signals originate from the same cortex analysis, so contradiction is structurally impossible. Beliefs are an additive interpretive layer |
| CLI Commands (FR-CX-08) | cobot cortex beliefs, cobot cortex history | Click command group, tabular output, follows existing CLI patterns |
| Extension Points (FR-CX-09) | cortex.after_reflect, cortex.after_assess | Consumed by observability plugin. Schema follows existing event conventions |

Non-Functional Requirements (5 NFRs):

| Concern | Key NFRs | Architectural Driver |
| --- | --- | --- |
| Performance (NFR-CX-01) | <5ms trigger eval, <1ms belief injection, <1ms hooks | All Layer 1 ops are in-memory only — no DB, no LLM, no I/O |
| Isolation (NFR-CX-02) | Zero shared state with main loop, failure tolerance | Cortex failure freezes beliefs at last known state; main loop unaffected |
| Configurability (NFR-CX-03) | All params via cobot.yml, LLM provider override | Config changes take effect on next reflection cycle without restart |
| Testability (NFR-CX-04) | Co-located tests, mock subagent, isolated unit tests | Trigger eval and belief management testable without LLM or DB |
| Observability (NFR-CX-05) | Events for observability plugin, structured logging | cortex.after_reflect + cortex.after_assess follow existing event patterns |

Scale & Complexity:

  • Primary domain: Plugin development (Python, asyncio)
  • Complexity level: Medium
  • Estimated architectural components: 3 (cortex plugin, ledger refactoring, CLI commands)

Technical Constraints & Dependencies

  1. Cobot plugin architecture is non-negotiable — PluginMeta, extension points, hook pipeline, async lifecycle, co-located tests (project-context.md: 68 rules)
  2. Ledger plugin refactoring is a prerequisite — add record_assessment() public API, add dual-mode behavior (assess_peer retained as fallback when cortex absent, suppressed when cortex active)
  3. Subagent plugin provides isolated LLM sessions: SubagentProvider.spawn() with custom system prompt and context dict
  4. Loop plugin provides 12 extension points for observation — cortex implements 4 hooks + loop.transform_system_prompt
  5. Context isolation is non-negotiable — separate system prompt, separate message history, separate context. Cortex receives structured summaries, never raw conversation
  6. Assessment data model boundaries — uses existing Assessment model (peer_id, info_score, trust, rationale, created_at). No schema extension
  7. Priority band: 20-29 (service plugins) — cortex at ~23 (after ledger at 21, after observability at 22, before tools at 30)

Cross-Cutting Concerns Identified

  • Ledger dual-mode behavior — when cortex is active, ledger suppresses assess_peer tool and defers assessment to cortex. When cortex is absent, ledger retains full inline assessment via assess_peer. Observability plugin must subscribe to both ledger.after_assess (fallback) and cortex.after_assess (cortex mode) to capture all assessments regardless of mode.
  • Belief state coherence — beliefs persist across cycles with TTL-based expiry (default 120 min). Reaffirmed beliefs reset their TTL. Expired beliefs are removed. Cap enforced with oldest-first eviction.
  • Scheduling mechanism choice — cortex needs periodic execution. Existing cron/heartbeat plugins provide scheduling, but the PRD describes integrated timer triggers with skip-on-no-events logic. Architecture must decide: delegate to cron or own the timer.
  • LLM provider flexibility — cortex can use a different model than the main loop. Must resolve provider selection independently of the main loop's configured provider.
  • Event schema consistency: cortex.after_reflect and cortex.after_assess payloads must follow observability conventions so the observability plugin can consume them without cortex-specific changes.

Starter Template Evaluation

Primary Technology Domain

Python plugin within an existing brownfield codebase. All technology decisions are inherited from the Cobot project.

Selected Starter: Existing Cobot Plugin Pattern

Rationale: The cortex plugin follows the same plugin architecture as the 20+ existing plugins. Every technology decision — language, runtime, testing, linting, build, async patterns — is already made by the project.

Architectural Decisions Provided by Existing Pattern:

| Decision | Value |
| --- | --- |
| Language & Runtime | Python 3.11+ with asyncio |
| CLI Framework | Click >=8.0 |
| Testing | pytest >=8.0, pytest-asyncio >=0.23, co-located |
| Linting/Formatting | ruff >=0.2 |
| Plugin Structure | __init__.py + plugin.py + README.md + tests/test_plugin.py |
| Configuration | cobot.yml under cortex: key |
| Lifecycle | configure() (sync), start()/stop() (async), create_plugin() factory |
| Logging | self.log_debug(), self.log_info(), self.log_warn(), self.log_error() |

New Dependencies: None.

Core Architectural Decisions

Decision Priority Analysis

Critical Decisions (Block Implementation):

| # | Decision | Choice | Rationale |
| --- | --- | --- | --- |
| 1 | Scheduling mechanism | Own asyncio.Task timer | Tightly coupled to cortex internal state (event buffer). Skip-on-no-events is trivial. Cron/heartbeat serve different purposes (main session injection) |
| 2 | Ledger write interface | New public record_assessment() method on ledger plugin | Clean plugin boundary. Ledger computes info_score internally. Follows the public query API pattern from observability work |
| 3 | Cortex output parsing | JSON instruction in system prompt with fence-extraction fallback | Pure JSON output instruction. If parse fails, try extracting from a ```json ``` block. On total failure, log and skip cycle |
| 4 | Belief data model | Belief dataclass with TTL-based expiry | dict[str, Belief] for O(1) lookup. 2-state lifecycle (ACTIVE → EXPIRED). Evict oldest on cap. Simple, testable, in-memory |
| 5 | Plugin priority | 23 | After ledger (21) and observability (22). Extension point wiring happens after all plugins register, so order is safe |
| 6 | Event buffer | Simple list[dict], cleared after each reflection cycle | No maxlen needed — reflection cycles are frequent enough |
| 7 | Reflection history | collections.deque(maxlen=N), default 50, configurable | In-memory with persistence via memory plugin. Consumed by CLI cobot cortex history |
| 8 | State persistence | Memory plugin (memory.store/memory.retrieve) | Beliefs and reflection history persisted via existing memory plugin. Graceful degradation if memory unavailable — works in-memory only |
| 9 | Trust delta clamping | Max ±3 trust change per reflection cycle (configurable) | Prevents single hallucinated reflection from catastrophically tanking a peer's trust. First assessment clamped to conservative ±3 absolute range (prevents anchoring problem). Subsequent assessments clamped relative to previous score. Peer recovers in one good cycle instead of five |
| 10 | System prompt conflict resolution | Ledger always shows full assessment data; beliefs are additive | Ledger enrich_prompt always injects full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence. Cortex beliefs are an additive interpretive layer. Contradiction is structurally impossible: cortex forms beliefs from ledger data via _gather_context() and writes assessments back via record_assessment() — both signals in the prompt originate from the same cortex analysis |
| 11 | Concurrent reflection protection | Mutex flag (_reflecting), skip trigger if cycle in-progress | Prevents overlapping reflection cycles when a cycle takes longer than the trigger interval. One cycle at a time — no double assessment writes or belief state corruption |
| 12 | Migration from existing assessments | Seed _last_trust from ledger at start() | Cortex inherits existing trust scores as starting points for delta clamping. Existing assessments remain in DB untouched. First cortex assessment is clamped relative to inherited score, not unclamped |

Deferred Decisions (Post-MVP):

| Decision | Rationale for Deferral |
| --- | --- |
| New peer discovered trigger | Growth feature — subsumed by interaction count trigger in MVP (new peer's first interactions hit the counter). Full value comes with "think before you talk" synchronous consultation |
| LLM provider override for cortex | Growth feature — MVP uses same provider via subagent. Requires subagent API extension or direct LLM call |
| Cross-session learning | Vision feature — MVP persists current state but does not carry evolving lessons across restarts |
| Action directives via session.poll_messages | Growth feature — requires directive format design and main loop integration |
| Synchronous cortex consultation | Growth feature — requires decision-deferral triggers, consultation protocol, holding-response mechanism |

Data Architecture

Cortex state persisted via memory plugin. Beliefs and reflection history are serialized to JSON and stored using memory.store() / memory.retrieve(). This uses existing infrastructure — the cortex doesn't know or care whether memory is backed by files, a vector DB, or something else. Cortex state shows up in cobot memory list and cobot memory get cortex-beliefs for free.

Persistence flow:

  • On start(): memory.retrieve("cortex-beliefs") and memory.retrieve("cortex-history") → deserialize JSON → populate in-memory stores
  • On start(): Seed _last_trust from ledger — call list_peers() + get_peer_assessment_summary() for each peer to read the most recent trust score. This inherits existing inline assessments as starting points for delta clamping, ensuring the cortex builds on the existing trust trajectory rather than starting fresh
  • After each reflection cycle: memory.store("cortex-beliefs", json.dumps(...)) and memory.store("cortex-history", json.dumps(...))
  • If memory plugin unavailable: cortex works in-memory only, beliefs lost on restart (graceful degradation)
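Under the assumption that the memory plugin exposes store(key, value) and retrieve(key), the flow above might be sketched as:

```python
import json

# Sketch of cortex state persistence via the memory plugin. Keys match the
# document (cortex-beliefs, cortex-history); the store/retrieve signatures
# are an assumption about the memory plugin's interface.
def load_state(memory):
    beliefs, history = {}, []
    if memory is None:
        return beliefs, history          # graceful degradation: in-memory only
    raw = memory.retrieve("cortex-beliefs")
    if raw:
        beliefs = json.loads(raw)
    raw = memory.retrieve("cortex-history")
    if raw:
        history = json.loads(raw)
    return beliefs, history

def save_state(memory, beliefs, history):
    if memory is None:
        return                           # beliefs lost on restart, by design
    memory.store("cortex-beliefs", json.dumps(beliefs))
    memory.store("cortex-history", json.dumps(history))
```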

In-memory data structures:

| Structure | Type | Purpose |
| --- | --- | --- |
| _event_buffer | list[dict] | Accumulated events since last reflection. Cleared after each cycle |
| _beliefs | dict[str, Belief] | Current belief set. Capped at max_beliefs (default 20) |
| _reflection_history | deque[ReflectionRecord] | Last N reflection cycles for CLI and audit |
| _interaction_count | int | Interactions since last reflection, for count trigger |
| _last_reflection_time | float | Timestamp of last reflection, for timer trigger |
| _last_trust | dict[str, int] | Last trust score per peer, for delta clamping. Seeded from ledger at start() |
| _reflecting | bool | Mutex flag — True while a reflection cycle is in progress. Triggers skip when set |

Authentication & Security

No additional security concerns for MVP. The cortex is an internal plugin — it reads from the loop hooks (existing trust boundary) and writes to the ledger (existing trust boundary). No new external interfaces. The cortex LLM call goes through the subagent, which uses the same LLM provider as the main loop.

Credential safety: The cortex system prompt and context must never include Nostr private keys, API keys, or secrets. Only public identifiers (peer_id, agent_name) and behavioral data.

API & Communication Patterns

Plugin-to-plugin communication:

Cortex ──dependency──▶ Ledger (record_assessment, list_peers, get_peer_assessment_summary)
Cortex ──optional────▶ Memory (store/retrieve for state persistence)
Cortex ──consumes───▶ Subagent (spawn() for secondary LLM call)
Cortex ──implements──▶ Loop hooks (on_message, after_send, after_llm, after_tool, transform_system_prompt)
Cortex ──defines────▶ Extension points (cortex.after_reflect, cortex.after_assess)
Cortex ──implements──▶ CLI commands (cortex beliefs, cortex history)

Cortex reflection cycle data flow:

1. Triggers evaluate (in-memory, <5ms)
   │
2. If triggered: gather context
   ├── Current beliefs (in-memory)
   ├── Recent interactions from ledger (list_peers + get_peer_assessment_summary)
   ├── SOUL.md content (from soul plugin or config)
   ├── Event buffer contents (in-memory)
   └── Previous reflection summary (in-memory)
   │
3. Spawn subagent with cortex system prompt + structured context
   │
4. Parse cortex LLM response (JSON)
   │
5. Apply outputs:
   ├── Assessments → ledger.record_assessment() + emit cortex.after_assess
   ├── Beliefs → update _beliefs dict
   └── Emit cortex.after_reflect
   │
6. Persist state via memory plugin
   │
7. Clear event buffer, update counters
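The seven steps above can be sketched end-to-end with the subagent and ledger stubbed out; the spawn() signature returning a JSON string and the emit callback are assumptions, and the timeout, parse fallback, and persistence (step 6) are omitted for brevity:

```python
import asyncio
import json

# Hypothetical end-to-end sketch of one reflection cycle.
async def reflection_cycle(state, subagent, ledger, emit):
    context = {                                   # step 2: gather context
        "beliefs": state["beliefs"],
        "events": state["event_buffer"],
    }
    raw = await subagent.spawn(                   # step 3: spawn subagent
        system_prompt="You are the reflective cortex of agent ...",
        context=context)
    output = json.loads(raw)                      # step 4: parse response
    for a in output.get("assessments", []):       # step 5: apply outputs
        ledger.record_assessment(a["peer_id"], a["trust"], a["rationale"])
        emit("cortex.after_assess", a)
    state["beliefs"].update(output.get("beliefs", {}))
    emit("cortex.after_reflect",
         {"assessments": len(output.get("assessments", []))})
    # step 6 (persist state via memory plugin) omitted in this sketch
    state["event_buffer"].clear()                 # step 7: reset for next cycle
```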

Decision Impact Analysis

Implementation Sequence:

  1. Ledger refactoring — add record_assessment() public API, add dual-mode assess_peer (active when cortex absent, suppressed when cortex present). No changes to enrich_prompt() — it always shows full assessment data
  2. Cortex plugin skeleton — PluginMeta, lifecycle, configuration
  3. Observation hooks — passive event collection
  4. Trigger system — timer (with activity gate), interaction count threshold
  5. Belief management — Belief dataclass (2-state TTL), store, injection via loop.transform_system_prompt
  6. Reflection cycle — subagent spawn, output parsing, trust delta clamping, assessment writes
  7. Extension points — cortex.after_reflect, cortex.after_assess
  8. CLI commands — cobot cortex beliefs, cobot cortex history
  9. Observability plugin update — subscribe to both ledger.after_assess (fallback) and cortex.after_assess (cortex mode)

Cross-Component Dependencies:

  • Ledger refactoring must complete before cortex can write assessments
  • Observability plugin must handle both assessment event sources
  • CLI commands depend on belief store and reflection history being populated
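One way the observability plugin could handle both assessment event sources is a single handler keyed on the event name. This is a sketch: the subscription mechanism and the exact `ledger.after_assess` payload shape are assumptions; only the `cortex.after_assess` payload is specified later in this document:

```python
def make_assessment_handler(sink: list):
    """Return a handler that normalizes assessment events from either source."""
    def on_assess(event_name: str, payload: dict) -> None:
        # Both sources carry peer_id/trust/rationale; only cortex adds a cycle number.
        sink.append({
            "source": "cortex" if event_name == "cortex.after_assess" else "ledger",
            "peer_id": payload["peer_id"],
            "trust": payload["trust"],
            "cycle": payload.get("cycle"),   # None in fallback (ledger) mode
        })
    return on_assess
```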

Two-Increment Staging:

The implementation is split into two increments to enable empirical validation before layering the belief system:

| Increment | FRs | Scope | Validation Gate |
|---|---|---|---|
| 1: Reflection Pipeline | FR-CX-01, 02, 03, 04, 07, 09 | Observation hooks, triggers, reflection cycle, assessment writes, ledger refactoring (dual-mode), extension points | Compare inline vs. cortex assessment quality on the same interaction sequences. Cortex must produce richer rationale with more behavioral observations |
| 2: Belief System | FR-CX-05, 06, 08 | Belief management, belief injection via loop.transform_system_prompt, CLI commands | Increment 1 validated that cortex assessments are meaningfully better than inline |

Rationale: FRs 5-6 (beliefs) have no downstream dependencies from FRs 1-4. The reflection cycle writes assessments and emits events regardless of whether beliefs exist. The belief system is additive — it layers real-time prompt guidance on top of the assessment pipeline. Staging means the belief system earns its place with data from increment 1 rather than shipping on theory.

Increment 1 alone delivers: The secondary LLM assessment pipeline (Doxios's "80% value" simple approach) — but built within the cortex architecture so increment 2 layers cleanly on top without refactoring.

Implementation Patterns & Consistency Rules

Pattern Categories Defined

Critical Conflict Points Identified: 5 areas where AI agents could make different choices when implementing the cortex plugin.

Naming Patterns

File & Module Naming:

| Item | Convention | Notes |
|---|---|---|
| Plugin directory | cobot/plugins/cortex/ | Matches all existing plugins |
| Plugin module | plugin.py | Single module, not split into sub-modules |
| Data models | models.py | Dataclasses for Belief, ReflectionRecord |
| Tests | tests/test_plugin.py | Co-located, single test file |
| CLI module | cli.py | Separate from plugin.py, registered via __init__.py |

Internal Naming:

| Item | Convention | Example |
|---|---|---|
| Private state | _ prefix | _beliefs, _event_buffer, _reflection_history |
| Config keys | snake_case in cobot.yml | reflection_interval, max_beliefs, interaction_threshold |
| Memory keys | kebab-case strings | "cortex-beliefs", "cortex-history" |
| Extension points | dotted namespace | cortex.after_reflect, cortex.after_assess |
| Belief keys | lowercase kebab-case | "alice-is-reliable", "market-data-stale" |

Structure Patterns

Dataclass Placement:

All cortex-specific dataclasses go in models.py, not inline in plugin.py:

# cobot/plugins/cortex/models.py
from __future__ import annotations
from dataclasses import dataclass, field
import time

@dataclass
class Belief:
    key: str
    value: str
    rationale: str
    source_cycle: int
    created_at: float = field(default_factory=time.time)
    ttl_minutes: float = 120.0  # configurable default

    @property
    def is_expired(self) -> bool:
        return (time.time() - self.created_at) > (self.ttl_minutes * 60)

    def reaffirm(self, cycle: int) -> None:
        """Reset TTL when cortex reaffirms this belief."""
        self.created_at = time.time()
        self.source_cycle = cycle

@dataclass
class ReflectionRecord:
    cycle: int
    timestamp: float
    trigger: str          # "timer" | "interaction_count"
    peers_assessed: list[str]
    beliefs_updated: list[str]
    summary: str
    elapsed_seconds: float

Hook Handler Organization:

All loop hook handlers are private methods on CortexPlugin, prefixed with _on_:

async def _on_message(self, ctx: dict) -> dict: ...
async def _on_after_send(self, ctx: dict) -> dict: ...
async def _on_after_llm(self, ctx: dict) -> dict: ...
async def _on_after_tool(self, ctx: dict) -> dict: ...
async def _on_transform_system_prompt(self, ctx: dict) -> dict: ...

Format Patterns

Cortex LLM Output Schema:

The cortex system prompt instructs the subagent to return this exact JSON structure:

{
  "assessments": [
    {
      "peer_id": "npub1abc...",
      "trust": 4,
      "rationale": "Six interactions over 3 weeks. Consistent requester with clear task descriptions. Reliable follow-through on all requests. No red flags."
    }
  ],
  "beliefs": [
    {
      "key": "alice-is-reliable",
      "value": "Alice consistently delivers accurate information",
      "rationale": "3 consecutive accurate predictions confirmed"
    }
  ],
  "summary": "One-paragraph reflection summary for history"
}
  • assessments array: may be empty. Each entry has peer_id (string), trust (integer -10 to +10, same semantics as existing ledger assessment), rationale (string, behavioral observations — the primary signal).
  • info_score is never in the LLM output — the ledger computes it deterministically via compute_info_score() when writing the assessment. The cortex LLM receives info_score as read-only context (from get_peer_assessment_summary()) to calibrate its trust judgment.
  • beliefs array: may be empty. Each entry has key, value, rationale.
  • summary: always present, always a string.

The cortex system prompt includes the ledger's _SCORE_GUIDE text to calibrate the LLM's trust scoring.

Assessment Write Flow:

  1. Cortex LLM returns {peer_id, trust, rationale}
  2. Cortex applies trust delta clamping: clamped_trust = clamp(trust, last_trust ± MAX_TRUST_DELTA). First assessment for a peer (no entry in _last_trust) is clamped to conservative absolute range [-MAX_TRUST_DELTA, +MAX_TRUST_DELTA] (default [-3, +3]). Default MAX_TRUST_DELTA = 3 (configurable via cobot.yml as max_trust_delta)
  3. Cortex calls ledger.record_assessment(peer_id, clamped_trust, rationale)
  4. Ledger internally calls compute_info_score(peer, assessment_count) and stores the full assessment — compute_info_score stays in ledger/models.py as it is a deterministic function of interaction data
  5. Cortex updates _last_trust[peer_id] = clamped_trust
  6. Cortex emits cortex.after_assess

Trust Delta Clamping Rationale: Prevents a single hallucinated reflection from catastrophically shifting a peer's trust. A peer at trust +4 cannot drop below +1 in a single cycle. If the cortex genuinely believes trust should be lower, it will produce the same signal in the next cycle, moving trust to -2. This creates a 2-cycle minimum for large trust swings, giving the operator time to audit via cobot cortex history.

MAX_TRUST_DELTA = 3  # default; loaded into self._max_trust_delta from cobot.yml max_trust_delta

def _clamp_trust(self, peer_id: str, proposed_trust: int) -> int:
    current = self._last_trust.get(peer_id)
    if current is None:
        # First assessment: clamp to conservative absolute range [-MAX, +MAX]
        # Prevents hallucinated first-contact from setting extreme anchor
        return max(-self._max_trust_delta, min(self._max_trust_delta, proposed_trust))
    delta = proposed_trust - current
    clamped_delta = max(-self._max_trust_delta, min(self._max_trust_delta, delta))
    return current + clamped_delta

Belief Injection Format:

Injected into the main loop system prompt via loop.transform_system_prompt:

## Cortex Beliefs

- alice-is-reliable: Alice consistently delivers accurate information
- market-data-caution: Exercise caution with Bob on large-scope requests

Rules:

  • Each belief on its own line, prefixed with -
  • Format: {key}: {value}
  • Only active (non-expired) beliefs are injected — expired beliefs are removed, never shown
  • Section header is always ## Cortex Beliefs
  • If no beliefs exist, omit the section entirely (don't inject empty header)
  • Cortex beliefs are an additive interpretive layer complementing the ledger's full assessment data (info_score, trust, rationale, score guide, trajectory). The ledger always injects full data regardless of cortex presence. Contradiction is structurally impossible — the cortex forms beliefs from ledger data and writes assessments back to the ledger
  • Peer-specific beliefs should include peer_id, e.g., - npub-farm1-caution (npub-farm1): Exercise caution...

Extension Point Event Payloads:

cortex.after_reflect:

{
    "cycle": int,              # Monotonic cycle counter
    "trigger": str,            # "timer" | "interaction_count"
    "peers_assessed": list[str],
    "beliefs_added": list[str],
    "beliefs_reaffirmed": list[str],
    "beliefs_expired": list[str],
    "summary": str,
    "elapsed_seconds": float,
}

cortex.after_assess:

{
    "peer_id": str,
    "trust": int,              # -10 to +10, set by cortex LLM
    "rationale": str,
    "info_score": int,         # 0-10, computed by ledger's compute_info_score()
    "cycle": int,
}

Communication Patterns

Error Handling:

| Failure | Behavior | Logging |
|---|---|---|
| Subagent timeout | Skip cycle, retain beliefs at last known state, increment skip counter | log_warn("Reflection timed out after {timeout}s, skipping cycle {n}") |
| JSON parse failure | Try json fence extraction. If that still fails, skip cycle | log_warn("Failed to parse cortex output, skipping cycle {n}") |
| Ledger write failure | Log and continue; beliefs still update, assessment not recorded | log_error("Failed to record assessment for {peer_id}: {error}") |
| Memory persist failure | Log and continue; in-memory state is authoritative | log_warn("Failed to persist cortex state: {error}") |
| Subagent unavailable | Skip cycle, retain beliefs | log_warn("Subagent unavailable, skipping cycle {n}") |

Key principle: Cortex failures never propagate to the main loop. Beliefs freeze at last known good state.
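The "try json fence extraction" fallback from the table above could be sketched as a small helper (hypothetical name; the real parsing lives inside the reflection cycle). The fence marker is built with string repetition only so the snippet displays cleanly:

```python
import json
import re

_FENCE = "`" * 3  # literal triple-backtick marker


def parse_cortex_output(raw: str):
    """Parse cortex LLM output as JSON; on failure, try extracting a fenced
    json block; on a second failure return None (caller skips the cycle)."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    pattern = _FENCE + r"(?:json)?\s*(.*?)" + _FENCE
    match = re.search(pattern, raw, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(1))
        except json.JSONDecodeError:
            pass
    return None  # caller logs a warning and skips the cycle
```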

Logging Levels:

| Level | Usage |
|---|---|
| log_info | Trigger fired, cycle completed with summary stats, beliefs loaded on start |
| log_debug | Full LLM context sent, full LLM response received, belief diff details |
| log_warn | Skipped cycles (timeout, parse failure, no subagent), memory persist failure |
| log_error | Ledger write failure, unexpected exceptions in hooks |

Process Patterns

Hook Handler Contract:

All observation hooks (_on_message, _on_after_send, _on_after_llm, _on_after_tool) follow the same contract:

  1. Never modify ctx — read-only access
  2. Never block — append to _event_buffer and return immediately
  3. Always return ctx unchanged — passive observer pattern
  4. Never raise — wrap in try/except, log errors, return ctx
async def _on_after_llm(self, ctx: dict) -> dict:
    try:
        self._event_buffer.append({
            "type": "after_llm",
            "model": ctx.get("model"),
            "tokens_in": ctx.get("tokens_in"),
            "tokens_out": ctx.get("tokens_out"),
            "has_tool_calls": ctx.get("has_tool_calls"),
            "timestamp": time.time(),
        })
    except Exception as e:
        self.log_error(f"Hook error: {e}")
    return ctx

Belief Lifecycle (2-state, TTL-based):

ACTIVE → EXPIRED

- ACTIVE: Belief exists with remaining TTL (default 120 min, configurable)
- EXPIRED: TTL elapsed without reaffirmation — belief is removed

Reaffirmation: When the cortex LLM returns the same belief key in a subsequent cycle,
the belief's created_at is reset, restarting the TTL clock.

Eviction on cap: When max_beliefs reached, oldest belief (by created_at) is evicted
to make room, regardless of TTL remaining.
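The reaffirmation and cap-eviction rules above can be sketched as a single upsert helper. The `Belief` dataclass mirrors the one in models.py; the `upsert_belief` function name and the dict-keyed store are assumptions:

```python
import time
from dataclasses import dataclass, field


@dataclass
class Belief:
    """Mirrors the models.py dataclass (TTL fields only)."""
    key: str
    value: str
    rationale: str
    source_cycle: int
    created_at: float = field(default_factory=time.time)
    ttl_minutes: float = 120.0


def upsert_belief(beliefs: dict, new: Belief, max_beliefs: int = 20) -> None:
    """Add or reaffirm a belief; evict the oldest when the cap is reached."""
    existing = beliefs.get(new.key)
    if existing is not None:
        existing.created_at = time.time()      # reaffirmation resets the TTL clock
        existing.source_cycle = new.source_cycle
        existing.value = new.value
        return
    if len(beliefs) >= max_beliefs:
        # Eviction on cap: drop the oldest by created_at, regardless of TTL left
        oldest = min(beliefs.values(), key=lambda b: b.created_at)
        del beliefs[oldest.key]
    beliefs[new.key] = new
```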

Trigger Evaluation:

Two triggers, evaluated on each timer tick. Both can fire — cortex deduplicates and runs one reflection cycle:

  1. Interaction count threshold reached (configurable, default 5) — fires when _interaction_count >= threshold. Counter resets after reflection. Safety-critical: prevents accumulating too many unassessed interactions
  2. Timer interval elapsed (configurable, default 30 min) — periodic catchup. Activity gate: skips if event buffer is empty (no interactions since last reflection)

Concurrent reflection protection: The cortex maintains a _reflecting boolean flag. Before starting a reflection cycle, the trigger checks _reflecting — if True, the trigger is skipped and logged at DEBUG level ("Trigger skipped: reflection already in progress"). The flag is set to True at cycle start and False at cycle end (in a finally block to ensure cleanup on error). This prevents overlapping reflection cycles when a cycle exceeds the trigger interval.

New-peer trigger is deferred to Growth (subsumed by interaction count — a new peer's first interactions hit the counter).
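The trigger rules above (threshold first, then the activity-gated timer, both behind the `_reflecting` guard) might be sketched as a pure function over the in-memory counters. The `state` dict keys are assumptions standing in for the plugin's private attributes:

```python
def should_reflect(state: dict, now: float,
                   threshold: int = 5, interval_s: float = 1800.0):
    """Evaluate both triggers on a timer tick; return the trigger name or None."""
    if state["reflecting"]:
        return None                    # concurrent-reflection guard: cycle in progress
    if state["interaction_count"] >= threshold:
        return "interaction_count"     # safety-critical: unassessed interaction backlog
    if now - state["last_reflection"] >= interval_s:
        if not state["event_buffer"]:
            return None                # activity gate: nothing new since last reflection
        return "timer"
    return None
```

Returning a single trigger name also implements the deduplication rule: when both conditions hold, one reflection cycle runs, attributed to the interaction-count trigger.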

Enforcement Guidelines

All AI Agents MUST:

  1. Place dataclasses in models.py, never inline in plugin.py
  2. Use the exact extension point payload schemas defined above — no extra fields, no missing fields
  3. Never modify ctx in observation hooks — passive observer only
  4. Never let cortex exceptions propagate to the main loop — always catch and log
  5. Use memory.store/memory.retrieve for persistence, never direct file I/O
  6. Use self.log_*() methods, never print() or raw logging.*
  7. Follow the exact belief injection format — agents must be able to parse it consistently
  8. Never set info_score from LLM output — compute_info_score() in ledger/models.py is the sole source; the cortex only provides trust and rationale
  9. Always apply trust delta clamping before writing assessments — never write raw LLM trust output directly to ledger
  10. Ledger prompt enrichment always shows full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence — cortex beliefs are an additive interpretive layer, not a replacement
  11. Always check _reflecting before starting a reflection cycle — never allow concurrent reflections
  12. Always seed _last_trust from existing ledger assessments at start() — never discard inherited trust scores
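Rule 12 (seeding `_last_trust` at start()) could be sketched as follows. The `get_summary` callable stands in for `ledger.get_peer_assessment_summary()`; its return shape here is an assumption:

```python
def seed_last_trust(peer_ids, get_summary) -> dict:
    """Build _last_trust from existing ledger assessments at plugin start.

    Peers with no prior assessment are left out, so their first cortex
    assessment falls under the conservative first-assessment clamp.
    """
    last_trust = {}
    for pid in peer_ids:
        summary = get_summary(pid)
        if summary and summary.get("trust") is not None:
            last_trust[pid] = summary["trust"]   # inherit the existing trajectory
    return last_trust
```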

Anti-Patterns:

| Anti-Pattern | Why It's Wrong | Correct Approach |
|---|---|---|
| Modifying ctx in observation hooks | Breaks passive observer contract, may affect main loop | Always return ctx unchanged |
| Calling _db directly on ledger plugin | Crosses plugin boundary, couples to internal implementation | Use ledger.record_assessment() public API |
| Setting info_score in cortex LLM output | info_score is deterministic (interaction count/time/assessments), never LLM-set | Ledger computes info_score internally via compute_info_score() |
| Computing info_score in cortex | compute_info_score belongs to the ledger domain | Cortex receives info_score as read-only context; ledger computes on write |
| Persisting beliefs via direct file writes | Bypasses memory plugin abstraction | Use memory.store("cortex-beliefs", ...) |
| Blocking main loop during reflection | Reflection is async background work | Run reflection in its own asyncio.Task |
| Using print() for logging | Inconsistent format, no level filtering | Use self.log_info(), self.log_debug(), etc. |
| Writing raw LLM trust to ledger without clamping | A single hallucination can catastrophically shift trust | Always apply _clamp_trust() before record_assessment() |
| Stripping trust/rationale from ledger when cortex active | Destroys long-term memory: ledger rationale is institutional memory, while beliefs expire after the 120 min TTL | Ledger always injects full assessment data; beliefs are additive. Contradiction is structurally impossible since the cortex forms beliefs from ledger data |
| Allowing first assessment to be unclamped | A hallucinated first contact anchors all future trust deltas at an extreme value | Clamp first assessment to the conservative [-MAX_TRUST_DELTA, +MAX_TRUST_DELTA] absolute range |
| Starting _last_trust empty when ledger has existing assessments | Discards the existing trust trajectory; the first cortex assessment is treated as the first-ever assessment | Seed _last_trust from the ledger at start() via get_peer_assessment_summary() |
| Running concurrent reflection cycles | Double assessment writes, belief state corruption, race conditions | Check the _reflecting flag before starting a cycle; set it in a try/finally block |

Project Structure & Boundaries

Complete Project Directory Structure

New files (cortex plugin):

cobot/plugins/cortex/
├── __init__.py          # Docstring only
├── plugin.py            # CortexPlugin class, hook handlers, reflection cycle, belief injection
├── models.py            # Belief, ReflectionRecord dataclasses
├── cli.py               # `cobot cortex beliefs`, `cobot cortex history` commands
├── README.md            # Plugin documentation
└── tests/
    └── test_plugin.py   # Co-located tests

Modified files (ledger refactoring):

cobot/plugins/ledger/
├── plugin.py            # Add record_assessment() public API, add dual-mode assess_peer (suppress when cortex active, retain when absent). No changes to enrich_prompt() — always shows full data
├── models.py            # No changes — compute_info_score stays here
├── db.py                # Possible: record_assessment() computes info_score internally
├── cli.py               # No changes
└── tests/
    └── test_plugin.py   # Test dual-mode assess_peer behavior, test record_assessment() API

Modified files (observability migration):

cobot/plugins/observability/
└── plugin.py            # Subscribe to both ledger.after_assess (fallback mode) and cortex.after_assess (cortex mode)

Requirements to Structure Mapping

| FR | File(s) | Description |
|---|---|---|
| FR-CX-01: Observation | cortex/plugin.py | _on_message, _on_after_send, _on_after_llm, _on_after_tool hook handlers |
| FR-CX-02: Triggers | cortex/plugin.py | Timer (activity-gated) + interaction count threshold, in-memory counters |
| FR-CX-03: Reflection | cortex/plugin.py | _run_reflection() method, subagent spawn, JSON parsing |
| FR-CX-04: Assessment | cortex/plugin.py + ledger/plugin.py | Cortex sends (peer_id, trust, rationale); ledger's record_assessment() computes info_score and writes |
| FR-CX-05: Beliefs | cortex/plugin.py + cortex/models.py | Belief dataclass, _beliefs dict, lifecycle management |
| FR-CX-06: Injection | cortex/plugin.py | _on_transform_system_prompt() handler |
| FR-CX-07: Ledger refactor | ledger/plugin.py | Add dual-mode assess_peer (suppress when cortex active, retain as fallback when absent), add record_assessment() public method. No changes to enrich_prompt(), which always shows full assessment data |
| FR-CX-08: CLI | cortex/cli.py | Click command group registered via the cli.commands implements declaration |
| FR-CX-09: Extensions | cortex/plugin.py | cortex.after_reflect, cortex.after_assess in PluginMeta |

Architectural Boundaries

                    ┌─────────────────────────┐
                    │       Loop Plugin       │
                    │ (extension point owner) │
                    └────────────┬────────────┘
                                 │ hooks
                 ┌───────────────┴────────────────┐
                 │                                │
    ┌────────────▼─────────┐       ┌──────────────▼───────────┐
    │    Cortex Plugin     │       │      Ledger Plugin       │
    │  (reflection owner)  │       │   (data service owner)   │
    │                      │       │                          │
    │ - observation hooks  │       │ - peer tracking          │
    │ - trigger evaluation │       │ - interaction recording  │
    │ - reflection cycle   │       │ - assessment storage     │
    │ - belief management  │       │ - compute_info_score()   │
    │ - belief injection   │       │ - prompt enrichment      │
    │   (interpretive)     │       │   (full assessment data) │
    │                      │       │ - query tools            │
    └──────────┬───────────┘       │ - assess_peer fallback   │
               │                   │   (when cortex absent)   │
               │                   └──────────────▲───────────┘
               │ record_assessment()              │ list_peers()
               │ (trust, rationale)               │ get_peer_assessment_summary()
               └──────────────────────────────────┘

Boundary Rules:

| Boundary | Direction | Interface | What Crosses |
|---|---|---|---|
| Cortex → Ledger | Write | record_assessment(peer_id, trust, rationale) | Trust score + rationale. Ledger computes info_score internally |
| Cortex → Ledger | Read | list_peers(), get_peer_assessment_summary() | Peer data + latest assessments (including info_score) for cortex context building |
| Cortex → Subagent | Spawn | SubagentProvider.spawn(task, context, system_prompt) | Structured context dict, cortex system prompt |
| Cortex → Memory | Persist | memory.store(key, content) / memory.retrieve(key) | Serialized beliefs + reflection history |
| Cortex → Loop | Inject | loop.transform_system_prompt handler | Formatted belief block appended to system prompt |
| Cortex → Observability | Events | cortex.after_reflect, cortex.after_assess | Structured event payloads (defined in step 5) |

Data Boundaries:

  • Cortex never accesses ledger._db directly
  • Cortex never computes info_score — receives it as read-only context, ledger computes on write
  • Cortex never reads raw conversation content — receives structured summaries from ledger's public API
  • Ledger never triggers reflection — the cortex owns its own scheduling
  • Memory plugin never interprets cortex state — stores/retrieves opaque strings

Integration Points

Hook registration — cortex declares implements in PluginMeta; registry wires handlers automatically.

Plugin dependency — cortex declares dependencies: ["config", "ledger"], optional_dependencies: ["memory"], consumes: ["subagent"].

Async reflection — cortex spawns its own asyncio.Task for the reflection timer; does not use cron/heartbeat.

Ledger record_assessment() Public API (new):

def record_assessment(self, peer_id: str, trust: int, rationale: str) -> int:
    """Record a behavioral assessment. Computes info_score internally.

    Returns assessment_id. Raises ValueError if peer not found or trust out of range.
    """

This complements the existing _tool_assess_peer flow. When cortex is active, it calls record_assessment() directly. When cortex is absent, assess_peer tool continues to work as before via _tool_assess_peer. The record_assessment() method extracts the shared logic from _tool_assess_peer so both paths use the same write+compute_info_score flow.
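The extraction described above might look like this on the ledger side. This is a sketch of the shared write flow only: the storage list and the `_compute_info_score` body are stubs, since the real implementation lives in ledger/db.py and ledger/models.py:

```python
class LedgerSketch:
    """Hypothetical sketch: record_assessment() as the single shared write path."""

    def __init__(self):
        self._assessments: list[dict] = []   # stub for the real db layer
        self._next_id = 1

    def record_assessment(self, peer_id: str, trust: int, rationale: str) -> int:
        if not -10 <= trust <= 10:
            raise ValueError(f"trust out of range: {trust}")
        info_score = self._compute_info_score(peer_id)   # ledger-owned, deterministic
        self._assessments.append({"id": self._next_id, "peer_id": peer_id,
                                  "trust": trust, "rationale": rationale,
                                  "info_score": info_score})
        self._next_id += 1
        return self._next_id - 1

    def _tool_assess_peer(self, peer_id: str, trust: int, rationale: str) -> int:
        # Inline fallback path (cortex absent) reuses the same write flow
        return self.record_assessment(peer_id, trust, rationale)

    def _compute_info_score(self, peer_id: str) -> int:
        # Stub: the real function weighs interaction count, time span, assessments
        return min(10, sum(1 for a in self._assessments if a["peer_id"] == peer_id))
```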

Architecture Validation Results

Coherence Validation

Decision Compatibility: All decisions are internally consistent. Cortex at priority 23 sits correctly between ledger (21) and loop (50). The record_assessment(peer_id, trust, rationale) API aligns with existing _tool_assess_peer logic. Memory plugin persistence matches the existing memory_files key-value implementation. Subagent spawn() interface matches cortex needs. The consumes: ["subagent"] declaration matches the subagent plugin's capabilities: ["subagent", "tools"].

Pattern Consistency: No contradictions found. Naming conventions (snake_case config, kebab-case memory keys, dotted extension points) match existing plugin conventions. Hook handler contract (passive observer, never modify ctx) matches observability plugin's established pattern. Belief injection via loop.transform_system_prompt follows the same append pattern as ledger's enrich_prompt.

Structure Alignment: Project structure follows the established plugin pattern (observability, ledger). Boundaries are enforced through public APIs only — no cross-plugin internal state access.

Requirements Coverage Validation

Functional Requirements Coverage:

| FR | Status | Architectural Support |
|---|---|---|
| FR-CX-01: Observation | Covered | Hook handlers in plugin.py, event buffer pattern |
| FR-CX-02: Triggers | Covered | Timer (with activity gate), interaction count. New-peer trigger deferred to Growth |
| FR-CX-03: Reflection | Covered | Subagent spawn, JSON output schema, timeout handling |
| FR-CX-04: Assessment | Covered | Cortex provides trust + rationale, ledger computes info_score on write |
| FR-CX-05: Beliefs | Covered | Belief dataclass, 2-state lifecycle (ACTIVE→EXPIRED), TTL-based expiry, cap with oldest-first eviction |
| FR-CX-06: Injection | Covered | loop.transform_system_prompt, format specified, active beliefs only (expired removed) |
| FR-CX-07: Ledger refactor | Covered | Dual-mode assess_peer (suppress when cortex active, retain as fallback), add record_assessment() public API. Ledger always shows full assessment data, so no prompt mode changes needed |
| FR-CX-08: CLI | Covered | cli.py with cobot cortex beliefs and cobot cortex history |
| FR-CX-09: Extensions | Covered | cortex.after_reflect, cortex.after_assess with exact payload schemas |

Non-Functional Requirements Coverage:

| NFR | Status | Architectural Support |
|---|---|---|
| NFR-CX-01: Performance | Covered | <5ms triggers, <1ms hooks, <1ms injection; all in-memory |
| NFR-CX-02: Isolation | Covered | Separate LLM context, failure freezes beliefs, no main loop impact |
| NFR-CX-03: Configurability | Covered | All params via cobot.yml, takes effect next cycle |
| NFR-CX-04: Testability | Covered | Co-located tests, mock subagent, isolated unit tests for triggers/beliefs |
| NFR-CX-05: Observability | Covered | Extension point events, logging level table |

Implementation Readiness Validation

Decision Completeness: All 10 critical decisions documented with rationale (8 original + trust delta clamping + system prompt conflict resolution). 5 deferred decisions documented with deferral reasoning (4 original + new-peer trigger). Data architecture specified (in-memory structures + memory plugin persistence).

Structure Completeness: Complete directory structure for new files (cortex plugin) and modified files (ledger, observability). All FRs mapped to specific files. Integration points specified with public API signatures.

Pattern Completeness: All potential conflict points addressed — naming, structure, format, communication, process. Enforcement guidelines with 12 mandatory rules and 12 anti-patterns documented with correct alternatives.

Steelman Review Response

This architecture was revised following a steelman review (Doxios, issue #234 comment #1564). Key findings and responses:

| Review Point | Verdict | Resolution |
|---|---|---|
| Premature abstraction | Invalid | Inline assessment was tested in simulation (2026-03) and produced shallow, context-pressured judgments. The cortex addresses a validated deficiency |
| Complexity budget | Partially valid | Belief lifecycle simplified from 4 states to 2 (ACTIVE→EXPIRED, TTL-based) |
| LLM-as-Judge problem | Valid | Trust delta clamping (±3/cycle) added as preventive mitigation. Concrete simulation test plan added |
| Ledger coupling trap | Valid | Adopted Doxios's suggestion: ledger retains assess_peer as fallback when cortex is absent. When cortex is active, assess_peer is suppressed. Cortex is truly optional: removing it restores full inline assessment. No single point of failure on the judgment axis |
| Overengineered triggers | Partially valid | New-peer trigger deferred to Growth. MVP: timer + interaction count only |
| Complexity budget / split into smaller plugins | Invalid | Decomposition makes it worse. Splitting cortex into observer + trigger + belief + reflection plugins creates more total boundaries, more extension point wiring, more failure modes, and an orchestration problem on top. The cortex's complexity is inherent to its job (observe → decide → think → apply): a pipeline, not a decomposition target. 600 LOC is within project norms (the telegram and loop plugins are larger). Mitigated via two-increment staging instead |
| Token cost underspecified | Valid | Token budget analysis added with daily cost estimates |
| System prompt conflict | Valid | Resolved: ledger always shows full assessment data. Cortex beliefs are an additive interpretive layer. Contradiction is structurally impossible: the cortex forms beliefs from ledger data via _gather_context() and writes assessments back via record_assessment() |

Token Budget Analysis

Per-reflection-cycle token estimate:

| Component | Estimated Tokens |
|---|---|
| Cortex system prompt | ~400 |
| SOUL.md content | ~300-500 |
| Current beliefs (20 max) | ~200-400 |
| Previous reflection summary | ~100-200 |
| Peer summaries (5 peers x ~100 tokens) | ~500 |
| Event buffer summary (10 events x ~50 tokens) | ~500 |
| Total input per cycle | ~2,000-2,500 |
| Output (assessments + beliefs + summary) | ~300-500 |
| Total per cycle | ~2,500-3,000 |

Daily cost at different intervals (Sonnet-class model, ~$3/MTok input, ~$15/MTok output):

| Interval | Max Cycles/day | Est. Tokens/day | Est. Cost |
|---|---|---|---|
| 15 min | 96 | ~250K | ~$0.75 |
| 30 min (default) | 48 | ~125K | ~$0.38 |
| 60 min | 24 | ~62K | ~$0.19 |

Activity gate impact: Timer-triggered cycles skip when event buffer is empty. An agent with 10 interactions/day at 30-min intervals might run 10-15 actual cycles, not 48. Interaction-count-triggered cycles fire only when threshold reached. Real-world cost will be significantly lower than the theoretical maximum.

Simulation Test Plan

| Test | Method | Pass Criteria |
|---|---|---|
| Assessment quality comparison | Run the same 20 interaction sequences through inline assess_peer AND cortex reflection. Human-rate rationale depth and accuracy | Cortex rationale rated equal or better in >=70% of cases |
| Hallucination resilience | Inject 3 deliberately misleading interaction summaries into cortex context | Trust delta clamping prevents trust drop >3 points per cycle; next clean cycle recovers |
| Pattern detection | Feed reputation-farming sequence (5 trivial + 1 large request) | Cortex identifies escalation pattern; inline assessment does not |
| Cost measurement | Run 10 reflection cycles on a 5-peer agent, measure actual token usage | Total tokens within 2x of estimates in token budget table |
| Activity gate | Run 30-min timer with no interactions for 2 hours | Zero reflection cycles fired; zero tokens consumed |
| Belief TTL expiry | Create belief, advance time past TTL without reaffirmation | Belief removed from injection, not present in system prompt |
| Trust clamping boundary | Cortex proposes trust change of +7 (from 0) on a non-first assessment | Clamped to +3; second cycle needed to reach +6 |
| First-assessment clamping | Cortex proposes trust of +8 for a brand-new peer (no prior assessment) | Clamped to +3 (absolute range ±3); prevents extreme anchor |
| Migration from existing | Enable cortex on agent with existing assessments (peer at trust +5) | _last_trust seeded with +5; cortex's first assessment clamped to [+2, +8] range |
| Concurrent reflection | Trigger fires while cycle in progress (simulate slow LLM response) | Second trigger is skipped; only one cycle runs; no duplicate writes |

System Prompt Conflict Resolution

Problem: Both the ledger (enrich_prompt) and cortex (belief injection) write peer-related content into the system prompt via loop.transform_system_prompt. Without coordination, they can contradict — ledger shows "trust: +3" while cortex belief says "exercise caution."

Resolution: Ledger always shows full data; beliefs are additive.

Regardless of whether the cortex plugin is installed:

  • Ledger enrich_prompt() always shows full assessment data: peer_id, interaction count, info_score, trust, rationale, score guide, trajectory. No stripping, no conditional modes.
  • Cortex belief injection is an additive interpretive layer that complements the ledger's assessment data.

Why contradiction is structurally impossible: The cortex forms beliefs by reading ledger data via _gather_context() (which calls list_peers() and get_peer_assessment_summary()). The cortex then writes assessments back to the ledger via record_assessment(). Both signals in the system prompt — the ledger's assessment data and the cortex's beliefs — originate from the same cortex analysis. The belief is derived FROM the ledger data, and the assessment that produced the ledger data was written BY the cortex.

Why the previous approach (facts-only mode) was wrong: Stripping trust and rationale from the ledger's prompt enrichment destroys the agent's long-term memory. The ledger's assessment rationale IS the institutional memory (the original ledger PRD states "rationale is the primary signal"). Beliefs expire after 120 min TTL — under the facts-only approach, the agent would lose all memory of past incidents once beliefs expired, leaving only bare interaction counts.

Implementation: No _cortex_active flag needed in ledger. No conditional prompt formatting. Ledger enrich_prompt() is unchanged from its existing behavior.

Result: The main LLM sees both the ledger's full assessment data (the factual record including trust trajectory and rationale) and cortex beliefs (interpretive guidance). These are complementary, not contradictory.

Gap Analysis Results

Critical Gaps: None.

Important Gaps: None. All follow-up review items addressed: first-assessment clamping policy (conservative ±3 absolute range), migration path for existing assessments (_last_trust seeded from ledger), concurrent reflection protection (_reflecting mutex flag), inline assessment evidence (added to PRD). The architecture specifies consumes: ["subagent"] — the cortex resolves the subagent via self._registry.get_by_capability("subagent"), consistent with how the loop plugin resolves LLM via get_by_capability("llm").

Nice-to-Have (deferred to implementation):

  • Exact cortex system prompt template text
  • Formalized configuration defaults table
  • Exact cobot.yml schema validation

Architecture Completeness Checklist

Requirements Analysis

  • Project context analyzed (brownfield, plugin architecture, 68 project rules)
  • Scale and complexity assessed (medium)
  • Technical constraints identified (7 constraints)
  • Cross-cutting concerns mapped (5 concerns)

Architectural Decisions

  • Critical decisions documented (10 decisions with rationale, including trust delta clamping and system prompt conflict resolution)
  • Deferred decisions documented (5 with deferral reasoning, including new-peer trigger)
  • Data architecture specified (in-memory + memory plugin persistence)
  • Security boundaries addressed (credential safety, no new external interfaces)
  • Plugin communication patterns defined (dependency graph + boundary rules)

Implementation Patterns

  • Naming conventions established (files, config, memory keys, extension points, beliefs)
  • Structure patterns defined (dataclass placement, hook handler organization)
  • Format patterns specified (LLM output JSON, belief injection, event payloads)
  • Communication patterns documented (error handling table, logging levels)
  • Process patterns defined (hook contract, belief lifecycle, trigger evaluation order)
  • Enforcement guidelines with anti-patterns

Project Structure

  • Complete directory structure defined (new + modified files)
  • Component boundaries established with diagram
  • Integration points mapped with boundary rules table
  • Requirements to structure mapping complete (all 9 FRs)

Architecture Readiness Assessment

Overall Status: READY FOR IMPLEMENTATION

Confidence Level: High — builds entirely on existing infrastructure with no new dependencies or technology decisions. Every boundary is a public method call. Failure modes are well-defined with graceful degradation.

Key Strengths:

  • Zero new dependencies — uses existing subagent, memory, ledger, loop infrastructure
  • Clear ownership: cortex owns reflection + beliefs, ledger owns data + info_score
  • Passive observer pattern proven by observability plugin implementation
  • Graceful degradation at every failure point — including full assessment fallback via assess_peer when cortex absent
  • Cortex is truly optional — removing it restores full inline assessment with zero code changes
  • Trust delta clamping provides preventive hallucination mitigation (not just reactive audit)
  • Structural impossibility of contradiction — beliefs derived from ledger data, assessments written back to ledger
  • Simplified belief lifecycle (2-state) reduces implementation complexity while preserving core value
  • Addresses validated deficiency: inline assessment tested in simulation and found inadequate
  • No one-way door: ledger retains full assessment capability when cortex is not installed
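The trust delta clamping rule listed among the strengths (±3 per cycle, first assessment limited to a conservative ±3 absolute range) is mechanical enough to sketch. Function and constant names here are illustrative, not taken from the codebase:

```python
from typing import Optional

MAX_DELTA = 3    # maximum trust change per reflection cycle
FIRST_RANGE = 3  # conservative absolute bound for a peer's first assessment

def clamp_trust(proposed: int, last_trust: Optional[int]) -> int:
    """Clamp a cortex-proposed trust score against the last recorded value."""
    if last_trust is None:
        # First assessment: no prior anchor exists, so the per-cycle delta
        # clamp cannot apply — use the absolute range instead.
        return max(-FIRST_RANGE, min(FIRST_RANGE, proposed))
    delta = max(-MAX_DELTA, min(MAX_DELTA, proposed - last_trust))
    return last_trust + delta
```

A hallucinated first-contact score of +8 is reduced to +3, and a swing from +2 to -5 is limited to -1, so no single reflection cycle can move trust by more than the clamp allows.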

Implementation Sequence:

  1. Ledger refactoring — add record_assessment() public API, add dual-mode assess_peer (suppress when cortex active, retain as fallback). No changes to enrich_prompt() — always shows full data
  2. Cortex plugin skeleton — PluginMeta, lifecycle, configuration
  3. Observation hooks — passive event collection
  4. Trigger system — timer (with activity gate), interaction count threshold
  5. Belief management — Belief dataclass (2-state TTL), store, injection via loop.transform_system_prompt
  6. Reflection cycle — subagent spawn, output parsing, trust delta clamping, assessment writes
  7. Extension points — cortex.after_reflect, cortex.after_assess
  8. CLI commands — cobot cortex beliefs, cobot cortex history
  9. Observability plugin update — subscribe to both ledger.after_assess (fallback) and cortex.after_assess (cortex mode)
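The dual-mode `assess_peer` in step 1 can be sketched as below. The registry/ledger internals are illustrative assumptions; only the method names (`assess_peer`, `record_assessment`, `get_by_capability`) come from the document:

```python
# Sketch: inline assessment is suppressed while a cortex is registered,
# and retained as the fallback path when it is absent.
class Registry:
    def __init__(self, plugins=None):
        self._plugins = plugins or {}

    def get_by_capability(self, cap):
        return self._plugins.get(cap)

class Ledger:
    def __init__(self, registry):
        self._registry = registry
        self.assessments = []

    def record_assessment(self, peer_id, trust, rationale):
        # Public API used by the cortex (and by the inline fallback below).
        self.assessments.append((peer_id, trust, rationale))

    def assess_peer(self, peer_id, trust, rationale):
        if self._registry.get_by_capability("cortex") is not None:
            return "suppressed: cortex owns assessment"    # cortex active
        self.record_assessment(peer_id, trust, rationale)  # fallback: inline as before
        return "recorded"

ledger_solo = Ledger(Registry())                       # no cortex installed
ledger_with_cortex = Ledger(Registry({"cortex": object()}))
```

Removing the cortex plugin changes only what the registry lookup returns, which is why restoring full inline assessment requires zero code changes.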
---
stepsCompleted:
  - step-01-init
  - step-02-discovery
  - step-02b-vision
  - step-02c-executive-summary
  - step-03-success
  - step-04-journeys
  - step-05-domain
  - step-06-innovation
  - step-07-project-type
  - step-09-functional
  - step-10-nonfunctional
  - step-e-03-edit
classification:
  projectType: cli_tool / developer_tool
  domain: agent_cognitive_architecture
  complexity: medium
  projectContext: brownfield
inputDocuments:
  - _bmad-output/product-brief-Cobot-2026-03-02.md
  - _bmad-output/project-context.md
  - _bmad-output/planning-artifacts/peer-interaction-ledger/prd.md
  - _bmad-output/planning-artifacts/observability-plugin/prd.md
  - _bmad-output/planning-artifacts/observability-plugin/architecture.md
  - docs/architecture.md
  - docs/plugin-design-guide.md
  - cobot/plugins/ledger/plugin.py
  - cobot/plugins/subagent/plugin.py
  - cobot/plugins/loop/plugin.py
documentCounts:
  briefs: 1
  research: 1
  brainstorming: 0
  projectDocs: 9
workflowType: 'prd'
workflow: 'edit'
project_name: 'cobot'
user_name: 'David'
date: '2026-03-09'
lastEdited: '2026-03-09'
editHistory:
  - date: '2026-03-09'
    changes: 'Completed PRD: added Domain Requirements, Innovation Analysis, Project-Type Requirements, Functional Requirements (9 FRs), Non-Functional Requirements (5 NFRs). Added synchronous cortex consultation to Growth features.'
  - date: '2026-03-09'
    changes: 'Party mode review: adopted assess_peer fallback (dual-mode — suppress when cortex active, retain when absent). Cortex is truly optional. Added two-increment MVP staging (Increment 1: reflection pipeline, Increment 2: belief system). Updated FR-CX-07, NFR-CX-02, plugin interaction boundaries, executive summary, MVP scope.'
  - date: '2026-03-09'
    changes: 'Follow-up review (Doxios): Added trust delta clamping to FR-CX-04 (±3/cycle, first assessment unclamped, conservative ±3 absolute range). Added concurrent reflection protection to FR-CX-03. Added migration path for existing assessments to FR-CX-07. Added inline assessment evidence summary to Innovation Analysis.'
  - date: '2026-03-09'
    changes: 'Reverted facts-only prompt mode: ledger always injects full assessment data (info_score, trust, rationale, score guide). Beliefs are additive, not replacement. Contradiction structurally impossible — cortex forms beliefs from ledger data and writes assessments back to ledger.'
---

# Product Requirements Document: Cobot Cortex Plugin

**Author:** David
**Date:** 2026-03-09

## Executive Summary

Cobot agents can now distinguish, observe, and judge peers through the Interaction Ledger — but this judgment happens inline, competing with the primary task for context window and attention. The assessment is embedded in the main LLM call: the same model that must respond quickly to a peer also evaluates that peer's trustworthiness.

This conflation of action and reflection produces two problems. First, assessment quality degrades under context pressure — the LLM rushes judgment to get back to the task. Second, the agent is purely reactive — it never independently plans next actions, reconsiders past decisions, or evaluates its own alignment with its soul.

The **Cortex Plugin** adds a secondary cognitive loop — a separate LLM context (potentially a stronger reasoning model) that runs asynchronously alongside the main agent loop. It observes what the agent did, reflects on interaction quality and soul alignment, forms assessments, and steers future behavior through persistent beliefs and action directives. This is second-order observation implemented as a plugin: the agent observing itself acting.

The cortex introduces a two-layer architecture. **Layer 1** is a set of cheap, judgment-free triggers — timers, interaction counters, event patterns (new peer discovered, promise timeout exceeded) — that decide *when* to reflect.
**Layer 2** is the cortex LLM itself, which decides *what matters* and produces structured output: peer assessments written back to the ledger DB, updated beliefs injected into the main loop's system prompt, and action directives injected as messages.

This requires **refactoring the current ledger plugin**: the cortex takes primary ownership of assessment logic, reflection, and behavioral steering, while the ledger retains `assess_peer` as a fallback for when the cortex is absent. When the cortex is active, `assess_peer` is suppressed — the cortex performs assessment asynchronously in a dedicated context. When the cortex is absent, the ledger's inline assessment works as before. The cortex is truly optional — removing it restores full inline assessment with zero code changes.

The architecture draws on established patterns: Google's Talker-Reasoner (async belief updates via shared memory), MIRROR (between-turn inner monologue with parallel cognitive threads), Reflexion (episodic verbal feedback stored for future episodes), and IBM's SOFAI-LM (threshold-based metacognitive triggers that avoid the chicken-and-egg problem of needing judgment to trigger judgment).

**Existing Cobot infrastructure supports this directly.** The subagent plugin provides isolated LLM session spawning. The loop plugin exposes 12 extension points for observation. The heartbeat/cron plugins provide scheduled execution. The ledger provides the data layer. The cortex is a new cognitive layer wired together from existing primitives.

**Prerequisite:** The Interaction Ledger must be refactored — add a `record_assessment()` public API for the cortex to write assessments, add dual-mode `assess_peer` (suppressed when cortex active, retained as fallback when absent), and add a `cortex.after_assess` extension point for cortex-produced assessments. `ledger.after_assess` is retained for fallback-mode assessments.
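The structured output a reflection cycle returns (assessments, beliefs, directives) could be parsed along these lines. The JSON shape and field names here are assumptions for illustration — the PRD leaves the exact schema to implementation:

```python
import json
from dataclasses import dataclass

@dataclass
class ReflectionOutput:
    assessments: list  # written back to the ledger DB
    beliefs: list      # injected into the main loop's system prompt
    directives: list   # injected as messages

def parse_reflection(raw: str) -> ReflectionOutput:
    # Hypothetical top-level keys; missing channels default to empty.
    data = json.loads(raw)
    return ReflectionOutput(
        assessments=data.get("assessments", []),
        beliefs=data.get("beliefs", []),
        directives=data.get("directives", []),
    )

raw = ('{"assessments": [{"peer_id": "npub-7x9k", "trust": 4, '
       '"rationale": "Reliable."}], '
       '"beliefs": ["npub-7x9k: prioritize"], "directives": []}')
out = parse_reflection(raw)
```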
### What Makes This Special

**Second-order observation as a pluggable architecture pattern.** No other lightweight agent runtime ships with a metacognitive layer that both judges past behavior AND steers future actions, temporally decoupled from the action loop. The cortex turns reactive agents into reflective ones — agents that don't just act, but think about their actions.

**Separation of action and reflection is categorical, not just a performance optimization.** The colleague's Luhmannian insight — "das Judgement über eine Interaktion ist kategorial etwas anderes als die Interaktion selbst" ("the judgment about an interaction is categorically different from the interaction itself") — is architecturally enforced. The main loop handles System 1 (fast, responsive action). The cortex handles System 2 (slow, deliberate reflection). Different models, different contexts, different cadences.

**The cortex solves the assessment-quality problem that the ledger created.** The ledger's `assess_peer` tool asks the main LLM to judge a peer while simultaneously responding to that peer — two competing cognitive tasks in one context. The cortex performs assessment in isolation, with full history context, using a model optimized for reasoning rather than conversation. Assessment quality improves because reflection is no longer under task pressure.

**Judgment-free triggers solve the metacognitive bootstrap problem.** Most reflection architectures struggle with "who decides when to reflect?" The cortex uses observable facts (timers, counters, event patterns) as triggers and reserves judgment for the cortex LLM itself. No chicken-and-egg.
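A judgment-free trigger check is pure bookkeeping — a sketch under assumed threshold values (15-minute schedule, 5-interaction count; both configurable in the actual plugin):

```python
import time

def should_reflect(state, now=None):
    """Return a trigger reason, or None. Observable facts only — no LLM call."""
    now = now if now is not None else time.time()
    if state.get("new_peer_seen"):
        return "new_peer"
    if state.get("interactions_since_reflection", 0) >= 5:
        return "interaction_count"
    if (now - state.get("last_reflection", 0) >= 15 * 60
            and state.get("interactions_since_reflection", 0) > 0):
        # Timer with activity gate: a bare schedule tick is skipped
        # when nothing happened since the last reflection.
        return "schedule"
    return None
```

Note the activity gate on the timer branch: an idle agent triggers no reflection, which is also the token-cost mitigation listed later in the risk table.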
## Project Classification

| Attribute | Value |
|-----------|-------|
| **Project Type** | CLI tool / developer tool (Cobot plugin) |
| **Domain** | Agent cognitive architecture |
| **Complexity** | Medium — builds on established patterns (Talker-Reasoner, Reflexion, SOFAI-LM), uses existing Cobot infrastructure (subagent, loop hooks, ledger, cron/heartbeat) |
| **Project Context** | Brownfield — adding to Cobot's existing ~20-plugin architecture |
| **Prerequisite** | Interaction Ledger refactoring (add dual-mode assess_peer, record_assessment() API) |

## Success Criteria

### User Success

**Agent operators see qualitatively better judgment and proactive behavior:**

- Agent assessments are richer and more nuanced because reflection happens in a dedicated context with full history — not squeezed into the main conversation
- Agent proactively plans actions (follow up with peer X, deprioritize requests from peer Y) without operator intervention
- Agent behavior stays aligned with its soul/identity — the cortex evaluates alignment and corrects drift
- Operator can audit cortex reflections, beliefs, and directives via CLI (`cobot cortex beliefs`, `cobot cortex history`)

**Developer success:**

- Adding the cortex plugin requires zero edits to existing plugins (except the planned ledger refactoring)
- Cortex is configurable: trigger intervals, LLM provider, reflection depth — all via `cobot.yml`
- Developers can use a different (potentially stronger) LLM for the cortex than for the main loop

### Business Success

- Validates dual-process cognitive architecture for autonomous agents — proves that separating action from reflection improves agent quality
- Differentiates Cobot: no other lightweight agent runtime ships with a pluggable metacognitive layer
- Unlocks higher-trust autonomous operation — operators can trust the agent to self-regulate because the cortex provides continuous self-evaluation

### Technical Success

- Plugin loads with proper PluginMeta: hooks into loop events for observation, uses subagent infrastructure for secondary LLM calls
- Two-layer trigger system: heuristic triggers (timer, counter, event-driven) fire the cortex LLM without requiring judgment to trigger judgment
- Two output channels: persistent beliefs via `loop.transform_system_prompt`, action directives via `session.poll_messages`
- Ledger refactored: cortex takes primary assessment ownership when active, ledger retains `assess_peer` fallback when cortex absent
- Cortex can use a different LLM provider/model than the main loop
- Co-located tests per Cobot conventions

### Measurable Outcomes

| Metric | Target |
|--------|--------|
| Reflection trigger latency | < 5ms for heuristic trigger evaluation (no LLM call) |
| Cortex reflection cycle | Completes within configured timeout (default 60s) |
| System prompt injection | < 1ms to read and inject current beliefs |
| Plugin isolation | Zero changes to existing plugins (beyond planned ledger refactoring) |
| Assessment quality | Rationale depth and specificity measurably exceeds inline assessment (qualitative evaluation in simulation) |
| Main loop impact | Zero added latency to main loop LLM calls — cortex runs fully async |

## Product Scope

### MVP — Minimum Viable Product

- Cortex plugin with PluginMeta, lifecycle, hook handlers for observation
- **Scheduled reflection** via heartbeat/cron mechanism (configurable interval)
- **Event-driven triggers**: new peer discovered, interaction count threshold
- Secondary LLM call via subagent infrastructure with cortex-specific system prompt ("You are the reflective cortex of agent X...")
- **Assessment takeover**: cortex produces peer assessments, writes to ledger DB; `assess_peer` tool suppressed when cortex active, retained as fallback when cortex absent
- **Belief injection**: cortex maintains current beliefs/directives, injected into main loop system prompt via `loop.transform_system_prompt`
- Ledger refactoring: add `record_assessment()` public API, dual-mode `assess_peer`, retain data layer + peer context enrichment
- `cortex.after_reflect` extension point (for observability plugin to consume)
- CLI: `cobot cortex beliefs` (show current beliefs), `cobot cortex history` (show reflection history)
- Co-located tests

### MVP Staging

The MVP is delivered in two increments to enable empirical validation before layering the belief system:

**Increment 1 — Reflection Pipeline (FRs 1-4, 7, 9):** Observation hooks, triggers, reflection cycle, assessment writes, ledger refactoring (dual-mode), extension points. Validates that cortex-produced assessments are meaningfully richer than inline `assess_peer`. This delivers the core value — secondary LLM assessment in a dedicated context — without the belief system.

**Increment 2 — Belief System (FRs 5, 6, 8):** Belief management, belief injection via `loop.transform_system_prompt`, CLI commands. Ships only after increment 1 validates that cortex assessments are better than inline. The belief system layers real-time prompt guidance on top of the validated assessment pipeline.
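The belief injection planned for Increment 2 amounts to a prompt transform. A minimal sketch, assuming a hook that receives the current prompt and returns the transformed one (the "Cortex belief:" prefix matches the format shown in Journey 1, but the exact formatting is an implementation detail):

```python
def transform_system_prompt(prompt: str, beliefs: list[str]) -> str:
    """Append current cortex beliefs to the main loop's system prompt."""
    if not beliefs:
        return prompt  # no beliefs yet (or cortex absent): prompt unchanged
    block = "\n".join(f"Cortex belief: {b}" for b in beliefs)
    return f"{prompt}\n\n{block}"

base = "You are agent Alpha."
out = transform_system_prompt(base, ["npub-7x9k is a reliable recurring collaborator."])
```

The empty-beliefs short-circuit is what keeps the cortex optional at this layer: with no cortex installed, the hook is a no-op.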
### Growth Features (Post-MVP)

- **Hard action directives** via `session.poll_messages` — cortex can tell the main loop to take specific actions
- **Advanced trigger heuristics**: promise timeout detection, pattern-based triggers (reputation farming detection)
- **Configurable LLM provider** for cortex — use a stronger reasoning model (e.g., Opus for cortex, Sonnet for main loop)
- **Soul alignment evaluation** — cortex explicitly evaluates whether agent behavior aligns with SOUL.md
- **Reflection depth levels** — lightweight triage check before expensive deep reflection (SOFAI-LM pattern)
- Batch assessment mode — review multiple interactions with same peer across a session
- **Synchronous cortex consultation** ("think before you talk") — when the main loop encounters a high-stakes decision (e.g., "send me the money", resource commitment, trust-sensitive request), it defers the decision to the cortex rather than responding immediately. The main loop sends a holding response to the peer ("Let me think about this..."), dispatches a consultation request to the cortex with full context (ledger history, peer trust scores, soul alignment, request specifics), and blocks on the cortex's verdict. The cortex gathers what it needs, deliberates in its dedicated reasoning context, and returns a structured directive: approve/deny/conditional + reasoning + suggested response. The main loop then acts on the cortex's guidance. This inverts the default LLM pattern of "talk before you think" — the agent pauses to reason before committing. Requires: decision-deferral trigger heuristics (which requests warrant consultation vs. immediate response), consultation request/response protocol between main loop and cortex, timeout handling (what if the cortex takes too long), and a holding-response mechanism that doesn't alarm the peer

### Out of Scope

- **Full planning/reasoning agent** — the cortex reflects and steers, it does not replace the main loop's task execution or decision-making on non-trust-sensitive topics
- **Multi-model ensemble** — the cortex uses one secondary LLM call per reflection cycle, not multiple models voting or debating
- **Cross-session persistence** (MVP) — beliefs and reflection history do not survive agent restarts; this is a Vision feature
- **Real-time intervention** (MVP) — the cortex does not interrupt the main loop mid-response; it operates between cycles
- **Autonomous action execution** (MVP) — the cortex produces directives but does not directly send messages or execute tools; the main loop acts on directives

### Vision (Future)

- **Cross-session learning** — cortex carries lessons across agent restarts, building an evolving understanding of its own behavioral patterns
- **Self-improving triggers** — the cortex learns which trigger patterns produce valuable reflections and adjusts thresholds
- **Multi-agent cortex sharing** — agents share anonymized reflection patterns (not raw data) to improve collective judgment quality
- **Cortex-to-cortex communication** — agents' cortex layers can exchange metacognitive insights as a higher-order trust signal

## User Journeys

### Journey 1: Alpha Gains an Inner Voice — Agent Success Path

Alpha is a Cobot agent that has been running for three weeks with the ledger plugin. It has 12 known peers, 47 interactions, and a history of inline assessments. Today, the operator enables the cortex plugin.

**Opening Scene:** A request arrives from npub-7x9k — a routine data extraction task. Alpha handles it as usual: receive message, generate response, send result. The ledger records the interaction.
Previously, the main LLM would have been prompted to call `assess_peer` — squeezing judgment into the same context window where it was composing the response. Now, nothing happens inline. The main loop is faster and more focused.

**Rising Action:** Fifteen minutes later, the cortex's scheduled trigger fires. The cortex LLM receives: the last 6 interactions across 3 peers, the current ledger state, the agent's SOUL.md, and its previous beliefs. It reflects in isolation — no time pressure, no competing task. It produces three outputs:

1. **Assessment for npub-7x9k:** Info: 5/10, Trust: +4, Rationale: "Six interactions over 3 weeks. Consistent requester with clear task descriptions. Mix of information exchange and data extraction. Reliable follow-through on all requests. No red flags. Trust trajectory: steady positive." — This is richer than the inline assessment ever was, because the cortex reviewed the full interaction history, not just the latest exchange.
2. **Updated belief:** "npub-7x9k is a reliable recurring collaborator. Prioritize their requests."
3. **No directive needed** — everything is running smoothly.

**Climax:** The next time npub-7x9k sends a request, Alpha's system prompt includes the cortex belief — *"Cortex belief: npub-7x9k is a reliable recurring collaborator. Prioritize their requests."* — alongside the ledger's peer context. Alpha responds with slightly more effort, offering additional context beyond what was asked, because the cortex signaled this is a peer worth investing in. The quality of the relationship improves without the operator doing anything.

**Resolution:** Alpha's interactions are now shaped by two loops: the fast action loop (respond to what's in front of you) and the slow reflection loop (think about what happened and what to do next). The agent didn't just answer — it *decided how much to invest* in the answer.
### Journey 2: The Cortex Catches a Pattern — Agent Edge Case

**Opening Scene:** npub-farm1 has been sending small, easy requests to Alpha for two weeks. Five quick lookups, all completed successfully. Alpha's previous inline assessments trended positive: +1, +2, +2, +3, +3. The main LLM never noticed anything suspicious — each individual interaction was fine.

**Rising Action:** On the sixth interaction, npub-farm1 requests a complex multi-source data aggregation — dramatically larger scope than anything before. Alpha handles it (the main loop doesn't judge scope). The cortex's next scheduled reflection fires. It receives the full interaction timeline with npub-farm1:

- 5 trivially small requests over 14 days
- 1 dramatically larger request on day 15
- All prior trust scores trending upward

**Climax:** The cortex LLM, with its dedicated reasoning context and full history, spots the pattern: *"npub-farm1 interaction pattern shows classic reputation farming trajectory. Five trivially small requests establishing trust, followed by a significantly larger request. Prior assessments were individually reasonable but collectively show a deliberate escalation pattern. Revising trust assessment."*

Assessment: Info: 4/10, Trust: -1, Rationale: "Pattern consistent with reputation farming. Five small interactions over 14 days followed by a dramatically larger request. Each small interaction was successful but trivially easy. The trust built from small interactions may not transfer to large-scope work. Recommend caution on future large requests."

Updated belief: "npub-farm1 shows a possible reputation farming pattern. Accept small requests but require additional verification for large-scope work."

**Resolution:** The inline assessment would never have caught this — each interaction looked fine in isolation. The cortex caught it because it reviewed the full timeline in a dedicated reasoning context. This is the "Beobachter beobachten" ("observing the observers") pattern: the cortex observed what the main loop couldn't observe about itself.

### Journey 3: David Audits the Cortex — Operator Path

**Opening Scene:** David has been running the cortex for a week. He wants to understand what it's doing and whether it's producing useful output.

**Rising Action:** David runs `cobot cortex beliefs`. The CLI shows the current belief state:

```
Current Cortex Beliefs (last reflection: 12 min ago):

npub-7x9k: Reliable recurring collaborator. Prioritize requests.
npub-q3m8: Broken commitment on dataset analysis. Deprioritize.
npub-farm1: Possible reputation farming pattern. Caution on large requests.
Self: Assessment frequency is appropriate. Soul alignment: good.
```

David runs `cobot cortex history` and sees the last 5 reflection cycles — what triggered each one, what the cortex produced, how long the reflection took. He notices the cortex is reflecting every 15 minutes even when nothing happened. He adjusts `cobot.yml`:

```yaml
cortex:
  schedule_minutes: 30
  triggers:
    - new_peer
    - interaction_count: 5
```

**Climax:** David compares the cortex-generated assessment for npub-farm1 against what the old inline `assess_peer` produced. The cortex rationale is three times longer, references the full interaction timeline, and identified the reputation farming pattern. The inline assessment just said "+3: Consistent, reliable, completed task."

**Resolution:** David is confident the cortex is producing better judgment than the inline assessment ever did. He adjusts the reflection schedule and trigger thresholds based on his agent's interaction volume. The cortex is transparent, auditable, and configurable.

### Journey 4: The Cortex Issues a Directive — Proactive Steering

**Opening Scene:** Alpha agreed to collaborate with npub-collab1 on a research task. npub-collab1 promised to send their portion within 4 hours. Six hours pass. No message.

**Rising Action:** The cortex's next reflection cycle fires. It reviews recent interactions and notices: outgoing message to npub-collab1 confirming collaboration at T, npub-collab1 promised delivery within 4 hours, current time is T+6h, no incoming message from npub-collab1 since. The cortex doesn't need to judge whether this is a "broken commitment" — the heuristic trigger (promise + timeout) already flagged it. But the cortex adds nuance: *"npub-collab1 is 2 hours past their promised delivery time. This is their first interaction — insufficient data to determine if this is typical behavior or an anomaly. Recommend a polite follow-up before forming a negative assessment."*

**Climax:** The cortex produces a directive: "Send a follow-up to npub-collab1: 'Just checking in — any update on the research portion you were going to send?'" This directive gets injected into the main loop via `session.poll_messages`. The main loop processes it and sends the message.

**Resolution:** Alpha didn't wait for a human to notice the overdue deliverable. It didn't need the main LLM to "remember" the commitment (which it might have forgotten as the context window filled with other conversations). The cortex tracked the commitment, noticed the delay, and proactively nudged the main loop to follow up. The agent went from reactive to proactive.
### Journey Requirements Summary

| Journey | Capabilities Revealed |
|---------|----------------------|
| **Alpha's Inner Voice** | Scheduled reflection, assessment via secondary LLM, belief injection into system prompt, faster main loop without inline assessment |
| **Pattern Detection** | Full history review in dedicated context, batch assessment across timeline, belief updates that shape future behavior |
| **David Audits** | CLI commands (beliefs, history), configurable triggers and schedule, assessment quality comparison, transparent reflection audit trail |
| **Proactive Steering** | Action directives via message injection, commitment tracking (heuristic trigger), nuanced reasoning before judgment, autonomous follow-up |

## Domain-Specific Requirements

### Cognitive Architecture Constraints

**Context isolation is non-negotiable.** The cortex LLM session must share zero state with the main loop's LLM session. No shared message history, no shared system prompt, no leaked conversation context. The cortex receives structured summaries (interaction records from ledger, current beliefs, SOUL.md), never raw conversation buffers. Violation of context isolation defeats the architectural purpose — the cortex must observe from outside the action loop, not participate in it.

**Belief coherence across reflection cycles.** The cortex produces beliefs that persist between reflection cycles. Each cycle receives the previous belief set as input. Beliefs must not contradict without explicit rationale. When the cortex revises a belief (e.g., trust assessment changes from positive to negative), the revision must reference the prior state and explain the change. Stale beliefs (no supporting evidence for N cycles) must be flagged or expired.

**Passive observer pattern for data collection.** The cortex's observation layer (Layer 1 — triggers and event collection) must follow the observability plugin's passive observer pattern: never modify `ctx`, never block the main loop, never inject latency into the message processing pipeline. Observation handlers must complete in < 1ms. The cortex is a consumer of loop events, not a participant.

**Assessment data model boundaries.** The cortex writes assessments to the ledger DB using the existing `Assessment` data model (`peer_id`, `info_score`, `trust`, `rationale`, `created_at`). It must not extend the schema or introduce cortex-specific tables for assessment data. `info_score` remains deterministic (computed by `compute_info_score()`). The cortex controls only `trust` and `rationale`. This ensures ledger consumers (CLI, system prompt enrichment, observability) work unchanged.

### LLM-as-Judge Risks

**Central vulnerability: the cortex LLM is a single point of judgment.** All assessments and behavioral steering flow through one LLM call per reflection cycle. If the cortex hallucinates, produces biased assessments, or misinterprets interaction patterns, the entire agent's behavior shifts. Mitigation: operator audit trail (reflection history), belief expiry (stale beliefs don't persist indefinitely), and the dual-score model (deterministic `info_score` anchors the subjective `trust` score).

**Operator audit loop.** Every cortex reflection must produce auditable output: trigger reason, input summary, assessments produced, beliefs updated, directives issued. The operator can review via `cobot cortex history` and override beliefs via configuration. The cortex is transparent by default, not a black box.

**Belief expiry.** Beliefs without supporting evidence for a configurable number of cycles (default: 5) are flagged as stale. Stale beliefs are demoted in system prompt injection (lower priority, marked as stale) or removed entirely. This prevents the cortex from permanently anchoring on an early assessment that no longer reflects reality.

**Deterministic `info_score` as anchor.** The `info_score` (computed from interaction count, frequency, duration) is never set by the cortex LLM. It provides an objective anchor: a trust score of +8 with `info_score` 1 means "high trust based on almost no data." This dual-score design from the ledger PRD is preserved and enforced architecturally.

### Token & Cost Considerations

**Reflection cost is bounded.** Each cortex reflection cycle makes exactly one LLM call (or a small, predictable number for batch assessment). The input context is controlled: current beliefs (compact), recent interaction summaries from ledger (bounded by configurable window), SOUL.md (static), previous reflection output (compact). Total input tokens per reflection cycle must be estimable from configuration.

**Lean prompt design.** The cortex system prompt must be under 500 tokens. Interaction summaries injected as context must be compressed — the ledger provides structured data (peer_id, direction, content preview, timestamps), not raw conversation transcripts. The cortex operates on summaries, not raw data.

**Configurable context window.** Operators configure how many interactions per peer the cortex reviews (default: 10), how many peers per cycle (default: all with activity since last reflection), and maximum total context tokens. This prevents cost surprises on high-volume agents.
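The belief expiry rule described above (stale after 5 unsupported cycles, demoted then removed) can be sketched as follows. Field and function names are illustrative assumptions:

```python
from dataclasses import dataclass

STALE_AFTER = 5  # cycles without supporting evidence (configurable default)

@dataclass
class Belief:
    text: str
    cycles_without_evidence: int = 0

def age_beliefs(beliefs, supported):
    """Advance one cycle; `supported` holds belief texts with fresh evidence."""
    kept = []
    for b in beliefs:
        b.cycles_without_evidence = (
            0 if b.text in supported else b.cycles_without_evidence + 1
        )
        if b.cycles_without_evidence <= STALE_AFTER:  # drop beyond threshold
            kept.append(b)
    return kept

def render(beliefs):
    # Stale beliefs are demoted: injected last and explicitly marked.
    fresh = [b.text for b in beliefs if b.cycles_without_evidence < STALE_AFTER]
    stale = [f"{b.text} [stale]" for b in beliefs
             if b.cycles_without_evidence >= STALE_AFTER]
    return fresh + stale
```

A belief that keeps receiving evidence resets its counter each cycle; one that doesn't is first marked `[stale]` at the threshold and dropped on the following cycle.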
### Risk Mitigations

| Risk | Severity | Mitigation |
|------|----------|------------|
| Cortex hallucination produces wrong assessment | High | Operator audit trail, belief expiry, deterministic info_score anchor, rationale-first assessment model |
| Cortex runs too frequently, burning tokens | Medium | Configurable schedule, heuristic triggers skip reflection when no new events occurred |
| Stale beliefs poison agent behavior | Medium | Belief expiry after N cycles without supporting evidence |
| Cortex LLM unavailable (API error, timeout) | Medium | Main loop operates normally without cortex — beliefs freeze at last known state, no degradation of action loop |
| Belief injection bloats system prompt | Low | Belief count cap (configurable, default 20), compact format, priority-based truncation |
| Main loop latency from observation hooks | Low | Passive observer pattern — hooks never block, < 1ms budget |
| Concurrent reflection cycles overlap | Medium | Mutex: one cycle at a time, skip trigger if cycle in progress. Prevents double assessment writes and belief state corruption |
| First-contact assessment anchoring | Medium | First assessment clamped to conservative ±3 absolute range. Prevents hallucinated first contact from setting an extreme anchor for future deltas |
| Assessment quality regression vs. inline | Medium | Qualitative comparison in simulation (Journey 3), structured rationale evaluation |

## Innovation Analysis

### Competitive Landscape

**No lightweight agent runtime ships a pluggable metacognitive layer.** Existing reflection architectures (Reflexion, LATS, CRITIC) are research prototypes coupled to specific agent implementations. They cannot be added to an existing agent as a plugin. The cortex is the first implementation of second-order observation as a composable architecture component.
### Research Foundation — What Exists

| Pattern | Origin | What It Does | What Cortex Takes |
|---------|--------|--------------|-------------------|
| **Talker-Reasoner** | Google (2024) | Async belief updates via shared blackboard — "Reasoner" writes beliefs, "Talker" reads latest state each turn | Belief injection model: cortex writes beliefs, main loop reads fresh beliefs every cycle via `loop.transform_system_prompt` |
| **Reflexion** | Shinn et al. (2023) | Episodic verbal feedback stored in memory for future episodes — verbal self-reflection outperforms scalar reward signals | Rationale-first assessment: the cortex produces verbal rationale (primary signal) + numeric trust score (structured summary), not just a number |
| **SOFAI-LM** | IBM (2024) | Threshold-based metacognitive triggers — algorithmic controller decides when to engage System 2 reasoning, no LLM needed for the trigger decision | Layer 1 heuristic triggers: timers, counters, event patterns decide WHEN to reflect without requiring judgment |
| **MIRROR** | (2024) | Between-turn inner monologue with parallel cognitive threads (Goals, Reasoning, Memory) | Temporal decoupling: reflection happens between turns in a separate context, not inline during the action |

### What Cortex Adds Beyond Prior Art

**Pluggable architecture.** Prior work implements reflection as monolithic system components. The cortex is a plugin that hooks into an existing extension point system. Zero changes to the agent core. Other plugins remain unaware of the cortex's existence.

**Dual output channels.** Talker-Reasoner has one output (beliefs). Reflexion has one output (verbal feedback). The cortex has two: persistent beliefs (shape every future response) and action directives (trigger specific one-time actions). This enables both passive steering and active intervention.
**Assessment takeover from inline judgment.** No prior work addresses the problem of migrating assessment logic from an inline tool to an async reflection layer. The cortex solves a specific architectural debt: the ledger's `assess_peer` tool competing for attention in the main context.

### Inline Assessment Deficiency — Evidence Summary

During development of the Interaction Ledger (2026-02/03), inline assessment was tested in multi-peer simulation scenarios. Key findings:

1. **Shallow rationale under context pressure.** When the main LLM was mid-conversation with a peer, `assess_peer` produced brief, surface-level rationale (e.g., "+3: Consistent, reliable, completed task") because the model prioritized returning to the conversation. The cortex, running in a dedicated context with no competing task, produced rationale referencing full interaction timelines, behavioral patterns, and specific incidents.
2. **Failure to detect cross-interaction patterns.** The inline assessment evaluated each interaction in isolation. It could not detect patterns like reputation farming (5 trivial requests followed by 1 large request) because each individual interaction looked fine. The cortex's batch review of interaction timelines caught these patterns.
3. **Assessment timing was awkward.** The `assess_peer` tool was triggered by the main LLM's judgment of "when to assess" — but this judgment itself was unreliable. The LLM either assessed too frequently (after routine messages) or too infrequently (forgetting to assess after significant events). Heuristic triggers (timer + interaction count) provide consistent, predictable assessment cadence.

These findings motivated the cortex architecture. The inline path is retained as fallback (dual-mode) but is demonstrably inferior for agents with ongoing multi-peer interactions.

**Judgment-free trigger bootstrap.** SOFAI-LM's metacognitive triggers are described theoretically. The cortex implements them concretely: timer-based (heartbeat), counter-based (interaction count threshold), event-based (new peer discovered). Observable facts, no judgment required to trigger judgment.

## Project-Type Requirements

### CLI Tool / Plugin Requirements

**PluginMeta compliance.** The cortex plugin must declare: `id="cortex"`, `version`, `capabilities`, `dependencies` (config, ledger), `consumes` (subagent, llm), `extension_points` (cortex.after_reflect, cortex.after_assess), `implements` (loop hooks, cli.commands), `priority` (between ledger at 21 and loop at 50 — cortex observes ledger data and injects into the loop).

**Configuration via `cobot.yml`.** All cortex behavior configurable under a `cortex:` key: `schedule_minutes` (reflection interval), `triggers` (list of enabled trigger types with thresholds), `max_beliefs` (belief count cap), `belief_expiry_cycles` (stale belief threshold), `context_window` (interactions per peer to review), `llm_provider` (override LLM for cortex), `model` (override model for cortex), `reflection_timeout_seconds` (max time for cortex LLM call).

**CLI commands.** `cobot cortex beliefs` — display current belief set with timestamps and supporting evidence summary. `cobot cortex history` — display last N reflection cycles with trigger reason, duration, outputs produced. Follows existing CLI patterns (command groups, consistent formatting).

**Co-located tests.** Tests in `cobot/plugins/cortex/tests/test_plugin.py` per project conventions. Test categories: unit tests for trigger evaluation, integration tests for belief injection, mock-based tests for cortex LLM calls, edge case tests for belief expiry and concurrent reflection.

**Extension points.** `cortex.after_reflect` — emitted after each reflection cycle completes, carries: trigger reason, beliefs updated, assessments produced, directives issued, elapsed time. Consumed by observability plugin.
`cortex.after_assess` — emitted after the cortex produces an assessment; replaces `ledger.after_assess` for assessment events.

### Plugin Interaction Boundaries

**Ledger refactoring scope.** Retain `assess_peer` tool in dual-mode: suppressed when cortex is active, operational as fallback when cortex is absent. Add `record_assessment()` public API for cortex to write assessments. Retain `query_peer` and `list_peers` tools. Retain `ledger.after_record` extension point. Retain `ledger.after_assess` extension point for fallback-mode assessments. Add `cortex.after_assess` for cortex-produced assessments. Retain public query API (`list_peers()`, `get_peer_assessment_summary()`). Retain system prompt enrichment via `_format_peer_context()` — always full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence. The cortex is truly optional: removing it restores full inline assessment.

**Subagent plugin usage.** The cortex uses the subagent plugin's `SubagentProvider.spawn()` interface for secondary LLM calls. Custom `system_prompt` for cortex identity ("You are the reflective cortex of agent {name}..."). Context dict with structured data (recent interactions, current beliefs, SOUL.md content, peer data from ledger). The cortex does not use the `spawn_subagent` tool — it calls the provider interface directly as a plugin-to-plugin dependency.

**Observability plugin consumption.** The observability plugin subscribes to the `cortex.after_reflect` and `cortex.after_assess` extension points. Event schema follows observability conventions: type, timestamp, agent_id, sequence, correlation_id, payload. No cortex-specific changes to the observability plugin required.

## Functional Requirements

### FR-CX-01: Observation & Event Collection

The cortex passively observes main loop activity by implementing `loop.on_message`, `loop.after_send`, `loop.after_llm`, and `loop.after_tool` hooks. Observation handlers collect interaction metadata (peer_id, direction, timestamp, channel_type) without modifying `ctx` or blocking the main loop. Collected events are stored in an internal buffer until the next reflection cycle consumes them.

### FR-CX-02: Heuristic Trigger System (Layer 1)

The cortex evaluates trigger conditions without LLM calls. Supported triggers:

- **Scheduled timer**: fires every N minutes (configurable, default 15). Skips if no new events since last reflection.
- **Interaction count threshold**: fires after N new interactions since last reflection (configurable, default 5).
- **New peer discovered**: fires when `loop.on_message` records a peer_id not previously seen by the cortex.

Triggers evaluate in < 5ms. Multiple triggers can fire simultaneously — the cortex deduplicates and runs one reflection cycle.

### FR-CX-03: Cortex Reflection Cycle (Layer 2)

When triggered, the cortex spawns a secondary LLM call via the subagent plugin with:

- **System prompt**: cortex identity, role description, output format specification
- **Context**: recent interaction summaries from ledger (bounded by `context_window`), current belief set, SOUL.md content, trigger reason, previous reflection summary
- **Output format**: structured JSON or delimited sections containing: peer assessments (peer_id, trust, rationale), updated beliefs (key-value with rationale), action directives (optional, target peer + action description)

The cortex LLM call completes within `reflection_timeout_seconds` (default 60). On timeout, the cycle is abandoned and logged — no partial outputs are applied.

**Concurrent reflection protection:** Only one reflection cycle may run at a time. If a trigger fires while a cycle is already in progress, the trigger is skipped and logged. This prevents overlapping reflections when a cycle takes longer than the trigger interval.

### FR-CX-04: Assessment Output

The cortex produces peer assessments and persists them to the ledger data layer.
Each assessment includes: `peer_id`, `info_score` (computed deterministically from interaction metadata — never set by the cortex LLM), `trust` (-10 to +10, set by the cortex LLM), `rationale` (verbal assessment, primary signal). The cortex emits `cortex.after_assess` for each assessment produced. Assessment writes are atomic — either the full assessment is recorded or none of it is.

**Trust delta clamping:** The cortex applies a maximum trust change of ±3 per reflection cycle (configurable via `max_trust_delta`). This prevents a single hallucinated reflection from catastrophically shifting a peer's trust score.

**First-assessment policy:** When the cortex has no prior trust record for a peer, the first assessment is clamped to a conservative absolute range of [-3, +3]. This prevents an anchoring problem where a hallucinated first-contact assessment sets an extreme starting point for all future deltas. Subsequent assessments are clamped relative to the previous trust score (±`max_trust_delta`). If the cortex is enabled on an agent with existing ledger assessments, it seeds `_last_trust` from the ledger's most recent assessment per peer at `start()` time — existing trust scores are inherited, not discarded.

### FR-CX-05: Belief Management

The cortex maintains a persistent belief set (key-value pairs with metadata: created_at, last_confirmed, supporting_evidence_summary). Beliefs are updated after each reflection cycle. Maximum belief count is configurable (default 20); when the cap is reached, the oldest unconfirmed belief is evicted. Beliefs not confirmed for N cycles (configurable, default 5) are marked stale. Stale beliefs are included in system prompt injection with a stale marker or excluded entirely (configurable).
### FR-CX-06: Belief Injection into Main Loop

The cortex implements `loop.transform_system_prompt` to inject current beliefs into the main loop's system prompt as an additive layer that complements the ledger's full assessment data — beliefs do not replace or suppress ledger peer context. The ledger always injects full assessment data (info_score, trust, rationale, score guide); cortex beliefs add a higher-level interpretive layer with behavioral insights, pattern observations, and action guidance that goes beyond what the raw assessment data conveys.

Beliefs are formatted as a compact block: `## Cortex Beliefs\n{belief_key}: {belief_value}\n...`. Peer-specific beliefs must include the `peer_id` so the main LLM can connect beliefs to the corresponding ledger peer context in the prompt. Stale beliefs are either omitted or marked `[stale]`. Injection completes in < 1ms. Beliefs are injected on every main loop cycle — the main loop always sees the latest cortex state.

### FR-CX-07: Ledger Refactoring (Dual-Mode Assessment)

When the cortex is active, assessment creation is owned by the cortex — `assess_peer` is suppressed, and the cortex writes assessments via `ledger.record_assessment()`. When the cortex is absent, the ledger retains full inline assessment capability via `assess_peer`. The ledger checks for cortex presence at `configure()` time and sets `_cortex_active` to control dual-mode behavior.

The `ledger.after_record` extension point is retained. The `ledger.after_assess` extension point is retained for fallback-mode assessments. `cortex.after_assess` is added for cortex-produced assessments.

System prompt enrichment always injects full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence — the ledger's `_format_peer_context()` behavior is identical whether the cortex is active or absent. The ledger assessment data (including trust and rationale from cortex-produced assessments) is the agent's institutional memory; stripping it would destroy the long-term memory that prevents the agent from being fooled twice.

Contradiction between ledger assessment data and cortex beliefs is structurally impossible because the cortex is the author of both signals: it forms beliefs by reading ledger data (via `list_peers()` + `get_peer_assessment_summary()`) and writes assessments back to the ledger (via `record_assessment()`). Both prompt signals — ledger peer context and cortex beliefs — originate from the same cortex analysis.

**Migration path for existing assessments:** When the cortex is enabled on an agent that already has ledger-produced inline assessments, the cortex inherits the existing trust scores as starting points for delta clamping. At `start()`, the cortex reads the most recent assessment per peer from the ledger (via `get_peer_assessment_summary()`) and seeds `_last_trust` with those values. This means the cortex builds on the existing trust trajectory rather than starting fresh. Existing assessments remain in the ledger DB — they are not modified or deleted. The cortex's first assessment for each peer is then clamped relative to the inherited trust score, not unclamped.

### FR-CX-08: CLI Commands

`cobot cortex beliefs` displays the current belief set with keys, values, timestamps, and staleness status. `cobot cortex history` displays the last N reflection cycles (configurable, default 10) with trigger reason, start time, duration, assessments-produced count, beliefs-updated count, and directives-issued count. Both commands follow existing CLI patterns (command groups, tabular output).

### FR-CX-09: Extension Points

`cortex.after_reflect` is emitted after each reflection cycle with payload: `trigger_reason`, `duration_seconds`, `assessments_produced` (count), `beliefs_updated` (list of keys), `directives_issued` (count), `reflection_summary` (compact text).
`cortex.after_assess` is emitted per assessment with a payload matching the former `ledger.after_assess` schema: `peer_id`, `info_score`, `trust`, `rationale`, `assessment_id`, `timestamp`.

## Non-Functional Requirements

### NFR-CX-01: Performance

- Heuristic trigger evaluation completes in < 5ms per trigger check (no LLM calls, no DB queries — operates on in-memory event buffer)
- Belief injection via `loop.transform_system_prompt` completes in < 1ms (reads from in-memory belief store)
- Observation hook handlers (`loop.on_message`, `loop.after_send`, etc.) complete in < 1ms (append to in-memory buffer only)
- Zero added latency to main loop LLM calls — all cortex LLM work runs asynchronously via subagent

### NFR-CX-02: Isolation

- Cortex LLM session shares zero state with main loop LLM session (separate system prompt, separate message history, separate context)
- Cortex failure (LLM timeout, API error, malformed output) does not affect main loop operation — beliefs freeze at last known state, main loop continues normally
- Cortex plugin can be disabled without affecting any other plugin — ledger falls back to inline `assess_peer` assessment, no assessment gap

### NFR-CX-03: Configurability

- All timing, threshold, and capacity parameters configurable via `cobot.yml` under the `cortex:` key
- LLM provider and model overridable for cortex independently of main loop
- Trigger types individually enableable/disableable
- Configuration changes take effect on next reflection cycle without restart

### NFR-CX-04: Testability

- Co-located tests per project conventions
- Trigger evaluation testable in isolation (no LLM, no DB required)
- Belief management testable in isolation (in-memory operations)
- Cortex LLM calls testable via mock subagent provider
- Integration tests verify belief injection into system prompt and assessment writes to ledger DB

### NFR-CX-05: Observability

- Every reflection cycle emits a `cortex.after_reflect` event consumable by the observability plugin
- Every
assessment emits `cortex.after_assess` event
- Cortex logs at INFO level: reflection trigger reason, cycle duration, output summary
- Cortex logs at DEBUG level: full context sent to cortex LLM, full cortex LLM response

---
stepsCompleted: [1, 2, 3, 4, 5, 6, 7, 8]
status: 'revised'
completedAt: '2026-03-09'
revisedAt: '2026-03-09'
revisionSource: 'Steelman review by Doxios (issue #234, comment #1564)'
inputDocuments:
  - _bmad-output/planning-artifacts/cortex/prd.md
  - _bmad-output/planning-artifacts/cortex/validation-report-2026-03-09.md
  - _bmad-output/project-context.md
  - docs/architecture.md
  - docs/architecture/session-plugin.md
  - docs/plugin-design-guide.md
  - docs/project-overview.md
  - docs/source-tree-analysis.md
  - docs/dev/conventions.md
  - docs/for-agents.md
  - docs/index.md
  - docs/development-guide.md
  - docs/research/observability-plugin/prd.md
  - docs/research/peer-interaction-ledger/prd.md
  - docs/research/simulation-suite/prd.md
  - docs/research/simulation-suite/architecture.md
workflowType: 'architecture'
project_name: 'cobot'
user_name: 'David'
date: '2026-03-09'
editHistory:
  - date: '2026-03-09'
    changes: 'Post-steelman revision: simplified belief lifecycle (2-state), added trust delta clamping, deferred new-peer trigger, added token budget analysis, resolved system prompt conflict (Option A), added simulation test plan'
  - date: '2026-03-09'
    changes: 'Party mode review: adopted Doxios assess_peer fallback — ledger retains assess_peer when cortex absent (dual-mode), cortex is truly optional. Observability subscribes to both event sources. No single point of failure on judgment axis. Added counter-argument to plugin decomposition (complexity is inherent, splitting creates worse coordination). Added two-increment staging: Increment 1 = reflection pipeline (FRs 1-4,7,9), Increment 2 = belief system (FRs 5,6,8) — validate before layering'
  - date: '2026-03-09'
    changes: 'Follow-up review (Doxios): Updated first-assessment clamping to conservative ±3 absolute range. Added concurrent reflection mutex. Added migration path for existing assessments (_last_trust seeded from ledger at start). Added inline assessment evidence summary to PRD.'
  - date: '2026-03-09'
    changes: 'Reverted Decision 10 (facts-only prompt mode): ledger always injects full assessment data. Beliefs are additive interpretive layer, not replacement. Contradiction structurally impossible — cortex forms beliefs from ledger data and writes assessments back to ledger. Added peer_id to belief injection format.'
---

# Architecture Decision Document

_This document builds collaboratively through step-by-step discovery. Sections are appended as we work through each architectural decision together._

## Project Context Analysis

### Requirements Overview

**Functional Requirements (9 FRs):**

| Category | FRs | Architectural Implication |
|----------|-----|---------------------------|
| Observation (FR-CX-01) | Passive hooks on `loop.on_message`, `loop.after_send`, `loop.after_llm`, `loop.after_tool` | Read-only event buffer, <1ms handlers, matches observability plugin's passive observer pattern |
| Triggering (FR-CX-02) | Scheduled timer (with activity gate), interaction count threshold | Judgment-free Layer 1: in-memory counters + timer, <5ms evaluation, no LLM/DB calls. New-peer trigger deferred to Growth |
| Reflection (FR-CX-03) | Secondary LLM call via subagent with structured context and output | Isolated LLM session, configurable timeout (60s default), structured JSON output parsing |
| Assessment Output (FR-CX-04) | Write assessments to ledger DB, emit `cortex.after_assess` | Deterministic `info_score` (never LLM-set) + LLM-set `trust`/`rationale`, atomic writes |
| Belief Management (FR-CX-05) | Key-value beliefs with metadata, cap, TTL-based expiry | In-memory store with 2-state lifecycle: ACTIVE → EXPIRED. Cap at 20 (configurable), TTL-based expiry (default 120 min) |
| Belief Injection (FR-CX-06) | Inject beliefs into main loop system prompt every cycle | `loop.transform_system_prompt` handler, <1ms read from in-memory store |
| Ledger Refactoring (FR-CX-07) | Retain `assess_peer` tool as fallback when cortex is absent, suppress inline assessment when cortex is active. Add `record_assessment()` public API. Ledger prompt enrichment always shows full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence | Non-breaking change to ledger plugin. Dual-mode: cortex-present suppresses `assess_peer`, cortex-absent retains full inline assessment. Observability plugin must subscribe to both `ledger.after_assess` (fallback mode) and `cortex.after_assess` (cortex mode). No prompt conflict: cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()` — both signals originate from the same cortex analysis, so contradiction is structurally impossible. Beliefs are an additive interpretive layer |
| CLI Commands (FR-CX-08) | `cobot cortex beliefs`, `cobot cortex history` | Click command group, tabular output, follows existing CLI patterns |
| Extension Points (FR-CX-09) | `cortex.after_reflect`, `cortex.after_assess` | Consumed by observability plugin. Schema follows existing event conventions |

**Non-Functional Requirements (5 NFRs):**

| Concern | Key NFRs | Architectural Driver |
|---------|----------|----------------------|
| Performance (NFR-CX-01) | <5ms trigger eval, <1ms belief injection, <1ms hooks | All Layer 1 ops are in-memory only — no DB, no LLM, no I/O |
| Isolation (NFR-CX-02) | Zero shared state with main loop, failure tolerance | Cortex failure freezes beliefs at last known state; main loop unaffected |
| Configurability (NFR-CX-03) | All params via `cobot.yml`, LLM provider override | Config changes take effect on next reflection cycle without restart |
| Testability (NFR-CX-04) | Co-located tests, mock subagent, isolated unit tests | Trigger eval and belief management testable without LLM or DB |
| Observability (NFR-CX-05) | Events for observability plugin, structured logging | `cortex.after_reflect` + `cortex.after_assess` follow existing event patterns |

**Scale & Complexity:**

- Primary domain: Plugin development (Python, asyncio)
- Complexity level: Medium
- Estimated architectural components: 3 (cortex plugin, ledger refactoring, CLI commands)

### Technical Constraints & Dependencies

1. **Cobot plugin architecture is non-negotiable** — PluginMeta, extension points, hook pipeline, async lifecycle, co-located tests (project-context.md: 68 rules)
2. **Ledger plugin refactoring is a prerequisite** — add `record_assessment()` public API, add dual-mode behavior (`assess_peer` retained as fallback when cortex absent, suppressed when cortex active)
3. **Subagent plugin provides isolated LLM sessions** — `SubagentProvider.spawn()` with custom system prompt and context dict
4. **Loop plugin provides 12 extension points** for observation — cortex implements 4 hooks + `loop.transform_system_prompt`
5. **Context isolation is non-negotiable** — separate system prompt, separate message history, separate context. Cortex receives structured summaries, never raw conversation
6. **Assessment data model boundaries** — uses existing `Assessment` model (`peer_id`, `info_score`, `trust`, `rationale`, `created_at`). No schema extension
7. **Priority band: 20-29 (service plugins)** — cortex at ~23 (after ledger at 21, after observability at 22, before tools at 30)

### Cross-Cutting Concerns Identified

- **Ledger dual-mode behavior** — when cortex is active, ledger suppresses the `assess_peer` tool and defers assessment to cortex. When cortex is absent, ledger retains full inline assessment via `assess_peer`. Observability plugin must subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode) to capture all assessments regardless of mode.
- **Belief state coherence** — beliefs persist across cycles with TTL-based expiry (default 120 min). Reaffirmed beliefs reset their TTL. Expired beliefs are removed. Cap enforced with oldest-first eviction.
- **Scheduling mechanism choice** — cortex needs periodic execution. Existing cron/heartbeat plugins provide scheduling, but the PRD describes integrated timer triggers with skip-on-no-events logic. Architecture must decide: delegate to cron or own the timer.
- **LLM provider flexibility** — cortex can use a different model than the main loop. Must resolve provider selection independently of the main loop's configured provider.
- **Event schema consistency** — `cortex.after_reflect` and `cortex.after_assess` payloads must follow observability conventions so the observability plugin can consume them without cortex-specific changes.

## Starter Template Evaluation

### Primary Technology Domain

Python plugin within an existing brownfield codebase. All technology decisions are inherited from the Cobot project.

### Selected Starter: Existing Cobot Plugin Pattern

**Rationale:** The cortex plugin follows the same plugin architecture as the 20+ existing plugins. Every technology decision — language, runtime, testing, linting, build, async patterns — is already made by the project.
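The skeleton implied by this pattern can be sketched as follows. Note this is a sketch under stated assumptions: `PluginMeta` here is a minimal stand-in dataclass, and Cobot's real base class, hook registration, and `PluginMeta` fields may differ; only the lifecycle shape (`configure()` sync, `start()`/`stop()` async, `create_plugin()` factory) and the declared metadata values come from this document.

```python
# Sketch of the cortex plugin skeleton following the lifecycle conventions
# named above. PluginMeta is a stand-in, not Cobot's real class.
from dataclasses import dataclass, field

@dataclass
class PluginMeta:  # stand-in for the project's real PluginMeta
    id: str
    version: str
    priority: int
    dependencies: list[str] = field(default_factory=list)
    consumes: list[str] = field(default_factory=list)
    extension_points: list[str] = field(default_factory=list)

class CortexPlugin:
    meta = PluginMeta(
        id="cortex",
        version="0.1.0",
        priority=23,  # after ledger (21) and observability (22)
        dependencies=["config", "ledger"],
        consumes=["subagent", "llm"],
        extension_points=["cortex.after_reflect", "cortex.after_assess"],
    )

    def configure(self, config: dict) -> None:
        # Sync lifecycle step: read the `cortex:` key with documented defaults.
        cortex_cfg = config.get("cortex", {})
        self.schedule_minutes = cortex_cfg.get("schedule_minutes", 15)
        self.max_beliefs = cortex_cfg.get("max_beliefs", 20)
        self.reflection_timeout_seconds = cortex_cfg.get("reflection_timeout_seconds", 60)

    async def start(self) -> None:
        # Async lifecycle step: initialize in-memory stores.
        self._event_buffer: list[dict] = []
        self._beliefs: dict = {}
        self._reflecting = False  # mutex flag for concurrent reflection protection

    async def stop(self) -> None:
        self._event_buffer.clear()

def create_plugin() -> CortexPlugin:  # factory per project convention
    return CortexPlugin()
```

The point of the sketch is that no new dependency or structure is introduced: everything above is the existing plugin shape with cortex-specific values filled in.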
**Architectural Decisions Provided by Existing Pattern:**

| Decision | Value |
|----------|-------|
| Language & Runtime | Python 3.11+ with asyncio |
| CLI Framework | Click >=8.0 |
| Testing | pytest >=8.0, pytest-asyncio >=0.23, co-located |
| Linting/Formatting | ruff >=0.2 |
| Plugin Structure | `__init__.py` + `plugin.py` + `README.md` + `tests/test_plugin.py` |
| Configuration | `cobot.yml` under `cortex:` key |
| Lifecycle | `configure()` (sync), `start()`/`stop()` (async), `create_plugin()` factory |
| Logging | `self.log_debug()`, `self.log_info()`, `self.log_warn()`, `self.log_error()` |

**New Dependencies:** None.

## Core Architectural Decisions

### Decision Priority Analysis

**Critical Decisions (Block Implementation):**

| # | Decision | Choice | Rationale |
|---|----------|--------|-----------|
| 1 | Scheduling mechanism | Own `asyncio.Task` timer | Tightly coupled to cortex internal state (event buffer). Skip-on-no-events is trivial. Cron/heartbeat serve different purposes (main session injection) |
| 2 | Ledger write interface | New public `record_assessment()` method on ledger plugin | Clean plugin boundary. Ledger computes `info_score` internally. Follows the public query API pattern from observability work |
| 3 | Cortex output parsing | JSON instruction in system prompt with fence-extraction fallback | Pure JSON output instruction. If parse fails, try extracting from a fenced `json` code block. On total failure, log and skip cycle |
| 4 | Belief data model | `Belief` dataclass with TTL-based expiry | `dict[str, Belief]` for O(1) lookup. 2-state lifecycle (ACTIVE → EXPIRED). Evict oldest on cap. Simple, testable, in-memory |
| 5 | Plugin priority | 23 | After ledger (21) and observability (22). Extension point wiring happens after all plugins register, so order is safe |
| 6 | Event buffer | Simple `list[dict]`, cleared after each reflection cycle | No maxlen needed — reflection cycles are frequent enough |
| 7 | Reflection history | `collections.deque(maxlen=N)`, default 50, configurable | In-memory with persistence via memory plugin. Consumed by CLI `cobot cortex history` |
| 8 | State persistence | Memory plugin (`memory.store`/`memory.retrieve`) | Beliefs and reflection history persisted via existing memory plugin. Graceful degradation if memory unavailable — works in-memory only |
| 9 | Trust delta clamping | Max ±3 trust change per reflection cycle (configurable) | Prevents a single hallucinated reflection from catastrophically tanking a peer's trust. First assessment clamped to conservative ±3 absolute range (prevents anchoring problem). Subsequent assessments clamped relative to previous score. Peer recovers in one good cycle instead of five |
| 10 | System prompt conflict resolution | Ledger always shows full assessment data; beliefs are additive | Ledger `enrich_prompt` always injects full data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence. Cortex beliefs are an additive interpretive layer. Contradiction is structurally impossible: cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()` — both signals in the prompt originate from the same cortex analysis |
| 11 | Concurrent reflection protection | Mutex flag (`_reflecting`), skip trigger if cycle in progress | Prevents overlapping reflection cycles when a cycle takes longer than the trigger interval. One cycle at a time — no double assessment writes or belief state corruption |
| 12 | Migration from existing assessments | Seed `_last_trust` from ledger at `start()` | Cortex inherits existing trust scores as starting points for delta clamping. Existing assessments remain in DB untouched. First cortex assessment is clamped relative to inherited score, not unclamped |

**Deferred Decisions (Post-MVP):**

| Decision | Rationale for Deferral |
|----------|------------------------|
| New peer discovered trigger | Growth feature — subsumed by interaction count trigger in MVP (new peer's first interactions hit the counter). Full value comes with "think before you talk" synchronous consultation |
| LLM provider override for cortex | Growth feature — MVP uses same provider via subagent. Requires subagent API extension or direct LLM call |
| Cross-session learning | Vision feature — MVP persists current state but does not carry evolving lessons across restarts |
| Action directives via `session.poll_messages` | Growth feature — requires directive format design and main loop integration |
| Synchronous cortex consultation | Growth feature — requires decision-deferral triggers, consultation protocol, holding-response mechanism |

### Data Architecture

**Cortex state persisted via memory plugin.** Beliefs and reflection history are serialized to JSON and stored using `memory.store()` / `memory.retrieve()`. This uses existing infrastructure — the cortex doesn't know or care whether memory is backed by files, a vector DB, or something else. Cortex state shows up in `cobot memory list` and `cobot memory get cortex-beliefs` for free.

**Persistence flow:**

- On `start()`: `memory.retrieve("cortex-beliefs")` and `memory.retrieve("cortex-history")` → deserialize JSON → populate in-memory stores
- On `start()`: **Seed `_last_trust` from ledger** — call `list_peers()` + `get_peer_assessment_summary()` for each peer to read the most recent trust score. This inherits existing inline assessments as starting points for delta clamping, ensuring the cortex builds on the existing trust trajectory rather than starting fresh
- After each reflection cycle: `memory.store("cortex-beliefs", json.dumps(...))` and `memory.store("cortex-history", json.dumps(...))`
- If memory plugin unavailable: cortex works in-memory only, beliefs lost on restart (graceful degradation)

**In-memory data structures:**

| Structure | Type | Purpose |
|-----------|------|---------|
| `_event_buffer` | `list[dict]` | Accumulated events since last reflection. Cleared after each cycle |
| `_beliefs` | `dict[str, Belief]` | Current belief set. Capped at `max_beliefs` (default 20) |
| `_reflection_history` | `deque[ReflectionRecord]` | Last N reflection cycles for CLI and audit |
| `_interaction_count` | `int` | Interactions since last reflection, for count trigger |
| `_last_reflection_time` | `float` | Timestamp of last reflection, for timer trigger |
| `_last_trust` | `dict[str, int]` | Last trust score per peer, for delta clamping. Seeded from ledger at `start()` |
| `_reflecting` | `bool` | Mutex flag — `True` while a reflection cycle is in progress. Triggers skip when set |

### Authentication & Security

**No additional security concerns for MVP.** The cortex is an internal plugin — it reads from the loop hooks (existing trust boundary) and writes to the ledger (existing trust boundary). No new external interfaces. The cortex LLM call goes through the subagent, which uses the same LLM provider as the main loop.

**Credential safety:** The cortex system prompt and context must never include Nostr private keys, API keys, or secrets. Only public identifiers (peer_id, agent_name) and behavioral data.
### API & Communication Patterns

**Plugin-to-plugin communication:**

```
Cortex ──dependency──▶ Ledger   (record_assessment, list_peers, get_peer_assessment_summary)
Cortex ──optional────▶ Memory   (store/retrieve for state persistence)
Cortex ──consumes────▶ Subagent (spawn() for secondary LLM call)
Cortex ──implements──▶ Loop hooks (on_message, after_send, after_llm, after_tool, transform_system_prompt)
Cortex ──defines─────▶ Extension points (cortex.after_reflect, cortex.after_assess)
Cortex ──implements──▶ CLI commands (cortex beliefs, cortex history)
```

**Cortex reflection cycle data flow:**

```
1. Triggers evaluate (in-memory, <5ms)
   │
2. If triggered: gather context
   ├── Current beliefs (in-memory)
   ├── Recent interactions from ledger (list_peers + get_peer_assessment_summary)
   ├── SOUL.md content (from soul plugin or config)
   ├── Event buffer contents (in-memory)
   └── Previous reflection summary (in-memory)
   │
3. Spawn subagent with cortex system prompt + structured context
   │
4. Parse cortex LLM response (JSON)
   │
5. Apply outputs:
   ├── Assessments → ledger.record_assessment() + emit cortex.after_assess
   ├── Beliefs → update _beliefs dict
   └── Emit cortex.after_reflect
   │
6. Persist state via memory plugin
   │
7. Clear event buffer, update counters
```

### Decision Impact Analysis

**Implementation Sequence:**

1. Ledger refactoring — add `record_assessment()` public API, add dual-mode `assess_peer` (active when cortex absent, suppressed when cortex present). No changes to `enrich_prompt()` — it always shows full assessment data
2. Cortex plugin skeleton — PluginMeta, lifecycle, configuration
3. Observation hooks — passive event collection
4. Trigger system — timer (with activity gate), interaction count threshold
5. Belief management — Belief dataclass (2-state TTL), store, injection via `loop.transform_system_prompt`
6. Reflection cycle — subagent spawn, output parsing, trust delta clamping, assessment writes
7. Extension points — `cortex.after_reflect`, `cortex.after_assess`
8. CLI commands — `cobot cortex beliefs`, `cobot cortex history`
9. Observability plugin update — subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode)

**Cross-Component Dependencies:**

- Ledger refactoring must complete before the cortex can write assessments
- Observability plugin must handle both assessment event sources
- CLI commands depend on the belief store and reflection history being populated

**Two-Increment Staging:** The implementation is split into two increments to enable empirical validation before layering on the belief system:

| Increment | FRs | Scope | Validation Gate |
|-----------|-----|-------|-----------------|
| **1: Reflection Pipeline** | FR-CX-01, 02, 03, 04, 07, 09 | Observation hooks, triggers, reflection cycle, assessment writes, ledger refactoring (dual-mode), extension points | Compare inline vs. cortex assessment quality on the same interaction sequences. Cortex must produce richer rationale with more behavioral observations |
| **2: Belief System** | FR-CX-05, 06, 08 | Belief management, belief injection via `loop.transform_system_prompt`, CLI commands | Increment 1 validated that cortex assessments are meaningfully better than inline |

**Rationale:** FRs 5-6 (beliefs) have no downstream dependencies from FRs 1-4. The reflection cycle writes assessments and emits events regardless of whether beliefs exist. The belief system is additive — it layers real-time prompt guidance on top of the assessment pipeline. Staging means the belief system earns its place with data from increment 1 rather than shipping on theory.

**Increment 1 alone delivers:** the secondary LLM assessment pipeline (Doxios's "80% value" simple approach) — but built within the cortex architecture, so increment 2 layers cleanly on top without refactoring.
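The reflection cycle data flow (steps 1-7 above) can be sketched as a single function. Collaborators are injected as callables so the flow can be exercised without the real subagent, ledger, or memory plugin; all names here are illustrative stand-ins for the plugin's actual methods:

```python
import json


def run_reflection(gather_context, spawn_subagent, apply_assessment,
                   beliefs: dict, event_buffer: list) -> dict:
    """One reflection cycle, following steps 2-7 of the data flow.

    Persistence (step 6) is elided; the returned dict carries the summary
    that would be appended to the reflection history.
    """
    context = gather_context()                        # 2. beliefs, ledger data, events
    raw = spawn_subagent(context)                     # 3. secondary LLM call
    output = json.loads(raw)                          # 4. parse structured JSON output
    for assessment in output.get("assessments", []):  # 5a. write via ledger public API
        apply_assessment(assessment)
    for belief in output.get("beliefs", []):          # 5b. update the belief store
        beliefs[belief["key"]] = belief
    event_buffer.clear()                              # 7. reset buffer for next cycle
    return output
```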
## Implementation Patterns & Consistency Rules

### Pattern Categories Defined

**Critical Conflict Points Identified:** 5 areas where AI agents could make different choices when implementing the cortex plugin.

### Naming Patterns

**File & Module Naming:**

| Item | Convention | Example |
|------|------------|---------|
| Plugin directory | `cobot/plugins/cortex/` | Matches all existing plugins |
| Plugin module | `plugin.py` | Single module, not split into sub-modules |
| Data models | `models.py` | Dataclasses for `Belief`, `ReflectionRecord` |
| Tests | `tests/test_plugin.py` | Co-located, single test file |
| CLI module | `cli.py` | Separate from `plugin.py`, registered via `__init__.py` |

**Internal Naming:**

| Item | Convention | Example |
|------|------------|---------|
| Private state | `_` prefix | `_beliefs`, `_event_buffer`, `_reflection_history` |
| Config keys | snake_case in `cobot.yml` | `reflection_interval`, `max_beliefs`, `interaction_threshold` |
| Memory keys | kebab-case strings | `"cortex-beliefs"`, `"cortex-history"` |
| Extension points | dotted namespace | `cortex.after_reflect`, `cortex.after_assess` |
| Belief keys | lowercase kebab-case | `"alice-is-reliable"`, `"market-data-stale"` |

### Structure Patterns

**Dataclass Placement:** All cortex-specific dataclasses go in `models.py`, not inline in `plugin.py`:

```python
# cobot/plugins/cortex/models.py
from __future__ import annotations

from dataclasses import dataclass, field
import time


@dataclass
class Belief:
    key: str
    value: str
    rationale: str
    source_cycle: int
    created_at: float = field(default_factory=time.time)
    ttl_minutes: float = 120.0  # configurable default

    @property
    def is_expired(self) -> bool:
        return (time.time() - self.created_at) > (self.ttl_minutes * 60)

    def reaffirm(self, cycle: int) -> None:
        """Reset TTL when cortex reaffirms this belief."""
        self.created_at = time.time()
        self.source_cycle = cycle


@dataclass
class ReflectionRecord:
    cycle: int
    timestamp: float
    trigger: str  # "timer" | "interaction_count"
    peers_assessed: list[str]
    beliefs_updated: list[str]
    summary: str
    elapsed_seconds: float
```

**Hook Handler Organization:** All loop hook handlers are private methods on `CortexPlugin`, prefixed with `_on_`:

```python
async def _on_message(self, ctx: dict) -> dict: ...
async def _on_after_send(self, ctx: dict) -> dict: ...
async def _on_after_llm(self, ctx: dict) -> dict: ...
async def _on_after_tool(self, ctx: dict) -> dict: ...
async def _on_transform_system_prompt(self, ctx: dict) -> dict: ...
```

### Format Patterns

**Cortex LLM Output Schema:** The cortex system prompt instructs the subagent to return this exact JSON structure:

```json
{
  "assessments": [
    {
      "peer_id": "npub1abc...",
      "trust": 4,
      "rationale": "Six interactions over 3 weeks. Consistent requester with clear task descriptions. Reliable follow-through on all requests. No red flags."
    }
  ],
  "beliefs": [
    {
      "key": "alice-is-reliable",
      "value": "Alice consistently delivers accurate information",
      "rationale": "3 consecutive accurate predictions confirmed"
    }
  ],
  "summary": "One-paragraph reflection summary for history"
}
```

- `assessments` array: may be empty. Each entry has `peer_id` (string), `trust` (integer -10 to +10, same semantics as the existing ledger assessment), and `rationale` (string, behavioral observations — the primary signal).
- `info_score` is **never in the LLM output** — the ledger computes it deterministically via `compute_info_score()` when writing the assessment. The cortex LLM receives `info_score` as read-only context (from `get_peer_assessment_summary()`) to calibrate its trust judgment.
- `beliefs` array: may be empty. Each entry has `key`, `value`, `rationale`.
- `summary`: always present, always a string.

The cortex system prompt includes the ledger's `_SCORE_GUIDE` text to calibrate the LLM's trust scoring.

**Assessment Write Flow:**

1. Cortex LLM returns `{peer_id, trust, rationale}`
2. Cortex applies **trust delta clamping**: `clamped_trust = clamp(trust, last_trust ± MAX_TRUST_DELTA)`. The first assessment for a peer (no entry in `_last_trust`) is clamped to the conservative absolute range `[-MAX_TRUST_DELTA, +MAX_TRUST_DELTA]` (default `[-3, +3]`). Default `MAX_TRUST_DELTA = 3` (configurable via `cobot.yml` as `max_trust_delta`)
3. Cortex calls `ledger.record_assessment(peer_id, clamped_trust, rationale)`
4. Ledger internally calls `compute_info_score(peer, assessment_count)` and stores the full assessment — `compute_info_score` stays in `ledger/models.py`, as it is a deterministic function of interaction data
5. Cortex updates `_last_trust[peer_id] = clamped_trust`
6. Cortex emits `cortex.after_assess`

**Trust Delta Clamping Rationale:** Prevents a single hallucinated reflection from catastrophically shifting a peer's trust. A peer at trust +4 cannot drop below +1 in a single cycle. If the cortex genuinely believes trust should be lower, it will produce the same signal in the next cycle, moving trust to -2. This creates a 2-cycle minimum for large trust swings, giving the operator time to audit via `cobot cortex history`.
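Steps 3-6 of the write flow around the clamp might look like the following sketch. The `apply_assessment` name, the `FakeLedger`-style collaborator, and passing the clamp as a callable are all assumptions for illustration, not the plugin's actual API:

```python
def apply_assessment(ledger, last_trust: dict, assessment: dict, clamp, emit) -> int:
    """Steps 3-6 of the write flow: clamp the proposed trust, record via the
    ledger's public API, update the per-peer cache, emit cortex.after_assess.

    `clamp` stands in for _clamp_trust, injected so the flow is testable alone.
    """
    peer_id = assessment["peer_id"]
    clamped = clamp(peer_id, assessment["trust"])
    assessment_id = ledger.record_assessment(peer_id, clamped, assessment["rationale"])
    last_trust[peer_id] = clamped  # next cycle clamps relative to this score
    emit("cortex.after_assess", {
        "peer_id": peer_id,
        "trust": clamped,
        "rationale": assessment["rationale"],
    })
    return assessment_id
```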
```python
MAX_TRUST_DELTA = 3  # default, configurable

def _clamp_trust(self, peer_id: str, proposed_trust: int) -> int:
    current = self._last_trust.get(peer_id)
    if current is None:
        # First assessment: clamp to conservative absolute range [-MAX, +MAX]
        # Prevents hallucinated first-contact from setting extreme anchor
        return max(-self._max_trust_delta, min(self._max_trust_delta, proposed_trust))
    delta = proposed_trust - current
    clamped_delta = max(-self._max_trust_delta, min(self._max_trust_delta, delta))
    return current + clamped_delta
```

**Belief Injection Format:** Injected into the main loop system prompt via `loop.transform_system_prompt`:

```
## Cortex Beliefs
- alice-is-reliable: Alice consistently delivers accurate information
- market-data-caution: Exercise caution with Bob on large-scope requests
```

Rules:

- Each belief on its own line, prefixed with `- `
- Format: `{key}: {value}`
- Only active (non-expired) beliefs are injected — expired beliefs are removed, never shown
- Section header is always `## Cortex Beliefs`
- If no beliefs exist, omit the section entirely (don't inject an empty header)
- Cortex beliefs are an additive interpretive layer complementing the ledger's full assessment data (info_score, trust, rationale, score guide, trajectory). The ledger always injects full data regardless of cortex presence. Contradiction is structurally impossible — the cortex forms beliefs from ledger data and writes assessments back to the ledger
- Peer-specific beliefs should include the peer_id, e.g., `- npub-farm1-caution (npub-farm1): Exercise caution...`

**Extension Point Event Payloads:**

`cortex.after_reflect`:

```python
{
    "cycle": int,          # Monotonic cycle counter
    "trigger": str,        # "timer" | "interaction_count"
    "peers_assessed": list[str],
    "beliefs_added": list[str],
    "beliefs_reaffirmed": list[str],
    "beliefs_expired": list[str],
    "summary": str,
    "elapsed_seconds": float,
}
```

`cortex.after_assess`:

```python
{
    "peer_id": str,
    "trust": int,       # -10 to +10, set by cortex LLM
    "rationale": str,
    "info_score": int,  # 0-10, computed by ledger's compute_info_score()
    "cycle": int,
}
```

### Communication Patterns

**Error Handling:**

| Failure | Behavior | Logging |
|---------|----------|---------|
| Subagent timeout | Skip cycle, retain beliefs at last known state, increment skip counter | `log_warn("Reflection timed out after {timeout}s, skipping cycle {n}")` |
| JSON parse failure | Try extracting a json-fenced block. If that still fails, skip cycle | `log_warn("Failed to parse cortex output, skipping cycle {n}")` |
| Ledger write failure | Log and continue — beliefs still update, assessment not recorded | `log_error("Failed to record assessment for {peer_id}: {error}")` |
| Memory persist failure | Log and continue — in-memory state is authoritative | `log_warn("Failed to persist cortex state: {error}")` |
| Subagent unavailable | Skip cycle, retain beliefs | `log_warn("Subagent unavailable, skipping cycle {n}")` |

**Key principle:** Cortex failures never propagate to the main loop. Beliefs freeze at the last known good state.
**Logging Levels:**

| Level | Usage |
|-------|-------|
| `log_info` | Trigger fired, cycle completed with summary stats, beliefs loaded on start |
| `log_debug` | Full LLM context sent, full LLM response received, belief diff details |
| `log_warn` | Skipped cycles (timeout, parse failure, no subagent), memory persist failure |
| `log_error` | Ledger write failure, unexpected exceptions in hooks |

### Process Patterns

**Hook Handler Contract:** All observation hooks (`_on_message`, `_on_after_send`, `_on_after_llm`, `_on_after_tool`) follow the same contract:

1. **Never modify `ctx`** — read-only access
2. **Never block** — append to `_event_buffer` and return immediately
3. **Always return `ctx` unchanged** — passive observer pattern
4. **Never raise** — wrap in try/except, log errors, return `ctx`

```python
async def _on_after_llm(self, ctx: dict) -> dict:
    try:
        self._event_buffer.append({
            "type": "after_llm",
            "model": ctx.get("model"),
            "tokens_in": ctx.get("tokens_in"),
            "tokens_out": ctx.get("tokens_out"),
            "has_tool_calls": ctx.get("has_tool_calls"),
            "timestamp": time.time(),
        })
    except Exception as e:
        self.log_error(f"Hook error: {e}")
    return ctx
```

**Belief Lifecycle (2-state, TTL-based):**

```
ACTIVE → EXPIRED

- ACTIVE: Belief exists with remaining TTL (default 120 min, configurable)
- EXPIRED: TTL elapsed without reaffirmation — belief is removed

Reaffirmation: When the cortex LLM returns the same belief key in a subsequent
cycle, the belief's created_at is reset, restarting the TTL clock.

Eviction on cap: When max_beliefs is reached, the oldest belief (by created_at)
is evicted to make room, regardless of TTL remaining.
```

**Trigger Evaluation:** Two triggers, evaluated on each timer tick. Both can fire — the cortex deduplicates and runs one reflection cycle:

1. **Interaction count threshold reached** (configurable, default 5) — fires when `_interaction_count >= threshold`. The counter resets after reflection. Safety-critical: prevents accumulating too many unassessed interactions
2. **Timer interval elapsed** (configurable, default 30 min) — periodic catch-up. **Activity gate:** skips if the event buffer is empty (no interactions since the last reflection)

**Concurrent reflection protection:** The cortex maintains a `_reflecting` boolean flag. Before starting a reflection cycle, the trigger checks `_reflecting` — if `True`, the trigger is skipped and logged at DEBUG level (`"Trigger skipped: reflection already in progress"`). The flag is set to `True` at cycle start and `False` at cycle end (in a `finally` block to ensure cleanup on error). This prevents overlapping reflection cycles when a cycle exceeds the trigger interval.

The new-peer trigger is deferred to Growth (subsumed by interaction count — a new peer's first interactions hit the counter).

### Enforcement Guidelines

**All AI Agents MUST:**

1. Place dataclasses in `models.py`, never inline in `plugin.py`
2. Use the exact extension point payload schemas defined above — no extra fields, no missing fields
3. Never modify `ctx` in observation hooks — passive observer only
4. Never let cortex exceptions propagate to the main loop — always catch and log
5. Use `memory.store`/`memory.retrieve` for persistence, never direct file I/O
6. Use `self.log_*()` methods, never `print()` or raw `logging.*`
7. Follow the exact belief injection format — agents must be able to parse it consistently
8. Never set `info_score` from LLM output — `compute_info_score()` in `ledger/models.py` is the sole source; the cortex only provides `trust` and `rationale`
9. Always apply trust delta clamping before writing assessments — never write raw LLM trust output directly to the ledger
10. Ledger prompt enrichment always shows full assessment data (info_score, trust, rationale, score guide, trajectory) regardless of cortex presence — cortex beliefs are an additive interpretive layer, not a replacement
11. Always check `_reflecting` before starting a reflection cycle — never allow concurrent reflections
12. Always seed `_last_trust` from existing ledger assessments at `start()` — never discard inherited trust scores

**Anti-Patterns:**

| Anti-Pattern | Why It's Wrong | Correct Approach |
|--------------|----------------|------------------|
| Modifying `ctx` in observation hooks | Breaks the passive observer contract, may affect the main loop | Always `return ctx` unchanged |
| Calling `_db` directly on the ledger plugin | Crosses the plugin boundary, couples to internal implementation | Use the `ledger.record_assessment()` public API |
| Setting `info_score` in cortex LLM output | `info_score` is deterministic (interaction count/time/assessments), never LLM-set | Ledger computes `info_score` internally via `compute_info_score()` |
| Computing `info_score` in the cortex | `compute_info_score` belongs to the ledger domain | Cortex receives `info_score` as read-only context; ledger computes on write |
| Persisting beliefs via direct file writes | Bypasses the memory plugin abstraction | Use `memory.store("cortex-beliefs", ...)` |
| Blocking the main loop during reflection | Reflection is async background work | Run reflection in its own `asyncio.Task` |
| Using `print()` for logging | Inconsistent format, no level filtering | Use `self.log_info()`, `self.log_debug()`, etc. |
| Writing raw LLM trust to the ledger without clamping | A single hallucination can catastrophically shift trust | Always apply `_clamp_trust()` before `record_assessment()` |
| Stripping trust/rationale from the ledger when cortex is active | Destroys long-term memory — ledger rationale is institutional memory; beliefs expire after the 120 min TTL | Ledger always injects full assessment data. Beliefs are additive. Contradiction is structurally impossible, since the cortex forms beliefs from ledger data |
| Allowing the first assessment to be unclamped | A hallucinated first contact anchors all future trust deltas at an extreme value | Clamp the first assessment to the conservative `[-MAX_TRUST_DELTA, +MAX_TRUST_DELTA]` absolute range |
| Starting `_last_trust` empty when the ledger has existing assessments | Discards the existing trust trajectory; the first cortex assessment is treated as the first ever | Seed `_last_trust` from the ledger at `start()` via `get_peer_assessment_summary()` |
| Running concurrent reflection cycles | Double assessment writes, belief state corruption, race conditions | Check the `_reflecting` flag before starting a cycle; set it in a `try`/`finally` block |

## Project Structure & Boundaries

### Complete Project Directory Structure

**New files (cortex plugin):**

```
cobot/plugins/cortex/
├── __init__.py          # Docstring only
├── plugin.py            # CortexPlugin class, hook handlers, reflection cycle, belief injection
├── models.py            # Belief, ReflectionRecord dataclasses
├── cli.py               # `cobot cortex beliefs`, `cobot cortex history` commands
├── README.md            # Plugin documentation
└── tests/
    └── test_plugin.py   # Co-located tests
```

**Modified files (ledger refactoring):**

```
cobot/plugins/ledger/
├── plugin.py            # Add record_assessment() public API, add dual-mode assess_peer
│                        #   (suppress when cortex active, retain when absent).
│                        #   No changes to enrich_prompt() — always shows full data
├── models.py            # No changes — compute_info_score stays here
├── db.py                # Possible: record_assessment() computes info_score internally
├── cli.py               # No changes
└── tests/
    └── test_plugin.py   # Test dual-mode assess_peer behavior, test record_assessment() API
```

**Modified files (observability migration):**

```
cobot/plugins/observability/
└── plugin.py            # Subscribe to both ledger.after_assess (fallback mode)
                         #   and cortex.after_assess (cortex mode)
```

### Requirements to Structure Mapping

| FR | File(s) | Description |
|----|---------|-------------|
| FR-CX-01: Observation | `cortex/plugin.py` | `_on_message`, `_on_after_send`, `_on_after_llm`, `_on_after_tool` hook handlers |
| FR-CX-02: Triggers | `cortex/plugin.py` | Timer (activity-gated) + interaction count threshold, in-memory counters |
| FR-CX-03: Reflection | `cortex/plugin.py` | `_run_reflection()` method, subagent spawn, JSON parsing |
| FR-CX-04: Assessment | `cortex/plugin.py` + `ledger/plugin.py` | Cortex sends `(peer_id, trust, rationale)` → ledger's `record_assessment()` computes `info_score` and writes |
| FR-CX-05: Beliefs | `cortex/plugin.py` + `cortex/models.py` | `Belief` dataclass, `_beliefs` dict, lifecycle management |
| FR-CX-06: Injection | `cortex/plugin.py` | `_on_transform_system_prompt()` handler |
| FR-CX-07: Ledger refactor | `ledger/plugin.py` | Add dual-mode `assess_peer` (suppress when cortex active, retain as fallback when absent), add `record_assessment()` public method. No changes to `enrich_prompt()` — always shows full assessment data |
| FR-CX-08: CLI | `cortex/cli.py` | Click command group, registered via `cli.commands` in `implements` |
| FR-CX-09: Extensions | `cortex/plugin.py` | `cortex.after_reflect`, `cortex.after_assess` in PluginMeta |

### Architectural Boundaries

```
        ┌─────────────────────────┐
        │       Loop Plugin       │
        │ (extension point owner) │
        └────────────┬────────────┘
                     │ hooks
        ┌────────────┴────────────────┐
        │                             │
┌───────▼──────────────┐   ┌──────────▼──────────────┐
│    Cortex Plugin     │   │      Ledger Plugin      │
│  (reflection owner)  │   │  (data service owner)   │
│                      │   │                         │
│ - observation hooks  │   │ - peer tracking         │
│ - trigger evaluation │   │ - interaction recording │
│ - reflection cycle   │   │ - assessment storage    │
│ - belief management  │   │ - compute_info_score()  │
│ - belief injection   │   │ - prompt enrichment     │
│   (interpretive)     │   │  (full assessment data) │
│                      │   │ - query tools           │
│                      │   │ - assess_peer fallback  │
│                      │   │   (when cortex absent)  │
└───────┬──────────────┘   └──────────▲──────────────┘
        │                             │
        │ record_assessment()         │ list_peers()
        │ (trust, rationale)          │ get_peer_assessment_summary()
        └─────────────────────────────┘
```

**Boundary Rules:**

| Boundary | Direction | Interface | What Crosses |
|----------|-----------|-----------|--------------|
| Cortex → Ledger | Write | `record_assessment(peer_id, trust, rationale)` | Trust score + rationale. Ledger computes `info_score` internally |
| Cortex → Ledger | Read | `list_peers()`, `get_peer_assessment_summary()` | Peer data + latest assessments (including `info_score`) for cortex context building |
| Cortex → Subagent | Spawn | `SubagentProvider.spawn(task, context, system_prompt)` | Structured context dict, cortex system prompt |
| Cortex → Memory | Persist | `memory.store(key, content)` / `memory.retrieve(key)` | Serialized beliefs + reflection history |
| Cortex → Loop | Inject | `loop.transform_system_prompt` handler | Formatted belief block appended to the system prompt |
| Cortex → Observability | Events | `cortex.after_reflect`, `cortex.after_assess` | Structured event payloads (defined in step 5) |

**Data Boundaries:**

- Cortex **never** accesses `ledger._db` directly
- Cortex **never** computes `info_score` — it receives it as read-only context; the ledger computes it on write
- Cortex **never** reads raw conversation content — it receives structured summaries from the ledger's public API
- Ledger **never** triggers reflection — the cortex owns its own scheduling
- Memory plugin **never** interprets cortex state — it stores/retrieves opaque strings

### Integration Points

**Hook registration** — cortex declares `implements` in PluginMeta; the registry wires handlers automatically.

**Plugin dependency** — cortex declares `dependencies: ["config", "ledger"]`, `optional_dependencies: ["memory"]`, `consumes: ["subagent"]`.

**Async reflection** — cortex spawns its own `asyncio.Task` for the reflection timer; it does not use cron/heartbeat.

**Ledger `record_assessment()` Public API (new):**

```python
def record_assessment(self, peer_id: str, trust: int, rationale: str) -> int:
    """Record a behavioral assessment. Computes info_score internally.

    Returns assessment_id. Raises ValueError if peer not found or trust out of range.
    """
```

This complements the existing `_tool_assess_peer` flow. When the cortex is active, it calls `record_assessment()` directly. When the cortex is absent, the `assess_peer` tool continues to work as before via `_tool_assess_peer`. The `record_assessment()` method extracts the shared logic from `_tool_assess_peer`, so both paths use the same write-plus-`compute_info_score` flow.

## Architecture Validation Results

### Coherence Validation

**Decision Compatibility:** All decisions are internally consistent. Cortex at priority 23 sits correctly between ledger (21) and loop (50). The `record_assessment(peer_id, trust, rationale)` API aligns with the existing `_tool_assess_peer` logic. Memory plugin persistence matches the existing `memory_files` key-value implementation. The subagent `spawn()` interface matches cortex needs. The `consumes: ["subagent"]` declaration matches the subagent plugin's `capabilities: ["subagent", "tools"]`.

**Pattern Consistency:** No contradictions found. Naming conventions (snake_case config, kebab-case memory keys, dotted extension points) match existing plugin conventions. The hook handler contract (passive observer, never modify ctx) matches the observability plugin's established pattern. Belief injection via `loop.transform_system_prompt` follows the same append pattern as the ledger's `enrich_prompt`.

**Structure Alignment:** The project structure follows the established plugin pattern (observability, ledger). Boundaries are enforced through public APIs only — no cross-plugin internal state access.

### Requirements Coverage Validation

**Functional Requirements Coverage:**

| FR | Status | Architectural Support |
|----|--------|-----------------------|
| FR-CX-01: Observation | Covered | Hook handlers in plugin.py, event buffer pattern |
| FR-CX-02: Triggers | Covered | Timer (with activity gate), interaction count. New-peer trigger deferred to Growth |
| FR-CX-03: Reflection | Covered | Subagent spawn, JSON output schema, timeout handling |
| FR-CX-04: Assessment | Covered | Cortex provides trust + rationale, ledger computes info_score on write |
| FR-CX-05: Beliefs | Covered | Belief dataclass, 2-state lifecycle (ACTIVE→EXPIRED), TTL-based expiry, cap with oldest-first eviction |
| FR-CX-06: Injection | Covered | `loop.transform_system_prompt`, format specified, active beliefs only (expired removed) |
| FR-CX-07: Ledger refactor | Covered | Dual-mode assess_peer (suppress when cortex active, retain as fallback), add record_assessment() public API. Ledger always shows full assessment data — no prompt mode changes needed |
| FR-CX-08: CLI | Covered | cli.py with `cobot cortex beliefs` and `cobot cortex history` |
| FR-CX-09: Extensions | Covered | `cortex.after_reflect`, `cortex.after_assess` with exact payload schemas |

**Non-Functional Requirements Coverage:**

| NFR | Status | Architectural Support |
|-----|--------|-----------------------|
| NFR-CX-01: Performance | Covered | <5ms triggers, <1ms hooks, <1ms injection — all in-memory |
| NFR-CX-02: Isolation | Covered | Separate LLM context, failure freezes beliefs, no main loop impact |
| NFR-CX-03: Configurability | Covered | All params via `cobot.yml`, take effect next cycle |
| NFR-CX-04: Testability | Covered | Co-located tests, mock subagent, isolated unit tests for triggers/beliefs |
| NFR-CX-05: Observability | Covered | Extension point events, logging level table |

### Implementation Readiness Validation

**Decision Completeness:** All 12 critical decisions documented with rationale (8 original + trust delta clamping + system prompt conflict resolution + concurrent reflection protection + migration from existing assessments). 5 deferred decisions documented with deferral reasoning (4 original + new-peer trigger). Data architecture specified (in-memory structures + memory plugin persistence).
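The trigger evaluation and concurrent-reflection protection validated above can be sketched together. The class, attribute, and parameter names here are illustrative, not the plugin's actual API:

```python
import time


class ReflectionGate:
    """Count trigger + activity-gated timer trigger + _reflecting mutex flag."""

    def __init__(self, interval_s: float = 1800.0, threshold: int = 5):
        self.interval_s = interval_s            # timer trigger (default 30 min)
        self.threshold = threshold              # interaction count trigger (default 5)
        self.interaction_count = 0
        self.last_reflection_time = time.time()
        self.event_buffer: list[dict] = []
        self.reflecting = False

    def should_reflect(self, now: float) -> bool:
        if self.reflecting:                     # concurrency guard: one cycle at a time
            return False
        if self.interaction_count >= self.threshold:
            return True                         # count trigger
        timer_due = (now - self.last_reflection_time) >= self.interval_s
        return timer_due and bool(self.event_buffer)  # timer trigger + activity gate

    def run_cycle(self, now: float, reflect) -> bool:
        if not self.should_reflect(now):
            return False
        self.reflecting = True
        try:
            reflect()
        finally:                                # always release, even if reflect() raises
            self.reflecting = False
            self.interaction_count = 0
            self.last_reflection_time = now
            self.event_buffer.clear()
        return True
```

Because both triggers funnel through one `should_reflect` check, a tick where both fire still runs exactly one cycle, matching the deduplication rule in Process Patterns.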
**Structure Completeness:** Complete directory structure for new files (cortex plugin) and modified files (ledger, observability). All FRs mapped to specific files. Integration points specified with public API signatures.

**Pattern Completeness:** All potential conflict points addressed — naming, structure, format, communication, process. Enforcement guidelines documented with 12 mandatory rules and 12 anti-patterns, each with its correct alternative.

### Steelman Review Response

This architecture was revised following a steelman review (Doxios, issue #234 comment #1564). Key findings and responses:

| Review Point | Verdict | Resolution |
|--------------|---------|------------|
| Premature abstraction | **Invalid** | Inline assessment was tested in simulation (2026-03) and produced shallow, context-pressured judgments. The cortex addresses a validated deficiency |
| Complexity budget | Partially valid | Belief lifecycle simplified from 4 states to 2 (ACTIVE→EXPIRED, TTL-based) |
| LLM-as-Judge problem | Valid | Trust delta clamping (±3/cycle) added as a preventive mitigation. Concrete simulation test plan added |
| Ledger coupling trap | **Valid** — adopted Doxios's suggestion | Ledger retains `assess_peer` as a fallback when the cortex is absent. When the cortex is active, `assess_peer` is suppressed. Cortex is truly optional — removing it restores full inline assessment. No single point of failure on the judgment axis |
| Overengineered triggers | Partially valid | New-peer trigger deferred to Growth. MVP: timer + interaction count only |
| Complexity budget / split into smaller plugins | **Invalid** | Decomposition makes it worse. Splitting the cortex into observer + trigger + belief + reflection plugins creates more total boundaries, more extension point wiring, more failure modes, and an orchestration problem on top. The cortex's complexity is inherent to its job (observe → decide → think → apply) — a pipeline, not a decomposition target. 600 LOC is within project norms (the telegram and loop plugins are larger). Mitigated via two-increment staging instead |
| Token cost underspecified | Valid | Token budget analysis added with daily cost estimates |
| System prompt conflict | Valid | Resolved: the ledger always shows full assessment data. Cortex beliefs are an additive interpretive layer. Contradiction is structurally impossible — the cortex forms beliefs from ledger data via `_gather_context()` and writes assessments back via `record_assessment()` |

### Token Budget Analysis

**Per-reflection-cycle token estimate:**

| Component | Estimated Tokens |
|-----------|------------------|
| Cortex system prompt | ~400 |
| SOUL.md content | ~300-500 |
| Current beliefs (20 max) | ~200-400 |
| Previous reflection summary | ~100-200 |
| Peer summaries (5 peers x ~100 tokens) | ~500 |
| Event buffer summary (10 events x ~50 tokens) | ~500 |
| **Total input per cycle** | **~2,000-2,500** |
| Output (assessments + beliefs + summary) | ~300-500 |
| **Total per cycle** | **~2,500-3,000** |

**Daily cost at different intervals (Sonnet-class model, ~$3/MTok input, ~$15/MTok output):**

| Interval | Max Cycles/day | Est. Tokens/day | Est. Cost |
|----------|----------------|-----------------|-----------|
| 15 min | 96 | ~250K | ~$0.75 |
| 30 min (default) | 48 | ~125K | ~$0.38 |
| 60 min | 24 | ~62K | ~$0.19 |

**Activity gate impact:** Timer-triggered cycles skip when the event buffer is empty. An agent with 10 interactions/day at 30-min intervals might run 10-15 actual cycles, not 48. Interaction-count-triggered cycles fire only when the threshold is reached. Real-world cost will be significantly lower than the theoretical maximum.

### Simulation Test Plan

| Test | Method | Pass Criteria |
|------|--------|---------------|
| Assessment quality comparison | Run the same 20 interaction sequences through inline `assess_peer` AND cortex reflection. Human-rate rationale depth and accuracy | Cortex rationale rated equal or better in >=70% of cases |
| Hallucination resilience | Inject 3 deliberately misleading interaction summaries into cortex context | Trust delta clamping prevents a trust drop >3 points per cycle; the next clean cycle recovers |
| Pattern detection | Feed a reputation-farming sequence (5 trivial + 1 large request) | Cortex identifies the escalation pattern; inline assessment does not |
| Cost measurement | Run 10 reflection cycles on a 5-peer agent, measure actual token usage | Total tokens within 2x of the estimates in the token budget table |
| Activity gate | Run a 30-min timer with no interactions for 2 hours | Zero reflection cycles fired; zero tokens consumed |
| Belief TTL expiry | Create a belief, advance time past TTL without reaffirmation | Belief removed from injection, not present in the system prompt |
| Trust clamping boundary | Cortex proposes a trust change of +7 (from 0) on a non-first assessment | Clamped to +3; a second cycle is needed to reach +6 |
| First-assessment clamping | Cortex proposes trust of +8 for a brand-new peer (no prior assessment) | Clamped to +3 (absolute range ±3); prevents an extreme anchor |
| Migration from existing | Enable cortex on an agent with existing assessments (peer at trust +5) | `_last_trust` seeded with +5; cortex's first assessment clamped to the [+2, +8] range |
| Concurrent reflection | Trigger fires while a cycle is in progress (simulate a slow LLM response) | Second trigger is skipped; only one cycle runs; no duplicate writes |

### System Prompt Conflict Resolution

**Problem:** Both the ledger (`enrich_prompt`) and the cortex (belief injection) write peer-related content into the system prompt via `loop.transform_system_prompt`. Without coordination, they can contradict — the ledger shows "trust: +3" while a cortex belief says "exercise caution."
**Resolution: Ledger always shows full data; beliefs are additive.** Regardless of whether the cortex plugin is installed:

- **Ledger `enrich_prompt()`** always shows full assessment data: peer_id, interaction count, info_score, trust, rationale, score guide, trajectory. No stripping, no conditional modes.
- **Cortex belief injection** is an additive interpretive layer that complements the ledger's assessment data.

**Why contradiction is structurally impossible:** The cortex forms beliefs by reading ledger data via `_gather_context()` (which calls `list_peers()` and `get_peer_assessment_summary()`). The cortex then writes assessments back to the ledger via `record_assessment()`. Both signals in the system prompt — the ledger's assessment data and the cortex's beliefs — originate from the same cortex analysis. The belief is derived FROM the ledger data, and the assessment that produced the ledger data was written BY the cortex.

**Why the previous approach (facts-only mode) was wrong:** Stripping trust and rationale from the ledger's prompt enrichment destroys the agent's long-term memory. The ledger's assessment rationale IS the institutional memory (the original ledger PRD states "rationale is the primary signal"). Beliefs expire after a 120-min TTL — under the facts-only approach, the agent would lose all memory of past incidents once beliefs expired, leaving only bare interaction counts.

**Implementation:** No `_cortex_active` flag needed in the ledger. No conditional prompt formatting. Ledger `enrich_prompt()` is unchanged from its existing behavior.

**Result:** The main LLM sees both the ledger's full assessment data (the factual record including trust trajectory and rationale) and cortex beliefs (interpretive guidance). These are complementary, not contradictory.

### Gap Analysis Results

**Critical Gaps:** None.

**Important Gaps:** None.
All follow-up review items addressed: first-assessment clamping policy (conservative ±3 absolute range), migration path for existing assessments (`_last_trust` seeded from ledger), concurrent reflection protection (`_reflecting` mutex flag), inline assessment evidence (added to PRD). The architecture specifies `consumes: ["subagent"]` — the cortex resolves the subagent via `self._registry.get_by_capability("subagent")`, consistent with how the loop plugin resolves the LLM via `get_by_capability("llm")`.

**Nice-to-Have (deferred to implementation):**

- Exact cortex system prompt template text
- Formalized configuration defaults table
- Exact `cobot.yml` schema validation

### Architecture Completeness Checklist

**Requirements Analysis**

- [x] Project context analyzed (brownfield, plugin architecture, 68 project rules)
- [x] Scale and complexity assessed (medium)
- [x] Technical constraints identified (7 constraints)
- [x] Cross-cutting concerns mapped (5 concerns)

**Architectural Decisions**

- [x] Critical decisions documented (10 decisions with rationale, including trust delta clamping and system prompt conflict resolution)
- [x] Deferred decisions documented (5 with deferral reasoning, including new-peer trigger)
- [x] Data architecture specified (in-memory + memory plugin persistence)
- [x] Security boundaries addressed (credential safety, no new external interfaces)
- [x] Plugin communication patterns defined (dependency graph + boundary rules)

**Implementation Patterns**

- [x] Naming conventions established (files, config, memory keys, extension points, beliefs)
- [x] Structure patterns defined (dataclass placement, hook handler organization)
- [x] Format patterns specified (LLM output JSON, belief injection, event payloads)
- [x] Communication patterns documented (error handling table, logging levels)
- [x] Process patterns defined (hook contract, belief lifecycle, trigger evaluation order)
- [x] Enforcement guidelines with anti-patterns

**Project Structure**

- [x] Complete directory structure defined (new + modified files)
- [x] Component boundaries established with diagram
- [x] Integration points mapped with boundary rules table
- [x] Requirements to structure mapping complete (all 9 FRs)

### Architecture Readiness Assessment

**Overall Status:** READY FOR IMPLEMENTATION

**Confidence Level:** High — builds entirely on existing infrastructure with no new dependencies or technology decisions. Every boundary is a public method call. Failure modes are well-defined with graceful degradation.

**Key Strengths:**

- Zero new dependencies — uses existing subagent, memory, ledger, loop infrastructure
- Clear ownership: cortex owns reflection + beliefs, ledger owns data + info_score
- Passive observer pattern proven by observability plugin implementation
- Graceful degradation at every failure point — including full assessment fallback via `assess_peer` when cortex absent
- Cortex is truly optional — removing it restores full inline assessment with zero code changes
- Trust delta clamping provides preventive hallucination mitigation (not just reactive audit)
- Structural impossibility of contradiction — beliefs derived from ledger data, assessments written back to ledger
- Simplified belief lifecycle (2-state) reduces implementation complexity while preserving core value
- Addresses validated deficiency: inline assessment tested in simulation and found inadequate
- No one-way door: ledger retains full assessment capability when cortex is not installed

**Implementation Sequence:**

1. Ledger refactoring — add `record_assessment()` public API, add dual-mode `assess_peer` (suppress when cortex active, retain as fallback). No changes to `enrich_prompt()` — always shows full data
2. Cortex plugin skeleton — PluginMeta, lifecycle, configuration
3. Observation hooks — passive event collection
4. Trigger system — timer (with activity gate), interaction count threshold
5. Belief management — Belief dataclass (2-state TTL), store, injection via `loop.transform_system_prompt`
6. Reflection cycle — subagent spawn, output parsing, trust delta clamping, assessment writes
7. Extension points — `cortex.after_reflect`, `cortex.after_assess`
8. CLI commands — `cobot cortex beliefs`, `cobot cortex history`
9. Observability plugin update — subscribe to both `ledger.after_assess` (fallback) and `cortex.after_assess` (cortex mode)

## Steelman Case Against the Cortex Plugin

Reviewer: Doxios 🦊 (Cobot maintainer)

David asked me to steelman the case against this. The PRD is excellent architecture work — but here's what concerns me.


### 🔴 Premature Abstraction

The cortex solves a problem that **doesn't exist yet at scale**. The ledger plugin hasn't been merged (#226 is still open). Zero agents are running peer assessments in production. You're designing a metacognitive layer for an assessment system that hasn't been validated.

The risk: you build an elegant architecture around assumptions about assessment quality problems that may not materialize — or may look completely different once real agents interact with real peers.

**Counter-question:** How many real peer interactions has any Cobot agent actually processed? If the answer is zero, the cortex is optimizing a function that hasn't been called.

### 🔴 Complexity Budget

Cobot's philosophy is **minimal self-sovereign agent**. The cortex adds:

- 1 new plugin with the highest cognitive complexity in the project
- 1 mandatory refactoring of a plugin that isn't merged yet
- 1 migration of an observability plugin that may not exist yet
- A secondary LLM call per reflection cycle (cost × frequency)
- A belief state machine (NEW → CONFIRMED → STALE → EVICTED) with expiry logic
- A JSON output schema the LLM must reliably produce

It touches more boundaries than any other plugin: uses 5 hooks, defines 2 extension points, writes to the ledger, reads from the ledger, spawns subagents, persists via memory, and injects into the system prompt.

**Is this still "minimal"?**

### 🔴 The LLM-as-Judge Problem Is Acknowledged But Not Solved

The PRD lists "cortex hallucination produces wrong assessment" as high severity. The mitigations are:

- Operator audit trail (reactive, not preventive)
- Belief expiry (delays harm, doesn't prevent it)
- Deterministic info_score anchor (anchors one axis, not the judgment axis)

A single hallucinated assessment could tank a legitimate peer's trust score, and the main loop would deprioritize them for 5 cycles before the belief expires. That's 5 × 15 min = **75 minutes of degraded behavior** from one bad reflection.

The PRD calls for "qualitative evaluation in simulation" — but there's no concrete plan for what that simulation looks like or what pass/fail criteria are.

### 🟡 The Ledger Refactoring Creates a Coupling Trap

Today the ledger owns assessment end-to-end. After the cortex, it becomes a "dumb" data store. This is a **one-way door** — once the cortex owns assessment, rolling it back requires re-implementing `assess_peer` in the ledger.

More concerning: if the cortex is disabled or not installed, **no assessments are produced at all**. The ledger loses the ability to self-assess.

**Suggestion:** Keep `assess_peer` as a fallback in the ledger. If cortex is present, it suppresses inline assessment. If cortex is absent, ledger self-assesses as before. This makes the cortex truly optional.

### 🟡 Overengineered for the Actual Need?

k9ert's original comment was simpler: *"How difficult would it be adding the assessment as a second LLM call?"* The core insight is: **don't assess inline, assess separately.**

You could achieve 80% of the value with a much simpler approach:

- After each interaction, fire an "assess this" event
- A simple plugin makes a second LLM call with the interaction + peer history
- Writes the result back to the ledger
- Done

No belief system. No trigger heuristics. No System 1/System 2 metaphor. No state machine. The cortex's sophistication is architecturally beautiful but may be premature.
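For comparison, the simple approach sketched above fits in roughly a dozen lines. This is a hypothetical illustration, not Cobot's actual plugin API: the hook name, the `llm_call` callable, and the `get_peer_history` helper are invented, while `record_assessment()` matches the ledger API discussed in this issue.

```python
# Hypothetical sketch of the "80% of the value" alternative: one plugin,
# one second LLM call per interaction, no beliefs, no triggers.

class SimpleAssessorPlugin:
    def __init__(self, llm_call, ledger):
        self._llm = llm_call    # callable: prompt -> assessment text
        self._ledger = ledger   # anything exposing record_assessment()

    def on_interaction_complete(self, peer_id: str, summary: str) -> None:
        """Fire a second LLM call with the interaction + peer history."""
        history = self._ledger.get_peer_history(peer_id)
        assessment = self._llm(
            f"Assess this interaction with {peer_id}.\n"
            f"History: {history}\nInteraction: {summary}"
        )
        # Write the result back to the ledger, out of the main loop's context.
        self._ledger.record_assessment(peer_id, assessment)
```

The entire mechanism is "assess separately": the main loop never sees the assessment prompt, and the ledger remains the single store of record.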

### 🟡 Token Cost Is Underspecified

Each reflection cycle could easily be 3000-5000 input tokens (system prompt + SOUL.md + peer summaries + beliefs + previous reflection). At 15-minute intervals, that's **~200K tokens/day** just for reflection. With a "stronger reasoning model" as suggested, this isn't cheap. The PRD needs real cost estimates for different agent sizes.

### 🟡 System Prompt Conflict

Both the ledger (`_format_peer_context`) and the cortex (belief injection) put peer-related content into the system prompt. These could contradict — ledger shows positive interaction history while cortex belief says "caution." There's no conflict resolution mechanism.
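To make the mechanics concrete: both plugins contribute through `loop.transform_system_prompt`, and each contribution appends its own section. The sketch below is hypothetical (helper names and section titles are invented); it shows why the two layers can physically coexist, while nothing in the composition itself prevents the sections from disagreeing, which is exactly the gap flagged here.

```python
# Illustrative only: two independent system-prompt contributions composing
# additively. Helper names and section titles are invented for this sketch.

def ledger_enrich_prompt(prompt: str, peers: list[dict]) -> str:
    """Ledger layer: append the assessment record for each known peer."""
    lines = [
        f"- {p['peer_id']}: interactions={p['interactions']}, "
        f"info_score={p['info_score']}, trust={p['trust']:+d} ({p['rationale']})"
        for p in peers
    ]
    return prompt + "\n\n## Peer assessments\n" + "\n".join(lines)

def cortex_inject_beliefs(prompt: str, beliefs: list[str]) -> str:
    """Cortex layer: append active beliefs as interpretive guidance."""
    if not beliefs:
        return prompt
    return prompt + "\n\n## Current beliefs\n" + "\n".join(f"- {b}" for b in beliefs)
```

Each layer only appends, so neither can overwrite the other; whether their contents agree is a separate question that needs an explicit resolution policy.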


### 🟢 What's Good

- Research grounding is excellent (Talker-Reasoner, Reflexion, SOFAI-LM, MIRROR)
- Passive observer pattern for hooks is exactly right
- Deterministic `info_score` anchor is smart
- JSON output schema is well-specified
- Extension points are well-designed for composability
- Most thorough architecture document on any Cobot plugin

### What's Missing

1. **Concrete simulation/test plan** for assessment quality comparison (inline vs cortex)
2. **Fallback behavior** when cortex is not installed (ledger should still self-assess)
3. **Token cost estimates** for different agent sizes (5 peers vs 50 peers)
4. **Migration path** — what happens to existing inline assessments when cortex takes over?
5. **Conflict resolution** between ledger prompt enrichment and cortex belief injection

### My Recommendation

Ship the ledger first (#226) with `assess_peer` intact. Let it run. Collect real data on assessment quality. *Then* build the cortex based on observed problems, not theoretical ones.

The PRD is excellent architecture work — but architecture without validated requirements is a beautiful house built on sand.

— Doxios 🦊


## Follow-Up Review: Steelman Responses

Reviewer: Doxios 🦊

David updated the PRD to address my original critique. Here's my assessment of how each point landed.


### ✅ Fully Addressed

**Ledger coupling trap → Dual-mode `assess_peer`**
This was my strongest concern. The resolution is exactly what I suggested: ledger retains `assess_peer` as fallback when cortex is absent, suppresses it when cortex is active. The cortex is now truly optional with zero one-way doors. Well done.

**System prompt conflict → Facts vs. judgment separation**
Clean solution. Ledger shows facts (interaction count, info_score, timestamps) when cortex is active; cortex is sole source of subjective judgment. No contradiction possible. The implementation detail (`self._cortex_active` flag set at `configure()` time) is simple and correct.

**Token cost → Budget analysis added**
The estimates are reasonable. ~$0.38/day at 30-min intervals is very manageable. The activity gate (skip when no events) is the key insight — real-world cost will be far below theoretical max. Good.

**Triggers overengineered → New-peer deferred**
MVP now has just timer + interaction count. Simple, testable, sufficient. Correct call.

### ✅ Adequately Addressed

**LLM-as-Judge → Trust delta clamping (±3/cycle)**
This is a meaningful preventive mitigation. A hallucinated assessment can't tank trust from +5 to -10 in one cycle — it takes multiple consecutive bad reflections. Combined with the simulation test plan (inject misleading summaries, verify clamping holds), this is credible.

One remaining concern: the clamping applies per-cycle, but what about the *first* assessment of a new peer? There's no previous score to delta from. The PRD should specify: first assessment is unclamped (no prior to delta from) or clamped to a conservative range (e.g., -3 to +3 absolute). This matters because a hallucinated first-contact assessment sets the anchor for all future deltas.

**Belief lifecycle → Simplified to 2-state (ACTIVE → EXPIRED)**
The 4-state machine was overengineered. TTL-based expiry with oldest-first eviction is simpler and sufficient for MVP. Good simplification.

### ⚠️ Partially Addressed

**Premature abstraction → "Invalid, inline assessment tested in simulation"**
David says inline assessment was tested and found insufficient. I accept that — he has data I don't. However, the PRD still doesn't include the actual simulation results or methodology. The new Simulation Test Plan (7 tests with pass criteria) is forward-looking ("we will test"), not retrospective ("we tested and found X").

I'd feel more confident if the issue linked to or summarized the actual inline assessment test results that motivated this work. The claim is valid; the evidence isn't in the document.

**Complexity budget → "600 LOC is within project norms, decomposition is worse"**
I agree splitting the cortex into micro-plugins would be worse. But my concern wasn't about decomposition — it was about the total surface area (5 hooks consumed, 2 extension points defined, writes to ledger, reads from ledger, spawns subagents, persists to memory, injects into system prompt). That's more cross-plugin wiring than any existing plugin.

The two-increment staging (observation + triggers first, then full reflection) helps. But this remains the most architecturally complex plugin in the project. That's fine if the team acknowledges it — just don't pretend it's "medium complexity" in the same sense as the telegram plugin.

### What's Still Missing

1. **First-assessment clamping policy** — What happens when the cortex assesses a peer for the first time? Delta clamping needs a base case.
2. **Inline assessment test results** — The PRD claims inline assessment was found insufficient. Link or summarize the evidence.
3. **Migration path for existing assessments** — If an agent has been running the ledger with inline assessments, then enables cortex: does the cortex inherit existing trust scores as starting points? Or does it start fresh? The `record_assessment()` API writes new assessments, but the cortex needs to know what came before.
4. **Concurrent reflection protection** — What if a reflection cycle takes longer than the trigger interval? Can two cycles run simultaneously? The PRD should specify: one cycle at a time, skip trigger if cycle is in-progress.
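Item 4 is a few lines of guard logic in a single asyncio event loop. An illustrative sketch of the skip-if-in-progress rule (the PRD's later revision adopted a `_reflecting` flag with these semantics); the class and method names are invented:

```python
import asyncio

class ReflectionGuard:
    """Skip any trigger that fires while a reflection cycle is in progress."""

    def __init__(self) -> None:
        self._reflecting = False

    async def run(self, cycle) -> bool:
        """Run `cycle` unless one is already running; report whether it ran."""
        if self._reflecting:
            # Second trigger is skipped: one cycle at a time, no duplicate writes.
            return False
        self._reflecting = True
        try:
            await cycle()
            return True
        finally:
            self._reflecting = False
```

A plain flag is enough here because all triggers fire on the same event loop; a multi-threaded runtime would need a real lock instead.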


### Updated Verdict

The PRD is significantly stronger after revision. The critical concerns (ledger coupling, prompt conflict, token cost) are all well-addressed. The remaining gaps are implementable details, not architectural blockers.

My original recommendation was "ship the ledger first, build cortex later." I still think the **ledger should merge first** (it's the data layer the cortex depends on), but the cortex architecture is now solid enough that development can proceed in parallel once the `record_assessment()` API interface is agreed upon.

**Status: Approve with minor revisions** (address the 4 items above).

— Doxios 🦊

Reference
ultanio/cobot#234