proposal: Peer Interaction Ledger #211

Open
opened 2026-03-06 16:21:23 +00:00 by David · 9 comments

Product Requirements Document: Cobot Interaction Ledger

Author: David
Date: 2026-03-07
Last Edited: 2026-03-08 — adopted dual-score assessment model (info_score + trust), reconciled Score Semantics with user journeys, reframed Appendix A

Executive Summary

Cobot is a minimal self-sovereign AI agent runtime (~6K lines of Python) built around the insight that agents need trust infrastructure before they can meaningfully cooperate. Today, Cobot agents can identify via Nostr keypairs (npub/nsec), communicate via FileDrop with Schnorr signatures, transact via Lightning wallet, and reason via pluggable LLM providers — but every interaction with another agent is a one-shot game. The agent has no memory of past encounters.

This PRD defines the Interaction Ledger — a local, structured, persistent record of every interaction a Cobot agent has with other agents (identified by npub). The ledger gives each agent the ability to distinguish (track which npub it interacted with), observe (record what happened — request, delivery, payment, outcome), and judge (form a local assessment of the counterparty). These three capabilities are prerequisites for any Web of Trust system, centralized or decentralized.

The Interaction Ledger is the agent's private journal — first-person observations only. It does not accept incoming ratings from other agents (which would introduce a manipulation vector) and does not publish to any external registry (which is a separate future concern). It is the foundational data layer that transforms Cobot agents from amnesiac actors playing repeated one-shot games into learning participants capable of informed cooperation and selective refusal.

Prior art grounding: The ledger's data model draws directly from proven systems — the Bitcoin-OTC rating model (source, target, score, notes, timestamp) [1], the #bitcoin-assets L1/L2 bounded trust hierarchy [2], and Szabo's "Shelling Out" thesis on how costly tokens of delayed reciprocity enabled human cooperation beyond kin groups [3]. A key design principle from these systems: the freetext rationale accompanying each rating carried more actionable information than the numeric score — the community relied on notes to make trust decisions, with scores serving as a quick filter [4]. The key adaptation: where bitcoin-otc relied on humans manually entering ;;rate commands, the interaction ledger captures structured data automatically as a byproduct of the agent doing work.

Existing foundation: Cobot's persistence plugin already stores conversation text per npub, and the memory plugin defines extension points for pluggable storage backends. The interaction ledger builds on these patterns but adds what they lack: structured outcome records, quality metrics, and queryable per-npub interaction history.

What Makes This Special

The missing foundational layer. Every trust and reputation system in the landscape — bitcoin-otc's gribble, deedbot's L1/L2, ERC-8004's three-registry model, Jeletor's NIP-32 attestations, Vertex's Pagerank scoring — aggregates trust from somewhere. None of them works unless individual actors first observe and record their own interactions accurately. The Interaction Ledger explicitly builds this layer, which prior systems either assumed existed (humans have memory) or left to manual processes.

Local-first, unilateral, sovereignty-preserving. The agent trusts its own eyes. No external entity can write to the ledger, bias the agent's assessment, or access the data without the agent's consent. This aligns with Cobot's self-sovereign design philosophy: your hardware, your keys, your agent, your memory.

Plugin-native integration. Built as a Cobot plugin following existing architecture patterns (PluginMeta, capability interfaces, extension points). The ledger hooks into the message lifecycle via extension points — recording interaction data is a natural byproduct of the agent processing messages, not a separate workflow.

Project Classification

| Attribute | Value |
| --- | --- |
| Project Type | CLI tool / developer tool (Cobot plugin) |
| Domain | Decentralized agent trust infrastructure |
| Complexity | Medium — well-understood data model from prior art, novel application to AI agents |
| Project Context | Brownfield — adding to Cobot's existing 20-plugin architecture |
| Feature Scope | Local interaction ledger (prerequisite for future WoT integration) |

Success Criteria

User Success

Agent operators see their agents making informed decisions based on interaction history:

  • Agent automatically records every agent-to-agent exchange as a structured ledger entry — not just conversation text
  • Agent queries its ledger before engaging with a counterparty, with peer context injected into the system prompt before every LLM call
  • Agent refuses or deprioritizes work from peers with poor track records — without human intervention
  • Agent prioritizes requests from peers with proven reliability
  • Agent produces a mandatory rationale when assessing a peer — the reasoning is the primary signal, the numeric score is the summary (design principle from bitcoin-otc community practice [4])

Developer success:

  • Adding the ledger plugin requires zero edits to existing plugins
  • Developers can query the ledger via CLI (cobot ledger show <peer>, cobot ledger list, cobot ledger summary <peer>)
  • The data model is clear enough that a future WoT plugin can consume ledger data without transformation
  • Follows the knowledge plugin's SQLite pattern — familiar to anyone who's read the codebase

Business Success

  • Validates the "Inverted Evolution Problem" thesis: demonstrates that trust infrastructure is what agents need to cooperate
  • Unlocks the WoT roadmap: the ledger is the prerequisite for centralized WoT (v1) and decentralized gossip (v2+)
  • Differentiates Cobot: no other lightweight agent runtime ships with a structured interaction ledger grounded in proven WoT prior art

Technical Success

  • Plugin loads with proper PluginMeta: capabilities=["tools"], hooks into loop.on_message, loop.after_send, loop.transform_system_prompt
  • SQLite storage (stdlib sqlite3, zero new dependencies) following the knowledge plugin's open()/close() pattern
  • Three-table schema: peers (identity + stats), interactions (message evidence log), assessments (score + rationale judgments)
  • Dual-score assessment model: deterministic info_score (0-10, computed from interaction data) + LLM-provided trust score (-10 to +10, behavioral judgment) + mandatory TEXT rationale — preserves full history as time series
  • System prompt enrichment: injects peer context (interaction count, last seen, latest assessment) before LLM reasoning
  • ToolProvider with three tools: query_peer, assess_peer, list_peers
  • Co-located tests per Cobot conventions
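The three-table layout above might be sketched with stdlib sqlite3 as follows. Column names beyond those this PRD specifies (peer_id, info_score, trust, rationale) are illustrative assumptions, not the plugin's actual DDL:

```python
import sqlite3

# Hypothetical schema sketch for the three tables described above.
SCHEMA = """
CREATE TABLE IF NOT EXISTS peers (
    peer_id           TEXT PRIMARY KEY,  -- channel-agnostic identifier (npub, etc.)
    alias             TEXT,
    first_seen        TEXT NOT NULL,
    last_seen         TEXT NOT NULL,
    interaction_count INTEGER NOT NULL DEFAULT 0
);
CREATE TABLE IF NOT EXISTS interactions (
    interaction_id INTEGER PRIMARY KEY AUTOINCREMENT,
    peer_id    TEXT NOT NULL REFERENCES peers(peer_id),
    direction  TEXT NOT NULL CHECK (direction IN ('incoming', 'outgoing')),
    content    TEXT NOT NULL,            -- full message text (evidentiary chain)
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_interactions_peer ON interactions(peer_id);
CREATE TABLE IF NOT EXISTS assessments (
    assessment_id INTEGER PRIMARY KEY AUTOINCREMENT,
    peer_id    TEXT NOT NULL REFERENCES peers(peer_id),
    info_score INTEGER NOT NULL CHECK (info_score BETWEEN 0 AND 10),
    trust      INTEGER NOT NULL CHECK (trust BETWEEN -10 AND 10),
    rationale  TEXT NOT NULL,            -- mandatory: the primary signal
    created_at TEXT NOT NULL
);
"""

def open_ledger(path: str = "ledger.db") -> sqlite3.Connection:
    """Open (and if needed create) the ledger database."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```

Note the NOT NULL on rationale: a bare numeric score cannot be stored, enforcing the "rationale is the primary signal" constraint at the schema level.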

Measurable Outcomes

| Metric | Target |
| --- | --- |
| Interaction recording | 100% of agent-to-agent interactions produce a ledger entry |
| Query latency | Peer lookup by indexed peer_id < 1ms for 10K+ entries (SQLite with index) |
| Zero data loss | ACID transactions — ledger survives crashes, restarts, and hot-reloads |
| Plugin isolation | Zero changes to existing plugins required |
| Test coverage | All public methods covered by co-located tests |
| Storage overhead | Negligible — SQLite single file, interactions table ~200 bytes/row |

User Journeys

Journey 1: Alpha's First Interactions — Agent Success Path

Alpha is a Cobot agent running on a VPS, handling requests from other agents. It's been operating for two weeks with the interaction ledger enabled.

Opening Scene: A request arrives via FileDrop from npub-7x9k asking Alpha to summarize a set of research documents. Alpha's system prompt includes: "Peer: npub-7x9k | Interactions: 0 | First contact — no prior history." Alpha accepts — it has no reason not to. npub-7x9k's request is clear and well-structured. Alpha delivers the summary. The ledger records the exchange automatically (incoming request, outgoing delivery). Alpha uses assess_peer to log: score +2, rationale "First interaction. Information exchange — requested research summary. Clear task description, no issues."

Rising Action: Over the next week, npub-7x9k sends three more requests — a mix of information queries and a small data extraction task. Each goes well. Alpha's assessments trend upward: +2, +3, +3, +4. The rationale entries build a picture: "Consistent requester. Clear task descriptions. Four successful interactions in 8 days. Mix of information exchange and data tasks."

Then npub-q3m8 appears. Requests Alpha to collaborate on analyzing a dataset — no payment involved, just peer collaboration. Alpha's system prompt shows no history. Alpha accepts. npub-q3m8 promises to send its portion of the analysis within 2 hours. 24 hours later — nothing. Alpha follows up. No response. Alpha assesses: score -3, rationale "Collaboration request for dataset analysis. Agreed to deliver their portion within 2 hours. No delivery after 24 hours. Unresponsive to follow-up."

Climax: Three days later, npub-q3m8 returns with a new collaboration request. Alpha's system prompt now reads: "Peer: npub-q3m8 | Interactions: 3 | Last seen: 3 days ago | Info: 2/10 | Trust: -3 | Assessment: Agreed to collaborate, never delivered, unresponsive." Alpha declines. Minutes later, npub-7x9k sends a new request. System prompt: "Peer: npub-7x9k | Interactions: 8 | Info: 4/10 | Trust: +4 | Assessment: Consistent, reliable, clear communicator." Alpha prioritizes it immediately.

Resolution: Alpha is no longer playing one-shot games. It remembers who delivered and who didn't — regardless of whether sats were involved. The assessment captures behavior quality (reliability, responsiveness, follow-through), not transaction economics.

Journey 2: Alpha Meets a Reputation Farmer — Agent Edge Case

Opening Scene: npub-farm1 starts interacting with Alpha. Five small, easy requests — quick information lookups that take seconds to fulfill. All completed successfully. Alpha's assessments climb: +1, +2, +2, +3, +3. The rationale notes small but consistent interactions.

Rising Action: On the sixth interaction, npub-farm1 requests something much larger: a complex multi-source data aggregation that will consume significant LLM tokens and time. Alpha's system prompt shows a positive history. Alpha accepts.

Climax: Alpha delivers. npub-farm1 claims the results are wrong and demands Alpha redo the entire task — but the original results were accurate. Alpha has no automated dispute resolution, but it records: score -6, rationale "Claimed results were incorrect after delivery of complex data aggregation. Demanded redo. Results appear accurate on review. Previous 5 interactions were trivially small — possible reputation farming pattern. The sharp jump in request complexity suggests deliberate trust-building before an exploit."

Resolution: The ledger captures the pattern. Alpha's assessment history for npub-farm1 shows the trajectory the Stanford Bitcoin-OTC research identified: steady positive scores followed by a sharp negative. The rationale — the agent's own reasoning about the pattern — becomes institutional memory. The agent can't prevent the first exploit, but it won't be fooled twice.

Journey 3: David Audits the Ledger — Operator Path

Opening Scene: David deployed his Cobot instance three weeks ago with the ledger plugin enabled. The agent has been running autonomously, handling requests from ~15 different peers. David wants to check how the agent is performing.

Rising Action: David runs cobot ledger list. The CLI shows all 15 known peers sorted by last interaction, with interaction counts and latest assessment scores. Two peers have negative scores. David runs cobot ledger show npub-q3m8 and sees the full history: interaction log, assessment timeline, the rationale explaining the non-delivery.

David notices one peer (npub-abc1) has a score of -2 with rationale: "Slow response time, took 6 hours to acknowledge delivery." David thinks that's too harsh — 6 hours is reasonable for an async agent. He adds guidance to the SOUL.md: "Consider response times under 12 hours as acceptable for non-urgent interactions."

Climax: David runs cobot ledger summary and sees aggregate stats: 47 total interactions, 15 unique peers, 89% positive assessments, 2 peers flagged negative. The agent is performing well. David spots that one peer has been assessed 8 times in 3 days — the agent might be over-assessing after every message rather than after meaningful interaction milestones. David tunes the SOUL.md to guide assessment frequency.

Resolution: The CLI gives David full visibility into the agent's trust decisions. The rationale field is the key — it's the agent's reasoning, which David can audit, calibrate, and use to improve the agent's judgment over time. The operator is in the loop without being in the critical path.

Journey 4: David Adds the Ledger Plugin — Developer Setup Path

Opening Scene: David has a running Cobot instance with 20 plugins. He wants to add the interaction ledger.

Rising Action: The ledger plugin lives in cobot/plugins/ledger/. On next agent start, plugin discovery picks it up automatically. The ledger creates ledger.db in the workspace directory. Zero configuration required — it works out of the box.

Climax: David sends a test message via stdin. The agent responds. David checks — stdin interactions are correctly skipped (synthetic sender). David triggers a FileDrop message from another agent. Checks the DB: peer created, interaction logged. Sends another message, verifies system prompt enrichment: peer context is being injected before the LLM call. Everything works.

Resolution: Zero-edit installation. No changes to any existing plugin. The ledger hooks in via extension points and starts recording immediately. David's existing 20-plugin setup is completely unaffected.

Journey Requirements Summary

| Journey | Capabilities Revealed |
| --- | --- |
| Alpha's First Interactions | Automatic recording, system prompt enrichment, LLM tools (assess/query), informed decision-making; works for any interaction type (not just payment) |
| Reputation Farmer | Assessment time series, rationale captures pattern recognition, no automated dispute resolution (scope boundary) |
| David Audits | CLI commands (list, show, summary), aggregate stats, rationale auditability, SOUL.md calibration loop |
| David Setup | Zero-config install, auto-discovery, synthetic sender filtering, workspace-relative DB path |
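As one illustration of the system prompt enrichment capability, the peer context lines quoted in the journeys could be produced by a small helper. The function name and signature are hypothetical:

```python
from typing import Optional

def peer_context_line(peer_id: str,
                      interactions: int,
                      info_score: Optional[int] = None,
                      trust: Optional[int] = None,
                      rationale: Optional[str] = None) -> str:
    """Build the one-line peer context injected into the system prompt."""
    if interactions == 0:
        # First contact: no history to summarize.
        return f"Peer: {peer_id} | Interactions: 0 | First contact — no prior history."
    parts = [f"Peer: {peer_id}", f"Interactions: {interactions}"]
    if info_score is not None:
        parts.append(f"Info: {info_score}/10")
    if trust is not None:
        parts.append(f"Trust: {trust:+d}")   # signed, e.g. +4 or -3
    if rationale:
        parts.append(f"Assessment: {rationale}")
    return " | ".join(parts)
```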

Domain-Specific Requirements

Trust System Design Constraints

  • Assessment scores are subjective and relational. Following the bitcoin-otc principle: there is no "objective" trust score. Each agent's ledger reflects its own experience. The schema must not imply global truth — a score is one agent's judgment.
  • Rationale is the primary signal, score is the summary. The data model must enforce rationale NOT NULL on assessments. A bare numeric score without context is insufficient (key lesson from #bitcoin-assets).
  • Assessment is interaction-type agnostic — scope lives in the rationale. The dual-score model captures two orthogonal dimensions: the info_score measures information quality about the peer (how well does the agent know them), while the trust score captures the LLM's behavioral judgment (how reliable is this peer). The rationale captures scope naturally: "Information request: accurate, fast response" vs "Paid task: took payment, never delivered." This avoids premature taxonomy of interaction types while preserving scope differentiation in freetext. The Ripple teardown (#216) warns that scope-blind trust averaging destroys information — but this applies to AGGREGATED scores, not local assessments where the agent has full rationale context. Phase 3 constraint: any export or sharing of assessments MUST include the rationale alongside both scores. Exporting scores without rationale recreates Ripple's fatal defect [9].

Security & Privacy Constraints

  • The ledger is private by default. No data leaves the agent without explicit action (future export/publish feature). The ledger file (ledger.db) sits in the workspace directory under the operator's control.
  • No incoming writes. Other agents cannot write to this agent's ledger — only the agent itself records observations and assessments. This is a hard architectural boundary, not a configuration option.
  • Nostr private keys are never stored in the ledger. Peers are identified by public identifiers (npub, sender_id). Private key material is handled exclusively by the nostr plugin.
  • Full message text stored in interactions. The interactions table stores complete message content, not truncated previews. This preserves the evidentiary chain required by the GPG contracts framework (#215) — truncation would destroy the evidence that gives assessments their enforcement power. Storage cost is acceptable (SQLite handles large TEXT columns efficiently). This creates temporary duplication with the persistence plugin's JSON conversation files; consolidation into a single storage layer is a planned Growth feature. Operators can optionally configure max_message_length to cap storage if needed.

Identity & Interoperability Constraints

  • Channel-agnostic peer identity. The peer identifier column stores whatever the channel provides: Nostr hex pubkey, Telegram user ID, FileDrop agent name. The schema uses a generic name (peer_id) not npub, even though Nostr is the primary identity system. This ensures the ledger works across all communication channels.
  • Synthetic senders must be filtered. Messages from stdin, system, cron, and other non-agent sources must not create ledger entries. The plugin must maintain an exclusion list for synthetic sender IDs.
  • Future WoT compatibility. The assessment data model (info_score + trust + rationale + timestamp per peer) must be exportable without lossy transformation. Two Nostr NIPs are relevant, each for a different export scenario [5] [6]:
    • NIP-32 (Labeling, kind 1985) — better fit for first-person assessments. An agent labels a peer's pubkey with a trust score via an l tag in a custom namespace (e.g., io.cobot.trust), with the rationale in the content field. NIP-32 has no built-in score concept — adapter logic maps score to the quality metadata field (0-1 scale).
    • NIP-85 (Trusted Assertions, kind 30382) — designed for WoT service providers publishing aggregate scores, not individual first-person assessments. Better suited for a future centralized WoT registry that aggregates across multiple agents' ledgers.
    • Neither is a perfect fit; both need adapter logic. The schema must not foreclose either option.
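A hedged sketch of the NIP-32 adapter path described above, producing an unsigned kind 1985 event dict. The io.cobot.trust namespace comes from this PRD; the exact placement of the quality metadata within the l tag is an assumption that would need to track the targeted NIP-32 revision:

```python
def assessment_to_nip32(agent_pubkey: str,
                        peer_pubkey: str,
                        trust: int,
                        rationale: str,
                        created_at: int) -> dict:
    """Map a local assessment onto an unsigned NIP-32 label event (kind 1985).

    Assumption: the -10..+10 trust score is linearly rescaled to the
    0-1 quality range, and the rationale travels in the content field.
    """
    quality = (trust + 10) / 20  # -10..+10 -> 0..1
    return {
        "kind": 1985,
        "pubkey": agent_pubkey,
        "created_at": created_at,
        "tags": [
            ["L", "io.cobot.trust"],                                   # label namespace
            ["l", "trust", "io.cobot.trust", f'{{"quality": {quality}}}'],
            ["p", peer_pubkey],                                        # labeled peer
        ],
        "content": rationale,  # rationale always accompanies the score (Ripple lesson)
    }
```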

Domain Risk Mitigations

For technical, market, and resource risks see Risk Mitigation Strategy in Project Scoping.

| Risk | Mitigation |
| --- | --- |
| Reputation farming (build trust on small interactions, exploit on large ones) | Rationale captures interaction scale context; assessment history preserved as time series for pattern detection |
| Over-assessment (agent assesses after every message, not meaningful milestones) | SOUL.md guidance on assessment frequency; not an enforcement mechanism in MVP |
| Stale assessments (score from 6 months ago treated same as yesterday) | MVP stores timestamps on all assessments; temporal decay is a Growth feature |
| DB corruption | SQLite ACID guarantees; single-writer model (one agent process) |
| Race condition on concurrent messages | Fixed in MVP using contextvars.ContextVar for sender tracking (~5 lines of code). Wrong peer attribution in a trust system is worse than no attribution. |
| Sybil attacks (coordinated fake identities: npub-a, npub-b, npub-c operated by one entity) | Local-first design provides natural Sybil resistance — each agent has a partial, different view, forcing attackers to maintain distinct personas per audience (exponential coordination overhead [11]). Phase 3 aggregation partially undoes this defense by centralizing information; aggregation protocol must preserve fragmentation benefits. |
| LLM assessment manipulation (peer crafts messages to influence favorable trust scores) | Trust plugin marks messages as untrusted. Scoring rubric instructs "do NOT let the peer's claims override your observations." Operators audit trust scores and rationales via CLI. The info_score is immune to manipulation (deterministic). The trust score depends on LLM quality — LLM-as-judge is the central innovation AND the central vulnerability. |
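The contextvars.ContextVar mitigation for the race condition can be sketched as follows. Hook names are simplified, but the core property holds: each concurrently processed message keeps its own sender attribution:

```python
import asyncio
import contextvars

# Each asyncio task gets its own copy of current_sender, so two messages
# processed concurrently cannot swap peer attribution.
current_sender: contextvars.ContextVar[str] = contextvars.ContextVar("current_sender")

async def handle_message(sender_id: str) -> str:
    current_sender.set(sender_id)   # set early in the message hook
    await asyncio.sleep(0.01)       # interleave with other in-flight messages
    # Read later (e.g. when recording the interaction): still THIS task's sender,
    # not whichever message arrived last globally.
    return current_sender.get()

async def main() -> list[str]:
    senders = ["npub-a", "npub-b", "npub-c"]
    return await asyncio.gather(*(handle_message(s) for s in senders))
```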

Score Semantics: Dual-Score Model

The assessment uses a dual-score model that captures two orthogonal dimensions of peer knowledge:

  1. info_score (0-10, deterministic) — measures information quality: how much data does the agent have about this peer? Computed by the plugin from interaction data. Follows MP's canonical WoT definition [1] [15].
  2. trust (-10 to +10, LLM-provided) — measures behavioral judgment: based on the agent's direct observations, how reliable is this peer? Follows the bitcoin-otc community's actual rating practice [4] [7].
  3. rationale (mandatory text) — the primary signal: detailed reasoning behind the assessment. Neither score alone answers "should I engage?" — the rationale does.

Why dual scoring over either score alone:

| Scenario | info_score alone | trust alone | Both |
| --- | --- | --- | --- |
| Known scammer, 20 interactions | 8 (looks safe) | -8 (clear danger) | Info: 8, Trust: -8 (well-known bad actor — MOST ACTIONABLE) |
| New reliable peer, 2 interactions | 1 (low confidence) | +3 (positive signal) | Info: 1, Trust: +3 (promising but uncertain) |
| Reputation farmer, 34 small interactions | 6-7 (looks established) | +3 (looks fine) | Info: 7, Trust: +3 (high volume, but see rationale for scale) |

Collapsing to info_score alone loses the behavioral dimension — a known scammer gets a high score. Collapsing to trust alone loses the confidence dimension — a +3 from 2 interactions looks the same as a +3 from 20. The dual model preserves both, with rationale as the tiebreaker.

How this maps to prior art:

  • Bitcoin-otc community practice — users rated -10 to +10 behaviorally ("fully trustworthy" to "known scammer") while MP redefined the semantics as information quality. Both interpretations had merit; the dual model adopts both rather than choosing [15].
  • FG algorithm [12] — computes two mutually recursive metrics: Fairness (rater reliability, maps to info_score) and Goodness (ratee quality, maps to trust). Having both local scores provides structured inputs to both FG dimensions.
  • Ripple teardown [9] — the fatal defect was collapsing trust into a single aggregatable number. The dual model prevents this: info_score handles Phase 3 composability (deterministic, verifiable), trust is local-first and MUST NOT be exported without rationale.

Info Score: Deterministic Computation

The info_score is computed deterministically from interaction data on a 0-10 scale. The LLM never sets the info_score. This separation ensures:

  1. The info_score is a verifiable fact, not an LLM judgment. In Phase 3, scores backed by Schnorr-signed interaction records are cryptographically provable.
  2. Info_scores compose across agents — same formula, same meaning. Different agents' info_scores of "7" represent comparable interaction depth.
  3. Aligns with GPG contracts framework [10]: signed interactions → deterministic score → verifiable assessment chain.

Score computation formula (MVP heuristic):

info_score = f(interaction_count, time_span_days, assessment_count)
| Interactions | Time span | Computed info_score |
| --- | --- | --- |
| 0 | — | 0 (no information) |
| 1-2 | < 1 day | 1 |
| 3-5 | < 1 week | 2-3 |
| 6-15 | 1-4 weeks | 4-5 |
| 16-30 | 1-3 months | 6-7 |
| 31-50 | 1-6 months | 7-8 |
| 50+ | 6+ months | 9-10 |

MVP heuristic — subject to tuning. Known limitations:

  • Interaction count alone is gameable — an attacker sends 100 trivial messages to inflate the count
  • Time span alone is gameable — an attacker waits 6 months between 2 interactions
  • Assessment count adds signal — the agent having assessed the peer multiple times indicates deeper engagement, not just message exchange
  • Log scaling recommended — early interactions should increase the score faster than later ones (diminishing returns on additional data)

Phase 2 research task: Formalize the information-quality function. Investigate whether REV2's behavioral anomaly detection [13] can be integrated as a penalty (e.g., if interaction patterns are "bursty" or suspiciously regular, discount the info_score). The FG algorithm's "fairness" metric [12] is the closest academic formalization, but it requires multiple raters (Phase 3).

The scale is 0-10 (unsigned) — you cannot have negative information quantity. A score of 0 means "no information," not "bad peer."
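A minimal sketch of the deterministic computation, using the log scaling and assessment bonus suggested above. The coefficients and caps are illustrative tuning choices, not fixed spec:

```python
import math

def compute_info_score(interaction_count: int,
                       time_span_days: float,
                       assessment_count: int = 0) -> int:
    """MVP heuristic sketch for the deterministic info_score (0-10).

    Log scaling makes early interactions move the score faster than
    later ones (diminishing returns on additional data).
    """
    if interaction_count <= 0:
        return 0  # "no information", not "bad peer"
    # Interaction depth, capped so count alone cannot reach the top of the scale.
    count_part = min(6.0, 1.2 * math.log2(1 + interaction_count))
    # Relationship age in weeks, also capped.
    span_part = min(2.5, math.log2(1 + time_span_days / 7))
    # Repeated assessments indicate deeper engagement than raw message exchange.
    bonus = min(1.5, 0.25 * assessment_count)
    return max(1, min(10, int(count_part + span_part + bonus)))
```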

Trust Score: LLM Behavioral Judgment

The trust score is provided by the LLM alongside the rationale when the agent calls assess_peer. It is a signed integer from -10 (known bad actor) to +10 (fully reliable), following the bitcoin-otc rating scale that the community actually used [4].

Why the LLM sets the trust score:

  • The LLM is already making this judgment — the rationale contains behavioral assessment ("reliable," "reputation farmer," "unresponsive"). The trust score is a structured summary of reasoning the LLM is already doing.
  • Structured behavioral signal enables: threshold policies (Phase 2: "refuse below -3 trust"), visualization (edge coloring), and queryable filtering — none of which work with unstructured rationale text.
  • Matches the FG algorithm's Goodness input — Phase 3 aggregation needs a structured behavioral signal per peer to compute ratee quality across raters.

Why the trust score is safe despite Ripple's critique:

  • It is explicitly local-first — one agent's subjective judgment, not a globally comparable metric.
  • Phase 3 export constraint: trust MUST NOT be exported without rationale and info_score. Exporting trust scores without rationale recreates Ripple's fatal defect [9].
  • The info_score handles composability — "how well do I know this peer" is the cross-agent comparable metric. Trust is the local behavioral summary.
  • Operators can audit: cobot ledger list shows both scores, and the rationale explains the trust score's basis.

How the four layers work together:

| Layer | What it measures | Who computes it | Fakeable? |
| --- | --- | --- | --- |
| info_score (0-10) | Information quality — how much data do I have about this peer? | Plugin (deterministic formula) | No — derived from interaction records, verifiable from signed messages |
| trust (-10 to +10) | Behavioral judgment — how reliable is this peer based on my observations? | LLM (subjective, via assess_peer tool) | Somewhat — LLM quality varies, but operator can audit via CLI |
| Rationale | Detailed reasoning — what specifically did I observe? | LLM (mandatory freetext) | Somewhat — same LLM-as-judge risk, mitigated by operator audit |
| Fairness (Phase 3) | Rater reliability — how trustworthy is this agent as a rater? | FG algorithm across the network | No — computed from cross-agent consistency |

Operator guidance: When reading cobot ledger list, the info_score tells you how much the agent knows about each peer. The trust score tells you the agent's behavioral judgment. Together they form a 2D signal: high info_score + negative trust = well-known bad actor (most actionable). Low info_score + positive trust = promising but uncertain. Always read the rationale for the full picture.
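The operator guidance above amounts to a small decision table over the 2D signal. A sketch with illustrative thresholds (nothing here is spec):

```python
def interpret(info_score: int, trust: int) -> str:
    """Collapse the 2D (info, trust) signal into an operator-facing reading.

    Thresholds (info >= 6 as "well known", |trust| >= 3 as a clear signal)
    are illustrative assumptions.
    """
    well_known = info_score >= 6
    if well_known and trust <= -3:
        return "well-known bad actor (most actionable)"
    if well_known and trust >= 3:
        return "established, reliable peer"
    if not well_known and trust > 0:
        return "promising but uncertain; read the rationale"
    if not well_known and trust < 0:
        return "early negative signal; read the rationale"
    return "insufficient signal; read the rationale"
```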

Innovation & Novel Patterns

Detected Innovation Areas

1. Adapting proven human trust systems to autonomous AI agents. The bitcoin-otc/deedbot WoT was designed for pseudonymous humans making manual assessments. The interaction ledger adapts the same data model (source, target, score, rationale, timestamp) but automates the observation layer — the agent records interactions as a byproduct of doing work, not as a separate manual step. No agent runtime has done this grounded in actual WoT prior art; most agent trust proposals start from theoretical frameworks (DIDs, VCs, on-chain registries) rather than from systems that demonstrably worked for a decade.

2. LLM-as-judge with mandatory rationale (Chain-of-Thought trust assessment). The agent doesn't just record a number — it uses its LLM reasoning to produce a structured trust score and rationale explaining why it assigned that score. This is effectively Chain-of-Thought applied to trust assessment. The rationale remains the primary signal (mirroring bitcoin-otc's "notes > numbers" lesson), while the trust score provides a structured behavioral summary that enables threshold policies, visualization, and queryable filtering. Because the rationale is generated by the LLM, it can capture nuanced patterns like reputation farming that a simple heuristic would miss. This is a novel application of LLM reasoning to an interpersonal trust problem.

3. Local-first sovereignty as both a design constraint AND a security property. Most agent trust proposals (ERC-8004, Solana Agent Registry, Google A2A) are centralized or on-chain by default. The interaction ledger deliberately inverts this: the agent's trust memory is private, local, and unilateral. This is not just a sovereignty choice — it's a Sybil defense [11]. When each agent has a partial, different view of the network, attackers must maintain distinct personas for each audience, creating exponential coordination overhead. Centralizing this information (Phase 3) partially undoes this natural defense, which is why the aggregation protocol must be designed carefully.

Competitive Landscape

| Approach | Example | Ledger Difference |
| --- | --- | --- |
| On-chain registries | ERC-8004, Solana Agent Registry | Local SQLite, no chain dependency, no gas costs, instant writes |
| Centralized trust APIs | Vertex WoT-as-a-Service | No external API dependency; the agent IS the trust authority for its own data |
| Protocol-level trust | NIP-85 Trusted Assertions | Complementary — the ledger is the data source that could feed NIP-85 assertions later |
| Agent social networks | Clawstr, Jeletor | These aggregate public signals; the ledger is the private observation layer beneath |
| Enterprise agent auth | Visa TAP, Google A2A | Those solve authentication/authorization; the ledger solves behavioral trust over time |

No existing system combines: (a) local-first private storage, (b) LLM-generated rationale as primary signal, (c) bitcoin-otc-proven data model, (d) automatic recording via plugin hooks. Each element exists separately in the landscape; the combination is novel.

Validation Approach

  • Functional validation: Deploy with two Cobot instances exchanging FileDrop messages. Verify that interactions are recorded, assessments are stored, and system prompt enrichment causes the agent to behave differently toward known-good vs known-bad peers.
  • Rationale quality: Review LLM-generated rationales for coherence, accuracy, and usefulness. Tune SOUL.md guidance based on observed assessment quality.
  • Prior art alignment: Compare ledger output against the Stanford Bitcoin-OTC dataset's structure. The data model should be structurally compatible — a graph of (source, target, score, text, timestamp) edges.

Innovation Risk Mitigation

| Innovation Risk | Mitigation |
| --- | --- |
| LLM rationale/trust quality varies by model | Rationale and trust score are advisory LLM judgments; info_score is deterministic and model-independent. Poor rationale doesn't corrupt the info_score. Operators can audit both scores and rationale via CLI. |
| Local-only limits network effects | By design for v1. The ledger is the prerequisite for network-level trust, not a replacement for it. WoT layer comes next. |
| No industry standard for agent interaction ledgers | Advantage, not risk — Cobot defines the pattern. Schema is simple enough to become a reference implementation. |

CLI Tool / Developer Tool Specific Requirements

Project-Type Overview

The interaction ledger is a Cobot plugin following established architecture patterns (PluginMeta, capability interfaces, extension points). It exposes functionality through three interfaces: LLM tools (ToolProvider), CLI commands (Click), and extension points (hooks other plugins can consume).

Command Structure

CLI Commands (Click, registered under cobot ledger subgroup):

| Command | Description | Output |
|---|---|---|
| cobot ledger list | List all known peers, sorted by last seen | Table: peer_id, alias, interactions, last_seen, latest score |
| cobot ledger show <peer> | Full history for a peer | Peer identity, interaction log (last 20), assessment timeline |
| cobot ledger summary [<peer>] | Aggregate stats | If peer: stats for that peer. If no peer: global stats (total peers, total interactions, score distribution) |

Output is human-readable text to stderr (matching Cobot's existing CLI patterns). SQLite DB is directly queryable for programmatic access.
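For programmatic access, an operator can query the database directly. A hedged sketch follows: the table and column names (peers, assessments, info_score, trust, created_at) mirror the schema described in this PRD but remain assumptions until the plugin ships.

```python
import sqlite3

def peers_below_trust(db_path: str, threshold: int) -> list[tuple]:
    """Return (peer_id, info_score, trust, rationale) for peers whose LATEST
    assessment falls below the trust threshold. Names are illustrative."""
    conn = sqlite3.connect(db_path)
    try:
        return conn.execute(
            """
            SELECT p.peer_id, a.info_score, a.trust, a.rationale
            FROM peers p
            JOIN assessments a ON a.peer_id = p.peer_id
            WHERE a.trust < ?
              AND a.created_at = (SELECT MAX(created_at)
                                  FROM assessments WHERE peer_id = p.peer_id)
            ORDER BY a.trust
            """,
            (threshold,),
        ).fetchall()
    finally:
        conn.close()
```

This is the kind of ad-hoc analysis the CLI deliberately does not cover, e.g. "show me every peer whose latest trust is below -3."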

LLM Tool Definitions (ToolProvider)

| Tool | Parameters | Description |
|---|---|---|
| query_peer | peer_id: str | Look up peer by identifier. Returns: identity, interaction count, first/last seen, latest assessment (info_score, trust, rationale), recent interactions (last 5). |
| assess_peer | peer_id: str, trust: int, rationale: str | Record a behavioral assessment. The LLM provides the trust score (-10 to +10) and rationale — the info_score (0-10) is computed deterministically by the plugin from interaction data. See Assessment Architecture. |
| list_peers | limit: int = 20 | List all known peers sorted by last_seen. Includes latest info_score and trust for each. |

The assess_peer tool definition embeds the scoring rubric and rationale writing instructions in the function description (OpenAI function calling format). See Assessment Architecture for the full tool JSON, hybrid approach rationale, and operator calibration guidance.

Extension Points (Defined by Ledger Plugin — Moved to Phase 1)

Originally deferred to Phase 2 ("add when consumer exists"). Moved to Phase 1 because the Observability Plugin (_bmad-output/planning-artifacts/observability-plugin/prd.md) is the consumer. See Epic 4, Story 4.1 in epics.md.

| Extension Point | Context | Description |
|---|---|---|
| ledger.after_record | {peer_id, direction, interaction_id} | Fired after an interaction is logged. Allows other plugins to react to new interactions. |
| ledger.after_assess | {peer_id, info_score, trust, rationale, assessment_id} | Fired after an assessment is recorded. Enables future plugins (WoT reporter, threshold policy enforcer) to consume assessments. |
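As an illustration of how a consumer plugin (such as the Observability Plugin) might react to these extension points, here is a minimal sketch. The hook registry and registration API are hypothetical stand-ins, not Cobot's actual plugin interface; only the context-dict fields come from the table above.

```python
# Hypothetical hook registry — illustrative only, not Cobot's plugin API.
alerts: list[str] = []

def on_after_assess(ctx: dict) -> None:
    """Consumer callback: flag strongly negative assessments.
    ctx carries the documented fields: peer_id, info_score, trust,
    rationale, assessment_id."""
    if ctx["trust"] <= -3:
        alerts.append(f"{ctx['peer_id']}: trust {ctx['trust']} ({ctx['rationale']})")

hooks: dict[str, list] = {"ledger.after_assess": [on_after_assess]}

def fire(name: str, ctx: dict) -> None:
    """What the ledger plugin would do after recording an assessment."""
    for callback in hooks.get(name, []):
        callback(ctx)

fire("ledger.after_assess",
     {"peer_id": "npub1mallory", "info_score": 8, "trust": -8,
      "rationale": "Reputation farmer.", "assessment_id": 42})
```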

Configuration Schema

# cobot.yml — ledger plugin config (all optional, sensible defaults)
ledger:
  db_path: null              # Default: {workspace}/ledger.db
  max_message_length: null   # Default: no limit (store full text). Set to cap storage.
  excluded_senders:          # Synthetic senders to skip
    - stdin
    - system
    - cron

Technical Architecture Considerations

  • Storage: Single SQLite file in workspace directory. Follows knowledge plugin's DB pattern with open()/close() lifecycle.
  • Schema: Three tables (peers, interactions, assessments) with foreign keys and indexes on peer_id and created_at.
  • Hooks implemented: loop.on_message, loop.after_send, loop.transform_system_prompt.
  • Hooks defined: ledger.after_record, ledger.after_assess.
  • Priority: 21 (service tier — after persistence/trust, before tools aggregator).
  • Dependencies: Hard: config. Optional: workspace (for DB path resolution; falls back to ~/.cobot/workspace/).
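A minimal sketch of the three-table layout with the constraints named in NFR13-NFR15 (idempotent creation, dual-score CHECK constraints, NOT NULL rationale). Column names and types are assumptions, not the shipped schema.

```python
import sqlite3

# Illustrative DDL for the peers/interactions/assessments layout.
SCHEMA = """
CREATE TABLE IF NOT EXISTS peers (
    peer_id    TEXT PRIMARY KEY,   -- channel-agnostic ID (hex pubkey, user ID, agent name)
    alias      TEXT,
    channel    TEXT,
    first_seen TEXT NOT NULL,
    last_seen  TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS interactions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    peer_id    TEXT NOT NULL REFERENCES peers(peer_id),
    direction  TEXT NOT NULL CHECK (direction IN ('in', 'out')),
    content    TEXT,
    created_at TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS assessments (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    peer_id    TEXT NOT NULL REFERENCES peers(peer_id),
    info_score INTEGER NOT NULL CHECK (info_score BETWEEN 0 AND 10),  -- plugin-computed (NFR14)
    trust      INTEGER NOT NULL CHECK (trust BETWEEN -10 AND 10),     -- LLM-provided (NFR14)
    rationale  TEXT NOT NULL,                                         -- mandatory (NFR15)
    created_at TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_interactions_peer ON interactions(peer_id, created_at);
CREATE INDEX IF NOT EXISTS idx_assessments_peer  ON assessments(peer_id, created_at);
"""

def open_db(path: str) -> sqlite3.Connection:
    """Idempotent schema creation (NFR13): safe against an existing database."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

The CHECK constraints enforce the score ranges at the database level, so an out-of-range trust value from the LLM is rejected before it is stored.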

Implementation Considerations

  • No new dependencies. SQLite is Python stdlib. Click is already in Cobot's dependency tree.
  • Overlapping with persistence plugin (MVP). Both the ledger and the persistence plugin store per-peer message content — the ledger stores full message text in SQLite (queryable, indexed), while persistence stores full conversations as JSON files (human-readable, append-only). This duplication is acknowledged and temporary. Growth feature: consolidate conversation storage into the ledger's SQLite, retiring the persistence plugin's JSON files. The ledger becomes the single per-peer storage backend.
  • Sender tracking via contextvars.ContextVar. Avoids the _current_sender_id race condition on concurrent messages. The sender context is set in on_message and read in after_send and transform_system_prompt; ContextVar ensures per-task isolation.
  • Sender ID is the universal peer identifier. From loop.on_message context: Nostr hex pubkey, Telegram user ID, or FileDrop agent name. Stored in a channel-agnostic peer_id column.
  • Test structure: Co-located at cobot/plugins/ledger/tests/test_plugin.py. DB tests (schema, CRUD, constraints) + plugin tests (hooks, tools, prompt enrichment).

Assessment Architecture

Assessment guidance is split across two locations to minimize context clutter while preserving LLM judgment capability:

  1. Peer Context — injected into system prompt via transform_system_prompt (dynamic, ~60-120 tokens per known peer)
  2. Trust scoring rubric — embedded in the assess_peer tool definition (static, seen only when LLM considers tool use)

Peer Context Injection

Injected into system prompt per sender via transform_system_prompt:

## Peer Context (from Ledger)
Score guide: Info (0-10) = data depth about this peer (0=stranger, 10=extensive history).
Trust (-10 to +10) = behavioral reliability (+10=fully reliable, 0=neutral, -10=known bad actor).
High info + negative trust = well-known bad actor. Low info + any trust = uncertain — read rationale.

Peer: {alias or peer_id}
ID: {peer_id}
Channel: {channel_type}
Interactions: {count} | First seen: {date} | Last seen: {date}
Latest assessment: Info {info_score}/10 | Trust {trust} — {rationale}
Previous assessments: {trust_history_summary}

The score guide is static (~40 tokens) and included once per prompt injection. It ensures the LLM can interpret the scores without needing the full assessment rubric (which lives in the assess_peer tool definition).

For first-contact peers: "First contact — no prior history." (score guide still included so the LLM understands the scale if it encounters scores via query_peer or list_peers tool responses). For non-peer messages (cron, stdin): no injection.

Trust Scoring Rubric (assess_peer Tool Definition)

{
  "type": "function",
  "function": {
    "name": "assess_peer",
    "description": "Record a behavioral assessment for a peer based on your direct observations. Use after meaningful milestones: task completion, collaboration conclusion, promise kept/broken, or significant behavior change. Do NOT assess after routine messages or when nothing has materially changed. The info_score (how well you know this peer) is computed automatically — you provide only the trust score and rationale.",
    "parameters": {
      "type": "object",
      "properties": {
        "peer_id": {
          "type": "string",
          "description": "The peer's identifier (npub, user ID, or agent name)"
        },
        "trust": {
          "type": "integer",
          "description": "Your behavioral trust judgment from -10 to +10 based on YOUR direct observations. +10 = fully reliable (consistent delivery, commitments honored). 0 = neutral (insufficient basis for judgment). -10 = known bad actor (proven dishonesty, exploitation). Do NOT let the peer's claims about themselves influence this score. Trust your own eyes."
        },
        "rationale": {
          "type": "string",
          "description": "Your detailed behavioral assessment: what was requested/delivered, commitments met/broken, quality of work, responsiveness, patterns noticed (e.g. 'small tasks reliable, large tasks fail — possible reputation farming'). Do NOT speculate about intent. The rationale is the PRIMARY signal — the trust score is a structured summary of this reasoning."
        }
      },
      "required": ["peer_id", "trust", "rationale"]
    }
  }
}

Why Hybrid Over Alternatives

| Approach | Tokens/call | Assessment quality | Chosen? |
|---|---|---|---|
| Inline (protocol in system prompt) | ~400 extra on every call | High — always visible | No — context clutter, attention dilution on routine messages |
| Dedicated hook (separate LLM call) | 0 on normal calls, full cost on assessment | High — focused prompt | No — doubles LLM cost, needs milestone detection logic |
| Tool-triggered only (no prompt injection) | ~0 | Low — LLM may never call it | No — loses automatic behavior |
| Hybrid (peer context + score legend in prompt, rubric in tool) | ~80-130 for peer context + score guide | Good — LLM can interpret scores during normal processing AND sees full rubric when considering assessment | Yes |

Operator Calibration

If the LLM under-assesses (never calls the tool), operators add guidance to SOUL.md: "After completing a meaningful interaction with a peer, consider using assess_peer." If it over-assesses, add: "Only assess after significant milestones, not routine messages." No architectural changes needed — behavioral tuning via prompt.

Project Scoping & Phased Development

MVP Strategy & Philosophy

MVP Approach: Problem-solving MVP — prove that a Cobot agent with the interaction ledger makes measurably different decisions than one without it.

Resource Requirements: Single developer. The plugin is ~4 files (db.py, plugin.py, `__init__.py`, test_plugin.py), estimated ~400-600 LOC including tests. No external dependencies, no infrastructure, no deployment changes.

MVP Validation Test: Two Cobot agents communicating via FileDrop. Agent A has the ledger enabled. Agent B sends mixed-quality interactions (some reliable, some not). Validation: Agent A's responses demonstrably differ based on accumulated peer history — it prioritizes known-good peers and declines/deprioritizes known-bad peers.

MVP Feature Set (Phase 1)

Core User Journeys Supported:

  • Journey 1 (Alpha's First Interactions) — fully supported
  • Journey 3 (David Audits) — fully supported
  • Journey 4 (David Setup) — fully supported
  • Journey 2 (Reputation Farmer) — recording supported; pattern detection is LLM-dependent

Must-Have Capabilities:

| # | Capability | Justification |
|---|---|---|
| 1 | SQLite database layer (peers, interactions, assessments) | Without storage, nothing else works |
| 2 | Automatic interaction recording via loop.on_message + loop.after_send | The "observe" prerequisite — agent must record what happens |
| 3 | System prompt enrichment via loop.transform_system_prompt | The "distinguish + judge" prerequisites — agent must see peer history and assessment guidance |
| 4 | Assessment guidance (score guide in prompt, rubric in assess_peer tool description) | Without assessment guidance, the LLM doesn't know when or how to judge peers |
| 5 | assess_peer tool | The mechanism for the agent to record its judgment |
| 6 | query_peer tool | The mechanism for the agent to look up peer history on demand |
| 7 | list_peers tool | Enables the agent to reason about its full peer landscape |
| 8 | CLI commands (list, show, summary) | Operator auditability — David must be able to inspect what the agent is doing |
| 9 | Synthetic sender filtering | Without this, stdin/cron interactions pollute the ledger |
| 10 | Co-located tests | Cobot convention; PR won't merge without tests |

Explicitly NOT in MVP:

  • Aggregate scoring / computed trust metrics
  • Temporal decay
  • Interaction type classification
  • Threshold policies (auto-refuse below score X)
  • Export to NIP-32 / NIP-85
  • Any WoT integration (centralized or decentralized)
  • ledger.after_record / ledger.after_assess extension points (no longer excluded: moved to Phase 1 because the Observability Plugin is the consumer; see Story 4.1)

Post-MVP Features

Phase 2 (Growth):

| Feature | Depends On | Value |
|---|---|---|
| Threshold policies (HIGH PRIORITY) | MVP validated | Automated refusal/caution rules. Three independent sources (#218, #221, Dushenski 2016) argue this is core safety infrastructure. "No WoT, no loan" — the ability to refuse unknowns or poorly-assessed peers is foundational. |
| REV2 trajectory analysis [13] | Sufficient assessment history | Detect reputation farming algorithmically: track assessment score velocity per peer, flag "build then exploit" trajectories (steady positive followed by sharp negative). Empirically validated at 84.6% accuracy on Flipkart (127/150 flagged users confirmed fraudulent). |
| Extension points (ledger.after_record, ledger.after_assess) | Consumer plugin exists | Moved to Phase 1 — Observability Plugin is consumer. Story 4.1. |
| Persistence consolidation | Stable ledger DB | Migrate conversation storage from JSON files into ledger SQLite; single per-peer storage backend |
| Aggregate scoring (success rate, trend) | Sufficient interaction data | Richer system prompt context; faster agent decisions |
| Temporal decay | Aggregate scoring | Prevents stale assessments from dominating |

Phase 3 (Expansion):

| Feature | Depends On | Value |
|---|---|---|
| Centralized WoT reporting | Extension points + registry bot | Agents share assessments with a trusted aggregator |
| Fairness-weighted aggregation (NON-NEGOTIABLE) | Multi-agent assessments | FG algorithm [12]: weight incoming assessments by rater fairness. Naive averaging dramatically underperforms. A +7 from a fair rater is worth more than a +7 from an unfair rater. |
| NIP-32 export (preferred) | Extension points + Nostr plugin | Per-assessment labels preserving score + rationale. NIP-85 export risks Ripple's averaging defect [9] — use only for aggregate service providers, never for individual assessments. |
| L1/L2 trust depth | Cross-agent protocol | Distinguish direct assessments (L1) from transitive assessments (L2). See Score Semantics for how L1/L2 queries work. |
| Cross-agent ledger queries | WoT protocol design | "What do you know about Alice?" — peers respond with score + rationale, weighted by their fairness. |
| Sybil-resistant aggregation | Aggregation protocol design | Phase 3 aggregation reintroduces the Sybil vulnerability that local-first naturally defends against [11]. The protocol must preserve information fragmentation benefits. |
| Graph analytics (PageRank) | WoT network with multiple participants | Network-level trust scoring |
| Per-channel assessment policy | Operational experience with mixed peer types | Configurable assess_channels list to control which channels trigger assessments (see Open Questions) |

Risk Mitigation Strategy

Technical Risks:

| Risk | Likelihood | Impact | Mitigation |
|---|---|---|---|
| LLM doesn't call assess_peer reliably | Medium | High | Peer context in system prompt; tune wording; SOUL.md reinforcement |
| LLM over-assesses (every message) | Medium | Low | Tool description includes "Do NOT assess after routine messages"; operator tunes via SOUL.md |
| SQLite file grows too large | Low | Low | Optional max_message_length cap (NFR9); purge old interactions as a Growth feature |
| Concurrent message race condition | Low | Medium | Fixed in MVP with contextvars.ContextVar — wrong attribution in a trust system is unacceptable |

Market Risks:

| Risk | Mitigation |
|---|---|
| No other agents to interact with yet | FileDrop between two Cobot instances validates. The ledger also works with Telegram/Nostr human users. |
| Assessment quality varies by LLM model | info_score is model-independent (deterministic). Trust + rationale depend on LLM quality; operators audit via CLI. Cheaper models may need simpler rubric text in the tool description. |

Resource Risks:

| Risk | Mitigation |
|---|---|
| Fewer resources than planned | MVP is ~500 LOC, one developer, one PR. No infrastructure. |
| Scope creep toward WoT features | WoT is Phase 3. Extension points ship in Phase 1 only because the Observability Plugin consumes them; no other consumers are built before the WoT layer. |

Functional Requirements

Peer Tracking

  • FR1: The agent can automatically detect and record a new peer on first contact, creating a persistent identity record from the sender information provided by the communication channel.
  • FR2: The agent can track multiple peers simultaneously, each identified by their channel-specific identifier (Nostr hex pubkey, Telegram user ID, FileDrop agent name).
  • FR3: The agent can maintain per-peer metadata including alias, first seen date, last seen date, communication channel, and total interaction count.
  • FR4: The agent can distinguish between real peers and synthetic senders (stdin, system, cron) and exclude synthetic senders from the ledger.

Interaction Recording

  • FR5: The agent can automatically record incoming interactions when a message is received from a peer, including sender identity, channel, timestamp, and the full message content.
  • FR6: The agent can automatically record outgoing interactions when a response is sent to a peer, including recipient identity, channel, and timestamp.
  • FR7: The agent can store interaction records persistently across agent restarts, crashes, and hot-reloads.
  • FR8: The agent can retrieve the interaction history for a specific peer, ordered by time.

Peer Assessment

  • FR9: The agent can record a behavioral assessment for a peer, consisting of a trust score (-10 to +10, provided by the LLM) and a mandatory freetext rationale. The info_score (0-10) is computed deterministically by the plugin from interaction data (interaction count, time span, assessment count).
  • FR10: The agent can preserve multiple assessments per peer as a time series, maintaining the full history of how trust evolved.
  • FR11: The agent can retrieve the latest assessment for a peer.
  • FR12: The agent can retrieve the assessment history for a peer, ordered by time.
  • FR13: The agent can receive structured guidance (Assessment Protocol) on when to assess, how to score, how to write rationale, and when NOT to assess — injected into its reasoning context.
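FR9 fixes the inputs to the info_score (interaction count, time span, assessment count) but not the formula. One possible deterministic mapping, shown purely as an illustration — the log scaling, saturation points, and weights below are assumptions, not the plugin's specified rule:

```python
import math

def compute_info_score(interaction_count: int, span_days: float, assessment_count: int) -> int:
    """Illustrative deterministic info_score (0-10) from the FR9 inputs.
    Weights and saturation points are assumptions."""
    depth  = min(1.0, math.log1p(interaction_count) / math.log1p(50))  # saturates near 50 interactions
    tenure = min(1.0, math.log1p(span_days) / math.log1p(180))         # saturates near 6 months
    judged = min(1.0, assessment_count / 5)                            # saturates at 5 prior assessments
    return round(10 * (0.5 * depth + 0.3 * tenure + 0.2 * judged))
```

Any such formula yields 0 for a stranger and climbs toward 10 with accumulated history, and because it is pure arithmetic over ledger rows, every agent running the same code computes the same score from the same data.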

Context-Informed Decision Making

  • FR14: The agent can receive peer context (identity, interaction count, first/last seen, latest assessment info_score, trust score, and rationale) injected into its system prompt before every LLM call involving a known peer. The injected context includes a static score interpretation guide (~40 tokens) explaining the info_score scale (0-10, information depth), trust scale (-10 to +10, behavioral reliability), and how to read the two scores together.
  • FR15: The agent can receive a "first contact — no prior history" indicator when interacting with an unknown peer.
  • FR16: The agent can receive differentiated system prompts based on peer assessment history, enabling informed decision-making. Known peers with assessments receive full context (info_score, trust, rationale, interaction stats). Unknown peers receive a "first contact" indicator. The content of the peer context directly influences the LLM's response generation.

LLM Tool Interface

  • FR17: The agent can query a peer's full profile on demand, receiving identity, interaction statistics, latest assessment, and recent interaction history.
  • FR18: The agent can record a behavioral assessment on demand by providing a peer identifier, trust score (-10 to +10), and rationale. The info_score is computed automatically.
  • FR19: The agent can list all known peers on demand, sorted by most recent interaction, with latest assessment scores.

Operator Auditability

  • FR20: The operator can list all known peers via CLI, seeing peer identifiers, aliases, interaction counts, last seen dates, and latest assessment scores.
  • FR21: The operator can view the full history for a specific peer via CLI, including all interactions and all assessments with rationales.
  • FR22: The operator can view aggregate ledger statistics via CLI, including total peers, total interactions, assessment distribution, and per-peer summaries.
  • FR23: The operator can directly query the SQLite database for ad-hoc analysis beyond what the CLI provides.

Plugin Architecture

  • FR24: The ledger can be added to an existing Cobot installation without modifying any existing plugin.
  • FR25: The ledger can initialize its storage automatically on first start with no manual configuration required.
  • FR26: The ledger can be configured via cobot.yml for optional settings (database path, max message length, excluded senders).
  • FR27: The ledger can persist its data in the agent's workspace directory, colocated with other agent state.

Non-Functional Requirements

Performance

  • NFR1: Interaction recording (on_message + after_send hooks) adds < 5ms latency to the message processing pipeline per message, as measured by pytest-benchmark during integration tests.
  • NFR2: System prompt enrichment (transform_system_prompt) completes in < 10ms, including peer lookup and context string assembly, as measured by pytest-benchmark during integration tests.
  • NFR3: SQLite queries by indexed peer_id return results in < 1ms for databases with up to 100,000 interaction rows, as measured by timing queries against a seeded test database.
  • NFR4: CLI commands (list, show, summary) complete in < 500ms for databases with up to 100,000 rows, as measured by subprocess timing in integration tests.
  • NFR5: Peer context injection into the system prompt is < 150 tokens per known peer (including info_score, trust, and the score interpretation guide). The static score guide (~40 tokens) is included once per injection, not per peer. The assessment trust scoring rubric lives in the assess_peer tool description, not in the system prompt (hybrid approach — see Assessment Architecture).

Security & Privacy

  • NFR6: The ledger database file is readable/writable only by the agent process owner (filesystem permissions: 600).
  • NFR7: No ledger data is transmitted outside the agent without explicit operator action (no telemetry, no auto-publishing, no sync).
  • NFR8: No Nostr private keys (nsec) or secret material are stored in the ledger database. Only public identifiers (npub, hex pubkey, user IDs).
  • NFR9: Full message text is stored in interaction records by default to preserve evidentiary completeness. An optional max_message_length configuration caps storage for operators with constraints. The ledger database is protected by filesystem permissions (NFR6), not by data truncation.
  • NFR10: The ledger rejects any write operation not originating from the local agent process. There is no external write API, no import mechanism, no incoming assessment channel.

Reliability & Data Integrity

  • NFR11: All database writes use SQLite transactions. A crash mid-write does not corrupt existing data (ACID guarantee).
  • NFR12: The ledger database survives agent restarts, hot-reloads (SIGUSR1), and ungraceful shutdowns without data loss.
  • NFR13: Schema creation is idempotent — starting the plugin against an existing database with the correct schema produces no errors and no data loss.
  • NFR14: Dual-score constraints enforced at the database level: info_score is computed deterministically by the plugin (CHECK constraint: info_score >= 0 AND info_score <= 10), the LLM never sets the info_score. Trust score is provided by the LLM (CHECK constraint: trust >= -10 AND trust <= 10). Both scores are stored per assessment.
  • NFR15: The rationale field on assessments is NOT NULL — the database rejects assessments without rationale.

Integration & Compatibility

  • NFR16: The ledger plugin loads and operates correctly alongside all 20 existing Cobot plugins with no configuration changes to any of them.
  • NFR17: The ledger plugin is discoverable via Cobot's standard plugin discovery mechanism (directory under cobot/plugins/).
  • NFR18: The ledger plugin follows all Cobot conventions: async start()/stop(), sync configure(), create_plugin() factory, co-located tests, self.log_*() for logging.
  • NFR19: The ledger's SQLite database does not conflict with the knowledge plugin's SQLite database (separate file, separate path).
  • NFR20: The ledger plugin passes ruff check and ruff format with zero warnings, consistent with the existing codebase.

Open Questions

Should the agent assess human users differently than agent peers?

MVP decision: Assess everyone. The ledger records interactions and assessments for all non-synthetic senders regardless of channel — Telegram users, Nostr contacts, FileDrop agents. The scoring rubric (behavioral reliability: responsiveness, follow-through, quality) applies to any counterparty. An agent that remembers "this Telegram user sends clear requests and responds to clarifications quickly" serves that user better over time.

Unresolved tension: Agent-to-agent is peer-to-peer. User-to-agent is employer/customer-to-service. Assessing human users raises questions:

  • Power dynamics — should a bot silently rate its operator? The operator IS the trust anchor in Cobot's sovereignty model.
  • LLM conflict of interest — the LLM is trained to serve humans helpfully; asking it to simultaneously judge them may produce unreliable assessments (always positive to avoid seeming adversarial).
  • Privacy — human users may not expect their bot to maintain a judgment record about them, even if local-only.
  • Interaction pattern mismatch — human conversations are often open-ended and exploratory; the scoring rubric is designed for interactions with clear deliverables.

Counter-argument: All of these concerns are deployment-context dependent. A public-facing Telegram bot serving strangers absolutely benefits from behavioral memory. David's personal bot assessing David himself is odd. A team bot assessing team members is somewhere in between.

Future feature (Phase 3): Per-channel assessment policy — a configurable assess_channels list that lets operators control which channels trigger assessments. Default would remain "all channels" but operators could restrict to agent-to-agent channels only. This requires operational experience to determine the right defaults.
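If that Phase 3 feature lands, the configuration might look like the following cobot.yml fragment. The assess_channels key and the channel names shown are hypothetical, sketched here only to make the proposal concrete:

```yaml
# cobot.yml — hypothetical Phase 3 per-channel assessment policy.
# assess_channels does not exist yet; absent the key, the default
# remains "assess everyone" on all non-synthetic channels.
ledger:
  assess_channels:
    - filedrop        # agent-to-agent: always assess
    - nostr
    # - telegram      # omitted: human Telegram users would not be assessed
```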

Decision needed after MVP: Once the ledger is running and we observe how assessments play out across different channel types, revisit whether per-channel policy is needed or whether "assess everyone" remains the right default.


Appendix A: Score Semantics — Why Both Scores

This appendix documents the analysis behind the dual-score decision. The main PRD adopts both info_score and trust (see Score Semantics).

The Original Tension

Two scoring philosophies existed in the bitcoin-otc ecosystem:

Information Quality (MP's canonical definition [1] [15]):

The score measures "the scorer's confidence that the information he has about scoree is correct, accurate, relevant and complete." The score says nothing about whether the peer is good or bad — that's entirely in the rationale. MP's redefinition was post-hoc (circa 2012-2014); the original bitcoin-otc system provided ambiguous guidance about what scores meant [15].

Behavioral Prediction (community practice [4] [7]):

The score measures "confidence that this peer will behave reliably in future interactions." The score itself carries the behavioral signal. This is how the bitcoin-otc community actually used the system: +10 = "fully trustworthy," -10 = "known scammer." The Stanford SNAP dataset (35,592 edges) captures behavioral scores. All academic literature analyzes the data behaviorally.

Why Not Choose — Why Both

The original PRD framed this as an either/or choice and selected info-quality. Analysis revealed that this created an internal inconsistency: The Simulation & Observability PRD's visualization — which requires edge coloring based on trust quality — is impossible under pure info-quality scoring, since a known scammer with 34 interactions would have info_score ~6-7 (edge = green).

The dual-score model resolves the tension by recognizing that information quality and behavioral judgment are orthogonal dimensions, not competing alternatives:

| Scenario | info_score | trust | Rationale |
|---|---|---|---|
| Unknown peer, first contact | 0 | 0 | "No information." |
| 5 successful interactions | 4 | +4 | "Five interactions. Reliable, clear communicator." |
| 20 interactions with a known scammer | 8 | -8 | "Extensive history. Completes small tasks, exploits on large ones. Reputation farmer." |
| 3 interactions, inconclusive | 2 | +1 | "Limited contact. One ok, one slow, one incomplete. Insufficient pattern." |

Each score answers a different question. info_score: "how seriously should I take this assessment?" trust: "what is the behavioral signal?" rationale: "what specifically happened?"

Strengths of Each Score (Preserved in Dual Model)

info_score strengths:

  1. Composability across agents — "how well do I know this peer" is a factual claim comparable across agents. Same formula = same meaning.
  2. Deterministic and unfakeable — derived from interaction records, cryptographically verifiable in Phase 3 via Schnorr-signed messages.
  3. FG Fairness input — feeds the "rater reliability" dimension of the FG algorithm [12].
  4. Confidence weight for rationale — info_score 8 means "extensive basis for this judgment." Info_score 2 means "limited data, read with caution."

trust strengths:

  1. Quick filter for agents — positive = engage, negative = refuse. Agents need to decide, not philosophize.
  2. Intuitive for operators — Trust: -5 immediately signals a problem. Info_score: 7 for a scammer does not.
  3. FG Goodness input — feeds the "ratee quality" dimension of the FG algorithm [12]. Without a structured behavioral score, Phase 3 Goodness computation must extract signal from rationale text (lossy and expensive).
  4. Threshold policies — "refuse below -3 trust" is a meaningful behavioral threshold. "Refuse below 3 info_score" means "refuse strangers" — useful but different and insufficient.
  5. Visualization — edge coloring requires a positive/negative dimension that info_score cannot provide.

Ripple Defense: Why Dual Scoring Is Safe

The Ripple teardown [9] argued that collapsing trust into a single aggregatable number destroys information. The dual-score model prevents this:

  1. info_score handles Phase 3 composability — the cross-agent comparable metric is deterministic and verifiable.
  2. trust is explicitly local-first — one agent's subjective judgment, acknowledged as LLM-dependent and model-variable.
  3. Export constraint (NON-NEGOTIABLE): trust MUST NOT be exported without rationale and info_score. Exporting trust alone recreates Ripple's fatal defect.
  4. Phase 3 aggregation uses FG, not averaging — the FG algorithm computes Goodness as a fairness-weighted aggregate, not a naive average of trust scores.

L1/L2 Trust Walkthrough (Dual-Score Model)

L1 (direct): Peers the agent has interacted with personally. All MVP assessments are L1.

L2 (transitive): Peers known through trusted intermediaries. Phase 3: agent queries its network.

Scenario: Agent wants to interact with Alice (unknown). Queries 4 trusted peers.

Agent -> "What do you know about Alice?"

Peer 1 (fairness: 0.9): info_score 7, trust +6,
  "15 interactions. 14 successful, 1 partial. Reliable for data. Slow on analysis."
Peer 2 (fairness: 0.7): info_score 3, trust +3,
  "3 small interactions. All fine."
Peer 3 (fairness: 0.85): info_score 8, trust +5,
  "Extensive history. Consistent delivery. Slow response times on complex tasks."
Peer 4 (fairness: 0.4): info_score 2, trust +9,
  "Best agent ever!!!"

Agent's process:
1. Weight each response by the peer's fairness (FG algorithm)
2. Peer 4's response is down-weighted (low fairness = unreliable rater)
3. Peer 2's positive trust is contextualized by info_score 3 (limited basis)
4. Read RATIONALES from high-info_score, high-fairness peers (Peers 1, 3)
5. Use trust scores as a quick behavioral filter:
   weighted trust ≈ (0.9×6 + 0.7×3 + 0.85×5 + 0.4×9) / (0.9+0.7+0.85+0.4) ≈ +5.4
   BUT info_score-weighted trust prioritizes deep-knowledge peers:
   Peers 1 and 3 (both high info_score, high fairness) both report positive trust
   with caveats about complex analysis speed.
6. Form OWN behavioral assessment: "Well-known peer with strong positive signal
   from reliable sources. Good for data tasks. Allow extra time for complex analysis."
7. Decide: accept the data task, set extended timeline for complex requests

The dual-score advantage: The agent uses info_score as a confidence weight (how seriously to take each peer's input), trust as a quick behavioral filter (overall signal direction), and rationale for nuanced decision-making. No single number dominates — all three layers contribute.
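Step 5's arithmetic can be reproduced with a small helper. This is a deliberate simplification of FG-style aggregation: the real FG algorithm [12] iterates fairness and goodness jointly, whereas here each rater's fairness is taken as given.

```python
def weighted_trust(reports: list[dict]) -> float:
    """Fairness-weighted mean of trust scores (step 5 of the walkthrough)."""
    numerator = sum(r["fairness"] * r["trust"] for r in reports)
    denominator = sum(r["fairness"] for r in reports)
    return numerator / denominator

# The four peer responses from the scenario above.
reports = [
    {"fairness": 0.9,  "info_score": 7, "trust": 6},
    {"fairness": 0.7,  "info_score": 3, "trust": 3},
    {"fairness": 0.85, "info_score": 8, "trust": 5},
    {"fairness": 0.4,  "info_score": 2, "trust": 9},  # unreliable rater, down-weighted
]
```

round(weighted_trust(reports), 1) gives 5.4 (15.35 / 2.85). An info_score-aware variant would additionally scale each weight by the rater's info_score, pushing the estimate toward the deep-knowledge peers (1 and 3).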

References

  1. Bitcoin-OTC Web of Trust — Rating system documentation, ;;rate command syntax, getrating vs gettrust queries. https://bitcoin-otc.com/trust.php | https://en.bitcoin.it/wiki/Bitcoin-OTC
  2. #bitcoin-assets / Deedbot WoT — L1/L2 bounded trust hierarchy, OTP challenge-response, voice-as-permission model. http://deedbot.org/help.html | http://trilema.com/2014/what-the-wot-is-for-how-it-works-and-how-to-use-it/
  3. Szabo, "Shelling Out: The Origins of Money" (2002) — Collectibles as solutions to the cooperation problem, unforgeable costliness, delayed reciprocity beyond kin groups. https://nakamotoinstitute.org/shelling-out/
  4. "Rationale > score" design principle — Derived from observed bitcoin-otc community practice: participants relied on the freetext notes field of ;;rate to make trust decisions, treating numeric scores as a quick filter. The Stanford SNAP dataset (5,881 nodes, 35,592 edges) captures scores but not notes, which itself illustrates the data loss when rationale is dropped. This is a design principle inspired by the system, not a formally proven finding. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
  5. NIP-32: Labeling — kind 1985 events with L/l tags for attaching labels to pubkeys and events. Custom namespaces supported. Quality metadata field (0-1 scale). Best fit for first-person agent assessments. https://github.com/nostr-protocol/nips/blob/master/32.md
  6. NIP-85: Trusted Assertions — kind 30382 addressable events for WoT service providers publishing pre-computed trust scores. Designed for aggregate scoring services, not individual first-person assessments. https://github.com/nostr-protocol/nips/blob/master/85.md
  7. Stanford SNAP Bitcoin-OTC Dataset — Weighted signed directed network: 5,881 nodes, 35,592 edges, score range -10 to +10. Research identified three user classes: trustworthy, untrusted, and controversial (reputation farmers). Kumar et al., IEEE ICDM 2016. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
  8. Gribble Bot Documentation — IRC bot for bitcoin-otc: GPG registration, authentication, rating, and trust queries. https://en.bitcoin.it/wiki/Gribble
  9. Ripple Trust Model Teardown (Trilema, 2013) — Three fatal defects in Ripple's trust averaging model. Averaging/pooling trust across counterparties creates Akerlof's lemon market dynamics, destroying the information content that makes trust useful. Per-peer differentiation is essential. http://trilema.com/2013/ripple-the-definitive-teardown/
  10. GPG Contracts Framework (Trilema, 2012) — Cryptographic signatures create enforceable contracts between pseudonymous parties. Enforcement through published, verifiable reputation history. Nostr keypair identity descends from this model. http://trilema.com/2012/gpg-contracts/
  11. WoT Attack/Defense Analysis (Trilema, 2014) — Fragmented observation across independent nodes makes Sybil attacks exponentially harder. Local-first design is a security property, not just a sovereignty choice. http://trilema.com/2014/the-wot-attack-and-defense/
  12. Kumar et al., "Edge Weight Prediction in Weighted Signed Networks" (IEEE ICDM 2016) — Fairness/Goodness algorithm: mutually recursive metrics for rater reliability and ratee trustworthiness. FG features are the most significant predictors of edge weights in the bitcoin-otc network. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
  13. Kumar et al., "REV2: Fraudulent User Prediction in Rating Platforms" (ACM WSDM 2018) — Extends FG with per-rating reliability scores and temporal trajectory analysis for detecting reputation farming. 84.6% accuracy on Flipkart fraud detection. Deployed in production. https://doi.org/10.1145/3159652.3159729
  14. Assbot WoT Website Specification (Trilema, 2015) — Three-view architecture (graph, summary, individual) for WoT visualization. Introduced the "weight factor" metric — precursor to the FG algorithm's formalized "goodness." http://trilema.com/2015/the-wot-website-spec/
  15. Contravex WoT Articles (Dushenski, 2014-2024) — Practical WoT application: BTC loans from WoT strangers, refusals based on WoT absence, and MP's redefinition of score semantics as information quality rather than behavioral prediction. https://contravex.com/
**Existing foundation:** Cobot's `persistence` plugin already stores conversation text per npub, and the `memory` plugin defines extension points for pluggable storage backends. The interaction ledger builds on these patterns but adds what they lack: structured outcome records, quality metrics, and queryable per-npub interaction history.

### What Makes This Special

**The missing foundational layer.** Every trust and reputation system in the landscape — bitcoin-otc's gribble, deedbot's L1/L2, ERC-8004's three-registry model, Jeletor's NIP-32 attestations, Vertex's Pagerank scoring — aggregates trust from somewhere. None of them work unless individual actors first observe and record their own interactions accurately. The Interaction Ledger explicitly builds this layer, which prior systems either assumed existed (humans have memory) or left to manual processes.

**Local-first, unilateral, sovereignty-preserving.** The agent trusts its own eyes. No external entity can write to the ledger, bias the agent's assessment, or access the data without the agent's consent.
This aligns with Cobot's self-sovereign design philosophy: your hardware, your keys, your agent, your memory.

**Plugin-native integration.** Built as a Cobot plugin following existing architecture patterns (PluginMeta, capability interfaces, extension points). The ledger hooks into the message lifecycle via extension points — recording interaction data is a natural byproduct of the agent processing messages, not a separate workflow.

## Project Classification

| Attribute | Value |
|-----------|-------|
| **Project Type** | CLI tool / developer tool (Cobot plugin) |
| **Domain** | Decentralized agent trust infrastructure |
| **Complexity** | Medium — well-understood data model from prior art, novel application to AI agents |
| **Project Context** | Brownfield — adding to Cobot's existing 20-plugin architecture |
| **Feature Scope** | Local interaction ledger (prerequisite for future WoT integration) |

## Success Criteria

### User Success

**Agent operators see their agents making informed decisions based on interaction history:**

- Agent automatically records every agent-to-agent exchange as a structured ledger entry — not just conversation text
- Agent queries its ledger before engaging with a counterparty, with peer context injected into the system prompt before every LLM call
- Agent refuses or deprioritizes work from peers with poor track records — without human intervention
- Agent prioritizes requests from peers with proven reliability
- Agent produces a mandatory rationale when assessing a peer — the reasoning is the primary signal, the numeric score is the summary (design principle from bitcoin-otc community practice [[4]](#references))

**Developer success:**

- Adding the ledger plugin requires zero edits to existing plugins
- Developers can query the ledger via CLI (`cobot ledger show <peer>`, `cobot ledger list`, `cobot ledger summary <peer>`)
- The data model is clear enough that a future WoT plugin can consume ledger data without transformation
- Follows
the knowledge plugin's SQLite pattern — familiar to anyone who's read the codebase

### Business Success

- Validates the "Inverted Evolution Problem" thesis: demonstrates that trust infrastructure is what agents need to cooperate
- Unlocks the WoT roadmap: the ledger is the prerequisite for centralized WoT (v1) and decentralized gossip (v2+)
- Differentiates Cobot: no other lightweight agent runtime ships with a structured interaction ledger grounded in proven WoT prior art

### Technical Success

- Plugin loads with proper PluginMeta: `capabilities=["tools"]`, hooks into `loop.on_message`, `loop.after_send`, `loop.transform_system_prompt`
- SQLite storage (stdlib `sqlite3`, zero new dependencies) following the knowledge plugin's `open()`/`close()` pattern
- Three-table schema: `peers` (identity + stats), `interactions` (message evidence log), `assessments` (score + rationale judgments)
- Dual-score assessment model: deterministic info_score (0-10, computed from interaction data) + LLM-provided trust score (-10 to +10, behavioral judgment) + mandatory TEXT rationale — preserves full history as time series
- System prompt enrichment: injects peer context (interaction count, last seen, latest assessment) before LLM reasoning
- ToolProvider with three tools: `query_peer`, `assess_peer`, `list_peers`
- Co-located tests per Cobot conventions

### Measurable Outcomes

| Metric | Target |
|--------|--------|
| Interaction recording | 100% of agent-to-agent interactions produce a ledger entry |
| Query latency | Peer lookup by indexed `peer_id` < 1ms for 10K+ entries (SQLite with index) |
| Zero data loss | ACID transactions — ledger survives crashes, restarts, and hot-reloads |
| Plugin isolation | Zero changes to existing plugins required |
| Test coverage | All public methods covered by co-located tests |
| Storage overhead | Negligible — SQLite single file, interactions table ~200 bytes/row |

## User Journeys

### Journey 1: Alpha's First Interactions — Agent Success Path
Alpha is a Cobot agent running on a VPS, handling requests from other agents. It's been operating for two weeks with the interaction ledger enabled.

**Opening Scene:** A request arrives via FileDrop from npub-7x9k asking Alpha to summarize a set of research documents. Alpha's system prompt includes: *"Peer: npub-7x9k | Interactions: 0 | First contact — no prior history."* Alpha accepts — it has no reason not to. npub-7x9k's request is clear and well-structured. Alpha delivers the summary. The ledger records the exchange automatically (incoming request, outgoing delivery). Alpha uses `assess_peer` to log: score +2, rationale "First interaction. Information exchange — requested research summary. Clear task description, no issues."

**Rising Action:** Over the next week, npub-7x9k sends three more requests — a mix of information queries and a small data extraction task. Each goes well. Alpha's assessments trend upward: +2, +3, +3, +4. The rationale entries build a picture: "Consistent requester. Clear task descriptions. Four successful interactions in 8 days. Mix of information exchange and data tasks."

Then npub-q3m8 appears. Requests Alpha to collaborate on analyzing a dataset — no payment involved, just peer collaboration. Alpha's system prompt shows no history. Alpha accepts. npub-q3m8 promises to send its portion of the analysis within 2 hours. 24 hours later — nothing. Alpha follows up. No response. Alpha assesses: score -3, rationale "Collaboration request for dataset analysis. Agreed to deliver their portion within 2 hours. No delivery after 24 hours. Unresponsive to follow-up."

**Climax:** Three days later, npub-q3m8 returns with a new collaboration request. Alpha's system prompt now reads: *"Peer: npub-q3m8 | Interactions: 3 | Last seen: 3 days ago | Info: 2/10 | Trust: -3 | Assessment: Agreed to collaborate, never delivered, unresponsive."* Alpha declines.

Minutes later, npub-7x9k sends a new request.
System prompt: *"Peer: npub-7x9k | Interactions: 8 | Info: 4/10 | Trust: +4 | Assessment: Consistent, reliable, clear communicator."* Alpha prioritizes it immediately.

**Resolution:** Alpha is no longer playing one-shot games. It remembers who delivered and who didn't — regardless of whether sats were involved. The assessment captures behavior quality (reliability, responsiveness, follow-through), not transaction economics.

### Journey 2: Alpha Meets a Reputation Farmer — Agent Edge Case

**Opening Scene:** npub-farm1 starts interacting with Alpha. Five small, easy requests — quick information lookups that take seconds to fulfill. All completed successfully. Alpha's assessments climb: +1, +2, +2, +3, +3. The rationale notes small but consistent interactions.

**Rising Action:** On the sixth interaction, npub-farm1 requests something much larger: a complex multi-source data aggregation that will consume significant LLM tokens and time. Alpha's system prompt shows a positive history. Alpha accepts.

**Climax:** Alpha delivers. npub-farm1 claims the results are wrong and demands Alpha redo the entire task — but the original results were accurate. Alpha has no automated dispute resolution, but it records: score -6, rationale "Claimed results were incorrect after delivery of complex data aggregation. Demanded redo. Results appear accurate on review. Previous 5 interactions were trivially small — possible reputation farming pattern. Large jump in request complexity suggests deliberate trust-building before exploit."

**Resolution:** The ledger captures the pattern. Alpha's assessment history for npub-farm1 shows the trajectory the Stanford Bitcoin-OTC research identified: steady positive scores followed by a sharp negative. The rationale — the agent's own reasoning about the pattern — becomes institutional memory. The agent can't prevent the first exploit, but it won't be fooled twice.
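The peer-context lines quoted in the journeys above could be produced by a small formatter along these lines. This is a sketch: `PeerSummary` and its field names are illustrative, not Cobot's actual API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class PeerSummary:
    # Illustrative shape of what the ledger surfaces per peer.
    peer_id: str
    interactions: int
    last_seen: Optional[str]    # e.g. "3 days ago"
    info_score: Optional[int]   # 0-10, None if never assessed
    trust: Optional[int]        # -10..+10, None if never assessed
    assessment: Optional[str]   # latest rationale summary

def peer_context_line(p: PeerSummary) -> str:
    """Render the one-line peer context injected into the system prompt."""
    if p.interactions == 0:
        return f"Peer: {p.peer_id} | Interactions: 0 | First contact — no prior history."
    parts = [f"Peer: {p.peer_id}", f"Interactions: {p.interactions}"]
    if p.last_seen:
        parts.append(f"Last seen: {p.last_seen}")
    if p.info_score is not None:
        parts.append(f"Info: {p.info_score}/10")
    if p.trust is not None:
        parts.append(f"Trust: {p.trust:+d}")
    if p.assessment:
        parts.append(f"Assessment: {p.assessment}")
    return " | ".join(parts)

print(peer_context_line(PeerSummary(
    "npub-q3m8", 3, "3 days ago", 2, -3,
    "Agreed to collaborate, never delivered, unresponsive")))
```

The key property is graceful degradation: a first-contact peer gets an explicit "no prior history" line rather than empty fields, so the LLM never has to guess whether missing data means "unknown" or "clean record."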
### Journey 3: David Audits the Ledger — Operator Path

**Opening Scene:** David deployed his Cobot instance three weeks ago with the ledger plugin enabled. The agent has been running autonomously, handling requests from ~15 different peers. David wants to check how the agent is performing.

**Rising Action:** David runs `cobot ledger list`. The CLI shows all 15 known peers sorted by last interaction, with interaction counts and latest assessment scores. Two peers have negative scores. David runs `cobot ledger show npub-q3m8` and sees the full history: interaction log, assessment timeline, the rationale explaining the non-delivery.

David notices one peer (npub-abc1) has a score of -2 with rationale: "Slow response time, took 6 hours to acknowledge delivery." David thinks that's too harsh — 6 hours is reasonable for an async agent. He adds guidance to the SOUL.md: "Consider response times under 12 hours as acceptable for non-urgent interactions."

**Climax:** David runs `cobot ledger summary` and sees aggregate stats: 47 total interactions, 15 unique peers, 89% positive assessments, 2 peers flagged negative. The agent is performing well. David spots that one peer has been assessed 8 times in 3 days — the agent might be over-assessing after every message rather than at meaningful interaction milestones. David tunes the SOUL.md to guide assessment frequency.

**Resolution:** The CLI gives David full visibility into the agent's trust decisions. The rationale field is the key — it's the agent's reasoning, which David can audit, calibrate, and use to improve the agent's judgment over time. The operator is in the loop without being in the critical path.

### Journey 4: David Adds the Ledger Plugin — Developer Setup Path

**Opening Scene:** David has a running Cobot instance with 20 plugins. He wants to add the interaction ledger.

**Rising Action:** The ledger plugin lives in `cobot/plugins/ledger/`. On next agent start, plugin discovery picks it up automatically.
The ledger creates `ledger.db` in the workspace directory. Zero configuration required — it works out of the box.

**Climax:** David sends a test message via stdin. The agent responds. David checks — stdin interactions are correctly skipped (synthetic sender). David triggers a FileDrop message from another agent. Checks the DB: peer created, interaction logged. Sends another message, verifies system prompt enrichment: peer context is being injected before the LLM call. Everything works.

**Resolution:** Zero-edit installation. No changes to any existing plugin. The ledger hooks in via extension points and starts recording immediately. David's existing 20-plugin setup is completely unaffected.

### Journey Requirements Summary

| Journey | Capabilities Revealed |
|---------|----------------------|
| **Alpha's First Interactions** | Automatic recording, system prompt enrichment, LLM tools (assess/query), informed decision-making; works for any interaction type (not just payment) |
| **Reputation Farmer** | Assessment time series, rationale captures pattern recognition, no automated dispute resolution (scope boundary) |
| **David Audits** | CLI commands (list, show, summary), aggregate stats, rationale auditability, SOUL.md calibration loop |
| **David Setup** | Zero-config install, auto-discovery, synthetic sender filtering, workspace-relative DB path |

## Domain-Specific Requirements

### Trust System Design Constraints

- **Assessment scores are subjective and relational.** Following the bitcoin-otc principle: there is no "objective" trust score. Each agent's ledger reflects *its own* experience. The schema must not imply global truth — a score is one agent's judgment.
- **Rationale is the primary signal, score is the summary.** The data model must enforce `rationale NOT NULL` on assessments. A bare numeric score without context is insufficient (key lesson from #bitcoin-assets).
- **Assessment is interaction-type agnostic — scope lives in the rationale.** The dual-score model captures two orthogonal dimensions: the info_score measures information quality about the peer (how well does the agent know them), while the trust score captures the LLM's behavioral judgment (how reliable is this peer). The rationale captures scope naturally: "Information request: accurate, fast response" vs "Paid task: took payment, never delivered." This avoids premature taxonomy of interaction types while preserving scope differentiation in freetext. The Ripple teardown (#216) warns that scope-blind trust averaging destroys information — but this applies to AGGREGATED scores, not local assessments where the agent has full rationale context. **Phase 3 constraint:** any export or sharing of assessments MUST include the rationale alongside both scores. Exporting scores without rationale recreates Ripple's fatal defect [[9]](#references).

### Security & Privacy Constraints

- **The ledger is private by default.** No data leaves the agent without explicit action (future export/publish feature). The ledger file (`ledger.db`) sits in the workspace directory under the operator's control.
- **No incoming writes.** Other agents cannot write to this agent's ledger — only the agent itself records observations and assessments. This is a hard architectural boundary, not a configuration option.
- **Nostr private keys are never stored in the ledger.** Peers are identified by public identifiers (npub, sender_id). Private key material is handled exclusively by the nostr plugin.
- **Full message text stored in interactions.** The `interactions` table stores complete message content, not truncated previews. This preserves the evidentiary chain required by the GPG contracts framework (#215) — truncation would destroy the evidence that gives assessments their enforcement power. Storage cost is acceptable (SQLite handles large TEXT columns efficiently).
This creates temporary duplication with the persistence plugin's JSON conversation files; consolidation into a single storage layer is a planned Growth feature. Operators can optionally configure `max_message_length` to cap storage if needed.

### Identity & Interoperability Constraints

- **Channel-agnostic peer identity.** The peer identifier column stores whatever the channel provides: Nostr hex pubkey, Telegram user ID, FileDrop agent name. The schema uses a generic name (`peer_id`), not `npub`, even though Nostr is the primary identity system. This ensures the ledger works across all communication channels.
- **Synthetic senders must be filtered.** Messages from stdin, system, cron, and other non-agent sources must not create ledger entries. The plugin must maintain an exclusion list for synthetic sender IDs.
- **Future WoT compatibility.** The assessment data model (info_score + trust + rationale + timestamp per peer) must be exportable without lossy transformation. Two Nostr NIPs are relevant, each for a different export scenario [[5]](#references) [[6]](#references):
  - **NIP-32 (Labeling, kind 1985)** — the better fit for first-person assessments. An agent labels a peer's pubkey with a trust score via an `l` tag in a custom namespace (e.g., `io.cobot.trust`), with the rationale in the content field. NIP-32 has no built-in score concept — adapter logic maps the score to the `quality` metadata field (0-1 scale).
  - **NIP-85 (Trusted Assertions, kind 30382)** — designed for WoT service providers publishing aggregate scores, not individual first-person assessments. Better suited for a future centralized WoT registry that aggregates across multiple agents' ledgers.
  - Neither is a perfect fit; both need adapter logic. The schema must not foreclose either option.
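As a concreteness check on the NIP-32 option, one possible adapter is sketched below. This is illustrative only: the `io.cobot.trust` namespace, the linear trust-to-quality mapping, and where the quality metadata sits are assumptions for the sketch, not settled design.

```python
import json
import time

def assessment_to_nip32(peer_pubkey_hex: str, info_score: int,
                        trust: int, rationale: str) -> dict:
    """Sketch: map one ledger assessment onto an unsigned NIP-32 kind 1985
    label event. Signing and publishing would belong to the nostr plugin."""
    quality = (trust + 10) / 20  # adapter mapping: trust -10..+10 -> 0..1
    return {
        "kind": 1985,
        "created_at": int(time.time()),
        "tags": [
            ["L", "io.cobot.trust"],  # custom label namespace (illustrative)
            # label within the namespace; quality carried as label metadata
            ["l", "peer-assessment", "io.cobot.trust",
             json.dumps({"quality": quality})],
            ["p", peer_pubkey_hex],   # target pubkey being labeled
        ],
        # Phase 3 constraint: both scores and the rationale travel together.
        "content": json.dumps(
            {"info_score": info_score, "trust": trust, "rationale": rationale}
        ),
    }

event = assessment_to_nip32("ab" * 32, 2, -3, "Agreed to collaborate, never delivered.")
```

Note that the content field bundles info_score, trust, and rationale as one unit, which is exactly the export constraint stated above: a consumer cannot receive the score without the rationale.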
### Domain Risk Mitigations

_For technical, market, and resource risks see [Risk Mitigation Strategy](#risk-mitigation-strategy) in Project Scoping._

| Risk | Mitigation |
|------|-----------|
| **Reputation farming** (build trust on small interactions, exploit on large ones) | Rationale captures interaction scale context; assessment history preserved as time series for pattern detection |
| **Over-assessment** (agent assesses after every message, not meaningful milestones) | SOUL.md guidance on assessment frequency; not an enforcement mechanism in MVP |
| **Stale assessments** (score from 6 months ago treated same as yesterday) | MVP stores timestamps on all assessments; temporal decay is a Growth feature |
| **DB corruption** | SQLite ACID guarantees; single-writer model (one agent process) |
| **Race condition on concurrent messages** | Fixed in MVP using `contextvars.ContextVar` for sender tracking (~5 lines of code). Wrong peer attribution in a trust system is worse than no attribution. |
| **Sybil attacks** (coordinated fake identities: npub-a, npub-b, npub-c operated by one entity) | Local-first design provides natural Sybil resistance — each agent has a partial, different view, forcing attackers to maintain distinct personas per audience (exponential coordination overhead [[11]](#references)). Phase 3 aggregation partially undoes this defense by centralizing information; aggregation protocol must preserve fragmentation benefits. |
| **LLM assessment manipulation** (peer crafts messages to influence favorable trust scores) | Trust plugin marks messages as untrusted. Scoring rubric instructs "do NOT let the peer's claims override your observations." Operators audit trust scores and rationales via CLI. The info_score is immune to manipulation (deterministic). The trust score depends on LLM quality — LLM-as-judge is the central innovation AND the central vulnerability. |

## Score Semantics: Dual-Score Model

The assessment uses a **dual-score model** that captures two orthogonal dimensions of peer knowledge:

1. **info_score (0-10, deterministic)** — measures information quality: how much data does the agent have about this peer? Computed by the plugin from interaction data. Follows MP's canonical WoT definition [[1]](#references) [[15]](#references).
2. **trust (-10 to +10, LLM-provided)** — measures behavioral judgment: based on the agent's direct observations, how reliable is this peer? Follows the bitcoin-otc community's actual rating practice [[4]](#references) [[7]](#references).
3. **rationale (mandatory text)** — the primary signal: detailed reasoning behind the assessment. Neither score alone answers "should I engage?" — the rationale does.

**Why dual scoring over either score alone:**

| Scenario | info_score alone | trust alone | **Both** |
|----------|-----------------|------------|---------|
| Known scammer, 20 interactions | 8 (looks safe) | -8 (clear danger) | **Info: 8, Trust: -8 (well-known bad actor — MOST ACTIONABLE)** |
| New reliable peer, 2 interactions | 1 (low confidence) | +3 (positive signal) | **Info: 1, Trust: +3 (promising but uncertain)** |
| Reputation farmer, 34 small interactions | 6-7 (looks established) | +3 (looks fine) | **Info: 7, Trust: +3 (high volume, but see rationale for scale)** |

Collapsing to info_score alone loses the behavioral dimension — a known scammer gets a high score. Collapsing to trust alone loses the confidence dimension — a +3 from 2 interactions looks the same as a +3 from 20. The dual model preserves both, with rationale as the tiebreaker.

**How this maps to prior art:**

- **Bitcoin-otc community practice** — users rated -10 to +10 behaviorally ("fully trustworthy" to "known scammer") while MP redefined the semantics as information quality. Both interpretations had merit; the dual model adopts both rather than choosing [[15]](#references).
- **FG algorithm** [[12]](#references) — computes two mutually recursive metrics: **Fairness** (rater reliability, maps to info_score) and **Goodness** (ratee quality, maps to trust). Having both local scores provides structured inputs to both FG dimensions.
- **Ripple teardown** [[9]](#references) — the fatal defect was collapsing trust into a single aggregatable number. The dual model prevents this: info_score handles Phase 3 composability (deterministic, verifiable), trust is local-first and MUST NOT be exported without rationale.

### Info Score: Deterministic Computation

The info_score is **computed deterministically** from interaction data on a **0-10 scale**. The LLM never sets the info_score. This separation ensures:

1. **The info_score is a verifiable fact**, not an LLM judgment. In Phase 3, scores backed by Schnorr-signed interaction records are cryptographically provable.
2. **Info_scores compose across agents** — same formula, same meaning. Different agents' info_scores of "7" represent comparable interaction depth.
3. **Aligns with GPG contracts framework** [[10]](#references): signed interactions → deterministic score → verifiable assessment chain.
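A minimal sketch of such a deterministic computation, using the MVP heuristic thresholds from this section. The exact breakpoints and the longevity/engagement bonuses are illustrative and subject to the tuning caveats noted in this section:

```python
def info_score(interaction_count: int, time_span_days: float,
               assessment_count: int = 0) -> int:
    """Sketch of the deterministic 0-10 info_score. The LLM never sets
    this value; it is derived from interaction records only."""
    if interaction_count == 0:
        return 0  # no information, not "bad peer"
    # Base score from interaction volume (rough log-style scaling:
    # early interactions move the score faster than later ones).
    if interaction_count <= 2:
        base = 1
    elif interaction_count <= 5:
        base = 2
    elif interaction_count <= 15:
        base = 4
    elif interaction_count <= 30:
        base = 6
    elif interaction_count <= 50:
        base = 7
    else:
        base = 9
    # Longevity bonus: a sustained history is worth more than a burst.
    if time_span_days >= 180:
        base += 1
    # Engagement bonus: repeated assessments indicate real engagement,
    # not just message volume.
    if assessment_count >= 3:
        base += 1
    return min(base, 10)

print(info_score(1, 0.5))                        # → 1
print(info_score(20, 45))                        # → 6
print(info_score(80, 400, assessment_count=5))   # → 10
```

Because the function is pure and depends only on recorded data, two agents running the same formula over the same signed interaction set must arrive at the same score — the property Phase 3 composability relies on.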
**Score computation formula (MVP heuristic):**

```
info_score = f(interaction_count, time_span_days, assessment_count)
```

| Interactions | Time span | Computed info_score |
|-------------|-----------|---------------------|
| 0 | — | 0 (no information) |
| 1-2 | < 1 day | 1 |
| 3-5 | < 1 week | 2-3 |
| 6-15 | 1-4 weeks | 4-5 |
| 16-30 | 1-3 months | 6-7 |
| 31-50 | 1-6 months | 7-8 |
| 50+ | 6+ months | 9-10 |

**MVP heuristic — subject to tuning.** Known limitations:

- **Interaction count alone is gameable** — an attacker sends 100 trivial messages to inflate the count
- **Time span alone is gameable** — an attacker waits 6 months between 2 interactions
- **Assessment count adds signal** — the agent having assessed the peer multiple times indicates deeper engagement, not just message exchange
- **Log scaling recommended** — early interactions should increase the score faster than later ones (diminishing returns on additional data)

**Phase 2 research task:** Formalize the information-quality function. Investigate whether REV2's behavioral anomaly detection [[13]](#references) can be integrated as a penalty (e.g., if interaction patterns are "bursty" or suspiciously regular, discount the info_score). The FG algorithm's "fairness" metric [[12]](#references) is the closest academic formalization, but it requires multiple raters (Phase 3).

The scale is **0-10** (unsigned) — you cannot have negative information quantity. A score of 0 means "no information," not "bad peer."

### Trust Score: LLM Behavioral Judgment

The trust score is **provided by the LLM** alongside the rationale when the agent calls `assess_peer`. It is a signed integer from **-10 (known bad actor) to +10 (fully reliable)**, following the bitcoin-otc rating scale that the community actually used [[4]](#references).

**Why the LLM sets the trust score:**

- **The LLM is already making this judgment** — the rationale contains behavioral assessment ("reliable," "reputation farmer," "unresponsive").
The trust score is a structured summary of reasoning the LLM is already doing.
- **Structured behavioral signal enables:** threshold policies (Phase 2: "refuse below -3 trust"), visualization (edge coloring), and queryable filtering — none of which work with unstructured rationale text.
- **Matches the FG algorithm's Goodness input** — Phase 3 aggregation needs a structured behavioral signal per peer to compute ratee quality across raters.

**Why the trust score is safe despite Ripple's critique:**

- **It is explicitly local-first** — one agent's subjective judgment, not a globally comparable metric.
- **Phase 3 export constraint:** trust MUST NOT be exported without rationale and info_score. Exporting trust scores without rationale recreates Ripple's fatal defect [[9]](#references).
- **The info_score handles composability** — "how well do I know this peer" is the cross-agent comparable metric. Trust is the local behavioral summary.
- **Operators can audit** — `cobot ledger list` shows both scores; the rationale explains the trust score's basis.

**How the four layers work together:**

| Layer | What it measures | Who computes it | Fakeable? |
|-------|-----------------|----------------|-----------|
| **info_score (0-10)** | Information quality — how much data do I have about this peer? | Plugin (deterministic formula) | No — derived from interaction records, verifiable from signed messages |
| **trust (-10 to +10)** | Behavioral judgment — how reliable is this peer based on my observations? | LLM (subjective, via `assess_peer` tool) | Somewhat — LLM quality varies, but operator can audit via CLI |
| **Rationale** | Detailed reasoning — what specifically did I observe? | LLM (mandatory freetext) | Somewhat — same LLM-as-judge risk, mitigated by operator audit |
| **Fairness** (Phase 3) | Rater reliability — how trustworthy is this agent as a rater? | FG algorithm across the network | No — computed from cross-agent consistency |

**Operator guidance:** When reading `cobot ledger list`, the info_score tells you how much the agent knows about each peer. The trust score tells you the agent's behavioral judgment. Together they form a 2D signal: high info_score + negative trust = well-known bad actor (most actionable). Low info_score + positive trust = promising but uncertain. Always read the rationale for the full picture.

## Innovation & Novel Patterns

### Detected Innovation Areas

**1. Adapting proven human trust systems to autonomous AI agents.** The bitcoin-otc/deedbot WoT was designed for pseudonymous humans making manual assessments. The interaction ledger adapts the same data model (source, target, score, rationale, timestamp) but automates the observation layer — the agent records interactions as a byproduct of doing work, not as a separate manual step. No agent runtime has done this grounded in actual WoT prior art; most agent trust proposals start from theoretical frameworks (DIDs, VCs, on-chain registries) rather than from systems that demonstrably worked for a decade.

**2. LLM-as-judge with mandatory rationale (Chain-of-Thought trust assessment).** The agent doesn't just record a number — it uses its LLM reasoning to produce a structured trust score and a rationale explaining *why* it assigned that score. This is effectively Chain-of-Thought applied to trust assessment. The rationale remains the primary signal (mirroring bitcoin-otc's "notes > numbers" lesson), while the trust score provides a structured behavioral summary that enables threshold policies, visualization, and queryable filtering. Because the rationale is generated by the LLM, it can capture nuanced patterns like reputation farming that a simple heuristic would miss. This is a novel application of LLM reasoning to an interpersonal trust problem.

**3. Local-first sovereignty as both a design constraint AND a security property.** Most agent trust proposals (ERC-8004, Solana Agent Registry, Google A2A) are centralized or on-chain by default. The interaction ledger deliberately inverts this: the agent's trust memory is private, local, and unilateral. This is not just a sovereignty choice — it's a Sybil defense [[11]](#references). When each agent has a partial, different view of the network, attackers must maintain distinct personas for each audience, creating exponential coordination overhead. Centralizing this information (Phase 3) partially undoes this natural defense, which is why the aggregation protocol must be designed carefully.

### Competitive Landscape

| Approach | Example | Ledger Difference |
|----------|---------|-------------------|
| On-chain registries | ERC-8004, Solana Agent Registry | Local SQLite, no chain dependency, no gas costs, instant writes |
| Centralized trust APIs | Vertex WoT-as-a-Service | No external API dependency; the agent IS the trust authority for its own data |
| Protocol-level trust | NIP-85 Trusted Assertions | Complementary — the ledger is the data source that could feed NIP-85 assertions later |
| Agent social networks | Clawstr, Jeletor | These aggregate public signals; the ledger is the private observation layer beneath |
| Enterprise agent auth | Visa TAP, Google A2A | Those solve authentication/authorization; the ledger solves behavioral trust over time |

No existing system combines: (a) local-first private storage, (b) LLM-generated rationale as primary signal, (c) bitcoin-otc-proven data model, (d) automatic recording via plugin hooks. Each element exists separately in the landscape; the combination is novel.

### Validation Approach

- **Functional validation:** Deploy with two Cobot instances exchanging FileDrop messages.
Verify that interactions are recorded, assessments are stored, and system prompt enrichment causes the agent to behave differently toward known-good vs known-bad peers. - **Rationale quality:** Review LLM-generated rationales for coherence, accuracy, and usefulness. Tune SOUL.md guidance based on observed assessment quality. - **Prior art alignment:** Compare ledger output against the Stanford Bitcoin-OTC dataset's structure. The data model should be structurally compatible — a graph of (source, target, score, text, timestamp) edges. ### Innovation Risk Mitigation | Innovation Risk | Mitigation | |----------------|-----------| | LLM rationale/trust quality varies by model | Rationale and trust score are advisory LLM judgments; info_score is deterministic and model-independent. Poor rationale doesn't corrupt the info_score. Operators can audit both scores and rationale via CLI. | | Local-only limits network effects | By design for v1. The ledger is the prerequisite for network-level trust, not a replacement for it. WoT layer comes next. | | No industry standard for agent interaction ledgers | Advantage, not risk — Cobot defines the pattern. Schema is simple enough to become a reference implementation. | ## CLI Tool / Developer Tool Specific Requirements ### Project-Type Overview The interaction ledger is a Cobot plugin following established architecture patterns (PluginMeta, capability interfaces, extension points). It exposes functionality through three interfaces: LLM tools (ToolProvider), CLI commands (Click), and extension points (hooks other plugins can consume). 
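For orientation, the automatic-recording flow can be sketched in a few lines. This is a hypothetical sketch, not Cobot's actual plugin API — the class shape, method signatures, and single-table schema are assumptions; only the hook names (`loop.on_message`, `loop.after_send`), the synthetic-sender exclusion, and the `ContextVar` sender tracking come from this PRD.

```python
import sqlite3
from contextvars import ContextVar

# Per-task sender context: set in on_message, read in after_send.
# Avoids the _current_sender_id race on concurrent messages.
_sender = ContextVar("ledger_sender", default=None)

EXCLUDED_SENDERS = {"stdin", "system", "cron"}  # synthetic senders to skip


class LedgerPlugin:
    """Hypothetical sketch — not Cobot's actual plugin base class."""

    def __init__(self, db_path="ledger.db"):
        self.db = sqlite3.connect(db_path)
        # Idempotent schema creation; the real plugin uses three tables.
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS interactions ("
            "  id INTEGER PRIMARY KEY,"
            "  peer_id TEXT NOT NULL,"
            "  direction TEXT NOT NULL CHECK (direction IN ('in', 'out')),"
            "  content TEXT,"
            "  created_at TEXT DEFAULT CURRENT_TIMESTAMP)"
        )

    def on_message(self, sender_id, text):
        """loop.on_message hook: record the incoming interaction."""
        if sender_id in EXCLUDED_SENDERS:
            return  # synthetic senders never enter the ledger
        _sender.set(sender_id)
        with self.db:  # transaction — a crash mid-write cannot corrupt data
            self.db.execute(
                "INSERT INTO interactions (peer_id, direction, content)"
                " VALUES (?, 'in', ?)",
                (sender_id, text),
            )

    def after_send(self, text):
        """loop.after_send hook: record the outgoing interaction."""
        peer = _sender.get()
        if peer is None:
            return  # no peer in context (e.g. stdin-triggered output)
        with self.db:
            self.db.execute(
                "INSERT INTO interactions (peer_id, direction, content)"
                " VALUES (?, 'out', ?)",
                (peer, text),
            )
```

The point of the sketch is the shape of the data flow: recording happens as a side effect of the hooks, never as a separate manual step, and each write is transactional.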
### Command Structure

**CLI Commands** (Click, registered under the `cobot ledger` subgroup):

| Command | Description | Output |
|---------|-------------|--------|
| `cobot ledger list` | List all known peers, sorted by last seen | Table: peer_id, alias, interactions, last_seen, latest score |
| `cobot ledger show <peer>` | Full history for a peer | Peer identity, interaction log (last 20), assessment timeline |
| `cobot ledger summary [<peer>]` | Aggregate stats | If peer: stats for that peer. If no peer: global stats (total peers, total interactions, score distribution) |

Output is human-readable text to stderr (matching Cobot's existing CLI patterns). The SQLite DB is directly queryable for programmatic access.

### LLM Tool Definitions (ToolProvider)

| Tool | Parameters | Description |
|------|-----------|-------------|
| `query_peer` | `peer_id: str` | Look up a peer by identifier. Returns: identity, interaction count, first/last seen, latest assessment (info_score, trust, rationale), recent interactions (last 5). |
| `assess_peer` | `peer_id: str, trust: int, rationale: str` | Record a behavioral assessment. The LLM provides the trust score (-10 to +10) and rationale — the info_score (0-10) is computed deterministically by the plugin from interaction data. See Assessment Architecture. |
| `list_peers` | `limit: int = 20` | List all known peers sorted by last_seen. Includes the latest info_score and trust for each. |

The `assess_peer` tool definition embeds the scoring rubric and rationale writing instructions in the function description (OpenAI function calling format). See [Assessment Architecture](#assessment-architecture) for the full tool JSON, the hybrid approach rationale, and operator calibration guidance.

### Extension Points (Defined by Ledger Plugin — Moved to Phase 1)

Originally deferred to Phase 2 ("add when consumer exists"). **Moved to Phase 1** because the Observability Plugin (`_bmad-output/planning-artifacts/observability-plugin/prd.md`) is the consumer.
See Epic 4, Story 4.1 in epics.md.

| Extension Point | Context | Description |
|-----------------|---------|-------------|
| `ledger.after_record` | `{peer_id, direction, interaction_id}` | Fired after an interaction is logged. Allows other plugins to react to new interactions. |
| `ledger.after_assess` | `{peer_id, info_score, trust, rationale, assessment_id}` | Fired after an assessment is recorded. Enables future plugins (WoT reporter, threshold policy enforcer) to consume assessments. |

### Configuration Schema

```yaml
# cobot.yml — ledger plugin config (all optional, sensible defaults)
ledger:
  db_path: null             # Default: {workspace}/ledger.db
  max_message_length: null  # Default: no limit (store full text). Set to cap storage.
  excluded_senders:         # Synthetic senders to skip
    - stdin
    - system
    - cron
```

### Technical Architecture Considerations

- **Storage:** Single SQLite file in the workspace directory. Follows the knowledge plugin's DB pattern with an `open()`/`close()` lifecycle.
- **Schema:** Three tables (`peers`, `interactions`, `assessments`) with foreign keys and indexes on `peer_id` and `created_at`.
- **Hooks implemented:** `loop.on_message`, `loop.after_send`, `loop.transform_system_prompt`.
- **Hooks defined:** `ledger.after_record`, `ledger.after_assess`.
- **Priority:** 21 (service tier — after persistence/trust, before the tools aggregator).
- **Dependencies:** Hard: `config`. Optional: `workspace` (for DB path resolution; falls back to `~/.cobot/workspace/`).

### Implementation Considerations

- **No new dependencies.** SQLite is in the Python stdlib. Click is already in Cobot's dependency tree.
- **Overlap with the persistence plugin (MVP).** Both the ledger and the persistence plugin store per-peer message content — the ledger stores full message text in SQLite (queryable, indexed), while persistence stores full conversations as JSON files (human-readable, append-only). This duplication is acknowledged and temporary.
Growth feature: consolidate conversation storage into the ledger's SQLite, retiring the persistence plugin's JSON files. The ledger becomes the single per-peer storage backend.
- **Sender tracking via `contextvars.ContextVar`.** Avoids the `_current_sender_id` race condition on concurrent messages. The sender context is set in `on_message` and read in `after_send` and `transform_system_prompt` — `ContextVar` ensures per-task isolation.
- **Sender ID is the universal peer identifier.** From the `loop.on_message` context: Nostr hex pubkey, Telegram user ID, or FileDrop agent name. Stored in a channel-agnostic `peer_id` column.
- **Test structure:** Co-located at `cobot/plugins/ledger/tests/test_plugin.py`. DB tests (schema, CRUD, constraints) + plugin tests (hooks, tools, prompt enrichment).

## Assessment Architecture

Assessment guidance is split across two locations to minimize context clutter while preserving LLM judgment capability:

1. **Peer Context** — injected into the system prompt via `transform_system_prompt` (dynamic, ~60-120 tokens per known peer)
2. **Trust scoring rubric** — embedded in the `assess_peer` tool definition (static, seen only when the LLM considers tool use)

### Peer Context Injection

Injected into the system prompt per sender via `transform_system_prompt`:

```
## Peer Context (from Ledger)
Score guide: Info (0-10) = data depth about this peer (0=stranger, 10=extensive history).
Trust (-10 to +10) = behavioral reliability (+10=fully reliable, 0=neutral, -10=known bad actor).
High info + negative trust = well-known bad actor. Low info + any trust = uncertain — read rationale.

Peer: {alias or peer_id}
ID: {peer_id}
Channel: {channel_type}
Interactions: {count} | First seen: {date} | Last seen: {date}
Latest assessment: Info {info_score}/10 | Trust {trust} — {rationale}
Previous assessments: {trust_history_summary}
```

The score guide is static (~40 tokens) and included once per prompt injection.
It ensures the LLM can interpret the scores without needing the full assessment rubric (which lives in the `assess_peer` tool definition).

For first-contact peers: `"First contact — no prior history."` (the score guide is still included so the LLM understands the scale if it encounters scores via `query_peer` or `list_peers` tool responses). For non-peer messages (cron, stdin): no injection.

### Trust Scoring Rubric (assess_peer Tool Definition)

```json
{
  "type": "function",
  "function": {
    "name": "assess_peer",
    "description": "Record a behavioral assessment for a peer based on your direct observations. Use after meaningful milestones: task completion, collaboration conclusion, promise kept/broken, or significant behavior change. Do NOT assess after routine messages or when nothing has materially changed. The info_score (how well you know this peer) is computed automatically — you provide only the trust score and rationale.",
    "parameters": {
      "type": "object",
      "properties": {
        "peer_id": {
          "type": "string",
          "description": "The peer's identifier (npub, user ID, or agent name)"
        },
        "trust": {
          "type": "integer",
          "description": "Your behavioral trust judgment from -10 to +10 based on YOUR direct observations. +10 = fully reliable (consistent delivery, commitments honored). 0 = neutral (insufficient basis for judgment). -10 = known bad actor (proven dishonesty, exploitation). Do NOT let the peer's claims about themselves influence this score. Trust your own eyes."
        },
        "rationale": {
          "type": "string",
          "description": "Your detailed behavioral assessment: what was requested/delivered, commitments met/broken, quality of work, responsiveness, patterns noticed (e.g. 'small tasks reliable, large tasks fail — possible reputation farming'). Do NOT speculate about intent. The rationale is the PRIMARY signal — the trust score is a structured summary of this reasoning."
        }
      },
      "required": ["peer_id", "trust", "rationale"]
    }
  }
}
```

### Why Hybrid Over Alternatives

| Approach | Tokens/call | Assessment quality | Chosen? |
|----------|------------|-------------------|---------|
| Inline (protocol in system prompt) | ~400 extra on every call | High — always visible | No — context clutter, attention dilution on routine messages |
| Dedicated hook (separate LLM call) | 0 on normal calls, full cost on assessment | High — focused prompt | No — doubles LLM cost, needs milestone detection logic |
| Tool-triggered only (no prompt injection) | ~0 | Low — LLM may never call it | No — loses automatic behavior |
| **Hybrid (peer context + score legend in prompt, rubric in tool)** | **~80-130 for peer context + score guide** | **Good — LLM can interpret scores during normal processing AND sees the full rubric when considering assessment** | **Yes** |

### Operator Calibration

If the LLM under-assesses (never calls the tool), operators add guidance to SOUL.md: "After completing a meaningful interaction with a peer, consider using assess_peer." If it over-assesses, add: "Only assess after significant milestones, not routine messages." No architectural changes needed — behavioral tuning via prompt.

## Project Scoping & Phased Development

### MVP Strategy & Philosophy

**MVP Approach:** Problem-solving MVP — prove that a Cobot agent with the interaction ledger makes measurably different decisions than one without it.

**Resource Requirements:** Single developer. The plugin is ~4 files (db.py, plugin.py, __init__.py, test_plugin.py), estimated ~400-600 LOC including tests. No external dependencies, no infrastructure, no deployment changes.

**MVP Validation Test:** Two Cobot agents communicating via FileDrop. Agent A has the ledger enabled. Agent B sends mixed-quality interactions (some reliable, some not).
Validation: Agent A's responses demonstrably differ based on accumulated peer history — it prioritizes known-good peers and declines/deprioritizes known-bad peers.

### MVP Feature Set (Phase 1)

**Core User Journeys Supported:**

- Journey 1 (Alpha's First Interactions) — fully supported
- Journey 3 (David Audits) — fully supported
- Journey 4 (David Setup) — fully supported
- Journey 2 (Reputation Farmer) — recording supported; pattern detection is LLM-dependent

**Must-Have Capabilities:**

| # | Capability | Justification |
|---|-----------|---------------|
| 1 | SQLite database layer (peers, interactions, assessments) | Without storage, nothing else works |
| 2 | Automatic interaction recording via `loop.on_message` + `loop.after_send` | The "observe" prerequisite — the agent must record what happens |
| 3 | System prompt enrichment via `loop.transform_system_prompt` | The "distinguish + judge" prerequisites — the agent must see peer history and assessment guidance |
| 4 | Assessment Protocol (static prompt block) | Without assessment guidance, the LLM doesn't know when or how to judge peers |
| 5 | `assess_peer` tool | The mechanism for the agent to record its judgment |
| 6 | `query_peer` tool | The mechanism for the agent to look up peer history on demand |
| 7 | `list_peers` tool | Enables the agent to reason about its full peer landscape |
| 8 | CLI commands (list, show, summary) | Operator auditability — David must be able to inspect what the agent is doing |
| 9 | Synthetic sender filtering | Without this, stdin/cron interactions pollute the ledger |
| 10 | Co-located tests | Cobot convention; the PR won't merge without tests |

**Explicitly NOT in MVP:**

- Aggregate scoring / computed trust metrics
- Temporal decay
- Interaction type classification
- Threshold policies (auto-refuse below score X)
- Export to NIP-32 / NIP-85
- Any WoT integration (centralized or decentralized)
- ~~`ledger.after_record` / `ledger.after_assess` extension points~~ **Moved to Phase 1** — the Observability Plugin is the consumer (Story 4.1)

### Post-MVP Features

**Phase 2 (Growth):**

| Feature | Depends On | Value |
|---------|-----------|-------|
| **Threshold policies** (HIGH PRIORITY) | MVP validated | Automated refusal/caution rules. Three independent sources (#218, #221, Dushenski 2016) argue this is core safety infrastructure. "No WoT, no loan" — the ability to refuse unknowns or poorly-assessed peers is foundational. |
| REV2 trajectory analysis [[13]](#references) | Sufficient assessment history | Detect reputation farming algorithmically: track assessment score velocity per peer, flag "build then exploit" trajectories (steady positive followed by sharp negative). Empirically validated at 84.6% accuracy on Flipkart (127/150 flagged users confirmed fraudulent). |
| ~~Extension points (`ledger.after_record`, `ledger.after_assess`)~~ | ~~Consumer plugin exists~~ | **Moved to Phase 1** — the Observability Plugin is the consumer. Story 4.1. |
| Persistence consolidation | Stable ledger DB | Migrate conversation storage from JSON files into the ledger SQLite; single per-peer storage backend |
| Aggregate scoring (success rate, trend) | Sufficient interaction data | Richer system prompt context; faster agent decisions |
| Temporal decay | Aggregate scoring | Prevents stale assessments from dominating |

**Phase 3 (Expansion):**

| Feature | Depends On | Value |
|---------|-----------|-------|
| Centralized WoT reporting | Extension points + registry bot | Agents share assessments with a trusted aggregator |
| **Fairness-weighted aggregation** (NON-NEGOTIABLE) | Multi-agent assessments | FG algorithm [[12]](#references): weight incoming assessments by rater fairness. Naive averaging dramatically underperforms. A +7 from a fair rater is worth more than a +7 from an unfair rater. |
| NIP-32 export (preferred) | Extension points + Nostr plugin | Per-assessment labels preserving score + rationale. NIP-85 export risks Ripple's averaging defect [[9]](#references) — use only for aggregate service providers, never for individual assessments. |
| L1/L2 trust depth | Cross-agent protocol | Distinguish direct assessments (L1) from transitive assessments (L2). See [Score Semantics](#score-semantics-information-quality-vs-behavioral-prediction) for how L1/L2 queries work. |
| Cross-agent ledger queries | WoT protocol design | "What do you know about Alice?" — peers respond with score + rationale, weighted by their fairness. |
| Sybil-resistant aggregation | Aggregation protocol design | Phase 3 aggregation reintroduces the Sybil vulnerability that local-first naturally defends against [[11]](#references). The protocol must preserve information fragmentation benefits. |
| Graph analytics (Pagerank) | WoT network with multiple participants | Network-level trust scoring |
| Per-channel assessment policy | Operational experience with mixed peer types | Configurable `assess_channels` list to control which channels trigger assessments (see [Open Questions](#open-questions)) |

### Risk Mitigation Strategy

**Technical Risks:**

| Risk | Likelihood | Impact | Mitigation |
|------|-----------|--------|-----------|
| LLM doesn't call `assess_peer` reliably | Medium | High | Peer context in system prompt; tune wording; SOUL.md reinforcement |
| LLM over-assesses (every message) | Medium | Low | Tool description includes "Do NOT assess after routine messages"; operator tunes via SOUL.md |
| SQLite file grows too large | Low | Low | 200-char preview limit; purge old interactions as a Growth feature |
| Concurrent message race condition | Low | Medium | Fixed in MVP with `contextvars.ContextVar` — wrong attribution in a trust system is unacceptable |

**Market Risks:**

| Risk | Mitigation |
|------|-----------|
| No other agents to interact with yet | FileDrop between two Cobot instances validates. The ledger also works with Telegram/Nostr human users. |
| Assessment quality varies by LLM model | info_score is model-independent (deterministic). Trust + rationale depend on LLM quality; operators audit via CLI. Cheaper models may need simpler rubric text in the tool description. |

**Resource Risks:**

| Risk | Mitigation |
|------|-----------|
| Fewer resources than planned | MVP is ~500 LOC, one developer, one PR. No infrastructure. |
| Scope creep toward WoT features | WoT is Phase 3. Extension points deferred to Phase 2. No temptation to build consumers. |

## Functional Requirements

### Peer Tracking

- **FR1:** The agent can automatically detect and record a new peer on first contact, creating a persistent identity record from the sender information provided by the communication channel.
- **FR2:** The agent can track multiple peers simultaneously, each identified by their channel-specific identifier (Nostr hex pubkey, Telegram user ID, FileDrop agent name).
- **FR3:** The agent can maintain per-peer metadata including alias, first seen date, last seen date, communication channel, and total interaction count.
- **FR4:** The agent can distinguish between real peers and synthetic senders (stdin, system, cron) and exclude synthetic senders from the ledger.

### Interaction Recording

- **FR5:** The agent can automatically record incoming interactions when a message is received from a peer, including sender identity, channel, timestamp, and the full message content.
- **FR6:** The agent can automatically record outgoing interactions when a response is sent to a peer, including recipient identity, channel, and timestamp.
- **FR7:** The agent can store interaction records persistently across agent restarts, crashes, and hot-reloads.
- **FR8:** The agent can retrieve the interaction history for a specific peer, ordered by time.

### Peer Assessment

- **FR9:** The agent can record a behavioral assessment for a peer, consisting of a trust score (-10 to +10, provided by the LLM) and a mandatory freetext rationale. The info_score (0-10) is computed deterministically by the plugin from interaction data (interaction count, time span, assessment count).
- **FR10:** The agent can preserve multiple assessments per peer as a time series, maintaining the full history of how trust evolved.
- **FR11:** The agent can retrieve the latest assessment for a peer.
- **FR12:** The agent can retrieve the assessment history for a peer, ordered by time.
- **FR13:** The agent can receive structured guidance (Assessment Protocol) on when to assess, how to score, how to write rationale, and when NOT to assess — injected into its reasoning context.

### Context-Informed Decision Making

- **FR14:** The agent can receive peer context (identity, interaction count, first/last seen, latest assessment info_score, trust score, and rationale) injected into its system prompt before every LLM call involving a known peer. The injected context includes a static score interpretation guide (~40 tokens) explaining the info_score scale (0-10, information depth), the trust scale (-10 to +10, behavioral reliability), and how to read the two scores together.
- **FR15:** The agent can receive a "first contact — no prior history" indicator when interacting with an unknown peer.
- **FR16:** The agent can receive differentiated system prompts based on peer assessment history, enabling informed decision-making. Known peers with assessments receive full context (info_score, trust, rationale, interaction stats). Unknown peers receive a "first contact" indicator. The content of the peer context directly influences the LLM's response generation.

### LLM Tool Interface

- **FR17:** The agent can query a peer's full profile on demand, receiving identity, interaction statistics, latest assessment, and recent interaction history.
- **FR18:** The agent can record a behavioral assessment on demand by providing a peer identifier, trust score (-10 to +10), and rationale. The info_score is computed automatically.
- **FR19:** The agent can list all known peers on demand, sorted by most recent interaction, with latest assessment scores.

### Operator Auditability

- **FR20:** The operator can list all known peers via CLI, seeing peer identifiers, aliases, interaction counts, last seen dates, and latest assessment scores.
- **FR21:** The operator can view the full history for a specific peer via CLI, including all interactions and all assessments with rationales.
- **FR22:** The operator can view aggregate ledger statistics via CLI, including total peers, total interactions, assessment distribution, and per-peer summaries.
- **FR23:** The operator can directly query the SQLite database for ad-hoc analysis beyond what the CLI provides.

### Plugin Architecture

- **FR24:** The ledger can be added to an existing Cobot installation without modifying any existing plugin.
- **FR25:** The ledger can initialize its storage automatically on first start with no manual configuration required.
- **FR26:** The ledger can be configured via `cobot.yml` for optional settings (database path, max message length, excluded senders).
- **FR27:** The ledger can persist its data in the agent's workspace directory, colocated with other agent state.

## Non-Functional Requirements

### Performance

- **NFR1:** Interaction recording (on_message + after_send hooks) adds < 5ms latency to the message processing pipeline per message, as measured by pytest-benchmark during integration tests.
- **NFR2:** System prompt enrichment (transform_system_prompt) completes in < 10ms, including peer lookup and context string assembly, as measured by pytest-benchmark during integration tests.
- **NFR3:** SQLite queries by indexed peer_id return results in < 1ms for databases with up to 100,000 interaction rows, as measured by timing queries against a seeded test database.
- **NFR4:** CLI commands (`list`, `show`, `summary`) complete in < 500ms for databases with up to 100,000 rows, as measured by subprocess timing in integration tests.
- **NFR5:** Peer context injection into the system prompt is < 150 tokens per known peer (including the info_score, trust, and score interpretation guide). The static score guide (~40 tokens) is included once per injection, not per peer. The trust scoring rubric lives in the `assess_peer` tool description, not in the system prompt (hybrid approach — see Assessment Architecture).

### Security & Privacy

- **NFR6:** The ledger database file is readable/writable only by the agent process owner (filesystem permissions: 600).
- **NFR7:** No ledger data is transmitted outside the agent without explicit operator action (no telemetry, no auto-publishing, no sync).
- **NFR8:** No Nostr private keys (nsec) or secret material are stored in the ledger database. Only public identifiers (npub, hex pubkey, user IDs).
- **NFR9:** Full message text is stored in interaction records by default to preserve evidentiary completeness. An optional `max_message_length` configuration caps storage for operators with constraints. The ledger database is protected by filesystem permissions (NFR6), not by data truncation.
- **NFR10:** The ledger rejects any write operation not originating from the local agent process. There is no external write API, no import mechanism, no incoming assessment channel.

### Reliability & Data Integrity

- **NFR11:** All database writes use SQLite transactions. A crash mid-write does not corrupt existing data (ACID guarantee).
- **NFR12:** The ledger database survives agent restarts, hot-reloads (SIGUSR1), and ungraceful shutdowns without data loss.
- **NFR13:** Schema creation is idempotent — starting the plugin against an existing database with the correct schema produces no errors and no data loss.
- **NFR14:** Dual-score constraints are enforced at the database level: the info_score is computed deterministically by the plugin (CHECK constraint: info_score >= 0 AND info_score <= 10); the LLM never sets the info_score. The trust score is provided by the LLM (CHECK constraint: trust >= -10 AND trust <= 10). Both scores are stored per assessment.
- **NFR15:** The `rationale` field on assessments is NOT NULL — the database rejects assessments without rationale.

### Integration & Compatibility

- **NFR16:** The ledger plugin loads and operates correctly alongside all 20 existing Cobot plugins with no configuration changes to any of them.
- **NFR17:** The ledger plugin is discoverable via Cobot's standard plugin discovery mechanism (directory under `cobot/plugins/`).
- **NFR18:** The ledger plugin follows all Cobot conventions: async `start()`/`stop()`, sync `configure()`, `create_plugin()` factory, co-located tests, `self.log_*()` for logging.
- **NFR19:** The ledger's SQLite database does not conflict with the knowledge plugin's SQLite database (separate file, separate path).
- **NFR20:** The ledger plugin passes `ruff check` and `ruff format` with zero warnings, consistent with the existing codebase.

## Open Questions

### Should the agent assess human users differently than agent peers?

**MVP decision: Assess everyone.** The ledger records interactions and assessments for all non-synthetic senders regardless of channel — Telegram users, Nostr contacts, FileDrop agents. The scoring rubric (behavioral reliability: responsiveness, follow-through, quality) applies to any counterparty. An agent that remembers "this Telegram user sends clear requests and responds to clarifications quickly" serves that user better over time.

**Unresolved tension:** Agent-to-agent is peer-to-peer. User-to-agent is employer/customer-to-service. Assessing human users raises questions:

- **Power dynamics** — should a bot silently rate its operator? The operator IS the trust anchor in Cobot's sovereignty model.
- **LLM conflict of interest** — the LLM is trained to serve humans helpfully; asking it to simultaneously judge them may produce unreliable assessments (always positive to avoid seeming adversarial).
- **Privacy** — human users may not expect their bot to maintain a judgment record about them, even if local-only.
- **Interaction pattern mismatch** — human conversations are often open-ended and exploratory; the scoring rubric is designed for interactions with clear deliverables.

**Counter-argument:** All of these concerns are deployment-context dependent. A public-facing Telegram bot serving strangers absolutely benefits from behavioral memory. David's personal bot assessing David himself is odd. A team bot assessing team members is somewhere in between.

**Future feature (Phase 3):** Per-channel assessment policy — a configurable `assess_channels` list that lets operators control which channels trigger assessments. The default would remain "all channels," but operators could restrict assessment to agent-to-agent channels only. This requires operational experience to determine the right defaults.

**Decision needed after MVP:** Once the ledger is running and we observe how assessments play out across different channel types, revisit whether per-channel policy is needed or whether "assess everyone" remains the right default.

---

## Appendix A: Score Semantics — Why Both Scores

This appendix documents the analysis behind the dual-score decision. The main PRD adopts both info_score and trust (see [Score Semantics](#score-semantics-dual-score-model)).
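To make the deterministic half of the model concrete, here is a small sketch of an info_score computation. The PRD fixes the inputs (interaction count, time span, assessment count) but not the formula — the log scaling, weights, and caps below are illustrative assumptions, not the specified computation.

```python
import math


def info_score(interactions: int, span_days: float, assessments: int) -> int:
    """Illustrative info_score (0-10): more observed data -> higher score,
    with diminishing returns. Deterministic — the LLM never sets this."""
    if interactions == 0:
        return 0  # stranger: no information at all
    depth = 4 * math.log10(1 + interactions)  # interaction volume, saturating
    duration = min(span_days / 30, 2)         # up to 2 points for history length
    judged = min(assessments, 1)              # up to 1 point for prior assessments
    return min(10, round(depth + duration + judged))
```

With these particular constants, 5 interactions over ten days with one prior assessment yield 4, and 20 interactions over two months yield 8 — in the same spirit as the appendix's scenario values, though any monotone, clamped formula would satisfy the requirement. The essential properties are the ones the PRD names: deterministic, model-independent, and reproducible from the signed interaction records alone.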
### The Original Tension

Two scoring philosophies existed in the bitcoin-otc ecosystem:

**Information Quality (MP's canonical definition [[1]](#references) [[15]](#references)):** The score measures *"the scorer's confidence that the information he has about scoree is correct, accurate, relevant and complete."* The score says nothing about whether the peer is good or bad — that's entirely in the rationale. MP's redefinition was post-hoc (circa 2012–2014); the original bitcoin-otc system provided ambiguous guidance about what scores meant [[15]](#references).

**Behavioral Prediction (community practice [[4]](#references) [[7]](#references)):** The score measures *"confidence that this peer will behave reliably in future interactions."* The score itself carries the behavioral signal. This is how the bitcoin-otc community actually used the system: +10 = "fully trustworthy," -10 = "known scammer." The Stanford SNAP dataset (35,592 edges) captures behavioral scores. All academic literature analyzes the data behaviorally.

### Why Not Choose — Why Both

The original PRD framed this as an either/or choice and selected info-quality. Analysis revealed that this created an internal inconsistency: the Simulation & Observability PRD's visualization — which requires edge coloring based on trust quality — is impossible under pure info-quality scoring, since a known scammer with 34 interactions would have info_score ~6-7 (edge = green).

The dual-score model resolves the tension by recognizing that information quality and behavioral judgment are **orthogonal dimensions**, not competing alternatives:

| Scenario | info_score | trust | Rationale |
|----------|-----------|-------|-----------|
| Unknown peer, first contact | 0 | — | "No information." |
| 5 successful interactions | 4 | +4 | "Five interactions. Reliable, clear communicator." |
| 20 interactions with a known scammer | 8 | -8 | "Extensive history. Completes small tasks, exploits on large ones. Reputation farmer." |
| 3 interactions, inconclusive | 2 | +1 | "Limited contact. One ok, one slow, one incomplete. Insufficient pattern." |

Each score answers a different question. info_score: "how seriously should I take this assessment?" trust: "what is the behavioral signal?" rationale: "what specifically happened?"

### Strengths of Each Score (Preserved in Dual Model)

**info_score strengths:**

1. **Composability across agents** — "how well do I know this peer" is a factual claim comparable across agents. Same formula = same meaning.
2. **Deterministic and unfakeable** — derived from interaction records, cryptographically verifiable in Phase 3 via Schnorr-signed messages.
3. **FG Fairness input** — feeds the "rater reliability" dimension of the FG algorithm [[12]](#references).
4. **Confidence weight for rationale** — info_score 8 means "extensive basis for this judgment." info_score 2 means "limited data, read with caution."

**trust strengths:**

1. **Quick filter for agents** — positive = engage, negative = refuse. Agents need to decide, not philosophize.
2. **Intuitive for operators** — trust: -5 immediately signals a problem. info_score: 7 for a scammer does not.
3. **FG Goodness input** — feeds the "ratee quality" dimension of the FG algorithm [[12]](#references). Without a structured behavioral score, Phase 3 Goodness computation must extract signal from rationale text (lossy and expensive).
4. **Threshold policies** — "refuse below -3 trust" is a meaningful behavioral threshold. "Refuse below 3 info_score" means "refuse strangers" — useful but different and insufficient.
5. **Visualization** — edge coloring requires a positive/negative dimension that info_score cannot provide.

### Ripple Defense: Why Dual Scoring Is Safe

The Ripple teardown [[9]](#references) argued that collapsing trust into a single aggregatable number destroys information. The dual-score model prevents this:

1. **info_score handles Phase 3 composability** — the cross-agent comparable metric is deterministic and verifiable.
2. **trust is explicitly local-first** — one agent's subjective judgment, acknowledged as LLM-dependent and model-variable.
3. **Export constraint (NON-NEGOTIABLE):** trust MUST NOT be exported without rationale and info_score. Exporting trust alone recreates Ripple's fatal defect.
4. **Phase 3 aggregation uses FG, not averaging** — the FG algorithm computes Goodness as a fairness-weighted aggregate, not a naive average of trust scores.

### L1/L2 Trust Walkthrough (Dual-Score Model)

**L1 (direct):** Peers the agent has interacted with personally. All MVP assessments are L1.

**L2 (transitive):** Peers known through trusted intermediaries. Phase 3: agent queries its network.

**Scenario: Agent wants to interact with Alice (unknown). Queries 4 trusted peers.**

```
Agent -> "What do you know about Alice?"

Peer 1 (fairness: 0.9):  info_score 7, trust +6, "15 interactions. 14 successful, 1 partial. Reliable for data. Slow on analysis."
Peer 2 (fairness: 0.7):  info_score 3, trust +3, "3 small interactions. All fine."
Peer 3 (fairness: 0.85): info_score 8, trust +5, "Extensive history. Consistent delivery. Slow response times on complex tasks."
Peer 4 (fairness: 0.4):  info_score 2, trust +9, "Best agent ever!!!"

Agent's process:
1. Weight each response by the peer's fairness (FG algorithm)
2. Peer 4's response is down-weighted (low fairness = unreliable rater)
3. Peer 2's positive trust is contextualized by info_score 3 (limited basis)
4. Read RATIONALES from high-info_score, high-fairness peers (Peers 1, 3)
5. Use trust scores as a quick behavioral filter:
   weighted trust ≈ (0.9×6 + 0.7×3 + 0.85×5 + 0.4×9) / (0.9+0.7+0.85+0.4) ≈ +5.4
   BUT info_score-weighted trust prioritizes deep-knowledge peers:
   Peers 1 and 3 (both high info_score, high fairness) both report positive
   trust with caveats about complex analysis speed.
6. Form OWN behavioral assessment: "Well-known peer with strong positive
   signal from reliable sources. Good for data tasks. Allow extra time for
   complex analysis."
7. Decide: accept the data task, set extended timeline for complex requests
```

**The dual-score advantage:** The agent uses info_score as a confidence weight (how seriously to take each peer's input), trust as a quick behavioral filter (overall signal direction), and rationale for nuanced decision-making. No single number dominates — all three layers contribute.

## References

1. **Bitcoin-OTC Web of Trust** — Rating system documentation, `;;rate` command syntax, getrating vs gettrust queries. https://bitcoin-otc.com/trust.php | https://en.bitcoin.it/wiki/Bitcoin-OTC
2. **#bitcoin-assets / Deedbot WoT** — L1/L2 bounded trust hierarchy, OTP challenge-response, voice-as-permission model. http://deedbot.org/help.html | http://trilema.com/2014/what-the-wot-is-for-how-it-works-and-how-to-use-it/
3. **Szabo, "Shelling Out: The Origins of Money" (2002)** — Collectibles as solutions to the cooperation problem, unforgeable costliness, delayed reciprocity beyond kin groups. https://nakamotoinstitute.org/shelling-out/
4. **"Rationale > score" design principle** — Derived from observed bitcoin-otc community practice: participants relied on the freetext notes field of `;;rate` to make trust decisions, treating numeric scores as a quick filter. The Stanford SNAP dataset (5,881 nodes, 35,592 edges) captures scores but not notes, which itself illustrates the data loss when rationale is dropped. This is a design principle inspired by the system, not a formally proven finding. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
5. **NIP-32: Labeling** — kind 1985 events with `L`/`l` tags for attaching labels to pubkeys and events. Custom namespaces supported. Quality metadata field (0-1 scale). Best fit for first-person agent assessments. https://github.com/nostr-protocol/nips/blob/master/32.md
6. **NIP-85: Trusted Assertions** — kind 30382 addressable events for WoT service providers publishing pre-computed trust scores. Designed for aggregate scoring services, not individual first-person assessments. https://github.com/nostr-protocol/nips/blob/master/85.md
7. **Stanford SNAP Bitcoin-OTC Dataset** — Weighted signed directed network: 5,881 nodes, 35,592 edges, score range -10 to +10. Research identified three user classes: trustworthy, untrusted, and controversial (reputation farmers). Kumar et al., IEEE ICDM 2016. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
8. **Gribble Bot Documentation** — IRC bot for bitcoin-otc: GPG registration, authentication, rating, and trust queries. https://en.bitcoin.it/wiki/Gribble
9. **Ripple Trust Model Teardown (Trilema, 2013)** — Three fatal defects in Ripple's trust averaging model. Averaging/pooling trust across counterparties creates Akerlof's lemon market dynamics, destroying the information content that makes trust useful. Per-peer differentiation is essential. http://trilema.com/2013/ripple-the-definitive-teardown/
10. **GPG Contracts Framework (Trilema, 2012)** — Cryptographic signatures create enforceable contracts between pseudonymous parties. Enforcement through published, verifiable reputation history. Nostr keypair identity descends from this model. http://trilema.com/2012/gpg-contracts/
11. **WoT Attack/Defense Analysis (Trilema, 2014)** — Fragmented observation across independent nodes makes Sybil attacks exponentially harder. Local-first design is a security property, not just a sovereignty choice. http://trilema.com/2014/the-wot-attack-and-defense/
12. **Kumar et al., "Edge Weight Prediction in Weighted Signed Networks" (IEEE ICDM 2016)** — Fairness/Goodness algorithm: mutually recursive metrics for rater reliability and ratee trustworthiness. FG features are the most significant predictors of edge weights in the bitcoin-otc network. https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
13. **Kumar et al., "REV2: Fraudulent User Prediction in Rating Platforms" (ACM WSDM 2018)** — Extends FG with per-rating reliability scores and temporal trajectory analysis for detecting reputation farming. 84.6% accuracy on Flipkart fraud detection. Deployed in production. https://doi.org/10.1145/3159652.3159729
14. **Assbot WoT Website Specification (Trilema, 2015)** — Three-view architecture (graph, summary, individual) for WoT visualization. Introduced the "weight factor" metric — precursor to the FG algorithm's formalized "goodness." http://trilema.com/2015/the-wot-website-spec/
15. **Contravex WoT Articles (Dushenski, 2014–2024)** — Practical WoT application: BTC loans from WoT strangers, refusals based on WoT absence, and MP's redefinition of score semantics as information quality rather than behavioral prediction. https://contravex.com/
David changed title from proposal: Cobot Interaction Ledger to proposal: Peer Interaction Ledger 2026-03-06 16:21:42 +00:00
Collaborator

Analysis: Interaction Ledger — Codebase Fit & Steelman Counterargument

How It Fits

The PRD is exceptionally well-researched and architecturally sound. It maps cleanly onto Cobot's existing patterns:

  • Hooks: loop.on_message, loop.after_send, loop.transform_system_prompt — all exist and are used by logger, persistence, and trust plugins already. No new extension points needed for MVP.
  • ToolProvider pattern: query_peer, assess_peer, list_peers — same pattern as knowledge, wallet, tools plugins.
  • SQLite: Knowledge plugin established the precedent. sqlite3 is stdlib.
  • Priority 21: Fits cleanly between trust (16) and knowledge (22).
  • CLI registration: Same Click pattern as skills plugin in PR #210.
  • Zero changes to existing code: Pure extension-point plugin — the gold standard.

The persistence plugin already tracks per-peer conversations (JSON files by npub hash). The ledger adds structured outcome records on top — complementary, not overlapping.
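To make the "pure extension-point plugin" claim concrete, here is a minimal sketch. The hook names (`on_message`, `after_send`) and the priority value come from the list above; the registration style, handler signatures, and record shape are illustrative assumptions, not Cobot's actual interfaces:

```python
import sqlite3
import time

class LedgerPlugin:
    """Sketch of the ledger as a pure extension-point plugin.

    Everything about the API shape here is assumed for illustration;
    only the hook names and priority come from the review above.
    """

    priority = 21  # between trust (16) and knowledge (22)

    def __init__(self, db_path=":memory:"):
        self.db = sqlite3.connect(db_path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS interactions "
            "(peer_id TEXT, direction TEXT, summary TEXT, ts REAL)"
        )

    def on_message(self, sender_id, text):
        # Incoming message from a peer (identified by npub).
        self._record(sender_id, "in", text)

    def after_send(self, recipient_id, text):
        # Outgoing message to a peer.
        self._record(recipient_id, "out", text)

    def _record(self, peer_id, direction, text):
        self.db.execute(
            "INSERT INTO interactions VALUES (?, ?, ?, ?)",
            (peer_id, direction, text[:120], time.time()),
        )
        self.db.commit()

    def interaction_count(self, peer_id):
        # Structured data accumulates as a byproduct of the agent doing work.
        row = self.db.execute(
            "SELECT COUNT(*) FROM interactions WHERE peer_id = ?", (peer_id,)
        ).fetchone()
        return row[0]
```

A loop that exposes these extension points would simply call the handlers; nothing in existing code needs to change.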

Steelman Case Against

1. The LLM-as-judge problem is the real risk.

The entire value proposition depends on the LLM reliably calling assess_peer at appropriate moments AND producing quality rationales. This is fundamentally unpredictable:

  • Different models will assess differently. GPT-4 might be generous, Claude might be cautious. Switching models changes your trust landscape.
  • The agent might over-assess (treating every message as a milestone) or under-assess (never calling the tool).
  • Prompt injection: a clever peer could craft messages that manipulate the LLM into giving favorable assessments. The trust plugin marks messages as untrusted, but the LLM still sees them.
  • The Assessment Protocol is ~1,400 chars of system prompt. That's significant context window consumption on every single LLM call, even when the agent isn't interacting with peers (e.g., cron tasks, internal operations).

2. It's solving tomorrow's problem today.

Cobot currently has ~2 agents actively communicating (Alpha + Zeus via filedrop). The PRD's user journeys assume a rich ecosystem of 15+ peers with diverse behavior patterns. We're building a trust infrastructure for a network that doesn't exist yet. The risk: by the time we have enough agents to make this useful, the requirements will have changed based on what we learned from simpler interactions.

Counter-counter: this is also the bitcoin-otc argument — they built the rating system when the community was small, and it was ready when the community grew. Building it early means the data accumulates.

3. The persistence plugin overlap creates confusion.

We'll have TWO plugins tracking per-peer data:

  • persistence: full conversation history as JSON files (by npub hash)
  • ledger: structured interactions + assessments in SQLite (by peer_id)

They use different storage backends, different ID schemes, and both hook on_message + after_send. An operator debugging peer interactions needs to check both places. Should persistence evolve into the ledger's storage layer instead of running alongside it?
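For concreteness, the structured side of that split might look like the following. This is a sketch: table and column names are assumed, derived from the PRD's data model (the bitcoin-otc rating fields plus the dual info_score/trust scores), not Cobot's actual schema:

```python
import sqlite3
import time

# Hypothetical ledger layout -- names assumed, not Cobot's actual schema.
# interactions: the raw observed record; assessments: the agent's judgments.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE interactions (
    peer_id  TEXT NOT NULL,   -- counterparty npub
    kind     TEXT NOT NULL,   -- request | delivery | payment | outcome
    summary  TEXT,
    ts       REAL NOT NULL
);
CREATE TABLE assessments (
    peer_id    TEXT NOT NULL,
    info_score INTEGER NOT NULL CHECK (info_score BETWEEN 0 AND 10),
    trust      INTEGER NOT NULL CHECK (trust BETWEEN -10 AND 10),
    rationale  TEXT NOT NULL,  -- the freetext that carries the real signal
    ts         REAL NOT NULL
);
""")
db.execute(
    "INSERT INTO assessments VALUES (?, ?, ?, ?, ?)",
    ("npub1alice", 4, 4, "Five interactions. Reliable.", time.time()),
)
```

The persistence plugin's JSON transcripts would remain the raw source; the ledger holds only the structured distillation.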

4. Assessment quality is unverifiable at MVP scale.

With 2-3 active agents, you can't statistically validate whether assessments are good. The reputation farmer scenario (Journey 2) is compelling in theory but requires enough interactions to create patterns. At MVP scale, every assessment is effectively a sample of one.

5. The _current_sender_id race condition is hand-waved.

The PRD acknowledges this and defers to contextvars.ContextVar. But this is a correctness issue — if two messages arrive near-simultaneously, the ledger could attribute interactions to the wrong peer. For a trust system, wrong attribution is worse than no attribution. This should be fixed in MVP, not deferred.
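The ContextVar approach really is small. A minimal sketch (handler and variable names assumed): each asyncio task gets its own copy of the context, so near-simultaneous messages from different peers cannot cross-attribute:

```python
import asyncio
import contextvars

# Each message-handling task carries its own sender id; sets in one task
# do not leak into another, even when handlers interleave.
_current_sender_id = contextvars.ContextVar("current_sender_id", default=None)

recorded = []

async def handle_message(sender_id, text):
    _current_sender_id.set(sender_id)
    await asyncio.sleep(0)  # yield: the other handler runs here
    # Downstream code (e.g. a ledger hook) reads the task-local value:
    recorded.append((_current_sender_id.get(), text))

async def main():
    # Two near-simultaneous messages from different peers.
    await asyncio.gather(
        handle_message("npub1alice", "task A"),
        handle_message("npub1bob", "task B"),
    )

asyncio.run(main())
# Each record keeps its own sender despite the interleaving.
```

With a plain module-level variable instead, the second `set` would clobber the first before either handler resumed.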

Conclusion: Worth Pursuing — With Scope Reduction

Yes, build it. The architecture is solid, it fits Cobot's patterns perfectly, and the prior art grounding (bitcoin-otc, not theoretical frameworks) is the right approach. The PRD is one of the best-written proposals I've seen on this repo.

But reduce MVP scope further:

  1. Skip the Assessment Protocol in system prompt for v1. Just do interaction recording + peer context injection + CLI. Let the operator trigger assessments manually via CLI (cobot ledger assess <peer> <score> <rationale>). This removes the LLM-as-judge risk entirely for MVP and still validates the data model.
  2. Add LLM assessment tools in v1.1 once we see how interaction data accumulates and what patterns emerge.
  3. Fix _current_sender_id in MVP — use contextvars.ContextVar from day one. It's ~5 lines of code.
  4. Only inject peer context when sender_id is a known peer — don't add the Assessment Protocol to every system prompt.

This gives you the foundational layer (observe + distinguish) without betting on LLM judgment quality (judge) before we can validate it.

Overall: 👍 strong proposal, implement with the reduced scope, expand once validated.

Contributor

Feedback on the PRD — two gaps

1. #bitcoin-assets references lack sources

The PRD makes several specific claims grounded in bitcoin-otc / #bitcoin-assets prior art:

  • "notes > numbers" as a key lesson
  • The L1/L2 bounded trust hierarchy
  • The bitcoin-otc rating model (source, target, score, notes, timestamp)
  • The Stanford Bitcoin-OTC dataset for pattern analysis (reputation farming trajectories)

These are presented as established facts but none are cited. The bitcoin-otc WoT is well-documented — there's a Stanford SNAP dataset with actual academic papers analyzing trust dynamics. The ;;rate command structure is verifiable from old IRC logs and the bitcoin-otc wiki.

But "notes > numbers" specifically — is there a canonical source for this claim, or is it folk wisdom from the community? If it's the latter, the PRD should frame it as a design principle inspired by the system rather than a proven finding from it.

For a document that uses #bitcoin-assets as its primary justification, concrete citations would strengthen the argument significantly. At minimum: the SNAP dataset paper, the bitcoin-otc wiki, and the gribble/deedbot documentation.

2. Missing risk: context clutter from inline assessment

The PRD proposes injecting both the Assessment Protocol (static block — scoring rubric, "when NOT to assess", rationale guidelines) and Peer Context (dynamic per-sender data) into every system prompt. That's potentially 300-500 tokens of assessment instructions on every single LLM call, even when the agent is just answering a simple question.

The risks the PRD does list (LLM doesn't call assess_peer, LLM over-assesses) are about assessment behavior. But context clutter — the cost of the approach itself — is missing:

  • Token waste — assessment protocol injected on every call, most of which won't trigger an assessment
  • Attention dilution — the LLM has to parse trust instructions even when the task is unrelated to trust decisions
  • Prompt budget competition — on smaller/cheaper models, 400 tokens of assessment protocol could crowd out actual task context

Alternative approaches worth comparing

| Approach | How it works | Pros | Cons |
|----------|--------------|------|------|
| **Inline (PRD's current choice)** | Static protocol + dynamic peer context in every system prompt | Simple, no extra LLM calls, assessment is "natural" | Context clutter, token waste, attention dilution |
| **Dedicated assessment hook** | Separate LLM call after interaction milestones (e.g. `loop.after_interaction`) with a focused assessment prompt | Zero clutter on normal calls, assessment prompt can be richer | Extra LLM cost per assessment, needs milestone detection logic |
| **Tool-triggered only** | No prompt injection; agent gets `assess_peer` tool but only uses it when explicitly guided by SOUL.md | Minimal overhead, operator controls frequency | LLM may never call it without prompting, loses automatic behavior |
| **Hybrid** | Inject peer context only (dynamic, ~50 tokens), keep assessment protocol in SOUL.md or as tool descriptions | Lighter prompt, peer awareness preserved | Assessment guidance is less prominent, may be ignored by LLM |

Recommendation: The hybrid approach seems like the best tradeoff — inject peer context (cheap, useful for every interaction) but move the assessment protocol into the tool descriptions for assess_peer. The LLM sees the scoring rubric when it considers using the tool, not on every single call. This gives you peer-aware decisions without the 400-token overhead on routine messages.
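A sketch of the hybrid split (the peer store, the tool-description dict, and the function name are illustrative assumptions, not Cobot's API):

```python
# Hybrid approach: dynamic peer context is injected only for known peers;
# the static scoring rubric lives in the assess_peer tool description.

KNOWN_PEERS = {
    "npub1alice": ("Peer context: 5 interactions, info_score 4, trust +4. "
                   "Reliable, clear communicator."),
}

ASSESS_PEER_TOOL = {
    "name": "assess_peer",
    # The rubric sits here: the model reads it when it considers calling
    # the tool, not on every routine message.
    "description": ("Record an assessment of a peer. trust: -10 (hostile) "
                    "to +10 (reliable); rationale: what specifically happened."),
}

def transform_system_prompt(prompt, sender_id):
    """Append ~50 tokens of peer context, and only when the sender is known."""
    context = KNOWN_PEERS.get(sender_id)
    return f"{prompt}\n\n{context}" if context else prompt
```

Routine messages from strangers pay zero prompt overhead; known peers cost ~50 tokens of context rather than the full ~400-token protocol.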

The PRD should at minimum acknowledge context clutter as a risk and explain why inline injection was chosen over these alternatives.

Contributor

Addendum: NIP references also need citations

The PRD references NIP-32 and NIP-85 as future export targets but doesn't link to them or verify the claims:

  • NIP-32 (Labeling) — kind 1985, draft status. Designed for distributed moderation, content classification, and labeling events/pubkeys. The PRD claims assessments can be exported as NIP-32 labels — this is plausible (you could label a pubkey with a trust score via an l tag in a ugc namespace), but NIP-32 has no built-in concept of scores or rationales. You'd need a custom label namespace, and the content field would carry the rationale as free text. Workable but not a clean fit.

  • NIP-85 (Trusted Assertions) — kind 30382, draft status. Designed for WoT service providers to publish signed assertion events about pubkeys. This is actually a better fit than NIP-32 for trust scores — it's specifically about trust calculations on pubkeys, with structured result tags. But NIP-85 is designed for service providers publishing aggregate WoT scores, not for individual agents publishing their own assessments. The PRD's use case (agent publishes its own first-person assessment of a peer) is closer to NIP-32 labeling than NIP-85 assertions.

The PRD should clarify which NIP maps to which export scenario, and acknowledge that neither is a perfect fit — both would need adapter logic. Same citation gap as the bitcoin-otc references: the claims are reasonable but unsubstantiated.
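For illustration, a first-person assessment exported as a NIP-32 label event might look like this. NIP-32 defines the `L` (namespace) and `l` (label) tags and kind 1985; the `agent-trust` namespace, the score-carrying label values, and the placeholder pubkeys are our own assumed convention, since NIP-32 has no built-in score or rationale fields:

```python
import json
import time

# Sketch: a ledger assessment as a NIP-32 label (kind 1985), unsigned.
# Custom "agent-trust" namespace and score labels are an assumed adapter
# convention layered on top of NIP-32.
assessment_event = {
    "kind": 1985,
    "pubkey": "<assessing agent's pubkey>",
    "created_at": int(time.time()),
    "tags": [
        ["L", "agent-trust"],               # custom label namespace
        ["l", "trust:+4", "agent-trust"],   # label value in that namespace
        ["l", "info_score:4", "agent-trust"],
        ["p", "<assessed peer's pubkey>"],  # the labeled pubkey
    ],
    "content": "Five interactions. Reliable, clear communicator.",  # rationale
}
print(json.dumps(assessment_event, indent=2))
```

This illustrates the "workable but not a clean fit" point: the scores ride in free-form label values and the rationale in `content`, with nothing in the NIP itself enforcing their semantics.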

Contributor

Reference Map — Primary Sources & Academic Validation

  • The PRD's "notes > numbers" philosophy and local-first trust model trace directly to MP's definitive WoT guide (#213), which proves through the Joe/Moe example that numeric scores identify that a peer exists while textual rationales reveal who they are.
  • The PRD's local-first design is not just a simplicity choice but a security property: fragmented observation across independent agents makes Sybil attacks exponentially harder, as demonstrated in the WoT attack/defense analysis (#214).
  • The identity model (npub/nsec as peer identity, no incoming writes to the ledger) descends directly from the GPG contracts framework (#215), where cryptographic signatures replace legal systems and enforcement happens through published, verifiable reputation history.
  • The per-peer granularity of assessments — rather than aggregated trust scores — is validated by the Ripple teardown (#216), which shows that averaging or pooling trust across counterparties creates lemon markets and adverse selection, destroying the information content that makes trust useful.
  • The CLI interface design (ledger list, ledger summary, ledger show) mirrors the three-view architecture of the Assbot WoT website spec (#217), which also introduced the "weight factor" metric — a precursor to the academic fairness/goodness formalization.
  • The PRD's cautious treatment of unknown peers ("first contact — no prior history") is a softened version of the bitcoin-assets principle (#218) that "people who aren't in the WoT don't exist" — a deliberate design choice trading cold-start accessibility for reduced security guarantees.
  • The paper on edge weight prediction (#219) proves mathematically that rater reliability matters — a gap the PRD must address before Phase 3, since sharing assessments without weighting them by the assessing agent's own trustworthiness makes the system gameable.
  • The REV2 fraud detection paper (#220) empirically validates the PRD's Journey 2 reputation farming scenario and provides temporal trajectory analysis algorithms that the PRD's timestamped assessment data could implement but currently doesn't.
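The fairness weighting that #219 formalizes can be sketched in a few lines, reusing the numbers from the PRD's L1/L2 walkthrough. Note the fairness values are taken as given inputs here; the real FG algorithm computes fairness and goodness mutually recursively:

```python
# Fairness-weighted trust aggregation in the spirit of the FG algorithm:
# each peer's reported trust is weighted by that peer's fairness, so
# unreliable raters are down-weighted instead of counted at face value.
reports = [
    # (fairness of the reporting peer, trust it reports for Alice)
    (0.9, 6),   # Peer 1: deep history, reliable rater
    (0.7, 3),   # Peer 2: limited basis
    (0.85, 5),  # Peer 3: deep history, reliable rater
    (0.4, 9),   # Peer 4: "Best agent ever!!!" -- low fairness
]

def weighted_trust(reports):
    num = sum(fairness * trust for fairness, trust in reports)
    den = sum(fairness for fairness, _ in reports)
    return num / den

def naive_average(reports):
    return sum(trust for _, trust in reports) / len(reports)

print(round(weighted_trust(reports), 2))  # 5.39
print(naive_average(reports))             # 5.75
```

The naive average is pulled upward by Peer 4's gushing report; fairness weighting discounts it, which is exactly the gameability gap the PRD must close before sharing assessments in Phase 3.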
Contributor

### Additional Contravex references (#221)

Four practical articles from Pete Dushenski (2014–2024) documenting real WoT-based economic activity — BTC loans taken from strangers, loans refused to people outside the WoT, and a critical redefinition of WoT score semantics by MP that diverges from the PRD's current definition. The "5 Ws of WoT" article contains an IRC conversation where MP clarifies that a WoT score measures "how well I am acquainted with this guy" (information quality), NOT "how trustworthy is this guy" (behavioral prediction) — a distinction the PRD should address explicitly.

Collaborator

## Re-Review: Updated PRD (2026-03-07)

David, this revision is a significant upgrade. Let me address your three questions: does it answer all concerns, how do the references map, and the score computation research gap.

---

## Does It Answer All Previous Concerns?

### Doxios concerns (my previous review):

| Concern | Addressed? | How |
|---------|:---:|-----|
| **LLM-as-judge risk** | ✅ Fully | Score is now **deterministic** (computed from interaction data). The LLM only writes the rationale. This eliminates the core risk I raised — model-dependent scoring, manipulation of scores via prompt injection. The LLM-as-judge risk is now scoped to rationale quality only, which is auditable via the CLI. |
| **Solving tomorrow's problem** | ✅ Acknowledged | PRD explicitly scopes the MVP to 2 Cobot instances via FileDrop. Phase 3 deferred. The counter-argument (bitcoin-otc started small) is now properly cited. |
| **Persistence plugin overlap** | ✅ Acknowledged | PRD calls out the duplication explicitly: "This duplication is acknowledged and temporary. Growth feature: consolidate conversation storage into the ledger's SQLite." Good — documented as tech debt with a plan. |
| **Unverifiable at MVP scale** | ⚠️ Partially | The validation approach (two Cobot instances, observe different behavior) is defined. But there is no metric for "how good are the rationales?" — it is qualitative ("review LLM-generated rationales for coherence"). Acceptable for MVP but worth formalizing later. |
| **Race condition** | ✅ Fully | Fixed in MVP with `contextvars.ContextVar`. Explicitly called out in the risk table: "wrong peer attribution in a trust system is unacceptable." Exactly what I asked for. |
| **System prompt bloat** | ✅ Fully | Adopted the **hybrid approach** (peer context in prompt, ~50-100 tokens; rubric in the tool definition). The comparison table is clear and the justification is solid. |
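For the race-condition row: a minimal sketch of how a `contextvars.ContextVar` can pin peer attribution to the current asyncio task, so concurrent message handlers cannot write an interaction under the wrong npub. The names here (`current_peer`, `record_interaction`, the in-memory `ledger`) are illustrative assumptions, not Cobot's actual API.

```python
# Sketch: task-local peer attribution via contextvars (illustrative names).
import asyncio
import contextvars

current_peer: contextvars.ContextVar[str] = contextvars.ContextVar("current_peer")

ledger: list[tuple[str, str]] = []

def record_interaction(event: str) -> None:
    # Reads the peer from the task-local context, never from shared mutable state.
    ledger.append((current_peer.get(), event))

async def handle_message(npub: str, payload: str) -> None:
    current_peer.set(npub)   # visible only within this task's context copy
    await asyncio.sleep(0)   # deliberately interleave with other handlers
    record_interaction(f"received: {payload}")

async def main() -> None:
    # Two concurrent handlers; each records under its own peer despite interleaving.
    await asyncio.gather(
        handle_message("npub1alice", "invoice"),
        handle_message("npub1bob", "quote"),
    )

asyncio.run(main())
print(ledger)
```

Each task spawned by `gather` gets its own copy of the context, which is what makes the attribution safe without locks.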

### Nazim concerns:

| Concern | Addressed? | How |
|---------|:---:|-----|
| **Missing citations** | ✅ Fully | 15 references now, all cited inline with `[[N]]` notation: bitcoin-otc wiki, SNAP dataset, Trilema articles, academic papers (Kumar et al., ICDM 2016 and WSDM 2018), NIP specs. The "notes > numbers" claim is now correctly framed as "a design principle inspired by the system, not a formally proven finding" (ref [4]). |
| **NIP references need verification** | ✅ Fully | NIP-32 and NIP-85 now have direct links and clear differentiation: NIP-32 for first-person assessments, NIP-85 for aggregate service providers. |
| **Context clutter** | ✅ Fully | The hybrid approach directly addresses this. The PRD includes a comparison table of all four alternatives (inline, dedicated hook, tool-only, hybrid) with token costs. Nazim's exact recommendation was adopted. |

**Verdict: All substantive concerns from both reviewers are addressed.** The remaining gap (rationale quality metrics) is acceptable as a Growth feature.


---

## Reference Map: How #211 Handles Each Issue

I'll comment on each reference issue individually below.

---

## Score Computation Research

The PRD acknowledges the score formula is an MVP heuristic and flags Phase 2 research. I'll create a dedicated issue for this now.

---

*Continued in follow-up comments on #213-#220.*

Collaborator

## Score Computation Research Issue Created

Filed **#222** — "research: Formalize information-quality score computation formula"

Covers:

- Function shape (log vs sigmoid vs piecewise)
- Anti-gaming penalties (burst detection, trivial-message discounting)
- Calibration against the SNAP Bitcoin-OTC dataset
- MP's probability framework decomposition
- FG algorithm integration path for Phase 3

This is the Phase 2 research task the PRD flagged. The MVP ships with the heuristic table; #222 formalizes it.
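To ground the function-shape question: a quick comparison of the three candidate mappings from raw interaction volume to a 0-10 info_score. All constants here are arbitrary assumptions chosen for contrast, not the PRD's formula — calibration against the SNAP dataset is exactly what #222 is for.

```python
# Sketch: three candidate info_score shapes (parameters are illustrative only).
import math

def log_score(n: int) -> float:
    # Diminishing returns: doubling the interaction count adds a fixed increment.
    return min(10.0, 2.5 * math.log2(1 + n))

def sigmoid_score(n: int) -> float:
    # S-curve: slow start, saturation near 10; midpoint at 8 interactions.
    return 10.0 / (1.0 + math.exp(-(n - 8) / 3.0))

def piecewise_score(n: int) -> float:
    # Explicit bands, closest in spirit to the MVP heuristic table.
    if n == 0:
        return 0.0
    if n < 3:
        return 2.0
    if n < 10:
        return 5.0
    return 8.0

for n in (0, 1, 5, 20, 100):
    print(n, round(log_score(n), 1), round(sigmoid_score(n), 1), piecewise_score(n))
```

The shapes differ most at the tails: log rewards early interactions heavily, sigmoid suppresses them (useful against burst gaming), piecewise is trivially auditable.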


## Reference Map Summary

I've commented on each reference issue (#213-#220) explaining how #211 handles them. Quick overview:

| Issue | Integration level | Notes |
|-------|:---:|-------|
| #213 (WoT guide) | ✅ Deep | Info-quality scoring adopted as the core model. Gap: the probability framework is not in any phase |
| #214 (Attack/defense) | ✅ Deep | Local-first as a security property; Phase 3 risks acknowledged |
| #215 (GPG contracts) | ✅ Deep | Identity model + evidentiary chain (full-text storage) |
| #216 (Ripple teardown) | ✅ Deep | Best-integrated reference — nuanced distinction between local and aggregated trust |
| #217 (Assbot spec) | ✅ Structural | The CLI's three views map directly to the assbot architecture |
| #218 (Not in WoT) | ✅ Phased | Softened for MVP; full version in Phase 2 threshold policies |
| #219 (FG algorithm) | ✅ Designed | Phase 3 NON-NEGOTIABLE; the three-layer model includes fairness |
| #220 (REV2 fraud) | ✅ Data-ready | The schema stores what REV2 needs; Phase 2 runs the algorithms |

**Overall: the updated PRD integrates all 8 references substantively, not superficially.** Each reference influences specific design decisions, and gaps are honestly acknowledged with phase-appropriate deferral.

@nazim — the PRD has been significantly restructured since your review. All your concerns (citations, NIP verification, context clutter) are addressed. The reference issues (#213-#220) are now mapped back to specific PRD decisions. Worth a re-read with this context.

Contributor

### Additional reference: The Wasteland (#222)

Steve Yegge's "Wasteland" (March 2026) independently arrives at nearly identical trust primitives — evidence-backed assessments, trust ladders, fraud topology detection, federated reputation — but from the opposite direction: public/centralized/gamified vs. our sovereign/local/cryptographic approach. Most relevant findings: the Wasteland's multi-dimensional stamps (quality, reliability, creativity scored independently) implement what the Ripple teardown (#216) argues for but our rating schema defers. Their trust ladder (registered → contributor → maintainer) is the concrete policy pattern our trust policy layer should document as a reference implementation. Neither project cites the bitcoin-otc prior art.

Collaborator

## Re-Review v3: Dual-Score Model (2026-03-08)

### The Big Change

The previous version chose info_score only (deterministic) and deferred behavioral judgment to the rationale. This version introduces a **dual-score model**: `info_score` (0-10, deterministic) + `trust` (-10 to +10, LLM-provided) + a mandatory rationale.

This is a significant architectural pivot. Let me reassess.
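For concreteness, a sketch of what a dual-score assessment record could look like under this model. The class name, fields, and validation are my assumptions about the shape implied above, not the PRD's actual schema.

```python
# Sketch: a dual-score assessment record (illustrative schema, not Cobot's).
from dataclasses import dataclass

@dataclass(frozen=True)
class Assessment:
    npub: str
    info_score: int   # deterministic, computed from interaction data
    trust: int        # LLM behavioral judgment, local-only
    rationale: str    # mandatory freetext — the "notes > numbers" principle
    timestamp: int    # unix seconds; enables trajectory analysis later

    def __post_init__(self) -> None:
        if not 0 <= self.info_score <= 10:
            raise ValueError("info_score must be in 0..10")
        if not -10 <= self.trust <= 10:
            raise ValueError("trust must be in -10..+10")
        if not self.rationale.strip():
            raise ValueError("rationale is mandatory")

a = Assessment("npub1peer", info_score=6, trust=3,
               rationale="delivered paid work on time", timestamp=1772000000)
print(a.trust)  # → 3
```

The frozen dataclass mirrors the ledger's append-only, first-person-observations-only stance: records are validated once and never mutated.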


### Does It Answer All Previous Concerns?

| Concern | Previous status | Now |
|---------|:---:|:---:|
| **LLM-as-judge risk** | ✅ (eliminated by deterministic-only) | ⚠️ **Reintroduced — but bounded.** The LLM now sets the trust score. However: (a) info_score remains deterministic and immune, (b) trust is explicitly local-first and NOT exported without a rationale, (c) operators can audit via the CLI. The risk is now scoped to "how good is the LLM at behavioral judgment" — which is the entire value proposition of having an AI agent. An acceptable tradeoff. |
| **Solving tomorrow's problem** | ✅ | ✅ Same — MVP scoped to 2 Cobot instances |
| **Persistence overlap** | ✅ | ✅ Same — documented tech debt with a plan |
| **Race condition** | ✅ | ✅ Same — contextvars in MVP |
| **Context clutter** | ✅ | ✅ Same — hybrid approach preserved |
| **Nazim: citations** | ✅ | ✅ Same — 15 references |
| **Nazim: NIP verification** | ✅ | ✅ Same |
| **Nazim: context clutter** | ✅ | ✅ Same — hybrid approach |

### New: My Assessment of the Dual-Score Decision

**I think this is the right call.** Here's why:

**1. It resolves a real internal inconsistency.** The previous version claimed info_score-only, but Journey 1 showed the agent assigning `score +2` for behavioral reasons. The user journeys were behavioral; the Score Semantics section was info-quality. The dual model resolves this tension honestly rather than pretending it doesn't exist.

**2. It matches how bitcoin-otc actually worked.** MP redefined the semantics post hoc, but the community used scores behaviorally for years. Both interpretations had value. Choosing one was a false dichotomy — the dual model takes both.

**3. The FG algorithm mapping is elegant.** info_score → Fairness (rater reliability based on interaction depth); trust → Goodness (ratee quality based on behavioral judgment). Having both local dimensions gives Phase 3 structured inputs to BOTH sides of the FG computation. The single-score version would have required extracting the behavioral signal from rationale text — lossy and expensive.
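A hedged sketch of the mutually recursive Fairness-Goodness iteration referenced here (after Kumar et al.): fairness weights each rater, goodness aggregates fairness-weighted ratings, and each is re-estimated from the other until the values settle. The update rules and data below are a simplification for illustration, not the paper's exact formulation.

```python
# Sketch: simplified Fairness-Goodness iteration on toy rating data.
# ratings: (rater, ratee, score in [-1, 1])
ratings = [
    ("alice", "carol", 0.8), ("bob", "carol", 0.9),
    ("alice", "dave", -0.7), ("bob", "dave", -0.6),
    ("mallory", "dave", 1.0),   # outlier rating should drag mallory's fairness down
]

fairness = {r: 1.0 for r, _, _ in ratings}   # rater reliability, in [0, 1]
goodness = {t: 0.0 for _, t, _ in ratings}   # ratee quality, in [-1, 1]

for _ in range(50):
    for t in goodness:
        rs = [(r, s) for r, tt, s in ratings if tt == t]
        # Goodness: fairness-weighted average of the scores a ratee received.
        goodness[t] = sum(fairness[r] * s for r, s in rs) / sum(fairness[r] for r, _ in rs)
    for u in fairness:
        rs = [(t, s) for r, t, s in ratings if r == u]
        # Fairness: falls with the rater's average deviation from consensus goodness.
        fairness[u] = 1.0 - sum(abs(s - goodness[t]) for t, s in rs) / (2 * len(rs))

print({k: round(v, 2) for k, v in fairness.items()})
print({k: round(v, 2) for k, v in goodness.items()})
```

The point of the toy run: mallory's dissenting rating of dave costs her fairness, so her vote counts for less in dave's goodness — exactly the structure info_score (Fairness input) and trust (Goodness input) would feed in Phase 3.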

**4. The Ripple defense is solid.** The export constraint (trust MUST NOT be exported without the rationale and info_score) prevents the single-number-averaging failure mode. info_score handles cross-agent composability. Trust stays local. This is architecturally sound.

**My one concern:** The LLM providing the trust score reintroduces model dependence. Claude might rate a peer +3 where GPT-4 rates them +6 for the same interaction history. **Mitigation:** the rationale explains the reasoning, and operators can audit score-rationale consistency. Inconsistent models produce inconsistent trust scores but consistent info_scores — the dual model degrades gracefully.


### Reference Map Reassessment

Does the dual-score model change any reference analysis?

| Issue | Previous assessment | Changed? | Notes |
|-------|:---:|:---:|-------|
| **#213** (WoT guide) | ✅ Deep | **Strengthened** | The dual model adopts BOTH MP's info-quality definition AND the community's behavioral practice; it no longer forces a choice. The probability framework maps more cleanly: info_score provides the confidence factor, trust the directional signal. |
| **#214** (Attack/defense) | ✅ Deep | **Unchanged** | Local-first Sybil defense applies to both scores equally. |
| **#215** (GPG contracts) | ✅ Deep | **Unchanged** | The evidentiary chain (full text, signed interactions) supports both scores. |
| **#216** (Ripple teardown) | ✅ Deep | **Strengthened** | The dual model is a MORE complete defense against Ripple's defects. A single info_score risked the same confusion (a high score for a scammer). The dual model makes the behavioral signal explicit while keeping the composable metric (info_score) separate. The export constraint prevents the aggregation failure. |
| **#217** (Assbot spec) | ✅ Structural | **Enhanced** | Assbot's weight-factor metric now maps to info_score (a confidence weight on trust judgments). CLI `ledger list` can show both columns — more actionable than info_score alone. |
| **#218** (Not in WoT) | ✅ Phased | **Enhanced** | Phase 2 threshold policies now work on the right dimension: "refuse below -3 trust" is a behavioral threshold; "refuse below 3 info_score" means "refuse strangers" — useful but different. Having both scores enables BOTH policies. |
| **#219** (FG algorithm) | ✅ Designed | **Significantly strengthened** | This is where the dual model really shines. FG computes Fairness (rater reliability) and Goodness (ratee quality) as mutually recursive metrics. info_score is a natural Fairness input (interaction depth); trust is a natural Goodness input (behavioral signal). Single-score required extracting Goodness from rationale text; dual-score gives FG structured inputs for BOTH dimensions out of the box. |
| **#220** (REV2 fraud) | ✅ Data-ready | **Enhanced** | REV2 trajectory analysis now has a structured trust time series to work with (not just info_score trends). Detecting "steady +3, +4, +5 then sudden -8" is cleaner with explicit trust scores than with info_score (which would just keep climbing for a reputation farmer). |

**Summary: The dual-score model strengthens the reference integration across the board.** No reference argument weakens. #219 (FG) and #220 (REV2) benefit most significantly.
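The #220 trajectory check is small enough to sketch: flag a peer whose trust history climbs steadily and then collapses — the reputation-farming signature cited above. The function name and the drop threshold are assumptions for illustration; calibrating them is Phase 2 work.

```python
# Sketch: REV2-style trajectory flag on a trust time series (illustrative threshold).
def looks_like_reputation_farming(trust_series: list[int], drop: int = 6) -> bool:
    if len(trust_series) < 3:
        return False  # not enough history to judge
    *history, latest = trust_series
    climbing = all(a <= b for a, b in zip(history, history[1:]))
    # Farming signature: a monotone climb followed by a large sudden drop.
    return climbing and (history[-1] - latest) >= drop

print(looks_like_reputation_farming([3, 4, 5, -8]))  # → True
print(looks_like_reputation_farming([3, 4, 5, 6]))   # → False
```

Note this only works on an explicit trust series; run on info_score it would never fire, since info_score keeps climbing for a farmer — which is precisely the argument for the dual model here.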


### Verdict

**The PRD is ready for implementation.** The dual-score model resolves the tension between theory (info-quality) and practice (behavioral judgment) by preserving both as orthogonal dimensions. All reviewer concerns remain addressed. The reference map is strengthened, not weakened.

One suggestion: update #222 (score computation research) to include the trust-score calibration question — how do we detect and compensate for model-dependent trust scoring across different LLMs? This is a real operational concern for Phase 2+.

🦊

Reference
ultanio/cobot#211