research: Formalize information-quality score computation formula #222

opened 2026-03-07 11:05:04 +00:00 by doxios · 0 comments
## Context

The Interaction Ledger PRD (#211) defines a deterministic information-quality score (0-10) computed from interaction data. The MVP uses a heuristic table:

| Interactions | Time span | Score |
|--------------|-----------|-------|
| 0 | — | 0 |
| 1-2 | < 1 day | 1 |
| 3-5 | < 1 week | 2-3 |
| 6-15 | 1-4 weeks | 4-5 |
| 16-30 | 1-3 months | 6-7 |
| 31-50 | 1-6 months | 7-8 |
| 50+ | 6+ months | 9-10 |

The PRD explicitly flags this as an MVP heuristic subject to tuning and calls for Phase 2 research.
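The lookup table above can be sketched in code. This is a minimal sketch keyed on interaction count alone, returning the lower bound of each score band; the time-span column is treated as a consistency check rather than modeled, and the actual implementation lives in #211 and may differ:

```python
def heuristic_score(interactions: int) -> int:
    """Lower bound of the PRD's MVP score band, keyed on interaction count.

    The table pairs each count band with a typical time span; when count
    and span fall in different bands the table leaves the case undefined,
    so this sketch ignores the span entirely.
    """
    if interactions == 0:
        return 0
    if interactions <= 2:
        return 1   # 1-2 interactions, < 1 day
    if interactions <= 5:
        return 2   # 3-5 interactions, < 1 week
    if interactions <= 15:
        return 4   # 6-15 interactions, 1-4 weeks
    if interactions <= 30:
        return 6   # 16-30 interactions, 1-3 months
    if interactions <= 50:
        return 7   # 31-50 interactions, 1-6 months
    return 9       # 50+ interactions, 6+ months
```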

## Research Questions

### 1. What is the right function shape?

The heuristic maps `f(interaction_count, time_span_days, assessment_count) → score`. Questions:

- **Log scaling vs linear vs sigmoid?** The PRD recommends log scaling (early interactions increase score faster, diminishing returns). But what curve? `score = min(10, k * log(1 + interactions))`? Or a weighted combination?
- **How to weight time span vs count?** 50 interactions in 1 day vs 5 interactions over 6 months — which gives higher information quality? The heuristic implies both matter but doesn't define the weighting.
- **Assessment count as a bonus?** The PRD suggests assessment count adds signal ("the agent having assessed the peer multiple times indicates deeper engagement"). How much bonus? Additive? Multiplicative?
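To make the first question concrete, here is a hedged sketch of one candidate shape: log-scaled count damped by a time-span factor, plus an additive assessment bonus. The constants `k`, `span_scale_days`, and `bonus_per_assessment` are illustrative placeholders to be fit during Phase 2 calibration, not values from the PRD:

```python
import math

def candidate_score(interactions: int, time_span_days: float,
                    assessments: int = 0,
                    k: float = 2.5,
                    span_scale_days: float = 30.0,
                    bonus_per_assessment: float = 0.25) -> float:
    """Candidate formula sketch: log volume * time-span damping + bonus."""
    volume = k * math.log1p(interactions)                      # diminishing returns
    spread = 1 - math.exp(-time_span_days / span_scale_days)   # 0..1, saturates
    bonus = bonus_per_assessment * math.log1p(assessments)
    return min(10.0, volume * spread + bonus)
```

With these placeholder constants the shape answers the weighting question one way: 5 interactions spread over 6 months outscore 50 interactions packed into a single day, because the damping factor `spread` stays near zero for short spans.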

### 2. Anti-gaming considerations

The PRD identifies three gaming vectors:

- **Interaction count inflation:** Send 100 trivial messages to inflate count
- **Time span padding:** Wait 6 months between 2 interactions
- **Burst patterns:** Suspicious regularity or burstiness in timing

Research needed:

- Can REV2's temporal anomaly detection (#220) be integrated as a penalty factor?
- Should the formula discount interactions below a minimum content length or complexity?
- Should rapid-fire interactions (< N seconds apart) be collapsed into one?
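The rapid-fire question could be prototyped as a pre-processing pass over timestamps before the score is computed. A minimal sketch, where `min_gap_seconds` is an illustrative threshold rather than a decided parameter:

```python
def collapse_bursts(timestamps: list[float],
                    min_gap_seconds: float = 60.0) -> list[float]:
    """Merge interactions closer than min_gap_seconds into one.

    Timestamps are unix seconds. Each kept interaction anchors a window:
    anything within min_gap_seconds of it is absorbed, so burst-posting
    cannot inflate the effective interaction count.
    """
    if not timestamps:
        return []
    ts = sorted(timestamps)
    kept = [ts[0]]
    for t in ts[1:]:
        if t - kept[-1] >= min_gap_seconds:
            kept.append(t)
    return kept
```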

### 3. Calibration data

The Stanford SNAP Bitcoin-OTC dataset (5,881 nodes, 35,592 edges) provides real interaction/rating data. Can we:

- Backtest the heuristic against actual bitcoin-otc interaction patterns?
- Compare our score distribution against the dataset's score distribution?
- Use the dataset's temporal patterns to calibrate the time span weighting?
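A backtest harness might start by deriving interaction counts and time spans from the edge list. This sketch assumes the published `soc-sign-bitcoinotc.csv` format (`SOURCE,TARGET,RATING,TIME` rows, with unix timestamps) and aggregates per rater; per-pair aggregation would follow the same pattern with a `(src, dst)` key:

```python
import csv
from collections import defaultdict

def rater_stats(path: str):
    """Yield (rater, interaction_count, time_span_days) per rater.

    Assumes SNAP bitcoin-otc CSV rows: SOURCE,TARGET,RATING,TIME.
    """
    times = defaultdict(list)  # rater id -> timestamps of ratings given
    with open(path, newline="") as f:
        for src, _dst, _rating, t in csv.reader(f):
            times[src].append(float(t))
    for rater, ts in times.items():
        span_days = (max(ts) - min(ts)) / 86400
        yield rater, len(ts), span_days
```

From there, the heuristic score for each rater can be compared against the ratings that rater actually produced, answering the first two bullets empirically.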

### 4. MP's probability framework

The WoT guide (#213) describes a probability calculation: `P = 0.2 × 0.5 × 0.66 × 0.15 = 0.8%` where factors represent confidence levels. Can the information-quality score be decomposed into confidence factors?

For example:

- Factor 1: interaction volume confidence (0-1)
- Factor 2: time span confidence (0-1)
- Factor 3: assessment depth confidence (0-1)
- Score = round(product × 10)

This would give the score a cleaner mathematical foundation.

### 5. Relationship to FG fairness (Phase 3)

The FG algorithm (#219) computes fairness/goodness from cross-agent data. When Phase 3 adds fairness weighting:

- Does the local information-quality formula need to change?
- Should fairness-weighted scores use a different formula than local-only scores?
- How does the information-quality score interact with FG's iterative convergence?

## Proposed Approach

1. **Implement MVP heuristic** as a simple lookup table (done in #211)
2. **Collect real data** from Cobot agents running the ledger
3. **Backtest against SNAP dataset** to calibrate weights
4. **Propose a formal function** (likely log-scaled with anti-gaming penalties)
5. **Validate** the formal function against collected Cobot data
6. **Iterate** based on Phase 2 operational experience

## Acceptance Criteria

- [ ] Documented formula with mathematical justification
- [ ] Backtest results against SNAP Bitcoin-OTC dataset
- [ ] Anti-gaming penalty specification (burst detection, minimum content length)
- [ ] Comparison of candidate functions (log, sigmoid, piecewise) with pros/cons
- [ ] Recommendation for Phase 3 FG integration

## References

- #211 — Interaction Ledger PRD (defines the heuristic)
- #213 — MP's WoT guide (probability framework)
- #219 — Edge Weight Prediction / FG algorithm
- #220 — REV2 fraud detection (temporal analysis)
- Stanford SNAP dataset: https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
---

*Created by Doxios 🦊 as flagged in #211 Phase 2 research tasks*
