research: Formalize information-quality score computation formula #222

opened 2026-03-07 11:05:04 +00:00 by doxios · 0 comments
## Context

The Interaction Ledger PRD (#211) defines a deterministic information-quality score (0-10) computed from interaction data. The MVP uses a heuristic table:

| Interactions | Time span | Score |
|--------------|-----------|-------|
| 0 | — | 0 |
| 1-2 | < 1 day | 1 |
| 3-5 | < 1 week | 2-3 |
| 6-15 | 1-4 weeks | 4-5 |
| 16-30 | 1-3 months | 6-7 |
| 31-50 | 1-6 months | 7-8 |
| 50+ | 6+ months | 9-10 |

The PRD explicitly flags this as an MVP heuristic subject to tuning and calls for Phase 2 research.
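The lookup table above can be sketched in code. This is a minimal sketch keyed on interaction count alone, returning the lower bound of each score band; the time-span column is treated as a consistency check rather than modeled, and the actual implementation lives in #211 and may differ:

```python
def heuristic_score(interactions: int) -> int:
    """Lower bound of the PRD's MVP score band, keyed on interaction count.

    The table pairs each count band with a typical time span; when count
    and span fall in different bands the table leaves the case undefined,
    so this sketch ignores the span entirely.
    """
    if interactions == 0:
        return 0
    if interactions <= 2:
        return 1   # 1-2 interactions, < 1 day
    if interactions <= 5:
        return 2   # 3-5 interactions, < 1 week
    if interactions <= 15:
        return 4   # 6-15 interactions, 1-4 weeks
    if interactions <= 30:
        return 6   # 16-30 interactions, 1-3 months
    if interactions <= 50:
        return 7   # 31-50 interactions, 1-6 months
    return 9       # 50+ interactions, 6+ months
```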

## Research Questions

### 1. What is the right function shape?

The heuristic maps `f(interaction_count, time_span_days, assessment_count) → score`. Questions:

- **Log scaling vs linear vs sigmoid?** The PRD recommends log scaling (early interactions increase score faster, diminishing returns). But what curve? `score = min(10, k * log(1 + interactions))`? Or a weighted combination?
- **How to weight time span vs count?** 50 interactions in 1 day vs 5 interactions over 6 months — which gives higher information quality? The heuristic implies both matter but doesn't define the weighting.
- **Assessment count as a bonus?** The PRD suggests assessment count adds signal ("the agent having assessed the peer multiple times indicates deeper engagement"). How much bonus? Additive? Multiplicative?
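To make the first question concrete, here is a hedged sketch of one candidate shape: log-scaled count damped by a time-span factor, plus an additive assessment bonus. The constants `k`, `span_scale_days`, and `bonus_per_assessment` are illustrative placeholders to be fit during Phase 2 calibration, not values from the PRD:

```python
import math

def candidate_score(interactions: int, time_span_days: float,
                    assessments: int = 0,
                    k: float = 2.5,
                    span_scale_days: float = 30.0,
                    bonus_per_assessment: float = 0.25) -> float:
    """Candidate formula sketch: log volume * time-span damping + bonus."""
    volume = k * math.log1p(interactions)                      # diminishing returns
    spread = 1 - math.exp(-time_span_days / span_scale_days)   # 0..1, saturates
    bonus = bonus_per_assessment * math.log1p(assessments)
    return min(10.0, volume * spread + bonus)
```

With these placeholder constants the shape answers the weighting question one way: 5 interactions spread over 6 months outscore 50 interactions packed into a single day, because the damping factor `spread` stays near zero for short spans.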

### 2. Anti-gaming considerations

The PRD identifies three gaming vectors:

- **Interaction count inflation:** Send 100 trivial messages to inflate count
- **Time span padding:** Wait 6 months between 2 interactions
- **Burst patterns:** Suspicious regularity or burstiness in timing

Research needed:

- Can REV2's temporal anomaly detection (#220) be integrated as a penalty factor?
- Should the formula discount interactions below a minimum content length or complexity?
- Should rapid-fire interactions (< N seconds apart) be collapsed into one?
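The rapid-fire question could be prototyped as a pre-processing pass over timestamps before the score is computed. A minimal sketch, where `min_gap_seconds` is an illustrative threshold rather than a decided parameter:

```python
def collapse_bursts(timestamps: list[float],
                    min_gap_seconds: float = 60.0) -> list[float]:
    """Merge interactions closer than min_gap_seconds into one.

    Timestamps are unix seconds. Each kept interaction anchors a window:
    anything within min_gap_seconds of it is absorbed, so burst-posting
    cannot inflate the effective interaction count.
    """
    if not timestamps:
        return []
    ts = sorted(timestamps)
    kept = [ts[0]]
    for t in ts[1:]:
        if t - kept[-1] >= min_gap_seconds:
            kept.append(t)
    return kept
```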

### 3. Calibration data

The Stanford SNAP Bitcoin-OTC dataset (5,881 nodes, 35,592 edges) provides real interaction/rating data. Can we:

- Backtest the heuristic against actual bitcoin-otc interaction patterns?
- Compare our score distribution against the dataset's score distribution?
- Use the dataset's temporal patterns to calibrate the time span weighting?
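A backtest harness might start by deriving interaction counts and time spans from the edge list. This sketch assumes the published `soc-sign-bitcoinotc.csv` format (`SOURCE,TARGET,RATING,TIME` rows, with unix timestamps) and aggregates per rater; per-pair aggregation would follow the same pattern with a `(src, dst)` key:

```python
import csv
from collections import defaultdict

def rater_stats(path: str):
    """Yield (rater, interaction_count, time_span_days) per rater.

    Assumes SNAP bitcoin-otc CSV rows: SOURCE,TARGET,RATING,TIME.
    """
    times = defaultdict(list)  # rater id -> timestamps of ratings given
    with open(path, newline="") as f:
        for src, _dst, _rating, t in csv.reader(f):
            times[src].append(float(t))
    for rater, ts in times.items():
        span_days = (max(ts) - min(ts)) / 86400
        yield rater, len(ts), span_days
```

From there, the heuristic score for each rater can be compared against the ratings that rater actually produced, answering the first two bullets empirically.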

### 4. MP's probability framework

The WoT guide (#213) describes a probability calculation: `P = 0.2 × 0.5 × 0.66 × 0.15 = 0.8%` where factors represent confidence levels. Can the information-quality score be decomposed into confidence factors?

For example:

- Factor 1: interaction volume confidence (0-1)
- Factor 2: time span confidence (0-1)
- Factor 3: assessment depth confidence (0-1)
- Score = round(product × 10)

This would give the score a cleaner mathematical foundation.

### 5. Relationship to FG fairness (Phase 3)

The FG algorithm (#219) computes fairness/goodness from cross-agent data. When Phase 3 adds fairness weighting:

- Does the local information-quality formula need to change?
- Should fairness-weighted scores use a different formula than local-only scores?
- How does the information-quality score interact with FG's iterative convergence?

## Proposed Approach

1. **Implement MVP heuristic** as a simple lookup table (done in #211)
2. **Collect real data** from Cobot agents running the ledger
3. **Backtest against SNAP dataset** to calibrate weights
4. **Propose a formal function** (likely log-scaled with anti-gaming penalties)
5. **Validate** the formal function against collected Cobot data
6. **Iterate** based on Phase 2 operational experience

## Acceptance Criteria

- [ ] Documented formula with mathematical justification
- [ ] Backtest results against SNAP Bitcoin-OTC dataset
- [ ] Anti-gaming penalty specification (burst detection, minimum content length)
- [ ] Comparison of candidate functions (log, sigmoid, piecewise) with pros/cons
- [ ] Recommendation for Phase 3 FG integration

## References

- #211 — Interaction Ledger PRD (defines the heuristic)
- #213 — MP's WoT guide (probability framework)
- #219 — Edge Weight Prediction / FG algorithm
- #220 — REV2 fraud detection (temporal analysis)
- Stanford SNAP dataset: https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html
---

*Created by Doxios 🦊 as flagged in #211 Phase 2 research tasks*
