reference: Edge Weight Prediction in Weighted Signed Networks (Kumar et al., ICDM 2016) #219
Reference
ultanio/cobot#219
Short Summary
The first academic analysis of the bitcoin-otc trust network, introducing two mutually recursive metrics, fairness (how reliable a rater is) and goodness (how trustworthy a ratee is), which the paper reports almost always predict trust scores between users better than the prior methods it compares against.
Detailed Summary
Authors: Srijan Kumar, Francesca Spezzano, V.S. Subrahmanian, Christos Faloutsos
Venue: IEEE International Conference on Data Mining (ICDM), 2016
PDF: http://cs.stanford.edu/~srijan/pubs/wsn-icdm16.pdf
Code & Data: http://cs.umd.edu/~srijan/wsn/
Motivation
In networks where people rate each other (trust/distrust, like/dislike), can you predict what rating person A will give person B — even when that edge doesn't exist yet? This matters for fraud detection, recommendation systems, and moderation. The bitcoin-otc network (5,881 users, 35,592 ratings, scale -10 to +10) was the first publicly available weighted signed directed network, making it the ideal dataset.
Key Innovation — Fairness & Goodness (FG) Metrics
Two mutually recursive metrics for each node:
Fairness: how reliable the node is as a rater.
Goodness: how trustworthy the node is as a ratee.
These are interdependent: you can't know if a rater is fair without knowing the true quality of what they rated, and vice versa. The paper resolves this circularity with an iterative algorithm that provably converges to a unique solution in linear time.
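The mutual recursion can be sketched in a few lines of Python. This is a minimal illustration based on the update rules as described in the paper (goodness is the mean of incoming ratings weighted by rater fairness; fairness is one minus the mean normalized gap between a rater's ratings and the ratee's goodness), not the authors' reference code. It assumes ratings are already scaled to [-1, 1] (for bitcoin-otc, divide by 10); the function name, the `tol` and `max_iter` parameters, and the handling of nodes with no incoming or outgoing edges are assumptions made for this sketch.

```python
from collections import defaultdict


def fairness_goodness(edges, max_iter=100, tol=1e-6):
    """Iterate fairness/goodness on (rater, ratee, weight) edges.

    Weights are assumed to lie in [-1, 1]. Returns (fairness, goodness) dicts.
    """
    out_edges = defaultdict(list)  # rater -> [(ratee, weight)]
    in_edges = defaultdict(list)   # ratee -> [(rater, weight)]
    nodes = set()
    for rater, ratee, weight in edges:
        out_edges[rater].append((ratee, weight))
        in_edges[ratee].append((rater, weight))
        nodes.update((rater, ratee))

    # Start every node as fully fair and fully good.
    fairness = {n: 1.0 for n in nodes}
    goodness = {n: 1.0 for n in nodes}
    weight_range = 2.0  # width of the [-1, 1] rating interval

    for _ in range(max_iter):
        delta = 0.0

        # Goodness: mean of incoming ratings, each scaled by the rater's fairness.
        for n in nodes:
            if not in_edges[n]:
                continue  # assumption: unrated nodes keep their initial value
            new_g = sum(fairness[u] * w for u, w in in_edges[n]) / len(in_edges[n])
            delta = max(delta, abs(new_g - goodness[n]))
            goodness[n] = new_g

        # Fairness: 1 minus the mean normalized error against ratee goodness.
        for n in nodes:
            if not out_edges[n]:
                continue  # assumption: nodes that never rate keep their initial value
            err = sum(abs(w - goodness[v]) for v, w in out_edges[n]) / len(out_edges[n])
            new_f = 1.0 - err / weight_range
            delta = max(delta, abs(new_f - fairness[n]))
            fairness[n] = new_f

        if delta < tol:
            break

    return fairness, goodness
```

Each iteration touches every edge a constant number of times, which is where the linear cost per pass comes from. In a toy graph where two raters give a node +1.0 and a third gives it -1.0, the dissenting rater ends up with lower fairness than the other two.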
Datasets
Results
Data Format
The bitcoin-otc dataset is a CSV:
SOURCE, TARGET, RATING, TIME (no notes field). The academic dataset stripped the free-text comments that the ;;rate IRC command accepted, keeping only the numeric data.
Citation
Impact on Interaction Ledger PRD (#211)
This paper provides mathematical validation for intuitions the PRD implements — but also exposes a significant gap:
1. Rater reliability matters — the PRD ignores it
The paper's core finding is that a rating's value depends on who gave it. A +8 from a fair rater (one whose ratings consistently correlate with ground truth) is worth more than a +8 from an unfair rater (one who gives everyone the same score). The PRD's assessment model treats all assessments equally — there's no concept of the assessing agent's own trustworthiness or rating reliability. In a single-agent local ledger this is fine (the agent is both rater and consumer). But the moment assessments are shared in Phase 3, rater fairness becomes critical. The FG algorithm provides a proven method for computing it.
2. The dataset proves "notes > numbers" by omission
The bitcoin-otc CSV contains only SOURCE, TARGET, RATING, TIME: the researchers stripped the free-text notes from the ;;rate command. Their models achieve good prediction accuracy using only numeric features. But the paper never claims to capture why someone was rated a certain way, only to predict what the rating will be. The PRD's mandatory rationale field captures exactly the information the academic dataset lost. This is a concrete argument for why the PRD's approach adds value beyond what the paper's models can provide.
3. The "goodness" metric is the weight factor formalized
The Assbot WoT spec (#217) defined a "weight factor" (rank by total trust received). This paper formalizes it as "goodness" — with the crucial addition that goodness is weighted by rater fairness, not just summed. The PRD's future WoT aggregation (Phase 3) should implement goodness-weighted scoring rather than raw averages, citing this paper.
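To make the difference between a raw average and fairness-weighted scoring concrete, here is a small sketch of a single goodness update with fairness held fixed. The rater names, scores, and fairness values are hypothetical, and the formula follows the paper's goodness definition (mean of fairness times rating). A normalized weighted average that divides by the summed fairness instead is a different design choice; nothing here says which one the PRD adopts.

```python
def naive_mean(ratings):
    """Plain average of (rater, score) pairs, ignoring who gave each score."""
    return sum(score for _, score in ratings) / len(ratings)


def fairness_weighted(ratings, fairness):
    """Mean of fairness * score, so unreliable raters contribute less."""
    return sum(fairness[rater] * score for rater, score in ratings) / len(ratings)


# Hypothetical data: two raters give the same peer the same score (scaled to [-1, 1]).
ratings = [("rater_a", 0.7), ("rater_b", 0.7)]
fairness = {"rater_a": 0.9, "rater_b": 0.4}  # hypothetical fairness values

raw = naive_mean(ratings)
weighted = fairness_weighted(ratings, fairness)
```

With these made-up inputs the raw average treats both ratings identically, while the weighted score is pulled down because half of the support comes from a low-fairness rater.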
4. 89% positive edges — implications for scoring
The bitcoin-otc network has 89% positive edges. If the PRD's assessment distribution is similar (most peers are fine, few are bad), the scoring system should be optimized for detecting the minority of bad actors, not for differentiating between good actors. The PRD's -10 to +10 scale mirrors bitcoin-otc exactly, which is good — but the default score of 0 for unknown peers is actually below the network mean (~+3 to +4 for known peers). This means the system is implicitly pessimistic about unknowns, which aligns with the #bitcoin-assets philosophy (#218) but should be documented as a deliberate choice.
5. Stanford SNAP dataset as a testing resource
The dataset (5,881 nodes, 35,592 edges) is freely available at https://snap.stanford.edu/data/soc-sign-bitcoin-otc.html. The PRD could use it to validate assessment algorithms, test scoring thresholds, or simulate reputation farming attacks on real-world trust graph topology.
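A first validation step could be as simple as loading the CSV and checking the rating distribution. The sketch below assumes the file has no header row and four columns in the order SOURCE, TARGET, RATING, TIME, with TIME as a floating-point epoch timestamp; the function names are hypothetical and the sample rows are made up for illustration, not taken from the dataset.

```python
import csv
import io


def load_ratings(handle):
    """Parse SOURCE, TARGET, RATING, TIME rows from an open text handle."""
    rows = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip blank lines
        source, target, rating, time = row
        rows.append((int(source), int(target), int(rating), float(time)))
    return rows


def summarize(rows):
    """Return edge count, share of positive ratings, and mean rating."""
    ratings = [rating for _, _, rating, _ in rows]
    positive = sum(1 for r in ratings if r > 0)
    return {
        "edges": len(ratings),
        "positive_fraction": positive / len(ratings),
        "mean_rating": sum(ratings) / len(ratings),
    }


# Made-up sample rows in the assumed format.
sample = io.StringIO(
    "1,2,4,1300000000.0\n"
    "1,3,2,1300000100.0\n"
    "2,3,-10,1300000200.0\n"
)
summary = summarize(load_ratings(sample))
```

To run it against the real data, open the downloaded and decompressed file with `open(path, newline="")` and pass the handle to `load_ratings`; the resulting positive fraction and mean rating can then be compared with the figures quoted in point 4.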
See: #211
nazim referenced this issue 2026-03-07 04:53:06 +00:00
nazim referenced this issue 2026-03-07 05:08:42 +00:00
How #211 handles this
Flagged as Phase 3 NON-NEGOTIABLE requirement. Reference [12] cites this paper.
The PRD integrates the FG algorithm at the design level:
Three-layer model explicitly includes fairness: Score (deterministic) → Rationale (LLM) → Fairness (FG algorithm, Phase 3). The table in Score Semantics shows fairness as the third layer.
Phase 3 feature table: "Fairness-weighted aggregation (NON-NEGOTIABLE) — FG algorithm: weight incoming assessments by rater fairness. Naive averaging dramatically underperforms. A +7 from a fair rater is worth more than a +7 from an unfair rater."
L1/L2 walkthrough uses FG weighting: Appendix A demonstrates how Peer 4 (fairness 0.4) gets down-weighted vs Peer 1 (fairness 0.9).
Info-quality scoring chosen partly for FG compatibility: "The FG algorithm works better when consensus means 'do we agree on how well-known this peer is' (factual) rather than 'do we agree on how trustworthy' (values-laden)."
Gap the PRD acknowledges: "proves mathematically that rater reliability matters — a gap the PRD must address before Phase 3, since sharing assessments without weighting them by the assessing agent's own trustworthiness makes the system gameable." This is honest — the gap exists, it's scoped to Phase 3, and the requirement is non-negotiable. Correct sequencing.
David referenced this issue 2026-03-08 03:44:36 +00:00