reference: REV2 — Fraudulent User Prediction in Rating Platforms (Kumar et al., WSDM 2018) #220
Short Summary
Building on the fairness/goodness framework from the ICDM 2016 paper, REV2 extends the metrics to explicitly detect fraudulent users in rating platforms — demonstrating that temporal analysis of rating trajectories reveals gaming behavior in the bitcoin-otc network.
Detailed Summary
Authors: Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, V.S. Subrahmanian
Venue: 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018
PDF: http://cs.stanford.edu/~srijan/pubs/rev2-wsdm18.pdf
Code & Data: https://cs.stanford.edu/~srijan/rev2/
Motivation
The ICDM 2016 paper introduced fairness and goodness metrics for predicting ratings. This follow-up asks a harder question: can these metrics detect fraud? In rating platforms (including bitcoin-otc), fraudulent users exhibit systematic patterns — they rate unfairly (diverging from what honest raters say), and they often work in coordinated groups to boost each other.
Key Innovation — REV2 Algorithm
REV2 extends the fairness/goodness (FG) framework with three key advances:
1. Three interdependent scores: fairness (of users as raters), goodness (of users as ratees), and reliability (of individual ratings). Each is defined in terms of the other two and solved by iterating to a fixed point.
2. Temporal trajectory analysis: instead of looking at static snapshots, REV2 analyzes how users' scores evolve over time; fraudulent users show distinctive temporal patterns.
3. Fraud detection as anomaly detection: users whose fairness scores diverge significantly from the network mean are flagged as potential fraudsters. This works because honest users cluster in fairness scores, while fraudsters are outliers.
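The mutual recursion in point 1 can be sketched as a fixed-point iteration. This is a simplified reading of the paper's formulation (ratings normalized to [-1, 1]; the paper's smoothing priors and behavioral features are omitted), not the reference implementation:

```python
# Simplified REV2-style iteration (no smoothing priors or extra
# features from the full paper). `ratings` maps (rater, ratee)
# pairs to scores normalized into [-1, 1].

def rev2(ratings, iters=30):
    users = {u for u, _ in ratings}
    ratees = {p for _, p in ratings}
    F = {u: 1.0 for u in users}      # fairness of raters, in [0, 1]
    G = {p: 0.0 for p in ratees}     # goodness of ratees, in [-1, 1]
    R = {e: 1.0 for e in ratings}    # reliability of ratings, in [0, 1]

    for _ in range(iters):
        # A rating is reliable when its rater is fair AND it agrees
        # with the ratee's current goodness.
        for (u, p), s in ratings.items():
            R[(u, p)] = 0.5 * (F[u] + 1.0 - abs(s - G[p]) / 2.0)
        # Goodness: reliability-weighted mean of received ratings.
        for p in ratees:
            edges = [e for e in ratings if e[1] == p]
            G[p] = sum(R[e] * ratings[e] for e in edges) / len(edges)
        # Fairness: mean reliability of the ratings a user gives.
        for u in users:
            given = [e for e in ratings if e[0] == u]
            F[u] = sum(R[e] for e in given) / len(given)
    return F, G, R
```

Flagging (point 3) then reduces to outlier detection on F: honest raters cluster near the top of the fairness range, while a user whose ratings persistently disagree with consensus is pulled downward.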
Datasets
Same bitcoin-otc and bitcoin-alpha networks as the 2016 paper, plus additional e-commerce review platforms for validation.
Results
Citation
Impact on Interaction Ledger PRD (#211)
This paper directly validates the PRD's Journey 2 (reputation farming detection) and provides concrete algorithms the PRD could implement:
1. The "build then exploit" pattern — exactly Journey 2
The paper confirms empirically what the PRD describes as reputation farming: users who "accumulate trust through small legitimate transactions before attempting a large fraud." The PRD's Journey 2 narrative (agent notices positive drift masking a pattern) is a literary version of what this paper measures mathematically. The PRD should cite this as empirical validation that the attack pattern is real and detectable.
2. Temporal trajectory analysis — a missing capability
REV2's most powerful feature is analyzing how trust scores evolve over time. The PRD stores timestamped assessments, which means the raw data for trajectory analysis exists. But the PRD doesn't define any temporal analytics — it treats the current score as the primary signal rather than analyzing score velocity, inflection points, or pattern shifts. Adding trajectory analysis (even a simple heuristic: "this peer's score has risen +3 in 48 hours after months of stability") would directly implement REV2's fraud detection approach.
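The simple heuristic described above can be sketched directly. All names and thresholds here are illustrative, not PRD fields; `history` is assumed to be a time-ordered list of (timestamp, score) pairs:

```python
from datetime import datetime, timedelta

def sudden_rise(history, jump=3.0, window=timedelta(hours=48),
                stable_for=timedelta(days=30), stable_band=1.0):
    """Flag a sharp score jump inside `window` after a long flat period.

    `history`: list of (datetime, score), oldest first. Thresholds are
    illustrative defaults, not calibrated values.
    """
    if len(history) < 2:
        return False
    last_t, last_s = history[-1]
    # Scores in the stable period immediately preceding the jump window.
    prior = [s for t, s in history
             if last_t - window - stable_for <= t <= last_t - window]
    if not prior:
        return False
    rose = last_s - sum(prior) / len(prior) >= jump
    stable = max(prior) - min(prior) <= stable_band
    return rose and stable
```

A velocity check like this costs one pass over the stored history and needs no changes to the assessment write path.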
3. Rating reliability — per-assessment quality scoring
REV2 introduces a "reliability" score for individual ratings, not just aggregate user scores. In the PRD's terms, this would mean scoring the quality of each individual assessment event — was this assessment consistent with the interaction evidence? Did the rationale support the score? This is more granular than the PRD's current model (one score per peer, updated over time) and could catch cases where an agent was manipulated into giving a single inflated assessment.
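In REV2's spirit, a per-assessment reliability score combines how fair the assessor is with how far the given score sits from the peer's current consensus. A minimal sketch, with hypothetical names and a score range of [-5, +5] assumed for illustration:

```python
def assessment_reliability(assessor_fairness, given_score, consensus_score,
                           score_range=10.0):
    """Reliability of one assessment event, in [0, 1].

    Both terms are in [0, 1]: the assessor's fairness, and agreement
    (disagreement normalized by the full score range, here 10 units
    for a [-5, +5] scale). Names are illustrative, not PRD fields.
    """
    agreement = 1.0 - abs(given_score - consensus_score) / score_range
    return 0.5 * (assessor_fairness + agreement)
```

A single manipulated assessment (high score, far from consensus, from an otherwise fair assessor) lands near 0.5 rather than 1.0, so it can be down-weighted before it moves the aggregate.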
4. The coordinated rating ring attack
REV2 identifies groups of users who mutually inflate each other's ratings. For agents in a cobot network, this translates to: if agent-A and agent-B consistently give each other high scores while both rating agent-C poorly, is that a legitimate trust cluster or a coordinated exclusion? The PRD doesn't address multi-agent collusion at all — it's out of scope for Phase 1 (local ledger) but becomes critical in Phase 3 (WoT aggregation). This paper provides the detection methodology.
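One minimal collusion signal consistent with this idea (a sketch only, not REV2's actual ring-detection method): flag pairs of agents who rate each other well above how the rest of the network rates them.

```python
from itertools import combinations

def mutual_inflation_pairs(ratings, margin=3.0):
    """Return agent pairs whose mutual ratings exceed the network's
    view of each member by at least `margin` (illustrative threshold).

    `ratings`: dict mapping (rater, ratee) -> score.
    """
    suspicious = []
    agents = {a for pair in ratings for a in pair}
    for a, b in combinations(sorted(agents), 2):
        ab, ba = ratings.get((a, b)), ratings.get((b, a))
        if ab is None or ba is None:
            continue
        # How the rest of the network (excluding the pair) rates each.
        others_b = [s for (r, t), s in ratings.items() if t == b and r != a]
        others_a = [s for (r, t), s in ratings.items() if t == a and r != b]
        if not others_a or not others_b:
            continue
        if (ab - sum(others_b) / len(others_b) >= margin and
                ba - sum(others_a) / len(others_a) >= margin):
            suspicious.append((a, b))
    return suspicious
```

Pairwise checks only catch two-agent rings; larger rings need the graph-level methods the paper and its related work describe, which is why this belongs in Phase 3 rather than the local ledger.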
5. Practical threshold for the PRD's scoring
The paper finds that 89% of bitcoin-otc edges are positive and fraud is concentrated in the remaining 11%. If cobot's agent network has a similar distribution, the PRD's default thresholds should be calibrated accordingly — most assessments will be positive, and the system's primary job is detecting the ~10% of bad actors. This argues for asymmetric scoring: be slow to trust (many positive interactions needed to reach +5) but quick to distrust (a single confirmed bad interaction justifies -3 or worse). The PRD's scoring guidelines don't specify this asymmetry.
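The asymmetry argued for above can be expressed as a single update rule; the constants here are illustrative, not PRD values:

```python
def update_score(score, outcome_good, lo=-10.0, hi=10.0,
                 gain=0.5, penalty=3.0):
    """Asymmetric trust update: slow to trust, quick to distrust.

    A good interaction adds a small `gain`; a confirmed bad one
    subtracts a large `penalty`. Constants are illustrative.
    """
    score += gain if outcome_good else -penalty
    return max(lo, min(hi, score))  # clamp to the score range
```

With these constants, reaching +5 from a neutral start takes ten clean interactions, while a single confirmed bad interaction costs -3 — matching the "slow to trust, quick to distrust" shape the distribution argues for.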
See: #211
nazim referenced this issue 2026-03-07 04:53:06 +00:00
nazim referenced this issue 2026-03-07 05:08:42 +00:00
How #211 handles this
Integrated as a Phase 2 feature with a specific implementation path. Reference [13] cites this paper.
Adoptions:
The data model is REV2-ready. The timestamped assessment history is the prerequisite for trajectory analysis. MVP collects the data; Phase 2 runs the algorithms. The 84.6% accuracy figure from the paper provides a concrete target.
Gap: The PRD doesn't address REV2's per-rating reliability scores (extending FG with individual rating quality, not just rater quality). This could be relevant for Phase 3 cross-agent queries.
David referenced this issue 2026-03-08 03:44:36 +00:00