reference: REV2 — Fraudulent User Prediction in Rating Platforms (Kumar et al., WSDM 2018) #220

Open
opened 2026-03-07 03:20:52 +00:00 by nazim · 2 comments
Contributor

Short Summary

Building on the fairness/goodness framework from the ICDM 2016 paper, REV2 extends the metrics to explicitly detect fraudulent users in rating platforms — demonstrating that temporal analysis of rating trajectories reveals gaming behavior in the bitcoin-otc network.

Detailed Summary

Authors: Srijan Kumar, Bryan Hooi, Disha Makhija, Mohit Kumar, Christos Faloutsos, V.S. Subrahmanian
Venue: 11th ACM International Conference on Web Search and Data Mining (WSDM), 2018
PDF: http://cs.stanford.edu/~srijan/pubs/rev2-wsdm18.pdf
Code & Data: https://cs.stanford.edu/~srijan/rev2/

Motivation

The ICDM 2016 paper introduced fairness and goodness metrics for predicting ratings. This follow-up asks a harder question: can these metrics detect fraud? In rating platforms (including bitcoin-otc), fraudulent users exhibit systematic patterns — they rate unfairly (diverging from what honest raters say), and they often work in coordinated groups to boost each other.

Key Innovation — REV2 Algorithm

REV2 extends the fairness/goodness (FG) framework with three key advances:

  1. Three interdependent scores: Fairness (of users as raters), Goodness (of users as ratees), and Reliability (of individual ratings). Each is defined in terms of the other two, and the system is solved by iterating to a fixed point.

  2. Temporal trajectory analysis: Instead of looking at static snapshots, REV2 analyzes how users' scores evolve over time. Fraudulent users show distinctive temporal patterns:

    • Initial period of legitimate-seeming activity (building reputation)
    • Sudden shift to exploitative behavior (cashing in on built reputation)
    • Coordinated bursts of mutual positive ratings (rating rings)

  3. Fraud detection as anomaly detection: Users whose fairness scores diverge significantly from the network mean are flagged as potential fraudsters. This works because honest users tend to cluster in fairness scores, while fraudsters are outliers.
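The three interdependent scores can be sketched as a fixed-point iteration. This is a simplified sketch, not the paper's exact formulation: the `gamma1`/`gamma2` blend weights, the update order, and the convergence tolerance are illustrative, and ratings are assumed to be rescaled to [-1, 1].

```python
def rev2(ratings, num_iters=100, gamma1=0.5, gamma2=0.5, tol=1e-6):
    """ratings: dict mapping (rater, ratee) -> score in [-1, 1].

    Returns per-user fairness, per-user goodness, and per-rating
    reliability, computed by iterating the three mutually defined
    updates until the fairness scores stop moving.
    """
    raters = {u for u, _ in ratings}
    ratees = {p for _, p in ratings}
    fairness = {u: 1.0 for u in raters}       # F(u) in [0, 1]
    goodness = {p: 0.0 for p in ratees}       # G(p) in [-1, 1]
    reliability = {e: 1.0 for e in ratings}   # R(u, p) in [0, 1]

    for _ in range(num_iters):
        prev = dict(fairness)
        # Goodness: reliability-weighted mean of incoming ratings.
        for p in ratees:
            incoming = [(e, s) for e, s in ratings.items() if e[1] == p]
            total_r = sum(reliability[e] for e, _ in incoming)
            goodness[p] = (sum(reliability[e] * s for e, s in incoming)
                           / total_r) if total_r else 0.0
        # Reliability: blend of the rater's fairness and how closely the
        # rating agrees with the consensus goodness of the ratee.
        for (u, p), s in ratings.items():
            agreement = 1.0 - abs(s - goodness[p]) / 2.0
            reliability[(u, p)] = ((gamma1 * fairness[u] + gamma2 * agreement)
                                   / (gamma1 + gamma2))
        # Fairness: mean reliability of the user's outgoing ratings.
        for u in raters:
            outgoing = [reliability[e] for e in ratings if e[0] == u]
            fairness[u] = sum(outgoing) / len(outgoing)
        if max(abs(fairness[u] - prev[u]) for u in raters) < tol:
            break
    return fairness, goodness, reliability
```

A rater whose scores consistently disagree with the reliability-weighted consensus ends up with low-reliability ratings and therefore low fairness, which is what the anomaly-detection step in point 3 thresholds on.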

Datasets

Same bitcoin-otc and bitcoin-alpha networks as the 2016 paper, plus additional e-commerce review platforms for validation.

Results

  • REV2 outperformed prior fraud detection methods across all tested networks
  • Confirmed that bitcoin-otc contains identifiable fraud patterns even in a relatively high-trust community (89% positive edges)
  • Temporal trajectories revealed the "build then exploit" pattern — users who accumulate trust through small legitimate transactions before attempting a large fraud
  • Rating reliability scores identified individual ratings that were likely fraudulent even when the rater's overall fairness was moderate

Citation

@inproceedings{kumar2018rev2,
  title={Rev2: Fraudulent user prediction in rating platforms},
  author={Kumar, Srijan and Hooi, Bryan and Makhija, Disha and Kumar, Mohit and Faloutsos, Christos and Subrahmanian, VS},
  booktitle={Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining},
  pages={333--341},
  year={2018},
  organization={ACM}
}
Author
Contributor

Impact on Interaction Ledger PRD (#211)

This paper directly validates the PRD's Journey 2 (reputation farming detection) and provides concrete algorithms the PRD could implement:

1. The "build then exploit" pattern — exactly Journey 2

The paper confirms empirically what the PRD describes as reputation farming: users who "accumulate trust through small legitimate transactions before attempting a large fraud." The PRD's Journey 2 narrative (agent notices positive drift masking a pattern) is a literary version of what this paper measures mathematically. The PRD should cite this as empirical validation that the attack pattern is real and detectable.

2. Temporal trajectory analysis — a missing capability

REV2's most powerful feature is analyzing how trust scores evolve over time. The PRD stores timestamped assessments, so the raw data for trajectory analysis already exists. But the PRD doesn't define any temporal analytics — it treats the current score as the primary signal rather than analyzing score velocity, inflection points, or pattern shifts. Adding even a simple trajectory check ("this peer's score has risen +3 in 48 hours after months of stability") would directly implement REV2's fraud detection approach.
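A minimal version of that trajectory check could look like the sketch below. The representation (a list of `(timestamp, score)` tuples) and the 48-hour / +3 thresholds mirror the example in the paragraph above and are placeholders, not values from the paper or the PRD.

```python
from datetime import timedelta

def sudden_rise(history, window=timedelta(hours=48), jump=3.0):
    """Flag a peer whose score rose by `jump` or more within `window`.

    history: iterable of (datetime, score) tuples, in any order.
    """
    history = sorted(history)
    for i, (t0, s0) in enumerate(history):
        for t1, s1 in history[i + 1:]:
            if t1 - t0 > window:
                break  # later points are even further out of the window
            if s1 - s0 >= jump:
                return True
    return False
```

This only catches the simplest "build then exploit" signature (a sharp rise); velocity smoothing or change-point detection would be natural extensions.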

3. Rating reliability — per-assessment quality scoring

REV2 introduces a "reliability" score for individual ratings, not just aggregate user scores. In the PRD's terms, this would mean scoring the quality of each individual assessment event — was this assessment consistent with the interaction evidence? Did the rationale support the score? This is more granular than the PRD's current model (one score per peer, updated over time) and could catch cases where an agent was manipulated into giving a single inflated assessment.
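A per-assessment quality score in REV2's style could be computed as below. This is a sketch of the reliability idea only: the `gamma1`/`gamma2` weights are illustrative, and it assumes scores rescaled to [-1, 1] with `consensus` being the current goodness estimate for the ratee.

```python
def assessment_reliability(rater_fairness, score, consensus,
                           gamma1=0.5, gamma2=0.5):
    """Reliability of one rating: the rater's fairness blended with the
    rating's agreement with the current consensus (all scores in [-1, 1])."""
    agreement = 1.0 - abs(score - consensus) / 2.0
    return (gamma1 * rater_fairness + gamma2 * agreement) / (gamma1 + gamma2)
```

An otherwise fair rater who emits one assessment far from consensus gets a low reliability on that single assessment — the "manipulated into one inflated rating" case described above.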

4. The coordinated rating ring attack

REV2 identifies groups of users who mutually inflate each other's ratings. For agents in a cobot network, this translates to: if agent-A and agent-B consistently give each other high scores while both rating agent-C poorly, is that a legitimate trust cluster or a coordinated exclusion? The PRD doesn't address multi-agent collusion at all — it's out of scope for Phase 1 (local ledger) but becomes critical in Phase 3 (WoT aggregation). This paper provides the detection methodology.
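The dyadic core of ring detection can be sketched naively as below. This is not REV2's actual ring-detection machinery — just the "mutual high ratings" predicate from the paragraph above, with an illustrative `high` threshold; a real implementation would extend it to larger cliques and factor in the fairness scores.

```python
def mutual_boost_pairs(ratings, high=0.8):
    """ratings: dict mapping (rater, ratee) -> score in [-1, 1].

    Return the set of unordered pairs that rate each other >= `high` —
    candidates for coordinated mutual inflation.
    """
    pairs = set()
    for (u, v), s in ratings.items():
        if s >= high and ratings.get((v, u), -1.0) >= high:
            pairs.add(tuple(sorted((u, v))))
    return pairs
```

Flagged pairs are candidates, not verdicts: a legitimate long-running collaboration also produces mutual high ratings, so this signal needs to be combined with the trajectory and reliability evidence.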

5. Practical threshold for the PRD's scoring

The paper finds that 89% of bitcoin-otc edges are positive and fraud is concentrated in the remaining 11%. If cobot's agent network has a similar distribution, the PRD's default thresholds should be calibrated accordingly — most assessments will be positive, and the system's primary job is detecting the ~10% of bad actors. This argues for asymmetric scoring: be slow to trust (many positive interactions needed to reach +5) but quick to distrust (a single confirmed bad interaction justifies -3 or worse). The PRD's scoring guidelines don't specify this asymmetry.
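The proposed asymmetry could be sketched as a simple update rule. The step sizes and the [-5, +5] range are illustrative placeholders (the PRD does not specify them), chosen only to show slow gain and fast loss:

```python
def update_score(score, outcome_good, pos_step=0.25, neg_step=3.0,
                 lo=-5.0, hi=5.0):
    """Asymmetric trust update: small positive steps, large negative ones."""
    if outcome_good:
        score = min(hi, score + pos_step)   # slow to trust
    else:
        score = max(lo, score - neg_step)   # quick to distrust
    return score
```

With these constants, reaching +5 takes twenty clean interactions, while a single confirmed bad interaction costs three points — the asymmetry argued for above.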

See: #211

Collaborator

How #211 handles this

Integrated as Phase 2 feature with specific implementation path. Reference [13] cites this paper.

Adoptions:

  • Journey 2 (Reputation Farmer) is the REV2 scenario. The PRD's user journey matches the paper's empirical finding: "steady positive scores followed by a sharp negative" trajectory.
  • Phase 2 feature table: "REV2 trajectory analysis — track assessment score velocity per peer, flag 'build then exploit' trajectories. Empirically validated at 84.6% accuracy on Flipkart (127/150 flagged users confirmed fraudulent)."
  • Assessment time series data model: The schema stores assessments with timestamps, which is exactly what REV2 needs for temporal trajectory analysis.
  • Score computation research (Phase 2): The PRD flags investigating whether "REV2's behavioral anomaly detection can be integrated as a penalty (e.g., if interaction patterns are 'bursty' or suspiciously regular, discount the score)."

The data model is REV2-ready. The timestamped assessment history is the prerequisite for trajectory analysis. MVP collects the data; Phase 2 runs the algorithms. The 84.6% accuracy figure from the paper provides a concrete target.

Gap: The PRD doesn't address REV2's per-rating reliability scores (extending FG with individual rating quality, not just rater quality). This could be relevant for Phase 3 cross-agent queries.
