Standard Dataset

Amazon

Citation Author(s):
Han geng
Submitted by:
fu shuo
DOI:
10.21227/a1xp-yx25

Abstract

Graph Neural Networks (GNNs) have become the predominant approach to graph fraud detection because they handle graph-structured data natively and capture complex relational patterns in fraudulent behavior. However, existing GNN-based graph fraud detection models have limitations: homophily-based models struggle with the heterogeneous relationships in fraud graphs, while heterophily-based models typically model only a single space (attribute or structural), which constrains detection performance. To address these issues, this paper introduces DualH-FDNet, a semi-supervised graph fraud detection model based on dual-space heterogeneous relation analysis. The model represents user relationships as multi-relational heterogeneous directed graphs and employs a multi-layer graph convolutional architecture. Each convolutional layer consists of three modules: (1) Heterogeneity Learning Module: it uses the label information of labeled nodes in the relational subgraphs to learn heterogeneity separately in the attribute space and the structural space, and lets the two spaces interact through a weighted fusion strategy. (2) Cross-Space Graph Aggregation Module: it computes attention weights from the fused heterophily representations and updates node representations via multi-relational graph aggregation. (3) Prototype-Guided Classification Module: it constructs category prototypes from labeled node representations and their labels, and guides the classification of unlabeled nodes through prototype learning. Additionally, to address scarce labeled data and label imbalance, the model uses balanced sampling strategies for semi-supervised training. Experimental results show that, compared with the best-performing of nine baseline models, DualH-FDNet improves Recall by 0.9626% and 0.6444% and AUC by 0.8594% and 0.1479% on the YelpChi and Amazon datasets, respectively.
This study offers a novel solution for fraud detection in complex heterogeneous graph environments. The code and data are available at https://github.com/AyomF/DualH-FDNet.
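
As a rough illustration of the prototype idea described for module (3), the sketch below builds one prototype per class as the mean of the labeled node representations and assigns an unlabeled node to the nearest prototype. This is an assumption made for illustration only: the abstract does not say how DualH-FDNet forms or uses its prototypes, and the function names (`class_prototypes`, `predict`) and the toy 2-D embeddings are hypothetical.

```python
import math

def class_prototypes(embeddings, labels):
    """Return {label: mean embedding} computed from labeled nodes only.

    Mean-of-class prototypes are an assumption here, not the paper's
    confirmed method.
    """
    sums, counts = {}, {}
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def predict(vec, prototypes):
    """Assign the label whose prototype is closest in Euclidean distance."""
    return min(prototypes, key=lambda label: math.dist(vec, prototypes[label]))

# Toy, made-up 2-D node representations: label 0 = benign, 1 = fraudulent.
labeled_embeddings = [[0.0, 0.1], [0.2, 0.0], [1.0, 0.9], [0.8, 1.1]]
labeled_classes = [0, 0, 1, 1]

prototypes = class_prototypes(labeled_embeddings, labeled_classes)
pred_a = predict([0.9, 0.8], prototypes)  # lies near the class-1 prototype
pred_b = predict([0.1, 0.2], prototypes)  # lies near the class-0 prototype
```

In a trained model the representations would come from the graph convolutional layers rather than being written by hand; the nearest-prototype rule is only one of several ways prototypes can guide classification.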

Instructions:

The Amazon dataset contains user review data for musical instruments, with users labeled as "1" (fraudulent) or "0" (benign). The relationship types include: U-P-U (User-Product-User), which refers to users who reviewed the same product; U-S-U (User-Star-User), which refers to users who gave the same star rating to the same product within a week; and U-V-U (User-Text Similarity-User), which refers to users whose review texts are in the top 5% of similarity rankings. The dataset comprises 11,944 nodes, with fraudulent nodes accounting for 6.87%.
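
The relation definitions above can be sketched in code. The example below derives U-P-U and U-S-U edges from a handful of made-up review records; the record layout `(user_id, product_id, star_rating, day_index)`, the function name `build_relations`, and the reading of "within a week" as a gap of at most 7 days are all assumptions, since no files or schema are published on this page. Edges are stored as unordered user pairs for simplicity. U-V-U is omitted because it depends on a text-similarity measure that is not specified here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical review records: (user_id, product_id, star_rating, day_index).
reviews = [
    ("u1", "p1", 5, 0),
    ("u2", "p1", 5, 3),
    ("u3", "p1", 1, 2),
    ("u4", "p1", 5, 20),
    ("u1", "p2", 4, 1),
]

def build_relations(reviews, window_days=7):
    """Return (U-P-U edges, U-S-U edges) as sets of sorted user pairs."""
    by_product = defaultdict(list)
    for user, product, stars, day in reviews:
        by_product[product].append((user, stars, day))

    upu, usu = set(), set()
    for entries in by_product.values():
        for (user_a, stars_a, day_a), (user_b, stars_b, day_b) in combinations(entries, 2):
            if user_a == user_b:
                continue  # no self-loops when one user reviews a product twice
            edge = tuple(sorted((user_a, user_b)))
            upu.add(edge)  # U-P-U: both users reviewed the same product
            # U-S-U: same star rating on the same product within the window
            if stars_a == stars_b and abs(day_a - day_b) <= window_days:
                usu.add(edge)
    return upu, usu

upu_edges, usu_edges = build_relations(reviews)
```

With these toy records, every pair of the four reviewers of `p1` is linked under U-P-U, while only `u1` and `u2` are linked under U-S-U, because `u4` gave the same rating more than a week later.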

Dataset Files

Files have not been uploaded for this dataset