Amazon

- Citation Author(s): Han geng
- Submitted by: fu shuo
- DOI: 10.21227/a1xp-yx25
Abstract
Graph Neural Networks (GNNs) have become the predominant approach to graph fraud detection because they handle graph-structured data natively and capture complex relational patterns in fraudulent behavior. However, existing GNN-based models have limitations: homophily-based models struggle with heterogeneous relationships in fraud graphs, while heterophily-based models typically model only a single space (attribute or structural), which constrains detection performance. To address these issues, this paper introduces DualH-FDNet, a semi-supervised graph fraud detection model based on dual-space heterogeneous relation analysis. The model represents user relationships as multi-relational heterogeneous directed graphs and employs a multi-layer graph convolutional architecture. Each convolutional layer consists of three modules: (1) Heterogeneity Learning Module: uses the label information of labeled nodes in relational subgraphs to learn heterogeneity separately in the attribute space and the structural space, and lets the two spaces interact through a weighted fusion strategy. (2) Cross-Space Graph Aggregation Module: computes attention weights from the fused heterophily representations and updates node representations via multi-relational graph aggregation. (3) Prototype-Guided Classification Module: constructs category prototypes from labeled node representations and their labels, and guides the classification of unlabeled nodes through prototype learning. Additionally, to address scarce labeled data and label imbalance, the model uses balanced sampling strategies for semi-supervised training. Experimental results show that, on the YelpChi and Amazon datasets, DualH-FDNet improves Recall by 0.9626% and 0.6444% and AUC by 0.8594% and 0.1479%, respectively, over the best-performing of nine baseline models.
This study offers a novel solution for fraud detection in complex heterogeneous graph environments. The code and data are available at https://github.com/AyomF/DualH-FDNet.
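The prototype-guided classification and balanced sampling mentioned in the abstract can be illustrated with a minimal sketch. It is not the DualH-FDNet implementation: the abstract does not say how prototypes are built or compared, so the sketch assumes a prototype is the mean embedding of a class's labeled nodes and that nodes are assigned to the nearest prototype by squared Euclidean distance. The function names and toy embeddings are hypothetical.

```python
import random

def class_prototypes(embeddings, labels):
    """Mean embedding per class over labeled nodes (assumed prototype definition)."""
    sums, counts = {}, {}
    for vec, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(vec))
        for i, v in enumerate(vec):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}

def nearest_prototype(vec, prototypes):
    """Assign the class whose prototype is closest (squared Euclidean, an assumption)."""
    def sq_dist(p):
        return sum((a - b) ** 2 for a, b in zip(vec, p))
    return min(prototypes, key=lambda y: sq_dist(prototypes[y]))

def balanced_sample(labels, per_class, seed=0):
    """Draw the same number of labeled node indices from each class."""
    rng = random.Random(seed)
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    picked = []
    for idxs in by_class.values():
        picked += rng.sample(idxs, min(per_class, len(idxs)))
    return picked

# Toy 2-D embeddings: label 1 = fraudulent, 0 = benign (imbalanced on purpose).
emb = [[0.0, 0.0], [0.2, 0.0], [1.0, 1.0], [0.1, 0.1], [0.0, 0.2]]
lab = [0, 0, 1, 0, 0]

protos = class_prototypes(emb, lab)
pred = nearest_prototype([0.9, 0.8], protos)   # unlabeled node near the fraud prototype
batch = balanced_sample(lab, per_class=1)      # one benign and one fraudulent index
```

Sampling an equal number of nodes per class is one simple way to keep the minority (fraudulent) class from being swamped during training; the actual model may use a different balancing scheme.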
Instructions:
The Amazon dataset contains user reviews of musical instruments, with each user labeled "1" (fraudulent) or "0" (benign). It defines three relation types: U-P-U (User-Product-User) connects users who reviewed the same product; U-S-U (User-Star-User) connects users who gave the same star rating to the same product within one week; and U-V-U (User-Text Similarity-User) connects users whose review texts rank in the top 5% by similarity. The dataset comprises 11,944 nodes, of which 6.87% are fraudulent.
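As a rough illustration of how the U-P-U and U-S-U relations described above could be derived from raw reviews, the following sketch builds both edge sets from a few toy records. The record layout `(user, product, stars, day)` and the function name are hypothetical and do not reflect the released file format. Edges are treated as undirected for simplicity, and U-V-U is omitted because it requires a text-similarity measure that is not specified here.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy records: (user, product, stars, day). Not the real dataset schema.
reviews = [
    ("u1", "p1", 5, 1),
    ("u2", "p1", 5, 3),
    ("u3", "p1", 1, 40),
    ("u4", "p2", 5, 2),
]

def build_relations(reviews, window_days=7):
    """Return undirected edge sets for the U-P-U and U-S-U relations."""
    by_product = defaultdict(list)
    for user, product, stars, day in reviews:
        by_product[product].append((user, stars, day))

    upu, usu = set(), set()
    for entries in by_product.values():
        for (ua, sa, da), (ub, sb, db) in combinations(entries, 2):
            if ua == ub:
                continue  # skip a user's own repeated reviews
            edge = tuple(sorted((ua, ub)))
            upu.add(edge)  # both users reviewed the same product
            if sa == sb and abs(da - db) <= window_days:
                usu.add(edge)  # same star rating, same product, within the window
    return {"U-P-U": upu, "U-S-U": usu}

relations = build_relations(reviews)
```

In this toy example, `u1`, `u2`, and `u3` are pairwise linked under U-P-U because they all reviewed `p1`, while only `u1` and `u2` are linked under U-S-U, since `u3` gave a different rating well outside the one-week window.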