RF-GCN_Dataset

Citation Author(s): Shiping Wang
Submitted by: Shiping Wang
Last updated: Wed, 10/23/2024 - 01:57
DOI: 10.21227/kfta-yy54

Abstract 

Cora, Citeseer, and Pubmed are three citation networks for research papers, where nodes represent publications and edges denote citation links. Node attributes consist of bag-of-words representations of the papers.
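As a rough illustration of this structure, the sketch below builds a bag-of-words feature matrix and an adjacency matrix for a toy citation graph. The three paper texts, the citation pairs, and the helper name `bag_of_words` are all hypothetical and are not taken from Cora, Citeseer, or Pubmed; treating citation links as undirected is also an assumption.

```python
from collections import Counter

# Hypothetical toy corpus: three "papers" with made-up text.
papers = {
    0: "graph neural network",
    1: "graph convolution network",
    2: "neural language model",
}

# Hypothetical citation links as (citing paper, cited paper) pairs.
citations = [(1, 0), (2, 0)]

# Vocabulary: every distinct word in the corpus, in sorted order.
vocab = sorted({word for text in papers.values() for word in text.split()})


def bag_of_words(text, vocab):
    """Return the word-count vector of `text` over `vocab`."""
    counts = Counter(text.split())
    return [counts[word] for word in vocab]


# Node attribute matrix: one bag-of-words row per paper.
features = [bag_of_words(papers[i], vocab) for i in sorted(papers)]

# Adjacency matrix; citation links are stored as undirected (assumption).
n = len(papers)
adjacency = [[0] * n for _ in range(n)]
for citing, cited in citations:
    adjacency[citing][cited] = 1
    adjacency[cited][citing] = 1
```

Each row of `features` describes one node, and each nonzero entry of `adjacency` marks one citation link.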

ACM is a paper network where nodes represent papers connected by edges if two papers share the same author. The network is characterized by features that include bag-of-words representations of paper keywords.
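The shared-author rule can be sketched as follows. The paper IDs, author names, and the function `shared_author_edges` are hypothetical and serve only to illustrate the edge definition; this is not the procedure used to build the actual ACM network.

```python
from itertools import combinations

# Hypothetical mapping from paper ID to its set of authors.
paper_authors = {
    "p1": {"alice", "bob"},
    "p2": {"bob"},
    "p3": {"carol"},
    "p4": {"carol", "alice"},
}


def shared_author_edges(paper_authors):
    """Connect two papers whenever their author sets intersect."""
    edges = set()
    for a, b in combinations(sorted(paper_authors), 2):
        if paper_authors[a] & paper_authors[b]:
            edges.add((a, b))
    return edges


edges = shared_author_edges(paper_authors)
```

The pairwise check is quadratic in the number of papers, which is fine for a toy example; a larger network would more likely be built from an author-to-papers index.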

BlogCatalog is a social network where nodes represent users and edges represent their relationships, with labels indicating the topic categories provided by the users.

Flickr is a social network derived from an image and video hosting website. In this network, nodes denote users, edges denote their relationships, and all nodes are categorized into nine classes.

Cornell, Texas, and Wisconsin are three sub-datasets. Nodes represent web pages, and edges indicate hyperlinks between them. The node features consist of the bag-of-words representation of web pages. These web pages are manually categorized into five groups: student, project, course, staff, and faculty.

Film represents the actor-only subgraph of the film-director-actor-writer network. Each node represents an actor, and an edge between two nodes indicates their co-occurrence on the same Wikipedia page.

IMDB-BINARY and IMDB-MULTI are datasets where each graph represents an actor’s ego-network, with nodes as actors and edges as co-appearances in movies, categorized by movie genre.
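For readers unfamiliar with the term, an ego-network is the subgraph formed by one node, its direct neighbors, and the edges among them. A minimal sketch, assuming an undirected graph stored as a dictionary of neighbor sets; the actor names and the function `ego_network` are hypothetical and not part of the dataset:

```python
# Hypothetical co-appearance graph: actor -> set of co-stars.
co_appearances = {
    "A": {"B", "C"},
    "B": {"A", "C"},
    "C": {"A", "B", "D"},
    "D": {"C"},
}


def ego_network(adj, center):
    """Return the nodes and edges of the 1-hop ego-network of `center`."""
    nodes = {center} | adj[center]
    # Keep each undirected edge once, ordered as (smaller, larger).
    edges = {(u, v) for u in nodes for v in adj[u] if v in nodes and u < v}
    return nodes, edges


nodes, edges = ego_network(co_appearances, "A")
```

Here actor "D" is excluded from the ego-network of "A" because the two never appear together, even though both have appeared with "C".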

COLLAB is a scientific collaboration dataset derived from three public sources. Each graph in the dataset represents a researcher’s ego-network. As with the movie datasets, these graphs are categorized according to the researchers’ domains.

MUTAG is a bioinformatics dataset containing 188 mutagenic aromatic and heteroaromatic nitro compounds with 7 labels.

PROTEINS is a bioinformatics dataset where nodes denote secondary structure elements with 3 labels. An edge between any two nodes indicates that they are neighbors in the amino acid sequence or 3D space.

PTC is a bioinformatics dataset comprising 344 chemical compounds that reports carcinogenicity for male and female rats; it includes 19 discrete labels.

 
