Standard Dataset
DKS-Dataset
- Submitted by: Bin Yang
- Last updated: Mon, 07/08/2024 - 04:34
- DOI: 10.21227/qex9-kq42
Abstract
Most existing work on graph keyword search assumes that the graph data is complete and clean, that is, the graph contains no missing information (such as keywords or edges) and no contaminated information (such as keywords). However, real-world graphs often contain missing or contaminated information, which makes keyword search on them much more challenging. We provide this dataset for keyword search on dirty graphs.
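To illustrate why missing keywords matter, the following minimal sketch builds a naive exact-match inverted index from vertex keywords and shows that a vertex whose keyword has gone missing can no longer be retrieved. It is not the dataset authors' method; the toy graph, vertex IDs, and the function names `build_index` and `lookup` are hypothetical.

```python
from collections import defaultdict

def build_index(vertex_keywords):
    """Map each keyword to the set of vertices that carry it (exact match only)."""
    index = defaultdict(set)
    for vertex, keywords in vertex_keywords.items():
        for kw in keywords:
            index[kw].add(vertex)
    return index

def lookup(index, query):
    """Return the vertices that contain every query keyword."""
    sets = [index.get(kw, set()) for kw in query]
    if not sets:
        return set()
    return set.intersection(*sets)

# Hypothetical clean graph: vertex -> keyword set
clean = {0: {"graph", "search"}, 1: {"graph", "mining"}, 2: {"search"}}
# The same graph with the keyword "search" missing from vertex 0
dirty = {0: {"graph"}, 1: {"graph", "mining"}, 2: {"search"}}

print(lookup(build_index(clean), ["graph", "search"]))  # {0}
print(lookup(build_index(dirty), ["graph", "search"]))  # set()
```

An exact-match index has no way to recover the lost keyword, which is the gap that search methods for dirty graphs aim to address.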
The dataset contains five standard real-world graphs from various domains. CiteSeer is a standard citation network, where vertices represent documents, edges represent citation links, and keywords are the bag-of-words representation of the papers. Cornell and Wisconsin are two subsets of a webpage dataset collected from the computer science departments of various universities, where vertices denote web pages, edges denote hyperlinks between them, and keywords are the bag-of-words representation of the web pages. Toy and Video are co-purchase networks, where vertices denote products, keywords are product features, and an edge connects two products if they were purchased by the same customer.
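The co-purchase rule for Toy and Video (an edge between two products bought by the same customer) can be sketched as follows. The input format, a mapping from a customer ID to purchased product IDs, and the name `co_purchase_edges` are assumptions for illustration; the released files may be organized differently.

```python
from itertools import combinations

def co_purchase_edges(purchases):
    """Build undirected edges between products bought by the same customer.

    purchases: dict mapping a customer ID to an iterable of product IDs.
    Returns a set of (u, v) tuples with u < v, so each edge appears once.
    """
    edges = set()
    for products in purchases.values():
        # Deduplicate and sort so every pair is emitted in canonical order.
        for u, v in combinations(sorted(set(products)), 2):
            edges.add((u, v))
    return edges

purchases = {"c1": ["p1", "p2", "p3"], "c2": ["p2", "p4"], "c3": ["p3", "p1"]}
print(sorted(co_purchase_edges(purchases)))
# [('p1', 'p2'), ('p1', 'p3'), ('p2', 'p3'), ('p2', 'p4')]
```

Storing edges as sorted tuples in a set keeps the graph undirected and collapses duplicates, such as the pair (p1, p3) bought by both c1 and c3.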