Artificial Intelligence
The dataset is used to detect essential proteins in uncertain protein-protein interaction (PPI) networks.
- Categories:
The PPI datasets were collected from four different sources: DIP, MIPS, Gavin, and Krogan. All self-interactions and repeated interactions were filtered out. The essential proteins were collected from four different databases: MIPS, SGD, DEG, and SGDP (http://www.sequence.stanford.edu/group/). Gene expression data were downloaded from the Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo/) under accession number GSE3431.
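The filtering step described above can be sketched as follows. This is a minimal illustration, not the authors' actual pipeline: the function name `clean_interactions`, the placeholder protein identifiers, and the choice to treat interactions as undirected (so that A-B and B-A count as the same interaction) are all assumptions.

```python
def clean_interactions(pairs):
    """Drop self-interactions and repeated interactions from a list of protein pairs.

    Assumption: interactions are undirected, so (a, b) and (b, a) are duplicates.
    """
    seen = set()
    cleaned = []
    for a, b in pairs:
        if a == b:
            continue  # self-interaction: a protein paired with itself
        key = tuple(sorted((a, b)))  # canonical order makes A-B equal to B-A
        if key in seen:
            continue  # repeated interaction
        seen.add(key)
        cleaned.append(key)
    return cleaned


# Hypothetical placeholder identifiers, not taken from the dataset.
raw = [("P1", "P2"), ("P2", "P1"), ("P1", "P1"), ("P3", "P1")]
cleaned = clean_interactions(raw)
print(cleaned)
```

Each source network (DIP, MIPS, Gavin, Krogan) would be passed through such a step separately, since the description lists them as distinct datasets.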
- Categories:
WIKIBIOCN is a Chinese dataset for table-to-text generation that includes 33,244 biography sentences with related tables from Chinese Wikipedia (July 2018).
The dataset is divided into a training set (30,000), a validation set (1,000), and a test set (2,244).
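A split with these sizes could be reproduced along the following lines. This is a sketch only: the description does not say how the records were assigned to each set, so the seeded random shuffle and the function name `split_dataset` are assumptions, and the integer range stands in for the real sentence-table pairs.

```python
import random


def split_dataset(records, n_train=30000, n_valid=1000, seed=0):
    """Shuffle records and cut them into train / validation / test parts.

    Assumption: a seeded random shuffle; the actual assignment procedure
    is not described in the dataset summary.
    """
    records = list(records)
    random.Random(seed).shuffle(records)
    train = records[:n_train]
    valid = records[n_train:n_train + n_valid]
    test = records[n_train + n_valid:]  # whatever remains after the first two cuts
    return train, valid, test


# Stand-in for the 33,244 sentence-table pairs.
train, valid, test = split_dataset(range(33244))
print(len(train), len(valid), len(test))
```

With 33,244 records, the remainder left for the test set is 2,244, which matches the sizes stated above.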
- Categories:
We crawled a large number of biomedical articles from PubMed to evaluate keyphrase extraction systems.
The articles, each consisting of a title, an abstract, and author-provided keyphrases, are used for the experiments.
For our paper, we selected cancer-related biomedical articles.
- Categories:
Since no image-based personality dataset exists, we used the ChaLearn dataset to create a new dataset with the characteristics required for this work: selfie images in which only one person appears with a visible face, labeled with that person's apparent personality in the photo.
- Categories:
These datasets are used to detect intrusions on the Controller Area Network (CAN) bus. Intrusions are detected using various machine learning and deep learning algorithms.
- Categories:
With the development of audio synthesis techniques, state-of-the-art synthesis methods based on Generative Adversarial Networks (GANs) have been proposed. Whether automatic speaker verification (ASV) systems are vulnerable to GAN-based synthesized audio urgently needs to be verified. We present a publicly available set of GAN-based synthesized audio generated by several open-source schemes (WaveGAN, TifGAN, GANSynth, MelGAN), which allows researchers to assess the impact of GAN-synthesized audio on the security of ASV systems.
- Categories:
ArPC is an Arabic paraphrase identification corpus. It consists of 1,331 sentence pairs, each with a binary score that indicates whether the pair is a paraphrase or not. The corpus was manually annotated by three native Arabic speakers.
- Categories: