Standard Dataset

DSE DATASET

Citation Author(s):
Fang Jing Hao
Submitted by:
Fang Hao
Last updated:
DOI:
10.21227/7mtj-v817
Data Format:

Abstract

This study uses two datasets. The old dataset, stored in the "olddata" folder, draws on datasets from previous research by Luo et al. and Wan et al. The new dataset, stored in the "newdata" folder, covers 680 antineoplastic drugs retrieved from a database. Three other bioentities (Protein, Disease, and ADR) are recompiled from multiple public databases, including the mentioned database, another database, the Comparative Toxicogenomics Database, and the SIDER database.

Both datasets represent the relationships among drugs, adverse drug reactions (ADRs), diseases, and targets (proteins) as binary matrices (0 for no interaction, 1 for interaction). Each consists of four types of nodes and six types of edges. The new dataset, built from more recent data and from different sources than the old one, has fewer drug and ADR nodes and fewer drug-ADR and drug-protein edges, but more protein-protein and protein-disease edges. This makes it useful for testing whether ADRs can be predicted from other biological process information when drug-ADR facts are scarce. These details should help IEEE DataPort users apply the datasets to research on antineoplastic drugs and bioentity interactions.

Instructions:

Data description

olddata: the first (old) dataset.

  • drug.txt: Drug name

  • protein.txt: Protein name

  • disease.txt: Disease name

  • se.txt: Side effect name

  • drug_dict_map: Mapping table of drug names and DrugBank IDs

  • protein_dict_map: Mapping table of protein names and UniProt IDs

  • mat_drug_se.txt: Drug-side effect relationship matrix

  • mat_protein_protein.txt: Protein-protein relationship matrix

  • mat_drug_drug.txt: Drug-drug relationship matrix

  • mat_protein_disease.txt: Protein-disease relationship matrix

  • mat_drug_disease.txt: Drug-disease relationship matrix

  • mat_drug_protein.txt: Drug-protein relationship matrix
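The name lists and relationship matrices above can be combined to recover named interaction pairs. The following is a minimal sketch, not the authors' loading code. It assumes that each matrix file is a whitespace-delimited grid of 0/1 integers and that row and column order follows the corresponding name files (for example, rows of mat_drug_se.txt follow drug.txt and columns follow se.txt). Neither assumption is stated on this page, and the helper names `load_names`, `load_binary_matrix`, and `positive_pairs` are hypothetical.

```python
import numpy as np


def load_names(path):
    """Read one entity name per line, skipping blank lines."""
    with open(path, encoding="utf-8") as fh:
        return [line.strip() for line in fh if line.strip()]


def load_binary_matrix(path):
    """Load a whitespace-delimited 0/1 matrix (assumed layout) as a 2-D int array."""
    mat = np.loadtxt(path, dtype=int, ndmin=2)
    if not np.isin(mat, (0, 1)).all():
        raise ValueError(f"{path} contains values other than 0 and 1")
    return mat


def positive_pairs(mat, row_names, col_names):
    """Return (row_name, col_name) for every entry equal to 1.

    Assumes matrix rows and columns follow the order of the name lists.
    """
    if mat.shape != (len(row_names), len(col_names)):
        raise ValueError(
            f"matrix shape {mat.shape} does not match "
            f"{len(row_names)} row names and {len(col_names)} column names"
        )
    rows, cols = np.nonzero(mat)
    return [(row_names[r], col_names[c]) for r, c in zip(rows, cols)]
```

A typical call would load `olddata/drug.txt`, `olddata/se.txt`, and `olddata/mat_drug_se.txt`, then pass all three to `positive_pairs`. The shape check fails early if the ordering assumption does not hold for a given file.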

newdata: the second (new) dataset.

  • all_data.xlsx: All data for the second dataset

  • mat_drug_se.txt: Drug-side effect relationship matrix

  • mat_protein_protein.txt: Protein-protein relationship matrix

  • mat_drug_drug.txt: Drug-drug relationship matrix

  • mat_protein_disease.txt: Protein-disease relationship matrix

  • mat_drug_disease.txt: Drug-disease relationship matrix

  • mat_drug_protein.txt: Drug-protein relationship matrix
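Because olddata and newdata share the same six matrix file names, their sizes and edge counts can be compared directly. The sketch below is one possible way to do this, assuming whitespace-delimited 0/1 matrices; the function name `summarize_folder` is hypothetical. It counts nonzero entries, so if the drug-drug or protein-protein matrices are stored symmetrically, each undirected pair is counted twice.

```python
from pathlib import Path

import numpy as np

# The six relationship matrices listed for both olddata and newdata.
MATRIX_FILES = [
    "mat_drug_se.txt",
    "mat_protein_protein.txt",
    "mat_drug_drug.txt",
    "mat_protein_disease.txt",
    "mat_drug_disease.txt",
    "mat_drug_protein.txt",
]


def summarize_folder(folder):
    """Return shape, number of nonzero entries, and density per matrix file.

    Files missing from the folder are skipped rather than treated as errors.
    """
    summary = {}
    for name in MATRIX_FILES:
        path = Path(folder) / name
        if not path.exists():
            continue
        mat = np.loadtxt(path, ndmin=2)  # assumed whitespace-delimited layout
        ones = int(np.count_nonzero(mat))
        summary[name] = {
            "shape": mat.shape,
            "ones": ones,
            "density": ones / mat.size,
        }
    return summary
```

Calling `summarize_folder("olddata")` and `summarize_folder("newdata")` and comparing the two dictionaries key by key gives a quick check of the node and edge differences described in the abstract.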

data.

  • takeToDict_data.xlsx: ID mapping table for drugs and adverse reactions

Result description

DSE_XXX: one folder per algorithm, where XXX identifies one of the seven algorithms. Each folder holds the training results for that algorithm.

newdata_result: training results on the new dataset.

  • auc_aupr_precision_recall_f1_mcc.csv: Metrics (AUC, AUPR, precision, recall, F1, MCC) obtained by ten-fold cross-validation

  • training_metrics.xlsx: Detailed metrics recorded during one round of training

olddata_result: training results on the old dataset.

  • auc_aupr_precision_recall_f1_mcc.csv: Metrics (AUC, AUPR, precision, recall, F1, MCC) obtained by ten-fold cross-validation

  • training_metrics.xlsx: Detailed metrics recorded during one round of training
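The ten-fold cross-validation files can be condensed into a mean and standard deviation per metric. The sketch below assumes that each CSV has a header row and one row per fold; the actual column layout is not documented on this page, so treat this as a starting point. The function name `summarize_metrics` is hypothetical, and only the Python standard library is used.

```python
import csv
from statistics import mean, stdev


def summarize_metrics(path):
    """Return {column: (mean, sample standard deviation)} for numeric columns.

    Assumes a header row followed by one row per fold. Cells that cannot be
    parsed as numbers are ignored.
    """
    columns = {}
    with open(path, newline="", encoding="utf-8") as fh:
        for row in csv.DictReader(fh):
            for key, value in row.items():
                try:
                    number = float(value)
                except (TypeError, ValueError):
                    continue  # non-numeric or missing cell
                columns.setdefault(key, []).append(number)
    return {
        key: (mean(values), stdev(values) if len(values) > 1 else 0.0)
        for key, values in columns.items()
    }
```

Running this on the CSV in each DSE_XXX result folder would allow a side-by-side comparison of the seven algorithms, provided the files share the same columns.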