Standard Dataset
DGCMF-MSN

- Citation Author(s):
- Submitted by: Min Jin
- Last updated: Thu, 03/27/2025 - 14:01
- DOI: 10.21227/12d3-7t19
- License:
- Categories:
- Keywords:
Abstract
The DGCMF-MSN dataset includes 1,020 drug entities, 5,598 standardized side effects, and 133,750 validated positive drug-side effect association samples. Feature-engineered data derived from the three-modal data of these 1,020 drugs is also included. Drug-se_matrix.txt is the drug-side effect association matrix. Drugs.smiles contains feature engineering results derived from SMILES representations. Drugs.fpt contains molecular fingerprint feature engineering results. The files mpnn_toxcast.npy, nf_toxcast.npy, weave_toxcast.npy, and afp_toxcast.npy contain graph embeddings of the molecular structures generated by the MPNN, NF, Weave, and AFP models, respectively. Drugs1020.json contains the identifiers of the 1,020 drugs.
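As a rough illustration of how the files listed in the abstract might be read, the sketch below loads the association matrix, the four graph-embedding arrays, and the drug identifier list with NumPy and the standard library. It is not the load_data function from test.py. The names load_dgcmf_msn and summarize are hypothetical. The layout of Drug-se_matrix.txt (a plain-text numeric matrix with one drug per row, whitespace-delimited by default) is an assumption, as is the structure of Drugs1020.json. Drugs.smiles and Drugs.fpt are left out because their on-disk format is not described here.

```python
import json
from pathlib import Path

import numpy as np

# File names as given in the abstract.
EMBEDDING_FILES = (
    "mpnn_toxcast.npy",
    "nf_toxcast.npy",
    "weave_toxcast.npy",
    "afp_toxcast.npy",
)


def load_dgcmf_msn(data_dir, delimiter=None):
    """Hypothetical loader sketch; not the dataset's own load_data."""
    data_dir = Path(data_dir)

    # Assumption: a plain-text numeric matrix, one drug per row.
    # delimiter=None makes np.loadtxt split on any whitespace.
    assoc = np.loadtxt(
        data_dir / "Drug-se_matrix.txt", delimiter=delimiter, ndmin=2
    )

    # Key each embedding array by its model prefix, e.g. "mpnn".
    embeddings = {
        name.split("_")[0]: np.load(data_dir / name)
        for name in EMBEDDING_FILES
    }

    # Assumption: the JSON file parses with the standard json module;
    # whatever structure it holds is returned unchanged.
    with open(data_dir / "Drugs1020.json", encoding="utf-8") as fh:
        drug_ids = json.load(fh)

    return assoc, embeddings, drug_ids


def summarize(assoc):
    """Return (n_drugs, n_side_effects, n_positive) for the matrix."""
    n_drugs, n_side_effects = assoc.shape
    return n_drugs, n_side_effects, int(np.count_nonzero(assoc))
```

If the matrix is laid out as drugs by side effects with nonzero entries marking positive associations, summarize(assoc) would be expected to agree with the counts stated in the abstract (1,020 drugs, 5,598 side effects, 133,750 positive samples). A mismatch would suggest a different delimiter or orientation than assumed here.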
Comments
The load_data function in test.py shows how to use the dataset.