DGCMF-MSN

Citation Author(s):
Min Jin
Submitted by:
Min Jin
Last updated:
Thu, 03/27/2025 - 14:01
DOI:
10.21227/12d3-7t19
License:

Abstract 

The DGCMF-MSN dataset comprises 1,020 drug entities, 5,598 standardized side effects, and 133,750 validated positive drug-side effect association samples. It also includes feature-engineered data derived from the three-modal data of these 1,020 drugs. Drug-se_matrix.txt is the drug-side effect association matrix. Drugs.smiles contains features derived from the drugs' SMILES representations. Drugs.fpt contains molecular fingerprint features. The files mpnn_toxcast.npy, nf_toxcast.npy, weave_toxcast.npy, and afp_toxcast.npy contain graph embeddings of the molecular structures generated by the MPNN, NF, Weave, and AFP models, respectively. Drugs1020.json contains the identifiers of the 1,020 drugs.
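
The file descriptions above suggest a simple loading and sanity-check routine. The following is a minimal sketch in Python with NumPy, not the authors' own loader. It assumes that Drug-se_matrix.txt is a whitespace-delimited numeric matrix with one row per drug and one column per side effect, and that the .npy files are ordinary NumPy arrays. The function names (load_association_matrix, summarize, load_graph_embeddings, load_drug_ids, matches_description) are hypothetical. The layouts of Drugs1020.json, Drugs.smiles, and Drugs.fpt are not documented here, so the JSON file is only parsed and the other two are left out.

```python
import json
from pathlib import Path

import numpy as np

# Counts stated in the dataset description.
N_DRUGS, N_SIDE_EFFECTS, N_POSITIVES = 1020, 5598, 133750

# Graph-embedding files named in the dataset description.
GRAPH_EMBEDDING_FILES = (
    "mpnn_toxcast.npy",
    "nf_toxcast.npy",
    "weave_toxcast.npy",
    "afp_toxcast.npy",
)


def load_association_matrix(path, delimiter=None):
    """Read Drug-se_matrix.txt as a 2-D array.

    Assumption: a plain-text numeric matrix, rows = drugs, columns = side
    effects. delimiter=None means whitespace; pass "," if the file turns
    out to be comma-separated.
    """
    return np.loadtxt(path, delimiter=delimiter, ndmin=2)


def summarize(matrix):
    """Return the shape, the number of nonzero entries, and their density."""
    positives = int(np.count_nonzero(matrix))
    return {
        "shape": matrix.shape,
        "positives": positives,
        "density": positives / matrix.size,
    }


def matches_description(summary):
    """Check a summary against the counts given in the description."""
    return (
        summary["shape"] == (N_DRUGS, N_SIDE_EFFECTS)
        and summary["positives"] == N_POSITIVES
    )


def load_graph_embeddings(data_dir):
    """Load the four .npy files into a dict keyed by model prefix.

    Assumption: the arrays were saved without pickled objects; otherwise
    np.load would need allow_pickle=True.
    """
    data_dir = Path(data_dir)
    return {
        name.split("_")[0]: np.load(data_dir / name)
        for name in GRAPH_EMBEDDING_FILES
    }


def load_drug_ids(path):
    """Parse Drugs1020.json; its structure (list or dict) is not documented."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

If the matrix does have shape 1,020 × 5,598 with 133,750 nonzero entries, matches_description(summarize(matrix)) returns True and the density is 133,750 / 5,709,960, or about 2.3%, so the associations are sparse.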

Comments

The load_data function in test.py shows how to use the dataset.

Submitted by Min Jin on Thu, 03/27/2025 - 14:04

Dataset Files

    Files have not been uploaded for this dataset