Name: High drug-likeness (QED>0.9) dataset 2 million molecules
Creator: Wen Xing
License: https://creativecommons.org/licenses/by/4.0/
Keywords: Artificial Intelligence

Abstract

Development and Evaluation of a Novel GPT-like Conditional Molecule Generator

This study presents the development and evaluation of a novel GPT-like conditional molecule generator designed to generate chemical compounds with desirable properties. The model is conditioned on six key physicochemical properties:

Molecular weight
Number of non-hydrogen atoms
Ring count
Hydrophobicity
Quantitative estimation of drug-likeness (QED)
Synthetic accessibility score (SAS)
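QED, as originally defined by Bickerton et al. (2012), scores eight molecular descriptors (molecular weight, ALOGP, hydrogen-bond donors, hydrogen-bond acceptors, polar surface area, rotatable bonds, aromatic rings, and structural alerts) with desirability functions d_i in (0, 1]. It then combines them as a weighted geometric mean, QED = exp( sum_i w_i ln d_i / sum_i w_i ). The sketch below shows only that final aggregation step. It assumes the desirability values have already been computed and does not reproduce the desirability functions or the published weights. This page does not state which QED implementation was used. For real scores, RDKit provides `rdkit.Chem.QED.qed`.

```python
import math
from typing import Optional, Sequence


def qed_from_desirabilities(
    d: Sequence[float], w: Optional[Sequence[float]] = None
) -> float:
    """Combine desirability scores d_i in (0, 1] into a weighted geometric mean.

    Illustrative sketch only: the inputs are assumed to be precomputed
    desirabilities, not raw molecular descriptors.
    """
    if w is None:
        w = [1.0] * len(d)  # unweighted variant: every descriptor counts equally
    if not d or len(d) != len(w):
        raise ValueError("d and w must be non-empty and the same length")
    if any(not 0.0 < di <= 1.0 for di in d):
        raise ValueError("each desirability must lie in (0, 1]")
    total_weight = sum(w)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # exp of the weighted mean of logs equals the weighted geometric mean
    log_sum = sum(wi * math.log(di) for di, wi in zip(d, w))
    return math.exp(log_sum / total_weight)
```

Because the score is a geometric mean, a desirability near zero for any positively weighted descriptor drives the whole score toward zero. Averaging cannot compensate for it, so a QED above 0.9 implies that every weighted descriptor scores well.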

By conditioning on these properties, the generator was used to produce a high-QED database of approximately 2 million molecules, each with a QED greater than 0.9. This result demonstrates the model's effectiveness in generating structurally diverse and potentially pharmacologically viable molecules and underscores its potential to accelerate drug discovery.

Instructions:

The generated data contains 2 columns and about 2 million rows. The columns are:

Smiles: the structure of each molecule as a SMILES string
QED values: the QED score of each molecule
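Because the file holds about 2 million rows, streaming it row by row with Python's standard `csv` module avoids loading everything into memory. The sketch below assumes two things this page does not confirm: that column 1 holds the SMILES string and column 2 the QED value, and that the exact header names are unknown. It therefore treats a non-numeric first row as a header and skips it. The helper names `iter_molecules` and `summarize` are hypothetical. The summary provides a quick check of the stated QED > 0.9 property.

```python
import csv
import math
from typing import Dict, Iterable, Iterator, Tuple

QED_THRESHOLD = 0.9  # threshold stated in the dataset description


def iter_molecules(lines: Iterable[str]) -> Iterator[Tuple[str, float]]:
    """Yield (smiles, qed) pairs; assumes column 1 = SMILES, column 2 = QED."""
    first = True
    for row in csv.reader(lines):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        try:
            qed = float(row[1])
        except ValueError:
            if first:
                first = False
                continue  # a non-numeric first row is treated as the header
            raise
        first = False
        yield row[0].strip(), qed


def summarize(lines: Iterable[str]) -> Dict[str, float]:
    """Count rows and check every QED value against the stated threshold."""
    rows, below, min_qed = 0, 0, math.inf
    for _, qed in iter_molecules(lines):
        rows += 1
        min_qed = min(min_qed, qed)
        if qed <= QED_THRESHOLD:
            below += 1
    return {"rows": rows, "min_qed": min_qed, "at_or_below_threshold": below}
```

Usage, with the file name taken from the dataset listing:

```python
with open("high_QED_molecules.csv", newline="") as handle:
    print(summarize(handle))
```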

Funding Agency:

SINTEF

Grant Number:

SIP2023

Comments

For manuscript

Submitted by Wen Xing on Fri, 06/07/2024 - 03:39

Typo corrected

Submitted by Wen Xing on Thu, 08/15/2024 - 08:27

Dataset Files

high_QED_molecules.csv (108.81 MB)
