High drug-likeness (QED > 0.9) dataset: 2 million molecules

Citation Author(s):
Wen Xing
Submitted by:
Wen Xing
Last updated:
Fri, 01/03/2025 - 09:55
DOI:
10.21227/8zmv-e285
License:
0

Abstract 

<h1>Development and Evaluation of a Novel GPT-like Conditional Molecule Generator</h1>

<p>
  This study presents the development and evaluation of a novel GPT-like conditional molecule generator designed to produce chemical compounds with desirable properties. The model is conditioned on six physicochemical properties:
</p>

<ul>
  <li>Molecular weight</li>
  <li>Number of non-hydrogen atoms</li>
  <li>Ring count</li>
  <li>Hydrophobicity</li>
  <li>Quantitative estimation of drug-likeness (<em>QED</em>)</li>
  <li>Synthetic accessibility score (<em>SAS</em>)</li>
</ul>

<p>
  Conditioned on these properties, the generator produced a high-QED database of approximately 2 million molecules, all with a QED above 0.9. This result demonstrates the model's effectiveness in generating structurally diverse and potentially pharmacologically viable molecules, and underscores its utility in accelerating drug discovery.
</p>

Instructions: 

<p>
  The generated dataset contains two columns and about 2 million rows. The columns are:
</p>

<ul>
  <li>SMILES string</li>
  <li>QED value</li>
</ul>
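
<p>
  As a minimal sketch of how the two-column layout described above might be read and sanity-checked, the following Python example uses only the standard library. It assumes a comma-separated file with a header row, SMILES in the first column and QED in the second. The delimiter, the header, the names <code>Record</code>, <code>read_records</code> and <code>below_threshold</code>, and the sample values are all illustrative assumptions and are not taken from the dataset itself.
</p>

```python
import csv
import io
from typing import Iterable, Iterator, NamedTuple, TextIO


class Record(NamedTuple):
    """One row of the dataset: a SMILES string and its QED value."""
    smiles: str
    qed: float


def read_records(handle: TextIO, has_header: bool = True) -> Iterator[Record]:
    """Yield records from a two-column CSV stream (assumed layout)."""
    reader = csv.reader(handle)
    if has_header:
        next(reader, None)  # skip the header row, if present
    for row in reader:
        if len(row) < 2:
            continue  # skip blank or malformed lines
        yield Record(smiles=row[0].strip(), qed=float(row[1]))


def below_threshold(records: Iterable[Record], threshold: float = 0.9) -> list[Record]:
    """Return records whose QED is at or below the threshold."""
    return [r for r in records if r.qed <= threshold]


# Tiny in-memory sample standing in for the real file.
# The QED values here are made up for illustration only.
SAMPLE = "smiles,qed\nCCO,0.91\nc1ccccc1,0.93\nCC,0.42\n"

records = list(read_records(io.StringIO(SAMPLE)))
outliers = below_threshold(records)
print(len(records), "rows read;", len(outliers), "at or below QED 0.9")
```

<p>
  For the real file, the <code>io.StringIO</code> sample would be replaced by a handle from <code>open(path, newline="")</code>, where <code>path</code> is wherever the download was saved. Because <code>read_records</code> is a generator, the roughly 2 million rows can be streamed without loading the whole file into memory.
</p>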

Funding Agency: 
SINTEF
Grant Number: 
SIP2023