GDB-9-Ex_EOM-CCSD-SUBSET-100

Citation Author(s):
Kshitij Mehta, Oak Ridge National Laboratory
Massimiliano Lupo Pasini, Oak Ridge National Laboratory
Stephan Irle, Oak Ridge National Laboratory
Pilsun Yoo
Dmitry Ganyushin, Oak Ridge National Laboratory
Submitted by:
Kshitij Mehta
Last updated:
Tue, 03/11/2025 - 19:57
DOI:
10.21227/zjk1-zp13

Abstract 

This is a subset of the original GDB-9-Ex_EOM-CCSD dataset (https://doi.org/10.13139/OLCF/2318313). It contains 100 molecules selected at random from the 80,593 molecules of the original dataset. The dataset contains data-intensive quantum chemical electronic structure calculations for organic molecules of the GDB-9-Ex dataset. The calculations were performed with the equation-of-motion coupled cluster (EOM-CCSD) first-principles method in the ORCA software and provide UV-vis spectra of the molecules with a high level of accuracy. The optical spectra were computed at molecular geometries optimized with the DFTB method using 3ob parameters. All calculations used the def2-TZVP basis set with the auxiliary def2/J and def2-TZVP/C basis sets. The specific method used is STEOM-DLPNO-CCSD, which combines the similarity-transformed EOM-CCSD (STEOM) approach with the domain-based local pair natural orbital (DLPNO) approximation. This method has been found to predict transition energies of organic molecules accurately. For each molecule, the 50 lowest excited states were calculated.

Instructions: 

The dataset consists of 100 directories, one for each of the 100 molecules in the dataset. Each directory contains two files: 

geo_end.xyz: ASCII file containing the optimized geometry of the molecule in Cartesian coordinates.

orca.stdout: Standard ASCII output generated by the ORCA software, containing the output of the STEOM-DLPNO-CCSD calculations.

Molecule directories are named ‘mol_num’, where num is the id of the molecule in the GDB-9-Ex dataset. The id has no practical significance, since the geometry of the molecule is provided in the included geo_end.xyz file.
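
The directory layout described above can be traversed with a short script. The following is a minimal Python sketch, not part of the dataset: the helper names read_xyz and iter_molecules are hypothetical, and it assumes that geo_end.xyz follows the common XYZ layout (atom count on the first line, a comment on the second, then one line per atom starting with the element symbol and the x, y, z coordinates). Any additional columns on an atom line are ignored.

```python
from pathlib import Path


def read_xyz(path):
    """Parse an XYZ file into (symbols, coords).

    Assumes the common XYZ layout: line 1 is the atom count, line 2 is a
    comment, and each following line starts with "element x y z".
    Extra columns after z, if present, are ignored.
    """
    lines = Path(path).read_text().splitlines()
    n_atoms = int(lines[0].split()[0])
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        symbols.append(fields[0])
        coords.append(tuple(float(v) for v in fields[1:4]))
    return symbols, coords


def iter_molecules(root):
    """Yield (mol_id, xyz_path, orca_path) for each mol_<id> directory in root."""
    for mol_dir in sorted(Path(root).glob("mol_*")):
        if not mol_dir.is_dir():
            continue
        mol_id = mol_dir.name[len("mol_"):]
        yield mol_id, mol_dir / "geo_end.xyz", mol_dir / "orca.stdout"
```

Calling iter_molecules on the directory where the dataset was unpacked should yield one entry per molecule, and read_xyz can then be applied to each geo_end.xyz path. Parsing orca.stdout is left out deliberately, because the layout of the ORCA output depends on the ORCA version used.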

Total dataset size: 101 MiB

Funding Agency: 
U.S. Department of Energy (DOE)
Grant Number: 
DE-AC05-00OR22725
