LanGroup Datasets: Advancing Chemical Research Through High-Quality Data

Rapid advances in deep learning have sharply increased the demand for large, high-quality databases in chemical research. LanGroup has developed several quantum chemistry datasets to support machine learning studies in chemistry.

Available Datasets

QCDGE: Quantum Chemistry Database with Ground- and Excited-State Properties

A comprehensive quantum chemistry database of 443,106 small organic molecules, each containing up to 10 heavy atoms (C, N, O, and F). It provides both ground-state and excited-state properties, which makes it particularly valuable for machine learning applications in excited-state research.

StoL25: Conformations of Large Molecules Assembled from Small Molecular Building Blocks

This dataset includes 200 molecules from ChEMBL, each with 16-25 heavy atoms (C, N, O, F). Conformations were generated by RDKit and StoL, then optimized at the B3LYP/6-31G*/BJD3 level. Each molecule has several optimized conformations, providing a rich source of data for benchmarking conformation generation methods.
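
When preparing a benchmark run, it can help to check that a candidate molecule falls within the composition range described for StoL25 (16-25 heavy atoms drawn from C, N, O, and F). The following is a minimal sketch of such a check, not part of any LanGroup tooling: the function names, the plain list-of-element-symbols input, and the example molecule are all assumptions made for illustration.

```python
from collections import Counter

# Heavy-atom elements named in the dataset description.
ALLOWED_HEAVY = frozenset({"C", "N", "O", "F"})


def heavy_atom_counts(symbols):
    """Count non-hydrogen atoms per element in a list of element symbols."""
    return Counter(s for s in symbols if s != "H")


def fits_composition(symbols, min_heavy=16, max_heavy=25, allowed=ALLOWED_HEAVY):
    """Return True if every heavy atom is an allowed element and the
    heavy-atom total lies within [min_heavy, max_heavy].

    The default bounds follow the 16-25 heavy-atom range stated for StoL25;
    they are parameters so other ranges can be checked the same way.
    """
    counts = heavy_atom_counts(symbols)
    total = sum(counts.values())
    return set(counts) <= allowed and min_heavy <= total <= max_heavy


# Hypothetical molecule: 14 C, 2 N, 2 O (18 heavy atoms) plus 20 H.
example = ["C"] * 14 + ["N"] * 2 + ["O"] * 2 + ["H"] * 20
print(fits_composition(example))
```

Hydrogens are ignored when counting, so the check depends only on heavy-atom composition. Element symbols could be read from an XYZ file or any other structure format before being passed in.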

Access Information

Each dataset page contains detailed information about accessing the data. Please visit the individual dataset pages for specific download links and access instructions.