LanGroup Datasets: Advancing Chemical Research Through High-Quality Data

Rapid advances in deep learning have sharply increased the demand for large, high-quality databases in chemical research. LanGroup has developed several quantum chemistry datasets to support machine learning studies in chemistry.

Available Datasets

Quantum Chemistry Datasets

QCDGE: Quantum Chemistry Database with Ground- and Excited-State Properties

A comprehensive quantum chemistry database of 443,106 small organic molecules with up to 10 heavy atoms (C, N, O and F). It provides both ground-state and excited-state properties, making it particularly valuable for machine learning applications in excited-state research.

QCDGE_CI: Quantum Chemistry Dataset with Ground-state and Conical-Intersection Structures

The first quantum chemistry repository of its scale to systematically combine ground-state and S₀–S₁ conical intersection geometries and energies. The dataset contains 260,541 small molecules (C, N, O, F, up to 10 heavy atoms), enabling researchers to train models to directly predict CI geometries and energies at scale and opening the door to high-throughput virtual screening of photoreactions and accelerated mechanistic discovery.

StoL25: Conformations of Large Molecules Assembled from Small Molecular Building Blocks

This dataset includes 200 molecules (16-25 heavy atoms: C, N, O, F) from ChEMBL, with conformations generated by RDKit and StoL, then optimized at the B3LYP/6-31G*/BJD3 level. It contains several optimized conformations for each molecule, providing a rich source of data for benchmarking conformation generation methods.
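
The composition ranges quoted for these datasets (heavy atoms limited to C, N, O and F, with a bounded heavy-atom count) can be checked with a few lines of Python. This is only a minimal sketch and not the authors' selection pipeline: the function name `fits_composition`, the plain list-of-element-symbols input, and the lower bound of 1 heavy atom for the small-molecule case are all assumptions made for illustration.

```python
# Hypothetical helper: checks whether a molecule, given as a list of element
# symbols, falls inside a heavy-atom composition window. Not taken from the
# LanGroup datasets or their tooling.

ALLOWED_HEAVY = frozenset({"C", "N", "O", "F"})  # heavy elements named in the text


def fits_composition(symbols, min_heavy, max_heavy, allowed=ALLOWED_HEAVY):
    """Return True if every non-hydrogen atom is in `allowed` and the number
    of non-hydrogen atoms lies in [min_heavy, max_heavy]."""
    heavy = [s for s in symbols if s != "H"]
    only_allowed = all(s in allowed for s in heavy)
    return only_allowed and min_heavy <= len(heavy) <= max_heavy


# Ethanol, C2H6O: three heavy atoms, six hydrogens.
ethanol = ["C", "C", "O"] + ["H"] * 6

# Small-molecule window (up to 10 heavy atoms; lower bound of 1 is assumed).
print(fits_composition(ethanol, 1, 10))

# Large-molecule window (16-25 heavy atoms).
print(fits_composition(ethanol, 16, 25))
```

Hydrogens are ignored when counting, so the same function covers both the small-molecule range and the 16-25 heavy-atom range by changing only the bounds.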

Nonadiabatic Dynamics Datasets

Keto Isocytosine (TSH)

Nonadiabatic molecular dynamics (NAMD) simulation results for keto isocytosine, as reported in Phys. Chem. Chem. Phys., 2022, 24, 24362-24382.

C2N2H6

NAMD simulation results for C2N2H6.

Access Information

Each dataset page contains detailed information about accessing the data. Please visit the individual dataset pages for specific download links and access instructions.