openff-toolkit: Generate a new 3D Molecule test set (formerly: 3D Molecule loading frequently has issues with ambiguous stereochemistry)

Now that I’m checking for defined stereochemistry when we import from OEMols/RDMols, I’m running into a lot of molecules that have “undefined” stereochemistry when loaded from our 3D molecule test set (molecules/zinc-subset-tripos.mol2.gz). I suspect that many of these arise from limitations in our aromaticity models (e.g., not realizing that a carbon is sp2 instead of sp3).

I’ve created an optional flag in many XMol --> OFFMol functions called exception_if_undefined_stereo. If this is set to True (default), an exception will be raised if a molecule is imported with undefined stereochemistry. Otherwise a warning will be printed and the function will return None.
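
A minimal sketch of the two behaviors described above, assuming a hypothetical stand-in converter rather than the toolkit's real code. The names `convert`, `UndefinedStereochemistryError`, and `has_undefined_stereo` are made up for illustration, and using the `warnings` module for the "warning will be printed" path is also an assumption; only the flag name `exception_if_undefined_stereo` comes from this issue.

```python
import warnings


class UndefinedStereochemistryError(Exception):
    """Hypothetical exception type, used only in this sketch."""


def convert(title, has_undefined_stereo, exception_if_undefined_stereo=True):
    """Stand-in for an XMol --> OFFMol converter (not the toolkit's code)."""
    if has_undefined_stereo:
        msg = f"{title} has undefined stereochemistry"
        if exception_if_undefined_stereo:
            # Strict mode (the default): refuse to import the molecule.
            raise UndefinedStereochemistryError(msg)
        # Lenient mode: warn and hand back None instead of a molecule.
        warnings.warn(msg)
        return None
    # Placeholder for the converted molecule object.
    return {"title": title}


# Strict mode raises.
try:
    convert("mol_a", has_undefined_stereo=True)
    strict_raised = False
except UndefinedStereochemistryError:
    strict_raised = True

# Lenient mode warns and returns None.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    lenient_result = convert(
        "mol_a", has_undefined_stereo=True, exception_if_undefined_stereo=False
    )
```

Callers that pass exception_if_undefined_stereo=False need to check for None before using the result.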

7 of the first 200 molecules to be loaded by OpenEye are flagged as having undefined stereochemistry. They are: ZINC00407326 ZINC01760197 ZINC04086758 ZINC00393651 ZINC05343219 ZINC05309165 ZINC14984530

More information about how we’re implementing aromaticity perception can be found in #70, and in Slack discussions from July.

While we should come back to this, it will have to take a back seat to the initial SMIRNOFF implementation for now.

Code to reproduce the problem:

from openforcefield.topology import Molecule
from openforcefield.utils import get_data_filename
from openeye import oechem
filename = get_data_filename('molecules/zinc-subset-tripos.mol2.gz')
mols = list()
oemol = oechem.OEMol()
ifs = oechem.oemolistream(filename)
c=0
skipmols = []
while oechem.OEReadMolecule(ifs, oemol):
    print(oemol.GetTitle(), c)
    c += 1
    if c in skipmols:
        continue
    oechem.OEFindRingAtomsAndBonds(oemol)
    # Fewer stereo errors if we use OEAroModel_OpenEye, but still some
    oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_MDL)
    #oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_OpenEye)
    oechem.OE3DToInternalStereo(oemol)
    print("Atoms")
    cip = [oechem.OEPerceiveCIPStereo(oemol, atom) for atom in oemol.GetAtoms() if atom.IsChiral()]
    for i in cip:
        if i==oechem.OECIPAtomStereo_S:
            print('S')
        if i==oechem.OECIPAtomStereo_R:
            print('R')        
        if i==oechem.OECIPAtomStereo_NotStereo:
            print('Not stereo')        
        if i==oechem.OECIPAtomStereo_UnspecStereo:
            print('Unspecified')
    print("Bonds")
    cip = [oechem.OEPerceiveCIPStereo(oemol, bond) for bond in oemol.GetBonds() if bond.IsChiral()]
    print(cip)
    for i in cip:
        if i==oechem.OECIPBondStereo_E:
            print('E')
        if i==oechem.OECIPBondStereo_Z:
            print('Z')
        if i==oechem.OECIPBondStereo_NotStereo:
            print('Not stereo')            
        if i==oechem.OECIPBondStereo_UnspecStereo:
            print('Unspecified')
    mol = Molecule.from_openeye(oemol, exception_if_undefined_stereo=False)
    mols.append(mol)
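
Because the loop above passes exception_if_undefined_stereo=False, flagged molecules end up in `mols` as None. A small sketch of how the flagged ones could be tallied afterwards, assuming the titles were recorded in a parallel list (the script above does not do this). The `summarize` helper and the placeholder data are hypothetical, not results from the test set.

```python
def summarize(titles, mols):
    """Return (number converted, titles of molecules that came back as None)."""
    flagged = [title for title, mol in zip(titles, mols) if mol is None]
    n_ok = len(mols) - len(flagged)
    return n_ok, flagged


# Placeholder data: object() stands in for a converted Molecule,
# None for a molecule flagged with undefined stereochemistry.
titles = ["mol_a", "mol_b", "mol_c"]
mols = [object(), None, object()]

n_ok, flagged = summarize(titles, mols)
print(f"{len(flagged)} of {len(mols)} flagged: {' '.join(flagged)}")
```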

About this issue

  • State: open
  • Created 6 years ago
  • Comments: 20 (13 by maintainers)

Most upvoted comments

ZINC01760197, I think, does… This carbon looks sp3 to me, as drawn on the ZINC website.

Yes, you’re right, I missed that one.

This clearly seems to be a problem with ZINC. Maybe we should use FreeSolv or MiniDrugBank for these tests?

This is something @bannanc may be able to take a quick look at, at some point.