openff-toolkit: Generate a new 3D Molecule test set (formerly: 3D Molecule loading frequently has issues with ambiguous stereochemistry)
Now that I’m checking for defined stereochemistry when we import from OEMols/RDMols, I’m running into a lot of molecules that have “undefined” stereochemistry when loaded from our 3D molecule test set (molecules/zinc-subset-tripos.mol2.gz). I suspect that many of these arise from limitations in our aromaticity models (e.g., not realizing that a carbon is sp2 instead of sp3).
I’ve created an optional flag in many XMol --> OFFMol functions called exception_if_undefined_stereo. If this is set to True (default), an exception will be raised if a molecule is imported with undefined stereochemistry. Otherwise a warning will be printed and the function will return None.
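The flag's behavior can be sketched in plain Python. This is a minimal illustration, not the toolkit's implementation: `UndefinedStereoError`, `check_stereo`, and the `undefined_centers` argument are hypothetical names introduced here. Only the flag name `exception_if_undefined_stereo` and its raise-or-warn-and-return-None behavior come from the description above.

```python
import warnings


class UndefinedStereoError(Exception):
    """Hypothetical exception type used only in this sketch."""


def check_stereo(name, undefined_centers, exception_if_undefined_stereo=True):
    """Return `name` if stereo is fully defined; otherwise raise, or warn and return None.

    `undefined_centers` stands in for whatever the real conversion code
    detects as unspecified stereocenters or stereobonds (an assumption).
    """
    if not undefined_centers:
        return name
    msg = f"{name} has undefined stereochemistry at: {sorted(undefined_centers)}"
    if exception_if_undefined_stereo:
        # Default behavior: fail loudly so the caller cannot miss the problem.
        raise UndefinedStereoError(msg)
    # Opt-out behavior: warn and signal failure with None.
    warnings.warn(msg)
    return None
```

With the flag set to False, callers need to check for a None return value before using the result.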
7 of the first 200 molecules to be loaded by OpenEye are flagged as having undefined stereochemistry. They are: ZINC00407326, ZINC01760197, ZINC04086758, ZINC00393651, ZINC05343219, ZINC05309165, ZINC14984530.
More information about how we’re implementing aromaticity perception can be found in #70 and in Slack discussions from July.
While we should come back to this, it will have to take a back seat to the initial SMIRNOFF implementation for now.
Code to reproduce the problem:
from openforcefield.topology import Molecule
from openforcefield.utils import get_data_filename
from openeye import oechem

filename = get_data_filename('molecules/zinc-subset-tripos.mol2.gz')
mols = list()
oemol = oechem.OEMol()
ifs = oechem.oemolistream(filename)
c = 0
skipmols = []  # 1-based positions of molecules to skip

while oechem.OEReadMolecule(ifs, oemol):
    print(oemol.GetTitle(), c)
    c += 1
    if c in skipmols:
        continue

    oechem.OEFindRingAtomsAndBonds(oemol)
    # Fewer stereo errors if we use OEAroModel_OpenEye, but still some
    oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_MDL)
    #oechem.OEAssignAromaticFlags(oemol, oechem.OEAroModel_OpenEye)
    oechem.OE3DToInternalStereo(oemol)

    # Report the CIP label perceived for each chiral atom
    print("Atoms")
    cip = [oechem.OEPerceiveCIPStereo(oemol, atom) for atom in oemol.GetAtoms() if atom.IsChiral()]
    for i in cip:
        if i == oechem.OECIPAtomStereo_S:
            print('S')
        if i == oechem.OECIPAtomStereo_R:
            print('R')
        if i == oechem.OECIPAtomStereo_NotStereo:
            print('Not stereo')
        if i == oechem.OECIPAtomStereo_UnspecStereo:
            print('Unspecified')

    # Report the CIP label perceived for each chiral bond
    print("Bonds")
    cip = [oechem.OEPerceiveCIPStereo(oemol, bond) for bond in oemol.GetBonds() if bond.IsChiral()]
    print(cip)
    for i in cip:
        if i == oechem.OECIPBondStereo_E:
            print('E')
        if i == oechem.OECIPBondStereo_Z:
            print('Z')
        if i == oechem.OECIPBondStereo_NotStereo:
            print('Not stereo')
        if i == oechem.OECIPBondStereo_UnspecStereo:
            print('Unspecified')

    # With the flag set to False, undefined stereo gives a warning and a None result
    mol = Molecule.from_openeye(oemol, exception_if_undefined_stereo=False)
    mols.append(mol)
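Because a conversion with `exception_if_undefined_stereo=False` is described above as returning None for flagged molecules, a list built this way can contain None entries. A rough sketch of how the flagged titles could be tallied follows. `partition_by_stereo` and the `convert` callable are hypothetical; `convert` stands in for a call such as `Molecule.from_openeye(oemol, exception_if_undefined_stereo=False)`.

```python
def partition_by_stereo(titled_inputs, convert):
    """Split inputs into converted molecules and titles that came back as None.

    titled_inputs: iterable of (title, raw_molecule) pairs.
    convert: hypothetical callable returning a molecule, or None when the
             molecule was flagged as having undefined stereochemistry.
    """
    converted, flagged = [], []
    for title, raw in titled_inputs:
        mol = convert(raw)
        if mol is None:
            flagged.append(title)
        else:
            converted.append(mol)
    return converted, flagged


# Toy usage with a stand-in converter that rejects odd numbers.
converted, flagged = partition_by_stereo(
    [("mol_a", 2), ("mol_b", 3), ("mol_c", 4)],
    lambda raw: raw if raw % 2 == 0 else None,
)
print(f"{len(flagged)} of {len(converted) + len(flagged)} flagged: {flagged}")
```

Collecting titles rather than positions makes the result easy to compare against a list of ZINC IDs like the one given earlier.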
About this issue
- State: open
- Created 6 years ago
- Comments: 20 (13 by maintainers)
Yes, you’re right, I missed that one.
This clearly seems to be a problem with ZINC. Maybe we should use FreeSolv or MiniDrugBank for these tests?
This is something @bannanc may be able to take a quick look at, at some point.