rdkit: ERROR SMARTS matching SMILES

Description:

  • RDKit Version: 2017.09.3
  • Platform: Windows

When matching one SMARTS against a specific SMILES, one match is missing.

Following code:

from rdkit import Chem

smiles = 'C1=CC=C2C(=C1)C3=CC=CC4=C3C5=C(C=C4)C(C(C=C25)O)O'
mol = Chem.MolFromSmiles(smiles)

q = Chem.MolFromSmarts('[cH]')

mol.GetSubstructMatches(q)

returns:
((0,), (1,), (2,), (5,), (7,), (8,), (9,), (14,), (15,))

It returns 9 hits, although it should find 10: the atom with index 18 is missing. When I draw the molecule, the aromaticity looks correct:

[image: clipboard01]
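
One way to investigate a missing match like this is to list what RDKit perceived for each atom after parsing. The following is only a diagnostic sketch: `aromatic_ch_indices` is a hypothetical helper name, not an RDKit function, and it assumes that `[cH]` means "carbon flagged aromatic with a total hydrogen count of one". No particular output is claimed here, since the flags depend on the aromaticity perception of the installed RDKit version.

```python
from rdkit import Chem


def aromatic_ch_indices(mol):
    """Indices of carbons flagged aromatic that carry exactly one H (hypothetical helper)."""
    return [
        atom.GetIdx()
        for atom in mol.GetAtoms()
        if atom.GetAtomicNum() == 6
        and atom.GetIsAromatic()
        and atom.GetTotalNumHs() == 1
    ]


smiles = 'C1=CC=C2C(=C1)C3=CC=CC4=C3C5=C(C=C4)C(C(C=C25)O)O'
mol = Chem.MolFromSmiles(smiles)

# Per-atom view: index, element, aromatic flag, total hydrogen count
for atom in mol.GetAtoms():
    print(atom.GetIdx(), atom.GetSymbol(), atom.GetIsAromatic(), atom.GetTotalNumHs())

# Compare the flags with what the [cH] query actually matches
flagged = aromatic_ch_indices(mol)
matched = [m[0] for m in mol.GetSubstructMatches(Chem.MolFromSmarts('[cH]'))]
print(sorted(flagged) == sorted(matched))
```

If an atom you expected is absent from both lists, the question is about aromaticity perception rather than about the SMARTS matcher.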

About this issue

  • Original URL
  • State: open
  • Created 6 years ago
  • Comments: 27 (18 by maintainers)

Most upvoted comments

SMARTS has no aromaticity model and is very literal. In every case where your pattern doesn’t match, there is an aromaticity mismatch. You can see this as follows:

mol = Chem.MolFromSmiles('C1=COC(=C1)C=O')

Chem.MolToSmiles(mol)

'O=Cc1ccco1'

Since RDKit doesn’t support both a Kekulé and an aromatic representation simultaneously, this simply isn’t going to match.

I suggest reading this: http://www.daylight.com/dayhtml/doc/theory/theory.smarts.html, specifically the SMARTS vs. SMILES section.
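
A minimal sketch of the mismatch described above, assuming a standard RDKit install. The aromatic pattern is simply the canonical SMILES from the comment reused as SMARTS; the variable names are illustrative.

```python
from rdkit import Chem

# Kekulé input; RDKit perceives the furan ring as aromatic on parsing
mol = Chem.MolFromSmiles('C1=COC(=C1)C=O')
ring_flags = [atom.GetIsAromatic() for atom in mol.GetAtoms() if atom.IsInRing()]

# Kekulé string read as SMARTS: C is an aliphatic carbon, = is a double bond
kekule_patt = Chem.MolFromSmarts('C1=COC(=C1)C=O')
# The same structure with the ring written in aromatic form
aromatic_patt = Chem.MolFromSmarts('O=Cc1ccco1')

kekule_hits = mol.GetSubstructMatches(kekule_patt)
aromatic_hits = mol.GetSubstructMatches(aromatic_patt)

print(all(ring_flags))      # are all ring atoms flagged aromatic?
print(len(kekule_hits))     # matches for the Kekulé pattern
print(len(aromatic_hits))   # matches for the aromatic pattern
```

The Kekulé pattern asks for aliphatic ring atoms joined by double bonds, which the stored molecule no longer has, so only the aromatic pattern can match.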

Cheers, Brian

On Mon, Aug 27, 2018 at 4:39 AM Simon notifications@github.com wrote:

I found another error, this time with a SMARTS identical to the SMILES:

mol = Chem.MolFromSmiles('C1=COC(=C1)C=O')
patt = Chem.MolFromSmarts('C1=COC(=C1)C=O')
mol.GetSubstructMatches(patt)

returns empty

[image: clipboard01] https://user-images.githubusercontent.com/6115499/44649795-2f3e0a00-a9e5-11e8-8b12-b406cd0b018a.png

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/rdkit/rdkit/issues/2011#issuecomment-416156987, or mute the thread https://github.com/notifications/unsubscribe-auth/AJbioGEUK42JfqcbKS3-ulX_08crTIwvks5uU7ArgaJpZM4WK_DE .

I will say this is one of those corner cases where, if you draw the ring as aromatic, it sure looks like there are five bonds to a carbon, which makes some of my medicinal chemistry friends look a bit queasy.

Cheers, Brian

On Fri, Aug 24, 2018 at 10:36 AM Brian Kelley fustigator@gmail.com wrote:

This one is subtler: the ring is aromatic. RDKit doesn’t store both the aromatic and the double bond type for a bond; they are separate bond types. Patterns that match are:

pat = Chem.MolFromSmarts('[#6&H1]:[#6&H1]')

or

pat = Chem.MolFromSmarts('[#6&H1]~[#6&H1]')

~ is any bond, : is aromatic.

Or you could use

pat = Chem.MolFromSmarts('[#6&H1]-,=,:[#6&H1]')

depending on what you need.
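
To make the difference between the bond primitives concrete, here is a small sketch on benzene written in Kekulé form. Benzene is an assumed stand-in for illustration, not the structure from the image in this thread, and the dictionary labels are arbitrary.

```python
from rdkit import Chem

# Benzene in Kekulé form; its ring bonds are stored as aromatic after parsing
mol = Chem.MolFromSmiles('C1=CC=CC=C1')

patterns = {
    'double': '[#6&H1]=[#6&H1]',
    'aromatic': '[#6&H1]:[#6&H1]',
    'any': '[#6&H1]~[#6&H1]',
    'single, double or aromatic': '[#6&H1]-,=,:[#6&H1]',
}

# Count the unique matches for each bond primitive
counts = {}
for label, smarts in patterns.items():
    counts[label] = len(mol.GetSubstructMatches(Chem.MolFromSmarts(smarts)))
    print(label, counts[label])
```

The explicit double-bond query finds nothing because no bond in the parsed ring has the double bond type, while the other three patterns all accept aromatic bonds.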

Cheers, Brian

On Fri, Aug 24, 2018 at 10:30 AM Simon notifications@github.com wrote:

I actually meant to match this:

[image: unbenannt] https://user-images.githubusercontent.com/6115499/44590242-fae70580-a7ba-11e8-99fb-3bc6320337c8.png


The nitrogen is aromatic, which is why it isn’t matching.

This is, in general, why using atomic numbers can be better than using symbols. Both of the following match:

pat = Chem.MolFromSmarts('[CH3]n')

pat = Chem.MolFromSmarts('[#6&H3][#7]')
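
The same effect can be reproduced on a smaller molecule. The sketch below uses N-methylpyrrole written in Kekulé form as an assumed stand-in, not the molecule from this thread, and compares the symbol-based and atomic-number-based patterns.

```python
from rdkit import Chem

# N-methylpyrrole in Kekulé form; the ring nitrogen is perceived as aromatic
mol = Chem.MolFromSmiles('CN1C=CC=C1')

# 'N' requires an aliphatic nitrogen, 'n' an aromatic one, '#7' accepts either
results = {
    smarts: mol.HasSubstructMatch(Chem.MolFromSmarts(smarts))
    for smarts in ('[CH3]N', '[CH3]n', '[#6&H3][#7]')
}

for smarts, hit in results.items():
    print(smarts, hit)
```

Because `#7` says nothing about aromaticity, the last pattern keeps working whether or not the nitrogen ends up flagged aromatic.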

On Fri, Aug 24, 2018 at 8:05 AM Simon notifications@github.com wrote:

Found another one:

mol = Chem.MolFromSmiles('CN1C2=CC=CC=C2C=C3C1=CC4=CC=CC=C43')
mol.HasSubstructMatch(Chem.MolFromSmarts('[CH3]N'))

returns False (should be True)

Again, the image suggests it should work: [image: test] https://user-images.githubusercontent.com/6115499/44583653-712d3d00-a7a6-11e8-864e-c21998c66942.png
