openff-toolkit: GLOBAL_TOOLKIT_REGISTRY does not behave as anticipated

Reported by @SimonBoothroyd

Describe the bug Changing GLOBAL_TOOLKIT_REGISTRY should allow the user to change the the cheminformatics toolkit that’s used on the backend. However, this doesn’t seem to work

To Reproduce

from openforcefield.topology import Molecule
from openforcefield.utils.toolkits import ToolkitRegistry, RDKitToolkitWrapper, GLOBAL_TOOLKIT_REGISTRY
mol = Molecule.from_smiles('CC')
mol.generate_conformers()
print(mol.conformers[0][0])

[ 0.81511319 -0.53833634 0.49281269] A

These are the coordinates of the first atom in the first conformer when OpenEye generates it

mol.generate_conformers(toolkit_registry=RDKitToolkitWrapper())
print(mol.conformers[0][0])

[-0.74552317 0.04144446 0.01170607] A

These are the coordinates of the first atom of the first conformer when RDKit makes the conformer

GLOBAL_TOOLKIT_REGISTRY = ToolkitRegistry(toolkit_precedence=[RDKitToolkitWrapper])
#import openforcefield.topology.molecule
print(GLOBAL_TOOLKIT_REGISTRY.registered_toolkits)

[<openforcefield.utils.toolkits.RDKitToolkitWrapper object at 0x118bc65f8>]

I am able to override the imported GLOBAL_TOOLKIT_REGISTRY to only have RDKit

mol.generate_conformers()
print(mol.conformers[0][0])

[ 0.81511319 -0.53833634 0.49281269] A

But this function still generates the OpenEye conformer

Additional information

The GLOBAL_TOOLKIT_REGISTRY can be imported from a number of modules – for example:

  • openforcefield.topology.molecule.GLOBAL_TOOLKIT_REGISTRY
  • openforcefield.utils.GLOBAL_TOOLKIT_REGISTRY
  • openforcefield.topology.molecule.openforcefield.utils.toolkits.GLOBAL_TOOLKIT_REGISTRY

In my testing, changing any one of these doesn’t appear to affect the others.

I’m concerned that what’s really happening here is that a new GLOBAL_TOOLKIT_REGISTRY is being imported for each module. I haven’t worked with this pattern of globally-shared objects before, so I’m not sure how to proceed.

@jaimergp suggested using the context manager approach, where we could use something like with using_toolkit(RDKit): ... to define blocks where certain toolkits or registries are used. This seems much smoother than tweaking a “GLOBAL” variable every time we want different defaults. I’m not sure how we’ll choose to solve this issue, but this may be a convenient feature to add/test if this issue initiates a larger refactor.

About this issue

  • Original URL
  • State: closed
  • Created 4 years ago
  • Comments: 20 (18 by maintainers)

Most upvoted comments

Let me try to summarize the state of this discussion:

We have identified three topics which may or may not be totally separate:

  1. Should we enforce that GLOBAL_TOOLKIT_REGISTRY be a singleton?
  2. Should we encourage use patterns where people modify GLOBAL_TOOLKIT_REGISTRY?
  3. Should we make a context manager like using_toolkits(...) that safely handles temporary modifications to GLOBAL_TOOLKIT_REGISTRY?

My thoughts on these are:

For 1), I don’t think this is necessary any longer. The driving force behind this suggestion was my concern that maybe we had tons of little GLOBAL_TOOLKIT_REGISTRYs running around. This doesn’t actually appear to be the case. Instead, what’s really happening is that assignment statements of the form from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY; GLOBAL_TOOLKIT_REGISTRY = X don’t actually modify the original imported object – Instead it makes a new local variable that has nothing to do with openforcefield.utils.toolkits.GLOBAL_TOOLKIT_REGISTRY. So far, we have not found a way to do an in-place replacement of the original.

For 2): no. No we shouldn’t. It’s dangerous to register and deregister toolkits in the global registry live, and it introduces ambiguity about the scope of those changes. Regarding assignment calls, even if we could make GLOBAL_TOOLKIT_REGISTRY = ... statements work like I wanted, it would be unpythonic and confuse advanced python users who know that it shouldn’t work that way.

For 3): That would be great as a new feature. It would enable a cleaner way to switch contexts, but it wouldn’t fundamentally make any new behavior accessible, since the same thing could be achieved by repeatedly specifying toolkit_registry=... kwargs throughout a code block. It would probably make US do the dirty work of 2) in secret, but that’s actually good since at least the complexity would be cordoned off into a single place.

In conclusion

  • We’ve “solved” the problem as stated in the title of the PR – The problem isn’t the “behavior”, it what I “anticipated” the behavior should be.
  • We haven’t found any actual problems that would be solved by enforcing singleton-ness
  • We should move away from advertising the existence of GLOBAL_TOOLKIT_REGISTRY to users, or recommending modifying it to solve problems.
  • I’d like to pursue the context manager idea

Does this seem like a good route forward?

I’m concerned that what’s really happening here is that a new GLOBAL_TOOLKIT_REGISTRY is being imported for each module.

I just wanted to clear this up a little. Each module is importing the same variable from .toolkits. However, if I understand correctly, the objects get compiled into memory when you first import your module into your interpreter (from openff.toolkit.topology import Molecule). During this compilation, the function default arguments get evaluated. At this point the GLOBAL_TOOLKIT_REGISTRY variable name in the function signature is meaningless; instead, the function keyword points to the object in memory. Therefore you can’t replace it by replacing GLOBAL_TOOLKIT_REGISTRY, whether as a local or global variable, but you can alter behaviour by modifying the object itself.

This doesn’t actually appear to be the case. Instead, what’s really happening is that assignment statements of the form from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY; GLOBAL_TOOLKIT_REGISTRY = X don’t actually modify the original imported object – Instead it makes a new local variable that has nothing to do with openforcefield.utils.toolkits.GLOBAL_TOOLKIT_REGISTRY. So far, we have not found a way to do an in-place replacement of the original.

So, you can reassign openforcefield.utils.toolkits.GLOBAL_TOOLKIT_REGISTRY = MY_REGISTRY and new functions using this variable will be updated. However, existing, compiled, functions are now totally distinct. There is no way to do in-place replacement of the original in compiled objects.

One can override the GLOBAL_TOOLKIT_REGISTRY by switching the order of the precedence inside the object instead of creating a new GLOBAL_TOOLKIT_REGISTRY, but I recall using private attributes to do that so a public-facing method would be handy.

A very nice summary @j-wags !

I’d like to pursue the context manager idea

I’d also be interested in seeing this, with the idea that it would be good to mock something up, try it out, get feedback, but then dump it if it doesn’t feel right! 🙂

Also happy to stick with just passing as an argument, but as you say that it is a bit more tedious and possibly easy to miss once if you are passing it several times in succession.

We should move away from advertising the existence of GLOBAL_TOOLKIT_REGISTRY to users, or recommending modifying it to solve problems.

It would probably be good to just expose a read-only way of accessing it (e.g. a static global_toolkit_registry() function which returns a copy), but not a way in which it can be overwritten / mutated.

Apologies for the wall of text, I just got really interested in this for some reason - the short of it is that I think the original issue is resolved, but there are reasons to consider a refactor.

Some of this behavior is fixed now that GLOBAL_TOOLKIT_REGISTRY can be modified. Below is run with all toolkits available and stock behavior


In [1]: from openforcefield.topology import Molecule


In [2]: from openforcefield.utils.toolkits import *

In [3]: mol = Molecule.from_smiles('CC')
   ...: mol.generate_conformers()
   ...: print(mol.conformers[0][0])
[ 0.81511319 -0.53833634  0.49281269] A

In [4]: mol.generate_conformers(toolkit_registry=OpenEyeToolkitWrapper())
   ...: print(mol.conformers[0][0])
[ 0.81511319 -0.53833634  0.49281269] A  # Confirms default behavior uses OpenEye

In [5]: mol.generate_conformers(toolkit_registry=RDKitToolkitWrapper())
   ...: print(mol.conformers[0][0])
[-0.74552317  0.04144446  0.01170607] A

In [6]: GLOBAL_TOOLKIT_REGISTRY.deregister_toolkit(OpenEyeToolkitWrapper())

In [7]: mol.generate_conformers()
   ...: print(mol.conformers[0][0])
[-0.74552317  0.04144446  0.01170607] A  # Matches RDKit values

In [8]: from openforcefield.utils.toolkits import GLOBAL_TOOLKIT_REGISTRY

In [9]: mol.generate_conformers()
   ...: print(mol.conformers[0][0])
[-0.74552317  0.04144446  0.01170607] A  # Still matches RDKit values, even after re-importing

In [10]: GLOBAL_TOOLKIT_REGISTRY.registered_toolkits
Out[10]:
[ToolkitWrapper around The RDKit version 2020.03.4,
 ToolkitWrapper around AmberTools version 20.0,
 ToolkitWrapper around Built-in Toolkit version None]  # Still has OpenEye de-registered

This seems to satisfy one of the requirements above

Changing GLOBAL_TOOLKIT_REGISTRY should allow the user to change the the cheminformatics toolkit that’s used on the backend.

I’m not great with global variables in Python (since it’s something that should typically be avoided), but whatever is happening to that variable seems to persist whether or not you try and import it again. Its hash and memory location do not change when deregistering toolkits (not that they necessarily would be expected to) and re-importing it does not change behavior, implying that …

The GLOBAL_TOOLKIT_REGISTRY can be imported from a number of modules – for example:

The other ways to import it, I believe, just import it from openforcefield.utils.toolkits, which is the only place it’s actually instantiated. So it’s all pointing to the same variable, no matter how you get it.

I think the tricky thing happening above was GLOBAL_TOOLKIT_REGISTRY = ToolkitRegistry(toolkit_precedence=[RDKitToolkitWrapper]) since the variable sitting in an interpreter may not be what the methods call internally (without explicitly passing that in). I could be wrong

A singleton would mostly work here, since we want to be able to create a single instance of a class and forbid re-instantiation of it. From my research (learning many of these CS concepts on the fly) it seems to be fairy controversial whether or not a singleton is an anti-pattern (link, partially because global variables can serve most purposes better but also because people apparently have a habit of using singletons when they shouldn’t. A downside here is that, along a sequence of unit tests, and modifications to GLOBAL_TOOLKIT_REGISTRY must be re-done before it is used again to ensure the “out of the box” behavior in another unit tests. From the snippets above, this seems to also happen with the current implementation and we’ve just gotten lucky with our unit needs not stressing this behavior. Another downside is that it is fairy difficult to extend, although I’m not sure we can anticipate making different versions of this one.

I’m concerned that what’s really happening here is that a new GLOBAL_TOOLKIT_REGISTRY is being imported for each module.

I don’t think this is happening, although my testing isn’t that extensive.

… context manager approach, where we could use something like with using_toolkit(RDKit): ... to define blocks where certain toolkits or registries are used.

What is the benefit of using this over specifying the toolkit to each method call, or creating a temporary registries for particular purposes?

registry_to_use_now = ToolkitRegistry(...)
...  # do stuff
other_registry = ToolkitRegistry(...)
.... # do other stuff

I think the first question to address is whether GLOBAL_TOOLKIT_REGISTRY is actually a singleton or not.

I think it many respects it is, although I’m inclined to say that it’s just a global-like variable that acts like a singleton. Since it’s an instance, not a class, I don’t think the definitions of singleton can really be considered here (unless we refactor it to a new class that exhibits the behaviors of one)

There are some singleton implementations described here (you have to scroll down for the elegant/simple ones). This could be a solution, but we should have somebody with more experience comment on this.