deepchem: Errors featurizing PDBBind with RdkitGridFeaturizer
I’m trying to run this script from the book examples, which uses RdkitGridFeaturizer to featurize PDBBind. Here’s what I find.
- With the latest code, using the core subset and featurizing only the binding pockets, featurization is much faster than before (minutes instead of hours).
- While running, it fills the console with a huge number of lines like these:
Coordinates are outside of the box (atom id = 7, coords xyz = [-3.97537037 0.55992593 -8.72777778], coords in box = [ 2 4 -1]
Coordinates are outside of the box (atom id = 105, coords xyz = [-1.53037037 -1.94707407 -8.81577778], coords in box = [ 3 3 -1]
Coordinates are outside of the box (atom id = 106, coords xyz = [-1.41437037 -0.88707407 -9.36477778], coords in box = [ 3 3 -1]
I suspect there’s one line for every atom that isn’t in the binding pocket?
- It fails with this error:
/home/peastman/miniconda3/envs/tf2/lib/python3.8/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)
Traceback (most recent call last):
  File "pdbbind_nn.py", line 11, in <module>
    n_features = train_dataset.X.shape[1]
IndexError: tuple index out of range
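The traceback is consistent with `train_dataset.X` being a one-dimensional array of objects rather than a 2-D feature matrix. The `VisibleDeprecationWarning` says NumPy received ragged sequences, which it packs into an array of shape `(n_samples,)`, so `shape[1]` does not exist. The sketch below reproduces that with plain NumPy, assuming (not confirmed) that some complexes failed to featurize and came back as empty arrays. `stack_features` is a hypothetical helper, not a DeepChem function.

```python
import numpy as np

def stack_features(features, expected_len):
    """Hypothetical helper: keep only 1-D vectors of length `expected_len`.

    Returns a 2-D float matrix of the kept vectors and their original indices,
    so labels and ids can be filtered the same way.
    """
    kept = [i for i, f in enumerate(features)
            if np.asarray(f).ndim == 1 and len(f) == expected_len]
    if not kept:
        return np.empty((0, expected_len)), kept
    X = np.stack([np.asarray(features[i], dtype=float) for i in kept])
    return X, kept

# Assumed failure mode: the second complex produced an empty feature array.
features = [np.ones(4), np.array([]), np.zeros(4)]

# Ragged input becomes a 1-D object array, so there is no second dimension.
ragged = np.array(features, dtype=object)
print(ragged.shape)  # (3,)
try:
    n_features = ragged.shape[1]
except IndexError as err:
    print("IndexError:", err)  # same failure as in pdbbind_nn.py

X, kept = stack_features(features, expected_len=4)
print(X.shape, kept)  # (2, 4) [0, 2]
```

Filtering by index (rather than just dropping bad rows) keeps the labels aligned with the surviving feature vectors.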
About this issue
- State: closed
- Created 3 years ago
- Comments: 16 (16 by maintainers)
Thanks! That seems to have fixed it.
How about the warning messages about coordinates outside the box? Can we eliminate them? If I set the box width to 75, they go away, but then it runs out of memory building the model because the dataset has 1,975,467 features! And 75 Å is far larger than a typical binding site. (I assume the units are Å? The documentation doesn't say.) The whole point of featurizing only the binding site is that many atoms should lie outside the box, so it doesn't make sense to warn about them.
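The logged indices can be reproduced by shifting the centered coordinates by half the box width and dividing by the voxel width. With `box_width=16` and `voxel_width=2` (values inferred from the log, not confirmed from the script), each logged `coords xyz` maps exactly to the `coords in box` shown, and the negative z index is what puts those atoms outside. PDB coordinates are in Å, so the box width is presumably in Å as well. The sketch below is a hypothetical re-implementation, not DeepChem's code: it drops out-of-box atoms silently and shows why a 75 Å box inflates the feature count, since the number of voxels grows with the cube of `box_width / voxel_width`.

```python
import numpy as np

def raw_voxel_indices(coords, box_width=16.0, voxel_width=2.0):
    """Integer voxel index per atom for box-centered coordinates (Å)."""
    coords = np.asarray(coords, dtype=float)
    return np.floor((coords + box_width / 2.0) / voxel_width).astype(int)

def voxel_indices(coords, box_width=16.0, voxel_width=2.0):
    """Keep only atoms inside the grid, without logging anything.

    For a pocket-only box most protein atoms are expected to fall outside,
    so they are dropped quietly. Returns the row indices of in-box atoms
    and their voxel indices.
    """
    n = int(np.ceil(box_width / voxel_width))  # voxels per side
    idx = raw_voxel_indices(coords, box_width, voxel_width)
    inside = np.all((idx >= 0) & (idx < n), axis=1)
    return np.nonzero(inside)[0], idx[inside]

def voxels_per_grid(box_width, voxel_width=2.0):
    """Voxels in one cubic grid; voxelized features scale with this count."""
    n = int(np.ceil(box_width / voxel_width))
    return n ** 3

# The three atoms from the log, plus one atom at the box center.
coords = [[-3.97537037, 0.55992593, -8.72777778],
          [-1.53037037, -1.94707407, -8.81577778],
          [-1.41437037, -0.88707407, -9.36477778],
          [0.0, 0.0, 0.0]]

raw = raw_voxel_indices(coords)[:3].tolist()
print(raw)  # [[2, 4, -1], [3, 3, -1], [3, 3, -1]]

kept, idx = voxel_indices(coords)
print(kept.tolist(), idx.tolist())  # [3] [[4, 4, 4]]

print(voxels_per_grid(16), voxels_per_grid(75))  # 512 54872
```

Going from 16 Å to 75 Å multiplies the voxel count by roughly 100 under these assumptions, which fits the memory blow-up described above.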