umap: PicklingError numpy 1.20

I am having the following issue using numpy 1.20 and umap-learn.

See the following traceback.

PicklingError                             Traceback (most recent call last)
<command-1906172682191955> in <module>
     19 
     20 UMAP = reducer.fit_transform(article_cluster_data[[
---> 21     s for s in eans_embeddings.columns if "embedding" in s
     22 ]])
     23 

/databricks/python/lib/python3.7/site-packages/umap/umap_.py in fit_transform(self, X, y)
   2633             Local radii of data points in the embedding (log-transformed).
   2634         """
-> 2635         self.fit(X, y)
   2636         if self.transform_mode == "embedding":
   2637             if self.output_dens:

/databricks/python/lib/python3.7/site-packages/umap/umap_.py in fit(self, X, y)
   2571 
   2572         numba.set_num_threads(self._original_n_threads)
-> 2573         self._input_hash = joblib.hash(self._raw_data)
   2574 
   2575         return self

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in hash(obj, hash_name, coerce_mmap)
    265     else:
    266         hasher = Hasher(hash_name=hash_name)
--> 267     return hasher.hash(obj)

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in hash(self, obj, return_digest)
     66     def hash(self, obj, return_digest=True):
     67         try:
---> 68             self.dump(obj)
     69         except pickle.PicklingError as e:
     70             e.args += ('PicklingError while hashing %r: %r' % (obj, e),)

/databricks/python/lib/python3.7/pickle.py in dump(self, obj)
    435         if self.proto >= 4:
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)
    439         self.framer.end_framing()

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
    240             klass = obj.__class__
    241             obj = (klass, ('HASHED', obj.descr))
--> 242         Hasher.save(self, obj)
    243 
    244 

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
     92                 cls = obj.__self__.__class__
     93                 obj = _MyHash(func_name, inst, cls)
---> 94         Pickler.save(self, obj)
     95 
     96     def memoize(self, obj):

/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
    772         if n <= 3 and self.proto >= 2:
    773             for element in obj:
--> 774                 save(element)
    775             # Subtle.  Same as in the big comment below.
    776             if id(obj) in memo:

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
    240             klass = obj.__class__
    241             obj = (klass, ('HASHED', obj.descr))
--> 242         Hasher.save(self, obj)
    243 
    244 

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
     92                 cls = obj.__self__.__class__
     93                 obj = _MyHash(func_name, inst, cls)
---> 94         Pickler.save(self, obj)
     95 
     96     def memoize(self, obj):

/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
    787         write(MARK)
    788         for element in obj:
--> 789             save(element)
    790 
    791         if id(obj) in memo:

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
    240             klass = obj.__class__
    241             obj = (klass, ('HASHED', obj.descr))
--> 242         Hasher.save(self, obj)
    243 
    244 

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
     92                 cls = obj.__self__.__class__
     93                 obj = _MyHash(func_name, inst, cls)
---> 94         Pickler.save(self, obj)
     95 
     96     def memoize(self, obj):

/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    502         f = self.dispatch.get(t)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return
    506 

/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
    772         if n <= 3 and self.proto >= 2:
    773             for element in obj:
--> 774                 save(element)
    775             # Subtle.  Same as in the big comment below.
    776             if id(obj) in memo:

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
    240             klass = obj.__class__
    241             obj = (klass, ('HASHED', obj.descr))
--> 242         Hasher.save(self, obj)
    243 
    244 

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
     92                 cls = obj.__self__.__class__
     93                 obj = _MyHash(func_name, inst, cls)
---> 94         Pickler.save(self, obj)
     95 
     96     def memoize(self, obj):

/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    516                 issc = False
    517             if issc:
--> 518                 self.save_global(obj)
    519                 return
    520 

/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save_global(self, obj, name, pack)
    115             Pickler.save_global(self, obj, **kwargs)
    116         except pickle.PicklingError:
--> 117             Pickler.save_global(self, obj, **kwargs)
    118             module = getattr(obj, "__module__", None)
    119             if module == '__main__':

/databricks/python/lib/python3.7/pickle.py in save_global(self, obj, name)
    958             raise PicklingError(
    959                 "Can't pickle %r: it's not found as %s.%s" %
--> 960                 (obj, module_name, name)) from None
    961         else:
    962             if obj2 is not obj:
PicklingError: ("Can't pickle <class 'numpy.dtype[float32]'>: it's not found as numpy.dtype[float32]", 'PicklingError while hashing array([[-0.3997416 , -0.19219466, -0.83981943, ..., -0.9273374 ,\n         1.4046632 ,  0.30895016],\n       [-0.04274601, -0.12016755, -0.53093857, ..., -0.9320015 ,\n         0.8004919 ,  0.14586882],\n       [ 0.10363793,  0.21220148, -0.5180615 , ..., -1.103286  ,\n         1.030384  ,  0.33772892],\n       ...,\n       [ 0.45876223,  0.13564155, -0.37127146, ..., -0.24023826,\n         0.6981608 ,  0.5868731 ],\n       [-0.12448474, -0.12088505, -0.5615971 , ..., -0.42116365,\n         1.4583211 ,  0.395956  ],\n       [-0.10243232, -0.24882779,  0.15550528, ..., -0.7924694 ,\n         1.1544111 ,  0.19003616]], dtype=float32): PicklingError("Can\'t pickle <class \'numpy.dtype[float32]\'>: it\'s not found as numpy.dtype[float32]")')

About this issue

  • State: open
  • Created 3 years ago
  • Reactions: 7
  • Comments: 16 (8 by maintainers)

Most upvoted comments

@lmcinnes @Augusttell I am not sure this should be closed, as this problem still persists, at least for me and, I believe, for others as well.

Ran into the same problem with numpy 1.20; downgrading to numpy==1.19 made umap work.

@yamengzhang In the worst case, if you just want to get things working, you can simply comment out the relevant lines in your umap installation – the joblib hashing isn’t required; it just speeds things up in certain very specific cases. If you remove the following lines:

https://github.com/lmcinnes/umap/blob/9113f4a3f1fa091e6874134bf26ac98e48c5c7ed/umap/umap_.py#L2572

and

https://github.com/lmcinnes/umap/blob/9113f4a3f1fa091e6874134bf26ac98e48c5c7ed/umap/umap_.py#L2670-L2681

it may actually work (although the numpy issues may rear their heads elsewhere – I’m not sure).
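For context, the failing call is `joblib.hash(self._raw_data)` in `fit`. The sketch below mimics (under my reading of joblib's `hashing.py`, shown in the traceback) what joblib's `Hasher` does with a dtype: it substitutes the object with a tuple containing its class and then pickles that. On numpy 1.20 paired with an older joblib, the dtype's class could not be pickled by name, which produces the `PicklingError` above; on a compatible pair this round-trips cleanly.

```python
# Hedged sketch of the step that trips joblib's Hasher. The tuple shape
# mimics joblib/hashing.py line 241 shown in the traceback:
#     obj = (klass, ('HASHED', obj.descr))
import pickle

import numpy as np

arr = np.zeros((3, 2), dtype=np.float32)

# type(arr.dtype) is a per-dtype class introduced in numpy 1.20; pickling
# it by reference is exactly what failed ("it's not found as
# numpy.dtype[float32]") on the broken numpy/joblib combination.
payload = (type(arr.dtype), ("HASHED", arr.dtype.descr))
blob = pickle.dumps(payload)  # raises PicklingError on the broken combo
restored = pickle.loads(blob)
print(restored)
```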

numpy 1.20 seems to be causing a variety of issues right now. I don’t think it is the numpy package itself, but the way pip handles dependencies for packages that depend on numpy and may, or may not, be built against specific numpy versions. It is, to be honest, quite complicated and beyond my knowledge. I am hoping this will eventually sort itself out over the next few weeks as the upstream issues with numpy 1.20 get resolved.

I resolved it using the following requirements list:

umap-learn==0.5.0
numpy==1.20.0
scipy==1.5.4
scikit-learn==0.24.1
numba==0.52
pynndescent==0.5.1
tbb==2021.1.1
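If it helps anyone compare environments, here is a small sketch that reports the installed versions against the pins above without importing each package (`importlib.metadata` is in the standard library since Python 3.8; on the 3.7 runtime in the traceback you would need the `importlib-metadata` backport):

```python
# Report installed versions vs. the pinned list from this comment.
from importlib.metadata import PackageNotFoundError, version

pins = {
    "umap-learn": "0.5.0",
    "numpy": "1.20.0",
    "scipy": "1.5.4",
    "scikit-learn": "0.24.1",
    "numba": "0.52",
    "pynndescent": "0.5.1",
    "tbb": "2021.1.1",
}
for pkg, want in pins.items():
    try:
        have = version(pkg)
    except PackageNotFoundError:
        have = "not installed"
    print(f"{pkg}: installed={have}, pinned={want}")
```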

For me it seems to be some kind of package conflict with the new numpy release.

I suspect there is something going on with dependency resolution on PyPI (or conda, depending on which people are using) that means that, depending on exactly what other packages are installed, a conflict ends up occurring. It is likely something subtle in the other installed packages, and their dependencies, that results in something going astray, which is going to make it hard to track down. Realistically nothing has changed in the umap-learn release, but numpy 1.20 seems to have had some flow-on effects. Starting from a clean virtual environment may be one option to get something working.
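The clean-virtual-environment suggestion can be sketched with only the standard library; the directory name is hypothetical and the pinned versions come from the earlier comment in this thread:

```python
# Create a fresh virtual environment to isolate the numpy 1.20 conflict.
import os
import venv

env_dir = "umap-clean-env"  # hypothetical directory name
venv.create(env_dir, with_pip=False)  # with_pip=True also bootstraps pip

# After activating the environment, install the pinned stack, e.g.:
#   pip install umap-learn==0.5.0 numpy==1.20.0 scipy==1.5.4 \
#       scikit-learn==0.24.1 numba==0.52 pynndescent==0.5.1 tbb==2021.1.1
print("created:", os.path.isdir(env_dir))
```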