umap: PicklingError numpy 1.20
I am having the following issue using numpy 1.20 and umap-learn.
See the traceback below.
PicklingError Traceback (most recent call last)
<command-1906172682191955> in <module>
19
20 UMAP = reducer.fit_transform(article_cluster_data[[
---> 21 s for s in eans_embeddings.columns if "embedding" in s
22 ]])
23
/databricks/python/lib/python3.7/site-packages/umap/umap_.py in fit_transform(self, X, y)
2633 Local radii of data points in the embedding (log-transformed).
2634 """
-> 2635 self.fit(X, y)
2636 if self.transform_mode == "embedding":
2637 if self.output_dens:
/databricks/python/lib/python3.7/site-packages/umap/umap_.py in fit(self, X, y)
2571
2572 numba.set_num_threads(self._original_n_threads)
-> 2573 self._input_hash = joblib.hash(self._raw_data)
2574
2575 return self
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in hash(obj, hash_name, coerce_mmap)
265 else:
266 hasher = Hasher(hash_name=hash_name)
--> 267 return hasher.hash(obj)
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in hash(self, obj, return_digest)
66 def hash(self, obj, return_digest=True):
67 try:
---> 68 self.dump(obj)
69 except pickle.PicklingError as e:
70 e.args += ('PicklingError while hashing %r: %r' % (obj, e),)
/databricks/python/lib/python3.7/pickle.py in dump(self, obj)
435 if self.proto >= 4:
436 self.framer.start_framing()
--> 437 self.save(obj)
438 self.write(STOP)
439 self.framer.end_framing()
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
240 klass = obj.__class__
241 obj = (klass, ('HASHED', obj.descr))
--> 242 Hasher.save(self, obj)
243
244
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
92 cls = obj.__self__.__class__
93 obj = _MyHash(func_name, inst, cls)
---> 94 Pickler.save(self, obj)
95
96 def memoize(self, obj):
/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
772 if n <= 3 and self.proto >= 2:
773 for element in obj:
--> 774 save(element)
775 # Subtle. Same as in the big comment below.
776 if id(obj) in memo:
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
240 klass = obj.__class__
241 obj = (klass, ('HASHED', obj.descr))
--> 242 Hasher.save(self, obj)
243
244
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
92 cls = obj.__self__.__class__
93 obj = _MyHash(func_name, inst, cls)
---> 94 Pickler.save(self, obj)
95
96 def memoize(self, obj):
/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
787 write(MARK)
788 for element in obj:
--> 789 save(element)
790
791 if id(obj) in memo:
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
240 klass = obj.__class__
241 obj = (klass, ('HASHED', obj.descr))
--> 242 Hasher.save(self, obj)
243
244
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
92 cls = obj.__self__.__class__
93 obj = _MyHash(func_name, inst, cls)
---> 94 Pickler.save(self, obj)
95
96 def memoize(self, obj):
/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
502 f = self.dispatch.get(t)
503 if f is not None:
--> 504 f(self, obj) # Call unbound method with explicit self
505 return
506
/databricks/python/lib/python3.7/pickle.py in save_tuple(self, obj)
772 if n <= 3 and self.proto >= 2:
773 for element in obj:
--> 774 save(element)
775 # Subtle. Same as in the big comment below.
776 if id(obj) in memo:
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
240 klass = obj.__class__
241 obj = (klass, ('HASHED', obj.descr))
--> 242 Hasher.save(self, obj)
243
244
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save(self, obj)
92 cls = obj.__self__.__class__
93 obj = _MyHash(func_name, inst, cls)
---> 94 Pickler.save(self, obj)
95
96 def memoize(self, obj):
/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
516 issc = False
517 if issc:
--> 518 self.save_global(obj)
519 return
520
/databricks/python/lib/python3.7/site-packages/joblib/hashing.py in save_global(self, obj, name, pack)
115 Pickler.save_global(self, obj, **kwargs)
116 except pickle.PicklingError:
--> 117 Pickler.save_global(self, obj, **kwargs)
118 module = getattr(obj, "__module__", None)
119 if module == '__main__':
/databricks/python/lib/python3.7/pickle.py in save_global(self, obj, name)
958 raise PicklingError(
959 "Can't pickle %r: it's not found as %s.%s" %
--> 960 (obj, module_name, name)) from None
961 else:
962 if obj2 is not obj:
PicklingError: ("Can't pickle <class 'numpy.dtype[float32]'>: it's not found as numpy.dtype[float32]", 'PicklingError while hashing array([[-0.3997416 , -0.19219466, -0.83981943, ..., -0.9273374 ,\n 1.4046632 , 0.30895016],\n [-0.04274601, -0.12016755, -0.53093857, ..., -0.9320015 ,\n 0.8004919 , 0.14586882],\n [ 0.10363793, 0.21220148, -0.5180615 , ..., -1.103286 ,\n 1.030384 , 0.33772892],\n ...,\n [ 0.45876223, 0.13564155, -0.37127146, ..., -0.24023826,\n 0.6981608 , 0.5868731 ],\n [-0.12448474, -0.12088505, -0.5615971 , ..., -0.42116365,\n 1.4583211 , 0.395956 ],\n [-0.10243232, -0.24882779, 0.15550528, ..., -0.7924694 ,\n 1.1544111 , 0.19003616]], dtype=float32): PicklingError("Can\'t pickle <class \'numpy.dtype[float32]\'>: it\'s not found as numpy.dtype[float32]")')
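The failure happens inside `joblib.hash` on the raw float32 data, independent of UMAP itself. As a quick probe (the helper name is my own), you can check whether a given environment is affected before kicking off a long fit:

```python
import joblib
import numpy as np

def joblib_can_hash_float32():
    """Return True if joblib.hash works on a float32 array in this environment."""
    arr = np.zeros((4, 4), dtype=np.float32)
    try:
        joblib.hash(arr)
        return True
    except Exception:
        # Affected numpy 1.20 / joblib combinations raise PicklingError here
        return False

print(joblib_can_hash_float32())
```

If this prints `False`, any `fit_transform` call will hit the same `PicklingError` shown in the traceback above.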
About this issue
- State: open
- Created 3 years ago
- Reactions: 7
- Comments: 16 (8 by maintainers)
@lmcinnes @Augusttell I am not sure this should be closed, as this problem still persists, at least for me and, I believe, for others as well.
Ran into the same problem with numpy 1.20; downgrading to numpy==1.19 made umap work.
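If you are unsure which numpy you actually ended up with after downgrading, a small version check (the helper name is illustrative) can confirm you are below 1.20 before calling umap:

```python
import numpy as np

def numpy_is_pre_120(version=None):
    """Return True if the given (or installed) numpy version is older than 1.20."""
    version = version or np.__version__
    major, minor = (int(part) for part in version.split(".")[:2])
    return (major, minor) < (1, 20)

print(np.__version__, numpy_is_pre_120())
```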
@yamengzhang In the worst case, if you just want to get things working, you can simply comment out the relevant lines in your umap installation – the joblib hashing isn’t required; it just speeds things up in certain very specific cases. If you remove the following lines:
https://github.com/lmcinnes/umap/blob/9113f4a3f1fa091e6874134bf26ac98e48c5c7ed/umap/umap_.py#L2572
and
https://github.com/lmcinnes/umap/blob/9113f4a3f1fa091e6874134bf26ac98e48c5c7ed/umap/umap_.py#L2670-L2681
it may actually work (although the numpy issues may rear their heads elsewhere – I’m not sure).
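If you would rather not edit the installed file, another option (untested by me, and the monkeypatch target is an assumption) is to swap the pickle-based hash for one computed directly from the array's bytes, along these lines:

```python
import hashlib
import numpy as np

def array_hash(arr):
    """Hash an ndarray from its raw bytes, dtype and shape, avoiding pickle."""
    h = hashlib.sha1()
    h.update(np.ascontiguousarray(arr).tobytes())
    h.update(str(arr.dtype).encode())
    h.update(str(arr.shape).encode())
    return h.hexdigest()

# Hypothetical monkeypatch, applied before calling fit_transform:
# import umap.umap_
# umap.umap_.joblib.hash = array_hash
```

This sidesteps the pickling of `numpy.dtype[float32]` entirely, since only raw bytes and string metadata are fed to the hasher.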
numpy 1.20 seems to be causing a variety of issues right now. I don’t think it is the numpy package itself, but rather the way pip handles dependencies: packages that depend on numpy may, or may not, be built against specific numpy versions. It is, to be honest, quite complicated and beyond my knowledge. I am hoping this will sort itself out over the next few weeks as the upstream issues with numpy 1.20 get resolved.
I resolved it using the following requirements list:

```
umap-learn==0.5.0
numpy==1.20.0
scipy==1.5.4
scikit-learn==0.24.1
numba==0.52
pynndescent==0.5.1
tbb==2021.1.1
```
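To double-check that an environment actually picked up these pins, a small check against installed package metadata could look like this (the pin set is copied from above; the helper is mine, and it does naive exact-string matching, so e.g. `0.52` vs `0.52.0` would be flagged; `importlib.metadata` needs Python 3.8+):

```python
from importlib import metadata

PINS = {
    "umap-learn": "0.5.0",
    "numpy": "1.20.0",
    "scipy": "1.5.4",
    "scikit-learn": "0.24.1",
    "numba": "0.52",
    "pynndescent": "0.5.1",
    "tbb": "2021.1.1",
}

def find_mismatches(pins):
    """Return {package: installed_version_or_None} for packages not matching pins."""
    mismatches = {}
    for pkg, wanted in pins.items():
        try:
            installed = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[pkg] = installed
    return mismatches

print(find_mismatches(PINS))
```

An empty dict means the environment matches the list exactly.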
For me it seems to be some kind of package conflict with the new numpy release.
I suspect there is something going on with dependency resolution on PyPI (or conda, depending on which people are using) that means that, depending on exactly what other packages are installed, a conflict ends up occurring. That likely means it is something subtle in the other packages installed, and in what their dependencies are, that results in something going astray. It is going to be hard to track down. Realistically nothing has changed in the umap-learn release, but numpy 1.20 seems to have had some flow-on effects. Starting from a clean virtual environment may be one way to get something working.