numba: pyspark, cluster mode, cannot cache function '__shear_dense': no locator available
I need help with a problem: when I run PySpark on AWS it works in local mode, but in cluster mode it fails with the following error:
File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/worker.py", line 364, in main
func, profiler, deserializer, serializer = read_command(pickleSer, infile)
File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/worker.py", line 69, in read_command
command = serializer._read_with_length(file)
File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
return self.loads(obj)
File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/serializers.py", line 587, in loads
return pickle.loads(obj, encoding=encoding)
File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/cloudpickle.py", line 875, in subimport
__import__(name)
File "/usr/local/lib/python3.7/site-packages/librosa/__init__.py", line 211, in <module>
from . import core
File "/usr/local/lib/python3.7/site-packages/librosa/core/__init__.py", line 5, in <module>
from .convert import * # pylint: disable=wildcard-import
File "/usr/local/lib/python3.7/site-packages/librosa/core/convert.py", line 7, in <module>
from . import notation
File "/usr/local/lib/python3.7/site-packages/librosa/core/notation.py", line 8, in <module>
from ..util.exceptions import ParameterError
File "/usr/local/lib/python3.7/site-packages/librosa/util/__init__.py", line 83, in <module>
from .utils import * # pylint: disable=wildcard-import
File "/usr/local/lib/python3.7/site-packages/librosa/util/utils.py", line 1848, in <module>
def __shear_dense(X, factor=+1, axis=-1):
File "/usr/local/lib64/python3.7/site-packages/numba/core/decorators.py", line 214, in wrapper
disp.enable_caching()
File "/usr/local/lib64/python3.7/site-packages/numba/core/dispatcher.py", line 812, in enable_caching
self._cache = FunctionCache(self.py_func)
File "/usr/local/lib64/python3.7/site-packages/numba/core/caching.py", line 610, in __init__
self._impl = self._impl_class(py_func)
File "/usr/local/lib64/python3.7/site-packages/numba/core/caching.py", line 348, in __init__
"for file %r" % (qualname, source_path))
RuntimeError: cannot cache function '__shear_dense': no locator available for file '/usr/local/lib/python3.7/site-packages/librosa/util/utils.py'
I also tried setting `ENV NUMBA_CACHE_DIR=/tmp/NUMBA_CACHE_DIR/` as root, but it didn't work.
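The next thing I plan to try is passing the variable through Spark's own executor-environment config instead of any shell profile. A minimal sketch (assuming `/tmp/numba_cache` exists and is writable on every executor node; the app name is made up; `spark.executorEnv.<NAME>` and `spark.yarn.appMasterEnv.<NAME>` are the standard Spark/YARN settings for setting environment variables on executors and on the application master):

```python
from pyspark.sql import SparkSession

# Point numba's on-disk cache at a directory the YARN container user can write to.
# spark.executorEnv.<NAME> sets an environment variable in every executor process;
# spark.yarn.appMasterEnv.<NAME> does the same for the YARN application master.
spark = (
    SparkSession.builder
    .appName("librosa-job")  # hypothetical app name
    .config("spark.executorEnv.NUMBA_CACHE_DIR", "/tmp/numba_cache")
    .config("spark.yarn.appMasterEnv.NUMBA_CACHE_DIR", "/tmp/numba_cache")
    .getOrCreate()
)
```

The same settings can also be passed on the command line, e.g. `--conf spark.executorEnv.NUMBA_CACHE_DIR=/tmp/numba_cache`.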
As far as I know, when I spark-submit in local mode the user is 'hadoop', but when I spark-submit to YARN the executor nodes run as the 'yarn' user. So which user's environment variable should I set? I tried setting 'NUMBA_CACHE_DIR' for 'hadoop', 'yarn', and 'root', but none of them worked.
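One workaround that should sidestep the user question entirely is to set the variable inside the task function itself, before librosa (and therefore numba) is imported in the executor process. A sketch, assuming a DataFrame `df` with a `path` column; `extract_features` and the `get_duration` call are only illustrative:

```python
def extract_features(rows):
    # Set a writable cache dir before numba is imported in this worker process.
    # Importing librosa *inside* the function (instead of at module level on
    # the driver) keeps cloudpickle from importing it while unpickling the
    # task, i.e. before this environment variable has been set.
    import os
    os.environ.setdefault("NUMBA_CACHE_DIR", "/tmp/numba_cache")
    import librosa

    for row in rows:
        yield librosa.get_duration(filename=row.path)  # illustrative per-file work

durations = df.rdd.mapPartitions(extract_features).collect()
```

The point of the local import is that the traceback above shows librosa being imported by cloudpickle's `subimport` during task deserialization; keeping the import out of the pickled closure delays it until after the cache dir is set.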