numba: pyspark, cluster mode, cannot cache function '__shear_dense': no locator available

I have a problem and need help: when I run PySpark on AWS, it works in local mode, but in cluster mode it fails with this error:

File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/worker.py", line 364, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/worker.py", line 69, in read_command
    command = serializer._read_with_length(file)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/serializers.py", line 173, in _read_with_length
    return self.loads(obj)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/serializers.py", line 587, in loads
    return pickle.loads(obj, encoding=encoding)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1646288077921_0008/container_1646288077921_0008_01_000002/pyspark.zip/pyspark/cloudpickle.py", line 875, in subimport
    __import__(name)
  File "/usr/local/lib/python3.7/site-packages/librosa/__init__.py", line 211, in <module>
    from . import core
  File "/usr/local/lib/python3.7/site-packages/librosa/core/__init__.py", line 5, in <module>
    from .convert import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.7/site-packages/librosa/core/convert.py", line 7, in <module>
    from . import notation
  File "/usr/local/lib/python3.7/site-packages/librosa/core/notation.py", line 8, in <module>
    from ..util.exceptions import ParameterError
  File "/usr/local/lib/python3.7/site-packages/librosa/util/__init__.py", line 83, in <module>
    from .utils import *  # pylint: disable=wildcard-import
  File "/usr/local/lib/python3.7/site-packages/librosa/util/utils.py", line 1848, in <module>
    def __shear_dense(X, factor=+1, axis=-1):
  File "/usr/local/lib64/python3.7/site-packages/numba/core/decorators.py", line 214, in wrapper
    disp.enable_caching()
  File "/usr/local/lib64/python3.7/site-packages/numba/core/dispatcher.py", line 812, in enable_caching
    self._cache = FunctionCache(self.py_func)
  File "/usr/local/lib64/python3.7/site-packages/numba/core/caching.py", line 610, in __init__
    self._impl = self._impl_class(py_func)
  File "/usr/local/lib64/python3.7/site-packages/numba/core/caching.py", line 348, in __init__
    "for file %r" % (qualname, source_path))
RuntimeError: cannot cache function '__shear_dense': no locator available for file '/usr/local/lib/python3.7/site-packages/librosa/util/utils.py'

I also tried setting ‘ENV NUMBA_CACHE_DIR=/tmp/NUMBA_CACHE_DIR/’ for the root user, but it didn’t work.
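For reference, under YARN the executor environment is separate from the shell environment, so an `ENV` line or a `.bashrc` export for any user may never reach the worker processes. Environment variables can instead be propagated through Spark's own `spark.executorEnv.*` and `spark.yarn.appMasterEnv.*` configuration properties. A minimal sketch of what that would look like (the app name and paths are illustrative; the cache directory must be writable by the executor user):

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("librosa-on-yarn")
    # Propagate the variable to the YARN executors, where the workers run.
    .config("spark.executorEnv.NUMBA_CACHE_DIR", "/tmp/NUMBA_CACHE_DIR")
    # In cluster mode the driver runs inside the application master, so set it there too.
    .config("spark.yarn.appMasterEnv.NUMBA_CACHE_DIR", "/tmp/NUMBA_CACHE_DIR")
    .getOrCreate()
)
```

The same two properties can be passed on the command line with `--conf spark.executorEnv.NUMBA_CACHE_DIR=/tmp/NUMBA_CACHE_DIR` (and likewise for `spark.yarn.appMasterEnv.NUMBA_CACHE_DIR`).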

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (7 by maintainers)

Most upvoted comments

As far as I know, when I spark-submit in local mode the user is ‘hadoop’, but when I spark-submit to YARN the executor node user is ‘yarn’. So which user’s environment variable should I set? I tried setting ‘NUMBA_CACHE_DIR’ for ‘hadoop’, ‘yarn’, and ‘root’, but none of them worked.
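One workaround that sidesteps the which-user question entirely: set `NUMBA_CACHE_DIR` from inside the function that runs on the executors, before librosa (and therefore numba) is imported there. The traceback shows librosa being imported during cloudpickle deserialization, so the module must not be referenced at the top level of the shipped code. A minimal sketch, where `extract_features` is a hypothetical per-record function:

```python
import os

def extract_features(path):
    # Set the cache dir *before* librosa/numba are imported on the executor.
    # /tmp is assumed writable by the executor user (e.g. 'yarn' in cluster mode).
    os.environ["NUMBA_CACHE_DIR"] = "/tmp/NUMBA_CACHE_DIR"
    import librosa  # imported lazily so the env var is already in place

    y, sr = librosa.load(path)
    return librosa.feature.mfcc(y=y, sr=sr).mean(axis=1).tolist()

# rdd.map(extract_features) then imports librosa on each executor
# with NUMBA_CACHE_DIR already set, regardless of which user runs it.
```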