jina: [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ]

Describe your proposal/problem

Dear Jina Team,

Recently, I tried the nlp-simple example and replaced TransformerTorchEncoder with TextPaddlehubEncoder. However, I got the following error:

Traceback (most recent call last):
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/peapods/runtimes/zmq/zed.py", line 73, in _load_executor
    self._executor = BaseExecutor.load_config(
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/jaml/__init__.py", line 531, in load_config
    return JAML.load(tag_yml, substitute=False)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/jaml/__init__.py", line 89, in load
    r = yaml.load(stream, Loader=JinaLoader)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/yaml/__init__.py", line 114, in load
    return loader.get_single_data()
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/yaml/constructor.py", line 51, in get_single_data
    return self.construct_document(node)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/yaml/constructor.py", line 55, in construct_document
    data = self.construct_object(node)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/yaml/constructor.py", line 100, in construct_object
    data = constructor(self, node)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/jaml/__init__.py", line 422, in _from_yaml
    return get_parser(cls, version=data.get('version', None)).parse(cls, data)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/jaml/parsers/executor/legacy.py", line 130, in parse
    obj = cls(
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/executors/__init__.py", line 82, in __call__
    getattr(obj, '_post_init_wrapper', lambda *x: None)(m, r)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/executors/__init__.py", line 174, in _post_init_wrapper
    self.post_init()
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/jina/hub/encoders/nlp/TextPaddlehubEncoder/__init__.py", line 38, in post_init
    self.model = hub.Module(name=self.model_name)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/module.py", line 171, in __new__
    module = cls.init_with_name(
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/module.py", line 263, in init_with_name
    user_module_cls = manager.install(name=name, version=version, source=source, update=update, branch=branch)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/manager.py", line 188, in install
    return self._install_from_name(name, version)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/manager.py", line 263, in _install_from_name
    return self._install_from_url(item['url'])
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/manager.py", line 256, in _install_from_url
    return self._install_from_archive(file)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/manager.py", line 361, in _install_from_archive
    return self._install_from_directory(directory)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/manager.py", line 345, in _install_from_directory
    hub_module_cls = HubModule.load(self._get_normalized_path(module_info.name))
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddlehub/module/module.py", line 219, in load
    paddle.set_device(place)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddle/device.py", line 166, in set_device
    framework._set_expected_place(place)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddle/fluid/framework.py", line 317, in _set_expected_place
    _set_dygraph_tracer_expected_place(place)
  File "/home/work/renyuanhang/my_anaconda3/envs/jina_rocket2/lib/python3.8/site-packages/paddle/fluid/framework.py", line 311, in _set_dygraph_tracer_expected_place
    _dygraph_tracer_._expected_place = place
OSError: (External)  Cuda error(3), initialization error.
  [Advise: The API call failed because the CUDA driver and runtime could not be initialized. ] (at /paddle/paddle/fluid/platform/gpu_info.cc:229)

It seems to be related to https://github.com/PaddlePaddle/Paddle/issues/25185.

However, I have no idea how to fix this. Could you help me to figure it out? Thank you very much!


Environment

paddlehub==2.0.0 paddlenlp==2.0.1 paddlepaddle-gpu==2.0.2.post90 jina==1.3.0

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 47 (28 by maintainers)

Most upvoted comments

Hi @ryh95, I looked into the issue. It seems to be related not only to Paddle but also to PyTorch (see this thread). Basically, Jina uses multiprocessing with fork as the start method, which CUDA does not support well, and Jina has to stick with fork as the default start method.

As far as I know, serving your solution without enabling the GPU is the easiest way to make it work.
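
One common way to run without the GPU is to hide the CUDA devices from the process before any GPU library is imported. The sketch below is a minimal illustration of that idea, not a confirmed fix from this thread: hide_gpus is a hypothetical helper, and whether an empty CUDA_VISIBLE_DEVICES is enough to keep TextPaddlehubEncoder on the CPU with the package versions listed above is an assumption. Installing the CPU-only paddlepaddle package instead of paddlepaddle-gpu may be another option.

```python
import os


def hide_gpus(env=os.environ):
    """Hide all CUDA devices from libraries loaded after this call.

    Hypothetical helper. Returns the previous CUDA_VISIBLE_DEVICES
    value (or None) so the caller can restore it later.
    """
    previous = env.get("CUDA_VISIBLE_DEVICES")
    env["CUDA_VISIBLE_DEVICES"] = ""
    return previous


# Call this before importing jina, paddle, or paddlehub, so the
# variable is set before any CUDA initialization can happen.
hide_gpus()
```

The variable is read when the CUDA runtime initializes, so setting it after the framework has already touched the GPU is unlikely to have any effect.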

Hi @alexcg1. I tried your suggestion: I started from the wikipedia-sentences example and replaced TransformerTorchEncoder with TextPaddlehubEncoder. However, I still encountered the aforementioned Cuda error(3) with Jina 1.3.

Maybe this problem is related to multiprocessing as this issue points out?
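
The multiprocessing point raised in this thread can be illustrated with the standard library alone. The sketch below uses a plain dictionary as a stand-in for process-wide state such as an initialized CUDA context; STATE, _report_state, and child_sees_parent_state are hypothetical names, and no Jina, Paddle, or CUDA code is involved. It shows only that a forked child inherits whatever the parent has already set up, which is the situation CUDA is reported not to handle well.

```python
import multiprocessing as mp

# Stand-in for process-wide state such as an initialized CUDA context.
# (Hypothetical; no real GPU library is involved.)
STATE = {"initialized": False}


def _report_state(queue):
    """Child task: report whether this process sees the parent's state."""
    queue.put(STATE["initialized"])


def child_sees_parent_state(start_method):
    """Initialize state in the parent, then ask a child what it sees."""
    STATE["initialized"] = True
    ctx = mp.get_context(start_method)
    queue = ctx.Queue()
    proc = ctx.Process(target=_report_state, args=(queue,))
    proc.start()
    seen = queue.get()
    proc.join()
    return seen
```

On a platform that supports it, child_sees_parent_state("fork") reports that the child sees the parent's state, because fork copies the parent's memory. With the "spawn" start method the child is a fresh interpreter that re-imports the module, so it would be expected to start from the uninitialized state; that variant needs to be called from a script file under an `if __name__ == "__main__":` guard.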