xmanager: `xmanager launch` cannot resolve `'docker'` in subprocess

Hi All,

I am trying to run the example with `xmanager launch ./xmanager/examples/cifar10_tensorflow/launcher.py`, but I get the following error. Do you have any suggestions as to where this error may be coming from and how I could fix it?

I1020 10:57:30.250377 4561079808 build_image.py:134] Local docker: {'Platform': {'Name': 'Docker Engine - Community'}, 'Components': [{'Name': 'Engine', 'Version': '20.10.8', 'Details': {'ApiVersion': '1.41', 'Arch': 'amd64', 'BuildTime': '2021-07-30T19:52:10.000000000+00:00', 'Experimental': 'false', 'GitCommit': '75249d8', 'GoVersion': 'go1.16.6', 'KernelVersion': '5.10.47-linuxkit', 'MinAPIVersion': '1.12', 'Os': 'linux'}}, {'Name': 'containerd', 'Version': '1.4.9', 'Details': {'GitCommit': 'e25210fe30a0a703442421b0f60afac609f950a3'}}, {'Name': 'runc', 'Version': '1.0.1', 'Details': {'GitCommit': 'v1.0.1-0-g4144b63'}}, {'Name': 'docker-init', 'Version': '0.19.0', 'Details': {'GitCommit': 'de40ad0'}}], 'Version': '20.10.8', 'ApiVersion': '1.41', 'MinAPIVersion': '1.12', 'GitCommit': '75249d8', 'GoVersion': 'go1.16.6', 'Os': 'linux', 'Arch': 'amd64', 'KernelVersion': '5.10.47-linuxkit', 'BuildTime': '2021-07-30T19:52:10.000000000+00:00'}
I1020 10:57:30.250654 4561079808 docker_lib.py:64] Building Docker image
Dockerfile:

FROM gcr.io/deeplearning-platform-release/tf2-gpu.2-6

RUN if ! id 1000; then useradd -m -u 1000 clouduser; fi

ENV LANG=C.UTF-8
RUN apt-get update && apt-get install -y git netcat
RUN python -m pip install --upgrade pip setuptools
COPY cifar10_tensorflow/requirements.txt /cifar10_tensorflow/requirements.txt
RUN python -m pip install -r cifar10_tensorflow/requirements.txt
COPY cifar10_tensorflow/ /cifar10_tensorflow
RUN chown -R 1000:root /cifar10_tensorflow && chmod -R 775 /cifar10_tensorflow
WORKDIR cifar10_tensorflow

COPY entrypoint.sh ./entrypoint.sh
RUN chown -R 1000:root ./entrypoint.sh && chmod -R 775 ./entrypoint.sh

ENTRYPOINT ["./entrypoint.sh"]

Size of Docker input: 7.0 kB
Building Docker image, please wait...
Traceback (most recent call last):
  File "/Users/chuchu/anaconda3/envs/jax/bin/xmanager", line 8, in <module>
    sys.exit(entrypoint())
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cli/cli.py", line 65, in entrypoint
    app.run(main)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cli/cli.py", line 41, in main
    app.run(m.main, argv=argv)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "/Users/chuchu/Documents/gt_local/try/xmanager/examples/cifar10_tensorflow/launcher.py", line 48, in main
    args={},
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/core.py", line 484, in package
    return cls._async_packager.package(packageables)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/async_packager.py", line 104, in package
    executables = self._package_batch(packageables)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 56, in package
    for packageable in packageables
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 56, in <listcomp>
    for packageable in packageables
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/pattern_matching.py", line 113, in apply
    return case.handle(*values)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/router.py", line 27, in _visit_caip_spec
    packageable.executable_spec)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/cloud.py", line 153, in package_cloud_executable
    return _CLOUD_PACKAGING_ROUTER(packageable, executable_spec)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm/pattern_matching.py", line 113, in apply
    return case.handle(*values)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/xm_local/packaging/cloud.py", line 129, in _package_python_container
    packageable.env_vars, push_image_tag))
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/build_image.py", line 110, in build
    image_name, project, bucket)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/build_image.py", line 154, in build_by_dockerfile
    show_docker_command_progress=_SHOW_DOCKER_COMMAND_PROGRESS.value)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/docker_lib.py", line 70, in build_docker_image
    dockerfile, show_docker_command_progress)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/site-packages/xmanager/cloud/docker_lib.py", line 113, in _build_image_with_docker_command
    subprocess.run(command, check=True, env={'DOCKER_BUILDKIT': '1'})
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 488, in run
    with Popen(*popenargs, **kwargs) as process:
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 800, in __init__
    restore_signals, start_new_session)
  File "/Users/chuchu/anaconda3/envs/jax/lib/python3.7/subprocess.py", line 1551, in _execute_child
    raise child_exception_type(errno_num, err_msg, err_filename)
FileNotFoundError: [Errno 2] No such file or directory: 'docker': 'docker'

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 23 (2 by maintainers)

Most upvoted comments

In IAM Admin, could you add xmanager@lraexp.iam.gserviceaccount.com as a Storage Admin? This was supposed to have been done in this function.

Just fyi, this service account (xmanager@lraexp.iam.gserviceaccount.com) is owned by you and is bound to this project, which you can view in Service Accounts. Granting this account additional permissions does not provide anyone else access other than the owners/editors of your project.

Regarding the docker bug, perhaps we should include an environment variable that users can set to point to the full path of docker.
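
For illustration, here is a minimal sketch of what such an override could look like. The variable name XM_DOCKER_PATH and the helper resolve_docker are hypothetical and are not part of xmanager; the sketch only shows the idea of preferring a user-supplied path and otherwise falling back to a normal PATH lookup with shutil.which.

```python
import os
import shutil


def resolve_docker(env_var="XM_DOCKER_PATH"):
    """Return the docker executable to use.

    `env_var` is a hypothetical variable name chosen for this sketch;
    xmanager does not define it.
    """
    # Prefer an explicit, user-supplied path when the variable is set.
    override = os.environ.get(env_var)
    if override:
        return override
    # Otherwise search the current PATH, as a shell would.
    found = shutil.which("docker")
    if found is None:
        raise FileNotFoundError(
            f"docker not found on PATH; set {env_var} to its full path")
    return found
```

The returned value could then be used as the first element of the command list passed to subprocess.run, so that the lookup no longer depends on the environment handed to the child process.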

Pshiko, thank you for investigating the problem. We will apply the fixes you propose with some amendments. In run_container_subprocess we shouldn’t have been overriding the environment at all. These variables should be set inside the container, not for the docker run process.
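
To make the environment point concrete: the traceback shows subprocess.run being called with env={'DOCKER_BUILDKIT': '1'}. Passing env replaces the child's whole environment rather than adding to it, so PATH is dropped and the lookup of 'docker' falls back to a short default search path, which likely does not contain the directory Docker is installed in. The sketch below is one way to add a variable without losing the rest; run_with_extra_env is a hypothetical helper written for this example, not the fix that was actually applied in xmanager.

```python
import os
import subprocess
import sys


def run_with_extra_env(command, extra_env):
    """Run `command` with the parent's environment plus `extra_env`.

    Hypothetical helper for illustration; not part of xmanager.
    """
    # Copy the current environment (including PATH) and layer the
    # extra variables on top, instead of replacing everything.
    env = {**os.environ, **extra_env}
    return subprocess.run(
        command, check=True, env=env, capture_output=True, text=True)


# Example: the child process sees both DOCKER_BUILDKIT and the inherited PATH.
result = run_with_extra_env(
    [sys.executable, "-c",
     "import os; print(os.environ['DOCKER_BUILDKIT'], 'PATH' in os.environ)"],
    {"DOCKER_BUILDKIT": "1"},
)
print(result.stdout.strip())
```

For a docker build call, the same pattern would pass {'DOCKER_BUILDKIT': '1'} as extra_env, so the docker executable is still resolved through the user's PATH.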

I wonder if appending shell=True to the subprocess.run call on line 113 of docker_lib.py makes a difference…