aim: The docker container doesn't work out of the box

🐛 Bug

Hello, first of all, thank you very much for the crazy amount of work put into this.

Currently, it seems like the Docker setup fails. I get this message: Aborted! '/opt/aim' is not a valid Aim repository. Do you want to initialize it? [y/N]:

To reproduce

docker run -d -p 43800:43800 -v /testFolder:/opt/aim aimstack/aim

Expected behavior

The container running and listening on port 43800

Environment

  • Aim Version (e.g., 3.0.1) : None
  • Python version : None
  • pip version : None
  • OS (e.g., Linux) : Ubuntu 22.04
  • Any other relevant information : Latest Docker image

Additional context

Is there a way to add a "yes to everything" parameter (something like apt-get install -y)?

Thank you very much in advance.

About this issue

  • State: open
  • Created 2 years ago
  • Reactions: 3
  • Comments: 16 (3 by maintainers)

Most upvoted comments

@cceyda I haven’t added the --host 0.0.0.0 to the server explicitly, because it’s the default host when running with server, so that part is fine.

@feldlime I actually got it working by splitting the 2 out (init step from server and ui). That way I don’t need to deal with timeouts and just do it manually. After initializing through docker with init, and later starting the compose file with server and ui, everything works and I don’t see those weird python errors anymore. So it seems initializing first (and not at the same time as booting up server and ui) was the trick - otherwise the data could become corrupted. Thanks for all the feedback and help!
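The two-step approach described above (initialise first, then start the services) might look roughly like the following sketch. It assumes the image's entrypoint is the aim CLI, so that passing init as the container command runs aim init in /opt/aim, and that Docker Compose v2 is installed. The helper names build_init_command and init_then_start are hypothetical and not part of Aim.

```python
import subprocess


def build_init_command(host_dir, image="aimstack/aim"):
    """Build a `docker run` command that runs `aim init` inside the container.

    Assumption: the image's entrypoint is the `aim` CLI and /opt/aim is the
    repository path inside the container, as in the commands in this thread.
    """
    return [
        "docker", "run", "--rm",
        "-v", f"{host_dir}:/opt/aim",
        image, "init",
    ]


def init_then_start(host_dir, image="aimstack/aim"):
    """Initialise the repository once, then start the compose services."""
    # Step 1: one-off init; the container exits when the command finishes.
    subprocess.run(build_init_command(host_dir, image), check=True)
    # Step 2: start ui and server only after the repository exists.
    # With Compose v1 this would be ["docker-compose", "up", "-d"] instead.
    subprocess.run(["docker", "compose", "up", "-d"], check=True)
```

Calling init_then_start("/testFolder") from the directory that holds the compose file would run both steps in order. Because the init step runs inside the container, Aim does not need to be installed on the host.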

Thanks for the catch @feldlime, adding --host 0.0.0.0 was the key! I have tested the docker-compose file below and it worked for me both locally and on a remote machine. @vanhumbeecka You have to add command: server --host 0.0.0.0 to the server as well, and make sure the directory you are mounting in the volumes exists (it is ~/aim/training_logs in the example below). Also check that the versions match between your client (the Python environment that is logging runs) and the server (the Docker image).

version: "3"

services:
  ui:
    image: aimstack/aim:3.16.0
    container_name: aim_ui
    restart: unless-stopped
    command: up --host 0.0.0.0
    ports:
      - 43800:43800
    volumes:
      - ~/aim/training_logs:/opt/aim
    networks:
      - aim

  server:
    image: aimstack/aim:3.16.0
    container_name: aim_server
    restart: unless-stopped
    command: server --host 0.0.0.0
    ports:
      - 53800:53800
    volumes:
      - ~/aim/training_logs:/opt/aim
    networks:
      - aim

networks:
  aim:
    driver: bridge
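The advice about matching client and server versions can be checked from the client side. The sketch below assumes that the image tag (3.16.0 here) mirrors the version of the aim Python package, which this thread implies but does not state. The function names are hypothetical.

```python
from importlib import metadata


def image_tag(image):
    """Return the tag of an image reference such as 'aimstack/aim:3.16.0'."""
    _name, sep, tag = image.rpartition(":")
    # No ':' at all, or a ':' that belongs to a registry port, means no tag.
    if not sep or "/" in tag:
        return None
    return tag


def versions_match(client_version, image):
    """True when the client package version equals the image tag."""
    return image_tag(image) == client_version


def installed_aim_version():
    """Version of the locally installed `aim` package, or None if absent."""
    try:
        return metadata.version("aim")
    except metadata.PackageNotFoundError:
        return None
```

For example, versions_match(installed_aim_version(), "aimstack/aim:3.16.0") is true only when the logging environment has aim 3.16.0 installed.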

Here is a fake run to test remote connection:

from aim import Run

# Replace [remote_ip] with your tracking server's IP or hostname.
aim_run = Run(repo='aim://[remote_ip]:53800',
              experiment='docker_remote_test')

# Log run parameters
aim_run['params'] = {
    'learning_rate': 0.001,
    'batch_size': 32,
}

# Track a fake, decreasing loss over four epochs
aim_run.track(5, name='loss', epoch=0, context={'subset': 'train'})
aim_run.track(4, name='loss', epoch=1, context={'subset': 'train'})
aim_run.track(3, name='loss', epoch=2, context={'subset': 'train'})
aim_run.track(2, name='loss', epoch=3, context={'subset': 'train'})
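The script above only exercises the tracking server on port 53800. To check separately that the UI on port 43800 actually answers HTTP requests (rather than accepting the connection and then resetting it, which can happen when the process inside the container is not bound to 0.0.0.0), a small probe like the following may help. This is a sketch using only the Python standard library; the function name ui_responds is hypothetical.

```python
import http.client
import urllib.error
import urllib.request


def ui_responds(host, port=43800, timeout=5.0):
    """Return True if an HTTP GET to host:port gets any HTTP response."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout):
            return True
    except urllib.error.HTTPError:
        # The server answered, just not with a 2xx status.
        return True
    except (OSError, http.client.HTTPException):
        # Refused connections, resets, timeouts and malformed responses.
        return False
```

A plain TCP connect test is deliberately not used here, since Docker's port forwarding can accept a connection even when nothing behind it is reachable.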

Hello, I had a deeper look. I wanted to set up an aimstack instance on one machine and then have other machines send data to it. If my understanding is correct, there needs to be a server receiving the data and writing it to the right folder, and then a Docker UI container displaying this data.

After creating a folder, running aim init in it, and then starting the Docker containers with a volume linked to the initialised folder, everything now runs without any error. But for some reason, I can’t get anything to display in the web browser; I get "the connection was reset".

It seems like the Docker part is not yet stable, so I will limit myself to a locally hosted server and UI, even if it’s not the best solution.

To answer your question, I wanted to track all experiments on one server and run training on a different server. I run the training in a normal Python script, not in a Docker container.

Thank you still for your answers and work!

Best regards, Ilias.

Hello, thank you for the fast reply. I see what the problem is (I will test it as soon as I have access to a computer). There needs to be an initialised Aim repository on the host, so in my case:

  • Go to /testFolder
  • Run aim init
  • Start the Docker container pointing at /testFolder
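Before starting the container, it may be worth checking whether the host folder has already been initialised. The sketch below assumes that aim init creates a .aim subdirectory inside the target folder. The function name looks_initialised is hypothetical.

```python
from pathlib import Path


def looks_initialised(folder):
    """Return True if `folder` contains a `.aim` directory.

    Assumption: `aim init` creates a `.aim` subdirectory in the folder.
    """
    return (Path(folder) / ".aim").is_dir()
```

If looks_initialised("/testFolder") returns False, mounting that folder at /opt/aim would be expected to trigger the "not a valid Aim repository" prompt from the original report.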

But what if I can’t install Aim on the host? How can I initialise the repository without it?

Thanks again!