docker-airflow: "Failed to fetch log file from worker" when running LocalExecutor

I’m seeing this in the web interface when trying to access the logs of a task, but only when running LocalExecutor - could this be a misconfiguration?

*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

About this issue

  • State: open
  • Created 8 years ago
  • Reactions: 72
  • Comments: 70 (11 by maintainers)

Most upvoted comments

I took a different approach to solve this, which was to declare /usr/local/airflow/logs as a volume in my Dockerfile extending this image, and then to have my webserver container use the volumes from the scheduler. This allows having just one process per container. Note that the directory has to be created before declaring it as a volume, otherwise it will be owned by root and the scheduler won’t be able to write to it.
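A minimal sketch of that setup (compose v2 syntax, since volumes_from was dropped in v3; the Dockerfile referenced by build: . is assumed to first RUN mkdir -p /usr/local/airflow/logs and then declare VOLUME /usr/local/airflow/logs):

scheduler:
    build: .                 # image extending puckel/docker-airflow, with the logs dir created and declared as a VOLUME
    command: scheduler
webserver:
    build: .
    command: webserver
    volumes_from:
        - scheduler          # reuse the scheduler's logs volume so the files look local to the webserver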

Posted in Airflow’s chat:

… I can control the port number of the worker with worker_log_server_port, but the host seems to be inferred somehow:

*** Fetching here: http://02cd4b7f1893:8793/log/functional-tests-v1/collect_only/2016-11-02T00:00:00

Looking through the code, I see the host comes from TaskInstance.hostname: https://github.com/apache/incubator-airflow/blob/6f4704a447756d6a17c617afe1a9b54d629c79ac/airflow/www/views.py#L756

So, how could/should I manipulate the hostname such that I can view logs from within the airflow admin UI?

FYI, I can view the logs from my worker by manually going to the URL where it exposes them. So that’s working. I just need airflow’s admin to be aware of the proper location in the logs UI.

I had to add 8793 to the ports for the worker to expose the port, then I can navigate to the logs at http://[mydockerhost]:8793/log/[dag_id]/[task_id]/[execution_date]

worker:
    image: puckel/docker-airflow:1.7.1.3-3
    restart: always
    environment:
        - EXECUTOR=Celery
    ports:
        - "8793:8793"
    command: worker

Does anyone have a complete workaround for this problem? I think I have the same problem, but I’m not able to make it work 😕

I made the volume sharing work by using volumes: - ./airflow-logs:/usr/local/airflow/logs

You also have to make sure ./airflow-logs is owned by the airflow user on the host (chown airflow: ./airflow-logs), then restart the services with docker-compose.
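A sketch of what that looks like in the compose file, assuming the image tag from the earlier worker example and the same bind mount added to the webserver, scheduler and worker services:

# run once on the host so the container's airflow user can write to it:
#   mkdir -p ./airflow-logs && chown airflow: ./airflow-logs
webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    command: webserver
    volumes:
        - ./airflow-logs:/usr/local/airflow/logs
scheduler:
    image: puckel/docker-airflow:1.7.1.3-3
    command: scheduler
    volumes:
        - ./airflow-logs:/usr/local/airflow/logs
worker:
    image: puckel/docker-airflow:1.7.1.3-3
    command: worker
    volumes:
        - ./airflow-logs:/usr/local/airflow/logs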

My workaround (you could call it a solution) is to mount a named volume shared between the scheduler, worker and webserver for the log files, so that they appear “local” to the webserver. I am using the Celery executor.

[...]
# for scheduler, worker and webserver
    volumes:
        - ./dags/${DEPLOYMENT_CONFIGURATION}/:/usr/local/airflow/dags
        - ./plugins:/usr/local/airflow/plugins
        - ./helper:/usr/local/airflow/helper
        - airflowlogs:/usr/local/airflow/logs
[...]
volumes:
    airflowlogs: {}

I had the same problem here (Airflow v1.8.0). When I used the UI or the trigger_dag command to run my DAG:

airflow trigger_dag <my_dag_here>

Nothing was happening, and the logs in the UI showed:

*** Log file isn't local.
*** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

Airflow didn’t even create the folders for the log files.

But it works if I run the command with backfill, as suggested by @licryle:

airflow backfill <my_dag_here> -s -1

Same here: I can’t see logs when launching from the UI, nor with airflow trigger_dag; however, it seems to work when starting with airflow backfill.

So here’s the issue as I see it:

The worker creates a /etc/hosts entry for its hostname:

$ docker exec -it worker sh -c "cat /etc/hosts | grep 7c69e75cba80"
172.17.0.30 7c69e75cba80

But the webserver does not:

$ docker exec -it webserver sh -c "cat /etc/hosts | grep 7c69e75cba80"

The assumption by Airflow must be that these two processes are executing on the same host. In a containerized environment, this is not the case. Therefore the webserver container needs a hosts mapping for the worker(s).
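One way to give the webserver that mapping is docker-compose’s extra_hosts; a sketch, reusing the container ID and IP from the example above (in practice they would be your worker’s hostname and address):

webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    command: webserver
    extra_hosts:
        - "7c69e75cba80:172.17.0.30"   # worker container hostname -> worker IP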

@omegagi it’s the regular log files produced for each task in the DAG. The good thing about that setup is that all logs are shared by all the Docker containers and also persisted on the host machine. When docker-airflow is restarted, we still have the previous logs available.

Another possible solution I just found on the incubator-airflow-dev mailing list, and I quote (thread name: Fixing logging for Airflow inside Docker):

I’ve got the same situation. I have dockerized Airflow workers running on different EC2 instances. To resolve this issue, I’ve set the hostname of the Docker container to the IP address of the EC2 instance.

If you are using docker-compose, you can add a hostname field to the YAML file. Otherwise, use the -h option to set the hostname.
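A sketch of that suggestion in compose form, assuming the instance’s address is passed in via an environment variable (HOST_IP here is just a placeholder):

worker:
    image: puckel/docker-airflow:1.7.1.3-3
    command: worker
    hostname: ${HOST_IP}     # e.g. the EC2 private IP, so the URL built from the recorded hostname is reachable
    ports:
        - "8793:8793"

The plain docker run equivalent is the -h/--hostname option mentioned in the quote.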

@puckel It seems that on your desktop your image correctly resolves the worker host from its Celery identification.

*** Fetching here: https://THEWORKERID/log/...

This is failing for me, though, and according to @crorella and @licryle this behavior is related to tasks started from the UI only. Can you confirm this?

If that’s the case, I’m going to open this issue in the Apache Airflow issue tracker instead and we could close this one, since it’s very likely not a problem with the docker setup but with Airflow itself.

We are running airflow workers in docker containers on separate machines and had the exact same problem as in the initial post:

*** Log file isn't local.
*** Fetching here: http://33d456e47018:8793/log/g_pipelines/rawlogs_pipeline/2016-10-24T05:30:00
*** Failed to fetch log file from worker.

*** Reading remote logs...
*** Unsupported remote log location.

For us, setting network_mode to host for the Airflow worker containers means they pick up the hostname of the machine they are running on. That hostname is reported to Celery, and Airflow uses it to build the log URL, which fixed our problem.
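In compose terms that is roughly the following (a sketch; with network_mode: host the worker shares the host’s network stack, so a ports: mapping for 8793 is no longer needed):

worker:
    image: puckel/docker-airflow:1.7.1.3-3
    command: worker
    environment:
        - EXECUTOR=Celery
    network_mode: host       # container reports the machine's hostname; 8793 is served directly on the host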

Just going to put this here since I’ve spent a couple hours troubleshooting this over the past few months and have given up multiple times. I’m running docker-airflow (w/ Celery executor) behind a corporate proxy and had HTTP_PROXY environment variables set. This was redirecting the log fetching calls to a proxy authentication web login page.

The fix was simply to add - NO_PROXY=* to the environment: section of the webserver service in my docker-compose YAML.
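Something along these lines (a sketch based on that description; the proxy URL shown is a placeholder for whatever was already configured):

webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    command: webserver
    environment:
        - EXECUTOR=Celery
        - HTTP_PROXY=http://proxy.example.com:8080   # the corporate proxy already being set
        - NO_PROXY=*                                 # keep the log-fetch calls to the workers off the proxy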

Hi,

I’ve fixed the problem in branch v1.8.0 (commit d4636f9). Airflow tries to fetch logs on port tcp/8793, which is opened by the airflow worker process. If using LocalExecutor, the scheduler and the webserver must be on the same host.

@manur @AdimUser What if there are multiple Docker containers running on the same host machine using port 8793?

I passed the

--hostname $HOSTNAME

parameter to docker run for the worker, so it sets the hostname of the Docker instance to that of the host.

@tkaymak @villasv

By setting the container’s network to be that of the host, I was able to work around this. This makes Docker give the containers the same hostname as the underlying host.

# On the worker host
docker run \
  --net=host \
  ...
  airflow worker
# On the master (webserver, scheduler host)
docker run \
  --net=host \
  ...
  airflow scheduler

docker run \
  --net=host \
  ...
  airflow webserver

I now see the container hostnames for task logs reflect the underlying hostnames:

*** Log file does not exist: /usr/local/airflow/logs/example-task/2020-01-05T08:57:24+00:00/1.log
*** Fetching from: http://ip-10-0-1-16.ec2.internal:8793/log/example-task/2020-01-05T08:57:24+00:00/1.log

For what it’s worth, I also see the message below when I trigger a DAG manually through the UI using the LocalExecutor.

 *** Log file isn't local.
 *** Fetching here: http://:8793/log/<my_dag_here>/<my_task_here>/2017-07-13T14:08:46
 *** Failed to fetch log file from worker.
 
 *** Reading remote logs...
 *** Unsupported remote log location.

In addition, the task seemed to be stuck in a running state.

When I changed from schedule_interval="@once" with manual triggering to schedule_interval="* * * * *" and let the scheduler pick it up, the logs were visible and the job ran as expected.

I had been hoping to use something like a one-time DAG to load connections to bootstrap a container for DAG development.

@sdikby No, it’s not fixing this issue.

The problem only occurs when using load_examples = True. I’ve tested the tuto.py DAG with LOAD_EX=n in the compose file, and the webserver has no problem accessing logs on the workers.
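For reference, that flag is just an environment variable on the services in the compose file, e.g. (a sketch; the image tag is copied from the earlier worker example, adjust it to whatever version you run):

webserver:
    image: puckel/docker-airflow:1.7.1.3-3
    command: webserver
    environment:
        - LOAD_EX=n          # don't load the example DAGs (load_examples = False)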

@EBazarov Are you using a custom DAG or the examples?