moby: `docker service logs` stops showing logs from containers on different nodes

Description: Running docker service logs foo on a swarm manager, where foo is a service with multiple replicas across different nodes, eventually stops merging the logs from the other nodes. It always seems to work fine right after the service is created.

Steps to reproduce the issue:

  1. Create a service foo with replicas across multiple nodes
  2. Run docker service logs --follow foo
  3. Initially observe logs from multiple containers across different nodes
  4. Go away and do something else for a while
  5. Run docker service logs --follow foo
  6. Observe old logs from multiple containers across different nodes, but note that new logs only come from the node on which you’re running the command (a reproduction sketch follows this list)
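
A minimal reproduction sketch of these steps, assuming a two-node swarm and a hypothetical service named foo that writes a line of output every few seconds:

docker service create --name foo --replicas 4 busybox sh -c 'while true; do date; sleep 5; done'
docker service logs --follow foo     # initially shows tasks from both nodes
# ... wait a while, then run it again ...
docker service logs --follow foo     # old lines from both nodes, new lines only from the local node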

Describe the results you received: Logs from containers on the current node only

Describe the results you expected: Logs from all containers on all nodes

Additional information you deem important (e.g. issue happens only occasionally): It seems to work fine at first, but after some amount of time it stops working. I’ve tried both the json-file and journald log drivers.

Output of docker version:

Client:
 Version:      17.05.0-ce
 API version:  1.29
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.05.0-ce
 API version:  1.29 (minimum version 1.12)
 Go version:   go1.7.5
 Git commit:   89658be
 Built:        Thu May  4 22:10:54 2017
 OS/Arch:      linux/amd64
 Experimental: true

Output of docker info:

Containers: 7
 Running: 7
 Paused: 0
 Stopped: 0
Images: 6
Server Version: 17.05.0-ce
Storage Driver: aufs
 Root Dir: /var/lib/docker/aufs
 Backing Filesystem: extfs
 Dirs: 57
 Dirperm1 Supported: true
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins: 
 Volume: local
 Network: bridge host ipvlan macvlan null overlay
Swarm: active
 NodeID: jbsbgj3on5coa7f996rle8bpk
 Is Manager: true
 ClusterID: 7uzbzxfjt8nf6p18wbzv8ek84
 Managers: 1
 Nodes: 2
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
 Node Address: 172.16.0.5
 Manager Addresses:
  172.16.0.5:2377
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9048e5e50717ea4497b757314bad98ea3763c145
runc version: 9c2d8d184e5da67c95d601382adf14862e4f2228
init version: 949e6fa
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.0-75-generic
Operating System: Ubuntu 16.04.2 LTS
OSType: linux
Architecture: x86_64
CPUs: 1
Total Memory: 1.636GiB
Name: swarmm-master-94917428-0
ID: NNL7:YHDL:5ALU:4ZXF:J3BL:VAIV:UI2T:TV5U:UGQL:UCQC:WWCP:TQDO
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): true
 File Descriptors: 149
 Goroutines: 310
 System Time: 2017-05-12T22:02:26.629917059Z
 EventsListeners: 7
Registry: https://index.docker.io/v1/
Experimental: true
Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support

Additional environment details (AWS, VirtualBox, physical, etc.): Running on Azure using an acs-engine template (https://github.com/Azure/acs-engine). I’m currently just testing this, so I’m using one manager and one worker node. The replicas for my service get split across both nodes.

About this issue

  • Original URL
  • State: open
  • Created 7 years ago
  • Reactions: 17
  • Comments: 67 (6 by maintainers)

Most upvoted comments

We ultimately decided to switch to Kubernetes. We’re a small team and really wanted to avoid the complexity of k8s, but honestly, it’s been quite a pleasure to work with. I wish everyone luck getting this addressed … it’s really frustrating to have this issue sit for so long.

We experience the same problem (Docker version 17.12.1-ce, build 7390fc6). But if we use the service ID instead of the service name, the logs come through:

docker service logs -f $(docker service ls --format "{{.ID}}\t{{.Name}}" | grep myservicename | cut -f1)
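
A possibly simpler equivalent, assuming the name filter matches only your service (myservicename is the placeholder from above):

docker service logs -f $(docker service ls -q --filter name=myservicename)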

We’re experiencing this as well. Not only are we missing logs from docker service logs, they’re not being sent to our log aggregator either. It doesn’t happen all the time and I’m not sure how to reproduce it, but we see it often enough.

When I run docker service logs -f foo_bar it spits out the log until some days ago. But this one works for me: docker service logs -f --since 24h foo_bar and it is tailing along. May help you until bug is fixed. I am running 18.06.1-ce, build e68fc7a on Ubuntu 16.04. (For me using the ID of the active service container didn’t help; it stopped at some point yesterday).

Somehow this still seems to be an issue.

Adding the --since flag fixes it.

e.g.: docker service logs -tf <service> --since 24h

Two and a half years later, the problem persists. None of the workarounds work for me… Does anyone have something other than rebooting?

try docker logs -f container-id

I am no Docker expert by any means, so don’t ask me anything about it. But it is a very annoying problem and I want to share what I do when it happens.

I sometimes use the command below, which does work, but it’s annoying because on a cluster (3 nodes in my case) I have to tail three processes, since the service is load balanced.

watch -t docker logs --tail 500 process_name.1.xxxxxxxxxxxxx
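
A possible way to avoid typing the full container name by hand (swarm names containers <service>.<slot>.<task-id>); process_name is the hypothetical name from above, and this must run on the node that hosts the task:

CONTAINER=$(docker ps --filter 'name=process_name.1.' --format '{{.Names}}' | head -n1)
watch -t docker logs --tail 500 "$CONTAINER"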

For me, to make docker service logs work on my swarm cluster (3 nodes), I have to demote the leader, then restart the Docker service on that node, wait until it is ready and reachable, and then promote it to manager again.

So the commands are along the lines of:

docker node demote managerX
systemctl restart docker
docker node promote managerX

I run the following to make sure all 3 nodes run the service:

docker service scale my_service=4
docker service scale my_service=3
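
A scripted version of the whole cycle, as a rough sketch: it assumes you run it from another manager, that managerX is the hostname shown by docker node ls, and that you can restart Docker on that node over SSH.

NODE=managerX
docker node demote "$NODE"
ssh "$NODE" 'sudo systemctl restart docker'
# wait until the node reports Ready before promoting it again
until docker node ls --filter "name=$NODE" --format '{{.Status}}' | grep -q Ready; do
  sleep 5
done
docker node promote "$NODE"
# bounce the replica count so every node runs a task again
docker service scale my_service=4
docker service scale my_service=3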

Perhaps it helps someone 👍

When I run docker service logs -f foo_bar it spits out the log until some days ago. But this one works for me: docker service logs -f --since 24h foo_bar and it is tailing along. May help you until bug is fixed. I am running 18.06.1-ce, build e68fc7a on Ubuntu 16.04. (For me using the ID of the active service container didn’t help; it stopped at some point yesterday).

This works for me, on 18.09.1, with a caveat, which could potentially be a clue to what the issue is. If this is my current state (just a snippet, there are multiple instances of the service on multiple machines):

ID                  NAME                CURRENT STATE
h454joroxko6        tv_web.1            Running 20 minutes ago
xegys2ytn8hi         \_ tv_web.1        Shutdown 30 minutes ago
pq4n2mmxnxn8         \_ tv_web.1        Shutdown about an hour ago

Note the currently running service has only been up for 20 minutes.

If I do docker service logs -f --since 24h tv_web, I get stale logs from the old containers. Same if I even do docker service logs -f --since 1h tv_web.

But if I make sure the “since” doesn’t go farther back than the start time of the currently running task, so in this case say 10m:

docker service logs -f --since 10m tv_web.

Then all will be well, and the current logs will tail.
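
A hedged way to script that, assuming docker inspect on the running task prints a CreatedAt timestamp that --since accepts (tv_web is the service from the example above):

TASK=$(docker service ps tv_web -q --filter desired-state=running | head -n1)
SINCE=$(docker inspect --format '{{json .CreatedAt}}' "$TASK" | tr -d '"')
docker service logs -f --since "$SINCE" tv_web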

Same here

Docker version 19.03.11
API version 1.40

I experience the same on a standalone swarm using Docker 18.06.1-ce.

I have one container running and one that was terminated:

sudo docker service ps local-apt_go-apt-mirror
ID                  NAME                            IMAGE                            NODE    DESIRED STATE       CURRENT STATE            ERROR               PORTS
9kq98iroy7df        local-apt_go-apt-mirror.1       corp.local/go-apt-mirror:latest  apt     Running             Running 11 minutes ago
ebeouczzc5ad         \_ local-apt_go-apt-mirror.1   corp.local/go-apt-mirror:latest  apt     Shutdown            Shutdown 24 hours ago

When running sudo docker service logs -f local-apt_go-apt-mirror, I get the logs of the terminated container.

Same here:

Docker version 19.03.11, build 42e35e61f3

Not sure if this is related to #35011 but rotating the swarm certificates via docker swarm ca --rotate does help in this case as well.

Sadly this didn’t solve the issue. Logs are not shown when running docker service logs .... I disabled firewall rules just in case, but no, no log output 😦

EDIT: demoting/promoting worked immediately: https://github.com/moby/moby/issues/35932#issuecomment-517299052

I’m getting the same issue here.

Docker version 18.02.0-ce, build fc4de44

Hi, we have the same problem. When a service gets moved, running service logs on the machine that ran the previous task shows the old logs of that task instead of the running one.

However, using the task ID with service logs shows the right logs.
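
A small sketch of that workaround, with a hypothetical service name:

TASK=$(docker service ps myservice -q --filter desired-state=running | head -n1)
docker service logs -f "$TASK"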

Client:
 Version:      17.09.0-ce
 API version:  1.32
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:41:23 2017
 OS/Arch:      linux/amd64

Server:
 Version:      17.09.0-ce
 API version:  1.32 (minimum version 1.12)
 Go version:   go1.8.3
 Git commit:   afdb6d4
 Built:        Tue Sep 26 22:42:49 2017
 OS/Arch:      linux/amd64
 Experimental: true

Indeed, tried demoting / promoting and the logs “came back”: https://github.com/moby/moby/issues/35932#issuecomment-517299052

The issue happened after (re-)deploying a stack/service multiple times (10+)

Same here, Docker 19.03.5 on CentOS 7.

I have to connect directly to the worker nodes to get logs from the container itself. Using Portainer for that now. Pity me.

Guys, it’s too early to be sure, but I think explicitly setting the logging config made it work for me:

logging:
  driver: "json-file"
  options:
    max-file: "1"
    max-size: "20m"

So if you need to see the logs in an emergency, you can use the classic

docker logs -f container-id

Obviously you need a session on the Docker node where the container runs.

You can find the node with the following command:

docker stack ps stackname
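
A hedged one-liner that also shows which node hosts each running task, so you know where to open that session (stackname is hypothetical):

docker stack ps stackname --filter desired-state=running --format 'table {{.Node}}\t{{.Name}}.{{.ID}}'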

Same problem here, but I’m using the SUSE package 😢

Containers: 59
 Running: 22
 Paused: 0
 Stopped: 37
Images: 291
Server Version: 18.06.1-ce
Storage Driver: btrfs
 Build Version: Btrfs v3.18.2+20150430
 Library Version: 101
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
 Volume: local
 Network: bridge host macvlan null overlay
 Log: awslogs fluentd gcplogs gelf journald json-file logentries splunk syslog
Swarm: active
 NodeID: vk2j3y71fx51kk0njcpxjr0mo
 Is Manager: true
 ClusterID: qpih12g0fxuqqey42638fvvkb
 Managers: 2
 Nodes: 3
 Orchestration:
  Task History Retention Limit: 5
 Raft:
  Snapshot Interval: 10000
  Number of Old Snapshots to Retain: 0
  Heartbeat Tick: 1
  Election Tick: 3
 Dispatcher:
  Heartbeat Period: 5 seconds
 CA Configuration:
  Expiry Duration: 3 months
  Force Rotate: 0
 Autolock Managers: false
 Root Rotation In Progress: false
 Node Address: 172.17.1.108
 Manager Addresses:
  172.17.1.108:2377
  172.17.1.175:2377
Runtimes: oci runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 468a545b9edcd5932818eb9de8e72413e616e86e
runc version: 69663f0bd4b60df09991c08812a60108003fa340
init version: v0.1.3_catatonit (expected: fec3683b971d9c3ef73f284f176672c44b448662)
Security Options:
 apparmor
 seccomp
  Profile: default
Kernel Version: 4.4.175-94.79-default
Operating System: SUSE Linux Enterprise Server 12 SP3
OSType: linux
Architecture: x86_64
CPUs: 4
Total Memory: 9.585GiB
Name: srvdockerintra01
ID: 3MRA:H6E2:ZZ6O:5OYB:NYO2:2OZJ:KQ47:3D2O:LVH5:65JE:IOQ4:LMAZ
Docker Root Dir: /var/lib/docker
Debug Mode (client): false
Debug Mode (server): false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false

Insecure Registries:
 127.0.0.0/8
Live Restore Enabled: false

WARNING: No swap limit support
WARNING: No kernel memory limit support


Client:
 Version:           18.06.1-ce
 API version:       1.38
 Go version:        go1.10.7
 Git commit:        e68fc7a215d7
 Built:             Wed Dec 19 10:23:04 2018
 OS/Arch:           linux/amd64
 Experimental:      false

Server:
 Engine:
  Version:          18.06.1-ce
  API version:      1.38 (minimum version 1.12)
  Go version:       go1.10.7
  Git commit:       e68fc7a215d7
  Built:            Tue Aug 21 17:16:31 2018
  OS/Arch:          linux/amd64
  Experimental:     false

docker service logs stops completely on:

~$ docker version
Client:
 Version:           18.09.3
 API version:       1.39
 Go version:        go1.10.8
 Git commit:        774a1f4
 Built:             Thu Feb 28 06:40:58 2019
 OS/Arch:           linux/amd64
 Experimental:      false

Server: Docker Engine - Community
 Engine:
  Version:          18.09.3
  API version:      1.39 (minimum version 1.12)
  Go version:       go1.10.8
  Git commit:       774a1f4
  Built:            Thu Feb 28 05:59:55 2019
  OS/Arch:          linux/amd64
  Experimental:     false

This seems to be a problem for me now and none of the above seems to help 😕

@wayne-o We had to demote the manager nodes one by one and promote them again. Once all nodes had been through this, it started working again. Have you tried that as well? 😃

This is affecting us as well, on 18.06.1-ce, build e68fc7a. I can tail the logs using the ID, which is technically good enough for us to limp along with.

@dperny Yes I will try to grab some logs today. Thanks for looking into this.