moby: "tls: bad record MAC" on Qemu Risc-V VM
Description
While trying to docker pull
some images, Docker logs multiple local error: tls: bad record MAC
errors into the Journal and keep retrying layers to pull the images. Eventually they get pulled.
...
Nov 24 13:32:51 qemuriscv dockerd[430]: time="2019-11-24T13:32:51.589142373Z" level=error msg="fatal task error" error="No such container: func_gateway.1.s97f0g29121agb0ew8tegqc95" module=node/agent/taskmanager node.id=xnmja6254jyia6kahqri8cune service.id=ofbzjkclec63bkzr47mnzzyei task.id=s97f0g29121agb0ew8tegqc95
Nov 24 13:32:51 qemuriscv dockerd[430]: time="2019-11-24T13:32:51.716350921Z" level=info msg="API listen on /run/docker.sock"
Nov 24 13:33:06 qemuriscv dockerd[430]: time="2019-11-24T13:33:06.884992784Z" level=info msg="Download failed, retrying (1/5): local error: tls: bad record MAC"
Nov 24 13:33:14 qemuriscv dockerd[430]: time="2019-11-24T13:33:14.156336788Z" level=info msg="Download failed, retrying (2/5): local error: tls: bad record MAC"
Nov 24 13:33:14 qemuriscv dockerd[430]: time="2019-11-24T13:33:14.177138551Z" level=info msg="Download failed, retrying (3/5): local error: tls: bad record MAC"
Nov 24 13:33:25 qemuriscv dockerd[430]: time="2019-11-24T13:33:25.986011763Z" level=info msg="Download failed, retrying
...
Steps to reproduce the issue:
- Fetch Qemu Virtual machine from https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/debian-riscv64-20181123.tar.bz2
- Unpack the VM and run with
run_debian.sh
script. - Download Docker for Risc-V built from
master
on 22/11/2019 in the VM from https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/docker-19.03.5-dev_riscv64.deb - Install Docker with
sudo apt install ./docker-19.03.5-dev_riscv64.deb
- Try pulling a Docker image like
docker pull carlosedp/faas-gateway:riscv64
- Check journal with
sudo journalctl -xef -u docker
for error messages.
Describe the results you received:
Docker keeps printing local error: tls: bad record MAC
and retrying some layers until eventually pulling the image.
Describe the results you expected:
Image get pulled with no errors.
Additional information you deem important (e.g. issue happens only occasionally):
I also tried disabling all checksumming from eth0 in the VM (as I experienced similar problems on ARM before) but no behavior change:
❯ sudo ethtool --offload eth0 rx off tx off
sudo ethtool -K eth0 rx off tx off
sudo ethtool -K eth0 gso off
sudo ethtool -K eth0 gro off
sudo ethtool -K eth0 tso off
Cannot change rx-checksumming
Cannot change tx-checksumming
Cannot change rx-checksumming
Cannot change tx-checksumming
Cannot change tcp-segmentation-offload
❯ sudo ethtool -k eth0
Features for eth0:
rx-checksumming: off [fixed]
tx-checksumming: off
tx-checksum-ipv4: off [fixed]
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: off [fixed]
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: off
tx-scatter-gather: off [fixed]
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: off
tx-tcp-segmentation: off [fixed]
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off [fixed]
tx-tcp6-segmentation: off [fixed]
udp-fragmentation-offload: off
generic-segmentation-offload: off
generic-receive-offload: off
large-receive-offload: off
rx-vlan-offload: off [fixed]
tx-vlan-offload: off [fixed]
ntuple-filters: off [fixed]
receive-hashing: off [fixed]
highdma: on [fixed]
rx-vlan-filter: on [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-gre-csum-segmentation: off [fixed]
tx-ipxip4-segmentation: off [fixed]
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
tx-udp_tnl-csum-segmentation: off [fixed]
tx-gso-partial: off [fixed]
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: off [fixed]
rx-vlan-stag-hw-parse: off [fixed]
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: off [fixed]
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: off [fixed]
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: off [fixed]
tls-hw-record: off [fixed]
Output of docker version
:
❯ docker version
Client:
Version: unknown-version
API version: 1.40
Go version: devel +1c193b20a6 in the future
Git commit: 016a3232
Built: Fri Nov 22 16:22:26 2019
OS/Arch: linux/riscv64
Experimental: true
Server:
Engine:
Version: dev
API version: 1.41 (minimum version 1.12)
Go version: devel +1c193b20a6 in the future
Git commit: d1d5f64766-unsupported
Built: Fri Nov 22 18:12:30 2019
OS/Arch: linux/riscv64
Experimental: false
containerd:
Version: v1.3.1
GitCommit: c7a4f874b3267c499484aae602d1257b12d69e40
docker-init:
Version: 0.18.0
GitCommit: 3c53686
Output of docker info
:
❯ docker info
Client:
Debug Mode: false
Server:
Containers: 28
Running: 7
Paused: 0
Stopped: 21
Images: 13
Server Version: dev
Storage Driver: overlay2
Backing Filesystem: <unknown>
Supports d_type: true
Native Overlay Diff: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: active
NodeID: xnmja6254jyia6kahqri8cune
Is Manager: true
ClusterID: z5l4k24sfd7z92uygtl1vcgma
Managers: 1
Nodes: 1
Default Address Pool: 10.0.0.0/8
SubnetSize: 24
Data Path Port: 4789
Orchestration:
Task History Retention Limit: 5
Raft:
Snapshot Interval: 10000
Number of Old Snapshots to Retain: 0
Heartbeat Tick: 1
Election Tick: 10
Dispatcher:
Heartbeat Period: 5 seconds
CA Configuration:
Expiry Duration: 3 months
Force Rotate: 0
Autolock Managers: false
Root Rotation In Progress: false
Node Address: 10.0.2.15
Manager Addresses:
10.0.2.15:2377
Runtimes: crun runc
Default Runtime: crun
Init Binary: docker-init
containerd version: c7a4f874b3267c499484aae602d1257b12d69e40
runc version: N/A
init version: 3c53686 (expected: fec3683)
Kernel Version: 5.3.0-rc4-g6c2a8bd-dirty
Operating System: Debian GNU/Linux bullseye/sid
OSType: linux
Architecture: riscv64
CPUs: 6
Total Memory: 3.856GiB
Name: qemuriscv
ID: LWAK:QM3O:5VSI:VHOX:YBL5:3N6T:KOVA:IX5U:PR2A:77EG:K3VU:J6YK
Docker Root Dir: /var/lib/docker
Debug Mode: false
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Additional environment details (AWS, VirtualBox, physical, etc.):
Qemu Virtual Machine running Debian RootFS and Kernel 5.3-rc4.
I’m using crun (https://github.com/giuseppe/crun) as runtime since Runc depends on CGO which is not available on Risc-V.
My Kernel is built with this config: https://gist.github.com/carlosedp/4100815296675d126334f8f6afba4e93
Similar error to https://github.com/moby/moby/issues/38746.
About this issue
- Original URL
- State: closed
- Created 5 years ago
- Comments: 30 (13 by maintainers)
I’m not sure if this is related but I saw something that matches this description a couple of months ago. It was really mysterious, I did some debugging but didn’t really get to anything useful.
To me, it was not risc-v related, just qemu with user-net on a mac. I saw some net corruption from processes like git and docker. When I tried to trace what was going on it seemed that sometimes a packet was missing the last byte. Sounds unbelievable but not making this up. I could even repro it with
nc
in a very basic linux vm in the end iirc. When I traced the network traffic with qemu or tcpdump the missing byte was there. But it was not after syscall returned, or even in the strace/bpftrace output. I tried different versions of qemu as well. And I believe even different machines (cc @tiborvass)