amazon-ecs-agent: XFS driver hanging
I’m seeing periodic instability with Docker 1.9.1 and ECS 1.7.1 (official builds) using the standard storage options on Amazon Linux. I don’t have a reliable repro case yet, but I’ll see what I can do about that.
The problem manifests as Docker becoming unresponsive and unable to extract new Docker images. dmesg shows the XFS subsystem hanging:
[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds.
[88440.207384] Tainted: G E 4.1.13-19.31.amzn1.x86_64 #1
[88440.210618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88440.214368] xfsaild/dm-14 D ffff8800b7873d18 0 12382 2 0x00000000
[88440.217840] ffff8800b7873d18 ffffffff81a154c0 ffff880203ffb300 ffff8800e9ec6928
[88440.221420] ffff8800b7874000 ffff880203ffb300 0000000000000000 ffff8800e9ec6800
[88440.224979] ffff8800b8544800 ffff8800b7873d38 ffffffff814dd6a7 ffff8800e9ec6928
[88440.228631] Call Trace:
[88440.229754] [<ffffffff814dd6a7>] schedule+0x37/0x90
[88440.232038] [<ffffffffa0481c11>] _xfs_log_force+0x171/0x270 [xfs]
[88440.234938] [<ffffffff81094bb0>] ? wake_up_state+0x20/0x20
[88440.237534] [<ffffffffa0481d3a>] xfs_log_force+0x2a/0x90 [xfs]
[88440.240179] [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.243351] [<ffffffffa048c52b>] xfsaild+0x13b/0x5a0 [xfs]
[88440.245772] [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.248848] [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.251887] [<ffffffff81087349>] kthread+0xc9/0xe0
[88440.253985] [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88440.257390] [<ffffffff814e1aa2>] ret_from_fork+0x42/0x70
[88440.259879] [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88911.223796] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown.
[88911.229093] device-mapper: thin: 253:2: growing the data device from 44800 to 53760 blocks
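For anyone trying to catch this on their own instances, hung-task warnings like the one above can be pulled out of saved dmesg output with a short script. This is only a rough sketch: the function name `find_hung_tasks` and the regular expression are illustrative assumptions based on the message format shown in this log, not part of Docker or the ECS agent.

```python
import re

# Matches the kernel hung-task warning as it appears in the log above, e.g.
# "[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds."
# The pattern is an assumption derived from that single sample line.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(dmesg_text):
    """Return (timestamp, task name, pid, blocked seconds) for each warning."""
    hits = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hits.append(
                (float(m["ts"]), m["name"], int(m["pid"]), int(m["secs"]))
            )
    return hits


sample = (
    "[88440.204121] INFO: task xfsaild/dm-14:12382 "
    "blocked for more than 120 seconds.\n"
    "[88440.228631] Call Trace:"
)

for ts, name, pid, secs in find_hung_tasks(sample):
    print(f"{ts}: {name} (pid {pid}) blocked for more than {secs}s")
```

Feeding it the text of `dmesg` saved to a file should list every task the kernel flagged, which may help correlate hangs with image pulls.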
docker info:
Containers: 11
Images: 333
Server Version: 1.9.1
Storage Driver: devicemapper
Pool Name: docker-docker--pool
Pool Blocksize: 524.3 kB
Base Device Size: 107.4 GB
Backing Filesystem: xfs
Data file:
Metadata file:
Data Space Used: 16.69 GB
Data Space Total: 28.19 GB
Data Space Available: 11.5 GB
Metadata Space Used: 8.016 MB
Metadata Space Total: 25.17 MB
Metadata Space Available: 17.15 MB
Udev Sync Supported: true
Deferred Removal Enabled: true
Deferred Deletion Enabled: true
Deferred Deleted Device Count: 0
Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-19.31.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 2
Total Memory: 7.8 GiB
Name: ip-172-31-13-236
ID: QKK2:EU7A:YOOB:LJ47:D62L:TXTR:QENQ:6NQA:RLBJ:3VXM:7KXO:JGRC
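Since the kernel log shows the thin pool's data device being grown, it may be worth tracking how full the pool is when a hang occurs. The sketch below computes data and metadata usage from `docker info` text in the format shown above. The helper names `parse_size` and `pool_usage` are hypothetical, and the unit table assumes the decimal units (kB, MB, GB) seen in this output.

```python
# Decimal unit multipliers, assumed from the "kB"/"MB"/"GB" suffixes above.
UNITS = {"B": 1, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '16.69 GB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def pool_usage(info_text):
    """Return the used fraction of data and metadata space in the thin pool."""
    fields = {}
    for line in info_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            fields[key.strip()] = value.strip()
    usage = {}
    for kind in ("Data", "Metadata"):
        used = parse_size(fields[f"{kind} Space Used"])
        total = parse_size(fields[f"{kind} Space Total"])
        usage[kind] = used / total
    return usage


sample_info = """\
Data Space Used: 16.69 GB
Data Space Total: 28.19 GB
Metadata Space Used: 8.016 MB
Metadata Space Total: 25.17 MB
"""

for kind, fraction in pool_usage(sample_info).items():
    print(f"{kind} space used: {fraction:.1%}")
```

With the figures reported here, the pool is a little under 60% full for data and about a third full for metadata, so this alone does not explain the hang.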
About this issue
- State: closed
- Created 8 years ago
- Reactions: 1
- Comments: 33 (15 by maintainers)
@abby-fuller Thanks for reporting as well. With the issues we seem to have with XFS, we’re likely looking to move back to ext4 with our next AMI.