amazon-ecs-agent: XFS driver hanging

I’m seeing periodic instability with Docker 1.9.1 and the ECS agent 1.7.1 (official builds) using the standard storage options on Amazon Linux. I don’t have a reliable repro case yet, but I’ll try to put one together.

The problem manifests as Docker becoming unresponsive and unable to extract new images. The dmesg output shows the XFS subsystem hanging:

[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds.
[88440.207384]       Tainted: G            E   4.1.13-19.31.amzn1.x86_64 #1
[88440.210618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88440.214368] xfsaild/dm-14   D ffff8800b7873d18     0 12382      2 0x00000000
[88440.217840]  ffff8800b7873d18 ffffffff81a154c0 ffff880203ffb300 ffff8800e9ec6928
[88440.221420]  ffff8800b7874000 ffff880203ffb300 0000000000000000 ffff8800e9ec6800
[88440.224979]  ffff8800b8544800 ffff8800b7873d38 ffffffff814dd6a7 ffff8800e9ec6928
[88440.228631] Call Trace:
[88440.229754]  [<ffffffff814dd6a7>] schedule+0x37/0x90
[88440.232038]  [<ffffffffa0481c11>] _xfs_log_force+0x171/0x270 [xfs]
[88440.234938]  [<ffffffff81094bb0>] ? wake_up_state+0x20/0x20
[88440.237534]  [<ffffffffa0481d3a>] xfs_log_force+0x2a/0x90 [xfs]
[88440.240179]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.243351]  [<ffffffffa048c52b>] xfsaild+0x13b/0x5a0 [xfs]
[88440.245772]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.248848]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.251887]  [<ffffffff81087349>] kthread+0xc9/0xe0
[88440.253985]  [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88440.257390]  [<ffffffff814e1aa2>] ret_from_fork+0x42/0x70
[88440.259879]  [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88911.223796] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown.
[88911.229093] device-mapper: thin: 253:2: growing the data device from 44800 to 53760 blocks
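
For anyone trying to catch this on their own hosts before a reliable repro exists, the hung-task warning in the log above is easy to pick out mechanically. The following is only a rough sketch: it assumes the message format matches the line quoted above, and the names `HUNG_TASK_RE` and `find_hung_tasks` are made up for illustration. It reads text you have already captured (for example from `dmesg`) and does not call anything on the host itself.

```python
import re

# Matches the kernel's hung-task warning as it appears in the log above, e.g.
# "[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds."
# Assumption: other kernel versions may word this message differently.
HUNG_TASK_RE = re.compile(
    r"^\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(dmesg_text):
    """Return one dict per hung-task warning found in captured dmesg text."""
    hits = []
    for line in dmesg_text.splitlines():
        match = HUNG_TASK_RE.match(line.strip())
        if match:
            hits.append({
                "timestamp": float(match.group("ts")),
                "task": match.group("name"),
                "pid": int(match.group("pid")),
                "blocked_secs": int(match.group("secs")),
            })
    return hits


# The first line of the trace above, used as sample input.
sample = (
    "[88440.204121] INFO: task xfsaild/dm-14:12382 "
    "blocked for more than 120 seconds.\n"
    "[88440.228631] Call Trace:"
)

for hit in find_hung_tasks(sample):
    print(hit["task"], hit["pid"], hit["blocked_secs"])
```

Filtering the results on task names that start with `xfsaild` would narrow this to the XFS case reported here; any other blocked task would show up in the same list.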

docker info:

Containers: 11
Images: 333
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 16.69 GB
 Data Space Total: 28.19 GB
 Data Space Available: 11.5 GB
 Metadata Space Used: 8.016 MB
 Metadata Space Total: 25.17 MB
 Metadata Space Available: 17.15 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-19.31.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 2
Total Memory: 7.8 GiB
Name: ip-172-31-13-236
ID: QKK2:EU7A:YOOB:LJ47:D62L:TXTR:QENQ:6NQA:RLBJ:3VXM:7KXO:JGRC
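
Since the last dmesg line shows the thin pool's data device being grown, it may be worth tracking how full the pool is when a hang occurs. Whether pool usage has anything to do with the hang is not established here. The sketch below computes data and metadata usage from `docker info` text, assuming the field names and decimal units shown in the output above (Docker 1.9.1 with devicemapper); `parse_size` and `pool_usage` are illustrative names, not part of any Docker tooling.

```python
# Decimal multipliers for the unit suffixes seen in the output above.
# Assumption: sizes are always printed as "<number> <unit>".
UNITS = {"B": 1, "kB": 10**3, "MB": 10**6, "GB": 10**9, "TB": 10**12}


def parse_size(text):
    """Convert a string such as '16.69 GB' to a number of bytes."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def pool_usage(docker_info_text):
    """Return the used/total fraction for the data and metadata space."""
    fields = {}
    for line in docker_info_text.splitlines():
        # Split on the first colon only, so values containing colons survive.
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key.strip()] = value.strip()

    usage = {}
    for kind in ("Data", "Metadata"):
        used = parse_size(fields[kind + " Space Used"])
        total = parse_size(fields[kind + " Space Total"])
        usage[kind.lower()] = used / total
    return usage


# The relevant lines from the docker info output above.
sample_info = """\
Storage Driver: devicemapper
 Data Space Used: 16.69 GB
 Data Space Total: 28.19 GB
 Metadata Space Used: 8.016 MB
 Metadata Space Total: 25.17 MB
"""

for name, fraction in pool_usage(sample_info).items():
    print(f"{name}: {fraction:.1%} used")
```

Logging these two fractions alongside the hung-task timestamps would show whether the hangs cluster around pool growth, which is only a guess at this point.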

About this issue

State: closed
Created 8 years ago
Reactions: 1
Comments: 33 (15 by maintainers)

Most upvoted comments

@abby-fuller Thanks for reporting as well. With the issues we seem to have with XFS, we’re likely looking to move back to ext4 with our next AMI.

samuelkarp on Mar 11, 2016