amazon-ecs-agent: XFS driver hanging

I’m seeing periodic instability with Docker 1.9.1 and ECS agent 1.7.1 (official builds) using the standard storage options on Amazon Linux. I don’t have a reliable repro case yet, but I’ll see what I can do about that.

The problem manifests as Docker becoming unresponsive and unable to extract new images. dmesg shows the XFS subsystem hanging:

[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds.
[88440.207384]       Tainted: G            E   4.1.13-19.31.amzn1.x86_64 #1
[88440.210618] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[88440.214368] xfsaild/dm-14   D ffff8800b7873d18     0 12382      2 0x00000000
[88440.217840]  ffff8800b7873d18 ffffffff81a154c0 ffff880203ffb300 ffff8800e9ec6928
[88440.221420]  ffff8800b7874000 ffff880203ffb300 0000000000000000 ffff8800e9ec6800
[88440.224979]  ffff8800b8544800 ffff8800b7873d38 ffffffff814dd6a7 ffff8800e9ec6928
[88440.228631] Call Trace:
[88440.229754]  [<ffffffff814dd6a7>] schedule+0x37/0x90
[88440.232038]  [<ffffffffa0481c11>] _xfs_log_force+0x171/0x270 [xfs]
[88440.234938]  [<ffffffff81094bb0>] ? wake_up_state+0x20/0x20
[88440.237534]  [<ffffffffa0481d3a>] xfs_log_force+0x2a/0x90 [xfs]
[88440.240179]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.243351]  [<ffffffffa048c52b>] xfsaild+0x13b/0x5a0 [xfs]
[88440.245772]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.248848]  [<ffffffffa048c3f0>] ? xfs_trans_ail_cursor_first+0x90/0x90 [xfs]
[88440.251887]  [<ffffffff81087349>] kthread+0xc9/0xe0
[88440.253985]  [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88440.257390]  [<ffffffff814e1aa2>] ret_from_fork+0x42/0x70
[88440.259879]  [<ffffffff81087280>] ? kthread_create_on_node+0x180/0x180
[88911.223796] device-mapper: thin: Data device (dm-1) discard unsupported: Disabling discard passdown.
[88911.229093] device-mapper: thin: 253:2: growing the data device from 44800 to 53760 blocks
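
For anyone trying to catch this on their own instances, here is a minimal sketch that scans dmesg text for the hung-task warning shown above. The regex is derived only from the line format in this log, and the helper name `find_hung_tasks` is hypothetical, so treat it as a starting point rather than a complete detector.

```python
import re

# Pattern based on the hung-task line in the log above, e.g.
# "[88440.204121] INFO: task xfsaild/dm-14:12382 blocked for more than 120 seconds."
# Assumption: other kernels may format this message differently.
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<task>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def find_hung_tasks(dmesg_text):
    """Return (timestamp, task name, pid, seconds) for each hung-task warning."""
    hits = []
    for line in dmesg_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            hits.append(
                (float(m["ts"]), m["task"], int(m["pid"]), int(m["secs"]))
            )
    return hits


if __name__ == "__main__":
    sample = (
        "[88440.204121] INFO: task xfsaild/dm-14:12382 "
        "blocked for more than 120 seconds."
    )
    for ts, task, pid, secs in find_hung_tasks(sample):
        print(f"{ts}: {task} (pid {pid}) blocked > {secs}s")
```

Feeding it the output of `dmesg` on an affected host should list each blocked task, which makes it easier to tell whether the hang is always in `xfsaild` or also shows up in other tasks.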

docker info:

Containers: 11
Images: 333
Server Version: 1.9.1
Storage Driver: devicemapper
 Pool Name: docker-docker--pool
 Pool Blocksize: 524.3 kB
 Base Device Size: 107.4 GB
 Backing Filesystem: xfs
 Data file: 
 Metadata file: 
 Data Space Used: 16.69 GB
 Data Space Total: 28.19 GB
 Data Space Available: 11.5 GB
 Metadata Space Used: 8.016 MB
 Metadata Space Total: 25.17 MB
 Metadata Space Available: 17.15 MB
 Udev Sync Supported: true
 Deferred Removal Enabled: true
 Deferred Deletion Enabled: true
 Deferred Deleted Device Count: 0
 Library Version: 1.02.93-RHEL7 (2015-01-28)
Execution Driver: native-0.2
Logging Driver: json-file
Kernel Version: 4.1.13-19.31.amzn1.x86_64
Operating System: Amazon Linux AMI 2015.09
CPUs: 2
Total Memory: 7.8 GiB
Name: ip-172-31-13-236
ID: QKK2:EU7A:YOOB:LJ47:D62L:TXTR:QENQ:6NQA:RLBJ:3VXM:7KXO:JGRC
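
Since the dmesg output shows the thin pool growing right after the hang, it may be worth tracking pool usage alongside it. The sketch below computes data and metadata usage ratios from `docker info` text in the format pasted above. The helper names `parse_size` and `pool_usage` are hypothetical, and the unit table assumes decimal units (kB, MB, GB), which is how the values appear in this output.

```python
# Assumption: sizes are printed as "<number> <unit>" with decimal units,
# as in the docker info output above.
UNITS = {"B": 1.0, "kB": 1e3, "MB": 1e6, "GB": 1e9, "TB": 1e12}


def parse_size(text):
    """Convert a string such as '16.69 GB' to a byte count."""
    value, unit = text.split()
    return float(value) * UNITS[unit]


def pool_usage(docker_info_text):
    """Return the used/total ratio for the Data and Metadata pools."""
    fields = {}
    for line in docker_info_text.splitlines():
        key, sep, value = line.strip().partition(": ")
        if sep:  # skips lines with no value, such as "Data file:"
            fields[key] = value.strip()
    usage = {}
    for kind in ("Data", "Metadata"):
        used = parse_size(fields[f"{kind} Space Used"])
        total = parse_size(fields[f"{kind} Space Total"])
        usage[kind] = used / total
    return usage


if __name__ == "__main__":
    sample = """\
 Data file:
 Data Space Used: 16.69 GB
 Data Space Total: 28.19 GB
 Metadata Space Used: 8.016 MB
 Metadata Space Total: 25.17 MB
"""
    for kind, ratio in pool_usage(sample).items():
        print(f"{kind} pool: {ratio:.1%} used")
```

With the numbers from this report, the data pool is a little under 60% used and the metadata pool a little under a third, so neither pool was close to full when this output was captured.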

About this issue

  • State: closed
  • Created 8 years ago
  • Reactions: 1
  • Comments: 33 (15 by maintainers)

Most upvoted comments

@abby-fuller Thanks for reporting as well. With the issues we seem to have with XFS, we’re likely looking to move back to ext4 with our next AMI.