ganeti: Hail fails with FailN1 for an EXT instance allocation in 2.16

Hey,

A strange issue caught our attention while trying to upgrade snf-ganeti to 2.16: hail fails to allocate an instance with the EXT disk template. This used to work in 2.15.

The output:

➜ ok0-mc1.dev /var/log/ganeti  # gnt-instance add -I hail -o snf-image+jessie -B memory=2G,vcpus=4 -t ext --disk=0:size=10G,provider=rbd,name=alexvol --net 0:network=snf-net-1,ip=pool --no-name-check --no-ip-check test                                           
Failure: prerequisites not met for this operation:
error type: insufficient_resources, error details:
Can't compute nodes using iallocator 'hail': Request failed: Group default (preferred): No valid allocation solutions, failure reasons: FailN1: 3

while with -n (manually specifying the node) everything works as expected:

➜ ok0-mc1.dev /var/log/ganeti  # gnt-instance add -n ok0-00.dev.okeanos.grnet.gr -o snf-image+jessie -B memory=2G,vcpus=4 -t ext --disk=0:size=10G,provider=rbd,name=alexvol --net 0:network=snf-net-1,ip=pool --no-name-check --no-ip-check test
Thu Jun  8 11:50:54 2017  - INFO: NIC/0 inherits netparams ['snf-link-1', 'routed', u'']
Thu Jun  8 11:50:54 2017  - INFO: Chose IP <ip> from network snf-net-1
Thu Jun  8 11:50:55 2017 * disk 0, size 10.0G
Thu Jun  8 11:50:55 2017 * creating instance disks...
Thu Jun  8 11:50:56 2017 adding instance test to cluster config
Thu Jun  8 11:50:56 2017 adding disks to cluster config
Thu Jun  8 11:50:56 2017  - INFO: Waiting for instance test to sync disks
Thu Jun  8 11:50:57 2017  - INFO: Instance test's disks are in sync
Thu Jun  8 11:50:57 2017  - INFO: Waiting for instance test to sync disks
Thu Jun  8 11:50:57 2017  - INFO: Instance test's disks are in sync
Thu Jun  8 11:50:57 2017 * running the instance OS create scripts...
Thu Jun  8 11:51:53 2017 * starting instance...

Likewise, allocating with DRBD disks via hail works as expected:

➜ ok0-mc1.dev /var/log/ganeti  # gnt-instance add -I hail -o snf-image+jessie -B memory=2G,vcpus=4 -t drbd --disk=0:size=10G --net 0:network=snf-net-1,ip=pool --no-name-check --no-ip-check test
Thu Jun  8 14:15:51 2017  - INFO: Selected nodes for instance test via iallocator hail: ok0-01.dev.okeanos.grnet.gr, ok0-00.dev.okeanos.grnet.gr
Thu Jun  8 14:15:51 2017  - INFO: NIC/0 inherits netparams ['snf-link-1', 'routed', u'']
Thu Jun  8 14:15:51 2017  - INFO: Chose IP <ip> from network snf-net-1
Thu Jun  8 14:15:53 2017 * creating instance disks...

So, I tried to debug this with gnt-debug (thanks @apoikos!):

root@ok0-mc1:~# gnt-debug allocator --dir in --mode allocate --mem 2G --disks 1G -t ext -o no_such_os no_such_instance > h-alloc-ext.json
root@ok0-mc1:~# /usr/lib/ganeti/iallocators/hail -v -p h-alloc-ext.json 
Received request: Allocate (Instance {name = "no_such_instance", alias = "no_such_instance", mem = 2048, dsk = 1024, disks = [Disk {dskSize = 1024, dskSpindles = Nothing}], vcpus = 1, runSt = Running, pNode = 0, sNode = 0, idx = -1, util = DynUtil {cpuWeight = 1.0, memWeight = 1.0, dskWeight = 1.0, netWeight = 1.0}, movable = True, autoBalance = True, diskTemplate = DTExt, spindleUse = 1, allTags = [], exclTags = [], dsrdLocTags = fromList [], locationScore = 0, arPolicy = ArNotEnabled, nics = [Nic {mac = Just "00:11:22:33:44:55", ip = Nothing, mode = Nothing, link = Nothing, bridge = Nothing, network = Nothing}], forthcoming = False}) (AllocDetails 1 Nothing) Nothing

Initial cluster status:
 F Name                          t_mem n_mem i_mem x_mem  f_mem  u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt    p_fmem p_fdsk r_cpu   lCpu   lMem   lDsk   lNet
   ok0-02.dev.okeanos.grnet.gr  193809  4096  3072 -5372 192013 186641  7168     0     0   24   47   16    7    0.9630 1.0000  1.96 16.000 16.000 23.000 16.000
 - ok0-mc1.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
 - ok0-mc2.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
   ok0-00.dev.okeanos.grnet.gr  193809  4096  5120 -7049 191642 184593  4096     0     0   24   49   17    5    0.9524 1.0000  2.04 17.000 17.000 22.000 17.000
 - ok0-mc0.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
   ok0-01.dev.okeanos.grnet.gr  193809  4096  4096 -6023 191640 185617  8192     0     0   24   48   17    6    0.9577 1.0000  2.00 17.000 17.000 23.000 17.000

{"success":false,"info":"Request failed: Group default (preferred): No valid allocation solutions, failure reasons: FailN1: 3","result":[]}

Final cluster status:
 F Name                          t_mem n_mem i_mem x_mem  f_mem  u_mem r_mem t_dsk f_dsk pcpu vcpu pcnt scnt    p_fmem p_fdsk r_cpu   lCpu   lMem   lDsk   lNet
   ok0-02.dev.okeanos.grnet.gr  193809  4096  3072 -5372 192013 186641  7168     0     0   24   47   16    7    0.9630 1.0000  1.96 16.000 16.000 23.000 16.000
 - ok0-mc1.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
 - ok0-mc2.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
   ok0-00.dev.okeanos.grnet.gr  193809  4096  5120 -7049 191642 184593  4096     0     0   24   49   17    5    0.9524 1.0000  2.04 17.000 17.000 22.000 17.000
 - ok0-mc0.dev.okeanos.grnet.gr      0  4096     0 -4096      0  -4096     0     0     0    0    0    0    0 -Infinity 1.0000   NaN  0.000  0.000  0.000  0.000
   ok0-01.dev.okeanos.grnet.gr  193809  4096  4096 -6023 191640 185617  8192     0     0   24   48   17    6    0.9577 1.0000  2.00 17.000 17.000 23.000 17.000

While with DRBD:

root@ok0-mc1:~# gnt-debug allocator --dir in --mode allocate --mem 2G --disks 10G -t drbd -o no_such_os no_such_instance > h-alloc-drbd.json
root@ok0-mc1:~# /usr/lib/ganeti/iallocators/hail h-alloc-drbd.json 
{"success":true,"info":"Request successful: Selected group: default, Group default (preferred): score: 0.57086361, successes 6, failures 0 () for node(s) ok0-02.dev.okeanos.grnet.gr/ok0-00.dev.okeanos.grnet.gr","result":["ok0-02.dev.okeanos.grnet.gr","ok0-00.dev.okeanos.grnet.gr"]}

(also, hail works as expected when given the --no-capacity-checks option)
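
For example, with an invocation along these lines (the exact invocation is assumed, mirroring the run above):

/usr/lib/ganeti/iallocators/hail --no-capacity-checks -v -p h-alloc-ext.json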

While minimizing the input of the gnt-debug allocator command, I noticed that the problem appears when there is an existing DRBD instance in the cluster and we try to allocate an EXT instance. A minimal input file for hail that triggers the error is here. It makes hail fail with FailN1 in 2.16, while it works fine in 2.15!

I tried to track this down, but I’m not sure I’ve got it entirely straight! Let me give it a shot:

  • When trying to allocate an EXT instance, the cluster status reports total and free disk space (t_dsk/f_dsk) as 0, since EXT resources are not “accountable”
  • hail tests whether the resulting allocation is globally N+1 redundant by invoking canEvacuateNode (in src/Ganeti/HTools/GlobalN1.hs)
  • canEvacuateNode tries to fail over the node’s DRBD instances by invoking move (from src/Ganeti/HTools/Cluster/Moves.hs) with Failover
  • move invokes applyMoveEx (same file) with force = True and Failover
  • applyMoveEx calls addPriEx, which computes new_dsk_forth as decIf uses_disk (fDskForth t) (Instance.dsk inst), where uses_disk = Instance.usesLocalStorage inst and localStorageTemplates = [ T.DTDrbd8, T.DTPlain ], so uses_disk is True for the DRBD instance
  • Since fDskForth is 0 here, this yields new_dsk_forth <= 0, which raises Bad T.FailDisk (see the sketch after this list)
  • The FailDisk then surfaces as a FailN1 because of how failed monadic computations are counted in collectionToSolution (in src/Ganeti/HTools/Cluster/AllocationSolution.hs)
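
To make the failing step concrete, here is a minimal, self-contained sketch of that path (heavily simplified: the real addPriEx in src/Ganeti/HTools/Node.hs takes more arguments and also checks memory, CPU and spindles; decIf mirrors the original definition, the rest of the types are trimmed down for illustration):

-- Simplified stand-ins for the real htools types.
data FailMode = FailMem | FailDisk | FailN1
  deriving Show

type OpResult = Either FailMode

-- decIf, as in src/Ganeti/HTools/Node.hs: only subtract when the
-- instance actually consumes the resource.
decIf :: Num a => Bool -> a -> a -> a
decIf True  base delta = base - delta
decIf False base _     = base

data Node = Node { fDsk :: Int, fDskForth :: Int }
  deriving Show

-- In the real code, usesLocalStorage checks the disk template against
-- localStorageTemplates = [ T.DTDrbd8, T.DTPlain ].
data Instance = Instance { dsk :: Int, usesLocalStorage :: Bool }

-- Trimmed-down addPriEx: only the disk bookkeeping that matters here.
addPriEx :: Node -> Instance -> OpResult Node
addPriEx t inst =
  let uses_disk     = usesLocalStorage inst
      new_dsk       = decIf uses_disk (fDsk t)      (dsk inst)
      new_dsk_forth = decIf uses_disk (fDskForth t) (dsk inst)
  in if uses_disk && (new_dsk <= 0 || new_dsk_forth <= 0)
       then Left FailDisk  -- later counted as a FailN1 by the caller
       else Right t { fDsk = new_dsk, fDskForth = new_dsk_forth }

main :: IO ()
main = do
  -- The node as reported for the EXT request: t_dsk/f_dsk = 0.
  let node     = Node { fDsk = 0, fDskForth = 0 }
      -- The existing DRBD instance that canEvacuateNode tries to fail over.
      drbdInst = Instance { dsk = 10240, usesLocalStorage = True }
  -- 0 - 10240 <= 0, so the failover is rejected with FailDisk.
  print (addPriEx node drbdInst)  -- prints: Left FailDisk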

This might be triggered by @aehlig’s commit, which enables the global N+1 redundancy check in tryAlloc:

commit 02132b44fd7471c02891f11ce41971d301227b70
Author: Klaus Aehlig <aehlig@google.com>
Date:   Fri Apr 17 14:54:28 2015 +0200

    Make tryAlloc honor global N+1 redundancy

    When looking for an allocation, make htools restrict to
    those that are globally N+1 redundant. As checking for
    N+1 redundancy is an expensive operation, we first
    look for the best allocations and filter out later.

    For the time being, we do not change the semantics of
    iterateAlloc; i.e., for iterateAlloc we will pretend
    that capacity checks are ignored.

    Signed-off-by: Klaus Aehlig <aehlig@google.com>
    Reviewed-by: Petr Pudlak <pudlak@google.com>
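
In other words, the allocation path conceptually became something like the following (a hedged sketch with made-up, simplified types, not the actual htools code; lower scores are better in htools):

import Data.List (sortOn)

data Solution  = Solution  { score :: Double, nodes :: [NodeState] }
data NodeState = NodeState { nName :: String, evacuable :: Bool }

-- Global N+1 redundancy: every node's instances could be restarted
-- elsewhere if that node failed (cf. canEvacuateNode in GlobalN1.hs).
globallyN1Redundant :: Solution -> Bool
globallyN1Redundant = all evacuable . nodes

-- tryAlloc, conceptually: rank candidates by score first (the
-- redundancy check is expensive), then drop non-redundant ones.
pickAllocation :: [Solution] -> Maybe Solution
pickAllocation = safeHead . filter globallyN1Redundant . sortOn score
  where safeHead []      = Nothing   -- "No valid allocation solutions"
        safeHead (s : _) = Just s

main :: IO ()
main = do
  let best   = Solution 0.1 [NodeState "n1" True, NodeState "n2" False]
      second = Solution 0.5 [NodeState "n1" True, NodeState "n2" True]
  -- The best-scoring solution is not N+1 redundant, so it is filtered
  -- out and the second-best is returned: Just 0.5
  print (fmap score (pickAllocation [best, second]))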

Then again, the above analysis might not make any sense! 😃 Anyway, we think this is a rather serious issue that might even be a blocker for the next release…

Let me know if I can help somehow!

On a side note, it seems a bit strange to me that n_mem is different in 2.15 and 2.16, but I’d say that this has nothing to do with the reported issue.

Most upvoted comments

Hey fellas,

any update on this issue? Is there a fix coming?

As a workaround, I’ve added the --no-capacity-checks option to the default-iallocator-params cluster configuration option: gnt-cluster modify --default-iallocator-params='--no-capacity-checks'

It seems to work, bypassing the new global N+1 checks.

From the documentation:

--no-capacity-checks
 Normally, hspace will only consider those allocations where all instances
 of a node can immediately be restarted should that node fail. With this
 option given, hspace will check only N+1 redundancy for DRBD instances.

From the NEWS:

- ``htools`` now also take into account N+1 redundancy for plain and shared
 storage. To obtain the old behavior, add the ``--no-capacity-checks`` option.