netmiko: Timeouts are not directly configurable

In most of the code, timeouts are expressed as a number of iterations (not always configurable) whose duration depends on the delay factor. Wouldn’t it be easier to set timeouts in seconds (as a float)? Here is an example:

# Check if the only thing you received was a newline
count = 0
prompt = prompt.strip()
while count <= 10 and not prompt:
    prompt = self.read_channel().strip()
    if prompt:
        if self.ansi_escape_codes:
            prompt = self.strip_ansi_escape_codes(prompt).strip()
    else:
        self.write_channel(self.RETURN)
        time.sleep(delay_factor * .1)
    count += 1

Decreasing the delay_factor might make the code fail, because the timeout shrinks with it. I see the delay factor more as a polling interval: it should be configurable without affecting the timeout. Decreasing it would then only make the process use more CPU time and nothing else. One could even give a whole CPU to netmiko by setting it to 0.

Here is an example of a working replacement:

# The timeout should be given as an argument (not the case in my example obviously)
# it should be configurable globally from the connection handler
# 1.1 seconds approximates the original timeout: delay_factor defaults to 1,
# so each of the 11 iterations sleeps 0.1 s
timeout = 1.1
# Check if the only thing you received was a newline
prompt = prompt.strip()
start = time.time()
while time.time() - start < timeout and not prompt:
    prompt = self.read_channel().strip()
    if prompt:
        if self.ansi_escape_codes:
            prompt = self.strip_ansi_escape_codes(prompt).strip()
    else:
        self.write_channel(self.RETURN)
        time.sleep(delay_factor * .1)
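
The same idea can be sketched outside Netmiko as a standalone helper. This is only an illustration under stated assumptions, not Netmiko code: `wait_for_prompt`, `FakeChannel`, and the `poll_interval` parameter are hypothetical names, and `time.monotonic()` is swapped in for `time.time()` so that system clock adjustments cannot move the deadline.

```python
import time


class FakeChannel:
    """Hypothetical stand-in for a device channel; yields data after a delay."""

    def __init__(self, ready_after, data):
        self._ready_at = time.monotonic() + ready_after
        self._data = data
        self.writes = 0

    def read(self):
        # Return the data once the "device" is ready; otherwise return nothing.
        if time.monotonic() >= self._ready_at:
            data, self._data = self._data, ""
            return data
        return ""

    def write(self, text):
        self.writes += 1


def wait_for_prompt(channel, timeout=1.1, poll_interval=0.1):
    """Poll `channel` until it yields non-blank output or `timeout` seconds pass.

    `timeout` bounds the total wait; `poll_interval` only sets how often we poll.
    """
    deadline = time.monotonic() + timeout
    while True:
        prompt = channel.read().strip()
        if prompt:
            return prompt
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            return ""
        channel.write("\n")
        # Never sleep past the deadline.
        time.sleep(min(poll_interval, remaining))


# Same timeout, different polling intervals: both find the prompt.
fast = wait_for_prompt(FakeChannel(0.05, "router#\n"), timeout=1.0, poll_interval=0.01)
slow = wait_for_prompt(FakeChannel(0.05, "router#\n"), timeout=1.0, poll_interval=0.2)
# A device that answers too late: the helper gives up after the timeout.
missed = wait_for_prompt(FakeChannel(5.0, "router#\n"), timeout=0.2, poll_interval=0.05)
```

With this shape, shortening `poll_interval` costs more wake-ups but never shortens the time the caller is willing to wait.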

About this issue

State: closed
Created 6 years ago
Comments: 22 (13 by maintainers)

Most upvoted comments

@ktbyers way better, ~4s per action.

It would be nice to have an option to choose between the select and sleep methods, for example:

For stability choose the sleep method.
For performance choose the select method.

I know that is a lot of work to do but just saying 😃
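
A rough sketch of what such an option could look like, using a plain local socket pair instead of an SSH channel. Everything here is an assumption for illustration: `wait_readable` and its `method` argument are hypothetical names and are not part of Netmiko.

```python
import select
import socket
import threading
import time


def wait_readable(sock, timeout, method="select", poll_interval=0.1):
    """Return True once `sock` has data to read, False after `timeout` seconds."""
    if method == "select":
        # Block in the OS until data arrives or the timeout expires.
        readable, _, _ = select.select([sock], [], [], timeout)
        return bool(readable)
    if method == "sleep":
        # Fixed-interval polling: check, then sleep, until the deadline.
        deadline = time.monotonic() + timeout
        while True:
            # A zero-timeout select is used only as a non-blocking readiness check.
            readable, _, _ = select.select([sock], [], [], 0)
            if readable:
                return True
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            time.sleep(min(poll_interval, remaining))
    raise ValueError(f"unknown method: {method!r}")


def demo(method):
    """Send data 50 ms after starting to wait; report success and elapsed time."""
    a, b = socket.socketpair()
    try:
        timer = threading.Timer(0.05, b.sendall, args=(b"router#",))
        timer.start()
        start = time.monotonic()
        ok = wait_readable(a, timeout=2.0, method=method, poll_interval=0.2)
        elapsed = time.monotonic() - start
        timer.join()
        return ok, elapsed
    finally:
        a.close()
        b.close()


select_ok, select_elapsed = demo("select")
sleep_ok, sleep_elapsed = demo("sleep")
```

In this sketch the select variant returns as soon as the data arrives, while the sleep variant only notices it on its next poll. Both honour the same timeout.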

goldyfruit on Apr 11, 2018

Yes, understood, but Netmiko supports 44 diverse platforms and I only have access to about 7 to 10 of them. So real testing is hard (especially for fundamental-core changes). Mocking also doesn’t generally work since most of the issues are related to device interactions.

I will read through what you have proposed more thoroughly. I have only skimmed through it.

delay_factor and global_delay_factor are generally intended to make things go slower, not faster.

For major tasks, like send_command, the method will complete once the trailing prompt is detected (or the expect_string if specified). The loop delay there is .2 seconds so it is a pretty marginal difference per loop (and the network devices are generally pretty slow on the CLI).

To me 4 seconds is fine, i.e. I care much more about how many out of every 100 users will have it fail or work than about whether it takes 4 seconds or 600 ms.

If you are talking outside of core methods (like send_command and send_config_set), then going faster reliably is probably going to be difficult (note, I have tried to go faster in the past).

…that all being said, I am open to thinking about these core concepts and how they are implemented. I started to try to come up with a better integration in Netmiko 2 (but this is more for ease of use). Basically, in Netmiko 2 I started to move towards timeout controlling how many loops there are and the delay per loop (prior to this, timeout in Netmiko really did very little).

This was mostly to make it easier for end-users to adjust the settings when they run into devices that fail (i.e. they need to increase how long Netmiko sleeps).
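
One way to picture a timeout driving both the loop count and the per-loop delay is the sketch below. It is not Netmiko's actual implementation: `loop_budget` is a hypothetical helper, and its default `loop_delay` of 0.2 is simply borrowed from the loop delay mentioned above.

```python
import math


def loop_budget(timeout, loop_delay=0.2, delay_factor=1.0):
    """Derive (max_loops, sleep_per_loop) from a timeout given in seconds."""
    if timeout <= 0 or loop_delay <= 0 or delay_factor <= 0:
        raise ValueError("timeout, loop_delay and delay_factor must be positive")
    sleep_per_loop = loop_delay * delay_factor
    # Round up so the total sleep budget is never shorter than the timeout.
    max_loops = math.ceil(timeout / sleep_per_loop)
    return max_loops, sleep_per_loop


# An 8-second timeout with a 0.5 s loop delay.
loops, sleep_s = loop_budget(timeout=8, loop_delay=0.5)
# Doubling delay_factor doubles the sleep per loop and halves the loop count,
# so the overall budget still matches the timeout.
slow_loops, slow_sleep_s = loop_budget(timeout=8, loop_delay=0.5, delay_factor=2.0)
```

Under this arrangement a user who hits a slow device only needs to raise the timeout; the loop count follows from it.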

I am also probably open to allowing timeout/delay_factor/global_delay_factor (including with new concepts) to have things go faster…though the defaults will need to be conservative. Basically if someone wants to go faster and decrease their reliability, they can choose to do so.

ktbyers on Feb 13, 2018

I saw you have written an issue about unit tests (functional tests too?). I can always create a PoC, but I wouldn’t dare merge it without them.

In my opinion, the fact that the concepts of delay factors/polling intervals and timeouts are linked is misleading and error-prone: if you try to decrease the delay factor, you may break things (since the timeout decreases too).

At the very least, we should make max_loops and delay_factor configurable everywhere (so that anyone can change the timeout).
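
The coupling can be made concrete with a little arithmetic. The helper below is purely illustrative (`effective_timeout` and `base_sleep` are hypothetical names); it models a loop that sleeps `delay_factor * base_sleep` seconds on each of `max_loops` iterations, like the prompt-detection loop quoted at the top of this issue.

```python
import math


def effective_timeout(max_loops, delay_factor, base_sleep=0.1):
    """Upper bound, in seconds, on the time an iteration-counted loop spends sleeping."""
    return max_loops * delay_factor * base_sleep


# 11 iterations at the default delay_factor of 1: about 1.1 s in total.
default_wait = effective_timeout(max_loops=11, delay_factor=1.0)
# Halving delay_factor to poll faster silently halves the total wait as well.
halved_wait = effective_timeout(max_loops=11, delay_factor=0.5)
# With max_loops configurable, the caller can compensate and keep the same wait.
compensated_wait = effective_timeout(max_loops=22, delay_factor=0.5)

same_budget = math.isclose(default_wait, compensated_wait)
```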

Doing a small test (not using select, but adding a timeout with a 100 ms polling interval), I managed to cut the time to connect and execute ‘ls’ from 4 s to 600 ms, roughly a 7x speedup.

xavierhardy on Feb 13, 2018