zsh-async: workers getting killed in 1.8.0+

I recently updated from 1.7.2 and found that https://github.com/mafredri/zsh-async/commit/361dc171e65c82f57ad814ebecea91c98a6d4b68 caused a regression in my setup.

I use zsh-async to update my prompt with git info. Here’s my implementation with only the relevant parts:

update_prompt() {
    cd $1
    rc=$2
    ~/.prompt $rc 0
}

refresh_prompt() {
    local output=$3
    local next_is_ready=$6

    # If there are multiple refreshes in flight then only use the latest one,
    # therefore we can ignore this output if the next is ready
    if [[ $next_is_ready == 1 ]]; then
        return
    fi

    PROMPT="$(echo $output)"
    zle reset-prompt
}

async_start_worker      gitprompt -n
async_register_callback gitprompt refresh_prompt

prompt_precmd() {
    rc=$?

    # Set initial prompt without scm info
    PROMPT=$(echo "$(~/.prompt zsh $rc 1)")

    async_flush_jobs gitprompt
    async_job gitprompt update_prompt "$(pwd)" "$rc"
}

add-zsh-hook precmd prompt_precmd

When I refresh my prompt rapidly (pressing Enter several times in quick succession), my gitprompt worker gets killed. I also get a zle error:

cem-dev:lewrus01:fancy-prompt[master+2+1]
❯

cem-dev:lewrus01:fancy-prompt[master+2+1]
❯
refresh_prompt:zle:12: widgets can only be called when ZLE is active

async_job: no such async worker: gitprompt

cem-dev:lewrus01:fancy-prompt[...]
❯
async_job: no such async worker: gitprompt

I got neither of these problems on 1.7.2. I guess the problem is something to do with using the zle watcher?

Any help on this would be greatly appreciated.
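
The zle error in the transcript above is raised by the `zle reset-prompt` call in refresh_prompt when the callback runs while the line editor is not active. Below is a minimal sketch of one way to avoid that message. It relies only on zsh's own zle builtin, which, when called with no arguments, reports through its exit status whether widgets can be invoked. The helper name safe_reset_prompt is hypothetical, and this only suppresses the error; it does not address the worker being killed.

```shell
# Hypothetical helper (not part of zsh-async): redraw the prompt only when
# ZLE is active. `zle` with no arguments returns 0 if widgets can be called
# and non-zero otherwise, so the redraw is skipped instead of failing.
safe_reset_prompt() {
    if zle; then
        zle reset-prompt
    fi
}
```

In refresh_prompt, the bare `zle reset-prompt` line could then be replaced with a call to safe_reset_prompt.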

About this issue

State: closed
Created 4 years ago
Comments: 15 (5 by maintainers)

Commits related to this issue

Work around the dead worker bug https://github.com/mafredri/zsh-async/issues/42 — committed to chorn/chorn-zsh-prompt by chorn 3 years ago
fix(dotfiles): handle more zsh-async error codes Trying to fix occasional: widgets can only be called when ZLE is active Following the example of: - https://github.com/sindresorhus/pure/blob/d... — committed to wincent/wincent by wincent 9 months ago

Most upvoted comments

@howardjohn Thanks for testing, and too bad about the workaround. If it’s any help, here’s how we set the worker restart up in Pure: https://github.com/sindresorhus/pure/blob/dfc8062c64df8821eaec7d741c75f3cee20d37e3/pure.zsh#L478-L495

mafredri on Oct 28, 2020

@reobin Thanks! That is essentially what I have been doing manually. I ran it for a few hours and it seems great.

I use it for my prompt, so it gets a decent number of jobs (every time I hit enter), but it shouldn’t be more than a couple per second.

I also haven’t reproduced it consistently, so it’s hard to quickly test out changes, but I can throw them in my shell for a while and see what happens.

$ zsh --version
zsh 5.8 (x86_64-debian-linux-gnu)

howardjohn on Oct 26, 2020

As expected, I had the workaround messed up, a simple typo 🤦‍♀️ . I verified the workaround does work, and added some logging when it occurs, so I can see it has transparently happened a couple of times.

Unfortunately, #49 did not seem to help much during my testing.

howardjohn on Nov 2, 2020

@howardjohn I’m working on some improvements, but I can’t say for sure if they will help.

First off, how are you using zsh-async? I’ve never been able to reproduce constant worker death but I know some scenarios that can cause it. For instance, sending hundreds of jobs to the worker in quick succession.

Edit: And what version of zsh are you using?

It’d be great if you could try out #45, then maybe #49. And finally there’s a pretty huge rewrite going on in the (very WIP) test-rebased branch (based on the mentioned PRs). It’s possibly the best bet at fixing worker death but will require a lot more testing and fine tuning.

@reobin it’s not ideal, but indeed the best solution for the current master branch, thanks for suggesting it!

mafredri on Oct 26, 2020

@howardjohn If it’s any help, the only thing I was able to do was reinitialize the workers when they die.

The second argument that is given to the callback function is the return code.

Docs on all the return codes:

1 Corrupt worker output.
2 ZLE watcher detected an error on the worker fd.
3 Response from async_job when worker is missing.
130 Async worker crashed. This should not happen, but it can mean the file descriptor has become corrupt. This must be followed by an async_stop_worker [name], and then the worker and tasks should be restarted. It is unknown why this happens.
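
The list above can be turned into a small lookup, which keeps the restart decision in a callback readable. This is only a sketch: describe_async_code and worker_needs_restart are hypothetical helper names, not part of zsh-async, and treating 2, 3 and 130 as the codes that call for a restart follows the check used in the tw_prompt_callback example.

```shell
# Hypothetical helper: print the documented meaning of a callback return code.
describe_async_code() {
    case $1 in
        1)   echo "Corrupt worker output." ;;
        2)   echo "ZLE watcher detected an error on the worker fd." ;;
        3)   echo "Response from async_job when worker is missing." ;;
        130) echo "Async worker crashed." ;;
        *)   echo "Not one of the listed error codes." ;;
    esac
}

# Hypothetical helper: succeed only for the codes that the callback example
# in this thread handles by stopping and restarting the worker.
worker_needs_restart() {
    case $1 in
        2|3|130) return 0 ;;
        *)       return 1 ;;
    esac
}
```

Inside a callback, where `$2` is the return code, this could be used as `if worker_needs_restart "$2"; then ...; fi`.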

By just checking for this return code in the callback, you can reinitialize your workers when needed. I haven’t had a problem in months with typewritten since I implemented that check.

Example of callback function checking for the return code:

tw_prompt_callback() {
  local tw_name=$1 tw_code=$2 tw_output=$3

  # Check for return codes indicating an error
  if (( tw_code == 2 )) || (( tw_code == 3 )) || (( tw_code == 130 )); then
    # reinit async workers
    async_stop_worker tw_worker # stop the current worker
    tw_async_init_worker # Init the worker again, and register the callback, see below
    tw_async_init_tasks # Init all the tasks
  elif (( tw_code )); then
    # return code is non-zero, reinit all tasks
    tw_async_init_tasks
  fi
  ...
}

# For reference purpose
tw_async_init_worker() {
  async_start_worker tw_worker -n
  async_register_callback tw_worker tw_prompt_callback
}

reobin on Oct 26, 2020
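
For reference, here is how the same return-code check might look when applied to the gitprompt worker from the original report. This is an untested sketch rather than a confirmed fix: gitprompt_init_worker is a hypothetical helper name, the sketch assumes zsh-async is already sourced, and it does not re-queue a job after a restart, relying on the next precmd to do that.

```shell
# Hypothetical helper: (re)create the worker and attach the callback,
# using the same two calls as the original setup.
gitprompt_init_worker() {
    async_start_worker      gitprompt -n
    async_register_callback gitprompt refresh_prompt
}

refresh_prompt() {
    local code="$2" output="$3" next_is_ready="$6"

    # 2, 3 and 130 are the return codes handled as a dead worker in the
    # callback example above: stop it, start it again, and skip this result.
    case $code in
        2|3|130)
            async_stop_worker gitprompt
            gitprompt_init_worker
            return 0
            ;;
    esac

    # A newer result is already buffered, so ignore this one.
    if [ "$next_is_ready" = 1 ]; then
        return 0
    fi

    PROMPT="$(echo $output)"

    # Only redraw when ZLE is active; `zle` with no arguments reports that.
    if zle; then
        zle reset-prompt
    fi
}
```

With this in place, the two top-level async_start_worker and async_register_callback lines in the original setup could be replaced by a single call to gitprompt_init_worker.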