core: Whisper Addon - Pipeline timeout

The problem

Speech-to-text fails with an error, resulting in a pipeline timeout.

I made several attempts to fix the problem:

  • Reinstalled Whisper several times
  • Tried other languages
  • Tried other models
  • Reloaded the Wyoming integration
  • Deleted and re-added the Wyoming integration
  • Tried ESPHome from the web installer (Voice Assistant, 2023.4.2) and the latest version (2023.4.4)

Exactly the same problem every time.

As an input source I use the “M5Stack ATOM Echo Development Kit”, set up according to the instructions here.

What version of Home Assistant Core has the issue?

core-2023.5.1

What was the last working version of Home Assistant Core?

No response

What type of installation are you running?

Home Assistant OS

Integration causing the issue

Whisper

Link to integration documentation on our website

https://www.home-assistant.io/integrations/wyoming/

Diagnostics information

home-assistant_wyoming_2023-05-05T02-28-50.373Z.log

Example YAML snippet

No response

Anything in the logs that might be useful for us?

Whisper Addon:

s6-rc: info: service s6rc-oneshot-runner: starting
s6-rc: info: service s6rc-oneshot-runner successfully started
s6-rc: info: service fix-attrs: starting
s6-rc: info: service fix-attrs successfully started
s6-rc: info: service legacy-cont-init: starting
s6-rc: info: service legacy-cont-init successfully started
s6-rc: info: service whisper: starting
s6-rc: info: service whisper successfully started
s6-rc: info: service discovery: starting
INFO:__main__:Ready
[04:11:01] INFO: Successfully send discovery information to Home Assistant.
s6-rc: info: service discovery successfully started
s6-rc: info: service legacy-services: starting
s6-rc: info: service legacy-services successfully started
ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-20' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:26> exception=ValueError("can't extend empty axis 0 using modes other than 'constant' or 'empty'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 32, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 61, in handle_event
    segments, _info = self.model.transcribe(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 124, in transcribe
    features = self.feature_extractor(audio)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 152, in __call__
    frames = self.fram_wave(waveform)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 98, in fram_wave
    frame = np.pad(frame, pad_width=padd_width, mode="reflect")
  File "<__array_function__ internals>", line 200, in pad
  File "/usr/local/lib/python3.9/dist-packages/numpy/lib/arraypad.py", line 815, in pad
    raise ValueError(
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'
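The ValueError above comes from np.pad being asked to reflect-pad a zero-length array, i.e. Whisper received no audio samples at all before transcription started. A minimal reproduction (pad_frame and safe_pad are illustrative names, not faster-whisper internals):

```python
# Minimal reproduction of the crash above: reflect-padding an empty
# waveform raises the same ValueError that faster-whisper logs when no
# audio data reaches it. pad_frame/safe_pad are illustrative names only.
import numpy as np

def pad_frame(frame: np.ndarray, width: int) -> np.ndarray:
    """Reflect-pad one audio frame, as the feature extractor does internally."""
    return np.pad(frame, pad_width=width, mode="reflect")

def safe_pad(frame: np.ndarray, width: int) -> np.ndarray:
    """A defensive variant: fail loudly when the audio buffer is empty."""
    if frame.size == 0:
        raise RuntimeError("no audio data received; check the UDP stream")
    return pad_frame(frame, width)

print(pad_frame(np.arange(4.0), 2))      # works: 8-sample reflected frame
try:
    pad_frame(np.zeros(0), 2)            # what happens with empty audio
except ValueError as err:
    print(err)
```

In other words, the timeout is the symptom; the STT stream arriving empty (for several commenters below, a firewall blocking the device's UDP audio stream) is the cause.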


Debug Assistant output:

stage: stt
run:
  pipeline: 01gzm3e9q5123zc88tmmmzbvwf
  language: de
events:
  - type: run-start
    data:
      pipeline: 01gzm3e9q5123zc88tmmmzbvwf
      language: de
    timestamp: "2023-05-05T02:12:57.766709+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: de
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2023-05-05T02:12:57.766981+00:00"
stt:
  engine: stt.faster_whisper
  metadata:
    language: de
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false

Additional information

No response

About this issue

  • Original URL
  • State: closed
  • Created a year ago
  • Reactions: 8
  • Comments: 65 (15 by maintainers)

Most upvoted comments

Just a heads up for those of you running HA in a VM: double-check that you have the AVX instruction sets enabled for your Home Assistant VM. This has a huge impact on inference times.

I’m using Proxmox, and by enabling x86-64-v3 for the HA VM I got the “small” model (not even the int8 variant) running with a 3-4 s delay for most prompts, whereas before it would always time out.

Wow… That’s a terrible design

On 6 Jul 2023, 21:29, ChopperRob wrote:

Every time audio is sent from the ESPHome device (in my case an Athom Echo), it is sent to a different port on the Home Assistant device.

I did a packet capture while issuing 2 voice commands. During this time I see 2 TCP streams running, on the normal 6053 port and on port 80.

And I see 2 UDP streams, both around 60 KB in size. The first is sent to port 43280 on the HA device, the second to port 54921. In earlier packet captures the destination port was 44483, 51865, etc.

The weird thing is that the source port is always 58466; normally the source port is random and the destination port is fixed.

I opened all UDP traffic from my ESPHome devices to Home Assistant, and Whisper now works every time. (I still have a different issue with Piper: the device can’t play the response audio.)


It’s unusable.

Just a heads up for those of you running HA in a VM: double-check that you have the AVX instruction sets enabled for your Home Assistant VM. This has a huge impact on inference times.

I’m using Proxmox, and by enabling x86-64-v3 for the HA VM I got the “small” model (not even the int8 variant) running with a 3-4 s delay for most prompts, whereas before it would always time out.

This helped me a lot. I changed my cpu to host as I only have the one proxmox node. I used this forum post to make that decision: https://forum.proxmox.com/threads/cpu-type-host-vs-kvm64.111165/

Every time audio is sent from the ESPHome device (in my case an Athom Echo), it is sent to a different port on the Home Assistant device.

I did a packet capture while issuing 2 voice commands. During this time I see 2 TCP streams running: one on the normal 6053 ESPHome API port, and one on port 80 to grab the response audio.

And I see 2 UDP streams, both around 60 KB in size. The first is sent to port 43280 on the HA device, the second to port 54921. In earlier packet captures the destination port was 44483, 51865, etc.

The weird thing is that the source port is always 58466; normally the source port is random and the destination port is fixed.

I opened all UDP traffic from my ESPHome devices to Home Assistant on my firewall, and Whisper now works every time. (I still have a different issue with Piper: the device can’t play the response audio.)

My guess is that ESPHome and HA negotiate a UDP port over the API channel, so both ends know which ports will be used and HA can open the correct port.
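That guess matches how ephemeral UDP ports normally work: binding to port 0 lets the OS pick a free high port, which then has to be communicated to the peer out of band. A small sketch with plain Python sockets (not the actual esphome integration code):

```python
# Sketch of ephemeral UDP port allocation, illustrating why each voice
# command can land on a different destination port on the HA host. This is
# plain socket code, not the actual esphome/HA implementation.
import socket

def open_audio_port() -> tuple[socket.socket, int]:
    """Bind a UDP socket to port 0 so the OS assigns a free ephemeral port."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind(("0.0.0.0", 0))       # port 0 means "any free port"
    return sock, sock.getsockname()[1]

# Two "voice commands" get two different ports, so a firewall rule pinned
# to one fixed destination port will silently drop the next audio stream.
s1, port1 = open_audio_port()
s2, port2 = open_audio_port()
print(port1, port2)
s1.close()
s2.close()
```

This is why opening the whole ephemeral UDP range from the ESPHome VLAN to the HA host fixed it for several people here, while a single-port rule did not.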

A little off-topic, but following up on my earlier comment: with the updates from chapter 4, everything is working for me now. Confidence restored, and I'm looking forward to eventually replacing Alexa. 😄

Found an interesting dependency (Proxmox with host CPU) across two different setups: Piper works much faster (2x) on an ITX machine with an Intel® Core™ i5-3470 CPU @ 3.20 GHz than on a Dell R610 with an Intel® Xeon® CPU E5620 @ 2.40 GHz. I assume this is not only because of CPU speed; for me this is an optimization issue.

This is for the small-int8 model. The small model is unusable in the Xeon case.

I suspect this is because the E5620 does not support the AVX instruction sets. … and AFAIK the benefit that AVX brings to the table is not about optimization.
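On Linux you can verify from inside the VM whether AVX actually made it through to the guest. A quick check, assuming a /proc/cpuinfo layout with a "flags" line (parse_flags is an illustrative helper, not part of any HA tooling):

```python
# Check whether the (possibly virtualized) CPU exposes AVX/AVX2, which the
# CTranslate2 backend behind faster-whisper relies on for usable inference
# speed. Linux-only sketch; parse_flags is an illustrative helper name.
from pathlib import Path

def parse_flags(cpuinfo_text: str) -> set[str]:
    """Extract the feature-flag set from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()

cpuinfo = Path("/proc/cpuinfo")
if cpuinfo.exists():
    flags = parse_flags(cpuinfo.read_text())
    for feature in ("avx", "avx2"):
        state = "present" if feature in flags else "MISSING (expect timeouts)"
        print(f"{feature}: {state}")
```

If "avx" is missing inside the guest but present on the host, the VM CPU type (e.g. Proxmox's default kvm64) is masking it; switching to host or x86-64-v3 exposes it.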

I actually experienced this today after upgrading to Home Assistant 2023.8.0.
Starting Assist would show the ‘…’ while waiting for my voice, but saying anything or just waiting got no response and eventually returned ‘Timeout running pipeline’. I’m running Home Assistant, Whisper, and Piper in Docker, and everything was working fine on 2023.7. I tried updating the versions of Whisper and Piper, but that didn’t change anything.

When going to Settings > Voice Assistant > Home Assistant, I noticed that the text-to-speech Piper config was using ‘Amy (low)’ instead of the ‘Ryan (low)’ I had set previously. Clicking ‘try voice’ took a few minutes before finally generating the voice; looking at the Piper logs, I later realised the delay was because it had to download the Amy files. After trying the voice in the settings, Assist was working fine again with no more timeouts, and even switching back to Ryan works fine.

I guess Assist was timing out because it was asking Piper for a voice that Piper hadn’t downloaded. I’m not sure at what point it switched, but I thought I’d share my experience in case it helps anyone else.

Wow… That’s a terrible design

yeah, not the best from a network perspective.

I just found out the issue in my case: the ESPHome device uses a random UDP port to send the audio to Home Assistant, and my firewall was blocking it.

I have the same issue with my Atom Echo and the local voice assistant. If I set the voice assistant to cloud, I get the following in the Home Assistant core log:

Voice error: Error processing nl-NL speech: 400 No audio data received

It looks to me like the Atom Echo is not sending the recording correctly, but I don’t know how to debug this.

I immediately did, as I want to split traffic at the network level and keep the HA host stable and dedicated 😃 I don’t do it myself, but you could also run HA OS in a VM and get the best of both worlds.

You are right about the docs, of course. But the voice stuff is still in its early days, and I personally like that we are getting fast updates and new functionality, even if the documentation is behind.

Same issue. Only the “base” model works, and only occasionally (VM with 4 GB RAM, 4 CPUs, Intel® Xeon® CPU E5620 @ 2.40 GHz). The Whisper process takes 300%+ of the CPUs and eventually crashes. The problem still exists in HASS 2023.8.1 and the latest version of Whisper (1.0.0). I am using the Firefox browser and the companion Android app.

future: <Task finished name='Task-20' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:28> exception=ConnectionResetError('Connection lost')>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 35, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 45, in handle_event
    await self.write_event(self.wyoming_info_event)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 26, in write_event
    await async_write_event(event, self.writer)
  File "/usr/local/lib/python3.9/dist-packages/wyoming/event.py", line 114, in async_write_event
    await writer.drain()
  File "/usr/lib/python3.9/asyncio/streams.py", line 387, in drain
    await self._protocol._drain_helper()
  File "/usr/lib/python3.9/asyncio/streams.py", line 190, in _drain_helper
    raise ConnectionResetError('Connection lost')
ConnectionResetError: Connection lost

I was really impressed when HA came out with the voice assistant. It understood the Czech language instantly; as I was speaking, it immediately wrote what I said with almost no errors. After some time not using it, I realized the voice command (microphone icon) was missing and there was no possibility to select the Czech language. I found out I have to install the Whisper add-on, but it just does not work. The add-on’s CPU load is very high, the response, if any, is very slow (several seconds), and it never understands what I say. Is there any way to revert to the original voice-to-text service?

Upgraded the machine to 4gb ram, now it works!

Surprisingly enough, my Whisper communication problem got fixed too when I allowed the ‘atom’ device more wiggle-room through my firewall… I’m guessing it’s due to my VLAN config, but anyway, I’m just happy it got figured out (at least in my case). I’ll probably spend the next few hours turning stuff on and off via voice commands, and then never use it again until there is a wake word to try out…

cheers

Same issue here, but it worked a few weeks ago. Running on a Raspberry Pi 4B 8GB.

Here’s what the debug assistant gives me:

stage: error
run:
  pipeline: 01h0are59f0x2f1q1efcc0kvr3
  language: de
  runner_data:
    stt_binary_handler_id: 1
    timeout: 30
events:
  - type: run-start
    data:
      pipeline: 01h0are59f0x2f1q1efcc0kvr3
      language: de
      runner_data:
        stt_binary_handler_id: 1
        timeout: 30
    timestamp: "2023-05-14T20:01:34.221550+00:00"
  - type: stt-start
    data:
      engine: stt.faster_whisper
      metadata:
        language: de
        format: wav
        codec: pcm
        bit_rate: 16
        sample_rate: 16000
        channel: 1
    timestamp: "2023-05-14T20:01:34.222296+00:00"
  - type: error
    data:
      code: stt-stream-failed
      message: Speech to text failed
    timestamp: "2023-05-14T20:01:42.375516+00:00"
stt:
  engine: stt.faster_whisper
  metadata:
    language: de
    format: wav
    codec: pcm
    bit_rate: 16
    sample_rate: 16000
    channel: 1
  done: false
error:
  code: stt-stream-failed
  message: Speech to text failed


ETA: since the switch to the HA Yellow, it seems like I can’t even get text commands parsed by the assistant anymore. No idea what’s going on. If I set the language to English it works (in text), but in German it does not match the intent I’m looking for.

@nima-1102 Where did you initiate the voice from? The Atom Echo or the browser microphone button? or someplace else?

Tried with the M5Stack ATOM Echo, the browser, and the iOS HA app; always the same message.

I also have the same issue. I initiate the chat by pressing the button on the M5 Atom Echo; the error I see in the Whisper container is:

ERROR:asyncio:Task exception was never retrieved
future: <Task finished name='Task-41' coro=<AsyncEventHandler.run() done, defined at /usr/local/lib/python3.9/dist-packages/wyoming/server.py:26> exception=ValueError("can't extend empty axis 0 using modes other than 'constant' or 'empty'")>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/dist-packages/wyoming/server.py", line 32, in run
    if not (await self.handle_event(event)):
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/handler.py", line 61, in handle_event
    segments, _info = self.model.transcribe(
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/transcribe.py", line 124, in transcribe
    features = self.feature_extractor(audio)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 152, in __call__
    frames = self.fram_wave(waveform)
  File "/usr/local/lib/python3.9/dist-packages/wyoming_faster_whisper/faster_whisper/feature_extractor.py", line 98, in fram_wave
    frame = np.pad(frame, pad_width=padd_width, mode="reflect")
  File "<__array_function__ internals>", line 200, in pad
  File "/usr/local/lib/python3.9/dist-packages/numpy/lib/arraypad.py", line 815, in pad
    raise ValueError(
ValueError: can't extend empty axis 0 using modes other than 'constant' or 'empty'

The ESPHome logs are:

[18:12:40][D][binary_sensor:036]: ‘Button’: Sending state ON
[18:12:40][D][voice_assistant:065]: Requesting start…
[18:12:40][D][voice_assistant:045]: Starting…
[18:12:40][D][voice_assistant:083]: Assist Pipeline running
[18:12:40][D][light:035]: ‘M5Stack Atom Echo d4e650’ Setting:
[18:12:40][D][light:058]: Red: 0%, Green: 0%, Blue: 100%
[18:12:42][D][binary_sensor:036]: ‘Button’: Sending state OFF
[18:12:42][D][voice_assistant:073]: Signaling stop…

My ESPHome config looks like:

substitutions:
  name: m5stack-atom-echo-d4e650
  friendly_name: M5Stack Atom Echo d4e650
packages:
  m5stack.atom-echo-voice-assistant: github://esphome/firmware/voice-assistant/m5stack-atom-echo.yaml@main
esphome:
  name: ${name}
  name_add_mac_suffix: false
  friendly_name: ${friendly_name}
api:
  encryption:
    key: <REDACTED>

wifi:
  ssid: !secret wifi_ssid
  password: !secret wifi_password

I updated 2023-05-11

I run HAOS on Oracle VirtualBox on a Windows machine, so processing capability should not be the root cause of this issue.

@nima-1102 What hardware are you running Whisper on? Pipelines have a timeout of 30 seconds, so it won’t work if Whisper takes more than that.
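The 30-second limit matches the runner_data.timeout value in the debug output above. A minimal sketch of how such a pipeline timeout behaves, with a made-up slow_transcribe standing in for the STT step (shortened delays keep the demo fast; this is not the actual pipeline code):

```python
# Sketch of the pipeline timeout behaviour described above: if the STT
# step outruns the limit, the whole run is cancelled and reported as a
# timeout. slow_transcribe is a made-up stand-in for Whisper.
import asyncio

async def slow_transcribe(delay: float) -> str:
    """Pretend STT step that takes `delay` seconds to return a result."""
    await asyncio.sleep(delay)
    return "transcript"

async def run_pipeline(stt_delay: float, timeout: float = 30.0) -> str:
    """Run the fake STT step under a hard timeout, as the pipeline does."""
    try:
        return await asyncio.wait_for(slow_transcribe(stt_delay), timeout)
    except asyncio.TimeoutError:
        return "Timeout running pipeline"

print(asyncio.run(run_pipeline(0.01, timeout=0.5)))   # fast enough
print(asyncio.run(run_pipeline(0.5, timeout=0.05)))   # exceeds the limit
```

So hardware too slow to transcribe within 30 s and an STT stream that never arrives both surface as the same "Timeout running pipeline" message, which is why this thread mixes AVX/CPU reports with firewall reports.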

Home Assistant runs in a Proxmox VM (2 CPUs, 10 GB RAM) on an underutilized Intel NUC (NUC8i3BEK) with 32 GB RAM and a 4-core Intel® Core™ i3-8109U CPU @ 3.00 GHz. I don’t think the problem is related to the hardware, since the error occurs immediately after speaking and the VM has a total utilization of about 15% CPU / 3 GB RAM.

The problem doesn’t seem to be related to the M5Stack ATOM Echo either; the same thing happens in the web browser and the iOS app as well.

What can I try or how can I help to narrow down the problem?