supervisor: Supervisor 22.03.2: Timeout while processing CodeNotary

Describe the issue you are experiencing

image

What is the used version of the Supervisor?

2022.03.2

What type of installation are you running?

Home Assistant Supervised

Which operating system are you running on?

Home Assistant Operating System

What is the version of your installed operating system?

Home Assistant OS 7.4

What version of Home Assistant Core is installed?

core-2022.3.3

Steps to reproduce the issue

1. Restart Home Assistant

Anything in the Supervisor logs that might be useful for us?

22-03-09 10:18:31 WARNING (MainThread) [supervisor.utils.codenotary] rpc error: code = Unknown desc = unexpected HTTP status code received from server: 500 (Internal Server Error); transport: received unexpected content-type "text/plain; charset=utf-8"

6 Times

Additional information

No response

About this issue

  • State: closed
  • Created 2 years ago
  • Reactions: 57
  • Comments: 139 (25 by maintainers)

Most upvoted comments

Unlocked the issue, as the mitigation is in place now.

@frenck Not a problem, I’ll wait, but don’t say “should not impact the system at all”. In my opinion the problem is serious. And an even more serious problem is such a deep dependence on one cloud service. What about “Open source home automation that puts local control and privacy first.”

This is causing so many issues for me. For a “local-first” solution to home automation, this brief outage shouldn’t be causing so much mayhem.

ℹ️ I’m locking this issue, as things are running in circles.

Will update this response while things progress.

What is going on?

First of all, don’t panic! Nothing is broken/bad or in a state to worry about.

The source of the issue is that code signing / verification for security is failing, as Codenotary has issues processing authorizations. This causes errors in the logs, but should not block a system (it can cause start-up delays, but it will recover).

I cannot install a new Home Assistant instance

That is a bug, which is being worked on. This should not have happened. Even if Codenotary isn’t available, it should not block a new fresh installation.

My instance is marked unhealthy/unsupported

This is normal during the startup of the Supervisor. It will turn into supported/healthy once all checks have run. As the startup is delayed due to the timeout, it may stick with that state for a bit; however, it will recover.

Updates/Progress

  • We are working on a change to the observer to not show “unsupported/unhealthy” during startup, as that is not correct. It should show “Processing…” as a state until it knows for sure the system is in a supported/unsupported/healthy/unhealthy state. This is to avoid future confusion.

  • The timeouts can trigger a bug in the Supervisor, which mostly causes new installations to fail. A fix is in the making to prevent that now and in the future. It also ensures we start up faster, even if the external service is unavailable.

  • Codenotary seems to be recovering, but their indexes still return “unsigned” in quite a few cases. According to them, their service is catching up, so this should resolve shortly. If your system reports “Unhealthy” because of that, don’t worry. It will recover.

  • We have adjusted our version file to advertise the previous version of the Supervisor; this will mitigate the issue for fresh/new installations (which were unable to complete).

  • We have deployed a newer version of the Supervisor (2022.03.3) on the stable channel to handle a timeout issue on our end better.

  • We have received multiple reports that restarting the Supervisor (or rebooting your instance if you want to use a harder measure) should resolve all the open issues reported.

  • If you have access to the system terminal or one of the SSH add-on terminals, you can try running the following commands to trigger an update:

    ha supervisor reload
    ha supervisor update
    

So, the Codenotary check failure should not impact the system at all. It causes a trace/warning in the logs, but should not interfere otherwise.

Most of the cases reported above are of this nature. There is nothing to worry about and you can safely ignore the message at this point.

ℹ️ Please do not reply with “I have the same issue” or similar. That will not add anything to resolving the issue itself. Instead, add a 👍 emoji to the initial issue post.

Same here, and I REALLY REALLY REALLY REALLY REALLY REALLY REALLY REALLY wish that you stop this supervisor auto update! Come on! I usually have a DNS block to stop this but had left it unblocked and got caught out with another supervisor auto update! PLEASE STOP THIS AUTO UPDATE! This is really getting silly!

For me solved: ha core rebuild

Can you please reopen https://github.com/codenotary/cas/issues/263 @frenck - I now also get the rpc errors again due to timeout. Thx

So, the Codenotary check failure, should not impact the system at all. It causes a trace/warning in the logs, but should not interfere otherwise.

This is only true for updates (it blocks updates), but it’s impossible to install a new system anymore since it can’t validate the docker images downloaded.

My HASS crashed today for some reason… I ended up wiping it and was going to reinstall from last night’s backup, which failed several times.

Tried a complete reinstall and I’m stuck with “Timeout while processing CodeNotary”, and it’s sitting at the loading screen that says to wait up to 20 minutes.

Supervisor logs just show a restart of the homeassistant container every few minutes, but no progress.

Dying here, what on earth is going on today.

The change introduced a new, well-intentioned, likely valuable, cloud-based dependency. The ability to disable it or have some method of working around that dependency would be helpful.

For me also the same issue, trying to update from core-2022.2.9 to core-2022.3.3 with supervisor-2022.03.2 and Home Assistant OS 7.4.

22-03-10 00:30:28 ERROR (MainThread) [supervisor.jobs] Unhandled exception:
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 106, in wrapper
    return await self._method(*args, **kwargs)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 229, in update
    await _update(version)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 207, in _update
    await self.instance.update(
  File "/usr/src/supervisor/supervisor/utils/__init__.py", line 33, in wrap_api
    return await method(api, *args, **kwargs)
asyncio.exceptions.TimeoutError
22

Thanks for your time, I am sorted. I updated the core from the CLI and the front-end updated; I can now see the add-ons page again.

Not sure if it was the front-end or the core that sorted it.

Whatever was done, appears to have resolved the issue for me. Thanks!

@to4ko During restarts it will try to verify signatures with Codenotary, once the timeout passes on those commands, everything should be back to normal (give it a couple of minutes).

and now it turns to unhealthy and got solved with supervisor restart…

image

same on HAOS running in VM

image

@MTrab all my instances recovered within 10-15 minutes (including instances running stable, beta, and dev, and instances that run on Raspberry Pis, Home Assistant Blues, Supervised VMs, and Home Assistant OS VMs).

Been trying to reproduce and break it on all those systems trying to see what goes south and how it behaves (as it should not block permanently, it should recover). All recover.

New installation is an issue as it seems, which is just an honest bug that is being looked into right now.

I’ve seen the same issue after Supervisor updated to 2022.3.2 - I now can’t update core to 2022.3.3. Log:

22-03-09 16:38:37 INFO (SyncWorker_2) [supervisor.docker.interface] Updating image ghcr.io/home-assistant/generic-x86-64-homeassistant:2022.2.9 to ghcr.io/home-assistant/generic-x86-64-homeassistant:2022.3.3
22-03-09 16:38:37 INFO (SyncWorker_2) [supervisor.docker.interface] Downloading docker image ghcr.io/home-assistant/generic-x86-64-homeassistant with tag 2022.3.3.
22-03-09 16:39:33 ERROR (MainThread) [supervisor.jobs] Unhandled exception:
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 106, in wrapper
    return await self._method(*args, **kwargs)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 229, in update
    await _update(version)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 207, in _update
    await self.instance.update(
  File "/usr/src/supervisor/supervisor/utils/__init__.py", line 33, in wrap_api
    return await method(api, *args, **kwargs)
asyncio.exceptions.TimeoutError
22-03-09 16:39:43 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary
22-03-09 16:39:44 INFO (MainThread) [supervisor.homeassistant.core] Updating Home Assistant to version 2022.3.3
22-03-09 16:39:44 INFO (SyncWorker_6) [supervisor.docker.interface] Updating image ghcr.io/home-assistant/generic-x86-64-homeassistant:2022.2.9 to ghcr.io/home-assistant/generic-x86-64-homeassistant:2022.3.3
22-03-09 16:39:44 INFO (SyncWorker_6) [supervisor.docker.interface] Downloading docker image ghcr.io/home-assistant/generic-x86-64-homeassistant with tag 2022.3.3.

Also seeing errors with CodeNotary after updating to Supervisor 2022.03.2 while running Debian 11 and Home Assistant Supervised with version core-2022.3.2:

[supervisor.utils.codenotary] Timeout while processing CodeNotary

Going blind with a Logitech K400+ on the rpi4 and waiting an extremely painful 10 minutes got it working. Thank god, because I couldn’t find that stupid mini-hdmi adapter anywhere…

ha su repair
ha core rebuild
ha host reboot

works for me, thank you

A full reboot worked for me! Thanks for fixing this issue!!

Everything goes back to work. Thank you guys

@ossconsulting I did read it, it is full of incorrect assumptions. That said, this is not the place for that discussion as I said above as well, which was based on actually reading your posting. Thanks 👍

Even when this is fixed, this issue brings to light a flaw in the way we check the integrity of Home Assistant. Let me explain:

Many Home Assistant owners chose this project to have a local solution that is useable without being dependent on an Internet connection or any third parties for their home automation to function. I feel that with this incident, we can state that we actually have a dependency on a third party and can not trust Home Assistant to be a truly local stand-alone solution. The impact of this incident was wider than just not being able to execute updates to the system. It resulted in:

  • Long bootup/restart times.
  • Some installations, it seems, did not even recover by themselves.
  • Misleading statements of code injections to the installation.
  • Watchdog restarts of at least one integration (Zwave-JS) were impacted.
  • New installations did not complete at all.
  • Recovery seemed rather random and largely dependent on if you were lucky enough to get a reply from CodeNotary’s servers. (I have not dared to restart my HA instance after it recovered because I don’t know if and when it will recover again)

For me personally, this resulted in parts of my house not functioning through a large part of the night without me knowing about it, followed by a reboot and several restarts of Supervisor before the situation was somewhat resolved. Statements like “CodeNotary has issues processing authorizations”, “their service is catching up”, “their indexes do seem to return ‘unsigned’ in quite a few cases” do little to comfort me and frankly make me frown at the claimed “distributed ledger” character of the solution. Apparently it is either not distributed, or not distributed enough, to prevent situations like this. It also raises questions like:

  • What happens if CodeNotary is hacked/overloaded/DDOS-ed?
  • What happens if CodeNotary goes bankrupt or decides to stop/suspend the service for whatever reason?
  • What data does CodeNotary store about my system, IP, etc… At the very least there must be logs about how often I restart Supervisor and the fact that I run Home Assistant and also what version. Where does this data go? What privacy law do they adhere to?
  • Does anyone other than CodeNotary have nodes running that we can fall back on if CodeNotary’s DNS/servers fail for a longer time? Is that even allowed/possible?
  • Why depend on a commercial party at all? Are there other, more open solutions? Seems like a keyserver-like solution as is used with Debian/Ubuntu packages should give a comparable level of security…

I understand that with a solution like CodeNotary, safety is improved but I would like to suggest a few changes to make the dependency on it a lot less:

  • An easy way to turn it off altogether if so desired by the system owner.
  • Ignore it when timeouts/disconnects are encountered and we are not updating software. Not being able to check should never prevent a home from functioning.
  • And/or cache signing results locally so that CodeNotary’s servers are not needed during every restart of Supervisor, only when updating software.

With these changes, at least we have a choice to disable it, and the local caching especially would help make the system less dependent on a third party. I can live with not being able to update when the code integrity is not assured. Actually that is a good thing. It is more difficult for me to live with the fact that my house will randomly stop functioning when someone else drops the ball…

Roger.

Bob.

FWIW, warning is reported to supervisor.addon.validate

Unlocked the issue, as the mitigation is in place now.

thanks! got updated right after release…all good! thanks!

Seems to be working now. I was able to update as well; rebooted and the system is running steady. Thank you!!!

@frenck Yes, it is a comment for CodeNotary, but I don’t know how to contact them. Hopefully they can see my comment here.

Having the same problem here. I am on supervisor-2022.03.2. This morning I noticed that Zwave-JS had crashed. Normally it should be restarted automatically but it didn’t. Reason seemed to be that the system was unhealthy. Checking on the system page, supervisor indeed told me that the system was unhealthy and pointed me to a rather alarming page that there were code injections in my system (which I do not believe since it is not exposed to the outside). In the logs there were a lot of “Timeout while processing CodeNotary” messages. So the immediate impact is larger than just not being able to update:

  • Watchdog restarts are inhibited
  • User is erroneously sent to a page that states he is pwned. (https://www.home-assistant.io/more-info/unhealthy/untrusted)
  • It mumbled something along the lines of not being able to auto-heal, but the logs seem to have been rotated and I can no longer find the exact message.

After rebooting the host it still did not work, but after I restarted supervisor twice more, at least the system now works as it did and the “unhealthy” warning is gone. The timeout messages are still in the logs though.

@JohnnyM84 That issue is not related to this reported issue. Please raise a separate issue for that. Thanks 👍

After the upgrade of the Supervisor I had the same error, and my backups cannot be read: ERROR (MainThread) [supervisor.backups.backup] Can't validate data for /data/backup/9968f77e.tar: expected boolean for dictionary value @ data['protected']. Got '8'

I removed them and made 1 new and that is working, what could be the problem for it?

[4 images]

@frenck it doesn’t look harmless

@to4ko During restarts it will try to verify signatures with Codenotary, once the timeout passes on those commands, everything should be back to normal (give it a couple of minutes).

any updates ??? i tried all above and still getting the error , bearing in mind my hass os working no other problems

Please just press the Subscribe button, do not chat here. With every reply everyone is notified and it can be very annoying.

@BebeMischa did that fix not being able to update for you? For me it threw a different error

I was able to update, but after that my system was marked unsupported. That issue is now gone.

That issue is gone, but can you still check for updates? After a DNS flush/restart the unsupported state because of systemd-… is gone, but I can’t check for updates; it gives me Failed to to call /refresh_updates

Logger: homeassistant.components.hassio
Source: components/hassio/websocket_api.py:120
Integration: Home Assistant Supervisor (documentation, issues)
First occurred: 07:47:34 (1 occurrences)
Last logged: 07:47:34

Failed to to call /refresh_updates -

Super ugly hack that I DON’T RECOMMEND AT ALL to anyone because you completely bypass the image check! ONLY DO THIS IF YOU CAN’T WAIT AT ALL!!

Everything is done as root.

# On the machine running HA
docker exec -it hassio_supervisor sh
# in the container
vi /usr/src/supervisor/supervisor/utils/codenotary.py

You’ll want the beginning of the cas_validate function to look like this (in the docker container):

async def cas_validate(
    signer: str,
    checksum: str,
) -> None:
    """Validate data against CodeNotary."""
    if (checksum, signer) in _CACHE:
        return

    _CACHE.add((checksum, signer))  # This line was added
    return                          # This line was added

Then

# exit the container and
service hassio-supervisor restart

On paper (from the release notes), this seemed pretty harmless. Why is this snowballing?

Bob.

For me it’s still the same issue, trying to update from core-2022.2.9 to core-2022.3.3 with supervisor-2022.03.2 and Home Assistant OS 7.4.

22-03-09 23:29:02 INFO (SyncWorker_2) [supervisor.docker.interface] Downloading docker image ghcr.io/home-assistant/raspberrypi4-64-homeassistant with tag 2022.3.3.
22-03-09 23:29:23 ERROR (MainThread) [supervisor.jobs] Unhandled exception: 
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 106, in wrapper
    return await self._method(*args, **kwargs)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 229, in update
    await _update(version)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 207, in _update
    await self.instance.update(
  File "/usr/src/supervisor/supervisor/utils/__init__.py", line 33, in wrap_api
    return await method(api, *args, **kwargs)
asyncio.exceptions.TimeoutError
22-03-09 23:29:33 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary

22-03-09 15:42:04 WARNING (MainThread) [supervisor.utils.codenotary] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 35.168.17.115:443: i/o timeout"

That timestamp is in CST.

I am trying to update from 2022.2.6 to 2022.3.3, but I am getting error:

22-03-09 22:18:43 ERROR (MainThread) [supervisor.jobs] Unhandled exception: 
Traceback (most recent call last):
  File "/usr/src/supervisor/supervisor/jobs/decorator.py", line 106, in wrapper
    return await self._method(*args, **kwargs)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 229, in update
    await _update(version)
  File "/usr/src/supervisor/supervisor/homeassistant/core.py", line 207, in _update
    await self.instance.update(
  File "/usr/src/supervisor/supervisor/utils/__init__.py", line 33, in wrap_api
    return await method(api, *args, **kwargs)
asyncio.exceptions.TimeoutError
22-03-09 22:18:43 WARNING (MainThread) [supervisor.utils.codenotary] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 35.168.17.115:443: i/o timeout"
22-03-09 22:18:45 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary
22-03-09 22:19:06 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary
22-03-09 22:19:06 INFO (MainThread) [supervisor.resolution.evaluate] System evaluation complete
22-03-09 22:19:06 INFO (MainThread) [supervisor.resolution.fixup] Starting system autofix at state CoreState.RUNNING
22-03-09 22:19:06 INFO (MainThread) [supervisor.resolution.fixup] System autofix complete

Supervisor is already the latest: supervisor-2022.03.2
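
For anyone trying to judge from a log like the one above how often the check is failing, a small script can pull out just the `supervisor.utils.codenotary` entries and the time between them. This is only a sketch: it assumes the plain-text line layout seen in the excerpts in this thread (`YY-MM-DD HH:MM:SS LEVEL (Thread) [logger] message`) with no color codes, and the function names are made up.

```python
import re
from datetime import datetime

# Assumed line layout, taken from the log excerpts posted in this thread.
LOG_LINE = re.compile(
    r"^(?P<ts>\d{2}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) \((?P<thread>[^)]+)\) "
    r"\[(?P<logger>[^\]]+)\] (?P<message>.*)$"
)


def codenotary_events(log_text):
    """Return (timestamp, level, message) for each supervisor.utils.codenotary entry."""
    events = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match["logger"] == "supervisor.utils.codenotary":
            stamp = datetime.strptime(match["ts"], "%y-%m-%d %H:%M:%S")
            events.append((stamp, match["level"], match["message"]))
    return events


def gaps_in_seconds(events):
    """Seconds between consecutive entries, in log order."""
    return [
        (later[0] - earlier[0]).total_seconds()
        for earlier, later in zip(events, events[1:])
    ]


# Three lines copied from the log above.
SAMPLE = """\
22-03-09 22:18:43 WARNING (MainThread) [supervisor.utils.codenotary] rpc error: code = Unavailable desc = connection error: desc = "transport: Error while dialing dial tcp 35.168.17.115:443: i/o timeout"
22-03-09 22:18:45 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary
22-03-09 22:19:06 ERROR (MainThread) [supervisor.utils.codenotary] Timeout while processing CodeNotary
"""

if __name__ == "__main__":
    found = codenotary_events(SAMPLE)
    for stamp, level, message in found:
        print(stamp.time(), level, message[:40])
    print("gaps (s):", gaps_in_seconds(found))
```

Traceback lines and entries from other loggers do not match the filter and are skipped.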

On hassio it started working after:

ha dns update
ha dns restart

Might have been a coincidence too; the upstream DNS may just have happened to update around the same time.