edgex-go: security-proxy-setup fails after Ubuntu upgrade

🐞 Bug Report

Affected Services

  • Kong
  • security-proxy-setup

Is this a regression?

No

Description and Minimal Reproduction

Kong hangs on startup for no apparent reason after an Ubuntu upgrade, which causes security-proxy-setup to fail to create the proxy routes. Reported in Slack; see "Anything else relevant?" below for the full Slack conversation.

🌍 Your Environment

Deployment Environment:

EdgeX Version: EdgeX Hanoi

Anything else relevant?

Marcelo Gallardo May 6th at 12:55 PM Hello, I need help trying to figure out an issue that has popped up on an EdgeX installation running on an Rpi4. The whole thing had been working for months and has now suddenly stopped working. I tried a simple ping to the secured API, and it seems the paths to the services are not being registered. I looked at the Kong logs and can see the following entry every time I ping the gateway: GET /metadata/api/v1/ping HTTP/1.1 404. I have enabled DEBUG logging on the metadata service but nothing is shown. Any help or hint will be greatly appreciated.

Lenny Goodell (Intel) 4 days ago Which release are you using?

Marcelo Gallardo 4 days ago Hanoi. I have tested access to the local URLs from the same machine and it works fine (http://localhost:48081/api/v1/ping). It looks like the paths are not registered with Kong. What component does that? I have been looking at all the log files and can’t find anything.

Lenny Goodell (Intel) 3 days ago Proxy Setup

Lenny Goodell (Intel) 3 days ago It is a one shot service that exits once complete.

Marcelo Gallardo 3 days ago I was looking at the logs of the edgex-proxy service and noticed that it fails with an error before it can set up the routes for the services. I don’t know why this error is occurring now, after it has been running successfully for months. Following is the error:

    edgex-proxy | level=INFO ts=2021-05-07T17:20:58.460341439Z app=edgex-security-proxy-setup source=service.go:103 msg="the service on http://kong:8001 is up successfully"
    edgex-proxy | level=DEBUG ts=2021-05-07T17:20:58.460942404Z app=edgex-security-proxy-setup source=client.go:286 msg="Using Secrets URL of https://edgex-vault:8200/v1/secret/edgex/edgex-security-proxy-setup/kong-tls"
    edgex-proxy | level=DEBUG ts=2021-05-07T17:20:58.47150529Z app=edgex-security-proxy-setup source=service.go:297 msg="trying to upload cert to proxy server"
    edgex-proxy | level=ERROR ts=2021-05-07T17:21:08.472869068Z app=edgex-security-proxy-setup source=service.go:312 Posthttp://kong:8001/certificates:contextdeadlineexceeded(Client.Timeoutexceededwhileawaitingheaders)= msg="failed to upload cert to proxy server with error %s"
    edgex-proxy | level=ERROR ts=2021-05-07T17:21:08.473117651Z app=edgex-security-proxy-setup source=init.go:62 msg="Post \"http://kong:8001/certificates\": context deadline exceeded (Client.Timeout exceeded while awaiting headers)"

Any ideas on the reason for this failure? Thanks for your help.
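Editor's note: the timestamps in the log above suggest that the HTTP client gave up roughly ten seconds after it started the certificate upload, which is consistent with a fixed client-side timeout rather than an immediate refusal. The following Python sketch only reproduces that arithmetic from the two quoted timestamps; `parse_ts` is a hypothetical helper written for this note and is not part of EdgeX.

```python
from datetime import datetime, timezone


def parse_ts(ts: str) -> datetime:
    """Parse a UTC timestamp such as 2021-05-07T17:20:58.47150529Z.

    The log uses up to nanosecond precision, while datetime stores
    microseconds, so only the first six fractional digits are kept.
    """
    base, _, frac = ts.rstrip("Z").partition(".")
    micros = int(frac[:6].ljust(6, "0")) if frac else 0
    parsed = datetime.strptime(base, "%Y-%m-%dT%H:%M:%S")
    return parsed.replace(microsecond=micros, tzinfo=timezone.utc)


# Timestamps copied from the "trying to upload cert" and first ERROR lines.
started = parse_ts("2021-05-07T17:20:58.47150529Z")
failed = parse_ts("2021-05-07T17:21:08.472869068Z")
elapsed = (failed - started).total_seconds()
print(f"{elapsed:.3f} s between upload attempt and timeout")
```

The gap of about ten seconds is an inference from the log only; the thread does not state the configured timeout value.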

Lenny Goodell (Intel) 3 days ago @Big-B (Intel), @Jim Wang (Intel) Thoughts? Is this a TTL issue? @Marcelo Gallardo did you try a clean restart of all the services?

Marcelo Gallardo 3 days ago Yes. I always restart everything clean. It seems like the certificates might have expired or something like that.

Big-B (Intel) 3 days ago This looks like a straight-up timeout to me. What EdgeX version is this? Hanoi?

Big-B (Intel) 3 days ago We added a patch in the Ireland compose file to set KONG_DNS_ORDER: LAST,A,CNAME Maybe stick that in the kong service environment section and see if that improves things.

Marcelo Gallardo 3 days ago @Big-B (Intel) This is the Hanoi release. Is there any way to configure the timeout? I am just surprised, as this has been working for months and suddenly stopped working this week. Nothing has changed in the docker-compose file. This docker-compose file has been running fine since Hanoi was released.

Big-B (Intel) 3 days ago I have seen this kind of problem before, and I had to reinstall my Linux system to fix it. It may be related to some software I installed or a patch I applied. I never got to the bottom of it and, like you, other people didn’t have a problem with it. It’s like Kong gets stuck on something during initialization. Try running docker-compose up kong, wait a minute or two, and then start up the rest of the stack, just to confirm that it is Kong’s coming alive that is the problem here. Also, as you noticed, this condition isn’t easy to detect, as Kong opens the port and starts listening on it, but doesn’t actually serve content.
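Editor's note: because Kong can accept connections on its port before it actually serves content, a plain port check is not enough to decide when to start the rest of the stack. The sketch below polls for a real HTTP response instead of sleeping for a fixed minute or two. It is a minimal illustration under stated assumptions: the admin API is assumed to be reachable from the host at http://localhost:8001 (the log only shows http://kong:8001 inside the Docker network), and `admin_api_responds` and `wait_until_ready` are hypothetical helper names, not part of EdgeX or Kong.

```python
import http.client
import time
import urllib.request
from typing import Callable


def admin_api_responds(url: str = "http://localhost:8001/status",
                       timeout: float = 2.0) -> bool:
    """Return True only if an HTTP 200 comes back from the given URL.

    An open port that never answers ends in a timeout, which counts as
    "not ready". The URL is an assumption; adjust it to your deployment.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (OSError, http.client.HTTPException):
        return False


def wait_until_ready(probe: Callable[[], bool],
                     deadline_s: float = 120.0,
                     interval_s: float = 2.0,
                     sleep: Callable[[float], None] = time.sleep,
                     clock: Callable[[], float] = time.monotonic) -> bool:
    """Call probe() until it returns True or deadline_s has elapsed."""
    end = clock() + deadline_s
    while True:
        if probe():
            return True
        if clock() >= end:
            return False
        sleep(interval_s)
```

One possible use is to run docker-compose up -d kong, call `wait_until_ready(admin_api_responds)`, and bring up the remaining services only when it returns True. The `sleep` and `clock` parameters exist only so the loop can be exercised without real waiting.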

Marcelo Gallardo 3 days ago @Big-B (Intel) I think it is something related to the OS, as the only thing I changed on the machine was to accept the Ubuntu updates. I think this issue started happening after the Ubuntu update. I will try installing Ubuntu again 😞 and hopefully it is not something that is part of the latest release of Ubuntu.

Big-B (Intel) 3 days ago @Marcelo Gallardo I bet if we could figure out what’s setting it off we could get Kong changed to not trigger it.

Big-B (Intel) 3 days ago Good opportunity for snapshot/restore if you’re doing this in a VM to triangulate it.

Marcelo Gallardo 3 days ago Unfortunately this is on a Rpi4. From now on, I will make a backup of the SD card when all is working and don’t update the OS.

Marcelo Gallardo 3 days ago @Big-B (Intel) I did what you suggested and it fixed the issue. Waiting 1 minute after starting Kong seems to fix whatever is causing the timeout. Thanks.

Marcelo Gallardo 3 days ago @Big-B (Intel) It will be good if we can determine what is causing that, though.

Big-B (Intel) 3 days ago Whatever it is, it isn’t arch-specific. That’s something.

Marcelo Gallardo 3 days ago @Big-B (Intel) Even though the ping is working most of the time, I have been seeing quite a few errors in the kong log file. Something is causing issues with Kong

    • [07/May/2021:21:19:17 +0000] "GET /metadata/api/v1/ping HTTP/1.1" 499 0

Big-B (Intel) 3 days ago https://httpstatuses.com/499 “A non-standard status code introduced by nginx for the case when a client closes the connection while nginx is processing the request.” Did you try the env var suggestion?
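Editor's note: two status codes come up in this thread, 404 when the routes had not been registered and 499 when the client closed the connection first. A small Python sketch like the one below can pull the path and status out of access log lines in the format quoted above and attach a hint. The regular expression, the `HINTS` table, and the `triage` function are hypothetical and written only for this note; the 404 hint reflects what was observed in this thread, not a general rule.

```python
import re

# Matches the request and status part of an access log line such as:
# [07/May/2021:21:19:17 +0000] "GET /metadata/api/v1/ping HTTP/1.1" 499 0
ACCESS_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+" '
    r'(?P<status>\d{3}) (?P<size>\d+)'
)

# Hints based on this thread (assumptions, not an exhaustive mapping).
HINTS = {
    404: "no route matched; check whether proxy setup registered the routes",
    499: "client closed the connection before a response was sent",
}


def triage(line: str):
    """Return (path, status, hint) for an access log line, or None."""
    # Pasted logs sometimes carry typographic quotes; normalise them first.
    line = line.replace("\u201c", '"').replace("\u201d", '"')
    match = ACCESS_LINE.search(line)
    if match is None:
        return None
    status = int(match.group("status"))
    return match.group("path"), status, HINTS.get(status, "no hint")
```

For the line quoted above, `triage` returns the path /metadata/api/v1/ping, the status 499, and the corresponding hint.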

Marcelo Gallardo 3 days ago I will try adding that, but somehow this is happening only on this specific machine. I have 4 other EdgeX installations and they are working fine. I will also try installing the OS again and see if that fixes the original issue. Thanks

Marcelo Gallardo 34 minutes ago @Big-B (Intel) I have tried adding the KONG_DNS_ORDER: LAST,A,CNAME environment variable and it does not help. The only thing that helps is waiting a period of time after Kong is started before starting edgex-proxy. I have also tried the latest release of Ubuntu (21.04) and get the same issue, but I can get around it by waiting a period of time before starting edgex-proxy. I also tried installing the version of Ubuntu (20.10) that had been working for me for a long time before it stopped working. I reinstalled it and did not apply any updates. With that, everything works again without adding the wait. So it seems that some update in the Ubuntu OS is causing issues with Kong.

Marcelo Gallardo 1 hour ago @Big-B (Intel) After fixing the indenting issue, the pull command worked fine, but unfortunately the run command seems to fail with issues similar to the Hanoi release. Following is the output of the command make run arm64:

    docker-compose -p edgex -f docker-compose-arm64.yml up -d
    Creating network "edgex_edgex-network" with driver "bridge"
    Creating volume "edgex_consul-acl-token" with default driver
    Creating volume "edgex_consul-config" with default driver
    Creating volume "edgex_consul-data" with default driver
    Creating volume "edgex_db-data" with default driver
    Creating volume "edgex_edgex-init" with default driver
    Creating volume "edgex_kong" with default driver
    Creating volume "edgex_kuiper-data" with default driver
    Creating volume "edgex_postgres-config" with default driver
    Creating volume "edgex_postgres-data" with default driver
    Creating volume "edgex_redis-config" with default driver
    Creating volume "edgex_vault-config" with default driver
    Creating volume "edgex_vault-file" with default driver
    Creating volume "edgex_vault-logs" with default driver
    Creating edgex-security-bootstrapper ... done
    Creating edgex-vault ... done
    Creating kong-db ... done
    Creating edgex-core-consul ...
    Creating edgex-secretstore-setup ...
    Creating kong ...
    ERROR: for edgex-core-consul UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: for edgex-secretstore-setup UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: for kong UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: for consul UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: for secretstore-setup UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: for kong UnixHTTPConnectionPool(host='localhost', port=None): Read timed out. (read timeout=60)
    ERROR: An HTTP request took too long to complete. Retry with --verbose to obtain debug information.
    If you encounter this issue regularly because of slow network conditions, consider setting COMPOSE_HTTP_TIMEOUT to a higher value (current value: 60).
    make: *** [Makefile:47: run] Error 1

About this issue

  • State: closed
  • Created 3 years ago
  • Comments: 29 (7 by maintainers)

Most upvoted comments

@tonyespy

The version of docker installed by the script downloaded from docker.com is: Docker version 20.10.6, build 370c289

The version of docker-compose is: docker-compose version 1.29.2, build unknown

As for Snap, I was mistaken: I have not tried installing Snap on that machine, as I read somewhere in the docs that Snap has not been tested on the version of Ubuntu I was using.

Thanks, Marcelo

Hi @bnevis-i, can you open an issue via https://github.com/Kong/kong/issues and include the kong version and logs?