camunda-modeler: Show specific error messages in the UI related to connection errors

Problem you would like to solve

Currently, the Desktop Modeler shows a generic error message: “Unknown error: Please check Zeebe cluster status”.

(Screenshot: the generic error dialog, taken 2023-08-28)

Tailing the logs reveals more specific connection error messages, such as:

RequestError: certificate has expired

Proposed solution

Display the specific error messages in the UI to make troubleshooting easier.

Alternatives considered

It’s possible to tail the Desktop Modeler logs. However, this is tedious and not intuitive for most customers.

Additional context

No response

About this issue

  • Original URL
  • State: closed
  • Created 10 months ago
  • Comments: 26 (17 by maintainers)

Most upvoted comments

Could we perhaps add additional checks prior to making a request to ensure that the TLS certificate is valid and display an error message if it’s not?
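A pre-flight check like that could separate certificate problems from everything else before the deploy request is even sent. A minimal sketch, assuming a helper that receives the certificate’s validity window (in Node these are the `valid_from` / `valid_to` fields returned by `tls.TLSSocket.getPeerCertificate()`); `certStatus` is a hypothetical name, not an existing Modeler function:

```typescript
// Hypothetical pre-flight check: classify a certificate's validity window
// before making the actual deploy request. validFrom / validTo mirror the
// valid_from / valid_to fields Node exposes on getPeerCertificate().
function certStatus(
  validFrom: string,
  validTo: string,
  now: Date = new Date()
): 'valid' | 'expired' | 'not-yet-valid' {
  if (now.getTime() < new Date(validFrom).getTime()) return 'not-yet-valid';
  if (now.getTime() > new Date(validTo).getTime()) return 'expired';
  return 'valid';
}
```

On `'expired'`, the UI could show something like “TLS certificate expired on &lt;valid_to&gt;” instead of the generic message.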

This form and its lack of error feedback are a big frustration on my end, causing about 3-5 hours of debugging effort every time I have to deploy a model, which I often need to do for testing. (I have this issue in both Web Modeler and Desktop Modeler.)

Additionally, I would like to offer my support. If there’s a change that can be made in the Helm chart (I suppose this would be for the Web Modeler) so that a user does not need to configure their OAuth URL / Zeebe gateway URL — for example, environment variables that auto-fill those fields — I would be more than happy to write a Helm chart patch for it.

From my perspective, this form has shown the red box for:

  1. Untrusted TLS certificate in the gateway
  2. Untrusted TLS certificate in keycloak
  3. Failing to write to a cache file that I never even knew existed
  4. Invalid client credentials

There might be more reasons. In my debugging I’ve also messed around with the Audience form element and found that whatever you put in that field is completely ignored by the application.

Also, I do know that there is a troubleshoot link that goes to the documentation, and while I am grateful for this, it is not good enough in my opinion.

What I wonder is if we can provide a CLI utility that verifies the proper configuration of a C8 (self-managed) instance, using tools equipped to do the job?

I’m not sure this is a good way to handle it. We have zbctl, but zbctl will often work even when the Web Modeler / Desktop Modeler does not, or vice versa.

To me, a better solution would simply be environment variables that pre-fill that form, set as part of the Helm chart. For example:

CLUSTER_ENDPOINT=http://<RELEASE>-zeebe-gateway:26500
IDENTITY_OAUTH_URL=http://<RELEASE>-keycloak/auth/realms/camunda-platform/protocol/openid-connect/token
DEFAULT_AUDIENCE=test

Then the user only puts in the client id and secret. The helm chart can then properly set those environment variables.
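Reading the proposed variables into form defaults would be straightforward. A sketch, noting that the variable names (CLUSTER_ENDPOINT, IDENTITY_OAUTH_URL, DEFAULT_AUDIENCE) are this proposal’s, not an existing Web Modeler contract:

```typescript
// Hypothetical: derive connection-form defaults from the proposed
// environment variables, falling back to empty fields when unset.
interface ConnectionDefaults {
  clusterEndpoint: string;
  oauthUrl: string;
  audience: string;
}

function defaultsFromEnv(
  env: Record<string, string | undefined>
): ConnectionDefaults {
  return {
    clusterEndpoint: env.CLUSTER_ENDPOINT ?? '',
    oauthUrl: env.IDENTITY_OAUTH_URL ?? '',
    audience: env.DEFAULT_AUDIENCE ?? ''
  };
}
```

The user would then only fill in the client ID and secret by hand.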

That’s more of a Web Modeler suggestion. I’m not sure if that’s a good idea or not, or if that idea could be translated to the desktop modeler.

I still also think better error messages make the most sense directly inside the Modeler / Web Modeler.

Hi @nikku ,

I am a developer on the Distribution team who works on the helm charts. For me, it’s pretty common to deploy the helm chart locally and do basic testing, especially as it relates to support tickets, new features, helping others internally, and the occasional customer calls where customers struggle to do similar things.

Could you elaborate on what you deploy, and why it always takes such a long time to deploy and debug?

What I deploy is often a values.yaml for the https://github.com/camunda/camunda-platform-helm/tree/main/charts/camunda-platform repo; many times I take a customer’s values.yaml and modify it so that I can test things locally with their configuration. It takes so long because I don’t know why the error happens, and there are many possible causes behind the same message (we basically just get a red box and something like “Unknown error. Please check Zeebe cluster status. Troubleshoot”).

So what am I doing that it takes so long for me to debug?

Once I get this error, I have to wonder whether the issue has to do with the networking, the deployment configuration, or the application code.

  1. I check whether the cluster endpoint matches the external URL in the ingress configuration
  2. I check that the OAuth2 URL hostname matches the Keycloak hostname designated in the ingress configuration
  3. I modify the ending of the OAuth2 URL: /auth/realms/camunda-platform/protocol/openid-connect/token. I often try removing the auth part, or play around with different URLs, because I have no idea how I’m supposed to obtain this magic URL
  4. I verify the client ID and client secret in Identity. Sometimes I will make a new Application in Identity with all privileges, and sometimes I will just use the Zeebe client
  5. I test all my TLS certs to ensure they are all valid
  6. I check the logs for Zeebe, the Zeebe gateway, and the Web Modeler restapi. The logs have always been worthless for me in debugging this, but I check them anyway
  7. I modify the Keycloak URL to use the Kubernetes service name as the hostname instead of the external-facing URL
  8. I modify the OAuth URL to use the Kubernetes service name as the hostname instead of the external-facing URL
  9. I repeat the previous steps with the Desktop Modeler to see if that behaves any differently
  10. I refer to Dave’s message here: https://camunda.slack.com/archives/C05764N4VNZ/p1690906641310499
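Steps 2–4 above can be sanity-checked outside the Modeler by requesting a token from the Keycloak endpoint directly. A sketch of building that client-credentials request (the function name is made up for illustration; `fetch` is built into Node 18+):

```typescript
// Build the form body for an OAuth2 client_credentials token grant.
// POSTing this to the .../protocol/openid-connect/token URL with
// Content-Type: application/x-www-form-urlencoded should return 200 when
// the URL and credentials are correct, and 401 when the credentials are wrong.
function buildTokenRequestBody(clientId: string, clientSecret: string): string {
  return new URLSearchParams({
    grant_type: 'client_credentials',
    client_id: clientId,
    client_secret: clientSecret
  }).toString();
}

// Example usage (not executed here):
// const res = await fetch(tokenUrl, {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/x-www-form-urlencoded' },
//   body: buildTokenRequestBody('zeebe', 'my-secret')
// });
// res.ok distinguishes a bad token URL / bad credentials from gateway issues.
```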

Usually, I go through those steps, they don’t always help, and then I just make panic changes because there are no logs or error messages. I have gone to the troubleshoot link before; sometimes it helps, but most of the time it does not. It did help with my most recent frustration, though, when I was trying to configure a read-only root filesystem. That was when I learned, via the docs link, about the magic file ZEEBE_CLIENT_CONFIG_PATH=/path/to/credentials/cache.txt.

Is what you do a common thing ordinary users do, and if so, how frequently do they do it?

Every user who installs C8 will need to verify that their installation is correct, and the only way to do that is to deploy a model and access each of the web components. Users will only have to debug this once, but I have to deploy the helm charts many times. So ordinary users will not be as frustrated as internal devs who are testing their installation.

Parsing the log for special character streams does not seem to me like a satisfying (and robust) solution.

zeebe-node error handling is what we’d need to plug into.
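Plugging in there could be as simple as translating the error before it reaches the dialog. A sketch that maps gRPC-style errors to user-facing messages; the status codes follow the gRPC specification, but `explainGrpcError` and the exact wording are hypothetical, not existing Modeler code:

```typescript
// Hypothetical translation layer for errors surfaced by the gRPC client.
interface GrpcLikeError {
  code?: number;    // gRPC status code, when present
  message: string;  // e.g. "RequestError: certificate has expired"
}

function explainGrpcError(err: GrpcLikeError): string {
  // TLS failures usually arrive as plain message text, not a status code.
  if (/certificate has expired/i.test(err.message)) {
    return 'TLS certificate of the cluster endpoint has expired.';
  }
  switch (err.code) {
    case 4:  // DEADLINE_EXCEEDED
      return 'Request timed out. Check that the gateway address is reachable.';
    case 14: // UNAVAILABLE
      return 'Cluster unavailable. Check the Zeebe gateway address and port.';
    case 16: // UNAUTHENTICATED
      return 'Authentication failed. Check the client ID, secret, and OAuth URL.';
    default:
      return `Unknown error: ${err.message}. Please check Zeebe cluster status.`;
  }
}
```

The generic message would then remain only as the true fallback, instead of masking every distinct failure mode listed earlier in this thread.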

Maybe, if you have the chance, you could give it a debugging session yourself and figure out if there are pragmatic improvements we can make.

I’ve tagged this as spring cleaning.