argo-workflows: 3.4-rc2 - Workflows UI can no longer get logs (s3)
Checklist
- Double-checked my configuration.
- Tested using the latest version.
- Used the Emissary executor.
Summary
This occurs following an upgrade from workflows 3.3.9 to 3.4-rc2.
Logs are still correctly sent to s3 by argo workflows, I can see main.log in s3 and the contents of the log file are correct.
However, once the workflow has finished and the pod has been archived, the logs field is empty in the UI.
Clicking “Try getting logs from the artifacts” results in Internal Server Error.
The argo-server logs show this error, but only when clicking the “Try getting logs from the artifacts” link: level=error msg="Artifact Server returned internal error" error="artifact not found: main-logs"
No errors at all when just trying to view in the UI. None on the controller either.
Note, as I’m using IRSA, I have the following patch on my argo-server:
spec:
  securityContext:
    fsGroup: 65534
What version are you running? 3.4-rc2
Config summary: controller-configmap:
artifactRepository: |
  # archiveLogs will archive the main container logs as an artifact
  archiveLogs: true
  s3:
    endpoint: s3.amazonaws.com
    bucket: my-bucket-name
    region: us-east-1
    insecure: false
    keyFormat: "my-artifacts\
      /{{workflow.creationTimestamp.Y}}\
      /{{workflow.creationTimestamp.m}}\
      /{{workflow.creationTimestamp.d}}\
      /{{workflow.name}}\
      /{{pod.name}}"
    useSDKCreds: true
The argo-server ServiceAccount has the eks.amazonaws.com/role-arn annotation as always.
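To check where an archived log should land, the keyFormat above can be expanded by hand. The following is a minimal Python sketch, not Argo's implementation. It assumes plain string substitution of the placeholders, strftime-style zero-padded month and day values, and that the log is stored as main.log under the resulting prefix (the file name seen in the bucket). The helper name and the sample workflow and pod names are hypothetical.

```python
from datetime import datetime, timezone

# keyFormat from the controller ConfigMap, with the YAML line continuations joined.
KEY_FORMAT = (
    "my-artifacts"
    "/{{workflow.creationTimestamp.Y}}"
    "/{{workflow.creationTimestamp.m}}"
    "/{{workflow.creationTimestamp.d}}"
    "/{{workflow.name}}"
    "/{{pod.name}}"
)


def expand_key_format(key_format: str, workflow_name: str, pod_name: str,
                      created: datetime) -> str:
    """Substitute the placeholders used in this issue's keyFormat (hypothetical helper)."""
    values = {
        "workflow.creationTimestamp.Y": created.strftime("%Y"),
        "workflow.creationTimestamp.m": created.strftime("%m"),
        "workflow.creationTimestamp.d": created.strftime("%d"),
        "workflow.name": workflow_name,
        "pod.name": pod_name,
    }
    for name, value in values.items():
        key_format = key_format.replace("{{" + name + "}}", value)
    return key_format


if __name__ == "__main__":
    # Sample names only; substitute a real workflow and pod name.
    prefix = expand_key_format(
        KEY_FORMAT,
        "my-workflow",
        "my-workflow-1234567890",
        datetime(2022, 9, 20, tzinfo=timezone.utc),
    )
    # Assumption: the archived log sits under the prefix as main.log.
    print(prefix + "/main.log")
```

Comparing this expected key with the object actually present in the bucket is one way to check whether the server and the executor agree on the value of {{pod.name}}.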
Message from the maintainers:
Impacted by this regression? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- State: closed
- Created 2 years ago
- Reactions: 2
- Comments: 69 (67 by maintainers)
@juliev0 I might be a bit late to the party but we are currently facing a similar issue. Upgraded to version 3.4.0 with the new helm chart version 0.18.0.
After upgrading we see the logs of a step while it is currently running. As soon as that step is finished we are getting “internal server error” on the UI when viewing the logs of said step. The same issue occurs when trying to check the logs under archived workflows.
Note that we are using MinIO as the S3 backend, not AWS S3. Setting POD_NAMES to “v1” in both the controller and the server did not change this behavior.
Log Output from workflow server pod:
time="2022-09-20T08:01:17.442Z" level=info msg="selected SSO RBAC service account for user" email=XXX loginServiceAccount=workflows-super-admin serviceAccount=workflows-super-admin ssoDelegated=false ssoDelegationAllowed=true subject=2d6bdc16-b79b-4560-8d0c-c77da510a8f9
time="2022-09-20T08:01:17.443Z" level=info msg="Get artifact file" artifactName=main-logs namespace=weckdengeparden-cicd nodeId=workflow-push-weckdengeparden-pww7f-2238040116 workflowName=workflow-push-weckdengeparden-pww7f
time="2022-09-20T08:01:17.458Z" level=error msg="Artifact Server returned internal error" error="no template found by the name of '' (which is the template associated with nodeId 'workflow-push-weckdengeparden-pww7f-2238040116'??"
time="2022-09-20T08:01:17.458Z" level=info duration=20.395087ms method=GET path=/artifact-files/weckdengeparden-cicd/workflows/workflow-push-weckdengeparden-pww7f/workflow-push-weckdengeparden-pww7f-2238040116/outputs/main-logs size=22 status=500
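The server output is in key=value form, so the error entries can be filtered out mechanically when triaging. A small Python sketch, assuming one log entry per line and shell-style double quoting of values that contain spaces (the function names are hypothetical):

```python
import shlex


def parse_logfmt(line: str) -> dict:
    """Parse one key=value log line; double-quoted values may contain spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:  # skip tokens that are not key=value pairs
            fields[key] = value
    return fields


def error_entries(lines):
    """Yield the parsed entries whose level is error."""
    for line in lines:
        entry = parse_logfmt(line)
        if entry.get("level") == "error":
            yield entry


if __name__ == "__main__":
    sample = [
        'level=info msg="Get artifact file" artifactName=main-logs',
        'level=error msg="Artifact Server returned internal error" '
        'error="artifact not found: main-logs"',
    ]
    for entry in error_entries(sample):
        print(entry["msg"], "|", entry["error"])
```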
Getting logs from the artifact repository via the UI doesn’t work either. The URL used is:
https://workflows.apps.play.gepaplexx.com/artifact-files/weckdengeparden-cicd/workflows/workflow-push-weckdengeparden-pww7f/workflow-push-weckdengeparden-pww7f-2238040116/outputs/main-logs
In MinIO, I can find the logs under:
https://minio.apps.play.gepaplexx.com/minio/argo-workflows/workflow-push-weckdengeparden-pww7f/workflow-push-weckdengeparden-pww7f-2238040116/
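The request path lines up with the namespace, workflowName, nodeId, and artifactName fields in the server log. As a rough Python sketch (hypothetical helper name; the path layout is inferred only from the URL and log line in this thread, not from the Argo source), the pieces can be pulled apart and compared with the MinIO location:

```python
from urllib.parse import urlparse


def parse_artifact_url(url: str) -> dict:
    """Split an /artifact-files/... URL into the fields seen in the server log."""
    parts = urlparse(url).path.strip("/").split("/")
    # Layout inferred from this thread:
    # artifact-files/<namespace>/workflows/<workflowName>/<nodeId>/outputs/<artifactName>
    if (len(parts) != 7 or parts[0] != "artifact-files"
            or parts[2] != "workflows" or parts[5] != "outputs"):
        raise ValueError(f"unexpected artifact URL layout: {url}")
    return {
        "namespace": parts[1],
        "workflowName": parts[3],
        "nodeId": parts[4],
        "artifactName": parts[6],
    }


if __name__ == "__main__":
    ui_url = (
        "https://workflows.apps.play.gepaplexx.com/artifact-files/"
        "weckdengeparden-cicd/workflows/workflow-push-weckdengeparden-pww7f/"
        "workflow-push-weckdengeparden-pww7f-2238040116/outputs/main-logs"
    )
    minio_url = (
        "https://minio.apps.play.gepaplexx.com/minio/argo-workflows/"
        "workflow-push-weckdengeparden-pww7f/"
        "workflow-push-weckdengeparden-pww7f-2238040116/"
    )
    fields = parse_artifact_url(ui_url)
    print(fields)
    # Compare the last MinIO path segment with the node ID requested by the UI.
    minio_leaf = urlparse(minio_url).path.rstrip("/").split("/")[-1]
    print(minio_leaf == fields["nodeId"])
```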
Any input or hints would be much appreciated! Unfortunately I am on vacation until the beginning of October, but I’ll have a colleague of mine watch this thread so he can give you more information or try things out. Thanks.
workflow looks like this:
Unfortunately, after testing further with rc4 for a full 24 hours, this still isn’t fixed. I was wrong.
Equally unfortunately, it’s intermittent and I can’t get any more logs than:
level=error msg="Artifact Server returned internal error" error="Access Denied"
I’m also somewhat reluctant to do the same dance as before.
This is confirmed resolved in rc4