pipelines: [backend] Metadata/Executions not written in 1.7.0-rc1 (New visualisations not working as a result)
/kind bug
I upgraded Kubeflow from 1.4.0 to 1.7.0-rc1 with the platform-agnostic manifests.
While I now see correct visualizations of statistics from runs that happened before upgrading to 1.7.0-rc1, new runs only display the markdown details.
The TFX pipelines I submit are exactly the same. On the new runs the ML Metadata tab of the components prints:
“Corresponding ML Metadata not found.”
Furthermore, I don’t see any new executions on the executions page, despite running many pipelines since upgrading.
I don’t see anything special in the logs of the TFX pods except:
WARNING:absl:metadata_connection_config is not provided by IR.
But that was present before upgrading to 1.7.0-rc1.
The only errors I see in the metadata-grpc-deployment pod are:
name: "sp-lstm-rh6xt"
Internal: mysql_query failed: errno: 1062, error: Duplicate entry '48-sp-lstm-rh6xt' for key 'type_id'
Cannot create node for type_id: 48 name: "sp-lstm-rh6xt"
I think this is also normal?
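The key named in the MySQL error ('48-sp-lstm-rh6xt' for key 'type_id') suggests a composite unique key on (type_id, name). If so, the error just means a second writer tried to create a node that already exists, which is harmless as long as the client then reuses the existing row. Whether the MLMD client does that here is not confirmed in this thread. Below is a minimal sqlite3 sketch, assuming that schema; the table layout and the get_or_create_context helper are hypothetical illustrations, not MLMD's actual code.

```python
import sqlite3


def make_store() -> sqlite3.Connection:
    """Create an in-memory table with a (type_id, name) unique key (assumed schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Context ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " type_id INTEGER NOT NULL,"
        " name TEXT NOT NULL,"
        " UNIQUE(type_id, name))"
    )
    return conn


def get_or_create_context(conn: sqlite3.Connection, type_id: int, name: str) -> int:
    """Insert a context, or return the existing id if the unique key already matches."""
    try:
        cur = conn.execute(
            "INSERT INTO Context (type_id, name) VALUES (?, ?)", (type_id, name)
        )
        return cur.lastrowid
    except sqlite3.IntegrityError:
        # Same failure class as MySQL errno 1062: the row already exists, so reuse it.
        row = conn.execute(
            "SELECT id FROM Context WHERE type_id = ? AND name = ?", (type_id, name)
        ).fetchone()
        return row[0]


conn = make_store()
first = get_or_create_context(conn, 48, "sp-lstm-rh6xt")
second = get_or_create_context(conn, 48, "sp-lstm-rh6xt")
print(first == second)  # True
```

In this sketch the duplicate insert is logged as an error by the database but does not lose data; it would only matter if the caller gave up instead of reading the existing row.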
Basically, I don’t think executions and artifacts are being written to the DB in 1.7.0-rc1, and I’m not sure how to debug this. As far as I can tell, this is why the visualizations don’t show up.
Metadata in the TFX pipelines is configured via the get_default_kubeflow_metadata_config function from tfx.orchestration.kubeflow.
Environment:
- Kubeflow version: 1.4.0 -> 1.7.0-rc1
- kfctl version: Not used. Using tfx.orchestration.kubeflow to submit pipelines.
- Kubernetes platform: Upstream kubeadm: k8s v1.20.5
- Kubernetes version: (use kubectl version):
- OS (e.g. from /etc/os-release): CentOS 8
Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.
About this issue
- Original URL
- State: closed
- Created 3 years ago
- Comments: 30 (18 by maintainers)
Commits related to this issue
- Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. See also https://github.com/kubeflow/pipelines/issues/6138. PiperOrigin-RevId: 391192276 — committed to tensorflow/tfx by deleted user 3 years ago
- Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. Kubeflow Dag Runner should record additional execution property to store pod names. See also https://github.com/kubeflow/pipelines... — committed to tensorflow/tfx by deleted user 3 years ago
- Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. (#4157) * Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. Kubeflow Dag Runner should record additional exe... — committed to tensorflow/tfx by dhruvesh09 3 years ago
- Fix typo in RELEASE.md (#4158) * Update RELEASE.md * Update version.py * Update version.py * Update version.py * Update dependencies.py * Update RELEASE.md * Fixes missing kfp_pod_n... — committed to tensorflow/tfx by dhruvesh09 3 years ago
- feat(sample): Use TFX 1.2.0 for Taxi tips prediction sample. Partial #6138 (#6381) * fix(sample): Use TFX1.2.0 for Taxi tips prediction sample * also update python * Update parameterized_tfx_os... — committed to kubeflow/pipelines by zijianjoy 3 years ago
- feat(frontend): integrate with TFX 1.2.0 metadata & visualization, no longer support previous versions. Part of #6138 (#6388) — committed to kubeflow/pipelines by Bobgy 3 years ago
- Fixes missing kfp_pod_name execution property in Kubeflow Pipelines. Kubeflow Dag Runner should record additional execution property to store pod names. See also https://github.com/kubeflow/pipelines... — committed to casassg/tfx by deleted user 3 years ago
I found a possible cause. It is related to the change in the way TFX stores its contexts since 1.0.0 (which stems from the changes to the execution stack that uses TFX IR).
In TFX 0.x, the contexts were:
However, in TFX 1.0, the contexts became:
Related code
So it seems that Kubeflow Pipelines cannot find the contexts (and therefore the artifacts) properly. I think we should change the MLMD access code, like here.
CC. @zhitaoli , @1025KB , @Bobgy
Unfortunately, there seems to be no direct way to tell which TFX version produced an execution when looking it up. (Artifacts have a tfx_version property, but there is no such information in Context or Execution.) I think we can try to find the 1.0 context first, and fall back to the 0.x context if it is not found.
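The "try the 1.0 context first, then fall back to 0.x" idea can be sketched as an ordered lookup over candidate context keys. This is a minimal illustration under assumptions: the store is modelled as a plain dict rather than a real MLMD client, and the context type names ("new_run", "legacy_run") are hypothetical placeholders, since the actual 0.x and 1.0 context layouts are not reproduced in this thread.

```python
from typing import Dict, Optional, Sequence, Tuple

# Hypothetical stand-in for an MLMD store: (context_type, context_name) -> context id.
ContextKey = Tuple[str, str]


def find_run_context(
    store: Dict[ContextKey, int], candidates: Sequence[ContextKey]
) -> Optional[int]:
    """Return the id of the first candidate context found, trying candidates in order."""
    for key in candidates:
        context_id = store.get(key)
        if context_id is not None:
            return context_id
    return None


# Candidates ordered newest layout first (placeholder names, not real TFX types).
candidates = [("new_run", "run-1"), ("legacy_run", "run-1")]
legacy_only_store = {("legacy_run", "run-1"): 7}
print(find_run_context(legacy_only_store, candidates))  # 7
```

Ordering the candidates this way means runs from the current TFX layout are matched directly, and older runs still resolve through the fallback instead of showing no metadata.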
@ConverJens I can confirm it has.
TFX 1.2.0 and Pipelines 1.7.0 work perfectly with no patches.
I’m trying to include the above fix in TFX 1.2.0, which is expected to be released tomorrow. I’ll update you when the release is finalized.
Hi, this is a bug on the TFX side, introduced in https://github.com/tensorflow/tfx/commit/24fc5d1db9198a75db11af25cf05c4d3ae05491f. It seems we don’t record pod names in TFX 1.0.0. I’ll try to fix this ASAP, and will update the release plan.
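The related commits restore a kfp_pod_name execution property. Judging from those commit messages, the UI appears to match a component's pod to its MLMD execution via that property, so if it is never written the lookup comes back empty, consistent with the "Corresponding ML Metadata not found." message. The sketch below only illustrates that matching step; the Execution dataclass and find_execution_for_pod helper are hypothetical simplifications, not the Kubeflow Pipelines frontend code.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Execution:
    """Hypothetical, simplified MLMD execution record."""
    id: int
    # Custom properties recorded by the orchestrator (simplified to strings).
    properties: Dict[str, str] = field(default_factory=dict)


def find_execution_for_pod(
    executions: List[Execution], pod_name: str
) -> Optional[Execution]:
    """Return the execution whose 'kfp_pod_name' property equals pod_name, if any."""
    for execution in executions:
        if execution.properties.get("kfp_pod_name") == pod_name:
            return execution
    return None


# With the property recorded, the pod resolves to its execution.
with_property = [Execution(1, {"kfp_pod_name": "pod-a"})]
# Without it (the TFX 1.0.0 behaviour described above), nothing matches.
without_property = [Execution(2, {})]
print(find_execution_for_pod(with_property, "pod-a").id)  # 1
print(find_execution_for_pod(without_property, "pod-a"))  # None
```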