dvc: Plots with multiple curves displayed wrong in cml

I am getting beautiful PR curves with DVC, with one curve for each class on my plot.

I took what I had and created a cml pipeline and the result looked completely different/worse.

From dvc I run:

dvc plots diff
[Screenshot of the resulting plot]

For the cml version I run:

dvc plots diff --target pr.csv --show-vega origin/main
[Screenshot of the resulting plot]

For CML I used the dvcorg/cml Docker image from Docker Hub, which had been updated five days before I used it.

@pared mentioned it might be due to the fact that “file” results override flexible plots, because they are parsed later. A reproduction script still needs to be prepared.

About this issue

  • State: closed
  • Created 2 years ago
  • Comments: 15 (4 by maintainers)

Most upvoted comments

@shortcipher3 created a PR to fix this: iterative/dvc#8114

@shortcipher3 thank you for the detailed steps; I was able to determine part of the issue.

The plot rendered via dvc plots diff --targets model_pr.csv main is in the regular Vega format, whereas the plot rendered via dvc plots diff --target model_pr.csv --show-vega main > vega.json is in the Vega-Lite format, which differs slightly.
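A quick way to check which of the two formats a given spec is in, as a minimal sketch: Vega and Vega-Lite specs normally carry a `$schema` URL under vega.github.io/schema/vega/ or vega.github.io/schema/vega-lite/ respectively. The function names `spec_flavor` and `converter_for` are hypothetical helpers, not part of DVC or CML, and a spec without a `$schema` field cannot be classified this way.

```python
def spec_flavor(spec: dict) -> str:
    """Classify a parsed spec as 'vega-lite', 'vega', or 'unknown' from its $schema URL."""
    schema = str(spec.get("$schema", ""))
    # Check vega-lite first: its schema path is distinct from the plain vega path.
    if "/schema/vega-lite/" in schema:
        return "vega-lite"
    if "/schema/vega/" in schema:
        return "vega"
    return "unknown"


def converter_for(spec: dict) -> str:
    """Suggest the PNG converter CLI for the spec, or '' when the flavor is unknown."""
    # vl2png ships with the vega-lite npm package, vg2png with vega-cli.
    return {"vega-lite": "vl2png", "vega": "vg2png"}.get(spec_flavor(spec), "")
```

Usage would be along the lines of `converter_for(json.load(open("vega.json")))`, picking vl2png or vg2png based on the result.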

There is probably a good way to programmatically extract the Vega data embedded in index.html (in which case you can use vg2png instead of vl2png), but the quickest path to something workable would be:

dvc plots diff --targets model_pr.csv main
cml publish dvc_plots/index.html

cml publish will output a link like: https://asset.cml.dev/36855bf8a25c5882ce4c52ec85fc5d5f9f8f272f?cml=html which you can use in a report.md etc.
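For the programmatic route, a minimal sketch of pulling embedded specs out of an HTML page is given here. It assumes, without having verified it against DVC's actual output, that index.html embeds each plot as strict JSON passed to a `vegaEmbed(...)` call; the `marker` default and the function name `extract_embedded_specs` are hypothetical. If the page uses a JavaScript object literal that is not valid JSON, this will simply find nothing.

```python
import json


def extract_embedded_specs(html: str, marker: str = "vegaEmbed(") -> list:
    """Return every JSON object that starts at the first '{' after each `marker`."""
    decoder = json.JSONDecoder()
    specs = []
    pos = html.find(marker)
    while pos != -1:
        start = html.find("{", pos)
        if start == -1:
            break  # no object follows the marker
        try:
            # raw_decode parses one JSON value at `start` and reports where it ended.
            spec, end = decoder.raw_decode(html, start)
        except json.JSONDecodeError:
            end = start + 1  # not valid JSON here, keep scanning
        else:
            if isinstance(spec, dict):
                specs.append(spec)
        pos = html.find(marker, end)
    return specs
```

Each returned dict could then be written out with `json.dump` and passed to the matching converter.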

CC @daavoo @0x2b3bfa0 as I feel they can probably recommend another approach.

For CI I’m using GitLab; here are the relevant contents of my .gitlab-ci.yml

# .gitlab-ci.yml
stages:
  - cml_run

cml:
  stage: cml_run
  image: dvcorg/cml:latest
  script:
    # Setup environment
    - ln -sv $(which pip3) /usr/bin/pip3.8
    - pip install -r requirements_fiftyone.txt
    - pip install -r requirements.txt
    - npm install -g vega-lite@v5

    # Additional setup
    - git fetch --prune

    # Write any header information
    - echo "# CML Report" >> report.md

    - dvc fetch reports/model_pr.csv
    - dvc checkout reports/model_pr.csv
    - dvc plots diff 
      --target reports/model_pr.csv --show-vega origin/main > vega.json
    - vl2png vega.json model_pr.png
    - cml publish
      --driver=gitlab
      --rm-watermark
      --md model_pr.png >> report.md

    # Send the report as a comment  
    - cml send-comment report.md

My dvc.yaml looks like this:

plots:
  reports/model_pr.csv:
    template: linear
    x: recall
    y: [precision_class1, precision_class2, precision_class3, precision_class4,
      precision_class5]
    y_label: precision
    title: precision-recall @ iou 0.5
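To picture what the multi-column `y` setting is expected to produce (one curve per listed column), the wide CSV can be thought of as a long table with one row per (recall, class) pair. The sketch below is only an illustration of that reshaping, not DVC's own implementation; `wide_to_long`, the `class` column name, and the two-class sample data are all made up for the example.

```python
import csv
import io


def wide_to_long(csv_text, x, ys, y_name="precision", series_name="class"):
    """Turn one row per x value into one row per (x value, y column) pair."""
    long_rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        for column in ys:
            long_rows.append(
                {x: record[x], y_name: record[column], series_name: column}
            )
    return long_rows


# Made-up sample with two classes; values stay strings, as csv reads them.
sample = "recall,precision_class1,precision_class2\n0.0,1.0,0.9\n0.5,0.8,0.7\n"
rows = wide_to_long(sample, "recall", ["precision_class1", "precision_class2"])
```

A renderer that groups such rows by the `class` column draws one line per class, which is the shape the multi-curve plot should have.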

@dacbd Adding npm install -g vega-lite@v5 into my container as seen above didn’t help.

@DavidGOrtega I’ve attached the metrics file.

model_pr.csv

Conceptually, is there a reason using the HTML shouldn’t work?

I think only certain tags are parsed within Markdown; I would be surprised if script tags worked, for example.

From the cml container:

# dvc --version
2.15.0

LGTM.

It looks like an issue with the combination of flexible plots and show_vega arg.

I believe this issue can be transferred to DVC. There is a function (adjust_vega_renderers) that is applied to the Vega plot embedded in the HTML but is not applied to the JSON returned by show_vega:

https://github.com/iterative/dvc/blob/496599518a2f79a79b63888a1d9eaa30d8712021/dvc/commands/plots.py#L158-L177

So I just modified a couple of the numbers in the csv to get a “unique” one.

For more of a minimal reproducible example, I ran something like:

docker pull dvcorg/cml
docker run --rm -ti -v ~:/data dvcorg/cml /bin/bash
git init .
dvc init
dvc add model_pr.csv
dvc repro test
git add dvc.yaml model_pr.csv.dvc .gitignore dvc.lock
git commit -m "test"
git checkout -b branch
# modify the file
dvc add model_pr.csv
git add model_pr.csv.dvc
git commit -m "test2"
dvc plots diff --target model_pr.csv --show-vega master > vega.json
vl2png vega.json model_pr.png

[Image: model_pr.png]

I renamed vega.json to vega.txt to upload it. vega.txt

I renamed dvc.yaml to dvc.txt to upload it. dvc.txt
